* Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
@ 2015-09-08 10:40 Mika Westerberg
2015-09-08 12:43 ` Jiang Liu
2015-09-08 14:02 ` Thomas Gleixner
0 siblings, 2 replies; 6+ messages in thread
From: Mika Westerberg @ 2015-09-08 10:40 UTC (permalink / raw)
To: Jiang Liu, Thomas Gleixner; +Cc: jarkko.nikula, linux-kernel
Hi Jiang and Thomas,
With recent kernels I'm getting crash when a GPIO interrupt triggers. For
unknown reasons I'm not able to capture the crash on the serial console so I
did following change to irq_move_masked_irq():
assert_raw_spin_locked(&desc->lock);
to
WARN_ON(!raw_spin_is_locked(&desc->lock));
The backtrace looks like:
[ 6.733640] WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 irq_move_masked_irq+0xb8/0xc0()
[ 6.768765] Modules linked in: x86_pkg_temp_thermal i2c_hid(+)
[ 6.775425] CPU: 0 PID: 5 Comm: kworker/0:0H Not tainted 4.2.0+ #124
[ 6.782639] ffffffff81747e59 ffff8801c3c03e18 ffffffff815d6165 0000000000000000
[ 6.791110] ffff8801c3c03e50 ffffffff81050c9d ffff8801bf00a600 ffff8801beb594c0
[ 6.799540] ffff8801beb594c0 0000000000000002 0000000042000000 ffff8801c3c03e60
[ 6.807989] Call Trace:
[ 6.810755] <IRQ> [<ffffffff815d6165>] dump_stack+0x4e/0x79
[ 6.817307] [<ffffffff81050c9d>] warn_slowpath_common+0x7d/0xb0
[ 6.824121] [<ffffffff81050d85>] warn_slowpath_null+0x15/0x20
[ 6.830736] [<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0
[ 6.837450] [<ffffffff8103c161>] ioapic_ack_level+0x111/0x130
[ 6.844079] [<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0
[ 6.851306] [<ffffffff81006bfb>] handle_irq+0xab/0x180
[ 6.857235] [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20
[ 6.864440] [<ffffffff810063c2>] do_IRQ+0x52/0xe0
[ 6.869877] [<ffffffff815dcfbf>] common_interrupt+0x7f/0x7f
[ 6.876292] <EOI> [<ffffffff815dbf6d>] ? _raw_write_unlock_irq+0xd/0x30
[ 6.884008] [<ffffffff815dbf99>] _raw_spin_unlock_irq+0x9/0x10
[ 6.890721] [<ffffffff810745bc>] finish_task_switch+0x7c/0x1a0
[ 6.897438] [<ffffffff815d7b9b>] __schedule+0x33b/0xb20
[ 6.903461] [<ffffffff8107cc77>] ? __enqueue_entity+0x67/0x70
[ 6.910079] [<ffffffff815d8408>] schedule+0x38/0x90
[ 6.915711] [<ffffffff815db3a8>] schedule_timeout+0x158/0x250
[ 6.922328] [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20
[ 6.929534] [<ffffffff815d933f>] wait_for_completion_killable+0xaf/0x220
[ 6.937231] [<ffffffff81076d80>] ? wake_up_q+0x60/0x60
[ 6.943158] [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0
[ 6.949970] [<ffffffff8106d5c2>] kthread_create_on_node+0xd2/0x170
[ 6.957079] [<ffffffff8129ac39>] ? snprintf+0x39/0x40
[ 6.962907] [<ffffffff810665b9>] create_worker+0xb9/0x190
[ 6.969130] [<ffffffff810685d3>] worker_thread+0x303/0x4b0
[ 6.975452] [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0
[ 6.982262] [<ffffffff8106d8bf>] kthread+0xcf/0xf0
[ 6.987797] [<ffffffff815dbf99>] ? _raw_spin_unlock_irq+0x9/0x10
[ 6.994722] [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190
[ 7.001636] [<ffffffff815dc86f>] ret_from_fork+0x3f/0x70
[ 7.007765] [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190
The driver in question is drivers/pinctrl/intel/pinctrl-intel.c and the
corresponding code which triggers this is below:
static void intel_gpio_irq_handler(unsigned irq, struct irq_desc *desc)
{
struct gpio_chip *gc = irq_desc_get_handler_data(desc);
struct intel_pinctrl *pctrl = gpiochip_to_pinctrl(gc);
struct irq_chip *chip = irq_desc_get_chip(desc);
int i;
chained_irq_enter(chip, desc);
/* Need to check all communities for pending interrupts */
for (i = 0; i < pctrl->ncommunities; i++)
intel_gpio_community_irq_handler(gc, &pctrl->communities[i]);
chained_irq_exit(chip, desc);
}
So once chained_irq_exit() is called, it in turn calls chip->irq_eoi()
which in this case is ioapic_ack_level().
I'm sure such crash did not happen when pinctrl-intel.c was developed so I
started to investigate what might have changed. Note that it may be the
driver does something wrong and the crash is expected. However, many other
drivers do pretty much the same (with the exception that the parent IRQ
chip is not IO-APIC in most cases).
To me assert_raw_spin_locked(&desc->lock) looks valid as the function goes
and modifies desc right after the check. Also in intel_gpio_irq_handler()
desc->lock is not locked (not sure if it needs to be).
After "manual" bisection I found the commit that causes the crash to be
triggered:
commit aa5cb97f14a2dd5aefabed6538c35ebc087d7c24
Author: Jiang Liu <jiang.liu@linux.intel.com>
Date: Tue Apr 14 10:29:42 2015 +0800
x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
Now there is no user of x86_io_apic_ops.set_affinity anymore, so remove
it.
It says that there are no users for x86_io_apic_ops.set_affinity but then
it does this:
- x86_io_apic_ops.set_affinity(idata, mask, false);
+ irq_set_affinity(irq, mask);
The difference is that x86_io_apic_ops.set_affinity() programs affinity
directly to the hardware (if I understand it right) but irq_set_affinity()
calls irqd_set_move_pending() which defers programming the hardware later.
Now when an interrupt triggers we end up calling irq_move_masked_irq() with
unlocked descriptor.
Without the above change we never do that and the crash does not happen.
Since I don't know much about genirq and IO-APIC code in particular, I
would like to get your input on how to solve the problem. Do I need to
change the pinctrl driver somehow to lock the descriptor or maybe the
commit above misses something?
It is really easy to trigger so please let me know if further debugging is
needed.
Thanks in advance.
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
2015-09-08 10:40 Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces Mika Westerberg
@ 2015-09-08 12:43 ` Jiang Liu
2015-09-08 14:02 ` Thomas Gleixner
1 sibling, 0 replies; 6+ messages in thread
From: Jiang Liu @ 2015-09-08 12:43 UTC (permalink / raw)
To: Mika Westerberg, Thomas Gleixner; +Cc: jarkko.nikula, linux-kernel
On 2015/9/8 18:40, Mika Westerberg wrote:
> Hi Jiang and Thomas,
>
> With recent kernels I'm getting crash when a GPIO interrupt triggers. For
> unknown reasons I'm not able to capture the crash on the serial console so I
> did following change to irq_move_masked_irq():
>
> assert_raw_spin_locked(&desc->lock);
>
> to
>
> WARN_ON(!raw_spin_is_locked(&desc->lock));
>
> The backtrace looks like:
>
> [ 6.733640] WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 irq_move_masked_irq+0xb8/0xc0()
> [ 6.768765] Modules linked in: x86_pkg_temp_thermal i2c_hid(+)
> [ 6.775425] CPU: 0 PID: 5 Comm: kworker/0:0H Not tainted 4.2.0+ #124
> [ 6.782639] ffffffff81747e59 ffff8801c3c03e18 ffffffff815d6165 0000000000000000
> [ 6.791110] ffff8801c3c03e50 ffffffff81050c9d ffff8801bf00a600 ffff8801beb594c0
> [ 6.799540] ffff8801beb594c0 0000000000000002 0000000042000000 ffff8801c3c03e60
> [ 6.807989] Call Trace:
> [ 6.810755] <IRQ> [<ffffffff815d6165>] dump_stack+0x4e/0x79
> [ 6.817307] [<ffffffff81050c9d>] warn_slowpath_common+0x7d/0xb0
> [ 6.824121] [<ffffffff81050d85>] warn_slowpath_null+0x15/0x20
> [ 6.830736] [<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0
> [ 6.837450] [<ffffffff8103c161>] ioapic_ack_level+0x111/0x130
> [ 6.844079] [<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0
> [ 6.851306] [<ffffffff81006bfb>] handle_irq+0xab/0x180
> [ 6.857235] [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20
> [ 6.864440] [<ffffffff810063c2>] do_IRQ+0x52/0xe0
> [ 6.869877] [<ffffffff815dcfbf>] common_interrupt+0x7f/0x7f
> [ 6.876292] <EOI> [<ffffffff815dbf6d>] ? _raw_write_unlock_irq+0xd/0x30
> [ 6.884008] [<ffffffff815dbf99>] _raw_spin_unlock_irq+0x9/0x10
> [ 6.890721] [<ffffffff810745bc>] finish_task_switch+0x7c/0x1a0
> [ 6.897438] [<ffffffff815d7b9b>] __schedule+0x33b/0xb20
> [ 6.903461] [<ffffffff8107cc77>] ? __enqueue_entity+0x67/0x70
> [ 6.910079] [<ffffffff815d8408>] schedule+0x38/0x90
> [ 6.915711] [<ffffffff815db3a8>] schedule_timeout+0x158/0x250
> [ 6.922328] [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20
> [ 6.929534] [<ffffffff815d933f>] wait_for_completion_killable+0xaf/0x220
> [ 6.937231] [<ffffffff81076d80>] ? wake_up_q+0x60/0x60
> [ 6.943158] [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0
> [ 6.949970] [<ffffffff8106d5c2>] kthread_create_on_node+0xd2/0x170
> [ 6.957079] [<ffffffff8129ac39>] ? snprintf+0x39/0x40
> [ 6.962907] [<ffffffff810665b9>] create_worker+0xb9/0x190
> [ 6.969130] [<ffffffff810685d3>] worker_thread+0x303/0x4b0
> [ 6.975452] [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0
> [ 6.982262] [<ffffffff8106d8bf>] kthread+0xcf/0xf0
> [ 6.987797] [<ffffffff815dbf99>] ? _raw_spin_unlock_irq+0x9/0x10
> [ 6.994722] [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190
> [ 7.001636] [<ffffffff815dc86f>] ret_from_fork+0x3f/0x70
> [ 7.007765] [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190
>
> The driver in question is drivers/pinctrl/intel/pinctrl-intel.c and the
> corresponding code which triggers this is below:
>
> static void intel_gpio_irq_handler(unsigned irq, struct irq_desc *desc)
> {
> struct gpio_chip *gc = irq_desc_get_handler_data(desc);
> struct intel_pinctrl *pctrl = gpiochip_to_pinctrl(gc);
> struct irq_chip *chip = irq_desc_get_chip(desc);
> int i;
>
> chained_irq_enter(chip, desc);
>
> /* Need to check all communities for pending interrupts */
> for (i = 0; i < pctrl->ncommunities; i++)
> intel_gpio_community_irq_handler(gc, &pctrl->communities[i]);
>
> chained_irq_exit(chip, desc);
> }
>
> So once chained_irq_exit() is called, it in turn calls chip->irq_eoi()
> which in this case is ioapic_ack_level().
>
> I'm sure such crash did not happen when pinctrl-intel.c was developed so I
> started to investigate what might have changed. Note that it may be the
> driver does something wrong and the crash is expected. However, many other
> drivers do pretty much the same (with the exception that the parent IRQ
> chip is not IO-APIC in most cases).
>
> To me assert_raw_spin_locked(&desc->lock) looks valid as the function goes
> and modifies desc right after the check. Also in intel_gpio_irq_handler()
> desc->lock is not locked (not sure if it needs to be).
>
> After "manual" bisection I found the commit that causes the crash to be
> triggered:
>
> commit aa5cb97f14a2dd5aefabed6538c35ebc087d7c24
> Author: Jiang Liu <jiang.liu@linux.intel.com>
> Date: Tue Apr 14 10:29:42 2015 +0800
>
> x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
>
> Now there is no user of x86_io_apic_ops.set_affinity anymore, so remove
> it.
>
> It says that there are no users for x86_io_apic_ops.set_affinity but then
> it does this:
>
> - x86_io_apic_ops.set_affinity(idata, mask, false);
> + irq_set_affinity(irq, mask);
>
> The difference is that x86_io_apic_ops.set_affinity() programs affinity
> directly to the hardware (if I understand it right) but irq_set_affinity()
> calls irqd_set_move_pending() which defers programming the hardware later.
Hi Mika,
I feel this is the root cause and will investigate it tomorrow.
Thanks!
Gerry
>
> Now when an interrupt triggers we end up calling irq_move_masked_irq() with
> unlocked descriptor.
>
> Without the above change we never do that and the crash does not happen.
>
> Since I don't know much about genirq and IO-APIC code in particular, I
> would like to get your input on how to solve the problem. Do I need to
> change the pinctrl driver somehow to lock the descriptor or maybe the
> commit above misses something?
>
> It is really easy to trigger so please let me know if further debugging is
> needed.
>
> Thanks in advance.
>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
2015-09-08 10:40 Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces Mika Westerberg
2015-09-08 12:43 ` Jiang Liu
@ 2015-09-08 14:02 ` Thomas Gleixner
2015-09-08 14:21 ` Mika Westerberg
1 sibling, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2015-09-08 14:02 UTC (permalink / raw)
To: Mika Westerberg; +Cc: Jiang Liu, jarkko.nikula, linux-kernel
On Tue, 8 Sep 2015, Mika Westerberg wrote:
> It says that there are no users for x86_io_apic_ops.set_affinity but then
> it does this:
>
> - x86_io_apic_ops.set_affinity(idata, mask, false);
> + irq_set_affinity(irq, mask);
>
> The difference is that x86_io_apic_ops.set_affinity() programs affinity
> directly to the hardware (if I understand it right) but irq_set_affinity()
> calls irqd_set_move_pending() which defers programming the hardware later.
>
> Now when an interrupt triggers we end up calling irq_move_masked_irq() with
> unlocked descriptor.
>
> Without the above change we never do that and the crash does not happen.
Right. Patch below should fix that issue.
Thanks,
tglx
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 38a76f826530..d2ea50c5e936 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2522,6 +2522,7 @@ void __init setup_ioapic_dest(void)
int pin, ioapic, irq, irq_entry;
const struct cpumask *mask;
struct irq_data *idata;
+ struct irq_chip *chip;
if (skip_ioapic_setup == 1)
return;
@@ -2545,9 +2546,9 @@ void __init setup_ioapic_dest(void)
else
mask = apic->target_cpus();
- irq_set_affinity(irq, mask);
+ chip = irq_data_get_chip(idata);
+ chip->irq_set_affinity(idata, mask, false);
}
-
}
#endif
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
2015-09-08 14:02 ` Thomas Gleixner
@ 2015-09-08 14:21 ` Mika Westerberg
2015-09-08 14:25 ` Thomas Gleixner
0 siblings, 1 reply; 6+ messages in thread
From: Mika Westerberg @ 2015-09-08 14:21 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Jiang Liu, jarkko.nikula, linux-kernel
On Tue, Sep 08, 2015 at 04:02:29PM +0200, Thomas Gleixner wrote:
> On Tue, 8 Sep 2015, Mika Westerberg wrote:
> > It says that there are no users for x86_io_apic_ops.set_affinity but then
> > it does this:
> >
> > - x86_io_apic_ops.set_affinity(idata, mask, false);
> > + irq_set_affinity(irq, mask);
> >
> > The difference is that x86_io_apic_ops.set_affinity() programs affinity
> > directly to the hardware (if I understand it right) but irq_set_affinity()
> > calls irqd_set_move_pending() which defers programming the hardware later.
> >
> > Now when an interrupt triggers we end up calling irq_move_masked_irq() with
> > unlocked descriptor.
> >
> > Without the above change we never do that and the crash does not happen.
>
> Right. Patch below should fix that issue.
>
> Thanks,
>
> tglx
>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index 38a76f826530..d2ea50c5e936 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2522,6 +2522,7 @@ void __init setup_ioapic_dest(void)
> int pin, ioapic, irq, irq_entry;
> const struct cpumask *mask;
> struct irq_data *idata;
> + struct irq_chip *chip;
>
> if (skip_ioapic_setup == 1)
> return;
> @@ -2545,9 +2546,9 @@ void __init setup_ioapic_dest(void)
> else
> mask = apic->target_cpus();
>
> - irq_set_affinity(irq, mask);
> + chip = irq_data_get_chip(idata);
There seems to be no irq_data_get_chip() helper in current mainline. So
I replaced this with:
chip = idata->chip;
> + chip->irq_set_affinity(idata, mask, false);
> }
> -
> }
> #endif
With the above change the patch fixes the problem, thanks!
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
2015-09-08 14:21 ` Mika Westerberg
@ 2015-09-08 14:25 ` Thomas Gleixner
2015-09-08 14:36 ` Mika Westerberg
0 siblings, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2015-09-08 14:25 UTC (permalink / raw)
To: Mika Westerberg; +Cc: Jiang Liu, jarkko.nikula, linux-kernel
On Tue, 8 Sep 2015, Mika Westerberg wrote:
> > + chip = irq_data_get_chip(idata);
>
> There seems to be no irq_data_get_chip() helper in current mainline. So
get_irq_chip() of course ....
> I replaced this with:
>
> chip = idata->chip;
>
> > + chip->irq_set_affinity(idata, mask, false);
> > }
> > -
> > }
> > #endif
>
> With the above change the patch fixes the problem, thanks!
>
> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
2015-09-08 14:25 ` Thomas Gleixner
@ 2015-09-08 14:36 ` Mika Westerberg
0 siblings, 0 replies; 6+ messages in thread
From: Mika Westerberg @ 2015-09-08 14:36 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Jiang Liu, jarkko.nikula, linux-kernel
On Tue, Sep 08, 2015 at 04:25:44PM +0200, Thomas Gleixner wrote:
> On Tue, 8 Sep 2015, Mika Westerberg wrote:
> > > + chip = irq_data_get_chip(idata);
> >
> > There seems to be no irq_data_get_chip() helper in current mainline. So
>
> get_irq_chip() of course ....
Indeed with irq_data_get_irq_chip() it compiles and works fine.
Thanks.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-09-08 14:36 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-08 10:40 Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces Mika Westerberg
2015-09-08 12:43 ` Jiang Liu
2015-09-08 14:02 ` Thomas Gleixner
2015-09-08 14:21 ` Mika Westerberg
2015-09-08 14:25 ` Thomas Gleixner
2015-09-08 14:36 ` Mika Westerberg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox