From: Jiang Liu <jiang.liu@linux.intel.com>
To: Mika Westerberg <mika.westerberg@linux.intel.com>,
Thomas Gleixner <tglx@linutronix.de>
Cc: jarkko.nikula@linux.intel.com, linux-kernel@vger.kernel.org
Subject: Re: Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
Date: Tue, 8 Sep 2015 20:43:49 +0800 [thread overview]
Message-ID: <55EED805.7060902@linux.intel.com> (raw)
In-Reply-To: <20150908104048.GB12118@lahna.fi.intel.com>
On 2015/9/8 18:40, Mika Westerberg wrote:
> Hi Jiang and Thomas,
>
> With recent kernels I'm getting crash when a GPIO interrupt triggers. For
> unknown reasons I'm not able to capture the crash on the serial console so I
> did following change to irq_move_masked_irq():
>
> assert_raw_spin_locked(&desc->lock);
>
> to
>
> WARN_ON(!raw_spin_is_locked(&desc->lock));
>
> The backtrace looks like:
>
> [ 6.733640] WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 irq_move_masked_irq+0xb8/0xc0()
> [ 6.768765] Modules linked in: x86_pkg_temp_thermal i2c_hid(+)
> [ 6.775425] CPU: 0 PID: 5 Comm: kworker/0:0H Not tainted 4.2.0+ #124
> [ 6.782639] ffffffff81747e59 ffff8801c3c03e18 ffffffff815d6165 0000000000000000
> [ 6.791110] ffff8801c3c03e50 ffffffff81050c9d ffff8801bf00a600 ffff8801beb594c0
> [ 6.799540] ffff8801beb594c0 0000000000000002 0000000042000000 ffff8801c3c03e60
> [ 6.807989] Call Trace:
> [ 6.810755] <IRQ> [<ffffffff815d6165>] dump_stack+0x4e/0x79
> [ 6.817307] [<ffffffff81050c9d>] warn_slowpath_common+0x7d/0xb0
> [ 6.824121] [<ffffffff81050d85>] warn_slowpath_null+0x15/0x20
> [ 6.830736] [<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0
> [ 6.837450] [<ffffffff8103c161>] ioapic_ack_level+0x111/0x130
> [ 6.844079] [<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0
> [ 6.851306] [<ffffffff81006bfb>] handle_irq+0xab/0x180
> [ 6.857235] [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20
> [ 6.864440] [<ffffffff810063c2>] do_IRQ+0x52/0xe0
> [ 6.869877] [<ffffffff815dcfbf>] common_interrupt+0x7f/0x7f
> [ 6.876292] <EOI> [<ffffffff815dbf6d>] ? _raw_write_unlock_irq+0xd/0x30
> [ 6.884008] [<ffffffff815dbf99>] _raw_spin_unlock_irq+0x9/0x10
> [ 6.890721] [<ffffffff810745bc>] finish_task_switch+0x7c/0x1a0
> [ 6.897438] [<ffffffff815d7b9b>] __schedule+0x33b/0xb20
> [ 6.903461] [<ffffffff8107cc77>] ? __enqueue_entity+0x67/0x70
> [ 6.910079] [<ffffffff815d8408>] schedule+0x38/0x90
> [ 6.915711] [<ffffffff815db3a8>] schedule_timeout+0x158/0x250
> [ 6.922328] [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20
> [ 6.929534] [<ffffffff815d933f>] wait_for_completion_killable+0xaf/0x220
> [ 6.937231] [<ffffffff81076d80>] ? wake_up_q+0x60/0x60
> [ 6.943158] [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0
> [ 6.949970] [<ffffffff8106d5c2>] kthread_create_on_node+0xd2/0x170
> [ 6.957079] [<ffffffff8129ac39>] ? snprintf+0x39/0x40
> [ 6.962907] [<ffffffff810665b9>] create_worker+0xb9/0x190
> [ 6.969130] [<ffffffff810685d3>] worker_thread+0x303/0x4b0
> [ 6.975452] [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0
> [ 6.982262] [<ffffffff8106d8bf>] kthread+0xcf/0xf0
> [ 6.987797] [<ffffffff815dbf99>] ? _raw_spin_unlock_irq+0x9/0x10
> [ 6.994722] [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190
> [ 7.001636] [<ffffffff815dc86f>] ret_from_fork+0x3f/0x70
> [ 7.007765] [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190
>
> The driver in question is drivers/pinctrl/intel/pinctrl-intel.c and the
> corresponding code which triggers this is below:
>
> static void intel_gpio_irq_handler(unsigned irq, struct irq_desc *desc)
> {
> struct gpio_chip *gc = irq_desc_get_handler_data(desc);
> struct intel_pinctrl *pctrl = gpiochip_to_pinctrl(gc);
> struct irq_chip *chip = irq_desc_get_chip(desc);
> int i;
>
> chained_irq_enter(chip, desc);
>
> /* Need to check all communities for pending interrupts */
> for (i = 0; i < pctrl->ncommunities; i++)
> intel_gpio_community_irq_handler(gc, &pctrl->communities[i]);
>
> chained_irq_exit(chip, desc);
> }
>
> So once chained_irq_exit() is called, it in turn calls chip->irq_eoi()
> which in this case is ioapic_ack_level().
>
> I'm sure such crash did not happen when pinctrl-intel.c was developed so I
> started to investigate what might have changed. Note that it may be the
> driver does something wrong and the crash is expected. However, many other
> drivers do pretty much the same (with the exception that the parent IRQ
> chip is not IO-APIC in most cases).
>
> To me assert_raw_spin_locked(&desc->lock) looks valid as the function goes
> and modifies desc right after the check. Also in intel_gpio_irq_handler()
> desc->lock is not locked (not sure if it needs to be).
>
> After "manual" bisection I found the commit that causes the crash to be
> triggered:
>
> commit aa5cb97f14a2dd5aefabed6538c35ebc087d7c24
> Author: Jiang Liu <jiang.liu@linux.intel.com>
> Date: Tue Apr 14 10:29:42 2015 +0800
>
> x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
>
> Now there is no user of x86_io_apic_ops.set_affinity anymore, so remove
> it.
>
> It says that there are no users for x86_io_apic_ops.set_affinity but then
> it does this:
>
> - x86_io_apic_ops.set_affinity(idata, mask, false);
> + irq_set_affinity(irq, mask);
>
> The difference is that x86_io_apic_ops.set_affinity() programs affinity
> directly to the hardware (if I understand it right) but irq_set_affinity()
> calls irqd_set_move_pending() which defers programming the hardware later.
Hi Mika,
I feel this is the root cause and will investigate it tomorrow.
Thanks!
Gerry
>
> Now when an interrupt triggers we end up calling irq_move_masked_irq() with
> unlocked descriptor.
>
> Without the above change we never do that and the crash does not happen.
>
> Since I don't know much about genirq and IO-APIC code in particular, I
> would like to get your input on how to solve the problem. Do I need to
> change the pinctrl driver somehow to lock the descriptor or maybe the
> commit above misses something?
>
> It is really easy to trigger so please let me know if further debugging is
> needed.
>
> Thanks in advance.
>
next prev parent reply other threads:[~2015-09-08 12:44 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-08 10:40 Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces Mika Westerberg
2015-09-08 12:43 ` Jiang Liu [this message]
2015-09-08 14:02 ` Thomas Gleixner
2015-09-08 14:21 ` Mika Westerberg
2015-09-08 14:25 ` Thomas Gleixner
2015-09-08 14:36 ` Mika Westerberg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55EED805.7060902@linux.intel.com \
--to=jiang.liu@linux.intel.com \
--cc=jarkko.nikula@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mika.westerberg@linux.intel.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.