Re: Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Jiang Liu <jiang.liu@linux.intel.com>
To: Mika Westerberg <mika.westerberg@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: jarkko.nikula@linux.intel.com, linux-kernel@vger.kernel.org
Subject: Re: Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
Date: Tue, 8 Sep 2015 20:43:49 +0800	[thread overview]
Message-ID: <55EED805.7060902@linux.intel.com> (raw)
In-Reply-To: <20150908104048.GB12118@lahna.fi.intel.com>

On 2015/9/8 18:40, Mika Westerberg wrote:
> Hi Jiang and Thomas,
> 
> With recent kernels I'm getting crash when a GPIO interrupt triggers. For
> unknown reasons I'm not able to capture the crash on the serial console so I
> did following change to irq_move_masked_irq():
> 
> 	assert_raw_spin_locked(&desc->lock);
> 
> to
> 
> 	WARN_ON(!raw_spin_is_locked(&desc->lock));
> 
> The backtrace looks like:
> 
> [    6.733640] WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 irq_move_masked_irq+0xb8/0xc0() 
> [    6.768765] Modules linked in: x86_pkg_temp_thermal i2c_hid(+)
> [    6.775425] CPU: 0 PID: 5 Comm: kworker/0:0H Not tainted 4.2.0+ #124
> [    6.782639]  ffffffff81747e59 ffff8801c3c03e18 ffffffff815d6165 0000000000000000
> [    6.791110]  ffff8801c3c03e50 ffffffff81050c9d ffff8801bf00a600 ffff8801beb594c0
> [    6.799540]  ffff8801beb594c0 0000000000000002 0000000042000000 ffff8801c3c03e60
> [    6.807989] Call Trace:
> [    6.810755]  <IRQ>  [<ffffffff815d6165>] dump_stack+0x4e/0x79
> [    6.817307]  [<ffffffff81050c9d>] warn_slowpath_common+0x7d/0xb0
> [    6.824121]  [<ffffffff81050d85>] warn_slowpath_null+0x15/0x20
> [    6.830736]  [<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0
> [    6.837450]  [<ffffffff8103c161>] ioapic_ack_level+0x111/0x130
> [    6.844079]  [<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0
> [    6.851306]  [<ffffffff81006bfb>] handle_irq+0xab/0x180
> [    6.857235]  [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20
> [    6.864440]  [<ffffffff810063c2>] do_IRQ+0x52/0xe0
> [    6.869877]  [<ffffffff815dcfbf>] common_interrupt+0x7f/0x7f
> [    6.876292]  <EOI>  [<ffffffff815dbf6d>] ? _raw_write_unlock_irq+0xd/0x30
> [    6.884008]  [<ffffffff815dbf99>] _raw_spin_unlock_irq+0x9/0x10
> [    6.890721]  [<ffffffff810745bc>] finish_task_switch+0x7c/0x1a0
> [    6.897438]  [<ffffffff815d7b9b>] __schedule+0x33b/0xb20
> [    6.903461]  [<ffffffff8107cc77>] ? __enqueue_entity+0x67/0x70
> [    6.910079]  [<ffffffff815d8408>] schedule+0x38/0x90
> [    6.915711]  [<ffffffff815db3a8>] schedule_timeout+0x158/0x250
> [    6.922328]  [<ffffffff812a8a87>] ? debug_smp_processor_id+0x17/0x20
> [    6.929534]  [<ffffffff815d933f>] wait_for_completion_killable+0xaf/0x220
> [    6.937231]  [<ffffffff81076d80>] ? wake_up_q+0x60/0x60
> [    6.943158]  [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0
> [    6.949970]  [<ffffffff8106d5c2>] kthread_create_on_node+0xd2/0x170
> [    6.957079]  [<ffffffff8129ac39>] ? snprintf+0x39/0x40
> [    6.962907]  [<ffffffff810665b9>] create_worker+0xb9/0x190
> [    6.969130]  [<ffffffff810685d3>] worker_thread+0x303/0x4b0
> [    6.975452]  [<ffffffff810682d0>] ? process_one_work+0x4a0/0x4a0
> [    6.982262]  [<ffffffff8106d8bf>] kthread+0xcf/0xf0
> [    6.987797]  [<ffffffff815dbf99>] ? _raw_spin_unlock_irq+0x9/0x10
> [    6.994722]  [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190
> [    7.001636]  [<ffffffff815dc86f>] ret_from_fork+0x3f/0x70
> [    7.007765]  [<ffffffff8106d7f0>] ? kthread_worker_fn+0x190/0x190
> 
> The driver in question is drivers/pinctrl/intel/pinctrl-intel.c and the
> corresponding code which triggers this is below:
> 
> static void intel_gpio_irq_handler(unsigned irq, struct irq_desc *desc)
> {
>         struct gpio_chip *gc = irq_desc_get_handler_data(desc);
>         struct intel_pinctrl *pctrl = gpiochip_to_pinctrl(gc);
>         struct irq_chip *chip = irq_desc_get_chip(desc);
>         int i;
> 
>         chained_irq_enter(chip, desc);
> 
>         /* Need to check all communities for pending interrupts */
>         for (i = 0; i < pctrl->ncommunities; i++)
>                 intel_gpio_community_irq_handler(gc, &pctrl->communities[i]);
> 
>         chained_irq_exit(chip, desc);
> }
> 
> So once chained_irq_exit() is called, it in turn calls chip->irq_eoi()
> which in this case is ioapic_ack_level().
> 
> I'm sure such crash did not happen when pinctrl-intel.c was developed so I
> started to investigate what might have changed. Note that it may be the
> driver does something wrong and the crash is expected. However, many other
> drivers do pretty much the same (with the exception that the parent IRQ
> chip is not IO-APIC in most cases).
> 
> To me assert_raw_spin_locked(&desc->lock) looks valid as the function goes
> and modifies desc right after the check. Also in intel_gpio_irq_handler()
> desc->lock is not locked (not sure if it needs to be).
> 
> After "manual" bisection I found the commit that causes the crash to be
> triggered:
> 
> 	commit aa5cb97f14a2dd5aefabed6538c35ebc087d7c24
> 	Author: Jiang Liu <jiang.liu@linux.intel.com>
> 	Date:   Tue Apr 14 10:29:42 2015 +0800
> 
> 	    x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
> 	    
> 	    Now there is no user of x86_io_apic_ops.set_affinity anymore, so remove
> 	    it.
> 
> It says that there are no users for x86_io_apic_ops.set_affinity but then
> it does this:
> 
> -               x86_io_apic_ops.set_affinity(idata, mask, false);
> +               irq_set_affinity(irq, mask);
> 
> The difference is that x86_io_apic_ops.set_affinity() programs affinity
> directly to the hardware (if I understand it right) but irq_set_affinity()
> calls irqd_set_move_pending() which defers programming the hardware later.

Hi Mika,
	I feel this is the root cause and will investigate it tomorrow.
Thanks!
Gerry

> 
> Now when an interrupt triggers we end up calling irq_move_masked_irq() with
> unlocked descriptor.
> 
> Without the above change we never do that and the crash does not happen.
> 
> Since I don't know much about genirq and IO-APIC code in particular, I
> would like to get your input on how to solve the problem. Do I need to
> change the pinctrl driver somehow to lock the descriptor or maybe the
> commit above misses something?
> 
> It is really easy to trigger so please let me know if further debugging is
> needed.
> 
> Thanks in advance.
>

next prev parent reply	other threads:[~2015-09-08 12:44 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-08 10:40 Question related to patch: x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces Mika Westerberg
2015-09-08 12:43 ` Jiang Liu [this message]
2015-09-08 14:02 ` Thomas Gleixner
2015-09-08 14:21   ` Mika Westerberg
2015-09-08 14:25     ` Thomas Gleixner
2015-09-08 14:36       ` Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55EED805.7060902@linux.intel.com \
    --to=jiang.liu@linux.intel.com \
    --cc=jarkko.nikula@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mika.westerberg@linux.intel.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox