From: Brandon Philips <bphilips@suse.de>
To: Yinghai Lu <Yinghai.Lu@Sun.COM>
Cc: Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
    Suresh Siddha <suresh.b.siddha@intel.com>,
    linux-kernel@vger.kernel.org
Subject: Re: x86: fix race in create_irq_nr on irq_desc
Date: Wed, 3 Feb 2010 19:17:34 -0800
Message-ID: <20100204031734.GA4930@jenkins.home.ifup.org>
In-Reply-To: <4B69CF1C.7050603@sun.com>

On 11:31 Wed 03 Feb 2010, Yinghai Lu wrote:
> On 02/03/2010 09:42 AM, Brandon Philips wrote:
> > On 02:20 Wed 03 Feb 2010, Yinghai Lu wrote:
> >> On 02/02/2010 07:31 PM, Brandon Philips wrote:
> >>> Race in create_irq_nr():
> >>>
> >>> - Thread 1 loops through and calls irq_to_desc_alloc_node with new=0x66.
> >>>
> >>> - Thread 2 has exited the loop with irq=0x66 and calls dynamic_irq_init(0x66)
> >>> setting desc->chip_data = NULL
> >>>
> >>> - Thread 1 then dereferences NULL via desc_new->chip_data->vector
> >>
> >> two threads get same irq?
> >
> > This race happened when two drivers were setting up MSI-X at the same
> > time via pci_enable_msix(). See this dmesg excerpt:
> >
> > [ 85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
> > [ 85.170611] alloc irq_desc for 99 on node -1
> > [ 85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
> > [ 85.170614] alloc kstat_irqs on node -1
> > [ 85.170616] alloc irq_2_iommu on node -1
> > [ 85.170617] alloc irq_desc for 100 on node -1
> > [ 85.170619] alloc kstat_irqs on node -1
> > [ 85.170621] alloc irq_2_iommu on node -1
> > [ 85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
> > [ 85.170626] alloc irq_desc for 101 on node -1
> > [ 85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
> > [ 85.170630] alloc kstat_irqs on node -1
> > [ 85.170631] alloc irq_2_iommu on node -1
> > [ 85.170635] alloc irq_desc for 102 on node -1
> > [ 85.170636] alloc kstat_irqs on node -1
> > [ 85.170639] alloc irq_2_iommu on node -1
> > [ 85.170646] BUG: unable to handle kernel NULL pointer dereference
> > at 0000000000000088
> >
> > As you can see igb and ixgbe are both alternating on create_irq_nr()
> > via pci_enable_msix() in their probe function. So, let me rewrite my
> > explanation using this example:
> >
> > ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe
> > chooses irq_desc_ptrs[102] and exits the loop, drops vector_lock and
> > calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data =
> > NULL via dynamic_irq_init().
> >
> > igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
> > via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:
> >
> > 	cfg_new = irq_desc_ptrs[102]->chip_data;
> > 	if (cfg_new->vector != 0)
> > 		continue;
> >
> > This hits the NULL deref.
> >
>
> please try following patch in addition to
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=37ef2a3029fde884808ff1b369677abc7dd9a79a

How is this commit related to this bug? The NULL deref I am hitting is
from this bit in create_irq_nr():

	if (cfg_new->vector != 0)
		continue;

That check comes before the assignment of cfg_new that follows
move_irq_desc(), so I don't see how the commit is related. Plus,
node == -1 in this case, so move_irq_desc() is a no-op.

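
A minimal userspace sketch of that interleaving follows. It is only a
model under stated assumptions, not kernel code: struct irq_cfg and
struct irq_desc are cut down to the one field each that matters here,
and model_reset(), model_dynamic_irq_init() and model_scan() are
hypothetical stand-ins for the real functions. Locking is left out on
purpose; the two "threads" are simply called in the problematic order.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption). */
struct irq_cfg  { int vector; };
struct irq_desc { struct irq_cfg *chip_data; };

#define NR_IRQS 4
static struct irq_cfg  cfgs[NR_IRQS];
static struct irq_desc descs[NR_IRQS];

/* Mark every irq as free and give each descriptor a valid chip_data. */
static void model_reset(void)
{
	for (int i = 0; i < NR_IRQS; i++) {
		cfgs[i].vector = 0;
		descs[i].chip_data = &cfgs[i];
	}
}

/* Stand-in for dynamic_irq_init(): it clears chip_data. */
static void model_dynamic_irq_init(int irq)
{
	descs[irq].chip_data = NULL;
}

/*
 * Stand-in for the scan loop in create_irq_nr(). Returns the first free
 * irq at or after start, or -1 if there is none. Sets *hit_null when it
 * meets a descriptor whose chip_data is NULL, which is the point where
 * the real loop would dereference NULL.
 */
static int model_scan(int start, int *hit_null)
{
	*hit_null = 0;
	for (int irq = start; irq < NR_IRQS; irq++) {
		struct irq_cfg *cfg_new = descs[irq].chip_data;

		if (cfg_new == NULL) {
			*hit_null = 1;
			return -1;
		}
		if (cfg_new->vector != 0)
			continue;
		cfg_new->vector = 0x20 + irq;	/* pretend a vector was assigned */
		return irq;
	}
	return -1;
}
```

Calling model_scan() for the first driver, then
model_dynamic_irq_init() on the irq it returned, then model_scan()
again for the second driver reports hit_null on the second scan. That
is the same ordering as ixgbe and igb in the dmesg above.
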
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index 7edafc7..14099ba 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -3280,12 +3280,9 @@ unsigned int create_irq_nr(unsigned int irq_want, int node)
>  	}
>  	spin_unlock_irqrestore(&vector_lock, flags);
>
> -	if (irq > 0) {
> -		dynamic_irq_init(irq);
> -		/* restore it, in case dynamic_irq_init clear it */
> -		if (desc_new)
> -			desc_new->chip_data = cfg_new;
> -	}
> +	if (irq > 0)
> +		dynamic_irq_init_keep_chip_data(irq);
> +
>  	return irq;
>  }

That would solve it too, but I don't think it is a great solution.
Keeping the vector_lock until we are completely done setting up the
irq is more straightforward and won't cost much time at all.

I am hesitant to have it tested, since the race window is really small
(reproducing took 40+ reboots initially) and it looks technically
correct.

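
The "hold the lock until done" idea can be sketched in userspace as
below. This is a model under assumptions, not the proposed kernel
patch: a pthread mutex stands in for vector_lock, the structures are
reduced to one field each, and model_create_irq_locked(), worker() and
model_run_two_threads() are hypothetical names. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption). */
struct irq_cfg  { int vector; };
struct irq_desc { struct irq_cfg *chip_data; };

#define NR_IRQS 64
static struct irq_cfg  cfgs[NR_IRQS];
static struct irq_desc descs[NR_IRQS];
static pthread_mutex_t vector_lock = PTHREAD_MUTEX_INITIALIZER;
static int null_seen;	/* protected by vector_lock */

static void model_reset(void)
{
	for (int i = 0; i < NR_IRQS; i++) {
		cfgs[i].vector = 0;
		descs[i].chip_data = &cfgs[i];
	}
	null_seen = 0;
}

/* Scan and init both happen with the lock held. */
static int model_create_irq_locked(void)
{
	int irq = -1;

	pthread_mutex_lock(&vector_lock);
	for (int new = 0; new < NR_IRQS; new++) {
		struct irq_cfg *cfg_new = descs[new].chip_data;

		if (cfg_new == NULL) {	/* the unlocked version can hit this */
			null_seen = 1;
			break;
		}
		if (cfg_new->vector != 0)
			continue;
		cfg_new->vector = 0x20 + new;
		irq = new;
		break;
	}
	if (irq >= 0) {
		/* Model of init clearing chip_data and the caller restoring it. */
		descs[irq].chip_data = NULL;
		descs[irq].chip_data = &cfgs[irq];
	}
	pthread_mutex_unlock(&vector_lock);
	return irq;
}

/* Each worker allocates half of the irqs into its own result array. */
static void *worker(void *arg)
{
	int *out = arg;

	for (int i = 0; i < NR_IRQS / 2; i++)
		out[i] = model_create_irq_locked();
	return NULL;
}

/* Returns 1 if both threads got distinct, valid irqs and no NULL was seen. */
static int model_run_two_threads(void)
{
	int a[NR_IRQS / 2], b[NR_IRQS / 2];
	char used[NR_IRQS] = { 0 };
	pthread_t ta, tb;

	model_reset();
	if (pthread_create(&ta, NULL, worker, a) != 0)
		return 0;
	if (pthread_create(&tb, NULL, worker, b) != 0) {
		pthread_join(ta, NULL);
		return 0;
	}
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);

	for (int i = 0; i < NR_IRQS / 2; i++) {
		if (a[i] < 0 || a[i] >= NR_IRQS || used[a[i]]++)
			return 0;
		if (b[i] < 0 || b[i] >= NR_IRQS || used[b[i]]++)
			return 0;
	}
	return !null_seen;
}
```

Because chip_data is only ever NULL inside the critical section, a
second caller that has to take the same lock before scanning cannot
observe it. The cost is a slightly longer hold time per allocation.
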
Thanks,
Brandon