From: tip-bot for Brandon Philips <bphilips@suse.de>
To: linux-tip-commits@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@redhat.com,
yinghai@kernel.org, bphilips@suse.de, tglx@linutronix.de
Subject: [tip:x86/irq] x86: Avoid race condition in pci_enable_msix()
Date: Wed, 10 Feb 2010 22:39:44 GMT
Message-ID: <tip-ced5b697a76d325e7a7ac7d382dbbb632c765093@git.kernel.org>
In-Reply-To: <1265793639-15071-3-git-send-email-yinghai@kernel.org>
Commit-ID: ced5b697a76d325e7a7ac7d382dbbb632c765093
Gitweb: http://git.kernel.org/tip/ced5b697a76d325e7a7ac7d382dbbb632c765093
Author: Brandon Philips <bphilips@suse.de>
AuthorDate: Wed, 10 Feb 2010 01:20:06 -0800
Committer: H. Peter Anvin <hpa@zytor.com>
CommitDate: Wed, 10 Feb 2010 14:27:28 -0800
x86: Avoid race condition in pci_enable_msix()
Keep chip_data in create_irq_nr and destroy_irq.
When two drivers are setting up MSI-X at the same time via
pci_enable_msix() there is a race. See this dmesg excerpt:
[ 85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
[ 85.170611] alloc irq_desc for 99 on node -1
[ 85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
[ 85.170614] alloc kstat_irqs on node -1
[ 85.170616] alloc irq_2_iommu on node -1
[ 85.170617] alloc irq_desc for 100 on node -1
[ 85.170619] alloc kstat_irqs on node -1
[ 85.170621] alloc irq_2_iommu on node -1
[ 85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
[ 85.170626] alloc irq_desc for 101 on node -1
[ 85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
[ 85.170630] alloc kstat_irqs on node -1
[ 85.170631] alloc irq_2_iommu on node -1
[ 85.170635] alloc irq_desc for 102 on node -1
[ 85.170636] alloc kstat_irqs on node -1
[ 85.170639] alloc irq_2_iommu on node -1
[ 85.170646] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
As you can see, igb and ixgbe alternate calls to create_irq_nr()
via pci_enable_msix() in their probe functions.
ixgbe: While looping through irq_desc_ptrs[] in create_irq_nr(), ixgbe
chooses irq_desc_ptrs[102], exits the loop, drops vector_lock and
calls dynamic_irq_init(), which sets irq_desc_ptrs[102]->chip_data =
NULL.
igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:
	cfg_new = irq_desc_ptrs[102]->chip_data;
	if (cfg_new->vector != 0)
		continue;
This hits the NULL deref.
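
The interleaving above can be replayed in a small userspace model. This
is only a sketch, not kernel code: the model_* names, the four-entry
array and the single-threaded replay are all assumptions made for
illustration, and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_IRQS 4

/* Simplified stand-ins for irq_cfg and irq_desc (hypothetical model). */
struct model_cfg  { int vector; };
struct model_desc { struct model_cfg *chip_data; };

static struct model_cfg  model_cfgs[MODEL_NR_IRQS];
static struct model_desc model_descs[MODEL_NR_IRQS];

/* Attach a cfg to every descriptor, as the allocation path would. */
static void model_setup(void)
{
    for (size_t i = 0; i < MODEL_NR_IRQS; i++) {
        model_cfgs[i].vector = 0;
        model_descs[i].chip_data = &model_cfgs[i];
    }
}

/* Old behaviour: initialisation unconditionally clears chip_data. */
static void model_dynamic_irq_init(size_t irq)
{
    assert(irq < MODEL_NR_IRQS);
    model_descs[irq].chip_data = NULL;
}

/* What the second driver's scan would read for this slot. */
static bool model_scan_sees_null(size_t irq)
{
    assert(irq < MODEL_NR_IRQS);
    return model_descs[irq].chip_data == NULL;
}

/*
 * Replays the interleaving: driver A clears chip_data, driver B scans
 * the same slot before A restores it. Returns true if B would have
 * dereferenced a NULL chip_data.
 */
static bool model_replay_race(size_t irq)
{
    struct model_cfg *saved;
    bool b_saw_null;

    model_setup();
    saved = model_descs[irq].chip_data;      /* A: slot chosen under the lock  */
    model_dynamic_irq_init(irq);             /* A: lock dropped, data cleared  */
    b_saw_null = model_scan_sees_null(irq);  /* B: scans inside the window     */
    model_descs[irq].chip_data = saved;      /* A: restores, too late for B    */
    return b_saw_null;
}
```

In this model, model_replay_race() reports that the scanning driver
observed a NULL chip_data even though the pointer is restored afterwards,
which is the window the report describes.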
Another possible race exists via pci_disable_msix() in a driver or in
any of the error paths that call free_msi_irqs():
	destroy_irq()
	    dynamic_irq_cleanup() which sets desc->chip_data = NULL
	    ...race window...
	    desc->chip_data = cfg;
Remove the save and restore code for cfg in create_irq_nr() and
destroy_irq() and take the desc->lock when checking the irq_cfg.
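
The "keep chip_data" part of the change can be sketched in userspace as
follows. The sketch_* names and the two-field descriptor are assumptions
for illustration; the real helpers operate on struct irq_desc under
desc->lock, which is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for irq_cfg and irq_desc (hypothetical model). */
struct sketch_cfg  { int vector; };
struct sketch_desc {
    void *handler_data;
    struct sketch_cfg *chip_data;
};

/* Common helper: resets the descriptor, optionally preserving chip_data. */
static void sketch_irq_init_x(struct sketch_desc *desc, bool keep_chip_data)
{
    assert(desc != NULL);
    desc->handler_data = NULL;
    if (!keep_chip_data)
        desc->chip_data = NULL;
}

/* Original behaviour: chip_data is cleared along with everything else. */
static void sketch_irq_init(struct sketch_desc *desc)
{
    sketch_irq_init_x(desc, false);
}

/* New variant: chip_data is never cleared, so no restore step is needed. */
static void sketch_irq_init_keep_chip_data(struct sketch_desc *desc)
{
    sketch_irq_init_x(desc, true);
}

/* Returns true if chip_data is still attached after the chosen variant. */
static bool sketch_chip_data_survives(bool keep_chip_data)
{
    struct sketch_cfg cfg = { .vector = 0 };
    int dummy = 0;
    struct sketch_desc desc = { .handler_data = &dummy, .chip_data = &cfg };

    if (keep_chip_data)
        sketch_irq_init_keep_chip_data(&desc);
    else
        sketch_irq_init(&desc);
    return desc.chip_data == &cfg;
}
```

Because the keep variant never writes NULL to chip_data, there is no
interval in which a concurrent scan could observe a cleared pointer, and
the caller no longer needs to save and restore it.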
Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org>
Signed-off-by: Brandon Philips <bphilips@suse.de>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
arch/x86/kernel/apic/io_apic.c | 18 ++++----------
include/linux/irq.h | 2 +
kernel/irq/chip.c | 52 +++++++++++++++++++++++++++++++++-------
3 files changed, 50 insertions(+), 22 deletions(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 53243ca..c86591b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3228,12 +3228,9 @@ unsigned int create_irq_nr(unsigned int irq_want, int node)
}
spin_unlock_irqrestore(&vector_lock, flags);
- if (irq > 0) {
- dynamic_irq_init(irq);
- /* restore it, in case dynamic_irq_init clear it */
- if (desc_new)
- desc_new->chip_data = cfg_new;
- }
+ if (irq > 0)
+ dynamic_irq_init_keep_chip_data(irq);
+
return irq;
}
@@ -3256,17 +3253,12 @@ void destroy_irq(unsigned int irq)
{
unsigned long flags;
struct irq_cfg *cfg;
- struct irq_desc *desc;
- /* store it, in case dynamic_irq_cleanup clear it */
- desc = irq_to_desc(irq);
- cfg = desc->chip_data;
- dynamic_irq_cleanup(irq);
- /* connect back irq_cfg */
- desc->chip_data = cfg;
+ dynamic_irq_cleanup_keep_chip_data(irq);
free_irte(irq);
spin_lock_irqsave(&vector_lock, flags);
+ cfg = irq_to_desc(irq)->chip_data;
__clear_irq_vector(irq, cfg);
spin_unlock_irqrestore(&vector_lock, flags);
}
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 451481c..4d9b26e 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -400,7 +400,9 @@ static inline int irq_has_action(unsigned int irq)
/* Dynamic irq helper functions */
extern void dynamic_irq_init(unsigned int irq);
+void dynamic_irq_init_keep_chip_data(unsigned int irq);
extern void dynamic_irq_cleanup(unsigned int irq);
+void dynamic_irq_cleanup_keep_chip_data(unsigned int irq);
/* Set/get chip/data for an IRQ: */
extern int set_irq_chip(unsigned int irq, struct irq_chip *chip);
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index ecc3fa2..d70394f 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -18,11 +18,7 @@
#include "internals.h"
-/**
- * dynamic_irq_init - initialize a dynamically allocated irq
- * @irq: irq number to initialize
- */
-void dynamic_irq_init(unsigned int irq)
+static void dynamic_irq_init_x(unsigned int irq, bool keep_chip_data)
{
struct irq_desc *desc;
unsigned long flags;
@@ -41,7 +37,8 @@ void dynamic_irq_init(unsigned int irq)
desc->depth = 1;
desc->msi_desc = NULL;
desc->handler_data = NULL;
- desc->chip_data = NULL;
+ if (!keep_chip_data)
+ desc->chip_data = NULL;
desc->action = NULL;
desc->irq_count = 0;
desc->irqs_unhandled = 0;
@@ -55,10 +52,26 @@ void dynamic_irq_init(unsigned int irq)
}
/**
- * dynamic_irq_cleanup - cleanup a dynamically allocated irq
+ * dynamic_irq_init - initialize a dynamically allocated irq
* @irq: irq number to initialize
*/
-void dynamic_irq_cleanup(unsigned int irq)
+void dynamic_irq_init(unsigned int irq)
+{
+ dynamic_irq_init_x(irq, false);
+}
+
+/**
+ * dynamic_irq_init_keep_chip_data - initialize a dynamically allocated irq
+ * @irq: irq number to initialize
+ *
+ * does not set irq_to_desc(irq)->chip_data to NULL
+ */
+void dynamic_irq_init_keep_chip_data(unsigned int irq)
+{
+ dynamic_irq_init_x(irq, true);
+}
+
+static void dynamic_irq_cleanup_x(unsigned int irq, bool keep_chip_data)
{
struct irq_desc *desc = irq_to_desc(irq);
unsigned long flags;
@@ -77,7 +90,8 @@ void dynamic_irq_cleanup(unsigned int irq)
}
desc->msi_desc = NULL;
desc->handler_data = NULL;
- desc->chip_data = NULL;
+ if (!keep_chip_data)
+ desc->chip_data = NULL;
desc->handle_irq = handle_bad_irq;
desc->chip = &no_irq_chip;
desc->name = NULL;
@@ -85,6 +99,26 @@ void dynamic_irq_cleanup(unsigned int irq)
raw_spin_unlock_irqrestore(&desc->lock, flags);
}
+/**
+ * dynamic_irq_cleanup - cleanup a dynamically allocated irq
+ * @irq: irq number to initialize
+ */
+void dynamic_irq_cleanup(unsigned int irq)
+{
+ dynamic_irq_cleanup_x(irq, false);
+}
+
+/**
+ * dynamic_irq_cleanup_keep_chip_data - cleanup a dynamically allocated irq
+ * @irq: irq number to initialize
+ *
+ * does not set irq_to_desc(irq)->chip_data to NULL
+ */
+void dynamic_irq_cleanup_keep_chip_data(unsigned int irq)
+{
+ dynamic_irq_cleanup_x(irq, true);
+}
+
/**
* set_irq_chip - set the irq chip for an irq