linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yinghai Lu <yinghai@kernel.org>
To: Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	Brandon Phiilps <bphilips@suse.de>,
	Yinghai Lu <yinghai@kernel.org>,
	stable@kernel.org
Subject: [PATCH 02/35] x86: keep chip_data in create_irq_nr and destroy_irq
Date: Wed, 10 Feb 2010 01:20:06 -0800	[thread overview]
Message-ID: <1265793639-15071-3-git-send-email-yinghai@kernel.org> (raw)
In-Reply-To: <1265793639-15071-1-git-send-email-yinghai@kernel.org>

From: Brandon Phiilps <bphilips@suse.de>

When two drivers are setting up MSI-X at the same time via
pci_enable_msix() there is a race.  See this dmesg excerpt:

[   85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
[   85.170611]   alloc irq_desc for 99 on node -1
[   85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
[   85.170614]   alloc kstat_irqs on node -1
[   85.170616] alloc irq_2_iommu on node -1
[   85.170617]   alloc irq_desc for 100 on node -1
[   85.170619]   alloc kstat_irqs on node -1
[   85.170621] alloc irq_2_iommu on node -1
[   85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
[   85.170626]   alloc irq_desc for 101 on node -1
[   85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
[   85.170630]   alloc kstat_irqs on node -1
[   85.170631] alloc irq_2_iommu on node -1
[   85.170635]   alloc irq_desc for 102 on node -1
[   85.170636]   alloc kstat_irqs on node -1
[   85.170639] alloc irq_2_iommu on node -1
[   85.170646] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000088

As you can see igb and ixgbe are both alternating on create_irq_nr()
via pci_enable_msix() in their probe function.

ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe
choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and
calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data =
NULL via dynamic_irq_init().

igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:

	cfg_new = irq_desc_ptrs[102]->chip_data;
	if (cfg_new->vector != 0)
		continue;

This hits the NULL deref.

Another possible race exists via pci_disable_msix() in a driver or in
the number of error paths that call free_msi_irqs():

destroy_irq()
dynamic_irq_cleanup() which sets desc->chip_data = NULL
...race window...
desc->chip_data = cfg;

Remove the save and restore code for cfg in create_irq_nr() and
destroy_irq() and take the desc->lock when checking the irq_cfg.

Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Brandon Phiilps <bphilips@suse.de>
Cc: stable@kernel.org
---
 arch/x86/kernel/apic/io_apic.c |   18 ++++----------
 include/linux/irq.h            |    2 +
 kernel/irq/chip.c              |   52 +++++++++++++++++++++++++++++++++-------
 3 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ee2fba1..bd93444 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3280,12 +3280,9 @@ unsigned int create_irq_nr(unsigned int irq_want, int node)
 	}
 	spin_unlock_irqrestore(&vector_lock, flags);
 
-	if (irq > 0) {
-		dynamic_irq_init(irq);
-		/* restore it, in case dynamic_irq_init clear it */
-		if (desc_new)
-			desc_new->chip_data = cfg_new;
-	}
+	if (irq > 0)
+		dynamic_irq_init_keep_chip_data(irq);
+
 	return irq;
 }
 
@@ -3308,17 +3305,12 @@ void destroy_irq(unsigned int irq)
 {
 	unsigned long flags;
 	struct irq_cfg *cfg;
-	struct irq_desc *desc;
 
-	/* store it, in case dynamic_irq_cleanup clear it */
-	desc = irq_to_desc(irq);
-	cfg = desc->chip_data;
-	dynamic_irq_cleanup(irq);
-	/* connect back irq_cfg */
-	desc->chip_data = cfg;
+	dynamic_irq_cleanup_keep_chip_data(irq);
 
 	free_irte(irq);
 	spin_lock_irqsave(&vector_lock, flags);
+	cfg = irq_to_desc(irq)->chip_data;
 	__clear_irq_vector(irq, cfg);
 	spin_unlock_irqrestore(&vector_lock, flags);
 }
diff --git a/include/linux/irq.h b/include/linux/irq.h
index d13492d..707ab12 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -400,7 +400,9 @@ static inline int irq_has_action(unsigned int irq)
 
 /* Dynamic irq helper functions */
 extern void dynamic_irq_init(unsigned int irq);
+void dynamic_irq_init_keep_chip_data(unsigned int irq);
 extern void dynamic_irq_cleanup(unsigned int irq);
+void dynamic_irq_cleanup_keep_chip_data(unsigned int irq);
 
 /* Set/get chip/data for an IRQ: */
 extern int set_irq_chip(unsigned int irq, struct irq_chip *chip);
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index ecc3fa2..2fc42d3 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -18,11 +18,7 @@
 
 #include "internals.h"
 
-/**
- *	dynamic_irq_init - initialize a dynamically allocated irq
- *	@irq:	irq number to initialize
- */
-void dynamic_irq_init(unsigned int irq)
+static void dynamic_irq_init_x(unsigned int irq, bool keep_chip_data)
 {
 	struct irq_desc *desc;
 	unsigned long flags;
@@ -41,7 +37,8 @@ void dynamic_irq_init(unsigned int irq)
 	desc->depth = 1;
 	desc->msi_desc = NULL;
 	desc->handler_data = NULL;
-	desc->chip_data = NULL;
+	if (!keep_chip_data)
+		desc->chip_data = NULL;
 	desc->action = NULL;
 	desc->irq_count = 0;
 	desc->irqs_unhandled = 0;
@@ -55,10 +52,26 @@ void dynamic_irq_init(unsigned int irq)
 }
 
 /**
- *	dynamic_irq_cleanup - cleanup a dynamically allocated irq
+ *	dynamic_irq_init - initialize a dynamically allocated irq
  *	@irq:	irq number to initialize
  */
-void dynamic_irq_cleanup(unsigned int irq)
+void dynamic_irq_init(unsigned int irq)
+{
+	dynamic_irq_init_x(irq, false);
+}
+
+/**
+ *	dynamic_irq_init_keep_chip_data - initialize a dynamically allocated irq
+ *	@irq:	irq number to initialize
+ *
+ * 	does not set irq_to_desc(irq)->chip_data to NULL
+ */
+void dynamic_irq_init_keep_chip_data(unsigned int irq)
+{
+	dynamic_irq_init_x(irq, true);
+}
+
+static void dynamic_irq_cleanup_x(unsigned int irq, bool keep_chip_data)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
 	unsigned long flags;
@@ -77,7 +90,8 @@ void dynamic_irq_cleanup(unsigned int irq)
 	}
 	desc->msi_desc = NULL;
 	desc->handler_data = NULL;
-	desc->chip_data = NULL;
+	if (!keep_chip_data)
+		desc->chip_data = NULL;
 	desc->handle_irq = handle_bad_irq;
 	desc->chip = &no_irq_chip;
 	desc->name = NULL;
@@ -85,6 +99,26 @@ void dynamic_irq_cleanup(unsigned int irq)
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 }
 
+/**
+ *	dynamic_irq_cleanup - cleanup a dynamically allocated irq
+ *	@irq:	irq number to initialize
+ */
+void dynamic_irq_cleanup(unsigned int irq)
+{
+	dynamic_irq_cleanup_x(irq, false);
+}
+
+/**
+ *	dynamic_irq_cleanup_keep_chip_data - cleanup a dynamically allocated irq
+ *	@irq:	irq number to initialize
+ *
+ * 	does not set irq_to_desc(irq)->chip_data to NULL
+ */
+void dynamic_irq_cleanup_keep_chip_data(unsigned int irq)
+{
+	dynamic_irq_cleanup_x(irq, true);
+}
+
 
 /**
  *	set_irq_chip - set the irq chip for an irq
-- 
1.6.4.2


  parent reply	other threads:[~2010-02-10  9:32 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-10  9:20 [PATCH -v7 0/35] tip related: not use bootmem for x86 Yinghai Lu
2010-02-10  9:20 ` [PATCH 01/35] x86: fix sci on ioapic 1 Yinghai Lu
2010-02-10 22:48   ` [tip:x86/urgent] x86: Fix SCI on IOAPIC != 0 tip-bot for Yinghai Lu
2010-02-10  9:20 ` Yinghai Lu [this message]
2010-02-10 22:39   ` [tip:x86/irq] x86: Avoid race condition in pci_enable_msix() tip-bot for Brandon Phiilps
2010-02-10  9:20 ` [PATCH 03/35] x86: move range related operation to one file Yinghai Lu
2010-02-10  9:20 ` [PATCH 04/35] x86/pci: use resource_size_t in update_res Yinghai Lu
2010-02-10  9:20 ` [PATCH 05/35] x86/pci: amd one chain system to use pci read out res Yinghai Lu
2010-02-10  9:20 ` [PATCH 06/35] x86/pci: use u64 instead of size_t in amd_bus.c Yinghai Lu
2010-02-10  9:20 ` [PATCH 07/35] x86/pci: add cap_resource Yinghai Lu
2010-02-10  9:20 ` [PATCH 08/35] x86/pci: enable pci root res read out for 32bit too Yinghai Lu
2010-02-10  9:20 ` [PATCH 09/35] x86: change range end to start+size Yinghai Lu
2010-02-10  9:20 ` [PATCH 10/35] x86: print out for RAM buffer Yinghai Lu
2010-02-10  9:20 ` [PATCH 11/35] x86: call early_res_to_bootmem one time Yinghai Lu
2010-02-10  9:20 ` [PATCH 12/35] x86: introduce max_early_res and early_res_count Yinghai Lu
2010-02-10  9:20 ` [PATCH 13/35] x86: dynamic increase early_res array size Yinghai Lu
2010-02-10  9:20 ` [PATCH 14/35] x86: make early_node_mem get mem > 4g if possible Yinghai Lu
2010-02-10  9:20 ` [PATCH 15/35] x86: only call dma32_reserve_bootmem 64bit !CONFIG_NUMA Yinghai Lu
2010-02-10  9:20 ` [PATCH 16/35] x86: make 64 bit use early_res instead of bootmem before slab Yinghai Lu
2010-02-14 14:08   ` Stephen Rothwell
2010-02-14 20:31     ` Yinghai Lu
2010-02-17  1:16     ` Yinghai Lu
2010-02-24 22:59       ` Peter Zijlstra
2010-02-24 23:29         ` Yinghai Lu
2010-02-24 23:32           ` Yinghai Lu
2010-02-25  2:07             ` Tejun Heo
2010-02-25  2:13               ` Yinghai Lu
2010-02-25  2:33                 ` Tejun Heo
2010-02-25  2:36               ` [PATCH] early_res: add free_early_partial Yinghai Lu
2010-02-25 11:10                 ` Peter Zijlstra
2010-03-02  2:48                 ` [PATCH] early_res: need to save name aside with free_early_partial Yinghai Lu
2010-02-10  9:20 ` [PATCH 17/35] sparsemem: put usemap for one node together Yinghai Lu
2010-02-10  9:20 ` [PATCH 18/35] sparsemem: put mem map " Yinghai Lu
2010-02-10  9:20 ` [PATCH 19/35] x86: move bios page reserve early to head32/64.c Yinghai Lu
2010-02-10  9:20 ` [PATCH 20/35] x86: seperate early_res related code from e820.c Yinghai Lu
2010-02-10  9:20 ` [PATCH 21/35] x86: add find_early_area_size Yinghai Lu
2010-02-10  9:20 ` [PATCH 22/35] x86: move back find_e820_area to e820.c Yinghai Lu
2010-02-10  9:20 ` [PATCH 23/35] early_res: enhance check_and_double_early_res Yinghai Lu
2010-02-10  9:20 ` [PATCH 24/35] x86: make 32bit support NO_BOOTMEM Yinghai Lu
2010-02-10  9:20 ` [PATCH 25/35] move round_up/down to kernel.h Yinghai Lu
2010-02-13 18:49   ` Joe Perches
2010-02-13 19:52     ` H. Peter Anvin
2010-02-13 20:11       ` Andrew Morton
2010-02-13 21:57         ` H. Peter Anvin
2010-02-10  9:20 ` [PATCH 26/35] x86: add find_fw_memmap_area Yinghai Lu
2010-02-10  9:20 ` [PATCH 27/35] core: move early_res Yinghai Lu
2010-02-14 14:16   ` Stephen Rothwell
2010-02-14 17:08     ` Ingo Molnar
2010-02-14 23:43       ` Stephen Rothwell
2010-02-15  4:44         ` Ingo Molnar
2010-02-14 20:46     ` Yinghai Lu
2010-02-16 23:46       ` H. Peter Anvin
2010-02-16 23:53         ` Yinghai Lu
2010-02-17  0:01           ` H. Peter Anvin
2010-02-17  0:41             ` Yinghai Lu
2010-02-17  0:46               ` H. Peter Anvin
2010-02-17  1:10                 ` Yinghai Lu
2010-02-17  2:40                   ` Yinghai Lu
2010-02-10  9:20 ` [PATCH 28/35] irq: remove not need bootmem code Yinghai Lu
2010-02-18  1:57   ` [tip:x86/irq] irq: Remove unnecessary " tip-bot for Yinghai Lu
2010-02-10  9:20 ` [PATCH 29/35] radix: move radix init early Yinghai Lu
2010-02-18  1:57   ` [tip:x86/irq] init: Move radix_tree_init() early tip-bot for Yinghai Lu
2010-02-10  9:20 ` [PATCH 30/35] sparseirq: change irq_desc_ptrs to static Yinghai Lu
2010-02-18  1:58   ` [tip:x86/irq] sparseirq: Change " tip-bot for Yinghai Lu
2010-02-10  9:20 ` [PATCH 31/35] sparseirq: use radix_tree instead of ptrs array Yinghai Lu
2010-02-18  1:58   ` [tip:x86/irq] sparseirq: Use " tip-bot for Yinghai Lu
2010-02-10  9:20 ` [PATCH 32/35] x86: remove arch_probe_nr_irqs Yinghai Lu
2010-02-18  1:58   ` [tip:x86/irq] x86, irq: Remove arch_probe_nr_irqs tip-bot for Yinghai Lu
2010-02-10  9:20 ` [PATCH 33/35] use nr_cpus= to set nr_cpu_ids early Yinghai Lu
2010-02-18  1:59   ` [tip:x86/irq] smp: Use " tip-bot for Yinghai Lu
2010-02-10  9:20 ` [PATCH 34/35] x86: use num_processors for possible cpus Yinghai Lu
2010-02-18  1:32   ` H. Peter Anvin
2010-02-18  2:38     ` Yinghai Lu
2010-02-18 17:26       ` H. Peter Anvin
2010-02-18 19:48         ` Christoph Lameter
2010-02-18 19:53           ` H. Peter Anvin
2010-02-19 15:14             ` Christoph Lameter
2010-02-19 16:14               ` H. Peter Anvin
2010-02-10  9:20 ` [PATCH 35/35] x86: make 32bit apic flat to physflat switch like 64bit Yinghai Lu
2010-02-11 16:14 ` [PATCH -v7 0/35] tip related: not use bootmem for x86 Ingo Molnar
2010-02-11 21:10   ` Yinghai Lu
2010-02-15  2:27 ` Benjamin Herrenschmidt
2010-02-15  4:50   ` Yinghai Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1265793639-15071-3-git-send-email-yinghai@kernel.org \
    --to=yinghai@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bphilips@suse.de \
    --cc=cl@linux-foundation.org \
    --cc=hpa@zytor.com \
    --cc=jbarnes@virtuousgeek.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=stable@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).