public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andy Whitcroft <apw@shadowen.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, Steve Fox <drfickle@us.ibm.com>,
	Mel Gorman <mel@csn.ul.ie>
Subject: Re: 2.6.23-rc1-mm1 -- mostly fails to build
Date: Wed, 25 Jul 2007 23:41:46 +0100	[thread overview]
Message-ID: <20070725224143.GA668@shadowen.org> (raw)
In-Reply-To: <46A77C28.1050807@shadowen.org>

On Wed, Jul 25, 2007 at 05:36:56PM +0100, Andy Whitcroft wrote:

> Will investigate the NUMA-Q explosion and report on that separatly.

Ok, I've been looking at the NUMA-Q boot panic below:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c111689f
*pdpt = 0000000001387001
*pde = 0000000000000000
Oops: 0000 [#1]
SMP
Modules linked in:
CPU:    0
EIP:    0060:[<c111689f>]    Not tainted VLI
EFLAGS: 00010286   (2.6.23-rc1-mm1-gc8131905-dirty #251)
EIP is at pci_create_bus+0x11b/0x277
eax: 00000000   ebx: c9352e00   ecx: c9073e94   edx: c9325400
esi: c9325400   edi: c932559c   ebp: 00000002   esp: c9073e90
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Process swapper (pid: 1, ti=c9072000 task=c9070030 task.ti=c9072000)
Stack: c12adc4f c9325400 00000000 00000002 00000000 00000000 00000000 c934c800
       000000d6 00000000 c1116a09 00000000 c9073ed5 c11b178a 00000000 c934c940
       00000000 02fffff4 c12bd8ac c934c800 c12bd8b4 c9325000 c1119825 c934c800
Call Trace:
 [<c1116a09>] pci_scan_bus_parented+0xe/0x21
 [<c11b178a>] pci_fixup_i450nx+0xa7/0x101
 [<c1119825>] pci_do_fixups+0x2d/0x38
 [<c111665c>] pci_device_add+0x48/0x77
 [<c11166a5>] pci_scan_single_device+0x1a/0x1f
 [<c11166bf>] pci_scan_slot+0x15/0x47
 [<c111670a>] pci_scan_child_bus+0x19/0x7c
 [<c1116a14>] pci_scan_bus_parented+0x19/0x21
 [<c11b259d>] pcibios_scan_root+0x75/0x7e
 [<c1369ee3>] pci_numa_init+0x2c/0xe4
 [<c1354b87>] kernel_init+0x0/0xa1
 [<c1354a15>] do_initcalls+0x73/0x1a3
 [<c109032c>] proc_register+0xa0/0xa7
 [<c10905cd>] create_proc_entry+0x73/0x86
 [<c103a5c9>] register_irq_proc+0x75/0x92
 [<c1354b87>] kernel_init+0x0/0xa1
 [<c1354be6>] kernel_init+0x5f/0xa1
 [<c10031c3>] kernel_thread_helper+0x7/0x10
 =======================
Code: ff 8b 83 84 00 00 00 c7 04 24 4f dc 2a c1 89 44 24 04 e8 f8 42 f0 ff 83 7c 24 14 00 75 15 8b 93 84 00 00 00 85 d2 74 0b 8b 43 44 <8b> 00 89 82 50 01 00 00 c7 44 24 04 9a 04 00 00 8d bb 88 00 00
EIP: [<c111689f>] pci_create_bus+0x11b/0x277 SS:ESP 0068:c9073e90
Kernel panic - not syncing: Attempted to kill init!

This seems to have been caused by the introduction of the following
code fragment into pci_create_bus:

        if (!parent)
                set_dev_node(b->bridge, pcibus_to_node(b));

This has come as part of the -mm patch below:

    try-parent-numa_node-at-first-before-using-default-v2.patch

This patch does not seem to be wrong in and of itself.  It does
expose the fact that we are building busses with NULL sysdata.
This has come up at least three times now.  Below is the patch
proposed the last couple of times.  It is needed to allow any machine
with i450nx quirk, plus for NUMA-Q systems.

Andrew please could you add this to -mm again.

-apw

===
pci device ensure sysdata initialised v3

We have been seeing panic's on NUMA systems in pci_call_probe() in
2.6.19-rc1-mm1 and later.  This is related to the changes introduced
in the commit below:

    [x86, PCI] Switch pci_bus::sysdata from NUMA node integer to a pointer
    0a247a58fc3e2ecfc17654301033e8b8d08df2a2

In this change the sysdata has changed from directly representing
a value (the node number in NUMA) to a pointer to a structure.
However, it seems that we do not always initialise this sysdata
before we probe the device.

Prior to the changes above the node was defaulted to 'NULL'
allocating the devices to node 0 unconditionally.  This patch adds
a default sysdata entry (pci_default_sysdata), this is then used
where 'NULL' was used previously.  pci_default_sysdata defaults
the node to unknown (-1).  This is a more accurate assignment,
mirroring the value returned where no topology support is provided
and no locality information is available.

There are only two uses of this value in the affected architectures
(x86, x86_64) and generic code:

1) in x86_64, dma_alloc_pages() looks up the node in order to
   allocate node local memory.  Here if the node is invalid we
   will default to the first online node.  Behaviour here should
   be unchanged.
2) in generic, pci_call_probe() looks up the node in order to
   restrict execution of the probe on the card local node, to
   favor node local allocation.  Where this is unknown previously
   we would force execution (and thereby allocation) to node 0,
   this is arguably wrong and using -1 releases this restriction.

In an ideal world we should be supplying a sysdata for the
appropriate node where it is known.  Where it is not known defaulting
to -1 seems a better course, and would help us where node 0 is
short of memory.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
---
diff --git a/arch/i386/pci/common.c b/arch/i386/pci/common.c
index 85503de..362b7b4 100644
--- a/arch/i386/pci/common.c
+++ b/arch/i386/pci/common.c
@@ -27,6 +27,8 @@ unsigned long pirq_table_addr;
 struct pci_bus *pci_root_bus;
 struct pci_raw_ops *raw_pci_ops;
 
+struct pci_sysdata pci_default_sysdata = { .node = -1 };
+
 static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
 {
 	return raw_pci_ops->read(0, bus->number, devfn, where, size, value);
diff --git a/arch/i386/pci/fixup.c b/arch/i386/pci/fixup.c
index e7306db..80365ac 100644
--- a/arch/i386/pci/fixup.c
+++ b/arch/i386/pci/fixup.c
@@ -25,9 +25,11 @@ static void __devinit pci_fixup_i450nx(struct pci_dev *d)
 		pci_read_config_byte(d, reg++, &subb);
 		DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb);
 		if (busno)
-			pci_scan_bus(busno, &pci_root_ops, NULL);	/* Bus A */
+			pci_scan_bus(busno, &pci_root_ops,
+					&pci_default_sysdata);	/* Bus A */
 		if (suba < subb)
-			pci_scan_bus(suba+1, &pci_root_ops, NULL);	/* Bus B */
+			pci_scan_bus(suba+1, &pci_root_ops,
+					&pci_default_sysdata);	/* Bus B */
 	}
 	pcibios_last_bus = -1;
 }
@@ -42,7 +44,7 @@ static void __devinit pci_fixup_i450gx(struct pci_dev *d)
 	u8 busno;
 	pci_read_config_byte(d, 0x4a, &busno);
 	printk(KERN_INFO "PCI: i440KX/GX host bridge %s: secondary bus %02x\n", pci_name(d), busno);
-	pci_scan_bus(busno, &pci_root_ops, NULL);
+	pci_scan_bus(busno, &pci_root_ops, &pci_default_sysdata);
 	pcibios_last_bus = -1;
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82454GX, pci_fixup_i450gx);
diff --git a/arch/i386/pci/legacy.c b/arch/i386/pci/legacy.c
index 149a958..b076b25 100644
--- a/arch/i386/pci/legacy.c
+++ b/arch/i386/pci/legacy.c
@@ -26,7 +26,8 @@ static void __devinit pcibios_fixup_peer_bridges(void)
 			    l != 0x0000 && l != 0xffff) {
 				DBG("Found device at %02x:%02x [%04x]\n", n, devfn, l);
 				printk(KERN_INFO "PCI: Discovered peer bus %02x\n", n);
-				pci_scan_bus(n, &pci_root_ops, NULL);
+				pci_scan_bus(n, &pci_root_ops, 
+						&pci_default_sysdata);
 				break;
 			}
 		}
diff --git a/arch/i386/pci/numa.c b/arch/i386/pci/numa.c
index adbe17a..8d2fd6c 100644
--- a/arch/i386/pci/numa.c
+++ b/arch/i386/pci/numa.c
@@ -97,9 +97,11 @@ static void __devinit pci_fixup_i450nx(struct pci_dev *d)
 		pci_read_config_byte(d, reg++, &subb);
 		DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb);
 		if (busno)
-			pci_scan_bus(QUADLOCAL2BUS(quad,busno), &pci_root_ops, NULL);	/* Bus A */
+			pci_scan_bus(QUADLOCAL2BUS(quad,busno), &pci_root_ops,
+					&pci_default_sysdata);	/* Bus A */
 		if (suba < subb)
-			pci_scan_bus(QUADLOCAL2BUS(quad,suba+1), &pci_root_ops, NULL);	/* Bus B */
+			pci_scan_bus(QUADLOCAL2BUS(quad,suba+1), &pci_root_ops,
+					&pci_default_sysdata);	/* Bus B */
 	}
 	pcibios_last_bus = -1;
 }
@@ -124,7 +126,7 @@ static int __init pci_numa_init(void)
 			printk("Scanning PCI bus %d for quad %d\n", 
 				QUADLOCAL2BUS(quad,0), quad);
 			pci_scan_bus(QUADLOCAL2BUS(quad,0), 
-				&pci_root_ops, NULL);
+				&pci_root_ops, &pci_default_sysdata);
 		}
 	return 0;
 }
diff --git a/arch/i386/pci/visws.c b/arch/i386/pci/visws.c
index f1b486d..2539371 100644
--- a/arch/i386/pci/visws.c
+++ b/arch/i386/pci/visws.c
@@ -101,8 +101,8 @@ static int __init pcibios_init(void)
 		"bridge B (PIIX4) bus: %u\n", pci_bus1, pci_bus0);
 
 	raw_pci_ops = &pci_direct_conf1;
-	pci_scan_bus(pci_bus0, &pci_root_ops, NULL);
-	pci_scan_bus(pci_bus1, &pci_root_ops, NULL);
+	pci_scan_bus(pci_bus0, &pci_root_ops, &pci_default_sysdata);
+	pci_scan_bus(pci_bus1, &pci_root_ops, &pci_default_sysdata);
 	pci_fixup_irqs(visws_swizzle, visws_map_irq);
 	pcibios_resource_survey();
 	return 0;
diff --git a/include/asm-i386/pci.h b/include/asm-i386/pci.h
index d790343..7003604 100644
--- a/include/asm-i386/pci.h
+++ b/include/asm-i386/pci.h
@@ -7,6 +7,7 @@
 struct pci_sysdata {
 	int		node;		/* NUMA node */
 };
+extern struct pci_sysdata pci_default_sysdata;
 
 #include <linux/mm.h>		/* for struct page */
 
diff --git a/include/asm-x86_64/pci.h b/include/asm-x86_64/pci.h
index 88926eb..2c2b092 100644
--- a/include/asm-x86_64/pci.h
+++ b/include/asm-x86_64/pci.h
@@ -9,6 +9,7 @@ struct pci_sysdata {
 	int		node;		/* NUMA node */
 	void*		iommu;		/* IOMMU private data */
 };
+extern struct pci_sysdata pci_default_sysdata;
 
 #ifdef CONFIG_CALGARY_IOMMU
 static inline void* pci_iommu(struct pci_bus *bus)

  parent reply	other threads:[~2007-07-25 22:42 UTC|newest]

Thread overview: 132+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-25 11:03 2.6.23-rc1-mm1 Andrew Morton
2007-07-25 12:25 ` 2.6.23-rc1-mm1 Cedric Le Goater
2007-07-25 17:23   ` 2.6.23-rc1-mm1 Len Brown
2007-07-25 18:58     ` 2.6.23-rc1-mm1 Andrew Morton
2007-07-25 19:13       ` 2.6.23-rc1-mm1 Torsten Kaiser
2007-07-25 20:22         ` 2.6.23-rc1-mm1 Torsten Kaiser
2007-07-25 20:36           ` 2.6.23-rc1-mm1 Andrew Morton
2007-07-25 21:52             ` 2.6.23-rc1-mm1 Torsten Kaiser
2007-07-26  7:25               ` 2.6.23-rc1-mm1 Andrew Morton
2007-07-26 17:54                 ` 2.6.23-rc1-mm1 Torsten Kaiser
2007-07-28 14:03                   ` 2.6.23-rc1-mm1 Torsten Kaiser
2007-07-25 23:26       ` 2.6.23-rc1-mm1 Len Brown
2007-07-26  9:41         ` 2.6.23-rc1-mm1 Mel Gorman
2007-07-26 13:53           ` 2.6.23-rc1-mm1 Cedric Le Goater
2007-07-25 12:40 ` 2.6.23-rc1-mm1 Cedric Le Goater
2007-07-25 20:05   ` 2.6.23-rc1-mm1 Andrew Morton
2007-07-25 12:55 ` 2.6.23-rc1-mm1 Cedric Le Goater
2007-07-25 13:48   ` 2.6.23-rc1-mm1: chipsfb_pci_suspend problem Rafael J. Wysocki
2007-07-25 20:22     ` Andrew Morton
2007-07-25 22:45       ` Pavel Machek
2007-07-25 13:36 ` [-mm patch] one e1000 driver should be enough for everyone Adrian Bunk
2007-07-25 13:48   ` Jeff Garzik
2007-07-25 14:46     ` Adrian Bunk
2007-07-25 15:05       ` Jeff Garzik
2007-07-25 15:21         ` Kok, Auke
2007-07-25 15:23           ` Jeff Garzik
2007-07-25 20:50           ` Andrew Morton
2007-07-25 16:32 ` 2.6.23-rc1-mm1 Michal Piotrowski
2007-07-25 21:56   ` 2.6.23-rc1-mm1 Andrew Morton
2007-07-25 16:36 ` 2.6.23-rc1-mm1 -- mostly fails to build Andy Whitcroft
2007-07-25 17:04   ` Sam Ravnborg
2007-07-25 18:06   ` 2.6.23-rc1-mm1: SCSI_SRP_ATTRS compile error Adrian Bunk
2007-07-26 10:49     ` FUJITA Tomonori
2007-07-25 22:41   ` Andy Whitcroft [this message]
2007-07-26  5:56     ` 2.6.23-rc1-mm1 -- mostly fails to build Andrew Morton
2007-07-26 17:53       ` Yinghai Lu
2007-07-25 18:15 ` 2.6.23-rc1-mm1: net/ipv4/fib_trie.c compile error Adrian Bunk
2007-07-25 18:22 ` 2.6.23-rc1-mm1: reiser4 <-> lzo " Adrian Bunk
2007-07-25 18:44   ` Edward Shishkin
2007-07-27 12:35   ` Edward Shishkin
2007-07-27 15:11     ` Richard Purdie
2007-07-25 18:48 ` 2.6.23-rc1-mm1 Michal Piotrowski
2007-07-25 18:53   ` 2.6.23-rc1-mm1 Sam Ravnborg
2007-07-25 19:18     ` 2.6.23-rc1-mm1 H. Peter Anvin
2007-07-25 19:21       ` 2.6.23-rc1-mm1 Sam Ravnborg
2007-07-25 20:58       ` 2.6.23-rc1-mm1 Gabriel C
2007-07-25 21:05         ` 2.6.23-rc1-mm1 Gabriel C
2007-07-25 21:11           ` 2.6.23-rc1-mm1 H. Peter Anvin
2007-07-25 21:13             ` 2.6.23-rc1-mm1 Gabriel C
2007-07-25 21:18               ` 2.6.23-rc1-mm1 H. Peter Anvin
2007-07-25 21:26                 ` 2.6.23-rc1-mm1 Gabriel C
2007-07-26  0:07           ` 2.6.23-rc1-mm1 Greg KH
2007-07-26  0:28             ` 2.6.23-rc1-mm1 Andrew Morton
2007-07-26  1:55               ` 2.6.23-rc1-mm1 Dave Young
2007-07-26  2:23                 ` 2.6.23-rc1-mm1 Andrew Morton
2007-07-26 20:18             ` 2.6.23-rc1-mm1 Dave Hansen
2007-07-25 20:42 ` 2.6.23-rc1-mm1 - drivers/char/nozomi.c overflow in implicit constant conversion , warnings Gabriel C
2007-07-26  5:42   ` Greg KH
2007-07-25 21:01 ` 2.6.23-rc1-mm1: m32r is_init() compile error Adrian Bunk
2007-07-25 21:42   ` sukadev
2007-07-25 21:17 ` 2.6.23-rc1-mm1: git-kgdb breaks sh compilation Adrian Bunk
2007-07-26  1:45   ` Paul Mundt
2007-07-25 22:03 ` 2.6.23-rc1-mm1 - seems OK on Dell Latitude D820, except for tpm_tis Valdis.Kletnieks
2007-07-26  3:37   ` Andrew Morton
2007-07-27  4:00     ` Valdis.Kletnieks
2007-07-27 13:28       ` Valdis.Kletnieks
2007-07-27 18:07         ` Andrew Morton
2007-07-27 19:44           ` Valdis.Kletnieks
2007-07-27 22:43         ` Bjorn Helgaas
2007-07-30 18:09           ` Bjorn Helgaas
2007-07-30 23:53           ` Valdis.Kletnieks
2007-07-31 18:48             ` Valdis.Kletnieks
2007-07-31 20:01               ` Bjorn Helgaas
2007-07-31 21:31                 ` Valdis.Kletnieks
2007-07-31 23:05                   ` Bjorn Helgaas
2007-07-26  5:26 ` [-mm patch] DMA engine kconfig improvements Adrian Bunk
2007-08-04  2:15   ` Dan Williams
2007-08-10  0:43     ` Adrian Bunk
2007-08-15 23:36   ` Nelson, Shannon
2007-07-26 12:11 ` [PATCH] sparsemem: ensure we initialise the node mapping for SPARSEMEM_STATIC Andy Whitcroft
2007-07-26 12:58 ` 2.6.23-rc1-mm1 sparsemem_vmemamp fix KAMEZAWA Hiroyuki
2007-07-26 14:39   ` Andy Whitcroft
2007-07-26 14:44     ` Andy Whitcroft
2007-07-27 13:28 ` [-mm patch] xtensa console.c: remove duplicate #include Frederik Deweerdt
2007-07-28 15:44 ` NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 ) Gabriel C
2007-07-28 17:26   ` Andrew Morton
2007-07-28 18:42     ` Gabriel C
2007-07-31  8:32       ` Jarek Poplawski
2007-07-31 10:14         ` Gabriel C
2007-07-31 11:44           ` Jason Wessel
2007-07-31 12:47             ` Jarek Poplawski
2007-07-31 12:17           ` Jarek Poplawski
2007-07-31 15:05             ` Gabriel C
2007-08-01  9:59               ` Jarek Poplawski
2007-08-02  2:02                 ` Matt Mackall
2007-08-02  9:00                   ` Jarek Poplawski
2007-08-02 15:59                     ` Matt Mackall
2007-08-03  7:30                       ` Jarek Poplawski
2007-08-02  9:36                   ` Sam Ravnborg
2007-08-02 10:32                     ` Satyam Sharma
2007-08-02 11:40                       ` Satyam Sharma
2007-08-02 11:40                       ` Jarek Poplawski
2007-08-02 11:56                         ` Satyam Sharma
2007-08-02 12:52                           ` Jarek Poplawski
2007-08-06 11:51                     ` [PATCH] docs: note about select in kconfig-language.txt Jarek Poplawski
2007-07-28 16:36 ` DCA=n , INTEL_IOATDMA=y compile error ( Re: 2.6.23-rc1-mm1 ) Gabriel C
2007-07-28 16:47 ` sound/pci/ac97/ac97_patch.h - declared 'static' but never defined warnings " Gabriel C
2007-07-28 17:07 ` mm/sparse.c compile error " Gabriel C
2007-07-28 17:30   ` Andrew Morton
2007-07-30 12:16     ` Andy Whitcroft
2007-07-28 19:32 ` [PATCH -mm] Fix libata warnings with CONFIG_PM=n Gabriel C
2007-07-29 14:57 ` [-mm patch] make hugetlbfs_read() static Adrian Bunk
2007-07-29 14:57 ` [-mm patch] fs/ecryptfs/: make code static Adrian Bunk
2007-07-29 14:58 ` [-mm patch] make struct sdio_dev_attrs[] static Adrian Bunk
2007-07-29 19:29   ` Pierre Ossman
2007-07-29 14:58 ` [-mm patch] MTD onenand_sim.c: make struct info static Adrian Bunk
2007-07-29 14:58 ` [-mm patch] make scsi_host_link_pm_policy() static Adrian Bunk
2007-07-29 14:58 ` [-mm patch] USB: make dev_attr_authorized_default static Adrian Bunk
2007-07-31 19:13   ` Inaky Perez-Gonzalez
2007-07-29 14:59 ` [-mm patch] kernel/printk.c: make 2 variables static Adrian Bunk
2007-07-29 16:51   ` Randy Dunlap
2007-07-29 14:59 ` [-mm patch] export v4l2_int_device_{,un}register Adrian Bunk
2007-07-29 14:59 ` [-mm patch] kernel/pid.c: remove unused exports Adrian Bunk
2007-07-29 15:00 ` [-mm patch] security/ cleanups Adrian Bunk
2007-07-30 11:47   ` James Morris
2007-07-29 15:49 ` 2.6.23-rc1-mm1 Grant Wilson
2007-07-30  9:58   ` 2.6.23-rc1-mm1 Dave Young
2007-07-30 18:27     ` 2.6.23-rc1-mm1 Andrew Morton
2007-07-30 18:42       ` 2.6.23-rc1-mm1 Christoph Hellwig
2007-07-30 22:18         ` 2.6.23-rc1-mm1 Satyam Sharma
2007-07-31  1:21           ` 2.6.23-rc1-mm1 Dave Young
2007-08-01 15:24 ` 2.6.23-rc1-mm1 - loopback mount of files fails loop-use-unlocked_ioctl.patch Valdis.Kletnieks

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070725224143.GA668@shadowen.org \
    --to=apw@shadowen.org \
    --cc=akpm@linux-foundation.org \
    --cc=drfickle@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mel@csn.ul.ie \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox