* x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.
From: Paul Jackson @ 2006-08-31 10:46 UTC
To: linux-kernel, Andrew Morton, Mel Gorman
Cc: Dave Hansen, Andy Whitcroft, Andi Kleen, Benjamin Herrenschmidt,
Paul Mackerras, Keith Mannthey, Luck, Tony, KAMEZAWA Hiroyuki,
Yasunori Goto
The following patch in 2.6.18-rc4-mm3 is broken on my x86_64:
account-for-memmap-and-optionally-the-kernel-image-as-holes.patch
The failure is 100% reproducible.
The system has a pair of dual-core Intel Xeon 5100 series (Woodcrest)
processors (4 logical CPUs total) and 2 GBytes of ram.
The .config is what one gets from 'make defconfig' for arch x86_64,
plus the following changes:
=========================== begin ===========================
--- .config.def 2006-08-31 04:29:22.100311614 -0500
+++ .config 2006-08-31 04:29:03.247761750 -0500
@@ -1,7 +1,7 @@
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.18-rc4-mm3
-# Thu Aug 31 04:29:22 2006
+# Thu Aug 31 04:07:54 2006
#
CONFIG_X86_64=y
CONFIG_64BIT=y
@@ -44,7 +44,7 @@
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
-# CONFIG_CPUSETS is not set
+CONFIG_CPUSETS=y
# CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE=""
CONFIG_UID16=y
@@ -205,7 +205,7 @@
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
-CONFIG_ACPI_SONY=m
+# CONFIG_ACPI_SONY is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
@@ -1270,7 +1270,11 @@
# CONFIG_REISERFS_FS_SECURITY is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
-# CONFIG_XFS_FS is not set
+CONFIG_XFS_FS=y
+# CONFIG_XFS_QUOTA is not set
+# CONFIG_XFS_SECURITY is not set
+# CONFIG_XFS_POSIX_ACL is not set
+# CONFIG_XFS_RT is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_MINIX_FS is not set
============================ end ============================
The boot fails with the following console output:
=========================== begin ===========================
root (hd0,0)
Filesystem type is ext2fs, partition type 0x83
kernel /vmlinuz.pj2 root=/dev/sda3 console=ttyS1,115200 showopts pj2
[Linux-bzImage, setup=0x1c00, size=0x2b66e5]
Linux version 2.6.18-rc4-mm3 (pj@spandau) (gcc version 4.1.0 (SUSE Linux)) #48 SMP Thu Aug 31 04:22:41 CDT 2006
Command line: root=/dev/sda3 console=ttyS1,115200 showopts pj2
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
BIOS-e820: 000000000009f000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007f932000 (usable)
BIOS-e820: 000000007f932000 - 000000007f9d0(ACPI NVS)
BIOS-e820: 000000007f9d0000 - 000000007fa42000 (usable)
BIOS-e820: 000000007fa420000 - 000000007fb2b000 (usable)
BIOS-e820: 000000007fb2b000 - 000000007fb3a000 (ACPI data)
B0000000000-000000007fc00000
Bootmem setup node 0 0000000000000000-000000007fc00000
Zone PFN raProcessor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x06] enabled)
Processor #6
ACPapic_id[0x85] disabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x86] disabled)
ACPI: LAPIC (acpi_x02] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high level lint[0x1])
ACPI: LAPIC_NM0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, address 0xfec00000, GSI 0-23
ACPI0x0b] address[0xfec84400] gsi_base[72])
IOAPIC[3]: apic_id 11, address 0xfec84400, GSI 72-95
AUsing ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 800000ot=/dev/sda3 console=ttyS1,115200 showopts pj2
Initializing CPU#0
PID hash table entries: 40962 (order: 8, 1048576 bytes)
Checking aperture...
Memory: 2052128k/2093056k available (3519k kerved, 2323k data, 280k init)
Calibrating delay using timer specific routine.. 5324.66 BogoMIPS (lpj=10649332)
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 0/0 -> Node 0
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM2)
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
Using local APIC timer interrupts.
result 20781304
Detected 20.781 MHz APIC timer.
SMP alternatives: switching to SMP code
Booting processor 1/4 APIC 0x6
Initializing CPU#1
Calibrating delay using timer specific routine.. 5320.16 BogoMIPS (lpj=10640330)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 1/6 -> Node 0
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 0
CPU1: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
SMP alternatives: switching to SMP code
Booting processor 2/4 APIC 0x1
Initializing CPU#2
Calibrating delay using timer specific routine.. 5320.16 BogoMIPS (lpj=10640332)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 2/1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU2: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
SMP alternatives: switching to SMP code
Booting processor 3/4 APIC 0x7
Initializing CPU#3
Calibrating delay using timer specific routine.. 5320.04 BogoMIPS (lpj=10640092)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 3/7 -> Node 0
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 1
CPU3: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
Brought up 4 CPUs
testing NMI watchdog ... OK.
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2660.007 MHz processor.
migration_cost=30,7937
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG at a0000000
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 7 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 7 10 *11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 7 *10 11)
ACPI: PCI Interrupt Link [LNKD] (IRQs *5 7 10 11)
ACPI: PCI Interrupt Link [LNKE] (IRQs *5 7 10 11)
ACPI: PCI Interrupt Link Intel 82802 RNG detected
SCSI subsystem initialized
usbcore: registered new interface driver uirq". If it helps, post a report
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
hpet0: 3 64-bit timeow: b8b00000-b8bfffff
PCI: Bridge: 0000:03:00.2
IO window: disabled.
MEM window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:02:02.0
IO windowisabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:02.0
IOdow: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:05.0c:00.2
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: BridEFETCH window: disabled.
PCI: Bridge: 0000:00:1e.0
IO window: 1000-1fff
MEM window: b8c00terrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:02:00.0[A] - IRQ 169
ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Inter Interrupt 0000:00:07.0[A] -> GSI 16 (level, low) -> IRQ 169
IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
Total HugeTLB io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (def0000:00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
aer: probe of 0000:00:02.0:pcie01 failed with error 2
aer: probe of 0000:00:03.0:pcie01 failed with error 1
aer: probe of 0000:00:04.0:pcie01 failed failed with error 2
aer: probe of 0000:00:07.0:pcie01 failed with error 2
ACPI: Power Button r Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x4
ACPI Exception (acpi_060707]
ACPI: Getting cpuindex for acpiid 0x6
ACPI Exception (acpi_processor-0681): AE_NOT_FOUReal Time Clock Driver v1.12ac
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/1655/O 0x2f8 (irq = 3) is a 16550A
floppy0: no floppy controllers found
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
IntI 17 sharing vector 0x42 and IRQ 17
ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 18 (level, low) -> IRQ 66
e1000: 0000:07:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:04:23:cf:2d:d2
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
GSI 18 sharing vector 0x4A and IRQ 18
ACPI: PCI Interrupt 0000:07:00.1[B] -> GSI 19 (level, low) -> IRQ 74
e1000: 0000:07:00.1:ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 24X DVD-ROM drive, 256kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
megaraid: 2.20.4.9 (Release Date: Sun Jul 16 12:27:22 EST 2006)
megasas: 00.00.03.01 Sun May 14 22:49:52 PDT 2006
megasas: 0x1000:0x0411:0x8086:0x3501: bus 4:slot 14:func 0
ACPI: PCI Interrupt 0000:04:0e.0[A] -> GSI 18 (level, low) -> IRQ 66
scsi0 : LSI Logic SAS based MegaRAID driver
scsi 0:0:0:0: Direct-Access ATA HDT722525DLA380 A80A PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:2:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:3:0: Direct-Access Ascsi 0:2:0:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
scsi 0:2:1:0: Direswapper invoked oom-killer: gfp_mask=0xd1, order=0, oomkilladj=0
Call Trace:
[<ffffffff802025bc67>] __alloc_pages+0x229/0x2b2
[<ffffffff80274e46>] cache_grow+0x134/0x333
[<ffffffff802really_probe+0x47/0xc9
[<ffffffff803eea20>] __driver_attach+0x6f/0xaf
[<ffffffff803ee214>] bffffffff803abf12>] acpi_ds_init_one_object+0x0/0x82
[<ffffffff80207046>] init+0x0/0x306
[<ffu 0 hot: high 186, batch 31 used:24
cpu 0 cold: high 62, batch 15 used:0
cpu 1 hot: high 186, 15 used:0
Node 0 Normal per-cpu: empty
Active:0 inactive:0 dirty:0 writeback:0 unstable:0 freeB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_res 0*2048kB 496*4096kB = 2035560kB
Node 0 Normal: empty
Swap cache: add 0, delete 0, find 0/0, r swap cached
Kernel panic - not syncing: Out of memory and no killable processes...
============================ end ============================
Without this bad patch, the system boot continues with the following
messages, slightly overlapping my presentation with the above output:
========================== begin ===========================
...
ACPI: PCI Interrupt 0000:04:0e.0[A] -> GSI 18 (level, low) -> IRQ 66
scsi0 : LSI Logic SAS based MegaRAID driver
scsi 0:0:0:0: Direct-Access ATA HDT722525DLA380 A80A PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:2:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:3:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:4:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:2:0:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
scsi 0:2:1:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
SCSI devi: write through
SCSI device sda: 486326272 512-byte hdwr sectors (248999 MB)
sda: test WP fail sda1 sda2 sda3
sd 0:2:0:0: Attached scsi disk sda
SCSI device sdb: 2923825152 512-byte hdwr s assuming drive cache: write through
SCSI device sdb: 2923825152 512-byte hdwr sectors (1496998 sdb1
sd 0:2:1:0: Attached scsi disk sdb
sd 0:2:0:0: Attached scsi generic sg0 type 0
sd 0:2:aw1394: /dev/raw1394 device initialized
GSI 20 sharing vector 0x5A and IRQ 20
ACPI: PCI Interr1d.7: debug port 1
...
============================ end ============================
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.
From: Mel Gorman @ 2006-08-31 16:17 UTC
To: Paul Jackson
Cc: linux-kernel, Andrew Morton, Dave Hansen, Andy Whitcroft, Andi Kleen,
Benjamin Herrenschmidt, Paul Mackerras, Keith Mannthey, Luck, Tony,
KAMEZAWA Hiroyuki, Yasunori Goto
On Thu, 31 Aug 2006, Paul Jackson wrote:
> The following patch in 2.6.18-rc4-mm3 is broken on my x86_64:
>
> account-for-memmap-and-optionally-the-kernel-image-as-holes.patch
>
> The failure is 100% reproducible.
>

Ok, I'm surprised that it is this patch that causes a problem. I felt the
patch would either explode everywhere or just work.

> The system has a pair of dual-core Intel Xeon 5100 series (Woodcrest)
> processors (4 logical CPUs total) and 2 GBytes of ram.
>
> The .config is what one gets from 'make defconfig' for arch x86_64,
> plus the following changes:
>
> =========================== begin ===========================
> --- .config.def 2006-08-31 04:29:22.100311614 -0500
> +++ .config 2006-08-31 04:29:03.247761750 -0500
> @@ -1,7 +1,7 @@
> #
> # Automatically generated make config: don't edit
> # Linux kernel version: 2.6.18-rc4-mm3
> -# Thu Aug 31 04:29:22 2006
> +# Thu Aug 31 04:07:54 2006
> #
> CONFIG_X86_64=y
> CONFIG_64BIT=y
> @@ -44,7 +44,7 @@
> # CONFIG_AUDIT is not set
> CONFIG_IKCONFIG=y
> CONFIG_IKCONFIG_PROC=y
> -# CONFIG_CPUSETS is not set
> +CONFIG_CPUSETS=y
> # CONFIG_RELAY is not set
> CONFIG_INITRAMFS_SOURCE=""
> CONFIG_UID16=y
> @@ -205,7 +205,7 @@
> # CONFIG_ACPI_ASUS is not set
> # CONFIG_ACPI_IBM is not set
> # CONFIG_ACPI_TOSHIBA is not set
> -CONFIG_ACPI_SONY=m
> +# CONFIG_ACPI_SONY is not set
> CONFIG_ACPI_BLACKLIST_YEAR=0
> # CONFIG_ACPI_DEBUG is not set
> CONFIG_ACPI_EC=y
> @@ -1270,7 +1270,11 @@
> # CONFIG_REISERFS_FS_SECURITY is not set
> # CONFIG_JFS_FS is not set
> CONFIG_FS_POSIX_ACL=y
> -# CONFIG_XFS_FS is not set
> +CONFIG_XFS_FS=y
> +# CONFIG_XFS_QUOTA is not set
> +# CONFIG_XFS_SECURITY is not set
> +# CONFIG_XFS_POSIX_ACL is not set
> +# CONFIG_XFS_RT is not set
> # CONFIG_GFS2_FS is not set
> # CONFIG_OCFS2_FS is not set
> # CONFIG_MINIX_FS is not set
> ============================ end ============================
>

Nothing very surprising there.

> The boot fails with the following console output:
>

Ok, this is interesting. It appears that the log is truncated or somehow
corrupt.

> =========================== begin ===========================
> root (hd0,0)
> Filesystem type is ext2fs, partition type 0x83
> kernel /vmlinuz.pj2 root=/dev/sda3 console=ttyS1,115200 showopts pj2
> [Linux-bzImage, setup=0x1c00, size=0x2b66e5]
>
> Linux version 2.6.18-rc4-mm3 (pj@spandau) (gcc version 4.1.0 (SUSE Linux)) #48 SMP Thu Aug 31 04:22:41 CDT 2006
> Command line: root=/dev/sda3 console=ttyS1,115200 showopts pj2
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
> BIOS-e820: 000000000009f000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000007f932000 (usable)
> BIOS-e820: 000000007f932000 - 000000007f9d0(ACPI NVS)

Little bit missing here. I don't expect 000000007f9d0 to be truncated like
that.

> BIOS-e820: 000000007f9d0000 - 000000007fa42000 (usable)
> BIOS-e820: 000000007fa420000 - 000000007fb2b000 (usable)

or 000000007fa420000 to have an additional 0 at the end.

> BIOS-e820: 000000007fb2b000 - 000000007fb3a000 (ACPI data)
> B0000000000-000000007fc00000
> Bootmem setup node 0 0000000000000000-000000007fc00000

and this seems to interleave even though the bootmem setup node range would
match your physical memory.

> Zone PFN raProcessor #0 (Bootup-CPU)

There is information missing here. That should be Zone PFN Ranges followed
by a list of active PFN ranges from your system. After that, I expect to
see a message like

  X pages DMA reserved
  Y pages used for memmap

Do you think this is a problem with your serial console or something else?
Do you see the Zone PFN ranges information when the patch is backed out?
Those messages, as well as booting with loglevel=8, would really help me
figure out what went pear shaped.

> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x06] enabled)
> Processor #6
> ACPapic_id[0x85] disabled)
> ACPI: LAPIC (acpi_id[0x06] lapic_id[0x86] disabled)
> ACPI: LAPIC (acpi_x02] high level lint[0x1])
> ACPI: LAPIC_NMI (acpi_id[0x03] high level lint[0x1])
> ACPI: LAPIC_NM0x08] address[0xfec00000] gsi_base[0])
> IOAPIC[0]: apic_id 8, address 0xfec00000, GSI 0-23
> ACPI0x0b] address[0xfec84400] gsi_base[72])
> IOAPIC[3]: apic_id 11, address 0xfec84400, GSI 72-95
> AUsing ACPI (MADT) for SMP configuration information
> Allocating PCI resources starting at 800000ot=/dev/sda3 console=ttyS1,115200 showopts pj2
> Initializing CPU#0
> PID hash table entries: 40962 (order: 8, 1048576 bytes)
> Checking aperture...
> Memory: 2052128k/2093056k available (3519k kerved, 2323k data, 280k init)
> Calibrating delay using timer specific routine.. 5324.66 BogoMIPS (lpj=10649332)
> Mount-cache hash table entries: 256
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 4096K
> CPU 0/0 -> Node 0
> using mwait in idle threads.
> CPU: Physical Processor ID: 0
> CPU: Processor Core ID: 0
> CPU0: Thermal monitoring enabled (TM2)
> SMP alternatives: switching to UP code
> ACPI: Core revision 20060707
> Using local APIC timer interrupts.
> result 20781304
> Detected 20.781 MHz APIC timer.
> SMP alternatives: switching to SMP code
> Booting processor 1/4 APIC 0x6
> Initializing CPU#1
> Calibrating delay using timer specific routine.. 5320.16 BogoMIPS (lpj=10640330)
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 4096K
> CPU 1/6 -> Node 0
> CPU: Physical Processor ID: 3
> CPU: Processor Core ID: 0
> CPU1: Thermal monitoring enabled (TM2)
> Genuine Intel(R) CPU @ 2.66GHz stepping 04
> SMP alternatives: switching to SMP code
> Booting processor 2/4 APIC 0x1
> Initializing CPU#2
> Calibrating delay using timer specific routine.. 5320.16 BogoMIPS (lpj=10640332)
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 4096K
> CPU 2/1 -> Node 0
> CPU: Physical Processor ID: 0
> CPU: Processor Core ID: 1
> CPU2: Thermal monitoring enabled (TM2)
> Genuine Intel(R) CPU @ 2.66GHz stepping 04
> SMP alternatives: switching to SMP code
> Booting processor 3/4 APIC 0x7
> Initializing CPU#3
> Calibrating delay using timer specific routine.. 5320.04 BogoMIPS (lpj=10640092)
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 4096K
> CPU 3/7 -> Node 0
> CPU: Physical Processor ID: 3
> CPU: Processor Core ID: 1
> CPU3: Thermal monitoring enabled (TM2)
> Genuine Intel(R) CPU @ 2.66GHz stepping 04
> Brought up 4 CPUs
> testing NMI watchdog ... OK.
> time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
> time.c: Detected 2660.007 MHz processor.
> migration_cost=30,7937
> NET: Registered protocol family 16
> ACPI: bus type pci registered
> PCI: Using MMCONFIG at a0000000
> ACPI: Interpreter enabled
> ACPI: Using IOAPIC for interrupt routing
> ACPI: PCI Root Bridge [PCI0] (0000:00)
> PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
> PCI: PXH quirk detected, disabling MSI for SHPC device
> PCI: PXH quirk detected, disabling MSI for SHPC device
> PCI: Transparent bridge - 0000:00:1e.0
> ACPI: PCI Interrupt Link [LNKA] (IRQs 5 7 *10 11)
> ACPI: PCI Interrupt Link [LNKB] (IRQs 5 7 10 *11)
> ACPI: PCI Interrupt Link [LNKC] (IRQs 5 7 *10 11)
> ACPI: PCI Interrupt Link [LNKD] (IRQs *5 7 10 11)
> ACPI: PCI Interrupt Link [LNKE] (IRQs *5 7 10 11)
> ACPI: PCI Interrupt Link Intel 82802 RNG detected
> SCSI subsystem initialized
> usbcore: registered new interface driver uirq". If it helps, post a report
> hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
> hpet0: 3 64-bit timeow: b8b00000-b8bfffff
> PCI: Bridge: 0000:03:00.2
> IO window: disabled.
> MEM window: disabled.
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Bridge: 0000:02:02.0
> IO windowisabled.
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Bridge: 0000:00:02.0
> IOdow: disabled.
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: Bridge: 0000:00:05.0c:00.2
> IO window: disabled.
> MEM window: disabled.
> PREFETCH window: disabled.
> PCI: BridEFETCH window: disabled.
> PCI: Bridge: 0000:00:1e.0
> IO window: 1000-1fff
> MEM window: b8c00terrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 169
> ACPI: PCI Interrupt 0000:02:00.0[A] - IRQ 169
> ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 16 (level, low) -> IRQ 169
> ACPI: PCI Inter Interrupt 0000:00:07.0[A] -> GSI 16 (level, low) -> IRQ 169
> IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
> TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
> TCP bind hash table entries: 65536 (order: 8, 1048576 TCP: Hash tables configured (established 262144 bind 65536)
> TCP reno registered
> Total HugeTLB io scheduler noop registered
> io scheduler deadline registered
> io scheduler cfq registered (def0000:00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> assign_interrupt_mode Found MSI capability
> aer: probe of 0000:00:02.0:pcie01 failed with error 2
> aer: probe of 0000:00:03.0:pcie01 failed with error 1
> aer: probe of 0000:00:04.0:pcie01 failed failed with error 2
> aer: probe of 0000:00:07.0:pcie01 failed with error 2
> ACPI: Power Button r Device is not present [20060707]
> ACPI: Getting cpuindex for acpiid 0x4
> ACPI Exception (acpi_060707]
> ACPI: Getting cpuindex for acpiid 0x6
> ACPI Exception (acpi_processor-0681): AE_NOT_FOUReal Time Clock Driver v1.12ac
> Linux agpgart interface v0.101 (c) Dave Jones
> Serial: 8250/1655/O 0x2f8 (irq = 3) is a 16550A
> floppy0: no floppy controllers found
> RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
> loop: loaded (max 8 devices)
> IntI 17 sharing vector 0x42 and IRQ 17
> ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 18 (level, low) -> IRQ 66
> e1000: 0000:07:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:04:23:cf:2d:d2
> e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
> GSI 18 sharing vector 0x4A and IRQ 18
> ACPI: PCI Interrupt 0000:07:00.1[B] -> GSI 19 (level, low) -> IRQ 74
> e1000: 0000:07:00.1:ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> hda: ATAPI 24X DVD-ROM drive, 256kB Cache, UDMA(33)
> Uniform CD-ROM driver Revision: 3.20
> megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
> megaraid: 2.20.4.9 (Release Date: Sun Jul 16 12:27:22 EST 2006)
> megasas: 00.00.03.01 Sun May 14 22:49:52 PDT 2006
> megasas: 0x1000:0x0411:0x8086:0x3501: bus 4:slot 14:func 0
> ACPI: PCI Interrupt 0000:04:0e.0[A] -> GSI 18 (level, low) -> IRQ 66
> scsi0 : LSI Logic SAS based MegaRAID driver
> scsi 0:0:0:0: Direct-Access ATA HDT722525DLA380 A80A PQ: 0 ANSI: 5
> scsi 0:0:1:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
> scsi 0:0:2:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
> scsi 0:0:3:0: Direct-Access Ascsi 0:2:0:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
> scsi 0:2:1:0: Direswapper invoked oom-killer: gfp_mask=0xd1, order=0, oomkilladj=0
>
> Call Trace:
> [<ffffffff802025bc67>] __alloc_pages+0x229/0x2b2
> [<ffffffff80274e46>] cache_grow+0x134/0x333
> [<ffffffff802really_probe+0x47/0xc9
> [<ffffffff803eea20>] __driver_attach+0x6f/0xaf
> [<ffffffff803ee214>] bffffffff803abf12>] acpi_ds_init_one_object+0x0/0x82
> [<ffffffff80207046>] init+0x0/0x306
> [<ffu 0 hot: high 186, batch 31 used:24
> cpu 0 cold: high 62, batch 15 used:0
> cpu 1 hot: high 186, 15 used:0
> Node 0 Normal per-cpu: empty
> Active:0 inactive:0 dirty:0 writeback:0 unstable:0 freeB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_res 0*2048kB 496*4096kB = 2035560kB

This is also garbled up. This is in show_free_areas() though and it looks
like it is saying there are 496*4096kB pages currently free. Not clear at
all how it managed to go OOM due to this patch.

> Node 0 Normal: empty
> Swap cache: add 0, delete 0, find 0/0, r swap cached
> Kernel panic - not syncing: Out of memory and no killable processes...
> ============================ end ============================
>
>
> Without this bad patch, the system boot continues with the following
> messages, slightly overlapping my presentation with the above output:
>
>
> ========================== begin ===========================
> ...
> ACPI: PCI Interrupt 0000:04:0e.0[A] -> GSI 18 (level, low) -> IRQ 66
> scsi0 : LSI Logic SAS based MegaRAID driver
> scsi 0:0:0:0: Direct-Access ATA HDT722525DLA380 A80A PQ: 0 ANSI: 5
> scsi 0:0:1:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
> scsi 0:0:2:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
> scsi 0:0:3:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
> scsi 0:0:4:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
> scsi 0:2:0:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
> scsi 0:2:1:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
> SCSI devi: write through
> SCSI device sda: 486326272 512-byte hdwr sectors (248999 MB)
> sda: test WP fail sda1 sda2 sda3
> sd 0:2:0:0: Attached scsi disk sda
> SCSI device sdb: 2923825152 512-byte hdwr s assuming drive cache: write through
> SCSI device sdb: 2923825152 512-byte hdwr sectors (1496998 sdb1
> sd 0:2:1:0: Attached scsi disk sdb
> sd 0:2:0:0: Attached scsi generic sg0 type 0
> sd 0:2:aw1394: /dev/raw1394 device initialized
> GSI 20 sharing vector 0x5A and IRQ 20
> ACPI: PCI Interr1d.7: debug port 1
> ...
> ============================ end ============================
>
>
> --
> I won't rest till it's the best ...
> Programmer, Linux Scalability
> Paul Jackson <pj@sgi.com> 1.925.600.0401
>

Can I see a full bootlog with the patch backed out to see if that console
garbling is still there please? Have you any idea why the console garbling
is happening?

Thanks

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
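The free-memory reading of the garbled show_free_areas() line quoted above
can be checked by hand. What follows is only a minimal sketch, assuming
that each "N*SIZEkB" term means N free blocks of SIZE kB, and using only
the two figures that survived the garbling (496 blocks of 4096kB and the
2035560kB total). The helper name free_kb is made up for illustration and
is not a kernel function.

```python
def free_kb(count: int, block_kb: int) -> int:
    """Return the memory, in kB, held by `count` free blocks of `block_kb` kB."""
    return count * block_kb


if __name__ == "__main__":
    largest = free_kb(496, 4096)   # the "496*4096kB" term from the log
    reported_total = 2035560       # the "= 2035560kB" total from the log
    print(f"496*4096kB = {largest}kB ({largest // 1024} MB)")
    print(f"left for the smaller, garbled terms: {reported_total - largest}kB")
```

If that reading of the line is right, nearly all of the 2 GBytes was still
sitting free in 4096kB blocks when the OOM killer fired, which fits the
remark that it is not clear how the system managed to go OOM.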
* Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.
From: Paul Jackson @ 2006-08-31 17:01 UTC
To: Mel Gorman
Cc: linux-kernel, akpm, haveblue, apw, ak, benh, paulus, kmannth,
tony.luck, kamezawa.hiroyu, y-goto
Mel wrote:
> Have you any idea why the console garbling is happening?

Yeah - you're right - it's garbled. Looks like it's dropping chars.

I don't know why, but I'm not surprised. It's a lab system with a
new (for us) way of rigging the console output. I just got this
particular x86_64's console connection to work at all yesterday.

I've been working indirectly through my good lab tech. I should
drive in to the lab that has this rig (an hour away) and check it
out in person, and see what can be done to get clean console output.

This may take a day or three to yield results, unless I get lucky.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.
From: Mel Gorman @ 2006-09-01 8:38 UTC
To: Paul Jackson
Cc: linux-kernel, akpm, haveblue, apw, ak, benh, paulus, kmannth,
tony.luck, kamezawa.hiroyu, y-goto
On Thu, 31 Aug 2006, Paul Jackson wrote:
> Mel wrote:
>> Have you any idea why the console garbling is happening?
>
> Yeah - you're right - it's garbled. Looks like it's dropping chars.
>

Or writing some chars twice but at a different time. The system might be
one of those that fakes serial console output on the assumption the
operating system isn't doing the same thing. I've seen one or two blade
systems that did something like this with mixed results.

> I don't know why, but I'm not surprised. It's a lab system with a
> new (for us) way of rigging the console output. I just got this
> particular x86_64's console connection to work at all yesterday.
>
> I've been working indirectly through my good lab tech. I should
> drive in to the lab that has this rig (an hour away) and check it
> out in person, and see what can be done to get clean console output.
>

That is a bit of a sickener. It may be worth getting your good lab tech
to check if there is a configuration setting in the hardware for
simulating console output before you make the trip.

> This may take a day or three to yield results, unless I get lucky.
>

I have Keith's problem with reserve-based-hot-add to keep me occupied in
the meantime. Whenever you get the chance will be fine. Thanks a lot

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.
From: Paul Jackson @ 2006-09-02 3:24 UTC
To: Mel Gorman
Cc: linux-kernel, akpm, haveblue, apw, ak, benh, paulus, kmannth,
tony.luck, kamezawa.hiroyu, y-goto
> That is a bit of a sickener. It may be worth getting your good lab tech
> to check if there is a configuration setting in the hardware for
> simulating console output before you make the trip.

Apparently my lab setup simply lacks correct flow control on the
serial console line. I hacked the 8250 serial driver in my kernel
to put a one msec delay between each character output, and it no
longer drops console output during boot.

> > This may take a day or three to yield results, unless I get lucky.
> >
>
> I have Keith's problem with reserve-based-hot-add to keep me occupied in
> the meantime. Whenever you get the chance will be fine. Thanks a lot

Ok, below is the console output for one of these crashes.

This output is missing the first couple dozen lines commencing with
grub announcing it is loading my kernel, as those lines seem to go via
a different serial driver that I didn't chase down to hack. Those
initial lines were still dropping lotsa chars. If you need those
initial lines bad, holler, and I can probably hack something to get
them to show up.

By the way, the crash continues to happen 100% with the patch:

  patches/account-for-memmap-and-optionally-the-kernel-image-as-holes.patch

and zero percent without it. So this patch continues to be suspect
number one. There is no suspect number two ;).

Notice the really bogus looking memory size numbers on the line near
the end that begins "Node 0 DMA free: ...". No, this is not a gazillion
petabyte Altix. It's a mundane 2 GByte, 2 processor package (4 cores
total) Xeon system.

Without further ado ...
=======================
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 0/0 -> Node 0
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM2)
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
Using local APIC timer interrupts.
result 20781258
Detected 20.781 MHz APIC timer.
SMP alternatives: switching to SMP code
Booting processor 1/4 APIC 0x6
Initializing CPU#1
Calibrating delay using timer specific routine.. 5320.09 BogoMIPS (lpj=10640184)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 1/6 -> Node 0
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 0
CPU1: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
SMP alternatives: switching to SMP code
Booting processor 2/4 APIC 0x1
Initializing CPU#2
Calibrating delay using timer specific routine.. 5320.27 BogoMIPS (lpj=10640543)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 2/1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU2: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
SMP alternatives: switching to SMP code
Booting processor 3/4 APIC 0x7
Initializing CPU#3
Calibrating delay using timer specific routine.. 5320.03 BogoMIPS (lpj=10640065)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 3/7 -> Node 0
CPU: Physical Processor ID: 3
CPU: Processor Core ID: 1
CPU3: Thermal monitoring enabled (TM2)
Genuine Intel(R) CPU @ 2.66GHz stepping 04
Brought up 4 CPUs
testing NMI watchdog ... OK.
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2660.003 MHz processor.
migration_cost=26,7972 NET: Registered protocol family 16 ACPI: bus type pci registered PCI: Using MMCONFIG at a0000000 ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1 PCI: PXH quirk detected, disabling MSI for SHPC device PCI: PXH quirk detected, disabling MSI for SHPC device PCI: Transparent bridge - 0000:00:1e.0 ACPI: PCI Interrupt Link [LNKA] (IRQs 5 7 *10 11) ACPI: PCI Interrupt Link [LNKB] (IRQs 5 7 10 *11) ACPI: PCI Interrupt Link [LNKC] (IRQs 5 7 *10 11) ACPI: PCI Interrupt Link [LNKD] (IRQs *5 7 10 11) ACPI: PCI Interrupt Link [LNKE] (IRQs *5 7 10 11) ACPI: PCI Interrupt Link [LNKF] (IRQs 5 7 10 11) *0, disabled. ACPI: PCI Interrupt Link [LNKG] (IRQs 5 7 *10 11) ACPI: PCI Interrupt Link [LNKH] (IRQs 5 7 10 *11) Intel 82802 RNG detected SCSI subsystem initialized usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 hpet0: 3 64-bit timers, 14318180 Hz PCI-GART: No AMD northbridge found. PCI: Bridge: 0000:03:00.0 IO window: disabled. MEM window: b8900000-b89fffff PREFETCH window: b8b00000-b8bfffff PCI: Bridge: 0000:03:00.2 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:02:00.0 IO window: disabled. MEM window: b8900000-b89fffff PREFETCH window: b8b00000-b8bfffff PCI: Bridge: 0000:02:01.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:02:02.0 IO window: 2000-2fff MEM window: b8000000-b88fffff PREFETCH window: disabled. PCI: Bridge: 0000:01:00.0 IO window: 2000-2fff MEM window: b8000000-b89fffff PREFETCH window: b8b00000-b8bfffff PCI: Bridge: 0000:01:00.3 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. 
PCI: Bridge: 0000:00:02.0 IO window: 2000-2fff MEM window: b8000000-b8afffff PREFETCH window: b8b00000-b8bfffff PCI: Bridge: 0000:00:03.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:04.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:05.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:0c:00.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:0c:00.2 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:06.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:07.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:1e.0 IO window: 1000-1fff MEM window: b8c00000-b8cfffff PREFETCH window: b0000000-b7ffffff GSI 16 sharing vector 0xA9 and IRQ 16 ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 16 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:02:02.0[A] -> GSI 16 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 16 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:00:04.0[A] -> GSI 16 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:00:05.0[A] -> GSI 16 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:00:06.0[A] -> GSI 16 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:00:07.0[A] -> GSI 16 (level, low) -> IRQ 169 NET: Registered protocol family 2 IP route cache hash table entries: 65536 (order: 7, 524288 bytes) TCP established hash table entries: 262144 (order: 10, 4194304 bytes) TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 262144 bind 65536) TCP reno registered Total HugeTLB memory allocated, 0 Installing knfsd 
(copyright (C) 1996 okir@monad.swb.de). SGI XFS with large block/inode numbers, no debug enabled io scheduler noop registered io scheduler deadline registered io scheduler cfq registered (default) 0000:00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001 assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability aer: probe of 0000:00:02.0:pcie01 failed with error 2 aer: probe of 0000:00:03.0:pcie01 failed with error 1 aer: probe of 0000:00:04.0:pcie01 failed with error 2 aer: probe of 0000:00:05.0:pcie01 failed with error 2 aer: probe of 0000:00:06.0:pcie01 failed with error 2 aer: probe of 0000:00:07.0:pcie01 failed with error 2 ACPI: Power Button (FF) [PWRF] ACPI: Power Button (CM) [PWRB] ACPI: Invalid PBLK length [5] ACPI: Invalid PBLK length [5] ACPI: Invalid PBLK length [5] ACPI: Invalid PBLK length [5] ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707] ACPI: Getting cpuindex for acpiid 0x4 ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707] ACPI: Getting cpuindex for acpiid 0x5 ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707] ACPI: Getting cpuindex for acpiid 0x6 ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707] ACPI: Getting cpuindex for acpiid 0x7 Real Time Clock Driver v1.12ac Linux agpgart interface v0.101 (c) Dave Jones Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A floppy0: no floppy controllers found RAMDISK driver 
initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
Intel(R) PRO/1000 Network Driver - version 7.1.9-k6
Copyright (c) 1999-2006 Intel Corporation.
GSI 17 sharing vector 0x42 and IRQ 17
ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 18 (level, low) -> IRQ 66
e1000: 0000:07:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:04:23:cf:2d:d2e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
GSI 18 sharing vector 0x4A and IRQ 18
ACPI: PCI Interrupt 0000:07:00.1[B] -> GSI 19 (level, low) -> IRQ 74
e1000: 0000:07:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:04:23:cf:2d:d3
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
e100: Intel(R) PRO/100 Network Driver, 3.5.10-k4-NAPI
e100: Cohda: DV-28E-N, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 24X DVD-ROM drive, 256kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
megaraid: 2.20.4.9 (Release Date: Sun Jul 16 12:27:22 EST 2006)
megasas: 00.00.03.01 Sun May 14 22:49:52 PDT 2006
megasas: 0x1000:0x0411:0x8086:0x3501: bus 4:slot 14:func 0
ACPI: PCI Interrupt 0000:04:0e.0[A] -> GSI 18 (level, low) -> IRQ 66
scsi0 : LSI Logic SAS based MegaRAID driver
scsi 0:0:0:0: Direct-Access ATA HDT722525DLA380 A80A PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:2:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:3:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:4:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:2:0:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
scsi 0:2:1:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
swapper invoked oom-killer: gfp_mask=0xd1, order=0, oomkilladj=0

Call Trace:
[<ffffffff8020acfa>] show_trace+0x34/0x47
[<ffffffff8020ad1f>] dump_stack+0x12/0x17
[<ffffffff8025a27f>] out_of_memory+0x79/0x282
[<ffffffff8025bce3>] __alloc_pages+0x229/0x2b2
[<ffffffff80274ec2>] cache_grow+0x134/0x333
[<ffffffff802753d2>] cache_alloc_refill+0x17e/0x1cc
[<ffffffff80275820>] kmem_cache_alloc+0x6c/0x76
[<ffffffff8047b8fb>] sd_revalidate_disk+0x3a/0xcdb
[<ffffffff8047d39d>] sd_probe+0x28b/0x31e
[<ffffffff803ee8b7>] really_probe+0x47/0xc9
[<ffffffff803eeaa8>] __driver_attach+0x6f/0xaf
[<ffffffff803ee29c>] bus_for_each_dev+0x43/0x6e
[<ffffffff803eddbc>] bus_add_driver+0x6b/0x18d
[<ffffffff80207184>] init+0x13e/0x306
[<ffffffff8020a3f8>] child_rip+0xa/0x12
DWARF2 unwinder stuck at child_rip+0xa/0x12
Leftover inexact backtrace:
[<ffffffff803abf92>] acpi_ds_init_one_object+0x0/0x82
[<ffffffff80207046>] init+0x0/0x306
[<ffffffff8020a3ee>] child_rip+0x0/0x12

Mem-info:
Node 0 DMA per-cpu:
cpu 0 hot: high 186, batch 31 used:0
cpu 0 cold: high 62, batch 15 used:0
cpu 1 hot: high 186, batch 31 used:0
cpu 1 cold: high 62, batch 15 used:0
cpu 2 hot: high 186, batch 31 used:0
cpu 2 cold: high 62, batch 15 used:0
cpu 3 hot: high 186, batch 31 used:0
cpu 3 cold: high 62, batch 15 used:0
Node 0 DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:19
cpu 0 cold: high 62, batch 15 used:0
cpu 1 hot: high 186, batch 31 used:16
cpu 1 cold: high 62, batch 15 used:0
cpu 2 hot: high 186, batch 31 used:158
cpu 2 cold: high 62, batch 15 used:0
cpu 3 hot: high 186, batch 31 used:10
cpu 3 cold: high 62, batch 15 used:0
Node 0 Normal per-cpu: empty
Active:0 inactive:0 dirty:0 writeback:0 unstable:0 free:509486 slab:1362 mapped:0 pagetables:0
Node 0 DMA free:1616kB min:143085642166168kB low:178857052707708kB high:214628463249252kB active:0kB inactive:0kB present:18446744073709538996kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2026 2026
Node 0 DMA32 free:2036328kB min:5776kB low:7220kB high:8664kB active:0kB inactive:0kB present:2075356kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
Node 0 DMA: 2*4kB 3*8kB 1*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1616kB
Node 0 DMA32: 0*4kB 1*8kB 0*16kB 1*32kB 5*64kB 8*128kB 3*256kB 1*512kB 0*1024kB 1*2048kB 496*4096kB = 2036328kB
Node 0 Normal: empty
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap = 0kB
Total swap = 0kB
Free swap: 0kB
523264 pages of RAM
10232 reserved pages
0 pages shared
0 pages swap cached
Kernel panic - not syncing: Out of memory and no killable processes...

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401

--
VGER BF report: U 0.5

* Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.
2006-09-02 3:24 ` Paul Jackson
@ 2006-09-04 9:45 ` Mel Gorman
2006-09-06 22:10 ` Paul Jackson
0 siblings, 1 reply; 10+ messages in thread
From: Mel Gorman @ 2006-09-04 9:45 UTC (permalink / raw)
To: Paul Jackson
Cc: linux-kernel, akpm, haveblue, apw, ak, benh, paulus, kmannth,
tony.luck, kamezawa.hiroyu, y-goto

On (01/09/06 20:24), Paul Jackson didst pronounce:
> > That is a bit of a sickener. It may be worth getting your good lab tech
> > to check if there is a configuration setting in the hardware for
> > simulating console output before you make the trip.
>
> Apparently my lab setup simply lacks correct flow control on the serial
> console line. I hacked the 8250 serial driver in my kernel to put a one
> msec delay between each character output, and it no longer drops
> console output during boot.
>

Nice work.

> > > This may take a day or three to yield results, unless I get lucky.
> > >
> >
> > I have Keith's problem with reserve-based-hot-add to keep me occupied in
> > the meantime. Whenever you get the chance will be fine. Thanks a lot
>
> Ok, below is the console output for one of these crashes.
>
> This output is missing the first couple dozen lines commencing with
> grub announcing it is loading my kernel, as those lines seem to go via
> a different serial driver that I didn't chase down to hack. Those
> initial lines were still dropping lotsa chars. If you need those
> initial lines bad, holler, and I can probably hack something to get
> them to show up.
>

I could do with those lines, but I believe there was enough information
printed to determine why it failed to boot. I've attached a patch that
should boot the machine and assuming it works, I just need the output of
dmesg.

> By the way, the crash continues to happen 100% with the patch:
>
> patches/account-for-memmap-and-optionally-the-kernel-image-as-holes.patch
>

Not surprising considering what the min_free_kbytes is from this output!

> Node 0 DMA free:1616kB min:143085642166168kB low:178857052707708kB high:214628463249252kB active:0kB inactive:0kB present:18446744073709538996kB pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 2026 2026
> Node 0 DMA32 free:2036328kB min:5776kB low:7220kB high:8664kB active:0kB inactive:0kB present:2075356kB pages_scanned:0 all_unreclaimable? no
>

I believe it is because memmap was calculated to be bigger than it
possibly could be. Can you try booting the following patch with loglevel=8
and send me the dmesg output if it boots please? Thanks

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c linux-2.6.18-rc4-mm3-fix_accountmemmap/mm/page_alloc.c
--- linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c 2006-08-28 15:05:30.000000000 +0100
+++ linux-2.6.18-rc4-mm3-fix_accountmemmap/mm/page_alloc.c 2006-09-04 10:36:04.000000000 +0100
@@ -2373,7 +2373,9 @@ unsigned long __meminit account_memmap(s
 	if (zone_index == memmap_zone_idx(pgdat->node_mem_map)) {
 		pages = pgdat->node_spanned_pages;
 		pages = (pages * sizeof(struct page)) >> PAGE_SHIFT;
-		printk(KERN_DEBUG "%lu pages used for memmap\n", pages);
+		printk(KERN_DEBUG
+			" %s zone: %lu pages used for memmap\n",
+			zone_names[zone_index], pages);
 	}
 	return pages;
 }
@@ -2411,7 +2413,9 @@ unsigned long account_memmap(struct pgli
 	}

 	pages >>= PAGE_SHIFT;
-	printk(KERN_DEBUG "%lu pages used for SPARSE memmap\n", pages);
+	printk(KERN_DEBUG
+		" %s zone: %lu pages used for SPARSEMEM memmap\n",
+		zone_names[zone_index], pages);
 	return pages;
 }
 #endif
@@ -2437,17 +2441,24 @@ static void __meminit free_area_init_cor
 	for (j = 0; j < MAX_NR_ZONES; j++) {
 		struct zone *zone = pgdat->node_zones + j;
-		unsigned long size, realsize;
+		unsigned long size, realsize, memmap_size;

 		size = zone_spanned_pages_in_node(nid, j, zones_size);
 		realsize = size - zone_absent_pages_in_node(nid, j,
 								zholes_size);

-		realsize -= account_memmap(pgdat, j);
+		/* Account for the size of mem_map */
+		memmap_size = account_memmap(pgdat, j);
+		if (realsize >= memmap_size)
+			realsize -= memmap_size;
+		else
+			printk(KERN_WARNING "memmap_size of %lu exceeds %lu\n",
+				memmap_size, realsize);

+		/* Account for reserved DMA pages */
 		if (j == ZONE_DMA && realsize > dma_reserve) {
 			realsize -= dma_reserve;
-			printk(KERN_DEBUG "%lu pages DMA reserved\n",
+			printk(KERN_DEBUG " DMA zone: %lu pages reserved\n",
 								dma_reserve);
 		}

--
VGER BF report: U 0.499996

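The failure mode Mel describes can be checked with a little unsigned arithmetic. The sketch below is an illustration, not kernel code: the helper names are hypothetical, `uint64_t` stands in for the kernel's 64-bit `unsigned long`, and the 56-byte `struct page` size is an assumption (it happens to reproduce the 7154-page memmap figure reported in the dmesg output later in this thread for a node spanning 523264 pages). Subtracting 7154 from the 3999 pages of the DMA zone wraps around, and converting the wrapped value to kB gives exactly the `present:18446744073709538996kB` seen in the crash output. The guarded variant mirrors the `realsize >= memmap_size` check in the patch above.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 kB pages, as on x86_64 */

/* Convert a page count to kB, the way the Mem-info dump reports sizes. */
static inline uint64_t pages_to_kb(uint64_t pages)
{
	return pages << (PAGE_SHIFT - 10);
}

/*
 * Hypothetical helper: pages needed for the memmap of a node that spans
 * `spanned` pages, given an assumed sizeof(struct page).
 */
static inline uint64_t memmap_pages(uint64_t spanned, uint64_t page_struct_size)
{
	return (spanned * page_struct_size) >> PAGE_SHIFT;
}

/* Unconditional subtraction: wraps around when memmap > realsize. */
static inline uint64_t realsize_unguarded(uint64_t realsize, uint64_t memmap)
{
	return realsize - memmap;
}

/* Guarded subtraction: leaves realsize alone if the memmap does not fit. */
static inline uint64_t realsize_guarded(uint64_t realsize, uint64_t memmap)
{
	if (realsize >= memmap)
		realsize -= memmap;
	return realsize;
}
```

With the values from this thread, `pages_to_kb(realsize_unguarded(3999, 7154))` evaluates to 18446744073709538996, while `realsize_guarded(3999, 7154)` leaves 3999 untouched; subtracting the 1732 reserved DMA pages then gives the 2267 pages that the patched kernel reports for the DMA zone.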
* Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.
2006-09-04 9:45 ` Mel Gorman
@ 2006-09-06 22:10 ` Paul Jackson
2006-09-07 14:16 ` Mel Gorman
0 siblings, 1 reply; 10+ messages in thread
From: Paul Jackson @ 2006-09-06 22:10 UTC (permalink / raw)
To: Mel Gorman
Cc: linux-kernel, akpm, haveblue, apw, ak, benh, paulus, kmannth,
tony.luck, kamezawa.hiroyu, y-goto

Mel Gorman wrote:
> I could do with those lines, but I believe there was enough information
> printed to determine why it failed to boot. I've attached a patch that
> should boot the machine and assuming it works, I just need the output of
> dmesg.

Yup - that patch booted it, and produced the output you asked for.

Here's the dmesg output from booting your patch:

Linux version 2.6.18-rc4-mm3 (pj@spandau) (gcc version 4.1.0 (SUSE Linux)) #60 SMP Wed Sep 6 16:34:36 CDT 2006
Command line: root=/dev/sda3 console=ttyS1,115200 showopts pj1
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
BIOS-e820: 000000000009f000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007f932000 (usable)
BIOS-e820: 000000007f932000 - 000000007f9d0000 (ACPI NVS)
BIOS-e820: 000000007f9d0000 - 000000007fa42000 (usable)
BIOS-e820: 000000007fa42000 - 000000007fa9a000 (reserved)
BIOS-e820: 000000007fa9a000 - 000000007fad6000 (usable)
BIOS-e820: 000000007fad6000 - 000000007fb1a000 (ACPI NVS)
BIOS-e820: 000000007fb1a000 - 000000007fb2b000 (usable)
BIOS-e820: 000000007fb2b000 - 000000007fb3a000 (ACPI data)
BIOS-e820: 000000007fb3a000 - 000000007fc00000 (usable)
BIOS-e820: 00000000ffc00000 - 00000000ffc0c000 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 3200 used
Entering add_active_range(0, 256, 522546) 1 entries of 3200 used
Entering add_active_range(0, 522704, 522818) 2 entries of 3200 used
Entering add_active_range(0, 522906, 522966) 3 entries of 3200 used
Entering add_active_range(0, 523034, 523051) 4 entries of 3200 used
Entering add_active_range(0, 523066, 523264) 5 entries of 3200 used
end_pfn_map = 1047564
DMI 2.4 present.
ACPI: RSDP (v002 INTEL ) @ 0x00000000000f0350
ACPI: XSDT (v001 INTEL S5000PAL 0x00000000 INTL 0x01000013) @ 0x000000007fb39120
ACPI: FADT (v003 INTEL S5000PAL 0x00000000 INTL 0x01000013) @ 0x000000007fb36000
ACPI: MADT (v001 INTEL S5000PAL 0x00000000 INTL 0x01000013) @ 0x000000007fb35000
ACPI: SPCR (v001 INTEL S5000PAL 0x00000000 INTL 0x01000013) @ 0x000000007fb2e000
ACPI: HPET (v001 INTEL S5000PAL 0x00000001 INTL 0x01000013) @ 0x000000007fb2d000
ACPI: MCFG (v001 INTEL S5000PAL 0x00000001 INTL 0x01000013) @ 0x000000007fb2c000
ACPI: SSDT (v002 INTEL S5000PAL 0x00004000 INTL 0x01000013) @ 0x000000007fb2b000
ACPI: DSDT (v002 INTEL S5000PAL 0x00000008 INTL 0x01000013) @ 0x0000000000000000
No NUMA configuration found
Faking a node at 0000000000000000-000000007fc00000
Entering add_active_range(0, 0, 159) 0 entries of 3200 used
Entering add_active_range(0, 256, 522546) 1 entries of 3200 used
Entering add_active_range(0, 522704, 522818) 2 entries of 3200 used
Entering add_active_range(0, 522906, 522966) 3 entries of 3200 used
Entering add_active_range(0, 523034, 523051) 4 entries of 3200 used
Entering add_active_range(0, 523066, 523264) 5 entries of 3200 used
Bootmem setup node 0 0000000000000000-000000007fc00000
Zone PFN ranges:
  DMA 0 -> 4096
  DMA32 4096 -> 1048576
  Normal 1048576 -> 1048576
early_node_map[6] active PFN ranges
  0: 0 -> 159
  0: 256 -> 522546
  0: 522704 -> 522818
  0: 522906 -> 522966
  0: 523034 -> 523051
  0: 523066 -> 523264
On node 0 totalpages: 522838
  DMA zone: 7154 pages used for memmap
memmap_size of 7154 exceeds 3999
  DMA zone: 1732 pages reserved
  DMA zone: 2267 pages, LIFO batch:0
  DMA32 zone: 518839 pages, LIFO batch:31
ACPI: PM-Timer IO Port: 0x408
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x06] enabled)
Processor #6
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor
#1 ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled) Processor #7 ACPI: LAPIC (acpi_id[0x04] lapic_id[0x84] disabled) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x85] disabled) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x86] disabled) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x87] disabled) ACPI: LAPIC_NMI (acpi_id[0x00] high level lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x01] high level lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x02] high level lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x03] high level lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x04] high level lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x05] high level lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x06] high level lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x07] high level lint[0x1]) ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 8, address 0xfec00000, GSI 0-23 ACPI: IOAPIC (id[0x09] address[0xfec80000] gsi_base[24]) IOAPIC[1]: apic_id 9, address 0xfec80000, GSI 24-47 ACPI: IOAPIC (id[0x0a] address[0xfec84000] gsi_base[48]) IOAPIC[2]: apic_id 10, address 0xfec84000, GSI 48-71 ACPI: IOAPIC (id[0x0b] address[0xfec84400] gsi_base[72]) IOAPIC[3]: apic_id 11, address 0xfec84400, GSI 72-95 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. Setting APIC routing to physical flat ACPI: HPET id: 0x8086a201 base: 0xfed00000 Using ACPI (MADT) for SMP configuration information Allocating PCI resources starting at 80000000 (gap: 7fc00000:80000000) SMP: Allowing 8 CPUs, 4 hotplug CPUs PERCPU: Allocating 32000 bytes of per cpu data Built 1 zonelists. Total pages: 521106 Kernel command line: root=/dev/sda3 console=ttyS1,115200 showopts pj1 Initializing CPU#0 PID hash table entries: 4096 (order: 12, 32768 bytes) Console: colour VGA+ 80x25 Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) Checking aperture... 
Memory: 2052128k/2093056k available (3520k kernel code, 39224k reserved, 2327k data, 280k init) Calibrating delay using timer specific routine.. 5324.65 BogoMIPS (lpj=10649309) Mount-cache hash table entries: 256 CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 4096K CPU 0/0 -> Node 0 using mwait in idle threads. CPU: Physical Processor ID: 0 CPU: Processor Core ID: 0 CPU0: Thermal monitoring enabled (TM2) SMP alternatives: switching to UP code ACPI: Core revision 20060707 Using local APIC timer interrupts. result 20781307 Detected 20.781 MHz APIC timer. SMP alternatives: switching to SMP code Booting processor 1/4 APIC 0x6 Initializing CPU#1 Calibrating delay using timer specific routine.. 5320.18 BogoMIPS (lpj=10640368) CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 4096K CPU 1/6 -> Node 0 CPU: Physical Processor ID: 3 CPU: Processor Core ID: 0 CPU1: Thermal monitoring enabled (TM2) Genuine Intel(R) CPU @ 2.66GHz stepping 04 SMP alternatives: switching to SMP code Booting processor 2/4 APIC 0x1 Initializing CPU#2 Calibrating delay using timer specific routine.. 5320.13 BogoMIPS (lpj=10640272) CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 4096K CPU 2/1 -> Node 0 CPU: Physical Processor ID: 0 CPU: Processor Core ID: 1 CPU2: Thermal monitoring enabled (TM2) Genuine Intel(R) CPU @ 2.66GHz stepping 04 SMP alternatives: switching to SMP code Booting processor 3/4 APIC 0x7 Initializing CPU#3 Calibrating delay using timer specific routine.. 5320.15 BogoMIPS (lpj=10640316) CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 4096K CPU 3/7 -> Node 0 CPU: Physical Processor ID: 3 CPU: Processor Core ID: 1 CPU3: Thermal monitoring enabled (TM2) Genuine Intel(R) CPU @ 2.66GHz stepping 04 Brought up 4 CPUs testing NMI watchdog ... OK. time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer. time.c: Detected 2660.008 MHz processor. 
migration_cost=24,8015 NET: Registered protocol family 16 ACPI: bus type pci registered PCI: Using MMCONFIG at a0000000 ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) PCI: Probing PCI hardware (bus 00) PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1 Losing some ticks... checking if CPU frequency changed. PCI: PXH quirk detected, disabling MSI for SHPC device PCI: PXH quirk detected, disabling MSI for SHPC device Boot video device is 0000:10:0c.0 PCI: Transparent bridge - 0000:00:1e.0 ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P32_._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCE4._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCE5._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.EXPC._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.EXPC.PXHA._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.EXPC.PXHB._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCE7._PRT] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 7 *10 11) ACPI: PCI Interrupt Link [LNKB] (IRQs 5 7 10 *11) ACPI: PCI Interrupt Link [LNKC] (IRQs 5 7 *10 11) ACPI: PCI Interrupt Link [LNKD] (IRQs *5 7 10 11) ACPI: PCI Interrupt Link [LNKE] (IRQs *5 7 10 11) ACPI: PCI Interrupt Link [LNKF] (IRQs 5 7 10 11) *0, disabled. 
ACPI: PCI Interrupt Link [LNKG] (IRQs 5 7 *10 11) ACPI: PCI Interrupt Link [LNKH] (IRQs 5 7 10 *11) ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIE._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIW._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIW.PCIO._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIW.PCIO.PCIA._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIW.PCIP._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE.PCIW.PCIQ._PRT] Intel 82802 RNG detected SCSI subsystem initialized usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 hpet0: 3 64-bit timers, 14318180 Hz PCI-GART: No AMD northbridge found. PCI: Bridge: 0000:03:00.0 IO window: disabled. MEM window: b8900000-b89fffff PREFETCH window: b8b00000-b8bfffff PCI: Bridge: 0000:03:00.2 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:02:00.0 IO window: disabled. MEM window: b8900000-b89fffff PREFETCH window: b8b00000-b8bfffff PCI: Bridge: 0000:02:01.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:02:02.0 IO window: 2000-2fff MEM window: b8000000-b88fffff PREFETCH window: disabled. PCI: Bridge: 0000:01:00.0 IO window: 2000-2fff MEM window: b8000000-b89fffff PREFETCH window: b8b00000-b8bfffff PCI: Bridge: 0000:01:00.3 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:02.0 IO window: 2000-2fff MEM window: b8000000-b8afffff PREFETCH window: b8b00000-b8bfffff PCI: Bridge: 0000:00:03.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:04.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. 
PCI: Bridge: 0000:00:05.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:0c:00.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:0c:00.2 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:06.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:07.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:1e.0 IO window: 1000-1fff MEM window: b8c00000-b8cfffff PREFETCH window: b0000000-b7ffffff GSI 16 sharing vector 0xA9 and IRQ 16 ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 169 PCI: Setting latency timer of device 0000:00:02.0 to 64 ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 169 PCI: Setting latency timer of device 0000:01:00.0 to 64 ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 169 PCI: Setting latency timer of device 0000:02:00.0 to 64 PCI: Setting latency timer of device 0000:03:00.0 to 64 PCI: Setting latency timer of device 0000:03:00.2 to 64 ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 16 (level, low) -> IRQ 169 PCI: Setting latency timer of device 0000:02:01.0 to 64 ACPI: PCI Interrupt 0000:02:02.0[A] -> GSI 16 (level, low) -> IRQ 169 PCI: Setting latency timer of device 0000:02:02.0 to 64 PCI: Setting latency timer of device 0000:01:00.3 to 64 ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 16 (level, low) -> IRQ 169 PCI: Setting latency timer of device 0000:00:03.0 to 64 ACPI: PCI Interrupt 0000:00:04.0[A] -> GSI 16 (level, low) -> IRQ 169 PCI: Setting latency timer of device 0000:00:04.0 to 64 ACPI: PCI Interrupt 0000:00:05.0[A] -> GSI 16 (level, low) -> IRQ 169 PCI: Setting latency timer of device 0000:00:05.0 to 64 ACPI: PCI Interrupt 0000:00:06.0[A] -> GSI 16 (level, low) -> IRQ 169 PCI: Setting latency timer of device 0000:00:06.0 to 64 PCI: Setting latency timer of device 0000:0c:00.0 to 64 PCI: 
Setting latency timer of device 0000:0c:00.2 to 64 ACPI: PCI Interrupt 0000:00:07.0[A] -> GSI 16 (level, low) -> IRQ 169 PCI: Setting latency timer of device 0000:00:07.0 to 64 PCI: Setting latency timer of device 0000:00:1e.0 to 64 NET: Registered protocol family 2 IP route cache hash table entries: 65536 (order: 7, 524288 bytes) TCP established hash table entries: 262144 (order: 10, 4194304 bytes) TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 262144 bind 65536) TCP reno registered Total HugeTLB memory allocated, 0 Installing knfsd (copyright (C) 1996 okir@monad.swb.de). SGI XFS with large block/inode numbers, no debug enabled io scheduler noop registered io scheduler deadline registered io scheduler cfq registered (default) 0000:00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001 PCI: Setting latency timer of device 0000:00:02.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:02.0:pcie00] Allocate Port Service[0000:00:02.0:pcie01] PCI: Setting latency timer of device 0000:00:03.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:03.0:pcie00] Allocate Port Service[0000:00:03.0:pcie01] PCI: Setting latency timer of device 0000:00:04.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:04.0:pcie00] Allocate Port Service[0000:00:04.0:pcie01] Allocate Port Service[0000:00:04.0:pcie02] PCI: Setting latency timer of device 0000:00:05.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:05.0:pcie00] Allocate Port Service[0000:00:05.0:pcie01] Allocate Port Service[0000:00:05.0:pcie02] PCI: Setting latency timer of device 0000:00:06.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:06.0:pcie00] Allocate Port Service[0000:00:06.0:pcie01] Allocate Port Service[0000:00:06.0:pcie02] PCI: Setting latency timer of device 0000:00:07.0 to 64 assign_interrupt_mode Found MSI 
capability
Allocate Port Service[0000:00:07.0:pcie00]
Allocate Port Service[0000:00:07.0:pcie01]
PCI: Setting latency timer of device 0000:01:00.0 to 64
Allocate Port Service[0000:01:00.0:pcie10]
Allocate Port Service[0000:01:00.0:pcie11]
PCI: Setting latency timer of device 0000:02:00.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:02:00.0:pcie20]
Allocate Port Service[0000:02:00.0:pcie21]
Allocate Port Service[0000:02:00.0:pcie22]
PCI: Setting latency timer of device 0000:02:01.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:02:01.0:pcie20]
Allocate Port Service[0000:02:01.0:pcie21]
PCI: Setting latency timer of device 0000:02:02.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:02:02.0:pcie20]
Allocate Port Service[0000:02:02.0:pcie21]
Evaluate _OSC Set fails. Status = 0x0005
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:02.0:pcie01 failed with error 2
aer_init: AER service init fails - No ACPI _OSC support
aer: probe of 0000:00:03.0:pcie01 failed with error 1
Evaluate _OSC Set fails. Status = 0x0005
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:04.0:pcie01 failed with error 2
Evaluate _OSC Set fails. Status = 0x0005
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:05.0:pcie01 failed with error 2
Evaluate _OSC Set fails. Status = 0x0005
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:06.0:pcie01 failed with error 2
Evaluate _OSC Set fails. Status = 0x0005
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:07.0:pcie01 failed with error 2
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ACPI: Invalid PBLK length [5]
ACPI: Invalid PBLK length [5]
ACPI: Invalid PBLK length [5]
ACPI: Invalid PBLK length [5]
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x4
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x5
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x6
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x7
Real Time Clock Driver v1.12ac
hpet_resources: 0xfed00000 is busy
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
floppy0: no floppy controllers found
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
Intel(R) PRO/1000 Network Driver - version 7.1.9-k6
Copyright (c) 1999-2006 Intel Corporation.
GSI 17 sharing vector 0x42 and IRQ 17
ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 18 (level, low) -> IRQ 66
PCI: Setting latency timer of device 0000:07:00.0 to 64
e1000: 0000:07:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:04:23:cf:2d:d2
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
GSI 18 sharing vector 0x4A and IRQ 18
ACPI: PCI Interrupt 0000:07:00.1[B] -> GSI 19 (level, low) -> IRQ 74
PCI: Setting latency timer of device 0000:07:00.1 to 64
e1000: 0000:07:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:04:23:cf:2d:d3
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
e100: Intel(R) PRO/100 Network Driver, 3.5.10-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.57.
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
netconsole: not configured, aborting
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ESB2: IDE controller at PCI slot 0000:00:1f.1
GSI 19 sharing vector 0x52 and IRQ 19
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 20 (level, low) -> IRQ 82
ESB2: chipset revision 9
ESB2: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x30a0-0x30a7, BIOS settings: hda:DMA, hdb:pio
Probing IDE interface ide0...
hda: DV-28E-N, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hda: ATAPI 24X DVD-ROM drive, 256kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
megaraid: 2.20.4.9 (Release Date: Sun Jul 16 12:27:22 EST 2006)
megasas: 00.00.03.01 Sun May 14 22:49:52 PDT 2006
megasas: 0x1000:0x0411:0x8086:0x3501: bus 4:slot 14:func 0
ACPI: PCI Interrupt 0000:04:0e.0[A] -> GSI 18 (level, low) -> IRQ 66
scsi0 : LSI Logic SAS based MegaRAID driver
scsi 0:0:0:0: Direct-Access ATA HDT722525DLA380 A80A PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:2:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:3:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:0:4:0: Direct-Access ATA HDS725050KLA360 AB0A PQ: 0 ANSI: 5
scsi 0:2:0:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
scsi 0:2:1:0: Direct-Access INTEL SROMBSAS18E 1.00 PQ: 0 ANSI: 5
SCSI device sda: 486326272 512-byte hdwr sectors (248999 MB)
sda: test WP failed, assume Write Enabled
sda: asking for cache data failed
sda: assuming drive cache: write through
SCSI device sda: 486326272 512-byte hdwr sectors (248999 MB)
sda: test WP failed, assume Write Enabled
sda: asking for cache data failed
sda: assuming drive cache: write through
sda: sda1 sda2 sda3
sd 0:2:0:0: Attached scsi disk sda
SCSI device sdb: 2923825152 512-byte hdwr sectors (1496998 MB)
sdb: test WP failed, assume Write Enabled
sdb: asking for cache data failed
sdb: assuming drive cache: write through
SCSI device sdb: 2923825152 512-byte hdwr sectors (1496998 MB)
sdb: test WP failed, assume Write Enabled
sdb: asking for cache data failed
sdb: assuming drive cache: write through
sdb: sdb1
sd 0:2:1:0: Attached scsi disk sdb
sd 0:2:0:0: Attached scsi generic sg0 type 0
sd 0:2:1:0: Attached scsi generic sg1 type 0
Fusion MPT base driver 3.04.01
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SPI Host driver 3.04.01
Fusion MPT SAS Host driver 3.04.01
ieee1394: raw1394: /dev/raw1394 device initialized
GSI 20 sharing vector 0x5A and IRQ 20
ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 23 (level, low) -> IRQ 90
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1d.7: debug port 1
PCI: cache line size of 32 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: irq 90, io mem 0xb8d00000
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: new device found, idVendor=0000, idProduct=0000
usb usb1: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: EHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.18-rc4-mm3 ehci_hcd
usb usb1: SerialNumber: 0000:00:1d.7
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 8 ports detected
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
USB Universal Host Controller Interface driver v3.0
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 23 (level, low) -> IRQ 90
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:1d.0: irq 90, io base 0x00003080
usb usb2: new device found, idVendor=0000, idProduct=0000
usb usb2: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: UHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.18-rc4-mm3 uhci_hcd
usb usb2: SerialNumber: 0000:00:1d.0
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
GSI 21 sharing vector 0x62 and IRQ 21
ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 22 (level, low) -> IRQ 98
PCI: Setting latency timer of device 0000:00:1d.1 to 64
uhci_hcd 0000:00:1d.1: UHCI Host Controller
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1d.1: irq 98, io base 0x00003060
usb usb3: new device found, idVendor=0000, idProduct=0000
usb usb3: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: UHCI Host Controller
usb usb3: Manufacturer: Linux 2.6.18-rc4-mm3 uhci_hcd
usb usb3: SerialNumber: 0000:00:1d.1
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 23 (level, low) -> IRQ 90
PCI: Setting latency timer of device 0000:00:1d.2 to 64
uhci_hcd 0000:00:1d.2: UHCI Host Controller
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1d.2: irq 90, io base 0x00003040
usb usb4: new device found, idVendor=0000, idProduct=0000
usb usb4: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb4: Product: UHCI Host Controller
usb usb4: Manufacturer: Linux 2.6.18-rc4-mm3 uhci_hcd
usb usb4: SerialNumber: 0000:00:1d.2
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 22 (level, low) -> IRQ 98
PCI: Setting latency timer of device 0000:00:1d.3 to 64
uhci_hcd 0000:00:1d.3: UHCI Host Controller
uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:1d.3: irq 98, io base 0x00003020
usb usb5: new device found, idVendor=0000, idProduct=0000
usb usb5: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb5: Product: UHCI Host Controller
usb usb5: Manufacturer: Linux 2.6.18-rc4-mm3 uhci_hcd
usb usb5: SerialNumber: 0000:00:1d.3
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
usbcore: registered new interface driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
usbcore: registered new interface driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
Intel 810 + AC97 Audio, version 1.01, 16:33:05 Sep 6 2006
oprofile: using timer interrupt.
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
input: AT Translated Set 2 keyboard as /class/input/input0
ACPI: (supports S0 S1 S4 S5)
logips2pp: Detected unknown logitech mouse model 1
input: PS/2 Logitech Mouse as /class/input/input1
XFS mounting filesystem sda3
Ending clean XFS mount for filesystem: sda3
VFS: Mounted root (xfs filesystem) readonly.
Freeing unused kernel memory: 280k freed
Adding 9438176k swap on /dev/disk/by-id/scsi-3600062b0000011200c86316cf30a2b4c-part2. Priority:-1 extents:1 across:9438176k
ADDRCONF(NETDEV_UP): eth0: link is not ready
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: x86_64 account-for-memmap patch in 2.6.18-rc4-mm3 doesn't boot.
2006-09-06 22:10 ` Paul Jackson
@ 2006-09-07 14:16 ` Mel Gorman
2006-09-07 14:27 ` [PATCH] Fix memmap accounting by approximating the map size Mel Gorman
0 siblings, 1 reply; 10+ messages in thread
From: Mel Gorman @ 2006-09-07 14:16 UTC (permalink / raw)
To: Paul Jackson
Cc: Mel Gorman, linux-kernel, akpm, haveblue, apw, ak, benh, paulus,
kmannth, tony.luck, kamezawa.hiroyu, y-goto

On Wed, 6 Sep 2006, Paul Jackson wrote:

> Mel Gorman wrote:
>> I could do with those lines, but I believe there was enough information
>> printed to determine why it failed to boot. I've attached a patch that
>> should boot the machine and assuming it works, I just need the output of
>> dmesg.
>
> Yup - that patch booted it, and produced the output you asked for.
>
> Here's the dmesg output from booting your patch:
>
> <dmesg log snipped>

Thanks. Now it's *painfully* obvious what went wrong - memmap is not
necessarily in one zone and in your machine memmap spanned two zones. A
patch will follow this mail that fixes the underlying issue but keeps the
underflow check in case. Please give it a test if you get the chance. It
passes regression tests here.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] Fix memmap accounting by approximating the map size
2006-09-07 14:16 ` Mel Gorman
@ 2006-09-07 14:27 ` Mel Gorman
2006-09-07 15:33 ` Paul Jackson
0 siblings, 1 reply; 10+ messages in thread
From: Mel Gorman @ 2006-09-07 14:27 UTC (permalink / raw)
To: Paul Jackson
Cc: linux-kernel, akpm, haveblue, apw, ak, benh, paulus, kmannth,
tony.luck, kamezawa.hiroyu, y-goto

Arch-independent zone-sizing uses account_memmap() in an attempt to
accurately account for how much memory was used in a zone by memmap.
Watermarks and per-cpu sizes initialisations then take the memmap into
account. However, the memmap may span multiple zones and in one case, there
was an underflow causing boot failures.

The fix that perfectly accounts for memory consumed by memmap is complicated
with no clear benefit. The architecture-specific code in x86_64 was simpler
because it approximated how much memory was consumed for memmap backing that
zone regardless of where the memmap was really stored. This patch ditches
the account_memmap() complexity and replaces with the simple approximation
used by x86_64 while ensuring no underflow occurs.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c linux-2.6.18-rc4-mm3-101_fix_account_memmap/mm/page_alloc.c
--- linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c	2006-08-28 15:05:30.000000000 +0100
+++ linux-2.6.18-rc4-mm3-101_fix_account_memmap/mm/page_alloc.c	2006-09-07 14:36:05.000000000 +0100
@@ -2364,58 +2364,6 @@ static void __init calculate_node_totalp
 							realtotalpages);
 }
 
-#ifdef CONFIG_FLAT_NODE_MEM_MAP
-/* Account for mem_map for CONFIG_FLAT_NODE_MEM_MAP */
-unsigned long __meminit account_memmap(struct pglist_data *pgdat,
-						int zone_index)
-{
-	unsigned long pages = 0;
-	if (zone_index == memmap_zone_idx(pgdat->node_mem_map)) {
-		pages = pgdat->node_spanned_pages;
-		pages = (pages * sizeof(struct page)) >> PAGE_SHIFT;
-		printk(KERN_DEBUG "%lu pages used for memmap\n", pages);
-	}
-	return pages;
-}
-#else
-/* Account for mem_map for CONFIG_SPARSEMEM */
-unsigned long account_memmap(struct pglist_data *pgdat, int zone_index)
-{
-	unsigned long pages = 0;
-	unsigned long memmap_pfn;
-	struct page *memmap_addr;
-	int pnum;
-	unsigned long pgdat_startpfn, pgdat_endpfn;
-	struct mem_section *section;
-
-	pgdat_startpfn = pgdat->node_start_pfn;
-	pgdat_endpfn = pgdat_startpfn + pgdat->node_spanned_pages;
-
-	/* Go through valid sections looking for memmap */
-	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
-		if (!valid_section_nr(pnum))
-			continue;
-
-		section = __nr_to_section(pnum);
-		if (!section_has_mem_map(section))
-			continue;
-
-		memmap_addr = __section_mem_map_addr(section);
-		memmap_pfn = (unsigned long)memmap_addr >> PAGE_SHIFT;
-
-		if (memmap_pfn < pgdat_startpfn || memmap_pfn >= pgdat_endpfn)
-			continue;
-
-		if (zone_index == memmap_zone_idx(memmap_addr))
-			pages += (PAGES_PER_SECTION * sizeof(struct page));
-	}
-
-	pages >>= PAGE_SHIFT;
-	printk(KERN_DEBUG "%lu pages used for SPARSE memmap\n", pages);
-	return pages;
-}
-#endif
-
 /*
  * Set up the zone data structures:
  *   - mark all pages reserved
@@ -2437,17 +2385,32 @@ static void __meminit free_area_init_cor
 
 	for (j = 0; j < MAX_NR_ZONES; j++) {
 		struct zone *zone = pgdat->node_zones + j;
-		unsigned long size, realsize;
+		unsigned long size, realsize, memmap_pages;
 
 		size = zone_spanned_pages_in_node(nid, j, zones_size);
 		realsize = size - zone_absent_pages_in_node(nid, j,
 								zholes_size);
 
-		realsize -= account_memmap(pgdat, j);
+		/*
+		 * Adjust realsize so that it accounts for how much memory
+		 * is used by this zone for memmap. This affects the watermark
+		 * and per-cpu initialisations
+		 */
+		memmap_pages = (size * sizeof(struct page)) >> PAGE_SHIFT;
+		if (realsize >= memmap_pages) {
+			realsize -= memmap_pages;
+			printk(KERN_DEBUG
+				" %s zone: %lu pages used for memmap\n",
+				zone_names[j], memmap_pages);
+		} else
+			printk(KERN_WARNING
+				" %s zone: %lu pages exceeds realsize %lu\n",
+				zone_names[j], memmap_pages, realsize);
+
 		/* Account for reserved DMA pages */
 		if (j == ZONE_DMA && realsize > dma_reserve) {
 			realsize -= dma_reserve;
-			printk(KERN_DEBUG "%lu pages DMA reserved\n",
+			printk(KERN_DEBUG " DMA zone: %lu pages reserved\n",
 								dma_reserve);
 		}
 

^ permalink raw reply [flat|nested] 10+ messages in thread
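
[Editorial sketch, not part of the thread] The approximation the patch adopts can be tried out in user space. The block below is a minimal illustration only, assuming 4 KiB pages and a hypothetical 56-byte struct page; the names sketch_memmap_pages(), sketch_realsize(), SKETCH_PAGE_SHIFT and SKETCH_STRUCT_PAGE_SIZE are invented here and do not exist in the kernel. It mirrors the guard in the patch: the memmap cost is subtracted only when it fits, so the unsigned page count cannot wrap around.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values for illustration; real ones depend on arch and config. */
#define SKETCH_PAGE_SHIFT        12UL  /* 4 KiB pages */
#define SKETCH_STRUCT_PAGE_SIZE  56UL  /* hypothetical sizeof(struct page) */

/* Approximate number of pages of memmap needed to describe `spanned` pages. */
static unsigned long sketch_memmap_pages(unsigned long spanned)
{
	return (spanned * SKETCH_STRUCT_PAGE_SIZE) >> SKETCH_PAGE_SHIFT;
}

/*
 * Usable pages in a zone after discounting holes and, when it fits,
 * the approximate memmap cost. The subtraction is skipped (with a
 * warning) rather than allowed to underflow.
 */
static unsigned long sketch_realsize(unsigned long spanned, unsigned long absent)
{
	unsigned long realsize, memmap_pages;

	assert(absent <= spanned);	/* holes cannot exceed the span */
	realsize = spanned - absent;
	memmap_pages = sketch_memmap_pages(spanned);

	if (realsize >= memmap_pages)
		realsize -= memmap_pages;
	else
		fprintf(stderr, "%lu memmap pages exceed realsize %lu\n",
			memmap_pages, realsize);

	return realsize;
}
```

With these assumed constants, a zone spanning 4096 pages costs 56 pages of memmap, so sketch_realsize(4096, 0) yields 4040. A zone that is almost entirely holes, such as sketch_realsize(4096, 4090), keeps its 6 present pages instead of wrapping to a huge unsigned value, which is the kind of underflow the thread identifies as the cause of the boot failure.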
* Re: [PATCH] Fix memmap accounting by approximating the map size
2006-09-07 14:27 ` [PATCH] Fix memmap accounting by approximating the map size Mel Gorman
@ 2006-09-07 15:33 ` Paul Jackson
0 siblings, 0 replies; 10+ messages in thread
From: Paul Jackson @ 2006-09-07 15:33 UTC (permalink / raw)
To: Mel Gorman
Cc: linux-kernel, akpm, haveblue, apw, ak, benh, paulus, kmannth,
tony.luck, kamezawa.hiroyu, y-goto

Mel wrote:
> This patch ditches the account_memmap() complexity and replaces with the
> simple approximation used by x86_64 while ensuring no underflow occurs.

Works for me, on my x86_64 box.

Thanks, Mel.

Acked-by: Paul Jackson <pj@sgi.com>

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply [flat|nested] 10+ messages in thread