From: Clark Williams <williams@redhat.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Robert Crocombe <rwcrocombe@raytheon.com>,
LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>
Subject: Re: 2.6.16-rtXYZ hangs at boot on quad Opteron
Date: Fri, 19 May 2006 11:50:48 -0500 [thread overview]
Message-ID: <446DF768.80205@redhat.com> (raw)
In-Reply-To: <Pine.LNX.4.58.0605120216001.26721@gandalf.stny.rr.com>
[-- Attachment #1: Type: text/plain, Size: 2206 bytes --]
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Steven Rostedt wrote:
> On Thu, 11 May 2006, Clark Williams wrote:
>
>> Robert Crocombe wrote:
>>> testing NMI watchdog ... OK.
>>>
>>> where we croak. On the non-realtime version, it is instead like so:
>> Yeah, this is where my Athlon64x2 goes into the weeds. I followed it
>> down into the routines that calculate processor migration costs and
>> got lost in the context switches. I suspect that the forced
>> migrations aggravate a problem down in the 64-bit arch specific code
>> that hasn't been looked at in a while (I believe most people running
>> AMD are doing so in 32-bit mode).
>>
>> Maybe it's time for round two...
>>
>
> Hmm, I run my AMDx2 in 64bit kernel mode and 32bit user space. But since
> it is my main machine I don't usually test the new -rt kernels on it that
> often. Looks like when I'm back in the States, I'll have to test it
> again.
>
> -- Steve
>
I spent some time this week trying to dig a bit deeper into why a
64-bit -rtXX won't boot on my Athlon64x2.
I've attached the output from one of my kernel boots (2.6.16-rt23)
that has lots of extra printks in kernel/sched.c and some in
arch/x86_64/kernel/smpboot.c. After spending time looking at the code
around the migration measurement, it occurred to me that things didn't
really look right for the migration thread on processor 1. I stuck a
print in the routine migration_thread() and noticed that I never
actually saw the migration thread for processor 1 get started (I did
see the print when migration_init() started the migration thread on
processor 0). Since the migration measurement code depends on all
migration threads being present, I suspect that somehow the kernel is
not completing initialization for the second processor.
My only thought at this point is to start comparing
arch/x86_64/kernel/smpboot.c to arch/i386/kernel/smpboot.c and see if
something isn't being done on the x86_64 side.
Clark
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org
iD8DBQFEbfdnHyuj/+TTEp0RAnIQAKDEYjJgVho6RXLllgDolZKGRGpPKACfTL/j
s4N1Cg9R3Sh/v87xfx45Vcc=
=WV8C
-----END PGP SIGNATURE-----
[-- Attachment #2: boot-2.6.16-rt23-x86_64x2.log --]
[-- Type: text/x-log, Size: 5281 bytes --]
Bootdata ok (command line is ro root=/dev/hda2 console=tty0 console=uart,io,0xbc00,115200n8 nmi_watchdog=1 migration)
Linux version 2.6.16-rt23 (williams@rt3) (gcc version 4.1.0 20060304 (Red Hat 4.1.0-3)) #154 SMP PREEMPT Fri May 19 6
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000100 - 000000000009f000 (usable)
BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
BIOS-e820: 0000000000100000 - 0000000037ef0000 (usable)
BIOS-e820: 0000000037ef0000 - 0000000037ef3000 (ACPI NVS)
BIOS-e820: 0000000037ef3000 - 0000000037f00000 (ACPI data)
BIOS-e820: 0000000038000000 - 0000000040000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
Nvidia board detected. Ignoring ACPI timer override.
ACPI: PM-Timer IO Port: 0x1008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:11 APIC version 16
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 15:11 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
Setting APIC routing to physical flat
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 50000000 (gap: 40000000:a0000000)
Checking aperture...
CPU 0: aperture @ 3dda000000 size 32 MB
Aperture from northbridge cpu 0 too small (32 MB)
No AGP bridge found
SMP: Allowing 2 CPUs, 0 hotplug CPUs
Real-Time Preemption Support (C) 2004-2006 Ingo Molnar
Built 1 zonelists
Kernel command line: ro root=/dev/hda2 console=tty0 console=uart,io,0xbc00,115200n8 nmi_watchdog=1 migration_debug=1
Initializing CPU#0
WARNING: experimental RCU implementation.
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 1.193182 MHz WALL PIT GTOD PIT timer.
time.c: Detected 2004.217 MHz processor.
Console: colour VGA+ 80x25
Early serial console at I/O port 0xbc00 (options '115200n8')
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Memory: 848936k/916416k available (4005k kernel code, 67088k reserved, 2105k data, 236k init)
Calibrating delay using timer specific routine.. 4010.93 BogoMIPS (lpj=2005468)
Security Framework v1.0.0 initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
activating NMI Watchdog ... done.
Using local APIC timer interrupts.
result 12526354
Detected 12.526 MHz APIC timer.
>>>>>>>>>> migration_thread(0): starting (pid: 2)
++++++++++++++++++++=_---CPU UP 1
Booting processor 1/2 APIC 0x1
Setting warm reset code and vector.
1.
2.
3.
Asserting INIT.
Waiting for send to finish...
+Deasserting INIT.
Waiting for send to finish...
+#startup loops: 2.
Sending STARTUP #1.
After apic_write.
Initializing CPU#1
CPU#1 (phys ID: 1) waiting for CALLOUT
Startup point 1.
Waiting for send to finish...
+Sending STARTUP #2.
After apic_write.
Startup point 1.
Waiting for send to finish...
+After Startup.
Before Callout 1.
After Callout 1.
CALLIN, before setup_local_APIC().
Calibrating delay using timer specific routine.. 4007.88 BogoMIPS (lpj=2003941)
Stack at about ffff810037c81ed4
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 01
cpu 1: setting up apic clock
CPU has booted.
cpu 1: enabling apic timer
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff -88 cycles, maxerr 546 cycles)
waiting for cpu 1
Brought up 2 CPUs
testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!
build_sched_domains: starting
build_sched_domains: calling calibrate_migration_costs
calibrate_migration_costs: starting on processor 0
calibrate_migration_costs: starting first pass
calibrate_migration_costs: calculating distance betweeen 0 and 1
calibrate_migration_costs: distance == 0
calibrate_migration_cost: measuring cost betwen 0 and 1
measure_migration_cost: between 0 and 1
measure_migration_cost: starting measurement loop
measure_cost: starting for cpus 0 and 1
measure_one: source:0, target:1, size:65536
meaure_one: migrating pid 1 to cpu:0
measure_one: dirtying the working set
measure_one: migrating pid 1 to target:1
set_cpus_allowed: waking migration thread
wake_up_process: waking pid 15
try_to_wake_up: pid: 15, states:0x1f, sync:0, mutex:0
try_to_wake_up: bailing because oldstate(0) & state(31) is zero
wake_up_process: waking pid 15
try_to_wake_up: pid: 15, states:0x1f, sync:0, mutex:0
try_to_wake_up: bailing because oldstate(0) & state(31) is zero
wake_up_process: waking pid 15
try_to_wake_up: pid: 15, states:0x1f, sync:0, mutex:0
try_to_wake_up: bailing because oldstate(0) & state(31) is zero
wake_up_process: waking pid 15
...
next prev parent reply other threads:[~2006-05-19 16:51 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-05-10 19:28 2.6.16-rtXYZ hangs at boot on quad Opteron Robert Crocombe
2006-05-10 19:44 ` Steven Rostedt
2006-05-11 0:50 ` Robert Crocombe
2006-05-11 14:51 ` Clark Williams
2006-05-12 6:18 ` Steven Rostedt
2006-05-19 16:50 ` Clark Williams [this message]
2006-05-26 18:05 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=446DF768.80205@redhat.com \
--to=williams@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=rostedt@goodmis.org \
--cc=rwcrocombe@raytheon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.