From: Ingo Molnar <mingo@elte.hu>
To: raz ben yehuda <raz@scalemp.com>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Mike Galbraith <efault@gmx.de>, Jack Steiner <steiner@sgi.com>,
Cliff Wickman <cpw@sgi.com>, Mike Travis <travis@sgi.com>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [BUG] soft lockup while booting machine with more than 700 cores
Date: Thu, 10 Feb 2011 13:39:37 +0100
Message-ID: <20110210123937.GD26094@elte.hu>
In-Reply-To: <1297236453.2756.9.camel@raz.scalemp.com>
* raz ben yehuda <raz@scalemp.com> wrote:
> Mingo Hello
>
> Below is the boot log of a 2.6.32.19 kernel on a machine with more than 700 cores.
> I am failing to boot it due to a soft lockup in the rebalance_domains area. I did
> not find anything related in mainline git or in the kernel's bugzilla.
>
> thank you
> Raz
>
>
> [ 929.614315] TCP cubic registered
> [ 929.614577] NET: Registered protocol family 17
> [ 930.785915] Bridge firewalling registered
> [ 930.928396] Freeing unused kernel memory: 1380k freed
> ===============================================================================
> Running /disklessrc
> Mounting /proc
> Creating /dev
> Creating initial device nodes
> [ 931.327841] usb 5-1: configuration #1 chosen from 1 choice
> [ 931.657469] input: HP Virtual Keyboard as /class/input/input0
> [ 931.671560] generic-usb 0003:03F0:1027.0001: input: USB HID v1.01 Keyboard [HP Virtual Keyboard] on usb-0000:01:04.0-1/input0
> [ 931.911480] input: HP Virtual Keyboard as /class/input/input1
> [ 931.926135] generic-usb 0003:03F0:1027.0002: input: USB HID v1.01 Mouse [HP Virtual Keyboard] on usb-0000:01:04.0-1/input1
> [ 932.247432] scsi 0:0:0:0: Direct-Access Generic USB Flash Disk 0.00 PQ: 0 ANSI: 2
> [ 932.301626] sd 0:0:0:0: Attached scsi generic sg0 type 0
> [ 932.416279] sd 0:0:0:0: [sda] 7892992 512-byte logical blocks: (4.04 GB/3.76 GiB)
> [ 932.559424] sd 0:0:0:0: [sda] Write Protect is off
> [ 932.563238] sd 0:0:0:0: [sda] Assuming drive cache: write through
> [ 932.802006] sd 0:0:0:0: [sda] Assuming drive cache: write through
> [ 932.805070] sda: sda1
> [ 934.315071] sd 0:0:0:0: [sda] Assuming drive cache: write through
> [ 934.318055] sd 0:0:0:0: [sda] Attached SCSI removable disk
> Loading nfs module...
> [ 1011.681028] BUG: soft lockup - CPU#240 stuck for 62s! [swapper:0]
> [ 1011.744482] Modules linked in: sunrpc(+)
> [ 1011.789117] CPU 240:
> [ 1011.828757] Modules linked in: sunrpc(+)
> [ 1011.874003] Pid: 0, comm: swapper Not tainted 2.6.32.19-3.vSMP #2 vSMP 3.5
> [ 1011.935843] RIP: 0010:[<ffffffff8105ac32>] [<ffffffff8105ac32>] weighted_cpuload+0x12/0x20
> [ 1012.051597] RSP: 0018:ffff89468e803c88 EFLAGS: 00010286
> [ 1012.115020] RAX: 00000000000115c0 RBX: 0000000000000002 RCX: 000000000000001d
> [ 1012.162897] RDX: ffff8acd2e840000 RSI: 0000000000000002 RDI: 000000000000021d
> [ 1012.243858] RBP: ffffffff81033133 R08: 0000000000000200 R09: ffff894f0ca3d450
> [ 1012.309760] R10: 0000000000000000 R11: ffff89468e803dc0 R12: ffff89468e803c00
> [ 1012.358023] R13: 00000000000115c0 R14: ffffffff8104b6dc R15: ffffffff81046ea6
> [ 1012.417072] FS: 0000000000000000(0000) GS:ffff89468e800000(0000) knlGS:0000000000000000
> [ 1012.494488] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 1012.559412] CR2: 00000000008d3988 CR3: 0000000001001000 CR4: 00000000000026e0
> [ 1012.619828] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1012.675491] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
> [ 1012.739386] Call Trace:
> [ 1012.790082] <IRQ> [<ffffffff81039705>] ? sched_clock+0x5/0x10
> [ 1012.868687] [<ffffffff8105ac6b>] ? source_load+0x2b/0x70
> [ 1012.923473] [<ffffffff810602d5>] ? find_busiest_group+0x1b5/0xa30
> [ 1012.973482] [<ffffffff81063487>] ? rebalance_domains+0x117/0x470
> [ 1013.031838] [<ffffffff81065a4e>] ? run_rebalance_domains+0x3e/0xe0
> [ 1013.081837] [<ffffffff8106fbbe>] ? __do_softirq+0xae/0x140
> [ 1013.134496] [<ffffffff81085da0>] ? ktime_get+0x50/0xd0
> [ 1013.182834] [<ffffffff8103374c>] ? call_softirq+0x1c/0x30
> [ 1013.246263] [<ffffffff81035745>] ? do_softirq+0x65/0xa0
> [ 1013.314801] [<ffffffff8106fb0c>] ? irq_exit+0x7c/0x80
> [ 1013.355605] [<ffffffff81046eab>] ? smp_apic_timer_interrupt+0x6b/0xa0
> [ 1013.391166] [<ffffffff8104b6dc>] ? native_apic_msr_write+0x2c/0x40
> [ 1013.391166] [<ffffffff81033133>] ? apic_timer_interrupt+0x13/0x20
> [ 1013.478307] <EOI> [<ffffffff8104dc92>] ? native_safe_halt+0x2/0x10
> [ 1013.515916] [<ffffffff8103a481>] ? default_idle+0x21/0x40
> [ 1013.572168] [<ffffffff81031537>] ? cpu_idle+0x57/0x90
> [ 1112.445978] BUG: soft lockup - CPU#240 stuck for 62s! [swapper:0]
> [ 1112.445978] Modules linked in: sunrpc(+)
Interesting.
Could you boot up with just enough cores for it to not lock up, and run perf top and
see where the overhead is?
Thanks,
Ingo