From: Roger Heflin <rheflin@atipa.com>
To: openib-general@openib.org, Linux-Kernel <linux-kernel@vger.kernel.org>
Subject: Openmpi/xhpl kernel crash 2.6.17-rc3 with Pathscale htx
Date: Mon, 08 May 2006 16:36:07 -0500
Message-ID: <445FB9C7.8060507@atipa.com>

Hello,
Running hpl (xhpl) with openmpi over InfiniBand crashes the kernel.
I am using hpl, openmpi 1.0.2, openib, and the 2.6.17-rc3 kernel.
I don't see the crash under IP over IB (ran for over an hour);
the crash occurs immediately upon attempting to start xhpl.
Here is the crash captured via the serial port:
[ 144.713555] ----------- [cut here ] --------- [please bite here ]
---------
[ 144.720550] Kernel BUG at drivers/infiniband/hw/ipath/ipath_layer.c:757
[ 144.727205] invalid opcode: 0000 [1] SMP
[ 144.731334] CPU 0
[ 144.733419] Modules linked in: ipv6 autofs4 adm1026 hwmon_vid
i2c_piix4 nfs lockd nfs_acl sunrpc dm_mirror dm_multipath dm_mod button
battery ac ohci_hcd ehci_hcd i2c_nforce2 i2c_core shpchp snd_intel8x0
snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer
snd soundcore snd_page_alloc ib_ipoib ib_ipath ipath_core ib_uverbs
ib_umad ib_ucm ib_sa ib_cm ib_mad ib_core tg3 floppy sata_svw ext3 jbd
sata_nv libata sd_mod scsi_mod
[ 144.774643] Pid: 4771, comm: xhpl Not tainted 2.6.17-rc3 #1
[ 144.780244] RIP: 0010:[<ffffffff880f6984>]
<ffffffff880f6984>{:ipath_core:ipath_verbs_send+362}
[ 144.788858] RSP: 0018:ffffffff8051be38 EFLAGS: 00010246
[ 144.794409] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
ffff8100df4a0150
[ 144.801574] RDX: ffffc200003b1078 RSI: 0000000000000000 RDI:
ffff8100df4a0150
[ 144.808742] RBP: 0000000000000000 R08: ffff8100df4a0158 R09:
0000000000000018
[ 144.815910] R10: 0000000000000018 R11: 0000000000000246 R12:
ffffc2000026f020
[ 144.823071] R13: 0000000000000000 R14: 0000000000000018 R15:
0000000000000000
[ 144.830230] FS: 00002b750d6fcca0(0000) GS:ffffffff805ad000(0000)
knlGS:0000000000000000
[ 144.838398] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 144.844190] CR2: 000000000047f050 CR3: 000000000cdb1000 CR4:
00000000000006e0
[ 144.851370] Process xhpl (pid: 4771, threadinfo ffff81000ceba000,
task ffff81000cc9e880)
[ 144.859504] Stack: ffffffff8059d900 ffff8100df4a0150 00000018dfef1000
ffff8100df4a0120
[ 144.867549] ffff8100df4a0000 ffffffff805f7d88 ffff8100df4a0098
0000000000000038
[ 144.875829] 0000000000000400 ffffffff8811869e
[ 144.881079] Call Trace: <IRQ>
<ffffffff8811869e>{:ib_ipath:ipath_do_rc_send+348}
[ 144.888727] <ffffffff80232548>{do_timer+58}
<ffffffff8020d0bb>{main_timer_handler+493}
[ 144.897498] <ffffffff8022efc6>{tasklet_hi_action+105}
<ffffffff8022ebc4>{__do_softirq+80}
[ 144.906525] <ffffffff8020aa5a>{call_softirq+30}
<ffffffff8020bc0a>{do_softirq+47}
[ 144.914854] <ffffffff8020bbd1>{do_IRQ+62}
<ffffffff80209b96>{ret_from_intr+0} <EOI>
[ 144.923395] <ffffffff8026aafe>{kfree+417}
<ffffffff880e14ff>{:ib_uverbs:ib_uverbs_poll_cq+409}
[ 144.932867] <ffffffff880dfa27>{:ib_uverbs:ib_uverbs_write+196}
<ffffffff8026f746>{vfs_write+212}
[ 144.942509] <ffffffff8026f897>{sys_write+69}
<ffffffff80209612>{system_call+126}
[ 144.950997]
[ 144.950998] Code: 0f 0b 68 84 21 10 88 c2 f5 02 eb 07 44 39 f3 41 0f
47 de 48
[ 144.960709] RIP <ffffffff880f6984>{:ipath_core:ipath_verbs_send+362}
RSP <ffffffff8051be38>
[ 144.969212] <3>BUG: sleeping function called from invalid context at
include/linux/rwsem.h:43
[ 144.977952] in_atomic():1, irqs_disabled():0
[ 144.982261]
[ 144.982262] Call Trace: <IRQ> <ffffffff80221daa>{__might_sleep+190}
[ 144.990056] <ffffffff80216103>{flat_send_IPI_mask+0}
<ffffffff80236073>{blocking_notifier_call_chain+31}
[ 145.000411] <ffffffff8022c2ee>{do_exit+34}
<ffffffff80423c6f>{_spin_unlock_irqrestore+11}
[ 145.009454] <ffffffff8020b027>{do_divide_error+0}
<ffffffff8020b22e>{do_invalid_op+145}
[ 145.018334] <ffffffff880f6984>{:ipath_core:ipath_verbs_send+362}
[ 145.025102] <ffffffff803f7d02>{tcp_v4_do_rcv+43}
<ffffffff88092128>{:tg3:tg3_interrupt_tagged+51}
[ 145.034840] <ffffffff8020a551>{error_exit+0}
<ffffffff880f6984>{:ipath_core:ipath_verbs_send+362}
[ 145.044606] <ffffffff880f6b40>{:ipath_core:ipath_verbs_send+806}
[ 145.051390] <ffffffff8811869e>{:ib_ipath:ipath_do_rc_send+348}
<ffffffff80232548>{do_timer+58}
[ 145.060897] <ffffffff8020d0bb>{main_timer_handler+493}
<ffffffff8022efc6>{tasklet_hi_action+105}
[ 145.070569] <ffffffff8022ebc4>{__do_softirq+80}
<ffffffff8020aa5a>{call_softirq+30}
[ 145.079121] <ffffffff8020bc0a>{do_softirq+47}
<ffffffff8020bbd1>{do_IRQ+62}
[ 145.086947] <ffffffff80209b96>{ret_from_intr+0} <EOI>
<ffffffff8026aafe>{kfree+417}
[ 145.095523] <ffffffff880e14ff>{:ib_uverbs:ib_uverbs_poll_cq+409}
[ 145.102291] <ffffffff880dfa27>{:ib_uverbs:ib_uverbs_write+196}
<ffffffff8026f746>{vfs_write+212}
[ 145.111972] <ffffffff8026f897>{sys_write+69}
<ffffffff80209612>{system_call+126}
[ 145.120482] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 145.127265]
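
As a cross-check on the "Kernel BUG at ...ipath_layer.c:757" line, the Code: bytes in the oops can be decoded by hand. The sketch below assumes the x86_64 BUG() encoding used by kernels of this era (ud2, then a push of the file-name pointer, then a ret whose 16-bit immediate carries the line number). The function name decode_bug is hypothetical, and the layout is an assumption about this particular build rather than something stated in the report.

```python
import struct

def decode_bug(code_hex):
    """Decode an assumed 'ud2; pushq $imm32; ret $imm16' BUG() sequence.

    Returns (file_string_address, line_number), or None if the bytes do
    not match the assumed pattern.
    """
    code = bytes.fromhex(code_hex.replace(" ", ""))
    if len(code) < 10:
        return None
    # 0f 0b = ud2, 68 = push imm32, c2 = ret imm16
    if code[0:2] != b"\x0f\x0b" or code[2] != 0x68 or code[7] != 0xC2:
        return None
    (file_imm,) = struct.unpack_from("<i", code, 3)  # signed, sign-extends to 64 bits
    (line,) = struct.unpack_from("<H", code, 8)      # unsigned 16-bit line number
    return file_imm & 0xFFFFFFFFFFFFFFFF, line

# First bytes of the "Code:" line from the oops above.
addr, line = decode_bug("0f 0b 68 84 21 10 88 c2 f5 02 eb 07")
print(f"file string at {addr:#x}, line {line}")
```

Under that assumption the immediate decodes to line 757, matching the BUG message, and the file-string address falls in the same high address range as the ipath_core symbols in the trace.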
/proc/interrupts looks like this:
           CPU0       CPU1       CPU2       CPU3
  0:     107714     110040     109206     113504    IO-APIC-edge  timer
  1:        417       1287        405       1627    IO-APIC-edge  i8042
  8:          0          0          0          0    IO-APIC-edge  rtc
  9:          0          0          0          0   IO-APIC-level  acpi
 15:         50          0          0         23    IO-APIC-edge  ide1
 50:          0          0          0          0   IO-APIC-level  libata, ohci_hcd:usb2
 58:          0          0          0          0   IO-APIC-level  libata
 66:          0          0          0          0   IO-APIC-level  libata
 74:      15625          0          0         11   IO-APIC-level  eth0
 90:        551          0          0          0   IO-APIC-level  ipath_core
 98:          0          0          0          0   IO-APIC-level  NVidia CK804
233:        249        904       1161       4180   IO-APIC-level  libata, ehci_hcd:usb1
NMI:        107        124        406        483
LOC:     440388     440365     440341     440317
ERR:          0
MIS:          0
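
For anyone comparing interrupt distribution across runs, a minimal sketch of parsing this kind of output follows. It assumes one unwrapped row per IRQ, as in real /proc/interrupts output; the helper name parse_interrupts and the two-row sample are illustrative only.

```python
def parse_interrupts(text):
    """Map each IRQ label to (per-CPU counts, description).

    Assumes the first non-empty line is the CPU header and that each
    following row is 'label: count... description' on a single line.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = [int(f) for f in fields[:ncpu]]
        table[label.strip()] = (counts, " ".join(fields[ncpu:]))
    return table

sample = """\
           CPU0       CPU1       CPU2       CPU3
 74:      15625          0          0         11   IO-APIC-level  eth0
 90:        551          0          0          0   IO-APIC-level  ipath_core
"""

for irq, (counts, desc) in parse_interrupts(sample).items():
    total = sum(counts)
    if total:  # skip idle IRQs to avoid dividing by zero
        print(f"IRQ {irq} ({desc}): {100 * counts[0] / total:.1f}% on CPU0")
```

In the table above every ipath_core interrupt was delivered to CPU0, which is consistent with the oops reporting CPU 0, though that by itself does not explain the crash.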
Any ideas?
Roger