public inbox for linux-rdma@vger.kernel.org
* BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB
@ 2013-05-17 14:16 Jack Wang
       [not found] ` <51963BC6.1050901-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Jack Wang @ 2013-05-17 14:16 UTC (permalink / raw)
  To: Shlomo Pongratz, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi All,

I've seen this before. Does anyone have a suggestion for how to fix it?

May 17 16:09:13 ib2 kernel: [  528.500381] BUG: unable to handle kernel
paging request at 0000000000070a78
May 17 16:09:13 ib2 kernel: [  528.500529] IP: [<ffffffffa0166810>]
ipoib_cm_tx_handler+0x30/0x2a0 [ib_ipoib]
May 17 16:09:13 ib2 kernel: [  528.500655] PGD 1f6b89067 PUD 20d41a067
PMD 0
May 17 16:09:13 ib2 kernel: [  528.500807] Oops: 0000 [#1] SMP
May 17 16:09:13 ib2 kernel: [  528.501056] ib0: failed to send RTU: -22
May 17 16:09:13 ib2 kernel: [  528.500927] Modules linked in:
ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_uverbs ib_umad
mlx4_ib ib_sa ib_mad ib_core ip6table_filter ip6_tables iptable_filter
ip_tables ebtable_nat ebtables x_tables cpufreq_powersave
cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse
loop acpi_cpufreq mperf kvm_amd kvm tpm_tis tpm psmouse edac_core
tpm_bios evdev serio_raw microcode edac_mce_amd processor shpchp
thermal_sys pci_hotplug asus_atk0110 i2c_piix4 button dm_multipath
scsi_dh mlx4_en sg sd_mod crc_t10dif r8169 mlx4_core ahci libahci libata
scsi_mod [last unloaded: ib_ipoib]
May 17 16:09:13 ib2 kernel: [  528.503008] CPU 2
May 17 16:09:13 ib2 kernel: [  528.503052] Pid: 58, comm: kworker/2:1
Tainted: G           O 3.9.0-rc7-pserver #4 System manufacturer System
Product Name/M4A89GTD-PRO
May 17 16:09:13 ib2 kernel: [  528.503183] RIP:
0010:[<ffffffffa0166810>]  [<ffffffffa0166810>]
ipoib_cm_tx_handler+0x30/0x2a0 [ib_ipoib]
May 17 16:09:13 ib2 kernel: [  528.503303] RSP: 0018:ffff88020e20dc68
EFLAGS: 00010293
May 17 16:09:13 ib2 kernel: [  528.503362] RAX: ffff88020e20dfd8 RBX:
ffff8801f5f19400 RCX: 0000000000000000
May 17 16:09:13 ib2 kernel: [  528.503425] RDX: 0000000000000003 RSI:
ffff8801f896e0e8 RDI: ffff8801f5f19400
May 17 16:09:13 ib2 kernel: [  528.503484] RBP: ffff88020e20dcb8 R08:
0000000000000000 R09: 0000000000000001
May 17 16:09:13 ib2 kernel: [  528.503544] R10: 0000000000000000 R11:
0000000000000000 R12: ffff8801f896e000
May 17 16:09:13 ib2 kernel: [  528.503603] R13: ffff88020d34d7e0 R14:
0000000000070a40 R15: ffff8801f896e0e8
May 17 16:09:13 ib2 kernel: [  528.503666] FS:  00007ff811ffb700(0000)
GS:ffff88021fc80000(0000) knlGS:0000000000000000
May 17 16:09:13 ib2 kernel: [  528.503777] CS:  0010 DS: 0000 ES: 0000
CR0: 000000008005003b
May 17 16:09:13 ib2 kernel: [  528.503834] CR2: 0000000000070a78 CR3:
000000020cb73000 CR4: 00000000000007e0
May 17 16:09:13 ib2 kernel: [  528.503894] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
May 17 16:09:13 ib2 kernel: [  528.503954] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
May 17 16:09:13 ib2 kernel: [  528.504014] Process kworker/2:1 (pid: 58,
threadinfo ffff88020e20c000, task ffff88020e12df40)
May 17 16:09:13 ib2 kernel: [  528.504124] Stack:
May 17 16:09:13 ib2 kernel: [  528.504173]  0000000000000086
ffff88020d524860 ffff88020e20dc88 ffffffff810a891d
May 17 16:09:13 ib2 kernel: [  528.504366]  ffff88020e20dca8
ffff8801f5f19400 ffff8801f896e000 ffff8801f5f19470
May 17 16:09:13 ib2 kernel: [  528.504557]  ffff88020d561b00
0000000000000000 ffff88020e20dd08 ffffffffa00cbc75
May 17 16:09:13 ib2 kernel: [  528.504748] Call Trace:
May 17 16:09:13 ib2 kernel: [  528.504808]  [<ffffffff810a891d>] ?
trace_hardirqs_off+0xd/0x10
May 17 16:09:13 ib2 kernel: [  528.504875]  [<ffffffffa00cbc75>]
cm_process_work+0x25/0x120 [ib_cm]
May 17 16:09:13 ib2 kernel: [  528.504942]  [<ffffffffa00cec61>]
cm_work_handler+0x591/0xb40 [ib_cm]
May 17 16:09:13 ib2 kernel: [  528.505007]  [<ffffffff810677c6>]
process_one_work+0x1d6/0x560
May 17 16:09:13 ib2 kernel: [  528.505069]  [<ffffffff81067755>] ?
process_one_work+0x165/0x560
May 17 16:09:13 ib2 kernel: [  528.505132]  [<ffffffff81068e29>]
worker_thread+0x119/0x370
May 17 16:09:13 ib2 kernel: [  528.505193]  [<ffffffff81068d10>] ?
manage_workers+0x340/0x340
May 17 16:09:13 ib2 kernel: [  528.505257]  [<ffffffff8106e846>]
kthread+0xe6/0xf0
May 17 16:09:13 ib2 kernel: [  528.505319]  [<ffffffff8106e760>] ?
__init_kthread_worker+0x70/0x70
May 17 16:09:13 ib2 kernel: [  528.505383]  [<ffffffff817c842c>]
ret_from_fork+0x7c/0xb0
May 17 16:09:13 ib2 kernel: [  528.505444]  [<ffffffff8106e760>] ?
__init_kthread_worker+0x70/0x70
May 17 16:09:13 ib2 kernel: [  528.505502] Code: 57 41 56 41 55 41 54 53
48 83 ec 28 66 66 66 66 90 4c 8b 6f 08 8b 16 48 89 fb 49 89 f7 4d 8b 75
20 49 81 c6 40 0a 00 00 83 fa 0b <4d> 8b 66 38 77 2a 89 d0 ff 24 c5 60
ae 16 a0 90 44 8b 1d 31 7e
May 17 16:09:13 ib2 kernel: [  528.507522] RIP  [<ffffffffa0166810>]
ipoib_cm_tx_handler+0x30/0x2a0 [ib_ipoib]
May 17 16:09:13 ib2 kernel: [  528.507633]  RSP <ffff88020e20dc68>
May 17 16:09:13 ib2 kernel: [  528.507685] CR2: 0000000000070a78
May 17 16:09:13 ib2 kernel: [  528.507741] ---[ end trace
bcb226cccac815a8 ]---
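
One way to read the oops above, offered as a sketch and not as a confirmed diagnosis: the "Code:" bytes around the faulting instruction can be decoded by hand and checked against the register dump. The decoding in the comments is a manual reading of the bytes, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Register values copied from the oops above. */
#define OOPS_R14 0x0000000000070a40ULL
#define OOPS_CR2 0x0000000000070a78ULL

/* Constants taken from the "Code:" bytes, decoded by hand:
 *   4d 8b 75 20            mov  0x20(%r13),%r14
 *   49 81 c6 40 0a 00 00   add  $0xa40,%r14
 *   4d 8b 66 38            mov  0x38(%r14),%r12   <-- faulting instruction
 */
#define ADD_IMM   0xa40ULL
#define LOAD_DISP 0x38ULL

/* Address touched by the faulting load: base register plus displacement. */
uint64_t fault_address(uint64_t base, uint64_t disp)
{
    return base + disp;
}

/* Value R14 held before the add, i.e. the pointer loaded from 0x20(%r13). */
uint64_t pointer_before_add(uint64_t r14, uint64_t imm)
{
    return r14 - imm;
}
```

fault_address(OOPS_R14, LOAD_DISP) matches CR2, and pointer_before_add(OOPS_R14, ADD_IMM) gives 0x70000, which is not a plausible kernel pointer. Assuming the add of 0xa40 is netdev_priv() applied to tx->dev (an assumption, not verified against this build), the ipoib_cm_tx reached through cm_id->context would already have been stale when the handler ran.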
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB
       [not found] ` <51963BC6.1050901-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
@ 2013-05-21 12:51   ` Sebastian Riemer
       [not found]     ` <519B6DD6.4090502-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Riemer @ 2013-05-21 12:51 UTC (permalink / raw)
  To: Jack Wang
  Cc: Shlomo Pongratz, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 17.05.2013 16:16, Jack Wang wrote:
> unable to handle kernel paging request

Hi Jack,

this is probably related to the list corruption in IPoIB: list_del()
sets the deleted entry's pointers to LIST_POISON1 and LIST_POISON2.
According to the comments in the code, dereferencing these results in
page faults.
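
A minimal userspace sketch of the poisoning behaviour, modelled on include/linux/list.h. The poison constants are the values recalled from include/linux/poison.h in 3.x kernels with POISON_POINTER_DELTA assumed to be 0, so treat them as illustrative; list_is_poisoned() is a hypothetical helper that does not exist in the kernel.

```c
#include <stdint.h>

/* Userspace model of the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* Illustrative poison values (assumption: POISON_POINTER_DELTA == 0). */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink entry and poison its pointers, as list_del() does. */
void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
}

/* Hypothetical helper: has this entry been through list_del()? */
int list_is_poisoned(const struct list_head *entry)
{
    return entry->next == LIST_POISON1 && entry->prev == LIST_POISON2;
}
```

After list_del(&a), the neighbours are relinked correctly but a.next and a.prev hold the poison values, so a second list_del(&a), or any list walk that still reaches a, dereferences an unmapped address.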

Cheers,
Sebastian


* Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB
       [not found]     ` <519B6DD6.4090502-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
@ 2013-05-21 15:19       ` Jack Wang
       [not found]         ` <519B909A.8010004-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Jack Wang @ 2013-05-21 15:19 UTC (permalink / raw)
  To: Sebastian Riemer
  Cc: Shlomo Pongratz, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 05/21/2013 02:51 PM, Sebastian Riemer wrote:
> On 17.05.2013 16:16, Jack Wang wrote:
>> unable to handle kernel paging request
> 
> Hi Jack,
> 
> this is probably related to the list corruption in IPoIB: list_del()
> sets the deleted entry's pointers to LIST_POISON1 and LIST_POISON2.
> According to the comments in the code, dereferencing these results in
> page faults.
> 
> Cheers,
> Sebastian
> 
This bug is easily triggered with the inject_bug patch below, while
running iperf -P 50 and switching the IB mode in sync on both sides.
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1315,7 +1315,8 @@ static void ipoib_cm_tx_start(struct work_struct *work)
 		netif_tx_lock_bh(dev);
 		spin_lock_irqsave(&priv->lock, flags);

-		if (ret) {
+		if (ret || priv->inject_bug) {
+			priv->inject_bug = 0;
 			neigh = p->neigh;
 			if (neigh) {
 				neigh->cm = NULL;

After changing list_del to list_del_init it turned into a different panic;
I'm working on getting the backtrace.
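
For reference, a userspace sketch of what list_del_init() does differently from list_del(), modelled on include/linux/list.h. It is a simplified model, not the kernel source, and it says nothing about where the real fix belongs.

```c
/* Userspace model of the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink entry and leave it as an empty, self-pointing list. */
void list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    INIT_LIST_HEAD(entry);
}

/* True when head points at itself, i.e. the list has no entries. */
int list_empty(const struct list_head *head)
{
    return head->next == head;
}
```

Because the entry points at itself afterwards, a second list_del_init() on it is harmless and list_empty() on it returns true. That only tolerates a double unlink; it does not keep the containing object alive, which would be consistent with the crash moving elsewhere instead of disappearing.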


* Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB
       [not found]         ` <519B909A.8010004-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
@ 2013-05-23 15:38           ` Jack Wang
       [not found]             ` <519E37DE.1080504-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Jack Wang @ 2013-05-23 15:38 UTC (permalink / raw)
  To: Sebastian Riemer
  Cc: Shlomo Pongratz, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 05/21/2013 05:19 PM, Jack Wang wrote:
> On 05/21/2013 02:51 PM, Sebastian Riemer wrote:
>> On 17.05.2013 16:16, Jack Wang wrote:
>>> unable to handle kernel paging request
>>
>> Hi Jack,
>>
>> this is probably related to the list corruption in IPoIB: list_del()
>> sets the deleted entry's pointers to LIST_POISON1 and LIST_POISON2.
>> According to the comments in the code, dereferencing these results in
>> page faults.
>>
>> Cheers,
>> Sebastian
>>
> This bug is easily triggered with the inject_bug patch below, while
> running iperf -P 50 and switching the IB mode in sync on both sides.
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -1315,7 +1315,8 @@ static void ipoib_cm_tx_start(struct work_struct *work)
>  		netif_tx_lock_bh(dev);
>  		spin_lock_irqsave(&priv->lock, flags);
> 
> -		if (ret) {
> +		if (ret || priv->inject_bug) {
> +			priv->inject_bug = 0;
>  			neigh = p->neigh;
>  			if (neigh) {
>  				neigh->cm = NULL;
> 
> After changing list_del to list_del_init it turned into a different panic;
> I'm working on getting the backtrace.
> 

Here are some traces I got during testing. Dear IPoIB experts, could you
give some suggestions? It looks like an object lifetime issue.



May 21 15:12:03 ib2 kernel: [  415.050021] general protection fault:
0000 [#1] SMP
May 21 15:12:03 ib2 kernel: [  415.050114] CPU 2
May 21 15:12:03 ib2 kernel: [  415.050142] Modules linked in:
ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad
mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter
ip_tables ebtable_nat ebtables x_tables cpufreq_powersave
cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse
loop kvm_amd kvm tpm_tis powernow_k8 shpchp tpm processor mperf
edac_core tpm_bios psmouse edac_mce_amd pci_hotplug microcode evdev
serio_raw i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh
mlx4_en sg sd_mod crc_t10dif r8169 ahci libahci libata scsi_mod
mlx4_core [last unloaded: ib_ipoib]
May 21 15:12:03 ib2 kernel: [  415.051845]
May 21 15:12:03 ib2 kernel: [  415.051886] Pid: 3166, comm: kworker/2:0
Tainted: G           O 3.4.23-pserver-hotfix+ #109 System manufacturer
System Product Name/M4A89GTD-PRO
May 21 15:12:03 ib2 kernel: [  415.052019] RIP:
0010:[<ffffffffa01c8bf9>]  [<ffffffffa01c8bf9>] ib_modify_qp+0x9/0x20
[ib_core]
May 21 15:12:03 ib2 kernel: [  415.052106] RSP: 0018:ffff88020efd3b00
EFLAGS: 00010246
May 21 15:12:03 ib2 kernel: [  415.052148] RAX: 0000000000000000 RBX:
0000000000000000 RCX: 0000000000000000
May 21 15:12:03 ib2 kernel: [  415.052190] RDX: 0000000000129181 RSI:
ffff88020efd3b20 RDI: dead4ead00000000
May 21 15:12:03 ib2 kernel: [  415.052233] RBP: ffff88020efd3b00 R08:
0000000000000000 R09: 0000000000000001
May 21 15:12:03 ib2 kernel: [  415.052275] R10: 0000000000000000 R11:
0000000000000000 R12: ffff8801fb698c60
May 21 15:12:03 ib2 kernel: [  415.052317] R13: ffff88020efd3b20 R14:
ffff8802101fdc00 R15: ffffffff81e14250
May 21 15:12:03 ib2 kernel: [  415.052360] FS:  00007f8c38a05700(0000)
GS:ffff88021fc80000(0000) knlGS:0000000000000000
May 21 15:12:03 ib2 kernel: [  415.052415] CS:  0010 DS: 0000 ES: 0000
CR0: 000000008005003b
May 21 15:12:03 ib2 kernel: [  415.052457] CR2: 00007f8c38535d70 CR3:
0000000001c0b000 CR4: 00000000000007e0
May 21 15:12:03 ib2 kernel: [  415.052500] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
May 21 15:12:03 ib2 kernel: [  415.052542] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
May 21 15:12:03 ib2 kernel: [  415.052585] Process kworker/2:0 (pid:
3166, threadinfo ffff88020efd2000, task ffff88021228bf00)
May 21 15:12:03 ib2 kernel: [  415.052640] Stack:
May 21 15:12:03 ib2 kernel: [  415.052678]  ffff88020efd3c40
ffffffffa02bfcb9 0000000000000000 001291811228bf00
May 21 15:12:03 ib2 kernel: [  415.052834]  ffffffff00000002
ffff880200000005 000000008173c557 0008005eefed5918
May 21 15:12:03 ib2 kernel: [  415.052988]  ffffffff81e12e00
0000000000000080 ffff88020efd3b70 0000000000000000
May 21 15:12:03 ib2 kernel: [  415.053143] Call Trace:
May 21 15:12:03 ib2 kernel: [  415.053188]  [<ffffffffa02bfcb9>]
ipoib_cm_rep_handler+0x99/0x2c0 [ib_ipoib]
May 21 15:12:03 ib2 kernel: [  415.053233]  [<ffffffff8109c0bd>] ?
trace_hardirqs_off+0xd/0x10
May 21 15:12:03 ib2 kernel: [  415.053277]  [<ffffffff8173c557>] ?
_raw_spin_unlock_irqrestore+0x77/0x80
May 21 15:12:03 ib2 kernel: [  415.053322]  [<ffffffff8105c913>] ?
__queue_work+0x103/0x4a0
May 21 15:12:03 ib2 kernel: [  415.053364]  [<ffffffff8109c009>] ?
trace_hardirqs_off_caller+0x29/0xd0
May 21 15:12:03 ib2 kernel: [  415.053409]  [<ffffffffa02c0373>]
ipoib_cm_tx_handler+0x93/0x2b0 [ib_ipoib]
May 21 15:12:03 ib2 kernel: [  415.053452]  [<ffffffff8109c0bd>] ?
trace_hardirqs_off+0xd/0x10
May 21 15:12:03 ib2 kernel: [  415.053497]  [<ffffffffa0141cc5>]
cm_process_work+0x25/0x120 [ib_cm]
May 21 15:12:03 ib2 kernel: [  415.053540]  [<ffffffffa0142508>]
cm_rep_handler+0x308/0x590 [ib_cm]
May 21 15:12:03 ib2 kernel: [  415.053585]  [<ffffffffa0143c65>]
cm_work_handler+0x145/0x1070 [ib_cm]
May 21 15:12:03 ib2 kernel: [  415.053628]  [<ffffffff8105daea>]
process_one_work+0x19a/0x5c0
May 21 15:12:03 ib2 kernel: [  415.053670]  [<ffffffff8105da7d>] ?
process_one_work+0x12d/0x5c0
May 21 15:12:03 ib2 kernel: [  415.053713]  [<ffffffffa0143b20>] ?
cm_req_handler+0xa40/0xa40 [ib_cm]
May 21 15:12:03 ib2 kernel: [  415.053757]  [<ffffffff8105f865>]
worker_thread+0x175/0x380
May 21 15:12:03 ib2 kernel: [  415.053799]  [<ffffffff8105f6f0>] ?
manage_workers+0x210/0x210
May 21 15:12:03 ib2 kernel: [  415.053841]  [<ffffffff81064e0e>]
kthread+0xbe/0xd0
May 21 15:12:03 ib2 kernel: [  415.053884]  [<ffffffff8109f2b0>] ?
trace_hardirqs_on_caller+0x20/0x1b0
May 21 15:12:03 ib2 kernel: [  415.053928]  [<ffffffff817465b4>]
kernel_thread_helper+0x4/0x10
May 21 15:12:03 ib2 kernel: [  415.053972]  [<ffffffff8173c4c0>] ?
_raw_spin_unlock_irq+0x30/0x50
May 21 15:12:03 ib2 kernel: [  415.054015]  [<ffffffff8109f44d>] ?
trace_hardirqs_on+0xd/0x10
May 21 15:12:03 ib2 kernel: [  415.054058]  [<ffffffff8173c8b0>] ?
retint_restore_args+0x13/0x13
May 21 15:12:03 ib2 kernel: [  415.054100]  [<ffffffff81064d50>] ?
__init_kthread_worker+0x70/0x70
May 21 15:12:03 ib2 kernel: [  415.054144]  [<ffffffff817465b0>] ?
gs_change+0x13/0x13
May 21 15:12:03 ib2 kernel: [  415.054185] Code: ff ff 31 c0 eb d6 0f 1f
40 00 83 ca 01 c9 09 c2 31 c0 f7 d2 85 ca 0f 94 c0 c3 0f 1f 84 00 00 00
00 00 55 48 89 e5 66 66 66 66 90 <48> 8b 07 31 c9 48 8b 7f 58 ff 90 30
02 00 00 c9 c3 66 0f 1f 44
May 21 15:12:03 ib2 kernel: [  415.055875] RIP  [<ffffffffa01c8bf9>]
ib_modify_qp+0x9/0x20 [ib_core]
May 21 15:12:03 ib2 kernel: [  415.055945]  RSP <ffff88020efd3b00>
May 21 15:12:03 ib2 kernel: [  415.056011] ---[ end trace
871425e942ec1142 ]---
(gdb) list *ib_modify_qp+0x9
0xbf9 is in ib_modify_qp (drivers/infiniband/core/verbs.c:807).
802	
803	int ib_modify_qp(struct ib_qp *qp,
804			 struct ib_qp_attr *qp_attr,
805			 int qp_attr_mask)
806	{
807		return qp->device->modify_qp(qp->real_qp, qp_attr, qp_attr_mask, NULL);
808	}
809	EXPORT_SYMBOL(ib_modify_qp);
810	
811	int ib_query_qp(struct ib_qp *qp,
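
A short sketch of why the trace above is a general protection fault and not a page fault. The faulting bytes "48 8b 07" decode (by hand) to mov (%rdi),%rax, which matches the qp->device load on line 807, and RDI carries the first argument (qp) under the x86-64 System V calling convention. The check below assumes 48-bit virtual addresses (4-level paging); is_canonical_48() is a made-up helper name.

```c
#include <stdint.h>

/* RDI from the general protection fault above, i.e. the qp argument. */
#define OOPS_RDI 0xdead4ead00000000ULL

/* With 48-bit virtual addresses, an address is canonical only when
 * bits 63..47 are all zeros or all ones. */
int is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;  /* the 17 bits 63..47 */
    return top == 0 || top == 0x1ffff;
}
```

is_canonical_48(OOPS_RDI) returns 0, so the CPU raises #GP on the dereference before any page-table walk, which is why no CR2 fault address is reported for this oops. The upper half of the value resembles the spinlock debug magic 0xdead4ead; that may hint that the qp pointer was read from memory now holding a different object, but this is speculation.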



May 21 15:12:03 ib2 kernel: [  415.056065] BUG: unable to handle kernel
paging request at fffffffffffffff8
May 21 15:12:03 ib2 kernel: [  415.056164] IP: [<ffffffff81064700>]
kthread_data+0x10/0x20
May 21 15:12:03 ib2 kernel: [  415.056236] PGD 1c0d067 PUD 1c0e067 PMD 0
May 21 15:12:03 ib2 kernel: [  415.056358] Oops: 0000 [#2] SMP
May 21 15:12:03 ib2 kernel: [  415.056449] CPU 2
May 21 15:12:05 ib2 kernel: [  415.056477] Modules linked in:
ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad
mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter
ip_tables ebtable_nat ebtables x_tables cpufreq_powersave
cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse
loop kvm_amd kvm tpm_tis powernow_k8 shpchp tpm processor mperf
edac_core tpm_bios psmouse edac_mce_amd pci_hotplug microcode evdev
serio_raw i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh
mlx4_en sg sd_mod crc_t10dif r8169 ahci libahci libata scsi_mod
mlx4_core [last unloaded: ib_ipoib]
May 21 15:12:05 ib2 kernel: [  415.058609]
May 21 15:12:05 ib2 kernel: [  415.058648] Pid: 3166, comm: kworker/2:0
Tainted: G      D    O 3.4.23-pserver-hotfix+ #109 System manufacturer
System Product Name/M4A89GTD-PRO
May 21 15:12:05 ib2 kernel: [  415.058783] RIP:
0010:[<ffffffff81064700>]  [<ffffffff81064700>] kthread_data+0x10/0x20
May 21 15:12:05 ib2 kernel: [  415.058866] RSP: 0018:ffff88020efd3858
EFLAGS: 00010092
May 21 15:12:05 ib2 kernel: [  415.058909] RAX: 0000000000000000 RBX:
0000000000000002 RCX: 0000000000000002
May 21 15:12:05 ib2 kernel: [  415.058954] RDX: ffffffff81e138c0 RSI:
0000000000000002 RDI: ffff88021228bf00
May 21 15:12:05 ib2 kernel: [  415.058997] RBP: ffff88020efd3858 R08:
ffff88021228bf70 R09: 0000000000000001
May 21 15:12:05 ib2 kernel: [  415.059041] R10: 0000000000000800 R11:
0000000000000000 R12: 0000000000000002
May 21 15:12:05 ib2 kernel: [  415.059085] R13: ffff88021228c2c8 R14:
ffff88020efd3688 R15: ffffffff81e14250
May 21 15:12:05 ib2 kernel: [  415.059128] FS:  00007f8c38a05700(0000)
GS:ffff88021fc80000(0000) knlGS:0000000000000000
May 21 15:12:05 ib2 kernel: [  415.059187] CS:  0010 DS: 0000 ES: 0000
CR0: 000000008005003b
May 21 15:12:05 ib2 kernel: [  415.059230] CR2: fffffffffffffff8 CR3:
0000000001c0b000 CR4: 00000000000007e0
May 21 15:12:05 ib2 kernel: [  415.059274] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
May 21 15:12:05 ib2 kernel: [  415.059317] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
May 21 15:12:05 ib2 kernel: [  415.059362] Process kworker/2:0 (pid:
3166, threadinfo ffff88020efd2000, task ffff88021228bf00)
May 21 15:12:05 ib2 kernel: [  415.059420] Stack:
May 21 15:12:05 ib2 kernel: [  415.059460]  ffff88020efd3878
ffffffff8105c735 ffff88020efd3878 ffff88021fc92f40
May 21 15:12:05 ib2 kernel: [  415.059616]  ffff88020efd3908
ffffffff8173a963 ffff880200000000 ffff88020efd2000
May 21 15:12:05 ib2 kernel: [  415.059771]  ffff88020efd3fd8
ffff88020efd2000 ffff88020efd2010 ffff88020efd2000
May 21 15:12:05 ib2 kernel: [  415.059928] Call Trace:
May 21 15:12:05 ib2 kernel: [  415.059969]  [<ffffffff8105c735>]
wq_worker_sleeping+0x15/0xa0
May 21 15:12:05 ib2 kernel: [  415.060013]  [<ffffffff8173a963>]
__schedule+0x6a3/0x940
May 21 15:12:05 ib2 kernel: [  415.060056]  [<ffffffff8173acc9>]
schedule+0x29/0x70
May 21 15:12:05 ib2 kernel: [  415.060098]  [<ffffffff81042105>]
do_exit+0x615/0xa40
May 21 15:12:05 ib2 kernel: [  415.060141]  [<ffffffff8103e6c1>] ?
kmsg_dump+0x81/0x300
May 21 15:12:05 ib2 kernel: [  415.060184]  [<ffffffff8173d6db>]
oops_end+0xab/0xf0
May 21 15:12:05 ib2 kernel: [  415.060228]  [<ffffffff8100570b>]
die+0x5b/0x90
May 21 15:12:05 ib2 kernel: [  415.060270]  [<ffffffff8173d274>]
do_general_protection+0x164/0x170
May 21 15:12:05 ib2 kernel: [  415.060315]  [<ffffffff8173c8e0>] ?
restore_args+0x30/0x30
May 21 15:12:05 ib2 kernel: [  415.060358]  [<ffffffff8173ca95>]
general_protection+0x25/0x30
May 21 15:12:05 ib2 kernel: [  415.060404]  [<ffffffffa01c8bf9>] ?
ib_modify_qp+0x9/0x20 [ib_core]
May 21 15:12:05 ib2 kernel: [  415.060449]  [<ffffffffa02bfcb9>]
ipoib_cm_rep_handler+0x99/0x2c0 [ib_ipoib]
May 21 15:12:05 ib2 kernel: [  415.060493]  [<ffffffff8109c0bd>] ?
trace_hardirqs_off+0xd/0x10
May 21 15:12:05 ib2 kernel: [  415.060536]  [<ffffffff8173c557>] ?
_raw_spin_unlock_irqrestore+0x77/0x80
May 21 15:12:05 ib2 kernel: [  415.060579]  [<ffffffff8105c913>] ?
__queue_work+0x103/0x4a0
May 21 15:12:05 ib2 kernel: [  415.060625]  [<ffffffff8109c009>] ?
trace_hardirqs_off_caller+0x29/0xd0
May 21 15:12:05 ib2 kernel: [  415.060670]  [<ffffffffa02c0373>]
ipoib_cm_tx_handler+0x93/0x2b0 [ib_ipoib]
May 21 15:12:05 ib2 kernel: [  415.060714]  [<ffffffff8109c0bd>] ?
trace_hardirqs_off+0xd/0x10
May 21 15:12:05 ib2 kernel: [  415.060757]  [<ffffffffa0141cc5>]
cm_process_work+0x25/0x120 [ib_cm]
May 21 15:12:05 ib2 kernel: [  415.060801]  [<ffffffffa0142508>]
cm_rep_handler+0x308/0x590 [ib_cm]
May 21 15:12:05 ib2 kernel: [  415.060844]  [<ffffffffa0143c65>]
cm_work_handler+0x145/0x1070 [ib_cm]
May 21 15:12:05 ib2 kernel: [  415.060887]  [<ffffffff8105daea>]
process_one_work+0x19a/0x5c0
May 21 15:12:05 ib2 kernel: [  415.060930]  [<ffffffff8105da7d>] ?
process_one_work+0x12d/0x5c0
May 21 15:12:05 ib2 kernel: [  415.060973]  [<ffffffffa0143b20>] ?
cm_req_handler+0xa40/0xa40 [ib_cm]
May 21 15:12:05 ib2 kernel: [  415.061016]  [<ffffffff8105f865>]
worker_thread+0x175/0x380
May 21 15:12:05 ib2 kernel: [  415.061059]  [<ffffffff8105f6f0>] ?
manage_workers+0x210/0x210
May 21 15:12:05 ib2 kernel: [  415.061102]  [<ffffffff81064e0e>]
kthread+0xbe/0xd0
May 21 15:12:05 ib2 kernel: [  415.061144]  [<ffffffff8109f2b0>] ?
trace_hardirqs_on_caller+0x20/0x1b0
May 21 15:12:05 ib2 kernel: [  415.061188]  [<ffffffff817465b4>]
kernel_thread_helper+0x4/0x10
May 21 15:12:05 ib2 kernel: [  415.061234]  [<ffffffff8173c4c0>] ?
_raw_spin_unlock_irq+0x30/0x50
May 21 15:12:05 ib2 kernel: [  415.061277]  [<ffffffff8109f44d>] ?
trace_hardirqs_on+0xd/0x10
May 21 15:12:05 ib2 kernel: [  415.061319]  [<ffffffff8173c8b0>] ?
retint_restore_args+0x13/0x13
May 21 15:12:05 ib2 kernel: [  415.061363]  [<ffffffff81064d50>] ?
__init_kthread_worker+0x70/0x70
May 21 15:12:05 ib2 kernel: [  415.061406]  [<ffffffff817465b0>] ?
gs_change+0x13/0x13
May 21 15:12:05 ib2 kernel: [  415.061447] Code: 66 66 66 90 65 48 8b 04
25 80 b9 00 00 48 8b 80 70 03 00 00 8b 40 f0 c9 c3 66 90 55 48 89 e5 66
66 66 66 90 48 8b 87 70 03 00 00 <48> 8b 40 f8 c9 c3 66 2e 0f 1f 84 00
00 00 00 00 55 48 89 e5 66
May 21 15:12:05 ib2 kernel: [  415.063139] RIP  [<ffffffff81064700>]
kthread_data+0x10/0x20
May 21 15:12:05 ib2 kernel: [  415.063205]  RSP <ffff88020efd3858>
May 21 15:12:05 ib2 kernel: [  415.063245] CR2: fffffffffffffff8
May 21 15:12:05 ib2 kernel: [  415.063285] ---[ end trace
871425e942ec1143 ]---
May 21 15:12:05 ib2 kernel: [  415.063326] Fixing recursive fault but
reboot is needed!
May 21 15:12:05 ib2 kernel: [  417.441382] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:07 ib2 kernel: [  419.840353] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:10 ib2 kernel: [  422.198880] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:12 ib2 kernel: [  424.597641] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:14 ib2 kernel: [  426.956288] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:17 ib2 kernel: [  429.355047] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:19 ib2 kernel: [  431.753621] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:22 ib2 kernel: [  434.122390] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:24 ib2 kernel: [  436.521068] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:26 ib2 kernel: [  436.660137] ------------[ cut here
]------------
May 21 15:12:26 ib2 kernel: [  436.660216] WARNING: at
kernel/watchdog.c:241 watchdog_overflow_callback+0x98/0xc0()
May 21 15:12:26 ib2 kernel: [  436.660272] Hardware name: System Product
Name
May 21 15:12:26 ib2 kernel: [  436.660313] Watchdog detected hard LOCKUP
on cpu 2
May 21 15:12:26 ib2 kernel: [  436.660341] Modules linked in:
ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad
mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter
ip_tables ebtable_nat ebtables x_tables cpufreq_powersave
cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse
loop kvm_amd kvm tpm_tis powernow_k8 shpchp tpm processor mperf
edac_core tpm_bios psmouse edac_mce_amd pci_hotplug microcode evdev
serio_raw i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh
mlx4_en sg sd_mod crc_t10dif r8169 ahci libahci libata scsi_mod
mlx4_core [last unloaded: ib_ipoib]
May 21 15:12:26 ib2 kernel: [  436.662032] Pid: 3166, comm: kworker/2:0
Tainted: G      D    O 3.4.23-pserver-hotfix+ #109
May 21 15:12:26 ib2 kernel: [  436.662088] Call Trace:
May 21 15:12:26 ib2 kernel: [  436.662127]  <NMI>  [<ffffffff8103c2cf>]
warn_slowpath_common+0x7f/0xc0
May 21 15:12:26 ib2 kernel: [  436.662197]  [<ffffffff8103c3c6>]
warn_slowpath_fmt+0x46/0x50
May 21 15:12:26 ib2 kernel: [  436.662239]  [<ffffffff8109c009>] ?
trace_hardirqs_off_caller+0x29/0xd0
May 21 15:12:26 ib2 kernel: [  436.662283]  [<ffffffff810cd968>]
watchdog_overflow_callback+0x98/0xc0
May 21 15:12:26 ib2 kernel: [  436.662327]  [<ffffffff811077dc>]
__perf_event_overflow+0x9c/0x320
May 21 15:12:26 ib2 kernel: [  436.662370]  [<ffffffff811087ec>] ?
perf_event_update_userpage+0x16c/0x2c0
May 21 15:12:26 ib2 kernel: [  436.662415]  [<ffffffff81108680>] ?
perf_event_mmap_ctx+0x170/0x170
May 21 15:12:26 ib2 kernel: [  436.662458]  [<ffffffff81107f74>]
perf_event_overflow+0x14/0x20
May 21 15:12:26 ib2 kernel: [  436.662501]  [<ffffffff81013f27>]
x86_pmu_handle_irq+0x1b7/0x220
May 21 15:12:26 ib2 kernel: [  436.662545]  [<ffffffff8173e341>]
perf_event_nmi_handler+0x21/0x30
May 21 15:12:26 ib2 kernel: [  436.662588]  [<ffffffff8173d8a6>]
nmi_handle+0xb6/0x200
May 21 15:12:26 ib2 kernel: [  436.662631]  [<ffffffff8173d7f0>] ?
oops_begin+0xd0/0xd0
May 21 15:12:26 ib2 kernel: [  436.662673]  [<ffffffff8173db1d>]
do_nmi+0x12d/0x350
May 21 15:12:26 ib2 kernel: [  436.662715]  [<ffffffff8173ceac>]
end_repeat_nmi+0x1a/0x1e
May 21 15:12:26 ib2 kernel: [  436.662758]  [<ffffffff81420d14>] ?
delay_tsc+0x34/0xb0
May 21 15:12:26 ib2 kernel: [  436.662800]  [<ffffffff81420d14>] ?
delay_tsc+0x34/0xb0
May 21 15:12:26 ib2 kernel: [  436.662842]  [<ffffffff81420d14>] ?
delay_tsc+0x34/0xb0
May 21 15:12:26 ib2 kernel: [  436.662883]  <<EOE>>
[<ffffffff81420c8f>] __delay+0xf/0x20
May 21 15:12:26 ib2 kernel: [  436.662952]  [<ffffffff814285a3>]
do_raw_spin_lock+0xd3/0x140
May 21 15:12:26 ib2 kernel: [  436.662995]  [<ffffffff8173bc74>]
_raw_spin_lock_irq+0x54/0x60
May 21 15:12:26 ib2 kernel: [  436.663037]  [<ffffffff8173a3e0>] ?
__schedule+0x120/0x940
May 21 15:12:26 ib2 kernel: [  436.663080]  [<ffffffff8173a3e0>]
__schedule+0x120/0x940
May 21 15:12:26 ib2 kernel: [  436.663122]  [<ffffffff8173acc9>]
schedule+0x29/0x70
May 21 15:12:26 ib2 kernel: [  436.663164]  [<ffffffff81042293>]
do_exit+0x7a3/0xa40
May 21 15:12:26 ib2 kernel: [  436.663206]  [<ffffffff8103e7fe>] ?
kmsg_dump+0x1be/0x300
May 21 15:12:26 ib2 kernel: [  436.663248]  [<ffffffff8103e6c1>] ?
kmsg_dump+0x81/0x300
May 21 15:12:26 ib2 kernel: [  436.663291]  [<ffffffff817387f9>] ?
printk+0x41/0x48
May 21 15:12:26 ib2 kernel: [  436.663333]  [<ffffffff8173d6db>]
oops_end+0xab/0xf0
May 21 15:12:26 ib2 kernel: [  436.663376]  [<ffffffff8102f6bd>]
no_context+0x11d/0x2d0
May 21 15:12:26 ib2 kernel: [  436.663418]  [<ffffffff810afbf0>] ?
kallsyms_lookup+0x60/0xe0
May 21 15:12:26 ib2 kernel: [  436.663462]  [<ffffffff8102f9ad>]
__bad_area_nosemaphore+0x13d/0x220
May 21 15:12:26 ib2 kernel: [  436.663505]  [<ffffffff8102faa3>]
bad_area_nosemaphore+0x13/0x20
May 21 15:12:26 ib2 kernel: [  436.663548]  [<ffffffff81740603>]
do_page_fault+0x3a3/0x4e0
May 21 15:12:26 ib2 kernel: [  436.663590]  [<ffffffff8173cd06>] ?
error_sti+0x5/0x6
May 21 15:12:26 ib2 kernel: [  436.663632]  [<ffffffff8109c009>] ?
trace_hardirqs_off_caller+0x29/0xd0
May 21 15:12:26 ib2 kernel: [  436.663676]  [<ffffffff8142211d>] ?
trace_hardirqs_off_thunk+0x3a/0x3c
May 21 15:12:26 ib2 kernel: [  436.663719]  [<ffffffff8173cac5>]
page_fault+0x25/0x30
May 21 15:12:26 ib2 kernel: [  436.663762]  [<ffffffff81064700>] ?
kthread_data+0x10/0x20
May 21 15:12:26 ib2 kernel: [  436.663804]  [<ffffffff8105c735>]
wq_worker_sleeping+0x15/0xa0
May 21 15:12:26 ib2 kernel: [  436.663848]  [<ffffffff8173a963>]
__schedule+0x6a3/0x940
May 21 15:12:26 ib2 kernel: [  436.663890]  [<ffffffff8173acc9>]
schedule+0x29/0x70
May 21 15:12:26 ib2 kernel: [  436.663932]  [<ffffffff81042105>]
do_exit+0x615/0xa40
May 21 15:12:26 ib2 kernel: [  436.663974]  [<ffffffff8103e6c1>] ?
kmsg_dump+0x81/0x300
May 21 15:12:26 ib2 kernel: [  436.664017]  [<ffffffff8173d6db>]
oops_end+0xab/0xf0
May 21 15:12:26 ib2 kernel: [  436.664059]  [<ffffffff8100570b>]
die+0x5b/0x90
May 21 15:12:26 ib2 kernel: [  436.664102]  [<ffffffff8173d274>]
do_general_protection+0x164/0x170
May 21 15:12:26 ib2 kernel: [  436.664145]  [<ffffffff8173c8e0>] ?
restore_args+0x30/0x30
May 21 15:12:26 ib2 kernel: [  436.664188]  [<ffffffff8173ca95>]
general_protection+0x25/0x30
May 21 15:12:26 ib2 kernel: [  436.664233]  [<ffffffffa01c8bf9>] ?
ib_modify_qp+0x9/0x20 [ib_core]
May 21 15:12:26 ib2 kernel: [  436.664277]  [<ffffffffa02bfcb9>]
ipoib_cm_rep_handler+0x99/0x2c0 [ib_ipoib]
May 21 15:12:26 ib2 kernel: [  436.664321]  [<ffffffff8109c0bd>] ?
trace_hardirqs_off+0xd/0x10
May 21 15:12:26 ib2 kernel: [  436.664363]  [<ffffffff8173c557>] ?
_raw_spin_unlock_irqrestore+0x77/0x80
May 21 15:12:26 ib2 kernel: [  436.664407]  [<ffffffff8105c913>] ?
__queue_work+0x103/0x4a0
May 21 15:12:26 ib2 kernel: [  436.664450]  [<ffffffff8109c009>] ?
trace_hardirqs_off_caller+0x29/0xd0
May 21 15:12:26 ib2 kernel: [  436.664495]  [<ffffffffa02c0373>]
ipoib_cm_tx_handler+0x93/0x2b0 [ib_ipoib]
May 21 15:12:26 ib2 kernel: [  436.664538]  [<ffffffff8109c0bd>] ?
trace_hardirqs_off+0xd/0x10
May 21 15:12:26 ib2 kernel: [  436.664583]  [<ffffffffa0141cc5>]
cm_process_work+0x25/0x120 [ib_cm]
May 21 15:12:26 ib2 kernel: [  436.664627]  [<ffffffffa0142508>]
cm_rep_handler+0x308/0x590 [ib_cm]
May 21 15:12:26 ib2 kernel: [  436.664671]  [<ffffffffa0143c65>]
cm_work_handler+0x145/0x1070 [ib_cm]
May 21 15:12:26 ib2 kernel: [  436.664714]  [<ffffffff8105daea>]
process_one_work+0x19a/0x5c0
May 21 15:12:26 ib2 kernel: [  436.664756]  [<ffffffff8105da7d>] ?
process_one_work+0x12d/0x5c0
May 21 15:12:26 ib2 kernel: [  436.664800]  [<ffffffffa0143b20>] ?
cm_req_handler+0xa40/0xa40 [ib_cm]
May 21 15:12:26 ib2 kernel: [  436.664843]  [<ffffffff8105f865>]
worker_thread+0x175/0x380
May 21 15:12:26 ib2 kernel: [  436.664886]  [<ffffffff8105f6f0>] ?
manage_workers+0x210/0x210
May 21 15:12:26 ib2 kernel: [  436.664929]  [<ffffffff81064e0e>]
kthread+0xbe/0xd0
May 21 15:12:26 ib2 kernel: [  436.664972]  [<ffffffff8109f2b0>] ?
trace_hardirqs_on_caller+0x20/0x1b0
May 21 15:12:26 ib2 kernel: [  436.665015]  [<ffffffff817465b4>]
kernel_thread_helper+0x4/0x10
May 21 15:12:26 ib2 kernel: [  436.665059]  [<ffffffff8173c4c0>] ?
_raw_spin_unlock_irq+0x30/0x50
May 21 15:12:26 ib2 kernel: [  436.665102]  [<ffffffff8109f44d>] ?
trace_hardirqs_on+0xd/0x10
May 21 15:12:26 ib2 kernel: [  436.665145]  [<ffffffff8173c8b0>] ?
retint_restore_args+0x13/0x13
May 21 15:12:26 ib2 kernel: [  436.665187]  [<ffffffff81064d50>] ?
__init_kthread_worker+0x70/0x70
May 21 15:12:26 ib2 kernel: [  436.665231]  [<ffffffff817465b0>] ?
gs_change+0x13/0x13
May 21 15:12:26 ib2 kernel: [  436.665273] ---[ end trace
871425e942ec1144 ]---
May 21 15:12:26 ib2 kernel: [  438.919742] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:29 ib2 kernel: [  441.318429] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:31 ib2 kernel: [  443.717220] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:34 ib2 kernel: [  446.115789] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:36 ib2 kernel: [  448.514602] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:38 ib2 kernel: [  450.913390] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:41 ib2 kernel: [  453.271906] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:43 ib2 kernel: [  455.670796] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:46 ib2 kernel: [  458.069297] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:48 ib2 kernel: [  460.438309] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:50 ib2 kernel: [  462.836738] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:53 ib2 kernel: [  465.235553] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:55 ib2 kernel: [  467.634331] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:58 ib2 kernel: [  468.407807] ------------[ cut here
]------------
May 21 15:12:58 ib2 kernel: [  468.407897] WARNING: at
kernel/watchdog.c:241 watchdog_overflow_callback+0x98/0xc0()
May 21 15:12:58 ib2 kernel: [  468.407957] Hardware name: System Product
Name
May 21 15:12:58 ib2 kernel: [  468.408001] Watchdog detected hard LOCKUP
on cpu 1
May 21 15:12:58 ib2 kernel: [  468.408032] Modules linked in:
ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad
mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter
ip_tables ebtable_nat ebtables x_tables cpufreq_powersave
cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse
loop kvm_amd kvm tpm_tis powernow_k8 shpchp tpm processor mperf
edac_core tpm_bios psmouse edac_mce_amd pci_hotplug microcode evdev
serio_raw i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh
mlx4_en sg sd_mod crc_t10dif r8169 ahci libahci libata scsi_mod
mlx4_core [last unloaded: ib_ipoib]
May 21 15:12:58 ib2 kernel: [  468.409806] Pid: 0, comm: swapper/1
Tainted: G      D W  O 3.4.23-pserver-hotfix+ #109
May 21 15:12:58 ib2 kernel: [  468.409866] Call Trace:
May 21 15:12:58 ib2 kernel: [  468.409908]  <NMI>  [<ffffffff8103c2cf>]
warn_slowpath_common+0x7f/0xc0
May 21 15:12:58 ib2 kernel: [  468.409986]  [<ffffffff8103c3c6>]
warn_slowpath_fmt+0x46/0x50
May 21 15:12:58 ib2 kernel: [  468.410033]  [<ffffffff8109c009>] ?
trace_hardirqs_off_caller+0x29/0xd0
May 21 15:12:58 ib2 kernel: [  468.410081]  [<ffffffff810cd968>]
watchdog_overflow_callback+0x98/0xc0
May 21 15:12:58 ib2 kernel: [  468.410129]  [<ffffffff811077dc>]
__perf_event_overflow+0x9c/0x320
May 21 15:12:58 ib2 kernel: [  468.410177]  [<ffffffff811087ec>] ?
perf_event_update_userpage+0x16c/0x2c0
May 21 15:12:58 ib2 kernel: [  468.410225]  [<ffffffff81108680>] ?
perf_event_mmap_ctx+0x170/0x170
May 21 15:12:58 ib2 kernel: [  468.410272]  [<ffffffff81107f74>]
perf_event_overflow+0x14/0x20
May 21 15:12:58 ib2 kernel: [  468.410319]  [<ffffffff81013f27>]
x86_pmu_handle_irq+0x1b7/0x220
May 21 15:12:58 ib2 kernel: [  468.410368]  [<ffffffff8173e341>]
perf_event_nmi_handler+0x21/0x30
May 21 15:12:58 ib2 kernel: [  468.410416]  [<ffffffff8173d8a6>]
nmi_handle+0xb6/0x200
May 21 15:12:58 ib2 kernel: [  468.410462]  [<ffffffff8173d7f0>] ?
oops_begin+0xd0/0xd0
May 21 15:12:58 ib2 kernel: [  468.410508]  [<ffffffff8173db1d>]
do_nmi+0x12d/0x350
May 21 15:12:58 ib2 kernel: [  468.410554]  [<ffffffff8173ceac>]
end_repeat_nmi+0x1a/0x1e
May 21 15:12:58 ib2 kernel: [  468.410602]  [<ffffffff81420d41>] ?
delay_tsc+0x61/0xb0
May 21 15:12:58 ib2 kernel: [  468.410648]  [<ffffffff81420d41>] ?
delay_tsc+0x61/0xb0
May 21 15:12:58 ib2 kernel: [  468.410694]  [<ffffffff81420d41>] ?
delay_tsc+0x61/0xb0
May 21 15:12:58 ib2 kernel: [  468.410738]  <<EOE>>  <IRQ>
[<ffffffff81420c8f>] __delay+0xf/0x20
May 21 15:12:58 ib2 kernel: [  468.410839]  [<ffffffff814285a3>]
do_raw_spin_lock+0xd3/0x140
May 21 15:12:58 ib2 kernel: [  468.410885]  [<ffffffff8173bba8>]
_raw_spin_lock+0x48/0x50
May 21 15:12:58 ib2 kernel: [  468.410932]  [<ffffffff810834f2>] ?
sched_rt_period_timer+0xf2/0x270
May 21 15:12:58 ib2 kernel: [  468.410980]  [<ffffffff8173c58b>] ?
_raw_spin_unlock+0x2b/0x50
May 21 15:12:58 ib2 kernel: [  468.411027]  [<ffffffff810834f2>]
sched_rt_period_timer+0xf2/0x270
May 21 15:12:58 ib2 kernel: [  468.411075]  [<ffffffff81069ff6>]
__run_hrtimer+0x86/0x2f0
May 21 15:12:58 ib2 kernel: [  468.411121]  [<ffffffff81083400>] ?
init_rt_bandwidth+0x60/0x60
May 21 15:12:58 ib2 kernel: [  468.411168]  [<ffffffff8106a50e>]
hrtimer_interrupt+0xfe/0x270
May 21 15:12:58 ib2 kernel: [  468.411215]  [<ffffffff81746ea9>]
smp_apic_timer_interrupt+0x69/0x99
May 21 15:12:58 ib2 kernel: [  468.411263]  [<ffffffff81745caf>]
apic_timer_interrupt+0x6f/0x80
May 21 15:12:58 ib2 kernel: [  468.411308]  <EOI>  [<ffffffff8100bab1>]
? default_idle+0x61/0x320
May 21 15:12:58 ib2 kernel: [  468.411383]  [<ffffffff8109f44d>] ?
trace_hardirqs_on+0xd/0x10
May 21 15:12:58 ib2 kernel: [  468.411431]  [<ffffffff8102b3d6>] ?
native_safe_halt+0x6/0x10
May 21 15:12:58 ib2 kernel: [  468.411477]  [<ffffffff8109f44d>] ?
trace_hardirqs_on+0xd/0x10
May 21 15:12:58 ib2 kernel: [  468.411523]  [<ffffffff8100bab6>]
default_idle+0x66/0x320
May 21 15:12:58 ib2 kernel: [  468.411569]  [<ffffffff8100be02>]
amd_e400_idle+0x92/0x130
May 21 15:12:58 ib2 kernel: [  468.411617]  [<ffffffff8100af36>]
cpu_idle+0xf6/0x140
May 21 15:12:58 ib2 kernel: [  468.411664]  [<ffffffff81731d77>]
start_secondary+0x1ed/0x1f4
May 21 15:12:58 ib2 kernel: [  468.411709] ---[ end trace
871425e942ec1145 ]---
May 21 15:12:58 ib2 kernel: [  470.032848] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:00 ib2 kernel: [  472.431601] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:02 ib2 kernel: [  474.830297] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:05 ib2 kernel: [  477.229094] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:07 ib2 kernel: [  479.627563] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:10 ib2 kernel: [  482.026253] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:12 ib2 kernel: [  484.395049] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:14 ib2 kernel: [  486.793758] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:17 ib2 kernel: [  489.192468] ib0: enabling connected mode
will cause multicast packet drops


[  884.055635] general protection fault: 0000 [#1] SMP
[  884.055780] CPU 0
[  884.055821] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm
ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core
ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables
x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats
cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm powernow_k8 psmouse
serio_raw mperf microcode tpm_tis tpm evdev edac_core tpm_bios
edac_mce_amd asus_atk0110 shpchp i2c_piix4 pci_hotplug processor
thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif
mlx4_core r8169 ahci libahci libata scsi_mod [last unloaded: ib_ipoib]
[  884.058726]
[  884.058788] Pid: 3001, comm: kworker/0:0 Tainted: G           O
3.4.23-pserver-hotfix+ #111 System manufacturer System Product
Name/M4A89GTD-PRO
[  884.059827] RIP: 0010:[<ffffffffa02dc3e0>]  [<ffffffffa02dc3e0>]
ipoib_cm_tx_handler+0x30/0x2b0 [ib_ipoib]
[  884.059952] RSP: 0018:ffff8801fad67c50  EFLAGS: 00010293
[  884.060015] RAX: ffff8801fad67fd8 RBX: ffff880211ed5d88 RCX:
0000000000000006
[  884.060080] RDX: 0000000000000003 RSI: ffff8801f664c0d8 RDI:
ffff880211ed5d88
[  884.060139] RBP: ffff8801fad67ca0 R08: 0000000000000001 R09:
0000000000000002
[  884.060198] R10: 0000000000000000 R11: 0000000000000000 R12:
ffff8801f664c000
[  884.060257] R13: ffff88020d110b98 R14: 6b6b6b6b6b6b756b R15:
ffff8801f664c0d8
[  884.060316] FS:  00007f11da415700(0000) GS:ffff88021fc00000(0000)
knlGS:0000000000000000
[  884.060390] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  884.060449] CR2: 00007f11d032c000 CR3: 00000001f16f5000 CR4:
00000000000007f0
[  884.060512] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  884.060579] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  884.060643] Process kworker/0:0 (pid: 3001, threadinfo
ffff8801fad66000, task ffff8801fb734180)
[  884.060717] Stack:
[  884.060777]  ffff8801fad67ca0 ffffffff8109f019 ffff8801fad67c70
ffffffff8109c0bd
[  884.061014]  ffff8801fad67c90 ffff880211ed5d88 ffff8801f664c000
ffff8801f664c000
[  884.061248]  ffff88020c031100 ffff8801fad67dc0 ffff8801fad67cf0
ffffffffa017fcc5
[  884.061486] Call Trace:
[  884.061544]  [<ffffffff8109f019>] ? mark_held_locks+0x79/0x120
[  884.061610]  [<ffffffff8109c0bd>] ? trace_hardirqs_off+0xd/0x10
[  884.061673]  [<ffffffffa017fcc5>] cm_process_work+0x25/0x120 [ib_cm]
[  884.061734]  [<ffffffffa0180508>] cm_rep_handler+0x308/0x590 [ib_cm]
[  884.061798]  [<ffffffffa0181c65>] cm_work_handler+0x145/0x1070 [ib_cm]
[  884.061867]  [<ffffffff8105daea>] process_one_work+0x19a/0x5c0
[  884.061929]  [<ffffffff8105da7d>] ? process_one_work+0x12d/0x5c0
[  884.061990]  [<ffffffffa0181b20>] ? cm_req_handler+0xa40/0xa40 [ib_cm]
[  884.062055]  [<ffffffff8105f865>] worker_thread+0x175/0x380
[  884.062116]  [<ffffffff8105f6f0>] ? manage_workers+0x210/0x210
[  884.062176]  [<ffffffff81064e0e>] kthread+0xbe/0xd0
[  884.062239]  [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0
[  884.062302]  [<ffffffff81746734>] kernel_thread_helper+0x4/0x10
[  884.062792]  [<ffffffff8173ca30>] ? retint_restore_args+0x13/0x13
[  884.062853]  [<ffffffff81064d50>] ? __init_kthread_worker+0x70/0x70
[  884.062914]  [<ffffffff81746730>] ? gs_change+0x13/0x13
[  884.062974] Code: 57 41 56 41 55 41 54 53 48 83 ec 28 66 66 66 66 90
4c 8b 6f 08 8b 16 48 89 fb 49 89 f7 4d 8b 75 20 49 81 c6 00 0a 00 00 83
fa 0b <4d> 8b 66 38 77 2a 89 d0 ff 24 c5 90 08 2e a0 90 44 8b 1d a1 79
[  884.066632] RIP  [<ffffffffa02dc3e0>] ipoib_cm_tx_handler+0x30/0x2b0
[ib_ipoib]
[  884.066770]  RSP <ffff8801fad67c50>
[  884.066841] ---[ end trace fa3d54b0aa9bc9ce ]---
(gdb) list *ipoib_cm_tx_handler+0x30
0xa410 is in ipoib_cm_tx_handler
(drivers/infiniband/ulp/ipoib/ipoib_cm.c:1208).
1203	static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
1204				       struct ib_cm_event *event)
1205	{
1206		struct ipoib_cm_tx *tx = cm_id->context;
1207		struct ipoib_dev_priv *priv = netdev_priv(tx->dev);
1208		struct net_device *dev = priv->dev;
1209		struct ipoib_neigh *neigh;
1210		unsigned long flags;
1211		int ret;
1212	
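
A note on reading the register dump against the listing above, offered as
a rough sketch rather than a confirmed diagnosis. R14 is
0x6b6b6b6b6b6b756b, which is the slab free-poison byte 0x6b (POISON_FREE)
repeated eight times, plus 0xa00. The Code: bytes just before the fault
appear to decode to "mov 0x20(%r13),%r14; add $0xa00,%r14", followed by
the faulting "mov 0x38(%r14),%r12". If that reading is right, 0xa00 is
presumably the netdev_priv() offset on this build and tx->dev was loaded
from memory that had already been freed, which would be consistent with a
use-after-free of the ipoib_cm_tx (assuming slab poisoning is enabled in
this config). The helper names below are made up for illustration; the
arithmetic can be checked like this:

```python
# Sketch: check whether a register value from an oops looks like slab
# poison plus a small constant offset. POISON_FREE (0x6b) is the byte the
# kernel's slab debugging writes over freed objects.

POISON_FREE = 0x6B
MASK64 = (1 << 64) - 1


def poison_word(byte=POISON_FREE):
    """Return a 64-bit word with every byte set to `byte`."""
    return int.from_bytes(bytes([byte]) * 8, "little")


def offset_from_poison(reg_value, byte=POISON_FREE):
    """Return reg_value minus the poison word, modulo 2**64."""
    return (reg_value - poison_word(byte)) & MASK64


def is_canonical(addr):
    """True if bits 63..47 are all equal (x86-64 with 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == (1 << 17) - 1


r14 = 0x6B6B6B6B6B6B756B             # R14 from the register dump above
print(hex(offset_from_poison(r14)))  # 0xa00
print(is_canonical(r14 + 0x38))      # False
```

The address R14 + 0x38 is non-canonical, which is why the fault is
reported as a general protection fault rather than as a paging request.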



[  884.066926] BUG: unable to handle kernel paging request at
fffffffffffffff8
[  884.067090] IP: [<ffffffff81064700>] kthread_data+0x10/0x20
[  884.067210] PGD 1c0d067 PUD 1c0e067 PMD 0
[  884.067412] Oops: 0000 [#2] SMP
[  884.067565] CPU 0
[  884.067618] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm
ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core
ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables
x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats
cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm powernow_k8 psmouse
serio_raw mperf microcode tpm_tis tpm evdev edac_core tpm_bios
edac_mce_amd asus_atk0110 shpchp i2c_piix4 pci_hotplug processor
thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif
mlx4_core r8169 ahci libahci libata scsi_mod [last unloaded: ib_ipoib]
[  884.071695]
[  884.071753] Pid: 3001, comm: kworker/0:0 Tainted: G      D    O
3.4.23-pserver-hotfix+ #111 System manufacturer System Product
Name/M4A89GTD-PRO
[  884.071972] RIP: 0010:[<ffffffff81064700>]  [<ffffffff81064700>]
kthread_data+0x10/0x20
[  884.072099] RSP: 0018:ffff8801fad679a8  EFLAGS: 00010096
[  884.072168] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
0000000000000000
[  884.072228] RDX: ffffffff81e138c0 RSI: 0000000000000000 RDI:
ffff8801fb734180
[  884.072293] RBP: ffff8801fad679a8 R08: ffff8801fb7341f0 R09:
000000cdd60f50a3
[  884.072357] R10: 0000000000000c00 R11: 0000000000000000 R12:
0000000000000000
[  884.072422] R13: ffff8801fb734548 R14: ffff8801fad677d8 R15:
ffff8801f664c0d8
[  884.072485] FS:  00007f11da415700(0000) GS:ffff88021fc00000(0000)
knlGS:0000000000000000
[  884.072560] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  884.072623] CR2: fffffffffffffff8 CR3: 00000001f16f5000 CR4:
00000000000007f0
[  884.072690] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  884.072762] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  884.072827] Process kworker/0:0 (pid: 3001, threadinfo
ffff8801fad66000, task ffff8801fb734180)
[  884.072909] Stack:
[  884.072969]  ffff8801fad679c8 ffffffff8105c735 ffff8801fad679c8
ffff88021fc12f40
[  884.074211]  ffff8801fad67a58 ffffffff8173aad3 ffff880100000000
ffff8801fad66000
[  884.074481]  ffff8801fad67fd8 ffff8801fad66000 ffff8801fad66010
ffff8801fad66000
[  884.074742] Call Trace:
[  884.074801]  [<ffffffff8105c735>] wq_worker_sleeping+0x15/0xa0
[  884.074869]  [<ffffffff8173aad3>] __schedule+0x6a3/0x940
[  884.074934]  [<ffffffff8173ae39>] schedule+0x29/0x70
[  884.074998]  [<ffffffff81042105>] do_exit+0x615/0xa40
[  884.075061]  [<ffffffff8103e6c1>] ? kmsg_dump+0x81/0x300
[  884.075123]  [<ffffffff8173d85b>] oops_end+0xab/0xf0
[  884.075184]  [<ffffffff8100570b>] die+0x5b/0x90
[  884.075245]  [<ffffffff8173d3f4>] do_general_protection+0x164/0x170
[  884.075308]  [<ffffffff8173ca60>] ? restore_args+0x30/0x30
[  884.075370]  [<ffffffff8173cc15>] general_protection+0x25/0x30
[  884.075434]  [<ffffffffa02dc3e0>] ? ipoib_cm_tx_handler+0x30/0x2b0
[ib_ipoib]
[  884.075498]  [<ffffffff8109f019>] ? mark_held_locks+0x79/0x120
[  884.075559]  [<ffffffff8109c0bd>] ? trace_hardirqs_off+0xd/0x10
[  884.075622]  [<ffffffffa017fcc5>] cm_process_work+0x25/0x120 [ib_cm]
[  884.075686]  [<ffffffffa0180508>] cm_rep_handler+0x308/0x590 [ib_cm]
[  884.075750]  [<ffffffffa0181c65>] cm_work_handler+0x145/0x1070 [ib_cm]
[  884.075813]  [<ffffffff8105daea>] process_one_work+0x19a/0x5c0
[  884.075875]  [<ffffffff8105da7d>] ? process_one_work+0x12d/0x5c0
[  884.075938]  [<ffffffffa0181b20>] ? cm_req_handler+0xa40/0xa40 [ib_cm]
[  884.076001]  [<ffffffff8105f865>] worker_thread+0x175/0x380
[  884.076064]  [<ffffffff8105f6f0>] ? manage_workers+0x210/0x210
[  884.076126]  [<ffffffff81064e0e>] kthread+0xbe/0xd0
[  884.076187]  [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0
[  884.076252]  [<ffffffff81746734>] kernel_thread_helper+0x4/0x10
[  884.076313]  [<ffffffff8173ca30>] ? retint_restore_args+0x13/0x13
[  884.076376]  [<ffffffff81064d50>] ? __init_kthread_worker+0x70/0x70
[  884.076438]  [<ffffffff81746730>] ? gs_change+0x13/0x13
[  884.076499] Code: 66 66 66 90 65 48 8b 04 25 80 b9 00 00 48 8b 80 70
03 00 00 8b 40 f0 c9 c3 66 90 55 48 89 e5 66 66 66 66 90 48 8b 87 70 03
00 00 <48> 8b 40 f8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66
[  884.081230] RIP  [<ffffffff81064700>] kthread_data+0x10/0x20
[  884.081332]  RSP <ffff8801fad679a8>
[  884.081388] CR2: fffffffffffffff8
[  884.081447] ---[ end trace fa3d54b0aa9bc9cf ]---
[  884.081504] Fixing recursive fault but reboot is needed!
[  903.845688] ------------[ cut here ]------------
[  903.845800] WARNING: at kernel/watchdog.c:241
watchdog_overflow_callback+0x98/0xc0()
[  903.845878] Hardware name: System Product Name
[  903.845939] Watchdog detected hard LOCKUP on cpu 3
[  903.845989] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm
ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core
ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables
x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats
cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm powernow_k8 psmouse
serio_raw mperf microcode tpm_tis tpm evdev edac_core tpm_bios
edac_mce_amd asus_atk0110 shpchp i2c_piix4 pci_hotplug processor
thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif
mlx4_core r8169 ahci libahci libata scsi_mod [last unloaded: ib_ipoib]
[  903.850712] Pid: 19, comm: ksoftirqd/3 Tainted: G      D    O
3.4.23-pserver-hotfix+ #111
[  903.850790] Call Trace:
[  903.850851]  <NMI>  [<ffffffff8103c2cf>] warn_slowpath_common+0x7f/0xc0
[  903.850967]  [<ffffffff8103c3c6>] warn_slowpath_fmt+0x46/0x50
[  903.851034]  [<ffffffff8109c009>] ? trace_hardirqs_off_caller+0x29/0xd0
[  903.851101]  [<ffffffff810cd968>] watchdog_overflow_callback+0x98/0xc0
[  903.851167]  [<ffffffff811077dc>] __perf_event_overflow+0x9c/0x320
[  903.851233]  [<ffffffff811087ec>] ?
perf_event_update_userpage+0x16c/0x2c0
[  903.851299]  [<ffffffff81108680>] ? perf_event_mmap_ctx+0x170/0x170
[  903.852535]  [<ffffffff81107f74>] perf_event_overflow+0x14/0x20
[  903.852601]  [<ffffffff81013f27>] x86_pmu_handle_irq+0x1b7/0x220
[  903.852668]  [<ffffffff8173e4c1>] perf_event_nmi_handler+0x21/0x30
[  903.852733]  [<ffffffff8173da26>] nmi_handle+0xb6/0x200
[  903.852798]  [<ffffffff8173d970>] ? oops_begin+0xd0/0xd0
[  903.852863]  [<ffffffff8173dc9d>] do_nmi+0x12d/0x350
[  903.852928]  [<ffffffff8173d02c>] end_repeat_nmi+0x1a/0x1e
[  903.852994]  [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0
[  903.853059]  [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0
[  903.853123]  [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0
[  903.853188]  <<EOE>>  [<ffffffff81420dff>] __delay+0xf/0x20
[  903.853302]  [<ffffffff81428713>] do_raw_spin_lock+0xd3/0x140
[  903.853367]  [<ffffffff8173bd18>] _raw_spin_lock+0x48/0x50
[  903.853433]  [<ffffffff8107771f>] ? try_to_wake_up+0x20f/0x2f0
[  903.853498]  [<ffffffff8107771f>] try_to_wake_up+0x20f/0x2f0
[  903.853564]  [<ffffffff81077812>] default_wake_function+0x12/0x20
[  903.853629]  [<ffffffff810654cd>] autoremove_wake_function+0x1d/0x50
[  903.853694]  [<ffffffff8106e729>] __wake_up_common+0x59/0x90
[  903.853759]  [<ffffffff81071310>] __wake_up+0x40/0x60
[  903.853827]  [<ffffffff815cc82c>] sk_stream_write_space+0xdc/0x230
[  903.853892]  [<ffffffff815cc794>] ? sk_stream_write_space+0x44/0x230
[  903.853958]  [<ffffffff81629760>] tcp_data_snd_check+0x110/0x120
[  903.854023]  [<ffffffff8162e829>] tcp_rcv_established+0x389/0x870
[  903.854089]  [<ffffffff81639a17>] tcp_v4_do_rcv+0x297/0x5d0
[  903.854153]  [<ffffffff8163a2f1>] tcp_v4_rcv+0x5a1/0x930
[  903.854217]  [<ffffffff81611dfc>] ? ip_local_deliver_finish+0x4c/0x4f0
[  903.854283]  [<ffffffff81611ee5>] ip_local_deliver_finish+0x135/0x4f0
[  903.854348]  [<ffffffff81611dfc>] ? ip_local_deliver_finish+0x4c/0x4f0
[  903.854413]  [<ffffffff81611da0>] ip_local_deliver+0x80/0x90
[  903.854478]  [<ffffffff8161244d>] ip_rcv_finish+0x1ad/0x660
[  903.854544]  [<ffffffff81611c58>] ip_rcv+0x228/0x2f0
[  903.854610]  [<ffffffff815d7696>] __netif_receive_skb+0x2c6/0x990
[  903.854675]  [<ffffffff815d74e6>] ? __netif_receive_skb+0x116/0x990
[  903.854741]  [<ffffffff81162487>] ?
__kmalloc_node_track_caller+0xf7/0x250
[  903.854807]  [<ffffffff815d89bd>] netif_receive_skb+0x2d/0x210
[  903.854877]  [<ffffffffa02de26a>] ipoib_cm_handle_rx_wc+0x1fa/0x710
[ib_ipoib]
[  903.854958]  [<ffffffff8173c6fb>] ? _raw_spin_unlock+0x2b/0x50
[  903.855026]  [<ffffffffa02ded32>] ? ipoib_cm_handle_tx_wc+0x1c2/0x370
[ib_ipoib]
[  903.855108]  [<ffffffffa02d7a86>] ipoib_poll+0xd6/0x190 [ib_ipoib]
[  903.855173]  [<ffffffff815d97ad>] net_rx_action+0x13d/0x320
[  903.855239]  [<ffffffff81045048>] __do_softirq+0xf8/0x380
[  903.855304]  [<ffffffff810453ed>] run_ksoftirqd+0x11d/0x1e0
[  903.855368]  [<ffffffff810452d0>] ? __do_softirq+0x380/0x380
[  903.855433]  [<ffffffff81064e0e>] kthread+0xbe/0xd0
[  903.855497]  [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0
[  903.855564]  [<ffffffff81746734>] kernel_thread_helper+0x4/0x10
[  903.856798]  [<ffffffff8173ca30>] ? retint_restore_args+0x13/0x13
[  903.856864]  [<ffffffff81064d50>] ? __init_kthread_worker+0x70/0x70
[  903.856929]  [<ffffffff81746730>] ? gs_change+0x13/0x13
[  903.856993] ---[ end trace fa3d54b0aa9bc9d0 ]---
[  917.505825] ------------[ cut here ]------------
[  917.505938] WARNING: at kernel/watchdog.c:241
watchdog_overflow_callback+0x98/0xc0()
[  917.506014] Hardware name: System Product Name
[  917.506075] Watchdog detected hard LOCKUP on cpu 2
[  917.506123] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm
ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core
ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables
x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats
cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm powernow_k8 psmouse
serio_raw mperf microcode tpm_tis tpm evdev edac_core tpm_bios
edac_mce_amd asus_atk0110 shpchp i2c_piix4 pci_hotplug processor
thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif
mlx4_core r8169 ahci libahci libata scsi_mod [last unloaded: ib_ipoib]
[  917.510288] Pid: 3337, comm: iperf Tainted: G      D W  O
3.4.23-pserver-hotfix+ #111
[  917.510362] Call Trace:
[  917.510421]  <NMI>  [<ffffffff8103c2cf>] warn_slowpath_common+0x7f/0xc0
[  917.510534]  [<ffffffff8103c3c6>] warn_slowpath_fmt+0x46/0x50
[  917.510598]  [<ffffffff8109c009>] ? trace_hardirqs_off_caller+0x29/0xd0
[  917.510662]  [<ffffffff810cd968>] watchdog_overflow_callback+0x98/0xc0
[  917.511154]  [<ffffffff811077dc>] __perf_event_overflow+0x9c/0x320
[  917.511218]  [<ffffffff811087ec>] ?
perf_event_update_userpage+0x16c/0x2c0
[  917.511283]  [<ffffffff81108680>] ? perf_event_mmap_ctx+0x170/0x170
[  917.511347]  [<ffffffff81107f74>] perf_event_overflow+0x14/0x20
[  917.511411]  [<ffffffff81013f27>] x86_pmu_handle_irq+0x1b7/0x220
[  917.511477]  [<ffffffff8173e4c1>] perf_event_nmi_handler+0x21/0x30
[  917.511541]  [<ffffffff8173da26>] nmi_handle+0xb6/0x200
[  917.511604]  [<ffffffff8173d970>] ? oops_begin+0xd0/0xd0
[  917.511669]  [<ffffffff8173dc9d>] do_nmi+0x12d/0x350
[  917.511732]  [<ffffffff8173d02c>] end_repeat_nmi+0x1a/0x1e
[  917.511796]  [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0
[  917.511859]  [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0
[  917.511921]  [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0
[  917.511984]  <<EOE>>  [<ffffffff81420dff>] __delay+0xf/0x20
[  917.512093]  [<ffffffff81428713>] do_raw_spin_lock+0xd3/0x140
[  917.512158]  [<ffffffff8173bd18>] _raw_spin_lock+0x48/0x50
[  917.513308]  [<ffffffff8107eee0>] ? load_balance+0x540/0x8a0
[  917.513371]  [<ffffffff8107eee0>] load_balance+0x540/0x8a0
[  917.513435]  [<ffffffff8107eefc>] ? load_balance+0x55c/0x8a0
[  917.513498]  [<ffffffff8107fe8d>] idle_balance+0x13d/0x2b0
[  917.513560]  [<ffffffff8107fda0>] ? idle_balance+0x50/0x2b0
[  917.513623]  [<ffffffff8173acc0>] __schedule+0x890/0x940
[  917.513686]  [<ffffffff8173ae39>] schedule+0x29/0x70
[  917.513749]  [<ffffffff81738bd5>] schedule_timeout+0x225/0x3b0
[  917.513812]  [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0
[  917.513877]  [<ffffffff815c26ae>] ? release_sock+0x14e/0x1b0
[  917.513939]  [<ffffffff8109f44d>] ? trace_hardirqs_on+0xd/0x10
[  917.514003]  [<ffffffff81045542>] ? local_bh_enable_ip+0x92/0xf0
[  917.514067]  [<ffffffff8173c5f3>] ? _raw_spin_unlock_bh+0x43/0x50
[  917.514132]  [<ffffffff815ccf98>] sk_stream_wait_memory+0x218/0x300
[  917.514196]  [<ffffffff810654b0>] ? wake_up_bit+0x40/0x40
[  917.514260]  [<ffffffff816247d1>] tcp_sendmsg+0x681/0xc30
[  917.514324]  [<ffffffff8164e0db>] inet_sendmsg+0x12b/0x240
[  917.514387]  [<ffffffff8164dfb0>] ? inet_create+0x5b0/0x5b0
[  917.514450]  [<ffffffff815c27c2>] ? sock_update_classid+0xb2/0x2b0
[  917.514514]  [<ffffffff815c2860>] ? sock_update_classid+0x150/0x2b0
[  917.514577]  [<ffffffff815bdf90>] sock_aio_write+0x190/0x1b0
[  917.514641]  [<ffffffff8113924f>] ? handle_pte_fault+0x50f/0x8e0
[  917.514706]  [<ffffffff8116e11a>] do_sync_write+0xea/0x130
[  917.514770]  [<ffffffff81170cc3>] ? fget_light+0x43/0x490
[  917.514835]  [<ffffffff813b1013>] ? security_file_permission+0x23/0x90
[  917.514900]  [<ffffffff8116e772>] vfs_write+0x172/0x190
[  917.514965]  [<ffffffff8116e881>] sys_write+0x51/0x90
[  917.515028]  [<ffffffff817452e9>] system_call_fastpath+0x16/0x1b
[  917.515092] ---[ end trace fa3d54b0aa9bc9d1 ]---

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB
       [not found]             ` <519E37DE.1080504-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
@ 2013-05-23 17:41               ` Doug Ledford
       [not found]                 ` <519E54D9.1050506-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Doug Ledford @ 2013-05-23 17:41 UTC (permalink / raw)
  To: Jack Wang
  Cc: Sebastian Riemer, Shlomo Pongratz, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA


On 05/23/2013 11:38 AM, Jack Wang wrote:
> Tainted: G           O 3.4.23-pserver-hotfix+ #109 System manufacturer
                         ^^^^^^^

I would try a newer kernel.  There are a couple known issues fixed since
this kernel (including a memory corrupter that was involved with
neighbor list handling, and some of your traces look vaguely similar to
that old failure).



-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
	      http://people.redhat.com/dledford





* Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB
       [not found]                 ` <519E54D9.1050506-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-05-23 18:53                   ` Jack Wang
       [not found]                     ` <519E65B5.1090909-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Jack Wang @ 2013-05-23 18:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Sebastian Riemer, Shlomo Pongratz, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 2013-05-23 19:41, Doug Ledford wrote:
> On 05/23/2013 11:38 AM, Jack Wang wrote:
>> Tainted: G           O 3.4.23-pserver-hotfix+ #109 System manufacturer
>                          ^^^^^^^
> 
> I would try a newer kernel.  There are a couple known issues fixed since
> this kernel (including a memory corrupter that was involved with
> neighbor list handling, and some of your traces look vaguely similar to
> that old failure).
> 
> 
> 

Thanks for the reply, Doug. I tried the rdma-for-linus branch; it panics
in other places.

Could you point me to the exact commit you mean?

Regards,
Jack


* Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB
       [not found]                     ` <519E65B5.1090909-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
@ 2013-05-23 18:55                       ` Doug Ledford
  0 siblings, 0 replies; 7+ messages in thread
From: Doug Ledford @ 2013-05-23 18:55 UTC (permalink / raw)
  To: Jack Wang
  Cc: Sebastian Riemer, Shlomo Pongratz, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA


On 05/23/2013 02:53 PM, Jack Wang wrote:
> On 2013-05-23 19:41, Doug Ledford wrote:
>> On 05/23/2013 11:38 AM, Jack Wang wrote:
>>> Tainted: G           O 3.4.23-pserver-hotfix+ #109 System manufacturer
>>                          ^^^^^^^
>>
>> I would try a newer kernel.  There are a couple known issues fixed since
>> this kernel (including a memory corrupter that was involved with
>> neighbor list handling, and some of your traces look vaguely similar to
>> that old failure).
>>
>>
>>
> 
> Thanks for the reply, Doug. I tried the rdma-for-linus branch; it panics
> in other places.
> 
> Could you point me to the exact commit you mean?
> 
> Regards,
> Jack
> 

Just try the official v3.9 kernel from Linus and see how it does.  A
'git checkout v3.9' will do the trick.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
	      http://people.redhat.com/dledford





Thread overview: 7 messages
2013-05-17 14:16 BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB Jack Wang
     [not found] ` <51963BC6.1050901-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
2013-05-21 12:51   ` Sebastian Riemer
     [not found]     ` <519B6DD6.4090502-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
2013-05-21 15:19       ` Jack Wang
     [not found]         ` <519B909A.8010004-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
2013-05-23 15:38           ` Jack Wang
     [not found]             ` <519E37DE.1080504-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
2013-05-23 17:41               ` Doug Ledford
     [not found]                 ` <519E54D9.1050506-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-05-23 18:53                   ` Jack Wang
     [not found]                     ` <519E65B5.1090909-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
2013-05-23 18:55                       ` Doug Ledford
