public inbox for linux-rdma@vger.kernel.org
* list corruption in IPOIB
@ 2013-05-17 19:36 Jack Wang
From: Jack Wang @ 2013-05-17 19:36 UTC
  To: Shlomo Pongratz, Or Gerlitz,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: Dongsu Park

Hi Shlomo & Or,

We've seen the neigh->list list corruption warning below during testing.
In Dongsu's and my opinion, several places also need
netif_tx_lock(_bh)/netif_tx_unlock(_bh) pairs around neigh->list
accesses. I tried adding netif_tx_lock/netif_tx_unlock to
ipoib_cm_destroy_tx, and that improved the situation. There are other
places in ipoib_main.c and ipoib_mcast.c, but I don't know which lock
should be added there. If you could take some time to look into it,
that would be great.



May 17 15:17:57 ib2 kernel: [  274.910792] ib0: failed to send RTU: -22
May 17 15:17:59 ib2 kernel: [  276.118006] ib0: enabling connected mode will cause multicast packet drops
May 17 15:18:01 ib2 kernel: [  278.557566] ib0: enabling connected mode will cause multicast packet drops
May 17 15:18:02 ib2 kernel: [  279.793565] ib0: failed to send cm req: -22
May 17 15:18:02 ib2 kernel: [  279.793713] ------------[ cut here ]------------
May 17 15:18:02 ib2 kernel: [  279.793779] WARNING: at lib/list_debug.c:49 __list_del_entry+0x63/0xd0()
May 17 15:18:02 ib2 kernel: [  279.793840] Hardware name: System Product Name
May 17 15:18:02 ib2 kernel: [  279.793898] list_del corruption, ffff8801f9708740->next is LIST_POISON1 (dead000000100100)
May 17 15:18:02 ib2 kernel: [  279.794013] Modules linked in: rdma_ucm rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm psmouse powernow_k8 tpm_tis tpm tpm_bios serio_raw edac_core mperf evdev shpchp processor edac_mce_amd microcode pci_hotplug i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif mlx4_core ahci libahci r8169 libata scsi_mod [last unloaded: scsi_wait_scan]
May 17 15:18:02 ib2 kernel: [  279.796082] Pid: 220, comm: kworker/u:5 Not tainted 3.4.23-pserver+ #98
May 17 15:18:02 ib2 kernel: [  279.796142] Call Trace:
May 17 15:18:02 ib2 kernel: [  279.796202]  [<ffffffff8103c21f>] warn_slowpath_common+0x7f/0xc0
May 17 15:18:02 ib2 kernel: [  279.796266]  [<ffffffff8103c316>] warn_slowpath_fmt+0x46/0x50
May 17 15:18:02 ib2 kernel: [  279.796328]  [<ffffffff81428ff3>] __list_del_entry+0x63/0xd0
May 17 15:18:02 ib2 kernel: [  279.796828]  [<ffffffff81429071>] list_del+0x11/0x40
May 17 15:18:02 ib2 kernel: [  279.796897]  [<ffffffffa02b7978>] ipoib_cm_tx_start+0x2e8/0x3b0 [ib_ipoib]
May 17 15:18:02 ib2 kernel: [  279.796964]  [<ffffffff8105da3a>] process_one_work+0x19a/0x5c0
May 17 15:18:02 ib2 kernel: [  279.797026]  [<ffffffff8105d9cd>] ? process_one_work+0x12d/0x5c0
May 17 15:18:02 ib2 kernel: [  279.797096]  [<ffffffffa02b7690>] ? ipoib_cm_destroy_tx+0xc0/0xc0 [ib_ipoib]
May 17 15:18:02 ib2 kernel: [  279.797162]  [<ffffffff8105f7b5>] worker_thread+0x175/0x380
May 17 15:18:02 ib2 kernel: [  279.797224]  [<ffffffff8105f640>] ? manage_workers+0x210/0x210
May 17 15:18:02 ib2 kernel: [  279.797285]  [<ffffffff81064d5e>] kthread+0xbe/0xd0
May 17 15:18:02 ib2 kernel: [  279.797346]  [<ffffffff8109f1d0>] ? trace_hardirqs_on_caller+0x20/0x1b0
May 17 15:18:02 ib2 kernel: [  279.797412]  [<ffffffff81746b74>] kernel_thread_helper+0x4/0x10
May 17 15:18:02 ib2 kernel: [  279.797475]  [<ffffffff8173ce70>] ? retint_restore_args+0x13/0x13
May 17 15:18:02 ib2 kernel: [  279.797539]  [<ffffffff81064ca0>] ? __init_kthread_worker+0x70/0x70
May 17 15:18:02 ib2 kernel: [  279.797602]  [<ffffffff81746b70>] ? gs_change+0x13/0x13
May 17 15:18:02 ib2 kernel: [  279.797660] ---[ end trace a513a4365628073c ]---
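
For reference, the "next is LIST_POISON1" message in the trace is what
the list debug check prints when list_del() runs on an entry that was
already deleted, since list_del() leaves poison values in the removed
entry. Below is a minimal userspace sketch of that pattern, not the
ipoib code: the poison values, struct demo_neigh, demo_lock and
demo_remove() are all made up for illustration, and a pthread mutex
merely stands in for whatever lock (e.g. netif_tx_lock) turns out to be
the right one around neigh->list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Illustrative marker values; the kernel's LIST_POISON1/2 play this role. */
#define DEMO_POISON1 ((struct list_head *)0x100)
#define DEMO_POISON2 ((struct list_head *)0x200)

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Insert n right after head h. */
static void list_add(struct list_head *n, struct list_head *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

/* Like a debug-checked list_del(): refuses an already-deleted entry
 * and returns false where the kernel would print the warning. */
static bool list_del_checked(struct list_head *e)
{
    if (e->next == DEMO_POISON1 || e->prev == DEMO_POISON2)
        return false;
    e->next->prev = e->prev;
    e->prev->next = e->next;
    e->next = DEMO_POISON1;
    e->prev = DEMO_POISON2;
    return true;
}

/* Hypothetical stand-in for an object that sits on a shared list. */
struct demo_neigh {
    struct list_head list;
    bool on_list;
};

/* Assumption: one lock serializes every path that touches the list. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Check membership and unlink under the same lock, so two racing
 * removers cannot both reach the unlink. Returns true if this call
 * actually removed the entry. */
static bool demo_remove(struct demo_neigh *n)
{
    bool removed = false;

    pthread_mutex_lock(&demo_lock);
    if (n->on_list) {
        removed = list_del_checked(&n->list);
        n->on_list = false;
    }
    pthread_mutex_unlock(&demo_lock);
    return removed;
}
```

Calling list_del_checked() twice on the same entry fails the second
time, which corresponds to the warning in the log. With demo_remove(),
the second caller sees on_list == false under the lock and never
touches the poisoned pointers.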


Thread overview: 16+ messages
2013-05-17 19:36 list corruption in IPOIB Jack Wang
2013-05-18 19:37   ` Or Gerlitz
2013-05-18 21:36       ` Jack Wang
2013-05-19  6:00           ` Or Gerlitz
2013-05-19  9:17               ` Jack Wang
2013-05-20  9:05                   ` Or Gerlitz
2013-05-20  9:10                       ` Jinpu Wang
2013-05-20 10:58                           ` Or Gerlitz
2013-05-20 12:46                               ` Jinpu Wang
2013-05-20 12:51                                   ` Or Gerlitz
2013-05-20 13:38                                         ` Shlomo Pongratz
2013-05-20 14:36                                             ` Jack Wang
2013-05-20 19:00                                                 ` Or Gerlitz
2013-05-20 19:38                                                     ` Jack Wang
2013-05-20 19:50                                                         ` Or Gerlitz
2013-05-20 19:57                                                             ` Jack Wang
