From: Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
To: Shlomo Pongratz <shlomop-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Cc: Dongsu Park <dongsu.park-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Subject: list corruption in IPOIB
Date: Fri, 17 May 2013 21:36:20 +0200
Message-ID: <519686B4.7010300@profitbricks.com>
Hi Shlomo & Or,

We've seen the neigh->list list corruption warning below during testing.
In Dongsu's and my opinion, several places also need
netif_tx_lock(_bh)/netif_tx_unlock(_bh) pairs around accesses to
neigh->list. I tried adding netif_tx_lock/netif_tx_unlock to
ipoib_cm_destroy_tx, and it improved the situation. There are some other
places in ipoib_main.c and ipoib_mcast.c that touch the list, but I don't
know which lock should be added there. If you could take some time to
look into it, that would be great.
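To make the locking idea concrete, here is a minimal userspace sketch,
not the driver code. It assumes a pthread mutex as a stand-in for the
lock taken by netif_tx_lock(), and every name in it (dev_sketch,
neigh_link, neigh_unlink) is hypothetical. The point it illustrates is
that the "still linked?" check and the unlink happen under the same
lock, so only one of two competing paths can remove the entry.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal doubly linked node, standing in for struct list_head. */
struct node { struct node *next, *prev; };

/* Hypothetical device: one lock guarding one list of neighbours. */
struct dev_sketch {
    pthread_mutex_t tx_lock;  /* stand-in for the netif_tx_lock() lock */
    struct node neigh_list;   /* stand-in for the list neigh->list joins */
};

void dev_sketch_init(struct dev_sketch *d)
{
    int rc = pthread_mutex_init(&d->tx_lock, NULL);
    assert(rc == 0);
    (void)rc;
    d->neigh_list.next = d->neigh_list.prev = &d->neigh_list;
}

/* Link n at the head of the list while holding the lock. */
void neigh_link(struct dev_sketch *d, struct node *n)
{
    pthread_mutex_lock(&d->tx_lock);
    n->next = d->neigh_list.next;
    n->prev = &d->neigh_list;
    d->neigh_list.next->prev = n;
    d->neigh_list.next = n;
    pthread_mutex_unlock(&d->tx_lock);
}

/* Unlink n if it is still linked; returns true only for the caller
 * that actually removed it. The node is left pointing at itself
 * (as list_del_init() does) so a later call sees it as unlinked. */
bool neigh_unlink(struct dev_sketch *d, struct node *n)
{
    bool unlinked = false;

    pthread_mutex_lock(&d->tx_lock);
    if (n->next != n) {
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->next = n->prev = n;
        unlinked = true;
    }
    pthread_mutex_unlock(&d->tx_lock);
    return unlinked;
}
```

Whether the real call sites should use netif_tx_lock, the _bh variant,
or a different lock altogether is exactly the open question above; the
sketch only shows the shape of the check-and-unlink pattern.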
May 17 15:17:57 ib2 kernel: [ 274.910792] ib0: failed to send RTU: -22
May 17 15:17:59 ib2 kernel: [ 276.118006] ib0: enabling connected mode will cause multicast packet drops
May 17 15:18:01 ib2 kernel: [ 278.557566] ib0: enabling connected mode will cause multicast packet drops
May 17 15:18:02 ib2 kernel: [ 279.793565] ib0: failed to send cm req: -22
May 17 15:18:02 ib2 kernel: [ 279.793713] ------------[ cut here ]------------
May 17 15:18:02 ib2 kernel: [ 279.793779] WARNING: at lib/list_debug.c:49 __list_del_entry+0x63/0xd0()
May 17 15:18:02 ib2 kernel: [ 279.793840] Hardware name: System Product Name
May 17 15:18:02 ib2 kernel: [ 279.793898] list_del corruption, ffff8801f9708740->next is LIST_POISON1 (dead000000100100)
May 17 15:18:02 ib2 kernel: [ 279.794013] Modules linked in: rdma_ucm rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm psmouse powernow_k8 tpm_tis tpm tpm_bios serio_raw edac_core mperf evdev shpchp processor edac_mce_amd microcode pci_hotplug i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif mlx4_core ahci libahci r8169 libata scsi_mod [last unloaded: scsi_wait_scan]
May 17 15:18:02 ib2 kernel: [ 279.796082] Pid: 220, comm: kworker/u:5 Not tainted 3.4.23-pserver+ #98
May 17 15:18:02 ib2 kernel: [ 279.796142] Call Trace:
May 17 15:18:02 ib2 kernel: [ 279.796202] [<ffffffff8103c21f>] warn_slowpath_common+0x7f/0xc0
May 17 15:18:02 ib2 kernel: [ 279.796266] [<ffffffff8103c316>] warn_slowpath_fmt+0x46/0x50
May 17 15:18:02 ib2 kernel: [ 279.796328] [<ffffffff81428ff3>] __list_del_entry+0x63/0xd0
May 17 15:18:02 ib2 kernel: [ 279.796828] [<ffffffff81429071>] list_del+0x11/0x40
May 17 15:18:02 ib2 kernel: [ 279.796897] [<ffffffffa02b7978>] ipoib_cm_tx_start+0x2e8/0x3b0 [ib_ipoib]
May 17 15:18:02 ib2 kernel: [ 279.796964] [<ffffffff8105da3a>] process_one_work+0x19a/0x5c0
May 17 15:18:02 ib2 kernel: [ 279.797026] [<ffffffff8105d9cd>] ? process_one_work+0x12d/0x5c0
May 17 15:18:02 ib2 kernel: [ 279.797096] [<ffffffffa02b7690>] ? ipoib_cm_destroy_tx+0xc0/0xc0 [ib_ipoib]
May 17 15:18:02 ib2 kernel: [ 279.797162] [<ffffffff8105f7b5>] worker_thread+0x175/0x380
May 17 15:18:02 ib2 kernel: [ 279.797224] [<ffffffff8105f640>] ? manage_workers+0x210/0x210
May 17 15:18:02 ib2 kernel: [ 279.797285] [<ffffffff81064d5e>] kthread+0xbe/0xd0
May 17 15:18:02 ib2 kernel: [ 279.797346] [<ffffffff8109f1d0>] ? trace_hardirqs_on_caller+0x20/0x1b0
May 17 15:18:02 ib2 kernel: [ 279.797412] [<ffffffff81746b74>] kernel_thread_helper+0x4/0x10
May 17 15:18:02 ib2 kernel: [ 279.797475] [<ffffffff8173ce70>] ? retint_restore_args+0x13/0x13
May 17 15:18:02 ib2 kernel: [ 279.797539] [<ffffffff81064ca0>] ? __init_kthread_worker+0x70/0x70
May 17 15:18:02 ib2 kernel: [ 279.797602] [<ffffffff81746b70>] ? gs_change+0x13/0x13
May 17 15:18:02 ib2 kernel: [ 279.797660] ---[ end trace a513a4365628073c ]---
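
A note on reading the warning: the kernel's list_del() overwrites the
removed entry's next and prev pointers with LIST_POISON1 and
LIST_POISON2, and the debug check in lib/list_debug.c warns when it is
handed an entry that already carries those values. The message in the
trace therefore suggests the entry had already been deleted once before
ipoib_cm_tx_start called list_del on it. The following is a minimal
userspace sketch of that check, under stated assumptions: the poison
constants are the x86_64 values (LIST_POISON1 matches the address in the
log), and list_del_checked and double_delete_demo are hypothetical names
that return a status instead of printing a WARNING.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct list_head { struct list_head *next, *prev; };

/* Assumed x86_64 poison values; LIST_POISON1 matches the log above. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000100100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000200200ULL)

void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Like list_del() with list debugging enabled, but reports a repeated
 * delete by returning false instead of emitting a kernel WARNING. */
bool list_del_checked(struct list_head *entry)
{
    if (entry->next == LIST_POISON1 || entry->prev == LIST_POISON2)
        return false;               /* entry was already deleted */

    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = LIST_POISON1;     /* mark the entry as removed */
    entry->prev = LIST_POISON2;
    return true;
}

/* Delete the same entry twice, as two unsynchronized paths might.
 * Bit 0 is set if the first delete succeeded, bit 1 if the second did. */
int double_delete_demo(void)
{
    struct list_head head, neigh;

    list_init(&head);
    list_add(&neigh, &head);

    bool first = list_del_checked(&neigh);
    bool second = list_del_checked(&neigh);

    assert(head.next == &head && head.prev == &head); /* list is empty */
    return (first ? 1 : 0) | (second ? 2 : 0);
}
```

Calling double_delete_demo() returns 1: the first delete succeeds and
the second is rejected because the entry is already poisoned, which is
the condition the kernel reports as "list_del corruption".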