From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: "Nicholas A. Bellinger" <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
Cc: Bart Van Assche
<bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>,
linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
target-devel
<target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: SRPt oops with 4.5-rc3-ish
Date: Mon, 11 Apr 2016 16:08:33 -0400
Message-ID: <570C0441.9040905@redhat.com>
In-Reply-To: <1456647963.19657.135.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
On 02/28/2016 03:26 AM, Nicholas A. Bellinger wrote:
> AFAIK, the oldest last working srpt commit with se_node_acl + se_session
> active I/O shutdown is:
>
> ib_srpt: Call target_sess_cmd_list_set_waiting during shutdown_session
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/infiniband/ulp/srpt?id=1d19f7800d
>
> Note this is ~40 upstream commits between then and now in v4.5-rc5.
>
> Please confirm when you started triggering this regression during target
> service restart.
I don't have a clear answer for that, although it just happened again on
a v4.5-rc4 kernel. It's pretty annoying because the trigger is (as
often as anything else) a yum upgrade process, and it hangs midway
through the process. I don't want to know how corrupted my RPM db or my
filesystem is :-(
Anyway, I have a clearer oops this time that I'll attach here, but this
will be my last one from this kernel as I'm upgrading to the most recent
v4.6-rc kernel. If the oops still happens on v4.6-rc, I'll update here.
Here's the oops series; the machine was useless after this (disk access
was blocked for all processes):
[4752021.950589] ------------[ cut here ]------------
[4752021.955992] WARNING: CPU: 5 PID: 10364 at
drivers/infiniband/ulp/srpt/ib_srpt.c:3251
srpt_close_session+0x12f/0x140 [ib_srpt]()
[4752021.969091] Modules linked in: hfi1(C) 8021q garp mrp
target_core_user uio target_core_pscsi target_core_file
target_core_iblock ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ip_set
nfnetlink ebtable_nat ebtable_filter ebtable_broute bridge stp llc
ebtables ip6table_mangle ip6table_raw nf_defrag_ipv6 ip6table_security
ip6table_filter ip6_tables iptable_mangle iptable_raw nf_defrag_ipv4
nf_conntrack(-) iptable_security ib_isert iscsi_target_mod ib_iser
libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp
scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm
ib_cm iw_cm ib_sa ib_mad intel_rapl x86_pkg_temp_thermal coretemp
kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul ipmi_devintf iTCO_wdt
crc32_pclmul ghash_clmulni_intel iTCO_vendor_support dcdbas ipmi_si
sb_edac mei_me edac_core
[4752022.049588] ioatdma mei ipmi_msghandler lpc_ich dca shpchp wmi
acpi_power_meter tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc
xfs libcrc32c mlx5_ib raid1 raid0 ib_core ib_addr mgag200 i2c_algo_bit
drm_kms_helper ttm crc32c_intel mlx5_core tg3 drm ptp megaraid_sas
pps_core fjes [last unloaded: nf_conntrack_ipv6]
[4752022.080463] CPU: 5 PID: 10364 Comm: targetctl Tainted: G CI
4.5.0-0.rc4.git0.1.fc24.x86_64 #1
[4752022.091366] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS
1.0.4 08/28/2014
[4752022.100131] 0000000000000286 00000000189b0c8a ffff880de32ffcc0
ffffffff813d3e0f
[4752022.108624] 0000000000000000 ffffffffa04872f0 ffff880de32ffcf8
ffffffff810a4fe2
[4752022.117126] ffff881fd427a800 ffff88100fcb7000 0000000000000001
ffff88100fcb70e8
[4752022.125629] Call Trace:
[4752022.128565] [<ffffffff813d3e0f>] dump_stack+0x63/0x84
[4752022.134513] [<ffffffff810a4fe2>] warn_slowpath_common+0x82/0xc0
[4752022.141431] [<ffffffff810a512a>] warn_slowpath_null+0x1a/0x20
[4752022.148155] [<ffffffffa04830bf>] srpt_close_session+0x12f/0x140
[ib_srpt]
[4752022.156055] [<ffffffffa0639de4>] target_release_session+0x24/0x30
[target_core_mod]
[4752022.164925] [<ffffffffa063bb3d>] target_put_session+0x1d/0x20
[target_core_mod]
[4752022.173403] [<ffffffffa06395eb>]
core_tpg_del_initiator_node_acl+0x16b/0x240 [target_core_mod]
[4752022.183343] [<ffffffffa062d23f>]
target_fabric_nacl_base_release+0x3f/0x50 [target_core_mod]
[4752022.193082] [<ffffffff812cc133>] config_item_release+0x63/0xd0
[4752022.199902] [<ffffffff812cc1c2>] config_item_put+0x22/0x30
[4752022.206326] [<ffffffff812ca676>] configfs_rmdir+0x1d6/0x2e0
[4752022.212857] [<ffffffff8124ea0c>] vfs_rmdir+0xbc/0x130
[4752022.218803] [<ffffffff81253c6a>] do_rmdir+0x19a/0x220
[4752022.224750] [<ffffffff81254a16>] SyS_rmdir+0x16/0x20
[4752022.230598] [<ffffffff817cd6ae>] entry_SYSCALL_64_fastpath+0x12/0x6d
[4752022.238009] ---[ end trace befc2f337e9f56d7 ]---
[4752027.739051] ib_srpt Received IB DREQ ERROR event.
[4752029.794988] ib_srpt Received IB TimeWait exit for cm_id
ffff881ff5d55800.
[4752029.807121] BUG: unable to handle kernel paging request at
0000000000017930
[4752029.815120] IP: [<ffffffff810ee9a5>]
queued_spin_lock_slowpath+0x105/0x190
[4752029.823015] PGD 0
[4752029.825466] Oops: 0002 [#1] SMP
[4752029.829286] Modules linked in: hfi1(C) 8021q garp mrp
target_core_user uio target_core_pscsi target_core_file
target_core_iblock ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ip_set
nfnetlink ebtable_nat ebtable_filter ebtable_broute bridge stp llc
ebtables ip6table_mangle ip6table_raw nf_defrag_ipv6 ip6table_security
ip6table_filter ip6_tables iptable_mangle iptable_raw nf_defrag_ipv4
nf_conntrack(-) iptable_security ib_isert iscsi_target_mod ib_iser
libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp
scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm
ib_cm iw_cm ib_sa ib_mad intel_rapl x86_pkg_temp_thermal coretemp
kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul ipmi_devintf iTCO_wdt
crc32_pclmul ghash_clmulni_intel iTCO_vendor_support dcdbas ipmi_si
sb_edac mei_me edac_core
[4752029.913124] ioatdma mei ipmi_msghandler lpc_ich dca shpchp wmi
acpi_power_meter tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc
xfs libcrc32c mlx5_ib raid1 raid0 ib_core ib_addr mgag200 i2c_algo_bit
drm_kms_helper ttm crc32c_intel mlx5_core tg3 drm ptp megaraid_sas
pps_core fjes [last unloaded: nf_conntrack_ipv6]
[4752029.946121] CPU: 7 PID: 288828 Comm: kworker/7:0 Tainted: G
WCI 4.5.0-0.rc4.git0.1.fc24.x86_64 #1
[4752029.958057] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS
1.0.4 08/28/2014
[4752029.967563] Workqueue: events srpt_release_channel_work [ib_srpt]
[4752029.975315] task: ffff8820352e5b80 ti: ffff881f5da10000 task.ti:
ffff881f5da10000
[4752029.984607] RIP: 0010:[<ffffffff810ee9a5>] [<ffffffff810ee9a5>]
queued_spin_lock_slowpath+0x105/0x190
[4752029.995941] RSP: 0018:ffff881f5da13da8 EFLAGS: 00010006
[4752030.002790] RAX: 0000000000017930 RBX: 0000000000000286 RCX:
ffff88203d2d7900
[4752030.011668] RDX: 00000000000039eb RSI: 00000000e7b31ae8 RDI:
ffff880de32ffd20
[4752030.020528] RBP: ffff881f5da13da8 R08: 0000000000200000 R09:
0000000000000000
[4752030.029374] R10: 0000000000000000 R11: 000000000001a700 R12:
ffff880de32ffd18
[4752030.038206] R13: ffff881fd2c6b780 R14: ffff881fd427a800 R15:
ffff881fd427a8d0
[4752030.047025] FS: 0000000000000000(0000) GS:ffff88203d2c0000(0000)
knlGS:0000000000000000
[4752030.056913] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4752030.064174] CR2: 0000000000017930 CR3: 0000000de33db000 CR4:
00000000001406e0
[4752030.072995] Stack:
[4752030.076087] ffff881f5da13dc0 ffffffff817cd4c7 ffff880de32ffd20
ffff881f5da13de8
[4752030.085236] ffffffff810e7cfd ffff881fd427a8d0 ffff88100fcb7000
ffff881fd2c6b780
[4752030.094382] ffff881f5da13e18 ffffffffa0485931 ffff881fc81c60c0
ffff88203d2d65c0
[4752030.103531] Call Trace:
[4752030.107120] [<ffffffff817cd4c7>] _raw_spin_lock_irqsave+0x37/0x40
[4752030.114886] [<ffffffff810e7cfd>] complete+0x1d/0x50
[4752030.121291] [<ffffffffa0485931>]
srpt_release_channel_work+0xe1/0x140 [ib_srpt]
[4752030.130416] [<ffffffff810bd6fd>] process_one_work+0x1ad/0x400
[4752030.137791] [<ffffffff810bd99e>] worker_thread+0x4e/0x480
[4752030.144772] [<ffffffff810bd950>] ? process_one_work+0x400/0x400
[4752030.152327] [<ffffffff810bd950>] ? process_one_work+0x400/0x400
[4752030.159879] [<ffffffff810c38e8>] kthread+0xd8/0xf0
[4752030.166170] [<ffffffff810c3810>] ? kthread_worker_fn+0x180/0x180
[4752030.173823] [<ffffffff817cd9ff>] ret_from_fork+0x3f/0x70
[4752030.180702] [<ffffffff810c3810>] ? kthread_worker_fn+0x180/0x180
[4752030.188352] Code: 02 89 c2 45 31 c9 c1 e2 10 85 d2 74 41 c1 ea 12
83 e0 03 83 ea 01 48 c1 e0 04 48 63 d2 48 05 00 79 01 00 48 03 04 d5 00
d5 d3 81 <48> 89 08 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85 c0 74 f7 4c 8b
[4752030.211521] RIP [<ffffffff810ee9a5>]
queued_spin_lock_slowpath+0x105/0x190
[4752030.220180] RSP <ffff881f5da13da8>
[4752030.224954] CR2: 0000000000017930
[4752030.231895] ---[ end trace befc2f337e9f56d8 ]---
[4752030.312493] BUG: unable to handle kernel paging request at
ffffffffffffffd8
[4752030.322906] IP: [<ffffffff810c3f80>] kthread_data+0x10/0x20
[4752030.331299] PGD 1c0d067 PUD 1c0f067 PMD 0
[4752030.337938] Oops: 0000 [#2] SMP
[4752030.343539] Modules linked in: hfi1(C) 8021q garp mrp
target_core_user uio target_core_pscsi target_core_file
target_core_iblock ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ip_set
nfnetlink ebtable_nat ebtable_filter ebtable_broute bridge stp llc
ebtables ip6table_mangle ip6table_raw nf_defrag_ipv6 ip6table_security
ip6table_filter ip6_tables iptable_mangle iptable_raw nf_defrag_ipv4
nf_conntrack(-) iptable_security ib_isert iscsi_target_mod ib_iser
libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp
scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm
ib_cm iw_cm ib_sa ib_mad intel_rapl x86_pkg_temp_thermal coretemp
kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul ipmi_devintf iTCO_wdt
crc32_pclmul ghash_clmulni_intel iTCO_vendor_support dcdbas ipmi_si
sb_edac mei_me edac_core
[4752030.432786] ioatdma mei ipmi_msghandler lpc_ich dca shpchp wmi
acpi_power_meter tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc
xfs libcrc32c mlx5_ib raid1 raid0 ib_core ib_addr mgag200 i2c_algo_bit
drm_kms_helper ttm crc32c_intel mlx5_core tg3 drm ptp megaraid_sas
pps_core fjes [last unloaded: nf_conntrack_ipv6]
[4752030.467298] CPU: 7 PID: 288828 Comm: kworker/7:0 Tainted: G D
WCI 4.5.0-0.rc4.git0.1.fc24.x86_64 #1
[4752030.479665] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS
1.0.4 08/28/2014
[4752030.489575] task: ffff8820352e5b80 ti: ffff881f5da10000 task.ti:
ffff881f5da10000
[4752030.499244] RIP: 0010:[<ffffffff810c3f80>] [<ffffffff810c3f80>]
kthread_data+0x10/0x20
[4752030.509511] RSP: 0018:ffff881f5da13a80 EFLAGS: 00010002
[4752030.516747] RAX: 0000000000000000 RBX: 0000000000000007 RCX:
0000000000000007
[4752030.526034] RDX: ffff88103d410000 RSI: 0000000000000007 RDI:
ffff8820352e5b80
[4752030.535318] RBP: ffff881f5da13a80 R08: ffff8820352e5c28 R09:
ffff8820352e5c00
[4752030.544599] R10: 0000000000000000 R11: 000000000000002f R12:
0000000000016dc0
[4752030.553884] R13: ffff8820352e61d8 R14: ffff8820352e5b80 R15:
ffff88203d2d6dc0
[4752030.563161] FS: 0000000000000000(0000) GS:ffff88203d2c0000(0000)
knlGS:0000000000000000
[4752030.573516] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4752030.581247] CR2: 0000000000000028 CR3: 0000000de33db000 CR4:
00000000001406e0
[4752030.590525] Stack:
[4752030.594064] ffff881f5da13a98 ffffffff810be581 ffff88203d2d6dc0
ffff881f5da13ae8
[4752030.603691] ffffffff817c91ba 00ff881f652b6478 ffff881f00000007
ffff8820352e5b80
[4752030.613311] ffff881f5da10000 0000000000000000 ffff881f5da13b38
ffff881f5da135d0
[4752030.622926] Call Trace:
[4752030.626959] [<ffffffff810be581>] wq_worker_sleeping+0x11/0x90
[4752030.634789] [<ffffffff817c91ba>] __schedule+0x62a/0x9b0
[4752030.642030] [<ffffffff817c957c>] schedule+0x3c/0x90
[4752030.648874] [<ffffffff810a7f48>] do_exit+0x7a8/0xb30
[4752030.655813] [<ffffffff8101992a>] oops_end+0x9a/0xd0
[4752030.662650] [<ffffffff81067e7e>] no_context+0x13e/0x390
[4752030.669886] [<ffffffff81068150>] __bad_area_nosemaphore+0x80/0x1f0
[4752030.678193] [<ffffffff810682d3>] bad_area_nosemaphore+0x13/0x20
[4752030.686209] [<ffffffff81068597>] __do_page_fault+0xb7/0x400
[4752030.693834] [<ffffffff81068910>] do_page_fault+0x30/0x80
[4752030.701166] [<ffffffff817cfa48>] page_fault+0x28/0x30
[4752030.708210] [<ffffffff810ee9a5>] ?
queued_spin_lock_slowpath+0x105/0x190
[4752030.717062] [<ffffffff817cd4c7>] _raw_spin_lock_irqsave+0x37/0x40
[4752030.725221] [<ffffffff810e7cfd>] complete+0x1d/0x50
[4752030.731999] [<ffffffffa0485931>]
srpt_release_channel_work+0xe1/0x140 [ib_srpt]
[4752030.741523] [<ffffffff810bd6fd>] process_one_work+0x1ad/0x400
[4752030.749298] [<ffffffff810bd99e>] worker_thread+0x4e/0x480
[4752030.756677] [<ffffffff810bd950>] ? process_one_work+0x400/0x400
[4752030.764626] [<ffffffff810bd950>] ? process_one_work+0x400/0x400
[4752030.772558] [<ffffffff810c38e8>] kthread+0xd8/0xf0
[4752030.779231] [<ffffffff810c3810>] ? kthread_worker_fn+0x180/0x180
[4752030.787241] [<ffffffff817cd9ff>] ret_from_fork+0x3f/0x70
[4752030.794438] [<ffffffff810c3810>] ? kthread_worker_fn+0x180/0x180
[4752030.802395] Code: 97 69 70 00 e9 53 ff ff ff e8 4d 0e fe ff 0f 1f
00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 e0 05 00 00 55
48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[4752030.826210] RIP [<ffffffff810c3f80>] kthread_data+0x10/0x20
[4752030.833669] RSP <ffff881f5da13a80>
[4752030.838651] CR2: ffffffffffffffd8
[4752030.843418] ---[ end trace befc2f337e9f56d9 ]---
[4752030.933774] Fixing recursive fault but reboot is needed!
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: 0E572FDD
Thread overview: 17+ messages
2016-02-14 16:09 SRPt oops with 4.5-rc3-ish Doug Ledford
[not found] ` <56C0A6C3.3010903-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-02-16 1:42 ` Bart Van Assche
2016-02-29 9:11 ` Christoph Hellwig
2016-02-28 3:37 ` Nicholas A. Bellinger
[not found] ` <1456630639.19657.47.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2016-02-28 4:18 ` Bart Van Assche
[not found] ` <56D274F8.9070804-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2016-02-28 4:47 ` Nicholas A. Bellinger
2016-02-28 4:49 ` Bart Van Assche
2016-02-28 5:00 ` Nicholas A. Bellinger
2016-03-03 15:24 ` Doug Ledford
2016-02-28 8:26 ` Nicholas A. Bellinger
2016-02-28 16:14 ` Bart Van Assche
[not found] ` <56D31CC9.7000609-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2016-02-28 20:43 ` Nicholas A. Bellinger
2016-02-29 0:37 ` Bart Van Assche
[not found] ` <56D392D4.2000105-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2016-02-29 6:05 ` Christoph Hellwig
2016-03-01 6:49 ` Nicholas A. Bellinger
2016-03-01 7:16 ` Christoph Hellwig
[not found] ` <1456647963.19657.135.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2016-04-11 20:08 ` Doug Ledford [this message]