public inbox for linux-rdma@vger.kernel.org
From: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
To: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: SRPt oops with 4.5-rc3-ish
Date: Mon, 15 Feb 2016 17:42:52 -0800
Message-ID: <56C27E9C.6030403@sandisk.com>
In-Reply-To: <56C0A6C3.3010903-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On 02/14/16 08:09, Doug Ledford wrote:
> While testing with my latest kernel (rc3 plus pending RDMA patches), I
> ran across this oops:
>
> [dledford@linux-ws ~]$ console rdma-storage-04
> Enter dledford-Qj3k6FK1/F6RgOCG8Jv1mWcM3YSXpimQh7FX57BcuXVWk0Htik3J/w@public.gmane.org's password:
> [Enter `^Ec?' for help]
> [-- MOTD -- https://home.corp.redhat.com/wiki/conserver]
> [playback]
> [160605.947614]  [<ffffffff81150545>] ? call_rcu_sched+0x25/0x30
> [160605.954074]  [<ffffffffc0b3dd84>]
> target_fabric_nacl_base_release+0x64/0x70]
> [160605.963731]  [<ffffffff813ccc6f>] config_item_release+0x9f/0x1c0
> [160605.970579]  [<ffffffff813ccdf2>] config_item_put+0x62/0x80
> [160605.976936]  [<ffffffff813c97d3>] configfs_rmdir+0x343/0x500
> [160605.983396]  [<ffffffff8131287a>] vfs_rmdir+0x13a/0x220
> [160605.989375]  [<ffffffff813197db>] do_rmdir+0x1fb/0x260
> [160605.995244]  [<ffffffff8131adde>] SyS_rmdir+0x1e/0x30
> [160606.001019]  [<ffffffff81a0922e>] entry_SYSCALL_64_fastpath+0x12/0x71
> [160606.009586] ---[ end trace 820588f5ef5f6148 ]---
> [160607.051593] ib_srpt Received SRP_LOGIN_REQ with i_port_id
> 0x7f0ee700032d1de)
> [160607.078225] ib_srpt rejected SRP_LOGIN_REQ because the target port
> has not d
> [160611.228909] ib_srpt Received IB DREQ ERROR event.
> [160613.276862] ib_srpt Received IB TimeWait exit for cm_id
> ffff881cc9dc7a00.
> [160613.290322] BUG: unable to handle kernel paging request at
> 0000000000018630
> [160613.301470] IP: [<ffffffff81125694>]
> native_queued_spin_lock_slowpath+0x2e40
> [160613.313112] PGD 0
> [160613.318577] Oops: 0002 [#1] SMP
> [160613.325358] Modules linked in: nfnetlink(+) ip6t_rpfilter 8021q garp
> ip6t_R]
> [160613.492357] CPU: 1 PID: 44982 Comm: kworker/1:1 Tainted: G        W
> I     44
> [160613.505978] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS
> 1.0.4 084
> [160613.517697] Workqueue: events srpt_release_channel_work [ib_srpt]
> [160613.527634] task: ffff881d01099000 ti: ffff881d02014000 task.ti:
> ffff881d020
> [160613.539130] RIP: 0010:[<ffffffff81125694>]  [<ffffffff81125694>]
> native_que0
> [160613.553326] RSP: 0018:ffff881d02017d90  EFLAGS: 00010006
> [160613.562332] RAX: 00000000000000ea RBX: 0000000000000206 RCX:
> 000000000001860
> [160613.573401] RDX: 0000000000080000 RSI: ffff881d4c818600 RDI:
> ffff880f2d7c7d8
> [160613.584472] RBP: ffff881d02017d90 R08: 0000000000000023 R09:
> 000000000000000
> [160613.595491] R10: 00000000ffffffd8 R11: 00000000000211c0 R12:
> ffff880f2d7c7d0
> [160613.606568] R13: ffff881ce426d000 R14: ffff881cca702a00 R15:
> 000000000000000
> [160613.617643] FS:  0000000000000000(0000) GS:ffff881d4c800000(0000)
> knlGS:0000
> [160613.629793] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [160613.639315] CR2: 0000000000018630 CR3: 0000000001ca9000 CR4:
> 000000000014060
> [160613.650471] Stack:
> [160613.655843]  ffff881d02017da0 ffffffff8122ac4c ffff881d02017db8
> ffffffff81a7
> [160613.667361]  ffff880f2d7c7d18 ffff881d02017de0 ffffffff81121255
> ffff881cca70
> [160613.678885]  ffff881ce426d058 ffff881ce426d000 ffff881d02017e10
> ffffffffc070
> [160613.690366] Call Trace:
> [160613.696195]  [<ffffffff8122ac4c>] queued_spin_lock_slowpath+0x12/0x1d
> [160613.706533]  [<ffffffff81a08ea7>] _raw_spin_lock_irqsave+0x87/0xa0
> [160613.716586]  [<ffffffff81121255>] complete+0x25/0x70
> [160613.725318]  [<ffffffffc07e7e80>]
> srpt_release_channel_work+0x180/0x210 [ib]
> [160613.736889]  [<ffffffff810e6dd8>] process_one_work+0x228/0x650
> [160613.746616]  [<ffffffff810e79be>] worker_thread+0x21e/0x800
> [160613.756047]  [<ffffffff81a02035>] ? __schedule+0x4b5/0xe6a
> [160613.765371]  [<ffffffff810e77a0>] ? kzalloc+0x30/0x30
> [160613.774203]  [<ffffffff810efc38>] kthread+0x118/0x150
> [160613.783000]  [<ffffffff810efb20>] ? flush_kthread_worker+0xd0/0xd0
> [160613.792932]  [<ffffffff81a0958f>] ret_from_fork+0x3f/0x70
> [160613.801994]  [<ffffffff810efb20>] ? flush_kthread_worker+0xd0/0xd0
> [160613.811897] Code: 01 00 00 74 ec e9 d7 fd ff ff 48 89 c1 c1 e8 12 48
> c1 e9
> [160613.840260] RIP  [<ffffffff81125694>]
> native_queued_spin_lock_slowpath+0x2e0
> [160613.851846]  RSP <ffff881d02017d90>
> [160613.858812] CR2: 0000000000018630
> [160613.874762] ---[ end trace 820588f5ef5f6149 ]---
> [160613.937225] Kernel panic - not syncing: Fatal exception
> [160613.946167] Kernel Offset: disabled
> [160614.004693] ---[ end Kernel panic - not syncing: Fatal exception
> [-- MARK -- Sun Feb 14 15:50:00 2016]
> [-- dledford-CKb8VAQLn9hXrIkS9f7CXA@public.gmane.org@ovpn-116-26.rdu2.redhat.com attached -- Sun Feb
> 14 15:5]
>
>
>
> Basic description of the situation that caused the oops:
>
> Server with 30+ SRPt luns, 2 SRP devices, 1 active client busy beating
> away on 1 lun via two paths (active/passive setup)
>
> Run dnf upgrade (dnf is yum's replacement, so just a system wide
> software update).
>
> Get to the cleanup for targetcli/target-restore and it invokes an
> attempt to reload the target service while still in use.  During the
> process of deconfiguring the luns that are in use, this oops occurred.
> Sending the report to you because it appears to involve the
> multi-channel support.

Hello Doug,

As far as I know, the session shutdown code in the LIO core has never
worked reliably in the presence of active I/O in any upstream kernel
version. All my tests of the ib_srpt patch series I submitted recently
were performed on top of a long series of bug fixes for the LIO core.
The tree I have been testing is available at
https://github.com/bvanassche/linux/tree/lio-tmf-fixes-2016-01-13. I
have tried a few times to submit the LIO core patches to Nic Bellinger
(making TMF handling synchronous, plus several fixes for race conditions
related to session shutdown). Apparently Nic is trying to fix the
existing approach to TMF handling (handling TMFs from a context other
than the regular command execution context), but so far without success
(see e.g. http://www.spinics.net/lists/target-devel/index.html#11822).

Bart.


