From: Shirley Ma <shirley.ma-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
    Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Cc: "Hefty, Sean" <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
    Devesh Sharma <devesh.sharma-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org>,
    Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>,
    linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider module
Date: Thu, 17 Jul 2014 14:25:14 -0700
Message-ID: <53C83F3A.7020608@oracle.com>
In-Reply-To: <DF7CE85B-288D-4CC2-AD51-B326D5F1EE1A-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
On 07/17/2014 01:41 PM, Chuck Lever wrote:
> On Jul 17, 2014, at 4:08 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
>
>> >
>> >
>>> >> -----Original Message-----
>>> >> From: Steve Wise [mailto:swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
>>> >> Sent: Thursday, July 17, 2014 2:56 PM
>>> >> To: 'Hefty, Sean'; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>>> >> Cc: 'linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org'; 'chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org'
>>> >> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
>>> >>
>>> >>
>>> >>
>>>> >>> -----Original Message-----
>>>> >>> From: Hefty, Sean [mailto:sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org]
>>>> >>> Sent: Thursday, July 17, 2014 2:50 PM
>>>> >>> To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>>>> >>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org
>>>> >>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
>>>> >>>
>>>>>> >>>>> So the rdma cm is expected to increase the driver reference count (try_module_get) for
>>>>>> >>>>> each new cm id, then deference count (module_put) when cm id is destroyed?
>>>>>> >>>>>
>>>>> >>>>
>>>>> >>>> No, I think he's saying the rdma-cm posts a RDMA_CM_DEVICE_REMOVAL event to each
>>>>> >>>> application with rdmacm objects allocated, and each application is expected to destroy all
>>>>> >>>> the objects it has allocated before returning from the event handler.
>>>> >>>
>>>> >>> This is almost correct. The applications do not have to destroy all the objects that it has
>>>> >>> allocated before returning from their event handler. E.g. an app can queue a work item
>>>> >>> that does the destruction. The rdmacm will block in its ib_client remove handler until all
>>>> >>> relevant rdma_cm_id's have been destroyed.
>>>> >>>
>>> >>
>>> >> Thanks for the clarification.
>>> >>
>> >
>> > And looking at xprtrdma, it does handle the DEVICE_REMOVAL event in rpcrdma_conn_upcall().
>> > It sets ep->rep_connected to -ENODEV, wakes everybody up, and calls rpcrdma_conn_func()
>> > for that endpoint, which schedules rep_connect_worker... and I gave up following the code
>> > path at this point... :)
>> >
>> > For this to all work correctly, it would need to destroy all the QPs, MRs, CQs, etc for
>> > that device _before_ destroying the rdma cm ids. Otherwise the provider module could be
>> > unloaded too soon…
> We can’t really deal with a CM_DEVICE_REMOVE event while there are active
> NFS mounts.
>
> System shutdown ordering should guarantee (one would hope) that NFS
> mount points are unmounted before the RDMA/IB core infrastructure is
> torn down. Ordering shouldn’t matter as long as all NFS activity has
> ceased before the CM tries to remove the device.
>
> So if something is hanging up the CM, there’s something xprtrdma is not
> cleaning up properly.

I saw a problem once when I restarted the system without unmounting NFS: the CM hung waiting for a completion. It looks like a bug in the xprtrdma cleanup path, but I couldn't reproduce it.

Call Trace:
[<ffffffff815c9aa9>] schedule+0x29/0x70
[<ffffffff815c8d35>] schedule_timeout+0x165/0x200
[<ffffffff815ca9ff>] ? wait_for_completion+0xcf/0x110
[<ffffffff810a708e>] ? __lock_release+0x9e/0x1f0
[<ffffffff815ca9ff>] ? wait_for_completion+0xcf/0x110
[<ffffffff815caa07>] wait_for_completion+0xd7/0x110
[<ffffffff8108bce0>] ? try_to_wake_up+0x260/0x260
[<ffffffffa064cb6e>] cma_process_remove+0xee/0x110 [rdma_cm]
[<ffffffffa064cbdc>] cma_remove_one+0x4c/0x60 [rdma_cm]
[<ffffffffa0279e0f>] ib_unregister_device+0x4f/0x100 [ib_core]
[<ffffffffa02f76ee>] mlx4_ib_remove+0x2e/0x260 [mlx4_ib]
[<ffffffffa01754c9>] mlx4_remove_device+0x69/0x80 [mlx4_core]
[<ffffffffa01755b3>] mlx4_unregister_interface+0x43/0x80 [mlx4_core]
[<ffffffffa030970c>] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
[<ffffffff810d9183>] SyS_delete_module+0x183/0x1e0
[<ffffffff810f7c94>] ? __audit_syscall_entry+0x94/0x100
[<ffffffff812c5789>] ? lockdep_sys_exit_thunk+0x35/0x67
[<ffffffff815cec92>] system_call_fastpath+0x16/0x1b
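
To make the trace easier to read, here is a minimal userspace sketch of the behavior Sean described: the remove handler blocks until every cm id bound to the device has been destroyed, and a consumer may do that destruction later from deferred work. This is a pthread model, not the rdma_cm code. All names (model_device, model_destroy_id, model_remove_device, model_run) are hypothetical, and the timeout exists only so the model can report a hang instead of reproducing one; the wait in the trace above has no timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define MODEL_MAX_IDS 16

/* Hypothetical stand-in for a device with cm ids bound to it. */
struct model_device {
    pthread_mutex_t lock;
    pthread_cond_t  ids_gone;   /* plays the role of the completion */
    int             live_ids;   /* cm ids not yet destroyed */
};

/* Consumer side: deferred work that destroys one cm id. */
void *model_destroy_id(void *arg)
{
    struct model_device *dev = arg;

    pthread_mutex_lock(&dev->lock);
    if (--dev->live_ids == 0)
        pthread_cond_broadcast(&dev->ids_gone);
    pthread_mutex_unlock(&dev->lock);
    return NULL;
}

/* Removal side: wait until all ids are gone, or give up after timeout_ms.
 * Returns true if removal could finish, false if ids were left behind. */
bool model_remove_device(struct model_device *dev, int timeout_ms)
{
    struct timespec deadline;
    bool done;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&dev->lock);
    while (dev->live_ids > 0) {
        if (pthread_cond_timedwait(&dev->ids_gone, &dev->lock,
                                   &deadline) == ETIMEDOUT)
            break;
    }
    done = (dev->live_ids == 0);
    pthread_mutex_unlock(&dev->lock);
    return done;
}

/* One scenario: `ids` cm ids exist and the consumer destroys `destroyed`
 * of them from worker threads while removal is waiting. */
bool model_run(int ids, int destroyed, int timeout_ms)
{
    struct model_device dev;
    pthread_t workers[MODEL_MAX_IDS];
    bool done;
    int i, rc;

    assert(ids >= 0 && ids <= MODEL_MAX_IDS);
    assert(destroyed >= 0 && destroyed <= ids);

    pthread_mutex_init(&dev.lock, NULL);
    pthread_cond_init(&dev.ids_gone, NULL);
    dev.live_ids = ids;

    for (i = 0; i < destroyed; i++) {
        rc = pthread_create(&workers[i], NULL, model_destroy_id, &dev);
        assert(rc == 0);
    }

    done = model_remove_device(&dev, timeout_ms);

    for (i = 0; i < destroyed; i++)
        pthread_join(workers[i], NULL);
    pthread_cond_destroy(&dev.ids_gone);
    pthread_mutex_destroy(&dev.lock);
    return done;
}
```

In this model, model_run(3, 3, 1000) represents a consumer that destroys all three ids and returns true. model_run(3, 2, 100) leaves one id alive and returns false, which would correspond to the wait_for_completion() frame under cma_process_remove() never returning if xprtrdma fails to destroy one of its ids.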
Shirley