linux-nfs.vger.kernel.org archive mirror
From: Sagi Grimberg <sagig@dev.mellanox.co.il>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-rdma@vger.kernel.org,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	"Hefty, Sean" <sean.hefty@intel.com>
Subject: Re: [PATCH v1 04/14] xprtrdma: Use ib_device pointer safely
Date: Thu, 07 May 2015 18:11:51 +0300
Message-ID: <554B80B7.8090900@dev.mellanox.co.il>
In-Reply-To: <BDE22240-DC37-4C54-B71E-D88EF54D3119@oracle.com>

On 5/7/2015 5:12 PM, Chuck Lever wrote:
>
> On May 7, 2015, at 9:56 AM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
>
>> On 5/7/2015 4:39 PM, Chuck Lever wrote:
>>>
>>> On May 7, 2015, at 6:00 AM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
>>>
>>>> On 5/4/2015 8:57 PM, Chuck Lever wrote:
>>>>> The connect worker can replace ri_id, but prevents ri_id->device
>>>>> from changing during the lifetime of a transport instance.
>>>>>
>>>>> Cache a copy of ri_id->device in rpcrdma_ia and in rpcrdma_rep.
>>>>> The cached copy can be used safely in code that does not serialize
>>>>> with the connect worker.
>>>>>
>>>>> Other code can use it to save an extra address generation (one
>>>>> pointer dereference instead of two).
>>>>>
>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>> ---
>>>>>   net/sunrpc/xprtrdma/fmr_ops.c      |    8 +----
>>>>>   net/sunrpc/xprtrdma/frwr_ops.c     |   12 +++----
>>>>>   net/sunrpc/xprtrdma/physical_ops.c |    8 +----
>>>>>   net/sunrpc/xprtrdma/verbs.c        |   61 +++++++++++++++++++-----------------
>>>>>   net/sunrpc/xprtrdma/xprt_rdma.h    |    2 +
>>>>>   5 files changed, 43 insertions(+), 48 deletions(-)
>>>>>
>>>>> diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
>>>>> index 302d4eb..0a96155 100644
>>>>> --- a/net/sunrpc/xprtrdma/fmr_ops.c
>>>>> +++ b/net/sunrpc/xprtrdma/fmr_ops.c
>>>>> @@ -85,7 +85,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
>>>>>   	   int nsegs, bool writing)
>>>>>   {
>>>>>   	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
>>>>> -	struct ib_device *device = ia->ri_id->device;
>>>>> +	struct ib_device *device = ia->ri_device;
>>>>>   	enum dma_data_direction direction = rpcrdma_data_dir(writing);
>>>>>   	struct rpcrdma_mr_seg *seg1 = seg;
>>>>>   	struct rpcrdma_mw *mw = seg1->rl_mw;
>>>>> @@ -137,17 +137,13 @@ fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
>>>>>   {
>>>>>   	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
>>>>>   	struct rpcrdma_mr_seg *seg1 = seg;
>>>>> -	struct ib_device *device;
>>>>>   	int rc, nsegs = seg->mr_nsegs;
>>>>>   	LIST_HEAD(l);
>>>>>
>>>>>   	list_add(&seg1->rl_mw->r.fmr->list, &l);
>>>>>   	rc = ib_unmap_fmr(&l);
>>>>> -	read_lock(&ia->ri_qplock);
>>>>> -	device = ia->ri_id->device;
>>>>>   	while (seg1->mr_nsegs--)
>>>>> -		rpcrdma_unmap_one(device, seg++);
>>>>> -	read_unlock(&ia->ri_qplock);
>>>>> +		rpcrdma_unmap_one(ia->ri_device, seg++);
>>>>
>>>> Umm, I'm wondering if this is guaranteed to be the same device as
>>>> ri_id->device?
>>>>
>>>> Imagine you are working on a bond device where each slave belongs to
>>>> a different adapter. When the active port toggles, you will see an
>>>> ADDR_CHANGE event (that the current code does not handle...), what
>>>> you'd want to do is just reconnect and rdma_cm will resolve the new
>>>> address for you (via the backup slave). I suspect that in case this
>>>> flow is concurrent with the reconnects you may end up with a stale
>>>> device handle.
>>>
>>> I’m not sure what you mean by “stale”: freed memory?
>>>
>>> I’m looking at this code in rpcrdma_ep_connect():
>>>
>>>         if (ia->ri_id->device != id->device) {
>>>                 printk("RPC:       %s: can't reconnect on "
>>>                         "different device!\n", __func__);
>>>                 rdma_destroy_id(id);
>>>                 rc = -ENETUNREACH;
>>>                 goto out;
>>>         }
>>>
>>> After reconnecting, if the ri_id has changed, the connect fails. Today,
>>> xprtrdma does not support the device changing out from under it.
>>>
>>> Note also that our receive completion upcall uses ri_id->device for
>>> DMA map syncing. Would that also be a problem during a bond failover?
>>>
>>
>> I'm not talking about ri_id->device, this will be consistent. I'm
>> wondering about ia->ri_device, which might not have been updated yet.
>
> ia->ri_device is never updated. The only place it is set is in
> rpcrdma_ia_open().

So you assume that each ri_id that you will recreate contains the
same device handle?

I think that on an ADDR_CHANGE event, when the new active slave belongs
to another device, you will hit a mismatch. CC'ing Sean for more info...
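
To make the scenario concrete, here is a rough userspace sketch. It is
not kernel code: the mock_* structs and functions are made up for
illustration and only mimic the shape of rpcrdma_ia and of the check in
rpcrdma_ep_connect() quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ib_device and rdma_cm_id; illustration only. */
struct mock_device { const char *name; };
struct mock_cm_id  { struct mock_device *device; };

struct mock_ia {
	struct mock_cm_id  *ri_id;      /* replaced on each reconnect */
	struct mock_device *ri_device;  /* cached once at open time */
};

/* Cache the device pointer once, the way the patch description puts it. */
static void mock_ia_open(struct mock_ia *ia, struct mock_cm_id *id)
{
	assert(ia != NULL && id != NULL);
	ia->ri_id = id;
	ia->ri_device = id->device;
}

/*
 * Mimics the reconnect check: refuse a new cm_id that resolved to a
 * different device. Returns 0 on success, -1 on a device mismatch.
 */
static int mock_reconnect(struct mock_ia *ia, struct mock_cm_id *new_id)
{
	if (ia->ri_id->device != new_id->device)
		return -1;	/* connect fails, ri_id and ri_device untouched */
	ia->ri_id = new_id;
	return 0;
}
```

If the new cm_id resolves to the same device, the reconnect goes
through and the cached pointer is still right. If it resolves to a
different device, which is what I would expect when the backup slave
sits on another adapter, the reconnect is refused.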

>
>> Just asking, assuming your transport device can change between
>> consecutive reconnects (the new cm_id will contain another device), is
>> it safe to rely on ri_device being updated?
>
> My reading of the above logic is that ia->ri_id->device is guaranteed to
> be the same address during the lifetime of the transport instance. If it
> changes during a reconnect, rpcrdma_ep_connect() will fail the connect.

The IP address is the same - the bond0 IP...

>
> In the case of a bonded device, why are the physical slave devices exposed
> to consumers?

You mean the ib_device handle? You need it to create PDs/CQs/QPs/MRs...
How else can you allocate the device resources without the device
handle?

rdma_cm simply gives you the device handle according to the IP route.
From there you own the resources you create.
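
As a toy illustration of that ownership (again a userspace sketch with
made-up mock_* names, not the verbs API): a resource remembers the
device it was allocated on and is only usable with that device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ib_device; illustration only. */
struct mock_device { const char *name; };

/* A resource (think PD/CQ/QP/MR) is bound to the device that created it. */
struct mock_pd { struct mock_device *device; };

/* Allocation needs a device handle; there is nothing else to bind to. */
static struct mock_pd mock_alloc_pd(struct mock_device *dev)
{
	assert(dev != NULL);
	struct mock_pd pd = { dev };
	return pd;
}

/* Returns 1 if the resource belongs to dev, 0 otherwise. */
static int mock_pd_usable_on(const struct mock_pd *pd,
			     const struct mock_device *dev)
{
	return pd->device == dev;
}
```

So if the route moves to another adapter, everything allocated on the
old handle would have to be recreated on the new one.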

> It might be saner to construct a virtual ib_device in this
> case that consumers can depend on.

I'm not sure how a virtual ib_device could work - that goes down
to the verbs themselves... Seems like a layering mismatch to me...

