From mboxrd@z Thu Jan 1 00:00:00 1970 From: Sagi Grimberg Subject: Re: [PATCH v1 04/14] xprtrdma: Use ib_device pointer safely Date: Thu, 07 May 2015 18:11:51 +0300 Message-ID: <554B80B7.8090900@dev.mellanox.co.il> References: <20150504174626.3483.97639.stgit@manet.1015granger.net> <20150504175720.3483.80356.stgit@manet.1015granger.net> <554B37CF.2070206@dev.mellanox.co.il> <554B6F2A.6000608@dev.mellanox.co.il> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: In-Reply-To: Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org To: Chuck Lever Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Linux NFS Mailing List , "Hefty, Sean" List-Id: linux-rdma@vger.kernel.org On 5/7/2015 5:12 PM, Chuck Lever wrote: > > On May 7, 2015, at 9:56 AM, Sagi Grimberg = wrote: > >> On 5/7/2015 4:39 PM, Chuck Lever wrote: >>> >>> On May 7, 2015, at 6:00 AM, Sagi Grimberg wrote: >>> >>>> On 5/4/2015 8:57 PM, Chuck Lever wrote: >>>>> The connect worker can replace ri_id, but prevents ri_id->device >>>>> from changing during the lifetime of a transport instance. >>>>> >>>>> Cache a copy of ri_id->device in rpcrdma_ia and in rpcrdma_rep. >>>>> The cached copy can be used safely in code that does not serializ= e >>>>> with the connect worker. >>>>> >>>>> Other code can use it to save an extra address generation (one >>>>> pointer dereference instead of two). >>>>> >>>>> Signed-off-by: Chuck Lever >>>>> --- >>>>> net/sunrpc/xprtrdma/fmr_ops.c | 8 +---- >>>>> net/sunrpc/xprtrdma/frwr_ops.c | 12 +++---- >>>>> net/sunrpc/xprtrdma/physical_ops.c | 8 +---- >>>>> net/sunrpc/xprtrdma/verbs.c | 61 +++++++++++++++++++--= --------------- >>>>> net/sunrpc/xprtrdma/xprt_rdma.h | 2 + >>>>> 5 files changed, 43 insertions(+), 48 deletions(-) >>>>> >>>>> diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/= fmr_ops.c >>>>> index 302d4eb..0a96155 100644 >>>>> --- a/net/sunrpc/xprtrdma/fmr_ops.c >>>>> +++ b/net/sunrpc/xprtrdma/fmr_ops.c >>>>> @@ -85,7 +85,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct = rpcrdma_mr_seg *seg, >>>>> int nsegs, bool writing) >>>>> { >>>>> struct rpcrdma_ia *ia =3D &r_xprt->rx_ia; >>>>> - struct ib_device *device =3D ia->ri_id->device; >>>>> + struct ib_device *device =3D ia->ri_device; >>>>> enum dma_data_direction direction =3D rpcrdma_data_dir(writing= ); >>>>> struct rpcrdma_mr_seg *seg1 =3D seg; >>>>> struct rpcrdma_mw *mw =3D seg1->rl_mw; >>>>> @@ -137,17 +137,13 @@ fmr_op_unmap(struct rpcrdma_xprt *r_xprt, s= truct rpcrdma_mr_seg *seg) >>>>> { >>>>> struct rpcrdma_ia *ia =3D &r_xprt->rx_ia; >>>>> struct rpcrdma_mr_seg *seg1 =3D seg; >>>>> - struct ib_device *device; >>>>> int rc, nsegs =3D seg->mr_nsegs; >>>>> LIST_HEAD(l); >>>>> >>>>> list_add(&seg1->rl_mw->r.fmr->list, &l); >>>>> rc =3D ib_unmap_fmr(&l); >>>>> - read_lock(&ia->ri_qplock); >>>>> - device =3D ia->ri_id->device; >>>>> while (seg1->mr_nsegs--) >>>>> - rpcrdma_unmap_one(device, seg++); >>>>> - read_unlock(&ia->ri_qplock); >>>>> + rpcrdma_unmap_one(ia->ri_device, seg++); >>>> >>>> Umm, I'm wandering if this is guaranteed to be the same device as >>>> ri_id->device? >>>> >>>> Imagine you are working on a bond device where each slave belongs = to >>>> a different adapter. When the active port toggles, you will see a >>>> ADDR_CHANGED event (that the current code does not handle...), wha= t >>>> you'd want to do is just reconnect and rdma_cm will resolve the ne= w >>>> address for you (via the backup slave). I suspect that in case thi= s >>>> flow is concurrent with the reconnects you may end up with a stale >>>> device handle. >>> >>> I=92m not sure what you mean by =93stale=94 : freed memory? >>> >>> I=92m looking at this code in rpcrdma_ep_connect() : >>> >>> 916 if (ia->ri_id->device !=3D id->device) { >>> 917 printk("RPC: %s: can't reconnec= t on " >>> 918 "different device!\n", __func= __); >>> 919 rdma_destroy_id(id); >>> 920 rc =3D -ENETUNREACH; >>> 921 goto out; >>> 922 } >>> >>> After reconnecting, if the ri_id has changed, the connect fails. To= day, >>> xprtrdma does not support the device changing out from under it. >>> >>> Note also that our receive completion upcall uses ri_id->device for >>> DMA map syncing. Would that also be a problem during a bond failove= r? >>> >> >> I'm not talking about ri_id->device, this will be consistent. I'm >> wandering about ia->ri_device, which might not have been updated yet= =2E > > ia->ri_device is never updated. The only place it is set is in > rpcrdma_ia_open(). So you assume that each ri_id that you will recreate contains the same device handle? I think that for ADDR_CHANGE event when the slave belongs to another device you will hit a mismatch. CC'ing Sean for more info... > >> Just asking, assuming your transport device can change between conse= cutive reconnects (the new cm_id will contain another device), is >> it safe to rely on ri_device being updated? > > My reading of the above logic is that ia->ri_id->device is guaranteed= to > be the same address during the lifetime of the transport instance. If= it > changes during a reconnect, rpcrdma_ep_connect() will fail the connec= t. It is the same address - the bond0 IP... > > In the case of a bonded device, why are the physical slave devices ex= posed > to consumers? You mean ib_device handle? you need it to create PD/CQ/QP/MRs... How else can you allocate the device resources without the device handle? rdma_cm simply gives you the device handle by the IP route. From there you own the resources you create. > It might be saner to construct a virtual ib_device in this > case that consumers can depend on. I'm not sure how does a virtual ib_device can work - that goes to the verbs themselves... Seems like a layering mis-match to me... -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" i= n the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html