From mboxrd@z Thu Jan 1 00:00:00 1970 From: Sagi Grimberg Subject: Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks Date: Sun, 27 Apr 2014 13:12:41 +0300 Message-ID: <535CD819.3050508@dev.mellanox.co.il> References: <014738b6-698e-4ea1-82f9-287378bfec19@CMEXHTCAS2.ad.emulex.com> <56C87770-7940-4006-948C-FEF3C0EC4ACC@oracle.com> <5710A71F-C4D5-408B-9B41-07F21B5853F0@oracle.com> <6837A427-B677-4CC7-A022-4FB9E52A3FC6@oracle.com> <1bab6615-60c4-4865-a6a0-c53bb1c32341@CMEXHTCAS1.ad.emulex.com> <5358B975.4020207@dev.mellanox.co.il> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: In-Reply-To: Sender: linux-nfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org To: Chuck Lever Cc: Devesh Sharma , Linux NFS Mailing List , "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Trond Myklebust List-Id: linux-rdma@vger.kernel.org On 4/24/2014 6:01 PM, Chuck Lever wrote: > On Apr 24, 2014, at 3:12 AM, Sagi Grimberg = wrote: > >> On 4/24/2014 2:30 AM, Devesh Sharma wrote: >>> Hi Chuck >>> >>> Following is the complete call trace of a typical NFS-RDMA transact= ion while mounting a share. >>> It is unavoidable to stop calling post-send in case it is not creat= ed. Therefore, applying checks to the connection state is a must >>> While registering/deregistering frmrs on-the-fly. The unconnected s= tate of QP implies don't call post_send/post_recv from any context. >>> >> Long thread... didn't follow it all. > I think you got the gist of it. > >> If I understand correctly this race comes only for *cleanup* (LINV) = of FRMR registration while teardown flow destroyed the QP. >> I think this might be disappear if for each registration you post LI= NV+FRMR. >> This is assuming that a situation where trying to post Fastreg on a = "bad" QP can >> never happen (usually since teardown flow typically suspends outgoin= g commands). > That=92s typically true for =93hard=94 NFS mounts. But =93soft=94 NFS= mounts > wake RPCs after a timeout while the transport is disconnected, in > order to kill them. At that point, deregistration still needs to > succeed somehow. Not sure I understand, Can you please explain why deregistration will=20 not succeed? AFAIK You are allowed to register FRMR and then deregister it without=20 having to invalidate it. Can you please explain why you logically connected LINV with deregistra= tion? > > IMO there are three related problems. > > 1. rpcrdma_ep_connect() is allowing RPC tasks to be awoken while > there is no QP at all (->qp is NULL). The woken RPC tasks are > trying to deregister buffers that may include page cache pages, > and it=92s oopsing because ->qp is NULL. > > That=92s a logic bug in rpcrdma_ep_connect(), and I have an idea > how to address it. Why not first create a new id+qp and assign them - and then destroy the= =20 old id+qp? see SRP related section: ib_srp.x:srp_create_target_ib() Anyway it is indeed important to guarantee that no xmit flows happens=20 concurrently to that, and cleanups are processed synchronously and in-order. > > 2. If a QP is present but disconnected, posting LOCAL_INV won=92t wo= rk. > That leaves buffers (and page cache pages, potentially) register= ed. > That could be addressed with LINV+FRMR. But... > > 3. The client should not leave page cache pages registered indefinit= ely. > Both LINV+FRMR and our current approach depends on having a work= ing > QP _at_ _some_ _point_ =85 but the client simply can=92t depend = on that. > What happens if an NFS server is, say, destroyed by fire while t= here > are active client mount points? What if the HCA=92s firmware is > permanently not allowing QP creation? Again, I don't understand why you can't dereg_mr(). How about allocating the FRMR pool *after* the QP was successfully=20 created/connected (makes sense as the FRMRs are not usable until then), and destroy/cleanup the pool before the QP is=20 disconnected/destroyed. it also makes sense as they must match PDs. > > Here's a relevant comment in rpcrdma_ep_connect(): > > 815 /* TEMP TEMP TEMP - fail if new device: > 816 * Deregister/remarshal *all* requests! > 817 * Close and recreate adapter, pd, etc! > 818 * Re-determine all attributes still sane! > 819 * More stuff I haven't thought of! > 820 * Rrrgh! > 821 */ > > xprtrdma does not do this today. > > When a new device is created, all existing RPC requests could be > deregistered and re-marshalled. As far as I can tell, > rpcrdma_ep_connect() is executing in a synchronous context (the conne= ct > worker) and we can simply use dereg_mr, as long as later, when the RP= Cs > are re-driven, they know they need to re-marshal. Agree. Sagi. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html