* Virtual IPs and blocking locks
@ 2009-05-15 14:48 Sachin S. Prabhu
  2009-05-15 16:50 ` Rob Gardner
  0 siblings, 1 reply; 19+ messages in thread

From: Sachin S. Prabhu @ 2009-05-15 14:48 UTC (permalink / raw)
To: linux-nfs

We have had a few reported cases of problems using blocking locks on NFS shares mounted using virtual IPs. In these cases, the NFS server was using a floating IP for clustering purposes.

Please consider the transaction below:

NFS client: 10.33.8.75
NFS server:
  Primary IP : 10.33.8.71
  Floating IP: 10.33.8.77

$ tshark -r block-virtual.pcap -R 'nlm'
 19 2.487622 10.33.8.75 -> 10.33.8.77 NLM V4 LOCK Call FH:0x6176411a svid:4 pos:0-0
 22 2.487760 10.33.8.77 -> 10.33.8.75 NLM V4 LOCK Reply (Call In 19) NLM_BLOCKED
 33 2.489518 10.33.8.71 -> 10.33.8.75 NLM V4 GRANTED_MSG Call FH:0x6176411a svid:4 pos:0-0
 36 2.489635 10.33.8.75 -> 10.33.8.71 NLM V4 GRANTED_MSG Reply (Call In 33)
 46 2.489977 10.33.8.75 -> 10.33.8.71 NLM V4 GRANTED_RES Call NLM_DENIED
 49 2.490096 10.33.8.71 -> 10.33.8.75 NLM V4 GRANTED_RES Reply (Call In 46)

19 - A lock request is sent from the client to the floating IP.
22 - An NLM_BLOCKED reply is sent back by the floating IP to the client.
33 - The server's primary IP address returns an NLM_GRANTED using the async callback mechanism.
36 - Ack for the GRANTED_MSG in 33.
46 - The client returns NLM_DENIED to the server, since the grant doesn't match the lock it requested.
49 - Ack for the GRANTED_RES in 46.

In this case, the GRANTED_MSG is sent by the primary IP, as determined by the routing table. This lock grant is rejected by the client, since the IP address it came from doesn't match the IP address of the server against which the request was made. The locks are eventually granted after a 30-second poll timeout on the client.

Similar problems are also seen when NFS shares are exported from GFS filesystems, since GFS uses deferred locks.
The problem was introduced by commit 5ac5f9d1ce8492163dbde5d357dc5d03becf7e36, which adds a check on the server's IP address. This causes a regression for clients which mount off a virtual IP address on the server.

A possible fix for this issue is to put the server's IP address into the nlm_lock.oh field used to make the request, and compare it with the nlm_lock.oh returned in the GRANTED_MSG call, instead of checking the IP address of the server making the GRANTED_MSG call.

Sachin Prabhu

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Virtual IPs and blocking locks
  2009-05-15 14:48 Virtual IPs and blocking locks Sachin S. Prabhu
@ 2009-05-15 16:50 ` Rob Gardner
  2009-05-18 13:41   ` Sachin S. Prabhu
  0 siblings, 1 reply; 19+ messages in thread

From: Rob Gardner @ 2009-05-15 16:50 UTC (permalink / raw)
To: linux-nfs@vger.kernel.org

It looks to me like recent kernels have added an "h_srcaddr" field to the nlm_host structure, and this should be set to the server's virtual IP address. Then when the server sends the GRANTED_MSG call to the client, it should appear to be coming from the virtual IP address, not the server's primary IP address. So either h_srcaddr isn't getting set up correctly with your virtual IP address, or rpc_create() isn't binding it as the source address as it should. In our (older kernel) code, we explicitly call xprt_set_bindaddr() with the virtual IP address to make this happen, but I don't immediately see where this happens in the latest kernel source.

Rob Gardner
HP Storage Works / NAS

Sachin S. Prabhu wrote:
> We have had a few reported cases of problems using blocking locks on nfs shares
> mounted using virtual ips. In these cases, the NFS server was using a floating
> ip for clustering purposes.
>
> Please consider the transaction below
>
> NFS client: 10.33.8.75
> NFS Server:
> Primary IP : 10.33.8.71
> Floating IP: 10.33.8.77
>
> $ tshark -r block-virtual.pcap -R 'nlm'
> 19 2.487622 10.33.8.75 -> 10.33.8.77 NLM V4 LOCK Call FH:0x6176411a svid:4 pos:0-0
> 22 2.487760 10.33.8.77 -> 10.33.8.75 NLM V4 LOCK Reply (Call In 19) NLM_BLOCKED
> 33 2.489518 10.33.8.71 -> 10.33.8.75 NLM V4 GRANTED_MSG Call FH:0x6176411a svid:4 pos:0-0
> 36 2.489635 10.33.8.75 -> 10.33.8.71 NLM V4 GRANTED_MSG Reply (Call In 33)
> 46 2.489977 10.33.8.75 -> 10.33.8.71 NLM V4 GRANTED_RES Call NLM_DENIED
> 49 2.490096 10.33.8.71 -> 10.33.8.75 NLM V4 GRANTED_RES Reply (Call In 46)
>
> 19 - A lock request is sent from the client to the floating ip.
> 22 - An NLM_BLOCKED reply is sent back by the floating ip to the client.
> 33 - Server primary IP address returns a NLM_GRANTED using the async callback
> mechanism.
> 36 - Ack for GRANTED_MSG in 33.
> 46 - Client returns a NLM_DENIED to the server. This is done since it doesn't
> match the lock requested.
> 49 - Ack for GRANTED_RES in 46.
>
> In this case, the GRANTED_MSG is sent by the primary ip as determined by the
> routing table. This lock grant is rejected by the client since the ip address
> it came from doesn't match the ip address of the server against which the request
> was made. The locks are eventually granted after a 30 second poll timeout on the
> client.
>
> Similar problems are also seen when nfs shares are exported from GFS filesystems
> since GFS uses deferred locks.
>
> The problem was introduced by commit 5ac5f9d1ce8492163dbde5d357dc5d03becf7e36
> which adds a check for the server ip address. This causes a regression for
> clients which mount off a virtual ip address from the server.
>
> A possible fix for this issue is to use the server ip address in the nlm_lock.oh
> field used to make the request and compare it to the nlm_lock.oh returned in the
> GRANTED_MSG call instead of checking the ip address of the server making
> the GRANTED_MSG call.
>
> Sachin Prabhu
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply [flat|nested] 19+ messages in thread
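The source-address binding Rob describes (h_srcaddr / xprt_set_bindaddr) happens inside the kernel RPC client; a plain-socket userspace analogue of the same idea looks roughly like this. This is only a sketch, and make_callback_socket() is a made-up helper name:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Analogue of what setting h_srcaddr achieves: bind() the callback
 * socket to the floating IP before any traffic is sent, so the peer
 * sees that address rather than whatever the routing table picks. */
static int make_callback_socket(const char *src_ip)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in src;
    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = 0;                       /* any source port */
    if (inet_pton(AF_INET, src_ip, &src.sin_addr) != 1 ||
        bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0) {
        /* Fails if this host does not own src_ip: the floating IP
         * must actually be configured on an interface. */
        close(fd);
        return -1;
    }
    return fd;
}
```

Without the bind(), the kernel chooses the source address from the routing table — which is exactly why the GRANTED_MSG in the capture above left from the primary IP.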
* Re: Virtual IPs and blocking locks
  2009-05-15 16:50 ` Rob Gardner
@ 2009-05-18 13:41   ` Sachin S. Prabhu
  2009-05-18 13:46     ` Trond Myklebust
  2009-05-18 13:55     ` Rob Gardner
  0 siblings, 2 replies; 19+ messages in thread

From: Sachin S. Prabhu @ 2009-05-18 13:41 UTC (permalink / raw)
To: Rob Gardner; +Cc: linux-nfs@vger.kernel.org

Rob Gardner wrote:
> It looks to me like recent kernels have added an "h_srcaddr" field to the
> nlm_host structure, and this should be set to the server's virtual ip
> address. Then when the server sends the GRANTED_MSG call to the client,
> it should appear to be coming from the virtual ip address, not the
> server's primary ip address. So either h_srcaddr isn't getting set up
> correctly with your virtual ip address, or rpc_create() isn't binding it
> as the source address as it should. In our (older kernel) code, we
> explicitly call xprt_set_bindaddr() with the virtual ip address to make
> this happen, but I don't immediately see where this happens in the
> latest kernel source.

You are right; I cannot reproduce this issue with NFS servers based on later versions of the kernel containing the patches for 'NLM: fix source address in server callbacks'.

However, this still leaves the question of the client handling such a situation where a callback is made from a different IP address. Should the client accept such callbacks, or is the current behaviour of rejecting these callbacks correct?

Sachin Prabhu

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Virtual IPs and blocking locks 2009-05-18 13:41 ` Sachin S. Prabhu @ 2009-05-18 13:46 ` Trond Myklebust 2009-05-18 13:55 ` Rob Gardner 1 sibling, 0 replies; 19+ messages in thread From: Trond Myklebust @ 2009-05-18 13:46 UTC (permalink / raw) To: Sachin S. Prabhu; +Cc: Rob Gardner, linux-nfs@vger.kernel.org On Mon, 2009-05-18 at 14:41 +0100, Sachin S. Prabhu wrote: > Rob Gardner wrote: > > It looks to me like recent kernels have added a "h_srcaddr" filed to the > > nlm_host structure, and this should be set to the server's virtual ip > > address. Then when the server sends the GRANTED_MSG call to the client, > > it should appear to be coming from the virtual ip address, not the > > server's primary ip address. So either h_srcaddr isn't getting set up > > correctly with your virtual ip address, or rpc_create() isn't binding it > > as the source address as it should. In our (older kernel) code, we > > explicitly call xprt_set_bindaddr() with the virtual ip address to make > > this happen, but I don't immediately see where this happens in the > > latest kernel source. > > You are right, I cannot reproduce this issue with nfs servers based on later > versions on the kernel containing the patches for > 'NLM: fix source address in server callbacks' > > However this still leaves the question of the client handling such a situation > where a callback is made from a different ip address. Should the client accept > such callbacks or is the current behaviour of rejecting these callbacks correct? The decision to reject callbacks from other ip addresses was deliberate. There is no good justification for a server to send callbacks or replies using an address that won't be recognised by the client. Trond ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Virtual IPs and blocking locks 2009-05-18 13:41 ` Sachin S. Prabhu 2009-05-18 13:46 ` Trond Myklebust @ 2009-05-18 13:55 ` Rob Gardner 2009-05-19 20:43 ` Huge race in lockd for async lock requests? Rob Gardner 1 sibling, 1 reply; 19+ messages in thread From: Rob Gardner @ 2009-05-18 13:55 UTC (permalink / raw) To: Sachin S. Prabhu; +Cc: linux-nfs@vger.kernel.org If the client gets a message from an IP address that he never sent a request to, he is correct to reject that message. Rob Sachin S. Prabhu wrote: > > However this still leaves the question of the client handling such a situation > where a callback is made from a different ip address. Should the client accept > such callbacks or is the current behaviour of rejecting these callbacks correct? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Huge race in lockd for async lock requests?
  2009-05-18 13:55 ` Rob Gardner
@ 2009-05-19 20:43 ` Rob Gardner
  2009-05-19 21:33   ` Tom Talpey
  2009-05-20  6:55   ` Rob Gardner
  0 siblings, 2 replies; 19+ messages in thread

From: Rob Gardner @ 2009-05-19 20:43 UTC (permalink / raw)
To: linux-nfs@vger.kernel.org

I've got a question about lockd in conjunction with a filesystem that provides its own (async) locking.

After nlmsvc_lock() calls vfs_lock_file(), it seems to me that we might get the async callback (nlmsvc_grant_deferred) at any time. What's to stop it from arriving before we even put the block on the nlm_block list? If this happens, then nlmsvc_grant_deferred() will print "grant for unknown block" and then we'll wait forever for a grant that will never come.

Seems like we ought to do nlmsvc_insert_block() before vfs_lock_file() at the very least; but this still leaves a problem where the lock is granted via the callback while we're still in nlmsvc_lock(), and we ignore it and tell the client that the lock is blocked; now the client will have to retry before getting the lock.

Any thoughts on this besides "give up on using lockd"?

Rob Gardner

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Huge race in lockd for async lock requests? 2009-05-19 20:43 ` Huge race in lockd for async lock requests? Rob Gardner @ 2009-05-19 21:33 ` Tom Talpey 2009-05-20 6:55 ` Rob Gardner 1 sibling, 0 replies; 19+ messages in thread From: Tom Talpey @ 2009-05-19 21:33 UTC (permalink / raw) To: Rob Gardner; +Cc: linux-nfs@vger.kernel.org At 04:43 PM 5/19/2009, Rob Gardner wrote: >I've got a question about lockd in conjunction with a filesystem that >provides its own (async) locking. > >After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we might >get the async callback (nlmsvc_grant_deferred) at any time. What's to >stop it from arriving before we even put the block on the nlm_block >list? If this happens, then nlmsvc_grant_deferred() will print "grant >for unknown block" and then we'll wait forever for a grant that will >never come. Yes, there's a race but the client will retry every 30 seconds, so it won't wait forever. Depending on the kernel client version, there are some improvements we've tried over time to close the raciness a little. What exact client version are you working with? > >Seems like we ought to do nlmsvc_insert_block() before vfs_lock_file() >at the very least; But this still leaves problems where the lock is >granted via the callback while we're still in nlmsvc_lock(), and we >ignore it and tell the client that the lock is blocked; Now they'll have >to retry before getting the lock. It's a little worse than that. It's also possible for the client to hold a lock, but a stray or retried server callback can cause the client to reject it, releasing the lock at the server. This causes the server to grant the lock to another client even though the first client still thinks it holds it. It's an NLM protocol issue, frankly, due to the fact that the server callback is a completely separate channel. > >Any thoughts on this besides "give up on using lockd"? Use NFSv4? ;-) Tom. 
>
>Rob Gardner
>

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Huge race in lockd for async lock requests? 2009-05-19 20:43 ` Huge race in lockd for async lock requests? Rob Gardner 2009-05-19 21:33 ` Tom Talpey @ 2009-05-20 6:55 ` Rob Gardner 2009-05-20 14:00 ` Tom Talpey 1 sibling, 1 reply; 19+ messages in thread From: Rob Gardner @ 2009-05-20 6:55 UTC (permalink / raw) To: linux-nfs@vger.kernel.org Tom Talpey wrote: > At 04:43 PM 5/19/2009, Rob Gardner wrote: > >I've got a question about lockd in conjunction with a filesystem that > >provides its own (async) locking. > > > >After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we might > >get the async callback (nlmsvc_grant_deferred) at any time. What's to > >stop it from arriving before we even put the block on the nlm_block > >list? If this happens, then nlmsvc_grant_deferred() will print "grant > >for unknown block" and then we'll wait forever for a grant that will > >never come. > > Yes, there's a race but the client will retry every 30 seconds, so it won't > wait forever. OK, a blocking lock request will get retried in 30 seconds and work out "ok". But a non-blocking request will get in big trouble. Let's say the callback is invoked immediately after the vfs_lock_file call returns FILE_LOCK_DEFERRED. At this point, the block is not on the nlm_block list, so the callback routine will not be able to find it and mark it as granted. Then nlmsvc_lock() will call nlmsvc_defer_lock_rqst(), put the block on the nlm_block list, and eventually the request will timeout and the client will get lck_denied. Meanwhile, the lock has actually been granted, but nobody knows about it. > Depending on the kernel client version, there are some > improvements we've tried over time to close the raciness a little. What > exact client version are you working with? > I maintain nfs/nlm server code for a NAS product, and so there is no "exact client" but rather a multitude of clients that I have no control over. All I can do is hack the server. 
We have been working around this by using a semaphore to cover the vfs_lock_file() to nlmsvc_insert_block() sequence in nlmsvc_lock() and also nlmsvc_grant_deferred(). So if the callback arrives at a bad time, it has to wait until the lock actually makes it onto the nlm_block list, and so the status of the lock gets updated properly. > Use NFSv4? ;-) > I had a feeling you were going to say that. ;-) Unfortunately that doesn't make NFSv3 and lockd go away. Rob Gardner ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Huge race in lockd for async lock requests? 2009-05-20 6:55 ` Rob Gardner @ 2009-05-20 14:00 ` Tom Talpey [not found] ` <4a140d0a.85c2f10a.53bc.0979-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Tom Talpey @ 2009-05-20 14:00 UTC (permalink / raw) To: Rob Gardner; +Cc: linux-nfs@vger.kernel.org At 02:55 AM 5/20/2009, Rob Gardner wrote: >Tom Talpey wrote: >> At 04:43 PM 5/19/2009, Rob Gardner wrote: >> >I've got a question about lockd in conjunction with a filesystem that >> >provides its own (async) locking. >> > >> >After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we might >> >get the async callback (nlmsvc_grant_deferred) at any time. What's to >> >stop it from arriving before we even put the block on the nlm_block >> >list? If this happens, then nlmsvc_grant_deferred() will print "grant >> >for unknown block" and then we'll wait forever for a grant that will >> >never come. >> >> Yes, there's a race but the client will retry every 30 seconds, so it won't >> wait forever. >OK, a blocking lock request will get retried in 30 seconds and work out >"ok". But a non-blocking request will get in big trouble. Let's say the A non-blocking lock doesn't request, and won't get, a callback. So I don't understand... >callback is invoked immediately after the vfs_lock_file call returns >FILE_LOCK_DEFERRED. At this point, the block is not on the nlm_block >list, so the callback routine will not be able to find it and mark it as >granted. Then nlmsvc_lock() will call nlmsvc_defer_lock_rqst(), put the >block on the nlm_block list, and eventually the request will timeout and >the client will get lck_denied. Meanwhile, the lock has actually been >granted, but nobody knows about it. Yes, this can happen, I've seen it too. Again, it's a bug in the protocol more than a bug in the clients. It gets even worse when retries occur. If the reply cache doesn't catch the duplicates (and it never does), all heck breaks out. 
> >> Depending on the kernel client version, there are some >> improvements we've tried over time to close the raciness a little. What >> exact client version are you working with? >> > >I maintain nfs/nlm server code for a NAS product, and so there is no >"exact client" but rather a multitude of clients that I have no control >over. All I can do is hack the server. We have been working around this I feel for ya (been there, done that) :-) >by using a semaphore to cover the vfs_lock_file() to >nlmsvc_insert_block() sequence in nlmsvc_lock() and also >nlmsvc_grant_deferred(). So if the callback arrives at a bad time, it >has to wait until the lock actually makes it onto the nlm_block list, >and so the status of the lock gets updated properly. Can you explain this further? If you're implementing the server, how do you know your callback "arrives at a bad time", by the DENIED result from the client? Another thing to worry about is the presence of NLM_CANCEL calls from the client which cross the callbacks. I sent a patch which improves the situation at the client, some time ago. Basically it was more willing to positively acknowledge a callback which didn't match the nlm_blocked list, by also checking whether the lock was actually being held. This was only half the solution however, it didn't close the protocol race, just the client one. You want the patch? I'll look for it. > >> Use NFSv4? ;-) >> > >I had a feeling you were going to say that. ;-) Unfortunately that >doesn't make NFSv3 and lockd go away. Yes, I know. Unfortunately there aren't any elegant solutions to the NLM protocol's flaws. Tom. ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <4a140d0a.85c2f10a.53bc.0979-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>]
* Re: Huge race in lockd for async lock requests? [not found] ` <4a140d0a.85c2f10a.53bc.0979-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org> @ 2009-05-20 14:14 ` Tom Talpey [not found] ` <4a14106e.48c3f10a.7ce3.0e55-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org> 2009-05-20 16:37 ` Rob Gardner 1 sibling, 1 reply; 19+ messages in thread From: Tom Talpey @ 2009-05-20 14:14 UTC (permalink / raw) To: Rob Gardner; +Cc: linux-nfs@vger.kernel.org At 10:00 AM 5/20/2009, Tom Talpey wrote: >At 02:55 AM 5/20/2009, Rob Gardner wrote: >>Tom Talpey wrote: >>> At 04:43 PM 5/19/2009, Rob Gardner wrote: >>> >I've got a question about lockd in conjunction with a filesystem that >>> >provides its own (async) locking. >>> > >>> >After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we might >>> >get the async callback (nlmsvc_grant_deferred) at any time. What's to >>> >stop it from arriving before we even put the block on the nlm_block >>> >list? If this happens, then nlmsvc_grant_deferred() will print "grant >>> >for unknown block" and then we'll wait forever for a grant that will >>> >never come. >>> >>> Yes, there's a race but the client will retry every 30 seconds, so it won't >>> wait forever. >>OK, a blocking lock request will get retried in 30 seconds and work out >>"ok". But a non-blocking request will get in big trouble. Let's say the > >A non-blocking lock doesn't request, and won't get, a callback. So I >don't understand... > >>callback is invoked immediately after the vfs_lock_file call returns >>FILE_LOCK_DEFERRED. At this point, the block is not on the nlm_block >>list, so the callback routine will not be able to find it and mark it as >>granted. Then nlmsvc_lock() will call nlmsvc_defer_lock_rqst(), put the >>block on the nlm_block list, and eventually the request will timeout and >>the client will get lck_denied. Meanwhile, the lock has actually been >>granted, but nobody knows about it. > >Yes, this can happen, I've seen it too. 
Again, it's a bug in the protocol >more than a bug in the clients. It gets even worse when retries occur. >If the reply cache doesn't catch the duplicates (and it never does), all >heck breaks out. > >> >>> Depending on the kernel client version, there are some >>> improvements we've tried over time to close the raciness a little. What >>> exact client version are you working with? >>> >> >>I maintain nfs/nlm server code for a NAS product, and so there is no >>"exact client" but rather a multitude of clients that I have no control >>over. All I can do is hack the server. We have been working around this > >I feel for ya (been there, done that) :-) > >>by using a semaphore to cover the vfs_lock_file() to >>nlmsvc_insert_block() sequence in nlmsvc_lock() and also >>nlmsvc_grant_deferred(). So if the callback arrives at a bad time, it >>has to wait until the lock actually makes it onto the nlm_block list, >>and so the status of the lock gets updated properly. > >Can you explain this further? If you're implementing the server, how do >you know your callback "arrives at a bad time", by the DENIED result >from the client? > >Another thing to worry about is the presence of NLM_CANCEL calls >from the client which cross the callbacks. > >I sent a patch which improves the situation at the client, some time >ago. Basically it was more willing to positively acknowledge a callback >which didn't match the nlm_blocked list, by also checking whether the >lock was actually being held. This was only half the solution however, >it didn't close the protocol race, just the client one. You want the >patch? I'll look for it. Found it, on the old nfs list: http://thread.gmane.org/gmane.linux.nfs/16611 Tom. > >> >>> Use NFSv4? ;-) >>> >> >>I had a feeling you were going to say that. ;-) Unfortunately that >>doesn't make NFSv3 and lockd go away. > >Yes, I know. Unfortunately there aren't any elegant solutions to >the NLM protocol's flaws. > >Tom. 
^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <4a14106e.48c3f10a.7ce3.0e55-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>]
* Re: Huge race in lockd for async lock requests? [not found] ` <4a14106e.48c3f10a.7ce3.0e55-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org> @ 2009-05-20 23:20 ` Rob Gardner 0 siblings, 0 replies; 19+ messages in thread From: Rob Gardner @ 2009-05-20 23:20 UTC (permalink / raw) To: Tom Talpey; +Cc: linux-nfs@vger.kernel.org Tom Talpey wrote: > Found it, on the old nfs list: > http://thread.gmane.org/gmane.linux.nfs/16611 > > Sorry I didn't see this until now... I realize now that we were definitely not talking about the same problem. I apologize for not being clear. My 'callbacks' are file system grant calls to lockd, not rpc calls to/from clients. Rob Gardner ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Huge race in lockd for async lock requests? [not found] ` <4a140d0a.85c2f10a.53bc.0979-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org> 2009-05-20 14:14 ` Tom Talpey @ 2009-05-20 16:37 ` Rob Gardner 2009-05-28 20:05 ` J. Bruce Fields 1 sibling, 1 reply; 19+ messages in thread From: Rob Gardner @ 2009-05-20 16:37 UTC (permalink / raw) To: tmtalpey, linux-nfs@vger.kernel.org Tom Talpey wrote: > At 02:55 AM 5/20/2009, Rob Gardner wrote: > >> Tom Talpey wrote: >> >>> At 04:43 PM 5/19/2009, Rob Gardner wrote: >>> >>>> I've got a question about lockd in conjunction with a filesystem that >>>> provides its own (async) locking. >>>> >>>> After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we might >>>> get the async callback (nlmsvc_grant_deferred) at any time. What's to >>>> stop it from arriving before we even put the block on the nlm_block >>>> list? If this happens, then nlmsvc_grant_deferred() will print "grant >>>> for unknown block" and then we'll wait forever for a grant that will >>>> never come. >>>> >>> Yes, there's a race but the client will retry every 30 seconds, so it won't >>> wait forever. >>> >> OK, a blocking lock request will get retried in 30 seconds and work out >> "ok". But a non-blocking request will get in big trouble. Let's say the >> > > A non-blocking lock doesn't request, and won't get, a callback. So I > don't understand... > > What do you mean a non-blocking lock doesn't request? Remember that I'm dealing with a filesystem that provides its own locking functions via file->f_op->lock(). Such a filesystem might easily defer a non-blocking lock request and invoke the callback later. 
At least I don't know of any rule that says that it can't do this, and clearly the code expects this possibility:

	case FILE_LOCK_DEFERRED:
		if (wait)
			break;
		/* Filesystem lock operation is in progress
		   Add it to the queue waiting for callback */
		ret = nlmsvc_defer_lock_rqst(rqstp, block);

>> callback is invoked immediately after the vfs_lock_file call returns
>> FILE_LOCK_DEFERRED. At this point, the block is not on the nlm_block
>> list, so the callback routine will not be able to find it and mark it as
>> granted. Then nlmsvc_lock() will call nlmsvc_defer_lock_rqst(), put the
>> block on the nlm_block list, and eventually the request will timeout and
>> the client will get lck_denied. Meanwhile, the lock has actually been
>> granted, but nobody knows about it.
>
> Yes, this can happen, I've seen it too. Again, it's a bug in the protocol
> more than a bug in the clients.

It looks to me like a bug in the server. The server must be able to deal with async filesystem callbacks happening at any time, however inconvenient.

> It gets even worse when retries occur.
> If the reply cache doesn't catch the duplicates (and it never does), all
> heck breaks out.

You'll have to explain further what scenario you're talking about. I don't understand what the reply cache has to do with lockd.

>> by using a semaphore to cover the vfs_lock_file() to
>> nlmsvc_insert_block() sequence in nlmsvc_lock() and also
>> nlmsvc_grant_deferred(). So if the callback arrives at a bad time, it
>> has to wait until the lock actually makes it onto the nlm_block list,
>> and so the status of the lock gets updated properly.
>
> Can you explain this further? If you're implementing the server, how do
> you know your callback "arrives at a bad time", by the DENIED result
> from the client?

I sense a little confusion so let me be more precise. By "callback" I am talking about the callback from the filesystem to lockd via lock_manager_operations.fl_grant (i.e., nlmsvc_grant_deferred).
If this callback is invoked while the lockd thread is executing code in nlmsvc_lock() between the call to vfs_lock_file() and the call to nlmsvc_insert_block(), then the callback routine (nlmsvc_grant_deferred) will not find the block on the nlm_block list because it's not there yet, and thus the "grant" is effectively lost. We use a semaphore in nlmsvc_lock() and nlmsvc_grant_deferred() to avoid this race. Rob Gardner ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Huge race in lockd for async lock requests? 2009-05-20 16:37 ` Rob Gardner @ 2009-05-28 20:05 ` J. Bruce Fields 2009-05-28 21:34 ` Rob Gardner 0 siblings, 1 reply; 19+ messages in thread From: J. Bruce Fields @ 2009-05-28 20:05 UTC (permalink / raw) To: Rob Gardner; +Cc: tmtalpey, linux-nfs@vger.kernel.org On Wed, May 20, 2009 at 10:37:05AM -0600, Rob Gardner wrote: > Tom Talpey wrote: >> At 02:55 AM 5/20/2009, Rob Gardner wrote: >> >>> Tom Talpey wrote: >>> >>>> At 04:43 PM 5/19/2009, Rob Gardner wrote: >>>> >>>>> I've got a question about lockd in conjunction with a filesystem >>>>> that provides its own (async) locking. >>>>> >>>>> After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we >>>>> might get the async callback (nlmsvc_grant_deferred) at any time. >>>>> What's to stop it from arriving before we even put the block on >>>>> the nlm_block list? If this happens, then nlmsvc_grant_deferred() >>>>> will print "grant for unknown block" and then we'll wait forever >>>>> for a grant that will never come. >>>>> >>>> Yes, there's a race but the client will retry every 30 seconds, so it won't >>>> wait forever. >>>> >>> OK, a blocking lock request will get retried in 30 seconds and work >>> out "ok". But a non-blocking request will get in big trouble. Let's >>> say the >> >> A non-blocking lock doesn't request, and won't get, a callback. So I >> don't understand... >> >> > > What do you mean a non-blocking lock doesn't request? Remember that I'm > dealing with a filesystem that provides its own locking functions via > file->f_op->lock(). Such a filesystem might easily defer a non-blocking > lock request and invoke the callback later. 
At least I don't know of any > rule that says that it can't do this, and clearly the code expects this > possibility: > > case FILE_LOCK_DEFERRED: > if (wait) > break; > /* Filesystem lock operation is in progress > Add it to the queue waiting for callback */ > ret = nlmsvc_defer_lock_rqst(rqstp, block); > > >>> callback is invoked immediately after the vfs_lock_file call returns >>> FILE_LOCK_DEFERRED. At this point, the block is not on the nlm_block >>> list, so the callback routine will not be able to find it and mark it >>> as granted. Then nlmsvc_lock() will call nlmsvc_defer_lock_rqst(), >>> put the block on the nlm_block list, and eventually the request will >>> timeout and the client will get lck_denied. Meanwhile, the lock has >>> actually been granted, but nobody knows about it. >>> >> >> Yes, this can happen, I've seen it too. Again, it's a bug in the protocol >> more than a bug in the clients. > It looks to me like a bug in the server. The server must be able to deal > with async filesystem callbacks happening at any time, however > inconvenient. Absolutely, if that's possible then it's a server bug. --b. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Huge race in lockd for async lock requests? 2009-05-28 20:05 ` J. Bruce Fields @ 2009-05-28 21:34 ` Rob Gardner 2009-05-29 0:26 ` J. Bruce Fields 0 siblings, 1 reply; 19+ messages in thread From: Rob Gardner @ 2009-05-28 21:34 UTC (permalink / raw) To: J. Bruce Fields; +Cc: tmtalpey@gmail.com, linux-nfs@vger.kernel.org J. Bruce Fields wrote: >>>>> At 04:43 PM 5/19/2009, Rob Gardner wrote: >>>>> >>>>> >>>>>> I've got a question about lockd in conjunction with a filesystem >>>>>> that provides its own (async) locking. >>>>>> >>>>>> After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we >>>>>> might get the async callback (nlmsvc_grant_deferred) at any time. >>>>>> What's to stop it from arriving before we even put the block on >>>>>> the nlm_block list? If this happens, then nlmsvc_grant_deferred() >>>>>> will print "grant for unknown block" and then we'll wait forever >>>>>> for a grant that will never come. >>>>>> >> dealing with a filesystem that provides its own locking functions via >> file->f_op->lock(). Such a filesystem might easily defer a non-blocking >> lock request and invoke the callback later. At least I don't know of any >> rule that says that it can't do this, and clearly the code expects this >> possibility: >> >> case FILE_LOCK_DEFERRED: >> if (wait) >> break; >> /* Filesystem lock operation is in progress >> Add it to the queue waiting for callback */ >> ret = nlmsvc_defer_lock_rqst(rqstp, block); >> >> >> It looks to me like a bug in the server. The server must be able to deal >> with async filesystem callbacks happening at any time, however >> inconvenient. >> > > Absolutely, if that's possible then it's a server bug. > > --b. > It's definitely possible for the async filesystem callback to occur at any time. I think at the very least, nlmsvc_lock() ought to put the block on the nlm_block list *before* calling vfs_lock_file(), and then remove it immediately if the lock is granted synchronously. 
I would like to develop and submit a patch for this, but I am currently
working with a much older kernel and it will take some time before I get
to work with newer bits.

Rob Gardner
* Re: Huge race in lockd for async lock requests?
  2009-05-28 21:34 ` Rob Gardner
@ 2009-05-29  0:26 ` J. Bruce Fields
  2009-05-29  2:59 ` Rob Gardner
  0 siblings, 1 reply; 19+ messages in thread
From: J. Bruce Fields @ 2009-05-29 0:26 UTC (permalink / raw)
To: Rob Gardner; +Cc: tmtalpey@gmail.com, linux-nfs@vger.kernel.org

On Thu, May 28, 2009 at 03:34:19PM -0600, Rob Gardner wrote:
> J. Bruce Fields wrote:
>>>>>> At 04:43 PM 5/19/2009, Rob Gardner wrote:
>>>>>>
>>>>>>> I've got a question about lockd in conjunction with a
>>>>>>> filesystem that provides its own (async) locking.
>>>>>>>
>>>>>>> After nlmsvc_lock() calls vfs_lock_file(), it seems to me
>>>>>>> that we might get the async callback (nlmsvc_grant_deferred)
>>>>>>> at any time. What's to stop it from arriving before we even
>>>>>>> put the block on the nlm_block list? If this happens, then
>>>>>>> nlmsvc_grant_deferred() will print "grant for unknown block"
>>>>>>> and then we'll wait forever for a grant that will never come.
>>>>>>>
>>> dealing with a filesystem that provides its own locking functions via
>>> file->f_op->lock(). Such a filesystem might easily defer a
>>> non-blocking lock request and invoke the callback later. At least I
>>> don't know of any rule that says that it can't do this, and clearly
>>> the code expects this possibility:
>>>
>>>	case FILE_LOCK_DEFERRED:
>>>		if (wait)
>>>			break;
>>>		/* Filesystem lock operation is in progress
>>>		   Add it to the queue waiting for callback */
>>>		ret = nlmsvc_defer_lock_rqst(rqstp, block);
>>>
>>> It looks to me like a bug in the server. The server must be able
>>> to deal with async filesystem callbacks happening at any time,
>>> however inconvenient.
>>>
>>
>> Absolutely, if that's possible then it's a server bug.
>>
>> --b.
>>
>
> It's definitely possible for the async filesystem callback to occur at
> any time.

Looking at the code....
This is all under the BKL, and as far as I can tell there aren't any
blocking operations anywhere there, so I don't think this should happen
if the filesystem is careful. Have you seen it happen?

Of course this may be fragile--we'll have to think about what to do when
we eventually remove the BKL.

--b.

> I think at the very least, nlmsvc_lock() ought to put the
> block on the nlm_block list *before* calling vfs_lock_file(), and then
> remove it immediately if the lock is granted synchronously. I would like
> to develop and submit a patch for this, but I am currently working with
> a much older kernel and it will take some time before I get to work with
> newer bits.
* Re: Huge race in lockd for async lock requests?
  2009-05-29  0:26 ` J. Bruce Fields
@ 2009-05-29  2:59 ` Rob Gardner
  2009-05-29 13:22 ` Tom Talpey
  0 siblings, 1 reply; 19+ messages in thread
From: Rob Gardner @ 2009-05-29 2:59 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: tmtalpey@gmail.com, linux-nfs@vger.kernel.org

J. Bruce Fields wrote:
>
> Looking at the code.... This is all under the BKL, and as far as I can
> tell there aren't any blocking operations anywhere there, so I don't
> think this should happen if the filesystem is careful. Have you seen it
> happen?

Aha, I just figured it out and you were right. The filesystem in this
case was not careful. It broke the rules and actually made the fl_grant
call *before* even returning to nlmsvc_lock's call to vfs_lock_file, and
it did it in the lockd thread! So the BKL was of no use, and I saw
nlmsvc_grant_deferred print "grant for unknown block". So I think
everything is ok, no huge race in lockd for async lock requests. Thank
you for clearing this up.

Rob Gardner
* Re: Huge race in lockd for async lock requests?
  2009-05-29  2:59 ` Rob Gardner
@ 2009-05-29 13:22 ` Tom Talpey
  [not found] ` <4a1fe1c0.06045a0a.165b.5fbc-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Tom Talpey @ 2009-05-29 13:22 UTC (permalink / raw)
To: Rob Gardner; +Cc: J. Bruce Fields, linux-nfs@vger.kernel.org

At 10:59 PM 5/28/2009, Rob Gardner wrote:
>J. Bruce Fields wrote:
>>
>> Looking at the code.... This is all under the BKL, and as far as I can
>> tell there aren't any blocking operations anywhere there, so I don't
>> think this should happen if the filesystem is careful. Have you seen it
>> happen?
>
>Aha, I just figured it out and you were right. The filesystem in this
>case was not careful. It broke the rules and actually made the fl_grant
>call *before* even returning to nlmsvc_lock's call to vfs_lock_file, and
>it did it in the lockd thread! So the BKL was of no use, and I saw
>nlmsvc_grant_deferred print "grant for unknown block". So I think
>everything is ok, no huge race in lockd for async lock requests. Thank
>you for clearing this up.

Gack! I'm surprised it worked at all. The fact that the BKL allows itself
to be taken recursively really masked your filesystem bug. If the BKL had
blocked, or asserted, the bug would never have happened.

This is as good a time as any to point out that the BKL's use in the lockd
code is insidious and needs some serious attention. Unfortunately, it's
also wrapped up in the BKL use of the VFS locking layer, but in practice
those are two different things. Using the BKL for upcall/downcall
synchronization and lockd thread protection are the issue here.

Tom.
[parent not found: <4a1fe1c0.06045a0a.165b.5fbc-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>]
* Re: Huge race in lockd for async lock requests?
  [not found] ` <4a1fe1c0.06045a0a.165b.5fbc-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
@ 2009-05-29 15:24 ` Rob Gardner
  2009-05-29 19:14 ` J. Bruce Fields
  0 siblings, 1 reply; 19+ messages in thread
From: Rob Gardner @ 2009-05-29 15:24 UTC (permalink / raw)
To: Tom Talpey; +Cc: J. Bruce Fields, linux-nfs@vger.kernel.org

Tom Talpey wrote:
> At 10:59 PM 5/28/2009, Rob Gardner wrote:
>
>> J. Bruce Fields wrote:
>>
>>> Looking at the code.... This is all under the BKL, and as far as I can
>>> tell there aren't any blocking operations anywhere there, so I don't
>>> think this should happen if the filesystem is careful. Have you seen it
>>> happen?
>>>
>> Aha, I just figured it out and you were right. The filesystem in this
>> case was not careful. It broke the rules and actually made the fl_grant
>> call *before* even returning to nlmsvc_lock's call to vfs_lock_file, and
>> it did it in the lockd thread! So the BKL was of no use, and I saw
>> nlmsvc_grant_deferred print "grant for unknown block". So I think
>> everything is ok, no huge race in lockd for async lock requests. Thank
>> you for clearing this up.
>>
>
> Gack! I'm surprised it worked at all. The fact that the BKL allows itself
> to be taken recursively really masked your filesystem bug. If the BKL had
> blocked, or asserted, the bug would never have happened.
>

Yeah, recall that I'm using a very old kernel (circa 2.6.18) which I
think must still allow the BKL to be acquired recursively.

> This is as good a time as any to point out that the BKL's use in the lockd
> code is insidious and needs some serious attention.

No disagreement here! I think I almost understand enough about lockd to
remove the BKL, but the operative word there is "almost".

Rob Gardner
* Re: Huge race in lockd for async lock requests?
  2009-05-29 15:24 ` Rob Gardner
@ 2009-05-29 19:14 ` J. Bruce Fields
  0 siblings, 0 replies; 19+ messages in thread
From: J. Bruce Fields @ 2009-05-29 19:14 UTC (permalink / raw)
To: Rob Gardner; +Cc: Tom Talpey, linux-nfs@vger.kernel.org

On Fri, May 29, 2009 at 09:24:25AM -0600, Rob Gardner wrote:
> Tom Talpey wrote:
>> At 10:59 PM 5/28/2009, Rob Gardner wrote:
>>
>>> J. Bruce Fields wrote:
>>>
>>>> Looking at the code.... This is all under the BKL, and as far as I can
>>>> tell there aren't any blocking operations anywhere there, so I don't
>>>> think this should happen if the filesystem is careful. Have you seen it
>>>> happen?
>>>>
>>> Aha, I just figured it out and you were right. The filesystem in this
>>> case was not careful. It broke the rules and actually made the
>>> fl_grant call *before* even returning to nlmsvc_lock's call to
>>> vfs_lock_file, and it did it in the lockd thread! So the BKL was of
>>> no use, and I saw nlmsvc_grant_deferred print "grant for unknown
>>> block". So I think everything is ok, no huge race in lockd for async
>>> lock requests. Thank you for clearing this up.
>>>
>>
>> Gack! I'm surprised it worked at all. The fact that the BKL allows itself
>> to be taken recursively really masked your filesystem bug. If the BKL had
>> blocked, or asserted, the bug would never have happened.
>>
>
> Yeah, recall that I'm using a very old kernel (circa 2.6.18) which I
> think must still allow the BKL to be acquired recursively.

That's still true on recent kernels.

--b.

>
>> This is as good a time as any to point out that the BKL's use in the lockd
>> code is insidious and needs some serious attention.
>
> No disagreement here! I think I almost understand enough about lockd to
> remove the BKL, but the operative word there is "almost".
>
> Rob Gardner
end of thread
Thread overview: 19+ messages
2009-05-15 14:48 Virtual IPs and blocking locks Sachin S. Prabhu
2009-05-15 16:50 ` Rob Gardner
2009-05-18 13:41 ` Sachin S. Prabhu
2009-05-18 13:46 ` Trond Myklebust
2009-05-18 13:55 ` Rob Gardner
2009-05-19 20:43 ` Huge race in lockd for async lock requests? Rob Gardner
2009-05-19 21:33 ` Tom Talpey
2009-05-20 6:55 ` Rob Gardner
2009-05-20 14:00 ` Tom Talpey
[not found] ` <4a140d0a.85c2f10a.53bc.0979-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2009-05-20 14:14 ` Tom Talpey
[not found] ` <4a14106e.48c3f10a.7ce3.0e55-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2009-05-20 23:20 ` Rob Gardner
2009-05-20 16:37 ` Rob Gardner
2009-05-28 20:05 ` J. Bruce Fields
2009-05-28 21:34 ` Rob Gardner
2009-05-29 0:26 ` J. Bruce Fields
2009-05-29 2:59 ` Rob Gardner
2009-05-29 13:22 ` Tom Talpey
[not found] ` <4a1fe1c0.06045a0a.165b.5fbc-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2009-05-29 15:24 ` Rob Gardner
2009-05-29 19:14 ` J. Bruce Fields