* Virtual IPs and blocking locks
@ 2009-05-15 14:48 Sachin S. Prabhu
  2009-05-15 16:50 ` Rob Gardner
  0 siblings, 1 reply; 19+ messages in thread

From: Sachin S. Prabhu @ 2009-05-15 14:48 UTC (permalink / raw)
To: linux-nfs

We have had a few reported cases of problems using blocking locks on NFS shares mounted using virtual IPs. In these cases, the NFS server was using a floating IP for clustering purposes.

Please consider the transaction below:

NFS client: 10.33.8.75
NFS server:
  Primary IP : 10.33.8.71
  Floating IP: 10.33.8.77

$ tshark -r block-virtual.pcap -R 'nlm'
 19 2.487622 10.33.8.75 -> 10.33.8.77 NLM V4 LOCK Call FH:0x6176411a svid:4 pos:0-0
 22 2.487760 10.33.8.77 -> 10.33.8.75 NLM V4 LOCK Reply (Call In 19) NLM_BLOCKED
 33 2.489518 10.33.8.71 -> 10.33.8.75 NLM V4 GRANTED_MSG Call FH:0x6176411a svid:4 pos:0-0
 36 2.489635 10.33.8.75 -> 10.33.8.71 NLM V4 GRANTED_MSG Reply (Call In 33)
 46 2.489977 10.33.8.75 -> 10.33.8.71 NLM V4 GRANTED_RES Call NLM_DENIED
 49 2.490096 10.33.8.71 -> 10.33.8.75 NLM V4 GRANTED_RES Reply (Call In 46)

19 - A lock request is sent from the client to the floating IP.
22 - An NLM_BLOCKED reply is sent back by the floating IP to the client.
33 - The server's primary IP address returns an NLM_GRANTED using the async callback mechanism.
36 - Ack for the GRANTED_MSG in 33.
46 - The client returns NLM_DENIED to the server, since the grant doesn't match the lock it requested.
49 - Ack for the GRANTED_RES in 46.

In this case, the GRANTED_MSG is sent by the primary IP, as determined by the routing table. This lock grant is rejected by the client, since the IP address it came from doesn't match the IP address of the server against which the request was made. The locks are eventually granted after a 30-second poll timeout on the client.

Similar problems are also seen when NFS shares are exported from GFS filesystems, since GFS uses deferred locks.
The problem was introduced by commit 5ac5f9d1ce8492163dbde5d357dc5d03becf7e36, which adds a check on the server's IP address. This causes a regression for clients which mount off a virtual IP address on the server.

A possible fix for this issue is to put the server's IP address into the nlm_lock.oh field used to make the request, and compare it with the nlm_lock.oh returned in the GRANTED_MSG call, instead of checking the IP address of the server making the GRANTED_MSG call.

Sachin Prabhu

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Virtual IPs and blocking locks
  2009-05-15 14:48 Virtual IPs and blocking locks Sachin S. Prabhu
@ 2009-05-15 16:50 ` Rob Gardner
  2009-05-18 13:41   ` Sachin S. Prabhu
  0 siblings, 1 reply; 19+ messages in thread

From: Rob Gardner @ 2009-05-15 16:50 UTC (permalink / raw)
To: linux-nfs@vger.kernel.org

It looks to me like recent kernels have added an "h_srcaddr" field to the nlm_host structure, and this should be set to the server's virtual IP address. Then when the server sends the GRANTED_MSG call to the client, it should appear to be coming from the virtual IP address, not the server's primary IP address. So either h_srcaddr isn't getting set up correctly with your virtual IP address, or rpc_create() isn't binding it as the source address as it should. In our (older kernel) code, we explicitly call xprt_set_bindaddr() with the virtual IP address to make this happen, but I don't immediately see where this happens in the latest kernel source.

Rob Gardner
HP Storage Works / NAS

Sachin S. Prabhu wrote:
> We have had a few reported cases of problems using blocking locks on nfs shares
> mounted using virtual ips. In these cases, the NFS server was using a floating
> ip for clustering purposes.
>
> Please consider the transaction below
>
> NFS client: 10.33.8.75
> NFS Server:
> Primary IP : 10.33.8.71
> Floating IP: 10.33.8.77
>
> $ tshark -r block-virtual.pcap -R 'nlm'
> 19 2.487622 10.33.8.75 -> 10.33.8.77 NLM V4 LOCK Call FH:0x6176411a svid:4 pos:0-0
> 22 2.487760 10.33.8.77 -> 10.33.8.75 NLM V4 LOCK Reply (Call In 19) NLM_BLOCKED
> 33 2.489518 10.33.8.71 -> 10.33.8.75 NLM V4 GRANTED_MSG Call FH:0x6176411a svid:4 pos:0-0
> 36 2.489635 10.33.8.75 -> 10.33.8.71 NLM V4 GRANTED_MSG Reply (Call In 33)
> 46 2.489977 10.33.8.75 -> 10.33.8.71 NLM V4 GRANTED_RES Call NLM_DENIED
> 49 2.490096 10.33.8.71 -> 10.33.8.75 NLM V4 GRANTED_RES Reply (Call In 46)
>
> 19 - A lock request is sent from the client to the floating ip.
> 22 - An NLM_BLOCKED reply is sent back by the floating ip to the client.
> 33 - Server primary IP address returns a NLM_GRANTED using the async callback
> mechanism.
> 36 - Ack for GRANTED_MSG in 33.
> 46 - Client returns a NLM_DENIED to the server. This is done since it doesn't
> match the lock requested.
> 49 - Ack for GRANTED_RES in 46.
>
> In this case, the GRANTED_MSG is sent by the primary ip as determined by the
> routing table. This lock grant is rejected by the client since the ip address
> it came from doesn't match the ip address of the server against which the request
> was made. The locks are eventually granted after a 30 second poll timeout on the
> client.
>
> Similar problems are also seen when nfs shares are exported from GFS filesystems
> since GFS uses deferred locks.
>
> The problem was introduced by commit 5ac5f9d1ce8492163dbde5d357dc5d03becf7e36
> which adds a check for the server ip address. This causes a regression for
> clients which mount off a virtual ip address from the server.
>
> A possible fix for this issue is to use the server ip address in the nlm_lock.oh
> field used to make the request and compare it to the nlm_lock.oh returned in the
> GRANTED_MSG call instead of checking the ip address of the server making
> the GRANTED_MSG call.
>
> Sachin Prabhu
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply [flat|nested] 19+ messages in thread
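The source-address binding Rob describes (h_srcaddr / xprt_set_bindaddr) happens inside the kernel RPC client; a plain-socket userspace analogue of the same idea looks roughly like this. This is only a sketch, and make_callback_socket() is a made-up helper name:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Analogue of what setting h_srcaddr achieves: bind() the callback
 * socket to the floating IP before any traffic is sent, so the peer
 * sees that address rather than whatever the routing table picks. */
static int make_callback_socket(const char *src_ip)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in src;
    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = 0;                       /* any source port */
    if (inet_pton(AF_INET, src_ip, &src.sin_addr) != 1 ||
        bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0) {
        /* Fails if this host does not own src_ip: the floating IP
         * must actually be configured on an interface. */
        close(fd);
        return -1;
    }
    return fd;
}
```

Without the bind(), the kernel chooses the source address from the routing table — which is exactly why the GRANTED_MSG in the capture above left from the primary IP.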
* Re: Virtual IPs and blocking locks
  2009-05-15 16:50 ` Rob Gardner
@ 2009-05-18 13:41   ` Sachin S. Prabhu
  2009-05-18 13:46     ` Trond Myklebust
  2009-05-18 13:55     ` Rob Gardner
  0 siblings, 2 replies; 19+ messages in thread

From: Sachin S. Prabhu @ 2009-05-18 13:41 UTC (permalink / raw)
To: Rob Gardner; +Cc: linux-nfs@vger.kernel.org

Rob Gardner wrote:
> It looks to me like recent kernels have added an "h_srcaddr" field to the
> nlm_host structure, and this should be set to the server's virtual ip
> address. Then when the server sends the GRANTED_MSG call to the client,
> it should appear to be coming from the virtual ip address, not the
> server's primary ip address. So either h_srcaddr isn't getting set up
> correctly with your virtual ip address, or rpc_create() isn't binding it
> as the source address as it should. In our (older kernel) code, we
> explicitly call xprt_set_bindaddr() with the virtual ip address to make
> this happen, but I don't immediately see where this happens in the
> latest kernel source.

You are right; I cannot reproduce this issue with NFS servers based on later versions of the kernel containing the patches for 'NLM: fix source address in server callbacks'.

However, this still leaves the question of the client handling such a situation where a callback is made from a different IP address. Should the client accept such callbacks, or is the current behaviour of rejecting these callbacks correct?

Sachin Prabhu

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Virtual IPs and blocking locks 2009-05-18 13:41 ` Sachin S. Prabhu @ 2009-05-18 13:46 ` Trond Myklebust 2009-05-18 13:55 ` Rob Gardner 1 sibling, 0 replies; 19+ messages in thread From: Trond Myklebust @ 2009-05-18 13:46 UTC (permalink / raw) To: Sachin S. Prabhu; +Cc: Rob Gardner, linux-nfs@vger.kernel.org On Mon, 2009-05-18 at 14:41 +0100, Sachin S. Prabhu wrote: > Rob Gardner wrote: > > It looks to me like recent kernels have added a "h_srcaddr" filed to the > > nlm_host structure, and this should be set to the server's virtual ip > > address. Then when the server sends the GRANTED_MSG call to the client, > > it should appear to be coming from the virtual ip address, not the > > server's primary ip address. So either h_srcaddr isn't getting set up > > correctly with your virtual ip address, or rpc_create() isn't binding it > > as the source address as it should. In our (older kernel) code, we > > explicitly call xprt_set_bindaddr() with the virtual ip address to make > > this happen, but I don't immediately see where this happens in the > > latest kernel source. > > You are right, I cannot reproduce this issue with nfs servers based on later > versions on the kernel containing the patches for > 'NLM: fix source address in server callbacks' > > However this still leaves the question of the client handling such a situation > where a callback is made from a different ip address. Should the client accept > such callbacks or is the current behaviour of rejecting these callbacks correct? The decision to reject callbacks from other ip addresses was deliberate. There is no good justification for a server to send callbacks or replies using an address that won't be recognised by the client. Trond ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Virtual IPs and blocking locks 2009-05-18 13:41 ` Sachin S. Prabhu 2009-05-18 13:46 ` Trond Myklebust @ 2009-05-18 13:55 ` Rob Gardner 2009-05-19 20:43 ` Huge race in lockd for async lock requests? Rob Gardner 1 sibling, 1 reply; 19+ messages in thread From: Rob Gardner @ 2009-05-18 13:55 UTC (permalink / raw) To: Sachin S. Prabhu; +Cc: linux-nfs@vger.kernel.org If the client gets a message from an IP address that he never sent a request to, he is correct to reject that message. Rob Sachin S. Prabhu wrote: > > However this still leaves the question of the client handling such a situation > where a callback is made from a different ip address. Should the client accept > such callbacks or is the current behaviour of rejecting these callbacks correct? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Huge race in lockd for async lock requests?
  2009-05-18 13:55 ` Rob Gardner
@ 2009-05-19 20:43 ` Rob Gardner
  2009-05-19 21:33   ` Tom Talpey
  2009-05-20  6:55   ` Rob Gardner
  0 siblings, 2 replies; 19+ messages in thread

From: Rob Gardner @ 2009-05-19 20:43 UTC (permalink / raw)
To: linux-nfs@vger.kernel.org

I've got a question about lockd in conjunction with a filesystem that provides its own (async) locking.

After nlmsvc_lock() calls vfs_lock_file(), it seems to me that we might get the async callback (nlmsvc_grant_deferred) at any time. What's to stop it from arriving before we even put the block on the nlm_block list? If this happens, then nlmsvc_grant_deferred() will print "grant for unknown block" and then we'll wait forever for a grant that will never come.

Seems like we ought to do nlmsvc_insert_block() before vfs_lock_file() at the very least; but this still leaves a problem where the lock is granted via the callback while we're still in nlmsvc_lock(), and we ignore it and tell the client that the lock is blocked; now the client will have to retry before getting the lock.

Any thoughts on this besides "give up on using lockd"?

Rob Gardner

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Huge race in lockd for async lock requests? 2009-05-19 20:43 ` Huge race in lockd for async lock requests? Rob Gardner @ 2009-05-19 21:33 ` Tom Talpey 2009-05-20 6:55 ` Rob Gardner 1 sibling, 0 replies; 19+ messages in thread From: Tom Talpey @ 2009-05-19 21:33 UTC (permalink / raw) To: Rob Gardner; +Cc: linux-nfs@vger.kernel.org At 04:43 PM 5/19/2009, Rob Gardner wrote: >I've got a question about lockd in conjunction with a filesystem that >provides its own (async) locking. > >After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we might >get the async callback (nlmsvc_grant_deferred) at any time. What's to >stop it from arriving before we even put the block on the nlm_block >list? If this happens, then nlmsvc_grant_deferred() will print "grant >for unknown block" and then we'll wait forever for a grant that will >never come. Yes, there's a race but the client will retry every 30 seconds, so it won't wait forever. Depending on the kernel client version, there are some improvements we've tried over time to close the raciness a little. What exact client version are you working with? > >Seems like we ought to do nlmsvc_insert_block() before vfs_lock_file() >at the very least; But this still leaves problems where the lock is >granted via the callback while we're still in nlmsvc_lock(), and we >ignore it and tell the client that the lock is blocked; Now they'll have >to retry before getting the lock. It's a little worse than that. It's also possible for the client to hold a lock, but a stray or retried server callback can cause the client to reject it, releasing the lock at the server. This causes the server to grant the lock to another client even though the first client still thinks it holds it. It's an NLM protocol issue, frankly, due to the fact that the server callback is a completely separate channel. > >Any thoughts on this besides "give up on using lockd"? Use NFSv4? ;-) Tom. 
>
>Rob Gardner
>

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Huge race in lockd for async lock requests? 2009-05-19 20:43 ` Huge race in lockd for async lock requests? Rob Gardner 2009-05-19 21:33 ` Tom Talpey @ 2009-05-20 6:55 ` Rob Gardner 2009-05-20 14:00 ` Tom Talpey 1 sibling, 1 reply; 19+ messages in thread From: Rob Gardner @ 2009-05-20 6:55 UTC (permalink / raw) To: linux-nfs@vger.kernel.org Tom Talpey wrote: > At 04:43 PM 5/19/2009, Rob Gardner wrote: > >I've got a question about lockd in conjunction with a filesystem that > >provides its own (async) locking. > > > >After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we might > >get the async callback (nlmsvc_grant_deferred) at any time. What's to > >stop it from arriving before we even put the block on the nlm_block > >list? If this happens, then nlmsvc_grant_deferred() will print "grant > >for unknown block" and then we'll wait forever for a grant that will > >never come. > > Yes, there's a race but the client will retry every 30 seconds, so it won't > wait forever. OK, a blocking lock request will get retried in 30 seconds and work out "ok". But a non-blocking request will get in big trouble. Let's say the callback is invoked immediately after the vfs_lock_file call returns FILE_LOCK_DEFERRED. At this point, the block is not on the nlm_block list, so the callback routine will not be able to find it and mark it as granted. Then nlmsvc_lock() will call nlmsvc_defer_lock_rqst(), put the block on the nlm_block list, and eventually the request will timeout and the client will get lck_denied. Meanwhile, the lock has actually been granted, but nobody knows about it. > Depending on the kernel client version, there are some > improvements we've tried over time to close the raciness a little. What > exact client version are you working with? > I maintain nfs/nlm server code for a NAS product, and so there is no "exact client" but rather a multitude of clients that I have no control over. All I can do is hack the server. 
We have been working around this by using a semaphore to cover the vfs_lock_file() to nlmsvc_insert_block() sequence in nlmsvc_lock() and also nlmsvc_grant_deferred(). So if the callback arrives at a bad time, it has to wait until the lock actually makes it onto the nlm_block list, and so the status of the lock gets updated properly. > Use NFSv4? ;-) > I had a feeling you were going to say that. ;-) Unfortunately that doesn't make NFSv3 and lockd go away. Rob Gardner ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Huge race in lockd for async lock requests? 2009-05-20 6:55 ` Rob Gardner @ 2009-05-20 14:00 ` Tom Talpey [not found] ` <4a140d0a.85c2f10a.53bc.0979-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Tom Talpey @ 2009-05-20 14:00 UTC (permalink / raw) To: Rob Gardner; +Cc: linux-nfs@vger.kernel.org At 02:55 AM 5/20/2009, Rob Gardner wrote: >Tom Talpey wrote: >> At 04:43 PM 5/19/2009, Rob Gardner wrote: >> >I've got a question about lockd in conjunction with a filesystem that >> >provides its own (async) locking. >> > >> >After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we might >> >get the async callback (nlmsvc_grant_deferred) at any time. What's to >> >stop it from arriving before we even put the block on the nlm_block >> >list? If this happens, then nlmsvc_grant_deferred() will print "grant >> >for unknown block" and then we'll wait forever for a grant that will >> >never come. >> >> Yes, there's a race but the client will retry every 30 seconds, so it won't >> wait forever. >OK, a blocking lock request will get retried in 30 seconds and work out >"ok". But a non-blocking request will get in big trouble. Let's say the A non-blocking lock doesn't request, and won't get, a callback. So I don't understand... >callback is invoked immediately after the vfs_lock_file call returns >FILE_LOCK_DEFERRED. At this point, the block is not on the nlm_block >list, so the callback routine will not be able to find it and mark it as >granted. Then nlmsvc_lock() will call nlmsvc_defer_lock_rqst(), put the >block on the nlm_block list, and eventually the request will timeout and >the client will get lck_denied. Meanwhile, the lock has actually been >granted, but nobody knows about it. Yes, this can happen, I've seen it too. Again, it's a bug in the protocol more than a bug in the clients. It gets even worse when retries occur. If the reply cache doesn't catch the duplicates (and it never does), all heck breaks out. 
> >> Depending on the kernel client version, there are some >> improvements we've tried over time to close the raciness a little. What >> exact client version are you working with? >> > >I maintain nfs/nlm server code for a NAS product, and so there is no >"exact client" but rather a multitude of clients that I have no control >over. All I can do is hack the server. We have been working around this I feel for ya (been there, done that) :-) >by using a semaphore to cover the vfs_lock_file() to >nlmsvc_insert_block() sequence in nlmsvc_lock() and also >nlmsvc_grant_deferred(). So if the callback arrives at a bad time, it >has to wait until the lock actually makes it onto the nlm_block list, >and so the status of the lock gets updated properly. Can you explain this further? If you're implementing the server, how do you know your callback "arrives at a bad time", by the DENIED result from the client? Another thing to worry about is the presence of NLM_CANCEL calls from the client which cross the callbacks. I sent a patch which improves the situation at the client, some time ago. Basically it was more willing to positively acknowledge a callback which didn't match the nlm_blocked list, by also checking whether the lock was actually being held. This was only half the solution however, it didn't close the protocol race, just the client one. You want the patch? I'll look for it. > >> Use NFSv4? ;-) >> > >I had a feeling you were going to say that. ;-) Unfortunately that >doesn't make NFSv3 and lockd go away. Yes, I know. Unfortunately there aren't any elegant solutions to the NLM protocol's flaws. Tom. ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <4a140d0a.85c2f10a.53bc.0979-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>]
* Re: Huge race in lockd for async lock requests? [not found] ` <4a140d0a.85c2f10a.53bc.0979-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org> @ 2009-05-20 14:14 ` Tom Talpey [not found] ` <4a14106e.48c3f10a.7ce3.0e55-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org> 2009-05-20 16:37 ` Rob Gardner 1 sibling, 1 reply; 19+ messages in thread From: Tom Talpey @ 2009-05-20 14:14 UTC (permalink / raw) To: Rob Gardner; +Cc: linux-nfs@vger.kernel.org At 10:00 AM 5/20/2009, Tom Talpey wrote: >At 02:55 AM 5/20/2009, Rob Gardner wrote: >>Tom Talpey wrote: >>> At 04:43 PM 5/19/2009, Rob Gardner wrote: >>> >I've got a question about lockd in conjunction with a filesystem that >>> >provides its own (async) locking. >>> > >>> >After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we might >>> >get the async callback (nlmsvc_grant_deferred) at any time. What's to >>> >stop it from arriving before we even put the block on the nlm_block >>> >list? If this happens, then nlmsvc_grant_deferred() will print "grant >>> >for unknown block" and then we'll wait forever for a grant that will >>> >never come. >>> >>> Yes, there's a race but the client will retry every 30 seconds, so it won't >>> wait forever. >>OK, a blocking lock request will get retried in 30 seconds and work out >>"ok". But a non-blocking request will get in big trouble. Let's say the > >A non-blocking lock doesn't request, and won't get, a callback. So I >don't understand... > >>callback is invoked immediately after the vfs_lock_file call returns >>FILE_LOCK_DEFERRED. At this point, the block is not on the nlm_block >>list, so the callback routine will not be able to find it and mark it as >>granted. Then nlmsvc_lock() will call nlmsvc_defer_lock_rqst(), put the >>block on the nlm_block list, and eventually the request will timeout and >>the client will get lck_denied. Meanwhile, the lock has actually been >>granted, but nobody knows about it. > >Yes, this can happen, I've seen it too. 
Again, it's a bug in the protocol >more than a bug in the clients. It gets even worse when retries occur. >If the reply cache doesn't catch the duplicates (and it never does), all >heck breaks out. > >> >>> Depending on the kernel client version, there are some >>> improvements we've tried over time to close the raciness a little. What >>> exact client version are you working with? >>> >> >>I maintain nfs/nlm server code for a NAS product, and so there is no >>"exact client" but rather a multitude of clients that I have no control >>over. All I can do is hack the server. We have been working around this > >I feel for ya (been there, done that) :-) > >>by using a semaphore to cover the vfs_lock_file() to >>nlmsvc_insert_block() sequence in nlmsvc_lock() and also >>nlmsvc_grant_deferred(). So if the callback arrives at a bad time, it >>has to wait until the lock actually makes it onto the nlm_block list, >>and so the status of the lock gets updated properly. > >Can you explain this further? If you're implementing the server, how do >you know your callback "arrives at a bad time", by the DENIED result >from the client? > >Another thing to worry about is the presence of NLM_CANCEL calls >from the client which cross the callbacks. > >I sent a patch which improves the situation at the client, some time >ago. Basically it was more willing to positively acknowledge a callback >which didn't match the nlm_blocked list, by also checking whether the >lock was actually being held. This was only half the solution however, >it didn't close the protocol race, just the client one. You want the >patch? I'll look for it. Found it, on the old nfs list: http://thread.gmane.org/gmane.linux.nfs/16611 Tom. > >> >>> Use NFSv4? ;-) >>> >> >>I had a feeling you were going to say that. ;-) Unfortunately that >>doesn't make NFSv3 and lockd go away. > >Yes, I know. Unfortunately there aren't any elegant solutions to >the NLM protocol's flaws. > >Tom. 
^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <4a14106e.48c3f10a.7ce3.0e55-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>]
* Re: Huge race in lockd for async lock requests? [not found] ` <4a14106e.48c3f10a.7ce3.0e55-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org> @ 2009-05-20 23:20 ` Rob Gardner 0 siblings, 0 replies; 19+ messages in thread From: Rob Gardner @ 2009-05-20 23:20 UTC (permalink / raw) To: Tom Talpey; +Cc: linux-nfs@vger.kernel.org Tom Talpey wrote: > Found it, on the old nfs list: > http://thread.gmane.org/gmane.linux.nfs/16611 > > Sorry I didn't see this until now... I realize now that we were definitely not talking about the same problem. I apologize for not being clear. My 'callbacks' are file system grant calls to lockd, not rpc calls to/from clients. Rob Gardner ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Huge race in lockd for async lock requests? [not found] ` <4a140d0a.85c2f10a.53bc.0979-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org> 2009-05-20 14:14 ` Tom Talpey @ 2009-05-20 16:37 ` Rob Gardner 2009-05-28 20:05 ` J. Bruce Fields 1 sibling, 1 reply; 19+ messages in thread From: Rob Gardner @ 2009-05-20 16:37 UTC (permalink / raw) To: tmtalpey, linux-nfs@vger.kernel.org Tom Talpey wrote: > At 02:55 AM 5/20/2009, Rob Gardner wrote: > >> Tom Talpey wrote: >> >>> At 04:43 PM 5/19/2009, Rob Gardner wrote: >>> >>>> I've got a question about lockd in conjunction with a filesystem that >>>> provides its own (async) locking. >>>> >>>> After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we might >>>> get the async callback (nlmsvc_grant_deferred) at any time. What's to >>>> stop it from arriving before we even put the block on the nlm_block >>>> list? If this happens, then nlmsvc_grant_deferred() will print "grant >>>> for unknown block" and then we'll wait forever for a grant that will >>>> never come. >>>> >>> Yes, there's a race but the client will retry every 30 seconds, so it won't >>> wait forever. >>> >> OK, a blocking lock request will get retried in 30 seconds and work out >> "ok". But a non-blocking request will get in big trouble. Let's say the >> > > A non-blocking lock doesn't request, and won't get, a callback. So I > don't understand... > > What do you mean a non-blocking lock doesn't request? Remember that I'm dealing with a filesystem that provides its own locking functions via file->f_op->lock(). Such a filesystem might easily defer a non-blocking lock request and invoke the callback later. 
At least I don't know of any rule that says that it can't do this, and clearly the code expects this possibility:

	case FILE_LOCK_DEFERRED:
		if (wait)
			break;
		/* Filesystem lock operation is in progress
		   Add it to the queue waiting for callback */
		ret = nlmsvc_defer_lock_rqst(rqstp, block);

>> callback is invoked immediately after the vfs_lock_file call returns
>> FILE_LOCK_DEFERRED. At this point, the block is not on the nlm_block
>> list, so the callback routine will not be able to find it and mark it as
>> granted. Then nlmsvc_lock() will call nlmsvc_defer_lock_rqst(), put the
>> block on the nlm_block list, and eventually the request will timeout and
>> the client will get lck_denied. Meanwhile, the lock has actually been
>> granted, but nobody knows about it.
>
> Yes, this can happen, I've seen it too. Again, it's a bug in the protocol
> more than a bug in the clients.

It looks to me like a bug in the server. The server must be able to deal with async filesystem callbacks happening at any time, however inconvenient.

> It gets even worse when retries occur.
> If the reply cache doesn't catch the duplicates (and it never does), all
> heck breaks out.

You'll have to explain further what scenario you're talking about. I don't understand what the reply cache has to do with lockd.

>> by using a semaphore to cover the vfs_lock_file() to
>> nlmsvc_insert_block() sequence in nlmsvc_lock() and also
>> nlmsvc_grant_deferred(). So if the callback arrives at a bad time, it
>> has to wait until the lock actually makes it onto the nlm_block list,
>> and so the status of the lock gets updated properly.
>
> Can you explain this further? If you're implementing the server, how do
> you know your callback "arrives at a bad time", by the DENIED result
> from the client?

I sense a little confusion so let me be more precise. By "callback" I am talking about the callback from the filesystem to lockd via lock_manager_operations.fl_grant (i.e., nlmsvc_grant_deferred).
If this callback is invoked while the lockd thread is executing code in nlmsvc_lock() between the call to vfs_lock_file() and the call to nlmsvc_insert_block(), then the callback routine (nlmsvc_grant_deferred) will not find the block on the nlm_block list because it's not there yet, and thus the "grant" is effectively lost. We use a semaphore in nlmsvc_lock() and nlmsvc_grant_deferred() to avoid this race. Rob Gardner ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Huge race in lockd for async lock requests? 2009-05-20 16:37 ` Rob Gardner @ 2009-05-28 20:05 ` J. Bruce Fields 2009-05-28 21:34 ` Rob Gardner 0 siblings, 1 reply; 19+ messages in thread From: J. Bruce Fields @ 2009-05-28 20:05 UTC (permalink / raw) To: Rob Gardner; +Cc: tmtalpey, linux-nfs@vger.kernel.org On Wed, May 20, 2009 at 10:37:05AM -0600, Rob Gardner wrote: > Tom Talpey wrote: >> At 02:55 AM 5/20/2009, Rob Gardner wrote: >> >>> Tom Talpey wrote: >>> >>>> At 04:43 PM 5/19/2009, Rob Gardner wrote: >>>> >>>>> I've got a question about lockd in conjunction with a filesystem >>>>> that provides its own (async) locking. >>>>> >>>>> After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we >>>>> might get the async callback (nlmsvc_grant_deferred) at any time. >>>>> What's to stop it from arriving before we even put the block on >>>>> the nlm_block list? If this happens, then nlmsvc_grant_deferred() >>>>> will print "grant for unknown block" and then we'll wait forever >>>>> for a grant that will never come. >>>>> >>>> Yes, there's a race but the client will retry every 30 seconds, so it won't >>>> wait forever. >>>> >>> OK, a blocking lock request will get retried in 30 seconds and work >>> out "ok". But a non-blocking request will get in big trouble. Let's >>> say the >> >> A non-blocking lock doesn't request, and won't get, a callback. So I >> don't understand... >> >> > > What do you mean a non-blocking lock doesn't request? Remember that I'm > dealing with a filesystem that provides its own locking functions via > file->f_op->lock(). Such a filesystem might easily defer a non-blocking > lock request and invoke the callback later. 
At least I don't know of any > rule that says that it can't do this, and clearly the code expects this > possibility: > > case FILE_LOCK_DEFERRED: > if (wait) > break; > /* Filesystem lock operation is in progress > Add it to the queue waiting for callback */ > ret = nlmsvc_defer_lock_rqst(rqstp, block); > > >>> callback is invoked immediately after the vfs_lock_file call returns >>> FILE_LOCK_DEFERRED. At this point, the block is not on the nlm_block >>> list, so the callback routine will not be able to find it and mark it >>> as granted. Then nlmsvc_lock() will call nlmsvc_defer_lock_rqst(), >>> put the block on the nlm_block list, and eventually the request will >>> timeout and the client will get lck_denied. Meanwhile, the lock has >>> actually been granted, but nobody knows about it. >>> >> >> Yes, this can happen, I've seen it too. Again, it's a bug in the protocol >> more than a bug in the clients. > It looks to me like a bug in the server. The server must be able to deal > with async filesystem callbacks happening at any time, however > inconvenient. Absolutely, if that's possible then it's a server bug. --b. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Huge race in lockd for async lock requests? 2009-05-28 20:05 ` J. Bruce Fields @ 2009-05-28 21:34 ` Rob Gardner 2009-05-29 0:26 ` J. Bruce Fields 0 siblings, 1 reply; 19+ messages in thread From: Rob Gardner @ 2009-05-28 21:34 UTC (permalink / raw) To: J. Bruce Fields; +Cc: tmtalpey@gmail.com, linux-nfs@vger.kernel.org J. Bruce Fields wrote: >>>>> At 04:43 PM 5/19/2009, Rob Gardner wrote: >>>>> >>>>> >>>>>> I've got a question about lockd in conjunction with a filesystem >>>>>> that provides its own (async) locking. >>>>>> >>>>>> After nlmsvc_lock() calls vfs_lock_file(), it seems to be that we >>>>>> might get the async callback (nlmsvc_grant_deferred) at any time. >>>>>> What's to stop it from arriving before we even put the block on >>>>>> the nlm_block list? If this happens, then nlmsvc_grant_deferred() >>>>>> will print "grant for unknown block" and then we'll wait forever >>>>>> for a grant that will never come. >>>>>> >> dealing with a filesystem that provides its own locking functions via >> file->f_op->lock(). Such a filesystem might easily defer a non-blocking >> lock request and invoke the callback later. At least I don't know of any >> rule that says that it can't do this, and clearly the code expects this >> possibility: >> >> case FILE_LOCK_DEFERRED: >> if (wait) >> break; >> /* Filesystem lock operation is in progress >> Add it to the queue waiting for callback */ >> ret = nlmsvc_defer_lock_rqst(rqstp, block); >> >> >> It looks to me like a bug in the server. The server must be able to deal >> with async filesystem callbacks happening at any time, however >> inconvenient. >> > > Absolutely, if that's possible then it's a server bug. > > --b. > It's definitely possible for the async filesystem callback to occur at any time. I think at the very least, nlmsvc_lock() ought to put the block on the nlm_block list *before* calling vfs_lock_file(), and then remove it immediately if the lock is granted synchronously. 
I would like to develop and submit a patch for this, but I am currently
working with a much older kernel and it will take some time before I get
to work with newer bits.

Rob Gardner
* Re: Huge race in lockd for async lock requests?
  2009-05-28 21:34 ` Rob Gardner
@ 2009-05-29  0:26 ` J. Bruce Fields
  2009-05-29  2:59 ` Rob Gardner
  0 siblings, 1 reply; 19+ messages in thread
From: J. Bruce Fields @ 2009-05-29 0:26 UTC (permalink / raw)
To: Rob Gardner; +Cc: tmtalpey@gmail.com, linux-nfs@vger.kernel.org

On Thu, May 28, 2009 at 03:34:19PM -0600, Rob Gardner wrote:
> J. Bruce Fields wrote:
>>>>>> At 04:43 PM 5/19/2009, Rob Gardner wrote:
>>>>>>
>>>>>>> I've got a question about lockd in conjunction with a
>>>>>>> filesystem that provides its own (async) locking.
>>>>>>>
>>>>>>> After nlmsvc_lock() calls vfs_lock_file(), it seems to me
>>>>>>> that we might get the async callback (nlmsvc_grant_deferred)
>>>>>>> at any time. What's to stop it from arriving before we even
>>>>>>> put the block on the nlm_block list? If this happens, then
>>>>>>> nlmsvc_grant_deferred() will print "grant for unknown block"
>>>>>>> and then we'll wait forever for a grant that will never come.
>>>>>>>
>>> dealing with a filesystem that provides its own locking functions via
>>> file->f_op->lock(). Such a filesystem might easily defer a
>>> non-blocking lock request and invoke the callback later. At least I
>>> don't know of any rule that says that it can't do this, and clearly
>>> the code expects this possibility:
>>>
>>>	case FILE_LOCK_DEFERRED:
>>>		if (wait)
>>>			break;
>>>		/* Filesystem lock operation is in progress
>>>		   Add it to the queue waiting for callback */
>>>		ret = nlmsvc_defer_lock_rqst(rqstp, block);
>>>
>>> It looks to me like a bug in the server. The server must be able
>>> to deal with async filesystem callbacks happening at any time,
>>> however inconvenient.
>>>
>>
>> Absolutely, if that's possible then it's a server bug.
>>
>> --b.
>>
>
> It's definitely possible for the async filesystem callback to occur at
> any time.

Looking at the code....
This is all under the BKL, and as far as I can tell there aren't any
blocking operations anywhere there, so I don't think this should happen
if the filesystem is careful. Have you seen it happen?

Of course this may be fragile--we'll have to think about what to do when
we eventually remove the BKL.

--b.

> I think at the very least, nlmsvc_lock() ought to put the
> block on the nlm_block list *before* calling vfs_lock_file(), and then
> remove it immediately if the lock is granted synchronously. I would like
> to develop and submit a patch for this, but I am currently working with
> a much older kernel and it will take some time before I get to work with
> newer bits.
* Re: Huge race in lockd for async lock requests?
  2009-05-29  0:26 ` J. Bruce Fields
@ 2009-05-29  2:59 ` Rob Gardner
  2009-05-29 13:22 ` Tom Talpey
  0 siblings, 1 reply; 19+ messages in thread
From: Rob Gardner @ 2009-05-29 2:59 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: tmtalpey@gmail.com, linux-nfs@vger.kernel.org

J. Bruce Fields wrote:
>
> Looking at the code.... This is all under the BKL, and as far as I can
> tell there aren't any blocking operations anywhere there, so I don't
> think this should happen if the filesystem is careful. Have you seen it
> happen?

Aha, I just figured it out and you were right. The filesystem in this
case was not careful. It broke the rules and actually made the fl_grant
call *before* even returning to nlmsvc_lock's call to vfs_lock_file, and
it did it in the lockd thread! So the BKL was of no use, and I saw
nlmsvc_grant_deferred print "grant for unknown block". So I think
everything is ok, no huge race in lockd for async lock requests. Thank
you for clearing this up.

Rob Gardner
* Re: Huge race in lockd for async lock requests?
  2009-05-29  2:59 ` Rob Gardner
@ 2009-05-29 13:22 ` Tom Talpey
  [not found] ` <4a1fe1c0.06045a0a.165b.5fbc-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Tom Talpey @ 2009-05-29 13:22 UTC (permalink / raw)
To: Rob Gardner; +Cc: J. Bruce Fields, linux-nfs@vger.kernel.org

At 10:59 PM 5/28/2009, Rob Gardner wrote:
>J. Bruce Fields wrote:
>>
>> Looking at the code.... This is all under the BKL, and as far as I can
>> tell there aren't any blocking operations anywhere there, so I don't
>> think this should happen if the filesystem is careful. Have you seen it
>> happen?
>
>Aha, I just figured it out and you were right. The filesystem in this
>case was not careful. It broke the rules and actually made the fl_grant
>call *before* even returning to nlmsvc_lock's call to vfs_lock_file, and
>it did it in the lockd thread! So the BKL was of no use, and I saw
>nlmsvc_grant_deferred print "grant for unknown block". So I think
>everything is ok, no huge race in lockd for async lock requests. Thank
>you for clearing this up.

Gack! I'm surprised it worked at all. The fact that the BKL allows itself
to be taken recursively really masked your filesystem bug. If the BKL had
blocked, or asserted, the bug would never have happened.

This is as good a time as any to point out that the BKL's use in the lockd
code is insidious and needs some serious attention. Unfortunately, it's
also wrapped up in the BKL use of the VFS locking layer, but in practice
those are two different things. Using the BKL for upcall/downcall
synchronization and lockd thread protection are the issue here.

Tom.
[parent not found: <4a1fe1c0.06045a0a.165b.5fbc-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>]
* Re: Huge race in lockd for async lock requests?
  [not found] ` <4a1fe1c0.06045a0a.165b.5fbc-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
@ 2009-05-29 15:24 ` Rob Gardner
  2009-05-29 19:14 ` J. Bruce Fields
  0 siblings, 1 reply; 19+ messages in thread
From: Rob Gardner @ 2009-05-29 15:24 UTC (permalink / raw)
To: Tom Talpey; +Cc: J. Bruce Fields, linux-nfs@vger.kernel.org

Tom Talpey wrote:
> At 10:59 PM 5/28/2009, Rob Gardner wrote:
>
>> J. Bruce Fields wrote:
>>
>>> Looking at the code.... This is all under the BKL, and as far as I can
>>> tell there aren't any blocking operations anywhere there, so I don't
>>> think this should happen if the filesystem is careful. Have you seen it
>>> happen?
>>>
>> Aha, I just figured it out and you were right. The filesystem in this
>> case was not careful. It broke the rules and actually made the fl_grant
>> call *before* even returning to nlmsvc_lock's call to vfs_lock_file, and
>> it did it in the lockd thread! So the BKL was of no use, and I saw
>> nlmsvc_grant_deferred print "grant for unknown block". So I think
>> everything is ok, no huge race in lockd for async lock requests. Thank
>> you for clearing this up.
>>
>
> Gack! I'm surprised it worked at all. The fact that the BKL allows itself
> to be taken recursively really masked your filesystem bug. If the BKL had
> blocked, or asserted, the bug would never have happened.
>

Yeah, recall that I'm using a very old kernel (circa 2.6.18) which I
think must still allow the BKL to be acquired recursively.

> This is as good a time as any to point out that the BKL's use in the lockd
> code is insidious and needs some serious attention.

No disagreement here! I think I almost understand enough about lockd to
remove the BKL, but the operative word there is "almost".

Rob Gardner
* Re: Huge race in lockd for async lock requests?
  2009-05-29 15:24 ` Rob Gardner
@ 2009-05-29 19:14 ` J. Bruce Fields
  0 siblings, 0 replies; 19+ messages in thread
From: J. Bruce Fields @ 2009-05-29 19:14 UTC (permalink / raw)
To: Rob Gardner; +Cc: Tom Talpey, linux-nfs@vger.kernel.org

On Fri, May 29, 2009 at 09:24:25AM -0600, Rob Gardner wrote:
> Tom Talpey wrote:
>> At 10:59 PM 5/28/2009, Rob Gardner wrote:
>>
>>> J. Bruce Fields wrote:
>>>
>>>> Looking at the code.... This is all under the BKL, and as far as I can
>>>> tell there aren't any blocking operations anywhere there, so I don't
>>>> think this should happen if the filesystem is careful. Have you seen it
>>>> happen?
>>>>
>>> Aha, I just figured it out and you were right. The filesystem in this
>>> case was not careful. It broke the rules and actually made the
>>> fl_grant call *before* even returning to nlmsvc_lock's call to
>>> vfs_lock_file, and it did it in the lockd thread! So the BKL was of
>>> no use, and I saw nlmsvc_grant_deferred print "grant for unknown
>>> block". So I think everything is ok, no huge race in lockd for async
>>> lock requests. Thank you for clearing this up.
>>>
>>
>> Gack! I'm surprised it worked at all. The fact that the BKL allows itself
>> to be taken recursively really masked your filesystem bug. If the BKL had
>> blocked, or asserted, the bug would never have happened.
>>
>
> Yeah, recall that I'm using a very old kernel (circa 2.6.18) which I
> think must still allow the BKL to be acquired recursively.

That's still true on recent kernels.

--b.

>
>> This is as good a time as any to point out that the BKL's use in the lockd
>> code is insidious and needs some serious attention.
>
> No disagreement here! I think I almost understand enough about lockd to
> remove the BKL, but the operative word there is "almost".
>
> Rob Gardner
end of thread
Thread overview: 19+ messages
2009-05-15 14:48 Virtual IPs and blocking locks Sachin S. Prabhu
2009-05-15 16:50 ` Rob Gardner
2009-05-18 13:41 ` Sachin S. Prabhu
2009-05-18 13:46 ` Trond Myklebust
2009-05-18 13:55 ` Rob Gardner
2009-05-19 20:43 ` Huge race in lockd for async lock requests? Rob Gardner
2009-05-19 21:33 ` Tom Talpey
2009-05-20 6:55 ` Rob Gardner
2009-05-20 14:00 ` Tom Talpey
[not found] ` <4a140d0a.85c2f10a.53bc.0979-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2009-05-20 14:14 ` Tom Talpey
[not found] ` <4a14106e.48c3f10a.7ce3.0e55-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2009-05-20 23:20 ` Rob Gardner
2009-05-20 16:37 ` Rob Gardner
2009-05-28 20:05 ` J. Bruce Fields
2009-05-28 21:34 ` Rob Gardner
2009-05-29 0:26 ` J. Bruce Fields
2009-05-29 2:59 ` Rob Gardner
2009-05-29 13:22 ` Tom Talpey
[not found] ` <4a1fe1c0.06045a0a.165b.5fbc-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2009-05-29 15:24 ` Rob Gardner
2009-05-29 19:14 ` J. Bruce Fields