linux-nfs.vger.kernel.org archive mirror
* [RFC] server's statd and lockd will not sync after its nfslock restart
@ 2009-12-15 10:02 Mi Jinlong
  2009-12-15 12:41 ` J. Bruce Fields
  2009-12-15 15:10 ` Chuck Lever
  0 siblings, 2 replies; 18+ messages in thread
From: Mi Jinlong @ 2009-12-15 10:02 UTC (permalink / raw)
  To: Trond.Myklebust, J. Bruce Fields, NFSv3 list, Chuck Lever

Hi,

When testing NLM on the latest kernel (2.6.32), I found a bug.
When a client holds locks and the server restarts its nfslock service,
the server's statd no longer stays synchronized with lockd.
If the server restarts nfslock twice or more, the client's locks are lost.

Test process:

  Step 1: the client opens an NFS file.
  Step 2: the client takes a lock with fcntl().
  Step 3: the server restarts its nfslock service.

After step 3, the server's lockd still records the client as holding locks, 
but statd's /var/lib/nfs/statd/sm/ directory is empty; statd and lockd are 
out of sync. If the server restarts nfslock again, the client's locks are lost.

The Primary Reason:

  At step 3, when the client's lock reclaim request reaches the server, 
the client's host (the host struct) is reused but not re-monitored by the
server's lockd. From then on, statd and lockd are out of sync.

Question:

In my opinion, if lockd is allowed to reuse the client's host, it should
send an SM_MON to statd on reuse. If reuse is not allowed, the client's host
should be destroyed immediately.

What should lockd do? Reuse? Destroy? Or some other action?


thanks,

Mi Jinlong


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-15 10:02 [RFC] server's statd and lockd will not sync after its nfslock restart Mi Jinlong
@ 2009-12-15 12:41 ` J. Bruce Fields
  2009-12-16  9:46   ` Mi Jinlong
  2009-12-15 15:10 ` Chuck Lever
  1 sibling, 1 reply; 18+ messages in thread
From: J. Bruce Fields @ 2009-12-15 12:41 UTC (permalink / raw)
  To: Mi Jinlong; +Cc: Trond.Myklebust, NFSv3 list, Chuck Lever

On Tue, Dec 15, 2009 at 06:02:11PM +0800, Mi Jinlong wrote:
> Hi,
> 
> When testing the NLM at the latest kernel(2.6.32),  i find a bug.
> When a client hold locks, after server restart its nfslock service, 
> server's statd will not synchronize with lockd.
> If server restart nfslock twice or more, client's lock will be lost.
> 
> Test process:
> 
>   Step1: client open nfs file.
>   Step2: client using fcntl to get lock.
>   Step3: server restart it's nfslock service.

I don't know what you mean; what did you actually do in step 3?

--b.

> 
> After step3, server's lockd records client holding locks, but statd's 
> /var/lib/nfs/statd/sm/ directory is empty. It means statd and lockd are 
> not sync. If server restart it's nfslock again, client's locks will be lost.
> 
> The Primary Reason:
> 
>   At step3, when client's reclaimed lock request is sent to server, 
> client's host(the host struct) is reused but not be re-monitored at
> server's lockd. After that, statd and lockd are not sync.
> 
> Question:
> 
> In my opinion, if lockd is allowed reuseing the client's host, it should
> send a SM_MON to statd when reuse. If not allowed, the client's host should 
> be destroyed immediately.
>  
> What should lockd to do?  Reuse ? Destroy ? Or some other action?
> 
> 
> thanks,
> 
> Mi Jinlong
> 


* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-15 10:02 [RFC] server's statd and lockd will not sync after its nfslock restart Mi Jinlong
  2009-12-15 12:41 ` J. Bruce Fields
@ 2009-12-15 15:10 ` Chuck Lever
  2009-12-16 10:27   ` Mi Jinlong
  1 sibling, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2009-12-15 15:10 UTC (permalink / raw)
  To: Mi Jinlong; +Cc: Trond.Myklebust, J. Bruce Fields, NFSv3 list

On Dec 15, 2009, at 5:02 AM, Mi Jinlong wrote:
> Hi,
>
> When testing the NLM at the latest kernel(2.6.32),  i find a bug.
> When a client hold locks, after server restart its nfslock service,
> server's statd will not synchronize with lockd.
> If server restart nfslock twice or more, client's lock will be lost.
>
> Test process:
>
>  Step1: client open nfs file.
>  Step2: client using fcntl to get lock.
>  Step3: server restart it's nfslock service.

I'll assume here that you mean the equivalent of "service nfslock  
restart".  This restarts statd and possibly runs sm-notify, but it has  
no effect on lockd.

Again, this test seems artificial to me.  Is there a real world use  
case where someone would deliberately restart statd while an NFS  
server is serving files?  I pose this question because I've worked on  
statd only for a year or so, and I am quite likely ignorant of all the  
ways it can be deployed.

> After step3, server's lockd records client holding locks, but statd's
> /var/lib/nfs/statd/sm/ directory is empty. It means statd and lockd  
> are
> not sync. If server restart it's nfslock again, client's locks will  
> be lost.
>
> The Primary Reason:
>
>  At step3, when client's reclaimed lock request is sent to server,
> client's host(the host struct) is reused but not be re-monitored at
> server's lockd. After that, statd and lockd are not sync.

The kernel squashes SM_MON upcalls for hosts that it already believes  
are monitored.  This is a scalability feature.

> Question:
>
> In my opinion, if lockd is allowed reuseing the client's host, it  
> should
> send a SM_MON to statd when reuse. If not allowed, the client's host  
> should
> be destroyed immediately.
>
> What should lockd to do?  Reuse ? Destroy ? Or some other action?

I don't immediately see why lockd should change its behavior.  Perhaps
statd/sm-notify were incorrect to delete the monitor list when you
restarted the nfslock service?

Can you show exactly how statd's state (i.e. its on-disk monitor list
in /var/lib/nfs/statd/sm) changed across the restart?  Did sm-notify
run when you restarted statd?  If so, why didn't the sm-notify pid
file stop it?

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com






* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-15 12:41 ` J. Bruce Fields
@ 2009-12-16  9:46   ` Mi Jinlong
  0 siblings, 0 replies; 18+ messages in thread
From: Mi Jinlong @ 2009-12-16  9:46 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Trond.Myklebust, NFSv3 list, Chuck Lever



J. Bruce Fields :
> On Tue, Dec 15, 2009 at 06:02:11PM +0800, Mi Jinlong wrote:
>> Hi,
>>
>> When testing the NLM at the latest kernel(2.6.32),  i find a bug.
>> When a client hold locks, after server restart its nfslock service, 
>> server's statd will not synchronize with lockd.
>> If server restart nfslock twice or more, client's lock will be lost.
>>
>> Test process:
>>
>>   Step1: client open nfs file.
>>   Step2: client using fcntl to get lock.
>>   Step3: server restart it's nfslock service.
> 
> I don't know what you mean; what did you actually do in step 3?

  I used the command "service nfslock restart" on the server.

  I mean that after the server restarts the nfslock service, lockd and statd are
  no longer synchronized. After the nfslock restart the server enters its grace
  period, and the client's lock reclaim requests are processed by lockd. At that
  point the client's host (the host struct) created before the restart is reused,
  but lockd does not send an SM_MON to statd.

  After the locks are reclaimed, the server's lockd records the client as holding
  locks, but statd does not monitor the client.

 thanks,
 Mi Jinlong

> 
> --b.
> 
>> After step3, server's lockd records client holding locks, but statd's 
>> /var/lib/nfs/statd/sm/ directory is empty. It means statd and lockd are 
>> not sync. If server restart it's nfslock again, client's locks will be lost.
>>
>> The Primary Reason:
>>
>>   At step3, when client's reclaimed lock request is sent to server, 
>> client's host(the host struct) is reused but not be re-monitored at
>> server's lockd. After that, statd and lockd are not sync.
>>
>> Question:
>>
>> In my opinion, if lockd is allowed reuseing the client's host, it should
>> send a SM_MON to statd when reuse. If not allowed, the client's host should 
>> be destroyed immediately.
>>  
>> What should lockd to do?  Reuse ? Destroy ? Or some other action?
>>
>>
>> thanks,
>>
>> Mi Jinlong
>>
> 
> 

-- 
Regards
Mi Jinlong



* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-15 15:10 ` Chuck Lever
@ 2009-12-16 10:27   ` Mi Jinlong
  2009-12-16 13:49     ` Jeff Layton
  2009-12-16 19:33     ` Chuck Lever
  0 siblings, 2 replies; 18+ messages in thread
From: Mi Jinlong @ 2009-12-16 10:27 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond.Myklebust, J. Bruce Fields, NFSv3 list



Chuck Lever:
> On Dec 15, 2009, at 5:02 AM, Mi Jinlong wrote:
>> Hi,
>>
>> When testing the NLM at the latest kernel(2.6.32),  i find a bug.
>> When a client hold locks, after server restart its nfslock service,
>> server's statd will not synchronize with lockd.
>> If server restart nfslock twice or more, client's lock will be lost.
>>
>> Test process:
>>
>>  Step1: client open nfs file.
>>  Step2: client using fcntl to get lock.
>>  Step3: server restart it's nfslock service.
> 
> I'll assume here that you mean the equivalent of "service nfslock
> restart".  This restarts statd and possibly runs sm-notify, but it has
> no effect on lockd.

  Yes, I used "service nfslock restart".

  It affects lockd too: when the service stops, lockd gets a KILL signal,
  releases all of the clients' locks, enters the grace period, and waits for
  the clients to reclaim their locks.

> 
> Again, this test seems artificial to me.  Is there a real world use case
> where someone would deliberately restart statd while an NFS server is
> serving files?  I pose this question because I've worked on statd only
> for a year or so, and I am quite likely ignorant of all the ways it can
> be deployed.

  ^/^, but someone may well restart nfslock while an NFS server is serving files.
  It is bound to happen.

> 
>> After step3, server's lockd records client holding locks, but statd's
>> /var/lib/nfs/statd/sm/ directory is empty. It means statd and lockd are
>> not sync. If server restart it's nfslock again, client's locks will be
>> lost.
>>
>> The Primary Reason:
>>
>>  At step3, when client's reclaimed lock request is sent to server,
>> client's host(the host struct) is reused but not be re-monitored at
>> server's lockd. After that, statd and lockd are not sync.
> 
> The kernel squashes SM_MON upcalls for hosts that it already believes
> are monitored.  This is a scalability feature.

  When statd starts, it moves the files from /var/lib/nfs/statd/sm/ to
  /var/lib/nfs/statd/sm.bak/. If lockd does not send an SM_MON to statd,
  statd will not monitor the clients that were monitored before the restart.
  I am not sure; is that right?

> 
>> Question:
>>
>> In my opinion, if lockd is allowed reuseing the client's host, it should
>> send a SM_MON to statd when reuse. If not allowed, the client's host
>> should
>> be destroyed immediately.
>>
>> What should lockd to do?  Reuse ? Destroy ? Or some other action?
> 
> I don't immediately see why lockd should change it's behavior.  Perhaps
> statd/sm-notify were incorrect to delete the monitor list when you
> restarted the nfslock service?

  Sorry, maybe I did not express myself clearly.
  I mean that lockd reuses the host struct that was created before the statd restart.

  The monitor list does seem to have been deleted when nfslock restarted.

> 
> Can you show exactly how statd's state (ie it's on-disk monitor list in
> /var/lib/nfs/statd/sm) changed across the restart?  Did sm-notify run
> when you restarted statd?  If so, why didn't the sm-notify pid file stop
> it?
> 

  The state of statd and lockd at the server across the nfslock restart:

        lockd                   statd         |
                                              |
      host(monitored = 1)      /sm/client     |  client gets its locks at first
          (locks)                             |
                                              |
      host(monitored = 1)      /sm/client     |  nfslock stop (lockd releases client's locks)
          (no locks)                          |
                                              |
      host(monitored = 1)      /sm/           |  nfslock start (client reclaims its locks,
          (locks)                             |                 but statd does not monitor it)

  note: host(monitored=1)   means the client's host struct exists and is marked as monitored.
        (locks), (no locks) means the host struct does or does not hold locks.
        /sm/client          means there is a file for the client under /var/lib/nfs/statd/sm.
        /sm/                means /var/lib/nfs/statd/sm is empty!


thanks,
Mi Jinlong



* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-16 10:27   ` Mi Jinlong
@ 2009-12-16 13:49     ` Jeff Layton
       [not found]       ` <20091216084902.64f722ad-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
  2009-12-16 19:33     ` Chuck Lever
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2009-12-16 13:49 UTC (permalink / raw)
  To: Mi Jinlong; +Cc: Chuck Lever, Trond.Myklebust, J. Bruce Fields, NFSv3 list

On Wed, 16 Dec 2009 18:27:09 +0800
Mi Jinlong <mijinlong@cn.fujitsu.com> wrote:

> 
>   The statd and lockd's state at server when nfslock restart:
> 
>         lockd                   statd         |
>                                               |
>       host(monitored = 1)      /sm/client     |  client get locks success at first
>           (locks)                             |
>                                               |
>       host(monitored = 1)      /sm/client     |  nfslock stop (lockd release client's locks)
>           (no locks)                          |
>                                               |  
>       host(monitored = 1)      /sm/           |  nfslock start (client reclaim locks)
>           (locks)                             |                (but statd don't monitor it)
> 
>   note: host(monitored=1)  means: client's host struct is created, and is marked be monitored.
>         (locks), (no locks)means: host strcut holds locks, or not.
>         /sm/client         means: there have a file under /var/lib/nfs/statd/sm directory
>         /sm/               means: /var/lib/nfs/statd/sm is empty!
> 
> 

Perhaps we ought to clear the cached list of monitored hosts (i.e. set
them all to monitored = 0) when lockd gets a SIGKILL.

-- 
Jeff Layton <jlayton@redhat.com>


* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-16 10:27   ` Mi Jinlong
  2009-12-16 13:49     ` Jeff Layton
@ 2009-12-16 19:33     ` Chuck Lever
  2009-12-17 10:07       ` Mi Jinlong
  1 sibling, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2009-12-16 19:33 UTC (permalink / raw)
  To: Mi Jinlong; +Cc: Trond.Myklebust, J. Bruce Fields, NFSv3 list

On Dec 16, 2009, at 5:27 AM, Mi Jinlong wrote:
> Chuck Lever:
>> On Dec 15, 2009, at 5:02 AM, Mi Jinlong wrote:
>>> Hi,
>>>
>>> When testing the NLM at the latest kernel(2.6.32),  i find a bug.
>>> When a client hold locks, after server restart its nfslock service,
>>> server's statd will not synchronize with lockd.
>>> If server restart nfslock twice or more, client's lock will be lost.
>>>
>>> Test process:
>>>
>>> Step1: client open nfs file.
>>> Step2: client using fcntl to get lock.
>>> Step3: server restart it's nfslock service.
>>
>> I'll assume here that you mean the equivalent of "service nfslock
>> restart".  This restarts statd and possibly runs sm-notify, but it  
>> has
>> no effect on lockd.
>
>  Yes, i used "service nfslock restart".
>
>  It has effect on lockd too, when service stop, lockd will get a  
> KILL signal.
>  Lockd will release all client's locks, and go into grace_period and  
> wait
>  client reclaime it's lock.
>
>>
>> Again, this test seems artificial to me.  Is there a real world use  
>> case
>> where someone would deliberately restart statd while an NFS server is
>> serving files?  I pose this question because I've worked on statd  
>> only
>> for a year or so, and I am quite likely ignorant of all the ways it  
>> can
>> be deployed.
>
>  ^/^, but maybe someone will restart nfslock when an NFS server is  
> serving files.
>  It is inevitable.
>
>>> After step3, server's lockd records client holding locks, but  
>>> statd's
>>> /var/lib/nfs/statd/sm/ directory is empty. It means statd and  
>>> lockd are
>>> not sync. If server restart it's nfslock again, client's locks  
>>> will be
>>> lost.
>>>
>>> The Primary Reason:
>>>
>>> At step3, when client's reclaimed lock request is sent to server,
>>> client's host(the host struct) is reused but not be re-monitored at
>>> server's lockd. After that, statd and lockd are not sync.
>>
>> The kernel squashes SM_MON upcalls for hosts that it already believes
>> are monitored.  This is a scalability feature.
>
>  When statd start, it will move files from /var/lib/nfs/statd/sm/ to
>  /var/lib/nfs/statd/sm.bak/.

Well, it's really sm-notify that does this.  sm-notify is run by  
rpc.statd when it starts up.

However, sm-notify should only retire the monitor list the first time  
it is run after a reboot.  Simply restarting statd should not change  
the on-disk monitor list in the slightest.  If it does, there's some  
kind of problem with the way sm-notify's pid file is managed, or  
perhaps with the nfslock script.

> If lockd don't send a SM_MON to statd,
>  statd will not monitor those client which be monitored before statd  
> restart.
>
>>> Question:
>>>
>>> In my opinion, if lockd is allowed reuseing the client's host, it  
>>> should
>>> send a SM_MON to statd when reuse. If not allowed, the client's host
>>> should
>>> be destroyed immediately.
>>>
>>> What should lockd to do?  Reuse ? Destroy ? Or some other action?
>>
>> I don't immediately see why lockd should change it's behavior.   
>> Perhaps
>> statd/sm-notify were incorrect to delete the monitor list when you
>> restarted the nfslock service?
>
>  Sorry, maybe i did not express clearly.
>  I mean, lockd reuse the host struct which was created before statd  
> restart.
>
>  It seems have deleted the monitor list when nfslock restart.

lockd does not touch any user space files; the on-disk monitor list is  
managed by statd and sm-notify.  A remote peer rebooting does not  
clear the "monitored" flag for that peer in the local kernel's lockd,  
so it won't send another SM_MON request.

Now, it may be the case that "service nfslock start" uses a command  
line option that forces a fresh sm-notify run, and that is what is  
wiping the on-disk monitor list.  That would be the bug in this case  
-- sm-notify can and should be allowed to make its own determination  
of whether the monitor list gets retired.  Notification should not  
normally be forced by command line options in the nfslock script.

>> Can you show exactly how statd's state (ie it's on-disk monitor  
>> list in
>> /var/lib/nfs/statd/sm) changed across the restart?  Did sm-notify run
>> when you restarted statd?  If so, why didn't the sm-notify pid file  
>> stop
>> it?
>
>  The statd and lockd's state at server when nfslock restart:
>
>        lockd                   statd         |
>                                              |
>      host(monitored = 1)      /sm/client     |  client get locks success at first
>          (locks)                             |
>                                              |
>      host(monitored = 1)      /sm/client     |  nfslock stop (lockd release client's locks)
>          (no locks)                          |
>                                              |
>      host(monitored = 1)      /sm/           |  nfslock start (client reclaim locks)
>          (locks)                             |                (but statd don't monitor it)
>
>  note: host(monitored=1)  means: client's host struct is created, and is marked be monitored.
>        (locks), (no locks)means: host strcut holds locks, or not.
>        /sm/client         means: there have a file under /var/lib/nfs/statd/sm directory
>        /sm/               means: /var/lib/nfs/statd/sm is empty!
>
>
> thanks,
> Mi Jinlong
>

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com






* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
       [not found]       ` <20091216084902.64f722ad-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
@ 2009-12-17  9:34         ` Mi Jinlong
  0 siblings, 0 replies; 18+ messages in thread
From: Mi Jinlong @ 2009-12-17  9:34 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever, Trond.Myklebust, J. Bruce Fields, NFSv3 list



Jeff Layton :
> On Wed, 16 Dec 2009 18:27:09 +0800
> Mi Jinlong <mijinlong@cn.fujitsu.com> wrote:
> 
>>   The statd and lockd's state at server when nfslock restart:
>>
>>         lockd                   statd         |
>>                                               |
>>       host(monitored = 1)      /sm/client     |  client get locks success at first
>>           (locks)                             |
>>                                               |
>>       host(monitored = 1)      /sm/client     |  nfslock stop (lockd release client's locks)
>>           (no locks)                          |
>>                                               |  
>>       host(monitored = 1)      /sm/           |  nfslock start (client reclaim locks)
>>           (locks)                             |                (but statd don't monitor it)
>>
>>   note: host(monitored=1)  means: client's host struct is created, and is marked be monitored.
>>         (locks), (no locks)means: host strcut holds locks, or not.
>>         /sm/client         means: there have a file under /var/lib/nfs/statd/sm directory
>>         /sm/               means: /var/lib/nfs/statd/sm is empty!
>>
>>
> 
> Perhaps we ought to clear the cached list of monitored hosts (i.e. set
> them all to monitored = 0) when lockd gets a SIGKILL.

  Yes, if lockd is going to reuse the host struct, then clearing the cached
  list of monitored hosts when lockd gets a SIGKILL is a good approach.

thanks,
Mi Jinlong



* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-16 19:33     ` Chuck Lever
@ 2009-12-17 10:07       ` Mi Jinlong
  2009-12-17 16:18         ` Chuck Lever
  0 siblings, 1 reply; 18+ messages in thread
From: Mi Jinlong @ 2009-12-17 10:07 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond.Myklebust, J. Bruce Fields, NFSv3 list



Chuck Lever :
> On Dec 16, 2009, at 5:27 AM, Mi Jinlong wrote:
>> Chuck Lever:
>>> On Dec 15, 2009, at 5:02 AM, Mi Jinlong wrote:
>>>> Hi,

...snip...

>>>>
>>>> The Primary Reason:
>>>>
>>>> At step3, when client's reclaimed lock request is sent to server,
>>>> client's host(the host struct) is reused but not be re-monitored at
>>>> server's lockd. After that, statd and lockd are not sync.
>>>
>>> The kernel squashes SM_MON upcalls for hosts that it already believes
>>> are monitored.  This is a scalability feature.
>>
>>  When statd start, it will move files from /var/lib/nfs/statd/sm/ to
>>  /var/lib/nfs/statd/sm.bak/.
> 
> Well, it's really sm-notify that does this.  sm-notify is run by
> rpc.statd when it starts up.
> 
> However, sm-notify should only retire the monitor list the first time it
> is run after a reboot.  Simply restarting statd should not change the
> on-disk monitor list in the slightest.  If it does, there's some kind of
> problem with the way sm-notify's pid file is managed, or perhaps with
> the nfslock script.

  When starting, statd calls the run_sm_notify() function to run sm-notify.
  The command "service nfslock restart" stops and starts statd,
  so sm-notify is run. When sm-notify runs, the on-disk monitor list
  is changed.

> 
>> If lockd don't send a SM_MON to statd,
>>  statd will not monitor those client which be monitored before statd
>> restart.
>>
>>>> Question:
>>>>
>>>> In my opinion, if lockd is allowed reuseing the client's host, it
>>>> should
>>>> send a SM_MON to statd when reuse. If not allowed, the client's host
>>>> should
>>>> be destroyed immediately.
>>>>
>>>> What should lockd to do?  Reuse ? Destroy ? Or some other action?
>>>
>>> I don't immediately see why lockd should change it's behavior.  Perhaps
>>> statd/sm-notify were incorrect to delete the monitor list when you
>>> restarted the nfslock service?
>>
>>  Sorry, maybe i did not express clearly.
>>  I mean, lockd reuse the host struct which was created before statd
>> restart.
>>
>>  It seems have deleted the monitor list when nfslock restart.
> 
> lockd does not touch any user space files; the on-disk monitor list is
> managed by statd and sm-notify.  A remote peer rebooting does not clear
> the "monitored" flag for that peer in the local kernel's lockd, so it
> won't send another SM_MON request.

  Yes, that's right.

  But this case concerns the server's lockd, not the remote peer.
  I think that when the local system's nfslock restarts, the local kernel's
  lockd should clear the "monitored" flag on every client's host struct.

> 
> Now, it may be the case that "service nfslock start" uses a command line
> option that forces a fresh sm-notify run, and that is what is wiping the
> on-disk monitor list.  That would be the bug in this case -- sm-notify
> can and should be allowed to make its own determination of whether the
> monitor list gets retired.  Notification should not normally be forced
> by command line options in the nfslock script.

  A fresh sm-notify run is caused by statd starting.
  I found this in the code, shown below.

 utils/statd/statd.c
 ...
 478         if (! (run_mode & MODE_NO_NOTIFY))
 479                 switch (pid = fork()) {
 480                 case 0:
 481                         run_sm_notify(out_port);
 482                         break;
 483                 case -1:
 484                         break;
 485                 default:
 486                         waitpid(pid, NULL, 0);
 487                 }
 ....


 I think that when statd restarts and runs sm-notify, the on-disk monitor list
 is deleted, so lockd should clear the "monitored" flag on every client's host
 struct. After that, a reused host struct will be re-monitored and its on-disk
 monitor record will be re-created. That way, lockd and statd stay in sync.


thanks,
Mi Jinlong



* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-17 10:07       ` Mi Jinlong
@ 2009-12-17 16:18         ` Chuck Lever
  2009-12-17 20:14           ` J. Bruce Fields
                             ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Chuck Lever @ 2009-12-17 16:18 UTC (permalink / raw)
  To: Trond Myklebust, J. Bruce Fields, Neil Brown,
	Steve Dickson
  Cc: NFSv3 list, Mi Jinlong

On Dec 17, 2009, at 5:07 AM, Mi Jinlong wrote:
> Chuck Lever :
>> On Dec 16, 2009, at 5:27 AM, Mi Jinlong wrote:
>>> Chuck Lever:
>>>> On Dec 15, 2009, at 5:02 AM, Mi Jinlong wrote:
>>>>> Hi,
>
> ...snip...
>
>>>>>
>>>>> The Primary Reason:
>>>>>
>>>>> At step3, when client's reclaimed lock request is sent to server,
>>>>> client's host(the host struct) is reused but not be re-monitored  
>>>>> at
>>>>> server's lockd. After that, statd and lockd are not sync.
>>>>
>>>> The kernel squashes SM_MON upcalls for hosts that it already  
>>>> believes
>>>> are monitored.  This is a scalability feature.
>>>
>>> When statd start, it will move files from /var/lib/nfs/statd/sm/ to
>>> /var/lib/nfs/statd/sm.bak/.
>>
>> Well, it's really sm-notify that does this.  sm-notify is run by
>> rpc.statd when it starts up.
>>
>> However, sm-notify should only retire the monitor list the first  
>> time it
>> is run after a reboot.  Simply restarting statd should not change the
>> on-disk monitor list in the slightest.  If it does, there's some  
>> kind of
>> problem with the way sm-notify's pid file is managed, or perhaps with
>> the nfslock script.
>
>  When starting, statd will call run_sm_notify() function to run sm- 
> notify.
>  Using command "service nfslock restart" will case statd stop and  
> start,
>  so sm-notify will be run. If sm-notify run, the on-disk monitor list
>  will be changed.
>
>>
>>> If lockd don't send a SM_MON to statd,
>>> statd will not monitor those client which be monitored before statd
>>> restart.
>>>
>>>>> Question:
>>>>>
>>>>> In my opinion, if lockd is allowed reuseing the client's host, it
>>>>> should
>>>>> send a SM_MON to statd when reuse. If not allowed, the client's  
>>>>> host
>>>>> should
>>>>> be destroyed immediately.
>>>>>
>>>>> What should lockd to do?  Reuse ? Destroy ? Or some other action?
>>>>
>>>> I don't immediately see why lockd should change it's behavior.   
>>>> Perhaps
>>>> statd/sm-notify were incorrect to delete the monitor list when you
>>>> restarted the nfslock service?
>>>
>>> Sorry, maybe i did not express clearly.
>>> I mean, lockd reuse the host struct which was created before statd
>>> restart.
>>>
>>> It seems have deleted the monitor list when nfslock restart.
>>
>> lockd does not touch any user space files; the on-disk monitor list  
>> is
>> managed by statd and sm-notify.  A remote peer rebooting does not  
>> clear
>> the "monitored" flag for that peer in the local kernel's lockd, so it
>> won't send another SM_MON request.
>
>  Yes, that's right.
>
>  But, this case refers to server's lockd, not the remote peer.
>  I thank, when local system's nfslock restart, local kernel's lockd
>  clear all other client's host strcut's "monitored" flag.
>
>>
>> Now, it may be the case that "service nfslock start" uses a command  
>> line
>> option that forces a fresh sm-notify run, and that is what is  
>> wiping the
>> on-disk monitor list.  That would be the bug in this case -- sm- 
>> notify
>> can and should be allowed to make its own determination of whether  
>> the
>> monitor list gets retired.  Notification should not normally be  
>> forced
>> by command line options in the nfslock script.
>
>  A fresh sm-notify run is cause by statd start.
>  I find it through codes by followed.
>
> utils/statd/statd.c
> ...
> 478         if (! (run_mode & MODE_NO_NOTIFY))
> 479                 switch (pid = fork()) {
> 480                 case 0:
> 481                         run_sm_notify(out_port);
> 482                         break;
> 483                 case -1:
> 484                         break;
> 485                 default:
> 486                         waitpid(pid, NULL, 0);
> 487                 }
> ....
>
>
> I thank, when statd restart and call sm-notify, the on-disk monitor  
> list will
> be deleted, so lockd should clear all other client's host strcut's  
> "monitored" flag.
> After that, a reused host struct will be re-monitored, a on-disk  
> monitor
> will be re-created. Like that, lockd and statd will sync .

run_sm_notify() simply forks and execs the sm-notify program.  This  
program checks for the existence of a pid file.  If the pid file  
exists, then sm-notify exits.  If it does not, then sm-notify retires  
the records in /var/lib/nfs/statd/sm and posts reboot notifications.

Jeff Layton pointed out to me yesterday that Red Hat's nfslock script  
unconditionally deletes sm-notify's pid file every time "service  
nfslock start" is done, which effectively defeats sm-notify's reboot  
detection.

sm-notify was written by a developer at SuSE.  SuSE Linux uses a tmpfs  
for /var/run, but Red Hat uses permanent storage for this directory.   
Thus on SuSE, the pid file gets deleted automatically by a reboot, but  
on Red Hat, the pid file must be deleted "by hand" or reboot  
notification never occurs.

So the root cause of this problem is that the current mechanism sm- 
notify uses to detect a reboot is not portable across distributions.

My new-statd prototype used a semaphor instead of a pid file to detect  
reboots.  A semaphor is shared (visible to other processes) and will  
continue to exist until it is deleted or the system reboots.  It is a  
resource that is not destroyed automatically when the sm-notify  
process exits.  If creating the semaphor fails, sm-notify exits.  If  
creating it succeeds, it runs.

Would anyone strongly object to using a semaphor instead of a pid file  
here?  Is support for semaphors always built into kernels?  Would  
there be any problems with the small size of the semaphor name space?   
Is there another similar facility that might be better?

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-17 16:18         ` Chuck Lever
@ 2009-12-17 20:14           ` J. Bruce Fields
  2009-12-17 20:35             ` Chuck Lever
  2009-12-17 20:27           ` Trond Myklebust
  2009-12-17 23:14           ` Neil Brown
  2 siblings, 1 reply; 18+ messages in thread
From: J. Bruce Fields @ 2009-12-17 20:14 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Trond Myklebust, Neil Brown, Steve Dickson, NFSv3 list,
	Mi Jinlong

On Thu, Dec 17, 2009 at 11:18:53AM -0500, Chuck Lever wrote:
> run_sm_notify() simply forks and execs the sm-notify program.  This
> program checks for the existence of a pid file.  If the pid file exists,
> then sm-notify exits.  If it does not, then sm-notify retires the records
> in /var/lib/nfs/statd/sm and posts reboot notifications.
>
> Jeff Layton pointed out to me yesterday that Red Hat's nfslock script
> unconditionally deletes sm-notify's pid file every time "service nfslock
> start" is done, which effectively defeats sm-notify's reboot detection.
>
> sm-notify was written by a developer at SuSE.  SuSE Linux uses a tmpfs
> for /var/run, but Red Hat uses permanent storage for this directory.
> Thus on SuSE, the pid file gets deleted automatically by a reboot, but
> on Red Hat, the pid file must be deleted "by hand" or reboot
> notification never occurs.
>
> So the root cause of this problem is that the current mechanism sm-
> notify uses to detect a reboot is not portable across distributions.
>
> My new-statd prototype used a semaphor instead of a pid file to detect
> reboots.  A semaphor is shared (visible to other processes) and will
> continue to exist until it is deleted or the system reboots.  It is a
> resource that is not destroyed automatically when the sm-notify process
> exits.  If creating the semaphor fails, sm-notify exits.  If creating
> it succeeds, it runs.
>
> Would anyone strongly object to using a semaphor instead of a pid file
> here?  Is support for semaphors always built into kernels?  Would there
> be any problems with the small size of the semaphor name space?  Is
> there another similar facility that might be better?

I don't know much about those (except that I think there's an e at the
end); looks like sem_overview(7) is the place to start?

It says:

	" Prior to kernel 2.6, Linux only supported unnamed,
	thread-shared semaphores.  On a system with Linux 2.6 and a
	glibc that provides the NPTL threading implementation, a
	complete implementation of POSIX semaphores is provided."

So would it mean dropping support for 2.4?

--b.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-17 16:18         ` Chuck Lever
  2009-12-17 20:14           ` J. Bruce Fields
@ 2009-12-17 20:27           ` Trond Myklebust
  2009-12-17 20:34             ` Chuck Lever
  2009-12-17 23:14           ` Neil Brown
  2 siblings, 1 reply; 18+ messages in thread
From: Trond Myklebust @ 2009-12-17 20:27 UTC (permalink / raw)
  To: Chuck Lever
  Cc: J. Bruce Fields, Neil Brown, Steve Dickson, NFSv3 list,
	Mi Jinlong

On Thu, 2009-12-17 at 11:18 -0500, Chuck Lever wrote: 
> On Dec 17, 2009, at 5:07 AM, Mi Jinlong wrote:
> > Chuck Lever :
> >> On Dec 16, 2009, at 5:27 AM, Mi Jinlong wrote:
> >>> Chuck Lever:
> >>>> On Dec 15, 2009, at 5:02 AM, Mi Jinlong wrote:
> >>>>> Hi,
> >
> > ...snip...
> >
> >>>>>
> >>>>> The Primary Reason:
> >>>>>
> >>>>> At step3, when the client's reclaimed lock request is sent to the
> >>>>> server, the client's host (the host struct) is reused but is not
> >>>>> re-monitored at the server's lockd.  After that, statd and lockd
> >>>>> are out of sync.
> >>>>
> >>>> The kernel squashes SM_MON upcalls for hosts that it already  
> >>>> believes
> >>>> are monitored.  This is a scalability feature.
> >>>
> >>> When statd starts, it will move files from /var/lib/nfs/statd/sm/ to
> >>> /var/lib/nfs/statd/sm.bak/.
> >>
> >> Well, it's really sm-notify that does this.  sm-notify is run by
> >> rpc.statd when it starts up.
> >>
> >> However, sm-notify should only retire the monitor list the first  
> >> time it
> >> is run after a reboot.  Simply restarting statd should not change the
> >> on-disk monitor list in the slightest.  If it does, there's some  
> >> kind of
> >> problem with the way sm-notify's pid file is managed, or perhaps with
> >> the nfslock script.
> >
> >  When starting, statd will call the run_sm_notify() function to run
> > sm-notify.
> >  Using the command "service nfslock restart" will cause statd to stop
> >  and start, so sm-notify will be run.  If sm-notify runs, the on-disk
> >  monitor list will be changed.
> >
> >>
> >>> If lockd doesn't send an SM_MON to statd,
> >>> statd will not monitor those clients which were monitored before
> >>> statd restarted.
> >>>
> >>>>> Question:
> >>>>>
> >>>>> In my opinion, if lockd is allowed to reuse the client's host, it
> >>>>> should send an SM_MON to statd on reuse.  If not allowed, the
> >>>>> client's host should be destroyed immediately.
> >>>>>
> >>>>> What should lockd do?  Reuse?  Destroy?  Or some other action?
> >>>>
> >>>> I don't immediately see why lockd should change its behavior.   
> >>>> Perhaps
> >>>> statd/sm-notify were incorrect to delete the monitor list when you
> >>>> restarted the nfslock service?
> >>>
> >>> Sorry, maybe I did not express myself clearly.
> >>> I mean that lockd reuses the host struct which was created before
> >>> statd restarted.
> >>>
> >>> The monitor list seems to have been deleted when nfslock restarted.
> >>
> >> lockd does not touch any user space files; the on-disk monitor list  
> >> is
> >> managed by statd and sm-notify.  A remote peer rebooting does not  
> >> clear
> >> the "monitored" flag for that peer in the local kernel's lockd, so it
> >> won't send another SM_MON request.
> >
> >  Yes, that's right.
> >
> >  But this case refers to the server's lockd, not the remote peer.
> >  I think that when the local system's nfslock service restarts, the
> >  local kernel's lockd should clear the "monitored" flag in every other
> >  client's host struct.
> >
> >>
> >> Now, it may be the case that "service nfslock start" uses a command  
> >> line
> >> option that forces a fresh sm-notify run, and that is what is  
> >> wiping the
> >> on-disk monitor list.  That would be the bug in this case -- sm- 
> >> notify
> >> can and should be allowed to make its own determination of whether  
> >> the
> >> monitor list gets retired.  Notification should not normally be  
> >> forced
> >> by command line options in the nfslock script.
> >
> >  A fresh sm-notify run is caused by statd starting.
> >  I found this in the code, as follows.
> >
> > utils/statd/statd.c
> > ...
> > 478         if (! (run_mode & MODE_NO_NOTIFY))
> > 479                 switch (pid = fork()) {
> > 480                 case 0:
> > 481                         run_sm_notify(out_port);
> > 482                         break;
> > 483                 case -1:
> > 484                         break;
> > 485                 default:
> > 486                         waitpid(pid, NULL, 0);
> > 487                 }
> > ....
> >
> >
> > I think that when statd restarts and calls sm-notify, the on-disk
> > monitor list will be deleted, so lockd should clear the "monitored"
> > flag in every other client's host struct.  After that, a reused host
> > struct will be re-monitored and an on-disk monitor record will be
> > re-created.  That way, lockd and statd will stay in sync.
> 
> run_sm_notify() simply forks and execs the sm-notify program.  This  
> program checks for the existence of a pid file.  If the pid file  
> exists, then sm-notify exits.  If it does not, then sm-notify retires  
> the records in /var/lib/nfs/statd/sm and posts reboot notifications.
> 
> Jeff Layton pointed out to me yesterday that Red Hat's nfslock script  
> unconditionally deletes sm-notify's pid file every time "service  
> nfslock start" is done, which effectively defeats sm-notify's reboot  
> detection.
> 
> sm-notify was written by a developer at SuSE.  SuSE Linux uses a tmpfs  
> for /var/run, but Red Hat uses permanent storage for this directory.   
> Thus on SuSE, the pid file gets deleted automatically by a reboot, but  
> on Red Hat, the pid file must be deleted "by hand" or reboot  
> notification never occurs.
> 
> So the root cause of this problem is that the current mechanism sm- 
> notify uses to detect a reboot is not portable across distributions.
> 
> My new-statd prototype used a semaphor instead of a pid file to detect  
> reboots.  A semaphor is shared (visible to other processes) and will  
> continue to exist until it is deleted or the system reboots.  It is a  
> resource that is not destroyed automatically when the sm-notify  
> process exits.  If creating the semaphor fails, sm-notify exits.  If  
> creating it succeeds, it runs.
> 
> Would anyone strongly object to using a semaphor instead of a pid file  
> here?  Is support for semaphors always built into kernels?  Would  
> there be any problems with the small size of the semaphor name space?   
> Is there another similar facility that might be better?
> 

One alternative might be to just record the kernel's random boot_id in
the pid file. That gets regenerated on each boot, so should be unique.

Trond


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-17 20:27           ` Trond Myklebust
@ 2009-12-17 20:34             ` Chuck Lever
  2009-12-17 20:48               ` Trond Myklebust
  0 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2009-12-17 20:34 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: J. Bruce Fields, Neil Brown, Steve Dickson, NFSv3 list,
	Mi Jinlong


On Dec 17, 2009, at 3:27 PM, Trond Myklebust wrote:

> On Thu, 2009-12-17 at 11:18 -0500, Chuck Lever wrote:
>> On Dec 17, 2009, at 5:07 AM, Mi Jinlong wrote:
>>> Chuck Lever :
>>>> On Dec 16, 2009, at 5:27 AM, Mi Jinlong wrote:
>>>>> Chuck Lever:
>>>>>> On Dec 15, 2009, at 5:02 AM, Mi Jinlong wrote:
>>>>>>> Hi,
>>>
>>> ...snip...
>>>
>>>>>>>
>>>>>>> The Primary Reason:
>>>>>>>
>>>>>>> At step3, when the client's reclaimed lock request is sent to the
>>>>>>> server, the client's host (the host struct) is reused but is not
>>>>>>> re-monitored at the server's lockd.  After that, statd and lockd
>>>>>>> are out of sync.
>>>>>>
>>>>>> The kernel squashes SM_MON upcalls for hosts that it already
>>>>>> believes
>>>>>> are monitored.  This is a scalability feature.
>>>>>
>>>>> When statd starts, it will move files from /var/lib/nfs/statd/sm/
>>>>> to /var/lib/nfs/statd/sm.bak/.
>>>>
>>>> Well, it's really sm-notify that does this.  sm-notify is run by
>>>> rpc.statd when it starts up.
>>>>
>>>> However, sm-notify should only retire the monitor list the first
>>>> time it
>>>> is run after a reboot.  Simply restarting statd should not change  
>>>> the
>>>> on-disk monitor list in the slightest.  If it does, there's some
>>>> kind of
>>>> problem with the way sm-notify's pid file is managed, or perhaps  
>>>> with
>>>> the nfslock script.
>>>
>>> When starting, statd will call the run_sm_notify() function to run
>>> sm-notify.
>>> Using the command "service nfslock restart" will cause statd to stop
>>> and start, so sm-notify will be run.  If sm-notify runs, the on-disk
>>> monitor list will be changed.
>>>
>>>>
>>>>> If lockd doesn't send an SM_MON to statd,
>>>>> statd will not monitor those clients which were monitored before
>>>>> statd restarted.
>>>>>
>>>>>>> Question:
>>>>>>>
>>>>>>> In my opinion, if lockd is allowed to reuse the client's host, it
>>>>>>> should send an SM_MON to statd on reuse.  If not allowed, the
>>>>>>> client's host should be destroyed immediately.
>>>>>>>
>>>>>>> What should lockd do?  Reuse?  Destroy?  Or some other action?
>>>>>>
>>>>>> I don't immediately see why lockd should change its behavior.
>>>>>> Perhaps
>>>>>> statd/sm-notify were incorrect to delete the monitor list when  
>>>>>> you
>>>>>> restarted the nfslock service?
>>>>>
>>>>> Sorry, maybe I did not express myself clearly.
>>>>> I mean that lockd reuses the host struct which was created before
>>>>> statd restarted.
>>>>>
>>>>> The monitor list seems to have been deleted when nfslock restarted.
>>>>
>>>> lockd does not touch any user space files; the on-disk monitor list
>>>> is
>>>> managed by statd and sm-notify.  A remote peer rebooting does not
>>>> clear
>>>> the "monitored" flag for that peer in the local kernel's lockd,  
>>>> so it
>>>> won't send another SM_MON request.
>>>
>>> Yes, that's right.
>>>
>>> But this case refers to the server's lockd, not the remote peer.
>>> I think that when the local system's nfslock service restarts, the
>>> local kernel's lockd should clear the "monitored" flag in every other
>>> client's host struct.
>>>
>>>>
>>>> Now, it may be the case that "service nfslock start" uses a command
>>>> line
>>>> option that forces a fresh sm-notify run, and that is what is
>>>> wiping the
>>>> on-disk monitor list.  That would be the bug in this case -- sm-
>>>> notify
>>>> can and should be allowed to make its own determination of whether
>>>> the
>>>> monitor list gets retired.  Notification should not normally be
>>>> forced
>>>> by command line options in the nfslock script.
>>>
>>> A fresh sm-notify run is caused by statd starting.
>>> I found this in the code, as follows.
>>>
>>> utils/statd/statd.c
>>> ...
>>> 478         if (! (run_mode & MODE_NO_NOTIFY))
>>> 479                 switch (pid = fork()) {
>>> 480                 case 0:
>>> 481                         run_sm_notify(out_port);
>>> 482                         break;
>>> 483                 case -1:
>>> 484                         break;
>>> 485                 default:
>>> 486                         waitpid(pid, NULL, 0);
>>> 487                 }
>>> ....
>>>
>>>
>>> I think that when statd restarts and calls sm-notify, the on-disk
>>> monitor list will be deleted, so lockd should clear the "monitored"
>>> flag in every other client's host struct.  After that, a reused host
>>> struct will be re-monitored and an on-disk monitor record will be
>>> re-created.  That way, lockd and statd will stay in sync.
>>
>> run_sm_notify() simply forks and execs the sm-notify program.  This
>> program checks for the existence of a pid file.  If the pid file
>> exists, then sm-notify exits.  If it does not, then sm-notify retires
>> the records in /var/lib/nfs/statd/sm and posts reboot notifications.
>>
>> Jeff Layton pointed out to me yesterday that Red Hat's nfslock script
>> unconditionally deletes sm-notify's pid file every time "service
>> nfslock start" is done, which effectively defeats sm-notify's reboot
>> detection.
>>
>> sm-notify was written by a developer at SuSE.  SuSE Linux uses a  
>> tmpfs
>> for /var/run, but Red Hat uses permanent storage for this directory.
>> Thus on SuSE, the pid file gets deleted automatically by a reboot,  
>> but
>> on Red Hat, the pid file must be deleted "by hand" or reboot
>> notification never occurs.
>>
>> So the root cause of this problem is that the current mechanism sm-
>> notify uses to detect a reboot is not portable across distributions.
>>
>> My new-statd prototype used a semaphor instead of a pid file to  
>> detect
>> reboots.  A semaphor is shared (visible to other processes) and will
>> continue to exist until it is deleted or the system reboots.  It is a
>> resource that is not destroyed automatically when the sm-notify
>> process exits.  If creating the semaphor fails, sm-notify exits.  If
>> creating it succeeds, it runs.
>>
>> Would anyone strongly object to using a semaphor instead of a pid  
>> file
>> here?  Is support for semaphors always built into kernels?  Would
>> there be any problems with the small size of the semaphor name space?
>> Is there another similar facility that might be better?
>
> One alternative might be to just record the kernel's random boot_id in
> the pid file. That gets regenerated on each boot, so should be unique.

Where do you get it in user space?  Is it available on earlier  
kernels?  ("should be unique" -- I hope it doesn't have the same  
problem we had with XID replay on diskless systems).

Fwiw, I tried using the boot time stamp at one point, but  
unfortunately that's adjusted by the ntp offset, so it can take  
different values over time.  It was difficult to compare it to a time  
stamp recorded in a file.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-17 20:14           ` J. Bruce Fields
@ 2009-12-17 20:35             ` Chuck Lever
  0 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2009-12-17 20:35 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Trond Myklebust, Neil Brown, Steve Dickson, NFSv3 list,
	Mi Jinlong

On Dec 17, 2009, at 3:14 PM, J. Bruce Fields wrote:
> On Thu, Dec 17, 2009 at 11:18:53AM -0500, Chuck Lever wrote:
>> run_sm_notify() simply forks and execs the sm-notify program.  This
>> program checks for the existence of a pid file.  If the pid file exists,
>> then sm-notify exits.  If it does not, then sm-notify retires the
>> records in /var/lib/nfs/statd/sm and posts reboot notifications.
>>
>> Jeff Layton pointed out to me yesterday that Red Hat's nfslock script
>> unconditionally deletes sm-notify's pid file every time "service nfslock
>> start" is done, which effectively defeats sm-notify's reboot detection.
>>
>> sm-notify was written by a developer at SuSE.  SuSE Linux uses a tmpfs
>> for /var/run, but Red Hat uses permanent storage for this directory.
>> Thus on SuSE, the pid file gets deleted automatically by a reboot, but
>> on Red Hat, the pid file must be deleted "by hand" or reboot
>> notification never occurs.
>>
>> So the root cause of this problem is that the current mechanism sm-
>> notify uses to detect a reboot is not portable across distributions.
>>
>> My new-statd prototype used a semaphor instead of a pid file to detect
>> reboots.  A semaphor is shared (visible to other processes) and will
>> continue to exist until it is deleted or the system reboots.  It is a
>> resource that is not destroyed automatically when the sm-notify process
>> exits.  If creating the semaphor fails, sm-notify exits.  If creating
>> it succeeds, it runs.
>>
>> Would anyone strongly object to using a semaphor instead of a pid file
>> here?  Is support for semaphors always built into kernels?  Would there
>> be any problems with the small size of the semaphor name space?  Is
>> there another similar facility that might be better?
>
> I don't know much about those (except that I think there's an e at the
> end); looks like sem_overview(7) is the place to start?
>
> It says:
>
> 	" Prior to kernel 2.6, Linux only supported unnamed,
> 	thread-shared semaphores.  On a system with Linux 2.6 and a
> 	glibc that provides the NPTL threading implementation, a
> 	complete implementation of POSIX semaphores is provided."
>
> So would it mean dropping support for 2.4?

No, it would mean using them only on systems that supported shared
semaphores.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-17 20:34             ` Chuck Lever
@ 2009-12-17 20:48               ` Trond Myklebust
  0 siblings, 0 replies; 18+ messages in thread
From: Trond Myklebust @ 2009-12-17 20:48 UTC (permalink / raw)
  To: Chuck Lever
  Cc: J. Bruce Fields, Neil Brown, Steve Dickson, NFSv3 list,
	Mi Jinlong

On Thu, 2009-12-17 at 15:34 -0500, Chuck Lever wrote: 
> On Dec 17, 2009, at 3:27 PM, Trond Myklebust wrote:
> > One alternative might be to just record the kernel's random boot_id in
> > the pid file. That gets regenerated on each boot, so should be unique.
> 
> Where do you get it in user space?  Is it available on earlier  
> kernels?  ("should be unique" -- I hope it doesn't have the same  
> problem we had with XID replay on diskless systems).

You can access it from userland as the 'kernel.random.boot_id' sysctl.

It is available on 2.4 kernels and newer.

It is based on the kernel random number generator, so should be
reasonably unique.

> Fwiw, I tried using the boot time stamp at one point, but  
> unfortunately that's adjusted by the ntp offset, so it can take  
> different values over time.  It was difficult to compare it to a time  
> stamp recorded in a file.

Agreed. You can't rely on time stamps.

Trond


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-17 16:18         ` Chuck Lever
  2009-12-17 20:14           ` J. Bruce Fields
  2009-12-17 20:27           ` Trond Myklebust
@ 2009-12-17 23:14           ` Neil Brown
       [not found]             ` <20091218101438.48eb06a4-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
  2 siblings, 1 reply; 18+ messages in thread
From: Neil Brown @ 2009-12-17 23:14 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Trond Myklebust, J. Bruce Fields, Steve Dickson,
	NFSv3 list, Mi Jinlong

On Thu, 17 Dec 2009 11:18:53 -0500
Chuck Lever <chuck.lever@oracle.com> wrote:

> Jeff Layton pointed out to me yesterday that Red Hat's nfslock script  
> unconditionally deletes sm-notify's pid file every time "service  
> nfslock start" is done, which effectively defeats sm-notify's reboot  
> detection.
> 
> sm-notify was written by a developer at SuSE.  SuSE Linux uses a tmpfs  
> for /var/run, but Red Hat uses permanent storage for this directory.   
> Thus on SuSE, the pid file gets deleted automatically by a reboot, but  
> on Red Hat, the pid file must be deleted "by hand" or reboot  
> notification never occurs.

Just to make sure the facts are straight:
 SuSE does not use tmpfs for /var/run (much as I personally think that
 would be a very sensible approach for both /var/run and /var/locks).
 It appears that Debian can use tmpfs for these, but doesn't by default.

 Both SuSE and Debian have boot time scripts that clean up /var/run and other
 directories.  They remove all non-directories other than /var/run/utmp.

If Redhat doesn't clean up /var/run at boot time, then I would think that is
very odd.  The files in there represent something that is running.  At boot,
nothing is running, so it should all be cleaned up.  Are you sure Redhat
doesn't clean out /var/run???

I just had a look at master.kernel.org (the only fedora machine I can think
of that I have access to) and in /etc/rc.d/rc.sysinit I find

    find /var/lock /var/run ! -type d -exec rm -f {} \;

So I'm thinking that if you just remove

        # Make sure locks are recovered
        rm -f /var/run/sm-notify.pid

from /etc/init.d/nfslock, then it will do the right thing.

NeilBrown


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
       [not found]             ` <20091218101438.48eb06a4-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
@ 2009-12-18 15:18               ` Chuck Lever
  2009-12-19 16:42                 ` Steve Dickson
  0 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2009-12-18 15:18 UTC (permalink / raw)
  To: Neil Brown, Steve Dickson
  Cc: Trond Myklebust, J. Bruce Fields, NFSv3 list,
	Mi Jinlong


On Dec 17, 2009, at 6:14 PM, Neil Brown wrote:

> On Thu, 17 Dec 2009 11:18:53 -0500
> Chuck Lever <chuck.lever@oracle.com> wrote:
>
>> Jeff Layton pointed out to me yesterday that Red Hat's nfslock script
>> unconditionally deletes sm-notify's pid file every time "service
>> nfslock start" is done, which effectively defeats sm-notify's reboot
>> detection.
>>
>> sm-notify was written by a developer at SuSE.  SuSE Linux uses a  
>> tmpfs
>> for /var/run, but Red Hat uses permanent storage for this directory.
>> Thus on SuSE, the pid file gets deleted automatically by a reboot,  
>> but
>> on Red Hat, the pid file must be deleted "by hand" or reboot
>> notification never occurs.
>
> Just to make sure the facts are straight:
> SuSE does not use tmpfs for /var/run (much as I personally think that
> would be a very sensible approach for both /var/run and /var/locks).
> It appears that Debian can use tmpfs for these, but doesn't by  
> default.
>
> Both SuSE and Debian have boot time scripts that clean up /var/run  
> and other
> directories.  They remove all non-directories other than /var/run/ 
> utmp.
>
> If Redhat doesn't clean up /var/run at boot time, then I would think  
> that is
> very odd.  The files in there represent something that is running.   
> At boot,
> nothing is running, so it should all be cleaned up.  Are you sure  
> Redhat
> doesn't clean out /var/run???
>
> I just had a look at master.kernel.org (the only fedora machine I  
> can think
> of that I have access to) and in /etc/rc.d/rc.sysinit I find
>
>    find /var/lock /var/run ! -type d -exec rm -f {} \;
>
> So I'm thinking that if you just remove
>
>        # Make sure locks are recovered
>        rm -f /var/run/sm-notify.pid
>
> from /etc/init.d/nfslock, then it will do the right thing.

Makes sense.  Steve, can you look into this for supported releases  
(like F12 and RHEL5)?  Or, perhaps you can clarify why that "rm" is  
required.

Meanwhile, I'm going to prototype a mechanism that tries to use the  
kernel's boot_id, if present.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] server's statd and lockd will not sync after its nfslock restart
  2009-12-18 15:18               ` Chuck Lever
@ 2009-12-19 16:42                 ` Steve Dickson
  0 siblings, 0 replies; 18+ messages in thread
From: Steve Dickson @ 2009-12-19 16:42 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Neil Brown, Trond Myklebust, J. Bruce Fields,
	NFSv3 list, Mi Jinlong



On 12/18/2009 10:18 AM, Chuck Lever wrote:
> 
> On Dec 17, 2009, at 6:14 PM, Neil Brown wrote:
> 
>> On Thu, 17 Dec 2009 11:18:53 -0500
>> Chuck Lever <chuck.lever@oracle.com> wrote:
>>
>>> Jeff Layton pointed out to me yesterday that Red Hat's nfslock script
>>> unconditionally deletes sm-notify's pid file every time "service
>>> nfslock start" is done, which effectively defeats sm-notify's reboot
>>> detection.
>>>
>>> sm-notify was written by a developer at SuSE.  SuSE Linux uses a tmpfs
>>> for /var/run, but Red Hat uses permanent storage for this directory.
>>> Thus on SuSE, the pid file gets deleted automatically by a reboot, but
>>> on Red Hat, the pid file must be deleted "by hand" or reboot
>>> notification never occurs.
>>
>> Just to make sure the facts are straight:
>> SuSE does not use tmpfs for /var/run (much as I personally think that
>> would be a very sensible approach for both /var/run and /var/locks).
>> It appears that Debian can use tmpfs for these, but doesn't by default.
>>
>> Both SuSE and Debian have boot time scripts that clean up /var/run and
>> other
>> directories.  They remove all non-directories other than /var/run/utmp.
>>
>> If Redhat doesn't clean up /var/run at boot time, then I would think
>> that is
>> very odd.  The files in there represent something that is running.  At
>> boot,
>> nothing is running, so it should all be cleaned up.  Are you sure Redhat
>> doesn't clean out /var/run???
>>
>> I just had a look at master.kernel.org (the only fedora machine I can
>> think
>> of that I have access to) and in /etc/rc.d/rc.sysinit I find
>>
>>    find /var/lock /var/run ! -type d -exec rm -f {} \;
>>
>> So I'm thinking that if you just remove
>>
>>        # Make sure locks are recovered
>>        rm -f /var/run/sm-notify.pid
>>
>> from /etc/init.d/nfslock, then it will do the right thing.
> 
> Makes sense.  Steve, can you look into this for supported releases (like
> F12 and RHEL5)?  Or, perhaps you can clarify why that "rm" is required.
I know that at the time I added that code, the pid file was not being
removed, and explicitly removing it caused sm-notify to *always* run,
which, at the time, seemed like the right thing to do.  The change was
made in early January of '08, so let me take a look to see if things
have changed... 

steved.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2009-12-19 16:42 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-15 10:02 [RFC] server's statd and lockd will not sync after its nfslock restart Mi Jinlong
2009-12-15 12:41 ` J. Bruce Fields
2009-12-16  9:46   ` Mi Jinlong
2009-12-15 15:10 ` Chuck Lever
2009-12-16 10:27   ` Mi Jinlong
2009-12-16 13:49     ` Jeff Layton
     [not found]       ` <20091216084902.64f722ad-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2009-12-17  9:34         ` Mi Jinlong
2009-12-16 19:33     ` Chuck Lever
2009-12-17 10:07       ` Mi Jinlong
2009-12-17 16:18         ` Chuck Lever
2009-12-17 20:14           ` J. Bruce Fields
2009-12-17 20:35             ` Chuck Lever
2009-12-17 20:27           ` Trond Myklebust
2009-12-17 20:34             ` Chuck Lever
2009-12-17 20:48               ` Trond Myklebust
2009-12-17 23:14           ` Neil Brown
     [not found]             ` <20091218101438.48eb06a4-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2009-12-18 15:18               ` Chuck Lever
2009-12-19 16:42                 ` Steve Dickson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).