* server does not abort grace period
From: Ferenc Wagner @ 2011-02-11 12:18 UTC
To: linux-nfs

Hi,

We're running 2.6.32 (Debian squeeze) NFS4 server and clients. The
server boots and runs purely from SAN, so we can start it on different
computers. In case of such "hardware failovers" I'd expect the clients
to quickly reclaim their locks (if any) and thus the server to abort
its 90-second grace period early. However, this does not happen,
ruining our HA like, totally.

So, the questions: is the functionality of aborting the grace period
early missing from version 2.6.32 of the Linux kernel? If yes, is it
present in any kernel version? If it should work, could someone offer
some advice on debugging it? If it isn't supported, what's the
best practice of providing highly available NFSv4 today?
-- 
Thanks,
Feri.

* Re: server does not abort grace period
From: Ferenc Wagner @ 2011-02-21 19:54 UTC
To: linux-nfs

Ferenc Wagner <wferi@niif.hu> writes:

> We're running 2.6.32 (Debian squeeze) NFS4 server and clients. The
> server boots and runs purely from SAN, so we can start it on different
> computers. In case of such "hardware failovers" I'd expect the clients
> to quickly reclaim their locks (if any) and thus the server to abort
> its 90-second grace period early. However, this does not happen,
> ruining our HA like, totally.
>
> So, the questions: is the functionality of aborting the grace period
> early missing from version 2.6.32 of the Linux kernel? If yes, is it
> present in any kernel version? If it should work, could someone offer
> some advice on debugging it? If it isn't supported, what's the
> best practice of providing highly available NFSv4 today?

Hi,

Could somebody please share any related wisdom? Pretty please?
In short, how to fight the grace period in an HA NFS4 setup?
Decreasing it (of course after cutting the lock lease time) seems a
rather big hammer, I'd like to avoid using it if reasonably possible.
-- 
Thanks,
Feri.

* Re: server does not abort grace period
From: J. Bruce Fields @ 2011-02-22 1:11 UTC
To: Ferenc Wagner; +Cc: linux-nfs

On Mon, Feb 21, 2011 at 08:54:24PM +0100, Ferenc Wagner wrote:
> Ferenc Wagner <wferi@niif.hu> writes:
>
> > We're running 2.6.32 (Debian squeeze) NFS4 server and clients. The
> > server boots and runs purely from SAN, so we can start it on different
> > computers. In case of such "hardware failovers" I'd expect the clients
> > to quickly reclaim their locks (if any) and thus the server to abort
> > its 90-second grace period early. However, this does not happen,
> > ruining our HA like, totally.
> >
> > So, the questions: is the functionality of aborting the grace period
> > early missing from version 2.6.32 of the Linux kernel? If yes, is it
> > present in any kernel version? If it should work, could someone offer
> > some advice on debugging it? If it isn't supported, what's the
> > best practice of providing highly available NFSv4 today?
>
> Hi,
>
> Could somebody please share any related wisdom? Pretty please?
> In short, how to fight the grace period in an HA NFS4 setup?
> Decreasing it (of course after cutting the lock lease time) seems a
> rather big hammer, I'd like to avoid using it if reasonably possible.

The NFSv4.0 protocol doesn't provide any way for clients to tell the
server that they have finished recovering; as long as *any* clients
held state on the previous server instance, the new server is stuck
waiting out the whole grace period. Some things we could do:

    - We could at least recognize the case where *no* clients held
      state before, and end the grace period early in that case.

    - In the NFSv4.1 case there is a "reclaim complete" rpc that
      clients are required to send. Currently we don't take
      advantage of that to end the grace period early, but we
      should. That's no help for 4.0 clients.

    - We could record a count of all locks/opens held in stable
      storage and use that to decide when a client is done
      recovering. That would be complicated and risk slowing down
      normal opens and locks a lot.

In short, it's hard.

I don't think decreasing the lease time would be so terrible. Perhaps
the default should even be a little less.

--b.

* Re: server does not abort grace period
From: Ferenc Wagner @ 2011-02-22 17:05 UTC
To: linux-nfs

"J. Bruce Fields" <bfields@fieldses.org> writes:

First of all, thank you very much for the detailed and useful reply!

> On Mon, Feb 21, 2011 at 08:54:24PM +0100, Ferenc Wagner wrote:
>
>> Ferenc Wagner <wferi@niif.hu> writes:
>>
>>> We're running 2.6.32 (Debian squeeze) NFS4 server and clients. The
>>> server boots and runs purely from SAN, so we can start it on different
>>> computers. In case of such "hardware failovers" I'd expect the clients
>>> to quickly reclaim their locks (if any) and thus the server to abort
>>> its 90-second grace period early. However, this does not happen,
>>> ruining our HA like, totally.
>>>
>>> So, the questions: is the functionality of aborting the grace period
>>> early missing from version 2.6.32 of the Linux kernel? If yes, is it
>>> present in any kernel version? If it should work, could someone offer
>>> some advice on debugging it? If it isn't supported, what's the
>>> best practice of providing highly available NFSv4 today?
>>
>> Could somebody please share any related wisdom? Pretty please?
>> In short, how to fight the grace period in an HA NFS4 setup?
>> Decreasing it (of course after cutting the lock lease time) seems a
>> rather big hammer, I'd like to avoid using it if reasonably possible.
>
> The NFSv4.0 protocol doesn't provide any way for clients to tell the
> server that they have finished recovering; as long as *any* clients
> held state on the previous server instance, the new server is stuck
> waiting out the whole grace period. Some things we could do:
>
>     - We could at least recognize the case where *no* clients held
>       state before, and end the grace period early in that case.

Would this mean that /var/lib/nfs/v4recovery is empty on the server?
Actually, it contains a hex-named empty directory, sometimes two (we're
running with two clients at the moment).

>     - In the NFSv4.1 case there is a "reclaim complete" rpc that
>       clients are required to send. Currently we don't take
>       advantage of that to end the grace period early, but we
>       should. That's no help for 4.0 clients.

/proc/fs/nfsd/versions shows +4.1 on the server, does this mean that
nfs4 type Linux client mounts should issue "reclaim complete"? I see
that it won't help anyway at the moment, lacking server support, just
out of interest...

>     - We could record a count of all locks/opens held in stable
>       storage and use that to decide when a client is done
>       recovering. That would be complicated and risk slowing down
>       normal opens and locks a lot.

And the "reclaim complete" client RPC seems much better anyway, as the
server and the client may get out of sync in case of an unclean client
shutdown.

> I don't think decreasing the lease time would be so terrible. Perhaps
> the default should even be a little less.

Fine, then. Does the Linux nfs server implementation use the lease time
of the previous server instance as grace period on startup, or does it
simply take whatever it finds in /proc/fs/nfsd/nfsv4leasetime?
-- 
Thanks for taking time,
Feri.

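The question about /var/lib/nfs/v4recovery can be checked with a few lines of shell. This is only a sketch: it assumes that each subdirectory of the recovery directory stands for one client that held state, which is what the hex-named directories described above suggest. The function name count_recovery_records is made up for illustration.

```shell
# Count client records (subdirectories) in the NFSv4 recovery directory.
# Assumption: one subdirectory per client that held state.
# count_recovery_records is a hypothetical helper name.
count_recovery_records() {
    dir=${1:-/var/lib/nfs/v4recovery}
    n=0
    for entry in "$dir"/*/; do
        # An unmatched glob stays literal, so check that the entry exists.
        if [ -d "$entry" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Calling count_recovery_records without arguments inspects the default path; a result of 0 would correspond to the "no clients held state" case.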
* Re: server does not abort grace period
From: J. Bruce Fields @ 2011-02-23 19:52 UTC
To: Ferenc Wagner; +Cc: linux-nfs

On Tue, Feb 22, 2011 at 06:05:14PM +0100, Ferenc Wagner wrote:
> "J. Bruce Fields" <bfields@fieldses.org> writes:
> > The NFSv4.0 protocol doesn't provide any way for clients to tell the
> > server that they have finished recovering; as long as *any* clients
> > held state on the previous server instance, the new server is stuck
> > waiting out the whole grace period. Some things we could do:
> >
> >     - We could at least recognize the case where *no* clients held
> >       state before, and end the grace period early in that case.
>
> Would this mean that /var/lib/nfs/v4recovery is empty on the server?

Right.

> Actually, it contains a hex-named empty directory, sometimes two (we're
> running with two clients at the moment).
>
> >     - In the NFSv4.1 case there is a "reclaim complete" rpc that
> >       clients are required to send. Currently we don't take
> >       advantage of that to end the grace period early, but we
> >       should. That's no help for 4.0 clients.
>
> /proc/fs/nfsd/versions shows +4.1 on the server, does this mean that
> nfs4 type Linux client mounts should issue "reclaim complete"?

It means that 4.1 is supported, so a client *could* use 4.1 if it
asked to. And if it did use 4.1, yes, it would be required to issue
reclaim complete. Current linux clients do not use 4.1 unless you
explicitly ask for it on the mount commandline.

(Aside: the server really shouldn't have +4.1 by default, as the 4.1
server is not done. We should fix that; which distro are you using?)

> I see
> that it won't help anyway at the moment, lacking server support, just
> out of interest...
>
> >     - We could record a count of all locks/opens held in stable
> >       storage and use that to decide when a client is done
> >       recovering. That would be complicated and risk slowing down
> >       normal opens and locks a lot.
>
> And the "reclaim complete" client RPC seems much better anyway, as the
> server and the client may get out of sync in case of an unclean client
> shutdown.
>
> > I don't think decreasing the lease time would be so terrible. Perhaps
> > the default should even be a little less.
>
> Fine, then. Does the Linux nfs server implementation use the lease time
> of the previous server instance as grace period on startup, or does it
> simply take whatever it finds in /proc/fs/nfsd/nfsv4leasetime?

The latest server has separately tunable "nfsv4gracetime" and
"nfsv4leasetime", and if you want to be careful, you should:

    - stop the server
    - set nfsv4gracetime to the *previous* lease time
    - set nfsv4leasetime to the *new* lease time
    - start the server

That gives you the new (lower) lease time while still giving a
sufficiently long grace period for clients who only knew about the old
time to recover. After doing that once, on future restarts you can use
the shorter time for both.

Probably we should write utilities which do this right for you....

--b.

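The four steps above might look roughly like this in shell. It is a sketch under assumptions, not a tested procedure: it assumes both tunables live under /proc/fs/nfsd (the thread only names that directory for nfsv4leasetime), that the nfsd filesystem stays mounted while the server is stopped, and that the server is controlled by a Debian-style init script. The helper name set_lease_and_grace, the init script path and the new lease value of 30 seconds are all illustrative.

```shell
# set_lease_and_grace OLD_LEASE NEW_LEASE [NFSD_DIR]
# Writes the grace time (the old lease time) and the new lease time.
# Hypothetical helper; on a real server it must run as root while
# nfsd is stopped.
set_lease_and_grace() {
    old_lease=$1
    new_lease=$2
    nfsd_dir=${3:-/proc/fs/nfsd}
    # The grace period has to cover clients that still assume the old lease.
    echo "$old_lease" > "$nfsd_dir/nfsv4gracetime" || return 1
    echo "$new_lease" > "$nfsd_dir/nfsv4leasetime" || return 1
}

# Example sequence (init script path is an assumption):
#   /etc/init.d/nfs-kernel-server stop
#   set_lease_and_grace 90 30
#   /etc/init.d/nfs-kernel-server start
```

On later restarts, once every client has seen the new lease time, the same helper could be called with the shorter value for both arguments.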
* Re: server does not abort grace period
From: Ferenc Wagner @ 2011-02-24 17:06 UTC
To: linux-nfs

"J. Bruce Fields" <bfields@fieldses.org> writes:

> On Tue, Feb 22, 2011 at 06:05:14PM +0100, Ferenc Wagner wrote:
>
>> "J. Bruce Fields" <bfields@fieldses.org> writes:
>>
>>> - In the NFSv4.1 case there is a "reclaim complete" rpc that
>>>   clients are required to send. Currently we don't take
>>>   advantage of that to end the grace period early, but we
>>>   should. That's no help for 4.0 clients.
>>
>> /proc/fs/nfsd/versions shows +4.1 on the server, does this mean that
>> nfs4 type Linux client mounts should issue "reclaim complete"?
>
> It means that 4.1 is supported, so a client *could* use 4.1 if it
> asked to. And if it did use 4.1, yes, it would be required to issue
> reclaim complete. Current linux clients do not use 4.1 unless you
> explicitly ask for it on the mount commandline.

I can't find any mention of 4.1 in man nfs (nfs-common version 1.2.2),
is there an undocumented nfsvers=4.1 mount option or some other means?

> (Aside: the server really shouldn't have +4.1 by default, as the 4.1
> server is not done. We should fix that; which distro are you using?)

Debian squeeze. If it's switchable, then it's possible I switched it
on, I can't remember. However, 4.1 client support is disabled in the
stock kernel config, and 4.1 server support isn't even mentioned:

$ fgrep NFS /boot/config-2.6.32-5-686
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
# CONFIG_NFS_V4_1 is not set
CONFIG_NFS_FSCACHE=y
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_NCPFS_NFS_NS=y

>> Does the Linux nfs server implementation use the lease time of the
>> previous server instance as grace period on startup, or does it
>> simply take whatever it finds in /proc/fs/nfsd/nfsv4leasetime?
>
> The latest server has separately tunable "nfsv4gracetime" and
> "nfsv4leasetime", and if you want to be careful, you should:
>
>     - stop the server
>     - set nfsv4gracetime to the *previous* lease time
>     - set nfsv4leasetime to the *new* lease time
>     - start the server
>
> That gives you the new (lower) lease time while still giving a
> sufficiently long grace period for clients who only knew about the old
> time to recover. After doing that once, on future restarts you can use
> the shorter time for both.

Yes, this is exactly where I was going to (and what's recommended in the
RFC). Good to hear it's already implemented!

> Probably we should write utilities which do this right for you....

No worries, I won't be changing the lease time that frequently. :)
-- 
Thanks a lot,
Feri.

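The config listing above can also be queried per option instead of read by eye. A small sketch, assuming the usual kernel config file format (OPTION=y or OPTION=m when enabled, a "# OPTION is not set" comment otherwise); kconfig_enabled is a made-up helper name.

```shell
# kconfig_enabled OPTION CONFIG_FILE
# Succeeds if OPTION is built in (=y) or modular (=m) in CONFIG_FILE.
# Hypothetical helper; the anchored pattern keeps CONFIG_NFS_V4 from
# matching CONFIG_NFS_V4_1.
kconfig_enabled() {
    grep -Eq "^$1=(y|m)\$" "$2"
}

# Example:
#   kconfig_enabled CONFIG_NFS_V4_1 /boot/config-2.6.32-5-686 \
#       || echo "4.1 client support is not built"
```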
* Re: server does not abort grace period
From: J. Bruce Fields @ 2011-02-24 17:30 UTC
To: Ferenc Wagner; +Cc: linux-nfs

On Thu, Feb 24, 2011 at 06:06:00PM +0100, Ferenc Wagner wrote:
> "J. Bruce Fields" <bfields@fieldses.org> writes:
>
> > On Tue, Feb 22, 2011 at 06:05:14PM +0100, Ferenc Wagner wrote:
> >
> >> "J. Bruce Fields" <bfields@fieldses.org> writes:
> >>
> >>> - In the NFSv4.1 case there is a "reclaim complete" rpc that
> >>>   clients are required to send. Currently we don't take
> >>>   advantage of that to end the grace period early, but we
> >>>   should. That's no help for 4.0 clients.
> >>
> >> /proc/fs/nfsd/versions shows +4.1 on the server, does this mean that
> >> nfs4 type Linux client mounts should issue "reclaim complete"?
> >
> > It means that 4.1 is supported, so a client *could* use 4.1 if it
> > asked to. And if it did use 4.1, yes, it would be required to issue
> > reclaim complete. Current linux clients do not use 4.1 unless you
> > explicitly ask for it on the mount commandline.
>
> I can't find any mention of 4.1 in man nfs (nfs-common version 1.2.2),
> is there an undocumented nfsvers=4.1 mount option or some other means?

-ominorversion=1

> > (Aside: the server really shouldn't have +4.1 by default, as the 4.1
> > server is not done. We should fix that; which distro are you using?)
>
> Debian squeeze. If it's switchable, then it's possible I switched it
> on, I can't remember. However, 4.1 client support is disabled in the
> stock kernel config, and 4.1 server support isn't even mentioned:

There's no separate config option, but the kernel keeps it off by
default. I think nfs-utils is overriding the kernel's default. We
should fix that.

--b.

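For illustration, the option above would appear in a full mount command roughly as follows. The sketch only prints the command rather than running it, since an actual mount needs root, a client kernel built with 4.1 support and a mount.nfs that accepts the option. The export and mount point are placeholders, and nfs41_mount_cmd is a made-up helper name.

```shell
# Print (do not run) a mount command that requests NFSv4.1.
# server:/export and /mnt are placeholders, not values from the thread.
# nfs41_mount_cmd is a hypothetical helper name.
nfs41_mount_cmd() {
    printf 'mount -t nfs4 -o minorversion=1 %s %s\n' "$1" "$2"
}

nfs41_mount_cmd server:/export /mnt
```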
* Re: server does not abort grace period
From: Ferenc Wagner @ 2011-02-25 16:51 UTC
To: J. Bruce Fields; +Cc: linux-nfs

"J. Bruce Fields" <bfields@fieldses.org> writes:

> On Thu, Feb 24, 2011 at 06:06:00PM +0100, Ferenc Wagner wrote:
>> "J. Bruce Fields" <bfields@fieldses.org> writes:
>>
>>> On Tue, Feb 22, 2011 at 06:05:14PM +0100, Ferenc Wagner wrote:
>>>
>>>> "J. Bruce Fields" <bfields@fieldses.org> writes:
>>>>
>>>>> - In the NFSv4.1 case there is a "reclaim complete" rpc that
>>>>>   clients are required to send. Currently we don't take
>>>>>   advantage of that to end the grace period early, but we
>>>>>   should. That's no help for 4.0 clients.
>>>>
>>>> /proc/fs/nfsd/versions shows +4.1 on the server, does this mean that
>>>> nfs4 type Linux client mounts should issue "reclaim complete"?
>>>
>>> It means that 4.1 is supported, so a client *could* use 4.1 if it
>>> asked to. And if it did use 4.1, yes, it would be required to issue
>>> reclaim complete. Current linux clients do not use 4.1 unless you
>>> explicitly ask for it on the mount commandline.
>>
>> I can't find any mention of 4.1 in man nfs (nfs-common version 1.2.2),
>> is there an undocumented nfsvers=4.1 mount option or some other means?
>
> -ominorversion=1

Hmm, looks like this feature didn't make it into the squeeze version of
mount.nfs4. And it's disabled in the kernel, anyway.

>>> (Aside: the server really shouldn't have +4.1 by default, as the 4.1
>>> server is not done. We should fix that; which distro are you using?)
>>
>> Debian squeeze. If it's switchable, then it's possible I switched it
>> on, I can't remember. However, 4.1 client support is disabled in the
>> stock kernel config, and 4.1 server support isn't even mentioned:
>
> There's no separate config option, but the kernel keeps it off by
> default. I think nfs-utils is overriding the kernel's default. We
> should fix that.

At least I can't find any occurrence of fs/nfsd/version in my startup
and config scripts.
-- 
Regards,
Feri.

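Whether the running server currently advertises 4.1 can be read back from /proc/fs/nfsd/versions. The sketch below assumes the file holds whitespace-separated tokens such as "+4.1" or "-4.1"; the thread only confirms that "+4.1" shows up there, so the exact format is an assumption. nfsd_version_enabled is a made-up helper name.

```shell
# nfsd_version_enabled VERSION [VERSIONS_FILE]
# Succeeds if "+VERSION" is listed in the versions file.
# Assumed format: whitespace-separated tokens like "-2 +3 +4 +4.1".
# Hypothetical helper name.
nfsd_version_enabled() {
    versions_file=${2:-/proc/fs/nfsd/versions}
    for tok in $(cat "$versions_file"); do
        if [ "$tok" = "+$1" ]; then
            return 0
        fi
    done
    return 1
}

# Example:
#   nfsd_version_enabled 4.1 && echo "server offers NFSv4.1"
```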