* server_scope v4.1 lock reclaim
From: Saso Slavicic @ 2015-04-27 6:07 UTC
To: linux-nfs

Hi,

I'm doing an NFS HA setup for KVM and need lock reclaim to work. I've
been doing a lot of testing and reading in the past week and finally
figured out that for reclaims to work on a 4.1 mount (4.1 is preferable
due to RECLAIM_COMPLETE and thus faster failover), the server hostnames
need to be the same. The RFC specifies that reclaim can succeed if the
server scope is the same, and in fact the client will not even attempt a
reclaim if the server scope does not match.

But... there doesn't seem to be any way of setting the server scope other
than changing the server hostname? The RFC states: "The purpose of the
server scope is to allow a group of servers to indicate to clients that a
set of servers sharing the same server scope value has arranged to use
compatible values of otherwise opaque identifiers." The nfsdcltrack
directory is properly handed over during failover, so I'd need some way
of configuring the server scope on this "set of servers"? From the code,
the server scope is simply set to utsname()->nodename in nfs4xdr.c.

What am I missing here? How can this work when Heartbeat needs different
names for the nodes?

Thanks,
Saso Slavicic
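P.S. For reference, the encoding I'm looking at is roughly this
(paraphrased from fs/nfsd/nfs4xdr.c as I read it, not the exact upstream
code) -- both the server_owner major ID and the server scope come
straight from the node name:

        static __be32
        nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp,
                                 __be32 nfserr,
                                 struct nfsd4_exchange_id *exid)
        {
                ...
                /* server_owner.major_id and server_scope are both the
                 * host's utsname, so both change whenever the hostname
                 * does */
                major_id = utsname()->nodename;
                major_id_sz = strlen(major_id);
                server_scope = utsname()->nodename;
                server_scope_sz = strlen(server_scope);
                ...
        }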
* Re: server_scope v4.1 lock reclaim
From: J. Bruce Fields @ 2015-04-27 15:19 UTC
To: Saso Slavicic
Cc: linux-nfs

On Mon, Apr 27, 2015 at 08:07:12AM +0200, Saso Slavicic wrote:
> I'm doing an NFS HA setup for KVM and need lock reclaim to work. I've
> been doing a lot of testing and reading in the past week and finally
> figured out that for reclaims to work on a 4.1 mount (4.1 is preferable
> due to RECLAIM_COMPLETE and thus faster failover), the server hostnames
> need to be the same. The RFC specifies that reclaim can succeed if the
> server scope is the same, and in fact the client will not even attempt
> a reclaim if the server scope does not match.
>
> But... there doesn't seem to be any way of setting the server scope
> other than changing the server hostname? The RFC states: "The purpose
> of the server scope is to allow a group of servers to indicate to
> clients that a set of servers sharing the same server scope value has
> arranged to use compatible values of otherwise opaque identifiers." The
> nfsdcltrack directory is properly handed over during failover, so I'd
> need some way of configuring the server scope on this "set of servers"?
> From the code, the server scope is simply set to utsname()->nodename in
> nfs4xdr.c.
>
> What am I missing here? How can this work when Heartbeat needs
> different names for the nodes?

So in theory we could add some sort of way to configure the server scope,
and then you could set the server scope to the same thing on all your
servers.

But that's not enough to satisfy
https://tools.ietf.org/html/rfc5661#section-2.10.4, which also requires
stateids and the rest to be compatible between the servers.

In practice, given current Linux servers and clients, maybe that could
work, because in your situation the only case where they see each other's
stateids is after a restart, in which case the IDs will include a boot
time that will result in a STALE error as long as the server clocks are
roughly synchronized. But that makes some assumptions about how our
servers generate IDs and how the clients use them, and I don't think
those assumptions are guaranteed by the spec. It seems fragile.

If it's simple active-to-passive failover then I suppose you could
arrange for the utsname to be the same too.

--b.
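P.S. To illustrate what I mean by the boot time: the server mints
clientids from its boot time and rejects ones minted before the last
restart, roughly like this (paraphrased from fs/nfsd/nfs4state.c,
illustrative rather than the exact upstream code):

        static void gen_clid(struct nfs4_client *clp, struct nfsd_net *nn)
        {
                /* every clientid embeds the time this server instance
                 * booted */
                clp->cl_clientid.cl_boot = nn->boot_time;
                clp->cl_clientid.cl_id = nn->clientid_counter++;
        }

        static bool STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn)
        {
                /* an id minted by a previous boot -- or by the other
                 * node, whose boot time differs -- fails this check and
                 * draws NFS4ERR_STALE_CLIENTID */
                return clid->cl_boot != nn->boot_time;
        }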
* RE: server_scope v4.1 lock reclaim
From: Saso Slavicic @ 2015-04-28 16:44 UTC
To: 'J. Bruce Fields'
Cc: linux-nfs

> From: J. Bruce Fields
> Sent: Monday, April 27, 2015 5:20 PM
>
> So in theory we could add some sort of way to configure the server
> scope, and then you could set the server scope to the same thing on all
> your servers.
>
> But that's not enough to satisfy
> https://tools.ietf.org/html/rfc5661#section-2.10.4, which also requires
> stateids and the rest to be compatible between the servers.

OK... I have to admit that, given the number of NFS HA tutorials out
there and the improvements NFSv4(.1) brings in the specs, I assumed HA
failover was supported. I apologize if that is not the case.

So, such a config option could be added, but there are no plans to add
it, since it could be misused in some situations (i.e. setups not doing
active-to-passive failover)?
Is an active-active setup then totally out of the question?

> In practice, given current Linux servers and clients, maybe that could
> work, because in your situation the only case where they see each
> other's stateids is after a restart, in which case the IDs will include
> a boot time that will result in a STALE error as long as the server
> clocks are roughly synchronized. But that makes some assumptions about
> how our servers generate IDs and how the clients use them, and I don't
> think those assumptions are guaranteed by the spec. It seems fragile.

I read (part of) the specs, and stateids are supposed to hold across
sessions but not across different client IDs.
Doing a wireshark dump, the (failover) server sends STALE_CLIENTID after
reconnect, so that should properly invalidate all the IDs?
Would I be correct in assuming that this is read from nfsdcltrack? Is
there even a need for this database to be synced on each failover, if the
client is already known from its last failover (only the timestamp would
be older)?

> If it's simple active-to-passive failover then I suppose you could
> arrange for the utsname to be the same too.

I could, but then I wouldn't know which server is active when I log in
over ssh :)
What would happen if the 'migration' mount option were modified, for v4.1
mounts, to skip the server scope check when doing reclaims (as opposed to
configuring the server scope)? :)

Thanks,
Saso Slavicic
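P.S. The client-side check I mentioned seems to be roughly this
(paraphrased from fs/nfs/nfs4proc.c as I read it, not the exact upstream
code):

        static bool
        nfs41_same_server_scope(struct nfs41_server_scope *a,
                                struct nfs41_server_scope *b)
        {
                /* scopes match only if length and bytes are identical */
                return a->server_scope_sz == b->server_scope_sz &&
                       memcmp(a->server_scope, b->server_scope,
                              a->server_scope_sz) == 0;
        }

As far as I can tell, a mismatch after EXCHANGE_ID sets
NFS4CLNT_SERVER_SCOPE_MISMATCH on the client and the stored scope is
discarded, which would explain why no reclaim is ever attempted against
the failover server.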
* Re: server_scope v4.1 lock reclaim
From: 'J. Bruce Fields' @ 2015-04-28 18:23 UTC
To: Saso Slavicic
Cc: linux-nfs

On Tue, Apr 28, 2015 at 06:44:27PM +0200, Saso Slavicic wrote:
> > From: J. Bruce Fields
> > Sent: Monday, April 27, 2015 5:20 PM
> >
> > So in theory we could add some sort of way to configure the server
> > scope, and then you could set the server scope to the same thing on
> > all your servers.
> >
> > But that's not enough to satisfy
> > https://tools.ietf.org/html/rfc5661#section-2.10.4, which also
> > requires stateids and the rest to be compatible between the servers.
>
> OK... I have to admit that, given the number of NFS HA tutorials out
> there and the improvements NFSv4(.1) brings in the specs, I assumed HA
> failover was supported. I apologize if that is not the case.

I'm afraid you're in the vanguard--I doubt many people have tried HA with
4.1 and knfsd yet. (And I hadn't noticed the server scope problem, thanks
for bringing it up.)

> So, such a config option could be added, but there are no plans to add
> it, since it could be misused in some situations (i.e. setups not doing
> active-to-passive failover)?
> Is an active-active setup then totally out of the question?

I'm not sure what the right fix is yet.

> > In practice, given current Linux servers and clients, maybe that
> > could work, because in your situation the only case where they see
> > each other's stateids is after a restart, in which case the IDs will
> > include a boot time that will result in a STALE error as long as the
> > server clocks are roughly synchronized. But that makes some
> > assumptions about how our servers generate IDs and how the clients
> > use them, and I don't think those assumptions are guaranteed by the
> > spec. It seems fragile.
>
> I read (part of) the specs, and stateids are supposed to hold across
> sessions but not across different client IDs.
> Doing a wireshark dump, the (failover) server sends STALE_CLIENTID
> after reconnect, so that should properly invalidate all the IDs?

Since this is 4.1, I guess the first RPC the new server sees will have
either a clientid or a sessionid, so we want to make sure the new server
will handle either of those correctly.

> Would I be correct in assuming that this is read from nfsdcltrack? Is
> there even a need for this database to be synced on each failover, if
> the client is already known from its last failover (only the timestamp
> would be older)?

So, you're thinking of a case where there's a failover from server A to
server B, then back to server A again, and a single client is
continuously active throughout both failovers?

Here's the sort of case that's a concern:

        - A->B failover happens
        - client gets a file lock from B
        - client loses contact with B (network problem or something)
        - B->A failover happens

At this point, should A allow the client to reclaim its lock? B could
have given up on the client, released its lock, and granted a conflicting
lock to other clients. Or it might not have. Neither the client nor A
knows; B is the only one that knows what happened, so we need to get that
database from B to find out.
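To sketch the mechanism (paraphrased from fs/nfsd/nfs4state.c,
illustrative rather than the exact upstream code): during the grace
period the server refuses a reclaim unless the client tracking database
says the client held state before the restart, something like

        __be32
        nfs4_check_open_reclaim(clientid_t *clid, ...)
        {
                ...
                /* nfsd4_client_record_check() consults the client
                 * tracking store (e.g. nfsdcltrack); no record means no
                 * reclaim */
                if (nfsd4_client_record_check(clp))
                        return nfserr_reclaim_bad;
                return nfs_ok;
        }

So if A is working from a stale copy of that database, it has no way to
tell whether B released the lock in the meantime.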
--b.

> > If it's simple active-to-passive failover then I suppose you could
> > arrange for the utsname to be the same too.
>
> I could, but then I wouldn't know which server is active when I log in
> over ssh :)
> What would happen if the 'migration' mount option were modified, for
> v4.1 mounts, to skip the server scope check when doing reclaims (as
> opposed to configuring the server scope)? :)
>
> Thanks,
> Saso Slavicic