Re: sm notify (nlm) question

Linux NFS development
 help / color / mirror / Atom feed

From: Trond Myklebust <trondmy@hammerspace.com>
To: "ffilzlnx@mindspring.com" <ffilzlnx@mindspring.com>,
	"aglo@umich.edu" <aglo@umich.edu>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Subject: Re: sm notify (nlm) question
Date: Wed, 22 May 2024 16:20:18 +0000	[thread overview]
Message-ID: <33bdba418d7ccd7d9b87fa478ea71a14e979aaf1.camel@hammerspace.com> (raw)
In-Reply-To: <CAN-5tyHWeQykx0cFpOFf2hTBRk9_NaZarzeeAFdSu2NW0zqobA@mail.gmail.com>

On Wed, 2024-05-22 at 09:57 -0400, Olga Kornievskaia wrote:
> On Tue, May 14, 2024 at 6:13 PM Frank Filz <ffilzlnx@mindspring.com>
> wrote:
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Olga Kornievskaia [mailto:aglo@umich.edu]
> > > Sent: Tuesday, May 14, 2024 2:50 PM
> > > To: Frank Filz <ffilzlnx@mindspring.com>
> > > Cc: Chuck Lever III <chuck.lever@oracle.com>; Linux NFS Mailing
> > > List <linux-
> > > nfs@vger.kernel.org>
> > > Subject: Re: sm notify (nlm) question
> > > 
> > > On Tue, May 14, 2024 at 5:36 PM Frank Filz
> > > <ffilzlnx@mindspring.com> wrote:
> > > > 
> > > > > > On May 14, 2024, at 2:56 PM, Olga Kornievskaia
> > > > > > <aglo@umich.edu>
> > > wrote:
> > > > > > 
> > > > > > Hi folks,
> > > > > > 
> > > > > > Given that not everything for NFSv3 has a specification, I
> > > > > > post a
> > > > > > question here (as it concerns linux v3 (client)
> > > > > > implementation)
> > > > > > but I ask a generic question with respect to NOTIFY sent by
> > > > > > an NFS server.
> > > > > 
> > > > > There is a standard:
> > > > > 
> > > > > https://pubs.opengroup.org/onlinepubs/9629799/chap11.htm
> > > > > 
> > > > > 
> > > > > > A NOTIFY message that is sent by an NFS server upon reboot
> > > > > > has a
> > > > > > monitor name and a state. This "state" is an integer and is
> > > > > > modified on each server reboot. My question is: what about
> > > > > > state
> > > > > > value uniqueness? Is there somewhere some notion that this
> > > > > > value
> > > > > > has to be unique (as in say a random value).
> > > > > > 
> > > > > > Here's a problem. Say a client has 2 mounts to ip1 and ip2
> > > > > > (both
> > > > > > representing the same DNS name) and acquires a lock per
> > > > > > mount. Now
> > > > > > say each of those servers reboot. Once up they each send a
> > > > > > NOTIFY
> > > > > > call and each use a timestamp as basis for their "state"
> > > > > > value --
> > > > > > which very likely is to produce the same value for 2
> > > > > > servers
> > > > > > rebooted at the same time (or for the linux server that
> > > > > > looks like
> > > > > > a counter). On the client side, once the client processes
> > > > > > the 1st
> > > > > > NOTIFY call, it updates the "state" for the monitor name
> > > > > > (ie a
> > > > > > client monitors based on a DNS name which is the same for
> > > > > > ip1 and
> > > > > > ip2) and then in the current code, because the 2nd NOTIFY
> > > > > > has the
> > > > > > same "state" value this NOTIFY call would be ignored. The
> > > > > > linux
> > > > > > client would never reclaim the 2nd lock (but the
> > > > > > application
> > > > > > obviously would never know it's missing a lock)
> > > > > > --- data corruption.
> > > > > > 
> > > > > > Who is to blame: is the server not allowed to send "non-
> > > > > > unique"
> > > > > > state value? Or is the client at fault here for some
> > > > > > reason?
> > > > > 
> > > > > The state value is supposed to be specific to the monitored
> > > > > host. If
> > > > > the client is indeed ignoring the second reboot notification,
> > > > > that's incorrect
> > > behavior, IMO.
> > > > 
> > > > If you are using multiple server IP addresses with the same DNS
> > > > name, you
> > > may want to set:
> > > > 
> > > > sysctl fs.nfs.nsm_use_hostnames=0
> > > > 
> > > > The NLM will register with statd using the IP address as name
> > > > instead of host
> > > name. Then your two IP addresses will each have a separate
> > > monitor entry and
> > > state value monitored.
> > > 
> > > In my setup I already have this set to 0. But I'll look around
> > > the code to see what
> > > it is supposed to do.
> > 
> > Hmm, maybe it doesn't work on the client side. I don't often test
> > NLM clients with my Ganesha work because I only run one VM and NLM
> > clients can’t function on the same host as any server other than
> > knfsd...
> 
> I've been staring and tracing the code and here's what I conclude:
> the
> use of nsm_use_hostname toggles nothing that helps. No matter what
> statd always stores whatever it is monitoring based on the DSN name
> (looks like git blame says it's due to nfs-utils's commit
> 0da56f7d359475837008ea4b8d3764fe982ef512 "statd - use dnsname to
> ensure correct matching of NOTIFY requests". Now what's worse is that
> when statd receives a 2nd monitoring request from lockd for something
> that maps to the same DNS name, statd overwrites the previous
> monitoring information it had. When a NOTIFY arrives from an IP
> matching the DNS name, the statd does the downcall and it will send
> whatever the last monitoring information lockd gave it. Therefore all
> the other locks will never be recovered.
> 
> What I struggle with is how to solve this problem. Say ip1 and ip2
> run
> an NFS server and both are known under the same DNS name:
> foo.bar.com.
> Does it mean that they represent the "same" server? Can we assume
> that
> if one of them "rebooted" then the other rebooted as well?  It seems
> like we can't go backwards and go back to monitoring by IP. In that
> case I can see that we'll get in trouble if the rebooted server
> indeed
> comes back up with a different IP (same DNS name) and then it would
> never match the old entry and the lock would never be recovered (but
> then also I think lockd will only send the lock to the IP is stored
> previously which in this case would be unreachable). If statd
> continues to monitor by DNS name and then matches either ips to the
> stored entry, then the problem comes with "state" update. Once statd
> processes one NOTIFY which matched the DNS name its state "should" be
> updated but then it would leads us back into the problem if ignoring
> the 2nd NOTIFY call. If statd were to be changed to store multiple
> monitor handles lockd asked to monitor, then when the 1st NOTIFY call
> comes we can ask lockd to recover "all" the store handles. But then
> it
> circles back to my question: can we assume that if one IP rebooted
> does it imply all IPs rebooted?
> 
> Perhaps it's lockd that needs to change in how it keeps track of
> servers that hold locks. The behaviour seems to have changed in 2010
> (with commit 8ea6ecc8b0759756a766c05dc7c98c51ec90de37 "lockd: Create
> client-side nlm_host cache") when nlm_host cache was introduced
> written to be based on hash of IP. It seems that before things were
> based on a DNS name making it in line with statd.
> 
> Anybody has any thoughts as to whether statd or lockd needs to
> change?
> 

I believe Tom Talpey is to blame for the nsm_use_hostname stuff. That
all came from his 2006 Connectathon talk
https://nfsv4bat.org/Documents/ConnectAThon/2006/talpey-cthon06-nsm.pdf

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

next prev parent reply	other threads:[~2024-05-22 16:20 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-14 20:56 sm notify (nlm) question Olga Kornievskaia
2024-05-14 21:08 ` Chuck Lever III
2024-05-14 21:21   ` Olga Kornievskaia
2024-05-14 21:36   ` Frank Filz
2024-05-14 21:49     ` Olga Kornievskaia
2024-05-14 22:13       ` Frank Filz
2024-05-22 13:57         ` Olga Kornievskaia
2024-05-22 16:20           ` Trond Myklebust [this message]
2024-05-22 17:18             ` Tom Talpey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=33bdba418d7ccd7d9b87fa478ea71a14e979aaf1.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=aglo@umich.edu \
    --cc=chuck.lever@oracle.com \
    --cc=ffilzlnx@mindspring.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox