From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael Kerrisk (man-pages)"
Subject: Re: flock() and NFS [Was: Re: [PATCH] locks: rename file-private locks to file-description locks]
Date: Tue, 29 Apr 2014 11:53:40 +0200
Message-ID: <535F76A4.4090208@gmail.com>
References: <1398087935-14001-1-git-send-email-jlayton@redhat.com>
 <20140421140246.GB26358@brightrain.aerifal.cx>
 <535529FA.8070709@gmail.com>
 <20140421161004.GC26358@brightrain.aerifal.cx>
 <5355644C.7000801@gmail.com>
 <20140421184640.GD26358@brightrain.aerifal.cx>
 <535573E0.9080106@gmail.com>
 <20140421155520.3b33fbef@ipyr.poochiereds.net>
 <53558A73.3010602@samba.org>
 <5355F60C.8010004@gmail.com>
 <20140427145125.21e7e6c6@notabene.brown>
 <535CCAD2.4060304@gmail.com>
 <20140427200431.426c98d1@notabene.brown>
 <20140428072845.67f48d8e@notabene.brown>
 <535F6BC4.2090601@gmail.com>
 <20140429192458.641ebf1d@notabene.brown>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
 "Stefan (metze) Metzmacher", Jeff Layton,
 "linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org", lkml,
 Ganesha NFS List, Suresh Jayaraman, Trond Myklebust,
 Christoph Hellwig, linux-nfs, "J. Bruce Fields"
To: NeilBrown
Return-path:
In-Reply-To: <20140429192458.641ebf1d-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
Sender: linux-nfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: linux-fsdevel.vger.kernel.org

On 04/29/2014 11:24 AM, NeilBrown wrote:
> On Tue, 29 Apr 2014 11:07:16 +0200 "Michael Kerrisk (man-pages)"
> wrote:
>
>> On 04/27/2014 11:28 PM, NeilBrown wrote:
>>> On Sun, 27 Apr 2014 13:11:33 +0200 "Michael Kerrisk (man-pages)"
>>> wrote:
>>>
>>>> On Sun, Apr 27, 2014 at 12:04 PM, NeilBrown wrote:
>>>>> On Sun, 27 Apr 2014 11:16:02 +0200 "Michael Kerrisk (man-pages)"
>>>>> wrote:
>>>>>
>>>>>> [Trimming some folk from CC, and adding various NFS people]
>>>>>>
>>>>>> On 04/27/2014 06:51 AM, NeilBrown wrote:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> Note to Michael: The text
>>>>>>>    flock() does not lock files over NFS.
>>>>>>> in flock(2) is no longer accurate.  The reality is ... complex.
>>>>>>> See nfs(5), and search for "local_lock".
>>>>>>
>>>>>> Ahhh -- I see:
>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eebde23223aeb0ad2d9e3be6590ff8bbfab0fc2
>>>>>>
>>>>>> Thanks for the heads up.
>>>>>>
>>>>>> Just in general, it would be great if the flock(2) and fcntl(2) man pages
>>>>>> contained correct details for NFS, of course. So, for example, if there
>>>>>> are any current gotchas for NFS and fcntl() byte-range locking, I'd like
>>>>>> to add those to the fcntl(2) man page.
>>>>>
>>>>> The only peculiarities I can think of are:
>>>>>  - With NFS, locking or unlocking a region forces a flush of any cached data
>>>>>    for that file (or maybe for the region of the file).  I'm not sure if this
>>>>>    is worth mentioning.
>>>>
>>>> I agree that it's probably not necessary to mention.
>>>>
>>>>>  - With NFSv4 the client can lose a lock if it is out of contact with the
>>>>>    server for a period of time.
>>>>>    When this happens, any IO to the file by a
>>>>>    process which "thinks" it holds a lock will fail until that process closes
>>>>>    and re-opens the file.
>>>>>    This behaviour is since 3.12.  Prior to that the client might lose and
>>>>>    regain the lock without ever knowing thus potentially risking corruption
>>>>>    (but only if client and server lost contact for an extended period).
>>>>
>>>> Do you have a pointer for that commit to 3.12?
>>>>
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ef1820f9be27b6ad158f433ab38002ab8131db4d
>>>
>>> did most of the work while the subsequent commit
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f6de7a39c181dfb8a2c534661a53c73afb3081cd
>>>
>>> changed some details, added some documentation, and inverted the default
>>> behaviour.
>>
>> Thanks for that detail. What do you think of the following text for the
>> fcntl(2) man page:
>>
>>        Before Linux 3.12, if an NFS client is out of contact with the
>>        server for a period of time, it might lose and regain a lock
>>        without ever being aware of the fact.  This scenario
>>        potentially risks data corruption, since another process might
>>        acquire a lock in the intervening period and perform file I/O.
>>        Since Linux 3.12, if the client loses contact with the server,
>>        any I/O to the file by a process which "thinks" it holds a lock
>>        will fail until that process closes and reopens the file.  A
>>        kernel parameter, nfs.recover_lost_locks, can be set to 1 to
>>        obtain the pre-3.12 behavior, whereby the client will attempt
>>        to recover lost locks when contact is reestablished with the
>>        server.  Because of the attendant risk of data corruption, this
>>        parameter defaults to 0 (disabled).
>>
>
> Mostly good.
>
> I'm just a little concerned about "if the client loses contact with the
> server" in the middle there.
> It is no longer qualified and it isn't clear
> that the "for a period of time" qualification still applied.  And we should
> probably quantify the period of time - which defaults to 90 seconds.
> I don't remember just now the difference between
>    /proc/fs/nfsd/nfsv4{lease,grace}time
> but this 90 seconds is one of those.
>
> Also this is NFSv4 specific.  With NFSv3 the failure mode is the reverse.  If
> the server loses contact with a client then any lock stays in place
> indefinitely ("why can't I read my mail"... I remember it well).
>
>  Before Linux 3.12, if an NFSv4 client loses contact with the server
>  (defined as more than 90 seconds with no communication), it might lose
>  and regain ....

Thanks, Neil. Changed as you suggest.

I'd quite like to mention which of /proc/fs/nfsd/nfsv4{lease,grace}time
is relevant here. I had a quick scan, but could not determine it with
complete confidence. My suspicion, looking at fs/lockd/svcproc.c and
fs/lockd/grace.c::locks_in_grace() is that it is
/proc/fs/nfsd/nfsv4gracetime that is relevant here. Can anyone confirm?

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
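
For application authors, the post-3.12 behaviour described in the proposed
man-page text (I/O fails until the process closes and reopens the file) could
be handled roughly as in the sketch below. This is only an illustration, not
text for the man page. The helper name locked_append() is hypothetical, and
the sketch assumes that I/O under a lost lock fails with EIO, which the
thread above does not state explicitly.

```python
import errno
import fcntl
import os


def locked_append(path, data, max_attempts=3):
    """Append data under an exclusive fcntl() lock, reopening on EIO.

    Hypothetical helper: returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
        try:
            # Whole-file exclusive record lock, blocking until granted.
            fcntl.lockf(fd, fcntl.LOCK_EX)
            view = memoryview(data)
            while view:
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)
            return attempt
        except OSError as exc:
            # Assumption: a lost NFSv4 lock surfaces as EIO on I/O.
            # Any other error, or running out of attempts, is re-raised.
            if exc.errno != errno.EIO or attempt == max_attempts:
                raise
        finally:
            # Closing the descriptor also drops the lock, so the next
            # iteration starts from a fresh open() and a fresh lock.
            os.close(fd)
```

Note that each retry takes the lock from scratch, so another process may
have modified the file in between. Whether repeating the write is safe is
up to the application.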