From: Jeff Layton <jlayton@kernel.org>
To: Amir Goldstein <amir73il@gmail.com>,
	Eddie Horng <eddiehorng.tw@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	overlayfs <linux-unionfs@vger.kernel.org>,
	Trond Myklebust <trondmy@primarydata.com>,
	"J. Bruce Fields" <bfields@fieldses.org>
Subject: Re: flock fails in overlay nfs-exported file
Date: Tue, 13 Mar 2018 08:51:40 -0400
Message-ID: <1520945500.4474.26.camel@kernel.org>
In-Reply-To: <CAOQ4uxhTqKkU_md1t8QSs9xGnBk+qr_3tv4FH3D5Nqg8jUpbQQ@mail.gmail.com>

On Tue, 2018-03-13 at 08:24 +0200, Amir Goldstein wrote:
> [CC some NFS/lock folks (see history below top post)]
> 
> On Tue, Mar 13, 2018 at 3:39 AM, Eddie Horng <eddiehorng.tw@gmail.com> wrote:
> > Hi Amir,
> > Thanks for your prompt response. After comparing flock(1) with my flock(2)
> > test program, it seems the open flags make the difference. strace shows
> > that flock fails after open with O_RDONLY (case A), works after open with
> > O_RDWR|O_CREAT|O_NOCTTY (case B), and also works on a local ext4 file
> > opened with O_RDONLY (case C).
> > 
> > case A:
> > strace myflock /mnt/n/foo
> > open("/mnt/n/foo", O_RDONLY)            = 3
> > flock(3, LOCK_EX|LOCK_NB)               = -1 EBADF (Bad file descriptor)
> > 
> 
> It looks like flock(1) has special code to handle this case for NFSv4
> and falls back to opening with O_RDWR:
> https://github.com/karelzak/util-linux/blob/master/sys-utils/flock.c#L295
> 
> Although I tested with NFSv3, and the open flags used by flock(1)
> were O_RDONLY|O_CREAT|O_NOCTTY.
> 
> Why do you need to get an exclusive lock on a file that is open for read?
> Can you open the file for write and resolve the issue like flock(1) does?
> 
> You should know that even if you manage to lock an O_RDONLY fd,
> if this file is then opened for write by another process, that process will
> get a file descriptor pointing to a *different* inode.
> This is a long standing issue with overlayfs (inconsistent ro/rw fd),
> which is being worked around by some user applications -
> i.e. touch the file before first access to avoid applications
> getting open file descriptor to lower inode.
> 
> Let me know if this answer suffices or if you get this error only
> with NFSv4 over overlayfs.
>
> > case B:
> > strace flock -x -n /mnt/n/foo echo locked
> > open("/mnt/n/foo", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 3
> > flock(3, LOCK_EX|LOCK_NB)               = 0
> > 
> > case C:
> > strace myflock /tmp/t
> > open("/tmp/t", O_RDONLY)                = 3
> > flock(3, LOCK_EX|LOCK_NB)               = 0
> > 
> 
> So that presumably works because the test is not over NFS at all
> (so there is no NFSv4 flock emulation), and not because the test is
> not over NFS+overlayfs.
> 

Agreed. The real issue here is that NFSv4 emulates flock locks using
LOCK/LOCKT byte-range locks. The NFSv4 spec does not allow you to set a
write lock on a file open read-only, so that just plain doesn't work on
NFSv4.
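
To make that concrete, here is a minimal userspace sketch of the usual
workaround, similar in spirit to the fallback in flock(1) mentioned
above: try the lock on a read-only descriptor first, and if flock()
fails with EBADF, reopen the file read-write and retry. The helper name
lock_exclusive_with_fallback is made up for illustration, and this
assumes the caller has write permission on the file.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any library): take a non-blocking
 * exclusive flock on `path`. Returns a locked fd on success, or -1
 * with errno set on failure.
 */
int lock_exclusive_with_fallback(const char *path)
{
    int saved;
    int fd = open(path, O_RDONLY | O_NOCTTY);

    if (fd < 0)
        return -1;

    /* On local filesystems this normally succeeds right away. */
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        return fd;

    saved = errno;
    close(fd);
    if (saved != EBADF) {
        /* Some other failure, e.g. EWOULDBLOCK: report it as-is. */
        errno = saved;
        return -1;
    }

    /*
     * EBADF: probably NFSv4, where the emulated write lock needs a
     * descriptor that is open for writing. Reopen read-write and retry.
     */
    fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The caller releases the lock by closing the returned fd. Note that on
overlayfs the read-write open may trigger a copy-up, so Amir's caveat
above about ro/rw descriptors still applies.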

> 
> > Below is my test configuration of case A:
> > - underlying filesystem:
> > ext4
> > - /proc/mounts:
> > /dev/disk/by-uuid/a2d5005c-.... / ext4
> > rw,relatime,errors=remount-ro,data=ordered 0 0
> > none /share overlay
> > rw,relatime,lowerdir=/base/lower,upperdir=/base/upper,workdir=/base/work,index=on,nfs_export=on
> > 0 0
> > localhost:/share /mnt/n nfs4
> > rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,local_lock=none,addr=127.0.0.1
> > 0 0
> > - /etc/exports
> > /share *(rw,sync,no_subtree_check,no_root_squash,fsid=41)
> > 
> > 
> > As for dmesg, in case A there is no output at all. However, my
> > applications running on overlay nfs-exported files do produce some
> > lock-related messages. Which lock call triggers them needs more
> > investigation.
> > The messages on the nfs server side look like:
> > [  872.940080] Leaked POSIX lock on dev=0x0:0x42 ino=0xf5a1
> > fl_owner=0000000023265f44 fl_flags=0x1 fl_type=0x1 fl_pid=1
> > [ 1939.829655] Leaked locks on dev=0x0:0x42 ino=0xf5a1:
> > [ 1939.829659] POSIX: fl_owner=0000000023265f44 fl_flags=0x1
> > fl_type=0x1 fl_pid=1
> > 
> 
> I'm not sure what those mean. Maybe NFS folks can shed some light.
>

That means that there was a file_lock associated with this struct file
that was left on the POSIX lock list after filp_close. Either it didn't
get released properly or a lock raced onto the list after
locks_remove_posix ran. That should never happen, so this is likely a
bug.
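
For reference, the fields in those messages can be decoded by hand. A
small sketch, assuming the generic Linux values of F_RDLCK, F_WRLCK and
F_UNLCK from <fcntl.h> (some architectures number them differently) and
assuming FL_POSIX is 0x1 as in the kernel's include/linux/fs.h. The
sketch_ and SKETCH_ names are made up for illustration.

```c
#include <fcntl.h>

/*
 * Assumption: mirrors FL_POSIX from the kernel's include/linux/fs.h.
 * That flag is kernel-internal and not exported to userspace.
 */
#define SKETCH_FL_POSIX 0x1u

/* Map an fl_type value from the log to the matching fcntl lock type name. */
const char *sketch_lock_type_name(int fl_type)
{
    switch (fl_type) {
    case F_RDLCK:
        return "F_RDLCK";
    case F_WRLCK:
        return "F_WRLCK";
    case F_UNLCK:
        return "F_UNLCK";
    default:
        return "unknown";
    }
}

/* Non-zero if the fl_flags value from the log has the POSIX bit set. */
int sketch_is_posix_lock(unsigned int fl_flags)
{
    return (fl_flags & SKETCH_FL_POSIX) != 0;
}
```

If I am reading the log right, fl_flags=0x1 is a POSIX lock and
fl_type=0x1 is F_WRLCK on x86, which fits the NFSv4 flock emulation
described above.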

> Thanks,
> Amir.
> 
> > 
> > 2018-03-12 20:07 GMT+08:00 Amir Goldstein <amir73il@gmail.com>:
> > > On Mon, Mar 12, 2018 at 9:38 AM, Eddie Horng <eddiehorng.tw@gmail.com> wrote:
> > > > Hello Miklos,
> > > > I'd like to report a flock(2) problem with overlay nfs-exported files.
> > > > The error return from flock(2) is "Bad file descriptor".
> > > > 
> > > > Environment:
> > > > OS: Ubuntu 14.04.2 LTS
> > > > Kernel: 4.16.0-041600rc4-generic (from
> > > > http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc4/)
> > > > 
> > > > Steps to reproduce:
> > > > (nfs server side)
> > > > mount -t overlay
> > > > -orw,lowerdir=/mnt/ro,upperdir=/mnt/u,workdir=/mnt/w,nfs_export=on,index=on
> > > > none /mnt/m
> > > > touch /mnt/m/foo
> > > > (nfs client side)
> > > > mount server:/mnt/m /mnt/n
> > > > 
> > > > flock /mnt/n/foo
> > > > failed to lock file '/mnt/n/foo': Bad file descriptor
> > > > 
> > > 
> > > Does not reproduce on my end. I am using v4.16-rc5, but I don't think
> > > any of the fixes there are relevant to this failure.
> > > 
> > > This is what I have for underlying fs, overlay and nfs mount options
> > > (index and nfs_export are on by default in my kernel):
> > > 
> > > /dev/mapper/storage-lower_layer on /base type xfs
> > > (rw,relatime,attr2,inode64,noquota)
> > > share on /share type overlay
> > > (rw,relatime,lowerdir=/base/lower,upperdir=/base/upper/0,workdir=/base/upper/work0)
> > > c800:/share on /mnt/t type nfs
> > > (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.91.126,mountvers=3,mountport=49494,mountproto=udp,local_lock=none,addr=192.168.91.126)
> > > 
> > > $ touch /mnt/t/foo
> > > $ flock -x -n /mnt/t/foo echo locked
> > > locked
> > > 
> > > Please share more information about nfs mount options and underlying filesystem
> > > 
> > > Please check if you see any relevant errors/warnings in dmesg.
> > > 
> > > Thanks,
> > > Amir.

-- 
Jeff Layton <jlayton@kernel.org>
