From: Trond Myklebust <trondmy@primarydata.com>
To: "miklos@szeredi.hu" <miklos@szeredi.hu>,
"amir73il@gmail.com" <amir73il@gmail.com>
Cc: "bfields@fieldses.org" <bfields@fieldses.org>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"jlayton@poochiereds.net" <jlayton@poochiereds.net>,
"linux-unionfs@vger.kernel.org" <linux-unionfs@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: overlayfs NFS export
Date: Fri, 7 Apr 2017 15:58:27 +0000
Message-ID: <1491580705.12269.3.camel@primarydata.com>
In-Reply-To: <CAOQ4uxgnvfZUdWfTsPMfjsvht9tEZi04GVyg=VzUNAWHyD=g_A@mail.gmail.com>

On Fri, 2017-04-07 at 18:45 +0300, Amir Goldstein wrote:
> On Fri, Apr 7, 2017 at 6:28 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > On Fri, Apr 7, 2017 at 4:57 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
> >
> > > What is the problem you are trying to solve?
> >
> > The problem is getting a persistent file handle for overlayfs
> > files.
>
> That is only part of the problem and the point I was trying to
> explore is that we don't need to solve it at all (see below).
You don't, if you are willing to live with non-POSIX semantics.
Otherwise you do.
>
> The other part of the problem is getting a persistent handle for
> overlayfs directories.
>
> Why this second problem is hard is too difficult to explain to
> non-overlayfs folks, but Miklos and I started playing around with an
> idea.
>
> >
> > One idea suggested by Viro is to create a dummy inode on the upper
> > layer whenever we look up a dentry in the overlay filesystem. Then we
>
> So that idea is not relevant for directories (I think)
>
> > have an inode number reserved for the file if it needs to be copied
> > up. This solves the file handle problem, since we can generate a path
> > from the file handle and from there get the original lower layer file
> > (assumes the file handle has the parent handle encoded as well). If
>
> Apparently, that is not the case with knfsd, but it doesn't matter
> for directory handles, which can always be reconnected.
>
> > the file is copied up, the file is no longer associated with the lower
> > layer; we just need to use the upper inode, and this works too. Files
> > created on the upper layer also work fine.
> >
> > The only little problem is that we are creating lots of inodes on disk
> > and in memory that until now we haven't. Currently overlayfs only
> > modifies the upper layer if there's a good reason to believe that there
> > is really going to be a modification (e.g. when a file is opened for
> > write).
> >
> > The alternative is to generate the file handle from the lower file (if
> > on lower) and from the upper file (if on upper). The issue is if the
> > file is copied up and goes from lower to upper. In that case we need to
> > find the upper file from the handle generated from the lower file. This
>
> So why do we really need to find the upper in that case?
> If we follow my idea, then an NFS read request with a lower handle
> may be served from the lower inode, and an NFS write request with a
> lower handle will get ESTALE and will try to look up by path
> (I suppose?).
>
The client will never try to recover from an ESTALE error that is
returned on a file it has already opened. That would cause data
corruption if the user were to do something like 'rm foo; touch foo' on
the server; writes that were intended for the old file would suddenly
be written to the new one in violation of POSIX I/O rules.
IOW: In the case where WRITE returns ESTALE, that error will result in
the client returning EIO to the application on the next write() or
fsync() or close(). That error will persist; a retry will not clear
it.
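
A minimal sketch in C of what this means for an application: since the EIO produced by a rejected WRITE may be reported by write(), fsync() or close(), a program writing to an NFS-mounted file has to check all three. The function name write_file_checked is hypothetical, the sketch assumes a POSIX system, and it only illustrates application-side error handling; it does not model the NFS client's internal error tracking.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes of buf to path, checking every call
 * that can report a deferred write error. Per the behaviour described above,
 * a WRITE the server rejected (e.g. with ESTALE) may surface as EIO on a later
 * write(), fsync() or close(), so none of those results is ignored.
 * Returns 0 on success, or the first errno value observed.
 */
int write_file_checked(const char *path, const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return errno;

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before any data was written */
            int err = errno;
            (void)close(fd);       /* first error already captured */
            return err;
        }
        done += (size_t)n;
    }

    /* fsync() makes the client flush its cached data; a deferred
     * error may be reported here. */
    if (fsync(fd) < 0) {
        int err = errno;
        (void)close(fd);
        return err;
    }

    /* close() can also report the deferred error. On Linux the descriptor
     * is released even when close() fails, so it is not retried. */
    if (close(fd) < 0)
        return errno;

    return 0;
}
```

Usage: write_file_checked("/mnt/nfs/foo", buf, len) returns 0 on success or the first errno seen; as described above, retrying after an EIO will not clear the error.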
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com