From: Valerie Aurora <vaurora@redhat.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: viro@zeniv.linux.org.uk, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 16/35] union-mount: Writable overlays/union mounts documentation
Date: Thu, 29 Apr 2010 16:20:12 -0400
Message-ID: <20100429202012.GA6539@shell>
In-Reply-To: <E1O7Q7n-0002KC-Tm@pomaz-ex.szeredi.hu>
On Thu, Apr 29, 2010 at 11:33:39AM +0200, Miklos Szeredi wrote:
> On Wed, 28 Apr 2010, Valerie Aurora wrote:
> > I'm sorry I haven't responded sooner; I've been trying to write a
> > detailed useful message and that turns out to be hard. I'll just
> > include a few of the highlights; mainly I want to say that I'd
> > rather do it the way you describe but when I tried it ended up even
> > uglier than the VFS implementation.
> >
> > I went down this road initially (do most of the unioning in a file
> > system) and spent a couple of months on it. But I always ended up
> > having to do some level of copy-around and redirection similar to that
> > in unionfs.
>
> I haven't looked at unionfs in a long time. Can you say something
> more specific about what these problems were?

Sure. The short version is that unionfs has to allocate another copy
of each file system structure (inode, etc.) and then keep an array
of the matching structures from each of the file system layers. Each
unionfs file system op copies data up and down between the unionfs
structures and the underlying structures, and then calls the lower
file system op as necessary. Often it has to duplicate code from the
VFS before calling the lower file system ops.

The advantage of union mounts is that we make zero copies of file
system data structures and therefore don't need copyup or
interposition on as many ops. But if you wait until the file system
op is called, you have to attach your union-related data to the
associated data structure, and the underlying file system is already
using the private data pointer. You also have to keep a copy of the
underlying file system ops, and each data structure can be part of
multiple unions. So you end up with an effective second copy of the
file system data structure and a mess of linked lists or pointers.

> > One of the major difficulties that arises even when doing unioning at
> > the VFS level is keeping around the parent's path in order to do the
> > copyup later on. Take a look at the code pattern in the "union-mount:
> > Implement union-aware syscall()" series of patches. That's the
> > prettiest and most efficient version I could come up with, after two
> > other implementations, and it's in the VFS, at the vfs_foo_syscall()
> > level. I don't even know how I would start if I had to wait until the
> > file system op is called.
>
> On a high level I don't see a problem, the parent of every dentry can
> be found through ->d_parent.

Unfortunately, dentries aren't unioned; paths (dentry/mnt pairs) are.
So you can get the parent dentry in the file system op, but the dentry
is potentially part of many different mounts. There's no mapping from
a lower-level read-only dentry to the covering read-write parent
dentry, because the read-only dentry could be mounted in five
different places. Which union mount is this dentry part of? You have
to record the parent's path during lookup and carry it around until
you do the copyup, for every syscall that alters a file: not just
open() and write(), but also chmod(), etc. If you implement it in the
VFS, you don't have to carry that info across the file system op
boundary.

I think the chmod() case really shows the issues well. user_path_nd()
records the parent's path during lookup (in an inefficient, possibly
racy manner), then union_copyup() does the copy (too early, before a
lot of permission checks). The underlying file system doesn't get
involved until the ->setattr() call in notify_change(), and all it
gets is the dentry.
> One issue is having to duplicate some locking and other stuff around
> vfs_whatever() calls. But that could be fixed by exporting suitable
> helpers from the VFS.

That's somewhat of an issue right now. For union mounts to be most
efficient and wonderful, system calls should be separated into two
sequential parts called from the same context as the user_path()
lookup:
1) permission checks and all read-only checks that can fail.
[union copyup happens here]
2) the actual write or change to the file system

Otherwise we have to push the parent nameidata down through the stack
to where the actual change happens. So if I want to avoid copying up
the file unless chmod() succeeds, in the current code structure I'd
have to add a nameidata and a mnt to notify_change()'s arguments. But
this is an optimization, not a correctness problem.

-VAL