From: Al Viro <viro@ZenIV.linux.org.uk>
To: Erez Zadok <ezk@cs.sunysb.edu>
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
hch@infradead.org, viro@ftp.linux.org.uk,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
mhalcrow@us.ibm.com
Subject: Re: [UNIONFS] 00/29 Unionfs and related patches pre-merge review (v2)
Date: Sat, 2 Feb 2008 20:45:13 +0000 [thread overview]
Message-ID: <20080202204513.GO27894@ZenIV.linux.org.uk> (raw)
In-Reply-To: <200802021845.m12IjF8o014001@agora.fsl.cs.sunysb.edu>
On Sat, Feb 02, 2008 at 01:45:15PM -0500, Erez Zadok wrote:
> > You are thinking about non-interesting case. _Files_ are not much
> > of a problem. Directory tree is. The real problems with all unionfs and
> > stacking implementations I've seen so far, all way back to Heidemann et.al.
> > start when topology of the underlying layer changes.
>
> OK, so if I understand you, your concerns center around the fact that lower
> directories can be moved around (i.e., topology changes), what happens then
> for operations that go through the stackable f/s, and what should users
> expect to see.
Correct.
> > If you have clear
> > semantics for unionfs behaviour in presence of such things, by all means,
> > publish it - as far as I know *nobody* had done that; not even on the
> > "what should we see when..." level, nevermind the implementation.
>
> Since stacking and NFS have some similarities, I first checked w/ the NFS
> people to see what are their semantics in a similar scenario: an NFS client
> could be validating a directory, then issue, say, a ->create; but in
> between, the server could have moved the directory that was validated. In
> NFS, the ->create operation succeeds, and creates the file in the new
> location of the directory which was validated.
>
> Unionfs's behavior is similar: the newly created file will be successfully
> created in the moved directory. The only exception is that if a lower
> branch is marked readonly by unionfs, a copyup will take place.
Er... For unionfs it's a much more monumental problem. Look - you have
a mapping between the directory trees of layers; everything else is
defined in terms of that mapping ("this files shadows that file", etc.).
If you allow a mix of old and new mappings, you can easily run into the
situations when at some moment X1 covers Y1, X2 covers Y2, X2 is a descendent
of X1 and Y1 is a descendent of Y2. You *really* don't want to go there -
if nothing else, defining behaviour of copyup in face of that insanity
will be very painful.
What's more, and what makes NFS a bad model, NFS client doesn't lock
directories of NFS server. unionfs *does* locking of directories in
underlying layers, which means that you have an entirely new set of
constraints - you can deadlock easily. As the matter of fact, your
current code suffers from that problem - it violates the locking rules
in operations on covered layers.
> This had not been a problem for unionfs users to date. The main reason is
> that when unionfs users modify lower files, they often do so while there's
> little to no activity going through the union itself. And while it doesn't
> prevent directories from being moved around, this common usage mode does
> reduce the frequency in which topology changes can be an issue for unionfs
> users.
Ugh... "Currently unionfs users tend to use it ways that do not trigger
fs corruption" is nice, but doesn't really address the "current unionfs
implementation contains races leadint to fs corruption"...
> > around, changing parents, etc. Cross-directory rename() certainly rates
> > very high on the list of "WTF had they been smoking in UCB?" misfeatures,
> > but it's there and it has to be dealt with.
>
> Well, it was UCB, UCLA, and Sun. I don't think back in the early 90s they
> were too concerned about topology changes; f/s stacking was a new idea and
> they wanted to explore what can be done with it conceptually, not produce
> commercial-grade software (still, remind me to tell you the first-hand story
> I've learned about of how full-blown stacking _almost_ made it into Solaris
> 2.0 :-)
> The only known reference to try and address this coherency problem was
> Heidemann's SOSP'95 paper titled "Performance of cache coherence in
> stackable filing." The paper advocated changing the whole VFS and the
> caches (page cache + dnlc) to create a "unified cache manager" that was
> aware of complex layering topologies (including fan-out and fan-in). It was
> even able to handle compression layers, where file data offsets change b/t
> the layers (a nasty problem). Code for this unified cache manager was never
> released AFAIK. I think Heidemann's approach was elegant, but I don't think
> it was practical as it required radical VFS/VM surgery. Ironically, MS
> Windows has a single I/O cache manager that all storage and filesystem
> modules talk to directly (they're not allowed to pass IRPs directly b/t
> layers): so Windows can handle such coherency better than most Unix systems
> can today.
Different problem. IIRC, that paper implicitly assumed that mapping between
vnodes in different layers would _not_ be subject to massive surgeries from
operations involved.
> However, for this "view" idea to work, I'll need a way to lock-out or hide
> the lower directories from the namespace, so no one can access them in r-w
> mode any longer (if at all). Moreover, even if such a method was available,
> one would have to decide what to do with any open files on the lower branch
> (killing such processes or closing the descriptors seems somewhat harsh --
> maybe there'd be a way to "promote" them up the stack?)
Keep in mind that the same directory can be seen elsewhere in the same
namespace (let alone the different ones). The same fs can be seen in
any number of places...
> > BTW, and that's a completely unrelated story, I'd rather see whiteouts
> > done directly by filesystems involved - it would simplify the life big
> > way. How about adding a dir->i_op->whiteout(dir, dentry) and seeing if
> > your variant could be turned into such a method to be used by really
> > piss-poor filesystems? All UFS-related ones (including ext*) can trivially
> > support whiteouts without any PITA; adding them to tmpfs is also not a big
> > deal and anything that caches inode type in directory entries should be
> > easy to extend in the same way as ext*/ufs...
>
> I agree that WH is really needed and would certainly help. I've always felt
> that WH support wasn't technically very challenging, but logistically it was
> going to be a nightmare -- having to coordinate with many
> subsystem/filesystem maintainers over a lengthy period of time, no? Yo
> talk in more detail about it in person, I wouldn't mind taking a stab at
> this.
For ext* and tmpfs it's a matter of trivial kernel patches (I can do the
entire bunch easily) + coordination with Ted to get feature flag for
ext*/teach e2fsck not to throw up on directory entries with type "whiteout".
> BTW, before such WH support can be useful enough for unionfs, it'll have to
> be supported in ext*, tmpfs, _and_ nfs* (those are the most popular ones in
> use).
NFS is more interesting ;-/ There the fallback to your scheme might be the
best available variant, if it can be converted into nfs_whiteout(dir, dentry)
+ modifications to existing methods...
prev parent reply other threads:[~2008-02-02 20:45 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-01-10 14:59 [UNIONFS] 00/29 Unionfs and related patches pre-merge review (v2) Erez Zadok
2008-01-10 14:59 ` [PATCH 01/29] Unionfs: documentation Erez Zadok
2008-01-10 14:59 ` [PATCH 02/29] VFS/eCryptfs: use simplified fs_stack API to fsstack_copy_attr_all Erez Zadok
2008-01-10 14:59 ` [PATCH 03/29] Makefile: hook to compile unionfs Erez Zadok
2008-01-10 14:59 ` [PATCH 04/29] Unionfs: main Makefile Erez Zadok
2008-01-10 14:59 ` [PATCH 05/29] Unionfs: fanout header definitions Erez Zadok
2008-01-10 14:59 ` [PATCH 06/29] Unionfs: main header file Erez Zadok
2008-01-10 14:59 ` [PATCH 07/29] Unionfs: common file copyup/revalidation operations Erez Zadok
2008-01-10 14:59 ` [PATCH 08/29] Unionfs: basic file operations Erez Zadok
2008-01-10 14:59 ` [PATCH 09/29] Unionfs: lower-level copyup routines Erez Zadok
2008-01-10 14:59 ` [PATCH 10/29] Unionfs: dentry revalidation Erez Zadok
2008-01-10 14:59 ` [PATCH 11/29] Unionfs: lower-level lookup routines Erez Zadok
2008-01-10 14:59 ` [PATCH 12/29] Unionfs: rename method and helpers Erez Zadok
2008-01-10 14:59 ` [PATCH 13/29] Unionfs: directory reading file operations Erez Zadok
2008-01-10 14:59 ` [PATCH 14/29] Unionfs: readdir helper functions Erez Zadok
2008-01-10 14:59 ` [PATCH 15/29] Unionfs: readdir state helpers Erez Zadok
2008-01-10 14:59 ` [PATCH 16/29] Unionfs: inode operations Erez Zadok
2008-01-10 14:59 ` [PATCH 17/29] Unionfs: unlink/rmdir operations Erez Zadok
2008-01-10 14:59 ` [PATCH 18/29] Unionfs: address-space operations Erez Zadok
2008-01-10 14:59 ` [PATCH 19/29] Unionfs: mount-time and stacking-interposition functions Erez Zadok
2008-01-10 14:59 ` [PATCH 20/29] Unionfs: super_block operations Erez Zadok
2008-01-10 14:59 ` [PATCH 21/29] Unionfs: extended attributes operations Erez Zadok
2008-01-10 14:59 ` [PATCH 22/29] Unionfs: async I/O queue Erez Zadok
2008-01-10 14:59 ` [PATCH 23/29] Unionfs: miscellaneous helper routines Erez Zadok
2008-01-10 14:59 ` [PATCH 24/29] Unionfs: debugging infrastructure Erez Zadok
2008-01-10 14:59 ` [PATCH 25/29] Unionfs file system magic number Erez Zadok
2008-01-10 14:59 ` [PATCH 26/29] Unionfs: common header file for user-land utilities and kernel Erez Zadok
2008-01-10 14:59 ` [PATCH 27/29] VFS path get/put ops used by Unionfs Erez Zadok
2008-01-10 14:59 ` [PATCH 28/29] VFS: export release_open_intent symbol Erez Zadok
2008-01-10 14:59 ` [PATCH 29/29] Put Unionfs and eCryptfs under one layered filesystems menu Erez Zadok
2008-01-10 15:08 ` [UNIONFS] 00/29 Unionfs and related patches pre-merge review (v2) Christoph Hellwig
2008-01-10 15:57 ` Erez Zadok
2008-01-16 21:21 ` Michael Halcrow
2008-01-16 21:41 ` Erez Zadok
2008-01-17 6:00 ` Al Viro
2008-01-17 6:17 ` Erez Zadok
2008-01-26 5:08 ` Erez Zadok
2008-01-26 8:45 ` Al Viro
2008-02-02 18:45 ` Erez Zadok
2008-02-02 20:45 ` Al Viro [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080202204513.GO27894@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=akpm@linux-foundation.org \
--cc=ezk@cs.sunysb.edu \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mhalcrow@us.ibm.com \
--cc=torvalds@linux-foundation.org \
--cc=viro@ftp.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox