linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Blunck <jblunck@suse.de>
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: viro@zeniv.linux.org.uk, bharata@in.ibm.com, dwmw2@infradead.org,
	mszeredi@suse.cz, vaurora@redhat.com
Subject: [PATCH 15/32] union-mount: Documentation
Date: Mon, 18 May 2009 18:09:11 +0200	[thread overview]
Message-ID: <1242662968-11684-16-git-send-email-jblunck@suse.de> (raw)
In-Reply-To: <1242662968-11684-1-git-send-email-jblunck@suse.de>

Add simple documentation about union mounting in general and this
implementation in specific.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
---
 Documentation/filesystems/union-mounts.txt |  187 ++++++++++++++++++++++++++++
 1 files changed, 187 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/union-mounts.txt

diff --git a/Documentation/filesystems/union-mounts.txt b/Documentation/filesystems/union-mounts.txt
new file mode 100644
index 0000000..15bb9d5
--- /dev/null
+++ b/Documentation/filesystems/union-mounts.txt
@@ -0,0 +1,187 @@
+VFS based Union Mounts
+----------------------
+
+ 1. What are "Union Mounts"
+ 2. The Union Stack
+ 3. Whiteouts, Opaque Directories, and Fallthrus
+ 4. Copy-up
+ 5. Directory Reading
+ 6. Known Problems
+ 7. References
+
+-------------------------------------------------------------------------------
+
+1. What are "Union Mounts"
+==========================
+
+Please note: this is NOT about UnionFS and it is NOT derived work!
+
+Traditionally the mount operation is opaque, which means that the content of
+the mount point, the directory where the file system is mounted on, is hidden
+by the content of the mounted file system's root directory until the file
+system is unmounted again. Unlike the traditional UNIX mount mechanism, that
+hides the contents of the mount point, a union mount presents a view as if
+both filesystems are merged together. Although only the topmost layer of the
+mount stack can be altered, it appears as if transparent file system mounts
+allow any file to be created, modified or deleted.
+
+Most people know the concepts and features of union mounts from other
+operating systems like Sun's Translucent Filesystem, Plan9 or BSD. For an
+in-depth review of union mounts and other unioning file systems, see:
+
+http://lwn.net/Articles/324291/
+http://lwn.net/Articles/325369/
+http://lwn.net/Articles/327738/
+
+Here are the key features of this implementation:
+- completely VFS based
+- does not change the namespace stacking
+- directory listings have duplicate entries removed in the kernel
+- writable unions: only the topmost file system layer may be writable
+- writable unions: new whiteout filetype handled inside the kernel
+
+-------------------------------------------------------------------------------
+
+2. The Union Stack
+==================
+
+The mounted file systems are organized in the "file system hierarchy" (tree of
+vfsmount structures), which keeps track about the stacking of file systems
+upon each other. The per-directory view on the file system hierarchy is called
+"mount stack" and reflects the order of file systems, which are mounted on a
+specific directory.
+
+Union mounts present a single unified view of the contents of two or more file
+systems as if they are merged together. Since the information which file
+system objects are part of a unified view is not directly available from the
+file system hierarchy there is a need for a new structure. The file system
+objects, which are part of a unified view are ordered in a so-called "union
+stack". Only directories can be part of a unified view.
+
+The link between two layers of the union stack is maintained using the
+union_mount structure (#include <linux/union.h>):
+
+struct union_mount {
+       atomic_t u_count;               /* reference count */
+       struct mutex u_mutex;
+       struct list_head u_unions;      /* list head for d_unions */
+       struct hlist_node u_hash;       /* list head for searching */
+       struct hlist_node u_rhash;      /* list head for reverse searching */
+
+       struct path u_this;             /* this is me */
+       struct path u_next;             /* this is what I overlay */
+};
+
+The union_mount structure holds a reference (dget,mntget) to the next lower
+layer of the union stack. Since a dentry can be part of multiple unions
+(e.g. with bind mounts) they are tied together via the d_unions field of the
+dentry structure.
+
+All union_mount structures are cached in two hash tables, one for lookups of
+the next lower layer of the union stack and one for reverse lookups of the
+next upper layer of the union stack. The reverse lookup is necessary to
+resolve CWD relative path lookups. For calculation of the hash value, the
+(dentry,vfsmount) pair is used. The u_this field is used for the hash table
+which is used in forward lookups and the u_next field for the reverse lookups.
+
+During every new mount (or mount propagation), a new union_mount structure is
+allocated. A reference to the mountpoint's vfsmount and dentry is taken and
+stored in the u_next field.  In almost the same manner an union_mount
+structure is created during the first time lookup of a directory within a
+union mount point. In this case the lookup proceeds to all lower layers of the
+union. Therefore the complete union stack is constructed during lookups.
+
+The union_mount structures of a dentry are destroyed when the dentry itself is
+destroyed. Therefore the dentry cache is indirectly driving the union_mount
+cache like this is done for inodes too. Please note that lower layer
+union_mount structures are kept in memory until the topmost dentry is
+destroyed.
+
+-------------------------------------------------------------------------------
+
+3. Whiteouts, Opaque Directories, and Fallthrus
+===========================================================
+
+The whiteout filetype isn't new. It has been there for quite some time now
+but Linux's VFS hasn't used it yet. With the availability of union mount code
+inside the VFS the whiteout filetype is getting important to support writable
+union mounts. For read-only union mounts, support for whiteouts or
+copy-on-open is not necessary.
+
+The whiteout filetype has the same function as negative dentries: they
+describe a filename which isn't there. The creation of whiteouts needs
+lowlevel filesystem support. At the time of writing this, there is whiteout
+support for tmpfs, ext2 and ext3 available. The VFS is extended to make the
+whiteout handling transparent to all its users. The whiteouts are not
+visible to user-space.
+
+What happens when we create a directory that was previously whited-out? We
+don't want the directory entries from underlying filesystems to suddenly appear
+in the newly created directory.  So we mark the directory opaque (the file
+system must support storage of the opaque flag).
+
+Fallthrus are directory entries that override the opaque flag on a directory
+for that specific directory entry name (the lookup "falls through" to the next
+layer of the union mount).  Fallthrus are mainly useful for implementing
+readdir().
+
+-------------------------------------------------------------------------------
+
+4. Copy-up
+===========
+
+Any write to an object on any layer other than the topmost triggers a copy-up
+of the object to the topmost file system. For regular files, the copy-up
+happens when it is opened in writable mode.
+
+Directories are copied up on open, regardless of intent to write, to simplify
+copy-up of any object located below it in the namespace. Otherwise we have to
+walk the entire pathname to create intermediate directories whenever we do a
+copy-up. This is the same approach as BSD union mounts and uses a negigible
+amount of disk space.  Note that the actual directory entries themselves are
+not copied-up from the lower levels until (a) the directory is written to, or
+(b) the first readdir() of the directory (more on that later).
+
+Rename across different levels of the union is implemented as a copy-up
+operation for regular files. Rename of directories simply returns EXDEV, the
+same as if we tried to rename across different mounts. Most applications have
+to handle this case anyway. Some applications do not expect EXDEV on
+rename operations within the same directory, but these applications will also
+be broken with bind mounts.
+
+-------------------------------------------------------------------------------
+
+5. Directory Reading
+====================
+
+readdir() is somewhat difficult to implement in a unioning file system. We must
+eliminate duplicates, apply whiteouts, and start up readdir() where we left
+off, given a single f_pos value. Our solution is to copy up all the directory
+entries to the topmost directory the first time readdir() is called on a
+directory. During this copy-up, we skip duplicates and entries covered by
+whiteouts, and then create fallthru entries for each remaining visible dentry.
+Then we mark the whole directory opaque. From then on, we just use the topmost
+file system's normal readdir() operation.
+
+-------------------------------------------------------------------------------
+
+6. Known Problems
+=================
+
+- copyup() for other filetypes that reg and dir (e.g. for chown() on devices)
+- symlinks are untested
+
+-------------------------------------------------------------------------------
+
+7. References
+=============
+
+[1] http://marc.info/?l=linux-fsdevel&m=96035682927821&w=2
+[2] http://marc.info/?l=linux-fsdevel&m=117681527820133&w=2
+[3] http://marc.info/?l=linux-fsdevel&m=117913503200362&w=2
+[4] http://marc.info/?l=linux-fsdevel&m=118231827024394&w=2
+
+Authors:
+Jan Blunck <jblunck@suse.de>
+Bharata B Rao <bharata@linux.vnet.ibm.com>
+Valerie Aurora <vaurora@redhat.com>
-- 
1.6.1.3


  parent reply	other threads:[~2009-05-18 16:09 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-18 16:08 [PATCH 00/32] VFS based Union Mount (V3) Jan Blunck
2009-05-18 16:08 ` [PATCH 01/32] atomic: Only take lock when the counter drops to zero on UP as well Jan Blunck
2009-05-18 16:08 ` [PATCH 02/32] VFS: BUG() if somebody tries to rehash an already hashed dentry Jan Blunck
2009-05-18 16:08 ` [PATCH 03/32] VFS: propagate mnt_flags into do_loopback Jan Blunck
2009-05-18 16:09 ` [PATCH 04/32] VFS: Make lookup_hash() return a struct path Jan Blunck
2009-05-18 16:09 ` [PATCH 05/32] VFS: Remove unnecessary micro-optimization in cached_lookup() Jan Blunck
2009-05-18 16:09 ` [PATCH 06/32] VFS: Make real_lookup() return a struct path Jan Blunck
2009-05-18 16:09 ` [PATCH 07/32] VFS: Introduce dput() variant that maintains a kill-list Jan Blunck
2009-05-18 16:09 ` [PATCH 08/32] whiteout: Don't return information about whiteouts to userspace Jan Blunck
2009-05-18 16:09 ` [PATCH 09/32] whiteout: Add vfs_whiteout() and whiteout inode operation Jan Blunck
2009-05-18 16:09 ` [PATCH 10/32] whiteout: Set S_OPAQUE inode flag when creating directories Jan Blunck
2009-05-18 16:09 ` [PATCH 11/32] whiteout: Add whiteout support to tmpfs Jan Blunck
2009-05-18 16:09 ` [PATCH 12/32] whiteout: Split of ext2_append_link() from ext2_add_link() Jan Blunck
2009-05-18 16:09 ` [PATCH 13/32] whiteout: Add whiteout support to ext2 Jan Blunck
2009-05-18 16:09 ` [PATCH 14/32] whiteout: Add path_whiteout() helper Jan Blunck
2009-05-18 16:09 ` Jan Blunck [this message]
2009-05-25  6:25   ` [PATCH 15/32] union-mount: Documentation hooanon05
2009-05-25  8:03     ` Arnd Bergmann
2009-05-25  8:43       ` hooanon05
2009-06-18 19:05         ` Valerie Aurora
2009-06-19  1:53           ` hooanon05
2009-05-18 16:09 ` [PATCH 16/32] union-mount: Introduce MNT_UNION and MS_UNION flags Jan Blunck
2009-05-18 16:09 ` [PATCH 17/32] union-mount: Introduce union_mount structure Jan Blunck
2009-05-18 16:09 ` [PATCH 18/32] union-mount: Drive the union cache via dcache Jan Blunck
2009-05-18 16:09 ` [PATCH 19/32] union-mount: Some checks during namespace changes Jan Blunck
2009-05-18 16:09 ` [PATCH 20/32] union-mount: Changes to the namespace handling Jan Blunck
2009-05-18 16:09 ` [PATCH 21/32] union-mount: Make lookup work for union-mounted file systems Jan Blunck
2009-05-19 16:15   ` Miklos Szeredi
2009-05-19 17:30     ` Valerie Aurora
2009-05-20 10:21       ` Miklos Szeredi
2009-05-18 16:09 ` [PATCH 22/32] union-mount: stop lookup when directory has S_OPAQUE flag set Jan Blunck
2009-05-18 16:09 ` [PATCH 23/32] union-mount: stop lookup when finding a whiteout Jan Blunck
2009-05-18 16:09 ` [PATCH 24/32] union-mount: in-kernel file copy between union mounted filesystems Jan Blunck
2009-05-18 16:09 ` [PATCH 25/32] union-mount: check for logically empty directory (FIXME) Jan Blunck
2009-05-18 16:09 ` [PATCH 26/32] union-mount: call do_whiteout() on unlink and rmdir Jan Blunck
2009-05-18 16:09 ` [PATCH 27/32] union-mount: Always create topmost directory on open Jan Blunck
2009-05-18 16:09 ` [PATCH 28/32] union-mount: Basic fallthru definitions Jan Blunck
2009-05-18 16:09 ` [PATCH 29/32] union mount: Support for fallthru entries in union mount lookup Jan Blunck
2009-05-18 16:09 ` [PATCH 30/32] union mount: ext2 fallthru support Jan Blunck
2009-05-18 16:32   ` Andreas Dilger
2009-05-19  9:42     ` Jan Blunck
2009-05-19 14:05       ` Andreas Dilger
2009-05-19 16:13         ` Jan Blunck
2009-05-18 16:09 ` [PATCH 31/32] union-mount: tmpfs " Jan Blunck
2009-05-18 16:09 ` [PATCH 32/32] union-mount: Copy up directory entries on first readdir() Jan Blunck
2009-05-18 20:40 ` [PATCH] Userland for VFS based Union Mount (V3) Valerie Aurora
2009-05-21 13:53   ` Andreas Dilger
2009-06-18  3:22     ` Valerie Aurora
2009-05-19  9:48 ` [PATCH 00/32] " Miklos Szeredi
2009-05-19 10:29   ` Jan Blunck
2009-05-19 10:35     ` Miklos Szeredi
2009-05-19 10:39       ` Jan Blunck
2009-05-19 11:54         ` Arnd Bergmann
2009-05-19 12:15           ` Jan Blunck
2009-05-19 12:21             ` Arnd Bergmann
2009-05-19 13:10               ` Jan Blunck
2009-05-19 17:23   ` Valerie Aurora
2009-05-20  9:05     ` Miklos Szeredi
2009-06-08 19:44       ` Valerie Aurora
2009-06-16 15:19         ` Miklos Szeredi
2009-05-21 12:54 ` Jan Rekorajski
2009-06-08 19:57   ` Valerie Aurora
2009-06-08 22:44     ` Jan Rekorajski
2009-06-08 22:48       ` Valerie Aurora
2009-06-15  9:55         ` Jan Rekorajski
2009-06-18  3:23           ` Valerie Aurora
2009-06-04 11:38 ` Scott James Remnant
2009-06-09 22:15   ` Valerie Aurora

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1242662968-11684-16-git-send-email-jblunck@suse.de \
    --to=jblunck@suse.de \
    --cc=bharata@in.ibm.com \
    --cc=dwmw2@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mszeredi@suse.cz \
    --cc=vaurora@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).