From: Randy Dunlap <rdunlap@xenotime.net>
To: David Howells <dhowells@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, viro@ZenIV.linux.org.uk,
valerie.aurora@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 18/73] union-mount: Union mounts documentation [ver #2]
Date: Sun, 26 Feb 2012 18:57:10 -0800
Message-ID: <4F4AF106.5050001@xenotime.net>
In-Reply-To: <20120221175947.25235.58759.stgit@warthog.procyon.org.uk>

On 02/21/2012 09:59 AM, David Howells wrote:
> From: Valerie Aurora <vaurora@redhat.com>
>
> Document design and implementation of union mounts (a.k.a. writable overlays).
>
> With corrections from Andreas Gruenbacher <agruen@suse.de>.
>
> Original-author: Valerie Aurora <vaurora@redhat.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
>
> Documentation/filesystems/union-mounts.txt | 712 ++++++++++++++++++++++++++++
> 1 files changed, 712 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/filesystems/union-mounts.txt
>
> diff --git a/Documentation/filesystems/union-mounts.txt b/Documentation/filesystems/union-mounts.txt
> new file mode 100644
> index 0000000..596bfe6
> --- /dev/null
> +++ b/Documentation/filesystems/union-mounts.txt
> @@ -0,0 +1,712 @@
> +Union mounts (a.k.a. writable overlays)
> +=======================================
> +
> +This document describes the architecture and current status of union mounts,
> +also known as writable overlays.
> +
> +In this document:
> + - Overview of union mounts
> + - Terminology
> + - VFS implementation
> + - Locking strategy
> + - VFS/file system interface
> + - Userland interface
> + - NFS interaction
> + - Status
> + - Contributing to union mounts
> +
> +Overview
> +========
> +
> +A union mount layers one read-write file system over one or more read-only file
> +systems, with all writes going to the writable file system. The namespace of
> +both file systems appears as a combined whole to userland, with files and
> +directories on the writable file system covering up any files or directories
> +with matching pathnames on the read-only file system. The read-write file
> +system is the "topmost" or "upper" file system and the read-only file systems
> +are the "lower" file systems. A few use cases:
> +
> +- Root file system on CD with writes saved to hard drive (LiveCD)
> +- Multiple virtual machines with the same starting root file system
> +- Cluster with NFS mounted root on clients
> +
> +Most if not all of these problems could be solved with a COW block device or a
problems? Should this be "use cases", to match the list above?
> +clustered file system (include NFS mounts). However, for some use cases,
> +sharing is more efficient and better performing if done at the file system
> +namespace level. COW block devices only increase their divergence as time goes
> +on, and a fully coherent writable file system is unnecessary synchronization
> +overhead if no other client needs to see the writes.
> +
> +What union mounts are not
> +-------------------------
> +
...
> +
> +Terminology
> +===========
> +
...
> +VFS objects and union mounts
> +----------------------------
> +
...
> +
> +In union mounts, a file system can only be the topmost layer for one union
> +mount. A file system can be part of multiple union mounts if it is a read-only
> +layer. So dentries in the read-only layers can be part of multiple unions,
> +while a dentry in the read-write layer can only be part of one unin.
typo: union.
> +
> +union_dir structure
> +---------------------
> +
...
> +/*
> + * The union_stack structure. It is an array of struct paths of
> + * directories below the topmost directory in a unioned directory, The
directory. (period instead of the comma)
> + * topmost dentry has a pointer to this structure. The topmost dentry
> + * can only be part of one union, so we can reference it from the
> + * dentry, but lower dentries can be part of multiple union stacks.
> + *
> + * The number of dirs actually allocated is kept in the superblock,
> + * s_union_count.
> + */
> +struct union_stack {
> + struct path u_dirs[0];
> +};
> +
> +This structure is flexible enough to support an arbitrary number of layers of
> +unioned file systems. Since there can be more than two layers, this section
> +will talk about mapping "upper" directories to "lower" directories, instead of
> +"topmost" directories to "bottom" directories.
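For readers unfamiliar with the idiom, here is a minimal userland sketch of how a structure like this is typically sized and indexed. It is not code from the patch set: struct path_stub, union_stack_alloc() and union_stack_lower() are hypothetical names, and the layer count is passed in as a parameter, whereas the comment above says the kernel keeps it in the superblock (s_union_count).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct path. */
struct path_stub {
	void *mnt;
	void *dentry;
};

/*
 * Mirrors the layout quoted above: nothing but a zero-length array
 * (a GNU C extension), so sizeof(struct union_stack_stub) is 0 and
 * the allocation size is decided entirely by the caller.
 */
struct union_stack_stub {
	struct path_stub u_dirs[0];
};

/* Allocate zeroed slots for 'layers' lower directories. */
static struct union_stack_stub *union_stack_alloc(unsigned int layers)
{
	if (layers == 0)
		return NULL;
	return calloc(layers, sizeof(struct path_stub));
}

/* Return slot i of the stack, or NULL if i is out of range. */
static struct path_stub *union_stack_lower(struct union_stack_stub *stack,
					   unsigned int layers,
					   unsigned int i)
{
	if (stack == NULL || i >= layers)
		return NULL;
	return &stack->u_dirs[i];
}
```

Because the structure does not record its own length, every accessor needs the count from somewhere else. It might be worth stating that explicitly in the text.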
> +
> +Traversing the union stack
> +--------------------------
> +
...
> +Permission checks
> +-----------------
> +
...
> +
> +inode_permission() calls sb_permission() and __inode_permission() on the same
> +path. We create path_permission() which calls sb_permission() on the parent
> +directory from the top layer, and __inode_permission() on the target on the
> +lower layer. This gets us the correct write permissions consdering that the
considering
> +file will be copied up.
> +
> +Locking strategy
> +================
> +
> +The current union mount locking strategy is based on the following
> +rules:
> +
> +* The lower layer file system is always read-only
> +* The topmost file system is always read-write
> + => A file system can never a topmost and lower layer at the same time
can never be topmost and a lower layer at the same time
> +
> +Additionally, the topmost layer may only be mounted exactly once. Don't think
> +of the topmost layer as a separate independent file system; when it is part of
> +a union mount, it is only a file system in conjunction with the read-only
> +bottom layer. The read-only bottom layer is an independent file system in and
> +of itself and can be mounted elsewhere, including as the bottom layer for
> +another union mount.
> +
> +Thus, we may define a stable locking order in terms of top layer and bottom
> +layer locks, since a top layer is never a bottom layer and a bottom layer is
> +never a top layer. Another simplifying assumption is that all directories in a
> +pathname exist on the top layer, as they are created step-by-step during
> +lookup. This prevents us from ever having to walk backwards up the path
> +creating directory entries, which can get complicated. By implication, parent
> +directories paths during any operation (rename(), unlink(),etc.) are from the
directory paths
> +top layer. Dentries for directories from the bottom layer are only ever seen
> +or used by the lookup code.
> +
> +The two major problems we avoid with the above rules are:
> +
> +Lock ordering: Imagine two union stacks with the same two file systems: A
> +mounted over B, and B mounted over A. Sometimes locks on objects in both A and
> +B will have to be held simultanously. What order should they be acquired in?
simultaneously.
> +Simply acquiring them from top to bottom will create a lock-ordering problem -
> +one thread acquires lock on object from A and then tries for a lock on object
> +from B, while another thread grabs the lock on object from B and then waits for
> +the lock on object from A. Some other lock ordering must be defined.
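A small example might help readers here. Below is a userland sketch, using pthread mutexes, of how a per-class order avoids the A/B versus B/A deadlock once a file system can only ever be a top layer or a bottom layer. It assumes "top before bottom" as the order, which the text does not actually spell out. All of the names (layer_obj, first_to_lock(), lock_pair(), unlock_pair()) are hypothetical and not from the patches.

```c
#include <pthread.h>
#include <stddef.h>

/* A file system is either a top layer or a bottom layer, never both. */
enum layer_class { LAYER_TOP, LAYER_BOTTOM };

struct layer_obj {
	enum layer_class cls;
	pthread_mutex_t lock;
};

/*
 * Decide which of two objects is locked first. The answer depends only
 * on the layer class, never on argument order, so two threads locking
 * the same pair always agree. Returns NULL if both objects are in the
 * same class, which this sketch does not handle.
 */
static struct layer_obj *first_to_lock(struct layer_obj *a,
				       struct layer_obj *b)
{
	if (a->cls == b->cls)
		return NULL;
	return a->cls == LAYER_TOP ? a : b;
}

/* Lock both objects, top layer first (assumed order). */
static int lock_pair(struct layer_obj *a, struct layer_obj *b)
{
	struct layer_obj *first = first_to_lock(a, b);
	struct layer_obj *second;

	if (first == NULL)
		return -1;
	second = (first == a) ? b : a;
	pthread_mutex_lock(&first->lock);
	pthread_mutex_lock(&second->lock);
	return 0;
}

/* Unlock order does not matter for deadlock avoidance. */
static void unlock_pair(struct layer_obj *a, struct layer_obj *b)
{
	pthread_mutex_unlock(&a->lock);
	pthread_mutex_unlock(&b->lock);
}
```

With "A over B" and "B over A" both allowed, no such class-based order exists, which is the point of the paragraph above.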
> +
> +Movement/change/disappearance of objects on multiple layers: A variety of nasty
> +corner cases arise when more than one layer is changing at the same time.
> +Changes in the directory topology and their effect on inheritance are of
> +special concern. Al Viro's canonical email on the subject:
> +
> +http://lkml.indiana.edu/hypermail/linux/kernel/0802.0/0839.html
> +
> +We don't try to solve any of these cases, just avoid them in the first place.
> +
> +Todo: Prevent top layer from being mounted more than once.
> +
...
> +Userland support
> +================
> +
> +The mount command must support the "-o union" mount option and pass the
> +corresponding MS_UNION flag to the kerel. A util-linux git tree with union
kernel.
> +mount support is here:
> +
> +git://git.kernel.org/pub/scm/utils/util-linux-ng/val/util-linux-ng.git
> +
> +File system utilities must support whiteouts and fallthrus. An e2fsprogs git
> +tree with union mount support is here:
> +
> +git://git.kernel.org/pub/scm/fs/ext2/val/e2fsprogs.git
> +
> +Currently, whiteout directory entries are not returned to userland. While the
> +directory type for whiteouts, DT_WHT, has been defined for many years, very
> +little userland code handles them. Userland will never see fallthru directory
> +entries.
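Since the text says very little userland code handles DT_WHT, a short example of what handling it looks like could be useful. The sketch below is only an illustration, not something from the util-linux or e2fsprogs trees: count_visible_entries() is a hypothetical helper that skips whiteout entries defensively, in case a kernel ever does return them. It relies only on readdir(3) and the d_type field as provided by glibc.

```c
#define _DEFAULT_SOURCE		/* for the DT_* constants in glibc */
#include <dirent.h>
#include <stddef.h>

/*
 * Count the entries in a directory, ignoring whiteouts.
 * Returns -1 if the directory cannot be opened.
 */
static int count_visible_entries(const char *dirpath)
{
	DIR *dir = opendir(dirpath);
	struct dirent *de;
	int count = 0;

	if (dir == NULL)
		return -1;

	while ((de = readdir(dir)) != NULL) {
#ifdef DT_WHT
		/* A whiteout marks a name as deleted; do not report it. */
		if (de->d_type == DT_WHT)
			continue;
#endif
		count++;
	}
	closedir(dir);
	return count;
}
```

Note that d_type may be DT_UNKNOWN on file systems that do not fill it in, so a check like this can only ever be a best effort.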
...
> +Non-features
> +------------
> +
...
> +Read-only top layer: The readdir() strategy fundamentally requires the ability
> +to create persistent directory entries on the top layer file system (which may
> +be tmpfs). However, you can union two read-only file systems by union mounting
> +a third file system (such as tmpfs) over the two read-onlly file systems.
read-only
> +Numerous alternatives to this readdir() strategy (including in-kernel or
> +in-application caching) exist and are compatible with union mounts with its
> +writing-readdir() implementation disabled. Creating a readdir() cookie that is
> +stable across multiple readdir()s requires one of:
> +
> +- Write to stable storage (e.g., fallthru dentries)
> +- Non-evictable kernel memory cache (doesn't handle NFS server reboot)
> +- Per-application caching by glibc readdir()
> +
> +Often these features are supported by other unioning file systems or by other
> +versions of union mounts.
--
~Randy