linux-fsdevel.vger.kernel.org archive mirror
From: Jamie Lokier <jamie@shareable.org>
To: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Jan Blunck <jblunck@suse.de>
Subject: Re: [PATCH 17/39] union-mount: Union mounts documentation
Date: Tue, 4 May 2010 22:12:09 +0100
Message-ID: <20100504211209.GC4360@shareable.org>
In-Reply-To: <1272928358-20854-18-git-send-email-vaurora@redhat.com>

Valerie Aurora wrote:
> +File copyup: Create a file on the top layer that has the same metadata
> +and contents as the file with the same pathname on the bottom layer.

Can copyup be interrupted?  E.g. if I chmod an 80GB file, will the
chmod() system call pause for a couple of hours, or can I control-C it?
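
To make "interruptible" concrete, here is a minimal userspace sketch of
a chunked copy that can be cancelled between chunks.  It is not the
copyup code from this series: the names copy_interruptible(),
install_cancel_handler() and copy_cancelled are hypothetical, and the
64KB chunk size is an arbitrary assumption.  The only point is that a
long copy can check for a pending signal at chunk boundaries and fail
with EINTR instead of running to completion.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the signal handler to request cancellation of the copy. */
static volatile sig_atomic_t copy_cancelled;

static void on_sigint(int sig)
{
	(void)sig;
	copy_cancelled = 1;
}

/* Install the SIGINT handler without SA_RESTART, so blocking calls
 * return EINTR rather than being restarted.  Returns 0 on success. */
static int install_cancel_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigint;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGINT, &sa, NULL);
}

/*
 * Copy from in_fd to out_fd until end of file, checking the
 * cancellation flag before each chunk.  Returns 0 on success, or -1
 * with errno set (EINTR if the copy was cancelled).
 */
static int copy_interruptible(int in_fd, int out_fd)
{
	char buf[65536];	/* chunk size is an arbitrary choice */

	assert(in_fd >= 0 && out_fd >= 0);
	for (;;) {
		if (copy_cancelled) {
			errno = EINTR;
			return -1;
		}
		ssize_t n = read(in_fd, buf, sizeof(buf));
		if (n == 0)
			return 0;		/* end of file */
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* go back and re-check the flag */
			return -1;
		}
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
			if (w < 0) {
				if (errno == EINTR)
					continue;	/* retry this write */
				return -1;
			}
			off += w;
		}
	}
}
```

A cancelled copy leaves a partial destination file behind, so a real
implementation would also have to remove or hide the incomplete copy
before returning.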

> +This deviation from standard is due to technical limitations of the
> +union mount implementation.  Specifically, we would need to replace an
> +open file descriptor from the lower layer with an open file descriptor
> +for a file with matching pathname and contents on the upper layer,
> +which is difficult to do.  We avoid this in other system calls by
> +doing the copyup before the file is opened.  Unionfs doesn't encounter
> +this problem because it creates a dummy file struct which redirects or
> +fans out operations to the struct files for the underlying file
> +systems.
> +
> +From an application's point of view, the result of an in-kernel file
> +copyup is the logical equivalent of another application updating the
> +file via the rename() pattern: creat() a new file, copy the data over,
> +make changes to the copy, and rename() over the old version.  Any
> +existing open file descriptors for that file (including those in the
> +same application) refer to a now invisible object that used to have
> +the same pathname.  Only opens that occur after the copyup will see
> +updates to the file.

Does it apply the same permission checks that a program doing
copy+rename would have to pass?  I guess that is just write access to
the directory.

Does it effectively "rename" all hard links referring to the file, so
that they point to the new version, or does it only affect the path
that was used by the writer/modifier, leaving the other links to
continue referring to the original file?
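
For reference, the plain userspace behaviour that the quoted text
compares copyup to can be sketched as below.  This is a minimal sketch
on an ordinary file system, not union mount code: the helper names and
the /tmp path are made up for illustration, and the "copy" step is
reduced to writing the new contents directly.  It shows a descriptor
opened before the update still reading the old data, and a second hard
link still naming the original file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create (or truncate) path and fill it with text.  0 on success. */
static int write_file(const char *path, const char *text)
{
	size_t len = strlen(text);
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	ssize_t n = write(fd, text, len);
	if (close(fd) < 0 || n != (ssize_t)len)
		return -1;
	return 0;
}

/* Read from offset 0 of an open descriptor into buf as a C string. */
static int read_fd(int fd, char *buf, size_t size)
{
	assert(size > 0);
	ssize_t n = pread(fd, buf, size - 1, 0);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return 0;
}

/* Read whatever file is reachable at path right now (a fresh open). */
static int read_path(const char *path, char *buf, size_t size)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	int ret = read_fd(fd, buf, size);
	close(fd);
	return ret;
}

/*
 * Returns 0 if the expected behaviour is observed, -1 otherwise.
 * Error paths skip cleanup to keep the sketch short.
 */
static int demo_copy_rename(void)
{
	char dir[] = "/tmp/copyup-demo-XXXXXX";
	char a[64], b[64], tmp[64], buf[16];

	if (!mkdtemp(dir))
		return -1;
	snprintf(a, sizeof(a), "%s/a", dir);
	snprintf(b, sizeof(b), "%s/b", dir);
	snprintf(tmp, sizeof(tmp), "%s/tmp", dir);

	/* One file with two names, and a descriptor opened before the update. */
	if (write_file(a, "old") < 0 || link(a, b) < 0)
		return -1;
	int fd = open(a, O_RDONLY);
	if (fd < 0)
		return -1;

	/* The update: create a new file, then rename() it over name "a". */
	if (write_file(tmp, "new") < 0 || rename(tmp, a) < 0)
		return -1;

	/* The old descriptor still reads the old object. */
	int ok = read_fd(fd, buf, sizeof(buf)) == 0 && !strcmp(buf, "old");
	/* A fresh open of "a" sees the new file. */
	ok = ok && read_path(a, buf, sizeof(buf)) == 0 && !strcmp(buf, "new");
	/* The other hard link still names the original file. */
	ok = ok && read_path(b, buf, sizeof(buf)) == 0 && !strcmp(buf, "old");

	close(fd);
	unlink(a);
	unlink(b);
	rmdir(dir);
	return ok ? 0 : -1;
}
```

With plain rename() the other link is left alone, which is why it
matters whether copyup follows the same rule or tries to redirect every
link.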

> + - File copyup on open(O_DIRECT)

Why is O_DIRECT relevant?  O_DIRECT doesn't imply writing, and
copy+rename behaviour is the same with O_DIRECT as not.

Some programs use O_DIRECT to read very large files with no intention
of ever modifying them.  For example, qemu uses O_DIRECT to access a
disk image backing file.
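
To illustrate the point, here is a minimal sketch of a copyup decision
based on the access mode alone.  open_needs_copyup() is a hypothetical
helper, not a function from the patch set, and it deliberately ignores
every other flag; it only shows that O_DIRECT and write access are
independent bits of the open() flags.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes O_DIRECT in <fcntl.h> on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

#ifndef O_DIRECT
#define O_DIRECT 0	/* fallback so the sketch compiles where the flag
			 * is not exposed; the access-mode test is unaffected */
#endif

/*
 * Hypothetical policy: only an open that asks for write access needs
 * the file copied up.  O_DIRECT is not part of the access mode, so it
 * has no influence on the result.
 */
static bool open_needs_copyup(int flags)
{
	return (flags & O_ACCMODE) != O_RDONLY;
}

/* The cases discussed above, written out as checks. */
static void check_examples(void)
{
	/* Reading a large backing file with O_DIRECT: no copyup. */
	assert(!open_needs_copyup(O_RDONLY | O_DIRECT));
	/* Write access needs copyup, with or without O_DIRECT. */
	assert(open_needs_copyup(O_RDWR | O_DIRECT));
	assert(open_needs_copyup(O_WRONLY));
}
```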

> +NFS interaction
> +===============
> +
> +NFS is currently not supported as either type of layer.  NFS as
> +read-only layer requires support from the server to honor the
> +read-only guarantee needed for the bottom layer.  To do this, the
> +server needs to revoke access to clients requesting read-only file
> +systems if the exported file system is remounted read-write or
> +unmounted (during which arbitrary changes can occur).  Some recent
> +discussion:
> +
> +http://markmail.org/message/3mkgnvo4pswxd7lp
> +
> +NFS as the read-write layer would require implementation of the
> +->whiteout() and ->fallthru() methods.  DT_WHT directory entries are
> +theoretically already supported.
> +
> +Also, technically the requirement for a readdir() cookie that is
> +stable across reboots comes only from file systems exported via NFSv2:
> +
> +http://oss.oracle.com/pipermail/btrfs-devel/2008-January/000463.html
> +
> +Todo:
> +
> +- Guarantee really really read-only on NFS exports
> +- Implement whiteout()/fallthru() for NFS

I'm finding it hard to imagine _guaranteeing_ really read-only.  All
you can guarantee is that the NFS server says it is read-only.

For example, a userspace NFS server cannot prevent the filesystem it's
serving from changing.

Is this not a problem with other network filesystems like CIFS, P9, FUSE?

> +Known non-POSIX behaviors
> +-------------------------
> +
> +- Link count may be wrong for files on bottom layer with > 1 link count

Can you say a bit more about what will be seen?

> +- File copyup is the logical equivalent of an update via copy +
> +  rename().  Any existing open file descriptors will continue to refer
> +  to the read-only copy on the bottom layer and will not see any
> +  changes that occur after the copy-up.

I can imagine some database-like programs getting confused by that.

Maybe it would be better to fail copyup operations when the file is
currently open O_RDONLY by anyone, analogous to the way a writable
mount is refused while any union holds that file system read-only?

Are there uses likely to be broken by that behaviour?

Thanks,
-- Jamie
