From: "J. R. Okajima" <hooanon05@yahoo.co.jp>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>, David Howells <dhowells@redhat.com>,
Miklos Szeredi <miklos@szeredi.hu>,
torvalds@linux-foundation.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, hch@infradead.org,
akpm@linux-foundation.org, apw@canonical.com, nbd@openwrt.org,
neilb@suse.de, jordipujolp@gmail.com, ezk@fsl.cs.sunysb.edu,
sedat.dilek@googlemail.com, mszeredi@suse.cz
Subject: Re: [PATCH 2/9] vfs: export do_splice_direct() to modules
Date: Tue, 19 Mar 2013 18:00:37 +0900 [thread overview]
Message-ID: <20556.1363683637@jrobl> (raw)
In-Reply-To: <20130319013805.GG21522@ZenIV.linux.org.uk>
Al Viro:
> BTW, I wonder what's the right locking for that sucker; overlayfs is probably
> too heavy - we are talking about copying a file from one fs to another, which
> can obviously take quite a while, so holding ->i_mutex on _parent_ all along
> is asking for very serious contention. OTOH, there's a pile of unpleasant
Yes, holding parent->i_mutex can be longer.
Using splice function for copy-up is simple and (probably) fastest. But
it doesn't support a "hole" in the file (sparse file). All holes are
filled with NUL byte and consumes a disk block on the upper layer. It is
a problem, especially for users who have smaller tmpfs as his upper
layer.
The copy-up with considering a hole may cost more, but it can save the
storage consumtion.
> Another fun issue is copyup vs. copyup - we want to sit and wait for copyup
> attempt in progress to complete, rather than start another one in parallel.
> And whoever comes the second must check if copyup has succeeded, obviously -
> it's possible to have user run into 5% limit and fail copyup, followed by
> root doing it successfully.
"5% limit" means the reserved are for a superuser on a filesystem,
right? As far as I know, overlayfs (UnionMount too?) solves this problem
as changing the task credential and the capability. But I am not sure
whether it solves the similar problem around the resource limit like
RLIMIT_CPU, RLIMIT_FSIZE or something.
> Another one: overwriting rename() vs. copyup. Similar to unlink() vs. copyup().
Hmm, do you mean that, just after the parent dir lock on the underlying
fs, the copyup routine should confirm whether the target is still alive?
If so, I agree.
> Another one: read-only open() vs. copyup(). Hell knows - we obviously don't
> want to open a partially copied file; might want to wait or pretend that this
> open() has come first and give it the underlying file. The same goes for
> stat() vs. copyup().
Exactly.
Moreover users don't want to refer to the lower file which is obsoleted
by copyup.
> FWIW, something like "lock parent, ->create(), ->unlink(), unlock parent,
> copy data and metadata, lock parent, allocate a new dentry in covering layer
> and do ->lookup() on it, verify that it is negative and not a whiteout, lock
> child, use ->link() to put it into directory, unlock everything" would
> probably DTRT for unlink/copyup and rename/copyup. The rest... hell knows;
Please let me make sure.
You are saying,
- create the file on the upper layer
- get the "struct file" object
- hide it from users
- before copying the file, unlock the parent in order to stop the long
period locking
- copy the file without the parent lock. it doesn't matter since the
file is invisible to users.
- confirm that the target name is still available for copyup
- make the completed file visible by ->link()
It is ineteresting to ->link() with the unlinked file. While
vfs_unlink() rejects such case, it may not matter for the underlying fs.
Need to verify FS including jounals.
J. R. Okajima
next prev parent reply other threads:[~2013-03-19 9:00 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-13 14:16 [PATCH 0/9] overlay filesystem: request for inclusion (v17) Miklos Szeredi
2013-03-13 14:16 ` [PATCH 1/9] vfs: add i_op->dentry_open() Miklos Szeredi
2013-03-13 22:44 ` Andrew Morton
2013-03-14 11:15 ` Miklos Szeredi
2013-03-13 14:16 ` [PATCH 2/9] vfs: export do_splice_direct() to modules Miklos Szeredi
2013-03-13 22:45 ` Andrew Morton
2013-03-13 14:16 ` [PATCH 3/9] vfs: export __inode_permission() " Miklos Szeredi
2013-03-13 14:16 ` [PATCH 4/9] vfs: introduce clone_private_mount() Miklos Szeredi
2013-03-13 22:48 ` Andrew Morton
2013-03-14 13:28 ` Miklos Szeredi
2013-03-13 14:16 ` [PATCH 5/9] overlay filesystem Miklos Szeredi
2013-03-13 22:53 ` Andrew Morton
2013-03-13 14:16 ` [PATCH 6/9] overlayfs: add statfs support Miklos Szeredi
2013-03-13 14:16 ` [PATCH 7/9] overlayfs: implement show_options Miklos Szeredi
2013-03-13 14:16 ` [PATCH 8/9] overlay: overlay filesystem documentation Miklos Szeredi
2013-03-13 23:06 ` Andrew Morton
2013-03-14 13:35 ` Miklos Szeredi
2013-03-13 14:16 ` [PATCH 9/9] fs: limit filesystem stacking depth Miklos Szeredi
2013-03-13 14:31 ` [PATCH 0/9] overlay filesystem: request for inclusion (v17) Sedat Dilek
2013-03-13 15:13 ` Sedat Dilek
2013-03-13 15:18 ` Miklos Szeredi
2013-03-13 15:26 ` Sedat Dilek
2013-03-13 15:53 ` Sedat Dilek
2013-03-13 16:10 ` Sedat Dilek
2013-03-13 16:21 ` Miklos Szeredi
2013-03-13 16:35 ` Sedat Dilek
2013-03-13 16:51 ` Sedat Dilek
2013-03-13 18:12 ` Robin Holt
2013-03-13 18:37 ` Felix Fietkau
2013-03-13 19:10 ` Sedat Dilek
2013-03-13 19:54 ` Eric W. Biederman
2013-03-13 19:58 ` Linus Torvalds
2013-03-13 20:27 ` Sedat Dilek
[not found] ` <CAB3woddVfZ9PdYPpzidJLBMmUeRx0Rxgb5Pc8bTM9U-tkcS_uA@mail.gmail.com>
2013-03-13 20:32 ` Sedat Dilek
2013-03-13 20:36 ` Phillip Lougher
2013-03-13 23:08 ` Andrew Morton
2013-03-14 13:43 ` Miklos Szeredi
2013-03-15 1:25 ` Al Viro
2013-03-15 4:15 ` J. R. Okajima
2013-03-15 4:44 ` Al Viro
2013-03-15 5:09 ` J. R. Okajima
2013-03-15 5:13 ` Al Viro
2013-03-15 8:15 ` James Bottomley
2013-03-15 12:12 ` Al Viro
2013-03-15 18:57 ` J. R. Okajima
2013-03-15 19:26 ` Erez Zadok
2013-03-15 20:30 ` Al Viro
2013-03-16 13:55 ` J. R. Okajima
2013-03-15 19:11 ` Linus Torvalds
2013-03-16 13:57 ` J. R. Okajima
2013-03-17 13:06 ` [PATCH 2/9] vfs: export do_splice_direct() to modules David Howells
2013-03-18 2:31 ` Dave Chinner
2013-03-18 15:39 ` Jan Kara
2013-03-18 21:53 ` Al Viro
2013-03-18 23:01 ` Al Viro
2013-03-19 1:38 ` Al Viro
2013-03-19 9:00 ` J. R. Okajima [this message]
2013-03-19 10:29 ` Miklos Szeredi
2013-03-19 17:03 ` Al Viro
2013-03-19 18:32 ` Miklos Szeredi
2013-03-19 21:24 ` Al Viro
2013-03-20 9:15 ` Miklos Szeredi
2013-03-19 11:04 ` David Howells
2013-03-19 11:40 ` Miklos Szeredi
2013-03-19 20:54 ` Jan Kara
2013-03-19 20:25 ` Jan Kara
2013-03-19 21:38 ` Al Viro
2013-03-19 22:10 ` Al Viro
2013-03-20 2:33 ` Al Viro
2013-03-20 19:52 ` Jan Kara
2013-03-20 21:48 ` Al Viro
2013-03-20 22:19 ` Jan Kara
2013-03-20 12:30 ` David Howells
2013-03-22 17:37 ` J. R. Okajima
2013-03-22 18:11 ` Al Viro
2013-03-22 18:21 ` Al Viro
2013-03-23 2:49 ` J. R. Okajima
2013-03-23 4:41 ` Al Viro
2013-03-23 5:37 ` J. R. Okajima
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20556.1363683637@jrobl \
--to=hooanon05@yahoo.co.jp \
--cc=akpm@linux-foundation.org \
--cc=apw@canonical.com \
--cc=dhowells@redhat.com \
--cc=ezk@fsl.cs.sunysb.edu \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=jordipujolp@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=mszeredi@suse.cz \
--cc=nbd@openwrt.org \
--cc=neilb@suse.de \
--cc=sedat.dilek@googlemail.com \
--cc=torvalds@linux-foundation.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).