linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-unionfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [RFC][PATCH 4/4] ovl: allow concurrent copy up data of different files
Date: Sat, 12 Nov 2016 21:36:04 +0200	[thread overview]
Message-ID: <1478979364-10355-5-git-send-email-amir73il@gmail.com> (raw)
In-Reply-To: <1478979364-10355-1-git-send-email-amir73il@gmail.com>

Currently, all copy operations are serialized by taking the
lock_rename(workdir, upperdir) locks for the entire copy up
operation, including the data copy up.

Copying up data may take a long time with large files and
holding s_vfs_rename_mutex during that time blocks all
rename and copy up operations on the entire underlying fs.

This change addresses this problem by changing the copy up
locking scheme for different types of files as follows.

Directories:
  <maybe> inode_lock(ovl_inode)
    lock_rename(workdir, upperdir)
      copy up dir to workdir
      move dir to upperdir

Special and zero size files:
  inode_lock(ovl_inode)
    lock_rename(workdir, upperdir)
      copy up file to workdir (no data)
      move file to upperdir

Regular files with data:
  inode_lock(ovl_inode)
    lock_rename(workdir, upperdir)
      copy up file to workdir (no data)
    unlock_rename(workdir, upperdir)
      copy data to file in workdir
    lock_rename(workdir, upperdir)
      move file to upperdir

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++---
 fs/overlayfs/inode.c   |  3 +++
 2 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index a16127b..1b9705e 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -230,6 +230,44 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
 	return err;
 }
 
+/*
+ * Called with dentry->d_inode lock held only for the last (leaf) copy up
+ * from __ovl_copy_up(), so it is NOT held when called for ancestor
+ * directory from __ovl_copy_up()
+ *
+ * Called with lock_rename(workdir, upperdir) locks held.
+ *
+ * lock_rename() locks remain locked throughout the copy up
+ * of non regular files and zero sized regular files.
+ *
+ * lock_rename() locks are released during ovl_copy_up_data()
+ * of non zero sized regular files. During this time, the overlay
+ * dentry->d_inode lock is still held to avoid concurrent
+ * copy up of files with data.
+ *
+ * Maybe a better description of this locking scheme:
+ *
+ * Directories:
+ *   <maybe> inode_lock(ovl_inode)
+ *     lock_rename(workdir, upperdir)
+ *       copy up dir to workdir
+ *       move dir to upperdir
+ *
+ * Special and zero size files:
+ *   inode_lock(ovl_inode)
+ *     lock_rename(workdir, upperdir)
+ *       copy up file to workdir (no data)
+ *       move file to upperdir
+ *
+ * Regular files with data:
+ *  inode_lock(ovl_inode)
+ *    lock_rename(workdir, upperdir)
+ *      copy up file to workdir (no data)
+ *    unlock_rename(workdir, upperdir)
+ *      copy data to file in workdir
+ *    lock_rename(workdir, upperdir)
+ *      move file to upperdir
+ */
 static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 			      struct dentry *dentry, struct path *lowerpath,
 			      struct kstat *stat, const char *link)
@@ -274,16 +312,39 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
 	if (err)
 		goto out2;
 
-	if (S_ISREG(stat->mode)) {
+	if (S_ISREG(stat->mode) && stat->size > 0) {
 		struct path upperpath;
 
 		ovl_path_upper(dentry, &upperpath);
 		BUG_ON(upperpath.dentry != NULL);
 		upperpath.dentry = newdentry;
 
+		/*
+		 * Release rename locks, because copy data may take a long time,
+		 * and holding s_vfs_rename_mutex will block all rename and
+		 * copy up operations on the entire underlying fs.
+		 * We still hold the overlay inode lock to avoid concurrent
+		 * copy up of this file.
+		 */
+		unlock_rename(workdir, upperdir);
+
 		err = ovl_copy_up_data(lowerpath, &upperpath, stat->size);
+
+		/*
+		 * Re-aquire rename locks, before moving copied file into place.
+		 */
+		if (unlikely(lock_rename(workdir, upperdir) != NULL)) {
+			pr_err("overlayfs: failed to re-aquire lock_rename\n");
+			err = -EIO;
+		}
+
 		if (err)
 			goto out_cleanup;
+
+		if (WARN_ON(ovl_dentry_is_upper(dentry))) {
+			/* Raced with another copy-up? This shouldn't happen */
+			goto out_cleanup;
+		}
 	}
 
 	err = ovl_copy_xattr(lowerpath->dentry, newdentry);
@@ -366,15 +427,14 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 			return PTR_ERR(link);
 	}
 
-	err = -EIO;
-	if (lock_rename(workdir, upperdir) != NULL) {
+	if (unlikely(lock_rename(workdir, upperdir) != NULL)) {
 		pr_err("overlayfs: failed to lock workdir+upperdir\n");
+		err = -EIO;
 		goto out_unlock;
 	}
 
 	if (ovl_dentry_is_upper(dentry)) {
 		/* Raced with another copy-up?  Nothing to do, then... */
-		err = 0;
 		goto out_unlock;
 	}
 
@@ -433,11 +493,13 @@ static int __ovl_copy_up(struct dentry *dentry, int flags)
 	return err;
 }
 
+/* Called with dentry->d_inode lock held */
 int ovl_copy_up(struct dentry *dentry)
 {
 	return __ovl_copy_up(dentry, 0);
 }
 
+/* Called with dentry->d_inode lock held */
 int ovl_copy_up_open(struct dentry *dentry, int flags)
 {
 	return __ovl_copy_up(dentry, flags);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 7abae00..532b0d5 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -251,11 +251,14 @@ int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags)
 
 	type = ovl_path_real(dentry, &realpath);
 	if (ovl_open_need_copy_up(file_flags, type, realpath.dentry)) {
+		/* Take the overlay inode lock to avoid concurrent copy-up */
+		inode_lock(dentry->d_inode);
 		err = ovl_want_write(dentry);
 		if (!err) {
 			err = ovl_copy_up_open(dentry, file_flags);
 			ovl_drop_write(dentry);
 		}
+		inode_unlock(dentry->d_inode);
 	}
 
 	return err;
-- 
2.7.4


  parent reply	other threads:[~2016-11-12 19:36 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-12 19:36 [RFC][PATCH 0/4] ovl: Do not hold s_vfs_rename_mutex during data copy up Amir Goldstein
2016-11-12 19:36 ` [RFC][PATCH 1/4] vfs: update documentation for inode operation locking rules Amir Goldstein
2016-11-12 19:36 ` [RFC][PATCH 2/4] ovl: add helper ovl_dentry_is_upper() Amir Goldstein
2016-11-12 19:36 ` [RFC][PATCH 3/4] ovl: fold ovl_copy_up_truncate() into ovl_copy_up() Amir Goldstein
2016-12-03 17:04   ` Amir Goldstein
2016-12-05 14:51   ` Miklos Szeredi
2016-11-12 19:36 ` Amir Goldstein [this message]
2016-11-15 15:56   ` [RFC][PATCH 4/4] ovl: allow concurrent copy up data of different files Vivek Goyal
2016-11-15 19:20     ` Amir Goldstein
2016-11-15 21:57       ` Vivek Goyal
2016-11-16  5:27         ` Amir Goldstein
2016-11-20  8:27           ` Amir Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1478979364-10355-5-git-send-email-amir73il@gmail.com \
    --to=amir73il@gmail.com \
    --cc=koct9i@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).