linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mark Fasheh <mark.fasheh@oracle.com>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-fsdevel@vger.kernel.org, Jens Axboe <jens.axboe@oracle.com>,
	Michael Halcrow <mhalcrow@us.ibm.com>
Subject: Re: i_mutex locking in generic_file_splice_write()
Date: Thu, 12 Oct 2006 17:17:15 -0700	[thread overview]
Message-ID: <20061013001715.GV6485@ca-server1.us.oracle.com> (raw)
In-Reply-To: <20061012125409.40bd1fb1.akpm@osdl.org>

On Thu, Oct 12, 2006 at 12:54:09PM -0700, Andrew Morton wrote:
> > Shouldn't we be taking this before calling into ->prepare_write() and
> > ->commit_write(). What's preventing generic_file_splice_write() from racing
> > a truncate? Or maybe even another write?
> 
> The lock_page() will block truncate and will block write()s to this particular
> page.
Ok.


> > A quick look through other callers reveals that generic_file_aio_write() and
> > do_lo_send_aops() both are careful to take i_mutex.
> 
> I'm trying to remember what i_mutex actually protects in this context. 
> i_size, certainly - if we go changing the file size without locks then
> other places might get surprised.  For example, a concurrent write() at a
> larger file offset might try to increase i_size but if it loses the race
> against the unlocked i_size-changing thread, the inode ends up with the
> smaller i_size.
I'm also worried about concurrent allocation tree changes. Perhaps I'm
mistaken and all file systems we care about can handle them happening
concurrently, but otherwise couldn't two processes writing to different
sparse regions in a file cause problems? One process via file write and the
other via a splice write.


> So yup, we need i_mutex if only for that reason.
Ok. Here's a first pass. The double lock is ugly, but as far as I can tell
we need it. Unless there's a rule about ordering between pipe inodes and
"other" inodes that I don't know about.

Compile tested only. I probably won't get a chance to actually run it until
late this weekend at the earliest :/
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com


From: Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] Take i_mutex in splice_from_pipe()

The splice_actor may be calling ->prepare_write() and ->commit_write(). We
want i_mutex on the inode being written to before calling those so that we
don't race i_size changes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

---

 fs/splice.c |   21 +++++++++++++++++++--
 1 files changed, 19 insertions(+), 2 deletions(-)

96e880f22bd1c0e2809ebbfe5bf122ed67019e33
diff --git a/fs/splice.c b/fs/splice.c
index 13e92dd..e1ecb9e 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -713,6 +713,7 @@ ssize_t splice_from_pipe(struct pipe_ino
 {
 	int ret, do_wakeup, err;
 	struct splice_desc sd;
+	struct inode *inode = out->f_mapping->host;
 
 	ret = 0;
 	do_wakeup = 0;
@@ -722,8 +723,23 @@ ssize_t splice_from_pipe(struct pipe_ino
 	sd.file = out;
 	sd.pos = *ppos;
 
-	if (pipe->inode)
-		mutex_lock(&pipe->inode->i_mutex);
+	/*
+	 * The actor worker might be calling ->prepare_write and
+	 * ->commit_write. Most of the time, these expect i_mutex to
+	 * be held. Since this may result in an ABBA deadlock with
+	 * pipe->inode, we have to order lock acquiry here.
+	 */
+	if (pipe->inode) {
+		if (pipe->inode < inode) {
+			mutex_lock_nested(&pipe->inode->i_mutex, I_MUTEX_PARENT);
+			mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
+		} else {
+			mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT);
+			mutex_lock_nested(&pipe->inode->i_mutex, I_MUTEX_CHILD);
+		}
+	} else {
+		mutex_lock(&inode->i_mutex);
+	}
 
 	for (;;) {
 		if (pipe->nrbufs) {
@@ -799,6 +815,7 @@ ssize_t splice_from_pipe(struct pipe_ino
 
 	if (pipe->inode)
 		mutex_unlock(&pipe->inode->i_mutex);
+	mutex_unlock(&inode->i_mutex);
 
 	if (do_wakeup) {
 		smp_mb();
-- 
1.3.3


  reply	other threads:[~2006-10-13  0:17 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-12 19:01 i_mutex locking in generic_file_splice_write() Mark Fasheh
2006-10-12 19:54 ` Andrew Morton
2006-10-13  0:17   ` Mark Fasheh [this message]
2006-10-13  7:45     ` Jens Axboe
2006-10-13  8:11       ` Andrew Morton
2006-10-13  8:18         ` Jens Axboe
2006-10-13 19:44           ` Mark Fasheh
2006-10-15 18:05             ` Jens Axboe
2006-10-15 19:56               ` Mark Fasheh
2006-10-15 20:08                 ` Jens Axboe
2006-10-15 20:14                   ` Mark Fasheh
2006-10-16 17:58                     ` Andreas Dilger
2006-10-16 22:24                       ` Mark Fasheh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061013001715.GV6485@ca-server1.us.oracle.com \
    --to=mark.fasheh@oracle.com \
    --cc=akpm@osdl.org \
    --cc=jens.axboe@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mhalcrow@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).