From: Mark Fasheh <mark.fasheh@oracle.com>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@osdl.org>,
linux-fsdevel@vger.kernel.org,
Michael Halcrow <mhalcrow@us.ibm.com>
Subject: Re: i_mutex locking in generic_file_splice_write()
Date: Fri, 13 Oct 2006 12:44:42 -0700 [thread overview]
Message-ID: <20061013194442.GA2954@ca-server1.us.oracle.com> (raw)
In-Reply-To: <20061013081855.GG6515@kernel.dk>
On Fri, Oct 13, 2006 at 10:18:56AM +0200, Jens Axboe wrote:
> > Taking the lowest-addressed lock first is the usual convention, if we really
> > have to do that. I'm not aware of anywhere else where we pull this
> > trick with i_mutex though.
>
> It is, but people had concerns with that approach when it was originally
> done for pipe tee'ing. Things like lock_rename() look scary.
Yeah, I'm not too thrilled with the double locking either. Eventually I
think we'll need some _nolock variant of generic_file_splice_write() so that
OCFS2 can order cluster locks within i_mutex (much like you see with
ocfs2_file_aio_write(), etc).
At the very least, I'd rather not have this stuff open coded everywhere. If
we consolidate the behavior within a set of functions, we'll be able to
avoid questions later about locking order, etc.
I felt dirty just typing up this patch. Sigh. Let me know what you think.
--Mark
From: Mark Fasheh <mark.fasheh@oracle.com>
[PATCH] Take i_mutex in splice_from_pipe()
The splice_actor may be calling ->prepare_write() and ->commit_write(). We
want i_mutex on the inode being written to before calling those so that we
don't race i_size changes.
The double locking behavior is done elsewhere in splice.c, and if we
eventually want _nolock variants of generic_file_splice_write(), fs modules
might have to replicate the nasty locking code. We introduce
inode_double_lock() and inode_double_unlock() to consolidate the locking
rules into one set of functions.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
---
fs/splice.c | 24 +++++++++++-------------
include/linux/fs.h | 29 +++++++++++++++++++++++++++++
2 files changed, 40 insertions(+), 13 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index 13e92dd..4d8a774 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -713,6 +713,7 @@ ssize_t splice_from_pipe(struct pipe_ino
{
int ret, do_wakeup, err;
struct splice_desc sd;
+ struct inode *inode = out->f_mapping->host;
ret = 0;
do_wakeup = 0;
@@ -722,8 +723,13 @@ ssize_t splice_from_pipe(struct pipe_ino
sd.file = out;
sd.pos = *ppos;
- if (pipe->inode)
- mutex_lock(&pipe->inode->i_mutex);
+ /*
+ * The actor worker might be calling ->prepare_write and
+ * ->commit_write. Most of the time, these expect i_mutex to
+ * be held. Since this may result in an ABBA deadlock with
+ * pipe->inode, we have to order lock acquiry here.
+ */
+ inode_double_lock(inode, pipe->inode);
for (;;) {
if (pipe->nrbufs) {
@@ -797,8 +803,7 @@ ssize_t splice_from_pipe(struct pipe_ino
pipe_wait(pipe);
}
- if (pipe->inode)
- mutex_unlock(&pipe->inode->i_mutex);
+ inode_double_unlock(inode, pipe->inode);
if (do_wakeup) {
smp_mb();
@@ -1400,13 +1405,7 @@ static int link_pipe(struct pipe_inode_i
* grabbing by inode address. Otherwise two different processes
* could deadlock (one doing tee from A -> B, the other from B -> A).
*/
- if (ipipe->inode < opipe->inode) {
- mutex_lock_nested(&ipipe->inode->i_mutex, I_MUTEX_PARENT);
- mutex_lock_nested(&opipe->inode->i_mutex, I_MUTEX_CHILD);
- } else {
- mutex_lock_nested(&opipe->inode->i_mutex, I_MUTEX_PARENT);
- mutex_lock_nested(&ipipe->inode->i_mutex, I_MUTEX_CHILD);
- }
+ inode_double_lock(ipipe->inode, opipe->inode);
do {
if (!opipe->readers) {
@@ -1450,8 +1449,7 @@ static int link_pipe(struct pipe_inode_i
i++;
} while (len);
- mutex_unlock(&ipipe->inode->i_mutex);
- mutex_unlock(&opipe->inode->i_mutex);
+ inode_double_unlock(ipipe->inode, opipe->inode);
/*
* If we put data in the output pipe, wakeup any potential readers.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 34406ed..31d8548 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -624,6 +624,35 @@ enum inode_i_mutex_lock_class
};
/*
+ * We rarely want to lock two inodes that do not have a parent/child
+ * relationship (such as directory, child inode) simultaneously. The
+ * vast majority of file systems should be able to get along fine
+ * without this. Do not use these functions except as a last resort.
+ */
+static inline void inode_double_lock(struct inode *inode1, struct inode *inode2)
+{
+ if (!inode2) {
+ mutex_lock(&inode1->i_mutex);
+ return;
+ }
+
+ if (inode1 < inode2) {
+ mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+ mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+ } else {
+ mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
+ mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
+ }
+}
+
+static inline void inode_double_unlock(struct inode *inode1, struct inode *inode2)
+{
+ mutex_unlock(&inode1->i_mutex);
+ if (inode2)
+ mutex_unlock(&inode2->i_mutex);
+}
+
+/*
* NOTE: in a 32bit arch with a preemptable kernel and
* an UP compile the i_size_read/write must be atomic
* with respect to the local cpu (unlike with preempt disabled),
--
1.4.2.3
next prev parent reply other threads:[~2006-10-13 19:45 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-10-12 19:01 i_mutex locking in generic_file_splice_write() Mark Fasheh
2006-10-12 19:54 ` Andrew Morton
2006-10-13 0:17 ` Mark Fasheh
2006-10-13 7:45 ` Jens Axboe
2006-10-13 8:11 ` Andrew Morton
2006-10-13 8:18 ` Jens Axboe
2006-10-13 19:44 ` Mark Fasheh [this message]
2006-10-15 18:05 ` Jens Axboe
2006-10-15 19:56 ` Mark Fasheh
2006-10-15 20:08 ` Jens Axboe
2006-10-15 20:14 ` Mark Fasheh
2006-10-16 17:58 ` Andreas Dilger
2006-10-16 22:24 ` Mark Fasheh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20061013194442.GA2954@ca-server1.us.oracle.com \
--to=mark.fasheh@oracle.com \
--cc=akpm@osdl.org \
--cc=jens.axboe@oracle.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=mhalcrow@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.