From: Mark Fasheh <mark.fasheh@oracle.com>
To: Jens Axboe <jens.axboe@oracle.com>, Andrew Morton <akpm@osdl.org>,
linux-fsdevel@vger.kernel.org,
Michael Halcrow <mhalcrow@us.ibm.com>
Subject: Re: i_mutex locking in generic_file_splice_write()
Date: Mon, 16 Oct 2006 15:24:24 -0700 [thread overview]
Message-ID: <20061016222424.GC13426@ca-server1.us.oracle.com> (raw)
In-Reply-To: <20061016175808.GH6221@schatzie.adilger.int>
On Mon, Oct 16, 2006 at 11:58:08AM -0600, Andreas Dilger wrote:
> Could be a tiny bit cleaner if you put the "inode1 == inode2" case above:
Good catch - thanks for pointing it out. Here's a version with your cleanup.
This version of the patch was actually tested too.
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com
From: Mark Fasheh <mark.fasheh@oracle.com>
[PATCH] Take i_mutex in splice_from_pipe()
The splice_actor may be calling ->prepare_write() and ->commit_write(). We
want i_mutex on the inode being written to before calling those so that we
don't race i_size changes.
The double locking behavior is done elsewhere in splice.c, and if we
eventually want _nolock variants of generic_file_splice_write(), fs modules
might have to replicate the nasty locking code. We introduce
inode_double_lock() and inode_double_unlock() to consolidate the locking
rules into one set of functions.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
---
fs/inode.c | 36 ++++++++++++++++++++++++++++++++++++
fs/splice.c | 24 +++++++++++-------------
include/linux/fs.h | 3 +++
3 files changed, 50 insertions(+), 13 deletions(-)
049fb2a0358c842a6c5256a4c0d488441bfd8e11
diff --git a/fs/inode.c b/fs/inode.c
index d9a21d1..26cdb11 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1306,6 +1306,42 @@ void wake_up_inode(struct inode *inode)
wake_up_bit(&inode->i_state, __I_LOCK);
}
+/*
+ * We rarely want to lock two inodes that do not have a parent/child
+ * relationship (such as directory, child inode) simultaneously. The
+ * vast majority of file systems should be able to get along fine
+ * without this. Do not use these functions except as a last resort.
+ */
+void inode_double_lock(struct inode *inode1, struct inode *inode2)
+{
+ if (inode1 == NULL || inode2 == NULL || inode1 == inode2) {
+ if (inode1)
+ mutex_lock(&inode1->i_mutex);
+ else if (inode2)
+ mutex_lock(&inode2->i_mutex);
+ return;
+ }
+
+ if (inode1 < inode2) {
+ mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+ mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+ } else {
+ mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
+ mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
+ }
+}
+EXPORT_SYMBOL(inode_double_lock);
+
+void inode_double_unlock(struct inode *inode1, struct inode *inode2)
+{
+ if (inode1)
+ mutex_unlock(&inode1->i_mutex);
+
+ if (inode2 && inode2 != inode1)
+ mutex_unlock(&inode2->i_mutex);
+}
+EXPORT_SYMBOL(inode_double_unlock);
+
static __initdata unsigned long ihash_entries;
static int __init set_ihash_entries(char *str)
{
diff --git a/fs/splice.c b/fs/splice.c
index a567010..c1072b6 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -713,6 +713,7 @@ ssize_t splice_from_pipe(struct pipe_ino
{
int ret, do_wakeup, err;
struct splice_desc sd;
+ struct inode *inode = out->f_mapping->host;
ret = 0;
do_wakeup = 0;
@@ -722,8 +723,13 @@ ssize_t splice_from_pipe(struct pipe_ino
sd.file = out;
sd.pos = *ppos;
- if (pipe->inode)
- mutex_lock(&pipe->inode->i_mutex);
+ /*
+ * The actor worker might be calling ->prepare_write and
+ * ->commit_write. Most of the time, these expect i_mutex to
+ * be held. Since this may result in an ABBA deadlock with
+ * pipe->inode, we have to order lock acquiry here.
+ */
+ inode_double_lock(inode, pipe->inode);
for (;;) {
if (pipe->nrbufs) {
@@ -797,8 +803,7 @@ ssize_t splice_from_pipe(struct pipe_ino
pipe_wait(pipe);
}
- if (pipe->inode)
- mutex_unlock(&pipe->inode->i_mutex);
+ inode_double_unlock(inode, pipe->inode);
if (do_wakeup) {
smp_mb();
@@ -1400,13 +1405,7 @@ static int link_pipe(struct pipe_inode_i
* grabbing by inode address. Otherwise two different processes
* could deadlock (one doing tee from A -> B, the other from B -> A).
*/
- if (ipipe->inode < opipe->inode) {
- mutex_lock_nested(&ipipe->inode->i_mutex, I_MUTEX_PARENT);
- mutex_lock_nested(&opipe->inode->i_mutex, I_MUTEX_CHILD);
- } else {
- mutex_lock_nested(&opipe->inode->i_mutex, I_MUTEX_PARENT);
- mutex_lock_nested(&ipipe->inode->i_mutex, I_MUTEX_CHILD);
- }
+ inode_double_lock(ipipe->inode, opipe->inode);
do {
if (!opipe->readers) {
@@ -1450,8 +1449,7 @@ static int link_pipe(struct pipe_inode_i
i++;
} while (len);
- mutex_unlock(&ipipe->inode->i_mutex);
- mutex_unlock(&opipe->inode->i_mutex);
+ inode_double_unlock(ipipe->inode, opipe->inode);
/*
* If we put data in the output pipe, wakeup any potential readers.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 34406ed..ddbb833 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -623,6 +623,9 @@ enum inode_i_mutex_lock_class
I_MUTEX_QUOTA
};
+extern void inode_double_lock(struct inode *inode1, struct inode *inode2);
+extern void inode_double_unlock(struct inode *inode1, struct inode *inode2);
+
/*
* NOTE: in a 32bit arch with a preemptable kernel and
* an UP compile the i_size_read/write must be atomic
--
1.3.3
prev parent reply other threads:[~2006-10-16 22:24 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-10-12 19:01 i_mutex locking in generic_file_splice_write() Mark Fasheh
2006-10-12 19:54 ` Andrew Morton
2006-10-13 0:17 ` Mark Fasheh
2006-10-13 7:45 ` Jens Axboe
2006-10-13 8:11 ` Andrew Morton
2006-10-13 8:18 ` Jens Axboe
2006-10-13 19:44 ` Mark Fasheh
2006-10-15 18:05 ` Jens Axboe
2006-10-15 19:56 ` Mark Fasheh
2006-10-15 20:08 ` Jens Axboe
2006-10-15 20:14 ` Mark Fasheh
2006-10-16 17:58 ` Andreas Dilger
2006-10-16 22:24 ` Mark Fasheh [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20061016222424.GC13426@ca-server1.us.oracle.com \
--to=mark.fasheh@oracle.com \
--cc=akpm@osdl.org \
--cc=jens.axboe@oracle.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=mhalcrow@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.