From: Mark Fasheh <mark.fasheh@oracle.com>
To: Jens Axboe <jens.axboe@oracle.com>, Andrew Morton <akpm@osdl.org>,
linux-fsdevel@vger.kernel.org,
Michael Halcrow <mhalcrow@us.ibm.com>
Subject: Re: i_mutex locking in generic_file_splice_write()
Date: Mon, 16 Oct 2006 15:24:24 -0700 [thread overview]
Message-ID: <20061016222424.GC13426@ca-server1.us.oracle.com> (raw)
In-Reply-To: <20061016175808.GH6221@schatzie.adilger.int>
On Mon, Oct 16, 2006 at 11:58:08AM -0600, Andreas Dilger wrote:
> Could be a tiny bit cleaner if you put the "inode1 == inode2" case above:
Good catch - thanks for pointing it out. Here's a version with your cleanup.
This version of the patch was actually tested too.
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com
From: Mark Fasheh <mark.fasheh@oracle.com>
[PATCH] Take i_mutex in splice_from_pipe()
The splice_actor may be calling ->prepare_write() and ->commit_write(). We
want i_mutex on the inode being written to before calling those so that we
don't race i_size changes.
The double locking behavior is done elsewhere in splice.c, and if we
eventually want _nolock variants of generic_file_splice_write(), fs modules
might have to replicate the nasty locking code. We introduce
inode_double_lock() and inode_double_unlock() to consolidate the locking
rules into one set of functions.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
---
fs/inode.c | 36 ++++++++++++++++++++++++++++++++++++
fs/splice.c | 24 +++++++++++-------------
include/linux/fs.h | 3 +++
3 files changed, 50 insertions(+), 13 deletions(-)
049fb2a0358c842a6c5256a4c0d488441bfd8e11
diff --git a/fs/inode.c b/fs/inode.c
index d9a21d1..26cdb11 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1306,6 +1306,42 @@ void wake_up_inode(struct inode *inode)
wake_up_bit(&inode->i_state, __I_LOCK);
}
+/*
+ * We rarely want to lock two inodes that do not have a parent/child
+ * relationship (such as directory, child inode) simultaneously. The
+ * vast majority of file systems should be able to get along fine
+ * without this. Do not use these functions except as a last resort.
+ */
+void inode_double_lock(struct inode *inode1, struct inode *inode2)
+{
+ if (inode1 == NULL || inode2 == NULL || inode1 == inode2) {
+ if (inode1)
+ mutex_lock(&inode1->i_mutex);
+ else if (inode2)
+ mutex_lock(&inode2->i_mutex);
+ return;
+ }
+
+ if (inode1 < inode2) {
+ mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+ mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+ } else {
+ mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
+ mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
+ }
+}
+EXPORT_SYMBOL(inode_double_lock);
+
+void inode_double_unlock(struct inode *inode1, struct inode *inode2)
+{
+ if (inode1)
+ mutex_unlock(&inode1->i_mutex);
+
+ if (inode2 && inode2 != inode1)
+ mutex_unlock(&inode2->i_mutex);
+}
+EXPORT_SYMBOL(inode_double_unlock);
+
static __initdata unsigned long ihash_entries;
static int __init set_ihash_entries(char *str)
{
diff --git a/fs/splice.c b/fs/splice.c
index a567010..c1072b6 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -713,6 +713,7 @@ ssize_t splice_from_pipe(struct pipe_ino
{
int ret, do_wakeup, err;
struct splice_desc sd;
+ struct inode *inode = out->f_mapping->host;
ret = 0;
do_wakeup = 0;
@@ -722,8 +723,13 @@ ssize_t splice_from_pipe(struct pipe_ino
sd.file = out;
sd.pos = *ppos;
- if (pipe->inode)
- mutex_lock(&pipe->inode->i_mutex);
+ /*
+ * The actor worker might be calling ->prepare_write and
+ * ->commit_write. Most of the time, these expect i_mutex to
+ * be held. Since this may result in an ABBA deadlock with
+ * pipe->inode, we have to order lock acquiry here.
+ */
+ inode_double_lock(inode, pipe->inode);
for (;;) {
if (pipe->nrbufs) {
@@ -797,8 +803,7 @@ ssize_t splice_from_pipe(struct pipe_ino
pipe_wait(pipe);
}
- if (pipe->inode)
- mutex_unlock(&pipe->inode->i_mutex);
+ inode_double_unlock(inode, pipe->inode);
if (do_wakeup) {
smp_mb();
@@ -1400,13 +1405,7 @@ static int link_pipe(struct pipe_inode_i
* grabbing by inode address. Otherwise two different processes
* could deadlock (one doing tee from A -> B, the other from B -> A).
*/
- if (ipipe->inode < opipe->inode) {
- mutex_lock_nested(&ipipe->inode->i_mutex, I_MUTEX_PARENT);
- mutex_lock_nested(&opipe->inode->i_mutex, I_MUTEX_CHILD);
- } else {
- mutex_lock_nested(&opipe->inode->i_mutex, I_MUTEX_PARENT);
- mutex_lock_nested(&ipipe->inode->i_mutex, I_MUTEX_CHILD);
- }
+ inode_double_lock(ipipe->inode, opipe->inode);
do {
if (!opipe->readers) {
@@ -1450,8 +1449,7 @@ static int link_pipe(struct pipe_inode_i
i++;
} while (len);
- mutex_unlock(&ipipe->inode->i_mutex);
- mutex_unlock(&opipe->inode->i_mutex);
+ inode_double_unlock(ipipe->inode, opipe->inode);
/*
* If we put data in the output pipe, wakeup any potential readers.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 34406ed..ddbb833 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -623,6 +623,9 @@ enum inode_i_mutex_lock_class
I_MUTEX_QUOTA
};
+extern void inode_double_lock(struct inode *inode1, struct inode *inode2);
+extern void inode_double_unlock(struct inode *inode1, struct inode *inode2);
+
/*
* NOTE: in a 32bit arch with a preemptable kernel and
* an UP compile the i_size_read/write must be atomic
--
1.3.3
prev parent reply other threads:[~2006-10-16 22:24 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-10-12 19:01 i_mutex locking in generic_file_splice_write() Mark Fasheh
2006-10-12 19:54 ` Andrew Morton
2006-10-13 0:17 ` Mark Fasheh
2006-10-13 7:45 ` Jens Axboe
2006-10-13 8:11 ` Andrew Morton
2006-10-13 8:18 ` Jens Axboe
2006-10-13 19:44 ` Mark Fasheh
2006-10-15 18:05 ` Jens Axboe
2006-10-15 19:56 ` Mark Fasheh
2006-10-15 20:08 ` Jens Axboe
2006-10-15 20:14 ` Mark Fasheh
2006-10-16 17:58 ` Andreas Dilger
2006-10-16 22:24 ` Mark Fasheh [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20061016222424.GC13426@ca-server1.us.oracle.com \
--to=mark.fasheh@oracle.com \
--cc=akpm@osdl.org \
--cc=jens.axboe@oracle.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=mhalcrow@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).