From: Stephane Doyon <sdoyon@max-t.com>
To: Nathan Scott <nathans@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>, Suzuki <suzuki@in.ibm.com>,
linux-fsdevel@vger.kernel.org,
"linux-aio kvack.org" <linux-aio@kvack.org>,
lkml <linux-kernel@vger.kernel.org>, suparna <suparna@in.ibm.com>,
akpm@osdl.org, xfs@oss.sgi.com
Subject: Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
Date: Tue, 11 Jul 2006 09:40:49 -0400 (EDT)
Message-ID: <Pine.LNX.4.64.0607110928010.3327@madrid.max-t.internal>
In-Reply-To: <20060711101817.B1702118@wobbly.melbourne.sgi.com>

On Tue, 11 Jul 2006, Nathan Scott wrote:
> On Mon, Jul 10, 2006 at 12:46:23PM -0400, Stephane Doyon wrote:
>> Hi,
>
> Hi Stephane,
>
>> A few months back, a fix was introduced for a mutex double unlock warning
>> related to direct I/O in XFS. I believe that fix has a lock ordering
>> problem that can cause a deadlock.
>>
>> The problem was that __blockdev_direct_IO() would unlock the i_mutex in
>> the READ and DIO_OWN_LOCKING case, and the mutex would be unlocked again
>> in xfs_read() soon after returning from __generic_file_aio_read(). Because
>> there are lots of execution paths down from __generic_file_aio_read() that
>> do not consistently release the i_mutex, the safest fix was deemed to be
>> to reacquire the i_mutex before returning from __blockdev_direct_IO(). At
>> this point however, the reader is holding an xfs ilock, and AFAICT the
>> i_mutex is usually taken BEFORE the xfs ilock.
>
> That is correct, yes. Hmm. Subtle. Painful. Thanks for the detailed
> report and your analysis.
>
>> We are seeing a deadlock between a process writing and another process
>> reading the same file, when the reader is using direct I/O. (The writer
>
> Is that a direct writer or a buffered writer?
Either one; both cases trigger the deadlock.
>> must apparently be growing the file, using either direct or buffered
>> I/O.) Something like this, on XFS:
>> (dd if=/dev/zero of=f bs=128K count=5000 & ) ; sleep 1 ; dd of=/dev/null
>> if=f iflag=direct bs=128K count=5000
>>
>> Seen on kernels 2.6.16 and 2.6.17.
>>
>> The deadlock scenario appears to be this:
>> -The reader goes into xfs_read(), takes the i_mutex and then an xfs_ilock
>> XFS_IOLOCK_SHARED, then calls down to __generic_file_aio_read() which
>> eventually goes down to __blockdev_direct_IO(). In there it drops the
>> i_mutex.
>> -The writer goes into xfs_write() and obtains the i_mutex. It then tries
>> to get an xfs_ilock XFS_ILOCK_EXCL and must wait on it since it's held by
>> the reader.
>> -The reader, still in __blockdev_direct_IO(), executes direct_io_worker()
>> and then tries to reacquire the i_mutex, and must wait on it because the
>> writer has it.
>>
>> And so we have a deadlock.
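To spell the inversion out, here is a toy userspace sketch of the same
ordering. This is obviously not the kernel code: "i_mutex" is just a
pthread mutex, the XFS iolock is approximated by a single rwlock, and
the names are made up. Built with -pthread, the two threads wedge on
pretty much every run:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t i_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t iolock = PTHREAD_RWLOCK_INITIALIZER;

static void *dio_reader(void *unused)
{
	pthread_mutex_lock(&i_mutex);     /* xfs_read(): i_mutex           */
	pthread_rwlock_rdlock(&iolock);   /* XFS_IOLOCK_SHARED             */
	pthread_mutex_unlock(&i_mutex);   /* DIO_OWN_LOCKING drops it      */
	sleep(1);                         /* direct_io_worker() runs       */
	pthread_mutex_lock(&i_mutex);     /* re-acquire: blocks on writer  */
	pthread_mutex_unlock(&i_mutex);
	pthread_rwlock_unlock(&iolock);
	return NULL;
}

static void *writer(void *unused)
{
	pthread_mutex_lock(&i_mutex);     /* xfs_write(): i_mutex          */
	pthread_rwlock_wrlock(&iolock);   /* exclusive lock: blocks on the */
	                                  /* reader's shared hold          */
	pthread_rwlock_unlock(&iolock);
	pthread_mutex_unlock(&i_mutex);
	return NULL;
}

int main(void)
{
	pthread_t r, w;

	pthread_create(&r, NULL, dio_reader, NULL);
	usleep(100 * 1000);               /* let the reader get started    */
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(r, NULL);
	pthread_join(w, NULL);
	puts("no deadlock this time");    /* never reached when it wedges  */
	return 0;
}

The only point here is that the reader drops and re-takes the outer lock
while still holding the inner one, while the writer takes them in the
usual outer-then-inner order.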
>
> *nod*. This will require some thought, I'm not sure I like the sound of
> your suggested workaround there a whole lot, unfortunately, but maybe it
> is all we can do at this stage. Let me ponder further and get back to you
Thank you.
> (but if you want to prototype your fix further, that'd be most welcome, of
> course).
Sure, though it's not very subtle. The patch below is what I'm using for
now. I haven't seen any problems with it yet, but it hasn't been
seriously tested.
I'm assuming that it's OK to keep holding the i_mutex during the call to
direct_io_worker(): in the DIO_LOCKING case, direct_io_worker() is
already called with the i_mutex held, and XFS touches the i_mutex only
in xfs_read() and xfs_write(), so hopefully nothing will conflict.
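In terms of the toy sketch above, the patched reader simply keeps
"i_mutex" held across the worker stage, so both tasks take the two locks
in the same order and the inversion goes away. Again just an
illustration, not the real code:

static void *dio_reader_patched(void *unused)
{
	pthread_mutex_lock(&i_mutex);    /* xfs_read(): i_mutex            */
	pthread_rwlock_rdlock(&iolock);  /* XFS_IOLOCK_SHARED              */
	sleep(1);                        /* direct_io_worker() runs with   */
	                                 /* i_mutex still held             */
	pthread_rwlock_unlock(&iolock);
	pthread_mutex_unlock(&i_mutex);  /* only xfs_read() drops it now   */
	return NULL;
}

Substituting that for dio_reader() in the little program above lets it
run to completion.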
Signed-off-by: Stephane Doyon <sdoyon@max-t.com>
--- linux/fs/direct-io.c.orig 2006-07-11 09:23:20.000000000 -0400
+++ linux/fs/direct-io.c 2006-07-11 09:27:54.000000000 -0400
@@ -1191,7 +1191,6 @@ __blockdev_direct_IO(int rw, struct kioc
 	loff_t end = offset;
 	struct dio *dio;
 	int release_i_mutex = 0;
-	int acquire_i_mutex = 0;
 
 	if (rw & WRITE)
 		current->flags |= PF_SYNCWRITE;
@@ -1254,11 +1253,6 @@ __blockdev_direct_IO(int rw, struct kioc
 				goto out;
 			}
 
-			if (dio_lock_type == DIO_OWN_LOCKING) {
-				mutex_unlock(&inode->i_mutex);
-				acquire_i_mutex = 1;
-			}
-		}
 
 		if (dio_lock_type == DIO_LOCKING)
 			down_read(&inode->i_alloc_sem);
@@ -1282,8 +1276,6 @@ __blockdev_direct_IO(int rw, struct kioc
 out:
 	if (release_i_mutex)
 		mutex_unlock(&inode->i_mutex);
-	else if (acquire_i_mutex)
-		mutex_lock(&inode->i_mutex);
 	if (rw & WRITE)
 		current->flags &= ~PF_SYNCWRITE;
 	return retval;