From: Stephane Doyon <sdoyon@max-t.com>
To: Nathan Scott <nathans@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>, Suzuki <suzuki@in.ibm.com>,
	linux-fsdevel@vger.kernel.org,
	"linux-aio kvack.org" <linux-aio@kvack.org>,
	lkml <linux-kernel@vger.kernel.org>, suparna <suparna@in.ibm.com>,
	akpm@osdl.org, xfs@oss.sgi.com
Subject: Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
Date: Tue, 11 Jul 2006 09:40:49 -0400 (EDT)
Message-ID: <Pine.LNX.4.64.0607110928010.3327@madrid.max-t.internal>
In-Reply-To: <20060711101817.B1702118@wobbly.melbourne.sgi.com>

On Tue, 11 Jul 2006, Nathan Scott wrote:

> On Mon, Jul 10, 2006 at 12:46:23PM -0400, Stephane Doyon wrote:
>> Hi,
>
> Hi Stephane,
>
>> A few months back, a fix was introduced for a mutex double unlock warning
>> related to direct I/O in XFS. I believe that fix has a lock ordering
>> problem that can cause a deadlock.
>>
>> The problem was that __blockdev_direct_IO() would unlock the i_mutex in
>> the READ and DIO_OWN_LOCKING case, and the mutex would be unlocked again
>> in xfs_read() soon after returning from __generic_file_aio_read(). Because
>> there are lots of execution paths down from __generic_file_aio_read() that
>> do not consistently release the i_mutex, the safest fix was deemed to be
>> to reacquire the i_mutex before returning from __blockdev_direct_IO(). At
>> this point, however, the reader is holding the XFS iolock
>> (XFS_IOLOCK_SHARED), and AFAICT the i_mutex is usually taken BEFORE the
>> xfs locks.
>
> That is correct, yes.  Hmm.  Subtle.  Painful.  Thanks for the detailed
> report and your analysis.
>
>> We are seeing a deadlock between a process writing and another process
>> reading the same file, when the reader is using direct I/O. (The writer
>
> Is that a direct writer or a buffered writer?

Either way; both cases trigger the deadlock.

>> must apparently be growing the file, using either direct or buffered
>> I/O.) Something like this, on XFS:
>> (dd if=/dev/zero of=f bs=128K count=5000 & ) ; sleep 1 ; dd of=/dev/null
>> if=f iflag=direct bs=128K count=5000
>>
>> Seen on kernels 2.6.16 and 2.6.17.
>>
>> The deadlock scenario appears to be this:
>> -The reader goes into xfs_read(), takes the i_mutex and then the XFS
>> iolock (xfs_ilock() with XFS_IOLOCK_SHARED), then calls down to
>> __generic_file_aio_read(), which eventually reaches
>> __blockdev_direct_IO(). In there it drops the i_mutex.
>> -The writer goes into xfs_write() and obtains the i_mutex. It then tries
>> to take the XFS iolock exclusively (xfs_ilock() with XFS_IOLOCK_EXCL) and
>> must wait, since the reader holds that lock shared.
>> -The reader, still in __blockdev_direct_IO(), executes direct_io_worker()
>> and then tries to reacquire the i_mutex, and must wait on it because the
>> writer has it.
>>
>> And so we have a deadlock.
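
To spell out the interleaving (a schematic only; lock and function names
as in 2.6.16/2.6.17, call sites approximate):

  reader (xfs_read, O_DIRECT)             writer (xfs_write)
  ---------------------------             ------------------
  mutex_lock(&inode->i_mutex);
  xfs_ilock(ip, XFS_IOLOCK_SHARED);
  /* __blockdev_direct_IO(), DIO_OWN_LOCKING read: */
  mutex_unlock(&inode->i_mutex);
                                          mutex_lock(&inode->i_mutex);
                                          /* blocks, reader holds it shared: */
                                          xfs_ilock(ip, XFS_IOLOCK_EXCL);
  direct_io_worker(...);
  /* blocks, writer holds it: */
  mutex_lock(&inode->i_mutex);

Each task now waits on a lock the other holds, and neither can progress.
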
>
> *nod*.  This will require some thought; I'm not sure I like the sound of
> your suggested workaround there a whole lot, unfortunately, but maybe it
> is all we can do at this stage.  Let me ponder further and get back to you

Thank you.

> (but if you want to prototype your fix further, that'd be most welcome, of
> course).

Sure, though it's not very subtle. The patch below is what I'm using for
now. I haven't seen any problems with it yet, but it hasn't been seriously
tested.

I'm assuming that it's OK to keep holding the i_mutex during the call to
direct_io_worker(): in the DIO_LOCKING case, direct_io_worker() is already
called with the i_mutex held, and XFS touches the i_mutex only in
xfs_read() and xfs_write(), so hopefully nothing will conflict.
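
For reference, the pattern I'm relying on, schematically (simplified from
fs/direct-io.c, not verbatim source):

	/* DIO_LOCKING reads take i_mutex inside __blockdev_direct_IO()... */
	if (dio_lock_type != DIO_OWN_LOCKING) {
		mutex_lock(&inode->i_mutex);
		release_i_mutex = 1;
	}
	/* ...and it is still held when the I/O is actually issued: */
	retval = direct_io_worker(rw, iocb, inode, iov, offset,
				nr_segs, blkbits, get_block, end_io, dio);

so a DIO_OWN_LOCKING read keeping the i_mutex across direct_io_worker()
is not a new situation for that code.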

Signed-off-by: Stephane Doyon <sdoyon@max-t.com>
--- linux/fs/direct-io.c.orig	2006-07-11 09:23:20.000000000 -0400
+++ linux/fs/direct-io.c	2006-07-11 09:27:54.000000000 -0400
@@ -1191,7 +1191,6 @@ __blockdev_direct_IO(int rw, struct kioc
 	loff_t end = offset;
 	struct dio *dio;
 	int release_i_mutex = 0;
-	int acquire_i_mutex = 0;
 
 	if (rw & WRITE)
 		current->flags |= PF_SYNCWRITE;
@@ -1254,11 +1253,7 @@ __blockdev_direct_IO(int rw, struct kioc
 				goto out;
 			}
 
-			if (dio_lock_type == DIO_OWN_LOCKING) {
-				mutex_unlock(&inode->i_mutex);
-				acquire_i_mutex = 1;
-			}
 		}
 
 		if (dio_lock_type == DIO_LOCKING)
 			down_read(&inode->i_alloc_sem);
@@ -1282,8 +1277,6 @@ __blockdev_direct_IO(int rw, struct kioc
 out:
 	if (release_i_mutex)
 		mutex_unlock(&inode->i_mutex);
-	else if (acquire_i_mutex)
-		mutex_lock(&inode->i_mutex);
 	if (rw & WRITE)
 		current->flags &= ~PF_SYNCWRITE;
 	return retval;

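For clarity, with the patch applied, the locking section of
__blockdev_direct_IO() ends up looking roughly like this (a reconstructed
sketch, not verbatim source):

	if (dio_lock_type != DIO_NO_LOCKING) {
		/* watch out for a 0 len io from a tricksy fs */
		if (rw == READ && end > offset) {
			if (dio_lock_type != DIO_OWN_LOCKING) {
				mutex_lock(&inode->i_mutex);
				release_i_mutex = 1;
			}

			/* ... flush and wait on the page cache over the
			 * range, goto out on error (unchanged) ... */

			/* The DIO_OWN_LOCKING unlock that used to sit here
			 * is gone: i_mutex now stays held across
			 * direct_io_worker(), and out: no longer re-takes
			 * it. */
		}

		if (dio_lock_type == DIO_LOCKING)
			down_read(&inode->i_alloc_sem);
	}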