* [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
@ 2006-03-09  7:54 Suzuki
  2006-03-09 12:03 ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread

From: Suzuki @ 2006-03-09  7:54 UTC (permalink / raw)
  To: linux-fsdevel, linux-aio@kvack.org, lkml; +Cc: suparna, akpm

[-- Attachment #1: Type: text/plain, Size: 134 bytes --]

Missed out the linux-aio & linux-fsdevel lists. Forwarding.

Comments?

Suzuki K P
Linux Technology Center,
IBM Software Labs, India.

[-- Attachment #2: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests.eml --]
[-- Type: message/rfc822, Size: 2028 bytes --]

From: Suzuki <suzuki@in.ibm.com>
To: lkml <linux-kernel@vger.kernel.org>
Cc: suparna <suparna@in.ibm.com>, akpm@osdl.org
Subject: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
Date: Thu, 09 Mar 2006 12:47:01 +0530
Message-ID: <440FD66D.6060308@in.ibm.com>

Hi all,

I was working on an issue where "Badness in __mutex_unlock_slowpath"
(and hence a stack trace) shows up while running FS stress tests on XFS
on a 2.6.16-rc5 kernel.

The dmesg looks like:

Badness in __mutex_unlock_slowpath at kernel/mutex.c:207
 [<c0103c0c>] show_trace+0x20/0x22
 [<c0103d4b>] dump_stack+0x1e/0x20
 [<c0473f1f>] __mutex_unlock_slowpath+0x12a/0x23b
 [<c0473938>] mutex_unlock+0xb/0xd
 [<c02a5720>] xfs_read+0x230/0x2d9
 [<c02a1bed>] linvfs_aio_read+0x8d/0x98
 [<c015f3df>] do_sync_read+0xb8/0x107
 [<c015f4f7>] vfs_read+0xc9/0x19b
 [<c015f8b2>] sys_read+0x47/0x6e
 [<c0102db7>] sysenter_past_esp+0x54/0x75

This happens with XFS DIO reads. xfs_read() holds the i_mutex and issues
__generic_file_aio_read(), which falls into __blockdev_direct_IO() with
the DIO_OWN_LOCKING flag (since XFS uses own-locking). Now
__blockdev_direct_IO() releases the i_mutex for READs with
DIO_OWN_LOCKING. When control returns to xfs_read(), it tries to unlock
the i_mutex (which is now already unlocked), causing the "Badness".

The possible solution we can think of is not to unlock the i_mutex for
DIO_OWN_LOCKING. This would only affect DIO_OWN_LOCKING users (as of
now, only XFS) with concurrent DIO sync read requests. AIO read requests
would not suffer from this problem since they return as soon as the DIO
is submitted.

Another workaround could be to check "mutex_is_locked" before trying to
unlock the i_mutex in xfs_read(). But that seems to be an ugly hack. :(

Comments?

thanks,

Suzuki
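In outline, the problematic flow is the following (a simplified sketch
of the 2.6.16-era call chain; the function body is paraphrased for
illustration, not copied from the kernel source):

	/* Sketch only: paraphrased 2.6.16-era xfs_read() direct-I/O path. */
	ssize_t xfs_read(/* ... */)
	{
		mutex_lock(&inode->i_mutex);        /* direct read: take i_mutex */
		xfs_ilock(ip, XFS_IOLOCK_SHARED);

		ret = __generic_file_aio_read(/* ... */);
		/* -> a_ops->direct_IO -> __blockdev_direct_IO(), which for a
		 * READ under DIO_OWN_LOCKING drops i_mutex internally */

		xfs_iunlock(ip, XFS_IOLOCK_SHARED);
		mutex_unlock(&inode->i_mutex);      /* second unlock -> "Badness" */
		return ret;
	}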
* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-09  7:54 [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests Suzuki
@ 2006-03-09 12:03 ` Christoph Hellwig
  2006-03-09 22:30   ` Nathan Scott
  0 siblings, 1 reply; 14+ messages in thread

From: Christoph Hellwig @ 2006-03-09 12:03 UTC (permalink / raw)
  To: Suzuki
  Cc: linux-fsdevel, linux-aio@kvack.org, lkml, suparna, akpm, linux-xfs

On Thu, Mar 09, 2006 at 01:24:38PM +0530, Suzuki wrote:
>
> Missed out linux-aio & linux-fs-devel lists. Forwarding.
>
> Comments ?

I've seen this too.  The problem is that __generic_file_aio_read can
return with or without the i_mutex locked in the direct I/O case for
filesystems that set DIO_OWN_LOCKING.  It's a nasty one and I haven't
found a better solution than copying lots of code from filemap.c into
xfs.
* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-09 12:03 ` Christoph Hellwig
@ 2006-03-09 22:30   ` Nathan Scott
  2006-03-09 22:42     ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread

From: Nathan Scott @ 2006-03-09 22:30 UTC (permalink / raw)
  To: Christoph Hellwig, Suzuki
  Cc: linux-fsdevel, linux-aio@kvack.org, lkml, suparna, akpm, linux-xfs

On Thu, Mar 09, 2006 at 12:03:06PM +0000, Christoph Hellwig wrote:
> On Thu, Mar 09, 2006 at 01:24:38PM +0530, Suzuki wrote:
> >
> > Missed out linux-aio & linux-fs-devel lists. Forwarding.
> >
> > Comments ?
>
> I've seen this too.  The problem is that __generic_file_aio_read can return
> with or without the i_mutex locked in the direct I/O case for filesystems
> that set DIO_OWN_LOCKING.

Not for reads AFAICT - __generic_file_aio_read + own-locking should
always have released i_mutex at the end of the direct read - are you
thinking of writes, or have I missed something?

> It's a nasty one and I haven't found a better solution
> than copying lots of code from filemap.c into xfs.

Er, eek?  Hopefully that's not needed - from my reading of the code,
all the i_mutex locking for direct reads lives inside direct-io.c, not
filemap.c -- is the solution from my other mail not workable?  (Isn't
it only writes that have the weird buffered I/O fallback + i_sem/i_mutex
locking interaction?)

thanks.

--
Nathan
* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-09 22:30   ` Nathan Scott
@ 2006-03-09 22:42     ` Christoph Hellwig
  2006-03-09 23:14       ` Nathan Scott
  0 siblings, 1 reply; 14+ messages in thread

From: Christoph Hellwig @ 2006-03-09 22:42 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
      lkml, suparna, akpm, linux-xfs

On Fri, Mar 10, 2006 at 09:30:42AM +1100, Nathan Scott wrote:
> Not for reads AFAICT - __generic_file_aio_read + own-locking
> should always have released i_mutex at the end of the direct
> read - are you thinking of writes or have I missed something?

If an error occurs before a_ops->direct_IO is called,
__generic_file_aio_read will return with i_mutex still locked.  Note
that checking for negative return values is not enough, as
__blockdev_direct_IO can return errors as well.
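Concretely, the asymmetry Christoph describes looks something like this
(a paraphrased sketch of the read path; the early-failure check is
invented for illustration and is not the literal 2.6.16 code):

	/* Sketch: the caller cannot tell whether i_mutex was dropped. */
	ssize_t __generic_file_aio_read(/* ... */)
	{
		/* hypothetical early failure, before the direct-I/O path */
		if (early_check_fails)
			return -EFAULT;	/* i_mutex still held by the caller */

		/* Otherwise __blockdev_direct_IO() runs and drops i_mutex
		 * for a DIO_OWN_LOCKING read -- but it may *also* return a
		 * negative error after the drop, so the sign of the return
		 * value does not tell the caller which lock state it is in. */
		return mapping->a_ops->direct_IO(/* ... */);
	}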
* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-09 22:42     ` Christoph Hellwig
@ 2006-03-09 23:14       ` Nathan Scott
  2006-03-10  0:50         ` Nathan Scott
  0 siblings, 1 reply; 14+ messages in thread

From: Nathan Scott @ 2006-03-09 23:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Suzuki, linux-fsdevel, linux-aio@kvack.org, lkml, suparna, akpm, linux-xfs

On Thu, Mar 09, 2006 at 10:42:19PM +0000, Christoph Hellwig wrote:
> On Fri, Mar 10, 2006 at 09:30:42AM +1100, Nathan Scott wrote:
> > Not for reads AFAICT - __generic_file_aio_read + own-locking
> > should always have released i_mutex at the end of the direct
> > read - are you thinking of writes or have I missed something?
>
> if an error occurs before a_ops->direct_IO is called __generic_file_aio_read
> will return with i_mutex still locked.  Note that checking for negative
> return values is not enough as __blockdev_direct_IO can return errors
> aswell.

*groan* - right you are.  Another option may be to have the generic
dio+own-locking case reacquire i_mutex if it drops it, before
returning... thoughts?  Seems a lot less invasive than the filemap.c
code dup'ing thing.

cheers.

--
Nathan
* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-09 23:14       ` Nathan Scott
@ 2006-03-10  0:50         ` Nathan Scott
  2006-03-10 15:49           ` Christoph Hellwig
                             ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread

From: Nathan Scott @ 2006-03-10  0:50 UTC (permalink / raw)
  To: Christoph Hellwig, Suzuki
  Cc: linux-fsdevel, linux-aio@kvack.org, lkml, suparna, akpm, linux-xfs

On Fri, Mar 10, 2006 at 10:14:22AM +1100, Nathan Scott wrote:
> On Thu, Mar 09, 2006 at 10:42:19PM +0000, Christoph Hellwig wrote:
> > On Fri, Mar 10, 2006 at 09:30:42AM +1100, Nathan Scott wrote:
> > > Not for reads AFAICT - __generic_file_aio_read + own-locking
> > > should always have released i_mutex at the end of the direct
> > > read - are you thinking of writes or have I missed something?
> >
> > if an error occurs before a_ops->direct_IO is called __generic_file_aio_read
> > will return with i_mutex still locked.  Note that checking for negative
> > return values is not enough as __blockdev_direct_IO can return errors
> > aswell.
>
> *groan* - right you are.  Another option may be to have the
> generic dio+own-locking case reacquire i_mutex if it drops
> it, before returning... thoughts?  Seems alot less invasive
> than the filemap.c code dup'ing thing.

Something like this (works OK for me)...

cheers.

--
Nathan

Index: 2.6.x-xfs/fs/direct-io.c
===================================================================
--- 2.6.x-xfs.orig/fs/direct-io.c
+++ 2.6.x-xfs/fs/direct-io.c
@@ -1155,15 +1155,16 @@ direct_io_worker(int rw, struct kiocb *i
  * For writes, i_mutex is not held on entry; it is never taken.
  *
  * DIO_LOCKING (simple locking for regular files)
- * For writes we are called under i_mutex and return with i_mutex held, even though
- * it is internally dropped.
+ * For writes we are called under i_mutex and return with i_mutex held, even
+ * though it is internally dropped.
  * For reads, i_mutex is not held on entry, but it is taken and dropped before
  * returning.
  *
  * DIO_OWN_LOCKING (filesystem provides synchronisation and handling of
  * uninitialised data, allowing parallel direct readers and writers)
  * For writes we are called without i_mutex, return without it, never touch it.
- * For reads, i_mutex is held on entry and will be released before returning.
+ * For reads we are called under i_mutex and return with i_mutex held, even
+ * though it may be internally dropped.
  *
  * Additional i_alloc_sem locking requirements described inline below.
  */
@@ -1182,7 +1183,8 @@ __blockdev_direct_IO(int rw, struct kioc
 	ssize_t retval = -EINVAL;
 	loff_t end = offset;
 	struct dio *dio;
-	int reader_with_isem = (rw == READ && dio_lock_type == DIO_OWN_LOCKING);
+	int release_i_mutex = 0;
+	int acquire_i_mutex = 0;
 
 	if (rw & WRITE)
 		current->flags |= PF_SYNCWRITE;
@@ -1225,7 +1227,6 @@ __blockdev_direct_IO(int rw, struct kioc
 	 * writers need to grab i_alloc_sem only (i_mutex is already held)
 	 * For regular files using DIO_OWN_LOCKING,
 	 * neither readers nor writers take any locks here
-	 * (i_mutex is already held and release for writers here)
 	 */
 	dio->lock_type = dio_lock_type;
 	if (dio_lock_type != DIO_NO_LOCKING) {
@@ -1236,7 +1237,7 @@ __blockdev_direct_IO(int rw, struct kioc
 		mapping = iocb->ki_filp->f_mapping;
 		if (dio_lock_type != DIO_OWN_LOCKING) {
 			mutex_lock(&inode->i_mutex);
-			reader_with_isem = 1;
+			release_i_mutex = 1;
 		}
 
 		retval = filemap_write_and_wait_range(mapping, offset,
@@ -1248,7 +1249,7 @@ __blockdev_direct_IO(int rw, struct kioc
 
 		if (dio_lock_type == DIO_OWN_LOCKING) {
 			mutex_unlock(&inode->i_mutex);
-			reader_with_isem = 0;
+			acquire_i_mutex = 1;
 		}
 	}
@@ -1269,11 +1270,13 @@ __blockdev_direct_IO(int rw, struct kioc
 			nr_segs, blkbits, get_blocks, end_io, dio);
 
 	if (rw == READ && dio_lock_type == DIO_LOCKING)
-		reader_with_isem = 0;
+		release_i_mutex = 0;
 
 out:
-	if (reader_with_isem)
+	if (release_i_mutex)
 		mutex_unlock(&inode->i_mutex);
+	else if (acquire_i_mutex)
+		mutex_lock(&inode->i_mutex);
 	if (rw & WRITE)
 		current->flags &= ~PF_SYNCWRITE;
 	return retval;
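Distilled, the i_mutex handling this patch establishes looks roughly
like the following (a sketch of the logic only, not the complete
function; surrounding code paraphrased):

	/*
	 * release_i_mutex -- we locked i_mutex ourselves, drop it on exit.
	 * acquire_i_mutex -- we dropped the caller's i_mutex, retake it on exit.
	 */
	int release_i_mutex = 0;
	int acquire_i_mutex = 0;

	if (rw == READ && dio_lock_type != DIO_NO_LOCKING) {
		if (dio_lock_type != DIO_OWN_LOCKING) {
			mutex_lock(&inode->i_mutex);	/* DIO_LOCKING read */
			release_i_mutex = 1;
		}

		/* ... flush and wait on the page cache; on error, goto out ... */

		if (dio_lock_type == DIO_OWN_LOCKING) {
			mutex_unlock(&inode->i_mutex);	/* drop caller's lock */
			acquire_i_mutex = 1;
		}
	}

	/* ... direct_io_worker(); a DIO_LOCKING read consumes i_mutex
	 *     itself, so release_i_mutex is cleared afterwards ... */
out:
	if (release_i_mutex)
		mutex_unlock(&inode->i_mutex);
	else if (acquire_i_mutex)
		mutex_lock(&inode->i_mutex);	/* restore caller's lock */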
* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-10  0:50         ` Nathan Scott
@ 2006-03-10 15:49           ` Christoph Hellwig
  2006-03-14  4:46             ` Suparna Bhattacharya
  0 siblings, 1 reply; 14+ messages in thread

From: Christoph Hellwig @ 2006-03-10 15:49 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
      lkml, suparna, akpm, linux-xfs

On Fri, Mar 10, 2006 at 11:50:20AM +1100, Nathan Scott wrote:
> Something like this (works OK for me)...

Yeah, that should work for now.  But long-term we really need to redo
direct I/O locking to have a common scheme for all filesystems.  I've
heard birds whistling that RH is patching yet another scheme into RHEL4
for GFS, and it's definitely already far too complex now.
* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-10 15:49           ` Christoph Hellwig
@ 2006-03-14  4:46             ` Suparna Bhattacharya
  0 siblings, 0 replies; 14+ messages in thread

From: Suparna Bhattacharya @ 2006-03-14  4:46 UTC (permalink / raw)
  To: Christoph Hellwig, Nathan Scott, Suzuki, linux-fsdevel,
      linux-aio@kvack.org, lkml, akpm, linux-xfs

On Fri, Mar 10, 2006 at 03:49:25PM +0000, Christoph Hellwig wrote:
> On Fri, Mar 10, 2006 at 11:50:20AM +1100, Nathan Scott wrote:
> > Something like this (works OK for me)...
>
> Yeah, that should work for now.  But long-term we really need to redo
> direct I/O locking to have a common scheme for all filesystems.  I've heard
> birds whistling RH patches yet another scheme into RHEL4 for GFS an it's
> definitly already far too complex now.

Yup, getting rid of the need for all these confusing locking modes was
one of the objectives in mind for DIO simplification:
http://www.kernel.org/pub/linux/kernel/people/suparna/DIO-simplify.txt

Once we have an efficient range-locking or similar mechanism in place
(Chris Mason is working on a patch), it should be possible to push all
of the i_mutex locking out to higher-level routines, outside of
direct-io.c.

Longer term, it would be nice to be able to rethink and further
simplify the whole set of _nolock-equivalent versions of the VFS write
methods, especially the percolation down to sync_page_range_nolock,
etc.

Regards
Suparna

--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India
* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-10  0:50         ` Nathan Scott
  2006-03-10 15:49           ` Christoph Hellwig
@ 2006-03-17 17:22           ` Adrian Bunk
  2006-03-18  3:34             ` Nathan Scott
  0 siblings, 1 reply; 14+ messages in thread

From: Adrian Bunk @ 2006-03-17 17:22 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
      lkml, suparna, akpm, linux-xfs

On Fri, Mar 10, 2006 at 11:50:20AM +1100, Nathan Scott wrote:
> On Fri, Mar 10, 2006 at 10:14:22AM +1100, Nathan Scott wrote:
> > On Thu, Mar 09, 2006 at 10:42:19PM +0000, Christoph Hellwig wrote:
> > > On Fri, Mar 10, 2006 at 09:30:42AM +1100, Nathan Scott wrote:
> > > > Not for reads AFAICT - __generic_file_aio_read + own-locking
> > > > should always have released i_mutex at the end of the direct
> > > > read - are you thinking of writes or have I missed something?
> > >
> > > if an error occurs before a_ops->direct_IO is called __generic_file_aio_read
> > > will return with i_mutex still locked.  Note that checking for negative
> > > return values is not enough as __blockdev_direct_IO can return errors
> > > aswell.
> >
> > *groan* - right you are.  Another option may be to have the
> > generic dio+own-locking case reacquire i_mutex if it drops
> > it, before returning... thoughts?  Seems alot less invasive
> > than the filemap.c code dup'ing thing.
>
> Something like this (works OK for me)...

Is this 2.6.16 material?

> cheers.
> Nathan
>...

cu
Adrian
* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-17 17:22           ` Adrian Bunk
@ 2006-03-18  3:34             ` Nathan Scott
  2006-03-18  5:03               ` Adrian Bunk
  0 siblings, 1 reply; 14+ messages in thread

From: Nathan Scott @ 2006-03-18  3:34 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
      lkml, suparna, akpm, linux-xfs

On Fri, Mar 17, 2006 at 06:22:10PM +0100, Adrian Bunk wrote:
> On Fri, Mar 10, 2006 at 11:50:20AM +1100, Nathan Scott wrote:
> > Something like this (works OK for me)...
>
> Is this 2.6.16 material?

It's been merged already.

cheers.

--
Nathan
* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-18  3:34             ` Nathan Scott
@ 2006-03-18  5:03               ` Adrian Bunk
  0 siblings, 0 replies; 14+ messages in thread

From: Adrian Bunk @ 2006-03-18  5:03 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
      lkml, suparna, akpm, linux-xfs

On Sat, Mar 18, 2006 at 02:34:44PM +1100, Nathan Scott wrote:
> On Fri, Mar 17, 2006 at 06:22:10PM +0100, Adrian Bunk wrote:
> > On Fri, Mar 10, 2006 at 11:50:20AM +1100, Nathan Scott wrote:
> > > Something like this (works OK for me)...
> >
> > Is this 2.6.16 material?
>
> It's been merged already.

Oops, sorry for missing this.

> cheers.
> Nathan

cu
Adrian
* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-10  0:50         ` Nathan Scott
  2006-03-10 15:49           ` Christoph Hellwig
  2006-03-17 17:22           ` Adrian Bunk
@ 2006-07-10 16:46           ` Stephane Doyon
  2006-07-11  0:18             ` Nathan Scott
  2 siblings, 1 reply; 14+ messages in thread

From: Stephane Doyon @ 2006-07-10 16:46 UTC (permalink / raw)
  To: Christoph Hellwig, Nathan Scott
  Cc: Suzuki, linux-fsdevel, linux-aio@kvack.org, lkml, suparna, akpm, linux-xfs

Hi,

A few months back, a fix was introduced for a mutex double-unlock
warning related to direct I/O in XFS. I believe that fix has a lock
ordering problem that can cause a deadlock.

The problem was that __blockdev_direct_IO() would unlock the i_mutex in
the READ and DIO_OWN_LOCKING case, and the mutex would be unlocked again
in xfs_read() soon after returning from __generic_file_aio_read().
Because there are lots of execution paths down from
__generic_file_aio_read() that do not consistently release the i_mutex,
the safest fix was deemed to be to reacquire the i_mutex before
returning from __blockdev_direct_IO(). At this point, however, the
reader is holding an xfs ilock, and AFAICT the i_mutex is usually taken
BEFORE the xfs ilock.

We are seeing a deadlock between a process writing and another process
reading the same file, when the reader is using direct I/O. (The writer
must apparently be growing the file, using either direct or buffered
I/O.) Something like this, on XFS:

(dd if=/dev/zero of=f bs=128K count=5000 & ) ; sleep 1 ; \
dd of=/dev/null if=f iflag=direct bs=128K count=5000

Seen on kernels 2.6.16 and 2.6.17.

The deadlock scenario appears to be this:

- The reader goes into xfs_read(), takes the i_mutex and then an
  xfs_ilock XFS_IOLOCK_SHARED, then calls down to
  __generic_file_aio_read(), which eventually goes down to
  __blockdev_direct_IO(). In there it drops the i_mutex.
- The writer goes into xfs_write() and obtains the i_mutex. It then
  tries to get an xfs_ilock XFS_ILOCK_EXCL and must wait on it since
  it's held by the reader.
- The reader, still in __blockdev_direct_IO(), executes
  direct_io_worker() and then tries to reacquire the i_mutex, and must
  wait on it because the writer has it.

And so we have a deadlock.

I leave it to the experts to figure out what the right fix for this
might be. A workaround might be to not release the i_mutex during
__blockdev_direct_IO().

Thanks

On Thu, 9 Mar 2006, Christoph Hellwig wrote:

> On Thu, Mar 09, 2006 at 01:24:38PM +0530, Suzuki wrote:
> >
> > Missed out linux-aio & linux-fs-devel lists. Forwarding.
> >
> > Comments ?
>
> I've seen this too.  The problem is that __generic_file_aio_read can return
> with or without the i_mutex locked in the direct I/O case for filesystems
> that set DIO_OWN_LOCKING.  It's a nasty one and I haven't found a better
> solution than copying lots of code from filemap.c into xfs.

On Fri, 10 Mar 2006, Nathan Scott wrote:

> On Thu, Mar 09, 2006 at 12:47:01PM +0530, Suzuki wrote:
> > Hi all,
>
> Hi there Suzuki,
>
> > I was working on an issue with getting "Badness in
> > __mutex_unlock_slowpath" and hence a stack trace, while running FS
> > stress tests on XFS on 2.6.16-rc5 kernel.
>
> Thanks for looking into this.
>
> > The dmesg looks like :
> >
> > Badness in __mutex_unlock_slowpath at kernel/mutex.c:207
> >  [<c0103c0c>] show_trace+0x20/0x22
> >  [<c0103d4b>] dump_stack+0x1e/0x20
> >  [<c0473f1f>] __mutex_unlock_slowpath+0x12a/0x23b
> >  [<c0473938>] mutex_unlock+0xb/0xd
> >  [<c02a5720>] xfs_read+0x230/0x2d9
> >  [<c02a1bed>] linvfs_aio_read+0x8d/0x98
> >  [<c015f3df>] do_sync_read+0xb8/0x107
> >  [<c015f4f7>] vfs_read+0xc9/0x19b
> >  [<c015f8b2>] sys_read+0x47/0x6e
> >  [<c0102db7>] sysenter_past_esp+0x54/0x75
>
> Yeah, test 008 from the xfstests suite was reliably hitting this for
> me, it'd just not percolated to the top of my todo list yet.
>
> > This happens with XFS DIO reads. xfs_read holds the i_mutex and issues a
> > __generic_file_aio_read(), which falls into __blockdev_direct_IO with
> > DIO_OWN_LOCKING flag (since xfs uses own_locking ). Now
> > __blockdev_direct_IO releases the i_mutex for READs with
> > DIO_OWN_LOCKING.When it returns to xfs_read, it tries to unlock the
> > i_mutex ( which is now already unlocked), causing the "Badness".
>
> Indeed.  And there's the problem - why is XFS releasing i_mutex
> for the direct read in xfs_read?  Shouldn't be - fs/direct-io.c
> will always release i_mutex for a direct read in the own-locking
> case, so XFS shouldn't be doing it too (thats what the code does
> and thats what the comment preceding __blockdev_direct_IO says).
>
> The only piece of the puzzle I don't understand is why we don't
> always get that badness message at the end of every direct read.
> Perhaps its some subtle fastpath/slowpath difference, or maybe
> "debug_mutex_on" gets switched off after the first occurance...
>
> Anyway, with the above change (remove 2 lines near xfs_read end),
> I can no longer reproduce the problem in that previously-warning
> test case, and all the other XFS tests seem to be chugging along
> OK (which includes a healthy mix of dio testing).
>
> > The possible solution which we can think of, is not to unlock the
> > i_mutex for DIO_OWN_LOCKING. This will only affect the DIO_OWN_LOCKING
> > users (as of now, only XFS ) with concurrent DIO sync read requests. AIO
> > read requests would not suffer this problem since they would just return
> > once the DIO is submitted.
>
> I don't think that level of invasiveness is necessary at this stage,
> but perhaps you're seeing something that I've missed?  Do you see
> any reason why removing the xfs_read unlock wont work?
>
> > Another work around for this can be adding a check "mutex_is_locked"
> > before trying to unlock i_mutex in xfs_read. But this seems to be an
> > ugly hack. :(
>
> Hmm, that just plain wouldn't work - what if the lock was released
> in generic direct IO code, and someone else had acquired it before
> we got to the end of xfs_read?  Badness for sure.
>
> cheers.

On Fri, 10 Mar 2006, Nathan Scott wrote:

> On Fri, Mar 10, 2006 at 10:14:22AM +1100, Nathan Scott wrote:
> > On Thu, Mar 09, 2006 at 10:42:19PM +0000, Christoph Hellwig wrote:
> > > On Fri, Mar 10, 2006 at 09:30:42AM +1100, Nathan Scott wrote:
> > > > Not for reads AFAICT - __generic_file_aio_read + own-locking
> > > > should always have released i_mutex at the end of the direct
> > > > read - are you thinking of writes or have I missed something?
> > >
> > > if an error occurs before a_ops->direct_IO is called __generic_file_aio_read
> > > will return with i_mutex still locked.  Note that checking for negative
> > > return values is not enough as __blockdev_direct_IO can return errors
> > > aswell.
> >
> > *groan* - right you are.  Another option may be to have the
> > generic dio+own-locking case reacquire i_mutex if it drops
> > it, before returning... thoughts?  Seems alot less invasive
> > than the filemap.c code dup'ing thing.
>
> Something like this (works OK for me)...
>
> cheers.
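Laid out side by side, the reported deadlock runs as follows (a sketch
reconstructed from the analysis above, using the lock names as given in
the report):

	/*
	 * Deadlock timeline (sketch):
	 *
	 *   reader (direct I/O)                  writer
	 *   -------------------                  ------
	 *   mutex_lock(&inode->i_mutex)
	 *   xfs_ilock(ip, XFS_IOLOCK_SHARED)
	 *   __blockdev_direct_IO():
	 *     drops i_mutex (OWN_LOCKING read)
	 *                                        mutex_lock(&inode->i_mutex)
	 *                                        xfs_ilock(ip, XFS_ILOCK_EXCL)
	 *                                          ... blocks on the reader's ilock
	 *     direct_io_worker() completes
	 *     mutex_lock(&inode->i_mutex)
	 *       ... blocks on the writer's i_mutex -- deadlock
	 */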
* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-07-10 16:46           ` Stephane Doyon
@ 2006-07-11  0:18             ` Nathan Scott
  2006-07-11 13:40               ` Stephane Doyon
  0 siblings, 1 reply; 14+ messages in thread

From: Nathan Scott @ 2006-07-11  0:18 UTC (permalink / raw)
  To: Stephane Doyon
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
      lkml, suparna, akpm, xfs

On Mon, Jul 10, 2006 at 12:46:23PM -0400, Stephane Doyon wrote:
> Hi,

Hi Stephane,

> A few months back, a fix was introduced for a mutex double unlock warning
> related to direct I/O in XFS. I believe that fix has a lock ordering
> problem that can cause a deadlock.
>
> The problem was that __blockdev_direct_IO() would unlock the i_mutex in
> the READ and DIO_OWN_LOCKING case, and the mutex would be unlocked again
> in xfs_read() soon after returning from __generic_file_aio_read(). Because
> there are lots of execution paths down from __generic_file_aio_read() that
> do not consistently release the i_mutex, the safest fix was deemed to be
> to reacquire the i_mutex before returning from __blockdev_direct_IO(). At
> this point however, the reader is holding an xfs ilock, and AFAICT the
> i_mutex is usually taken BEFORE the xfs ilock.

That is correct, yes.  Hmm.  Subtle.  Painful.  Thanks for the detailed
report and your analysis.

> We are seeing a deadlock between a process writing and another process
> reading the same file, when the reader is using direct I/O. (The writer

Is that a direct writer or a buffered writer?

> must apparently be growing the file, using either direct or buffered
> I/O.) Something like this, on XFS:
> (dd if=/dev/zero of=f bs=128K count=5000 & ) ; sleep 1 ; dd of=/dev/null
> if=f iflag=direct bs=128K count=5000
>
> Seen on kernels 2.6.16 and 2.6.17.
>
> The deadlock scenario appears to be this:
> -The reader goes into xfs_read(), takes the i_mutex and then an xfs_ilock
> XFS_IOLOCK_SHARED, then calls down to __generic_file_aio_read() which
> eventually goes down to __blockdev_direct_IO(). In there it drops the
> i_mutex.
> -The writer goes into xfs_write() and obtains the i_mutex. It then tries
> to get an xfs_ilock XFS_ILOCK_EXCL and must wait on it since it's held by
> the reader.
> -The reader, still in __blockdev_direct_IO(), executes direct_io_worker()
> and then tries to reacquire the i_mutex, and must wait on it because the
> writer has it.
>
> And so we have a deadlock.

*nod*.  This will require some thought; I'm not sure I like the sound
of your suggested workaround there a whole lot, unfortunately, but maybe
it is all we can do at this stage.  Let me ponder further and get back
to you (but if you want to prototype your fix further, that'd be most
welcome, of course).

cheers.

--
Nathan
* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-07-11  0:18             ` Nathan Scott
@ 2006-07-11 13:40               ` Stephane Doyon
  0 siblings, 0 replies; 14+ messages in thread

From: Stephane Doyon @ 2006-07-11 13:40 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
      lkml, suparna, akpm, xfs

On Tue, 11 Jul 2006, Nathan Scott wrote:

> On Mon, Jul 10, 2006 at 12:46:23PM -0400, Stephane Doyon wrote:
> > Hi,
>
> Hi Stephane,
>
> > A few months back, a fix was introduced for a mutex double unlock warning
> > related to direct I/O in XFS. I believe that fix has a lock ordering
> > problem that can cause a deadlock.
> >
> > The problem was that __blockdev_direct_IO() would unlock the i_mutex in
> > the READ and DIO_OWN_LOCKING case, and the mutex would be unlocked again
> > in xfs_read() soon after returning from __generic_file_aio_read(). Because
> > there are lots of execution paths down from __generic_file_aio_read() that
> > do not consistently release the i_mutex, the safest fix was deemed to be
> > to reacquire the i_mutex before returning from __blockdev_direct_IO(). At
> > this point however, the reader is holding an xfs ilock, and AFAICT the
> > i_mutex is usually taken BEFORE the xfs ilock.
>
> That is correct, yes.  Hmm.  Subtle.  Painful.  Thanks for the detailed
> report and your analysis.
>
> > We are seeing a deadlock between a process writing and another process
> > reading the same file, when the reader is using direct I/O. (The writer
>
> Is that a direct writer or a buffered writer?

Whichever, both cases trigger the deadlock.

> > must apparently be growing the file, using either direct or buffered
> > I/O.) Something like this, on XFS:
> > (dd if=/dev/zero of=f bs=128K count=5000 & ) ; sleep 1 ; dd of=/dev/null
> > if=f iflag=direct bs=128K count=5000
> >
> > Seen on kernels 2.6.16 and 2.6.17.
> >
> > The deadlock scenario appears to be this:
> > -The reader goes into xfs_read(), takes the i_mutex and then an xfs_ilock
> > XFS_IOLOCK_SHARED, then calls down to __generic_file_aio_read() which
> > eventually goes down to __blockdev_direct_IO(). In there it drops the
> > i_mutex.
> > -The writer goes into xfs_write() and obtains the i_mutex. It then tries
> > to get an xfs_ilock XFS_ILOCK_EXCL and must wait on it since it's held by
> > the reader.
> > -The reader, still in __blockdev_direct_IO(), executes direct_io_worker()
> > and then tries to reacquire the i_mutex, and must wait on it because the
> > writer has it.
> >
> > And so we have a deadlock.
>
> *nod*.  This will require some thought, I'm not sure I like the sound of
> your suggested workaround there a whole lot, unfortunately, but maybe it
> is all we can do at this stage.  Let me ponder further and get back to you

Thank you.

> (but if you want to prototype your fix further, that'd be most welcome, of
> course).

Sure, well it's not very subtle.  The patch below is what I'm using for
now.  I haven't seen problems with it yet, but it hasn't been seriously
tested.

I'm assuming that it's OK to keep holding the i_mutex during the call to
direct_io_worker(), because in the DIO_LOCKING case, direct_io_worker()
is called with the i_mutex held, and XFS touches the i_mutex only in
xfs_read() and xfs_write(), so hopefully nothing will conflict.
Signed-off-by: Stephane Doyon <sdoyon@max-t.com>

--- linux/fs/direct-io.c.orig	2006-07-11 09:23:20.000000000 -0400
+++ linux/fs/direct-io.c	2006-07-11 09:27:54.000000000 -0400
@@ -1191,7 +1191,6 @@ __blockdev_direct_IO(int rw, struct kioc
 	loff_t end = offset;
 	struct dio *dio;
 	int release_i_mutex = 0;
-	int acquire_i_mutex = 0;
 
 	if (rw & WRITE)
 		current->flags |= PF_SYNCWRITE;
@@ -1254,11 +1253,6 @@ __blockdev_direct_IO(int rw, struct kioc
 			goto out;
 		}
-		if (dio_lock_type == DIO_OWN_LOCKING) {
-			mutex_unlock(&inode->i_mutex);
-			acquire_i_mutex = 1;
-		}
-
 	}
 
 	if (dio_lock_type == DIO_LOCKING)
 		down_read(&inode->i_alloc_sem);
@@ -1282,8 +1276,6 @@ __blockdev_direct_IO(int rw, struct kioc
 out:
 	if (release_i_mutex)
 		mutex_unlock(&inode->i_mutex);
-	else if (acquire_i_mutex)
-		mutex_lock(&inode->i_mutex);
 	if (rw & WRITE)
 		current->flags &= ~PF_SYNCWRITE;
 	return retval;
end of thread, other threads:[~2006-07-11 13:43 UTC | newest]

Thread overview: 14+ messages -- links below jump to the message on this page:
2006-03-09  7:54 [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests Suzuki
2006-03-09 12:03 ` Christoph Hellwig
2006-03-09 22:30   ` Nathan Scott
2006-03-09 22:42     ` Christoph Hellwig
2006-03-09 23:14       ` Nathan Scott
2006-03-10  0:50         ` Nathan Scott
2006-03-10 15:49           ` Christoph Hellwig
2006-03-14  4:46             ` Suparna Bhattacharya
2006-03-17 17:22           ` Adrian Bunk
2006-03-18  3:34             ` Nathan Scott
2006-03-18  5:03               ` Adrian Bunk
2006-07-10 16:46           ` Stephane Doyon
2006-07-11  0:18             ` Nathan Scott
2006-07-11 13:40               ` Stephane Doyon