next/mmotm: ext4 causes !rwsem_is

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* next/mmotm: ext4 causes !rwsem_is_locked warnings
@ 2012-09-27 21:05 Hugh Dickins
  2012-09-27 21:27 ` Theodore Ts'o
  0 siblings, 1 reply; 3+ messages in thread
From: Hugh Dickins @ 2012-09-27 21:05 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Andrew Morton, Christoph Hellwig, linux-ext4, linux-kernel

Hi Ted,

mmotm (or next) now gives me lots of WARNING at fs/fs-writeback.c:1312 from
writeback_inodes_sb_nr().  That's WARN_ON(!rwsem_is_locked(&sb->s_umount))
(some useful safety from hch) in writeback_inodes_sb_nr().

Your commit 40e684e5c597a98996cf4616e31fb2581ec82060
"ext4: fix potential deadlock in ext4_nonda_switch()" needs more change
to fs/fs-writeback.c.  I guess I'm the only one using a nearly full ext4.

I am, of course, applying the patch at the bottom to my tree for now;
but I think you'd do better not to eliminate that check.  Maybe add a
special wb_reason for your case, and skip rwsem check in that case.

Hugh

WARNING: at fs/fs-writeback.c:1312 writeback_inodes_sb_nr+0x6e/0xca()
Hardware name: GU658AA-ABU a6250.uk
ACPI: Invalid Power Resource to register!
Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device
Pid: 20839, comm: objcopy Not tainted 3.6.0-rc7-mm1 #1
Call Trace:
[<ffffffff8102f7b7>] warn_slowpath_common+0x80/0x98
[<ffffffff8102f7e4>] warn_slowpath_null+0x15/0x17
[<ffffffff810f55a8>] writeback_inodes_sb_nr+0x6e/0xca
[<ffffffff810f58a1>] writeback_inodes_sb+0x22/0x29
[<ffffffff81131b2a>] ext4_nonda_switch+0x8b/0x93
[<ffffffff81134cda>] ext4_da_write_begin+0x48/0x180
[<ffffffff8108718d>] generic_perform_write+0xc0/0x1e0
[<ffffffff81087300>] generic_file_buffered_write+0x53/0x7d
[<ffffffff81088c64>] __generic_file_aio_write+0x2a5/0x2d5
[<ffffffff81088d05>] generic_file_aio_write+0x71/0xda
[<ffffffff8112f203>] ext4_file_write+0xb0/0xb7
[<ffffffff810d6f0a>] do_sync_write+0x98/0xd5
[<ffffffff810d75d2>] vfs_write+0xb0/0x12c
[<ffffffff810d783f>] sys_write+0x4f/0x79
[<ffffffff813b6732>] system_call_fastpath+0x16/0x1b

---
 fs/fs-writeback.c |    1 -
 1 file changed, 1 deletion(-)

--- mmotm/fs/fs-writeback.c	2012-09-26 10:15:29.368071559 -0700
+++ linux/fs/fs-writeback.c	2012-09-26 22:38:25.648719969 -0700
@@ -1309,7 +1309,6 @@ void writeback_inodes_sb_nr(struct super

 	if (sb->s_bdi == &noop_backing_dev_info)
 		return;
-	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 	bdi_queue_work(sb->s_bdi, &work);
 	wait_for_completion(&done);
 }

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: next/mmotm: ext4 causes !rwsem_is_locked warnings
  2012-09-27 21:05 next/mmotm: ext4 causes !rwsem_is_locked warnings Hugh Dickins
@ 2012-09-27 21:27 ` Theodore Ts'o
  2012-09-28  3:31   ` [PATCH v2] ext4: fix potential deadlock in ext4_nonda_switch() Theodore Ts'o
  0 siblings, 1 reply; 3+ messages in thread
From: Theodore Ts'o @ 2012-09-27 21:27 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Christoph Hellwig, linux-ext4, linux-kernel

On Thu, Sep 27, 2012 at 02:05:00PM -0700, Hugh Dickins wrote:
> Hi Ted,
> 
> mmotm (or next) now gives me lots of WARNING at fs/fs-writeback.c:1312 from
> writeback_inodes_sb_nr().  That's WARN_ON(!rwsem_is_locked(&sb->s_umount))
> (some useful safety from hch) in writeback_inodes_sb_nr().
> 
> Your commit 40e684e5c597a98996cf4616e31fb2581ec82060
> "ext4: fix potential deadlock in ext4_nonda_switch()" needs more change
> to fs/fs-writeback.c.  I guess I'm the only one using a nearly full ext4.
> 
> I am, of course, applying the patch at the bottom to my tree for now;
> but I think you'd do better not to eliminate that check.  Maybe add a
> special wb_reason for your case, and skip rwsem check in that case.

Yeah, we were just discussing this on another thread.  I didn't think
of the option of creating a new special wb_reason option, but that's a
much better option than the other ones I had been considering, so I
think I will give that a try.   Thanks for the suggestion!!

      	     	       	 	       - Ted

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH v2] ext4: fix potential deadlock in ext4_nonda_switch()
  2012-09-27 21:27 ` Theodore Ts'o
@ 2012-09-28  3:31   ` Theodore Ts'o
  0 siblings, 0 replies; 3+ messages in thread
From: Theodore Ts'o @ 2012-09-28  3:31 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton, Christoph Hellwig, linux-ext4,
	linux-kernel, Dmitry Monakhov, stable

I've found a much simpler way of fixing this, by using
down_read_trylock().  In the very unlikely case where s_umount is
contended, we can just skip kicking the writeback thread.

	      	       	    	    	- Ted

>From 51ad3407a91ab090d1772b63329bd3b7f2210eb0 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@mit.edu>
Date: Thu, 27 Sep 2012 23:12:48 -0400
Subject: [PATCH v2] ext4: fix potential deadlock in ext4_nonda_switch()

In ext4_nonda_switch(), if the file system is getting full we used to
call writeback_inodes_sb_if_idle().  The problem is that we can be
holding i_mutex already, and this causes a potential deadlock when
writeback_inodes_sb_if_idle() when it tries to take s_umount.  (See
lockdep output below).

As it turns out we don't need need to hold s_umount; the fact that we
are in the middle of the write(2) system call will keep the superblock
pinned.  Unfortunately writeback_inodes_sb() checks to make sure
s_umount is taken, and the VFS uses a different mechanism for making
sure the file system doesn't get unmounted out from under us.  The
simplest way of dealing with this is to just simply grab s_umount
using a trylock, and skip kicking the writeback flusher thread in the
very unlikely case that we can't take a read lock on s_umount without
blocking.

Also, we now check the cirteria for kicking the writeback thread
before we decide to whether to fall back to non-delayed writeback, so
if there are any outstanding delayed allocation writes, we try to get
them resolved as soon as possible.

   [ INFO: possible circular locking dependency detected ]
   3.6.0-rc1-00042-gce894ca #367 Not tainted
   -------------------------------------------------------
   dd/8298 is trying to acquire lock:
    (&type->s_umount_key#18){++++..}, at: [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46

   but task is already holding lock:
    (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3

   which lock already depends on the new lock.

   2 locks held by dd/8298:
    #0:  (sb_writers#2){.+.+.+}, at: [<c01ddcc5>] generic_file_aio_write+0x56/0xd3
    #1:  (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3

   stack backtrace:
   Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367
   Call Trace:
    [<c015b79c>] ? console_unlock+0x345/0x372
    [<c06d62a1>] print_circular_bug+0x190/0x19d
    [<c019906c>] __lock_acquire+0x86d/0xb6c
    [<c01999db>] ? mark_held_locks+0x5c/0x7b
    [<c0199724>] lock_acquire+0x66/0xb9
    [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
    [<c06db935>] down_read+0x28/0x58
    [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
    [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46
    [<c026f3b2>] ext4_nonda_switch+0xe1/0xf4
    [<c0271ece>] ext4_da_write_begin+0x27/0x193
    [<c01dcdb0>] generic_file_buffered_write+0xc8/0x1bb
    [<c01ddc47>] __generic_file_aio_write+0x1dd/0x205
    [<c01ddce7>] generic_file_aio_write+0x78/0xd3
    [<c026d336>] ext4_file_write+0x480/0x4a6
    [<c0198c1d>] ? __lock_acquire+0x41e/0xb6c
    [<c0180944>] ? sched_clock_cpu+0x11a/0x13e
    [<c01967e9>] ? trace_hardirqs_off+0xb/0xd
    [<c018099f>] ? local_clock+0x37/0x4e
    [<c0209f2c>] do_sync_write+0x67/0x9d
    [<c0209ec5>] ? wait_on_retry_sync_kiocb+0x44/0x44
    [<c020a7b9>] vfs_write+0x7b/0xe6
    [<c020a9a6>] sys_write+0x3b/0x64
    [<c06dd4bd>] syscall_call+0x7/0xb

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
---
 fs/ext4/inode.c   | 17 ++++++++++-------
 fs/fs-writeback.c |  1 +
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0a9a89e..4ea396f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2437,6 +2437,16 @@ static int ext4_nonda_switch(struct super_block *sb)
 	free_blocks  = EXT4_C2B(sbi,
 		percpu_counter_read_positive(&sbi->s_freeclusters_counter));
 	dirty_blocks = percpu_counter_read_positive(&sbi->s_dirtyclusters_counter);
+	/*
+	 * Start pushing delalloc when 1/2 of free blocks are dirty.
+	 */
+	if (dirty_blocks && (free_blocks < 2 * dirty_blocks) &&
+	    !writeback_in_progress(sb->s_bdi) &&
+	    down_read_trylock(&sb->s_umount)) {
+		writeback_inodes_sb(sb, WB_REASON_FS_FREE_SPACE);
+		up_read(&sb->s_umount);
+	}
+
 	if (2 * free_blocks < 3 * dirty_blocks ||
 		free_blocks < (dirty_blocks + EXT4_FREECLUSTERS_WATERMARK)) {
 		/*
@@ -2445,13 +2455,6 @@ static int ext4_nonda_switch(struct super_block *sb)
 		 */
 		return 1;
 	}
-	/*
-	 * Even if we don't switch but are nearing capacity,
-	 * start pushing delalloc when 1/2 of free blocks are dirty.
-	 */
-	if (free_blocks < 2 * dirty_blocks)
-		writeback_inodes_sb_if_idle(sb, WB_REASON_FS_FREE_SPACE);
-
 	return 0;
 }
 
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index be3efc4..5602d73 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -63,6 +63,7 @@ int writeback_in_progress(struct backing_dev_info *bdi)
 {
 	return test_bit(BDI_writeback_running, &bdi->state);
 }
+EXPORT_SYMBOL(writeback_in_progress);
 
 static inline struct backing_dev_info *inode_to_bdi(struct inode *inode)
 {
-- 
1.7.12.rc0.22.gcdd159b

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-09-28  3:31 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-27 21:05 next/mmotm: ext4 causes !rwsem_is_locked warnings Hugh Dickins
2012-09-27 21:27 ` Theodore Ts'o
2012-09-28  3:31   ` [PATCH v2] ext4: fix potential deadlock in ext4_nonda_switch() Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).