From: Dave Chinner <david@fromorbit.com>
To: Josef Bacik <jbacik@fb.com>
Cc: linux-fsdevel@vger.kernel.org, kernel-team@fb.com,
viro@ZenIV.linux.org.uk, hch@infradead.org, jack@suse.cz
Subject: [PATCH] sync: wait_sb_inodes() calls iput() with spinlock held (was Re: [PATCH 0/7] super block scalabilit patches V3)
Date: Mon, 22 Jun 2015 12:26:48 +1000
Message-ID: <20150622022648.GO10224@dastard>
In-Reply-To: <20150615213429.GB10224@dastard>
On Tue, Jun 16, 2015 at 07:34:29AM +1000, Dave Chinner wrote:
> On Thu, Jun 11, 2015 at 03:41:05PM -0400, Josef Bacik wrote:
> > Here are the cleaned up versions of Dave Chinner's super block scalability
> > patches. I've been testing them locally for a while and they are pretty solid.
> > They fix a few big issues, such as the global inode list and soft lockups on
> > boxes on unmount that have lots of inodes in cache. Al, if you would consider
> > pulling these in that would be great, you can pull from here
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git superblock-scaling
>
> Passes all my smoke tests.
>
> Tested-by: Dave Chinner <dchinner@redhat.com>
FWIW, I just updated my trees to whatever is in the above branch and
v4.1-rc8, and now I'm seeing problems with wb.list_lock recursion
and "sleeping in atomic" scheduling issues. generic/269 produced
this:
BUG: spinlock cpu recursion on CPU#1, fsstress/3852
lock: 0xffff88042a650c28, .magic: dead4ead, .owner: fsstress/3804, .owner_cpu: 1
CPU: 1 PID: 3852 Comm: fsstress Tainted: G W 4.1.0-rc8-dgc+ #263
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffff88042a650c28 ffff88039898b8e8 ffffffff81e18ffd ffff88042f250fb0
ffff880428f6b8e0 ffff88039898b908 ffffffff81e12f09 ffff88042a650c28
ffffffff8221337b ffff88039898b928 ffffffff81e12f34 ffff88042a650c28
Call Trace:
[<ffffffff81e18ffd>] dump_stack+0x4c/0x6e
[<ffffffff81e12f09>] spin_dump+0x90/0x95
[<ffffffff81e12f34>] spin_bug+0x26/0x2b
[<ffffffff810e762d>] do_raw_spin_lock+0x10d/0x150
[<ffffffff81e24975>] _raw_spin_lock+0x15/0x20
[<ffffffff811f8ba0>] __mark_inode_dirty+0x2b0/0x450
[<ffffffff812003b8>] __set_page_dirty+0x78/0xd0
[<ffffffff81200531>] mark_buffer_dirty+0x61/0xf0
[<ffffffff81200d91>] __block_commit_write.isra.24+0x81/0xb0
[<ffffffff81202406>] block_write_end+0x36/0x70
[<ffffffff814fa110>] ? __xfs_get_blocks+0x8a0/0x8a0
[<ffffffff81202474>] generic_write_end+0x34/0xb0
[<ffffffff8118af3d>] ? wait_for_stable_page+0x1d/0x50
[<ffffffff814fa317>] xfs_vm_write_end+0x67/0xc0
[<ffffffff811813af>] pagecache_write_end+0x1f/0x30
[<ffffffff815060dd>] xfs_iozero+0x10d/0x190
[<ffffffff8150666b>] xfs_zero_last_block+0xdb/0x110
[<ffffffff815067ba>] xfs_zero_eof+0x11a/0x290
[<ffffffff811d69e0>] ? complete_walk+0x60/0x100
[<ffffffff811da25f>] ? path_lookupat+0x5f/0x660
[<ffffffff81506a6e>] xfs_file_aio_write_checks+0x13e/0x160
[<ffffffff81506f15>] xfs_file_buffered_aio_write+0x75/0x250
[<ffffffff811ddb0f>] ? user_path_at_empty+0x5f/0xa0
[<ffffffff810c601d>] ? __might_sleep+0x4d/0x90
[<ffffffff815071f5>] xfs_file_write_iter+0x105/0x120
[<ffffffff811cc5ce>] __vfs_write+0xae/0xf0
[<ffffffff811ccc01>] vfs_write+0xa1/0x190
[<ffffffff811cd999>] SyS_write+0x49/0xb0
[<ffffffff811cc781>] ? SyS_lseek+0x91/0xb0
[<ffffffff81e24fee>] system_call_fastpath+0x12/0x71
And there are a few tests (including generic/269) producing
in_atomic/"scheduling while atomic" bugs in the evict() path such as:
in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: fsstress
CPU: 12 PID: 3852 Comm: fsstress Not tainted 4.1.0-rc8-dgc+ #263
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
000000000000015d ffff88039898b6d8 ffffffff81e18ffd 0000000000000000
ffff880398865550 ffff88039898b6f8 ffffffff810c5f89 ffff8803f15c45c0
ffffffff8227a3bf ffff88039898b728 ffffffff810c601d ffff88039898b758
Call Trace:
[<ffffffff81e18ffd>] dump_stack+0x4c/0x6e
[<ffffffff810c5f89>] ___might_sleep+0xf9/0x140
[<ffffffff810c601d>] __might_sleep+0x4d/0x90
[<ffffffff81201e8b>] block_invalidatepage+0xab/0x140
[<ffffffff814f7579>] xfs_vm_invalidatepage+0x39/0xb0
[<ffffffff8118fa77>] truncate_inode_page+0x67/0xa0
[<ffffffff8118fc92>] truncate_inode_pages_range+0x1a2/0x6f0
[<ffffffff811828d1>] ? find_get_pages_tag+0xf1/0x1b0
[<ffffffff8104a663>] ? __switch_to+0x1e3/0x5a0
[<ffffffff8118dd05>] ? pagevec_lookup_tag+0x25/0x40
[<ffffffff811f620d>] ? __inode_wait_for_writeback+0x6d/0xc0
[<ffffffff8119024c>] truncate_inode_pages_final+0x4c/0x60
[<ffffffff8151c47f>] xfs_fs_evict_inode+0x4f/0x100
[<ffffffff811e8330>] evict+0xc0/0x1a0
[<ffffffff811e8d7b>] iput+0x1bb/0x220
[<ffffffff811f68b3>] sync_inodes_sb+0x353/0x3d0
[<ffffffff8151def8>] xfs_flush_inodes+0x28/0x40
[<ffffffff81514648>] xfs_create+0x638/0x770
[<ffffffff814e9049>] ? xfs_dir2_sf_lookup+0x199/0x330
[<ffffffff81511091>] xfs_generic_create+0xd1/0x300
[<ffffffff817a059c>] ? security_inode_permission+0x1c/0x30
[<ffffffff815112f6>] xfs_vn_create+0x16/0x20
[<ffffffff811d8665>] vfs_create+0xd5/0x140
[<ffffffff811dbea3>] do_last+0xff3/0x1200
[<ffffffff811d9f36>] ? path_init+0x186/0x450
[<ffffffff811dc130>] path_openat+0x80/0x610
[<ffffffff81512a24>] ? xfs_iunlock+0xc4/0x210
[<ffffffff811ddbfa>] do_filp_open+0x3a/0x90
[<ffffffff811dc8bf>] ? getname_flags+0x4f/0x200
[<ffffffff81e249ce>] ? _raw_spin_unlock+0xe/0x30
[<ffffffff811eab17>] ? __alloc_fd+0xa7/0x130
[<ffffffff811cbcf8>] do_sys_open+0x128/0x220
[<ffffffff811cbe4e>] SyS_creat+0x1e/0x20
[<ffffffff81e24fee>] system_call_fastpath+0x12/0x71
It looks to me like iput() is being called with the wb.list_lock
held in wait_sb_inodes(), and everything is going downhill from
there. Patch below fixes the problem for me.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
sync: wait_sb_inodes() calls iput() with spinlock held.
From: Dave Chinner <dchinner@redhat.com>
wait_sb_inodes() is triggering "sleeping in atomic" problems with
blocking operations in iput() processing when wait_sb_inodes()
releases the last reference to an inode. Fix it by delaying the
iput() until the next loop pass when we aren't holding any
spinlocks.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/fs-writeback.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1718702..a2cd363 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1436,6 +1436,7 @@ static void wait_sb_inodes(struct super_block *sb)
 {
 	struct backing_dev_info *bdi = sb->s_bdi;
 	LIST_HEAD(sync_list);
+	struct inode *iput_inode = NULL;
 
 	/*
 	 * We need to be protected against the filesystem going from
@@ -1497,6 +1498,9 @@ static void wait_sb_inodes(struct super_block *sb)
 		spin_unlock(&inode->i_lock);
 		spin_unlock(&bdi->wb.list_lock);
 
+		if (iput_inode)
+			iput(iput_inode);
+
 		filemap_fdatawait(mapping);
 		cond_resched();
 
@@ -1516,9 +1520,19 @@ static void wait_sb_inodes(struct super_block *sb)
 		} else
 			list_del_init(&inode->i_wb_list);
 		spin_unlock_irq(&mapping->tree_lock);
-		iput(inode);
+
+		/*
+		 * can't iput inode while holding the wb.list_lock. Save it for
+		 * the next time through the loop when we drop all our spin
+		 * locks.
+		 */
+		iput_inode = inode;
 	}
 	spin_unlock(&bdi->wb.list_lock);
+
+	if (iput_inode)
+		iput(iput_inode);
+
 	mutex_unlock(&sb->s_sync_lock);
 }
 
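
For anyone who wants to poke at the pattern outside the kernel, here is a
rough userspace sketch of the deferral described in the commit message. It
is not kernel code and it models nothing about writeback. struct obj,
obj_get(), obj_put(), list_lock()/list_unlock() and the two walk_*()
helpers are made-up stand-ins for the inode, __iget(), iput() and
wb.list_lock, and the "lock" is only a flag so the sketch can count how
many final puts happen while it is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode: just a refcount and a "freed" flag. */
struct obj {
	int refcount;
	bool released;
};

/* Hypothetical stand-in for wb.list_lock: a flag, not a real spinlock. */
static bool list_lock_held;
static int puts_under_lock;	/* final puts that ran with the lock held */

static void list_lock(void)
{
	assert(!list_lock_held);	/* catch recursion in the sketch */
	list_lock_held = true;
}

static void list_unlock(void)
{
	assert(list_lock_held);
	list_lock_held = false;
}

static void obj_get(struct obj *o)
{
	o->refcount++;
}

/* Dropping the last reference stands in for work that may sleep. */
static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0) {
		if (list_lock_held)
			puts_under_lock++;
		o->released = true;
	}
}

/* Problem shape: the reference is dropped after retaking the lock. */
static int walk_naive(struct obj *objs, size_t n)
{
	puts_under_lock = 0;
	list_lock();
	for (size_t i = 0; i < n; i++) {
		obj_get(&objs[i]);
		list_unlock();
		/* blocking work on objs[i] would go here */
		list_lock();
		obj_put(&objs[i]);	/* final put with the lock held */
	}
	list_unlock();
	return puts_under_lock;
}

/* Fixed shape: remember the object, put it once the lock is dropped. */
static int walk_deferred(struct obj *objs, size_t n)
{
	struct obj *put_later = NULL;

	puts_under_lock = 0;
	list_lock();
	for (size_t i = 0; i < n; i++) {
		obj_get(&objs[i]);
		list_unlock();

		if (put_later)
			obj_put(put_later);	/* previous object, no lock held */

		/* blocking work on objs[i] would go here */
		list_lock();
		put_later = &objs[i];	/* defer: the lock is held again */
	}
	list_unlock();

	if (put_later)
		obj_put(put_later);	/* last object, after the final unlock */
	return puts_under_lock;
}
```

Calling walk_naive() on an array of zero-initialised objects reports one
final put under the lock per object, while walk_deferred() reports none
and still releases every object. The cost of the deferral is that each
object stays pinned for one extra loop pass.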