linux-fsdevel.vger.kernel.org archive mirror
From: Josef Bacik <jbacik@fb.com>
To: Dave Chinner <david@fromorbit.com>
Cc: <linux-fsdevel@vger.kernel.org>, <kernel-team@fb.com>,
	<viro@ZenIV.linux.org.uk>, <hch@infradead.org>, <jack@suse.cz>
Subject: Re: [PATCH] sync: wait_sb_inodes() calls iput() with spinlock held (was Re: [PATCH 0/7] super block scalability patches V3)
Date: Mon, 22 Jun 2015 09:21:03 -0700	[thread overview]
Message-ID: <558835EF.2000000@fb.com> (raw)
In-Reply-To: <20150622022648.GO10224@dastard>

On 06/21/2015 07:26 PM, Dave Chinner wrote:
> On Tue, Jun 16, 2015 at 07:34:29AM +1000, Dave Chinner wrote:
>> On Thu, Jun 11, 2015 at 03:41:05PM -0400, Josef Bacik wrote:
>>> Here are the cleaned-up versions of Dave Chinner's super block scalability
>>> patches.  I've been testing them locally for a while and they are pretty solid.
>>> They fix a few big issues, such as the global inode list and soft lockups on
>>> unmount on boxes that have lots of inodes in cache.  Al, if you would consider
>>> pulling these in that would be great; you can pull from here:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git superblock-scaling
>>
>> Passes all my smoke tests.
>>
>> Tested-by: Dave Chinner <dchinner@redhat.com>
>
> FWIW, I just updated my trees to whatever is in the above branch and
> v4.1-rc8, and now I'm seeing problems with wb.list_lock recursion
> and "sleeping in atomic" scehduling issues. generic/269 produced
> this:
>
>   BUG: spinlock cpu recursion on CPU#1, fsstress/3852
>    lock: 0xffff88042a650c28, .magic: dead4ead, .owner: fsstress/3804, .owner_cpu: 1
>   CPU: 1 PID: 3852 Comm: fsstress Tainted: G        W       4.1.0-rc8-dgc+ #263
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>    ffff88042a650c28 ffff88039898b8e8 ffffffff81e18ffd ffff88042f250fb0
>    ffff880428f6b8e0 ffff88039898b908 ffffffff81e12f09 ffff88042a650c28
>    ffffffff8221337b ffff88039898b928 ffffffff81e12f34 ffff88042a650c28
>   Call Trace:
>    [<ffffffff81e18ffd>] dump_stack+0x4c/0x6e
>    [<ffffffff81e12f09>] spin_dump+0x90/0x95
>    [<ffffffff81e12f34>] spin_bug+0x26/0x2b
>    [<ffffffff810e762d>] do_raw_spin_lock+0x10d/0x150
>    [<ffffffff81e24975>] _raw_spin_lock+0x15/0x20
>    [<ffffffff811f8ba0>] __mark_inode_dirty+0x2b0/0x450
>    [<ffffffff812003b8>] __set_page_dirty+0x78/0xd0
>    [<ffffffff81200531>] mark_buffer_dirty+0x61/0xf0
>    [<ffffffff81200d91>] __block_commit_write.isra.24+0x81/0xb0
>    [<ffffffff81202406>] block_write_end+0x36/0x70
>    [<ffffffff814fa110>] ? __xfs_get_blocks+0x8a0/0x8a0
>    [<ffffffff81202474>] generic_write_end+0x34/0xb0
>    [<ffffffff8118af3d>] ? wait_for_stable_page+0x1d/0x50
>    [<ffffffff814fa317>] xfs_vm_write_end+0x67/0xc0
>    [<ffffffff811813af>] pagecache_write_end+0x1f/0x30
>    [<ffffffff815060dd>] xfs_iozero+0x10d/0x190
>    [<ffffffff8150666b>] xfs_zero_last_block+0xdb/0x110
>    [<ffffffff815067ba>] xfs_zero_eof+0x11a/0x290
>    [<ffffffff811d69e0>] ? complete_walk+0x60/0x100
>    [<ffffffff811da25f>] ? path_lookupat+0x5f/0x660
>    [<ffffffff81506a6e>] xfs_file_aio_write_checks+0x13e/0x160
>    [<ffffffff81506f15>] xfs_file_buffered_aio_write+0x75/0x250
>    [<ffffffff811ddb0f>] ? user_path_at_empty+0x5f/0xa0
>    [<ffffffff810c601d>] ? __might_sleep+0x4d/0x90
>    [<ffffffff815071f5>] xfs_file_write_iter+0x105/0x120
>    [<ffffffff811cc5ce>] __vfs_write+0xae/0xf0
>    [<ffffffff811ccc01>] vfs_write+0xa1/0x190
>    [<ffffffff811cd999>] SyS_write+0x49/0xb0
>    [<ffffffff811cc781>] ? SyS_lseek+0x91/0xb0
>    [<ffffffff81e24fee>] system_call_fastpath+0x12/0x71
>
> And there are a few tests (including generic/269) producing
> in_atomic/"scheduling while atomic" bugs in the evict() path such as:
>
>   in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: fsstress
>   CPU: 12 PID: 3852 Comm: fsstress Not tainted 4.1.0-rc8-dgc+ #263
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>    000000000000015d ffff88039898b6d8 ffffffff81e18ffd 0000000000000000
>    ffff880398865550 ffff88039898b6f8 ffffffff810c5f89 ffff8803f15c45c0
>    ffffffff8227a3bf ffff88039898b728 ffffffff810c601d ffff88039898b758
>   Call Trace:
>    [<ffffffff81e18ffd>] dump_stack+0x4c/0x6e
>    [<ffffffff810c5f89>] ___might_sleep+0xf9/0x140
>    [<ffffffff810c601d>] __might_sleep+0x4d/0x90
>    [<ffffffff81201e8b>] block_invalidatepage+0xab/0x140
>    [<ffffffff814f7579>] xfs_vm_invalidatepage+0x39/0xb0
>    [<ffffffff8118fa77>] truncate_inode_page+0x67/0xa0
>    [<ffffffff8118fc92>] truncate_inode_pages_range+0x1a2/0x6f0
>    [<ffffffff811828d1>] ? find_get_pages_tag+0xf1/0x1b0
>    [<ffffffff8104a663>] ? __switch_to+0x1e3/0x5a0
>    [<ffffffff8118dd05>] ? pagevec_lookup_tag+0x25/0x40
>    [<ffffffff811f620d>] ? __inode_wait_for_writeback+0x6d/0xc0
>    [<ffffffff8119024c>] truncate_inode_pages_final+0x4c/0x60
>    [<ffffffff8151c47f>] xfs_fs_evict_inode+0x4f/0x100
>    [<ffffffff811e8330>] evict+0xc0/0x1a0
>    [<ffffffff811e8d7b>] iput+0x1bb/0x220
>    [<ffffffff811f68b3>] sync_inodes_sb+0x353/0x3d0
>    [<ffffffff8151def8>] xfs_flush_inodes+0x28/0x40
>    [<ffffffff81514648>] xfs_create+0x638/0x770
>    [<ffffffff814e9049>] ? xfs_dir2_sf_lookup+0x199/0x330
>    [<ffffffff81511091>] xfs_generic_create+0xd1/0x300
>    [<ffffffff817a059c>] ? security_inode_permission+0x1c/0x30
>    [<ffffffff815112f6>] xfs_vn_create+0x16/0x20
>    [<ffffffff811d8665>] vfs_create+0xd5/0x140
>    [<ffffffff811dbea3>] do_last+0xff3/0x1200
>    [<ffffffff811d9f36>] ? path_init+0x186/0x450
>    [<ffffffff811dc130>] path_openat+0x80/0x610
>    [<ffffffff81512a24>] ? xfs_iunlock+0xc4/0x210
>    [<ffffffff811ddbfa>] do_filp_open+0x3a/0x90
>    [<ffffffff811dc8bf>] ? getname_flags+0x4f/0x200
>    [<ffffffff81e249ce>] ? _raw_spin_unlock+0xe/0x30
>    [<ffffffff811eab17>] ? __alloc_fd+0xa7/0x130
>    [<ffffffff811cbcf8>] do_sys_open+0x128/0x220
>    [<ffffffff811cbe4e>] SyS_creat+0x1e/0x20
>    [<ffffffff81e24fee>] system_call_fastpath+0x12/0x71
>
> It looks to me like iput() is being called with the wb.list_lock
> held in wait_sb_inodes(), and everything is going downhill from
> there.  Patch below fixes the problem for me.
>
> Cheers,
>
> Dave.
>

Thanks Dave, I'll add it.  I think this is what we were doing at first, 
but then I changed it and didn't notice the wb.list_lock.  For reference, 
the deferred-iput pattern the code needs to keep is sketched below.  Thanks,

Josef
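
The second trace above shows the core hazard: iput() on the last reference 
goes through evict(), and truncate_inode_pages_final()/->evict_inode() can 
sleep, so iput() must never be called with wb.list_lock (or any other 
spinlock) held.  Mainline wait_sb_inodes() already avoids this by deferring 
the iput() until the list lock has been dropped.  Below is a minimal sketch 
of that shape, based loosely on the 4.1-era walk of s_inodes under 
inode_sb_list_lock; the function name is just for illustration and this is 
not the actual patch in this thread.

/*
 * Sketch only: roughly the deferred-iput pattern of the 4.1-era
 * wait_sb_inodes(), not the code from this patch series.  The key point
 * is that iput() is only ever called after the list spinlock has been
 * dropped, because dropping the last reference can enter evict() and
 * sleep.
 */
static void wait_sb_inodes_sketch(struct super_block *sb)
{
	struct inode *inode, *old_inode = NULL;

	spin_lock(&inode_sb_list_lock);
	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
		struct address_space *mapping = inode->i_mapping;

		spin_lock(&inode->i_lock);
		if ((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) ||
		    mapping->nrpages == 0) {
			spin_unlock(&inode->i_lock);
			continue;
		}
		__iget(inode);
		spin_unlock(&inode->i_lock);
		spin_unlock(&inode_sb_list_lock);

		/*
		 * The reference we hold keeps 'inode' on s_inodes, so the
		 * list cursor stays valid across the unlock.  The previous
		 * inode's reference is dropped here, with no spinlocks
		 * held, so evict() is free to sleep.
		 */
		iput(old_inode);
		old_inode = inode;

		filemap_fdatawait(mapping);
		cond_resched();

		spin_lock(&inode_sb_list_lock);
	}
	spin_unlock(&inode_sb_list_lock);
	iput(old_inode);	/* last deferred reference, lock already dropped */
}

The same deferral presumably has to be preserved for the new per-bdi 
writeback list walk, which as I understand it is what Dave's patch in the 
parent message restores.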

Thread overview: 28+ messages
2015-06-11 19:41 [PATCH 0/7] super block scalability patches V3 Josef Bacik
2015-06-11 19:41 ` [PATCH 1/8] writeback: plug writeback at a high level Josef Bacik
2015-06-17 12:03   ` Christoph Hellwig
2015-06-11 19:41 ` [PATCH 2/8] inode: add hlist_fake to avoid the inode hash lock in evict Josef Bacik
2015-06-17 12:03   ` Christoph Hellwig
2015-06-11 19:41 ` [PATCH 3/8] inode: convert inode_sb_list_lock to per-sb Josef Bacik
2015-06-17 12:06   ` Christoph Hellwig
2015-06-11 19:41 ` [PATCH 4/8] sync: serialise per-superblock sync operations Josef Bacik
2015-06-17 12:06   ` Christoph Hellwig
2015-06-11 19:41 ` [PATCH 5/8] inode: rename i_wb_list to i_io_list Josef Bacik
2015-06-17 12:06   ` Christoph Hellwig
2015-06-11 19:41 ` [PATCH 6/8] bdi: add a new writeback list for sync Josef Bacik
2015-06-15 14:12   ` Jan Kara
2015-06-16 15:42     ` Josef Bacik
2015-06-17 10:34       ` Jan Kara
2015-06-17 17:55         ` Josef Bacik
2015-06-18  9:28           ` Jan Kara
2015-06-18 22:18   ` [PATCH 6/8 V4] " Josef Bacik
2015-06-19  8:38     ` Jan Kara
2015-06-11 19:41 ` [PATCH 7/8] writeback: periodically trim the writeback list Josef Bacik
2015-06-11 19:41 ` [PATCH 8/8] inode: don't softlockup when evicting inodes Josef Bacik
2015-06-15 14:16   ` Jan Kara
2015-06-11 20:50 ` [PATCH 0/7] super block scalability patches V3 Tejun Heo
2015-06-15 21:34 ` Dave Chinner
2015-06-22  2:26   ` [PATCH] sync: wait_sb_inodes() calls iput() with spinlock held (was Re: [PATCH 0/7] super block scalability patches V3) Dave Chinner
2015-06-22 16:21     ` Josef Bacik [this message]
2015-06-23 23:14     ` Josef Bacik
2015-06-24  0:35       ` Dave Chinner
