From mboxrd@z Thu Jan 1 00:00:00 1970
From: Miao Xie
Subject: Re: [PATCH V5 2/2] btrfs: implement delayed inode items operation
Date: Sun, 27 Mar 2011 15:00:00 +0800
Message-ID: <4D8EE070.6000700@cn.fujitsu.com>
References: <4D8B2DEB.1000001@cn.fujitsu.com> <20110327143055.ae2d3c68.kitayama@cl.bb4u.ne.jp>
Reply-To: miaox@cn.fujitsu.com
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: Chris Mason, Linux Btrfs, Ito, David Sterba
To: Itaru Kitayama
Return-path:
In-Reply-To: <20110327143055.ae2d3c68.kitayama@cl.bb4u.ne.jp>
List-ID:

On Sun, 27 Mar 2011 14:30:55 +0900, Itaru Kitayama wrote:
> Chris' stress test, stress.sh -n 50 -c /mnt/linux-2.6 /mnt gave me another lockdep splat
> (see below). I applied your V5 patches on top of the next-rc branch.

I got it. The cause is that the allocation flag for the metadata page cache,
which is stored in the btree inode's i_mapping, is set to
GFP_HIGHUSER_MOVABLE. So whenever we allocate pages for the btree's page
cache, this lockdep warning can be triggered. I think the warning can be
triggered even without my patch, because btrfs_evict_inode() does operations
similar to what I do in btrfs_destroy_inode().

Task1                           kswapd0 task
open()
...
btrfs_search_slot()
...
btrfs_cow_block()
...
alloc_page()
wait for reclaiming
                                shrink_slab()
                                ...
                                shrink_icache_memory()
                                ...
                                btrfs_evict_inode()
                                ...
                                btrfs_search_slot()

If the path is locked by Task1, the deadlock happens.

So the btree's page cache is different from a file's page cache: it cannot
allocate pages with the GFP_HIGHUSER_MOVABLE flag. I will make a separate
patch to fix it.

> I haven't triggered it in my actual testing, but do you think we can iterate a list of block
> groups in an lockless manner using rcu?

Maybe we can use it, but AFAIK the write side of sleepable RCU is quite slow.
Although operations on the block group list are infrequent, I think we should
run some tests to check for a performance regression.

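To make the first point (the btree inode's allocation flag) a bit more concrete, here is a rough userspace sketch of the flag check involved. It is only a model and not the fix itself: every DEMO_* name, struct demo_mapping and the bit values are made up for illustration, and the flag composition is simplified compared with the kernel's real definitions. It assumes that reclaim only calls back into filesystem code when the allocation mask carries the "FS" bit.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative bit values only (assumption). The real definitions live in
 * the kernel's include/linux/gfp.h and are not reproduced here.
 */
#define DEMO_GFP_WAIT    0x01u	/* allocator may sleep and reclaim */
#define DEMO_GFP_IO      0x02u	/* reclaim may start physical I/O */
#define DEMO_GFP_FS      0x04u	/* reclaim may call into the filesystem */
#define DEMO_GFP_HIGHMEM 0x08u
#define DEMO_GFP_MOVABLE 0x10u

/* Simplified stand-ins for GFP_HIGHUSER_MOVABLE and GFP_NOFS. */
#define DEMO_GFP_HIGHUSER_MOVABLE \
	(DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS | \
	 DEMO_GFP_HIGHMEM | DEMO_GFP_MOVABLE)
#define DEMO_GFP_NOFS (DEMO_GFP_WAIT | DEMO_GFP_IO)

/* Hypothetical stand-in for the gfp mask kept in an inode's i_mapping. */
struct demo_mapping {
	unsigned int gfp_mask;
};

/* True if reclaim triggered by this allocation may re-enter fs code. */
static bool demo_reclaim_may_enter_fs(unsigned int gfp_mask)
{
	return (gfp_mask & DEMO_GFP_FS) != 0;
}

/*
 * Models the two-column scenario above: the allocating task holds a btree
 * path lock and the reclaim it waits for needs the same lock. In this model
 * a deadlock is possible only if reclaim is allowed into the filesystem.
 */
static bool demo_can_deadlock(const struct demo_mapping *m,
			      bool holds_path_lock)
{
	return holds_path_lock && demo_reclaim_may_enter_fs(m->gfp_mask);
}

/* Usage: the current mask versus a mask without the FS bit. */
static void demo_self_check(void)
{
	struct demo_mapping btree = { .gfp_mask = DEMO_GFP_HIGHUSER_MOVABLE };

	assert(demo_can_deadlock(&btree, true));

	btree.gfp_mask = DEMO_GFP_NOFS;
	assert(!demo_can_deadlock(&btree, true));
}
```

In this model, dropping the FS bit from the mapping's mask is what removes the dependency. Whether the real patch does exactly that is not decided in this mail.
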
Thanks
Miao

>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2164296..f40ff4e 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -740,6 +740,7 @@ struct btrfs_space_info {
>  	struct list_head block_groups[BTRFS_NR_RAID_TYPES];
>  	spinlock_t lock;
>  	struct rw_semaphore groups_sem;
> +	struct srcu_struct groups_srcu;
>  	atomic_t caching_threads;
>  };
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 9e4c9f4..22d6dbb 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3003,6 +3003,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
>  	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
>  		INIT_LIST_HEAD(&found->block_groups[i]);
>  	init_rwsem(&found->groups_sem);
> +	init_srcu_struct(&found->groups_srcu);
>  	spin_lock_init(&found->lock);
>  	found->flags = flags & (BTRFS_BLOCK_GROUP_DATA |
>  				BTRFS_BLOCK_GROUP_SYSTEM |
> @@ -4853,6 +4854,7 @@ static noinline int find_free_extent(struct btrfs_trans_handle *trans,
>  				     int data)
>  {
>  	int ret = 0;
> +	int idx;
>  	struct btrfs_root *root = orig_root->fs_info->extent_root;
>  	struct btrfs_free_cluster *last_ptr = NULL;
>  	struct btrfs_block_group_cache *block_group = NULL;
> @@ -4929,7 +4931,7 @@ ideal_cache:
>  		if (block_group && block_group_bits(block_group, data) &&
>  		    (block_group->cached != BTRFS_CACHE_NO ||
>  		     search_start == ideal_cache_offset)) {
> -			down_read(&space_info->groups_sem);
> +			idx = srcu_read_lock(&space_info->groups_srcu);
>  			if (list_empty(&block_group->list) ||
>  			    block_group->ro) {
>  				/*
> @@ -4939,7 +4941,7 @@ ideal_cache:
>  				 * valid
>  				 */
>  				btrfs_put_block_group(block_group);
> -				up_read(&space_info->groups_sem);
> +				srcu_read_unlock(&space_info->groups_srcu, idx);
>  			} else {
>  				index = get_block_group_index(block_group);
>  				goto have_block_group;
> @@ -4949,8 +4951,8 @@ ideal_cache:
>  		}
>  	}
>  search:
> -	down_read(&space_info->groups_sem);
> -	list_for_each_entry(block_group, &space_info->block_groups[index],
> +	idx = srcu_read_lock(&space_info->groups_srcu);
> +	list_for_each_entry_rcu(block_group, &space_info->block_groups[index],
>  			    list) {
>  		u64 offset;
>  		int cached;
> @@ -5197,8 +5199,8 @@ loop:
>  		BUG_ON(index != get_block_group_index(block_group));
>  		btrfs_put_block_group(block_group);
>  	}
> -	up_read(&space_info->groups_sem);
> -
> +	srcu_read_unlock(&space_info->groups_srcu, idx);
> +
>  	if (!ins->objectid && ++index < BTRFS_NR_RAID_TYPES)
>  		goto search;
>
>
>
> =========================================================
> [ INFO: possible irq lock inversion dependency detected ]
> 2.6.36-v5+ #2
> ---------------------------------------------------------
> kswapd0/49 just changed the state of lock:
>  (&delayed_node->mutex){+.+.-.}, at: [] btrfs_remove_delayed_node+0x3e/0xd2
> but this lock took another, RECLAIM_FS-READ-unsafe lock in the past:
>  (&found->groups_sem){++++.+}
>
> and interrupts could create inverse lock ordering between them.
>
>
> other info that might help us debug this:
> 2 locks held by kswapd0/49:
>  #0: (shrinker_rwsem){++++..}, at: [] shrink_slab+0x3d/0x164
>  #1: (iprune_sem){++++.-}, at: [] shrink_icache_memory+0x4d/0x213
>
> the shortest dependencies between 2nd lock and 1st lock:
>  -> (&found->groups_sem){++++.+} ops: 1334 {
>     HARDIRQ-ON-W at:
>       [] __lock_acquire+0x346/0xda6
>       [] lock_acquire+0x11d/0x143
>       [] down_write+0x55/0x9b
>       [] __link_block_group+0x5a/0x83
>       [] btrfs_read_block_groups+0x2fb/0x56c
>       [] open_ctree+0xf78/0x14ab
>       [] btrfs_get_sb+0x236/0x467
>       [] vfs_kern_mount+0xbd/0x1a7
>       [] do_kern_mount+0x4d/0xed
>       [] do_mount+0x74e/0x7c5
>       [] sys_mount+0x88/0xc2
>       [] system_call_fastpath+0x16/0x1b
>     HARDIRQ-ON-R at:
>       [] __lock_acquire+0x31e/0xda6
>       [] lock_acquire+0x11d/0x143
>       [] down_read+0x4c/0x91
>       [] find_free_extent+0x3ec/0xa86
>       [] btrfs_reserve_extent+0xb4/0x142
>       [] btrfs_alloc_free_block+0x167/0x2b2
>       [] __btrfs_cow_block+0x103/0x346
>       [] btrfs_cow_block+0x101/0x110
>       [] btrfs_search_slot+0x143/0x513
>       [] btrfs_insert_empty_items+0x6a/0xbc
>       [] btrfs_insert_orphan_item+0x5d/0x75
>       [] btrfs_orphan_add+0x139/0x152
>       [] btrfs_setattr+0xff/0x253
>       [] notify_change+0x1a2/0x29d
>       [] do_truncate+0x6c/0x89
>       [] do_last+0x579/0x57e
>       [] do_filp_open+0x215/0x5ae
>       [] do_sys_open+0x60/0xfc
>       [] sys_open+0x20/0x22
>       [] system_call_fastpath+0x16/0x1b
>     SOFTIRQ-ON-W at:
>       [] __lock_acquire+0x367/0xda6
>       [] lock_acquire+0x11d/0x143
>       [] down_write+0x55/0x9b
>       [] __link_block_group+0x5a/0x83
>       [] btrfs_read_block_groups+0x2fb/0x56c
>       [] open_ctree+0xf78/0x14ab
>       [] btrfs_get_sb+0x236/0x467
>       [] vfs_kern_mount+0xbd/0x1a7
>       [] do_kern_mount+0x4d/0xed
>       [] do_mount+0x74e/0x7c5
>       [] sys_mount+0x88/0xc2
>       [] system_call_fastpath+0x16/0x1b
>     SOFTIRQ-ON-R at:
>       [] __lock_acquire+0x367/0xda6
>       [] lock_acquire+0x11d/0x143
>       [] down_read+0x4c/0x91
>       [] find_free_extent+0x3ec/0xa86
>       [] btrfs_reserve_extent+0xb4/0x142
>       [] btrfs_alloc_free_block+0x167/0x2b2
>       [] __btrfs_cow_block+0x103/0x346
>       [] btrfs_cow_block+0x101/0x110
>       [] btrfs_search_slot+0x143/0x513
>       [] btrfs_insert_empty_items+0x6a/0xbc
>       [] btrfs_insert_orphan_item+0x5d/0x75
>       [] btrfs_orphan_add+0x139/0x152
>       [] btrfs_setattr+0xff/0x253
>       [] notify_change+0x1a2/0x29d
>       [] do_truncate+0x6c/0x89
>       [] do_last+0x579/0x57e
>       [] do_filp_open+0x215/0x5ae
>       [] do_sys_open+0x60/0xfc
>       [] sys_open+0x20/0x22
>       [] system_call_fastpath+0x16/0x1b
>     RECLAIM_FS-ON-R at:
>       [] mark_held_locks+0x52/0x70
>       [] lockdep_trace_alloc+0xa4/0xc2
>       [] __alloc_pages_nodemask+0x96/0x841
>       [] alloc_pages_current+0xa7/0xca
>       [] __page_cache_alloc+0x85/0x8c
>       [] __do_page_cache_readahead+0xb5/0x19d
>       [] ra_submit+0x21/0x25
>       [] ondemand_readahead+0x1b6/0x1c9
>       [] page_cache_sync_readahead+0x3d/0x3f
>       [] load_free_space_cache+0x262/0x671
>       [] cache_block_group+0x97/0x233
>       [] find_free_extent+0x479/0xa86
>       [] btrfs_reserve_extent+0xb4/0x142
>       [] btrfs_alloc_free_block+0x167/0x2b2
>       [] __btrfs_cow_block+0x103/0x346
>       [] btrfs_cow_block+0x101/0x110
>       [] btrfs_search_slot+0x143/0x513
>       [] btrfs_insert_empty_items+0x6a/0xbc
>       [] btrfs_insert_orphan_item+0x5d/0x75
>       [] btrfs_orphan_add+0x139/0x152
>       [] btrfs_setattr+0xff/0x253
>       [] notify_change+0x1a2/0x29d
>       [] do_truncate+0x6c/0x89
>       [] do_last+0x579/0x57e
>       [] do_filp_open+0x215/0x5ae
>       [] do_sys_open+0x60/0xfc
>       [] sys_open+0x20/0x22
>       [] system_call_fastpath+0x16/0x1b
>     INITIAL USE at:
>       [] __lock_acquire+0x3bd/0xda6
>       [] lock_acquire+0x11d/0x143
>       [] down_write+0x55/0x9b
>       [] __link_block_group+0x5a/0x83
>       [] btrfs_read_block_groups+0x2fb/0x56c
>       [] open_ctree+0xf78/0x14ab
>       [] btrfs_get_sb+0x236/0x467
>       [] vfs_kern_mount+0xbd/0x1a7
>       [] do_kern_mount+0x4d/0xed
>       [] do_mount+0x74e/0x7c5
>       [] sys_mount+0x88/0xc2
>       [] system_call_fastpath+0x16/0x1b
>   }
>   ... key at: [] __key.40112+0x0/0x8
>   ... acquired at:
>     [] lock_acquire+0x11d/0x143
>     [] down_read+0x4c/0x91
>     [] find_free_extent+0x2c4/0xa86
>     [] btrfs_reserve_extent+0xb4/0x142
>     [] btrfs_alloc_free_block+0x167/0x2b2
>     [] __btrfs_cow_block+0x103/0x346
>     [] btrfs_cow_block+0x101/0x110
>     [] btrfs_search_slot+0x143/0x513
>     [] btrfs_lookup_inode+0x2f/0x8f
>     [] btrfs_update_delayed_inode+0x75/0x135
>     [] btrfs_async_run_delayed_node_done+0xd5/0x194
>     [] worker_loop+0x198/0x4dd
>     [] kthread+0x9d/0xa5
>     [] kernel_thread_helper+0x4/0x10
>
>  -> (&delayed_node->mutex){+.+.-.} ops: 8932 {
>     HARDIRQ-ON-W at:
>       [] __lock_acquire+0x346/0xda6
>       [] lock_acquire+0x11d/0x143
>       [] __mutex_lock_common+0x5a/0x444
>       [] mutex_lock_nested+0x39/0x3e
>       [] btrfs_delayed_update_inode+0x45/0x101
>       [] btrfs_update_inode+0x2e/0x129
>       [] btrfs_truncate+0x43d/0x477
>       [] vmtruncate+0x44/0x52
>       [] btrfs_setattr+0x202/0x253
>       [] notify_change+0x1a2/0x29d
>       [] do_truncate+0x6c/0x89
>       [] do_last+0x579/0x57e
>       [] do_filp_open+0x215/0x5ae
>       [] do_sys_open+0x60/0xfc
>       [] sys_open+0x20/0x22
>       [] system_call_fastpath+0x16/0x1b
>     SOFTIRQ-ON-W at:
>       [] __lock_acquire+0x367/0xda6
>       [] lock_acquire+0x11d/0x143
>       [] __mutex_lock_common+0x5a/0x444
>       [] mutex_lock_nested+0x39/0x3e
>       [] btrfs_delayed_update_inode+0x45/0x101
>       [] btrfs_update_inode+0x2e/0x129
>       [] btrfs_truncate+0x43d/0x477
>       [] vmtruncate+0x44/0x52
>       [] btrfs_setattr+0x202/0x253
>       [] notify_change+0x1a2/0x29d
>       [] do_truncate+0x6c/0x89
>       [] do_last+0x579/0x57e
>       [] do_filp_open+0x215/0x5ae
>       [] do_sys_open+0x60/0xfc
>       [] sys_open+0x20/0x22
>       [] system_call_fastpath+0x16/0x1b
>     IN-RECLAIM_FS-W at:
>       [] __lock_acquire+0x3a5/0xda6
>       [] lock_acquire+0x11d/0x143
>       [] __mutex_lock_common+0x5a/0x444
>       [] mutex_lock_nested+0x39/0x3e
>       [] btrfs_remove_delayed_node+0x3e/0xd2
>       [] btrfs_destroy_inode+0x2ae/0x2d4
>       [] destroy_inode+0x2f/0x45
>       [] dispose_list+0xaa/0xdf
>       [] shrink_icache_memory+0x1e3/0x213
>       [] shrink_slab+0xe0/0x164
>       [] balance_pgdat+0x2e8/0x50b
>       [] kswapd+0x380/0x3c0
>       [] kthread+0x9d/0xa5
>       [] kernel_thread_helper+0x4/0x10
>     INITIAL USE at:
>       [] __lock_acquire+0x3bd/0xda6
>       [] lock_acquire+0x11d/0x143
>       [] __mutex_lock_common+0x5a/0x444
>       [] mutex_lock_nested+0x39/0x3e
>       [] btrfs_delayed_update_inode+0x45/0x101
>       [] btrfs_update_inode+0x2e/0x129
>       [] btrfs_truncate+0x43d/0x477
>       [] vmtruncate+0x44/0x52
>       [] btrfs_setattr+0x202/0x253
>       [] notify_change+0x1a2/0x29d
>       [] do_truncate+0x6c/0x89
>       [] do_last+0x579/0x57e
>       [] do_filp_open+0x215/0x5ae
>       [] do_sys_open+0x60/0xfc
>       [] sys_open+0x20/0x22
>       [] system_call_fastpath+0x16/0x1b
>   }
>   ... key at: [] __key.31289+0x0/0x8
>   ... acquired at:
>     [] check_usage_forwards+0x71/0x7e
>     [] mark_lock+0x18c/0x26a
>     [] __lock_acquire+0x3a5/0xda6
>     [] lock_acquire+0x11d/0x143
>     [] __mutex_lock_common+0x5a/0x444
>     [] mutex_lock_nested+0x39/0x3e
>     [] btrfs_remove_delayed_node+0x3e/0xd2
>     [] btrfs_destroy_inode+0x2ae/0x2d4
>     [] destroy_inode+0x2f/0x45
>     [] dispose_list+0xaa/0xdf
>     [] shrink_icache_memory+0x1e3/0x213
>     [] shrink_slab+0xe0/0x164
>     [] balance_pgdat+0x2e8/0x50b
>     [] kswapd+0x380/0x3c0
>     [] kthread+0x9d/0xa5
>     [] kernel_thread_helper+0x4/0x10
>
>
> stack backtrace:
> Pid: 49, comm: kswapd0 Not tainted 2.6.36-v5+ #2
> Call Trace:
>  [] print_irq_inversion_bug+0x124/0x135
>  [] check_usage_forwards+0x71/0x7e
>  [] ? check_usage_forwards+0x0/0x7e
>  [] mark_lock+0x18c/0x26a
>  [] __lock_acquire+0x3a5/0xda6
>  [] ? __lock_acquire+0xd97/0xda6
>  [] ? btrfs_remove_delayed_node+0x3e/0xd2
>  [] lock_acquire+0x11d/0x143
>  [] ? btrfs_remove_delayed_node+0x3e/0xd2
>  [] ? btrfs_remove_delayed_node+0x3e/0xd2
>  [] __mutex_lock_common+0x5a/0x444
>  [] ? btrfs_remove_delayed_node+0x3e/0xd2
>  [] ? trace_hardirqs_on+0xd/0xf
>  [] mutex_lock_nested+0x39/0x3e
>  [] btrfs_remove_delayed_node+0x3e/0xd2
>  [] btrfs_destroy_inode+0x2ae/0x2d4
>  [] destroy_inode+0x2f/0x45
>  [] dispose_list+0xaa/0xdf
>  [] shrink_icache_memory+0x1e3/0x213
>  [] shrink_slab+0xe0/0x164
>  [] balance_pgdat+0x2e8/0x50b
>  [] kswapd+0x380/0x3c0
>  [] ? autoremove_wake_function+0x0/0x39
>  [] ? kswapd+0x0/0x3c0
>  [] kthread+0x9d/0xa5
>  [] kernel_thread_helper+0x4/0x10
>  [] ? finish_task_switch+0x70/0xb9
>  [] ? restore_args+0x0/0x30
>  [] ? kthread+0x0/0xa5
>  [] ? kernel_thread_helper+0x0/0x10
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>