linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* About hung task on generic/041
@ 2018-07-11  9:02 Lu Fengqi
  2018-07-12 10:40 ` Filipe Manana
  0 siblings, 1 reply; 4+ messages in thread
From: Lu Fengqi @ 2018-07-11  9:02 UTC (permalink / raw)
  To: linux-btrfs; +Cc: fdmanana

Hi,

When I run generic/041 with v4.18-rc3 (turn on kasan and hung task
detection), btrfs-transaction kthread will trigger the hung task timeout
(stall at wait_event in btrfs_commit_transaction). At the same time, you
can see that xfs_io -c fsync will occupy 100% of the CPU. I am not sure
whether this is a problem. Any suggestion?

[Wed Jul 11 15:50:08 2018] INFO: task btrfs-transacti:1053 blocked for more than 120 seconds.
[Wed Jul 11 15:50:08 2018]       Not tainted 4.18.0-rc3-custom #14
[Wed Jul 11 15:50:08 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jul 11 15:50:08 2018] btrfs-transacti D    0  1053      2 0x80000000
[Wed Jul 11 15:50:08 2018] Call Trace:
[Wed Jul 11 15:50:08 2018]  ? __schedule+0x5b2/0x1380
[Wed Jul 11 15:50:08 2018]  ? check_flags.part.23+0x240/0x240
[Wed Jul 11 15:50:08 2018]  ? firmware_map_remove+0x187/0x187
[Wed Jul 11 15:50:08 2018]  ? ___preempt_schedule+0x16/0x18
[Wed Jul 11 15:50:08 2018]  ? mark_held_locks+0x6e/0x90
[Wed Jul 11 15:50:08 2018]  ? _raw_spin_unlock_irqrestore+0x59/0x70
[Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
[Wed Jul 11 15:50:08 2018]  ? _raw_spin_unlock_irqrestore+0x46/0x70
[Wed Jul 11 15:50:08 2018]  ? prepare_to_wait_event+0x191/0x410
[Wed Jul 11 15:50:08 2018]  ? prepare_to_wait_exclusive+0x210/0x210
[Wed Jul 11 15:50:08 2018]  ? print_usage_bug+0x3a0/0x3a0
[Wed Jul 11 15:50:08 2018]  ? do_raw_spin_unlock+0x10f/0x1e0
[Wed Jul 11 15:50:08 2018]  ? do_raw_spin_trylock+0x120/0x120
[Wed Jul 11 15:50:08 2018]  schedule+0xca/0x260
[Wed Jul 11 15:50:08 2018]  ? rcu_lockdep_current_cpu_online+0x12b/0x160
[Wed Jul 11 15:50:08 2018]  ? __schedule+0x1380/0x1380
[Wed Jul 11 15:50:08 2018]  ? ___might_sleep+0x126/0x370
[Wed Jul 11 15:50:08 2018]  ? init_wait_entry+0xc7/0x100
[Wed Jul 11 15:50:08 2018]  ? __wake_up_locked_key_bookmark+0x20/0x20
[Wed Jul 11 15:50:08 2018]  ? __btrfs_run_delayed_items+0x1e5/0x280 [btrfs]
[Wed Jul 11 15:50:08 2018]  ? __might_sleep+0x31/0xd0
[Wed Jul 11 15:50:08 2018]  btrfs_commit_transaction+0x122a/0x1640 [btrfs]
[Wed Jul 11 15:50:08 2018]  ? btrfs_apply_pending_changes+0x90/0x90 [btrfs]
[Wed Jul 11 15:50:08 2018]  ? wait_woken+0x150/0x150
[Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
[Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
[Wed Jul 11 15:50:08 2018]  ? deref_stack_reg+0xe0/0xe0
[Wed Jul 11 15:50:08 2018]  ? __module_text_address+0x63/0xa0
[Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
[Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x161/0x240 [btrfs]
[Wed Jul 11 15:50:08 2018]  ? is_module_text_address+0x2b/0x50
[Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x1d9/0x240 [btrfs]
[Wed Jul 11 15:50:08 2018]  ? kernel_text_address+0x5a/0x100
[Wed Jul 11 15:50:08 2018]  ? deactivate_slab.isra.27+0x64f/0x7a0
[Wed Jul 11 15:50:08 2018]  ? __save_stack_trace+0x82/0x100
[Wed Jul 11 15:50:08 2018]  ? kasan_kmalloc+0x142/0x170
[Wed Jul 11 15:50:08 2018]  ? kmem_cache_alloc+0xfc/0x2e0
[Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
[Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x1d9/0x240 [btrfs]
[Wed Jul 11 15:50:08 2018]  ? kthread+0x1b9/0x1e0
[Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
[Wed Jul 11 15:50:08 2018]  ? deactivate_slab.isra.27+0x64f/0x7a0
[Wed Jul 11 15:50:08 2018]  ? mark_lock+0x149/0xa80
[Wed Jul 11 15:50:08 2018]  ? init_object+0x6b/0x80
[Wed Jul 11 15:50:08 2018]  ? print_usage_bug+0x3a0/0x3a0
[Wed Jul 11 15:50:08 2018]  ? ___slab_alloc+0x62a/0x690
[Wed Jul 11 15:50:08 2018]  ? ___slab_alloc+0x62a/0x690
[Wed Jul 11 15:50:08 2018]  ? __lock_is_held+0x8c/0xe0
[Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
[Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
[Wed Jul 11 15:50:08 2018]  ? rcu_lockdep_current_cpu_online+0x12b/0x160
[Wed Jul 11 15:50:08 2018]  ? rcu_oom_callback+0x40/0x40
[Wed Jul 11 15:50:08 2018]  ? __lock_is_held+0x8c/0xe0
[Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
[Wed Jul 11 15:50:08 2018]  ? rcu_read_lock_sched_held+0x8f/0xa0
[Wed Jul 11 15:50:08 2018]  ? btrfs_record_root_in_trans+0x1f/0xa0 [btrfs]
[Wed Jul 11 15:50:08 2018]  ? start_transaction+0x26b/0x930 [btrfs]
[Wed Jul 11 15:50:08 2018]  ? btrfs_commit_transaction+0x1640/0x1640 [btrfs]
[Wed Jul 11 15:50:08 2018]  ? check_flags.part.23+0x240/0x240
[Wed Jul 11 15:50:08 2018]  ? lock_downgrade+0x380/0x380
[Wed Jul 11 15:50:08 2018]  ? do_raw_spin_unlock+0x10f/0x1e0
[Wed Jul 11 15:50:08 2018]  ? do_raw_spin_unlock+0x10f/0x1e0
[Wed Jul 11 15:50:08 2018]  ? do_raw_spin_trylock+0x120/0x120
[Wed Jul 11 15:50:08 2018]  transaction_kthread+0x219/0x240 [btrfs]
[Wed Jul 11 15:50:08 2018]  ? btrfs_cleanup_transaction+0x6f0/0x6f0 [btrfs]
[Wed Jul 11 15:50:08 2018]  kthread+0x1b9/0x1e0
[Wed Jul 11 15:50:08 2018]  ? kthread_flush_work_fn+0x10/0x10
[Wed Jul 11 15:50:08 2018]  ret_from_fork+0x27/0x50
[Wed Jul 11 15:50:08 2018] 
                           Showing all locks held in the system:
[Wed Jul 11 15:50:08 2018] 1 lock held by khungtaskd/72:
[Wed Jul 11 15:50:08 2018]  #0: 000000006461a1cc (rcu_read_lock){....}, at: debug_show_all_locks+0xa9/0x3d0
[Wed Jul 11 15:50:08 2018] 1 lock held by btrfs-transacti/1053:
[Wed Jul 11 15:50:08 2018]  #0: 000000005d73d46a (&fs_info->transaction_kthread_mutex){+.+.}, at: transaction_kthread+0x92/0x240 [btrfs]
[Wed Jul 11 15:50:08 2018] 1 lock held by dmesg/1066:
[Wed Jul 11 15:50:08 2018]  #0: 000000003e5d6fc4 (&user->lock){+.+.}, at: devkmsg_read+0xa5/0x4f0
[Wed Jul 11 15:50:08 2018] 4 locks held by xfs_io/7075:

-- 
Thanks,
Lu



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: About hung task on generic/041
  2018-07-11  9:02 About hung task on generic/041 Lu Fengqi
@ 2018-07-12 10:40 ` Filipe Manana
  2018-07-12 12:34   ` Lu Fengqi
  0 siblings, 1 reply; 4+ messages in thread
From: Filipe Manana @ 2018-07-12 10:40 UTC (permalink / raw)
  To: Lu Fengqi; +Cc: linux-btrfs, Filipe David Borba Manana

On Wed, Jul 11, 2018 at 10:02 AM, Lu Fengqi <lufq.fnst@cn.fujitsu.com> wrote:
> Hi,
>
> When I run generic/041 with v4.18-rc3 (turn on kasan and hung task
> detection), btrfs-transaction kthread will trigger the hung task timeout
> (stall at wait_event in btrfs_commit_transaction). At the same time, you
> can see that xfs_io -c fsync will occupy 100% of the CPU. I am not sure
> whether this is a problem. Any suggestion?

Well, something at 100% cpu and that seems hang forever is definitely
a problem, specially a workload as simple as the one in generic/041
(never happened to me, even on vanilla 4.18-rc4).
Do you have the stack trace for the fsync task? What you pasted below
is only for the transaction kthread and that alone doesn't help.

>
> [Wed Jul 11 15:50:08 2018] INFO: task btrfs-transacti:1053 blocked for more than 120 seconds.
> [Wed Jul 11 15:50:08 2018]       Not tainted 4.18.0-rc3-custom #14
> [Wed Jul 11 15:50:08 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Wed Jul 11 15:50:08 2018] btrfs-transacti D    0  1053      2 0x80000000
> [Wed Jul 11 15:50:08 2018] Call Trace:
> [Wed Jul 11 15:50:08 2018]  ? __schedule+0x5b2/0x1380
> [Wed Jul 11 15:50:08 2018]  ? check_flags.part.23+0x240/0x240
> [Wed Jul 11 15:50:08 2018]  ? firmware_map_remove+0x187/0x187
> [Wed Jul 11 15:50:08 2018]  ? ___preempt_schedule+0x16/0x18
> [Wed Jul 11 15:50:08 2018]  ? mark_held_locks+0x6e/0x90
> [Wed Jul 11 15:50:08 2018]  ? _raw_spin_unlock_irqrestore+0x59/0x70
> [Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
> [Wed Jul 11 15:50:08 2018]  ? _raw_spin_unlock_irqrestore+0x46/0x70
> [Wed Jul 11 15:50:08 2018]  ? prepare_to_wait_event+0x191/0x410
> [Wed Jul 11 15:50:08 2018]  ? prepare_to_wait_exclusive+0x210/0x210
> [Wed Jul 11 15:50:08 2018]  ? print_usage_bug+0x3a0/0x3a0
> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_unlock+0x10f/0x1e0
> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_trylock+0x120/0x120
> [Wed Jul 11 15:50:08 2018]  schedule+0xca/0x260
> [Wed Jul 11 15:50:08 2018]  ? rcu_lockdep_current_cpu_online+0x12b/0x160
> [Wed Jul 11 15:50:08 2018]  ? __schedule+0x1380/0x1380
> [Wed Jul 11 15:50:08 2018]  ? ___might_sleep+0x126/0x370
> [Wed Jul 11 15:50:08 2018]  ? init_wait_entry+0xc7/0x100
> [Wed Jul 11 15:50:08 2018]  ? __wake_up_locked_key_bookmark+0x20/0x20
> [Wed Jul 11 15:50:08 2018]  ? __btrfs_run_delayed_items+0x1e5/0x280 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? __might_sleep+0x31/0xd0
> [Wed Jul 11 15:50:08 2018]  btrfs_commit_transaction+0x122a/0x1640 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? btrfs_apply_pending_changes+0x90/0x90 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? wait_woken+0x150/0x150
> [Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
> [Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
> [Wed Jul 11 15:50:08 2018]  ? deref_stack_reg+0xe0/0xe0
> [Wed Jul 11 15:50:08 2018]  ? __module_text_address+0x63/0xa0
> [Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
> [Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x161/0x240 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? is_module_text_address+0x2b/0x50
> [Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x1d9/0x240 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? kernel_text_address+0x5a/0x100
> [Wed Jul 11 15:50:08 2018]  ? deactivate_slab.isra.27+0x64f/0x7a0
> [Wed Jul 11 15:50:08 2018]  ? __save_stack_trace+0x82/0x100
> [Wed Jul 11 15:50:08 2018]  ? kasan_kmalloc+0x142/0x170
> [Wed Jul 11 15:50:08 2018]  ? kmem_cache_alloc+0xfc/0x2e0
> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x1d9/0x240 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? kthread+0x1b9/0x1e0
> [Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
> [Wed Jul 11 15:50:08 2018]  ? deactivate_slab.isra.27+0x64f/0x7a0
> [Wed Jul 11 15:50:08 2018]  ? mark_lock+0x149/0xa80
> [Wed Jul 11 15:50:08 2018]  ? init_object+0x6b/0x80
> [Wed Jul 11 15:50:08 2018]  ? print_usage_bug+0x3a0/0x3a0
> [Wed Jul 11 15:50:08 2018]  ? ___slab_alloc+0x62a/0x690
> [Wed Jul 11 15:50:08 2018]  ? ___slab_alloc+0x62a/0x690
> [Wed Jul 11 15:50:08 2018]  ? __lock_is_held+0x8c/0xe0
> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
> [Wed Jul 11 15:50:08 2018]  ? rcu_lockdep_current_cpu_online+0x12b/0x160
> [Wed Jul 11 15:50:08 2018]  ? rcu_oom_callback+0x40/0x40
> [Wed Jul 11 15:50:08 2018]  ? __lock_is_held+0x8c/0xe0
> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? rcu_read_lock_sched_held+0x8f/0xa0
> [Wed Jul 11 15:50:08 2018]  ? btrfs_record_root_in_trans+0x1f/0xa0 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x26b/0x930 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? btrfs_commit_transaction+0x1640/0x1640 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? check_flags.part.23+0x240/0x240
> [Wed Jul 11 15:50:08 2018]  ? lock_downgrade+0x380/0x380
> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_unlock+0x10f/0x1e0
> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_unlock+0x10f/0x1e0
> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_trylock+0x120/0x120
> [Wed Jul 11 15:50:08 2018]  transaction_kthread+0x219/0x240 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? btrfs_cleanup_transaction+0x6f0/0x6f0 [btrfs]
> [Wed Jul 11 15:50:08 2018]  kthread+0x1b9/0x1e0
> [Wed Jul 11 15:50:08 2018]  ? kthread_flush_work_fn+0x10/0x10
> [Wed Jul 11 15:50:08 2018]  ret_from_fork+0x27/0x50
> [Wed Jul 11 15:50:08 2018]
>                            Showing all locks held in the system:
> [Wed Jul 11 15:50:08 2018] 1 lock held by khungtaskd/72:
> [Wed Jul 11 15:50:08 2018]  #0: 000000006461a1cc (rcu_read_lock){....}, at: debug_show_all_locks+0xa9/0x3d0
> [Wed Jul 11 15:50:08 2018] 1 lock held by btrfs-transacti/1053:
> [Wed Jul 11 15:50:08 2018]  #0: 000000005d73d46a (&fs_info->transaction_kthread_mutex){+.+.}, at: transaction_kthread+0x92/0x240 [btrfs]
> [Wed Jul 11 15:50:08 2018] 1 lock held by dmesg/1066:
> [Wed Jul 11 15:50:08 2018]  #0: 000000003e5d6fc4 (&user->lock){+.+.}, at: devkmsg_read+0xa5/0x4f0
> [Wed Jul 11 15:50:08 2018] 4 locks held by xfs_io/7075:
>
> --
> Thanks,
> Lu
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: About hung task on generic/041
  2018-07-12 10:40 ` Filipe Manana
@ 2018-07-12 12:34   ` Lu Fengqi
       [not found]     ` <20180713084449.GB575@fnst.localdomain>
  0 siblings, 1 reply; 4+ messages in thread
From: Lu Fengqi @ 2018-07-12 12:34 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs, Filipe David Borba Manana

On Thu, Jul 12, 2018 at 11:40:54AM +0100, Filipe Manana wrote:
>On Wed, Jul 11, 2018 at 10:02 AM, Lu Fengqi <lufq.fnst@cn.fujitsu.com> wrote:
>> Hi,
>>
>> When I run generic/041 with v4.18-rc3 (turn on kasan and hung task
>> detection), btrfs-transaction kthread will trigger the hung task timeout
>> (stall at wait_event in btrfs_commit_transaction). At the same time, you
>> can see that xfs_io -c fsync will occupy 100% of the CPU. I am not sure
>> whether this is a problem. Any suggestion?
>
>Well, something at 100% cpu and that seems hang forever is definitely
>a problem, specially a workload as simple as the one in generic/041

To clarify, the hung task will end within 500s. Without KASAN, it will
end within 80s, so it won't trigger hung task timeout 120s. I'm not sure
if this is just slow, or have some problem?

>(never happened to me, even on vanilla 4.18-rc4).
>Do you have the stack trace for the fsync task? What you pasted below

I will send the stack trace tomorrow.

-- 
Thanks,
Lu

>is only for the transaction kthread and that alone doesn't help.
>
>>
>> [Wed Jul 11 15:50:08 2018] INFO: task btrfs-transacti:1053 blocked for more than 120 seconds.
>> [Wed Jul 11 15:50:08 2018]       Not tainted 4.18.0-rc3-custom #14
>> [Wed Jul 11 15:50:08 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [Wed Jul 11 15:50:08 2018] btrfs-transacti D    0  1053      2 0x80000000
>> [Wed Jul 11 15:50:08 2018] Call Trace:
>> [Wed Jul 11 15:50:08 2018]  ? __schedule+0x5b2/0x1380
>> [Wed Jul 11 15:50:08 2018]  ? check_flags.part.23+0x240/0x240
>> [Wed Jul 11 15:50:08 2018]  ? firmware_map_remove+0x187/0x187
>> [Wed Jul 11 15:50:08 2018]  ? ___preempt_schedule+0x16/0x18
>> [Wed Jul 11 15:50:08 2018]  ? mark_held_locks+0x6e/0x90
>> [Wed Jul 11 15:50:08 2018]  ? _raw_spin_unlock_irqrestore+0x59/0x70
>> [Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
>> [Wed Jul 11 15:50:08 2018]  ? _raw_spin_unlock_irqrestore+0x46/0x70
>> [Wed Jul 11 15:50:08 2018]  ? prepare_to_wait_event+0x191/0x410
>> [Wed Jul 11 15:50:08 2018]  ? prepare_to_wait_exclusive+0x210/0x210
>> [Wed Jul 11 15:50:08 2018]  ? print_usage_bug+0x3a0/0x3a0
>> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_unlock+0x10f/0x1e0
>> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_trylock+0x120/0x120
>> [Wed Jul 11 15:50:08 2018]  schedule+0xca/0x260
>> [Wed Jul 11 15:50:08 2018]  ? rcu_lockdep_current_cpu_online+0x12b/0x160
>> [Wed Jul 11 15:50:08 2018]  ? __schedule+0x1380/0x1380
>> [Wed Jul 11 15:50:08 2018]  ? ___might_sleep+0x126/0x370
>> [Wed Jul 11 15:50:08 2018]  ? init_wait_entry+0xc7/0x100
>> [Wed Jul 11 15:50:08 2018]  ? __wake_up_locked_key_bookmark+0x20/0x20
>> [Wed Jul 11 15:50:08 2018]  ? __btrfs_run_delayed_items+0x1e5/0x280 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? __might_sleep+0x31/0xd0
>> [Wed Jul 11 15:50:08 2018]  btrfs_commit_transaction+0x122a/0x1640 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? btrfs_apply_pending_changes+0x90/0x90 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? wait_woken+0x150/0x150
>> [Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
>> [Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
>> [Wed Jul 11 15:50:08 2018]  ? deref_stack_reg+0xe0/0xe0
>> [Wed Jul 11 15:50:08 2018]  ? __module_text_address+0x63/0xa0
>> [Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
>> [Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x161/0x240 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? is_module_text_address+0x2b/0x50
>> [Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x1d9/0x240 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? kernel_text_address+0x5a/0x100
>> [Wed Jul 11 15:50:08 2018]  ? deactivate_slab.isra.27+0x64f/0x7a0
>> [Wed Jul 11 15:50:08 2018]  ? __save_stack_trace+0x82/0x100
>> [Wed Jul 11 15:50:08 2018]  ? kasan_kmalloc+0x142/0x170
>> [Wed Jul 11 15:50:08 2018]  ? kmem_cache_alloc+0xfc/0x2e0
>> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x1d9/0x240 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? kthread+0x1b9/0x1e0
>> [Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
>> [Wed Jul 11 15:50:08 2018]  ? deactivate_slab.isra.27+0x64f/0x7a0
>> [Wed Jul 11 15:50:08 2018]  ? mark_lock+0x149/0xa80
>> [Wed Jul 11 15:50:08 2018]  ? init_object+0x6b/0x80
>> [Wed Jul 11 15:50:08 2018]  ? print_usage_bug+0x3a0/0x3a0
>> [Wed Jul 11 15:50:08 2018]  ? ___slab_alloc+0x62a/0x690
>> [Wed Jul 11 15:50:08 2018]  ? ___slab_alloc+0x62a/0x690
>> [Wed Jul 11 15:50:08 2018]  ? __lock_is_held+0x8c/0xe0
>> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
>> [Wed Jul 11 15:50:08 2018]  ? rcu_lockdep_current_cpu_online+0x12b/0x160
>> [Wed Jul 11 15:50:08 2018]  ? rcu_oom_callback+0x40/0x40
>> [Wed Jul 11 15:50:08 2018]  ? __lock_is_held+0x8c/0xe0
>> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? rcu_read_lock_sched_held+0x8f/0xa0
>> [Wed Jul 11 15:50:08 2018]  ? btrfs_record_root_in_trans+0x1f/0xa0 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x26b/0x930 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? btrfs_commit_transaction+0x1640/0x1640 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? check_flags.part.23+0x240/0x240
>> [Wed Jul 11 15:50:08 2018]  ? lock_downgrade+0x380/0x380
>> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_unlock+0x10f/0x1e0
>> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_unlock+0x10f/0x1e0
>> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_trylock+0x120/0x120
>> [Wed Jul 11 15:50:08 2018]  transaction_kthread+0x219/0x240 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? btrfs_cleanup_transaction+0x6f0/0x6f0 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  kthread+0x1b9/0x1e0
>> [Wed Jul 11 15:50:08 2018]  ? kthread_flush_work_fn+0x10/0x10
>> [Wed Jul 11 15:50:08 2018]  ret_from_fork+0x27/0x50
>> [Wed Jul 11 15:50:08 2018]
>>                            Showing all locks held in the system:
>> [Wed Jul 11 15:50:08 2018] 1 lock held by khungtaskd/72:
>> [Wed Jul 11 15:50:08 2018]  #0: 000000006461a1cc (rcu_read_lock){....}, at: debug_show_all_locks+0xa9/0x3d0
>> [Wed Jul 11 15:50:08 2018] 1 lock held by btrfs-transacti/1053:
>> [Wed Jul 11 15:50:08 2018]  #0: 000000005d73d46a (&fs_info->transaction_kthread_mutex){+.+.}, at: transaction_kthread+0x92/0x240 [btrfs]
>> [Wed Jul 11 15:50:08 2018] 1 lock held by dmesg/1066:
>> [Wed Jul 11 15:50:08 2018]  #0: 000000003e5d6fc4 (&user->lock){+.+.}, at: devkmsg_read+0xa5/0x4f0
>> [Wed Jul 11 15:50:08 2018] 4 locks held by xfs_io/7075:
>>
>> --
>> Thanks,
>> Lu
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
>-- 
>Filipe David Manana,
>
>“Whether you think you can, or you think you can't — you're right.”
>
>



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: About hung task on generic/041
       [not found]       ` <CAL3q7H4f7Ui-KRFfTuPJyey2OPKC2NFYwuUN9Lxn3cRKd0QVDg@mail.gmail.com>
@ 2018-07-14  0:25         ` Lu Fengqi
  0 siblings, 0 replies; 4+ messages in thread
From: Lu Fengqi @ 2018-07-14  0:25 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs, Filipe David Borba Manana

On Fri, Jul 13, 2018 at 11:33:39AM +0100, Filipe Manana wrote:
>On Fri, Jul 13, 2018 at 9:44 AM, Lu Fengqi <lufq.fnst@cn.fujitsu.com> wrote:
>> On Thu, Jul 12, 2018 at 08:33:59PM +0800, Lu Fengqi wrote:
>>>On Thu, Jul 12, 2018 at 11:40:54AM +0100, Filipe Manana wrote:
>>>>On Wed, Jul 11, 2018 at 10:02 AM, Lu Fengqi <lufq.fnst@cn.fujitsu.com> wrote:
>>>>> Hi,
>>>>>
>>>>> When I run generic/041 with v4.18-rc3 (turn on kasan and hung task
>>>>> detection), btrfs-transaction kthread will trigger the hung task timeout
>>>>> (stall at wait_event in btrfs_commit_transaction). At the same time, you
>>>>> can see that xfs_io -c fsync will occupy 100% of the CPU. I am not sure
>>>>> whether this is a problem. Any suggestion?
>>>>
>>>>Well, something at 100% cpu and that seems hang forever is definitely
>>>>a problem, specially a workload as simple as the one in generic/041
>>>
>>>To clarify, the hung task will end within 500s. Without KASAN, it will
>>>end within 80s, so it won't trigger hung task timeout 120s. I'm not sure
>>>if this is just slow, or have some problem?
>>
>> Well, I tried to run generic/041 with v4.18-rc4(with KASAN) on the other
>> machine(with HDD) and it didn't finish all night. The hung task maybe
>> only end within 500s on SSD.
>>
>>>
>>>>(never happened to me, even on vanilla 4.18-rc4).
>>
>> See the attachment kernel_config. Maybe some config make you can't
>> replicate the case.
>
>Don't have to look into that, but I'm attaching mine and then you can
>compare them.

You must set the following config. As mentioned above, the test won't hit
the hung task timeout without *KASAN*.

CONFIG_KASAN=y
CONFIG_KASAN_EXTRA=y
CONFIG_KASAN_OUTLINE=y

>
>>
>>>>Do you have the stack trace for the fsync task? What you pasted below
>>>
>>>I will send the stack trace tomorrow.
>>
>> See the attachment kasan.log.xz.
>>
>> From the log it seems that the time is consumed in the
>> btrfs_log_inode_parent loop call btrfs_log_inode.
>>
>> I'm very willing to provide a trace(without KASAN) for comparison, but when
>> I run both systemtap and testcase, I have another problem.
>>
>> See the attachment btrfs_sync_file.stp and 4.18-rc4.dmesg.
>
>Are you sure you running a vanilla kernel, without any other btrfs patches?
>This test case has been around since 2015 and no one ever run into
>such problem (it takes around 15 seconds to finish here, on 2 vms with
>a debug kernel).
>
>Does that happen to you on 4.17 or older kernels too? If it doesn't,
>then I suggest bisecting.

As soon as I turn on KASAN, the test case will encounter this problem at
the vanilla 4.17/4.18-rc3/4.18-rc4 kernel(no other patches).

-- 
Thanks,
Lu

>
>>
>> --
>> Thanks,
>> Lu
>>
>>>
>>>--
>>>Thanks,
>>>Lu
>>>
>>>>is only for the transaction kthread and that alone doesn't help.
>>>>
>>>>>
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>>>
>>>>--
>>>>Filipe David Manana,
>>>>
>>>>“Whether you think you can, or you think you can't — you're right.”
>>>>
>>>>
>>
>>
>
>
>
>-- 
>Filipe David Manana,
>
>“Whether you think you can, or you think you can't — you're right.”
>
>



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-07-14  0:42 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-11  9:02 About hung task on generic/041 Lu Fengqi
2018-07-12 10:40 ` Filipe Manana
2018-07-12 12:34   ` Lu Fengqi
     [not found]     ` <20180713084449.GB575@fnst.localdomain>
     [not found]       ` <CAL3q7H4f7Ui-KRFfTuPJyey2OPKC2NFYwuUN9Lxn3cRKd0QVDg@mail.gmail.com>
2018-07-14  0:25         ` Lu Fengqi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).