From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Marc MERLIN <marc@merlins.org>, linux-btrfs@vger.kernel.org
Subject: Re: 6.4 and 6.9 btrfs blocked and btrfs_work_helper workqueue lockup, is it an IO bug/hang though?
Date: Wed, 28 Feb 2024 18:48:07 +1030 [thread overview]
Message-ID: <2020a7b4-b052-4144-8386-b05102a5465d@gmx.com> (raw)
In-Reply-To: <Zd5s1k8bFguE2NVl@merlins.org>
在 2024/2/28 09:44, Marc MERLIN 写道:
> Could anyone who can raad those things better than me, give me a clue
> whether those hangs are hardware related like I'm guessing they may be,
> or if they are potentially bugs worth looking into?
>
> Thanks,
> Marc
>
> On Sun, Feb 18, 2024 at 10:24:04PM -0800, Marc MERLIN wrote:
>> I've seen this with both 6.4.9 and 6.6.9 and had to sysrq reboot both
>> times to recover.
>> I'm trying to see if it's a hang of my raid card.
>>
>> That's the more recent hang with 6.6.9:
>>
>> The one with 6.4.9 is longer, so I put it here: https://pastebin.com/xz11JXWM
This one looks like it's hanging at btrfs_commit_transaction+0x26c.
But without the proper code line, it's hard to say what btrfs is waiting
for.
>>
>> Here's the 6.6.9 one here. Can someone help me confirm that at least
>> this one is likely not btrfs' fault but just underlying I/O hang?
>>
>> If so, idoes the 6.4.9 match the same symptom, or is it a different issue?
>>
>> Thanks,
>> Marc
>>
>> 135577.600958] INFO: task md12_raid1:1276 blocked for more than 120 seconds.
>> [135577.621963] Not tainted 6.6.9-amd64-volpre-sysrq-20240101 #19
>> [135577.641401] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [135577.665522] task:md12_raid1 state:D stack:0 pid:1276 ppid:2 flags:0x00004000
>> [135577.691381] Call Trace:
>> [135577.699494] <TASK>
>> [135577.706627] __schedule+0x6af/0x702
>> [135577.717853] schedule+0x8b/0xbd
>> [135577.727933] md_super_wait+0x5d/0x9c
>> [135577.739287] ? __pfx_autoremove_wake_function+0x40/0x40
>> [135577.755602] write_sb_page+0x242/0x25d
>> [135577.767482] md_update_sb+0x4c1/0x679
>> [135577.779072] md_check_recovery+0x276/0x484
>> [135577.791965] raid1d+0x46/0x10db
>> [135577.802178] ? raw_spin_rq_unlock_irq+0x5/0x10
>> [135577.816122] ? finish_task_switch.isra.0+0x129/0x202
>> [135577.831629] ? __schedule+0x6b7/0x702
>> [135577.843292] ? lock_timer_base+0x38/0x5f
>> [135577.855662] ? schedule+0x8b/0xbd
>> [135577.866222] ? __list_add+0x12/0x2f
>> [135577.877341] ? _raw_spin_unlock_irqrestore+0xe/0x2e
>> [135577.892618] md_thread+0x113/0x140
>> [135577.903553] ? __pfx_autoremove_wake_function+0x40/0x40
>> [135577.920016] ? __pfx_md_thread+0x40/0x40
>> [135577.932413] kthread+0xe8/0xf0
>> [135577.942221] ? __pfx_kthread+0x40/0x40
>> [135577.954084] ret_from_fork+0x24/0x36
>> [135577.965583] ? __pfx_kthread+0x40/0x40
>> [135577.977475] ret_from_fork_asm+0x1b/0x80
>> [135577.989877] </TASK>
>> [135577.997078] INFO: task md13_raid1:1278 blocked for more than 121 seconds.
>> [135578.018044] Not tainted 6.6.9-amd64-volpre-sysrq-20240101 #19
>> [135578.037405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [135578.061454] task:md13_raid1 state:D stack:0 pid:1278 ppid:2 flags:0x00004000
>> [135578.087065] Call Trace:
>> [135578.094972] <TASK>
>> [135578.102074] __schedule+0x6af/0x702
>> [135578.113302] schedule+0x8b/0xbd
>> [135578.123297] md_super_wait+0x5d/0x9c
>> [135578.134757] ? __pfx_autoremove_wake_function+0x40/0x40
>> [135578.151023] write_sb_page+0x242/0x25d
>> [135578.162869] md_update_sb+0x4c1/0x679
>> [135578.174470] md_check_recovery+0x276/0x484
>> [135578.187342] raid1d+0x46/0x10db
>> [135578.197352] ? raw_spin_rq_unlock_irq+0x5/0x10
>> [135578.211262] ? finish_task_switch.isra.0+0x129/0x202
>> [135578.226731] ? __schedule+0x6b7/0x702
>> [135578.238278] ? lock_timer_base+0x38/0x5f
>> [135578.250688] ? _raw_spin_unlock_irqrestore+0xe/0x2e
>> [135578.265933] ? __try_to_del_timer_sync+0x64/0x8b
>> [135578.280350] ? __timer_delete_sync+0x2e/0x3d
>> [135578.293706] ? __list_add+0x12/0x2f
>> [135578.304886] ? _raw_spin_unlock_irqrestore+0xe/0x2e
>> [135578.320238] md_thread+0x113/0x140
>> [135578.331038] ? __pfx_autoremove_wake_function+0x40/0x40
>> [135578.347265] ? __pfx_md_thread+0x40/0x40
>> [135578.359599] kthread+0xe8/0xf0
>> [135578.369323] ? __pfx_kthread+0x40/0x40
>> [135578.381272] ret_from_fork+0x24/0x36
>> [135578.392702] ? __pfx_kthread+0x40/0x40
>> [135578.404534] ret_from_fork_asm+0x1b/0x80
>> [135578.416877] </TASK>
>> [135578.424012] INFO: task dmcrypt_write/2:2017 blocked for more than 121 seconds.
>> [135578.446256] Not tainted 6.6.9-amd64-volpre-sysrq-20240101 #19
>> [135578.465619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [135578.489691] task:dmcrypt_write/2 state:D stack:0 pid:2017 ppid:2 flags:0x00004000
>> [135578.515301] Call Trace:
>> [135578.523384] <TASK>
>> [135578.530433] __schedule+0x6af/0x702
>> [135578.541490] schedule+0x8b/0xbd
>> [135578.551650] md_write_start+0x160/0x1a7
>> [135578.563748] ? __pfx_autoremove_wake_function+0x40/0x40
>> [135578.579993] raid1_make_request+0x89/0x880
>> [135578.592962] ? sugov_update_single_freq+0x20/0x106
>> [135578.607926] ? update_load_avg+0x372/0x39b
>> [135578.620814] ? get_sd_balance_interval+0xf/0x3d
>> [135578.635156] md_handle_request+0x126/0x16d
>> [135578.648040] ? __pfx_dmcrypt_write+0x40/0x40 [dm_crypt 721219ef82f7f7c3ecde59f70e81b621d3b8f858]
>> [135578.674973] ? _raw_spin_unlock+0xa/0x1d
>> [135578.687325] ? raw_spin_rq_unlock_irq+0x5/0x10
>> [135578.701386] ? finish_task_switch.isra.0+0x129/0x202
>> [135578.716851] __submit_bio+0x63/0x89
>> [135578.728043] submit_bio_noacct_nocheck+0x181/0x258
>> [135578.743026] ? __pfx_dmcrypt_write+0x40/0x40 [dm_crypt 721219ef82f7f7c3ecde59f70e81b621d3b8f858]
>> [135578.769994] dmcrypt_write+0xd1/0xfd [dm_crypt 721219ef82f7f7c3ecde59f70e81b621d3b8f858]
>> [135578.794848] kthread+0xe8/0xf0
>> [135578.804612] ? __pfx_kthread+0x40/0x40
>> [135578.816438] ret_from_fork+0x24/0x36
>> [135578.827741] ? __pfx_kthread+0x40/0x40
>> [135578.839744] ret_from_fork_asm+0x1b/0x80
>> [135578.852280] </TASK>
>> [135578.859445] INFO: task btrfs-transacti:2415 blocked for more than 122 seconds.
>> [135578.881710] Not tainted 6.6.9-amd64-volpre-sysrq-20240101 #19
>> [135578.901270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [135578.925345] task:btrfs-transacti state:D stack:0 pid:2415 ppid:2 flags:0x00004000
>> [135578.950996] Call Trace:
>> [135578.958960] <TASK>
>> [135578.965902] __schedule+0x6af/0x702
>> [135578.976995] schedule+0x8b/0xbd
>> [135578.987195] io_schedule+0x12/0x38
>> [135578.998015] folio_wait_bit_common+0x157/0x202
>> [135579.011950] ? __pfx_wake_page_function+0x40/0x40
>> [135579.026658] folio_wait_writeback+0x30/0x38
But in v6.6, btrfs is only waiting for its metadata to be properly
written back, that's the only blocked btrfs thread.
On the other hand, there are a lot of md/dm related hanging, which I
believe is the cause for blocked btrfs IO.
Thanks,
Qu
>> [135579.039897] __filemap_fdatawait_range+0x74/0xbf
>> [135579.054353] ? __update_freelist_fast+0x17/0x1e
>> [135579.068568] ? __clear_extent_bit+0x323/0x338
>> [135579.082257] filemap_fdatawait_range+0xf/0x19
>> [135579.096112] __btrfs_wait_marked_extents.isra.0+0x98/0xf3
>> [135579.113089] btrfs_write_and_wait_transaction+0x5d/0xbf
>> [135579.129372] btrfs_commit_transaction+0x67c/0xa62
>> [135579.144094] ? start_transaction+0x3f7/0x463
>> [135579.157540] transaction_kthread+0x105/0x17a
>> [135579.170970] ? __pfx_transaction_kthread+0x40/0x40
>> [135579.185951] kthread+0xe8/0xf0
>> [135579.195732] ? __pfx_kthread+0x40/0x40
>> [135579.207593] ret_from_fork+0x24/0x36
>> [135579.218942] ? __pfx_kthread+0x40/0x40
>> [135579.230778] ret_from_fork_asm+0x1b/0x80
>> [135579.243146] </TASK>
>> [135579.250316] INFO: task dmcrypt_write/2:5016 blocked for more than 122 seconds.
>> [135579.272593] Not tainted 6.6.9-amd64-volpre-sysrq-20240101 #19
>> [135579.291991] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [135579.316244] task:dmcrypt_write/2 state:D stack:0 pid:5016 ppid:2 flags:0x00004000
>> [135579.341930] Call Trace:
>> [135579.349929] <TASK>
>> [135579.356826] __schedule+0x6af/0x702
>> [135579.367910] schedule+0x8b/0xbd
>> [135579.377939] md_write_start+0x160/0x1a7
>> [135579.390216] ? __pfx_autoremove_wake_function+0x40/0x40
>> [135579.406491] raid1_make_request+0x89/0x880
>> [135579.419401] ? update_cfs_rq_load_avg+0x176/0x189
>> [135579.434131] ? update_load_avg+0x46/0x39b
>> [135579.446738] ? get_sd_balance_interval+0xf/0x3d
>> [135579.461129] md_handle_request+0x126/0x16d
>> [135579.474037] ? __pfx_dmcrypt_write+0x40/0x40 [dm_crypt 721219ef82f7f7c3ecde59f70e81b621d3b8f858]
>> [135579.500975] ? _raw_spin_unlock+0xa/0x1d
>> [135579.513370] ? raw_spin_rq_unlock_irq+0x5/0x10
>> [135579.527308] ? finish_task_switch.isra.0+0x129/0x202
>> [135579.542831] __submit_bio+0x63/0x89
>> [135579.553918] submit_bio_noacct_nocheck+0x181/0x258
>> [135579.568943] ? __pfx_dmcrypt_write+0x40/0x40 [dm_crypt 721219ef82f7f7c3ecde59f70e81b621d3b8f858]
>> [135579.595885] dmcrypt_write+0xd1/0xfd [dm_crypt 721219ef82f7f7c3ecde59f70e81b621d3b8f858]
>> [135579.620760] kthread+0xe8/0xf0
>> [135579.630682] ? __pfx_kthread+0x40/0x40
>> [135579.642699] ret_from_fork+0x24/0x36
>> [135579.654188] ? __pfx_kthread+0x40/0x40
>> [135579.666045] ret_from_fork_asm+0x1b/0x80
>> [135579.678418] </TASK>
>> [135579.685606] INFO: task dmcrypt_write/2:5286 blocked for more than 122 seconds.
>> [135579.707870] Not tainted 6.6.9-amd64-volpre-sysrq-20240101 #19
>> [135579.727445] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [135579.751591] task:dmcrypt_write/2 state:D stack:0 pid:5286 ppid:2 flags:0x00004000
>> [135579.777266] Call Trace:
>> [135579.785223] <TASK>
>> [135579.792124] __schedule+0x6af/0x702
>> [135579.803200] ? __pfx_wbt_inflight_cb+0x40/0x40
>> [135579.817139] ? __pfx_wbt_cleanup_cb+0x40/0x40
>> [135579.830818] schedule+0x8b/0xbd
>> [135579.840951] io_schedule+0x12/0x38
>> [135579.851745] rq_qos_wait+0xe8/0x126
>> [135579.862795] ? __pfx_rq_qos_wake_function+0x40/0x40
>> [135579.878021] ? __pfx_wbt_inflight_cb+0x40/0x40
>> [135579.891951] wbt_wait+0x95/0xe4
>> [135579.902022] __rq_qos_throttle+0x23/0x33
>> [135579.914398] blk_mq_submit_bio+0x2b6/0x4dd
>> [135579.927273] __submit_bio+0x29/0x89
>> [135579.938356] submit_bio_noacct_nocheck+0x121/0x258
>> [135579.953357] ? __pfx_dmcrypt_write+0x40/0x40 [dm_crypt 721219ef82f7f7c3ecde59f70e81b621d3b8f858]
>> [135579.980432] dmcrypt_write+0xd1/0xfd [dm_crypt 721219ef82f7f7c3ecde59f70e81b621d3b8f858]
>> [135580.005323] kthread+0xe8/0xf0
>> [135580.015088] ? __pfx_kthread+0x40/0x40
>> [135580.026929] ret_from_fork+0x24/0x36
>> [135580.038259] ? __pfx_kthread+0x40/0x40
>> [135580.050266] ret_from_fork_asm+0x1b/0x80
>> [135580.062628] </TASK>
>> [135580.069774] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
>>
>>
>> --
>> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>>
>> Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
>>
>>
>
next prev parent reply other threads:[~2024-02-28 8:18 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-19 6:24 6.4 and 6.9 btrfs blocked and btrfs_work_helper workqueue lockup, is it an IO bug/hang though? Marc MERLIN
2024-02-27 23:14 ` Marc MERLIN
2024-02-28 8:18 ` Qu Wenruo [this message]
2024-02-28 8:24 ` Qu Wenruo
2024-02-29 1:41 ` Marc MERLIN
2024-02-29 2:01 ` Qu Wenruo
2024-02-29 3:15 ` Marc MERLIN
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2020a7b4-b052-4144-8386-b05102a5465d@gmx.com \
--to=quwenruo.btrfs@gmx.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=marc@merlins.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox