From: Nikolay Borisov <nborisov@suse.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: bisected: btrfs dedupe regression in v5.11-rc1: 3078d85c9a10 vfs: verify source area in vfs_dedupe_file_range_one()
Date: Wed, 15 Dec 2021 00:25:04 +0200
Message-ID: <c6125582-a1dc-1114-8211-48437dbf4976@suse.com>
In-Reply-To: <Ybj1jVYu3MrUzVTD@hungrycats.org>

On 14.12.21 г. 21:50, Zygo Blaxell wrote:
> On Tue, Dec 14, 2021 at 01:11:24PM +0200, Nikolay Borisov wrote:
>>
>>
>> On 14.12.21 г. 1:12, Zygo Blaxell wrote:
>>> On Mon, Dec 13, 2021 at 03:28:26PM +0200, Nikolay Borisov wrote:
>>>> On 10.12.21 г. 20:34, Zygo Blaxell wrote:
>>>>> I've been getting deadlocks in dedupe on btrfs since kernel 5.11, and
>>>>> some bees users have reported it as well. I bisected to this commit:
>>>>>
>>>>> 3078d85c9a10 vfs: verify source area in vfs_dedupe_file_range_one()
>>>>>
>>>>> These kernels work for at least 18 hours:
>>>>>
>>>>> 5.10.83 (months)
>>>>> 5.11.22 with 3078d85c9a10 reverted (36 hours)
>>>>> btrfs misc-next 66dc4de326b0 with 3078d85c9a10 reverted
>>>>>
>>>>> These kernels lock up in 3 hours or less:
>>>>>
>>>>> 5.11.22
>>>>> 5.12.19
>>>>> 5.14.21
>>>>> 5.15.6
>>>>> btrfs for-next 279373dee83e
>>>>>
>>>>> All of the failing kernels include this commit, none of the non-failing
>>>>> kernels include the commit.
>>>>>
>>>>> Kernel logs from the lockup:
>>>>>
>>>>> [19647.696042][ T3721] sysrq: Show Blocked State
>>>>> [19647.697024][ T3721] task:btrfs-transacti state:D stack: 0 pid: 6161 ppid: 2 flags:0x00004000
>>>>> [19647.698203][ T3721] Call Trace:
>>>>> [19647.698608][ T3721] __schedule+0x388/0xaf0
>>>>> [19647.699125][ T3721] schedule+0x68/0xe0
>>>>> [19647.699615][ T3721] btrfs_commit_transaction+0x97c/0xbf0
>>>>
>>>> Can you run this through symbolize script as I'd like to understand
>>>> where in transaction commit the sleep is happening.
>>>
>>> btrfs_commit_transaction+0x97c/0xbf0:
>>>
>>> btrfs_commit_transaction at fs/btrfs/transaction.c:2159 (discriminator 9)
>>> 2154
>>> 2155 ret = btrfs_run_delayed_items(trans);
>>> 2156 if (ret)
>>> 2157 goto cleanup_transaction;
>>> 2158
>>> >2159< wait_event(cur_trans->writer_wait,
>>> 2160 extwriter_counter_read(cur_trans) == 0);
>>> 2161
>>> 2162 /* some pending stuffs might be added after the previous flush. */
>>> 2163 ret = btrfs_run_delayed_items(trans);
>>> 2164 if (ret)
>>>
>>
>> So it seems there is an open transaction handle thus commit can't
>> continue and everything is stalled behind. Would you be able to run the
>> attached python script on a host which is stuck. It requires you having
>> debug symbols for the kernel installed as well as
>> https://github.com/osandov/drgn/ which is a scriptable debugger. The
>> easiest way would to follow the instructions at
>> https://drgn.readthedocs.io/en/latest/installation.html and just get it
>> via pip.
>>
>>
>> Once you have it installed run it by doing:
>>
>> "sudo drgn get-num-extwriters.py 310dd372-0fd1-4496-a232-0fb46ca4afd6"
>>
>> Where 310dd372-0fd1-4496-a232-0fb46ca4afd6 is the fsid as taken from
>> 'blkid' which corresponds to the wedged fs.
>
> [drum roll noises...]
>
> [f79c1081-d81d-4abc-8b47-3b15bf2f93c5] num_extwriters is: 1

Huh, this means there is an open transaction handle somewhere o_O. I
went back over the stack traces in your original email but couldn't see
where it might be coming from. I.e., all processes are waiting in
wait_current_trans, and this happens _before_ the transaction handle is
opened, hence num_extwriters can't have been incremented by them.
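
To make the reasoning above concrete, here is a toy userspace model of the
commit-side wait. It is only a sketch and not kernel code: the class and
method names (ToyTransaction, start_handle, end_handle, commit) are made up
for illustration, and a timeout is added so the example cannot hang. The
point is just that a single handle that is never closed keeps the counter
at 1 and stalls the commit, while tasks that have not yet opened a handle
do not touch the counter at all.

```python
import threading


class ToyTransaction:
    """Toy stand-in for a transaction with an external-writer counter."""

    def __init__(self):
        self._cond = threading.Condition()
        self.num_extwriters = 0

    def start_handle(self):
        # Opening a handle registers one external writer.
        with self._cond:
            self.num_extwriters += 1

    def end_handle(self):
        # Closing the handle drops the count and wakes a waiting committer.
        with self._cond:
            self.num_extwriters -= 1
            self._cond.notify_all()

    def commit(self, timeout):
        # Rough analogue of
        #   wait_event(cur_trans->writer_wait,
        #              extwriter_counter_read(cur_trans) == 0);
        # Returns False if the counter never reached zero within `timeout`.
        with self._cond:
            return self._cond.wait_for(
                lambda: self.num_extwriters == 0, timeout
            )


trans = ToyTransaction()
trans.start_handle()
stuck = not trans.commit(timeout=0.1)  # handle still open: commit times out
trans.end_handle()
done = trans.commit(timeout=0.1)       # counter back at zero: commit proceeds
print(stuck, done)
```

In the real kernel there is no timeout, so the equivalent of the first
commit() call simply sleeps forever, which matches the trace at
fs/btrfs/transaction.c:2159 quoted above.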

When the fs wedges again and you have re-run the script to get
num_extwriters, can you also provide the output of
"echo w > /proc/sysrq-trigger"?
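
In case it helps with sifting through that output, below is a rough sketch
of a filter that lists the blocked tasks whose stacks do NOT contain
wait_current_trans, since those are the candidates for holding the open
handle. The function names and regular expressions are my own assumptions,
based only on the log format quoted earlier in this thread; other kernel
configs may print the task header differently. The sample text is the
excerpt from your original report.

```python
import re

# Patterns are assumptions derived from the log excerpt in this thread.
TASK_RE = re.compile(r"task:(\S+)\s+state:(\S)\s.*?pid:\s*(\d+)")
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

SAMPLE = """\
[19647.696042][ T3721] sysrq: Show Blocked State
[19647.697024][ T3721] task:btrfs-transacti state:D stack: 0 pid: 6161 ppid: 2 flags:0x00004000
[19647.698203][ T3721] Call Trace:
[19647.698608][ T3721] __schedule+0x388/0xaf0
[19647.699125][ T3721] schedule+0x68/0xe0
[19647.699615][ T3721] btrfs_commit_transaction+0x97c/0xbf0
"""


def blocked_tasks(log_text):
    """Split sysrq-w style output into tasks with their stack frame names."""
    tasks = []
    for line in log_text.splitlines():
        task = TASK_RE.search(line)
        if task:
            tasks.append({
                "comm": task.group(1),
                "state": task.group(2),
                "pid": int(task.group(3)),
                "frames": [],
            })
            continue
        frame = FRAME_RE.search(line)
        if frame and tasks:
            # Attach the frame to the most recently seen task header.
            tasks[-1]["frames"].append(frame.group(1))
    return tasks


def not_waiting_in(tasks, symbol="wait_current_trans"):
    """Return tasks with no stack frame whose name starts with `symbol`."""
    return [
        t for t in tasks
        if not any(f.startswith(symbol) for f in t["frames"])
    ]


for t in not_waiting_in(blocked_tasks(SAMPLE)):
    print(t["pid"], t["comm"], t["frames"])
```

To use it on a wedged host, save the kernel log after triggering sysrq-w
(for example from dmesg) and pass the file contents to blocked_tasks()
instead of SAMPLE.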
<snip>