From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail.flyingcircus.io (mail.flyingcircus.io [212.122.41.197]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 13D46535B7 for ; Thu, 8 Aug 2024 06:03:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=212.122.41.197 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723096986; cv=none; b=C5XAePrhmDVaWyFklf0CyDNkOJ7u51Bun0FPVKaNx2+uYNdC2OrKQGCLTOy2rqd/xyp/CnWUllUj6r1IICkwUWqCaq+qL6DfTsS8ccEx97S7X8wGZMqi87C19U/tb0ymK66k4z4GrO/rKBtBiwZOf8JO3ZyyHK/ODVDvrkb8nfA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723096986; c=relaxed/simple; bh=LKLTacLwy0U3ZBuEHYJQfpX9+w4enh/6bhD2X02ZM6k=; h=Content-Type:Mime-Version:Subject:From:In-Reply-To:Date:Cc: Message-Id:References:To; b=daJm8arl9JjuBLs3kGybWbmpBtnJQBByuG6PUormRI8QbuB4oCDIS8595f4y5TyG93jO+JDy9cMk/VMhoGiJy0S/fXiOz6O1687XSSqmNVO5BiyLQpYpl/byy2zMNeztMNRMR1x0OJ5g4sBpux6HPcdtqpT/TRrsSFUne7ifAfY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=flyingcircus.io; spf=pass smtp.mailfrom=flyingcircus.io; dkim=pass (1024-bit key) header.d=flyingcircus.io header.i=@flyingcircus.io header.b=E2lIwVRp; arc=none smtp.client-ip=212.122.41.197 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=flyingcircus.io Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flyingcircus.io Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=flyingcircus.io header.i=@flyingcircus.io header.b="E2lIwVRp" Content-Type: text/plain; charset=utf-8 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=flyingcircus.io; s=mail; t=1723096971; bh=mj9cYSOErQ/DskaAKYIMu6MqYPNO5jA5EIFPHhl+cfM=; h=Subject:From:In-Reply-To:Date:Cc:References:To; b=E2lIwVRpSwIKKhmqq0cVUuOc2+XgIPg22iqTW8gIXI7o5S/HhnP2e8X5HVV2AerEF LvQTgPOuVUhyAEP3lQd3jroXyyr5wc3Ero0xTYHZf1owHGW2VZEfPu13PcBmFotH0j Hevvi1JZNhRTGnlo7FsrUvtf2a/9Ndb5tDsC5gQk= Precedence: bulk X-Mailing-List: linux-raid@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3776.700.51\)) Subject: Re: PROBLEM: repeatable lockup on RAID-6 with LUKS dm-crypt on NVMe devices when rsyncing many files From: Christian Theune In-Reply-To: <26291.57727.410499.243125@quad.stoffel.home> Date: Thu, 8 Aug 2024 08:02:28 +0200 Cc: Yu Kuai , "linux-raid@vger.kernel.org" , dm-devel@lists.linux.dev, "yukuai (C)" Content-Transfer-Encoding: quoted-printable Message-Id: <2EE0A3CE-CFF2-460C-97CD-262D686BFA8C@flyingcircus.io> References: <316050c6-fac2-b022-6350-eaedcc7d953a@huaweicloud.com> <58450ED6-EBC3-4770-9C5C-01ABB29468D6@flyingcircus.io> <22C5E55F-9C50-4DB7-B656-08BEC238C8A7@flyingcircus.io> <26291.57727.410499.243125@quad.stoffel.home> To: John Stoffel Hi, > On 7. Aug 2024, at 23:05, John Stoffel wrote: >=20 >>>>>> "Christian" =3D=3D Christian Theune writes: >=20 >=20 >=20 >> i had some more time at hand and managent to compile 5.15.164. The >> issue is the same. After around 1h30m of work it hangs. I=E2=80=99ll = try to >> reproduce this on a newer supported kernel if I can. >=20 > Supported by who? NixOS? Why don't you just install linux kernel > 6.6.x and see of the problem is still there? 5.15.x is ancient and > un-supported upstream now. =20 I did just that. However, 5.15 =E2=80=9Cun-supported=E2=80=9D by = upstream is confusing me. It=E2=80=99s an official LTS kernel with an = EOL of December 2026.=20 Also, I=E2=80=99d like to note that NixOS kernels tend to be very close = to upstream. The only patches that I can see are involved here are those = that patch out some hard coded references to user space paths: = https://github.com/NixOS/nixpkgs/blob/master/pkgs/top-level/linux-kernels.= nix#L173 = https://github.com/NixOS/nixpkgs/blob/master/pkgs/os-specific/linux/kernel= /request-key-helper.patch = https://github.com/NixOS/nixpkgs/blob/master/pkgs/os-specific/linux/kernel= /bridge-stp-helper.patch Kernel is now: Linux barbrady08 6.10.3 #1-NixOS SMP PREEMPT_DYNAMIC Sat Aug 3 07:01:09 = UTC 2024 x86_64 GNU/Linux The issue is still there on 6.10.3 and now looks like shown below. I=E2=80=99m aware that this is output that shows symptoms and not = (necessarily) the cause. I=E2=80=99m currently a bit out of ideas where = to look for more information and would appreciate any pointers. My = suspicion is an interaction problem triggered by the use of NVMe in = combination with other systems (xfs, dm-crypt and raid are the ones = I=E2=80=99m aware of playign a role). The use of NVMe itself likely isn=E2=80=99t the issue (we=E2=80=99ve = been using NVMe on similar hosts and also in combination with dm-crypt = with this kernel for a while now) and I could imagine that it triggers a = race condition due to the higher performance - although the specific = performance parameters aren't *that* high. Right before the lockup I see = ~700 IOPS reading and ~2.5k IOPS writing. So we have seen NVMe with = dm-crypt but not with raid before. I can perform debugging on that machine as needed, but googling for any = combination of hung tasks related to nvme/xfs/crypt/raid only ends up = showing me generic performance concerns from forum, an unrelated xfs = issue mentioned by redhat and the list archive entry from this post. [ 7497.019235] INFO: task .backy-wrapped:2706 blocked for more than 122 = seconds. [ 7497.027265] Not tainted 6.10.3 #1-NixOS [ 7497.032173] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. [ 7497.040974] task:.backy-wrapped state:D stack:0 pid:2706 = tgid:2706 ppid:1 flags:0x00000002 [ 7497.040979] Call Trace: [ 7497.040981] [ 7497.040987] __schedule+0x3fa/0x1550 [ 7497.040996] ? xfs_iextents_copy+0xec/0x1b0 [xfs] [ 7497.041085] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.041089] ? xlog_copy_iovec+0x30/0x90 [xfs] [ 7497.041168] schedule+0x27/0xf0 [ 7497.041171] io_schedule+0x46/0x70 [ 7497.041173] folio_wait_bit_common+0x13f/0x340 [ 7497.041180] ? __pfx_wake_page_function+0x10/0x10 [ 7497.041187] folio_wait_writeback+0x2b/0x80 [ 7497.041191] truncate_inode_partial_folio+0x5b/0x190 [ 7497.041194] truncate_inode_pages_range+0x1de/0x400 [ 7497.041207] evict+0x1b0/0x1d0 [ 7497.041212] __dentry_kill+0x6e/0x170 [ 7497.041216] dput+0xe5/0x1b0 [ 7497.041218] do_renameat2+0x386/0x600 [ 7497.041226] __x64_sys_rename+0x43/0x50 [ 7497.041229] do_syscall_64+0xb7/0x200 [ 7497.041234] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 7497.041236] RIP: 0033:0x7f4be586f75b [ 7497.041265] RSP: 002b:00007fffd2706538 EFLAGS: 00000246 ORIG_RAX: = 0000000000000052 [ 7497.041267] RAX: ffffffffffffffda RBX: 00007fffd27065d0 RCX: = 00007f4be586f75b [ 7497.041269] RDX: 0000000000000000 RSI: 00007f4bd6f73e50 RDI: = 00007f4bd6f732d0 [ 7497.041270] RBP: 00007fffd2706580 R08: 00000000ffffffff R09: = 0000000000000000 [ 7497.041271] R10: 00007fffd27067b0 R11: 0000000000000246 R12: = 00000000ffffff9c [ 7497.041273] R13: 00000000ffffff9c R14: 0000000037fb4ab0 R15: = 00007f4be5814810 [ 7497.041277] [ 7497.041281] INFO: task kworker/u131:1:12780 blocked for more than 122 = seconds. [ 7497.049410] Not tainted 6.10.3 #1-NixOS [ 7497.054317] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. [ 7497.063124] task:kworker/u131:1 state:D stack:0 pid:12780 = tgid:12780 ppid:2 flags:0x00004000 [ 7497.063131] Workqueue: kcryptd-253:4-1 kcryptd_crypt [dm_crypt] [ 7497.063140] Call Trace: [ 7497.063141] [ 7497.063145] __schedule+0x3fa/0x1550 [ 7497.063154] schedule+0x27/0xf0 [ 7497.063156] md_bitmap_startwrite+0x14f/0x1c0 [ 7497.063160] ? __pfx_autoremove_wake_function+0x10/0x10 [ 7497.063168] __add_stripe_bio+0x1f4/0x240 [raid456] [ 7497.063175] raid5_make_request+0x34d/0x1280 [raid456] [ 7497.063182] ? __pfx_woken_wake_function+0x10/0x10 [ 7497.063184] ? bio_split_rw+0x193/0x260 [ 7497.063190] md_handle_request+0x153/0x270 [ 7497.063194] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.063198] __submit_bio+0x190/0x240 [ 7497.063203] submit_bio_noacct_nocheck+0x19a/0x3c0 [ 7497.063205] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.063207] ? submit_bio_noacct+0x46/0x5a0 [ 7497.063210] kcryptd_crypt_write_convert+0x118/0x1e0 [dm_crypt] [ 7497.063214] process_one_work+0x18f/0x3b0 [ 7497.063219] worker_thread+0x233/0x340 [ 7497.063222] ? __pfx_worker_thread+0x10/0x10 [ 7497.063225] kthread+0xcd/0x100 [ 7497.063228] ? __pfx_kthread+0x10/0x10 [ 7497.063230] ret_from_fork+0x31/0x50 [ 7497.063234] ? __pfx_kthread+0x10/0x10 [ 7497.063236] ret_from_fork_asm+0x1a/0x30 [ 7497.063243] [ 7497.063246] INFO: task kworker/u131:0:17487 blocked for more than 122 = seconds. [ 7497.071367] Not tainted 6.10.3 #1-NixOS [ 7497.076269] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. [ 7497.085073] task:kworker/u131:0 state:D stack:0 pid:17487 = tgid:17487 ppid:2 flags:0x00004000 [ 7497.085081] Workqueue: kcryptd-253:4-1 kcryptd_crypt [dm_crypt] [ 7497.085086] Call Trace: [ 7497.085087] [ 7497.085089] __schedule+0x3fa/0x1550 [ 7497.085094] schedule+0x27/0xf0 [ 7497.085096] md_bitmap_startwrite+0x14f/0x1c0 [ 7497.085098] ? __pfx_autoremove_wake_function+0x10/0x10 [ 7497.085102] __add_stripe_bio+0x1f4/0x240 [raid456] [ 7497.085108] raid5_make_request+0x34d/0x1280 [raid456] [ 7497.085114] ? __pfx_woken_wake_function+0x10/0x10 [ 7497.085116] ? bio_split_rw+0x193/0x260 [ 7497.085120] md_handle_request+0x153/0x270 [ 7497.085122] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.085125] __submit_bio+0x190/0x240 [ 7497.085128] submit_bio_noacct_nocheck+0x19a/0x3c0 [ 7497.085131] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.085133] ? submit_bio_noacct+0x46/0x5a0 [ 7497.085135] kcryptd_crypt_write_convert+0x118/0x1e0 [dm_crypt] [ 7497.085138] process_one_work+0x18f/0x3b0 [ 7497.085142] worker_thread+0x233/0x340 [ 7497.085145] ? __pfx_worker_thread+0x10/0x10 [ 7497.085148] ? __pfx_worker_thread+0x10/0x10 [ 7497.085150] kthread+0xcd/0x100 [ 7497.085152] ? __pfx_kthread+0x10/0x10 [ 7497.085155] ret_from_fork+0x31/0x50 [ 7497.085157] ? __pfx_kthread+0x10/0x10 [ 7497.085159] ret_from_fork_asm+0x1a/0x30 [ 7497.085164] [ 7497.085165] INFO: task kworker/u131:2:18973 blocked for more than 122 = seconds. [ 7497.093282] Not tainted 6.10.3 #1-NixOS [ 7497.098185] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. [ 7497.106988] task:kworker/u131:2 state:D stack:0 pid:18973 = tgid:18973 ppid:2 flags:0x00004000 [ 7497.106993] Workqueue: kcryptd-253:4-1 kcryptd_crypt [dm_crypt] [ 7497.106998] Call Trace: [ 7497.106999] [ 7497.107001] __schedule+0x3fa/0x1550 [ 7497.107006] schedule+0x27/0xf0 [ 7497.107009] md_bitmap_startwrite+0x14f/0x1c0 [ 7497.107012] ? __pfx_autoremove_wake_function+0x10/0x10 [ 7497.107016] __add_stripe_bio+0x1f4/0x240 [raid456] [ 7497.107021] raid5_make_request+0x34d/0x1280 [raid456] [ 7497.107026] ? __pfx_woken_wake_function+0x10/0x10 [ 7497.107028] ? bio_split_rw+0x193/0x260 [ 7497.107033] md_handle_request+0x153/0x270 [ 7497.107036] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.107039] __submit_bio+0x190/0x240 [ 7497.107042] submit_bio_noacct_nocheck+0x19a/0x3c0 [ 7497.107044] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.107046] ? submit_bio_noacct+0x46/0x5a0 [ 7497.107049] kcryptd_crypt_write_convert+0x118/0x1e0 [dm_crypt] [ 7497.107052] process_one_work+0x18f/0x3b0 [ 7497.107055] worker_thread+0x233/0x340 [ 7497.107058] ? __pfx_worker_thread+0x10/0x10 [ 7497.107060] ? __pfx_worker_thread+0x10/0x10 [ 7497.107063] kthread+0xcd/0x100 [ 7497.107065] ? __pfx_kthread+0x10/0x10 [ 7497.107067] ret_from_fork+0x31/0x50 [ 7497.107069] ? __pfx_kthread+0x10/0x10 [ 7497.107071] ret_from_fork_asm+0x1a/0x30 [ 7497.107081] [ 7497.107086] INFO: task rsync:23530 blocked for more than 122 seconds. [ 7497.114327] Not tainted 6.10.3 #1-NixOS [ 7497.119226] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. [ 7497.128020] task:rsync state:D stack:0 pid:23530 = tgid:23530 ppid:23520 flags:0x00000000 [ 7497.128024] Call Trace: [ 7497.128025] [ 7497.128027] __schedule+0x3fa/0x1550 [ 7497.128030] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.128034] schedule+0x27/0xf0 [ 7497.128036] schedule_timeout+0x15d/0x170 [ 7497.128040] __down_common+0x119/0x220 [ 7497.128045] down+0x47/0x60 [ 7497.128048] xfs_buf_lock+0x31/0xe0 [xfs] [ 7497.128131] xfs_buf_find_lock+0x55/0x100 [xfs] [ 7497.128185] xfs_buf_get_map+0x1ea/0xa80 [xfs] [ 7497.128236] xfs_buf_read_map+0x62/0x2a0 [xfs] [ 7497.128287] ? xfs_read_agf+0x97/0x150 [xfs] [ 7497.128357] xfs_trans_read_buf_map+0x12e/0x310 [xfs] [ 7497.128429] ? xfs_read_agf+0x97/0x150 [xfs] [ 7497.128489] xfs_read_agf+0x97/0x150 [xfs] [ 7497.128540] xfs_alloc_read_agf+0x5a/0x200 [xfs] [ 7497.128589] xfs_alloc_fix_freelist+0x345/0x660 [xfs] [ 7497.128641] xfs_alloc_vextent_prepare_ag+0x2d/0x120 [xfs] [ 7497.128690] xfs_alloc_vextent_exact_bno+0xd1/0x100 [xfs] [ 7497.128740] xfs_ialloc_ag_alloc+0x177/0x610 [xfs] [ 7497.128812] xfs_dialloc+0x219/0x7b0 [xfs] [ 7497.128864] ? xfs_trans_alloc_icreate+0x93/0x120 [xfs] [ 7497.128935] xfs_create+0x2c7/0x640 [xfs] [ 7497.128998] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.129001] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.129003] ? get_cached_acl+0x4c/0x90 [ 7497.129008] xfs_generic_create+0x321/0x3a0 [xfs] [ 7497.129061] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.129065] path_openat+0xf82/0x1240 [ 7497.129072] do_filp_open+0xc4/0x170 [ 7497.129084] do_sys_openat2+0xab/0xe0 [ 7497.129090] __x64_sys_openat+0x57/0xa0 [ 7497.129093] do_syscall_64+0xb7/0x200 [ 7497.129096] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 7497.129099] RIP: 0033:0x7f6809d2be2f [ 7497.129121] RSP: 002b:00007ffe3d410cf0 EFLAGS: 00000246 ORIG_RAX: = 0000000000000101 [ 7497.129123] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: = 00007f6809d2be2f [ 7497.129124] RDX: 00000000000000c2 RSI: 00007ffe3d412fc0 RDI: = 00000000ffffff9c [ 7497.129126] RBP: 000000000003a2f8 R08: 001f1108db8eff56 R09: = 00007ffe3d410f2c [ 7497.129128] R10: 0000000000000180 R11: 0000000000000246 R12: = 00007ffe3d41300b [ 7497.129129] R13: 00007ffe3d412fc0 R14: 8421084210842109 R15: = 00007f6809dc6a80 [ 7497.129133] [ 7497.129146] INFO: task kworker/u131:3:23611 blocked for more than 122 = seconds. [ 7497.137277] Not tainted 6.10.3 #1-NixOS [ 7497.142187] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. [ 7497.150980] task:kworker/u131:3 state:D stack:0 pid:23611 = tgid:23611 ppid:2 flags:0x00004000 [ 7497.150986] Workqueue: writeback wb_workfn (flush-253:4) [ 7497.150993] Call Trace: [ 7497.150995] [ 7497.150998] __schedule+0x3fa/0x1550 [ 7497.151007] schedule+0x27/0xf0 [ 7497.151009] schedule_timeout+0x15d/0x170 [ 7497.151013] __wait_for_common+0x90/0x1c0 [ 7497.151015] ? __pfx_schedule_timeout+0x10/0x10 [ 7497.151020] xfs_buf_iowait+0x1c/0xc0 [xfs] [ 7497.151094] __xfs_buf_submit+0x132/0x1e0 [xfs] [ 7497.151146] xfs_buf_read_map+0x129/0x2a0 [xfs] [ 7497.151197] ? xfs_btree_read_buf_block+0xa7/0x120 [xfs] [ 7497.151267] xfs_trans_read_buf_map+0x12e/0x310 [xfs] [ 7497.151336] ? xfs_btree_read_buf_block+0xa7/0x120 [xfs] [ 7497.151396] xfs_btree_read_buf_block+0xa7/0x120 [xfs] [ 7497.151446] xfs_btree_lookup_get_block+0xa6/0x1f0 [xfs] [ 7497.151497] xfs_btree_lookup+0xea/0x500 [xfs] [ 7497.151546] ? xfs_btree_increment+0x44/0x310 [xfs] [ 7497.151596] xfs_alloc_fixup_trees+0x66/0x4c0 [xfs] [ 7497.151661] xfs_alloc_cur_finish+0x2b/0xa0 [xfs] [ 7497.151710] xfs_alloc_ag_vextent_near+0x437/0x540 [xfs] [ 7497.151764] xfs_alloc_vextent_iterate_ags.constprop.0+0xc8/0x200 = [xfs] [ 7497.151813] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.151817] ? xfs_buf_item_format+0x1b8/0x450 [xfs] [ 7497.151884] xfs_alloc_vextent_start_ag+0xc0/0x190 [xfs] [ 7497.151938] xfs_bmap_btalloc+0x4dd/0x640 [xfs] [ 7497.151999] xfs_bmapi_allocate+0xac/0x2c0 [xfs] [ 7497.152048] xfs_bmapi_convert_one_delalloc+0x1f6/0x430 [xfs] [ 7497.152105] xfs_bmapi_convert_delalloc+0x43/0x60 [xfs] [ 7497.152155] xfs_map_blocks+0x257/0x420 [xfs] [ 7497.152228] iomap_writepages+0x271/0x9b0 [ 7497.152235] xfs_vm_writepages+0x67/0x90 [xfs] [ 7497.152287] do_writepages+0x76/0x260 [ 7497.152294] ? uas_submit_urbs+0x8c/0x4c0 [uas] [ 7497.152297] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.152300] ? psi_group_change+0x213/0x3c0 [ 7497.152305] __writeback_single_inode+0x3d/0x350 [ 7497.152307] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.152309] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.152312] writeback_sb_inodes+0x21c/0x4e0 [ 7497.152323] __writeback_inodes_wb+0x4c/0xf0 [ 7497.152325] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.152328] wb_writeback+0x193/0x310 [ 7497.152332] wb_workfn+0x357/0x450 [ 7497.152337] process_one_work+0x18f/0x3b0 [ 7497.152342] worker_thread+0x233/0x340 [ 7497.152345] ? __pfx_worker_thread+0x10/0x10 [ 7497.152348] kthread+0xcd/0x100 [ 7497.152352] ? __pfx_kthread+0x10/0x10 [ 7497.152354] ret_from_fork+0x31/0x50 [ 7497.152358] ? __pfx_kthread+0x10/0x10 [ 7497.152360] ret_from_fork_asm+0x1a/0x30 [ 7497.152366] [ 7497.152368] INFO: task kworker/u131:4:23612 blocked for more than 123 = seconds. [ 7497.160489] Not tainted 6.10.3 #1-NixOS [ 7497.165390] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. [ 7497.174190] task:kworker/u131:4 state:D stack:0 pid:23612 = tgid:23612 ppid:2 flags:0x00004000 [ 7497.174194] Workqueue: kcryptd-253:4-1 kcryptd_crypt [dm_crypt] [ 7497.174200] Call Trace: [ 7497.174201] [ 7497.174203] __schedule+0x3fa/0x1550 [ 7497.174208] schedule+0x27/0xf0 [ 7497.174210] md_bitmap_startwrite+0x14f/0x1c0 [ 7497.174214] ? __pfx_autoremove_wake_function+0x10/0x10 [ 7497.174219] __add_stripe_bio+0x1f4/0x240 [raid456] [ 7497.174227] raid5_make_request+0x34d/0x1280 [raid456] [ 7497.174233] ? __pfx_woken_wake_function+0x10/0x10 [ 7497.174235] ? bio_split_rw+0x193/0x260 [ 7497.174242] md_handle_request+0x153/0x270 [ 7497.174245] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.174248] __submit_bio+0x190/0x240 [ 7497.174252] submit_bio_noacct_nocheck+0x19a/0x3c0 [ 7497.174255] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.174257] ? submit_bio_noacct+0x46/0x5a0 [ 7497.174259] kcryptd_crypt_write_convert+0x118/0x1e0 [dm_crypt] [ 7497.174263] process_one_work+0x18f/0x3b0 [ 7497.174266] worker_thread+0x233/0x340 [ 7497.174269] ? __pfx_worker_thread+0x10/0x10 [ 7497.174271] kthread+0xcd/0x100 [ 7497.174273] ? __pfx_kthread+0x10/0x10 [ 7497.174276] ret_from_fork+0x31/0x50 [ 7497.174277] ? __pfx_kthread+0x10/0x10 [ 7497.174279] ret_from_fork_asm+0x1a/0x30 [ 7497.174285] [ 7497.174292] INFO: task kworker/u130:33:23645 blocked for more than = 123 seconds. [ 7497.182499] Not tainted 6.10.3 #1-NixOS [ 7497.187400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. [ 7497.196203] task:kworker/u130:33 state:D stack:0 pid:23645 = tgid:23645 ppid:2 flags:0x00004000 [ 7497.196209] Workqueue: xfs-cil/dm-4 xlog_cil_push_work [xfs] [ 7497.196281] Call Trace: [ 7497.196282] [ 7497.196285] __schedule+0x3fa/0x1550 [ 7497.196289] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.196293] schedule+0x27/0xf0 [ 7497.196295] xlog_state_get_iclog_space+0x102/0x2b0 [xfs] [ 7497.196346] ? __pfx_default_wake_function+0x10/0x10 [ 7497.196351] xlog_write_get_more_iclog_space+0xd0/0x100 [xfs] [ 7497.196400] xlog_write+0x310/0x470 [xfs] [ 7497.196451] xlog_cil_push_work+0x6a5/0x880 [xfs] [ 7497.196503] process_one_work+0x18f/0x3b0 [ 7497.196507] worker_thread+0x233/0x340 [ 7497.196510] ? __pfx_worker_thread+0x10/0x10 [ 7497.196512] ? __pfx_worker_thread+0x10/0x10 [ 7497.196515] kthread+0xcd/0x100 [ 7497.196517] ? __pfx_kthread+0x10/0x10 [ 7497.196519] ret_from_fork+0x31/0x50 [ 7497.196522] ? __pfx_kthread+0x10/0x10 [ 7497.196524] ret_from_fork_asm+0x1a/0x30 [ 7497.196529] [ 7497.196531] INFO: task kworker/u131:6:23863 blocked for more than 123 = seconds. [ 7497.204648] Not tainted 6.10.3 #1-NixOS [ 7497.209539] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. [ 7497.218347] task:kworker/u131:6 state:D stack:0 pid:23863 = tgid:23863 ppid:2 flags:0x00004000 [ 7497.218353] Workqueue: kcryptd-253:4-1 kcryptd_crypt [dm_crypt] [ 7497.218359] Call Trace: [ 7497.218360] [ 7497.218363] __schedule+0x3fa/0x1550 [ 7497.218369] schedule+0x27/0xf0 [ 7497.218371] md_bitmap_startwrite+0x14f/0x1c0 [ 7497.218375] ? __pfx_autoremove_wake_function+0x10/0x10 [ 7497.218379] __add_stripe_bio+0x1f4/0x240 [raid456] [ 7497.218384] raid5_make_request+0x34d/0x1280 [raid456] [ 7497.218390] ? __pfx_woken_wake_function+0x10/0x10 [ 7497.218392] ? bio_split_rw+0x193/0x260 [ 7497.218398] md_handle_request+0x153/0x270 [ 7497.218401] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.218405] __submit_bio+0x190/0x240 [ 7497.218408] submit_bio_noacct_nocheck+0x19a/0x3c0 [ 7497.218410] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.218413] ? submit_bio_noacct+0x46/0x5a0 [ 7497.218415] kcryptd_crypt_write_convert+0x118/0x1e0 [dm_crypt] [ 7497.218419] process_one_work+0x18f/0x3b0 [ 7497.218423] worker_thread+0x233/0x340 [ 7497.218426] ? __pfx_worker_thread+0x10/0x10 [ 7497.218428] kthread+0xcd/0x100 [ 7497.218430] ? __pfx_kthread+0x10/0x10 [ 7497.218433] ret_from_fork+0x31/0x50 [ 7497.218435] ? __pfx_kthread+0x10/0x10 [ 7497.218437] ret_from_fork_asm+0x1a/0x30 [ 7497.218442] [ 7497.218444] INFO: task kworker/u131:7:23864 blocked for more than 123 = seconds. [ 7497.226572] Not tainted 6.10.3 #1-NixOS [ 7497.231475] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. [ 7497.240277] task:kworker/u131:7 state:D stack:0 pid:23864 = tgid:23864 ppid:2 flags:0x00004000 [ 7497.240282] Workqueue: kcryptd-253:4-1 kcryptd_crypt [dm_crypt] [ 7497.240287] Call Trace: [ 7497.240288] [ 7497.240290] __schedule+0x3fa/0x1550 [ 7497.240298] schedule+0x27/0xf0 [ 7497.240301] md_bitmap_startwrite+0x14f/0x1c0 [ 7497.240304] ? __pfx_autoremove_wake_function+0x10/0x10 [ 7497.240310] __add_stripe_bio+0x1f4/0x240 [raid456] [ 7497.240314] raid5_make_request+0x34d/0x1280 [raid456] [ 7497.240320] ? __pfx_woken_wake_function+0x10/0x10 [ 7497.240322] ? bio_split_rw+0x193/0x260 [ 7497.240328] md_handle_request+0x153/0x270 [ 7497.240330] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.240334] __submit_bio+0x190/0x240 [ 7497.240338] submit_bio_noacct_nocheck+0x19a/0x3c0 [ 7497.240340] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7497.240342] ? submit_bio_noacct+0x46/0x5a0 [ 7497.240345] kcryptd_crypt_write_convert+0x118/0x1e0 [dm_crypt] [ 7497.240348] process_one_work+0x18f/0x3b0 [ 7497.240353] worker_thread+0x233/0x340 [ 7497.240356] ? __pfx_worker_thread+0x10/0x10 [ 7497.240358] kthread+0xcd/0x100 [ 7497.240361] ? __pfx_kthread+0x10/0x10 [ 7497.240364] ret_from_fork+0x31/0x50 [ 7497.240366] ? __pfx_kthread+0x10/0x10 [ 7497.240368] ret_from_fork_asm+0x1a/0x30 [ 7497.240375] [ 7497.240376] Future hung task reports are suppressed, see sysctl = kernel.hung_task_warnings >=20 >=20 >=20 >> Kernel: >=20 >> Linux version 5.15.164 (nixbld@localhost) (gcc (GCC) 12.2.0, GNU ld = (GNU Binutils) 2.40) #1-NixOS SMP Sat Jul 27 08:46:18 UTC 2024 >=20 >> The config is unchanged except from the deprecated NFSD_V2_ACL and = NFSD_V3 options which I had to remove. NFS is not in use on this server, = though. >=20 >> Output: >=20 >> [ 4549.838672] INFO: task kworker/u64:7:432 blocked for more than 122 = seconds. >> [ 4549.846507] Not tainted 5.15.164 #1-NixOS >> [ 4549.851616] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. >> [ 4549.860421] task:kworker/u64:7 state:D stack: 0 pid: 432 = ppid: 2 flags:0x00004000 >> [ 4549.860426] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt] >> [ 4549.860435] Call Trace: >> [ 4549.860437] >> [ 4549.860440] __schedule+0x373/0x1580 >> [ 4549.860446] ? sysvec_call_function_single+0xa/0x90 >> [ 4549.860449] ? asm_sysvec_call_function_single+0x16/0x20 >> [ 4549.860453] schedule+0x5b/0xe0 >> [ 4549.860455] md_bitmap_startwrite+0x177/0x1e0 >> [ 4549.860459] ? finish_wait+0x90/0x90 >> [ 4549.860465] add_stripe_bio+0x449/0x770 [raid456] >> [ 4549.860472] raid5_make_request+0x1cf/0xbd0 [raid456] >> [ 4549.860476] ? kmem_cache_alloc_node_trace+0x341/0x3e0 >> [ 4549.860480] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.860484] ? linear_map+0x44/0x90 [dm_mod] >> [ 4549.860490] ? finish_wait+0x90/0x90 >> [ 4549.860492] ? __blk_queue_split+0x516/0x580 >> [ 4549.860495] md_handle_request+0x11f/0x1b0 >> [ 4549.860500] md_submit_bio+0x6e/0xb0 >> [ 4549.860502] __submit_bio+0x18c/0x220 >> [ 4549.860505] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.860507] ? crypt_page_alloc+0x46/0x60 [dm_crypt] >> [ 4549.860510] submit_bio_noacct+0xbe/0x2d0 >> [ 4549.860512] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt] >> [ 4549.860517] process_one_work+0x1d3/0x360 >> [ 4549.860521] worker_thread+0x4d/0x3b0 >> [ 4549.860523] ? process_one_work+0x360/0x360 >> [ 4549.860525] kthread+0x115/0x140 >> [ 4549.860528] ? set_kthread_struct+0x50/0x50 >> [ 4549.860530] ret_from_fork+0x1f/0x30 >> [ 4549.860535] >> [ 4549.860536] INFO: task kworker/u64:23:448 blocked for more than = 122 seconds. >> [ 4549.868461] Not tainted 5.15.164 #1-NixOS >> [ 4549.873555] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. >> [ 4549.882358] task:kworker/u64:23 state:D stack: 0 pid: 448 = ppid: 2 flags:0x00004000 >> [ 4549.882364] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt] >> [ 4549.882368] Call Trace: >> [ 4549.882369] >> [ 4549.882370] __schedule+0x373/0x1580 >> [ 4549.882373] ? sysvec_apic_timer_interrupt+0xa/0x90 >> [ 4549.882375] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 >> [ 4549.882379] schedule+0x5b/0xe0 >> [ 4549.882382] md_bitmap_startwrite+0x177/0x1e0 >> [ 4549.882384] ? finish_wait+0x90/0x90 >> [ 4549.882387] add_stripe_bio+0x449/0x770 [raid456] >> [ 4549.882393] raid5_make_request+0x1cf/0xbd0 [raid456] >> [ 4549.882397] ? __bio_clone_fast+0xa5/0xe0 >> [ 4549.882401] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.882403] ? finish_wait+0x90/0x90 >> [ 4549.882406] md_handle_request+0x11f/0x1b0 >> [ 4549.882410] ? blk_throtl_charge_bio_split+0x23/0x60 >> [ 4549.882413] md_submit_bio+0x6e/0xb0 >> [ 4549.882415] __submit_bio+0x18c/0x220 >> [ 4549.882417] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.882419] ? crypt_page_alloc+0x46/0x60 [dm_crypt] >> [ 4549.882421] submit_bio_noacct+0xbe/0x2d0 >> [ 4549.882424] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt] >> [ 4549.882428] process_one_work+0x1d3/0x360 >> [ 4549.882431] worker_thread+0x4d/0x3b0 >> [ 4549.882433] ? process_one_work+0x360/0x360 >> [ 4549.882435] kthread+0x115/0x140 >> [ 4549.882436] ? set_kthread_struct+0x50/0x50 >> [ 4549.882438] ret_from_fork+0x1f/0x30 >> [ 4549.882442] >> [ 4549.882497] INFO: task .backy-wrapped:2578 blocked for more than = 122 seconds. >> [ 4549.890517] Not tainted 5.15.164 #1-NixOS >> [ 4549.895611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. >> [ 4549.904406] task:.backy-wrapped state:D stack: 0 pid: 2578 = ppid: 1 flags:0x00000002 >> [ 4549.904411] Call Trace: >> [ 4549.904412] >> [ 4549.904414] __schedule+0x373/0x1580 >> [ 4549.904419] ? xlog_cil_commit+0x556/0x880 [xfs] >> [ 4549.904465] ? __xfs_trans_commit+0xac/0x2f0 [xfs] >> [ 4549.904498] schedule+0x5b/0xe0 >> [ 4549.904500] io_schedule+0x42/0x70 >> [ 4549.904503] wait_on_page_bit_common+0x119/0x380 >> [ 4549.904507] ? __page_cache_alloc+0x80/0x80 >> [ 4549.904510] wait_on_page_writeback+0x22/0x70 >> [ 4549.904513] truncate_inode_pages_range+0x26f/0x6d0 >> [ 4549.904520] evict+0x15f/0x180 >> [ 4549.904524] __dentry_kill+0xde/0x170 >> [ 4549.904527] dput+0x139/0x320 >> [ 4549.904529] do_renameat2+0x375/0x5f0 >> [ 4549.904536] __x64_sys_rename+0x3f/0x50 >> [ 4549.904538] do_syscall_64+0x34/0x80 >> [ 4549.904541] entry_SYSCALL_64_after_hwframe+0x6c/0xd6 >> [ 4549.904544] RIP: 0033:0x7fbf3e61a75b >> [ 4549.904545] RSP: 002b:00007ffc61e25988 EFLAGS: 00000246 ORIG_RAX: = 0000000000000052 >> [ 4549.904548] RAX: ffffffffffffffda RBX: 00007ffc61e25a20 RCX: = 00007fbf3e61a75b >> [ 4549.904549] RDX: 0000000000000000 RSI: 00007fbf2f7ff150 RDI: = 00007fbf2f7fc190 >> [ 4549.904550] RBP: 00007ffc61e259d0 R08: 00000000ffffffff R09: = 0000000000000000 >> [ 4549.904551] R10: 00007ffc61e25c00 R11: 0000000000000246 R12: = 00000000ffffff9c >> [ 4549.904552] R13: 00000000ffffff9c R14: 00000000016afab0 R15: = 00007fbf30ef0810 >> [ 4549.904555] >> [ 4549.904556] INFO: task kworker/u64:0:4372 blocked for more than = 122 seconds. >> [ 4549.912477] Not tainted 5.15.164 #1-NixOS >> [ 4549.917573] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. >> [ 4549.926373] task:kworker/u64:0 state:D stack: 0 pid: 4372 = ppid: 2 flags:0x00004000 >> [ 4549.926376] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt] >> [ 4549.926380] Call Trace: >> [ 4549.926381] >> [ 4549.926383] __schedule+0x373/0x1580 >> [ 4549.926386] ? sysvec_apic_timer_interrupt+0xa/0x90 >> [ 4549.926389] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 >> [ 4549.926392] schedule+0x5b/0xe0 >> [ 4549.926394] md_bitmap_startwrite+0x177/0x1e0 >> [ 4549.926397] ? finish_wait+0x90/0x90 >> [ 4549.926401] add_stripe_bio+0x449/0x770 [raid456] >> [ 4549.926406] raid5_make_request+0x1cf/0xbd0 [raid456] >> [ 4549.926410] ? __bio_clone_fast+0xa5/0xe0 >> [ 4549.926413] ? finish_wait+0x90/0x90 >> [ 4549.926415] ? __blk_queue_split+0x2d0/0x580 >> [ 4549.926418] md_handle_request+0x11f/0x1b0 >> [ 4549.926422] md_submit_bio+0x6e/0xb0 >> [ 4549.926424] __submit_bio+0x18c/0x220 >> [ 4549.926426] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.926428] ? crypt_page_alloc+0x46/0x60 [dm_crypt] >> [ 4549.926431] submit_bio_noacct+0xbe/0x2d0 >> [ 4549.926434] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt] >> [ 4549.926437] process_one_work+0x1d3/0x360 >> [ 4549.926441] worker_thread+0x4d/0x3b0 >> [ 4549.926442] ? process_one_work+0x360/0x360 >> [ 4549.926444] kthread+0x115/0x140 >> [ 4549.926447] ? set_kthread_struct+0x50/0x50 >> [ 4549.926448] ret_from_fork+0x1f/0x30 >> [ 4549.926454] >> [ 4549.926459] INFO: task rsync:4929 blocked for more than 122 = seconds. >> [ 4549.933603] Not tainted 5.15.164 #1-NixOS >> [ 4549.938702] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. >> [ 4549.947501] task:rsync state:D stack: 0 pid: 4929 = ppid: 4925 flags:0x00000000 >> [ 4549.947503] Call Trace: >> [ 4549.947505] >> [ 4549.947505] ? usleep_range_state+0x90/0x90 >> [ 4549.947510] __schedule+0x373/0x1580 >> [ 4549.947513] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.947515] ? blk_mq_sched_insert_requests+0x97/0xe0 >> [ 4549.947519] ? usleep_range_state+0x90/0x90 >> [ 4549.947521] schedule+0x5b/0xe0 >> [ 4549.947523] schedule_timeout+0xff/0x130 >> [ 4549.947526] __wait_for_common+0xaf/0x160 >> [ 4549.947530] xfs_buf_iowait+0x1c/0xa0 [xfs] >> [ 4549.947573] __xfs_buf_submit+0x109/0x1b0 [xfs] >> [ 4549.947604] xfs_buf_read_map+0x120/0x280 [xfs] >> [ 4549.947635] ? xfs_btree_read_buf_block.constprop.0+0xae/0xf0 = [xfs] >> [ 4549.947670] xfs_trans_read_buf_map+0x156/0x2c0 [xfs] >> [ 4549.947705] ? xfs_btree_read_buf_block.constprop.0+0xae/0xf0 = [xfs] >> [ 4549.947735] xfs_btree_read_buf_block.constprop.0+0xae/0xf0 [xfs] >> [ 4549.947764] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.947766] xfs_btree_lookup_get_block+0xa2/0x180 [xfs] >> [ 4549.947798] xfs_btree_lookup+0xe9/0x540 [xfs] >> [ 4549.947830] xfs_alloc_lookup_eq+0x1d/0x30 [xfs] >> [ 4549.947863] xfs_alloc_fixup_trees+0xe7/0x3b0 [xfs] >> [ 4549.947893] xfs_alloc_cur_finish+0x2b/0xa0 [xfs] >> [ 4549.947923] xfs_alloc_ag_vextent_near.constprop.0+0x3f2/0x4a0 = [xfs] >> [ 4549.947954] xfs_alloc_ag_vextent+0x13f/0x150 [xfs] >> [ 4549.947983] xfs_alloc_vextent+0x327/0x450 [xfs] >> [ 4549.948013] xfs_bmap_btalloc+0x44e/0x830 [xfs] >> [ 4549.948047] xfs_bmapi_allocate+0xda/0x300 [xfs] >> [ 4549.948076] xfs_bmapi_write+0x4ab/0x570 [xfs] >> [ 4549.948109] xfs_da_grow_inode_int+0xd8/0x320 [xfs] >> [ 4549.948141] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.948142] ? xfs_da_read_buf+0xf7/0x150 [xfs] >> [ 4549.948171] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.948174] xfs_dir2_grow_inode+0x68/0x120 [xfs] >> [ 4549.948204] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.948206] xfs_dir2_node_addname+0x5ea/0x9e0 [xfs] >> [ 4549.948241] xfs_dir_createname+0x1cf/0x1e0 [xfs] >> [ 4549.948271] xfs_rename+0x87e/0xcd0 [xfs] >> [ 4549.948308] xfs_vn_rename+0xfa/0x170 [xfs] >> [ 4549.948340] vfs_rename+0x818/0x10d0 >> [ 4549.948345] ? lookup_dcache+0x17/0x60 >> [ 4549.948348] ? do_renameat2+0x57f/0x5f0 >> [ 4549.948350] do_renameat2+0x57f/0x5f0 >> [ 4549.948355] __x64_sys_rename+0x3f/0x50 >> [ 4549.948357] do_syscall_64+0x34/0x80 >> [ 4549.948360] entry_SYSCALL_64_after_hwframe+0x6c/0xd6 >> [ 4549.948362] RIP: 0033:0x7fcc5520c1d7 >> [ 4549.948364] RSP: 002b:00007ffe3909c748 EFLAGS: 00000246 ORIG_RAX: = 0000000000000052 >> [ 4549.948366] RAX: ffffffffffffffda RBX: 00007ffe3909c8f0 RCX: = 00007fcc5520c1d7 >> [ 4549.948367] RDX: 0000000000000000 RSI: 00007ffe3909c8f0 RDI: = 00007ffe3909e8f0 >> [ 4549.948368] RBP: 00007ffe3909e8f0 R08: 0000000000000000 R09: = 00007ffe3909c2f8 >> [ 4549.948369] R10: 00007ffe3909c2f7 R11: 0000000000000246 R12: = 0000000000000000 >> [ 4549.948370] R13: 00000000023c9c30 R14: 00000000000081a4 R15: = 0000000000000004 >> [ 4549.948373] >> [ 4549.948374] INFO: task kworker/u64:1:4930 blocked for more than = 122 seconds. >> [ 4549.956299] Not tainted 5.15.164 #1-NixOS >> [ 4549.961396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. >> [ 4549.970198] task:kworker/u64:1 state:D stack: 0 pid: 4930 = ppid: 2 flags:0x00004000 >> [ 4549.970202] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt] >> [ 4549.970205] Call Trace: >> [ 4549.970206] >> [ 4549.970209] __schedule+0x373/0x1580 >> [ 4549.970211] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.970215] schedule+0x5b/0xe0 >> [ 4549.970217] md_bitmap_startwrite+0x177/0x1e0 >> [ 4549.970219] ? finish_wait+0x90/0x90 >> [ 4549.970223] add_stripe_bio+0x449/0x770 [raid456] >> [ 4549.970229] raid5_make_request+0x1cf/0xbd0 [raid456] >> [ 4549.970232] ? kmem_cache_alloc_node_trace+0x341/0x3e0 >> [ 4549.970236] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.970238] ? linear_map+0x44/0x90 [dm_mod] >> [ 4549.970244] ? finish_wait+0x90/0x90 >> [ 4549.970245] ? __blk_queue_split+0x516/0x580 >> [ 4549.970248] md_handle_request+0x11f/0x1b0 >> [ 4549.970251] md_submit_bio+0x6e/0xb0 >> [ 4549.970254] __submit_bio+0x18c/0x220 >> [ 4549.970256] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.970258] ? crypt_page_alloc+0x46/0x60 [dm_crypt] >> [ 4549.970260] submit_bio_noacct+0xbe/0x2d0 >> [ 4549.970263] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt] >> [ 4549.970267] process_one_work+0x1d3/0x360 >> [ 4549.970270] worker_thread+0x4d/0x3b0 >> [ 4549.970272] ? process_one_work+0x360/0x360 >> [ 4549.970274] kthread+0x115/0x140 >> [ 4549.970276] ? set_kthread_struct+0x50/0x50 >> [ 4549.970278] ret_from_fork+0x1f/0x30 >> [ 4549.970282] >> [ 4549.970284] INFO: task kworker/u64:2:4949 blocked for more than = 123 seconds. >> [ 4549.978205] Not tainted 5.15.164 #1-NixOS >> [ 4549.983290] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. >> [ 4549.992088] task:kworker/u64:2 state:D stack: 0 pid: 4949 = ppid: 2 flags:0x00004000 >> [ 4549.992093] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt] >> [ 4549.992097] Call Trace: >> [ 4549.992098] >> [ 4549.992100] __schedule+0x373/0x1580 >> [ 4549.992103] ? sysvec_apic_timer_interrupt+0xa/0x90 >> [ 4549.992106] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 >> [ 4549.992109] schedule+0x5b/0xe0 >> [ 4549.992111] md_bitmap_startwrite+0x177/0x1e0 >> [ 4549.992114] ? finish_wait+0x90/0x90 >> [ 4549.992117] add_stripe_bio+0x449/0x770 [raid456] >> [ 4549.992122] raid5_make_request+0x1cf/0xbd0 [raid456] >> [ 4549.992125] ? kmem_cache_alloc+0x261/0x3b0 >> [ 4549.992129] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.992131] ? linear_map+0x44/0x90 [dm_mod] >> [ 4549.992135] ? finish_wait+0x90/0x90 >> [ 4549.992137] ? __blk_queue_split+0x516/0x580 >> [ 4549.992139] md_handle_request+0x11f/0x1b0 >> [ 4549.992142] md_submit_bio+0x6e/0xb0 >> [ 4549.992144] __submit_bio+0x18c/0x220 >> [ 4549.992146] ? srso_alias_return_thunk+0x5/0x7f >> [ 4549.992148] ? crypt_page_alloc+0x46/0x60 [dm_crypt] >> [ 4549.992150] submit_bio_noacct+0xbe/0x2d0 >> [ 4549.992153] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt] >> [ 4549.992157] process_one_work+0x1d3/0x360 >> [ 4549.992160] worker_thread+0x4d/0x3b0 >> [ 4549.992162] ? process_one_work+0x360/0x360 >> [ 4549.992163] kthread+0x115/0x140 >> [ 4549.992166] ? set_kthread_struct+0x50/0x50 >> [ 4549.992168] ret_from_fork+0x1f/0x30 >> [ 4549.992172] >> [ 4549.992174] INFO: task kworker/u64:5:4952 blocked for more than = 123 seconds. >> [ 4550.000095] Not tainted 5.15.164 #1-NixOS >> [ 4550.005187] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. >> [ 4550.013985] task:kworker/u64:5 state:D stack: 0 pid: 4952 = ppid: 2 flags:0x00004000 >> [ 4550.013988] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt] >> [ 4550.013992] Call Trace: >> [ 4550.013993] >> [ 4550.013995] __schedule+0x373/0x1580 >> [ 4550.013997] ? sysvec_apic_timer_interrupt+0xa/0x90 >> [ 4550.014000] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 >> [ 4550.014003] schedule+0x5b/0xe0 >> [ 4550.014005] md_bitmap_startwrite+0x177/0x1e0 >> [ 4550.014008] ? finish_wait+0x90/0x90 >> [ 4550.014010] add_stripe_bio+0x449/0x770 [raid456] >> [ 4550.014015] raid5_make_request+0x1cf/0xbd0 [raid456] >> [ 4550.014018] ? __bio_clone_fast+0xa5/0xe0 >> [ 4550.014022] ? finish_wait+0x90/0x90 >> [ 4550.014024] ? __blk_queue_split+0x2d0/0x580 >> [ 4550.014027] md_handle_request+0x11f/0x1b0 >> [ 4550.014030] md_submit_bio+0x6e/0xb0 >> [ 4550.014032] __submit_bio+0x18c/0x220 >> [ 4550.014034] ? srso_alias_return_thunk+0x5/0x7f >> [ 4550.014036] ? crypt_page_alloc+0x46/0x60 [dm_crypt] >> [ 4550.014038] submit_bio_noacct+0xbe/0x2d0 >> [ 4550.014041] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt] >> [ 4550.014044] process_one_work+0x1d3/0x360 >> [ 4550.014047] worker_thread+0x4d/0x3b0 >> [ 4550.014049] ? process_one_work+0x360/0x360 >> [ 4550.014050] kthread+0x115/0x140 >> [ 4550.014052] ? set_kthread_struct+0x50/0x50 >> [ 4550.014054] ret_from_fork+0x1f/0x30 >> [ 4550.014058] >> [ 4550.014059] INFO: task kworker/u64:8:4954 blocked for more than = 123 seconds. >> [ 4550.021982] Not tainted 5.15.164 #1-NixOS >> [ 4550.027078] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. >> [ 4550.035881] task:kworker/u64:8 state:D stack: 0 pid: 4954 = ppid: 2 flags:0x00004000 >> [ 4550.035884] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt] >> [ 4550.035887] Call Trace: >> [ 4550.035888] >> [ 4550.035890] __schedule+0x373/0x1580 >> [ 4550.035893] ? sysvec_apic_timer_interrupt+0xa/0x90 >> [ 4550.035896] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 >> [ 4550.035899] schedule+0x5b/0xe0 >> [ 4550.035901] md_bitmap_startwrite+0x177/0x1e0 >> [ 4550.035904] ? finish_wait+0x90/0x90 >> [ 4550.035907] add_stripe_bio+0x449/0x770 [raid456] >> [ 4550.035912] raid5_make_request+0x1cf/0xbd0 [raid456] >> [ 4550.035916] ? __bio_clone_fast+0xa5/0xe0 >> [ 4550.035919] ? finish_wait+0x90/0x90 >> [ 4550.035921] ? __blk_queue_split+0x2d0/0x580 >> [ 4550.035924] md_handle_request+0x11f/0x1b0 >> [ 4550.035927] md_submit_bio+0x6e/0xb0 >> [ 4550.035929] __submit_bio+0x18c/0x220 >> [ 4550.035931] ? srso_alias_return_thunk+0x5/0x7f >> [ 4550.035933] ? crypt_page_alloc+0x46/0x60 [dm_crypt] >> [ 4550.035936] submit_bio_noacct+0xbe/0x2d0 >> [ 4550.035939] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt] >> [ 4550.035942] process_one_work+0x1d3/0x360 >> [ 4550.035946] worker_thread+0x4d/0x3b0 >> [ 4550.035948] ? process_one_work+0x360/0x360 >> [ 4550.035949] kthread+0x115/0x140 >> [ 4550.035951] ? set_kthread_struct+0x50/0x50 >> [ 4550.035953] ret_from_fork+0x1f/0x30 >> [ 4550.035957] >> [ 4550.035958] INFO: task kworker/u64:9:4955 blocked for more than = 123 seconds. >> [ 4550.043881] Not tainted 5.15.164 #1-NixOS >> [ 4550.048979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. >> [ 4550.057786] task:kworker/u64:9 state:D stack: 0 pid: 4955 = ppid: 2 flags:0x00004000 >> [ 4550.057790] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt] >> [ 4550.057794] Call Trace: >> [ 4550.057796] >> [ 4550.057798] __schedule+0x373/0x1580 >> [ 4550.057801] ? sysvec_apic_timer_interrupt+0xa/0x90 >> [ 4550.057803] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 >> [ 4550.057806] schedule+0x5b/0xe0 >> [ 4550.057808] md_bitmap_startwrite+0x177/0x1e0 >> [ 4550.057810] ? finish_wait+0x90/0x90 >> [ 4550.057813] add_stripe_bio+0x449/0x770 [raid456] >> [ 4550.057818] raid5_make_request+0x1cf/0xbd0 [raid456] >> [ 4550.057821] ? __bio_clone_fast+0xa5/0xe0 >> [ 4550.057824] ? finish_wait+0x90/0x90 >> [ 4550.057826] ? __blk_queue_split+0x2d0/0x580 >> [ 4550.057828] md_handle_request+0x11f/0x1b0 >> [ 4550.057831] md_submit_bio+0x6e/0xb0 >> [ 4550.057834] __submit_bio+0x18c/0x220 >> [ 4550.057835] ? srso_alias_return_thunk+0x5/0x7f >> [ 4550.057837] ? crypt_page_alloc+0x46/0x60 [dm_crypt] >> [ 4550.057839] submit_bio_noacct+0xbe/0x2d0 >> [ 4550.057842] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt] >> [ 4550.057846] process_one_work+0x1d3/0x360 >> [ 4550.057848] worker_thread+0x4d/0x3b0 >> [ 4550.057850] ? process_one_work+0x360/0x360 >> [ 4550.057852] kthread+0x115/0x140 >> [ 4550.057854] ? set_kthread_struct+0x50/0x50 >> [ 4550.057856] ret_from_fork+0x1f/0x30 >> [ 4550.057860] >=20 >=20 >>> On 7. Aug 2024, at 08:46, Christian Theune = wrote: >>>=20 >>> I tried updating to 5.15.164, but have to struggle against our = config management as some options have been shifted that I need to = filter out: NFSD_V3 and NFSD2_ACL are now fixed and cause config errors = if set - I guess that=E2=80=99s a valid thing to happen within an LTS = release. I=E2=80=99ll try again on Friday >>>=20 >>>> On 7. Aug 2024, at 07:31, Christian Theune = wrote: >>>>=20 >>>> Sure, >>>>=20 >>>> would you prefer me testing on 5.15.x or something else? >>>>=20 >>>>> On 7. Aug 2024, at 04:55, Yu Kuai wrote: >>>>>=20 >>>>> Hi, >>>>>=20 >>>>> =E5=9C=A8 2024/08/06 22:10, Christian Theune =E5=86=99=E9=81=93: >>>>>> we are seeing an issue that can be triggered with relative ease = on a server that has been working fine for a few weeks. The regular = workload is a backup utility that copies off data from virtual disk = images in 4MiB (compressed) chunks from Ceph onto a local NVME-based = RAID-6 array that is encrypted using LUKS. >>>>>> Today I started a larger rsync job from another server (that has = a couple of million files with around 200-300 gib in total) to migrate = data and we=E2=80=99ve seen the server suddenly lock up twice. Any IO = that interacts with the mountpoint (/srv/backy) will hang indefinitely. = A reset is required to get out of this as the machine will hang trying = to unmount the affected filesystem. No other messages than the hung = tasks are being presented - I have no indicator for hardware faults at = the moment. >>>>>> I=E2=80=99m messaging both dm-devel and linux-raid as I=E2=80=99m = suspecting either one or both (or an interaction) might be the cause. >>>>>> Kernel: >>>>>> Linux version 5.15.138 (nixbld@localhost) (gcc (GCC) 12.2.0, GNU = ld (GNU Binutils) 2.40) #1-NixOS SMP Wed Nov 8 16:26:52 UTC 2023 >>>>>=20 >>>>> Since you can trigger this easily, I'll suggest you to try the = latest >>>>> kernel release first. >>>>>=20 >>>>> Thanks, >>>>> Kuai >>>>>=20 >>>>>> See the kernel config attached. >>>>=20 >>>>=20 >>>> Liebe Gr=C3=BC=C3=9Fe, >>>> Christian Theune >>>>=20 >>>> --=20 >>>> Christian Theune =C2=B7 ct@flyingcircus.io =C2=B7 +49 345 219401 0 >>>> Flying Circus Internet Operations GmbH =C2=B7 = https://flyingcircus.io >>>> Leipziger Str. 70/71 =C2=B7 06108 Halle (Saale) =C2=B7 Deutschland >>>> HR Stendal HRB 21169 =C2=B7 Gesch=C3=A4ftsf=C3=BChrer: Christian = Theune, Christian Zagrodnick >>>>=20 >>>>=20 >>>=20 >>> Liebe Gr=C3=BC=C3=9Fe, >>> Christian Theune >>>=20 >>> --=20 >>> Christian Theune =C2=B7 ct@flyingcircus.io =C2=B7 +49 345 219401 0 >>> Flying Circus Internet Operations GmbH =C2=B7 = https://flyingcircus.io >>> Leipziger Str. 70/71 =C2=B7 06108 Halle (Saale) =C2=B7 Deutschland >>> HR Stendal HRB 21169 =C2=B7 Gesch=C3=A4ftsf=C3=BChrer: Christian = Theune, Christian Zagrodnick >>>=20 >=20 >> Liebe Gr=C3=BC=C3=9Fe, >> Christian Theune >=20 >> --=20 >> Christian Theune =C2=B7 ct@flyingcircus.io =C2=B7 +49 345 219401 0 >> Flying Circus Internet Operations GmbH =C2=B7 https://flyingcircus.io >> Leipziger Str. 70/71 =C2=B7 06108 Halle (Saale) =C2=B7 Deutschland >> HR Stendal HRB 21169 =C2=B7 Gesch=C3=A4ftsf=C3=BChrer: Christian = Theune, Christian Zagrodnick >=20 >=20 Liebe Gr=C3=BC=C3=9Fe, Christian Theune --=20 Christian Theune =C2=B7 ct@flyingcircus.io =C2=B7 +49 345 219401 0 Flying Circus Internet Operations GmbH =C2=B7 https://flyingcircus.io Leipziger Str. 70/71 =C2=B7 06108 Halle (Saale) =C2=B7 Deutschland HR Stendal HRB 21169 =C2=B7 Gesch=C3=A4ftsf=C3=BChrer: Christian Theune, = Christian Zagrodnick