From: Bagas Sanjaya <bagasdotme@gmail.com>
To: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>,
"Linux Stable" <stable@vger.kernel.org>
Cc: Linux Regressions <regressions@lists.linux.dev>,
Alasdair Kergon <agk@redhat.com>,
Mike Snitzer <snitzer@kernel.org>,
Mikulas Patocka <mpatocka@redhat.com>,
Linux Devicemapper <dm-devel@redhat.com>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5
Date: Sat, 21 Oct 2023 14:48:32 +0700
Message-ID: <ZTOCUJdgDDBX-ecp@debian.me>
In-Reply-To: <ZTNH0qtmint/zLJZ@mail-itl>
On Sat, Oct 21, 2023 at 05:38:58AM +0200, Marek Marczykowski-Górecki wrote:
> Hi,
>
> Since updating from 6.4.13 to 6.5.5, I occasionally hit a storage
> subsystem freeze - any I/O ends up frozen. I'm not sure what exactly
> triggers the issue, but it often happens when doing some LVM operations
> (lvremove, lvrename etc) on a dm-thin volume together with a bulk data
> copy to/from another LVM thin volume with an ext4 fs.
>
> The storage stack I use is:
> nvme -> dm-crypt (LUKS) -> dm-thin (LVM thin pool) -> ext4
>
> And this whole thing is running in a (PV) dom0 under Xen, on Qubes OS 4.2 to be
> specific.
>
> I can reproduce the issue on at least 3 different machines. I also tried
> 6.5.6 and the issue is still there. I haven't checked newer
> versions, but I briefly reviewed the git log and haven't found anything
> suggesting a fix for a similar issue.
>
> I managed to bisect it down to this commit:
>
> commit 5054e778fcd9cd29ddaa8109077cd235527e4f94
> Author: Mikulas Patocka <mpatocka@redhat.com>
> Date: Mon May 1 09:19:17 2023 -0400
>
>     dm crypt: allocate compound pages if possible
>
>     It was reported that allocating pages for the write buffer in dm-crypt
>     causes measurable overhead [1].
>
>     Change dm-crypt to allocate compound pages if they are available. If
>     not, fall back to the mempool.
>
>     [1] https://listman.redhat.com/archives/dm-devel/2023-February/053284.html
>
>     Suggested-by: Matthew Wilcox <willy@infradead.org>
>     Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
>     Signed-off-by: Mike Snitzer <snitzer@kernel.org>
>
> TBH, I'm not sure if the bug is in this commit, or maybe in some
> functions it uses (I don't see dm-crypt functions directly involved in
> the stack traces I collected). But reverting this commit on top of 6.5.6
> seems to fix the issue.
>
> I also tried CONFIG_PROVE_LOCKING, but it didn't show any issues.
>
> I managed to collect the "blocked tasks" dump below via sysrq. A few more can
> be found at https://github.com/QubesOS/qubes-issues/issues/8575
>
> [ 4246.558313] sysrq: Show Blocked State
> [ 4246.558388] task:journal-offline state:D stack:0 pid:8098 ppid:1 flags:0x00000002
> [ 4246.558407] Call Trace:
> [ 4246.558414] <TASK>
> [ 4246.558422] __schedule+0x23d/0x670
> [ 4246.558440] schedule+0x5e/0xd0
> [ 4246.558450] io_schedule+0x46/0x70
> [ 4246.558461] folio_wait_bit_common+0x13d/0x350
> [ 4246.558475] ? __pfx_wake_page_function+0x10/0x10
> [ 4246.558488] folio_wait_writeback+0x2c/0x90
> [ 4246.558498] mpage_prepare_extent_to_map+0x15c/0x4d0
> [ 4246.558512] ext4_do_writepages+0x25f/0x770
> [ 4246.558523] ext4_writepages+0xad/0x180
> [ 4246.558533] do_writepages+0xcf/0x1e0
> [ 4246.558543] ? __seccomp_filter+0x32a/0x4f0
> [ 4246.558554] filemap_fdatawrite_wbc+0x63/0x90
> [ 4246.558567] __filemap_fdatawrite_range+0x5c/0x80
> [ 4246.558578] file_write_and_wait_range+0x4a/0xb0
> [ 4246.558588] ext4_sync_file+0x88/0x380
> [ 4246.558598] __x64_sys_fsync+0x3b/0x70
> [ 4246.558609] do_syscall_64+0x5c/0x90
> [ 4246.558621] ? exit_to_user_mode_prepare+0xb2/0xd0
> [ 4246.558632] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [ 4246.558644] RIP: 0033:0x7710cf124d0a
> [ 4246.558654] RSP: 002b:00007710ccdfda40 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
> [ 4246.558668] RAX: ffffffffffffffda RBX: 000064bb92f67e60 RCX: 00007710cf124d0a
> [ 4246.558679] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000028
> [ 4246.558691] RBP: 000064bb92f72670 R08: 0000000000000000 R09: 00007710ccdfe6c0
> [ 4246.558702] R10: 00007710cf0adfee R11: 0000000000000293 R12: 000064bb92505940
> [ 4246.558713] R13: 0000000000000002 R14: 00007ffc05649500 R15: 00007710cc5fe000
> [ 4246.558728] </TASK>
> [ 4246.558836] task:lvm state:D stack:0 pid:7835 ppid:5665 flags:0x00004006
> [ 4246.558852] Call Trace:
> [ 4246.558857] <TASK>
> [ 4246.558863] __schedule+0x23d/0x670
> [ 4246.558874] schedule+0x5e/0xd0
> [ 4246.558884] io_schedule+0x46/0x70
> [ 4246.558894] dm_wait_for_bios_completion+0xfc/0x110
> [ 4246.558909] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 4246.558922] __dm_suspend+0x7e/0x1b0
> [ 4246.558932] dm_internal_suspend_noflush+0x5c/0x80
> [ 4246.558946] pool_presuspend+0xcc/0x130 [dm_thin_pool]
> [ 4246.558968] dm_table_presuspend_targets+0x3f/0x60
> [ 4246.558980] __dm_suspend+0x41/0x1b0
> [ 4246.558991] dm_suspend+0xc0/0xe0
> [ 4246.559001] dev_suspend+0xa5/0xd0
> [ 4246.559011] ctl_ioctl+0x26e/0x350
> [ 4246.559020] ? __pfx_dev_suspend+0x10/0x10
> [ 4246.559032] dm_ctl_ioctl+0xe/0x20
> [ 4246.559041] __x64_sys_ioctl+0x94/0xd0
> [ 4246.559052] do_syscall_64+0x5c/0x90
> [ 4246.559062] ? do_syscall_64+0x6b/0x90
> [ 4246.559072] ? do_syscall_64+0x6b/0x90
> [ 4246.559081] ? xen_pv_evtchn_do_upcall+0x54/0xb0
> [ 4246.559093] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [ 4246.559104] RIP: 0033:0x7f1cb77cfe0f
> [ 4246.559112] RSP: 002b:00007fff870f2560 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 4246.559141] RAX: ffffffffffffffda RBX: 00005b8d13c16580 RCX: 00007f1cb77cfe0f
> [ 4246.559152] RDX: 00005b8d144a2180 RSI: 00000000c138fd06 RDI: 0000000000000003
> [ 4246.559164] RBP: 00005b8d144a2180 R08: 00005b8d132b1190 R09: 00007fff870f2420
> [ 4246.559175] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
> [ 4246.559186] R13: 00005b8d132aacf0 R14: 00005b8d1324414d R15: 00005b8d144a21b0
> [ 4246.559199] </TASK>
> [ 4246.559207] task:kworker/u8:3 state:D stack:0 pid:8033 ppid:2 flags:0x00004000
> [ 4246.559222] Workqueue: writeback wb_workfn (flush-253:4)
> [ 4246.559238] Call Trace:
> [ 4246.559244] <TASK>
> [ 4246.559249] __schedule+0x23d/0x670
> [ 4246.559260] schedule+0x5e/0xd0
> [ 4246.559270] io_schedule+0x46/0x70
> [ 4246.559280] folio_wait_bit_common+0x13d/0x350
> [ 4246.559290] ? __pfx_wake_page_function+0x10/0x10
> [ 4246.559302] mpage_prepare_extent_to_map+0x309/0x4d0
> [ 4246.559314] ext4_do_writepages+0x25f/0x770
> [ 4246.559324] ext4_writepages+0xad/0x180
> [ 4246.559334] do_writepages+0xcf/0x1e0
> [ 4246.559344] ? find_busiest_group+0x42/0x1a0
> [ 4246.559354] __writeback_single_inode+0x3d/0x280
> [ 4246.559368] writeback_sb_inodes+0x1ed/0x4a0
> [ 4246.559381] __writeback_inodes_wb+0x4c/0xf0
> [ 4246.559393] wb_writeback+0x298/0x310
> [ 4246.559403] wb_do_writeback+0x230/0x2b0
> [ 4246.559414] wb_workfn+0x5f/0x260
> [ 4246.559424] ? _raw_spin_unlock+0xe/0x30
> [ 4246.559434] ? finish_task_switch.isra.0+0x95/0x2b0
> [ 4246.559447] ? __schedule+0x245/0x670
> [ 4246.559457] process_one_work+0x1df/0x3e0
> [ 4246.559466] worker_thread+0x51/0x390
> [ 4246.559475] ? __pfx_worker_thread+0x10/0x10
> [ 4246.559484] kthread+0xe5/0x120
> [ 4246.559495] ? __pfx_kthread+0x10/0x10
> [ 4246.559504] ret_from_fork+0x31/0x50
> [ 4246.559514] ? __pfx_kthread+0x10/0x10
> [ 4246.559523] ret_from_fork_asm+0x1b/0x30
> [ 4246.559536] </TASK>
>
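The "sysrq: Show Blocked State" header in the quoted dump is what the kernel prints for the SysRq 'w' command, which lists tasks in uninterruptible (D) state. On a kernel built with CONFIG_MAGIC_SYSRQ, root can request the same dump without a keyboard by writing `w` to /proc/sysrq-trigger (the shell equivalent is `echo w > /proc/sysrq-trigger`); the output goes to the kernel log. A minimal C sketch follows; `sysrq_trigger()` and `dump_blocked_tasks()` are hypothetical helper names, and the path is a parameter so the helper can also be exercised against an ordinary file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Kernel interface: writing a SysRq command key here runs that action. */
#define SYSRQ_TRIGGER_PATH "/proc/sysrq-trigger"

/* Hypothetical helper: write a single SysRq command key to `path`.
 * Returns 0 on success, -1 on error (a message is printed via perror). */
static int sysrq_trigger(const char *path, char key)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    int rc = 0;
    if (write(fd, &key, 1) != 1) {
        perror("write");
        rc = -1;
    }
    if (close(fd) != 0) {
        perror("close");
        rc = -1;
    }
    return rc;
}

/* Hypothetical helper: ask the kernel to dump blocked (D state) tasks,
 * producing a "Show Blocked State" report like the one quoted above.
 * Needs root when used against the real trigger file. */
static int dump_blocked_tasks(void)
{
    return sysrq_trigger(SYSRQ_TRIGGER_PATH, 'w');
}
```

Against /proc/sysrq-trigger the call needs root privileges, and the resulting task list can then be read with dmesg.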
Thanks for the regression report. I'm adding it to regzbot:
#regzbot ^introduced: 5054e778fcd9cd
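The commit flagged above (5054e778fcd9cd, "dm crypt: allocate compound pages if possible") summarizes its change as: allocate compound pages if they are available, otherwise fall back to the mempool. As an illustration only, the userspace C sketch below shows that kind of fallback; it is not the dm-crypt implementation. malloc() stands in for both the page allocator and the mempool, and the descending-order retry loop, `SKETCH_MAX_ORDER`, `simulated_free_order`, `try_alloc_order()`, `pool_alloc_page()`, `alloc_buffer()` and `free_buffer()` are all hypothetical assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */
#define SKETCH_MAX_ORDER 3u    /* hypothetical: largest order attempted */

/* Hypothetical knob: the largest order the simulated allocator can satisfy. */
static unsigned simulated_free_order = 1;

/* Hypothetical stand-in for a high-order (compound page) allocation that may fail. */
static void *try_alloc_order(unsigned order)
{
    if (order > simulated_free_order)
        return NULL;
    return malloc((size_t)SKETCH_PAGE_SIZE << order);
}

/* Hypothetical stand-in for the single-page mempool fallback. */
static void *pool_alloc_page(void)
{
    return malloc(SKETCH_PAGE_SIZE);
}

struct chunk {
    void *mem;
    size_t len;     /* bytes of the buffer this chunk covers */
    unsigned order; /* 0 for pool pages */
    int from_pool;
};

static void free_buffer(struct chunk *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(c[i].mem);
}

/* Cover `size` bytes with chunks, preferring high orders.
 * Returns the number of chunks used; 0 means failure (or an empty request),
 * in which case nothing is left allocated. */
static size_t alloc_buffer(size_t size, struct chunk *out, size_t max_chunks)
{
    size_t n = 0;
    unsigned order = SKETCH_MAX_ORDER; /* only ever decreases */

    assert(out != NULL || max_chunks == 0);
    while (size > 0) {
        size_t pages_left = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
        void *mem = NULL;
        int from_pool = 0;

        if (n == max_chunks)
            goto fail;
        /* Never ask for more pages than are still needed. */
        while (order > 0 && ((size_t)1 << order) > pages_left)
            order--;
        /* Try high orders first, stepping down on each failure. */
        while (order > 0) {
            mem = try_alloc_order(order);
            if (mem != NULL)
                break;
            order--;
        }
        /* No compound allocation succeeded: fall back to the pool (order 0). */
        if (mem == NULL) {
            mem = pool_alloc_page();
            if (mem == NULL)
                goto fail;
            from_pool = 1;
        }

        size_t len = (size_t)SKETCH_PAGE_SIZE << order;
        if (len > size)
            len = size;
        out[n++] = (struct chunk){ mem, len, order, from_pool };
        size -= len;
    }
    return n;

fail:
    free_buffer(out, n);
    return 0;
}
```

For example, with `simulated_free_order` set to 1, a 10-page request is covered by five order-1 chunks; with it set to 0, all ten pages come from the pool fallback. The sketch never raises `order` again once a higher order has failed, so one failure pushes the rest of the buffer to smaller chunks; that is a choice made for this sketch, not a claim about the kernel code.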
--
An old man doll... just what I always wanted! - Clara