From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Jan Kara <jack@suse.cz>, Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Michal Hocko <mhocko@suse.com>,
stable@vger.kernel.org, regressions@lists.linux.dev,
Alasdair Kergon <agk@redhat.com>,
Mike Snitzer <snitzer@kernel.org>,
dm-devel@lists.linux.dev, linux-mm@kvack.org
Subject: Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5
Date: Tue, 31 Oct 2023 04:48:44 +0100
Message-ID: <ZUB5HFeK3eHeI8UH@mail-itl>
In-Reply-To: <98aefaa9-1ac-a0e4-fb9a-89ded456750@redhat.com>
On Mon, Oct 30, 2023 at 06:50:35PM +0100, Mikulas Patocka wrote:
> On Mon, 30 Oct 2023, Marek Marczykowski-Górecki wrote:
> > Then retried with order=PAGE_ALLOC_COSTLY_ORDER and
> > PAGE_ALLOC_COSTLY_ORDER back at 3, and also got similar crash.
>
> So, does it mean that even allocating with order=PAGE_ALLOC_COSTLY_ORDER
> isn't safe?
That seems to be another bug, see below.
> Try enabling CONFIG_DEBUG_VM (it also needs CONFIG_DEBUG_KERNEL) and try
> to provoke a similar crash. Let's see if it crashes on one of the
> VM_BUG_ON statements.
This was a very interesting idea. With this enabled, I get a crash like
the one below immediately after login. That makes sense, as this is when
pulseaudio starts and opens /dev/snd/*. I then tried with the dm-crypt
commit reverted and still got the crash! After blacklisting snd_pcm there
is no BUG splat, but the storage freeze still happens on vanilla 6.5.6.
The snd_pcm BUG splat:
[ 51.082877] page:00000000d8fdb7f1 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11b7d9
[ 51.082919] flags: 0x200000000000000(node=0|zone=2)
[ 51.082924] page_type: 0xffffffff()
[ 51.082929] raw: 0200000000000000 dead000000000100 dead000000000122 0000000000000000
[ 51.082934] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 51.082938] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
[ 51.082969] ------------[ cut here ]------------
[ 51.082972] kernel BUG at include/linux/mm.h:1406!
[ 51.082980] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 51.082986] CPU: 5 PID: 3893 Comm: alsa-sink-Gener Tainted: G W 6.5.6-dirty #359
[ 51.082992] Hardware name: Star Labs StarBook/StarBook, BIOS 8.97 10/03/2023
[ 51.082997] RIP: e030:snd_pcm_mmap_data_fault+0x11d/0x140 [snd_pcm]
[ 51.083015] Code: 48 2b 05 8e 7b 67 c2 48 01 f0 48 c1 e8 0c 48 c1 e0 06 48 03 05 6c 7b 67 c2 e9 4c ff ff ff 48 c7 c6 d8 71 1c c0 e8 93 1e 0e c1 <0f> 0b 48 83 ef 01 e9 4d ff ff ff 48 8b 05 51 47 89 c2 eb c9 66 66
[ 51.083023] RSP: e02b:ffffc90041be7e00 EFLAGS: 00010246
[ 51.083028] RAX: 000000000000005c RBX: ffffc90041be7e28 RCX: 0000000000000000
[ 51.083033] RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 51.083038] RBP: ffff888102e75f18 R08: 00000000ffffdfff R09: 0000000000000001
[ 51.083042] R10: 00000000ffffdfff R11: ffffffff82a5ddc0 R12: ffff888102e75f18
[ 51.083047] R13: 0000000000000255 R14: ffff888100955e80 R15: ffff888102e75f18
[ 51.083056] FS: 00007f51d354f6c0(0000) GS:ffff888189740000(0000) knlGS:0000000000000000
[ 51.083061] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.083065] CR2: 00007f51d36f6000 CR3: 000000011b53e000 CR4: 0000000000050660
[ 51.083072] Call Trace:
[ 51.083076] <TASK>
[ 51.083078] ? die+0x31/0x80
[ 51.083085] ? do_trap+0xd5/0x100
[ 51.083089] ? snd_pcm_mmap_data_fault+0x11d/0x140 [snd_pcm]
[ 51.083103] ? do_error_trap+0x65/0x90
[ 51.083107] ? snd_pcm_mmap_data_fault+0x11d/0x140 [snd_pcm]
[ 51.083120] ? exc_invalid_op+0x50/0x70
[ 51.083127] ? snd_pcm_mmap_data_fault+0x11d/0x140 [snd_pcm]
[ 51.083140] ? asm_exc_invalid_op+0x1a/0x20
[ 51.083146] ? snd_pcm_mmap_data_fault+0x11d/0x140 [snd_pcm]
[ 51.083159] __do_fault+0x29/0x110
[ 51.083165] __handle_mm_fault+0x5fb/0xc40
[ 51.083170] handle_mm_fault+0x91/0x1e0
[ 51.083173] do_user_addr_fault+0x216/0x5d0
[ 51.083179] ? check_preemption_disabled+0x31/0xf0
[ 51.083185] exc_page_fault+0x71/0x160
[ 51.083189] asm_exc_page_fault+0x26/0x30
[ 51.083195] RIP: 0033:0x7f51e56793ca
[ 51.083198] Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90
[ 51.083207] RSP: 002b:00007f51d354c528 EFLAGS: 00010202
[ 51.083211] RAX: 0000000000000000 RBX: 00007f51d354ec80 RCX: 00000000000034e0
[ 51.083216] RDX: 00007f51d36f5000 RSI: 0000000000000000 RDI: 00007f51d36f6000
[ 51.083220] RBP: 000055fec98b2f60 R08: 00007f51cc0031c0 R09: 0000000000000000
[ 51.083224] R10: 0000000000000000 R11: 0000000000000101 R12: 000055fec98b2f60
[ 51.083228] R13: 00007f51d354c630 R14: 0000000000000000 R15: 000055fec78ba680
[ 51.083233] </TASK>
[ 51.083235] Modules linked in: snd_hda_codec_hdmi snd_sof_pci_intel_tgl snd_sof_intel_hda_common snd_soc_hdac_hda soundwire_intel soundwire_generic_allocation snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_hda_codec_generic snd_sof_pci ledtrig_audio snd_sof snd_sof_utils snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core soundwire_bus snd_soc_core snd_compress snd_pcm_dmaengine ac97_bus snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device hid_multitouch snd_pcm i2c_i801 idma64 iwlwifi i2c_smbus i2c_designware_platform i2c_designware_core snd_timer snd soundcore efivarfs i2c_hid_acpi i2c_hid pinctrl_tigerlake pinctrl_intel xen_acpi_processor xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
[ 51.083293] ---[ end trace 0000000000000000 ]---
[ 51.083296] RIP: e030:snd_pcm_mmap_data_fault+0x11d/0x140 [snd_pcm]
[ 51.083310] Code: 48 2b 05 8e 7b 67 c2 48 01 f0 48 c1 e8 0c 48 c1 e0 06 48 03 05 6c 7b 67 c2 e9 4c ff ff ff 48 c7 c6 d8 71 1c c0 e8 93 1e 0e c1 <0f> 0b 48 83 ef 01 e9 4d ff ff ff 48 8b 05 51 47 89 c2 eb c9 66 66
[ 51.083318] RSP: e02b:ffffc90041be7e00 EFLAGS: 00010246
[ 51.083323] RAX: 000000000000005c RBX: ffffc90041be7e28 RCX: 0000000000000000
[ 51.083327] RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 51.083331] RBP: ffff888102e75f18 R08: 00000000ffffdfff R09: 0000000000000001
[ 51.083335] R10: 00000000ffffdfff R11: ffffffff82a5ddc0 R12: ffff888102e75f18
[ 51.083340] R13: 0000000000000255 R14: ffff888100955e80 R15: ffff888102e75f18
[ 51.083347] FS: 00007f51d354f6c0(0000) GS:ffff888189740000(0000) knlGS:0000000000000000
[ 51.083353] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.083356] CR2: 00007f51d36f6000 CR3: 000000011b53e000 CR4: 0000000000050660
Having discovered that, I'm redoing the recent tests with snd_pcm
blacklisted. I'll get back to debugging the snd_pcm issue separately.
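As a side note on the splat above: the condition printed in the "page dumped because" line relies on unsigned wrap-around, so it fires when the reference count is zero or slightly negative. The page dump shows refcount:0, which satisfies it. The following is only a userspace sketch of that arithmetic, emulating a 32-bit unsigned int with a mask; the function name is made up for illustration and this is not the kernel code itself.

```python
U32_MASK = 0xFFFFFFFF  # emulate wrap-around of a 32-bit C unsigned int


def ref_zero_or_close_to_overflow(refcount: int) -> bool:
    """Evaluate (unsigned int) refcount + 127u <= 127u for a signed refcount.

    Hypothetical helper name; it only mirrors the expression printed
    in the VM_BUG_ON_FOLIO message.
    """
    as_unsigned = refcount & U32_MASK          # the (unsigned int) cast
    wrapped_sum = (as_unsigned + 127) & U32_MASK  # the + 127u, with overflow
    return wrapped_sum <= 127


if __name__ == "__main__":
    # The check is true exactly for refcounts in the range -127..0.
    for rc in (0, 1, -1, -127, -128, 2**31 - 1):
        print(f"refcount={rc:>11}  triggers={ref_zero_or_close_to_overflow(rc)}")
```

Under this reading, taking a new reference on a page whose count is already 0 is what trips the check in the fault handler.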
Plain 6.5.6 (so order = MAX_ORDER - 1 and PAGE_ALLOC_COSTLY_ORDER=3), in the frozen state:
[ 143.195348] sysrq: Show Blocked State
[ 143.195471] task:lvm state:D stack:13312 pid:4882 ppid:2025 flags:0x00004002
[ 143.195504] Call Trace:
[ 143.195514] <TASK>
[ 143.195526] __schedule+0x30e/0x8b0
[ 143.195550] ? __pfx_dev_suspend+0x10/0x10
[ 143.195569] schedule+0x59/0xb0
[ 143.195582] io_schedule+0x41/0x70
[ 143.195595] dm_wait_for_completion+0x19d/0x1b0
[ 143.195671] ? __pfx_autoremove_wake_function+0x10/0x10
[ 143.195693] __dm_suspend+0x79/0x190
[ 143.195707] ? __pfx_dev_suspend+0x10/0x10
[ 143.195723] dm_internal_suspend_noflush+0x57/0x80
[ 143.195740] pool_presuspend+0xc7/0x130
[ 143.195759] dm_table_presuspend_targets+0x38/0x60
[ 143.195774] __dm_suspend+0x34/0x190
[ 143.195788] ? preempt_count_add+0x69/0xa0
[ 143.195805] ? __pfx_dev_suspend+0x10/0x10
[ 143.195819] dm_suspend+0xbb/0xe0
[ 143.195835] ? preempt_count_add+0x46/0xa0
[ 143.195851] dev_suspend+0x18e/0x2d0
[ 143.195867] ? __pfx_dev_suspend+0x10/0x10
[ 143.195882] ctl_ioctl+0x329/0x640
[ 143.195901] dm_ctl_ioctl+0x9/0x10
[ 143.195917] __x64_sys_ioctl+0x8f/0xd0
[ 143.195938] do_syscall_64+0x3c/0x90
[ 143.195954] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 143.195975] RIP: 0033:0x7f2e0ab1fe0f
[ 143.195989] RSP: 002b:00007ffd59a16e60 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 143.196011] RAX: ffffffffffffffda RBX: 000056289d130840 RCX: 00007f2e0ab1fe0f
[ 143.196029] RDX: 000056289d120b80 RSI: 00000000c138fd06 RDI: 0000000000000003
[ 143.196046] RBP: 000056289d120b80 R08: 000056289a7eb190 R09: 00007ffd59a16d20
[ 143.196063] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
[ 143.196080] R13: 000056289a7e4cf0 R14: 000056289a77e14d R15: 000056289d120bb0
[ 143.196098] </TASK>
[ 143.196106] task:blkdiscard state:D stack:13672 pid:4884 ppid:2025 flags:0x00000002
[ 143.196130] Call Trace:
[ 143.196139] <TASK>
[ 143.196147] __schedule+0x30e/0x8b0
[ 143.196162] schedule+0x59/0xb0
[ 143.196175] schedule_timeout+0x14c/0x160
[ 143.196193] io_schedule_timeout+0x4b/0x70
[ 143.196207] wait_for_completion_io+0x81/0x130
[ 143.196226] submit_bio_wait+0x5c/0x90
[ 143.196241] blkdev_issue_discard+0x94/0xe0
[ 143.196260] blkdev_common_ioctl+0x79e/0x9c0
[ 143.196279] blkdev_ioctl+0xc7/0x270
[ 143.196293] __x64_sys_ioctl+0x8f/0xd0
[ 143.196310] do_syscall_64+0x3c/0x90
[ 143.196324] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 143.196343] RIP: 0033:0x7fa6cebcee0f
[ 143.196354] RSP: 002b:00007ffe6700fa80 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 143.196374] RAX: ffffffffffffffda RBX: 0000000280000000 RCX: 00007fa6cebcee0f
[ 143.196391] RDX: 00007ffe6700fb50 RSI: 0000000000001277 RDI: 0000000000000003
[ 143.196408] RBP: 0000000000000003 R08: 0000000000000071 R09: 0000000000000004
[ 143.196424] R10: 00007ffe67064170 R11: 0000000000000246 R12: 0000000040000000
[ 143.196441] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 143.196460] </TASK>
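For readers following the two traces above: both tasks are blocked inside ioctl() (ORIG_RAX 0x10), and the RSI register holds the ioctl request number. A minimal sketch of decoding those numbers follows, assuming the generic Linux _IOC bit layout used on x86 (8 bits nr, 8 bits type, 14 bits size, 2 bits direction). The helper name is hypothetical.

```python
def decode_ioctl(cmd: int) -> dict:
    """Split an ioctl request number into its _IOC fields.

    Assumes the asm-generic layout: bits 0-7 nr, 8-15 type,
    16-29 size, 30-31 direction. Some architectures differ.
    """
    return {
        "nr": cmd & 0xFF,
        "type": (cmd >> 8) & 0xFF,
        "size": (cmd >> 16) & 0x3FFF,
        "dir": (cmd >> 30) & 0x3,
    }


if __name__ == "__main__":
    # RSI values copied from the lvm and blkdiscard register dumps.
    for name, cmd in (("lvm", 0xC138FD06), ("blkdiscard", 0x1277)):
        f = decode_ioctl(cmd)
        print(f"{name}: type=0x{f['type']:02x} nr={f['nr']} "
              f"size={f['size']} dir={f['dir']}")
```

The lvm value decodes to type 0xfd, nr 6, which appears to be the device-mapper suspend command and agrees with the dev_suspend frame in its stack. The blkdiscard value decodes to type 0x12, nr 119 with no payload size, which matches BLKDISCARD.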
for f in $(grep -l crypt /proc/*/comm); do head $f ${f/comm/stack}; done
==> /proc/3761/comm <==
kworker/u12:7-kcryptd/252:0
==> /proc/3761/stack <==
[<0>] worker_thread+0xab/0x3b0
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x2c/0x50
[<0>] ret_from_fork_asm+0x1b/0x30
==> /proc/51/comm <==
cryptd
==> /proc/51/stack <==
[<0>] rescuer_thread+0x2d5/0x390
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x2c/0x50
[<0>] ret_from_fork_asm+0x1b/0x30
==> /proc/556/comm <==
kcryptd_io/252:
==> /proc/556/stack <==
[<0>] rescuer_thread+0x2d5/0x390
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x2c/0x50
[<0>] ret_from_fork_asm+0x1b/0x30
==> /proc/557/comm <==
kcryptd/252:0
==> /proc/557/stack <==
[<0>] rescuer_thread+0x2d5/0x390
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x2c/0x50
[<0>] ret_from_fork_asm+0x1b/0x30
==> /proc/558/comm <==
dmcrypt_write/252:0
==> /proc/558/stack <==
[<0>] dmcrypt_write+0x6a/0x140
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x2c/0x50
[<0>] ret_from_fork_asm+0x1b/0x30
==> /proc/717/comm <==
kworker/u12:6-kcryptd/252:0
==> /proc/717/stack <==
[<0>] worker_thread+0xab/0x3b0
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x2c/0x50
[<0>] ret_from_fork_asm+0x1b/0x30
Then I tried:
- PAGE_ALLOC_COSTLY_ORDER=4, order=4 - cannot reproduce,
- PAGE_ALLOC_COSTLY_ORDER=4, order=5 - cannot reproduce,
- PAGE_ALLOC_COSTLY_ORDER=4, order=6 - freezes rather quickly.
I've retried the PAGE_ALLOC_COSTLY_ORDER=4, order=5 case several times
and I can't reproduce the issue there. I'm confused...
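To make the three test cases above easier to compare, here is a small sketch that tabulates the allocation size for each order and whether it exceeds the costly threshold. It assumes 4 KiB base pages and assumes that "costly" means order > PAGE_ALLOC_COSTLY_ORDER; the outcomes are copied from the list above, and the function names are illustrative only.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def alloc_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of a 2**order-page contiguous allocation."""
    return page_size << order


def is_costly(order: int, costly_order: int) -> bool:
    """Assumed semantics: an allocation is 'costly' above the threshold."""
    return order > costly_order


if __name__ == "__main__":
    costly_order = 4  # PAGE_ALLOC_COSTLY_ORDER as patched for these tests
    observed = {4: "cannot reproduce", 5: "cannot reproduce", 6: "freeze"}
    for order, outcome in observed.items():
        kib = alloc_bytes(order) // 1024
        print(f"order={order} size={kib} KiB "
              f"costly={is_costly(order, costly_order)} -> {outcome}")
```

Under these assumptions, order=5 (128 KiB) is already above the costly threshold yet does not reproduce the freeze, while order=6 (256 KiB) does. So the threshold alone does not seem to explain the results.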
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab