From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Jan Kara <jack@suse.cz>, Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Michal Hocko <mhocko@suse.com>,
stable@vger.kernel.org, regressions@lists.linux.dev,
Alasdair Kergon <agk@redhat.com>,
Mike Snitzer <snitzer@kernel.org>,
dm-devel@lists.linux.dev, linux-mm@kvack.org
Subject: Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5
Date: Tue, 31 Oct 2023 04:48:44 +0100
Message-ID: <ZUB5HFeK3eHeI8UH@mail-itl>
In-Reply-To: <98aefaa9-1ac-a0e4-fb9a-89ded456750@redhat.com>

On Mon, Oct 30, 2023 at 06:50:35PM +0100, Mikulas Patocka wrote:
> On Mon, 30 Oct 2023, Marek Marczykowski-Górecki wrote:
> > Then retried with order=PAGE_ALLOC_COSTLY_ORDER and
> > PAGE_ALLOC_COSTLY_ORDER back at 3, and also got similar crash.
>
> So, does it mean that even allocating with order=PAGE_ALLOC_COSTLY_ORDER
> isn't safe?

That seems to be a different bug; see below.

> Try enabling CONFIG_DEBUG_VM (it also needs CONFIG_DEBUG_KERNEL) and try
> to provoke a similar crash. Let's see if it crashes on one of the
> VM_BUG_ON statements.

This was a very interesting idea. With it enabled, I get the crash below
immediately after login. That makes sense, as that is when pulseaudio
starts and opens /dev/snd/*. I then tried with the dm-crypt commit
reverted and still got the crash! After blacklisting snd_pcm there is
no BUG splat, but the storage freeze still happens on vanilla 6.5.6.

The snd_pcm BUG splat:
[ 51.082877] page:00000000d8fdb7f1 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11b7d9
[ 51.082919] flags: 0x200000000000000(node=0|zone=2)
[ 51.082924] page_type: 0xffffffff()
[ 51.082929] raw: 0200000000000000 dead000000000100 dead000000000122 0000000000000000
[ 51.082934] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 51.082938] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
[ 51.082969] ------------[ cut here ]------------
[ 51.082972] kernel BUG at include/linux/mm.h:1406!
[ 51.082980] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 51.082986] CPU: 5 PID: 3893 Comm: alsa-sink-Gener Tainted: G W 6.5.6-dirty #359
[ 51.082992] Hardware name: Star Labs StarBook/StarBook, BIOS 8.97 10/03/2023
[ 51.082997] RIP: e030:snd_pcm_mmap_data_fault+0x11d/0x140 [snd_pcm]
[ 51.083015] Code: 48 2b 05 8e 7b 67 c2 48 01 f0 48 c1 e8 0c 48 c1 e0 06 48 03 05 6c 7b 67 c2 e9 4c ff ff ff 48 c7 c6 d8 71 1c c0 e8 93 1e 0e c1 <0f> 0b 48 83 ef 01 e9 4d ff ff ff 48 8b 05 51 47 89 c2 eb c9 66 66
[ 51.083023] RSP: e02b:ffffc90041be7e00 EFLAGS: 00010246
[ 51.083028] RAX: 000000000000005c RBX: ffffc90041be7e28 RCX: 0000000000000000
[ 51.083033] RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 51.083038] RBP: ffff888102e75f18 R08: 00000000ffffdfff R09: 0000000000000001
[ 51.083042] R10: 00000000ffffdfff R11: ffffffff82a5ddc0 R12: ffff888102e75f18
[ 51.083047] R13: 0000000000000255 R14: ffff888100955e80 R15: ffff888102e75f18
[ 51.083056] FS: 00007f51d354f6c0(0000) GS:ffff888189740000(0000) knlGS:0000000000000000
[ 51.083061] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.083065] CR2: 00007f51d36f6000 CR3: 000000011b53e000 CR4: 0000000000050660
[ 51.083072] Call Trace:
[ 51.083076] <TASK>
[ 51.083078] ? die+0x31/0x80
[ 51.083085] ? do_trap+0xd5/0x100
[ 51.083089] ? snd_pcm_mmap_data_fault+0x11d/0x140 [snd_pcm]
[ 51.083103] ? do_error_trap+0x65/0x90
[ 51.083107] ? snd_pcm_mmap_data_fault+0x11d/0x140 [snd_pcm]
[ 51.083120] ? exc_invalid_op+0x50/0x70
[ 51.083127] ? snd_pcm_mmap_data_fault+0x11d/0x140 [snd_pcm]
[ 51.083140] ? asm_exc_invalid_op+0x1a/0x20
[ 51.083146] ? snd_pcm_mmap_data_fault+0x11d/0x140 [snd_pcm]
[ 51.083159] __do_fault+0x29/0x110
[ 51.083165] __handle_mm_fault+0x5fb/0xc40
[ 51.083170] handle_mm_fault+0x91/0x1e0
[ 51.083173] do_user_addr_fault+0x216/0x5d0
[ 51.083179] ? check_preemption_disabled+0x31/0xf0
[ 51.083185] exc_page_fault+0x71/0x160
[ 51.083189] asm_exc_page_fault+0x26/0x30
[ 51.083195] RIP: 0033:0x7f51e56793ca
[ 51.083198] Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90
[ 51.083207] RSP: 002b:00007f51d354c528 EFLAGS: 00010202
[ 51.083211] RAX: 0000000000000000 RBX: 00007f51d354ec80 RCX: 00000000000034e0
[ 51.083216] RDX: 00007f51d36f5000 RSI: 0000000000000000 RDI: 00007f51d36f6000
[ 51.083220] RBP: 000055fec98b2f60 R08: 00007f51cc0031c0 R09: 0000000000000000
[ 51.083224] R10: 0000000000000000 R11: 0000000000000101 R12: 000055fec98b2f60
[ 51.083228] R13: 00007f51d354c630 R14: 0000000000000000 R15: 000055fec78ba680
[ 51.083233] </TASK>
[ 51.083235] Modules linked in: snd_hda_codec_hdmi snd_sof_pci_intel_tgl snd_sof_intel_hda_common snd_soc_hdac_hda soundwire_intel soundwire_generic_allocation snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_hda_codec_generic snd_sof_pci ledtrig_audio snd_sof snd_sof_utils snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core soundwire_bus snd_soc_core snd_compress snd_pcm_dmaengine ac97_bus snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device hid_multitouch snd_pcm i2c_i801 idma64 iwlwifi i2c_smbus i2c_designware_platform i2c_designware_core snd_timer snd soundcore efivarfs i2c_hid_acpi i2c_hid pinctrl_tigerlake pinctrl_intel xen_acpi_processor xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
[ 51.083293] ---[ end trace 0000000000000000 ]---
[ 51.083296] RIP: e030:snd_pcm_mmap_data_fault+0x11d/0x140 [snd_pcm]
[ 51.083310] Code: 48 2b 05 8e 7b 67 c2 48 01 f0 48 c1 e8 0c 48 c1 e0 06 48 03 05 6c 7b 67 c2 e9 4c ff ff ff 48 c7 c6 d8 71 1c c0 e8 93 1e 0e c1 <0f> 0b 48 83 ef 01 e9 4d ff ff ff 48 8b 05 51 47 89 c2 eb c9 66 66
[ 51.083318] RSP: e02b:ffffc90041be7e00 EFLAGS: 00010246
[ 51.083323] RAX: 000000000000005c RBX: ffffc90041be7e28 RCX: 0000000000000000
[ 51.083327] RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 51.083331] RBP: ffff888102e75f18 R08: 00000000ffffdfff R09: 0000000000000001
[ 51.083335] R10: 00000000ffffdfff R11: ffffffff82a5ddc0 R12: ffff888102e75f18
[ 51.083340] R13: 0000000000000255 R14: ffff888100955e80 R15: ffff888102e75f18
[ 51.083347] FS: 00007f51d354f6c0(0000) GS:ffff888189740000(0000) knlGS:0000000000000000
[ 51.083353] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.083356] CR2: 00007f51d36f6000 CR3: 000000011b53e000 CR4: 0000000000050660

Having discovered that, I'm redoing recent tests with snd_pcm
blacklisted. I'll get back to debugging the snd_pcm issue separately.

Plain 6.5.6 (so order = MAX_ORDER - 1 and PAGE_ALLOC_COSTLY_ORDER = 3),
in the frozen state:
[ 143.195348] sysrq: Show Blocked State
[ 143.195471] task:lvm state:D stack:13312 pid:4882 ppid:2025 flags:0x00004002
[ 143.195504] Call Trace:
[ 143.195514] <TASK>
[ 143.195526] __schedule+0x30e/0x8b0
[ 143.195550] ? __pfx_dev_suspend+0x10/0x10
[ 143.195569] schedule+0x59/0xb0
[ 143.195582] io_schedule+0x41/0x70
[ 143.195595] dm_wait_for_completion+0x19d/0x1b0
[ 143.195671] ? __pfx_autoremove_wake_function+0x10/0x10
[ 143.195693] __dm_suspend+0x79/0x190
[ 143.195707] ? __pfx_dev_suspend+0x10/0x10
[ 143.195723] dm_internal_suspend_noflush+0x57/0x80
[ 143.195740] pool_presuspend+0xc7/0x130
[ 143.195759] dm_table_presuspend_targets+0x38/0x60
[ 143.195774] __dm_suspend+0x34/0x190
[ 143.195788] ? preempt_count_add+0x69/0xa0
[ 143.195805] ? __pfx_dev_suspend+0x10/0x10
[ 143.195819] dm_suspend+0xbb/0xe0
[ 143.195835] ? preempt_count_add+0x46/0xa0
[ 143.195851] dev_suspend+0x18e/0x2d0
[ 143.195867] ? __pfx_dev_suspend+0x10/0x10
[ 143.195882] ctl_ioctl+0x329/0x640
[ 143.195901] dm_ctl_ioctl+0x9/0x10
[ 143.195917] __x64_sys_ioctl+0x8f/0xd0
[ 143.195938] do_syscall_64+0x3c/0x90
[ 143.195954] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 143.195975] RIP: 0033:0x7f2e0ab1fe0f
[ 143.195989] RSP: 002b:00007ffd59a16e60 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 143.196011] RAX: ffffffffffffffda RBX: 000056289d130840 RCX: 00007f2e0ab1fe0f
[ 143.196029] RDX: 000056289d120b80 RSI: 00000000c138fd06 RDI: 0000000000000003
[ 143.196046] RBP: 000056289d120b80 R08: 000056289a7eb190 R09: 00007ffd59a16d20
[ 143.196063] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
[ 143.196080] R13: 000056289a7e4cf0 R14: 000056289a77e14d R15: 000056289d120bb0
[ 143.196098] </TASK>
[ 143.196106] task:blkdiscard state:D stack:13672 pid:4884 ppid:2025 flags:0x00000002
[ 143.196130] Call Trace:
[ 143.196139] <TASK>
[ 143.196147] __schedule+0x30e/0x8b0
[ 143.196162] schedule+0x59/0xb0
[ 143.196175] schedule_timeout+0x14c/0x160
[ 143.196193] io_schedule_timeout+0x4b/0x70
[ 143.196207] wait_for_completion_io+0x81/0x130
[ 143.196226] submit_bio_wait+0x5c/0x90
[ 143.196241] blkdev_issue_discard+0x94/0xe0
[ 143.196260] blkdev_common_ioctl+0x79e/0x9c0
[ 143.196279] blkdev_ioctl+0xc7/0x270
[ 143.196293] __x64_sys_ioctl+0x8f/0xd0
[ 143.196310] do_syscall_64+0x3c/0x90
[ 143.196324] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 143.196343] RIP: 0033:0x7fa6cebcee0f
[ 143.196354] RSP: 002b:00007ffe6700fa80 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 143.196374] RAX: ffffffffffffffda RBX: 0000000280000000 RCX: 00007fa6cebcee0f
[ 143.196391] RDX: 00007ffe6700fb50 RSI: 0000000000001277 RDI: 0000000000000003
[ 143.196408] RBP: 0000000000000003 R08: 0000000000000071 R09: 0000000000000004
[ 143.196424] R10: 00007ffe67064170 R11: 0000000000000246 R12: 0000000040000000
[ 143.196441] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 143.196460] </TASK>

Stacks of the threads whose name contains "crypt", collected with:
for f in $(grep -l crypt /proc/*/comm); do head $f ${f/comm/stack}; done
==> /proc/3761/comm <==
kworker/u12:7-kcryptd/252:0
==> /proc/3761/stack <==
[<0>] worker_thread+0xab/0x3b0
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x2c/0x50
[<0>] ret_from_fork_asm+0x1b/0x30
==> /proc/51/comm <==
cryptd
==> /proc/51/stack <==
[<0>] rescuer_thread+0x2d5/0x390
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x2c/0x50
[<0>] ret_from_fork_asm+0x1b/0x30
==> /proc/556/comm <==
kcryptd_io/252:
==> /proc/556/stack <==
[<0>] rescuer_thread+0x2d5/0x390
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x2c/0x50
[<0>] ret_from_fork_asm+0x1b/0x30
==> /proc/557/comm <==
kcryptd/252:0
==> /proc/557/stack <==
[<0>] rescuer_thread+0x2d5/0x390
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x2c/0x50
[<0>] ret_from_fork_asm+0x1b/0x30
==> /proc/558/comm <==
dmcrypt_write/252:0
==> /proc/558/stack <==
[<0>] dmcrypt_write+0x6a/0x140
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x2c/0x50
[<0>] ret_from_fork_asm+0x1b/0x30
==> /proc/717/comm <==
kworker/u12:6-kcryptd/252:0
==> /proc/717/stack <==
[<0>] worker_thread+0xab/0x3b0
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x2c/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

Then I tried:
- PAGE_ALLOC_COSTLY_ORDER=4, order=4 - cannot reproduce,
- PAGE_ALLOC_COSTLY_ORDER=4, order=5 - cannot reproduce,
- PAGE_ALLOC_COSTLY_ORDER=4, order=6 - freezes rather quickly.

I've retried the PAGE_ALLOC_COSTLY_ORDER=4, order=5 case several times
and I can't reproduce the issue there. I'm confused...
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab