Processes hung in "D" state in ext4, mm, md and dmcrypt

* Processes hung in "D" state in ext4, mm, md and dmcrypt
@ 2023-07-26 10:02 David Howells
  2023-07-26 15:29 ` [dm-devel] " Ming Lei
  2023-07-27  2:38 ` Yu Kuai
  0 siblings, 2 replies; 5+ messages in thread
From: David Howells @ 2023-07-26 10:02 UTC (permalink / raw)
  To: Theodore Ts'o, Andrew Morton, Song Liu, Christoph Hellwig,
	Alasdair Kergon
  Cc: dhowells, linux-ext4, linux-mm, linux-raid, dm-devel,
	linux-kernel

Hi,

With 6.5-rc2 (6.5.0-0.rc2.20230721gitf7e3a1bafdea.20.fc39.x86_64), I'm seeing
a bunch of processes getting stuck in the D state on my desktop after a few
hours of reading email and compiling stuff.  It's happened every day this week
so far and I managed to grab stack traces of the stuck processes this morning
(see attached).

There are two blockdevs involved below, /dev/md2 and /dev/md3.  md3 is a raid1
array with two partitions with an ext4 partition on it.  md2 is similar but
it's dm-crypted and ext4 is on top of that.

David
---

   1015 ?        D      0:04 [md2_raid1]
   1074 ?        D      0:00 [jbd2/sda6-8]
   1138 ?        D      0:00 [jbd2/md3-8]
   1167 ?        D      0:10 [dmcrypt_write/253:0]
   1202 ?        D      0:03 [jbd2/dm-0-8]
 117547 ?        D      5:12 [kworker/u16:8+flush-9:3]
 121540 ?        D      0:46 [kworker/u16:10+flush-253:0]
 125431 pts/2    Dl+    0:00 emacs .stgit-edit.txt
 125469 ?        D      0:00 /usr/libexec/nmh/rcvstore +kernel

===1015===
    PID TTY      STAT   TIME COMMAND
   1015 ?        D      0:04 [md2_raid1]
[<0>] md_super_wait+0xa2/0xe0
[<0>] md_bitmap_daemon_work+0x183/0x3b0
[<0>] md_check_recovery+0x42/0x5a0
[<0>] raid1d+0x87/0x16f0 [raid1]
[<0>] md_thread+0xab/0x190
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

===1074===
    PID TTY      STAT   TIME COMMAND
   1074 ?        D      0:00 [jbd2/sda6-8]
[<0>] jbd2_journal_commit_transaction+0x11a6/0x1a20
[<0>] kjournald2+0xad/0x280
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

===1138===
    PID TTY      STAT   TIME COMMAND
   1138 ?        D      0:00 [jbd2/md3-8]
[<0>] jbd2_journal_commit_transaction+0x162d/0x1a20
[<0>] kjournald2+0xad/0x280
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

===1167===
    PID TTY      STAT   TIME COMMAND
   1167 ?        D      0:10 [dmcrypt_write/253:0]
[<0>] md_super_wait+0xa2/0xe0
[<0>] md_bitmap_unplug+0xad/0x120
[<0>] flush_bio_list+0xf3/0x100 [raid1]
[<0>] raid1_unplug+0x3b/0xb0 [raid1]
[<0>] __blk_flush_plug+0xd8/0x160
[<0>] blk_finish_plug+0x29/0x40
[<0>] dmcrypt_write+0x132/0x140 [dm_crypt]
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

===1202===
    PID TTY      STAT   TIME COMMAND
   1202 ?        D      0:03 [jbd2/dm-0-8]
[<0>] jbd2_journal_commit_transaction+0x162d/0x1a20
[<0>] kjournald2+0xad/0x280
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

===117547===
    PID TTY      STAT   TIME COMMAND
 117547 ?        D      5:12 [kworker/u16:8+flush-9:3]
[<0>] blk_mq_get_tag+0x11e/0x2b0
[<0>] __blk_mq_alloc_requests+0x1bc/0x350
[<0>] blk_mq_submit_bio+0x2c7/0x680
[<0>] __submit_bio+0x8b/0x170
[<0>] submit_bio_noacct_nocheck+0x159/0x370
[<0>] __block_write_full_folio+0x1e1/0x400
[<0>] writepage_cb+0x1a/0x70
[<0>] write_cache_pages+0x144/0x3b0
[<0>] do_writepages+0x164/0x1e0
[<0>] __writeback_single_inode+0x3d/0x360
[<0>] writeback_sb_inodes+0x1ed/0x4b0
[<0>] __writeback_inodes_wb+0x4c/0xf0
[<0>] wb_writeback+0x298/0x310
[<0>] wb_workfn+0x35b/0x510
[<0>] process_one_work+0x1de/0x3f0
[<0>] worker_thread+0x51/0x390
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

===121540===
    PID TTY      STAT   TIME COMMAND
 121540 ?        D      0:46 [kworker/u16:10+flush-253:0]
[<0>] folio_wait_bit_common+0x13d/0x350
[<0>] mpage_prepare_extent_to_map+0x309/0x4d0
[<0>] ext4_do_writepages+0x25d/0xc90
[<0>] ext4_writepages+0xad/0x180
[<0>] do_writepages+0xcf/0x1e0
[<0>] __writeback_single_inode+0x3d/0x360
[<0>] writeback_sb_inodes+0x1ed/0x4b0
[<0>] __writeback_inodes_wb+0x4c/0xf0
[<0>] wb_writeback+0x298/0x310
[<0>] wb_workfn+0x35b/0x510
[<0>] process_one_work+0x1de/0x3f0
[<0>] worker_thread+0x51/0x390
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

===125431===
    PID TTY      STAT   TIME COMMAND
 125431 pts/2    Dl+    0:00 emacs .stgit-edit.txt
[<0>] jbd2_log_wait_commit+0xd8/0x140
[<0>] ext4_sync_file+0x1cc/0x380
[<0>] __x64_sys_fsync+0x3b/0x70
[<0>] do_syscall_64+0x5d/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

===125469===
    PID TTY      STAT   TIME COMMAND
 125469 ?        D      0:00 /usr/libexec/nmh/rcvstore +kernel
[<0>] folio_wait_bit_common+0x13d/0x350
[<0>] folio_wait_writeback+0x2c/0x90
[<0>] truncate_inode_partial_folio+0x5e/0x1a0
[<0>] truncate_inode_pages_range+0x1da/0x400
[<0>] truncate_pagecache+0x47/0x60
[<0>] ext4_setattr+0x685/0xba0
[<0>] notify_change+0x1e0/0x4a0
[<0>] do_truncate+0x98/0xf0
[<0>] do_sys_ftruncate+0x15c/0x1b0
[<0>] do_syscall_64+0x5d/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

^ permalink raw reply	[flat|nested] 5+ messages in thread