* EXT4/JBD2 Not Fully Released device after unmount of NVMe-oF Block Device
@ 2025-06-01 11:02 Mitta Sai Chaithanya
From: Mitta Sai Chaithanya @ 2025-06-01 11:02 UTC (permalink / raw)
  To: linux-ext4@vger.kernel.org
  Cc: Nilesh Awate, Ganesan Kalyanasundaram, Pawan Sharma

Hi Team,
           I'm encountering JBD2 (journaling block device) errors after unmounting a device and have been trying to trace their source. I've observed that these JBD2 errors occur only when the entries under /proc/fs/ext4/<device_name> or /proc/fs/jbd2/<device_name> still exist after a successful unmount (the unmount command returns success).

For context: the block device (/dev/nvme0n1) is connected over NVMe-oF TCP to a remote target. I'm confident that no I/O is stuck on the target side, as there are no related I/O errors or warnings in the kernel logs on the host where the target runs.

However, the /proc entries mentioned above remain even after a successful unmount, and this seems to correlate with the journal-related errors.

I'd like to understand how to debug this further and determine the root cause. Specifically, I'm looking for guidance on which kernel-level references or subsystems might still be holding on to the journal or device structures after unmount, and how to trace or identify them effectively. Alternatively, has this already been fixed in a more recent version of ext4?
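
One thing I still plan to rule out myself is that the filesystem remains mounted in another mount namespace (this node runs Kubernetes, so a container mount namespace pinning the superblock seems plausible). A rough check, assuming the device is /dev/nvme0n1 and util-linux findmnt is available:

    # any task whose mount namespace still references the device?
    grep -l nvme0n1 /proc/*/mountinfo 2>/dev/null

    # inspect the mount table of a suspect task (replace <pid>)
    findmnt --task <pid> | grep nvme0n1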

Proc entries exist even after unmount:
root@aks-nodepool1-44537149-vmss000002 [ / ]# ls /proc/fs/ext4/nvme0n1/
es_shrinker_info  fc_info  mb_groups  mb_stats  mb_structs_summary  options
root@aks-nodepool1-44537149-vmss000002 [ / ]# ls /proc/fs/jbd2/nvme0n1-8/
info
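
As far as I can tell, /proc/fs/ext4/<dev> goes away in ext4_put_super() and /proc/fs/jbd2/<dev> in jbd2_journal_destroy(), so my plan is to confirm whether those teardown paths actually run at umount time by tracing them with kprobe events (rough sketch; on older kernels tracefs is under /sys/kernel/debug/tracing):

    cd /sys/kernel/tracing
    echo 'p:ext4_put ext4_put_super' >> kprobe_events
    echo 'p:jbd2_destroy jbd2_journal_destroy' >> kprobe_events
    echo 1 > events/kprobes/ext4_put/enable
    echo 1 > events/kprobes/jbd2_destroy/enable
    # reproduce the unmount, then:
    cat trace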


Active processes associated with the unmounted device:
root      636845  0.0  0.0      0     0 ?        S    08:43   0:03 [jbd2/nvme0n1-8]
root      636987  0.0  0.0      0     0 ?        I<   08:43   0:00 [dio/nvme0n1]
root      699903  0.0  0.0      0     0 ?        I    09:18   0:01 [kworker/u16:1-nvme-wq]
root      761100  0.0  0.0      0     0 ?        I<   09:50   0:00 [kworker/1:1H-nvme_tcp_wq]
root      763896  0.0  0.0      0     0 ?        I<   09:52   0:00 [kworker/0:0H-nvme_tcp_wq]
root      779007  0.0  0.0      0     0 ?        I<   10:01   0:00 [kworker/0:1H-nvme_tcp_wq]
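
Since the [jbd2/nvme0n1-8] thread only exits once the journal is torn down, I also intend to check whether anything still holds the block device or the old mount point (/datadir) open; a quick sketch:

    # userspace holders of the device or the old mount point
    lsof /dev/nvme0n1 /datadir 2>/dev/null
    fuser -vm /dev/nvme0n1

    # kernel-side holders (device-mapper, loop devices, etc.)
    ls /sys/block/nvme0n1/holders/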


Stack traces of the processes (after unmount):

root@aks-nodepool1-44537149-vmss000002 [ / ]# cat /proc/636845/stack
[<0>] kjournald2+0x219/0x270
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30
root@aks-nodepool1-44537149-vmss000002 [ / ]# cat /proc/636846/stack
[<0>] rescuer_thread+0x2db/0x3b0
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30


 [ / ]# cat /proc/636987/stack
[<0>] rescuer_thread+0x2db/0x3b0
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30

 [ / ]# cat /proc/699903/stack
[<0>] worker_thread+0xcd/0x3d0
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30

 [ / ]# cat /proc/761100/stack
[<0>] worker_thread+0xcd/0x3d0
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30

 [ / ]# cat /proc/763896/stack
[<0>] worker_thread+0xcd/0x3d0
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30

[ / ]# cat /proc/779007/stack
[<0>] worker_thread+0xcd/0x3d0
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30


Kernel Logs:

2025-06-01T10:01:11.568304+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30452.346875] nvme nvme0: Failed reconnect attempt 6
2025-06-01T10:01:11.568330+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30452.346881] nvme nvme0: Reconnecting in 10 seconds...
2025-06-01T10:01:21.814134+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30462.596133] nvme nvme0: Connect command failed, error wo/DNR bit: 6
2025-06-01T10:01:21.814165+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30462.596186] nvme nvme0: failed to connect queue: 0 ret=6
2025-06-01T10:01:21.814174+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30462.596289] nvme nvme0: Failed reconnect attempt 7
2025-06-01T10:01:21.814176+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30462.596292] nvme nvme0: Reconnecting in 10 seconds...
2025-06-01T10:01:32.055063+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30472.836929] nvme nvme0: queue_size 128 > ctrl sqsize 64, clamping down
2025-06-01T10:01:32.055094+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30472.837002] nvme nvme0: creating 2 I/O queues.
2025-06-01T10:01:32.108286+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30472.886546] nvme nvme0: mapped 2/0/0 default/read/poll queues.
2025-06-01T10:01:32.108313+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30472.887450] nvme nvme0: Successfully reconnected (8 attempt)
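
For completeness, the controller state during these reconnect windows can also be checked from sysfs (and via nvme-cli, if installed):

    cat /sys/class/nvme/nvme0/state
    nvme list-subsys /dev/nvme0n1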

High-level ext4 filesystem information:

root@aks-nodepool1-44537149-vmss000002 [ / ]# dumpe2fs /dev/nvme0n1
dumpe2fs 1.46.5 (30-Dec-2021)
Filesystem volume name:   <none>
Last mounted on:          /datadir
Filesystem UUID:          1a564b4d-8f34-4f71-8370-802a239e350a
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index FEATURE_C12 filetype needs_recovery extent 64bit flex_bg metadata_csum_seed sparse_super large_file huge_file dir_nlink extra_isize metadata_csum FEATURE_R16
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              655360
Block count:              2620155
Reserved block count:     131007
Overhead clusters:        66747
Free blocks:              454698
Free inodes:              655344
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
RAID stripe width:        32
Flex block group size:    16
Filesystem created:       Sun Jun  1 08:36:28 2025
Last mount time:          Sun Jun  1 08:43:57 2025
Last write time:          Sun Jun  1 08:43:57 2025
Mount count:              4
Maximum mount count:      -1
Last checked:             Sun Jun  1 08:36:28 2025
Check interval:           0 (<none>)
Lifetime writes:          576 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      22fed392-1993-4796-a996-feab145379ba
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xea839b0c
Checksum seed:            0x8e742ce9
Journal features:         journal_64bit journal_checksum_v3
Total journal size:       64M
Total journal blocks:     16384
Max transaction length:   16384
Fast commit length:       0
Journal sequence:         0x000002a0
Journal start:            6816
Journal checksum type:    crc32c
Journal checksum:         0xa35736ab
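
Note that the feature list above still shows needs_recovery, which as far as I understand is only cleared once the journal is cleanly shut down; I plan to re-run a header-only dump once the proc entries finally disappear and compare:

    dumpe2fs -h /dev/nvme0n1 | grep -E 'state|features'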


Thanks & Regards,
Sai

