* [PATCH v2 1/4] smb: cached directories can be more than root file handle
2024-11-18 21:50 [PATCH v2 0/4] SMB cached directory fixes around reconnection/unmounting Paul Aurich
@ 2024-11-18 21:50 ` Paul Aurich
2024-11-18 22:27 ` Steve French
2024-11-18 21:50 ` [PATCH v2 2/4] smb: Don't leak cfid when reconnect races with open_cached_dir Paul Aurich
` (4 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Paul Aurich @ 2024-11-18 21:50 UTC (permalink / raw)
To: linux-cifs, Steve French
Cc: paul, Paulo Alcantara, Ronnie Sahlberg, Shyam Prasad N,
Tom Talpey, Bharath SM
Update this log message since cached fids may represent things other
than the root of a mount.
Fixes: e4029e072673 ("cifs: find and use the dentry for cached non-root directories also")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
---
fs/smb/client/cached_dir.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 0ff2491c311d..585e1dc72432 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -399,11 +399,12 @@ int open_cached_dir_by_dentry(struct cifs_tcon *tcon,
return -ENOENT;
spin_lock(&cfids->cfid_list_lock);
list_for_each_entry(cfid, &cfids->entries, entry) {
if (dentry && cfid->dentry == dentry) {
- cifs_dbg(FYI, "found a cached root file handle by dentry\n");
+ cifs_dbg(FYI, "found a cached file handle by dentry for %pd\n",
+ dentry);
kref_get(&cfid->refcount);
*ret_cfid = cfid;
spin_unlock(&cfids->cfid_list_lock);
return 0;
}
--
2.45.2
^ permalink raw reply related	[flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/4] smb: cached directories can be more than root file handle
2024-11-18 21:50 ` [PATCH v2 1/4] smb: cached directories can be more than root file handle Paul Aurich
@ 2024-11-18 22:27 ` Steve French
0 siblings, 0 replies; 20+ messages in thread
From: Steve French @ 2024-11-18 22:27 UTC (permalink / raw)
To: Paul Aurich
Cc: linux-cifs, Steve French, Paulo Alcantara, Ronnie Sahlberg,
Shyam Prasad N, Tom Talpey, Bharath SM
merged into cifs-2.6.git for-next
On Mon, Nov 18, 2024 at 3:50 PM Paul Aurich <paul@darkrain42.org> wrote:
>
> Update this log message since cached fids may represent things other
> than the root of a mount.
>
> Fixes: e4029e072673 ("cifs: find and use the dentry for cached non-root directories also")
> Signed-off-by: Paul Aurich <paul@darkrain42.org>
> ---
> fs/smb/client/cached_dir.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
> index 0ff2491c311d..585e1dc72432 100644
> --- a/fs/smb/client/cached_dir.c
> +++ b/fs/smb/client/cached_dir.c
> @@ -399,11 +399,12 @@ int open_cached_dir_by_dentry(struct cifs_tcon *tcon,
> return -ENOENT;
>
> spin_lock(&cfids->cfid_list_lock);
> list_for_each_entry(cfid, &cfids->entries, entry) {
> if (dentry && cfid->dentry == dentry) {
> - cifs_dbg(FYI, "found a cached root file handle by dentry\n");
> + cifs_dbg(FYI, "found a cached file handle by dentry for %pd\n",
> + dentry);
> kref_get(&cfid->refcount);
> *ret_cfid = cfid;
> spin_unlock(&cfids->cfid_list_lock);
> return 0;
> }
> --
> 2.45.2
>
>
--
Thanks,
Steve
* [PATCH v2 2/4] smb: Don't leak cfid when reconnect races with open_cached_dir
2024-11-18 21:50 [PATCH v2 0/4] SMB cached directory fixes around reconnection/unmounting Paul Aurich
2024-11-18 21:50 ` [PATCH v2 1/4] smb: cached directories can be more than root file handle Paul Aurich
@ 2024-11-18 21:50 ` Paul Aurich
2024-11-18 21:50 ` [PATCH v2 3/4] smb: prevent use-after-free due to open_cached_dir error paths Paul Aurich
` (3 subsequent siblings)
5 siblings, 0 replies; 20+ messages in thread
From: Paul Aurich @ 2024-11-18 21:50 UTC (permalink / raw)
To: linux-cifs, Steve French
Cc: paul, Paulo Alcantara, Ronnie Sahlberg, Shyam Prasad N,
Tom Talpey, Bharath SM
open_cached_dir() may either race with the tcon reconnection even before
compound_send_recv() or directly trigger a reconnection via
SMB2_open_init() or SMB2_query_info_init().
The reconnection process invokes invalidate_all_cached_dirs() via
cifs_mark_open_files_invalid(), which removes all cfids from the
cfids->entries list but doesn't drop a ref if has_lease isn't true. This
results in the currently-being-constructed cfid not being on the list,
but still having a refcount of 2. It leaks if returned from
open_cached_dir().
Fix this by setting cfid->has_lease when the ref is actually taken; the
cfid will not be used by other threads until it has a valid time.
Addresses these kmemleaks:
unreferenced object 0xffff8881090c4000 (size 1024):
comm "bash", pid 1860, jiffies 4295126592
hex dump (first 32 bytes):
00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........".......
00 ca 45 22 81 88 ff ff f8 dc 4f 04 81 88 ff ff ..E"......O.....
backtrace (crc 6f58c20f):
[<ffffffff8b895a1e>] __kmalloc_cache_noprof+0x2be/0x350
[<ffffffff8bda06e3>] open_cached_dir+0x993/0x1fb0
[<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50
[<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0
[<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200
[<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0
[<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
unreferenced object 0xffff8881044fdcf8 (size 8):
comm "bash", pid 1860, jiffies 4295126592
hex dump (first 8 bytes):
00 cc cc cc cc cc cc cc ........
backtrace (crc 10c106a9):
[<ffffffff8b89a3d3>] __kmalloc_node_track_caller_noprof+0x363/0x480
[<ffffffff8b7d7256>] kstrdup+0x36/0x60
[<ffffffff8bda0700>] open_cached_dir+0x9b0/0x1fb0
[<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50
[<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0
[<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200
[<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0
[<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
And addresses these BUG splats when unmounting the SMB filesystem:
BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
WARNING: CPU: 3 PID: 3433 at fs/dcache.c:1536 umount_check+0xd0/0x100
Modules linked in:
CPU: 3 UID: 0 PID: 3433 Comm: bash Not tainted 6.12.0-rc4-g850925a8133c-dirty #49
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
RIP: 0010:umount_check+0xd0/0x100
Code: 8d 7c 24 40 e8 31 5a f4 ff 49 8b 54 24 40 41 56 49 89 e9 45 89 e8 48 89 d9 41 57 48 89 de 48 c7 c7 80 e7 db ac e8 f0 72 9a ff <0f> 0b 58 31 c0 5a 5b 5d 41 5c 41 5d 41 5e 41 5f e9 2b e5 5d 01 41
RSP: 0018:ffff88811cc27978 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff888140590ba0 RCX: ffffffffaaf20bae
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881f6fb6f40
RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed1023984ee3
R10: ffff88811cc2771f R11: 00000000016cfcc0 R12: ffff888134383e08
R13: 0000000000000002 R14: ffff8881462ec668 R15: ffffffffaceab4c0
FS: 00007f23bfa98740(0000) GS:ffff8881f6f80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000556de4a6f808 CR3: 0000000123c80000 CR4: 0000000000350ef0
Call Trace:
<TASK>
d_walk+0x6a/0x530
shrink_dcache_for_umount+0x6a/0x200
generic_shutdown_super+0x52/0x2a0
kill_anon_super+0x22/0x40
cifs_kill_sb+0x159/0x1e0
deactivate_locked_super+0x66/0xe0
cleanup_mnt+0x140/0x210
task_work_run+0xfb/0x170
syscall_exit_to_user_mode+0x29f/0x2b0
do_syscall_64+0xa1/0x1a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f23bfb93ae7
Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 0d 11 93 0d 00 f7 d8 64 89 01 b8 ff ff ff ff eb bf 0f 1f 44 00 00 b8 50 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 92 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffee9138598 EFLAGS: 00000246 ORIG_RAX: 0000000000000050
RAX: 0000000000000000 RBX: 0000558f1803e9a0 RCX: 00007f23bfb93ae7
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000558f1803e9a0
RBP: 0000558f1803e600 R08: 0000000000000007 R09: 0000558f17fab610
R10: d91d5ec34ab757b0 R11: 0000000000000246 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000015 R15: 0000000000000000
</TASK>
irq event stamp: 1163486
hardirqs last enabled at (1163485): [<ffffffffac98d344>] _raw_spin_unlock_irqrestore+0x34/0x60
hardirqs last disabled at (1163486): [<ffffffffac97dcfc>] __schedule+0xc7c/0x19a0
softirqs last enabled at (1163482): [<ffffffffab79a3ee>] __smb_send_rqst+0x3de/0x990
softirqs last disabled at (1163480): [<ffffffffac2314f1>] release_sock+0x21/0xf0
---[ end trace 0000000000000000 ]---
VFS: Busy inodes after unmount of cifs (cifs)
------------[ cut here ]------------
kernel BUG at fs/super.c:661!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 UID: 0 PID: 3433 Comm: bash Tainted: G W 6.12.0-rc4-g850925a8133c-dirty #49
Tainted: [W]=WARN
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
RIP: 0010:generic_shutdown_super+0x290/0x2a0
Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90
RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246
RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027
RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8
RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319
R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0
R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000
FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0
Call Trace:
<TASK>
kill_anon_super+0x22/0x40
cifs_kill_sb+0x159/0x1e0
deactivate_locked_super+0x66/0xe0
cleanup_mnt+0x140/0x210
task_work_run+0xfb/0x170
syscall_exit_to_user_mode+0x29f/0x2b0
do_syscall_64+0xa1/0x1a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f23bfb93ae7
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:generic_shutdown_super+0x290/0x2a0
Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90
RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246
RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027
RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8
RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319
R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0
R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000
FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0
This reproduces eventually with an SMB mount and two shells running
these loops concurrently
- while true; do
cd ~; sleep 1;
for i in {1..3}; do cd /mnt/test/subdir;
echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1;
done;
echo ...;
done
- while true; do
iptables -F OUTPUT; mount -t cifs -a;
for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done;
iptables -I OUTPUT -p tcp --dport 445 -j DROP;
sleep 10
echo "unmounting"; umount -l -t cifs -a; echo "done unmounting";
sleep 20
echo "recovering"; iptables -F OUTPUT;
sleep 10;
done
Fixes: ebe98f1447bb ("cifs: enable caching of directories for which a lease is held")
Fixes: 5c86919455c1 ("smb: client: fix use-after-free in smb2_query_info_compound()")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Aurich <paul@darkrain42.org>
---
fs/smb/client/cached_dir.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 585e1dc72432..59f07adf28d3 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -57,10 +57,20 @@ static struct cached_fid *find_or_create_cached_dir(struct cached_fids *cfids,
cfid->cfids = cfids;
cfids->num_entries++;
list_add(&cfid->entry, &cfids->entries);
cfid->on_list = true;
kref_get(&cfid->refcount);
+ /*
+ * Set @cfid->has_lease to true during construction so that the lease
+ * reference can be put in cached_dir_lease_break() due to a potential
+ * lease break right after the request is sent or while @cfid is still
+ * being cached, or if a reconnection is triggered during construction.
+ * Concurrent processes won't be able to use it yet due to @cfid->time being
+ * zero.
+ */
+ cfid->has_lease = true;
+
spin_unlock(&cfids->cfid_list_lock);
return cfid;
}
static struct dentry *
@@ -174,16 +184,16 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
if (cfid == NULL) {
kfree(utf16_path);
return -ENOENT;
}
/*
- * Return cached fid if it has a lease. Otherwise, it is either a new
- * entry or laundromat worker removed it from @cfids->entries. Caller
- * will put last reference if the latter.
+ * Return cached fid if it is valid (has a lease and has a time).
+ * Otherwise, it is either a new entry or laundromat worker removed it
+ * from @cfids->entries. Caller will put last reference if the latter.
*/
spin_lock(&cfids->cfid_list_lock);
- if (cfid->has_lease) {
+ if (cfid->has_lease && cfid->time) {
spin_unlock(&cfids->cfid_list_lock);
*ret_cfid = cfid;
kfree(utf16_path);
return 0;
}
@@ -265,19 +275,10 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
if (rc)
goto oshr_free;
smb2_set_related(&rqst[1]);
- /*
- * Set @cfid->has_lease to true before sending out compounded request so
- * its lease reference can be put in cached_dir_lease_break() due to a
- * potential lease break right after the request is sent or while @cfid
- * is still being cached. Concurrent processes won't be to use it yet
- * due to @cfid->time being zero.
- */
- cfid->has_lease = true;
-
if (retries) {
smb2_set_replay(server, &rqst[0]);
smb2_set_replay(server, &rqst[1]);
}
--
2.45.2
* [PATCH v2 3/4] smb: prevent use-after-free due to open_cached_dir error paths
2024-11-18 21:50 [PATCH v2 0/4] SMB cached directory fixes around reconnection/unmounting Paul Aurich
2024-11-18 21:50 ` [PATCH v2 1/4] smb: cached directories can be more than root file handle Paul Aurich
2024-11-18 21:50 ` [PATCH v2 2/4] smb: Don't leak cfid when reconnect races with open_cached_dir Paul Aurich
@ 2024-11-18 21:50 ` Paul Aurich
2024-11-18 21:50 ` [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry Paul Aurich
` (2 subsequent siblings)
5 siblings, 0 replies; 20+ messages in thread
From: Paul Aurich @ 2024-11-18 21:50 UTC (permalink / raw)
To: linux-cifs, Steve French
Cc: paul, Paulo Alcantara, Ronnie Sahlberg, Shyam Prasad N,
Tom Talpey, Bharath SM
If open_cached_dir() encounters an error parsing the lease from the
server, the error handling may race with receiving a lease break,
resulting in open_cached_dir() freeing the cfid while the queued work is
pending.
Update open_cached_dir() to drop refs rather than directly freeing the
cfid.
Have cached_dir_lease_break(), cfids_laundromat_worker(), and
invalidate_all_cached_dirs() clear has_lease immediately while still
holding cfids->cfid_list_lock, and then use this to also simplify the
reference counting in cfids_laundromat_worker() and
invalidate_all_cached_dirs().
Fixes this KASAN splat (which manually injects an error and lease break
in open_cached_dir()):
==================================================================
BUG: KASAN: slab-use-after-free in smb2_cached_lease_break+0x27/0xb0
Read of size 8 at addr ffff88811cc24c10 by task kworker/3:1/65
CPU: 3 UID: 0 PID: 65 Comm: kworker/3:1 Not tainted 6.12.0-rc6-g255cf264e6e5-dirty #87
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Workqueue: cifsiod smb2_cached_lease_break
Call Trace:
<TASK>
dump_stack_lvl+0x77/0xb0
print_report+0xce/0x660
kasan_report+0xd3/0x110
smb2_cached_lease_break+0x27/0xb0
process_one_work+0x50a/0xc50
worker_thread+0x2ba/0x530
kthread+0x17c/0x1c0
ret_from_fork+0x34/0x60
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 2464:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
__kasan_kmalloc+0xaa/0xb0
open_cached_dir+0xa7d/0x1fb0
smb2_query_path_info+0x43c/0x6e0
cifs_get_fattr+0x346/0xf10
cifs_get_inode_info+0x157/0x210
cifs_revalidate_dentry_attr+0x2d1/0x460
cifs_getattr+0x173/0x470
vfs_statx_path+0x10f/0x160
vfs_statx+0xe9/0x150
vfs_fstatat+0x5e/0xc0
__do_sys_newfstatat+0x91/0xf0
do_syscall_64+0x95/0x1a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 2464:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x51/0x70
kfree+0x174/0x520
open_cached_dir+0x97f/0x1fb0
smb2_query_path_info+0x43c/0x6e0
cifs_get_fattr+0x346/0xf10
cifs_get_inode_info+0x157/0x210
cifs_revalidate_dentry_attr+0x2d1/0x460
cifs_getattr+0x173/0x470
vfs_statx_path+0x10f/0x160
vfs_statx+0xe9/0x150
vfs_fstatat+0x5e/0xc0
__do_sys_newfstatat+0x91/0xf0
do_syscall_64+0x95/0x1a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Last potentially related work creation:
kasan_save_stack+0x33/0x60
__kasan_record_aux_stack+0xad/0xc0
insert_work+0x32/0x100
__queue_work+0x5c9/0x870
queue_work_on+0x82/0x90
open_cached_dir+0x1369/0x1fb0
smb2_query_path_info+0x43c/0x6e0
cifs_get_fattr+0x346/0xf10
cifs_get_inode_info+0x157/0x210
cifs_revalidate_dentry_attr+0x2d1/0x460
cifs_getattr+0x173/0x470
vfs_statx_path+0x10f/0x160
vfs_statx+0xe9/0x150
vfs_fstatat+0x5e/0xc0
__do_sys_newfstatat+0x91/0xf0
do_syscall_64+0x95/0x1a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The buggy address belongs to the object at ffff88811cc24c00
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff88811cc24c00, ffff88811cc25000)
Cc: stable@vger.kernel.org
Signed-off-by: Paul Aurich <paul@darkrain42.org>
---
fs/smb/client/cached_dir.c | 70 ++++++++++++++++----------------------
1 file changed, 29 insertions(+), 41 deletions(-)
diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 59f07adf28d3..64c67cbb2aa5 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -346,10 +346,11 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
oshr_free:
SMB2_open_free(&rqst[0]);
SMB2_query_info_free(&rqst[1]);
free_rsp_buf(resp_buftype[0], rsp_iov[0].iov_base);
free_rsp_buf(resp_buftype[1], rsp_iov[1].iov_base);
+out:
if (rc) {
spin_lock(&cfids->cfid_list_lock);
if (cfid->on_list) {
list_del(&cfid->entry);
cfid->on_list = false;
@@ -357,27 +358,18 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
}
if (cfid->has_lease) {
/*
* We are guaranteed to have two references at this
* point. One for the caller and one for a potential
- * lease. Release the Lease-ref so that the directory
- * will be closed when the caller closes the cached
- * handle.
+ * lease. Release one here, and the second below.
*/
cfid->has_lease = false;
- spin_unlock(&cfids->cfid_list_lock);
kref_put(&cfid->refcount, smb2_close_cached_fid);
- goto out;
}
spin_unlock(&cfids->cfid_list_lock);
- }
-out:
- if (rc) {
- if (cfid->is_open)
- SMB2_close(0, cfid->tcon, cfid->fid.persistent_fid,
- cfid->fid.volatile_fid);
- free_cached_dir(cfid);
+
+ kref_put(&cfid->refcount, smb2_close_cached_fid);
} else {
*ret_cfid = cfid;
atomic_inc(&tcon->num_remote_opens);
}
kfree(utf16_path);
@@ -512,42 +504,38 @@ void invalidate_all_cached_dirs(struct cifs_tcon *tcon)
list_for_each_entry_safe(cfid, q, &cfids->entries, entry) {
list_move(&cfid->entry, &entry);
cfids->num_entries--;
cfid->is_open = false;
cfid->on_list = false;
- /* To prevent race with smb2_cached_lease_break() */
- kref_get(&cfid->refcount);
+ if (cfid->has_lease) {
+ /*
+ * The lease was never cancelled from the server,
+ * so steal that reference.
+ */
+ cfid->has_lease = false;
+ } else
+ kref_get(&cfid->refcount);
}
spin_unlock(&cfids->cfid_list_lock);
list_for_each_entry_safe(cfid, q, &entry, entry) {
list_del(&cfid->entry);
cancel_work_sync(&cfid->lease_break);
- if (cfid->has_lease) {
- /*
- * We lease was never cancelled from the server so we
- * need to drop the reference.
- */
- spin_lock(&cfids->cfid_list_lock);
- cfid->has_lease = false;
- spin_unlock(&cfids->cfid_list_lock);
- kref_put(&cfid->refcount, smb2_close_cached_fid);
- }
- /* Drop the extra reference opened above*/
+ /*
+ * Drop the ref-count from above, either the lease-ref (if there
+ * was one) or the extra one acquired.
+ */
kref_put(&cfid->refcount, smb2_close_cached_fid);
}
}
static void
smb2_cached_lease_break(struct work_struct *work)
{
struct cached_fid *cfid = container_of(work,
struct cached_fid, lease_break);
- spin_lock(&cfid->cfids->cfid_list_lock);
- cfid->has_lease = false;
- spin_unlock(&cfid->cfids->cfid_list_lock);
kref_put(&cfid->refcount, smb2_close_cached_fid);
}
int cached_dir_lease_break(struct cifs_tcon *tcon, __u8 lease_key[16])
{
@@ -561,10 +549,11 @@ int cached_dir_lease_break(struct cifs_tcon *tcon, __u8 lease_key[16])
list_for_each_entry(cfid, &cfids->entries, entry) {
if (cfid->has_lease &&
!memcmp(lease_key,
cfid->fid.lease_key,
SMB2_LEASE_KEY_SIZE)) {
+ cfid->has_lease = false;
cfid->time = 0;
/*
* We found a lease remove it from the list
* so no threads can access it.
*/
@@ -638,12 +627,18 @@ static void cfids_laundromat_worker(struct work_struct *work)
if (cfid->time &&
time_after(jiffies, cfid->time + HZ * dir_cache_timeout)) {
cfid->on_list = false;
list_move(&cfid->entry, &entry);
cfids->num_entries--;
- /* To prevent race with smb2_cached_lease_break() */
- kref_get(&cfid->refcount);
+ if (cfid->has_lease) {
+ /*
+ * Our lease has not yet been cancelled from the
+ * server. Steal that reference.
+ */
+ cfid->has_lease = false;
+ } else
+ kref_get(&cfid->refcount);
}
}
spin_unlock(&cfids->cfid_list_lock);
list_for_each_entry_safe(cfid, q, &entry, entry) {
@@ -651,21 +646,14 @@ static void cfids_laundromat_worker(struct work_struct *work)
/*
* Cancel and wait for the work to finish in case we are racing
* with it.
*/
cancel_work_sync(&cfid->lease_break);
- if (cfid->has_lease) {
- /*
- * Our lease has not yet been cancelled from the server
- * so we need to drop the reference.
- */
- spin_lock(&cfids->cfid_list_lock);
- cfid->has_lease = false;
- spin_unlock(&cfids->cfid_list_lock);
- kref_put(&cfid->refcount, smb2_close_cached_fid);
- }
- /* Drop the extra reference opened above */
+ /*
+ * Drop the ref-count from above, either the lease-ref (if there
+ * was one) or the extra one acquired.
+ */
kref_put(&cfid->refcount, smb2_close_cached_fid);
}
queue_delayed_work(cifsiod_wq, &cfids->laundromat_work,
dir_cache_timeout * HZ);
}
--
2.45.2
* [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry
2024-11-18 21:50 [PATCH v2 0/4] SMB cached directory fixes around reconnection/unmounting Paul Aurich
` (2 preceding siblings ...)
2024-11-18 21:50 ` [PATCH v2 3/4] smb: prevent use-after-free due to open_cached_dir error paths Paul Aurich
@ 2024-11-18 21:50 ` Paul Aurich
2024-11-22 2:05 ` Paulo Alcantara
2024-11-19 0:55 ` [PATCH v2 0/4] SMB cached directory fixes around reconnection/unmounting Steve French
2024-11-21 20:59 ` Steve French
5 siblings, 1 reply; 20+ messages in thread
From: Paul Aurich @ 2024-11-18 21:50 UTC (permalink / raw)
To: linux-cifs, Steve French
Cc: paul, Paulo Alcantara, Ronnie Sahlberg, Shyam Prasad N,
Tom Talpey, Bharath SM
The unmount process (cifs_kill_sb() calling close_all_cached_dirs()) can
race with various cached directory operations, which ultimately results
in dentries not being dropped and these kernel BUGs:
BUG: Dentry ffff88814f37e358{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
VFS: Busy inodes after unmount of cifs (cifs)
------------[ cut here ]------------
kernel BUG at fs/super.c:661!
This happens when a cfid is in the process of being cleaned up and has
been removed from the cfids->entries list, including:
- Receiving a lease break from the server
- Server reconnection triggers invalidate_all_cached_dirs(), which
removes all the cfids from the list
- The laundromat thread decides to expire an old cfid.
To solve these problems, dropping the dentry is deferred to work queued
on a newly-added cfid_put_wq workqueue, and close_all_cached_dirs()
flushes that workqueue after it drops all the dentries it is aware of.
This is a global workqueue (rather than scoped to a mount), but the
queued work is minimal.
The final cleanup of a cfid is performed via work queued on the
serverclose_wq workqueue; this is kept separate from dropping the
dentries so that close_all_cached_dirs() doesn't block on any server
operations.
Both of these queued works expect to be invoked with a cfid reference
and a tcon reference, to prevent those objects from being freed while
the work is ongoing.
While we're here, add proper locking to close_all_cached_dirs(), and
locking around the freeing of cfid->dentry.
Fixes: ebe98f1447bb ("cifs: enable caching of directories for which a lease is held")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Aurich <paul@darkrain42.org>
---
fs/smb/client/cached_dir.c | 156 ++++++++++++++++++++++++++++++-------
fs/smb/client/cached_dir.h | 6 +-
fs/smb/client/cifsfs.c | 14 +++-
fs/smb/client/cifsglob.h | 3 +-
fs/smb/client/inode.c | 3 -
fs/smb/client/trace.h | 3 +
6 files changed, 148 insertions(+), 37 deletions(-)
diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 64c67cbb2aa5..8fb95f4347df 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -15,10 +15,15 @@
static struct cached_fid *init_cached_dir(const char *path);
static void free_cached_dir(struct cached_fid *cfid);
static void smb2_close_cached_fid(struct kref *ref);
static void cfids_laundromat_worker(struct work_struct *work);
+struct cached_dir_dentry {
+ struct list_head entry;
+ struct dentry *dentry;
+};
+
static struct cached_fid *find_or_create_cached_dir(struct cached_fids *cfids,
const char *path,
bool lookup_only,
__u32 max_cached_dirs)
{
@@ -469,42 +474,68 @@ void close_all_cached_dirs(struct cifs_sb_info *cifs_sb)
struct rb_node *node;
struct cached_fid *cfid;
struct cifs_tcon *tcon;
struct tcon_link *tlink;
struct cached_fids *cfids;
+ struct cached_dir_dentry *tmp_list, *q;
+ LIST_HEAD(entry);
+ spin_lock(&cifs_sb->tlink_tree_lock);
for (node = rb_first(root); node; node = rb_next(node)) {
tlink = rb_entry(node, struct tcon_link, tl_rbnode);
tcon = tlink_tcon(tlink);
if (IS_ERR(tcon))
continue;
cfids = tcon->cfids;
if (cfids == NULL)
continue;
+ spin_lock(&cfids->cfid_list_lock);
list_for_each_entry(cfid, &cfids->entries, entry) {
- dput(cfid->dentry);
+ tmp_list = kmalloc(sizeof(*tmp_list), GFP_ATOMIC);
+ if (tmp_list == NULL)
+ break;
+ spin_lock(&cfid->fid_lock);
+ tmp_list->dentry = cfid->dentry;
cfid->dentry = NULL;
+ spin_unlock(&cfid->fid_lock);
+
+ list_add_tail(&tmp_list->entry, &entry);
}
+ spin_unlock(&cfids->cfid_list_lock);
}
+ spin_unlock(&cifs_sb->tlink_tree_lock);
+
+ list_for_each_entry_safe(tmp_list, q, &entry, entry) {
+ list_del(&tmp_list->entry);
+ dput(tmp_list->dentry);
+ kfree(tmp_list);
+ }
+
+ /* Flush any pending work that will drop dentries */
+ flush_workqueue(cfid_put_wq);
}
/*
* Invalidate all cached dirs when a TCON has been reset
* due to a session loss.
*/
void invalidate_all_cached_dirs(struct cifs_tcon *tcon)
{
struct cached_fids *cfids = tcon->cfids;
struct cached_fid *cfid, *q;
- LIST_HEAD(entry);
if (cfids == NULL)
return;
+ /*
+ * Mark all the cfids as closed, and move them to the cfids->dying list.
+ * They'll be cleaned up later by cfids_invalidation_worker. Take
+ * a reference to each cfid during this process.
+ */
spin_lock(&cfids->cfid_list_lock);
list_for_each_entry_safe(cfid, q, &cfids->entries, entry) {
- list_move(&cfid->entry, &entry);
+ list_move(&cfid->entry, &cfids->dying);
cfids->num_entries--;
cfid->is_open = false;
cfid->on_list = false;
if (cfid->has_lease) {
/*
@@ -513,30 +544,51 @@ void invalidate_all_cached_dirs(struct cifs_tcon *tcon)
*/
cfid->has_lease = false;
} else
kref_get(&cfid->refcount);
}
+ /*
+ * Queue dropping of the dentries once locks have been dropped
+ */
+ if (!list_empty(&cfids->dying))
+ queue_work(cfid_put_wq, &cfids->invalidation_work);
spin_unlock(&cfids->cfid_list_lock);
-
- list_for_each_entry_safe(cfid, q, &entry, entry) {
- list_del(&cfid->entry);
- cancel_work_sync(&cfid->lease_break);
- /*
- * Drop the ref-count from above, either the lease-ref (if there
- * was one) or the extra one acquired.
- */
- kref_put(&cfid->refcount, smb2_close_cached_fid);
- }
}
static void
-smb2_cached_lease_break(struct work_struct *work)
+cached_dir_offload_close(struct work_struct *work)
{
struct cached_fid *cfid = container_of(work,
- struct cached_fid, lease_break);
+ struct cached_fid, close_work);
+ struct cifs_tcon *tcon = cfid->tcon;
+
+ WARN_ON(cfid->on_list);
kref_put(&cfid->refcount, smb2_close_cached_fid);
+ cifs_put_tcon(tcon, netfs_trace_tcon_ref_put_cached_close);
+}
+
+/*
+ * Release the cached directory's dentry, and then queue work to drop cached
+ * directory itself (closing on server if needed).
+ *
+ * Must be called with a reference to the cached_fid and a reference to the
+ * tcon.
+ */
+static void cached_dir_put_work(struct work_struct *work)
+{
+ struct cached_fid *cfid = container_of(work, struct cached_fid,
+ put_work);
+ struct dentry *dentry;
+
+ spin_lock(&cfid->fid_lock);
+ dentry = cfid->dentry;
+ cfid->dentry = NULL;
+ spin_unlock(&cfid->fid_lock);
+
+ dput(dentry);
+ queue_work(serverclose_wq, &cfid->close_work);
}
int cached_dir_lease_break(struct cifs_tcon *tcon, __u8 lease_key[16])
{
struct cached_fids *cfids = tcon->cfids;
@@ -559,12 +611,14 @@ int cached_dir_lease_break(struct cifs_tcon *tcon, __u8 lease_key[16])
*/
list_del(&cfid->entry);
cfid->on_list = false;
cfids->num_entries--;
- queue_work(cifsiod_wq,
- &cfid->lease_break);
+ ++tcon->tc_count;
+ trace_smb3_tcon_ref(tcon->debug_id, tcon->tc_count,
+ netfs_trace_tcon_ref_get_cached_lease_break);
+ queue_work(cfid_put_wq, &cfid->put_work);
spin_unlock(&cfids->cfid_list_lock);
return true;
}
}
spin_unlock(&cfids->cfid_list_lock);
@@ -582,11 +636,12 @@ static struct cached_fid *init_cached_dir(const char *path)
if (!cfid->path) {
kfree(cfid);
return NULL;
}
- INIT_WORK(&cfid->lease_break, smb2_cached_lease_break);
+ INIT_WORK(&cfid->close_work, cached_dir_offload_close);
+ INIT_WORK(&cfid->put_work, cached_dir_put_work);
INIT_LIST_HEAD(&cfid->entry);
INIT_LIST_HEAD(&cfid->dirents.entries);
mutex_init(&cfid->dirents.de_mutex);
spin_lock_init(&cfid->fid_lock);
kref_init(&cfid->refcount);
@@ -595,10 +650,13 @@ static struct cached_fid *init_cached_dir(const char *path)
static void free_cached_dir(struct cached_fid *cfid)
{
struct cached_dirent *dirent, *q;
+ WARN_ON(work_pending(&cfid->close_work));
+ WARN_ON(work_pending(&cfid->put_work));
+
dput(cfid->dentry);
cfid->dentry = NULL;
/*
* Delete all cached dirent names
@@ -612,14 +670,34 @@ static void free_cached_dir(struct cached_fid *cfid)
kfree(cfid->path);
cfid->path = NULL;
kfree(cfid);
}
+static void cfids_invalidation_worker(struct work_struct *work)
+{
+ struct cached_fids *cfids = container_of(work, struct cached_fids,
+ invalidation_work);
+ struct cached_fid *cfid, *q;
+ LIST_HEAD(entry);
+
+ spin_lock(&cfids->cfid_list_lock);
+ /* move cfids->dying to the local list */
+ list_cut_before(&entry, &cfids->dying, &cfids->dying);
+ spin_unlock(&cfids->cfid_list_lock);
+
+ list_for_each_entry_safe(cfid, q, &entry, entry) {
+ list_del(&cfid->entry);
+ /* Drop the ref-count acquired in invalidate_all_cached_dirs */
+ kref_put(&cfid->refcount, smb2_close_cached_fid);
+ }
+}
+
static void cfids_laundromat_worker(struct work_struct *work)
{
struct cached_fids *cfids;
struct cached_fid *cfid, *q;
+ struct dentry *dentry;
LIST_HEAD(entry);
cfids = container_of(work, struct cached_fids, laundromat_work.work);
spin_lock(&cfids->cfid_list_lock);
@@ -641,22 +719,32 @@ static void cfids_laundromat_worker(struct work_struct *work)
}
spin_unlock(&cfids->cfid_list_lock);
list_for_each_entry_safe(cfid, q, &entry, entry) {
list_del(&cfid->entry);
- /*
- * Cancel and wait for the work to finish in case we are racing
- * with it.
- */
- cancel_work_sync(&cfid->lease_break);
- /*
- * Drop the ref-count from above, either the lease-ref (if there
- * was one) or the extra one acquired.
- */
- kref_put(&cfid->refcount, smb2_close_cached_fid);
+
+ spin_lock(&cfid->fid_lock);
+ dentry = cfid->dentry;
+ cfid->dentry = NULL;
+ spin_unlock(&cfid->fid_lock);
+
+ dput(dentry);
+ if (cfid->is_open) {
+ spin_lock(&cifs_tcp_ses_lock);
+ ++cfid->tcon->tc_count;
+ trace_smb3_tcon_ref(cfid->tcon->debug_id, cfid->tcon->tc_count,
+ netfs_trace_tcon_ref_get_cached_laundromat);
+ spin_unlock(&cifs_tcp_ses_lock);
+ queue_work(serverclose_wq, &cfid->close_work);
+ } else
+ /*
+ * Drop the ref-count from above, either the lease-ref (if there
+ * was one) or the extra one acquired.
+ */
+ kref_put(&cfid->refcount, smb2_close_cached_fid);
}
- queue_delayed_work(cifsiod_wq, &cfids->laundromat_work,
+ queue_delayed_work(cfid_put_wq, &cfids->laundromat_work,
dir_cache_timeout * HZ);
}
struct cached_fids *init_cached_dirs(void)
{
@@ -665,13 +753,15 @@ struct cached_fids *init_cached_dirs(void)
cfids = kzalloc(sizeof(*cfids), GFP_KERNEL);
if (!cfids)
return NULL;
spin_lock_init(&cfids->cfid_list_lock);
INIT_LIST_HEAD(&cfids->entries);
+ INIT_LIST_HEAD(&cfids->dying);
+ INIT_WORK(&cfids->invalidation_work, cfids_invalidation_worker);
INIT_DELAYED_WORK(&cfids->laundromat_work, cfids_laundromat_worker);
- queue_delayed_work(cifsiod_wq, &cfids->laundromat_work,
+ queue_delayed_work(cfid_put_wq, &cfids->laundromat_work,
dir_cache_timeout * HZ);
return cfids;
}
@@ -686,17 +776,23 @@ void free_cached_dirs(struct cached_fids *cfids)
if (cfids == NULL)
return;
cancel_delayed_work_sync(&cfids->laundromat_work);
+ cancel_work_sync(&cfids->invalidation_work);
spin_lock(&cfids->cfid_list_lock);
list_for_each_entry_safe(cfid, q, &cfids->entries, entry) {
cfid->on_list = false;
cfid->is_open = false;
list_move(&cfid->entry, &entry);
}
+ list_for_each_entry_safe(cfid, q, &cfids->dying, entry) {
+ cfid->on_list = false;
+ cfid->is_open = false;
+ list_move(&cfid->entry, &entry);
+ }
spin_unlock(&cfids->cfid_list_lock);
list_for_each_entry_safe(cfid, q, &entry, entry) {
list_del(&cfid->entry);
free_cached_dir(cfid);
diff --git a/fs/smb/client/cached_dir.h b/fs/smb/client/cached_dir.h
index 81ba0fd5cc16..1dfe79d947a6 100644
--- a/fs/smb/client/cached_dir.h
+++ b/fs/smb/client/cached_dir.h
@@ -42,23 +42,27 @@ struct cached_fid {
struct kref refcount;
struct cifs_fid fid;
spinlock_t fid_lock;
struct cifs_tcon *tcon;
struct dentry *dentry;
- struct work_struct lease_break;
+ struct work_struct put_work;
+ struct work_struct close_work;
struct smb2_file_all_info file_all_info;
struct cached_dirents dirents;
};
/* default MAX_CACHED_FIDS is 16 */
struct cached_fids {
/* Must be held when:
* - accessing the cfids->entries list
+ * - accessing the cfids->dying list
*/
spinlock_t cfid_list_lock;
int num_entries;
struct list_head entries;
+ struct list_head dying;
+ struct work_struct invalidation_work;
struct delayed_work laundromat_work;
};
extern struct cached_fids *init_cached_dirs(void);
extern void free_cached_dirs(struct cached_fids *cfids);
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 20cafdff5081..bf909c2f6b96 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -155,10 +155,11 @@ struct workqueue_struct *cifsiod_wq;
struct workqueue_struct *decrypt_wq;
struct workqueue_struct *fileinfo_put_wq;
struct workqueue_struct *cifsoplockd_wq;
struct workqueue_struct *deferredclose_wq;
struct workqueue_struct *serverclose_wq;
+struct workqueue_struct *cfid_put_wq;
__u32 cifs_lock_secret;
/*
* Bumps refcount for cifs super block.
* Note that it should be only called if a reference to VFS super block is
@@ -1893,13 +1894,20 @@ init_cifs(void)
if (!serverclose_wq) {
rc = -ENOMEM;
goto out_destroy_deferredclose_wq;
}
- rc = cifs_init_inodecache();
- if (rc)
+ cfid_put_wq = alloc_workqueue("cfid_put_wq",
+ WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
+ if (!cfid_put_wq) {
+ rc = -ENOMEM;
goto out_destroy_serverclose_wq;
+ }
+
+ rc = cifs_init_inodecache();
+ if (rc)
+ goto out_destroy_cfid_put_wq;
rc = cifs_init_netfs();
if (rc)
goto out_destroy_inodecache;
@@ -1963,10 +1971,12 @@ init_cifs(void)
destroy_mids();
out_destroy_netfs:
cifs_destroy_netfs();
out_destroy_inodecache:
cifs_destroy_inodecache();
+out_destroy_cfid_put_wq:
+ destroy_workqueue(cfid_put_wq);
out_destroy_serverclose_wq:
destroy_workqueue(serverclose_wq);
out_destroy_deferredclose_wq:
destroy_workqueue(deferredclose_wq);
out_destroy_cifsoplockd_wq:
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 5041b1ffc244..31ea19e7b998 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -1981,11 +1981,11 @@ require use of the stronger protocol */
* cifsInodeInfo->open_file_lock cifsInodeInfo->openFileList cifs_alloc_inode
* cifsInodeInfo->writers_lock cifsInodeInfo->writers cifsInodeInfo_alloc
* cifsInodeInfo->lock_sem cifsInodeInfo->llist cifs_init_once
* ->can_cache_brlcks
* cifsInodeInfo->deferred_lock cifsInodeInfo->deferred_closes cifsInodeInfo_alloc
- * cached_fid->fid_mutex cifs_tcon->crfid tcon_info_alloc
+ * cached_fids->cfid_list_lock cifs_tcon->cfids->entries init_cached_dirs
* cifsFileInfo->fh_mutex cifsFileInfo cifs_new_fileinfo
* cifsFileInfo->file_info_lock cifsFileInfo->count cifs_new_fileinfo
* ->invalidHandle initiate_cifs_search
* ->oplock_break_cancelled
****************************************************************************/
@@ -2069,10 +2069,11 @@ extern struct workqueue_struct *cifsiod_wq;
extern struct workqueue_struct *decrypt_wq;
extern struct workqueue_struct *fileinfo_put_wq;
extern struct workqueue_struct *cifsoplockd_wq;
extern struct workqueue_struct *deferredclose_wq;
extern struct workqueue_struct *serverclose_wq;
+extern struct workqueue_struct *cfid_put_wq;
extern __u32 cifs_lock_secret;
extern mempool_t *cifs_sm_req_poolp;
extern mempool_t *cifs_req_poolp;
extern mempool_t *cifs_mid_poolp;
diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c
index eff3f57235ee..20484853fb6b 100644
--- a/fs/smb/client/inode.c
+++ b/fs/smb/client/inode.c
@@ -2471,17 +2471,14 @@ cifs_dentry_needs_reval(struct dentry *dentry)
if (!lookupCacheEnabled)
return true;
if (!open_cached_dir_by_dentry(tcon, dentry->d_parent, &cfid)) {
- spin_lock(&cfid->fid_lock);
if (cfid->time && cifs_i->time > cfid->time) {
- spin_unlock(&cfid->fid_lock);
close_cached_dir(cfid);
return false;
}
- spin_unlock(&cfid->fid_lock);
close_cached_dir(cfid);
}
/*
* depending on inode type, check if attribute caching disabled for
* files or directories
diff --git a/fs/smb/client/trace.h b/fs/smb/client/trace.h
index 0b52d22a91a0..12cbd3428a6d 100644
--- a/fs/smb/client/trace.h
+++ b/fs/smb/client/trace.h
@@ -42,18 +42,21 @@
EM(netfs_trace_tcon_ref_free, "FRE ") \
EM(netfs_trace_tcon_ref_free_fail, "FRE Fail ") \
EM(netfs_trace_tcon_ref_free_ipc, "FRE Ipc ") \
EM(netfs_trace_tcon_ref_free_ipc_fail, "FRE Ipc-F ") \
EM(netfs_trace_tcon_ref_free_reconnect_server, "FRE Reconn") \
+ EM(netfs_trace_tcon_ref_get_cached_laundromat, "GET Ch-Lau") \
+ EM(netfs_trace_tcon_ref_get_cached_lease_break, "GET Ch-Lea") \
EM(netfs_trace_tcon_ref_get_cancelled_close, "GET Cn-Cls") \
EM(netfs_trace_tcon_ref_get_dfs_refer, "GET DfsRef") \
EM(netfs_trace_tcon_ref_get_find, "GET Find ") \
EM(netfs_trace_tcon_ref_get_find_sess_tcon, "GET FndSes") \
EM(netfs_trace_tcon_ref_get_reconnect_server, "GET Reconn") \
EM(netfs_trace_tcon_ref_new, "NEW ") \
EM(netfs_trace_tcon_ref_new_ipc, "NEW Ipc ") \
EM(netfs_trace_tcon_ref_new_reconnect_server, "NEW Reconn") \
+ EM(netfs_trace_tcon_ref_put_cached_close, "PUT Ch-Cls") \
EM(netfs_trace_tcon_ref_put_cancelled_close, "PUT Cn-Cls") \
EM(netfs_trace_tcon_ref_put_cancelled_close_fid, "PUT Cn-Fid") \
EM(netfs_trace_tcon_ref_put_cancelled_mid, "PUT Cn-Mid") \
EM(netfs_trace_tcon_ref_put_mnt_ctx, "PUT MntCtx") \
EM(netfs_trace_tcon_ref_put_reconnect_server, "PUT Reconn") \
--
2.45.2
* Re: [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry
2024-11-18 21:50 ` [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry Paul Aurich
@ 2024-11-22 2:05 ` Paulo Alcantara
2024-11-23 3:28 ` Paul Aurich
0 siblings, 1 reply; 20+ messages in thread
From: Paulo Alcantara @ 2024-11-22 2:05 UTC (permalink / raw)
To: Paul Aurich, linux-cifs, Steve French
Cc: paul, Ronnie Sahlberg, Shyam Prasad N, Tom Talpey, Bharath SM
Hi Paul,
Thanks for looking into this! Really appreciate it.
Paul Aurich <paul@darkrain42.org> writes:
> The unmount process (cifs_kill_sb() calling close_all_cached_dirs()) can
> race with various cached directory operations, which ultimately results
> in dentries not being dropped and these kernel BUGs:
>
> BUG: Dentry ffff88814f37e358{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
> VFS: Busy inodes after unmount of cifs (cifs)
> ------------[ cut here ]------------
> kernel BUG at fs/super.c:661!
>
> This happens when a cfid is in the process of being cleaned up and has
> been removed from the cfids->entries list, in cases including:
>
> - Receiving a lease break from the server
> - Server reconnection triggers invalidate_all_cached_dirs(), which
> removes all the cfids from the list
> - The laundromat thread decides to expire an old cfid.
>
> To solve these problems, dropping the dentry is done in work queued on
> a newly-added cfid_put_wq workqueue, and close_all_cached_dirs()
> flushes that workqueue after it drops all the dentries of which it's
> aware. This is a global workqueue (rather than scoped to a mount), but
> the queued work is minimal.
Why does it need to be a global workqueue? Can't you make it per tcon?
> The final cleanup work for cleaning up a cfid is performed via work
> queued in the serverclose_wq workqueue; this is done separate from
> dropping the dentries so that close_all_cached_dirs() doesn't block on
> any server operations.
>
> Both of these queued works expect to be invoked with a cfid reference and
> a tcon reference, to keep those objects from being freed while the work
> is ongoing.
Why do you need to take a tcon reference? Can't you drop the dentries
when tearing down tcon in cifs_put_tcon()? No concurrent mounts would
be able to access or free it.
After running xfstests I've seen a leaked tcon in
/proc/fs/cifs/DebugData with no CIFS superblocks, which might be related
to this.
Could you please check if there is any leaked connection in
/proc/fs/cifs/DebugData after running your tests?
* Re: [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry
2024-11-22 2:05 ` Paulo Alcantara
@ 2024-11-23 3:28 ` Paul Aurich
2024-11-26 21:37 ` Paul Aurich
2024-11-27 17:36 ` Paulo Alcantara
0 siblings, 2 replies; 20+ messages in thread
From: Paul Aurich @ 2024-11-23 3:28 UTC (permalink / raw)
To: Paulo Alcantara
Cc: linux-cifs, Steve French, Ronnie Sahlberg, Shyam Prasad N,
Tom Talpey, Bharath SM
On 2024-11-21 23:05:51 -0300, Paulo Alcantara wrote:
>Hi Paul,
>
>Thanks for looking into this! Really appreciate it.
>
>Paul Aurich <paul@darkrain42.org> writes:
>
>> The unmount process (cifs_kill_sb() calling close_all_cached_dirs()) can
>> race with various cached directory operations, which ultimately results
>> in dentries not being dropped and these kernel BUGs:
>>
>> BUG: Dentry ffff88814f37e358{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
>> VFS: Busy inodes after unmount of cifs (cifs)
>> ------------[ cut here ]------------
>> kernel BUG at fs/super.c:661!
>>
>> This happens when a cfid is in the process of being cleaned up and has
>> been removed from the cfids->entries list, in cases including:
>>
>> - Receiving a lease break from the server
>> - Server reconnection triggers invalidate_all_cached_dirs(), which
>> removes all the cfids from the list
>> - The laundromat thread decides to expire an old cfid.
>>
>> To solve these problems, dropping the dentry is done in work queued on
>> a newly-added cfid_put_wq workqueue, and close_all_cached_dirs()
>> flushes that workqueue after it drops all the dentries of which it's
>> aware. This is a global workqueue (rather than scoped to a mount), but
>> the queued work is minimal.
>
>Why does it need to be a global workqueue? Can't you make it per tcon?
The problem with a per-tcon workqueue is that I didn't see a clean way to deal with
multiuser mounts and flushing the workqueue in close_all_cached_dirs() -- when
dealing with each individual tcon, we're still holding tlink_tree_lock, so an
arbitrary sleep seems problematic.
There could be a per-sb workqueue (stored in cifs_sb or the master tcon) but
is there a way to get back to the superblock / master tcon with just a tcon
(e.g. cached_dir_lease_break, when processing a lease break)?
>> The final cleanup work for cleaning up a cfid is performed via work
>> queued in the serverclose_wq workqueue; this is done separate from
>> dropping the dentries so that close_all_cached_dirs() doesn't block on
>> any server operations.
>>
>> Both of these queued works expect to be invoked with a cfid reference and
>> a tcon reference, to keep those objects from being freed while the work
>> is ongoing.
>
>Why do you need to take a tcon reference?
In the existing code (and my patch, without the refs), I was seeing an
intermittent use-after-free of the tcon or cached_fids struct by queued work
processing a lease break -- the cfid isn't linked from cached_fids, but
smb2_close_cached_fid invoking SMB2_close can race with the unmount and
cifs_put_tcon.
Something like:
t1 t2
cached_dir_lease_break
smb2_cached_lease_break
smb2_close_cached_fid
SMB2_close starts
cifs_kill_sb
cifs_umount
cifs_put_link
cifs_put_tcon
SMB2_close continues
I had a version of the patch that kept the 'in flight lease breaks' on
a second list in cached_fids so that they could be cancelled synchronously
from free_cached_fids(), but I struggled with it (I can't remember exactly,
but I think I was struggling to get the linked list membership / removal
handling and num_entries handling consistent).
> Can't you drop the dentries
>when tearing down tcon in cifs_put_tcon()? No concurrent mounts would
>be able to access or free it.
The dentries being dropped must occur before kill_anon_super(), as that's
where the 'Dentry still in use' check is. All the tcons are put in
cifs_umount(), which occurs after:
kill_anon_super(sb);
cifs_umount(cifs_sb);
The other thing is that cifs_umount_begin() has this comment, which made me
think a tcon can actually be tied to two distinct mount points:
if ((tcon->tc_count > 1) || (tcon->status == TID_EXITING)) {
/* we have other mounts to same share or we have
already tried to umount this and woken up
all waiting network requests, nothing to do */
Although, as I'm thinking about it again, I think I've misunderstood (and that
comment is wrong?).
It did cross my mind to pull some of the work out of cifs_umount into
cifs_kill_sb (specifically, I wanted to cancel prune_tlinks earlier) -- no
prune_tlinks would make it more feasible to drop tlink_tree_lock in
close_all_cached_dirs(), at which point a per-tcon workqueue is more
practical.
>After running xfstests I've seen a leaked tcon in
>/proc/fs/cifs/DebugData with no CIFS superblocks, which might be related
>to this.
>
>Could you please check if there is any leaked connection in
>/proc/fs/cifs/DebugData after running your tests?
After I finish with my tests (I'm not using xfstests, although perhaps
I should be) and unmount the share, DebugData doesn't show any connections for
me.
~Paul
* Re: [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry
2024-11-23 3:28 ` Paul Aurich
@ 2024-11-26 21:37 ` Paul Aurich
2024-11-27 16:38 ` Steve French
2024-11-27 17:36 ` Paulo Alcantara
1 sibling, 1 reply; 20+ messages in thread
From: Paul Aurich @ 2024-11-26 21:37 UTC (permalink / raw)
To: Paulo Alcantara, linux-cifs, Steve French, Ronnie Sahlberg,
Shyam Prasad N, Tom Talpey, Bharath SM
[-- Attachment #1: Type: text/plain, Size: 5315 bytes --]
On 2024-11-22 19:28:34 -0800, Paul Aurich wrote:
>On 2024-11-21 23:05:51 -0300, Paulo Alcantara wrote:
>>Hi Paul,
>>
>>Thanks for looking into this! Really appreciate it.
>>
>>Paul Aurich <paul@darkrain42.org> writes:
>>
>>>The unmount process (cifs_kill_sb() calling close_all_cached_dirs()) can
>>>race with various cached directory operations, which ultimately results
>>>in dentries not being dropped and these kernel BUGs:
>>>
>>>BUG: Dentry ffff88814f37e358{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
>>>VFS: Busy inodes after unmount of cifs (cifs)
>>>------------[ cut here ]------------
>>>kernel BUG at fs/super.c:661!
>>>
>>>This happens when a cfid is in the process of being cleaned up and has
>>>been removed from the cfids->entries list, in cases including:
>>>
>>>- Receiving a lease break from the server
>>>- Server reconnection triggers invalidate_all_cached_dirs(), which
>>> removes all the cfids from the list
>>>- The laundromat thread decides to expire an old cfid.
>>>
>>>To solve these problems, dropping the dentry is done in work queued on
>>>a newly-added cfid_put_wq workqueue, and close_all_cached_dirs()
>>>flushes that workqueue after it drops all the dentries of which it's
>>>aware. This is a global workqueue (rather than scoped to a mount), but
>>>the queued work is minimal.
>>
>>Why does it need to be a global workqueue? Can't you make it per tcon?
>
>The problem with a per-tcon workqueue is that I didn't see a clean way to
>deal with multiuser mounts and flushing the workqueue in
>close_all_cached_dirs() -- when dealing with each individual tcon,
>we're still holding tlink_tree_lock, so an arbitrary sleep seems
>problematic.
>
>There could be a per-sb workqueue (stored in cifs_sb or the master
>tcon) but is there a way to get back to the superblock / master tcon
>with just a tcon (e.g. cached_dir_lease_break, when processing a lease
>break)?
>
>>>The final cleanup work for cleaning up a cfid is performed via work
>>>queued in the serverclose_wq workqueue; this is done separate from
>>>dropping the dentries so that close_all_cached_dirs() doesn't block on
>>>any server operations.
>>>
>>>Both of these queued works expect to be invoked with a cfid reference and
>>>a tcon reference, to keep those objects from being freed while the work
>>>is ongoing.
>>
>>Why do you need to take a tcon reference?
>
>In the existing code (and my patch, without the refs), I was seeing an
>intermittent use-after-free of the tcon or cached_fids struct by
>queued work processing a lease break -- the cfid isn't linked from
>cached_fids, but smb2_close_cached_fid invoking SMB2_close can race
>with the unmount and cifs_put_tcon
>
>Something like:
>
> t1 t2
>cached_dir_lease_break
>smb2_cached_lease_break
>smb2_close_cached_fid
>SMB2_close starts
> cifs_kill_sb
> cifs_umount
> cifs_put_link
> cifs_put_tcon
>SMB2_close continues
>
>I had a version of the patch that kept the 'in flight lease breaks' on
>a second list in cached_fids so that they could be cancelled
>synchronously from free_cached_fids(), but I struggled with it (I
>can't remember exactly, but I think I was struggling to get the linked
>list membership / removal handling and num_entries handling
>consistent).
>
>>Can't you drop the dentries
>>when tearing down tcon in cifs_put_tcon()? No concurrent mounts would
>>be able to access or free it.
>
>The dentries being dropped must occur before kill_anon_super(), as
>that's where the 'Dentry still in use' check is. All the tcons are put
>in cifs_umount(), which occurs after:
>
> kill_anon_super(sb);
> cifs_umount(cifs_sb);
>
>The other thing is that cifs_umount_begin() has this comment, which
>made me think a tcon can actually be tied to two distinct mount
>points:
>
> if ((tcon->tc_count > 1) || (tcon->status == TID_EXITING)) {
> /* we have other mounts to same share or we have
> already tried to umount this and woken up
> all waiting network requests, nothing to do */
>
>Although, as I'm thinking about it again, I think I've misunderstood
>(and that comment is wrong?).
>
>It did cross my mind to pull some of the work out of cifs_umount into
>cifs_kill_sb (specifically, I wanted to cancel prune_tlinks earlier)
>-- no prune_tlinks would make it more feasible to drop tlink_tree_lock
>in close_all_cached_dirs(), at which point a per-tcon workqueue is
>more practical.
>
>>After running xfstests I've seen a leaked tcon in
>>/proc/fs/cifs/DebugData with no CIFS superblocks, which might be related
>>to this.
>>
>>Could you please check if there is any leaked connection in
>>/proc/fs/cifs/DebugData after running your tests?
>
>After I finish with my tests (I'm not using xfstests, although perhaps
>I should be) and unmount the share, DebugData doesn't show any
>connections for me.
I was able to reproduce this leak. I believe the attached patch addresses it.
I'm able to intermittently see a 'Dentry still in use' bug with xfstests
generic/241 (what Steve saw; the attached patch doesn't help with that). I'm
still unsure what's going on there.
>~Paul
[-- Attachment #2: 0001-smb-Initialize-cfid-tcon-before-performing-network-o.patch --]
[-- Type: text/x-diff, Size: 1485 bytes --]
From cc5f87508d85e5bb8e855b9f851683838a3b5425 Mon Sep 17 00:00:00 2001
From: Paul Aurich <paul@darkrain42.org>
Date: Tue, 26 Nov 2024 13:27:29 -0800
Subject: [PATCH] smb: Initialize cfid->tcon before performing network ops
Avoid leaking a tcon ref when a lease break races with opening the
cached directory. Processing the lease break might take a reference to
the tcon in cached_dir_lease_break() and then fail to release the ref in
cached_dir_offload_close, since cfid->tcon is still NULL.
Signed-off-by: Paul Aurich <paul@darkrain42.org>
---
fs/smb/client/cached_dir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index d9e1d1dc6178..fe738623cf1b 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -227,10 +227,11 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
rc = -ENOENT;
goto out;
}
}
cfid->dentry = dentry;
+ cfid->tcon = tcon;
/*
* We do not hold the lock for the open because in case
* SMB2_open needs to reconnect.
* This is safe because no other thread will be able to get a ref
@@ -298,11 +299,10 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
pr_warn_once("server share %s deleted\n",
tcon->tree_name);
}
goto oshr_free;
}
- cfid->tcon = tcon;
cfid->is_open = true;
spin_lock(&cfids->cfid_list_lock);
o_rsp = (struct smb2_create_rsp *)rsp_iov[0].iov_base;
--
2.45.2
* Re: [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry
2024-11-26 21:37 ` Paul Aurich
@ 2024-11-27 16:38 ` Steve French
2024-11-27 17:04 ` Enzo Matsumiya
2024-11-28 1:10 ` Steve French
0 siblings, 2 replies; 20+ messages in thread
From: Steve French @ 2024-11-27 16:38 UTC (permalink / raw)
To: Paulo Alcantara, linux-cifs, Steve French, Ronnie Sahlberg,
Shyam Prasad N, Tom Talpey, Bharath SM
I did see the generic/241 failure again with current for-next
(unrelated to this patch though). Will try to repro it again - but
any ideas how to narrow it down or fix it would be helpful.
SECTION -- smb3
FSTYP -- cifs
PLATFORM -- Linux/x86_64 fedora29 6.12.0 #1 SMP PREEMPT_DYNAMIC Wed
Nov 27 01:02:07 UTC 2024
MKFS_OPTIONS -- //win16.vm.test/Scratch
generic/241 73s
Ran: generic/241
Passed all 1 tests
SECTION -- smb3
=========================
Ran: generic/241
Passed all 1 tests
Number of reconnects: 0
Test completed smb3 generic/241 at Wed Nov 27 06:38:47 AM UTC 2024
dmesg output during the test:
[Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Share
[Wed Nov 27 00:37:32 2024] CIFS: VFS: generate_smb3signingkey: dumping
generated AES session keys
[Wed Nov 27 00:37:32 2024] CIFS: VFS: Session Id 45 00 00 08 00 c8 00 00
[Wed Nov 27 00:37:32 2024] CIFS: VFS: Cipher type 2
[Wed Nov 27 00:37:32 2024] CIFS: VFS: Session Key 00 bf ed c7 f1 95 0e
29 06 e8 82 87 b5 c8 72 06
[Wed Nov 27 00:37:32 2024] CIFS: VFS: Signing Key a4 0f 15 64 d2 69 02
2f 4e 78 60 7a fe 3e 31 4e
[Wed Nov 27 00:37:32 2024] CIFS: VFS: ServerIn Key a6 fd 04 f6 04 ea
0e 6e 60 c0 1b b1 ee 63 38 e9
[Wed Nov 27 00:37:32 2024] CIFS: VFS: ServerOut Key a6 e3 e3 22 8c c2
b0 6e b1 9d 40 ea d0 89 6d d8
[Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Scratch
[Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Scratch
[Wed Nov 27 00:37:32 2024] run fstests generic/241 at 2024-11-27 00:37:33
[Wed Nov 27 00:38:46 2024] ------------[ cut here ]------------
[Wed Nov 27 00:38:46 2024] BUG: Dentry
00000000318d67d4{i=11000000033f68,n=~dmtmp} still in use (1) [unmount
of cifs cifs]
[Wed Nov 27 00:38:46 2024] WARNING: CPU: 2 PID: 316177 at
fs/dcache.c:1546 umount_check+0xc3/0xf0
[Wed Nov 27 00:38:46 2024] Modules linked in: cifs ccm cmac nls_utf8
cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat
ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
iptable_security ip_set ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net
net_failover failover virtio_balloon loop fuse dm_multipath nfnetlink
zram xfs bochs drm_client_lib drm_shmem_helper drm_kms_helper
crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm
sha512_ssse3 sha256_ssse3 sha1_ssse3 floppy virtio_blk qemu_fw_cfg
virtio_console [last unloaded: cifs]
[Wed Nov 27 00:38:46 2024] CPU: 2 UID: 0 PID: 316177 Comm: umount Not
tainted 6.12.0 #1
[Wed Nov 27 00:38:46 2024] Hardware name: Red Hat KVM, BIOS
1.16.3-2.el9 04/01/2014
[Wed Nov 27 00:38:46 2024] RIP: 0010:umount_check+0xc3/0xf0
[Wed Nov 27 00:38:46 2024] Code: db 74 0d 48 8d 7b 40 e8 db df f5 ff
48 8b 53 40 41 55 4d 89 f1 45 89 e0 48 89 e9 48 89 ee 48 c7 c7 80 99
ba ad e8 2d 27 a2 ff <0f> 0b 58 31 c0 5b 5d 41 5c 41 5d 41 5e c3 cc cc
cc cc 41 83 fc 01
[Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fd20 EFLAGS: 00010282
[Wed Nov 27 00:38:46 2024] RAX: dffffc0000000000 RBX: ff1100010c574ce0
RCX: 0000000000000027
[Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
RDI: ff110004cb131a48
[Wed Nov 27 00:38:46 2024] RBP: ff1100012c76bd60 R08: ffffffffac3fd2fe
R09: ffe21c0099626349
[Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
R12: 0000000000000001
[Wed Nov 27 00:38:46 2024] R13: ff110001238b6668 R14: ffffffffc1d6e6c0
R15: ff1100012c76be18
[Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
GS:ff110004cb100000(0000) knlGS:0000000000000000
[Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
CR4: 0000000000373ef0
[Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
[Wed Nov 27 00:38:46 2024] Call Trace:
[Wed Nov 27 00:38:46 2024] <TASK>
[Wed Nov 27 00:38:46 2024] ? __warn+0xa9/0x220
[Wed Nov 27 00:38:46 2024] ? umount_check+0xc3/0xf0
[Wed Nov 27 00:38:46 2024] ? report_bug+0x1d4/0x1e0
[Wed Nov 27 00:38:46 2024] ? handle_bug+0x5b/0xa0
[Wed Nov 27 00:38:46 2024] ? exc_invalid_op+0x18/0x50
[Wed Nov 27 00:38:46 2024] ? asm_exc_invalid_op+0x1a/0x20
[Wed Nov 27 00:38:46 2024] ? irq_work_claim+0x1e/0x40
[Wed Nov 27 00:38:46 2024] ? umount_check+0xc3/0xf0
[Wed Nov 27 00:38:46 2024] ? __pfx_umount_check+0x10/0x10
[Wed Nov 27 00:38:46 2024] d_walk+0xf3/0x4e0
[Wed Nov 27 00:38:46 2024] ? d_walk+0x4b/0x4e0
[Wed Nov 27 00:38:46 2024] shrink_dcache_for_umount+0x6d/0x220
[Wed Nov 27 00:38:46 2024] generic_shutdown_super+0x4a/0x1c0
[Wed Nov 27 00:38:46 2024] kill_anon_super+0x22/0x40
[Wed Nov 27 00:38:46 2024] cifs_kill_sb+0x78/0x90 [cifs]
[Wed Nov 27 00:38:46 2024] deactivate_locked_super+0x69/0xf0
[Wed Nov 27 00:38:46 2024] cleanup_mnt+0x195/0x200
[Wed Nov 27 00:38:46 2024] task_work_run+0xec/0x150
[Wed Nov 27 00:38:46 2024] ? __pfx_task_work_run+0x10/0x10
[Wed Nov 27 00:38:46 2024] ? mark_held_locks+0x24/0x90
[Wed Nov 27 00:38:46 2024] syscall_exit_to_user_mode+0x269/0x2a0
[Wed Nov 27 00:38:46 2024] do_syscall_64+0x81/0x180
[Wed Nov 27 00:38:46 2024] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Wed Nov 27 00:38:46 2024] RIP: 0033:0x7fddc1ff43eb
[Wed Nov 27 00:38:46 2024] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f
1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa
b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15
f9 a9 0c 00 f7 d8
[Wed Nov 27 00:38:46 2024] RSP: 002b:00007ffe64be88d8 EFLAGS: 00000246
ORIG_RAX: 00000000000000a6
[Wed Nov 27 00:38:46 2024] RAX: 0000000000000000 RBX: 00005632106d4c20
RCX: 00007fddc1ff43eb
[Wed Nov 27 00:38:46 2024] RDX: 0000000000000000 RSI: 0000000000000000
RDI: 00005632106d9410
[Wed Nov 27 00:38:46 2024] RBP: 00007ffe64be89b0 R08: 00005632106d4010
R09: 0000000000000007
[Wed Nov 27 00:38:46 2024] R10: 0000000000000000 R11: 0000000000000246
R12: 00005632106d4d28
[Wed Nov 27 00:38:46 2024] R13: 0000000000000000 R14: 00005632106d9410
R15: 00005632106d5030
[Wed Nov 27 00:38:46 2024] </TASK>
[Wed Nov 27 00:38:46 2024] irq event stamp: 8317
[Wed Nov 27 00:38:46 2024] hardirqs last enabled at (8323):
[<ffffffffac230dce>] __up_console_sem+0x5e/0x70
[Wed Nov 27 00:38:46 2024] hardirqs last disabled at (8328):
[<ffffffffac230db3>] __up_console_sem+0x43/0x70
[Wed Nov 27 00:38:46 2024] softirqs last enabled at (6628):
[<ffffffffac135745>] __irq_exit_rcu+0x135/0x160
[Wed Nov 27 00:38:46 2024] softirqs last disabled at (6539):
[<ffffffffac135745>] __irq_exit_rcu+0x135/0x160
[Wed Nov 27 00:38:46 2024] ---[ end trace 0000000000000000 ]---
[Wed Nov 27 00:38:46 2024] VFS: Busy inodes after unmount of cifs (cifs)
[Wed Nov 27 00:38:46 2024] ------------[ cut here ]------------
[Wed Nov 27 00:38:46 2024] kernel BUG at fs/super.c:650!
[Wed Nov 27 00:38:46 2024] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
KASAN NOPTI
[Wed Nov 27 00:38:46 2024] CPU: 2 UID: 0 PID: 316177 Comm: umount
Tainted: G W 6.12.0 #1
[Wed Nov 27 00:38:46 2024] Tainted: [W]=WARN
[Wed Nov 27 00:38:46 2024] Hardware name: Red Hat KVM, BIOS
1.16.3-2.el9 04/01/2014
[Wed Nov 27 00:38:46 2024] RIP: 0010:generic_shutdown_super+0x1b7/0x1c0
[Wed Nov 27 00:38:46 2024] Code: 7b 28 e8 5c ca f8 ff 48 8b 6b 28 48
89 ef e8 50 ca f8 ff 48 8b 55 00 48 8d b3 68 06 00 00 48 c7 c7 e0 38
ba ad e8 d9 c1 b5 ff <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90
90 90 90 90 90 90
[Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fdf0 EFLAGS: 00010282
[Wed Nov 27 00:38:46 2024] RAX: 000000000000002d RBX: ff110001238b6000
RCX: 0000000000000027
[Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
RDI: ff110004cb131a48
[Wed Nov 27 00:38:46 2024] RBP: ffffffffc1c6ac00 R08: ffffffffac3fd2fe
R09: ffe21c0099626349
[Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
R12: ff110001238b69c0
[Wed Nov 27 00:38:46 2024] R13: ff110001238b6780 R14: 1fe220002432ffd4
R15: 0000000000000000
[Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
GS:ff110004cb100000(0000) knlGS:0000000000000000
[Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
CR4: 0000000000373ef0
[Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
[Wed Nov 27 00:38:46 2024] Call Trace:
[Wed Nov 27 00:38:46 2024] <TASK>
[Wed Nov 27 00:38:46 2024] ? die+0x37/0x90
[Wed Nov 27 00:38:46 2024] ? do_trap+0x133/0x230
[Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
[Wed Nov 27 00:38:46 2024] ? do_error_trap+0x94/0x130
[Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
[Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
[Wed Nov 27 00:38:46 2024] ? handle_invalid_op+0x2c/0x40
[Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
[Wed Nov 27 00:38:46 2024] ? exc_invalid_op+0x2f/0x50
[Wed Nov 27 00:38:46 2024] ? asm_exc_invalid_op+0x1a/0x20
[Wed Nov 27 00:38:46 2024] ? irq_work_claim+0x1e/0x40
[Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
[Wed Nov 27 00:38:46 2024] kill_anon_super+0x22/0x40
[Wed Nov 27 00:38:46 2024] cifs_kill_sb+0x78/0x90 [cifs]
[Wed Nov 27 00:38:46 2024] deactivate_locked_super+0x69/0xf0
[Wed Nov 27 00:38:46 2024] cleanup_mnt+0x195/0x200
[Wed Nov 27 00:38:46 2024] task_work_run+0xec/0x150
[Wed Nov 27 00:38:46 2024] ? __pfx_task_work_run+0x10/0x10
[Wed Nov 27 00:38:46 2024] ? mark_held_locks+0x24/0x90
[Wed Nov 27 00:38:46 2024] syscall_exit_to_user_mode+0x269/0x2a0
[Wed Nov 27 00:38:46 2024] do_syscall_64+0x81/0x180
[Wed Nov 27 00:38:46 2024] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Wed Nov 27 00:38:46 2024] RIP: 0033:0x7fddc1ff43eb
[Wed Nov 27 00:38:46 2024] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f
1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa
b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15
f9 a9 0c 00 f7 d8
[Wed Nov 27 00:38:46 2024] RSP: 002b:00007ffe64be88d8 EFLAGS: 00000246
ORIG_RAX: 00000000000000a6
[Wed Nov 27 00:38:46 2024] RAX: 0000000000000000 RBX: 00005632106d4c20
RCX: 00007fddc1ff43eb
[Wed Nov 27 00:38:46 2024] RDX: 0000000000000000 RSI: 0000000000000000
RDI: 00005632106d9410
[Wed Nov 27 00:38:46 2024] RBP: 00007ffe64be89b0 R08: 00005632106d4010
R09: 0000000000000007
[Wed Nov 27 00:38:46 2024] R10: 0000000000000000 R11: 0000000000000246
R12: 00005632106d4d28
[Wed Nov 27 00:38:46 2024] R13: 0000000000000000 R14: 00005632106d9410
R15: 00005632106d5030
[Wed Nov 27 00:38:46 2024] </TASK>
[Wed Nov 27 00:38:46 2024] Modules linked in: cifs ccm cmac nls_utf8
cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat
ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
iptable_security ip_set ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net
net_failover failover virtio_balloon loop fuse dm_multipath nfnetlink
zram xfs bochs drm_client_lib drm_shmem_helper drm_kms_helper
crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm
sha512_ssse3 sha256_ssse3 sha1_ssse3 floppy virtio_blk qemu_fw_cfg
virtio_console [last unloaded: cifs]
[Wed Nov 27 00:38:46 2024] ---[ end trace 0000000000000000 ]---
[Wed Nov 27 00:38:46 2024] RIP: 0010:generic_shutdown_super+0x1b7/0x1c0
[Wed Nov 27 00:38:46 2024] Code: 7b 28 e8 5c ca f8 ff 48 8b 6b 28 48
89 ef e8 50 ca f8 ff 48 8b 55 00 48 8d b3 68 06 00 00 48 c7 c7 e0 38
ba ad e8 d9 c1 b5 ff <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90
90 90 90 90 90 90
[Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fdf0 EFLAGS: 00010282
[Wed Nov 27 00:38:46 2024] RAX: 000000000000002d RBX: ff110001238b6000
RCX: 0000000000000027
[Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
RDI: ff110004cb131a48
[Wed Nov 27 00:38:46 2024] RBP: ffffffffc1c6ac00 R08: ffffffffac3fd2fe
R09: ffe21c0099626349
[Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
R12: ff110001238b69c0
[Wed Nov 27 00:38:46 2024] R13: ff110001238b6780 R14: 1fe220002432ffd4
R15: 0000000000000000
[Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
GS:ff110004cb100000(0000) knlGS:0000000000000000
[Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
CR4: 0000000000373ef0
[Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
On Tue, Nov 26, 2024 at 3:50 PM Paul Aurich <paul@darkrain42.org> wrote:
>
> On 2024-11-22 19:28:34 -0800, Paul Aurich wrote:
> >On 2024-11-21 23:05:51 -0300, Paulo Alcantara wrote:
> >>Hi Paul,
> >>
> >>Thanks for looking into this! Really appreciate it.
> >>
> >>Paul Aurich <paul@darkrain42.org> writes:
> >>
> >>>The unmount process (cifs_kill_sb() calling close_all_cached_dirs()) can
> >>>race with various cached directory operations, which ultimately results
> >>>in dentries not being dropped and these kernel BUGs:
> >>>
> >>>BUG: Dentry ffff88814f37e358{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
> >>>VFS: Busy inodes after unmount of cifs (cifs)
> >>>------------[ cut here ]------------
> >>>kernel BUG at fs/super.c:661!
> >>>
> >>>This happens when a cfid is in the process of being cleaned up and
> >>>has been removed from the cfids->entries list, including:
> >>>
> >>>- Receiving a lease break from the server
> >>>- Server reconnection triggers invalidate_all_cached_dirs(), which
> >>> removes all the cfids from the list
> >>>- The laundromat thread decides to expire an old cfid.
> >>>
> >>>To solve these problems, dropping the dentry is deferred to work queued
> >>>on a newly-added cfid_put_wq workqueue, and close_all_cached_dirs()
> >>>flushes that workqueue after it drops all the dentries of which it's
> >>>aware. This is a global workqueue (rather than scoped to a mount), but
> >>>the queued work is minimal.
> >>
> >>Why does it need to be a global workqueue? Can't you make it per tcon?
> >
> >The problem with a per-tcon workqueue is that I didn't see a clean way to
> >deal with multiuser mounts and flushing the workqueue in
> >close_all_cached_dirs() -- when dealing with each individual tcon,
> >we're still holding tlink_tree_lock, so an arbitrary sleep seems
> >problematic.
> >
> >There could be a per-sb workqueue (stored in cifs_sb or the master
> >tcon) but is there a way to get back to the superblock / master tcon
> >with just a tcon (e.g. cached_dir_lease_break, when processing a lease
> >break)?
> >
> >>>The final cleanup work for cleaning up a cfid is performed via work
> >>>queued in the serverclose_wq workqueue; this is done separately from
> >>>dropping the dentries so that close_all_cached_dirs() doesn't block on
> >>>any server operations.
> >>>
> >>>Both of these queued works expect to be invoked with a cfid reference and
> >>>a tcon reference to avoid those objects from being freed while the work
> >>>is ongoing.
> >>
> >>Why do you need to take a tcon reference?
> >
> >In the existing code (and my patch, without the refs), I was seeing an
> >intermittent use-after-free of the tcon or cached_fids struct by
> >queued work processing a lease break -- the cfid isn't linked from
> >cached_fids, but smb2_close_cached_fid invoking SMB2_close can race
> >with the unmount and cifs_put_tcon
> >
> >Something like:
> >
> > t1 t2
> >cached_dir_lease_break
> >smb2_cached_lease_break
> >smb2_close_cached_fid
> >SMB2_close starts
> > cifs_kill_sb
> > cifs_umount
> > cifs_put_tlink
> > cifs_put_tcon
> >SMB2_close continues
> >
> >I had a version of the patch that kept the 'in flight lease breaks' on
> >a second list in cached_fids so that they could be cancelled
> >synchronously from free_cached_fids(), but I struggled with it (I
> >can't remember exactly, but I think I was struggling to get the linked
> >list membership / removal handling and num_entries handling
> >consistent).
> >
> >>Can't you drop the dentries
> >>when tearing down tcon in cifs_put_tcon()? No concurrent mounts would
> >>be able to access or free it.
> >
> >Dropping the dentries must happen before kill_anon_super(), as
> >that's where the 'Dentry still in use' check is. All the tcons are put
> >in cifs_umount(), which occurs after:
> >
> > kill_anon_super(sb);
> > cifs_umount(cifs_sb);
> >
> >The other thing is that cifs_umount_begin() has this comment, which
> >made me think a tcon can actually be tied to two distinct mount
> >points:
> >
> > if ((tcon->tc_count > 1) || (tcon->status == TID_EXITING)) {
> > /* we have other mounts to same share or we have
> > already tried to umount this and woken up
> > all waiting network requests, nothing to do */
> >
> >Although, as I'm thinking about it again, I think I've misunderstood
> >(and that comment is wrong?).
> >
> >It did cross my mind to pull some of the work out of cifs_umount into
> >cifs_kill_sb (specifically, I wanted to cancel prune_tlinks earlier)
> >-- no prune_tlinks would make it more feasible to drop tlink_tree_lock
> >in close_all_cached_dirs(), at which point a per-tcon workqueue is
> >more practical.
> >
> >>After running xfstests I've seen a leaked tcon in
> >>/proc/fs/cifs/DebugData with no CIFS superblocks, which might be related
> >>to this.
> >>
> >>Could you please check if there is any leaked connection in
> >>/proc/fs/cifs/DebugData after running your tests?
> >
> >After I finish with my tests (I'm not using xfstests, although perhaps
> >I should be) and unmount the share, DebugData doesn't show any
> >connections for me.
>
> I was able to reproduce this leak. I believe the attached patch addresses it.
>
> I'm able to intermittently see a 'Dentry still in use' bug with xfstests
> generic/241 (what Steve saw) (the attached patch doesn't help with that). I'm
> still unsure what's going on there.
>
> >~Paul
--
Thanks,
Steve
^ permalink raw reply [flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry
2024-11-27 16:38 ` Steve French
@ 2024-11-27 17:04 ` Enzo Matsumiya
2024-11-27 17:12 ` Steve French
2024-11-28 1:10 ` Steve French
1 sibling, 1 reply; 20+ messages in thread
From: Enzo Matsumiya @ 2024-11-27 17:04 UTC (permalink / raw)
To: Steve French
Cc: Paulo Alcantara, linux-cifs, Steve French, Ronnie Sahlberg,
Shyam Prasad N, Tom Talpey, Bharath SM
On 11/27, Steve French wrote:
>I did see the generic/241 failure again with current for-next
>(unrelated to this patch though). Will try to repro it again - but
>any ideas how to narrow it down or fix it would be helpful.
We're seeing this too when backporting that patch series to SLE15-SP6,
just by running generic/072, so I don't think it's unrelated.
We also hit (with generic/072 as well, though only once) the WARN() in
cached_dir_offload_close() (introduced in this same patch):
[ 526.946722] WARNING: CPU: 2 PID: 23778 at fs/smb/client/cached_dir.c:555 cached_dir_offload_close+0x90/0xa0 [cifs]
[ 526.948561] Modules linked in: cifs cifs_arc4 cifs_md4
[ 526.949385] CPU: 2 PID: 23778 Comm: kworker/2:1 Kdump: loaded Not tainted 6.4.0-lku #91 SLE15-SP6 (unreleased)
[ 526.949394] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014
[ 526.949398] Workqueue: serverclose cached_dir_offload_close [cifs]
[ 526.951938] RIP: 0010:cached_dir_offload_close+0x90/0xa0 [cifs]
[ 526.953827] Code: e8 a5 fb ff ff 4c 89 e7 5b 5d 41 5c e9 99 57 fc ff 48 89 ef be 03 00 00 00 e8 5c d4 c5 c9 4c 89 e7 5b 5d 41 5c e9 80 57 fc ff <0f> 0b eb 99 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90
[ 526.953836] RSP: 0018:ffff888108b7fdd8 EFLAGS: 00010202
[ 526.953844] RAX: 0000000000000000 RBX: ffff888100de68d0 RCX: ffffffffc042e7d4
[ 526.953849] RDX: 1ffff110201bcd04 RSI: 0000000000000008 RDI: ffff888100de6820
[ 526.953854] RBP: ffff8881063d8a00 R08: 0000000000000001 R09: ffffed10201bcd1a
[ 526.953858] R10: ffff888100de68d7 R11: 0000000000000000 R12: ffff888100a2b000
[ 526.953862] R13: 0000000000000080 R14: ffff88814f336ea8 R15: ffff888100de68d8
[ 526.953872] FS: 0000000000000000(0000) GS:ffff88814f300000(0000) knlGS:0000000000000000
[ 526.965888] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 526.965897] CR2: 00007f841e6ea698 CR3: 0000000106c68000 CR4: 0000000000750ee0
[ 526.965904] PKRU: 55555554
[ 526.965908] Call Trace:
[ 526.965912] <TASK>
[ 526.965917] ? __warn+0x92/0xf0
[ 526.965932] ? cached_dir_offload_close+0x90/0xa0 [cifs]
[ 526.972634] ? report_bug+0x163/0x190
[ 526.972652] ? handle_bug+0x3a/0x70
[ 526.972665] ? exc_invalid_op+0x17/0x40
[ 526.972674] ? asm_exc_invalid_op+0x1a/0x20
[ 526.972688] ? cached_dir_offload_close+0x24/0xa0 [cifs]
[ 526.977239] ? cached_dir_offload_close+0x90/0xa0 [cifs]
[ 526.978831] ? cached_dir_offload_close+0x24/0xa0 [cifs]
[ 526.979133] process_one_work+0x42c/0x730
[ 526.979176] worker_thread+0x8e/0x700
[ 526.979190] ? __pfx_worker_thread+0x10/0x10
[ 526.979200] kthread+0x197/0x1d0
[ 526.979208] ? kthread+0xeb/0x1d0
[ 526.979216] ? __pfx_kthread+0x10/0x10
[ 526.979225] ret_from_fork+0x29/0x50
[ 526.979237] </TASK>
[ 526.979241] ---[ end trace 0000000000000000 ]---
Will update here if we find a fix/root cause.
Cheers,
Enzo
>SECTION -- smb3
>FSTYP -- cifs
>PLATFORM -- Linux/x86_64 fedora29 6.12.0 #1 SMP PREEMPT_DYNAMIC Wed
>Nov 27 01:02:07 UTC 2024
>MKFS_OPTIONS -- //win16.vm.test/Scratch
>generic/241 73s
>Ran: generic/241
>Passed all 1 tests
>SECTION -- smb3
>=========================
>Ran: generic/241
>Passed all 1 tests
>Number of reconnects: 0
>Test completed smb3 generic/241 at Wed Nov 27 06:38:47 AM UTC 2024
>dmesg output during the test:
>[Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Share
>[Wed Nov 27 00:37:32 2024] CIFS: VFS: generate_smb3signingkey: dumping
>generated AES session keys
>[Wed Nov 27 00:37:32 2024] CIFS: VFS: Session Id 45 00 00 08 00 c8 00 00
>[Wed Nov 27 00:37:32 2024] CIFS: VFS: Cipher type 2
>[Wed Nov 27 00:37:32 2024] CIFS: VFS: Session Key 00 bf ed c7 f1 95 0e
>29 06 e8 82 87 b5 c8 72 06
>[Wed Nov 27 00:37:32 2024] CIFS: VFS: Signing Key a4 0f 15 64 d2 69 02
>2f 4e 78 60 7a fe 3e 31 4e
>[Wed Nov 27 00:37:32 2024] CIFS: VFS: ServerIn Key a6 fd 04 f6 04 ea
>0e 6e 60 c0 1b b1 ee 63 38 e9
>[Wed Nov 27 00:37:32 2024] CIFS: VFS: ServerOut Key a6 e3 e3 22 8c c2
>b0 6e b1 9d 40 ea d0 89 6d d8
>[Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Scratch
>[Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Scratch
>[Wed Nov 27 00:37:32 2024] run fstests generic/241 at 2024-11-27 00:37:33
>[Wed Nov 27 00:38:46 2024] ------------[ cut here ]------------
>[Wed Nov 27 00:38:46 2024] BUG: Dentry
>00000000318d67d4{i=11000000033f68,n=~dmtmp} still in use (1) [unmount
>of cifs cifs]
>[Wed Nov 27 00:38:46 2024] WARNING: CPU: 2 PID: 316177 at
>fs/dcache.c:1546 umount_check+0xc3/0xf0
>[Wed Nov 27 00:38:46 2024] Modules linked in: cifs ccm cmac nls_utf8
>cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4
>dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns
>nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
>nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
>nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat
>ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
>nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
>iptable_security ip_set ebtable_filter ebtables ip6table_filter
>ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net
>net_failover failover virtio_balloon loop fuse dm_multipath nfnetlink
>zram xfs bochs drm_client_lib drm_shmem_helper drm_kms_helper
>crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm
>sha512_ssse3 sha256_ssse3 sha1_ssse3 floppy virtio_blk qemu_fw_cfg
>virtio_console [last unloaded: cifs]
>[Wed Nov 27 00:38:46 2024] CPU: 2 UID: 0 PID: 316177 Comm: umount Not
>tainted 6.12.0 #1
>[Wed Nov 27 00:38:46 2024] Hardware name: Red Hat KVM, BIOS
>1.16.3-2.el9 04/01/2014
>[Wed Nov 27 00:38:46 2024] RIP: 0010:umount_check+0xc3/0xf0
>[Wed Nov 27 00:38:46 2024] Code: db 74 0d 48 8d 7b 40 e8 db df f5 ff
>48 8b 53 40 41 55 4d 89 f1 45 89 e0 48 89 e9 48 89 ee 48 c7 c7 80 99
>ba ad e8 2d 27 a2 ff <0f> 0b 58 31 c0 5b 5d 41 5c 41 5d 41 5e c3 cc cc
>cc cc 41 83 fc 01
>[Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fd20 EFLAGS: 00010282
>[Wed Nov 27 00:38:46 2024] RAX: dffffc0000000000 RBX: ff1100010c574ce0
>RCX: 0000000000000027
>[Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
>RDI: ff110004cb131a48
>[Wed Nov 27 00:38:46 2024] RBP: ff1100012c76bd60 R08: ffffffffac3fd2fe
>R09: ffe21c0099626349
>[Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
>R12: 0000000000000001
>[Wed Nov 27 00:38:46 2024] R13: ff110001238b6668 R14: ffffffffc1d6e6c0
>R15: ff1100012c76be18
>[Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
>GS:ff110004cb100000(0000) knlGS:0000000000000000
>[Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
>CR4: 0000000000373ef0
>[Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
>DR2: 0000000000000000
>[Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
>DR7: 0000000000000400
>[Wed Nov 27 00:38:46 2024] Call Trace:
>[Wed Nov 27 00:38:46 2024] <TASK>
>[Wed Nov 27 00:38:46 2024] ? __warn+0xa9/0x220
>[Wed Nov 27 00:38:46 2024] ? umount_check+0xc3/0xf0
>[Wed Nov 27 00:38:46 2024] ? report_bug+0x1d4/0x1e0
>[Wed Nov 27 00:38:46 2024] ? handle_bug+0x5b/0xa0
>[Wed Nov 27 00:38:46 2024] ? exc_invalid_op+0x18/0x50
>[Wed Nov 27 00:38:46 2024] ? asm_exc_invalid_op+0x1a/0x20
>[Wed Nov 27 00:38:46 2024] ? irq_work_claim+0x1e/0x40
>[Wed Nov 27 00:38:46 2024] ? umount_check+0xc3/0xf0
>[Wed Nov 27 00:38:46 2024] ? __pfx_umount_check+0x10/0x10
>[Wed Nov 27 00:38:46 2024] d_walk+0xf3/0x4e0
>[Wed Nov 27 00:38:46 2024] ? d_walk+0x4b/0x4e0
>[Wed Nov 27 00:38:46 2024] shrink_dcache_for_umount+0x6d/0x220
>[Wed Nov 27 00:38:46 2024] generic_shutdown_super+0x4a/0x1c0
>[Wed Nov 27 00:38:46 2024] kill_anon_super+0x22/0x40
>[Wed Nov 27 00:38:46 2024] cifs_kill_sb+0x78/0x90 [cifs]
>[Wed Nov 27 00:38:46 2024] deactivate_locked_super+0x69/0xf0
>[Wed Nov 27 00:38:46 2024] cleanup_mnt+0x195/0x200
>[Wed Nov 27 00:38:46 2024] task_work_run+0xec/0x150
>[Wed Nov 27 00:38:46 2024] ? __pfx_task_work_run+0x10/0x10
>[Wed Nov 27 00:38:46 2024] ? mark_held_locks+0x24/0x90
>[Wed Nov 27 00:38:46 2024] syscall_exit_to_user_mode+0x269/0x2a0
>[Wed Nov 27 00:38:46 2024] do_syscall_64+0x81/0x180
>[Wed Nov 27 00:38:46 2024] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>[Wed Nov 27 00:38:46 2024] RIP: 0033:0x7fddc1ff43eb
>[Wed Nov 27 00:38:46 2024] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f
>1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa
>b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15
>f9 a9 0c 00 f7 d8
>[Wed Nov 27 00:38:46 2024] RSP: 002b:00007ffe64be88d8 EFLAGS: 00000246
>ORIG_RAX: 00000000000000a6
>[Wed Nov 27 00:38:46 2024] RAX: 0000000000000000 RBX: 00005632106d4c20
>RCX: 00007fddc1ff43eb
>[Wed Nov 27 00:38:46 2024] RDX: 0000000000000000 RSI: 0000000000000000
>RDI: 00005632106d9410
>[Wed Nov 27 00:38:46 2024] RBP: 00007ffe64be89b0 R08: 00005632106d4010
>R09: 0000000000000007
>[Wed Nov 27 00:38:46 2024] R10: 0000000000000000 R11: 0000000000000246
>R12: 00005632106d4d28
>[Wed Nov 27 00:38:46 2024] R13: 0000000000000000 R14: 00005632106d9410
>R15: 00005632106d5030
>[Wed Nov 27 00:38:46 2024] </TASK>
> [...]
>
> [...]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry
2024-11-27 17:04 ` Enzo Matsumiya
@ 2024-11-27 17:12 ` Steve French
0 siblings, 0 replies; 20+ messages in thread
From: Steve French @ 2024-11-27 17:12 UTC (permalink / raw)
To: Enzo Matsumiya
Cc: Paulo Alcantara, linux-cifs, Steve French, Ronnie Sahlberg,
Shyam Prasad N, Tom Talpey, Bharath SM
On Wed, Nov 27, 2024 at 11:07 AM Enzo Matsumiya <ematsumiya@suse.de> wrote:
>
> On 11/27, Steve French wrote:
> >I did see the generic/241 failure again with current for-next
> >(unrelated to this patch though). Will try to repro it again - but
> >any ideas how to narrow it down or fix it would be helpful.
>
> We're seeing this too when backporting that patch series to SLE15-SP6,
> by only running generic/072, so I don't think it's unrelated.
>
> We also hit, also with generic/072, but only once, the WARN() in
> cached_dir_offload_close() (introduced in this same patch):
>
> [ 526.946722] WARNING: CPU: 2 PID: 23778 at fs/smb/client/cached_dir.c:555 cached_dir_offload_close+0x90/0xa0 [cifs]
> [ 526.948561] Modules linked in: cifs cifs_arc4 cifs_md4
> [ 526.949385] CPU: 2 PID: 23778 Comm: kworker/2:1 Kdump: loaded Not tainted 6.4.0-lku #91 SLE15-SP6 (unreleased)
> [ 526.949394] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014
> [ 526.949398] Workqueue: serverclose cached_dir_offload_close [cifs]
> [ 526.951938] RIP: 0010:cached_dir_offload_close+0x90/0xa0 [cifs]
> [ 526.953827] Code: e8 a5 fb ff ff 4c 89 e7 5b 5d 41 5c e9 99 57 fc ff 48 89 ef be 03 00 00 00 e8 5c d4 c5 c9 4c 89 e7 5b 5d 41 5c e9 80 57 fc ff <0f> 0b eb 99 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90
> [ 526.953836] RSP: 0018:ffff888108b7fdd8 EFLAGS: 00010202
> [ 526.953844] RAX: 0000000000000000 RBX: ffff888100de68d0 RCX: ffffffffc042e7d4
> [ 526.953849] RDX: 1ffff110201bcd04 RSI: 0000000000000008 RDI: ffff888100de6820
> [ 526.953854] RBP: ffff8881063d8a00 R08: 0000000000000001 R09: ffffed10201bcd1a
> [ 526.953858] R10: ffff888100de68d7 R11: 0000000000000000 R12: ffff888100a2b000
> [ 526.953862] R13: 0000000000000080 R14: ffff88814f336ea8 R15: ffff888100de68d8
> [ 526.953872] FS: 0000000000000000(0000) GS:ffff88814f300000(0000) knlGS:0000000000000000
> [ 526.965888] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 526.965897] CR2: 00007f841e6ea698 CR3: 0000000106c68000 CR4: 0000000000750ee0
> [ 526.965904] PKRU: 55555554
> [ 526.965908] Call Trace:
> [ 526.965912] <TASK>
> [ 526.965917] ? __warn+0x92/0xf0
> [ 526.965932] ? cached_dir_offload_close+0x90/0xa0 [cifs]
> [ 526.972634] ? report_bug+0x163/0x190
> [ 526.972652] ? handle_bug+0x3a/0x70
> [ 526.972665] ? exc_invalid_op+0x17/0x40
> [ 526.972674] ? asm_exc_invalid_op+0x1a/0x20
> [ 526.972688] ? cached_dir_offload_close+0x24/0xa0 [cifs]
> [ 526.977239] ? cached_dir_offload_close+0x90/0xa0 [cifs]
> [ 526.978831] ? cached_dir_offload_close+0x24/0xa0 [cifs]
> [ 526.979133] process_one_work+0x42c/0x730
> [ 526.979176] worker_thread+0x8e/0x700
> [ 526.979190] ? __pfx_worker_thread+0x10/0x10
> [ 526.979200] kthread+0x197/0x1d0
> [ 526.979208] ? kthread+0xeb/0x1d0
> [ 526.979216] ? __pfx_kthread+0x10/0x10
> [ 526.979225] ret_from_fork+0x29/0x50
> [ 526.979237] </TASK>
> [ 526.979241] ---[ end trace 0000000000000000 ]---
>
> Will update here if we find a fix/root cause.
>
>
> Cheers,
>
> Enzo
Presumably it is related to the dir lease series, but to one of the
earlier patches rather than the most recent one, since we saw it when
running before. Without the patch series, though, we saw the unmount
crash when racing with freeing cached dentries, so it would be helpful
to narrow the bug down so we can fix the original problem that has
been around for quite a while now. That is especially important now
that more servers will be enabling directory leases (e.g. Samba can
now be tested against them).
--
Thanks,
Steve
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry
2024-11-27 16:38 ` Steve French
2024-11-27 17:04 ` Enzo Matsumiya
@ 2024-11-28 1:10 ` Steve French
2024-11-28 5:00 ` Steve French
1 sibling, 1 reply; 20+ messages in thread
From: Steve French @ 2024-11-28 1:10 UTC (permalink / raw)
To: Paulo Alcantara, linux-cifs, Steve French, Ronnie Sahlberg,
Shyam Prasad N, Tom Talpey, Bharath SM
I see this error at the end of generic/241 (in dmesg) even without the
more recent dir lease patch,
"smb: Initialize cfid->tcon before performing network ops",
so it is likely unrelated (an earlier bug):
Nov 27 18:07:46 fedora29 kernel: Call Trace:
Nov 27 18:07:46 fedora29 kernel: <TASK>
Nov 27 18:07:46 fedora29 kernel: ? __warn+0xa9/0x220
Nov 27 18:07:46 fedora29 kernel: ? umount_check+0xc3/0xf0
Nov 27 18:07:46 fedora29 kernel: ? report_bug+0x1d4/0x1e0
Nov 27 18:07:46 fedora29 kernel: ? handle_bug+0x5b/0xa0
Nov 27 18:07:46 fedora29 kernel: ? exc_invalid_op+0x18/0x50
Nov 27 18:07:46 fedora29 kernel: ? asm_exc_invalid_op+0x1a/0x20
Nov 27 18:07:46 fedora29 kernel: ? irq_work_claim+0x1e/0x40
Nov 27 18:07:46 fedora29 kernel: ? umount_check+0xc3/0xf0
Nov 27 18:07:46 fedora29 kernel: ? __pfx_umount_check+0x10/0x10
Nov 27 18:07:46 fedora29 kernel: d_walk+0xf3/0x4e0
Nov 27 18:07:46 fedora29 kernel: ? d_walk+0x4b/0x4e0
Nov 27 18:07:46 fedora29 kernel: shrink_dcache_for_umount+0x6d/0x220
Nov 27 18:07:46 fedora29 kernel: generic_shutdown_super+0x4a/0x1c0
Nov 27 18:07:46 fedora29 kernel: kill_anon_super+0x22/0x40
Nov 27 18:07:46 fedora29 kernel: cifs_kill_sb+0x78/0x90 [cifs]
On Wed, Nov 27, 2024 at 10:38 AM Steve French <smfrench@gmail.com> wrote:
>
> I did see the generic/241 failure again with current for-next
> (unrelated to this patch though). Will try to repro it again - but
> any ideas how to narrow it down or fix it would be helpful.
>
> SECTION -- smb3
> FSTYP -- cifs
> PLATFORM -- Linux/x86_64 fedora29 6.12.0 #1 SMP PREEMPT_DYNAMIC Wed
> Nov 27 01:02:07 UTC 2024
> MKFS_OPTIONS -- //win16.vm.test/Scratch
> generic/241 73s
> Ran: generic/241
> Passed all 1 tests
> SECTION -- smb3
> =========================
> Ran: generic/241
> Passed all 1 tests
> Number of reconnects: 0
> Test completed smb3 generic/241 at Wed Nov 27 06:38:47 AM UTC 2024
> dmesg output during the test:
> [Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Share
> [Wed Nov 27 00:37:32 2024] CIFS: VFS: generate_smb3signingkey: dumping
> generated AES session keys
> [Wed Nov 27 00:37:32 2024] CIFS: VFS: Session Id 45 00 00 08 00 c8 00 00
> [Wed Nov 27 00:37:32 2024] CIFS: VFS: Cipher type 2
> [Wed Nov 27 00:37:32 2024] CIFS: VFS: Session Key 00 bf ed c7 f1 95 0e
> 29 06 e8 82 87 b5 c8 72 06
> [Wed Nov 27 00:37:32 2024] CIFS: VFS: Signing Key a4 0f 15 64 d2 69 02
> 2f 4e 78 60 7a fe 3e 31 4e
> [Wed Nov 27 00:37:32 2024] CIFS: VFS: ServerIn Key a6 fd 04 f6 04 ea
> 0e 6e 60 c0 1b b1 ee 63 38 e9
> [Wed Nov 27 00:37:32 2024] CIFS: VFS: ServerOut Key a6 e3 e3 22 8c c2
> b0 6e b1 9d 40 ea d0 89 6d d8
> [Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Scratch
> [Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Scratch
> [Wed Nov 27 00:37:32 2024] run fstests generic/241 at 2024-11-27 00:37:33
> [Wed Nov 27 00:38:46 2024] ------------[ cut here ]------------
> [Wed Nov 27 00:38:46 2024] BUG: Dentry
> 00000000318d67d4{i=11000000033f68,n=~dmtmp} still in use (1) [unmount
> of cifs cifs]
> [Wed Nov 27 00:38:46 2024] WARNING: CPU: 2 PID: 316177 at
> fs/dcache.c:1546 umount_check+0xc3/0xf0
> [Wed Nov 27 00:38:46 2024] Modules linked in: cifs ccm cmac nls_utf8
> cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4
> dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns
> nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
> nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat
> ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> iptable_security ip_set ebtable_filter ebtables ip6table_filter
> ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net
> net_failover failover virtio_balloon loop fuse dm_multipath nfnetlink
> zram xfs bochs drm_client_lib drm_shmem_helper drm_kms_helper
> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm
> sha512_ssse3 sha256_ssse3 sha1_ssse3 floppy virtio_blk qemu_fw_cfg
> virtio_console [last unloaded: cifs]
> [Wed Nov 27 00:38:46 2024] CPU: 2 UID: 0 PID: 316177 Comm: umount Not
> tainted 6.12.0 #1
> [Wed Nov 27 00:38:46 2024] Hardware name: Red Hat KVM, BIOS
> 1.16.3-2.el9 04/01/2014
> [Wed Nov 27 00:38:46 2024] RIP: 0010:umount_check+0xc3/0xf0
> [Wed Nov 27 00:38:46 2024] Code: db 74 0d 48 8d 7b 40 e8 db df f5 ff
> 48 8b 53 40 41 55 4d 89 f1 45 89 e0 48 89 e9 48 89 ee 48 c7 c7 80 99
> ba ad e8 2d 27 a2 ff <0f> 0b 58 31 c0 5b 5d 41 5c 41 5d 41 5e c3 cc cc
> cc cc 41 83 fc 01
> [Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fd20 EFLAGS: 00010282
> [Wed Nov 27 00:38:46 2024] RAX: dffffc0000000000 RBX: ff1100010c574ce0
> RCX: 0000000000000027
> [Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
> RDI: ff110004cb131a48
> [Wed Nov 27 00:38:46 2024] RBP: ff1100012c76bd60 R08: ffffffffac3fd2fe
> R09: ffe21c0099626349
> [Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
> R12: 0000000000000001
> [Wed Nov 27 00:38:46 2024] R13: ff110001238b6668 R14: ffffffffc1d6e6c0
> R15: ff1100012c76be18
> [Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
> GS:ff110004cb100000(0000) knlGS:0000000000000000
> [Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
> CR4: 0000000000373ef0
> [Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> [Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> DR7: 0000000000000400
> [Wed Nov 27 00:38:46 2024] Call Trace:
> [Wed Nov 27 00:38:46 2024] <TASK>
> [Wed Nov 27 00:38:46 2024] ? __warn+0xa9/0x220
> [Wed Nov 27 00:38:46 2024] ? umount_check+0xc3/0xf0
> [Wed Nov 27 00:38:46 2024] ? report_bug+0x1d4/0x1e0
> [Wed Nov 27 00:38:46 2024] ? handle_bug+0x5b/0xa0
> [Wed Nov 27 00:38:46 2024] ? exc_invalid_op+0x18/0x50
> [Wed Nov 27 00:38:46 2024] ? asm_exc_invalid_op+0x1a/0x20
> [Wed Nov 27 00:38:46 2024] ? irq_work_claim+0x1e/0x40
> [Wed Nov 27 00:38:46 2024] ? umount_check+0xc3/0xf0
> [Wed Nov 27 00:38:46 2024] ? __pfx_umount_check+0x10/0x10
> [Wed Nov 27 00:38:46 2024] d_walk+0xf3/0x4e0
> [Wed Nov 27 00:38:46 2024] ? d_walk+0x4b/0x4e0
> [Wed Nov 27 00:38:46 2024] shrink_dcache_for_umount+0x6d/0x220
> [Wed Nov 27 00:38:46 2024] generic_shutdown_super+0x4a/0x1c0
> [Wed Nov 27 00:38:46 2024] kill_anon_super+0x22/0x40
> [Wed Nov 27 00:38:46 2024] cifs_kill_sb+0x78/0x90 [cifs]
> [Wed Nov 27 00:38:46 2024] deactivate_locked_super+0x69/0xf0
> [Wed Nov 27 00:38:46 2024] cleanup_mnt+0x195/0x200
> [Wed Nov 27 00:38:46 2024] task_work_run+0xec/0x150
> [Wed Nov 27 00:38:46 2024] ? __pfx_task_work_run+0x10/0x10
> [Wed Nov 27 00:38:46 2024] ? mark_held_locks+0x24/0x90
> [Wed Nov 27 00:38:46 2024] syscall_exit_to_user_mode+0x269/0x2a0
> [Wed Nov 27 00:38:46 2024] do_syscall_64+0x81/0x180
> [Wed Nov 27 00:38:46 2024] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [Wed Nov 27 00:38:46 2024] RIP: 0033:0x7fddc1ff43eb
> [Wed Nov 27 00:38:46 2024] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f
> 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa
> b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15
> f9 a9 0c 00 f7 d8
> [Wed Nov 27 00:38:46 2024] RSP: 002b:00007ffe64be88d8 EFLAGS: 00000246
> ORIG_RAX: 00000000000000a6
> [Wed Nov 27 00:38:46 2024] RAX: 0000000000000000 RBX: 00005632106d4c20
> RCX: 00007fddc1ff43eb
> [Wed Nov 27 00:38:46 2024] RDX: 0000000000000000 RSI: 0000000000000000
> RDI: 00005632106d9410
> [Wed Nov 27 00:38:46 2024] RBP: 00007ffe64be89b0 R08: 00005632106d4010
> R09: 0000000000000007
> [Wed Nov 27 00:38:46 2024] R10: 0000000000000000 R11: 0000000000000246
> R12: 00005632106d4d28
> [Wed Nov 27 00:38:46 2024] R13: 0000000000000000 R14: 00005632106d9410
> R15: 00005632106d5030
> [Wed Nov 27 00:38:46 2024] </TASK>
> [Wed Nov 27 00:38:46 2024] irq event stamp: 8317
> [Wed Nov 27 00:38:46 2024] hardirqs last enabled at (8323):
> [<ffffffffac230dce>] __up_console_sem+0x5e/0x70
> [Wed Nov 27 00:38:46 2024] hardirqs last disabled at (8328):
> [<ffffffffac230db3>] __up_console_sem+0x43/0x70
> [Wed Nov 27 00:38:46 2024] softirqs last enabled at (6628):
> [<ffffffffac135745>] __irq_exit_rcu+0x135/0x160
> [Wed Nov 27 00:38:46 2024] softirqs last disabled at (6539):
> [<ffffffffac135745>] __irq_exit_rcu+0x135/0x160
> [Wed Nov 27 00:38:46 2024] ---[ end trace 0000000000000000 ]---
> [Wed Nov 27 00:38:46 2024] VFS: Busy inodes after unmount of cifs (cifs)
> [Wed Nov 27 00:38:46 2024] ------------[ cut here ]------------
> [Wed Nov 27 00:38:46 2024] kernel BUG at fs/super.c:650!
> [Wed Nov 27 00:38:46 2024] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
> KASAN NOPTI
> [Wed Nov 27 00:38:46 2024] CPU: 2 UID: 0 PID: 316177 Comm: umount
> Tainted: G W 6.12.0 #1
> [Wed Nov 27 00:38:46 2024] Tainted: [W]=WARN
> [Wed Nov 27 00:38:46 2024] Hardware name: Red Hat KVM, BIOS
> 1.16.3-2.el9 04/01/2014
> [Wed Nov 27 00:38:46 2024] RIP: 0010:generic_shutdown_super+0x1b7/0x1c0
> [Wed Nov 27 00:38:46 2024] Code: 7b 28 e8 5c ca f8 ff 48 8b 6b 28 48
> 89 ef e8 50 ca f8 ff 48 8b 55 00 48 8d b3 68 06 00 00 48 c7 c7 e0 38
> ba ad e8 d9 c1 b5 ff <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90
> 90 90 90 90 90 90
> [Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fdf0 EFLAGS: 00010282
> [Wed Nov 27 00:38:46 2024] RAX: 000000000000002d RBX: ff110001238b6000
> RCX: 0000000000000027
> [Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
> RDI: ff110004cb131a48
> [Wed Nov 27 00:38:46 2024] RBP: ffffffffc1c6ac00 R08: ffffffffac3fd2fe
> R09: ffe21c0099626349
> [Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
> R12: ff110001238b69c0
> [Wed Nov 27 00:38:46 2024] R13: ff110001238b6780 R14: 1fe220002432ffd4
> R15: 0000000000000000
> [Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
> GS:ff110004cb100000(0000) knlGS:0000000000000000
> [Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
> CR4: 0000000000373ef0
> [Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> [Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> DR7: 0000000000000400
> [Wed Nov 27 00:38:46 2024] Call Trace:
> [Wed Nov 27 00:38:46 2024] <TASK>
> [Wed Nov 27 00:38:46 2024] ? die+0x37/0x90
> [Wed Nov 27 00:38:46 2024] ? do_trap+0x133/0x230
> [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> [Wed Nov 27 00:38:46 2024] ? do_error_trap+0x94/0x130
> [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> [Wed Nov 27 00:38:46 2024] ? handle_invalid_op+0x2c/0x40
> [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> [Wed Nov 27 00:38:46 2024] ? exc_invalid_op+0x2f/0x50
> [Wed Nov 27 00:38:46 2024] ? asm_exc_invalid_op+0x1a/0x20
> [Wed Nov 27 00:38:46 2024] ? irq_work_claim+0x1e/0x40
> [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> [Wed Nov 27 00:38:46 2024] kill_anon_super+0x22/0x40
> [Wed Nov 27 00:38:46 2024] cifs_kill_sb+0x78/0x90 [cifs]
> [Wed Nov 27 00:38:46 2024] deactivate_locked_super+0x69/0xf0
> [Wed Nov 27 00:38:46 2024] cleanup_mnt+0x195/0x200
> [Wed Nov 27 00:38:46 2024] task_work_run+0xec/0x150
> [Wed Nov 27 00:38:46 2024] ? __pfx_task_work_run+0x10/0x10
> [Wed Nov 27 00:38:46 2024] ? mark_held_locks+0x24/0x90
> [Wed Nov 27 00:38:46 2024] syscall_exit_to_user_mode+0x269/0x2a0
> [Wed Nov 27 00:38:46 2024] do_syscall_64+0x81/0x180
> [Wed Nov 27 00:38:46 2024] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [Wed Nov 27 00:38:46 2024] RIP: 0033:0x7fddc1ff43eb
> [Wed Nov 27 00:38:46 2024] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f
> 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa
> b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15
> f9 a9 0c 00 f7 d8
> [Wed Nov 27 00:38:46 2024] RSP: 002b:00007ffe64be88d8 EFLAGS: 00000246
> ORIG_RAX: 00000000000000a6
> [Wed Nov 27 00:38:46 2024] RAX: 0000000000000000 RBX: 00005632106d4c20
> RCX: 00007fddc1ff43eb
> [Wed Nov 27 00:38:46 2024] RDX: 0000000000000000 RSI: 0000000000000000
> RDI: 00005632106d9410
> [Wed Nov 27 00:38:46 2024] RBP: 00007ffe64be89b0 R08: 00005632106d4010
> R09: 0000000000000007
> [Wed Nov 27 00:38:46 2024] R10: 0000000000000000 R11: 0000000000000246
> R12: 00005632106d4d28
> [Wed Nov 27 00:38:46 2024] R13: 0000000000000000 R14: 00005632106d9410
> R15: 00005632106d5030
> [Wed Nov 27 00:38:46 2024] </TASK>
> [Wed Nov 27 00:38:46 2024] Modules linked in: cifs ccm cmac nls_utf8
> cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4
> dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns
> nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
> nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat
> ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> iptable_security ip_set ebtable_filter ebtables ip6table_filter
> ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net
> net_failover failover virtio_balloon loop fuse dm_multipath nfnetlink
> zram xfs bochs drm_client_lib drm_shmem_helper drm_kms_helper
> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm
> sha512_ssse3 sha256_ssse3 sha1_ssse3 floppy virtio_blk qemu_fw_cfg
> virtio_console [last unloaded: cifs]
> [Wed Nov 27 00:38:46 2024] ---[ end trace 0000000000000000 ]---
> [Wed Nov 27 00:38:46 2024] RIP: 0010:generic_shutdown_super+0x1b7/0x1c0
> [Wed Nov 27 00:38:46 2024] Code: 7b 28 e8 5c ca f8 ff 48 8b 6b 28 48
> 89 ef e8 50 ca f8 ff 48 8b 55 00 48 8d b3 68 06 00 00 48 c7 c7 e0 38
> ba ad e8 d9 c1 b5 ff <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90
> 90 90 90 90 90 90
> [Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fdf0 EFLAGS: 00010282
> [Wed Nov 27 00:38:46 2024] RAX: 000000000000002d RBX: ff110001238b6000
> RCX: 0000000000000027
> [Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
> RDI: ff110004cb131a48
> [Wed Nov 27 00:38:46 2024] RBP: ffffffffc1c6ac00 R08: ffffffffac3fd2fe
> R09: ffe21c0099626349
> [Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
> R12: ff110001238b69c0
> [Wed Nov 27 00:38:46 2024] R13: ff110001238b6780 R14: 1fe220002432ffd4
> R15: 0000000000000000
> [Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
> GS:ff110004cb100000(0000) knlGS:0000000000000000
> [Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
> CR4: 0000000000373ef0
> [Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> [Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> DR7: 0000000000000400
>
> On Tue, Nov 26, 2024 at 3:50 PM Paul Aurich <paul@darkrain42.org> wrote:
> >
> > On 2024-11-22 19:28:34 -0800, Paul Aurich wrote:
> > >On 2024-11-21 23:05:51 -0300, Paulo Alcantara wrote:
> > >>Hi Paul,
> > >>
> > >>Thanks for looking into this! Really appreciate it.
> > >>
> > >>Paul Aurich <paul@darkrain42.org> writes:
> > >>
> > >>>The unmount process (cifs_kill_sb() calling close_all_cached_dirs()) can
> > >>>race with various cached directory operations, which ultimately results
> > >>>in dentries not being dropped and these kernel BUGs:
> > >>>
> > >>>BUG: Dentry ffff88814f37e358{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
> > >>>VFS: Busy inodes after unmount of cifs (cifs)
> > >>>------------[ cut here ]------------
> > >>>kernel BUG at fs/super.c:661!
> > >>>
> > >>>This happens when a cfid is in the process of being cleaned up when, and
> > >>>has been removed from the cfids->entries list, including:
> > >>>
> > >>>- Receiving a lease break from the server
> > >>>- Server reconnection triggers invalidate_all_cached_dirs(), which
> > >>> removes all the cfids from the list
> > >>>- The laundromat thread decides to expire an old cfid.
> > >>>
> > >>>To solve these problems, dropping the dentry is done in queued work done
> > >>>in a newly-added cfid_put_wq workqueue, and close_all_cached_dirs()
> > >>>flushes that workqueue after it drops all the dentries of which it's
> > >>>aware. This is a global workqueue (rather than scoped to a mount), but
> > >>>the queued work is minimal.
> > >>
> > >>Why does it need to be a global workqueue? Can't you make it per tcon?
> > >
> > >The problem with a per-tcon workqueue is I didn't see clean way to
> > >deal with multiuser mounts and flushing the workqueue in
> > >close_all_cached_dirs() -- when dealing with each individual tcon,
> > >we're still holding tlink_tree_lock, so an arbitrary sleep seems
> > >problematic.
> > >
> > >There could be a per-sb workqueue (stored in cifs_sb or the master
> > >tcon) but is there a way to get back to the superblock / master tcon
> > >with just a tcon (e.g. cached_dir_lease_break, when processing a lease
> > >break)?
> > >
> > >>>The final cleanup work for cleaning up a cfid is performed via work
> > >>>queued in the serverclose_wq workqueue; this is done separate from
> > >>>dropping the dentries so that close_all_cached_dirs() doesn't block on
> > >>>any server operations.
> > >>>
> > >>>Both of these queued works expect to invoked with a cfid reference and
> > >>>a tcon reference to avoid those objects from being freed while the work
> > >>>is ongoing.
> > >>
> > >>Why do you need to take a tcon reference?
> > >
> > >In the existing code (and my patch, without the refs), I was seeing an
> > >intermittent use-after-free of the tcon or cached_fids struct by
> > >queued work processing a lease break -- the cfid isn't linked from
> > >cached_fids, but smb2_close_cached_fid invoking SMB2_close can race
> > >with the unmount and cifs_put_tcon
> > >
> > >Something like:
> > >
> > > t1 t2
> > >cached_dir_lease_break
> > >smb2_cached_lease_break
> > >smb2_close_cached_fid
> > >SMB2_close starts
> > > cifs_kill_sb
> > > cifs_umount
> > > cifs_put_link
> > > cifs_put_tcon
> > >SMB2_close continues
> > >
> > >I had a version of the patch that kept the 'in flight lease breaks' on
> > >a second list in cached_fids so that they could be cancelled
> > >synchronously from free_cached_fids(), but I struggled with it (I
> > >can't remember exactly, but I think I was struggling to get the linked
> > >list membership / removal handling and num_entries handling
> > >consistent).
> > >
> > >>Can't you drop the dentries
> > >>when tearing down tcon in cifs_put_tcon()? No concurrent mounts would
> > >>be able to access or free it.
> > >
> > >The dentries being dropped must occur before kill_anon_super(), as
> > >that's where the 'Dentry still in use' check is. All the tcons are put
> > >in cifs_umount(), which occurs after:
> > >
> > > kill_anon_super(sb);
> > > cifs_umount(cifs_sb);
> > >
> > >The other thing is that cifs_umount_begin() has this comment, which
> > >made me think a tcon can actually be tied to two distinct mount
> > >points:
> > >
> > > if ((tcon->tc_count > 1) || (tcon->status == TID_EXITING)) {
> > > /* we have other mounts to same share or we have
> > > already tried to umount this and woken up
> > > all waiting network requests, nothing to do */
> > >
> > >Although, as I'm thinking about it again, I think I've misunderstood
> > >(and that comment is wrong?).
> > >
> > >It did cross my mind to pull some of the work out of cifs_umount into
> > >cifs_kill_sb (specifically, I wanted to cancel prune_tlinks earlier)
> > >-- no prune_tlinks would make it more feasible to drop tlink_tree_lock
> > >in close_all_cached_dirs(), at which point a per-tcon workqueue is
> > >more practical.
> > >
> > >>After running xfstests I've seen a leaked tcon in
> > >>/proc/fs/cifs/DebugData with no CIFS superblocks, which might be related
> > >>to this.
> > >>
> > >>Could you please check if there is any leaked connection in
> > >>/proc/fs/cifs/DebugData after running your tests?
> > >
> > >After I finish with my tests (I'm not using xfstests, although perhaps
> > >I should be) and unmount the share, DebugData doesn't show any
> > >connections for me.
> >
> > I was able to reproduce this leak. I believe the attached patch addresses it.
> >
> > I'm able to intermittently see a 'Dentry still in use' bug with xfstests
> > generic/241 (what Steve saw) (the attached patch doesn't help with that). I'm
> > still unsure what's going on there.
> >
> > >~Paul
>
>
>
> --
> Thanks,
>
> Steve
--
Thanks,
Steve
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry
2024-11-28 1:10 ` Steve French
@ 2024-11-28 5:00 ` Steve French
2024-11-28 14:16 ` Steve French
0 siblings, 1 reply; 20+ messages in thread
From: Steve French @ 2024-11-28 5:00 UTC (permalink / raw)
To: Paulo Alcantara, linux-cifs, Shyam Prasad N, Bharath SM
I also see it without the patch "smb: During unmount, ensure all
cached dir instances drop their dentry".
On Wed, Nov 27, 2024 at 7:10 PM Steve French <smfrench@gmail.com> wrote:
>
> I see this error at the end of generic/241 (in dmesg) even without the
> more recent dir lease patch:
>
> "smb: Initialize cfid->tcon before performing network ops"
>
> so it is likely unrelated (earlier bug)
>
>
>
> Nov 27 18:07:46 fedora29 kernel: Call Trace:
> Nov 27 18:07:46 fedora29 kernel: <TASK>
> Nov 27 18:07:46 fedora29 kernel: ? __warn+0xa9/0x220
> Nov 27 18:07:46 fedora29 kernel: ? umount_check+0xc3/0xf0
> Nov 27 18:07:46 fedora29 kernel: ? report_bug+0x1d4/0x1e0
> Nov 27 18:07:46 fedora29 kernel: ? handle_bug+0x5b/0xa0
> Nov 27 18:07:46 fedora29 kernel: ? exc_invalid_op+0x18/0x50
> Nov 27 18:07:46 fedora29 kernel: ? asm_exc_invalid_op+0x1a/0x20
> Nov 27 18:07:46 fedora29 kernel: ? irq_work_claim+0x1e/0x40
> Nov 27 18:07:46 fedora29 kernel: ? umount_check+0xc3/0xf0
> Nov 27 18:07:46 fedora29 kernel: ? __pfx_umount_check+0x10/0x10
> Nov 27 18:07:46 fedora29 kernel: d_walk+0xf3/0x4e0
> Nov 27 18:07:46 fedora29 kernel: ? d_walk+0x4b/0x4e0
> Nov 27 18:07:46 fedora29 kernel: shrink_dcache_for_umount+0x6d/0x220
> Nov 27 18:07:46 fedora29 kernel: generic_shutdown_super+0x4a/0x1c0
> Nov 27 18:07:46 fedora29 kernel: kill_anon_super+0x22/0x40
> Nov 27 18:07:46 fedora29 kernel: cifs_kill_sb+0x78/0x90 [cifs]
>
> On Wed, Nov 27, 2024 at 10:38 AM Steve French <smfrench@gmail.com> wrote:
> >
> > I did see the generic/241 failure again with current for-next
> > (unrelated to this patch though). Will try to repro it again - but
> > any ideas how to narrow it down or fix it would be helpful.
> >
> > SECTION -- smb3
> > FSTYP -- cifs
> > PLATFORM -- Linux/x86_64 fedora29 6.12.0 #1 SMP PREEMPT_DYNAMIC Wed
> > Nov 27 01:02:07 UTC 2024
> > MKFS_OPTIONS -- //win16.vm.test/Scratch
> > generic/241 73s
> > Ran: generic/241
> > Passed all 1 tests
> > SECTION -- smb3
> > =========================
> > Ran: generic/241
> > Passed all 1 tests
> > Number of reconnects: 0
> > Test completed smb3 generic/241 at Wed Nov 27 06:38:47 AM UTC 2024
> > dmesg output during the test:
> > [Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Share
> > [Wed Nov 27 00:37:32 2024] CIFS: VFS: generate_smb3signingkey: dumping
> > generated AES session keys
> > [Wed Nov 27 00:37:32 2024] CIFS: VFS: Session Id 45 00 00 08 00 c8 00 00
> > [Wed Nov 27 00:37:32 2024] CIFS: VFS: Cipher type 2
> > [Wed Nov 27 00:37:32 2024] CIFS: VFS: Session Key 00 bf ed c7 f1 95 0e
> > 29 06 e8 82 87 b5 c8 72 06
> > [Wed Nov 27 00:37:32 2024] CIFS: VFS: Signing Key a4 0f 15 64 d2 69 02
> > 2f 4e 78 60 7a fe 3e 31 4e
> > [Wed Nov 27 00:37:32 2024] CIFS: VFS: ServerIn Key a6 fd 04 f6 04 ea
> > 0e 6e 60 c0 1b b1 ee 63 38 e9
> > [Wed Nov 27 00:37:32 2024] CIFS: VFS: ServerOut Key a6 e3 e3 22 8c c2
> > b0 6e b1 9d 40 ea d0 89 6d d8
> > [Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Scratch
> > [Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Scratch
> > [Wed Nov 27 00:37:32 2024] run fstests generic/241 at 2024-11-27 00:37:33
> > [Wed Nov 27 00:38:46 2024] ------------[ cut here ]------------
> > [Wed Nov 27 00:38:46 2024] BUG: Dentry
> > 00000000318d67d4{i=11000000033f68,n=~dmtmp} still in use (1) [unmount
> > of cifs cifs]
> > [Wed Nov 27 00:38:46 2024] WARNING: CPU: 2 PID: 316177 at
> > fs/dcache.c:1546 umount_check+0xc3/0xf0
> > [Wed Nov 27 00:38:46 2024] Modules linked in: cifs ccm cmac nls_utf8
> > cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4
> > dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns
> > nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> > nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
> > nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat
> > ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> > iptable_security ip_set ebtable_filter ebtables ip6table_filter
> > ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net
> > net_failover failover virtio_balloon loop fuse dm_multipath nfnetlink
> > zram xfs bochs drm_client_lib drm_shmem_helper drm_kms_helper
> > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm
> > sha512_ssse3 sha256_ssse3 sha1_ssse3 floppy virtio_blk qemu_fw_cfg
> > virtio_console [last unloaded: cifs]
> > [Wed Nov 27 00:38:46 2024] CPU: 2 UID: 0 PID: 316177 Comm: umount Not
> > tainted 6.12.0 #1
> > [Wed Nov 27 00:38:46 2024] Hardware name: Red Hat KVM, BIOS
> > 1.16.3-2.el9 04/01/2014
> > [Wed Nov 27 00:38:46 2024] RIP: 0010:umount_check+0xc3/0xf0
> > [Wed Nov 27 00:38:46 2024] Code: db 74 0d 48 8d 7b 40 e8 db df f5 ff
> > 48 8b 53 40 41 55 4d 89 f1 45 89 e0 48 89 e9 48 89 ee 48 c7 c7 80 99
> > ba ad e8 2d 27 a2 ff <0f> 0b 58 31 c0 5b 5d 41 5c 41 5d 41 5e c3 cc cc
> > cc cc 41 83 fc 01
> > [Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fd20 EFLAGS: 00010282
> > [Wed Nov 27 00:38:46 2024] RAX: dffffc0000000000 RBX: ff1100010c574ce0
> > RCX: 0000000000000027
> > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
> > RDI: ff110004cb131a48
> > [Wed Nov 27 00:38:46 2024] RBP: ff1100012c76bd60 R08: ffffffffac3fd2fe
> > R09: ffe21c0099626349
> > [Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
> > R12: 0000000000000001
> > [Wed Nov 27 00:38:46 2024] R13: ff110001238b6668 R14: ffffffffc1d6e6c0
> > R15: ff1100012c76be18
> > [Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
> > GS:ff110004cb100000(0000) knlGS:0000000000000000
> > [Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
> > CR4: 0000000000373ef0
> > [Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
> > DR2: 0000000000000000
> > [Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> > DR7: 0000000000000400
> > [Wed Nov 27 00:38:46 2024] Call Trace:
> > [Wed Nov 27 00:38:46 2024] <TASK>
> > [Wed Nov 27 00:38:46 2024] ? __warn+0xa9/0x220
> > [Wed Nov 27 00:38:46 2024] ? umount_check+0xc3/0xf0
> > [Wed Nov 27 00:38:46 2024] ? report_bug+0x1d4/0x1e0
> > [Wed Nov 27 00:38:46 2024] ? handle_bug+0x5b/0xa0
> > [Wed Nov 27 00:38:46 2024] ? exc_invalid_op+0x18/0x50
> > [Wed Nov 27 00:38:46 2024] ? asm_exc_invalid_op+0x1a/0x20
> > [Wed Nov 27 00:38:46 2024] ? irq_work_claim+0x1e/0x40
> > [Wed Nov 27 00:38:46 2024] ? umount_check+0xc3/0xf0
> > [Wed Nov 27 00:38:46 2024] ? __pfx_umount_check+0x10/0x10
> > [Wed Nov 27 00:38:46 2024] d_walk+0xf3/0x4e0
> > [Wed Nov 27 00:38:46 2024] ? d_walk+0x4b/0x4e0
> > [Wed Nov 27 00:38:46 2024] shrink_dcache_for_umount+0x6d/0x220
> > [Wed Nov 27 00:38:46 2024] generic_shutdown_super+0x4a/0x1c0
> > [Wed Nov 27 00:38:46 2024] kill_anon_super+0x22/0x40
> > [Wed Nov 27 00:38:46 2024] cifs_kill_sb+0x78/0x90 [cifs]
> > [Wed Nov 27 00:38:46 2024] deactivate_locked_super+0x69/0xf0
> > [Wed Nov 27 00:38:46 2024] cleanup_mnt+0x195/0x200
> > [Wed Nov 27 00:38:46 2024] task_work_run+0xec/0x150
> > [Wed Nov 27 00:38:46 2024] ? __pfx_task_work_run+0x10/0x10
> > [Wed Nov 27 00:38:46 2024] ? mark_held_locks+0x24/0x90
> > [Wed Nov 27 00:38:46 2024] syscall_exit_to_user_mode+0x269/0x2a0
> > [Wed Nov 27 00:38:46 2024] do_syscall_64+0x81/0x180
> > [Wed Nov 27 00:38:46 2024] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [Wed Nov 27 00:38:46 2024] RIP: 0033:0x7fddc1ff43eb
> > [Wed Nov 27 00:38:46 2024] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f
> > 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa
> > b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15
> > f9 a9 0c 00 f7 d8
> > [Wed Nov 27 00:38:46 2024] RSP: 002b:00007ffe64be88d8 EFLAGS: 00000246
> > ORIG_RAX: 00000000000000a6
> > [Wed Nov 27 00:38:46 2024] RAX: 0000000000000000 RBX: 00005632106d4c20
> > RCX: 00007fddc1ff43eb
> > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000000 RSI: 0000000000000000
> > RDI: 00005632106d9410
> > [Wed Nov 27 00:38:46 2024] RBP: 00007ffe64be89b0 R08: 00005632106d4010
> > R09: 0000000000000007
> > [Wed Nov 27 00:38:46 2024] R10: 0000000000000000 R11: 0000000000000246
> > R12: 00005632106d4d28
> > [Wed Nov 27 00:38:46 2024] R13: 0000000000000000 R14: 00005632106d9410
> > R15: 00005632106d5030
> > [Wed Nov 27 00:38:46 2024] </TASK>
> > [Wed Nov 27 00:38:46 2024] irq event stamp: 8317
> > [Wed Nov 27 00:38:46 2024] hardirqs last enabled at (8323):
> > [<ffffffffac230dce>] __up_console_sem+0x5e/0x70
> > [Wed Nov 27 00:38:46 2024] hardirqs last disabled at (8328):
> > [<ffffffffac230db3>] __up_console_sem+0x43/0x70
> > [Wed Nov 27 00:38:46 2024] softirqs last enabled at (6628):
> > [<ffffffffac135745>] __irq_exit_rcu+0x135/0x160
> > [Wed Nov 27 00:38:46 2024] softirqs last disabled at (6539):
> > [<ffffffffac135745>] __irq_exit_rcu+0x135/0x160
> > [Wed Nov 27 00:38:46 2024] ---[ end trace 0000000000000000 ]---
> > [Wed Nov 27 00:38:46 2024] VFS: Busy inodes after unmount of cifs (cifs)
> > [Wed Nov 27 00:38:46 2024] ------------[ cut here ]------------
> > [Wed Nov 27 00:38:46 2024] kernel BUG at fs/super.c:650!
> > [Wed Nov 27 00:38:46 2024] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
> > KASAN NOPTI
> > [Wed Nov 27 00:38:46 2024] CPU: 2 UID: 0 PID: 316177 Comm: umount
> > Tainted: G W 6.12.0 #1
> > [Wed Nov 27 00:38:46 2024] Tainted: [W]=WARN
> > [Wed Nov 27 00:38:46 2024] Hardware name: Red Hat KVM, BIOS
> > 1.16.3-2.el9 04/01/2014
> > [Wed Nov 27 00:38:46 2024] RIP: 0010:generic_shutdown_super+0x1b7/0x1c0
> > [Wed Nov 27 00:38:46 2024] Code: 7b 28 e8 5c ca f8 ff 48 8b 6b 28 48
> > 89 ef e8 50 ca f8 ff 48 8b 55 00 48 8d b3 68 06 00 00 48 c7 c7 e0 38
> > ba ad e8 d9 c1 b5 ff <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90
> > 90 90 90 90 90 90
> > [Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fdf0 EFLAGS: 00010282
> > [Wed Nov 27 00:38:46 2024] RAX: 000000000000002d RBX: ff110001238b6000
> > RCX: 0000000000000027
> > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
> > RDI: ff110004cb131a48
> > [Wed Nov 27 00:38:46 2024] RBP: ffffffffc1c6ac00 R08: ffffffffac3fd2fe
> > R09: ffe21c0099626349
> > [Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
> > R12: ff110001238b69c0
> > [Wed Nov 27 00:38:46 2024] R13: ff110001238b6780 R14: 1fe220002432ffd4
> > R15: 0000000000000000
> > [Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
> > GS:ff110004cb100000(0000) knlGS:0000000000000000
> > [Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
> > CR4: 0000000000373ef0
> > [Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
> > DR2: 0000000000000000
> > [Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> > DR7: 0000000000000400
> > [Wed Nov 27 00:38:46 2024] Call Trace:
> > [Wed Nov 27 00:38:46 2024] <TASK>
> > [Wed Nov 27 00:38:46 2024] ? die+0x37/0x90
> > [Wed Nov 27 00:38:46 2024] ? do_trap+0x133/0x230
> > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > [Wed Nov 27 00:38:46 2024] ? do_error_trap+0x94/0x130
> > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > [Wed Nov 27 00:38:46 2024] ? handle_invalid_op+0x2c/0x40
> > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > [Wed Nov 27 00:38:46 2024] ? exc_invalid_op+0x2f/0x50
> > [Wed Nov 27 00:38:46 2024] ? asm_exc_invalid_op+0x1a/0x20
> > [Wed Nov 27 00:38:46 2024] ? irq_work_claim+0x1e/0x40
> > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > [Wed Nov 27 00:38:46 2024] kill_anon_super+0x22/0x40
> > [Wed Nov 27 00:38:46 2024] cifs_kill_sb+0x78/0x90 [cifs]
> > [Wed Nov 27 00:38:46 2024] deactivate_locked_super+0x69/0xf0
> > [Wed Nov 27 00:38:46 2024] cleanup_mnt+0x195/0x200
> > [Wed Nov 27 00:38:46 2024] task_work_run+0xec/0x150
> > [Wed Nov 27 00:38:46 2024] ? __pfx_task_work_run+0x10/0x10
> > [Wed Nov 27 00:38:46 2024] ? mark_held_locks+0x24/0x90
> > [Wed Nov 27 00:38:46 2024] syscall_exit_to_user_mode+0x269/0x2a0
> > [Wed Nov 27 00:38:46 2024] do_syscall_64+0x81/0x180
> > [Wed Nov 27 00:38:46 2024] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [Wed Nov 27 00:38:46 2024] RIP: 0033:0x7fddc1ff43eb
> > [Wed Nov 27 00:38:46 2024] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f
> > 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa
> > b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15
> > f9 a9 0c 00 f7 d8
> > [Wed Nov 27 00:38:46 2024] RSP: 002b:00007ffe64be88d8 EFLAGS: 00000246
> > ORIG_RAX: 00000000000000a6
> > [Wed Nov 27 00:38:46 2024] RAX: 0000000000000000 RBX: 00005632106d4c20
> > RCX: 00007fddc1ff43eb
> > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000000 RSI: 0000000000000000
> > RDI: 00005632106d9410
> > [Wed Nov 27 00:38:46 2024] RBP: 00007ffe64be89b0 R08: 00005632106d4010
> > R09: 0000000000000007
> > [Wed Nov 27 00:38:46 2024] R10: 0000000000000000 R11: 0000000000000246
> > R12: 00005632106d4d28
> > [Wed Nov 27 00:38:46 2024] R13: 0000000000000000 R14: 00005632106d9410
> > R15: 00005632106d5030
> > [Wed Nov 27 00:38:46 2024] </TASK>
> > [Wed Nov 27 00:38:46 2024] Modules linked in: cifs ccm cmac nls_utf8
> > cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4
> > dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns
> > nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> > nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
> > nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat
> > ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> > iptable_security ip_set ebtable_filter ebtables ip6table_filter
> > ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net
> > net_failover failover virtio_balloon loop fuse dm_multipath nfnetlink
> > zram xfs bochs drm_client_lib drm_shmem_helper drm_kms_helper
> > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm
> > sha512_ssse3 sha256_ssse3 sha1_ssse3 floppy virtio_blk qemu_fw_cfg
> > virtio_console [last unloaded: cifs]
> > [Wed Nov 27 00:38:46 2024] ---[ end trace 0000000000000000 ]---
> > [Wed Nov 27 00:38:46 2024] RIP: 0010:generic_shutdown_super+0x1b7/0x1c0
> > [Wed Nov 27 00:38:46 2024] Code: 7b 28 e8 5c ca f8 ff 48 8b 6b 28 48
> > 89 ef e8 50 ca f8 ff 48 8b 55 00 48 8d b3 68 06 00 00 48 c7 c7 e0 38
> > ba ad e8 d9 c1 b5 ff <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90
> > 90 90 90 90 90 90
> > [Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fdf0 EFLAGS: 00010282
> > [Wed Nov 27 00:38:46 2024] RAX: 000000000000002d RBX: ff110001238b6000
> > RCX: 0000000000000027
> > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
> > RDI: ff110004cb131a48
> > [Wed Nov 27 00:38:46 2024] RBP: ffffffffc1c6ac00 R08: ffffffffac3fd2fe
> > R09: ffe21c0099626349
> > [Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
> > R12: ff110001238b69c0
> > [Wed Nov 27 00:38:46 2024] R13: ff110001238b6780 R14: 1fe220002432ffd4
> > R15: 0000000000000000
> > [Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
> > GS:ff110004cb100000(0000) knlGS:0000000000000000
> > [Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
> > CR4: 0000000000373ef0
> > [Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
> > DR2: 0000000000000000
> > [Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> > DR7: 0000000000000400
> >
> > On Tue, Nov 26, 2024 at 3:50 PM Paul Aurich <paul@darkrain42.org> wrote:
> > >
> > > On 2024-11-22 19:28:34 -0800, Paul Aurich wrote:
> > > >On 2024-11-21 23:05:51 -0300, Paulo Alcantara wrote:
> > > >>Hi Paul,
> > > >>
> > > >>Thanks for looking into this! Really appreciate it.
> > > >>
> > > >>Paul Aurich <paul@darkrain42.org> writes:
> > > >>
> > > >>>The unmount process (cifs_kill_sb() calling close_all_cached_dirs()) can
> > > >>>race with various cached directory operations, which ultimately results
> > > >>>in dentries not being dropped and these kernel BUGs:
> > > >>>
> > > >>>BUG: Dentry ffff88814f37e358{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
> > > >>>VFS: Busy inodes after unmount of cifs (cifs)
> > > >>>------------[ cut here ]------------
> > > >>>kernel BUG at fs/super.c:661!
> > > >>>
> > > >>>This happens when a cfid has been removed from the cfids->entries
> > > >>>list and is in the process of being cleaned up, including:
> > > >>>
> > > >>>- Receiving a lease break from the server
> > > >>>- Server reconnection triggers invalidate_all_cached_dirs(), which
> > > >>> removes all the cfids from the list
> > > >>>- The laundromat thread decides to expire an old cfid.
> > > >>>
> > > >>>To solve these problems, dropping the dentry is done in queued work done
> > > >>>in a newly-added cfid_put_wq workqueue, and close_all_cached_dirs()
> > > >>>flushes that workqueue after it drops all the dentries of which it's
> > > >>>aware. This is a global workqueue (rather than scoped to a mount), but
> > > >>>the queued work is minimal.
> > > >>
> > > >>Why does it need to be a global workqueue? Can't you make it per tcon?
> > > >
> > > >The problem with a per-tcon workqueue is I didn't see a clean way to
> > > >deal with multiuser mounts and flushing the workqueue in
> > > >close_all_cached_dirs() -- when dealing with each individual tcon,
> > > >we're still holding tlink_tree_lock, so an arbitrary sleep seems
> > > >problematic.
> > > >
> > > >There could be a per-sb workqueue (stored in cifs_sb or the master
> > > >tcon) but is there a way to get back to the superblock / master tcon
> > > >with just a tcon (e.g. cached_dir_lease_break, when processing a lease
> > > >break)?
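(Illustrative aside, not from the thread: the constraint being described — tlink_tree_lock is a spinlock, so close_all_cached_dirs() cannot sleep or flush a workqueue while holding it — is conventionally handled by detaching entries onto a private list under the lock and doing any blocking cleanup only after unlocking. A minimal userspace sketch of that pattern, with a pthread mutex standing in for the kernel spinlock; all names here are invented for illustration:)

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Entries needing blocking cleanup (dropping a dentry, say). */
struct entry {
	struct entry *next;
	int cleaned;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *entries;

/* Detach the whole list while holding the lock; perform the work
 * that may sleep only after the lock has been released. */
static int drain_entries(void)
{
	struct entry *local, *e;
	int n = 0;

	pthread_mutex_lock(&list_lock);
	local = entries;
	entries = NULL;
	pthread_mutex_unlock(&list_lock);

	for (e = local; e != NULL; e = e->next) {
		e->cleaned = 1;   /* blocking cleanup would go here */
		n++;
	}
	return n;
}

/* Build a two-entry list and drain it; returns entries cleaned. */
static int demo(void)
{
	static struct entry a, b;

	a.next = &b;
	b.next = NULL;
	a.cleaned = b.cleaned = 0;
	entries = &a;
	return drain_entries();
}
```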
> > > >
> > > >>>The final cleanup work for cleaning up a cfid is performed via work
> > > >>>queued in the serverclose_wq workqueue; this is done separate from
> > > >>>dropping the dentries so that close_all_cached_dirs() doesn't block on
> > > >>>any server operations.
> > > >>>
> > > >>>Both of these queued works expect to be invoked with a cfid
> > > >>>reference and a tcon reference, to keep those objects from being
> > > >>>freed while the work is ongoing.
> > > >>
> > > >>Why do you need to take a tcon reference?
> > > >
> > > >In the existing code (and my patch, without the refs), I was seeing an
> > > >intermittent use-after-free of the tcon or cached_fids struct by
> > > >queued work processing a lease break -- the cfid isn't linked from
> > > >cached_fids, but smb2_close_cached_fid invoking SMB2_close can race
> > > >with the unmount and cifs_put_tcon
> > > >
> > > >Something like:
> > > >
> > > > t1 t2
> > > >cached_dir_lease_break
> > > >smb2_cached_lease_break
> > > >smb2_close_cached_fid
> > > >SMB2_close starts
> > > > cifs_kill_sb
> > > > cifs_umount
> > > > cifs_put_link
> > > > cifs_put_tcon
> > > >SMB2_close continues
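(Illustrative aside, not from the thread: the fix discussed here — having the queued work pin the tcon so its in-flight SMB2_close() cannot race with cifs_put_tcon() freeing the structure — is the standard "take a reference before queueing, drop it in the work function" pattern. A hedged userspace analogue follows; the struct and function names are invented stand-ins, not the real cifs API:)

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Stand-in for a tcon: freed when the last reference drops. */
struct fake_tcon {
	atomic_int refcount;
	atomic_int *freed;	/* bumped exactly once, by the final put */
};

static void tcon_get(struct fake_tcon *t)
{
	atomic_fetch_add(&t->refcount, 1);
}

static void tcon_put(struct fake_tcon *t)
{
	if (atomic_fetch_sub(&t->refcount, 1) == 1) {
		atomic_fetch_add(t->freed, 1);	/* stands in for kfree() */
		free(t);
	}
}

/* Queued work: safe to use the tcon because the queuer took a ref. */
static void *lease_break_work(void *arg)
{
	struct fake_tcon *t = arg;
	/* ... the SMB2_close() call would run here ... */
	tcon_put(t);	/* drop the reference taken at queue time */
	return NULL;
}

/* Returns how many times the tcon was freed (must be exactly 1),
 * regardless of how the worker and "umount" interleave. */
static int demo(void)
{
	static atomic_int freed;
	struct fake_tcon *t = malloc(sizeof(*t));
	pthread_t worker;

	atomic_init(&freed, 0);
	atomic_init(&t->refcount, 1);	/* the mount's reference */
	t->freed = &freed;

	tcon_get(t);	/* pin the tcon before handing it to the worker */
	pthread_create(&worker, NULL, lease_break_work, t);

	tcon_put(t);	/* "umount" drops its reference concurrently */
	pthread_join(worker, NULL);
	return atomic_load(&freed);
}
```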
> > > >
> > > >I had a version of the patch that kept the 'in flight lease breaks' on
> > > >a second list in cached_fids so that they could be cancelled
> > > >synchronously from free_cached_fids(), but I struggled with it (I
> > > >can't remember exactly, but I think I was struggling to get the linked
> > > >list membership / removal handling and num_entries handling
> > > >consistent).
> > > >
> > > >>Can't you drop the dentries
> > > >>when tearing down tcon in cifs_put_tcon()? No concurrent mounts would
> > > >>be able to access or free it.
> > > >
> > > >Dropping the dentries must happen before kill_anon_super(), as
> > > >that's where the 'Dentry still in use' check is. All the tcons are put
> > > >in cifs_umount(), which occurs after:
> > > >
> > > > kill_anon_super(sb);
> > > > cifs_umount(cifs_sb);
> > > >
> > > >The other thing is that cifs_umount_begin() has this comment, which
> > > >made me think a tcon can actually be tied to two distinct mount
> > > >points:
> > > >
> > > > if ((tcon->tc_count > 1) || (tcon->status == TID_EXITING)) {
> > > > /* we have other mounts to same share or we have
> > > > already tried to umount this and woken up
> > > > all waiting network requests, nothing to do */
> > > >
> > > >Although, as I'm thinking about it again, I think I've misunderstood
> > > >(and that comment is wrong?).
> > > >
> > > >It did cross my mind to pull some of the work out of cifs_umount into
> > > >cifs_kill_sb (specifically, I wanted to cancel prune_tlinks earlier)
> > > >-- no prune_tlinks would make it more feasible to drop tlink_tree_lock
> > > >in close_all_cached_dirs(), at which point a per-tcon workqueue is
> > > >more practical.
> > > >
> > > >>After running xfstests I've seen a leaked tcon in
> > > >>/proc/fs/cifs/DebugData with no CIFS superblocks, which might be related
> > > >>to this.
> > > >>
> > > >>Could you please check if there is any leaked connection in
> > > >>/proc/fs/cifs/DebugData after running your tests?
> > > >
> > > >After I finish with my tests (I'm not using xfstests, although perhaps
> > > >I should be) and unmount the share, DebugData doesn't show any
> > > >connections for me.
> > >
> > > I was able to reproduce this leak. I believe the attached patch addresses it.
> > >
> > > I'm able to intermittently see a 'Dentry still in use' bug with xfstests
> > > generic/241 (what Steve saw) (the attached patch doesn't help with that). I'm
> > > still unsure what's going on there.
> > >
> > > >~Paul
> >
> >
> >
> > --
> > Thanks,
> >
> > Steve
>
>
>
> --
> Thanks,
>
> Steve
--
Thanks,
Steve
^ permalink raw reply [flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry
2024-11-28 5:00 ` Steve French
@ 2024-11-28 14:16 ` Steve French
2024-12-06 23:28 ` Steve French
0 siblings, 1 reply; 20+ messages in thread
From: Steve French @ 2024-11-28 14:16 UTC (permalink / raw)
To: Paulo Alcantara, linux-cifs, Shyam Prasad N, Bharath SM
The unmount crash also happens with mainline so not related to patches
in for-next
http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/6/builds/128/steps/94/logs/stdio
On Wed, Nov 27, 2024 at 11:00 PM Steve French <smfrench@gmail.com> wrote:
>
> And also see it without patch "smb: During unmount, ensure all cached
> dir instances drop their dentry"
>
> On Wed, Nov 27, 2024 at 7:10 PM Steve French <smfrench@gmail.com> wrote:
> >
> > I see this error at the end of generic/241 (in dmesg) even without the
> > more recent dir lease patch:
> >
> > "smb: Initialize cfid->tcon before performing network ops"
> >
> > so it is likely unrelated (earlier bug)
> >
> >
> >
> > Nov 27 18:07:46 fedora29 kernel: Call Trace:
> > Nov 27 18:07:46 fedora29 kernel: <TASK>
> > Nov 27 18:07:46 fedora29 kernel: ? __warn+0xa9/0x220
> > Nov 27 18:07:46 fedora29 kernel: ? umount_check+0xc3/0xf0
> > Nov 27 18:07:46 fedora29 kernel: ? report_bug+0x1d4/0x1e0
> > Nov 27 18:07:46 fedora29 kernel: ? handle_bug+0x5b/0xa0
> > Nov 27 18:07:46 fedora29 kernel: ? exc_invalid_op+0x18/0x50
> > Nov 27 18:07:46 fedora29 kernel: ? asm_exc_invalid_op+0x1a/0x20
> > Nov 27 18:07:46 fedora29 kernel: ? irq_work_claim+0x1e/0x40
> > Nov 27 18:07:46 fedora29 kernel: ? umount_check+0xc3/0xf0
> > Nov 27 18:07:46 fedora29 kernel: ? __pfx_umount_check+0x10/0x10
> > Nov 27 18:07:46 fedora29 kernel: d_walk+0xf3/0x4e0
> > Nov 27 18:07:46 fedora29 kernel: ? d_walk+0x4b/0x4e0
> > Nov 27 18:07:46 fedora29 kernel: shrink_dcache_for_umount+0x6d/0x220
> > Nov 27 18:07:46 fedora29 kernel: generic_shutdown_super+0x4a/0x1c0
> > Nov 27 18:07:46 fedora29 kernel: kill_anon_super+0x22/0x40
> > Nov 27 18:07:46 fedora29 kernel: cifs_kill_sb+0x78/0x90 [cifs]
> >
> > On Wed, Nov 27, 2024 at 10:38 AM Steve French <smfrench@gmail.com> wrote:
> > >
> > > I did see the generic/241 failure again with current for-next
> > > (unrelated to this patch though). Will try to repro it again - but
> > > any ideas how to narrow it down or fix it would be helpful.
> > >
> > > SECTION -- smb3
> > > FSTYP -- cifs
> > > PLATFORM -- Linux/x86_64 fedora29 6.12.0 #1 SMP PREEMPT_DYNAMIC Wed
> > > Nov 27 01:02:07 UTC 2024
> > > MKFS_OPTIONS -- //win16.vm.test/Scratch
> > > generic/241 73s
> > > Ran: generic/241
> > > Passed all 1 tests
> > > SECTION -- smb3
> > > =========================
> > > Ran: generic/241
> > > Passed all 1 tests
> > > Number of reconnects: 0
> > > Test completed smb3 generic/241 at Wed Nov 27 06:38:47 AM UTC 2024
> > > dmesg output during the test:
> > > [Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Share
> > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: generate_smb3signingkey: dumping
> > > generated AES session keys
> > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: Session Id 45 00 00 08 00 c8 00 00
> > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: Cipher type 2
> > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: Session Key 00 bf ed c7 f1 95 0e
> > > 29 06 e8 82 87 b5 c8 72 06
> > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: Signing Key a4 0f 15 64 d2 69 02
> > > 2f 4e 78 60 7a fe 3e 31 4e
> > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: ServerIn Key a6 fd 04 f6 04 ea
> > > 0e 6e 60 c0 1b b1 ee 63 38 e9
> > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: ServerOut Key a6 e3 e3 22 8c c2
> > > b0 6e b1 9d 40 ea d0 89 6d d8
> > > [Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Scratch
> > > [Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Scratch
> > > [Wed Nov 27 00:37:32 2024] run fstests generic/241 at 2024-11-27 00:37:33
> > > [Wed Nov 27 00:38:46 2024] ------------[ cut here ]------------
> > > [Wed Nov 27 00:38:46 2024] BUG: Dentry
> > > 00000000318d67d4{i=11000000033f68,n=~dmtmp} still in use (1) [unmount
> > > of cifs cifs]
> > > [Wed Nov 27 00:38:46 2024] WARNING: CPU: 2 PID: 316177 at
> > > fs/dcache.c:1546 umount_check+0xc3/0xf0
> > > [Wed Nov 27 00:38:46 2024] Modules linked in: cifs ccm cmac nls_utf8
> > > cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4
> > > dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns
> > > nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> > > nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
> > > nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat
> > > ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> > > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> > > iptable_security ip_set ebtable_filter ebtables ip6table_filter
> > > ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net
> > > net_failover failover virtio_balloon loop fuse dm_multipath nfnetlink
> > > zram xfs bochs drm_client_lib drm_shmem_helper drm_kms_helper
> > > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm
> > > sha512_ssse3 sha256_ssse3 sha1_ssse3 floppy virtio_blk qemu_fw_cfg
> > > virtio_console [last unloaded: cifs]
> > > [Wed Nov 27 00:38:46 2024] CPU: 2 UID: 0 PID: 316177 Comm: umount Not
> > > tainted 6.12.0 #1
> > > [Wed Nov 27 00:38:46 2024] Hardware name: Red Hat KVM, BIOS
> > > 1.16.3-2.el9 04/01/2014
> > > [Wed Nov 27 00:38:46 2024] RIP: 0010:umount_check+0xc3/0xf0
> > > [Wed Nov 27 00:38:46 2024] Code: db 74 0d 48 8d 7b 40 e8 db df f5 ff
> > > 48 8b 53 40 41 55 4d 89 f1 45 89 e0 48 89 e9 48 89 ee 48 c7 c7 80 99
> > > ba ad e8 2d 27 a2 ff <0f> 0b 58 31 c0 5b 5d 41 5c 41 5d 41 5e c3 cc cc
> > > cc cc 41 83 fc 01
> > > [Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fd20 EFLAGS: 00010282
> > > [Wed Nov 27 00:38:46 2024] RAX: dffffc0000000000 RBX: ff1100010c574ce0
> > > RCX: 0000000000000027
> > > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
> > > RDI: ff110004cb131a48
> > > [Wed Nov 27 00:38:46 2024] RBP: ff1100012c76bd60 R08: ffffffffac3fd2fe
> > > R09: ffe21c0099626349
> > > [Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
> > > R12: 0000000000000001
> > > [Wed Nov 27 00:38:46 2024] R13: ff110001238b6668 R14: ffffffffc1d6e6c0
> > > R15: ff1100012c76be18
> > > [Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
> > > GS:ff110004cb100000(0000) knlGS:0000000000000000
> > > [Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
> > > CR4: 0000000000373ef0
> > > [Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
> > > DR2: 0000000000000000
> > > [Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> > > DR7: 0000000000000400
> > > [Wed Nov 27 00:38:46 2024] Call Trace:
> > > [Wed Nov 27 00:38:46 2024] <TASK>
> > > [Wed Nov 27 00:38:46 2024] ? __warn+0xa9/0x220
> > > [Wed Nov 27 00:38:46 2024] ? umount_check+0xc3/0xf0
> > > [Wed Nov 27 00:38:46 2024] ? report_bug+0x1d4/0x1e0
> > > [Wed Nov 27 00:38:46 2024] ? handle_bug+0x5b/0xa0
> > > [Wed Nov 27 00:38:46 2024] ? exc_invalid_op+0x18/0x50
> > > [Wed Nov 27 00:38:46 2024] ? asm_exc_invalid_op+0x1a/0x20
> > > [Wed Nov 27 00:38:46 2024] ? irq_work_claim+0x1e/0x40
> > > [Wed Nov 27 00:38:46 2024] ? umount_check+0xc3/0xf0
> > > [Wed Nov 27 00:38:46 2024] ? __pfx_umount_check+0x10/0x10
> > > [Wed Nov 27 00:38:46 2024] d_walk+0xf3/0x4e0
> > > [Wed Nov 27 00:38:46 2024] ? d_walk+0x4b/0x4e0
> > > [Wed Nov 27 00:38:46 2024] shrink_dcache_for_umount+0x6d/0x220
> > > [Wed Nov 27 00:38:46 2024] generic_shutdown_super+0x4a/0x1c0
> > > [Wed Nov 27 00:38:46 2024] kill_anon_super+0x22/0x40
> > > [Wed Nov 27 00:38:46 2024] cifs_kill_sb+0x78/0x90 [cifs]
> > > [Wed Nov 27 00:38:46 2024] deactivate_locked_super+0x69/0xf0
> > > [Wed Nov 27 00:38:46 2024] cleanup_mnt+0x195/0x200
> > > [Wed Nov 27 00:38:46 2024] task_work_run+0xec/0x150
> > > [Wed Nov 27 00:38:46 2024] ? __pfx_task_work_run+0x10/0x10
> > > [Wed Nov 27 00:38:46 2024] ? mark_held_locks+0x24/0x90
> > > [Wed Nov 27 00:38:46 2024] syscall_exit_to_user_mode+0x269/0x2a0
> > > [Wed Nov 27 00:38:46 2024] do_syscall_64+0x81/0x180
> > > [Wed Nov 27 00:38:46 2024] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [Wed Nov 27 00:38:46 2024] RIP: 0033:0x7fddc1ff43eb
> > > [Wed Nov 27 00:38:46 2024] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f
> > > 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa
> > > b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15
> > > f9 a9 0c 00 f7 d8
> > > [Wed Nov 27 00:38:46 2024] RSP: 002b:00007ffe64be88d8 EFLAGS: 00000246
> > > ORIG_RAX: 00000000000000a6
> > > [Wed Nov 27 00:38:46 2024] RAX: 0000000000000000 RBX: 00005632106d4c20
> > > RCX: 00007fddc1ff43eb
> > > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000000 RSI: 0000000000000000
> > > RDI: 00005632106d9410
> > > [Wed Nov 27 00:38:46 2024] RBP: 00007ffe64be89b0 R08: 00005632106d4010
> > > R09: 0000000000000007
> > > [Wed Nov 27 00:38:46 2024] R10: 0000000000000000 R11: 0000000000000246
> > > R12: 00005632106d4d28
> > > [Wed Nov 27 00:38:46 2024] R13: 0000000000000000 R14: 00005632106d9410
> > > R15: 00005632106d5030
> > > [Wed Nov 27 00:38:46 2024] </TASK>
> > > [Wed Nov 27 00:38:46 2024] irq event stamp: 8317
> > > [Wed Nov 27 00:38:46 2024] hardirqs last enabled at (8323):
> > > [<ffffffffac230dce>] __up_console_sem+0x5e/0x70
> > > [Wed Nov 27 00:38:46 2024] hardirqs last disabled at (8328):
> > > [<ffffffffac230db3>] __up_console_sem+0x43/0x70
> > > [Wed Nov 27 00:38:46 2024] softirqs last enabled at (6628):
> > > [<ffffffffac135745>] __irq_exit_rcu+0x135/0x160
> > > [Wed Nov 27 00:38:46 2024] softirqs last disabled at (6539):
> > > [<ffffffffac135745>] __irq_exit_rcu+0x135/0x160
> > > [Wed Nov 27 00:38:46 2024] ---[ end trace 0000000000000000 ]---
> > > [Wed Nov 27 00:38:46 2024] VFS: Busy inodes after unmount of cifs (cifs)
> > > [Wed Nov 27 00:38:46 2024] ------------[ cut here ]------------
> > > [Wed Nov 27 00:38:46 2024] kernel BUG at fs/super.c:650!
> > > [Wed Nov 27 00:38:46 2024] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
> > > KASAN NOPTI
> > > [Wed Nov 27 00:38:46 2024] CPU: 2 UID: 0 PID: 316177 Comm: umount
> > > Tainted: G W 6.12.0 #1
> > > [Wed Nov 27 00:38:46 2024] Tainted: [W]=WARN
> > > [Wed Nov 27 00:38:46 2024] Hardware name: Red Hat KVM, BIOS
> > > 1.16.3-2.el9 04/01/2014
> > > [Wed Nov 27 00:38:46 2024] RIP: 0010:generic_shutdown_super+0x1b7/0x1c0
> > > [Wed Nov 27 00:38:46 2024] Code: 7b 28 e8 5c ca f8 ff 48 8b 6b 28 48
> > > 89 ef e8 50 ca f8 ff 48 8b 55 00 48 8d b3 68 06 00 00 48 c7 c7 e0 38
> > > ba ad e8 d9 c1 b5 ff <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90
> > > 90 90 90 90 90 90
> > > [Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fdf0 EFLAGS: 00010282
> > > [Wed Nov 27 00:38:46 2024] RAX: 000000000000002d RBX: ff110001238b6000
> > > RCX: 0000000000000027
> > > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
> > > RDI: ff110004cb131a48
> > > [Wed Nov 27 00:38:46 2024] RBP: ffffffffc1c6ac00 R08: ffffffffac3fd2fe
> > > R09: ffe21c0099626349
> > > [Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
> > > R12: ff110001238b69c0
> > > [Wed Nov 27 00:38:46 2024] R13: ff110001238b6780 R14: 1fe220002432ffd4
> > > R15: 0000000000000000
> > > [Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
> > > GS:ff110004cb100000(0000) knlGS:0000000000000000
> > > [Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
> > > CR4: 0000000000373ef0
> > > [Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
> > > DR2: 0000000000000000
> > > [Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> > > DR7: 0000000000000400
> > > [Wed Nov 27 00:38:46 2024] Call Trace:
> > > [Wed Nov 27 00:38:46 2024] <TASK>
> > > [Wed Nov 27 00:38:46 2024] ? die+0x37/0x90
> > > [Wed Nov 27 00:38:46 2024] ? do_trap+0x133/0x230
> > > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > > [Wed Nov 27 00:38:46 2024] ? do_error_trap+0x94/0x130
> > > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > > [Wed Nov 27 00:38:46 2024] ? handle_invalid_op+0x2c/0x40
> > > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > > [Wed Nov 27 00:38:46 2024] ? exc_invalid_op+0x2f/0x50
> > > [Wed Nov 27 00:38:46 2024] ? asm_exc_invalid_op+0x1a/0x20
> > > [Wed Nov 27 00:38:46 2024] ? irq_work_claim+0x1e/0x40
> > > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > > [Wed Nov 27 00:38:46 2024] kill_anon_super+0x22/0x40
> > > [Wed Nov 27 00:38:46 2024] cifs_kill_sb+0x78/0x90 [cifs]
> > > [Wed Nov 27 00:38:46 2024] deactivate_locked_super+0x69/0xf0
> > > [Wed Nov 27 00:38:46 2024] cleanup_mnt+0x195/0x200
> > > [Wed Nov 27 00:38:46 2024] task_work_run+0xec/0x150
> > > [Wed Nov 27 00:38:46 2024] ? __pfx_task_work_run+0x10/0x10
> > > [Wed Nov 27 00:38:46 2024] ? mark_held_locks+0x24/0x90
> > > [Wed Nov 27 00:38:46 2024] syscall_exit_to_user_mode+0x269/0x2a0
> > > [Wed Nov 27 00:38:46 2024] do_syscall_64+0x81/0x180
> > > [Wed Nov 27 00:38:46 2024] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [Wed Nov 27 00:38:46 2024] RIP: 0033:0x7fddc1ff43eb
> > > [Wed Nov 27 00:38:46 2024] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f
> > > 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa
> > > b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15
> > > f9 a9 0c 00 f7 d8
> > > [Wed Nov 27 00:38:46 2024] RSP: 002b:00007ffe64be88d8 EFLAGS: 00000246
> > > ORIG_RAX: 00000000000000a6
> > > [Wed Nov 27 00:38:46 2024] RAX: 0000000000000000 RBX: 00005632106d4c20
> > > RCX: 00007fddc1ff43eb
> > > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000000 RSI: 0000000000000000
> > > RDI: 00005632106d9410
> > > [Wed Nov 27 00:38:46 2024] RBP: 00007ffe64be89b0 R08: 00005632106d4010
> > > R09: 0000000000000007
> > > [Wed Nov 27 00:38:46 2024] R10: 0000000000000000 R11: 0000000000000246
> > > R12: 00005632106d4d28
> > > [Wed Nov 27 00:38:46 2024] R13: 0000000000000000 R14: 00005632106d9410
> > > R15: 00005632106d5030
> > > [Wed Nov 27 00:38:46 2024] </TASK>
> > > [Wed Nov 27 00:38:46 2024] Modules linked in: cifs ccm cmac nls_utf8
> > > cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4
> > > dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns
> > > nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> > > nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
> > > nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat
> > > ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> > > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> > > iptable_security ip_set ebtable_filter ebtables ip6table_filter
> > > ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net
> > > net_failover failover virtio_balloon loop fuse dm_multipath nfnetlink
> > > zram xfs bochs drm_client_lib drm_shmem_helper drm_kms_helper
> > > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm
> > > sha512_ssse3 sha256_ssse3 sha1_ssse3 floppy virtio_blk qemu_fw_cfg
> > > virtio_console [last unloaded: cifs]
> > > [Wed Nov 27 00:38:46 2024] ---[ end trace 0000000000000000 ]---
> > > [Wed Nov 27 00:38:46 2024] RIP: 0010:generic_shutdown_super+0x1b7/0x1c0
> > > [Wed Nov 27 00:38:46 2024] Code: 7b 28 e8 5c ca f8 ff 48 8b 6b 28 48
> > > 89 ef e8 50 ca f8 ff 48 8b 55 00 48 8d b3 68 06 00 00 48 c7 c7 e0 38
> > > ba ad e8 d9 c1 b5 ff <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90
> > > 90 90 90 90 90 90
> > > [Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fdf0 EFLAGS: 00010282
> > > [Wed Nov 27 00:38:46 2024] RAX: 000000000000002d RBX: ff110001238b6000
> > > RCX: 0000000000000027
> > > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
> > > RDI: ff110004cb131a48
> > > [Wed Nov 27 00:38:46 2024] RBP: ffffffffc1c6ac00 R08: ffffffffac3fd2fe
> > > R09: ffe21c0099626349
> > > [Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
> > > R12: ff110001238b69c0
> > > [Wed Nov 27 00:38:46 2024] R13: ff110001238b6780 R14: 1fe220002432ffd4
> > > R15: 0000000000000000
> > > [Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
> > > GS:ff110004cb100000(0000) knlGS:0000000000000000
> > > [Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
> > > CR4: 0000000000373ef0
> > > [Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
> > > DR2: 0000000000000000
> > > [Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> > > DR7: 0000000000000400
> > >
> > > On Tue, Nov 26, 2024 at 3:50 PM Paul Aurich <paul@darkrain42.org> wrote:
> > > >
> > > > On 2024-11-22 19:28:34 -0800, Paul Aurich wrote:
> > > > >On 2024-11-21 23:05:51 -0300, Paulo Alcantara wrote:
> > > > >>Hi Paul,
> > > > >>
> > > > >>Thanks for looking into this! Really appreciate it.
> > > > >>
> > > > >>Paul Aurich <paul@darkrain42.org> writes:
> > > > >>
> > > > >>>The unmount process (cifs_kill_sb() calling close_all_cached_dirs()) can
> > > > >>>race with various cached directory operations, which ultimately results
> > > > >>>in dentries not being dropped and these kernel BUGs:
> > > > >>>
> > > > >>>BUG: Dentry ffff88814f37e358{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
> > > > >>>VFS: Busy inodes after unmount of cifs (cifs)
> > > > >>>------------[ cut here ]------------
> > > > >>>kernel BUG at fs/super.c:661!
> > > > >>>
> > > > >>>This happens when a cfid is in the process of being cleaned up and
> > > > >>>has been removed from the cfids->entries list, including:
> > > > >>>
> > > > >>>- Receiving a lease break from the server
> > > > >>>- Server reconnection triggers invalidate_all_cached_dirs(), which
> > > > >>> removes all the cfids from the list
> > > > >>>- The laundromat thread decides to expire an old cfid.
> > > > >>>
> > > > >>>To solve these problems, dropping the dentry is done via work queued
> > > > >>>on a newly-added cfid_put_wq workqueue, and close_all_cached_dirs()
> > > > >>>flushes that workqueue after it drops all the dentries of which it's
> > > > >>>aware. This is a global workqueue (rather than scoped to a mount), but
> > > > >>>the queued work is minimal.
> > > > >>
> > > > >>Why does it need to be a global workqueue? Can't you make it per tcon?
> > > > >
> > > > >The problem with a per-tcon workqueue is I didn't see a clean way to
> > > > >deal with multiuser mounts and flushing the workqueue in
> > > > >close_all_cached_dirs() -- when dealing with each individual tcon,
> > > > >we're still holding tlink_tree_lock, so an arbitrary sleep seems
> > > > >problematic.
> > > > >
> > > > >There could be a per-sb workqueue (stored in cifs_sb or the master
> > > > >tcon) but is there a way to get back to the superblock / master tcon
> > > > >with just a tcon (e.g. cached_dir_lease_break, when processing a lease
> > > > >break)?
> > > > >
> > > > >>>The final cleanup of a cfid is performed via work
> > > > >>>queued in the serverclose_wq workqueue; this is done separately from
> > > > >>>dropping the dentries so that close_all_cached_dirs() doesn't block on
> > > > >>>any server operations.
> > > > >>>
> > > > >>>Both of these queued works expect to be invoked with a cfid reference and
> > > > >>>a tcon reference, to prevent those objects from being freed while the work
> > > > >>>is ongoing.
> > > > >>
> > > > >>Why do you need to take a tcon reference?
> > > > >
> > > > >In the existing code (and my patch, without the refs), I was seeing an
> > > > >intermittent use-after-free of the tcon or cached_fids struct by
> > > > >queued work processing a lease break -- the cfid isn't linked from
> > > > >cached_fids, but smb2_close_cached_fid invoking SMB2_close can race
> > > > >with the unmount and cifs_put_tcon
> > > > >
> > > > >Something like:
> > > > >
> > > > > t1 t2
> > > > >cached_dir_lease_break
> > > > >smb2_cached_lease_break
> > > > >smb2_close_cached_fid
> > > > >SMB2_close starts
> > > > > cifs_kill_sb
> > > > > cifs_umount
> > > > > cifs_put_link
> > > > > cifs_put_tcon
> > > > >SMB2_close continues
> > > > >
> > > > >I had a version of the patch that kept the 'in flight lease breaks' on
> > > > >a second list in cached_fids so that they could be cancelled
> > > > >synchronously from free_cached_fids(), but I struggled with it (I
> > > > >can't remember exactly, but I think I was struggling to get the linked
> > > > >list membership / removal handling and num_entries handling
> > > > >consistent).
> > > > >
> > > > >>Can't you drop the dentries
> > > > >>when tearing down tcon in cifs_put_tcon()? No concurrent mounts would
> > > > >>be able to access or free it.
> > > > >
> > > > >Dropping the dentries must occur before kill_anon_super(), as
> > > > >that's where the 'Dentry still in use' check is. All the tcons are put
> > > > >in cifs_umount(), which occurs after:
> > > > >
> > > > > kill_anon_super(sb);
> > > > > cifs_umount(cifs_sb);
> > > > >
> > > > >The other thing is that cifs_umount_begin() has this comment, which
> > > > >made me think a tcon can actually be tied to two distinct mount
> > > > >points:
> > > > >
> > > > > if ((tcon->tc_count > 1) || (tcon->status == TID_EXITING)) {
> > > > > /* we have other mounts to same share or we have
> > > > > already tried to umount this and woken up
> > > > > all waiting network requests, nothing to do */
> > > > >
> > > > >Although, as I'm thinking about it again, I think I've misunderstood
> > > > >(and that comment is wrong?).
> > > > >
> > > > >It did cross my mind to pull some of the work out of cifs_umount into
> > > > >cifs_kill_sb (specifically, I wanted to cancel prune_tlinks earlier)
> > > > >-- no prune_tlinks would make it more feasible to drop tlink_tree_lock
> > > > >in close_all_cached_dirs(), at which point a per-tcon workqueue is
> > > > >more practical.
> > > > >
> > > > >>After running xfstests I've seen a leaked tcon in
> > > > >>/proc/fs/cifs/DebugData with no CIFS superblocks, which might be related
> > > > >>to this.
> > > > >>
> > > > >>Could you please check if there is any leaked connection in
> > > > >>/proc/fs/cifs/DebugData after running your tests?
> > > > >
> > > > >After I finish with my tests (I'm not using xfstests, although perhaps
> > > > >I should be) and unmount the share, DebugData doesn't show any
> > > > >connections for me.
> > > >
> > > > I was able to reproduce this leak. I believe the attached patch addresses it.
> > > >
> > > > I'm able to intermittently see a 'Dentry still in use' bug with xfstests
> > > > generic/241 (what Steve saw); the attached patch doesn't help with that. I'm
> > > > still unsure what's going on there.
> > > >
> > > > >~Paul
--
Thanks,
Steve
^ permalink raw reply [flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry
2024-11-28 14:16 ` Steve French
@ 2024-12-06 23:28 ` Steve French
0 siblings, 0 replies; 20+ messages in thread
From: Steve French @ 2024-12-06 23:28 UTC (permalink / raw)
To: Paulo Alcantara, linux-cifs, Shyam Prasad N, Bharath SM
[-- Attachment #1: Type: text/plain, Size: 24885 bytes --]
I see the attached umount check warning against current Samba (presumably
due to enabling directory leases), but it is rare: I saw it once in test
generic/337 and once in generic/339.
On Thu, Nov 28, 2024 at 8:16 AM Steve French <smfrench@gmail.com> wrote:
>
> The unmount crash also happens with mainline so not related to patches
> in for-next
>
> http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/6/builds/128/steps/94/logs/stdio
>
> On Wed, Nov 27, 2024 at 11:00 PM Steve French <smfrench@gmail.com> wrote:
> >
> > And also see it without patch "smb: During unmount, ensure all cached
> > dir instances drop their dentry"
> >
> > On Wed, Nov 27, 2024 at 7:10 PM Steve French <smfrench@gmail.com> wrote:
> > >
> > > I see this error at the end of generic/241 (in dmesg) even without the
> > > more recent dir lease patch:
> > >
> > > "smb: Initialize cfid->tcon before performing network ops"
> > >
> > > so it is likely unrelated (an earlier bug)
> > >
> > >
> > >
> > > Nov 27 18:07:46 fedora29 kernel: Call Trace:
> > > Nov 27 18:07:46 fedora29 kernel: <TASK>
> > > Nov 27 18:07:46 fedora29 kernel: ? __warn+0xa9/0x220
> > > Nov 27 18:07:46 fedora29 kernel: ? umount_check+0xc3/0xf0
> > > Nov 27 18:07:46 fedora29 kernel: ? report_bug+0x1d4/0x1e0
> > > Nov 27 18:07:46 fedora29 kernel: ? handle_bug+0x5b/0xa0
> > > Nov 27 18:07:46 fedora29 kernel: ? exc_invalid_op+0x18/0x50
> > > Nov 27 18:07:46 fedora29 kernel: ? asm_exc_invalid_op+0x1a/0x20
> > > Nov 27 18:07:46 fedora29 kernel: ? irq_work_claim+0x1e/0x40
> > > Nov 27 18:07:46 fedora29 kernel: ? umount_check+0xc3/0xf0
> > > Nov 27 18:07:46 fedora29 kernel: ? __pfx_umount_check+0x10/0x10
> > > Nov 27 18:07:46 fedora29 kernel: d_walk+0xf3/0x4e0
> > > Nov 27 18:07:46 fedora29 kernel: ? d_walk+0x4b/0x4e0
> > > Nov 27 18:07:46 fedora29 kernel: shrink_dcache_for_umount+0x6d/0x220
> > > Nov 27 18:07:46 fedora29 kernel: generic_shutdown_super+0x4a/0x1c0
> > > Nov 27 18:07:46 fedora29 kernel: kill_anon_super+0x22/0x40
> > > Nov 27 18:07:46 fedora29 kernel: cifs_kill_sb+0x78/0x90 [cifs]
> > >
> > > On Wed, Nov 27, 2024 at 10:38 AM Steve French <smfrench@gmail.com> wrote:
> > > >
> > > > I did see the generic/241 failure again with current for-next
> > > > (unrelated to this patch though). Will try to repro it again - but
> > > > any ideas how to narrow it down or fix it would be helpful.
> > > >
> > > > SECTION -- smb3
> > > > FSTYP -- cifs
> > > > PLATFORM -- Linux/x86_64 fedora29 6.12.0 #1 SMP PREEMPT_DYNAMIC Wed
> > > > Nov 27 01:02:07 UTC 2024
> > > > MKFS_OPTIONS -- //win16.vm.test/Scratch
> > > > generic/241 73s
> > > > Ran: generic/241
> > > > Passed all 1 tests
> > > > SECTION -- smb3
> > > > =========================
> > > > Ran: generic/241
> > > > Passed all 1 tests
> > > > Number of reconnects: 0
> > > > Test completed smb3 generic/241 at Wed Nov 27 06:38:47 AM UTC 2024
> > > > dmesg output during the test:
> > > > [Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Share
> > > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: generate_smb3signingkey: dumping
> > > > generated AES session keys
> > > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: Session Id 45 00 00 08 00 c8 00 00
> > > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: Cipher type 2
> > > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: Session Key 00 bf ed c7 f1 95 0e
> > > > 29 06 e8 82 87 b5 c8 72 06
> > > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: Signing Key a4 0f 15 64 d2 69 02
> > > > 2f 4e 78 60 7a fe 3e 31 4e
> > > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: ServerIn Key a6 fd 04 f6 04 ea
> > > > 0e 6e 60 c0 1b b1 ee 63 38 e9
> > > > [Wed Nov 27 00:37:32 2024] CIFS: VFS: ServerOut Key a6 e3 e3 22 8c c2
> > > > b0 6e b1 9d 40 ea d0 89 6d d8
> > > > [Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Scratch
> > > > [Wed Nov 27 00:37:32 2024] CIFS: Attempting to mount //win16.vm.test/Scratch
> > > > [Wed Nov 27 00:37:32 2024] run fstests generic/241 at 2024-11-27 00:37:33
> > > > [Wed Nov 27 00:38:46 2024] ------------[ cut here ]------------
> > > > [Wed Nov 27 00:38:46 2024] BUG: Dentry
> > > > 00000000318d67d4{i=11000000033f68,n=~dmtmp} still in use (1) [unmount
> > > > of cifs cifs]
> > > > [Wed Nov 27 00:38:46 2024] WARNING: CPU: 2 PID: 316177 at
> > > > fs/dcache.c:1546 umount_check+0xc3/0xf0
> > > > [Wed Nov 27 00:38:46 2024] Modules linked in: cifs ccm cmac nls_utf8
> > > > cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4
> > > > dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns
> > > > nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> > > > nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
> > > > nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat
> > > > ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> > > > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> > > > iptable_security ip_set ebtable_filter ebtables ip6table_filter
> > > > ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net
> > > > net_failover failover virtio_balloon loop fuse dm_multipath nfnetlink
> > > > zram xfs bochs drm_client_lib drm_shmem_helper drm_kms_helper
> > > > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm
> > > > sha512_ssse3 sha256_ssse3 sha1_ssse3 floppy virtio_blk qemu_fw_cfg
> > > > virtio_console [last unloaded: cifs]
> > > > [Wed Nov 27 00:38:46 2024] CPU: 2 UID: 0 PID: 316177 Comm: umount Not
> > > > tainted 6.12.0 #1
> > > > [Wed Nov 27 00:38:46 2024] Hardware name: Red Hat KVM, BIOS
> > > > 1.16.3-2.el9 04/01/2014
> > > > [Wed Nov 27 00:38:46 2024] RIP: 0010:umount_check+0xc3/0xf0
> > > > [Wed Nov 27 00:38:46 2024] Code: db 74 0d 48 8d 7b 40 e8 db df f5 ff
> > > > 48 8b 53 40 41 55 4d 89 f1 45 89 e0 48 89 e9 48 89 ee 48 c7 c7 80 99
> > > > ba ad e8 2d 27 a2 ff <0f> 0b 58 31 c0 5b 5d 41 5c 41 5d 41 5e c3 cc cc
> > > > cc cc 41 83 fc 01
> > > > [Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fd20 EFLAGS: 00010282
> > > > [Wed Nov 27 00:38:46 2024] RAX: dffffc0000000000 RBX: ff1100010c574ce0
> > > > RCX: 0000000000000027
> > > > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
> > > > RDI: ff110004cb131a48
> > > > [Wed Nov 27 00:38:46 2024] RBP: ff1100012c76bd60 R08: ffffffffac3fd2fe
> > > > R09: ffe21c0099626349
> > > > [Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
> > > > R12: 0000000000000001
> > > > [Wed Nov 27 00:38:46 2024] R13: ff110001238b6668 R14: ffffffffc1d6e6c0
> > > > R15: ff1100012c76be18
> > > > [Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
> > > > GS:ff110004cb100000(0000) knlGS:0000000000000000
> > > > [Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
> > > > CR4: 0000000000373ef0
> > > > [Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
> > > > DR2: 0000000000000000
> > > > [Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> > > > DR7: 0000000000000400
> > > > [Wed Nov 27 00:38:46 2024] Call Trace:
> > > > [Wed Nov 27 00:38:46 2024] <TASK>
> > > > [Wed Nov 27 00:38:46 2024] ? __warn+0xa9/0x220
> > > > [Wed Nov 27 00:38:46 2024] ? umount_check+0xc3/0xf0
> > > > [Wed Nov 27 00:38:46 2024] ? report_bug+0x1d4/0x1e0
> > > > [Wed Nov 27 00:38:46 2024] ? handle_bug+0x5b/0xa0
> > > > [Wed Nov 27 00:38:46 2024] ? exc_invalid_op+0x18/0x50
> > > > [Wed Nov 27 00:38:46 2024] ? asm_exc_invalid_op+0x1a/0x20
> > > > [Wed Nov 27 00:38:46 2024] ? irq_work_claim+0x1e/0x40
> > > > [Wed Nov 27 00:38:46 2024] ? umount_check+0xc3/0xf0
> > > > [Wed Nov 27 00:38:46 2024] ? __pfx_umount_check+0x10/0x10
> > > > [Wed Nov 27 00:38:46 2024] d_walk+0xf3/0x4e0
> > > > [Wed Nov 27 00:38:46 2024] ? d_walk+0x4b/0x4e0
> > > > [Wed Nov 27 00:38:46 2024] shrink_dcache_for_umount+0x6d/0x220
> > > > [Wed Nov 27 00:38:46 2024] generic_shutdown_super+0x4a/0x1c0
> > > > [Wed Nov 27 00:38:46 2024] kill_anon_super+0x22/0x40
> > > > [Wed Nov 27 00:38:46 2024] cifs_kill_sb+0x78/0x90 [cifs]
> > > > [Wed Nov 27 00:38:46 2024] deactivate_locked_super+0x69/0xf0
> > > > [Wed Nov 27 00:38:46 2024] cleanup_mnt+0x195/0x200
> > > > [Wed Nov 27 00:38:46 2024] task_work_run+0xec/0x150
> > > > [Wed Nov 27 00:38:46 2024] ? __pfx_task_work_run+0x10/0x10
> > > > [Wed Nov 27 00:38:46 2024] ? mark_held_locks+0x24/0x90
> > > > [Wed Nov 27 00:38:46 2024] syscall_exit_to_user_mode+0x269/0x2a0
> > > > [Wed Nov 27 00:38:46 2024] do_syscall_64+0x81/0x180
> > > > [Wed Nov 27 00:38:46 2024] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > > [Wed Nov 27 00:38:46 2024] RIP: 0033:0x7fddc1ff43eb
> > > > [Wed Nov 27 00:38:46 2024] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f
> > > > 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa
> > > > b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15
> > > > f9 a9 0c 00 f7 d8
> > > > [Wed Nov 27 00:38:46 2024] RSP: 002b:00007ffe64be88d8 EFLAGS: 00000246
> > > > ORIG_RAX: 00000000000000a6
> > > > [Wed Nov 27 00:38:46 2024] RAX: 0000000000000000 RBX: 00005632106d4c20
> > > > RCX: 00007fddc1ff43eb
> > > > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000000 RSI: 0000000000000000
> > > > RDI: 00005632106d9410
> > > > [Wed Nov 27 00:38:46 2024] RBP: 00007ffe64be89b0 R08: 00005632106d4010
> > > > R09: 0000000000000007
> > > > [Wed Nov 27 00:38:46 2024] R10: 0000000000000000 R11: 0000000000000246
> > > > R12: 00005632106d4d28
> > > > [Wed Nov 27 00:38:46 2024] R13: 0000000000000000 R14: 00005632106d9410
> > > > R15: 00005632106d5030
> > > > [Wed Nov 27 00:38:46 2024] </TASK>
> > > > [Wed Nov 27 00:38:46 2024] irq event stamp: 8317
> > > > [Wed Nov 27 00:38:46 2024] hardirqs last enabled at (8323):
> > > > [<ffffffffac230dce>] __up_console_sem+0x5e/0x70
> > > > [Wed Nov 27 00:38:46 2024] hardirqs last disabled at (8328):
> > > > [<ffffffffac230db3>] __up_console_sem+0x43/0x70
> > > > [Wed Nov 27 00:38:46 2024] softirqs last enabled at (6628):
> > > > [<ffffffffac135745>] __irq_exit_rcu+0x135/0x160
> > > > [Wed Nov 27 00:38:46 2024] softirqs last disabled at (6539):
> > > > [<ffffffffac135745>] __irq_exit_rcu+0x135/0x160
> > > > [Wed Nov 27 00:38:46 2024] ---[ end trace 0000000000000000 ]---
> > > > [Wed Nov 27 00:38:46 2024] VFS: Busy inodes after unmount of cifs (cifs)
> > > > [Wed Nov 27 00:38:46 2024] ------------[ cut here ]------------
> > > > [Wed Nov 27 00:38:46 2024] kernel BUG at fs/super.c:650!
> > > > [Wed Nov 27 00:38:46 2024] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
> > > > KASAN NOPTI
> > > > [Wed Nov 27 00:38:46 2024] CPU: 2 UID: 0 PID: 316177 Comm: umount
> > > > Tainted: G W 6.12.0 #1
> > > > [Wed Nov 27 00:38:46 2024] Tainted: [W]=WARN
> > > > [Wed Nov 27 00:38:46 2024] Hardware name: Red Hat KVM, BIOS
> > > > 1.16.3-2.el9 04/01/2014
> > > > [Wed Nov 27 00:38:46 2024] RIP: 0010:generic_shutdown_super+0x1b7/0x1c0
> > > > [Wed Nov 27 00:38:46 2024] Code: 7b 28 e8 5c ca f8 ff 48 8b 6b 28 48
> > > > 89 ef e8 50 ca f8 ff 48 8b 55 00 48 8d b3 68 06 00 00 48 c7 c7 e0 38
> > > > ba ad e8 d9 c1 b5 ff <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90
> > > > 90 90 90 90 90 90
> > > > [Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fdf0 EFLAGS: 00010282
> > > > [Wed Nov 27 00:38:46 2024] RAX: 000000000000002d RBX: ff110001238b6000
> > > > RCX: 0000000000000027
> > > > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
> > > > RDI: ff110004cb131a48
> > > > [Wed Nov 27 00:38:46 2024] RBP: ffffffffc1c6ac00 R08: ffffffffac3fd2fe
> > > > R09: ffe21c0099626349
> > > > [Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
> > > > R12: ff110001238b69c0
> > > > [Wed Nov 27 00:38:46 2024] R13: ff110001238b6780 R14: 1fe220002432ffd4
> > > > R15: 0000000000000000
> > > > [Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
> > > > GS:ff110004cb100000(0000) knlGS:0000000000000000
> > > > [Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
> > > > CR4: 0000000000373ef0
> > > > [Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
> > > > DR2: 0000000000000000
> > > > [Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> > > > DR7: 0000000000000400
> > > > [Wed Nov 27 00:38:46 2024] Call Trace:
> > > > [Wed Nov 27 00:38:46 2024] <TASK>
> > > > [Wed Nov 27 00:38:46 2024] ? die+0x37/0x90
> > > > [Wed Nov 27 00:38:46 2024] ? do_trap+0x133/0x230
> > > > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > > > [Wed Nov 27 00:38:46 2024] ? do_error_trap+0x94/0x130
> > > > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > > > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > > > [Wed Nov 27 00:38:46 2024] ? handle_invalid_op+0x2c/0x40
> > > > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > > > [Wed Nov 27 00:38:46 2024] ? exc_invalid_op+0x2f/0x50
> > > > [Wed Nov 27 00:38:46 2024] ? asm_exc_invalid_op+0x1a/0x20
> > > > [Wed Nov 27 00:38:46 2024] ? irq_work_claim+0x1e/0x40
> > > > [Wed Nov 27 00:38:46 2024] ? generic_shutdown_super+0x1b7/0x1c0
> > > > [Wed Nov 27 00:38:46 2024] kill_anon_super+0x22/0x40
> > > > [Wed Nov 27 00:38:46 2024] cifs_kill_sb+0x78/0x90 [cifs]
> > > > [Wed Nov 27 00:38:46 2024] deactivate_locked_super+0x69/0xf0
> > > > [Wed Nov 27 00:38:46 2024] cleanup_mnt+0x195/0x200
> > > > [Wed Nov 27 00:38:46 2024] task_work_run+0xec/0x150
> > > > [Wed Nov 27 00:38:46 2024] ? __pfx_task_work_run+0x10/0x10
> > > > [Wed Nov 27 00:38:46 2024] ? mark_held_locks+0x24/0x90
> > > > [Wed Nov 27 00:38:46 2024] syscall_exit_to_user_mode+0x269/0x2a0
> > > > [Wed Nov 27 00:38:46 2024] do_syscall_64+0x81/0x180
> > > > [Wed Nov 27 00:38:46 2024] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > > [Wed Nov 27 00:38:46 2024] RIP: 0033:0x7fddc1ff43eb
> > > > [Wed Nov 27 00:38:46 2024] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f
> > > > 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa
> > > > b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15
> > > > f9 a9 0c 00 f7 d8
> > > > [Wed Nov 27 00:38:46 2024] RSP: 002b:00007ffe64be88d8 EFLAGS: 00000246
> > > > ORIG_RAX: 00000000000000a6
> > > > [Wed Nov 27 00:38:46 2024] RAX: 0000000000000000 RBX: 00005632106d4c20
> > > > RCX: 00007fddc1ff43eb
> > > > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000000 RSI: 0000000000000000
> > > > RDI: 00005632106d9410
> > > > [Wed Nov 27 00:38:46 2024] RBP: 00007ffe64be89b0 R08: 00005632106d4010
> > > > R09: 0000000000000007
> > > > [Wed Nov 27 00:38:46 2024] R10: 0000000000000000 R11: 0000000000000246
> > > > R12: 00005632106d4d28
> > > > [Wed Nov 27 00:38:46 2024] R13: 0000000000000000 R14: 00005632106d9410
> > > > R15: 00005632106d5030
> > > > [Wed Nov 27 00:38:46 2024] </TASK>
> > > > [Wed Nov 27 00:38:46 2024] Modules linked in: cifs ccm cmac nls_utf8
> > > > cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4
> > > > dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns
> > > > nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> > > > nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
> > > > nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat
> > > > ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> > > > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> > > > iptable_security ip_set ebtable_filter ebtables ip6table_filter
> > > > ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net
> > > > net_failover failover virtio_balloon loop fuse dm_multipath nfnetlink
> > > > zram xfs bochs drm_client_lib drm_shmem_helper drm_kms_helper
> > > > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm
> > > > sha512_ssse3 sha256_ssse3 sha1_ssse3 floppy virtio_blk qemu_fw_cfg
> > > > virtio_console [last unloaded: cifs]
> > > > [Wed Nov 27 00:38:46 2024] ---[ end trace 0000000000000000 ]---
> > > > [Wed Nov 27 00:38:46 2024] RIP: 0010:generic_shutdown_super+0x1b7/0x1c0
> > > > [Wed Nov 27 00:38:46 2024] Code: 7b 28 e8 5c ca f8 ff 48 8b 6b 28 48
> > > > 89 ef e8 50 ca f8 ff 48 8b 55 00 48 8d b3 68 06 00 00 48 c7 c7 e0 38
> > > > ba ad e8 d9 c1 b5 ff <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90
> > > > 90 90 90 90 90 90
> > > > [Wed Nov 27 00:38:46 2024] RSP: 0018:ff1100012197fdf0 EFLAGS: 00010282
> > > > [Wed Nov 27 00:38:46 2024] RAX: 000000000000002d RBX: ff110001238b6000
> > > > RCX: 0000000000000027
> > > > [Wed Nov 27 00:38:46 2024] RDX: 0000000000000027 RSI: 0000000000000004
> > > > RDI: ff110004cb131a48
> > > > [Wed Nov 27 00:38:46 2024] RBP: ffffffffc1c6ac00 R08: ffffffffac3fd2fe
> > > > R09: ffe21c0099626349
> > > > [Wed Nov 27 00:38:46 2024] R10: ff110004cb131a4b R11: 0000000000000001
> > > > R12: ff110001238b69c0
> > > > [Wed Nov 27 00:38:46 2024] R13: ff110001238b6780 R14: 1fe220002432ffd4
> > > > R15: 0000000000000000
> > > > [Wed Nov 27 00:38:46 2024] FS: 00007fddc1dcc800(0000)
> > > > GS:ff110004cb100000(0000) knlGS:0000000000000000
> > > > [Wed Nov 27 00:38:46 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [Wed Nov 27 00:38:46 2024] CR2: 00007fc6440095d0 CR3: 0000000142146005
> > > > CR4: 0000000000373ef0
> > > > [Wed Nov 27 00:38:46 2024] DR0: 0000000000000000 DR1: 0000000000000000
> > > > DR2: 0000000000000000
> > > > [Wed Nov 27 00:38:46 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> > > > DR7: 0000000000000400
> > > >
> > > > On Tue, Nov 26, 2024 at 3:50 PM Paul Aurich <paul@darkrain42.org> wrote:
> > > > > [snip]
--
Thanks,
Steve
[-- Attachment #2: 337.dmesg --]
[-- Type: application/octet-stream, Size: 14754 bytes --]
[24570.038452] run fstests generic/337 at 2024-12-06 16:45:52
[24570.459160] CIFS: Attempting to mount //localhost/scratch
[24571.625626] CIFS: Attempting to mount //localhost/scratch
[24571.665599] ------------[ cut here ]------------
[24571.665602] BUG: Dentry 000000009ae483ab{i=100c93e9,n=WORD} still in use (2) [unmount of cifs cifs]
[24571.665610] WARNING: CPU: 4 PID: 1365069 at fs/dcache.c:1536 umount_check+0x67/0x90
[24571.665615] Modules linked in: cifs rfcomm snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables br_netfilter bridge stp llc nls_utf8 cifs_arc4 nls_ucs2_utils cifs_md4 cachefiles netfs ccm overlay qrtr cmac algif_hash algif_skcipher af_alg bnep sch_fq_codel binfmt_misc nls_iso8859_1 intel_uncore_frequency intel_uncore_frequency_common intel_tcc_cooling x86_pkg_temp_thermal snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel intel_powerclamp soundwire_cadence coretemp snd_sof_intel_hda_common snd_soc_hdac_hda elan_i2c snd_sof_intel_hda_mlink kvm_intel snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof kvm snd_sof_utils ee1004 cmdlinepart snd_soc_acpi_intel_match soundwire_generic_allocation snd_soc_acpi crct10dif_pclmul polyval_clmulni mei_hdcp spi_nor polyval_generic soundwire_bus ghash_clmulni_intel
[24571.665665] sha256_ssse3 mtd mei_pxp intel_rapl_msr sha1_ssse3 snd_soc_avs aesni_intel snd_soc_hda_codec crypto_simd snd_hda_ext_core cryptd iwlmvm rapl snd_hda_codec_realtek snd_soc_core think_lmi snd_hda_codec_generic firmware_attributes_class intel_cstate snd_hda_codec_hdmi snd_hda_scodec_component snd_ctl_led snd_compress ac97_bus processor_thermal_device_pci_legacy mac80211 snd_pcm_dmaengine processor_thermal_device processor_thermal_wt_hint uvcvideo processor_thermal_rfim videobuf2_vmalloc libarc4 uvc snd_hda_intel snd_intel_dspcfg processor_thermal_rapl videobuf2_memops snd_intel_sdw_acpi intel_wmi_thunderbolt wmi_bmof btusb videobuf2_v4l2 intel_rapl_common iwlwifi btrtl i2c_i801 snd_hda_codec processor_thermal_wt_req intel_pmc_core videobuf2_common nvidiafb btintel i2c_mux spi_intel_pci processor_thermal_power_floor spi_intel snd_hda_core i2c_smbus snd_hwdep thinkpad_acpi intel_vsec btbcm videodev processor_thermal_mbox int3403_thermal joydev vgastate input_leds int3400_thermal btmtk mei_me pmt_telemetry
[24571.665713] snd_pcm cfg80211 mei bluetooth mc fb_ddc snd_timer intel_pch_thermal intel_soc_dts_iosf nvram int340x_thermal_zone acpi_pad acpi_thermal_rel pmt_class mac_hid serio_raw nouveau mxm_wmi drm_gpuvm drm_exec gpu_sched drm_ttm_helper ttm drm_display_helper cec rc_core i2c_algo_bit nfsd msr parport_pc auth_rpcgss nfs_acl ppdev lockd grace lp nvme_fabrics parport nvme_keyring efi_pstore sunrpc nfnetlink dmi_sysfs ip_tables x_tables autofs4 xfs btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 wacom hid_microsoft ff_memless hid_generic usbhid hid 8250_dw rtsx_pci_sdmmc nvme ucsi_acpi psmouse crc32_pclmul typec_ucsi intel_lpss_pci nvme_core e1000e intel_lpss rtsx_pci nvme_auth snd typec idma64 soundcore video sparse_keymap platform_profile wmi pinctrl_cannonlake [last unloaded: cifs(OE)]
[24571.665773] CPU: 4 UID: 0 PID: 1365069 Comm: umount Tainted: G B W OE 6.12.3-061203-generic #202412060638
[24571.665777] Tainted: [B]=BAD_PAGE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[24571.665778] Hardware name: LENOVO 20MAS08500/20MAS08500, BIOS N2CET70W (1.53 ) 03/11/2024
[24571.665779] RIP: 0010:umount_check+0x67/0x90
[24571.665782] Code: 03 00 00 48 8b 40 28 48 89 e5 4c 8b 08 48 8b 46 30 48 85 c0 74 04 48 8b 50 40 51 48 c7 c7 a0 eb ec b7 48 89 f1 e8 49 e9 bb ff <0f> 0b 58 31 c0 c9 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 c3 cc
[24571.665784] RSP: 0018:ffffae71cbd0fb68 EFLAGS: 00010246
[24571.665786] RAX: 0000000000000000 RBX: 00000000000575fa RCX: 0000000000000000
[24571.665787] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[24571.665788] RBP: ffffae71cbd0fb70 R08: 0000000000000000 R09: 0000000000000000
[24571.665790] R10: 0000000000000000 R11: 0000000000000000 R12: ffff977c0043f600
[24571.665791] R13: ffffffffb6745ba0 R14: ffff977c0043f680 R15: ffff977c4018d9c0
[24571.665792] FS: 0000734a602d3800(0000) GS:ffff97837ba00000(0000) knlGS:0000000000000000
[24571.665794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24571.665795] CR2: 00007ffe3e2f0cf0 CR3: 0000000120340001 CR4: 00000000003726f0
[24571.665797] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[24571.665798] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[24571.665800] Call Trace:
[24571.665801] <TASK>
[24571.665803] ? show_trace_log_lvl+0x1be/0x310
[24571.665806] ? show_trace_log_lvl+0x1be/0x310
[24571.665809] ? d_walk+0xc5/0x2b0
[24571.665812] ? show_regs.part.0+0x22/0x30
[24571.665814] ? show_regs.cold+0x8/0x10
[24571.665816] ? umount_check+0x67/0x90
[24571.665818] ? __warn.cold+0xac/0x10c
[24571.665820] ? umount_check+0x67/0x90
[24571.665823] ? report_bug+0x114/0x160
[24571.665826] ? handle_bug+0x6e/0xb0
[24571.665829] ? exc_invalid_op+0x18/0x80
[24571.665831] ? asm_exc_invalid_op+0x1b/0x20
[24571.665834] ? __pfx_umount_check+0x10/0x10
[24571.665837] ? umount_check+0x67/0x90
[24571.665840] ? umount_check+0x67/0x90
[24571.665842] d_walk+0xc5/0x2b0
[24571.665845] shrink_dcache_for_umount+0x4c/0x130
[24571.665848] generic_shutdown_super+0x25/0x1a0
[24571.665851] kill_anon_super+0x18/0x50
[24571.665852] cifs_kill_sb+0x4a/0x60 [cifs]
[24571.665902] deactivate_locked_super+0x32/0xc0
[24571.665904] deactivate_super+0x46/0x60
[24571.665907] cleanup_mnt+0xc3/0x170
[24571.665910] __cleanup_mnt+0x12/0x20
[24571.665912] task_work_run+0x5d/0xa0
[24571.665917] syscall_exit_to_user_mode+0x1ca/0x1d0
[24571.665921] do_syscall_64+0x8a/0x170
[24571.665923] ? generic_permission+0x39/0x230
[24571.665927] ? mntput+0x24/0x50
[24571.665929] ? path_put+0x1e/0x30
[24571.665932] ? do_faccessat+0x1e3/0x2e0
[24571.665935] ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[24571.665939] ? syscall_exit_to_user_mode+0x38/0x1d0
[24571.665941] ? do_syscall_64+0x8a/0x170
[24571.665944] ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[24571.665947] ? syscall_exit_to_user_mode+0x38/0x1d0
[24571.665950] ? do_syscall_64+0x8a/0x170
[24571.665952] ? irqentry_exit_to_user_mode+0x2d/0x1d0
[24571.665955] ? irqentry_exit+0x43/0x50
[24571.665957] ? exc_page_fault+0x96/0x1c0
[24571.665960] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[24571.665963] RIP: 0033:0x734a6012a9fb
[24571.665966] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 83 0d 00 f7 d8
[24571.665968] RSP: 002b:00007ffe3e2f24a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[24571.665971] RAX: 0000000000000000 RBX: 0000639144f67a60 RCX: 0000734a6012a9fb
[24571.665973] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000639144f74070
[24571.665974] RBP: 00007ffe3e2f2580 R08: 0000734a60203b20 R09: 0000000000000020
[24571.665976] R10: 0000000000000001 R11: 0000000000000246 R12: 0000639144f67b60
[24571.665977] R13: 0000000000000000 R14: 0000639144f74070 R15: 0000639144f67e70
[24571.665981] </TASK>
[24571.665982] ---[ end trace 0000000000000000 ]---
[24571.682972] ------------[ cut here ]------------
[24571.682975] VFS: Busy inodes after unmount of cifs (cifs)
[24571.682981] WARNING: CPU: 4 PID: 1365069 at fs/super.c:650 generic_shutdown_super+0x127/0x1a0
[24571.682986] Modules linked in: cifs rfcomm snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables br_netfilter bridge stp llc nls_utf8 cifs_arc4 nls_ucs2_utils cifs_md4 cachefiles netfs ccm overlay qrtr cmac algif_hash algif_skcipher af_alg bnep sch_fq_codel binfmt_misc nls_iso8859_1 intel_uncore_frequency intel_uncore_frequency_common intel_tcc_cooling x86_pkg_temp_thermal snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel intel_powerclamp soundwire_cadence coretemp snd_sof_intel_hda_common snd_soc_hdac_hda elan_i2c snd_sof_intel_hda_mlink kvm_intel snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof kvm snd_sof_utils ee1004 cmdlinepart snd_soc_acpi_intel_match soundwire_generic_allocation snd_soc_acpi crct10dif_pclmul polyval_clmulni mei_hdcp spi_nor polyval_generic soundwire_bus ghash_clmulni_intel
[24571.683036] sha256_ssse3 mtd mei_pxp intel_rapl_msr sha1_ssse3 snd_soc_avs aesni_intel snd_soc_hda_codec crypto_simd snd_hda_ext_core cryptd iwlmvm rapl snd_hda_codec_realtek snd_soc_core think_lmi snd_hda_codec_generic firmware_attributes_class intel_cstate snd_hda_codec_hdmi snd_hda_scodec_component snd_ctl_led snd_compress ac97_bus processor_thermal_device_pci_legacy mac80211 snd_pcm_dmaengine processor_thermal_device processor_thermal_wt_hint uvcvideo processor_thermal_rfim videobuf2_vmalloc libarc4 uvc snd_hda_intel snd_intel_dspcfg processor_thermal_rapl videobuf2_memops snd_intel_sdw_acpi intel_wmi_thunderbolt wmi_bmof btusb videobuf2_v4l2 intel_rapl_common iwlwifi btrtl i2c_i801 snd_hda_codec processor_thermal_wt_req intel_pmc_core videobuf2_common nvidiafb btintel i2c_mux spi_intel_pci processor_thermal_power_floor spi_intel snd_hda_core i2c_smbus snd_hwdep thinkpad_acpi intel_vsec btbcm videodev processor_thermal_mbox int3403_thermal joydev vgastate input_leds int3400_thermal btmtk mei_me pmt_telemetry
[24571.683091] snd_pcm cfg80211 mei bluetooth mc fb_ddc snd_timer intel_pch_thermal intel_soc_dts_iosf nvram int340x_thermal_zone acpi_pad acpi_thermal_rel pmt_class mac_hid serio_raw nouveau mxm_wmi drm_gpuvm drm_exec gpu_sched drm_ttm_helper ttm drm_display_helper cec rc_core i2c_algo_bit nfsd msr parport_pc auth_rpcgss nfs_acl ppdev lockd grace lp nvme_fabrics parport nvme_keyring efi_pstore sunrpc nfnetlink dmi_sysfs ip_tables x_tables autofs4 xfs btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 wacom hid_microsoft ff_memless hid_generic usbhid hid 8250_dw rtsx_pci_sdmmc nvme ucsi_acpi psmouse crc32_pclmul typec_ucsi intel_lpss_pci nvme_core e1000e intel_lpss rtsx_pci nvme_auth snd typec idma64 soundcore video sparse_keymap platform_profile wmi pinctrl_cannonlake [last unloaded: cifs(OE)]
[24571.683187] CPU: 4 UID: 0 PID: 1365069 Comm: umount Tainted: G B W OE 6.12.3-061203-generic #202412060638
[24571.683191] Tainted: [B]=BAD_PAGE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[24571.683192] Hardware name: LENOVO 20MAS08500/20MAS08500, BIOS N2CET70W (1.53 ) 03/11/2024
[24571.683193] RIP: 0010:generic_shutdown_super+0x127/0x1a0
[24571.683196] Code: cc cc e8 1c 3f f0 ff 48 8b bb 00 01 00 00 eb cd 48 8b 43 28 48 8d b3 c0 03 00 00 48 c7 c7 a0 e5 ec b7 48 8b 10 e8 d9 14 be ff <0f> 0b 4c 8d ab 40 05 00 00 4c 89 ef e8 a8 16 dd 00 48 8b 8b 48 05
[24571.683198] RSP: 0018:ffffae71cbd0fc08 EFLAGS: 00010246
[24571.683200] RAX: 0000000000000000 RBX: ffff977c0096d000 RCX: 0000000000000000
[24571.683202] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[24571.683203] RBP: ffffae71cbd0fc20 R08: 0000000000000000 R09: 0000000000000000
[24571.683204] R10: 0000000000000000 R11: 0000000000000000 R12: ffff977c0096d548
[24571.683206] R13: ffff977db8f0b834 R14: 0000000000000000 R15: 0000000000000000
[24571.683207] FS: 0000734a602d3800(0000) GS:ffff97837ba00000(0000) knlGS:0000000000000000
[24571.683209] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24571.683210] CR2: 00007ffe3e2f0cf0 CR3: 0000000120340001 CR4: 00000000003726f0
[24571.683212] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[24571.683213] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[24571.683214] Call Trace:
[24571.683216] <TASK>
[24571.683218] ? show_trace_log_lvl+0x1be/0x310
[24571.683221] ? show_trace_log_lvl+0x1be/0x310
[24571.683225] ? kill_anon_super+0x18/0x50
[24571.683227] ? show_regs.part.0+0x22/0x30
[24571.683229] ? show_regs.cold+0x8/0x10
[24571.683231] ? generic_shutdown_super+0x127/0x1a0
[24571.683233] ? __warn.cold+0xac/0x10c
[24571.683235] ? generic_shutdown_super+0x127/0x1a0
[24571.683238] ? report_bug+0x114/0x160
[24571.683241] ? handle_bug+0x6e/0xb0
[24571.683244] ? exc_invalid_op+0x18/0x80
[24571.683247] ? asm_exc_invalid_op+0x1b/0x20
[24571.683251] ? generic_shutdown_super+0x127/0x1a0
[24571.683253] ? generic_shutdown_super+0x127/0x1a0
[24571.683256] kill_anon_super+0x18/0x50
[24571.683258] cifs_kill_sb+0x4a/0x60 [cifs]
[24571.683308] deactivate_locked_super+0x32/0xc0
[24571.683310] deactivate_super+0x46/0x60
[24571.683312] cleanup_mnt+0xc3/0x170
[24571.683314] __cleanup_mnt+0x12/0x20
[24571.683315] task_work_run+0x5d/0xa0
[24571.683317] syscall_exit_to_user_mode+0x1ca/0x1d0
[24571.683320] do_syscall_64+0x8a/0x170
[24571.683322] ? generic_permission+0x39/0x230
[24571.683325] ? mntput+0x24/0x50
[24571.683326] ? path_put+0x1e/0x30
[24571.683329] ? do_faccessat+0x1e3/0x2e0
[24571.683331] ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[24571.683334] ? syscall_exit_to_user_mode+0x38/0x1d0
[24571.683336] ? do_syscall_64+0x8a/0x170
[24571.683339] ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[24571.683341] ? syscall_exit_to_user_mode+0x38/0x1d0
[24571.683343] ? do_syscall_64+0x8a/0x170
[24571.683345] ? irqentry_exit_to_user_mode+0x2d/0x1d0
[24571.683347] ? irqentry_exit+0x43/0x50
[24571.683349] ? exc_page_fault+0x96/0x1c0
[24571.683352] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[24571.683354] RIP: 0033:0x734a6012a9fb
[24571.683356] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 83 0d 00 f7 d8
[24571.683358] RSP: 002b:00007ffe3e2f24a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[24571.683360] RAX: 0000000000000000 RBX: 0000639144f67a60 RCX: 0000734a6012a9fb
[24571.683361] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000639144f74070
[24571.683363] RBP: 00007ffe3e2f2580 R08: 0000734a60203b20 R09: 0000000000000020
[24571.683364] R10: 0000000000000001 R11: 0000000000000246 R12: 0000639144f67b60
[24571.683365] R13: 0000000000000000 R14: 0000639144f74070 R15: 0000639144f67e70
[24571.683368] </TASK>
[24571.683369] ---[ end trace 0000000000000000 ]---
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry
2024-11-23 3:28 ` Paul Aurich
2024-11-26 21:37 ` Paul Aurich
@ 2024-11-27 17:36 ` Paulo Alcantara
1 sibling, 0 replies; 20+ messages in thread
From: Paulo Alcantara @ 2024-11-27 17:36 UTC (permalink / raw)
To: Paul Aurich
Cc: linux-cifs, Steve French, Ronnie Sahlberg, Shyam Prasad N,
Tom Talpey, Bharath SM
Paul Aurich <paul@darkrain42.org> writes:
> On 2024-11-21 23:05:51 -0300, Paulo Alcantara wrote:
>>
>>Why does it need to be a global workqueue? Can't you make it per tcon?
>
> The problem with a per-tcon workqueue is I didn't see clean way to deal with
> multiuser mounts and flushing the workqueue in close_all_cached_dirs() -- when
> dealing with each individual tcon, we're still holding tlink_tree_lock, so an
> arbitrary sleep seems problematic.
OK.
> There could be a per-sb workqueue (stored in cifs_sb or the master tcon) but
> is there a way to get back to the superblock / master tcon with just a tcon
> (e.g. cached_dir_lease_break, when processing a lease break)?
Yes - cifs_get_dfs_tcon_super() does that.
>>> The final cleanup work for cleaning up a cfid is performed via work
>>> queued in the serverclose_wq workqueue; this is done separate from
>>> dropping the dentries so that close_all_cached_dirs() doesn't block on
>>> any server operations.
>>>
>>> Both of these queued works expect to invoked with a cfid reference and
>>> a tcon reference to avoid those objects from being freed while the work
>>> is ongoing.
>>
>>Why do you need to take a tcon reference?
>
> In the existing code (and my patch, without the refs), I was seeing an
> intermittent use-after-free of the tcon or cached_fids struct by queued work
> processing a lease break -- the cfid isn't linked from cached_fids, but
> smb2_close_cached_fid invoking SMB2_close can race with the unmount and
> cifs_put_tcon.
>
> Something like:
>
> t1 t2
> cached_dir_lease_break
> smb2_cached_lease_break
> smb2_close_cached_fid
> SMB2_close starts
> cifs_kill_sb
> cifs_umount
> cifs_put_link
> cifs_put_tcon
> SMB2_close continues
Makes sense.
> I had a version of the patch that kept the 'in flight lease breaks' on
> a second list in cached_fids so that they could be cancelled synchronously
> from free_cached_fids(), but I struggled with it (I can't remember exactly,
> but I think I was struggling to get the linked list membership / removal
> handling and num_entries handling consistent).
No worries. The damn thing isn't trivial to follow.
>> Can't you drop the dentries
>>when tearing down tcon in cifs_put_tcon()? No concurrent mounts would
>>be able to access or free it.
>
> The dentries being dropped must occur before kill_anon_super(), as that's
> where the 'Dentry still in use' check is. All the tcons are put in
> cifs_umount(), which occurs after:
>
> kill_anon_super(sb);
> cifs_umount(cifs_sb);
Right. Can't we call cancel_work_sync() to make sure that any lease
breaks are processed on the cached directory handle before calling the
above?
> The other thing is that cifs_umount_begin() has this comment, which made me
> think a tcon can actually be tied to two distinct mount points:
>
> if ((tcon->tc_count > 1) || (tcon->status == TID_EXITING)) {
> /* we have other mounts to same share or we have
> already tried to umount this and woken up
> all waiting network requests, nothing to do */
>
> Although, as I'm thinking about it again, I think I've misunderstood (and that
> comment is wrong?).
Comment is correct as a single tcon may be shared among different
mounts.
Consider the following where a single tcon is shared:
mount.cifs //srv/share /mnt/1 -o $opts
mount.cifs //srv/share/dir /mnt/2 -o $opts
There will be two different superblocks that end up using same tcon.
> It did cross my mind to pull some of the work out of cifs_umount into
> cifs_kill_sb (specifically, I wanted to cancel prune_tlinks earlier) -- no
> prune_tlinks would make it more feasible to drop tlink_tree_lock in
> close_all_cached_dirs(), at which point a per-tcon workqueue is more
> practical.
Yeah, multiuser tcons just make it even more complicated.
Sorry for the delay as I've been quite busy with other stuff.
Great work, BTW.
* Re: [PATCH v2 0/4] SMB cached directory fixes around reconnection/unmounting
2024-11-18 21:50 [PATCH v2 0/4] SMB cached directory fixes around reconnection/unmounting Paul Aurich
` (3 preceding siblings ...)
2024-11-18 21:50 ` [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry Paul Aurich
@ 2024-11-19 0:55 ` Steve French
2024-11-19 2:29 ` Paul Aurich
2024-11-21 20:59 ` Steve French
5 siblings, 1 reply; 20+ messages in thread
From: Steve French @ 2024-11-19 0:55 UTC (permalink / raw)
To: Paul Aurich
Cc: linux-cifs, Steve French, Paulo Alcantara, Ronnie Sahlberg,
Shyam Prasad N, Tom Talpey, Bharath SM
Looks like you dropped the patch:
"smb: No need to wait for work when cleaning up cached directories"
Otherwise for the four remaining patches, looks like the first patch
stayed the same (trivial comment change).
Can you remind me which of these three changed:
smb: Don't leak cfid when reconnect races with open_cached_dir
smb: prevent use-after-free due to open_cached_dir error paths
smb: During unmount, ensure all cached dir instances drop their dentry
On Mon, Nov 18, 2024 at 3:53 PM Paul Aurich <paul@darkrain42.org> wrote:
>
> v2:
> - Added locking in close_all_cached_dirs()
> - Replaced use of the cifsiod_wq with a new workqueue used for dropping cached
> dir dentries, and split out the "drop dentry" work from "potential
> SMB2_close + cleanup" work so that close_all_cached_dirs() doesn't block on
> server traffic, but can ensure all "drop dentry" work has run.
> - Repurposed the (essentially unused) cfid->fid_lock to protect cfid->dentry
>
>
> The SMB client cached directory functionality can either leak a cfid if
> open_cached_dir() races with a reconnect, or can have races between the
> unmount process and cached dir cleanup/lease breaks that all lead to
> a cached_dir instance not dropping its dentry ref in close_all_cached_dirs().
> These all manifest as a pair of BUGs when unmounting:
>
> [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
> [18645.789274] VFS: Busy inodes after unmount of cifs (cifs)
>
> These issues started with the lease directory cache handling introduced in
> commit ebe98f1447bb ("cifs: enable caching of directories for which a lease is
> held"), and go away if I mount with 'nohandlecache'.
>
> I'm able to reproduce the "Dentry still in use" errors by connecting to an
> actively-used SMB share (the server organically generates lease breaks) and
> leaving these running for 'a while':
>
> - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done
> - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done
>
> ('a while' is anywhere from 10 minutes to overnight. Also, it's not the
> cleanest reproducer, but I stopped iterating once I had something that was
> even remotely reliable for me...)
>
> This series attempts to fix these, as well as a use-after-free that could
> occur because open_cached_dir() explicitly frees the cached_fid, rather than
> relying on reference counting.
> Paul Aurich (4):
> smb: cached directories can be more than root file handle
> smb: Don't leak cfid when reconnect races with open_cached_dir
> smb: prevent use-after-free due to open_cached_dir error paths
> smb: During unmount, ensure all cached dir instances drop their dentry
>
> fs/smb/client/cached_dir.c | 228 +++++++++++++++++++++++++------------
> fs/smb/client/cached_dir.h | 6 +-
> fs/smb/client/cifsfs.c | 14 ++-
> fs/smb/client/cifsglob.h | 3 +-
> fs/smb/client/inode.c | 3 -
> fs/smb/client/trace.h | 3 +
> 6 files changed, 179 insertions(+), 78 deletions(-)
>
> --
> 2.45.2
>
>
--
Thanks,
Steve
* Re: [PATCH v2 0/4] SMB cached directory fixes around reconnection/unmounting
2024-11-19 0:55 ` [PATCH v2 0/4] SMB cached directory fixes around reconnection/unmounting Steve French
@ 2024-11-19 2:29 ` Paul Aurich
0 siblings, 0 replies; 20+ messages in thread
From: Paul Aurich @ 2024-11-19 2:29 UTC (permalink / raw)
To: Steve French
Cc: linux-cifs, Steve French, Paulo Alcantara, Ronnie Sahlberg,
Shyam Prasad N, Tom Talpey, Bharath SM
On 2024-11-18 18:55:23 -0600, Steve French wrote:
>Looks like you dropped the patch:
>"smb: No need to wait for work when cleaning up cached directories"
>
>Otherwise for the four remaining patches, looks like the first patch
>stayed the same (trivial comment change).
>
>Can you remind me which of these three changed:
>
> smb: Don't leak cfid when reconnect races with open_cached_dir
> smb: prevent use-after-free due to open_cached_dir error paths
> smb: During unmount, ensure all cached dir instances drop their dentry
All the substantive changes are in the last patch. I should have clarified,
but I just folded the changes from "smb: No need to wait for work when
cleaning up cached directories" into that patch, as well.
>On Mon, Nov 18, 2024 at 3:53 PM Paul Aurich <paul@darkrain42.org> wrote:
>>
>> v2:
>> - Added locking in close_all_cached_dirs()
>> - Replaced use of the cifsiod_wq with a new workqueue used for dropping cached
>> dir dentries, and split out the "drop dentry" work from "potential
>> SMB2_close + cleanup" work so that close_all_cached_dirs() doesn't block on
>> server traffic, but can ensure all "drop dentry" work has run.
>> - Repurposed the (essentially unused) cfid->fid_lock to protect cfid->dentry
>>
>>
>> The SMB client cached directory functionality can either leak a cfid if
>> open_cached_dir() races with a reconnect, or can have races between the
>> unmount process and cached dir cleanup/lease breaks that all lead to
>> a cached_dir instance not dropping its dentry ref in close_all_cached_dirs().
>> These all manifest as a pair of BUGs when unmounting:
>>
>> [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
>> [18645.789274] VFS: Busy inodes after unmount of cifs (cifs)
>>
>> These issues started with the lease directory cache handling introduced in
>> commit ebe98f1447bb ("cifs: enable caching of directories for which a lease is
>> held"), and go away if I mount with 'nohandlecache'.
>>
>> I'm able to reproduce the "Dentry still in use" errors by connecting to an
>> actively-used SMB share (the server organically generates lease breaks) and
>> leaving these running for 'a while':
>>
>> - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done
>> - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done
>>
>> ('a while' is anywhere from 10 minutes to overnight. Also, it's not the
>> cleanest reproducer, but I stopped iterating once I had something that was
>> even remotely reliable for me...)
>>
>> This series attempts to fix these, as well as a use-after-free that could
>> occur because open_cached_dir() explicitly frees the cached_fid, rather than
>> relying on reference counting.
>> Paul Aurich (4):
>> smb: cached directories can be more than root file handle
>> smb: Don't leak cfid when reconnect races with open_cached_dir
>> smb: prevent use-after-free due to open_cached_dir error paths
>> smb: During unmount, ensure all cached dir instances drop their dentry
>>
>> fs/smb/client/cached_dir.c | 228 +++++++++++++++++++++++++------------
>> fs/smb/client/cached_dir.h | 6 +-
>> fs/smb/client/cifsfs.c | 14 ++-
>> fs/smb/client/cifsglob.h | 3 +-
>> fs/smb/client/inode.c | 3 -
>> fs/smb/client/trace.h | 3 +
>> 6 files changed, 179 insertions(+), 78 deletions(-)
>>
>> --
>> 2.45.2
>>
>>
>
>
>--
>Thanks,
>
>Steve
* Re: [PATCH v2 0/4] SMB cached directory fixes around reconnection/unmounting
2024-11-18 21:50 [PATCH v2 0/4] SMB cached directory fixes around reconnection/unmounting Paul Aurich
` (4 preceding siblings ...)
2024-11-19 0:55 ` [PATCH v2 0/4] SMB cached directory fixes around reconnection/unmounting Steve French
@ 2024-11-21 20:59 ` Steve French
5 siblings, 0 replies; 20+ messages in thread
From: Steve French @ 2024-11-21 20:59 UTC (permalink / raw)
To: Paul Aurich
Cc: linux-cifs, Paulo Alcantara, Ronnie Sahlberg, Shyam Prasad N,
Tom Talpey, Bharath SM
[-- Attachment #1: Type: text/plain, Size: 3829 bytes --]
I noticed a hang today in generic/246, immediately after a crash in
umount in the previous test, generic/241, running with a 6.12 kernel with
recent cifs-2.6.git/for-next (which includes e.g. the 5 directory
caching fixes) against Windows. See attached dmesg log. Any thoughts?
I did not see it hang on a previous run.
Nov 21 12:36:09 fedora29 kernel: ? umount_check+0xc3/0xf0
Nov 21 12:36:09 fedora29 kernel: ? __pfx_umount_check+0x10/0x10
Nov 21 12:36:09 fedora29 kernel: d_walk+0xf3/0x4e0
Nov 21 12:36:09 fedora29 kernel: ? d_walk+0x4b/0x4e0
Nov 21 12:36:09 fedora29 kernel: shrink_dcache_for_umount+0x6d/0x220
Nov 21 12:36:09 fedora29 kernel: generic_shutdown_super+0x4a/0x1c0
Nov 21 12:36:09 fedora29 kernel: kill_anon_super+0x22/0x40
Nov 21 12:36:09 fedora29 kernel: cifs_kill_sb+0x78/0x90 [cifs]
On Mon, Nov 18, 2024 at 3:53 PM Paul Aurich <paul@darkrain42.org> wrote:
>
> v2:
> - Added locking in close_all_cached_dirs()
> - Replaced use of the cifsiod_wq with a new workqueue used for dropping cached
> dir dentries, and split out the "drop dentry" work from "potential
> SMB2_close + cleanup" work so that close_all_cached_dirs() doesn't block on
> server traffic, but can ensure all "drop dentry" work has run.
> - Repurposed the (essentially unused) cfid->fid_lock to protect cfid->dentry
>
>
> The SMB client cached directory functionality can either leak a cfid if
> open_cached_dir() races with a reconnect, or can have races between the
> unmount process and cached dir cleanup/lease breaks that all lead to
> a cached_dir instance not dropping its dentry ref in close_all_cached_dirs().
> These all manifest as a pair of BUGs when unmounting:
>
> [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
> [18645.789274] VFS: Busy inodes after unmount of cifs (cifs)
>
> These issues started with the lease directory cache handling introduced in
> commit ebe98f1447bb ("cifs: enable caching of directories for which a lease is
> held"), and go away if I mount with 'nohandlecache'.
>
> I'm able to reproduce the "Dentry still in use" errors by connecting to an
> actively-used SMB share (the server organically generates lease breaks) and
> leaving these running for 'a while':
>
> - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done
> - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done
>
> ('a while' is anywhere from 10 minutes to overnight. Also, it's not the
> cleanest reproducer, but I stopped iterating once I had something that was
> even remotely reliable for me...)
>
> This series attempts to fix these, as well as a use-after-free that could
> occur because open_cached_dir() explicitly frees the cached_fid, rather than
> relying on reference counting.
> Paul Aurich (4):
> smb: cached directories can be more than root file handle
> smb: Don't leak cfid when reconnect races with open_cached_dir
> smb: prevent use-after-free due to open_cached_dir error paths
> smb: During unmount, ensure all cached dir instances drop their dentry
>
> fs/smb/client/cached_dir.c | 228 +++++++++++++++++++++++++------------
> fs/smb/client/cached_dir.h | 6 +-
> fs/smb/client/cifsfs.c | 14 ++-
> fs/smb/client/cifsglob.h | 3 +-
> fs/smb/client/inode.c | 3 -
> fs/smb/client/trace.h | 3 +
> 6 files changed, 179 insertions(+), 78 deletions(-)
>
> --
> 2.45.2
>
>
--
Thanks,
Steve
[-- Attachment #2: messages --]
[-- Type: application/octet-stream, Size: 45270 bytes --]
Nov 21 12:34:27 fedora29 journal: run fstests generic/239 at 2024-11-21 12:34:27
Nov 21 12:34:27 fedora29 systemd[1]: Started fstests-generic-239.scope - /usr/bin/bash -c "test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/239".
Nov 21 12:34:52 fedora29 systemd[1]: fstests-generic-239.scope: Deactivated successfully.
Nov 21 12:34:52 fedora29 systemd[1]: fstests-generic-239.scope: Consumed 6.523s CPU time.
Nov 21 12:34:53 fedora29 systemd[1]: mnt-test.mount: Deactivated successfully.
Nov 21 12:34:53 fedora29 audit[314365]: USER_END pid=314365 uid=0 auid=0 ses=85 subj=kernel msg='op=login id=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:34:53 fedora29 audit[314365]: USER_LOGOUT pid=314365 uid=0 auid=0 ses=85 subj=kernel msg='op=login id=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:34:53 fedora29 audit[314365]: CRYPTO_KEY_USER pid=314365 uid=0 auid=0 ses=85 subj=kernel msg='op=destroy kind=session fp=? direction=both spid=314368 suid=0 rport=53434 laddr=192.168.122.12 lport=22 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:34:53 fedora29 audit[314365]: CRYPTO_KEY_USER pid=314365 uid=0 auid=0 ses=85 subj=kernel msg='op=destroy kind=server fp=SHA256:1c:96:8e:f3:a7:01:59:0e:41:37:ed:d1:99:72:2a:55:ce:22:78:f7:52:5c:bd:6a:46:2f:22:14:0b:75:ff:f7 direction=? spid=314368 suid=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:34:53 fedora29 audit[314365]: USER_END pid=314365 uid=0 auid=0 ses=85 subj=kernel msg='op=PAM:session_close grantors=pam_selinux,pam_loginuid,pam_selinux,pam_namespace,pam_keyinit,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask,pam_lastlog acct="root" exe="/usr/sbin/sshd" hostname=192.168.122.1 addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:34:53 fedora29 audit[314365]: CRED_DISP pid=314365 uid=0 auid=0 ses=85 subj=kernel msg='op=PAM:setcred grantors=pam_env,pam_unix acct="root" exe="/usr/sbin/sshd" hostname=192.168.122.1 addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:34:53 fedora29 audit[314365]: CRYPTO_KEY_USER pid=314365 uid=0 auid=0 ses=85 subj=kernel msg='op=destroy kind=server fp=SHA256:1c:96:8e:f3:a7:01:59:0e:41:37:ed:d1:99:72:2a:55:ce:22:78:f7:52:5c:bd:6a:46:2f:22:14:0b:75:ff:f7 direction=? spid=314365 suid=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:34:53 fedora29 systemd-logind[1094]: Session 85 logged out. Waiting for processes to exit.
Nov 21 12:34:53 fedora29 systemd[1]: session-85.scope: Deactivated successfully.
Nov 21 12:34:53 fedora29 systemd[1]: session-85.scope: Consumed 2.752s CPU time.
Nov 21 12:34:53 fedora29 systemd-logind[1094]: Removed session 85.
Nov 21 12:34:53 fedora29 audit[315378]: CRYPTO_KEY_USER pid=315378 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=destroy kind=server fp=SHA256:1c:96:8e:f3:a7:01:59:0e:41:37:ed:d1:99:72:2a:55:ce:22:78:f7:52:5c:bd:6a:46:2f:22:14:0b:75:ff:f7 direction=? spid=315378 suid=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:34:53 fedora29 audit[315377]: CRYPTO_SESSION pid=315377 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=start direction=from-server cipher=aes256-gcm@openssh.com ksize=256 mac=<implicit> pfs=curve25519-sha256 spid=315378 suid=74 rport=56308 laddr=192.168.122.12 lport=22 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:34:53 fedora29 audit[315377]: CRYPTO_SESSION pid=315377 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=start direction=from-client cipher=aes256-gcm@openssh.com ksize=256 mac=<implicit> pfs=curve25519-sha256 spid=315378 suid=74 rport=56308 laddr=192.168.122.12 lport=22 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:34:53 fedora29 audit[315377]: USER_AUTH pid=315377 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=pubkey_auth grantors=auth-key acct="root" exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:34:53 fedora29 audit[315377]: CRYPTO_KEY_USER pid=315377 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=negotiate kind=auth-key fp=SHA256:3e:03:52:86:fb:4d:b7:b6:0e:dd:8d:1e:57:6e:e2:00:dc:da:9a:38:08:d8:49:4f:31:4b:e6:fb:16:6c:ff:84 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:34:53 fedora29 audit[315377]: USER_ACCT pid=315377 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=PAM:accounting grantors=pam_unix acct="root" exe="/usr/sbin/sshd" hostname=192.168.122.1 addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:34:53 fedora29 audit[315377]: CRYPTO_KEY_USER pid=315377 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=destroy kind=session fp=? direction=both spid=315378 suid=74 rport=56308 laddr=192.168.122.12 lport=22 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:34:53 fedora29 audit[315377]: CRED_ACQ pid=315377 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=PAM:setcred grantors=pam_env,pam_unix acct="root" exe="/usr/sbin/sshd" hostname=192.168.122.1 addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:34:53 fedora29 systemd-logind[1094]: New session 86 of user root.
Nov 21 12:34:53 fedora29 systemd[1]: Started session-86.scope - Session 86 of User root.
Nov 21 12:34:53 fedora29 audit[315377]: USER_START pid=315377 uid=0 auid=0 ses=86 subj=kernel msg='op=PAM:session_open grantors=pam_selinux,pam_loginuid,pam_selinux,pam_namespace,pam_keyinit,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask,pam_lastlog acct="root" exe="/usr/sbin/sshd" hostname=192.168.122.1 addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:34:53 fedora29 audit[315380]: CRYPTO_KEY_USER pid=315380 uid=0 auid=0 ses=86 subj=kernel msg='op=destroy kind=server fp=SHA256:1c:96:8e:f3:a7:01:59:0e:41:37:ed:d1:99:72:2a:55:ce:22:78:f7:52:5c:bd:6a:46:2f:22:14:0b:75:ff:f7 direction=? spid=315380 suid=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:34:53 fedora29 audit[315380]: CRED_ACQ pid=315380 uid=0 auid=0 ses=86 subj=kernel msg='op=PAM:setcred grantors=pam_env,pam_unix acct="root" exe="/usr/sbin/sshd" hostname=192.168.122.1 addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:34:53 fedora29 audit[315377]: USER_LOGIN pid=315377 uid=0 auid=0 ses=86 subj=kernel msg='op=login id=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:34:53 fedora29 audit[315377]: USER_START pid=315377 uid=0 auid=0 ses=86 subj=kernel msg='op=login id=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:34:53 fedora29 audit[315377]: CRYPTO_KEY_USER pid=315377 uid=0 auid=0 ses=86 subj=kernel msg='op=destroy kind=server fp=SHA256:1c:96:8e:f3:a7:01:59:0e:41:37:ed:d1:99:72:2a:55:ce:22:78:f7:52:5c:bd:6a:46:2f:22:14:0b:75:ff:f7 direction=? spid=315381 suid=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:34:54 fedora29 kernel: CIFS: Attempting to mount //win16.vm.test/Share
Nov 21 12:34:54 fedora29 kernel: CIFS: VFS: generate_smb3signingkey: dumping generated AES session keys
Nov 21 12:34:54 fedora29 kernel: CIFS: VFS: Session Id 4d 00 00 08 00 c8 00 00
Nov 21 12:34:54 fedora29 kernel: CIFS: VFS: Cipher type 2
Nov 21 12:34:54 fedora29 kernel: CIFS: VFS: Session Key 30 73 2e 16 d5 25 62 83 05 b3 93 9f 4e 4f 0b f5
Nov 21 12:34:54 fedora29 kernel: CIFS: VFS: Signing Key 3a f2 1b 6b 19 98 e6 11 98 74 c1 e3 7f 04 0b 67
Nov 21 12:34:54 fedora29 kernel: CIFS: VFS: ServerIn Key d1 cc d2 5c 28 3a a7 12 f8 c6 45 09 04 8d 79 e2
Nov 21 12:34:54 fedora29 kernel: CIFS: VFS: ServerOut Key 9c de f1 51 1b 26 15 bd 27 a3 63 d2 3b 8f 13 f6
Nov 21 12:34:54 fedora29 systemd[1]: Started fstests-check.scope - /usr/bin/bash -c "exit 77".
Nov 21 12:34:54 fedora29 systemd[1]: fstests-check.scope: Deactivated successfully.
Nov 21 12:34:55 fedora29 kernel: CIFS: Attempting to mount //win16.vm.test/Scratch
Nov 21 12:34:55 fedora29 systemd[1]: mnt-scratch.mount: Deactivated successfully.
Nov 21 12:34:55 fedora29 kernel: CIFS: Attempting to mount //win16.vm.test/Scratch
Nov 21 12:34:55 fedora29 systemd[1]: mnt-scratch.mount: Deactivated successfully.
Nov 21 12:34:55 fedora29 root[315652]: run xfstest generic/241
Nov 21 12:34:55 fedora29 journal: run fstests generic/241 at 2024-11-21 12:34:55
Nov 21 12:34:55 fedora29 systemd[1]: Started fstests-generic-241.scope - /usr/bin/bash -c "test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/241".
Nov 21 12:36:09 fedora29 systemd[1]: fstests-generic-241.scope: Deactivated successfully.
Nov 21 12:36:09 fedora29 systemd[1]: fstests-generic-241.scope: Consumed 52.315s CPU time.
Nov 21 12:36:09 fedora29 systemd[1]: mnt-test.mount: Deactivated successfully.
Nov 21 12:36:09 fedora29 kernel: ------------[ cut here ]------------
Nov 21 12:36:09 fedora29 kernel: BUG: Dentry 000000001f2f7f1c{i=f00000001c445,n=~dmtmp} still in use (1) [unmount of cifs cifs]
Nov 21 12:36:09 fedora29 kernel: WARNING: CPU: 0 PID: 315989 at fs/dcache.c:1536 umount_check+0xc3/0xf0
Nov 21 12:36:09 fedora29 kernel: Modules linked in: cifs ccm cmac nls_utf8 cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net net_failover virtio_balloon failover fuse loop dm_multipath nfnetlink zram xfs bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 floppy sha256_ssse3 sha1_ssse3 virtio_blk qemu_fw_cfg virtio_console [last unloaded: cifs]
Nov 21 12:36:09 fedora29 kernel: CPU: 0 UID: 0 PID: 315989 Comm: umount Not tainted 6.12.0 #1
Nov 21 12:36:09 fedora29 kernel: Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9 04/01/2014
Nov 21 12:36:09 fedora29 kernel: RIP: 0010:umount_check+0xc3/0xf0
Nov 21 12:36:09 fedora29 kernel: Code: db 74 0d 48 8d 7b 40 e8 0b e4 f5 ff 48 8b 53 40 41 55 4d 89 f1 45 89 e0 48 89 e9 48 89 ee 48 c7 c7 40 75 ba 9b e8 5d d2 a2 ff <0f> 0b 58 31 c0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc 41 83 fc 01
Nov 21 12:36:09 fedora29 kernel: RSP: 0018:ff11000113cffd20 EFLAGS: 00010282
Nov 21 12:36:09 fedora29 kernel: RAX: dffffc0000000000 RBX: ff11000116a963f0 RCX: 0000000000000027
Nov 21 12:36:09 fedora29 kernel: RDX: 0000000000000027 RSI: 0000000000000004 RDI: ff110004cb031a08
Nov 21 12:36:09 fedora29 kernel: RBP: ff1100011419e000 R08: ffffffff9a3f76fe R09: ffe21c0099606341
Nov 21 12:36:09 fedora29 kernel: R10: ff110004cb031a0b R11: 0000000000000001 R12: 0000000000000001
Nov 21 12:36:09 fedora29 kernel: R13: ff1100012b880668 R14: ffffffffc1e6e6c0 R15: ff1100011419e0b8
Nov 21 12:36:09 fedora29 kernel: FS: 00007fb8596e2800(0000) GS:ff110004cb000000(0000) knlGS:0000000000000000
Nov 21 12:36:09 fedora29 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 21 12:36:09 fedora29 kernel: CR2: 00007fff7f3e2e3c CR3: 0000000122822004 CR4: 0000000000373ef0
Nov 21 12:36:09 fedora29 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 21 12:36:09 fedora29 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 21 12:36:09 fedora29 kernel: Call Trace:
Nov 21 12:36:09 fedora29 kernel: <TASK>
Nov 21 12:36:09 fedora29 kernel: ? __warn+0xa9/0x220
Nov 21 12:36:09 fedora29 kernel: ? umount_check+0xc3/0xf0
Nov 21 12:36:09 fedora29 kernel: ? report_bug+0x1d4/0x1e0
Nov 21 12:36:09 fedora29 kernel: ? handle_bug+0x5b/0xa0
Nov 21 12:36:09 fedora29 kernel: ? exc_invalid_op+0x18/0x50
Nov 21 12:36:09 fedora29 kernel: ? asm_exc_invalid_op+0x1a/0x20
Nov 21 12:36:09 fedora29 kernel: ? irq_work_claim+0x1e/0x40
Nov 21 12:36:09 fedora29 kernel: ? umount_check+0xc3/0xf0
Nov 21 12:36:09 fedora29 kernel: ? __pfx_umount_check+0x10/0x10
Nov 21 12:36:09 fedora29 kernel: d_walk+0xf3/0x4e0
Nov 21 12:36:09 fedora29 kernel: ? d_walk+0x4b/0x4e0
Nov 21 12:36:09 fedora29 kernel: shrink_dcache_for_umount+0x6d/0x220
Nov 21 12:36:09 fedora29 kernel: generic_shutdown_super+0x4a/0x1c0
Nov 21 12:36:09 fedora29 kernel: kill_anon_super+0x22/0x40
Nov 21 12:36:09 fedora29 kernel: cifs_kill_sb+0x78/0x90 [cifs]
Nov 21 12:36:09 fedora29 kernel: deactivate_locked_super+0x69/0xf0
Nov 21 12:36:09 fedora29 kernel: cleanup_mnt+0x195/0x200
Nov 21 12:36:09 fedora29 kernel: task_work_run+0xec/0x150
Nov 21 12:36:09 fedora29 kernel: ? __pfx_task_work_run+0x10/0x10
Nov 21 12:36:09 fedora29 kernel: ? mark_held_locks+0x24/0x90
Nov 21 12:36:09 fedora29 kernel: syscall_exit_to_user_mode+0x269/0x2a0
Nov 21 12:36:09 fedora29 kernel: do_syscall_64+0x81/0x180
Nov 21 12:36:09 fedora29 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Nov 21 12:36:09 fedora29 kernel: RIP: 0033:0x7fb85990a3eb
Nov 21 12:36:09 fedora29 kernel: Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 f9 a9 0c 00 f7 d8
Nov 21 12:36:09 fedora29 kernel: RSP: 002b:00007fff7f3e45d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
Nov 21 12:36:09 fedora29 kernel: RAX: 0000000000000000 RBX: 000055b939a42c20 RCX: 00007fb85990a3eb
Nov 21 12:36:09 fedora29 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055b939a47410
Nov 21 12:36:09 fedora29 kernel: RBP: 00007fff7f3e46b0 R08: 000055b939a42010 R09: 0000000000000007
Nov 21 12:36:09 fedora29 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000055b939a42d28
Nov 21 12:36:09 fedora29 kernel: R13: 0000000000000000 R14: 000055b939a47410 R15: 000055b939a43030
Nov 21 12:36:09 fedora29 kernel: </TASK>
Nov 21 12:36:09 fedora29 kernel: irq event stamp: 8243
Nov 21 12:36:09 fedora29 kernel: hardirqs last enabled at (8249): [<ffffffff9a22e31e>] __up_console_sem+0x5e/0x70
Nov 21 12:36:09 fedora29 kernel: hardirqs last disabled at (8254): [<ffffffff9a22e303>] __up_console_sem+0x43/0x70
Nov 21 12:36:09 fedora29 kernel: softirqs last enabled at (5682): [<ffffffff9a13410e>] __irq_exit_rcu+0xfe/0x120
Nov 21 12:36:09 fedora29 kernel: softirqs last disabled at (5549): [<ffffffff9a13410e>] __irq_exit_rcu+0xfe/0x120
Nov 21 12:36:09 fedora29 kernel: ---[ end trace 0000000000000000 ]---
Nov 21 12:36:09 fedora29 kernel: VFS: Busy inodes after unmount of cifs (cifs)
Nov 21 12:36:09 fedora29 kernel: ------------[ cut here ]------------
Nov 21 12:36:09 fedora29 kernel: kernel BUG at fs/super.c:650!
Nov 21 12:36:09 fedora29 kernel: Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
Nov 21 12:36:09 fedora29 kernel: CPU: 0 UID: 0 PID: 315989 Comm: umount Tainted: G W 6.12.0 #1
Nov 21 12:36:09 fedora29 kernel: Tainted: [W]=WARN
Nov 21 12:36:09 fedora29 kernel: Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9 04/01/2014
Nov 21 12:36:09 fedora29 kernel: RIP: 0010:generic_shutdown_super+0x1b7/0x1c0
Nov 21 12:36:09 fedora29 kernel: Code: 7b 28 e8 7c c8 f8 ff 48 8b 6b 28 48 89 ef e8 70 c8 f8 ff 48 8b 55 00 48 8d b3 68 06 00 00 48 c7 c7 20 15 ba 9b e8 99 53 b6 ff <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90
Nov 21 12:36:09 fedora29 kernel: RSP: 0018:ff11000113cffdf0 EFLAGS: 00010282
Nov 21 12:36:09 fedora29 kernel: RAX: 000000000000002d RBX: ff1100012b880000 RCX: ffffffff9a2c919e
Nov 21 12:36:09 fedora29 kernel: RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ff110004cb037080
Nov 21 12:36:09 fedora29 kernel: RBP: ffffffffc1d6ac00 R08: 0000000000000001 R09: ffe21c002279ff86
Nov 21 12:36:09 fedora29 kernel: R10: ff11000113cffc37 R11: 0000000000000001 R12: ff1100012b8809c0
Nov 21 12:36:09 fedora29 kernel: R13: ff1100012b880780 R14: 1fe220002279ffd4 R15: 0000000000000000
Nov 21 12:36:09 fedora29 kernel: FS: 00007fb8596e2800(0000) GS:ff110004cb000000(0000) knlGS:0000000000000000
Nov 21 12:36:09 fedora29 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 21 12:36:09 fedora29 kernel: CR2: 00007fff7f3e2e3c CR3: 0000000122822004 CR4: 0000000000373ef0
Nov 21 12:36:09 fedora29 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 21 12:36:09 fedora29 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 21 12:36:09 fedora29 kernel: Call Trace:
Nov 21 12:36:09 fedora29 kernel: <TASK>
Nov 21 12:36:09 fedora29 kernel: ? die+0x37/0x90
Nov 21 12:36:09 fedora29 kernel: ? do_trap+0x133/0x230
Nov 21 12:36:09 fedora29 kernel: ? generic_shutdown_super+0x1b7/0x1c0
Nov 21 12:36:09 fedora29 kernel: ? do_error_trap+0x94/0x130
Nov 21 12:36:09 fedora29 kernel: ? generic_shutdown_super+0x1b7/0x1c0
Nov 21 12:36:09 fedora29 kernel: ? generic_shutdown_super+0x1b7/0x1c0
Nov 21 12:36:09 fedora29 kernel: ? handle_invalid_op+0x2c/0x40
Nov 21 12:36:09 fedora29 kernel: ? generic_shutdown_super+0x1b7/0x1c0
Nov 21 12:36:09 fedora29 kernel: ? exc_invalid_op+0x2f/0x50
Nov 21 12:36:09 fedora29 kernel: ? asm_exc_invalid_op+0x1a/0x20
Nov 21 12:36:09 fedora29 kernel: ? tick_nohz_tick_stopped+0x1e/0x40
Nov 21 12:36:09 fedora29 kernel: ? generic_shutdown_super+0x1b7/0x1c0
Nov 21 12:36:09 fedora29 kernel: kill_anon_super+0x22/0x40
Nov 21 12:36:09 fedora29 kernel: cifs_kill_sb+0x78/0x90 [cifs]
Nov 21 12:36:09 fedora29 kernel: deactivate_locked_super+0x69/0xf0
Nov 21 12:36:09 fedora29 kernel: cleanup_mnt+0x195/0x200
Nov 21 12:36:09 fedora29 kernel: task_work_run+0xec/0x150
Nov 21 12:36:09 fedora29 kernel: ? __pfx_task_work_run+0x10/0x10
Nov 21 12:36:09 fedora29 kernel: ? mark_held_locks+0x24/0x90
Nov 21 12:36:09 fedora29 kernel: syscall_exit_to_user_mode+0x269/0x2a0
Nov 21 12:36:09 fedora29 kernel: do_syscall_64+0x81/0x180
Nov 21 12:36:09 fedora29 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Nov 21 12:36:09 fedora29 kernel: RIP: 0033:0x7fb85990a3eb
Nov 21 12:36:09 fedora29 kernel: Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 f9 a9 0c 00 f7 d8
Nov 21 12:36:09 fedora29 kernel: RSP: 002b:00007fff7f3e45d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
Nov 21 12:36:09 fedora29 kernel: RAX: 0000000000000000 RBX: 000055b939a42c20 RCX: 00007fb85990a3eb
Nov 21 12:36:09 fedora29 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055b939a47410
Nov 21 12:36:09 fedora29 kernel: RBP: 00007fff7f3e46b0 R08: 000055b939a42010 R09: 0000000000000007
Nov 21 12:36:09 fedora29 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000055b939a42d28
Nov 21 12:36:09 fedora29 kernel: R13: 0000000000000000 R14: 000055b939a47410 R15: 000055b939a43030
Nov 21 12:36:09 fedora29 kernel: </TASK>
Nov 21 12:36:09 fedora29 kernel: Modules linked in: cifs ccm cmac nls_utf8 cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sunrpc kvm_intel kvm virtio_net net_failover virtio_balloon failover fuse loop dm_multipath nfnetlink zram xfs bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 floppy sha256_ssse3 sha1_ssse3 virtio_blk qemu_fw_cfg virtio_console [last unloaded: cifs]
Nov 21 12:36:09 fedora29 kernel: ---[ end trace 0000000000000000 ]---
Nov 21 12:36:09 fedora29 kernel: RIP: 0010:generic_shutdown_super+0x1b7/0x1c0
Nov 21 12:36:09 fedora29 kernel: Code: 7b 28 e8 7c c8 f8 ff 48 8b 6b 28 48 89 ef e8 70 c8 f8 ff 48 8b 55 00 48 8d b3 68 06 00 00 48 c7 c7 20 15 ba 9b e8 99 53 b6 ff <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90
Nov 21 12:36:09 fedora29 kernel: RSP: 0018:ff11000113cffdf0 EFLAGS: 00010282
Nov 21 12:36:09 fedora29 kernel: RAX: 000000000000002d RBX: ff1100012b880000 RCX: ffffffff9a2c919e
Nov 21 12:36:09 fedora29 kernel: RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ff110004cb037080
Nov 21 12:36:09 fedora29 kernel: RBP: ffffffffc1d6ac00 R08: 0000000000000001 R09: ffe21c002279ff86
Nov 21 12:36:09 fedora29 kernel: R10: ff11000113cffc37 R11: 0000000000000001 R12: ff1100012b8809c0
Nov 21 12:36:09 fedora29 kernel: R13: ff1100012b880780 R14: 1fe220002279ffd4 R15: 0000000000000000
Nov 21 12:36:09 fedora29 kernel: FS: 00007fb8596e2800(0000) GS:ff110004cb000000(0000) knlGS:0000000000000000
Nov 21 12:36:09 fedora29 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 21 12:36:09 fedora29 kernel: CR2: 00007fff7f3e2e3c CR3: 0000000122822004 CR4: 0000000000373ef0
Nov 21 12:36:09 fedora29 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 21 12:36:09 fedora29 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 21 12:36:09 fedora29 audit[315377]: USER_END pid=315377 uid=0 auid=0 ses=86 subj=kernel msg='op=login id=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:36:09 fedora29 audit[315377]: USER_LOGOUT pid=315377 uid=0 auid=0 ses=86 subj=kernel msg='op=login id=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:36:09 fedora29 audit[315377]: CRYPTO_KEY_USER pid=315377 uid=0 auid=0 ses=86 subj=kernel msg='op=destroy kind=session fp=? direction=both spid=315380 suid=0 rport=56308 laddr=192.168.122.12 lport=22 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:36:09 fedora29 audit[315377]: CRYPTO_KEY_USER pid=315377 uid=0 auid=0 ses=86 subj=kernel msg='op=destroy kind=server fp=SHA256:1c:96:8e:f3:a7:01:59:0e:41:37:ed:d1:99:72:2a:55:ce:22:78:f7:52:5c:bd:6a:46:2f:22:14:0b:75:ff:f7 direction=? spid=315380 suid=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:36:09 fedora29 audit[315377]: USER_END pid=315377 uid=0 auid=0 ses=86 subj=kernel msg='op=PAM:session_close grantors=pam_selinux,pam_loginuid,pam_selinux,pam_namespace,pam_keyinit,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask,pam_lastlog acct="root" exe="/usr/sbin/sshd" hostname=192.168.122.1 addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:36:09 fedora29 audit[315377]: CRED_DISP pid=315377 uid=0 auid=0 ses=86 subj=kernel msg='op=PAM:setcred grantors=pam_env,pam_unix acct="root" exe="/usr/sbin/sshd" hostname=192.168.122.1 addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:36:09 fedora29 audit[315377]: CRYPTO_KEY_USER pid=315377 uid=0 auid=0 ses=86 subj=kernel msg='op=destroy kind=server fp=SHA256:1c:96:8e:f3:a7:01:59:0e:41:37:ed:d1:99:72:2a:55:ce:22:78:f7:52:5c:bd:6a:46:2f:22:14:0b:75:ff:f7 direction=? spid=315377 suid=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:36:09 fedora29 systemd[1]: session-86.scope: Deactivated successfully.
Nov 21 12:36:09 fedora29 systemd[1]: session-86.scope: Consumed 2.508s CPU time.
Nov 21 12:36:09 fedora29 systemd-logind[1094]: Session 86 logged out. Waiting for processes to exit.
Nov 21 12:36:09 fedora29 systemd-logind[1094]: Removed session 86.
Nov 21 12:36:09 fedora29 audit[316011]: CRYPTO_KEY_USER pid=316011 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=destroy kind=server fp=SHA256:1c:96:8e:f3:a7:01:59:0e:41:37:ed:d1:99:72:2a:55:ce:22:78:f7:52:5c:bd:6a:46:2f:22:14:0b:75:ff:f7 direction=? spid=316011 suid=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:36:09 fedora29 audit[316010]: CRYPTO_SESSION pid=316010 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=start direction=from-server cipher=aes256-gcm@openssh.com ksize=256 mac=<implicit> pfs=curve25519-sha256 spid=316011 suid=74 rport=40510 laddr=192.168.122.12 lport=22 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:36:09 fedora29 audit[316010]: CRYPTO_SESSION pid=316010 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=start direction=from-client cipher=aes256-gcm@openssh.com ksize=256 mac=<implicit> pfs=curve25519-sha256 spid=316011 suid=74 rport=40510 laddr=192.168.122.12 lport=22 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:36:09 fedora29 audit[316010]: USER_AUTH pid=316010 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=pubkey_auth grantors=auth-key acct="root" exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:36:09 fedora29 audit[316010]: CRYPTO_KEY_USER pid=316010 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=negotiate kind=auth-key fp=SHA256:3e:03:52:86:fb:4d:b7:b6:0e:dd:8d:1e:57:6e:e2:00:dc:da:9a:38:08:d8:49:4f:31:4b:e6:fb:16:6c:ff:84 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:36:09 fedora29 audit[316010]: USER_ACCT pid=316010 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=PAM:accounting grantors=pam_unix acct="root" exe="/usr/sbin/sshd" hostname=192.168.122.1 addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:36:09 fedora29 audit[316010]: CRYPTO_KEY_USER pid=316010 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=destroy kind=session fp=? direction=both spid=316011 suid=74 rport=40510 laddr=192.168.122.12 lport=22 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:36:09 fedora29 audit[316010]: CRED_ACQ pid=316010 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='op=PAM:setcred grantors=pam_env,pam_unix acct="root" exe="/usr/sbin/sshd" hostname=192.168.122.1 addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:36:09 fedora29 systemd-logind[1094]: New session 87 of user root.
Nov 21 12:36:09 fedora29 systemd[1]: Started session-87.scope - Session 87 of User root.
Nov 21 12:36:09 fedora29 audit[316010]: USER_START pid=316010 uid=0 auid=0 ses=87 subj=kernel msg='op=PAM:session_open grantors=pam_selinux,pam_loginuid,pam_selinux,pam_namespace,pam_keyinit,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask,pam_lastlog acct="root" exe="/usr/sbin/sshd" hostname=192.168.122.1 addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:36:09 fedora29 audit[316013]: CRYPTO_KEY_USER pid=316013 uid=0 auid=0 ses=87 subj=kernel msg='op=destroy kind=server fp=SHA256:1c:96:8e:f3:a7:01:59:0e:41:37:ed:d1:99:72:2a:55:ce:22:78:f7:52:5c:bd:6a:46:2f:22:14:0b:75:ff:f7 direction=? spid=316013 suid=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:36:09 fedora29 audit[316013]: CRED_ACQ pid=316013 uid=0 auid=0 ses=87 subj=kernel msg='op=PAM:setcred grantors=pam_env,pam_unix acct="root" exe="/usr/sbin/sshd" hostname=192.168.122.1 addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:36:09 fedora29 audit[316010]: USER_LOGIN pid=316010 uid=0 auid=0 ses=87 subj=kernel msg='op=login id=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:36:09 fedora29 audit[316010]: USER_START pid=316010 uid=0 auid=0 ses=87 subj=kernel msg='op=login id=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=ssh res=success'
Nov 21 12:36:09 fedora29 audit[316010]: CRYPTO_KEY_USER pid=316010 uid=0 auid=0 ses=87 subj=kernel msg='op=destroy kind=server fp=SHA256:1c:96:8e:f3:a7:01:59:0e:41:37:ed:d1:99:72:2a:55:ce:22:78:f7:52:5c:bd:6a:46:2f:22:14:0b:75:ff:f7 direction=? spid=316014 suid=0 exe="/usr/sbin/sshd" hostname=? addr=192.168.122.1 terminal=? res=success'
Nov 21 12:36:10 fedora29 kernel: CIFS: Attempting to mount //win16.vm.test/Share
Nov 21 12:36:10 fedora29 abrt-dump-journal-oops[1136]: abrt-dump-journal-oops: Found oopses: 2
Nov 21 12:36:10 fedora29 abrt-dump-journal-oops[1136]: abrt-dump-journal-oops: Creating problem directories
Nov 21 12:36:10 fedora29 kernel: ==================================================================
Nov 21 12:36:10 fedora29 kernel: BUG: KASAN: slab-use-after-free in rwsem_down_write_slowpath+0xa34/0xaf0
Nov 21 12:36:10 fedora29 kernel: Read of size 4 at addr ff11000121838034 by task mount.cifs/316187
Nov 21 12:36:10 fedora29 kernel:
Nov 21 12:36:10 fedora29 kernel: CPU: 3 UID: 0 PID: 316187 Comm: mount.cifs Tainted: G D W 6.12.0 #1
Nov 21 12:36:10 fedora29 kernel: Tainted: [D]=DIE, [W]=WARN
Nov 21 12:36:10 fedora29 kernel: Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9 04/01/2014
Nov 21 12:36:10 fedora29 kernel: Call Trace:
Nov 21 12:36:10 fedora29 kernel: <TASK>
Nov 21 12:36:10 fedora29 kernel: dump_stack_lvl+0x79/0xb0
Nov 21 12:36:10 fedora29 kernel: print_report+0xcb/0x620
Nov 21 12:36:10 fedora29 kernel: ? __virt_addr_valid+0x19a/0x300
Nov 21 12:36:10 fedora29 kernel: ? rwsem_down_write_slowpath+0xa34/0xaf0
Nov 21 12:36:10 fedora29 kernel: kasan_report+0xbd/0xf0
Nov 21 12:36:10 fedora29 kernel: ? rwsem_down_write_slowpath+0xa34/0xaf0
Nov 21 12:36:10 fedora29 kernel: rwsem_down_write_slowpath+0xa34/0xaf0
Nov 21 12:36:10 fedora29 kernel: ? kasan_save_stack+0x34/0x50
Nov 21 12:36:10 fedora29 kernel: ? __pfx_rwsem_down_write_slowpath+0x10/0x10
Nov 21 12:36:10 fedora29 kernel: ? cifs_mount+0xfb/0x3b0 [cifs]
Nov 21 12:36:10 fedora29 kernel: ? cifs_smb3_do_mount+0x1a5/0xc10 [cifs]
Nov 21 12:36:10 fedora29 kernel: ? smb3_get_tree+0x1f0/0x430 [cifs]
Nov 21 12:36:10 fedora29 kernel: ? rcu_is_watching+0x20/0x50
Nov 21 12:36:10 fedora29 kernel: ? trace_lock_acquire+0x116/0x150
Nov 21 12:36:10 fedora29 kernel: ? lock_acquire+0x40/0x90
Nov 21 12:36:10 fedora29 kernel: ? super_lock+0xea/0x1d0
Nov 21 12:36:10 fedora29 kernel: ? super_lock+0xea/0x1d0
Nov 21 12:36:10 fedora29 kernel: down_write+0x15b/0x160
Nov 21 12:36:10 fedora29 kernel: ? __pfx_down_write+0x10/0x10
Nov 21 12:36:10 fedora29 kernel: ? __mod_timer+0x407/0x590
Nov 21 12:36:10 fedora29 kernel: super_lock+0xea/0x1d0
Nov 21 12:36:10 fedora29 kernel: ? __pfx_super_lock+0x10/0x10
Nov 21 12:36:10 fedora29 kernel: ? __pfx_lock_release+0x10/0x10
Nov 21 12:36:10 fedora29 kernel: ? rcu_is_watching+0x20/0x50
Nov 21 12:36:10 fedora29 kernel: ? lock_release+0xa5/0x3d0
Nov 21 12:36:10 fedora29 kernel: ? cifs_match_super+0x177/0x650 [cifs]
Nov 21 12:36:10 fedora29 kernel: grab_super+0x80/0x1e0
Nov 21 12:36:10 fedora29 kernel: ? __pfx_grab_super+0x10/0x10
Nov 21 12:36:10 fedora29 kernel: ? cifs_put_tlink+0xa1/0xc0 [cifs]
Nov 21 12:36:10 fedora29 kernel: ? cifs_match_super+0x17f/0x650 [cifs]
Nov 21 12:36:10 fedora29 kernel: ? __pfx_cifs_match_super+0x10/0x10 [cifs]
Nov 21 12:36:10 fedora29 kernel: sget+0x121/0x350
Nov 21 12:36:10 fedora29 kernel: ? __pfx_cifs_set_super+0x10/0x10 [cifs]
Nov 21 12:36:10 fedora29 kernel: cifs_smb3_do_mount+0x293/0xc10 [cifs]
Nov 21 12:36:10 fedora29 kernel: ? __pfx___mutex_lock+0x10/0x10
Nov 21 12:36:10 fedora29 kernel: ? cred_has_capability.isra.0+0xd4/0x1a0
Nov 21 12:36:10 fedora29 kernel: ? __pfx_cifs_smb3_do_mount+0x10/0x10 [cifs]
Nov 21 12:36:10 fedora29 kernel: smb3_get_tree+0x1f0/0x430 [cifs]
Nov 21 12:36:10 fedora29 kernel: vfs_get_tree+0x50/0x180
Nov 21 12:36:10 fedora29 kernel: path_mount+0x5d6/0xf20
Nov 21 12:36:10 fedora29 kernel: ? __pfx_path_mount+0x10/0x10
Nov 21 12:36:10 fedora29 kernel: ? user_path_at+0x45/0x60
Nov 21 12:36:10 fedora29 kernel: __x64_sys_mount+0x174/0x1b0
Nov 21 12:36:10 fedora29 kernel: ? __pfx___x64_sys_mount+0x10/0x10
Nov 21 12:36:10 fedora29 kernel: ? rcu_is_watching+0x20/0x50
Nov 21 12:36:10 fedora29 kernel: do_syscall_64+0x75/0x180
Nov 21 12:36:10 fedora29 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Nov 21 12:36:10 fedora29 kernel: RIP: 0033:0x7f62bc06c8fe
Nov 21 12:36:10 fedora29 kernel: Code: 48 8b 0d 1d a5 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ea a4 0c 00 f7 d8 64 89 01 48
Nov 21 12:36:10 fedora29 kernel: RSP: 002b:00007ffe134d8018 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
Nov 21 12:36:10 fedora29 kernel: RAX: ffffffffffffffda RBX: 0000555d89139471 RCX: 00007f62bc06c8fe
Nov 21 12:36:10 fedora29 kernel: RDX: 0000555d89139471 RSI: 0000555d891394d7 RDI: 00007ffe134d9caf
Nov 21 12:36:10 fedora29 kernel: RBP: 000000000000000a R08: 0000555d93ab0eb0 R09: 0000000000000000
Nov 21 12:36:10 fedora29 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe134d9caf
Nov 21 12:36:10 fedora29 kernel: R13: 0000555d93ab1f20 R14: 0000555d93ab0eb0 R15: 00007f62bc166000
Nov 21 12:36:10 fedora29 kernel: </TASK>
Nov 21 12:36:10 fedora29 kernel:
Nov 21 12:36:10 fedora29 kernel: Allocated by task 315408:
Nov 21 12:36:10 fedora29 kernel: kasan_save_stack+0x24/0x50
Nov 21 12:36:10 fedora29 kernel: kasan_save_track+0x14/0x30
Nov 21 12:36:10 fedora29 kernel: __kasan_slab_alloc+0x59/0x70
Nov 21 12:36:10 fedora29 kernel: kmem_cache_alloc_node_noprof+0x116/0x330
Nov 21 12:36:10 fedora29 kernel: copy_process+0x299/0x45e0
Nov 21 12:36:10 fedora29 kernel: kernel_clone+0xf2/0x4b0
Nov 21 12:36:10 fedora29 kernel: __do_sys_clone+0x90/0xb0
Nov 21 12:36:10 fedora29 kernel: do_syscall_64+0x75/0x180
Nov 21 12:36:10 fedora29 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Nov 21 12:36:10 fedora29 kernel:
Nov 21 12:36:10 fedora29 kernel: Freed by task 0:
Nov 21 12:36:10 fedora29 kernel: kasan_save_stack+0x24/0x50
Nov 21 12:36:10 fedora29 kernel: kasan_save_track+0x14/0x30
Nov 21 12:36:10 fedora29 kernel: kasan_save_free_info+0x3b/0x60
Nov 21 12:36:10 fedora29 kernel: __kasan_slab_free+0x38/0x50
Nov 21 12:36:10 fedora29 kernel: kmem_cache_free+0x239/0x5a0
Nov 21 12:36:10 fedora29 kernel: delayed_put_task_struct+0x149/0x1b0
Nov 21 12:36:10 fedora29 kernel: rcu_do_batch+0x2f4/0x880
Nov 21 12:36:10 fedora29 kernel: rcu_core+0x3d6/0x510
Nov 21 12:36:10 fedora29 kernel: handle_softirqs+0x10f/0x580
Nov 21 12:36:10 fedora29 kernel: __irq_exit_rcu+0xfe/0x120
Nov 21 12:36:10 fedora29 kernel: irq_exit_rcu+0xe/0x20
Nov 21 12:36:10 fedora29 kernel: sysvec_apic_timer_interrupt+0x76/0x90
Nov 21 12:36:10 fedora29 kernel: asm_sysvec_apic_timer_interrupt+0x1a/0x20
Nov 21 12:36:10 fedora29 kernel:
Nov 21 12:36:10 fedora29 kernel: Last potentially related work creation:
Nov 21 12:36:10 fedora29 kernel: kasan_save_stack+0x24/0x50
Nov 21 12:36:10 fedora29 kernel: __kasan_record_aux_stack+0x8e/0xa0
Nov 21 12:36:10 fedora29 kernel: __call_rcu_common.constprop.0+0x87/0x920
Nov 21 12:36:10 fedora29 kernel: release_task+0x836/0xc60
Nov 21 12:36:10 fedora29 kernel: wait_consider_task+0x9db/0x19c0
Nov 21 12:36:10 fedora29 kernel: __do_wait+0xe9/0x390
Nov 21 12:36:10 fedora29 kernel: do_wait+0xcb/0x230
Nov 21 12:36:10 fedora29 kernel: kernel_wait4+0xe4/0x1a0
Nov 21 12:36:10 fedora29 kernel: __do_sys_wait4+0xce/0xe0
Nov 21 12:36:10 fedora29 kernel: do_syscall_64+0x75/0x180
Nov 21 12:36:10 fedora29 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Nov 21 12:36:10 fedora29 kernel:
Nov 21 12:36:10 fedora29 kernel: Second to last potentially related work creation:
Nov 21 12:36:10 fedora29 kernel: kasan_save_stack+0x24/0x50
Nov 21 12:36:10 fedora29 kernel: __kasan_record_aux_stack+0x8e/0xa0
Nov 21 12:36:10 fedora29 kernel: task_work_add+0x14d/0x210
Nov 21 12:36:10 fedora29 kernel: sched_tick+0x174/0x400
Nov 21 12:36:10 fedora29 kernel: update_process_times+0xd4/0xf0
Nov 21 12:36:10 fedora29 kernel: tick_nohz_handler+0x180/0x220
Nov 21 12:36:10 fedora29 kernel: __hrtimer_run_queues+0x31b/0x5b0
Nov 21 12:36:10 fedora29 kernel: hrtimer_interrupt+0x1a7/0x370
Nov 21 12:36:10 fedora29 kernel: __sysvec_apic_timer_interrupt+0xa1/0x270
Nov 21 12:36:10 fedora29 kernel: sysvec_apic_timer_interrupt+0x71/0x90
Nov 21 12:36:10 fedora29 kernel: asm_sysvec_apic_timer_interrupt+0x1a/0x20
Nov 21 12:36:10 fedora29 kernel:
Nov 21 12:36:10 fedora29 kernel: The buggy address belongs to the object at ff11000121838000
Nov 21 12:36:10 fedora29 kernel:  which belongs to the cache task_struct of size 14656
Nov 21 12:36:10 fedora29 kernel: The buggy address is located 52 bytes inside of
Nov 21 12:36:10 fedora29 kernel:  freed 14656-byte region [ff11000121838000, ff1100012183b940)
Nov 21 12:36:10 fedora29 kernel:
Nov 21 12:36:10 fedora29 kernel: The buggy address belongs to the physical page:
Nov 21 12:36:10 fedora29 kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xff11000121838000 pfn:0x121838
Nov 21 12:36:10 fedora29 kernel: head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
Nov 21 12:36:10 fedora29 kernel: memcg:ff11000119d981c1
Nov 21 12:36:10 fedora29 kernel: flags: 0x17ffffc0000240(workingset|head|node=0|zone=2|lastcpupid=0x1fffff)
Nov 21 12:36:10 fedora29 kernel: page_type: f5(slab)
Nov 21 12:36:10 fedora29 kernel: raw: 0017ffffc0000240 ff11000100280640 ff1100010014d648 ff1100010014d648
Nov 21 12:36:10 fedora29 kernel: raw: ff11000121838000 0000000000020001 00000001f5000000 ff11000119d981c1
Nov 21 12:36:10 fedora29 kernel: head: 0017ffffc0000240 ff11000100280640 ff1100010014d648 ff1100010014d648
Nov 21 12:36:10 fedora29 kernel: head: ff11000121838000 0000000000020001 00000001f5000000 ff11000119d981c1
Nov 21 12:36:10 fedora29 kernel: head: 0017ffffc0000003 ffd4000004860e01 ffffffffffffffff 0000000000000000
Nov 21 12:36:10 fedora29 kernel: head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
Nov 21 12:36:10 fedora29 kernel: page dumped because: kasan: bad access detected
Nov 21 12:36:10 fedora29 kernel:
Nov 21 12:36:10 fedora29 kernel: Memory state around the buggy address:
Nov 21 12:36:10 fedora29 kernel: ff11000121837f00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Nov 21 12:36:10 fedora29 kernel: ff11000121837f80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
Nov 21 12:36:10 fedora29 kernel: >ff11000121838000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Nov 21 12:36:10 fedora29 kernel: ^
Nov 21 12:36:10 fedora29 kernel: ff11000121838080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Nov 21 12:36:10 fedora29 kernel: ff11000121838100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Nov 21 12:36:10 fedora29 kernel: ==================================================================
Nov 21 12:36:11 fedora29 abrt-server[316189]: Package 'kernel' isn't signed with proper key
Nov 21 12:36:11 fedora29 abrt-server[316189]: 'post-create' on '/var/spool/abrt/oops-2024-11-21-12:36:10-1136-0' exited with 1
Nov 21 12:36:11 fedora29 abrt-server[316189]: Deleting problem directory '/var/spool/abrt/oops-2024-11-21-12:36:10-1136-0'
Nov 21 12:36:11 fedora29 abrt-server[316192]: Package 'kernel' isn't signed with proper key
Nov 21 12:36:11 fedora29 abrt-server[316192]: 'post-create' on '/var/spool/abrt/oops-2024-11-21-12:36:10-1136-1' exited with 1
Nov 21 12:36:11 fedora29 abrt-server[316192]: Deleting problem directory '/var/spool/abrt/oops-2024-11-21-12:36:10-1136-1'
Nov 21 12:36:12 fedora29 abrt-dump-journal-oops[1136]: Reported 2 kernel oopses to Abrt