Linux CIFS filesystem development
 help / color / mirror / Atom feed
* [PATCH 0/5] SMB cached directory fixes around reconnection/unmounting
@ 2024-11-08 22:29 Paul Aurich
  2024-11-08 22:29 ` [PATCH 1/5] smb: cached directories can be more than root file handle Paul Aurich
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Paul Aurich @ 2024-11-08 22:29 UTC (permalink / raw)
  To: linux-cifs, Steve French
  Cc: paul, Paulo Alcantara, Ronnie Sahlberg, Shyam Prasad N,
	Tom Talpey, Bharath SM

The SMB client cached directory functionality has a few problems around
flaky/lost server connections, which manifest as a pair of BUGs when
eventually unmounting the server connection:

    [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/}  still in use (2) [unmount of cifs cifs]
    [18645.789274] VFS: Busy inodes after unmount of cifs (cifs)

Based on bisection, these issues started with the lease directory cache
handling introduced in commit ebe98f1447bb ("cifs: enable caching of
directories for which a lease is held"), and go away if I mount with
'nohandlecache'.  I started seeing these on Debian Bookworm stable kernel
(v6.1.x), but the issues persist even in current git versions.  I think the
situation was improved (occurrence frequency went down) with
commit 5c86919455c1 ("smb: client: fix use-after-free in
smb2_query_info_compound()").


I'm able to reproduce the "Dentry still in use" errors by connecting to an
actively-used SMB share (the server organically generates lease breaks) and
leaving these running for 'a while':

- while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done
- while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done

('a while' is anywhere from 10 minutes to overnight. Also, it's not the
cleanest reproducer, but I stopped iterating once I had something that was
even remotely reliable for me...)

This series attempts to fix these, as well as a use-after-free that could
occur because open_cached_dir() explicitly frees the cached_fid, rather than
relying on reference counting.


The last patch in this series (smb: During umount, flush any pending lease
breaks for cached dirs) is a work-in-progress, and should not be taken as-is.
The issue there:

Unmounting an SMB share (cifs_kill_sb) can race with a queued lease break from
the server for a cached directory.  When this happens, the cfid is removed
from the linked list, so close_all_cached_dirs() cannot drop the dentry.  If
cifs_kill_sb continues on before the queued work puts the dentry, we trigger
the "Dentry still in use" BUG splat.  Flushing the cifsiod_wq seems to help
this, but some thoughts:

1. cifsiod_wq is a global workqueue, rather than per-mount.  Flushing the
   entire workqueue is potentially doing more work that necessary.  Should
   there be a workqueue that's more appropriately scoped?
2. With an unresponsive server, this causes umount (even umount -l) to hang
   (waiting for SMB2_close calls), and when I test with backports on a 6.1
   kernel, appears to cause a deadlock between kill_sb and some cifs
   reconnection code calling iterate_supers_type.  (Pretty sure the deadlock
   was addressed by changes to fs/super.c, so not really an SMB problem, but
   just an indication that flush_waitqueue isn't the right solution).
3. Should cached_dir_lease_break() drop the dentry before queueing work
   (and if so, is it OK to do this under the spinlock, or should the spinlock
   be dropped first)?
4. Related to #3 -- shouldn't close_all_cached_dirs() be holding the spinlock
   while looping?


Lastly, patches 2, 3, and 5 (in its final form) are beneficial going back to
v6.1 for stable, but it's not a clean backport because some other important
fixes (commit 5c86919455c1 ("smb: client: fix use-after-free in
smb2_query_info_compound()") weren't picked up.

Paul Aurich (5):
  smb: cached directories can be more than root file handle
  smb: Don't leak cfid when reconnect races with open_cached_dir
  smb: prevent use-after-free due to open_cached_dir error paths
  smb: No need to wait for work when cleaning up cached directories
  smb: During umount, flush any pending lease breaks for cached dirs

 fs/smb/client/cached_dir.c | 106 ++++++++++++++++---------------------
 1 file changed, 47 insertions(+), 59 deletions(-)

--
2.45.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-11-18 17:41 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-08 22:29 [PATCH 0/5] SMB cached directory fixes around reconnection/unmounting Paul Aurich
2024-11-08 22:29 ` [PATCH 1/5] smb: cached directories can be more than root file handle Paul Aurich
2024-11-08 22:29 ` [PATCH 2/5] smb: Don't leak cfid when reconnect races with open_cached_dir Paul Aurich
2024-11-08 22:29 ` [PATCH 3/5] smb: prevent use-after-free due to open_cached_dir error paths Paul Aurich
2024-11-08 22:29 ` [PATCH 4/5] smb: No need to wait for work when cleaning up cached directories Paul Aurich
2024-11-08 22:29 ` [PATCH 5/5] smb: During umount, flush any pending lease breaks for cached dirs Paul Aurich
2024-11-08 22:33   ` Steve French
2024-11-08 22:39 ` [PATCH 0/5] SMB cached directory fixes around reconnection/unmounting Steve French
2024-11-08 22:44   ` Paul Aurich
2024-11-08 23:09     ` Enzo Matsumiya
2024-11-10  0:49       ` Steve French
2024-11-13 19:08         ` Paul Aurich
2024-11-18 17:40           ` Steve French

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox