Linux CIFS filesystem development
 help / color / mirror / Atom feed
From: Paul Aurich <paul@darkrain42.org>
To: Steve French <smfrench@gmail.com>
Cc: linux-cifs@vger.kernel.org, "Steve French" <sfrench@samba.org>,
	"Paulo Alcantara" <pc@manguebit.com>,
	"Ronnie Sahlberg" <ronniesahlberg@gmail.com>,
	"Shyam Prasad N" <sprasad@microsoft.com>,
	"Tom Talpey" <tom@talpey.com>,
	"Bharath SM" <bharathsm@microsoft.com>,
	"Ralph Böhme" <slow@samba.org>
Subject: Re: [PATCH 0/5] SMB cached directory fixes around reconnection/unmounting
Date: Fri, 8 Nov 2024 14:44:43 -0800	[thread overview]
Message-ID: <Zy6UW5fuva5VdIWk@haley.home.arpa> (raw)
In-Reply-To: <CAH2r5muDRhy2gsy7kz9GHC3maGxcxAcam8B3pgCmnS8xcEQX1w@mail.gmail.com>

No, this series doesn't try to address that. I was just focused on the 
issues around the lifecycle of the cfids and the dentry:s.

~Paul

On 2024-11-08 16:39:03 -0600, Steve French wrote:
>The perf improvement allowing caching of directories (helping "ls"
>then "ls" again for same dir) not just the perf improvement with "ls
>"then "stat" could be very important.
>
>Did this series also address Ralph's point about missing requesting
>"H" (handle caching) flag when requesting directory leases from the
>server?
>
>On Fri, Nov 8, 2024 at 4:35 PM Paul Aurich <paul@darkrain42.org> wrote:
>>
>> The SMB client cached directory functionality has a few problems around
>> flaky/lost server connections, which manifest as a pair of BUGs when
>> eventually unmounting the server connection:
>>
>>     [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/}  still in use (2) [unmount of cifs cifs]
>>     [18645.789274] VFS: Busy inodes after unmount of cifs (cifs)
>>
>> Based on bisection, these issues started with the lease directory cache
>> handling introduced in commit ebe98f1447bb ("cifs: enable caching of
>> directories for which a lease is held"), and go away if I mount with
>> 'nohandlecache'.  I started seeing these on Debian Bookworm stable kernel
>> (v6.1.x), but the issues persist even in current git versions.  I think the
>> situation was improved (occurrence frequency went down) with
>> commit 5c86919455c1 ("smb: client: fix use-after-free in
>> smb2_query_info_compound()").
>>
>>
>> I'm able to reproduce the "Dentry still in use" errors by connecting to an
>> actively-used SMB share (the server organically generates lease breaks) and
>> leaving these running for 'a while':
>>
>> - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done
>> - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done
>>
>> ('a while' is anywhere from 10 minutes to overnight. Also, it's not the
>> cleanest reproducer, but I stopped iterating once I had something that was
>> even remotely reliable for me...)
>>
>> This series attempts to fix these, as well as a use-after-free that could
>> occur because open_cached_dir() explicitly frees the cached_fid, rather than
>> relying on reference counting.
>>
>>
>> The last patch in this series (smb: During umount, flush any pending lease
>> breaks for cached dirs) is a work-in-progress, and should not be taken as-is.
>> The issue there:
>>
>> Unmounting an SMB share (cifs_kill_sb) can race with a queued lease break from
>> the server for a cached directory.  When this happens, the cfid is removed
>> from the linked list, so close_all_cached_dirs() cannot drop the dentry.  If
>> cifs_kill_sb continues on before the queued work puts the dentry, we trigger
>> the "Dentry still in use" BUG splat.  Flushing the cifsiod_wq seems to help
>> this, but some thoughts:
>>
>> 1. cifsiod_wq is a global workqueue, rather than per-mount.  Flushing the
>>    entire workqueue is potentially doing more work that necessary.  Should
>>    there be a workqueue that's more appropriately scoped?
>> 2. With an unresponsive server, this causes umount (even umount -l) to hang
>>    (waiting for SMB2_close calls), and when I test with backports on a 6.1
>>    kernel, appears to cause a deadlock between kill_sb and some cifs
>>    reconnection code calling iterate_supers_type.  (Pretty sure the deadlock
>>    was addressed by changes to fs/super.c, so not really an SMB problem, but
>>    just an indication that flush_waitqueue isn't the right solution).
>> 3. Should cached_dir_lease_break() drop the dentry before queueing work
>>    (and if so, is it OK to do this under the spinlock, or should the spinlock
>>    be dropped first)?
>> 4. Related to #3 -- shouldn't close_all_cached_dirs() be holding the spinlock
>>    while looping?
>>
>>
>> Lastly, patches 2, 3, and 5 (in its final form) are beneficial going back to
>> v6.1 for stable, but it's not a clean backport because some other important
>> fixes (commit 5c86919455c1 ("smb: client: fix use-after-free in
>> smb2_query_info_compound()") weren't picked up.
>>
>> Paul Aurich (5):
>>   smb: cached directories can be more than root file handle
>>   smb: Don't leak cfid when reconnect races with open_cached_dir
>>   smb: prevent use-after-free due to open_cached_dir error paths
>>   smb: No need to wait for work when cleaning up cached directories
>>   smb: During umount, flush any pending lease breaks for cached dirs
>>
>>  fs/smb/client/cached_dir.c | 106 ++++++++++++++++---------------------
>>  1 file changed, 47 insertions(+), 59 deletions(-)
>>
>> --
>> 2.45.2
>>
>>
>
>
>-- 
>Thanks,
>
>Steve


  reply	other threads:[~2024-11-08 22:44 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-08 22:29 [PATCH 0/5] SMB cached directory fixes around reconnection/unmounting Paul Aurich
2024-11-08 22:29 ` [PATCH 1/5] smb: cached directories can be more than root file handle Paul Aurich
2024-11-08 22:29 ` [PATCH 2/5] smb: Don't leak cfid when reconnect races with open_cached_dir Paul Aurich
2024-11-08 22:29 ` [PATCH 3/5] smb: prevent use-after-free due to open_cached_dir error paths Paul Aurich
2024-11-08 22:29 ` [PATCH 4/5] smb: No need to wait for work when cleaning up cached directories Paul Aurich
2024-11-08 22:29 ` [PATCH 5/5] smb: During umount, flush any pending lease breaks for cached dirs Paul Aurich
2024-11-08 22:33   ` Steve French
2024-11-08 22:39 ` [PATCH 0/5] SMB cached directory fixes around reconnection/unmounting Steve French
2024-11-08 22:44   ` Paul Aurich [this message]
2024-11-08 23:09     ` Enzo Matsumiya
2024-11-10  0:49       ` Steve French
2024-11-13 19:08         ` Paul Aurich
2024-11-18 17:40           ` Steve French

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zy6UW5fuva5VdIWk@haley.home.arpa \
    --to=paul@darkrain42.org \
    --cc=bharathsm@microsoft.com \
    --cc=linux-cifs@vger.kernel.org \
    --cc=pc@manguebit.com \
    --cc=ronniesahlberg@gmail.com \
    --cc=sfrench@samba.org \
    --cc=slow@samba.org \
    --cc=smfrench@gmail.com \
    --cc=sprasad@microsoft.com \
    --cc=tom@talpey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox