From: Paul Aurich <paul@darkrain42.org>
To: Paulo Alcantara <pc@manguebit.com>
Cc: linux-cifs@vger.kernel.org, Steve French <sfrench@samba.org>,
Ronnie Sahlberg <ronniesahlberg@gmail.com>,
Shyam Prasad N <sprasad@microsoft.com>,
Tom Talpey <tom@talpey.com>, Bharath SM <bharathsm@microsoft.com>
Subject: Re: [PATCH v2 4/4] smb: During unmount, ensure all cached dir instances drop their dentry
Date: Fri, 22 Nov 2024 19:28:34 -0800
Message-ID: <Z0FL4kIUiCMFDVfe@vaarsuvius.home.arpa>
In-Reply-To: <2a818d91e9f3c392b2739a4c2a018085@manguebit.com>
On 2024-11-21 23:05:51 -0300, Paulo Alcantara wrote:
>Hi Paul,
>
>Thanks for looking into this! Really appreciate it.
>
>Paul Aurich <paul@darkrain42.org> writes:
>
>> The unmount process (cifs_kill_sb() calling close_all_cached_dirs()) can
>> race with various cached directory operations, which ultimately results
>> in dentries not being dropped and these kernel BUGs:
>>
>> BUG: Dentry ffff88814f37e358{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs]
>> VFS: Busy inodes after unmount of cifs (cifs)
>> ------------[ cut here ]------------
>> kernel BUG at fs/super.c:661!
>>
>> This happens when a cfid is in the process of being cleaned up and
>> has been removed from the cfids->entries list, including:
>>
>> - Receiving a lease break from the server
>> - Server reconnection triggers invalidate_all_cached_dirs(), which
>> removes all the cfids from the list
>> - The laundromat thread decides to expire an old cfid.
>>
>> To solve these problems, dropping the dentry is done in queued work done
>> in a newly-added cfid_put_wq workqueue, and close_all_cached_dirs()
>> flushes that workqueue after it drops all the dentries of which it's
>> aware. This is a global workqueue (rather than scoped to a mount), but
>> the queued work is minimal.
>
>Why does it need to be a global workqueue? Can't you make it per tcon?

The problem with a per-tcon workqueue is that I didn't see a clean way to deal
with multiuser mounts and flushing the workqueue in close_all_cached_dirs() --
when dealing with each individual tcon, we're still holding tlink_tree_lock,
so an arbitrary sleep seems problematic.

There could be a per-sb workqueue (stored in cifs_sb or the master tcon), but
is there a way to get back to the superblock / master tcon with just a tcon
(e.g. in cached_dir_lease_break, when processing a lease break)?
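
To make the locking constraint concrete, here is a rough userspace sketch of
the "collect under the lock, sleep only after dropping it" ordering. It is a
model, not the kernel code: struct model_tcon, lock_held, may_sleep(),
model_dput(), model_flush_workqueue() and close_all_model() are all
hypothetical names, and the lock is just a flag so the rule can be asserted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TCONS 8

/* Hypothetical stand-in for a tcon that may hold one cached dentry. */
struct model_tcon {
	int dentry;             /* nonzero means "holds a dentry reference" */
};

static bool lock_held;          /* models holding tlink_tree_lock */
static int flushed;             /* counts flushes of the put workqueue */

/* Anything that may sleep must not run while the lock is held. */
static void may_sleep(void)
{
	assert(!lock_held);
}

/* Models dropping a dentry reference, which is allowed to sleep. */
static void model_dput(int dentry)
{
	(void)dentry;
	may_sleep();
}

/* Models flushing the workqueue, which is allowed to sleep. */
static void model_flush_workqueue(void)
{
	may_sleep();
	flushed++;
}

/*
 * Phase 1: under the lock, only move dentries onto a private array.
 * Phase 2: after unlocking, drop them and flush the queued puts.
 * Returns the number of dentries dropped.
 */
static size_t close_all_model(struct model_tcon *tcons, size_t n)
{
	int pending[MAX_TCONS];
	size_t count = 0;

	assert(n <= MAX_TCONS);

	lock_held = true;
	for (size_t i = 0; i < n; i++) {
		if (tcons[i].dentry) {
			pending[count++] = tcons[i].dentry;
			tcons[i].dentry = 0;
		}
	}
	lock_held = false;

	for (size_t i = 0; i < count; i++)
		model_dput(pending[i]);
	model_flush_workqueue();

	return count;
}
```

With a single global workqueue, one flush after the loop covers every tcon,
which is why nothing in this model needs to sleep per tcon while the lock is
held.
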
>> The final cleanup work for cleaning up a cfid is performed via work
>> queued in the serverclose_wq workqueue; this is done separate from
>> dropping the dentries so that close_all_cached_dirs() doesn't block on
>> any server operations.
>>
>> Both of these queued works expect to be invoked with a cfid reference
>> and a tcon reference to prevent those objects from being freed while
>> the work is ongoing.
>
>Why do you need to take a tcon reference?

In the existing code (and my patch, without the refs), I was seeing an
intermittent use-after-free of the tcon or cached_fids struct by queued work
processing a lease break -- the cfid isn't linked from cached_fids, but
smb2_close_cached_fid invoking SMB2_close can race with the unmount and
cifs_put_tcon.

Something like:

    t1                              t2
    cached_dir_lease_break
    smb2_cached_lease_break
    smb2_close_cached_fid
    SMB2_close starts
                                    cifs_kill_sb
                                    cifs_umount
                                    cifs_put_tlink
                                    cifs_put_tcon
    SMB2_close continues
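
The reason the extra tcon reference closes this window can be shown with a
small userspace model. This is only a sketch of the refcount ordering, not the
patch itself: struct model_tcon, model_get_tcon(), model_put_tcon(),
queue_close_work(), run_close_work() and model_umount() are hypothetical
names, and "freed" is a flag rather than a real free.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tcon; only the refcount is modelled. */
struct model_tcon {
	int tc_count;
	bool freed;
};

static void model_get_tcon(struct model_tcon *t)
{
	assert(!t->freed);
	t->tc_count++;
}

/* Drops one reference; the object is "freed" when the last one goes. */
static void model_put_tcon(struct model_tcon *t)
{
	assert(!t->freed && t->tc_count > 0);
	if (--t->tc_count == 0)
		t->freed = true;
}

/* t1, step 1: the lease-break path pins the tcon before queueing work. */
static void queue_close_work(struct model_tcon *t)
{
	model_get_tcon(t);
}

/*
 * t1, step 2: the queued work uses the tcon, then unpins it.
 * Returns whether the tcon was still valid when the work ran.
 */
static bool run_close_work(struct model_tcon *t)
{
	bool was_valid = !t->freed;   /* stands in for SMB2_close using tcon */

	model_put_tcon(t);
	return was_valid;
}

/* t2: unmount drops the mount's own reference. */
static void model_umount(struct model_tcon *t)
{
	model_put_tcon(t);
}
```

Running the t1/t2 interleaving from the diagram against this model (queue the
work, unmount, then run the work) leaves the tcon valid until the work's own
put, which is the last one.
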
I had a version of the patch that kept the 'in flight lease breaks' on
a second list in cached_fids so that they could be cancelled synchronously
from free_cached_fids(), but I struggled with it (I can't remember exactly,
but I think the difficulty was keeping the linked list membership / removal
handling and the num_entries handling consistent).

> Can't you drop the dentries
>when tearing down tcon in cifs_put_tcon()? No concurrent mounts would
>be able to access or free it.

The dentries must be dropped before kill_anon_super(), as that's where the
'Dentry still in use' check is. All the tcons are put in cifs_umount(), which
runs after it:

    kill_anon_super(sb);
    cifs_umount(cifs_sb);

The other thing is that cifs_umount_begin() has this comment, which made me
think a tcon can actually be tied to two distinct mount points:

    if ((tcon->tc_count > 1) || (tcon->status == TID_EXITING)) {
            /* we have other mounts to same share or we have
               already tried to umount this and woken up
               all waiting network requests, nothing to do */

Although, as I'm thinking about it again, I think I've misunderstood (and that
comment is wrong?).

It did cross my mind to pull some of the work out of cifs_umount into
cifs_kill_sb (specifically, I wanted to cancel prune_tlinks earlier) -- with
no prune_tlinks running, it would be more feasible to drop tlink_tree_lock in
close_all_cached_dirs(), at which point a per-tcon workqueue is more
practical.

>After running xfstests I've seen a leaked tcon in
>/proc/fs/cifs/DebugData with no CIFS superblocks, which might be related
>to this.
>
>Could you please check if there is any leaked connection in
>/proc/fs/cifs/DebugData after running your tests?

After I finish with my tests (I'm not using xfstests, although perhaps
I should be) and unmount the share, DebugData doesn't show any connections for
me.

~Paul