From: Christian Brauner <brauner@kernel.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Namjae Jeon <linkinjeon@kernel.org>,
linux-cifs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
smfrench@gmail.com, hyc.lee@gmail.com, senozhatsky@chromium.org
Subject: Re: [PATCH 3/3] ksmbd: fix racy issue from using ->d_parent and ->d_name
Date: Wed, 18 May 2022 16:27:09 +0200 [thread overview]
Message-ID: <20220518142709.iewhrubtqiowdago@wittgenstein> (raw)
In-Reply-To: <YoQRypdyLcN60F+X@zeniv-ca.linux.org.uk>
On Tue, May 17, 2022 at 09:21:14PM +0000, Al Viro wrote:
> On Wed, Apr 27, 2022 at 11:32:45AM +0900, Namjae Jeon wrote:
> > Al pointed out that ksmbd has racy issue from using ->d_parent and ->d_name
> > in ksmbd_vfs_unlink and smb2_vfs_rename(). and use new lock_rename_child()
> > to lock stable parent while underlying rename racy.
> > Introduce vfs_path_parent_lookup helper to avoid out of share access and
> > export vfs functions like the following ones to use
> > vfs_path_parent_lookup().
> > - export __lookup_hash().
> > - export getname_kernel() and putname().
> >
> > vfs_path_parent_lookup() is used for parent lookup of destination file
> > using absolute pathname given from FILE_RENAME_INFORMATION request.
>
> First of all, this is seriously broken:
>
> > -int ksmbd_vfs_lock_parent(struct user_namespace *user_ns, struct dentry *parent,
> > - struct dentry *child)
> > +struct dentry *ksmbd_vfs_lock_parent(struct dentry *child)
> > {
> > - struct dentry *dentry;
> > - int ret = 0;
> > + struct dentry *parent;
> >
> > + parent = dget(child->d_parent);
> > inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
>
> Do that in parallel with host rename() and you are risking this:
>
> you: fetch child->d_parent
> get preempted away
> another thread: move child elsewhere
> another thread: drop (the last) reference to old parent
> memory pressure evicts that dentry and reuses memory
> you: regain the timeslice and bump what you think is parent->d_count.
> In reality, it's 4 bytes in completely different data structure.
> At that point you already have a memory corruptor. Worse,
> there's a decent chance that subsequent code will revert the
> corruption, so it would be hell to debug - you need a race
> to reproduce the thing in the first place *and* you need
> something else to notice the temporary memory corruption.
>
> you: fetch what you think is ->d_inode of that dentry. It actually
> isn't anything of that sort.
> you: grab rwsem at hell knows what address (might or might not point
> to an rwsem). Here's another chance to get something
> reproducible - e.g. if what you thought was ->d_inode actually
> points to unmapped memory, you'll get an oops here. Won't
> be consistent, though.
>
> > + if (child->d_parent != parent) {
> you: ->d_parent doesn't point there anymore
>
> > + dput(parent);
>
> you: decrement those 4 bytes in whatever object it is; if you are
> lucky, it won't hit zero and nobody had noticed the temporary
> increment. If you are not, well...
>
> > + inode_unlock(d_inode(parent));
>
> you: fetch ->d_inode of parent (mind you, it's a bug in its own right -
> even if parent hadn't gotten freed before your dget(), after dput()
> above it's fair game for getting freed; placing that dput()
> before unlocking d_inode() is wrong). Assuming you've got
> the same pointer as the first time around, you proceed to
> drop rwsem at the same address where you've grabbed it.
>
> IOW, you really don't want that in the tree in this form.
>
> It *might* be partially recoverable if you replace the first dget() with
> dget_parent() and reorder dput() and inode_unlock() in failure case, but...
> some of the callers of that thing are also rather dubious.
>
> Look: you have smb2_open() calling ksmbd_vfs_may_delete(), which calls
> that thing. Downstream of this:
> if (!file_present) {
> ...
> } else if (!already_permitted) {
>
> If the parent is *NOT* already locked by that point, just how much is
> your 'file_present' worth? And if it is, you'd obviously deadlock
> right there and then...
>
> I'm not sure I like what you've done with added exports - e.g.
> __lookup_hash had been OK as a name of static function, but exporting
> it is asking for clashes. And honestly, what would you say when running
> into a name like that? OK, it sounds like it's a (probably low-level)
> lookup in some hash table. _Maybe_ it would've been fine if we had one
> and only implementation of hash tables in the entire tree and that
> had been a part of it, but it's nothing of that sort. And "hash" in
> the name is not about doing a hash lookup as opposed to some other
> work (it *does* handle hash misses, allocating dentry, asking filesystem
> to do real on-disk lookup, etc.) - it's actually about "hash function
> of the name is already calculated". My fault, that - predecessor of
> that thing had been called lookup_one(); it took a string, calculated
> its length and computed hash, then proceeded to do lookups. The latter
> part could be reused in handling of rmdir et.al., where we already had
> the component length and hash precalculated, so the tail of lookup_one()
> had been carved out into a separate helper. Circa 2.3.99...
>
> Anyway, the name is _not_ fit for an export; I'm not sure what to call
> it - lookup_one_qstr(), perhaps? Additional fun is due to the fact
> that these days it is slightly different from the lookup_one() et.al.
> Those can be called with directory held shared; that allows parallel
> lookups, but it's not free of cost - if we run into a cache miss and need
> to allocate a new dentry and talk to filesystem, we have to recheck the
> hash table after allocation. __lookup_hash() is called only with parent
> held exclusive and it can skip that fun - hash miss is going to remain
> a miss; nobody else will be able to insert stuff into dcache in that
> directory until we unlock it.
>
> What I'm worried about is that renaming it to lookup_one_qstr() will
> be an invitation for "oh, we happen to have hash/len already known by the
> time of that lookup_one() call; let's just convert it to lookup_one_qsrt()"
> and if that happens in a place where the parent is held only shared, we'll
> be in trouble. OTOH, lookup_one_qstr_excl() sounds like an invitation to
> do something painful to whoever's responsible for such name...
>
> Suggestions, anyone?
Plus, __lookup_hash() doesn't do permission checking so naming it
lookup_one_qstr() seems problematic from that perspective as well...
Don't kill me but:
__excl_qstr_lookup_noperm()
?
next prev parent reply other threads:[~2022-05-18 14:27 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-27 2:32 [PATCH 1/3] ksmbd: remove internal.h include Namjae Jeon
2022-04-27 2:32 ` [PATCH 2/3] fs: introduce lock_rename_child() helper Namjae Jeon
2022-04-27 2:32 ` [PATCH 3/3] ksmbd: fix racy issue from using ->d_parent and ->d_name Namjae Jeon
2022-05-15 11:09 ` Namjae Jeon
2022-05-17 21:21 ` Al Viro
2022-05-18 14:27 ` Christian Brauner [this message]
2022-05-19 0:39 ` Namjae Jeon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220518142709.iewhrubtqiowdago@wittgenstein \
--to=brauner@kernel.org \
--cc=hyc.lee@gmail.com \
--cc=linkinjeon@kernel.org \
--cc=linux-cifs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=senozhatsky@chromium.org \
--cc=smfrench@gmail.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox