Linux CIFS filesystem development
 help / color / mirror / Atom feed
From: Christian Brauner <brauner@kernel.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Namjae Jeon <linkinjeon@kernel.org>,
	linux-cifs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	smfrench@gmail.com, hyc.lee@gmail.com, senozhatsky@chromium.org
Subject: Re: [PATCH 3/3] ksmbd: fix racy issue from using ->d_parent and ->d_name
Date: Wed, 18 May 2022 16:27:09 +0200	[thread overview]
Message-ID: <20220518142709.iewhrubtqiowdago@wittgenstein> (raw)
In-Reply-To: <YoQRypdyLcN60F+X@zeniv-ca.linux.org.uk>

On Tue, May 17, 2022 at 09:21:14PM +0000, Al Viro wrote:
> On Wed, Apr 27, 2022 at 11:32:45AM +0900, Namjae Jeon wrote:
> > Al pointed out that ksmbd has racy issue from using ->d_parent and ->d_name
> > in ksmbd_vfs_unlink and smb2_vfs_rename(). and use new lock_rename_child()
> > to lock stable parent while underlying rename racy.
> > Introduce vfs_path_parent_lookup helper to avoid out of share access and
> > export vfs functions like the following ones to use
> > vfs_path_parent_lookup().
> >  - export __lookup_hash().
> >  - export getname_kernel() and putname().
> > 
> > vfs_path_parent_lookup() is used for parent lookup of destination file
> > using absolute pathname given from FILE_RENAME_INFORMATION request.
> 
> First of all, this is seriously broken:
> 
> > -int ksmbd_vfs_lock_parent(struct user_namespace *user_ns, struct dentry *parent,
> > -			  struct dentry *child)
> > +struct dentry *ksmbd_vfs_lock_parent(struct dentry *child)
> >  {
> > -	struct dentry *dentry;
> > -	int ret = 0;
> > +	struct dentry *parent;
> >  
> > +	parent = dget(child->d_parent);
> >  	inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
> 
> Do that in parallel with host rename() and you are risking this:
> 
> you: fetch child->d_parent
> 	get preempted away
> another thread: move child elsewhere
> another thread: drop (the last) reference to old parent
> 	memory pressure evicts that dentry and reuses memory
> you: regain the timeslice and bump what you think is parent->d_count.
> 	In reality, it's 4 bytes in completely different data structure.
> 	At that point you already have a memory corruptor.  Worse,
> 	there's a decent chance that subsequent code will revert the
> 	corruption, so it would be hell to debug - you need a race
> 	to reproduce the thing in the first place *and* you need
> 	something else to notice the temporary memory corruption.
> 
> you: fetch what you think is ->d_inode of that dentry.  It actually
> 	isn't anything of that sort.
> you: grab rwsem at hell knows what address (might or might not point
> 	to an rwsem).  Here's another chance to get something
> 	reproducible - e.g. if what you thought was ->d_inode actually
> 	points to unmapped memory, you'll get an oops here.  Won't
> 	be consistent, though.
> 
> > +	if (child->d_parent != parent) {
> you:	->d_parent doesn't point there anymore
> 
> > +		dput(parent);
> 
> you:	decrement those 4 bytes in whatever object it is; if you are
> 	lucky, it won't hit zero and nobody had noticed the temporary
> 	increment.  If you are not, well...
> 
> > +		inode_unlock(d_inode(parent));
> 
> you:	fetch ->d_inode of parent (mind you, it's a bug in its own right -
> 	even if parent hadn't gotten freed before your dget(), after dput()
> 	above it's fair game for getting freed; placing that dput()
> 	before unlocking d_inode() is wrong).  Assuming you've got
> 	the same pointer as the first time around, you proceed to
> 	drop rwsem at the same address where you've grabbed it.
> 
> IOW, you really don't want that in the tree in this form.
> 
> It *might* be partially recoverable if you replace the first dget() with
> dget_parent() and reorder dput() and inode_unlock() in failure case, but...
> some of the callers of that thing are also rather dubious.
> 
> Look: you have smb2_open() calling ksmbd_vfs_may_delete(), which calls
> that thing.  Downstream of this:
> 	if (!file_present) {
> 	...
> 	} else if (!already_permitted) {
> 
> If the parent is *NOT* already locked by that point, just how much is
> your 'file_present' worth?  And if it is, you'd obviously deadlock
> right there and then...
> 
> I'm not sure I like what you've done with added exports - e.g.
> __lookup_hash had been OK as a name of static function, but exporting
> it is asking for clashes.  And honestly, what would you say when running
> into a name like that?  OK, it sounds like it's a (probably low-level)
> lookup in some hash table.  _Maybe_ it would've been fine if we had one
> and only implementation of hash tables in the entire tree and that
> had been a part of it, but it's nothing of that sort.  And "hash" in
> the name is not about doing a hash lookup as opposed to some other
> work (it *does* handle hash misses, allocating dentry, asking filesystem
> to do real on-disk lookup, etc.) - it's actually about "hash function
> of the name is already calculated".  My fault, that - predecessor of
> that thing had been called lookup_one(); it took a string, calculated
> its length and computed hash, then proceeded to do lookups.  The latter
> part could be reused in handling of rmdir et.al., where we already had
> the component length and hash precalculated, so the tail of lookup_one()
> had been carved out into a separate helper.  Circa 2.3.99...
> 
> Anyway, the name is _not_ fit for an export; I'm not sure what to call
> it - lookup_one_qstr(), perhaps?  Additional fun is due to the fact
> that these days it is slightly different from the lookup_one() et.al.
> Those can be called with directory held shared; that allows parallel
> lookups, but it's not free of cost - if we run into a cache miss and need
> to allocate a new dentry and talk to filesystem, we have to recheck the
> hash table after allocation.  __lookup_hash() is called only with parent
> held exclusive and it can skip that fun - hash miss is going to remain
> a miss; nobody else will be able to insert stuff into dcache in that
> directory until we unlock it.
> 
> What I'm worried about is that renaming it to lookup_one_qstr() will
> be an invitation for "oh, we happen to have hash/len already known by the
> time of that lookup_one() call; let's just convert it to lookup_one_qsrt()"
> and if that happens in a place where the parent is held only shared, we'll
> be in trouble.  OTOH, lookup_one_qstr_excl() sounds like an invitation to
> do something painful to whoever's responsible for such name...
> 
> Suggestions, anyone?

Plus, __lookup_hash() doesn't do permission checking so naming it
lookup_one_qstr() seems problematic from that perspective as well...

Don't kill me but:

__excl_qstr_lookup_noperm()

?

  reply	other threads:[~2022-05-18 14:27 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-27  2:32 [PATCH 1/3] ksmbd: remove internal.h include Namjae Jeon
2022-04-27  2:32 ` [PATCH 2/3] fs: introduce lock_rename_child() helper Namjae Jeon
2022-04-27  2:32 ` [PATCH 3/3] ksmbd: fix racy issue from using ->d_parent and ->d_name Namjae Jeon
2022-05-15 11:09   ` Namjae Jeon
2022-05-17 21:21   ` Al Viro
2022-05-18 14:27     ` Christian Brauner [this message]
2022-05-19  0:39     ` Namjae Jeon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220518142709.iewhrubtqiowdago@wittgenstein \
    --to=brauner@kernel.org \
    --cc=hyc.lee@gmail.com \
    --cc=linkinjeon@kernel.org \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=senozhatsky@chromium.org \
    --cc=smfrench@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox