public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: NeilBrown <neilb@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
Date: Wed, 22 Apr 2015 22:05:53 +0100	[thread overview]
Message-ID: <20150422210553.GX889@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20150422201238.GW889@ZenIV.linux.org.uk>

On Wed, Apr 22, 2015 at 09:12:38PM +0100, Al Viro wrote:
> On Wed, Apr 22, 2015 at 07:07:59PM +0100, Al Viro wrote:
> > And one more: may_follow_link() is now potentially oopsable.  Look: suppose
> > we've reached the link in RCU mode, just as it got unlinked.  link->dentry
> > has become negative and may_follow_link() steps into
> >         /* Allowed if owner and follower match. */
> >         inode = link->dentry->d_inode;
> >         if (uid_eq(current_cred()->fsuid, inode->i_uid))
> >                 return 0;
> > Oops...  Incidentally, I suspect that your __read_seqcount_retry() in
> > follow_link() might be lacking a barrier; why isn't full read_seqcount_retry()
> > needed?
> > 
> > FWIW, I would rather fetch ->d_inode *and* checked ->seq proir to calling
> > get_link(), and passed inode to it as an explicit argument.  And passed it
> > to may_follow_link() as well...
> 
> Hrm...  You know, something really weird is going on here.  Where are
> you setting nd->seq?  I don't see anything in follow_link() doing that.
> And nd->seq _definitely_ needs setting if you want to stay in RCU mode -
> at that point it matches the dentry of symlink, not that of nd->path
> (== parent directory).  Neil, could you tell me which kernel you'd been
> testing (ideally - commit ID is a public git tree), what config and what
> tests had those been?

FWIW, there's a wart that had been annoying me for quite a while, and it
might be related to dealing with that properly.  Namely, walk_component()
calling conventions.  We have walk_component(nd, &path, follow), which can
	* return -E..., and leave us with pathwalk terminated; path contains
junk, and so does nd->path.
	* return 0, update nd->path, nd->inode and nd->seq.  The contents
of path is in undefined state - it might be unchanged, it might be equal to
nd->path (and not pinned down, RCU mode or not).  In any case, callers do
not touch it afterwards.  That's the normal case.
	* return 1, update nd->seq, leave nd->path and nd->inode unchanged and
set path pointing to our symlink.  nd->seq matches path, not nd->path.

In all cases the original contents of path is ignored - it's purely 'out'
parameter, but compiler can't establish that on its own; it _might_ be
left untouched.  In all cases when its contents survives we don't look at
it afterwards, but proving that requires a non-trivial analysis.

And in case when we return 1 (== symlink to be followed), we bugger nd->seq.
It's left as we need it for unlazy_walk() (and after unlazy_walk() we don't
care about it at all), so currently everything works, but if we want to
stay in RCU mode for symlink traversal, we _will_ need ->d_seq of parent
directory.

I wonder if the right way to solve that would be to drop the path argument
entirely and store the bugger in nameidata.  As in
	union {
		struct qstr last;
		struct path link;
	};
	...
	union {
		int last_type;
		unsigned link_seq;
	};
in struct nameidata.  We never need both at the same time; after
walk_component() (or its analogue in do_last()) we don't need the component
name anymore.  That way walk_component() would not trash nd->seq when
finding a symlink...

It would also shrink the stack footprint a bit - local struct path next
in link_path_walk() would be gone, along with the same thing in path_lookupat()
and friends.  Not a lot of win (4 pointers total), but it might be enough
to excuse putting ->d_seq of root in there, along with ->link.dentry->d_inode,
to avoid rechecking its ->d_seq.  As the matter of fact, we do have an
odd number of 32bit fields in there, so ->d_seq of root would fit nicely...

Comments?

  reply	other threads:[~2015-04-22 21:06 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
2015-04-20 18:12 ` [PATCH 01/24] lustre: rip the private symlink nesting limit out Al Viro
2015-04-20 19:08   ` Andreas Dilger
2015-04-20 19:22     ` Al Viro
2015-04-20 20:35       ` Al Viro
2015-04-20 18:12 ` [PATCH 02/24] VFS: replace {, total_}link_count in task_struct with pointer to nameidata Al Viro
2015-04-20 18:12 ` [PATCH 03/24] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link Al Viro
2015-04-20 18:12 ` [PATCH 04/24] VFS: replace nameidata arg to ->put_link with a char* Al Viro
2015-04-20 18:12 ` [PATCH 05/24] SECURITY: remove nameidata arg from inode_follow_link Al Viro
2015-04-20 18:12 ` [PATCH 06/24] VFS: remove nameidata args from ->follow_link Al Viro
2015-04-20 18:12 ` [PATCH 07/24] namei: expand nested_symlink() in its only caller Al Viro
2015-04-20 18:12 ` [PATCH 08/24] namei.c: separate the parts of follow_link() that find the link body Al Viro
2015-04-20 18:12 ` [PATCH 09/24] namei: fold follow_link() into link_path_walk() Al Viro
2015-04-20 18:12 ` [PATCH 10/24] link_path_walk: handle get_link() returning ERR_PTR() immediately Al Viro
2015-04-20 18:12 ` [PATCH 11/24] link_path_walk: don't bother with walk_component() after jumping link Al Viro
2015-04-20 18:12 ` [PATCH 12/24] link_path_walk: turn inner loop into explicit goto Al Viro
2015-04-20 18:12 ` [PATCH 13/24] link_path_walk: massage a bit more Al Viro
2015-04-20 18:12 ` [PATCH 14/24] link_path_walk: get rid of duplication Al Viro
2015-04-20 18:12 ` [PATCH 15/24] link_path_walk: final preparations to killing recursion Al Viro
2015-04-20 18:13 ` [PATCH 16/24] link_path_walk: kill the recursion Al Viro
2015-04-20 21:04   ` Linus Torvalds
2015-04-20 21:32     ` Al Viro
2015-04-20 21:39       ` Linus Torvalds
2015-04-20 21:51         ` Al Viro
2015-04-20 21:41       ` Linus Torvalds
2015-04-20 21:42         ` Linus Torvalds
2015-04-20 21:59           ` Al Viro
2015-04-20 21:52         ` Al Viro
2015-04-20 18:13 ` [PATCH 17/24] link_path_walk: split "return from recursive call" path Al Viro
2015-04-20 18:13 ` [PATCH 18/24] link_path_walk: cleanup - turn goto start; into continue; Al Viro
2015-04-20 18:13 ` [PATCH 19/24] namei: fold may_follow_link() into follow_link() Al Viro
2015-04-20 18:13 ` [PATCH 20/24] namei: introduce nameidata->stack Al Viro
2015-04-20 18:13 ` [PATCH 21/24] namei: regularize use of put_link() and follow_link(), trim arguments Al Viro
2015-04-20 18:13 ` [PATCH 22/24] namei: trim the arguments of get_link() Al Viro
2015-04-20 18:13 ` [PATCH 23/24] new ->follow_link() and ->put_link() calling conventions Al Viro
2015-04-20 18:13 ` [PATCH 24/24] uninline walk_component() Al Viro
2015-04-21 14:49 ` [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
2015-04-21 15:04   ` Christoph Hellwig
2015-04-21 15:12     ` Richard Weinberger
2015-04-21 15:45       ` Al Viro
2015-04-21 16:46         ` Boaz Harrosh
2015-04-21 21:20         ` Al Viro
2015-04-22 18:07           ` Al Viro
2015-04-22 20:12             ` Al Viro
2015-04-22 21:05               ` Al Viro [this message]
2015-04-23  7:45                 ` NeilBrown
2015-04-23 18:07                   ` Al Viro
2015-04-24  6:35                     ` NeilBrown
2015-04-24 13:42                       ` Al Viro
2015-05-04  5:11                         ` Al Viro
2015-05-04  7:30                           ` NeilBrown
2015-04-23  5:01           ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150422210553.GX889@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox