All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: NeilBrown <neilb@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
Date: Wed, 22 Apr 2015 22:05:53 +0100	[thread overview]
Message-ID: <20150422210553.GX889@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20150422201238.GW889@ZenIV.linux.org.uk>

On Wed, Apr 22, 2015 at 09:12:38PM +0100, Al Viro wrote:
> On Wed, Apr 22, 2015 at 07:07:59PM +0100, Al Viro wrote:
> > And one more: may_follow_link() is now potentially oopsable.  Look: suppose
> > we've reached the link in RCU mode, just as it got unlinked.  link->dentry
> > has become negative and may_follow_link() steps into
> >         /* Allowed if owner and follower match. */
> >         inode = link->dentry->d_inode;
> >         if (uid_eq(current_cred()->fsuid, inode->i_uid))
> >                 return 0;
> > Oops...  Incidentally, I suspect that your __read_seqcount_retry() in
> > follow_link() might be lacking a barrier; why isn't full read_seqcount_retry()
> > needed?
> > 
> > FWIW, I would rather fetch ->d_inode *and* checked ->seq proir to calling
> > get_link(), and passed inode to it as an explicit argument.  And passed it
> > to may_follow_link() as well...
> 
> Hrm...  You know, something really weird is going on here.  Where are
> you setting nd->seq?  I don't see anything in follow_link() doing that.
> And nd->seq _definitely_ needs setting if you want to stay in RCU mode -
> at that point it matches the dentry of symlink, not that of nd->path
> (== parent directory).  Neil, could you tell me which kernel you'd been
> testing (ideally - commit ID is a public git tree), what config and what
> tests had those been?

FWIW, there's a wart that had been annoying me for quite a while, and it
might be related to dealing with that properly.  Namely, walk_component()
calling conventions.  We have walk_component(nd, &path, follow), which can
	* return -E..., and leave us with pathwalk terminated; path contains
junk, and so does nd->path.
	* return 0, update nd->path, nd->inode and nd->seq.  The contents
of path is in undefined state - it might be unchanged, it might be equal to
nd->path (and not pinned down, RCU mode or not).  In any case, callers do
not touch it afterwards.  That's the normal case.
	* return 1, update nd->seq, leave nd->path and nd->inode unchanged and
set path pointing to our symlink.  nd->seq matches path, not nd->path.

In all cases the original contents of path is ignored - it's purely 'out'
parameter, but compiler can't establish that on its own; it _might_ be
left untouched.  In all cases when its contents survives we don't look at
it afterwards, but proving that requires a non-trivial analysis.

And in case when we return 1 (== symlink to be followed), we bugger nd->seq.
It's left as we need it for unlazy_walk() (and after unlazy_walk() we don't
care about it at all), so currently everything works, but if we want to
stay in RCU mode for symlink traversal, we _will_ need ->d_seq of parent
directory.

I wonder if the right way to solve that would be to drop the path argument
entirely and store the bugger in nameidata.  As in
	union {
		struct qstr last;
		struct path link;
	};
	...
	union {
		int last_type;
		unsigned link_seq;
	};
in struct nameidata.  We never need both at the same time; after
walk_component() (or its analogue in do_last()) we don't need the component
name anymore.  That way walk_component() would not trash nd->seq when
finding a symlink...

It would also shrink the stack footprint a bit - local struct path next
in link_path_walk() would be gone, along with the same thing in path_lookupat()
and friends.  Not a lot of win (4 pointers total), but it might be enough
to excuse putting ->d_seq of root in there, along with ->link.dentry->d_inode,
to avoid rechecking its ->d_seq.  As the matter of fact, we do have an
odd number of 32bit fields in there, so ->d_seq of root would fit nicely...

Comments?

  reply	other threads:[~2015-04-22 21:05 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
2015-04-20 18:12 ` [PATCH 01/24] lustre: rip the private symlink nesting limit out Al Viro
2015-04-20 19:08   ` Andreas Dilger
2015-04-20 19:22     ` Al Viro
2015-04-20 20:35       ` Al Viro
2015-04-20 18:12 ` [PATCH 02/24] VFS: replace {, total_}link_count in task_struct with pointer to nameidata Al Viro
2015-04-20 18:12 ` [PATCH 03/24] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link Al Viro
2015-04-20 18:12 ` [PATCH 04/24] VFS: replace nameidata arg to ->put_link with a char* Al Viro
2015-04-20 18:12 ` [PATCH 05/24] SECURITY: remove nameidata arg from inode_follow_link Al Viro
2015-04-20 18:12 ` [PATCH 06/24] VFS: remove nameidata args from ->follow_link Al Viro
2015-04-20 18:12 ` [PATCH 07/24] namei: expand nested_symlink() in its only caller Al Viro
2015-04-20 18:12 ` [PATCH 08/24] namei.c: separate the parts of follow_link() that find the link body Al Viro
2015-04-20 18:12 ` [PATCH 09/24] namei: fold follow_link() into link_path_walk() Al Viro
2015-04-20 18:12 ` [PATCH 10/24] link_path_walk: handle get_link() returning ERR_PTR() immediately Al Viro
2015-04-20 18:12 ` [PATCH 11/24] link_path_walk: don't bother with walk_component() after jumping link Al Viro
2015-04-20 18:12 ` [PATCH 12/24] link_path_walk: turn inner loop into explicit goto Al Viro
2015-04-20 18:12 ` [PATCH 13/24] link_path_walk: massage a bit more Al Viro
2015-04-20 18:12 ` [PATCH 14/24] link_path_walk: get rid of duplication Al Viro
2015-04-20 18:12 ` [PATCH 15/24] link_path_walk: final preparations to killing recursion Al Viro
2015-04-20 18:13 ` [PATCH 16/24] link_path_walk: kill the recursion Al Viro
2015-04-20 21:04   ` Linus Torvalds
2015-04-20 21:32     ` Al Viro
2015-04-20 21:39       ` Linus Torvalds
2015-04-20 21:51         ` Al Viro
2015-04-20 21:41       ` Linus Torvalds
2015-04-20 21:42         ` Linus Torvalds
2015-04-20 21:59           ` Al Viro
2015-04-20 21:52         ` Al Viro
2015-04-20 18:13 ` [PATCH 17/24] link_path_walk: split "return from recursive call" path Al Viro
2015-04-20 18:13 ` [PATCH 18/24] link_path_walk: cleanup - turn goto start; into continue; Al Viro
2015-04-20 18:13 ` [PATCH 19/24] namei: fold may_follow_link() into follow_link() Al Viro
2015-04-20 18:13 ` [PATCH 20/24] namei: introduce nameidata->stack Al Viro
2015-04-20 18:13 ` [PATCH 21/24] namei: regularize use of put_link() and follow_link(), trim arguments Al Viro
2015-04-20 18:13 ` [PATCH 22/24] namei: trim the arguments of get_link() Al Viro
2015-04-20 18:13 ` [PATCH 23/24] new ->follow_link() and ->put_link() calling conventions Al Viro
2015-04-20 18:13 ` [PATCH 24/24] uninline walk_component() Al Viro
2015-04-21 14:49 ` [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
2015-04-21 15:04   ` Christoph Hellwig
2015-04-21 15:12     ` Richard Weinberger
2015-04-21 15:45       ` Al Viro
2015-04-21 16:46         ` Boaz Harrosh
2015-04-21 21:20         ` Al Viro
2015-04-22 18:07           ` Al Viro
2015-04-22 20:12             ` Al Viro
2015-04-22 21:05               ` Al Viro [this message]
2015-04-23  7:45                 ` NeilBrown
2015-04-23 18:07                   ` Al Viro
2015-04-24  6:35                     ` NeilBrown
2015-04-24 13:42                       ` Al Viro
2015-05-04  5:11                         ` Al Viro
2015-05-04  7:30                           ` NeilBrown
2015-04-23  5:01           ` Al Viro
2015-04-21 14:51 ` [PATCH] logfs: fix a pagecache leak for symlinks Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150422210553.GX889@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.