From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: NeilBrown <neilb@suse.de>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
Date: Mon, 20 Apr 2015 19:12:22 +0100 [thread overview]
Message-ID: <20150420181222.GK889@ZenIV.linux.org.uk> (raw)
The patchset below gets rid of recursion in pathname resolution.
As the result, stack footprint of fs/namei.c stuff doesn't depend upon
the symlink nesting anymore and is much less than the worst cases in
mainline - as the matter of fact, it's around the footprint mainline gets
with just one or two levels of nesting.
We could reduce it further (see below), but I'm not sure it's worth
doing - it's not much extra complexity, but we only squeeze out ~250 bytes
that way, with the worst-case footprints (those are triggered by rename())
are around 1.4Kb (instead of about twice as much in mainline). OTOH,
that extra complexity would've allowed us to get rid of the nesting limit
entirely (i.e. we get ELOOP when we encounter 40 symlinks, no matter in
which manner they are nested). That might be worth considering...
Performance of the result seems to be about on par with the mainline
and so far it has survived everything I'd hit it with. More testing is
obviously needed.
Patches 2/24..6/24 are from Neil's RCU follow_link patchset; the
rest of his patchset is, of course, derailed by the massage done here,
but AFAICS we should be able to port it on top of this one with reasonably
little PITA.
There are several fs-visible interface changes:
1) nameidata is _not_ passed to ->follow_link() and ->put_link(). We use
the same kind of approach as with pt_regs.
2) instead of "store the link body (with nd_set_link()), return an opaque
pointer to be given to ->put_link() to undo whatever we'd done to pin the
link body" ->follow_link() now does the opposite - it *returns* the symlink
body and stores whatever opaque pointer ->put_link() will need. Stores
it with nd_pin_link(cookie). Rules:
* ERR_PTR() returned => fail lookup with that error.
* NULL returned => we'd done the move ourselves, no body to traverse.
* pointer to string => symlink body for traversal. ->put_link()
will be called if non-NULL pointer had been stored by nd_pin_link().
3) ->put_link() does *not* have access to symlink body anymore. The cases
that used to rely on seeing it (kfree_put_link(), mostly) are trivially
switched to new rules - just do nd_pin_link(body); return body; and we
are fine.
I've converted all in-tree instances of ->follow_link()/->put_link()
(see 23/24) and it's actually less headache than with the mainline calling
conventions.
The main reason the series is that long is that I'm trying to keep
all steps small - it *is* a serious rewrite and I want to do it in easily
verified steps.
Overall it's fairly straightforward - instead of getting a new
stack frame for each level of nesting, we add an explicit array just for
the stuff that needs to be preserved across the recursive call - there's
not much of it (how far did we get in pathname + stuff needed for ->put_link(),
i.e. cookie and struct path of symlink itself). That array is *not*
embedded into nameidata - only a pointer to it. That allows to reduce the
size of struct nameidata, which is especially sensitive in rename(), where
we have two of those. Old nd->saved_names[] is gone - not needed anymore.
Array is held in caller (or caller of caller) stack frame. We could
go for "just keep a couple of elements there, if we need more we'll allocate
from slab", but that doesn't buy us a lot unless we remove the limit on
nesting at the same time. I hadn't done that in this series; we can always
do that later, it would be a fairly isolated modification.
For RCU follow_link porting we'll probably need to replace struct
path of symlink with union of struct path and (dentry,inode) pair. Should
be doable without blowing the stack footprint.
The scariest remaining thing about fs/namei.c stack footprint is that
we get ->d_automount() called at pretty much maximal depth; it's about half of
what it used to be in mainline (1.3Kb instead of 2.7Kb), but there's a lot
of work done by that thing (and by finish_automount()). Something like
NFS referral point encountered by rename(2) while looking for parent
directories can get really nasty...
Folks, please review; the patches (on top of vfs.git#for-next) go in followups, the entire branch is in vfs.git#link_path_walk.
Shortlog:
Al Viro (19):
lustre: rip the private symlink nesting limit out
namei: expand nested_symlink() in its only caller
namei.c: separate the parts of follow_link() that find the link body
namei: fold follow_link() into link_path_walk()
link_path_walk: handle get_link() returning ERR_PTR() immediately
link_path_walk: don't bother with walk_component() after jumping link
link_path_walk: turn inner loop into explicit goto
link_path_walk: massage a bit more
link_path_walk: get rid of duplication
link_path_walk: final preparations to killing recursion
link_path_walk: kill the recursion
link_path_walk: split "return from recursive call" path
link_path_walk: cleanup - turn goto start; into continue;
namei: fold may_follow_link() into follow_link()
namei: introduce nameidata->stack
namei: regularize use of put_link() and follow_link(), trim arguments
namei: trim the arguments of get_link()
new ->follow_link() and ->put_link() calling conventions
uninline walk_component()
NeilBrown (5):
VFS: replace {, total_}link_count in task_struct with pointer to nameidata
ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link
VFS: replace nameidata arg to ->put_link with a char*.
SECURITY: remove nameidata arg from inode_follow_link.
VFS: remove nameidata args from ->follow_link
Diffstat:
Documentation/filesystems/Locking | 4 +-
Documentation/filesystems/porting | 10 +
Documentation/filesystems/vfs.txt | 4 +-
drivers/staging/lustre/lustre/llite/symlink.c | 26 +-
fs/9p/vfs_inode.c | 15 +-
fs/9p/vfs_inode_dotl.c | 9 +-
fs/autofs4/symlink.c | 5 +-
fs/befs/linuxvfs.c | 46 ++--
fs/ceph/inode.c | 6 +-
fs/cifs/cifsfs.h | 2 +-
fs/cifs/link.c | 29 +-
fs/configfs/symlink.c | 28 +-
fs/debugfs/file.c | 5 +-
fs/ecryptfs/inode.c | 11 +-
fs/exofs/symlink.c | 7 +-
fs/ext2/symlink.c | 6 +-
fs/ext3/symlink.c | 6 +-
fs/ext4/symlink.c | 6 +-
fs/freevxfs/vxfs_immed.c | 11 +-
fs/fuse/dir.c | 19 +-
fs/gfs2/inode.c | 11 +-
fs/hostfs/hostfs_kern.c | 16 +-
fs/hppfs/hppfs.c | 9 +-
fs/jffs2/symlink.c | 9 +-
fs/jfs/symlink.c | 6 +-
fs/kernfs/symlink.c | 23 +-
fs/libfs.c | 7 +-
fs/namei.c | 378 +++++++++++++++-----------
fs/nfs/symlink.c | 18 +-
fs/ntfs/namei.c | 1 -
fs/overlayfs/inode.c | 36 +--
fs/proc/base.c | 4 +-
fs/proc/inode.c | 8 +-
fs/proc/namespaces.c | 4 +-
fs/proc/self.c | 24 +-
fs/proc/thread_self.c | 22 +-
fs/sysv/symlink.c | 5 +-
fs/ubifs/file.c | 7 +-
fs/ufs/symlink.c | 6 +-
fs/xfs/xfs_iops.c | 12 +-
include/linux/fs.h | 11 +-
include/linux/namei.h | 7 +-
include/linux/sched.h | 3 +-
include/linux/security.h | 9 +-
mm/shmem.c | 28 +-
security/capability.c | 3 +-
security/security.c | 4 +-
security/selinux/hooks.c | 2 +-
48 files changed, 457 insertions(+), 471 deletions(-)
next reply other threads:[~2015-04-20 18:12 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-20 18:12 Al Viro [this message]
2015-04-20 18:12 ` [PATCH 01/24] lustre: rip the private symlink nesting limit out Al Viro
2015-04-20 19:08 ` Andreas Dilger
2015-04-20 19:22 ` Al Viro
2015-04-20 20:35 ` Al Viro
2015-04-20 18:12 ` [PATCH 02/24] VFS: replace {, total_}link_count in task_struct with pointer to nameidata Al Viro
2015-04-20 18:12 ` [PATCH 03/24] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link Al Viro
2015-04-20 18:12 ` [PATCH 04/24] VFS: replace nameidata arg to ->put_link with a char* Al Viro
2015-04-20 18:12 ` [PATCH 05/24] SECURITY: remove nameidata arg from inode_follow_link Al Viro
2015-04-20 18:12 ` [PATCH 06/24] VFS: remove nameidata args from ->follow_link Al Viro
2015-04-20 18:12 ` [PATCH 07/24] namei: expand nested_symlink() in its only caller Al Viro
2015-04-20 18:12 ` [PATCH 08/24] namei.c: separate the parts of follow_link() that find the link body Al Viro
2015-04-20 18:12 ` [PATCH 09/24] namei: fold follow_link() into link_path_walk() Al Viro
2015-04-20 18:12 ` [PATCH 10/24] link_path_walk: handle get_link() returning ERR_PTR() immediately Al Viro
2015-04-20 18:12 ` [PATCH 11/24] link_path_walk: don't bother with walk_component() after jumping link Al Viro
2015-04-20 18:12 ` [PATCH 12/24] link_path_walk: turn inner loop into explicit goto Al Viro
2015-04-20 18:12 ` [PATCH 13/24] link_path_walk: massage a bit more Al Viro
2015-04-20 18:12 ` [PATCH 14/24] link_path_walk: get rid of duplication Al Viro
2015-04-20 18:12 ` [PATCH 15/24] link_path_walk: final preparations to killing recursion Al Viro
2015-04-20 18:13 ` [PATCH 16/24] link_path_walk: kill the recursion Al Viro
2015-04-20 21:04 ` Linus Torvalds
2015-04-20 21:32 ` Al Viro
2015-04-20 21:39 ` Linus Torvalds
2015-04-20 21:51 ` Al Viro
2015-04-20 21:41 ` Linus Torvalds
2015-04-20 21:42 ` Linus Torvalds
2015-04-20 21:59 ` Al Viro
2015-04-20 21:52 ` Al Viro
2015-04-20 18:13 ` [PATCH 17/24] link_path_walk: split "return from recursive call" path Al Viro
2015-04-20 18:13 ` [PATCH 18/24] link_path_walk: cleanup - turn goto start; into continue; Al Viro
2015-04-20 18:13 ` [PATCH 19/24] namei: fold may_follow_link() into follow_link() Al Viro
2015-04-20 18:13 ` [PATCH 20/24] namei: introduce nameidata->stack Al Viro
2015-04-20 18:13 ` [PATCH 21/24] namei: regularize use of put_link() and follow_link(), trim arguments Al Viro
2015-04-20 18:13 ` [PATCH 22/24] namei: trim the arguments of get_link() Al Viro
2015-04-20 18:13 ` [PATCH 23/24] new ->follow_link() and ->put_link() calling conventions Al Viro
2015-04-20 18:13 ` [PATCH 24/24] uninline walk_component() Al Viro
2015-04-21 14:49 ` [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
2015-04-21 15:04 ` Christoph Hellwig
2015-04-21 15:12 ` Richard Weinberger
2015-04-21 15:45 ` Al Viro
2015-04-21 16:46 ` Boaz Harrosh
2015-04-21 21:20 ` Al Viro
2015-04-22 18:07 ` Al Viro
2015-04-22 20:12 ` Al Viro
2015-04-22 21:05 ` Al Viro
2015-04-23 7:45 ` NeilBrown
2015-04-23 18:07 ` Al Viro
2015-04-24 6:35 ` NeilBrown
2015-04-24 13:42 ` Al Viro
2015-05-04 5:11 ` Al Viro
2015-05-04 7:30 ` NeilBrown
2015-04-23 5:01 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150420181222.GK889@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=neilb@suse.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox