From: NeilBrown <neilb@suse.de>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/9] Support follow_link in RCU-walk.
Date: Mon, 9 Mar 2015 13:21:44 +1100
Message-ID: <20150309132144.7babc0a6@notabene.brown>
In-Reply-To: <20150305060520.GY29656@ZenIV.linux.org.uk>

On Thu, 5 Mar 2015 06:05:20 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote:
> > Hi Al (and others),
> > 
> >  I wonder if you could look over this patchset.
> >  It allows RCU-walk to follow symlinks in many common cases,
> >  thus removing a surprising performance hit caused by using symlinks.
> > 
> >  The last couple of patches make changes to XFS and NFS to support
> >  this but I haven't forwarded to the relevant lists yet.
> >  If/when the early code meets with approval I'll do that.
> > 
> >  The first patch almost certainly needs to be changed.  I originally
> >  wrote this code when filesystems could see inside nameidata.
> >  It is now opaque so the simplest solution was to provide an
> >  accessor function.
> >  Maybe I should add a 'flags' arg to ->follow_link?? Or have
> >  ->follow_link and ->follow_link_rcu ??
> >  What do you suggest?
> 
> Umm...  Some observations:
> 	* now ->follow_link() can be called in RCU mode, which means
> that it can race with fs shutdown; not a problem, except that now it
> joins ->lookup() et al. in "if some data structure is needed in RCU
> case of that, make sure it's not destroyed without an RCU delay somewhere
> between the entry into ->kill_sb() and destruction".

So inodes and dentries and associated private data should already be safe.
And s_fs_info can be used if it is freed by e.g. kfree_rcu (like autofs)
but not if just kfree (like ext3).

xfs_fs_put_super() directly frees the 'xfs_mount', which xfs_readlink
accesses.  I guess that needs to be fixed.
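
To make the kfree() versus kfree_rcu() distinction concrete, here is a toy,
single-threaded userspace model. It is only a sketch: every toy_* name and
struct sb_info are hypothetical stand-ins, and real RCU tracks readers per
CPU rather than with a counter. The point it shows is that an object freed
through the deferred path stays readable until the last reader has left.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-superblock private data (s_fs_info). */
struct sb_info {
	int symlink_max;            /* a field some ->follow_link might read */
	struct sb_info *next_free;  /* link in the deferred-free list */
};

static int readers;              /* walkers currently in a read-side section */
static struct sb_info *deferred; /* objects waiting for a grace period */

static void toy_rcu_read_lock(void)
{
	readers++;
}

static void toy_rcu_read_unlock(void)
{
	assert(readers > 0);
	readers--;
}

/* Analogue of kfree_rcu(): queue the object instead of freeing it now. */
static void toy_free_deferred(struct sb_info *p)
{
	p->next_free = deferred;
	deferred = p;
}

/*
 * Analogue of a grace period ending: queued objects are freed only
 * once no reader remains. Returns the number of objects freed.
 */
static int toy_grace_period(void)
{
	int n = 0;

	if (readers > 0)
		return 0;
	while (deferred) {
		struct sb_info *p = deferred;

		deferred = p->next_free;
		free(p);
		n++;
	}
	return n;
}
```

A plain free() in place of toy_free_deferred() would leave the reader holding
a dangling pointer, which is the situation described for xfs_fs_put_super()
above.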


> 	* highmem pages in symlinks: that BS shouldn't be allowed at
> all.  Just make sure that at least for those filesystems symlink inodes
> get mapping_set_gfp_mask(&inode->i_data, GFP_KERNEL) and be done with that.

page_getlink() already uses kmap(), implying that highmem pages are
supported.   All I'm doing is making sure that my page_getlink_rcu()
doesn't fail horribly if the page is a highmem page.

If a filesystem needs improved follow_link performance on a highmem machine,
then setting the gfp_mask as you suggest is probably a good idea, but I don't
really want to impose that on filesystems if I don't need to.  And at present
I don't.
So I'd rather leave it to the filesystem maintainer, or someone who discovers
a need.


> 	* are you sure that security_inode_follow_link() is OK to call in
> RCU mode?

No.
avc_has_perm() doesn't look RCU safe, even without auditing enabled.
At the very least we'll need to pass a "lookup_rcu" flag in there.


> 	* what warranties are you giving for the lifetime of strings
> passed to nd_set_link()?  Right now it's "should not be freed until the
> matching ->put_link()"; what happens for RCU mode?

The same....

For XFS, we kmalloc a buffer GFP_ATOMIC and copy into that.  Then
put_link() kfrees it.
For filesystems with the symlink in the page cache, we get a reference to
the page (which is a bit heavy-handed for RCU-walk, but much less so than the
current code) and drop the reference in ->put_link.
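
The two lifetime schemes just described can be sketched in userspace C as
follows. This is an illustration under stated assumptions, not the patch
code: struct toy_page and all toy_* functions are hypothetical, malloc()
stands in for kmalloc(..., GFP_ATOMIC), and the "cookie" is simply whatever
the matching put_link needs to release.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a page-cache page holding symlink text. */
struct toy_page {
	int refcount;
	char data[64];
};

/* Page-cache style: pin the page and hand out a pointer into it. */
static const char *toy_page_follow_link(struct toy_page *pg, void **cookie)
{
	pg->refcount++;
	*cookie = pg;
	return pg->data;
}

/* Drop the reference taken by toy_page_follow_link(). */
static void toy_page_put_link(void *cookie)
{
	struct toy_page *pg = cookie;

	assert(pg->refcount > 0);
	pg->refcount--;
}

/* Copy style: duplicate the text into a private buffer. */
static char *toy_copy_follow_link(const char *src, void **cookie)
{
	size_t len = strlen(src) + 1;
	char *buf = malloc(len); /* stand-in for a non-sleeping allocation */

	*cookie = buf;
	if (!buf)
		return NULL; /* caller would have to leave RCU-walk and retry */
	memcpy(buf, src, len);
	return buf;
}

/* Free the buffer allocated by toy_copy_follow_link(). */
static void toy_copy_put_link(void *cookie)
{
	free(cookie);
}
```

In both cases the string stays valid from follow_link until the matching
put_link, which is the same rule that applies outside RCU mode.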

For filesystems with a short symlink in the inode, we just provide a pointer
to that... How long can we expect that to be around?
I cannot see any provision for keeping those inodes in memory while we
follow the symlink... What am I missing?

In any case, if there is a reference held on the inode for ref-walk, then
presumably complete_walk() will take a reference on that same inode when
dropping out of rcu-walk.... I hope.


So I think the rules here are unchanged.


> 	* really nasty one: creat(2) on a dangling symlink.  What's to
> preserve the last component if you get into that symlink in RCU mode?

As above - we will have a counted reference to whatever holds the text of the
symlink.



> 
> TBH, I'm less than fond of passing nameidata to ->follow_link() at all,
> flags or no flags.  We could kill current->link_count and
> current->total_link_count, replacing them with one void * current->nameidata
> and taking counters into struct nameidata itself.  Have places like e.g.
> kern_path_locked() do
> 	struct nameidata nd, *saved = set_nameidata(&nd);
> 	...
> 	set_nameidata(saved);
> with set_nameidata(p) doing this:
> 	old = current->nameidata;
> 	current->nameidata = p;
> 	if (p) {
> 		if (!old) {
> 			p->link_count = 0;
> 			p->total_link_count = 0;
> 		} else {
> 			p->link_count = old->link_count;
> 			p->total_link_count = old->total_link_count;
> 		}
> 	}
> 	return old;
> 
> Then nd_set_link() et al. would use current->nameidata instead of an
> explicitly passed pointer and ->follow_link() instances wouldn't need
> that opaque pointer passed to them at all.

Sounds interesting - I might try it.

Would ->follow_link() then get a 'flags' argument, or would nd_is_rcu()
reference current->nameidata->flags?
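
As a rough picture of the second option, the userspace sketch below has
nd_is_rcu() take no argument and read the flags through current->nameidata.
The set_nameidata() body follows the pseudocode quoted above; everything
else is an assumption made for illustration: struct task, the current macro,
the body of nd_is_rcu(), and the LOOKUP_RCU value.

```c
#include <assert.h>
#include <stddef.h>

#define LOOKUP_RCU 0x0040 /* illustrative value, not taken from this thread */

/* Cut-down stand-in; the real struct nameidata has many more fields. */
struct nameidata {
	unsigned int flags;
	int link_count;
	int total_link_count;
};

/* Hypothetical stand-in for the task structure behind "current". */
struct task {
	struct nameidata *nameidata;
};

static struct task current_task;
#define current (&current_task)

/* Install p as the active nameidata and return the previous one. */
static struct nameidata *set_nameidata(struct nameidata *p)
{
	struct nameidata *old = current->nameidata;

	current->nameidata = p;
	if (p) {
		if (!old) {
			p->link_count = 0;
			p->total_link_count = 0;
		} else {
			p->link_count = old->link_count;
			p->total_link_count = old->total_link_count;
		}
	}
	return old;
}

/* Argument-free query: is the active walk in RCU mode? */
static int nd_is_rcu(void)
{
	return current->nameidata &&
	       (current->nameidata->flags & LOOKUP_RCU);
}

/* Usage: a nested lookup inherits the counters and restores the outer state. */
static void nested_walk_demo(void)
{
	struct nameidata outer = { .flags = LOOKUP_RCU };
	struct nameidata *saved_outer = set_nameidata(&outer);

	assert(nd_is_rcu());
	outer.total_link_count = 3;

	struct nameidata inner = { .flags = 0 };
	struct nameidata *saved_inner = set_nameidata(&inner);

	assert(inner.total_link_count == 3); /* inherited from outer */
	assert(!nd_is_rcu());                /* inner walk is ref-walk */
	inner.total_link_count++;

	set_nameidata(saved_inner);          /* counters are copied back */
	assert(outer.total_link_count == 4);
	assert(nd_is_rcu());

	set_nameidata(saved_outer);
}
```

With this shape a ->follow_link() instance needs neither the opaque pointer
nor a 'flags' argument, at the cost of every query going through current.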

Thanks,
NeilBrown


