From: Kinglong Mee <kinglongmee@gmail.com>
To: "J. Bruce Fields" <bfields@fieldses.org>,
Al Viro <viro@zeniv.linux.org.uk>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
linux-fsdevel@vger.kernel.org, NeilBrown <neilb@suse.de>,
Trond Myklebust <trond.myklebust@primarydata.com>,
kinglongmee@gmail.com
Subject: [PATCH 0/9 v8] NFSD: Pin to vfsmount for nfsd exports cache
Date: Mon, 27 Jul 2015 11:05:54 +0800
Message-ID: <55B5A012.1030006@gmail.com>
If there are mount points (not exported for NFS) under the pseudo root,
then after a client has accessed those entries under the root, they *can't*
be unmounted until the export cache expires.
# cat /etc/exports
/nfs/xfs *(rw,insecure,no_subtree_check,no_root_squash)
/nfs/pnfs *(rw,insecure,no_subtree_check,no_root_squash)
# ll /nfs/
total 0
drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
drwxr-xr-x. 2 root root 6 Apr 20 22:01 xfs
# mount /dev/sde /nfs/test
# df
Filesystem 1K-blocks Used Available Use% Mounted on
......
/dev/sdd 1038336 32944 1005392 4% /nfs/pnfs
/dev/sdc 10475520 32928 10442592 1% /nfs/xfs
/dev/sde 999320 1284 929224 1% /nfs/test
# mount -t nfs 127.0.0.1:/nfs/ /mnt
# ll /mnt/*/
/mnt/pnfs/:
total 0
-rw-r--r--. 1 root root 0 Apr 21 22:23 attr
drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp
/mnt/xfs/:
total 0
# umount /nfs/test/
umount: /nfs/test/: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)
This happens because nfsd's exports cache holds a reference to
the path (here /nfs/test/), so it can't be unmounted.
I don't think that's what users expect; they want to umount /nfs/test/.
Bruce thinks users should also be able to umount /nfs/pnfs/ and /nfs/xfs.
This patch set makes nfsd exports pin to the vfsmount
instead of using mntget, so users can now umount any export's mount point.
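To illustrate the difference between holding a counted reference and pinning,
here is a minimal userspace sketch. It is a toy model, not kernel code: every
toy_* name is hypothetical, and the real fs_pin machinery involves locking and
waiting that is left out here. It only shows the idea that a counted reference
makes umount fail as busy, while a pin lets umount proceed and notifies the
pin's owner through a kill callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; all toy_* names are hypothetical. */
struct toy_pin {
	struct toy_pin *next;
	void (*kill)(struct toy_pin *pin);	/* invoked when the mount goes away */
	bool killed;
};

struct toy_mount {
	int refs;		/* counted references, as mntget() would take */
	struct toy_pin *pins;	/* pins to kill at umount time */
	bool mounted;
};

static void toy_mount_init(struct toy_mount *m)
{
	m->refs = 0;
	m->pins = NULL;
	m->mounted = true;
}

static void toy_mntget(struct toy_mount *m)
{
	m->refs++;
}

static void toy_mntput(struct toy_mount *m)
{
	assert(m->refs > 0);
	m->refs--;
}

/* Attach a pin: it does not count as a reference. */
static void toy_pin_insert(struct toy_mount *m, struct toy_pin *p,
			   void (*kill)(struct toy_pin *))
{
	p->kill = kill;
	p->killed = false;
	p->next = m->pins;
	m->pins = p;
}

/* Returns 0 on success, -1 ("target is busy") if counted references remain. */
static int toy_umount(struct toy_mount *m)
{
	if (m->refs > 0)
		return -1;
	while (m->pins) {
		struct toy_pin *p = m->pins;

		m->pins = p->next;
		p->kill(p);	/* owner drops its cache entry here */
	}
	m->mounted = false;
	return 0;
}

/* Stand-in for a cache owner's kill callback. */
static void toy_cache_kill(struct toy_pin *p)
{
	p->killed = true;
}
```

In this model, a cache that called toy_mntget() blocks toy_umount(), while a
cache that only called toy_pin_insert() is told to clean up and the umount
succeeds.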
v3,
1. New helpers path_get_pin/path_put_unpin for path pin.
2. Use kzalloc for allocating memory.
v4, Thanks to Al Viro for his comments on the fs_pin logic.
1. add a completion so pin_kill can wait for the reference to drop to zero.
2. add a work_struct so pin_kill drops the reference indirectly.
3. free svc_export/svc_expkey in pin_kill, not in svc_export_put/svc_expkey_put.
4. svc_export_put/svc_expkey_put go through the pin_kill logic.
v5,
kill the fs_pin while holding a reference to the vfsmnt.
v6,
1. revert the v5 change
2. new helper legitimize_mntget() so the nfsd exports/expkey cache
can get a vfsmount from an fs_pin
3. clean up some of sunrpc's cache code
4. switch cache_head in cache_detail from a singly linked list
to list_head
5. new validate/invalidate functions to handle reference
increase/decrease changes (nfsd exports/expkey use them to grab the
reference of mnt)
6. delete cache_head directly from cache_detail in pin_kill
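Items 4 and 6 go together: deleting an entry directly is cheap once each
node records where it is linked from. The following userspace sketch shows
this with a next/pprev node layout, which is how the kernel's hlist works.
The toy_h* names are hypothetical and this is not the sunrpc code itself;
it is only meant to show why no walk of the chain is needed on delete.

```c
#include <stddef.h>

/* Toy hlist-style node: pprev points at whatever pointer points at us. */
struct toy_hnode {
	struct toy_hnode *next;
	struct toy_hnode **pprev;
};

struct toy_hhead {
	struct toy_hnode *first;
};

/* Insert n at the front of the chain. */
static void toy_hadd_head(struct toy_hhead *h, struct toy_hnode *n)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink n without knowing the head and without walking the chain. */
static void toy_hdel(struct toy_hnode *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Count entries, for checking the result. */
static int toy_hcount(const struct toy_hhead *h)
{
	int count = 0;
	const struct toy_hnode *n;

	for (n = h->first; n; n = n->next)
		count++;
	return count;
}
```

With a singly linked chain, removing an arbitrary entry means finding its
predecessor first; here toy_hdel() needs only the node.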
v7,
implement self-managed reference increase and decrease for nfsd exports/expkey.
When the reference count of a cache_head rises above 1, grab a reference to mnt once,
and when it drops back to 1, drop the reference to mnt.
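The v7 rule can be sketched in userspace as follows. This is a toy model
under the assumption that the cache itself owns the base reference (count 1)
and that only additional users should hold the mount; the toy_entry names are
hypothetical and no locking or atomics are shown.

```c
#include <assert.h>

/* Toy cache entry: ref == 1 means only the cache itself holds it. */
struct toy_entry {
	int ref;
	int *mnt_refs;	/* counted references on the mount, shared */
};

/* Going from 1 to 2 takes the single mount reference. */
static void toy_entry_get(struct toy_entry *e)
{
	if (e->ref++ == 1)
		(*e->mnt_refs)++;
}

/* Dropping back to 1 releases the mount reference again. */
static void toy_entry_put(struct toy_entry *e)
{
	assert(e->ref > 1);
	if (--e->ref == 1)
		(*e->mnt_refs)--;
}
```

However many users take the entry, the mount is referenced at most once, and
it is free to be unmounted again as soon as the last user is gone.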
v8, Use a hash list for the sunrpc cache and a new method for nfsd's pin.
1. There is only one outlet from each cache: exp_find_key() for expkey,
exp_get_by_name() for export.
2. Any fsid-to-export or filehandle-to-export lookup calls these functions.
3. exp_get()/exp_put() increase/decrease the reference of the export.
Call legitimize_mntget() in the only outlet functions, exp_find_key() and
exp_get_by_name(); if it fails, return STALE. Otherwise, any valid
expkey/export from the cache is validated (it holds a reference to the vfsmnt).
Add mntget() in exp_get() and mntput() in exp_put(), because the exports
passed to exp_get/exp_put are returned from exp_find_key/exp_get_by_name.
For the expkey cache,
1. First, an fsid is passed to exp_find_key, which looks up a cache entry
in svc_expkey_lookup; on success, ekey->ek_path is pinned to the mount.
2. Then legitimize_mntget is called to get a reference to the vfsmnt
before returning from exp_find_key.
3. Any caller of exp_find_key that gets a valid cache entry must put the vfsmnt.
For the export cache,
1. First, a path (returned from exp_find_key) with a validated vfsmnt
is passed to exp_get_by_name; on success, exp->ex_path is pinned to the mount.
2. Then legitimize_mntget is called to get a reference to the vfsmnt
before returning from exp_get_by_name.
3. Any caller of exp_get_by_name that gets a valid cache entry must put the vfsmnt
with exp_put().
4. Any user of the exp returned from exp_get_by_name must call exp_get(),
which increases the reference of the vfsmnt.
So,
a. After the reference is taken in step 2, any umount of the filesystem gets -EBUSY.
b. After all references are put following step 4, or before the reference is taken in step 2,
any umount of the filesystem calls pin_kill, which deletes the cache entry directly
and unpins the vfsmount.
c. Between steps 1 and 2, the exp/key cache entry is referenced but its vfsmnt is not yet validated.
As you said, umount of the filesystem only waits for exp_find_key/exp_get_by_name
to put the cache reference when legitimize_mntget fails.
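Cases a and b can be walked through with a small userspace model. Everything
here is an assumption made for illustration: the sim_* names are hypothetical,
sim_legitimize_mntget() is only a stand-in for the helper added in patch 4
(its real semantics are defined there, not here), and case c, the window
between lookup and legitimize, needs concurrency and is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

enum { SIM_OK = 0, SIM_STALE = -1, SIM_BUSY = -2 };

struct sim_mnt {
	int refs;		/* counted references on the mount */
	bool umounting;		/* set once umount has started */
};

struct sim_export {
	struct sim_mnt *mnt;
	bool pinned;		/* cache entry still pinned to the mount */
};

/* Stand-in: turn the pin into a counted reference unless umount has begun. */
static bool sim_legitimize_mntget(struct sim_mnt *m)
{
	if (m->umounting)
		return false;
	m->refs++;
	return true;
}

/* The single outlet: a valid result always holds a mount reference. */
static int sim_exp_lookup(struct sim_export *e)
{
	if (!e->pinned || !sim_legitimize_mntget(e->mnt))
		return SIM_STALE;
	return SIM_OK;
}

/* Callers of a successful lookup must drop the mount reference. */
static void sim_exp_put(struct sim_export *e)
{
	assert(e->mnt->refs > 0);
	e->mnt->refs--;
}

/* Busy while references exist; otherwise kill the pin and drop the entry. */
static int sim_umount(struct sim_mnt *m, struct sim_export *e)
{
	if (m->refs > 0)
		return SIM_BUSY;	/* case a */
	m->umounting = true;
	e->pinned = false;		/* case b: entry deleted, mount unpinned */
	return SIM_OK;
}
```

In this model a lookup followed by umount reports busy, and a lookup after a
successful umount reports stale, which matches the intent of cases a and b.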
Kinglong Mee (9):
fs_pin: Initialize value for fs_pin explicitly
fs_pin: Export functions for specific filesystem
path: New helpers path_get_pin/path_put_unpin for path pin
fs: New helper legitimize_mntget() for getting a legitimize mnt
sunrpc: Store cache_detail in seq_file's private, directly
sunrpc/nfsd: Remove redundant code by exports seq_operations functions
sunrpc: Switch to using hash list instead single list
sunrpc: New helper cache_delete_entry for deleting cache_head directly
nfsd: Allows user un-mounting filesystem where nfsd exports base on
fs/fs_pin.c | 4 +
fs/namei.c | 26 ++++++
fs/namespace.c | 19 ++++
fs/nfsd/export.c | 209 ++++++++++++++++++++++++-------------------
fs/nfsd/export.h | 22 ++++-
include/linux/fs_pin.h | 6 ++
include/linux/mount.h | 1 +
include/linux/path.h | 4 +
include/linux/sunrpc/cache.h | 10 ++-
net/sunrpc/cache.c | 133 ++++++++++++++++-----------
10 files changed, 286 insertions(+), 148 deletions(-)
--
2.4.3