From: Trond Myklebust <trondmy@gmail.com>
To: "J. Bruce Fields" <bfields@redhat.com>,
Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@redhat.com>,
linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 14/16] nfsd: close cached files prior to a REMOVE or RENAME that would replace target
Date: Sun, 30 Jun 2019 09:52:38 -0400
Message-ID: <20190630135240.7490-15-trond.myklebust@hammerspace.com>
In-Reply-To: <20190630135240.7490-14-trond.myklebust@hammerspace.com>
From: Jeff Layton <jeff.layton@primarydata.com>

It's not uncommon for a workload to do a burst of I/O to a file and then
delete it immediately afterward. If knfsd still holds a cached open file,
however, the file may still be open when the dentry is unlinked. If the
underlying filesystem is NFS, that can trigger a sillyrename.

On a REMOVE or RENAME, scan the nfsd_file cache for open files that
correspond to the inode, and proactively unhash them and put their
references. This should prevent any delete-on-last-close activity that
would otherwise occur solely because of knfsd's open file cache.

This must be done synchronously, so we use the variants that call
flush_delayed_fput. Calling flush_delayed_fput while holding locks can
deadlock, however, and in nfsd_rename we don't even look up the
dentries to be renamed until we've taken the rename locks.

Once we've figured out what the target dentry is for a rename, check
whether there are cached open files associated with it. If there are,
unwind all of the locking, close them all, and then reattempt the
rename.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
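Not part of the patch: a minimal userspace sketch of the "unwind the
locking, close, then reattempt" ordering described in the changelog. It
assumes a plain flag in place of the rename locks and a counter in place
of the nfsd_file cache. Every model_* name is hypothetical and exists
only for illustration, and the loop is a stand-in for the goto-based
retry that nfsd_rename uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode with nfsd_file cache entries. */
struct model_inode {
	int cached_opens;
};

static bool model_locked;	/* models "rename locks are held" */
static int model_renames;	/* how many renames actually ran */

static void model_lock(void)
{
	assert(!model_locked);
	model_locked = true;
}

static void model_unlock(void)
{
	assert(model_locked);
	model_locked = false;
}

/* Models the synchronous close; it must never run with the lock held. */
static void model_close_cached(struct model_inode *inode)
{
	assert(!model_locked);
	inode->cached_opens = 0;
}

/*
 * Returns the number of passes taken. The target is only inspected once
 * the lock is held, so a cached open is discovered too late to close it
 * in place: drop the lock, close, and start over.
 */
static int model_rename(struct model_inode *target)
{
	bool has_cached;
	int passes = 0;

	do {
		passes++;
		model_lock();
		has_cached = target && target->cached_opens > 0;
		if (!has_cached)
			model_renames++;	/* the rename would run here */
		model_unlock();

		if (has_cached)
			model_close_cached(target);
	} while (has_cached);

	return passes;
}
```

A target with cached opens takes two passes in this model, and a target
without any (or no existing target at all) takes one.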
fs/nfsd/vfs.c | 62 +++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 53 insertions(+), 9 deletions(-)
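Background for the first paragraph of the changelog, again not part of
the patch: a small POSIX sketch showing that a file's data stays
reachable through an open descriptor after its name has been unlinked.
That is the situation an NFS client has to emulate with a sillyrename.
The function name and the /tmp path template are assumptions made for
this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a temporary file, unlink its name while the descriptor is still
 * open, then write and read back through the descriptor.
 * Returns 1 if the name is gone but the data is still readable,
 * 0 if not, and -1 if the setup failed.
 */
static int readable_after_unlink(void)
{
	char path[] = "/tmp/unlink-open-XXXXXX";
	char buf[4];
	int fd, ret;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	if (unlink(path) != 0) {
		close(fd);
		return -1;
	}

	/* The name no longer exists, but the open file is still usable. */
	ret = access(path, F_OK) != 0 &&
	      write(fd, "data", 4) == 4 &&
	      pread(fd, buf, sizeof(buf), 0) == 4 &&
	      memcmp(buf, "data", 4) == 0;

	close(fd);
	return ret;
}
```

The storage is only released at the final close(), which is why a stale
cached open in knfsd can delay the real removal.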
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 58b6d8df95d4..f5cf64a40112 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1590,6 +1590,26 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 	goto out_unlock;
 }

+static void
+nfsd_close_cached_files(struct dentry *dentry)
+{
+	struct inode *inode = d_inode(dentry);
+
+	if (inode && S_ISREG(inode->i_mode))
+		nfsd_file_close_inode_sync(inode);
+}
+
+static bool
+nfsd_has_cached_files(struct dentry *dentry)
+{
+	bool ret = false;
+	struct inode *inode = d_inode(dentry);
+
+	if (inode && S_ISREG(inode->i_mode))
+		ret = nfsd_file_is_cached(inode);
+	return ret;
+}
+
 /*
  * Rename a file
  * N.B. After this call _both_ ffhp and tfhp need an fh_put
@@ -1602,6 +1622,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	struct inode *fdir, *tdir;
 	__be32 err;
 	int host_err;
+	bool has_cached = false;

 	err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
 	if (err)
@@ -1620,6 +1641,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	if (!flen || isdotent(fname, flen) || !tlen || isdotent(tname, tlen))
 		goto out;

+retry:
 	host_err = fh_want_write(ffhp);
 	if (host_err) {
 		err = nfserrno(host_err);
@@ -1659,11 +1681,16 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	if (ffhp->fh_export->ex_path.dentry != tfhp->fh_export->ex_path.dentry)
 		goto out_dput_new;

-	host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
-	if (!host_err) {
-		host_err = commit_metadata(tfhp);
-		if (!host_err)
-			host_err = commit_metadata(ffhp);
+	if (nfsd_has_cached_files(ndentry)) {
+		has_cached = true;
+		goto out_dput_old;
+	} else {
+		host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
+		if (!host_err) {
+			host_err = commit_metadata(tfhp);
+			if (!host_err)
+				host_err = commit_metadata(ffhp);
+		}
 	}
 out_dput_new:
 	dput(ndentry);
@@ -1676,12 +1703,26 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	 * as that would do the wrong thing if the two directories
 	 * were the same, so again we do it by hand.
 	 */
-	fill_post_wcc(ffhp);
-	fill_post_wcc(tfhp);
+	if (!has_cached) {
+		fill_post_wcc(ffhp);
+		fill_post_wcc(tfhp);
+	}
 	unlock_rename(tdentry, fdentry);
 	ffhp->fh_locked = tfhp->fh_locked = false;
 	fh_drop_write(ffhp);

+	/*
+	 * If the target dentry has cached open files, then we need to try to
+	 * close them prior to doing the rename. Flushing delayed fput
+	 * shouldn't be done with locks held however, so we delay it until this
+	 * point and then reattempt the whole shebang.
+	 */
+	if (has_cached) {
+		has_cached = false;
+		nfsd_close_cached_files(ndentry);
+		dput(ndentry);
+		goto retry;
+	}
 out:
 	return err;
 }
@@ -1728,10 +1769,13 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	if (!type)
 		type = d_inode(rdentry)->i_mode & S_IFMT;

-	if (type != S_IFDIR)
+	if (type != S_IFDIR) {
+		nfsd_close_cached_files(rdentry);
 		host_err = vfs_unlink(dirp, rdentry, NULL);
-	else
+	} else {
 		host_err = vfs_rmdir(dirp, rdentry);
+	}
+
 	if (!host_err)
 		host_err = commit_metadata(fhp);
 	dput(rdentry);
--
2.21.0

Thread overview: 25+ messages
2019-06-30 13:52 [PATCH 00/16] Cache open file descriptors in knfsd Trond Myklebust
2019-06-30 13:52 ` [PATCH 01/16] sunrpc: add a new cache_detail operation for when a cache is flushed Trond Myklebust
2019-06-30 13:52 ` [PATCH 02/16] locks: create a new notifier chain for lease attempts Trond Myklebust
2019-06-30 13:52 ` [PATCH 03/16] notify: export symbols for use by the knfsd file cache Trond Myklebust
2019-06-30 13:52 ` [PATCH 04/16] vfs: Export flush_delayed_fput for use by knfsd Trond Myklebust
2019-06-30 13:52 ` [PATCH 05/16] nfsd: add a new struct file caching facility to nfsd Trond Myklebust
2019-06-30 13:52 ` [PATCH 06/16] nfsd: hook up nfsd_write to the new nfsd_file cache Trond Myklebust
2019-06-30 13:52 ` [PATCH 07/16] nfsd: hook up nfsd_read to the " Trond Myklebust
2019-06-30 13:52 ` [PATCH 08/16] nfsd: hook nfsd_commit up " Trond Myklebust
2019-06-30 13:52 ` [PATCH 09/16] nfsd: convert nfs4_file->fi_fds array to use nfsd_files Trond Myklebust
2019-06-30 13:52 ` [PATCH 10/16] nfsd: convert fi_deleg_file and ls_file fields to nfsd_file Trond Myklebust
2019-06-30 13:52 ` [PATCH 11/16] nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache Trond Myklebust
2019-06-30 13:52 ` [PATCH 12/16] nfsd: have nfsd_test_lock use " Trond Myklebust
2019-06-30 13:52 ` [PATCH 13/16] nfsd: rip out the raparms cache Trond Myklebust
2019-06-30 13:52 ` Trond Myklebust [this message]
2019-06-30 13:52 ` [PATCH 15/16] nfsd: Fix up some unused variable warnings Trond Myklebust
2019-06-30 13:52 ` [PATCH 16/16] nfsd: Fix the documentation for svcxdr_tmpalloc() Trond Myklebust
2019-06-30 15:57 ` [PATCH 05/16] nfsd: add a new struct file caching facility to nfsd Matthew Wilcox
2019-06-30 16:15 ` Trond Myklebust
2019-06-30 15:27 ` [PATCH 02/16] locks: create a new notifier chain for lease attempts Matthew Wilcox
2019-06-30 15:50 ` Trond Myklebust
2019-07-01 15:02 ` [PATCH 00/16] Cache open file descriptors in knfsd Chuck Lever
2019-07-01 15:17 ` Trond Myklebust
2019-07-01 15:39 ` Chuck Lever
2019-07-31 22:05 ` J. Bruce Fields