* [RFC][PATCH 00/13] Fix FS-Cache problems
@ 2011-09-29 14:45 David Howells
2011-09-29 14:45 ` [PATCH 01/13] Noisefs: A predictable noise producing fs for testing things David Howells
` (12 more replies)
0 siblings, 13 replies; 14+ messages in thread
From: David Howells @ 2011-09-29 14:45 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-fsdevel, linux-nfs, linux-cachefs
Here are some patches to debug and fix FS-Cache problems.
(1) A debugging patch to add a predictable pattern generator with a filesystem
interface. It can be used to generate regular apparent changes on the NFS
server. This proved very useful in breaking FS-Cache.
(2) A patch to correctly mark cached netfs pages.
(3) A debugging patch to validate page->mapping on a page for which retrieval
was requested.
(4) A patch to downgrade memory allocation levels in the cache to not try so
hard and to be more willing to abort with ENOMEM. It's a cache - it
doesn't matter if we can't store something.
(5) A debugging patch to check that there are no read operations outstanding
on a cookie when it is relinquished.
(6) A debugging patch to check that the object cookie pointer doesn't get
cleared whilst we're trying to read from it.
(7) A patch to make some debugging prints conditional.
(8) A patch to make cookie relinquishment log a warning and wait for any
outstanding reads.
(9) A patch to fix up operation state handling and accounting.
(10) A patch to provide proper invalidation facilities.
(11) A patch to add a vfs_truncate() function for CacheFiles's invalidation
code to call.
Question: CacheFiles uses truncation in a couple of places. It has been
using notify_change() rather than sys_truncate() or something similar.
This means it bypasses a bunch of checks and suchlike that it possibly
should be making (security, file locking, lease breaking, vfsmount write).
Should it be using vfs_truncate() as added by a preceding patch, or should
it keep using notify_change() and assume that anyone poking around in the
cache files on disk gets everything they deserve?
(12) A patch to provide invalidation for cachefiles.
(13) A patch to make NFS use the invalidation call.
David
---
David Howells (13):
NFS: Use FS-Cache invalidation
CacheFiles: Implement invalidation
VFS: Make more complete truncate operation available to CacheFiles
FS-Cache: Provide proper invalidation
FS-Cache: Fix operation state management and accounting
FS-Cache: Make cookie relinquishment wait for outstanding reads
CacheFiles: Make some debugging statements conditional
FS-Cache: Check cookie is still correct in __fscache_read_or_alloc_pages()
FS-Cache: Check that there are no read ops when cookie relinquished
CacheFiles: Downgrade the requirements passed to the allocator
FS-Cache: Validate page mapping pointer value
CacheFiles: Fix the marking of cached pages
Noisefs: A predictable noise producing fs for testing things
Documentation/filesystems/caching/backend-api.txt | 38 ++
Documentation/filesystems/caching/netfs-api.txt | 46 ++-
Documentation/filesystems/caching/object.txt | 23 +
Documentation/filesystems/caching/operations.txt | 2
fs/Kconfig | 1
fs/Makefile | 1
fs/cachefiles/interface.c | 57 +++-
fs/cachefiles/internal.h | 2
fs/cachefiles/key.c | 2
fs/cachefiles/rdwr.c | 118 ++++---
fs/cachefiles/xattr.c | 2
fs/fscache/cookie.c | 78 +++++
fs/fscache/internal.h | 10 +
fs/fscache/object.c | 74 +++++
fs/fscache/operation.c | 135 ++++++---
fs/fscache/page.c | 188 ++++++++++--
fs/fscache/stats.c | 11 +
fs/nfs/fscache.h | 20 +
fs/nfs/inode.c | 20 +
fs/nfs/nfs4proc.c | 2
fs/noisefs/Kconfig | 20 +
fs/noisefs/Makefile | 7
fs/noisefs/file.c | 331 +++++++++++++++++++++
fs/noisefs/inode.c | 171 +++++++++++
fs/noisefs/internal.h | 59 ++++
fs/noisefs/super.c | 302 +++++++++++++++++++
fs/open.c | 50 ++-
include/linux/fs.h | 1
include/linux/fscache-cache.h | 53 +++
include/linux/fscache.h | 50 +++
mm/page-writeback.c | 1
31 files changed, 1696 insertions(+), 179 deletions(-)
create mode 100644 fs/noisefs/Kconfig
create mode 100644 fs/noisefs/Makefile
create mode 100644 fs/noisefs/file.c
create mode 100644 fs/noisefs/inode.c
create mode 100644 fs/noisefs/internal.h
create mode 100644 fs/noisefs/super.c
* [PATCH 01/13] Noisefs: A predictable noise producing fs for testing things
2011-09-29 14:45 [RFC][PATCH 00/13] Fix FS-Cache problems David Howells
@ 2011-09-29 14:45 ` David Howells
2011-09-29 14:46 ` [PATCH 02/13] CacheFiles: Fix the marking of cached pages David Howells
` (11 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2011-09-29 14:45 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-cachefs, linux-fsdevel, linux-nfs
Add a configurable, predictable noise-producing filesystem for testing stuff,
particularly cache coherency handling in NFS and FS-Cache.
It's based somewhat on ramfs, so you mount an empty fs and populate it:
mount -t noisefs none /mnt
touch /mnt/{a,b}
mkdir /mnt/{c,d}
touch /mnt/c/{e,f}
The size of files can be set, e.g.:
echo hello >/mnt/a
echo hello >>/mnt/a
truncate -s 4096 /mnt/a
However, no data is actually stored, and the filesystem is not persistent.
Data can be read, but it's generated by a pattern generator, and is available
up to the EOF point.
The data pattern is generated as big-endian four-byte words, each consisting
of the sum of the filesystem key, the inode number, the inode data version
number and the word number within the file (file offset / 4). This should
make it easy to check quickly whether the data is correct.
=============
CONFIGURATION
=============
(1) A file's i_version number can be given a timeout. This means that once
an i_version is viewed (by getattr), it is only good for that many more
seconds. After that it will be incremented and the timeout restarted:
setfattr -n iversion_timo -v 4 /mnt/a
The file's i_version number is also incremented any time there's a write
or a truncation operation performed on the file.
(2) A file can be made to inject an arbitrary error (EIO by default) when
i_version reaches or exceeds a certain number:
setfattr -n inject_error_at -v 4 /mnt/a
setfattr -n error_to_inject -v 28 /mnt/a
So the above would inject ENOSPC when i_version >= 4.
(3) A 32-bit filesystem ID key can be set during mounting:
mount -t noisefs none /mnt -o fs_key=1234
Note that when exporting this filesystem through NFS, an FSID should be
set as the mount does not have real device numbers of its own.
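For example, a hypothetical export line (the fsid value here is arbitrary):

```
/mnt	*(rw,fsid=1234)
```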
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/Kconfig | 1
fs/Makefile | 1
fs/noisefs/Kconfig | 20 +++
fs/noisefs/Makefile | 7 +
fs/noisefs/file.c | 331 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/noisefs/inode.c | 171 +++++++++++++++++++++++++
fs/noisefs/internal.h | 59 +++++++++
fs/noisefs/super.c | 302 +++++++++++++++++++++++++++++++++++++++++++++
mm/page-writeback.c | 1
9 files changed, 893 insertions(+), 0 deletions(-)
create mode 100644 fs/noisefs/Kconfig
create mode 100644 fs/noisefs/Makefile
create mode 100644 fs/noisefs/file.c
create mode 100644 fs/noisefs/inode.c
create mode 100644 fs/noisefs/internal.h
create mode 100644 fs/noisefs/super.c
diff --git a/fs/Kconfig b/fs/Kconfig
index 9fe0b34..7290445 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -173,6 +173,7 @@ config HUGETLB_PAGE
def_bool HUGETLBFS
source "fs/configfs/Kconfig"
+source "fs/noisefs/Kconfig"
endmenu
diff --git a/fs/Makefile b/fs/Makefile
index afc1096..fde1b7b 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -123,3 +123,4 @@ obj-$(CONFIG_GFS2_FS) += gfs2/
obj-$(CONFIG_EXOFS_FS) += exofs/
obj-$(CONFIG_CEPH_FS) += ceph/
obj-$(CONFIG_PSTORE) += pstore/
+obj-$(CONFIG_NOISE_FS) += noisefs/
diff --git a/fs/noisefs/Kconfig b/fs/noisefs/Kconfig
new file mode 100644
index 0000000..bf68072
--- /dev/null
+++ b/fs/noisefs/Kconfig
@@ -0,0 +1,20 @@
+config NOISE_FS
+ tristate "Predictable noise-maker filesystem (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ help
+ A debugging filesystem that generates pages of noise in a predictable
+ pattern based on the fs key, inode number, page number and data
+ version number.
+
+ The filesystem borrows from ramfs to provide directory and special
+ file support; only regular files produce noise.
+
+ No backing storage is required as data writes merely update the data
+ version number. The files can be programmed to automatically update
+ the data version number every N reads to simulate the effect of
+ interference by another writer.
+
+ Export ops are available, so the thing can be NFS exported.
+
+ To compile this filesystem as a module, choose M here: the module
+ will be called noisefs. If unsure, say N.
diff --git a/fs/noisefs/Makefile b/fs/noisefs/Makefile
new file mode 100644
index 0000000..59f9746
--- /dev/null
+++ b/fs/noisefs/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the noisefs routines.
+#
+
+obj-$(CONFIG_NOISE_FS) += noisefs.o
+
+noisefs-objs += super.o inode.o file.o
diff --git a/fs/noisefs/file.c b/fs/noisefs/file.c
new file mode 100644
index 0000000..44e9b65
--- /dev/null
+++ b/fs/noisefs/file.c
@@ -0,0 +1,331 @@
+/* Predictable noise generator file handler
+ *
+ * Copyright (C) 2011 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/sched.h>
+#include <linux/aio.h>
+#include <linux/uaccess.h>
+#include "internal.h"
+
+/*
+ * Fabricate data for the reader to read
+ */
+static ssize_t noisefs_read(struct file *filp, char __user *buf, size_t len,
+ loff_t *ppos)
+{
+ struct inode *i = filp->f_dentry->d_inode;
+ struct noisefs_inode *ni = NOISEFS_I(i);
+ ssize_t ret;
+ size_t piece;
+ loff_t pos = *ppos, eof, avail;
+ __be32 raw;
+ const void *raw_p = &raw;
+ u32 data;
+
+ kenter("%lu,,%zu,%llu", i->i_ino, len, pos);
+
+ if (pos + len < pos)
+ len = (loff_t)-1ULL - pos;
+ if (len == 0) {
+ kleave(" = 0 [len 0]");
+ return 0;
+ }
+
+ if (ni->inject_error_at && i->i_version >= ni->inject_error_at) {
+ pr_warning("noisefs_read() injected error\n");
+ return ni->error_to_inject;
+ }
+
+ /* Determine the word covering the starting position and file size */
+ spin_lock(&i->i_lock);
+ eof = i_size_read(i);
+ data = NOISEFS_SB(i->i_sb)->fs_key;
+ data += i->i_ino + i->i_version;
+ data += pos >> 2;
+ spin_unlock(&i->i_lock);
+
+ kdebug("start with %08x eof %llx", data, eof);
+
+ if (pos >= eof) {
+ kleave(" = 0 [after eof]");
+ return 0;
+ }
+
+ /* Shrink the read to the part before the EOF */
+ avail = eof - pos;
+ if (avail < len)
+ len = avail;
+ *ppos = pos + len;
+ ret = len;
+
+ kdebug("readable %lx/%llx", len, avail);
+
+ /* handle a misaligned starting position */
+ if (pos & 3) {
+ unsigned off = pos & 3;
+
+ piece = sizeof(data) - off;
+ piece = (len > piece) ? piece : len;
+
+ kdebug("misaligned start %zu/%zu", piece, len);
+
+ raw = cpu_to_be32(data);
+ if (copy_to_user(buf, raw_p + off, piece))
+ goto fault;
+ buf += piece;
+ len -= piece;
+ data++;
+ }
+
+ /* Handle all the whole words */
+ if (len >= 4) {
+ kdebug("whole words %zu", len & ~3);
+ do {
+ raw = cpu_to_be32(data);
+ if (put_user(raw, (__be32 __user *)buf) < 0)
+ goto fault;
+ buf += 4;
+ len -= 4;
+ data++;
+ } while (len >= 4);
+ }
+
+ /* Handle trailing partial word */
+ if (len > 0) {
+ kdebug("partial trailer %zu", len);
+ raw = cpu_to_be32(data);
+ if (copy_to_user(buf, raw_p, len))
+ goto fault;
+ }
+
+ kleave(" = %zd [%llx]", ret, *ppos);
+ return ret;
+
+fault:
+ kleave(" = -EFAULT");
+ return -EFAULT;
+}
+
+/*
+ * Note a write, but don't actually store the data
+ */
+static ssize_t noisefs_write(struct file *filp, const char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ struct inode *i = filp->f_dentry->d_inode;
+ struct noisefs_inode *ni = NOISEFS_I(i);
+ loff_t pos = *ppos, eof;
+ time_t now;
+
+ kenter("%lu{%llx},,%zu,%llu", i->i_ino, i->i_version, len, pos);
+
+ if (pos + len < pos)
+ return -EFBIG;
+
+ if (ni->inject_error_at && i->i_version >= ni->inject_error_at) {
+ pr_warning("noisefs_write() injected error\n");
+ return ni->error_to_inject;
+ }
+
+ now = CURRENT_TIME.tv_sec;
+
+ spin_lock(&i->i_lock);
+
+ eof = i_size_read(i);
+
+ if (filp->f_flags & O_APPEND)
+ pos = eof;
+
+ if (pos + len > eof)
+ i_size_write(i, pos + len);
+ i->i_version++;
+ if (ni->iver_timeout)
+ ni->iver_expiry = now + ni->iver_timeout;
+
+ spin_unlock(&i->i_lock);
+ *ppos = pos + len;
+ return len;
+}
+
+static ssize_t noisefs_aio_write(struct kiocb *iocb,
+ const struct iovec *iov, unsigned long nr_segs,
+ loff_t pos)
+{
+ loff_t pos_copy = pos;
+ return noisefs_write(iocb->ki_filp, NULL, iocb->ki_nbytes, &pos_copy);
+}
+
+static ssize_t noisefs_splice_write(struct pipe_inode_info *pipe,
+ struct file *out,
+ loff_t *ppos, size_t len,
+ unsigned int flags)
+{
+ return noisefs_write(out, NULL, len, ppos);
+}
+
+const struct file_operations noisefs_file_operations = {
+ .read = noisefs_read,
+ .write = noisefs_write,
+ .aio_write = noisefs_aio_write,
+ .splice_write = noisefs_splice_write,
+ .llseek = generic_file_llseek,
+ .fsync = noop_fsync,
+};
+
+static int noisefs_extract_val(const void *value, size_t size,
+ unsigned *_val)
+{
+ if (size > 0 && value && *(const char *)value) {
+ char buf[8], *end;
+ if (size > sizeof(buf) - 1)
+ return -EINVAL;
+ memcpy(buf, value, size);
+ buf[size] = 0;
+ *_val = simple_strtoul(buf, &end, 10);
+ if (end != buf + size)
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int noisefs_setxattr(struct dentry *dentry, const char *name,
+ const void *value, size_t size, int flags)
+{
+ struct inode *i = dentry->d_inode;
+ struct noisefs_inode *ni = NOISEFS_I(i);
+
+ kenter("%lu,%s,,,", i->i_ino, name);
+
+ if (!name || !*name)
+ return -EOPNOTSUPP;
+
+ if (strcmp(name, "iversion_timo") == 0) {
+ unsigned timeout = 0;
+
+ if (noisefs_extract_val(value, size, &timeout) < 0)
+ return -EINVAL;
+
+ /* increment i_version after first getattr on this version */
+ kdebug("set timeout %x", timeout);
+ spin_lock(&ni->lock);
+ ni->iver_timeout = timeout;
+ if (!timeout)
+ ni->iver_expiry = 0;
+ spin_unlock(&ni->lock);
+ return 0;
+ }
+
+ if (strcmp(name, "inject_error_at") == 0) {
+ unsigned when = 0;
+
+ if (noisefs_extract_val(value, size, &when) < 0)
+ return -EINVAL;
+ ni->inject_error_at = when;
+ return 0;
+ }
+
+ if (strcmp(name, "error_to_inject") == 0) {
+ unsigned err = 0;
+
+ if (noisefs_extract_val(value, size, &err) < 0 ||
+ err == 0 || err >= 512)
+ return -EINVAL;
+ ni->error_to_inject = -err;
+ return 0;
+ }
+
+ return -EOPNOTSUPP;
+}
+
+static int noisefs_setattr(struct dentry *dentry, struct iattr *iattr)
+{
+ struct inode *i = dentry->d_inode;
+ struct noisefs_inode *ni = NOISEFS_I(i);
+ time_t now;
+ int error;
+
+ kenter("%lu{%llx},{%x}", i->i_ino, i->i_version, iattr->ia_valid);
+
+ error = inode_change_ok(i, iattr);
+ if (error)
+ return error;
+
+ if (ni->inject_error_at && i->i_version >= ni->inject_error_at) {
+ pr_warning("noisefs_setattr() preinjected error\n");
+ return ni->error_to_inject;
+ }
+
+ if (iattr->ia_valid & ATTR_SIZE) {
+ now = CURRENT_TIME.tv_sec;
+ spin_lock(&i->i_lock);
+ i_size_write(i, iattr->ia_size);
+ i->i_version++;
+ if (ni->iver_timeout)
+ ni->iver_expiry = now + ni->iver_timeout;
+ spin_unlock(&i->i_lock);
+ }
+
+ if (ni->inject_error_at && i->i_version >= ni->inject_error_at) {
+ pr_warning("noisefs_setattr() injected error\n");
+ return ni->error_to_inject;
+ }
+
+ setattr_copy(i, iattr);
+ mark_inode_dirty(i);
+ return 0;
+}
+
+static int noisefs_getattr(struct vfsmount *mnt, struct dentry *dentry,
+ struct kstat *stat)
+{
+ struct inode *i = dentry->d_inode;
+ struct noisefs_inode *ni = NOISEFS_I(i);
+ time_t now;
+
+ if (ni->iver_timeout) {
+ now = CURRENT_TIME.tv_sec;
+ spin_lock(&ni->lock);
+ if (ni->iver_timeout) {
+ if (!ni->iver_expiry || now > ni->iver_expiry) {
+ if (ni->iver_expiry) {
+ kdebug("i_version expired");
+ spin_lock(&i->i_lock);
+ i->i_version++;
+ spin_unlock(&i->i_lock);
+ }
+ ni->iver_expiry = now + ni->iver_timeout;
+ }
+ }
+ spin_unlock(&ni->lock);
+ }
+
+ if (ni->inject_error_at && i->i_version >= ni->inject_error_at) {
+ pr_warning("noisefs_getattr() injected error\n");
+ return ni->error_to_inject;
+ }
+
+ return simple_getattr(mnt, dentry, stat);
+}
+
+const struct inode_operations noisefs_file_inode_operations = {
+ .getattr = noisefs_getattr,
+ .setattr = noisefs_setattr,
+ .setxattr = noisefs_setxattr,
+};
+
+const struct address_space_operations noisefs_aops = {
+ .readpage = simple_readpage,
+ .write_begin = simple_write_begin,
+ .write_end = simple_write_end,
+ .set_page_dirty = __set_page_dirty_no_writeback,
+};
diff --git a/fs/noisefs/inode.c b/fs/noisefs/inode.c
new file mode 100644
index 0000000..d01553f
--- /dev/null
+++ b/fs/noisefs/inode.c
@@ -0,0 +1,171 @@
+/* Predictable noise generator filesystem
+ *
+ * Copyright (C) 2011 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/time.h>
+#include <linux/pagemap.h>
+#include <linux/backing-dev.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+static const struct inode_operations noisefs_dir_inode_operations;
+
+static struct backing_dev_info noisefs_backing_dev_info = {
+ .name = "noisefs",
+ .ra_pages = 0, /* No readahead */
+ .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK,
+};
+
+void noisefs_i_init_once(void *_inode)
+{
+ struct noisefs_inode *ni = _inode;
+
+ memset(ni, 0, sizeof(*ni));
+ inode_init_once(&ni->vfs_inode);
+ spin_lock_init(&ni->lock);
+}
+
+/*
+ * Get an inode and initialise it
+ */
+struct inode *noisefs_iget(struct super_block *sb, const struct inode *dir,
+ int mode, dev_t dev)
+{
+ struct noisefs_super *s = NOISEFS_SB(sb);
+ struct noisefs_inode *ni;
+ struct inode *inode;
+ ino_t ino;
+
+ kenter(",%lx,%o,%x", dir ? dir->i_ino : 0, mode, dev);
+
+ spin_lock(&s->ino_lock);
+ if (S_ISREG(mode)) {
+ if (s->regular_ino == INT_MAX)
+ goto out_of_inos;
+ ino = ++s->regular_ino;
+ } else {
+ if (s->special_ino == UINT_MAX)
+ goto out_of_inos;
+ ino = ++s->special_ino;
+ }
+ spin_unlock(&s->ino_lock);
+
+ inode = iget_locked(sb, ino);
+ if (!inode)
+ return ERR_PTR(-ENOMEM);
+ if (!(inode->i_state & I_NEW)) {
+ kenter(" = {%lx} [extant]", inode->i_ino);
+ return inode;
+ }
+
+ inode_init_owner(inode, dir, mode);
+ inode->i_mapping->a_ops = &noisefs_aops;
+ inode->i_mapping->backing_dev_info = &noisefs_backing_dev_info;
+ mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
+ mapping_set_unevictable(inode->i_mapping);
+ inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+
+ switch (mode & S_IFMT) {
+ default:
+ init_special_inode(inode, mode, dev);
+ break;
+ case S_IFREG:
+ inode->i_op = &noisefs_file_inode_operations;
+ inode->i_fop = &noisefs_file_operations;
+ break;
+ case S_IFDIR:
+ inode->i_op = &noisefs_dir_inode_operations;
+ inode->i_fop = &simple_dir_operations;
+
+ /* directory inodes start off with i_nlink == 2 (for "." entry) */
+ inc_nlink(inode);
+ break;
+ case S_IFLNK:
+ inode->i_op = &page_symlink_inode_operations;
+ break;
+ }
+
+ ni = NOISEFS_I(inode);
+ ni->error_to_inject = -EIO;
+
+ unlock_new_inode(inode);
+ kenter(" = {%lx} [new]", inode->i_ino);
+ return inode;
+
+out_of_inos:
+ spin_unlock(&s->ino_lock);
+ pr_warning("noisefs: Ran out of inode numbers\n");
+ return NULL;
+}
+
+/*
+ * File creation. Allocate an inode, and we're done..
+ */
+static int noisefs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
+{
+ struct inode *inode = noisefs_iget(dir->i_sb, dir, mode, dev);
+ int error = -ENOSPC;
+
+ if (inode) {
+ d_instantiate(dentry, inode);
+ dget(dentry); /* Extra count - pin the dentry in core */
+ dir->i_mtime = dir->i_ctime = CURRENT_TIME;
+ error = 0;
+ }
+ return error;
+}
+
+static int noisefs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+ int retval = noisefs_mknod(dir, dentry, mode | S_IFDIR, 0);
+ if (!retval)
+ inc_nlink(dir);
+ return retval;
+}
+
+static int noisefs_create(struct inode *dir, struct dentry *dentry, int mode,
+ struct nameidata *nd)
+{
+ return noisefs_mknod(dir, dentry, mode | S_IFREG, 0);
+}
+
+static int noisefs_symlink(struct inode *dir, struct dentry *dentry,
+ const char *symname)
+{
+ struct inode *inode;
+ int error = -ENOSPC;
+
+ inode = noisefs_iget(dir->i_sb, dir, S_IFLNK|S_IRWXUGO, 0);
+ if (inode) {
+ int l = strlen(symname) + 1;
+ error = page_symlink(inode, symname, l);
+ if (!error) {
+ d_instantiate(dentry, inode);
+ dget(dentry);
+ dir->i_mtime = dir->i_ctime = CURRENT_TIME;
+ } else {
+ iput(inode);
+ }
+ }
+ return error;
+}
+
+static const struct inode_operations noisefs_dir_inode_operations = {
+ .create = noisefs_create,
+ .lookup = simple_lookup,
+ .link = simple_link,
+ .unlink = simple_unlink,
+ .symlink = noisefs_symlink,
+ .mkdir = noisefs_mkdir,
+ .rmdir = simple_rmdir,
+ .mknod = noisefs_mknod,
+ .rename = simple_rename,
+};
diff --git a/fs/noisefs/internal.h b/fs/noisefs/internal.h
new file mode 100644
index 0000000..56a371d
--- /dev/null
+++ b/fs/noisefs/internal.h
@@ -0,0 +1,59 @@
+/* Predictable noise generator filesystem internal defs
+ *
+ * Copyright (C) 2011 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+struct noisefs_super {
+ spinlock_t ino_lock; /* Lock for *_ino */
+ unsigned long regular_ino; /* Last regular inode number allocated */
+ unsigned long special_ino; /* Last special inode number allocated */
+ unsigned fs_key; /* Filesystem noise key */
+};
+
+static inline struct noisefs_super *NOISEFS_SB(struct super_block *sb)
+{
+ return (struct noisefs_super *)sb->s_fs_info;
+}
+
+struct noisefs_inode {
+ struct inode vfs_inode;
+ spinlock_t lock;
+ unsigned iver_timeout;
+ time_t iver_expiry;
+ unsigned inject_error_at;
+ int error_to_inject;
+};
+
+static inline struct noisefs_inode *NOISEFS_I(struct inode *inode)
+{
+ return container_of(inode, struct noisefs_inode, vfs_inode);
+}
+
+/*
+ * file.c
+ */
+extern const struct address_space_operations noisefs_aops;
+extern const struct file_operations noisefs_file_operations;
+extern const struct inode_operations noisefs_file_inode_operations;
+
+/*
+ * inode.c
+ */
+extern void noisefs_i_init_once(void *);
+extern struct inode *noisefs_iget(struct super_block *,
+ const struct inode *, int, dev_t);
+
+/*
+ * Debugging
+ */
+#define dbgprintk(FMT,...) \
+ no_printk("[%-6.6s] "FMT"\n", current->comm ,##__VA_ARGS__)
+#define kenter(FMT,...) dbgprintk("==> %s("FMT")",__func__ ,##__VA_ARGS__)
+#define kleave(FMT,...) dbgprintk("<== %s()"FMT"",__func__ ,##__VA_ARGS__)
+#define kdebug(FMT,...) dbgprintk(" "FMT ,##__VA_ARGS__)
diff --git a/fs/noisefs/super.c b/fs/noisefs/super.c
new file mode 100644
index 0000000..a672b75
--- /dev/null
+++ b/fs/noisefs/super.c
@@ -0,0 +1,302 @@
+/* Predictable noise generator filesystem
+ *
+ * Copyright (C) 2011 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/fs.h>
+#include <linux/exportfs.h>
+#include <linux/pagemap.h>
+#include <linux/parser.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+MODULE_LICENSE("GPL");
+
+static struct kmem_cache *noisefs_inode_cache;
+
+static struct inode *noisefs_alloc_inode(struct super_block *sb)
+{
+ struct noisefs_inode *ni;
+
+ ni = kmem_cache_alloc(noisefs_inode_cache, GFP_KERNEL);
+ if (!ni)
+ return NULL;
+ return &ni->vfs_inode;
+}
+
+static void noisefs_i_callback(struct rcu_head *head)
+{
+ struct inode *inode = container_of(head, struct inode, i_rcu);
+ struct noisefs_inode *ni = NOISEFS_I(inode);
+
+ INIT_LIST_HEAD(&inode->i_dentry);
+ kmem_cache_free(noisefs_inode_cache, ni);
+}
+
+static void noisefs_destroy_inode(struct inode *inode)
+{
+ call_rcu(&inode->i_rcu, noisefs_i_callback);
+}
+
+static const struct super_operations noisefs_super_ops = {
+ .statfs = simple_statfs,
+ .alloc_inode = noisefs_alloc_inode,
+ .drop_inode = generic_delete_inode,
+ .destroy_inode = noisefs_destroy_inode,
+ .show_options = generic_show_options,
+};
+
+static int noisefs_encode_fh(struct dentry *dentry, __u32 *fh,
+ int *max_len, int connectable)
+{
+ struct inode *inode = dentry->d_inode;
+ int len = *max_len;
+ int type = FILEID_INO32_GEN;
+
+ kenter("%lx,,{%d},%d", inode->i_ino, len, connectable);
+
+ if (connectable && (len < 4)) {
+ *max_len = 4;
+ return 255;
+ } else if (len < 2) {
+ *max_len = 2;
+ return 255;
+ }
+
+ len = 2;
+ fh[0] = inode->i_ino;
+ fh[1] = inode->i_generation;
+ if (connectable && !S_ISDIR(inode->i_mode)) {
+ struct inode *parent;
+
+ spin_lock(&dentry->d_lock);
+ parent = dentry->d_parent->d_inode;
+ fh[2] = parent->i_ino;
+ fh[3] = parent->i_generation;
+ spin_unlock(&dentry->d_lock);
+ len = 4;
+ type = FILEID_INO32_GEN_PARENT;
+ }
+ *max_len = len;
+ kleave(" = %d [%d]", type, len);
+ return type;
+}
+
+static struct inode *noisefs_nfs_get_inode(struct super_block *sb,
+ u64 ino, u32 generation)
+{
+ struct inode *inode;
+
+ kenter(",%llx,%x", ino, generation);
+
+ /* If the inode exists, it will be in core already. */
+ inode = ilookup(sb, ino);
+ if (!inode) {
+ kleave(" = -ESTALE [no ino]");
+ return ERR_PTR(-ESTALE);
+ }
+ if (generation && inode->i_generation != generation) {
+ kleave(" = -ESTALE [%x!=%x]", inode->i_generation, generation);
+ iput(inode);
+ return ERR_PTR(-ESTALE);
+ }
+ return inode;
+}
+
+static struct dentry *noisefs_fh_to_dentry(struct super_block *sb,
+ struct fid *fid,
+ int fh_len, int fh_type)
+{
+ struct dentry *dentry;
+
+ kenter(",,%d,%d", fh_len, fh_type);
+ dentry = generic_fh_to_dentry(sb, fid, fh_len, fh_type,
+ noisefs_nfs_get_inode);
+ kleave(" = %p", dentry);
+ return dentry;
+}
+
+static struct dentry *noisefs_fh_to_parent(struct super_block *sb,
+ struct fid *fid,
+ int fh_len, int fh_type)
+{
+ struct dentry *dentry;
+
+ kenter(",,%d,%d", fh_len, fh_type);
+ dentry = generic_fh_to_parent(sb, fid, fh_len, fh_type,
+ noisefs_nfs_get_inode);
+ kleave(" = %p", dentry);
+ return dentry;
+}
+
+static struct dentry *noisefs_get_parent(struct dentry *child)
+{
+ struct dentry *dentry;
+
+ kenter("%lx", child->d_inode ? child->d_inode->i_ino : 0);
+
+ /* should we check !IS_ROOT(child)? */
+ dentry = dget_parent(child);
+ kleave(" = %lx", dentry->d_inode->i_ino);
+ return dentry;
+}
+
+static const struct export_operations noisefs_export_ops = {
+ .encode_fh = noisefs_encode_fh,
+ .fh_to_dentry = noisefs_fh_to_dentry,
+ .fh_to_parent = noisefs_fh_to_parent,
+ .get_parent = noisefs_get_parent,
+};
+
+enum {
+ Opt_fs_key,
+ Opt_err
+};
+
+static const match_table_t noisefs_mountopt_tokens = {
+ { Opt_fs_key, "fs_key=%d" },
+ { Opt_err, NULL }
+};
+
+static int noisefs_parse_options(char *data, struct noisefs_super *s)
+{
+ substring_t args[MAX_OPT_ARGS];
+ int option;
+ int token;
+ char *p;
+
+ while ((p = strsep(&data, ",")) != NULL) {
+ if (!*p)
+ continue;
+
+ token = match_token(p, noisefs_mountopt_tokens, args);
+ switch (token) {
+ case Opt_fs_key:
+ if (match_int(&args[0], &option))
+ return -EINVAL;
+ s->fs_key = option;
+ break;
+ default:
+ pr_err("noisefs: Unknown mount option\n");
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+int noisefs_fill_super(struct super_block *sb, void *data, int silent)
+{
+ struct noisefs_super *s;
+ struct inode *inode;
+ struct dentry *root;
+ int err;
+
+ save_mount_options(sb, data);
+
+ err = -ENOMEM;
+ s = kzalloc(sizeof(struct noisefs_super), GFP_KERNEL);
+ if (!s)
+ goto fail;
+
+ spin_lock_init(&s->ino_lock);
+ s->special_ino = 0x80000000U;
+
+ sb->s_fs_info = s;
+
+ err = noisefs_parse_options(data, s);
+ if (err)
+ goto fail_have_super;
+
+ sb->s_maxbytes = MAX_LFS_FILESIZE;
+ sb->s_blocksize = PAGE_CACHE_SIZE;
+ sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
+ sb->s_magic = 0x4E6F6973U;
+ sb->s_op = &noisefs_super_ops;
+ sb->s_export_op = &noisefs_export_ops;
+ sb->s_time_gran = 1;
+ sb->s_flags |= MS_I_VERSION;
+
+ err = -ENOMEM;
+ inode = noisefs_iget(sb, NULL,
+ S_IFDIR | S_IRUGO | S_IXUGO | S_IWUSR, 0);
+ if (!inode)
+ goto fail_have_super;
+
+ root = d_alloc_root(inode);
+ if (!root)
+ goto fail_have_inode;
+ sb->s_root = root;
+ return 0;
+
+fail_have_inode:
+ iput(inode);
+fail_have_super:
+ kfree(sb->s_fs_info);
+ sb->s_fs_info = NULL;
+fail:
+ return err;
+}
+
+struct dentry *noisefs_mount(struct file_system_type *fs_type,
+ int flags, const char *dev_name, void *data)
+{
+ return mount_nodev(fs_type, flags, data, noisefs_fill_super);
+}
+
+static void noisefs_kill_sb(struct super_block *sb)
+{
+ kfree(sb->s_fs_info);
+ kill_litter_super(sb);
+}
+
+static struct file_system_type noisefs_fs_type = {
+ .name = "noisefs",
+ .mount = noisefs_mount,
+ .kill_sb = noisefs_kill_sb,
+};
+
+static int __init init_noisefs_fs(void)
+{
+ int ret;
+
+ noisefs_inode_cache = kmem_cache_create("noisefs_inode_cache",
+ sizeof(struct noisefs_inode),
+ 0,
+ SLAB_HWCACHE_ALIGN,
+ noisefs_i_init_once);
+ if (!noisefs_inode_cache) {
+ pr_notice("noisefs: Failed to allocate inode cache\n");
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ ret = register_filesystem(&noisefs_fs_type);
+ if (ret < 0) {
+ pr_notice("noisefs: Failed to register filesystem\n");
+ goto error_fsreg;
+ }
+ return 0;
+
+error_fsreg:
+ kmem_cache_destroy(noisefs_inode_cache);
+error:
+ return ret;
+}
+
+static void __exit exit_noisefs_fs(void)
+{
+ unregister_filesystem(&noisefs_fs_type);
+ kmem_cache_destroy(noisefs_inode_cache);
+}
+
+module_init(init_noisefs_fs)
+module_exit(exit_noisefs_fs)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0e309cd..c38c311 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1311,6 +1311,7 @@ int __set_page_dirty_no_writeback(struct page *page)
return !TestSetPageDirty(page);
return 0;
}
+EXPORT_SYMBOL(__set_page_dirty_no_writeback);
/*
* Helper function for set_page_dirty family.
* [PATCH 02/13] CacheFiles: Fix the marking of cached pages
2011-09-29 14:45 [RFC][PATCH 00/13] Fix FS-Cache problems David Howells
2011-09-29 14:45 ` [PATCH 01/13] Noisefs: A predictable noise producing fs for testing things David Howells
@ 2011-09-29 14:46 ` David Howells
2011-09-29 14:46 ` [PATCH 03/13] FS-Cache: Validate page mapping pointer value David Howells
` (10 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2011-09-29 14:46 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-fsdevel, linux-nfs, linux-cachefs
Under some circumstances CacheFiles defers the marking of pages with PG_fscache
so that it can take advantage of pagevecs to reduce the number of calls to
fscache_mark_pages_cached() and the netfs hook that keeps track of this.
There are, however, two problems with this:
(1) It can lead to the PG_fscache mark being applied _after_ the page is set
PG_uptodate and unlocked (by the call to fscache_end_io()).
(2) CacheFiles's ref on the page is dropped immediately following
fscache_end_io() - and so may not still be held when the mark is applied.
This can lead to the page being passed back to the allocator before the
mark is applied.
Fix this by, where appropriate, marking the page before calling
fscache_end_io() and releasing the page. This means that we can't take
advantage of pagevecs and have to make a separate call for each page to the
marking routines.
The symptoms of this are Bad Page state errors cropping up under memory
pressure, for example:
BUG: Bad page state in process tar pfn:002da
page:ffffea0000009fb0 count:0 mapcount:0 mapping: (null) index:0x1447
page flags: 0x1000(private_2)
Pid: 4574, comm: tar Tainted: G W 3.1.0-rc4-fsdevel+ #1064
Call Trace:
[<ffffffff8109583c>] ? dump_page+0xb9/0xbe
[<ffffffff81095916>] bad_page+0xd5/0xea
[<ffffffff81095d82>] get_page_from_freelist+0x35b/0x46a
[<ffffffff810961f3>] __alloc_pages_nodemask+0x362/0x662
[<ffffffff810989da>] __do_page_cache_readahead+0x13a/0x267
[<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
[<ffffffff81098d7b>] ra_submit+0x1c/0x20
[<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
[<ffffffff81098ee2>] ? ondemand_readahead+0x163/0x29a
[<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
[<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
[<ffffffff810c22c4>] do_sync_read+0xba/0xfa
[<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
[<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
[<ffffffff810c29a4>] vfs_read+0xaa/0x13a
[<ffffffff810c2a79>] sys_read+0x45/0x6c
[<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b
As can be seen, PG_private_2 (== PG_fscache) is set in the page flags.
Instrumenting fscache_mark_pages_cached() to verify whether page->mapping was
set appropriately showed that sometimes it wasn't. This led to the discovery
that sometimes the page has apparently been reclaimed by the time the marker
got to see it.
Reported-by: M. Stevens <m@tippett.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
---
fs/cachefiles/rdwr.c | 34 ++++++++----------------
fs/fscache/page.c | 59 +++++++++++++++++++++++++----------------
include/linux/fscache-cache.h | 3 ++
include/linux/fscache.h | 12 ++++----
4 files changed, 56 insertions(+), 52 deletions(-)
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 0e3c092..0186fc1 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -176,9 +176,8 @@ static void cachefiles_read_copier(struct fscache_operation *_op)
recheck:
if (PageUptodate(monitor->back_page)) {
copy_highpage(monitor->netfs_page, monitor->back_page);
-
- pagevec_add(&pagevec, monitor->netfs_page);
- fscache_mark_pages_cached(monitor->op, &pagevec);
+ fscache_mark_page_cached(monitor->op,
+ monitor->netfs_page);
error = 0;
} else if (!PageError(monitor->back_page)) {
/* the page has probably been truncated */
@@ -335,8 +334,7 @@ backing_page_already_present:
backing_page_already_uptodate:
_debug("- uptodate");
- pagevec_add(pagevec, netpage);
- fscache_mark_pages_cached(op, pagevec);
+ fscache_mark_page_cached(op, netpage);
copy_highpage(netpage, backpage);
fscache_end_io(op, netpage, 0);
@@ -448,8 +446,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
&pagevec);
} else if (cachefiles_has_space(cache, 0, 1) == 0) {
/* there's space in the cache we can use */
- pagevec_add(&pagevec, page);
- fscache_mark_pages_cached(op, &pagevec);
+ fscache_mark_page_cached(op, page);
ret = -ENODATA;
} else {
ret = -ENOBUFS;
@@ -465,8 +462,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
*/
static int cachefiles_read_backing_file(struct cachefiles_object *object,
struct fscache_retrieval *op,
- struct list_head *list,
- struct pagevec *mark_pvec)
+ struct list_head *list)
{
struct cachefiles_one_read *monitor = NULL;
struct address_space *bmapping = object->backer->d_inode->i_mapping;
@@ -626,13 +622,13 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
page_cache_release(backpage);
backpage = NULL;
- if (!pagevec_add(mark_pvec, netpage))
- fscache_mark_pages_cached(op, mark_pvec);
+ fscache_mark_page_cached(op, netpage);
page_cache_get(netpage);
if (!pagevec_add(&lru_pvec, netpage))
__pagevec_lru_add_file(&lru_pvec);
+ /* the netpage is unlocked and marked up to date here */
fscache_end_io(op, netpage, 0);
page_cache_release(netpage);
netpage = NULL;
@@ -775,15 +771,11 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op,
/* submit the apparently valid pages to the backing fs to be read from
* disk */
if (nrbackpages > 0) {
- ret2 = cachefiles_read_backing_file(object, op, &backpages,
- &pagevec);
+ ret2 = cachefiles_read_backing_file(object, op, &backpages);
if (ret2 == -ENOMEM || ret2 == -EINTR)
ret = ret2;
}
- if (pagevec_count(&pagevec) > 0)
- fscache_mark_pages_cached(op, &pagevec);
-
_leave(" = %d [nr=%u%s]",
ret, *nr_pages, list_empty(pages) ? " empty" : "");
return ret;
@@ -806,7 +798,6 @@ int cachefiles_allocate_page(struct fscache_retrieval *op,
{
struct cachefiles_object *object;
struct cachefiles_cache *cache;
- struct pagevec pagevec;
int ret;
object = container_of(op->op.object,
@@ -817,13 +808,10 @@ int cachefiles_allocate_page(struct fscache_retrieval *op,
_enter("%p,{%lx},", object, page->index);
ret = cachefiles_has_space(cache, 0, 1);
- if (ret == 0) {
- pagevec_init(&pagevec, 0);
- pagevec_add(&pagevec, page);
- fscache_mark_pages_cached(op, &pagevec);
- } else {
+ if (ret == 0)
+ fscache_mark_page_cached(op, page);
+ else
ret = -ENOBUFS;
- }
_leave(" = %d", ret);
return ret;
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index 3f7a59b..d7c663c 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -915,6 +915,40 @@ done:
EXPORT_SYMBOL(__fscache_uncache_page);
/**
+ * fscache_mark_page_cached - Mark a page as being cached
+ * @op: The retrieval op pages are being marked for
+ * @page: The page to be marked
+ *
+ * Mark a netfs page as being cached. After this is called, the netfs
+ * must call fscache_uncache_page() to remove the mark.
+ */
+void fscache_mark_page_cached(struct fscache_retrieval *op, struct page *page)
+{
+ struct fscache_cookie *cookie = op->op.object->cookie;
+
+#ifdef CONFIG_FSCACHE_STATS
+ atomic_inc(&fscache_n_marks);
+#endif
+
+ _debug("- mark %p{%lx}", page, page->index);
+ if (TestSetPageFsCache(page)) {
+ static bool once_only;
+ if (!once_only) {
+ once_only = true;
+ printk(KERN_WARNING "FS-Cache:"
+ " Cookie type %s marked page %lx"
+ " multiple times\n",
+ cookie->def->name, page->index);
+ }
+ }
+
+ if (cookie->def->mark_page_cached)
+ cookie->def->mark_page_cached(cookie->netfs_data,
+ op->mapping, page);
+}
+EXPORT_SYMBOL(fscache_mark_page_cached);
+
+/**
* fscache_mark_pages_cached - Mark pages as being cached
* @op: The retrieval op pages are being marked for
* @pagevec: The pages to be marked
@@ -925,32 +959,11 @@ EXPORT_SYMBOL(__fscache_uncache_page);
void fscache_mark_pages_cached(struct fscache_retrieval *op,
struct pagevec *pagevec)
{
- struct fscache_cookie *cookie = op->op.object->cookie;
unsigned long loop;
-#ifdef CONFIG_FSCACHE_STATS
- atomic_add(pagevec->nr, &fscache_n_marks);
-#endif
-
- for (loop = 0; loop < pagevec->nr; loop++) {
- struct page *page = pagevec->pages[loop];
-
- _debug("- mark %p{%lx}", page, page->index);
- if (TestSetPageFsCache(page)) {
- static bool once_only;
- if (!once_only) {
- once_only = true;
- printk(KERN_WARNING "FS-Cache:"
- " Cookie type %s marked page %lx"
- " multiple times\n",
- cookie->def->name, page->index);
- }
- }
- }
+ for (loop = 0; loop < pagevec->nr; loop++)
+ fscache_mark_page_cached(op, pagevec->pages[loop]);
- if (cookie->def->mark_pages_cached)
- cookie->def->mark_pages_cached(cookie->netfs_data,
- op->mapping, pagevec);
pagevec_reinit(pagevec);
}
EXPORT_SYMBOL(fscache_mark_pages_cached);
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index af095b5..485c011 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -504,6 +504,9 @@ extern void fscache_withdraw_cache(struct fscache_cache *cache);
extern void fscache_io_error(struct fscache_cache *cache);
+extern void fscache_mark_page_cached(struct fscache_retrieval *op,
+ struct page *page);
+
extern void fscache_mark_pages_cached(struct fscache_retrieval *op,
struct pagevec *pagevec);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 9ec20de..f4b6353 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -135,14 +135,14 @@ struct fscache_cookie_def {
*/
void (*put_context)(void *cookie_netfs_data, void *context);
- /* indicate pages that now have cache metadata retained
- * - this function should mark the specified pages as now being cached
- * - the pages will have been marked with PG_fscache before this is
+ /* indicate a page that now has cache metadata retained
+ * - this function should mark the specified page as now being cached
+ * - the page will have been marked with PG_fscache before this is
* called, so this is optional
*/
- void (*mark_pages_cached)(void *cookie_netfs_data,
- struct address_space *mapping,
- struct pagevec *cached_pvec);
+ void (*mark_page_cached)(void *cookie_netfs_data,
+ struct address_space *mapping,
+ struct page *page);
/* indicate the cookie is no longer cached
* - this function is called when the backing store currently caching
* [PATCH 03/13] FS-Cache: Validate page mapping pointer value
From: David Howells @ 2011-09-29 14:46 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-fsdevel, linux-nfs, linux-cachefs
Add information to indicate whether page->mapping should have a value or not,
and to log a warning if it is wrong.
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/cachefiles/rdwr.c | 26 +++++++++++++++++---------
fs/fscache/page.c | 36 +++++++++++++++++++++++++++++++++---
include/linux/fscache-cache.h | 6 ++++--
3 files changed, 54 insertions(+), 14 deletions(-)
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 0186fc1..e8c3766 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -177,7 +177,7 @@ static void cachefiles_read_copier(struct fscache_operation *_op)
if (PageUptodate(monitor->back_page)) {
copy_highpage(monitor->netfs_page, monitor->back_page);
fscache_mark_page_cached(monitor->op,
- monitor->netfs_page);
+ monitor->netfs_page, true);
error = 0;
} else if (!PageError(monitor->back_page)) {
/* the page has probably been truncated */
@@ -334,7 +334,7 @@ backing_page_already_present:
backing_page_already_uptodate:
_debug("- uptodate");
- fscache_mark_page_cached(op, netpage);
+ fscache_mark_page_cached(op, netpage, true);
copy_highpage(netpage, backpage);
fscache_end_io(op, netpage, 0);
@@ -446,7 +446,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
&pagevec);
} else if (cachefiles_has_space(cache, 0, 1) == 0) {
/* there's space in the cache we can use */
- fscache_mark_page_cached(op, page);
+ fscache_mark_page_cached(op, page, true);
ret = -ENODATA;
} else {
ret = -ENOBUFS;
@@ -622,7 +622,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
page_cache_release(backpage);
backpage = NULL;
- fscache_mark_page_cached(op, netpage);
+ fscache_mark_page_cached(op, netpage, true);
page_cache_get(netpage);
if (!pagevec_add(&lru_pvec, netpage))
@@ -704,6 +704,14 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op,
object->fscache.debug_id, atomic_read(&op->op.usage),
*nr_pages);
+ {
+ struct page *q, *_q;
+ list_for_each_entry_safe(q, _q, pages, lru) {
+ ASSERT(!q->mapping);
+ ASSERT(!PageFsCache(q));
+ }
+ }
+
if (!object->backer)
return -ENOBUFS;
@@ -757,13 +765,13 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op,
(*nr_pages)--;
nrbackpages++;
} else if (space && pagevec_add(&pagevec, page) == 0) {
- fscache_mark_pages_cached(op, &pagevec);
+ fscache_mark_pages_cached(op, &pagevec, false);
ret = -ENODATA;
}
}
if (pagevec_count(&pagevec) > 0)
- fscache_mark_pages_cached(op, &pagevec);
+ fscache_mark_pages_cached(op, &pagevec, false);
if (list_empty(pages))
ret = 0;
@@ -809,7 +817,7 @@ int cachefiles_allocate_page(struct fscache_retrieval *op,
ret = cachefiles_has_space(cache, 0, 1);
if (ret == 0)
- fscache_mark_page_cached(op, page);
+ fscache_mark_page_cached(op, page, true);
else
ret = -ENOBUFS;
@@ -852,11 +860,11 @@ int cachefiles_allocate_pages(struct fscache_retrieval *op,
list_for_each_entry(page, pages, lru) {
if (pagevec_add(&pagevec, page) == 0)
- fscache_mark_pages_cached(op, &pagevec);
+ fscache_mark_pages_cached(op, &pagevec, false);
}
if (pagevec_count(&pagevec) > 0)
- fscache_mark_pages_cached(op, &pagevec);
+ fscache_mark_pages_cached(op, &pagevec, false);
ret = -ENODATA;
} else {
ret = -ENOBUFS;
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index d7c663c..5e793b5 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -918,11 +918,13 @@ EXPORT_SYMBOL(__fscache_uncache_page);
* fscache_mark_page_cached - Mark a page as being cached
* @op: The retrieval op pages are being marked for
* @page: The page to be marked
+ * @should_have_mapping: True if the page should have a mapping set
*
* Mark a netfs page as being cached. After this is called, the netfs
* must call fscache_uncache_page() to remove the mark.
*/
-void fscache_mark_page_cached(struct fscache_retrieval *op, struct page *page)
+void fscache_mark_page_cached(struct fscache_retrieval *op, struct page *page,
+ bool should_have_mapping)
{
struct fscache_cookie *cookie = op->op.object->cookie;
@@ -941,6 +943,31 @@ void fscache_mark_page_cached(struct fscache_retrieval *op, struct page *page)
cookie->def->name, page->index);
}
}
+ if (should_have_mapping && !page->mapping) {
+ static bool once_only;
+ if (!once_only) {
+ once_only = true;
+ printk(KERN_ALERT
+ "FSC: page:%p count:%d mapcount:%d NO_MAPPING"
+ " index:%#lx\n",
+ page, page_count(page), page_mapcount(page),
+ page->index);
+ WARN_ON(1);
+ }
+ }
+
+ if (!should_have_mapping && page->mapping) {
+ static bool once_only;
+ if (!once_only) {
+ once_only = true;
+ printk(KERN_ALERT
+ "FSC: page:%p count:%d mapcount:%d MAPPING:%p"
+ " index:%#lx\n",
+ page, page_count(page), page_mapcount(page),
+ page->mapping, page->index);
+ WARN_ON(1);
+ }
+ }
if (cookie->def->mark_page_cached)
cookie->def->mark_page_cached(cookie->netfs_data,
@@ -952,17 +979,20 @@ EXPORT_SYMBOL(fscache_mark_page_cached);
* fscache_mark_pages_cached - Mark pages as being cached
* @op: The retrieval op pages are being marked for
* @pagevec: The pages to be marked
+ * @should_have_mapping: True if the pages should have a mapping set
*
* Mark a bunch of netfs pages as being cached. After this is called,
* the netfs must call fscache_uncache_page() to remove the mark.
*/
void fscache_mark_pages_cached(struct fscache_retrieval *op,
- struct pagevec *pagevec)
+ struct pagevec *pagevec,
+ bool should_have_mapping)
{
unsigned long loop;
for (loop = 0; loop < pagevec->nr; loop++)
- fscache_mark_page_cached(op, pagevec->pages[loop]);
+ fscache_mark_page_cached(op, pagevec->pages[loop],
+ should_have_mapping);
pagevec_reinit(pagevec);
}
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 485c011..ae155f1 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -505,10 +505,12 @@ extern void fscache_withdraw_cache(struct fscache_cache *cache);
extern void fscache_io_error(struct fscache_cache *cache);
extern void fscache_mark_page_cached(struct fscache_retrieval *op,
- struct page *page);
+ struct page *page,
+ bool should_have_mapping);
extern void fscache_mark_pages_cached(struct fscache_retrieval *op,
- struct pagevec *pagevec);
+ struct pagevec *pagevec,
+ bool should_have_mapping);
extern bool fscache_object_sleep_till_congested(signed long *timeoutp);
* [PATCH 04/13] CacheFiles: Downgrade the requirements passed to the allocator
From: David Howells @ 2011-09-29 14:46 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-fsdevel, linux-nfs, linux-cachefs
Downgrade the requirements passed to the allocator in the gfp flags parameter.
FS-Cache/CacheFiles can handle OOM conditions simply by aborting the attempt to
store an object or a page in the cache.
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/cachefiles/interface.c | 8 ++++----
fs/cachefiles/internal.h | 2 ++
fs/cachefiles/key.c | 2 +-
fs/cachefiles/rdwr.c | 18 ++++++++++--------
fs/cachefiles/xattr.c | 2 +-
fs/fscache/page.c | 2 +-
6 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 1064805..075b7a6 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -42,12 +42,12 @@ static struct fscache_object *cachefiles_alloc_object(
_enter("{%s},%p,", cache->cache.identifier, cookie);
- lookup_data = kmalloc(sizeof(*lookup_data), GFP_KERNEL);
+ lookup_data = kmalloc(sizeof(*lookup_data), cachefiles_gfp);
if (!lookup_data)
goto nomem_lookup_data;
/* create a new object record and a temporary leaf image */
- object = kmem_cache_alloc(cachefiles_object_jar, GFP_KERNEL);
+ object = kmem_cache_alloc(cachefiles_object_jar, cachefiles_gfp);
if (!object)
goto nomem_object;
@@ -64,7 +64,7 @@ static struct fscache_object *cachefiles_alloc_object(
* - stick the length on the front and leave space on the back for the
* encoder
*/
- buffer = kmalloc((2 + 512) + 3, GFP_KERNEL);
+ buffer = kmalloc((2 + 512) + 3, cachefiles_gfp);
if (!buffer)
goto nomem_buffer;
@@ -220,7 +220,7 @@ static void cachefiles_update_object(struct fscache_object *_object)
return;
}
- auxdata = kmalloc(2 + 512 + 3, GFP_KERNEL);
+ auxdata = kmalloc(2 + 512 + 3, cachefiles_gfp);
if (!auxdata) {
_leave(" [nomem]");
return;
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index bd6bc1b..4938251 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -23,6 +23,8 @@ extern unsigned cachefiles_debug;
#define CACHEFILES_DEBUG_KLEAVE 2
#define CACHEFILES_DEBUG_KDEBUG 4
+#define cachefiles_gfp (__GFP_WAIT | __GFP_NORETRY | __GFP_NOMEMALLOC)
+
/*
* node records
*/
diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c
index 81b8b2b..33b58c6 100644
--- a/fs/cachefiles/key.c
+++ b/fs/cachefiles/key.c
@@ -78,7 +78,7 @@ char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type)
_debug("max: %d", max);
- key = kmalloc(max, GFP_KERNEL);
+ key = kmalloc(max, cachefiles_gfp);
if (!key)
return NULL;
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index e8c3766..96e6940 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -238,7 +238,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
_debug("read back %p{%lu,%d}",
netpage, netpage->index, page_count(netpage));
- monitor = kzalloc(sizeof(*monitor), GFP_KERNEL);
+ monitor = kzalloc(sizeof(*monitor), cachefiles_gfp);
if (!monitor)
goto nomem;
@@ -257,13 +257,14 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
goto backing_page_already_present;
if (!newpage) {
- newpage = page_cache_alloc_cold(bmapping);
+ newpage = __page_cache_alloc(cachefiles_gfp |
+ __GFP_COLD);
if (!newpage)
goto nomem_monitor;
}
ret = add_to_page_cache(newpage, bmapping,
- netpage->index, GFP_KERNEL);
+ netpage->index, cachefiles_gfp);
if (ret == 0)
goto installed_new_backing_page;
if (ret != -EEXIST)
@@ -481,7 +482,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
netpage, netpage->index, page_count(netpage));
if (!monitor) {
- monitor = kzalloc(sizeof(*monitor), GFP_KERNEL);
+ monitor = kzalloc(sizeof(*monitor), cachefiles_gfp);
if (!monitor)
goto nomem;
@@ -496,13 +497,14 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
goto backing_page_already_present;
if (!newpage) {
- newpage = page_cache_alloc_cold(bmapping);
+ newpage = __page_cache_alloc(cachefiles_gfp |
+ __GFP_COLD);
if (!newpage)
goto nomem;
}
ret = add_to_page_cache(newpage, bmapping,
- netpage->index, GFP_KERNEL);
+ netpage->index, cachefiles_gfp);
if (ret == 0)
goto installed_new_backing_page;
if (ret != -EEXIST)
@@ -532,7 +534,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
_debug("- monitor add");
ret = add_to_page_cache(netpage, op->mapping, netpage->index,
- GFP_KERNEL);
+ cachefiles_gfp);
if (ret < 0) {
if (ret == -EEXIST) {
page_cache_release(netpage);
@@ -608,7 +610,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
_debug("- uptodate");
ret = add_to_page_cache(netpage, op->mapping, netpage->index,
- GFP_KERNEL);
+ cachefiles_gfp);
if (ret < 0) {
if (ret == -EEXIST) {
page_cache_release(netpage);
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index e18b183..73b4628 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -174,7 +174,7 @@ int cachefiles_check_object_xattr(struct cachefiles_object *object,
ASSERT(dentry);
ASSERT(dentry->d_inode);
- auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, GFP_KERNEL);
+ auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, cachefiles_gfp);
if (!auxbuf) {
_leave(" = -ENOMEM");
return -ENOMEM;
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index 5e793b5..b8b62f4 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -759,7 +759,7 @@ int __fscache_write_page(struct fscache_cookie *cookie,
fscache_stat(&fscache_n_stores);
- op = kzalloc(sizeof(*op), GFP_NOIO);
+ op = kzalloc(sizeof(*op), GFP_NOIO | __GFP_NOMEMALLOC | __GFP_NORETRY);
if (!op)
goto nomem;
* [PATCH 05/13] FS-Cache: Check that there are no read ops when cookie relinquished
From: David Howells @ 2011-09-29 14:46 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-fsdevel, linux-nfs, linux-cachefs
Check that the netfs isn't trying to relinquish a cookie that still has read
operations in progress upon it. If there are any, log a warning and BUG.
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/fscache/cookie.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 9905350..0666996 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -452,6 +452,14 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire)
_debug("RELEASE OBJ%x", object->debug_id);
+ if (atomic_read(&object->n_reads)) {
+ spin_unlock(&cookie->lock);
+ printk(KERN_ERR "FS-Cache:"
+ " Cookie '%s' still has %d outstanding reads\n",
+ cookie->def->name, atomic_read(&object->n_reads));
+ BUG();
+ }
+
/* detach each cache object from the object cookie */
spin_lock(&object->lock);
hlist_del_init(&object->cookie_link);
* [PATCH 06/13] FS-Cache: Check cookie is still correct in __fscache_read_or_alloc_pages()
From: David Howells @ 2011-09-29 14:46 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-fsdevel, linux-nfs, linux-cachefs
Check the object's cookie pointer is still correct in
__fscache_read_or_alloc_pages(). This may change as a result of the cookie
being released by the netfs before we've finished reading from it.
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/fscache/page.c | 21 +++++++++++++++++++++
1 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index b8b62f4..aaed5cd 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -496,6 +496,7 @@ int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
if (fscache_submit_op(object, &op->op) < 0)
goto nobufs_unlock;
spin_unlock(&cookie->lock);
+ ASSERTCMP(object->cookie, ==, cookie);
fscache_stat(&fscache_n_retrieval_ops);
@@ -513,6 +514,26 @@ int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
goto error;
/* ask the cache to honour the operation */
+ if (!object->cookie) {
+ static const char prefix[] = "fs-";
+ printk(KERN_ERR "%sobject: OBJ%x\n",
+ prefix, object->debug_id);
+ printk(KERN_ERR "%sobjstate=%s fl=%lx wbusy=%x ev=%lx[%lx]\n",
+ prefix, fscache_object_states[object->state],
+ object->flags, work_busy(&object->work),
+ object->events,
+ object->event_mask & FSCACHE_OBJECT_EVENTS_MASK);
+ printk(KERN_ERR "%sops=%u inp=%u exc=%u\n",
+ prefix, object->n_ops, object->n_in_progress,
+ object->n_exclusive);
+ printk(KERN_ERR "%sparent=%p\n",
+ prefix, object->parent);
+ printk(KERN_ERR "%scookie=%p [pr=%p nd=%p fl=%lx]\n",
+ prefix, object->cookie,
+ cookie->parent, cookie->netfs_data, cookie->flags);
+ }
+ ASSERTCMP(object->cookie, ==, cookie);
+
if (test_bit(FSCACHE_COOKIE_NO_DATA_YET, &object->cookie->flags)) {
fscache_stat(&fscache_n_cop_allocate_pages);
ret = object->cache->ops->allocate_pages(
* [PATCH 07/13] CacheFiles: Make some debugging statements conditional
From: David Howells @ 2011-09-29 14:47 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-fsdevel, linux-nfs, linux-cachefs
Downgrade some debugging statements so that they no longer print unconditionally,
but are instead conditional on the appropriate module parameter setting.
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/cachefiles/rdwr.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 96e6940..f507937 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -77,25 +77,25 @@ static int cachefiles_read_reissue(struct cachefiles_object *object,
struct page *backpage = monitor->back_page, *backpage2;
int ret;
- kenter("{ino=%lx},{%lx,%lx}",
+ _enter("{ino=%lx},{%lx,%lx}",
object->backer->d_inode->i_ino,
backpage->index, backpage->flags);
/* skip if the page was truncated away completely */
if (backpage->mapping != bmapping) {
- kleave(" = -ENODATA [mapping]");
+ _leave(" = -ENODATA [mapping]");
return -ENODATA;
}
backpage2 = find_get_page(bmapping, backpage->index);
if (!backpage2) {
- kleave(" = -ENODATA [gone]");
+ _leave(" = -ENODATA [gone]");
return -ENODATA;
}
if (backpage != backpage2) {
put_page(backpage2);
- kleave(" = -ENODATA [different]");
+ _leave(" = -ENODATA [different]");
return -ENODATA;
}
@@ -114,7 +114,7 @@ static int cachefiles_read_reissue(struct cachefiles_object *object,
if (PageUptodate(backpage))
goto unlock_discard;
- kdebug("reissue read");
+ _debug("reissue read");
ret = bmapping->a_ops->readpage(NULL, backpage);
if (ret < 0)
goto unlock_discard;
@@ -129,7 +129,7 @@ static int cachefiles_read_reissue(struct cachefiles_object *object,
}
/* it'll reappear on the todo list */
- kleave(" = -EINPROGRESS");
+ _leave(" = -EINPROGRESS");
return -EINPROGRESS;
unlock_discard:
@@ -137,7 +137,7 @@ unlock_discard:
spin_lock_irq(&object->work_lock);
list_del(&monitor->op_link);
spin_unlock_irq(&object->work_lock);
- kleave(" = %d", ret);
+ _leave(" = %d", ret);
return ret;
}
* [PATCH 08/13] FS-Cache: Make cookie relinquishment wait for outstanding reads
From: David Howells @ 2011-09-29 14:47 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-fsdevel, linux-nfs, linux-cachefs
Make fscache_relinquish_cookie() log a warning and wait if there are any
outstanding reads left on the cookie it was given.
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/fscache/cookie.c | 18 ++++++++++++++----
fs/fscache/operation.c | 10 ++++++++--
include/linux/fscache-cache.h | 1 +
3 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 0666996..66be9ec 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -442,22 +442,32 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire)
event = retire ? FSCACHE_OBJECT_EV_RETIRE : FSCACHE_OBJECT_EV_RELEASE;
+try_again:
spin_lock(&cookie->lock);
/* break links with all the active objects */
while (!hlist_empty(&cookie->backing_objects)) {
+ int n_reads;
object = hlist_entry(cookie->backing_objects.first,
struct fscache_object,
cookie_link);
_debug("RELEASE OBJ%x", object->debug_id);
- if (atomic_read(&object->n_reads)) {
+ set_bit(FSCACHE_COOKIE_WAITING_ON_READS, &cookie->flags);
+ n_reads = atomic_read(&object->n_reads);
+ if (n_reads) {
+ int n_ops = object->n_ops;
+ int n_in_progress = object->n_in_progress;
spin_unlock(&cookie->lock);
printk(KERN_ERR "FS-Cache:"
- " Cookie '%s' still has %d outstanding reads\n",
- cookie->def->name, atomic_read(&object->n_reads));
- BUG();
+ " Cookie '%s' still has %d outstanding reads (%d,%d)\n",
+ cookie->def->name,
+ n_reads, n_ops, n_in_progress);
+ wait_on_bit(&cookie->flags, FSCACHE_COOKIE_WAITING_ON_READS,
+ fscache_wait_bit, TASK_UNINTERRUPTIBLE);
+ printk(KERN_ERR "Wait finished\n");
+ goto try_again;
}
/* detach each cache object from the object cookie */
diff --git a/fs/fscache/operation.c b/fs/fscache/operation.c
index 30afdfa..c857ab8 100644
--- a/fs/fscache/operation.c
+++ b/fs/fscache/operation.c
@@ -340,8 +340,14 @@ void fscache_put_operation(struct fscache_operation *op)
object = op->object;
- if (test_bit(FSCACHE_OP_DEC_READ_CNT, &op->flags))
- atomic_dec(&object->n_reads);
+ if (test_bit(FSCACHE_OP_DEC_READ_CNT, &op->flags)) {
+ if (atomic_dec_and_test(&object->n_reads)) {
+ clear_bit(FSCACHE_COOKIE_WAITING_ON_READS,
+ &object->cookie->flags);
+ wake_up_bit(&object->cookie->flags,
+ FSCACHE_COOKIE_WAITING_ON_READS);
+ }
+ }
/* now... we may get called with the object spinlock held, so we
* complete the cleanup here only if we can immediately acquire the
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index ae155f1..c593d57 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -301,6 +301,7 @@ struct fscache_cookie {
#define FSCACHE_COOKIE_PENDING_FILL 3 /* T if pending initial fill on object */
#define FSCACHE_COOKIE_FILLING 4 /* T if filling object incrementally */
#define FSCACHE_COOKIE_UNAVAILABLE 5 /* T if cookie is unavailable (error, etc) */
+#define FSCACHE_COOKIE_WAITING_ON_READS 6 /* T if cookie is waiting on reads */
};
extern struct fscache_cookie fscache_fsdef_index;
* [PATCH 09/13] FS-Cache: Fix operation state management and accounting
From: David Howells @ 2011-09-29 14:47 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-fsdevel, linux-nfs, linux-cachefs
Fix the state management of internal fscache operations and the accounting of
what operations are in what states.
This is done by:
(1) Give struct fscache_operation an enum variable that directly represents the
state it's currently in, rather than spreading this knowledge over a bunch
of flags, over who's processing the operation at the moment and over whether
it is queued or not.
This makes it easier to write assertions to check the state at various
points and to prevent invalid state transitions.
(2) Add an 'operation complete' state and supply a function to indicate the
completion of an operation (fscache_op_complete()) and make things call
it. The final call to fscache_put_operation() can then check that an op
is in the appropriate state (complete or cancelled).
(3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
govern the state of an object:
(a) The ->n_ops is now the number of extant operations on the object
and is now decremented by fscache_put_operation() only.
(b) The ->n_in_progress is simply the number of operations that have been
taken off of the object's pending queue for the purposes of being
run. This is decremented by fscache_op_complete() only.
(c) The ->n_exclusive is the number of exclusive ops that have been
submitted and queued or are in progress. It is decremented by
fscache_op_complete() and by fscache_cancel_op().
fscache_put_operation() and fscache_operation_gc() now no longer try to
clean up ->n_exclusive and ->n_in_progress. That was leading to double
decrements against fscache_cancel_op().
fscache_cancel_op() now no longer decrements ->n_ops. That was leading to
double decrements against fscache_put_operation().
fscache_submit_exclusive_op() now decides whether it has to queue an op
based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
will persist in being true even after all preceding operations have been
cancelled or completed. Furthermore, if an object is active and there are
runnable ops against it, there must be at least one op running.
(4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
provide a function to record completion of the pages as they complete.
When n_pages reaches 0, the operation is deemed to be complete and
fscache_op_complete() is called.
Add calls to fscache_retrieval_complete() anywhere we've finished with a
page we've been given to read or allocate for. This includes places where
we just return pages to the netfs for reading from the server and where
accessing the cache fails and we discard the proposed netfs page.
The bugs in the unfixed state management manifest themselves as oopses like the
following where the operation completion gets out of sync with return of the
cookie by the netfs. This is possible because the cache unlocks and returns
all the netfs pages before recording its completion - which means that there's
nothing to stop the netfs discarding them and returning the cookie.
FS-Cache: Cookie 'NFS.fh' still has outstanding reads
------------[ cut here ]------------
kernel BUG at fs/fscache/cookie.c:519!
invalid opcode: 0000 [#1] SMP
CPU 1
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc
Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090 /DG965RY
RIP: 0010:[<ffffffffa007050a>] [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache]
RSP: 0018:ffff8800368cfb00 EFLAGS: 00010282
RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
FS: 0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
Stack:
ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
Call Trace:
[<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
[<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs]
[<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs]
[<ffffffff810d8d47>] evict+0xa1/0x15c
[<ffffffff810d8e2e>] dispose_list+0x2c/0x38
[<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b
[<ffffffff810c56b7>] prune_super+0xd5/0x140
[<ffffffff8109b615>] shrink_slab+0x102/0x1ab
[<ffffffff8109d690>] balance_pgdat+0x2f2/0x595
[<ffffffff8103e009>] ? process_timeout+0xb/0xb
[<ffffffff8109dba3>] kswapd+0x270/0x289
[<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46
[<ffffffff8109d933>] ? balance_pgdat+0x595/0x595
[<ffffffff8104bf7a>] kthread+0x7f/0x87
[<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10
[<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
[<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe
[<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53
[<ffffffff813ad6b0>] ? gs_change+0xb/0xb
Signed-off-by: David Howells <dhowells@redhat.com>
---
Documentation/filesystems/caching/backend-api.txt | 26 ++++++
Documentation/filesystems/caching/operations.txt | 2
fs/cachefiles/rdwr.c | 31 ++++++-
fs/fscache/object.c | 2
fs/fscache/operation.c | 93 ++++++++++++++-------
fs/fscache/page.c | 23 +++++
include/linux/fscache-cache.h | 37 +++++++-
7 files changed, 164 insertions(+), 50 deletions(-)
diff --git a/Documentation/filesystems/caching/backend-api.txt b/Documentation/filesystems/caching/backend-api.txt
index 382d52c..f4769b9 100644
--- a/Documentation/filesystems/caching/backend-api.txt
+++ b/Documentation/filesystems/caching/backend-api.txt
@@ -419,7 +419,10 @@ performed on the denizens of the cache. These are held in a structure of type:
If an I/O error occurs, fscache_io_error() should be called and -ENOBUFS
returned if possible or fscache_end_io() called with a suitable error
- code..
+ code.
+
+ fscache_put_retrieval() should be called after a page or pages are dealt
+ with. This will complete the operation when all pages are dealt with.
(*) Request pages be read from cache [mandatory]:
@@ -526,6 +529,27 @@ FS-Cache provides some utilities that a cache backend may make use of:
error value should be 0 if successful and an error otherwise.
+ (*) Record that one or more pages being retrieved or allocated have been dealt
+ with:
+
+ void fscache_retrieval_complete(struct fscache_retrieval *op,
+ int n_pages);
+
+ This is called to record the fact that one or more pages have been dealt
+ with and are no longer the concern of this operation. When the number of
+ pages remaining in the operation reaches 0, the operation will be
+ completed.
+
+
+ (*) Record operation completion:
+
+ void fscache_op_complete(struct fscache_operation *op);
+
+ This is called to record the completion of an operation. This deducts
+ this operation from the parent object's run state, potentially permitting
+ one or more pending operations to start running.
+
+
(*) Set highest store limit:
void fscache_set_store_limit(struct fscache_object *object,
diff --git a/Documentation/filesystems/caching/operations.txt b/Documentation/filesystems/caching/operations.txt
index b6b070c..bee2a5f 100644
--- a/Documentation/filesystems/caching/operations.txt
+++ b/Documentation/filesystems/caching/operations.txt
@@ -174,7 +174,7 @@ Operations are used through the following procedure:
necessary (the object might have died whilst the thread was waiting).
When it has finished doing its processing, it should call
- fscache_put_operation() on it.
+ fscache_op_complete() and fscache_put_operation() on it.
(4) The operation holds an effective lock upon the object, preventing other
exclusive ops conflicting until it is released. The operation can be
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index f507937..4b2b821 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -197,6 +197,7 @@ static void cachefiles_read_copier(struct fscache_operation *_op)
fscache_end_io(op, monitor->netfs_page, error);
page_cache_release(monitor->netfs_page);
+ fscache_retrieval_complete(op, 1);
fscache_put_retrieval(op);
kfree(monitor);
@@ -339,6 +340,7 @@ backing_page_already_uptodate:
copy_highpage(netpage, backpage);
fscache_end_io(op, netpage, 0);
+ fscache_retrieval_complete(op, 1);
success:
_debug("success");
@@ -360,6 +362,7 @@ read_error:
goto out;
io_error:
cachefiles_io_error_obj(object, "Page read error on backing file");
+ fscache_retrieval_complete(op, 1);
ret = -ENOBUFS;
goto out;
@@ -369,6 +372,7 @@ nomem_monitor:
fscache_put_retrieval(monitor->op);
kfree(monitor);
nomem:
+ fscache_retrieval_complete(op, 1);
_leave(" = -ENOMEM");
return -ENOMEM;
}
@@ -407,7 +411,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
_enter("{%p},{%lx},,,", object, page->index);
if (!object->backer)
- return -ENOBUFS;
+ goto enobufs;
inode = object->backer->d_inode;
ASSERT(S_ISREG(inode->i_mode));
@@ -416,7 +420,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
/* calculate the shift required to use bmap */
if (inode->i_sb->s_blocksize > PAGE_SIZE)
- return -ENOBUFS;
+ goto enobufs;
shift = PAGE_SHIFT - inode->i_sb->s_blocksize_bits;
@@ -448,13 +452,19 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
} else if (cachefiles_has_space(cache, 0, 1) == 0) {
/* there's space in the cache we can use */
fscache_mark_page_cached(op, page, true);
+ fscache_retrieval_complete(op, 1);
ret = -ENODATA;
} else {
- ret = -ENOBUFS;
+ goto enobufs;
}
_leave(" = %d", ret);
return ret;
+
+enobufs:
+ fscache_retrieval_complete(op, 1);
+ _leave(" = -ENOBUFS");
+ return -ENOBUFS;
}
/*
@@ -632,6 +642,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
/* the netpage is unlocked and marked up to date here */
fscache_end_io(op, netpage, 0);
+ fscache_retrieval_complete(op, 1);
page_cache_release(netpage);
netpage = NULL;
continue;
@@ -659,6 +670,7 @@ out:
list_for_each_entry_safe(netpage, _n, list, lru) {
list_del(&netpage->lru);
page_cache_release(netpage);
+ fscache_retrieval_complete(op, 1);
}
_leave(" = %d", ret);
@@ -715,7 +727,7 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op,
}
if (!object->backer)
- return -ENOBUFS;
+ goto all_enobufs;
space = 1;
if (cachefiles_has_space(cache, 0, *nr_pages) < 0)
@@ -728,7 +740,7 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op,
/* calculate the shift required to use bmap */
if (inode->i_sb->s_blocksize > PAGE_SIZE)
- return -ENOBUFS;
+ goto all_enobufs;
shift = PAGE_SHIFT - inode->i_sb->s_blocksize_bits;
@@ -768,7 +780,10 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op,
nrbackpages++;
} else if (space && pagevec_add(&pagevec, page) == 0) {
fscache_mark_pages_cached(op, &pagevec, false);
+ fscache_retrieval_complete(op, 1);
ret = -ENODATA;
+ } else {
+ fscache_retrieval_complete(op, 1);
}
}
@@ -789,6 +804,10 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op,
_leave(" = %d [nr=%u%s]",
ret, *nr_pages, list_empty(pages) ? " empty" : "");
return ret;
+
+all_enobufs:
+ fscache_retrieval_complete(op, *nr_pages);
+ return -ENOBUFS;
}
/*
@@ -823,6 +842,7 @@ int cachefiles_allocate_page(struct fscache_retrieval *op,
else
ret = -ENOBUFS;
+ fscache_retrieval_complete(op, 1);
_leave(" = %d", ret);
return ret;
}
@@ -872,6 +892,7 @@ int cachefiles_allocate_pages(struct fscache_retrieval *op,
ret = -ENOBUFS;
}
+ fscache_retrieval_complete(op, *nr_pages);
_leave(" = %d", ret);
return ret;
}
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index b6b897c..773bc79 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -587,8 +587,6 @@ static void fscache_object_available(struct fscache_object *object)
if (object->n_in_progress == 0) {
if (object->n_ops > 0) {
ASSERTCMP(object->n_ops, >=, object->n_obj_ops);
- ASSERTIF(object->n_ops > object->n_obj_ops,
- !list_empty(&object->pending_ops));
fscache_start_operations(object);
} else {
ASSERT(list_empty(&object->pending_ops));
diff --git a/fs/fscache/operation.c b/fs/fscache/operation.c
index c857ab8..1b9c4c9 100644
--- a/fs/fscache/operation.c
+++ b/fs/fscache/operation.c
@@ -37,6 +37,7 @@ void fscache_enqueue_operation(struct fscache_operation *op)
ASSERT(op->processor != NULL);
ASSERTCMP(op->object->state, >=, FSCACHE_OBJECT_AVAILABLE);
ASSERTCMP(atomic_read(&op->usage), >, 0);
+ ASSERTCMP(op->state, ==, FSCACHE_OP_ST_IN_PROGRESS);
fscache_stat(&fscache_n_op_enqueue);
switch (op->flags & FSCACHE_OP_TYPE) {
@@ -64,6 +65,9 @@ EXPORT_SYMBOL(fscache_enqueue_operation);
static void fscache_run_op(struct fscache_object *object,
struct fscache_operation *op)
{
+ ASSERTCMP(op->state, ==, FSCACHE_OP_ST_PENDING);
+
+ op->state = FSCACHE_OP_ST_IN_PROGRESS;
object->n_in_progress++;
if (test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags))
wake_up_bit(&op->flags, FSCACHE_OP_WAITING);
@@ -80,22 +84,23 @@ static void fscache_run_op(struct fscache_object *object,
int fscache_submit_exclusive_op(struct fscache_object *object,
struct fscache_operation *op)
{
- int ret;
-
_enter("{OBJ%x OP%x},", object->debug_id, op->debug_id);
+ ASSERTCMP(op->state, ==, FSCACHE_OP_ST_INITIALISED);
+ ASSERTCMP(atomic_read(&op->usage), >, 0);
+
spin_lock(&object->lock);
ASSERTCMP(object->n_ops, >=, object->n_in_progress);
ASSERTCMP(object->n_ops, >=, object->n_exclusive);
ASSERT(list_empty(&op->pend_link));
- ret = -ENOBUFS;
+ op->state = FSCACHE_OP_ST_PENDING;
if (fscache_object_is_active(object)) {
op->object = object;
object->n_ops++;
object->n_exclusive++; /* reads and writes must wait */
- if (object->n_ops > 1) {
+ if (object->n_in_progress > 0) {
atomic_inc(&op->usage);
list_add_tail(&op->pend_link, &object->pending_ops);
fscache_stat(&fscache_n_op_pend);
@@ -111,7 +116,6 @@ int fscache_submit_exclusive_op(struct fscache_object *object,
/* need to issue a new write op after this */
clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags);
- ret = 0;
} else if (object->state == FSCACHE_OBJECT_CREATING) {
op->object = object;
object->n_ops++;
@@ -119,14 +123,13 @@ int fscache_submit_exclusive_op(struct fscache_object *object,
atomic_inc(&op->usage);
list_add_tail(&op->pend_link, &object->pending_ops);
fscache_stat(&fscache_n_op_pend);
- ret = 0;
} else {
/* not allowed to submit ops in any other state */
BUG();
}
spin_unlock(&object->lock);
- return ret;
+ return 0;
}
/*
@@ -186,6 +189,7 @@ int fscache_submit_op(struct fscache_object *object,
_enter("{OBJ%x OP%x},{%u}",
object->debug_id, op->debug_id, atomic_read(&op->usage));
+ ASSERTCMP(op->state, ==, FSCACHE_OP_ST_INITIALISED);
ASSERTCMP(atomic_read(&op->usage), >, 0);
spin_lock(&object->lock);
@@ -196,6 +200,7 @@ int fscache_submit_op(struct fscache_object *object,
ostate = object->state;
smp_rmb();
+ op->state = FSCACHE_OP_ST_PENDING;
if (fscache_object_is_active(object)) {
op->object = object;
object->n_ops++;
@@ -225,12 +230,15 @@ int fscache_submit_op(struct fscache_object *object,
object->state == FSCACHE_OBJECT_LC_DYING ||
object->state == FSCACHE_OBJECT_WITHDRAWING) {
fscache_stat(&fscache_n_op_rejected);
+ op->state = FSCACHE_OP_ST_CANCELLED;
ret = -ENOBUFS;
} else if (!test_bit(FSCACHE_IOERROR, &object->cache->flags)) {
fscache_report_unexpected_submission(object, op, ostate);
ASSERT(!fscache_object_is_active(object));
+ op->state = FSCACHE_OP_ST_CANCELLED;
ret = -ENOBUFS;
} else {
+ op->state = FSCACHE_OP_ST_CANCELLED;
ret = -ENOBUFS;
}
@@ -290,13 +298,18 @@ int fscache_cancel_op(struct fscache_operation *op)
_enter("OBJ%x OP%x}", op->object->debug_id, op->debug_id);
+ ASSERTCMP(op->state, >=, FSCACHE_OP_ST_PENDING);
+ ASSERTCMP(op->state, !=, FSCACHE_OP_ST_CANCELLED);
+ ASSERTCMP(atomic_read(&op->usage), >, 0);
+
spin_lock(&object->lock);
ret = -EBUSY;
- if (!list_empty(&op->pend_link)) {
+ if (op->state == FSCACHE_OP_ST_PENDING) {
+ ASSERT(!list_empty(&op->pend_link));
fscache_stat(&fscache_n_op_cancelled);
list_del_init(&op->pend_link);
- object->n_ops--;
+ op->state = FSCACHE_OP_ST_CANCELLED;
if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags))
object->n_exclusive--;
if (test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags))
@@ -311,6 +324,37 @@ int fscache_cancel_op(struct fscache_operation *op)
}
/*
+ * Record the completion of an in-progress operation.
+ */
+void fscache_op_complete(struct fscache_operation *op)
+{
+ struct fscache_object *object = op->object;
+
+ _enter("OBJ%x", object->debug_id);
+
+ ASSERTCMP(op->state, ==, FSCACHE_OP_ST_IN_PROGRESS);
+ ASSERTCMP(object->n_in_progress, >, 0);
+ ASSERTIFCMP(test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags),
+ object->n_exclusive, >, 0);
+ ASSERTIFCMP(test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags),
+ object->n_in_progress, ==, 1);
+
+ spin_lock(&object->lock);
+
+ op->state = FSCACHE_OP_ST_COMPLETE;
+
+ if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags))
+ object->n_exclusive--;
+ object->n_in_progress--;
+ if (object->n_in_progress == 0)
+ fscache_start_operations(object);
+
+ spin_unlock(&object->lock);
+ _leave("");
+}
+EXPORT_SYMBOL(fscache_op_complete);
+
+/*
* release an operation
* - queues pending ops if this is the last in-progress op
*/
@@ -328,8 +372,9 @@ void fscache_put_operation(struct fscache_operation *op)
return;
_debug("PUT OP");
- if (test_and_set_bit(FSCACHE_OP_DEAD, &op->flags))
- BUG();
+ ASSERTIFCMP(op->state != FSCACHE_OP_ST_COMPLETE,
+ op->state, ==, FSCACHE_OP_ST_CANCELLED);
+ op->state = FSCACHE_OP_ST_DEAD;
fscache_stat(&fscache_n_op_release);
@@ -365,16 +410,6 @@ void fscache_put_operation(struct fscache_operation *op)
return;
}
- if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags)) {
- ASSERTCMP(object->n_exclusive, >, 0);
- object->n_exclusive--;
- }
-
- ASSERTCMP(object->n_in_progress, >, 0);
- object->n_in_progress--;
- if (object->n_in_progress == 0)
- fscache_start_operations(object);
-
ASSERTCMP(object->n_ops, >, 0);
object->n_ops--;
if (object->n_ops == 0)
@@ -413,23 +448,14 @@ void fscache_operation_gc(struct work_struct *work)
spin_unlock(&cache->op_gc_list_lock);
object = op->object;
+ spin_lock(&object->lock);
_debug("GC DEFERRED REL OBJ%x OP%x",
object->debug_id, op->debug_id);
fscache_stat(&fscache_n_op_gc);
ASSERTCMP(atomic_read(&op->usage), ==, 0);
-
- spin_lock(&object->lock);
- if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags)) {
- ASSERTCMP(object->n_exclusive, >, 0);
- object->n_exclusive--;
- }
-
- ASSERTCMP(object->n_in_progress, >, 0);
- object->n_in_progress--;
- if (object->n_in_progress == 0)
- fscache_start_operations(object);
+ ASSERTCMP(op->state, ==, FSCACHE_OP_ST_DEAD);
ASSERTCMP(object->n_ops, >, 0);
object->n_ops--;
@@ -437,7 +463,8 @@ void fscache_operation_gc(struct work_struct *work)
fscache_raise_event(object, FSCACHE_OBJECT_EV_CLEARED);
spin_unlock(&object->lock);
-
+ kfree(op);
+
} while (count++ < 20);
if (!list_empty(&cache->op_gc_list))
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index aaed5cd..e7e8ff4 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -162,6 +162,7 @@ static void fscache_attr_changed_op(struct fscache_operation *op)
fscache_abort_object(object);
}
+ fscache_op_complete(op);
_leave("");
}
@@ -223,6 +224,8 @@ static void fscache_release_retrieval_op(struct fscache_operation *_op)
_enter("{OP%x}", op->op.debug_id);
+ ASSERTCMP(op->n_pages, ==, 0);
+
fscache_hist(fscache_retrieval_histogram, op->start_time);
if (op->context)
fscache_put_context(op->op.object->cookie, op->context);
@@ -320,6 +323,11 @@ static int fscache_wait_for_retrieval_activation(struct fscache_object *object,
_debug("<<< GO");
check_if_dead:
+ if (op->op.state == FSCACHE_OP_ST_CANCELLED) {
+ fscache_stat(stat_object_dead);
+ kleave(" = -ENOBUFS [cancelled]");
+ return -ENOBUFS;
+ }
if (unlikely(fscache_object_is_dead(object))) {
fscache_stat(stat_object_dead);
return -ENOBUFS;
@@ -364,6 +372,7 @@ int __fscache_read_or_alloc_page(struct fscache_cookie *cookie,
_leave(" = -ENOMEM");
return -ENOMEM;
}
+ op->n_pages = 1;
spin_lock(&cookie->lock);
@@ -378,7 +387,7 @@ int __fscache_read_or_alloc_page(struct fscache_cookie *cookie,
set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags);
if (fscache_submit_op(object, &op->op) < 0)
- goto nobufs_unlock;
+ goto nobufs_unlock_dec;
spin_unlock(&cookie->lock);
fscache_stat(&fscache_n_retrieval_ops);
@@ -425,6 +434,8 @@ error:
_leave(" = %d", ret);
return ret;
+nobufs_unlock_dec:
+ atomic_dec(&object->n_reads);
nobufs_unlock:
spin_unlock(&cookie->lock);
kfree(op);
@@ -482,6 +493,7 @@ int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
op = fscache_alloc_retrieval(mapping, end_io_func, context);
if (!op)
return -ENOMEM;
+ op->n_pages = *nr_pages;
spin_lock(&cookie->lock);
@@ -491,10 +503,10 @@ int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
struct fscache_object, cookie_link);
atomic_inc(&object->n_reads);
- set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags);
+ __set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags);
if (fscache_submit_op(object, &op->op) < 0)
- goto nobufs_unlock;
+ goto nobufs_unlock_dec;
spin_unlock(&cookie->lock);
ASSERTCMP(object->cookie, ==, cookie);
@@ -562,6 +574,8 @@ error:
_leave(" = %d", ret);
return ret;
+nobufs_unlock_dec:
+ atomic_dec(&object->n_reads);
nobufs_unlock:
spin_unlock(&cookie->lock);
kfree(op);
@@ -604,6 +618,7 @@ int __fscache_alloc_page(struct fscache_cookie *cookie,
op = fscache_alloc_retrieval(page->mapping, NULL, NULL);
if (!op)
return -ENOMEM;
+ op->n_pages = 1;
spin_lock(&cookie->lock);
@@ -717,6 +732,7 @@ static void fscache_write_op(struct fscache_operation *_op)
fscache_end_page_write(object, page);
if (ret < 0) {
fscache_abort_object(object);
+ fscache_op_complete(&op->op);
} else {
fscache_enqueue_operation(&op->op);
}
@@ -731,6 +747,7 @@ superseded:
spin_unlock(&cookie->stores_lock);
clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags);
spin_unlock(&object->lock);
+ fscache_op_complete(&op->op);
_leave("");
}
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index c593d57..22a3bd7 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -75,6 +75,16 @@ extern wait_queue_head_t fscache_cache_cleared_wq;
typedef void (*fscache_operation_release_t)(struct fscache_operation *op);
typedef void (*fscache_operation_processor_t)(struct fscache_operation *op);
+enum fscache_operation_state {
+ FSCACHE_OP_ST_BLANK, /* Op is not yet submitted */
+ FSCACHE_OP_ST_INITIALISED, /* Op is initialised */
+ FSCACHE_OP_ST_PENDING, /* Op is blocked from running */
+ FSCACHE_OP_ST_IN_PROGRESS, /* Op is in progress */
+ FSCACHE_OP_ST_COMPLETE, /* Op is complete */
+ FSCACHE_OP_ST_CANCELLED, /* Op has been cancelled */
+ FSCACHE_OP_ST_DEAD /* Op is now dead */
+};
+
struct fscache_operation {
struct work_struct work; /* record for async ops */
struct list_head pend_link; /* link in object->pending_ops */
@@ -86,10 +96,10 @@ struct fscache_operation {
#define FSCACHE_OP_MYTHREAD 0x0002 /* - processing is done be issuing thread, not pool */
#define FSCACHE_OP_WAITING 4 /* cleared when op is woken */
#define FSCACHE_OP_EXCLUSIVE 5 /* exclusive op, other ops must wait */
-#define FSCACHE_OP_DEAD 6 /* op is now dead */
-#define FSCACHE_OP_DEC_READ_CNT 7 /* decrement object->n_reads on destruction */
-#define FSCACHE_OP_KEEP_FLAGS 0xc0 /* flags to keep when repurposing an op */
+#define FSCACHE_OP_DEC_READ_CNT 6 /* decrement object->n_reads on destruction */
+#define FSCACHE_OP_KEEP_FLAGS 0x0070 /* flags to keep when repurposing an op */
+ enum fscache_operation_state state;
atomic_t usage;
unsigned debug_id; /* debugging ID */
@@ -106,6 +116,7 @@ extern atomic_t fscache_op_debug_id;
extern void fscache_op_work_func(struct work_struct *work);
extern void fscache_enqueue_operation(struct fscache_operation *);
+extern void fscache_op_complete(struct fscache_operation *);
extern void fscache_put_operation(struct fscache_operation *);
/**
@@ -122,6 +133,7 @@ static inline void fscache_operation_init(struct fscache_operation *op,
{
INIT_WORK(&op->work, fscache_op_work_func);
atomic_set(&op->usage, 1);
+ op->state = FSCACHE_OP_ST_INITIALISED;
op->debug_id = atomic_inc_return(&fscache_op_debug_id);
op->processor = processor;
op->release = release;
@@ -138,6 +150,7 @@ struct fscache_retrieval {
void *context; /* netfs read context (pinned) */
struct list_head to_do; /* list of things to be done by the backend */
unsigned long start_time; /* time at which retrieval started */
+ unsigned n_pages; /* number of pages to be retrieved */
};
typedef int (*fscache_page_retrieval_func_t)(struct fscache_retrieval *op,
@@ -174,8 +187,22 @@ static inline void fscache_enqueue_retrieval(struct fscache_retrieval *op)
}
/**
+ * fscache_retrieval_complete - Record (partial) completion of a retrieval
+ * @op: The retrieval operation affected
+ * @n_pages: The number of pages to account for
+ */
+static inline void fscache_retrieval_complete(struct fscache_retrieval *op,
+ int n_pages)
+{
+ op->n_pages -= n_pages;
+ if (op->n_pages <= 0)
+ fscache_op_complete(&op->op);
+}
+
+/**
* fscache_put_retrieval - Drop a reference to a retrieval operation
* @op: The retrieval operation affected
+ * @n_pages: The number of pages to account for
*
* Drop a reference to a retrieval operation.
*/
@@ -333,10 +360,10 @@ struct fscache_object {
int debug_id; /* debugging ID */
int n_children; /* number of child objects */
- int n_ops; /* number of ops outstanding on object */
+ int n_ops; /* number of extant ops on object */
int n_obj_ops; /* number of object ops outstanding on object */
int n_in_progress; /* number of ops in progress */
- int n_exclusive; /* number of exclusive ops queued */
+ int n_exclusive; /* number of exclusive ops queued or in progress */
atomic_t n_reads; /* number of read ops in progress */
spinlock_t lock; /* state and operations lock */
* [PATCH 10/13] FS-Cache: Provide proper invalidation
From: David Howells @ 2011-09-29 14:47 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-cachefs, linux-fsdevel, linux-nfs
Provide a proper invalidation method rather than relying on the netfs retiring
the cookie it has and getting a new one. The problem with this is that it isn't
easy for the netfs to make sure that it has completed/cancelled all its
outstanding storage and retrieval operations on the cookie it is retiring.
Instead, have the cache provide an invalidation method that will cancel or wait
for all currently outstanding operations before invalidating the cache, and
will cause new operations to queue up behind that. Whilst invalidation is in
progress, some requests will be rejected until the cache can stack a barrier on
the operation queue to cause new operations to be deferred behind it.
Signed-off-by: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
Documentation/filesystems/caching/backend-api.txt | 12 ++++
Documentation/filesystems/caching/netfs-api.txt | 46 +++++++++++--
Documentation/filesystems/caching/object.txt | 23 ++++---
fs/fscache/cookie.c | 60 ++++++++++++++++++
fs/fscache/internal.h | 10 +++
fs/fscache/object.c | 72 +++++++++++++++++++++
fs/fscache/operation.c | 32 +++++++++
fs/fscache/page.c | 51 +++++++++++++++
fs/fscache/stats.c | 11 +++
include/linux/fscache-cache.h | 8 ++
include/linux/fscache.h | 38 +++++++++++
11 files changed, 345 insertions(+), 18 deletions(-)
diff --git a/Documentation/filesystems/caching/backend-api.txt b/Documentation/filesystems/caching/backend-api.txt
index f4769b9..d78bab9 100644
--- a/Documentation/filesystems/caching/backend-api.txt
+++ b/Documentation/filesystems/caching/backend-api.txt
@@ -308,6 +308,18 @@ performed on the denizens of the cache. These are held in a structure of type:
obtained by calling object->cookie->def->get_aux()/get_attr().
+ (*) Invalidate data object [mandatory]:
+
+ int (*invalidate_object)(struct fscache_operation *op)
+
+ This is called to invalidate a data object (as pointed to by op->object).
+ All the data stored for this object should be discarded and an
+ attr_changed operation should be performed. The caller will follow up
+ with an object update operation.
+
+ fscache_op_complete() must be called on op before returning.
+
+
(*) Discard object [mandatory]:
void (*drop_object)(struct fscache_object *object)
diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt
index 7cc6bf2..97e6c0e 100644
--- a/Documentation/filesystems/caching/netfs-api.txt
+++ b/Documentation/filesystems/caching/netfs-api.txt
@@ -35,8 +35,9 @@ This document contains the following sections:
(12) Index and data file update
(13) Miscellaneous cookie operations
(14) Cookie unregistration
- (15) Index and data file invalidation
- (16) FS-Cache specific page flags.
+ (15) Index invalidation
+ (16) Data file invalidation
+ (17) FS-Cache specific page flags.
=============================
@@ -767,13 +768,42 @@ the cookies for "child" indices, objects and pages have been relinquished
first.
-================================
-INDEX AND DATA FILE INVALIDATION
-================================
+==================
+INDEX INVALIDATION
+==================
+
+There is no direct way to invalidate an index subtree. To do this, the caller
+should relinquish and retire the cookie they have, and then acquire a new one.
+
+
+======================
+DATA FILE INVALIDATION
+======================
+
+Sometimes it will be necessary to invalidate an object that contains data.
+Typically this will be necessary when the server tells the netfs of a foreign
+change - at which point the netfs has to throw away all the state it had for an
+inode and reload from the server.
+
+To indicate that a cache object should be invalidated, the following function
+can be called:
+
+ void fscache_invalidate(struct fscache_cookie *cookie);
+
+This can be called with spinlocks held as it defers the work to a thread pool.
+All extant storage, retrieval and attribute change ops at this point are
+cancelled and discarded. Some future operations will be rejected until the
+cache has had a chance to insert a barrier in the operations queue. After
+that, operations will be queued again behind the invalidation operation.
+
+The invalidation operation will perform an attribute change operation and an
+auxiliary data update operation as it is very likely these will have changed.
+
+Using the following function, the netfs can wait for the invalidation operation
+to have reached a point at which it can start submitting ordinary operations
+once again:
-There is no direct way to invalidate an index subtree or a data file. To do
-this, the caller should relinquish and retire the cookie they have, and then
-acquire a new one.
+ void fscache_wait_on_invalidate(struct fscache_cookie *cookie);
===========================
diff --git a/Documentation/filesystems/caching/object.txt b/Documentation/filesystems/caching/object.txt
index e8b0a35..4a67070 100644
--- a/Documentation/filesystems/caching/object.txt
+++ b/Documentation/filesystems/caching/object.txt
@@ -216,7 +216,14 @@ servicing netfs requests:
The normal running state. In this state, requests the netfs makes will be
passed on to the cache.
- (6) State FSCACHE_OBJECT_UPDATING.
+ (6) State FSCACHE_OBJECT_INVALIDATING.
+
+ The object is undergoing invalidation. On entering this state, the
+ object discards all pending read, write and attribute change operations
+ as it is going to clear out the cache entirely and reinitialise it. It
+ will then continue to the FSCACHE_OBJECT_UPDATING state.
+
+ (7) State FSCACHE_OBJECT_UPDATING.
The state machine comes here to update the object in the cache from the
netfs's records. This involves updating the auxiliary data that is used
@@ -225,13 +232,13 @@ servicing netfs requests:
And there are terminal states in which an object cleans itself up, deallocates
memory and potentially deletes stuff from disk:
- (7) State FSCACHE_OBJECT_LC_DYING.
+ (8) State FSCACHE_OBJECT_LC_DYING.
The object comes here if it is dying because of a lookup or creation
error. This would be due to a disk error or system error of some sort.
Temporary data is cleaned up, and the parent is released.
- (8) State FSCACHE_OBJECT_DYING.
+ (9) State FSCACHE_OBJECT_DYING.
The object comes here if it is dying due to an error, because its parent
cookie has been relinquished by the netfs or because the cache is being
@@ -241,27 +248,27 @@ memory and potentially deletes stuff from disk:
can destroy themselves. This object waits for all its children to go away
before advancing to the next state.
- (9) State FSCACHE_OBJECT_ABORT_INIT.
+(10) State FSCACHE_OBJECT_ABORT_INIT.
The object comes to this state if it was waiting on its parent in
FSCACHE_OBJECT_INIT, but its parent died. The object will destroy itself
so that the parent may proceed from the FSCACHE_OBJECT_DYING state.
-(10) State FSCACHE_OBJECT_RELEASING.
-(11) State FSCACHE_OBJECT_RECYCLING.
+(11) State FSCACHE_OBJECT_RELEASING.
+(12) State FSCACHE_OBJECT_RECYCLING.
The object comes to one of these two states when dying once it is rid of
all its children, if it is dying because the netfs relinquished its
cookie. In the first state, the cached data is expected to persist, and
in the second it will be deleted.
-(12) State FSCACHE_OBJECT_WITHDRAWING.
+(13) State FSCACHE_OBJECT_WITHDRAWING.
The object transits to this state if the cache decides it wants to
withdraw the object from service, perhaps to make space, but also due to
error or just because the whole cache is being withdrawn.
-(13) State FSCACHE_OBJECT_DEAD.
+(14) State FSCACHE_OBJECT_DEAD.
The object transits to this state when the in-memory object record is
ready to be deleted. The object processor shouldn't ever see an object in
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 66be9ec..8dcb114 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -370,6 +370,66 @@ cant_attach_object:
}
/*
+ * Invalidate an object. Callable with spinlocks held.
+ */
+void __fscache_invalidate(struct fscache_cookie *cookie)
+{
+ struct fscache_object *object;
+
+ _enter("{%s}", cookie->def->name);
+
+ fscache_stat(&fscache_n_invalidates);
+
+ /* Only permit invalidation of data files. Invalidating an index will
+ * require the caller to release all its attachments to the tree rooted
+ * there, and if it's doing that, it may as well just retire the
+ * cookie.
+ */
+ ASSERTCMP(cookie->def->type, ==, FSCACHE_COOKIE_TYPE_DATAFILE);
+
+ /* We will be updating the cookie too. */
+ BUG_ON(!cookie->def->get_aux);
+
+ /* If there's an object, we tell the object state machine to handle the
+ * invalidation on our behalf, otherwise there's nothing to do.
+ */
+ if (!hlist_empty(&cookie->backing_objects)) {
+ spin_lock(&cookie->lock);
+
+ if (!hlist_empty(&cookie->backing_objects) &&
+ !test_and_set_bit(FSCACHE_COOKIE_INVALIDATING,
+ &cookie->flags)) {
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object,
+ cookie_link);
+ if (object->state < FSCACHE_OBJECT_DYING)
+ fscache_raise_event(
+ object, FSCACHE_OBJECT_EV_INVALIDATE);
+ }
+
+ spin_unlock(&cookie->lock);
+ }
+
+ _leave("");
+}
+EXPORT_SYMBOL(__fscache_invalidate);
+
+/*
+ * Wait for object invalidation to complete.
+ */
+void __fscache_wait_on_invalidate(struct fscache_cookie *cookie)
+{
+ _enter("%p", cookie);
+
+ wait_on_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING,
+ fscache_wait_bit_interruptible,
+ TASK_UNINTERRUPTIBLE);
+
+ _leave("");
+}
+EXPORT_SYMBOL(__fscache_wait_on_invalidate);
+
+/*
* update the index entries backing a cookie
*/
void __fscache_update_cookie(struct fscache_cookie *cookie)
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index f6aad48..c811793 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -122,11 +122,17 @@ extern int fscache_submit_exclusive_op(struct fscache_object *,
extern int fscache_submit_op(struct fscache_object *,
struct fscache_operation *);
extern int fscache_cancel_op(struct fscache_operation *);
+extern void fscache_cancel_all_ops(struct fscache_object *);
extern void fscache_abort_object(struct fscache_object *);
extern void fscache_start_operations(struct fscache_object *);
extern void fscache_operation_gc(struct work_struct *);
/*
+ * page.c
+ */
+extern void fscache_invalidate_writes(struct fscache_cookie *);
+
+/*
* proc.c
*/
#ifdef CONFIG_PROC_FS
@@ -205,6 +211,9 @@ extern atomic_t fscache_n_acquires_ok;
extern atomic_t fscache_n_acquires_nobufs;
extern atomic_t fscache_n_acquires_oom;
+extern atomic_t fscache_n_invalidates;
+extern atomic_t fscache_n_invalidates_run;
+
extern atomic_t fscache_n_updates;
extern atomic_t fscache_n_updates_null;
extern atomic_t fscache_n_updates_run;
@@ -237,6 +246,7 @@ extern atomic_t fscache_n_cop_alloc_object;
extern atomic_t fscache_n_cop_lookup_object;
extern atomic_t fscache_n_cop_lookup_complete;
extern atomic_t fscache_n_cop_grab_object;
+extern atomic_t fscache_n_cop_invalidate_object;
extern atomic_t fscache_n_cop_update_object;
extern atomic_t fscache_n_cop_drop_object;
extern atomic_t fscache_n_cop_put_object;
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 773bc79..80b5491 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -14,6 +14,7 @@
#define FSCACHE_DEBUG_LEVEL COOKIE
#include <linux/module.h>
+#include <linux/slab.h>
#include "internal.h"
const char *fscache_object_states[FSCACHE_OBJECT__NSTATES] = {
@@ -22,6 +23,7 @@ const char *fscache_object_states[FSCACHE_OBJECT__NSTATES] = {
[FSCACHE_OBJECT_CREATING] = "OBJECT_CREATING",
[FSCACHE_OBJECT_AVAILABLE] = "OBJECT_AVAILABLE",
[FSCACHE_OBJECT_ACTIVE] = "OBJECT_ACTIVE",
+ [FSCACHE_OBJECT_INVALIDATING] = "OBJECT_INVALIDATING",
[FSCACHE_OBJECT_UPDATING] = "OBJECT_UPDATING",
[FSCACHE_OBJECT_DYING] = "OBJECT_DYING",
[FSCACHE_OBJECT_LC_DYING] = "OBJECT_LC_DYING",
@@ -39,6 +41,7 @@ const char fscache_object_states_short[FSCACHE_OBJECT__NSTATES][5] = {
[FSCACHE_OBJECT_CREATING] = "CRTN",
[FSCACHE_OBJECT_AVAILABLE] = "AVBL",
[FSCACHE_OBJECT_ACTIVE] = "ACTV",
+ [FSCACHE_OBJECT_INVALIDATING] = "INVL",
[FSCACHE_OBJECT_UPDATING] = "UPDT",
[FSCACHE_OBJECT_DYING] = "DYNG",
[FSCACHE_OBJECT_LC_DYING] = "LCDY",
@@ -54,6 +57,7 @@ static void fscache_put_object(struct fscache_object *);
static void fscache_initialise_object(struct fscache_object *);
static void fscache_lookup_object(struct fscache_object *);
static void fscache_object_available(struct fscache_object *);
+static void fscache_invalidate_object(struct fscache_object *);
static void fscache_release_object(struct fscache_object *);
static void fscache_withdraw_object(struct fscache_object *);
static void fscache_enqueue_dependents(struct fscache_object *);
@@ -79,6 +83,15 @@ static inline void fscache_done_parent_op(struct fscache_object *object)
}
/*
+ * Notify netfs of invalidation completion.
+ */
+static inline void fscache_invalidation_complete(struct fscache_cookie *cookie)
+{
+ if (test_and_clear_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
+ wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
+}
+
+/*
* process events that have been sent to an object's state machine
* - initiates parent lookup
* - does object lookup
@@ -125,6 +138,16 @@ static void fscache_object_state_machine(struct fscache_object *object)
case FSCACHE_OBJECT_ACTIVE:
goto active_transit;
+ /* Invalidate an object on disk */
+ case FSCACHE_OBJECT_INVALIDATING:
+ clear_bit(FSCACHE_OBJECT_EV_INVALIDATE, &object->events);
+ fscache_stat(&fscache_n_invalidates_run);
+ fscache_stat(&fscache_n_cop_invalidate_object);
+ fscache_invalidate_object(object);
+ fscache_stat_d(&fscache_n_cop_invalidate_object);
+ fscache_raise_event(object, FSCACHE_OBJECT_EV_UPDATE);
+ goto active_transit;
+
/* update the object metadata on disk */
case FSCACHE_OBJECT_UPDATING:
clear_bit(FSCACHE_OBJECT_EV_UPDATE, &object->events);
@@ -275,6 +298,9 @@ active_transit:
case FSCACHE_OBJECT_EV_ERROR:
new_state = FSCACHE_OBJECT_DYING;
goto change_state;
+ case FSCACHE_OBJECT_EV_INVALIDATE:
+ new_state = FSCACHE_OBJECT_INVALIDATING;
+ goto change_state;
case FSCACHE_OBJECT_EV_UPDATE:
new_state = FSCACHE_OBJECT_UPDATING;
goto change_state;
@@ -679,6 +705,7 @@ static void fscache_withdraw_object(struct fscache_object *object)
if (object->cookie == cookie) {
hlist_del_init(&object->cookie_link);
object->cookie = NULL;
+ fscache_invalidation_complete(cookie);
detached = true;
}
spin_unlock(&cookie->lock);
@@ -888,3 +915,48 @@ enum fscache_checkaux fscache_check_aux(struct fscache_object *object,
return result;
}
EXPORT_SYMBOL(fscache_check_aux);
+
+/*
+ * Asynchronously invalidate an object.
+ */
+static void fscache_invalidate_object(struct fscache_object *object)
+{
+ struct fscache_operation *op;
+ struct fscache_cookie *cookie = object->cookie;
+
+ _enter("{OBJ%x}", object->debug_id);
+
+ /* Reject any new read/write ops and abort any that are pending. */
+ fscache_invalidate_writes(cookie);
+ clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags);
+ fscache_cancel_all_ops(object);
+
+ /* Now we have to wait for in-progress reads and writes */
+ op = kzalloc(sizeof(*op), GFP_KERNEL);
+ if (!op) {
+ fscache_raise_event(object, FSCACHE_OBJECT_EV_ERROR);
+ _leave(" [ENOMEM]");
+ return;
+ }
+
+ fscache_operation_init(op, object->cache->ops->invalidate_object, NULL);
+ op->flags = FSCACHE_OP_ASYNC | (1 << FSCACHE_OP_EXCLUSIVE);
+
+ spin_lock(&cookie->lock);
+ if (fscache_submit_exclusive_op(object, op) < 0)
+ BUG();
+ spin_unlock(&cookie->lock);
+ fscache_put_operation(op);
+
+ /* Once we've completed the invalidation, we know there will be no data
+ * stored in the cache and thus we can reinstate the data-check-skip
+ * optimisation.
+ */
+ set_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
+
+ /* We can allow read and write requests to come in once again. They'll
+ * queue up behind our exclusive invalidation operation.
+ */
+ fscache_invalidation_complete(cookie);
+ _leave("");
+}
diff --git a/fs/fscache/operation.c b/fs/fscache/operation.c
index 1b9c4c9..2037f03 100644
--- a/fs/fscache/operation.c
+++ b/fs/fscache/operation.c
@@ -324,6 +324,38 @@ int fscache_cancel_op(struct fscache_operation *op)
}
/*
+ * Cancel all pending operations on an object
+ */
+void fscache_cancel_all_ops(struct fscache_object *object)
+{
+ struct fscache_operation *op;
+
+ _enter("OBJ%x", object->debug_id);
+
+ spin_lock(&object->lock);
+
+ while (!list_empty(&object->pending_ops)) {
+ op = list_entry(object->pending_ops.next,
+ struct fscache_operation, pend_link);
+ fscache_stat(&fscache_n_op_cancelled);
+ list_del_init(&op->pend_link);
+
+ ASSERTCMP(op->state, ==, FSCACHE_OP_ST_PENDING);
+ op->state = FSCACHE_OP_ST_CANCELLED;
+
+ if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags))
+ object->n_exclusive--;
+ if (test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags))
+ wake_up_bit(&op->flags, FSCACHE_OP_WAITING);
+ fscache_put_operation(op);
+ cond_resched_lock(&object->lock);
+ }
+
+ spin_unlock(&object->lock);
+ _leave("");
+}
+
+/*
* Record the completion of an in-progress operation.
*/
void fscache_op_complete(struct fscache_operation *op)
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index e7e8ff4..2cccfae 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -361,6 +361,11 @@ int __fscache_read_or_alloc_page(struct fscache_cookie *cookie,
if (hlist_empty(&cookie->backing_objects))
goto nobufs;
+ if (test_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags)) {
+ kleave(" = -ENOBUFS [invalidating]");
+ return -ENOBUFS;
+ }
+
ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
ASSERTCMP(page, !=, NULL);
@@ -483,6 +488,11 @@ int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
if (hlist_empty(&cookie->backing_objects))
goto nobufs;
+ if (test_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags)) {
+ kleave(" = -ENOBUFS [invalidating]");
+ return -ENOBUFS;
+ }
+
ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
ASSERTCMP(*nr_pages, >, 0);
ASSERT(!list_empty(pages));
@@ -612,6 +622,11 @@ int __fscache_alloc_page(struct fscache_cookie *cookie,
ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
ASSERTCMP(page, !=, NULL);
+ if (test_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags)) {
+ kleave(" = -ENOBUFS [invalidating]");
+ return -ENOBUFS;
+ }
+
if (fscache_wait_for_deferred_lookup(cookie) < 0)
return -ERESTARTSYS;
@@ -752,6 +767,37 @@ superseded:
}
/*
+ * Clear the pages pending writing for invalidation
+ */
+void fscache_invalidate_writes(struct fscache_cookie *cookie)
+{
+ struct page *page;
+ void *results[16];
+ int n, i;
+
+ _enter("");
+
+ while (spin_lock(&cookie->stores_lock),
+ n = radix_tree_gang_lookup_tag(&cookie->stores, results, 0,
+ ARRAY_SIZE(results),
+ FSCACHE_COOKIE_PENDING_TAG),
+ n > 0) {
+ for (i = n - 1; i >= 0; i--) {
+ page = results[i];
+ radix_tree_delete(&cookie->stores, page->index);
+ }
+
+ spin_unlock(&cookie->stores_lock);
+
+ for (i = n - 1; i >= 0; i--)
+ page_cache_release(results[i]);
+ }
+
+ spin_unlock(&cookie->stores_lock);
+ _leave("");
+}
+
+/*
* request a page be stored in the cache
* - returns:
* -ENOMEM - out of memory, nothing done
@@ -797,6 +843,11 @@ int __fscache_write_page(struct fscache_cookie *cookie,
fscache_stat(&fscache_n_stores);
+ if (test_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags)) {
+ kleave(" = -ENOBUFS [invalidating]");
+ return -ENOBUFS;
+ }
+
op = kzalloc(sizeof(*op), GFP_NOIO | __GFP_NOMEMALLOC | __GFP_NORETRY);
if (!op)
goto nomem;
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 4765190..51cdaee 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -80,6 +80,9 @@ atomic_t fscache_n_acquires_ok;
atomic_t fscache_n_acquires_nobufs;
atomic_t fscache_n_acquires_oom;
+atomic_t fscache_n_invalidates;
+atomic_t fscache_n_invalidates_run;
+
atomic_t fscache_n_updates;
atomic_t fscache_n_updates_null;
atomic_t fscache_n_updates_run;
@@ -112,6 +115,7 @@ atomic_t fscache_n_cop_alloc_object;
atomic_t fscache_n_cop_lookup_object;
atomic_t fscache_n_cop_lookup_complete;
atomic_t fscache_n_cop_grab_object;
+atomic_t fscache_n_cop_invalidate_object;
atomic_t fscache_n_cop_update_object;
atomic_t fscache_n_cop_drop_object;
atomic_t fscache_n_cop_put_object;
@@ -168,6 +172,10 @@ static int fscache_stats_show(struct seq_file *m, void *v)
atomic_read(&fscache_n_object_created),
atomic_read(&fscache_n_object_lookups_timed_out));
+ seq_printf(m, "Invals : n=%u run=%u\n",
+ atomic_read(&fscache_n_invalidates),
+ atomic_read(&fscache_n_invalidates_run));
+
seq_printf(m, "Updates: n=%u nul=%u run=%u\n",
atomic_read(&fscache_n_updates),
atomic_read(&fscache_n_updates_null),
@@ -246,7 +254,8 @@ static int fscache_stats_show(struct seq_file *m, void *v)
atomic_read(&fscache_n_cop_lookup_object),
atomic_read(&fscache_n_cop_lookup_complete),
atomic_read(&fscache_n_cop_grab_object));
- seq_printf(m, "CacheOp: upo=%d dro=%d pto=%d atc=%d syn=%d\n",
+ seq_printf(m, "CacheOp: inv=%d upo=%d dro=%d pto=%d atc=%d syn=%d\n",
+ atomic_read(&fscache_n_cop_invalidate_object),
atomic_read(&fscache_n_cop_update_object),
atomic_read(&fscache_n_cop_drop_object),
atomic_read(&fscache_n_cop_put_object),
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 22a3bd7..29f552d 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -254,6 +254,9 @@ struct fscache_cache_ops {
/* store the updated auxiliary data on an object */
void (*update_object)(struct fscache_object *object);
+ /* Invalidate an object */
+ void (*invalidate_object)(struct fscache_operation *op);
+
/* discard the resources pinned by an object and effect retirement if
* necessary */
void (*drop_object)(struct fscache_object *object);
@@ -329,6 +332,7 @@ struct fscache_cookie {
#define FSCACHE_COOKIE_FILLING 4 /* T if filling object incrementally */
#define FSCACHE_COOKIE_UNAVAILABLE 5 /* T if cookie is unavailable (error, etc) */
#define FSCACHE_COOKIE_WAITING_ON_READS 6 /* T if cookie is waiting on reads */
+#define FSCACHE_COOKIE_INVALIDATING 7 /* T if cookie is being invalidated */
};
extern struct fscache_cookie fscache_fsdef_index;
@@ -345,6 +349,7 @@ struct fscache_object {
/* active states */
FSCACHE_OBJECT_AVAILABLE, /* cleaning up object after creation */
FSCACHE_OBJECT_ACTIVE, /* object is usable */
+ FSCACHE_OBJECT_INVALIDATING, /* object is invalidating */
FSCACHE_OBJECT_UPDATING, /* object is updating */
/* terminal states */
@@ -378,7 +383,8 @@ struct fscache_object {
#define FSCACHE_OBJECT_EV_RELEASE 4 /* T if netfs requested object release */
#define FSCACHE_OBJECT_EV_RETIRE 5 /* T if netfs requested object retirement */
#define FSCACHE_OBJECT_EV_WITHDRAW 6 /* T if cache requested object withdrawal */
-#define FSCACHE_OBJECT_EVENTS_MASK 0x7f /* mask of all events*/
+#define FSCACHE_OBJECT_EV_INVALIDATE 7 /* T if cache requested object invalidation */
+#define FSCACHE_OBJECT_EVENTS_MASK 0xff /* mask of all events*/
unsigned long flags;
#define FSCACHE_OBJECT_LOCK 0 /* T if object is busy being processed */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index f4b6353..7a08623 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -185,6 +185,8 @@ extern struct fscache_cookie *__fscache_acquire_cookie(
extern void __fscache_relinquish_cookie(struct fscache_cookie *, int);
extern void __fscache_update_cookie(struct fscache_cookie *);
extern int __fscache_attr_changed(struct fscache_cookie *);
+extern void __fscache_invalidate(struct fscache_cookie *);
+extern void __fscache_wait_on_invalidate(struct fscache_cookie *);
extern int __fscache_read_or_alloc_page(struct fscache_cookie *,
struct page *,
fscache_rw_complete_t,
@@ -390,6 +392,42 @@ int fscache_attr_changed(struct fscache_cookie *cookie)
}
/**
+ * fscache_invalidate - Notify cache that an object needs invalidation
+ * @cookie: The cookie representing the cache object
+ *
+ * Notify the cache that an object needs to be invalidated and that it
+ * should abort any retrievals or stores it is doing on the cache. The object
+ * is then marked non-caching until such time as the invalidation is complete.
+ *
+ * This can be called with spinlocks held.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+void fscache_invalidate(struct fscache_cookie *cookie)
+{
+ if (fscache_cookie_valid(cookie))
+ __fscache_invalidate(cookie);
+}
+
+/**
+ * fscache_wait_on_invalidate - Wait for invalidation to complete
+ * @cookie: The cookie representing the cache object
+ *
+ * Wait for the invalidation of an object to complete.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+void fscache_wait_on_invalidate(struct fscache_cookie *cookie)
+{
+ if (fscache_cookie_valid(cookie))
+ __fscache_wait_on_invalidate(cookie);
+}
+
+/**
* fscache_reserve_space - Reserve data space for a cached object
* @cookie: The cookie representing the cache object
* @i_size: The amount of space to be reserved
--
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 11/13] VFS: Make more complete truncate operation available to CacheFiles
2011-09-29 14:45 [RFC][PATCH 00/13] Fix FS-Cache problems David Howells
` (9 preceding siblings ...)
[not found] ` <20110929144536.5812.84405.stgit-S6HVgzuS8uM4Awkfq6JHfwNdhmdF6hFW@public.gmane.org>
@ 2011-09-29 14:47 ` David Howells
2011-09-29 14:48 ` [PATCH 12/13] CacheFiles: Implement invalidation David Howells
2011-09-29 14:48 ` [PATCH 13/13] NFS: Use FS-Cache invalidation David Howells
12 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2011-09-29 14:47 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-fsdevel, linux-nfs, linux-cachefs
Make a more complete truncate operation available to CacheFiles (including
security checks and suchlike) so that it can use this to clear invalidated
cache files.
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/open.c | 50 +++++++++++++++++++++++++++-----------------------
include/linux/fs.h | 1 +
2 files changed, 28 insertions(+), 23 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index f711921..8178e58 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -61,33 +61,22 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
return ret;
}
-static long do_sys_truncate(const char __user *pathname, loff_t length)
+long vfs_truncate(struct path *path, loff_t length)
{
- struct path path;
struct inode *inode;
- int error;
-
- error = -EINVAL;
- if (length < 0) /* sorry, but loff_t says... */
- goto out;
+ long error;
- error = user_path(pathname, &path);
- if (error)
- goto out;
- inode = path.dentry->d_inode;
+ inode = path->dentry->d_inode;
/* For directories it's -EISDIR, for other non-regulars - -EINVAL */
- error = -EISDIR;
if (S_ISDIR(inode->i_mode))
- goto dput_and_out;
-
- error = -EINVAL;
+ return -EISDIR;
if (!S_ISREG(inode->i_mode))
- goto dput_and_out;
+ return -EINVAL;
- error = mnt_want_write(path.mnt);
+ error = mnt_want_write(path->mnt);
if (error)
- goto dput_and_out;
+ goto out;
error = inode_permission(inode, MAY_WRITE);
if (error)
@@ -111,19 +100,34 @@ static long do_sys_truncate(const char __user *pathname, loff_t length)
error = locks_verify_truncate(inode, NULL, length);
if (!error)
- error = security_path_truncate(&path);
+ error = security_path_truncate(path);
if (!error)
- error = do_truncate(path.dentry, length, 0, NULL);
+ error = do_truncate(path->dentry, length, 0, NULL);
put_write_and_out:
put_write_access(inode);
mnt_drop_write_and_out:
- mnt_drop_write(path.mnt);
-dput_and_out:
- path_put(&path);
+ mnt_drop_write(path->mnt);
out:
return error;
}
+EXPORT_SYMBOL_GPL(vfs_truncate);
+
+static long do_sys_truncate(const char __user *pathname, loff_t length)
+{
+ struct path path;
+ int error;
+
+ if (length < 0) /* sorry, but loff_t says... */
+ return -EINVAL;
+
+ error = user_path(pathname, &path);
+ if (!error) {
+ error = vfs_truncate(&path, length);
+ path_put(&path);
+ }
+ return error;
+}
SYSCALL_DEFINE2(truncate, const char __user *, path, long, length)
{
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 277f497..8b2274f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2013,6 +2013,7 @@ static inline int break_lease(struct inode *inode, unsigned int mode)
/* fs/open.c */
+extern long vfs_truncate(struct path *, loff_t);
extern int do_truncate(struct dentry *, loff_t start, unsigned int time_attrs,
struct file *filp);
extern int do_fallocate(struct file *file, int mode, loff_t offset,
* [PATCH 12/13] CacheFiles: Implement invalidation
2011-09-29 14:45 [RFC][PATCH 00/13] Fix FS-Cache problems David Howells
` (10 preceding siblings ...)
2011-09-29 14:47 ` [PATCH 11/13] VFS: Make more complete truncate operation available to CacheFiles David Howells
@ 2011-09-29 14:48 ` David Howells
2011-09-29 14:48 ` [PATCH 13/13] NFS: Use FS-Cache invalidation David Howells
12 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2011-09-29 14:48 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-fsdevel, linux-nfs, linux-cachefs
Implement invalidation for CacheFiles. This is in two parts:
(1) Provide an invalidation method (which just truncates the backing file).
(2) Abort attempts to copy anything read from the backing file whilst
invalidation is in progress.
Question: CacheFiles uses truncation in a couple of places. It has been using
notify_change() rather than sys_truncate() or something similar. This means
it bypasses a bunch of checks and suchlike that it possibly should be making
(security, file locking, lease breaking, vfsmount write). Should it be using
vfs_truncate() as added by a preceding patch or should it use notify_change()
and assume that anyone poking around in the cache files on disk gets
everything they deserve?
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/cachefiles/interface.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
fs/cachefiles/rdwr.c | 5 ++++-
2 files changed, 53 insertions(+), 1 deletions(-)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 075b7a6..ef5c02d 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -442,6 +442,54 @@ truncate_failed:
}
/*
+ * Invalidate an object
+ */
+static void cachefiles_invalidate_object(struct fscache_operation *op)
+{
+ struct cachefiles_object *object;
+ struct cachefiles_cache *cache;
+ const struct cred *saved_cred;
+ struct path path;
+ uint64_t ni_size;
+ int ret;
+
+ object = container_of(op->object, struct cachefiles_object, fscache);
+ cache = container_of(object->fscache.cache,
+ struct cachefiles_cache, cache);
+
+ op->object->cookie->def->get_attr(op->object->cookie->netfs_data,
+ &ni_size);
+
+ _enter("{OBJ%x},[%llu]",
+ op->object->debug_id, (unsigned long long)ni_size);
+
+ if (object->backer) {
+ ASSERT(S_ISREG(object->backer->d_inode->i_mode));
+
+ fscache_set_store_limit(&object->fscache, ni_size);
+
+ path.dentry = object->backer;
+ path.mnt = cache->mnt;
+
+ cachefiles_begin_secure(cache, &saved_cred);
+ ret = vfs_truncate(&path, 0);
+ if (ret == 0)
+ ret = vfs_truncate(&path, ni_size);
+ cachefiles_end_secure(cache, saved_cred);
+
+ if (ret != 0) {
+ fscache_set_store_limit(&object->fscache, 0);
+ if (ret == -EIO)
+ cachefiles_io_error_obj(object,
+ "Invalidate failed");
+ }
+ }
+
+ fscache_op_complete(op);
+ _leave("");
+}
+
+/*
* dissociate a cache from all the pages it was backing
*/
static void cachefiles_dissociate_pages(struct fscache_cache *cache)
@@ -456,6 +504,7 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
.lookup_complete = cachefiles_lookup_complete,
.grab_object = cachefiles_grab_object,
.update_object = cachefiles_update_object,
+ .invalidate_object = cachefiles_invalidate_object,
.drop_object = cachefiles_drop_object,
.put_object = cachefiles_put_object,
.sync_cache = cachefiles_sync_cache,
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 4b2b821..637a27d 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -174,7 +174,10 @@ static void cachefiles_read_copier(struct fscache_operation *_op)
_debug("- copy {%lu}", monitor->back_page->index);
recheck:
- if (PageUptodate(monitor->back_page)) {
+ if (test_bit(FSCACHE_COOKIE_INVALIDATING,
+ &object->fscache.cookie->flags)) {
+ error = -ESTALE;
+ } else if (PageUptodate(monitor->back_page)) {
copy_highpage(monitor->netfs_page, monitor->back_page);
fscache_mark_page_cached(monitor->op,
monitor->netfs_page, true);
* [PATCH 13/13] NFS: Use FS-Cache invalidation
2011-09-29 14:45 [RFC][PATCH 00/13] Fix FS-Cache problems David Howells
` (11 preceding siblings ...)
2011-09-29 14:48 ` [PATCH 12/13] CacheFiles: Implement invalidation David Howells
@ 2011-09-29 14:48 ` David Howells
12 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2011-09-29 14:48 UTC (permalink / raw)
To: moseleymark, mark, jlayton, steved
Cc: linux-fsdevel, linux-nfs, linux-cachefs
Use the new FS-Cache invalidation facility from NFS to deal with foreign
changes being detected on the server rather than attempting to retire the old
cookie and get a new one.
The problem with the old method was that NFS did not wait for all outstanding
storage and retrieval ops on the cache to complete. There was no automatic
wait between the calls to ->readpages() and calls to invalidate_inode_pages2()
as the latter can only wait on locked pages that have been added to the
pagecache (which they haven't yet on entry to ->readpages()).
This was leading to oopses like the one below when an outstanding read got cut
off from its cookie by a premature release.
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
PGD 15889067 PUD 15890067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc
Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ #1064 /DG965RY
RIP: 0010:[<ffffffffa0075118>] [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
RSP: 0018:ffff8800158799e8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800070d41e0 RCX: ffff8800083dc1b0
RDX: 0000000000000000 RSI: ffff880015879960 RDI: ffff88003e627b90
RBP: ffff880015879a28 R08: 0000000000000002 R09: 0000000000000002
R10: 0000000000000001 R11: ffff880015879950 R12: ffff880015879aa4
R13: 0000000000000000 R14: ffff8800083dc158 R15: ffff880015879be8
FS: 00007f671e9d87c0(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000a8 CR3: 000000001587f000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process tar (pid: 4544, threadinfo ffff880015878000, task ffff880015875040)
Stack:
ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508
ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8
ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040
Call Trace:
[<ffffffffa00b1759>] ? nfs_fscache_wait_bit+0xd/0xd [nfs]
[<ffffffffa00b20be>] __nfs_readpages_from_fscache+0x7e/0x13f [nfs]
[<ffffffff81095fe7>] ? __alloc_pages_nodemask+0x156/0x662
[<ffffffffa0098763>] nfs_readpages+0xee/0x187 [nfs]
[<ffffffff81098a5e>] __do_page_cache_readahead+0x1be/0x267
[<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
[<ffffffff81098d7b>] ra_submit+0x1c/0x20
[<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
[<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
[<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
[<ffffffff810c22c4>] do_sync_read+0xba/0xfa
[<ffffffff810a62c9>] ? might_fault+0x4e/0x9e
[<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
[<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
[<ffffffff810c29a4>] vfs_read+0xaa/0x13a
[<ffffffff810c2a79>] sys_read+0x45/0x6c
[<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b
Reported-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---
fs/nfs/fscache.h | 20 +++++++++++++++++++-
fs/nfs/inode.c | 20 ++++++++++++++++----
fs/nfs/nfs4proc.c | 2 ++
3 files changed, 37 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index b9c572d..851fee1 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -155,6 +155,22 @@ static inline void nfs_readpage_to_fscache(struct inode *inode,
}
/*
+ * Invalidate the contents of fscache for this inode. This will not sleep.
+ */
+static inline void nfs_fscache_invalidate(struct inode *inode)
+{
+ fscache_invalidate(NFS_I(inode)->fscache);
+}
+
+/*
+ * Wait for an object to finish being invalidated.
+ */
+static inline void nfs_fscache_wait_on_invalidate(struct inode *inode)
+{
+ fscache_wait_on_invalidate(NFS_I(inode)->fscache);
+}
+
+/*
* indicate the client caching state as readable text
*/
static inline const char *nfs_server_fscache_state(struct nfs_server *server)
@@ -164,7 +180,6 @@ static inline const char *nfs_server_fscache_state(struct nfs_server *server)
return "no ";
}
-
#else /* CONFIG_NFS_FSCACHE */
static inline int nfs_fscache_register(void) { return 0; }
static inline void nfs_fscache_unregister(void) {}
@@ -213,6 +228,9 @@ static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
static inline void nfs_readpage_to_fscache(struct inode *inode,
struct page *page, int sync) {}
+
+static inline void nfs_fscache_invalidate(struct inode *inode) {}
+
static inline const char *nfs_server_fscache_state(struct nfs_server *server)
{
return "no ";
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index fe12037..24ea1f8 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -151,10 +151,12 @@ static void nfs_zap_caches_locked(struct inode *inode)
nfsi->attrtimeo_timestamp = jiffies;
memset(NFS_COOKIEVERF(inode), 0, sizeof(NFS_COOKIEVERF(inode)));
- if (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode))
+ if (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)) {
nfsi->cache_validity |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL|NFS_INO_REVAL_PAGECACHE;
- else
+ nfs_fscache_invalidate(inode);
+ } else {
nfsi->cache_validity |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL|NFS_INO_REVAL_PAGECACHE;
+ }
}
void nfs_zap_caches(struct inode *inode)
@@ -169,6 +171,7 @@ void nfs_zap_mapping(struct inode *inode, struct address_space *mapping)
if (mapping->nrpages != 0) {
spin_lock(&inode->i_lock);
NFS_I(inode)->cache_validity |= NFS_INO_INVALID_DATA;
+ nfs_fscache_invalidate(inode);
spin_unlock(&inode->i_lock);
}
}
@@ -861,7 +864,7 @@ static int nfs_invalidate_mapping(struct inode *inode, struct address_space *map
memset(nfsi->cookieverf, 0, sizeof(nfsi->cookieverf));
spin_unlock(&inode->i_lock);
nfs_inc_stats(inode, NFSIOS_DATAINVALIDATE);
- nfs_fscache_reset_inode_cookie(inode);
+ nfs_fscache_wait_on_invalidate(inode);
dfprintk(PAGECACHE, "NFS: (%s/%Ld) data cache invalidated\n",
inode->i_sb->s_id, (long long)NFS_FILEID(inode));
return 0;
@@ -926,6 +929,10 @@ static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr
i_size_write(inode, nfs_size_to_loff_t(fattr->size));
ret |= NFS_INO_INVALID_ATTR;
}
+
+ if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
+ nfs_fscache_invalidate(inode);
+
return ret;
}
@@ -1105,8 +1112,10 @@ static int nfs_post_op_update_inode_locked(struct inode *inode, struct nfs_fattr
struct nfs_inode *nfsi = NFS_I(inode);
nfsi->cache_validity |= NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE;
- if (S_ISDIR(inode->i_mode))
+ if (S_ISDIR(inode->i_mode)) {
nfsi->cache_validity |= NFS_INO_INVALID_DATA;
+ nfs_fscache_invalidate(inode);
+ }
if ((fattr->valid & NFS_ATTR_FATTR) == 0)
return 0;
return nfs_refresh_inode_locked(inode, fattr);
@@ -1398,6 +1407,9 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
(save_cache_validity & NFS_INO_REVAL_FORCED))
nfsi->cache_validity |= invalid;
+ if (invalid & NFS_INO_INVALID_DATA)
+ nfs_fscache_invalidate(inode);
+
return 0;
out_changed:
/*
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 4700fae..d6b9734 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -60,6 +60,7 @@
#include "iostat.h"
#include "callback.h"
#include "pnfs.h"
+#include "fscache.h"
#define NFSDBG_FACILITY NFSDBG_PROC
@@ -756,6 +757,7 @@ static void update_changeattr(struct inode *dir, struct nfs4_change_info *cinfo)
if (!cinfo->atomic || cinfo->before != nfsi->change_attr)
nfs_force_lookup_revalidate(dir);
nfsi->change_attr = cinfo->after;
+ nfs_fscache_invalidate(dir);
spin_unlock(&dir->i_lock);
}
Thread overview: 14+ messages
2011-09-29 14:45 [RFC][PATCH 00/13] Fix FS-Cache problems David Howells
2011-09-29 14:45 ` [PATCH 01/13] Noisefs: A predictable noise producing fs for testing things David Howells
2011-09-29 14:46 ` [PATCH 02/13] CacheFiles: Fix the marking of cached pages David Howells
2011-09-29 14:46 ` [PATCH 03/13] FS-Cache: Validate page mapping pointer value David Howells
2011-09-29 14:46 ` [PATCH 04/13] CacheFiles: Downgrade the requirements passed to the allocator David Howells
2011-09-29 14:46 ` [PATCH 05/13] FS-Cache: Check that there are no read ops when cookie relinquished David Howells
2011-09-29 14:46 ` [PATCH 06/13] FS-Cache: Check cookie is still correct in __fscache_read_or_alloc_pages() David Howells
2011-09-29 14:47 ` [PATCH 07/13] CacheFiles: Make some debugging statements conditional David Howells
2011-09-29 14:47 ` [PATCH 08/13] FS-Cache: Make cookie relinquishment wait for outstanding reads David Howells
2011-09-29 14:47 ` [PATCH 09/13] FS-Cache: Fix operation state management and accounting David Howells
2011-09-29 14:47 ` [PATCH 10/13] FS-Cache: Provide proper invalidation David Howells
2011-09-29 14:47 ` [PATCH 11/13] VFS: Make more complete truncate operation available to CacheFiles David Howells
2011-09-29 14:48 ` [PATCH 12/13] CacheFiles: Implement invalidation David Howells
2011-09-29 14:48 ` [PATCH 13/13] NFS: Use FS-Cache invalidation David Howells