From: Chuck Lever <chuck.lever@oracle.com>
To: Steve Dickson <SteveD@redhat.com>
Cc: Andrew Morton <akpm@osdl.org>,
Trond Myklebust <Trond.Myklebust@netapp.com>,
linux-mm@kvack.org
Subject: Re: Checking page_count(page) in invalidate_complete_page
Date: Mon, 02 Oct 2006 22:14:50 -0400
Message-ID: <4521C79A.6090102@oracle.com>
In-Reply-To: <45216233.5010602@RedHat.com>
[-- Attachment #1: Type: text/plain, Size: 1298 bytes --]
Steve Dickson wrote:
> Andrew Morton wrote:
>>
>> This is our user's data we're talking about here.
> Point...
>
>>
>> If that printk comes out then we need to fix the kernel so that it no
>> longer wants to print that printk. We don't want to just hide it.
>
> I'm concerned about the printk popping when we are flushing the
> readdir cache (i.e. stale data) and either flooding the console
> with a ton of messages (basically bringing a system to its knees for
> no good reason) or scaring the hell out of people by saying we have a
> major problem when in reality we are just flushing stale data...
>
> So I definitely agree the printk should be there and be on by default,
> but I do think it would be a good idea to have a way to turn it off
> if need be...
[ Sorry for the attachment... does anyone know how to include a diff
inline with Thunderbird? ]
The attached patch is my suggestion for reporting the cache invalidation
failure from within the NFS client. Please review and comment. My
testing with this patch applied has not triggered a single message, but
I haven't tried any memory exhaustion scenarios.
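The reporting pattern the patch adds can be modelled in userspace. In this sketch, which is a simplification and not kernel code, invalidation skips any page that still holds a reference besides the page cache's own (standing in for a racing vmscan or direct I/O user) and returns a negative errno; the caller then bumps a failure counter and logs, much as nfs_invalidate_mapping does. The model_* names are hypothetical, and -EBUSY is an arbitrary errno chosen for the model.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for struct page / address_space. */
struct model_page {
    int cached;    /* 1 while the page is in the cache */
    int refcount;  /* 1 == only the page cache holds it */
};

struct model_mapping {
    struct model_page pages[4];
    unsigned long invalidate_failed;  /* models NFSIOS_INVALIDATEFAILED */
};

/*
 * Drop every cached page. A page someone else still holds
 * (refcount > 1) is left in place and the call reports a
 * negative errno (-EBUSY is an assumption of this model).
 */
static int model_invalidate_all(struct model_mapping *m)
{
    int ret = 0;

    for (int i = 0; i < 4; i++) {
        struct model_page *p = &m->pages[i];

        if (!p->cached)
            continue;
        if (p->refcount > 1) {
            ret = -EBUSY;  /* stale data stays cached */
            continue;
        }
        p->cached = 0;
    }
    return ret;
}

/* Caller side: check the result, bump a counter, and log. */
static int model_invalidate_mapping(struct model_mapping *m)
{
    int result = model_invalidate_all(m);

    if (result < 0) {
        m->invalidate_failed++;
        fprintf(stderr, "model: error %d invalidating pages\n", result);
    }
    return result;
}
```

A second call after the extra reference is dropped succeeds and leaves the counter unchanged, so the counter records only real failures.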
I honestly doubt that we will see log floods. The original problem that
was causing stale data to remain cached has been addressed. The reclaim
race will almost certainly be rare.
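If log floods did turn out to be a problem, the console message could be throttled and made switchable, as Steve suggests, without losing the statistics counter. The following is a minimal userspace sketch under that assumption: report_limit, report_allowed and report_invalidate_failure are hypothetical names, and the caller passes the current time in seconds so the logic stays deterministic. Inside the kernel, the existing printk_ratelimit() helper plays a similar role.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical knobs: an on/off switch plus a burst limit per interval. */
struct report_limit {
    bool enabled;              /* lets an admin silence the message */
    unsigned int burst;        /* messages allowed per interval */
    long interval;             /* interval length, in seconds */
    long window_start;         /* start of the current interval */
    unsigned int printed;      /* messages printed in this interval */
    unsigned long suppressed;  /* messages dropped so far */
};

/* Returns true if the caller may print now; 'now' is in seconds. */
static bool report_allowed(struct report_limit *rl, long now)
{
    if (!rl->enabled) {
        rl->suppressed++;
        return false;
    }
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;  /* open a new interval */
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->suppressed++;
    return false;
}

static void report_invalidate_failure(struct report_limit *rl, long now,
                                      int error)
{
    if (report_allowed(rl, now))
        fprintf(stderr, "NFS: error %d invalidating cache\n", error);
}
```

Only the printed message is throttled here; a counter like NFSIOS_INVALIDATEFAILED would still be bumped on every failure, so the per-mount statistics keep an exact count.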
[-- Attachment #2: nfs-check-return-codes.diff --]
[-- Type: text/plain, Size: 5097 bytes --]
NFS: Add return code checks for page invalidation
Print a warning if the page invalidation functions don't behave as
expected. A BUG is probably overkill here since the client's internal data
structures will remain consistent.
We're trying to catch cases where invalidating an inode's page cache races
with vmscan or direct I/O, resulting in stale data remaining in the page
cache.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfs/dir.c | 34 +++++++++++++++++++++++++++++-----
fs/nfs/direct.c | 2 +-
fs/nfs/inode.c | 25 +++++++++++++++++++++++--
fs/nfs/iostat.h | 1 +
include/linux/nfs_fs.h | 1 +
5 files changed, 55 insertions(+), 8 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 7432f1a..0bb1a42 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -156,6 +156,32 @@ typedef struct {
int error;
} nfs_readdir_descriptor_t;
+/*
+ * Trim off all pages past page zero. This ensures consistent page
+ * alignment of cached data.
+ *
+ * NB: This assumes we have exclusive access to this mapping either
+ * through inode->i_mutex or some other mechanism.
+ */
+static void nfs_truncate_directory_cache(struct inode *inode)
+{
+ int result;
+
+ dfprintk(DIRCACHE, "NFS: %s: truncating directory (%s/%Ld)\n",
+ __FUNCTION__, inode->i_sb->s_id,
+ (long long)NFS_FILEID(inode));
+
+ result = invalidate_inode_pages2_range(inode->i_mapping,
+ 1, -1);
+ if (unlikely(result < 0)) {
+ nfs_inc_stats(inode, NFSIOS_INVALIDATEFAILED);
+ printk(KERN_ERR
+ "NFS: error %d invalidating cache for dir (%s/%Ld)\n",
+ result, inode->i_sb->s_id,
+ (long long)NFS_FILEID(inode));
+ }
+}
+
/* Now we cache directories properly, by stuffing the dirent
* data directly in the page cache.
*
@@ -199,12 +225,10 @@ int nfs_readdir_filler(nfs_readdir_descr
spin_lock(&inode->i_lock);
NFS_I(inode)->cache_validity |= NFS_INO_INVALID_ATIME;
spin_unlock(&inode->i_lock);
- /* Ensure consistent page alignment of the data.
- * Note: assumes we have exclusive access to this mapping either
- * through inode->i_mutex or some other mechanism.
- */
+
if (page->index == 0)
- invalidate_inode_pages2_range(inode->i_mapping, PAGE_CACHE_SIZE, -1);
+ nfs_truncate_directory_cache(inode);
+
unlock_page(page);
return 0;
error:
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 377839b..fe69c39 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -823,7 +823,7 @@ ssize_t nfs_file_direct_write(struct kio
* occur before the writes complete. Kind of racey.
*/
if (mapping->nrpages)
- invalidate_inode_pages2(mapping);
+ nfs_invalidate_mapping(mapping->host, mapping);
if (retval > 0)
iocb->ki_pos = pos + retval;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index bc9376c..e1cf978 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -657,6 +657,27 @@ int nfs_revalidate_inode(struct nfs_serv
}
/**
+ * nfs_invalidate_mapping - Invalidate the inode's page cache
+ * @inode - pointer to host inode
+ * @mapping - pointer to mapping
+ */
+void nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping)
+{
+ int result;
+
+ nfs_inc_stats(inode, NFSIOS_DATAINVALIDATE);
+
+ result = invalidate_inode_pages2(mapping);
+ if (unlikely(result < 0)) {
+ nfs_inc_stats(inode, NFSIOS_INVALIDATEFAILED);
+ printk(KERN_ERR
+ "NFS: error %d invalidating pages for inode (%s/%Ld)\n",
+ result, inode->i_sb->s_id,
+ (long long)NFS_FILEID(inode));
+ }
+}
+
+/**
* nfs_revalidate_mapping - Revalidate the pagecache
* @inode - pointer to host inode
* @mapping - pointer to mapping
@@ -673,10 +694,10 @@ int nfs_revalidate_mapping(struct inode
ret = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
- nfs_inc_stats(inode, NFSIOS_DATAINVALIDATE);
if (S_ISREG(inode->i_mode))
nfs_sync_mapping(mapping);
- invalidate_inode_pages2(mapping);
+
+ nfs_invalidate_mapping(inode, mapping);
spin_lock(&inode->i_lock);
nfsi->cache_validity &= ~NFS_INO_INVALID_DATA;
diff --git a/fs/nfs/iostat.h b/fs/nfs/iostat.h
index 6350ecb..df41150 100644
--- a/fs/nfs/iostat.h
+++ b/fs/nfs/iostat.h
@@ -104,6 +104,7 @@ enum nfs_stat_eventcounters {
NFSIOS_SHORTREAD,
NFSIOS_SHORTWRITE,
NFSIOS_DELAY,
+ NFSIOS_INVALIDATEFAILED,
__NFSIOS_COUNTSMAX,
};
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 98c9b9f..dc3cac3 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -306,6 +306,7 @@ extern int nfs_attribute_timeout(struct
extern int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode);
extern int __nfs_revalidate_inode(struct nfs_server *, struct inode *);
extern int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping);
+extern void nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping);
extern int nfs_setattr(struct dentry *, struct iattr *);
extern void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr);
extern void nfs_begin_attr_update(struct inode *);
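In nfs_invalidate_mapping the error test has to keep the comparison inside unlikely(). The kernel defines unlikely(x) as __builtin_expect(!!(x), 0), so unlikely(result) evaluates to 0 or 1 and "unlikely(result) < 0" can never be true. The userspace sketch below copies that macro definition (it needs GCC or Clang for __builtin_expect) to show the difference; failed_wrong and failed_right are hypothetical helpers.

```c
/*
 * Userspace copy of the kernel's branch hint. The !! collapses any
 * nonzero value to 1, so the macro only ever yields 0 or 1.
 */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Wrong: unlikely(result) is 0 or 1, so "< 0" never fires. */
static int failed_wrong(int result)
{
    return unlikely(result) < 0;
}

/* Right: the comparison is evaluated inside the hint. */
static int failed_right(int result)
{
    return unlikely(result < 0);
}
```

With result == -5, failed_wrong() reports no failure while failed_right() reports one, which is why a misplaced parenthesis would silently suppress the printk.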