linux-fsdevel.vger.kernel.org archive mirror
From: Wendy Cheng <wcheng@redhat.com>
To: wcheng@redhat.com, Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] prune_icache_sb
Date: Thu, 30 Nov 2006 11:05:32 -0500	[thread overview]
Message-ID: <456F014C.5040200@redhat.com> (raw)
In-Reply-To: <456D2259.1050306@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1715 bytes --]

How about a simple and plain change with this uploaded patch ....

The idea is this: instead of unconditionally dropping every buffer 
associated with the particular mount point (which defeats the purpose 
of page caching), the base kernel exports the "drop_pagecache_sb()" 
call so that the page cache can be trimmed on demand. More 
importantly, the call is changed to offer the choice of purging not 
arbitrary buffers but only the ones that appear unused (i_state is 0 
and i_count is zero). This should encourage filesystems to respond 
proactively to VM memory shortage if they choose to.

 From our end (cluster locks are expensive - that's why we cache them), 
one of our kernel daemons will invoke this newly exported call based 
on a set of pre-defined tunables. It is then followed by lock reclaim 
logic that trims the locks by checking the page cache associated with 
the inode (the one this cluster lock was created for). If nothing is 
attached to the inode (based on the i_mapping->nrpages count), we know 
it is a good candidate for trimming and will subsequently drop the 
lock (instead of waiting until the end of the vfs inode life cycle).

Note that I could call invalidate_inode_pages() from within our kernel 
modules to accomplish what drop_pagecache_sb() does (without coming 
here to bug people), but I don't have access to inode_lock as an 
external kernel module. So either EXPORT_SYMBOL(inode_lock) or this 
patch?

The end result (of this change) should encourage filesystems to 
actively avoid depleting too much memory, and we'll encourage our 
applications to understand clustering locality issues.

I haven't tested this out yet, though - I would appreciate some 
comments before spending more effort in this direction.

-- Wendy



[-- Attachment #2: inode_prune_sb.patch --]
[-- Type: text/x-patch, Size: 1723 bytes --]

--- linux-2.6.18/include/linux/fs.h	2006-09-19 23:42:06.000000000 -0400
+++ ups-kernel/include/linux/fs.h	2006-11-30 02:16:34.000000000 -0500
@@ -1625,7 +1625,8 @@ extern void remove_inode_hash(struct ino
 static inline void insert_inode_hash(struct inode *inode) {
 	__insert_inode_hash(inode, inode->i_ino);
 }
-
+extern void drop_pagecache_sb(struct super_block *sb, int nr_goal);
+
 extern struct file * get_empty_filp(void);
 extern void file_move(struct file *f, struct list_head *list);
 extern void file_kill(struct file *f);
--- linux-2.6.18/fs/drop_caches.c	2006-09-19 23:42:06.000000000 -0400
+++ ups-kernel/fs/drop_caches.c	2006-11-30 03:36:11.000000000 -0500
@@ -12,13 +12,20 @@
 /* A global variable is a bit ugly, but it keeps the code simple */
 int sysctl_drop_caches;
 
-static void drop_pagecache_sb(struct super_block *sb)
+void drop_pagecache_sb(struct super_block *sb, int nr_goal)
 {
 	struct inode *inode;
+	int nr_count = 0;
 
 	spin_lock(&inode_lock);
 	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		if (inode->i_state & (I_FREEING|I_WILL_FREE))
+		if (nr_goal) {
+			if (nr_goal == nr_count)
+				break;
+			if (inode->i_state || atomic_read(&inode->i_count))
+				continue;
+			nr_count++;
+		} else if (inode->i_state & (I_FREEING|I_WILL_FREE))
 			continue;
 		invalidate_inode_pages(inode->i_mapping);
 	}
@@ -36,7 +43,7 @@ restart:
 		spin_unlock(&sb_lock);
 		down_read(&sb->s_umount);
 		if (sb->s_root)
-			drop_pagecache_sb(sb);
+			drop_pagecache_sb(sb, 0);
 		up_read(&sb->s_umount);
 		spin_lock(&sb_lock);
 		if (__put_super_and_need_restart(sb))
@@ -66,3 +73,6 @@ int drop_caches_sysctl_handler(ctl_table
 	}
 	return 0;
 }
+
+EXPORT_SYMBOL(drop_pagecache_sb);
+

