linux-fsdevel.vger.kernel.org archive mirror
From: Wendy Cheng <wcheng@redhat.com>
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH] prune_icache_sb
Date: Wed, 22 Nov 2006 16:35:07 -0500
Message-ID: <4564C28B.30604@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1692 bytes --]

There seems to be a need to prune inode cache entries for specific
mount points (per VFS superblock) because of performance problems seen
after some I/O-intensive commands ("rsync", for example). The problem is
particularly serious for one of our kernel modules, which caches its
(cluster) locks based on the VFS inode implementation. These locks are
created by the inode creation call and purged when s_op->clear_inode()
is invoked. On larger servers equipped with plenty of memory, the
page dirty ratio may never pass the threshold that triggers the VM
reclaim logic, but the accumulated inodes (and their associated cluster
locks) can cause unacceptable performance degradation for
latency-sensitive applications.

After adding the attached inode trimming patch, together with
shrink_dcache_sb(), we are able to keep the latency of one real-world
application within a satisfactory bound (consistently within 5
seconds, compared with the original fluctuation between 5 and 16
seconds). The calls are placed in one of our kernel daemons, which wakes
up at a tunable interval to do the trimming work, as shown in the
following code segment. We would appreciate it if this patch could be
accepted into the mainline kernel.

                  /* percentage of this superblock's inodes to trim */
                  i_percent = sdp->sd_tune.gt_inoded_purge;
                  if (i_percent) {
                          if (i_percent > 100)
                                  i_percent = 100;
                          a_count = atomic_read(&sdp->sd_inode_count);
                          i_count = a_count * i_percent / 100;
                          /* drop unused dentries first; they pin inodes */
                          (void) shrink_dcache_sb(sdp->sd_vfs);
                          (void) prune_icache_sb(i_count, sdp->sd_vfs);
                  }
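
For illustration only, the count calculation in the segment above can be
pulled out into a standalone helper. This is a minimal sketch:
purge_target() is a hypothetical name that does not appear in the patch,
and the widening to long long is an added assumption to keep the
multiplication from overflowing int on very large inode counts.

```c
/*
 * Hypothetical helper (not part of the patch): convert a purge
 * percentage and the current inode count into a scan target.
 */
static int purge_target(int percent, int inode_count)
{
	/* A zero (or negative) tunable disables trimming. */
	if (percent <= 0 || inode_count <= 0)
		return 0;

	/* Clamp the tunable to 100%, as the daemon code does. */
	if (percent > 100)
		percent = 100;

	/* Widen before multiplying so large counts cannot overflow int. */
	return (int)((long long)inode_count * percent / 100);
}
```

With a tunable of 25 and 1000 cached inodes this asks for 250 inodes to
be scanned; a tunable above 100 behaves like 100.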

-- Wendy





[-- Attachment #2: inode_prune_sb.patch --]
[-- Type: text/x-patch, Size: 2144 bytes --]

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>

 fs/inode.c         |   14 +++++++++++---
 include/linux/fs.h |    3 ++-
 2 files changed, 13 insertions(+), 4 deletions(-)

--- linux-2.6.18/include/linux/fs.h	2006-09-19 23:42:06.000000000 -0400
+++ ups-kernel/include/linux/fs.h	2006-11-22 13:55:55.000000000 -0500
@@ -1625,7 +1625,8 @@ extern void remove_inode_hash(struct ino
 static inline void insert_inode_hash(struct inode *inode) {
 	__insert_inode_hash(inode, inode->i_ino);
 }
-
+extern void prune_icache_sb(int nr_to_scan, struct super_block *sb);
+ 
 extern struct file * get_empty_filp(void);
 extern void file_move(struct file *f, struct list_head *list);
 extern void file_kill(struct file *f);
--- linux-2.6.18/fs/inode.c	2006-09-19 23:42:06.000000000 -0400
+++ ups-kernel/fs/inode.c	2006-11-22 14:12:28.000000000 -0500
@@ -411,7 +411,7 @@ static int can_unuse(struct inode *inode
  * If the inode has metadata buffers attached to mapping->private_list then
  * try to remove them.
  */
-static void prune_icache(int nr_to_scan)
+static void __prune_icache(int nr_to_scan, struct super_block *sb)
 {
 	LIST_HEAD(freeable);
 	int nr_pruned = 0;
@@ -428,7 +428,8 @@ static void prune_icache(int nr_to_scan)
 
 		inode = list_entry(inode_unused.prev, struct inode, i_list);
 
-		if (inode->i_state || atomic_read(&inode->i_count)) {
+		if (inode->i_state || atomic_read(&inode->i_count) 
+			|| (sb && (inode->i_sb != sb))) {
 			list_move(&inode->i_list, &inode_unused);
 			continue;
 		}
@@ -461,6 +462,13 @@ static void prune_icache(int nr_to_scan)
 	mutex_unlock(&iprune_mutex);
 }
 
+void prune_icache_sb(int nr_to_scan, struct super_block *sb)
+{
+	__prune_icache(nr_to_scan, sb);
+}
+
+EXPORT_SYMBOL(prune_icache_sb);
+
 /*
  * shrink_icache_memory() will attempt to reclaim some unused inodes.  Here,
  * "unused" means that no dentries are referring to the inodes: the files are
@@ -480,7 +488,7 @@ static int shrink_icache_memory(int nr, 
 	 	 */
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
-		prune_icache(nr);
+		__prune_icache(nr, NULL);
 	}
 	return (inodes_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
 }
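
To illustrate the filter added to __prune_icache() above, here is a
simplified userspace model. It is a sketch under stated assumptions, not
kernel code: struct model_sb, struct model_inode and model_prune() are
made-up names, an array stands in for the inode_unused LRU list, and
"pruning" just sets a flag. As in the patch, a NULL sb means "no
filtering".

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (illustrative names). */
struct model_sb {
	int id;
};

struct model_inode {
	struct model_sb *sb;	/* owning superblock, like inode->i_sb */
	int state;		/* nonzero means busy, like inode->i_state */
	int count;		/* reference count, like inode->i_count */
	int pruned;		/* set when this model "frees" the inode */
};

/*
 * Look at up to nr_to_scan entries and prune those that are idle and,
 * when sb is non-NULL, belong to sb.  Returns the number pruned.
 */
static int model_prune(struct model_inode *inodes, size_t n,
		       int nr_to_scan, struct model_sb *sb)
{
	int nr_pruned = 0;
	size_t i;

	for (i = 0; i < n && nr_to_scan > 0; i++, nr_to_scan--) {
		struct model_inode *inode = &inodes[i];

		if (inode->pruned)
			continue;
		/* Same predicate as the patched check, minus the atomics. */
		if (inode->state || inode->count || (sb && inode->sb != sb))
			continue;	/* skipped entries still use up budget */
		inode->pruned = 1;
		nr_pruned++;
	}
	return nr_pruned;
}
```

In this model a skipped inode still consumes one unit of the scan
budget, so on a list shared by several superblocks a call may prune
fewer inodes than nr_to_scan.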
