* [PATCH 0/5] Per-superblock shrinkers
@ 2010-05-14 7:24 Dave Chinner
2010-05-14 7:24 ` [PATCH 1/5] inode: Make unused inode LRU per superblock Dave Chinner
` (6 more replies)
0 siblings, 7 replies; 14+ messages in thread
From: Dave Chinner @ 2010-05-14 7:24 UTC (permalink / raw)
To: linux-kernel; +Cc: xfs, linux-fsdevel, linux-mm
This series reworks the filesystem shrinkers. The current shrinkers
have several issues:
1. There is a dependency between dentry and inode cache
shrinking that is only implicitly defined by the order of
shrinker registration.
2. The shrinkers need to walk the superblock list and pin
the superblock to avoid unmount races with the sb going
away.
3. The dentry cache uses per-superblock LRUs and proportions
reclaim between all the superblocks, which means we are
doing breadth-based reclaim: we touch every superblock on
every shrinker call, and may reclaim only a single dentry
at a time from a given superblock.
4. The inode cache has a global LRU, so it has different
reclaim patterns to the dentry cache, despite the fact
that the dentry cache is generally the only thing that
pins inodes in memory.
5. Filesystems need to register their own shrinkers for
caches and can't co-ordinate them with the dentry and
inode cache shrinkers.
The series starts by converting the inode cache to per-superblock
LRUs and changes the shrinker to match the dentry cache (#4).
It then adds a context to the shrinker callouts by passing the
shrinker structure with the callout. With this, a shrinker structure
is added to the superblock structure and a per-superblock shrinker
is registered. Both the inode and dentry caches are modified to be
shrunk via the superblock shrinker, and this directly encodes the
dcache/icache dependency inside the shrinker (#1).
This shrinker structure also avoids the need to pin the superblock
inside the shrinker because the shrinker is unregistered before the
superblock is freed (#2). Further, it pushes the proportioning of
reclaim between superblocks back up into the shrinker and batches
all the reclaim from a superblock into a tight call loop until the
shrink cycle for that superblock is complete. This effectively
converts reclaim to a depth-based reclaim mechanism which has a
smaller CPU cache footprint than the current mechanism (#3).
Then a pair of superblock operations that can be used to implement
filesystem-specific cache reclaim is added. This is split into two
operations so that we don't need to overload the number of objects to
scan to indicate that a count should be returned.
Finally, the XFS inode cache shrinker is converted to use these
superblock operations, removing the need to register a shrinker,
keep a global list of XFS filesystems, or take locks to access the
per-filesystem caches. This fixes several new lockdep warnings that
the XFS shrinker introduces because of the different contexts the
shrinker is called in, and allows reclaim to be correctly
proportioned between the dentry, inode and XFS inode caches on each
filesystem (#5).
arch/x86/kvm/mmu.c | 2 +-
drivers/gpu/drm/i915/i915_gem.c | 2 +-
fs/dcache.c | 137 ++++++---------------------------------
fs/fs-writeback.c | 2 +-
fs/gfs2/glock.c | 2 +-
fs/gfs2/quota.c | 2 +-
fs/gfs2/quota.h | 2 +-
fs/inode.c | 64 ++++++-------------
fs/mbcache.c | 5 +-
fs/nfs/dir.c | 2 +-
fs/nfs/internal.h | 3 +-
fs/quota/dquot.c | 2 +-
fs/super.c | 68 +++++++++++++++++++
fs/ubifs/shrinker.c | 2 +-
fs/ubifs/ubifs.h | 2 +-
fs/xfs/linux-2.6/xfs_buf.c | 5 +-
fs/xfs/linux-2.6/xfs_super.c | 23 +++++--
fs/xfs/linux-2.6/xfs_sync.c | 123 +++++++++-------------------------
fs/xfs/linux-2.6/xfs_sync.h | 16 +++--
fs/xfs/quota/xfs_qm.c | 7 ++-
fs/xfs/quota/xfs_qm_syscalls.c | 2 +-
fs/xfs/xfs_mount.h | 1 -
include/linux/fs.h | 22 ++++++
include/linux/mm.h | 2 +-
include/linux/writeback.h | 1 -
mm/vmscan.c | 8 ++-
26 files changed, 220 insertions(+), 287 deletions(-)
* [PATCH 1/5] inode: Make unused inode LRU per superblock
2010-05-14 7:24 [PATCH 0/5] Per-superblock shrinkers Dave Chinner
@ 2010-05-14 7:24 ` Dave Chinner
2010-05-14 7:24 ` [PATCH 2/5] mm: add context argument to shrinker callback Dave Chinner
` (5 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2010-05-14 7:24 UTC (permalink / raw)
To: linux-kernel; +Cc: xfs, linux-fsdevel, linux-mm
From: Dave Chinner <dchinner@redhat.com>
The inode unused list is currently a global LRU. This does not match
the other global filesystem cache - the dentry cache - which uses
per-superblock LRU lists. Hence we have related filesystem object
types using different LRU reclamation schemes.
To enable a per-superblock filesystem cache shrinker, both of these
caches need to have per-sb unused object LRU lists. Hence this patch
converts the global inode LRU to per-sb LRUs.
The patch only does rudimentary per-sb proportioning in the shrinker
infrastructure, as this gets removed when the per-sb shrinker
callouts are introduced later on.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/fs-writeback.c | 2 +-
fs/inode.c | 89 ++++++++++++++++++++++++++++++++++++++++-----
fs/super.c | 1 +
include/linux/fs.h | 4 ++
include/linux/writeback.h | 1 -
5 files changed, 85 insertions(+), 12 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 4b37f7c..fd78854 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -547,7 +547,7 @@ select_queue:
/*
* The inode is clean, unused
*/
- list_move(&inode->i_list, &inode_unused);
+ list_move(&inode->i_list, &inode->i_sb->s_inode_lru);
}
}
inode_sync_complete(inode);
diff --git a/fs/inode.c b/fs/inode.c
index 407bf39..8b95b15 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -74,7 +74,6 @@ static unsigned int i_hash_shift __read_mostly;
*/
LIST_HEAD(inode_in_use);
-LIST_HEAD(inode_unused);
static struct hlist_head *inode_hashtable __read_mostly;
/*
@@ -294,6 +293,7 @@ void __iget(struct inode *inode)
if (!(inode->i_state & (I_DIRTY|I_SYNC)))
list_move(&inode->i_list, &inode_in_use);
inodes_stat.nr_unused--;
+ inode->i_sb->s_nr_inodes_unused--;
}
/**
@@ -388,6 +388,7 @@ static int invalidate_list(struct list_head *head, struct list_head *dispose)
invalidate_inode_buffers(inode);
if (!atomic_read(&inode->i_count)) {
list_move(&inode->i_list, dispose);
+ inode->i_sb->s_nr_inodes_unused--;
WARN_ON(inode->i_state & I_NEW);
inode->i_state |= I_FREEING;
count++;
@@ -446,32 +447,31 @@ static int can_unuse(struct inode *inode)
*
* Any inodes which are pinned purely because of attached pagecache have their
* pagecache removed. We expect the final iput() on that inode to add it to
- * the front of the inode_unused list. So look for it there and if the
+ * the front of the sb->s_inode_lru list. So look for it there and if the
* inode is still freeable, proceed. The right inode is found 99.9% of the
* time in testing on a 4-way.
*
* If the inode has metadata buffers attached to mapping->private_list then
* try to remove them.
*/
-static void prune_icache(int nr_to_scan)
+static void shrink_icache_sb(struct super_block *sb, int *nr_to_scan)
{
LIST_HEAD(freeable);
int nr_pruned = 0;
int nr_scanned;
unsigned long reap = 0;
- down_read(&iprune_sem);
spin_lock(&inode_lock);
- for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) {
+ for (nr_scanned = *nr_to_scan; nr_scanned >= 0; nr_scanned--) {
struct inode *inode;
- if (list_empty(&inode_unused))
+ if (list_empty(&sb->s_inode_lru))
break;
- inode = list_entry(inode_unused.prev, struct inode, i_list);
+ inode = list_entry(sb->s_inode_lru.prev, struct inode, i_list);
if (inode->i_state || atomic_read(&inode->i_count)) {
- list_move(&inode->i_list, &inode_unused);
+ list_move(&inode->i_list, &sb->s_inode_lru);
continue;
}
if (inode_has_buffers(inode) || inode->i_data.nrpages) {
@@ -483,7 +483,7 @@ static void prune_icache(int nr_to_scan)
iput(inode);
spin_lock(&inode_lock);
- if (inode != list_entry(inode_unused.next,
+ if (inode != list_entry(sb->s_inode_lru.next,
struct inode, i_list))
continue; /* wrong inode or list_empty */
if (!can_unuse(inode))
@@ -495,13 +495,80 @@ static void prune_icache(int nr_to_scan)
nr_pruned++;
}
inodes_stat.nr_unused -= nr_pruned;
+ sb->s_nr_inodes_unused -= nr_pruned;
if (current_is_kswapd())
__count_vm_events(KSWAPD_INODESTEAL, reap);
else
__count_vm_events(PGINODESTEAL, reap);
spin_unlock(&inode_lock);
+ *nr_to_scan = nr_scanned;
dispose_list(&freeable);
+}
+
+static void prune_icache(int count)
+{
+ struct super_block *sb;
+ int w_count;
+ int unused = inodes_stat.nr_unused;
+ int prune_ratio;
+ int pruned;
+
+ if (unused == 0 || count == 0)
+ return;
+ down_read(&iprune_sem);
+restart:
+ if (count >= unused)
+ prune_ratio = 1;
+ else
+ prune_ratio = unused / count;
+ spin_lock(&sb_lock);
+ list_for_each_entry(sb, &super_blocks, s_list) {
+ if (sb->s_nr_inodes_unused == 0)
+ continue;
+ sb->s_count++;
+ /* Now, we reclaim unused dentrins with fairness.
+ * We reclaim them same percentage from each superblock.
+ * We calculate number of dentries to scan on this sb
+ * as follows, but the implementation is arranged to avoid
+ * overflows:
+ * number of dentries to scan on this sb =
+ * count * (number of dentries on this sb /
+ * number of dentries in the machine)
+ */
+ spin_unlock(&sb_lock);
+ if (prune_ratio != 1)
+ w_count = (sb->s_nr_inodes_unused / prune_ratio) + 1;
+ else
+ w_count = sb->s_nr_inodes_unused;
+ pruned = w_count;
+ /*
+ * We need to be sure this filesystem isn't being unmounted,
+ * otherwise we could race with generic_shutdown_super(), and
+ * end up holding a reference to an inode while the filesystem
+ * is unmounted. So we try to get s_umount, and make sure
+ * s_root isn't NULL.
+ */
+ if (down_read_trylock(&sb->s_umount)) {
+ if ((sb->s_root != NULL) &&
+ (!list_empty(&sb->s_dentry_lru))) {
+ shrink_icache_sb(sb, &w_count);
+ pruned -= w_count;
+ }
+ up_read(&sb->s_umount);
+ }
+ spin_lock(&sb_lock);
+ count -= pruned;
+ /*
+ * restart only when sb is no longer on the list and
+ * we have more work to do.
+ */
+ if (__put_super_and_need_restart(sb) && count > 0) {
+ spin_unlock(&sb_lock);
+ goto restart;
+ }
+ }
+ spin_unlock(&sb_lock);
up_read(&iprune_sem);
}
@@ -1242,8 +1309,9 @@ int generic_detach_inode(struct inode *inode)
if (!hlist_unhashed(&inode->i_hash)) {
if (!(inode->i_state & (I_DIRTY|I_SYNC)))
- list_move(&inode->i_list, &inode_unused);
+ list_move(&inode->i_list, &sb->s_inode_lru);
inodes_stat.nr_unused++;
+ sb->s_nr_inodes_unused++;
if (sb->s_flags & MS_ACTIVE) {
spin_unlock(&inode_lock);
return 0;
@@ -1256,6 +1324,7 @@ int generic_detach_inode(struct inode *inode)
WARN_ON(inode->i_state & I_NEW);
inode->i_state &= ~I_WILL_FREE;
inodes_stat.nr_unused--;
+ sb->s_nr_inodes_unused--;
hlist_del_init(&inode->i_hash);
}
list_del_init(&inode->i_list);
diff --git a/fs/super.c b/fs/super.c
index 1527e6a..18655e6 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -68,6 +68,7 @@ static struct super_block *alloc_super(struct file_system_type *type)
INIT_HLIST_HEAD(&s->s_anon);
INIT_LIST_HEAD(&s->s_inodes);
INIT_LIST_HEAD(&s->s_dentry_lru);
+ INIT_LIST_HEAD(&s->s_inode_lru);
init_rwsem(&s->s_umount);
mutex_init(&s->s_lock);
lockdep_set_class(&s->s_umount, &type->s_umount_key);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 44f35ae..41132e3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1348,6 +1348,10 @@ struct super_block {
struct list_head s_dentry_lru; /* unused dentry lru */
int s_nr_dentry_unused; /* # of dentry on lru */
+ /* s_inode_lru and s_nr_inodes_unused are protected by inode_lock */
+ struct list_head s_inode_lru; /* unused inode lru */
+ int s_nr_inodes_unused; /* # of inodes on lru */
+
struct block_device *s_bdev;
struct backing_dev_info *s_bdi;
struct mtd_info *s_mtd;
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 36520de..2636ade 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -11,7 +11,6 @@ struct backing_dev_info;
extern spinlock_t inode_lock;
extern struct list_head inode_in_use;
-extern struct list_head inode_unused;
/*
* fs/fs-writeback.c
--
1.5.6.5
* [PATCH 2/5] mm: add context argument to shrinker callback
2010-05-14 7:24 [PATCH 0/5] Per-superblock shrinkers Dave Chinner
2010-05-14 7:24 ` [PATCH 1/5] inode: Make unused inode LRU per superblock Dave Chinner
@ 2010-05-14 7:24 ` Dave Chinner
2010-05-14 7:24 ` [PATCH 3/5] superblock: introduce per-sb cache shrinker infrastructure Dave Chinner
` (4 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2010-05-14 7:24 UTC (permalink / raw)
To: linux-kernel; +Cc: xfs, linux-fsdevel, linux-mm
From: Dave Chinner <dchinner@redhat.com>
The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed it in the context the
shrinker needs to operate on and get back to that context in the callback
via container_of().
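As a minimal sketch of the pattern this enables (the foo_cache names
and foo_prune() are hypothetical and not part of this patch), a
subsystem embeds the shrinker in its own per-instance state and
recovers that state in the callback:

	struct foo_cache {
		struct list_head	lru;		/* reclaimable objects */
		int			nr_unused;	/* objects on the lru */
		struct shrinker		shrink;		/* embedded shrinker handle */
	};

	static int foo_shrink(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
	{
		struct foo_cache *cache = container_of(shrink,
						struct foo_cache, shrink);

		if (nr_to_scan) {
			if (!(gfp_mask & __GFP_FS))
				return -1;
			foo_prune(cache, nr_to_scan);	/* hypothetical prune helper */
		}
		return cache->nr_unused;
	}

	static void foo_cache_init(struct foo_cache *cache)
	{
		INIT_LIST_HEAD(&cache->lru);
		cache->nr_unused = 0;
		cache->shrink.shrink = foo_shrink;
		cache->shrink.seeks = DEFAULT_SEEKS;
		register_shrinker(&cache->shrink);
	}

Calling unregister_shrinker(&cache->shrink) before the cache is freed
then makes it safe for the callback to dereference the embedded
context without any global state.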
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
arch/x86/kvm/mmu.c | 2 +-
drivers/gpu/drm/i915/i915_gem.c | 2 +-
fs/dcache.c | 2 +-
fs/gfs2/glock.c | 2 +-
fs/gfs2/quota.c | 2 +-
fs/gfs2/quota.h | 2 +-
fs/inode.c | 2 +-
fs/mbcache.c | 5 +++--
fs/nfs/dir.c | 2 +-
fs/nfs/internal.h | 3 ++-
fs/quota/dquot.c | 2 +-
fs/ubifs/shrinker.c | 2 +-
fs/ubifs/ubifs.h | 2 +-
fs/xfs/linux-2.6/xfs_buf.c | 5 +++--
fs/xfs/linux-2.6/xfs_sync.c | 1 +
fs/xfs/quota/xfs_qm.c | 7 +++++--
include/linux/mm.h | 2 +-
mm/vmscan.c | 8 +++++---
18 files changed, 31 insertions(+), 22 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 19a8906..03e689d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2918,7 +2918,7 @@ static void kvm_mmu_remove_one_alloc_mmu_page(struct kvm *kvm)
kvm_mmu_zap_page(kvm, page);
}
-static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
+static int mmu_shrink(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
{
struct kvm *kvm;
struct kvm *kvm_freed = NULL;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ef3d91d..cc436ba 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5185,7 +5185,7 @@ void i915_gem_release(struct drm_device * dev, struct drm_file *file_priv)
}
static int
-i915_gem_shrink(int nr_to_scan, gfp_t gfp_mask)
+i915_gem_shrink(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
{
drm_i915_private_t *dev_priv, *next_dev;
struct drm_i915_gem_object *obj_priv, *next_obj;
diff --git a/fs/dcache.c b/fs/dcache.c
index f1358e5..41c35c1 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -897,7 +897,7 @@ EXPORT_SYMBOL(shrink_dcache_parent);
*
* In this case we return -1 to tell the caller that we baled.
*/
-static int shrink_dcache_memory(int nr, gfp_t gfp_mask)
+static int shrink_dcache_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask)
{
if (nr) {
if (!(gfp_mask & __GFP_FS))
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 454d4b4..e947661 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1345,7 +1345,7 @@ void gfs2_glock_complete(struct gfs2_glock *gl, int ret)
}
-static int gfs2_shrink_glock_memory(int nr, gfp_t gfp_mask)
+static int gfs2_shrink_glock_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask)
{
struct gfs2_glock *gl;
int may_demote;
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 6dbcbad..3f2cf67 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -77,7 +77,7 @@ static LIST_HEAD(qd_lru_list);
static atomic_t qd_lru_count = ATOMIC_INIT(0);
static DEFINE_SPINLOCK(qd_lru_lock);
-int gfs2_shrink_qd_memory(int nr, gfp_t gfp_mask)
+int gfs2_shrink_qd_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask)
{
struct gfs2_quota_data *qd;
struct gfs2_sbd *sdp;
diff --git a/fs/gfs2/quota.h b/fs/gfs2/quota.h
index 195f60c..e7d236c 100644
--- a/fs/gfs2/quota.h
+++ b/fs/gfs2/quota.h
@@ -51,7 +51,7 @@ static inline int gfs2_quota_lock_check(struct gfs2_inode *ip)
return ret;
}
-extern int gfs2_shrink_qd_memory(int nr, gfp_t gfp_mask);
+extern int gfs2_shrink_qd_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask);
extern const struct quotactl_ops gfs2_quotactl_ops;
#endif /* __QUOTA_DOT_H__ */
diff --git a/fs/inode.c b/fs/inode.c
index 8b95b15..b292e41 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -581,7 +581,7 @@ restart:
* This function is passed the number of inodes to scan, and it returns the
* total number of remaining possibly-reclaimable inodes.
*/
-static int shrink_icache_memory(int nr, gfp_t gfp_mask)
+static int shrink_icache_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask)
{
if (nr) {
/*
diff --git a/fs/mbcache.c b/fs/mbcache.c
index ec88ff3..e28f21b 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -115,7 +115,7 @@ mb_cache_indexes(struct mb_cache *cache)
* What the mbcache registers as to get shrunk dynamically.
*/
-static int mb_cache_shrink_fn(int nr_to_scan, gfp_t gfp_mask);
+static int mb_cache_shrink_fn(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask);
static struct shrinker mb_cache_shrinker = {
.shrink = mb_cache_shrink_fn,
@@ -191,13 +191,14 @@ forget:
* This function is called by the kernel memory management when memory
* gets low.
*
+ * @shrink: (ignored)
* @nr_to_scan: Number of objects to scan
* @gfp_mask: (ignored)
*
* Returns the number of objects which are present in the cache.
*/
static int
-mb_cache_shrink_fn(int nr_to_scan, gfp_t gfp_mask)
+mb_cache_shrink_fn(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
{
LIST_HEAD(free_list);
struct list_head *l, *ltmp;
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index a7bb5c6..6a0a6f7 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1669,7 +1669,7 @@ static void nfs_access_free_entry(struct nfs_access_entry *entry)
smp_mb__after_atomic_dec();
}
-int nfs_access_cache_shrinker(int nr_to_scan, gfp_t gfp_mask)
+int nfs_access_cache_shrinker(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
{
LIST_HEAD(head);
struct nfs_inode *nfsi;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 11f82f0..aa022eb 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -205,7 +205,8 @@ extern struct rpc_procinfo nfs4_procedures[];
void nfs_close_context(struct nfs_open_context *ctx, int is_sync);
/* dir.c */
-extern int nfs_access_cache_shrinker(int nr_to_scan, gfp_t gfp_mask);
+extern int nfs_access_cache_shrinker(struct shrinker *shrink,
+ int nr_to_scan, gfp_t gfp_mask);
/* inode.c */
extern struct workqueue_struct *nfsiod_workqueue;
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 788b580..187e3f2 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -668,7 +668,7 @@ static void prune_dqcache(int count)
* more memory
*/
-static int shrink_dqcache_memory(int nr, gfp_t gfp_mask)
+static int shrink_dqcache_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask)
{
if (nr) {
spin_lock(&dq_list_lock);
diff --git a/fs/ubifs/shrinker.c b/fs/ubifs/shrinker.c
index 02feb59..0b20111 100644
--- a/fs/ubifs/shrinker.c
+++ b/fs/ubifs/shrinker.c
@@ -277,7 +277,7 @@ static int kick_a_thread(void)
return 0;
}
-int ubifs_shrinker(int nr, gfp_t gfp_mask)
+int ubifs_shrinker(struct shrinker *shrink, int nr, gfp_t gfp_mask)
{
int freed, contention = 0;
long clean_zn_cnt = atomic_long_read(&ubifs_clean_zn_cnt);
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index bd2542d..5a92345 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -1575,7 +1575,7 @@ int ubifs_tnc_start_commit(struct ubifs_info *c, struct ubifs_zbranch *zroot);
int ubifs_tnc_end_commit(struct ubifs_info *c);
/* shrinker.c */
-int ubifs_shrinker(int nr_to_scan, gfp_t gfp_mask);
+int ubifs_shrinker(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask);
/* commit.c */
int ubifs_bg_thread(void *info);
diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 44c2b0e..d2cfc54 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -44,7 +44,7 @@
static kmem_zone_t *xfs_buf_zone;
STATIC int xfsbufd(void *);
-STATIC int xfsbufd_wakeup(int, gfp_t);
+STATIC int xfsbufd_wakeup(struct shrinker *, int, gfp_t);
STATIC void xfs_buf_delwri_queue(xfs_buf_t *, int);
static struct shrinker xfs_buf_shake = {
.shrink = xfsbufd_wakeup,
@@ -339,7 +339,7 @@ _xfs_buf_lookup_pages(
__func__, gfp_mask);
XFS_STATS_INC(xb_page_retries);
- xfsbufd_wakeup(0, gfp_mask);
+ xfsbufd_wakeup(NULL, 0, gfp_mask);
congestion_wait(BLK_RW_ASYNC, HZ/50);
goto retry;
}
@@ -1756,6 +1756,7 @@ xfs_buf_runall_queues(
STATIC int
xfsbufd_wakeup(
+ struct shrinker *shrink,
int priority,
gfp_t mask)
{
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index a427c63..17ec4a6 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -879,6 +879,7 @@ static struct rw_semaphore xfs_mount_list_lock;
static int
xfs_reclaim_inode_shrink(
+ struct shrinker *shrink,
int nr_to_scan,
gfp_t gfp_mask)
{
diff --git a/fs/xfs/quota/xfs_qm.c b/fs/xfs/quota/xfs_qm.c
index 417e61e..49c8d84 100644
--- a/fs/xfs/quota/xfs_qm.c
+++ b/fs/xfs/quota/xfs_qm.c
@@ -72,7 +72,7 @@ STATIC void xfs_qm_freelist_destroy(xfs_frlist_t *);
STATIC int xfs_qm_init_quotainos(xfs_mount_t *);
STATIC int xfs_qm_init_quotainfo(xfs_mount_t *);
-STATIC int xfs_qm_shake(int, gfp_t);
+STATIC int xfs_qm_shake(struct shrinker *, int, gfp_t);
static struct shrinker xfs_qm_shaker = {
.shrink = xfs_qm_shake,
@@ -2088,7 +2088,10 @@ xfs_qm_shake_freelist(
*/
/* ARGSUSED */
STATIC int
-xfs_qm_shake(int nr_to_scan, gfp_t gfp_mask)
+xfs_qm_shake(
+ struct shrinker *shrink,
+ int nr_to_scan,
+ gfp_t gfp_mask)
{
int ndqused, nfree, n;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 462acaf..ff4c44e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -995,7 +995,7 @@ static inline void sync_mm_rss(struct task_struct *task, struct mm_struct *mm)
* querying the cache size, so a fastpath for that case is appropriate.
*/
struct shrinker {
- int (*shrink)(int nr_to_scan, gfp_t gfp_mask);
+ int (*shrink)(struct shrinker *, int nr_to_scan, gfp_t gfp_mask);
int seeks; /* seeks to recreate an obj */
/* These are for internal use */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3ff3311..9d56aaf 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -215,8 +215,9 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
list_for_each_entry(shrinker, &shrinker_list, list) {
unsigned long long delta;
unsigned long total_scan;
- unsigned long max_pass = (*shrinker->shrink)(0, gfp_mask);
+ unsigned long max_pass;
+ max_pass = (*shrinker->shrink)(shrinker, 0, gfp_mask);
delta = (4 * scanned) / shrinker->seeks;
delta *= max_pass;
do_div(delta, lru_pages + 1);
@@ -244,8 +245,9 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
int shrink_ret;
int nr_before;
- nr_before = (*shrinker->shrink)(0, gfp_mask);
- shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
+ nr_before = (*shrinker->shrink)(shrinker, 0, gfp_mask);
+ shrink_ret = (*shrinker->shrink)(shrinker, this_scan,
+ gfp_mask);
if (shrink_ret == -1)
break;
if (shrink_ret < nr_before)
--
1.5.6.5
* [PATCH 3/5] superblock: introduce per-sb cache shrinker infrastructure
2010-05-14 7:24 [PATCH 0/5] Per-superblock shrinkers Dave Chinner
2010-05-14 7:24 ` [PATCH 1/5] inode: Make unused inode LRU per superblock Dave Chinner
2010-05-14 7:24 ` [PATCH 2/5] mm: add context argument to shrinker callback Dave Chinner
@ 2010-05-14 7:24 ` Dave Chinner
2010-05-14 7:24 ` [PATCH 4/5] superblock: add filesystem shrinker operations Dave Chinner
` (3 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2010-05-14 7:24 UTC (permalink / raw)
To: linux-kernel; +Cc: xfs, linux-fsdevel, linux-mm
From: Dave Chinner <dchinner@redhat.com>
With context based shrinkers, we can implement a per-superblock
shrinker that shrinks the caches attached to the superblock. We
currently have global shrinkers for the inode and dentry caches that
split up into per-superblock operations via a coarse proportioning
method that does not batch very well. The global shrinkers also
have a dependency - dentries pin inodes - so we have to be very
careful about how we register the global shrinkers so that the
implicit call order is always correct.
With a per-sb shrinker callout, we can encode this dependency
directly into the per-sb shrinker, hence avoiding the need for
strictly ordering shrinker registrations. We also have no need for
any proportioning code, as the shrinker subsystem already provides
this functionality across all shrinkers. Allowing the shrinker to
operate on a single superblock at a time means that we do fewer
superblock list traversals and take fewer locks, and reclaim should
batch more effectively. This should result in less CPU overhead for
reclaim and potentially faster reclaim of items from each filesystem.
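As a rough illustration of the proportioning prune_super() below
performs (illustrative numbers only): if the shrinker asks a
superblock holding 600 unused dentries and 200 unused inodes to scan
128 objects, the dcache is scanned for 128 * 600 / 801 = 95 entries
and the icache for the remaining 33, so each cache sees reclaim
pressure roughly proportional to its size.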
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/dcache.c | 137 ++++++++--------------------------------------------
fs/inode.c | 111 +++---------------------------------------
fs/super.c | 48 ++++++++++++++++++
include/linux/fs.h | 7 +++
4 files changed, 84 insertions(+), 219 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index 41c35c1..2d619d3 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -456,21 +456,16 @@ static void prune_one_dentry(struct dentry * dentry)
* which flags are set. This means we don't need to maintain multiple
* similar copies of this loop.
*/
-static void __shrink_dcache_sb(struct super_block *sb, int *count, int flags)
+static void __shrink_dcache_sb(struct super_block *sb, int count, int flags)
{
LIST_HEAD(referenced);
LIST_HEAD(tmp);
struct dentry *dentry;
- int cnt = 0;
BUG_ON(!sb);
- BUG_ON((flags & DCACHE_REFERENCED) && count == NULL);
+ BUG_ON((flags & DCACHE_REFERENCED) && count == -1);
spin_lock(&dcache_lock);
- if (count != NULL)
- /* called from prune_dcache() and shrink_dcache_parent() */
- cnt = *count;
-restart:
- if (count == NULL)
+ if (count == -1)
list_splice_init(&sb->s_dentry_lru, &tmp);
else {
while (!list_empty(&sb->s_dentry_lru)) {
@@ -492,13 +487,13 @@ restart:
} else {
list_move_tail(&dentry->d_lru, &tmp);
spin_unlock(&dentry->d_lock);
- cnt--;
- if (!cnt)
+ if (--count == 0)
break;
}
cond_resched_lock(&dcache_lock);
}
}
+prune_more:
while (!list_empty(&tmp)) {
dentry = list_entry(tmp.prev, struct dentry, d_lru);
dentry_lru_del_init(dentry);
@@ -516,91 +511,30 @@ restart:
/* dentry->d_lock was dropped in prune_one_dentry() */
cond_resched_lock(&dcache_lock);
}
- if (count == NULL && !list_empty(&sb->s_dentry_lru))
- goto restart;
- if (count != NULL)
- *count = cnt;
+ if (count == -1 && !list_empty(&sb->s_dentry_lru)) {
+ list_splice_init(&sb->s_dentry_lru, &tmp);
+ goto prune_more;
+ }
if (!list_empty(&referenced))
list_splice(&referenced, &sb->s_dentry_lru);
spin_unlock(&dcache_lock);
}
/**
- * prune_dcache - shrink the dcache
- * @count: number of entries to try to free
+ * prune_dcache_sb - shrink the dcache
+ * @nr_to_scan: number of entries to try to free
*
- * Shrink the dcache. This is done when we need more memory, or simply when we
- * need to unmount something (at which point we need to unuse all dentries).
+ * Attempt to shrink the superblock dcache LRU by @nr_to_scan entries. This is
+ * done when we need more memory an called from the superblock shrinker
+ * function.
*
- * This function may fail to free any resources if all the dentries are in use.
+ * This function may fail to free any resources if all the dentries are in
+ * use.
*/
-static void prune_dcache(int count)
-{
- struct super_block *sb;
- int w_count;
- int unused = dentry_stat.nr_unused;
- int prune_ratio;
- int pruned;
- if (unused == 0 || count == 0)
- return;
- spin_lock(&dcache_lock);
-restart:
- if (count >= unused)
- prune_ratio = 1;
- else
- prune_ratio = unused / count;
- spin_lock(&sb_lock);
- list_for_each_entry(sb, &super_blocks, s_list) {
- if (sb->s_nr_dentry_unused == 0)
- continue;
- sb->s_count++;
- /* Now, we reclaim unused dentrins with fairness.
- * We reclaim them same percentage from each superblock.
- * We calculate number of dentries to scan on this sb
- * as follows, but the implementation is arranged to avoid
- * overflows:
- * number of dentries to scan on this sb =
- * count * (number of dentries on this sb /
- * number of dentries in the machine)
- */
- spin_unlock(&sb_lock);
- if (prune_ratio != 1)
- w_count = (sb->s_nr_dentry_unused / prune_ratio) + 1;
- else
- w_count = sb->s_nr_dentry_unused;
- pruned = w_count;
- /*
- * We need to be sure this filesystem isn't being unmounted,
- * otherwise we could race with generic_shutdown_super(), and
- * end up holding a reference to an inode while the filesystem
- * is unmounted. So we try to get s_umount, and make sure
- * s_root isn't NULL.
- */
- if (down_read_trylock(&sb->s_umount)) {
- if ((sb->s_root != NULL) &&
- (!list_empty(&sb->s_dentry_lru))) {
- spin_unlock(&dcache_lock);
- __shrink_dcache_sb(sb, &w_count,
- DCACHE_REFERENCED);
- pruned -= w_count;
- spin_lock(&dcache_lock);
- }
- up_read(&sb->s_umount);
- }
- spin_lock(&sb_lock);
- count -= pruned;
- /*
- * restart only when sb is no longer on the list and
- * we have more work to do.
- */
- if (__put_super_and_need_restart(sb) && count > 0) {
- spin_unlock(&sb_lock);
- goto restart;
- }
- }
- spin_unlock(&sb_lock);
- spin_unlock(&dcache_lock);
+void prune_dcache_sb(struct super_block *sb, int nr_to_scan)
+{
+ __shrink_dcache_sb(sb, nr_to_scan, DCACHE_REFERENCED);
}
/**
@@ -613,7 +547,7 @@ restart:
*/
void shrink_dcache_sb(struct super_block * sb)
{
- __shrink_dcache_sb(sb, NULL, 0);
+ __shrink_dcache_sb(sb, -1, 0);
}
EXPORT_SYMBOL(shrink_dcache_sb);
@@ -881,37 +815,10 @@ void shrink_dcache_parent(struct dentry * parent)
int found;
while ((found = select_parent(parent)) != 0)
- __shrink_dcache_sb(sb, &found, 0);
+ __shrink_dcache_sb(sb, found, 0);
}
EXPORT_SYMBOL(shrink_dcache_parent);
-/*
- * Scan `nr' dentries and return the number which remain.
- *
- * We need to avoid reentering the filesystem if the caller is performing a
- * GFP_NOFS allocation attempt. One example deadlock is:
- *
- * ext2_new_block->getblk->GFP->shrink_dcache_memory->prune_dcache->
- * prune_one_dentry->dput->dentry_iput->iput->inode->i_sb->s_op->put_inode->
- * ext2_discard_prealloc->ext2_free_blocks->lock_super->DEADLOCK.
- *
- * In this case we return -1 to tell the caller that we baled.
- */
-static int shrink_dcache_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask)
-{
- if (nr) {
- if (!(gfp_mask & __GFP_FS))
- return -1;
- prune_dcache(nr);
- }
- return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
-}
-
-static struct shrinker dcache_shrinker = {
- .shrink = shrink_dcache_memory,
- .seeks = DEFAULT_SEEKS,
-};
-
/**
* d_alloc - allocate a dcache entry
* @parent: parent of entry to allocate
@@ -2318,8 +2225,6 @@ static void __init dcache_init(void)
*/
dentry_cache = KMEM_CACHE(dentry,
SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);
-
- register_shrinker(&dcache_shrinker);
/* Hash may have been set up in dcache_init_early */
if (!hashdist)
diff --git a/fs/inode.c b/fs/inode.c
index b292e41..ab8ce3a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -442,8 +442,10 @@ static int can_unuse(struct inode *inode)
}
/*
- * Scan `goal' inodes on the unused list for freeable ones. They are moved to
- * a temporary list and then are freed outside inode_lock by dispose_list().
+ * Walk the superblock inode LRU for freeable inodes and attempt to free them.
+ * This is called from the superblock shrinker function with a number of inodes
+ * to trim from the LRU. Inodes to be freed are moved to a temporary list and
+ * then are freed outside inode_lock by dispose_list().
*
* Any inodes which are pinned purely because of attached pagecache have their
* pagecache removed. We expect the final iput() on that inode to add it to
@@ -451,10 +453,10 @@ static int can_unuse(struct inode *inode)
* inode is still freeable, proceed. The right inode is found 99.9% of the
* time in testing on a 4-way.
*
- * If the inode has metadata buffers attached to mapping->private_list then
- * try to remove them.
+ * If the inode has metadata buffers attached to mapping->private_list then try
+ * to remove them.
*/
-static void shrink_icache_sb(struct super_block *sb, int *nr_to_scan)
+void prune_icache_sb(struct super_block *sb, int nr_to_scan)
{
LIST_HEAD(freeable);
int nr_pruned = 0;
@@ -462,7 +464,7 @@ static void shrink_icache_sb(struct super_block *sb, int *nr_to_scan)
unsigned long reap = 0;
spin_lock(&inode_lock);
- for (nr_scanned = *nr_to_scan; nr_scanned >= 0; nr_scanned--) {
+ for (nr_scanned = nr_to_scan; nr_scanned >= 0; nr_scanned--) {
struct inode *inode;
if (list_empty(&sb->s_inode_lru))
@@ -501,106 +503,10 @@ static void shrink_icache_sb(struct super_block *sb, int *nr_to_scan)
else
__count_vm_events(PGINODESTEAL, reap);
spin_unlock(&inode_lock);
- *nr_to_scan = nr_scanned;
dispose_list(&freeable);
}
-static void prune_icache(int count)
-{
- struct super_block *sb;
- int w_count;
- int unused = inodes_stat.nr_unused;
- int prune_ratio;
- int pruned;
-
- if (unused == 0 || count == 0)
- return;
- down_read(&iprune_sem);
-restart:
- if (count >= unused)
- prune_ratio = 1;
- else
- prune_ratio = unused / count;
- spin_lock(&sb_lock);
- list_for_each_entry(sb, &super_blocks, s_list) {
- if (sb->s_nr_inodes_unused == 0)
- continue;
- sb->s_count++;
- /* Now, we reclaim unused dentrins with fairness.
- * We reclaim them same percentage from each superblock.
- * We calculate number of dentries to scan on this sb
- * as follows, but the implementation is arranged to avoid
- * overflows:
- * number of dentries to scan on this sb =
- * count * (number of dentries on this sb /
- * number of dentries in the machine)
- */
- spin_unlock(&sb_lock);
- if (prune_ratio != 1)
- w_count = (sb->s_nr_inodes_unused / prune_ratio) + 1;
- else
- w_count = sb->s_nr_inodes_unused;
- pruned = w_count;
- /*
- * We need to be sure this filesystem isn't being unmounted,
- * otherwise we could race with generic_shutdown_super(), and
- * end up holding a reference to an inode while the filesystem
- * is unmounted. So we try to get s_umount, and make sure
- * s_root isn't NULL.
- */
- if (down_read_trylock(&sb->s_umount)) {
- if ((sb->s_root != NULL) &&
- (!list_empty(&sb->s_dentry_lru))) {
- shrink_icache_sb(sb, &w_count);
- pruned -= w_count;
- }
- up_read(&sb->s_umount);
- }
- spin_lock(&sb_lock);
- count -= pruned;
- /*
- * restart only when sb is no longer on the list and
- * we have more work to do.
- */
- if (__put_super_and_need_restart(sb) && count > 0) {
- spin_unlock(&sb_lock);
- goto restart;
- }
- }
- spin_unlock(&sb_lock);
- up_read(&iprune_sem);
-}
-
-/*
- * shrink_icache_memory() will attempt to reclaim some unused inodes. Here,
- * "unused" means that no dentries are referring to the inodes: the files are
- * not open and the dcache references to those inodes have already been
- * reclaimed.
- *
- * This function is passed the number of inodes to scan, and it returns the
- * total number of remaining possibly-reclaimable inodes.
- */
-static int shrink_icache_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask)
-{
- if (nr) {
- /*
- * Nasty deadlock avoidance. We may hold various FS locks,
- * and we don't want to recurse into the FS that called us
- * in clear_inode() and friends..
- */
- if (!(gfp_mask & __GFP_FS))
- return -1;
- prune_icache(nr);
- }
- return (inodes_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
-}
-
-static struct shrinker icache_shrinker = {
- .shrink = shrink_icache_memory,
- .seeks = DEFAULT_SEEKS,
-};
-
static void __wait_on_freeing_inode(struct inode *inode);
/*
* Called with the inode lock held.
@@ -1640,7 +1546,6 @@ void __init inode_init(void)
(SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
SLAB_MEM_SPREAD),
init_once);
- register_shrinker(&icache_shrinker);
/* Hash may have been set up in inode_init_early */
if (!hashdist)
diff --git a/fs/super.c b/fs/super.c
index 18655e6..339b590 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -45,6 +45,50 @@
LIST_HEAD(super_blocks);
DEFINE_SPINLOCK(sb_lock);
+static int prune_super(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
+{
+ struct super_block *sb;
+ int count;
+
+ sb = container_of(shrink, struct super_block, s_shrink);
+
+ /*
+ * Deadlock avoidance. We may hold various FS locks, and we don't want
+ * to recurse into the FS that called us in clear_inode() and friends..
+ */
+ if (!(gfp_mask & __GFP_FS))
+ return -1;
+
+ /*
+ * if we can't get the umount lock, then there's no point having the
+ * shrinker try again because the sb is being torn down.
+ */
+ if (!down_read_trylock(&sb->s_umount))
+ return -1;
+
+ if (!sb->s_root) {
+ up_read(&sb->s_umount);
+ return -1;
+ }
+
+ if (nr_to_scan) {
+ /* proportion the scan between the two caches */
+ int total;
+
+ total = sb->s_nr_dentry_unused + sb->s_nr_inodes_unused + 1;
+ count = (nr_to_scan * sb->s_nr_dentry_unused) / total;
+
+ /* prune dcache first as icache is pinned by it */
+ prune_dcache_sb(sb, count);
+ prune_icache_sb(sb, nr_to_scan - count);
+ }
+
+ count = ((sb->s_nr_dentry_unused + sb->s_nr_inodes_unused) / 100)
+ * sysctl_vfs_cache_pressure;
+ up_read(&sb->s_umount);
+ return count;
+}
+
/**
* alloc_super - create new superblock
* @type: filesystem type superblock should belong to
@@ -106,6 +150,9 @@ static struct super_block *alloc_super(struct file_system_type *type)
s->s_qcop = sb_quotactl_ops;
s->s_op = &default_op;
s->s_time_gran = 1000000000;
+ s->s_shrink.shrink = prune_super;
+ s->s_shrink.seeks = DEFAULT_SEEKS;
+ register_shrinker(&s->s_shrink);
}
out:
return s;
@@ -119,6 +166,7 @@ out:
*/
static inline void destroy_super(struct super_block *s)
{
+ unregister_shrinker(&s->s_shrink);
security_sb_free(s);
kfree(s->s_subtype);
kfree(s->s_options);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 41132e3..6ba3739 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -382,6 +382,7 @@ struct inodes_stat_t {
#include <linux/capability.h>
#include <linux/semaphore.h>
#include <linux/fiemap.h>
+#include <linux/mm.h>
#include <asm/atomic.h>
#include <asm/byteorder.h>
@@ -1387,8 +1388,14 @@ struct super_block {
* generic_show_options()
*/
char *s_options;
+
+ struct shrinker s_shrink; /* per-sb shrinker handle */
};
+/* superblock cache pruning functions */
+void prune_icache_sb(struct super_block *sb, int nr_to_scan);
+void prune_dcache_sb(struct super_block *sb, int nr_to_scan);
+
extern struct timespec current_fs_time(struct super_block *sb);
/*
--
1.5.6.5
* [PATCH 4/5] superblock: add filesystem shrinker operations
2010-05-14 7:24 [PATCH 0/5] Per-superblock shrinkers Dave Chinner
` (2 preceding siblings ...)
2010-05-14 7:24 ` [PATCH 3/5] superblock: introduce per-sb cache shrinker infrastructure Dave Chinner
@ 2010-05-14 7:24 ` Dave Chinner
2010-05-14 7:24 ` [PATCH 5/5] xfs: make use of new shrinker callout Dave Chinner
` (2 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2010-05-14 7:24 UTC (permalink / raw)
To: linux-kernel; +Cc: xfs, linux-fsdevel, linux-mm
From: Dave Chinner <dchinner@redhat.com>
Now we have a per-superblock shrinker implementation, we can add a
filesystem specific callout to it to allow filesystem internal
caches to be shrunk by the superblock shrinker.
Rather than perpetuate the multipurpose shrinker callback API (i.e.
nr_to_scan == 0 meaning "tell me how many freeable objects are in the
cache"), two operations will be added. The first will return the
number of freeable objects, the second is the actual shrinker call.
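For illustration only (foofs, FOOFS_SB() and foofs_prune_cache() are
hypothetical and not part of this patch), a filesystem with a private
reclaimable cache would wire the two callouts up along these lines:

	static int foofs_nr_cached_objects(struct super_block *sb)
	{
		/* report how many private cache objects are currently freeable */
		return FOOFS_SB(sb)->nr_reclaimable;
	}

	static int foofs_free_cached_objects(struct super_block *sb, int nr_to_scan)
	{
		/* try to free nr_to_scan objects, return how many were attempted */
		return foofs_prune_cache(FOOFS_SB(sb), nr_to_scan);
	}

	static const struct super_operations foofs_super_operations = {
		/* other operations elided */
		.nr_cached_objects	= foofs_nr_cached_objects,
		.free_cached_objects	= foofs_free_cached_objects,
	};

The superblock shrinker then folds the count returned by
->nr_cached_objects() into its proportioning, as the prune_super()
changes below show.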
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/super.c | 43 +++++++++++++++++++++++++++++++------------
include/linux/fs.h | 11 +++++++++++
2 files changed, 42 insertions(+), 12 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index 339b590..e98292e 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -48,7 +48,8 @@ DEFINE_SPINLOCK(sb_lock);
static int prune_super(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
{
struct super_block *sb;
- int count;
+ int fs_objects = 0;
+ int total_objects;
sb = container_of(shrink, struct super_block, s_shrink);
@@ -71,22 +72,40 @@ static int prune_super(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
return -1;
}
- if (nr_to_scan) {
- /* proportion the scan between the two caches */
- int total;
-
- total = sb->s_nr_dentry_unused + sb->s_nr_inodes_unused + 1;
- count = (nr_to_scan * sb->s_nr_dentry_unused) / total;
+ if (sb->s_op && sb->s_op->nr_cached_objects)
+ fs_objects = sb->s_op->nr_cached_objects(sb);
- /* prune dcache first as icache is pinned by it */
- prune_dcache_sb(sb, count);
- prune_icache_sb(sb, nr_to_scan - count);
+ total_objects = sb->s_nr_dentry_unused +
+ sb->s_nr_inodes_unused + fs_objects + 1;
+ if (nr_to_scan) {
+ int dentries;
+ int inodes;
+
+ /* proportion the scan between the caches */
+ dentries = (nr_to_scan * sb->s_nr_dentry_unused) /
+ total_objects;
+ inodes = (nr_to_scan * sb->s_nr_inodes_unused) /
+ total_objects;
+ if (fs_objects)
+ fs_objects = (nr_to_scan * fs_objects) /
+ total_objects;
+ /*
+ * prune the dcache first as the icache is pinned by it, then
+ * prune the icache, followed by the filesystem specific caches
+ */
+ prune_dcache_sb(sb, dentries);
+ prune_icache_sb(sb, inodes);
+ if (sb->s_op && sb->s_op->free_cached_objects) {
+ sb->s_op->free_cached_objects(sb, fs_objects);
+ fs_objects = sb->s_op->nr_cached_objects(sb);
+ }
+ total_objects = sb->s_nr_dentry_unused +
+ sb->s_nr_inodes_unused + fs_objects;
}
- count = ((sb->s_nr_dentry_unused + sb->s_nr_inodes_unused) / 100)
- * sysctl_vfs_cache_pressure;
+ total_objects = (total_objects / 100) * sysctl_vfs_cache_pressure;
up_read(&sb->s_umount);
- return count;
+ return total_objects;
}
/**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6ba3739..ef2e9e2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1591,6 +1591,17 @@ struct super_operations {
ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
#endif
int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
+
+ /*
+ * memory shrinker operations.
+ * ->nr_cached_objects() should return the number of freeable cached
+ * objects the filesystem holds.
+ * ->free_cache_objects() should attempt to free the number of cached
+ * objects indicated. It should return how many objects it attempted to
+ * free.
+ */
+ int (*nr_cached_objects)(struct super_block *);
+ int (*free_cached_objects)(struct super_block *, int);
};
/*
--
1.5.6.5
* [PATCH 5/5] xfs: make use of new shrinker callout
2010-05-14 7:24 [PATCH 0/5] Per-superblock shrinkers Dave Chinner
` (3 preceding siblings ...)
2010-05-14 7:24 ` [PATCH 4/5] superblock: add filesystem shrinker operations Dave Chinner
@ 2010-05-14 7:24 ` Dave Chinner
2010-05-14 17:46 ` Defrag in shrinkers (was Re: [PATCH 0/5] Per-superblock shrinkers) Christoph Lameter
2010-05-15 1:30 ` [PATCH 0/5] Per-superblock shrinkers Al Viro
6 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2010-05-14 7:24 UTC (permalink / raw)
To: linux-kernel; +Cc: xfs, linux-fsdevel, linux-mm
From: Dave Chinner <dchinner@redhat.com>
Convert the inode reclaim shrinker to use the new per-sb shrinker
operations. This fixes a bunch of lockdep warnings about the
xfs_mount_list_lock being taken in different reclaim contexts by
removing it, and allows the reclaim to be proportioned across
filesystems with no extra code.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/linux-2.6/xfs_super.c | 23 ++++++--
fs/xfs/linux-2.6/xfs_sync.c | 124 +++++++++++-----------------------------
fs/xfs/linux-2.6/xfs_sync.h | 16 +++--
fs/xfs/quota/xfs_qm_syscalls.c | 2 +-
fs/xfs/xfs_mount.h | 1 -
5 files changed, 61 insertions(+), 105 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 29f1edc..2bd512a 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1209,7 +1209,6 @@ xfs_fs_put_super(
xfs_unmountfs(mp);
xfs_freesb(mp);
- xfs_inode_shrinker_unregister(mp);
xfs_icsb_destroy_counters(mp);
xfs_close_devices(mp);
xfs_dmops_put(mp);
@@ -1623,8 +1622,6 @@ xfs_fs_fill_super(
if (error)
goto fail_vnrele;
- xfs_inode_shrinker_register(mp);
-
kfree(mtpt);
return 0;
@@ -1678,6 +1675,22 @@ xfs_fs_get_sb(
mnt);
}
+static int
+xfs_fs_nr_cached_objects(
+ struct super_block *sb)
+{
+ return xfs_reclaim_inodes_count(XFS_M(sb));
+}
+
+static int
+xfs_fs_free_cached_objects(
+ struct super_block *sb,
+ int nr_to_scan)
+{
+ xfs_reclaim_inodes_nr(XFS_M(sb), 0, nr_to_scan);
+ return 0;
+}
+
static const struct super_operations xfs_super_operations = {
.alloc_inode = xfs_fs_alloc_inode,
.destroy_inode = xfs_fs_destroy_inode,
@@ -1691,6 +1704,8 @@ static const struct super_operations xfs_super_operations = {
.statfs = xfs_fs_statfs,
.remount_fs = xfs_fs_remount,
.show_options = xfs_fs_show_options,
+ .nr_cached_objects = xfs_fs_nr_cached_objects,
+ .free_cached_objects = xfs_fs_free_cached_objects,
};
static struct file_system_type xfs_fs_type = {
@@ -1870,7 +1885,6 @@ init_xfs_fs(void)
goto out_cleanup_procfs;
vfs_initquota();
- xfs_inode_shrinker_init();
error = register_filesystem(&xfs_fs_type);
if (error)
@@ -1898,7 +1912,6 @@ exit_xfs_fs(void)
{
vfs_exitquota();
unregister_filesystem(&xfs_fs_type);
- xfs_inode_shrinker_destroy();
xfs_sysctl_unregister();
xfs_cleanup_procfs();
xfs_buf_terminate();
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 17ec4a6..4f26673 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -137,7 +137,7 @@ restart:
} while ((*nr_to_scan)--);
- if (skipped) {
+ if (skipped && *nr_to_scan > 0) {
delay(1);
goto restart;
}
@@ -152,14 +152,14 @@ xfs_inode_ag_iterator(
int flags,
int tag,
int exclusive,
- int *nr_to_scan)
+ int nr_to_scan)
{
int error = 0;
int last_error = 0;
xfs_agnumber_t ag;
- int nr;
- nr = nr_to_scan ? *nr_to_scan : INT_MAX;
+ if (nr_to_scan <= 0)
+ nr_to_scan = INT_MAX;
for (ag = 0; ag < mp->m_sb.sb_agcount; ag++) {
struct xfs_perag *pag;
@@ -169,18 +169,16 @@ xfs_inode_ag_iterator(
continue;
}
error = xfs_inode_ag_walk(mp, pag, execute, flags, tag,
- exclusive, &nr);
+ exclusive, &nr_to_scan);
xfs_perag_put(pag);
if (error) {
last_error = error;
if (error == EFSCORRUPTED)
break;
}
- if (nr <= 0)
+ if (nr_to_scan <= 0)
break;
}
- if (nr_to_scan)
- *nr_to_scan = nr;
return XFS_ERROR(last_error);
}
@@ -299,7 +297,7 @@ xfs_sync_data(
ASSERT((flags & ~(SYNC_TRYLOCK|SYNC_WAIT)) == 0);
error = xfs_inode_ag_iterator(mp, xfs_sync_inode_data, flags,
- XFS_ICI_NO_TAG, 0, NULL);
+ XFS_ICI_NO_TAG, 0, 0);
if (error)
return XFS_ERROR(error);
@@ -318,7 +316,7 @@ xfs_sync_attr(
ASSERT((flags & ~SYNC_WAIT) == 0);
return xfs_inode_ag_iterator(mp, xfs_sync_inode_attr, flags,
- XFS_ICI_NO_TAG, 0, NULL);
+ XFS_ICI_NO_TAG, 0, 0);
}
STATIC int
@@ -858,100 +856,44 @@ reclaim:
}
+/*
+ * Scan a certain number of inodes for reclaim. nr_to_scan <= 0 means reclaim
+ * every inode that has the reclaim tag set.
+ */
int
-xfs_reclaim_inodes(
+xfs_reclaim_inodes_nr(
xfs_mount_t *mp,
- int mode)
+ int mode,
+ int nr_to_scan)
{
return xfs_inode_ag_iterator(mp, xfs_reclaim_inode, mode,
- XFS_ICI_RECLAIM_TAG, 1, NULL);
+ XFS_ICI_RECLAIM_TAG, 1, nr_to_scan);
}
/*
- * Shrinker infrastructure.
+ * Return the number of reclaimable inodes in the filesystem for
+ * the shrinker to determine how much to reclaim.
*
- * This is all far more complex than it needs to be. It adds a global list of
- * mounts because the shrinkers can only call a global context. We need to make
- * the shrinkers pass a context to avoid the need for global state.
+ * Because the inode cache may not have any reclaimable inodes in it, but will
+ * be populated as part of the higher level cleaning, we need to count all
+ * those inodes as reclaimable here as well.
*/
-static LIST_HEAD(xfs_mount_list);
-static struct rw_semaphore xfs_mount_list_lock;
-
-static int
-xfs_reclaim_inode_shrink(
- struct shrinker *shrink,
- int nr_to_scan,
- gfp_t gfp_mask)
+int
+xfs_reclaim_inodes_count(
+ xfs_mount_t *mp)
{
- struct xfs_mount *mp;
- struct xfs_perag *pag;
- xfs_agnumber_t ag;
- int reclaimable = 0;
-
- if (nr_to_scan) {
- if (!(gfp_mask & __GFP_FS))
- return -1;
-
- down_read(&xfs_mount_list_lock);
- list_for_each_entry(mp, &xfs_mount_list, m_mplist) {
- xfs_inode_ag_iterator(mp, xfs_reclaim_inode, 0,
- XFS_ICI_RECLAIM_TAG, 1, &nr_to_scan);
- if (nr_to_scan <= 0)
- break;
- }
- up_read(&xfs_mount_list_lock);
- }
-
- down_read(&xfs_mount_list_lock);
- list_for_each_entry(mp, &xfs_mount_list, m_mplist) {
- for (ag = 0; ag < mp->m_sb.sb_agcount; ag++) {
+ xfs_agnumber_t ag;
+ int reclaimable = 0;
- pag = xfs_perag_get(mp, ag);
- if (!pag->pag_ici_init) {
- xfs_perag_put(pag);
- continue;
- }
- reclaimable += pag->pag_ici_reclaimable;
+ for (ag = 0; ag < mp->m_sb.sb_agcount; ag++) {
+ struct xfs_perag *pag = xfs_perag_get(mp, ag);
+ if (!pag->pag_ici_init) {
xfs_perag_put(pag);
+ continue;
}
+ reclaimable += pag->pag_ici_reclaimable;
+ xfs_perag_put(pag);
}
- up_read(&xfs_mount_list_lock);
- return reclaimable;
-}
-
-static struct shrinker xfs_inode_shrinker = {
- .shrink = xfs_reclaim_inode_shrink,
- .seeks = DEFAULT_SEEKS,
-};
-
-void __init
-xfs_inode_shrinker_init(void)
-{
- init_rwsem(&xfs_mount_list_lock);
- register_shrinker(&xfs_inode_shrinker);
-}
-
-void
-xfs_inode_shrinker_destroy(void)
-{
- ASSERT(list_empty(&xfs_mount_list));
- unregister_shrinker(&xfs_inode_shrinker);
-}
-
-void
-xfs_inode_shrinker_register(
- struct xfs_mount *mp)
-{
- down_write(&xfs_mount_list_lock);
- list_add_tail(&mp->m_mplist, &xfs_mount_list);
- up_write(&xfs_mount_list_lock);
+ return reclaimable + mp->m_super->s_nr_inodes_unused;
}
-void
-xfs_inode_shrinker_unregister(
- struct xfs_mount *mp)
-{
- down_write(&xfs_mount_list_lock);
- list_del(&mp->m_mplist);
- up_write(&xfs_mount_list_lock);
-}
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index cdcbaac..c55f645 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -43,7 +43,14 @@ void xfs_quiesce_attr(struct xfs_mount *mp);
void xfs_flush_inodes(struct xfs_inode *ip);
-int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
+int xfs_reclaim_inodes_count(struct xfs_mount *mp);
+int xfs_reclaim_inodes_nr(struct xfs_mount *mp, int mode, int nr_to_scan);
+
+static inline int
+xfs_reclaim_inodes(struct xfs_mount *mp, int mode)
+{
+ return xfs_reclaim_inodes_nr(mp, mode, 0);
+}
void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
@@ -53,11 +60,6 @@ void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
int xfs_sync_inode_valid(struct xfs_inode *ip, struct xfs_perag *pag);
int xfs_inode_ag_iterator(struct xfs_mount *mp,
int (*execute)(struct xfs_inode *ip, struct xfs_perag *pag, int flags),
- int flags, int tag, int write_lock, int *nr_to_scan);
-
-void xfs_inode_shrinker_init(void);
-void xfs_inode_shrinker_destroy(void);
-void xfs_inode_shrinker_register(struct xfs_mount *mp);
-void xfs_inode_shrinker_unregister(struct xfs_mount *mp);
+ int flags, int tag, int write_lock, int nr_to_scan);
#endif
diff --git a/fs/xfs/quota/xfs_qm_syscalls.c b/fs/xfs/quota/xfs_qm_syscalls.c
index 50bee07..94c0cac 100644
--- a/fs/xfs/quota/xfs_qm_syscalls.c
+++ b/fs/xfs/quota/xfs_qm_syscalls.c
@@ -892,7 +892,7 @@ xfs_qm_dqrele_all_inodes(
{
ASSERT(mp->m_quotainfo);
xfs_inode_ag_iterator(mp, xfs_dqrele_inode, flags,
- XFS_ICI_NO_TAG, 0, NULL);
+ XFS_ICI_NO_TAG, 0, 0);
}
/*------------------------------------------------------------------------*/
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 9ff48a1..4fa0bc7 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -259,7 +259,6 @@ typedef struct xfs_mount {
wait_queue_head_t m_wait_single_sync_task;
__int64_t m_update_flags; /* sb flags we need to update
on the next remount,rw */
- struct list_head m_mplist; /* inode shrinker mount list */
} xfs_mount_t;
/*
--
1.5.6.5
* Defrag in shrinkers (was Re: [PATCH 0/5] Per-superblock shrinkers)
2010-05-14 7:24 [PATCH 0/5] Per-superblock shrinkers Dave Chinner
` (4 preceding siblings ...)
2010-05-14 7:24 ` [PATCH 5/5] xfs: make use of new shrinker callout Dave Chinner
@ 2010-05-14 17:46 ` Christoph Lameter
2010-05-14 20:36 ` Defrag in shrinkers Andi Kleen
2010-05-15 1:15 ` Defrag in shrinkers (was Re: [PATCH 0/5] Per-superblock shrinkers) Dave Chinner
2010-05-15 1:30 ` [PATCH 0/5] Per-superblock shrinkers Al Viro
6 siblings, 2 replies; 14+ messages in thread
From: Christoph Lameter @ 2010-05-14 17:46 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-kernel, xfs, linux-fsdevel, linux-mm, Pekka Enberg, npiggin
Would it also be possible to add some defragmentation logic when you
revise the shrinkers? Here is a prototype patch that would allow you to
determine the other objects sitting in the same page as a given object.
With that I hope that you have enough information to determine whether
it is worth evicting the other objects as well to reclaim the slab page.
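As a sketch of how a cache shrinker might consume this (the caller and
try_to_evict_object() are hypothetical; only kmem_cache_objects() comes
from the patch below):

	const void *objs[64];	/* assumes at most 64 objects per slab page */
	int capacity, inuse, i;

	inuse = kmem_cache_objects(cachep, victim, objs, 64, &capacity);
	if (inuse > 0 && inuse * 4 < capacity) {
		/*
		 * The slab page is less than a quarter full, so evicting
		 * the co-resident objects would free the whole page.
		 */
		for (i = 0; i < inuse; i++)
			try_to_evict_object(objs[i]);	/* cache-specific eviction */
	}

The caller would need to hold whatever lock prevents kmem_cache_free()
on this cache while walking objs[], per the warning in the comment
below.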
From: Christoph Lameter <cl@linux-foundation.org>
Subject: Slab allocators: Introduce function to determine other objects in the same slab page
kmem_cache_objects() can be used to determine the other objects sharing the same
slab. With such knowledge a slab user can intentionally free all slab objects
in a slab to allow the freeing of the slab as a whole. This is particularly
important for the dentry and inode cache handling since they reclaim objects
in LRU fashion. With this function they can see if the object is sitting in
a sparsely populated slab page and if so decide to reclaim the other objects
in the slab page. In many situations we can otherwise get high memory use
since only a very small portion of the available object slots are in use
(this can occur after a file scan or when the computational load on a server
changes).
kmem_cache_objects() returns the number of objects currently in use. A parameter
allows the retrieval of the maximum number of objects that would fit into this
slab page.
The user must then use these numbers to determine if an effort should be made
to free the remaining objects. The allocated objects are returned in an array
of pointers.
Objects can only stay allocated if the user has some way of locking out
kmem_cache_free() operations on the slab. Otherwise the operations on the
returned object pointers cause race conditions.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
include/linux/slab.h | 18 ++++++++++++++++++
mm/slab.c | 23 +++++++++++++++++++++++
mm/slob.c | 6 ++++++
mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
4 files changed, 89 insertions(+)
Index: linux-2.6/include/linux/slab.h
===================================================================
--- linux-2.6.orig/include/linux/slab.h 2010-05-14 12:24:44.000000000 -0500
+++ linux-2.6/include/linux/slab.h 2010-05-14 12:37:36.000000000 -0500
@@ -110,6 +110,24 @@ int kern_ptr_validate(const void *ptr, u
int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr);
/*
+ * Determine objects in the same slab page as a given object.
+ *
+ * The return value is the number of objects currently allocated in the slab
+ * page, or a negative error value. The maximum number of objects that this
+ * slab page can hold is returned via the capacity argument.
+ *
+ * Warning: The objects returned can be freed at any time, so the pointers
+ * can become invalid unless other measures are taken to prevent the objects
+ * from being freed while looping through the list.
+ *
+ * Return codes:
+ * -E2BIG More objects than fit into the provided list.
+ * -EBUSY The slab page is part of an allocation queue (frozen).
+ */
+int kmem_cache_objects(struct kmem_cache *slab, const void *x,
+ const void **list, int max, int *capacity);
+
+/*
* Please use this macro to create slab caches. Simply specify the
* name of the structure and maybe some flags that are listed above.
*
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2010-05-14 12:37:27.000000000 -0500
+++ linux-2.6/mm/slub.c 2010-05-14 12:40:10.000000000 -0500
@@ -2868,6 +2868,48 @@ void kfree(const void *x)
}
EXPORT_SYMBOL(kfree);
+static void get_object(struct kmem_cache *s, void *object, void *private)
+{
+ const void ***list = private;
+
+ *(*list)++ = object;
+}
+
+int kmem_cache_objects(struct kmem_cache *s, const void *x,
+ const void **list, int list_size, int *capacity)
+{
+ int r;
+ struct page *page;
+ unsigned long *map;
+
+ page = virt_to_head_page(x);
+ BUG_ON(!PageSlab(page));
+ BUG_ON(page->slab != s);
+ *capacity = page->objects;
+
+ map = kmalloc(BITS_TO_LONGS(page->objects) * sizeof(unsigned long), GFP_KERNEL);
+
+ slab_lock(page);
+ r = page->inuse;
+
+ if (page->inuse > list_size) {
+ r = -E2BIG;
+ goto abort;
+ }
+
+ if (PageSlubFrozen(page)) {
+ r = -EBUSY;
+ goto abort;
+ }
+
+ traverse_objects(s, page, get_object, &list, map);
+
+abort:
+ slab_unlock(page);
+ kfree(map);
+ return r;
+}
+
/*
* kmem_cache_shrink removes empty slabs from the partial lists and sorts
* the remaining slabs by the number of items in use. The slabs with the
Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c 2010-05-14 12:24:44.000000000 -0500
+++ linux-2.6/mm/slab.c 2010-05-14 12:37:36.000000000 -0500
@@ -3617,6 +3617,29 @@ out:
return 0;
}
+int kmem_cache_objects(struct kmem_cache *cachep, const void *objp,
+ const void **list, int list_size, int *capacity)
+{
+ struct slab *slabp = virt_to_slab(objp);
+ void *p;
+ int i;
+
+ BUG_ON(cachep != virt_to_cache(objp));
+
+ *capacity = cachep->num;
+ if (slabp->inuse > list_size)
+ return -E2BIG;
+
+ for (i = 0, p = slabp->s_mem; i < cachep->num;
+ i++, p += cachep->buffer_size) {
+
+ if (slab_bufctl(slabp)[i] == BUFCTL_ACTIVE)
+ *list++ = p;
+
+ }
+ return slabp->inuse;
+}
+
#ifdef CONFIG_NUMA
void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
Index: linux-2.6/mm/slob.c
===================================================================
--- linux-2.6.orig/mm/slob.c 2010-05-14 12:24:44.000000000 -0500
+++ linux-2.6/mm/slob.c 2010-05-14 12:37:36.000000000 -0500
@@ -658,6 +658,12 @@ void kmem_cache_free(struct kmem_cache *
}
EXPORT_SYMBOL(kmem_cache_free);
+int kmem_cache_objects(struct kmem_cache *c, const void *b, const void **list,
+ int list_size, int *capacity)
+{
+ return -EBUSY;
+}
+
unsigned int kmem_cache_size(struct kmem_cache *c)
{
return c->size;
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Defrag in shrinkers
2010-05-14 17:46 ` Defrag in shrinkers (was Re: [PATCH 0/5] Per-superblock shrinkers) Christoph Lameter
@ 2010-05-14 20:36 ` Andi Kleen
2010-05-15 17:08 ` Ed Tomlinson
2010-05-15 1:15 ` Defrag in shrinkers (was Re: [PATCH 0/5] Per-superblock shrinkers) Dave Chinner
1 sibling, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2010-05-14 20:36 UTC (permalink / raw)
To: Christoph Lameter
Cc: Dave Chinner, linux-kernel, xfs, linux-fsdevel, linux-mm,
Pekka Enberg, npiggin
Christoph Lameter <cl@linux.com> writes:
> Would it also be possible to add some defragmentation logic when you
> revise the shrinkers? Here is a prototype patch that would allow you to
> determine the other objects sitting in the same page as a given object.
>
> With that I hope that you have enough information to determine if it's
> worth evicting the other objects as well to reclaim the slab page.
I like the idea, it would be useful for the hwpoison code too,
when it tries to clean a page.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Defrag in shrinkers (was Re: [PATCH 0/5] Per-superblock shrinkers)
2010-05-14 17:46 ` Defrag in shrinkers (was Re: [PATCH 0/5] Per-superblock shrinkers) Christoph Lameter
2010-05-14 20:36 ` Defrag in shrinkers Andi Kleen
@ 2010-05-15 1:15 ` Dave Chinner
1 sibling, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2010-05-15 1:15 UTC (permalink / raw)
To: Christoph Lameter
Cc: linux-kernel, xfs, linux-fsdevel, linux-mm, Pekka Enberg, npiggin
On Fri, May 14, 2010 at 12:46:52PM -0500, Christoph Lameter wrote:
> Would it also be possible to add some defragmentation logic when you
> revise the shrinkers? Here is a prototype patch that would allow you to
> determine the other objects sitting in the same page as a given object.
>
> With that I hope that you have enough information to determine if its
> worth to evict the other objects as well to reclaim the slab page.
I'll have a think about how this might fit in - the real problem is
when the list returns objects that belong to a different superblock.
We can only safely check whether the object belongs to the current
superblock - to check if it belongs to a different sb we would have a lot
of locks and reference counting to juggle. That would require
re-introducing all the muck (and then some) that this patchset
removes from the shrinkers.
Perhaps just freeing the objects that belong to the current sb would
be sufficient to realise significant improvements (will be fine for
systems that only have one active or dominant filesystem), but I
think some experimentation would be needed.
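(Editorial sketch of that idea, not code from the series: given the object
list for a dentry slab page, only the entries belonging to the superblock we
already hold would be touched; dentry locking details are omitted and
prune_one() is an assumed helper.)

static int prune_sb_neighbours(struct super_block *sb, const void **list,
			       int inuse)
{
	int i, pruned = 0;

	for (i = 0; i < inuse; i++) {
		struct dentry *dentry = (struct dentry *)list[i];

		if (dentry->d_sb != sb)	/* foreign sb: too costly to chase */
			continue;
		prune_one(dentry);	/* assumed: reclaim via this sb's LRU */
		pruned++;
	}
	return pruned;
}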
That brings us to test cases - we need a good one. I think we
need to re-evaluate where we stand with regard to slab fragmentation
(which probably hasn't changed much), and we need to be able to
quantify the amount of improvement the increase in complexity will
provide. I don't have anything close to hand to generate such
fragmentation, so it might take a little time to write a test that
does the IO patterns I know will generate problems...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] Per-superblock shrinkers
2010-05-14 7:24 [PATCH 0/5] Per-superblock shrinkers Dave Chinner
` (5 preceding siblings ...)
2010-05-14 17:46 ` Defrag in shrinkers (was Re: [PATCH 0/5] Per-superblock shrinkers) Christoph Lameter
@ 2010-05-15 1:30 ` Al Viro
2010-05-17 0:19 ` Dave Chinner
6 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2010-05-15 1:30 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-kernel, xfs, linux-fsdevel, linux-mm
On Fri, May 14, 2010 at 05:24:18PM +1000, Dave Chinner wrote:
>
> This series reworks the filesystem shrinkers. We currently have a
> set of issues with the current filesystem shrinkers:
>
> 1. There is an dependency between dentry and inode cache
> shrinking that is only implicitly defined by the order of
> shrinker registration.
> 2. The shrinkers need to walk the superblock list and pin
> the superblock to avoid unmount races with the sb going
> away.
> 3. The dentry cache uses per-superblock LRUs and proportions
> reclaim between all the superblocks which means we are
> doing breadth based reclaim. This means we touch every
> superblock for every shrinker call, and may only reclaim
> a single dentry at a time from a given superblock.
> 4. The inode cache has a global LRU, so it has different
> reclaim patterns to the dentry cache, despite the fact
> that the dentry cache is generally the only thing that
> pins inodes in memory.
> 5. Filesystems need to register their own shrinkers for
> caches and can't co-ordinate them with the dentry and
> inode cache shrinkers.
NAK in that form; sb refcounting and iterators had been reworked for .34,
so at least it needs a rediff on top of that. What's more, it's very
obviously broken wrt locking - you are unregistering a shrinker
from __put_super(), i.e. grabbing the rwsem exclusively under a spinlock.
Essentially, you've turned dropping a _passive_ reference to superblock
(currently an operation safe in any context) into an operation allowed
only when no fs or vm locks are held by caller. Not going to work...
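(Editorial note for context: unregister_shrinker() in mm/vmscan.c is roughly
the following, and the down_write() may sleep, which is why it cannot be
called with the sb_lock spinlock held.)

void unregister_shrinker(struct shrinker *shrinker)
{
	down_write(&shrinker_rwsem);	/* may sleep - illegal under a spinlock */
	list_del(&shrinker->list);
	up_write(&shrinker_rwsem);
}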
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Defrag in shrinkers
2010-05-14 20:36 ` Defrag in shrinkers Andi Kleen
@ 2010-05-15 17:08 ` Ed Tomlinson
2010-05-17 0:24 ` Dave Chinner
0 siblings, 1 reply; 14+ messages in thread
From: Ed Tomlinson @ 2010-05-15 17:08 UTC (permalink / raw)
To: Andi Kleen
Cc: Christoph Lameter, Dave Chinner, linux-kernel, xfs, linux-fsdevel,
linux-mm, Pekka Enberg, npiggin
On Friday 14 May 2010 16:36:03 Andi Kleen wrote:
> Christoph Lameter <cl@linux.com> writes:
>
> > Would it also be possible to add some defragmentation logic when you
> > revise the shrinkers? Here is a prototype patch that would allow you to
> > determine the other objects sitting in the same page as a given object.
> >
> > With that I hope that you have enough information to determine if it's
> > worth evicting the other objects as well to reclaim the slab page.
>
> I like the idea, it would be useful for the hwpoison code too,
> when it tries to clean a page.
If this is done generally, we probably want to retune the 'pressure' put on the slab. The
whole reason for the callbacks was to keep the 'pressure' on the slab proportional to the
memory pressure (scan rate).
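(Editorial aside: the proportionality in question is what shrink_slab()
already computes, visible in the mm/vmscan.c context of the later patch;
a rough sketch of that arithmetic, using the same variable names:)

static unsigned long slab_pressure(unsigned long scanned, unsigned long lru_pages,
				   unsigned long max_pass, int seeks)
{
	unsigned long long delta;

	delta = (4ULL * scanned) / seeks;	/* pressure from the page scan rate */
	delta *= max_pass;			/* scaled by the size of this cache */
	do_div(delta, lru_pages + 1);		/* normalised against the LRUs */
	return delta;
}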
Ed Tomlinson
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] Per-superblock shrinkers
2010-05-15 1:30 ` [PATCH 0/5] Per-superblock shrinkers Al Viro
@ 2010-05-17 0:19 ` Dave Chinner
0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2010-05-17 0:19 UTC (permalink / raw)
To: Al Viro; +Cc: linux-kernel, xfs, linux-fsdevel, linux-mm
On Sat, May 15, 2010 at 02:30:05AM +0100, Al Viro wrote:
> On Fri, May 14, 2010 at 05:24:18PM +1000, Dave Chinner wrote:
> >
> > This series reworks the filesystem shrinkers. We currently have a
> > set of issues with the current filesystem shrinkers:
> >
> > 1. There is an dependency between dentry and inode cache
> > shrinking that is only implicitly defined by the order of
> > shrinker registration.
> > 2. The shrinkers need to walk the superblock list and pin
> > the superblock to avoid unmount races with the sb going
> > away.
> > 3. The dentry cache uses per-superblock LRUs and proportions
> > reclaim between all the superblocks which means we are
> > doing breadth based reclaim. This means we touch every
> > superblock for every shrinker call, and may only reclaim
> > a single dentry at a time from a given superblock.
> > 4. The inode cache has a global LRU, so it has different
> > reclaim patterns to the dentry cache, despite the fact
> > that the dentry cache is generally the only thing that
> > pins inodes in memory.
> > 5. Filesystems need to register their own shrinkers for
> > caches and can't co-ordinate them with the dentry and
> > inode cache shrinkers.
>
> NAK in that form; sb refcounting and iterators had been reworked for .34,
> so at least it needs rediff on top of that.
The tree I based this on was 2.6.34-rc7 - is there new code in a
-next branch somewhere?
> What's more, it's very
> obviously broken wrt locking - you are unregistering a shrinker
> from __put_super(). I.e. grab rwsem exclusively under a spinlock.
> Essentially, you've turned dropping a _passive_ reference to superblock
> (currently an operation safe in any context) into an operation allowed
> only when no fs or vm locks are held by caller. Not going to work...
Yeah, I picked that up after I posted it. My bad - I'll look into how
I can rework that for the next iteration.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Defrag in shrinkers
2010-05-15 17:08 ` Ed Tomlinson
@ 2010-05-17 0:24 ` Dave Chinner
0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2010-05-17 0:24 UTC (permalink / raw)
To: Ed Tomlinson
Cc: Andi Kleen, Christoph Lameter, linux-kernel, xfs, linux-fsdevel,
linux-mm, Pekka Enberg, npiggin
On Sat, May 15, 2010 at 01:08:17PM -0400, Ed Tomlinson wrote:
> On Friday 14 May 2010 16:36:03 Andi Kleen wrote:
> > Christoph Lameter <cl@linux.com> writes:
> >
> > > Would it also be possible to add some defragmentation logic when you
> > > revise the shrinkers? Here is a prototype patch that would allow you to
> > > determine the other objects sitting in the same page as a given object.
> > >
> > > With that I hope that you have enough information to determine if it's
> > > worth evicting the other objects as well to reclaim the slab page.
> >
> > I like the idea, it would be useful for the hwpoison code too,
> > when it tries to clean a page.
>
> If this is done generally we probably want to retune the 'pressure' put on the slab. The
> whole reason for the callbacks was to keep the 'pressure on the slab proportional to the
> memory pressure (scan rate).
I don't see that defrag-based reclaim changes the concept of
pressure at all. As long as reclaim follows the nr_to_scan
guideline, it doesn't matter if we do reclaim from the LRU or
reclaim from a list provided by the slab cache....
FWIW, one thing that would be necessary, I think, is to avoid defrag
until a certain level of fragmentation has occurred - we should do
LRU-based reclaim as much as possible, and only switch to defrag-style
reclaim once we cross a threshold (e.g. once the slab is 25% partial
pages).
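(Editorial sketch of such a trigger; the page-count helpers are assumptions,
since no such interface exists yet:)

static bool defrag_reclaim_worthwhile(struct kmem_cache *cache)
{
	unsigned long total = cache_total_pages(cache);		/* assumed helper */
	unsigned long partial = cache_partial_pages(cache);	/* assumed helper */

	/* stick with plain LRU reclaim until ~25% of the pages are partial */
	return total && partial * 4 >= total;
}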
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/5] mm: add context argument to shrinker callback
2010-05-25 8:53 [PATCH 0/5] Per superblock shrinkers V2 Dave Chinner
@ 2010-05-25 8:53 ` Dave Chinner
0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2010-05-25 8:53 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-fsdevel, linux-mm, xfs
From: Dave Chinner <dchinner@redhat.com>
The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().
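(Illustrative, editorial example of the pattern this enables; the myfs names
and the count/scan helpers are placeholders, not part of this patch:)

struct myfs_sb_info {
	struct shrinker		shrinker;
	/* ... per-superblock caches ... */
};

static int myfs_cache_shrink(struct shrinker *shrink, int nr_to_scan,
			     gfp_t gfp_mask)
{
	struct myfs_sb_info *sbi =
		container_of(shrink, struct myfs_sb_info, shrinker);

	if (!nr_to_scan)
		return myfs_cache_count(sbi);	/* assumed: report object count */
	return myfs_cache_scan(sbi, nr_to_scan, gfp_mask); /* assumed: reclaim */
}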
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
arch/x86/kvm/mmu.c | 2 +-
drivers/gpu/drm/i915/i915_gem.c | 2 +-
fs/dcache.c | 2 +-
fs/gfs2/glock.c | 2 +-
fs/gfs2/quota.c | 2 +-
fs/gfs2/quota.h | 2 +-
fs/inode.c | 2 +-
fs/mbcache.c | 5 +++--
fs/nfs/dir.c | 2 +-
fs/nfs/internal.h | 3 ++-
fs/quota/dquot.c | 2 +-
fs/ubifs/shrinker.c | 2 +-
fs/ubifs/ubifs.h | 2 +-
fs/xfs/linux-2.6/xfs_buf.c | 5 +++--
fs/xfs/linux-2.6/xfs_sync.c | 1 +
fs/xfs/quota/xfs_qm.c | 7 +++++--
include/linux/mm.h | 2 +-
mm/vmscan.c | 8 +++++---
18 files changed, 31 insertions(+), 22 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 81563e7..ac3d107 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2919,7 +2919,7 @@ static int kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm)
return kvm_mmu_zap_page(kvm, page) + 1;
}
-static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
+static int mmu_shrink(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
{
struct kvm *kvm;
struct kvm *kvm_freed = NULL;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 112699f..6cd2e7a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5216,7 +5216,7 @@ i915_gpu_is_active(struct drm_device *dev)
}
static int
-i915_gem_shrink(int nr_to_scan, gfp_t gfp_mask)
+i915_gem_shrink(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
{
drm_i915_private_t *dev_priv, *next_dev;
struct drm_i915_gem_object *obj_priv, *next_obj;
diff --git a/fs/dcache.c b/fs/dcache.c
index d96047b..dba6b6d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -894,7 +894,7 @@ EXPORT_SYMBOL(shrink_dcache_parent);
*
* In this case we return -1 to tell the caller that we baled.
*/
-static int shrink_dcache_memory(int nr, gfp_t gfp_mask)
+static int shrink_dcache_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask)
{
if (nr) {
if (!(gfp_mask & __GFP_FS))
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index ddcdbf4..04b540c 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1348,7 +1348,7 @@ void gfs2_glock_complete(struct gfs2_glock *gl, int ret)
}
-static int gfs2_shrink_glock_memory(int nr, gfp_t gfp_mask)
+static int gfs2_shrink_glock_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask)
{
struct gfs2_glock *gl;
int may_demote;
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 49667d6..4ea548f 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -77,7 +77,7 @@ static LIST_HEAD(qd_lru_list);
static atomic_t qd_lru_count = ATOMIC_INIT(0);
static DEFINE_SPINLOCK(qd_lru_lock);
-int gfs2_shrink_qd_memory(int nr, gfp_t gfp_mask)
+int gfs2_shrink_qd_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask)
{
struct gfs2_quota_data *qd;
struct gfs2_sbd *sdp;
diff --git a/fs/gfs2/quota.h b/fs/gfs2/quota.h
index 195f60c..e7d236c 100644
--- a/fs/gfs2/quota.h
+++ b/fs/gfs2/quota.h
@@ -51,7 +51,7 @@ static inline int gfs2_quota_lock_check(struct gfs2_inode *ip)
return ret;
}
-extern int gfs2_shrink_qd_memory(int nr, gfp_t gfp_mask);
+extern int gfs2_shrink_qd_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask);
extern const struct quotactl_ops gfs2_quotactl_ops;
#endif /* __QUOTA_DOT_H__ */
diff --git a/fs/inode.c b/fs/inode.c
index 3caa758..1e44ec5 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -577,7 +577,7 @@ static void prune_icache(int count)
* This function is passed the number of inodes to scan, and it returns the
* total number of remaining possibly-reclaimable inodes.
*/
-static int shrink_icache_memory(int nr, gfp_t gfp_mask)
+static int shrink_icache_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask)
{
if (nr) {
/*
diff --git a/fs/mbcache.c b/fs/mbcache.c
index ec88ff3..e28f21b 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -115,7 +115,7 @@ mb_cache_indexes(struct mb_cache *cache)
* What the mbcache registers as to get shrunk dynamically.
*/
-static int mb_cache_shrink_fn(int nr_to_scan, gfp_t gfp_mask);
+static int mb_cache_shrink_fn(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask);
static struct shrinker mb_cache_shrinker = {
.shrink = mb_cache_shrink_fn,
@@ -191,13 +191,14 @@ forget:
* This function is called by the kernel memory management when memory
* gets low.
*
+ * @shrink: (ignored)
* @nr_to_scan: Number of objects to scan
* @gfp_mask: (ignored)
*
* Returns the number of objects which are present in the cache.
*/
static int
-mb_cache_shrink_fn(int nr_to_scan, gfp_t gfp_mask)
+mb_cache_shrink_fn(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
{
LIST_HEAD(free_list);
struct list_head *l, *ltmp;
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ee9a179..3f33bc0 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1708,7 +1708,7 @@ static void nfs_access_free_list(struct list_head *head)
}
}
-int nfs_access_cache_shrinker(int nr_to_scan, gfp_t gfp_mask)
+int nfs_access_cache_shrinker(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
{
LIST_HEAD(head);
struct nfs_inode *nfsi;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index d8bd619..e70f44b 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -205,7 +205,8 @@ extern struct rpc_procinfo nfs4_procedures[];
void nfs_close_context(struct nfs_open_context *ctx, int is_sync);
/* dir.c */
-extern int nfs_access_cache_shrinker(int nr_to_scan, gfp_t gfp_mask);
+extern int nfs_access_cache_shrinker(struct shrinker *shrink,
+ int nr_to_scan, gfp_t gfp_mask);
/* inode.c */
extern struct workqueue_struct *nfsiod_workqueue;
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 655a4c5..cfd5437 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -697,7 +697,7 @@ static int dqstats_read(unsigned int type)
* more memory
*/
-static int shrink_dqcache_memory(int nr, gfp_t gfp_mask)
+static int shrink_dqcache_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask)
{
if (nr) {
spin_lock(&dq_list_lock);
diff --git a/fs/ubifs/shrinker.c b/fs/ubifs/shrinker.c
index 02feb59..0b20111 100644
--- a/fs/ubifs/shrinker.c
+++ b/fs/ubifs/shrinker.c
@@ -277,7 +277,7 @@ static int kick_a_thread(void)
return 0;
}
-int ubifs_shrinker(int nr, gfp_t gfp_mask)
+int ubifs_shrinker(struct shrinker *shrink, int nr, gfp_t gfp_mask)
{
int freed, contention = 0;
long clean_zn_cnt = atomic_long_read(&ubifs_clean_zn_cnt);
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index bd2542d..5a92345 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -1575,7 +1575,7 @@ int ubifs_tnc_start_commit(struct ubifs_info *c, struct ubifs_zbranch *zroot);
int ubifs_tnc_end_commit(struct ubifs_info *c);
/* shrinker.c */
-int ubifs_shrinker(int nr_to_scan, gfp_t gfp_mask);
+int ubifs_shrinker(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask);
/* commit.c */
int ubifs_bg_thread(void *info);
diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index f01de3c..fe8bf82 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -44,7 +44,7 @@
static kmem_zone_t *xfs_buf_zone;
STATIC int xfsbufd(void *);
-STATIC int xfsbufd_wakeup(int, gfp_t);
+STATIC int xfsbufd_wakeup(struct shrinker *, int, gfp_t);
STATIC void xfs_buf_delwri_queue(xfs_buf_t *, int);
static struct shrinker xfs_buf_shake = {
.shrink = xfsbufd_wakeup,
@@ -339,7 +339,7 @@ _xfs_buf_lookup_pages(
__func__, gfp_mask);
XFS_STATS_INC(xb_page_retries);
- xfsbufd_wakeup(0, gfp_mask);
+ xfsbufd_wakeup(NULL, 0, gfp_mask);
congestion_wait(BLK_RW_ASYNC, HZ/50);
goto retry;
}
@@ -1753,6 +1753,7 @@ xfs_buf_runall_queues(
STATIC int
xfsbufd_wakeup(
+ struct shrinker *shrink,
int priority,
gfp_t mask)
{
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 3884e20..c881a0c 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -842,6 +842,7 @@ static struct rw_semaphore xfs_mount_list_lock;
static int
xfs_reclaim_inode_shrink(
+ struct shrinker *shrink,
int nr_to_scan,
gfp_t gfp_mask)
{
diff --git a/fs/xfs/quota/xfs_qm.c b/fs/xfs/quota/xfs_qm.c
index 38e7641..b8051aa 100644
--- a/fs/xfs/quota/xfs_qm.c
+++ b/fs/xfs/quota/xfs_qm.c
@@ -69,7 +69,7 @@ STATIC void xfs_qm_list_destroy(xfs_dqlist_t *);
STATIC int xfs_qm_init_quotainos(xfs_mount_t *);
STATIC int xfs_qm_init_quotainfo(xfs_mount_t *);
-STATIC int xfs_qm_shake(int, gfp_t);
+STATIC int xfs_qm_shake(struct shrinker *, int, gfp_t);
static struct shrinker xfs_qm_shaker = {
.shrink = xfs_qm_shake,
@@ -2117,7 +2117,10 @@ xfs_qm_shake_freelist(
*/
/* ARGSUSED */
STATIC int
-xfs_qm_shake(int nr_to_scan, gfp_t gfp_mask)
+xfs_qm_shake(
+ struct shrinker *shrink,
+ int nr_to_scan,
+ gfp_t gfp_mask)
{
int ndqused, nfree, n;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fb19bb9..3d7eedc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -994,7 +994,7 @@ static inline void sync_mm_rss(struct task_struct *task, struct mm_struct *mm)
* querying the cache size, so a fastpath for that case is appropriate.
*/
struct shrinker {
- int (*shrink)(int nr_to_scan, gfp_t gfp_mask);
+ int (*shrink)(struct shrinker *, int nr_to_scan, gfp_t gfp_mask);
int seeks; /* seeks to recreate an obj */
/* These are for internal use */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3ff3311..9d56aaf 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -215,8 +215,9 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
list_for_each_entry(shrinker, &shrinker_list, list) {
unsigned long long delta;
unsigned long total_scan;
- unsigned long max_pass = (*shrinker->shrink)(0, gfp_mask);
+ unsigned long max_pass;
+ max_pass = (*shrinker->shrink)(shrinker, 0, gfp_mask);
delta = (4 * scanned) / shrinker->seeks;
delta *= max_pass;
do_div(delta, lru_pages + 1);
@@ -244,8 +245,9 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
int shrink_ret;
int nr_before;
- nr_before = (*shrinker->shrink)(0, gfp_mask);
- shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
+ nr_before = (*shrinker->shrink)(shrinker, 0, gfp_mask);
+ shrink_ret = (*shrinker->shrink)(shrinker, this_scan,
+ gfp_mask);
if (shrink_ret == -1)
break;
if (shrink_ret < nr_before)
--
1.5.6.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
Thread overview: 14+ messages
2010-05-14 7:24 [PATCH 0/5] Per-superblock shrinkers Dave Chinner
2010-05-14 7:24 ` [PATCH 1/5] inode: Make unused inode LRU per superblock Dave Chinner
2010-05-14 7:24 ` [PATCH 2/5] mm: add context argument to shrinker callback Dave Chinner
2010-05-14 7:24 ` [PATCH 3/5] superblock: introduce per-sb cache shrinker infrastructure Dave Chinner
2010-05-14 7:24 ` [PATCH 4/5] superblock: add filesystem shrinker operations Dave Chinner
2010-05-14 7:24 ` [PATCH 5/5] xfs: make use of new shrinker callout Dave Chinner
2010-05-14 17:46 ` Defrag in shrinkers (was Re: [PATCH 0/5] Per-superblock shrinkers) Christoph Lameter
2010-05-14 20:36 ` Defrag in shrinkers Andi Kleen
2010-05-15 17:08 ` Ed Tomlinson
2010-05-17 0:24 ` Dave Chinner
2010-05-15 1:15 ` Defrag in shrinkers (was Re: [PATCH 0/5] Per-superblock shrinkers) Dave Chinner
2010-05-15 1:30 ` [PATCH 0/5] Per-superblock shrinkers Al Viro
2010-05-17 0:19 ` Dave Chinner
-- strict thread matches above, loose matches on Subject: below --
2010-05-25 8:53 [PATCH 0/5] Per superblock shrinkers V2 Dave Chinner
2010-05-25 8:53 ` [PATCH 2/5] mm: add context argument to shrinker callback Dave Chinner