[PATCH v2 0/4] zcache: a compressed file page cache

linux-mm.kvack.org archive mirror

From: Bob Liu @ 2013-08-06 11:36 UTC
  To: linux-mm
  Cc: gregkh, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman,
	kyungmin.park, p.sarna, barry.song, penberg, Bob Liu

Overview:
Zcache is an in-kernel compressed cache for file pages.
It takes active file pages that are in the process of being reclaimed and
attempts to compress them into a dynamically allocated RAM-based memory pool.

If this process is successful, the I/O read is avoided when those file pages
are needed again. This can result in significant performance gains under
memory pressure on systems with many file pages.

History:
Nitin Gupta started zcache in 2010:
http://lwn.net/Articles/397574/
http://lwn.net/Articles/396467/

Dan Magenheimer extended zcache to support both file pages and anonymous pages.
It now lives in drivers/staging/zcache, but the current version of zcache is
too complicated to be merged upstream.

Seth Jennings implemented a lightweight compressed cache for swap pages only
(zswap), which was merged in v3.11-rc1 together with the zbud allocator.

This series reimplements a simple zcache for file pages only, based on the
same zbud allocation layer. If there is a need in the future, zswap and this
zcache can be merged into the current zcache in staging.

Who can benefit:
Applications such as databases keep a lot of file page data in memory. Under
memory pressure, some of those file pages are reclaimed after their data has
been synced to disk, and the data must be reread from disk when it is required
again. This can increase transaction latency and degrade performance. With
zcache, that data is kept compressed in memory, so only decompression is
needed instead of a disk read.

Other users with limited RAM can also mitigate the performance impact of
memory pressure when many file pages are in memory.

Design:
Zcache receives pages for compression through the Cleancache API and can
evict pages from its own compressed pool on an LRU basis when the compressed
pool is full.
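In this patch the zbud evict callback is still a stub (zcache_evict_entry() returns -1), so in practice "full" means zcache_store_page() drops the page and increments pool_limit_hit once the pool exceeds max_pool_percent of RAM. A minimal userspace sketch of that limit test follows; the zc_* helper names are hypothetical, and the integer arithmetic is meant to mirror zcache_is_full().

```c
#include <stdbool.h>

/*
 * Hypothetical userspace sketch of zcache's pool-limit test.
 * total_ram_pages stands in for the kernel's totalram_pages, pool_pages for
 * zcache_pool_pages, and max_pool_percent for the max_pool_percent module
 * parameter (default 10).
 */
static unsigned long zc_pool_limit(unsigned long total_ram_pages,
				   unsigned int max_pool_percent)
{
	/* Multiply before dividing, in integer arithmetic, like zcache_is_full() */
	return total_ram_pages * max_pool_percent / 100;
}

static bool zc_pool_is_full(unsigned long total_ram_pages,
			    unsigned int max_pool_percent,
			    unsigned long pool_pages)
{
	/* "Full" only once the pool has grown strictly past the limit */
	return zc_pool_limit(total_ram_pages, max_pool_percent) < pool_pages;
}
```

For example, if totalram_pages were 262144 (1 GiB of 4 KiB pages; the kernel reports somewhat less after reservations), the default 10% limit would be 26214 pages, about 102 MiB.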

Zcache uses zbud to manage the compressed memory pool. Allocations in zbud are
not directly accessible by address. Rather, the allocation routine returns a
handle (zaddr), and that handle must be mapped before the memory is accessed.
The compressed memory pool grows on demand and shrinks as compressed pages are
freed.
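The handle-then-map discipline, and the way zcache packs a small header in front of the compressed bytes in a single allocation, can be sketched in userspace as follows. The toy_* allocator is hypothetical and is not the zbud API: it only models "allocation returns an opaque handle; mapping turns the handle into a pointer".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy pool: handles are slot numbers + 1, so 0 means "none". */
#define TOY_SLOTS 16

struct toy_pool {
	void *slots[TOY_SLOTS];
};

static int toy_alloc(struct toy_pool *p, size_t size, unsigned long *handle)
{
	for (int i = 0; i < TOY_SLOTS; i++) {
		if (!p->slots[i]) {
			p->slots[i] = malloc(size);
			if (!p->slots[i])
				return -1;
			*handle = (unsigned long)i + 1;
			return 0;
		}
	}
	return -1; /* pool exhausted */
}

/* A (valid) handle must be mapped to obtain a usable pointer. */
static void *toy_map(struct toy_pool *p, unsigned long handle)
{
	return p->slots[handle - 1];
}

static void toy_unmap(struct toy_pool *p, unsigned long handle)
{
	(void)p; (void)handle; /* nothing to undo in this toy */
}

static void toy_free(struct toy_pool *p, unsigned long handle)
{
	free(p->slots[handle - 1]);
	p->slots[handle - 1] = NULL;
}

/* Header modeled on struct zcache_ra_handle; compressed bytes follow it. */
struct toy_hdr {
	int rb_index;	/* inode number */
	int ra_index;	/* page index */
	int zlen;	/* compressed length */
};

static int toy_store(struct toy_pool *p, int ino, int idx,
		     const unsigned char *zdata, int zlen, unsigned long *handle)
{
	struct toy_hdr *hdr;

	if (toy_alloc(p, sizeof(*hdr) + zlen, handle))
		return -1;
	hdr = toy_map(p, *handle);
	hdr->rb_index = ino;
	hdr->ra_index = idx;
	hdr->zlen = zlen;
	memcpy(hdr + 1, zdata, zlen);	/* payload right after the header */
	toy_unmap(p, *handle);
	return 0;
}
```

Keeping the header inside the allocation means the index structures only need to store the handle; the inode number, page index and compressed length travel with the data.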

When a file page is passed from cleancache to zcache, zcache maintains a mapping
of <pool_id, inode_number, page_index> to the zbud address that references the
compressed file page. This mapping is implemented with a red-black tree per
mounted filesystem, plus a radix tree per red-black tree node.

A zcache pool, indexed by pool_id, is created when a filesystem is mounted.
Each zcache pool has a red-black tree keyed by inode number. Each red-black
tree node has a radix tree indexed by page index. Each radix tree slot holds a
zbud address; that zbud allocation stores some extra information
(struct zcache_ra_handle) followed by the compressed data.
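A simplified sketch of the two-level lookup, assuming a single pool (in zcache the pool_id level is just the zcache.pools[] array). All names are hypothetical; an unbalanced binary search tree stands in for the kernel rbtree, and a fixed-size array stands in for the per-inode radix tree. There is no locking, reference counting, or removal of empty nodes, all of which the real code needs.

```c
#include <stdlib.h>

#define MAX_PAGES 64	/* assumption: the array replaces the radix tree */

struct inode_node {
	int ino;				/* search key (rb_index) */
	struct inode_node *left, *right;
	unsigned long zaddr[MAX_PAGES];		/* slot per page index, 0 == empty */
};

static struct inode_node *find_node(struct inode_node *root, int ino)
{
	while (root && root->ino != ino)
		root = ino < root->ino ? root->left : root->right;
	return root;
}

/* Find or create the node for ino, loosely following zcache_store_zaddr(). */
static struct inode_node *get_node(struct inode_node **link, int ino)
{
	while (*link) {
		if ((*link)->ino == ino)
			return *link;
		link = ino < (*link)->ino ? &(*link)->left : &(*link)->right;
	}
	*link = calloc(1, sizeof(**link));	/* zeroed: all slots empty */
	if (*link)
		(*link)->ino = ino;
	return *link;
}

static int store_zaddr(struct inode_node **root, int ino, int index,
		       unsigned long zaddr)
{
	struct inode_node *n;

	if (index < 0 || index >= MAX_PAGES)
		return -1;
	n = get_node(root, ino);
	if (!n)
		return -1;
	n->zaddr[index] = zaddr;	/* a duplicate simply replaces the old one */
	return 0;
}

/* Look up and remove, like zcache_load_delete_zaddr(); 0 means a miss. */
static unsigned long load_delete_zaddr(struct inode_node *root, int ino,
				       int index)
{
	struct inode_node *n = find_node(root, ino);
	unsigned long zaddr;

	if (!n || index < 0 || index >= MAX_PAGES)
		return 0;
	zaddr = n->zaddr[index];
	n->zaddr[index] = 0;
	return zaddr;
}
```

Load deletes the entry because a cleancache get is exclusive in this design: once the page is decompressed back into the page cache, the compressed copy is freed.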

A debugfs interface is provided for various statistics about the zcache pool
size and the number of pages stored, loaded and evicted.
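The counters can be read like any other debugfs file. The sketch below assumes CONFIG_DEBUG_FS is enabled and debugfs is mounted at the conventional /sys/kernel/debug; in this patch the files created under zcache/ are pool_limit_hit, reject_alloc_fail, duplicate_entry, pool_pages and stored_pages. The helper names are hypothetical, and reading debugfs normally requires root.

```c
#include <stdio.h>

/* Read one unsigned decimal value from a file; returns 0 on success. */
static int read_stat_file(const char *path, unsigned long long *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%llu", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Assumes debugfs is mounted at /sys/kernel/debug. */
static int read_zcache_stat(const char *name, unsigned long long *val)
{
	char path[256];

	snprintf(path, sizeof(path), "/sys/kernel/debug/zcache/%s", name);
	return read_stat_file(path, val);
}
```

For example, read_zcache_stat("stored_pages", &n) returns the current number of compressed pages held by zcache.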

Performance, Kernel Building:

Setup
========
Ubuntu with kernel v3.11-rc1
Quad-core i5-3320 @ 2.6GHz
1G memory (limited with mem=1G at boot)
kernbench run with -o N (N = number of threads)

Details
========
                                  Without zcache    With zcache

8 threads
Elapsed Time                                1821           1814 (+0.3%)
User Time                                   5332           5304
System Time                                  256            306
Percent CPU                                  306            306
Context Switches                         1915378        1912027
Sleeps                                   1501004        1492835
Pages decompressed from zcache                 -           8295

24 threads
Elapsed Time                                2556           2256 (+11.7%)
User Time                                   5184           5225
System Time                                  271            276
Percent CPU                                  213            243
Context Switches                         1993763        2024661
Sleeps                                   2000881        1849496
Pages decompressed from zcache                 -         174490

36 threads
Elapsed Time                                5254           3995 (+23.9%)
User Time                                   4781           4947
System Time                                  293            295
Percent CPU                                   96            131
Context Switches                         1612581        1779860
Sleeps                                   2944985        2414438
Pages decompressed from zcache                 -         380470
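The percentages in the Elapsed Time rows correspond to the relative reduction (without - with) / without. The hypothetical helper below reproduces them: 8 threads gives about 0.38% and 36 threads about 23.96%, so the table appears to truncate to one decimal place rather than round.

```c
/*
 * Relative elapsed-time improvement in percent:
 * (without - with) / without * 100. Positive means zcache finished faster.
 */
static double improvement_pct(double without, double with)
{
	return (without - with) / without * 100.0;
}
```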

Performance, Sysbench+mysql:

Setup
========
Ubuntu with kernel v3.11-rc1
Quad-core i5-3320 @ 2.6GHz
2G memory (limited with mem=2G at boot)
Run sysbench in oltp complex mode for 1 hour:
sysbench --test=oltp --oltp-table-size=5000000 --num-threads=16  --max-time=3600
--oltp-test-mode=complex...

After sysbench started, run iozone to trigger memory pressure:
iozone -a -M -B -s 1200M -y 4k -+u

Sysbench result
========
                                Without zcache          With zcache
OLTP test statistics:
    queries performed:
        read:                   124320                  166936
        write:                   44400                   59620
        other:                   17760                   23848
        total:                  186480                  250404
    transactions:                 8880(2.47 per sec.)    11924(3.31 per sec.) (+34%)
    deadlocks:                       0(0.00 per sec.)        0(0.00 per sec.)
    read/write requests:        168720(46.86 per sec.)  226556(62.91 per sec.)(+34%)
    other operations:            17760(4.93 per sec.)    23848(6.62 per sec.) (+34%)

Test execution summary:
    total time:                   3600.8528s              3601.3977s
    total number of events:       8880                   11924
    total time taken by event execution:
                                 57610.3546              57612.9163
    per-request statistics:
         min:                       57.68ms                 49.52ms (+14%)
         avg:                     6487.65ms               4831.68ms (+25%)
         max:                   169640.52ms             124282.16ms (+26%)
         approx.  95 percentile: 25139.93ms              21794.82ms (+13%)

Threads fairness:
    events (avg/stddev):           555.0000/6.05           745.2500/8.33
    execution time (avg/stddev):  3600.6472/0.26          3600.8073/0.27
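The per-second figures above appear to be event counts divided by the total run time, and the (+34%) gains appear to be (with - without) / without. A small check of that arithmetic, using hypothetical helper names: 8880 / 3600.8528 s is about 2.47 per second, 11924 / 3601.3977 s is about 3.31, and 11924 versus 8880 is a gain of about 34.3%.

```c
/* Rate as reported by sysbench: count divided by total wall-clock seconds. */
static double per_sec(double count, double total_seconds)
{
	return count / total_seconds;
}

/* Relative gain in percent for a "higher is better" metric. */
static double gain_pct(double without, double with)
{
	return (with - without) / without * 100.0;
}
```

For latency rows (lower is better) the sign flips, which is why those rows use (without - with) / without instead.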

Help with testing is welcome; it would be interesting to see zcache's effect
on more real-life workloads.

Bob Liu (4):
  mm: zcache: add core files
  zcache: staging: %s/ZCACHE/ZCACHE_OLD
  mm: zcache: add evict zpages supporting
  mm: add WasActive page flag

 drivers/staging/zcache/Kconfig  |   12 +-
 drivers/staging/zcache/Makefile |    4 +-
 include/linux/page-flags.h      |    9 +-
 mm/Kconfig                      |   17 +
 mm/Makefile                     |    1 +
 mm/page_alloc.c                 |    3 +
 mm/vmscan.c                     |    2 +
 mm/zcache.c                     |  944 +++++++++++++++++++++++++++++++++++++++
 8 files changed, 983 insertions(+), 9 deletions(-)
 create mode 100644 mm/zcache.c

-- 
1.7.10.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

[PATCH v2 1/4] mm: zcache: add core files
From: Bob Liu @ 2013-08-06 11:36 UTC
  To: linux-mm
  Cc: gregkh, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman,
	kyungmin.park, p.sarna, barry.song, penberg, Bob Liu

zcache is a backend for cleancache that takes file pages that are in the process
of being reclaimed and attempts to compress them and store them in a RAM-based
memory pool.
This can result in a significant I/O reduction if the system holds many file
pages and, when decompressing from RAM is faster than reading from disk, can
also improve workload performance.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 mm/Kconfig  |   17 ++
 mm/Makefile |    1 +
 mm/zcache.c |  895 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 913 insertions(+)
 create mode 100644 mm/zcache.c

diff --git a/mm/Kconfig b/mm/Kconfig
index 8028dcc..0084030 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -508,6 +508,23 @@ config ZSWAP
 	  they have not be fully explored on the large set of potential
 	  configurations and workloads that exist.
 
+config ZCACHE
+	bool "Compressed cache for file pages (EXPERIMENTAL)"
+	depends on CRYPTO && CLEANCACHE
+	select CRYPTO_LZO
+	select ZBUD
+	default n
+	help
+	  A compressed cache for file pages.
+
+	  It takes active file pages that are in the process of being reclaimed
+	  and attempts to compress them into a dynamically allocated RAM-based
+	  memory pool.
+
+	  If this process is successful, the I/O read is avoided when those
+	  file pages are needed again. This can result in significant performance
+	  gains under memory pressure on systems with many file pages.
+
 config MEM_SOFT_DIRTY
 	bool "Track memory changes"
 	depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY
diff --git a/mm/Makefile b/mm/Makefile
index f008033..a29232b 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_BOUNCE)	+= bounce.o
 obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o
 obj-$(CONFIG_FRONTSWAP)	+= frontswap.o
 obj-$(CONFIG_ZSWAP)	+= zswap.o
+obj-$(CONFIG_ZCACHE)	+= zcache.o
 obj-$(CONFIG_HAS_DMA)	+= dmapool.o
 obj-$(CONFIG_HUGETLBFS)	+= hugetlb.o
 obj-$(CONFIG_NUMA) 	+= mempolicy.o
diff --git a/mm/zcache.c b/mm/zcache.c
new file mode 100644
index 0000000..ec1a0eb
--- /dev/null
+++ b/mm/zcache.c
@@ -0,0 +1,895 @@
+/*
+ * linux/mm/zcache.c
+ *
+ * A cleancache backend for file pages compression.
+ * Concepts based on original zcache by Dan Magenheimer.
+ * Copyright (C) 2013  Bob Liu <bob.liu@oracle.com>
+ *
+ * With zcache, active file pages can be compressed in memory during page
+ * reclaiming. When their data is needed again the I/O reading operation is
+ * avoided. This results in a significant performance gain under memory pressure
+ * for systems with many file pages.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+*/
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/atomic.h>
+#include <linux/cleancache.h>
+#include <linux/cpu.h>
+#include <linux/crypto.h>
+#include <linux/page-flags.h>
+#include <linux/pagemap.h>
+#include <linux/highmem.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/radix-tree.h>
+#include <linux/rbtree.h>
+#include <linux/types.h>
+#include <linux/zbud.h>
+
+/*
+ * Enable/disable zcache (disabled by default)
+ */
+static bool zcache_enabled __read_mostly;
+module_param_named(enabled, zcache_enabled, bool, 0);
+
+/*
+ * Compressor to be used by zcache
+ */
+#define ZCACHE_COMPRESSOR_DEFAULT "lzo"
+static char *zcache_compressor = ZCACHE_COMPRESSOR_DEFAULT;
+module_param_named(compressor, zcache_compressor, charp, 0);
+
+/*
+ * The maximum percentage of memory that the compressed pool can occupy.
+ */
+static unsigned int zcache_max_pool_percent = 10;
+module_param_named(max_pool_percent, zcache_max_pool_percent, uint, 0644);
+
+/*
+ * zcache statistics
+ */
+static u64 zcache_pool_limit_hit;
+static u64 zcache_dup_entry;
+static u64 zcache_zbud_alloc_fail;
+static u64 zcache_pool_pages;
+static atomic_t zcache_stored_pages = ATOMIC_INIT(0);
+
+/*
+ * Zcache receives pages for compression through the Cleancache API and is able
+ * to evict pages from its own compressed pool on an LRU basis in the case that
+ * the compressed pool is full.
+ *
+ * Zcache uses zbud to manage the compressed memory pool. Allocations in zbud
+ * are not directly accessible by address. Rather, a handle (zaddr) is
+ * returned by the allocation routine and that handle (zaddr) must be
+ * mapped before being accessed. The compressed memory pool grows on demand and
+ * shrinks as compressed pages are freed.
+ *
+ * When a file page is passed from cleancache to zcache, zcache maintains a
+ * mapping of <pool_id, inode_number, page_index> to the zbud
+ * address that references that compressed file page. This mapping is achieved
+ * with a red-black tree per mounted filesystem, plus a radix tree per
+ * red-black tree node.
+ *
+ * A zcache pool, indexed by pool_id, is created when a filesystem is mounted.
+ * Each zcache pool has a red-black tree keyed by inode number (rb_index).
+ * Each red-black tree node has a radix tree indexed by
+ * page->index (ra_index). Each radix tree slot holds a zbud address whose
+ * allocation starts with extra information (struct zcache_ra_handle).
+ */
+#define MAX_ZCACHE_POOLS 32
+/*
+ * One zcache_pool per (cleancache aware) filesystem mount instance
+ */
+struct zcache_pool {
+	struct rb_root rbtree;
+	rwlock_t rb_lock;		/* Protects rbtree */
+	struct zbud_pool *pool;         /* Zbud pool used */
+};
+
+/*
+ * Manage all zcache pools
+ */
+struct _zcache {
+	struct zcache_pool *pools[MAX_ZCACHE_POOLS];
+	u32 num_pools;			/* Current no. of zcache pools */
+	spinlock_t pool_lock;		/* Protects pools[] and num_pools */
+};
+struct _zcache zcache;
+
+/*
+ * Redblack tree node, each node has a page index radix-tree.
+ * Indexed by inode number.
+ */
+struct zcache_rbnode {
+	struct rb_node rb_node;
+	int rb_index;
+	struct radix_tree_root ratree; /* Page radix tree per inode rbtree */
+	spinlock_t ra_lock;		/* Protects radix tree */
+	struct kref refcount;
+};
+
+/*
+ * Radix-tree leaf, indexed by page->index
+ */
+struct zcache_ra_handle {
+	int rb_index;			/* Redblack tree index */
+	int ra_index;			/* Radix tree index */
+	int zlen;			/* Compressed page size */
+};
+
+static struct kmem_cache *zcache_rbnode_cache;
+static int zcache_rbnode_cache_create(void)
+{
+	zcache_rbnode_cache = KMEM_CACHE(zcache_rbnode, 0);
+	return (zcache_rbnode_cache == NULL);
+}
+static void zcache_rbnode_cache_destory(void)
+{
+	kmem_cache_destroy(zcache_rbnode_cache);
+}
+
+/*
+ * Compression functions
+ * (Below functions are copied from zswap!)
+ */
+static struct crypto_comp * __percpu *zcache_comp_pcpu_tfms;
+
+enum comp_op {
+	ZCACHE_COMPOP_COMPRESS,
+	ZCACHE_COMPOP_DECOMPRESS
+};
+
+static int zcache_comp_op(enum comp_op op, const u8 *src, unsigned int slen,
+				u8 *dst, unsigned int *dlen)
+{
+	struct crypto_comp *tfm;
+	int ret;
+
+	tfm = *per_cpu_ptr(zcache_comp_pcpu_tfms, get_cpu());
+	switch (op) {
+	case ZCACHE_COMPOP_COMPRESS:
+		ret = crypto_comp_compress(tfm, src, slen, dst, dlen);
+		break;
+	case ZCACHE_COMPOP_DECOMPRESS:
+		ret = crypto_comp_decompress(tfm, src, slen, dst, dlen);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	put_cpu();
+	return ret;
+}
+
+static int __init zcache_comp_init(void)
+{
+	if (!crypto_has_comp(zcache_compressor, 0, 0)) {
+		pr_info("%s compressor not available\n", zcache_compressor);
+		/* fall back to default compressor */
+		zcache_compressor = ZCACHE_COMPRESSOR_DEFAULT;
+		if (!crypto_has_comp(zcache_compressor, 0, 0))
+			/* can't even load the default compressor */
+			return -ENODEV;
+	}
+	pr_info("using %s compressor\n", zcache_compressor);
+
+	/* alloc percpu transforms */
+	zcache_comp_pcpu_tfms = alloc_percpu(struct crypto_comp *);
+	if (!zcache_comp_pcpu_tfms)
+		return -ENOMEM;
+	return 0;
+}
+
+static void zcache_comp_exit(void)
+{
+	/* free percpu transforms */
+	if (zcache_comp_pcpu_tfms)
+		free_percpu(zcache_comp_pcpu_tfms);
+}
+
+/*
+ * Per-cpu code
+ * (Below functions are also copied from zswap!)
+ */
+static DEFINE_PER_CPU(u8 *, zcache_dstmem);
+
+static int __zcache_cpu_notifier(unsigned long action, unsigned long cpu)
+{
+	struct crypto_comp *tfm;
+	u8 *dst;
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+		tfm = crypto_alloc_comp(zcache_compressor, 0, 0);
+		if (IS_ERR(tfm)) {
+			pr_err("can't allocate compressor transform\n");
+			return NOTIFY_BAD;
+		}
+		*per_cpu_ptr(zcache_comp_pcpu_tfms, cpu) = tfm;
+		dst = kmalloc(PAGE_SIZE * 2, GFP_KERNEL);
+		if (!dst) {
+			pr_err("can't allocate compressor buffer\n");
+			crypto_free_comp(tfm);
+			*per_cpu_ptr(zcache_comp_pcpu_tfms, cpu) = NULL;
+			return NOTIFY_BAD;
+		}
+		per_cpu(zcache_dstmem, cpu) = dst;
+		break;
+	case CPU_DEAD:
+	case CPU_UP_CANCELED:
+		tfm = *per_cpu_ptr(zcache_comp_pcpu_tfms, cpu);
+		if (tfm) {
+			crypto_free_comp(tfm);
+			*per_cpu_ptr(zcache_comp_pcpu_tfms, cpu) = NULL;
+		}
+		dst = per_cpu(zcache_dstmem, cpu);
+		kfree(dst);
+		per_cpu(zcache_dstmem, cpu) = NULL;
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static int zcache_cpu_notifier(struct notifier_block *nb,
+				unsigned long action, void *pcpu)
+{
+	unsigned long cpu = (unsigned long)pcpu;
+	return __zcache_cpu_notifier(action, cpu);
+}
+
+static struct notifier_block zcache_cpu_notifier_block = {
+	.notifier_call = zcache_cpu_notifier
+};
+
+static int zcache_cpu_init(void)
+{
+	unsigned long cpu;
+
+	get_online_cpus();
+	for_each_online_cpu(cpu)
+		if (__zcache_cpu_notifier(CPU_UP_PREPARE, cpu) != NOTIFY_OK)
+			goto cleanup;
+	register_cpu_notifier(&zcache_cpu_notifier_block);
+	put_online_cpus();
+	return 0;
+
+cleanup:
+	for_each_online_cpu(cpu)
+		__zcache_cpu_notifier(CPU_UP_CANCELED, cpu);
+	put_online_cpus();
+	return -ENOMEM;
+}
+
+/*
+ * Zcache helpers
+ */
+static bool zcache_is_full(void)
+{
+	return (totalram_pages * zcache_max_pool_percent / 100 <
+			zcache_pool_pages);
+}
+
+/*
+ * The caller must hold zpool->rb_lock at least
+ */
+static struct zcache_rbnode *zcache_find_rbnode(struct rb_root *rbtree,
+	int index, struct rb_node **rb_parent, struct rb_node ***rb_link)
+{
+	struct zcache_rbnode *entry;
+	struct rb_node **__rb_link, *__rb_parent, *rb_prev;
+
+	__rb_link = &rbtree->rb_node;
+	rb_prev = __rb_parent = NULL;
+
+	while (*__rb_link) {
+		__rb_parent = *__rb_link;
+		entry = rb_entry(__rb_parent, struct zcache_rbnode, rb_node);
+		if (entry->rb_index > index)
+			__rb_link = &__rb_parent->rb_left;
+		else if (entry->rb_index < index) {
+			rb_prev = __rb_parent;
+			__rb_link = &__rb_parent->rb_right;
+		} else
+			return entry;
+	}
+
+	if (rb_parent)
+		*rb_parent = __rb_parent;
+	if (rb_link)
+		*rb_link = __rb_link;
+	return NULL;
+}
+
+static struct zcache_rbnode *zcache_find_get_rbnode(struct zcache_pool *zpool,
+					int rb_index)
+{
+	unsigned long flags;
+	struct zcache_rbnode *rbnode;
+
+	read_lock_irqsave(&zpool->rb_lock, flags);
+	rbnode = zcache_find_rbnode(&zpool->rbtree, rb_index, 0, 0);
+	if (rbnode)
+		kref_get(&rbnode->refcount);
+	read_unlock_irqrestore(&zpool->rb_lock, flags);
+	return rbnode;
+}
+
+/*
+ * kref_put callback for zcache_rbnode.
+ *
+ * The rbnode must have been isolated from rbtree already.
+ */
+static void zcache_rbnode_release(struct kref *kref)
+{
+	struct zcache_rbnode *rbnode;
+
+	rbnode = container_of(kref, struct zcache_rbnode, refcount);
+	BUG_ON(rbnode->ratree.rnode);
+	kmem_cache_free(zcache_rbnode_cache, rbnode);
+}
+
+/*
+ * Check whether the radix-tree of this rbnode is empty.
+ * If that's true, then we can delete this zcache_rbnode from
+ * zcache_pool->rbtree
+ *
+ * Caller must hold zcache_rbnode->ra_lock
+ */
+static int zcache_rbnode_empty(struct zcache_rbnode *rbnode)
+{
+	return rbnode->ratree.rnode == NULL;
+}
+
+/*
+ * Remove zcache_rbnode from zpool->rbtree
+ *
+ * holded_rblock - whether the caller already holds zpool->rb_lock
+ */
+static void zcache_rbnode_isolate(struct zcache_pool *zpool,
+		struct zcache_rbnode *rbnode, bool holded_rblock)
+{
+	unsigned long flags;
+
+	if (!holded_rblock)
+		write_lock_irqsave(&zpool->rb_lock, flags);
+	/*
+	 * Someone can get reference on this rbnode before we could
+	 * acquire write lock above.
+	 * We want to remove it from zpool->rbtree when only the caller and
+	 * corresponding ratree holds a reference to this rbnode.
+	 * Below check ensures that a racing zcache put will not end up adding
+	 * a page to an isolated node and thereby losing that memory.
+	 */
+	if (atomic_read(&rbnode->refcount.refcount) == 2) {
+		rb_erase(&rbnode->rb_node, &zpool->rbtree);
+		RB_CLEAR_NODE(&rbnode->rb_node);
+		kref_put(&rbnode->refcount, zcache_rbnode_release);
+	}
+	if (!holded_rblock)
+		write_unlock_irqrestore(&zpool->rb_lock, flags);
+}
+
+/*
+ * Store a zaddr allocated by zbud_alloc() in the rbtree/radix-tree hierarchy.
+ */
+static int zcache_store_zaddr(struct zcache_pool *zpool,
+		struct zcache_ra_handle *zhandle, unsigned long zaddr)
+{
+	unsigned long flags;
+	struct zcache_rbnode *rbnode, *tmp;
+	struct rb_node **link = NULL, *parent = NULL;
+	int ret;
+	void *dup_zaddr;
+
+	rbnode = zcache_find_get_rbnode(zpool, zhandle->rb_index);
+	if (!rbnode) {
+		/* alloc and init a new rbnode */
+		rbnode = kmem_cache_alloc(zcache_rbnode_cache, GFP_KERNEL);
+		if (!rbnode)
+			return -ENOMEM;
+
+		INIT_RADIX_TREE(&rbnode->ratree, GFP_ATOMIC|__GFP_NOWARN);
+		spin_lock_init(&rbnode->ra_lock);
+		rbnode->rb_index = zhandle->rb_index;
+		kref_init(&rbnode->refcount);
+		RB_CLEAR_NODE(&rbnode->rb_node);
+
+		/* add that rbnode to rbtree */
+		write_lock_irqsave(&zpool->rb_lock, flags);
+		tmp = zcache_find_rbnode(&zpool->rbtree, zhandle->rb_index,
+				&parent, &link);
+		if (tmp) {
+			/* somebody else allocated new rbnode */
+			kmem_cache_free(zcache_rbnode_cache, rbnode);
+			rbnode = tmp;
+		} else {
+			rb_link_node(&rbnode->rb_node, parent, link);
+			rb_insert_color(&rbnode->rb_node, &zpool->rbtree);
+		}
+
+		/* Inc the reference of this zcache_rbnode */
+		kref_get(&rbnode->refcount);
+		write_unlock_irqrestore(&zpool->rb_lock, flags);
+	}
+
+	/* Successfully got a zcache_rbnode when arriving here */
+	spin_lock_irqsave(&rbnode->ra_lock, flags);
+	dup_zaddr = radix_tree_delete(&rbnode->ratree, zhandle->ra_index);
+	if (unlikely(dup_zaddr)) {
+		WARN(1, "duplicated, will be replaced!\n");
+		zbud_free(zpool->pool, (unsigned long)dup_zaddr);
+		atomic_dec(&zcache_stored_pages);
+		zcache_pool_pages = zbud_get_pool_size(zpool->pool);
+		zcache_dup_entry++;
+	}
+
+	/* Insert zcache_ra_handle to ratree */
+	ret = radix_tree_insert(&rbnode->ratree, zhandle->ra_index,
+				(void *)zaddr);
+	if (unlikely(ret))
+		if (zcache_rbnode_empty(rbnode))
+			zcache_rbnode_isolate(zpool, rbnode, 0);
+	spin_unlock_irqrestore(&rbnode->ra_lock, flags);
+
+	kref_put(&rbnode->refcount, zcache_rbnode_release);
+	return ret;
+}
+
+/*
+ * Load zaddr and delete it from radix tree.
+ * If the radix tree of the corresponding rbnode is empty, delete the rbnode
+ * from zpool->rbtree also.
+ */
+static void *zcache_load_delete_zaddr(struct zcache_pool *zpool,
+				int rb_index, int ra_index)
+{
+	struct zcache_rbnode *rbnode;
+	void *zaddr = NULL;
+	unsigned long flags;
+
+	rbnode = zcache_find_get_rbnode(zpool, rb_index);
+	if (!rbnode)
+		goto out;
+
+	BUG_ON(rbnode->rb_index != rb_index);
+
+	spin_lock_irqsave(&rbnode->ra_lock, flags);
+	zaddr = radix_tree_delete(&rbnode->ratree, ra_index);
+	if (zcache_rbnode_empty(rbnode))
+		zcache_rbnode_isolate(zpool, rbnode, 0);
+	spin_unlock_irqrestore(&rbnode->ra_lock, flags);
+
+	kref_put(&rbnode->refcount, zcache_rbnode_release);
+out:
+	return zaddr;
+}
+
+static void zcache_store_page(int pool_id, struct cleancache_filekey key,
+		pgoff_t index, struct page *page)
+{
+	struct zcache_ra_handle *zhandle;
+	u8 *zpage, *src, *dst;
+	unsigned long zaddr; /* Address of zhandle + compressed data(zpage) */
+	unsigned int zlen = PAGE_SIZE;
+	int ret;
+
+	struct zcache_pool *zpool = zcache.pools[pool_id];
+
+	if (zcache_is_full()) {
+		zcache_pool_limit_hit++;
+		return;
+	}
+
+	/* compress */
+	dst = get_cpu_var(zcache_dstmem);
+	src = kmap_atomic(page);
+	ret = zcache_comp_op(ZCACHE_COMPOP_COMPRESS, src, PAGE_SIZE, dst,
+			&zlen);
+	kunmap_atomic(src);
+	if (ret) {
+		pr_err("zcache compress error ret %d\n", ret);
+		put_cpu_var(zcache_dstmem);
+		return;
+	}
+
+	/* store zcache handle together with compressed page data */
+	ret = zbud_alloc(zpool->pool, zlen + sizeof(struct zcache_ra_handle),
+			__GFP_NORETRY | __GFP_NOWARN, &zaddr);
+	if (ret) {
+		zcache_zbud_alloc_fail++;
+		put_cpu_var(zcache_dstmem);
+		return;
+	}
+
+	zhandle = (struct zcache_ra_handle *)zbud_map(zpool->pool, zaddr);
+	zhandle->ra_index = index;
+	zhandle->rb_index = key.u.ino;
+	zhandle->zlen = zlen;
+	/* Compressed page data stored at the end of zcache_ra_handle */
+	zpage = (u8 *)(zhandle + 1);
+	memcpy(zpage, dst, zlen);
+	zbud_unmap(zpool->pool, zaddr);
+	put_cpu_var(zcache_dstmem);
+
+	/* store zcache handle */
+	ret = zcache_store_zaddr(zpool, zhandle, zaddr);
+	if (ret) {
+		pr_err("%s: store handle error %d\n", __func__, ret);
+		zbud_free(zpool->pool, zaddr);
+	}
+
+	/* update stats */
+	atomic_inc(&zcache_stored_pages);
+	zcache_pool_pages = zbud_get_pool_size(zpool->pool);
+	return;
+}
+
+static int zcache_load_page(int pool_id, struct cleancache_filekey key,
+			pgoff_t index, struct page *page)
+{
+	int ret;
+	u8 *src, *dst;
+	void *zaddr;
+	unsigned int dlen = PAGE_SIZE;
+	struct zcache_ra_handle *zhandle;
+	struct zcache_pool *zpool = zcache.pools[pool_id];
+
+	zaddr = zcache_load_delete_zaddr(zpool, key.u.ino, index);
+	if (!zaddr)
+		return -1;
+
+	zhandle = (struct zcache_ra_handle *)zbud_map(zpool->pool,
+			(unsigned long)zaddr);
+	/* Compressed page data stored at the end of zcache_ra_handle */
+	src = (u8 *)(zhandle + 1);
+
+	/* decompress */
+	dst = kmap_atomic(page);
+	ret = zcache_comp_op(ZCACHE_COMPOP_DECOMPRESS, src, zhandle->zlen, dst,
+			&dlen);
+	kunmap_atomic(dst);
+	zbud_unmap(zpool->pool, (unsigned long)zaddr);
+	zbud_free(zpool->pool, (unsigned long)zaddr);
+
+	BUG_ON(ret);
+	BUG_ON(dlen != PAGE_SIZE);
+
+	/* update stats */
+	atomic_dec(&zcache_stored_pages);
+	zcache_pool_pages = zbud_get_pool_size(zpool->pool);
+	return ret;
+}
+
+static void zcache_flush_page(int pool_id, struct cleancache_filekey key,
+			pgoff_t index)
+{
+	struct zcache_pool *zpool = zcache.pools[pool_id];
+	void *zaddr = NULL;
+
+	zaddr = zcache_load_delete_zaddr(zpool, key.u.ino, index);
+	if (zaddr) {
+		zbud_free(zpool->pool, (unsigned long)zaddr);
+		atomic_dec(&zcache_stored_pages);
+		zcache_pool_pages = zbud_get_pool_size(zpool->pool);
+	}
+}
+
+#define FREE_BATCH 16
+/*
+ * Caller must hold rbnode->ra_lock
+ */
+static void zcache_flush_ratree(struct zcache_pool *zpool,
+		struct zcache_rbnode *rbnode)
+{
+	unsigned long index = 0;
+	int count, i;
+	struct zcache_ra_handle *zhandle;
+
+	do {
+		void *zaddrs[FREE_BATCH];
+
+		count = radix_tree_gang_lookup(&rbnode->ratree, (void **)zaddrs,
+				index, FREE_BATCH);
+
+		for (i = 0; i < count; i++) {
+			zhandle = (struct zcache_ra_handle *)zbud_map(
+					zpool->pool, (unsigned long)zaddrs[i]);
+			index = zhandle->ra_index;
+			radix_tree_delete(&rbnode->ratree, index);
+			zbud_unmap(zpool->pool, (unsigned long)zaddrs[i]);
+			zbud_free(zpool->pool, (unsigned long)zaddrs[i]);
+			atomic_dec(&zcache_stored_pages);
+			zcache_pool_pages = zbud_get_pool_size(zpool->pool);
+		}
+
+		index++;
+	} while (count == FREE_BATCH);
+}
+
+static void zcache_flush_inode(int pool_id, struct cleancache_filekey key)
+{
+	struct zcache_rbnode *rbnode;
+	unsigned long flags1, flags2;
+	struct zcache_pool *zpool = zcache.pools[pool_id];
+
+	/*
+	 * Take rb_lock first so that no new pages can be added to this
+	 * rbnode.
+	 */
+	write_lock_irqsave(&zpool->rb_lock, flags1);
+	rbnode = zcache_find_rbnode(&zpool->rbtree, key.u.ino, 0, 0);
+	if (!rbnode) {
+		write_unlock_irqrestore(&zpool->rb_lock, flags1);
+		return;
+	}
+
+	kref_get(&rbnode->refcount);
+	spin_lock_irqsave(&rbnode->ra_lock, flags2);
+
+	zcache_flush_ratree(zpool, rbnode);
+	if (zcache_rbnode_empty(rbnode))
+		/* We already hold rb_lock at this point */
+		zcache_rbnode_isolate(zpool, rbnode, 1);
+
+	spin_unlock_irqrestore(&rbnode->ra_lock, flags2);
+	write_unlock_irqrestore(&zpool->rb_lock, flags1);
+	kref_put(&rbnode->refcount, zcache_rbnode_release);
+}
+
+static void zcache_destroy_pool(struct zcache_pool *zpool);
+static void zcache_flush_fs(int pool_id)
+{
+	struct zcache_rbnode *z_rbnode = NULL;
+	struct rb_node *rbnode;
+	unsigned long flags1, flags2;
+	struct zcache_pool *zpool;
+
+	if (pool_id < 0)
+		return;
+
+	zpool = zcache.pools[pool_id];
+	if (!zpool)
+		return;
+
+	/*
+	 * Take rb_lock first so that no new pages can be added.
+	 */
+	write_lock_irqsave(&zpool->rb_lock, flags1);
+
+	rbnode = rb_first(&zpool->rbtree);
+	while (rbnode) {
+		z_rbnode = rb_entry(rbnode, struct zcache_rbnode, rb_node);
+		rbnode = rb_next(rbnode);
+		if (z_rbnode) {
+			kref_get(&z_rbnode->refcount);
+			spin_lock_irqsave(&z_rbnode->ra_lock, flags2);
+			zcache_flush_ratree(zpool, z_rbnode);
+			if (zcache_rbnode_empty(z_rbnode))
+				zcache_rbnode_isolate(zpool, z_rbnode, 1);
+			spin_unlock_irqrestore(&z_rbnode->ra_lock, flags2);
+			kref_put(&z_rbnode->refcount, zcache_rbnode_release);
+		}
+	}
+
+	write_unlock_irqrestore(&zpool->rb_lock, flags1);
+	zcache_destroy_pool(zpool);
+}
+
+/*
+ * Evict pages from zcache pool on an LRU basis after the compressed pool is
+ * full.
+ */
+static int zcache_evict_entry(struct zbud_pool *pool, unsigned long zaddr)
+{
+	return -1;
+}
+
+static struct zbud_ops zcache_zbud_ops = {
+	.evict = zcache_evict_entry
+};
+
+/* Return pool id */
+static int zcache_create_pool(void)
+{
+	int ret;
+	struct zcache_pool *zpool;
+
+	zpool = kzalloc(sizeof(*zpool), GFP_KERNEL);
+	if (!zpool) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	zpool->pool = zbud_create_pool(GFP_KERNEL, &zcache_zbud_ops);
+	if (!zpool->pool) {
+		kfree(zpool);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	spin_lock(&zcache.pool_lock);
+	if (zcache.num_pools == MAX_ZCACHE_POOLS) {
+		pr_err("Cannot create new pool (limit:%u)\n", MAX_ZCACHE_POOLS);
+		zbud_destroy_pool(zpool->pool);
+		kfree(zpool);
+		ret = -EPERM;
+		goto out_unlock;
+	}
+
+	rwlock_init(&zpool->rb_lock);
+	zpool->rbtree = RB_ROOT;
+	/* Add to pool list */
+	for (ret = 0; ret < MAX_ZCACHE_POOLS; ret++)
+		if (!zcache.pools[ret])
+			break;
+	zcache.pools[ret] = zpool;
+	zcache.num_pools++;
+	pr_info("New pool created id:%d\n", ret);
+
+out_unlock:
+	spin_unlock(&zcache.pool_lock);
+out:
+	return ret;
+}
+
+static void zcache_destroy_pool(struct zcache_pool *zpool)
+{
+	int i;
+
+	if (!zpool)
+		return;
+
+	spin_lock(&zcache.pool_lock);
+	zcache.num_pools--;
+	for (i = 0; i < MAX_ZCACHE_POOLS; i++)
+		if (zcache.pools[i] == zpool)
+			break;
+	zcache.pools[i] = NULL;
+	spin_unlock(&zcache.pool_lock);
+
+	if (!RB_EMPTY_ROOT(&zpool->rbtree))
+		WARN(1, "Memory leak detected. Freeing non-empty pool!\n");
+
+	zbud_destroy_pool(zpool->pool);
+	kfree(zpool);
+}
+
+static int zcache_init_fs(size_t pagesize)
+{
+	int ret;
+
+	if (pagesize != PAGE_SIZE) {
+		pr_info("Unsupported page size: %zu\n", pagesize);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = zcache_create_pool();
+	if (ret < 0) {
+		pr_info("Failed to create new pool\n");
+		ret = -ENOMEM;
+		goto out;
+	}
+out:
+	return ret;
+}
+
+static int zcache_init_shared_fs(char *uuid, size_t pagesize)
+{
+	/* shared pools are unsupported and map to private */
+	return zcache_init_fs(pagesize);
+}
+
+static struct cleancache_ops zcache_ops = {
+	.put_page = zcache_store_page,
+	.get_page = zcache_load_page,
+	.invalidate_page = zcache_flush_page,
+	.invalidate_inode = zcache_flush_inode,
+	.invalidate_fs = zcache_flush_fs,
+	.init_shared_fs = zcache_init_shared_fs,
+	.init_fs = zcache_init_fs
+};
+
+/*
+ * Debugfs functions
+ */
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+static struct dentry *zcache_debugfs_root;
+
+static int __init zcache_debugfs_init(void)
+{
+	if (!debugfs_initialized())
+		return -ENODEV;
+
+	zcache_debugfs_root = debugfs_create_dir("zcache", NULL);
+	if (!zcache_debugfs_root)
+		return -ENOMEM;
+
+	debugfs_create_u64("pool_limit_hit", S_IRUGO, zcache_debugfs_root,
+			&zcache_pool_limit_hit);
+	debugfs_create_u64("reject_alloc_fail", S_IRUGO, zcache_debugfs_root,
+			&zcache_zbud_alloc_fail);
+	debugfs_create_u64("duplicate_entry", S_IRUGO, zcache_debugfs_root,
+			&zcache_dup_entry);
+	debugfs_create_u64("pool_pages", S_IRUGO, zcache_debugfs_root,
+			&zcache_pool_pages);
+	debugfs_create_atomic_t("stored_pages", S_IRUGO, zcache_debugfs_root,
+			&zcache_stored_pages);
+	return 0;
+}
+
+static void __exit zcache_debugfs_exit(void)
+{
+	debugfs_remove_recursive(zcache_debugfs_root);
+}
+#else
+static int __init zcache_debugfs_init(void)
+{
+	return 0;
+}
+static void __exit zcache_debugfs_exit(void)
+{
+}
+#endif
+
+/*
+ * zcache init and exit
+ */
+static int __init init_zcache(void)
+{
+	if (!zcache_enabled)
+		return 0;
+
+	pr_info("loading zcache..\n");
+	if (zcache_rbnode_cache_create()) {
+		pr_err("entry cache creation failed\n");
+		goto error;
+	}
+
+	if (zcache_comp_init()) {
+		pr_err("compressor initialization failed\n");
+		goto compfail;
+	}
+	if (zcache_cpu_init()) {
+		pr_err("per-cpu initialization failed\n");
+		goto pcpufail;
+	}
+
+	spin_lock_init(&zcache.pool_lock);
+	cleancache_register_ops(&zcache_ops);
+
+	if (zcache_debugfs_init())
+		pr_warn("debugfs initialization failed\n");
+	return 0;
+pcpufail:
+	zcache_comp_exit();
+compfail:
+	zcache_rbnode_cache_destory();
+error:
+	return -ENOMEM;
+}
+
+/* must be late so crypto has time to come up */
+late_initcall(init_zcache);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Bob Liu <bob.liu@oracle.com>");
+MODULE_DESCRIPTION("Compressed cache for clean file pages");
+
-- 
1.7.10.4
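
The pool-id bookkeeping in zcache_create_pool() and zcache_destroy_pool() above is a small fixed-size slot table. The following is a minimal user-space sketch of it, not the patch's code. MAX_POOLS is a hypothetical stand-in for MAX_ZCACHE_POOLS, and the zcache.pool_lock locking and the zbud pool itself are omitted. The sketch checks the limit before allocating, reuses the lowest free slot, and clears only a slot that actually holds the pool. The loop in zcache_destroy_pool() as posted would write one past the table if the pool were not found, assuming zcache.pools has MAX_ZCACHE_POOLS entries (its declaration is not shown here).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical limit standing in for MAX_ZCACHE_POOLS. */
#define MAX_POOLS 4

struct pool {
	int id;
};

static struct pool *pools[MAX_POOLS];
static int num_pools;

/* Return the new pool id, or -1 if the table is full or allocation fails. */
static int create_pool(void)
{
	struct pool *p;
	int id;

	if (num_pools == MAX_POOLS)	/* check the limit before allocating */
		return -1;
	p = calloc(1, sizeof(*p));
	if (!p)
		return -1;
	for (id = 0; id < MAX_POOLS; id++)	/* lowest free slot */
		if (!pools[id])
			break;
	assert(id < MAX_POOLS);	/* guaranteed by the num_pools check */
	p->id = id;
	pools[id] = p;
	num_pools++;
	return id;
}

/* Free a pool and clear its slot; pointers not in the table are ignored. */
static void destroy_pool(struct pool *p)
{
	if (!p)
		return;
	for (int i = 0; i < MAX_POOLS; i++) {
		if (pools[i] == p) {
			pools[i] = NULL;
			num_pools--;
			free(p);
			return;
		}
	}
}
```

After pools 0 and 1 are created and pool 0 is destroyed, the next create_pool() returns 0 again because the freed slot is reused.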

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH v2 2/4] zcache: staging: %s/ZCACHE/ZCACHE_OLD
  2013-08-06 11:36 [PATCH v2 0/4] zcache: a compressed file page cache Bob Liu
  2013-08-06 11:36 ` [PATCH v2 1/4] mm: zcache: add core files Bob Liu
@ 2013-08-06 11:36 ` Bob Liu
  2013-08-06 11:36 ` [PATCH v2 3/4] mm: zcache: add evict zpages supporting Bob Liu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Bob Liu @ 2013-08-06 11:36 UTC (permalink / raw)
  To: linux-mm
  Cc: gregkh, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman,
	kyungmin.park, p.sarna, barry.song, penberg, Bob Liu

If nobody is using it, I'll drop it from staging.
Zcache in staging is now split into zswap and zcache in mm/, and the two can
be merged again in the future if there is a requirement.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/staging/zcache/Kconfig  |   12 ++++++------
 drivers/staging/zcache/Makefile |    4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/zcache/Kconfig b/drivers/staging/zcache/Kconfig
index 2d7b2da..f96fb12 100644
--- a/drivers/staging/zcache/Kconfig
+++ b/drivers/staging/zcache/Kconfig
@@ -1,4 +1,4 @@
-config ZCACHE
+config ZCACHE_OLD
 	tristate "Dynamic compression of swap pages and clean pagecache pages"
 	depends on CRYPTO=y && SWAP=y && CLEANCACHE && FRONTSWAP
 	select CRYPTO_LZO
@@ -10,9 +10,9 @@ config ZCACHE
 	  memory to store clean page cache pages and swap in RAM,
 	  providing a noticeable reduction in disk I/O.
 
-config ZCACHE_DEBUG
+config ZCACHE_OLD_DEBUG
 	bool "Enable debug statistics"
-	depends on DEBUG_FS && ZCACHE
+	depends on DEBUG_FS && ZCACHE_OLD
 	default n
 	help
 	  This is used to provide an debugfs directory with counters of
@@ -20,7 +20,7 @@ config ZCACHE_DEBUG
 
 config RAMSTER
 	tristate "Cross-machine RAM capacity sharing, aka peer-to-peer tmem"
-	depends on CONFIGFS_FS=y && SYSFS=y && !HIGHMEM && ZCACHE
+	depends on CONFIGFS_FS=y && SYSFS=y && !HIGHMEM && ZCACHE_OLD
 	depends on NET
 	# must ensure struct page is 8-byte aligned
 	select HAVE_ALIGNED_STRUCT_PAGE if !64BIT
@@ -45,9 +45,9 @@ config RAMSTER_DEBUG
 # __add_to_swap_cache, and implement __swap_writepage (which is swap_writepage
 # without the frontswap call. When these are in-tree, the dependency on
 # BROKEN can be removed
-config ZCACHE_WRITEBACK
+config ZCACHE_OLD_WRITEBACK
 	bool "Allow compressed swap pages to be writtenback to swap disk"
-	depends on ZCACHE=y && BROKEN
+	depends on ZCACHE_OLD=y && BROKEN
 	default n
 	help
 	  Zcache caches compressed swap pages (and other data) in RAM which
diff --git a/drivers/staging/zcache/Makefile b/drivers/staging/zcache/Makefile
index 845a5c2..34d27bd 100644
--- a/drivers/staging/zcache/Makefile
+++ b/drivers/staging/zcache/Makefile
@@ -1,8 +1,8 @@
 zcache-y	:=		zcache-main.o tmem.o zbud.o
-zcache-$(CONFIG_ZCACHE_DEBUG) += debug.o
+zcache-$(CONFIG_ZCACHE_OLD_DEBUG) += debug.o
 zcache-$(CONFIG_RAMSTER_DEBUG) += ramster/debug.o
 zcache-$(CONFIG_RAMSTER)	+=	ramster/ramster.o ramster/r2net.o
 zcache-$(CONFIG_RAMSTER)	+=	ramster/nodemanager.o ramster/tcp.o
 zcache-$(CONFIG_RAMSTER)	+=	ramster/heartbeat.o ramster/masklog.o
 
-obj-$(CONFIG_ZCACHE)	+=	zcache.o
+obj-$(CONFIG_ZCACHE_OLD)	+=	zcache.o
-- 
1.7.10.4

* [PATCH v2 3/4] mm: zcache: add evict zpages supporting
  2013-08-06 11:36 [PATCH v2 0/4] zcache: a compressed file page cache Bob Liu
  2013-08-06 11:36 ` [PATCH v2 1/4] mm: zcache: add core files Bob Liu
  2013-08-06 11:36 ` [PATCH v2 2/4] zcache: staging: %s/ZCACHE/ZCACHE_OLD Bob Liu
@ 2013-08-06 11:36 ` Bob Liu
  2013-08-06 11:36 ` [PATCH v2 4/4] mm: add WasActive page flag Bob Liu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Bob Liu @ 2013-08-06 11:36 UTC (permalink / raw)
  To: linux-mm
  Cc: gregkh, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman,
	kyungmin.park, p.sarna, barry.song, penberg, Bob Liu

Implement zbud_ops->evict so that compressed zpages can be evicted from the
zbud memory pool when the compressed pool is full.

zbud already manages the compressed pool on an LRU basis. Eviction simply
drops the compressed file page data. If the data is needed again, it must be
reread from disk, so no disk read is saved for that page.
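
The store-time behaviour can be modelled in a few lines. The following is a minimal user-space sketch, not the patch's code. It assumes a pool holding a fixed number of entries (CAPACITY, hypothetical) and uses a linear scan instead of zcache's rbtree/radix-tree index. When the pool is full, store() drops the least recently stored entry without writing it back, much as zcache_evict_zpage() does, and a later load of the dropped key misses. In the kernel, zbud_reclaim_page() frees a whole zbud page (up to two zpages) and can fail. In this model the failure branch cannot be taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit standing in for the zbud pool size limit. */
#define CAPACITY 3

struct entry {
	bool used;
	unsigned long ino, index;	/* file key, like rb_index/ra_index */
	unsigned long age;		/* larger means stored more recently */
	char data[16];			/* stands in for the compressed page */
};

static struct entry pool[CAPACITY];
static unsigned long clock_tick;
static unsigned long evicted, reclaim_fail;

static struct entry *find(unsigned long ino, unsigned long index)
{
	for (int i = 0; i < CAPACITY; i++)
		if (pool[i].used && pool[i].ino == ino && pool[i].index == index)
			return &pool[i];
	return NULL;
}

static struct entry *free_slot(void)
{
	for (int i = 0; i < CAPACITY; i++)
		if (!pool[i].used)
			return &pool[i];
	return NULL;
}

/* Drop the least recently stored entry; the data is not written back. */
static bool evict_lru(void)
{
	struct entry *victim = NULL;

	for (int i = 0; i < CAPACITY; i++)
		if (pool[i].used && (!victim || pool[i].age < victim->age))
			victim = &pool[i];
	if (!victim)
		return false;
	victim->used = false;
	evicted++;
	return true;
}

/* Store a clean page; if the pool is full, evict one LRU entry first. */
static bool store(unsigned long ino, unsigned long index, const char *data)
{
	struct entry *e = find(ino, index);	/* overwrite a duplicate key */

	if (!e)
		e = free_slot();
	if (!e) {
		if (!evict_lru()) {
			reclaim_fail++;	/* zbud_reclaim_page() can fail in the kernel */
			return false;
		}
		e = free_slot();
	}
	e->used = true;
	e->ino = ino;
	e->index = index;
	e->age = ++clock_tick;
	strncpy(e->data, data, sizeof(e->data) - 1);
	e->data[sizeof(e->data) - 1] = '\0';
	return true;
}

/* A hit copies the data out and removes the entry. */
static bool load(unsigned long ino, unsigned long index, char *out, size_t len)
{
	struct entry *e = find(ino, index);

	assert(len > 0);
	if (!e)
		return false;
	strncpy(out, e->data, len - 1);
	out[len - 1] = '\0';
	e->used = false;
	return true;
}
```

Storing a fourth page into the three-entry pool evicts the first one, and loading that first key afterwards misses, which corresponds to the case where the page has to be reread from disk.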

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 mm/zcache.c |   53 +++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 47 insertions(+), 6 deletions(-)

diff --git a/mm/zcache.c b/mm/zcache.c
index ec1a0eb..8c3222e 100644
--- a/mm/zcache.c
+++ b/mm/zcache.c
@@ -65,6 +65,9 @@ static u64 zcache_pool_limit_hit;
 static u64 zcache_dup_entry;
 static u64 zcache_zbud_alloc_fail;
 static u64 zcache_pool_pages;
+static u64 zcache_evict_zpages;
+static u64 zcache_evict_filepages;
+static u64 zcache_reclaim_fail;
 static atomic_t zcache_stored_pages = ATOMIC_INIT(0);
 
 /*
@@ -129,6 +132,7 @@ struct zcache_ra_handle {
 	int rb_index;			/* Redblack tree index */
 	int ra_index;			/* Radix tree index */
 	int zlen;			/* Compressed page size */
+	struct zcache_pool *zpool;	/* Owning pool, looked up during evict */
 };
 
 static struct kmem_cache *zcache_rbnode_cache;
@@ -493,7 +497,16 @@ static void zcache_store_page(int pool_id, struct cleancache_filekey key,
 
 	if (zcache_is_full()) {
 		zcache_pool_limit_hit++;
-		return;
+		if (zbud_reclaim_page(zpool->pool, 8)) {
+			zcache_reclaim_fail++;
+			return;
+		} else {
+			/*
+			 * Continue if a page frame was reclaimed successfully.
+			 */
+			zcache_evict_filepages++;
+			zcache_pool_pages = zbud_get_pool_size(zpool->pool);
+		}
 	}
 
 	/* compress */
@@ -521,6 +534,8 @@ static void zcache_store_page(int pool_id, struct cleancache_filekey key,
 	zhandle->ra_index = index;
 	zhandle->rb_index = key.u.ino;
 	zhandle->zlen = zlen;
+	zhandle->zpool = zpool;
+
 	/* Compressed page data stored at the end of zcache_ra_handle */
 	zpage = (u8 *)(zhandle + 1);
 	memcpy(zpage, dst, zlen);
@@ -692,16 +707,36 @@ static void zcache_flush_fs(int pool_id)
 }
 
 /*
- * Evict pages from zcache pool on an LRU basis after the compressed pool is
- * full.
+ * Evict compressed pages from zcache pool on an LRU basis after the compressed
+ * pool is full.
  */
-static int zcache_evict_entry(struct zbud_pool *pool, unsigned long zaddr)
+static int zcache_evict_zpage(struct zbud_pool *pool, unsigned long zaddr)
 {
-	return -1;
+	struct zcache_pool *zpool;
+	struct zcache_ra_handle *zhandle;
+	void *zaddr_intree;
+
+	zhandle = (struct zcache_ra_handle *)zbud_map(pool, zaddr);
+
+	zpool = zhandle->zpool;
+	BUG_ON(!zpool);
+	BUG_ON(pool != zpool->pool);
+
+	zaddr_intree = zcache_load_delete_zaddr(zpool, zhandle->rb_index,
+			zhandle->ra_index);
+	if (zaddr_intree) {
+		BUG_ON((unsigned long)zaddr_intree != zaddr);
+		zbud_unmap(pool, zaddr);
+		zbud_free(pool, zaddr);
+		atomic_dec(&zcache_stored_pages);
+		zcache_pool_pages = zbud_get_pool_size(pool);
+		zcache_evict_zpages++;
+	}
+	return 0;
 }
 
 static struct zbud_ops zcache_zbud_ops = {
-	.evict = zcache_evict_entry
+	.evict = zcache_evict_zpage
 };
 
 /* Return pool id */
@@ -832,6 +867,12 @@ static int __init zcache_debugfs_init(void)
 			&zcache_pool_pages);
 	debugfs_create_atomic_t("stored_pages", S_IRUGO, zcache_debugfs_root,
 			&zcache_stored_pages);
+	debugfs_create_u64("evicted_zpages", S_IRUGO, zcache_debugfs_root,
+			&zcache_evict_zpages);
+	debugfs_create_u64("evicted_filepages", S_IRUGO, zcache_debugfs_root,
+			&zcache_evict_filepages);
+	debugfs_create_u64("reclaim_fail", S_IRUGO, zcache_debugfs_root,
+			&zcache_reclaim_fail);
 	return 0;
 }
 
-- 
1.7.10.4

* [PATCH v2 4/4] mm: add WasActive page flag
  2013-08-06 11:36 [PATCH v2 0/4] zcache: a compressed file page cache Bob Liu
                   ` (2 preceding siblings ...)
  2013-08-06 11:36 ` [PATCH v2 3/4] mm: zcache: add evict zpages supporting Bob Liu
@ 2013-08-06 11:36 ` Bob Liu
  2013-08-13  6:01   ` Pekka Enberg
  2013-08-06 13:58 ` [PATCH v2 0/4] zcache: a compressed file page cache Greg KH
  2013-08-09  8:03 ` Bob Liu
  5 siblings, 1 reply; 16+ messages in thread
From: Bob Liu @ 2013-08-06 11:36 UTC (permalink / raw)
  To: linux-mm
  Cc: gregkh, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman,
	kyungmin.park, p.sarna, barry.song, penberg, Bob Liu

Zcache can be ineffective if the compressed memory pool fills up with
compressed inactive file pages, most of which will never be used again.

So select only pages from the active file list, since those pages are
more likely to be accessed again. Compressing them in memory can reduce
latency significantly compared with rereading them from disk.

When a file page is moved from the active file list to the inactive file
list, its PageActive flag is cleared. So add an extra WasActive page flag
that lets zcache know whether a file page was demoted from the active
list.
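
The admission rule can be modelled with two flag bits. The following is a minimal sketch, not the patch's code, with hypothetical bit numbers and a simplified struct page. deactivate() mirrors the shrink_active_list() hunk below (clear PG_active, set PG_was_active). store_page() mirrors the check added to zcache_store_page(), and load_page() mirrors the SetPageWasActive() call added to zcache_load_page(). The putback_inactive_pages() hunk is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the kernel's enum pageflags assigns its own. */
enum { PG_active, PG_was_active };

static_assert(PG_was_active < 8 * sizeof(unsigned long),
	      "flag bits must fit in page->flags");

struct page {
	unsigned long flags;
};

static void set_flag(struct page *p, int bit)   { p->flags |= 1UL << bit; }
static void clear_flag(struct page *p, int bit) { p->flags &= ~(1UL << bit); }
static bool test_flag(const struct page *p, int bit)
{
	return p->flags & (1UL << bit);
}

static unsigned long stored, inactive_pages_refused;

/* Demotion: the page leaves the active list but remembers it was there. */
static void deactivate(struct page *p)
{
	clear_flag(p, PG_active);
	set_flag(p, PG_was_active);
}

/* Admission check: only pages that were once active get compressed. */
static bool store_page(struct page *p)
{
	if (!test_flag(p, PG_was_active)) {
		inactive_pages_refused++;
		return false;
	}
	stored++;
	return true;
}

/* A page refilled from zcache is treated as having been active. */
static void load_page(struct page *p)
{
	set_flag(p, PG_was_active);
}
```

A page that was never on the active list is refused, while a demoted page, or one just refilled by load_page(), is accepted.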

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 include/linux/page-flags.h |    9 ++++++++-
 mm/page_alloc.c            |    3 +++
 mm/vmscan.c                |   11 ++++++++++-
 mm/zcache.c                |   15 +++++++++++++++
 4 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6d53675..ab433916 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -109,6 +109,9 @@ enum pageflags {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	PG_compound_lock,
 #endif
+#ifdef CONFIG_CLEANCACHE
+	PG_was_active,
+#endif
 	__NR_PAGEFLAGS,
 
 	/* Filesystems */
@@ -210,6 +213,9 @@ PAGEFLAG(Reserved, reserved) __CLEARPAGEFLAG(Reserved, reserved)
 PAGEFLAG(SwapBacked, swapbacked) __CLEARPAGEFLAG(SwapBacked, swapbacked)
 
 __PAGEFLAG(SlobFree, slob_free)
+#ifdef CONFIG_CLEANCACHE
+PAGEFLAG(WasActive, was_active)
+#endif
 
 /*
  * Private page markings that may be used by the filesystem that owns the page
@@ -509,7 +515,8 @@ static inline void ClearPageSlabPfmemalloc(struct page *page)
  * Pages being prepped should not have any flags set.  It they are set,
  * there has been a kernel bug or struct page corruption.
  */
-#define PAGE_FLAGS_CHECK_AT_PREP	((1 << NR_PAGEFLAGS) - 1)
+#define PAGE_FLAGS_CHECK_AT_PREP	(((1 << NR_PAGEFLAGS) - 1) |\
+	(1 << PG_was_active))
 
 #define PAGE_FLAGS_PRIVATE				\
 	(1 << PG_private | 1 << PG_private_2)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b100255..9505ced 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6345,6 +6345,9 @@ static const struct trace_print_flags pageflag_names[] = {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	{1UL << PG_compound_lock,	"compound_lock"	},
 #endif
+#ifdef CONFIG_CLEANCACHE
+	{1UL << PG_was_active,	"was_active"	},
+#endif
 };
 
 static void dump_page_flags(unsigned long flags)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2cff0d4..674f33f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1325,8 +1325,11 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list)
 		lru = page_lru(page);
 		add_page_to_lru_list(page, lruvec, lru);
 
+		int file = is_file_lru(lru);
+		if (IS_ENABLED(CONFIG_ZCACHE))
+			if (file)
+				SetPageWasActive(page);
 		if (is_active_lru(lru)) {
-			int file = is_file_lru(lru);
 			int numpages = hpage_nr_pages(page);
 			reclaim_stat->recent_rotated[file] += numpages;
 		}
@@ -1632,6 +1635,12 @@ static void shrink_active_list(unsigned long nr_to_scan,
 		}
 
 		ClearPageActive(page);	/* we are de-activating */
+		if (IS_ENABLED(CONFIG_ZCACHE))
+			/*
+			 * For zcache to know whether the page is from active
+			 * file list
+			 */
+			SetPageWasActive(page);
 		list_add(&page->lru, &l_inactive);
 	}
 
diff --git a/mm/zcache.c b/mm/zcache.c
index 8c3222e..97ca274 100644
--- a/mm/zcache.c
+++ b/mm/zcache.c
@@ -67,6 +67,7 @@ static u64 zcache_zbud_alloc_fail;
 static u64 zcache_pool_pages;
 static u64 zcache_evict_zpages;
 static u64 zcache_evict_filepages;
+static u64 zcache_inactive_pages_refused;
 static u64 zcache_reclaim_fail;
 static atomic_t zcache_stored_pages = ATOMIC_INIT(0);
 
@@ -495,6 +496,17 @@ static void zcache_store_page(int pool_id, struct cleancache_filekey key,
 
 	struct zcache_pool *zpool = zcache.pools[pool_id];
 
+	/*
+	 * Zcache would be ineffective if the compressed memory pool filled up
+	 * with compressed inactive file pages, most of which will never be
+	 * used again.
+	 * So refuse to compress pages that did not come from the active list.
+	 */
+	if (!PageWasActive(page)) {
+		zcache_inactive_pages_refused++;
+		return;
+	}
+
 	if (zcache_is_full()) {
 		zcache_pool_limit_hit++;
 		if (zbud_reclaim_page(zpool->pool, 8)) {
@@ -588,6 +600,7 @@ static int zcache_load_page(int pool_id, struct cleancache_filekey key,
 	/* update stats */
 	atomic_dec(&zcache_stored_pages);
 	zcache_pool_pages = zbud_get_pool_size(zpool->pool);
+	SetPageWasActive(page);
 	return ret;
 }
 
@@ -873,6 +886,8 @@ static int __init zcache_debugfs_init(void)
 			&zcache_evict_filepages);
 	debugfs_create_u64("reclaim_fail", S_IRUGO, zcache_debugfs_root,
 			&zcache_reclaim_fail);
+	debugfs_create_u64("inactive_pages_refused", S_IRUGO,
+			zcache_debugfs_root, &zcache_inactive_pages_refused);
 	return 0;
 }
 
-- 
1.7.10.4

* Re: [PATCH v2 4/4] mm: add WasActive page flag
  2013-08-06 11:36 ` [PATCH v2 4/4] mm: add WasActive page flag Bob Liu
@ 2013-08-13  6:01   ` Pekka Enberg
  2013-08-13 13:50     ` Bob Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Pekka Enberg @ 2013-08-13  6:01 UTC (permalink / raw)
  To: Bob Liu
  Cc: linux-mm, gregkh, ngupta, akpm, konrad.wilk, sjenning, riel,
	mgorman, kyungmin.park, p.sarna, barry.song, penberg, Bob Liu

On 8/6/13 2:36 PM, Bob Liu wrote:
> Zcache can be ineffective if the compressed memory pool fills up with
> compressed inactive file pages, most of which will never be used again.
>
> So select only pages from the active file list, since those pages are
> more likely to be accessed again. Compressing them in memory can reduce
> latency significantly compared with rereading them from disk.
>
> When a file page is moved from the active file list to the inactive file
> list, its PageActive flag is cleared. So add an extra WasActive page flag
> that lets zcache know whether a file page was demoted from the active
> list.
>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>

Using a page flag for this seems like an ugly hack to me.
Can we rearrange the code so that vmscan notifies zcache
*before* the active page flag is cleared...?

                 Pekka
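
One possible reading of this suggestion can be sketched under some assumptions. vmscan calls a hook before it clears the active flag, and the backend records the page's file key in its own table instead of in a page flag. The hook (deactivate_hook), the key type, and the fixed-size ring are all hypothetical. The thread does not settle on a design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical file key: inode number plus page index. */
struct file_key {
	unsigned long ino, index;
};

struct page {
	bool active;
	struct file_key key;
};

/* Hypothetical notifier that vmscan would call before deactivating. */
typedef void (*deactivate_hook_t)(const struct page *page);
static deactivate_hook_t deactivate_hook;

#define TRACKED 8
static struct file_key recent[TRACKED];	/* ring of recently demoted keys */
static size_t recent_next, recent_count;

static void backend_note_deactivate(const struct page *page)
{
	assert(page->active);	/* the hook runs while the page is still active */
	recent[recent_next] = page->key;
	recent_next = (recent_next + 1) % TRACKED;
	if (recent_count < TRACKED)
		recent_count++;
}

static bool backend_was_active(const struct page *page)
{
	for (size_t i = 0; i < recent_count; i++)
		if (recent[i].ino == page->key.ino &&
		    recent[i].index == page->key.index)
			return true;
	return false;
}

/* Model of the shrink_active_list() step: notify first, then clear. */
static void deactivate(struct page *page)
{
	if (deactivate_hook)
		deactivate_hook(page);
	page->active = false;
}
```

A bounded table like this can forget pages that were demoted long ago. That is one trade-off against a per-page flag.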

* Re: [PATCH v2 4/4] mm: add WasActive page flag
  2013-08-13  6:01   ` Pekka Enberg
@ 2013-08-13 13:50     ` Bob Liu
  0 siblings, 0 replies; 16+ messages in thread
From: Bob Liu @ 2013-08-13 13:50 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Bob Liu, linux-mm, gregkh, ngupta, akpm, konrad.wilk, sjenning,
	riel, mgorman, kyungmin.park, p.sarna, barry.song, penberg

Hi Pekka,

On 08/13/2013 02:01 PM, Pekka Enberg wrote:
> On 8/6/13 2:36 PM, Bob Liu wrote:
>> Zcache can be ineffective if the compressed memory pool fills up with
>> compressed inactive file pages, most of which will never be used again.
>>
>> So select only pages from the active file list, since those pages are
>> more likely to be accessed again. Compressing them in memory can reduce
>> latency significantly compared with rereading them from disk.
>>
>> When a file page is moved from the active file list to the inactive file
>> list, its PageActive flag is cleared. So add an extra WasActive page flag
>> that lets zcache know whether a file page was demoted from the active
>> list.
>>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> 

Thank you so much for your review!

> Using a page flag for this seems like an ugly hack to me.
> Can we rearrange the code so that vmscan notifies zcache
> *before* the active page flag is cleared...?

Yep, adding a page flag is not a good idea.
I'm looking into whether there is another way to notify zcache.

BTW: Could you please also give some feedback on the other zcache patches?

> 
>                 Pekka

-- 
Regards,
-Bob

* Re: [PATCH v2 0/4] zcache: a compressed file page cache
  2013-08-06 11:36 [PATCH v2 0/4] zcache: a compressed file page cache Bob Liu
                   ` (3 preceding siblings ...)
  2013-08-06 11:36 ` [PATCH v2 4/4] mm: add WasActive page flag Bob Liu
@ 2013-08-06 13:58 ` Greg KH
  2013-08-06 14:24   ` Bob Liu
  2013-08-09  8:03 ` Bob Liu
  5 siblings, 1 reply; 16+ messages in thread
From: Greg KH @ 2013-08-06 13:58 UTC (permalink / raw)
  To: Bob Liu
  Cc: linux-mm, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman,
	kyungmin.park, p.sarna, barry.song, penberg, Bob Liu

On Tue, Aug 06, 2013 at 07:36:13PM +0800, Bob Liu wrote:
> Dan Magenheimer extended zcache supporting both file pages and anonymous pages.
> It's located in drivers/staging/zcache now. But the current version of zcache is
> too complicated to be merged into upstream.

Really?  If this is so, I'll just go delete zcache now, I don't want to
lug around dead code that will never be merged.

greg k-h

* Re: [PATCH v2 0/4] zcache: a compressed file page cache
  2013-08-06 13:58 ` [PATCH v2 0/4] zcache: a compressed file page cache Greg KH
@ 2013-08-06 14:24   ` Bob Liu
  2013-08-12 12:19     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 16+ messages in thread
From: Bob Liu @ 2013-08-06 14:24 UTC (permalink / raw)
  To: Greg KH
  Cc: Bob Liu, linux-mm, ngupta, akpm, konrad.wilk, sjenning, riel,
	mgorman, kyungmin.park, p.sarna, barry.song, penberg

Hi Greg,

On 08/06/2013 09:58 PM, Greg KH wrote:
> On Tue, Aug 06, 2013 at 07:36:13PM +0800, Bob Liu wrote:
>> Dan Magenheimer extended zcache supporting both file pages and anonymous pages.
>> It's located in drivers/staging/zcache now. But the current version of zcache is
>> too complicated to be merged into upstream.
> 
> Really?  If this is so, I'll just go delete zcache now, I don't want to
> lug around dead code that will never be merged.
> 

Zcache in staging has a zbud allocator which is almost the same as
mm/zbud.c but with a different API, and it has a frontswap backend like
mm/zswap.c.
So I'd prefer to reuse mm/zbud.c and mm/zswap.c for a generic memory
compression solution.
In that case, zcache in staging = mm/zswap.c + mm/zcache.c +
mm/zbud.c.

But I'm not sure whether there are any existing users of zcache in staging.
If not, I can delete zcache from staging in the next version of this
mm/zcache.c series.

> greg k-h

-- 
Regards,
-Bob

* Re: [PATCH v2 0/4] zcache: a compressed file page cache
  2013-08-06 14:24   ` Bob Liu
@ 2013-08-12 12:19     ` Konrad Rzeszutek Wilk
  2013-08-12 12:25       ` Kyungmin Park
                         ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-08-12 12:19 UTC (permalink / raw)
  To: Bob Liu
  Cc: Greg KH, Bob Liu, linux-mm, ngupta, akpm, sjenning, riel, mgorman,
	kyungmin.park, p.sarna, barry.song, penberg

On Tue, Aug 06, 2013 at 10:24:20PM +0800, Bob Liu wrote:
> Hi Greg,
> 
> On 08/06/2013 09:58 PM, Greg KH wrote:
> > On Tue, Aug 06, 2013 at 07:36:13PM +0800, Bob Liu wrote:
> >> Dan Magenheimer extended zcache supporting both file pages and anonymous pages.
> >> It's located in drivers/staging/zcache now. But the current version of zcache is
> >> too complicated to be merged into upstream.
> > 
> > Really?  If this is so, I'll just go delete zcache now, I don't want to
> > lug around dead code that will never be merged.
> > 
> 
> Zcache in staging has a zbud allocator which is almost the same as
> mm/zbud.c but with a different API, and it has a frontswap backend like
> mm/zswap.c.
> So I'd prefer to reuse mm/zbud.c and mm/zswap.c for a generic memory
> compression solution.
> In that case, zcache in staging = mm/zswap.c + mm/zcache.c +
> mm/zbud.c.
> 
> But I'm not sure whether there are any existing users of zcache in staging.
> If not, I can delete zcache from staging in the next version of this
> mm/zcache.c series.

I think the Samsung folks are using it (zcache).

* Re: [PATCH v2 0/4] zcache: a compressed file page cache
  2013-08-12 12:19     ` Konrad Rzeszutek Wilk
@ 2013-08-12 12:25       ` Kyungmin Park
  2013-08-12 12:30       ` Wanpeng Li
  2013-08-12 12:30       ` Wanpeng Li
  2 siblings, 0 replies; 16+ messages in thread
From: Kyungmin Park @ 2013-08-12 12:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Bob Liu, Greg KH, Bob Liu, linux-mm, ngupta, akpm, sjenning, riel,
	mgorman, p.sarna, barry.song, penberg

On Mon, Aug 12, 2013 at 9:19 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Tue, Aug 06, 2013 at 10:24:20PM +0800, Bob Liu wrote:
>> Hi Greg,
>>
>> On 08/06/2013 09:58 PM, Greg KH wrote:
>> > On Tue, Aug 06, 2013 at 07:36:13PM +0800, Bob Liu wrote:
>> >> Dan Magenheimer extended zcache supporting both file pages and anonymous pages.
>> >> It's located in drivers/staging/zcache now. But the current version of zcache is
>> >> too complicated to be merged into upstream.
>> >
>> > Really?  If this is so, I'll just go delete zcache now, I don't want to
>> > lug around dead code that will never be merged.
>> >
>>
>> Zcache in staging has a zbud allocator which is almost the same as
>> mm/zbud.c but with a different API, and it has a frontswap backend like
>> mm/zswap.c.
>> So I'd prefer to reuse mm/zbud.c and mm/zswap.c for a generic memory
>> compression solution.
>> In that case, zcache in staging = mm/zswap.c + mm/zcache.c +
>> mm/zbud.c.
>>
>> But I'm not sure whether there are any existing users of zcache in staging.
>> If not, I can delete zcache from staging in the next version of this
>> mm/zcache.c series.
>
> I think the Samsung folks are using it (zcache).

I'm not sure, but at least my team doesn't use it now.

Thank you,
Kyungmin Park

* Re: [PATCH v2 0/4] zcache: a compressed file page cache
  2013-08-12 12:19     ` Konrad Rzeszutek Wilk
  2013-08-12 12:25       ` Kyungmin Park
  2013-08-12 12:30       ` Wanpeng Li
@ 2013-08-12 12:30       ` Wanpeng Li
  2013-08-12 13:23         ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 16+ messages in thread
From: Wanpeng Li @ 2013-08-12 12:30 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Bob Liu, Greg KH, Bob Liu, linux-mm, ngupta, akpm, sjenning, riel,
	mgorman, kyungmin.park, p.sarna, barry.song, penberg

On Mon, Aug 12, 2013 at 08:19:08AM -0400, Konrad Rzeszutek Wilk wrote:
>On Tue, Aug 06, 2013 at 10:24:20PM +0800, Bob Liu wrote:
>> Hi Greg,
>> 
>> On 08/06/2013 09:58 PM, Greg KH wrote:
>> > On Tue, Aug 06, 2013 at 07:36:13PM +0800, Bob Liu wrote:
>> >> Dan Magenheimer extended zcache supporting both file pages and anonymous pages.
>> >> It's located in drivers/staging/zcache now. But the current version of zcache is
>> >> too complicated to be merged into upstream.
>> > 
>> > Really?  If this is so, I'll just go delete zcache now, I don't want to
>> > lug around dead code that will never be merged.
>> > 
>> 
>> Zcache in staging has a zbud allocator which is almost the same as
>> mm/zbud.c but with a different API, and it has a frontswap backend like
>> mm/zswap.c.
>> So I'd prefer to reuse mm/zbud.c and mm/zswap.c for a generic memory
>> compression solution.
>> In that case, zcache in staging = mm/zswap.c + mm/zcache.c +
>> mm/zbud.c.
>> 
>> But I'm not sure whether there are any existing users of zcache in staging.
>> If not, I can delete zcache from staging in the next version of this
>> mm/zcache.c series.
>
>I think the Samsung folks are using it (zcache).
>

Hi Konrad,

Are there real users of ramster? And is the Xen project using zcache
and ramster in the staging tree?

Regards,
Wanpeng Li 

* Re: [PATCH v2 0/4] zcache: a compressed file page cache
  2013-08-12 12:30       ` Wanpeng Li
@ 2013-08-12 13:23         ` Konrad Rzeszutek Wilk
  2013-08-12 22:10           ` Greg KH
  0 siblings, 1 reply; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-08-12 13:23 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Bob Liu, Greg KH, Bob Liu, linux-mm, ngupta, akpm, sjenning, riel,
	mgorman, kyungmin.park, p.sarna, barry.song, penberg

On Mon, Aug 12, 2013 at 08:30:02PM +0800, Wanpeng Li wrote:
> On Mon, Aug 12, 2013 at 08:19:08AM -0400, Konrad Rzeszutek Wilk wrote:
> >On Tue, Aug 06, 2013 at 10:24:20PM +0800, Bob Liu wrote:
> >> Hi Greg,
> >> 
> >> On 08/06/2013 09:58 PM, Greg KH wrote:
> >> > On Tue, Aug 06, 2013 at 07:36:13PM +0800, Bob Liu wrote:
> >> >> Dan Magenheimer extended zcache supporting both file pages and anonymous pages.
> >> >> It's located in drivers/staging/zcache now. But the current version of zcache is
> >> >> too complicated to be merged into upstream.
> >> > 
> >> > Really?  If this is so, I'll just go delete zcache now, I don't want to
> >> > lug around dead code that will never be merged.
> >> > 
> >> 
> >> Zcache in staging have a zbud allocation which is almost the same as
> >> mm/zbud.c but with different API and have a frontswap backend like
> >> mm/zswap.c.
> >> So I'd prefer reuse mm/zbud.c and mm/zswap.c for a generic memory
> >> compression solution.
> >> Which means in that case, zcache in staging = mm/zswap.c + mm/zcache.c +
> >> mm/zbud.c.
> >> 
> >> But I'm not sure if there are any existing users of zcache in staging,
> >> if not I can delete zcache from staging in my next version of this
> >> mm/zcache.c series.
> >
> >I think the Samsung folks are using it (zcache).
> >
> 
> Hi Konrad,
> 
> If there are real users using ramster? And if Xen project using zcache
> and ramster in staging tree? 

The Xen Project has a tmem API implementation which allows the 'tmem'
driver (drivers/xen/tmem.c) to use it. The Linux tmem driver implements
both the frontswap and cleancache APIs. That means a guest running under
Xen gets the same benefits as if it were running on bare metal and using
zswap + zcache3 (what Bob posted, which is the cleancache backend) or
the old zcache2 (staging/zcache).

One way to think about it is that the compression, deduplication, etc. are
all hoisted into the hypervisor, while each guest pipes its pages
up and down using hypercalls.
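
As a rough illustration of that split (a user-space sketch, not Xen or kernel code): guest-side callers only talk to a table of put/get hooks, and whatever sits behind the table may be a local compressor or a hypercall into the hypervisor. The struct page_backend type, frontend_put()/frontend_get(), and the single-slot "hypervisor" store are all hypothetical stand-ins; the real hooks are defined by the cleancache and frontswap APIs.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical hook table, loosely shaped like the put/get pair a cleancache
 * backend supplies; this is not the kernel's struct cleancache_ops. */
struct page_backend {
	int (*put)(unsigned long key, const void *page);
	int (*get)(unsigned long key, void *page);	/* 0 on hit, -1 on miss */
};

/* Stand-in for the hypervisor side: a single-slot store plus a counter of
 * simulated hypercalls. A real guest would trap into Xen at this point. */
static unsigned long hv_key = (unsigned long)-1;
static unsigned char hv_store[PAGE_SIZE];
static int hv_calls;

static int hv_put(unsigned long key, const void *page)
{
	hv_calls++;
	hv_key = key;
	memcpy(hv_store, page, PAGE_SIZE);
	return 0;
}

static int hv_get(unsigned long key, void *page)
{
	hv_calls++;
	if (key != hv_key)
		return -1;
	memcpy(page, hv_store, PAGE_SIZE);
	return 0;
}

static const struct page_backend tmem_like_backend = { hv_put, hv_get };

/* Guest-side callers only see this pointer, so a local compressing backend
 * could be swapped in without changing them. */
static const struct page_backend *active_backend = &tmem_like_backend;

int frontend_put(unsigned long key, const void *page)
{
	assert(active_backend != NULL);
	return active_backend->put(key, page);
}

int frontend_get(unsigned long key, void *page)
{
	assert(active_backend != NULL);
	return active_backend->get(key, page);
}
```

Pointing active_backend at a local implementation instead would leave every frontend_put()/frontend_get() caller untouched, which is the property described above.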

The Xen Project does not need to use zcache2 (staging/zcache) as it can
get the same benefits from using tmem. Though if users wanted to, they
could certainly use it and bypass tmem, loading either zcache2 or zswap
and zcache3 (the one Bob posted).

Regarding "real users using RAMster": I surmise you are wondering
whether Oracle is offering this as a supported product to customers.
The answer is no at this time, as it is still in development and we
would want it out of that stage before Oracle supports it in its
distributions.

Now, "would want" and the reality of what can be done right now
are a bit disjoint.

I think that the next step is concentrating on making zswap awesome
and on getting zcache3 (the patches that Bob posted) into shape to
be merged into mm.

It would be fantastic if folks took a look at the patches and gave
comments.

Thanks!

P.S.
Greg, since the Samsung folks are not using it, and we (Oracle) can
patch our distro kernel to provide a smörgåsbord of zcache2, zswap
and zcache3 (even zcache1 if needed), I think it is safe to
delete staging/zcache and focus on getting zcache3 (Bob's
patchset) upstream.

* Re: [PATCH v2 0/4] zcache: a compressed file page cache
  2013-08-12 13:23         ` Konrad Rzeszutek Wilk
@ 2013-08-12 22:10           ` Greg KH
  0 siblings, 0 replies; 16+ messages in thread
From: Greg KH @ 2013-08-12 22:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wanpeng Li, Bob Liu, Bob Liu, linux-mm, ngupta, akpm, sjenning,
	riel, mgorman, kyungmin.park, p.sarna, barry.song, penberg

On Mon, Aug 12, 2013 at 09:23:10AM -0400, Konrad Rzeszutek Wilk wrote:
> Greg, since the Samsung folks are not using it, and we (Oracle) can
> patch our distro kernel to provide smorgasbord of zcache2, zswap
> and zcache3, even zcache1 if needed. I think it is safe to
> delete staging/zcache and focus on getting the zcache3 (Bob's
> patchset) upstream.

Ok, now deleted, thanks!

greg k-h

* Re: [PATCH v2 0/4] zcache: a compressed file page cache
  2013-08-06 11:36 [PATCH v2 0/4] zcache: a compressed file page cache Bob Liu
                   ` (4 preceding siblings ...)
  2013-08-06 13:58 ` [PATCH v2 0/4] zcache: a compressed file page cache Greg KH
@ 2013-08-09  8:03 ` Bob Liu
  5 siblings, 0 replies; 16+ messages in thread
From: Bob Liu @ 2013-08-09  8:03 UTC (permalink / raw)
  To: Linux-MM, Linux-Kernel
  Cc: Greg Kroah-Hartman, Nitin Gupta, Andrew Morton,
	Konrad Rzeszutek Wilk, Seth Jennings, Rik van Riel, Mel Gorman,
	kyungmin.park, p.sarna, barry.song, Pekka Enberg, Bob Liu

Another test case, running sysbench alone, showed that the average time
per request and the transactions per second both improved by around 7%!
bootcmdline: mem=1G zcache.enabled=1 single

sysbench --test=oltp --oltp-table-size=15000000 --oltp-read-only=off \
--init-rng=on --num-threads=16 --max-requests=0 \
--oltp-dist-type=special --oltp-dist-pct=10 \
--max-time=7200 --db-driver=mysql --mysql-table-engine=innodb \
--mysql-user=root \
--mysql-password=xxxx --oltp-test-mode=complex run

                                Without zcache        With zcache
OLTP test statistics:
    queries performed:
        read:                  2238446                   2402372(+7%)
        write:                  799445                    857990
        other:                  319778                    343196
        total:                 3357669                   3603558
    transactions:               159889(22.20 per sec.)    171598(23.83 per sec.) (+7%)
    deadlocks:                       0(0.00 per sec.)          0(0.00 per sec.)
    read/write requests:       3037891(421.87 per sec.)  3260362(452.70 per sec.)(+7%)
    other operations:           319778(44.41 per sec.)    343196(47.65 per sec.) (+7%)

Test execution summary:
    total time:                   7201.0705s                7202.0176s
    total number of events:     159889                    171598
    total time taken by event execution:
                                115204.6708              115219.8235
    per-request statistics:
         min:                       94.25ms                  57.80ms (+38%)
         avg:                      720.53ms                 671.45ms (+7%)
         max:                    10684.90ms                7892.48ms (+26%)
         approx.  95 percentile:  1678.39ms                1699.62ms

Threads fairness:
    events (avg/stddev):          9993.0625/28.05          10724.87500/30.32
    execution time (avg/stddev):  7200.2919/0.30            7201.2390/0.48
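
The thread does not say exactly how the percentages above were derived. Assuming throughput rows report the relative increase over the baseline and latency rows report the relative reduction, a small helper (hypothetical names) reproduces them: 22.20 to 23.83 transactions/sec gives about +7.3%, and a 94.25ms to 57.80ms minimum request time gives about 38.7% lower latency.

```c
#include <assert.h>

/* Throughput rows (queries, transactions/sec): percent increase over baseline. */
double pct_increase(double base, double with)
{
	assert(base != 0.0);
	return (with - base) / base * 100.0;
}

/* Latency rows (min/avg/max per request): percent reduction from baseline.
 * A negative result means the "with zcache" latency was higher. */
double pct_reduction(double base, double with)
{
	assert(base != 0.0);
	return (base - with) / base * 100.0;
}
```

Applied to the 95th-percentile row (1678.39ms vs 1699.62ms), the same reduction formula gives about -1.3%, i.e. that latency was slightly higher with zcache in this run.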

Comparing /proc/vmstat shows that around 14G of data reads were saved
with zcache enabled!
I believe zcache can also help many other applications that are hungry
for file-page memory, and it does no harm to other users!
Looking forward to any feedback!

On Tue, Aug 6, 2013 at 7:36 PM, Bob Liu <lliubbo@gmail.com> wrote:
> Overview:
> Zcache is an in-kernel compressed cache for file pages.
> It takes active file pages that are in the process of being reclaimed and
> attempts to compress them into a dynamically allocated RAM-based memory pool.
>
> If this process is successful, then when those file pages are needed again,
> the I/O read is avoided. This results in significant performance gains
> under memory pressure on systems full of file pages.
>
> History:
> Nitin Gupta started zcache in 2010:
> http://lwn.net/Articles/397574/
> http://lwn.net/Articles/396467/
>
> Dan Magenheimer extended zcache to support both file pages and anonymous pages.
> It's located in drivers/staging/zcache now. But the current version of zcache is
> too complicated to be merged into upstream.
>
> Seth Jennings implemented a lightweight compressed cache for swap pages only
> (zswap), which was merged into v3.11-rc1 together with the zbud allocator.
>
> What I'm trying to do is reimplement a simple zcache for file pages only, based
> on the same zbud allocation layer. We can merge zswap and zcache into the current
> zcache in staging if that is required in the future.
>
> Who can benefit:
> Applications like databases, which keep a lot of file page data in memory.
> Under memory pressure, some of those file pages are reclaimed after their
> data is synced to disk, and the data must be reread into memory when it is
> required again. This may increase transaction latency and cause a performance
> drop. But with zcache, that data is kept compressed in memory, so only
> decompression is needed instead of reading from disk!
>
> Other users with limited RAM capacity can also mitigate the performance impact
> of memory pressure if there are many file pages in memory.
>
> Design:
> Zcache receives pages for compression through the Cleancache API and is able to
> evict pages from its own compressed pool on an LRU basis in the case that the
> compressed pool is full.
>
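
A minimal user-space sketch of LRU eviction from a bounded pool, assuming a pool that holds a fixed number of compressed pages (POOL_SLOTS, hypothetical) and a single unsigned long key standing in for the <pool, inode, index> tuple. Storing into a full pool reuses the least recently used slot, and a hit moves the entry to the front. The patch's actual eviction path (patch 3/4) is not reproduced here; this only illustrates the LRU ordering.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SLOTS 3	/* hypothetical capacity, in compressed pages */

/* One slot per stored page; prev/next link slots in recency order. */
struct lru_entry {
	unsigned long key;	/* stands in for <pool, inode, index> */
	int prev, next;		/* slot indices, -1 ends the list */
	bool used;
};

static struct lru_entry slots[POOL_SLOTS];
static int head = -1;	/* most recently used */
static int tail = -1;	/* least recently used: next eviction victim */
static int nr_evicted;

static void lru_unlink(int i)
{
	if (slots[i].prev >= 0)
		slots[slots[i].prev].next = slots[i].next;
	else
		head = slots[i].next;
	if (slots[i].next >= 0)
		slots[slots[i].next].prev = slots[i].prev;
	else
		tail = slots[i].prev;
	slots[i].prev = slots[i].next = -1;
}

static void lru_push_head(int i)
{
	slots[i].prev = -1;
	slots[i].next = head;
	if (head >= 0)
		slots[head].prev = i;
	head = i;
	if (tail < 0)
		tail = i;
}

static int lru_find(unsigned long key)
{
	for (int i = 0; i < POOL_SLOTS; i++)
		if (slots[i].used && slots[i].key == key)
			return i;
	return -1;
}

/* Store key; if every slot is in use, evict the least recently used one. */
void lru_put(unsigned long key)
{
	int i = lru_find(key);

	if (i >= 0) {			/* already stored: just refresh it */
		lru_unlink(i);
	} else {
		for (i = 0; i < POOL_SLOTS && slots[i].used; i++)
			;
		if (i == POOL_SLOTS) {	/* pool full: evict the LRU entry */
			assert(tail >= 0);
			i = tail;
			lru_unlink(i);
			nr_evicted++;
		}
		slots[i].key = key;
		slots[i].used = true;
	}
	lru_push_head(i);
}

/* Returns true on a hit and marks the entry as most recently used. */
bool lru_get(unsigned long key)
{
	int i = lru_find(key);

	if (i < 0)
		return false;
	lru_unlink(i);
	lru_push_head(i);
	return true;
}
```

With three slots, storing keys 1, 2 and 3, reading 1, then storing 4 evicts key 2, the least recently used one.
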
> Zcache makes use of zbud to manage the compressed memory pool. Each
> allocation in zbud is not directly accessible by address. Rather, a handle
> (zaddr) is returned by the allocation routine, and that handle must be mapped
> before being accessed. The compressed memory pool grows on demand and shrinks
> as compressed pages are freed.
>
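
To illustrate the handle-then-map idea, here is a small user-space model of a zbud-style pool in which each backing page holds at most two compressed objects, one packed at the start ("first" buddy) and one at the end ("last" buddy). The function names only loosely echo zbud_alloc()/zbud_map()/zbud_unmap()/zbud_free() in mm/zbud.c; the handle encoding and the fixed NR_PAGES array (which, unlike the real pool, never grows) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096
#define NR_PAGES  4	/* hypothetical fixed pool size */

/* Each backing page holds at most two objects: first at the start,
 * last at the end, so they never overlap while both fit. */
struct buddy_page {
	unsigned char data[PAGE_SIZE];
	size_t first_size;	/* 0 = first slot free */
	size_t last_size;	/* 0 = last slot free */
};

static struct buddy_page pages[NR_PAGES];

/* Hypothetical handle encoding: 2 * page + buddy + 1, so 0 is never valid. */
static unsigned long encode(size_t page, int buddy)
{
	return 2 * page + (unsigned long)buddy + 1;
}

/* Returns 0 and fills *handle on success, -1 if no page has room. */
int pool_alloc(size_t size, unsigned long *handle)
{
	if (size == 0 || size > PAGE_SIZE)
		return -1;
	for (size_t i = 0; i < NR_PAGES; i++) {
		struct buddy_page *p = &pages[i];

		if (p->first_size + p->last_size + size > PAGE_SIZE)
			continue;
		if (p->first_size == 0) {
			p->first_size = size;
			*handle = encode(i, 0);
			return 0;
		}
		if (p->last_size == 0) {
			p->last_size = size;
			*handle = encode(i, 1);
			return 0;
		}
	}
	return -1;
}

/* Callers never see an address until they map the handle. */
void *pool_map(unsigned long handle)
{
	assert(handle != 0);
	struct buddy_page *p = &pages[(handle - 1) / 2];

	if ((handle - 1) % 2 == 0)
		return p->data;
	return p->data + PAGE_SIZE - p->last_size;
}

/* Nothing to undo in this flat user-space model; a kernel allocator
 * might drop a temporary mapping here. */
void pool_unmap(unsigned long handle)
{
	(void)handle;
}

void pool_free(unsigned long handle)
{
	assert(handle != 0);
	struct buddy_page *p = &pages[(handle - 1) / 2];

	if ((handle - 1) % 2 == 0)
		p->first_size = 0;
	else
		p->last_size = 0;
}
```

Allocating 1000 and 2000 bytes places both objects in the first page as buddies; a following 3000-byte request no longer fits there and lands in the next page.
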
> When a file page is passed from cleancache to zcache, zcache maintains a mapping
> of the <filesystem_type, inode_number, page_index> to the zbud address that
> references that compressed file page. This mapping is achieved with a red-black
> tree per filesystem type, plus a radix tree per red-black node.
>
> A zcache pool, indexed by pool_id, is created when a filesystem is mounted.
> Each zcache pool has a red-black tree, with the inode number as the search key.
> Each red-black tree node has a radix tree, which uses the page index as its key.
> Each radix tree slot points to the zbud address combined with some extra
> information.
>
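
A condensed sketch of that two-level lookup, assuming one tree per pool. A plain unbalanced binary search tree stands in for the red-black tree keyed by inode number, and each inode node holds a fixed two-level radix tree (64 slots per level, so page indices below 4096) mapping page index to a handle. All names and sizes are hypothetical, and node teardown is omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>

#define RADIX_BITS 6
#define RADIX_SIZE (1UL << RADIX_BITS)		/* 64 slots per level */
#define RADIX_MASK (RADIX_SIZE - 1)
#define MAX_INDEX  (RADIX_SIZE * RADIX_SIZE)	/* two levels: index < 4096 */

/* Leaf level: slot -> handle of the compressed page (0 = empty). */
struct radix_leaf {
	unsigned long handle[RADIX_SIZE];
};

/* One node per inode; an unbalanced BST stands in for the rb-tree. */
struct inode_node {
	unsigned long ino;
	struct inode_node *left, *right;
	struct radix_leaf *root[RADIX_SIZE];	/* indexed by index >> 6 */
};

/* Find the node for ino, creating it when create is nonzero. */
static struct inode_node *lookup_inode(struct inode_node **link,
				       unsigned long ino, int create)
{
	while (*link) {
		if (ino == (*link)->ino)
			return *link;
		link = ino < (*link)->ino ? &(*link)->left : &(*link)->right;
	}
	if (!create)
		return NULL;
	*link = calloc(1, sizeof(**link));
	if (*link)
		(*link)->ino = ino;
	return *link;
}

/* Record that page (ino, index) is stored at handle. Returns 0 or -1. */
int map_store(struct inode_node **tree, unsigned long ino,
	      unsigned long index, unsigned long handle)
{
	assert(tree != NULL);
	if (index >= MAX_INDEX)
		return -1;
	struct inode_node *n = lookup_inode(tree, ino, 1);
	if (!n)
		return -1;
	struct radix_leaf **leaf = &n->root[index >> RADIX_BITS];
	if (!*leaf && !(*leaf = calloc(1, sizeof(**leaf))))
		return -1;
	(*leaf)->handle[index & RADIX_MASK] = handle;
	return 0;
}

/* Returns the stored handle, or 0 if (ino, index) is not cached. */
unsigned long map_load(struct inode_node **tree, unsigned long ino,
		       unsigned long index)
{
	assert(tree != NULL);
	if (index >= MAX_INDEX)
		return 0;
	struct inode_node *n = lookup_inode(tree, ino, 0);
	if (!n || !n->root[index >> RADIX_BITS])
		return 0;
	return n->root[index >> RADIX_BITS]->handle[index & RADIX_MASK];
}
```

A lookup first walks the inode tree, then uses the high and low bits of the page index to pick the radix slot, so pages of unrelated inodes never share a radix tree.
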
> A debugfs interface is provided for various statistics about zcache pool size,
> number of pages stored, loaded and evicted.
>
> Performance, Kernel Building:
>
> Setup
> ========
> Ubuntu with kernel v3.11-rc1
> Quad-core i5-3320 @ 2.6GHz
> 1G memory (limited with mem=1G at boot)
> started kernbench with -o N (number of threads)
>
> Details
> ========
>           Without zcache    With zcache
>
> 8 threads
> Elapsed Time        1821              1814(+0.3%)
> User Time           5332              5304
> System Time          256               306
> Percent CPU          306               306
> Context Switches 1915378           1912027
> Sleeps           1501004           1492835
>
> Nr pages succ decompress from zcache
>                        -              8295
>
> 24 threads
> Elapsed Time        2556              2256(+11.7%)
> User Time           5184              5225
> System Time          271               276
> Percent CPU          213               243
> Context Switches 1993763           2024661
> Sleeps           2000881           1849496
>
> Nr pages succ decompress from zcache
>                        -            174490
>
> 36 threads
> Elapsed Time        5254              3995(+23.9%)
> User Time           4781              4947
> System Time          293               295
> Percent CPU           96               131
> Context Switches 1612581           1779860
> Sleeps           2944985           2414438
>
> Nr pages succ decompress from zcache
>                        -            380470
>
>
> Performance, Sysbench+mysql:
>
> Setup
> ========
> Ubuntu with kernel v3.11-rc1
> Quad-core i5-3320 @ 2.6GHz
> 2G memory (limited with mem=2G at boot)
> Run sysbench in oltp complex mode for 1 hour:
> sysbench --test=oltp --oltp-table-size=5000000 --num-threads=16  --max-time=3600
> --oltp-test-mode=complex...
>
> After sysbench started, run iozone to trigger memory pressure:
> iozone -a -M -B -s 1200M -y 4k -+u
>
> Sysbench result
> ========
>                                 Without zcache          With zcache
> OLTP test statistics:
>     queries performed:
>         read:                   124320                  166936
>         write:                   44400                   59620
>         other:                   17760                   23848
>         total:                  186480                  250404
>     transactions:                 8880(2.47 per sec.)    11924(3.31 per sec.) (+34%)
>     deadlocks:                       0(0.00 per sec.)        0(0.00 per sec.)
>     read/write requests:        168720(46.86 per sec.)  226556(62.91 per sec.)(+34%)
>     other operations:            17760(4.93 per sec.)    23848(6.62 per sec.) (+34%)
>
> Test execution summary:
>     total time:                   3600.8528s              3601.3977s
>     total number of events:       8880                   11924
>     total time taken by event execution:
>                                  57610.3546              57612.9163
>     per-request statistics:
>          min:                       57.68ms                 49.52ms (+14%)
>          avg:                     6487.65ms               4831.68ms (+25%)
>          max:                   169640.52ms             124282.16ms (+42%)
>          approx.  95 percentile: 25139.93ms              21794.82ms (+13%)
>
> Threads fairness:
>     events (avg/stddev):           555.0000/6.05           745.2500/8.33
>     execution time (avg/stddev):  3600.6472/0.26          3600.8073/0.27
>
> Help with testing is welcome; it would be interesting to see zcache's effect on
> more real-life workloads.
>
> Bob Liu (4):
>   mm: zcache: add core files
>   zcache: staging: %s/ZCACHE/ZCACHE_OLD
>   mm: zcache: add evict zpages supporting
>   mm: add WasActive page flag
>
>  drivers/staging/zcache/Kconfig  |   12 +-
>  drivers/staging/zcache/Makefile |    4 +-
>  include/linux/page-flags.h      |    9 +-
>  mm/Kconfig                      |   17 +
>  mm/Makefile                     |    1 +
>  mm/page_alloc.c                 |    3 +
>  mm/vmscan.c                     |    2 +
>  mm/zcache.c                     |  944 +++++++++++++++++++++++++++++++++++++++
>  8 files changed, 983 insertions(+), 9 deletions(-)
>  create mode 100644 mm/zcache.c
>
> --
> 1.7.10.4
>
-- 
Regards,
--Bob

end of thread, other threads:[~2013-08-13 13:51 UTC | newest]

Thread overview: 16+ messages
2013-08-06 11:36 [PATCH v2 0/4] zcache: a compressed file page cache Bob Liu
2013-08-06 11:36 ` [PATCH v2 1/4] mm: zcache: add core files Bob Liu
2013-08-06 11:36 ` [PATCH v2 2/4] zcache: staging: %s/ZCACHE/ZCACHE_OLD Bob Liu
2013-08-06 11:36 ` [PATCH v2 3/4] mm: zcache: add evict zpages supporting Bob Liu
2013-08-06 11:36 ` [PATCH v2 4/4] mm: add WasActive page flag Bob Liu
2013-08-13  6:01   ` Pekka Enberg
2013-08-13 13:50     ` Bob Liu
2013-08-06 13:58 ` [PATCH v2 0/4] zcache: a compressed file page cache Greg KH
2013-08-06 14:24   ` Bob Liu
2013-08-12 12:19     ` Konrad Rzeszutek Wilk
2013-08-12 12:25       ` Kyungmin Park
2013-08-12 12:30       ` Wanpeng Li
2013-08-12 12:30       ` Wanpeng Li
2013-08-12 13:23         ` Konrad Rzeszutek Wilk
2013-08-12 22:10           ` Greg KH
2013-08-09  8:03 ` Bob Liu
