* [PATCH v2 0/4] zcache: a compressed file page cache
@ 2013-08-06 11:36 Bob Liu
From: Bob Liu @ 2013-08-06 11:36 UTC
To: linux-mm
Cc: gregkh, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman,
kyungmin.park, p.sarna, barry.song, penberg, Bob Liu
Overview:
Zcache is an in-kernel compressed cache for file pages.
It takes active file pages that are in the process of being reclaimed and
attempts to compress them into a dynamically allocated RAM-based memory pool.
If this succeeds, then when those file pages are needed again, the I/O read is
avoided. This results in significant performance gains under memory pressure
on systems full of file pages.
History:
Nitin Gupta started zcache in 2010:
http://lwn.net/Articles/397574/
http://lwn.net/Articles/396467/
Dan Magenheimer extended zcache to support both file pages and anonymous pages.
It is located in drivers/staging/zcache now, but the current version of zcache
is too complicated to be merged upstream.
Seth Jennings implemented a lightweight compressed cache for swap pages only
(zswap), which was merged into v3.11-rc1 together with the zbud allocator.
What I'm trying to do is reimplement a simple zcache for file pages only, based
on the same zbud allocation layer. If there is a need in the future, zswap and
this zcache can be merged back into something like the current zcache in staging.
Who can benefit:
Applications like databases keep a lot of file page data in memory, but under
memory pressure some of those file pages are reclaimed after their data has been
synced to disk. The data then has to be reread from disk when it is needed
again, which may increase transaction latency and cause a performance drop.
With zcache, that data is kept compressed in memory, so only decompression is
needed instead of a disk read.
Other users with limited RAM can also mitigate the performance impact of memory
pressure if there are many file pages in memory.
Design:
Zcache receives pages for compression through the Cleancache API and is able to
evict pages from its own compressed pool on an LRU basis when the compressed
pool is full.
Zcache uses zbud to manage the compressed memory pool. Allocations in zbud are
not directly accessible by address. Instead, the allocation routine returns a
handle (zaddr), and that handle must be mapped before the memory can be
accessed. The compressed memory pool grows on demand and shrinks as compressed
pages are freed.
When a file page is passed from cleancache to zcache, zcache maintains a mapping
from <filesystem, inode_number, page_index> to the zbud address that references
the compressed file page. This mapping is implemented with a red-black tree per
mounted filesystem (zcache pool), plus a radix tree per red-black tree node.
A zcache pool, indexed by pool_id, is created when a filesystem is mounted.
Each zcache pool has a red-black tree keyed by inode number.
Each red-black tree node has a radix tree indexed by page index.
Each radix tree slot holds a zbud address; the object at that address stores
the compressed page data preceded by a small header with extra information.
A debugfs interface provides various statistics, such as the zcache pool size
and the number of pages stored, loaded and evicted.
Performance, Kernel Building:
Setup
========
Ubuntu with kernel v3.11-rc1
Quad-core i5-3320 @ 2.6GHz
1G memory (limited with mem=1G on boot)
kernbench run with -o N (N = number of threads)
Details
========
                                  Without zcache  With zcache
8 threads
  Elapsed Time                    1821            1814 (+0.3%)
  User Time                       5332            5304
  System Time                     256             306
  Percent CPU                     306             306
  Context Switches                1915378         1912027
  Sleeps                          1501004         1492835
  Pages decompressed from zcache  -               8295
24 threads
  Elapsed Time                    2556            2256 (+11.7%)
  User Time                       5184            5225
  System Time                     271             276
  Percent CPU                     213             243
  Context Switches                1993763         2024661
  Sleeps                          2000881         1849496
  Pages decompressed from zcache  -               174490
36 threads
  Elapsed Time                    5254            3995 (+23.9%)
  User Time                       4781            4947
  System Time                     293             295
  Percent CPU                     96              131
  Context Switches                1612581         1779860
  Sleeps                          2944985         2414438
  Pages decompressed from zcache  -               380470
Performance, Sysbench+mysql:
Setup
========
Ubuntu with kernel v3.11-rc1
Quad-core i5-3320 @ 2.6GHz
2G memory (limited with mem=2G on boot)
Run sysbench in oltp complex mode for 1 hour:
sysbench --test=oltp --oltp-table-size=5000000 --num-threads=16 --max-time=3600 \
--oltp-test-mode=complex...
After sysbench has started, run iozone to create memory pressure:
iozone -a -M -B -s 1200M -y 4k -+u
Sysbench result
========
                                        Without zcache           With zcache
OLTP test statistics:
  queries performed:
    read:                               124320                   166936
    write:                              44400                    59620
    other:                              17760                    23848
    total:                              186480                   250404
  transactions:                         8880 (2.47 per sec.)     11924 (3.31 per sec.) (+34%)
  deadlocks:                            0 (0.00 per sec.)        0 (0.00 per sec.)
  read/write requests:                  168720 (46.86 per sec.)  226556 (62.91 per sec.) (+34%)
  other operations:                     17760 (4.93 per sec.)    23848 (6.62 per sec.) (+34%)
Test execution summary:
  total time:                           3600.8528s               3601.3977s
  total number of events:               8880                     11924
  total time taken by event execution:  57610.3546               57612.9163
  per-request statistics:
    min:                                57.68ms                  49.52ms (+14%)
    avg:                                6487.65ms                4831.68ms (+25%)
    max:                                169640.52ms              124282.16ms (+42%)
    approx. 95 percentile:              25139.93ms               21794.82ms (+13%)
Threads fairness:
  events (avg/stddev):                  555.0000/6.05            745.2500/8.33
  execution time (avg/stddev):          3600.6472/0.26           3600.8073/0.27
Help with testing is welcome; it would be interesting to see zcache's effect on
more real-life workloads.
Bob Liu (4):
mm: zcache: add core files
zcache: staging: %s/ZCACHE/ZCACHE_OLD
mm: zcache: add evict zpages supporting
mm: add WasActive page flag
drivers/staging/zcache/Kconfig | 12 +-
drivers/staging/zcache/Makefile | 4 +-
include/linux/page-flags.h | 9 +-
mm/Kconfig | 17 +
mm/Makefile | 1 +
mm/page_alloc.c | 3 +
mm/vmscan.c | 2 +
mm/zcache.c | 944 +++++++++++++++++++++++++++++++++++++++
8 files changed, 983 insertions(+), 9 deletions(-)
create mode 100644 mm/zcache.c
--
1.7.10.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH v2 1/4] mm: zcache: add core files
@ 2013-08-06 11:36 Bob Liu
From: Bob Liu @ 2013-08-06 11:36 UTC
To: linux-mm
Cc: gregkh, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman,
kyungmin.park, p.sarna, barry.song, penberg, Bob Liu
zcache is a backend for cleancache that takes file pages that are in the
process of being reclaimed and attempts to compress them and store them in a
RAM-based memory pool. This can result in a significant I/O reduction if the
system is full of file pages and, in the case where decompressing from RAM is
faster than reading from the disk, can also improve workload performance.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 mm/Kconfig  |  17 ++
 mm/Makefile |   1 +
 mm/zcache.c | 895 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 913 insertions(+)
 create mode 100644 mm/zcache.c
diff --git a/mm/Kconfig b/mm/Kconfig
index 8028dcc..0084030 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -508,6 +508,23 @@ config ZSWAP
 	  they have not be fully explored on the large set of potential
 	  configurations and workloads that exist.

+config ZCACHE
+	bool "Compressed cache for file pages (EXPERIMENTAL)"
+	depends on CRYPTO && CLEANCACHE
+	select CRYPTO_LZO
+	select ZBUD
+	default n
+	help
+	  A compressed cache for file pages.
+
+	  It takes active file pages that are in the process of being reclaimed
+	  and attempts to compress them into a dynamically allocated RAM-based
+	  memory pool.
+
+	  If this process is successful, when those file pages needed again, the
+	  I/O reading operation was avoided. This results in a significant performance
+	  gains under memory pressure for systems full with file pages.
+
 config MEM_SOFT_DIRTY
 	bool "Track memory changes"
 	depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY
diff --git a/mm/Makefile b/mm/Makefile
index f008033..a29232b 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_BOUNCE) += bounce.o
 obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o
 obj-$(CONFIG_FRONTSWAP) += frontswap.o
 obj-$(CONFIG_ZSWAP) += zswap.o
+obj-$(CONFIG_ZCACHE) += zcache.o
 obj-$(CONFIG_HAS_DMA) += dmapool.o
 obj-$(CONFIG_HUGETLBFS) += hugetlb.o
 obj-$(CONFIG_NUMA) += mempolicy.o
diff --git a/mm/zcache.c b/mm/zcache.c
new file mode 100644
index 0000000..ec1a0eb
--- /dev/null
+++ b/mm/zcache.c
@@ -0,0 +1,895 @@
+/*
+ * linux/mm/zcache.c
+ *
+ * A cleancache backend for file pages compression.
+ * Concepts based on original zcache by Dan Magenheimer.
+ * Copyright (C) 2013 Bob Liu <bob.liu@oracle.com>
+ *
+ * With zcache, active file pages can be compressed in memory during page
+ * reclaiming. When their data is needed again the I/O reading operation is
+ * avoided. This results in a significant performance gain under memory pressure
+ * for systems with many file pages.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+*/
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/atomic.h>
+#include <linux/cleancache.h>
+#include <linux/cpu.h>
+#include <linux/crypto.h>
+#include <linux/page-flags.h>
+#include <linux/pagemap.h>
+#include <linux/highmem.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/radix-tree.h>
+#include <linux/rbtree.h>
+#include <linux/types.h>
+#include <linux/zbud.h>
+
+/*
+ * Enable/disable zcache (disabled by default)
+ */
+static bool zcache_enabled __read_mostly;
+module_param_named(enabled, zcache_enabled, bool, 0);
+
+/*
+ * Compressor to be used by zcache
+ */
+#define ZCACHE_COMPRESSOR_DEFAULT "lzo"
+static char *zcache_compressor = ZCACHE_COMPRESSOR_DEFAULT;
+module_param_named(compressor, zcache_compressor, charp, 0);
+
+/*
+ * The maximum percentage of memory that the compressed pool can occupy.
+ */
+static unsigned int zcache_max_pool_percent = 10;
+module_param_named(max_pool_percent, zcache_max_pool_percent, uint, 0644);
+
+/*
+ * zcache statistics
+ */
+static u64 zcache_pool_limit_hit;
+static u64 zcache_dup_entry;
+static u64 zcache_zbud_alloc_fail;
+static u64 zcache_pool_pages;
+static atomic_t zcache_stored_pages = ATOMIC_INIT(0);
+
+/*
+ * Zcache receives pages for compression through the Cleancache API and is able
+ * to evict pages from its own compressed pool on an LRU basis in the case that
+ * the compressed pool is full.
+ *
+ * Zcache makes use of zbud for the managing the compressed memory pool. Each
+ * allocation in zbud is not directly accessible by address. Rather, a handle
+ * (zaddr) is return by the allocation routine and that handle(zaddr must be
+ * mapped before being accessed. The compressed memory pool grows on demand and
+ * shrinks as compressed pages are freed.
+ *
+ * When a file page is passed from cleancache to zcache, zcache maintains a
+ * mapping of the <filesystem_type, inode_number, page_index> to the zbud
+ * address that references that compressed file page. This mapping is achieved
+ * with a red-black tree per filesystem type, plus a radix tree per red-black
+ * node.
+ *
+ * A zcache pool with pool_id as the index is created when a filesystem mounted
+ * Each zcache pool has a red-black tree, the inode number(rb_index) is the
+ * search key. Each red-black tree node has a radix tree which use
+ * page->index(ra_index) as the index. Each radix tree slot points to the zbud
+ * address combining with some extra information(zcache_ra_handle).
+ */
+#define MAX_ZCACHE_POOLS 32
+/*
+ * One zcache_pool per (cleancache aware) filesystem mount instance
+ */
+struct zcache_pool {
+	struct rb_root rbtree;
+	rwlock_t rb_lock;		/* Protects rbtree */
+	struct zbud_pool *pool;		/* Zbud pool used */
+};
+
+/*
+ * Manage all zcache pools
+ */
+struct _zcache {
+	struct zcache_pool *pools[MAX_ZCACHE_POOLS];
+	u32 num_pools;			/* Current no. of zcache pools */
+	spinlock_t pool_lock;		/* Protects pools[] and num_pools */
+};
+struct _zcache zcache;
+
+/*
+ * Redblack tree node, each node has a page index radix-tree.
+ * Indexed by inode nubmer.
+ */
+struct zcache_rbnode {
+	struct rb_node rb_node;
+	int rb_index;
+	struct radix_tree_root ratree; /* Page radix tree per inode rbtree */
+	spinlock_t ra_lock;		/* Protects radix tree */
+	struct kref refcount;
+};
+
+/*
+ * Radix-tree leaf, indexed by page->index
+ */
+struct zcache_ra_handle {
+	int rb_index;			/* Redblack tree index */
+	int ra_index;			/* Radix tree index */
+	int zlen;			/* Compressed page size */
+};
+
+static struct kmem_cache *zcache_rbnode_cache;
+static int zcache_rbnode_cache_create(void)
+{
+	zcache_rbnode_cache = KMEM_CACHE(zcache_rbnode, 0);
+	return (zcache_rbnode_cache == NULL);
+}
+static void zcache_rbnode_cache_destory(void)
+{
+	kmem_cache_destroy(zcache_rbnode_cache);
+}
+
+/*
+ * Compression functions
+ * (Below functions are copyed from zswap!)
+ */
+static struct crypto_comp * __percpu *zcache_comp_pcpu_tfms;
+
+enum comp_op {
+	ZCACHE_COMPOP_COMPRESS,
+	ZCACHE_COMPOP_DECOMPRESS
+};
+
+static int zcache_comp_op(enum comp_op op, const u8 *src, unsigned int slen,
+				u8 *dst, unsigned int *dlen)
+{
+	struct crypto_comp *tfm;
+	int ret;
+
+	tfm = *per_cpu_ptr(zcache_comp_pcpu_tfms, get_cpu());
+	switch (op) {
+	case ZCACHE_COMPOP_COMPRESS:
+		ret = crypto_comp_compress(tfm, src, slen, dst, dlen);
+		break;
+	case ZCACHE_COMPOP_DECOMPRESS:
+		ret = crypto_comp_decompress(tfm, src, slen, dst, dlen);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	put_cpu();
+	return ret;
+}
+
+static int __init zcache_comp_init(void)
+{
+	if (!crypto_has_comp(zcache_compressor, 0, 0)) {
+		pr_info("%s compressor not available\n", zcache_compressor);
+		/* fall back to default compressor */
+		zcache_compressor = ZCACHE_COMPRESSOR_DEFAULT;
+		if (!crypto_has_comp(zcache_compressor, 0, 0))
+			/* can't even load the default compressor */
+			return -ENODEV;
+	}
+	pr_info("using %s compressor\n", zcache_compressor);
+
+	/* alloc percpu transforms */
+	zcache_comp_pcpu_tfms = alloc_percpu(struct crypto_comp *);
+	if (!zcache_comp_pcpu_tfms)
+		return -ENOMEM;
+	return 0;
+}
+
+static void zcache_comp_exit(void)
+{
+	/* free percpu transforms */
+	if (zcache_comp_pcpu_tfms)
+		free_percpu(zcache_comp_pcpu_tfms);
+}
+
+/*
+ * Per-cpu code
+ * (Below functions are also copyed from zswap!)
+ */
+static DEFINE_PER_CPU(u8 *, zcache_dstmem);
+
+static int __zcache_cpu_notifier(unsigned long action, unsigned long cpu)
+{
+	struct crypto_comp *tfm;
+	u8 *dst;
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+		tfm = crypto_alloc_comp(zcache_compressor, 0, 0);
+		if (IS_ERR(tfm)) {
+			pr_err("can't allocate compressor transform\n");
+			return NOTIFY_BAD;
+		}
+		*per_cpu_ptr(zcache_comp_pcpu_tfms, cpu) = tfm;
+		dst = kmalloc(PAGE_SIZE * 2, GFP_KERNEL);
+		if (!dst) {
+			pr_err("can't allocate compressor buffer\n");
+			crypto_free_comp(tfm);
+			*per_cpu_ptr(zcache_comp_pcpu_tfms, cpu) = NULL;
+			return NOTIFY_BAD;
+		}
+		per_cpu(zcache_dstmem, cpu) = dst;
+		break;
+	case CPU_DEAD:
+	case CPU_UP_CANCELED:
+		tfm = *per_cpu_ptr(zcache_comp_pcpu_tfms, cpu);
+		if (tfm) {
+			crypto_free_comp(tfm);
+			*per_cpu_ptr(zcache_comp_pcpu_tfms, cpu) = NULL;
+		}
+		dst = per_cpu(zcache_dstmem, cpu);
+		kfree(dst);
+		per_cpu(zcache_dstmem, cpu) = NULL;
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static int zcache_cpu_notifier(struct notifier_block *nb,
+				unsigned long action, void *pcpu)
+{
+	unsigned long cpu = (unsigned long)pcpu;
+	return __zcache_cpu_notifier(action, cpu);
+}
+
+static struct notifier_block zcache_cpu_notifier_block = {
+	.notifier_call = zcache_cpu_notifier
+};
+
+static int zcache_cpu_init(void)
+{
+	unsigned long cpu;
+
+	get_online_cpus();
+	for_each_online_cpu(cpu)
+		if (__zcache_cpu_notifier(CPU_UP_PREPARE, cpu) != NOTIFY_OK)
+			goto cleanup;
+	register_cpu_notifier(&zcache_cpu_notifier_block);
+	put_online_cpus();
+	return 0;
+
+cleanup:
+	for_each_online_cpu(cpu)
+		__zcache_cpu_notifier(CPU_UP_CANCELED, cpu);
+	put_online_cpus();
+	return -ENOMEM;
+}
+
+/*
+ * Zcache helpers
+ */
+static bool zcache_is_full(void)
+{
+	return (totalram_pages * zcache_max_pool_percent / 100 <
+			zcache_pool_pages);
+}
+
+/*
+ * The caller must hold zpool->rb_lock at least
+ */
+static struct zcache_rbnode *zcache_find_rbnode(struct rb_root *rbtree,
+	int index, struct rb_node **rb_parent, struct rb_node ***rb_link)
+{
+	struct zcache_rbnode *entry;
+	struct rb_node **__rb_link, *__rb_parent, *rb_prev;
+
+	__rb_link = &rbtree->rb_node;
+	rb_prev = __rb_parent = NULL;
+
+	while (*__rb_link) {
+		__rb_parent = *__rb_link;
+		entry = rb_entry(__rb_parent, struct zcache_rbnode, rb_node);
+		if (entry->rb_index > index)
+			__rb_link = &__rb_parent->rb_left;
+		else if (entry->rb_index < index) {
+			rb_prev = __rb_parent;
+			__rb_link = &__rb_parent->rb_right;
+		} else
+			return entry;
+	}
+
+	if (rb_parent)
+		*rb_parent = __rb_parent;
+	if (rb_link)
+		*rb_link = __rb_link;
+	return NULL;
+}
+
+static struct zcache_rbnode *zcache_find_get_rbnode(struct zcache_pool *zpool,
+					int rb_index)
+{
+	unsigned long flags;
+	struct zcache_rbnode *rbnode;
+
+	read_lock_irqsave(&zpool->rb_lock, flags);
+	rbnode = zcache_find_rbnode(&zpool->rbtree, rb_index, 0, 0);
+	if (rbnode)
+		kref_get(&rbnode->refcount);
+	read_unlock_irqrestore(&zpool->rb_lock, flags);
+	return rbnode;
+}
+
+/*
+ * kref_put callback for zcache_rbnode.
+ *
+ * The rbnode must have been isolated from rbtree already.
+ */
+static void zcache_rbnode_release(struct kref *kref)
+{
+	struct zcache_rbnode *rbnode;
+
+	rbnode = container_of(kref, struct zcache_rbnode, refcount);
+	BUG_ON(rbnode->ratree.rnode);
+	kmem_cache_free(zcache_rbnode_cache, rbnode);
+}
+
+/*
+ * Check whether the radix-tree of this rbnode is empty.
+ * If that's true, then we can delete this zcache_rbnode from
+ * zcache_pool->rbtree
+ *
+ * Caller must hold zcache_rbnode->ra_lock
+ */
+static int zcache_rbnode_empty(struct zcache_rbnode *rbnode)
+{
+	return rbnode->ratree.rnode == NULL;
+}
+
+/*
+ * Remove zcache_rbnode from zpool->rbtree
+ *
+ * holded_rblock - whether the caller has holded zpool->rb_lock
+ */
+static void zcache_rbnode_isolate(struct zcache_pool *zpool,
+		struct zcache_rbnode *rbnode, bool holded_rblock)
+{
+	unsigned long flags;
+
+	if (!holded_rblock)
+		write_lock_irqsave(&zpool->rb_lock, flags);
+	/*
+	 * Someone can get reference on this rbnode before we could
+	 * acquire write lock above.
+	 * We want to remove it from zpool->rbtree when only the caller and
+	 * corresponding ratree holds a reference to this rbnode.
+	 * Below check ensures that a racing zcache put will not end up adding
+	 * a page to an isolated node and thereby losing that memory.
+	 */
+	if (atomic_read(&rbnode->refcount.refcount) == 2) {
+		rb_erase(&rbnode->rb_node, &zpool->rbtree);
+		RB_CLEAR_NODE(&rbnode->rb_node);
+		kref_put(&rbnode->refcount, zcache_rbnode_release);
+	}
+	if (!holded_rblock)
+		write_unlock_irqrestore(&zpool->rb_lock, flags);
+}
+
+/*
+ * Store zaddr which allocated by zbud_alloc() to the hierarchy rbtree-ratree.
+ */
+static int zcache_store_zaddr(struct zcache_pool *zpool,
+		struct zcache_ra_handle *zhandle, unsigned long zaddr)
+{
+	unsigned long flags;
+	struct zcache_rbnode *rbnode, *tmp;
+	struct rb_node **link = NULL, *parent = NULL;
+	int ret;
+	void *dup_zaddr;
+
+	rbnode = zcache_find_get_rbnode(zpool, zhandle->rb_index);
+	if (!rbnode) {
+		/* alloc and init a new rbnode */
+		rbnode = kmem_cache_alloc(zcache_rbnode_cache, GFP_KERNEL);
+		if (!rbnode)
+			return -ENOMEM;
+
+		INIT_RADIX_TREE(&rbnode->ratree, GFP_ATOMIC|__GFP_NOWARN);
+		spin_lock_init(&rbnode->ra_lock);
+		rbnode->rb_index = zhandle->rb_index;
+		kref_init(&rbnode->refcount);
+		RB_CLEAR_NODE(&rbnode->rb_node);
+
+		/* add that rbnode to rbtree */
+		write_lock_irqsave(&zpool->rb_lock, flags);
+		tmp = zcache_find_rbnode(&zpool->rbtree, zhandle->rb_index,
+				&parent, &link);
+		if (tmp) {
+			/* somebody else allocated new rbnode */
+			kmem_cache_free(zcache_rbnode_cache, rbnode);
+			rbnode = tmp;
+		} else {
+			rb_link_node(&rbnode->rb_node, parent, link);
+			rb_insert_color(&rbnode->rb_node, &zpool->rbtree);
+		}
+
+		/* Inc the reference of this zcache_rbnode */
+		kref_get(&rbnode->refcount);
+		write_unlock_irqrestore(&zpool->rb_lock, flags);
+	}
+
+	/* Succfully got a zcache_rbnode when arriving here */
+	spin_lock_irqsave(&rbnode->ra_lock, flags);
+	dup_zaddr = radix_tree_delete(&rbnode->ratree, zhandle->ra_index);
+	if (unlikely(dup_zaddr)) {
+		WARN_ON("duplicated, will be replaced!\n");
+		zbud_free(zpool->pool, (unsigned long)dup_zaddr);
+		atomic_dec(&zcache_stored_pages);
+		zcache_pool_pages = zbud_get_pool_size(zpool->pool);
+		zcache_dup_entry++;
+	}
+
+	/* Insert zcache_ra_handle to ratree */
+	ret = radix_tree_insert(&rbnode->ratree, zhandle->ra_index,
+				(void *)zaddr);
+	if (unlikely(ret))
+		if (zcache_rbnode_empty(rbnode))
+			zcache_rbnode_isolate(zpool, rbnode, 0);
+	spin_unlock_irqrestore(&rbnode->ra_lock, flags);
+
+	kref_put(&rbnode->refcount, zcache_rbnode_release);
+	return ret;
+}
+
+/*
+ * Load zaddr and delete it from radix tree.
+ * If the radix tree of the corresponding rbnode is empty, delete the rbnode
+ * from zpool->rbtree also.
+ */
+static void *zcache_load_delete_zaddr(struct zcache_pool *zpool,
+				int rb_index, int ra_index)
+{
+	struct zcache_rbnode *rbnode;
+	void *zaddr = NULL;
+	unsigned long flags;
+
+	rbnode = zcache_find_get_rbnode(zpool, rb_index);
+	if (!rbnode)
+		goto out;
+
+	BUG_ON(rbnode->rb_index != rb_index);
+
+	spin_lock_irqsave(&rbnode->ra_lock, flags);
+	zaddr = radix_tree_delete(&rbnode->ratree, ra_index);
+	if (zcache_rbnode_empty(rbnode))
+		zcache_rbnode_isolate(zpool, rbnode, 0);
+	spin_unlock_irqrestore(&rbnode->ra_lock, flags);
+
+	kref_put(&rbnode->refcount, zcache_rbnode_release);
+out:
+	return zaddr;
+}
+
+static void zcache_store_page(int pool_id, struct cleancache_filekey key,
+		pgoff_t index, struct page *page)
+{
+	struct zcache_ra_handle *zhandle;
+	u8 *zpage, *src, *dst;
+	unsigned long zaddr; /* Address of zhandle + compressed data(zpage) */
+	unsigned int zlen = PAGE_SIZE;
+	int ret;
+
+	struct zcache_pool *zpool = zcache.pools[pool_id];
+
+	if (zcache_is_full()) {
+		zcache_pool_limit_hit++;
+		return;
+	}
+
+	/* compress */
+	dst = get_cpu_var(zcache_dstmem);
+	src = kmap_atomic(page);
+	ret = zcache_comp_op(ZCACHE_COMPOP_COMPRESS, src, PAGE_SIZE, dst,
+			&zlen);
+	kunmap_atomic(src);
+	if (ret) {
+		pr_err("zcache compress error ret %d\n", ret);
+		put_cpu_var(zcache_dstmem);
+		return;
+	}
+
+	/* store zcache handle together with compressed page data */
+	ret = zbud_alloc(zpool->pool, zlen + sizeof(struct zcache_ra_handle),
+			__GFP_NORETRY | __GFP_NOWARN, &zaddr);
+	if (ret) {
+		zcache_zbud_alloc_fail++;
+		put_cpu_var(zcache_dstmem);
+		return;
+	}
+
+	zhandle = (struct zcache_ra_handle *)zbud_map(zpool->pool, zaddr);
+	zhandle->ra_index = index;
+	zhandle->rb_index = key.u.ino;
+	zhandle->zlen = zlen;
+	/* Compressed page data stored at the end of zcache_ra_handle */
+	zpage = (u8 *)(zhandle + 1);
+	memcpy(zpage, dst, zlen);
+	zbud_unmap(zpool->pool, zaddr);
+	put_cpu_var(zcache_dstmem);
+
+	/* store zcache handle */
+	ret = zcache_store_zaddr(zpool, zhandle, zaddr);
+	if (ret) {
+		pr_err("%s: store handle error %d\n", __func__, ret);
+		zbud_free(zpool->pool, zaddr);
+	}
+
+	/* update stats */
+	atomic_inc(&zcache_stored_pages);
+	zcache_pool_pages = zbud_get_pool_size(zpool->pool);
+	return;
+}
+
+static int zcache_load_page(int pool_id, struct cleancache_filekey key,
+		pgoff_t index, struct page *page)
+{
+	int ret;
+	u8 *src, *dst;
+	void *zaddr;
+	unsigned int dlen = PAGE_SIZE;
+	struct zcache_ra_handle *zhandle;
+	struct zcache_pool *zpool = zcache.pools[pool_id];
+
+	zaddr = zcache_load_delete_zaddr(zpool, key.u.ino, index);
+	if (!zaddr)
+		return -1;
+
+	zhandle = (struct zcache_ra_handle *)zbud_map(zpool->pool,
+			(unsigned long)zaddr);
+	/* Compressed page data stored at the end of zcache_ra_handle */
+	src = (u8 *)(zhandle + 1);
+
+	/* decompress */
+	dst = kmap_atomic(page);
+	ret = zcache_comp_op(ZCACHE_COMPOP_DECOMPRESS, src, zhandle->zlen, dst,
+			&dlen);
+	kunmap_atomic(dst);
+	zbud_unmap(zpool->pool, (unsigned long)zaddr);
+	zbud_free(zpool->pool, (unsigned long)zaddr);
+
+	BUG_ON(ret);
+	BUG_ON(dlen != PAGE_SIZE);
+
+	/* update stats */
+	atomic_dec(&zcache_stored_pages);
+	zcache_pool_pages = zbud_get_pool_size(zpool->pool);
+	return ret;
+}
+
+static void zcache_flush_page(int pool_id, struct cleancache_filekey key,
+		pgoff_t index)
+{
+	struct zcache_pool *zpool = zcache.pools[pool_id];
+	void *zaddr = NULL;
+
+	zaddr = zcache_load_delete_zaddr(zpool, key.u.ino, index);
+	if (zaddr) {
+		zbud_free(zpool->pool, (unsigned long)zaddr);
+		atomic_dec(&zcache_stored_pages);
+		zcache_pool_pages = zbud_get_pool_size(zpool->pool);
+	}
+}
+
+#define FREE_BATCH 16
+/*
+ * Callers must hold the lock
+ */
+static void zcache_flush_ratree(struct zcache_pool *zpool,
+		struct zcache_rbnode *rbnode)
+{
+	unsigned long index = 0;
+	int count, i;
+	struct zcache_ra_handle *zhandle;
+
+	do {
+		void *zaddrs[FREE_BATCH];
+
+		count = radix_tree_gang_lookup(&rbnode->ratree, (void **)zaddrs,
+				index, FREE_BATCH);
+
+		for (i = 0; i < count; i++) {
+			zhandle = (struct zcache_ra_handle *)zbud_map(
+					zpool->pool, (unsigned long)zaddrs[i]);
+			index = zhandle->ra_index;
+			radix_tree_delete(&rbnode->ratree, index);
+			zbud_unmap(zpool->pool, (unsigned long)zaddrs[i]);
+			zbud_free(zpool->pool, (unsigned long)zaddrs[i]);
+			atomic_dec(&zcache_stored_pages);
+			zcache_pool_pages = zbud_get_pool_size(zpool->pool);
+		}
+
+		index++;
+	} while (count == FREE_BATCH);
+}
+
+static void zcache_flush_inode(int pool_id, struct cleancache_filekey key)
+{
+	struct zcache_rbnode *rbnode;
+	unsigned long flags1, flags2;
+	struct zcache_pool *zpool = zcache.pools[pool_id];
+
+	/*
+	 * Refuse new pages added in to the same rbinode, so get rb_lock at
+	 * first.
+	 */
+	write_lock_irqsave(&zpool->rb_lock, flags1);
+	rbnode = zcache_find_rbnode(&zpool->rbtree, key.u.ino, 0, 0);
+	if (!rbnode) {
+		write_unlock_irqrestore(&zpool->rb_lock, flags1);
+		return;
+	}
+
+	kref_get(&rbnode->refcount);
+	spin_lock_irqsave(&rbnode->ra_lock, flags2);
+
+	zcache_flush_ratree(zpool, rbnode);
+	if (zcache_rbnode_empty(rbnode))
+		/* When arrvied here, we already hold rb_lock */
+		zcache_rbnode_isolate(zpool, rbnode, 1);
+
+	spin_unlock_irqrestore(&rbnode->ra_lock, flags2);
+	write_unlock_irqrestore(&zpool->rb_lock, flags1);
+	kref_put(&rbnode->refcount, zcache_rbnode_release);
+}
+
+static void zcache_destroy_pool(struct zcache_pool *zpool);
+static void zcache_flush_fs(int pool_id)
+{
+	struct zcache_rbnode *z_rbnode = NULL;
+	struct rb_node *rbnode;
+	unsigned long flags1, flags2;
+	struct zcache_pool *zpool;
+
+	if (pool_id < 0)
+		return;
+
+	zpool = zcache.pools[pool_id];
+	if (!zpool)
+		return;
+
+	/*
+	 * Refuse new pages added in, so get rb_lock at first.
+	 */
+	write_lock_irqsave(&zpool->rb_lock, flags1);
+
+	rbnode = rb_first(&zpool->rbtree);
+	while (rbnode) {
+		z_rbnode = rb_entry(rbnode, struct zcache_rbnode, rb_node);
+		rbnode = rb_next(rbnode);
+		if (z_rbnode) {
+			kref_get(&z_rbnode->refcount);
+			spin_lock_irqsave(&z_rbnode->ra_lock, flags2);
+			zcache_flush_ratree(zpool, z_rbnode);
+			if (zcache_rbnode_empty(z_rbnode))
+				zcache_rbnode_isolate(zpool, z_rbnode, 1);
+			spin_unlock_irqrestore(&z_rbnode->ra_lock, flags2);
+			kref_put(&z_rbnode->refcount, zcache_rbnode_release);
+		}
+	}
+
+	write_unlock_irqrestore(&zpool->rb_lock, flags1);
+	zcache_destroy_pool(zpool);
+}
+
+/*
+ * Evict pages from zcache pool on an LRU basis after the compressed pool is
+ * full.
+ */
+static int zcache_evict_entry(struct zbud_pool *pool, unsigned long zaddr)
+{
+	return -1;
+}
+
+static struct zbud_ops zcache_zbud_ops = {
+	.evict = zcache_evict_entry
+};
+
+/* Return pool id */
+static int zcache_create_pool(void)
+{
+	int ret;
+	struct zcache_pool *zpool;
+
+	zpool = kzalloc(sizeof(*zpool), GFP_KERNEL);
+	if (!zpool) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	zpool->pool = zbud_create_pool(GFP_KERNEL, &zcache_zbud_ops);
+	if (!zpool->pool) {
+		kfree(zpool);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	spin_lock(&zcache.pool_lock);
+	if (zcache.num_pools == MAX_ZCACHE_POOLS) {
+		pr_err("Cannot create new pool (limit:%u)\n", MAX_ZCACHE_POOLS);
+		zbud_destroy_pool(zpool->pool);
+		kfree(zpool);
+		ret = -EPERM;
+		goto out_unlock;
+	}
+
+	rwlock_init(&zpool->rb_lock);
+	zpool->rbtree = RB_ROOT;
+	/* Add to pool list */
+	for (ret = 0; ret < MAX_ZCACHE_POOLS; ret++)
+		if (!zcache.pools[ret])
+			break;
+	zcache.pools[ret] = zpool;
+	zcache.num_pools++;
+	pr_info("New pool created id:%d\n", ret);
+
+out_unlock:
+	spin_unlock(&zcache.pool_lock);
+out:
+	return ret;
+}
+
+static void zcache_destroy_pool(struct zcache_pool *zpool)
+{
+	int i;
+
+	if (!zpool)
+		return;
+
+	spin_lock(&zcache.pool_lock);
+	zcache.num_pools--;
+	for (i = 0; i < MAX_ZCACHE_POOLS; i++)
+		if (zcache.pools[i] == zpool)
+			break;
+	zcache.pools[i] = NULL;
+	spin_unlock(&zcache.pool_lock);
+
+	if (!RB_EMPTY_ROOT(&zpool->rbtree))
+		WARN_ON("Memory leak detected. Freeing non-empty pool!\n");
+
+	zbud_destroy_pool(zpool->pool);
+	kfree(zpool);
+}
+
+static int zcache_init_fs(size_t pagesize)
+{
+	int ret;
+
+	if (pagesize != PAGE_SIZE) {
+		pr_info("Unsupported page size: %zu", pagesize);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = zcache_create_pool();
+	if (ret < 0) {
+		pr_info("Failed to create new pool\n");
+		ret = -ENOMEM;
+		goto out;
+	}
+out:
+	return ret;
+}
+
+static int zcache_init_shared_fs(char *uuid, size_t pagesize)
+{
+	/* shared pools are unsupported and map to private */
+	return zcache_init_fs(pagesize);
+}
+
+static struct cleancache_ops zcache_ops = {
+	.put_page = zcache_store_page,
+	.get_page = zcache_load_page,
+	.invalidate_page = zcache_flush_page,
+	.invalidate_inode = zcache_flush_inode,
+	.invalidate_fs = zcache_flush_fs,
+	.init_shared_fs = zcache_init_shared_fs,
+	.init_fs = zcache_init_fs
+};
+
+/*
+ * Debugfs functions
+ */
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+static struct dentry *zcache_debugfs_root;
+
+static int __init zcache_debugfs_init(void)
+{
+	if (!debugfs_initialized())
+		return -ENODEV;
+
+	zcache_debugfs_root = debugfs_create_dir("zcache", NULL);
+	if (!zcache_debugfs_root)
+		return -ENOMEM;
+
+	debugfs_create_u64("pool_limit_hit", S_IRUGO, zcache_debugfs_root,
+			&zcache_pool_limit_hit);
+	debugfs_create_u64("reject_alloc_fail", S_IRUGO, zcache_debugfs_root,
+			&zcache_zbud_alloc_fail);
+	debugfs_create_u64("duplicate_entry", S_IRUGO, zcache_debugfs_root,
+			&zcache_dup_entry);
+	debugfs_create_u64("pool_pages", S_IRUGO, zcache_debugfs_root,
+			&zcache_pool_pages);
+	debugfs_create_atomic_t("stored_pages", S_IRUGO, zcache_debugfs_root,
+			&zcache_stored_pages);
+	return 0;
+}
+
+static void __exit zcache_debugfs_exit(void)
+{
+	debugfs_remove_recursive(zcache_debugfs_root);
+}
+#else
+static int __init zcache_debugfs_init(void)
+{
+	return 0;
+}
+static void __exit zcache_debugfs_exit(void)
+{
+}
+#endif
+
+/*
+ * zcache init and exit
+ */
+static int __init init_zcache(void)
+{
+	if (!zcache_enabled)
+		return 0;
+
+	pr_info("loading zcache..\n");
+	if (zcache_rbnode_cache_create()) {
+		pr_err("entry cache creation failed\n");
+		goto error;
+	}
+
+	if (zcache_comp_init()) {
+		pr_err("compressor initialization failed\n");
+		goto compfail;
+	}
+	if (zcache_cpu_init()) {
+		pr_err("per-cpu initialization failed\n");
+		goto pcpufail;
+	}
+
+	spin_lock_init(&zcache.pool_lock);
+	cleancache_register_ops(&zcache_ops);
+
+	if (zcache_debugfs_init())
+		pr_warn("debugfs initialization failed\n");
+	return 0;
+pcpufail:
+	zcache_comp_exit();
+compfail:
+	zcache_rbnode_cache_destory();
+error:
+	return -ENOMEM;
+}
+
+/* must be late so crypto has time to come up */
+late_initcall(init_zcache);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Bob Liu <bob.liu@oracle.com>");
+MODULE_DESCRIPTION("Compressed cache for clean file pages");
+
--
1.7.10.4
* [PATCH v2 2/4] zcache: staging: %s/ZCACHE/ZCACHE_OLD 2013-08-06 11:36 [PATCH v2 0/4] zcache: a compressed file page cache Bob Liu 2013-08-06 11:36 ` [PATCH v2 1/4] mm: zcache: add core files Bob Liu @ 2013-08-06 11:36 ` Bob Liu 2013-08-06 11:36 ` [PATCH v2 3/4] mm: zcache: add evict zpages supporting Bob Liu ` (3 subsequent siblings) 5 siblings, 0 replies; 16+ messages in thread From: Bob Liu @ 2013-08-06 11:36 UTC (permalink / raw) To: linux-mm Cc: gregkh, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman, kyungmin.park, p.sarna, barry.song, penberg, Bob Liu If nobody is using it, I'll drop it from staging. The staging zcache is then effectively split into zswap and zcache in mm/, and the two can be merged again in the future if there is a need. Signed-off-by: Bob Liu <bob.liu@oracle.com> --- drivers/staging/zcache/Kconfig | 12 ++++++------ drivers/staging/zcache/Makefile | 4 ++-- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/staging/zcache/Kconfig b/drivers/staging/zcache/Kconfig index 2d7b2da..f96fb12 100644 --- a/drivers/staging/zcache/Kconfig +++ b/drivers/staging/zcache/Kconfig @@ -1,4 +1,4 @@ -config ZCACHE +config ZCACHE_OLD tristate "Dynamic compression of swap pages and clean pagecache pages" depends on CRYPTO=y && SWAP=y && CLEANCACHE && FRONTSWAP select CRYPTO_LZO @@ -10,9 +10,9 @@ config ZCACHE memory to store clean page cache pages and swap in RAM, providing a noticeable reduction in disk I/O.
-config ZCACHE_DEBUG +config ZCACHE_OLD_DEBUG bool "Enable debug statistics" - depends on DEBUG_FS && ZCACHE + depends on DEBUG_FS && ZCACHE_OLD default n help This is used to provide an debugfs directory with counters of @@ -20,7 +20,7 @@ config ZCACHE_DEBUG config RAMSTER tristate "Cross-machine RAM capacity sharing, aka peer-to-peer tmem" - depends on CONFIGFS_FS=y && SYSFS=y && !HIGHMEM && ZCACHE + depends on CONFIGFS_FS=y && SYSFS=y && !HIGHMEM && ZCACHE_OLD depends on NET # must ensure struct page is 8-byte aligned select HAVE_ALIGNED_STRUCT_PAGE if !64BIT @@ -45,9 +45,9 @@ config RAMSTER_DEBUG # __add_to_swap_cache, and implement __swap_writepage (which is swap_writepage # without the frontswap call. When these are in-tree, the dependency on # BROKEN can be removed -config ZCACHE_WRITEBACK +config ZCACHE_OLD_WRITEBACK bool "Allow compressed swap pages to be writtenback to swap disk" - depends on ZCACHE=y && BROKEN + depends on ZCACHE_OLD=y && BROKEN default n help Zcache caches compressed swap pages (and other data) in RAM which diff --git a/drivers/staging/zcache/Makefile b/drivers/staging/zcache/Makefile index 845a5c2..34d27bd 100644 --- a/drivers/staging/zcache/Makefile +++ b/drivers/staging/zcache/Makefile @@ -1,8 +1,8 @@ zcache-y := zcache-main.o tmem.o zbud.o -zcache-$(CONFIG_ZCACHE_DEBUG) += debug.o +zcache-$(CONFIG_ZCACHE_OLD_DEBUG) += debug.o zcache-$(CONFIG_RAMSTER_DEBUG) += ramster/debug.o zcache-$(CONFIG_RAMSTER) += ramster/ramster.o ramster/r2net.o zcache-$(CONFIG_RAMSTER) += ramster/nodemanager.o ramster/tcp.o zcache-$(CONFIG_RAMSTER) += ramster/heartbeat.o ramster/masklog.o -obj-$(CONFIG_ZCACHE) += zcache.o +obj-$(CONFIG_ZCACHE_OLD) += zcache.o -- 1.7.10.4
* [PATCH v2 3/4] mm: zcache: add evict zpages supporting 2013-08-06 11:36 [PATCH v2 0/4] zcache: a compressed file page cache Bob Liu 2013-08-06 11:36 ` [PATCH v2 1/4] mm: zcache: add core files Bob Liu 2013-08-06 11:36 ` [PATCH v2 2/4] zcache: staging: %s/ZCACHE/ZCACHE_OLD Bob Liu @ 2013-08-06 11:36 ` Bob Liu 2013-08-06 11:36 ` [PATCH v2 4/4] mm: add WasActive page flag Bob Liu ` (2 subsequent siblings) 5 siblings, 0 replies; 16+ messages in thread From: Bob Liu @ 2013-08-06 11:36 UTC (permalink / raw) To: linux-mm Cc: gregkh, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman, kyungmin.park, p.sarna, barry.song, penberg, Bob Liu Implement zbud_ops->evict so that compressed zpages can be evicted from the zbud memory pool when the compressed pool is full. zbud already manages the compressed pool on an LRU basis. Eviction is implemented simply by dropping the compressed file page data; if that data is needed again, it has to be reread from disk, so no disk read is saved for it. Signed-off-by: Bob Liu <bob.liu@oracle.com> --- mm/zcache.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 47 insertions(+), 6 deletions(-) diff --git a/mm/zcache.c b/mm/zcache.c index ec1a0eb..8c3222e 100644 --- a/mm/zcache.c +++ b/mm/zcache.c @@ -65,6 +65,9 @@ static u64 zcache_pool_limit_hit; static u64 zcache_dup_entry; static u64 zcache_zbud_alloc_fail; static u64 zcache_pool_pages; +static u64 zcache_evict_zpages; +static u64 zcache_evict_filepages; +static u64 zcache_reclaim_fail; static atomic_t zcache_stored_pages = ATOMIC_INIT(0); /* @@ -129,6 +132,7 @@ struct zcache_ra_handle { int rb_index; /* Redblack tree index */ int ra_index; /* Radix tree index */ int zlen; /* Compressed page size */ + struct zcache_pool *zpool; /* Finding zcache_pool during evict */ }; static struct kmem_cache *zcache_rbnode_cache; @@ -493,7 +497,16 @@ static void zcache_store_page(int pool_id, struct cleancache_filekey key, if (zcache_is_full()) { zcache_pool_limit_hit++; - return; + 
if (zbud_reclaim_page(zpool->pool, 8)) { + zcache_reclaim_fail++; + return; + } else { + /* + * Continue if a page frame was reclaimed successfully. + */ + zcache_evict_filepages++; + zcache_pool_pages = zbud_get_pool_size(zpool->pool); + } } /* compress */ @@ -521,6 +534,8 @@ static void zcache_store_page(int pool_id, struct cleancache_filekey key, zhandle->ra_index = index; zhandle->rb_index = key.u.ino; zhandle->zlen = zlen; + zhandle->zpool = zpool; + /* Compressed page data stored at the end of zcache_ra_handle */ zpage = (u8 *)(zhandle + 1); memcpy(zpage, dst, zlen); @@ -692,16 +707,36 @@ static void zcache_flush_fs(int pool_id) } /* - * Evict pages from zcache pool on an LRU basis after the compressed pool is - * full. + * Evict compressed pages from zcache pool on an LRU basis after the compressed + * pool is full. */ -static int zcache_evict_entry(struct zbud_pool *pool, unsigned long zaddr) +static int zcache_evict_zpage(struct zbud_pool *pool, unsigned long zaddr) { - return -1; + struct zcache_pool *zpool; + struct zcache_ra_handle *zhandle; + void *zaddr_intree; + + zhandle = (struct zcache_ra_handle *)zbud_map(pool, zaddr); + + zpool = zhandle->zpool; + BUG_ON(!zpool); + BUG_ON(pool != zpool->pool); + + zaddr_intree = zcache_load_delete_zaddr(zpool, zhandle->rb_index, + zhandle->ra_index); + if (zaddr_intree) { + BUG_ON((unsigned long)zaddr_intree != zaddr); + zbud_unmap(pool, zaddr); + zbud_free(pool, zaddr); + atomic_dec(&zcache_stored_pages); + zcache_pool_pages = zbud_get_pool_size(pool); + zcache_evict_zpages++; + } + return 0; } static struct zbud_ops zcache_zbud_ops = { - .evict = zcache_evict_entry + .evict = zcache_evict_zpage }; /* Return pool id */ @@ -832,6 +867,12 @@ static int __init zcache_debugfs_init(void) &zcache_pool_pages); debugfs_create_atomic_t("stored_pages", S_IRUGO, zcache_debugfs_root, &zcache_stored_pages); + debugfs_create_u64("evicted_zpages", S_IRUGO, zcache_debugfs_root, + &zcache_evict_zpages); + 
debugfs_create_u64("evicted_filepages", S_IRUGO, zcache_debugfs_root, + &zcache_evict_filepages); + debugfs_create_u64("reclaim_fail", S_IRUGO, zcache_debugfs_root, + &zcache_reclaim_fail); return 0; } -- 1.7.10.4
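Patch 3/4 makes eviction a plain drop: when the pool is full, the store path asks zbud to reclaim a page frame, and the evict callback deletes the oldest compressed copy, since clean file data can always be reread from disk. A minimal userspace sketch of that policy follows. It assumes a hypothetical fixed-size `toy_pool` in place of the zbud pool and a logical clock in place of zbud's LRU list; none of these names come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the zbud pool: a few slots plus a logical
 * clock that orders entries from oldest to newest (zbud keeps an LRU list). */
#define POOL_SLOTS 4

struct toy_entry {
	bool used;
	unsigned long inode;   /* plays the role of zhandle->rb_index */
	unsigned long index;   /* plays the role of zhandle->ra_index */
	unsigned long stamp;   /* when the entry was stored */
};

struct toy_pool {
	struct toy_entry slot[POOL_SLOTS];
	unsigned long clock;
	unsigned long evicted; /* cf. the evicted_zpages debugfs counter */
};

static int toy_find_free(const struct toy_pool *p)
{
	for (int i = 0; i < POOL_SLOTS; i++)
		if (!p->slot[i].used)
			return i;
	return -1;
}

/* Evict the oldest entry by dropping it. Nothing is written back: the
 * data is a clean file page, so a later miss just rereads it from disk. */
static bool toy_evict_oldest(struct toy_pool *p)
{
	int victim = -1;

	for (int i = 0; i < POOL_SLOTS; i++) {
		if (!p->slot[i].used)
			continue;
		if (victim < 0 || p->slot[i].stamp < p->slot[victim].stamp)
			victim = i;
	}
	if (victim < 0)
		return false;  /* nothing to reclaim */
	p->slot[victim].used = false;
	p->evicted++;
	return true;
}

/* Store path: when the pool is full, reclaim one slot instead of refusing. */
static bool toy_store(struct toy_pool *p, unsigned long inode,
		      unsigned long index)
{
	int i = toy_find_free(p);

	if (i < 0) {
		if (!toy_evict_oldest(p))
			return false;
		i = toy_find_free(p);
	}
	p->slot[i] = (struct toy_entry){ true, inode, index, ++p->clock };
	return true;
}

/* Load path: a hit removes the entry, matching the stored_pages
 * decrement in zcache_load_page(). */
static bool toy_load(struct toy_pool *p, unsigned long inode,
		     unsigned long index)
{
	for (int i = 0; i < POOL_SLOTS; i++) {
		struct toy_entry *e = &p->slot[i];

		if (e->used && e->inode == inode && e->index == index) {
			e->used = false;
			return true;
		}
	}
	return false;
}
```

In the kernel version the reclaim itself can fail (zbud_reclaim_page() gives up after the retry count passed in, 8 here), which the patch counts in reclaim_fail; the toy pool can only fail to reclaim when it is empty.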
* [PATCH v2 4/4] mm: add WasActive page flag 2013-08-06 11:36 [PATCH v2 0/4] zcache: a compressed file page cache Bob Liu ` (2 preceding siblings ...) 2013-08-06 11:36 ` [PATCH v2 3/4] mm: zcache: add evict zpages supporting Bob Liu @ 2013-08-06 11:36 ` Bob Liu 2013-08-13 6:01 ` Pekka Enberg 2013-08-06 13:58 ` [PATCH v2 0/4] zcache: a compressed file page cache Greg KH 2013-08-09 8:03 ` Bob Liu 5 siblings, 1 reply; 16+ messages in thread From: Bob Liu @ 2013-08-06 11:36 UTC (permalink / raw) To: linux-mm Cc: gregkh, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman, kyungmin.park, p.sarna, barry.song, penberg, Bob Liu Zcache can be ineffective if the compressed memory pool fills up with compressed inactive file pages, most of which will never be used again. So pick pages only from the active file list, since those pages are more likely to be accessed again. Keeping them compressed in memory can reduce the latency significantly compared with rereading them from disk. When a file page is moved from the active file list to the inactive file list, its PageActive flag is cleared. So add an extra WasActive page flag that lets zcache know whether the file page was moved from the active list.
Signed-off-by: Bob Liu <bob.liu@oracle.com> --- include/linux/page-flags.h | 9 ++++++++- mm/page_alloc.c | 3 +++ mm/vmscan.c | 11 ++++++++++- mm/zcache.c | 15 +++++++++++++++ 4 files changed, 36 insertions(+), 2 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 6d53675..ab433916 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -109,6 +109,9 @@ enum pageflags { #ifdef CONFIG_TRANSPARENT_HUGEPAGE PG_compound_lock, #endif +#ifdef CONFIG_CLEANCACHE + PG_was_active, +#endif __NR_PAGEFLAGS, /* Filesystems */ @@ -210,6 +213,9 @@ PAGEFLAG(Reserved, reserved) __CLEARPAGEFLAG(Reserved, reserved) PAGEFLAG(SwapBacked, swapbacked) __CLEARPAGEFLAG(SwapBacked, swapbacked) __PAGEFLAG(SlobFree, slob_free) +#ifdef CONFIG_CLEANCACHE +PAGEFLAG(WasActive, was_active) +#endif /* * Private page markings that may be used by the filesystem that owns the page @@ -509,7 +515,8 @@ static inline void ClearPageSlabPfmemalloc(struct page *page) * Pages being prepped should not have any flags set. It they are set, * there has been a kernel bug or struct page corruption. 
*/ -#define PAGE_FLAGS_CHECK_AT_PREP ((1 << NR_PAGEFLAGS) - 1) +#define PAGE_FLAGS_CHECK_AT_PREP (((1 << NR_PAGEFLAGS) - 1) |\ + (1 << PG_was_active)) #define PAGE_FLAGS_PRIVATE \ (1 << PG_private | 1 << PG_private_2) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b100255..9505ced 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6345,6 +6345,9 @@ static const struct trace_print_flags pageflag_names[] = { #ifdef CONFIG_TRANSPARENT_HUGEPAGE {1UL << PG_compound_lock, "compound_lock" }, #endif +#ifdef CONFIG_CLEANCACHE + {1UL << PG_was_active, "was_active" }, +#endif }; static void dump_page_flags(unsigned long flags) diff --git a/mm/vmscan.c b/mm/vmscan.c index 2cff0d4..674f33f 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1325,8 +1325,11 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list) lru = page_lru(page); add_page_to_lru_list(page, lruvec, lru); + int file = is_file_lru(lru); + if (IS_ENABLED(CONFIG_ZCACHE)) + if (file) + SetPageWasActive(page); if (is_active_lru(lru)) { - int file = is_file_lru(lru); int numpages = hpage_nr_pages(page); reclaim_stat->recent_rotated[file] += numpages; } @@ -1632,6 +1635,12 @@ static void shrink_active_list(unsigned long nr_to_scan, } ClearPageActive(page); /* we are de-activating */ + if (IS_ENABLED(CONFIG_ZCACHE)) + /* + * For zcache to know whether the page is from active + * file list + */ + SetPageWasActive(page); list_add(&page->lru, &l_inactive); } diff --git a/mm/zcache.c b/mm/zcache.c index 8c3222e..97ca274 100644 --- a/mm/zcache.c +++ b/mm/zcache.c @@ -67,6 +67,7 @@ static u64 zcache_zbud_alloc_fail; static u64 zcache_pool_pages; static u64 zcache_evict_zpages; static u64 zcache_evict_filepages; +static u64 zcache_inactive_pages_refused; static u64 zcache_reclaim_fail; static atomic_t zcache_stored_pages = ATOMIC_INIT(0); @@ -495,6 +496,17 @@ static void zcache_store_page(int pool_id, struct cleancache_filekey key, struct zcache_pool *zpool = zcache.pools[pool_id]; + /* + * Zcache 
will be ineffective if the compressed memory pool is full with + * compressed inactive file pages and most of them will never be used + * again. + * So we refuse to compress pages that are not from active file list. + */ + if (!PageWasActive(page)) { + zcache_inactive_pages_refused++; + return; + } + if (zcache_is_full()) { zcache_pool_limit_hit++; if (zbud_reclaim_page(zpool->pool, 8)) { @@ -588,6 +600,7 @@ static int zcache_load_page(int pool_id, struct cleancache_filekey key, /* update stats */ atomic_dec(&zcache_stored_pages); zcache_pool_pages = zbud_get_pool_size(zpool->pool); + SetPageWasActive(page); return ret; } @@ -873,6 +886,8 @@ static int __init zcache_debugfs_init(void) &zcache_evict_filepages); debugfs_create_u64("reclaim_fail", S_IRUGO, zcache_debugfs_root, &zcache_reclaim_fail); + debugfs_create_u64("inactive_pages_refused", S_IRUGO, + zcache_debugfs_root, &zcache_inactive_pages_refused); return 0; } -- 1.7.10.4
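The admission filter in patch 4/4 reduces to two bit operations: demotion clears the active bit but records that it was set, and the cleancache put path refuses pages without that record. A userspace sketch, assuming a hypothetical `toy_page` with its own flag word in place of `struct page` and `page->flags`:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, standing in for PG_active and the proposed
 * PG_was_active in page->flags. */
#define TOY_PG_ACTIVE     0x1u
#define TOY_PG_WAS_ACTIVE 0x2u

struct toy_page {
	unsigned int flags;
};

static unsigned long toy_refused; /* cf. inactive_pages_refused */
static unsigned long toy_stored;

/* Demotion, as in shrink_active_list(): the active bit is lost, so the
 * history is kept in a second bit. */
static void toy_deactivate(struct toy_page *pg)
{
	pg->flags &= ~TOY_PG_ACTIVE;
	pg->flags |= TOY_PG_WAS_ACTIVE;
}

/* Put path with the filter: only pages that were once active are
 * considered worth compressing. */
static bool toy_put_page(const struct toy_page *pg)
{
	if (!(pg->flags & TOY_PG_WAS_ACTIVE)) {
		toy_refused++;
		return false;
	}
	toy_stored++; /* compression and storage would happen here */
	return true;
}
```

The cost of this design is one extra bit in page->flags, a limited resource, which is what the review in this thread pushes back on.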
* Re: [PATCH v2 4/4] mm: add WasActive page flag 2013-08-06 11:36 ` [PATCH v2 4/4] mm: add WasActive page flag Bob Liu @ 2013-08-13 6:01 ` Pekka Enberg 2013-08-13 13:50 ` Bob Liu 0 siblings, 1 reply; 16+ messages in thread From: Pekka Enberg @ 2013-08-13 6:01 UTC (permalink / raw) To: Bob Liu Cc: linux-mm, gregkh, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman, kyungmin.park, p.sarna, barry.song, penberg, Bob Liu On 8/6/13 2:36 PM, Bob Liu wrote: > Zcache could be ineffective if the compressed memory pool is full with > compressed inactive file pages and most of them will be never used again. > > So we pick up pages from active file list only, those pages would probably be > accessed again. Compress them in memory can reduce the latency significantly > compared with rereading from disk. > > When a file page is shrinked from active file list to inactive file list, > PageActive flag is also cleared. > So adding an extra WasActive page flag for zcache to know whether the file page > was shrinked from the active list. > > Signed-off-by: Bob Liu <bob.liu@oracle.com> Using a page flag for this seems like an ugly hack to me. Can we rearrange the code so that vmscan notifies zcache *before* the active page flag is cleared...? Pekka
* Re: [PATCH v2 4/4] mm: add WasActive page flag 2013-08-13 6:01 ` Pekka Enberg @ 2013-08-13 13:50 ` Bob Liu 0 siblings, 0 replies; 16+ messages in thread From: Bob Liu @ 2013-08-13 13:50 UTC (permalink / raw) To: Pekka Enberg Cc: Bob Liu, linux-mm, gregkh, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman, kyungmin.park, p.sarna, barry.song, penberg Hi Pekka, On 08/13/2013 02:01 PM, Pekka Enberg wrote: > On 8/6/13 2:36 PM, Bob Liu wrote: >> Zcache could be ineffective if the compressed memory pool is full with >> compressed inactive file pages and most of them will be never used again. >> >> So we pick up pages from active file list only, those pages would >> probably be >> accessed again. Compress them in memory can reduce the latency >> significantly >> compared with rereading from disk. >> >> When a file page is shrinked from active file list to inactive file list, >> PageActive flag is also cleared. >> So adding an extra WasActive page flag for zcache to know whether the >> file page >> was shrinked from the active list. >> >> Signed-off-by: Bob Liu <bob.liu@oracle.com> > Thank you so much for your review! > Using a page flag for this seems like an ugly hack to me. > Can we rearrange the code so that vmscan notifies zcache > *before* the active page flag is cleared...? Yep, adding a page flag is not a good idea. I'm looking at whether there is another way to notify zcache. BTW: could you please also give some feedback on the other zcache patches? > > Pekka -- Regards, -Bob
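Pekka's alternative can be pictured as a callback that vmscan invokes while the page still has its active bit, so the backend needs no page flag of its own. The sketch below is purely hypothetical: no such hook exists in these patches, and `toy_deactivate`, `register_deactivate_hook` and the bounded `toy_seen` array are invented names for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PG_ACTIVE 0x1u

struct toy_page {
	unsigned int flags;
	unsigned long id;
};

/* Hypothetical hook a backend could register with the reclaim code. */
typedef void (*toy_deactivate_hook_t)(const struct toy_page *pg);

static toy_deactivate_hook_t toy_hook;

static void register_deactivate_hook(toy_deactivate_hook_t fn)
{
	toy_hook = fn;
}

/* Demotion: notify first, while the active bit is still set, then clear it. */
static void toy_deactivate(struct toy_page *pg)
{
	if (toy_hook)
		toy_hook(pg);
	pg->flags &= ~TOY_PG_ACTIVE;
}

/* Example backend: remembers which pages were demoted from the active
 * list. The per-page state the flag used to carry now lives here. */
#define TOY_MAX_SEEN 16
static unsigned long toy_seen[TOY_MAX_SEEN];
static size_t toy_nseen;
static size_t toy_seen_while_active;

static void toy_backend_note(const struct toy_page *pg)
{
	if (pg->flags & TOY_PG_ACTIVE)
		toy_seen_while_active++;
	if (toy_nseen < TOY_MAX_SEEN)
		toy_seen[toy_nseen++] = pg->id;
}
```

Even in this toy, the bookkeeping moves into the backend (here a bounded array), so replacing the flag is not a drop-in change.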
* Re: [PATCH v2 0/4] zcache: a compressed file page cache 2013-08-06 11:36 [PATCH v2 0/4] zcache: a compressed file page cache Bob Liu ` (3 preceding siblings ...) 2013-08-06 11:36 ` [PATCH v2 4/4] mm: add WasActive page flag Bob Liu @ 2013-08-06 13:58 ` Greg KH 2013-08-06 14:24 ` Bob Liu 2013-08-09 8:03 ` Bob Liu 5 siblings, 1 reply; 16+ messages in thread From: Greg KH @ 2013-08-06 13:58 UTC (permalink / raw) To: Bob Liu Cc: linux-mm, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman, kyungmin.park, p.sarna, barry.song, penberg, Bob Liu On Tue, Aug 06, 2013 at 07:36:13PM +0800, Bob Liu wrote: > Dan Magenheimer extended zcache supporting both file pages and anonymous pages. > It's located in drivers/staging/zcache now. But the current version of zcache is > too complicated to be merged into upstream. Really? If this is so, I'll just go delete zcache now, I don't want to lug around dead code that will never be merged. greg k-h
* Re: [PATCH v2 0/4] zcache: a compressed file page cache 2013-08-06 13:58 ` [PATCH v2 0/4] zcache: a compressed file page cache Greg KH @ 2013-08-06 14:24 ` Bob Liu 2013-08-12 12:19 ` Konrad Rzeszutek Wilk 0 siblings, 1 reply; 16+ messages in thread From: Bob Liu @ 2013-08-06 14:24 UTC (permalink / raw) To: Greg KH Cc: Bob Liu, linux-mm, ngupta, akpm, konrad.wilk, sjenning, riel, mgorman, kyungmin.park, p.sarna, barry.song, penberg Hi Greg, On 08/06/2013 09:58 PM, Greg KH wrote: > On Tue, Aug 06, 2013 at 07:36:13PM +0800, Bob Liu wrote: >> Dan Magenheimer extended zcache supporting both file pages and anonymous pages. >> It's located in drivers/staging/zcache now. But the current version of zcache is >> too complicated to be merged into upstream. > > Really? If this is so, I'll just go delete zcache now, I don't want to > lug around dead code that will never be merged. > Zcache in staging has a zbud allocator which is almost the same as mm/zbud.c but with a different API, and it has a frontswap backend like mm/zswap.c. So I'd prefer to reuse mm/zbud.c and mm/zswap.c for a generic memory compression solution. In that case, zcache in staging = mm/zswap.c + mm/zcache.c + mm/zbud.c. But I'm not sure whether there are any existing users of zcache in staging; if not, I can delete zcache from staging in the next version of this mm/zcache.c series. > greg k-h -- Regards, -Bob
* Re: [PATCH v2 0/4] zcache: a compressed file page cache 2013-08-06 14:24 ` Bob Liu @ 2013-08-12 12:19 ` Konrad Rzeszutek Wilk 2013-08-12 12:25 ` Kyungmin Park ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Konrad Rzeszutek Wilk @ 2013-08-12 12:19 UTC (permalink / raw) To: Bob Liu Cc: Greg KH, Bob Liu, linux-mm, ngupta, akpm, sjenning, riel, mgorman, kyungmin.park, p.sarna, barry.song, penberg On Tue, Aug 06, 2013 at 10:24:20PM +0800, Bob Liu wrote: > Hi Greg, > > On 08/06/2013 09:58 PM, Greg KH wrote: > > On Tue, Aug 06, 2013 at 07:36:13PM +0800, Bob Liu wrote: > >> Dan Magenheimer extended zcache supporting both file pages and anonymous pages. > >> It's located in drivers/staging/zcache now. But the current version of zcache is > >> too complicated to be merged into upstream. > > > > Really? If this is so, I'll just go delete zcache now, I don't want to > > lug around dead code that will never be merged. > > > > Zcache in staging have a zbud allocation which is almost the same as > mm/zbud.c but with different API and have a frontswap backend like > mm/zswap.c. > So I'd prefer reuse mm/zbud.c and mm/zswap.c for a generic memory > compression solution. > Which means in that case, zcache in staging = mm/zswap.c + mm/zcache.c + > mm/zbud.c. > > But I'm not sure if there are any existing users of zcache in staging, > if not I can delete zcache from staging in my next version of this > mm/zcache.c series. I think the Samsung folks are using it (zcache).
* Re: [PATCH v2 0/4] zcache: a compressed file page cache 2013-08-12 12:19 ` Konrad Rzeszutek Wilk @ 2013-08-12 12:25 ` Kyungmin Park 2013-08-12 12:30 ` Wanpeng Li 2013-08-12 12:30 ` Wanpeng Li 2 siblings, 0 replies; 16+ messages in thread From: Kyungmin Park @ 2013-08-12 12:25 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: Bob Liu, Greg KH, Bob Liu, linux-mm, ngupta, akpm, sjenning, riel, mgorman, p.sarna, barry.song, penberg On Mon, Aug 12, 2013 at 9:19 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > On Tue, Aug 06, 2013 at 10:24:20PM +0800, Bob Liu wrote: >> Hi Greg, >> >> On 08/06/2013 09:58 PM, Greg KH wrote: >> > On Tue, Aug 06, 2013 at 07:36:13PM +0800, Bob Liu wrote: >> >> Dan Magenheimer extended zcache supporting both file pages and anonymous pages. >> >> It's located in drivers/staging/zcache now. But the current version of zcache is >> >> too complicated to be merged into upstream. >> > >> > Really? If this is so, I'll just go delete zcache now, I don't want to >> > lug around dead code that will never be merged. >> > >> >> Zcache in staging have a zbud allocation which is almost the same as >> mm/zbud.c but with different API and have a frontswap backend like >> mm/zswap.c. >> So I'd prefer reuse mm/zbud.c and mm/zswap.c for a generic memory >> compression solution. >> Which means in that case, zcache in staging = mm/zswap.c + mm/zcache.c + >> mm/zbud.c. >> >> But I'm not sure if there are any existing users of zcache in staging, >> if not I can delete zcache from staging in my next version of this >> mm/zcache.c series. > > I think the Samsung folks are using it (zcache). I'm not sure, but at least my team doesn't use it right now. Thank you, Kyungmin Park
* Re: [PATCH v2 0/4] zcache: a compressed file page cache 2013-08-12 12:19 ` Konrad Rzeszutek Wilk 2013-08-12 12:25 ` Kyungmin Park 2013-08-12 12:30 ` Wanpeng Li @ 2013-08-12 12:30 ` Wanpeng Li 2013-08-12 13:23 ` Konrad Rzeszutek Wilk 2 siblings, 1 reply; 16+ messages in thread From: Wanpeng Li @ 2013-08-12 12:30 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: Bob Liu, Greg KH, Bob Liu, linux-mm, ngupta, akpm, sjenning, riel, mgorman, kyungmin.park, p.sarna, barry.song, penberg On Mon, Aug 12, 2013 at 08:19:08AM -0400, Konrad Rzeszutek Wilk wrote: >On Tue, Aug 06, 2013 at 10:24:20PM +0800, Bob Liu wrote: >> Hi Greg, >> >> On 08/06/2013 09:58 PM, Greg KH wrote: >> > On Tue, Aug 06, 2013 at 07:36:13PM +0800, Bob Liu wrote: >> >> Dan Magenheimer extended zcache supporting both file pages and anonymous pages. >> >> It's located in drivers/staging/zcache now. But the current version of zcache is >> >> too complicated to be merged into upstream. >> > >> > Really? If this is so, I'll just go delete zcache now, I don't want to >> > lug around dead code that will never be merged. >> > >> >> Zcache in staging have a zbud allocation which is almost the same as >> mm/zbud.c but with different API and have a frontswap backend like >> mm/zswap.c. >> So I'd prefer reuse mm/zbud.c and mm/zswap.c for a generic memory >> compression solution. >> Which means in that case, zcache in staging = mm/zswap.c + mm/zcache.c + >> mm/zbud.c. >> >> But I'm not sure if there are any existing users of zcache in staging, >> if not I can delete zcache from staging in my next version of this >> mm/zcache.c series. > >I think the Samsung folks are using it (zcache). > Hi Konrad, Are there real users of RAMster? And is the Xen project using zcache and RAMster from the staging tree? Regards, Wanpeng Li
* Re: [PATCH v2 0/4] zcache: a compressed file page cache 2013-08-12 12:30 ` Wanpeng Li @ 2013-08-12 13:23 ` Konrad Rzeszutek Wilk 2013-08-12 22:10 ` Greg KH 0 siblings, 1 reply; 16+ messages in thread From: Konrad Rzeszutek Wilk @ 2013-08-12 13:23 UTC (permalink / raw) To: Wanpeng Li Cc: Bob Liu, Greg KH, Bob Liu, linux-mm, ngupta, akpm, sjenning, riel, mgorman, kyungmin.park, p.sarna, barry.song, penberg On Mon, Aug 12, 2013 at 08:30:02PM +0800, Wanpeng Li wrote: > On Mon, Aug 12, 2013 at 08:19:08AM -0400, Konrad Rzeszutek Wilk wrote: > >On Tue, Aug 06, 2013 at 10:24:20PM +0800, Bob Liu wrote: > >> Hi Greg, > >> > >> On 08/06/2013 09:58 PM, Greg KH wrote: > >> > On Tue, Aug 06, 2013 at 07:36:13PM +0800, Bob Liu wrote: > >> >> Dan Magenheimer extended zcache supporting both file pages and anonymous pages. > >> >> It's located in drivers/staging/zcache now. But the current version of zcache is > >> >> too complicated to be merged into upstream. > >> > > >> > Really? If this is so, I'll just go delete zcache now, I don't want to > >> > lug around dead code that will never be merged. > >> > > >> > >> Zcache in staging have a zbud allocation which is almost the same as > >> mm/zbud.c but with different API and have a frontswap backend like > >> mm/zswap.c. > >> So I'd prefer reuse mm/zbud.c and mm/zswap.c for a generic memory > >> compression solution. > >> Which means in that case, zcache in staging = mm/zswap.c + mm/zcache.c + > >> mm/zbud.c. > >> > >> But I'm not sure if there are any existing users of zcache in staging, > >> if not I can delete zcache from staging in my next version of this > >> mm/zcache.c series. > > > >I think the Samsung folks are using it (zcache). > > > > Hi Konrad, > > If there are real users using ramster? And if Xen project using zcache > and ramster in staging tree? The Xen Project has a tmem API implementation which allows the 'tmem' driver (drivers/xen/tmem.c) to use it.
The Linux tmem driver implements both the frontswap and cleancache APIs. That means if a guest is running under Xen, it gets the same benefits as if it were running on bare metal and using zswap + zcache3 (what Bob posted, which is the cleancache backend) or the old zcache2 (staging/zcache). One way to think about it is that the compression, deduplication, etc. are all hoisted into the hypervisor, while each guest pipes pages up and down using hypercalls. The Xen Project does not need to use zcache2 (staging/zcache), as it can get the same benefits from using tmem. Though if users wanted to, they could certainly bypass tmem and load either zcache2 or zswap and zcache3 (the one Bob posted). In regards to "real users using RAMster": I am surmising you are wondering whether Oracle is offering this as a supported product to customers. The answer is no at this time, as it is still in development, and we would want it to be out of that stage before Oracle supports it in its distributions. Now, "would want" and the reality of what can be done right now are a bit disjoint. I think the next step is concentrating on making zswap awesome and also getting zcache3 (the patches that Bob posted) into shape to be merged in mm. It would be fantastic if folks took a look at the patches and gave comments. Thanks! P.S. Greg, the Samsung folks are not using it, and we (Oracle) can patch our distro kernel to provide a smörgåsbord of zcache2, zswap and zcache3, even zcache1 if needed, so I think it is safe to delete staging/zcache and focus on getting zcache3 (Bob's patchset) upstream.
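Konrad's point is that cleancache and frontswap are only frontends: the same hooks can feed a compressing backend inside the guest (zcache, zswap) or the Xen tmem driver, which passes each page to the hypervisor. A userspace sketch of that dispatch, assuming a heavily reduced, hypothetical `toy_cache_ops` table in the spirit of `struct cleancache_ops` (which zcache registers via cleancache_register_ops()) and a plain counter standing in for hypercalls:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced ops table: the frontend only sees these calls. */
struct toy_cache_ops {
	bool (*put_page)(unsigned long key);
	bool (*get_page)(unsigned long key);
};

/* A tiny exclusive store: a successful get removes the key. */
#define TOY_STORE_SLOTS 8
struct toy_store {
	unsigned long keys[TOY_STORE_SLOTS];
	size_t n;
};

static bool store_put(struct toy_store *s, unsigned long key)
{
	if (s->n == TOY_STORE_SLOTS)
		return false;
	s->keys[s->n++] = key;
	return true;
}

static bool store_get(struct toy_store *s, unsigned long key)
{
	for (size_t i = 0; i < s->n; i++) {
		if (s->keys[i] == key) {
			s->keys[i] = s->keys[--s->n];
			return true;
		}
	}
	return false;
}

static struct toy_store guest_local; /* zcache-like: memory in the guest */
static struct toy_store hypervisor;  /* tmem-like: memory in the hypervisor */
static unsigned long hypercalls;     /* stands in for real hypercalls */

static bool local_put(unsigned long k) { return store_put(&guest_local, k); }
static bool local_get(unsigned long k) { return store_get(&guest_local, k); }
static bool tmem_put(unsigned long k) { hypercalls++; return store_put(&hypervisor, k); }
static bool tmem_get(unsigned long k) { hypercalls++; return store_get(&hypervisor, k); }

static const struct toy_cache_ops local_ops = { local_put, local_get };
static const struct toy_cache_ops tmem_ops = { tmem_put, tmem_get };

static const struct toy_cache_ops *backend;

static void toy_register_ops(const struct toy_cache_ops *ops)
{
	backend = ops;
}

/* Frontend hooks: callers never know which backend sits behind them. */
static bool toy_frontend_put(unsigned long key)
{
	return backend && backend->put_page(key);
}

static bool toy_frontend_get(unsigned long key)
{
	return backend && backend->get_page(key);
}
```

Swapping backends only changes which table is registered; the page-cache side stays the same, which is why a Xen guest can get the benefit through tmem without loading zcache at all.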
* Re: [PATCH v2 0/4] zcache: a compressed file page cache
2013-08-12 13:23 ` Konrad Rzeszutek Wilk
@ 2013-08-12 22:10 ` Greg KH
0 siblings, 0 replies; 16+ messages in thread
From: Greg KH @ 2013-08-12 22:10 UTC
To: Konrad Rzeszutek Wilk
Cc: Wanpeng Li, Bob Liu, Bob Liu, linux-mm, ngupta, akpm, sjenning, riel,
mgorman, kyungmin.park, p.sarna, barry.song, penberg

On Mon, Aug 12, 2013 at 09:23:10AM -0400, Konrad Rzeszutek Wilk wrote:
> Greg, since the Samsung folks are not using it, and we (Oracle) can
> patch our distro kernel to provide smorgasbord of zcache2, zswap
> and zcache3, even zcache1 if needed. I think it is safe to
> delete staging/zcache and focus on getting the zcache3 (Bob's
> patchset) upstream.

Ok, now deleted, thanks!

greg k-h
* Re: [PATCH v2 0/4] zcache: a compressed file page cache
2013-08-06 11:36 [PATCH v2 0/4] zcache: a compressed file page cache Bob Liu
` (4 preceding siblings ...)
2013-08-06 13:58 ` [PATCH v2 0/4] zcache: a compressed file page cache Greg KH
@ 2013-08-09 8:03 ` Bob Liu
5 siblings, 0 replies; 16+ messages in thread
From: Bob Liu @ 2013-08-09 8:03 UTC
To: Linux-MM, Linux-Kernel
Cc: Greg Kroah-Hartman, Nitin Gupta, Andrew Morton, Konrad Rzeszutek Wilk,
Seth Jennings, Rik van Riel, Mel Gorman, kyungmin.park, p.sarna,
barry.song, Pekka Enberg, Bob Liu

Another test case, running only sysbench, showed that the average time per
request and the transactions per second both improved by around 7%!

bootcmdline: mem=1G zcache.enabled=1 single

sysbench --test=oltp --oltp-table-size=15000000 --oltp-read-only=off \
        --init-rng=on --num-threads=16 --max-requests=0 \
        --oltp-dist-type=special --oltp-dist-pct=10 \
        --max-time=7200 --db-driver=mysql --mysql-table-engine=innodb \
        --mysql-user=root \
        --mysql-password=xxxx --oltp-test-mode=complex run

                                      Without zcache              With zcache
OLTP test statistics:
    queries performed:
        read:                         2238446                     2402372 (+7%)
        write:                        799445                      857990
        other:                        319778                      343196
        total:                        3357669                     3603558
    transactions:                     159889 (22.20 per sec.)     171598 (23.83 per sec.) (+7%)
    deadlocks:                        0 (0.00 per sec.)           0 (0.00 per sec.)
    read/write requests:              3037891 (421.87 per sec.)   3260362 (452.70 per sec.) (+7%)
    other operations:                 319778 (44.41 per sec.)     343196 (47.65 per sec.) (+7%)

Test execution summary:
    total time:                       7201.0705s                  7202.0176s
    total number of events:           159889                      171598
    total time taken by event execution:
                                      115204.6708                 115219.8235
    per-request statistics:
        min:                          94.25ms                     57.80ms (+38%)
        avg:                          720.53ms                    671.45ms (+7%)
        max:                          10684.90ms                  7892.48ms (+26%)
        approx. 95 percentile:        1678.39ms                   1699.62ms

Threads fairness:
    events (avg/stddev):              9993.0625/28.05             10724.87500/30.32
    execution time (avg/stddev):      7200.2919/0.30              7201.2390/0.48

Comparing /proc/vmstat shows that around 14G of data reads were saved with
zcache enabled!

I believe zcache can also help many other applications that are hungry for
file-page memory, and it does no harm to other users!

Looking forward to any feedback!

On Tue, Aug 6, 2013 at 7:36 PM, Bob Liu <lliubbo@gmail.com> wrote:
> Overview:
> Zcache is an in-kernel compressed cache for file pages.
> It takes active file pages that are in the process of being reclaimed and
> attempts to compress them into a dynamically allocated RAM-based memory pool.
>
> If this process is successful, when those file pages are needed again, the I/O
> read operation is avoided. This results in significant performance gains
> under memory pressure for systems full of file pages.
>
> History:
> Nitin Gupta started zcache in 2010:
> http://lwn.net/Articles/397574/
> http://lwn.net/Articles/396467/
>
> Dan Magenheimer extended zcache to support both file pages and anonymous pages.
> It's located in drivers/staging/zcache now. But the current version of zcache is
> too complicated to be merged into upstream.
>
> Seth Jennings implemented a lightweight compressed cache for swap pages (zswap)
> only, which was merged into v3.11-rc1 together with the zbud allocator.
>
> What I'm trying to do is reimplement a simple zcache for file pages only, based
> on the same zbud allocation layer. We can merge zswap and zcache into the current
> zcache in staging if there is a requirement in the future.
>
> Who can benefit:
> Applications like databases, which have a lot of file page data in memory, but
> during memory pressure some of those file pages will be reclaimed after their
> data are synced to disk. The data need to be reread into memory when they are
> required again. This may increase the transaction latency and cause a performance
> drop.
> But with zcache, those data are compressed in memory. Only decompression
> is needed instead of reading from disk!
>
> Other users with limited RAM capacities can also mitigate the performance impact
> of memory pressure if there are many file pages in memory.
>
> Design:
> Zcache receives pages for compression through the Cleancache API and is able to
> evict pages from its own compressed pool on an LRU basis in the case that the
> compressed pool is full.
>
> Zcache makes use of zbud for managing the compressed memory pool. Each
> allocation in zbud is not directly accessible by address. Rather, a handle
> (zaddr) is returned by the allocation routine, and that handle must be mapped
> before being accessed. The compressed memory pool grows on demand and shrinks
> as compressed pages are freed.
>
> When a file page is passed from cleancache to zcache, zcache maintains a mapping
> of the <filesystem_type, inode_number, page_index> to the zbud address that
> references that compressed file page. This mapping is achieved with a red-black
> tree per filesystem type, plus a radix tree per red-black node.
>
> A zcache pool with pool_id as the index is created when a filesystem is mounted.
> Each zcache pool has a red-black tree, with the inode number as the search key.
> Each red-black tree node has a radix tree, which uses the page index as the index.
> Each radix tree slot points to the zbud address combined with some extra
> information.
>
> A debugfs interface is provided for various statistics about zcache pool size,
> number of pages stored, loaded and evicted.
>
> Performance, Kernel Building:
>
> Setup
> ========
> Ubuntu with kernel v3.11-rc1
> Quad-core i5-3320 @ 2.6GHz
> 1G memory size (limited with mem=1G on boot)
> started kernbench with -o N (number of threads)
>
> Details
> ========
>                         Without zcache    With zcache
>
> 8 threads
> Elapsed Time            1821              1814 (+0.3%)
> User Time               5332              5304
> System Time             256               306
> Percent CPU             306               306
> Context Switches        1915378           1912027
> Sleeps                  1501004           1492835
>
> Nr pages succ decompress from zcache
>                         -                 8295
>
> 24 threads
> Elapsed Time            2556              2256 (+11.7%)
> User Time               5184              5225
> System Time             271               276
> Percent CPU             213               243
> Context Switches        1993763           2024661
> Sleeps                  2000881           1849496
>
> Nr pages succ decompress from zcache
>                         -                 174490
>
> 36 threads
> Elapsed Time            5254              3995 (+23.9%)
> User Time               4781              4947
> System Time             293               295
> Percent CPU             96                131
> Context Switches        1612581           1779860
> Sleeps                  2944985           2414438
>
> Nr pages succ decompress from zcache
>                         -                 380470
>
>
> Performance, Sysbench+mysql:
>
> Setup
> ========
> Ubuntu with kernel v3.11-rc1
> Quad-core i5-3320 @ 2.6GHz
> 2G memory size (limited with mem=2G on boot)
> Run sysbench in oltp complex mode for 1 hour:
> sysbench --test=oltp --oltp-table-size=5000000 --num-threads=16 --max-time=3600
> --oltp-test-mode=complex...
>
> After sysbench started, run iozone to trigger memory pressure:
> iozone -a -M -B -s 1200M -y 4k -+u
>
> Sysbench result
> ========
>                                       Without zcache              With zcache
> OLTP test statistics:
>     queries performed:
>         read:                         124320                      166936
>         write:                        44400                       59620
>         other:                        17760                       23848
>         total:                        186480                      250404
>     transactions:                     8880 (2.47 per sec.)        11924 (3.31 per sec.) (+34%)
>     deadlocks:                        0 (0.00 per sec.)           0 (0.00 per sec.)
>     read/write requests:              168720 (46.86 per sec.)     226556 (62.91 per sec.) (+34%)
>     other operations:                 17760 (4.93 per sec.)       23848 (6.62 per sec.) (+34%)
>
> Test execution summary:
>     total time:                       3600.8528s                  3601.3977s
>     total number of events:           8880                        11924
>     total time taken by event execution:
>                                       57610.3546                  57612.9163
>     per-request statistics:
>         min:                          57.68ms                     49.52ms (+14%)
>         avg:                          6487.65ms                   4831.68ms (+25%)
>         max:                          169640.52ms                 124282.16ms (+42%)
>         approx. 95 percentile:        25139.93ms                  21794.82ms (+13%)
>
> Threads fairness:
>     events (avg/stddev):              555.0000/6.05               745.2500/8.33
>     execution time (avg/stddev):      3600.6472/0.26              3600.8073/0.27
>
> Help with testing is welcome; it would be interesting to see zcache's effect
> on more real-life workloads.
>
> Bob Liu (4):
>   mm: zcache: add core files
>   zcache: staging: %s/ZCACHE/ZCACHE_OLD
>   mm: zcache: add evict zpages supporting
>   mm: add WasActive page flag
>
>  drivers/staging/zcache/Kconfig  |   12 +-
>  drivers/staging/zcache/Makefile |    4 +-
>  include/linux/page-flags.h      |    9 +-
>  mm/Kconfig                      |   17 +
>  mm/Makefile                     |    1 +
>  mm/page_alloc.c                 |    3 +
>  mm/vmscan.c                     |    2 +
>  mm/zcache.c                     |  944 +++++++++++++++++++++++++++++++++++++++
>  8 files changed, 983 insertions(+), 9 deletions(-)
>  create mode 100644 mm/zcache.c
>
> --
> 1.7.10.4
>

--
Regards,
--Bob
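The Design section of the cover letter quoted above describes a two-level index: per pool (one per mounted filesystem), a red-black tree keyed by inode number, and under each tree node a radix tree keyed by page index whose slots hold zbud handles. The following is a minimal userspace sketch of that lookup structure, not the code in mm/zcache.c. To keep it short, an unbalanced binary search tree stands in for the kernel rb-tree and a lazily allocated two-level array stands in for the radix tree, so only page indexes below 4096 fit. The names (`zc_pool`, `zc_store`, `zc_lookup`, `zc_invalidate`) and the convention that handle 0 means "not cached" are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the per-inode radix tree: a two-level table, allocated lazily.
 * Covers page indexes 0 .. ZC_MAX_INDEX - 1 only. */
#define ZC_FANOUT 64
#define ZC_MAX_INDEX ((unsigned long)ZC_FANOUT * ZC_FANOUT)

struct zc_inode_node {
    unsigned long ino;                  /* search key: inode number */
    struct zc_inode_node *left, *right; /* plain BST, no rebalancing */
    unsigned long *leaf[ZC_FANOUT];     /* leaf[i][j] = handle of page i*FANOUT+j */
};

/* One pool per mounted filesystem; a pool_id would index an array of these. */
struct zc_pool {
    struct zc_inode_node *root;
};

/* Find the link where inode `ino` lives, or where it would be inserted. */
static struct zc_inode_node **zc_link(struct zc_pool *pool, unsigned long ino)
{
    struct zc_inode_node **link = &pool->root;

    while (*link && (*link)->ino != ino)
        link = ino < (*link)->ino ? &(*link)->left : &(*link)->right;
    return link;
}

/* Map (ino, index) -> handle. Handle 0 is reserved to mean "not cached". */
static bool zc_store(struct zc_pool *pool, unsigned long ino,
                     unsigned long index, unsigned long handle)
{
    struct zc_inode_node **link;
    unsigned long **leaf;

    if (index >= ZC_MAX_INDEX || handle == 0)
        return false;
    link = zc_link(pool, ino);
    if (!*link) {
        *link = calloc(1, sizeof(**link));
        if (!*link)
            return false;
        (*link)->ino = ino;
    }
    leaf = &(*link)->leaf[index / ZC_FANOUT];
    if (!*leaf) {
        *leaf = calloc(ZC_FANOUT, sizeof(**leaf));
        if (!*leaf)
            return false;
    }
    (*leaf)[index % ZC_FANOUT] = handle;
    return true;
}

/* Return the handle for (ino, index), or 0 on a miss. */
static unsigned long zc_lookup(struct zc_pool *pool, unsigned long ino,
                               unsigned long index)
{
    struct zc_inode_node *node = *zc_link(pool, ino);

    if (!node || index >= ZC_MAX_INDEX || !node->leaf[index / ZC_FANOUT])
        return 0;
    return node->leaf[index / ZC_FANOUT][index % ZC_FANOUT];
}

/* Forget one page and return its old handle so the caller can free it. */
static unsigned long zc_invalidate(struct zc_pool *pool, unsigned long ino,
                                   unsigned long index)
{
    unsigned long handle = zc_lookup(pool, ino, index);

    if (handle)
        (*zc_link(pool, ino))->leaf[index / ZC_FANOUT][index % ZC_FANOUT] = 0;
    return handle;
}

/* Free all index nodes; the handles themselves belong to the allocator. */
static void zc_free_tree(struct zc_inode_node *node)
{
    if (!node)
        return;
    zc_free_tree(node->left);
    zc_free_tree(node->right);
    for (int i = 0; i < ZC_FANOUT; i++)
        free(node->leaf[i]);
    free(node);
}
```

A real implementation would also hold a lock around these operations and, on invalidate, pass the returned handle back to the allocator (zbud) to free the compressed data; both are omitted here.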
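The Design section also says zcache evicts from its own compressed pool on an LRU basis once the pool is full. Below is a minimal userspace sketch of that policy: a fixed-capacity pool whose entries sit on a doubly linked list ordered by recency, with the tail evicted when a new page arrives and no slot is free. The capacity, the names, and the choice to refresh an entry's recency on a lookup hit are assumptions for illustration. The sketch tracks only the (inode, index) keys, where a real backend would also free the victim's compressed data and drop its tree mapping. mm/zbud.c keeps its LRU per zbud page rather than per compressed object, so this is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LRU_CAPACITY 3 /* toy capacity: the pool is "full" at 3 entries */

struct lru_entry {
    bool used;
    unsigned long ino, index;      /* which file page this entry holds */
    struct lru_entry *prev, *next; /* prev = more recently used */
};

struct lru_pool {
    struct lru_entry slots[LRU_CAPACITY];
    struct lru_entry *head, *tail; /* head = most recent, tail = next victim */
    unsigned long evictions;
};

static void lru_unlink(struct lru_pool *p, struct lru_entry *e)
{
    if (e->prev)
        e->prev->next = e->next;
    else
        p->head = e->next;
    if (e->next)
        e->next->prev = e->prev;
    else
        p->tail = e->prev;
    e->prev = e->next = NULL;
}

static void lru_push_front(struct lru_pool *p, struct lru_entry *e)
{
    e->prev = NULL;
    e->next = p->head;
    if (p->head)
        p->head->prev = e;
    else
        p->tail = e;
    p->head = e;
}

static struct lru_entry *lru_find(struct lru_pool *p, unsigned long ino,
                                  unsigned long index)
{
    for (int i = 0; i < LRU_CAPACITY; i++)
        if (p->slots[i].used && p->slots[i].ino == ino &&
            p->slots[i].index == index)
            return &p->slots[i];
    return NULL;
}

/* Store a page; if the pool is full, evict the least recently used entry. */
static void lru_store(struct lru_pool *p, unsigned long ino, unsigned long index)
{
    struct lru_entry *e = lru_find(p, ino, index);

    if (e) {
        lru_unlink(p, e); /* already cached: only refresh its position */
    } else {
        for (int i = 0; i < LRU_CAPACITY && !e; i++)
            if (!p->slots[i].used)
                e = &p->slots[i];
        if (!e) { /* pool full: reuse the LRU victim's slot */
            e = p->tail;
            lru_unlink(p, e);
            p->evictions++;
        }
        e->used = true;
        e->ino = ino;
        e->index = index;
    }
    lru_push_front(p, e);
}

/* Look a page up; a hit makes it the most recently used entry. */
static bool lru_lookup(struct lru_pool *p, unsigned long ino, unsigned long index)
{
    struct lru_entry *e = lru_find(p, ino, index);

    if (!e)
        return false;
    lru_unlink(p, e);
    lru_push_front(p, e);
    return true;
}
```

The linear `lru_find` is only there to keep the sketch short. In the design above, the lookup would go through the per-pool tree instead.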
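The percentages in the kernbench table and both sysbench reports above are relative changes against the run without zcache, with "+" meaning better: higher for throughput, lower for elapsed time and latency. The small helpers below (names are illustrative) recompute them. Most agree with the posted figures after rounding or truncation, for example transactions 159889 to 171598 (about +7.3%), kernbench elapsed time at 36 threads 5254 to 3995 (about 24.0% lower), and min latency 94.25ms to 57.80ms (about 38.7% lower). One figure does not reproduce: for the max per-request latency in the quoted one-hour sysbench run, 169640.52ms to 124282.16ms is a reduction of about 26.7%, not 42%.

```c
#include <assert.h>
#include <stdio.h>

/* Relative gain for a "higher is better" metric (queries, transactions/sec). */
static double gain_pct(double without, double with)
{
    assert(without > 0.0);
    return (with - without) / without * 100.0;
}

/* Relative reduction for a "lower is better" metric (elapsed time, latency). */
static double reduction_pct(double without, double with)
{
    assert(without > 0.0);
    return (without - with) / without * 100.0;
}

/* Print one row in the "without / with (change)" style of the reports. */
static void report(const char *label, double without, double with, double pct)
{
    printf("%-28s %12.2f %12.2f (%+.1f%%)\n", label, without, with, pct);
}
```

For example, `report("max latency (ms)", 169640.52, 124282.16, reduction_pct(169640.52, 124282.16))` prints the row with a change of about +26.7%.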
Thread overview: 16+ messages
2013-08-06 11:36 [PATCH v2 0/4] zcache: a compressed file page cache Bob Liu
2013-08-06 11:36 ` [PATCH v2 1/4] mm: zcache: add core files Bob Liu
2013-08-06 11:36 ` [PATCH v2 2/4] zcache: staging: %s/ZCACHE/ZCACHE_OLD Bob Liu
2013-08-06 11:36 ` [PATCH v2 3/4] mm: zcache: add evict zpages supporting Bob Liu
2013-08-06 11:36 ` [PATCH v2 4/4] mm: add WasActive page flag Bob Liu
2013-08-13 6:01 ` Pekka Enberg
2013-08-13 13:50 ` Bob Liu
2013-08-06 13:58 ` [PATCH v2 0/4] zcache: a compressed file page cache Greg KH
2013-08-06 14:24 ` Bob Liu
2013-08-12 12:19 ` Konrad Rzeszutek Wilk
2013-08-12 12:25 ` Kyungmin Park
2013-08-12 12:30 ` Wanpeng Li
2013-08-12 12:30 ` Wanpeng Li
2013-08-12 13:23 ` Konrad Rzeszutek Wilk
2013-08-12 22:10 ` Greg KH
2013-08-09 8:03 ` Bob Liu