* [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE @ 2012-02-12 0:21 Andrea Righi 2012-02-12 0:21 ` [PATCH v5 1/3] kinterval: routines to manipulate generic intervals Andrea Righi ` (3 more replies) 0 siblings, 4 replies; 27+ messages in thread From: Andrea Righi @ 2012-02-12 0:21 UTC (permalink / raw) To: Andrew Morton Cc: Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML There were some reported problems in the past about trashing page cache when a backup software (i.e., rsync) touches a huge amount of pages (see for example [1]). This problem was mitigated by the Minchan's patch [2] and a proper use of fadvise() in the backup software. For example this patch set [3] has been proposed for inclusion in rsync. However, there are still other trashing problems: when the backup software reads all the source files, some of them may be part of the actual working set of the system. If a POSIX_FADV_DONTNEED is performed _all_ pages are evicted from pagecache, both the working set and the use-once pages touched only by the backup. A previous proposal [4] was to use the POSIX_FADV_NOREUSE hint, currently implemented as a no-op, to provide a page invalidation policy less agressive than POSIX_FADV_DONTNEED (by moving active pages to the tail of the inactive list, and dropping the pages from the inactive list). However, as correctly pointed out by KOSAKI, this behavior is very different respect to the POSIX definition: POSIX_FADV_NOREUSE Specifies that the application expects to access the specified data once and then not reuse it thereafter. The new proposal is to implement POSIX_FADV_NOREUSE as a way to perform a real drop-behind policy where applications can mark certain intervals of a file as FADV_NOREUSE before accessing the data. In this way the marked pages will never blow away the current working set. This requirement can be satisfied by preventing lru activation of the FADV_NOREUSE pages. Moreover, all the file cache pages in a FADV_NOREUSE range will be immediately dropped after a read if the page was not present in the cache before. The purpose is to preserve as much as possible the previous state of the file cache memory before reading data in ranges marked via FADV_NOREUSE. The list of FADV_NOREUSE ranges are maintained into an interval tree [5] inside the address_space structure (for this a generic interval tree implementation is also provided by PATCH 1/2). The intervals are dropped when pages are also dropped from the page cache or when a different caching hint is specified on the same intervals. Example: How this can solve the "backup is trashing my page cache" issue? The "backup" software can be changed as following: - before: - fd_src = open /path/very_large_file - fd_dst = open /backup/very_large_file - read from fd_src - write to fd_dst - close fd_src and fd_dst - after: - fd_src = open /path/large_file - fd_dst = open /backup/very_large_file - posix_fadvise(fd_src, start, end, POSIX_FADV_NOREUSE) - read from fd_src (even multiple times) - write to fd_dst - posix_fadvise(fd_src, start, end, POSIX_FADV_NORMAL) - posix_fadvise(fd_dst, start, end, POSIX_FADV_DONTNEED) - close fd_src and fd_dst Simple test case: - read a chunk of 256MB from the middle of a large file (this represents the working set) - simulate a backup-like software reading the whole file using POSIX_FADV_NORMAL, or POSIX_FADV_DONTNEED or POSIX_FADV_NOREUSE - re-read the same 256MB chunk from the middle of the file and measure the time and the bio submitted (perf stat -e block:block_bio_queue) The test is done starting both with the working set in the inactive lru list (the 256MB chunk read once at the beginning of the test) and the working set in the active lru list (the 256MB chunk read twice at the beginning of the test). Results: inact_working_set = the working set is all in the inactive lru list act_working_set = the working set is all in the active lru list - block:block_bio_queue (events) inact_working_set act_working_set POSIX_FADV_NORMAL 2,052 0 POSIX_FADV_DONTNEED 0(*) 2048 POSIX_FADV_NOREUSE 0 0 - elapsed time (sec) inact_working_set act_working_set POSIX_FADV_NORMAL 1.013 0.070 POSIX_FADV_DONTNEED 0.070(*) 1.006 POSIX_FADV_NOREUSE 0.070 0.070 (*) With POSIX_FADV_DONTNEED I would expect to see the working set pages dropped from the page cache when it starts in the inactive lru list. Instead this happens only when we start with the working set all in the active list. IIUC this happens because when we re-read the pages the second time they're moved to the per-cpu vector activate_page_pvecs; if the page vector is not yet drained when we issue the POSIX_FADV_DONTNEED the advice is ignored. Anyway, for this particular test case the best solution to preserve the state of the page cache is obviouly the usage of POSIX_FADV_NOREUSE. Regression test: for i in `seq 1 10`; do fio --name=fio --directory=. --rw=read --bs=4K --size=1G --numjobs=4 | grep READ: done - before: READ: io=4096.0MB, aggrb=244737KB/s, minb=62652KB/s, maxb=63474KB/s, mint=16916msec, maxt=17138msec READ: io=4096.0MB, aggrb=242263KB/s, minb=62019KB/s, maxb=64380KB/s, mint=16678msec, maxt=17313msec READ: io=4096.0MB, aggrb=241913KB/s, minb=61929KB/s, maxb=62766KB/s, mint=17107msec, maxt=17338msec READ: io=4096.0MB, aggrb=244337KB/s, minb=62550KB/s, maxb=63776KB/s, mint=16836msec, maxt=17166msec READ: io=4096.0MB, aggrb=242768KB/s, minb=62148KB/s, maxb=62517KB/s, mint=17175msec, maxt=17277msec READ: io=4096.0MB, aggrb=242796KB/s, minb=62155KB/s, maxb=63191KB/s, mint=16992msec, maxt=17275msec READ: io=4096.0MB, aggrb=244352KB/s, minb=62554KB/s, maxb=63392KB/s, mint=16938msec, maxt=17165msec READ: io=4096.0MB, aggrb=242011KB/s, minb=61954KB/s, maxb=62368KB/s, mint=17216msec, maxt=17331msec READ: io=4096.0MB, aggrb=241676KB/s, minb=61869KB/s, maxb=63738KB/s, mint=16846msec, maxt=17355msec READ: io=4096.0MB, aggrb=242319KB/s, minb=62033KB/s, maxb=63362KB/s, mint=16946msec, maxt=17309msec avg aggrb = 242917KB/s, avg mint = 16965msec, avg maxt = 17267msec - after: READ: io=4096.0MB, aggrb=243968KB/s, minb=62455KB/s, maxb=63306KB/s, mint=16961msec, maxt=17192msec READ: io=4096.0MB, aggrb=242979KB/s, minb=62202KB/s, maxb=63127KB/s, mint=17009msec, maxt=17262msec READ: io=4096.0MB, aggrb=242473KB/s, minb=62073KB/s, maxb=62285KB/s, mint=17239msec, maxt=17298msec READ: io=4096.0MB, aggrb=244494KB/s, minb=62590KB/s, maxb=63272KB/s, mint=16970msec, maxt=17155msec READ: io=4096.0MB, aggrb=244352KB/s, minb=62554KB/s, maxb=63269KB/s, mint=16971msec, maxt=17165msec READ: io=4096.0MB, aggrb=241969KB/s, minb=61944KB/s, maxb=63444KB/s, mint=16924msec, maxt=17334msec READ: io=4096.0MB, aggrb=243303KB/s, minb=62285KB/s, maxb=62543KB/s, mint=17168msec, maxt=17239msec READ: io=4096.0MB, aggrb=243232KB/s, minb=62267KB/s, maxb=63109KB/s, mint=17014msec, maxt=17244msec READ: io=4096.0MB, aggrb=241969KB/s, minb=61944KB/s, maxb=62652KB/s, mint=17138msec, maxt=17334msec READ: io=4096.0MB, aggrb=241649KB/s, minb=61862KB/s, maxb=62616KB/s, mint=17148msec, maxt=17357msec avg aggrb = 243038KB/s, avg mint = 17054msec, avg maxt = 17258msec No obvious performance regression was found according to this simple test. Credits: - Some of the routines to implement the generic interval tree has been taken from the x86 PAT code, that uses interval trees to keep track of PAT ranges (in the future it would be interesting to convert also the x86 PAT code to use the generic interval tree implementation). - The idea to store the FADV_NOREUSE intervals into the address_space structure as been inspired by the John's POSIX_FADV_VOLATILE patch [6]. References: [1] http://marc.info/?l=rsync&m=128885034930933&w=2 [2] https://lkml.org/lkml/2011/2/20/57 [3] http://lists.samba.org/archive/rsync/2010-November/025827.html [4] http://thread.gmane.org/gmane.linux.kernel.mm/65493 [5] http://en.wikipedia.org/wiki/Interval_tree [6] http://thread.gmane.org/gmane.linux.kernel/1218654 ChangeLog v4 -> v5: - completely new redesign: implement the expected drop-behind policy maintaining the list of FADV_NOREUSE ranges inside the file [PATCH v5 1/3] kinterval: routines to manipulate generic intervals [PATCH v5 2/3] mm: filemap: introduce mark_page_usedonce [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE fs/inode.c | 3 + include/linux/fs.h | 12 ++ include/linux/kinterval.h | 126 ++++++++++++ include/linux/swap.h | 1 + lib/Makefile | 2 +- lib/kinterval.c | 483 +++++++++++++++++++++++++++++++++++++++++++++ mm/fadvise.c | 18 ++- mm/filemap.c | 95 +++++++++- mm/swap.c | 24 +++ 9 files changed, 760 insertions(+), 4 deletions(-) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v5 1/3] kinterval: routines to manipulate generic intervals 2012-02-12 0:21 [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE Andrea Righi @ 2012-02-12 0:21 ` Andrea Righi 2012-02-13 0:48 ` Andrea Righi 2012-02-12 0:21 ` [PATCH v5 2/3] mm: filemap: introduce mark_page_usedonce Andrea Righi ` (2 subsequent siblings) 3 siblings, 1 reply; 27+ messages in thread From: Andrea Righi @ 2012-02-12 0:21 UTC (permalink / raw) To: Andrew Morton Cc: Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML Add a generic infrastructure to efficiently keep track of intervals. An interval is represented by a triplet (start, end, type). The values (start, end) define the bounds of the range. The type is a generic property associated to the interval. The interpretation of the type is left to the user. Multiple intervals associated to the same object are stored in an interval tree (augmented rbtree) [1], with tree ordered on starting address. The tree cannot contain multiple entries of different interval types which overlap; in case of overlapping intervals new inserted intervals overwrite the old ones (completely or in part, in the second case the old interval is shrunk or split accordingly). Reference: [1] "Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein Signed-off-by: Andrea Righi <andrea@betterlinux.com> --- include/linux/kinterval.h | 126 ++++++++++++ lib/Makefile | 2 +- lib/kinterval.c | 483 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 610 insertions(+), 1 deletions(-) create mode 100644 include/linux/kinterval.h create mode 100644 lib/kinterval.c diff --git a/include/linux/kinterval.h b/include/linux/kinterval.h new file mode 100644 index 0000000..8152265 --- /dev/null +++ b/include/linux/kinterval.h @@ -0,0 +1,126 @@ +/* + * kinterval.h - Routines for manipulating generic intervals + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; if not, write to the + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, + * Boston, MA 021110-1307, USA. + * + * Copyright (C) 2012 Andrea Righi <andrea@betterlinux.com> + */ + +#ifndef _LINUX_KINTERVAL_H +#define _LINUX_KINTERVAL_H + +#include <linux/types.h> +#include <linux/rbtree.h> + +/** + * struct kinterval - define a range in an interval tree + * @start: address representing the start of the range. + * @end: address representing the end of the range. + * @subtree_max_end: augmented rbtree data to perform quick lookup of the + * overlapping ranges. + * @type: type of the interval (defined by the user). + * @rb: the rbtree node. + */ +struct kinterval { + u64 start; + u64 end; + u64 subtree_max_end; + unsigned long type; + struct rb_node rb; +}; + +/** + * DECLARE_KINTERVAL_TREE - macro to declare an interval tree + * @__name: name of the declared interval tree. + * + * The tree is an interval tree (augmented rbtree) with tree ordered + * on starting address. Tree cannot contain multiple entries for differnt + * ranges which overlap; in case of overlapping ranges new inserted intervals + * overwrite the old ones (completely or in part, in the second case the old + * interval is shrinked accordingly). + * + * NOTE: all locking issues are left to the caller. + * + * Reference: + * "Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein. + */ +#define DECLARE_KINTERVAL_TREE(__name) struct rb_root __name + +/** + * DEFINE_KINTERVAL_TREE - macro to define and initialize an interval tree + * @__name: name of the declared interval tree. + */ +#define DEFINE_KINTERVAL_TREE(__name) \ + struct rb_root __name = RB_ROOT + +/** + * INIT_KINTERVAL_TREE_ROOT - macro to initialize an interval tree + * @__root: root of the declared interval tree. + */ +#define INIT_KINTERVAL_TREE_ROOT(__root) \ + do { \ + (__root)->rb_node = NULL; \ + } while (0) + +/** + * kinterval_add - define a new range into the interval tree + * @root: the root of the tree. + * @start: start of the range to define. + * @end: end of the range to define. + * @type: attribute assinged to the range. + * @flags: type of memory to allocate (see kcalloc). + */ +int kinterval_add(struct rb_root *root, u64 start, u64 end, + long type, gfp_t flags); + +/** + * kinterval_del - erase a range from the interval tree + * @root: the root of the tree. + * @start: start of the range to erase. + * @end: end of the range to erase. + * @flags: type of memory to allocate (see kcalloc). + */ +int kinterval_del(struct rb_root *root, u64 start, u64 end, gfp_t flags); + +/** + * kinterval_lookup_range - return the attribute of a range + * @root: the root of the tree. + * @start: start of the range to lookup. + * @end: end of the range to lookup. + * + * NOTE: return the type of the lowest match, if the range specified by the + * arguments overlaps multiple intervals only the type of the first one + * (lowest) is returned. + */ +long kinterval_lookup_range(struct rb_root *root, u64 start, u64 end); + +/** + * kinterval_lookup - return the attribute of an address + * @root: the root of the tree. + * @addr: address to lookup. + */ +static inline long kinterval_lookup(struct rb_root *root, u64 addr) +{ + return kinterval_lookup_range(root, addr, addr); +} + +/** + * kinterval_clear - erase all intervals defined in an interval tree + * @root: the root of the tree. + */ +void kinterval_clear(struct rb_root *root); + +#endif /* _LINUX_KINTERVAL_H */ diff --git a/lib/Makefile b/lib/Makefile index 18515f0..9a648ba 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -12,7 +12,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \ idr.o int_sqrt.o extable.o prio_tree.o \ sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \ proportions.o prio_heap.o ratelimit.o show_mem.o \ - is_single_threaded.o plist.o decompress.o + is_single_threaded.o plist.o decompress.o kinterval.o lib-$(CONFIG_MMU) += ioremap.o lib-$(CONFIG_SMP) += cpumask.o diff --git a/lib/kinterval.c b/lib/kinterval.c new file mode 100644 index 0000000..2a9d463 --- /dev/null +++ b/lib/kinterval.c @@ -0,0 +1,483 @@ +/* + * kinterval.c - Routines for manipulating generic intervals + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; if not, write to the + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, + * Boston, MA 021110-1307, USA. + * + * Copyright (C) 2012 Andrea Righi <andrea@betterlinux.com> + */ + +#include <linux/init.h> +#include <linux/version.h> +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/uaccess.h> +#include <linux/rbtree.h> +#include <linux/kinterval.h> + +static struct kmem_cache *kinterval_cachep __read_mostly; + +static bool is_interval_overlapping(struct kinterval *node, u64 start, u64 end) +{ + return !(node->start > end || node->end < start); +} + +static u64 get_subtree_max_end(struct rb_node *node) +{ + struct kinterval *range; + + if (!node) + return 0; + range = rb_entry(node, struct kinterval, rb); + + return range->subtree_max_end; +} + +/* Update 'subtree_max_end' for a node, based on node and its children */ +static void kinterval_rb_augment_cb(struct rb_node *node, void *__unused) +{ + struct kinterval *range; + u64 max_end, child_max_end; + + if (!node) + return; + range = rb_entry(node, struct kinterval, rb); + max_end = range->end; + + child_max_end = get_subtree_max_end(node->rb_right); + if (child_max_end > max_end) + max_end = child_max_end; + + child_max_end = get_subtree_max_end(node->rb_left); + if (child_max_end > max_end) + max_end = child_max_end; + + range->subtree_max_end = max_end; +} + +/* + * Find the lowest overlapping range from the tree. + * + * Return NULL if there is no overlap. + */ +static struct kinterval * +kinterval_rb_lowest_match(struct rb_root *root, u64 start, u64 end) +{ + struct rb_node *node = root->rb_node; + struct kinterval *lower_range = NULL; + + while (node) { + struct kinterval *range = rb_entry(node, struct kinterval, rb); + + if (get_subtree_max_end(node->rb_left) > start) { + node = node->rb_left; + } else if (is_interval_overlapping(range, start, end)) { + lower_range = range; + break; + } else if (start >= range->start) { + node = node->rb_right; + } else { + break; + } + } + return lower_range; +} + +static void +kinterval_rb_insert(struct rb_root *root, struct kinterval *new) +{ + struct rb_node **node = &(root->rb_node); + struct rb_node *parent = NULL; + + while (*node) { + struct kinterval *range = rb_entry(*node, struct kinterval, rb); + + parent = *node; + if (new->start <= range->start) + node = &((*node)->rb_left); + else if (new->start > range->start) + node = &((*node)->rb_right); + } + + rb_link_node(&new->rb, parent, node); + rb_insert_color(&new->rb, root); + rb_augment_insert(&new->rb, kinterval_rb_augment_cb, NULL); +} + +/* Merge adjacent intervals */ +static void kinterval_rb_merge(struct rb_root *root) +{ + struct kinterval *next, *prev = NULL; + struct rb_node *node, *deepest; + + node = rb_first(root); + while (node) { + next = rb_entry(node, struct kinterval, rb); + node = rb_next(&next->rb); + + if (prev && prev->type == next->type && + prev->end == (next->start - 1) && + prev->end < next->start) { + prev->end = next->end; + deepest = rb_augment_erase_begin(&next->rb); + rb_erase(&next->rb, root); + rb_augment_erase_end(deepest, + kinterval_rb_augment_cb, NULL); + kmem_cache_free(kinterval_cachep, next); + } else { + prev = next; + } + } +} + +static int kinterval_rb_check_add(struct rb_root *root, + struct kinterval *new, gfp_t flags) +{ + struct kinterval *old; + struct rb_node *node, *deepest; + + node = rb_first(root); + while (node) { + old = rb_entry(node, struct kinterval, rb); + /* Check all the possible matches within the range */ + if (old->start > new->end) + break; + node = rb_next(&old->rb); + + if (!is_interval_overlapping(old, new->start, new->end)) + continue; + /* + * Interval is overlapping another one, shrink the old interval + * accordingly. + */ + if (new->start == old->start && new->end == old->end) { + /* + * Exact match, just update the type: + * + * old + * |___________________| + * new + * |___________________| + */ + old->type = new->type; + kmem_cache_free(kinterval_cachep, new); + return 0; + } else if (new->start <= old->start && new->end >= old->end) { + /* + * New range completely overwrites the old one: + * + * old + * |________| + * new + * |___________________| + * + * Replace old with new. + */ + deepest = rb_augment_erase_begin(&old->rb); + rb_erase(&old->rb, root); + rb_augment_erase_end(deepest, kinterval_rb_augment_cb, + NULL); + kmem_cache_free(kinterval_cachep, old); + } else if (new->start <= old->start && new->end <= old->end) { + /* + * Update the start of the interval: + * + * - before: + * + * old + * |_____________| + * new + * |___________| + * + * - after: + * + * new old + * |___________|_______| + */ + rb_erase(&old->rb, root); + old->start = new->end + 1; + kinterval_rb_insert(root, old); + break; + } else if (new->start >= old->start && new->end >= old->end) { + /* + * Update the end of the interval: + * + * - before: + * + * old + * |_____________| + * new + * |___________| + * + * - after: + * + * old new + * |________|__________| + */ + deepest = rb_augment_erase_begin(&old->rb); + rb_erase(&old->rb, root); + rb_augment_erase_end(deepest, kinterval_rb_augment_cb, + NULL); + old->end = new->start - 1; + old->subtree_max_end = old->end; + kinterval_rb_insert(root, old); + } else if (new->start >= old->start && new->end <= old->end) { + struct kinterval *prev; + + if (new->type == old->type) { + /* Same type, just drop the new element */ + kmem_cache_free(kinterval_cachep, new); + return 0; + } + /* + * Insert the new interval in the middle of another + * one. + * + * - before: + * + * old + * |___________________| + * new + * |_______| + * + * - after: + * + * prev new old + * |_____|_______|_____| + */ + prev = kmem_cache_zalloc(kinterval_cachep, flags); + if (unlikely(!prev)) + return -ENOMEM; + + rb_erase(&old->rb, root); + + prev->start = old->start; + old->start = new->end + 1; + prev->end = new->start - 1; + prev->type = old->type; + + kinterval_rb_insert(root, old); + + new->subtree_max_end = new->end; + kinterval_rb_insert(root, new); + + prev->subtree_max_end = prev->end; + kinterval_rb_insert(root, prev); + return 0; + } + } + new->subtree_max_end = new->end; + kinterval_rb_insert(root, new); + kinterval_rb_merge(root); + + return 0; +} + +int kinterval_add(struct rb_root *root, u64 start, u64 end, + long type, gfp_t flags) +{ + struct kinterval *range; + int ret; + + if (end < start) + return -EINVAL; + range = kmem_cache_zalloc(kinterval_cachep, flags); + if (unlikely(!range)) + return -ENOMEM; + range->start = start; + range->end = end; + range->type = type; + + ret = kinterval_rb_check_add(root, range, flags); + if (unlikely(ret < 0)) + kmem_cache_free(kinterval_cachep, range); + + return ret; +} +EXPORT_SYMBOL(kinterval_add); + +static int kinterval_rb_check_del(struct rb_root *root, + u64 start, u64 end, gfp_t flags) +{ + struct kinterval *old; + struct rb_node *node, *deepest; + + node = rb_first(root); + while (node) { + old = rb_entry(node, struct kinterval, rb); + /* Check all the possible matches within the range */ + if (old->start > end) + break; + node = rb_next(&old->rb); + + if (!is_interval_overlapping(old, start, end)) + continue; + if (start <= old->start && end >= old->end) { + /* + * Completely erase the old range: + * + * old + * |________| + * erase + * |___________________| + */ + deepest = rb_augment_erase_begin(&old->rb); + rb_erase(&old->rb, root); + rb_augment_erase_end(deepest, kinterval_rb_augment_cb, + NULL); + kmem_cache_free(kinterval_cachep, old); + } else if (start <= old->start && end <= old->end) { + /* + * Trim the beginning of an interval: + * + * - before: + * + * old + * |_____________| + * erase + * |___________| + * + * - after: + * + * old + * |_______| + */ + rb_erase(&old->rb, root); + old->start = end + 1; + kinterval_rb_insert(root, old); + break; + } else if (start >= old->start && end >= old->end) { + /* + * Trim the end of an interval: + * + * - before: + * + * old + * |_____________| + * erase + * |___________| + * + * - after: + * + * old + * |________| + */ + deepest = rb_augment_erase_begin(&old->rb); + rb_erase(&old->rb, root); + rb_augment_erase_end(deepest, kinterval_rb_augment_cb, + NULL); + old->end = start - 1; + old->subtree_max_end = old->end; + kinterval_rb_insert(root, old); + } else if (start >= old->start && end <= old->end) { + struct kinterval *prev; + + /* + * Trim the middle of an interval: + * + * - before: + * + * old + * |___________________| + * erase + * |_______| + * + * - after: + * + * prev old + * |_____| |_____| + */ + prev = kmem_cache_zalloc(kinterval_cachep, flags); + if (unlikely(!prev)) + return -ENOMEM; + + rb_erase(&old->rb, root); + + prev->start = old->start; + old->start = end + 1; + prev->end = start - 1; + prev->type = old->type; + + kinterval_rb_insert(root, old); + + prev->subtree_max_end = prev->end; + kinterval_rb_insert(root, prev); + break; + } + } + return 0; +} + +int kinterval_del(struct rb_root *root, u64 start, u64 end, gfp_t flags) +{ + if (end < start) + return -EINVAL; + return kinterval_rb_check_del(root, start, end, flags); +} +EXPORT_SYMBOL(kinterval_del); + +void kinterval_clear(struct rb_root *root) +{ + struct kinterval *range; + struct rb_node *node; + + node = rb_first(root); + while (node) { + range = rb_entry(node, struct kinterval, rb); +#ifdef DEBUG + printk(KERN_INFO "start=%llu end=%llu type=%lu\n", + range->start, range->end, range->type); +#endif + node = rb_next(&range->rb); + rb_erase(&range->rb, root); + kmem_cache_free(kinterval_cachep, range); + } +} +EXPORT_SYMBOL(kinterval_clear); + +long kinterval_lookup_range(struct rb_root *root, u64 start, u64 end) +{ + struct kinterval *range; + + if (end < start) + return -EINVAL; + range = kinterval_rb_lowest_match(root, start, end); + return range ? range->type : -ENOENT; +} +EXPORT_SYMBOL(kinterval_lookup_range); + +static int __init kinterval_init(void) +{ + kinterval_cachep = kmem_cache_create("kinterval_cache", + sizeof(struct kinterval), + 0, 0, NULL); + if (unlikely(!kinterval_cachep)) { + printk(KERN_ERR "kinterval: failed to create slab cache\n"); + return -ENOMEM; + } + return 0; +} + +static void __exit kinterval_exit(void) +{ + kmem_cache_destroy(kinterval_cachep); +} + +module_init(kinterval_init); +module_exit(kinterval_exit); + +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("Generic interval ranges"); +MODULE_AUTHOR("Andrea Righi <andrea@betterlinux.com>"); -- 1.7.5.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v5 1/3] kinterval: routines to manipulate generic intervals 2012-02-12 0:21 ` [PATCH v5 1/3] kinterval: routines to manipulate generic intervals Andrea Righi @ 2012-02-13 0:48 ` Andrea Righi 0 siblings, 0 replies; 27+ messages in thread From: Andrea Righi @ 2012-02-13 0:48 UTC (permalink / raw) To: Andrew Morton Cc: Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On Sun, Feb 12, 2012 at 01:21:36AM +0100, Andrea Righi wrote: > Add a generic infrastructure to efficiently keep track of intervals. > > An interval is represented by a triplet (start, end, type). The values > (start, end) define the bounds of the range. The type is a generic > property associated to the interval. The interpretation of the type is > left to the user. > > Multiple intervals associated to the same object are stored in an > interval tree (augmented rbtree) [1], with tree ordered on starting > address. The tree cannot contain multiple entries of different > interval types which overlap; in case of overlapping intervals new > inserted intervals overwrite the old ones (completely or in part, in the > second case the old interval is shrunk or split accordingly). > > Reference: > [1] "Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein > > Signed-off-by: Andrea Righi <andrea@betterlinux.com> > --- For those who are interested, I've an extra patch for this part (must be applied on top of the previous one): there's a bug fix and some improvements. I'll include the following fix in the next version of the POSIX_FADV_NOREUSE patch set (sorry for this, but the whole patch set is still totally experimental, especially the kinterval part, I also plan to add a proper documentation and a sample test case in the samples/ dir if we think it'll be useful). Thanks, -Andrea --- From: Andrea Righi <andrea@betterlinux.com> Subject: [PATCH] kinterval: fix interval boundaries and optimize insertion/removal Use the right notation [start, end) instead of [start, end] for interval boundaries, so now an interval does not include its endpoint. This is the natural way to define a memory range (i.e, 0-4096 = all bytes between 0 and 4095 included) and it makes easier to reuse this code also for other similar cases. Moreover, optimize insertion and removal code to be actually O(log(n)). Signed-off-by: Andrea Righi <andrea@betterlinux.com> --- include/linux/kinterval.h | 2 +- lib/kinterval.c | 135 ++++++++++++++++++++++++++++----------------- 2 files changed, 86 insertions(+), 51 deletions(-) diff --git a/include/linux/kinterval.h b/include/linux/kinterval.h index 8152265..7a505f4 100644 --- a/include/linux/kinterval.h +++ b/include/linux/kinterval.h @@ -114,7 +114,7 @@ long kinterval_lookup_range(struct rb_root *root, u64 start, u64 end); */ static inline long kinterval_lookup(struct rb_root *root, u64 addr) { - return kinterval_lookup_range(root, addr, addr); + return kinterval_lookup_range(root, addr, addr + 1); } /** diff --git a/lib/kinterval.c b/lib/kinterval.c index 2a9d463..28ee627 100644 --- a/lib/kinterval.c +++ b/lib/kinterval.c @@ -31,7 +31,7 @@ static struct kmem_cache *kinterval_cachep __read_mostly; static bool is_interval_overlapping(struct kinterval *node, u64 start, u64 end) { - return !(node->start > end || node->end < start); + return !(node->start >= end || node->end <= start); } static u64 get_subtree_max_end(struct rb_node *node) @@ -76,23 +76,65 @@ static struct kinterval * kinterval_rb_lowest_match(struct rb_root *root, u64 start, u64 end) { struct rb_node *node = root->rb_node; - struct kinterval *lower_range = NULL; + struct kinterval *lowest_match = NULL; while (node) { struct kinterval *range = rb_entry(node, struct kinterval, rb); if (get_subtree_max_end(node->rb_left) > start) { + /* Lowest overlap if any must be on the left side */ node = node->rb_left; } else if (is_interval_overlapping(range, start, end)) { - lower_range = range; + lowest_match = range; break; } else if (start >= range->start) { + /* Lowest overlap if any must be on the right side */ node = node->rb_right; } else { break; } } - return lower_range; + return lowest_match; +} + +/* + * Merge two adjacent intervals, if they can be merged next is removed from the + * tree. + */ +static void kinterval_rb_merge_node(struct rb_root *root, + struct kinterval *prev, struct kinterval *next) +{ + struct rb_node *deepest; + + if (prev && prev->type == next->type && prev->end == next->start) { + prev->end = next->end; + deepest = rb_augment_erase_begin(&next->rb); + rb_erase(&next->rb, root); + rb_augment_erase_end(deepest, + kinterval_rb_augment_cb, NULL); + kmem_cache_free(kinterval_cachep, next); + } +} + +/* + * Try to merge a new inserted interval with the previous and the next + * interval. + */ +static void kinterval_rb_merge(struct rb_root *root, struct kinterval *new) +{ + struct kinterval *next, *prev; + struct rb_node *node; + + node = rb_prev(&new->rb); + prev = node ? rb_entry(node, struct kinterval, rb) : NULL; + + node = rb_next(&new->rb); + next = node ? rb_entry(node, struct kinterval, rb) : NULL; + + if (next) + kinterval_rb_merge_node(root, new, next); + if (prev) + kinterval_rb_merge_node(root, prev, new); } static void @@ -114,32 +156,8 @@ kinterval_rb_insert(struct rb_root *root, struct kinterval *new) rb_link_node(&new->rb, parent, node); rb_insert_color(&new->rb, root); rb_augment_insert(&new->rb, kinterval_rb_augment_cb, NULL); -} -/* Merge adjacent intervals */ -static void kinterval_rb_merge(struct rb_root *root) -{ - struct kinterval *next, *prev = NULL; - struct rb_node *node, *deepest; - - node = rb_first(root); - while (node) { - next = rb_entry(node, struct kinterval, rb); - node = rb_next(&next->rb); - - if (prev && prev->type == next->type && - prev->end == (next->start - 1) && - prev->end < next->start) { - prev->end = next->end; - deepest = rb_augment_erase_begin(&next->rb); - rb_erase(&next->rb, root); - rb_augment_erase_end(deepest, - kinterval_rb_augment_cb, NULL); - kmem_cache_free(kinterval_cachep, next); - } else { - prev = next; - } - } + kinterval_rb_merge(root, new); } static int kinterval_rb_check_add(struct rb_root *root, @@ -148,16 +166,17 @@ static int kinterval_rb_check_add(struct rb_root *root, struct kinterval *old; struct rb_node *node, *deepest; - node = rb_first(root); + old = kinterval_rb_lowest_match(root, new->start, new->end); + node = old ? &old->rb : NULL; + while (node) { old = rb_entry(node, struct kinterval, rb); + node = rb_next(&old->rb); + /* Check all the possible matches within the range */ - if (old->start > new->end) + if (old->start >= new->end) break; - node = rb_next(&old->rb); - if (!is_interval_overlapping(old, new->start, new->end)) - continue; /* * Interval is overlapping another one, shrink the old interval * accordingly. @@ -206,8 +225,12 @@ static int kinterval_rb_check_add(struct rb_root *root, * new old * |___________|_______| */ + deepest = rb_augment_erase_begin(&old->rb); rb_erase(&old->rb, root); + rb_augment_erase_end(deepest, kinterval_rb_augment_cb, + NULL); old->start = new->end + 1; + old->subtree_max_end = old->end; kinterval_rb_insert(root, old); break; } else if (new->start >= old->start && new->end >= old->end) { @@ -230,7 +253,7 @@ static int kinterval_rb_check_add(struct rb_root *root, rb_erase(&old->rb, root); rb_augment_erase_end(deepest, kinterval_rb_augment_cb, NULL); - old->end = new->start - 1; + old->end = new->start; old->subtree_max_end = old->end; kinterval_rb_insert(root, old); } else if (new->start >= old->start && new->end <= old->end) { @@ -261,13 +284,17 @@ static int kinterval_rb_check_add(struct rb_root *root, if (unlikely(!prev)) return -ENOMEM; + deepest = rb_augment_erase_begin(&old->rb); rb_erase(&old->rb, root); + rb_augment_erase_end(deepest, kinterval_rb_augment_cb, + NULL); prev->start = old->start; - old->start = new->end + 1; - prev->end = new->start - 1; + old->start = new->end; + prev->end = new->start; prev->type = old->type; + old->subtree_max_end = old->end; kinterval_rb_insert(root, old); new->subtree_max_end = new->end; @@ -280,7 +307,6 @@ static int kinterval_rb_check_add(struct rb_root *root, } new->subtree_max_end = new->end; kinterval_rb_insert(root, new); - kinterval_rb_merge(root); return 0; } @@ -291,7 +317,7 @@ int kinterval_add(struct rb_root *root, u64 start, u64 end, struct kinterval *range; int ret; - if (end < start) + if (end <= start) return -EINVAL; range = kmem_cache_zalloc(kinterval_cachep, flags); if (unlikely(!range)) @@ -314,16 +340,17 @@ static int kinterval_rb_check_del(struct rb_root *root, struct kinterval *old; struct rb_node *node, *deepest; - node = rb_first(root); + old = kinterval_rb_lowest_match(root, start, end); + node = old ? &old->rb : NULL; + while (node) { old = rb_entry(node, struct kinterval, rb); + node = rb_next(&old->rb); + /* Check all the possible matches within the range */ - if (old->start > end) + if (old->start >= end) break; - node = rb_next(&old->rb); - if (!is_interval_overlapping(old, start, end)) - continue; if (start <= old->start && end >= old->end) { /* * Completely erase the old range: @@ -354,8 +381,12 @@ static int kinterval_rb_check_del(struct rb_root *root, * old * |_______| */ + deepest = rb_augment_erase_begin(&old->rb); rb_erase(&old->rb, root); - old->start = end + 1; + rb_augment_erase_end(deepest, kinterval_rb_augment_cb, + NULL); + old->start = end; + old->subtree_max_end = old->end; kinterval_rb_insert(root, old); break; } else if (start >= old->start && end >= old->end) { @@ -378,7 +409,7 @@ static int kinterval_rb_check_del(struct rb_root *root, rb_erase(&old->rb, root); rb_augment_erase_end(deepest, kinterval_rb_augment_cb, NULL); - old->end = start - 1; + old->end = start; old->subtree_max_end = old->end; kinterval_rb_insert(root, old); } else if (start >= old->start && end <= old->end) { @@ -403,13 +434,17 @@ static int kinterval_rb_check_del(struct rb_root *root, if (unlikely(!prev)) return -ENOMEM; + deepest = rb_augment_erase_begin(&old->rb); rb_erase(&old->rb, root); + rb_augment_erase_end(deepest, kinterval_rb_augment_cb, + NULL); prev->start = old->start; - old->start = end + 1; - prev->end = start - 1; + old->start = end; + prev->end = start; prev->type = old->type; + old->subtree_max_end = old->end; kinterval_rb_insert(root, old); prev->subtree_max_end = prev->end; @@ -422,7 +457,7 @@ static int kinterval_rb_check_del(struct rb_root *root, int kinterval_del(struct rb_root *root, u64 start, u64 end, gfp_t flags) { - if (end < start) + if (end <= start) return -EINVAL; return kinterval_rb_check_del(root, start, end, flags); } @@ -451,7 +486,7 @@ long kinterval_lookup_range(struct rb_root *root, u64 start, u64 end) { struct kinterval *range; - if (end < start) + if (end <= start) return -EINVAL; range = kinterval_rb_lowest_match(root, start, end); return range ? range->type : -ENOENT; -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v5 2/3] mm: filemap: introduce mark_page_usedonce 2012-02-12 0:21 [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE Andrea Righi 2012-02-12 0:21 ` [PATCH v5 1/3] kinterval: routines to manipulate generic intervals Andrea Righi @ 2012-02-12 0:21 ` Andrea Righi 2012-02-12 0:21 ` [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE Andrea Righi 2012-02-14 21:33 ` [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE Andrew Morton 3 siblings, 0 replies; 27+ messages in thread From: Andrea Righi @ 2012-02-12 0:21 UTC (permalink / raw) To: Andrew Morton Cc: Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML Introduce a helper function to drop a page from the page cache if it is evictable, inactive and unreferenced. This can be used to drop used-once pages from the file cache with POSIX_FADV_NOREUSE. Signed-off-by: Andrea Righi <andrea@betterlinux.com> --- include/linux/swap.h | 1 + mm/swap.c | 24 ++++++++++++++++++++++++ 2 files changed, 25 insertions(+), 0 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 3e60228..2e5d1b8 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -222,6 +222,7 @@ extern void lru_add_page_tail(struct zone* zone, struct page *page, struct page *page_tail); extern void activate_page(struct page *); extern void mark_page_accessed(struct page *); +extern void mark_page_usedonce(struct page *page); extern void lru_add_drain(void); extern int lru_add_drain_all(void); extern void rotate_reclaimable_page(struct page *page); diff --git a/mm/swap.c b/mm/swap.c index fff1ff7..2c19c92 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -352,6 +352,30 @@ void activate_page(struct page *page) } #endif +/** + * mark_page_usedonce - handle used-once pages + * @page: the page set as used-once + * + * Drop a page from the page cache if it is evictable, inactive and + * unreferenced. + */ +void mark_page_usedonce(struct page *page) +{ + int ret; + + if (!PageLRU(page)) + return; + if (PageActive(page) || PageUnevictable(page) || PageReferenced(page)) + return; + if (lock_page_killable(page)) + return; + ret = invalidate_inode_page(page); + unlock_page(page); + if (!ret) + deactivate_page(page); +} +EXPORT_SYMBOL(mark_page_usedonce); + /* * Mark a page as having seen activity. * -- 1.7.5.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-12 0:21 [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE Andrea Righi 2012-02-12 0:21 ` [PATCH v5 1/3] kinterval: routines to manipulate generic intervals Andrea Righi 2012-02-12 0:21 ` [PATCH v5 2/3] mm: filemap: introduce mark_page_usedonce Andrea Righi @ 2012-02-12 0:21 ` Andrea Righi 2012-02-13 16:22 ` KOSAKI Motohiro ` (4 more replies) 2012-02-14 21:33 ` [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE Andrew Morton 3 siblings, 5 replies; 27+ messages in thread From: Andrea Righi @ 2012-02-12 0:21 UTC (permalink / raw) To: Andrew Morton Cc: Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML According to the POSIX standard the POSIX_FADV_NOREUSE hint means that the application expects to access the specified data once and then not reuse it thereafter. It seems that the expected behavior is to implement a drop-behind policy where the application can set certain intervals of a file as FADV_NOREUSE _before_ accessing the data. An interesting usage of this hint is to guarantee that pages marked as FADV_NOREUSE will never blow away the pages of the current working set. A possible solution to satisfy this requirement is to prevent lru activation of the pages marked as FADV_NOREUSE, in other words, never add pages marked as FADV_NOREUSE to the active lru list. Moreover, all the file cache pages in a FADV_NOREUSE range can be immediately dropped after a read if the page was not present in the file cache before. In general, the purpose of this approach is to preserve as much as possible the previous state of the file cache memory before reading data in ranges marked by FADV_NOREUSE. All the pages read before (pre-)setting them as FADV_NOREUSE should be treated as normal, so they can be added to the active lru list as usual if they're accessed multiple times. Only after setting them as FADV_NOREUSE we can prevent them for being promoted to the active lru list. If they are already in the active lru list before calling FADV_NOREUSE we should keep them there, but if they quit from the active list they can't get back anymore (except by explicitly setting a different caching hint). To achieve this goal we need to maintain the list of file ranges marked as FADV_NOREUSE until the pages are dropped from the page cache, or the inode is deleted, or they're explicitly marked to use a different cache behavior (FADV_NORMAL | FADV_WILLNEED). The list of FADV_NOREUSE ranges is maintained in the address_space structure using an interval tree (kinterval). Signed-off-by: Andrea Righi <andrea@betterlinux.com> --- fs/inode.c | 3 ++ include/linux/fs.h | 12 ++++++ mm/fadvise.c | 18 +++++++++- mm/filemap.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++- 4 files changed, 125 insertions(+), 3 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index fb10d86..6375163 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -254,6 +254,7 @@ void __destroy_inode(struct inode *inode) if (inode->i_default_acl && inode->i_default_acl != ACL_NOT_CACHED) posix_acl_release(inode->i_default_acl); #endif + filemap_clear_cache(&inode->i_data); this_cpu_dec(nr_inodes); } EXPORT_SYMBOL(__destroy_inode); @@ -360,6 +361,8 @@ void address_space_init_once(struct address_space *mapping) spin_lock_init(&mapping->private_lock); INIT_RAW_PRIO_TREE_ROOT(&mapping->i_mmap); INIT_LIST_HEAD(&mapping->i_mmap_nonlinear); + INIT_KINTERVAL_TREE_ROOT(&mapping->nocache_tree); + rwlock_init(&mapping->nocache_lock); } EXPORT_SYMBOL(address_space_init_once); diff --git a/include/linux/fs.h b/include/linux/fs.h index 386da09..624a73e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -9,6 +9,7 @@ #include <linux/limits.h> #include <linux/ioctl.h> #include <linux/blk_types.h> +#include <linux/kinterval.h> #include <linux/types.h> /* @@ -521,6 +522,11 @@ enum positive_aop_returns { * helper code (eg buffer layer) * to clear GFP_FS from alloc */ +enum filemap_cache_modes { + FILEMAP_CACHE_NORMAL, /* No special cache behavior */ + FILEMAP_CACHE_ONCE, /* Pages will be used once */ +}; + /* * oh the beauties of C type declarations. */ @@ -655,6 +661,8 @@ struct address_space { spinlock_t private_lock; /* for use by the address_space */ struct list_head private_list; /* ditto */ struct address_space *assoc_mapping; /* ditto */ + struct rb_root nocache_tree; /* noreuse cache range tree */ + rwlock_t nocache_lock; /* protect the nocache_tree */ } __attribute__((aligned(sizeof(long)))); /* * On most architectures that alignment is already the case; but @@ -2189,6 +2197,10 @@ extern int invalidate_inode_pages2(struct address_space *mapping); extern int invalidate_inode_pages2_range(struct address_space *mapping, pgoff_t start, pgoff_t end); extern int write_inode_now(struct inode *, int); +extern void filemap_clear_cache(struct address_space *mapping); +extern int filemap_set_cache(struct address_space *mapping, + pgoff_t start, pgoff_t end, int mode); +extern int filemap_get_cache(struct address_space *mapping, pgoff_t index); extern int filemap_fdatawrite(struct address_space *); extern int filemap_flush(struct address_space *); extern int filemap_fdatawait(struct address_space *); diff --git a/mm/fadvise.c b/mm/fadvise.c index 469491e..22b1aa8 100644 --- a/mm/fadvise.c +++ b/mm/fadvise.c @@ -80,6 +80,12 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice) spin_lock(&file->f_lock); file->f_mode &= ~FMODE_RANDOM; spin_unlock(&file->f_lock); + + start_index = offset >> PAGE_CACHE_SHIFT; + end_index = endbyte >> PAGE_CACHE_SHIFT; + + ret = filemap_set_cache(mapping, start_index, end_index, + FILEMAP_CACHE_NORMAL); break; case POSIX_FADV_RANDOM: spin_lock(&file->f_lock); @@ -102,11 +108,16 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice) start_index = offset >> PAGE_CACHE_SHIFT; end_index = endbyte >> PAGE_CACHE_SHIFT; + ret = filemap_set_cache(mapping, start_index, end_index, + FILEMAP_CACHE_NORMAL); + if (ret < 0) + break; + /* Careful about overflow on the "+1" */ nrpages = end_index - start_index + 1; if (!nrpages) nrpages = ~0UL; - + ret = force_page_cache_readahead(mapping, file, start_index, nrpages); @@ -114,6 +125,11 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice) ret = 0; break; case POSIX_FADV_NOREUSE: + start_index = offset >> PAGE_CACHE_SHIFT; + end_index = endbyte >> PAGE_CACHE_SHIFT; + + ret = filemap_set_cache(mapping, start_index, end_index, + FILEMAP_CACHE_ONCE); break; case POSIX_FADV_DONTNEED: if (!bdi_write_congested(mapping->backing_dev_info)) diff --git a/mm/filemap.c b/mm/filemap.c index b662757..610502a 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -27,6 +27,7 @@ #include <linux/writeback.h> #include <linux/backing-dev.h> #include <linux/pagevec.h> +#include <linux/kinterval.h> #include <linux/blkdev.h> #include <linux/security.h> #include <linux/syscalls.h> @@ -187,6 +188,82 @@ static int sleep_on_page_killable(void *word) } /** + * filemap_clear_cache - clear all special cache settings + * @mapping: target address space structure + * + * Reset all the special file cache settings previously defined by + * filemap_set_cache(). + */ +void filemap_clear_cache(struct address_space *mapping) +{ + write_lock(&mapping->nocache_lock); + kinterval_clear(&mapping->nocache_tree); + write_unlock(&mapping->nocache_lock); +} + +/** + * filemap_set_cache - set special cache behavior for a range of file pages + * @mapping: target address space structure + * @start: offset in pages where the range starts + * @end: offset in pages where the range ends (inclusive) + * @mode: cache behavior configuration + * + * This can be used to define special cache behavior in advance, before + * accessing the data. + * + * At the moment the supported cache behaviors are the following (see also + * filemap_cache_modes): + * + * FILEMAP_CACHE_NORMAL: normal page cache behavior; + * + * FILEMAP_CACHE_ONCE: specifies that the pages will be accessed once and the + * caller don't expect to reuse it thereafter. This prevents them for being + * promoted to the active lru list. + */ +int filemap_set_cache(struct address_space *mapping, + pgoff_t start, pgoff_t end, int mode) +{ + int ret; + + write_lock(&mapping->nocache_lock); + switch (mode) { + case FILEMAP_CACHE_NORMAL: + ret = kinterval_del(&mapping->nocache_tree, + start, end, GFP_KERNEL); + break; + case FILEMAP_CACHE_ONCE: + ret = kinterval_add(&mapping->nocache_tree, + start, end, mode, GFP_KERNEL); + break; + default: + ret = -EINVAL; + break; + } + write_unlock(&mapping->nocache_lock); + + return ret; +} + +/** + * filemap_get_cache - get special cache behavior of a file page + * @mapping: target address space structure + * @index: index of the page inside the address space + * + * If no special cache behavior are defined FILEMAP_CACHE_NORMAL is returned + * (that means no special page cache behavior is applied). + */ +int filemap_get_cache(struct address_space *mapping, pgoff_t index) +{ + int mode; + + read_lock(&mapping->nocache_lock); + mode = kinterval_lookup(&mapping->nocache_tree, index); + read_unlock(&mapping->nocache_lock); + + return mode < 0 ? FILEMAP_CACHE_NORMAL : mode; +} + +/** * __filemap_fdatawrite_range - start writeback on mapping dirty pages in range * @mapping: address space structure to write * @start: offset in bytes where the range starts @@ -1181,8 +1258,22 @@ page_ok: * When a sequential read accesses a page several times, * only mark it as accessed the first time. */ - if (prev_index != index || offset != prev_offset) - mark_page_accessed(page); + if (prev_index != index || offset != prev_offset) { + int mode; + + mode = filemap_get_cache(mapping, index); + switch (mode) { + case FILEMAP_CACHE_NORMAL: + mark_page_accessed(page); + break; + case FILEMAP_CACHE_ONCE: + mark_page_usedonce(page); + break; + default: + WARN_ON_ONCE(1); + break; + } + } prev_index = index; /* -- 1.7.5.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-12 0:21 ` [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE Andrea Righi @ 2012-02-13 16:22 ` KOSAKI Motohiro 2012-02-13 16:22 ` KOSAKI Motohiro ` (3 subsequent siblings) 4 siblings, 0 replies; 27+ messages in thread From: KOSAKI Motohiro @ 2012-02-13 16:22 UTC (permalink / raw) To: Andrea Righi Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML > @@ -1181,8 +1258,22 @@ page_ok: > * When a sequential read accesses a page several times, > * only mark it as accessed the first time. > */ > - if (prev_index != index || offset != prev_offset) > - mark_page_accessed(page); > + if (prev_index != index || offset != prev_offset) { > + int mode; > + > + mode = filemap_get_cache(mapping, index); > + switch (mode) { > + case FILEMAP_CACHE_NORMAL: > + mark_page_accessed(page); > + break; > + case FILEMAP_CACHE_ONCE: > + mark_page_usedonce(page); > + break; > + default: > + WARN_ON_ONCE(1); > + break; Here is generic_file_read, right? Why don't you care write and page fault? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-12 0:21 ` [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE Andrea Righi 2012-02-13 16:22 ` KOSAKI Motohiro @ 2012-02-13 16:22 ` KOSAKI Motohiro [not found] ` <CAHGf_=qs8-nE6y6EzNYUzgjGo0sMP5zvCc3=GNZmHct6mPecqg@mail.gmail.com> ` (2 subsequent siblings) 4 siblings, 0 replies; 27+ messages in thread From: KOSAKI Motohiro @ 2012-02-13 16:22 UTC (permalink / raw) To: Andrea Righi Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML > @@ -1181,8 +1258,22 @@ page_ok: > * When a sequential read accesses a page several times, > * only mark it as accessed the first time. > */ > - if (prev_index != index || offset != prev_offset) > - mark_page_accessed(page); > + if (prev_index != index || offset != prev_offset) { > + int mode; > + > + mode = filemap_get_cache(mapping, index); > + switch (mode) { > + case FILEMAP_CACHE_NORMAL: > + mark_page_accessed(page); > + break; > + case FILEMAP_CACHE_ONCE: > + mark_page_usedonce(page); > + break; > + default: > + WARN_ON_ONCE(1); > + break; Here is generic_file_read, right? Why don't you care write and page fault? ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <CAHGf_=qs8-nE6y6EzNYUzgjGo0sMP5zvCc3=GNZmHct6mPecqg@mail.gmail.com>]
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE [not found] ` <CAHGf_=qs8-nE6y6EzNYUzgjGo0sMP5zvCc3=GNZmHct6mPecqg@mail.gmail.com> @ 2012-02-13 18:00 ` Andrea Righi 0 siblings, 0 replies; 27+ messages in thread From: Andrea Righi @ 2012-02-13 18:00 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On Mon, Feb 13, 2012 at 11:22:26AM -0500, KOSAKI Motohiro wrote: > > @@ -1181,8 +1258,22 @@ page_ok: > > * When a sequential read accesses a page several times, > > * only mark it as accessed the first time. > > */ > > - if (prev_index != index || offset != prev_offset) > > - mark_page_accessed(page); > > + if (prev_index != index || offset != prev_offset) { > > + int mode; > > + > > + mode = filemap_get_cache(mapping, index); > > + switch (mode) { > > + case FILEMAP_CACHE_NORMAL: > > + mark_page_accessed(page); > > + break; > > + case FILEMAP_CACHE_ONCE: > > + mark_page_usedonce(page); > > + break; > > + default: > > + WARN_ON_ONCE(1); > > + break; > > Here is generic_file_read, right? Why don't you care write and page fault? That's correct. I focused in my read test case and have not consider write and page fault at all yet. There's also another missing piece probably: readahead. About generic_file_read the behavior that we may want to provide looks quite clear to me. Instead, I don't know which is the best behavior for the NOREUSE writes... should we just avoid active lru list eligibility, or also drop the pages after the write if they weren't present in page cache before the write? In the former NOREUSE pages can still trash pages in the inactive lru list, in the latter writes will be slow because we need to wait for the writeback. Ideas/suggestions? About readahead pages I think we shouldn't touch anything. IIUC, when readahead pages are loaded in page cache for the first time they are added to the inactive lru list and not marked as referenced. If the readahead pages are also referenced by the application and they're inside a NOREUSE range they won't be marked as referenced and will continue to live in the inactive list (and dropped when the inactive list is shrunk). If readahead pages are not inside a NOREUSE range they will be treated as usual (marked as referenced, and moved to the active list if they're accessed more than once). Thanks, -Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-12 0:21 ` [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE Andrea Righi ` (2 preceding siblings ...) [not found] ` <CAHGf_=qs8-nE6y6EzNYUzgjGo0sMP5zvCc3=GNZmHct6mPecqg@mail.gmail.com> @ 2012-02-15 23:35 ` Arun Sharma 2012-02-15 23:47 ` Andrea Righi 2012-02-27 2:33 ` KAMEZAWA Hiroyuki 4 siblings, 1 reply; 27+ messages in thread From: Arun Sharma @ 2012-02-15 23:35 UTC (permalink / raw) To: Andrea Righi Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On Sun, Feb 12, 2012 at 01:21:38AM +0100, Andrea Righi wrote: > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 386da09..624a73e 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -9,6 +9,7 @@ > #include <linux/limits.h> > #include <linux/ioctl.h> > #include <linux/blk_types.h> > +#include <linux/kinterval.h> > #include <linux/types.h> fs.h is an exported header file, whereas kinterval.h is not. So this fails scripts/header_check.pl. I used the workaround below. -Arun diff --git a/fs/inode.c b/fs/inode.c index d27dbee..1335a5f 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -26,6 +26,7 @@ #include <linux/ima.h> #include <linux/cred.h> #include <linux/buffer_head.h> /* for inode_has_buffers */ +#include <linux/kinterval.h> #include "internal.h" /* @@ -279,7 +280,7 @@ void address_space_init_once(struct address_space *mapping) spin_lock_init(&mapping->private_lock); INIT_RAW_PRIO_TREE_ROOT(&mapping->i_mmap); INIT_LIST_HEAD(&mapping->i_mmap_nonlinear); - INIT_KINTERVAL_TREE_ROOT(&mapping->nocache_tree); + INIT_KINTERVAL_TREE_ROOT((struct rb_root *) &mapping->nocache_tree); rwlock_init(&mapping->nocache_lock); } EXPORT_SYMBOL(address_space_init_once); diff --git a/include/linux/fs.h b/include/linux/fs.h index 74b6a97..b4e45e6 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -9,7 +9,6 @@ #include <linux/limits.h> #include <linux/ioctl.h> #include <linux/blk_types.h> -#include <linux/kinterval.h> #include <linux/types.h> /* @@ -656,7 +655,7 @@ struct address_space { spinlock_t private_lock; /* for use by the address_space */ struct list_head private_list; /* ditto */ struct address_space *assoc_mapping; /* ditto */ - struct rb_root nocache_tree; /* noreuse cache range tree */ + void *nocache_tree; /* noreuse cache range tree */ rwlock_t nocache_lock; /* protect the nocache_tree */ } __attribute__((aligned(sizeof(long)))); /* -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-15 23:35 ` Arun Sharma @ 2012-02-15 23:47 ` Andrea Righi 2012-02-15 23:57 ` Arun Sharma 0 siblings, 1 reply; 27+ messages in thread From: Andrea Righi @ 2012-02-15 23:47 UTC (permalink / raw) To: Arun Sharma Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On Wed, Feb 15, 2012 at 03:35:37PM -0800, Arun Sharma wrote: > On Sun, Feb 12, 2012 at 01:21:38AM +0100, Andrea Righi wrote: > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > index 386da09..624a73e 100644 > > --- a/include/linux/fs.h > > +++ b/include/linux/fs.h > > @@ -9,6 +9,7 @@ > > #include <linux/limits.h> > > #include <linux/ioctl.h> > > #include <linux/blk_types.h> > > +#include <linux/kinterval.h> > > #include <linux/types.h> > > fs.h is an exported header file, whereas kinterval.h is not. So this > fails scripts/header_check.pl. > > I used the workaround below. > > -Arun > > diff --git a/fs/inode.c b/fs/inode.c > index d27dbee..1335a5f 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -26,6 +26,7 @@ > #include <linux/ima.h> > #include <linux/cred.h> > #include <linux/buffer_head.h> /* for inode_has_buffers */ > +#include <linux/kinterval.h> > #include "internal.h" > > /* > @@ -279,7 +280,7 @@ void address_space_init_once(struct address_space *mapping) > spin_lock_init(&mapping->private_lock); > INIT_RAW_PRIO_TREE_ROOT(&mapping->i_mmap); > INIT_LIST_HEAD(&mapping->i_mmap_nonlinear); > - INIT_KINTERVAL_TREE_ROOT(&mapping->nocache_tree); > + INIT_KINTERVAL_TREE_ROOT((struct rb_root *) &mapping->nocache_tree); > rwlock_init(&mapping->nocache_lock); > } > EXPORT_SYMBOL(address_space_init_once); > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 74b6a97..b4e45e6 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -9,7 +9,6 @@ > #include <linux/limits.h> > #include <linux/ioctl.h> > #include <linux/blk_types.h> > -#include <linux/kinterval.h> > #include <linux/types.h> > > /* > @@ -656,7 +655,7 @@ struct address_space { > spinlock_t private_lock; /* for use by the address_space */ > struct list_head private_list; /* ditto */ > struct address_space *assoc_mapping; /* ditto */ > - struct rb_root nocache_tree; /* noreuse cache range tree */ > + void *nocache_tree; /* noreuse cache range tree */ > rwlock_t nocache_lock; /* protect the nocache_tree */ > } __attribute__((aligned(sizeof(long)))); > /* mmh.. a forward declaration of rb_root in fs.h shouldn't be better than this? Thanks, -Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-15 23:47 ` Andrea Righi @ 2012-02-15 23:57 ` Arun Sharma 2012-02-16 0:56 ` Andrea Righi 0 siblings, 1 reply; 27+ messages in thread From: Arun Sharma @ 2012-02-15 23:57 UTC (permalink / raw) To: Andrea Righi Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On 2/15/12 3:47 PM, Andrea Righi wrote: >> index 74b6a97..b4e45e6 100644 >> --- a/include/linux/fs.h >> +++ b/include/linux/fs.h >> @@ -9,7 +9,6 @@ >> #include<linux/limits.h> >> #include<linux/ioctl.h> >> #include<linux/blk_types.h> >> -#include<linux/kinterval.h> >> #include<linux/types.h> >> >> /* >> @@ -656,7 +655,7 @@ struct address_space { >> spinlock_t private_lock; /* for use by the address_space */ >> struct list_head private_list; /* ditto */ >> struct address_space *assoc_mapping; /* ditto */ >> - struct rb_root nocache_tree; /* noreuse cache range tree */ >> + void *nocache_tree; /* noreuse cache range tree */ >> rwlock_t nocache_lock; /* protect the nocache_tree */ >> } __attribute__((aligned(sizeof(long)))); >> /* > > mmh.. a forward declaration of rb_root in fs.h shouldn't be better than > this? > Forward declaration works if the type was struct rb_root *. But the type in your patch was a struct and the compiler can't figure out its size. include/linux/fs.h:659:17: error: field ‘nocache_tree’ has incomplete type Did you mean forward declaring struct rb_node instead of rb_root? If we go down this path, a few more places need fixups (I ignored the compiler warnings about casting void * to struct rb_root *). -Arun -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-15 23:57 ` Arun Sharma @ 2012-02-16 0:56 ` Andrea Righi 2012-02-16 2:10 ` Arun Sharma 0 siblings, 1 reply; 27+ messages in thread From: Andrea Righi @ 2012-02-16 0:56 UTC (permalink / raw) To: Arun Sharma Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On Wed, Feb 15, 2012 at 03:57:47PM -0800, Arun Sharma wrote: > > > On 2/15/12 3:47 PM, Andrea Righi wrote: > >>index 74b6a97..b4e45e6 100644 > >>--- a/include/linux/fs.h > >>+++ b/include/linux/fs.h > >>@@ -9,7 +9,6 @@ > >> #include<linux/limits.h> > >> #include<linux/ioctl.h> > >> #include<linux/blk_types.h> > >>-#include<linux/kinterval.h> > >> #include<linux/types.h> > >> > >> /* > >>@@ -656,7 +655,7 @@ struct address_space { > >> spinlock_t private_lock; /* for use by the address_space */ > >> struct list_head private_list; /* ditto */ > >> struct address_space *assoc_mapping; /* ditto */ > >>- struct rb_root nocache_tree; /* noreuse cache range tree */ > >>+ void *nocache_tree; /* noreuse cache range tree */ > >> rwlock_t nocache_lock; /* protect the nocache_tree */ > >> } __attribute__((aligned(sizeof(long)))); > >> /* > > > >mmh.. a forward declaration of rb_root in fs.h shouldn't be better than > >this? > > > > Forward declaration works if the type was struct rb_root *. But the > type in your patch was a struct and the compiler can't figure out > its size. > > include/linux/fs.h:659:17: error: field ‘nocache_tree’ has incomplete type > > Did you mean forward declaring struct rb_node instead of rb_root? > > If we go down this path, a few more places need fixups (I ignored > the compiler warnings about casting void * to struct rb_root *). > > -Arun Oh sorry, you're right! nocache_tree is not a pointer inside address_space, so the compiler must know the size. mmh... move the definition of the rb_root struct in linux/types.h? or simply use a rb_root pointer. The (void *) looks a bit scary and too bug prone. -Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-16 0:56 ` Andrea Righi @ 2012-02-16 2:10 ` Arun Sharma 2012-02-16 10:39 ` Andrea Righi 0 siblings, 1 reply; 27+ messages in thread From: Arun Sharma @ 2012-02-16 2:10 UTC (permalink / raw) To: Andrea Righi Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On 2/15/12 4:56 PM, Andrea Righi wrote: > Oh sorry, you're right! nocache_tree is not a pointer inside > address_space, so the compiler must know the size. > > mmh... move the definition of the rb_root struct in linux/types.h? or > simply use a rb_root pointer. The (void *) looks a bit scary and too bug > prone. Either way is fine. I did some black box testing of the patch (comparing noreuse vs dontneed) and it behaves as expected. On a file copy, neither one pollutes the page cache. But if I run a random read benchmark on the source file right before and afterwards, page cache is warm with noreuse, but cold with dontneed. Copy performance was unaffected. I can't really comment on the implementation details since I haven't reviewed it, but the functionality sounds useful. -Arun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-16 2:10 ` Arun Sharma @ 2012-02-16 10:39 ` Andrea Righi 2012-02-16 18:43 ` Arun Sharma 0 siblings, 1 reply; 27+ messages in thread From: Andrea Righi @ 2012-02-16 10:39 UTC (permalink / raw) To: Arun Sharma Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On Wed, Feb 15, 2012 at 06:10:28PM -0800, Arun Sharma wrote: > On 2/15/12 4:56 PM, Andrea Righi wrote: > > >Oh sorry, you're right! nocache_tree is not a pointer inside > >address_space, so the compiler must know the size. > > > >mmh... move the definition of the rb_root struct in linux/types.h? or > >simply use a rb_root pointer. The (void *) looks a bit scary and too bug > >prone. > > Either way is fine. I did some black box testing of the patch > (comparing noreuse vs dontneed) and it behaves as expected. > > On a file copy, neither one pollutes the page cache. But if I run a > random read benchmark on the source file right before and > afterwards, page cache is warm with noreuse, but cold with dontneed. > Copy performance was unaffected. > > I can't really comment on the implementation details since I haven't > reviewed it, but the functionality sounds useful. > > -Arun Arun, thank you very much for your review and testing. Probably we'll move to a different, memcg-based solution, so I don't think I'll post another version of this patch set as is. In case, I'll apply one of the workarounds for the rb_root attribute. Thanks, -Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-16 10:39 ` Andrea Righi @ 2012-02-16 18:43 ` Arun Sharma 2012-02-16 18:57 ` Andrea Righi 0 siblings, 1 reply; 27+ messages in thread From: Arun Sharma @ 2012-02-16 18:43 UTC (permalink / raw) To: Andrea Righi Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On 2/16/12 2:39 AM, Andrea Righi wrote: > Arun, thank you very much for your review and testing. Probably we'll > move to a different, memcg-based solution, so I don't think I'll post > another version of this patch set as is. In case, I'll apply one of > the workarounds for the rb_root attribute. I'm not sure if the proposed memory.file.limit_in_bytes is the right interface. Two problems: * The user is now required to figure out what is the right amount of page cache for the app (may change over time) * If the app touches two sets of files, one with streaming access and the other which benefits from page cache (eg: a mapper task in a map reduce), memcg doesn't allow the user to specify the access pattern per-fd. -Arun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-16 18:43 ` Arun Sharma @ 2012-02-16 18:57 ` Andrea Righi 2012-02-16 19:07 ` Arun Sharma 0 siblings, 1 reply; 27+ messages in thread From: Andrea Righi @ 2012-02-16 18:57 UTC (permalink / raw) To: Arun Sharma Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On Thu, Feb 16, 2012 at 10:43:00AM -0800, Arun Sharma wrote: > On 2/16/12 2:39 AM, Andrea Righi wrote: > > >Arun, thank you very much for your review and testing. Probably we'll > >move to a different, memcg-based solution, so I don't think I'll post > >another version of this patch set as is. In case, I'll apply one of > >the workarounds for the rb_root attribute. > > I'm not sure if the proposed memory.file.limit_in_bytes is the right > interface. Two problems: > > * The user is now required to figure out what is the right amount of > page cache for the app (may change over time) Right. > > * If the app touches two sets of files, one with streaming access > and the other which benefits from page cache (eg: a mapper task in a > map reduce), memcg doesn't allow the user to specify the access > pattern per-fd. Yes, of course the memcg approach is probably too coarse-grained for certain apps. If we want to provide the per-fd granularity the fadvise() solution is a way better. However, the memcg solution could be enough to resolve most of the common data streaming issues (like "the backup is trashing the page cache" problem) and it doesn't require modification of the application source code. This is an important advantage that we shouldn't ignore IMHO, because it means that the new feature will be available _immediately_ for any application. Maybe we should try to push ...something... in the memcg code for the short-term future, make it as much generic as possible, and for the long-term try to reuse the same feature (totally or in part) in the per-fd approach via fadvise(). Thanks, -Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-16 18:57 ` Andrea Righi @ 2012-02-16 19:07 ` Arun Sharma 0 siblings, 0 replies; 27+ messages in thread From: Arun Sharma @ 2012-02-16 19:07 UTC (permalink / raw) To: Andrea Righi Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On 2/16/12 10:57 AM, Andrea Righi wrote: > Maybe we should try to push ...something... in the memcg code for the > short-term future, make it as much generic as possible, and for the > long-term try to reuse the same feature (totally or in part) in the > per-fd approach via fadvise(). Yes - the two approaches are complementary and we should probably pursue both. There are a number of apps which are already using fadvise though: https://issues.apache.org/jira/browse/MAPREDUCE-3289 http://highscalability.com/blog/2012/1/12/peregrine-a-map-reduce-framework-for-iterative-and-pipelined.html and probably many other similar cases that are not open source. Some of these apps may be better off using NOREUSE instead of DONTNEED, since they may not have a clue on what else is going on in the system. The way I think about it: NOREUSE is a statement about what my process is doing and DONTNEED is a statement about the entire system. -Arun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-12 0:21 ` [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE Andrea Righi ` (3 preceding siblings ...) 2012-02-15 23:35 ` Arun Sharma @ 2012-02-27 2:33 ` KAMEZAWA Hiroyuki 2012-02-27 10:46 ` Andrea Righi 4 siblings, 1 reply; 27+ messages in thread From: KAMEZAWA Hiroyuki @ 2012-02-27 2:33 UTC (permalink / raw) To: Andrea Righi Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On Sun, 12 Feb 2012 01:21:38 +0100 Andrea Righi <andrea@betterlinux.com> wrote: > According to the POSIX standard the POSIX_FADV_NOREUSE hint means that > the application expects to access the specified data once and then not > reuse it thereafter. > > It seems that the expected behavior is to implement a drop-behind > policy where the application can set certain intervals of a file as > FADV_NOREUSE _before_ accessing the data. > > An interesting usage of this hint is to guarantee that pages marked as > FADV_NOREUSE will never blow away the pages of the current working set. > > A possible solution to satisfy this requirement is to prevent lru > activation of the pages marked as FADV_NOREUSE, in other words, never > add pages marked as FADV_NOREUSE to the active lru list. Moreover, all > the file cache pages in a FADV_NOREUSE range can be immediately dropped > after a read if the page was not present in the file cache before. > > In general, the purpose of this approach is to preserve as much as > possible the previous state of the file cache memory before reading data > in ranges marked by FADV_NOREUSE. > > All the pages read before (pre-)setting them as FADV_NOREUSE should be > treated as normal, so they can be added to the active lru list as usual > if they're accessed multiple times. > > Only after setting them as FADV_NOREUSE we can prevent them for being > promoted to the active lru list. If they are already in the active lru > list before calling FADV_NOREUSE we should keep them there, but if they > quit from the active list they can't get back anymore (except by > explicitly setting a different caching hint). > >From this part, it seems the behavior of systemcall is highly depends on interanal kernel implemenatation... > To achieve this goal we need to maintain the list of file ranges marked > as FADV_NOREUSE until the pages are dropped from the page cache, or the > inode is deleted, or they're explicitly marked to use a different cache > behavior (FADV_NORMAL | FADV_WILLNEED). > > The list of FADV_NOREUSE ranges is maintained in the address_space > structure using an interval tree (kinterval). > > Signed-off-by: Andrea Righi <andrea@betterlinux.com> Once an appliation sets a range of file as FILEMAP_CACHE_ONCE, the effects will last until the inode is dropped....right ? Won't this cause troubles which cannot be detected (because kinterval information is hidden.) ? I'm not sure but FADV_NOREUSE seems like one-shot call and should not have very long time of effect (after the application exits.) Can't we ties the liftime of kinteval to the application/file descriptor ? Thanks, -Kame > --- > fs/inode.c | 3 ++ > include/linux/fs.h | 12 ++++++ > mm/fadvise.c | 18 +++++++++- > mm/filemap.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++- > 4 files changed, 125 insertions(+), 3 deletions(-) > > diff --git a/fs/inode.c b/fs/inode.c > index fb10d86..6375163 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -254,6 +254,7 @@ void __destroy_inode(struct inode *inode) > if (inode->i_default_acl && inode->i_default_acl != ACL_NOT_CACHED) > posix_acl_release(inode->i_default_acl); > #endif > + filemap_clear_cache(&inode->i_data); > this_cpu_dec(nr_inodes); > } > EXPORT_SYMBOL(__destroy_inode); > @@ -360,6 +361,8 @@ void address_space_init_once(struct address_space *mapping) > spin_lock_init(&mapping->private_lock); > INIT_RAW_PRIO_TREE_ROOT(&mapping->i_mmap); > INIT_LIST_HEAD(&mapping->i_mmap_nonlinear); > + INIT_KINTERVAL_TREE_ROOT(&mapping->nocache_tree); > + rwlock_init(&mapping->nocache_lock); > } > EXPORT_SYMBOL(address_space_init_once); > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 386da09..624a73e 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -9,6 +9,7 @@ > #include <linux/limits.h> > #include <linux/ioctl.h> > #include <linux/blk_types.h> > +#include <linux/kinterval.h> > #include <linux/types.h> > > /* > @@ -521,6 +522,11 @@ enum positive_aop_returns { > * helper code (eg buffer layer) > * to clear GFP_FS from alloc */ > > +enum filemap_cache_modes { > + FILEMAP_CACHE_NORMAL, /* No special cache behavior */ > + FILEMAP_CACHE_ONCE, /* Pages will be used once */ > +}; > + > /* > * oh the beauties of C type declarations. > */ > @@ -655,6 +661,8 @@ struct address_space { > spinlock_t private_lock; /* for use by the address_space */ > struct list_head private_list; /* ditto */ > struct address_space *assoc_mapping; /* ditto */ > + struct rb_root nocache_tree; /* noreuse cache range tree */ > + rwlock_t nocache_lock; /* protect the nocache_tree */ > } __attribute__((aligned(sizeof(long)))); > /* > * On most architectures that alignment is already the case; but > @@ -2189,6 +2197,10 @@ extern int invalidate_inode_pages2(struct address_space *mapping); > extern int invalidate_inode_pages2_range(struct address_space *mapping, > pgoff_t start, pgoff_t end); > extern int write_inode_now(struct inode *, int); > +extern void filemap_clear_cache(struct address_space *mapping); > +extern int filemap_set_cache(struct address_space *mapping, > + pgoff_t start, pgoff_t end, int mode); > +extern int filemap_get_cache(struct address_space *mapping, pgoff_t index); > extern int filemap_fdatawrite(struct address_space *); > extern int filemap_flush(struct address_space *); > extern int filemap_fdatawait(struct address_space *); > diff --git a/mm/fadvise.c b/mm/fadvise.c > index 469491e..22b1aa8 100644 > --- a/mm/fadvise.c > +++ b/mm/fadvise.c > @@ -80,6 +80,12 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice) > spin_lock(&file->f_lock); > file->f_mode &= ~FMODE_RANDOM; > spin_unlock(&file->f_lock); > + > + start_index = offset >> PAGE_CACHE_SHIFT; > + end_index = endbyte >> PAGE_CACHE_SHIFT; > + > + ret = filemap_set_cache(mapping, start_index, end_index, > + FILEMAP_CACHE_NORMAL); > break; > case POSIX_FADV_RANDOM: > spin_lock(&file->f_lock); > @@ -102,11 +108,16 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice) > start_index = offset >> PAGE_CACHE_SHIFT; > end_index = endbyte >> PAGE_CACHE_SHIFT; > > + ret = filemap_set_cache(mapping, start_index, end_index, > + FILEMAP_CACHE_NORMAL); > + if (ret < 0) > + break; > + > /* Careful about overflow on the "+1" */ > nrpages = end_index - start_index + 1; > if (!nrpages) > nrpages = ~0UL; > - > + > ret = force_page_cache_readahead(mapping, file, > start_index, > nrpages); > @@ -114,6 +125,11 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice) > ret = 0; > break; > case POSIX_FADV_NOREUSE: > + start_index = offset >> PAGE_CACHE_SHIFT; > + end_index = endbyte >> PAGE_CACHE_SHIFT; > + > + ret = filemap_set_cache(mapping, start_index, end_index, > + FILEMAP_CACHE_ONCE); > break; > case POSIX_FADV_DONTNEED: > if (!bdi_write_congested(mapping->backing_dev_info)) > diff --git a/mm/filemap.c b/mm/filemap.c > index b662757..610502a 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -27,6 +27,7 @@ > #include <linux/writeback.h> > #include <linux/backing-dev.h> > #include <linux/pagevec.h> > +#include <linux/kinterval.h> > #include <linux/blkdev.h> > #include <linux/security.h> > #include <linux/syscalls.h> > @@ -187,6 +188,82 @@ static int sleep_on_page_killable(void *word) > } > > /** > + * filemap_clear_cache - clear all special cache settings > + * @mapping: target address space structure > + * > + * Reset all the special file cache settings previously defined by > + * filemap_set_cache(). > + */ > +void filemap_clear_cache(struct address_space *mapping) > +{ > + write_lock(&mapping->nocache_lock); > + kinterval_clear(&mapping->nocache_tree); > + write_unlock(&mapping->nocache_lock); > +} > + > +/** > + * filemap_set_cache - set special cache behavior for a range of file pages > + * @mapping: target address space structure > + * @start: offset in pages where the range starts > + * @end: offset in pages where the range ends (inclusive) > + * @mode: cache behavior configuration > + * > + * This can be used to define special cache behavior in advance, before > + * accessing the data. > + * > + * At the moment the supported cache behaviors are the following (see also > + * filemap_cache_modes): > + * > + * FILEMAP_CACHE_NORMAL: normal page cache behavior; > + * > + * FILEMAP_CACHE_ONCE: specifies that the pages will be accessed once and the > + * caller don't expect to reuse it thereafter. This prevents them for being > + * promoted to the active lru list. > + */ > +int filemap_set_cache(struct address_space *mapping, > + pgoff_t start, pgoff_t end, int mode) > +{ > + int ret; > + > + write_lock(&mapping->nocache_lock); > + switch (mode) { > + case FILEMAP_CACHE_NORMAL: > + ret = kinterval_del(&mapping->nocache_tree, > + start, end, GFP_KERNEL); > + break; > + case FILEMAP_CACHE_ONCE: > + ret = kinterval_add(&mapping->nocache_tree, > + start, end, mode, GFP_KERNEL); > + break; > + default: > + ret = -EINVAL; > + break; > + } > + write_unlock(&mapping->nocache_lock); > + > + return ret; > +} > + > +/** > + * filemap_get_cache - get special cache behavior of a file page > + * @mapping: target address space structure > + * @index: index of the page inside the address space > + * > + * If no special cache behavior are defined FILEMAP_CACHE_NORMAL is returned > + * (that means no special page cache behavior is applied). > + */ > +int filemap_get_cache(struct address_space *mapping, pgoff_t index) > +{ > + int mode; > + > + read_lock(&mapping->nocache_lock); > + mode = kinterval_lookup(&mapping->nocache_tree, index); > + read_unlock(&mapping->nocache_lock); > + > + return mode < 0 ? FILEMAP_CACHE_NORMAL : mode; > +} > + > +/** > * __filemap_fdatawrite_range - start writeback on mapping dirty pages in range > * @mapping: address space structure to write > * @start: offset in bytes where the range starts > @@ -1181,8 +1258,22 @@ page_ok: > * When a sequential read accesses a page several times, > * only mark it as accessed the first time. > */ > - if (prev_index != index || offset != prev_offset) > - mark_page_accessed(page); > + if (prev_index != index || offset != prev_offset) { > + int mode; > + > + mode = filemap_get_cache(mapping, index); > + switch (mode) { > + case FILEMAP_CACHE_NORMAL: > + mark_page_accessed(page); > + break; > + case FILEMAP_CACHE_ONCE: > + mark_page_usedonce(page); > + break; > + default: > + WARN_ON_ONCE(1); > + break; > + } > + } > prev_index = index; > > /* > -- > 1.7.5.4 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE 2012-02-27 2:33 ` KAMEZAWA Hiroyuki @ 2012-02-27 10:46 ` Andrea Righi 0 siblings, 0 replies; 27+ messages in thread From: Andrea Righi @ 2012-02-27 10:46 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On Mon, Feb 27, 2012 at 11:33:38AM +0900, KAMEZAWA Hiroyuki wrote: > On Sun, 12 Feb 2012 01:21:38 +0100 > Andrea Righi <andrea@betterlinux.com> wrote: > > > According to the POSIX standard the POSIX_FADV_NOREUSE hint means that > > the application expects to access the specified data once and then not > > reuse it thereafter. > > > > It seems that the expected behavior is to implement a drop-behind > > policy where the application can set certain intervals of a file as > > FADV_NOREUSE _before_ accessing the data. > > > > An interesting usage of this hint is to guarantee that pages marked as > > FADV_NOREUSE will never blow away the pages of the current working set. > > > > A possible solution to satisfy this requirement is to prevent lru > > activation of the pages marked as FADV_NOREUSE, in other words, never > > add pages marked as FADV_NOREUSE to the active lru list. Moreover, all > > the file cache pages in a FADV_NOREUSE range can be immediately dropped > > after a read if the page was not present in the file cache before. > > > > In general, the purpose of this approach is to preserve as much as > > possible the previous state of the file cache memory before reading data > > in ranges marked by FADV_NOREUSE. > > > > All the pages read before (pre-)setting them as FADV_NOREUSE should be > > treated as normal, so they can be added to the active lru list as usual > > if they're accessed multiple times. > > > > Only after setting them as FADV_NOREUSE we can prevent them for being > > promoted to the active lru list. If they are already in the active lru > > list before calling FADV_NOREUSE we should keep them there, but if they > > quit from the active list they can't get back anymore (except by > > explicitly setting a different caching hint). > > > > >From this part, it seems the behavior of systemcall is highly depends on > interanal kernel implemenatation... Yes. If in a future kernel we'll decide to remove the active/inactive lru lists we also need to change the implementation of FADV_NOREUSE. However, I think that any solution for a feature that allows to not disturb the state of the page cache has inevitably something dependent on internal kernel implementation... Probably a more generic concept to document is that FADV_NOREUSE is an advice from the application to never consider the marked pages as part of the working set (for any possible meaning/implementation of "working set"). > > > > To achieve this goal we need to maintain the list of file ranges marked > > as FADV_NOREUSE until the pages are dropped from the page cache, or the > > inode is deleted, or they're explicitly marked to use a different cache > > behavior (FADV_NORMAL | FADV_WILLNEED). > > > > The list of FADV_NOREUSE ranges is maintained in the address_space > > structure using an interval tree (kinterval). > > > > Signed-off-by: Andrea Righi <andrea@betterlinux.com> > > > Once an appliation sets a range of file as FILEMAP_CACHE_ONCE, > the effects will last until the inode is dropped....right ? > Won't this cause troubles which cannot be detected > (because kinterval information is hidden.) ? > > I'm not sure but FADV_NOREUSE seems like one-shot call and should not have > very long time of effect (after the application exits.) > Can't we ties the liftime of kinteval to the application/file descriptor ? Yes, I'm also concerned about this. Using FADV_NOREUSE may also affect the page cache behavior of other applications. I like the idea to tie the FADV_NOREUSE ranges to the file descriptor. In addition to the shorter lifetime it has the advantage that the policy is applied only to the application that is actually using this feature and not also to the other apps running in the system. I'll consider this possibility for sure if I'll post a new version of this patch set. Thanks! -Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE 2012-02-12 0:21 [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE Andrea Righi ` (2 preceding siblings ...) 2012-02-12 0:21 ` [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE Andrea Righi @ 2012-02-14 21:33 ` Andrew Morton 2012-02-14 22:06 ` John Stultz 2012-02-14 22:59 ` Andrea Righi 3 siblings, 2 replies; 27+ messages in thread From: Andrew Morton @ 2012-02-14 21:33 UTC (permalink / raw) To: Andrea Righi Cc: Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On Sun, 12 Feb 2012 01:21:35 +0100 Andrea Righi <andrea@betterlinux.com> wrote: > The new proposal is to implement POSIX_FADV_NOREUSE as a way to perform a real > drop-behind policy where applications can mark certain intervals of a file as > FADV_NOREUSE before accessing the data. I think you and John need to talk to each other, please. The amount of duplication here is extraordinary. Both patchsets add fields to the address_space (and hence inode), which is significant - we should convince ourselves that we're getting really good returns from a feature which does this. Regarding the use of fadvise(): I suppose it's a reasonable thing to do in the long term - if the feature works well, popular data streaming applications will eventually switch over. But I do think we should explore interfaces which don't require modification of userspace source code. Because there will always be unconverted applications, and the feature becomes available immediately. One such interface would be to toss the offending application into a container which has a modified drop-behind policy. And here we need to drag out the crystal ball: what *is* the best way of tuning application pagecache behaviour? Will we gravitate towards containerization, or will we gravitate towards finer-tuned fadvise/sync_page_range/etc behaviour? Thus far it has been the latter, and I don't think that has been a great success. Finally, are the problems which prompted these patchsets already solved? What happens if you take the offending streaming application and toss it into a 16MB memcg? That *should* avoid perturbing other things running on that machine. And yes, a container-based approach is pretty crude, and one can envision applications which only want modified reclaim policy for one particualr file. But I suspect an application-wide reclaim policy solves 90% of the problems. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE 2012-02-14 21:33 ` [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE Andrew Morton @ 2012-02-14 22:06 ` John Stultz 2012-02-14 22:59 ` Andrea Righi 1 sibling, 0 replies; 27+ messages in thread From: John Stultz @ 2012-02-14 22:06 UTC (permalink / raw) To: Andrew Morton Cc: Andrea Righi, Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On Tue, 2012-02-14 at 13:33 -0800, Andrew Morton wrote: > On Sun, 12 Feb 2012 01:21:35 +0100 > Andrea Righi <andrea@betterlinux.com> wrote: > > > The new proposal is to implement POSIX_FADV_NOREUSE as a way to perform a real > > drop-behind policy where applications can mark certain intervals of a file as > > FADV_NOREUSE before accessing the data. > > I think you and John need to talk to each other, please. The amount of > duplication here is extraordinary. Yea. Clearly there is much we can share. I'm still catching up from a conference last week, so I've not had a chance to really look at this yet, but its on my queue for this week. thanks -john -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE 2012-02-14 21:33 ` [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE Andrew Morton 2012-02-14 22:06 ` John Stultz @ 2012-02-14 22:59 ` Andrea Righi 2012-02-14 23:22 ` Andrew Morton 1 sibling, 1 reply; 27+ messages in thread From: Andrea Righi @ 2012-02-14 22:59 UTC (permalink / raw) To: Andrew Morton Cc: Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On Tue, Feb 14, 2012 at 01:33:37PM -0800, Andrew Morton wrote: > On Sun, 12 Feb 2012 01:21:35 +0100 > Andrea Righi <andrea@betterlinux.com> wrote: > > > The new proposal is to implement POSIX_FADV_NOREUSE as a way to perform a real > > drop-behind policy where applications can mark certain intervals of a file as > > FADV_NOREUSE before accessing the data. > > I think you and John need to talk to each other, please. The amount of > duplication here is extraordinary. Yes, definitely. I'm currently reviewing and testing the John's patch set. I was even considering to apply my patch set on top of the John's patch, or at least propose my tree-based approach to manage the list of the POSIX_FADV_VOLATILE ranges. > > Both patchsets add fields to the address_space (and hence inode), which > is significant - we should convince ourselves that we're getting really > good returns from a feature which does this. > > > > Regarding the use of fadvise(): I suppose it's a reasonable thing to do > in the long term - if the feature works well, popular data streaming > applications will eventually switch over. But I do think we should > explore interfaces which don't require modification of userspace source > code. Because there will always be unconverted applications, and the > feature becomes available immediately. > > One such interface would be to toss the offending application into a > container which has a modified drop-behind policy. And here we need to > drag out the crystal ball: what *is* the best way of tuning application > pagecache behaviour? Will we gravitate towards containerization, or > will we gravitate towards finer-tuned fadvise/sync_page_range/etc > behaviour? Thus far it has been the latter, and I don't think that has > been a great success. > > Finally, are the problems which prompted these patchsets already > solved? What happens if you take the offending streaming application > and toss it into a 16MB memcg? That *should* avoid perturbing other > things running on that machine. Moving the streaming application into a 16MB memcg can be dangerous in some cases... the application might start to do "bad" things, like swapping (if the memcg can swap) or just fail due to OOMs. > > And yes, a container-based approach is pretty crude, and one can > envision applications which only want modified reclaim policy for one > particualr file. But I suspect an application-wide reclaim policy > solves 90% of the problems. I really like the container-based approach. But for this we need a better file cache control in the memory cgroup; now we have the accounting of file pages, but there's no way to limit them. Thanks for your comments, Andrew. -Andrea ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE 2012-02-14 22:59 ` Andrea Righi @ 2012-02-14 23:22 ` Andrew Morton 2012-02-15 1:35 ` Andrea Righi 0 siblings, 1 reply; 27+ messages in thread From: Andrew Morton @ 2012-02-14 23:22 UTC (permalink / raw) To: Andrea Righi Cc: Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, linux-mm, linux-fsdevel, LKML On Tue, 14 Feb 2012 23:59:22 +0100 Andrea Righi <andrea@betterlinux.com> wrote: > On Tue, Feb 14, 2012 at 01:33:37PM -0800, Andrew Morton wrote: > > On Sun, 12 Feb 2012 01:21:35 +0100 > > Andrea Righi <andrea@betterlinux.com> wrote: > > > > > The new proposal is to implement POSIX_FADV_NOREUSE as a way to perform a real > > > drop-behind policy where applications can mark certain intervals of a file as > > > FADV_NOREUSE before accessing the data. > > > > I think you and John need to talk to each other, please. The amount of > > duplication here is extraordinary. > > Yes, definitely. I'm currently reviewing and testing the John's patch > set. I was even considering to apply my patch set on top of the John's > patch, or at least propose my tree-based approach to manage the list of > the POSIX_FADV_VOLATILE ranges. Cool. > > > > Both patchsets add fields to the address_space (and hence inode), which > > is significant - we should convince ourselves that we're getting really > > good returns from a feature which does this. > > > > > > > > Regarding the use of fadvise(): I suppose it's a reasonable thing to do > > in the long term - if the feature works well, popular data streaming > > applications will eventually switch over. But I do think we should > > explore interfaces which don't require modification of userspace source > > code. Because there will always be unconverted applications, and the > > feature becomes available immediately. > > > > One such interface would be to toss the offending application into a > > container which has a modified drop-behind policy. And here we need to > > drag out the crystal ball: what *is* the best way of tuning application > > pagecache behaviour? Will we gravitate towards containerization, or > > will we gravitate towards finer-tuned fadvise/sync_page_range/etc > > behaviour? Thus far it has been the latter, and I don't think that has > > been a great success. > > > > Finally, are the problems which prompted these patchsets already > > solved? What happens if you take the offending streaming application > > and toss it into a 16MB memcg? That *should* avoid perturbing other > > things running on that machine. > > Moving the streaming application into a 16MB memcg can be dangerous in > some cases... the application might start to do "bad" things, like > swapping (if the memcg can swap) or just fail due to OOMs. Well OK, maybe there are problems with the current implementation. But are they unfixable problems? Is the right approach to give up on ever making containers useful for this application and to instead go off and implement a new and separate feature? > > And yes, a container-based approach is pretty crude, and one can > > envision applications which only want modified reclaim policy for one > > particualr file. But I suspect an application-wide reclaim policy > > solves 90% of the problems. > > I really like the container-based approach. But for this we need a > better file cache control in the memory cgroup; now we have the > accounting of file pages, but there's no way to limit them. Again, if/whem memcg becomes sufficiently useful for this application we're left maintaining the obsolete POSIX_FADVISE_NOREUSE for ever. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE 2012-02-14 23:22 ` Andrew Morton @ 2012-02-15 1:35 ` Andrea Righi 2012-02-15 23:48 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 27+ messages in thread From: Andrea Righi @ 2012-02-15 1:35 UTC (permalink / raw) To: Andrew Morton Cc: Minchan Kim, Peter Zijlstra, Johannes Weiner, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, Greg Thelen, linux-mm, linux-fsdevel, LKML On Tue, Feb 14, 2012 at 03:22:20PM -0800, Andrew Morton wrote: > On Tue, 14 Feb 2012 23:59:22 +0100 > Andrea Righi <andrea@betterlinux.com> wrote: > > > On Tue, Feb 14, 2012 at 01:33:37PM -0800, Andrew Morton wrote: > > > On Sun, 12 Feb 2012 01:21:35 +0100 > > > Andrea Righi <andrea@betterlinux.com> wrote: > > > > > > > The new proposal is to implement POSIX_FADV_NOREUSE as a way to perform a real > > > > drop-behind policy where applications can mark certain intervals of a file as > > > > FADV_NOREUSE before accessing the data. > > > > > > I think you and John need to talk to each other, please. The amount of > > > duplication here is extraordinary. > > > > Yes, definitely. I'm currently reviewing and testing the John's patch > > set. I was even considering to apply my patch set on top of the John's > > patch, or at least propose my tree-based approach to manage the list of > > the POSIX_FADV_VOLATILE ranges. > > Cool. > > > > > > > Both patchsets add fields to the address_space (and hence inode), which > > > is significant - we should convince ourselves that we're getting really > > > good returns from a feature which does this. > > > > > > > > > > > > Regarding the use of fadvise(): I suppose it's a reasonable thing to do > > > in the long term - if the feature works well, popular data streaming > > > applications will eventually switch over. But I do think we should > > > explore interfaces which don't require modification of userspace source > > > code. Because there will always be unconverted applications, and the > > > feature becomes available immediately. > > > > > > One such interface would be to toss the offending application into a > > > container which has a modified drop-behind policy. And here we need to > > > drag out the crystal ball: what *is* the best way of tuning application > > > pagecache behaviour? Will we gravitate towards containerization, or > > > will we gravitate towards finer-tuned fadvise/sync_page_range/etc > > > behaviour? Thus far it has been the latter, and I don't think that has > > > been a great success. > > > > > > Finally, are the problems which prompted these patchsets already > > > solved? What happens if you take the offending streaming application > > > and toss it into a 16MB memcg? That *should* avoid perturbing other > > > things running on that machine. > > > > Moving the streaming application into a 16MB memcg can be dangerous in > > some cases... the application might start to do "bad" things, like > > swapping (if the memcg can swap) or just fail due to OOMs. > > Well OK, maybe there are problems with the current implementation. But > are they unfixable problems? Is the right approach to give up on ever > making containers useful for this application and to instead go off and > implement a new and separate feature? > > > > And yes, a container-based approach is pretty crude, and one can > > > envision applications which only want modified reclaim policy for one > > > particualr file. But I suspect an application-wide reclaim policy > > > solves 90% of the problems. > > > > I really like the container-based approach. But for this we need a > > better file cache control in the memory cgroup; now we have the > > accounting of file pages, but there's no way to limit them. > > Again, if/whem memcg becomes sufficiently useful for this application > we're left maintaining the obsolete POSIX_FADVISE_NOREUSE for ever. Yes, totally agree. For the future a memcg-based solution is probably the best way to go. This reminds me to the old per-memcg dirty memory discussion (http://thread.gmane.org/gmane.linux.kernel.mm/67114), cc'ing Greg. Maybe the generic feature to provide that could solve both problems is a better file cache isolation in memcg. Thanks, -Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE 2012-02-15 1:35 ` Andrea Righi @ 2012-02-15 23:48 ` KAMEZAWA Hiroyuki 2012-02-16 0:43 ` Andrea Righi 0 siblings, 1 reply; 27+ messages in thread From: KAMEZAWA Hiroyuki @ 2012-02-15 23:48 UTC (permalink / raw) To: Andrea Righi Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, Greg Thelen, linux-mm, linux-fsdevel, LKML On Wed, 15 Feb 2012 02:35:24 +0100 Andrea Righi <andrea@betterlinux.com> wrote: > On Tue, Feb 14, 2012 at 03:22:20PM -0800, Andrew Morton wrote: > > On Tue, 14 Feb 2012 23:59:22 +0100 > > Andrea Righi <andrea@betterlinux.com> wrote: > > > > > On Tue, Feb 14, 2012 at 01:33:37PM -0800, Andrew Morton wrote: > > > > On Sun, 12 Feb 2012 01:21:35 +0100 > > > > Andrea Righi <andrea@betterlinux.com> wrote: > > > > And yes, a container-based approach is pretty crude, and one can > > > > envision applications which only want modified reclaim policy for one > > > > particualr file. But I suspect an application-wide reclaim policy > > > > solves 90% of the problems. > > > > > > I really like the container-based approach. But for this we need a > > > better file cache control in the memory cgroup; now we have the > > > accounting of file pages, but there's no way to limit them. > > > > Again, if/whem memcg becomes sufficiently useful for this application > > we're left maintaining the obsolete POSIX_FADVISE_NOREUSE for ever. > > Yes, totally agree. For the future a memcg-based solution is probably > the best way to go. > > This reminds me to the old per-memcg dirty memory discussion > (http://thread.gmane.org/gmane.linux.kernel.mm/67114), cc'ing Greg. > > Maybe the generic feature to provide that could solve both problems is > a better file cache isolation in memcg. > Can you think of example interface for us ? I'd like to discuss this in mm-summit if we have a chance. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE 2012-02-15 23:48 ` KAMEZAWA Hiroyuki @ 2012-02-16 0:43 ` Andrea Righi 2014-01-02 21:25 ` Phillip Susi 0 siblings, 1 reply; 27+ messages in thread From: Andrea Righi @ 2012-02-16 0:43 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, Greg Thelen, linux-mm, linux-fsdevel, LKML On Thu, Feb 16, 2012 at 08:48:31AM +0900, KAMEZAWA Hiroyuki wrote: > On Wed, 15 Feb 2012 02:35:24 +0100 > Andrea Righi <andrea@betterlinux.com> wrote: > > > On Tue, Feb 14, 2012 at 03:22:20PM -0800, Andrew Morton wrote: > > > On Tue, 14 Feb 2012 23:59:22 +0100 > > > Andrea Righi <andrea@betterlinux.com> wrote: > > > > > > > On Tue, Feb 14, 2012 at 01:33:37PM -0800, Andrew Morton wrote: > > > > > On Sun, 12 Feb 2012 01:21:35 +0100 > > > > > Andrea Righi <andrea@betterlinux.com> wrote: > > > > > And yes, a container-based approach is pretty crude, and one can > > > > > envision applications which only want modified reclaim policy for one > > > > > particualr file. But I suspect an application-wide reclaim policy > > > > > solves 90% of the problems. > > > > > > > > I really like the container-based approach. But for this we need a > > > > better file cache control in the memory cgroup; now we have the > > > > accounting of file pages, but there's no way to limit them. > > > > > > Again, if/whem memcg becomes sufficiently useful for this application > > > we're left maintaining the obsolete POSIX_FADVISE_NOREUSE for ever. > > > > Yes, totally agree. For the future a memcg-based solution is probably > > the best way to go. > > > > This reminds me to the old per-memcg dirty memory discussion > > (http://thread.gmane.org/gmane.linux.kernel.mm/67114), cc'ing Greg. > > > > Maybe the generic feature to provide that could solve both problems is > > a better file cache isolation in memcg. > > > > Can you think of example interface for us ? > I'd like to discuss this in mm-summit if we have a chance. > > Thanks, > -Kame Sure! I'll try to write down more detailed ideas. For now the best interface that I can see is to add something like memory.file.* in cgroupfs. The NOREUSE-like policy that I was trying to implement via fadvise() can be probably implemented by setting memory.file.limit_in_bytes=0 (or using a very small value). A cgroup like this could use any amount of memory (according to the other memory.* settings), but it should drop any file cache page as soon as possible, if the page was not present in memory before. IOW, this cgroup shouldn't disturb the state of the page cache for the other cgroups. Another interesting usage is to provide different levels of service. For example, using different values for memory.file.limit_in_byte would make possible to specify that file cache pages of certain cgroups are reclaimed before others. This would be a very nice feature IMHO, also for those who want to provide different levels of service per-user. Thoughts? Thanks, -Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE 2012-02-16 0:43 ` Andrea Righi @ 2014-01-02 21:25 ` Phillip Susi 0 siblings, 0 replies; 27+ messages in thread From: Phillip Susi @ 2014-01-02 21:25 UTC (permalink / raw) To: Andrea Righi, KAMEZAWA Hiroyuki Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner, KOSAKI Motohiro, Rik van Riel, Hugh Dickins, Alexander Viro, Shaohua Li, Pádraig Brady, John Stultz, Jerry James, Julius Plenz, Greg Thelen, linux-mm, linux-fsdevel, LKML -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 What ever happened to this patch set? It looks like a great idea to me and I'd *really* like to see this flag implemented. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (MingW32) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJSxdlTAAoJEI5FoCIzSKrw0O0H/3cIe/XvrXvF6qzgHpQPIfnN a+14iqUa5+fNKNRM0Rwk9Tb3EFUjIXKHcRiRzGD9CINVxonBQQME68KA94UxVZIL oul4YMP4dNcBhp8Ux1M80JY3Y/CMSo9SAN1pc7bmIezua/v821vb6wgCamj3EnS5 /Zs51cIWMkRSAr7EVvycI6mI04MzqsEdtGHdI0U6jrLjLLHsEgbuqkBMrc5BNkQ/ 3tD6atY5zNyBIl+RBOvukNoijtEW4Z5OU+zfZHSk/L72yZnl+17nz4mRApmikUDV zhJoCheWqLCLtpg2SxVC/EMUS3TDy3k9+8zQHbRZ3igXP4e58NZFR+XC1JhVonQ= =TqVD -----END PGP SIGNATURE----- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2014-01-02 21:25 UTC | newest] Thread overview: 27+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-02-12 0:21 [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE Andrea Righi 2012-02-12 0:21 ` [PATCH v5 1/3] kinterval: routines to manipulate generic intervals Andrea Righi 2012-02-13 0:48 ` Andrea Righi 2012-02-12 0:21 ` [PATCH v5 2/3] mm: filemap: introduce mark_page_usedonce Andrea Righi 2012-02-12 0:21 ` [PATCH v5 3/3] fadvise: implement POSIX_FADV_NOREUSE Andrea Righi 2012-02-13 16:22 ` KOSAKI Motohiro 2012-02-13 16:22 ` KOSAKI Motohiro [not found] ` <CAHGf_=qs8-nE6y6EzNYUzgjGo0sMP5zvCc3=GNZmHct6mPecqg@mail.gmail.com> 2012-02-13 18:00 ` Andrea Righi 2012-02-15 23:35 ` Arun Sharma 2012-02-15 23:47 ` Andrea Righi 2012-02-15 23:57 ` Arun Sharma 2012-02-16 0:56 ` Andrea Righi 2012-02-16 2:10 ` Arun Sharma 2012-02-16 10:39 ` Andrea Righi 2012-02-16 18:43 ` Arun Sharma 2012-02-16 18:57 ` Andrea Righi 2012-02-16 19:07 ` Arun Sharma 2012-02-27 2:33 ` KAMEZAWA Hiroyuki 2012-02-27 10:46 ` Andrea Righi 2012-02-14 21:33 ` [RFC] [PATCH v5 0/3] fadvise: support POSIX_FADV_NOREUSE Andrew Morton 2012-02-14 22:06 ` John Stultz 2012-02-14 22:59 ` Andrea Righi 2012-02-14 23:22 ` Andrew Morton 2012-02-15 1:35 ` Andrea Righi 2012-02-15 23:48 ` KAMEZAWA Hiroyuki 2012-02-16 0:43 ` Andrea Righi 2014-01-02 21:25 ` Phillip Susi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).