* [PATCH 0/5] VFS: Directory level cache cleaning
@ 2013-12-16 15:00 Li Wang
2013-12-16 15:00 ` [PATCH 1/5] VFS: Convert drop_caches to accept string Li Wang
` (6 more replies)
0 siblings, 7 replies; 16+ messages in thread
From: Li Wang @ 2013-12-16 15:00 UTC (permalink / raw)
To: Alexander Viro
Cc: Sage Weil, linux-fsdevel, linux-mm, linux-kernel, Li Wang,
Yunchuan Wen
Currently, Linux only supports file-system-wide VFS
cache (dentry cache and page cache) cleaning through
'/proc/sys/vm/drop_caches'. Sometimes this is not flexible
enough. Applications may know exactly whether
the metadata and data will be referenced again in the future,
so a desirable mechanism is to let applications
reclaim the memory of unused cache entries at a finer
granularity: the directory level. This lets applications
keep hot metadata and data (to be referenced in the
future) in the cache, and kick unused entries out to avoid
cache thrashing. Another advantage is that it is more flexible
for debugging.
This patch set extends the 'drop_caches' interface to
support directory-level cache cleaning while remaining fully
backward compatible. '{1,2,3}' keeps the same semantics
as before. In addition, "{1,2,3}:DIRECTORY_PATH_NAME" is allowed,
to recursively clean the caches under DIRECTORY_PATH_NAME.
For example, 'echo 1:/home/foo/jpg > /proc/sys/vm/drop_caches'
will clean the page caches of the files inside '/home/foo/jpg'.
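As a minimal sketch of the accepted syntax, the following userspace C function models the string checks that the sysctl handler in patch 5/5 performs. The name parse_drop_caches() is hypothetical and not part of the patch set, and the sketch assumes the trailing newline that echo appends has already been stripped.

```c
#include <stddef.h>

/*
 * Userspace model of the proposed drop_caches syntax (hypothetical helper,
 * not part of the patch set). Accepts "1", "2", "3" or "<1-3>:<path>".
 * Returns 0 and fills *command and *path on success, -1 on malformed input.
 * *path is set to NULL for the legacy system-wide form.
 */
int parse_drop_caches(const char *s, int *command, const char **path)
{
	int cmd;

	if (s == NULL || s[0] == '\0')
		return -1;
	cmd = s[0] - '0';
	if (cmd < 1 || cmd > 3)
		return -1;
	if (s[1] == '\0') {
		/* Legacy form: a single digit, applies system-wide. */
		*command = cmd;
		*path = NULL;
		return 0;
	}
	/* Directory form: the digit must be followed by ':' and a non-empty path. */
	if (s[1] != ':' || s[2] == '\0')
		return -1;
	*command = cmd;
	*path = &s[2];
	return 0;
}
```

With this model, "3" selects both caches system-wide, "1:/home/foo/jpg" selects the page cache under /home/foo/jpg, and inputs such as "4", "1:" or "12" are rejected.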
It is easy to demonstrate the advantage of directory level
cache cleaning. We use a virtual machine configured with
an Intel(R) Xeon(R) 8-core CPU E5506 @ 2.13GHz, and with 1GB
memory. Three directories named '1', '2' and '3' are created,
with each containing 180000 – 280000 files. The test program
opens all files in a directory and then tries the next directory.
The order for accessing the directories is '1', '2', '3',
'1'.
The time to access '1' the second time is measured
with and without cache cleaning, under different file counts.
With cache cleaning, we clean all cache entries of files
in '2' before accessing the files in '3'. The results
are as follows (in seconds).
(Note: by default, VFS moves unreferenced inodes
onto a global LRU list rather than freeing them. For this
experiment, we modified iput() to force the inode to be freed as well.
This behavior and the related code are left for further discussion
and are thus not reflected in this patch.)
Number of files:   180000  200000  220000  240000  260000
Without cleaning:   2.165   6.977  10.032  11.571  13.443
With cleaning:      1.949   1.906   2.336   2.918   3.651
When the number of files is 180000 in each directory,
the metadata cache is large enough to hold all entries
of the three directories, so re-accessing '1' will hit in
the cache regardless of whether '2' was cleaned up or not.
As the number of files increases, the cache can only
hold a little more than two directories. Accessing '3' will cause some
entries of '1' to be evicted (due to LRU). When re-accessing '1',
some entries need to be reloaded from disk, which is time-consuming.
In this case, cleaning '2' before accessing '3' gives a good
speedup; a maximum performance improvement of 4.29X is achieved
(at 220000 files).
The advantage of directory-level page cache cleaning should be
even easier to demonstrate.
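The speedup figures can be re-derived from the timing table. The sketch below is only a worked calculation: the numbers are copied from the table, while the helper names speedup() and best_run() are made up for illustration.

```c
#include <stddef.h>

/* Timings copied from the table (seconds to re-access directory '1'). */
static const int file_counts[] = { 180000, 200000, 220000, 240000, 260000 };
static const double without_secs[] = { 2.165, 6.977, 10.032, 11.571, 13.443 };
static const double with_secs[] = { 1.949, 1.906, 2.336, 2.918, 3.651 };

#define NUM_RUNS (sizeof(file_counts) / sizeof(file_counts[0]))

/* Speedup of run i: time without cleaning divided by time with cleaning. */
double speedup(size_t i)
{
	return without_secs[i] / with_secs[i];
}

/* Index of the run with the largest speedup. */
size_t best_run(void)
{
	size_t i, best = 0;

	for (i = 1; i < NUM_RUNS; i++)
		if (speedup(i) > speedup(best))
			best = i;
	return best;
}
```

The ratios come out at roughly 1.11, 3.66, 4.29, 3.97 and 3.68, so the 4.29X maximum corresponds to the 220000-file run (10.032 / 2.336).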
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Li Wang (5):
VFS: Convert drop_caches to accept string
VFS: Convert sysctl_drop_caches to string
VFS: Add the declaration of shrink_pagecache_parent
VFS: Add shrink_pagecache_parent
VFS: Extend drop_caches sysctl handler to allow directory level cache
cleaning
fs/dcache.c | 35 +++++++++++++++++++++++++++++++++++
fs/drop_caches.c | 45 +++++++++++++++++++++++++++++++++++++--------
include/linux/dcache.h | 1 +
include/linux/mm.h | 3 ++-
kernel/sysctl.c | 6 ++----
5 files changed, 77 insertions(+), 13 deletions(-)
--
1.7.9.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 1/5] VFS: Convert drop_caches to accept string
2013-12-16 15:00 [PATCH 0/5] VFS: Directory level cache cleaning Li Wang
@ 2013-12-16 15:00 ` Li Wang
2013-12-16 15:00 ` [PATCH 2/5] VFS: Convert sysctl_drop_caches to string Li Wang
` (5 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Li Wang @ 2013-12-16 15:00 UTC (permalink / raw)
To: Alexander Viro
Cc: Sage Weil, linux-fsdevel, linux-mm, linux-kernel, Li Wang,
Yunchuan Wen
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
---
kernel/sysctl.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 34a6047..2f2d8ab 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1255,12 +1255,10 @@ static struct ctl_table vm_table[] = {
},
{
.procname = "drop_caches",
- .data = &sysctl_drop_caches,
- .maxlen = sizeof(int),
+ .data = sysctl_drop_caches,
+ .maxlen = PATH_MAX,
.mode = 0644,
.proc_handler = drop_caches_sysctl_handler,
- .extra1 = &one,
- .extra2 = &three,
},
#ifdef CONFIG_COMPACTION
{
--
1.7.9.5
* [PATCH 2/5] VFS: Convert sysctl_drop_caches to string
2013-12-16 15:00 [PATCH 0/5] VFS: Directory level cache cleaning Li Wang
2013-12-16 15:00 ` [PATCH 1/5] VFS: Convert drop_caches to accept string Li Wang
@ 2013-12-16 15:00 ` Li Wang
2013-12-16 15:00 ` [PATCH 3/5] VFS: Add the declaration of shrink_pagecache_parent Li Wang
` (4 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Li Wang @ 2013-12-16 15:00 UTC (permalink / raw)
To: Alexander Viro
Cc: Sage Weil, linux-fsdevel, linux-mm, linux-kernel, Li Wang,
Yunchuan Wen
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
---
include/linux/mm.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1cedd00..5e3cc5b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -17,6 +17,7 @@
#include <linux/pfn.h>
#include <linux/bit_spinlock.h>
#include <linux/shrinker.h>
+#include <linux/fs.h>
struct mempolicy;
struct anon_vma;
@@ -1920,7 +1921,7 @@ int in_gate_area_no_mm(unsigned long addr);
#endif /* __HAVE_ARCH_GATE_AREA */
#ifdef CONFIG_SYSCTL
-extern int sysctl_drop_caches;
+extern char sysctl_drop_caches[PATH_MAX];
int drop_caches_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
#endif
--
1.7.9.5
* [PATCH 3/5] VFS: Add the declaration of shrink_pagecache_parent
2013-12-16 15:00 [PATCH 0/5] VFS: Directory level cache cleaning Li Wang
2013-12-16 15:00 ` [PATCH 1/5] VFS: Convert drop_caches to accept string Li Wang
2013-12-16 15:00 ` [PATCH 2/5] VFS: Convert sysctl_drop_caches to string Li Wang
@ 2013-12-16 15:00 ` Li Wang
2013-12-16 15:00 ` [PATCH 4/5] VFS: Add shrink_pagecache_parent Li Wang
` (3 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Li Wang @ 2013-12-16 15:00 UTC (permalink / raw)
To: Alexander Viro
Cc: Sage Weil, linux-fsdevel, linux-mm, linux-kernel, Li Wang,
Yunchuan Wen
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
---
include/linux/dcache.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 57e87e7..ce11098 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -247,6 +247,7 @@ extern struct dentry *d_find_any_alias(struct inode *inode);
extern struct dentry * d_obtain_alias(struct inode *);
extern void shrink_dcache_sb(struct super_block *);
extern void shrink_dcache_parent(struct dentry *);
+extern void shrink_pagecache_parent(struct dentry *);
extern void shrink_dcache_for_umount(struct super_block *);
extern int d_invalidate(struct dentry *);
--
1.7.9.5
* [PATCH 4/5] VFS: Add shrink_pagecache_parent
2013-12-16 15:00 [PATCH 0/5] VFS: Directory level cache cleaning Li Wang
` (2 preceding siblings ...)
2013-12-16 15:00 ` [PATCH 3/5] VFS: Add the declaration of shrink_pagecache_parent Li Wang
@ 2013-12-16 15:00 ` Li Wang
2013-12-16 15:00 ` [PATCH 5/5] VFS: Extend drop_caches sysctl handler to allow directory level cache cleaning Li Wang
` (2 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Li Wang @ 2013-12-16 15:00 UTC (permalink / raw)
To: Alexander Viro
Cc: Sage Weil, linux-fsdevel, linux-mm, linux-kernel, Li Wang,
Yunchuan Wen
Analogous to shrink_dcache_parent, except that it collects inodes.
dcache.c is not a very appropriate place for it, but d_walk can only
be invoked from there.
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
---
fs/dcache.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/fs/dcache.c b/fs/dcache.c
index 4bdb300..bcbfd0d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1318,6 +1318,41 @@ void shrink_dcache_parent(struct dentry *parent)
}
EXPORT_SYMBOL(shrink_dcache_parent);
+static enum d_walk_ret gather_inode(void *data, struct dentry *dentry)
+{
+ struct list_head *list = data;
+ struct inode *inode = dentry->d_inode;
+
+ if (inode == NULL)
+ goto out;
+ spin_lock(&inode->i_lock);
+ if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
+ (inode->i_mapping->nrpages == 0) ||
+ (!list_empty(&inode->i_lru))) {
+ goto out_unlock;
+ }
+ __iget(inode);
+ list_add_tail(&inode->i_lru, list);
+out_unlock:
+ spin_unlock(&inode->i_lock);
+out:
+ return D_WALK_CONTINUE;
+}
+
+void shrink_pagecache_parent(struct dentry *parent)
+{
+ LIST_HEAD(list);
+ struct inode *inode, *next;
+
+ d_walk(parent, &list, gather_inode, NULL);
+ list_for_each_entry_safe(inode, next, &list, i_lru) {
+ list_del_init(&inode->i_lru);
+ invalidate_mapping_pages(inode->i_mapping, 0, -1);
+ iput(inode);
+ }
+}
+EXPORT_SYMBOL(shrink_pagecache_parent);
+
static enum d_walk_ret umount_collect(void *_data, struct dentry *dentry)
{
struct select_data *data = _data;
--
1.7.9.5
* [PATCH 5/5] VFS: Extend drop_caches sysctl handler to allow directory level cache cleaning
2013-12-16 15:00 [PATCH 0/5] VFS: Directory level cache cleaning Li Wang
` (3 preceding siblings ...)
2013-12-16 15:00 ` [PATCH 4/5] VFS: Add shrink_pagecache_parent Li Wang
@ 2013-12-16 15:00 ` Li Wang
2013-12-16 17:45 ` [PATCH 0/5] VFS: Directory " Cong Wang
2013-12-17 22:05 ` Dave Chinner
6 siblings, 0 replies; 16+ messages in thread
From: Li Wang @ 2013-12-16 15:00 UTC (permalink / raw)
To: Alexander Viro
Cc: Sage Weil, linux-fsdevel, linux-mm, linux-kernel, Li Wang,
Yunchuan Wen
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
---
fs/drop_caches.c | 45 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 37 insertions(+), 8 deletions(-)
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index 9fd702f..ab31393 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -8,10 +8,11 @@
#include <linux/writeback.h>
#include <linux/sysctl.h>
#include <linux/gfp.h>
+#include <linux/fs_struct.h>
#include "internal.h"
/* A global variable is a bit ugly, but it keeps the code simple */
-int sysctl_drop_caches;
+char sysctl_drop_caches[PATH_MAX];
static void drop_pagecache_sb(struct super_block *sb, void *unused)
{
@@ -54,15 +55,43 @@ int drop_caches_sysctl_handler(ctl_table *table, int write,
void __user *buffer, size_t *length, loff_t *ppos)
{
int ret;
+ int command;
+ struct path path;
+ struct path root;
- ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
- if (ret)
- return ret;
- if (write) {
- if (sysctl_drop_caches & 1)
+ ret = proc_dostring(table, write, buffer, length, ppos);
+ if (ret || !write)
+ goto out;
+ ret = -EINVAL;
+ command = sysctl_drop_caches[0] - '0';
+ if (command < 1 || command > 3)
+ goto out;
+ if (sysctl_drop_caches[1] == '\0') {
+ if (command & 1)
iterate_supers(drop_pagecache_sb, NULL);
- if (sysctl_drop_caches & 2)
+ if (command & 2)
drop_slab();
+ ret = 0;
+ goto out;
}
- return 0;
+ if (sysctl_drop_caches[1] != ':' || sysctl_drop_caches[2] == '\0')
+ goto out;
+ if (sysctl_drop_caches[2] == '/')
+ get_fs_root(current->fs, &root);
+ else
+ get_fs_pwd(current->fs, &root);
+ ret = vfs_path_lookup(root.dentry, root.mnt,
+ &sysctl_drop_caches[2], 0, &path);
+ path_put(&root);
+ if (ret)
+ goto out;
+ if (command & 1)
+ shrink_pagecache_parent(path.dentry);
+ if (command & 2)
+ shrink_dcache_parent(path.dentry);
+ path_put(&path);
+out:
+ if (ret)
+ memset(sysctl_drop_caches, 0, PATH_MAX);
+ return ret;
}
--
1.7.9.5
* Re: [PATCH 0/5] VFS: Directory level cache cleaning
2013-12-16 15:00 [PATCH 0/5] VFS: Directory level cache cleaning Li Wang
` (4 preceding siblings ...)
2013-12-16 15:00 ` [PATCH 5/5] VFS: Extend drop_caches sysctl handler to allow directory level cache cleaning Li Wang
@ 2013-12-16 17:45 ` Cong Wang
2013-12-17 3:08 ` Li Wang
2013-12-17 22:05 ` Dave Chinner
6 siblings, 1 reply; 16+ messages in thread
From: Cong Wang @ 2013-12-16 17:45 UTC (permalink / raw)
To: Li Wang
Cc: Alexander Viro, Sage Weil, linux-fsdevel, linux-mm, LKML,
Yunchuan Wen
On Mon, Dec 16, 2013 at 7:00 AM, Li Wang <liwang@ubuntukylin.com> wrote:
> This patch extend the 'drop_caches' interface to
> support directory level cache cleaning and has a complete
> backward compatibility. '{1,2,3}' keeps the same semantics
> as before. Besides, "{1,2,3}:DIRECTORY_PATH_NAME" is allowed
> to recursively clean the caches under DIRECTORY_PATH_NAME.
> For example, 'echo 1:/home/foo/jpg > /proc/sys/vm/drop_caches'
> will clean the page caches of the files inside 'home/foo/jpg'.
>
This interface is ugly...
And we already have a file-level drop cache, that is,
fadvise(DONTNEED). Can you extend it if it can't
handle a directory fd?
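For reference, a minimal userspace sketch of the existing file-level mechanism, using posix_fadvise() with POSIX_FADV_DONTNEED. The helper name drop_file_pagecache() is hypothetical; the call is only a hint to the kernel, and dirty pages are not dropped, which is why the sketch flushes first.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: hint that the cached pages of one file are no
 * longer needed. Returns 0 on success or an errno-style code on failure.
 */
int drop_file_pagecache(const char *pathname)
{
	int fd, err;

	fd = open(pathname, O_RDONLY);
	if (fd < 0)
		return errno;
	/* DONTNEED skips dirty pages, so write them back first (best effort). */
	(void)fsync(fd);
	/* Offset 0 with length 0 means "from the start to the end of the file". */
	err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	close(fd);
	/* posix_fadvise() returns the error number directly, not via errno. */
	return err;
}
```

This only affects the page cache of the one file; it does nothing for the dentry or inode caches.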
* Re: [PATCH 0/5] VFS: Directory level cache cleaning
2013-12-16 17:45 ` [PATCH 0/5] VFS: Directory " Cong Wang
@ 2013-12-17 3:08 ` Li Wang
2013-12-17 3:58 ` Matthew Wilcox
0 siblings, 1 reply; 16+ messages in thread
From: Li Wang @ 2013-12-17 3:08 UTC (permalink / raw)
To: Cong Wang
Cc: Alexander Viro, Sage Weil, linux-fsdevel, linux-mm, LKML,
Yunchuan Wen
As far as we know, fadvise(DONTNEED) does not support metadata
cache cleaning. We think that is desirable for workloads with massive
numbers of small files. Another question is whether people would accept
that feeding a directory fd to fadvise recursively cleans the
page caches of all files inside that directory.
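To make the recursive behavior concrete, here is a rough userspace approximation, not the proposed kernel change: walk a directory tree and issue POSIX_FADV_DONTNEED on every regular file the caller can open. The name drop_dir_pagecache() is hypothetical. The sketch does not follow symlinks, does not flush dirty pages, and cannot touch the dentry or inode caches.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: recursively advise DONTNEED on all regular files
 * under `dir`. Files that cannot be opened are skipped, so the usual
 * permission checks apply. Returns the number of files advised, or -1 if
 * `dir` itself cannot be read.
 */
long drop_dir_pagecache(const char *dir)
{
	char path[PATH_MAX];
	struct dirent *ent;
	struct stat st;
	long count = 0;
	DIR *d = opendir(dir);

	if (d == NULL)
		return -1;
	while ((ent = readdir(d)) != NULL) {
		int n;

		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		n = snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
		if (n < 0 || n >= (int)sizeof(path))
			continue;	/* path too long, skip it */
		if (lstat(path, &st) != 0)
			continue;
		if (S_ISDIR(st.st_mode)) {
			long sub = drop_dir_pagecache(path);

			if (sub > 0)
				count += sub;
		} else if (S_ISREG(st.st_mode)) {
			int fd = open(path, O_RDONLY);

			if (fd < 0)
				continue;	/* no permission: leave it alone */
			if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) == 0)
				count++;
			close(fd);
		}
	}
	closedir(d);
	return count;
}
```

One open() per file is exactly the overhead that becomes significant with massive numbers of small files.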
On 2013/12/17 1:45, Cong Wang wrote:
> On Mon, Dec 16, 2013 at 7:00 AM, Li Wang <liwang@ubuntukylin.com> wrote:
>> This patch extend the 'drop_caches' interface to
>> support directory level cache cleaning and has a complete
>> backward compatibility. '{1,2,3}' keeps the same semantics
>> as before. Besides, "{1,2,3}:DIRECTORY_PATH_NAME" is allowed
>> to recursively clean the caches under DIRECTORY_PATH_NAME.
>> For example, 'echo 1:/home/foo/jpg > /proc/sys/vm/drop_caches'
>> will clean the page caches of the files inside 'home/foo/jpg'.
>>
>
> This interface is ugly...
>
> And we already have a file-level drop cache, that is,
> fadvise(DONTNEED). Can you extend it if it can't
> handle a directory fd?
>
* Re: [PATCH 0/5] VFS: Directory level cache cleaning
2013-12-17 3:08 ` Li Wang
@ 2013-12-17 3:58 ` Matthew Wilcox
2013-12-17 7:23 ` Li Wang
0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2013-12-17 3:58 UTC (permalink / raw)
To: Li Wang
Cc: Cong Wang, Alexander Viro, Sage Weil, linux-fsdevel, linux-mm,
LKML, Yunchuan Wen
On Tue, Dec 17, 2013 at 11:08:16AM +0800, Li Wang wrote:
> As far as we know, fadvise(DONTNEED) does not support metadata
> cache cleaning. We think that is desirable under massive small files
> situations. Another thing is that do people accept the behavior
> of feeding a directory fd to fadvise will recusively clean all
> page caches of files inside that directory?
I think there's a really good permissions-related question here.
If that's an acceptable interface, should one have to be CAP_SYS_ADMIN
to issue the request? What if some of the files below this directory
are not owned by the user issuing the request?
> On 2013/12/17 1:45, Cong Wang wrote:
>> On Mon, Dec 16, 2013 at 7:00 AM, Li Wang <liwang@ubuntukylin.com> wrote:
>>> This patch extend the 'drop_caches' interface to
>>> support directory level cache cleaning and has a complete
>>> backward compatibility. '{1,2,3}' keeps the same semantics
>>> as before. Besides, "{1,2,3}:DIRECTORY_PATH_NAME" is allowed
>>> to recursively clean the caches under DIRECTORY_PATH_NAME.
>>> For example, 'echo 1:/home/foo/jpg > /proc/sys/vm/drop_caches'
>>> will clean the page caches of the files inside 'home/foo/jpg'.
>>>
>>
>> This interface is ugly...
>>
>> And we already have a file-level drop cache, that is,
>> fadvise(DONTNEED). Can you extend it if it can't
>> handle a directory fd?
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
* Re: [PATCH 0/5] VFS: Directory level cache cleaning
2013-12-17 3:58 ` Matthew Wilcox
@ 2013-12-17 7:23 ` Li Wang
2013-12-17 9:12 ` Li Zefan
0 siblings, 1 reply; 16+ messages in thread
From: Li Wang @ 2013-12-17 7:23 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Cong Wang, Alexander Viro, Sage Weil, linux-fsdevel, linux-mm,
LKML, Yunchuan Wen
If we do want to equip fadvise() with directory-level page cache cleaning,
this could be solved by checking (inode_permission() ||
capable(CAP_SYS_ADMIN)) before manipulating the page cache of that inode.
We think the current extension to 'drop_caches' is fully backward
compatible: the old semantics are unchanged, and the add-on
feature of finer-granularity cache cleaning should also be
desirable.
On 2013/12/17 11:58, Matthew Wilcox wrote:
> On Tue, Dec 17, 2013 at 11:08:16AM +0800, Li Wang wrote:
>> As far as we know, fadvise(DONTNEED) does not support metadata
>> cache cleaning. We think that is desirable under massive small files
>> situations. Another thing is that do people accept the behavior
>> of feeding a directory fd to fadvise will recusively clean all
>> page caches of files inside that directory?
>
> I think there's a really good permissions-related question here.
> If that's an acceptable interface, should one have to be CAP_SYS_ADMIN
> to issue the request? What if some of the files below this directory
> are not owned by the user issuing the request?
>
>> On 2013/12/17 1:45, Cong Wang wrote:
>>> On Mon, Dec 16, 2013 at 7:00 AM, Li Wang <liwang@ubuntukylin.com> wrote:
>>>> This patch extend the 'drop_caches' interface to
>>>> support directory level cache cleaning and has a complete
>>>> backward compatibility. '{1,2,3}' keeps the same semantics
>>>> as before. Besides, "{1,2,3}:DIRECTORY_PATH_NAME" is allowed
>>>> to recursively clean the caches under DIRECTORY_PATH_NAME.
>>>> For example, 'echo 1:/home/foo/jpg > /proc/sys/vm/drop_caches'
>>>> will clean the page caches of the files inside 'home/foo/jpg'.
>>>>
>>>
>>> This interface is ugly...
>>>
>>> And we already have a file-level drop cache, that is,
>>> fadvise(DONTNEED). Can you extend it if it can't
>>> handle a directory fd?
>>>
* Re: [PATCH 0/5] VFS: Directory level cache cleaning
2013-12-17 7:23 ` Li Wang
@ 2013-12-17 9:12 ` Li Zefan
2013-12-17 9:31 ` Li Wang
2013-12-17 13:55 ` Michal Hocko
0 siblings, 2 replies; 16+ messages in thread
From: Li Zefan @ 2013-12-17 9:12 UTC (permalink / raw)
To: Li Wang
Cc: Matthew Wilcox, Cong Wang, Alexander Viro, Sage Weil,
linux-fsdevel, linux-mm, LKML, Yunchuan Wen
On 2013/12/17 15:23, Li Wang wrote:
> If we do wanna equip fadvise() with directory level page cache cleaning,
> this could be solved by invoking (inode_permission() || capable(CAP_SYS_ADMIN)) before manipulating the page cache of that inode.
> We think the current extension to 'drop_caches' has a complete back
> compatibility, the old semantics keep unchanged, and with add-on
> features to do finer granularity cache cleaning should be also
> desirable.
>
I don't think you can extend the drop_caches interface this way. It should
be used for debugging only.
commit 9d0243bca345d5ce25d3f4b74b7facb3a6df1232
Author: Andrew Morton <akpm@osdl.org>
Date: Sun Jan 8 01:00:39 2006 -0800
[PATCH] drop-pagecache
Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to
discard as much pagecache and/or reclaimable slab objects as it can. THis
operation requires root permissions.
...
This is a debugging feature: useful for getting consistent results between
filesystem benchmarks. We could possibly put it under a config option, but
it's less than 300 bytes.
Also see http://lkml.org/lkml/2013/7/26/230
* Re: [PATCH 0/5] VFS: Directory level cache cleaning
2013-12-17 9:12 ` Li Zefan
@ 2013-12-17 9:31 ` Li Wang
2013-12-18 1:26 ` Li Zefan
2013-12-17 13:55 ` Michal Hocko
1 sibling, 1 reply; 16+ messages in thread
From: Li Wang @ 2013-12-17 9:31 UTC (permalink / raw)
To: Li Zefan
Cc: Matthew Wilcox, Cong Wang, Alexander Viro, Sage Weil,
linux-fsdevel, linux-mm, LKML, Yunchuan Wen
This extension is just an add-on. The original debugging
capability is still there, and more flexible debugging is now possible.
On 2013/12/17 17:12, Li Zefan wrote:
> On 2013/12/17 15:23, Li Wang wrote:
>> If we do wanna equip fadvise() with directory level page cache cleaning,
>> this could be solved by invoking (inode_permission() || capable(CAP_SYS_ADMIN)) before manipulating the page cache of that inode.
>> We think the current extension to 'drop_caches' has a complete back
>> compatibility, the old semantics keep unchanged, and with add-on
>> features to do finer granularity cache cleaning should be also
>> desirable.
>>
>
> I don't think you can extend the drop_caches interface this way. It should
> be used for debuging only.
>
> commit 9d0243bca345d5ce25d3f4b74b7facb3a6df1232
> Author: Andrew Morton <akpm@osdl.org>
> Date: Sun Jan 8 01:00:39 2006 -0800
>
> [PATCH] drop-pagecache
>
> Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to
> discard as much pagecache and/or reclaimable slab objects as it can. THis
> operation requires root permissions.
>
> ...
>
> This is a debugging feature: useful for getting consistent results between
> filesystem benchmarks. We could possibly put it under a config option, but
> it's less than 300 bytes.
>
> Also see http://lkml.org/lkml/2013/7/26/230
>
* Re: [PATCH 0/5] VFS: Directory level cache cleaning
2013-12-17 9:12 ` Li Zefan
2013-12-17 9:31 ` Li Wang
@ 2013-12-17 13:55 ` Michal Hocko
1 sibling, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2013-12-17 13:55 UTC (permalink / raw)
To: Li Zefan
Cc: Li Wang, Matthew Wilcox, Cong Wang, Alexander Viro, Sage Weil,
linux-fsdevel, linux-mm, LKML, Yunchuan Wen
On Tue 17-12-13 17:12:52, Li Zefan wrote:
> On 2013/12/17 15:23, Li Wang wrote:
> > If we do wanna equip fadvise() with directory level page cache cleaning,
> > this could be solved by invoking (inode_permission() || capable(CAP_SYS_ADMIN)) before manipulating the page cache of that inode.
> > We think the current extension to 'drop_caches' has a complete back
> > compatibility, the old semantics keep unchanged, and with add-on
> > features to do finer granularity cache cleaning should be also
> > desirable.
> >
>
> I don't think you can extend the drop_caches interface this way. It should
> be used for debuging only.
Completely agreed. The interface shouldn't be further extended. I would
even argue it shouldn't be used in the first place.
> commit 9d0243bca345d5ce25d3f4b74b7facb3a6df1232
> Author: Andrew Morton <akpm@osdl.org>
> Date: Sun Jan 8 01:00:39 2006 -0800
>
> [PATCH] drop-pagecache
>
> Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to
> discard as much pagecache and/or reclaimable slab objects as it can. THis
> operation requires root permissions.
>
> ...
>
> This is a debugging feature: useful for getting consistent results between
> filesystem benchmarks. We could possibly put it under a config option, but
> it's less than 300 bytes.
>
> Also see http://lkml.org/lkml/2013/7/26/230
>
--
Michal Hocko
SUSE Labs
* Re: [PATCH 0/5] VFS: Directory level cache cleaning
2013-12-16 15:00 [PATCH 0/5] VFS: Directory level cache cleaning Li Wang
` (5 preceding siblings ...)
2013-12-16 17:45 ` [PATCH 0/5] VFS: Directory " Cong Wang
@ 2013-12-17 22:05 ` Dave Chinner
2013-12-18 1:36 ` Li Wang
6 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2013-12-17 22:05 UTC (permalink / raw)
To: Li Wang
Cc: Alexander Viro, Sage Weil, linux-fsdevel, linux-mm, linux-kernel,
Yunchuan Wen
On Mon, Dec 16, 2013 at 07:00:04AM -0800, Li Wang wrote:
> Currently, Linux only support file system wide VFS
> cache (dentry cache and page cache) cleaning through
> '/proc/sys/vm/drop_caches'. Sometimes this is less
> flexible. The applications may know exactly whether
> the metadata and data will be referenced or not in future,
> a desirable mechanism is to enable applications to
> reclaim the memory of unused cache entries at a finer
> granularity - directory level. This enables applications
> to keep hot metadata and data (to be referenced in the
> future) in the cache, and kick unused out to avoid
> cache thrashing. Another advantage is it is more flexible
> for debugging.
>
> This patch extend the 'drop_caches' interface to
> support directory level cache cleaning and has a complete
> backward compatibility. '{1,2,3}' keeps the same semantics
> as before. Besides, "{1,2,3}:DIRECTORY_PATH_NAME" is allowed
> to recursively clean the caches under DIRECTORY_PATH_NAME.
> For example, 'echo 1:/home/foo/jpg > /proc/sys/vm/drop_caches'
> will clean the page caches of the files inside 'home/foo/jpg'.
>
> It is easy to demonstrate the advantage of directory level
> cache cleaning. We use a virtual machine configured with
> an Intel(R) Xeon(R) 8-core CPU E5506 @ 2.13GHz, and with 1GB
> memory. Three directories named '1', '2' and '3' are created,
> with each containing 180000 – 280000 files. The test program
> opens all files in a directory and then tries the next directory.
> The order for accessing the directories is '1', '2', '3',
> '1'.
>
> The time on accessing '1' on the second time is measured
> with/without cache cleaning, under different file counts.
> With cache cleaning, we clean all cache entries of files
> in '2' before accessing the files in '3'. The results
> are as follows (in seconds),
This sounds like a highly contrived test case. There is no reason
why dentry cache access time would change going from 180k to 280k
files in 3 directories unless you're right at the memory pressure
balance point in terms of cache sizing.
> Note: by default, VFS will move those unreferenced inodes
> into a global LRU list rather than freeing them, for this
> experiment, we modified iput() to force to free inode as well,
> this behavior and related codes are left for further discussion,
> thus not reflected in this patch)
>
> Number of files: 180000 200000 220000 240000 260000
> Without cleaning: 2.165 6.977 10.032 11.571 13.443
> With cleaning: 1.949 1.906 2.336 2.918 3.651
>
> When the number of files is 180000 in each directory,
> the metadata cache is large enough to buffer all entries
> of three directories, so re-accessing '1' will hit in
> the cache, regardless of whether '2' cleaned up or not.
> As the number of files increases, the cache can now only
> buffer two+ directories. Accessing '3' will result in some
> entries of '1' to be evicted (due to LRU). When re-accessing '1',
> some entries need be reloaded from disk, which is time-consuming.
Ok, so exactly as I thought - your example working set is slightly
larger than what the cache holds. Hence what you are describing is
a cache reclaim threshold effect: something you can avoid with
/proc/sys/vm/vfs_cache_pressure.
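As an illustration of that tunable, a minimal sketch of setting it from C. The helper write_int_tunable() is hypothetical; the path is the real sysctl file, writing it normally requires root, and the value 50 in the usage note is only an example, not a recommendation.

```c
#include <stdio.h>

/* Path of the real tunable; the default value is 100. */
#define VFS_CACHE_PRESSURE_PATH "/proc/sys/vm/vfs_cache_pressure"

/*
 * Hypothetical helper: write an integer to a sysctl-style file.
 * Returns 0 on success, -1 on failure.
 */
int write_int_tunable(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (f == NULL)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	/* Buffered write errors surface at fclose(), so check it as well. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, write_int_tunable(VFS_CACHE_PRESSURE_PATH, 50) asks the kernel to prefer retaining dentry and inode caches more than it does at the default of 100.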
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH 0/5] VFS: Directory level cache cleaning
2013-12-17 9:31 ` Li Wang
@ 2013-12-18 1:26 ` Li Zefan
0 siblings, 0 replies; 16+ messages in thread
From: Li Zefan @ 2013-12-18 1:26 UTC (permalink / raw)
To: Li Wang
Cc: Matthew Wilcox, Cong Wang, Alexander Viro, Sage Weil,
linux-fsdevel, linux-mm, LKML, Yunchuan Wen
On 2013/12/17 17:31, Li Wang wrote:
> This extension is just add-on extension. The original debugging
> capability is still there, and more flexible debugging is now allowed.
>
But your intent is to let applications use this interface for
non-debugging purposes.
* Re: [PATCH 0/5] VFS: Directory level cache cleaning
2013-12-17 22:05 ` Dave Chinner
@ 2013-12-18 1:36 ` Li Wang
0 siblings, 0 replies; 16+ messages in thread
From: Li Wang @ 2013-12-18 1:36 UTC (permalink / raw)
To: Dave Chinner
Cc: Alexander Viro, Sage Weil, linux-fsdevel, linux-mm, linux-kernel,
Yunchuan Wen, Cong Wang, Li Zefan, Matthew Wilcox, Michal Hocko,
Andrew Morton
Both 'drop_caches' and 'vfs_cache_pressure' offer only coarse-grained
control, which sometimes does not help much for performance-sensitive
applications. General, simple algorithms are good because they are
application independent and work well in normal situations. However,
since applications know the most about what they are doing, they can
always do better if given the chance. I think that is why compilers
have directives such as __inline__ and __align__, and why CPU caches
provide __prefetch__, etc. Similarly, I think we should give
applications more ability to manipulate the metadata/page cache.
This could help avoid performance degradation due to cache thrashing.

'drop_caches' may not be the right way to go, since it is intended
for debugging. 'fadvise' was originally proposed for this purpose, so
I think we could start by making 'fadvise' handle directory-level
page cache cleaning.
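
For comparison, a userspace approximation is already possible with the
existing per-file interface. The sketch below walks a directory tree
and calls posix_fadvise() with POSIX_FADV_DONTNEED on each regular
file. The helper name drop_page_cache_under is hypothetical, and this
is not the directory-level fadvise extension suggested above. The
advice is only a hint to the kernel, dirty pages are generally not
dropped until they have been written back, and dentries and inodes are
not touched.

```python
import os


def drop_page_cache_under(root):
    """Advise the kernel to drop cached pages of regular files under root.

    Returns the number of files for which the advice was accepted.
    """
    advised = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, FIFOs, sockets and other non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                fd = os.open(path, os.O_RDONLY)
            except OSError:
                continue  # unreadable, or removed since the walk
            try:
                # Offset 0 with length 0 covers the whole file.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
                advised += 1
            except OSError:
                pass  # the filesystem rejected the advice
            finally:
                os.close(fd)
    return advised
```

Calling drop_page_cache_under("/home/foo/jpg") would roughly correspond
to the 'echo 1:/home/foo/jpg' example from the cover letter, except
that it needs one open() and one fadvise call per file and works
without root privileges on files the caller can read.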
Thread overview: 16+ messages
2013-12-16 15:00 [PATCH 0/5] VFS: Directory level cache cleaning Li Wang
2013-12-16 15:00 ` [PATCH 1/5] VFS: Convert drop_caches to accept string Li Wang
2013-12-16 15:00 ` [PATCH 2/5] VFS: Convert sysctl_drop_caches to string Li Wang
2013-12-16 15:00 ` [PATCH 3/5] VFS: Add the declaration of shrink_pagecache_parent Li Wang
2013-12-16 15:00 ` [PATCH 4/5] VFS: Add shrink_pagecache_parent Li Wang
2013-12-16 15:00 ` [PATCH 5/5] VFS: Extend drop_caches sysctl handler to allow directory level cache cleaning Li Wang
2013-12-16 17:45 ` [PATCH 0/5] VFS: Directory " Cong Wang
2013-12-17 3:08 ` Li Wang
2013-12-17 3:58 ` Matthew Wilcox
2013-12-17 7:23 ` Li Wang
2013-12-17 9:12 ` Li Zefan
2013-12-17 9:31 ` Li Wang
2013-12-18 1:26 ` Li Zefan
2013-12-17 13:55 ` Michal Hocko
2013-12-17 22:05 ` Dave Chinner
2013-12-18 1:36 ` Li Wang