* [PATCH 0/3] add support for drop_caches for individual filesystem
@ 2024-10-10 11:25 Ye Bin
2024-10-10 11:25 ` [PATCH 1/3] vfs: introduce shrink_icache_sb() helper Ye Bin
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Ye Bin @ 2024-10-10 11:25 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel; +Cc: linux-kernel, yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>
To better analyze cases where a file system cannot be unmounted because a
kernel module holds files open, it is useful to reclaim dentries on a single
file system. At present only global dentry reclaim is supported; there is no
way to reclaim dentries for one file system on its own. Per-filesystem
reclaim helps when localizing such problems, and it also gives users a
slightly finer-grained pagecache/dentry reclaim mechanism.
This patchset adds support for reclaiming pagecache/dentries for individual
file systems.
Ye Bin (3):
vfs: introduce shrink_icache_sb() helper
sysctl: add support for drop_caches for individual filesystem
Documentation: add instructions for using 'drop_fs_caches sysctl' sysctl
Documentation/admin-guide/sysctl/vm.rst | 27 ++++++++++++++++
fs/drop_caches.c | 43 +++++++++++++++++++++++++
fs/inode.c | 17 ++++++++++
fs/internal.h | 1 +
include/linux/mm.h | 2 ++
kernel/sysctl.c | 9 ++++++
6 files changed, 99 insertions(+)
--
2.31.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/3] vfs: introduce shrink_icache_sb() helper
2024-10-10 11:25 [PATCH 0/3] add support for drop_caches for individual filesystem Ye Bin
@ 2024-10-10 11:25 ` Ye Bin
2024-10-10 12:07 ` Jan Kara
2024-10-10 11:25 ` [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem Ye Bin
2024-10-10 11:25 ` [PATCH 3/3] Documentation: add instructions for using 'drop_fs_caches sysctl' sysctl Ye Bin
2 siblings, 1 reply; 13+ messages in thread
From: Ye Bin @ 2024-10-10 11:25 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel; +Cc: linux-kernel, yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>

This patch prepares for supporting drop_caches on a specific file system.
The shrink_icache_sb() helper walks the superblock inode LRU for freeable
inodes and attempts to free them.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/inode.c    | 17 +++++++++++++++++
 fs/internal.h |  1 +
 2 files changed, 18 insertions(+)

diff --git a/fs/inode.c b/fs/inode.c
index 1939f711d2c9..2129b48571b4 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1045,6 +1045,23 @@ long prune_icache_sb(struct super_block *sb, struct shrink_control *sc)
 	return freed;
 }

+/*
+ * Walk the superblock inode LRU for freeable inodes and attempt to free them.
+ * Inodes to be freed are moved to a temporary list and then are freed outside
+ * inode_lock by dispose_list().
+ */
+void shrink_icache_sb(struct super_block *sb)
+{
+	do {
+		LIST_HEAD(dispose);
+
+		list_lru_walk(&sb->s_inode_lru, inode_lru_isolate,
+			      &dispose, 1024);
+		dispose_list(&dispose);
+	} while (list_lru_count(&sb->s_inode_lru) > 0);
+}
+EXPORT_SYMBOL(shrink_icache_sb);
+
 static void __wait_on_freeing_inode(struct inode *inode, bool is_inode_hash_locked);
 /*
  * Called with the inode lock held.
diff --git a/fs/internal.h b/fs/internal.h
index 81c7a085355c..cee79141e308 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -199,6 +199,7 @@ extern int vfs_open(const struct path *, struct file *);
  * inode.c
  */
 extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
+extern void shrink_icache_sb(struct super_block *sb);
 int dentry_needs_remove_privs(struct mnt_idmap *, struct dentry *dentry);
 bool in_group_or_capable(struct mnt_idmap *idmap, const struct inode *inode,
 			 vfsgid_t vfsgid);
--
2.31.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] vfs: introduce shrink_icache_sb() helper
2024-10-10 11:25 ` [PATCH 1/3] vfs: introduce shrink_icache_sb() helper Ye Bin
@ 2024-10-10 12:07 ` Jan Kara
0 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2024-10-10 12:07 UTC (permalink / raw)
To: Ye Bin
Cc: viro, brauner, jack, linux-fsdevel, linux-kernel, yebin10, zhangxiaoxu5

On Thu 10-10-24 19:25:41, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
>
> This patch is prepare for support drop_caches for specify file system.
> shrink_icache_sb() helper walk the superblock inode LRU for freeable inodes
> and attempt to free them.
>
> Signed-off-by: Ye Bin <yebin10@huawei.com>
> ---
>  fs/inode.c    | 17 +++++++++++++++++
>  fs/internal.h |  1 +
>  2 files changed, 18 insertions(+)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 1939f711d2c9..2129b48571b4 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1045,6 +1045,23 @@ long prune_icache_sb(struct super_block *sb, struct shrink_control *sc)
>  	return freed;
>  }
>
> +/*
> + * Walk the superblock inode LRU for freeable inodes and attempt to free them.
> + * Inodes to be freed are moved to a temporary list and then are freed outside
> + * inode_lock by dispose_list().
> + */
> +void shrink_icache_sb(struct super_block *sb)
> +{
> +	do {
> +		LIST_HEAD(dispose);
> +
> +		list_lru_walk(&sb->s_inode_lru, inode_lru_isolate,
> +			      &dispose, 1024);
> +		dispose_list(&dispose);
> +	} while (list_lru_count(&sb->s_inode_lru) > 0);
> +}
> +EXPORT_SYMBOL(shrink_icache_sb);

Hum, but this will livelock if we cannot remove all the inodes? Now I guess
inode_lru_isolate() usually removes busy inodes from the LRU so this should
not happen in practice but such behavior is not guaranteed (we can LRU_SKIP
inodes if i_lock is busy or LRU_RETRY if inode has page cache pages). So I
think we need some safety net here...

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 13+ messages in thread
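The livelock concern in the review above can be illustrated outside the kernel. The following is a minimal userspace sketch, not kernel code and not a fix anyone in the thread proposed: `struct toy_lru`, `toy_lru_walk()`, `toy_lru_count()` and `toy_shrink_bounded()` are hypothetical stand-ins for the inode LRU, `list_lru_walk()`, `list_lru_count()` and the helper under review. It assumes the "safety net" takes the form of a progress check, which is only one possible option.

```c
#include <stddef.h>

/* Hypothetical stand-in for an LRU whose entries may be temporarily busy. */
struct toy_lru {
	size_t freeable;	/* entries a walk can isolate and free */
	size_t pinned;		/* entries every walk has to skip */
};

/* Stand-in for list_lru_walk(): frees at most nr_to_walk entries. */
static size_t toy_lru_walk(struct toy_lru *lru, size_t nr_to_walk)
{
	size_t n = lru->freeable < nr_to_walk ? lru->freeable : nr_to_walk;

	lru->freeable -= n;
	return n;
}

/* Stand-in for list_lru_count(): pinned entries still count. */
static size_t toy_lru_count(const struct toy_lru *lru)
{
	return lru->freeable + lru->pinned;
}

/*
 * Shrink until the LRU is empty, but give up as soon as one pass frees
 * nothing, so entries that can never be isolated do not keep the loop
 * spinning. Returns the number of entries freed.
 */
static size_t toy_shrink_bounded(struct toy_lru *lru)
{
	size_t total = 0;

	while (toy_lru_count(lru) > 0) {
		size_t freed = toy_lru_walk(lru, 1024);

		if (freed == 0)
			break;	/* no progress: stop instead of looping */
		total += freed;
	}
	return total;
}
```

With the original `do { ... } while (count > 0)` shape and a non-zero `pinned`, the toy loop would never terminate; the progress check trades completeness for a guaranteed exit.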
* [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem
2024-10-10 11:25 [PATCH 0/3] add support for drop_caches for individual filesystem Ye Bin
2024-10-10 11:25 ` [PATCH 1/3] vfs: introduce shrink_icache_sb() helper Ye Bin
@ 2024-10-10 11:25 ` Ye Bin
2024-10-10 12:16 ` Jan Kara
` (2 more replies)
2024-10-10 11:25 ` [PATCH 3/3] Documentation: add instructions for using 'drop_fs_caches sysctl' sysctl Ye Bin
2 siblings, 3 replies; 13+ messages in thread
From: Ye Bin @ 2024-10-10 11:25 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel; +Cc: linux-kernel, yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>

To better analyze cases where a file system cannot be unmounted because a
kernel module holds files open, it is useful to reclaim dentries on a single
file system. At present only global dentry reclaim is supported; there is no
way to reclaim dentries for one file system on its own. Per-filesystem
reclaim helps when localizing such problems, and it also gives users a
slightly finer-grained pagecache/dentry reclaim mechanism.
This patch adds support for reclaiming pagecache/dentries for individual
file systems.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/drop_caches.c   | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h |  2 ++
 kernel/sysctl.c    |  9 +++++++++
 3 files changed, 54 insertions(+)

diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index d45ef541d848..99d412cf3e52 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -77,3 +77,46 @@ int drop_caches_sysctl_handler(const struct ctl_table *table, int write,
 	}
 	return 0;
 }
+
+int drop_fs_caches_sysctl_handler(const struct ctl_table *table, int write,
+				  void *buffer, size_t *length, loff_t *ppos)
+{
+	unsigned int major, minor;
+	unsigned int ctl;
+	struct super_block *sb;
+	static int stfu;
+
+	if (!write)
+		return 0;
+
+	if (sscanf(buffer, "%u:%u:%u", &major, &minor, &ctl) != 3)
+		return -EINVAL;
+
+	if (ctl < *((int *)table->extra1) || ctl > *((int *)table->extra2))
+		return -EINVAL;
+
+	sb = user_get_super(MKDEV(major, minor), false);
+	if (!sb)
+		return -EINVAL;
+
+	if (ctl & 1) {
+		lru_add_drain_all();
+		drop_pagecache_sb(sb, NULL);
+		count_vm_event(DROP_PAGECACHE);
+	}
+
+	if (ctl & 2) {
+		shrink_dcache_sb(sb);
+		shrink_icache_sb(sb);
+		count_vm_event(DROP_SLAB);
+	}
+
+	drop_super(sb);
+
+	if (!stfu)
+		pr_info("%s (%d): drop_fs_caches: %u:%u:%d\n", current->comm,
+			task_pid_nr(current), major, minor, ctl);
+	stfu |= ctl & 4;
+
+	return 0;
+}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 344541f8cba0..43079478296f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3788,6 +3788,8 @@ extern bool process_shares_mm(struct task_struct *p, struct mm_struct *mm);
 extern int sysctl_drop_caches;
 int drop_caches_sysctl_handler(const struct ctl_table *, int, void *, size_t *,
 		loff_t *);
+int drop_fs_caches_sysctl_handler(const struct ctl_table *table, int write,
+				  void *buffer, size_t *length, loff_t *ppos);
 #endif

 void drop_slab(void);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 79e6cb1d5c48..d434cbe10e47 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2101,6 +2101,15 @@ static struct ctl_table vm_table[] = {
 		.extra1		= SYSCTL_ONE,
 		.extra2		= SYSCTL_FOUR,
 	},
+	{
+		.procname	= "drop_fs_caches",
+		.data		= NULL,
+		.maxlen		= 256,
+		.mode		= 0200,
+		.proc_handler	= drop_fs_caches_sysctl_handler,
+		.extra1		= SYSCTL_ONE,
+		.extra2		= SYSCTL_FOUR,
+	},
 	{
 		.procname	= "page_lock_unfairness",
 		.data		= &sysctl_page_lock_unfairness,
--
2.31.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
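The input format accepted by the handler in this patch is `major:minor:ctl`, with `ctl` limited to the range 1..4 through `SYSCTL_ONE` and `SYSCTL_FOUR`. A minimal userspace sketch of that parsing step follows; `struct drop_request` and `parse_drop_request()` are hypothetical names used only for illustration, and the sketch mirrors the patch's `sscanf()` call and range check without touching any kernel API.

```c
#include <stdio.h>

/* Hypothetical container for the three fields written to the sysctl. */
struct drop_request {
	unsigned int dev_major;
	unsigned int dev_minor;
	unsigned int ctl;
};

/*
 * Parse "major:minor:ctl" the way the handler in the patch does.
 * Returns 0 on success, -1 if the string is malformed or ctl is
 * outside the 1..4 range the patch enforces.
 */
static int parse_drop_request(const char *buf, struct drop_request *req)
{
	if (sscanf(buf, "%u:%u:%u", &req->dev_major, &req->dev_minor,
		   &req->ctl) != 3)
		return -1;

	if (req->ctl < 1 || req->ctl > 4)
		return -1;

	return 0;
}
```

One consequence of the 1..4 range is that `ctl` values 5 to 7 are rejected, so the logging bit (4) cannot be combined with the two drop bits in a single write.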
* Re: [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem
2024-10-10 11:25 ` [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem Ye Bin
@ 2024-10-10 12:16 ` Jan Kara
2024-10-10 12:44 ` yebin (H)
2024-10-10 13:35 ` Benjamin Coddington
2024-10-10 13:48 ` Thomas Weißschuh
2024-10-10 17:17 ` Al Viro
2 siblings, 2 replies; 13+ messages in thread
From: Jan Kara @ 2024-10-10 12:16 UTC (permalink / raw)
To: Ye Bin
Cc: viro, brauner, jack, linux-fsdevel, linux-kernel, yebin10, zhangxiaoxu5

On Thu 10-10-24 19:25:42, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
>
> In order to better analyze the issue of file system uninstallation caused
> by kernel module opening files, it is necessary to perform dentry recycling

I don't quite understand the use case you mention here. Can you explain it
a bit more (that being said I've needed dropping caches for a particular sb
myself a few times for debugging purposes so I generally agree it is a
useful feature).

> on a single file system. But now, apart from global dentry recycling, it is
> not supported to do dentry recycling on a single file system separately.
> This feature has usage scenarios in problem localization scenarios.At the
> same time, it also provides users with a slightly fine-grained
> pagecache/entry recycling mechanism.
> This patch supports the recycling of pagecache/entry for individual file
> systems.
>
> Signed-off-by: Ye Bin <yebin10@huawei.com>
> ---
>  fs/drop_caches.c   | 43 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/mm.h |  2 ++
>  kernel/sysctl.c    |  9 +++++++++
>  3 files changed, 54 insertions(+)
>
> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> index d45ef541d848..99d412cf3e52 100644
> --- a/fs/drop_caches.c
> +++ b/fs/drop_caches.c
> @@ -77,3 +77,46 @@ int drop_caches_sysctl_handler(const struct ctl_table *table, int write,
>  	}
>  	return 0;
>  }
> +
> +int drop_fs_caches_sysctl_handler(const struct ctl_table *table, int write,
> +				  void *buffer, size_t *length, loff_t *ppos)
> +{
> +	unsigned int major, minor;
> +	unsigned int ctl;
> +	struct super_block *sb;
> +	static int stfu;
> +
> +	if (!write)
> +		return 0;
> +
> +	if (sscanf(buffer, "%u:%u:%u", &major, &minor, &ctl) != 3)
> +		return -EINVAL;

I think specifying bdev major & minor number is not a great interface these
days. In particular for filesystems which are not bdev based such as NFS. I
think specifying path to some file/dir in the filesystem is nicer and you
can easily resolve that to sb here as well.

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem
2024-10-10 12:16 ` Jan Kara
@ 2024-10-10 12:44 ` yebin (H)
2024-10-10 13:35 ` Benjamin Coddington
1 sibling, 0 replies; 13+ messages in thread
From: yebin (H) @ 2024-10-10 12:44 UTC (permalink / raw)
To: Jan Kara, Ye Bin; +Cc: viro, brauner, linux-fsdevel, linux-kernel, zhangxiaoxu5

On 2024/10/10 20:16, Jan Kara wrote:
> On Thu 10-10-24 19:25:42, Ye Bin wrote:
>> From: Ye Bin <yebin10@huawei.com>
>>
>> In order to better analyze the issue of file system uninstallation caused
>> by kernel module opening files, it is necessary to perform dentry recycling
> I don't quite understand the use case you mention here. Can you explain it
> a bit more (that being said I've needed dropping caches for a particular sb
> myself a few times for debugging purposes so I generally agree it is a
> useful feature).

Well, I'm analyzing which files are still open when the file system can't
be unmounted. The process holding the file open cannot be found with fuser,
which means the file may be held open from kernel mode. You can insert a
module or use kprobe to obtain all cached files of the corresponding file
system, but there can be a lot of them, so I want to clean up the
irrelevant ones first.

>> on a single file system. But now, apart from global dentry recycling, it is
>> not supported to do dentry recycling on a single file system separately.
>> This feature has usage scenarios in problem localization scenarios.At the
>> same time, it also provides users with a slightly fine-grained
>> pagecache/entry recycling mechanism.
>> This patch supports the recycling of pagecache/entry for individual file
>> systems.
>>
>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>> ---
>>  fs/drop_caches.c   | 43 +++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/mm.h |  2 ++
>>  kernel/sysctl.c    |  9 +++++++++
>>  3 files changed, 54 insertions(+)
>>
>> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
>> index d45ef541d848..99d412cf3e52 100644
>> --- a/fs/drop_caches.c
>> +++ b/fs/drop_caches.c
>> @@ -77,3 +77,46 @@ int drop_caches_sysctl_handler(const struct ctl_table *table, int write,
>>  	}
>>  	return 0;
>>  }
>> +
>> +int drop_fs_caches_sysctl_handler(const struct ctl_table *table, int write,
>> +				  void *buffer, size_t *length, loff_t *ppos)
>> +{
>> +	unsigned int major, minor;
>> +	unsigned int ctl;
>> +	struct super_block *sb;
>> +	static int stfu;
>> +
>> +	if (!write)
>> +		return 0;
>> +
>> +	if (sscanf(buffer, "%u:%u:%u", &major, &minor, &ctl) != 3)
>> +		return -EINVAL;
> I think specifying bdev major & minor number is not a great interface these
> days. In particular for filesystems which are not bdev based such as NFS. I
> think specifying path to some file/dir in the filesystem is nicer and you
> can easily resolve that to sb here as well.
>
> Honza

That's a really good idea. I think that by specifying the bdev major and
minor you can also reclaim the pagecache of a file system that was detached
with "umount -l" but has not actually been unmounted. In that case the sb
of the file system cannot be found through a path. So I think we can
support both ways. I look forward to your opinion.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem
2024-10-10 12:16 ` Jan Kara
2024-10-10 12:44 ` yebin (H)
@ 2024-10-10 13:35 ` Benjamin Coddington
2024-10-10 17:04 ` Jan Kara
1 sibling, 1 reply; 13+ messages in thread
From: Benjamin Coddington @ 2024-10-10 13:35 UTC (permalink / raw)
To: Jan Kara
Cc: Ye Bin, viro, brauner, linux-fsdevel, linux-kernel, yebin10, zhangxiaoxu5

On 10 Oct 2024, at 8:16, Jan Kara wrote:

> On Thu 10-10-24 19:25:42, Ye Bin wrote:
>> From: Ye Bin <yebin10@huawei.com>
>>
>> In order to better analyze the issue of file system uninstallation caused
>> by kernel module opening files, it is necessary to perform dentry recycling
>
> I don't quite understand the use case you mention here. Can you explain it
> a bit more (that being said I've needed dropping caches for a particular sb
> myself a few times for debugging purposes so I generally agree it is a
> useful feature).
>
>> on a single file system. But now, apart from global dentry recycling, it is
>> not supported to do dentry recycling on a single file system separately.
>> This feature has usage scenarios in problem localization scenarios.At the
>> same time, it also provides users with a slightly fine-grained
>> pagecache/entry recycling mechanism.
>> This patch supports the recycling of pagecache/entry for individual file
>> systems.
>>
>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>> ---
>>  fs/drop_caches.c   | 43 +++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/mm.h |  2 ++
>>  kernel/sysctl.c    |  9 +++++++++
>>  3 files changed, 54 insertions(+)
>>
>> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
>> index d45ef541d848..99d412cf3e52 100644
>> --- a/fs/drop_caches.c
>> +++ b/fs/drop_caches.c
>> @@ -77,3 +77,46 @@ int drop_caches_sysctl_handler(const struct ctl_table *table, int write,
>>  	}
>>  	return 0;
>>  }
>> +
>> +int drop_fs_caches_sysctl_handler(const struct ctl_table *table, int write,
>> +				  void *buffer, size_t *length, loff_t *ppos)
>> +{
>> +	unsigned int major, minor;
>> +	unsigned int ctl;
>> +	struct super_block *sb;
>> +	static int stfu;
>> +
>> +	if (!write)
>> +		return 0;
>> +
>> +	if (sscanf(buffer, "%u:%u:%u", &major, &minor, &ctl) != 3)
>> +		return -EINVAL;
>
> I think specifying bdev major & minor number is not a great interface these
> days. In particular for filesystems which are not bdev based such as NFS. I
> think specifying path to some file/dir in the filesystem is nicer and you
> can easily resolve that to sb here as well.

Slight disagreement here since NFS uses set_anon_super() and major:minor
will work fine with it. I'd prefer it actually since it avoids this
interface having to do a pathwalk and make decisions about what's mounted
where and in what namespace.

Ben
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem
2024-10-10 13:35 ` Benjamin Coddington
@ 2024-10-10 17:04 ` Jan Kara
2024-10-11 11:44 ` Amir Goldstein
0 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2024-10-10 17:04 UTC (permalink / raw)
To: Benjamin Coddington
Cc: Jan Kara, Ye Bin, viro, brauner, linux-fsdevel, linux-kernel, yebin10, zhangxiaoxu5

On Thu 10-10-24 09:35:46, Benjamin Coddington wrote:
> On 10 Oct 2024, at 8:16, Jan Kara wrote:
>
> > On Thu 10-10-24 19:25:42, Ye Bin wrote:
> >> From: Ye Bin <yebin10@huawei.com>
> >>
> >> In order to better analyze the issue of file system uninstallation caused
> >> by kernel module opening files, it is necessary to perform dentry recycling
> >
> > I don't quite understand the use case you mention here. Can you explain it
> > a bit more (that being said I've needed dropping caches for a particular sb
> > myself a few times for debugging purposes so I generally agree it is a
> > useful feature).
> >
> >> on a single file system. But now, apart from global dentry recycling, it is
> >> not supported to do dentry recycling on a single file system separately.
> >> This feature has usage scenarios in problem localization scenarios.At the
> >> same time, it also provides users with a slightly fine-grained
> >> pagecache/entry recycling mechanism.
> >> This patch supports the recycling of pagecache/entry for individual file
> >> systems.
> >>
> >> Signed-off-by: Ye Bin <yebin10@huawei.com>
> >> ---
> >>  fs/drop_caches.c   | 43 +++++++++++++++++++++++++++++++++++++++++++
> >>  include/linux/mm.h |  2 ++
> >>  kernel/sysctl.c    |  9 +++++++++
> >>  3 files changed, 54 insertions(+)
> >>
> >> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> >> index d45ef541d848..99d412cf3e52 100644
> >> --- a/fs/drop_caches.c
> >> +++ b/fs/drop_caches.c
> >> @@ -77,3 +77,46 @@ int drop_caches_sysctl_handler(const struct ctl_table *table, int write,
> >>  	}
> >>  	return 0;
> >>  }
> >> +
> >> +int drop_fs_caches_sysctl_handler(const struct ctl_table *table, int write,
> >> +				  void *buffer, size_t *length, loff_t *ppos)
> >> +{
> >> +	unsigned int major, minor;
> >> +	unsigned int ctl;
> >> +	struct super_block *sb;
> >> +	static int stfu;
> >> +
> >> +	if (!write)
> >> +		return 0;
> >> +
> >> +	if (sscanf(buffer, "%u:%u:%u", &major, &minor, &ctl) != 3)
> >> +		return -EINVAL;
> >
> > I think specifying bdev major & minor number is not a great interface these
> > days. In particular for filesystems which are not bdev based such as NFS. I
> > think specifying path to some file/dir in the filesystem is nicer and you
> > can easily resolve that to sb here as well.
>
> Slight disagreement here since NFS uses set_anon_super() and major:minor
> will work fine with it.

OK, fair point, anon bdev numbers can be used. But filesystems using
get_tree_nodev() would still be problematic.

> I'd prefer it actually since it avoids this
> interface having to do a pathwalk and make decisions about what's mounted
> where and in what namespace.

I don't understand the problem here. We'd do user_path_at(AT_FDCWD, ...,
&path) and then take path.mnt->mnt_sb. That doesn't look terribly
complicated to me. Plus it naturally deals with issues like namespacing
etc. although they are not a huge issue here because the functionality
should be restricted to CAP_SYS_ADMIN anyway.

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 13+ messages in thread
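The two identification schemes debated above, a path versus a major:minor pair, are related in a way that can be seen from userspace. The sketch below is only an analogy and makes no claim about the in-kernel `user_path_at()` lookup: it assumes a Linux/glibc system, uses the standard `stat()` call together with the `major()` and `minor()` macros from `<sys/sysmacros.h>`, and `fs_dev_of_path()` is a hypothetical helper name.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Report the device number that stat() associates with the filesystem
 * holding `path`. Returns 0 on success, -1 on error (errno is set by
 * stat()).
 */
static int fs_dev_of_path(const char *path, unsigned int *maj,
			  unsigned int *min)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;

	*maj = major(st.st_dev);
	*min = minor(st.st_dev);
	return 0;
}
```

A caller could feed the result into the `major:minor:ctl` string of the proposed sysctl, while a path-based interface would let the kernel do the equivalent resolution itself within the caller's mount namespace.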
* Re: [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem
2024-10-10 17:04 ` Jan Kara
@ 2024-10-11 11:44 ` Amir Goldstein
2024-10-14 11:24 ` Jan Kara
0 siblings, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2024-10-11 11:44 UTC (permalink / raw)
To: Jan Kara, brauner
Cc: Benjamin Coddington, Ye Bin, viro, linux-fsdevel, linux-kernel, yebin10, zhangxiaoxu5

On Thu, Oct 10, 2024 at 7:04 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 10-10-24 09:35:46, Benjamin Coddington wrote:
> > On 10 Oct 2024, at 8:16, Jan Kara wrote:
> >
> > > On Thu 10-10-24 19:25:42, Ye Bin wrote:
> > >> From: Ye Bin <yebin10@huawei.com>
> > >>
> > >> In order to better analyze the issue of file system uninstallation caused
> > >> by kernel module opening files, it is necessary to perform dentry recycling
> > >
> > > I don't quite understand the use case you mention here. Can you explain it
> > > a bit more (that being said I've needed dropping caches for a particular sb
> > > myself a few times for debugging purposes so I generally agree it is a
> > > useful feature).
> > >
> > >> on a single file system. But now, apart from global dentry recycling, it is
> > >> not supported to do dentry recycling on a single file system separately.
> > >> This feature has usage scenarios in problem localization scenarios.At the
> > >> same time, it also provides users with a slightly fine-grained
> > >> pagecache/entry recycling mechanism.
> > >> This patch supports the recycling of pagecache/entry for individual file
> > >> systems.
> > >>
> > >> Signed-off-by: Ye Bin <yebin10@huawei.com>
> > >> ---
> > >>  fs/drop_caches.c   | 43 +++++++++++++++++++++++++++++++++++++++++++
> > >>  include/linux/mm.h |  2 ++
> > >>  kernel/sysctl.c    |  9 +++++++++
> > >>  3 files changed, 54 insertions(+)
> > >>
> > >> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> > >> index d45ef541d848..99d412cf3e52 100644
> > >> --- a/fs/drop_caches.c
> > >> +++ b/fs/drop_caches.c
> > >> @@ -77,3 +77,46 @@ int drop_caches_sysctl_handler(const struct ctl_table *table, int write,
> > >>  	}
> > >>  	return 0;
> > >>  }
> > >> +
> > >> +int drop_fs_caches_sysctl_handler(const struct ctl_table *table, int write,
> > >> +				  void *buffer, size_t *length, loff_t *ppos)
> > >> +{
> > >> +	unsigned int major, minor;
> > >> +	unsigned int ctl;
> > >> +	struct super_block *sb;
> > >> +	static int stfu;
> > >> +
> > >> +	if (!write)
> > >> +		return 0;
> > >> +
> > >> +	if (sscanf(buffer, "%u:%u:%u", &major, &minor, &ctl) != 3)
> > >> +		return -EINVAL;
> > >
> > > I think specifying bdev major & minor number is not a great interface these
> > > days. In particular for filesystems which are not bdev based such as NFS. I
> > > think specifying path to some file/dir in the filesystem is nicer and you
> > > can easily resolve that to sb here as well.
> >
> > Slight disagreement here since NFS uses set_anon_super() and major:minor
> > will work fine with it.
>
> OK, fair point, anon bdev numbers can be used. But filesystems using
> get_tree_nodev() would still be problematic.
>
> > I'd prefer it actually since it avoids this
> > interface having to do a pathwalk and make decisions about what's mounted
> > where and in what namespace.
>
> I don't understand the problem here. We'd do user_path_at(AT_FDCWD, ...,
> &path) and then take path.mnt->mnt_sb. That doesn't look terribly
> complicated to me. Plus it naturally deals with issues like namespacing
> etc. although they are not a huge issue here because the functionality
> should be restricted to CAP_SYS_ADMIN anyway.
>

Both looking up bdev and looking up path from write() can make syzbot
and lockdep very upset:
https://lore.kernel.org/linux-fsdevel/00000000000098f75506153551a1@google.com/

I thought Christian had a proposal for dropping cache per-sb API via
fadvise() or something?

Why use sysfs API for this and not fd to reference an sb?

Thanks,
Amir.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem
2024-10-11 11:44 ` Amir Goldstein
@ 2024-10-14 11:24 ` Jan Kara
0 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2024-10-14 11:24 UTC (permalink / raw)
To: Amir Goldstein
Cc: Jan Kara, brauner, Benjamin Coddington, Ye Bin, viro, linux-fsdevel, linux-kernel, yebin10, zhangxiaoxu5

On Fri 11-10-24 13:44:57, Amir Goldstein wrote:
> On Thu, Oct 10, 2024 at 7:04 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Thu 10-10-24 09:35:46, Benjamin Coddington wrote:
> > > On 10 Oct 2024, at 8:16, Jan Kara wrote:
> > >
> > > > On Thu 10-10-24 19:25:42, Ye Bin wrote:
> > > >> From: Ye Bin <yebin10@huawei.com>
> > > >>
> > > >> In order to better analyze the issue of file system uninstallation caused
> > > >> by kernel module opening files, it is necessary to perform dentry recycling
> > > >
> > > > I don't quite understand the use case you mention here. Can you explain it
> > > > a bit more (that being said I've needed dropping caches for a particular sb
> > > > myself a few times for debugging purposes so I generally agree it is a
> > > > useful feature).
> > > >
> > > >> on a single file system. But now, apart from global dentry recycling, it is
> > > >> not supported to do dentry recycling on a single file system separately.
> > > >> This feature has usage scenarios in problem localization scenarios.At the
> > > >> same time, it also provides users with a slightly fine-grained
> > > >> pagecache/entry recycling mechanism.
> > > >> This patch supports the recycling of pagecache/entry for individual file
> > > >> systems.
> > > >>
> > > >> Signed-off-by: Ye Bin <yebin10@huawei.com>
> > > >> ---
> > > >>  fs/drop_caches.c   | 43 +++++++++++++++++++++++++++++++++++++++++++
> > > >>  include/linux/mm.h |  2 ++
> > > >>  kernel/sysctl.c    |  9 +++++++++
> > > >>  3 files changed, 54 insertions(+)
> > > >>
> > > >> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> > > >> index d45ef541d848..99d412cf3e52 100644
> > > >> --- a/fs/drop_caches.c
> > > >> +++ b/fs/drop_caches.c
> > > >> @@ -77,3 +77,46 @@ int drop_caches_sysctl_handler(const struct ctl_table *table, int write,
> > > >>  	}
> > > >>  	return 0;
> > > >>  }
> > > >> +
> > > >> +int drop_fs_caches_sysctl_handler(const struct ctl_table *table, int write,
> > > >> +				  void *buffer, size_t *length, loff_t *ppos)
> > > >> +{
> > > >> +	unsigned int major, minor;
> > > >> +	unsigned int ctl;
> > > >> +	struct super_block *sb;
> > > >> +	static int stfu;
> > > >> +
> > > >> +	if (!write)
> > > >> +		return 0;
> > > >> +
> > > >> +	if (sscanf(buffer, "%u:%u:%u", &major, &minor, &ctl) != 3)
> > > >> +		return -EINVAL;
> > > >
> > > > I think specifying bdev major & minor number is not a great interface these
> > > > days. In particular for filesystems which are not bdev based such as NFS. I
> > > > think specifying path to some file/dir in the filesystem is nicer and you
> > > > can easily resolve that to sb here as well.
> > >
> > > Slight disagreement here since NFS uses set_anon_super() and major:minor
> > > will work fine with it.
> >
> > OK, fair point, anon bdev numbers can be used. But filesystems using
> > get_tree_nodev() would still be problematic.
> >
> > > I'd prefer it actually since it avoids this
> > > interface having to do a pathwalk and make decisions about what's mounted
> > > where and in what namespace.
> >
> > I don't understand the problem here. We'd do user_path_at(AT_FDCWD, ...,
> > &path) and then take path.mnt->mnt_sb. That doesn't look terribly
> > complicated to me. Plus it naturally deals with issues like namespacing
> > etc. although they are not a huge issue here because the functionality
> > should be restricted to CAP_SYS_ADMIN anyway.
> >
>
> Both looking up bdev and looking up path from write() can make syzbot
> and lockdep very upset:
> https://lore.kernel.org/linux-fsdevel/00000000000098f75506153551a1@google.com/

OK, thanks for the reference.

> I thought Christian had a proposal for dropping cache per-sb API via
> fadvise() or something?
>
> Why use sysfs API for this and not fd to reference an sb?

I guess because the original drop_caches is in the sysfs. But yes, in
principle we could use fd pointing to the filesystem for this. I'm just not
sure fadvise(2) is really the right syscall for this because it is
currently all about page cache of a file and this call should shrink also
the dcache / icache. But ioctl() (not sure if this debug-mostly
functionality is worth a syscall) implemented in VFS would certainly be
possible and perhaps nicer than sysfs interface.

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem 2024-10-10 11:25 ` [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem Ye Bin 2024-10-10 12:16 ` Jan Kara @ 2024-10-10 13:48 ` Thomas Weißschuh 2024-10-10 17:17 ` Al Viro 2 siblings, 0 replies; 13+ messages in thread From: Thomas Weißschuh @ 2024-10-10 13:48 UTC (permalink / raw) To: Ye Bin Cc: viro, brauner, jack, linux-fsdevel, linux-kernel, yebin10, zhangxiaoxu5 On 2024-10-10 19:25:42+0800, Ye Bin wrote: > From: Ye Bin <yebin10@huawei.com> > > In order to better analyze the issue of file system uninstallation caused > by kernel module opening files, it is necessary to perform dentry recycling > on a single file system. But now, apart from global dentry recycling, it is > not supported to do dentry recycling on a single file system separately. > This feature has usage scenarios in problem localization scenarios.At the > same time, it also provides users with a slightly fine-grained > pagecache/entry recycling mechanism. > This patch supports the recycling of pagecache/entry for individual file > systems. 
> > Signed-off-by: Ye Bin <yebin10@huawei.com> > --- > fs/drop_caches.c | 43 +++++++++++++++++++++++++++++++++++++++++++ > include/linux/mm.h | 2 ++ > kernel/sysctl.c | 9 +++++++++ > 3 files changed, 54 insertions(+) > > diff --git a/fs/drop_caches.c b/fs/drop_caches.c > index d45ef541d848..99d412cf3e52 100644 > --- a/fs/drop_caches.c > +++ b/fs/drop_caches.c > @@ -77,3 +77,46 @@ int drop_caches_sysctl_handler(const struct ctl_table *table, int write, > } > return 0; > } > + > +int drop_fs_caches_sysctl_handler(const struct ctl_table *table, int write, > + void *buffer, size_t *length, loff_t *ppos) > +{ > + unsigned int major, minor; > + unsigned int ctl; > + struct super_block *sb; > + static int stfu; > + > + if (!write) > + return 0; > + > + if (sscanf(buffer, "%u:%u:%u", &major, &minor, &ctl) != 3) > + return -EINVAL; > + > + if (ctl < *((int *)table->extra1) || ctl > *((int *)table->extra2)) > + return -EINVAL; > + > + sb = user_get_super(MKDEV(major, minor), false); > + if (!sb) > + return -EINVAL; > + > + if (ctl & 1) { BIT(0) > + lru_add_drain_all(); > + drop_pagecache_sb(sb, NULL); > + count_vm_event(DROP_PAGECACHE); > + } > + > + if (ctl & 2) { > + shrink_dcache_sb(sb); > + shrink_icache_sb(sb); > + count_vm_event(DROP_SLAB); > + } > + > + drop_super(sb); > + > + if (!stfu) > + pr_info("%s (%d): drop_fs_caches: %u:%u:%d\n", current->comm, > + task_pid_nr(current), major, minor, ctl); > + stfu |= ctl & 4; This looks very weird. I guess it's already in the original drop_caches_sysctl_handler(). 
> +
> +	return 0;
> +}
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 344541f8cba0..43079478296f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3788,6 +3788,8 @@ extern bool process_shares_mm(struct task_struct *p, struct mm_struct *mm);
>  extern int sysctl_drop_caches;
>  int drop_caches_sysctl_handler(const struct ctl_table *, int, void *, size_t *,
>  		loff_t *);
> +int drop_fs_caches_sysctl_handler(const struct ctl_table *table, int write,
> +				  void *buffer, size_t *length, loff_t *ppos);
>  #endif
>
>  void drop_slab(void);
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 79e6cb1d5c48..d434cbe10e47 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -2101,6 +2101,15 @@ static struct ctl_table vm_table[] = {

Sooner or later this table should move out of kernel/sysctl.c and into a
subsystem-specific file.
This also means the handler doesn't need to be exported.

>  		.extra1		= SYSCTL_ONE,
>  		.extra2		= SYSCTL_FOUR,
>  	},
> +	{
> +		.procname	= "drop_fs_caches",
> +		.data		= NULL,

NULL is already the default.

> +		.maxlen		= 256,

The maxlen field refers to the data field.
As there is no data, there should be no maxlen.

> +		.mode		= 0200,
> +		.proc_handler	= drop_fs_caches_sysctl_handler,
> +		.extra1		= SYSCTL_ONE,
> +		.extra2		= SYSCTL_FOUR,

These extras are meant as parameters for generic handlers.
Inlining the limits into your handler makes it much clearer.

> +	},
>  	{
>  		.procname	= "page_lock_unfairness",
>  		.data		= &sysctl_page_lock_unfairness,
> --
> 2.31.1
>

^ permalink raw reply [flat|nested] 13+ messages in thread
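For illustration only, here is a userspace sketch of what the review comments on the handler might look like when applied: named BIT() masks instead of bare 1/2/4, and the accepted range written inline instead of being read through extra1/extra2. It is not kernel code and not the author's implementation. The names parse_drop_fs_request, struct drop_fs_request and the DROP_FS_* macros are hypothetical, and BIT() is redefined locally because the kernel macro is not available in userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Local stand-in for the kernel's BIT() macro; this sketch is userspace C. */
#define BIT(n) (1U << (n))

/* Hypothetical names for the three control bits used in the patch. */
#define DROP_FS_PAGECACHE	BIT(0)	/* patch: ctl & 1 */
#define DROP_FS_SLAB		BIT(1)	/* patch: ctl & 2 */
#define DROP_FS_QUIET		BIT(2)	/* patch: ctl & 4, silences the log line */

/* Same accepted range as SYSCTL_ONE..SYSCTL_FOUR, written inline. */
#define DROP_FS_CTL_MIN		1U
#define DROP_FS_CTL_MAX		4U

/* Hypothetical container for one parsed "MAJOR:MINOR:CTL" request. */
struct drop_fs_request {
	unsigned int major;
	unsigned int minor;
	unsigned int ctl;
};

/*
 * Parse "MAJOR:MINOR:CTL" and range-check CTL.
 * Returns 0 on success, -EINVAL on malformed or out-of-range input.
 */
int parse_drop_fs_request(const char *buf, struct drop_fs_request *req)
{
	assert(buf != NULL && req != NULL);

	if (sscanf(buf, "%u:%u:%u", &req->major, &req->minor, &req->ctl) != 3)
		return -EINVAL;

	if (req->ctl < DROP_FS_CTL_MIN || req->ctl > DROP_FS_CTL_MAX)
		return -EINVAL;

	return 0;
}
```

A caller would then test req.ctl & DROP_FS_PAGECACHE and req.ctl & DROP_FS_SLAB rather than magic numbers. Note that with the 1..4 range copied from the patch, a combined value such as 5 (quiet plus pagecache) is rejected by this sketch.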
* Re: [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem
2024-10-10 11:25 ` [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem Ye Bin
2024-10-10 12:16 ` Jan Kara
2024-10-10 13:48 ` Thomas Weißschuh
@ 2024-10-10 17:17 ` Al Viro
2 siblings, 0 replies; 13+ messages in thread
From: Al Viro @ 2024-10-10 17:17 UTC (permalink / raw)
To: Ye Bin; +Cc: brauner, jack, linux-fsdevel, linux-kernel, yebin10, zhangxiaoxu5

On Thu, Oct 10, 2024 at 07:25:42PM +0800, Ye Bin wrote:

> +	if (sscanf(buffer, "%u:%u:%u", &major, &minor, &ctl) != 3)
> +		return -EINVAL;
> +
> +	if (ctl < *((int *)table->extra1) || ctl > *((int *)table->extra2))
> +		return -EINVAL;
> +
> +	sb = user_get_super(MKDEV(major, minor), false);
> +	if (!sb)
> +		return -EINVAL;

Odd user interface aside, you do realize that you've just grabbed ->s_umount
from inside a ->write() instance? Considering how much can be grabbed under
->s_umount... Ow.

IOW, I very much doubt that doing that kind of stuff from sysctl is a good
idea - if nothing else, we'll end up with syzbot screaming its head off about
many and varied potential deadlocks, as soon as it discovers that one. And I
wouldn't swear that all of those would be false positives.

^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 3/3] Documentation: add instructions for using 'drop_fs_caches sysctl' sysctl
2024-10-10 11:25 [PATCH 0/3] add support for drop_caches for individual filesystem Ye Bin
2024-10-10 11:25 ` [PATCH 1/3] vfs: introduce shrink_icache_sb() helper Ye Bin
2024-10-10 11:25 ` [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem Ye Bin
@ 2024-10-10 11:25 ` Ye Bin
2 siblings, 0 replies; 13+ messages in thread
From: Ye Bin @ 2024-10-10 11:25 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel; +Cc: linux-kernel, yebin10, zhangxiaoxu5

From: Ye Bin <yebin10@huawei.com>

Add instructions for the 'drop_fs_caches' sysctl in 'vm.rst'.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 Documentation/admin-guide/sysctl/vm.rst | 27 +++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f48eaa98d22d..4648ac1ac66c 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -36,6 +36,7 @@ Currently, these files are in /proc/sys/vm:
 - dirtytime_expire_seconds
 - dirty_writeback_centisecs
 - drop_caches
+- drop_fs_caches
 - enable_soft_offline
 - extfrag_threshold
 - highmem_is_dirtyable
@@ -268,6 +269,32 @@ used::
 These are informational only. They do not mean that anything is wrong
 with your system. To disable them, echo 4 (bit 2) into drop_caches.
 
+drop_fs_caches
+==============
+
+Writing to this will cause the kernel to drop clean caches for a specific file
+system, as well as reclaimable slab objects like dentries and inodes. Once
+dropped, their memory becomes free. Except for specifying the device number for
+a specific file system, everything else is consistent with drop_caches. The
+device number can be viewed through "cat /proc/self/mountinfo" or 'lsblk'.
+
+To free pagecache::
+
+	echo "MAJOR:MINOR:1" > /proc/sys/vm/drop_fs_caches
+
+To free reclaimable slab objects (includes dentries and inodes)::
+
+	echo "MAJOR:MINOR:2" > /proc/sys/vm/drop_fs_caches
+
+To free slab objects and pagecache::
+
+	echo "MAJOR:MINOR:3" > /proc/sys/vm/drop_fs_caches
+
+You may see informational messages in your kernel log when this file is
+used::
+
+	echo (1234): drop_fs_caches: MAJOR:MINOR:3
+
 enable_soft_offline
 ===================
 Correctable memory errors are very common on servers. Soft-offline is kernel's
-- 
2.31.1

^ permalink raw reply related [flat|nested] 13+ messages in thread
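As a rough illustration of how the MAJOR:MINOR part of the request string could be obtained programmatically, the sketch below builds "MAJOR:MINOR:CTL" from the st_dev value that stat(2) reports for a path. This is an assumption layered on the patch, which itself only points to /proc/self/mountinfo and lsblk. It also assumes st_dev matches the device number the kernel uses for the superblock, which may not hold on every filesystem (btrfs subvolumes are a known exception). The helper name format_drop_fs_request is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format "MAJOR:MINOR:CTL" for the filesystem that
 * holds `path`, using the st_dev value reported by stat(2).
 * Returns 0 on success, -1 if stat() fails or the buffer is too small.
 */
int format_drop_fs_request(const char *path, unsigned int ctl,
			   char *out, size_t outlen)
{
	struct stat st;
	int n;

	assert(path != NULL && out != NULL);

	if (stat(path, &st) != 0)
		return -1;

	/* major()/minor() split the dev_t into its two components. */
	n = snprintf(out, outlen, "%u:%u:%u",
		     major(st.st_dev), minor(st.st_dev), ctl);
	if (n < 0 || (size_t)n >= outlen)
		return -1;

	return 0;
}
```

The resulting string is what would be written to the proposed /proc/sys/vm/drop_fs_caches file if this series were applied; the sketch deliberately stops short of performing that write.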
end of thread, other threads:[~2024-10-14 11:24 UTC | newest]

Thread overview: 13+ messages
2024-10-10 11:25 [PATCH 0/3] add support for drop_caches for individual filesystem Ye Bin
2024-10-10 11:25 ` [PATCH 1/3] vfs: introduce shrink_icache_sb() helper Ye Bin
2024-10-10 12:07 ` Jan Kara
2024-10-10 11:25 ` [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem Ye Bin
2024-10-10 12:16 ` Jan Kara
2024-10-10 12:44 ` yebin (H)
2024-10-10 13:35 ` Benjamin Coddington
2024-10-10 17:04 ` Jan Kara
2024-10-11 11:44 ` Amir Goldstein
2024-10-14 11:24 ` Jan Kara
2024-10-10 13:48 ` Thomas Weißschuh
2024-10-10 17:17 ` Al Viro
2024-10-10 11:25 ` [PATCH 3/3] Documentation: add instructions for using 'drop_fs_caches sysctl' sysctl Ye Bin