* [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages
@ 2025-01-17 16:41 Jaegeuk Kim
2025-01-17 16:41 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-17 16:41 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim, linux-mm, linux-fsdevel
If users know, from a system-wide view, exactly which file-backed pages should
be reclaimed, they can use this ioctl() to register them in advance and reclaim
them all at once later.
To MM and others,
I'd like to propose this API in F2FS only, since
1) the use-case is quite limited in Android at the moment. Once it's generally
accepted with more use-cases, I'm happy to propose a generic API such as fadvise.
Please chime in if there's any need.
2) these are file-backed pages, which requires maintaining a list of inode objects.
I'm not sure this fits in MM, though; I'm also happy to listen to any feedback.
Jaegeuk Kim (2):
f2fs: register inodes which is able to donate pages
f2fs: add a sysfs entry to request donate file-backed pages
Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++
fs/f2fs/debug.c | 3 ++
fs/f2fs/f2fs.h | 14 +++++-
fs/f2fs/file.c | 65 +++++++++++++++++++++++++
fs/f2fs/inode.c | 14 ++++++
fs/f2fs/shrinker.c | 27 ++++++++++
fs/f2fs/super.c | 1 +
fs/f2fs/sysfs.c | 8 +++
include/uapi/linux/f2fs.h | 7 +++
9 files changed, 145 insertions(+), 1 deletion(-)
--
2.48.0.rc2.279.g1de40edade-goog
* [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-17 16:41 [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages Jaegeuk Kim
@ 2025-01-17 16:41 ` Jaegeuk Kim
2025-01-21 9:22 ` [f2fs-dev] " Chao Yu
2025-01-17 16:41 ` [PATCH 2/2] f2fs: add a sysfs entry to request donate file-backed pages Jaegeuk Kim
2025-01-17 18:05 ` [PATCH 0/2 v6] add ioctl/sysfs to " Matthew Wilcox
2 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-17 16:41 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim
This patch introduces an inode list that records, per inode, the page cache
range the user is willing to donate, so that those pages can be reclaimed
together later.
#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
struct f2fs_donate_range)
struct f2fs_donate_range {
__u64 start;
__u64 len;
};
e.g., ioctl(fd, F2FS_IOC_DONATE_RANGE, &range);
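For illustration only, a userspace caller might look roughly like the sketch below. It assumes the uapi definitions from this patch (they are copied locally because they may not be present in installed kernel headers); the helper names donate_page_span() and donate_range() are hypothetical and not part of the patch. donate_page_span() mirrors how the handler turns the byte range into an inclusive page-index range.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Local copies of the uapi additions proposed in this patch. */
#ifndef F2FS_IOCTL_MAGIC
#define F2FS_IOCTL_MAGIC	0xf5
#endif

struct f2fs_donate_range {
	uint64_t start;	/* byte offset into the file */
	uint64_t len;	/* length in bytes; 0 removes the registration */
};

#define F2FS_IOC_DONATE_RANGE	_IOW(F2FS_IOCTL_MAGIC, 27, \
					struct f2fs_donate_range)

/*
 * Hypothetical helper: compute the inclusive page span the handler stores,
 * i.e. first = start >> PAGE_SHIFT and
 * last = DIV_ROUND_UP(start + len, PAGE_SIZE) - 1.
 */
void donate_page_span(uint64_t start, uint64_t len, uint64_t page_size,
		      uint64_t *first, uint64_t *last)
{
	assert(len > 0 && page_size > 0);
	*first = start / page_size;
	*last = (start + len + page_size - 1) / page_size - 1;
}

/* Hypothetical helper: register (or, with len == 0, unregister) a range. */
int donate_range(int fd, uint64_t start, uint64_t len)
{
	struct f2fs_donate_range range = { .start = start, .len = len };

	return ioctl(fd, F2FS_IOC_DONATE_RANGE, &range);
}
```

Note that start and len are byte values: with 4 KiB pages, a request of {0, 3} registers page index 0 only.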
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
fs/f2fs/debug.c | 3 ++
fs/f2fs/f2fs.h | 12 +++++++-
fs/f2fs/file.c | 65 +++++++++++++++++++++++++++++++++++++++
fs/f2fs/inode.c | 14 +++++++++
fs/f2fs/super.c | 1 +
include/uapi/linux/f2fs.h | 7 +++++
6 files changed, 101 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 468828288a4a..16c2dfb4f595 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
+ si->ndonate_files = sbi->donate_files;
si->nquota_files = sbi->nquota_files;
si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
si->aw_cnt = atomic_read(&sbi->atomic_files);
@@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
si->compr_inode, si->compr_blocks);
seq_printf(s, " - Swapfile Inode: %u\n",
si->swapfile_inode);
+ seq_printf(s, " - Donate Inode: %u\n",
+ si->ndonate_files);
seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
si->orphans, si->append, si->update);
seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4bfe162eefd3..951fbc3f94c7 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -850,6 +850,11 @@ struct f2fs_inode_info {
#endif
struct list_head dirty_list; /* dirty list for dirs and files */
struct list_head gdirty_list; /* linked in global dirty list */
+
+ /* linked in global inode list for cache donation */
+ struct list_head gdonate_list;
+ loff_t donate_start, donate_end; /* inclusive */
+
struct task_struct *atomic_write_task; /* store atomic write task */
struct extent_tree *extent_tree[NR_EXTENT_CACHES];
/* cached extent_tree entry */
@@ -1274,6 +1279,7 @@ enum inode_type {
DIR_INODE, /* for dirty dir inode */
FILE_INODE, /* for dirty regular/symlink inode */
DIRTY_META, /* for all dirtied inode metadata */
+ DONATE_INODE, /* for all inode to donate pages */
NR_INODE_TYPE,
};
@@ -1629,6 +1635,9 @@ struct f2fs_sb_info {
unsigned int warm_data_age_threshold;
unsigned int last_age_weight;
+ /* control donate caches */
+ unsigned int donate_files;
+
/* basic filesystem units */
unsigned int log_sectors_per_block; /* log2 sectors per block */
unsigned int log_blocksize; /* log2 block size */
@@ -3984,7 +3993,8 @@ struct f2fs_stat_info {
unsigned long long allocated_data_blocks;
int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
int ndirty_data, ndirty_qdata;
- unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
+ unsigned int ndirty_dirs, ndirty_files, ndirty_all;
+ unsigned int nquota_files, ndonate_files;
int nats, dirty_nats, sits, dirty_sits;
int free_nids, avail_nids, alloc_nids;
int total_count, utilization;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 3e06fea9795c..0213687805fe 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2450,6 +2450,68 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
return ret;
}
+static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
+{
+ struct inode *inode = file_inode(filp);
+ struct mnt_idmap *idmap = file_mnt_idmap(filp);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_donate_range range;
+ u64 max_bytes = F2FS_BLK_TO_BYTES(max_file_blocks(inode));
+ u64 start, end;
+ int ret;
+
+ if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
+ sizeof(range)))
+ return -EFAULT;
+
+ if (!inode_owner_or_capable(idmap, inode))
+ return -EACCES;
+
+ if (!S_ISREG(inode->i_mode))
+ return -EINVAL;
+
+ if (range.start >= max_bytes || range.len > max_bytes ||
+ (range.start + range.len) > max_bytes)
+ return -EINVAL;
+
+ start = range.start >> PAGE_SHIFT;
+ end = DIV_ROUND_UP(range.start + range.len, PAGE_SIZE);
+
+ ret = mnt_want_write_file(filp);
+ if (ret)
+ return ret;
+
+ inode_lock(inode);
+
+ if (f2fs_is_atomic_file(inode))
+ goto out;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ /* let's remove the range, if len = 0 */
+ if (!range.len) {
+ if (!list_empty(&F2FS_I(inode)->gdonate_list)) {
+ list_del_init(&F2FS_I(inode)->gdonate_list);
+ sbi->donate_files--;
+ }
+ } else {
+ if (list_empty(&F2FS_I(inode)->gdonate_list)) {
+ list_add_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ sbi->donate_files++;
+ } else {
+ list_move_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ }
+ F2FS_I(inode)->donate_start = start;
+ F2FS_I(inode)->donate_end = end - 1;
+ }
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+out:
+ inode_unlock(inode);
+ mnt_drop_write_file(filp);
+ return ret;
+}
+
static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
{
struct inode *inode = file_inode(filp);
@@ -4479,6 +4541,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return -EOPNOTSUPP;
case F2FS_IOC_SHUTDOWN:
return f2fs_ioc_shutdown(filp, arg);
+ case F2FS_IOC_DONATE_RANGE:
+ return f2fs_ioc_donate_range(filp, arg);
case FITRIM:
return f2fs_ioc_fitrim(filp, arg);
case FS_IOC_SET_ENCRYPTION_POLICY:
@@ -5230,6 +5294,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case F2FS_IOC_RELEASE_VOLATILE_WRITE:
case F2FS_IOC_ABORT_ATOMIC_WRITE:
case F2FS_IOC_SHUTDOWN:
+ case F2FS_IOC_DONATE_RANGE:
case FITRIM:
case FS_IOC_SET_ENCRYPTION_POLICY:
case FS_IOC_GET_ENCRYPTION_PWSALT:
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 7de33da8b3ea..f9fc58f313f2 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
return 0;
}
+static void f2fs_remove_donate_inode(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+ if (list_empty(&F2FS_I(inode)->gdonate_list))
+ return;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ list_del_init(&F2FS_I(inode)->gdonate_list);
+ sbi->donate_files--;
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+}
+
/*
* Called at the last iput() if i_nlink is zero
*/
@@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
f2fs_bug_on(sbi, get_dirty_pages(inode));
f2fs_remove_dirty_inode(inode);
+ f2fs_remove_donate_inode(inode);
if (!IS_DEVICE_ALIASING(inode))
f2fs_destroy_extent_tree(inode);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index fc7d463dee15..ef639a6d82e5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
spin_lock_init(&fi->i_size_lock);
INIT_LIST_HEAD(&fi->dirty_list);
INIT_LIST_HEAD(&fi->gdirty_list);
+ INIT_LIST_HEAD(&fi->gdonate_list);
init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
init_f2fs_rwsem(&fi->i_xattr_sem);
diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
index f7aaf8d23e20..cd38a7c166e6 100644
--- a/include/uapi/linux/f2fs.h
+++ b/include/uapi/linux/f2fs.h
@@ -44,6 +44,8 @@
#define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
#define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
#define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
+#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
+ struct f2fs_donate_range)
/*
* should be same as XFS_IOC_GOINGDOWN.
@@ -97,4 +99,9 @@ struct f2fs_comp_option {
__u8 log_cluster_size;
};
+struct f2fs_donate_range {
+ __u64 start;
+ __u64 len;
+};
+
#endif /* _UAPI_LINUX_F2FS_H */
--
2.48.0.rc2.279.g1de40edade-goog
* Re: [f2fs-dev] [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-17 16:41 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
@ 2025-01-21 9:22 ` Chao Yu
2025-01-21 16:56 ` Jaegeuk Kim
0 siblings, 1 reply; 23+ messages in thread
From: Chao Yu @ 2025-01-21 9:22 UTC (permalink / raw)
To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel; +Cc: chao
On 1/18/25 00:41, Jaegeuk Kim via Linux-f2fs-devel wrote:
> This patch introduces an inode list to keep the page cache ranges that users
> can donate pages together.
>
> #define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> struct f2fs_donate_range)
> struct f2fs_donate_range {
> __u64 start;
> __u64 len;
> };
>
> e.g., ioctl(F2FS_IOC_DONATE_RANGE, &range);
>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
> fs/f2fs/debug.c | 3 ++
> fs/f2fs/f2fs.h | 12 +++++++-
> fs/f2fs/file.c | 65 +++++++++++++++++++++++++++++++++++++++
> fs/f2fs/inode.c | 14 +++++++++
> fs/f2fs/super.c | 1 +
> include/uapi/linux/f2fs.h | 7 +++++
> 6 files changed, 101 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
> index 468828288a4a..16c2dfb4f595 100644
> --- a/fs/f2fs/debug.c
> +++ b/fs/f2fs/debug.c
> @@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
> si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
> si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
> si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
> + si->ndonate_files = sbi->donate_files;
> si->nquota_files = sbi->nquota_files;
> si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
> si->aw_cnt = atomic_read(&sbi->atomic_files);
> @@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
> si->compr_inode, si->compr_blocks);
> seq_printf(s, " - Swapfile Inode: %u\n",
> si->swapfile_inode);
> + seq_printf(s, " - Donate Inode: %u\n",
> + si->ndonate_files);
> seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
> si->orphans, si->append, si->update);
> seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4bfe162eefd3..951fbc3f94c7 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -850,6 +850,11 @@ struct f2fs_inode_info {
> #endif
> struct list_head dirty_list; /* dirty list for dirs and files */
> struct list_head gdirty_list; /* linked in global dirty list */
> +
> + /* linked in global inode list for cache donation */
> + struct list_head gdonate_list;
> + loff_t donate_start, donate_end; /* inclusive */
Could we use block_t instead of loff_t here? It would avoid unnecessary memory cost.
Thanks,
> +
> struct task_struct *atomic_write_task; /* store atomic write task */
> struct extent_tree *extent_tree[NR_EXTENT_CACHES];
> /* cached extent_tree entry */
> @@ -1274,6 +1279,7 @@ enum inode_type {
> DIR_INODE, /* for dirty dir inode */
> FILE_INODE, /* for dirty regular/symlink inode */
> DIRTY_META, /* for all dirtied inode metadata */
> + DONATE_INODE, /* for all inode to donate pages */
> NR_INODE_TYPE,
> };
>
> @@ -1629,6 +1635,9 @@ struct f2fs_sb_info {
> unsigned int warm_data_age_threshold;
> unsigned int last_age_weight;
>
> + /* control donate caches */
> + unsigned int donate_files;
> +
> /* basic filesystem units */
> unsigned int log_sectors_per_block; /* log2 sectors per block */
> unsigned int log_blocksize; /* log2 block size */
> @@ -3984,7 +3993,8 @@ struct f2fs_stat_info {
> unsigned long long allocated_data_blocks;
> int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
> int ndirty_data, ndirty_qdata;
> - unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
> + unsigned int ndirty_dirs, ndirty_files, ndirty_all;
> + unsigned int nquota_files, ndonate_files;
> int nats, dirty_nats, sits, dirty_sits;
> int free_nids, avail_nids, alloc_nids;
> int total_count, utilization;
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 3e06fea9795c..0213687805fe 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -2450,6 +2450,68 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
> return ret;
> }
>
> +static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
> +{
> + struct inode *inode = file_inode(filp);
> + struct mnt_idmap *idmap = file_mnt_idmap(filp);
> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> + struct f2fs_donate_range range;
> + u64 max_bytes = F2FS_BLK_TO_BYTES(max_file_blocks(inode));
> + u64 start, end;
> + int ret;
> +
> + if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
> + sizeof(range)))
> + return -EFAULT;
> +
> + if (!inode_owner_or_capable(idmap, inode))
> + return -EACCES;
> +
> + if (!S_ISREG(inode->i_mode))
> + return -EINVAL;
> +
> + if (range.start >= max_bytes || range.len > max_bytes ||
> + (range.start + range.len) > max_bytes)
> + return -EINVAL;
> +
> + start = range.start >> PAGE_SHIFT;
> + end = DIV_ROUND_UP(range.start + range.len, PAGE_SIZE);
> +
> + ret = mnt_want_write_file(filp);
> + if (ret)
> + return ret;
> +
> + inode_lock(inode);
> +
> + if (f2fs_is_atomic_file(inode))
> + goto out;
> +
> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> + /* let's remove the range, if len = 0 */
> + if (!range.len) {
> + if (!list_empty(&F2FS_I(inode)->gdonate_list)) {
> + list_del_init(&F2FS_I(inode)->gdonate_list);
> + sbi->donate_files--;
> + }
> + } else {
> + if (list_empty(&F2FS_I(inode)->gdonate_list)) {
> + list_add_tail(&F2FS_I(inode)->gdonate_list,
> + &sbi->inode_list[DONATE_INODE]);
> + sbi->donate_files++;
> + } else {
> + list_move_tail(&F2FS_I(inode)->gdonate_list,
> + &sbi->inode_list[DONATE_INODE]);
> + }
> + F2FS_I(inode)->donate_start = start;
> + F2FS_I(inode)->donate_end = end - 1;
> + }
> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> +out:
> + inode_unlock(inode);
> + mnt_drop_write_file(filp);
> + return ret;
> +}
> +
> static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
> {
> struct inode *inode = file_inode(filp);
> @@ -4479,6 +4541,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> return -EOPNOTSUPP;
> case F2FS_IOC_SHUTDOWN:
> return f2fs_ioc_shutdown(filp, arg);
> + case F2FS_IOC_DONATE_RANGE:
> + return f2fs_ioc_donate_range(filp, arg);
> case FITRIM:
> return f2fs_ioc_fitrim(filp, arg);
> case FS_IOC_SET_ENCRYPTION_POLICY:
> @@ -5230,6 +5294,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> case F2FS_IOC_RELEASE_VOLATILE_WRITE:
> case F2FS_IOC_ABORT_ATOMIC_WRITE:
> case F2FS_IOC_SHUTDOWN:
> + case F2FS_IOC_DONATE_RANGE:
> case FITRIM:
> case FS_IOC_SET_ENCRYPTION_POLICY:
> case FS_IOC_GET_ENCRYPTION_PWSALT:
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index 7de33da8b3ea..f9fc58f313f2 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
> return 0;
> }
>
> +static void f2fs_remove_donate_inode(struct inode *inode)
> +{
> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> +
> + if (list_empty(&F2FS_I(inode)->gdonate_list))
> + return;
> +
> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> + list_del_init(&F2FS_I(inode)->gdonate_list);
> + sbi->donate_files--;
> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> +}
> +
> /*
> * Called at the last iput() if i_nlink is zero
> */
> @@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
>
> f2fs_bug_on(sbi, get_dirty_pages(inode));
> f2fs_remove_dirty_inode(inode);
> + f2fs_remove_donate_inode(inode);
>
> if (!IS_DEVICE_ALIASING(inode))
> f2fs_destroy_extent_tree(inode);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index fc7d463dee15..ef639a6d82e5 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
> spin_lock_init(&fi->i_size_lock);
> INIT_LIST_HEAD(&fi->dirty_list);
> INIT_LIST_HEAD(&fi->gdirty_list);
> + INIT_LIST_HEAD(&fi->gdonate_list);
> init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
> init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
> init_f2fs_rwsem(&fi->i_xattr_sem);
> diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
> index f7aaf8d23e20..cd38a7c166e6 100644
> --- a/include/uapi/linux/f2fs.h
> +++ b/include/uapi/linux/f2fs.h
> @@ -44,6 +44,8 @@
> #define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
> #define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
> #define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
> +#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> + struct f2fs_donate_range)
>
> /*
> * should be same as XFS_IOC_GOINGDOWN.
> @@ -97,4 +99,9 @@ struct f2fs_comp_option {
> __u8 log_cluster_size;
> };
>
> +struct f2fs_donate_range {
> + __u64 start;
> + __u64 len;
> +};
> +
> #endif /* _UAPI_LINUX_F2FS_H */
* Re: [f2fs-dev] [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-21 9:22 ` [f2fs-dev] " Chao Yu
@ 2025-01-21 16:56 ` Jaegeuk Kim
0 siblings, 0 replies; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-21 16:56 UTC (permalink / raw)
To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel
On 01/21, Chao Yu wrote:
> On 1/18/25 00:41, Jaegeuk Kim via Linux-f2fs-devel wrote:
> > This patch introduces an inode list to keep the page cache ranges that users
> > can donate pages together.
> >
> > #define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> > struct f2fs_donate_range)
> > struct f2fs_donate_range {
> > __u64 start;
> > __u64 len;
> > };
> >
> > e.g., ioctl(F2FS_IOC_DONATE_RANGE, &range);
> >
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> > fs/f2fs/debug.c | 3 ++
> > fs/f2fs/f2fs.h | 12 +++++++-
> > fs/f2fs/file.c | 65 +++++++++++++++++++++++++++++++++++++++
> > fs/f2fs/inode.c | 14 +++++++++
> > fs/f2fs/super.c | 1 +
> > include/uapi/linux/f2fs.h | 7 +++++
> > 6 files changed, 101 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
> > index 468828288a4a..16c2dfb4f595 100644
> > --- a/fs/f2fs/debug.c
> > +++ b/fs/f2fs/debug.c
> > @@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
> > si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
> > si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
> > si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
> > + si->ndonate_files = sbi->donate_files;
> > si->nquota_files = sbi->nquota_files;
> > si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
> > si->aw_cnt = atomic_read(&sbi->atomic_files);
> > @@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
> > si->compr_inode, si->compr_blocks);
> > seq_printf(s, " - Swapfile Inode: %u\n",
> > si->swapfile_inode);
> > + seq_printf(s, " - Donate Inode: %u\n",
> > + si->ndonate_files);
> > seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
> > si->orphans, si->append, si->update);
> > seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 4bfe162eefd3..951fbc3f94c7 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -850,6 +850,11 @@ struct f2fs_inode_info {
> > #endif
> > struct list_head dirty_list; /* dirty list for dirs and files */
> > struct list_head gdirty_list; /* linked in global dirty list */
> > +
> > + /* linked in global inode list for cache donation */
> > + struct list_head gdonate_list;
> > + loff_t donate_start, donate_end; /* inclusive */
>
> use block_t instead of loff_t? it can avoid unnecessary memory cost.
Changed to pgoff_t, since it's a page offset.
>
> Thanks,
>
> > +
> > struct task_struct *atomic_write_task; /* store atomic write task */
> > struct extent_tree *extent_tree[NR_EXTENT_CACHES];
> > /* cached extent_tree entry */
> > @@ -1274,6 +1279,7 @@ enum inode_type {
> > DIR_INODE, /* for dirty dir inode */
> > FILE_INODE, /* for dirty regular/symlink inode */
> > DIRTY_META, /* for all dirtied inode metadata */
> > + DONATE_INODE, /* for all inode to donate pages */
> > NR_INODE_TYPE,
> > };
> > @@ -1629,6 +1635,9 @@ struct f2fs_sb_info {
> > unsigned int warm_data_age_threshold;
> > unsigned int last_age_weight;
> > + /* control donate caches */
> > + unsigned int donate_files;
> > +
> > /* basic filesystem units */
> > unsigned int log_sectors_per_block; /* log2 sectors per block */
> > unsigned int log_blocksize; /* log2 block size */
> > @@ -3984,7 +3993,8 @@ struct f2fs_stat_info {
> > unsigned long long allocated_data_blocks;
> > int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
> > int ndirty_data, ndirty_qdata;
> > - unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
> > + unsigned int ndirty_dirs, ndirty_files, ndirty_all;
> > + unsigned int nquota_files, ndonate_files;
> > int nats, dirty_nats, sits, dirty_sits;
> > int free_nids, avail_nids, alloc_nids;
> > int total_count, utilization;
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 3e06fea9795c..0213687805fe 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -2450,6 +2450,68 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
> > return ret;
> > }
> > +static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
> > +{
> > + struct inode *inode = file_inode(filp);
> > + struct mnt_idmap *idmap = file_mnt_idmap(filp);
> > + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > + struct f2fs_donate_range range;
> > + u64 max_bytes = F2FS_BLK_TO_BYTES(max_file_blocks(inode));
> > + u64 start, end;
> > + int ret;
> > +
> > + if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
> > + sizeof(range)))
> > + return -EFAULT;
> > +
> > + if (!inode_owner_or_capable(idmap, inode))
> > + return -EACCES;
> > +
> > + if (!S_ISREG(inode->i_mode))
> > + return -EINVAL;
> > +
> > + if (range.start >= max_bytes || range.len > max_bytes ||
> > + (range.start + range.len) > max_bytes)
> > + return -EINVAL;
> > +
> > + start = range.start >> PAGE_SHIFT;
> > + end = DIV_ROUND_UP(range.start + range.len, PAGE_SIZE);
> > +
> > + ret = mnt_want_write_file(filp);
> > + if (ret)
> > + return ret;
> > +
> > + inode_lock(inode);
> > +
> > + if (f2fs_is_atomic_file(inode))
> > + goto out;
> > +
> > + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> > + /* let's remove the range, if len = 0 */
> > + if (!range.len) {
> > + if (!list_empty(&F2FS_I(inode)->gdonate_list)) {
> > + list_del_init(&F2FS_I(inode)->gdonate_list);
> > + sbi->donate_files--;
> > + }
> > + } else {
> > + if (list_empty(&F2FS_I(inode)->gdonate_list)) {
> > + list_add_tail(&F2FS_I(inode)->gdonate_list,
> > + &sbi->inode_list[DONATE_INODE]);
> > + sbi->donate_files++;
> > + } else {
> > + list_move_tail(&F2FS_I(inode)->gdonate_list,
> > + &sbi->inode_list[DONATE_INODE]);
> > + }
> > + F2FS_I(inode)->donate_start = start;
> > + F2FS_I(inode)->donate_end = end - 1;
> > + }
> > + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> > +out:
> > + inode_unlock(inode);
> > + mnt_drop_write_file(filp);
> > + return ret;
> > +}
> > +
> > static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
> > {
> > struct inode *inode = file_inode(filp);
> > @@ -4479,6 +4541,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> > return -EOPNOTSUPP;
> > case F2FS_IOC_SHUTDOWN:
> > return f2fs_ioc_shutdown(filp, arg);
> > + case F2FS_IOC_DONATE_RANGE:
> > + return f2fs_ioc_donate_range(filp, arg);
> > case FITRIM:
> > return f2fs_ioc_fitrim(filp, arg);
> > case FS_IOC_SET_ENCRYPTION_POLICY:
> > @@ -5230,6 +5294,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > case F2FS_IOC_RELEASE_VOLATILE_WRITE:
> > case F2FS_IOC_ABORT_ATOMIC_WRITE:
> > case F2FS_IOC_SHUTDOWN:
> > + case F2FS_IOC_DONATE_RANGE:
> > case FITRIM:
> > case FS_IOC_SET_ENCRYPTION_POLICY:
> > case FS_IOC_GET_ENCRYPTION_PWSALT:
> > diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> > index 7de33da8b3ea..f9fc58f313f2 100644
> > --- a/fs/f2fs/inode.c
> > +++ b/fs/f2fs/inode.c
> > @@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
> > return 0;
> > }
> > +static void f2fs_remove_donate_inode(struct inode *inode)
> > +{
> > + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > +
> > + if (list_empty(&F2FS_I(inode)->gdonate_list))
> > + return;
> > +
> > + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> > + list_del_init(&F2FS_I(inode)->gdonate_list);
> > + sbi->donate_files--;
> > + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> > +}
> > +
> > /*
> > * Called at the last iput() if i_nlink is zero
> > */
> > @@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
> > f2fs_bug_on(sbi, get_dirty_pages(inode));
> > f2fs_remove_dirty_inode(inode);
> > + f2fs_remove_donate_inode(inode);
> > if (!IS_DEVICE_ALIASING(inode))
> > f2fs_destroy_extent_tree(inode);
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index fc7d463dee15..ef639a6d82e5 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
> > spin_lock_init(&fi->i_size_lock);
> > INIT_LIST_HEAD(&fi->dirty_list);
> > INIT_LIST_HEAD(&fi->gdirty_list);
> > + INIT_LIST_HEAD(&fi->gdonate_list);
> > init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
> > init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
> > init_f2fs_rwsem(&fi->i_xattr_sem);
> > diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
> > index f7aaf8d23e20..cd38a7c166e6 100644
> > --- a/include/uapi/linux/f2fs.h
> > +++ b/include/uapi/linux/f2fs.h
> > @@ -44,6 +44,8 @@
> > #define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
> > #define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
> > #define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
> > +#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> > + struct f2fs_donate_range)
> > /*
> > * should be same as XFS_IOC_GOINGDOWN.
> > @@ -97,4 +99,9 @@ struct f2fs_comp_option {
> > __u8 log_cluster_size;
> > };
> > +struct f2fs_donate_range {
> > + __u64 start;
> > + __u64 len;
> > +};
> > +
> > #endif /* _UAPI_LINUX_F2FS_H */
* [PATCH 2/2] f2fs: add a sysfs entry to request donate file-backed pages
2025-01-17 16:41 [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages Jaegeuk Kim
2025-01-17 16:41 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
@ 2025-01-17 16:41 ` Jaegeuk Kim
2025-01-17 18:05 ` [PATCH 0/2 v6] add ioctl/sysfs to " Matthew Wilcox
2 siblings, 0 replies; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-17 16:41 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim
1. ioctl(fd1, F2FS_IOC_DONATE_RANGE, {0,3});
2. ioctl(fd2, F2FS_IOC_DONATE_RANGE, {1,2});
3. ioctl(fd3, F2FS_IOC_DONATE_RANGE, {3,1});
4. echo 3 > /sys/fs/f2fs/blk/donate_caches
will reclaim 3 page cache ranges, registered by #1, #2, and #3.
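Step 4 can also be issued from a program rather than a shell. The following is a minimal sketch, assuming the donate_caches node added by this patch; request_donation() is a hypothetical helper name, and the caller supplies the full path (e.g. /sys/fs/f2fs/<disk>/donate_caches).

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the number of registered inodes to reclaim
 * into a donate_caches node. Returns 0 on success, -1 on failure.
 */
int request_donation(const char *sysfs_path, unsigned int nfiles)
{
	FILE *f = fopen(sysfs_path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%u\n", nfiles) > 0;

	/* fclose() flushes the buffer, which is when the write is issued. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Per the sysfs.c hunk in this patch, the written value is clamped to the number of currently registered inodes, so asking for more than are registered is harmless.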
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++++++
fs/f2fs/f2fs.h | 2 ++
fs/f2fs/shrinker.c | 27 +++++++++++++++++++++++++
fs/f2fs/sysfs.c | 8 ++++++++
4 files changed, 44 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
index 3e1630c70d8a..6f9d8b8889fd 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -828,3 +828,10 @@ Date: November 2024
Contact: "Chao Yu" <chao@kernel.org>
Description: It controls max read extent count for per-inode, the value of threshold
is 10240 by default.
+
+What: /sys/fs/f2fs/<disk>/donate_caches
+Date: December 2024
+Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
+Description: It reclaims the certain file-backed pages registered by
+ ioctl(F2FS_IOC_DONATE_RANGE).
+ For example, writing N tries to drop N address spaces in LRU.
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 951fbc3f94c7..399ddd10a94f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1637,6 +1637,7 @@ struct f2fs_sb_info {
/* control donate caches */
unsigned int donate_files;
+ unsigned int donate_caches;
/* basic filesystem units */
unsigned int log_sectors_per_block; /* log2 sectors per block */
@@ -4259,6 +4260,7 @@ unsigned long f2fs_shrink_count(struct shrinker *shrink,
struct shrink_control *sc);
unsigned long f2fs_shrink_scan(struct shrinker *shrink,
struct shrink_control *sc);
+void f2fs_donate_caches(struct f2fs_sb_info *sbi);
void f2fs_join_shrinker(struct f2fs_sb_info *sbi);
void f2fs_leave_shrinker(struct f2fs_sb_info *sbi);
diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
index 83d6fb97dcae..22f62813910b 100644
--- a/fs/f2fs/shrinker.c
+++ b/fs/f2fs/shrinker.c
@@ -130,6 +130,33 @@ unsigned long f2fs_shrink_scan(struct shrinker *shrink,
return freed;
}
+void f2fs_donate_caches(struct f2fs_sb_info *sbi)
+{
+ struct inode *inode;
+ struct f2fs_inode_info *fi;
+ int nfiles = sbi->donate_caches;
+
+ while (nfiles--) {
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ if (list_empty(&sbi->inode_list[DONATE_INODE])) {
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+ break;
+ }
+ fi = list_first_entry(&sbi->inode_list[DONATE_INODE],
+ struct f2fs_inode_info, gdonate_list);
+ list_move_tail(&fi->gdonate_list, &sbi->inode_list[DONATE_INODE]);
+ inode = igrab(&fi->vfs_inode);
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+
+ if (!inode)
+ continue;
+
+ invalidate_inode_pages2_range(inode->i_mapping,
+ fi->donate_start, fi->donate_end);
+ iput(inode);
+ }
+}
+
void f2fs_join_shrinker(struct f2fs_sb_info *sbi)
{
spin_lock(&f2fs_list_lock);
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 6b99dc49f776..f81190fabdd3 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -811,6 +811,12 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
return count;
}
+ if (!strcmp(a->attr.name, "donate_caches")) {
+ sbi->donate_caches = min(t, sbi->donate_files);
+ f2fs_donate_caches(sbi);
+ return count;
+ }
+
*ui = (unsigned int)t;
return count;
@@ -1030,6 +1036,7 @@ F2FS_SBI_GENERAL_RW_ATTR(max_victim_search);
F2FS_SBI_GENERAL_RW_ATTR(migration_granularity);
F2FS_SBI_GENERAL_RW_ATTR(migration_window_granularity);
F2FS_SBI_GENERAL_RW_ATTR(dir_level);
+F2FS_SBI_GENERAL_RW_ATTR(donate_caches);
#ifdef CONFIG_F2FS_IOSTAT
F2FS_SBI_GENERAL_RW_ATTR(iostat_enable);
F2FS_SBI_GENERAL_RW_ATTR(iostat_period_ms);
@@ -1178,6 +1185,7 @@ static struct attribute *f2fs_attrs[] = {
ATTR_LIST(migration_granularity),
ATTR_LIST(migration_window_granularity),
ATTR_LIST(dir_level),
+ ATTR_LIST(donate_caches),
ATTR_LIST(ram_thresh),
ATTR_LIST(ra_nid_pages),
ATTR_LIST(dirty_nats_ratio),
--
2.48.0.rc2.279.g1de40edade-goog
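
For anyone testing this, the knob can be driven from user space with a plain write. The following is a minimal sketch, not part of the patch: the helper name write_uint() is hypothetical, and the node path /sys/fs/f2fs/<disk>/donate_caches is an assumption based on where f2fs exposes its other per-device attributes. Per __sbi_store() above, the written value is clamped to the number of registered inodes and f2fs_donate_caches() then walks that many list entries.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write an unsigned decimal value to an existing
 * file such as a sysfs node. Returns 0 on success, -1 on any failure.
 */
static int write_uint(const char *path, unsigned int value)
{
	char buf[16];
	int len, fd, ret = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	/* sysfs stores expect the number as text, newline is optional. */
	len = snprintf(buf, sizeof(buf), "%u\n", value);
	if (write(fd, buf, (size_t)len) != len)
		ret = -1;
	if (close(fd) != 0)
		ret = -1;
	return ret;
}
```

A caller would use something like write_uint("/sys/fs/f2fs/<disk>/donate_caches", 8), with <disk> replaced by the actual device name, to ask for up to eight registered files to be processed.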
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages
2025-01-17 16:41 [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages Jaegeuk Kim
2025-01-17 16:41 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
2025-01-17 16:41 ` [PATCH 2/2] f2fs: add a sysfs entry to request donate file-backed pages Jaegeuk Kim
@ 2025-01-17 18:05 ` Matthew Wilcox
2025-01-17 18:48 ` Jaegeuk Kim
2 siblings, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2025-01-17 18:05 UTC (permalink / raw)
To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel, linux-mm, linux-fsdevel
On Fri, Jan 17, 2025 at 04:41:16PM +0000, Jaegeuk Kim wrote:
> If users clearly know which file-backed pages to reclaim in system view, they
> can use this ioctl() to register in advance and reclaim all at once later.
>
> To MM and others,
>
> I'd like to propose this API in F2FS only, since
> 1) the use-case is quite limited in Android at the moment. Once it's generall
> accepted with more use-cases, happy to propose a generic API such as fadvise.
> Please chime in, if there's any needs.
>
> 2) it's file-backed pages which requires to maintain the list of inode objects.
> I'm not sure this fits in MM tho, also happy to listen to any feedback.
You didn't cc the patches to linux-mm, so that's a bad start.
I don't understand how this is different from MADV_COLD. Please
explain.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages
2025-01-17 18:05 ` [PATCH 0/2 v6] add ioctl/sysfs to " Matthew Wilcox
@ 2025-01-17 18:48 ` Jaegeuk Kim
2025-01-17 19:04 ` Matthew Wilcox
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-17 18:48 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-kernel, linux-f2fs-devel, linux-mm, linux-fsdevel
On 01/17, Matthew Wilcox wrote:
> On Fri, Jan 17, 2025 at 04:41:16PM +0000, Jaegeuk Kim wrote:
> > If users clearly know which file-backed pages to reclaim in system view, they
> > can use this ioctl() to register in advance and reclaim all at once later.
> >
> > To MM and others,
> >
> > I'd like to propose this API in F2FS only, since
> > 1) the use-case is quite limited in Android at the moment. Once it's generall
> > accepted with more use-cases, happy to propose a generic API such as fadvise.
> > Please chime in, if there's any needs.
> >
> > 2) it's file-backed pages which requires to maintain the list of inode objects.
> > I'm not sure this fits in MM tho, also happy to listen to any feedback.
>
> You didn't cc the patches to linux-mm, so that's a bad start.
Because #1.
>
> I don't understand how this is different from MADV_COLD. Please
> explain.
MADV_COLD is a vma range, while this is a file range. So, it's closer to
fadvise(POSIX_FADV_DONTNEED), which tries to reclaim the file-backed pages
at the time it's called. The idea is to keep only the hints, and try to
reclaim everything later when the admin expects system memory pressure soon.
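
For comparison, the immediate variant mentioned above looks roughly like this from user space. This is only a sketch; the helper name drop_cached_range() is hypothetical. The point is that the reclaim attempt happens at call time, whereas the proposal here only records the range and defers the reclaim.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for posix_fadvise() */
#endif
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask the kernel to drop cached pages for a byte
 * range of an open file right now. A len of 0 means "to end of file".
 * Returns 0 on success or an errno value (posix_fadvise() does not
 * return -1).
 */
static int drop_cached_range(int fd, off_t offset, off_t len)
{
	return posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
}
```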
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages
2025-01-17 18:48 ` Jaegeuk Kim
@ 2025-01-17 19:04 ` Matthew Wilcox
2025-01-17 20:37 ` Jaegeuk Kim
0 siblings, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2025-01-17 19:04 UTC (permalink / raw)
To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel, linux-mm, linux-fsdevel
On Fri, Jan 17, 2025 at 06:48:55PM +0000, Jaegeuk Kim wrote:
> > I don't understand how this is different from MADV_COLD. Please
> > explain.
>
> MADV_COLD is a vma range, while this is a file range. So, it's more close to
> fadvise(POSIX_FADV_DONTNEED) which tries to reclaim the file-backed pages
> at the time when it's called. The idea is to keep the hints only, and try to
> reclaim all later when admin expects system memory pressure soon.
So you're saying you want POSIX_FADV_COLD?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages
2025-01-17 19:04 ` Matthew Wilcox
@ 2025-01-17 20:37 ` Jaegeuk Kim
2025-02-04 16:29 ` Jaegeuk Kim
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-17 20:37 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-kernel, linux-f2fs-devel, linux-mm, linux-fsdevel
On 01/17, Matthew Wilcox wrote:
> On Fri, Jan 17, 2025 at 06:48:55PM +0000, Jaegeuk Kim wrote:
> > > I don't understand how this is different from MADV_COLD. Please
> > > explain.
> >
> > MADV_COLD is a vma range, while this is a file range. So, it's more close to
> > fadvise(POSIX_FADV_DONTNEED) which tries to reclaim the file-backed pages
> > at the time when it's called. The idea is to keep the hints only, and try to
> > reclaim all later when admin expects system memory pressure soon.
>
> So you're saying you want POSIX_FADV_COLD?
Yeah, the intention is similar: mark it cold and page it out later.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages
2025-01-17 20:37 ` Jaegeuk Kim
@ 2025-02-04 16:29 ` Jaegeuk Kim
2025-02-10 17:00 ` Jaegeuk Kim
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-02-04 16:29 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-kernel, linux-f2fs-devel, linux-mm, linux-fsdevel
On 01/17, Jaegeuk Kim wrote:
> On 01/17, Matthew Wilcox wrote:
> > On Fri, Jan 17, 2025 at 06:48:55PM +0000, Jaegeuk Kim wrote:
> > > > I don't understand how this is different from MADV_COLD. Please
> > > > explain.
> > >
> > > MADV_COLD is a vma range, while this is a file range. So, it's more close to
> > > fadvise(POSIX_FADV_DONTNEED) which tries to reclaim the file-backed pages
> > > at the time when it's called. The idea is to keep the hints only, and try to
> > > reclaim all later when admin expects system memory pressure soon.
> >
> > So you're saying you want POSIX_FADV_COLD?
>
> Yeah, the intention looks similar like marking it cold and paging out later.
Kind ping for feedback on the direction. If there's demand for a more
generalized API, I'm happy to explore it.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages
2025-02-04 16:29 ` Jaegeuk Kim
@ 2025-02-10 17:00 ` Jaegeuk Kim
2025-02-10 17:20 ` Matthew Wilcox
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-02-10 17:00 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-kernel, linux-f2fs-devel, linux-mm, linux-fsdevel
On 02/04, Jaegeuk Kim wrote:
> On 01/17, Jaegeuk Kim wrote:
> > On 01/17, Matthew Wilcox wrote:
> > > On Fri, Jan 17, 2025 at 06:48:55PM +0000, Jaegeuk Kim wrote:
> > > > > I don't understand how this is different from MADV_COLD. Please
> > > > > explain.
> > > >
> > > > MADV_COLD is a vma range, while this is a file range. So, it's more close to
> > > > fadvise(POSIX_FADV_DONTNEED) which tries to reclaim the file-backed pages
> > > > at the time when it's called. The idea is to keep the hints only, and try to
> > > > reclaim all later when admin expects system memory pressure soon.
> > >
> > > So you're saying you want POSIX_FADV_COLD?
> >
> > Yeah, the intention looks similar like marking it cold and paging out later.
>
> Kindly ping, for the feedback on the direction. If there's demand for something
> generalized api, I'm happy to explore.
If there's no objection, let me push the change in f2fs and keep an eye on
who else needs this in general.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages
2025-02-10 17:00 ` Jaegeuk Kim
@ 2025-02-10 17:20 ` Matthew Wilcox
2025-02-10 19:01 ` Jaegeuk Kim
0 siblings, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2025-02-10 17:20 UTC (permalink / raw)
To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel, linux-mm, linux-fsdevel
On Mon, Feb 10, 2025 at 05:00:47PM +0000, Jaegeuk Kim wrote:
> On 02/04, Jaegeuk Kim wrote:
> > On 01/17, Jaegeuk Kim wrote:
> > > On 01/17, Matthew Wilcox wrote:
> > > > On Fri, Jan 17, 2025 at 06:48:55PM +0000, Jaegeuk Kim wrote:
> > > > > > I don't understand how this is different from MADV_COLD. Please
> > > > > > explain.
> > > > >
> > > > > MADV_COLD is a vma range, while this is a file range. So, it's more close to
> > > > > fadvise(POSIX_FADV_DONTNEED) which tries to reclaim the file-backed pages
> > > > > at the time when it's called. The idea is to keep the hints only, and try to
> > > > > reclaim all later when admin expects system memory pressure soon.
> > > >
> > > > So you're saying you want POSIX_FADV_COLD?
> > >
> > > Yeah, the intention looks similar like marking it cold and paging out later.
> >
> > Kindly ping, for the feedback on the direction. If there's demand for something
> > generalized api, I'm happy to explore.
>
> If there's no objection, let me push the change in f2fs and keep an eye on
> who more will need this in general.
I don't know why you're asking for direction. I gave my direction: use
fadvise().
Putting this directly in f2fs is a horrible idea. NAK.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages
2025-02-10 17:20 ` Matthew Wilcox
@ 2025-02-10 19:01 ` Jaegeuk Kim
2025-02-12 0:39 ` Jaegeuk Kim
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-02-10 19:01 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-kernel, linux-f2fs-devel, linux-mm, linux-fsdevel
On 02/10, Matthew Wilcox wrote:
> On Mon, Feb 10, 2025 at 05:00:47PM +0000, Jaegeuk Kim wrote:
> > On 02/04, Jaegeuk Kim wrote:
> > > On 01/17, Jaegeuk Kim wrote:
> > > > On 01/17, Matthew Wilcox wrote:
> > > > > On Fri, Jan 17, 2025 at 06:48:55PM +0000, Jaegeuk Kim wrote:
> > > > > > > I don't understand how this is different from MADV_COLD. Please
> > > > > > > explain.
> > > > > >
> > > > > > MADV_COLD is a vma range, while this is a file range. So, it's more close to
> > > > > > fadvise(POSIX_FADV_DONTNEED) which tries to reclaim the file-backed pages
> > > > > > at the time when it's called. The idea is to keep the hints only, and try to
> > > > > > reclaim all later when admin expects system memory pressure soon.
> > > > >
> > > > > So you're saying you want POSIX_FADV_COLD?
> > > >
> > > > Yeah, the intention looks similar like marking it cold and paging out later.
> > >
> > > Kindly ping, for the feedback on the direction. If there's demand for something
> > > generalized api, I'm happy to explore.
> >
> > If there's no objection, let me push the change in f2fs and keep an eye on
> > who more will need this in general.
>
> I don't know why you're asking for direction. I gave my direction: use
> fadvise().
Funny, that single question didn't read like that at all. I'll take a look
at what the patch would look like.
>
> Putting this directly in f2fs is a horrible idea. NAK.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages
2025-02-10 19:01 ` Jaegeuk Kim
@ 2025-02-12 0:39 ` Jaegeuk Kim
0 siblings, 0 replies; 23+ messages in thread
From: Jaegeuk Kim @ 2025-02-12 0:39 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-kernel, linux-f2fs-devel, linux-mm, linux-fsdevel
On 02/10, Jaegeuk Kim wrote:
> On 02/10, Matthew Wilcox wrote:
> > On Mon, Feb 10, 2025 at 05:00:47PM +0000, Jaegeuk Kim wrote:
> > > On 02/04, Jaegeuk Kim wrote:
> > > > On 01/17, Jaegeuk Kim wrote:
> > > > > On 01/17, Matthew Wilcox wrote:
> > > > > > On Fri, Jan 17, 2025 at 06:48:55PM +0000, Jaegeuk Kim wrote:
> > > > > > > > I don't understand how this is different from MADV_COLD. Please
> > > > > > > > explain.
> > > > > > >
> > > > > > > MADV_COLD is a vma range, while this is a file range. So, it's more close to
> > > > > > > fadvise(POSIX_FADV_DONTNEED) which tries to reclaim the file-backed pages
> > > > > > > at the time when it's called. The idea is to keep the hints only, and try to
> > > > > > > reclaim all later when admin expects system memory pressure soon.
> > > > > >
> > > > > > So you're saying you want POSIX_FADV_COLD?
> > > > >
> > > > > Yeah, the intention looks similar like marking it cold and paging out later.
> > > >
> > > > Kindly ping, for the feedback on the direction. If there's demand for something
> > > > generalized api, I'm happy to explore.
> > >
> > > If there's no objection, let me push the change in f2fs and keep an eye on
> > > who more will need this in general.
> >
> > I don't know why you're asking for direction. I gave my direction: use
> > fadvise().
>
> Funny, that single question didn't mean like this at all. Will take a look
> how the patch looks like.
Ok, it seems we can get this hint via POSIX_FADV_NOREUSE. I'll take that
instead of adding a new API. Thanks.
>
> >
> > Putting this directly in f2fs is a horrible idea. NAK.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 0/2 v7] add ioctl/sysfs to donate file-backed pages
@ 2025-01-22 21:10 Jaegeuk Kim
2025-01-22 21:10 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-22 21:10 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim
Note: I'll keep improving this patch set while trying to get feedback
from MM and API folks in [1].
If users clearly know, from a system-wide view, which file-backed pages to
reclaim, they can use this ioctl() to register them in advance and reclaim
them all at once later.
I'd like to propose this API in F2FS only, since
1) the use-case is quite limited to Android at the moment. Once it's generally
accepted with more use-cases, I'm happy to propose a generic API such as fadvise.
Please chime in if there are any needs.
2) it deals with file-backed pages, which requires maintaining a list of inode
objects. I'm not sure this fits in MM though; also happy to listen to any feedback.
[1] https://lore.kernel.org/lkml/Z4qmF2n2pzuHqad_@google.com/
Change log from v6:
- change sysfs entry name to reclaim_caches_kb
Jaegeuk Kim (2):
f2fs: register inodes which is able to donate pages
f2fs: add a sysfs entry to request donate file-backed pages
Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++
fs/f2fs/debug.c | 3 ++
fs/f2fs/f2fs.h | 14 +++++-
fs/f2fs/file.c | 65 +++++++++++++++++++++++++
fs/f2fs/inode.c | 14 ++++++
fs/f2fs/shrinker.c | 33 +++++++++++++
fs/f2fs/super.c | 1 +
fs/f2fs/sysfs.c | 8 +++
include/uapi/linux/f2fs.h | 7 +++
9 files changed, 151 insertions(+), 1 deletion(-)
--
2.48.1.262.g85cc9f2d1e-goog
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-22 21:10 [PATCH 0/2 v7] " Jaegeuk Kim
@ 2025-01-22 21:10 ` Jaegeuk Kim
2025-01-23 1:50 ` [f2fs-dev] " Chao Yu
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-22 21:10 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim
This patch introduces an inode list that records the page cache ranges users
are willing to donate, so that those pages can be reclaimed together.
#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
struct f2fs_donate_range)
struct f2fs_donate_range {
__u64 start;
__u64 len;
};
e.g., ioctl(fd, F2FS_IOC_DONATE_RANGE, &range);
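
A slightly fuller user-space sketch of the call above, for illustration only. It carries local copies of the struct and ioctl number from this patch, on the assumption that installed UAPI headers do not have them. F2FS_IOCTL_MAGIC (0xf5) is taken from the existing include/uapi/linux/f2fs.h, uint64_t stands in for __u64, and the helper name donate_range() is hypothetical.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Local copies of the definitions proposed in this patch. */
#define F2FS_IOCTL_MAGIC	0xf5

struct f2fs_donate_range {
	uint64_t start;	/* byte offset into the file */
	uint64_t len;	/* byte length; 0 de-registers the inode */
};

#define F2FS_IOC_DONATE_RANGE	_IOW(F2FS_IOCTL_MAGIC, 27, \
					struct f2fs_donate_range)

/*
 * Hypothetical helper: register [start, start + len) of the file behind
 * fd for later donation. Returns 0 on success, -1 with errno set on
 * failure, exactly as ioctl() reports it.
 */
static int donate_range(int fd, uint64_t start, uint64_t len)
{
	struct f2fs_donate_range range = { .start = start, .len = len };

	return ioctl(fd, F2FS_IOC_DONATE_RANGE, &range);
}
```

On a kernel or filesystem without this patch the call is expected to fail (for example with ENOTTY), so callers should treat the registration as best effort.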
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
fs/f2fs/debug.c | 3 ++
fs/f2fs/f2fs.h | 12 +++++++-
fs/f2fs/file.c | 65 +++++++++++++++++++++++++++++++++++++++
fs/f2fs/inode.c | 14 +++++++++
fs/f2fs/super.c | 1 +
include/uapi/linux/f2fs.h | 7 +++++
6 files changed, 101 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 468828288a4a..16c2dfb4f595 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
+ si->ndonate_files = sbi->donate_files;
si->nquota_files = sbi->nquota_files;
si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
si->aw_cnt = atomic_read(&sbi->atomic_files);
@@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
si->compr_inode, si->compr_blocks);
seq_printf(s, " - Swapfile Inode: %u\n",
si->swapfile_inode);
+ seq_printf(s, " - Donate Inode: %u\n",
+ si->ndonate_files);
seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
si->orphans, si->append, si->update);
seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4bfe162eefd3..9bed1a3a60fb 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -850,6 +850,11 @@ struct f2fs_inode_info {
#endif
struct list_head dirty_list; /* dirty list for dirs and files */
struct list_head gdirty_list; /* linked in global dirty list */
+
+ /* linked in global inode list for cache donation */
+ struct list_head gdonate_list;
+ pgoff_t donate_start, donate_end; /* inclusive */
+
struct task_struct *atomic_write_task; /* store atomic write task */
struct extent_tree *extent_tree[NR_EXTENT_CACHES];
/* cached extent_tree entry */
@@ -1274,6 +1279,7 @@ enum inode_type {
DIR_INODE, /* for dirty dir inode */
FILE_INODE, /* for dirty regular/symlink inode */
DIRTY_META, /* for all dirtied inode metadata */
+ DONATE_INODE, /* for all inode to donate pages */
NR_INODE_TYPE,
};
@@ -1629,6 +1635,9 @@ struct f2fs_sb_info {
unsigned int warm_data_age_threshold;
unsigned int last_age_weight;
+ /* control donate caches */
+ unsigned int donate_files;
+
/* basic filesystem units */
unsigned int log_sectors_per_block; /* log2 sectors per block */
unsigned int log_blocksize; /* log2 block size */
@@ -3984,7 +3993,8 @@ struct f2fs_stat_info {
unsigned long long allocated_data_blocks;
int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
int ndirty_data, ndirty_qdata;
- unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
+ unsigned int ndirty_dirs, ndirty_files, ndirty_all;
+ unsigned int nquota_files, ndonate_files;
int nats, dirty_nats, sits, dirty_sits;
int free_nids, avail_nids, alloc_nids;
int total_count, utilization;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index f92a9fba9991..99de53fb0bd9 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2448,6 +2448,68 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
return ret;
}
+static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
+{
+ struct inode *inode = file_inode(filp);
+ struct mnt_idmap *idmap = file_mnt_idmap(filp);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_donate_range range;
+ u64 max_bytes = F2FS_BLK_TO_BYTES(max_file_blocks(inode));
+ u64 start, end;
+ int ret;
+
+ if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
+ sizeof(range)))
+ return -EFAULT;
+
+ if (!inode_owner_or_capable(idmap, inode))
+ return -EACCES;
+
+ if (!S_ISREG(inode->i_mode))
+ return -EINVAL;
+
+ if (range.start >= max_bytes || range.len > max_bytes ||
+ (range.start + range.len) > max_bytes)
+ return -EINVAL;
+
+ start = range.start >> PAGE_SHIFT;
+ end = DIV_ROUND_UP(range.start + range.len, PAGE_SIZE);
+
+ ret = mnt_want_write_file(filp);
+ if (ret)
+ return ret;
+
+ inode_lock(inode);
+
+ if (f2fs_is_atomic_file(inode))
+ goto out;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ /* let's remove the range, if len = 0 */
+ if (!range.len) {
+ if (!list_empty(&F2FS_I(inode)->gdonate_list)) {
+ list_del_init(&F2FS_I(inode)->gdonate_list);
+ sbi->donate_files--;
+ }
+ } else {
+ if (list_empty(&F2FS_I(inode)->gdonate_list)) {
+ list_add_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ sbi->donate_files++;
+ } else {
+ list_move_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ }
+ F2FS_I(inode)->donate_start = start;
+ F2FS_I(inode)->donate_end = end - 1;
+ }
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+out:
+ inode_unlock(inode);
+ mnt_drop_write_file(filp);
+ return ret;
+}
+
static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
{
struct inode *inode = file_inode(filp);
@@ -4477,6 +4539,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return -EOPNOTSUPP;
case F2FS_IOC_SHUTDOWN:
return f2fs_ioc_shutdown(filp, arg);
+ case F2FS_IOC_DONATE_RANGE:
+ return f2fs_ioc_donate_range(filp, arg);
case FITRIM:
return f2fs_ioc_fitrim(filp, arg);
case FS_IOC_SET_ENCRYPTION_POLICY:
@@ -5228,6 +5292,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case F2FS_IOC_RELEASE_VOLATILE_WRITE:
case F2FS_IOC_ABORT_ATOMIC_WRITE:
case F2FS_IOC_SHUTDOWN:
+ case F2FS_IOC_DONATE_RANGE:
case FITRIM:
case FS_IOC_SET_ENCRYPTION_POLICY:
case FS_IOC_GET_ENCRYPTION_PWSALT:
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 3dd25f64d6f1..cba2f6bacde4 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
return 0;
}
+static void f2fs_remove_donate_inode(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+ if (list_empty(&F2FS_I(inode)->gdonate_list))
+ return;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ list_del_init(&F2FS_I(inode)->gdonate_list);
+ sbi->donate_files--;
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+}
+
/*
* Called at the last iput() if i_nlink is zero
*/
@@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
f2fs_bug_on(sbi, get_dirty_pages(inode));
f2fs_remove_dirty_inode(inode);
+ f2fs_remove_donate_inode(inode);
if (!IS_DEVICE_ALIASING(inode))
f2fs_destroy_extent_tree(inode);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index fc7d463dee15..ef639a6d82e5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
spin_lock_init(&fi->i_size_lock);
INIT_LIST_HEAD(&fi->dirty_list);
INIT_LIST_HEAD(&fi->gdirty_list);
+ INIT_LIST_HEAD(&fi->gdonate_list);
init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
init_f2fs_rwsem(&fi->i_xattr_sem);
diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
index f7aaf8d23e20..cd38a7c166e6 100644
--- a/include/uapi/linux/f2fs.h
+++ b/include/uapi/linux/f2fs.h
@@ -44,6 +44,8 @@
#define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
#define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
#define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
+#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
+ struct f2fs_donate_range)
/*
* should be same as XFS_IOC_GOINGDOWN.
@@ -97,4 +99,9 @@ struct f2fs_comp_option {
__u8 log_cluster_size;
};
+struct f2fs_donate_range {
+ __u64 start;
+ __u64 len;
+};
+
#endif /* _UAPI_LINUX_F2FS_H */
--
2.48.1.262.g85cc9f2d1e-goog
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [f2fs-dev] [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-22 21:10 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
@ 2025-01-23 1:50 ` Chao Yu
0 siblings, 0 replies; 23+ messages in thread
From: Chao Yu @ 2025-01-23 1:50 UTC (permalink / raw)
To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel; +Cc: chao
On 1/23/25 05:10, Jaegeuk Kim via Linux-f2fs-devel wrote:
> This patch introduces an inode list to keep the page cache ranges that users
> can donate pages together.
>
> #define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> struct f2fs_donate_range)
> struct f2fs_donate_range {
> __u64 start;
> __u64 len;
> };
>
> e.g., ioctl(fd, F2FS_IOC_DONATE_RANGE, &range);
>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Thanks,
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 0/2 v5 RESEND] add ioctl/sysfs to donate file-backed pages
@ 2025-01-16 22:51 Jaegeuk Kim
2025-01-16 22:51 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-16 22:51 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim
If users clearly know, from a system-wide view, which file-backed pages to
reclaim, they can use this ioctl() to register them in advance and reclaim
them all at once later.
Change log from v4:
- fix range handling
Change log from v3:
- cover partial range
Change log from v2:
- add more boundary checks
- de-register the range, if len is zero
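
To make the partial-range handling concrete, here is a small stand-alone sketch of the arithmetic patch 1/2 uses to turn a byte range into an inclusive page-index range (shift the start down, round the end up, then subtract one). The names bytes_to_pages() and struct page_span are hypothetical, and the 4 KiB page size is an assumption; the kernel uses PAGE_SHIFT and PAGE_SIZE.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE	(UINT64_C(1) << SKETCH_PAGE_SHIFT)

struct page_span {
	uint64_t first;	/* index of the first page touched */
	uint64_t last;	/* index of the last page touched, inclusive */
};

/* Map a byte range with len > 0 to the inclusive page indices it covers. */
static struct page_span bytes_to_pages(uint64_t start, uint64_t len)
{
	struct page_span span;

	span.first = start >> SKETCH_PAGE_SHIFT;
	/* Round the exclusive end up to a page boundary, then step back one. */
	span.last = (start + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE - 1;
	return span;
}
```

For instance, a 5000-byte range starting at byte 100 touches pages 0 and 1, so both partially covered pages are included in the donated span.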
Jaegeuk Kim (2):
f2fs: register inodes which is able to donate pages
f2fs: add a sysfs entry to request donate file-backed pages
Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++
fs/f2fs/debug.c | 3 ++
fs/f2fs/f2fs.h | 14 +++++-
fs/f2fs/file.c | 65 +++++++++++++++++++++++++
fs/f2fs/inode.c | 14 ++++++
fs/f2fs/shrinker.c | 27 ++++++++++
fs/f2fs/super.c | 1 +
fs/f2fs/sysfs.c | 8 +++
include/uapi/linux/f2fs.h | 7 +++
9 files changed, 145 insertions(+), 1 deletion(-)
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-16 22:51 [PATCH 0/2 v5 RESEND] add ioctl/sysfs to donate file-backed pages Jaegeuk Kim
@ 2025-01-16 22:51 ` Jaegeuk Kim
2025-01-17 1:48 ` [f2fs-dev] " Chao Yu
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-16 22:51 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim
This patch introduces an inode list that records the page cache ranges users
are willing to donate, so that those pages can be reclaimed together.
#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
struct f2fs_donate_range)
struct f2fs_donate_range {
__u64 start;
__u64 len;
};
e.g., ioctl(fd, F2FS_IOC_DONATE_RANGE, &range);
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
fs/f2fs/debug.c | 3 ++
fs/f2fs/f2fs.h | 12 +++++++-
fs/f2fs/file.c | 65 +++++++++++++++++++++++++++++++++++++++
fs/f2fs/inode.c | 14 +++++++++
fs/f2fs/super.c | 1 +
include/uapi/linux/f2fs.h | 7 +++++
6 files changed, 101 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 468828288a4a..16c2dfb4f595 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
+ si->ndonate_files = sbi->donate_files;
si->nquota_files = sbi->nquota_files;
si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
si->aw_cnt = atomic_read(&sbi->atomic_files);
@@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
si->compr_inode, si->compr_blocks);
seq_printf(s, " - Swapfile Inode: %u\n",
si->swapfile_inode);
+ seq_printf(s, " - Donate Inode: %u\n",
+ si->ndonate_files);
seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
si->orphans, si->append, si->update);
seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4bfe162eefd3..951fbc3f94c7 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -850,6 +850,11 @@ struct f2fs_inode_info {
#endif
struct list_head dirty_list; /* dirty list for dirs and files */
struct list_head gdirty_list; /* linked in global dirty list */
+
+ /* linked in global inode list for cache donation */
+ struct list_head gdonate_list;
+ loff_t donate_start, donate_end; /* inclusive */
+
struct task_struct *atomic_write_task; /* store atomic write task */
struct extent_tree *extent_tree[NR_EXTENT_CACHES];
/* cached extent_tree entry */
@@ -1274,6 +1279,7 @@ enum inode_type {
DIR_INODE, /* for dirty dir inode */
FILE_INODE, /* for dirty regular/symlink inode */
DIRTY_META, /* for all dirtied inode metadata */
+ DONATE_INODE, /* for all inode to donate pages */
NR_INODE_TYPE,
};
@@ -1629,6 +1635,9 @@ struct f2fs_sb_info {
unsigned int warm_data_age_threshold;
unsigned int last_age_weight;
+ /* control donate caches */
+ unsigned int donate_files;
+
/* basic filesystem units */
unsigned int log_sectors_per_block; /* log2 sectors per block */
unsigned int log_blocksize; /* log2 block size */
@@ -3984,7 +3993,8 @@ struct f2fs_stat_info {
unsigned long long allocated_data_blocks;
int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
int ndirty_data, ndirty_qdata;
- unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
+ unsigned int ndirty_dirs, ndirty_files, ndirty_all;
+ unsigned int nquota_files, ndonate_files;
int nats, dirty_nats, sits, dirty_sits;
int free_nids, avail_nids, alloc_nids;
int total_count, utilization;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 3e06fea9795c..6572970a988a 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2450,6 +2450,68 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
return ret;
}
+static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
+{
+ struct inode *inode = file_inode(filp);
+ struct mnt_idmap *idmap = file_mnt_idmap(filp);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_donate_range range;
+ u64 max_pages = F2FS_BLK_TO_BYTES(max_file_blocks(inode)) >> PAGE_SHIFT;
+ u64 start, end;
+ int ret;
+
+ if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
+ sizeof(range)))
+ return -EFAULT;
+
+ if (!inode_owner_or_capable(idmap, inode))
+ return -EACCES;
+
+ if (!S_ISREG(inode->i_mode))
+ return -EINVAL;
+
+ if (range.start >= max_pages || range.len > max_pages ||
+ (range.start + range.len) > max_pages)
+ return -EINVAL;
+
+ start = range.start >> PAGE_SHIFT;
+ end = DIV_ROUND_UP(range.start + range.len, PAGE_SIZE);
+
+ ret = mnt_want_write_file(filp);
+ if (ret)
+ return ret;
+
+ inode_lock(inode);
+
+ if (f2fs_is_atomic_file(inode))
+ goto out;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ /* let's remove the range, if len = 0 */
+ if (!range.len) {
+ if (!list_empty(&F2FS_I(inode)->gdonate_list)) {
+ list_del_init(&F2FS_I(inode)->gdonate_list);
+ sbi->donate_files--;
+ }
+ } else {
+ if (list_empty(&F2FS_I(inode)->gdonate_list)) {
+ list_add_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ sbi->donate_files++;
+ } else {
+ list_move_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ }
+ F2FS_I(inode)->donate_start = start;
+ F2FS_I(inode)->donate_end = end - 1;
+ }
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+out:
+ inode_unlock(inode);
+ mnt_drop_write_file(filp);
+ return ret;
+}
+
static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
{
struct inode *inode = file_inode(filp);
@@ -4479,6 +4541,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return -EOPNOTSUPP;
case F2FS_IOC_SHUTDOWN:
return f2fs_ioc_shutdown(filp, arg);
+ case F2FS_IOC_DONATE_RANGE:
+ return f2fs_ioc_donate_range(filp, arg);
case FITRIM:
return f2fs_ioc_fitrim(filp, arg);
case FS_IOC_SET_ENCRYPTION_POLICY:
@@ -5230,6 +5294,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case F2FS_IOC_RELEASE_VOLATILE_WRITE:
case F2FS_IOC_ABORT_ATOMIC_WRITE:
case F2FS_IOC_SHUTDOWN:
+ case F2FS_IOC_DONATE_RANGE:
case FITRIM:
case FS_IOC_SET_ENCRYPTION_POLICY:
case FS_IOC_GET_ENCRYPTION_PWSALT:
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 7de33da8b3ea..f9fc58f313f2 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
return 0;
}
+static void f2fs_remove_donate_inode(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+ if (list_empty(&F2FS_I(inode)->gdonate_list))
+ return;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ list_del_init(&F2FS_I(inode)->gdonate_list);
+ sbi->donate_files--;
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+}
+
/*
* Called at the last iput() if i_nlink is zero
*/
@@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
f2fs_bug_on(sbi, get_dirty_pages(inode));
f2fs_remove_dirty_inode(inode);
+ f2fs_remove_donate_inode(inode);
if (!IS_DEVICE_ALIASING(inode))
f2fs_destroy_extent_tree(inode);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index fc7d463dee15..ef639a6d82e5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
spin_lock_init(&fi->i_size_lock);
INIT_LIST_HEAD(&fi->dirty_list);
INIT_LIST_HEAD(&fi->gdirty_list);
+ INIT_LIST_HEAD(&fi->gdonate_list);
init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
init_f2fs_rwsem(&fi->i_xattr_sem);
diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
index f7aaf8d23e20..cd38a7c166e6 100644
--- a/include/uapi/linux/f2fs.h
+++ b/include/uapi/linux/f2fs.h
@@ -44,6 +44,8 @@
#define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
#define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
#define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
+#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
+ struct f2fs_donate_range)
/*
* should be same as XFS_IOC_GOINGDOWN.
@@ -97,4 +99,9 @@ struct f2fs_comp_option {
__u8 log_cluster_size;
};
+struct f2fs_donate_range {
+ __u64 start;
+ __u64 len;
+};
+
#endif /* _UAPI_LINUX_F2FS_H */
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [f2fs-dev] [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-16 22:51 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
@ 2025-01-17 1:48 ` Chao Yu
0 siblings, 0 replies; 23+ messages in thread
From: Chao Yu @ 2025-01-17 1:48 UTC (permalink / raw)
To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel; +Cc: chao
On 1/17/25 06:51, Jaegeuk Kim via Linux-f2fs-devel wrote:
> This patch introduces an inode list to keep the page cache ranges that users
> can donate pages together.
>
> #define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> struct f2fs_donate_range)
> struct f2fs_donate_range {
> __u64 start;
> __u64 len;
> };
>
> e.g., ioctl(F2FS_IOC_DONATE_RANGE, &range);
>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
> fs/f2fs/debug.c | 3 ++
> fs/f2fs/f2fs.h | 12 +++++++-
> fs/f2fs/file.c | 65 +++++++++++++++++++++++++++++++++++++++
> fs/f2fs/inode.c | 14 +++++++++
> fs/f2fs/super.c | 1 +
> include/uapi/linux/f2fs.h | 7 +++++
> 6 files changed, 101 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
> index 468828288a4a..16c2dfb4f595 100644
> --- a/fs/f2fs/debug.c
> +++ b/fs/f2fs/debug.c
> @@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
> si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
> si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
> si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
> + si->ndonate_files = sbi->donate_files;
> si->nquota_files = sbi->nquota_files;
> si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
> si->aw_cnt = atomic_read(&sbi->atomic_files);
> @@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
> si->compr_inode, si->compr_blocks);
> seq_printf(s, " - Swapfile Inode: %u\n",
> si->swapfile_inode);
> + seq_printf(s, " - Donate Inode: %u\n",
> + si->ndonate_files);
> seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
> si->orphans, si->append, si->update);
> seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4bfe162eefd3..951fbc3f94c7 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -850,6 +850,11 @@ struct f2fs_inode_info {
> #endif
> struct list_head dirty_list; /* dirty list for dirs and files */
> struct list_head gdirty_list; /* linked in global dirty list */
> +
> + /* linked in global inode list for cache donation */
> + struct list_head gdonate_list;
> + loff_t donate_start, donate_end; /* inclusive */
> +
> struct task_struct *atomic_write_task; /* store atomic write task */
> struct extent_tree *extent_tree[NR_EXTENT_CACHES];
> /* cached extent_tree entry */
> @@ -1274,6 +1279,7 @@ enum inode_type {
> DIR_INODE, /* for dirty dir inode */
> FILE_INODE, /* for dirty regular/symlink inode */
> DIRTY_META, /* for all dirtied inode metadata */
> + DONATE_INODE, /* for all inode to donate pages */
> NR_INODE_TYPE,
> };
>
> @@ -1629,6 +1635,9 @@ struct f2fs_sb_info {
> unsigned int warm_data_age_threshold;
> unsigned int last_age_weight;
>
> + /* control donate caches */
> + unsigned int donate_files;
> +
> /* basic filesystem units */
> unsigned int log_sectors_per_block; /* log2 sectors per block */
> unsigned int log_blocksize; /* log2 block size */
> @@ -3984,7 +3993,8 @@ struct f2fs_stat_info {
> unsigned long long allocated_data_blocks;
> int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
> int ndirty_data, ndirty_qdata;
> - unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
> + unsigned int ndirty_dirs, ndirty_files, ndirty_all;
> + unsigned int nquota_files, ndonate_files;
> int nats, dirty_nats, sits, dirty_sits;
> int free_nids, avail_nids, alloc_nids;
> int total_count, utilization;
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 3e06fea9795c..6572970a988a 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -2450,6 +2450,68 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
> return ret;
> }
>
> +static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
> +{
> + struct inode *inode = file_inode(filp);
> + struct mnt_idmap *idmap = file_mnt_idmap(filp);
> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> + struct f2fs_donate_range range;
> + u64 max_pages = F2FS_BLK_TO_BYTES(max_file_blocks(inode)) >> PAGE_SHIFT;
Shouldn't this be u64 max_size = F2FS_BLK_TO_BYTES(max_file_blocks(inode)), since
range.start and range.len are still byte values when they are checked below?
> + u64 start, end;
> + int ret;
> +
> + if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
> + sizeof(range)))
> + return -EFAULT;
> +
> + if (!inode_owner_or_capable(idmap, inode))
> + return -EACCES;
> +
> + if (!S_ISREG(inode->i_mode))
> + return -EINVAL;
> +
> + if (range.start >= max_pages || range.len > max_pages ||
> + (range.start + range.len) > max_pages)
> + return -EINVAL;
Use max_size instead of max_pages?
Thanks,
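
The unit question above can be checked outside the kernel. The following is only a minimal userspace sketch, not the patch's code: donate_range_in_bounds() and donate_bounds_examples() are hypothetical names, and max_size stands in for F2FS_BLK_TO_BYTES(max_file_blocks(inode)). It compares byte offsets against a byte limit, and it phrases the last test as a subtraction so that start + len cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a byte-based bounds check. All three arguments
 * are in bytes; max_size is an assumed stand-in for the f2fs limit.
 */
bool donate_range_in_bounds(uint64_t start, uint64_t len, uint64_t max_size)
{
	/* The first byte must lie below the maximum file size. */
	if (start >= max_size)
		return false;
	/* len must fit in the remainder; this form cannot overflow. */
	return len <= max_size - start;
}

/* A few illustrative cases with a made-up 1 GiB limit. */
void donate_bounds_examples(void)
{
	const uint64_t max_size = 1ULL << 30;

	assert(donate_range_in_bounds(0, 4096, max_size));
	assert(!donate_range_in_bounds(max_size, 1, max_size));
	assert(!donate_range_in_bounds(4096, UINT64_MAX, max_size));
}
```

The accepted set is the same as for the three-way comparison in the patch once every operand is in bytes; only the way the sum is expressed differs.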
> +
> + start = range.start >> PAGE_SHIFT;
> + end = DIV_ROUND_UP(range.start + range.len, PAGE_SIZE);
> +
> + ret = mnt_want_write_file(filp);
> + if (ret)
> + return ret;
> +
> + inode_lock(inode);
> +
> + if (f2fs_is_atomic_file(inode))
> + goto out;
> +
> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> + /* let's remove the range, if len = 0 */
> + if (!range.len) {
> + if (!list_empty(&F2FS_I(inode)->gdonate_list)) {
> + list_del_init(&F2FS_I(inode)->gdonate_list);
> + sbi->donate_files--;
> + }
> + } else {
> + if (list_empty(&F2FS_I(inode)->gdonate_list)) {
> + list_add_tail(&F2FS_I(inode)->gdonate_list,
> + &sbi->inode_list[DONATE_INODE]);
> + sbi->donate_files++;
> + } else {
> + list_move_tail(&F2FS_I(inode)->gdonate_list,
> + &sbi->inode_list[DONATE_INODE]);
> + }
> + F2FS_I(inode)->donate_start = start;
> + F2FS_I(inode)->donate_end = end - 1;
> + }
> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> +out:
> + inode_unlock(inode);
> + mnt_drop_write_file(filp);
> + return ret;
> +}
> +
> static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
> {
> struct inode *inode = file_inode(filp);
> @@ -4479,6 +4541,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> return -EOPNOTSUPP;
> case F2FS_IOC_SHUTDOWN:
> return f2fs_ioc_shutdown(filp, arg);
> + case F2FS_IOC_DONATE_RANGE:
> + return f2fs_ioc_donate_range(filp, arg);
> case FITRIM:
> return f2fs_ioc_fitrim(filp, arg);
> case FS_IOC_SET_ENCRYPTION_POLICY:
> @@ -5230,6 +5294,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> case F2FS_IOC_RELEASE_VOLATILE_WRITE:
> case F2FS_IOC_ABORT_ATOMIC_WRITE:
> case F2FS_IOC_SHUTDOWN:
> + case F2FS_IOC_DONATE_RANGE:
> case FITRIM:
> case FS_IOC_SET_ENCRYPTION_POLICY:
> case FS_IOC_GET_ENCRYPTION_PWSALT:
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index 7de33da8b3ea..f9fc58f313f2 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
> return 0;
> }
>
> +static void f2fs_remove_donate_inode(struct inode *inode)
> +{
> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> +
> + if (list_empty(&F2FS_I(inode)->gdonate_list))
> + return;
> +
> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> + list_del_init(&F2FS_I(inode)->gdonate_list);
> + sbi->donate_files--;
> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> +}
> +
> /*
> * Called at the last iput() if i_nlink is zero
> */
> @@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
>
> f2fs_bug_on(sbi, get_dirty_pages(inode));
> f2fs_remove_dirty_inode(inode);
> + f2fs_remove_donate_inode(inode);
>
> if (!IS_DEVICE_ALIASING(inode))
> f2fs_destroy_extent_tree(inode);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index fc7d463dee15..ef639a6d82e5 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
> spin_lock_init(&fi->i_size_lock);
> INIT_LIST_HEAD(&fi->dirty_list);
> INIT_LIST_HEAD(&fi->gdirty_list);
> + INIT_LIST_HEAD(&fi->gdonate_list);
> init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
> init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
> init_f2fs_rwsem(&fi->i_xattr_sem);
> diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
> index f7aaf8d23e20..cd38a7c166e6 100644
> --- a/include/uapi/linux/f2fs.h
> +++ b/include/uapi/linux/f2fs.h
> @@ -44,6 +44,8 @@
> #define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
> #define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
> #define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
> +#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> + struct f2fs_donate_range)
>
> /*
> * should be same as XFS_IOC_GOINGDOWN.
> @@ -97,4 +99,9 @@ struct f2fs_comp_option {
> __u8 log_cluster_size;
> };
>
> +struct f2fs_donate_range {
> + __u64 start;
> + __u64 len;
> +};
> +
> #endif /* _UAPI_LINUX_F2FS_H */
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 0/2 v4] add ioctl/sysfs to donate file-backed pages
@ 2025-01-16 4:41 Jaegeuk Kim
2025-01-16 4:42 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-16 4:41 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim
If users know, from a system-wide view, which file-backed pages should be
reclaimed, they can register those ranges in advance with this ioctl() and
reclaim them all at once later.
Change log from v3:
- cover partial range
Change log from v2:
- add more boundary checks
- de-register the range, if len is zero
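
For illustration, the call sequence from userspace might look like the sketch below. It is written under several assumptions: the struct and ioctl number are copied locally from this series because installed headers may not carry them, start and len are taken to be byte values, and donate_register() / donate_unregister() are hypothetical helper names that are not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Local copies of the UAPI additions proposed in this series (assumption). */
#ifndef F2FS_IOCTL_MAGIC
#define F2FS_IOCTL_MAGIC 0xf5
#endif

struct f2fs_donate_range {
	uint64_t start;	/* byte offset of the range */
	uint64_t len;	/* byte length; 0 de-registers the inode */
};

static_assert(sizeof(struct f2fs_donate_range) == 16,
	      "layout must match two __u64 fields");

#ifndef F2FS_IOC_DONATE_RANGE
#define F2FS_IOC_DONATE_RANGE \
	_IOW(F2FS_IOCTL_MAGIC, 27, struct f2fs_donate_range)
#endif

/* Hypothetical helper: register a byte range of fd; 0 or -errno. */
int donate_register(int fd, uint64_t start, uint64_t len)
{
	struct f2fs_donate_range range = { .start = start, .len = len };

	return ioctl(fd, F2FS_IOC_DONATE_RANGE, &range) ? -errno : 0;
}

/* Hypothetical helper: a zero length drops the inode from the list. */
int donate_unregister(int fd)
{
	return donate_register(fd, 0, 0);
}
```

Going by the checks in patch 1/2, a caller should expect -EACCES when it neither owns the file nor is suitably capable, and -EINVAL for non-regular files or out-of-bounds ranges.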
Jaegeuk Kim (2):
f2fs: register inodes which is able to donate pages
f2fs: add a sysfs entry to request donate file-backed pages
Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++
fs/f2fs/debug.c | 3 ++
fs/f2fs/f2fs.h | 14 +++++-
fs/f2fs/file.c | 65 +++++++++++++++++++++++++
fs/f2fs/inode.c | 14 ++++++
fs/f2fs/shrinker.c | 27 ++++++++++
fs/f2fs/super.c | 1 +
fs/f2fs/sysfs.c | 8 +++
include/uapi/linux/f2fs.h | 7 +++
9 files changed, 145 insertions(+), 1 deletion(-)
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-16 4:41 [PATCH 0/2 v4] add ioctl/sysfs to donate file-backed pages Jaegeuk Kim
@ 2025-01-16 4:42 ` Jaegeuk Kim
2025-01-16 5:31 ` [f2fs-dev] " Chao Yu
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-16 4:42 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim
This patch introduces an inode list that tracks the page cache ranges whose
pages users are willing to donate together.
#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
struct f2fs_donate_range)
struct f2fs_donate_range {
__u64 start;
__u64 len;
};
e.g., ioctl(fd, F2FS_IOC_DONATE_RANGE, &range);
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
fs/f2fs/debug.c | 3 ++
fs/f2fs/f2fs.h | 12 +++++++-
fs/f2fs/file.c | 65 +++++++++++++++++++++++++++++++++++++++
fs/f2fs/inode.c | 14 +++++++++
fs/f2fs/super.c | 1 +
include/uapi/linux/f2fs.h | 7 +++++
6 files changed, 101 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 468828288a4a..16c2dfb4f595 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
+ si->ndonate_files = sbi->donate_files;
si->nquota_files = sbi->nquota_files;
si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
si->aw_cnt = atomic_read(&sbi->atomic_files);
@@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
si->compr_inode, si->compr_blocks);
seq_printf(s, " - Swapfile Inode: %u\n",
si->swapfile_inode);
+ seq_printf(s, " - Donate Inode: %u\n",
+ si->ndonate_files);
seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
si->orphans, si->append, si->update);
seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4bfe162eefd3..951fbc3f94c7 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -850,6 +850,11 @@ struct f2fs_inode_info {
#endif
struct list_head dirty_list; /* dirty list for dirs and files */
struct list_head gdirty_list; /* linked in global dirty list */
+
+ /* linked in global inode list for cache donation */
+ struct list_head gdonate_list;
+ loff_t donate_start, donate_end; /* inclusive */
+
struct task_struct *atomic_write_task; /* store atomic write task */
struct extent_tree *extent_tree[NR_EXTENT_CACHES];
/* cached extent_tree entry */
@@ -1274,6 +1279,7 @@ enum inode_type {
DIR_INODE, /* for dirty dir inode */
FILE_INODE, /* for dirty regular/symlink inode */
DIRTY_META, /* for all dirtied inode metadata */
+ DONATE_INODE, /* for all inode to donate pages */
NR_INODE_TYPE,
};
@@ -1629,6 +1635,9 @@ struct f2fs_sb_info {
unsigned int warm_data_age_threshold;
unsigned int last_age_weight;
+ /* control donate caches */
+ unsigned int donate_files;
+
/* basic filesystem units */
unsigned int log_sectors_per_block; /* log2 sectors per block */
unsigned int log_blocksize; /* log2 block size */
@@ -3984,7 +3993,8 @@ struct f2fs_stat_info {
unsigned long long allocated_data_blocks;
int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
int ndirty_data, ndirty_qdata;
- unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
+ unsigned int ndirty_dirs, ndirty_files, ndirty_all;
+ unsigned int nquota_files, ndonate_files;
int nats, dirty_nats, sits, dirty_sits;
int free_nids, avail_nids, alloc_nids;
int total_count, utilization;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 81764b10840b..ff475bdc2832 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2429,6 +2429,68 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
return ret;
}
+static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
+{
+ struct inode *inode = file_inode(filp);
+ struct mnt_idmap *idmap = file_mnt_idmap(filp);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_donate_range range;
+ u64 max_pages = F2FS_BLK_TO_BYTES(max_file_blocks(inode)) >> PAGE_SHIFT;
+ bool partial = range.start & PAGE_MASK;
+ int ret;
+
+ if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
+ sizeof(range)))
+ return -EFAULT;
+
+ if (!inode_owner_or_capable(idmap, inode))
+ return -EACCES;
+
+ if (!S_ISREG(inode->i_mode))
+ return -EINVAL;
+
+ range.start >>= PAGE_SHIFT;
+ range.len = DIV_ROUND_UP(range.len, PAGE_SIZE) + partial ? 1: 0;
+
+ if (range.start >= max_pages || range.len > max_pages ||
+ (range.start + range.len) > max_pages)
+ return -EINVAL;
+
+ ret = mnt_want_write_file(filp);
+ if (ret)
+ return ret;
+
+ inode_lock(inode);
+
+ if (f2fs_is_atomic_file(inode))
+ goto out;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ /* let's remove the range, if len = 0 */
+ if (!range.len) {
+ if (!list_empty(&F2FS_I(inode)->gdonate_list)) {
+ list_del_init(&F2FS_I(inode)->gdonate_list);
+ sbi->donate_files--;
+ }
+ } else {
+ if (list_empty(&F2FS_I(inode)->gdonate_list)) {
+ list_add_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ sbi->donate_files++;
+ } else {
+ list_move_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ }
+ F2FS_I(inode)->donate_start = range.start;
+ F2FS_I(inode)->donate_end = range.start + range.len - 1;
+ }
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+out:
+ inode_unlock(inode);
+ mnt_drop_write_file(filp);
+ return ret;
+}
+
static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
{
struct inode *inode = file_inode(filp);
@@ -4458,6 +4520,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return -EOPNOTSUPP;
case F2FS_IOC_SHUTDOWN:
return f2fs_ioc_shutdown(filp, arg);
+ case F2FS_IOC_DONATE_RANGE:
+ return f2fs_ioc_donate_range(filp, arg);
case FITRIM:
return f2fs_ioc_fitrim(filp, arg);
case FS_IOC_SET_ENCRYPTION_POLICY:
@@ -5209,6 +5273,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case F2FS_IOC_RELEASE_VOLATILE_WRITE:
case F2FS_IOC_ABORT_ATOMIC_WRITE:
case F2FS_IOC_SHUTDOWN:
+ case F2FS_IOC_DONATE_RANGE:
case FITRIM:
case FS_IOC_SET_ENCRYPTION_POLICY:
case FS_IOC_GET_ENCRYPTION_PWSALT:
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 7de33da8b3ea..f9fc58f313f2 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
return 0;
}
+static void f2fs_remove_donate_inode(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+ if (list_empty(&F2FS_I(inode)->gdonate_list))
+ return;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ list_del_init(&F2FS_I(inode)->gdonate_list);
+ sbi->donate_files--;
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+}
+
/*
* Called at the last iput() if i_nlink is zero
*/
@@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
f2fs_bug_on(sbi, get_dirty_pages(inode));
f2fs_remove_dirty_inode(inode);
+ f2fs_remove_donate_inode(inode);
if (!IS_DEVICE_ALIASING(inode))
f2fs_destroy_extent_tree(inode);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index fc7d463dee15..ef639a6d82e5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
spin_lock_init(&fi->i_size_lock);
INIT_LIST_HEAD(&fi->dirty_list);
INIT_LIST_HEAD(&fi->gdirty_list);
+ INIT_LIST_HEAD(&fi->gdonate_list);
init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
init_f2fs_rwsem(&fi->i_xattr_sem);
diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
index f7aaf8d23e20..cd38a7c166e6 100644
--- a/include/uapi/linux/f2fs.h
+++ b/include/uapi/linux/f2fs.h
@@ -44,6 +44,8 @@
#define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
#define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
#define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
+#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
+ struct f2fs_donate_range)
/*
* should be same as XFS_IOC_GOINGDOWN.
@@ -97,4 +99,9 @@ struct f2fs_comp_option {
__u8 log_cluster_size;
};
+struct f2fs_donate_range {
+ __u64 start;
+ __u64 len;
+};
+
#endif /* _UAPI_LINUX_F2FS_H */
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [f2fs-dev] [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-16 4:42 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
@ 2025-01-16 5:31 ` Chao Yu
2025-01-16 17:00 ` Jaegeuk Kim
0 siblings, 1 reply; 23+ messages in thread
From: Chao Yu @ 2025-01-16 5:31 UTC (permalink / raw)
To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel; +Cc: chao
On 1/16/25 12:42, Jaegeuk Kim via Linux-f2fs-devel wrote:
> This patch introduces an inode list to keep the page cache ranges that users
> can donate pages together.
>
> #define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> struct f2fs_donate_range)
> struct f2fs_donate_range {
> __u64 start;
> __u64 len;
> };
>
> e.g., ioctl(F2FS_IOC_DONATE_RANGE, &range);
>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
> fs/f2fs/debug.c | 3 ++
> fs/f2fs/f2fs.h | 12 +++++++-
> fs/f2fs/file.c | 65 +++++++++++++++++++++++++++++++++++++++
> fs/f2fs/inode.c | 14 +++++++++
> fs/f2fs/super.c | 1 +
> include/uapi/linux/f2fs.h | 7 +++++
> 6 files changed, 101 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
> index 468828288a4a..16c2dfb4f595 100644
> --- a/fs/f2fs/debug.c
> +++ b/fs/f2fs/debug.c
> @@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
> si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
> si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
> si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
> + si->ndonate_files = sbi->donate_files;
> si->nquota_files = sbi->nquota_files;
> si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
> si->aw_cnt = atomic_read(&sbi->atomic_files);
> @@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
> si->compr_inode, si->compr_blocks);
> seq_printf(s, " - Swapfile Inode: %u\n",
> si->swapfile_inode);
> + seq_printf(s, " - Donate Inode: %u\n",
> + si->ndonate_files);
> seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
> si->orphans, si->append, si->update);
> seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4bfe162eefd3..951fbc3f94c7 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -850,6 +850,11 @@ struct f2fs_inode_info {
> #endif
> struct list_head dirty_list; /* dirty list for dirs and files */
> struct list_head gdirty_list; /* linked in global dirty list */
> +
> + /* linked in global inode list for cache donation */
> + struct list_head gdonate_list;
> + loff_t donate_start, donate_end; /* inclusive */
> +
> struct task_struct *atomic_write_task; /* store atomic write task */
> struct extent_tree *extent_tree[NR_EXTENT_CACHES];
> /* cached extent_tree entry */
> @@ -1274,6 +1279,7 @@ enum inode_type {
> DIR_INODE, /* for dirty dir inode */
> FILE_INODE, /* for dirty regular/symlink inode */
> DIRTY_META, /* for all dirtied inode metadata */
> + DONATE_INODE, /* for all inode to donate pages */
> NR_INODE_TYPE,
> };
>
> @@ -1629,6 +1635,9 @@ struct f2fs_sb_info {
> unsigned int warm_data_age_threshold;
> unsigned int last_age_weight;
>
> + /* control donate caches */
> + unsigned int donate_files;
> +
> /* basic filesystem units */
> unsigned int log_sectors_per_block; /* log2 sectors per block */
> unsigned int log_blocksize; /* log2 block size */
> @@ -3984,7 +3993,8 @@ struct f2fs_stat_info {
> unsigned long long allocated_data_blocks;
> int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
> int ndirty_data, ndirty_qdata;
> - unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
> + unsigned int ndirty_dirs, ndirty_files, ndirty_all;
> + unsigned int nquota_files, ndonate_files;
> int nats, dirty_nats, sits, dirty_sits;
> int free_nids, avail_nids, alloc_nids;
> int total_count, utilization;
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 81764b10840b..ff475bdc2832 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -2429,6 +2429,68 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
> return ret;
> }
>
> +static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
> +{
> + struct inode *inode = file_inode(filp);
> + struct mnt_idmap *idmap = file_mnt_idmap(filp);
> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> + struct f2fs_donate_range range;
> + u64 max_pages = F2FS_BLK_TO_BYTES(max_file_blocks(inode)) >> PAGE_SHIFT;
> + bool partial = range.start & PAGE_MASK;
> + int ret;
> +
> + if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
> + sizeof(range)))
> + return -EFAULT;
> +
> + if (!inode_owner_or_capable(idmap, inode))
> + return -EACCES;
> +
> + if (!S_ISREG(inode->i_mode))
> + return -EINVAL;
> +
> + range.start >>= PAGE_SHIFT;
> + range.len = DIV_ROUND_UP(range.len, PAGE_SIZE) + partial ? 1: 0;
e.g.
range.start = 2048
range.len = 6144
The original byte range is [2048, 8192].
After this calculation, doesn't the range become [0, 12288]?
How about this?
u64 max_size = F2FS_BLK_TO_BYTES(max_file_blocks(inode));
u64 start, end;
if (range.start >= max_size || range.len > max_size ||
(range.start + range.len) > max_size)
return -EINVAL;
start = range.start >> PAGE_SHIFT;
end = DIV_ROUND_UP(range.start + range.len, PAGE_SIZE);
...
/* let's remove the range, if len = 0 */
if (start == end)
...
F2FS_I(inode)->donate_start = start;
F2FS_I(inode)->donate_end = end;
Thanks,
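
The rounding in the suggestion above can be tried in userspace. This is only a sketch, assuming 4 KiB pages: DEMO_PAGE_SHIFT, DEMO_PAGE_SIZE, struct page_span and bytes_to_pages() are illustrative names, not part of the patch. It rounds the start down and the end up, then reports the covered pages as an inclusive span, matching the "inclusive" comment on donate_start/donate_end in f2fs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages for the demo; the kernel uses PAGE_SHIFT/PAGE_SIZE. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1ULL << DEMO_PAGE_SHIFT)

static_assert((DEMO_PAGE_SIZE & (DEMO_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

struct page_span {
	uint64_t first;	/* first page index covered */
	uint64_t last;	/* last page index covered, inclusive */
};

/*
 * Convert a non-empty byte range [start, start + len) into the inclusive
 * span of pages that covers it. The caller guarantees len > 0 and that
 * start + len does not overflow.
 */
struct page_span bytes_to_pages(uint64_t start, uint64_t len)
{
	/* Round the exclusive end up to the next page boundary. */
	uint64_t end = (start + len + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
	struct page_span span = { start >> DEMO_PAGE_SHIFT, end - 1 };

	return span;
}
```

With start = 2048 and len = 6144 this gives pages 0 and 1, i.e. the two pages that hold bytes 2048 through 8191.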
> +
> + if (range.start >= max_pages || range.len > max_pages ||
> + (range.start + range.len) > max_pages)
> + return -EINVAL;
> +
> + ret = mnt_want_write_file(filp);
> + if (ret)
> + return ret;
> +
> + inode_lock(inode);
> +
> + if (f2fs_is_atomic_file(inode))
> + goto out;
> +
> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> + /* let's remove the range, if len = 0 */
> + if (!range.len) {
> + if (!list_empty(&F2FS_I(inode)->gdonate_list)) {
> + list_del_init(&F2FS_I(inode)->gdonate_list);
> + sbi->donate_files--;
> + }
> + } else {
> + if (list_empty(&F2FS_I(inode)->gdonate_list)) {
> + list_add_tail(&F2FS_I(inode)->gdonate_list,
> + &sbi->inode_list[DONATE_INODE]);
> + sbi->donate_files++;
> + } else {
> + list_move_tail(&F2FS_I(inode)->gdonate_list,
> + &sbi->inode_list[DONATE_INODE]);
> + }
> + F2FS_I(inode)->donate_start = range.start;
> + F2FS_I(inode)->donate_end = range.start + range.len - 1;
> + }
> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> +out:
> + inode_unlock(inode);
> + mnt_drop_write_file(filp);
> + return ret;
> +}
> +
> static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
> {
> struct inode *inode = file_inode(filp);
> @@ -4458,6 +4520,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> return -EOPNOTSUPP;
> case F2FS_IOC_SHUTDOWN:
> return f2fs_ioc_shutdown(filp, arg);
> + case F2FS_IOC_DONATE_RANGE:
> + return f2fs_ioc_donate_range(filp, arg);
> case FITRIM:
> return f2fs_ioc_fitrim(filp, arg);
> case FS_IOC_SET_ENCRYPTION_POLICY:
> @@ -5209,6 +5273,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> case F2FS_IOC_RELEASE_VOLATILE_WRITE:
> case F2FS_IOC_ABORT_ATOMIC_WRITE:
> case F2FS_IOC_SHUTDOWN:
> + case F2FS_IOC_DONATE_RANGE:
> case FITRIM:
> case FS_IOC_SET_ENCRYPTION_POLICY:
> case FS_IOC_GET_ENCRYPTION_PWSALT:
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index 7de33da8b3ea..f9fc58f313f2 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
> return 0;
> }
>
> +static void f2fs_remove_donate_inode(struct inode *inode)
> +{
> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> +
> + if (list_empty(&F2FS_I(inode)->gdonate_list))
> + return;
> +
> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> + list_del_init(&F2FS_I(inode)->gdonate_list);
> + sbi->donate_files--;
> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> +}
> +
> /*
> * Called at the last iput() if i_nlink is zero
> */
> @@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
>
> f2fs_bug_on(sbi, get_dirty_pages(inode));
> f2fs_remove_dirty_inode(inode);
> + f2fs_remove_donate_inode(inode);
>
> if (!IS_DEVICE_ALIASING(inode))
> f2fs_destroy_extent_tree(inode);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index fc7d463dee15..ef639a6d82e5 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
> spin_lock_init(&fi->i_size_lock);
> INIT_LIST_HEAD(&fi->dirty_list);
> INIT_LIST_HEAD(&fi->gdirty_list);
> + INIT_LIST_HEAD(&fi->gdonate_list);
> init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
> init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
> init_f2fs_rwsem(&fi->i_xattr_sem);
> diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
> index f7aaf8d23e20..cd38a7c166e6 100644
> --- a/include/uapi/linux/f2fs.h
> +++ b/include/uapi/linux/f2fs.h
> @@ -44,6 +44,8 @@
> #define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
> #define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
> #define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
> +#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> + struct f2fs_donate_range)
>
> /*
> * should be same as XFS_IOC_GOINGDOWN.
> @@ -97,4 +99,9 @@ struct f2fs_comp_option {
> __u8 log_cluster_size;
> };
>
> +struct f2fs_donate_range {
> + __u64 start;
> + __u64 len;
> +};
> +
> #endif /* _UAPI_LINUX_F2FS_H */
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [f2fs-dev] [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-16 5:31 ` [f2fs-dev] " Chao Yu
@ 2025-01-16 17:00 ` Jaegeuk Kim
0 siblings, 0 replies; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-16 17:00 UTC (permalink / raw)
To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel
On 01/16, Chao Yu wrote:
> On 1/16/25 12:42, Jaegeuk Kim via Linux-f2fs-devel wrote:
> > This patch introduces an inode list to keep the page cache ranges that users
> > can donate pages together.
> >
> > #define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> > struct f2fs_donate_range)
> > struct f2fs_donate_range {
> > __u64 start;
> > __u64 len;
> > };
> >
> > e.g., ioctl(F2FS_IOC_DONATE_RANGE, &range);
> >
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> > fs/f2fs/debug.c | 3 ++
> > fs/f2fs/f2fs.h | 12 +++++++-
> > fs/f2fs/file.c | 65 +++++++++++++++++++++++++++++++++++++++
> > fs/f2fs/inode.c | 14 +++++++++
> > fs/f2fs/super.c | 1 +
> > include/uapi/linux/f2fs.h | 7 +++++
> > 6 files changed, 101 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
> > index 468828288a4a..16c2dfb4f595 100644
> > --- a/fs/f2fs/debug.c
> > +++ b/fs/f2fs/debug.c
> > @@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
> > si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
> > si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
> > si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
> > + si->ndonate_files = sbi->donate_files;
> > si->nquota_files = sbi->nquota_files;
> > si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
> > si->aw_cnt = atomic_read(&sbi->atomic_files);
> > @@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
> > si->compr_inode, si->compr_blocks);
> > seq_printf(s, " - Swapfile Inode: %u\n",
> > si->swapfile_inode);
> > + seq_printf(s, " - Donate Inode: %u\n",
> > + si->ndonate_files);
> > seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
> > si->orphans, si->append, si->update);
> > seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 4bfe162eefd3..951fbc3f94c7 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -850,6 +850,11 @@ struct f2fs_inode_info {
> > #endif
> > struct list_head dirty_list; /* dirty list for dirs and files */
> > struct list_head gdirty_list; /* linked in global dirty list */
> > +
> > + /* linked in global inode list for cache donation */
> > + struct list_head gdonate_list;
> > + loff_t donate_start, donate_end; /* inclusive */
> > +
> > struct task_struct *atomic_write_task; /* store atomic write task */
> > struct extent_tree *extent_tree[NR_EXTENT_CACHES];
> > /* cached extent_tree entry */
> > @@ -1274,6 +1279,7 @@ enum inode_type {
> > DIR_INODE, /* for dirty dir inode */
> > FILE_INODE, /* for dirty regular/symlink inode */
> > DIRTY_META, /* for all dirtied inode metadata */
> > + DONATE_INODE, /* for all inode to donate pages */
> > NR_INODE_TYPE,
> > };
> > @@ -1629,6 +1635,9 @@ struct f2fs_sb_info {
> > unsigned int warm_data_age_threshold;
> > unsigned int last_age_weight;
> > + /* control donate caches */
> > + unsigned int donate_files;
> > +
> > /* basic filesystem units */
> > unsigned int log_sectors_per_block; /* log2 sectors per block */
> > unsigned int log_blocksize; /* log2 block size */
> > @@ -3984,7 +3993,8 @@ struct f2fs_stat_info {
> > unsigned long long allocated_data_blocks;
> > int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
> > int ndirty_data, ndirty_qdata;
> > - unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
> > + unsigned int ndirty_dirs, ndirty_files, ndirty_all;
> > + unsigned int nquota_files, ndonate_files;
> > int nats, dirty_nats, sits, dirty_sits;
> > int free_nids, avail_nids, alloc_nids;
> > int total_count, utilization;
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 81764b10840b..ff475bdc2832 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -2429,6 +2429,68 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
> > return ret;
> > }
> > +static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
> > +{
> > + struct inode *inode = file_inode(filp);
> > + struct mnt_idmap *idmap = file_mnt_idmap(filp);
> > + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > + struct f2fs_donate_range range;
> > + u64 max_pages = F2FS_BLK_TO_BYTES(max_file_blocks(inode)) >> PAGE_SHIFT;
> > + bool partial = range.start & PAGE_MASK;
> > + int ret;
> > +
> > + if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
> > + sizeof(range)))
> > + return -EFAULT;
> > +
> > + if (!inode_owner_or_capable(idmap, inode))
> > + return -EACCES;
> > +
> > + if (!S_ISREG(inode->i_mode))
> > + return -EINVAL;
> > +
> > + range.start >>= PAGE_SHIFT;
> > + range.len = DIV_ROUND_UP(range.len, PAGE_SIZE) + partial ? 1: 0;
>
> For example:
>
> range.start = 2048
> range.len = 6144
>
> The original byte range is [2048, 8192).
>
> After the calculation, does the range become [0, 12288)?
>
> How about this?
>
> u64 max_size = F2FS_BLK_TO_BYTES(max_file_blocks(inode));
> u64 start, end;
>
> if (range.start >= max_size || range.len > max_size ||
> (range.start + range.len) > max_size)
> return -EINVAL;
>
> start = range.start >> PAGE_SHIFT;
> end = DIV_ROUND_UP(range.start + range.len, PAGE_SIZE);
>
> ...
>
> /* let's remove the range, if len = 0 */
> if (start == end)
Let me take the other suggestions except this one, since we'd better remove
the entry when range.start != 0 && range.len == 0 as well.
>
> ...
>
> F2FS_I(inode)->donate_start = start;
> F2FS_I(inode)->donate_end = end;
This needs to be end - 1, since donate_end is inclusive.
>
> Thanks,
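
The conversion discussed above can be sketched as a small standalone C
helper. This is only an illustration, not the code that was merged: the names
ex_bytes_to_pages, struct ex_page_range and the EX_PAGE_* macros are
hypothetical, 4 KiB pages are assumed, and max_size is assumed to be far
below 2^63 so that start + len cannot overflow after the first two checks.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12u                        /* assumption: 4 KiB pages */
#define EX_PAGE_SIZE  (UINT64_C(1) << EX_PAGE_SHIFT)

/* Hypothetical result type: a page range with an inclusive end. */
struct ex_page_range {
	bool remove;      /* true when len == 0, i.e. de-register the range */
	uint64_t first;   /* first page index */
	uint64_t last;    /* last page index, inclusive */
};

/*
 * Convert the byte range [start, start + len) into page indexes.
 * Returns 0 on success and -1 if the range exceeds max_size.
 */
static int ex_bytes_to_pages(uint64_t start, uint64_t len, uint64_t max_size,
			     struct ex_page_range *out)
{
	uint64_t end;

	/* Boundary checks are done in bytes, before any shifting. */
	if (start >= max_size || len > max_size || start + len > max_size)
		return -1;

	/* Removal is keyed on len == 0, not on first == end. */
	out->remove = (len == 0);
	out->first = start >> EX_PAGE_SHIFT;

	/* Exclusive end page: round the end byte offset up to a page. */
	end = (start + len + EX_PAGE_SIZE - 1) >> EX_PAGE_SHIFT;

	/* The stored end is inclusive, hence end - 1 for non-empty ranges. */
	out->last = out->remove ? out->first : end - 1;
	return 0;
}
```

With start = 2048 and len = 6144 this gives pages #[0, 1]. With start = 2048
and len = 0 the exclusive end page is 1 while the first page is 0, so a
start == end test alone would not detect the removal request, which is why
the sketch tests len directly.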
>
> > +
> > + if (range.start >= max_pages || range.len > max_pages ||
> > + (range.start + range.len) > max_pages)
> > + return -EINVAL;
> > +
> > + ret = mnt_want_write_file(filp);
> > + if (ret)
> > + return ret;
> > +
> > + inode_lock(inode);
> > +
> > + if (f2fs_is_atomic_file(inode))
> > + goto out;
> > +
> > + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> > + /* let's remove the range, if len = 0 */
> > + if (!range.len) {
> > + if (!list_empty(&F2FS_I(inode)->gdonate_list)) {
> > + list_del_init(&F2FS_I(inode)->gdonate_list);
> > + sbi->donate_files--;
> > + }
> > + } else {
> > + if (list_empty(&F2FS_I(inode)->gdonate_list)) {
> > + list_add_tail(&F2FS_I(inode)->gdonate_list,
> > + &sbi->inode_list[DONATE_INODE]);
> > + sbi->donate_files++;
> > + } else {
> > + list_move_tail(&F2FS_I(inode)->gdonate_list,
> > + &sbi->inode_list[DONATE_INODE]);
> > + }
> > + F2FS_I(inode)->donate_start = range.start;
> > + F2FS_I(inode)->donate_end = range.start + range.len - 1;
> > + }
> > + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> > +out:
> > + inode_unlock(inode);
> > + mnt_drop_write_file(filp);
> > + return ret;
> > +}
> > +
> > static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
> > {
> > struct inode *inode = file_inode(filp);
> > @@ -4458,6 +4520,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> > return -EOPNOTSUPP;
> > case F2FS_IOC_SHUTDOWN:
> > return f2fs_ioc_shutdown(filp, arg);
> > + case F2FS_IOC_DONATE_RANGE:
> > + return f2fs_ioc_donate_range(filp, arg);
> > case FITRIM:
> > return f2fs_ioc_fitrim(filp, arg);
> > case FS_IOC_SET_ENCRYPTION_POLICY:
> > @@ -5209,6 +5273,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > case F2FS_IOC_RELEASE_VOLATILE_WRITE:
> > case F2FS_IOC_ABORT_ATOMIC_WRITE:
> > case F2FS_IOC_SHUTDOWN:
> > + case F2FS_IOC_DONATE_RANGE:
> > case FITRIM:
> > case FS_IOC_SET_ENCRYPTION_POLICY:
> > case FS_IOC_GET_ENCRYPTION_PWSALT:
> > diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> > index 7de33da8b3ea..f9fc58f313f2 100644
> > --- a/fs/f2fs/inode.c
> > +++ b/fs/f2fs/inode.c
> > @@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
> > return 0;
> > }
> > +static void f2fs_remove_donate_inode(struct inode *inode)
> > +{
> > + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > +
> > + if (list_empty(&F2FS_I(inode)->gdonate_list))
> > + return;
> > +
> > + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> > + list_del_init(&F2FS_I(inode)->gdonate_list);
> > + sbi->donate_files--;
> > + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> > +}
> > +
> > /*
> > * Called at the last iput() if i_nlink is zero
> > */
> > @@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
> > f2fs_bug_on(sbi, get_dirty_pages(inode));
> > f2fs_remove_dirty_inode(inode);
> > + f2fs_remove_donate_inode(inode);
> > if (!IS_DEVICE_ALIASING(inode))
> > f2fs_destroy_extent_tree(inode);
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index fc7d463dee15..ef639a6d82e5 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
> > spin_lock_init(&fi->i_size_lock);
> > INIT_LIST_HEAD(&fi->dirty_list);
> > INIT_LIST_HEAD(&fi->gdirty_list);
> > + INIT_LIST_HEAD(&fi->gdonate_list);
> > init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
> > init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
> > init_f2fs_rwsem(&fi->i_xattr_sem);
> > diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
> > index f7aaf8d23e20..cd38a7c166e6 100644
> > --- a/include/uapi/linux/f2fs.h
> > +++ b/include/uapi/linux/f2fs.h
> > @@ -44,6 +44,8 @@
> > #define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
> > #define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
> > #define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
> > +#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> > + struct f2fs_donate_range)
> > /*
> > * should be same as XFS_IOC_GOINGDOWN.
> > @@ -97,4 +99,9 @@ struct f2fs_comp_option {
> > __u8 log_cluster_size;
> > };
> > +struct f2fs_donate_range {
> > + __u64 start;
> > + __u64 len;
> > +};
> > +
> > #endif /* _UAPI_LINUX_F2FS_H */
* [PATCH 0/2 v3] add ioctl/sysfs to donate file-backed pages
@ 2025-01-15 22:16 Jaegeuk Kim
2025-01-15 22:16 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-15 22:16 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim
If users clearly know, from a system-wide view, which file-backed pages should
be reclaimed, they can use this ioctl() to register those pages in advance and
reclaim them all at once later.
Change log from v2:
- add more boundary checks
- de-register the range, if len is zero
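
A user-space caller of the proposed interface might look roughly like the
sketch below. It is hedged in several ways: the ioctl comes from this patch
series and may not exist in a given kernel or in installed headers, so the
request number and struct are redefined locally under hypothetical EX_ names;
the magic value 0xf5 is assumed to match F2FS_IOCTL_MAGIC in the UAPI header;
and ex_donate_range and ex_cancel_donation are illustrative helpers only.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Local mirror of the proposed UAPI struct (two 64-bit byte values). */
struct ex_f2fs_donate_range {
	uint64_t start;   /* byte offset of the range */
	uint64_t len;     /* length in bytes; 0 de-registers the range */
};

/* Assumption: same magic (0xf5) and command number (27) as the patch. */
#define EX_F2FS_IOCTL_MAGIC      0xf5
#define EX_F2FS_IOC_DONATE_RANGE \
	_IOW(EX_F2FS_IOCTL_MAGIC, 27, struct ex_f2fs_donate_range)

/* Register [start, start + len) of the open file fd for later donation. */
static int ex_donate_range(int fd, uint64_t start, uint64_t len)
{
	struct ex_f2fs_donate_range range = { .start = start, .len = len };

	/* Returns 0 on success, -1 with errno set on failure. */
	return ioctl(fd, EX_F2FS_IOC_DONATE_RANGE, &range);
}

/* De-register the file again by passing a zero length. */
static int ex_cancel_donation(int fd)
{
	return ex_donate_range(fd, 0, 0);
}
```

Per the patch, the call is expected to fail for callers that neither own the
file nor hold the needed capability, for non-regular files, and for ranges
beyond the maximum file size.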
Jaegeuk Kim (2):
f2fs: register inodes which is able to donate pages
f2fs: add a sysfs entry to request donate file-backed pages
Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++
fs/f2fs/debug.c | 3 ++
fs/f2fs/f2fs.h | 14 +++++-
fs/f2fs/file.c | 64 +++++++++++++++++++++++++
fs/f2fs/inode.c | 14 ++++++
fs/f2fs/shrinker.c | 27 +++++++++++
fs/f2fs/super.c | 1 +
fs/f2fs/sysfs.c | 8 ++++
include/uapi/linux/f2fs.h | 7 +++
9 files changed, 144 insertions(+), 1 deletion(-)
--
2.48.0.rc2.279.g1de40edade-goog
* [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-15 22:16 [PATCH 0/2 v3] add ioctl/sysfs to donate file-backed pages Jaegeuk Kim
@ 2025-01-15 22:16 ` Jaegeuk Kim
2025-01-16 3:07 ` [f2fs-dev] " Chao Yu
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-15 22:16 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim
This patch introduces an inode list that keeps track of the page cache ranges
whose pages users can donate together.
#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
struct f2fs_donate_range)
struct f2fs_donate_range {
__u64 start;
__u64 len;
};
e.g., ioctl(fd, F2FS_IOC_DONATE_RANGE, &range);
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
fs/f2fs/debug.c | 3 ++
fs/f2fs/f2fs.h | 12 +++++++-
fs/f2fs/file.c | 64 +++++++++++++++++++++++++++++++++++++++
fs/f2fs/inode.c | 14 +++++++++
fs/f2fs/super.c | 1 +
include/uapi/linux/f2fs.h | 7 +++++
6 files changed, 100 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 468828288a4a..16c2dfb4f595 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
+ si->ndonate_files = sbi->donate_files;
si->nquota_files = sbi->nquota_files;
si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
si->aw_cnt = atomic_read(&sbi->atomic_files);
@@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
si->compr_inode, si->compr_blocks);
seq_printf(s, " - Swapfile Inode: %u\n",
si->swapfile_inode);
+ seq_printf(s, " - Donate Inode: %u\n",
+ si->ndonate_files);
seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
si->orphans, si->append, si->update);
seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4bfe162eefd3..951fbc3f94c7 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -850,6 +850,11 @@ struct f2fs_inode_info {
#endif
struct list_head dirty_list; /* dirty list for dirs and files */
struct list_head gdirty_list; /* linked in global dirty list */
+
+ /* linked in global inode list for cache donation */
+ struct list_head gdonate_list;
+ loff_t donate_start, donate_end; /* inclusive */
+
struct task_struct *atomic_write_task; /* store atomic write task */
struct extent_tree *extent_tree[NR_EXTENT_CACHES];
/* cached extent_tree entry */
@@ -1274,6 +1279,7 @@ enum inode_type {
DIR_INODE, /* for dirty dir inode */
FILE_INODE, /* for dirty regular/symlink inode */
DIRTY_META, /* for all dirtied inode metadata */
+ DONATE_INODE, /* for all inode to donate pages */
NR_INODE_TYPE,
};
@@ -1629,6 +1635,9 @@ struct f2fs_sb_info {
unsigned int warm_data_age_threshold;
unsigned int last_age_weight;
+ /* control donate caches */
+ unsigned int donate_files;
+
/* basic filesystem units */
unsigned int log_sectors_per_block; /* log2 sectors per block */
unsigned int log_blocksize; /* log2 block size */
@@ -3984,7 +3993,8 @@ struct f2fs_stat_info {
unsigned long long allocated_data_blocks;
int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
int ndirty_data, ndirty_qdata;
- unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
+ unsigned int ndirty_dirs, ndirty_files, ndirty_all;
+ unsigned int nquota_files, ndonate_files;
int nats, dirty_nats, sits, dirty_sits;
int free_nids, avail_nids, alloc_nids;
int total_count, utilization;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 81764b10840b..6d071605b0cd 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2429,6 +2429,67 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
return ret;
}
+static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
+{
+ struct inode *inode = file_inode(filp);
+ struct mnt_idmap *idmap = file_mnt_idmap(filp);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_donate_range range;
+ u64 max_pages = F2FS_BLK_TO_BYTES(max_file_blocks(inode)) >> PAGE_SHIFT;
+ int ret;
+
+ if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
+ sizeof(range)))
+ return -EFAULT;
+
+ if (!inode_owner_or_capable(idmap, inode))
+ return -EACCES;
+
+ if (!S_ISREG(inode->i_mode))
+ return -EINVAL;
+
+ range.start >>= PAGE_SHIFT;
+ range.len = DIV_ROUND_UP(range.len, PAGE_SIZE);
+
+ if (range.start >= max_pages || range.len > max_pages ||
+ (range.start + range.len) > max_pages)
+ return -EINVAL;
+
+ ret = mnt_want_write_file(filp);
+ if (ret)
+ return ret;
+
+ inode_lock(inode);
+
+ if (f2fs_is_atomic_file(inode))
+ goto out;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ /* let's remove the range, if len = 0 */
+ if (!range.len) {
+ if (!list_empty(&F2FS_I(inode)->gdonate_list)) {
+ list_del_init(&F2FS_I(inode)->gdonate_list);
+ sbi->donate_files--;
+ }
+ } else {
+ if (list_empty(&F2FS_I(inode)->gdonate_list)) {
+ list_add_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ sbi->donate_files++;
+ } else {
+ list_move_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ }
+ F2FS_I(inode)->donate_start = range.start;
+ F2FS_I(inode)->donate_end = range.start + range.len - 1;
+ }
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+out:
+ inode_unlock(inode);
+ mnt_drop_write_file(filp);
+ return ret;
+}
+
static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
{
struct inode *inode = file_inode(filp);
@@ -4458,6 +4519,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return -EOPNOTSUPP;
case F2FS_IOC_SHUTDOWN:
return f2fs_ioc_shutdown(filp, arg);
+ case F2FS_IOC_DONATE_RANGE:
+ return f2fs_ioc_donate_range(filp, arg);
case FITRIM:
return f2fs_ioc_fitrim(filp, arg);
case FS_IOC_SET_ENCRYPTION_POLICY:
@@ -5209,6 +5272,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case F2FS_IOC_RELEASE_VOLATILE_WRITE:
case F2FS_IOC_ABORT_ATOMIC_WRITE:
case F2FS_IOC_SHUTDOWN:
+ case F2FS_IOC_DONATE_RANGE:
case FITRIM:
case FS_IOC_SET_ENCRYPTION_POLICY:
case FS_IOC_GET_ENCRYPTION_PWSALT:
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 7de33da8b3ea..f9fc58f313f2 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
return 0;
}
+static void f2fs_remove_donate_inode(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+ if (list_empty(&F2FS_I(inode)->gdonate_list))
+ return;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ list_del_init(&F2FS_I(inode)->gdonate_list);
+ sbi->donate_files--;
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+}
+
/*
* Called at the last iput() if i_nlink is zero
*/
@@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
f2fs_bug_on(sbi, get_dirty_pages(inode));
f2fs_remove_dirty_inode(inode);
+ f2fs_remove_donate_inode(inode);
if (!IS_DEVICE_ALIASING(inode))
f2fs_destroy_extent_tree(inode);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index fc7d463dee15..ef639a6d82e5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
spin_lock_init(&fi->i_size_lock);
INIT_LIST_HEAD(&fi->dirty_list);
INIT_LIST_HEAD(&fi->gdirty_list);
+ INIT_LIST_HEAD(&fi->gdonate_list);
init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
init_f2fs_rwsem(&fi->i_xattr_sem);
diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
index f7aaf8d23e20..cd38a7c166e6 100644
--- a/include/uapi/linux/f2fs.h
+++ b/include/uapi/linux/f2fs.h
@@ -44,6 +44,8 @@
#define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
#define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
#define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
+#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
+ struct f2fs_donate_range)
/*
* should be same as XFS_IOC_GOINGDOWN.
@@ -97,4 +99,9 @@ struct f2fs_comp_option {
__u8 log_cluster_size;
};
+struct f2fs_donate_range {
+ __u64 start;
+ __u64 len;
+};
+
#endif /* _UAPI_LINUX_F2FS_H */
--
2.48.0.rc2.279.g1de40edade-goog
* Re: [f2fs-dev] [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-15 22:16 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
@ 2025-01-16 3:07 ` Chao Yu
0 siblings, 0 replies; 23+ messages in thread
From: Chao Yu @ 2025-01-16 3:07 UTC (permalink / raw)
To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel; +Cc: chao
On 1/16/25 06:16, Jaegeuk Kim via Linux-f2fs-devel wrote:
> This patch introduces an inode list that keeps track of the page cache ranges
> whose pages users can donate together.
>
> #define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> struct f2fs_donate_range)
> struct f2fs_donate_range {
> __u64 start;
> __u64 len;
> };
>
> e.g., ioctl(fd, F2FS_IOC_DONATE_RANGE, &range);
>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
> fs/f2fs/debug.c | 3 ++
> fs/f2fs/f2fs.h | 12 +++++++-
> fs/f2fs/file.c | 64 +++++++++++++++++++++++++++++++++++++++
> fs/f2fs/inode.c | 14 +++++++++
> fs/f2fs/super.c | 1 +
> include/uapi/linux/f2fs.h | 7 +++++
> 6 files changed, 100 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
> index 468828288a4a..16c2dfb4f595 100644
> --- a/fs/f2fs/debug.c
> +++ b/fs/f2fs/debug.c
> @@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
> si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
> si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
> si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
> + si->ndonate_files = sbi->donate_files;
> si->nquota_files = sbi->nquota_files;
> si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
> si->aw_cnt = atomic_read(&sbi->atomic_files);
> @@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
> si->compr_inode, si->compr_blocks);
> seq_printf(s, " - Swapfile Inode: %u\n",
> si->swapfile_inode);
> + seq_printf(s, " - Donate Inode: %u\n",
> + si->ndonate_files);
> seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
> si->orphans, si->append, si->update);
> seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4bfe162eefd3..951fbc3f94c7 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -850,6 +850,11 @@ struct f2fs_inode_info {
> #endif
> struct list_head dirty_list; /* dirty list for dirs and files */
> struct list_head gdirty_list; /* linked in global dirty list */
> +
> + /* linked in global inode list for cache donation */
> + struct list_head gdonate_list;
> + loff_t donate_start, donate_end; /* inclusive */
> +
> struct task_struct *atomic_write_task; /* store atomic write task */
> struct extent_tree *extent_tree[NR_EXTENT_CACHES];
> /* cached extent_tree entry */
> @@ -1274,6 +1279,7 @@ enum inode_type {
> DIR_INODE, /* for dirty dir inode */
> FILE_INODE, /* for dirty regular/symlink inode */
> DIRTY_META, /* for all dirtied inode metadata */
> + DONATE_INODE, /* for all inode to donate pages */
> NR_INODE_TYPE,
> };
>
> @@ -1629,6 +1635,9 @@ struct f2fs_sb_info {
> unsigned int warm_data_age_threshold;
> unsigned int last_age_weight;
>
> + /* control donate caches */
> + unsigned int donate_files;
> +
> /* basic filesystem units */
> unsigned int log_sectors_per_block; /* log2 sectors per block */
> unsigned int log_blocksize; /* log2 block size */
> @@ -3984,7 +3993,8 @@ struct f2fs_stat_info {
> unsigned long long allocated_data_blocks;
> int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
> int ndirty_data, ndirty_qdata;
> - unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
> + unsigned int ndirty_dirs, ndirty_files, ndirty_all;
> + unsigned int nquota_files, ndonate_files;
> int nats, dirty_nats, sits, dirty_sits;
> int free_nids, avail_nids, alloc_nids;
> int total_count, utilization;
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 81764b10840b..6d071605b0cd 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -2429,6 +2429,67 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
> return ret;
> }
>
> +static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
> +{
> + struct inode *inode = file_inode(filp);
> + struct mnt_idmap *idmap = file_mnt_idmap(filp);
> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> + struct f2fs_donate_range range;
> + u64 max_pages = F2FS_BLK_TO_BYTES(max_file_blocks(inode)) >> PAGE_SHIFT;
> + int ret;
> +
> + if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
> + sizeof(range)))
> + return -EFAULT;
> +
> + if (!inode_owner_or_capable(idmap, inode))
> + return -EACCES;
> +
> + if (!S_ISREG(inode->i_mode))
> + return -EINVAL;
> +
> + range.start >>= PAGE_SHIFT;
> + range.len = DIV_ROUND_UP(range.len, PAGE_SIZE);
For example:
range.start = 2048
range.len = 4096
This byte range covers pages #[0, 1].
After the calculation:
range.start = 0
range.len = 1
If I understand correctly, the range is shrunk to page #[0, 0]?
Thanks,
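
The shrinkage described above can be reproduced with a few lines of
standalone C. This is a sketch under assumptions: 4 KiB pages, and the helper
names last_page_len_only and last_page_touched are hypothetical. The first
helper mirrors the patch's approach of rounding the length up on its own; the
second computes the last page that the byte range really touches.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12u                        /* assumption: 4 KiB pages */
#define EX_PAGE_SIZE  (UINT64_C(1) << EX_PAGE_SHIFT)

/*
 * Last page index (inclusive) when the length is rounded up by itself,
 * ignoring where start falls inside its page. Requires len > 0.
 */
static uint64_t last_page_len_only(uint64_t start, uint64_t len)
{
	uint64_t first, pages;

	assert(len > 0);
	first = start >> EX_PAGE_SHIFT;
	pages = (len + EX_PAGE_SIZE - 1) / EX_PAGE_SIZE;
	return first + pages - 1;
}

/*
 * Last page index (inclusive) actually touched by [start, start + len).
 * Requires len > 0.
 */
static uint64_t last_page_touched(uint64_t start, uint64_t len)
{
	assert(len > 0);
	return (start + len - 1) >> EX_PAGE_SHIFT;
}
```

For start = 2048 and len = 4096, last_page_len_only() returns 0 while
last_page_touched() returns 1, so the length-only rounding drops page 1. The
two agree whenever start is page-aligned.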
> +
> + if (range.start >= max_pages || range.len > max_pages ||
> + (range.start + range.len) > max_pages)
> + return -EINVAL;
> +
> + ret = mnt_want_write_file(filp);
> + if (ret)
> + return ret;
> +
> + inode_lock(inode);
> +
> + if (f2fs_is_atomic_file(inode))
> + goto out;
> +
> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> + /* let's remove the range, if len = 0 */
> + if (!range.len) {
> + if (!list_empty(&F2FS_I(inode)->gdonate_list)) {
> + list_del_init(&F2FS_I(inode)->gdonate_list);
> + sbi->donate_files--;
> + }
> + } else {
> + if (list_empty(&F2FS_I(inode)->gdonate_list)) {
> + list_add_tail(&F2FS_I(inode)->gdonate_list,
> + &sbi->inode_list[DONATE_INODE]);
> + sbi->donate_files++;
> + } else {
> + list_move_tail(&F2FS_I(inode)->gdonate_list,
> + &sbi->inode_list[DONATE_INODE]);
> + }
> + F2FS_I(inode)->donate_start = range.start;
> + F2FS_I(inode)->donate_end = range.start + range.len - 1;
> + }
> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> +out:
> + inode_unlock(inode);
> + mnt_drop_write_file(filp);
> + return ret;
> +}
> +
> static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
> {
> struct inode *inode = file_inode(filp);
> @@ -4458,6 +4519,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> return -EOPNOTSUPP;
> case F2FS_IOC_SHUTDOWN:
> return f2fs_ioc_shutdown(filp, arg);
> + case F2FS_IOC_DONATE_RANGE:
> + return f2fs_ioc_donate_range(filp, arg);
> case FITRIM:
> return f2fs_ioc_fitrim(filp, arg);
> case FS_IOC_SET_ENCRYPTION_POLICY:
> @@ -5209,6 +5272,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> case F2FS_IOC_RELEASE_VOLATILE_WRITE:
> case F2FS_IOC_ABORT_ATOMIC_WRITE:
> case F2FS_IOC_SHUTDOWN:
> + case F2FS_IOC_DONATE_RANGE:
> case FITRIM:
> case FS_IOC_SET_ENCRYPTION_POLICY:
> case FS_IOC_GET_ENCRYPTION_PWSALT:
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index 7de33da8b3ea..f9fc58f313f2 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
> return 0;
> }
>
> +static void f2fs_remove_donate_inode(struct inode *inode)
> +{
> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> +
> + if (list_empty(&F2FS_I(inode)->gdonate_list))
> + return;
> +
> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> + list_del_init(&F2FS_I(inode)->gdonate_list);
> + sbi->donate_files--;
> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> +}
> +
> /*
> * Called at the last iput() if i_nlink is zero
> */
> @@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
>
> f2fs_bug_on(sbi, get_dirty_pages(inode));
> f2fs_remove_dirty_inode(inode);
> + f2fs_remove_donate_inode(inode);
>
> if (!IS_DEVICE_ALIASING(inode))
> f2fs_destroy_extent_tree(inode);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index fc7d463dee15..ef639a6d82e5 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
> spin_lock_init(&fi->i_size_lock);
> INIT_LIST_HEAD(&fi->dirty_list);
> INIT_LIST_HEAD(&fi->gdirty_list);
> + INIT_LIST_HEAD(&fi->gdonate_list);
> init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
> init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
> init_f2fs_rwsem(&fi->i_xattr_sem);
> diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
> index f7aaf8d23e20..cd38a7c166e6 100644
> --- a/include/uapi/linux/f2fs.h
> +++ b/include/uapi/linux/f2fs.h
> @@ -44,6 +44,8 @@
> #define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
> #define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
> #define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
> +#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> + struct f2fs_donate_range)
>
> /*
> * should be same as XFS_IOC_GOINGDOWN.
> @@ -97,4 +99,9 @@ struct f2fs_comp_option {
> __u8 log_cluster_size;
> };
>
> +struct f2fs_donate_range {
> + __u64 start;
> + __u64 len;
> +};
> +
> #endif /* _UAPI_LINUX_F2FS_H */
* [PATCH 0/2 v2] add ioctl/sysfs to donate file-backed pages
@ 2025-01-14 22:39 Jaegeuk Kim
2025-01-14 22:39 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-14 22:39 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim
If users clearly know, from a system-wide view, which file-backed pages should
be reclaimed, they can use this ioctl() to register those pages in advance and
reclaim them all at once later.
Jaegeuk Kim (2):
f2fs: register inodes which is able to donate pages
f2fs: add a sysfs entry to request donate file-backed pages
Documentation/ABI/testing/sysfs-fs-f2fs | 7 ++++
fs/f2fs/debug.c | 3 ++
fs/f2fs/f2fs.h | 14 ++++++-
fs/f2fs/file.c | 52 +++++++++++++++++++++++++
fs/f2fs/inode.c | 14 +++++++
fs/f2fs/shrinker.c | 27 +++++++++++++
fs/f2fs/super.c | 1 +
fs/f2fs/sysfs.c | 8 ++++
include/uapi/linux/f2fs.h | 7 ++++
9 files changed, 132 insertions(+), 1 deletion(-)
--
2.48.0.rc2.279.g1de40edade-goog
* [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-14 22:39 [PATCH 0/2 v2] add ioctl/sysfs to donate file-backed pages Jaegeuk Kim
@ 2025-01-14 22:39 ` Jaegeuk Kim
2025-01-15 1:59 ` [f2fs-dev] " Chao Yu
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-14 22:39 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim
This patch introduces an inode list that keeps track of the page cache ranges
whose pages users can donate together.
#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
struct f2fs_donate_range)
struct f2fs_donate_range {
__u64 start;
__u64 len;
};
e.g., ioctl(fd, F2FS_IOC_DONATE_RANGE, &range);
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
fs/f2fs/debug.c | 3 +++
fs/f2fs/f2fs.h | 12 ++++++++-
fs/f2fs/file.c | 52 +++++++++++++++++++++++++++++++++++++++
fs/f2fs/inode.c | 14 +++++++++++
fs/f2fs/super.c | 1 +
include/uapi/linux/f2fs.h | 7 ++++++
6 files changed, 88 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 468828288a4a..16c2dfb4f595 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
+ si->ndonate_files = sbi->donate_files;
si->nquota_files = sbi->nquota_files;
si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
si->aw_cnt = atomic_read(&sbi->atomic_files);
@@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
si->compr_inode, si->compr_blocks);
seq_printf(s, " - Swapfile Inode: %u\n",
si->swapfile_inode);
+ seq_printf(s, " - Donate Inode: %u\n",
+ si->ndonate_files);
seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
si->orphans, si->append, si->update);
seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4bfe162eefd3..951fbc3f94c7 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -850,6 +850,11 @@ struct f2fs_inode_info {
#endif
struct list_head dirty_list; /* dirty list for dirs and files */
struct list_head gdirty_list; /* linked in global dirty list */
+
+ /* linked in global inode list for cache donation */
+ struct list_head gdonate_list;
+ loff_t donate_start, donate_end; /* inclusive */
+
struct task_struct *atomic_write_task; /* store atomic write task */
struct extent_tree *extent_tree[NR_EXTENT_CACHES];
/* cached extent_tree entry */
@@ -1274,6 +1279,7 @@ enum inode_type {
DIR_INODE, /* for dirty dir inode */
FILE_INODE, /* for dirty regular/symlink inode */
DIRTY_META, /* for all dirtied inode metadata */
+ DONATE_INODE, /* for all inode to donate pages */
NR_INODE_TYPE,
};
@@ -1629,6 +1635,9 @@ struct f2fs_sb_info {
unsigned int warm_data_age_threshold;
unsigned int last_age_weight;
+ /* control donate caches */
+ unsigned int donate_files;
+
/* basic filesystem units */
unsigned int log_sectors_per_block; /* log2 sectors per block */
unsigned int log_blocksize; /* log2 block size */
@@ -3984,7 +3993,8 @@ struct f2fs_stat_info {
unsigned long long allocated_data_blocks;
int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
int ndirty_data, ndirty_qdata;
- unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
+ unsigned int ndirty_dirs, ndirty_files, ndirty_all;
+ unsigned int nquota_files, ndonate_files;
int nats, dirty_nats, sits, dirty_sits;
int free_nids, avail_nids, alloc_nids;
int total_count, utilization;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 81764b10840b..c43d64898d8b 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2429,6 +2429,55 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
return ret;
}
+static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
+{
+ struct inode *inode = file_inode(filp);
+ struct mnt_idmap *idmap = file_mnt_idmap(filp);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_donate_range range;
+ int ret;
+
+ if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
+ sizeof(range)))
+ return -EFAULT;
+
+ if (!inode_owner_or_capable(idmap, inode))
+ return -EACCES;
+
+ if (!S_ISREG(inode->i_mode))
+ return -EINVAL;
+
+ if (unlikely((range.start + range.len) >> PAGE_SHIFT >
+ max_file_blocks(inode)))
+ return -EINVAL;
+
+ ret = mnt_want_write_file(filp);
+ if (ret)
+ return ret;
+
+ inode_lock(inode);
+
+ if (f2fs_is_atomic_file(inode))
+ goto out;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ if (list_empty(&F2FS_I(inode)->gdonate_list)) {
+ list_add_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ sbi->donate_files++;
+ } else {
+ list_move_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ }
+ F2FS_I(inode)->donate_start = range.start;
+ F2FS_I(inode)->donate_end = range.start + range.len - 1;
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+out:
+ inode_unlock(inode);
+ mnt_drop_write_file(filp);
+ return ret;
+}
+
static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
{
struct inode *inode = file_inode(filp);
@@ -4458,6 +4507,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return -EOPNOTSUPP;
case F2FS_IOC_SHUTDOWN:
return f2fs_ioc_shutdown(filp, arg);
+ case F2FS_IOC_DONATE_RANGE:
+ return f2fs_ioc_donate_range(filp, arg);
case FITRIM:
return f2fs_ioc_fitrim(filp, arg);
case FS_IOC_SET_ENCRYPTION_POLICY:
@@ -5209,6 +5260,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case F2FS_IOC_RELEASE_VOLATILE_WRITE:
case F2FS_IOC_ABORT_ATOMIC_WRITE:
case F2FS_IOC_SHUTDOWN:
+ case F2FS_IOC_DONATE_RANGE:
case FITRIM:
case FS_IOC_SET_ENCRYPTION_POLICY:
case FS_IOC_GET_ENCRYPTION_PWSALT:
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 7de33da8b3ea..f9fc58f313f2 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
return 0;
}
+static void f2fs_remove_donate_inode(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+ if (list_empty(&F2FS_I(inode)->gdonate_list))
+ return;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ list_del_init(&F2FS_I(inode)->gdonate_list);
+ sbi->donate_files--;
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+}
+
/*
* Called at the last iput() if i_nlink is zero
*/
@@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
f2fs_bug_on(sbi, get_dirty_pages(inode));
f2fs_remove_dirty_inode(inode);
+ f2fs_remove_donate_inode(inode);
if (!IS_DEVICE_ALIASING(inode))
f2fs_destroy_extent_tree(inode);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index fc7d463dee15..ef639a6d82e5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
spin_lock_init(&fi->i_size_lock);
INIT_LIST_HEAD(&fi->dirty_list);
INIT_LIST_HEAD(&fi->gdirty_list);
+ INIT_LIST_HEAD(&fi->gdonate_list);
init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
init_f2fs_rwsem(&fi->i_xattr_sem);
diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
index f7aaf8d23e20..cd38a7c166e6 100644
--- a/include/uapi/linux/f2fs.h
+++ b/include/uapi/linux/f2fs.h
@@ -44,6 +44,8 @@
#define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
#define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
#define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
+#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
+ struct f2fs_donate_range)
/*
* should be same as XFS_IOC_GOINGDOWN.
@@ -97,4 +99,9 @@ struct f2fs_comp_option {
__u8 log_cluster_size;
};
+struct f2fs_donate_range {
+ __u64 start;
+ __u64 len;
+};
+
#endif /* _UAPI_LINUX_F2FS_H */
--
2.48.0.rc2.279.g1de40edade-goog
* Re: [f2fs-dev] [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-14 22:39 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
@ 2025-01-15 1:59 ` Chao Yu
0 siblings, 0 replies; 23+ messages in thread
From: Chao Yu @ 2025-01-15 1:59 UTC (permalink / raw)
To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel; +Cc: chao
On 1/15/25 06:39, Jaegeuk Kim via Linux-f2fs-devel wrote:
> This patch introduces an inode list to keep the page cache ranges that users
> can donate pages together.
>
> #define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> struct f2fs_donate_range)
> struct f2fs_donate_range {
> __u64 start;
> __u64 len;
> };
>
> e.g., ioctl(F2FS_IOC_DONATE_RANGE, &range);
>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
> fs/f2fs/debug.c | 3 +++
> fs/f2fs/f2fs.h | 12 ++++++++-
> fs/f2fs/file.c | 52 +++++++++++++++++++++++++++++++++++++++
> fs/f2fs/inode.c | 14 +++++++++++
> fs/f2fs/super.c | 1 +
> include/uapi/linux/f2fs.h | 7 ++++++
> 6 files changed, 88 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
> index 468828288a4a..16c2dfb4f595 100644
> --- a/fs/f2fs/debug.c
> +++ b/fs/f2fs/debug.c
> @@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
> si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
> si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
> si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
> + si->ndonate_files = sbi->donate_files;
> si->nquota_files = sbi->nquota_files;
> si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
> si->aw_cnt = atomic_read(&sbi->atomic_files);
> @@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
> si->compr_inode, si->compr_blocks);
> seq_printf(s, " - Swapfile Inode: %u\n",
> si->swapfile_inode);
> + seq_printf(s, " - Donate Inode: %u\n",
> + si->ndonate_files);
> seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
> si->orphans, si->append, si->update);
> seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4bfe162eefd3..951fbc3f94c7 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -850,6 +850,11 @@ struct f2fs_inode_info {
> #endif
> struct list_head dirty_list; /* dirty list for dirs and files */
> struct list_head gdirty_list; /* linked in global dirty list */
> +
> + /* linked in global inode list for cache donation */
> + struct list_head gdonate_list;
> + loff_t donate_start, donate_end; /* inclusive */
> +
> struct task_struct *atomic_write_task; /* store atomic write task */
> struct extent_tree *extent_tree[NR_EXTENT_CACHES];
> /* cached extent_tree entry */
> @@ -1274,6 +1279,7 @@ enum inode_type {
> DIR_INODE, /* for dirty dir inode */
> FILE_INODE, /* for dirty regular/symlink inode */
> DIRTY_META, /* for all dirtied inode metadata */
> + DONATE_INODE, /* for all inode to donate pages */
> NR_INODE_TYPE,
> };
>
> @@ -1629,6 +1635,9 @@ struct f2fs_sb_info {
> unsigned int warm_data_age_threshold;
> unsigned int last_age_weight;
>
> + /* control donate caches */
> + unsigned int donate_files;
> +
> /* basic filesystem units */
> unsigned int log_sectors_per_block; /* log2 sectors per block */
> unsigned int log_blocksize; /* log2 block size */
> @@ -3984,7 +3993,8 @@ struct f2fs_stat_info {
> unsigned long long allocated_data_blocks;
> int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
> int ndirty_data, ndirty_qdata;
> - unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
> + unsigned int ndirty_dirs, ndirty_files, ndirty_all;
> + unsigned int nquota_files, ndonate_files;
> int nats, dirty_nats, sits, dirty_sits;
> int free_nids, avail_nids, alloc_nids;
> int total_count, utilization;
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 81764b10840b..c43d64898d8b 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -2429,6 +2429,55 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
> return ret;
> }
>
> +static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
> +{
> + struct inode *inode = file_inode(filp);
> + struct mnt_idmap *idmap = file_mnt_idmap(filp);
> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> + struct f2fs_donate_range range;
> + int ret;
> +
> + if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
> + sizeof(range)))
> + return -EFAULT;
> +
> + if (!inode_owner_or_capable(idmap, inode))
> + return -EACCES;
> +
> + if (!S_ISREG(inode->i_mode))
> + return -EINVAL;
> +
> + if (unlikely((range.start + range.len) >> PAGE_SHIFT >
> + max_file_blocks(inode)))
What about the case below?
range.start = ULLONG_MAX / 2;
range.len = ULLONG_MAX / 2 + 1;
Maybe something like this instead?
if (unlikely(range.start >> PAGE_SHIFT >= max_file_blocks(inode) ||
range.len >> PAGE_SHIFT > max_file_blocks(inode) ||
(range.start + range.len) >> PAGE_SHIFT > max_file_blocks(inode)))
Thanks,
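For illustration, the wraparound concern can be sketched in userspace C. This is only a sketch under stated assumptions, not the check that the patch uses: `donate_range_valid`, `SKETCH_PAGE_SHIFT` and the `max_blocks` parameter are hypothetical stand-ins for the kernel's `PAGE_SHIFT` and `max_file_blocks(inode)`. It compares `len` against the room left after `start` instead of computing `start + len`, so the unsigned sum can never wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's PAGE_SHIFT (4 KiB pages assumed). */
#define SKETCH_PAGE_SHIFT 12

/*
 * Return true when the byte range [start, start + len) fits within
 * max_blocks pages. Assumes max_blocks is small enough that the shift
 * below cannot itself overflow.
 */
static bool donate_range_valid(uint64_t start, uint64_t len, uint64_t max_blocks)
{
	uint64_t max_bytes = max_blocks << SKETCH_PAGE_SHIFT;

	if (len == 0)		/* empty range: inclusive end would precede start */
		return false;
	if (start >= max_bytes)	/* start is already beyond the limit */
		return false;
	/* Compare against the remaining room rather than adding start + len. */
	return len <= max_bytes - start;
}
```

With this form, a pair such as start = UINT64_MAX, len = 2 is rejected, whereas a check based on (start + len) >> PAGE_SHIFT would see the sum wrap to 1 and accept it.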
> + return -EINVAL;
> +
> + ret = mnt_want_write_file(filp);
> + if (ret)
> + return ret;
> +
> + inode_lock(inode);
> +
> + if (f2fs_is_atomic_file(inode))
> + goto out;
> +
> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> + if (list_empty(&F2FS_I(inode)->gdonate_list)) {
> + list_add_tail(&F2FS_I(inode)->gdonate_list,
> + &sbi->inode_list[DONATE_INODE]);
> + sbi->donate_files++;
> + } else {
> + list_move_tail(&F2FS_I(inode)->gdonate_list,
> + &sbi->inode_list[DONATE_INODE]);
> + }
> + F2FS_I(inode)->donate_start = range.start;
> + F2FS_I(inode)->donate_end = range.start + range.len - 1;
> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> +out:
> + inode_unlock(inode);
> + mnt_drop_write_file(filp);
> + return ret;
> +}
> +
> static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
> {
> struct inode *inode = file_inode(filp);
> @@ -4458,6 +4507,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> return -EOPNOTSUPP;
> case F2FS_IOC_SHUTDOWN:
> return f2fs_ioc_shutdown(filp, arg);
> + case F2FS_IOC_DONATE_RANGE:
> + return f2fs_ioc_donate_range(filp, arg);
> case FITRIM:
> return f2fs_ioc_fitrim(filp, arg);
> case FS_IOC_SET_ENCRYPTION_POLICY:
> @@ -5209,6 +5260,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> case F2FS_IOC_RELEASE_VOLATILE_WRITE:
> case F2FS_IOC_ABORT_ATOMIC_WRITE:
> case F2FS_IOC_SHUTDOWN:
> + case F2FS_IOC_DONATE_RANGE:
> case FITRIM:
> case FS_IOC_SET_ENCRYPTION_POLICY:
> case FS_IOC_GET_ENCRYPTION_PWSALT:
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index 7de33da8b3ea..f9fc58f313f2 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
> return 0;
> }
>
> +static void f2fs_remove_donate_inode(struct inode *inode)
> +{
> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> +
> + if (list_empty(&F2FS_I(inode)->gdonate_list))
> + return;
> +
> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> + list_del_init(&F2FS_I(inode)->gdonate_list);
> + sbi->donate_files--;
> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> +}
> +
> /*
> * Called at the last iput() if i_nlink is zero
> */
> @@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
>
> f2fs_bug_on(sbi, get_dirty_pages(inode));
> f2fs_remove_dirty_inode(inode);
> + f2fs_remove_donate_inode(inode);
>
> if (!IS_DEVICE_ALIASING(inode))
> f2fs_destroy_extent_tree(inode);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index fc7d463dee15..ef639a6d82e5 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
> spin_lock_init(&fi->i_size_lock);
> INIT_LIST_HEAD(&fi->dirty_list);
> INIT_LIST_HEAD(&fi->gdirty_list);
> + INIT_LIST_HEAD(&fi->gdonate_list);
> init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
> init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
> init_f2fs_rwsem(&fi->i_xattr_sem);
> diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
> index f7aaf8d23e20..cd38a7c166e6 100644
> --- a/include/uapi/linux/f2fs.h
> +++ b/include/uapi/linux/f2fs.h
> @@ -44,6 +44,8 @@
> #define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
> #define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
> #define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
> +#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> + struct f2fs_donate_range)
>
> /*
> * should be same as XFS_IOC_GOINGDOWN.
> @@ -97,4 +99,9 @@ struct f2fs_comp_option {
> __u8 log_cluster_size;
> };
>
> +struct f2fs_donate_range {
> + __u64 start;
> + __u64 len;
> +};
> +
> #endif /* _UAPI_LINUX_F2FS_H */
* [PATCH 1/2] f2fs: register inodes which is able to donate pages
@ 2025-01-13 18:39 Jaegeuk Kim
2025-01-14 6:34 ` [f2fs-dev] " Chao Yu
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-13 18:39 UTC (permalink / raw)
To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim
This patch introduces an inode list that tracks the page cache ranges whose
pages users can donate together.
#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
struct f2fs_donate_range)
struct f2fs_donate_range {
__u64 start;
__u64 len;
};
e.g., ioctl(fd, F2FS_IOC_DONATE_RANGE, &range);
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
fs/f2fs/debug.c | 3 +++
fs/f2fs/f2fs.h | 9 +++++++-
fs/f2fs/file.c | 48 +++++++++++++++++++++++++++++++++++++++
fs/f2fs/inode.c | 14 ++++++++++++
fs/f2fs/super.c | 1 +
include/uapi/linux/f2fs.h | 7 ++++++
6 files changed, 81 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 468828288a4a..1b099c123670 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
+ si->ndonate_files = sbi->ndirty_inode[DONATE_INODE];
si->nquota_files = sbi->nquota_files;
si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
si->aw_cnt = atomic_read(&sbi->atomic_files);
@@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
si->compr_inode, si->compr_blocks);
seq_printf(s, " - Swapfile Inode: %u\n",
si->swapfile_inode);
+ seq_printf(s, " - Donate Inode: %d\n",
+ si->ndonate_files);
seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
si->orphans, si->append, si->update);
seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4bfe162eefd3..7ce3e3eab17a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -850,6 +850,11 @@ struct f2fs_inode_info {
#endif
struct list_head dirty_list; /* dirty list for dirs and files */
struct list_head gdirty_list; /* linked in global dirty list */
+
+ /* linked in global inode list for cache donation */
+ struct list_head gdonate_list;
+ loff_t donate_start, donate_end; /* inclusive */
+
struct task_struct *atomic_write_task; /* store atomic write task */
struct extent_tree *extent_tree[NR_EXTENT_CACHES];
/* cached extent_tree entry */
@@ -1274,6 +1279,7 @@ enum inode_type {
DIR_INODE, /* for dirty dir inode */
FILE_INODE, /* for dirty regular/symlink inode */
DIRTY_META, /* for all dirtied inode metadata */
+ DONATE_INODE, /* for all inode to donate pages */
NR_INODE_TYPE,
};
@@ -3984,7 +3990,8 @@ struct f2fs_stat_info {
unsigned long long allocated_data_blocks;
int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
int ndirty_data, ndirty_qdata;
- unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
+ unsigned int ndirty_dirs, ndirty_files, ndirty_all;
+ unsigned int nquota_files, ndonate_files;
int nats, dirty_nats, sits, dirty_sits;
int free_nids, avail_nids, alloc_nids;
int total_count, utilization;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 9980d17ef9f5..d6dea6258c2d 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2493,6 +2493,51 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
return ret;
}
+static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
+{
+ struct inode *inode = file_inode(filp);
+ struct mnt_idmap *idmap = file_mnt_idmap(filp);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_donate_range range;
+ int ret;
+
+ if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
+ sizeof(range)))
+ return -EFAULT;
+
+ if (!inode_owner_or_capable(idmap, inode))
+ return -EACCES;
+
+ if (!S_ISREG(inode->i_mode))
+ return -EINVAL;
+
+ ret = mnt_want_write_file(filp);
+ if (ret)
+ return ret;
+
+ inode_lock(inode);
+
+ if (f2fs_is_atomic_file(inode))
+ goto out;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ if (list_empty(&F2FS_I(inode)->gdonate_list)) {
+ list_add_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ stat_inc_dirty_inode(sbi, DONATE_INODE);
+ } else {
+ list_move_tail(&F2FS_I(inode)->gdonate_list,
+ &sbi->inode_list[DONATE_INODE]);
+ }
+ F2FS_I(inode)->donate_start = range.start;
+ F2FS_I(inode)->donate_end = range.start + range.len - 1;
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+out:
+ inode_unlock(inode);
+ mnt_drop_write_file(filp);
+ return ret;
+}
+
static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
{
struct inode *inode = file_inode(filp);
@@ -4522,6 +4567,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return -EOPNOTSUPP;
case F2FS_IOC_SHUTDOWN:
return f2fs_ioc_shutdown(filp, arg);
+ case F2FS_IOC_DONATE_RANGE:
+ return f2fs_ioc_donate_range(filp, arg);
case FITRIM:
return f2fs_ioc_fitrim(filp, arg);
case FS_IOC_SET_ENCRYPTION_POLICY:
@@ -5273,6 +5320,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case F2FS_IOC_RELEASE_VOLATILE_WRITE:
case F2FS_IOC_ABORT_ATOMIC_WRITE:
case F2FS_IOC_SHUTDOWN:
+ case F2FS_IOC_DONATE_RANGE:
case FITRIM:
case FS_IOC_SET_ENCRYPTION_POLICY:
case FS_IOC_GET_ENCRYPTION_PWSALT:
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 7de33da8b3ea..e38dc5fe2f2e 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
return 0;
}
+static void f2fs_remove_donate_inode(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+ if (list_empty(&F2FS_I(inode)->gdonate_list))
+ return;
+
+ spin_lock(&sbi->inode_lock[DONATE_INODE]);
+ list_del_init(&F2FS_I(inode)->gdonate_list);
+ stat_dec_dirty_inode(sbi, DONATE_INODE);
+ spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+}
+
/*
* Called at the last iput() if i_nlink is zero
*/
@@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
f2fs_bug_on(sbi, get_dirty_pages(inode));
f2fs_remove_dirty_inode(inode);
+ f2fs_remove_donate_inode(inode);
if (!IS_DEVICE_ALIASING(inode))
f2fs_destroy_extent_tree(inode);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index fc7d463dee15..ef639a6d82e5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
spin_lock_init(&fi->i_size_lock);
INIT_LIST_HEAD(&fi->dirty_list);
INIT_LIST_HEAD(&fi->gdirty_list);
+ INIT_LIST_HEAD(&fi->gdonate_list);
init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
init_f2fs_rwsem(&fi->i_xattr_sem);
diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
index f7aaf8d23e20..cd38a7c166e6 100644
--- a/include/uapi/linux/f2fs.h
+++ b/include/uapi/linux/f2fs.h
@@ -44,6 +44,8 @@
#define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
#define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
#define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
+#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
+ struct f2fs_donate_range)
/*
* should be same as XFS_IOC_GOINGDOWN.
@@ -97,4 +99,9 @@ struct f2fs_comp_option {
__u8 log_cluster_size;
};
+struct f2fs_donate_range {
+ __u64 start;
+ __u64 len;
+};
+
#endif /* _UAPI_LINUX_F2FS_H */
--
2.47.1.688.g23fc6f90ad-goog
* Re: [f2fs-dev] [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-13 18:39 Jaegeuk Kim
@ 2025-01-14 6:34 ` Chao Yu
2025-01-14 17:15 ` Jaegeuk Kim
0 siblings, 1 reply; 23+ messages in thread
From: Chao Yu @ 2025-01-14 6:34 UTC (permalink / raw)
To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel; +Cc: chao
On 1/14/25 02:39, Jaegeuk Kim via Linux-f2fs-devel wrote:
> This patch introduces an inode list to keep the page cache ranges that users
> can donate pages together.
>
> #define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> struct f2fs_donate_range)
> struct f2fs_donate_range {
> __u64 start;
> __u64 len;
> };
>
> e.g., ioctl(F2FS_IOC_DONATE_RANGE, &range);
I guess we need to add documentation for all ioctls including this one, maybe
later? :)
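Until such documentation exists, a userspace call might look like the sketch below. It is an assumption-laden illustration rather than official usage: the `sketch_`/`SKETCH_` names are hypothetical local copies of the definitions proposed in this patch (for systems whose <linux/f2fs.h> does not carry them yet), 0xf5 is taken to be F2FS_IOCTL_MAGIC from the uapi header, and `start`/`len` are assumed to be byte offsets, as the handler's loff_t fields suggest.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Hypothetical local mirror of struct f2fs_donate_range from the patch. */
struct sketch_donate_range {
	uint64_t start;	/* assumed: byte offset where the range begins */
	uint64_t len;	/* assumed: length of the range in bytes */
};

/* Assumed to match F2FS_IOCTL_MAGIC and command number 27 from the patch. */
#define SKETCH_F2FS_IOCTL_MAGIC	0xf5
#define SKETCH_F2FS_IOC_DONATE_RANGE \
	_IOW(SKETCH_F2FS_IOCTL_MAGIC, 27, struct sketch_donate_range)

/*
 * Register [start, start + len) of an already-open regular file for
 * donation. Returns 0 on success, or -1 with errno set on failure.
 */
static int sketch_donate_range(int fd, uint64_t start, uint64_t len)
{
	struct sketch_donate_range range = { .start = start, .len = len };

	return ioctl(fd, SKETCH_F2FS_IOC_DONATE_RANGE, &range);
}
```

Going by the handler in the patch, the caller has to own the file or be suitably capable (otherwise -EACCES), the target has to be a regular file (otherwise -EINVAL), and the mount has to be writable.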
>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
> fs/f2fs/debug.c | 3 +++
> fs/f2fs/f2fs.h | 9 +++++++-
> fs/f2fs/file.c | 48 +++++++++++++++++++++++++++++++++++++++
> fs/f2fs/inode.c | 14 ++++++++++++
> fs/f2fs/super.c | 1 +
> include/uapi/linux/f2fs.h | 7 ++++++
> 6 files changed, 81 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
> index 468828288a4a..1b099c123670 100644
> --- a/fs/f2fs/debug.c
> +++ b/fs/f2fs/debug.c
> @@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
> si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
> si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
> si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
> + si->ndonate_files = sbi->ndirty_inode[DONATE_INODE];
> si->nquota_files = sbi->nquota_files;
> si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
> si->aw_cnt = atomic_read(&sbi->atomic_files);
> @@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
> si->compr_inode, si->compr_blocks);
> seq_printf(s, " - Swapfile Inode: %u\n",
> si->swapfile_inode);
> + seq_printf(s, " - Donate Inode: %d\n",
Please use %u instead of %d, since si->ndonate_files is an unsigned int.
> + si->ndonate_files);
> seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
> si->orphans, si->append, si->update);
> seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4bfe162eefd3..7ce3e3eab17a 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -850,6 +850,11 @@ struct f2fs_inode_info {
> #endif
> struct list_head dirty_list; /* dirty list for dirs and files */
> struct list_head gdirty_list; /* linked in global dirty list */
> +
> + /* linked in global inode list for cache donation */
> + struct list_head gdonate_list;
> + loff_t donate_start, donate_end; /* inclusive */
> +
> struct task_struct *atomic_write_task; /* store atomic write task */
> struct extent_tree *extent_tree[NR_EXTENT_CACHES];
> /* cached extent_tree entry */
> @@ -1274,6 +1279,7 @@ enum inode_type {
> DIR_INODE, /* for dirty dir inode */
> FILE_INODE, /* for dirty regular/symlink inode */
> DIRTY_META, /* for all dirtied inode metadata */
> + DONATE_INODE, /* for all inode to donate pages */
> NR_INODE_TYPE,
> };
>
> @@ -3984,7 +3990,8 @@ struct f2fs_stat_info {
> unsigned long long allocated_data_blocks;
> int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
> int ndirty_data, ndirty_qdata;
> - unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
> + unsigned int ndirty_dirs, ndirty_files, ndirty_all;
> + unsigned int nquota_files, ndonate_files;
> int nats, dirty_nats, sits, dirty_sits;
> int free_nids, avail_nids, alloc_nids;
> int total_count, utilization;
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 9980d17ef9f5..d6dea6258c2d 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -2493,6 +2493,51 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
> return ret;
> }
>
> +static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
> +{
> + struct inode *inode = file_inode(filp);
> + struct mnt_idmap *idmap = file_mnt_idmap(filp);
> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> + struct f2fs_donate_range range;
> + int ret;
> +
> + if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
> + sizeof(range)))
> + return -EFAULT;
What about doing a sanity check on the donate range here, in order to avoid an
overflow during the fi->donate_end calculation?
F2FS_I(inode)->donate_end = range.start + range.len - 1;
> +
> + if (!inode_owner_or_capable(idmap, inode))
> + return -EACCES;
> +
> + if (!S_ISREG(inode->i_mode))
> + return -EINVAL;
> +
> + ret = mnt_want_write_file(filp);
> + if (ret)
> + return ret;
> +
> + inode_lock(inode);
> +
> + if (f2fs_is_atomic_file(inode))
> + goto out;
> +
> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> + if (list_empty(&F2FS_I(inode)->gdonate_list)) {
> + list_add_tail(&F2FS_I(inode)->gdonate_list,
> + &sbi->inode_list[DONATE_INODE]);
> + stat_inc_dirty_inode(sbi, DONATE_INODE);
> + } else {
> + list_move_tail(&F2FS_I(inode)->gdonate_list,
> + &sbi->inode_list[DONATE_INODE]);
> + }
> + F2FS_I(inode)->donate_start = range.start;
> + F2FS_I(inode)->donate_end = range.start + range.len - 1;
> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> +out:
> + inode_unlock(inode);
> + mnt_drop_write_file(filp);
> + return ret;
> +}
> +
> static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
> {
> struct inode *inode = file_inode(filp);
> @@ -4522,6 +4567,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> return -EOPNOTSUPP;
> case F2FS_IOC_SHUTDOWN:
> return f2fs_ioc_shutdown(filp, arg);
> + case F2FS_IOC_DONATE_RANGE:
> + return f2fs_ioc_donate_range(filp, arg);
> case FITRIM:
> return f2fs_ioc_fitrim(filp, arg);
> case FS_IOC_SET_ENCRYPTION_POLICY:
> @@ -5273,6 +5320,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> case F2FS_IOC_RELEASE_VOLATILE_WRITE:
> case F2FS_IOC_ABORT_ATOMIC_WRITE:
> case F2FS_IOC_SHUTDOWN:
> + case F2FS_IOC_DONATE_RANGE:
> case FITRIM:
> case FS_IOC_SET_ENCRYPTION_POLICY:
> case FS_IOC_GET_ENCRYPTION_PWSALT:
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index 7de33da8b3ea..e38dc5fe2f2e 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
> return 0;
> }
>
> +static void f2fs_remove_donate_inode(struct inode *inode)
> +{
> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> +
> + if (list_empty(&F2FS_I(inode)->gdonate_list))
Would it be safer to check gdonate_list while holding inode_lock[DONATE_INODE]?
Thanks,
> + return;
> +
> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> + list_del_init(&F2FS_I(inode)->gdonate_list);
> + stat_dec_dirty_inode(sbi, DONATE_INODE);
> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> +}
> +
> /*
> * Called at the last iput() if i_nlink is zero
> */
> @@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
>
> f2fs_bug_on(sbi, get_dirty_pages(inode));
> f2fs_remove_dirty_inode(inode);
> + f2fs_remove_donate_inode(inode);
>
> if (!IS_DEVICE_ALIASING(inode))
> f2fs_destroy_extent_tree(inode);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index fc7d463dee15..ef639a6d82e5 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
> spin_lock_init(&fi->i_size_lock);
> INIT_LIST_HEAD(&fi->dirty_list);
> INIT_LIST_HEAD(&fi->gdirty_list);
> + INIT_LIST_HEAD(&fi->gdonate_list);
> init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
> init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
> init_f2fs_rwsem(&fi->i_xattr_sem);
> diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
> index f7aaf8d23e20..cd38a7c166e6 100644
> --- a/include/uapi/linux/f2fs.h
> +++ b/include/uapi/linux/f2fs.h
> @@ -44,6 +44,8 @@
> #define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
> #define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
> #define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
> +#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> + struct f2fs_donate_range)
>
> /*
> * should be same as XFS_IOC_GOINGDOWN.
> @@ -97,4 +99,9 @@ struct f2fs_comp_option {
> __u8 log_cluster_size;
> };
>
> +struct f2fs_donate_range {
> + __u64 start;
> + __u64 len;
> +};
> +
> #endif /* _UAPI_LINUX_F2FS_H */
* Re: [f2fs-dev] [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-14 6:34 ` [f2fs-dev] " Chao Yu
@ 2025-01-14 17:15 ` Jaegeuk Kim
2025-01-15 2:12 ` Chao Yu
0 siblings, 1 reply; 23+ messages in thread
From: Jaegeuk Kim @ 2025-01-14 17:15 UTC (permalink / raw)
To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel
On 01/14, Chao Yu wrote:
> On 1/14/25 02:39, Jaegeuk Kim via Linux-f2fs-devel wrote:
> > This patch introduces an inode list to keep the page cache ranges that users
> > can donate pages together.
> >
> > #define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> > struct f2fs_donate_range)
> > struct f2fs_donate_range {
> > __u64 start;
> > __u64 len;
> > };
> >
> > e.g., ioctl(F2FS_IOC_DONATE_RANGE, &range);
>
> I guess we need to add documentation for all ioctls including this one, maybe
> later? :)
Yeah, later.
>
> >
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> > fs/f2fs/debug.c | 3 +++
> > fs/f2fs/f2fs.h | 9 +++++++-
> > fs/f2fs/file.c | 48 +++++++++++++++++++++++++++++++++++++++
> > fs/f2fs/inode.c | 14 ++++++++++++
> > fs/f2fs/super.c | 1 +
> > include/uapi/linux/f2fs.h | 7 ++++++
> > 6 files changed, 81 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
> > index 468828288a4a..1b099c123670 100644
> > --- a/fs/f2fs/debug.c
> > +++ b/fs/f2fs/debug.c
> > @@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
> > si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
> > si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
> > si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
> > + si->ndonate_files = sbi->ndirty_inode[DONATE_INODE];
> > si->nquota_files = sbi->nquota_files;
> > si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
> > si->aw_cnt = atomic_read(&sbi->atomic_files);
> > @@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
> > si->compr_inode, si->compr_blocks);
> > seq_printf(s, " - Swapfile Inode: %u\n",
> > si->swapfile_inode);
> > + seq_printf(s, " - Donate Inode: %d\n",
>
> %u instead of %d due to si->ndonate_files is type of unsigned int.
>
> > + si->ndonate_files);
> > seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
> > si->orphans, si->append, si->update);
> > seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 4bfe162eefd3..7ce3e3eab17a 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -850,6 +850,11 @@ struct f2fs_inode_info {
> > #endif
> > struct list_head dirty_list; /* dirty list for dirs and files */
> > struct list_head gdirty_list; /* linked in global dirty list */
> > +
> > + /* linked in global inode list for cache donation */
> > + struct list_head gdonate_list;
> > + loff_t donate_start, donate_end; /* inclusive */
> > +
> > struct task_struct *atomic_write_task; /* store atomic write task */
> > struct extent_tree *extent_tree[NR_EXTENT_CACHES];
> > /* cached extent_tree entry */
> > @@ -1274,6 +1279,7 @@ enum inode_type {
> > DIR_INODE, /* for dirty dir inode */
> > FILE_INODE, /* for dirty regular/symlink inode */
> > DIRTY_META, /* for all dirtied inode metadata */
> > + DONATE_INODE, /* for all inode to donate pages */
> > NR_INODE_TYPE,
> > };
> > @@ -3984,7 +3990,8 @@ struct f2fs_stat_info {
> > unsigned long long allocated_data_blocks;
> > int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
> > int ndirty_data, ndirty_qdata;
> > - unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
> > + unsigned int ndirty_dirs, ndirty_files, ndirty_all;
> > + unsigned int nquota_files, ndonate_files;
> > int nats, dirty_nats, sits, dirty_sits;
> > int free_nids, avail_nids, alloc_nids;
> > int total_count, utilization;
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 9980d17ef9f5..d6dea6258c2d 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -2493,6 +2493,51 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
> > return ret;
> > }
> > +static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
> > +{
> > + struct inode *inode = file_inode(filp);
> > + struct mnt_idmap *idmap = file_mnt_idmap(filp);
> > + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > + struct f2fs_donate_range range;
> > + int ret;
> > +
> > + if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
> > + sizeof(range)))
> > + return -EFAULT;
>
> What about doing sanity check on donate range here? in order to avoid overflow
> during fi->donate_end calculation.
>
> F2FS_I(inode)->donate_end = range.start + range.len - 1;
>
> > +
> > + if (!inode_owner_or_capable(idmap, inode))
> > + return -EACCES;
> > +
> > + if (!S_ISREG(inode->i_mode))
> > + return -EINVAL;
> > +
> > + ret = mnt_want_write_file(filp);
> > + if (ret)
> > + return ret;
> > +
> > + inode_lock(inode);
> > +
> > + if (f2fs_is_atomic_file(inode))
> > + goto out;
> > +
> > + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> > + if (list_empty(&F2FS_I(inode)->gdonate_list)) {
> > + list_add_tail(&F2FS_I(inode)->gdonate_list,
> > + &sbi->inode_list[DONATE_INODE]);
> > + stat_inc_dirty_inode(sbi, DONATE_INODE);
> > + } else {
> > + list_move_tail(&F2FS_I(inode)->gdonate_list,
> > + &sbi->inode_list[DONATE_INODE]);
> > + }
> > + F2FS_I(inode)->donate_start = range.start;
> > + F2FS_I(inode)->donate_end = range.start + range.len - 1;
> > + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> > +out:
> > + inode_unlock(inode);
> > + mnt_drop_write_file(filp);
> > + return ret;
> > +}
> > +
> > static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
> > {
> > struct inode *inode = file_inode(filp);
> > @@ -4522,6 +4567,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> > return -EOPNOTSUPP;
> > case F2FS_IOC_SHUTDOWN:
> > return f2fs_ioc_shutdown(filp, arg);
> > + case F2FS_IOC_DONATE_RANGE:
> > + return f2fs_ioc_donate_range(filp, arg);
> > case FITRIM:
> > return f2fs_ioc_fitrim(filp, arg);
> > case FS_IOC_SET_ENCRYPTION_POLICY:
> > @@ -5273,6 +5320,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > case F2FS_IOC_RELEASE_VOLATILE_WRITE:
> > case F2FS_IOC_ABORT_ATOMIC_WRITE:
> > case F2FS_IOC_SHUTDOWN:
> > + case F2FS_IOC_DONATE_RANGE:
> > case FITRIM:
> > case FS_IOC_SET_ENCRYPTION_POLICY:
> > case FS_IOC_GET_ENCRYPTION_PWSALT:
> > diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> > index 7de33da8b3ea..e38dc5fe2f2e 100644
> > --- a/fs/f2fs/inode.c
> > +++ b/fs/f2fs/inode.c
> > @@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
> > return 0;
> > }
> > +static void f2fs_remove_donate_inode(struct inode *inode)
> > +{
> > + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > +
> > + if (list_empty(&F2FS_I(inode)->gdonate_list))
>
> It will be more safe to access gdonate_list w/ inode_lock[DONATE_INODE]?
It's unnecessary as this is called from evict_inode.
>
> Thanks,
>
> > + return;
> > +
> > + spin_lock(&sbi->inode_lock[DONATE_INODE]);
> > + list_del_init(&F2FS_I(inode)->gdonate_list);
> > + stat_dec_dirty_inode(sbi, DONATE_INODE);
> > + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
> > +}
> > +
> > /*
> > * Called at the last iput() if i_nlink is zero
> > */
> > @@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
> > f2fs_bug_on(sbi, get_dirty_pages(inode));
> > f2fs_remove_dirty_inode(inode);
> > + f2fs_remove_donate_inode(inode);
> > if (!IS_DEVICE_ALIASING(inode))
> > f2fs_destroy_extent_tree(inode);
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index fc7d463dee15..ef639a6d82e5 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
> > spin_lock_init(&fi->i_size_lock);
> > INIT_LIST_HEAD(&fi->dirty_list);
> > INIT_LIST_HEAD(&fi->gdirty_list);
> > + INIT_LIST_HEAD(&fi->gdonate_list);
> > init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
> > init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
> > init_f2fs_rwsem(&fi->i_xattr_sem);
> > diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
> > index f7aaf8d23e20..cd38a7c166e6 100644
> > --- a/include/uapi/linux/f2fs.h
> > +++ b/include/uapi/linux/f2fs.h
> > @@ -44,6 +44,8 @@
> > #define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
> > #define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
> > #define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
> > +#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
> > + struct f2fs_donate_range)
> > /*
> > * should be same as XFS_IOC_GOINGDOWN.
> > @@ -97,4 +99,9 @@ struct f2fs_comp_option {
> > __u8 log_cluster_size;
> > };
> > +struct f2fs_donate_range {
> > + __u64 start;
> > + __u64 len;
> > +};
> > +
> > #endif /* _UAPI_LINUX_F2FS_H */
* Re: [f2fs-dev] [PATCH 1/2] f2fs: register inodes which is able to donate pages
2025-01-14 17:15 ` Jaegeuk Kim
@ 2025-01-15 2:12 ` Chao Yu
0 siblings, 0 replies; 23+ messages in thread
From: Chao Yu @ 2025-01-15 2:12 UTC (permalink / raw)
To: Jaegeuk Kim; +Cc: chao, linux-kernel, linux-f2fs-devel
On 1/15/25 01:15, Jaegeuk Kim wrote:
> On 01/14, Chao Yu wrote:
>> On 1/14/25 02:39, Jaegeuk Kim via Linux-f2fs-devel wrote:
>>> This patch introduces an inode list to keep the page cache ranges that users
>>> can donate pages together.
>>>
>>> #define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
>>> struct f2fs_donate_range)
>>> struct f2fs_donate_range {
>>> __u64 start;
>>> __u64 len;
>>> };
>>>
>>> e.g., ioctl(F2FS_IOC_DONATE_RANGE, &range);
>>
>> I guess we need to add documentation for all ioctls including this one, maybe
>> later? :)
>
> Yeah, later.
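Until that documentation exists, here is a minimal userspace sketch of how the ioctl quoted above might be called. The struct and command number are copied from the patch, and F2FS_IOCTL_MAGIC is 0xf5 as in include/uapi/linux/f2fs.h. The wrapper donate_file_range() is a hypothetical name used only for illustration, and opening the file read-only is an assumption based on the handler checking inode ownership and mount write access rather than the file's open mode.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Definitions mirrored from the patch (include/uapi/linux/f2fs.h). */
#define F2FS_IOCTL_MAGIC	0xf5

struct f2fs_donate_range {
	uint64_t start;	/* byte offset of the first donated byte */
	uint64_t len;	/* length of the donated range in bytes */
};

#define F2FS_IOC_DONATE_RANGE	_IOW(F2FS_IOCTL_MAGIC, 27, \
					struct f2fs_donate_range)

/*
 * Hypothetical wrapper, not part of the patch: register
 * [start, start + len - 1] of the file at 'path' for donation.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int donate_file_range(const char *path, uint64_t start, uint64_t len)
{
	struct f2fs_donate_range range = { .start = start, .len = len };
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Fails with ENOTTY if the file is not on a kernel with this ioctl. */
	ret = ioctl(fd, F2FS_IOC_DONATE_RANGE, &range);
	close(fd);
	return ret < 0 ? -1 : 0;
}
```

A caller would typically register each file once, for example donate_file_range("/data/app/cache.bin", 0, 1 << 20), where the path is made up for the example.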
>
>>
>>>
>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
>>> ---
>>> fs/f2fs/debug.c | 3 +++
>>> fs/f2fs/f2fs.h | 9 +++++++-
>>> fs/f2fs/file.c | 48 +++++++++++++++++++++++++++++++++++++++
>>> fs/f2fs/inode.c | 14 ++++++++++++
>>> fs/f2fs/super.c | 1 +
>>> include/uapi/linux/f2fs.h | 7 ++++++
>>> 6 files changed, 81 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
>>> index 468828288a4a..1b099c123670 100644
>>> --- a/fs/f2fs/debug.c
>>> +++ b/fs/f2fs/debug.c
>>> @@ -164,6 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
>>> si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
>>> si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
>>> si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
>>> + si->ndonate_files = sbi->ndirty_inode[DONATE_INODE];
>>> si->nquota_files = sbi->nquota_files;
>>> si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
>>> si->aw_cnt = atomic_read(&sbi->atomic_files);
>>> @@ -501,6 +502,8 @@ static int stat_show(struct seq_file *s, void *v)
>>> si->compr_inode, si->compr_blocks);
>>> seq_printf(s, " - Swapfile Inode: %u\n",
>>> si->swapfile_inode);
>>> + seq_printf(s, " - Donate Inode: %d\n",
>>
>> %u instead of %d due to si->ndonate_files is type of unsigned int.
>>
>>> + si->ndonate_files);
>>> seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
>>> si->orphans, si->append, si->update);
>>> seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 4bfe162eefd3..7ce3e3eab17a 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -850,6 +850,11 @@ struct f2fs_inode_info {
>>> #endif
>>> struct list_head dirty_list; /* dirty list for dirs and files */
>>> struct list_head gdirty_list; /* linked in global dirty list */
>>> +
>>> + /* linked in global inode list for cache donation */
>>> + struct list_head gdonate_list;
>>> + loff_t donate_start, donate_end; /* inclusive */
>>> +
>>> struct task_struct *atomic_write_task; /* store atomic write task */
>>> struct extent_tree *extent_tree[NR_EXTENT_CACHES];
>>> /* cached extent_tree entry */
>>> @@ -1274,6 +1279,7 @@ enum inode_type {
>>> DIR_INODE, /* for dirty dir inode */
>>> FILE_INODE, /* for dirty regular/symlink inode */
>>> DIRTY_META, /* for all dirtied inode metadata */
>>> + DONATE_INODE, /* for all inode to donate pages */
>>> NR_INODE_TYPE,
>>> };
>>> @@ -3984,7 +3990,8 @@ struct f2fs_stat_info {
>>> unsigned long long allocated_data_blocks;
>>> int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
>>> int ndirty_data, ndirty_qdata;
>>> - unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
>>> + unsigned int ndirty_dirs, ndirty_files, ndirty_all;
>>> + unsigned int nquota_files, ndonate_files;
>>> int nats, dirty_nats, sits, dirty_sits;
>>> int free_nids, avail_nids, alloc_nids;
>>> int total_count, utilization;
>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>> index 9980d17ef9f5..d6dea6258c2d 100644
>>> --- a/fs/f2fs/file.c
>>> +++ b/fs/f2fs/file.c
>>> @@ -2493,6 +2493,51 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
>>> return ret;
>>> }
>>> +static int f2fs_ioc_donate_range(struct file *filp, unsigned long arg)
>>> +{
>>> + struct inode *inode = file_inode(filp);
>>> + struct mnt_idmap *idmap = file_mnt_idmap(filp);
>>> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>>> + struct f2fs_donate_range range;
>>> + int ret;
>>> +
>>> + if (copy_from_user(&range, (struct f2fs_donate_range __user *)arg,
>>> + sizeof(range)))
>>> + return -EFAULT;
>>
>> What about doing sanity check on donate range here? in order to avoid overflow
>> during fi->donate_end calculation.
>>
>> F2FS_I(inode)->donate_end = range.start + range.len - 1;
>>
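To make the overflow concern concrete, here is a minimal standalone sketch of such a sanity check. It is not code from the patch: donate_range_end() is a hypothetical helper, the struct is a userspace mirror of the proposed uapi struct, and rejecting a zero-length range is an assumption about the desired policy. The check ensures that start + len - 1 neither wraps in 64 bits nor exceeds what a signed 64-bit loff_t can hold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace mirror of the uapi struct proposed in the patch. */
struct f2fs_donate_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Hypothetical helper, not in the patch. On success, stores the
 * inclusive end offset (start + len - 1) in *end and returns true.
 * Returns false for an empty range or one whose end would not fit
 * in a signed 64-bit offset.
 */
static bool donate_range_end(const struct f2fs_donate_range *r, int64_t *end)
{
	if (r->len == 0)
		return false;
	if (r->start > (uint64_t)INT64_MAX)
		return false;
	/* Compare against the remaining headroom instead of adding first. */
	if (r->len - 1 > (uint64_t)INT64_MAX - r->start)
		return false;

	*end = (int64_t)(r->start + r->len - 1);
	return true;
}
```

In the ioctl handler, the equivalent check would sit right after copy_from_user() and return -EINVAL on failure; that error code is also an assumption.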
>>> +
>>> + if (!inode_owner_or_capable(idmap, inode))
>>> + return -EACCES;
>>> +
>>> + if (!S_ISREG(inode->i_mode))
>>> + return -EINVAL;
>>> +
>>> + ret = mnt_want_write_file(filp);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + inode_lock(inode);
>>> +
>>> + if (f2fs_is_atomic_file(inode))
>>> + goto out;
>>> +
>>> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
>>> + if (list_empty(&F2FS_I(inode)->gdonate_list)) {
>>> + list_add_tail(&F2FS_I(inode)->gdonate_list,
>>> + &sbi->inode_list[DONATE_INODE]);
>>> + stat_inc_dirty_inode(sbi, DONATE_INODE);
>>> + } else {
>>> + list_move_tail(&F2FS_I(inode)->gdonate_list,
>>> + &sbi->inode_list[DONATE_INODE]);
>>> + }
>>> + F2FS_I(inode)->donate_start = range.start;
>>> + F2FS_I(inode)->donate_end = range.start + range.len - 1;
>>> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
>>> +out:
>>> + inode_unlock(inode);
>>> + mnt_drop_write_file(filp);
>>> + return ret;
>>> +}
>>> +
>>> static int f2fs_ioc_fitrim(struct file *filp, unsigned long arg)
>>> {
>>> struct inode *inode = file_inode(filp);
>>> @@ -4522,6 +4567,8 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>>> return -EOPNOTSUPP;
>>> case F2FS_IOC_SHUTDOWN:
>>> return f2fs_ioc_shutdown(filp, arg);
>>> + case F2FS_IOC_DONATE_RANGE:
>>> + return f2fs_ioc_donate_range(filp, arg);
>>> case FITRIM:
>>> return f2fs_ioc_fitrim(filp, arg);
>>> case FS_IOC_SET_ENCRYPTION_POLICY:
>>> @@ -5273,6 +5320,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>>> case F2FS_IOC_RELEASE_VOLATILE_WRITE:
>>> case F2FS_IOC_ABORT_ATOMIC_WRITE:
>>> case F2FS_IOC_SHUTDOWN:
>>> + case F2FS_IOC_DONATE_RANGE:
>>> case FITRIM:
>>> case FS_IOC_SET_ENCRYPTION_POLICY:
>>> case FS_IOC_GET_ENCRYPTION_PWSALT:
>>> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
>>> index 7de33da8b3ea..e38dc5fe2f2e 100644
>>> --- a/fs/f2fs/inode.c
>>> +++ b/fs/f2fs/inode.c
>>> @@ -804,6 +804,19 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
>>> return 0;
>>> }
>>> +static void f2fs_remove_donate_inode(struct inode *inode)
>>> +{
>>> + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>>> +
>>> + if (list_empty(&F2FS_I(inode)->gdonate_list))
>>
>> It will be more safe to access gdonate_list w/ inode_lock[DONATE_INODE]?
>
> It's unnecessary as this is called from evict_inode.
I was just concerned about the case where fi->gdonate_list's prev and next
pointers could be updated racily by the insertion or deletion of an adjacent
entry. I checked, and there is no risk for now. :)
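For reference, the more conservative pattern raised in the question above (doing the emptiness check under the lock) can be sketched in userspace as follows. This is only an illustration with a pthread mutex and a hand-rolled list; struct donate_set, donate_add() and donate_remove() are hypothetical names, and this is not what the patch does.

```c
#include <pthread.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *prev, *next;
};

static void list_init(struct list_head *h) { h->prev = h->next = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical stand-in for sbi->inode_list[] / inode_lock[] / ndirty_inode[]. */
struct donate_set {
	pthread_mutex_t lock;
	struct list_head head;
	unsigned int count;
};

static void donate_add(struct donate_set *s, struct list_head *node)
{
	pthread_mutex_lock(&s->lock);
	if (list_is_empty(node)) {
		list_add_tail(node, &s->head);
		s->count++;
	}
	pthread_mutex_unlock(&s->lock);
}

/* The emptiness check happens under the lock, so neighbours cannot change. */
static void donate_remove(struct donate_set *s, struct list_head *node)
{
	pthread_mutex_lock(&s->lock);
	if (!list_is_empty(node)) {
		list_del_init(node);
		s->count--;
	}
	pthread_mutex_unlock(&s->lock);
}
```

The cost is one extra lock acquisition for inodes that were never registered, which is what the unlocked early return in the patch avoids.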
Thanks,
>
>>
>> Thanks,
>>
>>> + return;
>>> +
>>> + spin_lock(&sbi->inode_lock[DONATE_INODE]);
>>> + list_del_init(&F2FS_I(inode)->gdonate_list);
>>> + stat_dec_dirty_inode(sbi, DONATE_INODE);
>>> + spin_unlock(&sbi->inode_lock[DONATE_INODE]);
>>> +}
>>> +
>>> /*
>>> * Called at the last iput() if i_nlink is zero
>>> */
>>> @@ -838,6 +851,7 @@ void f2fs_evict_inode(struct inode *inode)
>>> f2fs_bug_on(sbi, get_dirty_pages(inode));
>>> f2fs_remove_dirty_inode(inode);
>>> + f2fs_remove_donate_inode(inode);
>>> if (!IS_DEVICE_ALIASING(inode))
>>> f2fs_destroy_extent_tree(inode);
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index fc7d463dee15..ef639a6d82e5 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -1441,6 +1441,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
>>> spin_lock_init(&fi->i_size_lock);
>>> INIT_LIST_HEAD(&fi->dirty_list);
>>> INIT_LIST_HEAD(&fi->gdirty_list);
>>> + INIT_LIST_HEAD(&fi->gdonate_list);
>>> init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
>>> init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
>>> init_f2fs_rwsem(&fi->i_xattr_sem);
>>> diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
>>> index f7aaf8d23e20..cd38a7c166e6 100644
>>> --- a/include/uapi/linux/f2fs.h
>>> +++ b/include/uapi/linux/f2fs.h
>>> @@ -44,6 +44,8 @@
>>> #define F2FS_IOC_COMPRESS_FILE _IO(F2FS_IOCTL_MAGIC, 24)
>>> #define F2FS_IOC_START_ATOMIC_REPLACE _IO(F2FS_IOCTL_MAGIC, 25)
>>> #define F2FS_IOC_GET_DEV_ALIAS_FILE _IOR(F2FS_IOCTL_MAGIC, 26, __u32)
>>> +#define F2FS_IOC_DONATE_RANGE _IOW(F2FS_IOCTL_MAGIC, 27, \
>>> + struct f2fs_donate_range)
>>> /*
>>> * should be same as XFS_IOC_GOINGDOWN.
>>> @@ -97,4 +99,9 @@ struct f2fs_comp_option {
>>> __u8 log_cluster_size;
>>> };
>>> +struct f2fs_donate_range {
>>> + __u64 start;
>>> + __u64 len;
>>> +};
>>> +
>>> #endif /* _UAPI_LINUX_F2FS_H */
Thread overview: 23+ messages
2025-01-17 16:41 [PATCH 0/2 v6] add ioctl/sysfs to donate file-backed pages Jaegeuk Kim
2025-01-17 16:41 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
2025-01-21 9:22 ` [f2fs-dev] " Chao Yu
2025-01-21 16:56 ` Jaegeuk Kim
2025-01-17 16:41 ` [PATCH 2/2] f2fs: add a sysfs entry to request donate file-backed pages Jaegeuk Kim
2025-01-17 18:05 ` [PATCH 0/2 v6] add ioctl/sysfs to " Matthew Wilcox
2025-01-17 18:48 ` Jaegeuk Kim
2025-01-17 19:04 ` Matthew Wilcox
2025-01-17 20:37 ` Jaegeuk Kim
2025-02-04 16:29 ` Jaegeuk Kim
2025-02-10 17:00 ` Jaegeuk Kim
2025-02-10 17:20 ` Matthew Wilcox
2025-02-10 19:01 ` Jaegeuk Kim
2025-02-12 0:39 ` Jaegeuk Kim
-- strict thread matches above, loose matches on Subject: below --
2025-01-22 21:10 [PATCH 0/2 v7] " Jaegeuk Kim
2025-01-22 21:10 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
2025-01-23 1:50 ` [f2fs-dev] " Chao Yu
2025-01-16 22:51 [PATCH 0/2 v5 RESEND] add ioctl/sysfs to donate file-backed pages Jaegeuk Kim
2025-01-16 22:51 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
2025-01-17 1:48 ` [f2fs-dev] " Chao Yu
2025-01-16 4:41 [PATCH 0/2 v4] add ioctl/sysfs to donate file-backed pages Jaegeuk Kim
2025-01-16 4:42 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
2025-01-16 5:31 ` [f2fs-dev] " Chao Yu
2025-01-16 17:00 ` Jaegeuk Kim
2025-01-15 22:16 [PATCH 0/2 v3] add ioctl/sysfs to donate file-backed pages Jaegeuk Kim
2025-01-15 22:16 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
2025-01-16 3:07 ` [f2fs-dev] " Chao Yu
2025-01-14 22:39 [PATCH 0/2 v2] add ioctl/sysfs to donate file-backed pages Jaegeuk Kim
2025-01-14 22:39 ` [PATCH 1/2] f2fs: register inodes which is able to donate pages Jaegeuk Kim
2025-01-15 1:59 ` [f2fs-dev] " Chao Yu
2025-01-13 18:39 Jaegeuk Kim
2025-01-14 6:34 ` [f2fs-dev] " Chao Yu
2025-01-14 17:15 ` Jaegeuk Kim
2025-01-15 2:12 ` Chao Yu