* Re: [PATCH v2] f2fs: another way to set large folio by remembering inode number [not found] ` <adhPZxtbZxgU-37v@google.com> @ 2026-04-14 8:02 ` Christoph Hellwig 2026-04-15 16:44 ` Jaegeuk Kim 0 siblings, 1 reply; 20+ messages in thread From: Christoph Hellwig @ 2026-04-14 8:02 UTC (permalink / raw) To: Jaegeuk Kim Cc: linux-kernel, linux-f2fs-devel, Akilesh Kailash, linux-fsdevel, linux-mm, linux-api Please add the relevant mailing lists when adding new user interfaces. And I'm not sure hacks working around the proper large folio implementation are something that should be merged upstream. On Fri, Apr 10, 2026 at 01:16:23AM +0000, Jaegeuk Kim wrote: > enum { > F2FS_XATTR_FADV_LARGEFOLIO, > }; > > unsigned int value = BIT(F2FS_XATTR_FADV_LARGEFOLIO); > > 1. setxattr(file, "user.fadvise", &value, sizeof(unsigned int), 0) > -> register the inode number for large folio > 2. chmod(0400, file) > -> make Read-Only > 3. fsync() && close() && open(READ) > -> f2fs_iget() with large folio > 4. open(WRITE), mkwrite on mmap, chmod(WRITE) > -> return error > 5. close() and open() > -> goto #3 > 6. unlink > -> deregister the inode number > > Suggested-by: Akilesh Kailash <akailash@google.com> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > --- > > Log from v1: > - add a condition in f2fs_drop_inode > - add Doc > > Documentation/filesystems/f2fs.rst | 41 ++++++++++++++++++++++++++---- > fs/f2fs/checkpoint.c | 2 +- > fs/f2fs/data.c | 2 +- > fs/f2fs/f2fs.h | 1 + > fs/f2fs/file.c | 11 ++++++-- > fs/f2fs/inode.c | 19 +++++++++++--- > fs/f2fs/super.c | 7 +++++ > fs/f2fs/xattr.c | 35 ++++++++++++++++++++++++- > fs/f2fs/xattr.h | 6 +++++ > 9 files changed, 111 insertions(+), 13 deletions(-) > > diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst > index 7e4031631286..de899d0d3088 100644 > --- a/Documentation/filesystems/f2fs.rst > +++ b/Documentation/filesystems/f2fs.rst > @@ -1044,11 +1044,14 @@ page allocation for significant performance gains. To minimize code complexity, > this support is currently excluded from the write path, which requires handling > complex optimizations such as compression and block allocation modes. > > -This optional feature is triggered only when a file's immutable bit is set. > -Consequently, F2FS will return EOPNOTSUPP if a user attempts to open a cached > -file with write permissions, even immediately after clearing the bit. Write > -access is only restored once the cached inode is dropped. The usage flow is > -demonstrated below: > +This optional feature is triggered by two mechanisms: the file's immutable bit > +or a specific xattr flag. In both cases, F2FS ensures data integrity by > +restricting the file to a read-only state while large folios are active. > + > +1. Immutable Bit Approach: > +Triggered when the FS_IMMUTABLE_FL is set. This is a strict enforcement > +where the file cannot be modified at all until the bit is cleared and > +the cached inode is dropped. > > .. code-block:: > > @@ -1078,3 +1081,31 @@ demonstrated below: > Written 4096 bytes with pattern = zero, total_time = 29 us, max_latency = 28 us > > # rm /data/testfile_read_seq > + > +2. XATTR fadvise Approach: > +A more flexible registration via extended attributes. > + > +.. code-block:: > + > + enum { > + F2FS_XATTR_FADV_LARGEFOLIO, > + }; > + unsigned int value = BIT(F2FS_XATTR_FADV_LARGEFOLIO); > + > + /* Registers the inode number for large folio support in the subsystem.*/ > + # setxattr(file, "user.fadvise", &value, sizeof(unsigned int), 0) > + > + /* The file must be made Read-Only to transition into the large folio path. */ > + # fchmod(0400, fd) > + > + /* clean up dirty inode state. */ > + # fsync(fd) > + > + /* Drop the inode cache. > + # close(fd) > + > + /* f2fs_iget() instantiates the inode with large folio support.*/ > + # open() > + > + /* Returns -EOPNOTSUPP or error to protect the large folio cache.*/ > + # open(WRITE), mkwrite on mmap, or chmod(WRITE) > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > index 01e1ba77263e..fdd62ddc3ed6 100644 > --- a/fs/f2fs/checkpoint.c > +++ b/fs/f2fs/checkpoint.c > @@ -778,7 +778,7 @@ void f2fs_remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) > __remove_ino_entry(sbi, ino, type); > } > > -/* mode should be APPEND_INO, UPDATE_INO or TRANS_DIR_INO */ > +/* mode should be APPEND_INO, UPDATE_INO, LARGE_FOLIO_IO, or TRANS_DIR_INO */ > bool f2fs_exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode) > { > struct inode_management *im = &sbi->im[mode]; > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c > index 965d4e6443c6..5e46230398d7 100644 > --- a/fs/f2fs/data.c > +++ b/fs/f2fs/data.c > @@ -2494,7 +2494,7 @@ static int f2fs_read_data_large_folio(struct inode *inode, > int ret = 0; > bool folio_in_bio; > > - if (!IS_IMMUTABLE(inode) || f2fs_compressed_file(inode)) { > + if (f2fs_compressed_file(inode)) { > if (folio) > folio_unlock(folio); > return -EOPNOTSUPP; > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index e40b6b2784ee..02bc6eb96a59 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -381,6 +381,7 @@ enum { > /* for the list of ino */ > enum { > ORPHAN_INO, /* for orphan ino list */ > + LARGE_FOLIO_INO, /* for large folio case */ > APPEND_INO, /* for append ino list */ > UPDATE_INO, /* for update ino list */ > TRANS_DIR_INO, /* for transactions dir ino list */ > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > index c0220cd7b332..64ba900410fc 100644 > --- a/fs/f2fs/file.c > +++ b/fs/f2fs/file.c > @@ -2068,9 +2068,16 @@ static long f2fs_fallocate(struct file *file, int mode, > > static int f2fs_release_file(struct inode *inode, struct file *filp) > { > - if (atomic_dec_and_test(&F2FS_I(inode)->open_count)) > + if (atomic_dec_and_test(&F2FS_I(inode)->open_count)) { > f2fs_remove_donate_inode(inode); > - > + /* > + * In order to get large folio as soon as possible, let's drop > + * inode cache asap. See also f2fs_drop_inode. > + */ > + if (f2fs_exist_written_data(F2FS_I_SB(inode), > + inode->i_ino, LARGE_FOLIO_INO)) > + d_drop(filp->f_path.dentry); > + } > /* > * f2fs_release_file is called at every close calls. So we should > * not drop any inmemory pages by close called by other process. > diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c > index 89240be8cc59..e100bc5a378c 100644 > --- a/fs/f2fs/inode.c > +++ b/fs/f2fs/inode.c > @@ -565,6 +565,20 @@ static bool is_meta_ino(struct f2fs_sb_info *sbi, unsigned int ino) > ino == F2FS_COMPRESS_INO(sbi); > } > > +static void f2fs_mapping_set_large_folio(struct inode *inode) > +{ > + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); > + > + if (f2fs_compressed_file(inode)) > + return; > + if (f2fs_quota_file(sbi, inode->i_ino)) > + return; > + if (IS_IMMUTABLE(inode) || > + (f2fs_exist_written_data(sbi, inode->i_ino, LARGE_FOLIO_INO) && > + !(inode->i_mode & S_IWUGO))) > + mapping_set_folio_min_order(inode->i_mapping, 0); > +} > + > struct inode *f2fs_iget(struct super_block *sb, unsigned long ino) > { > struct f2fs_sb_info *sbi = F2FS_SB(sb); > @@ -620,9 +634,7 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino) > inode->i_op = &f2fs_file_inode_operations; > inode->i_fop = &f2fs_file_operations; > inode->i_mapping->a_ops = &f2fs_dblock_aops; > - if (IS_IMMUTABLE(inode) && !f2fs_compressed_file(inode) && > - !f2fs_quota_file(sbi, inode->i_ino)) > - mapping_set_folio_min_order(inode->i_mapping, 0); > + f2fs_mapping_set_large_folio(inode); > } else if (S_ISDIR(inode->i_mode)) { > inode->i_op = &f2fs_dir_inode_operations; > inode->i_fop = &f2fs_dir_operations; > @@ -895,6 +907,7 @@ void f2fs_evict_inode(struct inode *inode) > f2fs_remove_ino_entry(sbi, inode->i_ino, APPEND_INO); > f2fs_remove_ino_entry(sbi, inode->i_ino, UPDATE_INO); > f2fs_remove_ino_entry(sbi, inode->i_ino, FLUSH_INO); > + f2fs_remove_ino_entry(sbi, inode->i_ino, LARGE_FOLIO_INO); > > if (!is_sbi_flag_set(sbi, SBI_IS_FREEZING)) { > sb_start_intwrite(inode->i_sb); > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > index ccf806b676f5..11d1e0c99ac1 100644 > --- a/fs/f2fs/super.c > +++ b/fs/f2fs/super.c > @@ -1844,6 +1844,13 @@ static int f2fs_drop_inode(struct inode *inode) > return 1; > } > } > + /* > + * In order to get large folio as soon as possible, let's drop > + * inode cache asap. See also f2fs_release_file. > + */ > + if (f2fs_exist_written_data(sbi, inode->i_ino, LARGE_FOLIO_INO) && > + !is_inode_flag_set(inode, FI_DIRTY_INODE)) > + return 1; > > /* > * This is to avoid a deadlock condition like below. > diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c > index 941dc62a6d6f..0c0e44c2dcdd 100644 > --- a/fs/f2fs/xattr.c > +++ b/fs/f2fs/xattr.c > @@ -44,6 +44,16 @@ static void xattr_free(struct f2fs_sb_info *sbi, void *xattr_addr, > kfree(xattr_addr); > } > > +static int f2fs_xattr_fadvise_get(struct inode *inode, void *buffer) > +{ > + if (!buffer) > + goto out; > + if (mapping_large_folio_support(inode->i_mapping)) > + *((unsigned int *)buffer) |= BIT(F2FS_XATTR_FADV_LARGEFOLIO); > +out: > + return sizeof(unsigned int); > +} > + > static int f2fs_xattr_generic_get(const struct xattr_handler *handler, > struct dentry *unused, struct inode *inode, > const char *name, void *buffer, size_t size) > @@ -61,10 +71,29 @@ static int f2fs_xattr_generic_get(const struct xattr_handler *handler, > default: > return -EINVAL; > } > + if (handler->flags == F2FS_XATTR_INDEX_USER && > + !strcmp(name, "fadvise")) > + return f2fs_xattr_fadvise_get(inode, buffer); > + > return f2fs_getxattr(inode, handler->flags, name, > buffer, size, NULL); > } > > +static int f2fs_xattr_fadvise_set(struct inode *inode, const void *value) > +{ > + unsigned int new_fadvise; > + > + new_fadvise = *(unsigned int *)value; > + > + if (new_fadvise & BIT(F2FS_XATTR_FADV_LARGEFOLIO)) > + f2fs_add_ino_entry(F2FS_I_SB(inode), > + inode->i_ino, LARGE_FOLIO_INO); > + else > + f2fs_remove_ino_entry(F2FS_I_SB(inode), > + inode->i_ino, LARGE_FOLIO_INO); > + return 0; > +} > + > static int f2fs_xattr_generic_set(const struct xattr_handler *handler, > struct mnt_idmap *idmap, > struct dentry *unused, struct inode *inode, > @@ -84,6 +113,10 @@ static int f2fs_xattr_generic_set(const struct xattr_handler *handler, > default: > return -EINVAL; > } > + if (handler->flags == F2FS_XATTR_INDEX_USER && > + !strcmp(name, "fadvise")) > + return f2fs_xattr_fadvise_set(inode, value); > + > return f2fs_setxattr(inode, handler->flags, name, > value, size, NULL, flags); > } > @@ -842,4 +875,4 @@ int __init f2fs_init_xattr_cache(void) > void f2fs_destroy_xattr_cache(void) > { > kmem_cache_destroy(inline_xattr_slab); > -} > \ No newline at end of file > +} > diff --git a/fs/f2fs/xattr.h b/fs/f2fs/xattr.h > index bce3d93e4755..455f460d014e 100644 > --- a/fs/f2fs/xattr.h > +++ b/fs/f2fs/xattr.h > @@ -24,6 +24,7 @@ > #define F2FS_XATTR_REFCOUNT_MAX 1024 > > /* Name indexes */ > +#define F2FS_USER_FADVISE_NAME "user.fadvise" > #define F2FS_SYSTEM_ADVISE_NAME "system.advise" > #define F2FS_XATTR_INDEX_USER 1 > #define F2FS_XATTR_INDEX_POSIX_ACL_ACCESS 2 > @@ -39,6 +40,11 @@ > #define F2FS_XATTR_NAME_ENCRYPTION_CONTEXT "c" > #define F2FS_XATTR_NAME_VERITY "v" > > +/* used for F2FS_USER_FADVISE_NAME */ > +enum { > + F2FS_XATTR_FADV_LARGEFOLIO, > +}; > + > struct f2fs_xattr_header { > __le32 h_magic; /* magic number for identification */ > __le32 h_refcount; /* reference count */ > -- > 2.53.0.1213.gd9a14994de-goog > > ---end quoted text--- ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-04-14 8:02 ` [PATCH v2] f2fs: another way to set large folio by remembering inode number Christoph Hellwig @ 2026-04-15 16:44 ` Jaegeuk Kim 2026-04-15 17:15 ` Matthew Wilcox 0 siblings, 1 reply; 20+ messages in thread From: Jaegeuk Kim @ 2026-04-15 16:44 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-kernel, linux-f2fs-devel, Akilesh Kailash, linux-fsdevel, linux-mm, linux-api On 04/14, Christoph Hellwig wrote: > Please add the relevant mailing lists when adding new user interfaces. > > And I'm not sure hacks working around the proper large folio > implementation are something that should be merged upstream. Cc'ed linux-api and linux-fsdevel onto the patch thread with a proposal that I'm not sure it's acceptable or not. > > On Fri, Apr 10, 2026 at 01:16:23AM +0000, Jaegeuk Kim wrote: > > enum { > > F2FS_XATTR_FADV_LARGEFOLIO, > > }; > > > > unsigned int value = BIT(F2FS_XATTR_FADV_LARGEFOLIO); > > > > 1. setxattr(file, "user.fadvise", &value, sizeof(unsigned int), 0) > > -> register the inode number for large folio > > 2. chmod(0400, file) > > -> make Read-Only > > 3. fsync() && close() && open(READ) > > -> f2fs_iget() with large folio > > 4. open(WRITE), mkwrite on mmap, chmod(WRITE) > > -> return error > > 5. close() and open() > > -> goto #3 > > 6. unlink > > -> deregister the inode number > > > > Suggested-by: Akilesh Kailash <akailash@google.com> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > > --- > > > > Log from v1: > > - add a condition in f2fs_drop_inode > > - add Doc > > > > Documentation/filesystems/f2fs.rst | 41 ++++++++++++++++++++++++++---- > > fs/f2fs/checkpoint.c | 2 +- > > fs/f2fs/data.c | 2 +- > > fs/f2fs/f2fs.h | 1 + > > fs/f2fs/file.c | 11 ++++++-- > > fs/f2fs/inode.c | 19 +++++++++++--- > > fs/f2fs/super.c | 7 +++++ > > fs/f2fs/xattr.c | 35 ++++++++++++++++++++++++- > > fs/f2fs/xattr.h | 6 +++++ > > 9 files changed, 111 insertions(+), 13 deletions(-) > > > > diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst > > index 7e4031631286..de899d0d3088 100644 > > --- a/Documentation/filesystems/f2fs.rst > > +++ b/Documentation/filesystems/f2fs.rst > > @@ -1044,11 +1044,14 @@ page allocation for significant performance gains. To minimize code complexity, > > this support is currently excluded from the write path, which requires handling > > complex optimizations such as compression and block allocation modes. > > > > -This optional feature is triggered only when a file's immutable bit is set. > > -Consequently, F2FS will return EOPNOTSUPP if a user attempts to open a cached > > -file with write permissions, even immediately after clearing the bit. Write > > -access is only restored once the cached inode is dropped. The usage flow is > > -demonstrated below: > > +This optional feature is triggered by two mechanisms: the file's immutable bit > > +or a specific xattr flag. In both cases, F2FS ensures data integrity by > > +restricting the file to a read-only state while large folios are active. > > + > > +1. Immutable Bit Approach: > > +Triggered when the FS_IMMUTABLE_FL is set. This is a strict enforcement > > +where the file cannot be modified at all until the bit is cleared and > > +the cached inode is dropped. > > > > .. code-block:: > > > > @@ -1078,3 +1081,31 @@ demonstrated below: > > Written 4096 bytes with pattern = zero, total_time = 29 us, max_latency = 28 us > > > > # rm /data/testfile_read_seq > > + > > +2. XATTR fadvise Approach: > > +A more flexible registration via extended attributes. > > + > > +.. code-block:: > > + > > + enum { > > + F2FS_XATTR_FADV_LARGEFOLIO, > > + }; > > + unsigned int value = BIT(F2FS_XATTR_FADV_LARGEFOLIO); > > + > > + /* Registers the inode number for large folio support in the subsystem.*/ > > + # setxattr(file, "user.fadvise", &value, sizeof(unsigned int), 0) > > + > > + /* The file must be made Read-Only to transition into the large folio path. */ > > + # fchmod(0400, fd) > > + > > + /* clean up dirty inode state. */ > > + # fsync(fd) > > + > > + /* Drop the inode cache. > > + # close(fd) > > + > > + /* f2fs_iget() instantiates the inode with large folio support.*/ > > + # open() > > + > > + /* Returns -EOPNOTSUPP or error to protect the large folio cache.*/ > > + # open(WRITE), mkwrite on mmap, or chmod(WRITE) > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > > index 01e1ba77263e..fdd62ddc3ed6 100644 > > --- a/fs/f2fs/checkpoint.c > > +++ b/fs/f2fs/checkpoint.c > > @@ -778,7 +778,7 @@ void f2fs_remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) > > __remove_ino_entry(sbi, ino, type); > > } > > > > -/* mode should be APPEND_INO, UPDATE_INO or TRANS_DIR_INO */ > > +/* mode should be APPEND_INO, UPDATE_INO, LARGE_FOLIO_IO, or TRANS_DIR_INO */ > > bool f2fs_exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode) > > { > > struct inode_management *im = &sbi->im[mode]; > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c > > index 965d4e6443c6..5e46230398d7 100644 > > --- a/fs/f2fs/data.c > > +++ b/fs/f2fs/data.c > > @@ -2494,7 +2494,7 @@ static int f2fs_read_data_large_folio(struct inode *inode, > > int ret = 0; > > bool folio_in_bio; > > > > - if (!IS_IMMUTABLE(inode) || f2fs_compressed_file(inode)) { > > + if (f2fs_compressed_file(inode)) { > > if (folio) > > folio_unlock(folio); > > return -EOPNOTSUPP; > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > index e40b6b2784ee..02bc6eb96a59 100644 > > --- a/fs/f2fs/f2fs.h > > +++ b/fs/f2fs/f2fs.h > > @@ -381,6 +381,7 @@ enum { > > /* for the list of ino */ > > enum { > > ORPHAN_INO, /* for orphan ino list */ > > + LARGE_FOLIO_INO, /* for large folio case */ > > APPEND_INO, /* for append ino list */ > > UPDATE_INO, /* for update ino list */ > > TRANS_DIR_INO, /* for transactions dir ino list */ > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > > index c0220cd7b332..64ba900410fc 100644 > > --- a/fs/f2fs/file.c > > +++ b/fs/f2fs/file.c > > @@ -2068,9 +2068,16 @@ static long f2fs_fallocate(struct file *file, int mode, > > > > static int f2fs_release_file(struct inode *inode, struct file *filp) > > { > > - if (atomic_dec_and_test(&F2FS_I(inode)->open_count)) > > + if (atomic_dec_and_test(&F2FS_I(inode)->open_count)) { > > f2fs_remove_donate_inode(inode); > > - > > + /* > > + * In order to get large folio as soon as possible, let's drop > > + * inode cache asap. See also f2fs_drop_inode. > > + */ > > + if (f2fs_exist_written_data(F2FS_I_SB(inode), > > + inode->i_ino, LARGE_FOLIO_INO)) > > + d_drop(filp->f_path.dentry); > > + } > > /* > > * f2fs_release_file is called at every close calls. So we should > > * not drop any inmemory pages by close called by other process. > > diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c > > index 89240be8cc59..e100bc5a378c 100644 > > --- a/fs/f2fs/inode.c > > +++ b/fs/f2fs/inode.c > > @@ -565,6 +565,20 @@ static bool is_meta_ino(struct f2fs_sb_info *sbi, unsigned int ino) > > ino == F2FS_COMPRESS_INO(sbi); > > } > > > > +static void f2fs_mapping_set_large_folio(struct inode *inode) > > +{ > > + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); > > + > > + if (f2fs_compressed_file(inode)) > > + return; > > + if (f2fs_quota_file(sbi, inode->i_ino)) > > + return; > > + if (IS_IMMUTABLE(inode) || > > + (f2fs_exist_written_data(sbi, inode->i_ino, LARGE_FOLIO_INO) && > > + !(inode->i_mode & S_IWUGO))) > > + mapping_set_folio_min_order(inode->i_mapping, 0); > > +} > > + > > struct inode *f2fs_iget(struct super_block *sb, unsigned long ino) > > { > > struct f2fs_sb_info *sbi = F2FS_SB(sb); > > @@ -620,9 +634,7 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino) > > inode->i_op = &f2fs_file_inode_operations; > > inode->i_fop = &f2fs_file_operations; > > inode->i_mapping->a_ops = &f2fs_dblock_aops; > > - if (IS_IMMUTABLE(inode) && !f2fs_compressed_file(inode) && > > - !f2fs_quota_file(sbi, inode->i_ino)) > > - mapping_set_folio_min_order(inode->i_mapping, 0); > > + f2fs_mapping_set_large_folio(inode); > > } else if (S_ISDIR(inode->i_mode)) { > > inode->i_op = &f2fs_dir_inode_operations; > > inode->i_fop = &f2fs_dir_operations; > > @@ -895,6 +907,7 @@ void f2fs_evict_inode(struct inode *inode) > > f2fs_remove_ino_entry(sbi, inode->i_ino, APPEND_INO); > > f2fs_remove_ino_entry(sbi, inode->i_ino, UPDATE_INO); > > f2fs_remove_ino_entry(sbi, inode->i_ino, FLUSH_INO); > > + f2fs_remove_ino_entry(sbi, inode->i_ino, LARGE_FOLIO_INO); > > > > if (!is_sbi_flag_set(sbi, SBI_IS_FREEZING)) { > > sb_start_intwrite(inode->i_sb); > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > > index ccf806b676f5..11d1e0c99ac1 100644 > > --- a/fs/f2fs/super.c > > +++ b/fs/f2fs/super.c > > @@ -1844,6 +1844,13 @@ static int f2fs_drop_inode(struct inode *inode) > > return 1; > > } > > } > > + /* > > + * In order to get large folio as soon as possible, let's drop > > + * inode cache asap. See also f2fs_release_file. > > + */ > > + if (f2fs_exist_written_data(sbi, inode->i_ino, LARGE_FOLIO_INO) && > > + !is_inode_flag_set(inode, FI_DIRTY_INODE)) > > + return 1; > > > > /* > > * This is to avoid a deadlock condition like below. > > diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c > > index 941dc62a6d6f..0c0e44c2dcdd 100644 > > --- a/fs/f2fs/xattr.c > > +++ b/fs/f2fs/xattr.c > > @@ -44,6 +44,16 @@ static void xattr_free(struct f2fs_sb_info *sbi, void *xattr_addr, > > kfree(xattr_addr); > > } > > > > +static int f2fs_xattr_fadvise_get(struct inode *inode, void *buffer) > > +{ > > + if (!buffer) > > + goto out; > > + if (mapping_large_folio_support(inode->i_mapping)) > > + *((unsigned int *)buffer) |= BIT(F2FS_XATTR_FADV_LARGEFOLIO); > > +out: > > + return sizeof(unsigned int); > > +} > > + > > static int f2fs_xattr_generic_get(const struct xattr_handler *handler, > > struct dentry *unused, struct inode *inode, > > const char *name, void *buffer, size_t size) > > @@ -61,10 +71,29 @@ static int f2fs_xattr_generic_get(const struct xattr_handler *handler, > > default: > > return -EINVAL; > > } > > + if (handler->flags == F2FS_XATTR_INDEX_USER && > > + !strcmp(name, "fadvise")) > > + return f2fs_xattr_fadvise_get(inode, buffer); > > + > > return f2fs_getxattr(inode, handler->flags, name, > > buffer, size, NULL); > > } > > > > +static int f2fs_xattr_fadvise_set(struct inode *inode, const void *value) > > +{ > > + unsigned int new_fadvise; > > + > > + new_fadvise = *(unsigned int *)value; > > + > > + if (new_fadvise & BIT(F2FS_XATTR_FADV_LARGEFOLIO)) > > + f2fs_add_ino_entry(F2FS_I_SB(inode), > > + inode->i_ino, LARGE_FOLIO_INO); > > + else > > + f2fs_remove_ino_entry(F2FS_I_SB(inode), > > + inode->i_ino, LARGE_FOLIO_INO); > > + return 0; > > +} > > + > > static int f2fs_xattr_generic_set(const struct xattr_handler *handler, > > struct mnt_idmap *idmap, > > struct dentry *unused, struct inode *inode, > > @@ -84,6 +113,10 @@ static int f2fs_xattr_generic_set(const struct xattr_handler *handler, > > default: > > return -EINVAL; > > } > > + if (handler->flags == F2FS_XATTR_INDEX_USER && > > + !strcmp(name, "fadvise")) > > + return f2fs_xattr_fadvise_set(inode, value); > > + > > return f2fs_setxattr(inode, handler->flags, name, > > value, size, NULL, flags); > > } > > @@ -842,4 +875,4 @@ int __init f2fs_init_xattr_cache(void) > > void f2fs_destroy_xattr_cache(void) > > { > > kmem_cache_destroy(inline_xattr_slab); > > -} > > \ No newline at end of file > > +} > > diff --git a/fs/f2fs/xattr.h b/fs/f2fs/xattr.h > > index bce3d93e4755..455f460d014e 100644 > > --- a/fs/f2fs/xattr.h > > +++ b/fs/f2fs/xattr.h > > @@ -24,6 +24,7 @@ > > #define F2FS_XATTR_REFCOUNT_MAX 1024 > > > > /* Name indexes */ > > +#define F2FS_USER_FADVISE_NAME "user.fadvise" > > #define F2FS_SYSTEM_ADVISE_NAME "system.advise" > > #define F2FS_XATTR_INDEX_USER 1 > > #define F2FS_XATTR_INDEX_POSIX_ACL_ACCESS 2 > > @@ -39,6 +40,11 @@ > > #define F2FS_XATTR_NAME_ENCRYPTION_CONTEXT "c" > > #define F2FS_XATTR_NAME_VERITY "v" > > > > +/* used for F2FS_USER_FADVISE_NAME */ > > +enum { > > + F2FS_XATTR_FADV_LARGEFOLIO, > > +}; > > + > > struct f2fs_xattr_header { > > __le32 h_magic; /* magic number for identification */ > > __le32 h_refcount; /* reference count */ > > -- > > 2.53.0.1213.gd9a14994de-goog > > > > > ---end quoted text--- ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-04-15 16:44 ` Jaegeuk Kim @ 2026-04-15 17:15 ` Matthew Wilcox 2026-04-15 22:02 ` Jaegeuk Kim 2026-05-21 8:51 ` Christoph Hellwig 0 siblings, 2 replies; 20+ messages in thread From: Matthew Wilcox @ 2026-04-15 17:15 UTC (permalink / raw) To: Jaegeuk Kim Cc: Christoph Hellwig, linux-kernel, linux-f2fs-devel, Akilesh Kailash, linux-fsdevel, linux-mm, linux-api On Wed, Apr 15, 2026 at 04:44:04PM +0000, Jaegeuk Kim wrote: > On 04/14, Christoph Hellwig wrote: > > Please add the relevant mailing lists when adding new user interfaces. > > > > And I'm not sure hacks working around the proper large folio > > implementation are something that should be merged upstream. > > Cc'ed linux-api and linux-fsdevel onto the patch thread with a proposal that > I'm not sure it's acceptable or not. You haven't sent a proposal. This is a reply to a reply to a reply of a patch. There's no justification for why f2fs is so special that it needs this. What the hell is going on? You know this is not the way to get code merged into Linux. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-04-15 17:15 ` Matthew Wilcox @ 2026-04-15 22:02 ` Jaegeuk Kim 2026-04-15 23:49 ` Darrick J. Wong 2026-05-21 8:51 ` Christoph Hellwig 1 sibling, 1 reply; 20+ messages in thread From: Jaegeuk Kim @ 2026-04-15 22:02 UTC (permalink / raw) To: Matthew Wilcox Cc: Christoph Hellwig, linux-kernel, linux-f2fs-devel, Akilesh Kailash, linux-fsdevel, linux-mm, linux-api On 04/15, Matthew Wilcox wrote: > On Wed, Apr 15, 2026 at 04:44:04PM +0000, Jaegeuk Kim wrote: > > On 04/14, Christoph Hellwig wrote: > > > Please add the relevant mailing lists when adding new user interfaces. > > > > > > And I'm not sure hacks working around the proper large folio > > > implementation are something that should be merged upstream. > > > > Cc'ed linux-api and linux-fsdevel onto the patch thread with a proposal that > > I'm not sure it's acceptable or not. > > You haven't sent a proposal. This is a reply to a reply to a reply of a > patch. There's no justification for why f2fs is so special that it > needs this. What the hell is going on? You know this is not the way to > get code merged into Linux. I added two ideas in that email. Have you even tried to understand? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-04-15 22:02 ` Jaegeuk Kim @ 2026-04-15 23:49 ` Darrick J. Wong 2026-04-16 1:19 ` Jaegeuk Kim 0 siblings, 1 reply; 20+ messages in thread From: Darrick J. Wong @ 2026-04-15 23:49 UTC (permalink / raw) To: Jaegeuk Kim Cc: Matthew Wilcox, Christoph Hellwig, linux-kernel, linux-f2fs-devel, Akilesh Kailash, linux-fsdevel, linux-mm, linux-api On Wed, Apr 15, 2026 at 10:02:26PM +0000, Jaegeuk Kim wrote: > On 04/15, Matthew Wilcox wrote: > > On Wed, Apr 15, 2026 at 04:44:04PM +0000, Jaegeuk Kim wrote: > > > On 04/14, Christoph Hellwig wrote: > > > > Please add the relevant mailing lists when adding new user interfaces. > > > > > > > > And I'm not sure hacks working around the proper large folio > > > > implementation are something that should be merged upstream. > > > > > > Cc'ed linux-api and linux-fsdevel onto the patch thread with a proposal that > > > I'm not sure it's acceptable or not. > > > > You haven't sent a proposal. This is a reply to a reply to a reply of a > > patch. There's no justification for why f2fs is so special that it > > needs this. What the hell is going on? You know this is not the way to > > get code merged into Linux. > > I added two ideas in that email. Have you even tried to understand? You want to establish "user.fadvise" as an extended attribute containing a bitmask. The sole bit defined in that attribute means "use large folios", but you also have to change the file mode and set the IMMUTABLE bit for it to actually do anything. Meanwhile, you can't actually persist any of the fadvise(2) advice flags, so the xattr name doesn't even make sense. Maybe you meant to call it "user.madvise" since the closest thing I can think of is MADV_HUGEPAGE? I've understood enough. YUCK. --D ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-04-15 23:49 ` Darrick J. Wong @ 2026-04-16 1:19 ` Jaegeuk Kim 0 siblings, 0 replies; 20+ messages in thread From: Jaegeuk Kim @ 2026-04-16 1:19 UTC (permalink / raw) To: Darrick J. Wong Cc: Matthew Wilcox, Christoph Hellwig, linux-kernel, linux-f2fs-devel, Akilesh Kailash, linux-fsdevel, linux-mm, linux-api On 04/15, Darrick J. Wong wrote: > On Wed, Apr 15, 2026 at 10:02:26PM +0000, Jaegeuk Kim wrote: > > On 04/15, Matthew Wilcox wrote: > > > On Wed, Apr 15, 2026 at 04:44:04PM +0000, Jaegeuk Kim wrote: > > > > On 04/14, Christoph Hellwig wrote: > > > > > Please add the relevant mailing lists when adding new user interfaces. > > > > > > > > > > And I'm not sure hacks working around the proper large folio > > > > > implementation are something that should be merged upstream. > > > > > > > > Cc'ed linux-api and linux-fsdevel onto the patch thread with a proposal that > > > > I'm not sure it's acceptable or not. > > > > > > You haven't sent a proposal. This is a reply to a reply to a reply of a > > > patch. There's no justification for why f2fs is so special that it > > > needs this. What the hell is going on? You know this is not the way to > > > get code merged into Linux. > > > > I added two ideas in that email. Have you even tried to understand? > > You want to establish "user.fadvise" as an extended attribute containing > a bitmask. The sole bit defined in that attribute means "use large > folios", but you also have to change the file mode and set the IMMUTABLE > bit for it to actually do anything. Partly yes. This path has nothing to do with IMMUTABLE bit, since I used to activate the large folio with that bit, but hit a big pain which requires clearing the bit whenever just deleting the file. So, this gives a new way to activate the large folio by chmod(0400) and setxattr("user.fadvise") only while providing quick inode eviction in order to set mapping by iget, and allowing file deletion easily. I feel the arguable points would be 1) the path to evict inode by calling d_drop in release_file and returning 1 in drop_inode, 2) how to give the hint between fadvise(FADV_LARGE_FOLIO) or setxattr(user.fadvise) by individual file system. > > Meanwhile, you can't actually persist any of the fadvise(2) advice > flags, so the xattr name doesn't even make sense. Maybe you meant to > call it "user.madvise" since the closest thing I can think of is > MADV_HUGEPAGE? > > I've understood enough. YUCK. Thank you for taking the time to take a look. > > --D ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-04-15 17:15 ` Matthew Wilcox 2026-04-15 22:02 ` Jaegeuk Kim @ 2026-05-21 8:51 ` Christoph Hellwig 2026-05-21 15:57 ` Theodore Tso 1 sibling, 1 reply; 20+ messages in thread From: Christoph Hellwig @ 2026-05-21 8:51 UTC (permalink / raw) To: Matthew Wilcox Cc: Jaegeuk Kim, Christoph Hellwig, linux-kernel, linux-f2fs-devel, Akilesh Kailash, linux-fsdevel, linux-mm, linux-api, Christian Brauner On Wed, Apr 15, 2026 at 06:15:46PM +0100, Matthew Wilcox wrote: > On Wed, Apr 15, 2026 at 04:44:04PM +0000, Jaegeuk Kim wrote: > > On 04/14, Christoph Hellwig wrote: > > > Please add the relevant mailing lists when adding new user interfaces. > > > > > > And I'm not sure hacks working around the proper large folio > > > implementation are something that should be merged upstream. > > > > Cc'ed linux-api and linux-fsdevel onto the patch thread with a proposal that > > I'm not sure it's acceptable or not. > > You haven't sent a proposal. This is a reply to a reply to a reply of a > patch. There's no justification for why f2fs is so special that it > needs this. What the hell is going on? You know this is not the way to > get code merged into Linux. None of this got properly answers, and this broken interface now landed in linux-next. IT is offloading a user.* xattr which is free-form user data with semantics that are weird to say it very nicely. All this was done against the advice in the mailing list discussion. I think at some point we just need to stop taking f2fs updates likes this. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-05-21 8:51 ` Christoph Hellwig @ 2026-05-21 15:57 ` Theodore Tso 2026-05-21 17:42 ` Matthew Wilcox ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Theodore Tso @ 2026-05-21 15:57 UTC (permalink / raw) To: Christoph Hellwig Cc: Matthew Wilcox, Jaegeuk Kim, linux-kernel, linux-f2fs-devel, Akilesh Kailash, linux-fsdevel, linux-mm, linux-api, Christian Brauner On Thu, May 21, 2026 at 01:51:08AM -0700, Christoph Hellwig wrote: > > You haven't sent a proposal. This is a reply to a reply to a reply of a > > patch. There's no justification for why f2fs is so special that it > > needs this. What the hell is going on? You know this is not the way to > > get code merged into Linux. > > None of this got properly answers, and this broken interface now landed > in linux-next. IT is offloading a user.* xattr which is free-form > user data with semantics that are weird to say it very nicely. > > All this was done against the advice in the mailing list discussion. So let me get this straight. This is a magic xattr interface which is not even persisted in the file system, but instead sets a 32-bit bitmask in the struct inode which disappears once the inode gets flushed from the inode stack. And it uses a generic xattr name, "user.fadvise". There's no way in *hell* any other file system is likely to adopt such a broken interface, so why didn't you just use an ioctl to set this magic f2fs-specific flag? > I think at some point we just need to stop taking f2fs updates likes > this. Well, that's ultiamtely up to Linus. I'll say that if I were Linus (and I'm glad I'm not :-), and I saw this in a pull request, I'd reject it out of hand. But whether it's worth making a huge fuss and asking escalating this mess to Linus, we probably should get a bit more community consensus before taking such a drastic step. Christian, since you're one of the VFS maintaienrs, what's your opinion about escalating this to Linus? - Ted ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-05-21 15:57 ` Theodore Tso @ 2026-05-21 17:42 ` Matthew Wilcox 2026-05-22 3:59 ` Jaegeuk Kim 2026-05-22 3:32 ` Jaegeuk Kim 2026-05-22 9:59 ` Christian Brauner 2 siblings, 1 reply; 20+ messages in thread From: Matthew Wilcox @ 2026-05-21 17:42 UTC (permalink / raw) To: Theodore Tso Cc: Christoph Hellwig, Jaegeuk Kim, linux-kernel, linux-f2fs-devel, Akilesh Kailash, linux-fsdevel, linux-mm, linux-api, Christian Brauner On Thu, May 21, 2026 at 11:57:48AM -0400, Theodore Tso wrote: > So let me get this straight. This is a magic xattr interface which is > not even persisted in the file system, but instead sets a 32-bit > bitmask in the struct inode which disappears once the inode gets > flushed from the inode stack. And it uses a generic xattr name, > "user.fadvise". > > There's no way in *hell* any other file system is likely to adopt such > a broken interface, so why didn't you just use an ioctl to set this > magic f2fs-specific flag? I mean, yes, this API is horrendous. But it's just another example of f2fs thinking it's somehow special and not just enabling large folios like other filesystems do. This hurts everyone, not just people who use f2fs. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-05-21 17:42 ` Matthew Wilcox @ 2026-05-22 3:59 ` Jaegeuk Kim 2026-05-22 12:55 ` Matthew Wilcox 0 siblings, 1 reply; 20+ messages in thread From: Jaegeuk Kim @ 2026-05-22 3:59 UTC (permalink / raw) To: Matthew Wilcox Cc: Theodore Tso, Christoph Hellwig, linux-kernel, linux-f2fs-devel, Akilesh Kailash, linux-fsdevel, linux-mm, linux-api, Christian Brauner On 05/21, Matthew Wilcox wrote: > On Thu, May 21, 2026 at 11:57:48AM -0400, Theodore Tso wrote: > > So let me get this straight. This is a magic xattr interface which is > > not even persisted in the file system, but instead sets a 32-bit > > bitmask in the struct inode which disappears once the inode gets > > flushed from the inode stack. And it uses a generic xattr name, > > "user.fadvise". > > > > There's no way in *hell* any other file system is likely to adopt such > > a broken interface, so why didn't you just use an ioctl to set this > > magic f2fs-specific flag? > > I mean, yes, this API is horrendous. But it's just another example of > f2fs thinking it's somehow special and not just enabling large folios > like other filesystems do. This hurts everyone, not just people who use > f2fs. From the production viewpoint, I raised a concern on setting large folio by default, since that exhausts lots of high-order pages, which were needed for essential system services and critical apps. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-05-22 3:59 ` Jaegeuk Kim @ 2026-05-22 12:55 ` Matthew Wilcox 2026-05-22 14:04 ` [f2fs-dev] " Jaegeuk Kim 0 siblings, 1 reply; 20+ messages in thread From: Matthew Wilcox @ 2026-05-22 12:55 UTC (permalink / raw) To: Jaegeuk Kim Cc: Theodore Tso, Christoph Hellwig, linux-kernel, linux-f2fs-devel, Akilesh Kailash, linux-fsdevel, linux-mm, linux-api, Christian Brauner On Fri, May 22, 2026 at 03:59:45AM +0000, Jaegeuk Kim wrote: > On 05/21, Matthew Wilcox wrote: > > On Thu, May 21, 2026 at 11:57:48AM -0400, Theodore Tso wrote: > > > So let me get this straight. This is a magic xattr interface which is > > > not even persisted in the file system, but instead sets a 32-bit > > > bitmask in the struct inode which disappears once the inode gets > > > flushed from the inode stack. And it uses a generic xattr name, > > > "user.fadvise". > > > > > > There's no way in *hell* any other file system is likely to adopt such > > > a broken interface, so why didn't you just use an ioctl to set this > > > magic f2fs-specific flag? > > > > I mean, yes, this API is horrendous. But it's just another example of > > f2fs thinking it's somehow special and not just enabling large folios > > like other filesystems do. This hurts everyone, not just people who use > > f2fs. > > >From the production viewpoint, I raised a concern on setting large folio by > default, since that exhausts lots of high-order pages, which were needed for > essential system services and critical apps. Random fears or actual data? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [f2fs-dev] [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-05-22 12:55 ` Matthew Wilcox @ 2026-05-22 14:04 ` Jaegeuk Kim 0 siblings, 0 replies; 20+ messages in thread From: Jaegeuk Kim @ 2026-05-22 14:04 UTC (permalink / raw) To: Matthew Wilcox Cc: Theodore Tso, linux-api, linux-kernel, linux-f2fs-devel, Christoph Hellwig, linux-mm, linux-fsdevel, Akilesh Kailash, Christian Brauner On 05/22, Matthew Wilcox wrote: > On Fri, May 22, 2026 at 03:59:45AM +0000, Jaegeuk Kim wrote: > > On 05/21, Matthew Wilcox wrote: > > > On Thu, May 21, 2026 at 11:57:48AM -0400, Theodore Tso wrote: > > > > So let me get this straight. This is a magic xattr interface which is > > > > not even persisted in the file system, but instead sets a 32-bit > > > > bitmask in the struct inode which disappears once the inode gets > > > > flushed from the inode stack. And it uses a generic xattr name, > > > > "user.fadvise". > > > > > > > > There's no way in *hell* any other file system is likely to adopt such > > > > a broken interface, so why didn't you just use an ioctl to set this > > > > magic f2fs-specific flag? > > > > > > I mean, yes, this API is horrendous. But it's just another example of > > > f2fs thinking it's somehow special and not just enabling large folios > > > like other filesystems do. This hurts everyone, not just people who use > > > f2fs. > > > > >From the production viewpoint, I raised a concern on setting large folio by > > default, since that exhausts lots of high-order pages, which were needed for > > essential system services and critical apps. > > Random fears or actual data? This was a quick buddyinfo right after booting the device. Before: Node 0, zone Normal 22684 42284 28704 16901 9515 4566 1854 673 181 36 758 After disabling EROFS large folio: Node 0, zone Normal 8486 4732 2175 1161 697 272 82 19 3 1 856 > > > _______________________________________________ > Linux-f2fs-devel mailing list > Linux-f2fs-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [f2fs-dev] [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-05-21 15:57 ` Theodore Tso 2026-05-21 17:42 ` Matthew Wilcox @ 2026-05-22 3:32 ` Jaegeuk Kim 2026-05-22 3:53 ` Eric Biggers 2026-05-22 14:11 ` Theodore Tso 2026-05-22 9:59 ` Christian Brauner 2 siblings, 2 replies; 20+ messages in thread From: Jaegeuk Kim @ 2026-05-22 3:32 UTC (permalink / raw) To: Theodore Tso Cc: Christoph Hellwig, linux-api, linux-kernel, Matthew Wilcox, linux-f2fs-devel, linux-mm, linux-fsdevel, Akilesh Kailash, Christian Brauner On 05/21, Theodore Tso wrote: > On Thu, May 21, 2026 at 01:51:08AM -0700, Christoph Hellwig wrote: > > > You haven't sent a proposal. This is a reply to a reply to a reply of a > > > patch. There's no justification for why f2fs is so special that it > > > needs this. What the hell is going on? You know this is not the way to > > > get code merged into Linux. > > > > None of this got properly answers, and this broken interface now landed > > in linux-next. IT is offloading a user.* xattr which is free-form > > user data with semantics that are weird to say it very nicely. > > > > All this was done against the advice in the mailing list discussion. > > So let me get this straight. This is a magic xattr interface which is > not even persisted in the file system, but instead sets a 32-bit > bitmask in the struct inode which disappears once the inode gets > flushed from the inode stack. And it uses a generic xattr name, > "user.fadvise". > > There's no way in *hell* any other file system is likely to adopt such > a broken interface, so why didn't you just use an ioctl to set this > magic f2fs-specific flag? I went this route because Android heavily restricts ioctl() permissions and we needed broader access for this to work within the framework. It’s definitely a pragmatic choice just to get it running in production. If ioctl() is a right way for upstream, I'm happy to change this patch. By the way, I really don't understand why all the messages are so offensive, even without trying to understand the problem or guiding right directions. > > > I think at some point we just need to stop taking f2fs updates likes > > this. > > Well, that's ultiamtely up to Linus. I'll say that if I were Linus > (and I'm glad I'm not :-), and I saw this in a pull request, I'd > reject it out of hand. But whether it's worth making a huge fuss and > asking escalating this mess to Linus, we probably should get a bit > more community consensus before taking such a drastic step. Could I also raise a quick concern regarding the phrasing/wording used in the communications? > > Christian, since you're one of the VFS maintaienrs, what's your > opinion about escalating this to Linus? > > - Ted > > > _______________________________________________ > Linux-f2fs-devel mailing list > Linux-f2fs-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [f2fs-dev] [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-05-22 3:32 ` Jaegeuk Kim @ 2026-05-22 3:53 ` Eric Biggers 2026-05-22 4:02 ` Jaegeuk Kim 2026-05-22 10:01 ` Christian Brauner 2026-05-22 14:11 ` Theodore Tso 1 sibling, 2 replies; 20+ messages in thread From: Eric Biggers @ 2026-05-22 3:53 UTC (permalink / raw) To: Jaegeuk Kim Cc: Theodore Tso, Christoph Hellwig, linux-api, linux-kernel, Matthew Wilcox, linux-f2fs-devel, linux-mm, linux-fsdevel, Akilesh Kailash, Christian Brauner On Fri, May 22, 2026 at 03:32:39AM +0000, Jaegeuk Kim wrote: > On 05/21, Theodore Tso wrote: > > On Thu, May 21, 2026 at 01:51:08AM -0700, Christoph Hellwig wrote: > > > > You haven't sent a proposal. This is a reply to a reply to a reply of a > > > > patch. There's no justification for why f2fs is so special that it > > > > needs this. What the hell is going on? You know this is not the way to > > > > get code merged into Linux. > > > > > > None of this got properly answers, and this broken interface now landed > > > in linux-next. IT is offloading a user.* xattr which is free-form > > > user data with semantics that are weird to say it very nicely. > > > > > > All this was done against the advice in the mailing list discussion. > > > > So let me get this straight. This is a magic xattr interface which is > > not even persisted in the file system, but instead sets a 32-bit > > bitmask in the struct inode which disappears once the inode gets > > flushed from the inode stack. And it uses a generic xattr name, > > "user.fadvise". > > > > There's no way in *hell* any other file system is likely to adopt such > > a broken interface, so why didn't you just use an ioctl to set this > > magic f2fs-specific flag? > > I went this route because Android heavily restricts ioctl() permissions > and we needed broader access for this to work within the framework. It's straightforward (2 lines I think) to update Android's SELinux policy to allow an ioctl in all domains. So that doesn't seem like a reason to not use an ioctl. In fact this is actually a reason *to* use an ioctl, as it shows that ioctls can be allowed/denied independently as needed, whereas xattrs just use the file write permission. - Eric ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [f2fs-dev] [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-05-22 3:53 ` Eric Biggers @ 2026-05-22 4:02 ` Jaegeuk Kim 2026-05-22 10:01 ` Christian Brauner 1 sibling, 0 replies; 20+ messages in thread From: Jaegeuk Kim @ 2026-05-22 4:02 UTC (permalink / raw) To: Eric Biggers Cc: Theodore Tso, linux-api, linux-kernel, Matthew Wilcox, linux-f2fs-devel, Christoph Hellwig, linux-mm, linux-fsdevel, Akilesh Kailash, Christian Brauner On 05/21, Eric Biggers via Linux-f2fs-devel wrote: > On Fri, May 22, 2026 at 03:32:39AM +0000, Jaegeuk Kim wrote: > > On 05/21, Theodore Tso wrote: > > > On Thu, May 21, 2026 at 01:51:08AM -0700, Christoph Hellwig wrote: > > > > > You haven't sent a proposal. This is a reply to a reply to a reply of a > > > > > patch. There's no justification for why f2fs is so special that it > > > > > needs this. What the hell is going on? You know this is not the way to > > > > > get code merged into Linux. > > > > > > > > None of this got properly answers, and this broken interface now landed > > > > in linux-next. IT is offloading a user.* xattr which is free-form > > > > user data with semantics that are weird to say it very nicely. > > > > > > > > All this was done against the advice in the mailing list discussion. > > > > > > So let me get this straight. This is a magic xattr interface which is > > > not even persisted in the file system, but instead sets a 32-bit > > > bitmask in the struct inode which disappears once the inode gets > > > flushed from the inode stack. And it uses a generic xattr name, > > > "user.fadvise". > > > > > > There's no way in *hell* any other file system is likely to adopt such > > > a broken interface, so why didn't you just use an ioctl to set this > > > magic f2fs-specific flag? > > > > I went this route because Android heavily restricts ioctl() permissions > > and we needed broader access for this to work within the framework. > > It's straightforward (2 lines I think) to update Android's SELinux > policy to allow an ioctl in all domains. So that doesn't seem like a > reason to not use an ioctl. In fact this is actually a reason *to* use > an ioctl, as it shows that ioctls can be allowed/denied independently as > needed, whereas xattrs just use the file write permission. Ok, that's also great news to me. > > - Eric > > > _______________________________________________ > Linux-f2fs-devel mailing list > Linux-f2fs-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [f2fs-dev] [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-05-22 3:53 ` Eric Biggers 2026-05-22 4:02 ` Jaegeuk Kim @ 2026-05-22 10:01 ` Christian Brauner 1 sibling, 0 replies; 20+ messages in thread From: Christian Brauner @ 2026-05-22 10:01 UTC (permalink / raw) To: Eric Biggers Cc: Jaegeuk Kim, Theodore Tso, Christoph Hellwig, linux-api, linux-kernel, Matthew Wilcox, linux-f2fs-devel, linux-mm, linux-fsdevel, Akilesh Kailash, Christian Brauner On 2026-05-21 22:53 -0500, Eric Biggers wrote: > On Fri, May 22, 2026 at 03:32:39AM +0000, Jaegeuk Kim wrote: > > On 05/21, Theodore Tso wrote: > > > On Thu, May 21, 2026 at 01:51:08AM -0700, Christoph Hellwig wrote: > > > > > You haven't sent a proposal. This is a reply to a reply to a reply of a > > > > > patch. There's no justification for why f2fs is so special that it > > > > > needs this. What the hell is going on? You know this is not the way to > > > > > get code merged into Linux. > > > > > > > > None of this got properly answers, and this broken interface now landed > > > > in linux-next. IT is offloading a user.* xattr which is free-form > > > > user data with semantics that are weird to say it very nicely. > > > > > > > > All this was done against the advice in the mailing list discussion. > > > > > > So let me get this straight. This is a magic xattr interface which is > > > not even persisted in the file system, but instead sets a 32-bit > > > bitmask in the struct inode which disappears once the inode gets > > > flushed from the inode stack. And it uses a generic xattr name, > > > "user.fadvise". > > > > > > There's no way in *hell* any other file system is likely to adopt such > > > a broken interface, so why didn't you just use an ioctl to set this > > > magic f2fs-specific flag? > > > > I went this route because Android heavily restricts ioctl() permissions > > and we needed broader access for this to work within the framework. > > It's straightforward (2 lines I think) to update Android's SELinux > policy to allow an ioctl in all domains. So that doesn't seem like a > reason to not use an ioctl. In fact this is actually a reason *to* use > an ioctl, as it shows that ioctls can be allowed/denied independently as > needed, whereas xattrs just use the file write permission. Thank you, I very much agree. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [f2fs-dev] [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-05-22 3:32 ` Jaegeuk Kim 2026-05-22 3:53 ` Eric Biggers @ 2026-05-22 14:11 ` Theodore Tso 2026-05-22 17:08 ` Jaegeuk Kim 1 sibling, 1 reply; 20+ messages in thread From: Theodore Tso @ 2026-05-22 14:11 UTC (permalink / raw) To: Jaegeuk Kim Cc: Christoph Hellwig, linux-api, linux-kernel, Matthew Wilcox, linux-f2fs-devel, linux-mm, linux-fsdevel, Akilesh Kailash, Christian Brauner On Fri, May 22, 2026 at 03:32:39AM +0000, Jaegeuk Kim wrote: > I went this route because Android heavily restricts ioctl() permissions > and we needed broader access for this to work within the framework. It’s > definitely a pragmatic choice just to get it running in production. > > If ioctl() is a right way for upstream, I'm happy to change this patch. By > the way, I really don't understand why all the messages are so offensive, > even without trying to understand the problem or guiding right directions. The reason why some people were getting annoyed was because as a Linux file system maintainer, there was an assumption that you would understand that extended attributes --- especially in the user.* namespace --- have an intended use case of storing a user-chosen small piece of metadata that would be stored in the file system. Hijacking user.fadvise such that it no longer persistent stores an extended attribute for one specific file system --- such that if a hypothetical user application might decide to store a piece of application data in the extended attribute named "fadvise" would do something completely different on a single mainline file system is in such poor taste that I would have *hoped* that any Linux file system maintainer would know that this a Really Bad Thing, such that if someone in your development community suggested such an idea, you would reject it. And then, when people complained that it was a bad idea, and you decided to put in the f2fs branch, such that it would show up in linux-next, and there was no way for other file system developers to object (short of appealing to Linus) --- well, that's basically you taking advantage of your file system maintainer privileges. And this is why I started proposing whether we needed to appeal this to Linus so he could make the call to reject something that the community had already told you was in terrible, terrible taste. As far as trying to understand why you were doing this --- I have to turn that question around. Why didn't *you* explain why you needed to do this thing? I went through the e-mail history, and I couldn't find an explanation of why you decided to do this thing. Perhaps we need to add an explanation the Documentation directory explaining what the intended use of the extended attribute case, and perhaps referencing past cases were people tried to use this to bypass the linux-api review process (f2fs's user.fadvise is not the first time someone has tried to do this) thing), so that automated review bots like Sashiko can explain why it's in such terrible taste to patch authors, perhaps we need to do this. Up until now, I think the assumption is that file system maintainers would know something this self-evident, and if not, if it was pointed out, they wouldn't try to force such an ill-advised interface to Linus. - Ted P.S. As an exmaple of how I hanlded a somewhat similar scenario in the past, my employer's cluster file system needed FALLOC_FL_NO_HIDE_STALE to save $$$$ in TCO storage costs. But the concern was this would be an attractive nuisance for enterprise distro users, who would see the massive performance increase, not realize that this would leak stale data, which could result in user PII being exposed, thus making life hard for Enterprise Linux's reputation. (This wasn't an issue at $WORK because we enrypt all data at rest, and the cluster file system daemon was a privileged server who (a) knew what it was doing, and (b) only it would have access to set FALLOC_FL_NO_HIDE_STALE.) I disclosed *why* $WORK needed such a thing (it made a huge difference to storage TCO costss for Google's Cluster Filesystem), and after discussion and negotiation, we came to a compromise which involved my keeping the (very small) patch out of tree, but reserving the code point upstream to avoid future bitfield collisions. They key here was that I *knew* it was controversial, and I understood what problems it might cause in the rest of the ecosystem. That's part of the job of a maintainer, and it's also why a company might want to hire a maintainer. They can represent the needs of multiple stakeholders --- the upstream community, upstream users and the greater Linux ecosystem, as well as their employer. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [f2fs-dev] [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-05-22 14:11 ` Theodore Tso @ 2026-05-22 17:08 ` Jaegeuk Kim 2026-05-22 22:41 ` Theodore Tso 0 siblings, 1 reply; 20+ messages in thread From: Jaegeuk Kim @ 2026-05-22 17:08 UTC (permalink / raw) To: Theodore Tso Cc: linux-api, linux-kernel, Matthew Wilcox, linux-f2fs-devel, Christoph Hellwig, linux-mm, linux-fsdevel, Akilesh Kailash, Christian Brauner On 05/22, Theodore Tso wrote: > On Fri, May 22, 2026 at 03:32:39AM +0000, Jaegeuk Kim wrote: > > I went this route because Android heavily restricts ioctl() permissions > > and we needed broader access for this to work within the framework. It’s > > definitely a pragmatic choice just to get it running in production. > > > > If ioctl() is a right way for upstream, I'm happy to change this patch. By > > the way, I really don't understand why all the messages are so offensive, > > even without trying to understand the problem or guiding right directions. > > The reason why some people were getting annoyed was because as a Linux > file system maintainer, there was an assumption that you would > understand that extended attributes --- especially in the user.* > namespace --- have an intended use case of storing a user-chosen small > piece of metadata that would be stored in the file system. > > Hijacking user.fadvise such that it no longer persistent stores an > extended attribute for one specific file system --- such that if a > hypothetical user application might decide to store a piece of > application data in the extended attribute named "fadvise" would do > something completely different on a single mainline file system is in > such poor taste that I would have *hoped* that any Linux file system > maintainer would know that this a Really Bad Thing, such that if > someone in your development community suggested such an idea, you > would reject it. Thank you for the explanation. It seems I made a wrong assumption on the usage of "user." prefix where each filesystem can support in different ways. > > And then, when people complained that it was a bad idea, and you > decided to put in the f2fs branch, such that it would show up in > linux-next, and there was no way for other file system developers to > object (short of appealing to Linus) --- well, that's basically you > taking advantage of your file system maintainer privileges. And this > is why I started proposing whether we needed to appeal this to Linus > so he could make the call to reject something that the community had > already told you was in terrible, terrible taste. To be fair, I hadn't queued it in linux-next for weeks since the first post, and was waiting for more feedbacks. If I was able to hear "user." is a generic prefix used for all filesystems and use ioctl from the beginning, I'd just drop and be able to start arguing with security folks back. Since my employer is funding for production mainly, not much upstream work, I couldn't sit and wait for the response for months. :( > > As far as trying to understand why you were doing this --- I have to > turn that question around. Why didn't *you* explain why you needed to > do this thing? I went through the e-mail history, and I couldn't find > an explanation of why you decided to do this thing. I shared some motivation when replying to Darrick's feedback [1], but yes, it was not enough for all heads-up. The problem started that some speicific application needs as many high-order pages as possible mostly for reads. So, I thought we can turn on large folio on the specific files per hints. One way for the hints was using immutable bit, but it turned out it's very hard to manage disabling the bit whenever deleting the files. Along with limited ioctl() and requiring inode eviction to manage large folio activation, I had to implement this path. [1] https://lore.kernel.org/lkml/aeA5C8byIpXWla7f@google.com/ > > Perhaps we need to add an explanation the Documentation directory > explaining what the intended use of the extended attribute case, and > perhaps referencing past cases were people tried to use this to bypass > the linux-api review process (f2fs's user.fadvise is not the first > time someone has tried to do this) thing), so that automated review > bots like Sashiko can explain why it's in such terrible taste to patch > authors, perhaps we need to do this. Up until now, I think the > assumption is that file system maintainers would know something this > self-evident, and if not, if it was pointed out, they wouldn't try to > force such an ill-advised interface to Linus. I believe this helps a lot to all newbies like me. I really appreciate your feedback. > > - Ted > > P.S. As an exmaple of how I hanlded a somewhat similar scenario in > the past, my employer's cluster file system needed > FALLOC_FL_NO_HIDE_STALE to save $$$$ in TCO storage costs. But the > concern was this would be an attractive nuisance for enterprise distro > users, who would see the massive performance increase, not realize > that this would leak stale data, which could result in user PII being > exposed, thus making life hard for Enterprise Linux's reputation. > (This wasn't an issue at $WORK because we enrypt all data at rest, and > the cluster file system daemon was a privileged server who (a) knew > what it was doing, and (b) only it would have access to set > FALLOC_FL_NO_HIDE_STALE.) > > I disclosed *why* $WORK needed such a thing (it made a huge difference > to storage TCO costss for Google's Cluster Filesystem), and after > discussion and negotiation, we came to a compromise which involved my > keeping the (very small) patch out of tree, but reserving the code > point upstream to avoid future bitfield collisions. They key here was > that I *knew* it was controversial, and I understood what problems it > might cause in the rest of the ecosystem. That's part of the job of a > maintainer, and it's also why a company might want to hire a > maintainer. They can represent the needs of multiple stakeholders --- > the upstream community, upstream users and the greater Linux > ecosystem, as well as their employer. > > > _______________________________________________ > Linux-f2fs-devel mailing list > Linux-f2fs-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [f2fs-dev] [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-05-22 17:08 ` Jaegeuk Kim @ 2026-05-22 22:41 ` Theodore Tso 0 siblings, 0 replies; 20+ messages in thread From: Theodore Tso @ 2026-05-22 22:41 UTC (permalink / raw) To: Jaegeuk Kim Cc: linux-api, linux-kernel, Matthew Wilcox, linux-f2fs-devel, Christoph Hellwig, linux-mm, linux-fsdevel, Akilesh Kailash, Christian Brauner On Fri, May 22, 2026 at 05:08:41PM +0000, Jaegeuk Kim wrote: > > Thank you for the explanation. It seems I made a wrong assumption on the > usage of "user." prefix where each filesystem can support in different > ways. The "user." prefix is used by all userspace applications that wish to store extended attributes. For example, user.mime_type, user.xdg.origin_url, user.charset, user.appache_handler, etc For more information, see: https://www.freedesktop.org/wiki/CommonExtendedAttribute https://wiki.archlinux.org/title/Extended_attributes I certainly assumed this was common knowledge across all file system maintainers, but this was apparently not true in your case. I don't know how this could be the case given that f2fs implements extended attributes, and I would have thought you would have known that when testing that feature. > I shared some motivation when replying to Darrick's feedback [1], but yes, > it was not enough for all heads-up. The problem started that some speicific > application needs as many high-order pages as possible mostly for reads. So, > I thought we can turn on large folio on the specific files per hints. One way > for the hints was using immutable bit, but it turned out it's very hard to > manage disabling the bit whenever deleting the files. Along with limited > ioctl() and requiring inode eviction to manage large folio activation, I had > to implement this path. > > [1] https://lore.kernel.org/lkml/aeA5C8byIpXWla7f@google.com/ Actually, you still haven't explained your use case, at least, not well enough for me to understand what you are trying to do. So an application wants a particular file to use as many high-order pages as possible. Why? What sort of guarantees do you need to provide? What happens if they can't be provided? What happens if a possibly malicious, or at least gready, application uses this interface to grab a lot of high-order pages? From your patch: 1. setxattr(file, "user.fadvise", &value, sizeof(unsigned int), 0) -> register the inode number for large folio 2. chmod(0400, file) -> make Read-Only 3. open() -> f2fs_iget() with large folio 4. open(WRITE), mkwrite on mmap, chmod(WRITE) -> return error 5. iput() and open() -> goto #3 6. unlink -> deregister the inode number Why should making the file read-only matter? And when you say "derigster the inode number", why should this be related to deleting the inode? This is an interface which seems to be very specific to your use case. What if those requirements change over time? What if you want pull in a file without making it be read-only? And what if you want to release the large-order pages without deleting the file? - Ted ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] f2fs: another way to set large folio by remembering inode number 2026-05-21 15:57 ` Theodore Tso 2026-05-21 17:42 ` Matthew Wilcox 2026-05-22 3:32 ` Jaegeuk Kim @ 2026-05-22 9:59 ` Christian Brauner 2 siblings, 0 replies; 20+ messages in thread From: Christian Brauner @ 2026-05-22 9:59 UTC (permalink / raw) To: Theodore Tso Cc: Christoph Hellwig, Matthew Wilcox, Jaegeuk Kim, linux-kernel, linux-f2fs-devel, Akilesh Kailash, linux-fsdevel, linux-mm, linux-api, Christian Brauner On 2026-05-21 11:57 -0400, Theodore Tso wrote: > On Thu, May 21, 2026 at 01:51:08AM -0700, Christoph Hellwig wrote: > > > You haven't sent a proposal. This is a reply to a reply to a reply of a > > > patch. There's no justification for why f2fs is so special that it > > > needs this. What the hell is going on? You know this is not the way to > > > get code merged into Linux. > > > > None of this got properly answers, and this broken interface now landed > > in linux-next. IT is offloading a user.* xattr which is free-form > > user data with semantics that are weird to say it very nicely. > > > > All this was done against the advice in the mailing list discussion. > > So let me get this straight. This is a magic xattr interface which is > not even persisted in the file system, but instead sets a 32-bit > bitmask in the struct inode which disappears once the inode gets > flushed from the inode stack. And it uses a generic xattr name, > "user.fadvise". > > There's no way in *hell* any other file system is likely to adopt such > a broken interface, so why didn't you just use an ioctl to set this > magic f2fs-specific flag? > > > I think at some point we just need to stop taking f2fs updates likes > > this. > > Well, that's ultiamtely up to Linus. I'll say that if I were Linus > (and I'm glad I'm not :-), and I saw this in a pull request, I'd > reject it out of hand. But whether it's worth making a huge fuss and > asking escalating this mess to Linus, we probably should get a bit > more community consensus before taking such a drastic step. > > Christian, since you're one of the VFS maintaienrs, what's your > opinion about escalating this to Linus? I think we don't need to involve Linus. The interface as is is broken. Using xattrs for this makes no sense whatsoever. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2026-05-22 22:41 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260409134538.3692605-1-jaegeuk@kernel.org>
[not found] ` <adhPZxtbZxgU-37v@google.com>
2026-04-14 8:02 ` [PATCH v2] f2fs: another way to set large folio by remembering inode number Christoph Hellwig
2026-04-15 16:44 ` Jaegeuk Kim
2026-04-15 17:15 ` Matthew Wilcox
2026-04-15 22:02 ` Jaegeuk Kim
2026-04-15 23:49 ` Darrick J. Wong
2026-04-16 1:19 ` Jaegeuk Kim
2026-05-21 8:51 ` Christoph Hellwig
2026-05-21 15:57 ` Theodore Tso
2026-05-21 17:42 ` Matthew Wilcox
2026-05-22 3:59 ` Jaegeuk Kim
2026-05-22 12:55 ` Matthew Wilcox
2026-05-22 14:04 ` [f2fs-dev] " Jaegeuk Kim
2026-05-22 3:32 ` Jaegeuk Kim
2026-05-22 3:53 ` Eric Biggers
2026-05-22 4:02 ` Jaegeuk Kim
2026-05-22 10:01 ` Christian Brauner
2026-05-22 14:11 ` Theodore Tso
2026-05-22 17:08 ` Jaegeuk Kim
2026-05-22 22:41 ` Theodore Tso
2026-05-22 9:59 ` Christian Brauner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox