* [f2fs-dev] [DISCUSSION]: f2fs: Approaches to address write amplification in the current aops->dirty_folio
@ 2025-03-30 2:38 Nanzhe Zhao
2025-03-31 3:43 ` Matthew Wilcox
0 siblings, 1 reply; 6+ messages in thread
From: Nanzhe Zhao @ 2025-03-30 2:38 UTC (permalink / raw)
To: jaegeuk; +Cc: Matthew Wilcox, linux-f2fs-devel@lists.sourceforge.net
Dear f2fs developers,
I am writing to discuss a potential issue with the dirty_folio
implementation in f2fs and its impact on write amplification.
Currently, the f2fs implementation of dirty_folio in
address_space_operations relies on filemap_dirty_folio. Because
filemap_dirty_folio marks the entire folio dirty, writeback writes the
whole folio to storage even if only a single page within it was
modified, which leads to write amplification. For example, if only one
4 KiB block of a 64 KiB (order-4) folio has changed, all 16 blocks are
still rewritten, a 16x amplification for that write.
For f2fs, a log-structured file system, this write amplification is
particularly concerning. It not only degrades writeback I/O
performance but also appends more data blocks to the log, increasing
the frequency of garbage collection and potentially shortening the
lifespan of the flash storage.
Unlike ext4, f2fs lacks a buffer_head-like per-block data structure.
Therefore, the block_dirty_folio approach used by ext4 to mitigate
this issue is not directly applicable to f2fs.
I have been considering potential solutions to address this. Two
approaches I've explored are:
1. Modify the f2fs dirty page writeback path to manually mark
individual sub-pages within a folio as dirty, rather than relying on
the folio-level dirty flag (a rough sketch of this idea follows
below).
2. Use the per-block dirty state tracking introduced in the iomap
framework in kernel 6.6, i.e. use the iomap_folio_state structure to
track the dirty status of each block within a folio.
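To make the first idea concrete, here is a minimal sketch of what I
have in mind; page_was_modified() is a hypothetical helper standing in
for whatever mechanism would detect a modified sub-page, so please
treat this as a discussion aid rather than a proposal:

	/*
	 * Sketch only: dirty each modified sub-page individually instead
	 * of relying on the folio-level dirty flag. page_was_modified()
	 * is a hypothetical predicate, not an existing kernel or f2fs
	 * helper.
	 */
	long i;

	for (i = 0; i < folio_nr_pages(folio); i++) {
		struct page *page = folio_page(folio, i);

		if (page_was_modified(page))
			SetPageDirty(page);
	}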
I would greatly appreciate it if you could share your insights and
perspectives on this issue and the proposed solutions. Any feedback or
alternative suggestions would be highly valuable.
Best regards.
* Re: [f2fs-dev] [DISCUSSION]: f2fs: Approaches to address write amplification in the current aops->dirty_folio
2025-03-30 2:38 [f2fs-dev] [DISCUSSION]: f2fs: Approaches to address write amplification in the current aops->dirty_folio Nanzhe Zhao
@ 2025-03-31 3:43 ` Matthew Wilcox
2025-04-01 14:17 ` Nanzhe Zhao
0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2025-03-31 3:43 UTC (permalink / raw)
To: Nanzhe Zhao; +Cc: jaegeuk, linux-f2fs-devel@lists.sourceforge.net
On Sun, Mar 30, 2025 at 10:38:37AM +0800, Nanzhe Zhao wrote:
> I have been considering potential solutions to address this. Two
> approaches I've explored are:
> Either modifying the f2fs dirty page writeback function to manually
> mark individual sub-pages within a folio as dirty, rather than relying
> on the folio-level dirty flag.
Just so you know, the per-page dirty flag is not in fact per page.
If you call SetPageDirty() on a tail page, it will set the dirty flag
on the head page (ie the same bit that is used by folio_set_dirty()).
This is intentional as we do not intend for there to be a per-page flags
field in the future.
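In code terms (illustration only; both calls already exist and end up
operating on the same head-page bit):

	SetPageDirty(tail_page);                 /* redirected to the head page */
	folio_set_dirty(page_folio(tail_page));  /* sets the same underlying bit */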
> Or utilizing the per-block dirty state tracking feature introduced in
> kernel 6.6 within the iomap framework. This would involve using the
> iomap_folio_state structure to track the dirty status of each block
> within a folio.
The challenge with that is that iomap does not support all the
functionality that f2fs requires. The iomap data structure could
be duplicated inside f2fs, but then we hit the problem that f2fs
currently stores other information in folio->private. So we'd need
to add a flags field to iomap_folio_state to store that information
instead.
See the part of f2fs.h from PAGE_PRIVATE_GET_FUNC to the end of
clear_page_private_all().
You're right that f2fs needs per-block dirty tracking if it is to
support large folios.
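For reference, a rough sketch of the iomap_folio_state extension
mentioned above; the existing fields are reproduced from memory and
may differ between kernel versions, and the flags member is the
hypothetical addition:

	struct iomap_folio_state {
		spinlock_t	state_lock;
		unsigned int	read_bytes_pending;
		atomic_t	write_bytes_pending;
		unsigned long	flags;		/* new: fs-private per-folio flags */
		unsigned long	state[];	/* per-block uptodate + dirty bitmaps */
	};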
* Re: [f2fs-dev] [DISCUSSION]: f2fs: Approaches to address write amplification in the current aops->dirty_folio
2025-03-31 3:43 ` Matthew Wilcox
@ 2025-04-01 14:17 ` Nanzhe Zhao
2025-04-02 3:10 ` Matthew Wilcox
0 siblings, 1 reply; 6+ messages in thread
From: Nanzhe Zhao @ 2025-04-01 14:17 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: jaegeuk@kernel.org, linux-f2fs-devel@lists.sourceforge.net
Thank you for your prompt and patient response!
>
> The challenge with that is that iomap does not support all the
> functionality that f2fs requires. The iomap data structure could
> be duplicated inside f2fs, but then we hit the problem that f2fs
> currently stores other information in folio->private. So we'd need
> to add a flags field to iomap_folio_state to store that information
> instead.
>
> See the part of f2fs.h from PAGE_PRIVATE_GET_FUNC to the end of
> clear_page_private_all().
Thank you for pointing out that piece of code. It helped me see
issues I had not noticed before and gave me several insights.
From studying how f2fs uses the private field of struct page, it
appears that when the field is not interpreted as a pointer, f2fs uses
it to store additional flag bits, effectively extending the page's
in-memory state flags. A key observation is that these functions seem
to apply to tail pages as well. Therefore, as you mentioned, if we are
using folios to manage multiple pages, it seems reasonable to add a
similar per-block field to the iomap_folio_state structure, analogous
to how it already tracks the uptodate and dirty state of each
sub-page; that would let us track these private flags per sub-page as
well. Perhaps it would be more appropriate to name this new structure
f2fs_folio_state, since I am currently unsure whether it would
interact with the existing iomap APIs or whether we would need to
develop f2fs-specific APIs for it. A very rough sketch of what I am
imagining is below.
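None of these names exist in f2fs or iomap today; the layout is purely
illustrative:

	/* Hypothetical per-folio state, attached via folio_attach_private(). */
	struct f2fs_folio_state {
		spinlock_t	state_lock;
		/* per-block uptodate + dirty bitmaps, as in iomap_folio_state */
		unsigned long	*state;
		/* one PAGE_PRIVATE_*-style flag word per block in the folio */
		unsigned long	block_private[];
	};

	/* Hypothetical accessor: set an f2fs private flag for a single block. */
	static void f2fs_folio_set_block_private(struct folio *folio,
						 unsigned int blk, int flag)
	{
		struct f2fs_folio_state *ffs = folio->private;

		set_bit(flag, &ffs->block_private[blk]);
	}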
>
> You're right that f2fs needs per-block dirty tracking if it is to
> support large folios.
I feel we need to consider more than just this aspect. Perhaps
because f2fs folio support is still in its early stages, my impression
is that the current f2fs folio implementation essentially just
replaces struct page at the interface level: each folio effectively
acts like a single page, in other words a folio of order 0.
I just performed a simple test:
static int open_and_read(struct file **file_ptr_ref, char *file_path, int flag,
			 char **read_buffer_ref, size_t read_size, loff_t read_pos)
{
	/* Open the file from inside the custom test module. */
	struct file *file_ptr = filp_open(file_path, flag, 0644);
	/* ... file_ptr error handling code ... */

	/* Intentionally allow large folio orders on this mapping, for testing. */
	mapping_set_large_folios(file_ptr->f_mapping);
	printk(KERN_EMERG "min folio order of file %s is %d",
	       file_path, mapping_min_folio_order(file_ptr->f_mapping));
	printk(KERN_EMERG "max folio order of file %s is %d",
	       file_path, mapping_max_folio_order(file_ptr->f_mapping));
	*file_ptr_ref = file_ptr;

	char *read_buffer = kmalloc(read_size, GFP_KERNEL);
	/* ... read_buffer error handling code ... */

	int bytes_read = kernel_read(file_ptr, read_buffer, read_size, &read_pos);
	/* ... bytes_read error handling code ... */

	*read_buffer_ref = read_buffer;
	return bytes_read;
}
In my custom module code, which I use for experiments and testing, I
used this function to attempt a buffered read of 512 * PAGE_SIZE bytes
from an f2fs file. The result was a crash from a null pointer
dereference. The reason is quite simple:
static int f2fs_mpage_readpages(struct inode *inode,
		struct readahead_control *rac, struct folio *folio)
{
	struct bio *bio = NULL;
	sector_t last_block_in_bio = 0;
	struct f2fs_map_blocks map;
#ifdef CONFIG_F2FS_FS_COMPRESSION
	/* ... compress ctx ... */
#endif
	unsigned nr_pages = rac ? readahead_count(rac) : 1;
	unsigned max_nr_pages = nr_pages;
	int ret = 0;

	/* ... init f2fs_map_blocks ... */

	for (; nr_pages; nr_pages--) {	/* Bug: decremented by only 1 per folio submitted */
		if (rac) {
			folio = readahead_folio(rac);	/* iterate to the next folio */
			prefetchw(&folio->flags);
		}
#ifdef CONFIG_F2FS_FS_COMPRESSION
		index = folio_index(folio);	/* Bug: NULL dereference once rac is exhausted */

		if (!f2fs_compressed_file(inode))
			goto read_single_page;

		/* ... compressed file page read logic ... */
		goto next_page;
read_single_page:
#endif
		ret = f2fs_read_single_page(inode, folio, max_nr_pages, &map,
					    &bio, &last_block_in_bio, rac);
		/* Bug: only the first page of the current folio is submitted for read */
		if (ret) {
#ifdef CONFIG_F2FS_FS_COMPRESSION
set_error_page:
#endif
			folio_zero_segment(folio, 0, folio_size(folio));
			folio_unlock(folio);
		}
#ifdef CONFIG_F2FS_FS_COMPRESSION
next_page:
		/* ... last page handling logic ... */
#endif
	}
	if (bio)
		f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA);
	return ret;
}
As you can see in f2fs_mpage_readpages, nr_pages is decremented by
only 1 for each folio processed in the loop. So once all of the folios
allocated in the page cache have been iterated through, nr_pages is
still non-zero and the loop continues; readahead_folio() then returns
NULL, which leads to the null pointer dereference at
index = folio_index(folio);. Furthermore, only the first page of each
folio is submitted for I/O; the remaining pages are never filled with
data from disk.
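A sketch of the direction a fix could take (untested, and it only
addresses the loop accounting; the read submission itself would still
need to cover every block of the folio, not just the first page):

	while (nr_pages) {
		if (rac) {
			folio = readahead_folio(rac);
			prefetchw(&folio->flags);
		}

		/* ... submit reads for every block of this folio ... */

		nr_pages -= folio_nr_pages(folio);	/* instead of nr_pages-- */
	}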
This isn't the only place. Actually, when I was previously studying
the implementation of f2fs_dirty_folio, I also noticed:
static bool f2fs_dirty_data_folio(struct address_space *mapping,
		struct folio *folio)
{
	/* ... other code ... */
	if (filemap_dirty_folio(mapping, folio)) {
		f2fs_update_dirty_folio(inode, folio);
		return true;
	}
	return false;
}

void f2fs_update_dirty_folio(struct inode *inode, struct folio *folio)
{
	/* ... other code ... */
	spin_lock(&sbi->inode_lock[type]);
	if (type != FILE_INODE || test_opt(sbi, DATA_FLUSH))
		__add_dirty_inode(inode, type);
	inode_inc_dirty_pages(inode);	/* only increments the inode's dirty page count by 1 */
	spin_unlock(&sbi->inode_lock[type]);

	set_page_private_reference(&folio->page);
}
In f2fs_update_dirty_folio, inode_inc_dirty_pages(inode) increments
the inode's dirty page count by only 1. This is confusing: it is
unclear why it does not increment by the number of pages in the folio
(assuming we are not yet tracking dirty state at the per-block level).
This further strengthens my suspicion that the current folio
implementation is effectively fixed at order 0. I have not checked
whether other pieces of code in f2fs share the same problem. A sketch
of the accounting change I have in mind is below.
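(inode_add_dirty_pages() is a hypothetical helper name, not an
existing f2fs function; it would bump the same counters that
inode_inc_dirty_pages() bumps today, just by folio_nr_pages(folio).)

	spin_lock(&sbi->inode_lock[type]);
	if (type != FILE_INODE || test_opt(sbi, DATA_FLUSH))
		__add_dirty_inode(inode, type);
	/* hypothetical: account every page of the folio, not just one */
	inode_add_dirty_pages(inode, folio_nr_pages(folio));
	spin_unlock(&sbi->inode_lock[type]);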
I am planning to prepare patches to address these issues and submit
them soon. I noticed you recently posted a large series of folio
patches, and I would like to debug and test on top of them. Would it
be possible for you to share your modified f2fs code directly, or
perhaps provide a link to your git repository? Manually copying and
applying so many patches from the mailing list would be quite
cumbersome.
* Re: [f2fs-dev] [DISCUSSION]: f2fs: Approaches to address write amplification in the current aops->dirty_folio
2025-04-01 14:17 ` Nanzhe Zhao
@ 2025-04-02 3:10 ` Matthew Wilcox
2025-04-05 3:10 ` Nanzhe Zhao
0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2025-04-02 3:10 UTC (permalink / raw)
To: Nanzhe Zhao; +Cc: jaegeuk@kernel.org, linux-f2fs-devel@lists.sourceforge.net
On Tue, Apr 01, 2025 at 10:17:42PM +0800, Nanzhe Zhao wrote:
> Based on my understanding after studying the code related to F2FS's
> use of the private field of the page structure, it appears that F2FS
> employs this field in a specific way. If the private field is not
> interpreted as a pointer, it seems it could be used to store
> additional flag bits. A key observation is that these functions seem
> to apply to tail pages as well. Therefore, as you mentioned, if we are
> using folios to manage multiple pages, it seems reasonable to consider
> adding a similar field within the iomap_folio_state structure. This
> would be analogous to how it currently tracks the uptodate and dirty
> states for each subpage, allowing us to track the state of these
> private fields for each subpage as well. Because it looks just like
> F2FS is utilizing the private field as a way to extend the various
> state flags of a page in memory. Perhaps it would be more appropriate
> to directly name this new structure f2fs_folio_state? This is because
> I'm currently unsure whether it will interact with existing iomap APIs
> or if we will need to develop F2FS-specific APIs for it.
At this point, f2fs has no concept of head/tail pages. Because it
doesn't tell the VFS that it can handle large folios, it will only see
order-0 pages. The page->private member will go away, so filesystems
cannot depend on being able to access it. They only get folio->private,
and it's recommended (but not required) that they use that to point to
their own private per-folio struct.
I do think the best approach is to extend iomap and then have f2fs use
iomap, but I appreciate that is several large jobs. It's worth it
because it completely insulates f2fs from having to deal with
pages/folios (except for metadata).
> > You're right that f2fs needs per-block dirty tracking if it is to
> > support large folios.
>
> I feel that we need to consider more than just this aspect. In fact,
> it might be because we are still in the early stages of F2FS folio
> support,so it leaves me the impression that the current F2FS folio
> implementation is essentially just replacing struct page at the
> interface level. It effectively acts just like a single page, or in
> other words, a folio of order 0.
Right, that's the current approach. We're taking it because the page
APIs are being removed. The f2fs developers have chosen to work on other
projects instead of supporting large folios (which is their right),
but they can't hold up the conversion of the entire filesystem stack
from pages to folios, so they're getting the minimal conversion and can
work on large folios when they have time.
> As you can see in f2fs_mpage_readpages, after each folio is processed
> in the loop, the nr_pages counter is only decremented by 1. Therefore,
> it's clear that when the allocated folios in the page cache are all
> iterated through, nr_pages still has remaining value, and the loop
> continues. This naturally leads to a segmentation fault at index =
> folio_index(folio); due to dereferencing a null pointer. Furthermore,
> only the first page of each folio is submitted for I/O; the remaining
> pages are not filled with data from disk.
Yes, there are lots of places in f2fs that assume a folio only has a
single page.
> I am planning to prepare patches to address these issues and submit
> them soon. I noticed you recently submitted a big bunch of patches on
> folio. I would like to debug and test based on your patch.Therefore, I
> was wondering if it would be possible for you to share your modified
> F2FS code directly, or perhaps provide a link to your Git repository?
> Manually copying and applying so many patches from the mailing list
> would be quite cumbersome.
Ah, you need a tool called b4. Your distro may have it packaged,
or you can get it from:
https://git.kernel.org/pub/scm/utils/b4/b4.git
* Re: [f2fs-dev] [DISCUSSION]: f2fs: Approaches to address write amplification in the current aops->dirty_folio
2025-04-02 3:10 ` Matthew Wilcox
@ 2025-04-05 3:10 ` Nanzhe Zhao
2025-04-08 10:14 ` Nanzhe Zhao
0 siblings, 1 reply; 6+ messages in thread
From: Nanzhe Zhao @ 2025-04-05 3:10 UTC (permalink / raw)
Cc: jaegeuk@kernel.org, v-songbaohua, linux-f2fs-devel
Thank you for your prompt and patient response!
>
> At this point, f2fs has no concept of head/tail pages. Because it
> doesn't tell the VFS that it can handle large folios, it will only see
> order-0 pages. The page->private member will go away, so filesystems
> cannot depend on being able to access it. They only get folio->private,
> and it's recommended (but not required) that they use that to point to
> their own private per-folio struct.
Yes, I understand that we should treat all the pages represented by a
folio as a whole. The folio structure itself acts as the head page,
and operations and flags applied to the folio effectively apply to all
pages within it, except for operations that need to track
page-specific attributes such as whether a page is dirty or uptodate.
My earlier concern was whether the special flags f2fs keeps in the
private field need to be tracked per page, since otherwise information
might be lost. Let me give a specific example.
PAGE_PRIVATE_ONGOING_MIGRATION indicates that a page is undergoing
block migration during garbage collection. Initially I worried about
what would happen if some pages in a folio were under garbage
collection while others were not, but after looking at how this flag
is used in the f2fs code, it seems sufficient for the folio's private
field to record that the folio as a whole is in the migration phase of
garbage collection. As for PAGE_PRIVATE_INLINE_INODE, the name alone
tells us it is only used for metadata pages, so for now we can simply
fix the folio order for metadata folios at 0 (sketch below).
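A minimal sketch of that last point, purely to illustrate where the
order-0 restriction for metadata could come from (whether this is the
right place to opt in is of course open for discussion):

	/* Only regular data mappings opt in to large folios; the meta and
	 * node mappings never call this, so their folios stay at order 0. */
	if (S_ISREG(inode->i_mode) && !f2fs_compressed_file(inode))
		mapping_set_large_folios(inode->i_mapping);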
> I do think the best approach is to extend iomap and then have f2fs use
> iomap, but I appreciate that is several large jobs. It's worth it
> because it completely insulates f2fs from having to deal with
> pages/folios (except for metadata)
Regarding iomap, I have several questions.
First of all, how should we define "having f2fs use iomap"? Does it
mean rewriting the address_space_operations using iomap-based APIs?
Take buffered read as a specific example: the main difference between
the traditional buffered read path and the iomap-based one is whether
aops->readahead goes through the traditional mpage-style code
(f2fs_mpage_readpages) or through iomap_readahead. If using iomap in
f2fs implies using iomap_readahead, does iomap_readahead support files
based on indirect pointers?
I believe this question is important. I recently looked at the
iomap-related buffered read code in XFS and in the ext4 large folios
patch set, and my conclusion is that the current iomap_readahead
implementation in the mainline kernel assumes the file's data blocks
are allocated as extents. (The author of the ext4 large folios patches
explicitly restricted the iomap buffered read path to extent-based
files.) It seems to lack support for files whose data blocks are
allocated through indirect pointers. I am not sure whether I have
missed something crucial or whether my understanding of iomap's
readahead logic is simply not deep enough, so I would like to confirm
this with you. The reason I ask is that f2fs is entirely based on
indirect pointers (with an additional layer of NAT indirection on
top). The concept of an "extent" for files simply does not exist in
f2fs (and f2fs's extent cache is not the same concept as an extent,
which might be a point of confusion). This is different from XFS,
Btrfs, and even ext4. If there are currently no iomap APIs that
support indirect pointers, then using iomap to support folios in f2fs
in the short term is almost completely infeasible. I also sent you an
email previously to discuss this matter:
https://lore.kernel.org/linux-f2fs-devel/CAMLCH1FThw2hH3pNm_dYxDPRbQ=mPxXAdzsgSXHpa4oBzK80CQ@mail.gmail.com/T/#t
I have listened to the Linux Foundation talk "Challenges and Ideas in
Transitioning EXT* and other FS to iomap", which mentioned that iomap
is being optimized for the mapping performance of indirect
pointer-based files. Are there currently any iomap patches that
address the handling of indirect pointer mappings?
Next, I would like to discuss the design of the extended iomap
structure. Assuming we make some extensions to the fields of
iomap_folio_state (for example, adding f2fs's page_private flags;
sorry, I have not fully worked out the specific design yet), we would
not be able to use iomap's various ifs_* helpers (such as ifs_alloc)
directly with the extended structure. Could we write a small
adaptation layer, i.e. adapter functions that handle the extended
iomap_folio_state and then delegate the rest of the work to iomap's
APIs?
If iomap indeed does not support indirect-pointer-based files, then as
for enabling large folio support in f2fs at this stage, I believe that
in the short term making f2fs's traditional buffered read path support
large folios would be a more appropriate and pragmatic interim
solution. (I have not yet studied the other address_space_operations
in depth, so let's put them aside for now.)
Furthermore, I think we may need to embed calls to iomap APIs and
iomap data structures inside these functions, for example using the
extended iomap_folio_state structure and related APIs directly in
f2fs_mpage_readpages. I understand that iomap was not designed for
this kind of usage, but it may be hard to avoid in the short term. To
illustrate: besides buffered I/O, the garbage collection process in
f2fs also generates a significant amount of I/O that goes through the
page cache, and garbage collection has its own APIs for interacting
with the page cache. Completely refactoring them to follow the
framework provided by iomap would also be challenging.
If this approach risks polluting the interfaces for a future migration
of f2fs to iomap, then I think our current focus could be prioritized
on enabling large folio support for f2fs's traditional buffered read
and buffered write paths, as well as garbage collection, using the
solution I proposed. This should not interfere with iomap, since iomap
uses a completely separate set of interfaces for buffered read and
buffered write. If you have a better solution, I would be very
grateful if you could share your insights.
> Ah, you need a tool called b4. Your distro may have it packaged,
> or you can get it from:
> https://git.kernel.org/pub/scm/utils/b4/b4.git
Thanks for the recommendation; I have learned a lot from this tool.
It seems, though, that when applying patches with the combination of
b4 am and git am, they sometimes do not apply cleanly: each series
tends to depend heavily on the author's own kernel tree and their
earlier patches. The ext4 large folio support series seems to be such
a case. So sometimes it is still necessary to resolve conflicts by
hand?
I apologize for the length of this reply. It also seems that this
discussion has drifted somewhat from the original subject of this
thread. If you think it would be better to start a new thread, please
let me know.
Best regards.
* Re: [f2fs-dev] [DISCUSSION]: f2fs: Approaches to address write amplification in the current aops->dirty_folio
2025-04-05 3:10 ` Nanzhe Zhao
@ 2025-04-08 10:14 ` Nanzhe Zhao
0 siblings, 0 replies; 6+ messages in thread
From: Nanzhe Zhao @ 2025-04-08 10:14 UTC (permalink / raw)
To: Matthew Wilcox
Cc: jaegeuk@kernel.org, v-songbaohua@oppo.com,
linux-f2fs-devel@lists.sourceforge.net
Dear Mr. Matthew,
I am writing to sincerely apologize for my significant
misunderstanding of iomap and f2fs in my previous email. I incorrectly
assumed that iomap was designed exclusively for extent-based file
systems. In fact, for indirect-pointer-based file systems, a mapped
data block can, in the worst case, simply be represented as an iomap
structure whose length is a single file system block. Furthermore,
after re-examining the f2fs_map_blocks function, I realized that f2fs
already attempts to merge contiguous file system blocks into a single
f2fs_map_blocks structure. Therefore, it is entirely feasible to
implement readahead for f2fs using iomap.
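To make sure I now understand the shape of it, here is a very rough
sketch of what such a mapping callback could look like; the function
name is hypothetical, the f2fs_map_blocks() call is elided, and the
field usage is from memory, so please read it as pseudocode rather
than a working implementation:

	static int f2fs_buffered_iomap_begin(struct inode *inode, loff_t pos,
			loff_t length, unsigned int flags, struct iomap *iomap,
			struct iomap *srcmap)
	{
		struct f2fs_map_blocks map = {};

		map.m_lblk = pos >> inode->i_blkbits;
		map.m_len = (length >> inode->i_blkbits) ? : 1;

		/* ... call f2fs_map_blocks() to resolve map.m_pblk / map.m_len ... */

		iomap->bdev = inode->i_sb->s_bdev;
		iomap->offset = (loff_t)map.m_lblk << inode->i_blkbits;
		iomap->length = (loff_t)map.m_len << inode->i_blkbits;
		if (map.m_flags & F2FS_MAP_MAPPED) {
			iomap->type = IOMAP_MAPPED;
			iomap->addr = (loff_t)map.m_pblk << inode->i_blkbits;
		} else {
			iomap->type = IOMAP_HOLE;
			iomap->addr = IOMAP_NULL_ADDR;
		}
		return 0;
	}

Even in the worst case of a fully fragmented, indirect-pointer-only
file, each call would simply return a one-block mapping, so extents
are not a prerequisite.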
> The concept of "extent" for files simply does not exist in f2fs. (And f2fs's extent cache is also not the same concept as extent, which might be a point of
> confusion).
I also want to apologize for my previous assertion that f2fs
"completely lacks extents, and its extent cache is different from
extents." It is true that f2fs does not organize file blocks with
extents in the direct manner of XFS and Btrfs, nor does it keep
ext4-style on-disk extent metadata. However, as mentioned above,
f2fs_map_blocks does attempt to merge consecutive file blocks into a
single mapping, and it caches that information in its extent cache,
which acts rather like building an extent tree "online". (I must admit
I am still not completely certain about this last point, as I have not
yet studied the f2fs extent cache code in depth.)
> Next, I would like to discuss the design of the extended iomap
> structure. Assuming we make some extensions to the fields of
> iomap_folio_state (for example, adding f2fs's page_private flags;
> sorry, I have not fully worked out the specific design yet), we would
> not be able to use iomap's various ifs_* helpers (such as ifs_alloc)
> directly with the extended structure. Could we write a small
> adaptation layer, i.e. adapter functions that handle the extended
> iomap_folio_state and then delegate the rest of the work to iomap's
> APIs?
Regarding the challenge of extending iomap_folio_state to address the
page->private issue in f2fs, I have been developing some initial
ideas, although they are not yet fully formed. I am going to open a
new thread on the f2fs mailing list in the near future to discuss this
further, and I would greatly appreciate it if you could follow the
discussion there.