* [PATCH 0/2] btrfs: small cleanups to buffered write path
From: Qu Wenruo @ 2024-09-30  1:50 UTC
To: linux-btrfs

These two are small cleanups to the buffered write path, inspired by the
btrfs failure of test case generic/563 with 4K sector size and 64K page
size.

The root cause of that failure is that btrfs cannot avoid a full page
read when the dirty range covers some sectors of a page but not the full
page.  For example, with 4K sectors and 64K pages, a 4K write still
triggers a read of the whole 64K page.

This series is only the preparation part; we cannot yet skip the full
page read, or it would still lead to incorrect data.  The full fix needs
extra cooperation between the subpage read and write paths; until then
prepare_uptodate_page() still needs to read the full page.

Qu Wenruo (2):
  btrfs: remove the dirty_page local variable
  btrfs: simplify the page uptodate preparation for prepare_pages()

 fs/btrfs/file.c             | 87 +++++++++++++++++++------------------
 fs/btrfs/file.h             |  2 +-
 fs/btrfs/free-space-cache.c |  2 +-
 3 files changed, 46 insertions(+), 45 deletions(-)

-- 
2.46.1
* [PATCH 1/2] btrfs: remove the dirty_page local variable
From: Qu Wenruo @ 2024-09-30  1:50 UTC
To: linux-btrfs

Inside btrfs_buffered_write(), we have a local variable @dirty_pages,
recording the number of pages we dirtied in the current iteration.

However we do not really need that variable, since it can be calculated
from @pos and @copied.

In fact there is already a problem inside the short copy path, where we
use @dirty_pages to calculate the range we need to release.
But that usage assumes sectorsize == PAGE_SIZE, which is no longer true.

Instead of keeping @dirty_pages and causing incorrect usage, just
calculate the number of dirtied pages inside btrfs_dirty_pages().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c             | 19 +++++++------------
 fs/btrfs/file.h             |  2 +-
 fs/btrfs/free-space-cache.c |  2 +-
 3 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4fb521d91b06..9555a3485670 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -124,12 +124,14 @@ static void btrfs_drop_pages(struct btrfs_fs_info *fs_info,
  * - Update inode size for past EOF write
  */
 int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
-		      size_t num_pages, loff_t pos, size_t write_bytes,
+		      loff_t pos, size_t write_bytes,
 		      struct extent_state **cached, bool noreserve)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	int ret = 0;
 	int i;
+	const int num_pages = (round_up(pos + write_bytes, PAGE_SIZE) -
+			       round_down(pos, PAGE_SIZE)) >> PAGE_SHIFT;
 	u64 num_bytes;
 	u64 start_pos;
 	u64 end_of_last_block;
@@ -1242,7 +1244,6 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 					 offset);
 		size_t num_pages;
 		size_t reserve_bytes;
-		size_t dirty_pages;
 		size_t copied;
 		size_t dirty_sectors;
 		size_t num_sectors;
@@ -1361,11 +1362,8 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 		if (copied == 0) {
 			force_page_uptodate = true;
 			dirty_sectors = 0;
-			dirty_pages = 0;
 		} else {
 			force_page_uptodate = false;
-			dirty_pages = DIV_ROUND_UP(copied + offset,
-						   PAGE_SIZE);
 		}
 
 		if (num_sectors > dirty_sectors) {
@@ -1375,13 +1373,10 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 				btrfs_delalloc_release_metadata(BTRFS_I(inode),
 							release_bytes, true);
 			} else {
-				u64 __pos;
-
-				__pos = round_down(pos,
-						   fs_info->sectorsize) +
-					(dirty_pages << PAGE_SHIFT);
+				u64 release_start = round_up(pos + copied,
+							     fs_info->sectorsize);
 				btrfs_delalloc_release_space(BTRFS_I(inode),
-						data_reserved, __pos,
+						data_reserved, release_start,
 						release_bytes, true);
 			}
 		}
@@ -1390,7 +1385,7 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 				       fs_info->sectorsize);
 
 		ret = btrfs_dirty_pages(BTRFS_I(inode), pages,
-					dirty_pages, pos, copied,
+					pos, copied,
 					&cached_state, only_release_metadata);
 
 		/*
diff --git a/fs/btrfs/file.h b/fs/btrfs/file.h
index 912254e653cf..c23d0bf42598 100644
--- a/fs/btrfs/file.h
+++ b/fs/btrfs/file.h
@@ -35,7 +35,7 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 			    const struct btrfs_ioctl_encoded_io_args *encoded);
 int btrfs_release_file(struct inode *inode, struct file *file);
 int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
-		      size_t num_pages, loff_t pos, size_t write_bytes,
+		      loff_t pos, size_t write_bytes,
 		      struct extent_state **cached, bool noreserve);
 int btrfs_fdatawrite_range(struct btrfs_inode *inode, loff_t start, loff_t end);
 int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index f4bcb2530660..4a3a6db91878 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1458,7 +1458,7 @@ static int __btrfs_write_out_cache(struct inode *inode,
 
 	/* Everything is written out, now we dirty the pages in the file. */
 	ret = btrfs_dirty_pages(BTRFS_I(inode), io_ctl->pages,
-				io_ctl->num_pages, 0, i_size_read(inode),
+				0, i_size_read(inode),
 				&cached_state, false);
 	if (ret)
 		goto out_nospc;
-- 
2.46.1
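For a concrete feel of the two formulas above, here is a small standalone
userspace sketch (C, not kernel code); the 64K page / 4K sector geometry
and the sample pos/copied values are assumptions for illustration, not
values taken from the patch:

/*
 * Standalone userspace sketch.  ROUND_UP()/ROUND_DOWN() mirror the
 * kernel's round_up()/round_down() for power-of-two sizes.
 */
#include <stdio.h>

#define PAGE_SHIFT 16
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)   /* assumed 64K pages */
#define SECTORSIZE 4096ULL                /* assumed 4K sectors */
#define ROUND_DOWN(x, a)   ((x) & ~((a) - 1))
#define ROUND_UP(x, a)     ROUND_DOWN((x) + (a) - 1, (a))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
	unsigned long long pos = 8192;      /* sample write position */
	unsigned long long copied = 12288;  /* sample short-copy result */
	unsigned long long offset = pos & (PAGE_SIZE - 1);

	/* Page count derived from @pos and @copied, as the patch computes it. */
	unsigned long long num_pages =
		(ROUND_UP(pos + copied, PAGE_SIZE) -
		 ROUND_DOWN(pos, PAGE_SIZE)) >> PAGE_SHIFT;

	/* Old short-copy release start: page granularity. */
	unsigned long long dirty_pages = DIV_ROUND_UP(copied + offset, PAGE_SIZE);
	unsigned long long old_start =
		ROUND_DOWN(pos, SECTORSIZE) + (dirty_pages << PAGE_SHIFT);

	/* New release start: sector granularity. */
	unsigned long long new_start = ROUND_UP(pos + copied, SECTORSIZE);

	printf("num_pages=%llu old_start=%llu new_start=%llu\n",
	       num_pages, old_start, new_start);
	return 0;
}

With these inputs it prints num_pages=1 old_start=73728 new_start=20480:
the page-granularity release start lands well past the end of the
20480-byte dirtied range, while the sector-granularity form starts right
after the copied data.  With sectorsize == PAGE_SIZE the two agree, which
is exactly the assumption the old code baked in.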
* Re: [PATCH 1/2] btrfs: remove the dirty_page local variable
From: Josef Bacik @ 2024-09-30 18:01 UTC
To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Sep 30, 2024 at 11:20:29AM +0930, Qu Wenruo wrote:
> Inside btrfs_buffered_write(), we have a local variable @dirty_pages,
> recording the number of pages we dirtied in the current iteration.
>
> However we do not really need that variable, since it can be calculated
> from @pos and @copied.
>
> In fact there is already a problem inside the short copy path, where we
> use @dirty_pages to calculate the range we need to release.
> But that usage assumes sectorsize == PAGE_SIZE, which is no longer true.
>
> Instead of keeping @dirty_pages and causing incorrect usage, just
> calculate the number of dirtied pages inside btrfs_dirty_pages().
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
* [PATCH 2/2] btrfs: simplify the page uptodate preparation for prepare_pages()
From: Qu Wenruo @ 2024-09-30  1:50 UTC
To: linux-btrfs

Currently inside prepare_pages(), we handle the leading and trailing
pages differently, and skip the middle pages (if any).

This is to avoid reading pages which are fully covered by the dirty
range.

Refactor the code by moving all checks (alignment check, range check,
force read check) into prepare_uptodate_page(), so that prepare_pages()
only needs to iterate all the pages unconditionally.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 68 +++++++++++++++++++++++++++----------------------
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9555a3485670..cc8edf8da79d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -858,36 +858,46 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
  */
 static int prepare_uptodate_page(struct inode *inode,
 				 struct page *page, u64 pos,
-				 bool force_uptodate)
+				 u64 len, bool force_uptodate)
 {
 	struct folio *folio = page_folio(page);
+	u64 clamp_start = max_t(u64, pos, folio_pos(folio));
+	u64 clamp_end = min_t(u64, pos + len, folio_pos(folio) + folio_size(folio));
 	int ret = 0;
 
-	if (((pos & (PAGE_SIZE - 1)) || force_uptodate) &&
-	    !PageUptodate(page)) {
-		ret = btrfs_read_folio(NULL, folio);
-		if (ret)
-			return ret;
-		lock_page(page);
-		if (!PageUptodate(page)) {
-			unlock_page(page);
-			return -EIO;
-		}
+	if (PageUptodate(page))
+		return 0;
 
-		/*
-		 * Since btrfs_read_folio() will unlock the folio before it
-		 * returns, there is a window where btrfs_release_folio() can be
-		 * called to release the page.  Here we check both inode
-		 * mapping and PagePrivate() to make sure the page was not
-		 * released.
-		 *
-		 * The private flag check is essential for subpage as we need
-		 * to store extra bitmap using folio private.
-		 */
-		if (page->mapping != inode->i_mapping || !folio_test_private(folio)) {
-			unlock_page(page);
-			return -EAGAIN;
-		}
+	if (force_uptodate)
+		goto read;
+
+	/* The dirty range fully covers the page, no need to read it out. */
+	if (IS_ALIGNED(clamp_start, PAGE_SIZE) &&
+	    IS_ALIGNED(clamp_end, PAGE_SIZE))
+		return 0;
+read:
+	ret = btrfs_read_folio(NULL, folio);
+	if (ret)
+		return ret;
+	lock_page(page);
+	if (!PageUptodate(page)) {
+		unlock_page(page);
+		return -EIO;
+	}
+
+	/*
+	 * Since btrfs_read_folio() will unlock the folio before it
+	 * returns, there is a window where btrfs_release_folio() can be
+	 * called to release the page.  Here we check both inode
+	 * mapping and PagePrivate() to make sure the page was not
+	 * released.
+	 *
+	 * The private flag check is essential for subpage as we need
+	 * to store extra bitmap using folio private.
+	 */
+	if (page->mapping != inode->i_mapping || !folio_test_private(folio)) {
+		unlock_page(page);
+		return -EAGAIN;
 	}
 	return 0;
 }
@@ -949,12 +959,8 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 			goto fail;
 		}
 
-		if (i == 0)
-			ret = prepare_uptodate_page(inode, pages[i], pos,
-						    force_uptodate);
-		if (!ret && i == num_pages - 1)
-			ret = prepare_uptodate_page(inode, pages[i],
-						    pos + write_bytes, false);
+		ret = prepare_uptodate_page(inode, pages[i], pos, write_bytes,
+					    force_uptodate);
 		if (ret) {
 			put_page(pages[i]);
 			if (!nowait && ret == -EAGAIN) {
-- 
2.46.1
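As a sanity check of the clamp logic, here is a standalone userspace
sketch (C, not kernel code).  It assumes single-page folios, so
folio_size() == PAGE_SIZE; the 64K geometry and the sample write range
are made up for illustration:

/* Standalone sketch of the per-folio "do we need to read" decision. */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE 65536ULL
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* True when the folio starting at @fpos must be read before the write. */
static bool needs_read(unsigned long long fpos,
		       unsigned long long pos, unsigned long long len)
{
	/* Clamp the dirty range [pos, pos + len) to this folio. */
	unsigned long long clamp_start = MAX(pos, fpos);
	unsigned long long clamp_end = MIN(pos + len, fpos + PAGE_SIZE);

	/* A fully covered folio needs no read. */
	return !(IS_ALIGNED(clamp_start, PAGE_SIZE) &&
		 IS_ALIGNED(clamp_end, PAGE_SIZE));
}

int main(void)
{
	/* A write from 8K to 136K spanning three 64K folios. */
	unsigned long long pos = 8192, len = 131072;

	for (unsigned long long fpos = 0; fpos < 3 * PAGE_SIZE; fpos += PAGE_SIZE)
		printf("folio @%6llu: %s\n", fpos,
		       needs_read(fpos, pos, len) ? "read" : "skip");
	return 0;
}

Only the head and tail folios come out as "read"; the middle folio is
fully covered and skipped.  That is what allows prepare_pages() to call
prepare_uptodate_page() for every page unconditionally instead of
special-casing the first and last page.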
* Re: [PATCH 2/2] btrfs: simplify the page uptodate preparation for prepare_pages()
From: Josef Bacik @ 2024-09-30 18:04 UTC
To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Sep 30, 2024 at 11:20:30AM +0930, Qu Wenruo wrote:
> Currently inside prepare_pages(), we handle the leading and trailing
> pages differently, and skip the middle pages (if any).
>
> This is to avoid reading pages which are fully covered by the dirty
> range.
>
> Refactor the code by moving all checks (alignment check, range check,
> force read check) into prepare_uptodate_page(), so that prepare_pages()
> only needs to iterate all the pages unconditionally.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/file.c | 68 +++++++++++++++++++++++++++----------------------
>  1 file changed, 37 insertions(+), 31 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 9555a3485670..cc8edf8da79d 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -858,36 +858,46 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
>   */
>  static int prepare_uptodate_page(struct inode *inode,
>  				 struct page *page, u64 pos,
> -				 bool force_uptodate)
> +				 u64 len, bool force_uptodate)
>  {
>  	struct folio *folio = page_folio(page);
> +	u64 clamp_start = max_t(u64, pos, folio_pos(folio));
> +	u64 clamp_end = min_t(u64, pos + len, folio_pos(folio) + folio_size(folio));
>  	int ret = 0;
>
> -	if (((pos & (PAGE_SIZE - 1)) || force_uptodate) &&
> -	    !PageUptodate(page)) {
> -		ret = btrfs_read_folio(NULL, folio);
> -		if (ret)
> -			return ret;
> -		lock_page(page);
> -		if (!PageUptodate(page)) {
> -			unlock_page(page);
> -			return -EIO;
> -		}
> +	if (PageUptodate(page))
> +		return 0;
>
> -		/*
> -		 * Since btrfs_read_folio() will unlock the folio before it
> -		 * returns, there is a window where btrfs_release_folio() can be
> -		 * called to release the page.  Here we check both inode
> -		 * mapping and PagePrivate() to make sure the page was not
> -		 * released.
> -		 *
> -		 * The private flag check is essential for subpage as we need
> -		 * to store extra bitmap using folio private.
> -		 */
> -		if (page->mapping != inode->i_mapping || !folio_test_private(folio)) {
> -			unlock_page(page);
> -			return -EAGAIN;
> -		}
> +	if (force_uptodate)
> +		goto read;
> +
> +	/* The dirty range fully covers the page, no need to read it out. */
> +	if (IS_ALIGNED(clamp_start, PAGE_SIZE) &&
> +	    IS_ALIGNED(clamp_end, PAGE_SIZE))
> +		return 0;
> +read:
> +	ret = btrfs_read_folio(NULL, folio);
> +	if (ret)
> +		return ret;
> +	lock_page(page);
> +	if (!PageUptodate(page)) {
> +		unlock_page(page);
> +		return -EIO;
> +	}
> +
> +	/*
> +	 * Since btrfs_read_folio() will unlock the folio before it
> +	 * returns, there is a window where btrfs_release_folio() can be
> +	 * called to release the page.  Here we check both inode
> +	 * mapping and PagePrivate() to make sure the page was not
> +	 * released.
> +	 *
> +	 * The private flag check is essential for subpage as we need
> +	 * to store extra bitmap using folio private.
> +	 */
> +	if (page->mapping != inode->i_mapping || !folio_test_private(folio)) {
> +		unlock_page(page);
> +		return -EAGAIN;

Since you're reworking it anyway, can you go ahead and convert this to
only use the folio?

Thanks,

Josef
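A rough sketch of what that folio-only conversion might look like for the
read-and-recheck tail follows.  This is an assumption about the shape of
a possible v2, not the committed code; folio_lock(), folio_test_uptodate()
and folio->mapping are the existing folio counterparts of the page-based
accessors used above:

	/* Hypothetical folio-only tail of prepare_uptodate_page() (sketch). */
	ret = btrfs_read_folio(NULL, folio);
	if (ret)
		return ret;
	folio_lock(folio);
	if (!folio_test_uptodate(folio)) {
		folio_unlock(folio);
		return -EIO;
	}
	/* Same release re-check as before, going through the folio only. */
	if (folio->mapping != inode->i_mapping || !folio_test_private(folio)) {
		folio_unlock(folio);
		return -EAGAIN;
	}
	return 0;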
* Re: [PATCH 2/2] btrfs: simplify the page uptodate preparation for prepare_pages()
From: David Sterba @ 2024-10-01 15:20 UTC
To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Sep 30, 2024 at 11:20:30AM +0930, Qu Wenruo wrote:
> +	if (force_uptodate)
> +		goto read;
> +
> +	/* The dirty range fully covers the page, no need to read it out. */
> +	if (IS_ALIGNED(clamp_start, PAGE_SIZE) &&
> +	    IS_ALIGNED(clamp_end, PAGE_SIZE))
> +		return 0;
> +read:
> +	ret = btrfs_read_folio(NULL, folio);

Please avoid this pattern of an if and a goto where it's simply jumping
around a block and the goto label is used only once.
* Re: [PATCH 2/2] btrfs: simplify the page uptodate preparation for prepare_pages()
From: Qu Wenruo @ 2024-10-01 21:09 UTC
To: dsterba, Qu Wenruo; +Cc: linux-btrfs

On 2024/10/2 00:50, David Sterba wrote:
> On Mon, Sep 30, 2024 at 11:20:30AM +0930, Qu Wenruo wrote:
>> +	if (force_uptodate)
>> +		goto read;
>> +
>> +	/* The dirty range fully covers the page, no need to read it out. */
>> +	if (IS_ALIGNED(clamp_start, PAGE_SIZE) &&
>> +	    IS_ALIGNED(clamp_end, PAGE_SIZE))
>> +		return 0;
>> +read:
>> +	ret = btrfs_read_folio(NULL, folio);
>
> Please avoid this pattern of an if and a goto where it's simply jumping
> around a block and the goto label is used only once.

Any better alternative?

Thanks,
Qu
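For reference, one goto-free shape the skip check could take is to fold
force_uptodate into the alignment condition.  This is a sketch only;
whether a v2 of the patch ends up looking like this is an open question:

	if (PageUptodate(page))
		return 0;

	/*
	 * Read unless the dirty range fully covers the page and the
	 * caller did not force the read.
	 */
	if (!force_uptodate &&
	    IS_ALIGNED(clamp_start, PAGE_SIZE) &&
	    IS_ALIGNED(clamp_end, PAGE_SIZE))
		return 0;

	ret = btrfs_read_folio(NULL, folio);
	/* ... then lock and re-check as in the patch ... */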
end of thread, other threads:[~2024-10-01 21:09 UTC | newest]

Thread overview: 7+ messages
2024-09-30  1:50 [PATCH 0/2] btrfs: small cleanups to buffered write path Qu Wenruo
2024-09-30  1:50 ` [PATCH 1/2] btrfs: remove the dirty_page local variable Qu Wenruo
2024-09-30 18:01   ` Josef Bacik
2024-09-30  1:50 ` [PATCH 2/2] btrfs: simplify the page uptodate preparation for prepare_pages() Qu Wenruo
2024-09-30 18:04   ` Josef Bacik
2024-10-01 15:20   ` David Sterba
2024-10-01 21:09     ` Qu Wenruo