* [PATCH 0/2] btrfs: small cleanups to buffered write path
@ 2024-09-30 1:50 Qu Wenruo
2024-09-30 1:50 ` [PATCH 1/2] btrfs: remove the dirty_page local variable Qu Wenruo
2024-09-30 1:50 ` [PATCH 2/2] btrfs: simplify the page uptodate preparation for prepare_pages() Qu Wenruo
0 siblings, 2 replies; 7+ messages in thread
From: Qu Wenruo @ 2024-09-30 1:50 UTC (permalink / raw)
To: linux-btrfs
These are two small cleanups to the buffered write path, inspired by the
btrfs failure of generic/563 with 4K sector size and 64K page size.
The root cause of that failure is that btrfs cannot avoid a full page
read when the dirty range covers some sectors of a page, but not the
full page.
This is only the preparation part; we cannot yet skip the full page
read, or it would still lead to incorrect data.
The full fix needs extra cooperation between the subpage read and write
paths; until then, prepare_uptodate_page() still needs to read the full
page.
Qu Wenruo (2):
btrfs: remove the dirty_page local variable
btrfs: simplify the page uptodate preparation for prepare_pages()
fs/btrfs/file.c | 87 +++++++++++++++++++------------------
fs/btrfs/file.h | 2 +-
fs/btrfs/free-space-cache.c | 2 +-
3 files changed, 46 insertions(+), 45 deletions(-)
--
2.46.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] btrfs: remove the dirty_page local variable
2024-09-30 1:50 [PATCH 0/2] btrfs: small cleanups to buffered write path Qu Wenruo
@ 2024-09-30 1:50 ` Qu Wenruo
2024-09-30 18:01 ` Josef Bacik
2024-09-30 1:50 ` [PATCH 2/2] btrfs: simplify the page uptodate preparation for prepare_pages() Qu Wenruo
1 sibling, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2024-09-30 1:50 UTC (permalink / raw)
To: linux-btrfs
Inside btrfs_buffered_write(), we have a local variable @dirty_pages,
recording the number of pages we dirtied in the current iteration.
However we do not really need that variable, since it can be calculated
from @pos and @copied.
In fact there is already a problem inside the short copy path, where we
use @dirty_pages to calculate the range we need to release.
But that usage assumes sectorsize == PAGE_SIZE, which is no longer true.
Instead of keeping @dirty_pages and causing incorrect usage, just
calculate the number of dirtied pages inside btrfs_dirty_pages().
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/file.c | 19 +++++++------------
fs/btrfs/file.h | 2 +-
fs/btrfs/free-space-cache.c | 2 +-
3 files changed, 9 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4fb521d91b06..9555a3485670 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -124,12 +124,14 @@ static void btrfs_drop_pages(struct btrfs_fs_info *fs_info,
* - Update inode size for past EOF write
*/
int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
- size_t num_pages, loff_t pos, size_t write_bytes,
+ loff_t pos, size_t write_bytes,
struct extent_state **cached, bool noreserve)
{
struct btrfs_fs_info *fs_info = inode->root->fs_info;
int ret = 0;
int i;
+ const int num_pages = (round_up(pos + write_bytes, PAGE_SIZE) -
+ round_down(pos, PAGE_SIZE)) >> PAGE_SHIFT;
u64 num_bytes;
u64 start_pos;
u64 end_of_last_block;
@@ -1242,7 +1244,6 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
offset);
size_t num_pages;
size_t reserve_bytes;
- size_t dirty_pages;
size_t copied;
size_t dirty_sectors;
size_t num_sectors;
@@ -1361,11 +1362,8 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
if (copied == 0) {
force_page_uptodate = true;
dirty_sectors = 0;
- dirty_pages = 0;
} else {
force_page_uptodate = false;
- dirty_pages = DIV_ROUND_UP(copied + offset,
- PAGE_SIZE);
}
if (num_sectors > dirty_sectors) {
@@ -1375,13 +1373,10 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
btrfs_delalloc_release_metadata(BTRFS_I(inode),
release_bytes, true);
} else {
- u64 __pos;
-
- __pos = round_down(pos,
- fs_info->sectorsize) +
- (dirty_pages << PAGE_SHIFT);
+ u64 release_start = round_up(pos + copied,
+ fs_info->sectorsize);
btrfs_delalloc_release_space(BTRFS_I(inode),
- data_reserved, __pos,
+ data_reserved, release_start,
release_bytes, true);
}
}
@@ -1390,7 +1385,7 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
fs_info->sectorsize);
ret = btrfs_dirty_pages(BTRFS_I(inode), pages,
- dirty_pages, pos, copied,
+ pos, copied,
&cached_state, only_release_metadata);
/*
diff --git a/fs/btrfs/file.h b/fs/btrfs/file.h
index 912254e653cf..c23d0bf42598 100644
--- a/fs/btrfs/file.h
+++ b/fs/btrfs/file.h
@@ -35,7 +35,7 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
const struct btrfs_ioctl_encoded_io_args *encoded);
int btrfs_release_file(struct inode *inode, struct file *file);
int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
- size_t num_pages, loff_t pos, size_t write_bytes,
+ loff_t pos, size_t write_bytes,
struct extent_state **cached, bool noreserve);
int btrfs_fdatawrite_range(struct btrfs_inode *inode, loff_t start, loff_t end);
int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index f4bcb2530660..4a3a6db91878 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1458,7 +1458,7 @@ static int __btrfs_write_out_cache(struct inode *inode,
/* Everything is written out, now we dirty the pages in the file. */
ret = btrfs_dirty_pages(BTRFS_I(inode), io_ctl->pages,
- io_ctl->num_pages, 0, i_size_read(inode),
+ 0, i_size_read(inode),
&cached_state, false);
if (ret)
goto out_nospc;
--
2.46.1
* [PATCH 2/2] btrfs: simplify the page uptodate preparation for prepare_pages()
2024-09-30 1:50 [PATCH 0/2] btrfs: small cleanups to buffered write path Qu Wenruo
2024-09-30 1:50 ` [PATCH 1/2] btrfs: remove the dirty_page local variable Qu Wenruo
@ 2024-09-30 1:50 ` Qu Wenruo
2024-09-30 18:04 ` Josef Bacik
2024-10-01 15:20 ` David Sterba
1 sibling, 2 replies; 7+ messages in thread
From: Qu Wenruo @ 2024-09-30 1:50 UTC (permalink / raw)
To: linux-btrfs
Currently inside prepare_pages(), we handle the leading and trailing
pages differently, and skip the middle pages (if any).
This is to avoid reading pages which are fully covered by the dirty
range.
Refactor the code by moving all the checks (alignment check, range
check, force read check) into prepare_uptodate_page(), so that
prepare_pages() only needs to iterate over all the pages
unconditionally.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/file.c | 68 +++++++++++++++++++++++++++----------------------
1 file changed, 37 insertions(+), 31 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9555a3485670..cc8edf8da79d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -858,36 +858,46 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
*/
static int prepare_uptodate_page(struct inode *inode,
struct page *page, u64 pos,
- bool force_uptodate)
+ u64 len, bool force_uptodate)
{
struct folio *folio = page_folio(page);
+ u64 clamp_start = max_t(u64, pos, folio_pos(folio));
+ u64 clamp_end = min_t(u64, pos + len, folio_pos(folio) + folio_size(folio));
int ret = 0;
- if (((pos & (PAGE_SIZE - 1)) || force_uptodate) &&
- !PageUptodate(page)) {
- ret = btrfs_read_folio(NULL, folio);
- if (ret)
- return ret;
- lock_page(page);
- if (!PageUptodate(page)) {
- unlock_page(page);
- return -EIO;
- }
+ if (PageUptodate(page))
+ return 0;
- /*
- * Since btrfs_read_folio() will unlock the folio before it
- * returns, there is a window where btrfs_release_folio() can be
- * called to release the page. Here we check both inode
- * mapping and PagePrivate() to make sure the page was not
- * released.
- *
- * The private flag check is essential for subpage as we need
- * to store extra bitmap using folio private.
- */
- if (page->mapping != inode->i_mapping || !folio_test_private(folio)) {
- unlock_page(page);
- return -EAGAIN;
- }
+ if (force_uptodate)
+ goto read;
+
+ /* The dirty range fully cover the page, no need to read it out. */
+ if (IS_ALIGNED(clamp_start, PAGE_SIZE) &&
+ IS_ALIGNED(clamp_end, PAGE_SIZE))
+ return 0;
+read:
+ ret = btrfs_read_folio(NULL, folio);
+ if (ret)
+ return ret;
+ lock_page(page);
+ if (!PageUptodate(page)) {
+ unlock_page(page);
+ return -EIO;
+ }
+
+ /*
+ * Since btrfs_read_folio() will unlock the folio before it
+ * returns, there is a window where btrfs_release_folio() can be
+ * called to release the page. Here we check both inode
+ * mapping and PagePrivate() to make sure the page was not
+ * released.
+ *
+ * The private flag check is essential for subpage as we need
+ * to store extra bitmap using folio private.
+ */
+ if (page->mapping != inode->i_mapping || !folio_test_private(folio)) {
+ unlock_page(page);
+ return -EAGAIN;
}
return 0;
}
@@ -949,12 +959,8 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
goto fail;
}
- if (i == 0)
- ret = prepare_uptodate_page(inode, pages[i], pos,
- force_uptodate);
- if (!ret && i == num_pages - 1)
- ret = prepare_uptodate_page(inode, pages[i],
- pos + write_bytes, false);
+ ret = prepare_uptodate_page(inode, pages[i], pos, write_bytes,
+ force_uptodate);
if (ret) {
put_page(pages[i]);
if (!nowait && ret == -EAGAIN) {
--
2.46.1
* Re: [PATCH 1/2] btrfs: remove the dirty_page local variable
2024-09-30 1:50 ` [PATCH 1/2] btrfs: remove the dirty_page local variable Qu Wenruo
@ 2024-09-30 18:01 ` Josef Bacik
0 siblings, 0 replies; 7+ messages in thread
From: Josef Bacik @ 2024-09-30 18:01 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Mon, Sep 30, 2024 at 11:20:29AM +0930, Qu Wenruo wrote:
> Inside btrfs_buffered_write(), we have a local variable @dirty_pages,
> recording the number of pages we dirtied in the current iteration.
>
> However we do not really need that variable, since it can be calculated
> from @pos and @copied.
>
> In fact there is already a problem inside the short copy path, where we
> use @dirty_pages to calculate the range we need to release.
> But that usage assumes sectorsize == PAGE_SIZE, which is no longer true.
>
> Instead of keeping @dirty_pages and causing incorrect usage, just
> calculate the number of dirtied pages inside btrfs_dirty_pages().
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
* Re: [PATCH 2/2] btrfs: simplify the page uptodate preparation for prepare_pages()
2024-09-30 1:50 ` [PATCH 2/2] btrfs: simplify the page uptodate preparation for prepare_pages() Qu Wenruo
@ 2024-09-30 18:04 ` Josef Bacik
2024-10-01 15:20 ` David Sterba
1 sibling, 0 replies; 7+ messages in thread
From: Josef Bacik @ 2024-09-30 18:04 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Mon, Sep 30, 2024 at 11:20:30AM +0930, Qu Wenruo wrote:
> Currently inside prepare_pages(), we handle the leading and trailing
> pages differently, and skip the middle pages (if any).
>
> This is to avoid reading pages which are fully covered by the dirty
> range.
>
> Refactor the code by moving all the checks (alignment check, range
> check, force read check) into prepare_uptodate_page(), so that
> prepare_pages() only needs to iterate over all the pages
> unconditionally.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/file.c | 68 +++++++++++++++++++++++++++----------------------
> 1 file changed, 37 insertions(+), 31 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 9555a3485670..cc8edf8da79d 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -858,36 +858,46 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
> */
> static int prepare_uptodate_page(struct inode *inode,
> struct page *page, u64 pos,
> - bool force_uptodate)
> + u64 len, bool force_uptodate)
> {
> struct folio *folio = page_folio(page);
> + u64 clamp_start = max_t(u64, pos, folio_pos(folio));
> + u64 clamp_end = min_t(u64, pos + len, folio_pos(folio) + folio_size(folio));
> int ret = 0;
>
> - if (((pos & (PAGE_SIZE - 1)) || force_uptodate) &&
> - !PageUptodate(page)) {
> - ret = btrfs_read_folio(NULL, folio);
> - if (ret)
> - return ret;
> - lock_page(page);
> - if (!PageUptodate(page)) {
> - unlock_page(page);
> - return -EIO;
> - }
> + if (PageUptodate(page))
> + return 0;
>
> - /*
> - * Since btrfs_read_folio() will unlock the folio before it
> - * returns, there is a window where btrfs_release_folio() can be
> - * called to release the page. Here we check both inode
> - * mapping and PagePrivate() to make sure the page was not
> - * released.
> - *
> - * The private flag check is essential for subpage as we need
> - * to store extra bitmap using folio private.
> - */
> - if (page->mapping != inode->i_mapping || !folio_test_private(folio)) {
> - unlock_page(page);
> - return -EAGAIN;
> - }
> + if (force_uptodate)
> + goto read;
> +
> + /* The dirty range fully cover the page, no need to read it out. */
> + if (IS_ALIGNED(clamp_start, PAGE_SIZE) &&
> + IS_ALIGNED(clamp_end, PAGE_SIZE))
> + return 0;
> +read:
> + ret = btrfs_read_folio(NULL, folio);
> + if (ret)
> + return ret;
> + lock_page(page);
> + if (!PageUptodate(page)) {
> + unlock_page(page);
> + return -EIO;
> + }
> +
> + /*
> + * Since btrfs_read_folio() will unlock the folio before it
> + * returns, there is a window where btrfs_release_folio() can be
> + * called to release the page. Here we check both inode
> + * mapping and PagePrivate() to make sure the page was not
> + * released.
> + *
> + * The private flag check is essential for subpage as we need
> + * to store extra bitmap using folio private.
> + */
> + if (page->mapping != inode->i_mapping || !folio_test_private(folio)) {
> + unlock_page(page);
> + return -EAGAIN;
Since you're reworking it anyway, can you go ahead and convert this to
only use the folio? Thanks,
Josef
* Re: [PATCH 2/2] btrfs: simplify the page uptodate preparation for prepare_pages()
2024-09-30 1:50 ` [PATCH 2/2] btrfs: simplify the page uptodate preparation for prepare_pages() Qu Wenruo
2024-09-30 18:04 ` Josef Bacik
@ 2024-10-01 15:20 ` David Sterba
2024-10-01 21:09 ` Qu Wenruo
1 sibling, 1 reply; 7+ messages in thread
From: David Sterba @ 2024-10-01 15:20 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Mon, Sep 30, 2024 at 11:20:30AM +0930, Qu Wenruo wrote:
> + if (force_uptodate)
> + goto read;
> +
> + /* The dirty range fully cover the page, no need to read it out. */
> + if (IS_ALIGNED(clamp_start, PAGE_SIZE) &&
> + IS_ALIGNED(clamp_end, PAGE_SIZE))
> + return 0;
> +read:
> + ret = btrfs_read_folio(NULL, folio);
Please avoid this pattern of an if and goto where it's simply jumping
around a block and the goto label is used only once.
* Re: [PATCH 2/2] btrfs: simplify the page uptodate preparation for prepare_pages()
2024-10-01 15:20 ` David Sterba
@ 2024-10-01 21:09 ` Qu Wenruo
0 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2024-10-01 21:09 UTC (permalink / raw)
To: dsterba, Qu Wenruo; +Cc: linux-btrfs
On 2024/10/2 00:50, David Sterba wrote:
> On Mon, Sep 30, 2024 at 11:20:30AM +0930, Qu Wenruo wrote:
>> + if (force_uptodate)
>> + goto read;
>> +
>> + /* The dirty range fully cover the page, no need to read it out. */
>> + if (IS_ALIGNED(clamp_start, PAGE_SIZE) &&
>> + IS_ALIGNED(clamp_end, PAGE_SIZE))
>> + return 0;
>> +read:
>> + ret = btrfs_read_folio(NULL, folio);
>
> Please avoid this pattern of an if and goto where it's simply jumping
> around a block and the goto label is used only once.
>
Any better alternative?
Thanks,
Qu