From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: refactor how we prepare pages for btrfs_buffered_write()
Date: Thu, 13 Aug 2020 19:16:48 +0800 [thread overview]
Message-ID: <c0f45593-5733-29f1-e86e-cfa18091f470@suse.com> (raw)
In-Reply-To: <20200807072753.68285-1-wqu@suse.com>
On 2020/8/7 3:27 PM, Qu Wenruo wrote:
> When we prepare pages for a buffered write, we need to ensure that if
> we're partially dirtying a page, it is uptodate, so it keeps the old
> content read from disk.
>
> Pages that are completely inside the write range, however, don't need
> to be uptodate at all, so we can skip one page read for each of them.
>
> The old method uses the @force_page_uptodate flag, but in fact it
> doesn't make any sense:
> prepare_uptodate_page() already has a check to ensure a partially
> written page is always forced uptodate.
>
> So this patch refactors the mess, to make it easier to read by:
> - Remove the @force_page_uptodate bit
> It makes no difference at all
>
> - Remove the @pos and @force_uptodate for prepare_uptodate_page()
> Now the caller, prepare_pages(), will determine if it's necessary.
>
> - Add more comments for prepare_pages()
> This will explain why we don't need some pages to be uptodate.
>
> - Use page_offset() to be more user-friendly
> After find_or_create_page(), page_offset() will return the offset
> inside the address space, thus can be used directly to be more
> reader-friendly.
>
> The new code uses the following method to do such check:
>
>     full_dirty_page_start = round_up(pos, fs_info->sectorsize);
>     full_dirty_page_end = round_down(pos + write_bytes,
>                                      fs_info->sectorsize) - 1;
>     for (i = 0; i < num_pages; i++) {
>         ...
>         if (!err && !(page_offset >= full_dirty_page_start &&
>                       page_offset <= full_dirty_page_end))
>             err = prepare_uptodate_page(inode, pages[i]);
>     }
>
> Which can handle all the possible cases like:
> - Dirty range inside a single page
> || |///| ||
> Then @full_dirty_page_start > @full_dirty_page_end, so the
> covered-range check is always false and, as long as there is no
> earlier error, prepare_uptodate_page() is always called.
>
> - Dirty range across one page boundary
> || |///||//| ||
> Then @full_dirty_page_start == @full_dirty_page_end + 1, which behaves
> the same as the case above.
>
> - Dirty range covers a full page
> || |///||////////||//| ||
> Then only the 2nd page meets the condition and skips the
> prepare_uptodate_page() call.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Please discard this patch too.
It turns out that some forced uptodate operations are still needed; the
reason is inlined below.
There is still something useful here, like the check for fully covered
pages, but the patch breaks the original behavior for some very rare
iov combinations.
> ---
> fs/btrfs/file.c | 40 +++++++++++++++++++++++-----------------
> 1 file changed, 23 insertions(+), 17 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 841c516079a9..705ebe709e8d 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1378,13 +1378,11 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
> * on success we return a locked page and 0
> */
> static int prepare_uptodate_page(struct inode *inode,
> - struct page *page, u64 pos,
> - bool force_uptodate)
> + struct page *page)
> {
> int ret = 0;
>
> - if (((pos & (PAGE_SIZE - 1)) || force_uptodate) &&
> - !PageUptodate(page)) {
> + if (!PageUptodate(page)) {
> ret = btrfs_readpage(NULL, page);
> if (ret)
> return ret;
> @@ -1406,14 +1404,29 @@ static int prepare_uptodate_page(struct inode *inode,
> */
> static noinline int prepare_pages(struct inode *inode, struct page **pages,
> size_t num_pages, loff_t pos,
> - size_t write_bytes, bool force_uptodate)
> + size_t write_bytes)
> {
> - int i;
> + struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> unsigned long index = pos >> PAGE_SHIFT;
> gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
> + u64 full_dirty_page_start;
> + u64 full_dirty_page_end;
> int err = 0;
> int faili;
> + int i;
>
> + /*
> + * || = page boundary
> + *
> + * @pos @pos + write_bytes
> + * || |///||//////||/////||//| ||
> + * \ /
> + * In this range, we don't need the page to
> + * be uptodate at all.
> + */
> + full_dirty_page_start = round_up(pos, fs_info->sectorsize);
> + full_dirty_page_end = round_down(pos + write_bytes,
> + fs_info->sectorsize) - 1;
> for (i = 0; i < num_pages; i++) {
> again:
> pages[i] = find_or_create_page(inode->i_mapping, index + i,
> @@ -1424,12 +1437,9 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
> goto fail;
> }
>
> - if (i == 0)
> - err = prepare_uptodate_page(inode, pages[i], pos,
> - force_uptodate);
> - if (!err && i == num_pages - 1)
> - err = prepare_uptodate_page(inode, pages[i],
> - pos + write_bytes, false);
> + if (!err && !(page_offset(pages[i]) >= full_dirty_page_start &&
> + page_offset(pages[i]) <= full_dirty_page_end))
> + err = prepare_uptodate_page(inode, pages[i]);
> if (err) {
> put_page(pages[i]);
> if (err == -EAGAIN) {
> @@ -1638,7 +1648,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
> int nrptrs;
> int ret = 0;
> bool only_release_metadata = false;
> - bool force_page_uptodate = false;
>
> nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
> PAGE_SIZE / (sizeof(struct page *)));
> @@ -1727,8 +1736,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
> * contents of pages from loop to loop
> */
> ret = prepare_pages(inode, pages, num_pages,
> - pos, write_bytes,
> - force_page_uptodate);
> + pos, write_bytes);
> if (ret) {
> btrfs_delalloc_release_extents(BTRFS_I(inode),
> reserve_bytes);
> @@ -1763,11 +1771,9 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
> nrptrs = 1;
>
> if (copied == 0) {
> - force_page_uptodate = true;
This is here to prevent the following problem:

A page is completely covered by the write range, but the range inside
the page is split across two iovs. If the later iov fails to copy, we
hit a short write.

Originally we would set force_page_uptodate, then fall back to
nrptrs = 1 with force_page_uptodate = true. In the next loop we then
force a read of that full page, to avoid leaving garbage in the
partially written part.

This is indeed super rare, so none of my fstests runs has triggered it.
But in theory, as long as multiple iovs are supported, it is still
possible.
Thanks,
Qu
> dirty_sectors = 0;
> dirty_pages = 0;
> } else {
> - force_page_uptodate = false;
> dirty_pages = DIV_ROUND_UP(copied + offset,
> PAGE_SIZE);
> }
>
Thread overview: 3 messages
2020-08-07 7:27 [PATCH] btrfs: refactor how we prepare pages for btrfs_buffered_write() Qu Wenruo
2020-08-13 11:16 ` Qu Wenruo [this message]
2020-08-13 11:22 ` David Sterba