From: Boris Burkov <boris@bur.io>
To: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: linux-btrfs@vger.kernel.org, Goldwyn Rodrigues <rgoldwyn@suse.com>
Subject: Re: [PATCH 11/21] btrfs: locking extents for async writeback
Date: Wed, 8 Mar 2023 11:13:40 -0800 [thread overview]
Message-ID: <20230308191340.GB30177@zen> (raw)
In-Reply-To: <2aecd82ddb64139b04ccc51c5e625a7e6b3dc08e.1677793433.git.rgoldwyn@suse.com>
On Thu, Mar 02, 2023 at 04:24:56PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> For async writebacks, lock the extents and then perform the cow file
> range for async. Unlock when async_chunk is free'd.
>
> Since writeback is performed in range, so locked_page can be removed
> from the structures and function parameters. Similarly for page_started
> and nr_written.
>
> A writeback could involve a hole, so check if the range locked covers
> the entire extent returned by find_lock_delalloc_range().
> If not try to lock the entire range or unlock the pages locked.
>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> ---
> fs/btrfs/compression.c | 4 +
> fs/btrfs/extent_io.c | 10 +--
> fs/btrfs/extent_io.h | 2 +
> fs/btrfs/inode.c | 184 ++++++++++++++++++-----------------------
> 4 files changed, 92 insertions(+), 108 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index b0dd01e31078..a8fa7f2049ce 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -1424,6 +1424,10 @@ static void heuristic_collect_sample(struct inode *inode, u64 start, u64 end,
> curr_sample_pos = 0;
> while (index < index_end) {
> page = find_get_page(inode->i_mapping, index);
> + if (!page) {
> + index++;
> + continue;
> + }
> in_data = kmap_local_page(page);
> /* Handle case where the start is not aligned to PAGE_SIZE */
> i = start % PAGE_SIZE;
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index cdce2db82d7e..12aa7eaf12c5 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -358,7 +358,7 @@ static int __process_pages_contig(struct address_space *mapping,
> return err;
> }
>
> -static noinline void __unlock_for_delalloc(struct inode *inode,
> +noinline void __unlock_for_delalloc(struct inode *inode,
> struct page *locked_page,
> u64 start, u64 end)
> {
> @@ -383,8 +383,7 @@ static noinline int lock_delalloc_pages(struct inode *inode,
> u64 processed_end = delalloc_start;
> int ret;
>
> - ASSERT(locked_page);
> - if (index == locked_page->index && index == end_index)
> + if (locked_page && index == locked_page->index && index == end_index)
> return 0;
>
> ret = __process_pages_contig(inode->i_mapping, locked_page, delalloc_start,
> @@ -432,8 +431,9 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
> ASSERT(orig_end > orig_start);
>
> /* The range should at least cover part of the page */
> - ASSERT(!(orig_start >= page_offset(locked_page) + PAGE_SIZE ||
> - orig_end <= page_offset(locked_page)));
> + if (locked_page)
> + ASSERT(!(orig_start >= page_offset(locked_page) + PAGE_SIZE ||
> + orig_end <= page_offset(locked_page)));
> again:
> /* step one, find a bunch of delalloc bytes starting at start */
> delalloc_start = *start;
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 4341ad978fb8..ddfa100ab629 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -279,6 +279,8 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
> int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array);
>
> void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
> +void __unlock_for_delalloc(struct inode *inode, struct page *locked_page,
> + u64 start, u64 end);
>
> #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
> bool find_lock_delalloc_range(struct inode *inode,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index eeddd7cdff58..fb02b2b3ac2e 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -506,7 +506,6 @@ struct async_extent {
>
> struct async_chunk {
> struct btrfs_inode *inode;
> - struct page *locked_page;
> u64 start;
> u64 end;
> blk_opf_t write_flags;
> @@ -887,18 +886,6 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
> }
> }
> cleanup_and_bail_uncompressed:
> - /*
> - * No compression, but we still need to write the pages in the file
> - * we've been given so far. redirty the locked page if it corresponds
> - * to our extent and set things up for the async work queue to run
> - * cow_file_range to do the normal delalloc dance.
> - */
> - if (async_chunk->locked_page &&
> - (page_offset(async_chunk->locked_page) >= start &&
> - page_offset(async_chunk->locked_page)) <= end) {
> - __set_page_dirty_nobuffers(async_chunk->locked_page);
> - /* unlocked later on in the async handlers */
> - }
>
> if (redirty)
> extent_range_redirty_for_io(&inode->vfs_inode, start, end);
> @@ -926,8 +913,7 @@ static void free_async_extent_pages(struct async_extent *async_extent)
> }
>
> static int submit_uncompressed_range(struct btrfs_inode *inode,
> - struct async_extent *async_extent,
> - struct page *locked_page)
> + struct async_extent *async_extent)
> {
> u64 start = async_extent->start;
> u64 end = async_extent->start + async_extent->ram_size - 1;
> @@ -942,7 +928,7 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
> * Also we call cow_file_range() with @unlock_page == 0, so that we
> * can directly submit them without interruption.
> */
> - ret = cow_file_range(inode, locked_page, start, end, &page_started,
> + ret = cow_file_range(inode, NULL, start, end, &page_started,
> &nr_written, 0, NULL);
> /* Inline extent inserted, page gets unlocked and everything is done */
> if (page_started) {
> @@ -950,23 +936,12 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
> goto out;
> }
> if (ret < 0) {
> - btrfs_cleanup_ordered_extents(inode, locked_page, start, end - start + 1);
> - if (locked_page) {
> - const u64 page_start = page_offset(locked_page);
> - const u64 page_end = page_start + PAGE_SIZE - 1;
> -
> - btrfs_page_set_error(inode->root->fs_info, locked_page,
> - page_start, PAGE_SIZE);
> - set_page_writeback(locked_page);
> - end_page_writeback(locked_page);
> - end_extent_writepage(locked_page, ret, page_start, page_end);
> - unlock_page(locked_page);
> - }
> + btrfs_cleanup_ordered_extents(inode, NULL, start, end - start + 1);
> goto out;
> }
>
> ret = extent_write_locked_range(&inode->vfs_inode, start, end);
> - /* All pages will be unlocked, including @locked_page */
> + /* All pages will be unlocked */
> out:
> kfree(async_extent);
> return ret;
> @@ -980,27 +955,14 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
> struct btrfs_root *root = inode->root;
> struct btrfs_fs_info *fs_info = root->fs_info;
> struct btrfs_key ins;
> - struct page *locked_page = NULL;
> struct extent_map *em;
> int ret = 0;
> u64 start = async_extent->start;
> u64 end = async_extent->start + async_extent->ram_size - 1;
>
> - /*
> - * If async_chunk->locked_page is in the async_extent range, we need to
> - * handle it.
> - */
> - if (async_chunk->locked_page) {
> - u64 locked_page_start = page_offset(async_chunk->locked_page);
> - u64 locked_page_end = locked_page_start + PAGE_SIZE - 1;
> -
> - if (!(start >= locked_page_end || end <= locked_page_start))
> - locked_page = async_chunk->locked_page;
> - }
> -
> /* We have fall back to uncompressed write */
> if (!async_extent->pages)
> - return submit_uncompressed_range(inode, async_extent, locked_page);
> + return submit_uncompressed_range(inode, async_extent);
>
> ret = btrfs_reserve_extent(root, async_extent->ram_size,
> async_extent->compressed_size,
> @@ -1476,6 +1438,8 @@ static noinline void async_cow_start(struct btrfs_work *work)
>
> compressed_extents = compress_file_range(async_chunk);
> if (compressed_extents == 0) {
> + unlock_extent(&async_chunk->inode->io_tree,
> + async_chunk->start, async_chunk->end, NULL);
> btrfs_add_delayed_iput(async_chunk->inode);
> async_chunk->inode = NULL;
> }
> @@ -1515,11 +1479,15 @@ static noinline void async_cow_free(struct btrfs_work *work)
> struct async_cow *async_cow;
>
> async_chunk = container_of(work, struct async_chunk, work);
> - if (async_chunk->inode)
> + if (async_chunk->inode) {
> + unlock_extent(&async_chunk->inode->io_tree,
> + async_chunk->start, async_chunk->end, NULL);
> btrfs_add_delayed_iput(async_chunk->inode);
> + }
> if (async_chunk->blkcg_css)
> css_put(async_chunk->blkcg_css);
>
> +
> async_cow = async_chunk->async_cow;
> if (atomic_dec_and_test(&async_cow->num_chunks))
> kvfree(async_cow);
> @@ -1527,9 +1495,7 @@ static noinline void async_cow_free(struct btrfs_work *work)
>
> static int cow_file_range_async(struct btrfs_inode *inode,
> struct writeback_control *wbc,
> - struct page *locked_page,
> - u64 start, u64 end, int *page_started,
> - unsigned long *nr_written)
> + u64 start, u64 end)
> {
> struct btrfs_fs_info *fs_info = inode->root->fs_info;
> struct cgroup_subsys_state *blkcg_css = wbc_blkcg_css(wbc);
> @@ -1539,20 +1505,9 @@ static int cow_file_range_async(struct btrfs_inode *inode,
> u64 cur_end;
> u64 num_chunks = DIV_ROUND_UP(end - start, SZ_512K);
> int i;
> - bool should_compress;
> unsigned nofs_flag;
> const blk_opf_t write_flags = wbc_to_write_flags(wbc);
>
> - unlock_extent(&inode->io_tree, start, end, NULL);
> -
> - if (inode->flags & BTRFS_INODE_NOCOMPRESS &&
> - !btrfs_test_opt(fs_info, FORCE_COMPRESS)) {
> - num_chunks = 1;
> - should_compress = false;
> - } else {
> - should_compress = true;
> - }
> -
> nofs_flag = memalloc_nofs_save();
> ctx = kvmalloc(struct_size(ctx, chunks, num_chunks), GFP_KERNEL);
> memalloc_nofs_restore(nofs_flag);
> @@ -1564,19 +1519,17 @@ static int cow_file_range_async(struct btrfs_inode *inode,
> unsigned long page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK |
> PAGE_END_WRITEBACK | PAGE_SET_ERROR;
>
> - extent_clear_unlock_delalloc(inode, start, end, locked_page,
> + extent_clear_unlock_delalloc(inode, start, end, NULL,
> clear_bits, page_ops);
> return -ENOMEM;
> }
>
> + set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
> async_chunk = ctx->chunks;
> atomic_set(&ctx->num_chunks, num_chunks);
>
> for (i = 0; i < num_chunks; i++) {
> - if (should_compress)
> - cur_end = min(end, start + SZ_512K - 1);
> - else
> - cur_end = end;
> + cur_end = min(end, start + SZ_512K - 1);
>
> /*
> * igrab is called higher up in the call chain, take only the
> @@ -1590,33 +1543,6 @@ static int cow_file_range_async(struct btrfs_inode *inode,
> async_chunk[i].write_flags = write_flags;
> INIT_LIST_HEAD(&async_chunk[i].extents);
>
> - /*
> - * The locked_page comes all the way from writepage and its
> - * the original page we were actually given. As we spread
> - * this large delalloc region across multiple async_chunk
> - * structs, only the first struct needs a pointer to locked_page
> - *
> - * This way we don't need racey decisions about who is supposed
> - * to unlock it.
> - */
> - if (locked_page) {
> - /*
> - * Depending on the compressibility, the pages might or
> - * might not go through async. We want all of them to
> - * be accounted against wbc once. Let's do it here
> - * before the paths diverge. wbc accounting is used
> - * only for foreign writeback detection and doesn't
> - * need full accuracy. Just account the whole thing
> - * against the first page.
> - */
> - wbc_account_cgroup_owner(wbc, locked_page,
> - cur_end - start);
> - async_chunk[i].locked_page = locked_page;
> - locked_page = NULL;
> - } else {
> - async_chunk[i].locked_page = NULL;
> - }
> -
> if (blkcg_css != blkcg_root_css) {
> css_get(blkcg_css);
> async_chunk[i].blkcg_css = blkcg_css;
> @@ -1632,10 +1558,8 @@ static int cow_file_range_async(struct btrfs_inode *inode,
>
> btrfs_queue_work(fs_info->delalloc_workers, &async_chunk[i].work);
>
> - *nr_written += nr_pages;
> start = cur_end + 1;
> }
> - *page_started = 1;
> return 0;
> }
>
> @@ -2238,18 +2162,13 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
> ASSERT(!zoned || btrfs_is_data_reloc_root(inode->root));
> ret = run_delalloc_nocow(inode, locked_page, start, end,
> page_started, nr_written);
> - } else if (!btrfs_inode_can_compress(inode) ||
> - !inode_need_compress(inode, start, end)) {
> + } else {
> if (zoned)
> ret = run_delalloc_zoned(inode, locked_page, start, end,
> page_started, nr_written);
> else
> ret = cow_file_range(inode, locked_page, start, end,
> page_started, nr_written, 1, NULL);
> - } else {
> - set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
> - ret = cow_file_range_async(inode, wbc, locked_page, start, end,
> - page_started, nr_written);
> }
> ASSERT(ret <= 0);
> if (ret)
> @@ -7840,14 +7759,68 @@ static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> return extent_fiemap(BTRFS_I(inode), fieinfo, start, len);
> }
>
> +static int btrfs_writepages_async(struct btrfs_inode *inode, struct writeback_control *wbc, u64 start, u64 end)
> +{
> + u64 last_start, cur_start = start;
> + u64 cur_end;
> + int ret = 0;
> +
> + lock_extent(&inode->io_tree, start, end, NULL);
> +
> + while (cur_start < end) {
> + bool found;
> + last_start = cur_start;
> + cur_end = end;
> +
> + found = find_lock_delalloc_range(&inode->vfs_inode, NULL, &cur_start, &cur_end);
This call to find_lock_delalloc_range makes it so inode.c doesn't
compile unless CONFIG_BTRFS_FS_RUN_SANITY_TESTS is set.
> + /* Nothing to writeback */
> + if (!found) {
> + unlock_extent(&inode->io_tree, cur_start, cur_end, NULL);
> + cur_start = cur_end + 1;
> + continue;
> + }
> +
> + /* A hole with no pages, unlock part therof */
> + if (cur_start > last_start)
> + unlock_extent(&inode->io_tree, last_start, cur_start - 1, NULL);
> +
> + /* Got more than we requested for */
> + if (cur_end > end) {
> + if (try_lock_extent(&inode->io_tree, end + 1, cur_end, NULL)) {
> + /* Try writing the whole extent */
> + end = cur_end;
> + } else {
> + /*
> + * Someone is holding the extent lock.
> + * Unlock pages from last part of extent, and
> + * write just as much writepage requested for
> + */
> + __unlock_for_delalloc(&inode->vfs_inode, NULL, end + 1, cur_end);
> + cur_end = end;
> + }
> + }
> +
> + ret = cow_file_range_async(inode, wbc, cur_start, cur_end);
> + if (ret < 0) {
> + unlock_extent(&inode->io_tree, cur_start, end, NULL);
> + break;
> + }
> +
> + cur_start = cur_end + 1;
> + }
> +
> + return ret;
> +}
> +
> static int btrfs_writepages(struct address_space *mapping,
> struct writeback_control *wbc)
> {
> u64 start = 0, end = LLONG_MAX;
> - struct inode *inode = mapping->host;
> + struct btrfs_inode *inode = BTRFS_I(mapping->host);
> + struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
> struct extent_state *cached = NULL;
> int ret;
> - loff_t isize = i_size_read(inode);
> + loff_t isize = i_size_read(&inode->vfs_inode);
> struct writeback_control new_wbc = *wbc;
>
> if (new_wbc.range_cyclic) {
> @@ -7864,9 +7837,14 @@ static int btrfs_writepages(struct address_space *mapping,
> if (start >= end)
> return 0;
>
> - lock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached);
> - ret = extent_writepages(mapping, wbc);
> - unlock_extent(&BTRFS_I(inode)->io_tree, start, end, &cached);
> + if (btrfs_test_opt(fs_info, COMPRESS) &&
> + btrfs_inode_can_compress(inode)) {
> + ret = btrfs_writepages_async(inode, wbc, start, end);
> + } else {
> + lock_extent(&inode->io_tree, start, end, &cached);
> + ret = extent_writepages(mapping, wbc);
> + unlock_extent(&inode->io_tree, start, end, &cached);
> + }
>
> if (new_wbc.range_cyclic) {
> wbc->range_start = new_wbc.range_start;
> --
> 2.39.2
>
next prev parent reply other threads:[~2023-03-08 19:13 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <cover.1677793433.git.rgoldwyn@suse.com>
2023-03-02 22:24 ` [PATCH 01/21] fs: readahead_begin() to call before locking folio Goldwyn Rodrigues
2023-03-06 16:53 ` Christoph Hellwig
2023-03-02 22:24 ` [PATCH 02/21] btrfs: add WARN_ON() on incorrect lock range Goldwyn Rodrigues
2023-03-08 19:28 ` Boris Burkov
2023-03-02 22:24 ` [PATCH 03/21] btrfs: Add start < end check in btrfs_debug_check_extent_io_range() Goldwyn Rodrigues
2023-03-08 19:29 ` Boris Burkov
2023-03-02 22:24 ` [PATCH 04/21] btrfs: make btrfs_qgroup_flush() non-static Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 05/21] btrfs: Lock extents before pages for buffered write() Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 06/21] btrfs: wait ordered range before locking during truncate Goldwyn Rodrigues
2023-03-07 17:03 ` Christoph Hellwig
2023-03-02 22:24 ` [PATCH 07/21] btrfs: lock extents while truncating Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 08/21] btrfs: no need to lock extent while performing invalidate_folio() Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 09/21] btrfs: lock extents before folio for read()s Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 10/21] btrfs: lock extents before pages in writepages Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 11/21] btrfs: locking extents for async writeback Goldwyn Rodrigues
2023-03-08 19:13 ` Boris Burkov [this message]
2023-03-02 22:24 ` [PATCH 12/21] btrfs: lock extents before pages - defrag Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 13/21] btrfs: Perform memory faults under locked extent Goldwyn Rodrigues
2023-03-02 22:24 ` [PATCH 14/21] btrfs: writepage fixup lock rearrangement Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 15/21] btrfs: lock extent before pages for encoded read ioctls Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 16/21] btrfs: lock extent before pages in encoded write Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 17/21] btrfs: btree_writepages lock extents before pages Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 18/21] btrfs: check if writeback pages exist before starting writeback Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 19/21] btrfs: lock extents before pages in relocation Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 20/21] btrfs: Add inode->i_count instead of calling ihold() Goldwyn Rodrigues
2023-03-07 17:06 ` Christoph Hellwig
2023-03-08 23:03 ` Goldwyn Rodrigues
2023-03-09 9:14 ` Christoph Hellwig
2023-03-11 3:52 ` Goldwyn Rodrigues
2023-03-02 22:25 ` [PATCH 21/21] btrfs: debug extent locking Goldwyn Rodrigues
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230308191340.GB30177@zen \
--to=boris@bur.io \
--cc=linux-btrfs@vger.kernel.org \
--cc=rgoldwyn@suse.com \
--cc=rgoldwyn@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.