public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: David Sterba <dsterba@suse.cz>
To: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: David Sterba <dsterba@suse.com>,
	linux-btrfs@vger.kernel.org, Filipe Manana <fdmanana@suse.com>,
	Naohiro Aota <naohiro.aota@wdc.com>
Subject: Re: [PATCH] btrfs: zoned: exclude relocation and page writeback
Date: Thu, 12 Aug 2021 16:25:58 +0200	[thread overview]
Message-ID: <20210812142558.GI5047@suse.cz> (raw)
In-Reply-To: <a858fb2ff980db27b3638e92f7d2d7a416b8e81e.1628776260.git.johannes.thumshirn@wdc.com>

On Thu, Aug 12, 2021 at 10:53:49PM +0900, Johannes Thumshirn wrote:
> Relocation in a zoned filesystem can fail with a transaction abort with
> error -22 (EINVAL). This happens because the relocation code assumes that
> the extents we relocated the data to have the same size the source extents
> had and ensures this by preallocating the extents.
> 
> But in a zoned filesystem we can't preallocate the extents as this would
> break the sequential write required rule. Therefore it can happen that the
> writeback process kicks in while we're still adding pages to a
> delallocation range and starts writing out dirty pages.
> 
> This then creates destination extents that are smaller than the source
> extents, triggering the following safety check in get_new_location():
> 
>  1034         if (num_bytes != btrfs_file_extent_disk_num_bytes(leaf, fi)) {
>  1035                 ret = -EINVAL;
>  1036                 goto out;
>  1037         }
> 
> One possible solution to address this problem is to mutually exclude page
> writeback and adding pages to the relocation inode. This ensures, that
> we're not submitting an extent before all needed pages have been added to
> it.
> 
> Introduce a new lock in the btrfs_inode which is only taken *IFF* the
> inode is a data relocation inode on a zoned filesystem to mutually exclude
> relocation's construction of extents and page writeback.
> 
> Fixes: 32430c614844 ("btrfs: zoned: enable relocation on a zoned filesystem")
> Reported-by: David Sterba <dsterba@suse.com>
> Cc: Filipe Manana <fdmanana@suse.com>
> Cc: Naohiro Aota <naohiro.aota@wdc.com>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/btrfs_inode.h |  3 +++
>  fs/btrfs/extent_io.c   |  4 ++++
>  fs/btrfs/inode.c       |  1 +
>  fs/btrfs/relocation.c  | 10 +++++++---
>  fs/btrfs/zoned.h       | 27 +++++++++++++++++++++++++++
>  5 files changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 76ee1452c57b..954e772f18e8 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -231,6 +231,9 @@ struct btrfs_inode {
>  
>  	struct rw_semaphore i_mmap_lock;
>  	struct inode vfs_inode;
> +
> +	/* Protects relocation from page writeback on a zoned FS */
> +	struct mutex relocation_lock;

That's a lot of new bytes for an inode, and just for zoned mode. Is
there another way how to synchronize that? Like some inode state bit and
then waiting on it, using the generic wait queues and API like
wait_var_event?

  reply	other threads:[~2021-08-12 14:28 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-12 13:53 [PATCH] btrfs: zoned: exclude relocation and page writeback Johannes Thumshirn
2021-08-12 14:25 ` David Sterba [this message]
2021-08-12 14:40   ` Johannes Thumshirn
2021-08-12 14:50     ` David Sterba
2021-08-17 14:21       ` Johannes Thumshirn
2021-08-18  9:36         ` David Sterba
2021-08-18  9:55           ` Johannes Thumshirn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210812142558.GI5047@suse.cz \
    --to=dsterba@suse.cz \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=johannes.thumshirn@wdc.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=naohiro.aota@wdc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox