From: Josef Bacik <josef@toxicpanda.com>
To: Boris Burkov <boris@bur.io>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: fix encoded write i_size corruption
Date: Sat, 29 Apr 2023 15:54:22 -0400
Message-ID: <20230429195422.GA1455986@perftesting>
In-Reply-To: <e340cd5aef01df9826746dab5a74cb2fcce19a8e.1682714694.git.boris@bur.io>

On Fri, Apr 28, 2023 at 02:02:11PM -0700, Boris Burkov wrote:
> We have observed a btrfs filesystem corruption on workloads using
> NOHOLES and encoded writes via sendstream v2. The symptom is that a file
> appears to be truncated to the end of its last aligned extent, even
> though the final unaligned extent, the file extent, and the otherwise
> correctly updated inode item have all been written.
> 
> So if we were writing out a 1MiB+X file via 8 128KiB extents and one
> extent of length X, isize would be set to 1MiB, but the ninth extent,
> nbytes, etc. would all appear correct.
> 
> The source of the race is a narrow (one line of code) window in which a
> noholes fs has read in an updated isize, but has not yet written it to
> the shared disk_i_size variable. Therefore, if two ordered extents run in
> parallel (par for the course for receive workloads), the following
> sequence can play out (following "threads" a bit loosely, since there
> are callbacks involved for endio but extra threads aren't needed to
> cause the issue):
> 
> ENC-WR1 (second to last)                                         ENC-WR2 (last)
> -------                                                          -------
> btrfs_do_encoded_write
>   set isize = 1M
>   submit bio B1 ending at 1M
> endio B1
> btrfs_inode_safe_disk_i_size_write
>   local isize = 1M
>   falls off a cliff for some reason
>                                                             btrfs_do_encoded_write
>                                                               set isize = 1M+X
>                                                               submit bio B2 ending at 1M+X
>                                                             endio B2
>                                                             btrfs_inode_safe_disk_i_size_write
>                                                               local isize = 1M+X
>                                                               disk_i_size = 1M+X
>   disk_i_size = 1M
>                                                             btrfs_delayed_update_inode
>   btrfs_delayed_update_inode
> 
> And the delayed inode ends up filled with nbytes=1M+X and isize=1M, and
> writes respect isize and present a corrupted file missing its last
> extents.
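The interleaving above can be replayed deterministically in a single thread. This is only a toy model, not the btrfs code: `struct toy_inode`, `snapshot_i_size()`, `publish_disk_i_size()` and `replay_race()` are hypothetical names invented for the illustration, and the two-step split stands in for the window between reading isize and storing disk_i_size.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M (1024ULL * 1024ULL)

/* Hypothetical stand-in for the two fields involved; not struct btrfs_inode. */
struct toy_inode {
	uint64_t i_size;      /* in-memory size, set by the writer */
	uint64_t disk_i_size; /* size that ends up in the inode item */
};

/* Step 1 of the update: read isize into a local. */
static uint64_t snapshot_i_size(const struct toy_inode *inode)
{
	return inode->i_size;
}

/* Step 2 of the update: store the (possibly stale) local. */
static void publish_disk_i_size(struct toy_inode *inode, uint64_t snap)
{
	inode->disk_i_size = snap;
}

/* Replay the sequence from the diagram; x is the length of the tail extent. */
static uint64_t replay_race(uint64_t x)
{
	struct toy_inode inode = { 0, 0 };
	uint64_t snap1, snap2;

	assert(x > 0);

	inode.i_size = SZ_1M;                /* ENC-WR1: set isize = 1M */
	snap1 = snapshot_i_size(&inode);     /* ENC-WR1 endio: local isize = 1M, then stalls */

	inode.i_size = SZ_1M + x;            /* ENC-WR2: set isize = 1M+X */
	snap2 = snapshot_i_size(&inode);     /* ENC-WR2 endio: local isize = 1M+X */
	publish_disk_i_size(&inode, snap2);  /* disk_i_size = 1M+X */

	publish_disk_i_size(&inode, snap1);  /* ENC-WR1 resumes: disk_i_size = 1M */

	return inode.disk_i_size;
}
```

In this model `replay_race(x)` returns 1M for any nonzero `x`, because the stale snapshot taken by the first completion is stored last and overwrites the newer value.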
> 
> Fix this by holding the inode lock in the noholes case so that a thread
> can't sneak in a write to disk_i_size that gets overwritten with an out
> of date isize.
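A minimal sketch of why holding a lock across the read and the store closes the window, under the assumption that the lock makes the pair indivisible. It is not the actual patch: `struct toy_inode`, `toy_encoded_write()`, `toy_locked_endio()` and `replay_locked()` are hypothetical, the lock is represented only by comments, and the rule that a write only ever extends isize is an assumption of the model.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M (1024ULL * 1024ULL)

/* Hypothetical stand-in for the two fields involved; not struct btrfs_inode. */
struct toy_inode {
	uint64_t i_size;
	uint64_t disk_i_size;
};

/* Writer side: extend isize if this write ends past it (model assumption). */
static void toy_encoded_write(struct toy_inode *inode, uint64_t end)
{
	if (end > inode->i_size)
		inode->i_size = end;
}

/* Completion side: with the lock held, read and store form one step. */
static void toy_locked_endio(struct toy_inode *inode)
{
	/* lock would be taken here */
	inode->disk_i_size = inode->i_size;
	/* lock would be dropped here */
}

/*
 * Replay two orderings of the completions. With the update indivisible,
 * no completion can store a value older than the isize it just read.
 */
static uint64_t replay_locked(uint64_t x, int wr2_completes_first)
{
	struct toy_inode inode = { 0, 0 };

	assert(x > 0);

	toy_encoded_write(&inode, SZ_1M);             /* ENC-WR1 */
	if (wr2_completes_first) {
		toy_encoded_write(&inode, SZ_1M + x); /* ENC-WR2 */
		toy_locked_endio(&inode);             /* endio B2 */
		toy_locked_endio(&inode);             /* endio B1 */
	} else {
		toy_locked_endio(&inode);             /* endio B1 */
		toy_encoded_write(&inode, SZ_1M + x); /* ENC-WR2 */
		toy_locked_endio(&inode);             /* endio B2 */
	}

	return inode.disk_i_size;
}
```

Both orderings leave `disk_i_size` at 1M+X in this model, including the one where the second-to-last write completes last.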
> 
> Fixes: 41a2ee75aab0290 ("btrfs: introduce per-inode file extent tree")
> Signed-off-by: Boris Burkov <boris@bur.io>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

Thread overview: 3+ messages
2023-04-28 21:02 [PATCH] btrfs: fix encoded write i_size corruption Boris Burkov
2023-04-29 19:54 ` Josef Bacik [this message]
2023-05-02 12:20 ` David Sterba
