public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Kent Overstreet <kent.overstreet@linux.dev>
To: linux-bcachefs@vger.kernel.org, linux-block@vger.kernel.org
Cc: Kent Overstreet <kent.overstreet@linux.dev>,
	Roland Vet <vet.roland@protonmail.com>,
	linux-fsdevel@vger.kernel.org
Subject: [PATCH 00/14] better handling of checksum errors/bitrot
Date: Tue, 11 Mar 2025 16:15:02 -0400	[thread overview]
Message-ID: <20250311201518.3573009-1-kent.overstreet@linux.dev> (raw)

Roland Vet spotted a good one: currently, rebalance/copygc get stuck if
we've got an extent that can't be read, due to checksum error/bitrot.

This took some doing to fix properly, because

- We don't want to just delete such data (replace it with
  KEY_TYPE_error); we never want to delete anything except when
  explicitly told to by the user, and while we don't yet have an API for
  "read this file even though there's errors, just give me what you
  have" we definitely will in the future.

- Not being able to move data is a non-option: that would block copygc,
  device removal, etc.

- And, moving said extent requires giving it a new checksum - strictly
  required if the move has to fragment it, teaching the write/move path
  about extents with bad checksums is unpalateable, and anyways we'd
  like to be able to guard against more bitrot, if we can.

So that means:

- Extents need a poison bit: "reads return errors, even though it now
  has a good checksum" - this was added in a separate patch queued up
  for 6.15.

  It's an incompat feature because it's a new extent field, and old
  versions can't parse extents with unknown field types, since they
  won't know their sizes - meaning users will have to explicitly do an
  incompat upgrade to make use of this stuff.

- The read path needs to do additional retries after checksum errors
  before giving up and marking it poisoned, so that we don't
  accidentally convert a transient error to permanent corruption.

- The read path gets a whole bunch of work to plumb precise modern error
  codes around, so that e.g. the retry path, the data move path, and the
  "mark extent poisoned" path all know exactly what's going on.

- Read path is responsible for marking extents poisoned after sufficient
  retry attempts (controlled by a new filesystem option)

- Data move path is allowed to move extents after a read error, if it's
  a checksum error (giving it a new checksum) if it's been poisoned
  (i.e. the extent flags feature is enabled).

Code should be more or less finalized - still have more tests for corner
cases to write, but "write corrupt data and then tell rebalance to move
it to another device" works as expected.

TODO:

- NVME has a "read recovery level" attribute that controlls how hard the
  erasure coding algorithms work - we want that plumbed.

  Before we give up and move data that we know is bad, we need to try
  _as hard as possible_ to get a successful read.

Code currently lives in
https://evilpiepirate.org/git/bcachefs.git/log/?h=bcachefs-testing

Kent Overstreet (14):
  bcachefs: Convert read path to standard error codes
  bcachefs: Fix BCH_ERR_data_read_csum_err_maybe_userspace in retry path
  bcachefs: Read error message now indicates if it was for an internal
    move
  bcachefs: BCH_ERR_data_read_buffer_too_small
  bcachefs: Return errors to top level bch2_rbio_retry()
  bcachefs: Print message on successful read retry
  bcachefs: Don't create bch_io_failures unless it's needed
  bcachefs: Checksum errors get additional retries
  bcachefs: __bch2_read() now takes a btree_trans
  bcachefs: Poison extents that can't be read due to checksum errors
  bcachefs: Data move can read from poisoned extents
  bcachefs: Debug params for data corruption injection
  block: Allow REQ_FUA|REQ_READ
  bcachefs: Read retries are after checksum errors now REQ_FUA

 block/blk-core.c              |  19 +-
 fs/bcachefs/bcachefs_format.h |   2 +
 fs/bcachefs/btree_io.c        |   2 +-
 fs/bcachefs/errcode.h         |  19 +-
 fs/bcachefs/extents.c         | 157 +++++++++-------
 fs/bcachefs/extents.h         |   7 +-
 fs/bcachefs/extents_types.h   |  11 +-
 fs/bcachefs/io_read.c         | 325 +++++++++++++++++++++++++---------
 fs/bcachefs/io_read.h         |  21 +--
 fs/bcachefs/io_write.c        |  24 +++
 fs/bcachefs/move.c            |  26 ++-
 fs/bcachefs/opts.h            |   5 +
 fs/bcachefs/super-io.c        |   4 +
 fs/bcachefs/util.c            |  21 +++
 fs/bcachefs/util.h            |  12 ++
 15 files changed, 473 insertions(+), 182 deletions(-)

-- 
2.47.2


             reply	other threads:[~2025-03-11 20:15 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-11 20:15 Kent Overstreet [this message]
2025-03-11 20:15 ` [PATCH 13/14] block: Allow REQ_FUA|REQ_READ Kent Overstreet
2025-03-15 16:47   ` Jens Axboe
2025-03-15 17:01     ` Kent Overstreet
2025-03-15 17:03       ` Jens Axboe
2025-03-15 17:27         ` Kent Overstreet
2025-03-15 17:43           ` Jens Axboe
2025-03-15 18:07             ` Kent Overstreet
2025-03-15 18:32               ` Jens Axboe
2025-03-15 18:41                 ` Kent Overstreet
2025-03-17  6:00                   ` Christoph Hellwig
2025-03-17 12:15                     ` Kent Overstreet
2025-03-17 14:13                       ` Keith Busch
2025-03-17 14:49                         ` Kent Overstreet
2025-03-17 15:15                           ` Keith Busch
2025-03-17 15:22                             ` Kent Overstreet
2025-03-17 15:30                         ` Martin K. Petersen
2025-03-17 15:43                           ` Kent Overstreet
2025-03-17 17:57                             ` Martin K. Petersen
2025-03-17 18:21                               ` Kent Overstreet
2025-03-17 19:24                                 ` Keith Busch
2025-03-17 19:40                                   ` Kent Overstreet
2025-03-17 20:39                                     ` Keith Busch
2025-03-17 21:13                                       ` Bart Van Assche
2025-03-18  1:06                                         ` Kent Overstreet
2025-03-18  6:16                                           ` Christoph Hellwig
2025-03-18 17:49                                           ` Bart Van Assche
2025-03-18 18:00                                             ` Kent Overstreet
2025-03-18 18:10                                               ` Keith Busch
2025-03-18 18:13                                                 ` Kent Overstreet
2025-03-20  5:40                                                 ` Christoph Hellwig
2025-03-20 10:28                                                   ` Kent Overstreet
2025-03-18  0:27                                       ` Kent Overstreet
2025-03-18  6:11                                 ` Christoph Hellwig
2025-03-18 21:33                                   ` Kent Overstreet
2025-03-17 17:32                           ` Keith Busch
2025-03-18  6:19                             ` Christoph Hellwig
2025-03-18  6:01                       ` Christoph Hellwig
2025-03-17 20:55 ` [PATCH 00/14] better handling of checksum errors/bitrot John Stoffel
2025-03-18  1:15   ` Kent Overstreet
2025-03-18 14:47     ` John Stoffel
2025-03-20 17:15       ` Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250311201518.3573009-1-kent.overstreet@linux.dev \
    --to=kent.overstreet@linux.dev \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=vet.roland@protonmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox