linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	Jens Axboe <axboe@kernel.dk>, Bob Liu <bob.liu@oracle.com>,
	linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, shirley.ma@oracle.com,
	allison.henderson@oracle.com, darrick.wong@oracle.com,
	hch@infradead.org, adilger@dilger.ca, tytso@mit.edu
Subject: Re: [PATCH v3 2/3] block: verify data when endio
Date: Mon, 01 Apr 2019 10:04:43 -0400	[thread overview]
Message-ID: <yq1ef6lq0sk.fsf@oracle.com> (raw)
In-Reply-To: <20190331220001.GM23020@dastard> (Dave Chinner's message of "Mon, 1 Apr 2019 09:00:01 +1100")


Hi Dave!

>> However, that suffers from another headache in that the I/O can get
>> arbitrarily sliced and diced in units of 512 bytes.
>
> Right, but we don't need support that insane case. Indeed, if
> it wasn't already obvious, we _can't support it_ because the
> filesystem verifiers can't do partial verification. i.e.  part of
> the verification is CRC validation of the whole bio, not to mention
> that filesystem structure fragments cannot be safely parsed,
> interpretted and/or verified without the whole structure first being
> read in.

What I thought. There are some things I can verify by masking but it's
limited.

What about journal entries? Would they be validated with 512-byte
granularity or in bundles thereof? Only a problem during recovery, but
potentially a case where we care deeply about trying another copy if it
exists.

What I'm asking is if we should have a block size argument for the
verification? Or would you want to submit bios capped to the size you
care about and let the block layer take care of coalescing?

Validation of units bigger than the logical block size is an area which
our older Oracle HARD technology handles gracefully but which T10 PI has
been unable to address. So this is an area of particular interest to me,
although it's somewhat orthogonal to Bob's retry plumbing.

Another question for you wrt. retries: Once a copy has been identified
as bad and a good copy read from media, who does the rewrite? Does the
filesystem send a new I/O (which would overwrite all copies) or does the
retry plumbing own the responsibility of writing the good bio to the bad
location?

> IOWs, we need to look at this problem from a "whole stack" point of
> view, not just cry about how "bios are too flexible and so make this
> too hard!". The filesystem greatly constrains the alignment and
> slicing/dicing problem to the point where it should be non-existent,
> we have a clearly defined hard stop where verifier propagation
> terminates, and if all else fails we can still detect corruption at
> the filesystem level just like we do now. The worst thing that
> happens here is we give up the capability for automatic block device
> recovery and repair of damaged copies, which we can't do right now,
> so it's essentially status quo...

Having gone down the path of the one-to-many relationship when I did the
original heterogeneous I/O topology attempt, it's pure hell. Also dealt
with similar conundrums for the integrity stuff. So I don't like the
breadcrumb approach. Perfect is the enemy of good and all that.

And I am 100% in agreement on the careful alignment and not making
things complex for crazy use cases (although occasional straddling I/Os
are not as uncommon as we'd like to think). However, I do have concerns
about this particular feature when it comes to your status quo comment.

In order for us to build highly reliable systems, we have to have a
better building block than "this redundancy retry feature works most of
the time". So to me it is imperative that we provide hard guarantees
once a particular configuration has been set up and stacked. And if the
retry guarantee is somehow invalidated, then we really need to let the
user know about it.

-- 
Martin K. Petersen	Oracle Linux Engineering

  reply	other threads:[~2019-04-01 14:05 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-29 14:23 [RFC PATCH v3 0/3] Block/XFS: Support alternative mirror device retry Bob Liu
2019-03-29 14:23 ` [PATCH v3 1/3] block: introduce submit_bio_verify() Bob Liu
2019-03-29 22:22   ` Andreas Dilger
2019-03-30  1:43     ` Martin K. Petersen
2019-03-29 14:23 ` [PATCH v3 2/3] block: verify data when endio Bob Liu
2019-03-29 14:28   ` Jens Axboe
2019-03-29 14:34     ` Martin K. Petersen
2019-03-29 14:38       ` Jens Axboe
2019-03-29 14:46         ` Martin K. Petersen
2019-03-29 14:50           ` Jens Axboe
2019-03-29 14:51             ` Martin K. Petersen
2019-03-29 14:52               ` Jens Axboe
2019-03-29 15:00                 ` Bob Liu
2019-03-29 15:03                   ` Jens Axboe
2019-03-30  2:17                     ` Martin K. Petersen
2019-03-31 22:00                       ` Dave Chinner
2019-04-01 14:04                         ` Martin K. Petersen [this message]
2019-04-01 21:21                           ` Dave Chinner
2019-04-03  2:45                             ` Martin K. Petersen
2019-04-03 22:21                               ` Dave Chinner
2019-03-29 14:41       ` Bob Liu
2019-03-29 14:40     ` Bob Liu
2019-03-29 14:47       ` Jens Axboe
2019-03-30  0:20       ` Andreas Dilger
2019-03-29 15:39   ` Ming Lei
2019-03-29 14:23 ` [PATCH v3 3/3] fs: xfs: add read_verifier() function Bob Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=yq1ef6lq0sk.fsf@oracle.com \
    --to=martin.petersen@oracle.com \
    --cc=adilger@dilger.ca \
    --cc=allison.henderson@oracle.com \
    --cc=axboe@kernel.dk \
    --cc=bob.liu@oracle.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=shirley.ma@oracle.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).