From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Bob Liu <bob.liu@oracle.com>,
linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, martin.petersen@oracle.com,
shirley.ma@oracle.com, allison.henderson@oracle.com,
hch@infradead.org, adilger@dilger.ca
Subject: Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry
Date: Tue, 19 Feb 2019 14:33:23 +1100
Message-ID: <20190219033323.GG14116@dastard>
In-Reply-To: <20190219025520.GB32253@magnolia>

On Mon, Feb 18, 2019 at 06:55:20PM -0800, Darrick J. Wong wrote:
> On Tue, Feb 19, 2019 at 08:31:50AM +1100, Dave Chinner wrote:
> > On Wed, Feb 13, 2019 at 05:50:35PM +0800, Bob Liu wrote:
> > > Motivation:
> > > When a fs data/metadata checksum mismatch occurs, lower block devices may have other
> > > correct copies. e.g. If XFS successfully reads a metadata buffer off a raid1 but
> > > decides that the metadata is garbage, today it will shut down the entire
> > > filesystem without trying any of the other mirrors. This is a severe
> > > loss of service, and we propose these patches to have XFS try harder to
> > > avoid failure.
> > >
> > > This patch prototypes this mirror retry idea by:
> > > * Adding @nr_mirrors to struct request_queue, which is similar to
> > > blk_queue_nonrot(), filesystem can grab device request queue and check max
> > > mirrors this block device has.
> > > Helper functions were also added to get/set the nr_mirrors.
> > >
> > > * Introducing bi_rd_hint just like bi_write_hint, but bi_rd_hint is a long bitmap
> > > in order to support stacked layer case.
> > >
> > > * Modify md/raid1 to support this retry feature.
> > >
> > > * Adapt xfs to use this feature.
> > > If the read verify fails, we loop over the available mirrors and retry the read.
> >
> > Why does the filesystem have to iterate every single possible
> > combination of devices that are underneath it?
> >
> > Wouldn't it be much simpler to be able to attach a verifier
> > function to the bio, and have each layer that gets called iterate
> > over all its copies internally until the verifier function passes
> > or all copies are exhausted?
> >
> > This works for stacked mirrors - it can pass the higher layer
> > verifier down as far as necessary. It can work for RAID5/6, too, by
> > having that layer supply its own verifier for reads that verifies
> > parity and can reconstruct on failure, then when it's reconstructed
> > a valid stripe it can run the verifier that was supplied to it from
> > above, etc.
> >
> > i.e. I don't see why only filesystems should drive retries or have to
> > be aware of the underlying storage stacking. ISTM that each
> > layer of the storage stack should be able to verify what has been
> > returned to it is valid independently of the higher layer
> > requirements. The only difference from a caller point of view should
> > be submit_bio(bio); vs submit_bio_verify(bio, verifier_cb_func);
>
> What if instead of constructing a giant pile of verifier call chain, we
> simply had a return value from ->bi_end_io that would then be returned
> from bio_endio()?

Conceptually it achieves the same thing - getting the high level
verifier status down to the lower layer to say "this copy is bad,
try again", but I suspect all the bio chaining and cloning done in
the stack makes this much more difficult than it seems.
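
As a rough illustration of "the lower layer retries until the upper
layer's verifier is happy", here is a minimal userspace sketch. It is
not kernel code: struct toy_mirror, toy_verify_fn, toy_mirror_read_verify()
and toy_fs_verify() are hypothetical names invented for this example,
and submit_bio_verify() itself is only a proposal in this thread. The
sketch ignores bio cloning and chaining entirely, which is exactly the
part suspected to be hard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NR_COPIES 3
#define TOY_BLKSZ     8

/* Hypothetical verifier type: returns true if the data is acceptable. */
typedef bool (*toy_verify_fn)(const char *buf, size_t len);

/* Toy mirror layer: every copy is just a block held in memory. */
struct toy_mirror {
	char copy[TOY_NR_COPIES][TOY_BLKSZ];
};

/*
 * Read each copy in turn until the caller-supplied verifier accepts one.
 * A NULL verifier means "no verification", so the first copy wins.
 * Returns the index of the copy that was used, or -1 if every copy
 * was rejected.
 */
static int toy_mirror_read_verify(const struct toy_mirror *m, char *buf,
				  toy_verify_fn verify)
{
	assert(m != NULL && buf != NULL);

	for (int i = 0; i < TOY_NR_COPIES; i++) {
		memcpy(buf, m->copy[i], TOY_BLKSZ);
		if (verify == NULL || verify(buf, TOY_BLKSZ))
			return i;
	}
	return -1;
}

/* Example upper-layer verifier: the block must start with a magic value. */
static bool toy_fs_verify(const char *buf, size_t len)
{
	return len >= 4 && memcmp(buf, "XFSB", 4) == 0;
}
```

With copies { "garbage", "XFSBok", "XFSBok" }, calling
toy_mirror_read_verify(&m, buf, toy_fs_verify) skips the first copy and
returns index 1. The caller never learns how many copies exist, which
is the point of pushing the loop down into the layer that owns them.
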
> Stacked things like dm-linear would have to know how to connect
> the upper endio to the lower endio though. And that could have
> its downsides, too.

Stacking always makes things hard :/

> How long do we tie up resources in the scsi
> layer while upper levels are busy running verification functions...?

I suspect there's a more important issue to worry about: we run the
XFS read verifiers in an async work queue context after collecting
the IO completion status from the bio, rather than running them
directly in the bio->bi_end_io() call chain.
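
To show why that ordering matters for a "return a value from
->bi_end_io" scheme, here is a small userspace model. Everything in it
is an assumption made for illustration: struct toy_buf, toy_end_io(),
toy_run_work() and the fixed-size queue are hypothetical stand-ins, not
the real XFS buffer code or the kernel workqueue API. It only models
the ordering: the completion handler returns before the verifier has
run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy buffer: data plus the state recorded at each completion stage. */
struct toy_buf {
	char data[8];
	int  io_error;    /* transport status captured at I/O completion */
	bool verify_ran;  /* set only once the deferred worker has run */
	bool verified;    /* verifier verdict, valid only if verify_ran */
};

#define TOY_MAX_WORK 4
static struct toy_buf *toy_work_queue[TOY_MAX_WORK];
static size_t toy_work_count;

/*
 * Completion handler: records the I/O status and defers the rest.
 * Its return value can only reflect what is known at this point,
 * which is the transport status and not the verifier's verdict.
 */
static int toy_end_io(struct toy_buf *bp, int io_error)
{
	assert(toy_work_count < TOY_MAX_WORK);
	bp->io_error = io_error;
	toy_work_queue[toy_work_count++] = bp;
	return io_error;
}

/* Stand-in read verifier: the block must start with a magic value. */
static bool toy_verify(const struct toy_buf *bp)
{
	return memcmp(bp->data, "XFSB", 4) == 0;
}

/* Deferred worker: runs later, in a different context from toy_end_io(). */
static void toy_run_work(void)
{
	for (size_t i = 0; i < toy_work_count; i++) {
		struct toy_buf *bp = toy_work_queue[i];

		bp->verify_ran = true;
		bp->verified = bp->io_error == 0 && toy_verify(bp);
	}
	toy_work_count = 0;
}
```

In this model toy_end_io() reports success for a block whose contents
are garbage, and the bad verdict only shows up after toy_run_work().
Any scheme that relies on the completion handler's return value would
need the verifier to run before that return, or some other way to feed
the later verdict back down.
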
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
Thread overview: 28+ messages
2019-02-13 9:50 [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry Bob Liu
2019-02-13 9:50 ` [RFC PATCH v2 1/9] block: add nr_mirrors to request_queue Bob Liu
2019-02-13 10:26 ` Andreas Dilger
2019-02-13 16:04 ` Theodore Y. Ts'o
2019-02-14 5:57 ` Bob Liu
2019-02-18 17:56 ` Theodore Y. Ts'o
2019-02-13 9:50 ` [RFC PATCH v2 2/9] block: add rd_hint to bio and request Bob Liu
2019-02-13 16:18 ` Jens Axboe
2019-02-14 6:10 ` Bob Liu
2019-02-13 9:50 ` [RFC PATCH v2 3/9] md:raid1: set mirrors correctly Bob Liu
2019-02-13 9:50 ` [RFC PATCH v2 4/9] md:raid1: rd_hint support and consider stacked layer case Bob Liu
2019-02-13 9:50 ` [RFC PATCH v2 5/9] Add b_alt_retry to xfs_buf Bob Liu
2019-02-13 9:50 ` [RFC PATCH v2 6/9] xfs: Add b_rd_hint " Bob Liu
2019-02-13 9:50 ` [RFC PATCH v2 7/9] xfs: Add device retry Bob Liu
2019-02-13 9:50 ` [RFC PATCH v2 8/9] xfs: Rewrite retried read Bob Liu
2019-02-13 9:50 ` [RFC PATCH v2 9/9] xfs: Add tracepoints and logging to alternate device retry Bob Liu
2019-02-18 8:08 ` [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror " jianchao.wang
2019-02-19 1:29 ` jianchao.wang
2019-02-18 21:31 ` Dave Chinner
2019-02-19 2:55 ` Darrick J. Wong
2019-02-19 3:33 ` Dave Chinner [this message]
2019-02-28 14:22 ` Bob Liu
2019-02-28 21:49 ` Dave Chinner
2019-03-03 2:37 ` Bob Liu
2019-03-03 23:18 ` Dave Chinner
2019-02-28 23:28 ` Andreas Dilger
2019-03-01 14:14 ` Bob Liu
2019-03-03 23:45 ` Dave Chinner