linux-fsdevel.vger.kernel.org archive mirror
From: Bob Liu <bob.liu@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, martin.petersen@oracle.com,
	shirley.ma@oracle.com, allison.henderson@oracle.com,
	darrick.wong@oracle.com, hch@infradead.org, adilger@dilger.ca
Subject: Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry
Date: Sun, 3 Mar 2019 10:37:59 +0800
Message-ID: <4c930f97-31cd-cbd9-effb-db3090e0f273@oracle.com>
In-Reply-To: <20190228214949.GO23020@dastard>

On 3/1/19 5:49 AM, Dave Chinner wrote:
> On Thu, Feb 28, 2019 at 10:22:02PM +0800, Bob Liu wrote:
>> On 2/19/19 5:31 AM, Dave Chinner wrote:
>>> On Wed, Feb 13, 2019 at 05:50:35PM +0800, Bob Liu wrote:
>>>> Motivation:
>>>> When fs data/metadata checksums mismatch, lower block devices may have other
>>>> correct copies. E.g., if XFS successfully reads a metadata buffer off a raid1 but
>>>> decides that the metadata is garbage, today it will shut down the entire
>>>> filesystem without trying any of the other mirrors.  This is a severe
>>>> loss of service, and we propose these patches to have XFS try harder to
>>>> avoid failure.
>>>>
>>>> This patchset prototypes the mirror retry idea by:
>>>> * Adding @nr_mirrors to struct request_queue, similar to
>>>>   blk_queue_nonrot(), so a filesystem can grab the device request queue and
>>>>   check the maximum number of mirrors this block device has.
>>>>   Helper functions were also added to get/set nr_mirrors.
>>>>
>>>> * Introducing bi_rd_hint, analogous to bi_write_hint, except that bi_rd_hint
>>>>   is a long bitmap in order to support the stacked-layer case.
>>>>
>>>> * Modifying md/raid1 to support this retry feature.
>>>>
>>>> * Adapting XFS to use this feature.
>>>>   If the read verify fails, we loop over the available mirrors and retry the read.
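The filesystem-driven retry loop described above can be sketched as a userspace toy in C. This is not the patchset's code: struct fake_dev, block_checksum(), verify_block(), read_mirror() and read_with_mirror_retry() are hypothetical names, the read hint is modeled as a plain mirror index rather than the bi_rd_hint bitmap, and a one-byte sum stands in for the XFS read verifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE  8
#define MAX_MIRRORS 4

/* Hypothetical device: nr_mirrors plays the role of the proposed
 * request_queue field; each mirror holds its own copy of one block. */
struct fake_dev {
    int nr_mirrors;
    unsigned char copies[MAX_MIRRORS][BLOCK_SIZE];
};

/* Toy checksum: the last byte must equal the low 8 bits of the sum of
 * the preceding bytes (a stand-in for a real metadata verifier). */
static unsigned char block_checksum(const unsigned char *buf)
{
    unsigned int sum = 0;
    for (int i = 0; i < BLOCK_SIZE - 1; i++)
        sum += buf[i];
    return (unsigned char)(sum & 0xff);
}

static bool verify_block(const unsigned char *buf)
{
    return buf[BLOCK_SIZE - 1] == block_checksum(buf);
}

/* Read one specific mirror (models a read carrying a mirror hint). */
static void read_mirror(const struct fake_dev *dev, int mirror,
                        unsigned char *buf)
{
    memcpy(buf, dev->copies[mirror], BLOCK_SIZE);
}

/* Filesystem-side loop: try each mirror until one passes verification.
 * Returns the index of the mirror that succeeded, or -1 if every copy
 * fails verification. */
static int read_with_mirror_retry(const struct fake_dev *dev,
                                  unsigned char *buf)
{
    assert(dev->nr_mirrors >= 1 && dev->nr_mirrors <= MAX_MIRRORS);
    for (int mirror = 0; mirror < dev->nr_mirrors; mirror++) {
        read_mirror(dev, mirror, buf);
        if (verify_block(buf))
            return mirror;
    }
    return -1;
}
```

In this shape the loop lives entirely in the filesystem, which therefore has to know how many mirrors sit underneath it.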
>>>
>>> Why does the filesystem have to iterate every single possible
>>> combination of devices that are underneath it?
>>>
>>> Wouldn't it be much simpler to be able to attach a verifier
>>> function to the bio, and have each layer that gets called iterate
>>> over all its copies internally until the verifier function passes
>>> or all copies are exhausted?
>>>
>>> This works for stacked mirrors - it can pass the higher layer
>>> verifier down as far as necessary. It can work for RAID5/6, too, by
>>> having that layer supply its own verifier for reads that verifies
>>> parity and can reconstruct on failure, then when it's reconstructed
>>> a valid stripe it can run the verifier that was supplied to it from
>>> above, etc.
>>>
>>> i.e. I don't see why only filesystems should drive retries or have to
>>> be aware of the underlying storage stacking. ISTM that each
>>> layer of the storage stack should be able to verify what has been
>>> returned to it is valid independently of the higher layer
>>> requirements. The only difference from a caller point of view should
>>> be submit_bio(bio); vs submit_bio_verify(bio, verifier_cb_func);
>>>
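The alternative suggested above, where the caller hands a verifier down and each layer retries over its own copies, can be modeled with a recursive toy storage stack. struct node, verify_fn, submit_verify() and match_expected() are hypothetical names for illustration only (submit_bio_verify() is itself just a proposal in this thread), and RAID5/6 parity reconstruction is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLK 4

/* Hypothetical verifier callback, standing in for the verifier_cb_func
 * a caller would pass to the proposed submit_bio_verify(). */
typedef bool (*verify_fn)(const unsigned char *buf, void *ctx);

/* Toy storage stack: a node is either a leaf device holding one copy of
 * the block, or a mirror layer over children that may themselves be
 * mirrors (the stacked case). */
struct node {
    const unsigned char *data;   /* non-NULL for a leaf device */
    struct node **children;      /* non-NULL for a mirror layer */
    int nr_children;
};

/* Each layer retries internally: a leaf returns the verifier's verdict,
 * a mirror tries its children in order and stops at the first copy the
 * caller's verifier accepts. The verifier is passed down unchanged. */
static bool submit_verify(const struct node *n, unsigned char *buf,
                          verify_fn verify, void *ctx)
{
    assert(n->data != NULL || n->children != NULL);
    if (n->data) {
        memcpy(buf, n->data, BLK);
        return verify(buf, ctx);
    }
    for (int i = 0; i < n->nr_children; i++)
        if (submit_verify(n->children[i], buf, verify, ctx))
            return true;
    return false;   /* all copies at this layer exhausted */
}

/* Example verifier: compare against a known-good image, standing in
 * for a real checksum or magic-number check. */
static bool match_expected(const unsigned char *buf, void *ctx)
{
    return memcmp(buf, ctx, BLK) == 0;
}
```

With this shape the caller makes one call and never learns how many mirrors exist or how they are stacked.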
>>
>> We already have bio->bi_end_io(); how about doing the verification inside bi_end_io()?
>>
>> Then the whole sequence would look like:
>> bio_endio()
>>     > 1.bio->bi_end_io()
>>         > xfs_buf_bio_end_io()
>>             > verify, set bio->bi_status = "please retry" if verify fail
>>         
>>     > 2.if found bio->bi_status = retry
>>     > 3.resubmit bio
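The sequence above (verify in ->bi_end_io(), flag a retry, resubmit from bio_endio()) can be sketched as a single-threaded toy in C. Everything here is hypothetical: struct toy_bio, toy_submit(), toy_bio_endio(), fs_end_io() and the TOY_RETRY status (the block layer has no such "please retry" status). It also simply assumes resubmitting from the completion path is safe, which in the kernel depends on the context bio_endio() runs in.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLK 4
#define NR_MIRRORS 3

/* Hypothetical status codes; only TOY_OK mirrors a real concept. */
enum toy_status { TOY_OK, TOY_RETRY, TOY_EIO };

struct toy_bio {
    int mirror;                     /* which copy this attempt reads */
    unsigned char buf[BLK];
    enum toy_status status;
    void (*end_io)(struct toy_bio *);
    const unsigned char *expected;  /* what the filesystem verifier expects */
};

static unsigned char toy_disk[NR_MIRRORS][BLK];

/* Filesystem completion callback: verify synchronously and ask for a
 * retry on failure (step 1 of the sequence). */
static void fs_end_io(struct toy_bio *bio)
{
    if (memcmp(bio->buf, bio->expected, BLK) != 0)
        bio->status = TOY_RETRY;
}

static void toy_submit(struct toy_bio *bio);

/* Completion path: run ->end_io, and if it asked for a retry, resubmit
 * to the next mirror (steps 2 and 3). */
static void toy_bio_endio(struct toy_bio *bio)
{
    bio->end_io(bio);
    if (bio->status == TOY_RETRY) {
        if (bio->mirror + 1 < NR_MIRRORS) {
            bio->mirror++;
            toy_submit(bio);
        } else {
            bio->status = TOY_EIO;  /* every mirror failed verification */
        }
    }
}

/* Toy "device": read the selected mirror and complete immediately. */
static void toy_submit(struct toy_bio *bio)
{
    assert(bio->mirror >= 0 && bio->mirror < NR_MIRRORS);
    memcpy(bio->buf, toy_disk[bio->mirror], BLK);
    bio->status = TOY_OK;
    toy_bio_endio(bio);
}
```

This only works because the verifier runs synchronously inside ->end_io and the bio is still alive when the retry decision is made.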
> 
> As I mentioned to Darrick, this isn't as simple as it seems
> because what XFS actually does is this:
> 
> IO completion thread			Workqueue Thread
> bio_endio(bio)
>   bio->bi_end_io(bio)
>     xfs_buf_bio_end_io(bio)
>       bp->b_error = bio->bi_status
>       xfs_buf_ioend_async(bp)
>         queue_work(bp->b_ioend_wq, bp)
>       bio_put(bio)
> <io completion done>
> 					.....
> 					xfs_buf_ioend(bp)
> 					  bp->b_ops->read_verify()
> 					.....
> 
> IOWs, XFS does not do read verification inside the bio completion
> context, but instead defers it to an external workqueue so it does
> not delay processing incoming bio IO completions. Hence there is no
> way to get the verification status back to the bio completion (the
> bio has already been freed!) to resubmit from there.
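A toy model of the deferred completion described above shows why the bio cannot be resubmitted from its own completion: by the time the verifier runs in the worker, the bio has been freed, so any retry has to be a fresh read issued for the buffer. struct toy_buf, struct toy_bio, the array-based work queue, toy_submit_read() and run_workqueue() are hypothetical stand-ins for xfs_buf, the bio, bp->b_ioend_wq and xfs_buf_ioend(); the mirror-index retry in the worker is an assumption modeled on the patchset's filesystem-level loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BLK 4
#define NR_MIRRORS 2
#define QLEN 8

/* Toy buffer standing in for struct xfs_buf (only what the sketch needs). */
struct toy_buf {
    unsigned char data[BLK];
    int error;
    int mirror;                     /* copy to read next (models a read hint) */
    const unsigned char *expected;  /* what the read verifier expects */
    bool done;
};

/* Toy bio: short-lived, freed at the end of its completion. */
struct toy_bio {
    struct toy_buf *bp;
    int status;
};

static unsigned char disk[NR_MIRRORS][BLK];
static struct toy_buf *workq[QLEN];  /* stand-in for bp->b_ioend_wq */
static int wq_head, wq_tail;

/* IO completion context: copy status, queue deferred work, drop the bio.
 * No verification happens here, as in xfs_buf_bio_end_io(). */
static void toy_bio_end_io(struct toy_bio *bio)
{
    struct toy_buf *bp = bio->bp;
    bp->error = bio->status;
    assert(wq_tail < QLEN);
    workq[wq_tail++] = bp;  /* like queue_work(bp->b_ioend_wq, ...) */
    free(bio);              /* like bio_put(): the bio is gone from here on */
}

static void toy_submit_read(struct toy_buf *bp)
{
    struct toy_bio *bio = malloc(sizeof(*bio));
    assert(bio != NULL);
    bio->bp = bp;
    bio->status = 0;
    memcpy(bp->data, disk[bp->mirror], BLK);
    toy_bio_end_io(bio);
}

/* Workqueue context, like xfs_buf_ioend(): the verifier runs long after
 * the bio was freed, so a retry must be a brand-new read for the buffer. */
static void run_workqueue(void)
{
    while (wq_head < wq_tail) {
        struct toy_buf *bp = workq[wq_head++];
        if (memcmp(bp->data, bp->expected, BLK) == 0) {
            bp->done = true;
        } else if (bp->mirror + 1 < NR_MIRRORS) {
            bp->mirror++;
            toy_submit_read(bp);  /* queues more work; the loop picks it up */
        } else {
            bp->error = -1;       /* stand-in for a corruption error */
            bp->done = true;
        }
    }
}
```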
> 
> This is one of the reasons I suggested a verifier be added to the
> submission, so the bio itself is wholly responsible for running it,
> not an external, filesystem level completion function that may
> operate outside of bio scope....

But then the completion time of an I/O would be longer if the verifier
function is called inside bio_endio(). Would that be a problem? XFS
currently verifies asynchronously because, as you mentioned, it defers
that work to a workqueue.

Thanks, -Bob

> 
>> Is it fine to resubmit a bio inside bio_endio()?
> 
> Depends on the context the bio_endio() completion is running in.
> 

Thread overview: 28+ messages
2019-02-13  9:50 [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 1/9] block: add nr_mirrors to request_queue Bob Liu
2019-02-13 10:26   ` Andreas Dilger
2019-02-13 16:04   ` Theodore Y. Ts'o
2019-02-14  5:57     ` Bob Liu
2019-02-18 17:56       ` Theodore Y. Ts'o
2019-02-13  9:50 ` [RFC PATCH v2 2/9] block: add rd_hint to bio and request Bob Liu
2019-02-13 16:18   ` Jens Axboe
2019-02-14  6:10     ` Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 3/9] md:raid1: set mirrors correctly Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 4/9] md:raid1: rd_hint support and consider stacked layer case Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 5/9] Add b_alt_retry to xfs_buf Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 6/9] xfs: Add b_rd_hint " Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 7/9] xfs: Add device retry Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 8/9] xfs: Rewrite retried read Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 9/9] xfs: Add tracepoints and logging to alternate device retry Bob Liu
2019-02-18  8:08 ` [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror " jianchao.wang
2019-02-19  1:29   ` jianchao.wang
2019-02-18 21:31 ` Dave Chinner
2019-02-19  2:55   ` Darrick J. Wong
2019-02-19  3:33     ` Dave Chinner
2019-02-28 14:22   ` Bob Liu
2019-02-28 21:49     ` Dave Chinner
2019-03-03  2:37       ` Bob Liu [this message]
2019-03-03 23:18         ` Dave Chinner
2019-02-28 23:28     ` Andreas Dilger
2019-03-01 14:14       ` Bob Liu
2019-03-03 23:45       ` Dave Chinner
