public inbox for linux-btrfs@vger.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Naohiro Aota <Naohiro.Aota@wdc.com>
Cc: "hch@infradead.org" <hch@infradead.org>,
	Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?
Date: Wed, 24 Nov 2021 15:39:32 +0800	[thread overview]
Message-ID: <60ecb6a2-da19-6876-8c43-42011b4e374d@gmx.com> (raw)
In-Reply-To: <20211124072533.tleak7xavj3tqxly@naota-xeon>



On 2021/11/24 15:25, Naohiro Aota wrote:
> On Wed, Nov 24, 2021 at 07:07:18AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2021/11/23 22:28, hch@infradead.org wrote:
>>> On Tue, Nov 23, 2021 at 11:39:11AM +0000, Johannes Thumshirn wrote:
>>>> I think we have to differentiate two cases here:
>>>> A "regular" REQ_OP_ZONE_APPEND bio and a RAID stripe REQ_OP_ZONE_APPEND
>>>> bio. The 1st one (i.e. the regular REQ_OP_ZONE_APPEND bio) can't be split
>>>> because we cannot guarantee the order the device writes the data to disk.
>>
>> That's correct.
>>
>> But if we want to move all bio splitting into the chunk layer, we want
>> an initial bio without any limitation, and then use that bio to create
>> the real REQ_OP_ZONE_APPEND bios with proper size limits.
>>
>>>> For the RAID stripe bio we can split it into the two (or more) parts that
>>>> will end up on _different_ devices. All we need to do is a) ensure it
>>>> doesn't cross the device's zone append limit and b) clamp all
>>>> bi_iter.bi_sector down to the start of the target zone, a.k.a sticking to
>>>> the rules of REQ_OP_ZONE_APPEND.
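
As a userspace sketch of the two rules above (not kernel code; names
like stripe_len and max_append_len are made up for illustration, not
the real block-layer identifiers):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rule (a): a split-off fragment may cross neither the RAID stripe
 * boundary nor the device's zone-append size limit.
 */
static uint64_t split_len(uint64_t offset_in_stripe, uint64_t len,
			  uint64_t stripe_len, uint64_t max_append_len)
{
	uint64_t left_in_stripe = stripe_len - offset_in_stripe;

	if (len > left_in_stripe)
		len = left_in_stripe;
	if (len > max_append_len)
		len = max_append_len;
	return len;
}

/*
 * Rule (b): a zone-append bio must target the start of the zone;
 * the device reports the actual write position on completion.
 */
static uint64_t append_target(uint64_t sector, uint64_t zone_size)
{
	return sector - (sector % zone_size);	/* clamp to zone start */
}
```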
>>>
>>> Exactly.  A stacking driver must never split a REQ_OP_ZONE_APPEND bio.
>>> But the file system itself can of course split it as long as each split
>>> off bio has its own bi_end_io handler to record where it has been
>>> written to.
>>>
>>
>> This makes me wonder, can we really forget about zones for the
>> initial bio, so we just create a plain bio without any special
>> limitation, and let every split condition be handled by the lower layer?
>>
>> Including raid stripe boundary, zone limitations etc.
>
> What really matters is to ensure the "one bio (for real zoned device)
> == one ordered extent" rule. When a device rewrites ZONE_APPEND bio's
> sector address, we rewrite the ordered extent's logical address
> accordingly in the end_io process. To ensure the rewrite works,
> one extent must be composed of one contiguous bio.
>
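The rewrite constraint described above can be sketched in userspace
like this (struct and field names are illustrative, not btrfs's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * One completed zone-append fragment: on completion the device
 * reports where the data actually landed.
 */
struct frag {
	uint64_t actual_bytenr;
	uint64_t len;
};

/*
 * Rewriting an ordered extent's logical address in end_io only works
 * when the extent maps to one contiguous run on disk.  With a split
 * bio, each fragment gets its own append position, and nothing
 * guarantees the fragments line up back to back.
 */
static bool can_rewrite_as_one_extent(const struct frag *f, int n)
{
	for (int i = 1; i < n; i++)
		if (f[i].actual_bytenr !=
		    f[i - 1].actual_bytenr + f[i - 1].len)
			return false;
	return true;
}
```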
> So, if we can split an ordered extent at the bio splitting process,
> that will be fine. Or, it is also fine if we can split an ordered
> extent in the end_io process. But, I think it is difficult because
> someone can be already waiting for the ordered extent, and splitting
> it at that point will break some assumptions in the code.

OK, I see the problem now.

It's extract_ordered_extent() that relies on the zone append bio to
split the ordered extents, not the other way around.

Thus it will still be more complex than I thought to split bios in the
chunk layer.
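
To sketch that dependency in userspace (names are illustrative, not
the real btrfs ones): the ordered extent is cut down to match the
already size-limited zone-append bio, so bio boundaries drive extent
boundaries, roughly like:

```c
#include <assert.h>
#include <stdint.h>

/* A simplified stand-in for a btrfs ordered extent. */
struct oe {
	uint64_t start;
	uint64_t len;
};

/*
 * Shrink *front to the bio's length and return the leftover as a new
 * ordered extent, mimicking what extract_ordered_extent() does when a
 * bio covers only a prefix of the extent.
 */
static struct oe split_ordered(struct oe *front, uint64_t bio_len)
{
	struct oe back = {
		.start = front->start + bio_len,
		.len = front->len - bio_len,
	};

	front->len = bio_len;
	return back;
}
```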

I'll leave the zoned part untouched for now until I have a better solution.

Thanks,
Qu
>
>> (yeah, it's still not a pure stacking driver, but it's more
>> stacking-driver-like).
>>
>> In that case, the missing piece seems to be a way to convert a split
>> plain bio into a REQ_OP_ZONE_APPEND bio.
>>
>> Can this be done without slow bvec copying?
>>
>> Thanks,
>> Qu


Thread overview: 15+ messages
2021-11-23  6:44 Any bio_clone_slow() implementation which doesn't share bi_io_vec? Qu Wenruo
2021-11-23  7:43 ` Christoph Hellwig
2021-11-23  8:10   ` Qu Wenruo
2021-11-23  8:13     ` Christoph Hellwig
2021-11-23 11:09       ` Qu Wenruo
2021-11-23 11:39         ` Johannes Thumshirn
2021-11-23 14:28           ` hch
2021-11-23 23:07             ` Qu Wenruo
2021-11-24  6:09               ` hch
2021-11-24  6:18                 ` Qu Wenruo
2021-11-24  7:02                   ` hch
2021-11-24  7:22                     ` hch
2021-11-24  7:25               ` Naohiro Aota
2021-11-24  7:39                 ` Qu Wenruo [this message]
2021-11-26 12:33       ` Qu Wenruo
