From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Naohiro Aota <Naohiro.Aota@wdc.com>
Cc: "hch@infradead.org" <hch@infradead.org>,
Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
Linux FS Devel <linux-fsdevel@vger.kernel.org>,
"dm-devel@redhat.com" <dm-devel@redhat.com>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?
Date: Wed, 24 Nov 2021 15:39:32 +0800
Message-ID: <60ecb6a2-da19-6876-8c43-42011b4e374d@gmx.com>
In-Reply-To: <20211124072533.tleak7xavj3tqxly@naota-xeon>
On 2021/11/24 15:25, Naohiro Aota wrote:
> On Wed, Nov 24, 2021 at 07:07:18AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2021/11/23 22:28, hch@infradead.org wrote:
>>> On Tue, Nov 23, 2021 at 11:39:11AM +0000, Johannes Thumshirn wrote:
>>>> I think we have to differentiate two cases here:
>>>> A "regular" REQ_OP_ZONE_APPEND bio and a RAID stripe REQ_OP_ZONE_APPEND
>>>> bio. The 1st one (i.e. the regular REQ_OP_ZONE_APPEND bio) can't be split
>>>> because we cannot guarantee the order the device writes the data to disk.
>>
>> That's correct.
>>
>> But if we want to move all bio splitting into the chunk layer, we want an
>> initial bio without any limitation, and then use that bio to create the
>> real REQ_OP_ZONE_APPEND bios with proper size limitations.
>>
>>>> For the RAID stripe bio we can split it into the two (or more) parts that
>>>> will end up on _different_ devices. All we need to do is a) ensure it
>>>> doesn't cross the device's zone append limit and b) clamp all
>>>> bi_iter.bi_sector down to the start of the target zone, a.k.a sticking to
>>>> the rules of REQ_OP_ZONE_APPEND.
>>>
>>> Exactly. A stacking driver must never split a REQ_OP_ZONE_APPEND bio.
>>> But the file system itself can of course split it as long as each split
>>> off bio has its own bi_end_io handler to record where it has been
>>> written to.
>>>
>>
>> This makes me wonder, can we really forget the zone thing for the
>> initial bio so we just create a plain bio without any special
>> limitation, and let every split condition be handled in the lower layer?
>>
>> Including raid stripe boundary, zone limitations etc.
>
> What really matters is to ensure the "one bio (for real zoned device)
> == one ordered extent" rule. When a device rewrites ZONE_APPEND bio's
> sector address, we rewrite the ordered extent's logical address
> accordingly in the end_io process. To ensure the rewriting works,
> one ordered extent must consist of exactly one contiguous bio.
>
> So, if we can split an ordered extent at the bio splitting process,
> that will be fine. Or, it is also fine if we can split an ordered
> extent at end_bio process. But, I think it is difficult because
> someone can be already waiting for the ordered extent, and splitting
> it at that point will break some assumptions in the code.
OK, I see the problem now.

extract_ordered_extent() relies on the zone append bio to split the
ordered extent, not the other way around, so splitting the bio in the
chunk layer will be more complex than I thought.

I'll leave the zoned part untouched for now, until I have a better solution.

Thanks,
Qu
>
>> (yeah, it's still not pure stacking driver, but it's more
>> stacking-driver like).
>>
>> In that case, the missing piece seems to be a way to convert a split
>> plain bio into a REQ_OP_ZONE_APPEND bio.
>>
>> Can this be done without slow bvec copying?
>>
>> Thanks,
>> Qu
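
The splitting rules quoted above (stay within the device's zone append
limit, clamp the submit sector to the start of the target zone, and give
each split-off piece its own completion record) can be illustrated with a
small userspace model. This is only a sketch under stated assumptions: it
is not kernel code, it handles a single target zone rather than a RAID
stripe spread over several devices, and struct append_piece,
split_for_zone_append() and piece_end_io() are hypothetical names that do
not exist in btrfs or the block layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical model of one zone append piece; not a kernel structure. */
struct append_piece {
	sector_t submit_sector;  /* always the start of the target zone */
	sector_t nr_sectors;     /* never larger than the zone append limit */
	sector_t logical_offset; /* offset inside the original write, in sectors */
	sector_t written_sector; /* filled in at completion time */
};

/*
 * Split a write of total_sectors into pieces of at most
 * max_append_sectors each. Every piece is aimed at zone_start, because
 * the device, not the submitter, picks the final location.
 *
 * Returns the number of pieces, or 0 if out[] is too small.
 */
size_t split_for_zone_append(sector_t total_sectors, sector_t zone_start,
			     sector_t max_append_sectors,
			     struct append_piece *out, size_t out_cap)
{
	size_t n = 0;
	sector_t done = 0;

	assert(max_append_sectors > 0);

	while (done < total_sectors) {
		sector_t len = total_sectors - done;

		if (len > max_append_sectors)
			len = max_append_sectors;
		if (n == out_cap)
			return 0;

		out[n].submit_sector = zone_start;
		out[n].nr_sectors = len;
		out[n].logical_offset = done;
		out[n].written_sector = 0;
		n++;
		done += len;
	}
	return n;
}

/*
 * Per-piece completion: record where the device actually placed the
 * data. In this model that is the only place the real location becomes
 * known, which is why each piece needs its own record.
 */
void piece_end_io(struct append_piece *p, sector_t device_sector)
{
	p->written_sector = device_sector;
}
```

For example, a 300-sector write with a 128-sector append limit would be
modelled as three pieces of 128, 128 and 44 sectors, all submitted at the
zone start. The model deliberately leaves out the ordered extent side of
the problem discussed in this message, which is the part that makes the
real change hard.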