From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@kernel.org>
Cc: Shaohua Li <shli@fb.com>,
linux-raid@vger.kernel.org, khlebnikov@yandex-team.ru,
hch@lst.de
Subject: Re: [PATCH 1/5] MD: attach data to each bio
Date: Mon, 13 Feb 2017 20:49:33 +1100 [thread overview]
Message-ID: <8760ke4dzm.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20170210064715.tpr5mzvccmgxz2af@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 3358 bytes --]
On Thu, Feb 09 2017, Shaohua Li wrote:
> On Fri, Feb 10, 2017 at 05:08:54PM +1100, Neil Brown wrote:
>> On Tue, Feb 07 2017, Shaohua Li wrote:
>>
>> > Currently MD is rebusing some bio fields. To remove the hack, we attach
>> > extra data to each bio. Each personablity can attach extra data to the
>> > bios, so we don't need to rebuse bio fields.
>>
>> I must say that I don't really like this approach.
>> Temporarily modifying ->bi_private and ->bi_end_io seems
>> .... intrusive. I suspect it works, but I wonder if it is really
>> robust in the long term.
>>
>> How about a different approach.. Your main concern with my first patch
>> was that it called md_write_start() and md_write_end() much more often,
>> and these performed atomic ops on "global" variables, particular
>> writes_pending.
>>
>> We could change writes_pending to a per-cpu array which we only count
>> occasionally when needed. As writes_pending is updated often and
>> checked rarely, a per-cpu array which is summed on demand seems
>> appropriate.
>>
>> The following patch is an early draft - it doesn't obviously fail and
>> isn't obviously wrong to me. There is certainly room for improvement
>> and may be bugs.
>> Next week I'll work on collection the re-factoring into separate
>> patches, which are possible good-to-have anyway.
>
> For your first patch, I don't have much concern. It's ok to me. What I don't
> like is the bi_phys_segments handling part. The patches add a lot of logic to
> handle the reference count. They should work, but I'd say it's not easy to
> understand and could be error prone. What we really need is a reference count
> for the bio, so let's just add a reference count. That's my logic and it's
> simple.
We already have two reference counts, and you want to add a third one.
bi_phys_segments is currently used for two related purposes.
It counts the number of stripe_heads currently attached to the bio so
that when the count reaches zero:
1/ ->writes_pending can be decremented
2/ bio_endio() can be called.
When the code was written, the __bi_remaining counter didn't exist. Now
it does and it is integrated with bio_endio() so it should make the code
easier to understand if we just use bio_endio() rather and doing our own
accounting.
That just leaves '1'. We can easily decrement ->writes_pending directly
instead of decrementing a per-bio refcount, and then when it reaches
zero, decrement ->writes_pending. As you pointed out, that comes with a
cost. If ->writes_pending is changed to a per-cpu array which is summed
on demand, the cost goes away.
Having an extra refcount in the bio just adds a level of indirection
that doesn't (that I can see) provide actual value.
>
> For the modifying bi_private and bi_end_io part, I saw some filesystems are
> using this way, at least btrfs. If this is really intrusive, is cloning a bio
> better?
The bio belongs to the filesystem. It allocated it and can do whatever
it likes with bi_end_io and bi_private. I don't think a block device
driver should ever change bi_private of bi_end_io of a bio that it was
passed (if it allocates its own bios, it can of course change those).
I don't think cloning the bio would really help, though you could
probably make something work.
Thanks,
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
next prev parent reply other threads:[~2017-02-13 9:49 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-02-07 16:55 [PATCH 0/5] MD: don't abuse bi_phys_segements Shaohua Li
2017-02-07 16:55 ` [PATCH 1/5] MD: attach data to each bio Shaohua Li
2017-02-08 9:07 ` Guoqing Jiang
2017-02-08 5:24 ` Shaohua Li
2017-02-09 0:09 ` Wols Lists
2017-02-10 6:08 ` NeilBrown
2017-02-10 6:47 ` Shaohua Li
2017-02-13 9:49 ` NeilBrown [this message]
2017-02-13 18:49 ` Shaohua Li
2017-02-14 2:40 ` NeilBrown
2017-02-13 7:37 ` Christoph Hellwig
2017-02-13 9:32 ` NeilBrown
2017-02-07 16:55 ` [PATCH 2/5] md/raid5: don't abuse bio->bi_phys_segments Shaohua Li
2017-02-07 16:56 ` [PATCH 3/5] md/raid5: change disk limits Shaohua Li
2017-02-07 16:56 ` [PATCH 4/5] md/raid1: don't abuse bio->bi_phys_segments Shaohua Li
2017-02-07 16:56 ` [PATCH 5/5] md/raid10: " Shaohua Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8760ke4dzm.fsf@notabene.neil.brown.name \
--to=neilb@suse.com \
--cc=hch@lst.de \
--cc=khlebnikov@yandex-team.ru \
--cc=linux-raid@vger.kernel.org \
--cc=shli@fb.com \
--cc=shli@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).