linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@kernel.org>
Cc: Shaohua Li <shli@fb.com>,
	linux-raid@vger.kernel.org, khlebnikov@yandex-team.ru,
	hch@lst.de
Subject: Re: [PATCH 1/5] MD: attach data to each bio
Date: Tue, 14 Feb 2017 13:40:30 +1100	[thread overview]
Message-ID: <87vasd336p.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20170213184942.x3s2hawueq3ryzj3@kernel.org>


On Mon, Feb 13 2017, Shaohua Li wrote:

> On Mon, Feb 13, 2017 at 08:49:33PM +1100, Neil Brown wrote:
>> On Thu, Feb 09 2017, Shaohua Li wrote:
>> 
>> > On Fri, Feb 10, 2017 at 05:08:54PM +1100, Neil Brown wrote:
>> >> On Tue, Feb 07 2017, Shaohua Li wrote:
>> >> 
>> >> > Currently MD is abusing some bio fields. To remove the hack, we attach
>> >> > extra data to each bio. Each personality can attach extra data to the
>> >> > bios, so we don't need to abuse bio fields.
>> >> 
>> >> I must say that I don't really like this approach.
>> >> Temporarily modifying ->bi_private and ->bi_end_io seems
>> >> .... intrusive.   I suspect it works, but I wonder if it is really
>> >> robust in the long term.
>> >> 
>> >> How about a different approach..  Your main concern with my first patch
>> >> was that it called md_write_start() and md_write_end() much more often,
>> >> and these performed atomic ops on "global" variables, particular
>> >> writes_pending.
>> >> 
>> >> We could change writes_pending to a per-cpu array which we only count
>> >> occasionally when needed.  As writes_pending is updated often and
>> >> checked rarely, a per-cpu array which is summed on demand seems
>> >> appropriate.
>> >> 
>> >> The following patch is an early draft - it doesn't obviously fail and
>> >> isn't obviously wrong to me.  There is certainly room for improvement
>> >> and there may be bugs.
>> >> Next week I'll work on collecting the re-factoring into separate
>> >> patches, which are possibly good to have anyway.
>> >
>> > For your first patch, I don't have much concern. It's ok to me. What I don't
>> > like is the bi_phys_segments handling part. The patches add a lot of logic to
>> > handle the reference count. They should work, but I'd say it's not easy to
>> > understand and could be error prone. What we really need is a reference count
>> > for the bio, so let's just add a reference count. That's my logic and it's
>> > simple.
>> 
>> We already have two reference counts, and you want to add a third one.
>> 
>> bi_phys_segments is currently used for two related purposes.
>> It counts the number of stripe_heads currently attached to the bio so
>> that when the count reaches zero:
>>  1/ ->writes_pending can be decremented
>>  2/ bio_endio() can be called.
>> 
>> When the code was written, the __bi_remaining counter didn't exist.  Now
>> it does and it is integrated with bio_endio() so it should make the code
>> easier to understand if we just use bio_endio() rather and doing our own
>> accounting.
>> 
>> That just leaves '1'.  Instead of decrementing a per-bio refcount and
>> then, when it reaches zero, decrementing ->writes_pending, we can
>> easily decrement ->writes_pending directly.  As you pointed out, that
>> comes with a cost.  If ->writes_pending is changed to a per-cpu array
>> which is summed on demand, the cost goes away.
>> 
>> Having an extra refcount in the bio just adds a level of indirection
>> that doesn't (that I can see) provide actual value.
>
> Ok, fair enough. I do think an explicit counter on the driver side would help
> us a lot, e.g. we could better control when to end the io and do something
> there (for example the current blk trace, or something we want to add in the
> future). But I don't insist currently.
>
> For the patches, can you repost? I think:
> - patch 2 missed md_write_start for make_discard_request
> - It's unnecessary to zero bi_phys_segments in patch 5. And raid5-cache needs
>   the same bio_endio change.
> For the md_write_start optimization, we can do it later.

Sure. I agree those two changes are needed.  I'll try to send something
in the next day or so.

NeilBrown


>
> Thanks,
> Shaohua



Thread overview: 16+ messages
2017-02-07 16:55 [PATCH 0/5] MD: don't abuse bi_phys_segements Shaohua Li
2017-02-07 16:55 ` [PATCH 1/5] MD: attach data to each bio Shaohua Li
2017-02-08  9:07   ` Guoqing Jiang
2017-02-08  5:24     ` Shaohua Li
2017-02-09  0:09     ` Wols Lists
2017-02-10  6:08   ` NeilBrown
2017-02-10  6:47     ` Shaohua Li
2017-02-13  9:49       ` NeilBrown
2017-02-13 18:49         ` Shaohua Li
2017-02-14  2:40           ` NeilBrown [this message]
2017-02-13  7:37     ` Christoph Hellwig
2017-02-13  9:32       ` NeilBrown
2017-02-07 16:55 ` [PATCH 2/5] md/raid5: don't abuse bio->bi_phys_segments Shaohua Li
2017-02-07 16:56 ` [PATCH 3/5] md/raid5: change disk limits Shaohua Li
2017-02-07 16:56 ` [PATCH 4/5] md/raid1: don't abuse bio->bi_phys_segments Shaohua Li
2017-02-07 16:56 ` [PATCH 5/5] md/raid10: " Shaohua Li
