All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@kernel.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Christoph Hellwig <hch@lst.de>,
	linux-raid@vger.kernel.org
Subject: Re: [md PATCH 0/5] Stop using bi_phys_segments as a counter
Date: Tue, 22 Nov 2016 13:19:07 +1100	[thread overview]
Message-ID: <87k2bwcl44.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20161122010220.dcq6brjhsliw4io6@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 3534 bytes --]

On Tue, Nov 22 2016, Shaohua Li wrote:

> On Tue, Nov 22, 2016 at 11:25:04AM +1100, Neil Brown wrote:
>> On Tue, Nov 22 2016, Shaohua Li wrote:
>> 
>> > On Mon, Nov 21, 2016 at 12:19:43PM +1100, Neil Brown wrote:
>> >> There are 2 problems with using bi_phys_segments as a counter
>> >> 1/ we only use 16bits, which limits bios to 256M
>> >> 2/ it is poor form to reuse a field like this.  It interferes
>> >>    with other changes to bios.
>> >> 
>> >> We need to clean up a few things before we can change the use the
>> >> counter which is now available inside a bio.
>> >> 
>> >> I have only tested this lightly.  More review and testing would be
>> >> appreciated.
>> >
>> > So without the accounting, we:
>> > - don't do bio completion trace
>> 
>> Yes, but hopefully that will be added back to bio_endio() soon.
>> 
>> > - call md_write_start/md_write_end excessively, which involves atomic operation.
>> 
>> raid5_inc_bio_active_stripes() did an atomic operation.  I don't think
>> there is a net increase in the number of atomic operations.
>
> That's different. md_write_start/end uses a global atomic.
> raid5_inc_bio_active_stripes uses a bio atomic. So we have more cache bouncing now.

Maybe.
Most md_write_start() calls are made in the context of
raid5_make_request().
We could
 - call md_write_start() once at the start
 - count how many times we want to call it in a variable local to
   raid5_make_request()
 - atomically add that to the counter at the end.

Similarly mode md_write_end() requests are in the context of raid5d.  It
could maintain local counter and apply them all in a single update
before it sleeps.

It would be a little messy, but not too horrible I think.

>  
>> >
>> > Not big problems. But we are actually reusing __bi_remaining, I'm wondering why
>> > we not explicitly reuse it. Eg, adds bio_dec_remaining_return() and uses it
>> > like raid5_dec_bi_active_stripes.
>> 
>> Because using it exactly the same way that other places use it leads to
>> fewer surprises, now or later.
>> And I think that the effort to rearrange the code so that we could just
>> call bio_endio() brought real improvements in code clarity and
>> simplicity.
>
> Not the same way. The return_bi list and retry list fix are still good. We can
> replace the bio_endio in your patch with something like:
> if (bio_dec_remaining_return() == 0) {
> 	trace_block_bio_complete()
> 	md_write_end()
> 	bio_endio();
> }
> This will give us better control when to end io.

This isn't safe.  The bio arriving at raid5_make_request() might already
have been split and could be chained.  Then raid5 might never see
bio_dec_remaining_return() return zero.

For example, suppose there is a RAID0 make of some other device, and
this RAID5.
A write request arrives which crosses a chunk boundary.
raid0.c calls bio_split to split off a new bio that will fit in the other
device, leaving the original bio with a larger bi_sector which will get
mapped only into the raid5.
The split bio is chained into the original bio, elevating its
__bi_remaining count.
If the other device is particularly slow, or the RAID5 is particularly
fast, the RAID5 IO might complete before the split bio completes, so
raid5 will only see __bi_remaining go down to one, not zero.
When the split bio finally completes, it's bi_endio is
bio_chain_endio(), and that will call the final bio_endio() on the
original bio.  md_write_end() would then never be called.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]

  reply	other threads:[~2016-11-22  2:19 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-21  1:19 [md PATCH 0/5] Stop using bi_phys_segments as a counter NeilBrown
2016-11-21  1:19 ` [md PATCH 2/5] md/raid5: use md_write_start to count stripes, not bios NeilBrown
2016-11-21  1:19 ` [md PATCH 4/5] md/raid5: call bio_endio() directly rather than queuing for later NeilBrown
2016-11-21  1:19 ` [md PATCH 3/5] md/raid5: simplfy delaying of writes while metadata is updated NeilBrown
2016-11-21  1:19 ` [md PATCH 1/5] md: optimize md_write_start() slightly NeilBrown
2016-11-21  1:19 ` [md PATCH 5/5] md/raid5: use bio_inc_remaining() instead of repurposing bi_phys_segments as a counter NeilBrown
2016-11-21  2:32 ` [md PATCH 6/5] md/raid5: remove over-loading of ->bi_phys_segments NeilBrown
2016-11-21 14:01 ` [md PATCH 0/5] Stop using bi_phys_segments as a counter Christoph Hellwig
2016-11-21 23:43 ` Shaohua Li
2016-11-22  0:25   ` NeilBrown
2016-11-22  1:02     ` Shaohua Li
2016-11-22  2:19       ` NeilBrown [this message]
2016-11-22  8:01         ` Shaohua Li
2016-11-23  2:08           ` NeilBrown
2016-11-23  8:45             ` Christoph Hellwig
2016-11-24  0:31               ` NeilBrown
2017-02-06  8:56 ` Christoph Hellwig
2017-02-06 21:41   ` Shaohua Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k2bwcl44.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=hch@lst.de \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-raid@vger.kernel.org \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.