linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@kernel.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Christoph Hellwig <hch@lst.de>,
	linux-raid@vger.kernel.org
Subject: Re: [md PATCH 0/5] Stop using bi_phys_segments as a counter
Date: Tue, 22 Nov 2016 13:19:07 +1100
Message-ID: <87k2bwcl44.fsf@notabene.neil.brown.name>
In-Reply-To: <20161122010220.dcq6brjhsliw4io6@kernel.org>

On Tue, Nov 22 2016, Shaohua Li wrote:

> On Tue, Nov 22, 2016 at 11:25:04AM +1100, Neil Brown wrote:
>> On Tue, Nov 22 2016, Shaohua Li wrote:
>> 
>> > On Mon, Nov 21, 2016 at 12:19:43PM +1100, Neil Brown wrote:
>> >> There are 2 problems with using bi_phys_segments as a counter
>> >> 1/ we only use 16bits, which limits bios to 256M
>> >> 2/ it is poor form to reuse a field like this.  It interferes
>> >>    with other changes to bios.
>> >> 
>> >> We need to clean up a few things before we can change the use the
>> >> counter which is now available inside a bio.
>> >> 
>> >> I have only tested this lightly.  More review and testing would be
>> >> appreciated.
>> >
>> > So without the accounting, we:
>> > - don't do bio completion trace
>> 
>> Yes, but hopefully that will be added back to bio_endio() soon.
>> 
>> > - call md_write_start/md_write_end excessively, which involves atomic operation.
>> 
>> raid5_inc_bio_active_stripes() did an atomic operation.  I don't think
>> there is a net increase in the number of atomic operations.
>
> That's different. md_write_start/end uses a global atomic.
> raid5_inc_bio_active_stripes uses a bio atomic. So we have more cache bouncing now.

Maybe.
Most md_write_start() calls are made in the context of
raid5_make_request().
We could
 - call md_write_start() once at the start
 - count how many times we want to call it in a variable local to
   raid5_make_request()
 - atomically add that to the counter at the end.

Similarly, most md_write_end() calls are made in the context of raid5d.  It
could maintain a local counter and apply it in a single update
before it sleeps.

It would be a little messy, but not too horrible I think.
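
A rough userspace sketch of that batching idea, using C11 atomics rather
than kernel code.  Everything named here (writes_pending, atomic_updates,
shared_counter_add(), make_request_per_stripe(), make_request_batched())
is a hypothetical stand-in invented for illustration; none of it is the
md API.  Each function returns how many updates it made to the shared
counter, which is the cost being argued about.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the shared counter md_write_start() bumps. */
static atomic_int writes_pending;
/* Instrumentation only: how many times the shared counter was touched. */
static int atomic_updates;

static void shared_counter_add(int n)
{
	assert(n > 0);
	atomic_fetch_add(&writes_pending, n);
	atomic_updates++;
}

/* One shared atomic update per stripe. */
static int make_request_per_stripe(int nr_stripes)
{
	int before = atomic_updates;

	for (int i = 0; i < nr_stripes; i++)
		shared_counter_add(1);
	return atomic_updates - before;
}

/* One update at the start, the rest counted locally and added at the end. */
static int make_request_batched(int nr_stripes)
{
	int before = atomic_updates;
	int deferred = 0;

	if (nr_stripes > 0)
		shared_counter_add(1);	/* the single call made up front */
	for (int i = 1; i < nr_stripes; i++)
		deferred++;		/* local variable, no shared cache line */
	if (deferred)
		shared_counter_add(deferred);
	return atomic_updates - before;
}
```

Under these assumptions a request covering 8 stripes touches the shared
counter twice instead of 8 times, and the counter ends up with the same
value either way.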

>  
>> >
>> > Not big problems. But we are actually reusing __bi_remaining, I'm wondering why
>> > we not explicitly reuse it. Eg, adds bio_dec_remaining_return() and uses it
>> > like raid5_dec_bi_active_stripes.
>> 
>> Because using it exactly the same way that other places use it leads to
>> fewer surprises, now or later.
>> And I think that the effort to rearrange the code so that we could just
>> call bio_endio() brought real improvements in code clarity and
>> simplicity.
>
> Not the same way. The return_bi list and retry list fix are still good. We can
> replace the bio_endio in your patch with something like:
> if (bio_dec_remaining_return() == 0) {
> 	trace_block_bio_complete()
> 	md_write_end()
> 	bio_endio();
> }
> This will give us better control when to end io.

This isn't safe.  The bio arriving at raid5_make_request() might already
have been split and could be chained.  Then raid5 might never see
bio_dec_remaining_return() return zero.

For example, suppose there is a RAID0 made of some other device and
this RAID5.
A write request arrives which crosses a chunk boundary.
raid0.c calls bio_split to split off a new bio that will fit in the other
device, leaving the original bio with a larger bi_sector which will get
mapped only into the raid5.
The split bio is chained into the original bio, elevating its
__bi_remaining count.
If the other device is particularly slow, or the RAID5 is particularly
fast, the RAID5 IO might complete before the split bio completes, so
raid5 will only see __bi_remaining go down to one, not zero.
When the split bio finally completes, its bi_endio is
bio_chain_endio(), and that will call the final bio_endio() on the
original bio.  md_write_end() would then never be called.
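
The ordering problem can be shown with a toy userspace model.  This is
only a sketch: struct toy_bio and the toy_*() helpers are hypothetical
and merely imitate the counting done by __bi_remaining, bio_chain() and
bio_endio().  It also assumes raid5 holds exactly one reference on the
original bio.  raid5_saw_zero() reports whether the raid5-side check
(the place where md_write_end() would be called) ever observes zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_bio {
	int remaining;		/* models __bi_remaining, starts at 1 */
	struct toy_bio *parent;	/* set when this bio is chained to another */
	bool ended;		/* final completion has run */
};

static void toy_bio_init(struct toy_bio *b)
{
	b->remaining = 1;
	b->parent = NULL;
	b->ended = false;
}

/* Models bio_chain(): the parent cannot complete before the child. */
static void toy_bio_chain(struct toy_bio *child, struct toy_bio *parent)
{
	child->parent = parent;
	parent->remaining++;
}

/* Models the bio_dec_remaining_return() helper proposed in this thread. */
static int toy_dec_remaining_return(struct toy_bio *b)
{
	assert(b->remaining > 0);
	return --b->remaining;
}

/* Models bio_endio(), including the hand-off done by bio_chain_endio(). */
static void toy_bio_endio(struct toy_bio *b)
{
	while (b) {
		if (toy_dec_remaining_return(b) != 0)
			return;
		b->ended = true;
		b = b->parent;
	}
}

static bool raid5_saw_zero(bool split_completes_first)
{
	struct toy_bio orig, split;
	bool saw_zero = false;

	toy_bio_init(&orig);
	toy_bio_init(&split);
	toy_bio_chain(&split, &orig);	/* raid0 split: orig.remaining == 2 */

	if (split_completes_first)
		toy_bio_endio(&split);

	/* raid5 finishes its part and drops its reference. */
	if (toy_dec_remaining_return(&orig) == 0) {
		saw_zero = true;	/* md_write_end() would be called here */
		orig.ended = true;
	}

	if (!split_completes_first)
		toy_bio_endio(&split);	/* final completion, outside raid5 */

	assert(orig.ended);		/* the bio always completes ... */
	return saw_zero;		/* ... but raid5 may not see the zero */
}
```

In this model raid5_saw_zero(true) is true and raid5_saw_zero(false) is
false: when the split bio is the slower one, the original bio still
completes, but not through the path that would have called
md_write_end().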

NeilBrown

