linux-raid.vger.kernel.org archive mirror
From: Guoqing Jiang <gqjiang@suse.com>
To: Shaohua Li <shli@kernel.org>, Coly Li <colyli@suse.de>
Cc: linux-raid@vger.kernel.org, Shaohua Li <shli@fb.com>,
	Neil Brown <neilb@suse.de>,
	Johannes Thumshirn <jthumshirn@suse.de>
Subject: Re: [RFC PATCH 1/2] RAID1: a new I/O barrier implementation to remove resync window
Date: Wed, 23 Nov 2016 17:05:04 +0800	[thread overview]
Message-ID: <58355BC0.9030907@suse.com> (raw)
In-Reply-To: <20161122213541.btgw4cpoly5j4jpc@kernel.org>



On 11/23/2016 05:35 AM, Shaohua Li wrote:
> On Tue, Nov 22, 2016 at 05:54:00AM +0800, Coly Li wrote:
>> Commit 79ef3a8aa1cb ("raid1: Rewrite the implementation of iobarrier.")
>> introduced a sliding resync window for the raid1 I/O barrier. The idea is
>> to limit I/O barriers to the sliding resync window, so that regular I/Os
>> outside the resync window no longer need to wait for the barrier. On a
>> large raid1 device this greatly improves parallel write throughput while
>> background resync I/O is running at the same time.
>>
>> The idea of the sliding resync window is great, but it comes with several
>> challenges that are very difficult to solve:
>>   - code complexity
>>     The sliding resync window requires several variables to work together,
>>     which is complex and very hard to get right. Just grep
>>     "Fixes: 79ef3a8aa1" in the kernel git log: there are 8 more patches
>>     fixing the original resync window patch, and that is not the end; any
>>     further related modification may easily introduce more regressions.
>>   - multiple sliding resync windows
>>     The current raid1 code has only a single sliding resync window, so we
>>     cannot do parallel resync with the current I/O barrier implementation.
>>     Implementing multiple resync windows would be much more complex and
>>     very hard to get right.
>>
>> Therefore I decided to implement a much simpler raid1 I/O barrier. By
>> removing the resync window code, I believe life will be much easier.
>>
>> The brief idea of the simpler barrier is:
>>   - Do not maintain a global unique resync window.
>>   - Use multiple hash buckets to reduce I/O barrier conflicts: a regular
>>     I/O only has to wait for a resync I/O when both map to the same barrier
>>     bucket index, and vice versa.
>>   - Barrier conflicts can be reduced to an acceptable level if there are
>>     enough barrier buckets.
>>
>> Here is how the barrier buckets are designed:
>>   - BARRIER_UNIT_SECTOR_SIZE
>>     The whole LBA address space of a raid1 device is divided into multiple
>>     barrier units, each of size BARRIER_UNIT_SECTOR_SIZE.
>>     A bio request won't cross the border of a barrier unit, which means the
>>     maximum bio size is BARRIER_UNIT_SECTOR_SIZE<<9 bytes.
>>   - BARRIER_BUCKETS_NR
>>     There are BARRIER_BUCKETS_NR buckets in total. If multiple I/O requests
>>     hit different barrier units, they only need to compete for the I/O
>>     barrier with other I/Os that hit the same barrier bucket index. The
>>     index of the barrier bucket a bio should look for is calculated by
>>     get_barrier_bucket_idx():
>> 	(sector >> BARRIER_UNIT_SECTOR_BITS) % BARRIER_BUCKETS_NR
>>     where sector is the start sector number of the bio.
>>     align_to_barrier_unit_end() makes sure the final bio sent into
>>     generic_make_request() won't cross the border of the barrier unit.
>>     The number of barrier buckets is defined by:
>> 	#define BARRIER_BUCKETS_NR	(PAGE_SIZE/sizeof(long))
>>     For a 4KB page size, there are 512 buckets for each raid1 device. That
>>     means the probability of a barrier conflict for fully random I/O may be
>>     reduced to 1/512.
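For illustration, here is a minimal, self-contained user-space sketch of the
bucket mapping described in the quoted text above. The helper names and the
BARRIER_BUCKETS_NR definition follow the description; the value chosen for
BARRIER_UNIT_SECTOR_BITS is an assumption for the example (the quoted text
does not state the unit size), and this is not the kernel code itself.

    /*
     * Sketch of the barrier-bucket index calculation. Assumed values are
     * marked; only the formula and the BARRIER_BUCKETS_NR define come
     * from the patch description quoted above.
     */
    #include <stdio.h>

    #define PAGE_SIZE                 4096UL  /* assumed 4KB pages */
    #define BARRIER_UNIT_SECTOR_BITS  17      /* assumed: 2^17 sectors per barrier unit */
    #define BARRIER_UNIT_SECTOR_SIZE  (1UL << BARRIER_UNIT_SECTOR_BITS)
    #define BARRIER_BUCKETS_NR        (PAGE_SIZE / sizeof(long))  /* 512 with 8-byte long */

    /* Map a bio's start sector to the barrier bucket it competes in. */
    static unsigned long get_barrier_bucket_idx(unsigned long sector)
    {
            return (sector >> BARRIER_UNIT_SECTOR_BITS) % BARRIER_BUCKETS_NR;
    }

    /*
     * Clamp a bio's sector count so it never crosses a barrier-unit border,
     * mirroring what align_to_barrier_unit_end() is described to do.
     */
    static unsigned long align_to_barrier_unit_end(unsigned long start_sector,
                                                   unsigned long sectors)
    {
            unsigned long room = BARRIER_UNIT_SECTOR_SIZE -
                                 (start_sector & (BARRIER_UNIT_SECTOR_SIZE - 1));

            return sectors < room ? sectors : room;
    }

    int main(void)
    {
            unsigned long sector = 5UL << BARRIER_UNIT_SECTOR_BITS; /* start of 6th unit */

            printf("bucket index: %lu\n", get_barrier_bucket_idx(sector));
            printf("sectors allowed from %lu: %lu\n", sector + 100,
                   align_to_barrier_unit_end(sector + 100, 1UL << 20));
            return 0;
    }

With 512 buckets, two I/Os to random barrier units only contend on the same
bucket about 1 time in 512, which is where the 1/512 estimate above comes from.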
> Thanks! The idea is awesome and does make the code easier to understand.

Fully agree!

> Open question:
>   - Needs review from an md clustering developer; I don't touch the related code for now.
> Don't think it matters, but please keep an eye on it, Guoqing!

Thanks for the reminder, I agree.

Anyway, I will try to comment on it, even though I am tied up with lvm2 bugs
right now, and I will run some tests with the two patches applied.

Thanks,
Guoqing


Thread overview: 15+ messages
2016-11-21 21:54 [RFC PATCH 1/2] RAID1: a new I/O barrier implementation to remove resync window Coly Li
2016-11-21 21:54 ` [RFC PATCH 2/2] RAID1: avoid unnecessary spin locks in I/O barrier code Coly Li
2016-11-22 21:58   ` Shaohua Li
2016-11-22 21:35 ` [RFC PATCH 1/2] RAID1: a new I/O barrier implementation to remove resync window Shaohua Li
2016-11-23  9:05   ` Guoqing Jiang [this message]
2016-11-24  5:45   ` NeilBrown
2016-11-24  6:05     ` Guoqing Jiang
2016-11-28  6:59     ` Coly Li
2016-11-28  6:42   ` Coly Li
2016-11-29 19:29     ` Shaohua Li
2016-11-30  2:57       ` Coly Li
2016-11-24  7:34 ` Guoqing Jiang
2016-11-28  7:33   ` Coly Li
2016-11-30  6:37     ` Guoqing Jiang
2016-11-30  7:19       ` Coly Li
