All of lore.kernel.org
 help / color / mirror / Atom feed
From: Guoqing Jiang <gqjiang@suse.com>
To: Shaohua Li <shli@kernel.org>, Coly Li <colyli@suse.de>
Cc: linux-raid@vger.kernel.org, Shaohua Li <shli@fb.com>,
	Neil Brown <neilb@suse.de>,
	Johannes Thumshirn <jthumshirn@suse.de>
Subject: Re: [RFC PATCH 1/2] RAID1: a new I/O barrier implementation to remove resync window
Date: Wed, 23 Nov 2016 17:05:04 +0800	[thread overview]
Message-ID: <58355BC0.9030907@suse.com> (raw)
In-Reply-To: <20161122213541.btgw4cpoly5j4jpc@kernel.org>



On 11/23/2016 05:35 AM, Shaohua Li wrote:
> On Tue, Nov 22, 2016 at 05:54:00AM +0800, Coly Li wrote:
>> 'Commit 79ef3a8aa1cb ("raid1: Rewrite the implementation of iobarrier.")'
>> introduces a sliding resync window for raid1 I/O barrier, this idea limits
>> I/O barriers to happen only inside a slidingresync window, for regular
>> I/Os out of this resync window they don't need to wait for barrier any
>> more. On large raid1 device, it helps a lot to improve parallel writing
>> I/O throughput when there are background resync I/Os performing at
>> same time.
>>
>> The idea of sliding resync widow is awesome, but there are several
>> challenges are very difficult to solve,
>>   - code complexity
>>     Sliding resync window requires several veriables to work collectively,
>>     this is complexed and very hard to make it work correctly. Just grep
>>     "Fixes: 79ef3a8aa1" in kernel git log, there are 8 more patches to fix
>>     the original resync window patch. This is not the end, any further
>>     related modification may easily introduce more regreassion.
>>   - multiple sliding resync windows
>>     Currently raid1 code only has a single sliding resync window, we cannot
>>     do parallel resync with current I/O barrier implementation.
>>     Implementing multiple resync windows are much more complexed, and very
>>     hard to make it correctly.
>>
>> Therefore I decide to implement a much simpler raid1 I/O barrier, by
>> removing resync window code, I believe life will be much easier.
>>
>> The brief idea of the simpler barrier is,
>>   - Do not maintain a logbal unique resync window
>>   - Use multiple hash buckets to reduce I/O barrier conflictions, regular
>>     I/O only has to wait for a resync I/O when both them have same barrier
>>     bucket index, vice versa.
>>   - I/O barrier can be recuded to an acceptable number if there are enought
>>     barrier buckets
>>
>> Here I explain how the barrier buckets are designed,
>>   - BARRIER_UNIT_SECTOR_SIZE
>>     The whole LBA address space of a raid1 device is divided into multiple
>>     barrier units, by the size of BARRIER_UNIT_SECTOR_SIZE.
>>     Bio request won't go across border of barrier unit size, that means
>>     maximum bio size is BARRIER_UNIT_SECTOR_SIZE<<9 in bytes.
>>   - BARRIER_BUCKETS_NR
>>     There are BARRIER_BUCKETS_NR buckets in total, if multiple I/O requests
>>     hit different barrier units, they only need to compete I/O barrier with
>>     other I/Os which hit the same barrier bucket index with each other. The
>>     index of a barrier bucket which a bio should look for is calculated by
>>     get_barrier_bucket_idx(),
>> 	(sector >> BARRIER_UNIT_SECTOR_BITS) % BARRIER_BUCKETS_NR
>>     sector is the start sector number of a bio. align_to_barrier_unit_end()
>>     will make sure the finall bio sent into generic_make_request() won't
>>     exceed border of the barrier unit size.
>>   - RRIER_BUCKETS_NR
>>     Number of barrier buckets is defined by,
>> 	#define BARRIER_BUCKETS_NR	(PAGE_SIZE/sizeof(long))
>>     For 4KB page size, there are 512 buckets for each raid1 device. That
>>     means the propobility of full random I/O barrier confliction may be
>>     reduced down to 1/512.
> Thanks! The idea is awesome and does makes the code easier to understand.

Fully agree!

> Open question:
>   - Need review from md clustring developer, I don't touch related code now.
> Don't think it matters, but please open eyes, Guoqing!

Thanks for reminding, I agree.

Anyway,  I will try to comment it though I am sticking with lvm2 bugs 
now and
run some tests with the two patches applied.

Thanks,
Guoqing

  reply	other threads:[~2016-11-23  9:05 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-21 21:54 [RFC PATCH 1/2] RAID1: a new I/O barrier implementation to remove resync window Coly Li
2016-11-21 21:54 ` [RFC PATCH 2/2] RAID1: avoid unnecessary spin locks in I/O barrier code Coly Li
2016-11-22 21:58   ` Shaohua Li
2016-11-22 21:35 ` [RFC PATCH 1/2] RAID1: a new I/O barrier implementation to remove resync window Shaohua Li
2016-11-23  9:05   ` Guoqing Jiang [this message]
2016-11-24  5:45   ` NeilBrown
2016-11-24  6:05     ` Guoqing Jiang
2016-11-28  6:59     ` Coly Li
2016-11-28  6:42   ` Coly Li
2016-11-29 19:29     ` Shaohua Li
2016-11-30  2:57       ` Coly Li
2016-11-24  7:34 ` Guoqing Jiang
2016-11-28  7:33   ` Coly Li
2016-11-30  6:37     ` Guoqing Jiang
2016-11-30  7:19       ` Coly Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58355BC0.9030907@suse.com \
    --to=gqjiang@suse.com \
    --cc=colyli@suse.de \
    --cc=jthumshirn@suse.de \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=shli@fb.com \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.