From: Guoqing Jiang <gqjiang@suse.com>
To: Shaohua Li <shli@kernel.org>, Coly Li <colyli@suse.de>
Cc: linux-raid@vger.kernel.org, Shaohua Li <shli@fb.com>,
Neil Brown <neilb@suse.de>,
Johannes Thumshirn <jthumshirn@suse.de>
Subject: Re: [RFC PATCH 1/2] RAID1: a new I/O barrier implementation to remove resync window
Date: Wed, 23 Nov 2016 17:05:04 +0800
Message-ID: <58355BC0.9030907@suse.com>
In-Reply-To: <20161122213541.btgw4cpoly5j4jpc@kernel.org>
On 11/23/2016 05:35 AM, Shaohua Li wrote:
> On Tue, Nov 22, 2016 at 05:54:00AM +0800, Coly Li wrote:
>> Commit 79ef3a8aa1cb ("raid1: Rewrite the implementation of iobarrier.")
>> introduced a sliding resync window for the raid1 I/O barrier. The idea
>> is that I/O barriers only happen inside the sliding resync window, so
>> regular I/Os outside the window no longer need to wait for a barrier.
>> On a large raid1 device, this greatly improves parallel write
>> throughput while background resync I/O is running at the same time.
>>
>> The idea of the sliding resync window is awesome, but there are several
>> challenges that are very difficult to solve:
>> - code complexity
>> The sliding resync window requires several variables to work together,
>> which is complex and very hard to get right. Just grep for
>> "Fixes: 79ef3a8aa1" in the kernel git log: there are 8 more patches
>> fixing the original resync window patch, and that is not the end; any
>> further related modification may easily introduce more regressions.
>> - multiple sliding resync windows
>> Currently the raid1 code has only a single sliding resync window, so we
>> cannot do parallel resync with the current I/O barrier implementation.
>> Implementing multiple resync windows would be much more complex and
>> very hard to get right.
>>
>> Therefore I decided to implement a much simpler raid1 I/O barrier by
>> removing the resync window code; I believe life will be much easier.
>>
>> The brief idea of the simpler barrier is (see the sketch after this
>> list):
>> - Do not maintain a single global resync window.
>> - Use multiple hash buckets to reduce I/O barrier conflicts: a regular
>> I/O only has to wait for a resync I/O when both have the same barrier
>> bucket index, and vice versa.
>> - Barrier conflicts can be reduced to an acceptable number if there are
>> enough barrier buckets.
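>>
>> A minimal sketch of the per-bucket state this implies (illustrative
>> names and layout, not necessarily the exact patch): each bucket keeps
>> its own counters, so a resync raising the barrier in one bucket leaves
>> regular I/O in every other bucket unblocked.
>>
>> 	#define BARRIER_BUCKETS_NR	512	/* see the definition below */
>>
>> 	/* Illustrative per-device barrier state, one slot per bucket,
>> 	 * protected by the device's resync lock. */
>> 	struct barrier_buckets {
>> 		int nr_pending[BARRIER_BUCKETS_NR];	/* regular I/O in flight */
>> 		int barrier[BARRIER_BUCKETS_NR];	/* raised resync barriers */
>> 	};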
>>
>> Here I explain how the barrier buckets are designed:
>> - BARRIER_UNIT_SECTOR_SIZE
>> The whole LBA address space of a raid1 device is divided into multiple
>> barrier units, each BARRIER_UNIT_SECTOR_SIZE sectors long. A bio won't
>> cross a barrier unit boundary, which means the maximum bio size is
>> BARRIER_UNIT_SECTOR_SIZE<<9 bytes.
>> - BARRIER_BUCKETS_NR
>> There are BARRIER_BUCKETS_NR buckets in total. If multiple I/O requests
>> hit different barrier units, they only need to compete for the I/O
>> barrier with other I/Os that hit the same barrier bucket index. The
>> index of the barrier bucket a bio should look for is calculated by
>> get_barrier_bucket_idx():
>> 	(sector >> BARRIER_UNIT_SECTOR_BITS) % BARRIER_BUCKETS_NR
>> where sector is the start sector number of the bio.
>> align_to_barrier_unit_end() makes sure the final bio sent into
>> generic_make_request() won't cross a barrier unit boundary.
>> - BARRIER_BUCKETS_NR definition
>> The number of barrier buckets is defined by
>> 	#define BARRIER_BUCKETS_NR	(PAGE_SIZE/sizeof(long))
>> For a 4KB page size there are 512 buckets per raid1 device, which means
>> the probability of a barrier conflict for fully random I/O may be
>> reduced to 1/512. (A sketch of these helpers follows this list.)
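>>
>> A minimal sketch of the two helpers named above, reconstructed from
>> this description (BARRIER_UNIT_SECTOR_BITS = 17, i.e. 64MB barrier
>> units, is an assumed value here):
>>
>> 	/* Assumed unit size: 1<<17 sectors = 64MB per barrier unit. */
>> 	#define BARRIER_UNIT_SECTOR_BITS	17
>> 	#define BARRIER_UNIT_SECTOR_SIZE	(1<<BARRIER_UNIT_SECTOR_BITS)
>> 	#define BARRIER_BUCKETS_NR		(PAGE_SIZE/sizeof(long))
>>
>> 	/* Map a bio's start sector to its barrier bucket index. */
>> 	static inline int get_barrier_bucket_idx(sector_t sector)
>> 	{
>> 		return (sector >> BARRIER_UNIT_SECTOR_BITS) % BARRIER_BUCKETS_NR;
>> 	}
>>
>> 	/*
>> 	 * Return how many of 'sectors' starting at start_sector fit inside
>> 	 * the barrier unit start_sector belongs to; the caller splits the
>> 	 * bio at that length so it never crosses a unit boundary.
>> 	 */
>> 	static sector_t align_to_barrier_unit_end(sector_t start_sector,
>> 						  sector_t sectors)
>> 	{
>> 		sector_t len;
>>
>> 		len = round_up(start_sector + 1, BARRIER_UNIT_SECTOR_SIZE) -
>> 		      start_sector;
>> 		if (len > sectors)
>> 			len = sectors;
>> 		return len;
>> 	}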
> Thanks! The idea is awesome and does make the code easier to understand.
Fully agree!
> Open question:
> - Needs review from an md clustering developer; I didn't touch the
> related code for now. I don't think it matters, but please keep your
> eyes open, Guoqing!
Thanks for the reminder, I agree.
Anyway, I will try to comment on it, though I am busy with lvm2 bugs
now, and I will run some tests with the two patches applied.
Thanks,
Guoqing
Thread overview: 15+ messages
2016-11-21 21:54 [RFC PATCH 1/2] RAID1: a new I/O barrier implementation to remove resync window Coly Li
2016-11-21 21:54 ` [RFC PATCH 2/2] RAID1: avoid unnecessary spin locks in I/O barrier code Coly Li
2016-11-22 21:58 ` Shaohua Li
2016-11-22 21:35 ` [RFC PATCH 1/2] RAID1: a new I/O barrier implementation to remove resync window Shaohua Li
2016-11-23 9:05 ` Guoqing Jiang [this message]
2016-11-24 5:45 ` NeilBrown
2016-11-24 6:05 ` Guoqing Jiang
2016-11-28 6:59 ` Coly Li
2016-11-28 6:42 ` Coly Li
2016-11-29 19:29 ` Shaohua Li
2016-11-30 2:57 ` Coly Li
2016-11-24 7:34 ` Guoqing Jiang
2016-11-28 7:33 ` Coly Li
2016-11-30 6:37 ` Guoqing Jiang
2016-11-30 7:19 ` Coly Li