From: "Jianpeng Ma" <majianpeng@gmail.com>
To: Neil Brown <neilb@suse.de>
Cc: shli <shli@kernel.org>, linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Re: [patch]raid5: fix directio regression
Date: Thu, 9 Aug 2012 10:27:35 +0800
Message-ID: <201208091027320311633@gmail.com>
In-Reply-To: 20120809113230.152aade3@notabene.brown
On 2012-08-09 09:32 NeilBrown <neilb@suse.de> Wrote:
>On Thu, 9 Aug 2012 09:20:05 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:
>
>> On 2012-08-08 20:53 Shaohua Li <shli@kernel.org> Wrote:
>> >2012/8/8 Jianpeng Ma <majianpeng@gmail.com>:
>> >> On 2012-08-08 10:58 Shaohua Li <shli@kernel.org> Wrote:
>> >>>2012/8/7 Jianpeng Ma <majianpeng@gmail.com>:
>> >>>> On 2012-08-07 13:32 Shaohua Li <shli@kernel.org> Wrote:
>> >>>>>2012/8/7 Jianpeng Ma <majianpeng@gmail.com>:
>> >>>>>> On 2012-08-07 11:22 Shaohua Li <shli@kernel.org> Wrote:
>> >>>>>>>My directIO randomwrite 4k workload shows a 10~20% regression caused by commit
>> >>>>>>>895e3c5c58a80bb. directIO is usually random IO, and if the request size isn't big
>> >>>>>>>(which is the common case), delaying handling of the stripe has no advantage.
>> >>>>>>>For big requests, the delay can still reduce IO.
>> >>>>>>>
>> >>>>>>>Signed-off-by: Shaohua Li <shli@fusionio.com>
>> >>>> [snip]
>> >>>>>>>--
>> >>>>>> Maybe using the size as the criterion is not a good method.
>> >>>>>> When I first sent this patch, I only wanted to control direct writes to the block device, not to regular files.
>> >>>>>> I think that if someone uses direct writes to the block device on raid5, he should know the characteristics of raid5 and can arrange
>> >>>>>> his writes to be full-writes.
>> >>>>>> But at that time, I did not know how to differentiate between a regular file and a block device.
>> >>>>>> I think we should do something about this.
>> >>>>>
>> >>>>>I don't think it's possible for a user to control his writes to be
>> >>>>>full-writes, even for
>> >>>>>raw disk IO. Why does regular file versus block device IO matter here?
>> >>>>>
>> >>>>>Thanks,
>> >>>>>Shaohua
>> >>>> Another problem is the size. How do we judge whether the size is large or not?
>> >>>> A write syscall is one dio, and a dio may be split into several bios.
>> >>>> For my workload, I usually write in chunk-size units.
>> >>>> But your patch judges by bio size.
>> >>>
>> >>>I'd ignore workload which does sequential directIO, though
>> >>>your workload is, but I bet no real workloads are. So I'd like
>> >> Sorry, my explanation may not have been correct. Each write I issue is almost chunk-size * devices in size, in order to get a full-write
>> >> and to avoid pre-read operations as far as possible.
>> >>>only to consider big size random directio. I agree the size
>> >>>judgement is arbitrary. I can optimize it to only consider a stripe
>> >>>which hits two or more disks in one bio, but I'm not sure if it's
>> >>>worth doing. I'm not aware that big size directio is common, and even
>> >>>if it is, big size request IOPS is low, so a bit of delay may not be a big
>> >>>deal.
>> >> What about adding an acc_time to 'stripe_head' to control this?
>> >> When get_active_stripe() succeeds, update acc_time.
>> >> If the stripe_head has not been accessed for some time, it should pre-read.
>> >
>> >Do you want to add a timer for each stripe? This is even uglier.
>> >How do you choose the expiry time? A time that works for a hard disk
>> >definitely will not work for a fast SSD.
>> A time is arbitrary, just like the size.
>> How about adding an interface in sysfs so the user can control it?
>> Only the user can judge whether the workload is sequential write or random write.
>
>This is getting worse by the minute. A sysfs interface for this is
>definitely not a good idea.
>
>The REQ_NOIDLE flag is a pretty clear statement that no more requests that
>merge with this one are expected. If some use case sends random requests,
>maybe it should be setting REQ_NOIDLE.
>
>Maybe someone should do some research and find out why WRITE_ODIRECT doesn't
>include REQ_NOIDLE. Understanding that would help understand the current
>problem.
>
>NeilBrown
>
Hi Neil,
Thanks for your suggestion.
A direct write can set REQ_NOIDLE, because the next write can only start once this write operation has finished.
But a direct write (struct dio) can be split into several bios (struct bio).
Those bios are related to each other, so they should not set REQ_NOIDLE, except for the last bio.
I think this may improve performance, because a random direct write is at most one bio?
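
To make the idea concrete, here is a rough userspace sketch, not kernel code and not a patch. It only models splitting one direct write into bios and marking the last one. The names sketch_bio, sketch_split_dio and SKETCH_REQ_NOIDLE are made up for illustration; the flag value is arbitrary and is not the real REQ_NOIDLE bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's REQ_NOIDLE bit; the value is arbitrary. */
#define SKETCH_REQ_NOIDLE 0x1u

/* Hypothetical, simplified model of one bio belonging to a direct write. */
struct sketch_bio {
    size_t offset;       /* byte offset within the direct write */
    size_t len;          /* bytes covered by this bio */
    unsigned int flags;  /* 0, or SKETCH_REQ_NOIDLE on the last bio */
};

/*
 * Split one direct write of total_len bytes into bios of at most max_bio
 * bytes each. Only the last bio gets SKETCH_REQ_NOIDLE, since the earlier
 * bios are followed by related ones from the same write.
 * Returns the number of bios stored in out, or 0 if out_cap is too small
 * (or total_len is 0).
 */
static size_t sketch_split_dio(size_t total_len, size_t max_bio,
                               struct sketch_bio *out, size_t out_cap)
{
    size_t n = 0;
    size_t off = 0;

    assert(max_bio > 0);

    while (off < total_len) {
        size_t len = total_len - off;

        if (len > max_bio)
            len = max_bio;
        if (n == out_cap)
            return 0;  /* caller's array cannot hold all the bios */

        out[n].offset = off;
        out[n].len = len;
        out[n].flags = 0;
        off += len;
        n++;
    }

    /* No more bios follow the last one, so it alone is marked. */
    if (n > 0)
        out[n - 1].flags |= SKETCH_REQ_NOIDLE;

    return n;
}
```

With this model, a 4k random direct write gives a single bio that carries the flag, while a large write split into three bios has the flag only on the third.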