linux-raid.vger.kernel.org archive mirror
From: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
To: NeilBrown <neilb@suse.de>
Cc: "Linux RAID" <linux-raid@vger.kernel.org>,
	"Florian-Ewald Müller" <florian-ewald.mueller@profitbricks.com>
Subject: Re: [RFC] Process requests instead of bios to use a scheduler
Date: Mon, 02 Jun 2014 11:51:52 +0200	[thread overview]
Message-ID: <538C4938.6010704@profitbricks.com> (raw)
In-Reply-To: <20140602093258.22aa2c05@notabene.brown>

[-- Attachment #1: Type: text/plain, Size: 4236 bytes --]

Hi Neil,

first of all thank you very much for your response!

On 02.06.2014 01:32, NeilBrown wrote:
>> at ProfitBricks we use the raid0 driver stacked on top of raid1 to form
>> a RAID-10. Above there is LVM and SCST/ib_srpt.
> 
> Any particular reason you don't use the raid10 driver?

Yes, scaling tests (adding more and more disks to the RAID-10) have
shown that this setup performs better than the raid10 driver. In
particular, with our 24 HDDs the raid10 driver strangely delivered only
half the performance. It simply didn't scale in our tests, while
raid1+raid0 did.
We also have HW RAID systems where 8 * 2 disks in RAID-10 is the
maximum. So we only do RAID-1 in HW and use MD RAID-0 on top.

>> [...]
>>
>> We did some fio 2.1.7 tests with iodepth 64, posixaio, 10 LVs with 1M
>> chunks sequential I/O and 10 LVs with 4K chunks sequential as well as
>> random I/O - one fio call per device. After 60s all fio processes are
>> killed.
>> Test systems have four 1 TB Seagate Constellation HDDs in RAID-10. LVs
>> are 20G in size each.
>>
>> The biggest issue in our cloud is unfairness leading to high latency,
>> SRP timeouts and reconnects. This is why we would need a scheduler for
>> our raid0 device.
> 
> Having a scheduler for RAID0 doesn't make any sense to me.
> RAID0 simply passes each request down to the appropriate underlying device.
> That device then does its own scheduling.
> 
> Adding a scheduler may well make sense for RAID1 (the current "scheduler"
> only does some read balancing and is rather simplistic) and for RAID4/5/6/10.
> 
> But not for RAID0 .... was that a typo?

Nope, we have our RAID-1+0. So it is more or less a RAID-10, and putting
the scheduler at this RAID-0 layer makes sense for us.

>> The difference is tremendous when comparing the results of 4K random
>> writes fighting against 1M sequential writes. With a scheduler the
>> maximum write latency dropped from 10s to 1.6s. The other figures are
>> the number of bios (for scheduler "none") or the number of requests
>> (for the other schedulers) - first reads, then writes.
>>
>> Scheduler: none
>> <      8 ms: 0 2139
>> <     16 ms: 0 9451
>> <     32 ms: 0 10277
>> <     64 ms: 0 3586
>> <    128 ms: 0 5169
>> <    256 ms: 2 31688
>> <    512 ms: 3 115360
>> <   1024 ms: 2 283681
>> <   2048 ms: 0 420918
>> <   4096 ms: 0 10625
>> <   8192 ms: 0 220
>> <  16384 ms: 0 4
>> <  32768 ms: 0 0
>> <  65536 ms: 0 0
>> >= 65536 ms: 0 0
>>  maximum ms: 660 9920
>>
>> Scheduler: deadline
>> <      8 ms: 2 435
>> <     16 ms: 1 997
>> <     32 ms: 0 1560
>> <     64 ms: 0 4345
>> <    128 ms: 1 11933
>> <    256 ms: 2 46366
>> <    512 ms: 0 182166
>> <   1024 ms: 1 75903
>> <   2048 ms: 0 146
>> <   4096 ms: 0 0
>> <   8192 ms: 0 0
>> <  16384 ms: 0 0
>> <  32768 ms: 0 0
>> <  65536 ms: 0 0
>> >= 65536 ms: 0 0
>>  maximum ms: 640 1640
> 
> Could you do a graph?  I like graphs :-)
> I can certainly see something has changed here...

Sure, please find the graphs attached. I've converted the counts into
percentages so that the number of bios can be compared to the number of
requests.

You can see that with the scheduler fewer IOs complete below 32 ms, but
most IOs complete below 512 ms, whereas without a scheduler most only
complete below 2048 ms.

>> We clone the bios from the request and put them into a bio list. The
>> request is marked as in-flight and afterwards the bios are processed
>> one-by-one in the same way as in the bio-based mode.
>>
>> Is it safe to do it like this with a scheduler?
> 
> I see nothing inherently wrong with the theory.  The details of the code are
> much more important.
> 
>>
>> Any concerns regarding the write-intent bitmap?
> 
> Only that it has to keep working.
> 
>> Do you have any other concerns?
>>
>> We can provide you with the full test results, the test scripts and also
>> some code parts if you wish.
> 
> I'm not against improving the scheduling in various md raid levels, though
> not RAID0 as I mentioned above.
> 
> Show me the code and I might be able to provide a more detailed opinion.

I would say let the user decide whether or not an MD device should be
equipped with a scheduler. We can port our code to the latest kernel and
the latest mdadm and send you a patch set for testing. Just give me some
time to do it.

Cheers,
Sebastian

[-- Attachment #2: md_latency_stat_sched.png --]
[-- Type: image/png, Size: 38833 bytes --]

[-- Attachment #3: md_max_latency_sched.png --]
[-- Type: image/png, Size: 15823 bytes --]

  reply	other threads:[~2014-06-02  9:51 UTC|newest]

Thread overview: 14+ messages
2014-05-28 13:04 [RFC] Process requests instead of bios to use a scheduler Sebastian Parschauer
2014-06-01 23:32 ` NeilBrown
2014-06-02  9:51   ` Sebastian Parschauer [this message]
2014-06-02 10:20     ` NeilBrown
2014-06-02 11:12       ` Sebastian Parschauer
2014-06-04 17:09       ` [RFC PATCH 0/4] md/mdadm: introduce request function mode support Sebastian Parschauer
2014-06-04 17:09         ` [RFC PATCH 1/4] md: complete bio accounting and add io_latency extension Sebastian Parschauer
2014-06-04 17:10         ` [RFC PATCH 2/4] md: introduce request function mode support Sebastian Parschauer
2014-06-04 17:10         ` [RFC PATCH 3/4] md: handle IO latency accounting in rqfn mode Sebastian Parschauer
2014-06-04 17:10         ` [RFC PATCH 4/4] mdadm: introduce '--use-requestfn' create/assembly option Sebastian Parschauer
2014-06-17 13:20         ` [RFC PATCH 0/4] md/mdadm: introduce request function mode support Sebastian Parschauer
     [not found]           ` <CAH3kUhEK26+4KryoReosMt654-vcrkkgkxaW5tKkFRDBqgX82w@mail.gmail.com>
     [not found]             ` <53A14513.20902@profitbricks.com>
2014-06-18 13:57               ` Roberto Spadim
2014-06-18 14:43                 ` Sebastian Parschauer
2014-06-24  7:09           ` NeilBrown
