From: Neil Brown <neilb@suse.de>
To: Eddy Zhao <eddy.y.zhao@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: BUG REPORT: md RAID5 write throughput will drop for 1~2s every 16s (under 1Hz sample rate)
Date: Tue, 20 Jul 2010 22:43:27 +1000
Message-ID: <20100720224327.69ff1967@notabene>
In-Reply-To: <AANLkTin4T27FUSAWktl-gAHm1yseXcIMTc-7WEPkCOu6@mail.gmail.com>
On Tue, 20 Jul 2010 19:40:05 +0800
Eddy Zhao <eddy.y.zhao@gmail.com> wrote:
> Hello Neil:
>
>
> We observe a periodic write throughput drop on md RAID5. See the description below.
>
> Configuration
> - linux 2.6.28.9
> - 3 Seagate 320GB 7200rpm SATA2.0 disks
> - md RAID5, 3 disks, 256KB chunk
>
> Test
> - open O_DIRECT /dev/md0
> - sequential write, 512KB write block
> - refer to "fpt.cpp" (run "ulimit -s unlimited" before running the program)
>
> Problem
> - md RAID5 write throughput will drop for 1~2s every 16s (under 1Hz sample
> rate)
> - refer to "output.txt"
>
> Do you know the reason for the problem? We want to fix it on our server to
> make the QoS smooth
If I'm interpreting your numbers correctly, it is just an occasional single
write that is slow - not a series of writes during a one second interval that
are each slow. It would help if you could confirm that.
Two possibilities occur to me, though it could be something else altogether.
You would need to instrument the code to collect internal states to see if it
is one of these or something else.
1/ a scheduler problem could be delaying the running of raid5d from time to
time so that it either doesn't respond to ready stripes quickly, or cannot
get CPU time to perform the xor.
2/ For some reason raid5 sometimes decides that it needs to pre-read the
'other' block to calculate parity rather than waiting for the other block
to be written. This is more likely.
Either this is bad code somewhere, or the raid5 is being 'unplugged'
prematurely.
This seems to happen with a period of 30 seconds (I don't know where you
got 16 from). The command:
tr : ' ' < output.txt | sed 's/ms//' | awk '$4 > 100 {print NR, NR-p; p=NR}'
suggests intervals of 1 or 33 seconds being most common, though you could
get more precise data out of your program.
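For anyone who wants the same numbers as a table rather than a raw list, here is a minimal Python sketch of what that pipeline does. It assumes only what the command itself implies: after colons become spaces and the first "ms" is dropped, the fourth whitespace-separated field is a latency in milliseconds. The function names and the sample line format in the usage note are hypothetical, not taken from output.txt.

```python
from collections import Counter

def slow_write_gaps(lines, threshold_ms=100.0, field=4):
    """Return (line_number, gap_since_previous_slow_line) pairs.

    Mirrors: tr : ' ' | sed 's/ms//' | awk '$4 > 100 {print NR, NR-p; p=NR}'
    """
    gaps = []
    prev = 0  # awk treats the unset variable p as 0 on the first match
    for nr, line in enumerate(lines, start=1):
        # tr : ' ' replaces every colon; sed 's/ms//' drops the first "ms"
        fields = line.replace(":", " ").replace("ms", "", 1).split()
        if len(fields) < field:
            continue
        try:
            value = float(fields[field - 1])
        except ValueError:
            continue  # skip lines whose chosen field is not numeric
        if value > threshold_ms:
            gaps.append((nr, nr - prev))
            prev = nr
    return gaps

def gap_histogram(gaps):
    """Count how often each gap occurs, most common first."""
    return Counter(gap for _, gap in gaps).most_common()
```

Usage, with one sample per second so that a gap in lines is a gap in seconds: `gap_histogram(slow_write_gaps(open("output.txt")))`. A made-up line such as "t 2: latency 250ms" would count as slow under the default threshold.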
I suspect this aligns with the 30 second periodic 'flush' that Linux does,
though I'm not 100% certain. You could possibly put a 'WARN_ON' in
raid5_activate_delayed if delayed_list is not empty. That will give you
a stack trace showing why the unplug was called.
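On possibility 2/ above, the arithmetic for the reported configuration may help show why a pre-read should not normally be needed. With 3 disks and a 256KB chunk, each stripe holds two data chunks plus one parity chunk, so a 512KB write covers exactly one stripe's worth of data if it starts on a stripe boundary. The sketch below is only that arithmetic; the helper names are hypothetical, and it assumes the test writes start at offset 0 and stay aligned, which the report does not state.

```python
KIB = 1024

def full_stripe_bytes(n_disks, chunk_bytes, parity_disks=1):
    """Data bytes in one full stripe (parity chunks excluded)."""
    return (n_disks - parity_disks) * chunk_bytes

def classify_write(offset, length, n_disks, chunk_bytes):
    """Label a write as covering whole stripes or only part of one.

    A full-stripe write supplies every data block, so parity can be
    computed from the new data alone. A partial-stripe write needs the
    missing data (or the old data and parity) read from disk first.
    """
    stripe = full_stripe_bytes(n_disks, chunk_bytes)
    if length > 0 and offset % stripe == 0 and length % stripe == 0:
        return "full-stripe"
    return "partial-stripe"

# Reported setup: 3-disk RAID5, 256KB chunk, 512KB sequential writes.
stripe = full_stripe_bytes(3, 256 * KIB)
kind = classify_write(0, 512 * KIB, 3, 256 * KIB)
```

If the writes are aligned as assumed, every request is a full-stripe write, so a pre-read would mean the stripe was handled before both of its data blocks had arrived.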
I'd be keen to hear about any further discoveries you make.
BTW I prefer all such questions be posted to linux-raid@vger.kernel.org
as others may be able to contribute. I have taken the liberty of
cc:ing this reply there. I hope you are OK with that.
NeilBrown
>
> FYI: "Single disk" and "2 disk RAID0" write throughput are all smooth (under
> 1Hz sample rate)
>
>
> Thanks
> Eddy