From: Justin Piszcz <jpiszcz@lucidpixels.com>
To: Pallai Roland <dap@mail.index.hu>
Cc: Linux RAID Mailing List <linux-raid@vger.kernel.org>
Subject: Re: major performance drop on raid5 due to context switches caused by small max_hw_sectors [partially resolved]
Date: Sun, 22 Apr 2007 07:42:43 -0400 (EDT)
Message-ID: <Pine.LNX.4.64.0704220742080.14170@p34.internal.lan>
In-Reply-To: <200704221338.45759.dap@mail.index.hu>
On Sun, 22 Apr 2007, Pallai Roland wrote:
>
> On Sunday 22 April 2007 12:23:12 Justin Piszcz wrote:
>> On Sun, 22 Apr 2007, Pallai Roland wrote:
>>> On Sunday 22 April 2007 10:47:59 Justin Piszcz wrote:
>>>> On Sun, 22 Apr 2007, Pallai Roland wrote:
>>>>> On Sunday 22 April 2007 02:18:09 Justin Piszcz wrote:
>>>>>> How did you run your read test?
>>>>>
>>>>> I ran 100 parallel reader processes (dd) on top of an XFS file system;
>>>>> try this:
>>>>> for i in `seq 1 100`; do dd of=$i if=/dev/zero bs=64k 2>/dev/null; done
>>>>> for i in `seq 1 100`; do dd if=$i of=/dev/zero bs=64k 2>/dev/null & done
>>>>>
>>>>> and don't forget to set max_sectors_kb below the chunk size (e.g. 64/128Kb):
>>>>> /sys/block# for i in sd*; do echo 64 >$i/queue/max_sectors_kb; done
>>>>>
>>>>> I also set 2048/4096 readahead sectors with blockdev --setra
>>>>>
>>>>> You need 50-100 reader processes to trigger this issue, I think. My
>>>>> kernel version is 2.6.20.3.
>>>>
>>>> In one xterm:
>>>> for i in `seq 1 100`; do dd of=$i if=/dev/zero bs=64k 2>/dev/null; done
>>>>
>>>> In another:
>>>> for i in `seq 1 100`; do dd if=/dev/md3 of=$i.out bs=64k & done
>>>
>>> Write and read files on top of XFS, not on the block device. $i isn't a
>>> typo; you should write 100 files and then read them back with 100 parallel
>>> threads once the writes are done. I have 1Gb of RAM; maybe you should use
>>> the mem= kernel parameter on boot.
>>>
>>> 1. for i in `seq 1 100`; do dd of=$i if=/dev/zero bs=1M count=100
>>> 2>/dev/null; done
>>> 2. for i in `seq 1 100`; do dd if=$i of=/dev/zero bs=64k 2>/dev/null &
>>> done
>>>
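Consolidating the two steps above with the max_sectors_kb and readahead
settings from earlier in the thread, something like this should reproduce the
workload (an untested sketch; /mnt/md3 is only an assumed mount point for the
XFS filesystem on the array, and /dev/null is used as the read sink instead of
/dev/zero):

# assumes md3 is the RAID5 array with XFS mounted on /mnt/md3 (hypothetical path)
cd /mnt/md3

# cap per-disk request size below the chunk size (64 KB here)
for i in /sys/block/sd*; do echo 64 > $i/queue/max_sectors_kb; done

# readahead window in 512-byte sectors (2048 sectors = 1 MB)
blockdev --setra 2048 /dev/md3

# 1. write 100 files of 100 MB each, sequentially
for i in `seq 1 100`; do dd of=$i if=/dev/zero bs=1M count=100 2>/dev/null; done

# 2. read them all back with 100 parallel dd readers
for i in `seq 1 100`; do dd if=$i of=/dev/null bs=64k 2>/dev/null & done
wait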
>>
>> I use a combination of 4 Silicon Image (SiI) controllers and the Intel 965
>> chipset. My max_sectors_kb is 128kb and my chunk size is 128kb; why do you
>> set max_sectors_kb less than the chunk size?
> It's the maximum on Marvell SATA chips under Linux; maybe it's a hardware
> limitation. I would have just used a 128Kb chunk, but I hit this issue.
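For reference, you can see what each disk's driver actually allows versus what
it is currently set to, and compare that against the array's chunk size, with
something like this (a small sketch assuming the same /dev/md3 array and sd*
member disks as above):

# max_hw_sectors_kb is the driver/hardware ceiling, max_sectors_kb the current cap
for q in /sys/block/sd*/queue; do
    echo "$q: hw=$(cat $q/max_hw_sectors_kb) KB, current=$(cat $q/max_sectors_kb) KB"
done

# the array's chunk size, for comparison
mdadm --detail /dev/md3 | grep -i 'chunk size'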
>
>> For read-ahead, there
>> are some good benchmarks, by SGI(?) I believe, and some others, stating that
>> 16MB is the best value; above that you lose on either reads or writes, so
>> 16MB appears to be the best overall value. Do these values look
>> good to you, or?
> Where can I find this benchmark?
http://www.rhic.bnl.gov/hepix/talks/041019pm/schoen.pdf
Check page 13 of 20.
> I did some tests on this topic, too. I think
> the optimal readahead size always depends on the number of sequentially
> reading processes and the available RAM. If you have 100 processes and 1Gb
> of RAM, the maximum optimal readahead is about 5-6Mb; if you set it bigger,
> that turns into readahead thrashing and undesirable context switches. Anyway,
> I tried 16Mb now, but the readahead size doesn't matter for this bug; the
> same context switch storm appears with any readahead window size.
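A rough back-of-the-envelope way to pick the window (my own rule of thumb, not
something stated in the thread): keep readers x readahead at around half of
RAM. With 100 readers and 1Gb that lands at roughly 5Mb per reader, which
matches the 5-6Mb figure above:

readers=100
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)    # total RAM in KB
ra_kb=$(( mem_kb / 2 / readers ))                      # per-reader readahead budget
ra_sectors=$(( ra_kb * 2 ))                            # blockdev --setra takes 512-byte sectors
echo "readahead: ${ra_kb} KB = ${ra_sectors} sectors"
blockdev --setra $ra_sectors /dev/md3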
>
>> Read 100 files on XFS simultaneously:
> Is max_sectors_kb 128kb here? I think so. I see some anomaly, but maybe you
> just have too big a readahead window for so many processes; it's not the bug
> I'm talking about in my original post. Your high interrupt and CS counts
> build up slowly, which may be a sign of readahead thrashing. In my case the
> CS storm began in the first second, with no high interrupt count:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b swpd   free buff  cache si so     bi bo   in    cs us sy  id wa
>  0  0    0   7220    0 940972  0  0      0  0  256    20  0  0 100  0
>  0 13    0 383636    0 535520  0  0 144904 32 2804 63834  1 42   0 57
> 24 20    0 353312    0 558200  0  0 121524  0 2669 67604  1 40   0 59
> 15 21    0 314808    0 557068  0  0  91300 33 2572 53442  0 29   0 71
>
> I attached a small kernel patch; with it you can measure the readahead
> thrashing ratio (see the tail of /proc/vmstat). I think it's a handy tool for
> finding the optimal RA size. And if you're interested in the bug I'm talking
> about, set max_sectors_kb to 64Kb.
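Once the patch is applied, the new counters show up at the tail of
/proc/vmstat; I don't know the exact names the patch uses (the 'readahead'
grep below is a guess), but a simple sampling loop is enough to watch whether
the thrashing ratio climbs during the read test:

# sample the patch's readahead counters once a second during the test
# (counter names depend on the patch; adjust the grep pattern accordingly)
while true; do
    date '+%T'
    grep -i readahead /proc/vmstat
    sleep 1
done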
>
>
> --
> d
>
>
Thread overview: 15+ messages
2007-04-20 21:06 major performance drop on raid5 due to context switches caused by small max_hw_sectors Pallai Roland
[not found] ` <5d96567b0704202247s60e4f2f1x19511f790f597ea0@mail.gmail.com>
2007-04-21 19:32 ` major performance drop on raid5 due to context switches caused by small max_hw_sectors [partially resolved] Pallai Roland
2007-04-22 0:18 ` Justin Piszcz
2007-04-22 0:42 ` Pallai Roland
2007-04-22 8:47 ` Justin Piszcz
2007-04-22 9:52 ` Pallai Roland
2007-04-22 10:23 ` Justin Piszcz
2007-04-22 11:38 ` Pallai Roland
2007-04-22 11:42 ` Justin Piszcz [this message]
2007-04-22 14:38 ` Pallai Roland
2007-04-22 14:48 ` Justin Piszcz
2007-04-22 15:09 ` Pallai Roland
2007-04-22 15:53 ` Justin Piszcz
2007-04-22 19:01 ` Mr. James W. Laferriere
2007-04-22 20:35 ` Justin Piszcz