From: Andreas Hirstius <Andreas.Hirstius@cern.ch>
To: jmerkey <jmerkey@utah-nac.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later
Date: Wed, 20 Apr 2005 20:04:40 +0200
Message-ID: <426699B8.3020504@cern.ch>
In-Reply-To: <42668977.5060708@utah-nac.org>


Just tried it, but the performance problem remains :-(
 (actually, why should it change? This part of the code didn't change so 
much between 2.6.10-bk6 and -bk7...)

Andreas



jmerkey wrote:

>
>
> For 3Ware, you need to change the queue depths, and you will see 
> dramatically improved performance. 3Ware can take requests
> a lot faster than Linux pushes them out. Try changing this instead, 
> you won't be going to sleep all the time waiting on the read/write
> request queues to get "unstarved".
>
>
> /linux/include/linux/blkdev.h
>
> //#define BLKDEV_MIN_RQ 4
> //#define BLKDEV_MAX_RQ 128 /* Default maximum */
> #define BLKDEV_MIN_RQ 4096
> #define BLKDEV_MAX_RQ 8192 /* Default maximum */
>
>
> Jeff
>
> Andreas Hirstius wrote:
>
>> Hi,
>>
>>
>> We have a rx4640 with 3x 3Ware 9500 SATA controllers and 24x WD740GD 
>> HDD in a software RAID0 configuration (using md).
>> With kernel 2.6.11 the read performance on the md is reduced by a 
>> factor of 20 (!!) compared to previous kernels.
>> The write rate to the md doesn't change!! (it actually improves a bit).
>>
>> The configs for the kernels are basically identical.
>>
>> Here is some vmstat output:
>>
>> kernel 2.6.9: ~1GB/s read
>> procs memory                    swap  io         system     cpu
>>  r  b swpd  free  buff    cache si so      bi bo    in   cs us sy wa id
>>  1  1    0 12672  6592 15914112  0  0 1081344 56 15719 1583  0 11 14 74
>>  1  0    0 12672  6592 15915200  0  0 1130496  0 15996 1626  0 11 14 74
>>  0  1    0 12672  6592 15914112  0  0 1081344  0 15891 1570  0 11 14 74
>>  0  1    0 12480  6592 15914112  0  0 1081344  0 15855 1537  0 11 14 74
>>  1  0    0 12416  6592 15914112  0  0 1130496  0 16006 1586  0 12 14 74
>>
>>
>> kernel 2.6.11: ~55MB/s read
>> procs memory                    swap  io         system     cpu
>>  r  b swpd  free  buff    cache si so      bi bo    in   cs us sy wa id
>>  1  1    0 24448 37568 15905984  0  0   56934  0  5166 1862  0  1 24 75
>>  0  1    0 20672 37568 15909248  0  0   57280  0  5168 1871  0  1 24 75
>>  0  1    0 22848 37568 15907072  0  0   57306  0  5173 1874  0  1 24 75
>>  0  1    0 25664 37568 15903808  0  0   57190  0  5171 1870  0  1 24 75
>>  0  1    0 21952 37568 15908160  0  0   57267  0  5168 1871  0  1 24 75
>>
>>
>> Because the filesystem might have an impact on the measurement, "dd" 
>> on /dev/md0
>> was used to get information about the performance. This also opens 
>> the possibility to test with block sizes larger than the page size.
>> And it appears that the performance with kernel 2.6.11 is closely 
>> related to the block size.
>> For example if the block size is exactly a multiple (>2) of the page 
>> size the performance is back to ~1.1GB/s.
>> The general behaviour is a bit more complicated:
>> 1. bs <= 1.5 * ps : ~27-57MB/s (differs with ps)
>> 2. bs > 1.5 * ps && bs < 2 * ps : rate increases to max. rate
>> 3. bs = n * ps ; (n >= 2) : ~1.1GB/s (== max. rate)
>> 4. bs > n * ps && bs < ~(n+0.5) * ps ; (n > 2) : ~27-70MB/s (differs with ps)
>> 5. bs > ~(n+0.5) * ps && bs < (n+1) * ps ; (n > 2) : increasing rate in
>>    several more or less distinct steps (e.g. 1/3 of max. rate and then
>>    2/3 of max. rate for 64k pages)
>>
>> I've tested all four possible page sizes on Itanium (4k, 8k, 16k and 
>> 64k) and the pattern is always the same!!
>>
>> With kernel 2.6.9 (any kernel before 2.6.10-bk6) the read rate is 
>> always at ~1.1GB/s,
>> independent of the block size.
>>
>>
>> This simple patch solves the problem, but I have no idea of possible 
>> side-effects ...
>>
>> --- linux-2.6.12-rc2_orig/mm/filemap.c 2005-04-04 18:40:05.000000000 +0200
>> +++ linux-2.6.12-rc2/mm/filemap.c 2005-04-20 10:27:42.000000000 +0200
>> @@ -719,7 +719,7 @@
>> index = *ppos >> PAGE_CACHE_SHIFT;
>> next_index = index;
>> prev_index = ra.prev_page;
>> - last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
>> + last_index = (*ppos + desc->count + PAGE_CACHE_SIZE) >> PAGE_CACHE_SHIFT;
>> offset = *ppos & ~PAGE_CACHE_MASK;
>>
>> isize = i_size_read(inode);
>> --- linux-2.6.12-rc2_orig/mm/readahead.c 2005-04-04 18:40:05.000000000 +0200
>> +++ linux-2.6.12-rc2/mm/readahead.c 2005-04-20 18:37:04.000000000 +0200
>> @@ -70,7 +70,7 @@
>> */
>> static unsigned long get_init_ra_size(unsigned long size, unsigned long max)
>> {
>> - unsigned long newsize = roundup_pow_of_two(size);
>> + unsigned long newsize = size;
>>
>> if (newsize <= max / 64)
>> newsize = newsize * newsize;
>>
>>
>>
>> In order to keep this mail short, I've created a webpage that 
>> contains all the detailed information and some plots:
>> http://www.cern.ch/openlab-debugging/raid
>>
>>
>> Regards,
>>
>> Andreas Hirstius
>>
>>
>>
>
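
To make the two one-line changes in the quoted patch concrete, here is a
minimal userspace sketch of what each changed expression computes. It is
not the kernel code: the 4 KiB page size and the names PAGE_SHIFT_DEMO,
PAGE_SIZE_DEMO, last_index_orig(), last_index_patched() and
roundup_pow_of_two_demo() are assumptions made for illustration. The
sketch only shows the arithmetic; it does not explain why the patch
restores the read rate.

```c
#include <assert.h>
#include <limits.h>

/* Assumed for illustration: 4 KiB pages (the report also covers 8k, 16k, 64k). */
#define PAGE_SHIFT_DEMO 12
#define PAGE_SIZE_DEMO  (1UL << PAGE_SHIFT_DEMO)

/* Expression on the '-' line of the filemap.c hunk: round the end of the
 * request (pos + count) up to the next page boundary, then convert to a
 * page index. */
static unsigned long last_index_orig(unsigned long pos, unsigned long count)
{
	return (pos + count + PAGE_SIZE_DEMO - 1) >> PAGE_SHIFT_DEMO;
}

/* Expression on the '+' line: without the "-1", a request that ends exactly
 * on a page boundary yields an index one page higher than before. */
static unsigned long last_index_patched(unsigned long pos, unsigned long count)
{
	return (pos + count + PAGE_SIZE_DEMO) >> PAGE_SHIFT_DEMO;
}

/* Hypothetical userspace stand-in for the kernel's roundup_pow_of_two():
 * smallest power of two that is >= n. The readahead.c hunk replaces this
 * rounding with the unrounded size. */
static unsigned long roundup_pow_of_two_demo(unsigned long n)
{
	unsigned long p = 1;

	/* Zero and values above the largest representable power of two
	 * are out of scope for this sketch. */
	assert(n != 0 && n <= (ULONG_MAX >> 1) + 1);
	while (p < n)
		p <<= 1;
	return p;
}
```

Under these assumptions, a read of exactly two pages starting at offset 0
(count = 8192) gives last_index 2 with the original expression and 3 with
the patched one, while a 1.5-page read (count = 6144) gives 2 in both
cases. roundup_pow_of_two_demo(5) returns 8, whereas the patched
get_init_ra_size() line would start from 5 unchanged.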
