* Software RAID & Filesystem Cache
From: Philip Molter @ 2004-08-05 16:31 UTC
To: linux-raid
What is the relationship between software RAID, filesystem caching, and I/O?
Here's why I ask:
I have a software application that uses RRD files to monitor numerous
devices. All told, we're talking about 21GB of RRD files, about 75% of
which are being updated every 5 minutes. Each RRD file is about 3MB in
size, and updating an RRD file involves a lot of seeking and writing
small bits of data to small parts of the file. The application should never be
reading the entire file into memory. On a properly running system,
reads are done almost entirely out of filesystem cache on a 4GB box.
Checking iostat, there is very little read utilization (<10KB/s averaged over 5 minutes).
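(For reference, figures like that come from something along these lines; the
flags and interval are illustrative:)

  # extended per-device stats; the second report covers
  # exactly the 300-second window between the two samples
  iostat -x 300 2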
Now, I have a box with 12 37GB Raptor 10000RPM drives hooked up to a
3ware 8500-12 controller. In software RAID10 under kernel 2.4 (1024k
chunks), the system works as expected. Read load is light. The box has
plenty of I/O to spare (which it should; those drives are monsters). In
software RAID10 under kernel 2.6 (1024k chunks), read load is
excessively heavy. Each *drive* reports anywhere from 8-12MB/s (5
minute average). The RAID10 device itself reports 95-120MB/s (5 minute
average). There is no way the application is actually requesting to
read that much data. I trust the figures reported by iostat, though,
because the application cannot keep up and the system feels excessively
heavy. In software RAID10 under kernel 2.6 (32k chunks), the read load
is lighter, on the order of 1MB/s per disk and 10-12MB/s for the RAID10
device. If I convert the box to hardware RAID, the box functions
normally. Read levels are even better than under the 2.4 software RAID
configuration (literally, less than 1KB/s over a 5-minute average).
Each configuration is doing exactly the same work, with the exact same
code and data. The only difference is the kernel and software RAID
configurations.
It's almost as if every read request to software RAID under 2.6 is
bypassing the filesystem cache and, furthermore, reading the entire
chunk, not just the bit of information it needs (that's the only way I
can explain the drop in read activity between the 1024k and 32k chunk sizes).
Kernels are:
2.4.20-31.9 (roughly 2.4.22)
2.6.7-1.494.2.2 (roughly 2.6.8-rc2)
The filesystem is ext3.
The RAIDs are set up as 6 RAID1 mirrors (2 drives each), striped together.
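For reference, a layered setup like that can be built along these lines with
mdadm (device names are made up, and mdadm is just one way to do it):

  # one of the six mirror pairs (repeated for md2..md6)
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  # stripe the six mirrors together; --chunk is in KB, so 1024 = 1024k
  mdadm --create /dev/md0 --level=0 --chunk=1024 --raid-devices=6 \
        /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5 /dev/md6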
Any insight is greatly appreciated.
Philip
* Re: Software RAID & Filesystem Cache
From: Philip Molter @ 2004-08-05 17:34 UTC
To: Mike Hardy; +Cc: linux-raid
Mike Hardy wrote:
>
> I was just researching a similar problem. I turned up a few things that
> pointed at the filesystem readahead values/strategies, apparently those
> things changed quite a bit between 2.4 and 2.6. You might try twiddling
> those knobs, and you can find some linux-kernel threads where various
> values were tested.
>
> Also, for my specific case, I found that the default I/O scheduler in
> Fedora Core 2, the "Complete Fair Queueing" scheduler ('cfq'), wasn't as
> good as the 'deadline' scheduler, and neither was as good as the
> 'anticipatory' scheduler. So now I boot with the kernel command-line
> parameter 'elevator=as' and things are faster. That's extremely
> workload-specific, though, so you might run your own scheduler derby and
> see what works.
>
> The readahead is the first place I'd look though. All in all, it appears
> that 2.6 kernels need a great deal more I/O tuning before they can be
> put in production. While I like the flexibility that's available, the
> default settings seem to be a major negative change from 2.4. This sort
> of thing is just now being quantified and hopefully it gets sorted out
> in the next couple of releases.
Hi Mike,
Thanks for the response. I run the system under the anticipatory
scheduler, too, but I tried all of this under the default, deadline, and
anticipatory schedulers, with no difference among them.
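(For anyone reproducing this: the scheduler is selected at boot with the
elevator= parameter; a hypothetical grub entry:)

  # elevator= picks the I/O scheduler: as, deadline, cfq, or noop
  # (kernel image and root device here are made up)
  kernel /vmlinuz-2.6.7 ro root=/dev/sda1 elevator=deadline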
My first suspicion was readahead as well, but I couldn't find any of the
sysctl options to control readahead in the kernel (as there were under
2.4). I did mess around with readahead directly on the drives, but as
expected, that had no effect. I tried other reading options, but none
seemed to have any effect. The RAID seemed very insistent on reading
all of that data.
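(For completeness, "directly on the drives" means the blockdev knob; device
names here are made up:)

  blockdev --getra /dev/sda      # current readahead, in 512-byte sectors
  blockdev --setra 8 /dev/sda    # drop it to 4KB on one drive
  blockdev --setra 8 /dev/md0    # and on the array device itself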
I was flummoxed.
Philip
* Re: Software RAID & Filesystem Cache
From: Mike Hardy @ 2004-08-05 17:49 UTC
To: 'linux-raid@vger.kernel.org'
Philip Molter wrote:
> Mike Hardy wrote:
> [ ... snip ... ]
>> The readahead is the first place I'd look though. All in all, it
>> appears that 2.6 kernels need a great deal more I/O tuning before they
>> can be put in production. While I like the flexibility that's
>> available, the default settings seem to be a major negative change
>> from 2.4. This sort of thing is just now being quantified and
>> hopefully it gets sorted out in the next couple of releases
>
>
> [ .. snip ... ]
>
> My first suspicion was readahead as well, but I couldn't find any of the
> sysctl options to control readahead in the kernel (as there were under
> 2.4).
I'm honestly not trying to be a smart-ass by posting this since I
haven't played with these *a single bit* - however, this was what I found:
<berlin>/sys % find . | grep ahead
./block/hdf/queue/read_ahead_kb
./block/hdg/queue/read_ahead_kb
./block/hdc/queue/read_ahead_kb
./block/hda/queue/read_ahead_kb
Maybe those are the appropriate knobs to twiddle? If so, it appears
they've gone per-device, and they've moved to sysfs...
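Presumably something like this, if they work the way the name suggests
(128 is just an arbitrary test value):

  cat /sys/block/hda/queue/read_ahead_kb          # current value, in KB
  echo 128 > /sys/block/hda/queue/read_ahead_kb   # needs root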
-Mike
* Re: Software RAID & Filesystem Cache
From: Philip Molter @ 2004-08-05 18:54 UTC
To: Mike Hardy; +Cc: linux-raid
> I'm honestly not trying to be a smart-ass by posting this since I
> haven't played with these *a single bit* - however, this was what I found:
>
> <berlin>/sys % find . | grep ahead
> ./block/hdf/queue/read_ahead_kb
> ./block/hdg/queue/read_ahead_kb
> ./block/hdc/queue/read_ahead_kb
> ./block/hda/queue/read_ahead_kb
>
> Maybe those are the appropriate knobs to twiddle? If so, it appears
> they've gone per-device, and they've moved to sysfs...
Thanks! Maybe that's why I couldn't find it. I was looking through the
sysctl settings, which only cover /proc (I guess). I keep
forgetting /sys exists now.
Philip
* Re: Software RAID & Filesystem Cache
From: Philip Molter @ 2004-08-07 14:04 UTC
To: linux-raid
> <berlin>/sys % find . | grep ahead
> ./block/hdf/queue/read_ahead_kb
> ./block/hdg/queue/read_ahead_kb
> ./block/hdc/queue/read_ahead_kb
> ./block/hda/queue/read_ahead_kb
Yeah, these definitely have no effect on how much is being read from the
disks. There is a direct correlation, though, between the chunk size on
the raid0 portion of the raid10 and the amount of data that is read: if
you increase the chunk size, you increase the number of bytes read from
the disks.
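For the record, re-testing a chunk size only requires rebuilding the raid0
layer on top of the mirrors, though that changes the on-disk layout, so the
filesystem has to be recreated and the data reloaded each time (a sketch;
device names are made up):

  mdadm --stop /dev/md0
  mdadm --create /dev/md0 --level=0 --chunk=32 --raid-devices=6 \
        /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5 /dev/md6
  mkfs.ext3 /dev/md0    # layout changed, so the filesystem must be rebuilt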
There's something seriously messed up with software RAID10 under 2.6. It's
practically unusable in high-load situations where it was perfectly
fine before.