* Software RAID & Filesystem Cache
From: Philip Molter @ 2004-08-05 16:31 UTC
To: linux-raid
What is the relationship between software RAID, filesystem caching, and I/O?
Here's why I ask:
I have a software application that uses RRD files to monitor numerous
devices. All told, we're talking about 21GB of RRD files, about 75% of
which are being updated every 5 minutes. Each RRD file is about 3MB in
size, and updating an RRD file involves a lot of seeking and writing
small bits of data to small parts of the file. The application should never be
reading the entire file into memory. On a properly running system,
reads are done almost entirely out of filesystem cache on a 4GB box.
Checking iostat, there is very little read utilization (<10KB/s averaged over 5 minutes).
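(For reference, figures like that come from something along these lines; the
flags and interval are illustrative:)

  # extended per-device stats; the second report covers
  # exactly the 300-second window between the two samples
  iostat -x 300 2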
Now, I have a box with 12 37GB Raptor 10000RPM drives hooked up to a
3ware 8500-12 controller. In software RAID10 under kernel 2.4 (1024k
chunks), the system works as expected. Read load is light. The box has
plenty of I/O to spare (which it should; those drives are monsters). In
software RAID10 under kernel 2.6 (1024k chunks), read load is
excessively heavy. Each *drive* reports anywhere from 8-12MB/s (5
minute average). The RAID10 device itself reports 95-120MB/s (5 minute
average). There is no way the application is actually requesting to
read that much data. I trust the figures reported by iostat, though,
because the application cannot keep up and the system feels excessively
heavy. In software RAID10 under kernel 2.6 (32k chunks), the read load
is lighter, on the order of 1MB/s per disk and 10-12MB/s for the RAID10
device. If I convert the box to hardware RAID, the box functions
normally. Read levels are even better than under the 2.4 software RAID
configuration (literally, less than 1KB/s over a 5-minute average).
Each configuration is doing exactly the same work, with the exact same
code and data. The only difference is the kernel and software RAID
configurations.
It's almost as if every read request to software RAID under 2.6 is
bypassing the filesystem cache and, furthermore, reading the entire
chunk, not just the bit of information it needs (that's the only way I
can explain the drop in read activity between the 1024k and 32k chunk sizes).
Kernels are:
2.4.20-31.9 (roughly 2.4.22)
2.6.7-1.494.2.2 (roughly 2.6.8-rc2)
The filesystem is ext3.
The RAIDs are set up as 6 RAID1 mirrors (2 drives each), striped together.
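For reference, a layered setup like that can be built along these lines with
mdadm (device names are made up, and mdadm is just one way to do it):

  # one of the six mirror pairs (repeated for md2..md6)
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  # stripe the six mirrors together; --chunk is in KB, so 1024 = 1024k
  mdadm --create /dev/md0 --level=0 --chunk=1024 --raid-devices=6 \
        /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5 /dev/md6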
Any insight is greatly appreciated.
Philip
* Re: Software RAID & Filesystem Cache
From: Philip Molter @ 2004-08-05 17:34 UTC
To: Mike Hardy; +Cc: linux-raid
Mike Hardy wrote:
>
> I was just researching a similar problem. I turned up a few things that
> pointed at the filesystem readahead values/strategies, apparently those
> things changed quite a bit between 2.4 and 2.6. You might try twiddling
> those knobs, and you can find some linux-kernel threads where various
> values were tested.
>
> Also, for my specific case, I found that the default I/O scheduler in
> Fedora Core 2, the "Complete Fair Queueing" scheduler ('cfq'), wasn't as
> good as the 'deadline' scheduler, and neither was as good as the
> 'anticipatory' scheduler. So now I boot with the kernel command-line
> parameter 'elevator=as' and things are faster. That's extremely
> workload-specific, though, so you might run your own scheduler derby and
> see what works.
>
> The readahead is the first place I'd look though. All in all, it appears
> that 2.6 kernels need a great deal more I/O tuning before they can be
> put in production. While I like the flexibility that's available, the
> default settings seem to be a major negative change from 2.4. This sort
> of thing is just now being quantified and hopefully it gets sorted out
> in the next couple of releases.
Hi Mike,
Thanks for the response. I run the system under the anticipatory
scheduler, too, but I tried all of this under the default, deadline, and
anticipatory schedulers, with no difference among them.
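(For anyone reproducing this: the scheduler is selected at boot with the
elevator= parameter; a hypothetical grub entry:)

  # elevator= picks the I/O scheduler: as, deadline, cfq, or noop
  # (kernel image and root device here are made up)
  kernel /vmlinuz-2.6.7 ro root=/dev/sda1 elevator=deadline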
My first suspicion was readahead as well, but I couldn't find any of the
sysctl options to control readahead in the kernel (as there were under
2.4). I did mess around with readahead directly on the drives, but as
expected, that had no effect. I tried other reading options, but none
seemed to have any effect. The RAID seemed very insistent on reading
all of that data.
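(For completeness, "directly on the drives" means the blockdev knob; device
names here are made up:)

  blockdev --getra /dev/sda      # current readahead, in 512-byte sectors
  blockdev --setra 8 /dev/sda    # drop it to 4KB on one drive
  blockdev --setra 8 /dev/md0    # and on the array device itself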
I was flummoxed.
Philip
* Re: Software RAID & Filesystem Cache
From: Mike Hardy @ 2004-08-05 17:49 UTC
To: 'linux-raid@vger.kernel.org'
Philip Molter wrote:
> Mike Hardy wrote:
> [ ... snip ... ]
>> The readahead is the first place I'd look though. All in all, it
>> appears that 2.6 kernels need a great deal more I/O tuning before they
>> can be put in production. While I like the flexibility that's
>> available, the default settings seem to be a major negative change
>> from 2.4. This sort of thing is just now being quantified and
>> hopefully it gets sorted out in the next couple of releases
>
>
> [ .. snip ... ]
>
> My first suspicion was readahead as well, but I couldn't find any of the
> sysctl options to control readahead in the kernel (as there were under
> 2.4).
I'm honestly not trying to be a smart-ass by posting this since I
haven't played with these *a single bit* - however, this was what I found:
<berlin>/sys % find . | grep ahead
./block/hdf/queue/read_ahead_kb
./block/hdg/queue/read_ahead_kb
./block/hdc/queue/read_ahead_kb
./block/hda/queue/read_ahead_kb
Maybe those are the appropriate knobs to twiddle? If so, it appears
they've gone per-device, and they've moved to sysfs...
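Presumably something like this, if they work the way the name suggests
(128 is just an arbitrary test value):

  cat /sys/block/hda/queue/read_ahead_kb          # current value, in KB
  echo 128 > /sys/block/hda/queue/read_ahead_kb   # needs root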
-Mike
* Re: Software RAID & Filesystem Cache
From: Philip Molter @ 2004-08-05 18:54 UTC
To: Mike Hardy; +Cc: linux-raid
> I'm honestly not trying to be a smart-ass by posting this since I
> haven't played with these *a single bit* - however, this was what I found:
>
> <berlin>/sys % find . | grep ahead
> ./block/hdf/queue/read_ahead_kb
> ./block/hdg/queue/read_ahead_kb
> ./block/hdc/queue/read_ahead_kb
> ./block/hda/queue/read_ahead_kb
>
> Maybe those are the appropriate knobs to twiddle? If so, it appears
> they've gone per-device, and they've moved to sysfs...
Thanks! Maybe that's why I couldn't find it. I was looking through the
sysctl settings, which only cover /proc (I guess). I keep
forgetting /sys exists now.
Philip
* Re: Software RAID & Filesystem Cache
From: Philip Molter @ 2004-08-07 14:04 UTC
To: linux-raid
> <berlin>/sys % find . | grep ahead
> ./block/hdf/queue/read_ahead_kb
> ./block/hdg/queue/read_ahead_kb
> ./block/hdc/queue/read_ahead_kb
> ./block/hda/queue/read_ahead_kb
Yeah, these definitely have no effect on how much is being read from the
disks. There is a direct correlation, though, between the chunk size on
the raid0 portion of the raid10 and the amount of data that is read: if
you increase the chunk size, you increase the number of bytes read from
the disks.
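For the record, re-testing a chunk size only requires rebuilding the raid0
layer on top of the mirrors, though that changes the on-disk layout, so the
filesystem has to be recreated and the data reloaded each time (a sketch;
device names are made up):

  mdadm --stop /dev/md0
  mdadm --create /dev/md0 --level=0 --chunk=32 --raid-devices=6 \
        /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5 /dev/md6
  mkfs.ext3 /dev/md0    # layout changed, so the filesystem must be rebuilt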
There's something seriously messed up with software RAID10 under 2.6. It's
practically unusable in high-load situations where it was perfectly
fine before.