From: "Matthew Simpson" <matthew@symatec-computer.com>
To: Mark Hahn <hahn@physics.mcmaster.ca>
Cc: linux-raid@vger.kernel.org
Subject: Re: how to turn down cpu usage of raid ?
Date: Mon, 2 Feb 2004 20:25:17 -0600
Message-ID: <02ad01c3e9fc$f70d4be0$0100a8c0@KARI>
In-Reply-To: <Pine.LNX.4.44.0402022000380.5536-100000@coffee.psychology.mcmaster.ca>
No, I'm not using DMA. :(
/dev/hda:
 multcount    =  0 (off)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  0 (off)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 10587/240/63, sectors = 160086528, start = 0
Model=Maxtor 6Y080P0, FwRev=YAR41BW0, SerialNo=Y24BG1QE
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=57
BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=off
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=160086528
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 udma6
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: : 1 2 3 4 5 6 7
hdc is the same, of course; they are matched disks.
I did some googling on hdparm earlier and found out how to change these
settings, but after testing hdparm on another server I am nervous about
touching my production machines. The problem is that my two production
servers use the Intel chipset [same board], while the test server uses a Via
chipset. I was able to set multcount to 16 and IO_support to 3 [32-bit
sync], but when I tried -X66 -u1 -d1 I OOPSed the kernel. I don't know
whether it was the -X66, the -u1, or the -d1 that killed it, so I'm not sure
what is safe to screw with, and I can't hose a production server. Of course,
the Intel boards might behave better... If I try again, I'll go one flag at
a time, as sketched below.
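The idea would be a benchmark between each step, so a crash points at the
culprit. This is only a sketch, untested on the Intel boards, and the -X69
(UDMA5) mode is my guess from the drive info above:

hdparm -d1 /dev/hda    # enable DMA first; the most likely win
hdparm -Tt /dev/hda    # benchmark; if the box survives, continue
hdparm -u1 /dev/hda    # then unmask IRQs during disk I/O
hdparm -Tt /dev/hda
hdparm -X69 /dev/hda   # last, force UDMA5 (69 = 64 + mode 5)
hdparm -Tt /dev/hda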
Here is the other server, same Intel controller, but WDC disks instead of
Maxtor:
/dev/hda:
 multcount    =  0 (off)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  0 (off)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 9729/255/63, sectors = 156301488, start = 0
Model=WDC WD800JB-00ETA0, FwRev=77.07W77, SerialNo=WD-WCAHL4821776
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=74
BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=off
CurCHS=65535/1/63, CurSects=4128705, LBA=yes, LBAsects=156301488
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=no WriteCache=enabled
Drive conforms to: device does not report version: 1 2 3 4 5 6
What is the most important setting for fixing this problem? I did some tests
with hdparm -Tt on the Via server, and setting multcount to 16 and changing
IO_support to 32-bit sync actually HURT performance instead of helping it.
If DMA is the biggest issue here, I can try turning that on and hope for the
best; the minimal change I have in mind is sketched below...
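This is only a sketch, not something I have run on the production boards,
and it assumes the same hda/hdc layout:

hdparm -d1 /dev/hda && hdparm -d1 /dev/hdc   # turn DMA on for both disks
hdparm -v /dev/hda | grep using_dma          # confirm using_dma = 1 (on)
hdparm -Tt /dev/hda                          # quick timing sanity check
# if dmesg shows DMA errors, back it out immediately:
# hdparm -d0 /dev/hda && hdparm -d0 /dev/hdc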
yours,
Matthew
----- Original Message -----
From: "Mark Hahn" <hahn@physics.mcmaster.ca>
To: "Matthew Simpson" <matthew@symatec-computer.com>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, February 02, 2004 7:37 PM
Subject: Re: how to turn down cpu usage of raid ?
> > Help! I am having complaints from users about CPU spikes when writing to
> > my RAID 1 array.
>
> I can think of two answers: first, are you sure your drives are configured
> sanely? that is, using dma? with any reasonable kernel, they should be,
> but it's possible to compile in the wrong driver or make some other
> mistake. hdparm -iv /dev/hda and hdc should show using_dma=1. you can also
> look at /proc/ide/hda/settings.
>
> second, perhaps you should simply make the kernel less lazy at starting
> writes. here's some basic settings from 2.4:
>
> [hahn@hahn hahn]$ cat /proc/sys/vm/bdflush
> 30 500 0 0 500 3000 60 20 0
>
> Value                Meaning
> nfract               Percentage of buffer cache dirty to activate bdflush
> ndirty               Maximum number of dirty blocks to write out per wake-cycle
> dummy                Unused
> dummy                Unused
> interval             Jiffies delay between kupdate flushes
> age_buffer           Time for normal buffer to age before we flush it
> nfract_sync          Percentage of buffer cache dirty to activate bdflush
>                      synchronously
> nfract_stop_bdflush  Percentage of buffer cache dirty to stop bdflush
> dummy                Unused
>
>
> in theory, this means:
> - wake up bdflush when 30% of buffers are dirty.
> - write up to 500 blocks per wakeup.
> - 5 seconds between wakeups.
> - let a buffer age for 30 seconds before flushing it.
> - if 60% of buffers are dirty, start throttling dirtiers.
> - stop bdflush when < 20% of buffers are dirty.
>
> of course, the code doesn't exactly do this, and 2.6 is very different.
> still, I'm guessing that:
> - 500 buffers (pages, right?) is too little
> - 5 seconds is too infrequent
> - 30 seconds is probably too long
>
> I have the fileserver for one of my clusters running much smoother with
> ndirty=1000, interval=200 and age_buffer=1000. my logic is that the disk
> system can sustain around 200 MB/s, so flushing 4MB per wakeup is pretty
> minimal. I also hate to see the typical burstiness of bdflush - no IO
> between bursts at 5 second intervals. I'd rather see a smoother stream of
> write-outs - perhaps even a 1-second interval. finally, Unix's traditional
> 30-second laziness is mainly done in the hopes that a temporary file will
> be deleted before ever hitting the disk (and/or writes will be combined).
> I think 30 seconds is an eternity nowadays, and 10 seconds is more
> reasonable.
>
> in short:
> echo '30 1000 0 0 200 1000 60 20 0' > /proc/sys/vm/bdflush
>
> perhaps:
> echo '30 1000 0 0 100 1000 60 20 0' > /proc/sys/vm/bdflush
>
> for extra credit, investigate whether nfract=30 is too high (I think so,
> on today's big-memory systems). whether higher ndirty improves balance
> (these writes would compete with application IO, so might hurt, albeit
> less with 2.6's smarter IO scheduler.) whether the sync/stop parameters
> make a difference, too - throttling dirtiers should probably kick in
> earlier, but if you lower nfract, also lower nfract_stop_bdflush...
>
> > Is there a way I can tune software RAID so that writing
> > updates doesn't interfere with other applications?
>
> remember also that many servers don't need atime updates; this can make a
> big difference in some cases.
>
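P.S. For my own notes, here is roughly what I would drop into rc.local once
Mark's numbers check out on the test box (a sketch only; the mount point is
a placeholder for wherever the array lives):

# wake bdflush sooner and in smaller bursts, per Mark's suggestion
echo '30 1000 0 0 200 1000 60 20 0' > /proc/sys/vm/bdflush
# skip atime updates on the array filesystem (mount point assumed)
mount -o remount,noatime /mnt/raid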