From: "Matthew Simpson" <matthew@symatec-computer.com>
To: Mark Hahn <hahn@physics.mcmaster.ca>
Cc: linux-raid@vger.kernel.org
Subject: Re: how to turn down cpu usage of raid ?
Date: Mon, 2 Feb 2004 20:25:17 -0600
Message-ID: <02ad01c3e9fc$f70d4be0$0100a8c0@KARI>
In-Reply-To: Pine.LNX.4.44.0402022000380.5536-100000@coffee.psychology.mcmaster.ca

No, I'm not using DMA.  :(

/dev/hda:
 multcount    =  0 (off)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  0 (off)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 10587/240/63, sectors = 160086528, start = 0

Model=Maxtor 6Y080P0, FwRev=YAR41BW0, SerialNo=Y24BG1QE
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=57
 BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=160086528
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 udma6
 AdvancedPM=yes: disabled (255) WriteCache=enabled
 Drive conforms to: :  1 2 3 4 5 6 7

hdc is the same, of course; they are matched disks.

I did some googling on hdparm before and found out how to change things, but
I am nervous about changing my production servers after testing hdparm on
another server.  The problem is that my two production servers are using the
Intel chipset [same board], and the test server is using a Via chipset.  I
was able to set multcount to 16 and IO_support to 3 [32-bit sync], but when I
tried -X66 -u1 -d1 it OOPSed the kernel.  I don't know whether it was the -X66,
the -u1, or the -d1 that killed it, so I'm not sure what is safe to screw
with, and I can't hose a production server.  Of course the Intel boards
might be better....
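
Since the three flags were applied together, the oops cannot be pinned on any
one of them.  A minimal sketch of splitting them up follows; hdparm_steps is a
hypothetical helper name, and it only prints the commands (a dry run), so each
one can be run and checked by hand on the test box first:

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints one hdparm command per flag so
# each setting can be applied and verified on its own. Nothing is executed.
hdparm_steps() {
    dev=$1
    shift
    for flag in "$@"; do
        printf 'hdparm %s %s\n' "$flag" "$dev"
    done
}

# The three flags from the failed attempt: DMA on, IRQ unmask, transfer mode.
hdparm_steps /dev/hda -d1 -u1 -X66
```

Running hdparm -v on the device after each step should show whether the
setting took before moving on to the next one.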

Here is the other server, same Intel controller, but WDC disks instead of
Maxtor:

/dev/hda:
 multcount    =  0 (off)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  0 (off)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 9729/255/63, sectors = 156301488, start = 0

 Model=WDC WD800JB-00ETA0, FwRev=77.07W77, SerialNo=WD-WCAHL4821776
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=74
 BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=off
 CurCHS=65535/1/63, CurSects=4128705, LBA=yes, LBAsects=156301488
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: device does not report version:  1 2 3 4 5 6

What is the most important value to help this problem out?  I did some tests
with hdparm -Tt on the Via server, and adding multcount 16 and changing the
IO_support to 32-bit sync actually HURT performance instead of helping it.
If DMA is the biggest issue here, I can try turning that on and hope
for the best...

yours,
Matthew

----- Original Message ----- 
From: "Mark Hahn" <hahn@physics.mcmaster.ca>
To: "Matthew Simpson" <matthew@symatec-computer.com>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, February 02, 2004 7:37 PM
Subject: Re: how to turn down cpu usage of raid ?


> > Help!  I am having complaints from users about CPU spikes when writing to my
> > RAID 1 array.
>
> I can think of two answers: first, are you sure your drives are configured
> sanely?  that is, using dma?  with any reasonable kernel, they should be,
> but it's possible to compile in the wrong driver or make some other mistake.
> hdparm -iv /dev/hda and hdc should show using_dma=1.  you can also look
> at /proc/ide/hda/settings.
>
> second, perhaps you should simply make the kernel less lazy at starting
> writes.  here's some basic settings from 2.4:
>
> [hahn@hahn hahn]$ cat /proc/sys/vm/bdflush
> 30      500     0       0       500     3000    60      20      0
>
>  Value               Meaning
>  nfract              Percentage of buffer cache dirty to activate bdflush
>  ndirty              Maximum number of dirty blocks to write out per wake-cycle
>  dummy               Unused
>  dummy               Unused
>  interval            jiffies delay between kupdate flushes
>  age_buffer          Time for normal buffer to age before we flush it
>  nfract_sync         Percentage of buffer cache dirty to activate bdflush
>                      synchronously
>  nfract_stop_bdflush Percentage of buffer cache dirty to stop bdflush
>  dummy               Unused
>
>
> in theory, this means:
> - wake up bdflush when 30% of buffers are dirty.
> - write up to 500 blocks per wakeup.
> - 5 seconds between wakeups.
> - let a buffer age for 30 seconds before flushing it.
> - if 60% of buffers are dirty, start throttling dirtiers.
> - stop bdflush when < 20% of buffers are dirty.
>
> of course, the code doesn't exactly do this, and 2.6 is very different.
> still, I'm guessing that:
> - 500 buffers (pages, right?) is too little
> - 5 seconds is too infrequent
> - 30 seconds is probably too long
>
> I have the fileserver for one of my clusters running much smoother with
> ndirty=1000, interval=200 and age_buffer=1000.  my logic is that the disk
> system can sustain around 200 MB/s, so flushing 4MB per wakeup is pretty
> minimal.  I also hate to see the typical burstiness of bdflush - no IO
> between bursts at 5 second intervals.  I'd rather see a smoother stream of
> write-outs - perhaps even a 1-second interval.  finally, Unix's traditional
> 30-second laziness is mainly done in the hopes that a temporary file will be
> deleted before ever hitting the disk (and/or writes will be combined).  I
> think 30 seconds is an eternity nowadays, and 10 seconds is more reasonable.
>
> in short:
> echo '30 1000 0 0 200 1000 60 20 0' > /proc/sys/vm/bdflush
>
> perhaps:
> echo '30 1000 0 0 100 1000 60 20 0' > /proc/sys/vm/bdflush
>
> for extra credit, investigate whether nfract=30 is too high (I think so, on
> today's big-memory systems).  whether higher ndirty improves balance (these
> writes would compete with application IO, so might hurt, albeit less with
> 2.6's smarter IO scheduler.)  whether the sync/stop parameters make a
> difference, too - throttling dirtiers should probably kick in earlier,
> but if you lower nfract, also lower nfract_stop_bdflush...
>
> > Is there a way I can tune software RAID so that writing
> > updates doesn't interfere with other applications?
>
> remember also that many servers don't need atime updates; this can make a big
> difference in some cases.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
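
To make the quoted bdflush settings easier to compare, here is a rough sketch
that turns one of those nine-field lines into seconds and kilobytes.  The
function name decode_bdflush is hypothetical, and two values are assumptions
rather than anything stated above: 100 jiffies per second (which matches the
quoted reading of 500 as 5 seconds) and 4 KB per buffer (which matches the
"4MB per wakeup" estimate for ndirty=1000).

```shell
#!/bin/sh
# Decode a 2.4-style /proc/sys/vm/bdflush line into readable units.
# Assumptions: 100 jiffies per second and 4 KB per buffer; override them
# with the BDF_HZ and BDF_BUF_KB variables if your system differs.
decode_bdflush() {
    # Word-split the nine fields into positional parameters.
    set -- $1
    hz=${BDF_HZ:-100}
    buf_kb=${BDF_BUF_KB:-4}
    echo "wake bdflush at:    $1% dirty"
    echo "write per wakeup:   $2 buffers ($(( $2 * buf_kb )) KB)"
    echo "kupdate interval:   $(( $5 / hz )) s"
    echo "buffer age:         $(( $6 / hz )) s"
    echo "throttle writers:   $7% dirty"
    echo "stop bdflush below: $8% dirty"
}

# The first suggested setting from the quoted message.
decode_bdflush '30 1000 0 0 200 1000 60 20 0'
```

Under those assumptions the suggested line works out to about 4000 KB per
wakeup, a 2 second interval and a 10 second buffer age, against 2000 KB,
5 seconds and 30 seconds for the defaults shown.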



Thread overview: 4+ messages
2004-02-02 23:18 how to turn down cpu usage of raid ? Matthew Simpson
2004-02-03  1:37 ` Mark Hahn
2004-02-03  2:25   ` Matthew Simpson [this message]
2004-02-03  1:39 ` Guy
