Re: allocsize mount option

From: Gim Leong Chin <chingimleong@yahoo.com.sg>
To: Dave Chinner <david@fromorbit.com>
Cc: Eric Sandeen <sandeen@sandeen.net>, xfs@oss.sgi.com
Subject: Re: allocsize mount option
Date: Fri, 15 Jan 2010 11:08:22 +0800 (SGT)
Message-ID: <436901.69970.qm@web76202.mail.sg1.yahoo.com>

Hi Dave,

Thank you for the advice!

I have run direct IO dd tests writing the same 20 GB files. The results are an eye-opener! The dd parameters were bs=1GB, count=2.

A single instance gave 830 and 800 MB/s on repeated runs, compared to anywhere from just over 100 to under 300 MB/s for buffered writes.

Two concurrent instances gave an aggregate of 304 MB/s, and six instances an aggregate of 587 MB/s.

The system drive (/home, a RAID 1) gave 130 MB/s with direct IO, compared to 51 MB/s buffered.

So the problem lies with the buffered writes.
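The direct IO versus buffered comparison can also be reproduced outside dd. The sketch below is an illustration under assumptions, not what was actually run: `write_throughput` is a hypothetical helper that opens the file with `os.O_DIRECT` (Linux only) when `direct=True`, roughly like `dd oflag=direct`, writes from a page-aligned anonymous `mmap` buffer because O_DIRECT generally needs aligned memory, and reports MB/s.

```python
import mmap
import os
import time


def write_throughput(path, total_mb, block_mb=1, direct=False):
    """Hypothetical helper: write total_mb MiB in block_mb MiB chunks, return MB/s.

    direct=True adds O_DIRECT (Linux only), roughly what dd oflag=direct does.
    The buffer is an anonymous mmap, which is page-aligned, since O_DIRECT
    generally requires aligned memory as well as aligned sizes and offsets.
    """
    block = block_mb * 1024 * 1024
    count = total_mb // block_mb
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if direct:
        # May fail with EINVAL on filesystems that do not support O_DIRECT
        flags |= os.O_DIRECT
    buf = mmap.mmap(-1, block)  # page-aligned, zero-filled buffer
    fd = os.open(path, flags, 0o644)
    try:
        start = time.perf_counter()
        for _ in range(count):
            if os.write(fd, buf) != block:
                raise OSError("short write")
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        buf.close()
    # MB here means 10**6 bytes, the unit dd uses in its MB/s figure
    return count * block / max(elapsed, 1e-9) / 1e6
```

For example, `write_throughput("/mnt/xfs/ddtest.bin", 20480, block_mb=1024, direct=True)` would write about 20 GiB in 1 GiB chunks (the path is hypothetical). In buffered mode the figure mostly reflects how fast data reaches the page cache, as with dd run without `conv=fsync`.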

> You'd have to get all the fixes from 2.6.30 to 2.6.32, and the
> backport would be very difficult to get right. Better would be just
> to upgrade the kernel to 2.6.32 ;)

If I change the kernel, I will have no support from Novell. I will try my luck at convincing them.

> > > I'd suggest that you might need to look at increasing the maximum IO
> > > size for the block device (/sys/block/sdb/queue/max_sectors_kb),
> > > maybe the request queue depth as well to get larger IOs to be pushed
> > > to the raid controller. If you can, at least get it to the stripe
> > > width of 1536k....
> > 
> > Could you give a good reference for performance tuning of these
> > parameters?  I am at a total loss here.
> 
> Welcome to the black art of storage subsystem tuning ;)
> 
> I'm not sure there is a good reference for tuning the block device
> parameters - most of what I know was handed down by word of mouth
> from gurus on high mountains.
> 
> The overriding principle, though, is to try to ensure that the
> stripe width sized IOs can be issued right through the IO stack to
> the hardware, and that those IOs are correctly aligned to the
> stripes. You've got the filesystem configuration and layout part
> correct, now it's just tuning the block layer to pass the IO's
> through.

Can I confirm that /sys/block/sdb/queue/max_sectors_kb should be set to the stripe width of 1536 kB?
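To make the sizing and alignment principle concrete, here is a hypothetical pair of Python helpers. The first checks whether a given max_sectors_kb lets a full stripe-width IO through as one request, remembering that the kernel will not accept max_sectors_kb above max_hw_sectors_kb (reported in the same sysfs directory). The second checks whether an IO starts on a stripe boundary and covers whole stripes. The 1536 kB stripe width is from the thread; the function names and example values are illustrative assumptions.

```python
STRIPE_WIDTH_KB = 1536  # stripe width quoted in the thread


def full_stripe_fits(max_sectors_kb, max_hw_sectors_kb, stripe_kb=STRIPE_WIDTH_KB):
    """Hypothetical check: can one request carry a full stripe-width IO?

    max_sectors_kb is the tunable in /sys/block/<dev>/queue/; the kernel
    rejects values above max_hw_sectors_kb, so that is the real ceiling.
    """
    effective = min(max_sectors_kb, max_hw_sectors_kb)
    return effective >= stripe_kb


def stripe_aligned(offset_bytes, length_bytes, stripe_kb=STRIPE_WIDTH_KB):
    """Hypothetical check: IO starts on a stripe boundary and covers whole stripes."""
    stripe = stripe_kb * 1024
    return (
        length_bytes > 0
        and offset_bytes % stripe == 0
        and length_bytes % stripe == 0
    )
```

With max_sectors_kb at 512, `full_stripe_fits(512, 32767)` is False, so a 1536 kB write would be split into at least three requests; `full_stripe_fits(1536, 32767)` is True (32767 is an assumed hardware limit).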

Which parameter is the "request queue depth"? What value should it be set to?
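As a starting point for finding those parameters, the sketch below reads the relevant sysfs values for one device. It assumes the "request queue depth" meant here is the block layer's /sys/block/<dev>/queue/nr_requests, and it also reads /sys/block/<dev>/device/queue_depth, the SCSI/libata device queue depth (a value of 1 typically means command queuing such as NCQ is not in use). The function name and the `sysfs_root` parameter are hypothetical.

```python
import os

# Attributes under /sys/block/<dev>/ that bear on IO size and queueing
ATTRS = {
    "max_sectors_kb": "queue/max_sectors_kb",
    "max_hw_sectors_kb": "queue/max_hw_sectors_kb",
    "nr_requests": "queue/nr_requests",
    "device_queue_depth": "device/queue_depth",
}


def read_queue_settings(dev, sysfs_root="/sys/block"):
    """Hypothetical helper: return {name: int or None}, None if absent/unreadable."""
    settings = {}
    for name, rel in ATTRS.items():
        path = os.path.join(sysfs_root, dev, rel)
        try:
            with open(path) as f:
                settings[name] = int(f.read().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings
```

For example, `read_queue_settings("sdb")` on the machine in question. Writing a new value (as root) to the same files changes the setting until the next reboot.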

> FWIW, your tests are not timing how long it takes for all the data
> to hit the disk, only how long it takes to get into cache.

Thank you! I do know that XFS buffers writes extensively. The drive LEDs stay lit long after the OS says the writes have completed, and some of the timings are physically impossible.
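The timing point can be illustrated by splitting a buffered write into two phases: the write() calls, which mostly return once data is in the page cache, and the following `os.fsync()`, which waits for the kernel to write the data out to the device. dd gives a comparable effect with `conv=fsync`. The function name and sizes are hypothetical; this is a sketch, not a benchmark.

```python
import os
import time


def timed_buffered_write(path, total_mb, block_kb=1024):
    """Hypothetical helper: return (cache_seconds, fsync_seconds) for total_mb MiB."""
    block = b"\0" * (block_kb * 1024)
    count = total_mb * 1024 // block_kb
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        t0 = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)  # usually returns once the data is in the page cache
        t1 = time.perf_counter()
        os.fsync(fd)  # blocks until the kernel has written the file out to the device
        t2 = time.perf_counter()
    finally:
        os.close(fd)
    return t1 - t0, t2 - t1
```

Dividing the data size by the cache phase alone gives the kind of inflated figure described above; dividing by the sum of both phases gives a number closer to what the disks actually sustained.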

> That sounds wrong - it sounds like NCQ is not functioning properly,
> as with NCQ enabled, disabling the drive cache should not impact
> throughput at all....

I do not remember clearly whether NCQ is available on that motherboard (the machine runs 32-bit Ubuntu), but I do remember seeing a queue depth setting in the kernel. I will check next week.

However, from what I have read, NCQ hurts single-stream write performance. That is also what I found with another Areca SATA RAID controller under Windows XP.

With all the drives we tested, I found that disabling the drive cache badly hurt sequential write performance (no file system, with data written directly to designated LBAs).

> I'd suggest trying to find another distributor that will bring them
> in for you. Putting that many drives in a single chassis is almost
> certainly going to cause vibration problems, especially if you get
> all the disk heads moving in close synchronisation (which is what
> happens when you get all your IO sizing and alignment right).

I am working on switching to WD Caviar RE4 drives, but I am not sure I can pull it off.

Chin Gim Leong


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs