Re: XFS: Abysmal write performance because of excessive seeking (allocation groups to blame?)

From: Dave Chinner <david@fromorbit.com>
To: Stefan Ring <stefanrin@gmail.com>
Cc: stan@hardwarefreak.com, Linux fs XFS <xfs@oss.sgi.com>
Subject: Re: XFS: Abysmal write performance because of excessive seeking (allocation groups to blame?)
Date: Mon, 9 Apr 2012 10:19:43 +1000
Message-ID: <20120409001943.GI18323@dastard>
In-Reply-To: <CAAxjCEyJW1b4dbKctbrgdWjykQt8Hb4Sw1RKdys3oUsehNHCcQ@mail.gmail.com>

On Sat, Apr 07, 2012 at 09:27:50AM +0200, Stefan Ring wrote:
> > Instead, a far more optimal solution would be to set aside 4 spares per
> > chassis and create 14 four drive RAID10 arrays.  This would yield ~600
> > seeks/sec and ~400MB/s sequential throughput performance per 2 spindle
> > array.  We'd stitch the resulting 56 hardware RAID10 arrays together in
> > an mdraid linear (concatenated) array.  Then we'd format this 112
> > effective spindle linear array with simply:
> >
> > $ mkfs.xfs -d agcount=56 /dev/md0
> >
> > Since each RAID10 is 900GB capacity, we have 56 AGs of just under the
> > 1TB limit, 1 AG per 2 physical spindles.  Due to the 2 stripe spindle
> > nature of the constituent hardware RAID10 arrays, we don't need to worry
> > about aligning XFS writes to the RAID stripe width.  The hardware cache
> > will take care of filling the small stripes.  Now we're in the opposite
> > situation of having too many AGs per spindle.  We've put 2 spindles in a
> > single AG and turned the seek starvation issues on its head.
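
For reference, the arithmetic behind the quoted layout can be sketched as
below. Every input comes from the quoted text; the derived chassis count,
the drive totals and the aggregate figures are my own working, and the
aggregates assume an ideal, perfectly even load across all arrays, which a
real workload will not give you. Treat it as a rough sanity check, not a
sizing tool.

```python
# Worked arithmetic for the quoted 56-array layout.
# Inputs are the figures stated in the quoted text.
ARRAYS = 56                  # hardware RAID10 arrays in the md linear concat
ARRAYS_PER_CHASSIS = 14
SPARES_PER_CHASSIS = 4
DRIVES_PER_ARRAY = 4         # four drive RAID10
ARRAY_CAPACITY_GB = 900      # usable capacity per RAID10
SEEKS_PER_ARRAY = 600        # approximate seeks/sec per array
MBPS_PER_ARRAY = 400         # approximate sequential MB/s per array

# Derived values (not stated directly in the quoted text).
chassis = ARRAYS // ARRAYS_PER_CHASSIS
# Mirroring halves the number of independent spindles for writes.
effective_spindles = ARRAYS * DRIVES_PER_ARRAY // 2
physical_drives = ARRAYS * DRIVES_PER_ARRAY + chassis * SPARES_PER_CHASSIS
total_capacity_gb = ARRAYS * ARRAY_CAPACITY_GB
# agcount=56 on a concat of 56 equal arrays gives roughly one AG per array.
ag_size_gb = total_capacity_gb // ARRAYS

# Idealised upper bounds, assuming load spreads evenly over every array.
aggregate_seeks = ARRAYS * SEEKS_PER_ARRAY
aggregate_mbps = ARRAYS * MBPS_PER_ARRAY

print(f"chassis:            {chassis}")
print(f"effective spindles: {effective_spindles}")
print(f"physical drives:    {physical_drives} (including spares)")
print(f"total capacity:     {total_capacity_gb} GB")
print(f"AG size:            {ag_size_gb} GB")
print(f"ideal seeks/sec:    {aggregate_seeks}")
print(f"ideal MB/s:         {aggregate_mbps}")
```

The point of the exercise is only that one AG maps to roughly one array, so
each AG's allocations land on its own pair of effective spindles.
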
> 
> So it sounds like, for poor guys like us who can’t afford the
> hardware to have dozens of spindles, the best option would be to
> create the XFS file system with agcount=1?

No, because then you have no redundancy in the metadata structures, so
if you lose or corrupt the superblock you can much more easily lose the
entire filesystem.  Not to mention you have no allocation parallelism in
the filesystem, so you'll get terrible performance in many common
workloads. IO fairness will also be a big problem.

> That seems to be the only reasonable conclusion to me, since a
> single RAID device, like a single disk, cannot write in parallel
> anyway.

A decent RAID controller with a BBWC and a single LUN benefits from
parallelism just as much as a large disk array does, because the BBWC
minimises the write IO latency and allows the controller to do a better
job of scheduling its IO.
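
A toy model of the scheduling point, as a minimal sketch: it is not how any
particular controller works, and the function name, the queue depths, the
block range and the request count are all made up for illustration. It
compares total head travel when the device sees one request at a time with
the travel when it sees a batch of outstanding requests and services each
batch in ascending block order.

```python
import random


def head_travel(requests, queue_depth, start=0):
    """Return total head movement for a list of block addresses.

    The device sees `queue_depth` requests at a time and services each
    visible batch in ascending block order (a crude one-way sweep).
    With queue_depth=1 this degenerates to plain arrival order.
    """
    position = start
    travel = 0
    for i in range(0, len(requests), queue_depth):
        for block in sorted(requests[i:i + queue_depth]):
            travel += abs(block - position)
            position = block
    return travel


# Hypothetical workload: 1024 writes scattered over 1,000,000 blocks.
rng = random.Random(0)  # fixed seed so the run is repeatable
requests = [rng.randrange(1_000_000) for _ in range(1024)]

serial = head_travel(requests, queue_depth=1)     # one IO in flight
parallel = head_travel(requests, queue_depth=32)  # 32 IOs in flight

print(f"head travel, 1 outstanding IO:   {serial}")
print(f"head travel, 32 outstanding IOs: {parallel}")
```

With only one IO outstanding there is nothing to reorder, so the head goes
wherever the next request happens to be. The more independent IO the
filesystem can issue at once, the more the controller has to sort.
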

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
