From: Dave Chinner <david@fromorbit.com>
To: Martin Steigerwald <martin@lichtvoll.de>
Cc: "Richard W.M. Jones" <rjones@redhat.com>, linux-xfs@vger.kernel.org
Subject: Re: mkfs.xfs options suitable for creating absurdly large XFS filesystems?
Date: Wed, 5 Sep 2018 17:43:49 +1000
Message-ID: <20180905074349.GX5631@dastard>
In-Reply-To: <4003432.r53nhXgZDq@merkaba>

On Wed, Sep 05, 2018 at 09:09:28AM +0200, Martin Steigerwald wrote:
> Dave Chinner - 05.09.18, 00:23:
> > On Tue, Sep 04, 2018 at 05:36:43PM +0200, Martin Steigerwald wrote:
> > > Dave Chinner - 04.09.18, 02:49:
> > > > On Mon, Sep 03, 2018 at 11:49:19PM +0100, Richard W.M. Jones
> > > > wrote:
> > > > > [This is silly and has no real purpose except to explore the
> > > > > limits.
> > > > > If that offends you, don't read the rest of this email.]
> > > > 
> > > > We do this quite frequently ourselves, even if it is just to
> > > > remind ourselves how long it takes to wait for millions of
> > > > IOs to be done.
> > > 
> > > Just for the fun of it, during a Linux performance analysis &
> > > tuning course I held I created a 1 EiB XFS filesystem on a
> > > sparse file on another XFS filesystem on the SSD of a ThinkPad
> > > T520. It took several hours to create, but then it was there
> > > and mountable. AFAIR the sparse file was a bit less than 20 GiB.
> > 
> > Yup, 20GB of single sector IOs takes a long time.
> 
> Yeah. It was interesting to see that neither the CPU nor the SSD
> was fully utilized during that time, though.

Right - it's not CPU bound because it's always waiting on a single
IO, and it's not IO bound because it's only issuing a single IO at a
time.
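
In pseudo-C, the serial pattern is just this (a schematic, not the
actual mkfs code):

	/*
	 * One synchronous write at a time: the CPU sleeps inside each
	 * pwrite() while the device services a single request, so
	 * neither the CPU nor the SSD ever gets close to saturation.
	 */
	for (i = 0; i < nr_bufs; i++)
		pwrite(fd, buf[i].data, buf[i].len, buf[i].offset);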

Speaking of which, I just hacked a delayed write buffer list
construct similar to the kernel code into mkfs/libxfs to batch
writeback, then added a hacky AIO ring to allow it to drive deep
IO queues. I'm seeing sustained request queue depths of ~100 and
the SSDs are about 80% busy at 100,000 write IOPS, but mkfs is
only consuming about 60% of a single CPU.
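
Roughly, the idea looks like this - a minimal sketch with libaio,
not the actual mkfs patch. It assumes an O_DIRECT fd and suitably
aligned buffers, and links with -laio:

/*
 * Sketch only - not the real mkfs/libxfs code. Batch writes on a
 * delwri-style list, push them through a fixed-depth AIO ring,
 * and only block to reap completions when the ring is full.
 */
#include <sys/types.h>
#include <libaio.h>

#define RING_DEPTH	128

struct delwri_buf {
	void		*data;		/* aligned for O_DIRECT */
	size_t		len;
	off_t		offset;
	struct iocb	iocb;
};

static io_context_t	ctx;
static int		inflight;

int delwri_init(void)
{
	return io_setup(RING_DEPTH, &ctx);
}

/* wait for at least min_nr completions and retire them */
static void delwri_reap(int min_nr)
{
	struct io_event	events[RING_DEPTH];
	int		n;

	n = io_getevents(ctx, min_nr, RING_DEPTH, events, NULL);
	if (n > 0)
		inflight -= n;
}

/* queue one buffer; only blocks when the ring is full */
int delwri_queue(int fd, struct delwri_buf *bp)
{
	struct iocb	*iocbp = &bp->iocb;

	if (inflight >= RING_DEPTH)
		delwri_reap(1);
	io_prep_pwrite(iocbp, fd, bp->data, bp->len, bp->offset);
	if (io_submit(ctx, 1, &iocbp) != 1)
		return -1;
	inflight++;
	return 0;
}

/* drain everything before tearing down */
void delwri_flush(void)
{
	while (inflight > 0)
		delwri_reap(inflight);
	io_destroy(ctx);
}

With ~100 of those in flight the device stays busy while mkfs gets
on with formatting the next AGs.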

Which means that, instead of 7-8 hours to make an 8EB filesystem, we
can get it down to:

$ time sudo ~/packages/mkfs.xfs -K  -d size=8191p /dev/vdd
meta-data=/dev/vdd               isize=512    agcount=8387585, agsize=268435455 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=2251524935778304, imaxpct=1
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

real    15m18.090s
user    5m54.162s
sys     3m49.518s

Around 15 minutes on a couple of cheap consumer NVMe SSDs.

xfs_repair is going to need some help to scale up to this many AGs,
though - phase 1 is doing a huge amount of IO just to verify the
primary superblock...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 12 messages
2018-09-03 22:49 mkfs.xfs options suitable for creating absurdly large XFS filesystems? Richard W.M. Jones
2018-09-04  0:49 ` Dave Chinner
2018-09-04  8:23   ` Dave Chinner
2018-09-04  9:12     ` Dave Chinner
2018-09-04  8:26   ` Richard W.M. Jones
2018-09-04  9:11     ` Dave Chinner
2018-09-04  9:45       ` Richard W.M. Jones
2018-09-04 15:36   ` Martin Steigerwald
2018-09-04 22:23     ` Dave Chinner
2018-09-05  7:09       ` Martin Steigerwald
2018-09-05  7:43         ` Dave Chinner [this message]
2018-09-05  9:05   ` Richard W.M. Jones
