From: Martin Steigerwald <martin@lichtvoll.de>
To: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
tux3@tux3.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org,
Mike Galbraith <umgwanakikbuti@gmail.com>,
Daniel Phillips <daniel@phunq.net>,
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Subject: Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Date: Thu, 30 Apr 2015 11:00:05 +0200
Message-ID: <4154074.ZWLyZCMjhl@merkaba>
In-Reply-To: <20150430002008.GY15810@dastard>
On Thursday, 30 April 2015 at 10:20:08, Dave Chinner wrote:
> On Wed, Apr 29, 2015 at 09:05:26PM +0200, Mike Galbraith wrote:
> > Here's something that _might_ interest xfs folks.
> >
> > cd git (source repository of git itself)
> > make clean
> > echo 3 > /proc/sys/vm/drop_caches
> > time make -j8 test
> >
> > ext4 2m20.721s
> > xfs 6m41.887s <-- ick
> > btrfs 1m32.038s
> > tux3 1m30.262s
> >
> > Testing by Aunt Tilly: mkfs, no fancy switches, mount the thing, test.
>
> TL;DR: Results are *very different* on a 256GB Samsung 840 EVO SSD
> with slightly slower CPUs (E5-4620 @ 2.20GHz), all filesystems
> using defaults:
>
>            real        user         sys
> xfs    3m16.138s   7m8.341s    14m32.462s
> ext4   3m18.045s   7m7.840s    14m32.994s
> btrfs  3m45.149s   7m10.184s   16m30.498s
>
> What you are seeing is physical seek distances impacting read
> performance. XFS does not optimise for minimal physical seek
> distance, and hence is slower than filesystems that do optimise for
> minimal seek distance. This shows up especially well on slow single
> spindles.
>
> XFS is *adequate* for use on slow single drives, but it is
> really designed for best performance on storage hardware that is not
> seek distance sensitive.
>
> IOWS, XFS just hates your disk. Spend $50 and buy a cheap SSD and
> the problem goes away. :)
I am quite surprised that a traditional filesystem created in the age of
rotating media does not like this kind of media, and even seems to
outperform BTRFS on the new non-rotating media available.
But…
> ----
>
> And now in more detail.
>
> It's easy to be fast on empty filesystems. XFS does not aim to be
> fast in such situations - it aims to have consistent performance
> across the life of the filesystem.
… this is quite an important addition.
> Thing is, once you've abused those filesystems for a couple of
> months, the files in ext4, btrfs and tux3 are not going to be laid
> out perfectly on the outer edge of the disk. They'll be spread all
> over the place and so all the filesystems will be seeing large seeks
> on read. The thing is, XFS will have roughly the same performance as
> when the filesystem is empty because the spreading of the allocation
> allows it to maintain better locality and separation and hence
> doesn't fragment free space nearly as badly as the other filesystems.
> Free space fragmentation is what leads to performance degradation in
> filesystems, and all the other filesystem will have degraded to be
> *much worse* than XFS.
I still see hangs from what I take to be free space fragmentation in
BTRFS. My /home on a dual (!) SSD BTRFS setup can basically stall to a
halt once it has reserved all space of the device for chunks. So this

merkaba:~> btrfs fi sh /home
Label: 'home'  uuid: […]
        Total devices 2 FS bytes used 129.48GiB
        devid    1 size 170.00GiB used 146.03GiB path /dev/mapper/msata-home
        devid    2 size 170.00GiB used 146.03GiB path /dev/mapper/sata-home

Btrfs v3.18

merkaba:~> btrfs fi df /home
Data, RAID1: total=142.00GiB, used=126.72GiB
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=4.00GiB, used=2.76GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

is safe, but once I have size 170 GiB, used 170 GiB, then even if there
is enough free space left inside the chunks to allocate from (enough as
in 30-40 GiB), writes can stall to the point that applications on the
desktop freeze and I see hung task messages in the kernel log.
This is still the case up to kernel 4.0. I have seen Chris Mason fix some
write stalls for big Facebook setups; maybe that will help here. But until
this issue is fixed, I do not think BTRFS is fully production ready, unless
you leave a *huge* amount of free space: for 200 GiB of data you want to
write, make a 400 GiB volume.
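The usual workaround for this fully-chunk-allocated state is a filtered
balance, which rewrites mostly-empty chunks and returns their space to the
unallocated pool. A minimal sketch follows; the mount point and the 80%/20%
thresholds are illustrative choices, and the balance command itself is left
commented out so nothing gets rewritten by accident:

```shell
#!/bin/sh
# Sketch: compact nearly-empty btrfs data chunks before the device is
# fully allocated to chunks. Mount point and thresholds are assumptions.
MOUNTPOINT=${1:-/home}

# Pure helper: percentage of the device already allocated to chunks,
# given "used" and "size" values (in GiB) from 'btrfs fi show'.
pct_allocated() {
    awk -v used="$1" -v size="$2" 'BEGIN { printf "%d", used * 100 / size }'
}

# Example figures from the output above: 146.03 GiB of 170 GiB allocated.
if [ "$(pct_allocated 146 170)" -ge 80 ]; then
    echo "chunk allocation high; compacting data chunks under 20% full"
    # The -dusage filter only rewrites data chunks that are at most 20%
    # used, so the balance stays cheap compared to a full rebalance:
    # btrfs balance start -dusage=20 "$MOUNTPOINT"
fi
```

Running a filtered balance from cron before allocation approaches 100% is
one way to keep enough unallocated space for new chunks.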
> Put simply: empty filesystem benchmarking does not show the real
> performance of the filesystem under sustained production workloads.
> Hence benchmarks like this - while interesting from a theoretical
> point of view and are widely used for bragging about who's got the
> fastest - are mostly irrelevant to determining how the filesystem
> will perform in production environments.
>
> We can also look at this algorithm in a different way: take a large
> filesystem (say a few hundred TB) across a few tens of disks in a
> linear concat. ext4, btrfs and tux3 will only hit the first disk in
> the concat, and so go no faster because they are still bound by
> physical seek times. XFS, however, will spread the load across many
> (if not all) of the disks, and so effectively reduce the average
> seek time by the number of disks doing concurrent IO. Then you'll
> see that application level IO concurrency becomes the performance
> limitation, not the physical seek time of the hardware.
Those are the allocation groups. I always wondered how it could be
beneficial to spread allocations across four areas of a single partition
on media where seeks are expensive. Now that makes more sense to me. I
always had the gut feeling that XFS may not be the fastest in all cases,
but that it is one of the filesystems with the most consistent performance
over time; I was just never able to fully explain why.
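Dave's linear-concat point can be put in rough numbers with a toy model:
if a seek-bound workload is spread over N spindles seeking concurrently,
the effective average seek time per IO drops by roughly a factor of N. The
8 ms figure below is a made-up value for a slow single spindle, not a
measurement:

```shell
# Toy model: N disks seeking concurrently serve a seek-bound workload
# roughly N times faster, until application-level IO concurrency becomes
# the limit instead of the hardware.
effective_seek_ms() {
    awk -v seek="$1" -v ndisks="$2" 'BEGIN { printf "%.1f", seek / ndisks }'
}

echo "1 disk:   $(effective_seek_ms 8 1) ms"    # whole load on one spindle
echo "10 disks: $(effective_seek_ms 8 10) ms"   # allocation spread over a concat
```

Real behaviour depends on how much concurrent IO the application actually
issues, as Dave notes, but the scaling direction is the point.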
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
_______________________________________________
Tux3 mailing list
Tux3@phunq.net
http://phunq.net/mailman/listinfo/tux3