From: Dave Chinner <david@fromorbit.com>
To: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>, Jan Kara <jack@suse.cz>,
LKML <linux-kernel@vger.kernel.org>,
xfs@oss.sgi.com, Andy Lutomirski <luto@amacapital.net>,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
Tim Chen <tim.c.chen@linux.intel.com>
Subject: Re: page fault scalability (ext3, ext4, xfs)
Date: Thu, 15 Aug 2013 10:24:36 +1000
Message-ID: <20130815002436.GI6023@dastard>
In-Reply-To: <520BB9EF.5020308@linux.intel.com>
On Wed, Aug 14, 2013 at 10:10:07AM -0700, Dave Hansen wrote:
> We talked a little about this issue in this thread:
>
> http://marc.info/?l=linux-mm&m=137573185419275&w=2
>
> but I figured I'd follow up with a full comparison. ext4 is about 20%
> slower in handling write page faults than ext3. xfs is about 30% slower
> than ext3. I'm running on an 8-socket / 80-core / 160-thread system.
> Test case is this:
>
> https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault3.c
So, it writes a 128MB file sequentially via mmap page faults. This
isn't a page fault benchmark, as such...
>
> It's a little easier to look at the trends as you grow the number of
> processes:
>
> http://www.sr71.net/~dave/intel/page-fault-exts/cmp.html?1=ext3&2=ext4&3=xfs&hide=linear,threads,threads_idle,processes_idle&rollPeriod=16
>
> I recorded and diff'd some perf data (I've still got the raw data if
> anyone wants it), and the main culprit of the ext4/xfs delta looks to be
> spinlock contention (or at least bouncing) in xfs_log_commit_cil().
> This looks to be a known problem:
>
> http://oss.sgi.com/archives/xfs/2013-07/msg00110.html
Yup, apparently the fixes have been pulled into the xfsdev tree, but I
haven't seen the tree updated since they were pulled in, so the
linux-next builds aren't picking them up yet.
> Here's a brief snippet of the ext4->xfs 'perf diff'. Note that things
> like page_fault() go down in the profile because we are doing _fewer_ of
> them, not because it got faster:
>
> > # Baseline    Delta  Shared Object          Symbol
> > # ........  .......  .....................  ..............................................
> > #
> >     22.04%   -4.07%  [kernel.kallsyms]      [k] page_fault
> >      2.93%  +12.49%  [kernel.kallsyms]      [k] _raw_spin_lock
> >      8.21%   -0.58%  page_fault3_processes  [.] testcase
> >      4.87%   -0.34%  [kernel.kallsyms]      [k] __set_page_dirty_buffers
> >      4.07%   -0.58%  [kernel.kallsyms]      [k] mem_cgroup_update_page_stat
> >      4.10%   -0.61%  [kernel.kallsyms]      [k] __block_write_begin
> >      3.69%   -0.57%  [kernel.kallsyms]      [k] find_get_page
>
> It's a bit of a bummer that things are so much less scalable on the
> newer filesystems.
Sorry, what? What filesystems are you comparing here? XFS is
anything but new...

> I expected xfs to do a _lot_ better than it did.

perf diff doesn't tell me anything about how you should expect the
workload to scale.
This workload appears to be a concurrent write workload using
mmap(), so performance is going to be determined by filesystem
configuration, storage capability and the CPU overhead of the
page_mkwrite() path through the filesystem. It's not a page fault
benchmark at all - it's simply a filesystem write bandwidth
benchmark.
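
For reference, the access pattern described above can be sketched
roughly as follows. This is not the actual page_fault3.c, just a
minimal illustration written under assumptions: the function name
dirty_file_via_mmap(), the temp file name template and the "dir"
parameter are all made up for the example. It creates a sparse file,
maps it MAP_SHARED and stores one byte to each page in order, so every
page takes a write fault on first touch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from will-it-scale.
 *
 * Creates an unlinked temporary file of 'len' bytes under 'dir', maps it
 * shared and writable, and stores one byte to the start of every page in
 * ascending order. Returns the number of pages written, or -1 on error.
 */
long dirty_file_via_mmap(const char *dir, size_t len)
{
	char path[4096];
	long page_size = sysconf(_SC_PAGESIZE);
	long touched = 0;
	char *map;
	int fd;

	if (page_size <= 0 || len == 0)
		return -1;

	snprintf(path, sizeof(path), "%s/mmap-write-XXXXXX", dir);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the file goes away once fd is closed */

	/* Extend the file without writing data, leaving it sparse. */
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* The first store to each page of the shared mapping faults. */
	for (size_t off = 0; off < len; off += (size_t)page_size) {
		map[off] = 1;
		touched++;
	}

	munmap(map, len);
	close(fd);
	return touched;
}
```

Calling dirty_file_via_mmap("/mnt/test", 128UL << 20) from several
processes at once, with /mnt/test standing in for whatever mount point
is under test, would approximate the 128MB case. Because the file
starts out sparse, each of those faults has to go through the
filesystem rather than being handled purely in the mm code.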

So, perhaps you could describe the storage you are using, as that
would shed more light on your results. A good summary of what
information is useful to us is here:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

And FWIW, it's no secret that XFS has more per-operation overhead
than ext4 through the write path when it comes to allocation, so
it's no surprise that ext4 is a bit faster on a workload that is
highly dependent on allocation overhead....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com