From: Dave Hansen <dave.hansen@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Andi Kleen <ak@linux.intel.com>, Jan Kara <jack@suse.cz>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	xfs@oss.sgi.com, Andy Lutomirski <luto@amacapital.net>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	Tim Chen <tim.c.chen@linux.intel.com>
Subject: Re: page fault scalability (ext3, ext4, xfs)
Date: Thu, 15 Aug 2013 08:36:40 -0700
Message-ID: <520CF588.7090800@intel.com>
In-Reply-To: <20130815042930.GO6023@dastard>

On 08/14/2013 09:29 PM, Dave Chinner wrote:
> On Wed, Aug 14, 2013 at 07:24:01PM -0700, Andi Kleen wrote:
>>> And FWIW, it's no secret that XFS has more per-operation overhead
>>> than ext4 through the write path when it comes to allocation, so
>>> it's no surprise that on a workload that is highly dependent on
>>> allocation overhead that ext4 is a bit faster....
>>
>> This cannot explain a worse scaling curve though?
> 
> The scaling curve is pretty much identical. The difference in
> performance will be the overhead of timestamp updates through
> the transaction subsystems of the filesystems.

I guess how you read it is in the eye of the beholder.  I see xfs being
slower than ext3 or ext4.  Nobody sits and does this in a loop in real
life (it's a microbenchmark), but I'd be willing to bet that this is a
real *component* of real-life workloads.  It's a component where I think
it's pretty clear xfs and ext4 lag behind ext3, and it _looks_ to me
like it gets worse on larger systems.

Maybe that's because of design decisions in the filesystem, or because
of the enhanced integrity guarantees that xfs/ext4 provide.

>> w-i-s is all about scaling.
> 
> Sure, but scaling *what*? It's spending all its time in the
> filesystem through the .page_mkwrite path. It's not a page fault
> scaling test - it's a filesystem overwrite test that uses mmap.

will-it-scale tests a bunch of different scenarios.  This is just one of
at least 6 tests that we do which beat on the page fault path.  It was
the only one of those 6 that showed any kind of bottleneck being in the
fs code.

> Indeed, I bet if you replace the mmap() with a write(fd, buf, 4096)
> loop, you'd get almost identical behaviour from the filesystems.

In a quick 60-second test: xfs went from ~70M writes/sec (doing faults)
to ~18M/sec (using write()).  ext4 went down to 0.5M/sec.  I didn't take
the mmap()/munmap() out:

                lseek(fd, 0, SEEK_SET);
                for (i = 0; i < MEMSIZE; i += pgsize) {
                        write(fd, xxx, 4096);
                        //c[i] = 0;
                        (*iterations)++;
                }
