From: Dave Chinner <david@fromorbit.com>
To: Nick Piggin <npiggin@gmail.com>
Cc: Nick Piggin <npiggin@kernel.dk>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [patch 1/6] fs: icache RCU free inodes
Date: Mon, 15 Nov 2010 12:00:27 +1100
Message-ID: <20101115010027.GC22876@dastard>
In-Reply-To: <AANLkTi=H5ZZ3b5F=Z-PM6FX84FJNzdSh4_HbeeU666ts@mail.gmail.com>

On Fri, Nov 12, 2010 at 12:24:21PM +1100, Nick Piggin wrote:
> On Wed, Nov 10, 2010 at 9:05 AM, Nick Piggin <npiggin@kernel.dk> wrote:
> > On Tue, Nov 09, 2010 at 09:08:17AM -0800, Linus Torvalds wrote:
> >> On Tue, Nov 9, 2010 at 8:21 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> >
> >> > You can see problems using this fancy thing :
> >> >
> >> > - Need to use slab ctor() to not overwrite some sensitive fields of
> >> > reused inodes.
> >> >  (spinlock, next pointer)
> >>
> >> Yes, the downside of using SLAB_DESTROY_BY_RCU is that you really
> >> cannot initialize some fields in the allocation path, because they may
> >> end up being still used while allocating a new (well, re-used) entry.
> >>
> >> However, I think that in the long run we pretty much _have_ to do that
> >> anyway, because the "free each inode separately with RCU" is a real
> >> overhead (Nick reports 10-20% cost). So it just makes my skin crawl to
> >> go that way.
> >
> > This is a creat/unlink loop on a tmpfs filesystem. Any real filesystem
> > is going to be *much* heavier in creat/unlink (so that 10-20% cost would
> > look more like a few %), and any real workload is going to have much
> > less intensive pattern.
> 
> So to get some more precise numbers, on a new kernel, and on a nehalem
> class CPU, creat/unlink busy loop on ramfs (worst possible case for inode
> RCU), then inode RCU costs 12% more time.
> 
> If we go to ext4 over ramdisk, it's 4.2% slower. Btrfs is 4.3% slower, XFS
> is about 4.9% slower.

That is actually significant, because current XFS performance using
delayed logging for pure metadata operations is not that far off
ramdisk results.  Indeed, the simple test:

        int i = 0;

        while (i++ < 1000 * 1000) {
                /* mode is octal: 0777, not decimal 777 */
                int fd = open("foo", O_CREAT|O_RDWR, 0777);
                unlink("foo");
                close(fd);
        }

Running 8 instances of the above on XFS, each in its own directory,
on a single SATA drive with delayed logging enabled, using my
current working XFS tree (which includes the SLAB_DESTROY_BY_RCU
inode cache and XFS inode cache, plus numerous other XFS scalability
enhancements), currently runs at ~250k files/s. It took ~33s for 8 of
those loops to complete in parallel, and the run was 100% CPU bound...
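
For reference, here is a self-contained sketch of the loop above with
error checking, plus the arithmetic behind the files/s figure. The
names create_unlink_loop() and files_per_sec() are illustrative and do
not come from the original test. The sketch does not start the 8
parallel instances or do any timing; it only shows one loop and how an
aggregate rate could be derived from a measured wall-clock time.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original test): create and
 * unlink `path` `iterations` times. Returns the number of iterations
 * that completed without error.
 */
long create_unlink_loop(const char *path, long iterations)
{
        long done = 0;

        for (long i = 0; i < iterations; i++) {
                /* mode is octal; O_CREAT requires the third argument */
                int fd = open(path, O_CREAT | O_RDWR, 0777);

                if (fd < 0)
                        break;
                if (unlink(path) < 0) {
                        close(fd);
                        break;
                }
                close(fd);
                done++;
        }
        return done;
}

/*
 * Hypothetical helper: aggregate create/unlink rate for `instances`
 * loops of `iterations` each that all finish within `seconds`.
 */
double files_per_sec(long instances, long iterations, double seconds)
{
        return (double)instances * (double)iterations / seconds;
}
```

With 8 instances of 1000 * 1000 iterations finishing in ~33s,
files_per_sec(8, 1000 * 1000, 33.0) comes to roughly 242k files/s,
which matches the ~250k files/s quoted above.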

> Remember, this is on a ramdisk that's _hitting the CPU's L3 if not L2_
> cache. A real disk, even a fast SSD, is going to do IO far slower.

The amount of IO done during the above test?  A single log write -
one IO. Hence it isn't going to be any faster on a RAM disk, an SSD, a
large RAID array, etc because it is CPU bound, not IO bound. IOWs,
that 5% difference in CPU usage is significant for XFS regardless of
the storage....

> And also remember that real workloads will not approach creat/unlink busy
> loop behaviour of creating and destroying 800K files/s.

Perhaps not a local workload, but I expect to see things like
fileservers getting hit with these sorts of loads (i.e. hundreds of
thousands of create/unlinks a second). Especially as XFS now has
the journal scalability to make this possible...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
