* Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC])
       [not found] <1372657476-9241-1-git-send-email-david@fromorbit.com>
@ 2013-07-08 12:44 ` Dave Chinner
  2013-07-08 13:59   ` Jan Kara
                     ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Dave Chinner @ 2013-07-08 12:44 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel

[cc fsdevel because after all the XFS stuff I did some testing on
mmotm w.r.t per-node LRU lock contention avoidance, and also some
scalability tests against ext4 and btrfs for comparison on some new
hardware. That bit ain't pretty. ]

On Mon, Jul 01, 2013 at 03:44:36PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Note: This is an RFC right now - it'll need to be broken up into
> several patches for final submission.
> 
> The CIL insertion during transaction commit currently does multiple
> passes across the transaction objects and requires multiple memory
> allocations per object that is to be inserted into the CIL. It is
> quite inefficient, and as such xfs_log_commit_cil() and its
> children show up quite highly in profiles under metadata
> modification intensive workloads.
> 
> The current insertion tries to minimise the number of times the
> xc_cil_lock is grabbed and the hold times via a couple of methods:
> 
> 	1. an initial loop across the transaction items outside the
> 	lock to allocate log vectors, buffers and copy the data into
> 	them.
> 	2. a second pass across the log vectors that then inserts
> 	them into the CIL, modifies the CIL state and frees the old
> 	vectors.
> 
> This is somewhat inefficient. While it minimises lock grabs, the
> hold time is still quite high because we are freeing objects with
> the spinlock held and so the hold times are much higher than they
> need to be.
> 
> Optimisations that can be made:
.....
> 
> The result is that my standard fsmark benchmark (8-way, 50m files)
> on my standard test VM (8-way, 4GB RAM, 4xSSD in RAID0, 100TB fs)
> gives the following results with a xfs-oss tree. No CRCs:
> 
>                 vanilla         patched         Difference
> create  (time)  483s            435s            -10.0%  (faster)
>         (rate)  109k+/-6k       122k+/-7k       +11.9%  (faster)
> 
> walk            339s            335s            (noise)
>      (sys cpu)  1134s           1135s           (noise)
> 
> unlink          692s            645s             -6.8%  (faster)
> 
> So it's significantly faster than the current code, and lock_stat
> reports lower contention on the xc_cil_lock, too. So, big win here.
> 
> With CRCs:
> 
>                 vanilla         patched         Difference
> create  (time)  510s            460s             -9.8%  (faster)
>         (rate)  105k+/-5.4k     117k+/-5k       +11.4%  (faster)
> 
> walk            494s            486s            (noise)
>      (sys cpu)  1324s           1290s           (noise)
> 
> unlink          959s            889s             -7.3%  (faster)
> 
> Gains are of the same order, with walk and unlink still affected by
> VFS LRU lock contention. IOWs, with these changes, filesystems with
> CRCs enabled will still be faster than the old non-CRC kernels...
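
To make the shape of the change concrete, here's a sketch of the
pattern the optimisation aims for - this is not the actual patch; the
structures and helpers here are invented for illustration only:

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/*
 * Sketch: keep the xc_cil_lock critical section down to list
 * manipulation. Allocation and formatting happen before the lock is
 * taken, and freeing of the replaced vectors is deferred until after
 * it is dropped, so the lock hold time stays short.
 */
struct sketch_lv {
        struct list_head        lv_list;
        void                    *lv_buf;
};

static void cil_insert_sketch(spinlock_t *xc_cil_lock,
                              struct list_head *xc_cil,
                              struct list_head *new_lvs,
                              struct list_head *old_lvs)
{
        struct sketch_lv *lv, *n;

        /* 1. new_lvs was allocated and formatted with no locks held */

        /* 2. short critical section: splice the new vectors onto the CIL */
        spin_lock(xc_cil_lock);
        list_splice_tail_init(new_lvs, xc_cil);
        spin_unlock(xc_cil_lock);

        /* 3. free the replaced vectors only after the lock is dropped */
        list_for_each_entry_safe(lv, n, old_lvs, lv_list) {
                list_del(&lv->lv_list);
                kfree(lv->lv_buf);
                kfree(lv);
        }
}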

FWIW, I have new hardware here that I'll be using for benchmarking
like this, so here's a quick baseline comparison using the same
8p/4GB RAM VM (just migrated across) and same SSD based storage
(physically moved) and 100TB filesystem. The disks are behind a
faster RAID controller w/ 1GB of BBWC, so random read and write IOPS
are higher, and hence traversal times will be lower due to the
reduced IO latency.

Create times
		  wall time(s)		     rate (files/s)
		vanilla	 patched   diff	   vanilla  patched    diff
Old system	  483	  435	 -10.0%	   109k+-6k 122k+-7k +11.9%
New system	  378	  342	  -9.5%	   143k+-9k 158k+-8k +10.5%
diff		-21.7%	-21.4%		    +31.2%   +29.5%

Walk times
		  wall time(s)
		vanilla	 patched   diff
Old system	  339	  335	 (noise)
New system	  194	  197	 (noise)
diff		-42.7%	-41.2%

Unlink times
		  wall time(s)
		vanilla	 patched   diff
Old system	  692	  645	  -6.8%
New system	  457	  405	 -11.4%
diff		-34.0%  -37.2%

So, overall, the new system is 20-40% faster than the old one on a
comparative test, but I have a few more cores and a lot more memory
to play with. So, a 16-way test on the same machine with the VM
expanded to 16p/16GB RAM and 4 fake NUMA nodes follows:

New system, patched kernel:

Threads	    create		walk		unlink
	time(s)	 rate		time(s)		time(s)
8	  342	158k+-8k	  197		  405
16	  222	266k+-32k	  170		  295
diff	-35.1%	 +68.4%		-13.7%		-27.2%

Create rates are much more variable because the memory reclaim
behaviour appears to be very harsh, pulling 4-6 million inodes out
of memory every 10s or so and thrashing on the LRU locks, and then
doing nothing until another large step occurs.

Walk rates improve, but not much because of lock contention. I added
8 cpu cores to the workload, and I'm burning at least 4 of those
cores on the inode LRU lock.

-  30.61%  [kernel]  [k] __ticket_spin_trylock
   - __ticket_spin_trylock
      - 65.33% _raw_spin_lock
         + 88.19% inode_add_lru
         + 7.31% dentry_lru_del
         + 1.07% shrink_dentry_list
         + 0.90% dput
         + 0.83% inode_sb_list_add
         + 0.59% evict
      + 27.79% do_raw_spin_lock
      + 4.03% do_raw_spin_trylock
      + 2.85% _raw_spin_trylock

The current mmotm (and hence probably 3.11) has the new per-node LRU
code in it, so this variance and contention should go away very
soon.
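
For reference, the basic shape of the per-node LRU idea is roughly
the following - a simplified sketch only, not the actual mmotm
list_lru implementation (which also has counters, walkers and
isolation callbacks):

#include <linux/list.h>
#include <linux/mm.h>
#include <linux/nodemask.h>
#include <linux/spinlock.h>

/*
 * One list and one lock per NUMA node, so threads running on
 * different nodes stop contending on a single global LRU lock.
 */
struct pernode_lru_node {
        spinlock_t              lock;
        struct list_head        list;
        long                    nr_items;
};

struct pernode_lru {
        struct pernode_lru_node node[MAX_NUMNODES];
};

static bool pernode_lru_add(struct pernode_lru *lru, struct list_head *item)
{
        /* place the item on the node its memory was allocated from */
        int nid = page_to_nid(virt_to_page(item));
        struct pernode_lru_node *nlru = &lru->node[nid];
        bool added = false;

        spin_lock(&nlru->lock);
        if (list_empty(item)) {         /* not already on an LRU */
                list_add_tail(item, &nlru->list);
                nlru->nr_items++;
                added = true;
        }
        spin_unlock(&nlru->lock);
        return added;
}

Note that the node is picked from where the object's memory lives,
not from where the thread adding or removing it runs - which is also
why the cross-node reclaim behaviour mentioned further down shows up.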

Unlinks go lots faster because they don't cause inode LRU lock
contention, but we are still a long way from linear scalability
from 8- to 16-way.

FWIW, the mmotm kernel (which has a fair bit of debug enabled, so
not quite comparative) doesn't have any LRU lock contention to speak
of. For create:

-   7.81%  [kernel]  [k] __ticket_spin_trylock
   - __ticket_spin_trylock
      - 70.98% _raw_spin_lock
         + 97.55% xfs_log_commit_cil
         + 0.93% __d_instantiate
         + 0.58% inode_sb_list_add
      - 29.02% do_raw_spin_lock
         - _raw_spin_lock
            + 41.14% xfs_log_commit_cil
            + 8.29% _xfs_buf_find
            + 8.00% xfs_iflush_cluster

And the walk:

-  26.37%  [kernel]  [k] __ticket_spin_trylock
   - __ticket_spin_trylock
      - 49.10% _raw_spin_lock
         - 50.65% evict
              dispose_list
              prune_icache_sb
              super_cache_scan
            + shrink_slab
         - 26.99% list_lru_add
            + 89.01% inode_add_lru
            + 10.95% dput
         + 7.03% __remove_inode_hash
      - 40.65% do_raw_spin_lock
         - _raw_spin_lock
            - 41.96% evict
                 dispose_list
                 prune_icache_sb
                 super_cache_scan
               + shrink_slab
            - 13.55% list_lru_add
                 84.33% inode_add_lru
                    iput
                    d_kill
                    shrink_dentry_list
                    prune_dcache_sb
                    super_cache_scan
                    shrink_slab
                 15.01% dput
                 0.66% xfs_buf_rele
            + 10.10% __remove_inode_hash                                                                                                                               
                    system_call_fastpath

There's quite a different pattern of contention - it has moved
inward to evict which implies the inode_sb_list_lock is the next
obvious point of contention. I have patches in the works for that.
Also, the inode_hash_lock is causing some contention, even though we
fake inode hashing. I have a patch to fix that for XFS as well.

I also note an interesting behaviour of the per-node inode LRUs -
the contention is coming from the dentry shrinker on one node
freeing inodes allocated on a different node during reclaim. There's
scope for improvement there.

But here's the interesting part:

Kernel	    create		walk		unlink
	time(s)	 rate		time(s)		time(s)
3.10-cil  222	266k+-32k	  170		  295
mmotm	  251	222k+-16k	  128		  356

Even with all the debug enabled, the overall walk time dropped by
25% to 128s. So performance in this workload has substantially
improved because of the per-node LRUs and variability is also down
as well, as predicted. Once I add all the tweaks I have in the
3.10-cil tree to mmotm, I expect significant improvements to create
and unlink performance as well...

So, let's look at ext4 vs btrfs vs XFS at 16-way (this is on the
3.10-cil kernel I've been testing XFS on):

	    create		 walk		unlink
	 time(s)   rate		time(s)		time(s)
xfs	  222	266k+-32k	  170		  295
ext4	  978	 54k+- 2k	  325		 2053
btrfs	 1223	 47k+- 8k	  366		12000(*)

(*) Estimate based on the first 4.8 million inodes taking 18.5
minutes to remove.

Basically, neither btrfs nor ext4 has any concurrency scaling to
demonstrate, and unlinks on btrfs are just plain woeful.

ext4 create rate is limited by the extent cache LRU locking:

-  41.81%  [kernel]  [k] __ticket_spin_trylock
   - __ticket_spin_trylock
      - 60.67% _raw_spin_lock
         - 99.60% ext4_es_lru_add
            + 99.63% ext4_es_lookup_extent
      - 39.15% do_raw_spin_lock
         - _raw_spin_lock
            + 95.38% ext4_es_lru_add
              0.51% insert_inode_locked
                 __ext4_new_inode
-   16.20%  [kernel]  [k] native_read_tsc
   - native_read_tsc
      - 60.91% delay_tsc
           __delay
           do_raw_spin_lock
         + _raw_spin_lock
      - 39.09% __delay
           do_raw_spin_lock
         + _raw_spin_lock

Ext4 unlink is serialised on orphan list processing:

-  12.67%  [kernel]  [k] __mutex_unlock_slowpath
   - __mutex_unlock_slowpath
      - 99.95% mutex_unlock
         + 54.37% ext4_orphan_del
         + 43.26% ext4_orphan_add
+   5.33%  [kernel]  [k] __mutex_lock_slowpath


btrfs create has tree lock problems:

-  21.68%  [kernel]  [k] __write_lock_failed
   - __write_lock_failed
      - 99.93% do_raw_write_lock
         - _raw_write_lock
            - 79.04% btrfs_try_tree_write_lock
               - btrfs_search_slot
                  - 97.48% btrfs_insert_empty_items
                       99.82% btrfs_new_inode
                  + 2.52% btrfs_lookup_inode
            - 20.37% btrfs_tree_lock
               - 99.38% btrfs_search_slot
                    99.92% btrfs_insert_empty_items
                 0.52% btrfs_lock_root_node
                    btrfs_search_slot
                    btrfs_insert_empty_items
-  21.24%  [kernel]  [k] _raw_spin_unlock_irqrestore
   - _raw_spin_unlock_irqrestore
      - 61.22% prepare_to_wait
         + 61.52% btrfs_tree_lock
         + 32.31% btrfs_tree_read_lock
           6.17% reserve_metadata_bytes
              btrfs_block_rsv_add

btrfs walk phase hammers the inode_hash_lock:

-  18.45%  [kernel]  [k] __ticket_spin_trylock
   - __ticket_spin_trylock
      - 47.38% _raw_spin_lock
         + 42.99% iget5_locked
         + 15.17% __remove_inode_hash
         + 13.77% btrfs_get_delayed_node
         + 11.27% inode_tree_add
         + 9.32% btrfs_destroy_inode
.....
      - 46.77% do_raw_spin_lock
         - _raw_spin_lock
            + 30.51% iget5_locked
            + 11.40% __remove_inode_hash
            + 11.38% btrfs_get_delayed_node
            + 9.45% inode_tree_add
            + 7.28% btrfs_destroy_inode
.....

I have a RCU inode hash lookup patch floating around somewhere if
someone wants it...
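
The lookup side of such a conversion would look something like the
sketch below - illustrative only, not the actual patch. It assumes
the hash insert/remove paths have been converted to
hlist_add_head_rcu()/hlist_del_init_rcu() and that inode freeing is
RCU-delayed, neither of which is shown here:

#include <linux/fs.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/spinlock.h>

/*
 * Walk the hash chain under rcu_read_lock() instead of taking the
 * global inode_hash_lock; only the matching inode's i_lock is taken
 * to validate the match and grab a reference.
 */
static struct inode *ilookup_rcu_sketch(struct super_block *sb,
                                        struct hlist_head *head,
                                        unsigned long ino)
{
        struct inode *inode;

        rcu_read_lock();
        hlist_for_each_entry_rcu(inode, head, i_hash) {
                if (inode->i_ino != ino || inode->i_sb != sb)
                        continue;

                spin_lock(&inode->i_lock);
                if (inode->i_ino == ino && inode->i_sb == sb &&
                    !(inode->i_state & (I_FREEING | I_WILL_FREE))) {
                        __iget(inode);          /* grab a reference */
                        spin_unlock(&inode->i_lock);
                        rcu_read_unlock();
                        return inode;
                }
                spin_unlock(&inode->i_lock);
        }
        rcu_read_unlock();
        return NULL;
}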

And, well, the less said about btrfs unlinks the better:

+  37.14%  [kernel]  [k] _raw_spin_unlock_irqrestore
+  33.18%  [kernel]  [k] __write_lock_failed
+  17.96%  [kernel]  [k] __read_lock_failed
+   1.35%  [kernel]  [k] _raw_spin_unlock_irq
+   0.82%  [kernel]  [k] __do_softirq
+   0.53%  [kernel]  [k] btrfs_tree_lock
+   0.41%  [kernel]  [k] btrfs_tree_read_lock
+   0.41%  [kernel]  [k] do_raw_read_lock
+   0.39%  [kernel]  [k] do_raw_write_lock
+   0.38%  [kernel]  [k] btrfs_clear_lock_blocking_rw
+   0.37%  [kernel]  [k] free_extent_buffer
+   0.36%  [kernel]  [k] btrfs_tree_read_unlock
+   0.32%  [kernel]  [k] do_raw_write_unlock

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC])
  2013-07-08 12:44 ` Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC]) Dave Chinner
@ 2013-07-08 13:59   ` Jan Kara
  2013-07-08 15:22     ` Marco Stornelli
  2013-07-09  0:43   ` Zheng Liu
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2013-07-08 13:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, xfs

On Mon 08-07-13 22:44:53, Dave Chinner wrote:
<snipped some nice XFS results ;)>
> So, lets look at ext4 vs btrfs vs XFS at 16-way (this is on the
> 3.10-cil kernel I've been testing XFS on):
> 
> 	    create		 walk		unlink
> 	 time(s)   rate		time(s)		time(s)
> xfs	  222	266k+-32k	  170		  295
> ext4	  978	 54k+- 2k	  325		 2053
> btrfs	 1223	 47k+- 8k	  366		12000(*)
> 
> (*) Estimate based on a removal rate of 18.5 minutes for the first
> 4.8 million inodes.
> 
> Basically, neither btrfs or ext4 have any concurrency scaling to
> demonstrate, and unlinks on btrfs a just plain woeful.
  Thanks for posting the numbers. There isn't anyone seriously testing ext4
SMP scalability AFAIK so it's not surprising it sucks.
 
> ext4 create rate is limited by the extent cache LRU locking:
> 
> -  41.81%  [kernel]  [k] __ticket_spin_trylock
>    - __ticket_spin_trylock
>       - 60.67% _raw_spin_lock
>          - 99.60% ext4_es_lru_add
>             + 99.63% ext4_es_lookup_extent
  At least this should improve with the patches in 3.11-rc1.

>       - 39.15% do_raw_spin_lock
>          - _raw_spin_lock
>             + 95.38% ext4_es_lru_add
>               0.51% insert_inode_locked
>                  __ext4_new_inode
> -   16.20%  [kernel]  [k] native_read_tsc
>    - native_read_tsc
>       - 60.91% delay_tsc
>            __delay
>            do_raw_spin_lock
>          + _raw_spin_lock
>       - 39.09% __delay
>            do_raw_spin_lock
>          + _raw_spin_lock
> 
> Ext4 unlink is serialised on orphan list processing:
> 
> -  12.67%  [kernel]  [k] __mutex_unlock_slowpath
>    - __mutex_unlock_slowpath
>       - 99.95% mutex_unlock
>          + 54.37% ext4_orphan_del
>          + 43.26% ext4_orphan_add
> +   5.33%  [kernel]  [k] __mutex_lock_slowpath
  ext4 can do better here I'm sure. The current solution is pretty
simplistic. At least we could use a spinlock for the in-memory orphan
list and atomic ops for the on-disk one (as it's only a singly linked
list). But I'm not sure I'll find time to look into this in the
foreseeable future...
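
The in-memory side of that would be something as simple as the sketch
below - the names are made up, and the on-disk (singly linked) orphan
chain update, which is the part that still needs ordering, is not
shown:

#include <linux/list.h>
#include <linux/spinlock.h>

/*
 * Protect the in-memory orphan list with its own spinlock so that
 * concurrent unlinks stop serialising on a mutex.
 */
struct orphan_list {
        spinlock_t              lock;
        struct list_head        head;
};

static void orphan_add_mem(struct orphan_list *ol, struct list_head *entry)
{
        spin_lock(&ol->lock);
        list_add(entry, &ol->head);
        spin_unlock(&ol->lock);
}

static void orphan_del_mem(struct orphan_list *ol, struct list_head *entry)
{
        spin_lock(&ol->lock);
        list_del_init(entry);
        spin_unlock(&ol->lock);
}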

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC])
  2013-07-08 13:59   ` Jan Kara
@ 2013-07-08 15:22     ` Marco Stornelli
  2013-07-08 15:38       ` Jan Kara
  2013-07-09  0:56       ` Theodore Ts'o
  0 siblings, 2 replies; 12+ messages in thread
From: Marco Stornelli @ 2013-07-08 15:22 UTC (permalink / raw)
  To: Jan Kara; +Cc: Dave Chinner, xfs, linux-fsdevel

Il 08/07/2013 15:59, Jan Kara ha scritto:
> On Mon 08-07-13 22:44:53, Dave Chinner wrote:
> <snipped some nice XFS results ;)>
>> So, lets look at ext4 vs btrfs vs XFS at 16-way (this is on the
>> 3.10-cil kernel I've been testing XFS on):
>>
>> 	    create		 walk		unlink
>> 	 time(s)   rate		time(s)		time(s)
>> xfs	  222	266k+-32k	  170		  295
>> ext4	  978	 54k+- 2k	  325		 2053
>> btrfs	 1223	 47k+- 8k	  366		12000(*)
>>
>> (*) Estimate based on a removal rate of 18.5 minutes for the first
>> 4.8 million inodes.
>>
>> Basically, neither btrfs or ext4 have any concurrency scaling to
>> demonstrate, and unlinks on btrfs a just plain woeful.
>    Thanks for posting the numbers. There isn't anyone seriously testing ext4
> SMP scalability AFAIK so it's not surprising it sucks.

Funny, if I remember correctly the Google guys switched Android from 
yaffs2 to ext4 due to its superiority on SMP :)

Marco

* Re: Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC])
  2013-07-08 15:22     ` Marco Stornelli
@ 2013-07-08 15:38       ` Jan Kara
  2013-07-09  0:15         ` Dave Chinner
  2013-07-09  0:56       ` Theodore Ts'o
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Kara @ 2013-07-08 15:38 UTC (permalink / raw)
  To: Marco Stornelli; +Cc: linux-fsdevel, Jan Kara, xfs

On Mon 08-07-13 17:22:43, Marco Stornelli wrote:
> Il 08/07/2013 15:59, Jan Kara ha scritto:
> >On Mon 08-07-13 22:44:53, Dave Chinner wrote:
> ><snipped some nice XFS results ;)>
> >>So, lets look at ext4 vs btrfs vs XFS at 16-way (this is on the
> >>3.10-cil kernel I've been testing XFS on):
> >>
> >>	    create		 walk		unlink
> >>	 time(s)   rate		time(s)		time(s)
> >>xfs	  222	266k+-32k	  170		  295
> >>ext4	  978	 54k+- 2k	  325		 2053
> >>btrfs	 1223	 47k+- 8k	  366		12000(*)
> >>
> >>(*) Estimate based on a removal rate of 18.5 minutes for the first
> >>4.8 million inodes.
> >>
> >>Basically, neither btrfs or ext4 have any concurrency scaling to
> >>demonstrate, and unlinks on btrfs a just plain woeful.
> >   Thanks for posting the numbers. There isn't anyone seriously testing ext4
> >SMP scalability AFAIK so it's not surprising it sucks.
> 
> Funny, if I well remember Google guys switched android from yaffs2
> to ext4 due to its superiority on SMP :)
  Well, there's SMP and SMP. Ext4 is perfectly OK for the desktop kind of
SMP - that's what lots of people use. But heavy IO load with 16 CPUs on
enterprise grade storage, where CPU (and not IO) bottlenecks are actually
visible, is not so easily available, and so we haven't done serious
performance work in that direction...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC])
  2013-07-08 15:38       ` Jan Kara
@ 2013-07-09  0:15         ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2013-07-09  0:15 UTC (permalink / raw)
  To: Jan Kara; +Cc: Marco Stornelli, xfs, linux-fsdevel

On Mon, Jul 08, 2013 at 05:38:07PM +0200, Jan Kara wrote:
> On Mon 08-07-13 17:22:43, Marco Stornelli wrote:
> > Il 08/07/2013 15:59, Jan Kara ha scritto:
> > >On Mon 08-07-13 22:44:53, Dave Chinner wrote:
> > ><snipped some nice XFS results ;)>
> > >>So, lets look at ext4 vs btrfs vs XFS at 16-way (this is on the
> > >>3.10-cil kernel I've been testing XFS on):
> > >>
> > >>	    create		 walk		unlink
> > >>	 time(s)   rate		time(s)		time(s)
> > >>xfs	  222	266k+-32k	  170		  295
> > >>ext4	  978	 54k+- 2k	  325		 2053
> > >>btrfs	 1223	 47k+- 8k	  366		12000(*)
> > >>
> > >>(*) Estimate based on a removal rate of 18.5 minutes for the first
> > >>4.8 million inodes.
> > >>
> > >>Basically, neither btrfs or ext4 have any concurrency scaling to
> > >>demonstrate, and unlinks on btrfs a just plain woeful.
> > >   Thanks for posting the numbers. There isn't anyone seriously testing ext4
> > >SMP scalability AFAIK so it's not surprising it sucks.

It's worse than that - nobody picked up on review that taking a
global lock on every extent lookup might be a scalability issue?
Scalability is not an afterthought anymore - new filesystem and
kernel features need to be designed from the ground up with this in
mind. We're living in a world where even phones have 4 CPU cores....

> > Funny, if I well remember Google guys switched android from yaffs2
> > to ext4 due to its superiority on SMP :)
>   Well, there's SMP and SMP. Ext4 is perfectly OK for desktop kind of SMP -

Barely. It tops out in parallelism at 2-4 threads depending
on the metadata operations being done.

> that's what lots of people use. When we speak of heavy IO load with 16 CPUs
> on enterprise grade storage so that CPU (and not IO) bottlenecks are actually
> visible, that's not so easily available and so we don't have serious
> performance work in that direction...

I'm not testing with "enterprise grade" storage. The filesystem I'm
testing on is hosted on less than $300 of SSDs.  The "enterprise"
RAID controller they sit behind is actually an IOPS bottleneck, not
an improvement.

My 2.5 year old desktop has a pair of cheap, no-name SandForce SSDs
in RAID0 and they can do at least 2x the read and write IOPS of the
new hardware I just tested. And yes, I run XFS on my desktop.

And then there's my 3 month old laptop, which has a recent SATA SSD
in it. It also has 8 threads, but twice the memory and about 1.5x
the IOPS and bandwidth of my desktop machine.

The bottlenecks showing up in ext4 and btrfs don't magically show up
at 16 threads - they are present and reproducible at 2-4 threads.
Indeed, I didn't bother testing at 32 threads - even though my new
server can do that - because that will just hammer the same
bottlenecks even harder.  Fundamentally, I'm not testing anything
you can't test on a $2000 desktop PC....

FWIW, the SSDs are making ext4 and btrfs look good in these
workloads. XFS is creating >250k files/s doing about 1500 IOPS. ext4
is making 50k files/s at 23,000 IOPS. btrfs has peaks every 30s of
over 30,000 IOPS. Which filesystem is going to scale better on
desktops with spinning rust?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC])
  2013-07-08 12:44 ` Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC]) Dave Chinner
  2013-07-08 13:59   ` Jan Kara
@ 2013-07-09  0:43   ` Zheng Liu
  2013-07-09  1:23     ` Dave Chinner
  2013-07-09  1:15   ` Chris Mason
  2013-07-09  8:26   ` Dave Chinner
  3 siblings, 1 reply; 12+ messages in thread
From: Zheng Liu @ 2013-07-09  0:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, linux-fsdevel

Hi Dave,

On Mon, Jul 08, 2013 at 10:44:53PM +1000, Dave Chinner wrote:
[...]
> So, lets look at ext4 vs btrfs vs XFS at 16-way (this is on the
> 3.10-cil kernel I've been testing XFS on):
> 
> 	    create		 walk		unlink
> 	 time(s)   rate		time(s)		time(s)
> xfs	  222	266k+-32k	  170		  295
> ext4	  978	 54k+- 2k	  325		 2053
> btrfs	 1223	 47k+- 8k	  366		12000(*)
> 
> (*) Estimate based on a removal rate of 18.5 minutes for the first
> 4.8 million inodes.
> 
> Basically, neither btrfs or ext4 have any concurrency scaling to
> demonstrate, and unlinks on btrfs a just plain woeful.
> 
> ext4 create rate is limited by the extent cache LRU locking:

I have a patch to fix this problem, and it has been applied to
3.11-rc1.  The patch is (d3922a77):
  ext4: improve extent cache shrink mechanism to avoid to burn CPU time

I would really appreciate it if you could run your tests again against
this patch.  I just want to make sure that this problem has been fixed.
At least in my own testing it looks fine.

Thanks,
                                                - Zheng

> 
> -  41.81%  [kernel]  [k] __ticket_spin_trylock
>    - __ticket_spin_trylock
>       - 60.67% _raw_spin_lock
>          - 99.60% ext4_es_lru_add
>             + 99.63% ext4_es_lookup_extent
>       - 39.15% do_raw_spin_lock
>          - _raw_spin_lock
>             + 95.38% ext4_es_lru_add
>               0.51% insert_inode_locked
>                  __ext4_new_inode
> -   16.20%  [kernel]  [k] native_read_tsc
>    - native_read_tsc
>       - 60.91% delay_tsc
>            __delay
>            do_raw_spin_lock
>          + _raw_spin_lock
>       - 39.09% __delay
>            do_raw_spin_lock
>          + _raw_spin_lock
> 
> Ext4 unlink is serialised on orphan list processing:
> 
> -  12.67%  [kernel]  [k] __mutex_unlock_slowpath
>    - __mutex_unlock_slowpath
>       - 99.95% mutex_unlock
>          + 54.37% ext4_orphan_del
>          + 43.26% ext4_orphan_add
> +   5.33%  [kernel]  [k] __mutex_lock_slowpath
> 
> 
> btrfs create has tree lock problems:
> 
> -  21.68%  [kernel]  [k] __write_lock_failed
>    - __write_lock_failed
>       - 99.93% do_raw_write_lock
>          - _raw_write_lock
>             - 79.04% btrfs_try_tree_write_lock
>                - btrfs_search_slot
>                   - 97.48% btrfs_insert_empty_items
>                        99.82% btrfs_new_inode
>                   + 2.52% btrfs_lookup_inode
>             - 20.37% btrfs_tree_lock
>                - 99.38% btrfs_search_slot
>                     99.92% btrfs_insert_empty_items
>                  0.52% btrfs_lock_root_node
>                     btrfs_search_slot
>                     btrfs_insert_empty_items
> -  21.24%  [kernel]  [k] _raw_spin_unlock_irqrestore
>    - _raw_spin_unlock_irqrestore
>       - 61.22% prepare_to_wait
>          + 61.52% btrfs_tree_lock
>          + 32.31% btrfs_tree_read_lock
>            6.17% reserve_metadata_bytes
>               btrfs_block_rsv_add
> 
> btrfs walk phase hammers the inode_hash_lock:
> 
> -  18.45%  [kernel]  [k] __ticket_spin_trylock
>    - __ticket_spin_trylock
>       - 47.38% _raw_spin_lock
>          + 42.99% iget5_locked
>          + 15.17% __remove_inode_hash
>          + 13.77% btrfs_get_delayed_node
>          + 11.27% inode_tree_add
>          + 9.32% btrfs_destroy_inode
> .....
>       - 46.77% do_raw_spin_lock
>          - _raw_spin_lock
>             + 30.51% iget5_locked
>             + 11.40% __remove_inode_hash
>             + 11.38% btrfs_get_delayed_node
>             + 9.45% inode_tree_add
>             + 7.28% btrfs_destroy_inode
> .....
> 
> I have a RCU inode hash lookup patch floating around somewhere if
> someone wants it...
> 
> And, well, the less said about btrfs unlinks the better:
> 
> +  37.14%  [kernel]  [k] _raw_spin_unlock_irqrestore
> +  33.18%  [kernel]  [k] __write_lock_failed
> +  17.96%  [kernel]  [k] __read_lock_failed
> +   1.35%  [kernel]  [k] _raw_spin_unlock_irq
> +   0.82%  [kernel]  [k] __do_softirq
> +   0.53%  [kernel]  [k] btrfs_tree_lock
> +   0.41%  [kernel]  [k] btrfs_tree_read_lock
> +   0.41%  [kernel]  [k] do_raw_read_lock
> +   0.39%  [kernel]  [k] do_raw_write_lock
> +   0.38%  [kernel]  [k] btrfs_clear_lock_blocking_rw
> +   0.37%  [kernel]  [k] free_extent_buffer
> +   0.36%  [kernel]  [k] btrfs_tree_read_unlock
> +   0.32%  [kernel]  [k] do_raw_write_unlock
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
* Re: Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC])
  2013-07-08 15:22     ` Marco Stornelli
  2013-07-08 15:38       ` Jan Kara
@ 2013-07-09  0:56       ` Theodore Ts'o
  1 sibling, 0 replies; 12+ messages in thread
From: Theodore Ts'o @ 2013-07-09  0:56 UTC (permalink / raw)
  To: Marco Stornelli; +Cc: Jan Kara, Dave Chinner, xfs, linux-fsdevel

On Mon, Jul 08, 2013 at 05:22:43PM +0200, Marco Stornelli wrote:
> 
> Funny, if I well remember Google guys switched android from yaffs2
> to ext4 due to its superiority on SMP :)

The bigger reason why was because raw NAND flash doesn't really make
sense any more; especially as the feature size of flash cells has
shrunk and with the introduction of MLC and TLC, you really need to
use hardware assist to make flash sufficiently reliable.  Modern flash
storage uses dynamic adjustment of voltage levels as the flash cells
age, and error correcting codes to compensate for flash reliability
challenges.  This means accessing flash using eMMC, SATA, SAS, etc.,
and that rules out YAFFS2.

						- Ted

* Re: Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC])
  2013-07-08 12:44 ` Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC]) Dave Chinner
  2013-07-08 13:59   ` Jan Kara
  2013-07-09  0:43   ` Zheng Liu
@ 2013-07-09  1:15   ` Chris Mason
  2013-07-09  1:26     ` Dave Chinner
  2013-07-09  8:26   ` Dave Chinner
  3 siblings, 1 reply; 12+ messages in thread
From: Chris Mason @ 2013-07-09  1:15 UTC (permalink / raw)
  To: Dave Chinner, xfs; +Cc: linux-fsdevel

Quoting Dave Chinner (2013-07-08 08:44:53)
> [cc fsdevel because after all the XFS stuff I did a some testing on
> mmotm w.r.t per-node LRU lock contention avoidance, and also some
> scalability tests against ext4 and btrfs for comparison on some new
> hardware. That bit ain't pretty. ]
> 
> And, well, the less said about btrfs unlinks the better:
> 
> +  37.14%  [kernel]  [k] _raw_spin_unlock_irqrestore
> +  33.18%  [kernel]  [k] __write_lock_failed
> +  17.96%  [kernel]  [k] __read_lock_failed
> +   1.35%  [kernel]  [k] _raw_spin_unlock_irq
> +   0.82%  [kernel]  [k] __do_softirq
> +   0.53%  [kernel]  [k] btrfs_tree_lock
> +   0.41%  [kernel]  [k] btrfs_tree_read_lock
> +   0.41%  [kernel]  [k] do_raw_read_lock
> +   0.39%  [kernel]  [k] do_raw_write_lock
> +   0.38%  [kernel]  [k] btrfs_clear_lock_blocking_rw
> +   0.37%  [kernel]  [k] free_extent_buffer
> +   0.36%  [kernel]  [k] btrfs_tree_read_unlock
> +   0.32%  [kernel]  [k] do_raw_write_unlock
> 

Hi Dave,

Thanks for doing these runs.  At least on Btrfs the best way to resolve
the tree locking today is to break things up into more subvolumes.  I've
got another run at the root lock contention in the queue after I get
the skiplists in place in a few other parts of the Btrfs code.

-chris


* Re: Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC])
  2013-07-09  0:43   ` Zheng Liu
@ 2013-07-09  1:23     ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2013-07-09  1:23 UTC (permalink / raw)
  To: xfs, linux-fsdevel

On Tue, Jul 09, 2013 at 08:43:32AM +0800, Zheng Liu wrote:
> Hi Dave,
> 
> On Mon, Jul 08, 2013 at 10:44:53PM +1000, Dave Chinner wrote:
> [...]
> > So, lets look at ext4 vs btrfs vs XFS at 16-way (this is on the
> > 3.10-cil kernel I've been testing XFS on):
> > 
> > 	    create		 walk		unlink
> > 	 time(s)   rate		time(s)		time(s)
> > xfs	  222	266k+-32k	  170		  295
> > ext4	  978	 54k+- 2k	  325		 2053
> > btrfs	 1223	 47k+- 8k	  366		12000(*)
> > 
> > (*) Estimate based on a removal rate of 18.5 minutes for the first
> > 4.8 million inodes.
> > 
> > Basically, neither btrfs or ext4 have any concurrency scaling to
> > demonstrate, and unlinks on btrfs a just plain woeful.
> > 
> > ext4 create rate is limited by the extent cache LRU locking:
> 
> I have a patch to fix this problem and the patch has been applied into
> 3.11-rc1.  The patch is (d3922a77):
>   ext4: improve extent cache shrink mechanism to avoid to burn CPU time
> 
> I do really appreicate that if you could try your testing again against
> this patch.  I just want to make sure that this problem has been fixed.
> At least in my own testing it looks fine.

I'll redo them when 3.11-rc1 comes around. I'll let you know how
much better it is, and where the next ring of the onion lies.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC])
  2013-07-09  1:15   ` Chris Mason
@ 2013-07-09  1:26     ` Dave Chinner
  2013-07-09  1:54       ` [BULK] " Chris Mason
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2013-07-09  1:26 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-fsdevel, xfs

On Mon, Jul 08, 2013 at 09:15:33PM -0400, Chris Mason wrote:
> Quoting Dave Chinner (2013-07-08 08:44:53)
> > [cc fsdevel because after all the XFS stuff I did a some testing on
> > mmotm w.r.t per-node LRU lock contention avoidance, and also some
> > scalability tests against ext4 and btrfs for comparison on some new
> > hardware. That bit ain't pretty. ]
> > 
> > And, well, the less said about btrfs unlinks the better:
> > 
> > +  37.14%  [kernel]  [k] _raw_spin_unlock_irqrestore
> > +  33.18%  [kernel]  [k] __write_lock_failed
> > +  17.96%  [kernel]  [k] __read_lock_failed
> > +   1.35%  [kernel]  [k] _raw_spin_unlock_irq
> > +   0.82%  [kernel]  [k] __do_softirq
> > +   0.53%  [kernel]  [k] btrfs_tree_lock
> > +   0.41%  [kernel]  [k] btrfs_tree_read_lock
> > +   0.41%  [kernel]  [k] do_raw_read_lock
> > +   0.39%  [kernel]  [k] do_raw_write_lock
> > +   0.38%  [kernel]  [k] btrfs_clear_lock_blocking_rw
> > +   0.37%  [kernel]  [k] free_extent_buffer
> > +   0.36%  [kernel]  [k] btrfs_tree_read_unlock
> > +   0.32%  [kernel]  [k] do_raw_write_unlock
> > 
> 
> Hi Dave,
> 
> Thanks for doing these runs.  At least on Btrfs the best way to resolve
> the tree locking today is to break things up into more subvolumes.

Sure, but you can't do that for most workloads. Only on specialised
workloads (e.g. hashed directory tree based object stores) is this
really a viable option....

> I've
> got another run at the root lock contention in the queue after I get
> the skiplists in place in a few other parts of the Btrfs code.

It will be interesting to see how these new structures play out ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [BULK] Re: Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC])
  2013-07-09  1:26     ` Dave Chinner
@ 2013-07-09  1:54       ` Chris Mason
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Mason @ 2013-07-09  1:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, linux-fsdevel

Quoting Dave Chinner (2013-07-08 21:26:14)
> On Mon, Jul 08, 2013 at 09:15:33PM -0400, Chris Mason wrote:
> > Quoting Dave Chinner (2013-07-08 08:44:53)
> > > [cc fsdevel because after all the XFS stuff I did a some testing on
> > > mmotm w.r.t per-node LRU lock contention avoidance, and also some
> > > scalability tests against ext4 and btrfs for comparison on some new
> > > hardware. That bit ain't pretty. ]
> > > 
> > > And, well, the less said about btrfs unlinks the better:
> > > 
> > > +  37.14%  [kernel]  [k] _raw_spin_unlock_irqrestore
> > > +  33.18%  [kernel]  [k] __write_lock_failed
> > > +  17.96%  [kernel]  [k] __read_lock_failed
> > > +   1.35%  [kernel]  [k] _raw_spin_unlock_irq
> > > +   0.82%  [kernel]  [k] __do_softirq
> > > +   0.53%  [kernel]  [k] btrfs_tree_lock
> > > +   0.41%  [kernel]  [k] btrfs_tree_read_lock
> > > +   0.41%  [kernel]  [k] do_raw_read_lock
> > > +   0.39%  [kernel]  [k] do_raw_write_lock
> > > +   0.38%  [kernel]  [k] btrfs_clear_lock_blocking_rw
> > > +   0.37%  [kernel]  [k] free_extent_buffer
> > > +   0.36%  [kernel]  [k] btrfs_tree_read_unlock
> > > +   0.32%  [kernel]  [k] do_raw_write_unlock
> > > 
> > 
> > Hi Dave,
> > 
> > Thanks for doing these runs.  At least on Btrfs the best way to resolve
> > the tree locking today is to break things up into more subvolumes.
> 
> Sure, but you can't do that most workloads. Only on specialised
> workloads (e.g. hashed directory tree based object stores) is this
> really a viable option....

Yes and no.  It makes a huge difference even when you have 8 procs all
working on the same 8 subvolumes.  It's not perfect but it's all I
have ;)

> 
> > I've
> > got another run at the root lock contention in the queue after I get
> > the skiplists in place in a few other parts of the Btrfs code.
> 
> It will be interesting to see how these new structures play out ;)

The skiplists don't translate well to the tree roots, so I'll probably
have to do something different there.  But I'll get the onion peeled one
way or another.

-chris


* Re: Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC])
  2013-07-08 12:44 ` Some baseline tests on new hardware (was Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC]) Dave Chinner
                     ` (2 preceding siblings ...)
  2013-07-09  1:15   ` Chris Mason
@ 2013-07-09  8:26   ` Dave Chinner
  3 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2013-07-09  8:26 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel

On Mon, Jul 08, 2013 at 10:44:53PM +1000, Dave Chinner wrote:
> [cc fsdevel because after all the XFS stuff I did a some testing on
> mmotm w.r.t per-node LRU lock contention avoidance, and also some
> scalability tests against ext4 and btrfs for comparison on some new
> hardware. That bit ain't pretty. ]

A quick follow-up on mmotm:

> FWIW, the mmotm kernel (which has a fair bit of debug enabled, so
> not quite comparitive) doesn't have any LRU lock contention to speak
> of. For create:
> 
> -   7.81%  [kernel]  [k] __ticket_spin_trylock
>    - __ticket_spin_trylock
>       - 70.98% _raw_spin_lock
>          + 97.55% xfs_log_commit_cil
>          + 0.93% __d_instantiate
>          + 0.58% inode_sb_list_add
>       - 29.02% do_raw_spin_lock
>          - _raw_spin_lock
>             + 41.14% xfs_log_commit_cil
>             + 8.29% _xfs_buf_find
>             + 8.00% xfs_iflush_cluster

So I just ported all my prototype sync and inode_sb_list_lock
changes across to mmotm, as well as the XFS CIL optimisations.

-   2.33%  [kernel]  [k] __ticket_spin_trylock
   - __ticket_spin_trylock
      - 70.14% do_raw_spin_lock
         - _raw_spin_lock
            + 16.91% _xfs_buf_find
            + 15.20% list_lru_add
            + 12.83% xfs_log_commit_cil
            + 11.18% d_alloc
            + 7.43% dput
            + 4.56% __d_instantiate
....

Most of the spinlock contention has gone away.
 

> And the walk:
> 
> -  26.37%  [kernel]  [k] __ticket_spin_trylock
>    - __ticket_spin_trylock
>       - 49.10% _raw_spin_lock
>          - 50.65% evict
...
>          - 26.99% list_lru_add
>             + 89.01% inode_add_lru
>             + 10.95% dput
>          + 7.03% __remove_inode_hash
>       - 40.65% do_raw_spin_lock
>          - _raw_spin_lock
>             - 41.96% evict
....
>             - 13.55% list_lru_add
>                  84.33% inode_add_lru
....
>             + 10.10% __remove_inode_hash                                                                                                                               
>                     system_call_fastpath

-  15.44%  [kernel]  [k] __ticket_spin_trylock
   - __ticket_spin_trylock
      - 46.59% _raw_spin_lock
         + 69.40% list_lru_add
           17.65% list_lru_del
           5.70% list_lru_count_node
           2.44% shrink_dentry_list
              prune_dcache_sb
              super_cache_scan
              shrink_slab
           0.86% __page_check_address
      - 33.06% do_raw_spin_lock
         - _raw_spin_lock
            + 36.96% list_lru_add
            + 11.98% list_lru_del
            + 6.68% shrink_dentry_list
            + 6.43% d_alloc
            + 4.79% _xfs_buf_find
.....
      + 11.48% do_raw_spin_trylock
      + 8.87% _raw_spin_trylock

So now we see that CPU wasted on contention is down by 40%.
Observation shows that most of the list_lru_add/list_lru_del
contention occurs when reclaim is running - before memory filled
up the lookup rate was on the high side of 600,000 inodes/s, but
fell back to about 425,000/s once reclaim started working.

> 
> There's quite a different pattern of contention - it has moved
> inward to evict which implies the inode_sb_list_lock is the next
> obvious point of contention. I have patches in the works for that.
> Also, the inode_hash_lock is causing some contention, even though we
> fake inode hashing. I have a patch to fix that for XFS as well.
> 
> I also note an interesting behaviour of the per-node inode LRUs -
> the contention is coming from the dentry shrinker on one node
> freeing inodes allocated on a different node during reclaim. There's
> scope for improvement there.
> 
> But here' the interesting part:
> 
> Kernel	    create		walk		unlink
> 	time(s)	 rate		time(s)		time(s)
> 3.10-cil  222	266k+-32k	  170		  295
> mmotm	  251	222k+-16k	  128		  356

mmotm-cil  225  258k+-26k	  122		  296

So even with all the debug on, the mmotm kernel with most of the
mods I was running in 3.10-cil, plus the s_inodes -> list_lru
conversion, gets the same throughput for create and unlink and has
much better walk times.

> Even with all the debug enabled, the overall walk time dropped by
> 25% to 128s. So performance in this workload has substantially
> improved because of the per-node LRUs and variability is also down
> as well, as predicted. Once I add all the tweaks I have in the
> 3.10-cil tree to mmotm, I expect significant improvements to create
> and unlink performance as well...
> 
> So, lets look at ext4 vs btrfs vs XFS at 16-way (this is on the
> 3.10-cil kernel I've been testing XFS on):
> 
> 	    create		 walk		unlink
> 	 time(s)   rate		time(s)		time(s)
> xfs	  222	266k+-32k	  170		  295
> ext4	  978	 54k+- 2k	  325		 2053
> btrfs	 1223	 47k+- 8k	  366		12000(*)
> 
> (*) Estimate based on a removal rate of 18.5 minutes for the first
> 4.8 million inodes.

So, let's run these again on my current mmotm tree - it has the ext4
extent tree fixes in it and my rcu inode hash lookup patch...

	    create		 walk		unlink
	 time(s)   rate		time(s)		time(s)
xfs	  225	258k+-26k	  122		  296
ext4	  456	118k+- 4k	  128		 1632
btrfs	 1122	 51k+- 3k	  281		 3200(*)

(*) about 4.7 million inodes removed in 5 minutes.

ext4 is a lot healthier: create speed doubles from the extent cache
lock contention fixes, and the walk time halves due to the rcu inode
cache lookup. That said, it is still burning a huge amount of CPU on
the inode_hash_lock adding and removing inodes. Unlink perf is a bit
faster, but still slow.  So, yeah, things will get better in the
not-too-distant future...

And for btrfs? Well, create is a tiny bit faster, the walk is 20%
faster thanks to the rcu hash lookups, and unlinks are markedly
faster (3x). Still not fast enough for me to hang around waiting for
them to complete, though.

FWIW, while the results are a lot better for ext4, let me just point
out how hard it is driving the storage to get that performance:

load	|    create  |	    walk    |		unlink
IO type	|    write   |	    read    |	   read	    |	   write
	| IOPS	 BW  |	 IOPS	 BW |	IOPS	BW  |	 IOPS	 BW
--------+------------+--------------+---------------+--------------
xfs	|  900	200  |	18000	140 |	7500	50  |	  400	 50
ext4	|23000	390  |	55000	200 |	2000	10  |	13000	160
btrfs(*)|peaky	 75  |	26000	100 |	decay	10  |	peaky peaky

ext4 is hammering the SSDs far harder than XFS, both in terms of
IOPS and bandwidth. You do not want to run ext4 on your SSD if you
have a metadata intensive workload as it will age the SSD much, much
faster than XFS with that sort of write behaviour.

(*) the btrfs create IO pattern is 5s peaks of write IOPS every 30s.
The baseline is about 500 IOPS, but the peaks reach upwards of
30,000 write IOPS. Unlink does this as well.  There are also short
bursts of 2-3000 read IOPS just before the write IOPS bursts in the
create workload. For the unlink, it starts off with about 10,000
read IOPS, and goes quickly into exponential decay down to about
2000 read IOPS in 90s.  Then it hits some trigger and the cycle
starts again. The trigger appears to coincide with 1-2 million
dentries being reclaimed.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
