* btrfs kernel workqueues performance regression
From: Morten Stevens @ 2014-07-15 15:26 UTC
To: linux-btrfs
Hi,
I see that btrfs has been using kernel workqueues since Linux 3.15. After
some tests I noticed a performance regression with fs_mark.
mount options: rw,relatime,compress=lzo,space_cache
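For reference, the mount command would look something like this
(/dev/sdX is just a placeholder for the actual device):

# mount -o rw,relatime,compress=lzo,space_cache /dev/sdX /mnt/btrfs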
fs_mark on Kernel 3.14.9:
# fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
FSUse%        Count         Size    Files/sec     App Overhead
     1        65536        51200      17731.4           723894
     1       131072        51200      16832.6           685444
     1       196608        51200      19604.5           652294
     1       262144        51200      18663.6           630067
     1       327680        51200      20112.2           692769
The results are really nice! compress=lzo performs very well.
fs_mark after upgrading to Kernel 3.15.4:
# fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
FSUse%        Count         Size    Files/sec     App Overhead
     0        65536        51200      10718.1           749540
     0       131072        51200       8601.2           853050
     0       196608        51200      11623.2           558546
     0       262144        51200      11534.2           536342
     0       327680        51200      11167.4           578562
That's really a big performance regression :(
What do you think? It's easy to reproduce with fs_mark.
Thank you.
Best regards,
Morten
* Re: btrfs kernel workqueues performance regression
From: Chris Mason @ 2014-07-15 17:39 UTC
To: Morten Stevens, linux-btrfs
On 07/15/2014 11:26 AM, Morten Stevens wrote:
> Hi,
>
> I see that btrfs has been using kernel workqueues since Linux 3.15. After
> some tests I noticed a performance regression with fs_mark.
>
> mount options: rw,relatime,compress=lzo,space_cache
>
> fs_mark on Kernel 3.14.9:
>
> # fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
> FSUse% Count Size Files/sec App Overhead
> 1 65536 51200 17731.4 723894
> 1 131072 51200 16832.6 685444
> 1 196608 51200 19604.5 652294
> 1 262144 51200 18663.6 630067
> 1 327680 51200 20112.2 692769
>
> The results are really nice! compress=lzo performs very well.
>
> fs_mark after upgrading to Kernel 3.15.4:
>
> # fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
> FSUse% Count Size Files/sec App Overhead
> 0 65536 51200 10718.1 749540
> 0 131072 51200 8601.2 853050
> 0 196608 51200 11623.2 558546
> 0 262144 51200 11534.2 536342
> 0 327680 51200 11167.4 578562
>
> That's really a big performance regression :(
>
> What do you think? It's easy to reproduce with fs_mark.
I wasn't able to trigger regressions here when we first merged it, but I
was sure that something would pop up. fs_mark is sensitive to a few
different factors outside just the worker threads, so it could easily be
another change as well.
With 16 threads, the btree locking also has a huge impact, and we've
made changes there too.
I'll reproduce here, thanks for sending it in.
-chris
* Re: btrfs kernel workqueues performance regression
From: Dave Chinner @ 2014-07-22 23:39 UTC
To: Chris Mason; +Cc: Morten Stevens, linux-btrfs
On Tue, Jul 15, 2014 at 01:39:11PM -0400, Chris Mason wrote:
> On 07/15/2014 11:26 AM, Morten Stevens wrote:
> > Hi,
> >
> > I see that btrfs has been using kernel workqueues since Linux 3.15. After
> > some tests I noticed a performance regression with fs_mark.
> >
> > mount options: rw,relatime,compress=lzo,space_cache
> >
> > fs_mark on Kernel 3.14.9:
> >
> > # fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
> > FSUse% Count Size Files/sec App Overhead
> > 1 65536 51200 17731.4 723894
> > 1 131072 51200 16832.6 685444
> > 1 196608 51200 19604.5 652294
> > 1 262144 51200 18663.6 630067
> > 1 327680 51200 20112.2 692769
> >
> > The results are really nice! compress=lzo performs very well.
> >
> > fs_mark after upgrading to Kernel 3.15.4:
> >
> > # fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
> > FSUse% Count Size Files/sec App Overhead
> > 0 65536 51200 10718.1 749540
> > 0 131072 51200 8601.2 853050
> > 0 196608 51200 11623.2 558546
> > 0 262144 51200 11534.2 536342
> > 0 327680 51200 11167.4 578562
> >
> > That's really a big performance regression :(
> >
> > What do you think? It's easy to reproduce with fs_mark.
>
> I wasn't able to trigger regressions here when we first merged it, but I
> was sure that something would pop up. fs_mark is sensitive to a few
> different factors outside just the worker threads, so it could easily be
> another change as well.
>
> With 16 threads, the btree locking also has a huge impact, and we've
> made changes there too.
FWIW, I ran my usual 16-way fsmark test last week on my sparse 500TB
perf test rig on btrfs. It sucked, big time, much worse than it's
sucked in the past. It didn't scale past a single thread - 1 thread
got 24,000 files/s, 2 threads got 25,000 files/s, and 16 threads got
22,000 files/s.
$ ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 32 -d /mnt/scratch/0
....
FSUse%        Count         Size    Files/sec     App Overhead
     0       100000            0      24808.8           686583
....
$ ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 32 -d /mnt/scratch/0 -d /mnt/scratch/1 -d /mnt/scratch/2 -d /mnt/scratch/3 -d /mnt/scratch/4 -d /mnt/scratch/5 -d /mnt/scratch/6 -d /mnt/scratch/7 -d /mnt/scratch/8 -d /mnt/scratch/9 -d /mnt/scratch/10 -d /mnt/scratch/11 -d /mnt/scratch/12 -d /mnt/scratch/13 -d /mnt/scratch/14 -d /mnt/scratch/15
....
FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0      23599.7         38047237
Last time I ran this (probably about 3.12 - btrfs was simply too
broken when I last tried on 3.14) I got about 80,000 files/s, so this
is a pretty significant regression.
The 16-way run consumed most of the 16 CPUs in the system, and the
perf top output showed this:
+ 44.48% [kernel] [k] _raw_spin_unlock_irqrestore
+ 28.60% [kernel] [k] queue_read_lock_slowpath
+ 14.34% [kernel] [k] queue_write_lock_slowpath
+ 1.91% [kernel] [k] _raw_spin_unlock_irq
+ 0.85% [kernel] [k] __do_softirq
+ 0.45% [kernel] [k] do_raw_read_lock
+ 0.43% [kernel] [k] do_raw_read_unlock
+ 0.42% [kernel] [k] btrfs_search_slot
+ 0.40% [kernel] [k] do_raw_spin_lock
+ 0.35% [kernel] [k] btrfs_tree_read_unlock
+ 0.33% [kernel] [k] do_raw_write_lock
+ 0.30% [kernel] [k] btrfs_clear_lock_blocking_rw
+ 0.29% [kernel] [k] btrfs_tree_read_lock
Basically all the CPU time is spent in locking functions.
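For anyone who wants to capture the same kind of profile, something
like this should work (run it alongside the fsmark load; the sleep
just bounds the sampling window):

$ perf record -a -g -- sleep 30
$ perf report --sort symbol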
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: btrfs kernel workqueues performance regression
From: Chris Mason @ 2014-07-23 13:33 UTC
To: Dave Chinner; +Cc: Morten Stevens, linux-btrfs
On 07/22/2014 07:39 PM, Dave Chinner wrote:
> On Tue, Jul 15, 2014 at 01:39:11PM -0400, Chris Mason wrote:
>> On 07/15/2014 11:26 AM, Morten Stevens wrote:
>>> Hi,
>>>
>>> I see that btrfs has been using kernel workqueues since Linux 3.15. After
>>> some tests I noticed a performance regression with fs_mark.
>>>
>>> mount options: rw,relatime,compress=lzo,space_cache
>>>
>>> fs_mark on Kernel 3.14.9:
>>>
>>> # fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
>>> FSUse% Count Size Files/sec App Overhead
>>> 1 65536 51200 17731.4 723894
>>> 1 131072 51200 16832.6 685444
>>> 1 196608 51200 19604.5 652294
>>> 1 262144 51200 18663.6 630067
>>> 1 327680 51200 20112.2 692769
>>>
>>> The results are really nice! compress=lzo performs very well.
>>>
>>> fs_mark after upgrading to Kernel 3.15.4:
>>>
>>> # fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
>>> FSUse% Count Size Files/sec App Overhead
>>> 0 65536 51200 10718.1 749540
>>> 0 131072 51200 8601.2 853050
>>> 0 196608 51200 11623.2 558546
>>> 0 262144 51200 11534.2 536342
>>> 0 327680 51200 11167.4 578562
>>>
>>> That's really a big performance regression :(
>>>
>>> What do you think? It's easy to reproduce with fs_mark.
>>
>> I wasn't able to trigger regressions here when we first merged it, but I
>> was sure that something would pop up. fs_mark is sensitive to a few
>> different factors outside just the worker threads, so it could easily be
>> another change as well.
>>
>> With 16 threads, the btree locking also has a huge impact, and we've
>> made changes there too.
>
> FWIW, I ran my usual 16-way fsmark test last week on my sparse 500TB
> perf test rig on btrfs. It sucked, big time, much worse than it's
> sucked in the past. It didn't scale past a single thread - 1 thread
> got 24,000 files/s, 2 threads got 25,000 files/s, and 16 threads got
> 22,000 files/s.
We had a trylock in the btree search code that always took the spinlock
but did a trylock on the blocking lock. This was changed to do a trylock
on the spinlock as well, because some of the callers were using trylock
differently than in the past.
It's a regression for this kind of run, but it makes the btrfs locking
much less mystical. I'm fixing up the performance regression for the
next merge window, but I didn't want to mess with it too much in 3.16
given all the other locking churn.
For this kind of fsmark run, the best results still come from one subvol
per thread.
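Something along these lines should do it (untested sketch; the paths
are placeholders, and each -d directory gets its own thread, as in the
16-way run above):

$ for i in $(seq 0 15); do btrfs subvolume create /mnt/scratch/subvol$i; done
$ ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 32 \
      $(for i in $(seq 0 15); do printf '%s ' -d /mnt/scratch/subvol$i; done)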
-chris