* btrfs kernel workqueues performance regression
From: Morten Stevens @ 2014-07-15 15:26 UTC
To: linux-btrfs
Hi,
I see that btrfs has been using kernel workqueues since Linux 3.15. After
some tests I noticed a performance regression with fs_mark.
mount options: rw,relatime,compress=lzo,space_cache
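For reference, the mount command would look something like this
(/dev/sdX is just a placeholder for the actual device):

# mount -o rw,relatime,compress=lzo,space_cache /dev/sdX /mnt/btrfs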
fs_mark on Kernel 3.14.9:
# fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
FSUse%        Count         Size    Files/sec     App Overhead
     1        65536        51200      17731.4           723894
     1       131072        51200      16832.6           685444
     1       196608        51200      19604.5           652294
     1       262144        51200      18663.6           630067
     1       327680        51200      20112.2           692769
The results are really nice! compress=lzo performs very well.
fs_mark after upgrading to Kernel 3.15.4:
# fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
FSUse%        Count         Size    Files/sec     App Overhead
     0        65536        51200      10718.1           749540
     0       131072        51200       8601.2           853050
     0       196608        51200      11623.2           558546
     0       262144        51200      11534.2           536342
     0       327680        51200      11167.4           578562
That's really a big performance regression :(
What do you think? It's easy to reproduce with fs_mark.
Thank you.
Best regards,
Morten
* Re: btrfs kernel workqueues performance regression
From: Chris Mason @ 2014-07-15 17:39 UTC
To: Morten Stevens, linux-btrfs
On 07/15/2014 11:26 AM, Morten Stevens wrote:
> Hi,
>
> I see that btrfs has been using kernel workqueues since Linux 3.15. After
> some tests I noticed a performance regression with fs_mark.
>
> mount options: rw,relatime,compress=lzo,space_cache
>
> fs_mark on Kernel 3.14.9:
>
> # fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
> FSUse% Count Size Files/sec App Overhead
> 1 65536 51200 17731.4 723894
> 1 131072 51200 16832.6 685444
> 1 196608 51200 19604.5 652294
> 1 262144 51200 18663.6 630067
> 1 327680 51200 20112.2 692769
>
> The results are really nice! compress=lzo performs very well.
>
> fs_mark after upgrading to Kernel 3.15.4:
>
> # fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
> FSUse% Count Size Files/sec App Overhead
> 0 65536 51200 10718.1 749540
> 0 131072 51200 8601.2 853050
> 0 196608 51200 11623.2 558546
> 0 262144 51200 11534.2 536342
> 0 327680 51200 11167.4 578562
>
> That's really a big performance regression :(
>
> What do you think? It's easy to reproduce with fs_mark.
I wasn't able to trigger regressions here when we first merged it, but I
was sure that something would pop up. fs_mark is sensitive to a few
different factors outside just the worker threads, so it could easily be
another change as well.
With 16 threads, the btree locking also has a huge impact, and we've
made changes there too.
I'll reproduce here, thanks for sending it in.
-chris
* Re: btrfs kernel workqueues performance regression
From: Dave Chinner @ 2014-07-22 23:39 UTC
To: Chris Mason; +Cc: Morten Stevens, linux-btrfs
On Tue, Jul 15, 2014 at 01:39:11PM -0400, Chris Mason wrote:
> On 07/15/2014 11:26 AM, Morten Stevens wrote:
> > Hi,
> >
> > I see that btrfs has been using kernel workqueues since Linux 3.15. After
> > some tests I noticed a performance regression with fs_mark.
> >
> > mount options: rw,relatime,compress=lzo,space_cache
> >
> > fs_mark on Kernel 3.14.9:
> >
> > # fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
> > FSUse% Count Size Files/sec App Overhead
> > 1 65536 51200 17731.4 723894
> > 1 131072 51200 16832.6 685444
> > 1 196608 51200 19604.5 652294
> > 1 262144 51200 18663.6 630067
> > 1 327680 51200 20112.2 692769
> >
> > The results are really nice! compress=lzo performs very well.
> >
> > fs_mark after upgrading to Kernel 3.15.4:
> >
> > # fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
> > FSUse% Count Size Files/sec App Overhead
> > 0 65536 51200 10718.1 749540
> > 0 131072 51200 8601.2 853050
> > 0 196608 51200 11623.2 558546
> > 0 262144 51200 11534.2 536342
> > 0 327680 51200 11167.4 578562
> >
> > That's really a big performance regression :(
> >
> > What do you think? It's easy to reproduce with fs_mark.
>
> I wasn't able to trigger regressions here when we first merged it, but I
> was sure that something would pop up. fs_mark is sensitive to a few
> different factors outside just the worker threads, so it could easily be
> another change as well.
>
> With 16 threads, the btree locking also has a huge impact, and we've
> made changes there too.
FWIW, I ran my usual 16-way fsmark test last week on my sparse 500TB
perf test rig on btrfs. It sucked, big time, much worse than it's
sucked in the past. It didn't scale past a single thread - 1 thread
got 24,000 files/s, 2 threads got 25,000 files/s, and 16 threads got
22,000 files/s.
$ ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 32 -d /mnt/scratch/0
....
FSUse%        Count         Size    Files/sec     App Overhead
     0       100000            0      24808.8           686583
....
$ ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 32 -d /mnt/scratch/0 -d /mnt/scratch/1 -d /mnt/scratch/2 -d /mnt/scratch/3 -d /mnt/scratch/4 -d /mnt/scratch/5 -d /mnt/scratch/6 -d /mnt/scratch/7 -d /mnt/scratch/8 -d /mnt/scratch/9 -d /mnt/scratch/10 -d /mnt/scratch/11 -d /mnt/scratch/12 -d /mnt/scratch/13 -d /mnt/scratch/14 -d /mnt/scratch/15
....
FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0      23599.7         38047237
Last time I ran this (probably about 3.12 - btrfs was simply too
broken when I last tried on 3.14) I got about 80,000 files/s, so this
is a pretty significant regression.
The 16-way run consumed most of the 16 CPUs in the system, and the
perf top output showed this:
+ 44.48% [kernel] [k] _raw_spin_unlock_irqrestore
+ 28.60% [kernel] [k] queue_read_lock_slowpath
+ 14.34% [kernel] [k] queue_write_lock_slowpath
+ 1.91% [kernel] [k] _raw_spin_unlock_irq
+ 0.85% [kernel] [k] __do_softirq
+ 0.45% [kernel] [k] do_raw_read_lock
+ 0.43% [kernel] [k] do_raw_read_unlock
+ 0.42% [kernel] [k] btrfs_search_slot
+ 0.40% [kernel] [k] do_raw_spin_lock
+ 0.35% [kernel] [k] btrfs_tree_read_unlock
+ 0.33% [kernel] [k] do_raw_write_lock
+ 0.30% [kernel] [k] btrfs_clear_lock_blocking_rw
+ 0.29% [kernel] [k] btrfs_tree_read_lock
Basically all the CPU time is spent in locking functions.
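For anyone who wants to capture the same kind of profile, something
like this should work (run it alongside the fsmark load; the sleep
just bounds the sampling window):

$ perf record -a -g -- sleep 30
$ perf report --sort symbol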
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: btrfs kernel workqueues performance regression
From: Chris Mason @ 2014-07-23 13:33 UTC
To: Dave Chinner; +Cc: Morten Stevens, linux-btrfs
On 07/22/2014 07:39 PM, Dave Chinner wrote:
> On Tue, Jul 15, 2014 at 01:39:11PM -0400, Chris Mason wrote:
>> On 07/15/2014 11:26 AM, Morten Stevens wrote:
>>> Hi,
>>>
>>> I see that btrfs has been using kernel workqueues since Linux 3.15. After
>>> some tests I noticed a performance regression with fs_mark.
>>>
>>> mount options: rw,relatime,compress=lzo,space_cache
>>>
>>> fs_mark on Kernel 3.14.9:
>>>
>>> # fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
>>> FSUse% Count Size Files/sec App Overhead
>>> 1 65536 51200 17731.4 723894
>>> 1 131072 51200 16832.6 685444
>>> 1 196608 51200 19604.5 652294
>>> 1 262144 51200 18663.6 630067
>>> 1 327680 51200 20112.2 692769
>>>
>>> The results are really nice! compress=lzo performs very well.
>>>
>>> fs_mark after upgrading to Kernel 3.15.4:
>>>
>>> # fs_mark -d /mnt/btrfs/fsmark -D 512 -t 16 -n 4096 -s 51200 -L5 -S0
>>> FSUse% Count Size Files/sec App Overhead
>>> 0 65536 51200 10718.1 749540
>>> 0 131072 51200 8601.2 853050
>>> 0 196608 51200 11623.2 558546
>>> 0 262144 51200 11534.2 536342
>>> 0 327680 51200 11167.4 578562
>>>
>>> That's really a big performance regression :(
>>>
>>> What do you think? It's easy to reproduce with fs_mark.
>>
>> I wasn't able to trigger regressions here when we first merged it, but I
>> was sure that something would pop up. fs_mark is sensitive to a few
>> different factors outside just the worker threads, so it could easily be
>> another change as well.
>>
>> With 16 threads, the btree locking also has a huge impact, and we've
>> made changes there too.
>
> FWIW, I ran my usual 16-way fsmark test last week on my sparse 500TB
> perf test rig on btrfs. It sucked, big time, much worse than it's
> sucked in the past. It didn't scale past a single thread - 1 thread
> got 24,000 files/s, 2 threads got 25,000 files/s, and 16 threads got
> 22,000 files/s.
We had a trylock in the btree search code that always took the spinlock
but did a trylock on the blocking lock. This was changed to do a trylock
on the spinlock as well, because some of the callers were using trylock
differently than in the past.
It's a regression for this kind of run, but it makes the btrfs locking
much less mystical. I'm fixing up the performance regression for the
next merge window, but I didn't want to mess with it too much in 3.16
given all the other locking churn.
For this kind of fsmark run, the best results still come from one subvol
per thread.
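Something along these lines should do it (untested sketch; the paths
are placeholders, and each -d directory gets its own thread, as in the
16-way run above):

$ for i in $(seq 0 15); do btrfs subvolume create /mnt/scratch/subvol$i; done
$ ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 32 \
      $(for i in $(seq 0 15); do printf '%s ' -d /mnt/scratch/subvol$i; done)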
-chris