public inbox for linux-kernel@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Jens Axboe <axboe@fb.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org
Subject: Re: [PATCHSET v3][RFC] Make background writeback not suck
Date: Thu, 31 Mar 2016 19:24:33 +1100
Message-ID: <20160331082433.GO11812@dastard>
In-Reply-To: <1459350477-16404-1-git-send-email-axboe@fb.com>

On Wed, Mar 30, 2016 at 09:07:48AM -0600, Jens Axboe wrote:
> Hi,
> 
> This patchset isn't as much a final solution, as it's a demonstration
> of what I believe is a huge issue. Since the dawn of time, our
> background buffered writeback has sucked. When we do background
> buffered writeback, it should have little impact on foreground
> activity. That's the definition of background activity... But for as
> long as I can remember, heavy buffered writers have not behaved like
> that. For instance, if I do something like this:
> 
> $ dd if=/dev/zero of=foo bs=1M count=10k
> 
> on my laptop, and then try and start chrome, it basically won't start
> before the buffered writeback is done. Or, for server oriented
> workloads, where installation of a big RPM (or similar) adversely
> impacts data base reads or sync writes. When that happens, I get people
> yelling at me.
> 
> Last time I posted this, I used flash storage as the example. But
> this works equally well on rotating storage. Let's run a test case
> that writes a lot. This test writes 50 files, each 100M, on XFS on
> a regular hard drive. While this happens, we attempt to read
> another file with fio.
> 
> Writers:
> 
> $ time (./write-files ; sync)
> real	1m6.304s
> user	0m0.020s
> sys	0m12.210s

Great. So a basic IO test looks good - let's throw something more
complex at it. Say, a benchmark I've been using for years to stress
the IO subsystem, the filesystem and memory reclaim all at the same
time: a concurrent fsmark inode creation test.
(first google hit https://lkml.org/lkml/2013/9/10/46)

This generates thousands of REQ_WRITE metadata IOs every second, so
if I understand how the throttle works correctly, these would be
classified as background writeback by the block layer throttle.
And....

FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0     255845.0         10796891
     0      3200000            0     261348.8         10842349
     0      4800000            0     249172.3         14121232
     0      6400000            0     245172.8         12453759
     0      8000000            0     201249.5         14293100
     0      9600000            0     200417.5         29496551
>>>> 0     11200000            0      90399.6         40665397
     0     12800000            0     212265.6         21839031
     0     14400000            0     206398.8         32598378
     0     16000000            0     197589.7         26266552
     0     17600000            0     206405.2         16447795
>>>> 0     19200000            0      99189.6         87650540
     0     20800000            0     249720.8         12294862
     0     22400000            0     138523.8         47330007
>>>> 0     24000000            0      85486.2         14271096
     0     25600000            0     157538.1         64430611
     0     27200000            0     109677.8         47835961
     0     28800000            0     207230.5         31301031
     0     30400000            0     188739.6         33750424
     0     32000000            0     174197.9         41402526
     0     33600000            0     139152.0        100838085
     0     35200000            0     203729.7         34833764
     0     36800000            0     228277.4         12459062
>>>> 0     38400000            0      94962.0         30189182
     0     40000000            0     166221.9         40564922
>>>> 0     41600000            0      62902.5         80098461
     0     43200000            0     217932.6         22539354
     0     44800000            0     189594.6         24692209
     0     46400000            0     137834.1         39822038
     0     48000000            0     240043.8         12779453
     0     49600000            0     176830.8         16604133
     0     51200000            0     180771.8         32860221

real    5m35.967s
user    3m57.054s
sys     48m53.332s

In those highlighted report points, the performance has dropped
significantly. The typical range I expect to see once memory has
filled (a bit over 8m inodes) is 180k-220k.  Runtime on a vanilla
kernel was 4m40s and there were no performance drops, so this
workload runs almost a minute slower with the block layer throttling
code.
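
As a rough cross-check of the numbers above, here is a small Python
sketch that flags the dips and works out the runtime difference. The
files/sec values are copied from the report; the 100,000 files/s
cut-off is an assumption, picked only because it separates the rows
marked with >>>> from the rest. It is not a threshold fs_mark itself
uses, and the function names are purely illustrative.

```python
# Files/sec column copied from the fs_mark report above, in order.
FILES_PER_SEC = [
    255845.0, 261348.8, 249172.3, 245172.8, 201249.5, 200417.5,
    90399.6, 212265.6, 206398.8, 197589.7, 206405.2, 99189.6,
    249720.8, 138523.8, 85486.2, 157538.1, 109677.8, 207230.5,
    188739.6, 174197.9, 139152.0, 203729.7, 228277.4, 94962.0,
    166221.9, 62902.5, 217932.6, 189594.6, 137834.1, 240043.8,
    176830.8, 180771.8,
]

# Each report line covers another 1.6 million files (Count column).
FILES_PER_REPORT = 1_600_000

# Assumption: cut-off chosen to match the rows highlighted with >>>>.
DIP_CUTOFF = 100_000.0


def find_dips(rates, cutoff=DIP_CUTOFF):
    """Return (cumulative file count, rate) for every report below cutoff."""
    return [((i + 1) * FILES_PER_REPORT, rate)
            for i, rate in enumerate(rates) if rate < cutoff]


def to_seconds(minutes, seconds):
    """Convert a 'XmY.Zs' wall-clock time into seconds."""
    return minutes * 60 + seconds


def slowdown(patched_s, vanilla_s):
    """Return (extra seconds, extra percent) of patched over vanilla."""
    delta = patched_s - vanilla_s
    return delta, 100.0 * delta / vanilla_s


if __name__ == "__main__":
    for count, rate in find_dips(FILES_PER_SEC):
        print(f"dip at {count:>9} files: {rate:.1f} files/s")
    # 5m35.967s with the throttling patches vs 4m40s on a vanilla kernel.
    delta, pct = slowdown(to_seconds(5, 35.967), to_seconds(4, 40))
    print(f"patched run is {delta:.1f}s ({pct:.1f}%) slower")
```

With these inputs the patched run works out to roughly 56 seconds, or
about 20%, slower than the vanilla one.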

What I see in these performance dips is the XFS transaction
subsystem stalling *completely* - instead of running at a steady
state of around 350,000 transactions/s, there are *zero*
transactions running for periods of up to ten seconds.  This
coincides with the CPU usage falling to almost zero as well.
AFAICT, the only thing that is running when the filesystem stalls
like this is memory reclaim.
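
To make "stall" concrete, the Python sketch below scans a series of
per-second transaction counts for runs of consecutive zero samples.
It assumes the rate has already been sampled once per second by some
monitoring tool; the function name, the min_len parameter and the
sample values in the usage snippet are hypothetical and are not
measurements from this test.

```python
def find_stalls(samples, min_len=2):
    """Return (start_index, length) for each run of zero samples.

    samples: per-second transaction counts (one entry per second).
    min_len: shortest run, in seconds, that counts as a stall.
    """
    stalls = []
    run_start = None
    for i, value in enumerate(samples):
        if value == 0:
            if run_start is None:
                run_start = i          # a new run of zeros begins here
        elif run_start is not None:
            if i - run_start >= min_len:
                stalls.append((run_start, i - run_start))
            run_start = None           # run ended by a non-zero sample
    # Handle a run that is still open at the end of the series.
    if run_start is not None and len(samples) - run_start >= min_len:
        stalls.append((run_start, len(samples) - run_start))
    return stalls


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    series = [350000, 0, 0, 0, 340000, 0, 350000, 0, 0]
    for start, length in find_stalls(series):
        print(f"stall starting at t={start}s lasting {length}s")
```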

Without the block throttling patches, the workload quickly finds a
steady state of around 7.5-8.5 million cached inodes, and it doesn't
vary much outside those bounds. With the block throttling patches,
on every transaction subsystem stall that occurs, the inode cache
gets 3-4 million inodes trimmed out of it (i.e. half the
cache), and in a couple of cases I saw it trim 6+ million inodes from
the cache before the transactions started up and the cache started
growing again.
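
For anyone wanting to watch the same thing, one possible way to track
the cached inode count is to read the active object count of the
xfs_inode slab from /proc/slabinfo. This is only a sketch of one
approach and not necessarily how the figures above were collected;
the helper name is made up, reading /proc/slabinfo usually requires
root, and the sample text in the usage snippet is illustrative.

```python
def active_objects(slabinfo_text, cache_name="xfs_inode"):
    """Return the active object count for cache_name, or None if absent.

    slabinfo_text is the content of /proc/slabinfo. Each data line
    starts with: <name> <active_objs> <num_objs> <objsize> ...
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if fields and fields[0] == cache_name:
            return int(fields[1])
    return None


if __name__ == "__main__":
    # Illustrative sample in /proc/slabinfo layout; the numbers are made up.
    sample = (
        "slabinfo - version: 2.1\n"
        "# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>\n"
        "xfs_inode 8000000 8000016 960 17 4 : tunables 0 0 0 : slabdata 0 0 0\n"
    )
    print(active_objects(sample))
    # On a live system (typically as root):
    #   with open("/proc/slabinfo") as f:
    #       print(active_objects(f.read()))
```

Sampling this once a second during the run would show whether the
count holds steady or drops sharply around each stall.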

> The above was run without scsi-mq, and with using the deadline scheduler,
> results with CFQ are similarly depressing for this test. So IO scheduling
> is in place for this test, it's not pure blk-mq without scheduling.

virtio in guest, XFS direct IO -> no-op -> scsi in host.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
