From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org,
Wu Fengguang <fengguang.wu@intel.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 0/4 RESEND] writeback: Dirty list handling changes
Date: Fri, 16 May 2014 09:55:14 +1000
Message-ID: <20140515235514.GH5421@dastard>
In-Reply-To: <1400168517-3565-1-git-send-email-jack@suse.cz>
On Thu, May 15, 2014 at 05:41:53PM +0200, Jan Kara wrote:
> Hello,
>
> so I was recently thinking about how writeback code shuffles inodes between
> lists and also how redirty_tail() clobbers dirtied_when timestamp (which broke
> my sync(2) optimization). This patch series came out of that. Patch 1 is a
> clear win and just needs an independent review that I didn't forget about
> something. Patch 3 changes writeback list handling - IMHO it makes the logic
> somewhat more straightforward as we don't have to bother shuffling inodes
> between lists and we also don't need to clobber dirtied_when timestamp.
> But opinions may differ...
>
> Patches passed an xfstests run and I did some basic writeback tests using tiobench
> and some artificial sync livelock tests to verify nothing regressed. So I'd
> be happy if people could have a look.
Performance regresses significantly.
Test is on a 16p/16GB VM with a sparse 100TB XFS filesystem backed
by a pair of SSDs in RAID0:
./fs_mark -D 10000 -S0 -n 10000 -s 4096 -L 120 -d
/mnt/scratch/0 -d /mnt/scratch/1 -d /mnt/scratch/2 -d
/mnt/scratch/3 -d /mnt/scratch/4 -d /mnt/scratch/5 -d
/mnt/scratch/6 -d /mnt/scratch/7
That creates 10 million 4k files with 16 threads and 10000 files per
directory. No sync/fsync is done, so it's a pure background
writeback workload.
For 0-400,000 files, it runs in memory, at 400-800k files background
writeback is occurring, at > 800k files foreground throttling is
occurring.
The file create rates and write IOPS/bw are:
                        vanilla                  patched
load point        files  iops       bw     files  iops       bw
< bg thres         120k     0        0      110k     0        0
< fg thres         120k   37k  210MB/s       60k   20k  130MB/s
sustained           36k   37k  210MB/s       25k   28k  150MB/s
The average create rate is 40k (vanilla) vs 28k (patched). Wall
times:
          vanilla       patched
real    4m27.475s     6m29.364s
user     1m7.072s      1m3.590s
sys     10m0.836s    22m34.362s
The new code burns more than twice the system CPU whilst going
significantly slower.
I haven't done any further investigation to determine which patch
causes the regression, but it's large enough that you should be able
to reproduce it.
BTW, while touching this code, we should also add plugging at the
upper inode writeback level - it provides a 20% performance boost to
this workload. The numbers in the patch description below are old,
but I just verified 3.15-rc5 gives the same scale of improvement.
e.g. it almost completely negates the throughput and wall time
regressions that this patchset introduces:
                        vanilla                  patched             patched+plug
load point        files  iops       bw     files  iops       bw     files  iops       bw
< bg thres         120k     0        0      110k     0        0      120k     0        0
< fg thres         120k   37k  210MB/s       60k   20k  130MB/s       80k    1k  180MB/s
sustained           36k   37k  210MB/s       25k   28k  150MB/s       33k  1.5k  200MB/s

real          4m27.475s              6m29.364s              4m40.524s
user           1m7.072s               1m3.590s              0m55.819s
sys           10m0.836s             22m34.362s              18m8.130s
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
writeback: plug writeback at a high level
From: Dave Chinner <dchinner@redhat.com>
Doing writeback on lots of little files causes terrible IOPS storms
because of the per-mapping writeback plugging we do. This
essentially causes immediate dispatch of IO for each mapping,
regardless of the context in which writeback is occurring.
IOWs, running a concurrent write-lots-of-small-4k-files workload using fsmark
on XFS results in a huge number of IOPS being issued for data
writes. Metadata writes are sorted and plugged at a high level by
XFS, so aggregate nicely into large IOs. However, data writeback IOs
are dispatched in individual 4k IOs, even when the blocks of two
consecutively written files are adjacent.
Test VM: 8p, 8GB RAM, 4xSSD in RAID0, 100TB sparse XFS filesystem,
metadata CRCs enabled.
Kernel: 3.10-rc5 + xfsdev + my 3.11 xfs queue (~70 patches)
Test:
$ ./fs_mark -D 10000 -S0 -n 10000 -s 4096 -L 120 -d
/mnt/scratch/0 -d /mnt/scratch/1 -d /mnt/scratch/2 -d
/mnt/scratch/3 -d /mnt/scratch/4 -d /mnt/scratch/5 -d
/mnt/scratch/6 -d /mnt/scratch/7
Result:
             wall     sys      create rate     Physical write IO
             time     CPU      (avg files/s)   IOPS     Bandwidth
             -----    ------   -------------   ------   ---------
unpatched    6m56s    15m47s   24,000+/-500    26,000   130MB/s
patched      5m06s    13m28s   32,800+/-600     1,500   180MB/s
improvement  -26.44%  -14.68%     +36.67%      -94.23%  +38.46%
If I use zero length files, this workload runs at about 500 IOPS, so
plugging drops the data IOs from roughly 25,500/s to 1,000/s.
3 lines of code, 35% better throughput for 15% less CPU.
The benefits of plugging at this layer are likely to be higher for
spinning media, as the IO patterns for this workload are going to make
a much bigger difference on high IO latency devices.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/fs-writeback.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 426ff81..7cd2b3a 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -505,6 +505,9 @@ static long writeback_inodes(struct bdi_writeback *wb,
long write_chunk;
long wrote = 0; /* count both pages and inodes */
struct inode *inode, *next;
+ struct blk_plug plug;
+
+ blk_start_plug(&plug);
restart:
/* We use list_safe_reset_next() to make the list iteration safe */
@@ -603,6 +606,7 @@ restart:
break;
}
}
+ blk_finish_plug(&plug);
return wrote;
}