From: Mark Syms <mark.syms@citrix.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 0/2] GFS2: inplace_reserve performance improvements
Date: Thu, 20 Sep 2018 15:52:11 +0100
Message-ID: <1537455133-48589-1-git-send-email-mark.syms@citrix.com>
While testing GFS2 as a storage repository for virtual machines we
discovered a number of scenarios where performance was
pathologically poor.
The scenarios simplify to the following:
* On a single host in the cluster, grow a number of files to a
significant proportion of the filesystem's LUN size, exceeding the
host's preferred resource group allocation. This can be replicated
by using fio to write to 20 different files with a script like:
[test-files]
directory=gfs2/a:gfs2/b:gfs2/c:gfs2/d:gfs2/e:gfs2/f:gfs2/g:gfs2/h:gfs2/i:gfs2/j:gfs2/k:gfs2/l:gfs2/m:gfs2/n:gfs2/o:gfs2/p:gfs2/q:gfs2/r:gfs2/s:gfs2/t
nrfiles=1
size=20G
bs=512k
rw=write
buffered=0
ioengine=libaio
fallocate=none
numjobs=20
After starting off at network wire speed, throughput rapidly
degrades, with the fio processes reporting large sys time.
This was diagnosed as all the processes contending on the glock in
gfs2_inplace_reserve, having all selected the same resource
group. Patch 1 addresses this with an optional module parameter
which enables "randomly" skipping a selected resource group in the
first two passes of gfs2_inplace_reserve in order to spread the
processes out.
It is worth noting that this would probably also be addressed if
the comment in Documentation/filesystems/gfs2-glocks.txt about
eventually making the glock EX mode locally shared were implemented.
However, that looks like it would require quite a bit of
coordination and design, so this stop-gap helps in the meantime.
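As an illustration only, the skipping behaviour can be sketched in
userspace C as below. This is not the code from patch 1: pick_rgrp,
its parameters and the pre-drawn pool of random bits are hypothetical
names made up for the sketch, and it assumes three passes over the
resource groups with a plain free-block check standing in for the
real reservation logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PASSES 3 /* assumption: two restricted passes, then a final one */

/*
 * Pick a resource group with at least `wanted` free blocks, scanning
 * circularly from `start`. Returns its index, or -1 if nothing fits.
 *
 * skip_enabled stands in for the optional module parameter (hypothetical
 * name). random_bits is a pre-drawn pool of random bits, used so the
 * sketch is deterministic; kernel code would call a PRNG instead. Each
 * skip decision consumes the lowest bit of the pool.
 */
static int pick_rgrp(const uint32_t *free_blocks, size_t nr_rgrps,
                     size_t start, uint32_t wanted,
                     bool skip_enabled, uint64_t *random_bits)
{
    assert(nr_rgrps > 0 && start < nr_rgrps);

    for (int pass = 0; pass < NR_PASSES; pass++) {
        for (size_t n = 0; n < nr_rgrps; n++) {
            size_t i = (start + n) % nr_rgrps;

            if (free_blocks[i] < wanted)
                continue;

            /* Passes 0 and 1 may skip a usable group on a coin flip. */
            if (skip_enabled && pass < 2) {
                bool skip = *random_bits & 1;

                *random_bits >>= 1;
                if (skip)
                    continue;
            }
            return (int)i;
        }
    }
    return -1;
}
```

With skipping enabled, callers holding different random bits tend to
settle on different resource groups, while the unrestricted last pass
still lets an allocation succeed whenever some group has room.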
* With two or more hosts growing files at high data rates, the
throughput drops to a small proportion of the maximum storage
I/O. This is the scenario of several VMs all writing to the
filesystem. Sometimes this test would run cleanly at 80-90% of
storage wire speed, but at other times the performance would drop
on one or more hosts to a small number of KiB/s.
This was diagnosed as the hosts repeatedly bouncing resource group
glocks between them, because different hosts selected the same
resource group (having exhausted their preferred groups).
Patch 2 addresses this by:
  * adding a hold delay to the resource group glock if there are
    local waiters, following the pattern already in place for
    inodes. This should also provide more data for
    gfs2_rgrp_congested to work on.
  * remembering when we were last asked to demote the lock on a
    resource group
  * in the first two passes in gfs2_inplace_reserve, avoiding
    resource groups where we have been asked to demote the glock
    within the last second
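A rough userspace sketch of the last two points follows, for
illustration rather than as the code in patch 2. The struct, the
function names and the millisecond timestamps are all hypothetical
(kernel code would more likely track this in jiffies), the three-pass
loop is an assumption, and the hold delay from the first point is not
modelled. Callers are assumed to pass a now_ms that is not earlier
than any recorded demote time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMOTE_AVOID_MS 1000 /* "within the last second" */
#define NR_PASSES 3          /* assumption: two restricted passes, then a final one */

/* Hypothetical per-resource-group state for the sketch. */
struct rgrp_sketch {
    uint32_t free_blocks;
    bool demote_seen;        /* has any demote request been recorded? */
    uint64_t last_demote_ms; /* time of the most recent demote request */
};

/* Record that another node asked us to demote this group's glock. */
static void note_demote_request(struct rgrp_sketch *rg, uint64_t now_ms)
{
    rg->demote_seen = true;
    rg->last_demote_ms = now_ms;
}

/* True if a demote request arrived less than DEMOTE_AVOID_MS ago. */
static bool recently_demoted(const struct rgrp_sketch *rg, uint64_t now_ms)
{
    return rg->demote_seen &&
           now_ms - rg->last_demote_ms < DEMOTE_AVOID_MS;
}

/*
 * Pick a group with at least `wanted` free blocks, scanning circularly
 * from `start`. Passes 0 and 1 avoid recently demoted groups; the last
 * pass accepts any group that fits. Returns the index, or -1.
 */
static int pick_rgrp_avoiding_demoted(const struct rgrp_sketch *rgs,
                                      size_t nr_rgrps, size_t start,
                                      uint32_t wanted, uint64_t now_ms)
{
    assert(nr_rgrps > 0 && start < nr_rgrps);

    for (int pass = 0; pass < NR_PASSES; pass++) {
        for (size_t n = 0; n < nr_rgrps; n++) {
            size_t i = (start + n) % nr_rgrps;

            if (rgs[i].free_blocks < wanted)
                continue;
            if (pass < 2 && recently_demoted(&rgs[i], now_ms))
                continue;
            return (int)i;
        }
    }
    return -1;
}
```

The point of the unrestricted final pass is that a host whose every
candidate group is contended still makes progress instead of failing
the allocation.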
Mark Syms (1):
GFS2: Avoid recently demoted rgrps.
Tim Smith (1):
Add some randomisation to the GFS2 resource group allocator
fs/gfs2/glock.c | 7 +++++--
fs/gfs2/incore.h | 2 ++
fs/gfs2/main.c | 1 +
fs/gfs2/rgrp.c | 49 +++++++++++++++++++++++++++++++++++++++++++++----
fs/gfs2/trace_gfs2.h | 12 +++++++++---
5 files changed, 62 insertions(+), 9 deletions(-)
--
1.8.3.1
Thread overview: 18+ messages
2018-09-20 14:52 Mark Syms [this message]
2018-09-20 14:52 ` [Cluster-devel] [PATCH 1/2] Add some randomisation to the GFS2 resource group allocator Mark Syms
2018-09-20 14:52 ` [Cluster-devel] [PATCH 2/2] GFS2: Avoid recently demoted rgrps Mark Syms
2018-09-20 17:17 ` [Cluster-devel] [PATCH 0/2] GFS2: inplace_reserve performance improvements Bob Peterson
2018-09-20 17:47 ` Mark Syms
2018-09-20 18:16 ` Steven Whitehouse
2018-09-28 12:23 ` Bob Peterson
2018-09-28 12:36 ` Mark Syms
2018-09-28 12:50 ` Mark Syms
2018-09-28 13:18 ` Steven Whitehouse
2018-09-28 13:43 ` Tim Smith
2018-09-28 13:59 ` Bob Peterson
2018-09-28 14:11 ` Mark Syms
2018-09-28 15:09 ` Tim Smith
2018-09-28 15:09 ` Steven Whitehouse
2018-09-28 12:55 ` Bob Peterson
2018-09-28 13:56 ` Mark Syms
2018-10-02 13:50 ` Mark Syms