From: Mark Syms <mark.syms@citrix.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 1/2] Add some randomisation to the GFS2 resource group allocator
Date: Thu, 20 Sep 2018 15:52:12 +0100
Message-ID: <1537455133-48589-2-git-send-email-mark.syms@citrix.com>
In-Reply-To: <1537455133-48589-1-git-send-email-mark.syms@citrix.com>
From: Tim Smith <tim.smith@citrix.com>
When growing a number of files on the same cluster node from different
threads (e.g. fio with 20 or so jobs), all those threads pile into
gfs2_inplace_reserve() independently looking to claim a new resource
group and after a while they all synchronise, getting through the
gfs2_rgrp_used_recently()/gfs2_rgrp_congested() check together.
When this happens, write performance drops to about 1/5 of its usual
level on a single-node cluster, and on multi-node clusters it drops to
near zero on some nodes. At that point the output from
"glocktop -r -H -d 1" begins to show many processes stuck in
gfs2_inplace_reserve(), waiting on a resource group lock.
This commit adds a module parameter, skippy_rgrp_alloc, which, when set
to 1, adds random jitter to the first two passes of
gfs2_inplace_reserve() when trying to lock a new resource group. The
candidate resource group is skipped 1/2 the time at first, and each
skip makes a further skip progressively less likely.
Signed-off-by: Tim Smith <tim.smith@citrix.com>
---
fs/gfs2/rgrp.c | 39 +++++++++++++++++++++++++++++++++++----
1 file changed, 35 insertions(+), 4 deletions(-)
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 1ad3256..994eb7f 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -19,6 +19,7 @@
#include <linux/blkdev.h>
#include <linux/rbtree.h>
#include <linux/random.h>
+#include <linux/module.h>
#include "gfs2.h"
#include "incore.h"
@@ -49,6 +50,11 @@
#define LBITSKIP00 (0x0000000000000000UL)
#endif
+static int gfs2_skippy_rgrp_alloc;
+
+module_param_named(skippy_rgrp_alloc, gfs2_skippy_rgrp_alloc, int, 0644);
+MODULE_PARM_DESC(skippy_rgrp_alloc, "Set skippiness of resource group allocator, 0|1. Where 1 will cause resource groups to be randomly skipped with the likelihood of skipping progressively decreasing after a skip has occurred.");
+
/*
* These routines are used by the resource group routines (rgrp.c)
* to keep track of block allocation. Each block is represented by two
@@ -2016,6 +2022,11 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
u64 last_unlinked = NO_BLOCK;
int loops = 0;
u32 free_blocks, skip = 0;
+ /*
+ * gfs2_skippy_rgrp_alloc provides our initial skippiness.
+ * randskip will thus be 2-255 if we want it to do anything.
+ */
+ u8 randskip = gfs2_skippy_rgrp_alloc + 1;
if (sdp->sd_args.ar_rgrplvb)
flags |= GL_SKIP;
@@ -2046,10 +2057,30 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
if (loops == 0 &&
!fast_to_acquire(rs->rs_rbm.rgd))
goto next_rgrp;
- if ((loops < 2) &&
- gfs2_rgrp_used_recently(rs, 1000) &&
- gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
- goto next_rgrp;
+ if (loops < 2) {
+ /*
+ * If resource group allocation is requested to be skippy,
+ * roll a hypothetical die of <randskip> sides and skip
+ * straight to the next resource group anyway if it comes
+ * up 1.
+ */
+ if (gfs2_skippy_rgrp_alloc) {
+ u8 jitter;
+
+ prandom_bytes(&jitter, sizeof(jitter));
+ if ((jitter % randskip) == 0) {
+ /*
+ * If we are choosing to skip, bump randskip to make it
+ * successively less likely that we will skip again
+ */
+ randskip++;
+ goto next_rgrp;
+ }
+ }
+ if (gfs2_rgrp_used_recently(rs, 1000) &&
+ gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
+ goto next_rgrp;
+ }
}
error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
LM_ST_EXCLUSIVE, flags,
--
1.8.3.1
Thread overview: 18+ messages
2018-09-20 14:52 [Cluster-devel] [PATCH 0/2] GFS2: inplace_reserve performance improvements Mark Syms
2018-09-20 14:52 ` Mark Syms [this message]
2018-09-20 14:52 ` [Cluster-devel] [PATCH 2/2] GFS2: Avoid recently demoted rgrps Mark Syms
2018-09-20 17:17 ` [Cluster-devel] [PATCH 0/2] GFS2: inplace_reserve performance improvements Bob Peterson
2018-09-20 17:47 ` Mark Syms
2018-09-20 18:16 ` Steven Whitehouse
2018-09-28 12:23 ` Bob Peterson
2018-09-28 12:36 ` Mark Syms
2018-09-28 12:50 ` Mark Syms
2018-09-28 13:18 ` Steven Whitehouse
2018-09-28 13:43 ` Tim Smith
2018-09-28 13:59 ` Bob Peterson
2018-09-28 14:11 ` Mark Syms
2018-09-28 15:09 ` Tim Smith
2018-09-28 15:09 ` Steven Whitehouse
2018-09-28 12:55 ` Bob Peterson
2018-09-28 13:56 ` Mark Syms
2018-10-02 13:50 ` Mark Syms