* [Cluster-devel] [GFS2 PATCH 3/3] GFS2: If we use up our block reservation, request more next time
2014-10-24 17:49 [Cluster-devel] [GFS2 PATCH 0/3] Patches to reduce GFS2 fragmentation Bob Peterson
@ 2014-10-24 17:49 ` Bob Peterson
0 siblings, 0 replies; 5+ messages in thread
From: Bob Peterson @ 2014-10-24 17:49 UTC (permalink / raw)
To: cluster-devel.redhat.com
If we run out of blocks for a given multi-block allocation, we obviously
did not reserve enough. We should reserve more blocks for the next
reservation to reduce fragmentation. This patch increases the size hint
for reservations when they run out.
---
fs/gfs2/rgrp.c | 3 +++
fs/gfs2/rgrp.h | 1 +
2 files changed, 4 insertions(+)
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index f65e56b..d4b9d93 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2251,6 +2251,9 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
trace_gfs2_rs(rs, TRACE_RS_CLAIM);
if (rs->rs_free && !ret)
goto out;
+ /* We used up our block reservation, so we should
+ reserve more blocks next time. */
+ atomic_add(RGRP_RSRV_ADDBLKS, &rs->rs_sizehint);
}
__rs_deltree(rs);
}
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index 5d8f085..b104f4a 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -20,6 +20,7 @@
*/
#define RGRP_RSRV_MINBYTES 8
#define RGRP_RSRV_MINBLKS ((u32)(RGRP_RSRV_MINBYTES * GFS2_NBBY))
+#define RGRP_RSRV_ADDBLKS 64
struct gfs2_rgrpd;
struct gfs2_sbd;
--
1.9.3
* [Cluster-devel] [GFS2 PATCH 0/3] Patches to reduce GFS2 fragmentation
@ 2014-10-29 13:02 Bob Peterson
2014-10-29 13:02 ` [Cluster-devel] [GFS2 PATCH 1/3] GFS2: Set of distributed preferences for rgrps Bob Peterson
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Bob Peterson @ 2014-10-29 13:02 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
This is a revised version of my patches to improve GFS2 performance and
reduce GFS2 fragmentation. As suggested by Steve Whitehouse, the first patch
has been modified to base its distribution of resource groups on the number
of journals. It also corrects the criterion on the final loop, changing
"loops < 3" back to "loops < 2" as it should be. Also, I re-tested to make
sure nothing had regressed. I updated the timing numbers below.
Also as suggested, the patch that kept file close from deleting rgrp
reservations has been removed from the set, leaving only three patches.
The removal of that patch caused fragmentation to climb 15 percent, but it is
still much better (41 percent) than the stock kernel. I agree that
use cases like untar would make for much worse fragmentation problems if
that patch were to remain.
I've re-tested with these three patches and found that performance and
fragmentation are still good. Here are numbers:
STOCK RHEL7 kernel (none of these patches):
Run times:
Run 1 time: 2hr 40min 33sec
Run 2 time: 2hr 39min 52sec
Run 3 time: 2hr 39min 31sec
Run 4 time: 2hr 33min 57sec
Run 5 time: 2hr 41min 6sec
Total file extents (File fragmentation):
EXTENT COUNT FOR OUTPUT FILES = 744708
EXTENT COUNT FOR OUTPUT FILES = 749868
EXTENT COUNT FOR OUTPUT FILES = 721862
EXTENT COUNT FOR OUTPUT FILES = 635301
EXTENT COUNT FOR OUTPUT FILES = 689263
Both the run times and the fragmentation level are poor. If I add
just the first patch, "GFS2: Set of distributed preferences for rgrps"
the performance improves, but the fragmentation is worse (I only did three
iterations this time):
Run times:
Run 1 time: 2hr 2min 47sec
Run 2 time: 2hr 8min 37sec
Run 3 time: 2hr 10min 0sec
Total file extents (File fragmentation):
EXTENT COUNT FOR OUTPUT FILES = 1011217
EXTENT COUNT FOR OUTPUT FILES = 1025973
EXTENT COUNT FOR OUTPUT FILES = 1070163
With these three patches, both performance and fragmentation are good:
Run 1 time: 2hr 11min 11sec
Run 2 time: 2hr 8min 57sec
Run 3 time: 2hr 10min 43sec
Run 4 time: 2hr 12min 50sec
Run 5 time: 2hr 15min 32sec
Run 6 time: 2hr 10min 58sec
EXTENT COUNT FOR OUTPUT FILES = 424329
EXTENT COUNT FOR OUTPUT FILES = 430663
EXTENT COUNT FOR OUTPUT FILES = 430186
EXTENT COUNT FOR OUTPUT FILES = 421635
EXTENT COUNT FOR OUTPUT FILES = 416706
EXTENT COUNT FOR OUTPUT FILES = 418887
Regards,
Bob Peterson
Red Hat File Systems
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
Bob Peterson (3):
GFS2: Set of distributed preferences for rgrps
GFS2: Only increase rs_sizehint
GFS2: If we use up our block reservation, request more next time
fs/gfs2/file.c | 3 ++-
fs/gfs2/incore.h | 1 +
fs/gfs2/rgrp.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
fs/gfs2/rgrp.h | 1 +
4 files changed, 68 insertions(+), 6 deletions(-)
--
1.9.3
* [Cluster-devel] [GFS2 PATCH 1/3] GFS2: Set of distributed preferences for rgrps
2014-10-29 13:02 [Cluster-devel] [GFS2 PATCH 0/3] Patches to reduce GFS2 fragmentation Bob Peterson
@ 2014-10-29 13:02 ` Bob Peterson
2014-10-29 13:02 ` [Cluster-devel] [GFS2 PATCH 2/3] GFS2: Only increase rs_sizehint Bob Peterson
2014-10-29 13:02 ` [Cluster-devel] [GFS2 PATCH 3/3] GFS2: If we use up our block reservation, request more next time Bob Peterson
2 siblings, 0 replies; 5+ messages in thread
From: Bob Peterson @ 2014-10-29 13:02 UTC (permalink / raw)
To: cluster-devel.redhat.com
This patch uses the journal numbers to distribute resource group
preferences evenly among the nodes, so that each node prefers a different
set of rgrps for its block allocations. This helps performance.
---
fs/gfs2/incore.h | 1 +
fs/gfs2/rgrp.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 62 insertions(+), 5 deletions(-)
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 39e7e99..1b89918 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -97,6 +97,7 @@ struct gfs2_rgrpd {
#define GFS2_RDF_CHECK 0x10000000 /* check for unlinked inodes */
#define GFS2_RDF_UPTODATE 0x20000000 /* rg is up to date */
#define GFS2_RDF_ERROR 0x40000000 /* error in rg */
+#define GFS2_RDF_PREFERRED 0x80000000 /* This rgrp is preferred */
#define GFS2_RDF_MASK 0xf0000000 /* mask for internal flags */
spinlock_t rd_rsspin; /* protects reservation related vars */
struct rb_root rd_rstree; /* multi-block reservation tree */
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 7474c41..f4e4a0c 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -936,7 +936,7 @@ static int read_rindex_entry(struct gfs2_inode *ip)
rgd->rd_gl->gl_vm.start = rgd->rd_addr * bsize;
rgd->rd_gl->gl_vm.end = rgd->rd_gl->gl_vm.start + (rgd->rd_length * bsize) - 1;
rgd->rd_rgl = (struct gfs2_rgrp_lvb *)rgd->rd_gl->gl_lksb.sb_lvbptr;
- rgd->rd_flags &= ~GFS2_RDF_UPTODATE;
+ rgd->rd_flags &= ~(GFS2_RDF_UPTODATE | GFS2_RDF_PREFERRED);
if (rgd->rd_data > sdp->sd_max_rg_data)
sdp->sd_max_rg_data = rgd->rd_data;
spin_lock(&sdp->sd_rindex_spin);
@@ -955,6 +955,36 @@ fail:
}
/**
+ * set_rgrp_preferences - Run all the rgrps, selecting some we prefer to use
+ * @sdp: the GFS2 superblock
+ *
+ * The purpose of this function is to select a subset of the resource groups
+ * and mark them as PREFERRED. We do it in such a way that each node prefers
+ * to use a unique set of rgrps to minimize glock contention.
+ */
+static void set_rgrp_preferences(struct gfs2_sbd *sdp)
+{
+ struct gfs2_rgrpd *rgd, *first;
+ int i;
+
+ /* Skip an initial number of rgrps, based on this node's journal ID.
+ That should start each node out on its own set. */
+ rgd = gfs2_rgrpd_get_first(sdp);
+ for (i = 0; i < sdp->sd_lockstruct.ls_jid; i++)
+ rgd = gfs2_rgrpd_get_next(rgd);
+ first = rgd;
+
+ do {
+ rgd->rd_flags |= GFS2_RDF_PREFERRED;
+ for (i = 0; i < sdp->sd_journals; i++) {
+ rgd = gfs2_rgrpd_get_next(rgd);
+ if (rgd == first)
+ break;
+ }
+ } while (rgd != first);
+}
+
+/**
* gfs2_ri_update - Pull in a new resource index from the disk
* @ip: pointer to the rindex inode
*
@@ -973,6 +1003,8 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
if (error < 0)
return error;
+ set_rgrp_preferences(sdp);
+
sdp->sd_rindex_uptodate = 1;
return 0;
}
@@ -1891,6 +1923,25 @@ static bool gfs2_select_rgrp(struct gfs2_rgrpd **pos, const struct gfs2_rgrpd *b
}
/**
+ * fast_to_acquire - determine if a resource group will be fast to acquire
+ *
+ * If this is one of our preferred rgrps, it should be quicker to acquire,
+ * because we tried to set ourselves up as dlm lock master.
+ */
+static inline int fast_to_acquire(struct gfs2_rgrpd *rgd)
+{
+ struct gfs2_glock *gl = rgd->rd_gl;
+
+ if (gl->gl_state != LM_ST_UNLOCKED && list_empty(&gl->gl_holders) &&
+ !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) &&
+ !test_bit(GLF_DEMOTE, &gl->gl_flags))
+ return 1;
+ if (rgd->rd_flags & GFS2_RDF_PREFERRED)
+ return 1;
+ return 0;
+}
+
+/**
* gfs2_inplace_reserve - Reserve space in the filesystem
* @ip: the inode to reserve space for
* @ap: the allocation parameters
@@ -1932,10 +1983,15 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, const struct gfs2_alloc_parms *a
rg_locked = 0;
if (skip && skip--)
goto next_rgrp;
- if (!gfs2_rs_active(rs) && (loops < 2) &&
- gfs2_rgrp_used_recently(rs, 1000) &&
- gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
- goto next_rgrp;
+ if (!gfs2_rs_active(rs)) {
+ if (loops == 0 &&
+ !fast_to_acquire(rs->rs_rbm.rgd))
+ goto next_rgrp;
+ if ((loops < 2) &&
+ gfs2_rgrp_used_recently(rs, 1000) &&
+ gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
+ goto next_rgrp;
+ }
error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
LM_ST_EXCLUSIVE, flags,
&rs->rs_rgd_gh);
--
1.9.3
* [Cluster-devel] [GFS2 PATCH 2/3] GFS2: Only increase rs_sizehint
2014-10-29 13:02 [Cluster-devel] [GFS2 PATCH 0/3] Patches to reduce GFS2 fragmentation Bob Peterson
2014-10-29 13:02 ` [Cluster-devel] [GFS2 PATCH 1/3] GFS2: Set of distributed preferences for rgrps Bob Peterson
@ 2014-10-29 13:02 ` Bob Peterson
2014-10-29 13:02 ` [Cluster-devel] [GFS2 PATCH 3/3] GFS2: If we use up our block reservation, request more next time Bob Peterson
2 siblings, 0 replies; 5+ messages in thread
From: Bob Peterson @ 2014-10-29 13:02 UTC (permalink / raw)
To: cluster-devel.redhat.com
If an application does a big write followed by a little write, we don't
necessarily want to reset the size hint based on the smaller size. The fact
that it did a big write implies it may do more, and therefore we should keep
allocating bigger block reservations, even if the last few writes were small.
Therefore this patch changes function gfs2_size_hint so that the size hint
can only grow; it cannot shrink. This is especially important where there
are multiple writers.
---
fs/gfs2/file.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 80dd44d..5ebe568 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -337,7 +337,8 @@ static void gfs2_size_hint(struct file *filep, loff_t offset, size_t size)
size_t blks = (size + sdp->sd_sb.sb_bsize - 1) >> sdp->sd_sb.sb_bsize_shift;
int hint = min_t(size_t, INT_MAX, blks);
- atomic_set(&ip->i_res->rs_sizehint, hint);
+ if (hint > atomic_read(&ip->i_res->rs_sizehint))
+ atomic_set(&ip->i_res->rs_sizehint, hint);
}
/**
--
1.9.3
* [Cluster-devel] [GFS2 PATCH 3/3] GFS2: If we use up our block reservation, request more next time
2014-10-29 13:02 [Cluster-devel] [GFS2 PATCH 0/3] Patches to reduce GFS2 fragmentation Bob Peterson
2014-10-29 13:02 ` [Cluster-devel] [GFS2 PATCH 1/3] GFS2: Set of distributed preferences for rgrps Bob Peterson
2014-10-29 13:02 ` [Cluster-devel] [GFS2 PATCH 2/3] GFS2: Only increase rs_sizehint Bob Peterson
@ 2014-10-29 13:02 ` Bob Peterson
2 siblings, 0 replies; 5+ messages in thread
From: Bob Peterson @ 2014-10-29 13:02 UTC (permalink / raw)
To: cluster-devel.redhat.com
If we run out of blocks for a given multi-block allocation, we obviously
did not reserve enough. We should reserve more blocks for the next
reservation to reduce fragmentation. This patch increases the size hint
for reservations when they run out.
---
fs/gfs2/rgrp.c | 3 +++
fs/gfs2/rgrp.h | 1 +
2 files changed, 4 insertions(+)
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index f4e4a0c..9150207 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2251,6 +2251,9 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
trace_gfs2_rs(rs, TRACE_RS_CLAIM);
if (rs->rs_free && !ret)
goto out;
+ /* We used up our block reservation, so we should
+ reserve more blocks next time. */
+ atomic_add(RGRP_RSRV_ADDBLKS, &rs->rs_sizehint);
}
__rs_deltree(rs);
}
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index 5d8f085..b104f4a 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -20,6 +20,7 @@
*/
#define RGRP_RSRV_MINBYTES 8
#define RGRP_RSRV_MINBLKS ((u32)(RGRP_RSRV_MINBYTES * GFS2_NBBY))
+#define RGRP_RSRV_ADDBLKS 64
struct gfs2_rgrpd;
struct gfs2_sbd;
--
1.9.3