cluster-devel.redhat.com archive mirror
From: Andreas Gruenbacher <agruenba@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 08/11] gfs2: Add per-reservation reserved block accounting
Date: Fri,  5 Oct 2018 21:18:51 +0200
Message-ID: <20181005191854.2566-9-agruenba@redhat.com>
In-Reply-To: <20181005191854.2566-1-agruenba@redhat.com>

Add a rs_reserved field to struct gfs2_blkreserv to keep track of the
number of blocks reserved by this particular reservation.  When making a
reservation with gfs2_inplace_reserve, this field is set to somewhere
between ap->min_target and ap->target depending on the number of free
blocks in the resource group.  When allocating blocks with
gfs2_alloc_blocks, rs_reserved is decremented accordingly (via
gfs2_adjust_reservation).  Eventually, any reserved but not consumed
blocks are returned to the resource group by gfs2_inplace_release.
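
As a rough illustration of the accounting lifecycle described above, here
is a simplified userspace model.  It is only a sketch: the model_* names
are hypothetical stand-ins for the gfs2 structures and functions, locking
(rd_rsspin) and the multi-pass rgrp search are left out, and the reserve
step is reduced to a single min_target check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct gfs2_rgrpd (accounting fields only). */
struct model_rgrp {
	uint32_t rd_free_clone;	/* free blocks in the resource group */
	uint32_t rd_reserved;	/* sum of rs_reserved over all reservations */
};

/* Hypothetical stand-in for struct gfs2_blkreserv. */
struct model_resv {
	struct model_rgrp *rs_rgd;
	uint32_t rs_reserved;	/* blocks reserved by this reservation */
};

/*
 * Models the reserve step: take target blocks, clamped to what the
 * resource group has available.  Returns 0 on success, -1 if fewer than
 * min_target blocks are available (assumption: one pass, no retries).
 */
int model_reserve(struct model_resv *rs, struct model_rgrp *rgd,
		  uint32_t min_target, uint32_t target)
{
	uint32_t free_blocks = rgd->rd_free_clone - rgd->rd_reserved;

	assert(rs->rs_reserved == 0);
	if (free_blocks < min_target)
		return -1;
	rs->rs_rgd = rgd;
	rs->rs_reserved = target < free_blocks ? target : free_blocks;
	rgd->rd_reserved += rs->rs_reserved;
	return 0;
}

/* Models the allocation step: consume len blocks of the reservation. */
void model_alloc(struct model_resv *rs, uint32_t len)
{
	assert(rs->rs_reserved >= len);
	rs->rs_rgd->rd_free_clone -= len;
	rs->rs_rgd->rd_reserved -= len;
	rs->rs_reserved -= len;
}

/* Models the release step: return reserved but unconsumed blocks. */
void model_release(struct model_resv *rs)
{
	if (rs->rs_reserved) {
		assert(rs->rs_rgd->rd_reserved >= rs->rs_reserved);
		rs->rs_rgd->rd_reserved -= rs->rs_reserved;
		rs->rs_reserved = 0;
	}
}
```

In this model, starting from 100 free blocks, reserving with a target of
64 and then allocating 10 blocks leaves rs_reserved and rd_reserved at 54;
releasing afterwards brings both back to 0.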

The reservation tree (rd_rstree) is unaffected by this change: the
reservations it tracks are still advisory, and the sizes of those
reservations (rs_free) are still determined by the tentative allocation
sizes (i_sizehint).  Since rd_reserved now tracks the number of reserved
blocks rather than the number of tentatively reserved blocks, we may
end up with slightly different allocation patterns, though.  The
rd_extfail_pt optimization will still cause ill-suited resource groups
to be skipped quickly.
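
To make the new meaning of rd_reserved concrete, the following sketch
mirrors the arithmetic of the reworked rgd_free in userspace.  The
sketch_* names are hypothetical, the structures carry only the fields
involved, and the code assumes rd_reserved never exceeds rd_free_clone.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct gfs2_rgrpd (accounting fields only). */
struct sketch_rgrp {
	uint32_t rd_free_clone;	/* free blocks in the resource group */
	uint32_t rd_reserved;	/* blocks reserved by all reservations */
};

/* Hypothetical stand-in for struct gfs2_blkreserv. */
struct sketch_resv {
	const struct sketch_rgrp *rs_rgd;	/* rgrp reserved in, or NULL */
	uint32_t rs_reserved;			/* blocks reserved by us */
};

/*
 * Blocks that rs may allocate in rgd: everything that is free and not
 * reserved, plus our own reserved blocks if they live in this rgrp.
 */
uint32_t sketch_rgd_free(const struct sketch_rgrp *rgd,
			 const struct sketch_resv *rs)
{
	uint32_t free = rgd->rd_free_clone - rgd->rd_reserved;

	if (rgd == rs->rs_rgd)
		free += rs->rs_reserved;
	return free;
}
```

For example, with rd_free_clone = 100 and rd_reserved = 30, a reservation
holding 20 of those 30 blocks sees 90 allocatable blocks, while a
reservation attached to a different resource group sees 70.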

We expect to augment this with a patch that will reserve an extent of
blocks rather than just reserving a number of blocks in
gfs2_inplace_reserve.  gfs2_alloc_blocks will then be able to consume
that reserved extent before scanning for additional available blocks;
this should eliminate double bitmap scanning in most cases.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c       |  4 +--
 fs/gfs2/incore.h     |  1 +
 fs/gfs2/rgrp.c       | 66 +++++++++++++++++++++++++-------------------
 fs/gfs2/trace_gfs2.h |  8 ++++--
 4 files changed, 46 insertions(+), 33 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index e8864ff2ed03..12c19e3fcb1b 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1008,8 +1008,8 @@ static long __gfs2_fallocate(struct file *file, int mode, loff_t offset, loff_t
 			goto out_qunlock;
 
 		/* check if the selected rgrp limits our max_blks further */
-		if (ap.allowed && ap.allowed < max_blks)
-			max_blks = ap.allowed;
+		if (ip->i_res.rs_reserved < max_blks)
+			max_blks = ip->i_res.rs_reserved;
 
 		/* Almost done. Calculate bytes that can be written using
 		 * max_blks. We also recompute max_bytes, data_blocks and
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 0ed28fbc73b4..932e63924f7e 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -288,6 +288,7 @@ struct gfs2_blkreserv {
 	struct gfs2_rgrpd *rs_rgd;
 	u64 rs_start;		      /* start of reservation */
 	u32 rs_free;                  /* how many blocks are still free */
+	u32 rs_reserved;              /* number of reserved blocks */
 };
 
 /*
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index ee981085db33..ef6768bcff21 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -674,9 +674,6 @@ static void __rs_deltree(struct gfs2_blkreserv *rs)
 	if (rs->rs_free) {
 		struct gfs2_bitmap *start, *last;
 
-		/* return reserved blocks to the rgrp */
-		BUG_ON(rs->rs_rgd->rd_reserved < rs->rs_free);
-		rs->rs_rgd->rd_reserved -= rs->rs_free;
 		/* The rgrp extent failure point is likely not to increase;
 		   it will only do so if the freed blocks are somehow
 		   contiguous with a span of free blocks that follows. Still,
@@ -1543,39 +1540,27 @@ static void rs_insert(struct gfs2_inode *ip)
 
 	rb_link_node(&rs->rs_node, parent, newn);
 	rb_insert_color(&rs->rs_node, &rgd->rd_rstree);
-
-	/* Do our rgrp accounting for the reservation */
-	rgd->rd_reserved += rs->rs_free; /* blocks reserved */
 	spin_unlock(&rgd->rd_rsspin);
 	trace_gfs2_rs(rs, TRACE_RS_INSERT);
 }
 
 /**
- * rgd_free - return the number of free blocks we can allocate.
+ * rgd_free - compute the number of blocks we can allocate
  * @rgd: the resource group
  *
- * This function returns the number of free blocks for an rgrp.
- * That's the clone-free blocks (blocks that are free, not including those
- * still being used for unlinked files that haven't been deleted.)
- *
- * It also subtracts any blocks reserved by someone else, but does not
- * include free blocks that are still part of our current reservation,
- * because obviously we can (and will) allocate them.
+ * Compute the number of blocks we can allocate in @rgd.  That's the clone-free
+ * blocks (blocks that are free, not including those still being used for
+ * unlinked files that haven't been deleted) minus the blocks currently
+ * reserved by any reservations other than @rs.
  */
 static inline u32 rgd_free(struct gfs2_rgrpd *rgd, struct gfs2_blkreserv *rs)
 {
-	u32 tot_reserved, tot_free;
-
-	if (WARN_ON_ONCE(rgd->rd_reserved < rs->rs_free))
-		return 0;
-	tot_reserved = rgd->rd_reserved - rs->rs_free;
+	u32 free;
 
-	if (rgd->rd_free_clone < tot_reserved)
-		tot_reserved = 0;
-
-	tot_free = rgd->rd_free_clone - tot_reserved;
-
-	return tot_free;
+	free = rgd->rd_free_clone - rgd->rd_reserved;
+	if (rgd == rs->rs_rgd)
+		free += rs->rs_reserved;
+	return free;
 }
 
 /**
@@ -2058,8 +2043,7 @@ static inline int fast_to_acquire(struct gfs2_rgrpd *rgd)
  * We try our best to find an rgrp that has at least ap->target blocks
  * available. After a couple of passes (loops == 2), the prospects of finding
  * such an rgrp diminish. At this stage, we return the first rgrp that has
- * at least ap->min_target blocks available. Either way, we set ap->allowed to
- * the number of blocks available in the chosen rgrp.
+ * at least ap->min_target blocks available.
  *
  * Returns: 0 on success,
  *          -ENOMEM if a suitable rgrp can't be found
@@ -2076,6 +2060,8 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 	int loops = 0;
 	u32 free_blocks, skip = 0;
 
+	BUG_ON(rs->rs_reserved);
+
 	if (sdp->sd_args.ar_rgrplvb)
 		flags |= GL_SKIP;
 	if (gfs2_assert_warn(sdp, ap->target))
@@ -2149,7 +2135,14 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 		if (free_blocks >= ap->target ||
 		    (loops == 2 && ap->min_target &&
 		     free_blocks >= ap->min_target)) {
-			ap->allowed = free_blocks;
+			struct gfs2_rgrpd *rgd = rs->rs_rgd;
+
+			rs->rs_reserved = ap->target;
+			if (rs->rs_reserved > free_blocks)
+				rs->rs_reserved = free_blocks;
+			spin_lock(&rs->rs_rgd->rd_rsspin);
+			rgd->rd_reserved += rs->rs_reserved;
+			spin_unlock(&rs->rs_rgd->rd_rsspin);
 			return 0;
 		}
 check_rgrp:
@@ -2201,6 +2194,17 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
 
 void gfs2_inplace_release(struct gfs2_inode *ip)
 {
+	struct gfs2_blkreserv *rs = &ip->i_res;
+
+	if (rs->rs_reserved) {
+		struct gfs2_rgrpd *rgd = rs->rs_rgd;
+
+		spin_lock(&rgd->rd_rsspin);
+		BUG_ON(rgd->rd_reserved < rs->rs_reserved);
+		rgd->rd_reserved -= rs->rs_reserved;
+		spin_unlock(&rs->rs_rgd->rd_rsspin);
+		rs->rs_reserved = 0;
+	}
 	if (gfs2_holder_initialized(&ip->i_rgd_gh))
 		gfs2_glock_dq_uninit(&ip->i_rgd_gh);
 }
@@ -2345,6 +2349,9 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 	struct gfs2_rgrpd *rgd = rbm->rgd;
 
 	spin_lock(&rgd->rd_rsspin);
+	BUG_ON(rs->rs_reserved < len);
+	rgd->rd_reserved -= len;
+	rs->rs_reserved -= len;
 	if (gfs2_rs_active(rs)) {
 		u64 start = gfs2_rbm_to_block(rbm);
 
@@ -2354,7 +2361,6 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 			rs->rs_start += len;
 			rlen = min(rs->rs_free, len);
 			rs->rs_free -= rlen;
-			rgd->rd_reserved -= rlen;
 			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
 			if (rs->rs_start < rgd->rd_data0 + rgd->rd_data &&
 			    rs->rs_free)
@@ -2417,6 +2423,8 @@ int gfs2_alloc_blocks(struct gfs2_inode *ip, u64 *bn, unsigned int *nblocks,
 	u64 block; /* block, within the file system scope */
 	int error;
 
+	BUG_ON(ip->i_res.rs_reserved < *nblocks);
+
 	gfs2_set_alloc_start(&rbm, ip, dinode);
 	error = gfs2_rbm_find(&rbm, GFS2_BLKST_FREE, NULL, ip, false);
 
diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
index 7586c7629497..282fcb1a242f 100644
--- a/fs/gfs2/trace_gfs2.h
+++ b/fs/gfs2/trace_gfs2.h
@@ -598,6 +598,7 @@ TRACE_EVENT(gfs2_rs,
 		__field(	u64,	inum			)
 		__field(	u64,	start			)
 		__field(	u32,	free			)
+		__field(	u32,	reserved		)
 		__field(	u8,	func			)
 	),
 
@@ -610,17 +611,20 @@ TRACE_EVENT(gfs2_rs,
 						       i_res)->i_no_addr;
 		__entry->start		= rs->rs_start;
 		__entry->free		= rs->rs_free;
+		__entry->reserved	= rs->rs_reserved;
 		__entry->func		= func;
 	),
 
-	TP_printk("%u,%u bmap %llu resrv %llu rg:%llu rf:%lu rr:%lu %s f:%lu",
+	TP_printk("%u,%u bmap %llu resrv %llu rg:%llu rf:%lu rr:%lu %s f:%lu r:%lu",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  (unsigned long long)__entry->inum,
 		  (unsigned long long)__entry->start,
 		  (unsigned long long)__entry->rd_addr,
 		  (unsigned long)__entry->rd_free_clone,
 		  (unsigned long)__entry->rd_reserved,
-		  rs_func_name(__entry->func), (unsigned long)__entry->free)
+		  rs_func_name(__entry->func),
+		  (unsigned long)__entry->free,
+		  (unsigned long)__entry->reserved)
 );
 
 #endif /* _TRACE_GFS2_H */
-- 
2.17.1
