From mboxrd@z Thu Jan  1 00:00:00 1970
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Fri, 16 Dec 2016 14:40:10 +0000
Subject: [Cluster-devel] [GFS2 PATCH] GFS2: Limit number of transaction
 blocks requested for truncates
In-Reply-To: <1207853613.13078711.1481897777538.JavaMail.zimbra@redhat.com>
References: <1207853613.13078711.1481897777538.JavaMail.zimbra@redhat.com>
Message-ID: <5853FCCA.8000809@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

On 16/12/16 14:16, Bob Peterson wrote:
> Hi,
>
> This patch limits the number of transaction blocks requested during
> file truncates. If we have very large multi-terabyte files, and want
> to delete or truncate them, they might span so many resource groups
> that we overflow the journal blocks, and cause an assert failure.
> By limiting the number of blocks in the transaction, we prevent this
> overflow and give other running processes time to do transactions.
>
> Note that this is a temporary solution. A long-term solution would
> be to implement a non-recursive truncate that processes one
> resource group at a time, so that each transaction needs far
> fewer blocks.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index 645721f..77f2ab1 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -720,6 +720,7 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh,
>   {
>   	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
>   	struct gfs2_rgrp_list rlist;
> +	struct gfs2_trans *tr;
>   	u64 bn, bstart;
>   	u32 blen, btotal;
>   	__be64 *p;
> @@ -728,6 +729,7 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh,
>   	unsigned int revokes = 0;
>   	int x;
>   	int error;
> +	int jblocks_rqsted;
>   
>   	error = gfs2_rindex_update(sdp);
>   	if (error)
> @@ -791,12 +793,17 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh,
>   	if (gfs2_rs_active(&ip->i_res)) /* needs to be done with the rgrp glock held */
>   		gfs2_rs_deltree(&ip->i_res);
>   
> -	error = gfs2_trans_begin(sdp, rg_blocks + RES_DINODE +
> -				 RES_INDIRECT + RES_STATFS + RES_QUOTA,
> -				 revokes);
> +restart:
> +	jblocks_rqsted = rg_blocks + RES_DINODE +
> +		RES_INDIRECT + RES_STATFS + RES_QUOTA +
> +		gfs2_struct2blk(sdp, revokes, sizeof(u64));
> +	if (jblocks_rqsted > RES_MAX)
> +		jblocks_rqsted = RES_MAX;
> +	error = gfs2_trans_begin(sdp, jblocks_rqsted, revokes);
>   	if (error)
>   		goto out_rg_gunlock;
>   
> +	tr = current->journal_info;
>   	down_write(&ip->i_rw_mutex);
>   
>   	gfs2_trans_add_meta(ip->i_gl, dibh);
> @@ -810,6 +817,15 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh,
>   		if (!*p)
>   			continue;
>   
> +		/* check for max reasonable journal transaction blocks */
> +		if (tr->tr_num_buf_new + RES_STATFS + RES_QUOTA >= RES_MAX) {
> +			if (rg_blocks >= tr->tr_num_buf_new)
> +				rg_blocks -= tr->tr_num_buf_new;
> +			else
> +				rg_blocks = 0;
> +			break;
> +		}
> +
>   		bn = be64_to_cpu(*p);
>   
>   		if (bstart + blen == bn)
> @@ -827,6 +843,9 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh,
>   		*p = 0;
>   		gfs2_add_inode_blocks(&ip->i_inode, -1);
>   	}
> +	if (p == bottom)
> +		rg_blocks = 0;
> +
>   	if (bstart) {
>   		__gfs2_free_blocks(ip, bstart, blen, metadata);
>   		btotal += blen;
> @@ -844,6 +863,9 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh,
>   
>   	gfs2_trans_end(sdp);
>   
> +	if (rg_blocks)
> +		goto restart;
> +
>   out_rg_gunlock:
>   	gfs2_glock_dq_m(rlist.rl_rgrps, rlist.rl_ghs);
>   out_rlist:
> diff --git a/fs/gfs2/trans.h b/fs/gfs2/trans.h
> index 1e6e7da..c212d16 100644
> --- a/fs/gfs2/trans.h
> +++ b/fs/gfs2/trans.h
> @@ -25,6 +25,7 @@ struct gfs2_glock;
>   #define RES_EATTR	1
>   #define RES_STATFS	1
>   #define RES_QUOTA	2
> +#define RES_MAX		256 /* Must not exceed 8192 (min journal is 32MB) */
>   
>   /* reserve either the number of blocks to be allocated plus the rg header
>    * block, or all of the blocks in the rg, whichever is smaller */
>
If it must not exceed 8192, why is it only 256? Could the limit not be
larger? It should probably scale with the journal size, to avoid
causing issues for those with larger journals. The approach looks good
for a temporary fix, though.
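
To illustrate what scaling with journal size might look like, here is a
rough userspace sketch, not a proposed patch. The helper name
res_max_for_journal() and the 1/32 fraction are assumptions made for the
example; the fraction is picked only so that a minimum-size journal
(32MB, which is 8192 blocks assuming 4K blocks) still yields the 256
used in the patch.

```c
#include <stdint.h>

/* Fixed limit from the patch, kept here as a lower bound. */
#define RES_MAX_FLOOR 256u

/*
 * Hypothetical helper (not part of GFS2): derive the per-transaction
 * block limit from the journal size in blocks. The divisor 32 is an
 * assumed fraction, chosen so that 8192 journal blocks map to 256.
 */
static uint32_t res_max_for_journal(uint32_t journal_blocks)
{
	uint32_t cap = journal_blocks / 32;

	/* Never go below the fixed value the patch already uses. */
	return cap < RES_MAX_FLOOR ? RES_MAX_FLOOR : cap;
}
```

With this, an 8192-block journal keeps the limit at 256, while a journal
four times that size (32768 blocks) would allow 1024 blocks per
transaction.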

Steve.