From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steven Whitehouse Date: Fri, 16 Dec 2016 14:40:10 +0000 Subject: [Cluster-devel] [GFS2 PATCH] GFS2: Limit number of transaction blocks requested for truncates In-Reply-To: <1207853613.13078711.1481897777538.JavaMail.zimbra@redhat.com> References: <1207853613.13078711.1481897777538.JavaMail.zimbra@redhat.com> Message-ID: <5853FCCA.8000809@redhat.com> List-Id: To: cluster-devel.redhat.com MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Hi, On 16/12/16 14:16, Bob Peterson wrote: > Hi, > > This patch limits the number of transaction blocks requested during > file truncates. If we have very large multi-terabyte files, and want > to delete or truncate them, they might span so many resource groups > that we overflow the journal blocks, and cause an assert failure. > By limiting the number of blocks in the transaction, we prevent this > overflow and give other running processes time to do transactions. > > Note that this is a temporary solution. A long-term solution would > be to implement the non-recursive truncate that processes one > resource group at a time, and therefore, the transactions need > much fewer blocks. > > Signed-off-by: Bob Peterson > --- > diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c > index 645721f..77f2ab1 100644 > --- a/fs/gfs2/bmap.c > +++ b/fs/gfs2/bmap.c > @@ -720,6 +720,7 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh, > { > struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); > struct gfs2_rgrp_list rlist; > + struct gfs2_trans *tr; > u64 bn, bstart; > u32 blen, btotal; > __be64 *p; > @@ -728,6 +729,7 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh, > unsigned int revokes = 0; > int x; > int error; > + int jblocks_rqsted; > > error = gfs2_rindex_update(sdp); > if (error) > @@ -791,12 +793,17 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh, > if (gfs2_rs_active(&ip->i_res)) /* needs to be done with the rgrp glock held */ > gfs2_rs_deltree(&ip->i_res); > > - error = gfs2_trans_begin(sdp, rg_blocks + RES_DINODE + > - RES_INDIRECT + RES_STATFS + RES_QUOTA, > - revokes); > +restart: > + jblocks_rqsted = rg_blocks + RES_DINODE + > + RES_INDIRECT + RES_STATFS + RES_QUOTA + > + gfs2_struct2blk(sdp, revokes, sizeof(u64)); > + if (jblocks_rqsted > RES_MAX) > + jblocks_rqsted = RES_MAX; > + error = gfs2_trans_begin(sdp, jblocks_rqsted, revokes); > if (error) > goto out_rg_gunlock; > > + tr = current->journal_info; > down_write(&ip->i_rw_mutex); > > gfs2_trans_add_meta(ip->i_gl, dibh); > @@ -810,6 +817,15 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh, > if (!*p) > continue; > > + /* check for max reasonable journal transaction blocks */ > + if (tr->tr_num_buf_new + RES_STATFS + RES_QUOTA >= RES_MAX) { > + if (rg_blocks >= tr->tr_num_buf_new) > + rg_blocks -= tr->tr_num_buf_new; > + else > + rg_blocks = 0; > + break; > + } > + > bn = be64_to_cpu(*p); > > if (bstart + blen == bn) > @@ -827,6 +843,9 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh, > *p = 0; > gfs2_add_inode_blocks(&ip->i_inode, -1); > } > + if (p == bottom) > + rg_blocks = 0; > + > if (bstart) { > __gfs2_free_blocks(ip, bstart, blen, metadata); > btotal += blen; > @@ -844,6 +863,9 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh, > > gfs2_trans_end(sdp); > > + if (rg_blocks) > + goto restart; > + > out_rg_gunlock: > gfs2_glock_dq_m(rlist.rl_rgrps, rlist.rl_ghs); > out_rlist: > diff --git a/fs/gfs2/trans.h b/fs/gfs2/trans.h > index 1e6e7da..c212d16 100644 > --- a/fs/gfs2/trans.h > +++ b/fs/gfs2/trans.h > @@ -25,6 +25,7 @@ struct gfs2_glock; > #define RES_EATTR 1 > #define RES_STATFS 1 > #define RES_QUOTA 2 > +#define RES_MAX 256 /* Must not exceed 8192 (min journal is 32MB) */ > > /* reserve either the number of blocks to be allocated plus the rg header > * block, or all of the blocks in the rg, whichever is smaller */ > If it can't exceed 8192, then why is it only 256, could there not be a larger number? It should probably scale with journal size to avoid causing issues for those with larger journals. The approach looks good though for a temporary fix, Steve.