From: Bob Peterson
Date: Thu, 28 Apr 2016 10:16:09 -0400 (EDT)
Subject: [Cluster-devel] [GFS2 PATCH 3/3 v3] GFS2: Add retry loop to delete_work_func
In-Reply-To: <5721DB89.7040906@redhat.com>
References: <1461771337-17317-1-git-send-email-rpeterso@redhat.com>
 <1461771337-17317-4-git-send-email-rpeterso@redhat.com>
 <853479097.58344937.1461777017183.JavaMail.zimbra@redhat.com>
 <5721DB89.7040906@redhat.com>
Message-ID: <1675787118.58792048.1461852969312.JavaMail.zimbra@redhat.com>
To: cluster-devel.redhat.com

Hi Steve,

Comments below.

----- Original Message -----
> This generally looks like a much better solution, however I'm always
> worried about arbitrary delays being added to the code. Is this just a
> wait for an inode in I_FREEING to go away, along with a timeout? It
> would be good to at least document why 30 loops here is the right
> amount. In other words will this still be sure to work on different
> machines with different timing characteristics?
>
> I wonder whether it might not be better to reschedule the work function
> for later, rather than loop in the work function itself?
>
> Steve.

1. I chose 30 iterations arbitrarily, based on instrumentation on the
   virt cluster I was using; I never saw it go over 29 retries. In my
   experience, virt clusters are faster and have more critical timing
   than bare metal, so I considered that the worst case. But it is still
   arbitrary.

2. Your idea of re-queuing the delete work is an excellent one, and I've
   reworked the patch to do this. I'm testing the implementation now, and
   it seems to be working well so far. This time my retry value is 30
   seconds, with a 10ms delay between tries, but this is still arbitrary,
   so I'm open to suggestions.
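In the patch as posted, the retry window is LOOKUP_TIMEOUT, defined as (HZ >> 1) jiffies. Each requeue passes a delay of 10 jiffies to queue_delayed_work(). Both values are counted in timer ticks, so their wall-clock length depends on the kernel's CONFIG_HZ setting. Below is a small user-space sketch of that arithmetic. The function names are hypothetical helpers rather than kernel APIs; the kernel's own conversion helper is jiffies_to_msecs().

```c
#include <assert.h>

/* Sketch: tick arithmetic for the retry window used by this patch.
 * HZ is a kernel build option (CONFIG_HZ), so it is a parameter here. */

/* LOOKUP_TIMEOUT as defined in the patch: (HZ >> 1) jiffies. */
static unsigned long lookup_timeout_jiffies(unsigned long hz)
{
	return hz >> 1;
}

/* Hypothetical helper: convert a jiffies count to whole milliseconds
 * (truncating), for a given HZ. */
static unsigned long jiffies_to_ms(unsigned long j, unsigned long hz)
{
	assert(hz > 0);
	return j * 1000UL / hz;
}

/* Hypothetical helper: the most times one lookup phase can requeue itself
 * with a fixed delay while time_before(jiffies, gl_tchange + LOOKUP_TIMEOUT)
 * still holds. Runs happen at 0, d, 2d, ... and requeue while k*d < timeout,
 * i.e. ceil(timeout / d) times. Scheduling latency and the time spent in the
 * lookup itself are ignored, so real counts can only be lower. */
static unsigned long max_requeues(unsigned long hz, unsigned long delay_jiffies)
{
	assert(delay_jiffies > 0);
	return (lookup_timeout_jiffies(hz) + delay_jiffies - 1) / delay_jiffies;
}
```

At HZ=1000 the window is 500 ms and the 10-jiffy delay is 10 ms, which allows at most 50 requeues per phase. At HZ=100 the same 10-jiffy delay lasts 100 ms, so only 5 requeues fit in the window.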
Here is the replacement patch:

Patch description:

The delete work function, delete_work_func, often fails to find the
inode it needs in order to free an inode that has been marked unlinked.
That's because it tries gfs2_ilookup only once, and the inode is often
not found for two reasons: gfs2_lookup_by_inum is only called in an
else branch, and the non-blocking lookup often encounters inodes that
the VFS is in the middle of freeing (I_FREEING). This patch lets it
retry the lookup in that circumstance, and otherwise call
gfs2_lookup_by_inum. If the inode is in I_FREEING, -EAGAIN is returned
to the caller, which then re-queues the delete work for later. Once a
timeout (LOOKUP_TIMEOUT) expires, delete_work_func stops using
gfs2_ilookup and uses gfs2_lookup_by_inum instead. If that lookup keeps
returning -EAGAIN until the timeout expires a second time, it gives up.

Signed-off-by: Bob Peterson
---
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 672de35..59efb8b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -571,12 +571,14 @@ out_unlock:
 	return;
 }
 
+#define LOOKUP_TIMEOUT (HZ >> 1)
+
 static void delete_work_func(struct work_struct *work)
 {
-	struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_delete);
+	struct gfs2_glock *gl = container_of(work, struct gfs2_glock,
+					     gl_delete.work);
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
-	struct gfs2_inode *ip;
-	struct inode *inode;
+	struct inode *inode = NULL;
 	u64 no_addr = gl->gl_name.ln_number;
 
 	/* If someone's using this glock to create a new dinode, the block must
@@ -585,13 +587,43 @@ static void delete_work_func(struct work_struct *work)
 	if (test_bit(GLF_INODE_CREATING, &gl->gl_flags))
 		goto out;
 
-	ip = gl->gl_object;
-	/* Note: Unsafe to dereference ip as we don't hold right refs/locks */
-
-	if (ip)
+	if (test_bit(GLF_TRY_ILOOKUP, &gl->gl_flags)) {
 		inode = gfs2_ilookup(sdp->sd_vfs, no_addr, 1);
-	else
-		inode = gfs2_lookup_by_inum(sdp, no_addr, NULL, GFS2_BLKST_UNLINKED);
+
+		if (inode == ERR_PTR(-EAGAIN)) {
+			if (time_before(jiffies, gl->gl_tchange +
+					LOOKUP_TIMEOUT)) {
+				gfs2_glock_hold(gl);
+				if (queue_delayed_work(gfs2_delete_workqueue,
+						       &gl->gl_delete, 10) == 0)
+					gfs2_glock_put(gl);
+				goto out;
+			} else {
+				clear_bit(GLF_TRY_ILOOKUP, &gl->gl_flags);
+				gl->gl_tchange = jiffies;
+			}
+		}
+	}
+
+	if (inode == NULL || IS_ERR(inode)) {
+		/* Note: This function uses the iopen glock only. It relies on
+		   the fact that gfs2_inode_lookup (called by lookup_by_inum)
+		   will return -EAGAIN before it does any manipulation of the
+		   iopen glock that might change gl_tchange. */
+		inode = gfs2_lookup_by_inum(sdp, no_addr, NULL,
+					    GFS2_BLKST_UNLINKED);
+		if (inode == ERR_PTR(-EAGAIN)) {
+			if (time_before(jiffies, gl->gl_tchange +
+					LOOKUP_TIMEOUT)) {
+				gfs2_glock_hold(gl);
+				if (queue_delayed_work(gfs2_delete_workqueue,
+						       &gl->gl_delete, 10) == 0) {
+					gfs2_glock_put(gl);
+				}
+			}
+			goto out;
+		}
+	}
 	if (inode && !IS_ERR(inode)) {
 		d_prune_aliases(inode);
 		iput(inode);
@@ -713,7 +745,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	gl->gl_object = NULL;
 	gl->gl_hold_time = GL_GLOCK_DFT_HOLD;
 	INIT_DELAYED_WORK(&gl->gl_work, glock_work_func);
-	INIT_WORK(&gl->gl_delete, delete_work_func);
+	INIT_DELAYED_WORK(&gl->gl_delete, delete_work_func);
 
 	mapping = gfs2_glock2aspace(gl);
 	if (mapping) {
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 5db59d4..c0a45ed 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -550,7 +550,10 @@ static void iopen_go_callback(struct gfs2_glock *gl, bool remote)
 	if (gl->gl_demote_state == LM_ST_UNLOCKED &&
 	    gl->gl_state == LM_ST_SHARED && ip) {
 		gl->gl_lockref.count++;
-		if (queue_work(gfs2_delete_workqueue, &gl->gl_delete) == 0)
+		gl->gl_tchange = jiffies;
+		set_bit(GLF_TRY_ILOOKUP, &gl->gl_flags);
+		if (queue_delayed_work(gfs2_delete_workqueue, &gl->gl_delete,
+				       0) == 0)
 			gl->gl_lockref.count--;
 	}
 }
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index a6a3389..8c79fe4 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -329,6 +329,7 @@ enum {
 	GLF_OBJECT			= 14, /* Used only for tracing */
 	GLF_BLOCKING			= 15,
 	GLF_INODE_CREATING		= 16, /* Inode creation occurring */
+	GLF_TRY_ILOOKUP			= 17, /* Try gfs2_ilookup for del */
 };
 
 struct gfs2_glock {
@@ -363,7 +364,7 @@ struct gfs2_glock {
 	struct delayed_work gl_work;
 	union {
 		/* For inode and iopen glocks only */
-		struct work_struct gl_delete;
+		struct delayed_work gl_delete;
 		/* For rgrp glocks only */
 		struct {
 			loff_t start;
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 48c1418..3cd06c7 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -37,6 +37,11 @@
 #include "super.h"
 #include "glops.h"
 
+enum {
+	LOOKUP_UNLOCKED = 0,
+	LOOKUP_LOCKED = 1,
+};
+
 struct gfs2_skip_data {
 	u64 no_addr;
 	int skipped;
@@ -71,27 +76,34 @@ static int iget_set(struct inode *inode, void *opaque)
 	return 0;
 }
 
-struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr, int non_block)
+static struct inode *inode_lookup_common(struct super_block *sb, u64 no_addr,
+					 int non_block, int locked)
 {
 	unsigned long hash = (unsigned long)no_addr;
-	struct gfs2_skip_data data;
+	struct gfs2_skip_data data = {.no_addr = no_addr, .skipped = 0,
+				      .non_block = non_block};
+	struct inode *inode;
 
-	data.no_addr = no_addr;
-	data.skipped = 0;
-	data.non_block = non_block;
-	return ilookup5(sb, hash, iget_test, &data);
+	if (locked == LOOKUP_LOCKED)
+		inode = iget5_locked(sb, hash, iget_test, iget_set, &data);
+	else
+		inode = ilookup5(sb, hash, iget_test, &data);
+
+	if (non_block && data.skipped)
+		return ERR_PTR(-EAGAIN);
+
+	return inode;
+}
+
+struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr, int non_block)
+{
+	return inode_lookup_common(sb, no_addr, non_block, LOOKUP_UNLOCKED);
 }
 
 static struct inode *gfs2_iget(struct super_block *sb, u64 no_addr,
 			       int non_block)
 {
-	struct gfs2_skip_data data;
-	unsigned long hash = (unsigned long)no_addr;
-
-	data.no_addr = no_addr;
-	data.skipped = 0;
-	data.non_block = non_block;
-	return iget5_locked(sb, hash, iget_test, iget_set, &data);
+	return inode_lookup_common(sb, no_addr, non_block, LOOKUP_LOCKED);
 }
 
 /**
@@ -148,6 +160,8 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
 	inode = gfs2_iget(sb, no_addr, non_block);
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
+	if (inode == ERR_PTR(-EAGAIN))
+		return inode;
 
 	ip = GFS2_I(inode);
 	ip->i_no_addr = no_addr;
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 07c0265..58c74ca 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1801,8 +1801,10 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
 		 * answer to whether it is NULL or not.
 		 */
 		ip = gl->gl_object;
-
-		if (ip || queue_work(gfs2_delete_workqueue, &gl->gl_delete) == 0)
+		gl->gl_tchange = jiffies;
+		set_bit(GLF_TRY_ILOOKUP, &gl->gl_flags);
+		if (ip || queue_delayed_work(gfs2_delete_workqueue,
+					     &gl->gl_delete, 0) == 0)
 			gfs2_glock_put(gl);
 		else
 			found++;