cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
* [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new
@ 2015-10-05 15:36 Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 01/15] GFS2: Update master statfs buffer with sd_statfs_spin locked Bob Peterson
                   ` (14 more replies)
  0 siblings, 15 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On September 10, I posted a set of 11 preliminary patches for GFS2 that
revised the way GFS2 transitions files from unlinked to deleted. I received
no feedback. Since that time, I've found several bugs in testing and fixed
them. I've also split the last patch into five pieces. This is the revised
set of fifteen patches.

This set of fifteen patches was designed to fix several problems in GFS2
that had to do with transitioning dinodes from unlinked to deleted and
back to created again. Many of these problems resulted in dinodes being
not transitioning from the "unlinked" state to a "deleted" state.
Some of them address the problem of dinodes being re-created in place
(due to limited space in the file system) in which case callbacks were
received and delete work was queued for dinodes that were already
deleted and re-used by another node, only to have the callback destroy
the newly created dinode.

These patches were tested using virts that had 4GB GFS2 file systems,
using a test case that recreated numerous problems. With these patches,
all those problems disappear.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
Bob Peterson (15):
  GFS2: Update master statfs buffer with sd_statfs_spin locked
  GFS2: Allow fail_gunlock3 to set the free_vfs_inode bit
  GFS2: Protect log tail calculations with inside locks
  GFS2: Wait for iopen glock dequeues
  GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear
  GFS2: Prevent gl_delete work for re-used inodes
  GFS2: Truncate address space mapping when deleting an inode
  GFS2: Don't filter out I_FREEING inodes anymore
  GFS2: generalize gfs2_check_blk_type
  GFS2: Change from tr_touched to tr_bufs
  GFS2: Add new function gfs2_inode_lookup_for_del
  gfs2: Remove unused param non_block from gfs2_inode_lookup
  gfs2: Use new variable i_gl instead of ip->i_gl
  GFS2: Hold onto iopen glock longer when dinode creation fails
  GFS2: Rework gfs2_evict_inode to prevent collisions with openers

 fs/gfs2/dir.c        |   2 +-
 fs/gfs2/export.c     |   2 +-
 fs/gfs2/glock.c      |  15 ++--
 fs/gfs2/glops.c      |   3 +-
 fs/gfs2/incore.h     |   7 +-
 fs/gfs2/inode.c      | 246 ++++++++++++++++++++++++++++++++++++---------------
 fs/gfs2/inode.h      |   7 +-
 fs/gfs2/log.c        |   7 +-
 fs/gfs2/main.c       |   2 +
 fs/gfs2/meta_io.c    |  12 ++-
 fs/gfs2/ops_fstype.c |   2 +-
 fs/gfs2/rgrp.c       |  52 ++++++++---
 fs/gfs2/rgrp.h       |   1 +
 fs/gfs2/super.c      |  66 ++++++++++----
 fs/gfs2/trans.c      |  13 +--
 15 files changed, 307 insertions(+), 130 deletions(-)

-- 
2.4.3



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 01/15] GFS2: Update master statfs buffer with sd_statfs_spin locked
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 02/15] GFS2: Allow fail_gunlock3 to set the free_vfs_inode bit Bob Peterson
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function update_statfs called gfs2_statfs_change_out
to update the master statfs buffer without the sd_statfs_spin held.
In theory, another process could call gfs2_statfs_sync, which takes
the sd_statfs_spin lock and re-reads m_sc from the buffer. So there's
a theoretical timing window in which one process could write the
master statfs buffer, then another comes along and re-reads it, wiping
out the changes.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/super.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 894fb01..a691725 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -556,6 +556,7 @@ void update_statfs(struct gfs2_sbd *sdp, struct buffer_head *m_bh,
 	struct gfs2_statfs_change_host *l_sc = &sdp->sd_statfs_local;
 
 	gfs2_trans_add_meta(l_ip->i_gl, l_bh);
+	gfs2_trans_add_meta(m_ip->i_gl, m_bh);
 
 	spin_lock(&sdp->sd_statfs_spin);
 	m_sc->sc_total += l_sc->sc_total;
@@ -564,10 +565,8 @@ void update_statfs(struct gfs2_sbd *sdp, struct buffer_head *m_bh,
 	memset(l_sc, 0, sizeof(struct gfs2_statfs_change));
 	memset(l_bh->b_data + sizeof(struct gfs2_dinode),
 	       0, sizeof(struct gfs2_statfs_change));
-	spin_unlock(&sdp->sd_statfs_spin);
-
-	gfs2_trans_add_meta(m_ip->i_gl, m_bh);
 	gfs2_statfs_change_out(m_sc, m_bh->b_data + sizeof(struct gfs2_dinode));
+	spin_unlock(&sdp->sd_statfs_spin);
 }
 
 int gfs2_statfs_sync(struct super_block *sb, int type)
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 02/15] GFS2: Allow fail_gunlock3 to set the free_vfs_inode bit
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 01/15] GFS2: Update master statfs buffer with sd_statfs_spin locked Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 03/15] GFS2: Protect log tail calculations with inside locks Bob Peterson
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch changes a failure case in function gfs2_create_inode.
In some error paths, it jumps to label fail_gunlock3, which fails
to set the free_vfs_inode flag. This, in turn, prevents the code
from setting the GIF_FREE_VFS_INODE inode flag. That, in turn,
allow the code to mistakenly unlink the dinode and not ever delete
its data blocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/inode.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 063fdfc..c56edb8 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -766,11 +766,8 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	return error;
 
 fail_gunlock3:
-	gfs2_glock_dq_uninit(ghs + 1);
-	if (ip->i_gl)
-		gfs2_glock_put(ip->i_gl);
-	goto fail_gunlock;
-
+	if (ip->i_iopen_gh.gh_gl) /* if holder is linked to the glock */
+		gfs2_glock_put(ip->i_iopen_gh.gh_gl);
 fail_gunlock2:
 	gfs2_glock_dq_uninit(ghs + 1);
 fail_free_inode:
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 03/15] GFS2: Protect log tail calculations with inside locks
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 01/15] GFS2: Update master statfs buffer with sd_statfs_spin locked Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 02/15] GFS2: Allow fail_gunlock3 to set the free_vfs_inode bit Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 04/15] GFS2: Wait for iopen glock dequeues Bob Peterson
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch moves the variables "old_tail" and "wrap" in function
gfs2_ail2_empty_one inside the protective spin_lock so they can't
be moved by another process. If the calculations aren't protected,
the log's tail may not be properly pulled, which eventually makes
it seem like there isn't enough room in the journal for more
transactions. This results in a hang where GFS2 repeatedly does
log flushes without making any progress.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/log.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 536e7a6..18959de 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -277,11 +277,12 @@ static void gfs2_ail2_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 static void ail2_empty(struct gfs2_sbd *sdp, unsigned int new_tail)
 {
 	struct gfs2_trans *tr, *safe;
-	unsigned int old_tail = sdp->sd_log_tail;
-	int wrap = (new_tail < old_tail);
-	int a, b, rm;
+	unsigned int old_tail;
+	int wrap, a, b, rm;
 
 	spin_lock(&sdp->sd_ail_lock);
+	old_tail = sdp->sd_log_tail;
+	wrap = (new_tail < old_tail);
 
 	list_for_each_entry_safe(tr, safe, &sdp->sd_ail2_list, tr_list) {
 		a = (old_tail <= tr->tr_first);
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 04/15] GFS2: Wait for iopen glock dequeues
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
                   ` (2 preceding siblings ...)
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 03/15] GFS2: Protect log tail calculations with inside locks Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 05/15] GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear Bob Peterson
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch changes every glock_dq for iopen glocks into a dq_wait.
This makes sure that iopen glocks do not outlive the inode itself.
In turn, that ensures that anyone trying to unlink the glock will
be able to find the inode when it receives a remote iopen callback.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/inode.c | 3 ++-
 fs/gfs2/super.c | 8 +++++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index c56edb8..ce4b793 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -191,7 +191,8 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
 fail_refresh:
 	ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
 	ip->i_iopen_gh.gh_gl->gl_object = NULL;
-	gfs2_glock_dq_uninit(&ip->i_iopen_gh);
+	gfs2_glock_dq_wait(&ip->i_iopen_gh);
+	gfs2_holder_uninit(&ip->i_iopen_gh);
 fail_iopen:
 	if (io_gl)
 		gfs2_glock_put(io_gl);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index a691725..555fea0 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1525,7 +1525,8 @@ static void gfs2_evict_inode(struct inode *inode)
 	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);
 	if (unlikely(error)) {
 		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
-		gfs2_glock_dq_uninit(&ip->i_iopen_gh);
+		gfs2_glock_dq_wait(&ip->i_iopen_gh);
+		gfs2_holder_uninit(&ip->i_iopen_gh);
 		goto out;
 	}
 
@@ -1597,7 +1598,7 @@ out_unlock:
 
 	if (test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
 		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
-		gfs2_glock_dq(&ip->i_iopen_gh);
+		gfs2_glock_dq_wait(&ip->i_iopen_gh);
 	}
 	gfs2_holder_uninit(&ip->i_iopen_gh);
 	gfs2_glock_dq_uninit(&gh);
@@ -1618,7 +1619,8 @@ out:
 	if (ip->i_iopen_gh.gh_gl) {
 		ip->i_iopen_gh.gh_gl->gl_object = NULL;
 		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
-		gfs2_glock_dq_uninit(&ip->i_iopen_gh);
+		gfs2_glock_dq_wait(&ip->i_iopen_gh);
+		gfs2_holder_uninit(&ip->i_iopen_gh);
 	}
 }
 
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 05/15] GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
                   ` (3 preceding siblings ...)
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 04/15] GFS2: Wait for iopen glock dequeues Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 06/15] GFS2: Prevent gl_delete work for re-used inodes Bob Peterson
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

At some point in the past, we used to have a timeout when GFS2 was
unmounting, trying to clear out its glocks. If the timeout expires,
it would dump the remaining glocks to the kernel messages so that
developers can debug the problem. That timeout was eliminated,
probably by accident. This patch reintroduces it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 9bd1244..c352359 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1506,7 +1506,9 @@ void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
 	flush_workqueue(glock_workqueue);
 	glock_hash_walk(clear_glock, sdp);
 	flush_workqueue(glock_workqueue);
-	wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
+	wait_event_timeout(sdp->sd_glock_wait,
+			   atomic_read(&sdp->sd_glock_disposal) == 0,
+			   HZ * 600);
 	glock_hash_walk(dump_glock_func, sdp);
 }
 
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 06/15] GFS2: Prevent gl_delete work for re-used inodes
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
                   ` (4 preceding siblings ...)
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 05/15] GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-06 18:40   ` Andrew W Elble
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 07/15] GFS2: Truncate address space mapping when deleting an inode Bob Peterson
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch adds a new glock flag GLF_INODE_DELETING which signifies
when a glock is being used to change an inode from unlinked to
deleted. The flag is used in a few places:
1. If an iopen callback is received, it checks the flag. If the bit
   is set, someone else has already started deleting the inode.
   In that case, the delete_func may already be running, so we don't
   want to queue it to run another time. Doing so only gets us into
   trouble.
2. When a dinode is in the process of being created, we've been
   assigned that block by the allocator, so it must have been free.
   At that point, we check if there is pending delete work pending,
   and if so, we cancel it to prevent the block from being deleted
   while we're creating it. This is necessary because there could
   be pending delete work that was queued up a while ago, but the
   delete work might have been done on another node, which is how
   the block became freed. However, we keep the GLF_INODE_DELETING
   set to prevent new delete work from being queued. After we're
   done creating, we clear the bit, otherwise the file may not be
   deleted ever again, even in legitimate cases in the future.
3. In function try_rgrp_unlink, we also make sure the bit isn't
   already set before we try to reclaim an unlinked block.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glops.c  |  3 ++-
 fs/gfs2/incore.h |  1 +
 fs/gfs2/inode.c  | 21 ++++++++++++++++++++-
 fs/gfs2/rgrp.c   |  6 ++++++
 4 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 1f6c9c3..b604343 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -553,7 +553,8 @@ static void iopen_go_callback(struct gfs2_glock *gl, bool remote)
 		return;
 
 	if (gl->gl_demote_state == LM_ST_UNLOCKED &&
-	    gl->gl_state == LM_ST_SHARED && ip) {
+	    gl->gl_state == LM_ST_SHARED && ip &&
+	    !test_and_set_bit(GLF_INODE_DELETING, &gl->gl_flags)) {
 		gl->gl_lockref.count++;
 		if (queue_work(gfs2_delete_workqueue, &gl->gl_delete) == 0)
 			gl->gl_lockref.count--;
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 121ed08..5065e0c 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -326,6 +326,7 @@ enum {
 	GLF_LRU				= 13,
 	GLF_OBJECT			= 14, /* Used only for tracing */
 	GLF_BLOCKING			= 15,
+	GLF_INODE_DELETING		= 16, /* Was unlinked, being deleted */
 };
 
 struct gfs2_glock {
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index ce4b793..833f8fa 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -597,6 +597,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	int error, free_vfs_inode = 0;
 	u32 aflags = 0;
 	unsigned blocks = 1;
+	int delete_prevented = 0;
 	struct gfs2_diradd da = { .bh = NULL, .save_loc = 1, };
 
 	if (!name->len || name->len > GFS2_FNAMESIZE)
@@ -705,6 +706,17 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	if (error)
 		goto fail_free_inode;
 
+	/*
+	 * Cancel any pending delete work for this glock. If there's pending
+	 * delete work, we'd otherwise try to delete the dinode, but since we
+	 * were assigned this address by alloc_dinode, the block is already
+	 * free, so there's no need to attempt to change it from unlinked to
+	 * free. We'd just get into trouble trying to do so. The biggest
+	 * problem is having gfs2_delete_inode called while there pages
+	 * still in existence due to a race between create and delete.
+	 */
+	if (cancel_work_sync(&ip->i_gl->gl_delete))
+		delete_prevented = 1;
 	ip->i_gl->gl_object = ip;
 	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, ghs + 1);
 	if (error)
@@ -762,6 +774,10 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 		*opened |= FILE_CREATED;
 		error = finish_open(file, dentry, gfs2_open_common, opened);
 	}
+	if (delete_prevented) {
+		gfs2_glock_put(ip->i_gl); /* fix the gl reference count */
+		clear_bit(GLF_INODE_DELETING, &ip->i_gl->gl_flags);
+	}
 	gfs2_glock_dq_uninit(ghs);
 	gfs2_glock_dq_uninit(ghs + 1);
 	return error;
@@ -772,8 +788,11 @@ fail_gunlock3:
 fail_gunlock2:
 	gfs2_glock_dq_uninit(ghs + 1);
 fail_free_inode:
-	if (ip->i_gl)
+	if (ip->i_gl) {
+		if (delete_prevented)
+			clear_bit(GLF_INODE_DELETING, &ip->i_gl->gl_flags);
 		gfs2_glock_put(ip->i_gl);
+	}
 	gfs2_rs_delete(ip, NULL);
 fail_free_acls:
 	if (default_acl)
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 475985d..b936ee1 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1807,6 +1807,12 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
 		if (error)
 			continue;
 
+		/* Make sure we're not queuing a delete if someone else has */
+		if (test_and_set_bit(GLF_INODE_DELETING, &gl->gl_flags)) {
+			gfs2_glock_put(gl);
+			continue;
+		}
+
 		/* If the inode is already in cache, we can ignore it here
 		 * because the existing inode disposal code will deal with
 		 * it when all refs have gone away. Accessing gl_object like
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 07/15] GFS2: Truncate address space mapping when deleting an inode
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
                   ` (5 preceding siblings ...)
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 06/15] GFS2: Prevent gl_delete work for re-used inodes Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 08/15] GFS2: Don't filter out I_FREEING inodes anymore Bob Peterson
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

In function gfs2_delete_inode() we write and flush the mapping for
a glock, among other things. We truncate the mapping for the inode,
but we never truncate the mapping for the glock. This patch makes it
also truncate the metamapping. This avoid cases where the glock is
reused by another process who is trying to recreate an inode in its
place using the same block.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/super.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 555fea0..06bd72b 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1511,6 +1511,7 @@ static void gfs2_evict_inode(struct inode *inode)
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_holder gh;
+	struct address_space *metamapping;
 	int error;
 
 	if (test_bit(GIF_FREE_VFS_INODE, &ip->i_flags)) {
@@ -1575,8 +1576,8 @@ static void gfs2_evict_inode(struct inode *inode)
 
 out_truncate:
 	gfs2_log_flush(sdp, ip->i_gl, NORMAL_FLUSH);
+	metamapping = gfs2_glock2aspace(ip->i_gl);
 	if (test_bit(GLF_DIRTY, &ip->i_gl->gl_flags)) {
-		struct address_space *metamapping = gfs2_glock2aspace(ip->i_gl);
 		filemap_fdatawrite(metamapping);
 		filemap_fdatawait(metamapping);
 	}
@@ -1589,6 +1590,7 @@ out_truncate:
 		goto out_unlock;
 	/* Needs to be done before glock release & also in a transaction */
 	truncate_inode_pages(&inode->i_data, 0);
+	truncate_inode_pages(metamapping, 0);
 	gfs2_trans_end(sdp);
 
 out_unlock:
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 08/15] GFS2: Don't filter out I_FREEING inodes anymore
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
                   ` (6 preceding siblings ...)
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 07/15] GFS2: Truncate address space mapping when deleting an inode Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 09/15] GFS2: generalize gfs2_check_blk_type Bob Peterson
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch basically reverts a very old patch from 2008,
7a9f53b3c1875bef22ad4588e818bc046ef183da, with the title
"Alternate gfs2_iget to avoid looking up inodes being freed".
The original patch was designed to avoid a deadlock caused by lock
ordering with try_rgrp_unlink. The patch forced the function to not
find inodes that were being removed by VFS. The problem is, that
made it impossible for nodes to delete their own unlinked dinodes
after a certain point in time, because the inode needed was not found
by this filtering process. There is no longer a need for the patch,
since function try_rgrp_unlink no longer locks the inode: All it does
is queue the glock onto the delete work_queue, so there should be no
more deadlock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/export.c |  2 +-
 fs/gfs2/glock.c  |  2 +-
 fs/gfs2/inode.c  | 59 ++++----------------------------------------------------
 fs/gfs2/inode.h  |  2 +-
 4 files changed, 7 insertions(+), 58 deletions(-)

diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c
index 5d15e94..d5bda85 100644
--- a/fs/gfs2/export.c
+++ b/fs/gfs2/export.c
@@ -137,7 +137,7 @@ static struct dentry *gfs2_get_dentry(struct super_block *sb,
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	struct inode *inode;
 
-	inode = gfs2_ilookup(sb, inum->no_addr, 0);
+	inode = gfs2_ilookup(sb, inum->no_addr);
 	if (inode) {
 		if (GFS2_I(inode)->i_no_formal_ino != inum->no_formal_ino) {
 			iput(inode);
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index c352359..e5dfbcd 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -576,7 +576,7 @@ static void delete_work_func(struct work_struct *work)
 	/* Note: Unsafe to dereference ip as we don't hold right refs/locks */
 
 	if (ip)
-		inode = gfs2_ilookup(sdp->sd_vfs, no_addr, 1);
+		inode = gfs2_ilookup(sdp->sd_vfs, no_addr);
 	else
 		inode = gfs2_lookup_by_inum(sdp, no_addr, NULL, GFS2_BLKST_UNLINKED);
 	if (inode && !IS_ERR(inode)) {
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 833f8fa..06c208b 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -37,61 +37,9 @@
 #include "super.h"
 #include "glops.h"
 
-struct gfs2_skip_data {
-	u64 no_addr;
-	int skipped;
-	int non_block;
-};
-
-static int iget_test(struct inode *inode, void *opaque)
-{
-	struct gfs2_inode *ip = GFS2_I(inode);
-	struct gfs2_skip_data *data = opaque;
-
-	if (ip->i_no_addr == data->no_addr) {
-		if (data->non_block &&
-		    inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE)) {
-			data->skipped = 1;
-			return 0;
-		}
-		return 1;
-	}
-	return 0;
-}
-
-static int iget_set(struct inode *inode, void *opaque)
+struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr)
 {
-	struct gfs2_inode *ip = GFS2_I(inode);
-	struct gfs2_skip_data *data = opaque;
-
-	if (data->skipped)
-		return -ENOENT;
-	inode->i_ino = (unsigned long)(data->no_addr);
-	ip->i_no_addr = data->no_addr;
-	return 0;
-}
-
-struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr, int non_block)
-{
-	unsigned long hash = (unsigned long)no_addr;
-	struct gfs2_skip_data data;
-
-	data.no_addr = no_addr;
-	data.skipped = 0;
-	data.non_block = non_block;
-	return ilookup5(sb, hash, iget_test, &data);
-}
-
-static struct inode *gfs2_iget(struct super_block *sb, u64 no_addr,
-			       int non_block)
-{
-	struct gfs2_skip_data data;
-	unsigned long hash = (unsigned long)no_addr;
-
-	data.no_addr = no_addr;
-	data.skipped = 0;
-	data.non_block = non_block;
-	return iget5_locked(sb, hash, iget_test, iget_set, &data);
+	return ilookup(sb, (unsigned long)no_addr);
 }
 
 /**
@@ -145,8 +93,9 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
 	struct gfs2_glock *io_gl = NULL;
 	int error;
 
-	inode = gfs2_iget(sb, no_addr, non_block);
+	inode = iget_locked(sb, (unsigned long)no_addr);
 	ip = GFS2_I(inode);
+	ip->i_no_addr = no_addr;
 
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index ba4d949..22c27a8 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -99,7 +99,7 @@ extern struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type,
 extern struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr,
 					 u64 *no_formal_ino,
 					 unsigned int blktype);
-extern struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr, int nonblock);
+extern struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr);
 
 extern int gfs2_inode_refresh(struct gfs2_inode *ip);
 
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 09/15] GFS2: generalize gfs2_check_blk_type
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
                   ` (7 preceding siblings ...)
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 08/15] GFS2: Don't filter out I_FREEING inodes anymore Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 10/15] GFS2: Change from tr_touched to tr_bufs Bob Peterson
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch reorganizes function gfs2_check_blk_type so that a more
generalized function, gfs2_get_block_type, may be used in future
enhancements related to the freeing of unlinked dinodes.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/rgrp.c | 46 ++++++++++++++++++++++++++++++++--------------
 fs/gfs2/rgrp.h |  1 +
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index b936ee1..0e3be08 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2125,14 +2125,14 @@ void gfs2_inplace_release(struct gfs2_inode *ip)
 }
 
 /**
- * gfs2_get_block_type - Check a block in a RG is of given type
+ * __get_block_type - Check a block in a RG is of given type
  * @rgd: the resource group holding the block
  * @block: the block number
  *
  * Returns: The block type (GFS2_BLKST_*)
  */
 
-static unsigned char gfs2_get_block_type(struct gfs2_rgrpd *rgd, u64 block)
+static unsigned char __get_block_type(struct gfs2_rgrpd *rgd, u64 block)
 {
 	struct gfs2_rbm rbm = { .rgd = rgd, };
 	int ret;
@@ -2518,7 +2518,7 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
 }
 
 /**
- * gfs2_check_blk_type - Check the type of a block
+ * gfs2_get_blk_type - Get the type of a block
  * @sdp: The superblock
  * @no_addr: The block number to check
  * @type: The block type we are looking for
@@ -2528,26 +2528,44 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
  *          or -ve errno if something went wrong while checking
  */
 
-int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr, unsigned int type)
+int gfs2_get_blk_type(struct gfs2_sbd *sdp, u64 no_addr)
 {
 	struct gfs2_rgrpd *rgd;
 	struct gfs2_holder rgd_gh;
-	int error = -EINVAL;
+	int ret;
 
 	rgd = gfs2_blk2rgrpd(sdp, no_addr, 1);
 	if (!rgd)
-		goto fail;
+		return -EINVAL;
 
-	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_SHARED, 0, &rgd_gh);
-	if (error)
-		goto fail;
+	ret = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_SHARED, 0, &rgd_gh);
+	if (ret == 0) {
+		ret = __get_block_type(rgd, no_addr);
+		gfs2_glock_dq_uninit(&rgd_gh);
+	}
+	return ret;
+}
 
-	if (gfs2_get_block_type(rgd, no_addr) != type)
-		error = -ESTALE;
+/**
+ * gfs2_check_blk_type - Check the type of a block
+ * @sdp: The superblock
+ * @no_addr: The block number to check
+ * @type: The block type we are looking for
+ *
+ * Returns: 0 if the block type matches the expected type
+ *          -ESTALE if it doesn't match
+ *          or -ve errno if something went wrong while checking
+ */
 
-	gfs2_glock_dq_uninit(&rgd_gh);
-fail:
-	return error;
+int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr, unsigned int type)
+{
+	int ty = gfs2_get_blk_type(sdp, no_addr);
+
+	if (ty < 0)
+		return ty;
+	if (ty != type)
+		return -ESTALE;
+	return 0;
 }
 
 /**
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index c0ab33f..89cdab9 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -56,6 +56,7 @@ extern void __gfs2_free_blocks(struct gfs2_inode *ip, u64 bstart, u32 blen, int
 extern void gfs2_free_meta(struct gfs2_inode *ip, u64 bstart, u32 blen);
 extern void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip);
 extern void gfs2_unlink_di(struct inode *inode);
+extern int gfs2_get_blk_type(struct gfs2_sbd *sdp, u64 no_addr);
 extern int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr,
 			       unsigned int type);
 
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 10/15] GFS2: Change from tr_touched to tr_bufs
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
                   ` (8 preceding siblings ...)
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 09/15] GFS2: generalize gfs2_check_blk_type Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 11/15] GFS2: Add new function gfs2_inode_lookup_for_del Bob Peterson
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch changes the GFS2 transaction processing so that it keeps
track of how many buffers are associated with a transaction rather
than whether or not it's been touched. That way, if a buffer is
added, then revoked, then removed entirely, it can avoid some work.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/incore.h  |  5 ++++-
 fs/gfs2/meta_io.c |  6 +++---
 fs/gfs2/trans.c   | 13 +++++++------
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 5065e0c..c8f9bb9 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -473,10 +473,13 @@ struct gfs2_trans {
 	unsigned int tr_blocks;
 	unsigned int tr_revokes;
 	unsigned int tr_reserved;
-	unsigned int tr_touched:1;
 	unsigned int tr_attached:1;
 	unsigned int tr_alloced:1;
 
+	struct gfs2_holder tr_t_gh;
+
+	int tr_bufs;
+
 	unsigned int tr_num_buf_new;
 	unsigned int tr_num_databuf_new;
 	unsigned int tr_num_buf_rm;
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index 0e1d4be..c20d7ca 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -224,7 +224,7 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
 	wait_on_buffer(bh);
 	if (unlikely(!buffer_uptodate(bh))) {
 		struct gfs2_trans *tr = current->journal_info;
-		if (tr && tr->tr_touched)
+		if (tr && tr->tr_bufs)
 			gfs2_io_error_bh(sdp, bh);
 		brelse(bh);
 		*bhp = NULL;
@@ -251,7 +251,7 @@ int gfs2_meta_wait(struct gfs2_sbd *sdp, struct buffer_head *bh)
 
 	if (!buffer_uptodate(bh)) {
 		struct gfs2_trans *tr = current->journal_info;
-		if (tr && tr->tr_touched)
+		if (tr && tr->tr_bufs)
 			gfs2_io_error_bh(sdp, bh);
 		return -EIO;
 	}
@@ -276,7 +276,7 @@ void gfs2_remove_from_journal(struct buffer_head *bh, struct gfs2_trans *tr, int
 			tr->tr_num_buf_rm++;
 		else
 			tr->tr_num_databuf_rm++;
-		tr->tr_touched = 1;
+		tr->tr_bufs--;
 		was_pinned = 1;
 		brelse(bh);
 	}
diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c
index b95d0d6..6ff5faa 100644
--- a/fs/gfs2/trans.c
+++ b/fs/gfs2/trans.c
@@ -77,8 +77,8 @@ fail:
 static void gfs2_print_trans(const struct gfs2_trans *tr)
 {
 	pr_warn("Transaction created at: %pSR\n", (void *)tr->tr_ip);
-	pr_warn("blocks=%u revokes=%u reserved=%u touched=%u\n",
-		tr->tr_blocks, tr->tr_revokes, tr->tr_reserved, tr->tr_touched);
+	pr_warn("blocks=%u revokes=%u reserved=%u bufs=%d\n",
+		tr->tr_blocks, tr->tr_revokes, tr->tr_reserved, tr->tr_bufs);
 	pr_warn("Buf %u/%u Databuf %u/%u Revoke %u/%u\n",
 		tr->tr_num_buf_new, tr->tr_num_buf_rm,
 		tr->tr_num_databuf_new, tr->tr_num_databuf_rm,
@@ -94,7 +94,7 @@ void gfs2_trans_end(struct gfs2_sbd *sdp)
 	BUG_ON(!tr);
 	current->journal_info = NULL;
 
-	if (!tr->tr_touched) {
+	if (tr->tr_bufs == 0) {
 		gfs2_log_release(sdp, tr->tr_reserved);
 		if (alloced) {
 			kfree(tr);
@@ -180,7 +180,7 @@ void gfs2_trans_add_data(struct gfs2_glock *gl, struct buffer_head *bh)
 		gfs2_log_lock(sdp);
 	}
 	gfs2_assert(sdp, bd->bd_gl == gl);
-	tr->tr_touched = 1;
+	tr->tr_bufs++;
 	if (list_empty(&bd->bd_list)) {
 		set_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags);
 		set_bit(GLF_DIRTY, &bd->bd_gl->gl_flags);
@@ -199,7 +199,7 @@ static void meta_lo_add(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
 	enum gfs2_freeze_state state = atomic_read(&sdp->sd_freeze_state);
 
 	tr = current->journal_info;
-	tr->tr_touched = 1;
+	tr->tr_bufs++;
 	if (!list_empty(&bd->bd_list))
 		return;
 	set_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags);
@@ -252,7 +252,7 @@ void gfs2_trans_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
 
 	BUG_ON(!list_empty(&bd->bd_list));
 	gfs2_add_revoke(sdp, bd);
-	tr->tr_touched = 1;
+	tr->tr_bufs++;
 	tr->tr_num_revoke++;
 }
 
@@ -268,6 +268,7 @@ void gfs2_trans_add_unrevoke(struct gfs2_sbd *sdp, u64 blkno, unsigned int len)
 			list_del_init(&bd->bd_list);
 			gfs2_assert_withdraw(sdp, sdp->sd_log_num_revoke);
 			sdp->sd_log_num_revoke--;
+			tr->tr_bufs--;
 			kmem_cache_free(gfs2_bufdata_cachep, bd);
 			tr->tr_num_revoke_rm++;
 			if (--n == 0)
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 11/15] GFS2: Add new function gfs2_inode_lookup_for_del
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
                   ` (9 preceding siblings ...)
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 10/15] GFS2: Change from tr_touched to tr_bufs Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 12/15] gfs2: Remove unused param non_block from gfs2_inode_lookup Bob Peterson
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch adds a new specialized function to look up an inode
for the purposes of deleting it. Before, we used to call function
gfs2_lookup_by_inum, but the new function closes some timing
windows involving the iopen and inode glocks coming and going,
since they typically outlive their inodes.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c |   9 ++--
 fs/gfs2/inode.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/gfs2/inode.h |   2 +
 3 files changed, 140 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index e5dfbcd..59f089c 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -569,16 +569,17 @@ static void delete_work_func(struct work_struct *work)
 	struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_delete);
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	struct gfs2_inode *ip;
-	struct inode *inode;
+	struct inode *inode = NULL;
 	u64 no_addr = gl->gl_name.ln_number;
 
 	ip = gl->gl_object;
 	/* Note: Unsafe to dereference ip as we don't hold right refs/locks */
-
 	if (ip)
 		inode = gfs2_ilookup(sdp->sd_vfs, no_addr);
-	else
-		inode = gfs2_lookup_by_inum(sdp, no_addr, NULL, GFS2_BLKST_UNLINKED);
+
+	if (inode == NULL || IS_ERR(inode))
+		inode = gfs2_inode_lookup_for_del(sdp->sd_vfs, no_addr);
+
 	if (inode && !IS_ERR(inode)) {
 		d_prune_aliases(inode);
 		iput(inode);
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 06c208b..1c57f7d 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -153,6 +153,139 @@ fail:
 	return ERR_PTR(error);
 }
 
+/**
+ * gfs2_inode_lookup_for_del - Lookup an unlinked inode so we can delete it
+ * @sb: The super block
+ * @no_addr: The inode number
+ *
+ * Returns: A VFS inode, or an error
+ *
+ * We jump through some hoops here to avoid a special case in which the block
+ * has been freed and already reallocated for a different inode while after
+ * the iopen callback was received to signify a remote unlink that needs
+ * deleting. In that case, we don't want to return the inode to the caller
+ * to delete the inode. We also need to do an inode_refresh to ensure that
+ * whomever recreated the dinode gets a proper i_nlink count, otherwise the
+ * vfs might think it's unlinked and still needs deleting.
+ */
+struct inode *gfs2_inode_lookup_for_del(struct super_block *sb, u64 no_addr)
+{
+	struct inode *inode;
+	struct gfs2_inode *ip;
+	struct gfs2_glock *io_gl = NULL;
+	struct gfs2_holder i_gh;
+	int error, btype;
+	struct gfs2_sbd *sdp = sb->s_fs_info;
+	struct address_space *mapping;
+	BUG_ON(no_addr == 0);
+	inode = iget_locked(sb, (unsigned long)no_addr);
+	ip = GFS2_I(inode);
+
+	if (!inode)
+		return ERR_PTR(-ENOBUFS);
+
+	if (!(inode->i_state & I_NEW)) {
+		BUG_ON(ip->i_no_addr != no_addr);
+		return inode;
+	}
+
+	ip->i_no_addr = no_addr;
+
+	error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE,
+			       &ip->i_gl);
+	if (unlikely(error))
+		goto fail;
+	/* If this is a brand new inode, but a recycled glock, we've got a
+	   reference problem. In clear_inode it does an extra glock_put so the
+	   next time we do clear_inode for this inode, we'll get in trouble.
+	   So we hold the glock to bump the reference count. */
+	if (ip->i_gl->gl_flags != 0) /* New inode, recycled glock */
+		ip->i_gl->gl_lockref.count++;
+
+	ip->i_gl->gl_object = ip;
+
+	error = gfs2_glock_get(sdp, no_addr, &gfs2_iopen_glops, CREATE,
+			       &io_gl);
+	if (unlikely(error))
+		goto fail_put;
+
+	ip->i_iopen_gh.gh_gl = NULL;
+	gfs2_glock_put(io_gl);
+	io_gl = NULL;
+
+	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &i_gh);
+	if (error)
+		goto fail_put;
+
+	/* Inode glock must be locked already */
+	btype = gfs2_get_blk_type(sdp, no_addr);
+	if (btype < 0) {
+		error = btype;
+		goto fail_refresh;
+	}
+	if (btype == GFS2_BLKST_FREE || btype == GFS2_BLKST_USED) {
+		error = -ESTALE;
+		goto fail_refresh;
+	}
+
+	/* At this point it's either UNLINKED or DINODE */
+
+	/* If there aren't any pages associated with it and it's a new inode,
+	 * we shouldn't be messing with it, even to read in pages. We should
+	 * just exit and let whomever's using it carry on. If we do inode
+	 * refresh, we'll end up adding pages to the cache that we'd otherwise
+	 * need to truncate with truncate_inode_page().
+	 */
+	mapping = gfs2_glock2aspace(ip->i_gl);
+	if (mapping->nrpages == 0 && btype == GFS2_BLKST_DINODE) {
+		error = -ESTALE;
+		goto fail_refresh;
+	}
+	/* At this point, we've got a dinode with pages associated with it
+	 * or it's unlinked. If it's unlinked, we need to do inode_refresh
+	 * so that our put() will delete it normally through gfs2_delete_inode.
+	 * If it has pages associated with it, we may have grabbed the glock
+	 * while it was being created, so we need to refresh it before
+	 * exiting.
+	 */
+	error = gfs2_inode_refresh(GFS2_I(inode));
+	if (error)
+		goto fail_refresh;
+
+	if (btype == GFS2_BLKST_DINODE) {
+		/* Now we know this is a dinode (not unlinked), but we know
+		 * there were already pages associated with it. So it's safe
+		 * to exit:
+		 */
+		error = -ESTALE;
+		goto fail_refresh;
+	}
+
+	gfs2_set_iop(inode);
+	unlock_new_inode(inode);
+	gfs2_glock_dq_uninit(&i_gh);
+	gfs2_glock_put(ip->i_gl);
+	return inode;
+
+fail_refresh:
+	ip->i_gl->gl_object = NULL; /* See note below */
+	gfs2_glock_dq_uninit(&i_gh);
+fail_put:
+	/* This setting of gl_object to NULL is done by the other lookup
+	 * functions. But if it races with someone reusing the dinode, we
+	 * don't want to mess them up. It seems necessary in order to prevent
+	 * buffer_heads from being attached after the i_gh is acquired.
+	 * But it seems like it has the potential to screw up people trying
+	 * to re-use the glock for a new incarnation of the inode.
+	 * For now, I'm going to move it inside the dq_uninit.
+	 */
+	/*ip->i_gl->gl_object = NULL;*/
+	gfs2_glock_put(ip->i_gl);
+fail:
+	iget_failed(inode);
+	return ERR_PTR(error);
+}
+
 struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr,
 				  u64 *no_formal_ino, unsigned int blktype)
 {
diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index 22c27a8..031e301 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -96,6 +96,8 @@ err:
 extern struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type, 
 				       u64 no_addr, u64 no_formal_ino,
 				       int non_block);
+extern struct inode *gfs2_inode_lookup_for_del(struct super_block *sb,
+					       u64 no_addr);
 extern struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr,
 					 u64 *no_formal_ino,
 					 unsigned int blktype);
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 12/15] gfs2: Remove unused param non_block from gfs2_inode_lookup
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
                   ` (10 preceding siblings ...)
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 11/15] GFS2: Add new function gfs2_inode_lookup_for_del Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 13/15] gfs2: Use new variable i_gl instead of ip->i_gl Bob Peterson
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Function gfs2_inode_lookup had parameter non_block which is no
longer used. This patch removes it and fixes the comments and
callers.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/dir.c        | 2 +-
 fs/gfs2/inode.c      | 8 ++++----
 fs/gfs2/inode.h      | 3 +--
 fs/gfs2/ops_fstype.c | 2 +-
 4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 487527b..f4caf2f 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -1563,7 +1563,7 @@ struct inode *gfs2_dir_search(struct inode *dir, const struct qstr *name,
 		brelse(bh);
 		if (fail_on_exist)
 			return ERR_PTR(-EEXIST);
-		return gfs2_inode_lookup(dir->i_sb, dtype, addr, formal_ino, 0);
+		return gfs2_inode_lookup(dir->i_sb, dtype, addr, formal_ino);
 	}
 	return ERR_PTR(-ENOENT);
 }
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 1c57f7d..631d4ef 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -78,15 +78,15 @@ static void gfs2_set_iop(struct inode *inode)
 /**
  * gfs2_inode_lookup - Lookup an inode
  * @sb: The super block
- * @no_addr: The inode number
  * @type: The type of the inode
- * non_block: Can we block on inodes that are being freed?
+ * @no_addr: The inode number
+ * @no_formal_ino: The formal inode number (for NFS)
  *
  * Returns: A VFS inode, or an error
  */
 
 struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
-				u64 no_addr, u64 no_formal_ino, int non_block)
+				u64 no_addr, u64 no_formal_ino)
 {
 	struct inode *inode;
 	struct gfs2_inode *ip;
@@ -304,7 +304,7 @@ struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr,
 	if (error)
 		goto fail;
 
-	inode = gfs2_inode_lookup(sb, DT_UNKNOWN, no_addr, 0, 1);
+	inode = gfs2_inode_lookup(sb, DT_UNKNOWN, no_addr, 0);
 	if (IS_ERR(inode))
 		goto fail;
 
diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index 031e301..9e838b3 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -94,8 +94,7 @@ err:
 }
 
 extern struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type, 
-				       u64 no_addr, u64 no_formal_ino,
-				       int non_block);
+				       u64 no_addr, u64 no_formal_ino);
 extern struct inode *gfs2_inode_lookup_for_del(struct super_block *sb,
 					       u64 no_addr);
 extern struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr,
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index baab99b..ee44116 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -451,7 +451,7 @@ static int gfs2_lookup_root(struct super_block *sb, struct dentry **dptr,
 	struct dentry *dentry;
 	struct inode *inode;
 
-	inode = gfs2_inode_lookup(sb, DT_DIR, no_addr, 0, 0);
+	inode = gfs2_inode_lookup(sb, DT_DIR, no_addr, 0);
 	if (IS_ERR(inode)) {
 		fs_err(sdp, "can't read in %s inode: %ld\n", name, PTR_ERR(inode));
 		return PTR_ERR(inode);
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 13/15] gfs2: Use new variable i_gl instead of ip->i_gl
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
                   ` (11 preceding siblings ...)
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 12/15] gfs2: Remove unused param non_block from gfs2_inode_lookup Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-06 18:31   ` Andrew W Elble
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 14/15] GFS2: Hold onto iopen glock longer when dinode creation fails Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 15/15] GFS2: Rework gfs2_evict_inode to prevent collisions with openers Bob Peterson
  14 siblings, 1 reply; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch adds a new variable to function gfs2_evict_inode that
simplifies the references to ip->i_gl. This is just for readability
and to clarify future patches.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/super.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 06bd72b..79ee54b 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1511,6 +1511,7 @@ static void gfs2_evict_inode(struct inode *inode)
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_holder gh;
+	struct gfs2_glock *i_gl = NULL;
 	struct address_space *metamapping;
 	int error;
 
@@ -1523,7 +1524,7 @@ static void gfs2_evict_inode(struct inode *inode)
 		goto out;
 
 	/* Must not read inode block until block type has been verified */
-	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);
+	error = gfs2_glock_nq_init(i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);
 	if (unlikely(error)) {
 		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
 		gfs2_glock_dq_wait(&ip->i_iopen_gh);
@@ -1575,14 +1576,14 @@ static void gfs2_evict_inode(struct inode *inode)
 	goto out_unlock;
 
 out_truncate:
-	gfs2_log_flush(sdp, ip->i_gl, NORMAL_FLUSH);
-	metamapping = gfs2_glock2aspace(ip->i_gl);
-	if (test_bit(GLF_DIRTY, &ip->i_gl->gl_flags)) {
+	gfs2_log_flush(sdp, i_gl, NORMAL_FLUSH);
+	metamapping = gfs2_glock2aspace(i_gl);
+	if (test_bit(GLF_DIRTY, &i_gl->gl_flags)) {
 		filemap_fdatawrite(metamapping);
 		filemap_fdatawait(metamapping);
 	}
 	write_inode_now(inode, 1);
-	gfs2_ail_flush(ip->i_gl, 0);
+	gfs2_ail_flush(i_gl, 0);
 
 	/* Case 2 starts here */
 	error = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
@@ -1616,7 +1617,7 @@ out:
 	ip->i_gl->gl_object = NULL;
 	flush_delayed_work(&ip->i_gl->gl_work);
 	gfs2_glock_add_to_lru(ip->i_gl);
-	gfs2_glock_put(ip->i_gl);
+	gfs2_glock_put(i_gl);
 	ip->i_gl = NULL;
 	if (ip->i_iopen_gh.gh_gl) {
 		ip->i_iopen_gh.gh_gl->gl_object = NULL;
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 14/15] GFS2: Hold onto iopen glock longer when dinode creation fails
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
                   ` (12 preceding siblings ...)
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 13/15] gfs2: Use new variable i_gl instead of ip->i_gl Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 15/15] GFS2: Rework gfs2_evict_inode to prevent collisions with openers Bob Peterson
  14 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch allows function gfs2_create_inode to keep hold of the
iopen glock longer in cases where the creation of a new dinode fails.
This prevents some nasty timing windows where a dinode is being
created and reused.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/inode.c | 19 ++++++++++++-------
 fs/gfs2/main.c  |  2 ++
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 631d4ef..a878767 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -673,7 +673,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	struct posix_acl *default_acl, *acl;
 	struct gfs2_holder ghs[2];
 	struct inode *inode = NULL;
-	struct gfs2_inode *dip = GFS2_I(dir), *ip;
+	struct gfs2_inode *dip = GFS2_I(dir), *ip = NULL;
 	struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode);
 	struct gfs2_glock *io_gl;
 	int error, free_vfs_inode = 0;
@@ -839,16 +839,16 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	}
 
 	if (error)
-		goto fail_gunlock3;
+		goto fail_gunlock2;
 
 	error = security_inode_init_security(&ip->i_inode, &dip->i_inode, name,
 					     &gfs2_initxattrs, NULL);
 	if (error)
-		goto fail_gunlock3;
+		goto fail_gunlock2;
 
 	error = link_dinode(dip, name, ip, &da);
 	if (error)
-		goto fail_gunlock3;
+		goto fail_gunlock2;
 
 	mark_inode_dirty(inode);
 	d_instantiate(dentry, inode);
@@ -864,9 +864,6 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	gfs2_glock_dq_uninit(ghs + 1);
 	return error;
 
-fail_gunlock3:
-	if (ip->i_iopen_gh.gh_gl) /* if holder is linked to the glock */
-		gfs2_glock_put(ip->i_iopen_gh.gh_gl);
 fail_gunlock2:
 	gfs2_glock_dq_uninit(ghs + 1);
 fail_free_inode:
@@ -883,6 +880,14 @@ fail_free_acls:
 		posix_acl_release(acl);
 fail_free_vfs_inode:
 	free_vfs_inode = 1;
+	/* We hold off until the very end to release the iopen glock. That
+	 * keeps other processes from acquiring it in EX mode and deleting
+	 * it while we're still using it. Since gfs2_delete_inode already
+	 * handles the iopen vs. inode glocks in any order, the lock order
+	 * does not matter. It must be done before iput, though, otherwise
+	 * we might get a segfault trying to dereference it. */
+	if (ip && ip->i_iopen_gh.gh_gl) /* if holder is linked to the glock */
+		gfs2_glock_put(ip->i_iopen_gh.gh_gl);
 fail_gunlock:
 	gfs2_dir_no_add(&da);
 	gfs2_glock_dq_uninit(ghs);
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 241a399..a8bd5b6 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -43,6 +43,8 @@ static void gfs2_init_inode_once(void *foo)
 	INIT_LIST_HEAD(&ip->i_trunc_list);
 	ip->i_res = NULL;
 	ip->i_hash_cache = NULL;
+	ip->i_iopen_gh.gh_gl = NULL;
+	INIT_LIST_HEAD(&ip->i_iopen_gh.gh_list);
 }
 
 static void gfs2_init_glock_once(void *foo)
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 15/15] GFS2: Rework gfs2_evict_inode to prevent collisions with openers
  2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
                   ` (13 preceding siblings ...)
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 14/15] GFS2: Hold onto iopen glock longer when dinode creation fails Bob Peterson
@ 2015-10-05 15:36 ` Bob Peterson
  14 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2015-10-05 15:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch mainly changes function gfs2_evict_inode so that it takes
out glock references and sets a DELETING bit to prevent collisions
from inode and glock creations from interfering. This is necessary
to ensure a smooth transition from unlinked to deleted status for
dinodes.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/incore.h  |  1 +
 fs/gfs2/meta_io.c |  6 ++++++
 fs/gfs2/super.c   | 40 ++++++++++++++++++++++++++++++++--------
 3 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index c8f9bb9..effa924 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -381,6 +381,7 @@ enum {
 	GIF_SW_PAGED		= 3,
 	GIF_ORDERED		= 4,
 	GIF_FREE_VFS_INODE      = 5,
+	GIF_AM_DELETING         = 6,
 };
 
 struct gfs2_inode {
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index c20d7ca..040eb15 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -128,6 +128,12 @@ struct buffer_head *gfs2_getbuf(struct gfs2_glock *gl, u64 blkno, int create)
 	index = blkno >> shift;             /* convert block to page */
 	bufnum = blkno - (index << shift);  /* block buf index within page */
 
+	if ((gl->gl_name.ln_type == LM_TYPE_INODE) && gl->gl_object) {
+		struct gfs2_inode *ip = gl->gl_object;
+		struct inode *inode = &ip->i_inode;
+		if (!test_bit(GIF_AM_DELETING, &ip->i_flags))
+			BUG_ON(inode->i_state & I_FREEING);
+	}
 	if (create) {
 		for (;;) {
 			page = grab_cache_page(mapping, index);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 79ee54b..80168b5 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1511,7 +1511,7 @@ static void gfs2_evict_inode(struct inode *inode)
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_holder gh;
-	struct gfs2_glock *i_gl = NULL;
+	struct gfs2_glock *io_gl = NULL, *i_gl = NULL;
 	struct address_space *metamapping;
 	int error;
 
@@ -1523,6 +1523,11 @@ static void gfs2_evict_inode(struct inode *inode)
 	if (inode->i_nlink || (sb->s_flags & MS_RDONLY))
 		goto out;
 
+	error = gfs2_glock_get(sdp, ip->i_no_addr, &gfs2_inode_glops,
+			       CREATE, &i_gl);
+	if (unlikely(error))
+		goto out;
+
 	/* Must not read inode block until block type has been verified */
 	error = gfs2_glock_nq_init(i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);
 	if (unlikely(error)) {
@@ -1532,10 +1537,13 @@ static void gfs2_evict_inode(struct inode *inode)
 		goto out;
 	}
 
+	set_bit(GIF_AM_DELETING, &ip->i_flags);
 	if (!test_bit(GIF_ALLOC_FAILED, &ip->i_flags)) {
 		error = gfs2_check_blk_type(sdp, ip->i_no_addr, GFS2_BLKST_UNLINKED);
-		if (error)
-			goto out_truncate;
+		if (error) {
+			gfs2_glock_dq_uninit(&gh);
+			goto out;
+		}
 	}
 
 	if (test_bit(GIF_INVALID, &ip->i_flags)) {
@@ -1544,10 +1552,22 @@ static void gfs2_evict_inode(struct inode *inode)
 			goto out_truncate;
 	}
 
+	/* The iopen glock may have been released prior to this, but we need
+	   it back. */
+	error = gfs2_glock_get(sdp, ip->i_no_addr, &gfs2_iopen_glops, CREATE,
+			       &io_gl);
+	if (error)
+		goto out_truncate;
+
 	ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
-	gfs2_glock_dq_wait(&ip->i_iopen_gh);
-	gfs2_holder_reinit(LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | GL_NOCACHE, &ip->i_iopen_gh);
-	error = gfs2_glock_nq(&ip->i_iopen_gh);
+	if (ip->i_iopen_gh.gh_gl &&
+	    test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
+		gfs2_glock_dq_wait(&ip->i_iopen_gh);
+		gfs2_holder_uninit(&ip->i_iopen_gh);
+	}
+
+	error = gfs2_glock_nq_init(io_gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB |
+				   GL_NOCACHE, &ip->i_iopen_gh);
 	if (error)
 		goto out_truncate;
 
@@ -1599,11 +1619,15 @@ out_unlock:
 	if (gfs2_rs_active(ip->i_res))
 		gfs2_rs_deltree(ip->i_res);
 
-	if (test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
+	if (ip->i_iopen_gh.gh_gl &&
+	    test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
 		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
 		gfs2_glock_dq_wait(&ip->i_iopen_gh);
+		gfs2_holder_uninit(&ip->i_iopen_gh);
 	}
-	gfs2_holder_uninit(&ip->i_iopen_gh);
+	if (io_gl)
+		gfs2_glock_put(io_gl);
+
 	gfs2_glock_dq_uninit(&gh);
 	if (error && error != GLR_TRYFAILED && error != -EROFS)
 		fs_warn(sdp, "gfs2_evict_inode: %d\n", error);
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 13/15] gfs2: Use new variable i_gl instead of ip->i_gl
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 13/15] gfs2: Use new variable i_gl instead of ip->i_gl Bob Peterson
@ 2015-10-06 18:31   ` Andrew W Elble
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew W Elble @ 2015-10-06 18:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com


Bob Peterson <rpeterso@redhat.com> writes:

> This patch adds a new variable to function gfs2_evict_inode that
> simplifies the references to ip->i_gl. This is just for readability
> and to clarify future patches.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/super.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 06bd72b..79ee54b 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c

<snip>

> @@ -1616,7 +1617,7 @@ out:
>  	ip->i_gl->gl_object = NULL;
>  	flush_delayed_work(&ip->i_gl->gl_work);
>  	gfs2_glock_add_to_lru(ip->i_gl);
> -	gfs2_glock_put(ip->i_gl);

        if (i_gl)

> +	gfs2_glock_put(i_gl);
>  	ip->i_gl = NULL;
>  	if (ip->i_iopen_gh.gh_gl) {
>  		ip->i_iopen_gh.gh_gl->gl_object = NULL;

-- 
Andrew W. Elble
aweits at discipline.rit.edu
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 06/15] GFS2: Prevent gl_delete work for re-used inodes
  2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 06/15] GFS2: Prevent gl_delete work for re-used inodes Bob Peterson
@ 2015-10-06 18:40   ` Andrew W Elble
  2015-10-06 19:02     ` Bob Peterson
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew W Elble @ 2015-10-06 18:40 UTC (permalink / raw)
  To: cluster-devel.redhat.com


Bob Peterson <rpeterso@redhat.com> writes:

> This patch adds a new glock flag GLF_INODE_DELETING which signifies
> when a glock is being used to change an inode from unlinked to
> deleted. The flag is used in a few places:

This is the change I made to what we're testing:

diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
index 20c007d..80f2ee7 100644
--- a/fs/gfs2/trace_gfs2.h
+++ b/fs/gfs2/trace_gfs2.h
@@ -57,7 +57,8 @@
        {(1UL << GLF_QUEUED),                   "q" },          \
        {(1UL << GLF_LRU),                      "L" },          \
        {(1UL << GLF_OBJECT),                   "o" },          \
-       {(1UL << GLF_BLOCKING),                 "b" })
+        {(1UL << GLF_BLOCKING),                        "b" },          \
+        {(1UL << GLF_INODE_DELETING),          "-" })
 
 #ifndef NUMPTY
 #define NUMPTY


-- 
Andrew W. Elble
aweits@discipline.rit.edu
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 06/15] GFS2: Prevent gl_delete work for re-used inodes
  2015-10-06 18:40   ` Andrew W Elble
@ 2015-10-06 19:02     ` Bob Peterson
  2015-10-06 19:33       ` Andrew W Elble
  0 siblings, 1 reply; 22+ messages in thread
From: Bob Peterson @ 2015-10-06 19:02 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
> Bob Peterson <rpeterso@redhat.com> writes:
> 
> > This patch adds a new glock flag GLF_INODE_DELETING which signifies
> > when a glock is being used to change an inode from unlinked to
> > deleted. The flag is used in a few places:
> 
> This is the change I made to what we're testing:
> 
> diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
> index 20c007d..80f2ee7 100644
> --- a/fs/gfs2/trace_gfs2.h
> +++ b/fs/gfs2/trace_gfs2.h
> @@ -57,7 +57,8 @@
>         {(1UL << GLF_QUEUED),                   "q" },          \
>         {(1UL << GLF_LRU),                      "L" },          \
>         {(1UL << GLF_OBJECT),                   "o" },          \
> -       {(1UL << GLF_BLOCKING),                 "b" })
> +        {(1UL << GLF_BLOCKING),                        "b" },          \
> +        {(1UL << GLF_INODE_DELETING),          "-" })
>  
>  #ifndef NUMPTY
>  #define NUMPTY
> 
> 
> --
> Andrew W. Elble
> aweits at discipline.rit.edu
> Infrastructure Engineer, Communications Technical Lead
> Rochester Institute of Technology
> PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

Hi Andrew,

Actually, I've found a few bugs and problems with that last patch set
and revised my patches last week. I've also added the glock flag, but
used "x" rather than "-" because I'm not sure I like punctuation marks there,
but nothing else makes sense either. The other changes are for the other
thing you spotted (which I caught in testing). The proper way to do it
is to initialize the i_gl to ip->i_gl in the evict code, and not have the
if at all. That affects two of the patches:
"gfs2: Use new variable i_gl instead of ip->i_gl"
and:
"GFS2: Rework gfs2_evict_inode to prevent collisions with openers"

I've been holding off on sending a revised set to see if I got other
comments or other problems during testing, but since you found them,
I'll just send out a new patch set.

Regards,

Bob Peterson
Red Hat File Systems
 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 06/15] GFS2: Prevent gl_delete work for re-used inodes
  2015-10-06 19:02     ` Bob Peterson
@ 2015-10-06 19:33       ` Andrew W Elble
  2015-10-12 19:15         ` Andrew W Elble
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew W Elble @ 2015-10-06 19:33 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Bob Peterson <rpeterso@redhat.com> writes:

> Hi Andrew,
>
> Actually, I've found a few bugs and problems with that last patch set
> and revised my patches last week. I've also added the glock flag, but
> used "x" rather than "-" because I'm not sure I like punctuation marks there,
> but nothing else makes sense either. The other changes are for the other
> thing you spotted (which I caught in testing). The proper way to do it
> is to initialize the i_gl to ip->i_gl in the evict code, and not have the
> if at all. That affects two of the patches:

I was wondering about that. I'll get that changed.

Thanks,

Andy

-- 
Andrew W. Elble
aweits at discipline.rit.edu
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 06/15] GFS2: Prevent gl_delete work for re-used inodes
  2015-10-06 19:33       ` Andrew W Elble
@ 2015-10-12 19:15         ` Andrew W Elble
  2015-10-13 17:28           ` Andrew W Elble
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew W Elble @ 2015-10-12 19:15 UTC (permalink / raw)
  To: cluster-devel.redhat.com


Bob,

   The deadlock is slightly different - but still occurs with your
   patches in place. I'm pretty sure this is what's happening:

nfsd (unlink) has i_mutex
        fs/nfsd/vfs.c:nfsd_unlink()
	    -> fh_lock_nested(fhp, I_MUTEX_PARENT);

nfsd (lookup) gets a glock
        fs/gfs2/inode.c:gfs2_inode_lookup()
	    -> error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, GL_EXACT, &ip->i_iopen_gh);

nfsd (unlink) needs conflicting glock
        fs/gfs2/super.c:gfs2_evict_inode()
            -> if (ip->i_iopen_gh.gh_gl &&
                  test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
                     gfs2_glock_dq_wait(&ip->i_iopen_gh);

client (lookup) waiting on i_mutex
        fs/nfsd/vfs.c:nfsd_lookup_dentry()
            -> fh_lock_nested(fhp, I_MUTEX_PARENT);

-> deadlock

G:  s:EX n:2/2f4935a f:yIqob t:EX d:EX/0 a:0 v:0 r:10 m:200
 H: s:EX f:H e:0 p:34415 [nfsd] gfs2_evict_inode+0x160/0x4d0 [gfs2]
 I: n:329967/49582938 t:8 f:0x00 d:0x00000000 s:500

G:  s:SH n:5/2f4935a f:DIqob t:SH d:UN/3833211000 a:0 v:0 r:4 m:200
 H: s:SH f:EH e:0 p:34414 [nfsd] gfs2_inode_lookup+0xee/0x1f0 [gfs2]

nfs client host a:
313189 11:47:26.499858000 x.y.z.a -> x.y.z.q NFS 342 V4 Call REMOVE DH: 0x46fbb746/0353fd0043cc75dd8203b16b5bd4c197-cache-mod_custom-e2acfa1435db9601a6b9645e9f8be86f.php

nfs client host b:
539106 11:49:28.390748000 x.y.z.b -> x.y.z.q NFS 362 V4 Call LOOKUP DH: 0x46fbb746/0353fd0043cc75dd8203b16b5bd4c197-cache-mod_custom-ea521049a8a64b325300eab10b4ac871.php

crash> bt
PID: 34414  TASK: ffff881fdc7428b0  CPU: 38  COMMAND: "nfsd"
 #0 [ffff881f2da57b70] __schedule at ffffffff8165bbc4
 #1 [ffff881f2da57bc0] schedule at ffffffff8165c267
 #2 [ffff881f2da57be0] schedule_preempt_disabled at ffffffff8165c59e
 #3 [ffff881f2da57bf0] __mutex_lock_slowpath at ffffffff8165e0d5
 #4 [ffff881f2da57c50] mutex_lock at ffffffff8165e173
 #5 [ffff881f2da57c70] nfsd_lookup_dentry at ffffffffa035454f [nfsd]
 #6 [ffff881f2da57cf0] nfsd_lookup at ffffffffa0354989 [nfsd]
 #7 [ffff881f2da57d40] nfsd4_lookup at ffffffffa0361a2a [nfsd]
 #8 [ffff881f2da57d50] nfsd4_proc_compound at ffffffffa0363d57 [nfsd]
 #9 [ffff881f2da57db0] nfsd_dispatch at ffffffffa034ff83 [nfsd]
#10 [ffff881f2da57df0] svc_process_common at ffffffffa0188260 [sunrpc]
#11 [ffff881f2da57e60] svc_process at ffffffffa0188603 [sunrpc]
#12 [ffff881f2da57e90] nfsd at ffffffffa034f98f [nfsd]
#13 [ffff881f2da57ec0] kthread at ffffffff81096989
#14 [ffff881f2da57f50] ret_from_fork at ffffffff81660462

crash> bt
PID: 34415  TASK: ffff881fec1a6c80  CPU: 24  COMMAND: "nfsd"
 #0 [ffff881f2db779f0] __schedule at ffffffff8165bbc4
 #1 [ffff881f2db77a40] schedule at ffffffff8165c267
 #2 [ffff881f2db77a60] bit_wait at ffffffff8165ca7c
 #3 [ffff881f2db77a70] __wait_on_bit at ffffffff8165c705
 #4 [ffff881f2db77ac0] out_of_line_wait_on_bit at ffffffff8165c7a2
 #5 [ffff881f2db77b30] gfs2_glock_dq_wait at ffffffffa0850553 [gfs2]
 #6 [ffff881f2db77b50] gfs2_evict_inode at ffffffffa08697d5 [gfs2]
 #7 [ffff881f2db77bf0] evict at ffffffff811fcbcb
 #8 [ffff881f2db77c20] iput at ffffffff811fd52b
 #9 [ffff881f2db77c50] d_delete at ffffffff811f8e38
#10 [ffff881f2db77c80] vfs_unlink at ffffffff811edf79
#11 [ffff881f2db77cd0] nfsd_unlink at ffffffffa0355dcf [nfsd]
#12 [ffff881f2db77d10] nfsd4_remove at ffffffffa0362ebd [nfsd]
#13 [ffff881f2db77d50] nfsd4_proc_compound at ffffffffa0363d57 [nfsd]
#14 [ffff881f2db77db0] nfsd_dispatch at ffffffffa034ff83 [nfsd]
#15 [ffff881f2db77df0] svc_process_common at ffffffffa0188260 [sunrpc]
#16 [ffff881f2db77e60] svc_process at ffffffffa0188603 [sunrpc]
#17 [ffff881f2db77e90] nfsd at ffffffffa034f98f [nfsd]
#18 [ffff881f2db77ec0] kthread at ffffffff81096989
#19 [ffff881f2db77f50] ret_from_fork at ffffffff81660462

Thanks,

Andy

-- 
Andrew W. Elble
aweits at discipline.rit.edu
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Cluster-devel] [GFS2 PATCH 06/15] GFS2: Prevent gl_delete work for re-used inodes
  2015-10-12 19:15         ` Andrew W Elble
@ 2015-10-13 17:28           ` Andrew W Elble
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew W Elble @ 2015-10-13 17:28 UTC (permalink / raw)
  To: cluster-devel.redhat.com


Upon further review, there's a lot more going on here. I'll have to get back
to you once I manage to pull all the relevant data out of the vmcore.

Thanks,

Andy

-- 
Andrew W. Elble
aweits at discipline.rit.edu
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-10-13 17:28 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-05 15:36 [Cluster-devel] [GFS2 PATCH 00/15] Fifteen patches related to file unlink->delete->new Bob Peterson
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 01/15] GFS2: Update master statfs buffer with sd_statfs_spin locked Bob Peterson
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 02/15] GFS2: Allow fail_gunlock3 to set the free_vfs_inode bit Bob Peterson
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 03/15] GFS2: Protect log tail calculations with inside locks Bob Peterson
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 04/15] GFS2: Wait for iopen glock dequeues Bob Peterson
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 05/15] GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear Bob Peterson
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 06/15] GFS2: Prevent gl_delete work for re-used inodes Bob Peterson
2015-10-06 18:40   ` Andrew W Elble
2015-10-06 19:02     ` Bob Peterson
2015-10-06 19:33       ` Andrew W Elble
2015-10-12 19:15         ` Andrew W Elble
2015-10-13 17:28           ` Andrew W Elble
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 07/15] GFS2: Truncate address space mapping when deleting an inode Bob Peterson
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 08/15] GFS2: Don't filter out I_FREEING inodes anymore Bob Peterson
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 09/15] GFS2: generalize gfs2_check_blk_type Bob Peterson
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 10/15] GFS2: Change from tr_touched to tr_bufs Bob Peterson
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 11/15] GFS2: Add new function gfs2_inode_lookup_for_del Bob Peterson
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 12/15] gfs2: Remove unused param non_block from gfs2_inode_lookup Bob Peterson
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 13/15] gfs2: Use new variable i_gl instead of ip->i_gl Bob Peterson
2015-10-06 18:31   ` Andrew W Elble
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 14/15] GFS2: Hold onto iopen glock longer when dinode creation fails Bob Peterson
2015-10-05 15:36 ` [Cluster-devel] [GFS2 PATCH 15/15] GFS2: Rework gfs2_evict_inode to prevent collisions with openers Bob Peterson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).