[Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new

cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed

* [Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new
@ 2015-09-10 19:49 Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 01/11] GFS2: Update master statfs buffer with sd_statfs_spin locked Bob Peterson
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Bob Peterson @ 2015-09-10 19:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This set of eleven patches was designed to fix several problems in GFS2
that had to do with transitioning dinodes from unlinked to deleted and
back to created again. Many of these problems resulted in dinodes being
not transitioning from the "unlinked" state to a "deleted" state.
Some of them address the problem of dinodes being re-created in place
(due to limited space in the file system) in which case callbacks were
received and delete work was queued for dinodes that were already
deleted and re-used by another node, only to have the callback destroy
the newly created dinode.

These patches were tested using virts that had 4GB GFS2 file systems,
using a test case that recreated numerous problems. With these patches,
all those problems disappear.

I don't really consider this patch set ready to ship yet. I just wanted
to toss them out there to get some code reviews and some feedback.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
Bob Peterson (11):
  GFS2: Update master statfs buffer with sd_statfs_spin locked
  GFS2: Allow fail_gunlock3 to set the free_vfs_inode bit
  GFS2: Protect log tail calculations with inside locks
  GFS2: Wait for iopen glock dequeues
  GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear
  GFS2: Prevent gl_delete work for re-used inodes
  GFS2: Truncate address space mapping when deleting an inode
  GFS2: Don't filter out I_FREEING inodes anymore
  GFS2: generalize gfs2_check_blk_type
  GFS2: Rework transition from unlinked to deleted dinodes
  GFS2: Change from tr_touched to tr_bufs

 fs/gfs2/dir.c        |   2 +-
 fs/gfs2/export.c     |   2 +-
 fs/gfs2/glock.c      |  15 ++--
 fs/gfs2/glops.c      |   3 +-
 fs/gfs2/incore.h     |   7 +-
 fs/gfs2/inode.c      | 231 ++++++++++++++++++++++++++++++++++++---------------
 fs/gfs2/inode.h      |   7 +-
 fs/gfs2/log.c        |   7 +-
 fs/gfs2/main.c       |   2 +
 fs/gfs2/meta_io.c    |  12 ++-
 fs/gfs2/ops_fstype.c |   2 +-
 fs/gfs2/rgrp.c       |  52 ++++++++----
 fs/gfs2/rgrp.h       |   1 +
 fs/gfs2/super.c      |  67 ++++++++++-----
 fs/gfs2/trans.c      |  13 +--
 fs/gfs2/xattr.c      |   6 ++
 16 files changed, 304 insertions(+), 125 deletions(-)

-- 
2.4.3
---

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Cluster-devel] [PATCH 01/11] GFS2: Update master statfs buffer with sd_statfs_spin locked
  2015-09-10 19:49 [Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new Bob Peterson
@ 2015-09-10 19:49 ` Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 02/11] GFS2: Allow fail_gunlock3 to set the free_vfs_inode bit Bob Peterson
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Bob Peterson @ 2015-09-10 19:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function update_statfs called gfs2_statfs_change_out
to update the master statfs buffer without the sd_statfs_spin held.
In theory, another process could call gfs2_statfs_sync, which takes
the sd_statfs_spin lock and re-reads m_sc from the buffer. So there's
a theoretical timing window in which one process could write the
master statfs buffer, then another comes along and re-reads it, wiping
out the changes.
---
 fs/gfs2/super.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 2982445..06487ed 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -556,6 +556,7 @@ void update_statfs(struct gfs2_sbd *sdp, struct buffer_head *m_bh,
 	struct gfs2_statfs_change_host *l_sc = &sdp->sd_statfs_local;
 
 	gfs2_trans_add_meta(l_ip->i_gl, l_bh);
+	gfs2_trans_add_meta(m_ip->i_gl, m_bh);
 
 	spin_lock(&sdp->sd_statfs_spin);
 	m_sc->sc_total += l_sc->sc_total;
@@ -564,10 +565,8 @@ void update_statfs(struct gfs2_sbd *sdp, struct buffer_head *m_bh,
 	memset(l_sc, 0, sizeof(struct gfs2_statfs_change));
 	memset(l_bh->b_data + sizeof(struct gfs2_dinode),
 	       0, sizeof(struct gfs2_statfs_change));
-	spin_unlock(&sdp->sd_statfs_spin);
-
-	gfs2_trans_add_meta(m_ip->i_gl, m_bh);
 	gfs2_statfs_change_out(m_sc, m_bh->b_data + sizeof(struct gfs2_dinode));
+	spin_unlock(&sdp->sd_statfs_spin);
 }
 
 int gfs2_statfs_sync(struct super_block *sb, int type)
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Cluster-devel] [PATCH 02/11] GFS2: Allow fail_gunlock3 to set the free_vfs_inode bit
  2015-09-10 19:49 [Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 01/11] GFS2: Update master statfs buffer with sd_statfs_spin locked Bob Peterson
@ 2015-09-10 19:49 ` Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 03/11] GFS2: Protect log tail calculations with inside locks Bob Peterson
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Bob Peterson @ 2015-09-10 19:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch changes a failure case in function gfs2_create_inode.
In some error paths, it jumps to label fail_gunlock3, which fails
to set the free_vfs_inode flag. This, in turn, prevents the code
from setting the GIF_FREE_VFS_INODE inode flag. That, in turn,
allow the code to mistakenly unlink the dinode and not ever delete
its data blocks.
---
 fs/gfs2/inode.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 063fdfc..c56edb8 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -766,11 +766,8 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	return error;
 
 fail_gunlock3:
-	gfs2_glock_dq_uninit(ghs + 1);
-	if (ip->i_gl)
-		gfs2_glock_put(ip->i_gl);
-	goto fail_gunlock;
-
+	if (ip->i_iopen_gh.gh_gl) /* if holder is linked to the glock */
+		gfs2_glock_put(ip->i_iopen_gh.gh_gl);
 fail_gunlock2:
 	gfs2_glock_dq_uninit(ghs + 1);
 fail_free_inode:
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Cluster-devel] [PATCH 03/11] GFS2: Protect log tail calculations with inside locks
  2015-09-10 19:49 [Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 01/11] GFS2: Update master statfs buffer with sd_statfs_spin locked Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 02/11] GFS2: Allow fail_gunlock3 to set the free_vfs_inode bit Bob Peterson
@ 2015-09-10 19:49 ` Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 04/11] GFS2: Wait for iopen glock dequeues Bob Peterson
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Bob Peterson @ 2015-09-10 19:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch adjusts two functions related to the journal's log tail.
It essentially moves them inside the protective locks so they can't
be moved by another process. If the calculations aren't protected,
the log's tail may not be properly pulled, which eventually makes
it seem like there isn't enough room in the journal for more
transactions. This results in a hang where GFS2 repeatedly does
log flushes without making any progress or making any room.
---
 fs/gfs2/log.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 536e7a6..18959de 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -277,11 +277,12 @@ static void gfs2_ail2_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 static void ail2_empty(struct gfs2_sbd *sdp, unsigned int new_tail)
 {
 	struct gfs2_trans *tr, *safe;
-	unsigned int old_tail = sdp->sd_log_tail;
-	int wrap = (new_tail < old_tail);
-	int a, b, rm;
+	unsigned int old_tail;
+	int wrap, a, b, rm;
 
 	spin_lock(&sdp->sd_ail_lock);
+	old_tail = sdp->sd_log_tail;
+	wrap = (new_tail < old_tail);
 
 	list_for_each_entry_safe(tr, safe, &sdp->sd_ail2_list, tr_list) {
 		a = (old_tail <= tr->tr_first);
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Cluster-devel] [PATCH 04/11] GFS2: Wait for iopen glock dequeues
  2015-09-10 19:49 [Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new Bob Peterson
                   ` (2 preceding siblings ...)
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 03/11] GFS2: Protect log tail calculations with inside locks Bob Peterson
@ 2015-09-10 19:49 ` Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 05/11] GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear Bob Peterson
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Bob Peterson @ 2015-09-10 19:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch changes every glock_dq for iopen glocks into a dq_wait.
This makes sure the iopen glock does not outlive the inode itself.
In turn, that ensures that anyone trying to unlink the glock will
be able to find the inode when it receives a remote iopen callback.
---
 fs/gfs2/inode.c | 3 ++-
 fs/gfs2/super.c | 8 +++++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index c56edb8..ce4b793 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -191,7 +191,8 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
 fail_refresh:
 	ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
 	ip->i_iopen_gh.gh_gl->gl_object = NULL;
-	gfs2_glock_dq_uninit(&ip->i_iopen_gh);
+	gfs2_glock_dq_wait(&ip->i_iopen_gh);
+	gfs2_holder_uninit(&ip->i_iopen_gh);
 fail_iopen:
 	if (io_gl)
 		gfs2_glock_put(io_gl);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 06487ed..7eba621 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1525,7 +1525,8 @@ static void gfs2_evict_inode(struct inode *inode)
 	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);
 	if (unlikely(error)) {
 		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
-		gfs2_glock_dq_uninit(&ip->i_iopen_gh);
+		gfs2_glock_dq_wait(&ip->i_iopen_gh);
+		gfs2_holder_uninit(&ip->i_iopen_gh);
 		goto out;
 	}
 
@@ -1597,7 +1598,7 @@ out_unlock:
 
 	if (test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
 		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
-		gfs2_glock_dq(&ip->i_iopen_gh);
+		gfs2_glock_dq_wait(&ip->i_iopen_gh);
 	}
 	gfs2_holder_uninit(&ip->i_iopen_gh);
 	gfs2_glock_dq_uninit(&gh);
@@ -1618,7 +1619,8 @@ out:
 	if (ip->i_iopen_gh.gh_gl) {
 		ip->i_iopen_gh.gh_gl->gl_object = NULL;
 		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
-		gfs2_glock_dq_uninit(&ip->i_iopen_gh);
+		gfs2_glock_dq_wait(&ip->i_iopen_gh);
+		gfs2_holder_uninit(&ip->i_iopen_gh);
 	}
 }
 
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Cluster-devel] [PATCH 05/11] GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear
  2015-09-10 19:49 [Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new Bob Peterson
                   ` (3 preceding siblings ...)
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 04/11] GFS2: Wait for iopen glock dequeues Bob Peterson
@ 2015-09-10 19:49 ` Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 06/11] GFS2: Prevent gl_delete work for re-used inodes Bob Peterson
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Bob Peterson @ 2015-09-10 19:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

At some point in the past, we used to have a timeout when GFS2 was
unmounting, trying to clear out its glocks. If the timeout expires,
it would dump the remaining glocks to the kernel messages so that
developers can debug the problem. That timeout was eliminated,
probably by accident. This patch reintroduces it.
---
 fs/gfs2/glock.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 9bd1244..c352359 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1506,7 +1506,9 @@ void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
 	flush_workqueue(glock_workqueue);
 	glock_hash_walk(clear_glock, sdp);
 	flush_workqueue(glock_workqueue);
-	wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
+	wait_event_timeout(sdp->sd_glock_wait,
+			   atomic_read(&sdp->sd_glock_disposal) == 0,
+			   HZ * 600);
 	glock_hash_walk(dump_glock_func, sdp);
 }
 
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Cluster-devel] [PATCH 06/11] GFS2: Prevent gl_delete work for re-used inodes
  2015-09-10 19:49 [Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new Bob Peterson
                   ` (4 preceding siblings ...)
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 05/11] GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear Bob Peterson
@ 2015-09-10 19:49 ` Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 07/11] GFS2: Truncate address space mapping when deleting an inode Bob Peterson
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Bob Peterson @ 2015-09-10 19:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch adds a new glock flag GLF_INODE_DELETING which signifies
when a glock is being used to change an inode from unlinked to
deleted. The flag is used in a few places:
1. If an iopen callback is received, it checks the flag. If the bit
   is set, someone else has already started deleting the inode.
   In that case, the delete_func may already be running, so we don't
   want to queue it to run another time. Doing so only gets us into
   trouble.
2. When a dinode is in the process of being created, we've been
   assigned that block by the allocator, so it must have been free.
   At that point, we check if there is pending delete work pending,
   and if so, we cancel it to prevent the block from being deleted
   while we're creating it. This is necessary because there could
   be pending delete work that was queued up a while ago, but the
   delete work might have been done on another node, which is how
   the block became freed. However, we keep the GLF_INODE_DELETING
   set to prevent new delete work from being queued. After we're
   done creating, we clear the bit, otherwise the file may not be
   deleted ever again, even in legitimate cases in the future.
3. In function try_rgrp_unlink, we also make sure the bit isn't
   already set before we try to reclaim an unlinked block.
---
 fs/gfs2/glops.c  |  3 ++-
 fs/gfs2/incore.h |  1 +
 fs/gfs2/inode.c  | 21 ++++++++++++++++++++-
 fs/gfs2/rgrp.c   |  6 ++++++
 4 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 1f6c9c3..b604343 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -553,7 +553,8 @@ static void iopen_go_callback(struct gfs2_glock *gl, bool remote)
 		return;
 
 	if (gl->gl_demote_state == LM_ST_UNLOCKED &&
-	    gl->gl_state == LM_ST_SHARED && ip) {
+	    gl->gl_state == LM_ST_SHARED && ip &&
+	    !test_and_set_bit(GLF_INODE_DELETING, &gl->gl_flags)) {
 		gl->gl_lockref.count++;
 		if (queue_work(gfs2_delete_workqueue, &gl->gl_delete) == 0)
 			gl->gl_lockref.count--;
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 121ed08..5065e0c 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -326,6 +326,7 @@ enum {
 	GLF_LRU				= 13,
 	GLF_OBJECT			= 14, /* Used only for tracing */
 	GLF_BLOCKING			= 15,
+	GLF_INODE_DELETING		= 16, /* Was unlinked, being deleted */
 };
 
 struct gfs2_glock {
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index ce4b793..833f8fa 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -597,6 +597,7 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	int error, free_vfs_inode = 0;
 	u32 aflags = 0;
 	unsigned blocks = 1;
+	int delete_prevented = 0;
 	struct gfs2_diradd da = { .bh = NULL, .save_loc = 1, };
 
 	if (!name->len || name->len > GFS2_FNAMESIZE)
@@ -705,6 +706,17 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 	if (error)
 		goto fail_free_inode;
 
+	/*
+	 * Cancel any pending delete work for this glock. If there's pending
+	 * delete work, we'd otherwise try to delete the dinode, but since we
+	 * were assigned this address by alloc_dinode, the block is already
+	 * free, so there's no need to attempt to change it from unlinked to
+	 * free. We'd just get into trouble trying to do so. The biggest
+	 * problem is having gfs2_delete_inode called while there pages
+	 * still in existence due to a race between create and delete.
+	 */
+	if (cancel_work_sync(&ip->i_gl->gl_delete))
+		delete_prevented = 1;
 	ip->i_gl->gl_object = ip;
 	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, ghs + 1);
 	if (error)
@@ -762,6 +774,10 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 		*opened |= FILE_CREATED;
 		error = finish_open(file, dentry, gfs2_open_common, opened);
 	}
+	if (delete_prevented) {
+		gfs2_glock_put(ip->i_gl); /* fix the gl reference count */
+		clear_bit(GLF_INODE_DELETING, &ip->i_gl->gl_flags);
+	}
 	gfs2_glock_dq_uninit(ghs);
 	gfs2_glock_dq_uninit(ghs + 1);
 	return error;
@@ -772,8 +788,11 @@ fail_gunlock3:
 fail_gunlock2:
 	gfs2_glock_dq_uninit(ghs + 1);
 fail_free_inode:
-	if (ip->i_gl)
+	if (ip->i_gl) {
+		if (delete_prevented)
+			clear_bit(GLF_INODE_DELETING, &ip->i_gl->gl_flags);
 		gfs2_glock_put(ip->i_gl);
+	}
 	gfs2_rs_delete(ip, NULL);
 fail_free_acls:
 	if (default_acl)
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 475985d..b936ee1 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1807,6 +1807,12 @@ static void try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked, u64 skip
 		if (error)
 			continue;
 
+		/* Make sure we're not queuing a delete if someone else has */
+		if (test_and_set_bit(GLF_INODE_DELETING, &gl->gl_flags)) {
+			gfs2_glock_put(gl);
+			continue;
+		}
+
 		/* If the inode is already in cache, we can ignore it here
 		 * because the existing inode disposal code will deal with
 		 * it when all refs have gone away. Accessing gl_object like
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Cluster-devel] [PATCH 07/11] GFS2: Truncate address space mapping when deleting an inode
  2015-09-10 19:49 [Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new Bob Peterson
                   ` (5 preceding siblings ...)
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 06/11] GFS2: Prevent gl_delete work for re-used inodes Bob Peterson
@ 2015-09-10 19:49 ` Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 08/11] GFS2: Don't filter out I_FREEING inodes anymore Bob Peterson
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Bob Peterson @ 2015-09-10 19:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

In function gfs2_delete_inode() we write and flush the mapping for
a glock, among other things. We truncate the mapping for the inode,
but we never truncate the mapping for the glock. This patch makes it
also truncate the metamapping. This avoid cases where the glock is
reused by another process who is trying to recreate an inode in its
place using the same block.
---
 fs/gfs2/super.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 7eba621..2f29ca5 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1511,6 +1511,7 @@ static void gfs2_evict_inode(struct inode *inode)
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_holder gh;
+	struct address_space *metamapping;
 	int error;
 
 	if (test_bit(GIF_FREE_VFS_INODE, &ip->i_flags)) {
@@ -1575,8 +1576,8 @@ static void gfs2_evict_inode(struct inode *inode)
 
 out_truncate:
 	gfs2_log_flush(sdp, ip->i_gl, NORMAL_FLUSH);
+	metamapping = gfs2_glock2aspace(ip->i_gl);
 	if (test_bit(GLF_DIRTY, &ip->i_gl->gl_flags)) {
-		struct address_space *metamapping = gfs2_glock2aspace(ip->i_gl);
 		filemap_fdatawrite(metamapping);
 		filemap_fdatawait(metamapping);
 	}
@@ -1589,6 +1590,7 @@ out_truncate:
 		goto out_unlock;
 	/* Needs to be done before glock release & also in a transaction */
 	truncate_inode_pages(&inode->i_data, 0);
+	truncate_inode_pages(metamapping, 0);
 	gfs2_trans_end(sdp);
 
 out_unlock:
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Cluster-devel] [PATCH 08/11] GFS2: Don't filter out I_FREEING inodes anymore
  2015-09-10 19:49 [Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new Bob Peterson
                   ` (6 preceding siblings ...)
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 07/11] GFS2: Truncate address space mapping when deleting an inode Bob Peterson
@ 2015-09-10 19:49 ` Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 09/11] GFS2: generalize gfs2_check_blk_type Bob Peterson
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Bob Peterson @ 2015-09-10 19:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch basically reverts a very old patch from 2008,
7a9f53b3c1875bef22ad4588e818bc046ef183da, with the title
"Alternate gfs2_iget to avoid looking up inodes being freed".
The original patch was designed to avoid a deadlock caused by lock
ordering with try_rgrp_unlink. The patch forced the function to not
find inodes that were being removed by VFS. The problem is, that
made it impossible for nodes to delete their own unlinked dinodes
after a certain point in time, because the inode needed was not found
by this filtering process. There is no longer a need for the patch,
since functio try_rgrp_unlink no longer locks the inode: All it does
is queue the glock onto the delete work_queue, so there should be no
more deadlock.
---
 fs/gfs2/export.c |  2 +-
 fs/gfs2/glock.c  |  2 +-
 fs/gfs2/inode.c  | 59 ++++----------------------------------------------------
 fs/gfs2/inode.h  |  2 +-
 4 files changed, 7 insertions(+), 58 deletions(-)

diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c
index 5d15e94..d5bda85 100644
--- a/fs/gfs2/export.c
+++ b/fs/gfs2/export.c
@@ -137,7 +137,7 @@ static struct dentry *gfs2_get_dentry(struct super_block *sb,
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	struct inode *inode;
 
-	inode = gfs2_ilookup(sb, inum->no_addr, 0);
+	inode = gfs2_ilookup(sb, inum->no_addr);
 	if (inode) {
 		if (GFS2_I(inode)->i_no_formal_ino != inum->no_formal_ino) {
 			iput(inode);
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index c352359..e5dfbcd 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -576,7 +576,7 @@ static void delete_work_func(struct work_struct *work)
 	/* Note: Unsafe to dereference ip as we don't hold right refs/locks */
 
 	if (ip)
-		inode = gfs2_ilookup(sdp->sd_vfs, no_addr, 1);
+		inode = gfs2_ilookup(sdp->sd_vfs, no_addr);
 	else
 		inode = gfs2_lookup_by_inum(sdp, no_addr, NULL, GFS2_BLKST_UNLINKED);
 	if (inode && !IS_ERR(inode)) {
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 833f8fa..06c208b 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -37,61 +37,9 @@
 #include "super.h"
 #include "glops.h"
 
-struct gfs2_skip_data {
-	u64 no_addr;
-	int skipped;
-	int non_block;
-};
-
-static int iget_test(struct inode *inode, void *opaque)
-{
-	struct gfs2_inode *ip = GFS2_I(inode);
-	struct gfs2_skip_data *data = opaque;
-
-	if (ip->i_no_addr == data->no_addr) {
-		if (data->non_block &&
-		    inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE)) {
-			data->skipped = 1;
-			return 0;
-		}
-		return 1;
-	}
-	return 0;
-}
-
-static int iget_set(struct inode *inode, void *opaque)
+struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr)
 {
-	struct gfs2_inode *ip = GFS2_I(inode);
-	struct gfs2_skip_data *data = opaque;
-
-	if (data->skipped)
-		return -ENOENT;
-	inode->i_ino = (unsigned long)(data->no_addr);
-	ip->i_no_addr = data->no_addr;
-	return 0;
-}
-
-struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr, int non_block)
-{
-	unsigned long hash = (unsigned long)no_addr;
-	struct gfs2_skip_data data;
-
-	data.no_addr = no_addr;
-	data.skipped = 0;
-	data.non_block = non_block;
-	return ilookup5(sb, hash, iget_test, &data);
-}
-
-static struct inode *gfs2_iget(struct super_block *sb, u64 no_addr,
-			       int non_block)
-{
-	struct gfs2_skip_data data;
-	unsigned long hash = (unsigned long)no_addr;
-
-	data.no_addr = no_addr;
-	data.skipped = 0;
-	data.non_block = non_block;
-	return iget5_locked(sb, hash, iget_test, iget_set, &data);
+	return ilookup(sb, (unsigned long)no_addr);
 }
 
 /**
@@ -145,8 +93,9 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
 	struct gfs2_glock *io_gl = NULL;
 	int error;
 
-	inode = gfs2_iget(sb, no_addr, non_block);
+	inode = iget_locked(sb, (unsigned long)no_addr);
 	ip = GFS2_I(inode);
+	ip->i_no_addr = no_addr;
 
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index ba4d949..22c27a8 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -99,7 +99,7 @@ extern struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type,
 extern struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr,
 					 u64 *no_formal_ino,
 					 unsigned int blktype);
-extern struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr, int nonblock);
+extern struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr);
 
 extern int gfs2_inode_refresh(struct gfs2_inode *ip);
 
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Cluster-devel] [PATCH 09/11] GFS2: generalize gfs2_check_blk_type
  2015-09-10 19:49 [Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new Bob Peterson
                   ` (7 preceding siblings ...)
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 08/11] GFS2: Don't filter out I_FREEING inodes anymore Bob Peterson
@ 2015-09-10 19:49 ` Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 10/11] GFS2: Rework transition from unlinked to deleted dinodes Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 11/11] GFS2: Change from tr_touched to tr_bufs Bob Peterson
  10 siblings, 0 replies; 12+ messages in thread
From: Bob Peterson @ 2015-09-10 19:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch reorganizes function gfs2_check_blk_type so that a more
generalized function, gfs2_get_block_type, may be used in future
enhancements related to the freeing of unlinked dinodes.
---
 fs/gfs2/rgrp.c | 46 ++++++++++++++++++++++++++++++++--------------
 fs/gfs2/rgrp.h |  1 +
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index b936ee1..0e3be08 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -2125,14 +2125,14 @@ void gfs2_inplace_release(struct gfs2_inode *ip)
 }
 
 /**
- * gfs2_get_block_type - Check a block in a RG is of given type
+ * __get_block_type - Check a block in a RG is of given type
  * @rgd: the resource group holding the block
  * @block: the block number
  *
  * Returns: The block type (GFS2_BLKST_*)
  */
 
-static unsigned char gfs2_get_block_type(struct gfs2_rgrpd *rgd, u64 block)
+static unsigned char __get_block_type(struct gfs2_rgrpd *rgd, u64 block)
 {
 	struct gfs2_rbm rbm = { .rgd = rgd, };
 	int ret;
@@ -2518,7 +2518,7 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
 }
 
 /**
- * gfs2_check_blk_type - Check the type of a block
+ * gfs2_get_blk_type - Get the type of a block
  * @sdp: The superblock
  * @no_addr: The block number to check
  * @type: The block type we are looking for
@@ -2528,26 +2528,44 @@ void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip)
  *          or -ve errno if something went wrong while checking
  */
 
-int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr, unsigned int type)
+int gfs2_get_blk_type(struct gfs2_sbd *sdp, u64 no_addr)
 {
 	struct gfs2_rgrpd *rgd;
 	struct gfs2_holder rgd_gh;
-	int error = -EINVAL;
+	int ret;
 
 	rgd = gfs2_blk2rgrpd(sdp, no_addr, 1);
 	if (!rgd)
-		goto fail;
+		return -EINVAL;
 
-	error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_SHARED, 0, &rgd_gh);
-	if (error)
-		goto fail;
+	ret = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_SHARED, 0, &rgd_gh);
+	if (ret == 0) {
+		ret = __get_block_type(rgd, no_addr);
+		gfs2_glock_dq_uninit(&rgd_gh);
+	}
+	return ret;
+}
 
-	if (gfs2_get_block_type(rgd, no_addr) != type)
-		error = -ESTALE;
+/**
+ * gfs2_check_blk_type - Check the type of a block
+ * @sdp: The superblock
+ * @no_addr: The block number to check
+ * @type: The block type we are looking for
+ *
+ * Returns: 0 if the block type matches the expected type
+ *          -ESTALE if it doesn't match
+ *          or -ve errno if something went wrong while checking
+ */
 
-	gfs2_glock_dq_uninit(&rgd_gh);
-fail:
-	return error;
+int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr, unsigned int type)
+{
+	int ty = gfs2_get_blk_type(sdp, no_addr);
+
+	if (ty < 0)
+		return ty;
+	if (ty != type)
+		return -ESTALE;
+	return 0;
 }
 
 /**
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index c0ab33f..89cdab9 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -56,6 +56,7 @@ extern void __gfs2_free_blocks(struct gfs2_inode *ip, u64 bstart, u32 blen, int
 extern void gfs2_free_meta(struct gfs2_inode *ip, u64 bstart, u32 blen);
 extern void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip);
 extern void gfs2_unlink_di(struct inode *inode);
+extern int gfs2_get_blk_type(struct gfs2_sbd *sdp, u64 no_addr);
 extern int gfs2_check_blk_type(struct gfs2_sbd *sdp, u64 no_addr,
 			       unsigned int type);
 
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Cluster-devel] [PATCH 10/11] GFS2: Rework transition from unlinked to deleted dinodes
  2015-09-10 19:49 [Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new Bob Peterson
                   ` (8 preceding siblings ...)
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 09/11] GFS2: generalize gfs2_check_blk_type Bob Peterson
@ 2015-09-10 19:49 ` Bob Peterson
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 11/11] GFS2: Change from tr_touched to tr_bufs Bob Peterson
  10 siblings, 0 replies; 12+ messages in thread
From: Bob Peterson @ 2015-09-10 19:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch reworks the glock's delete_work_func and function
gfs2_delete_inode so that dinodes make a proper transition from the
unlinked state to the free state. The biggest problem is that there
was a huge timing window where a node gets a callback from DLM to
delete an unlinked dinode, but it cannot do so because the inode in
memory has already been freed or is in the I_FREEING state, and
therefore not found. A previous patch allowed GFS2 to create a new
inode in those cases, but it can only do so after it waits for the
duplicate to actually be freed. This waiting caused major problems
with deadlocks due to lock ordering problems involving the inode and
its two glocks: i_gl and i_iopen_gl. This patch reworks that whole
area of code to avoid those deadlocks and still give us the proper
transition to free up unlinked dinodes.
---
 fs/gfs2/dir.c        |   2 +-
 fs/gfs2/glock.c      |   9 ++--
 fs/gfs2/incore.h     |   1 +
 fs/gfs2/inode.c      | 141 +++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/gfs2/inode.h      |   5 +-
 fs/gfs2/main.c       |   2 +
 fs/gfs2/meta_io.c    |   6 +++
 fs/gfs2/ops_fstype.c |   2 +-
 fs/gfs2/super.c      |  52 ++++++++++++++-----
 fs/gfs2/xattr.c      |   6 +++
 10 files changed, 201 insertions(+), 25 deletions(-)

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 487527b..f4caf2f 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -1563,7 +1563,7 @@ struct inode *gfs2_dir_search(struct inode *dir, const struct qstr *name,
 		brelse(bh);
 		if (fail_on_exist)
 			return ERR_PTR(-EEXIST);
-		return gfs2_inode_lookup(dir->i_sb, dtype, addr, formal_ino, 0);
+		return gfs2_inode_lookup(dir->i_sb, dtype, addr, formal_ino);
 	}
 	return ERR_PTR(-ENOENT);
 }
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index e5dfbcd..59f089c 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -569,16 +569,17 @@ static void delete_work_func(struct work_struct *work)
 	struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_delete);
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	struct gfs2_inode *ip;
-	struct inode *inode;
+	struct inode *inode = NULL;
 	u64 no_addr = gl->gl_name.ln_number;
 
 	ip = gl->gl_object;
 	/* Note: Unsafe to dereference ip as we don't hold right refs/locks */
-
 	if (ip)
 		inode = gfs2_ilookup(sdp->sd_vfs, no_addr);
-	else
-		inode = gfs2_lookup_by_inum(sdp, no_addr, NULL, GFS2_BLKST_UNLINKED);
+
+	if (inode == NULL || IS_ERR(inode))
+		inode = gfs2_inode_lookup_for_del(sdp->sd_vfs, no_addr);
+
 	if (inode && !IS_ERR(inode)) {
 		d_prune_aliases(inode);
 		iput(inode);
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 5065e0c..3cf0224 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -381,6 +381,7 @@ enum {
 	GIF_SW_PAGED		= 3,
 	GIF_ORDERED		= 4,
 	GIF_FREE_VFS_INODE      = 5,
+	GIF_AM_DELETING         = 6,
 };
 
 struct gfs2_inode {
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 06c208b..631d4ef 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -78,15 +78,15 @@ static void gfs2_set_iop(struct inode *inode)
 /**
  * gfs2_inode_lookup - Lookup an inode
  * @sb: The super block
- * @no_addr: The inode number
  * @type: The type of the inode
- * non_block: Can we block on inodes that are being freed?
+ * @no_addr: The inode number
+ * @no_formal_ino: The formal inode number (for NFS)
  *
  * Returns: A VFS inode, or an error
  */
 
 struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
-				u64 no_addr, u64 no_formal_ino, int non_block)
+				u64 no_addr, u64 no_formal_ino)
 {
 	struct inode *inode;
 	struct gfs2_inode *ip;
@@ -153,6 +153,139 @@ fail:
 	return ERR_PTR(error);
 }
 
+/**
+ * gfs2_inode_lookup_for_del - Lookup an unlinked inode so we can delete it
+ * @sb: The super block
+ * @no_addr: The inode number
+ *
+ * Returns: A VFS inode, or an error
+ *
+ * We jump through some hoops here to avoid a special case in which the block
+ * has been freed and already reallocated for a different inode while after
+ * the iopen callback was received to signify a remote unlink that needs
+ * deleting. In that case, we don't want to return the inode to the caller
+ * to delete the inode. We also need to do an inode_refresh to ensure that
+ * whomever recreated the dinode gets a proper i_nlink count, otherwise the
+ * vfs might think it's unlinked and still needs deleting.
+ */
+struct inode *gfs2_inode_lookup_for_del(struct super_block *sb, u64 no_addr)
+{
+	struct inode *inode;
+	struct gfs2_inode *ip;
+	struct gfs2_glock *io_gl = NULL;
+	struct gfs2_holder i_gh;
+	int error, btype;
+	struct gfs2_sbd *sdp = sb->s_fs_info;
+	struct address_space *mapping;
+	BUG_ON(no_addr == 0);
+	inode = iget_locked(sb, (unsigned long)no_addr);
+	ip = GFS2_I(inode);
+
+	if (!inode)
+		return ERR_PTR(-ENOBUFS);
+
+	if (!(inode->i_state & I_NEW)) {
+		BUG_ON(ip->i_no_addr != no_addr);
+		return inode;
+	}
+
+	ip->i_no_addr = no_addr;
+
+	error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE,
+			       &ip->i_gl);
+	if (unlikely(error))
+		goto fail;
+	/* If this is a brand new inode, but a recycled glock, we've got a
+	   reference problem. In clear_inode it does an extra glock_put so the
+	   next time we do clear_inode for this inode, we'll get in trouble.
+	   So we hold the glock to bump the reference count. */
+	if (ip->i_gl->gl_flags != 0) /* New inode, recycled glock */
+		ip->i_gl->gl_lockref.count++;
+
+	ip->i_gl->gl_object = ip;
+
+	error = gfs2_glock_get(sdp, no_addr, &gfs2_iopen_glops, CREATE,
+			       &io_gl);
+	if (unlikely(error))
+		goto fail_put;
+
+	ip->i_iopen_gh.gh_gl = NULL;
+	gfs2_glock_put(io_gl);
+	io_gl = NULL;
+
+	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &i_gh);
+	if (error)
+		goto fail_put;
+
+	/* Inode glock must be locked already */
+	btype = gfs2_get_blk_type(sdp, no_addr);
+	if (btype < 0) {
+		error = btype;
+		goto fail_refresh;
+	}
+	if (btype == GFS2_BLKST_FREE || btype == GFS2_BLKST_USED) {
+		error = -ESTALE;
+		goto fail_refresh;
+	}
+
+	/* At this point it's either UNLINKED or DINODE */
+
+	/* If there aren't any pages associated with it and it's a new inode,
+	 * we shouldn't be messing with it, even to read in pages. We should
+	 * just exit and let whomever's using it carry on. If we do inode
+	 * refresh, we'll end up adding pages to the cache that we'd otherwise
+	 * need to truncate with truncate_inode_page().
+	 */
+	mapping = gfs2_glock2aspace(ip->i_gl);
+	if (mapping->nrpages == 0 && btype == GFS2_BLKST_DINODE) {
+		error = -ESTALE;
+		goto fail_refresh;
+	}
+	/* At this point, we've got a dinode with pages associated with it
+	 * or it's unlinked. If it's unlinked, we need to do inode_refresh
+	 * so that our put() will delete it normally through gfs2_delete_inode.
+	 * If it has pages associated with it, we may have grabbed the glock
+	 * while it was being created, so we need to refresh it before
+	 * exiting.
+	 */
+	error = gfs2_inode_refresh(GFS2_I(inode));
+	if (error)
+		goto fail_refresh;
+
+	if (btype == GFS2_BLKST_DINODE) {
+		/* Now we know this is a dinode (not unlinked), but we know
+		 * there were already pages associated with it. So it's safe
+		 * to exit:
+		 */
+		error = -ESTALE;
+		goto fail_refresh;
+	}
+
+	gfs2_set_iop(inode);
+	unlock_new_inode(inode);
+	gfs2_glock_dq_uninit(&i_gh);
+	gfs2_glock_put(ip->i_gl);
+	return inode;
+
+fail_refresh:
+	ip->i_gl->gl_object = NULL; /* See note below */
+	gfs2_glock_dq_uninit(&i_gh);
+fail_put:
+	/* This setting of gl_object to NULL is done by the other lookup
+	 * functions. But if it races with someone reusing the dinode, we
+	 * don't want to mess them up. It seems necessary in order to prevent
+	 * buffer_heads from being attached after the i_gh is acquired.
+	 * But it seems like it has the potential to screw up people trying
+	 * to re-use the glock for a new incarnation of the inode.
+	 * For now, I'm going to move it inside the dq_uninit.
+	 */
+	/*ip->i_gl->gl_object = NULL;*/
+	gfs2_glock_put(ip->i_gl);
+fail:
+	iget_failed(inode);
+	return ERR_PTR(error);
+}
+
 struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr,
 				  u64 *no_formal_ino, unsigned int blktype)
 {
@@ -171,7 +304,7 @@ struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr,
 	if (error)
 		goto fail;
 
-	inode = gfs2_inode_lookup(sb, DT_UNKNOWN, no_addr, 0, 1);
+	inode = gfs2_inode_lookup(sb, DT_UNKNOWN, no_addr, 0);
 	if (IS_ERR(inode))
 		goto fail;
 
diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index 22c27a8..9e838b3 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -94,8 +94,9 @@ err:
 }
 
 extern struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type, 
-				       u64 no_addr, u64 no_formal_ino,
-				       int non_block);
+				       u64 no_addr, u64 no_formal_ino);
+extern struct inode *gfs2_inode_lookup_for_del(struct super_block *sb,
+					       u64 no_addr);
 extern struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr,
 					 u64 *no_formal_ino,
 					 unsigned int blktype);
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 241a399..a8bd5b6 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -43,6 +43,8 @@ static void gfs2_init_inode_once(void *foo)
 	INIT_LIST_HEAD(&ip->i_trunc_list);
 	ip->i_res = NULL;
 	ip->i_hash_cache = NULL;
+	ip->i_iopen_gh.gh_gl = NULL;
+	INIT_LIST_HEAD(&ip->i_iopen_gh.gh_list);
 }
 
 static void gfs2_init_glock_once(void *foo)
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index 0e1d4be..f5b4948 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -128,6 +128,12 @@ struct buffer_head *gfs2_getbuf(struct gfs2_glock *gl, u64 blkno, int create)
 	index = blkno >> shift;             /* convert block to page */
 	bufnum = blkno - (index << shift);  /* block buf index within page */
 
+	if ((gl->gl_name.ln_type == LM_TYPE_INODE) && gl->gl_object) {
+		struct gfs2_inode *ip = gl->gl_object;
+		struct inode *inode = &ip->i_inode;
+		if (!test_bit(GIF_AM_DELETING, &ip->i_flags))
+			BUG_ON(inode->i_state & I_FREEING);
+	}
 	if (create) {
 		for (;;) {
 			page = grab_cache_page(mapping, index);
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 1e3a93f..f6db0da 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -451,7 +451,7 @@ static int gfs2_lookup_root(struct super_block *sb, struct dentry **dptr,
 	struct dentry *dentry;
 	struct inode *inode;
 
-	inode = gfs2_inode_lookup(sb, DT_DIR, no_addr, 0, 0);
+	inode = gfs2_inode_lookup(sb, DT_DIR, no_addr, 0);
 	if (IS_ERR(inode)) {
 		fs_err(sdp, "can't read in %s inode: %ld\n", name, PTR_ERR(inode));
 		return PTR_ERR(inode);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 2f29ca5..7b59a44 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1511,6 +1511,7 @@ static void gfs2_evict_inode(struct inode *inode)
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_holder gh;
+	struct gfs2_glock *io_gl = NULL, *i_gl = NULL;
 	struct address_space *metamapping;
 	int error;
 
@@ -1522,8 +1523,13 @@ static void gfs2_evict_inode(struct inode *inode)
 	if (inode->i_nlink || (sb->s_flags & MS_RDONLY))
 		goto out;
 
+	error = gfs2_glock_get(sdp, ip->i_no_addr, &gfs2_inode_glops,
+			       CREATE, &i_gl);
+	if (unlikely(error))
+		goto out;
+
 	/* Must not read inode block until block type has been verified */
-	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);
+	error = gfs2_glock_nq_init(i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);
 	if (unlikely(error)) {
 		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
 		gfs2_glock_dq_wait(&ip->i_iopen_gh);
@@ -1531,10 +1537,13 @@ static void gfs2_evict_inode(struct inode *inode)
 		goto out;
 	}
 
+	set_bit(GIF_AM_DELETING, &ip->i_flags);
 	if (!test_bit(GIF_ALLOC_FAILED, &ip->i_flags)) {
 		error = gfs2_check_blk_type(sdp, ip->i_no_addr, GFS2_BLKST_UNLINKED);
-		if (error)
-			goto out_truncate;
+		if (error) {
+			gfs2_glock_dq_uninit(&gh);
+			goto out;
+		}
 	}
 
 	if (test_bit(GIF_INVALID, &ip->i_flags)) {
@@ -1543,10 +1552,22 @@ static void gfs2_evict_inode(struct inode *inode)
 			goto out_truncate;
 	}
 
+	/* The iopen glock may have been released prior to this, but we need
+	   it back. */
+	error = gfs2_glock_get(sdp, ip->i_no_addr, &gfs2_iopen_glops, CREATE,
+			       &io_gl);
+	if (error)
+		goto out_truncate;
+
 	ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
-	gfs2_glock_dq_wait(&ip->i_iopen_gh);
-	gfs2_holder_reinit(LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | GL_NOCACHE, &ip->i_iopen_gh);
-	error = gfs2_glock_nq(&ip->i_iopen_gh);
+	if (ip->i_iopen_gh.gh_gl &&
+	    test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
+		gfs2_glock_dq_wait(&ip->i_iopen_gh);
+		gfs2_holder_uninit(&ip->i_iopen_gh);
+	}
+
+	error = gfs2_glock_nq_init(io_gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB |
+				   GL_NOCACHE, &ip->i_iopen_gh);
 	if (error)
 		goto out_truncate;
 
@@ -1575,14 +1596,14 @@ static void gfs2_evict_inode(struct inode *inode)
 	goto out_unlock;
 
 out_truncate:
-	gfs2_log_flush(sdp, ip->i_gl, NORMAL_FLUSH);
-	metamapping = gfs2_glock2aspace(ip->i_gl);
-	if (test_bit(GLF_DIRTY, &ip->i_gl->gl_flags)) {
+	gfs2_log_flush(sdp, i_gl, NORMAL_FLUSH);
+	metamapping = gfs2_glock2aspace(i_gl);
+	if (test_bit(GLF_DIRTY, &i_gl->gl_flags)) {
 		filemap_fdatawrite(metamapping);
 		filemap_fdatawait(metamapping);
 	}
 	write_inode_now(inode, 1);
-	gfs2_ail_flush(ip->i_gl, 0);
+	gfs2_ail_flush(i_gl, 0);
 
 	/* Case 2 starts here */
 	error = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
@@ -1598,11 +1619,15 @@ out_unlock:
 	if (gfs2_rs_active(ip->i_res))
 		gfs2_rs_deltree(ip->i_res);
 
-	if (test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
+	if (ip->i_iopen_gh.gh_gl &&
+	    test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
 		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
 		gfs2_glock_dq_wait(&ip->i_iopen_gh);
+		gfs2_holder_uninit(&ip->i_iopen_gh);
 	}
-	gfs2_holder_uninit(&ip->i_iopen_gh);
+	if (io_gl)
+		gfs2_glock_put(io_gl);
+
 	gfs2_glock_dq_uninit(&gh);
 	if (error && error != GLR_TRYFAILED && error != -EROFS)
 		fs_warn(sdp, "gfs2_evict_inode: %d\n", error);
@@ -1616,7 +1641,8 @@ out:
 	ip->i_gl->gl_object = NULL;
 	flush_delayed_work(&ip->i_gl->gl_work);
 	gfs2_glock_add_to_lru(ip->i_gl);
-	gfs2_glock_put(ip->i_gl);
+	if (i_gl)
+		gfs2_glock_put(i_gl);
 	ip->i_gl = NULL;
 	if (ip->i_iopen_gh.gh_gl) {
 		ip->i_iopen_gh.gh_gl->gl_object = NULL;
diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
index 4c096fa..e436ad0 100644
--- a/fs/gfs2/xattr.c
+++ b/fs/gfs2/xattr.c
@@ -625,6 +625,8 @@ static int ea_alloc_blk(struct gfs2_inode *ip, struct buffer_head **bhp)
 	u64 block;
 	int error;
 
+	BUG_ON((!gfs2_glock_is_locked_by_me(ip->i_gl)) ||
+	       (!gfs2_glock_is_held_excl(ip->i_gl)));
 	error = gfs2_alloc_blocks(ip, &block, &n, 0, NULL);
 	if (error)
 		return error;
@@ -687,6 +689,8 @@ static int ea_write(struct gfs2_inode *ip, struct gfs2_ea_header *ea,
 			int mh_size = sizeof(struct gfs2_meta_header);
 			unsigned int n = 1;
 
+			BUG_ON((!gfs2_glock_is_locked_by_me(ip->i_gl)) ||
+			       (!gfs2_glock_is_held_excl(ip->i_gl)));
 			error = gfs2_alloc_blocks(ip, &block, &n, 0, NULL);
 			if (error)
 				return error;
@@ -1003,6 +1007,8 @@ static int ea_set_block(struct gfs2_inode *ip, struct gfs2_ea_request *er,
 	} else {
 		u64 blk;
 		unsigned int n = 1;
+		BUG_ON((!gfs2_glock_is_locked_by_me(ip->i_gl)) ||
+		       (!gfs2_glock_is_held_excl(ip->i_gl)));
 		error = gfs2_alloc_blocks(ip, &blk, &n, 0, NULL);
 		if (error)
 			return error;
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Cluster-devel] [PATCH 11/11] GFS2: Change from tr_touched to tr_bufs
  2015-09-10 19:49 [Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new Bob Peterson
                   ` (9 preceding siblings ...)
  2015-09-10 19:49 ` [Cluster-devel] [PATCH 10/11] GFS2: Rework transition from unlinked to deleted dinodes Bob Peterson
@ 2015-09-10 19:49 ` Bob Peterson
  10 siblings, 0 replies; 12+ messages in thread
From: Bob Peterson @ 2015-09-10 19:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch changes the GFS2 transaction processing so that it keeps
track of how many buffers are associated with a transaction rather
than whether or not it's been touched. That way, if a buffer is
added, then revoked, then removed entirely, it can avoid some work.
---
 fs/gfs2/incore.h  |  5 ++++-
 fs/gfs2/meta_io.c |  6 +++---
 fs/gfs2/trans.c   | 13 +++++++------
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 3cf0224..effa924 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -474,10 +474,13 @@ struct gfs2_trans {
 	unsigned int tr_blocks;
 	unsigned int tr_revokes;
 	unsigned int tr_reserved;
-	unsigned int tr_touched:1;
 	unsigned int tr_attached:1;
 	unsigned int tr_alloced:1;
 
+	struct gfs2_holder tr_t_gh;
+
+	int tr_bufs;
+
 	unsigned int tr_num_buf_new;
 	unsigned int tr_num_databuf_new;
 	unsigned int tr_num_buf_rm;
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index f5b4948..040eb15 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -230,7 +230,7 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
 	wait_on_buffer(bh);
 	if (unlikely(!buffer_uptodate(bh))) {
 		struct gfs2_trans *tr = current->journal_info;
-		if (tr && tr->tr_touched)
+		if (tr && tr->tr_bufs)
 			gfs2_io_error_bh(sdp, bh);
 		brelse(bh);
 		*bhp = NULL;
@@ -257,7 +257,7 @@ int gfs2_meta_wait(struct gfs2_sbd *sdp, struct buffer_head *bh)
 
 	if (!buffer_uptodate(bh)) {
 		struct gfs2_trans *tr = current->journal_info;
-		if (tr && tr->tr_touched)
+		if (tr && tr->tr_bufs)
 			gfs2_io_error_bh(sdp, bh);
 		return -EIO;
 	}
@@ -282,7 +282,7 @@ void gfs2_remove_from_journal(struct buffer_head *bh, struct gfs2_trans *tr, int
 			tr->tr_num_buf_rm++;
 		else
 			tr->tr_num_databuf_rm++;
-		tr->tr_touched = 1;
+		tr->tr_bufs--;
 		was_pinned = 1;
 		brelse(bh);
 	}
diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c
index b95d0d6..6ff5faa 100644
--- a/fs/gfs2/trans.c
+++ b/fs/gfs2/trans.c
@@ -77,8 +77,8 @@ fail:
 static void gfs2_print_trans(const struct gfs2_trans *tr)
 {
 	pr_warn("Transaction created at: %pSR\n", (void *)tr->tr_ip);
-	pr_warn("blocks=%u revokes=%u reserved=%u touched=%u\n",
-		tr->tr_blocks, tr->tr_revokes, tr->tr_reserved, tr->tr_touched);
+	pr_warn("blocks=%u revokes=%u reserved=%u bufs=%d\n",
+		tr->tr_blocks, tr->tr_revokes, tr->tr_reserved, tr->tr_bufs);
 	pr_warn("Buf %u/%u Databuf %u/%u Revoke %u/%u\n",
 		tr->tr_num_buf_new, tr->tr_num_buf_rm,
 		tr->tr_num_databuf_new, tr->tr_num_databuf_rm,
@@ -94,7 +94,7 @@ void gfs2_trans_end(struct gfs2_sbd *sdp)
 	BUG_ON(!tr);
 	current->journal_info = NULL;
 
-	if (!tr->tr_touched) {
+	if (tr->tr_bufs == 0) {
 		gfs2_log_release(sdp, tr->tr_reserved);
 		if (alloced) {
 			kfree(tr);
@@ -180,7 +180,7 @@ void gfs2_trans_add_data(struct gfs2_glock *gl, struct buffer_head *bh)
 		gfs2_log_lock(sdp);
 	}
 	gfs2_assert(sdp, bd->bd_gl == gl);
-	tr->tr_touched = 1;
+	tr->tr_bufs++;
 	if (list_empty(&bd->bd_list)) {
 		set_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags);
 		set_bit(GLF_DIRTY, &bd->bd_gl->gl_flags);
@@ -199,7 +199,7 @@ static void meta_lo_add(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
 	enum gfs2_freeze_state state = atomic_read(&sdp->sd_freeze_state);
 
 	tr = current->journal_info;
-	tr->tr_touched = 1;
+	tr->tr_bufs++;
 	if (!list_empty(&bd->bd_list))
 		return;
 	set_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags);
@@ -252,7 +252,7 @@ void gfs2_trans_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
 
 	BUG_ON(!list_empty(&bd->bd_list));
 	gfs2_add_revoke(sdp, bd);
-	tr->tr_touched = 1;
+	tr->tr_bufs++;
 	tr->tr_num_revoke++;
 }
 
@@ -268,6 +268,7 @@ void gfs2_trans_add_unrevoke(struct gfs2_sbd *sdp, u64 blkno, unsigned int len)
 			list_del_init(&bd->bd_list);
 			gfs2_assert_withdraw(sdp, sdp->sd_log_num_revoke);
 			sdp->sd_log_num_revoke--;
+			tr->tr_bufs--;
 			kmem_cache_free(gfs2_bufdata_cachep, bd);
 			tr->tr_num_revoke_rm++;
 			if (--n == 0)
-- 
2.4.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-09-10 19:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-10 19:49 [Cluster-devel] [PATCH 00/11] Eleven patches related to file unlink->delete->new Bob Peterson
2015-09-10 19:49 ` [Cluster-devel] [PATCH 01/11] GFS2: Update master statfs buffer with sd_statfs_spin locked Bob Peterson
2015-09-10 19:49 ` [Cluster-devel] [PATCH 02/11] GFS2: Allow fail_gunlock3 to set the free_vfs_inode bit Bob Peterson
2015-09-10 19:49 ` [Cluster-devel] [PATCH 03/11] GFS2: Protect log tail calculations with inside locks Bob Peterson
2015-09-10 19:49 ` [Cluster-devel] [PATCH 04/11] GFS2: Wait for iopen glock dequeues Bob Peterson
2015-09-10 19:49 ` [Cluster-devel] [PATCH 05/11] GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear Bob Peterson
2015-09-10 19:49 ` [Cluster-devel] [PATCH 06/11] GFS2: Prevent gl_delete work for re-used inodes Bob Peterson
2015-09-10 19:49 ` [Cluster-devel] [PATCH 07/11] GFS2: Truncate address space mapping when deleting an inode Bob Peterson
2015-09-10 19:49 ` [Cluster-devel] [PATCH 08/11] GFS2: Don't filter out I_FREEING inodes anymore Bob Peterson
2015-09-10 19:49 ` [Cluster-devel] [PATCH 09/11] GFS2: generalize gfs2_check_blk_type Bob Peterson
2015-09-10 19:49 ` [Cluster-devel] [PATCH 10/11] GFS2: Rework transition from unlinked to deleted dinodes Bob Peterson
2015-09-10 19:49 ` [Cluster-devel] [PATCH 11/11] GFS2: Change from tr_touched to tr_bufs Bob Peterson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).