[Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2

cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
@ 2007-05-08 20:43 David Teigland
  0 siblings, 0 replies; 14+ messages in thread
From: David Teigland @ 2007-05-08 20:43 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, May 08, 2007 at 09:56:05AM -0500, Robert Peterson wrote:
> David Teigland wrote:
> >On Wed, May 02, 2007 at 08:57:08PM -0500, Robert Peterson wrote:
> >>@@ -447,7 +479,12 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
> >>	u64 junk = ip->i_di.di_size;
> >>	int error;
> >>
> >>-	if (do_div(junk, sizeof(struct gfs2_rindex))) {
> >>+	/* If someone is holding the rindex file with a glock, they must
> >>+	   be updating it, in which case we may have partial entries.
> >>+	   In this case, we ignore the partials. */
> >>+	if (!gfs2_glock_is_held_excl(ip->i_gl) &&
> >>+	    !gfs2_glock_is_held_shrd(ip->i_gl) &&
> >>+	    do_div(junk, sizeof(struct gfs2_rindex))) {
> >>		gfs2_consist_inode(ip);
> >>		return -EIO;
> >>	}
> >
> >So the use of glock_is_held _is_ part of an assertion, not part of an
> >algorithm which I was worried about before.  We should only ever get to
> >this spot with a shared glock, right? (rindex_hold takes it).  So a plain
> >old assertion that the glock is shared at the beginning would be ok, but
> >this particular check doesn't make sense to me.
> 
> For the sake of completeness, I retested without this change to make
> sure it was also still necessary.  It was.  The problem is that
> we can now call gfs2_ri_update while there are still partial rindex 
> entries.  This can happen when we need to allocate a new page to the
> rindex, which calls gfs2_inplace_reserve_i, which eventually gets to 
> gfs2_check_rindex_version.  Without the change, you get:

+       else if (!sdp->sd_rgrps) /* We may not have the rindex read in, so: */
+               error = gfs2_check_rindex_version(sdp);

This is a very special exception; instead of using this more general
purpose function (gfs2_check_rindex_version) for the purpose of reading in
the rgrps, and then being required to add the further special case checks
up above -- I suggest using a new function here that just reads in the
rgrps without the checks above that you've had to modify.

The path I see us going down here is adding a special-case-exception in
one spot, then finding that it's not quite right, so adding another
special-case-exception on top of it ... and so on.  It becomes a house of
cards very quickly that way.

Dave



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
@ 2007-05-08 20:42 David Teigland
  0 siblings, 0 replies; 14+ messages in thread
From: David Teigland @ 2007-05-08 20:42 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, May 08, 2007 at 03:24:41PM +0100, Steven Whitehouse wrote:
> > Well, I retested this code path and found that this code is still 
> > necessary  (unless I should somehow be doing this a different way).
> > Without the code, the gfs2_rindex_hold tries to add a holder,
> > but there's already a holder due to the meta_fs.  Under the right 
> > conditions, you get:
> > 
> > original: gfs2_prepare_write+0x49/0x237 [gfs2]
> > new: gfs2_rindex_hold+0x2b/0x52a [gfs2]
> 
> I agree that this change is needed, its how we've solved the same
> problem elsewhere and it looks ok to me,

OK, I see, it's because we removed the ability for a process to have two
holders on the same glock from gfs2.  I'm assuming that the glock taken
for writing is released after inplace_release().

Dave



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
@ 2007-05-03  1:57 Robert Peterson
  2007-05-04 16:47 ` David Teigland
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Robert Peterson @ 2007-05-03  1:57 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi All,

This is another revision of my gfs2 kernel patch that allows 
gfs2_grow to function properly.  

Steve Whitehouse expressed some concerns about the previous
patch and I restructured it based on his comments.
The previous patch was doing the statfs_change at file close time,
under its own transaction.  The current patch does the statfs_change
inside the gfs2_commit_write function, which keeps it under the
umbrella of the inode transaction.

I can't call ri_update to re-read the rindex file during the 
transaction because the transaction may have outstanding unwritten
buffers attached to the rgrps that would be otherwise blown away.
So instead, I created a new function, gfs2_ri_total, that will
re-read the rindex file just to total the file system space
for the sake of the statfs_change.  The ri_update will happen
later, when gfs2 realizes the version number has changed, as it
happened before my patch.

Since the statfs_change is happening at write_commit time and there
may be multiple writes to the rindex file for one grow operation.
So one consequence of this restructuring is that instead of getting
one kernel message to indicate the change, you may see several.
For example, before when you did a gfs2_grow, you'd get a single
message like:

GFS2: File system extended by 247876 blocks (968MB)

Now you get something like:

GFS2: File system extended by 207896 blocks (812MB)
GFS2: File system extended by 39980 blocks (156MB)

This version has also been successfully run against the hours-long 
"gfs2_fsck_hellfire" test that does several gfs2_grow and gfs2_fsck
while interjecting file system damage.  It does this repeatedly
under a variety Resource Group conditions.  

Regards,

Bob Peterson
Red Hat Cluster Suite

Signed-off-By: Bob Peterson <rpeterso@redhat.com>
-- 
 fs/gfs2/ops_address.c |   39 +++++++++++++++++-
 fs/gfs2/ops_address.h |    5 ++-
 fs/gfs2/rgrp.c        |  106 ++++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 129 insertions(+), 21 deletions(-)

diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
index 30c1562..41f153d 100644
--- a/fs/gfs2/ops_address.c
+++ b/fs/gfs2/ops_address.c
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2006 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2007 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -450,6 +450,40 @@ out_uninit:
 }
 
 /**
+ * adjust_fs_space - Adjusts the free space available due to gfs2_grow
+ * @inode: the rindex inode
+ */
+static void adjust_fs_space(struct inode *inode)
+{
+	struct gfs2_sbd *sdp = inode->i_sb->s_fs_info;
+	struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master;
+	struct gfs2_statfs_change_host *l_sc = &sdp->sd_statfs_local;
+	__u64 fs_total, new_free, new_free_readable;
+	const char *stg_abbrev = " KMGTPEE";
+	int factor;
+
+	/* Total up the file system space, according to the latest rindex. */
+	fs_total = gfs2_ri_total(sdp);
+
+	spin_lock(&sdp->sd_statfs_spin);
+	if (fs_total > (m_sc->sc_total + l_sc->sc_total))
+		new_free = fs_total - (m_sc->sc_total + l_sc->sc_total);
+	else
+		new_free = 0;
+	spin_unlock(&sdp->sd_statfs_spin);
+	new_free_readable = new_free * sdp->sd_sb.sb_bsize;
+	for (factor = 0; factor < 7; factor++)
+		if (new_free_readable > 1024)
+			new_free_readable /= 1024;
+		else
+			break;
+	printk(KERN_WARNING "GFS2: File system extended by %llu "
+	       "blocks (%llu%cB)\n", new_free, new_free_readable,
+	       stg_abbrev[factor]);
+	gfs2_statfs_change(sdp, new_free, new_free, 0);
+}
+
+/**
  * gfs2_commit_write - Commit write to a file
  * @file: The file to write to
  * @page: The page containing the data
@@ -511,6 +545,9 @@ static int gfs2_commit_write(struct file *file, struct page *page,
 		di->di_size = cpu_to_be64(inode->i_size);
 	}
 
+	if (inode == sdp->sd_rindex)
+		adjust_fs_space(inode);
+
 	brelse(dibh);
 	gfs2_trans_end(sdp);
 	if (al->al_requested) {
diff --git a/fs/gfs2/ops_address.h b/fs/gfs2/ops_address.h
index 35aaee4..d770ff3 100644
--- a/fs/gfs2/ops_address.h
+++ b/fs/gfs2/ops_address.h
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2006 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2007 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -18,5 +18,8 @@ extern const struct address_space_operations gfs2_file_aops;
 extern int gfs2_get_block(struct inode *inode, sector_t lblock,
 			  struct buffer_head *bh_result, int create);
 extern int gfs2_releasepage(struct page *page, gfp_t gfp_mask);
+extern __u64 gfs2_ri_total(struct gfs2_sbd *sdp);
+extern void gfs2_statfs_change(struct gfs2_sbd *sdp, s64 total, s64 free,
+			       s64 dinodes);
 
 #endif /* __OPS_ADDRESS_DOT_H__ */
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 1727f50..dfb997d 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) Sistina Software, Inc.  1997-2003 All rights reserved.
- * Copyright (C) 2004-2006 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2004-2007 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -431,6 +431,38 @@ static int compute_bitstructs(struct gfs2_rgrpd *rgd)
 }
 
 /**
+ * gfs2_ri_total - Total up the file system space, according to the rindex.
+ *
+ */
+__u64 gfs2_ri_total(struct gfs2_sbd *sdp)
+{
+	__u64 total_data = 0;	
+	struct inode *inode = sdp->sd_rindex;
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_rindex_host ri;
+	char buf[sizeof(struct gfs2_rindex)];
+	struct file_ra_state ra_state;
+	int error, rgrps;
+
+	mutex_lock(&sdp->sd_rindex_mutex);
+	file_ra_state_init(&ra_state, inode->i_mapping);
+	for (rgrps = 0;; rgrps++) {
+		loff_t pos = rgrps * sizeof(struct gfs2_rindex);
+
+		if (pos + sizeof(struct gfs2_rindex) >= ip->i_di.di_size)
+			break;
+		error = gfs2_internal_read(ip, &ra_state, buf, &pos,
+					   sizeof(struct gfs2_rindex));
+		if (error != sizeof(struct gfs2_rindex))
+			break;
+		gfs2_rindex_in(&ri, buf);
+		total_data += ri.ri_data;
+	}
+	mutex_unlock(&sdp->sd_rindex_mutex);
+	return total_data;
+}
+
+/**
  * gfs2_ri_update - Pull in a new resource index from the disk
  * @gl: The glock covering the rindex inode
  *
@@ -447,7 +479,12 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
 	u64 junk = ip->i_di.di_size;
 	int error;
 
-	if (do_div(junk, sizeof(struct gfs2_rindex))) {
+	/* If someone is holding the rindex file with a glock, they must
+	   be updating it, in which case we may have partial entries.
+	   In this case, we ignore the partials. */
+	if (!gfs2_glock_is_held_excl(ip->i_gl) &&
+	    !gfs2_glock_is_held_shrd(ip->i_gl) &&
+	    do_div(junk, sizeof(struct gfs2_rindex))) {
 		gfs2_consist_inode(ip);
 		return -EIO;
 	}
@@ -457,6 +494,9 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
 	file_ra_state_init(&ra_state, inode->i_mapping);
 	for (sdp->sd_rgrps = 0;; sdp->sd_rgrps++) {
 		loff_t pos = sdp->sd_rgrps * sizeof(struct gfs2_rindex);
+
+		if (pos + sizeof(struct gfs2_rindex) >= ip->i_di.di_size)
+			break;
 		error = gfs2_internal_read(ip, &ra_state, buf, &pos,
 					    sizeof(struct gfs2_rindex));
 		if (!error)
@@ -502,6 +542,35 @@ fail:
 }
 
 /**
+ * gfs2_check_rindex_version - Check the rindex version number and resync
+ *                        if necessary.
+ * @sdp: The GFS2 superblock
+ *
+ * This makes sure that we're using the latest copy of the resource index
+ * special file, which might have been updated if someone expanded the
+ * filesystem (via gfs2_grow utility), which adds new resource groups.
+ *
+ */
+
+static int gfs2_check_rindex_version(struct gfs2_sbd *sdp)
+{
+	struct gfs2_inode *ip = GFS2_I(sdp->sd_rindex);
+	struct gfs2_glock *gl = ip->i_gl;
+	int error = 0;
+
+	gfs2_assert(sdp, gfs2_glock_is_locked_by_me(gl));
+	/* Read new copy from disk if we don't have the latest */
+	if (sdp->sd_rindex_vn != gl->gl_vn) {
+		mutex_lock(&sdp->sd_rindex_mutex);
+		if (sdp->sd_rindex_vn != gl->gl_vn)
+			error = gfs2_ri_update(ip);
+		mutex_unlock(&sdp->sd_rindex_mutex);
+	}
+
+	return error;
+}
+
+/**
  * gfs2_rindex_hold - Grab a lock on the rindex
  * @sdp: The GFS2 superblock
  * @ri_gh: the glock holder
@@ -526,20 +595,11 @@ int gfs2_rindex_hold(struct gfs2_sbd *sdp, struct gfs2_holder *ri_gh)
 	int error;
 
 	error = gfs2_glock_nq_init(gl, LM_ST_SHARED, 0, ri_gh);
-	if (error)
-		return error;
-
-	/* Read new copy from disk if we don't have the latest */
-	if (sdp->sd_rindex_vn != gl->gl_vn) {
-		mutex_lock(&sdp->sd_rindex_mutex);
-		if (sdp->sd_rindex_vn != gl->gl_vn) {
-			error = gfs2_ri_update(ip);
-			if (error)
-				gfs2_glock_dq_uninit(ri_gh);
-		}
-		mutex_unlock(&sdp->sd_rindex_mutex);
+	if (!error) {
+		error = gfs2_check_rindex_version(sdp);
+		if (error)
+			gfs2_glock_dq_uninit(ri_gh);
 	}
-
 	return error;
 }
 
@@ -978,18 +1038,25 @@ int gfs2_inplace_reserve_i(struct gfs2_inode *ip, char *file, unsigned int line)
 {
 	struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
 	struct gfs2_alloc *al = &ip->i_alloc;
-	int error;
+	int error = 0;
 
 	if (gfs2_assert_warn(sdp, al->al_requested))
 		return -EINVAL;
 
-	error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
+	/* We need to hold the rindex unless the inode we're using is
+	   the rindex itself, in which case it's already held. */
+	if (ip != GFS2_I(sdp->sd_rindex))
+		error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
+	else if (!sdp->sd_rgrps) /* We may not have the rindex read in, so: */
+		error = gfs2_check_rindex_version(sdp);
+
 	if (error)
 		return error;
 
 	error = get_local_rgrp(ip);
 	if (error) {
-		gfs2_glock_dq_uninit(&al->al_ri_gh);
+		if (ip != GFS2_I(sdp->sd_rindex))
+			gfs2_glock_dq_uninit(&al->al_ri_gh);
 		return error;
 	}
 
@@ -1019,7 +1086,8 @@ void gfs2_inplace_release(struct gfs2_inode *ip)
 
 	al->al_rgd = NULL;
 	gfs2_glock_dq_uninit(&al->al_rgd_gh);
-	gfs2_glock_dq_uninit(&al->al_ri_gh);
+	if (ip != GFS2_I(sdp->sd_rindex))
+		gfs2_glock_dq_uninit(&al->al_ri_gh);
 }
 
 /**



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
  2007-05-03  1:57 Robert Peterson
@ 2007-05-04 16:47 ` David Teigland
  2007-05-04 21:35   ` Robert Peterson
  2007-05-04 19:37 ` David Teigland
  2007-05-04 20:23 ` David Teigland
  2 siblings, 1 reply; 14+ messages in thread
From: David Teigland @ 2007-05-04 16:47 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, May 02, 2007 at 08:57:08PM -0500, Robert Peterson wrote:
> +	__u64 fs_total, new_free, new_free_readable;

use u64, __u64 is for structs shared with user space

> +	printk(KERN_WARNING "GFS2: File system extended by %llu "
> +	       "blocks (%llu%cB)\n", new_free, new_free_readable,
> +	       stg_abbrev[factor]);

use fs_warn(), and unless there's some precedent elsewhere I don't think
the human-readable translation is appropriate in the kernel.

> +	if (!gfs2_glock_is_held_excl(ip->i_gl) &&
> +	    !gfs2_glock_is_held_shrd(ip->i_gl) &&

Be extremely wary of using these "is_held" functions algorithmically;
needing to use them is usually an indication that you're doing something
wrong (I'll have to study this case a bit more to say anything helpful).

It's instructional to note that the glock_is_held functions are _never_
used outside of assertions in all of gfs1 and gfs2!  There's an exception
to every rule, but...

Dave

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
  2007-05-04 16:47 ` David Teigland
@ 2007-05-04 21:35   ` Robert Peterson
  2007-05-04 22:16     ` David Teigland
  0 siblings, 1 reply; 14+ messages in thread
From: Robert Peterson @ 2007-05-04 21:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

David Teigland wrote:
> On Wed, May 02, 2007 at 08:57:08PM -0500, Robert Peterson wrote:
>> +	__u64 fs_total, new_free, new_free_readable;
> 
> use u64, __u64 is for structs shared with user space

Okay.

>> +	printk(KERN_WARNING "GFS2: File system extended by %llu "
>> +	       "blocks (%llu%cB)\n", new_free, new_free_readable,
>> +	       stg_abbrev[factor]);
> 
> use fs_warn(), and unless there's some precedent elsewhere I don't think
> the human-readable translation is appropriate in the kernel.

Okay.

>> +	if (!gfs2_glock_is_held_excl(ip->i_gl) &&
>> +	    !gfs2_glock_is_held_shrd(ip->i_gl) &&
> 
> Be extremely wary of using these "is_held" functions algorithmically;
> needing to use them is usually an indication that you're doing something
> wrong (I'll have to study this case a bit more to say anything helpful).

I was using these functions to determine if someone was doing
writes to the rindex file through the meta_fs, in which case I
don't want to exit with a consistency error.  If the rindex is being
written to, there may be partial pieces of rindex entries
left on a page (i.e. between two gfs2_write_commits) before the
writing is over.  In these cases, it's normal and we should ignore
the partial entries (and re-read them when the writing is complete
and the version number has changed).

My concern here is if a different node decides to do an ri_update
while a gfs2_grow is happening.  For example, if someone does a
mount of the file system on a different node, the rindex may
briefly be in an inconsistent state.

> It's instructional to note that the glock_is_held functions are _never_
> used outside of assertions in all of gfs1 and gfs2!  There's an exception
> to every rule, but...

Yeah, I noticed that.

> Dave

Bob Peterson
Red Hat Cluster Suite

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
  2007-05-04 21:35   ` Robert Peterson
@ 2007-05-04 22:16     ` David Teigland
  0 siblings, 0 replies; 14+ messages in thread
From: David Teigland @ 2007-05-04 22:16 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Fri, May 04, 2007 at 04:35:08PM -0500, Robert Peterson wrote:
> >>+	if (!gfs2_glock_is_held_excl(ip->i_gl) &&
> >>+	    !gfs2_glock_is_held_shrd(ip->i_gl) &&
> >
> >Be extremely wary of using these "is_held" functions algorithmically;
> >needing to use them is usually an indication that you're doing something
> >wrong (I'll have to study this case a bit more to say anything helpful).
> 
> I was using these functions to determine if someone was doing
> writes to the rindex file through the meta_fs, in which case I
> don't want to exit with a consistency error.  If the rindex is being
> written to, there may be partial pieces of rindex entries
> left on a page (i.e. between two gfs2_write_commits) before the
> writing is over.  In these cases, it's normal and we should ignore
> the partial entries (and re-read them when the writing is complete
> and the version number has changed).
> 
> My concern here is if a different node decides to do an ri_update
> while a gfs2_grow is happening.  For example, if someone does a
> mount of the file system on a different node, the rindex may
> briefly be in an inconsistent state.

Why doesn't the exclusive glock held over the update protect against other
processes reading something bad?

Dave



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
  2007-05-03  1:57 Robert Peterson
  2007-05-04 16:47 ` David Teigland
@ 2007-05-04 19:37 ` David Teigland
  2007-05-04 21:37   ` Robert Peterson
  2007-05-08 14:56   ` Robert Peterson
  2007-05-04 20:23 ` David Teigland
  2 siblings, 2 replies; 14+ messages in thread
From: David Teigland @ 2007-05-04 19:37 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, May 02, 2007 at 08:57:08PM -0500, Robert Peterson wrote:
>  * gfs2_ri_update - Pull in a new resource index from the disk
>  * @gl: The glock covering the rindex inode
>  *
> @@ -447,7 +479,12 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
> 	u64 junk = ip->i_di.di_size;
> 	int error;
> 
> -	if (do_div(junk, sizeof(struct gfs2_rindex))) {
> +	/* If someone is holding the rindex file with a glock, they must
> +	   be updating it, in which case we may have partial entries.
> +	   In this case, we ignore the partials. */
> +	if (!gfs2_glock_is_held_excl(ip->i_gl) &&
> +	    !gfs2_glock_is_held_shrd(ip->i_gl) &&
> +	    do_div(junk, sizeof(struct gfs2_rindex))) {
> 		gfs2_consist_inode(ip);
> 		return -EIO;
> 	}

So the use of glock_is_held _is_ part of an assertion, not part of an
algorithm which I was worried about before.  We should only ever get to
this spot with a shared glock, right? (rindex_hold takes it).  So a plain
old assertion that the glock is shared at the beginning would be ok, but
this particular check doesn't make sense to me.


> @@ -457,6 +494,9 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
> 	file_ra_state_init(&ra_state, inode->i_mapping);
> 	for (sdp->sd_rgrps = 0;; sdp->sd_rgrps++) {
> 		loff_t pos = sdp->sd_rgrps * sizeof(struct gfs2_rindex);
> +
> +		if (pos + sizeof(struct gfs2_rindex) >= ip->i_di.di_size)
> +			break;

Why is this needed now if it wasn't before?

> 		error = gfs2_internal_read(ip, &ra_state, buf, &pos,
> 					    sizeof(struct gfs2_rindex));
> 		if (!error)

Dave



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
  2007-05-04 19:37 ` David Teigland
@ 2007-05-04 21:37   ` Robert Peterson
  2007-05-08 14:56   ` Robert Peterson
  1 sibling, 0 replies; 14+ messages in thread
From: Robert Peterson @ 2007-05-04 21:37 UTC (permalink / raw)
  To: cluster-devel.redhat.com

David Teigland wrote:
 > So the use of glock_is_held _is_ part of an assertion, not part of an
> algorithm which I was worried about before.  We should only ever get to
> this spot with a shared glock, right? (rindex_hold takes it).  So a plain
> old assertion that the glock is shared at the beginning would be ok, but
> this particular check doesn't make sense to me.

See previous email.
 
> Why is this needed now if it wasn't before?

Again, this goes back to the partial rinedx entries left on a page in
the middle of an rindex update through the meta_fs.  We don't want
to read in a partial entry, we want to ignore it.

Regards,

Bob Peterson
Red Hat Cluster Suite



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
  2007-05-04 19:37 ` David Teigland
  2007-05-04 21:37   ` Robert Peterson
@ 2007-05-08 14:56   ` Robert Peterson
  1 sibling, 0 replies; 14+ messages in thread
From: Robert Peterson @ 2007-05-08 14:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

David Teigland wrote:
> On Wed, May 02, 2007 at 08:57:08PM -0500, Robert Peterson wrote:
>> @@ -447,7 +479,12 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
>> 	u64 junk = ip->i_di.di_size;
>> 	int error;
>>
>> -	if (do_div(junk, sizeof(struct gfs2_rindex))) {
>> +	/* If someone is holding the rindex file with a glock, they must
>> +	   be updating it, in which case we may have partial entries.
>> +	   In this case, we ignore the partials. */
>> +	if (!gfs2_glock_is_held_excl(ip->i_gl) &&
>> +	    !gfs2_glock_is_held_shrd(ip->i_gl) &&
>> +	    do_div(junk, sizeof(struct gfs2_rindex))) {
>> 		gfs2_consist_inode(ip);
>> 		return -EIO;
>> 	}
> 
> So the use of glock_is_held _is_ part of an assertion, not part of an
> algorithm which I was worried about before.  We should only ever get to
> this spot with a shared glock, right? (rindex_hold takes it).  So a plain
> old assertion that the glock is shared at the beginning would be ok, but
> this particular check doesn't make sense to me.

For the sake of completeness, I retested without this change to make
sure it was also still necessary.  It was.  The problem is that
we can now call gfs2_ri_update while there are still partial rindex 
entries.  This can happen when we need to allocate a new page to the
rindex, which calls gfs2_inplace_reserve_i, which eventually gets to 
gfs2_check_rindex_version.  Without the change, you get:

GFS2: fsid=bob_cluster2:test_gfs.0: fatal: filesystem consistency error
GFS2: fsid=bob_cluster2:test_gfs.0:   inode = 19 25439
GFS2: fsid=bob_cluster2:test_gfs.0:   function = gfs2_ri_update, file = fs/gfs2/rgrp.c, line = 486
GFS2: fsid=bob_cluster2:test_gfs.0: about to withdraw this file system
GFS2: fsid=bob_cluster2:test_gfs.0: telling LM to withdraw
GFS2: fsid=bob_cluster2:test_gfs.0: withdrawn
 [<e02a5c2f>] gfs2_lm_withdraw+0x82/0x8d [gfs2]
 [<e02b72d0>] gfs2_consist_inode_i+0x6f/0x75 [gfs2]
 [<e02b34da>] gfs2_check_rindex_version+0x11e/0x468 [gfs2]
 [<c04078a5>] __sched_text_start+0x715/0x7c4
 [<e029a335>] gfs2_extent_map+0x68/0x9c [gfs2]
 [<e02b3dbb>] gfs2_inplace_reserve_i+0x9a/0x443 [gfs2]
 [<e029a4fb>] gfs2_write_alloc_required+0x192/0x1ce [gfs2]
 [<e02aad58>] gfs2_prepare_write+0x12d/0x237 [gfs2]
 [<e02aac2b>] gfs2_prepare_write+0x0/0x237 [gfs2]
 [<c0138422>] generic_file_buffered_write+0x25b/0x60f
 [<c03b6594>] tcp_v4_do_rcv+0x28/0x307
 [<c016857e>] __mark_inode_dirty+0xdd/0x15c
 [<c011dbb6>] current_fs_time+0x41/0x46
 [<c0138cbd>] __generic_file_aio_write_nolock+0x4e7/0x560
 [<c013afef>] get_page_from_freelist+0x24f/0x2cf
 [<c0138d8b>] generic_file_aio_write+0x55/0xb3
 [<c01514cd>] do_sync_write+0xc7/0x10a
 [<e02a3293>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<c01296cd>] autoremove_wake_function+0x0/0x35
 [<e02a3293>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<e02ac3d8>] gfs2_llseek+0x76/0x9a [gfs2]
 [<c0151406>] do_sync_write+0x0/0x10a
 [<c0151c30>] vfs_write+0x8a/0x10c
 [<c015219f>] sys_write+0x41/0x67
 [<c01030d8>] sysenter_past_esp+0x5d/0x81
 [<c0400000>] svc_defer+0x6b/0x126
 =======================

>> @@ -457,6 +494,9 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
>> 	file_ra_state_init(&ra_state, inode->i_mapping);
>> 	for (sdp->sd_rgrps = 0;; sdp->sd_rgrps++) {
>> 		loff_t pos = sdp->sd_rgrps * sizeof(struct gfs2_rindex);
>> +
>> +		if (pos + sizeof(struct gfs2_rindex) >= ip->i_di.di_size)
>> +			break;
> 
> Why is this needed now if it wasn't before?

For the same reason as the above check is necessary.

Regards,

Bob Peterson
Red Hat Cluster Suite



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
  2007-05-03  1:57 Robert Peterson
  2007-05-04 16:47 ` David Teigland
  2007-05-04 19:37 ` David Teigland
@ 2007-05-04 20:23 ` David Teigland
  2007-05-04 21:48   ` Robert Peterson
  2 siblings, 1 reply; 14+ messages in thread
From: David Teigland @ 2007-05-04 20:23 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, May 02, 2007 at 08:57:08PM -0500, Robert Peterson wrote:
> @@ -978,18 +1038,25 @@ int gfs2_inplace_reserve_i(struct gfs2_inode *ip, 

> -	error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
> +	/* We need to hold the rindex unless the inode we're using is
> +	   the rindex itself, in which case it's already held. */
> +	if (ip != GFS2_I(sdp->sd_rindex))
> +		error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
> +	else if (!sdp->sd_rgrps) /* We may not have the rindex read in, so: */
> +		error = gfs2_check_rindex_version(sdp);
> +

I don't see why we need this change, the original looks correct.

So there are the two distinct parts to the fs-grow procedure.  The first
is writing to the rindex and statfs files -- this is what was missing in
gfs2, and I'm ignoring that part for now.

The second part is gfs detecting that a grow took place and updating its
list of rg's.  I don't think this part needs any changing at all, it
should work the same way it always has -- this is what I'm focussing on.

Dave

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
  2007-05-04 20:23 ` David Teigland
@ 2007-05-04 21:48   ` Robert Peterson
  2007-05-04 22:04     ` David Teigland
  2007-05-08 14:17     ` Robert Peterson
  0 siblings, 2 replies; 14+ messages in thread
From: Robert Peterson @ 2007-05-04 21:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

David Teigland wrote:
> On Wed, May 02, 2007 at 08:57:08PM -0500, Robert Peterson wrote:
>> @@ -978,18 +1038,25 @@ int gfs2_inplace_reserve_i(struct gfs2_inode *ip, 
> 
>> -	error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
>> +	/* We need to hold the rindex unless the inode we're using is
>> +	   the rindex itself, in which case it's already held. */
>> +	if (ip != GFS2_I(sdp->sd_rindex))
>> +		error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
>> +	else if (!sdp->sd_rgrps) /* We may not have the rindex read in, so: */
>> +		error = gfs2_check_rindex_version(sdp);
>> +
> 
> I don't see why we need this change, the original looks correct.

If we're updating through the meta_fs, the rindex file will already be
glocked and held through its inode.

> So there are the two distinct parts to the fs-grow procedure.  The first
> is writing to the rindex and statfs files -- this is what was missing in
> gfs2, and I'm ignoring that part for now.
> 
> The second part is gfs detecting that a grow took place and updating its
> list of rg's.  I don't think this part needs any changing at all, it
> should work the same way it always has -- this is what I'm focussing on.

I'll re-test test the original code path to make sure that this part of
the fix is still really necessary.

Regards,

Bob Peterson
Red Hat Cluster Suite



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
  2007-05-04 21:48   ` Robert Peterson
@ 2007-05-04 22:04     ` David Teigland
  2007-05-08 14:17     ` Robert Peterson
  1 sibling, 0 replies; 14+ messages in thread
From: David Teigland @ 2007-05-04 22:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Fri, May 04, 2007 at 04:48:24PM -0500, Robert Peterson wrote:
> David Teigland wrote:
> >On Wed, May 02, 2007 at 08:57:08PM -0500, Robert Peterson wrote:
> >>@@ -978,18 +1038,25 @@ int gfs2_inplace_reserve_i(struct gfs2_inode *ip, 
> >
> >>-	error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
> >>+	/* We need to hold the rindex unless the inode we're using is
> >>+	   the rindex itself, in which case it's already held. */
> >>+	if (ip != GFS2_I(sdp->sd_rindex))
> >>+		error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
> >>+	else if (!sdp->sd_rgrps) /* We may not have the rindex read in, so: 
> >>*/
> >>+		error = gfs2_check_rindex_version(sdp);
> >>+
> >
> >I don't see why we need this change, the original looks correct.
> 
> If we're updating through the meta_fs, the rindex file will already be
> glocked and held through its inode.

You don't have to worry about other holders or the state of the glock,
just let gfs_rindex_hold() enqueue its own shared holder and everything
should work fine.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
  2007-05-04 21:48   ` Robert Peterson
  2007-05-04 22:04     ` David Teigland
@ 2007-05-08 14:17     ` Robert Peterson
  2007-05-08 14:24       ` Steven Whitehouse
  1 sibling, 1 reply; 14+ messages in thread
From: Robert Peterson @ 2007-05-08 14:17 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Robert Peterson wrote:
> David Teigland wrote:
>> On Wed, May 02, 2007 at 08:57:08PM -0500, Robert Peterson wrote:
>>> @@ -978,18 +1038,25 @@ int gfs2_inplace_reserve_i(struct gfs2_inode *ip, 
>>
>>> -    error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
>>> +    /* We need to hold the rindex unless the inode we're using is
>>> +       the rindex itself, in which case it's already held. */
>>> +    if (ip != GFS2_I(sdp->sd_rindex))
>>> +        error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
>>> +    else if (!sdp->sd_rgrps) /* We may not have the rindex read in, 
>>> so: */
>>> +        error = gfs2_check_rindex_version(sdp);
>>> +
>>
>> I don't see why we need this change, the original looks correct.
> 
> If we're updating through the meta_fs, the rindex file will already be
> glocked and held through its inode.
> 
>> So there are the two distinct parts to the fs-grow procedure.  The first
>> is writing to the rindex and statfs files -- this is what was missing in
>> gfs2, and I'm ignoring that part for now.
>>
>> The second part is gfs detecting that a grow took place and updating its
>> list of rg's.  I don't think this part needs any changing at all, it
>> should work the same way it always has -- this is what I'm focussing on.
> 
> I'll re-test test the original code path to make sure that this part of
> the fix is still really necessary.
> 
> Regards,
> 
> Bob Peterson
> Red Hat Cluster Suite
> 
Hi Dave,

Well, I retested this code path and found that this code is still 
necessary  (unless I should somehow be doing this a different way).
Without the code, the gfs2_rindex_hold tries to add a holder,
but there's already a holder due to the meta_fs.  Under the right 
conditions, you get:

original: gfs2_prepare_write+0x49/0x237 [gfs2]
new: gfs2_rindex_hold+0x2b/0x52a [gfs2]
------------[ cut here ]------------
kernel BUG at fs/gfs2/glock.c:1057!
invalid opcode: 0000 [#1]
SMP 
Modules linked in: lock_dlm gfs2 dlm configfs sd_mod sg qla2xxx
CPU:    0
EIP:    0060:[<e02a2fd3>]    Not tainted VLI
EFLAGS: 00010282   (2.6.21 #2)
EIP is at gfs2_glock_nq+0xff/0x1a6 [gfs2]
eax: 00000020   ebx: c3070b54   ecx: 00000046   edx: 00000046
esi: c3070c2c   edi: d5147b1c   ebp: d5147b1c   esp: d18f7bf0
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process gfs2_grow (pid: 5762, ti=d18f6000 task=df6e9ab0 task.ti=d18f6000)
Stack: e02b8e4e 00000002 00000001 c8ba1000 00000000 c8ba1000 00000000 c8ba1000 
       c307099c e02b33f0 c3070c2c 00000000 d5147b1c c3070c98 def5eeb8 d521d108 
       00001000 c3070c2c d18f7c6c c307099c d18f7cbc e029a2c1 d18f7c6c 00000000 
Call Trace:
 [<e02b33f0>] gfs2_rindex_hold+0x34/0x52a [gfs2]
 [<e029a2c1>] gfs2_block_map+0x39f/0x3ab [gfs2]
 [<c04078a5>] __sched_text_start+0x715/0x7c4
 [<e029a335>] gfs2_extent_map+0x68/0x9c [gfs2]
 [<e02b3e08>] gfs2_inplace_reserve_i+0x7b/0x418 [gfs2]
 [<e029a4fb>] gfs2_write_alloc_required+0x192/0x1ce [gfs2]
 [<e02aad58>] gfs2_prepare_write+0x12d/0x237 [gfs2]
 [<e02aac2b>] gfs2_prepare_write+0x0/0x237 [gfs2]
 [<c0138422>] generic_file_buffered_write+0x25b/0x60f
 [<c016857e>] __mark_inode_dirty+0xdd/0x15c
 [<c011dbb6>] current_fs_time+0x41/0x46
 [<c0138cbd>] __generic_file_aio_write_nolock+0x4e7/0x560
 [<c013afef>] get_page_from_freelist+0x24f/0x2cf
 [<c0138d8b>] generic_file_aio_write+0x55/0xb3
 [<c01514cd>] do_sync_write+0xc7/0x10a
 [<e02a3293>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<c01296cd>] autoremove_wake_function+0x0/0x35
 [<e02a3293>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<e02ac3d8>] gfs2_llseek+0x76/0x9a [gfs2]
 [<c0151406>] do_sync_write+0x0/0x10a
 [<c0151c30>] vfs_write+0x8a/0x10c
 [<c015219f>] sys_write+0x41/0x67
 [<c01030d8>] sysenter_past_esp+0x5d/0x81
 [<c0400000>] svc_defer+0x6b/0x126
 =======================

Regards,

Bob Peterson
Red Hat Cluster Suite



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3)
  2007-05-08 14:17     ` Robert Peterson
@ 2007-05-08 14:24       ` Steven Whitehouse
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Whitehouse @ 2007-05-08 14:24 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On Tue, 2007-05-08 at 09:17 -0500, Robert Peterson wrote:
> Robert Peterson wrote:
> > David Teigland wrote:
> >> On Wed, May 02, 2007 at 08:57:08PM -0500, Robert Peterson wrote:
> >>> @@ -978,18 +1038,25 @@ int gfs2_inplace_reserve_i(struct gfs2_inode *ip, 
> >>
> >>> -    error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
> >>> +    /* We need to hold the rindex unless the inode we're using is
> >>> +       the rindex itself, in which case it's already held. */
> >>> +    if (ip != GFS2_I(sdp->sd_rindex))
> >>> +        error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
> >>> +    else if (!sdp->sd_rgrps) /* We may not have the rindex read in, 
> >>> so: */
> >>> +        error = gfs2_check_rindex_version(sdp);
> >>> +
> >>
> >> I don't see why we need this change, the original looks correct.
> > 
> > If we're updating through the meta_fs, the rindex file will already be
> > glocked and held through its inode.
> > 
> >> So there are the two distinct parts to the fs-grow procedure.  The first
> >> is writing to the rindex and statfs files -- this is what was missing in
> >> gfs2, and I'm ignoring that part for now.
> >>
> >> The second part is gfs detecting that a grow took place and updating its
> >> list of rg's.  I don't think this part needs any changing at all, it
> >> should work the same way it always has -- this is what I'm focussing on.
> > 
> > I'll re-test test the original code path to make sure that this part of
> > the fix is still really necessary.
> > 
> > Regards,
> > 
> > Bob Peterson
> > Red Hat Cluster Suite
> > 
> Hi Dave,
> 
> Well, I retested this code path and found that this code is still 
> necessary  (unless I should somehow be doing this a different way).
> Without the code, the gfs2_rindex_hold tries to add a holder,
> but there's already a holder due to the meta_fs.  Under the right 
> conditions, you get:
> 
> original: gfs2_prepare_write+0x49/0x237 [gfs2]
> new: gfs2_rindex_hold+0x2b/0x52a [gfs2]

I agree that this change is needed, its how we've solved the same
problem elsewhere and it looks ok to me,

Steve.




^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2007-05-08 20:43 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-08 20:43 [Cluster-devel] [PATCH] GFS2: kernel changes to support new gfs2_grow command (Try 3) David Teigland
  -- strict thread matches above, loose matches on Subject: below --
2007-05-08 20:42 David Teigland
2007-05-03  1:57 Robert Peterson
2007-05-04 16:47 ` David Teigland
2007-05-04 21:35   ` Robert Peterson
2007-05-04 22:16     ` David Teigland
2007-05-04 19:37 ` David Teigland
2007-05-04 21:37   ` Robert Peterson
2007-05-08 14:56   ` Robert Peterson
2007-05-04 20:23 ` David Teigland
2007-05-04 21:48   ` Robert Peterson
2007-05-04 22:04     ` David Teigland
2007-05-08 14:17     ` Robert Peterson
2007-05-08 14:24       ` Steven Whitehouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).