From mboxrd@z Thu Jan  1 00:00:00 1970
From: Robert Peterson <rpeterso@redhat.com>
Date: Tue, 08 May 2007 09:56:05 -0500
Subject: [Cluster-devel] [PATCH] GFS2: kernel changes to support new
	gfs2_grow command (Try 3)
In-Reply-To: <20070504193714.GB13775@redhat.com>
References: <46394174.2030709@redhat.com> <20070504193714.GB13775@redhat.com>
Message-ID: <46408F85.4000904@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

David Teigland wrote:
> On Wed, May 02, 2007 at 08:57:08PM -0500, Robert Peterson wrote:
>> @@ -447,7 +479,12 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
>> 	u64 junk = ip->i_di.di_size;
>> 	int error;
>>
>> -	if (do_div(junk, sizeof(struct gfs2_rindex))) {
>> +	/* If someone is holding the rindex file with a glock, they must
>> +	   be updating it, in which case we may have partial entries.
>> +	   In this case, we ignore the partials. */
>> +	if (!gfs2_glock_is_held_excl(ip->i_gl) &&
>> +	    !gfs2_glock_is_held_shrd(ip->i_gl) &&
>> +	    do_div(junk, sizeof(struct gfs2_rindex))) {
>> 		gfs2_consist_inode(ip);
>> 		return -EIO;
>> 	}
> 
> So the use of glock_is_held _is_ part of an assertion, not part of an
> algorithm which I was worried about before.  We should only ever get to
> this spot with a shared glock, right? (rindex_hold takes it).  So a plain
> old assertion that the glock is shared at the beginning would be ok, but
> this particular check doesn't make sense to me.

For the sake of completeness, I retested without this change to make
sure it was also still necessary.  It was.  The problem is that
we can now call gfs2_ri_update while there are still partial rindex 
entries.  This can happen when we need to allocate a new page to the
rindex, which calls gfs2_inplace_reserve_i, which eventually gets to 
gfs2_check_rindex_version.  Without the change, you get:

GFS2: fsid=bob_cluster2:test_gfs.0: fatal: filesystem consistency error
GFS2: fsid=bob_cluster2:test_gfs.0:   inode = 19 25439
GFS2: fsid=bob_cluster2:test_gfs.0:   function = gfs2_ri_update, file = fs/gfs2/rgrp.c, line = 486
GFS2: fsid=bob_cluster2:test_gfs.0: about to withdraw this file system
GFS2: fsid=bob_cluster2:test_gfs.0: telling LM to withdraw
GFS2: fsid=bob_cluster2:test_gfs.0: withdrawn
 [<e02a5c2f>] gfs2_lm_withdraw+0x82/0x8d [gfs2]
 [<e02b72d0>] gfs2_consist_inode_i+0x6f/0x75 [gfs2]
 [<e02b34da>] gfs2_check_rindex_version+0x11e/0x468 [gfs2]
 [<c04078a5>] __sched_text_start+0x715/0x7c4
 [<e029a335>] gfs2_extent_map+0x68/0x9c [gfs2]
 [<e02b3dbb>] gfs2_inplace_reserve_i+0x9a/0x443 [gfs2]
 [<e029a4fb>] gfs2_write_alloc_required+0x192/0x1ce [gfs2]
 [<e02aad58>] gfs2_prepare_write+0x12d/0x237 [gfs2]
 [<e02aac2b>] gfs2_prepare_write+0x0/0x237 [gfs2]
 [<c0138422>] generic_file_buffered_write+0x25b/0x60f
 [<c03b6594>] tcp_v4_do_rcv+0x28/0x307
 [<c016857e>] __mark_inode_dirty+0xdd/0x15c
 [<c011dbb6>] current_fs_time+0x41/0x46
 [<c0138cbd>] __generic_file_aio_write_nolock+0x4e7/0x560
 [<c013afef>] get_page_from_freelist+0x24f/0x2cf
 [<c0138d8b>] generic_file_aio_write+0x55/0xb3
 [<c01514cd>] do_sync_write+0xc7/0x10a
 [<e02a3293>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<c01296cd>] autoremove_wake_function+0x0/0x35
 [<e02a3293>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<e02ac3d8>] gfs2_llseek+0x76/0x9a [gfs2]
 [<c0151406>] do_sync_write+0x0/0x10a
 [<c0151c30>] vfs_write+0x8a/0x10c
 [<c015219f>] sys_write+0x41/0x67
 [<c01030d8>] sysenter_past_esp+0x5d/0x81
 [<c0400000>] svc_defer+0x6b/0x126
 =======================

>> @@ -457,6 +494,9 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
>> 	file_ra_state_init(&ra_state, inode->i_mapping);
>> 	for (sdp->sd_rgrps = 0;; sdp->sd_rgrps++) {
>> 		loff_t pos = sdp->sd_rgrps * sizeof(struct gfs2_rindex);
>> +
>> +		if (pos + sizeof(struct gfs2_rindex) >= ip->i_di.di_size)
>> +			break;
> 
> Why is this needed now if it wasn't before?

For the same reason as the above check is necessary.

Regards,

Bob Peterson
Red Hat Cluster Suite