[Cluster-devel] [PATCH 5 of 5] Bz #248176: GFS2: invalid metadata block

cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed

* [Cluster-devel] [PATCH 5 of 5] Bz #248176: GFS2: invalid metadata block - REVISED
@ 2007-08-08 22:08 Bob Peterson
  2007-08-09  8:35 ` Steven Whitehouse
  2007-08-09 19:07 ` Wendy Cheng
  0 siblings, 2 replies; 4+ messages in thread
From: Bob Peterson @ 2007-08-08 22:08 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This is for bugzilla bug #248176: GFS2: invalid metadata block

Patches 1 thru 3 were accepted upstream, but there were problems
with 4 and 5.  Those issues have been resolved and now the recovery
tests are passing without errors.  This code has gone through
41 * 3 successful gfs2 recovery tests before it hit an
unrelated (openais) problem.  I'm continuing to test it.

This is a complete rewrite of patch 5 for bug #248176, written by
Steve Whitehouse.  This is referred to in the bugzilla record as
"new 6" and "a different solution".

The problem was that the journal inodes, although protected by
a glock, were not synched with the other nodes because they don't
use the inode glock synch operations (i.e. no "glops" were defined).
Therefore, journal recovery on a journal-recovering node were causing
the blocks to get out of sync with the node that was actually trying
to use that journal as it comes back up from a reboot.

There are two possible solutions: (1) To make the journals use the
normal inode glock sync operations, or (2) To make the journal
operations take effect immediately (i.e. no caching).  Although
option 1 works, it turns out to be a lot more code.  Steve opted
for option 2, which is much simpler and therefore less prone to
regression errors.

Regards,

Bob Peterson
--
Signed-off-by: Bob Peterson <rpeterso@redhat.com> 
--
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 58c730b..f0bcaa2 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -358,7 +358,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)

 		ip = GFS2_I(sdp->sd_jdesc->jd_inode);
 		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
-					   LM_FLAG_NOEXP | GL_EXACT,
+					   LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
 					   &sdp->sd_jinode_gh);
 		if (error) {
 			fs_err(sdp, "can't acquire journal inode glock: %d\n",
diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index 5ada38c..beb6c7a 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -469,7 +469,7 @@ int gfs2_recover_journal(struct gfs2_jdesc *jd)
 		};

 		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
-					   LM_FLAG_NOEXP, &ji_gh);
+					   LM_FLAG_NOEXP | GL_NOCACHE, &ji_gh);
 		if (error)
 			goto fail_gunlock_j;
 	} else {

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [Cluster-devel] [PATCH 5 of 5] Bz #248176: GFS2: invalid metadata block - REVISED
  2007-08-08 22:08 [Cluster-devel] [PATCH 5 of 5] Bz #248176: GFS2: invalid metadata block - REVISED Bob Peterson
@ 2007-08-09  8:35 ` Steven Whitehouse
  2007-08-09 19:07 ` Wendy Cheng
  1 sibling, 0 replies; 4+ messages in thread
From: Steven Whitehouse @ 2007-08-09  8:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

This one and the previous patch are now both in the -nmw git tree.
Thanks,

Steve.

On Wed, 2007-08-08 at 17:08 -0500, Bob Peterson wrote:
> This is for bugzilla bug #248176: GFS2: invalid metadata block
> 
> Patches 1 thru 3 were accepted upstream, but there were problems
> with 4 and 5.  Those issues have been resolved and now the recovery
> tests are passing without errors.  This code has gone through
> 41 * 3 successful gfs2 recovery tests before it hit an
> unrelated (openais) problem.  I'm continuing to test it.
> 
> This is a complete rewrite of patch 5 for bug #248176, written by
> Steve Whitehouse.  This is referred to in the bugzilla record as
> "new 6" and "a different solution".
> 
> The problem was that the journal inodes, although protected by
> a glock, were not synched with the other nodes because they don't
> use the inode glock synch operations (i.e. no "glops" were defined).
> Therefore, journal recovery on a journal-recovering node were causing
> the blocks to get out of sync with the node that was actually trying
> to use that journal as it comes back up from a reboot.
> 
> There are two possible solutions: (1) To make the journals use the
> normal inode glock sync operations, or (2) To make the journal
> operations take effect immediately (i.e. no caching).  Although
> option 1 works, it turns out to be a lot more code.  Steve opted
> for option 2, which is much simpler and therefore less prone to
> regression errors.
> 
> Regards,
> 
> Bob Peterson
> --
> Signed-off-by: Bob Peterson <rpeterso@redhat.com> 
> --
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index 58c730b..f0bcaa2 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -358,7 +358,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
>  
>  		ip = GFS2_I(sdp->sd_jdesc->jd_inode);
>  		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
> -					   LM_FLAG_NOEXP | GL_EXACT,
> +					   LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
>  					   &sdp->sd_jinode_gh);
>  		if (error) {
>  			fs_err(sdp, "can't acquire journal inode glock: %d\n",
> diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
> index 5ada38c..beb6c7a 100644
> --- a/fs/gfs2/recovery.c
> +++ b/fs/gfs2/recovery.c
> @@ -469,7 +469,7 @@ int gfs2_recover_journal(struct gfs2_jdesc *jd)
>  		};
>  
>  		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
> -					   LM_FLAG_NOEXP, &ji_gh);
> +					   LM_FLAG_NOEXP | GL_NOCACHE, &ji_gh);
>  		if (error)
>  			goto fail_gunlock_j;
>  	} else {
> 
> 



^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Cluster-devel] [PATCH 5 of 5] Bz #248176: GFS2: invalid metadata block - REVISED
  2007-08-08 22:08 [Cluster-devel] [PATCH 5 of 5] Bz #248176: GFS2: invalid metadata block - REVISED Bob Peterson
  2007-08-09  8:35 ` Steven Whitehouse
@ 2007-08-09 19:07 ` Wendy Cheng
  2007-08-10  8:22   ` Steven Whitehouse
  1 sibling, 1 reply; 4+ messages in thread
From: Wendy Cheng @ 2007-08-09 19:07 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Bob Peterson wrote:
> The problem was that the journal inodes, although protected by
> a glock, were not synched with the other nodes because they don't
> use the inode glock synch operations (i.e. no "glops" were defined).
> Therefore, journal recovery on a journal-recovering node were causing
> the blocks to get out of sync with the node that was actually trying
> to use that journal as it comes back up from a reboot.
>   

I don't understand this patch either. Maybe I have worked too long in 
GFS1 so please educate me on these GFS2 internals.  Comment below:
> There are two possible solutions: (1) To make the journals use the
> normal inode glock sync operations, or (2) To make the journal
> operations take effect immediately (i.e. no caching).  Although
> option 1 works, it turns out to be a lot more code.  Steve opted
> for option 2, which is much simpler and therefore less prone to
> regression errors.
>
> Regards,
>
> Bob Peterson
> --
> Signed-off-by: Bob Peterson <rpeterso@redhat.com> 
> --
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index 58c730b..f0bcaa2 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -358,7 +358,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
>  
>  		ip = GFS2_I(sdp->sd_jdesc->jd_inode);
>  		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
> -					   LM_FLAG_NOEXP | GL_EXACT,
> +					   LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
>  					   &sdp->sd_jinode_gh);
>  		if (error) {
>  			fs_err(sdp, "can't acquire journal inode glock: %d\n",
> diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
> index 5ada38c..beb6c7a 100644
> --- a/fs/gfs2/recovery.c
> +++ b/fs/gfs2/recovery.c
> @@ -469,7 +469,7 @@ int gfs2_recover_journal(struct gfs2_jdesc *jd)
>  		};
>  
>  		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
> -					   LM_FLAG_NOEXP, &ji_gh);
> +					   LM_FLAG_NOEXP | GL_NOCACHE, &ji_gh);
>  		if (error)
>  			goto fail_gunlock_j;
>  	} else {
>
>   
This lock is requested as "SHARED" (read lock). So how does "GL_NOCACHE" 
help it to "sync" with other nodes regarding to disk blocks sharing as 
you described above ? For a normal EXCLUSIVE inode glock with nocache, 
it will force a sync (disk blocks). However, this is a read lock. So 
what is the problem this patch has solved ?

-- Wendy



^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Cluster-devel] [PATCH 5 of 5] Bz #248176: GFS2: invalid metadata block - REVISED
  2007-08-09 19:07 ` Wendy Cheng
@ 2007-08-10  8:22   ` Steven Whitehouse
  0 siblings, 0 replies; 4+ messages in thread
From: Steven Whitehouse @ 2007-08-10  8:22 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On Thu, 2007-08-09 at 15:07 -0400, Wendy Cheng wrote:
> Bob Peterson wrote:
> > The problem was that the journal inodes, although protected by
> > a glock, were not synched with the other nodes because they don't
> > use the inode glock synch operations (i.e. no "glops" were defined).
> > Therefore, journal recovery on a journal-recovering node were causing
> > the blocks to get out of sync with the node that was actually trying
> > to use that journal as it comes back up from a reboot.
> >   
> 
> I don't understand this patch either. Maybe I have worked too long in 
> GFS1 so please educate me on these GFS2 internals.  Comment below:
> > There are two possible solutions: (1) To make the journals use the
> > normal inode glock sync operations, or (2) To make the journal
> > operations take effect immediately (i.e. no caching).  Although
> > option 1 works, it turns out to be a lot more code.  Steve opted
> > for option 2, which is much simpler and therefore less prone to
> > regression errors.
> >
> > Regards,
> >
> > Bob Peterson
> > --
> > Signed-off-by: Bob Peterson <rpeterso@redhat.com> 
> > --
> > diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> > index 58c730b..f0bcaa2 100644
> > --- a/fs/gfs2/ops_fstype.c
> > +++ b/fs/gfs2/ops_fstype.c
> > @@ -358,7 +358,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
> >  
> >  		ip = GFS2_I(sdp->sd_jdesc->jd_inode);
> >  		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
> > -					   LM_FLAG_NOEXP | GL_EXACT,
> > +					   LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
> >  					   &sdp->sd_jinode_gh);
> >  		if (error) {
> >  			fs_err(sdp, "can't acquire journal inode glock: %d\n",
> > diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
> > index 5ada38c..beb6c7a 100644
> > --- a/fs/gfs2/recovery.c
> > +++ b/fs/gfs2/recovery.c
> > @@ -469,7 +469,7 @@ int gfs2_recover_journal(struct gfs2_jdesc *jd)
> >  		};
> >  
> >  		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
> > -					   LM_FLAG_NOEXP, &ji_gh);
> > +					   LM_FLAG_NOEXP | GL_NOCACHE, &ji_gh);
> >  		if (error)
> >  			goto fail_gunlock_j;
> >  	} else {
> >
> >   
> This lock is requested as "SHARED" (read lock). So how does "GL_NOCACHE" 
> help it to "sync" with other nodes regarding to disk blocks sharing as 
> you described above ? For a normal EXCLUSIVE inode glock with nocache, 
> it will force a sync (disk blocks). However, this is a read lock. So 
> what is the problem this patch has solved ?
> 
> -- Wendy
> 

This is the easy one to explain.... the page cache relating to the
journal inode is only used to read the journal for recovery and
otherwise is unused. The problem here comes when a node tries to recover
the journal of the same remote node twice (with no umounts etc. between
the two events). In that case it was possible for a node to "see" stale
data from the first recovery attempt while reading the journal for the
second recovery. Using GL_NOCACHE means that we now flush the cache
after the first recovery, so when the second recovery occurs, the blocks
will be read fresh from disk,

Steve.




^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2007-08-10  8:22 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-08-08 22:08 [Cluster-devel] [PATCH 5 of 5] Bz #248176: GFS2: invalid metadata block - REVISED Bob Peterson
2007-08-09  8:35 ` Steven Whitehouse
2007-08-09 19:07 ` Wendy Cheng
2007-08-10  8:22   ` Steven Whitehouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).