* [PATCH 0/2] xfsprogs/repair: secondary sb scan cleanups @ 2015-01-13 20:08 Brian Foster 2015-01-13 20:08 ` [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt Brian Foster 2015-01-13 20:08 ` [PATCH 2/2] repair: remove unused strided secondary sb scan logic Brian Foster 0 siblings, 2 replies; 6+ messages in thread From: Brian Foster @ 2015-01-13 20:08 UTC (permalink / raw) To: xfs Hi all, Here are a couple patches to clean up the secondary sb scanning in phase 1 of xfs_repair. Patch 1 is a fix for a bug encountered by a user with a corrupted fs/array that lead to scattered sb corruption: http://oss.sgi.com/archives/xfs/2015-01/msg00113.html Patch 2 is a larger cleanup of the affected sb scan function after reading through the code and not being able to grok the need for some of the extra complexity therein. I don't think it should affect behavior much, but thoughts appreciated. Brian Brian Foster (2): repair: fix unnecessary secondary scan if only last sb is corrupt repair: remove unused strided secondary sb scan logic repair/globals.h | 2 -- repair/sb.c | 67 +++++++++++++++++++------------------------------------- 2 files changed, 22 insertions(+), 47 deletions(-) -- 1.8.3.1 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt 2015-01-13 20:08 [PATCH 0/2] xfsprogs/repair: secondary sb scan cleanups Brian Foster @ 2015-01-13 20:08 ` Brian Foster 2015-01-15 21:44 ` Eric Sandeen 2015-01-13 20:08 ` [PATCH 2/2] repair: remove unused strided secondary sb scan logic Brian Foster 1 sibling, 1 reply; 6+ messages in thread From: Brian Foster @ 2015-01-13 20:08 UTC (permalink / raw) To: xfs verify_set_primary_sb() scans the secondary superbocks based on the geometry specified in the primary and determines the most likely correct geometry by tracking how many superblocks are consistent across the set. The most frequent geometry is copied into the primary superblock. The return value is checked by the caller (phase1()) to determine whether a brute force secondary scan is necessary. This generally occurs when not enough secondary sb's are consistent to declare the geometry correct. If enough secondaries are consistent, verify_set_primary_sb() returns the status of the last secondary sb that was scanned. Corruptions to secondary supers other than the last are thus resolved fine. If the last secondary is corrupt, however, an error is returned to phase1(). This causes a brute force scan even if enough supers were found to repair the last secondary. Move the initialization of retval to after the sb scan to return an error only if not enough secondary supers were found to declare a correct geometry. Signed-off-by: Brian Foster <bfoster@redhat.com> --- repair/sb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/repair/sb.c b/repair/sb.c index ad27756..dc154f7 100644 --- a/repair/sb.c +++ b/repair/sb.c @@ -724,7 +724,6 @@ verify_set_primary_sb(xfs_sb_t *rsb, * sector size rather than the sector size in @rsb. */ size = NUM_AGH_SECTS * (1 << (XFS_MAX_SECTORSIZE_LOG)); - retval = 0; list = NULL; num_ok = 0; *sb_modified = 0; @@ -779,6 +778,7 @@ verify_set_primary_sb(xfs_sb_t *rsb, /* * see if we have enough superblocks to bother with */ + retval = 0; if (num_ok < num_sbs / 2) { retval = XR_INSUFF_SEC_SB; goto out_free_list; @@ -868,5 +868,5 @@ out_free_list: free_geo(list); free(sb); free(checked); - return(retval); + return retval; } -- 1.8.3.1 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt 2015-01-13 20:08 ` [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt Brian Foster @ 2015-01-15 21:44 ` Eric Sandeen 2015-01-16 12:05 ` Brian Foster 0 siblings, 1 reply; 6+ messages in thread From: Eric Sandeen @ 2015-01-15 21:44 UTC (permalink / raw) To: Brian Foster, xfs On 1/13/15 2:08 PM, Brian Foster wrote: > verify_set_primary_sb() scans the secondary superbocks based on the > geometry specified in the primary and determines the most likely correct > geometry by tracking how many superblocks are consistent across the set. > The most frequent geometry is copied into the primary superblock. The > return value is checked by the caller (phase1()) to determine whether a > brute force secondary scan is necessary. > > This generally occurs when not enough secondary sb's are consistent to > declare the geometry correct. If enough secondaries are consistent, > verify_set_primary_sb() returns the status of the last secondary sb that > was scanned. Corruptions to secondary supers other than the last are > thus resolved fine. If the last secondary is corrupt, however, an error > is returned to phase1(). This causes a brute force scan even if enough > supers were found to repair the last secondary. > > Move the initialization of retval to after the sb scan to return an > error only if not enough secondary supers were found to declare a > correct geometry. > > Signed-off-by: Brian Foster <bfoster@redhat.com> Nice. Brute-force scan is awful, doing it when unnecessary stinks! :) could this be fstest-ed? Reviewed-by: Eric Sandeen <sandeen@redhat.com> > --- > repair/sb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/repair/sb.c b/repair/sb.c > index ad27756..dc154f7 100644 > --- a/repair/sb.c > +++ b/repair/sb.c > @@ -724,7 +724,6 @@ verify_set_primary_sb(xfs_sb_t *rsb, > * sector size rather than the sector size in @rsb. > */ > size = NUM_AGH_SECTS * (1 << (XFS_MAX_SECTORSIZE_LOG)); > - retval = 0; > list = NULL; > num_ok = 0; > *sb_modified = 0; > @@ -779,6 +778,7 @@ verify_set_primary_sb(xfs_sb_t *rsb, > /* > * see if we have enough superblocks to bother with > */ > + retval = 0; > if (num_ok < num_sbs / 2) { > retval = XR_INSUFF_SEC_SB; > goto out_free_list; > @@ -868,5 +868,5 @@ out_free_list: > free_geo(list); > free(sb); > free(checked); > - return(retval); > + return retval; > } > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt 2015-01-15 21:44 ` Eric Sandeen @ 2015-01-16 12:05 ` Brian Foster 0 siblings, 0 replies; 6+ messages in thread From: Brian Foster @ 2015-01-16 12:05 UTC (permalink / raw) To: Eric Sandeen; +Cc: xfs On Thu, Jan 15, 2015 at 03:44:43PM -0600, Eric Sandeen wrote: > On 1/13/15 2:08 PM, Brian Foster wrote: > > verify_set_primary_sb() scans the secondary superbocks based on the > > geometry specified in the primary and determines the most likely correct > > geometry by tracking how many superblocks are consistent across the set. > > The most frequent geometry is copied into the primary superblock. The > > return value is checked by the caller (phase1()) to determine whether a > > brute force secondary scan is necessary. > > > > This generally occurs when not enough secondary sb's are consistent to > > declare the geometry correct. If enough secondaries are consistent, > > verify_set_primary_sb() returns the status of the last secondary sb that > > was scanned. Corruptions to secondary supers other than the last are > > thus resolved fine. If the last secondary is corrupt, however, an error > > is returned to phase1(). This causes a brute force scan even if enough > > supers were found to repair the last secondary. > > > > Move the initialization of retval to after the sb scan to return an > > error only if not enough secondary supers were found to declare a > > correct geometry. > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > Nice. Brute-force scan is awful, doing it when unnecessary stinks! :) > > could this be fstest-ed? > Yeah, the problem is easy to reproduce with xfs_db (just corrupt the last sb). I think a test that runs repair through a few sets of sb corruptions should be easy enough. I'll look into it. Another situation I ran into while playing with these is the brute force scan finding an older secondary (i.e., from a previous mkfs) and eventually falling over based on its geometry rather than continuing to scan for the a secondary of the current fs. I wasn't sure what was going on at the time so I zeroed off the bdev and retested to confirm that was the problem. I wonder how likely something like that might be in the real world... > Reviewed-by: Eric Sandeen <sandeen@redhat.com> > Thanks for reviewing these! Brian > > --- > > repair/sb.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/repair/sb.c b/repair/sb.c > > index ad27756..dc154f7 100644 > > --- a/repair/sb.c > > +++ b/repair/sb.c > > @@ -724,7 +724,6 @@ verify_set_primary_sb(xfs_sb_t *rsb, > > * sector size rather than the sector size in @rsb. > > */ > > size = NUM_AGH_SECTS * (1 << (XFS_MAX_SECTORSIZE_LOG)); > > - retval = 0; > > list = NULL; > > num_ok = 0; > > *sb_modified = 0; > > @@ -779,6 +778,7 @@ verify_set_primary_sb(xfs_sb_t *rsb, > > /* > > * see if we have enough superblocks to bother with > > */ > > + retval = 0; > > if (num_ok < num_sbs / 2) { > > retval = XR_INSUFF_SEC_SB; > > goto out_free_list; > > @@ -868,5 +868,5 @@ out_free_list: > > free_geo(list); > > free(sb); > > free(checked); > > - return(retval); > > + return retval; > > } > > > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/2] repair: remove unused strided secondary sb scan logic 2015-01-13 20:08 [PATCH 0/2] xfsprogs/repair: secondary sb scan cleanups Brian Foster 2015-01-13 20:08 ` [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt Brian Foster @ 2015-01-13 20:08 ` Brian Foster 2015-01-15 23:15 ` Eric Sandeen 1 sibling, 1 reply; 6+ messages in thread From: Brian Foster @ 2015-01-13 20:08 UTC (permalink / raw) To: xfs verify_set_primary_sb() scans and verifies secondary superblocks based on the primary sb. It currently defines a maximum number of 8 superblocks to scan per iteration. It also implements a strided algorithm that appears intended to ultimately scan every secondary, albeit in a strided order. Given that the algorithm is written to hit every sb and the stride value is initialized as follows: num_sbs = MIN(NUM_SBS, rsb->sb_agcount); skip = howmany(num_sbs, rsb->sb_agcount); ... which is guaranteed to be 1 since the howmany() parameters are backwards, the strided algorithm doesn't appear to accomplish anything that can't be done with a simple for loop. In other words, despite the max value and strided algorithm, repair always scans all of the secondary superblocks in incremental order. Therefore, remove the strided algorithm bits and replace with a simple for loop. As a result of this cleanup, also remove the 'checked' buffer used to track repeated ag visits and the now unused NUM_SBS definition. Signed-off-by: Brian Foster <bfoster@redhat.com> --- repair/globals.h | 2 -- repair/sb.c | 63 ++++++++++++++++++-------------------------------------- 2 files changed, 20 insertions(+), 45 deletions(-) diff --git a/repair/globals.h b/repair/globals.h index 6207ca1..f386686 100644 --- a/repair/globals.h +++ b/repair/globals.h @@ -57,7 +57,6 @@ #define XR_LOG2BSIZE_MIN 9 /* min/max fs blocksize (log2) */ #define XR_LOG2BSIZE_MAX 16 /* 2^XR_* == blocksize */ -#define NUM_SBS 8 /* max # of sbs to verify */ #define NUM_AGH_SECTS 4 /* # of components in an ag header */ /* @@ -88,7 +87,6 @@ EXTERN char *iobuf; /* large buffer */ EXTERN int iobuf_size; EXTERN char *smallbuf; /* small (1-4 page) buffer */ EXTERN int smallbuf_size; -EXTERN char *sb_bufs[NUM_SBS]; /* superblock buffers */ EXTERN int sbbuf_size; /* direct I/O info */ diff --git a/repair/sb.c b/repair/sb.c index dc154f7..a663728 100644 --- a/repair/sb.c +++ b/repair/sb.c @@ -702,20 +702,11 @@ verify_set_primary_sb(xfs_sb_t *rsb, xfs_sb_t *sb; fs_geo_list_t *list; fs_geo_list_t *current; - char *checked; xfs_agnumber_t agno; int num_sbs; - int skip; int size; int num_ok; int retval; - int round; - - /* - * select the number of secondaries to try for - */ - num_sbs = MIN(NUM_SBS, rsb->sb_agcount); - skip = howmany(num_sbs, rsb->sb_agcount); /* * We haven't been able to validate the sector size yet properly @@ -727,51 +718,38 @@ verify_set_primary_sb(xfs_sb_t *rsb, list = NULL; num_ok = 0; *sb_modified = 0; + num_sbs = rsb->sb_agcount; sb = (xfs_sb_t *) alloc_ag_buf(size); - checked = calloc(rsb->sb_agcount, sizeof(char)); - if (!checked) { - do_error(_("calloc failed in verify_set_primary_sb\n")); - exit(1); - } /* * put the primary sb geometry info onto the geometry list */ - checked[sb_index] = 1; get_sb_geometry(&geo, rsb); list = add_geo(list, &geo, sb_index); /* - * grab N secondaries. check them off as we get them - * so we only process each one once + * scan the secondaries and check them off as we get them so we only + * process each one once */ - for (round = 0; round < skip; round++) { - for (agno = round; agno < rsb->sb_agcount; agno += skip) { - if (checked[agno]) - continue; + for (agno = 1; agno < rsb->sb_agcount; agno++) { + off = (xfs_off_t)agno * rsb->sb_agblocks << rsb->sb_blocklog; - off = (xfs_off_t)agno * rsb->sb_agblocks << rsb->sb_blocklog; - - checked[agno] = 1; - retval = get_sb(sb, off, size, agno); - if (retval == XR_EOF) - goto out_free_list; - - if (retval == XR_OK) { - /* - * save away geometry info. - * don't bother checking the sb - * against the agi/agf as the odds - * of the sb being corrupted in a way - * that it is internally consistent - * but not consistent with the rest - * of the filesystem is really really low. - */ - get_sb_geometry(&geo, sb); - list = add_geo(list, &geo, agno); - num_ok++; - } + retval = get_sb(sb, off, size, agno); + if (retval == XR_EOF) + goto out_free_list; + + if (retval == XR_OK) { + /* + * save away geometry info. don't bother checking the + * sb against the agi/agf as the odds of the sb being + * corrupted in a way that it is internally consistent + * but not consistent with the rest of the filesystem is + * really really low. + */ + get_sb_geometry(&geo, sb); + list = add_geo(list, &geo, agno); + num_ok++; } } @@ -867,6 +845,5 @@ verify_set_primary_sb(xfs_sb_t *rsb, out_free_list: free_geo(list); free(sb); - free(checked); return retval; } -- 1.8.3.1 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] repair: remove unused strided secondary sb scan logic 2015-01-13 20:08 ` [PATCH 2/2] repair: remove unused strided secondary sb scan logic Brian Foster @ 2015-01-15 23:15 ` Eric Sandeen 0 siblings, 0 replies; 6+ messages in thread From: Eric Sandeen @ 2015-01-15 23:15 UTC (permalink / raw) To: Brian Foster, xfs On 1/13/15 2:08 PM, Brian Foster wrote: > verify_set_primary_sb() scans and verifies secondary superblocks based > on the primary sb. It currently defines a maximum number of 8 > superblocks to scan per iteration. It also implements a strided > algorithm that appears intended to ultimately scan every secondary, > albeit in a strided order. > > Given that the algorithm is written to hit every sb and the stride value > is initialized as follows: > > num_sbs = MIN(NUM_SBS, rsb->sb_agcount); > skip = howmany(num_sbs, rsb->sb_agcount); > > ... which is guaranteed to be 1 since the howmany() parameters are > backwards, the strided algorithm doesn't appear to accomplish anything > that can't be done with a simple for loop. In other words, despite the > max value and strided algorithm, repair always scans all of the > secondary superblocks in incremental order. > > Therefore, remove the strided algorithm bits and replace with a simple > for loop. As a result of this cleanup, also remove the 'checked' buffer > used to track repeated ag visits and the now unused NUM_SBS definition. > > Signed-off-by: Brian Foster <bfoster@redhat.com> This looks fine too. I can't see any reason to do the strided check, even if it works... Reviewed-by: Eric Sandeen <sandeen@redhat.com> > --- > repair/globals.h | 2 -- > repair/sb.c | 63 ++++++++++++++++++-------------------------------------- > 2 files changed, 20 insertions(+), 45 deletions(-) > > diff --git a/repair/globals.h b/repair/globals.h > index 6207ca1..f386686 100644 > --- a/repair/globals.h > +++ b/repair/globals.h > @@ -57,7 +57,6 @@ > #define XR_LOG2BSIZE_MIN 9 /* min/max fs blocksize (log2) */ > #define XR_LOG2BSIZE_MAX 16 /* 2^XR_* == blocksize */ > > -#define NUM_SBS 8 /* max # of sbs to verify */ > #define NUM_AGH_SECTS 4 /* # of components in an ag header */ > > /* > @@ -88,7 +87,6 @@ EXTERN char *iobuf; /* large buffer */ > EXTERN int iobuf_size; > EXTERN char *smallbuf; /* small (1-4 page) buffer */ > EXTERN int smallbuf_size; > -EXTERN char *sb_bufs[NUM_SBS]; /* superblock buffers */ > EXTERN int sbbuf_size; > > /* direct I/O info */ > diff --git a/repair/sb.c b/repair/sb.c > index dc154f7..a663728 100644 > --- a/repair/sb.c > +++ b/repair/sb.c > @@ -702,20 +702,11 @@ verify_set_primary_sb(xfs_sb_t *rsb, > xfs_sb_t *sb; > fs_geo_list_t *list; > fs_geo_list_t *current; > - char *checked; > xfs_agnumber_t agno; > int num_sbs; > - int skip; > int size; > int num_ok; > int retval; > - int round; > - > - /* > - * select the number of secondaries to try for > - */ > - num_sbs = MIN(NUM_SBS, rsb->sb_agcount); > - skip = howmany(num_sbs, rsb->sb_agcount); > > /* > * We haven't been able to validate the sector size yet properly > @@ -727,51 +718,38 @@ verify_set_primary_sb(xfs_sb_t *rsb, > list = NULL; > num_ok = 0; > *sb_modified = 0; > + num_sbs = rsb->sb_agcount; > > sb = (xfs_sb_t *) alloc_ag_buf(size); > - checked = calloc(rsb->sb_agcount, sizeof(char)); > - if (!checked) { > - do_error(_("calloc failed in verify_set_primary_sb\n")); > - exit(1); > - } > > /* > * put the primary sb geometry info onto the geometry list > */ > - checked[sb_index] = 1; > get_sb_geometry(&geo, rsb); > list = add_geo(list, &geo, sb_index); > > /* > - * grab N secondaries. check them off as we get them > - * so we only process each one once > + * scan the secondaries and check them off as we get them so we only > + * process each one once > */ > - for (round = 0; round < skip; round++) { > - for (agno = round; agno < rsb->sb_agcount; agno += skip) { > - if (checked[agno]) > - continue; > + for (agno = 1; agno < rsb->sb_agcount; agno++) { > + off = (xfs_off_t)agno * rsb->sb_agblocks << rsb->sb_blocklog; > > - off = (xfs_off_t)agno * rsb->sb_agblocks << rsb->sb_blocklog; > - > - checked[agno] = 1; > - retval = get_sb(sb, off, size, agno); > - if (retval == XR_EOF) > - goto out_free_list; > - > - if (retval == XR_OK) { > - /* > - * save away geometry info. > - * don't bother checking the sb > - * against the agi/agf as the odds > - * of the sb being corrupted in a way > - * that it is internally consistent > - * but not consistent with the rest > - * of the filesystem is really really low. > - */ > - get_sb_geometry(&geo, sb); > - list = add_geo(list, &geo, agno); > - num_ok++; > - } > + retval = get_sb(sb, off, size, agno); > + if (retval == XR_EOF) > + goto out_free_list; > + > + if (retval == XR_OK) { > + /* > + * save away geometry info. don't bother checking the > + * sb against the agi/agf as the odds of the sb being > + * corrupted in a way that it is internally consistent > + * but not consistent with the rest of the filesystem is > + * really really low. > + */ > + get_sb_geometry(&geo, sb); > + list = add_geo(list, &geo, agno); > + num_ok++; > } > } > > @@ -867,6 +845,5 @@ verify_set_primary_sb(xfs_sb_t *rsb, > out_free_list: > free_geo(list); > free(sb); > - free(checked); > return retval; > } > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-01-16 12:18 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-01-13 20:08 [PATCH 0/2] xfsprogs/repair: secondary sb scan cleanups Brian Foster 2015-01-13 20:08 ` [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt Brian Foster 2015-01-15 21:44 ` Eric Sandeen 2015-01-16 12:05 ` Brian Foster 2015-01-13 20:08 ` [PATCH 2/2] repair: remove unused strided secondary sb scan logic Brian Foster 2015-01-15 23:15 ` Eric Sandeen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox