* [PATCH 0/2] xfsprogs/repair: secondary sb scan cleanups
@ 2015-01-13 20:08 Brian Foster
2015-01-13 20:08 ` [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt Brian Foster
2015-01-13 20:08 ` [PATCH 2/2] repair: remove unused strided secondary sb scan logic Brian Foster
0 siblings, 2 replies; 6+ messages in thread
From: Brian Foster @ 2015-01-13 20:08 UTC (permalink / raw)
To: xfs
Hi all,
Here are a couple patches to clean up the secondary sb scanning in phase
1 of xfs_repair. Patch 1 is a fix for a bug encountered by a user with a
corrupted fs/array that lead to scattered sb corruption:
http://oss.sgi.com/archives/xfs/2015-01/msg00113.html
Patch 2 is a larger cleanup of the affected sb scan function after
reading through the code and not being able to grok the need for some of
the extra complexity therein. I don't think it should affect behavior
much, but thoughts appreciated.
Brian
Brian Foster (2):
repair: fix unnecessary secondary scan if only last sb is corrupt
repair: remove unused strided secondary sb scan logic
repair/globals.h | 2 --
repair/sb.c | 67 +++++++++++++++++++-------------------------------------
2 files changed, 22 insertions(+), 47 deletions(-)
--
1.8.3.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt
2015-01-13 20:08 [PATCH 0/2] xfsprogs/repair: secondary sb scan cleanups Brian Foster
@ 2015-01-13 20:08 ` Brian Foster
2015-01-15 21:44 ` Eric Sandeen
2015-01-13 20:08 ` [PATCH 2/2] repair: remove unused strided secondary sb scan logic Brian Foster
1 sibling, 1 reply; 6+ messages in thread
From: Brian Foster @ 2015-01-13 20:08 UTC (permalink / raw)
To: xfs
verify_set_primary_sb() scans the secondary superbocks based on the
geometry specified in the primary and determines the most likely correct
geometry by tracking how many superblocks are consistent across the set.
The most frequent geometry is copied into the primary superblock. The
return value is checked by the caller (phase1()) to determine whether a
brute force secondary scan is necessary.
This generally occurs when not enough secondary sb's are consistent to
declare the geometry correct. If enough secondaries are consistent,
verify_set_primary_sb() returns the status of the last secondary sb that
was scanned. Corruptions to secondary supers other than the last are
thus resolved fine. If the last secondary is corrupt, however, an error
is returned to phase1(). This causes a brute force scan even if enough
supers were found to repair the last secondary.
Move the initialization of retval to after the sb scan to return an
error only if not enough secondary supers were found to declare a
correct geometry.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
repair/sb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/repair/sb.c b/repair/sb.c
index ad27756..dc154f7 100644
--- a/repair/sb.c
+++ b/repair/sb.c
@@ -724,7 +724,6 @@ verify_set_primary_sb(xfs_sb_t *rsb,
* sector size rather than the sector size in @rsb.
*/
size = NUM_AGH_SECTS * (1 << (XFS_MAX_SECTORSIZE_LOG));
- retval = 0;
list = NULL;
num_ok = 0;
*sb_modified = 0;
@@ -779,6 +778,7 @@ verify_set_primary_sb(xfs_sb_t *rsb,
/*
* see if we have enough superblocks to bother with
*/
+ retval = 0;
if (num_ok < num_sbs / 2) {
retval = XR_INSUFF_SEC_SB;
goto out_free_list;
@@ -868,5 +868,5 @@ out_free_list:
free_geo(list);
free(sb);
free(checked);
- return(retval);
+ return retval;
}
--
1.8.3.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] repair: remove unused strided secondary sb scan logic
2015-01-13 20:08 [PATCH 0/2] xfsprogs/repair: secondary sb scan cleanups Brian Foster
2015-01-13 20:08 ` [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt Brian Foster
@ 2015-01-13 20:08 ` Brian Foster
2015-01-15 23:15 ` Eric Sandeen
1 sibling, 1 reply; 6+ messages in thread
From: Brian Foster @ 2015-01-13 20:08 UTC (permalink / raw)
To: xfs
verify_set_primary_sb() scans and verifies secondary superblocks based
on the primary sb. It currently defines a maximum number of 8
superblocks to scan per iteration. It also implements a strided
algorithm that appears intended to ultimately scan every secondary,
albeit in a strided order.
Given that the algorithm is written to hit every sb and the stride value
is initialized as follows:
num_sbs = MIN(NUM_SBS, rsb->sb_agcount);
skip = howmany(num_sbs, rsb->sb_agcount);
... which is guaranteed to be 1 since the howmany() parameters are
backwards, the strided algorithm doesn't appear to accomplish anything
that can't be done with a simple for loop. In other words, despite the
max value and strided algorithm, repair always scans all of the
secondary superblocks in incremental order.
Therefore, remove the strided algorithm bits and replace with a simple
for loop. As a result of this cleanup, also remove the 'checked' buffer
used to track repeated ag visits and the now unused NUM_SBS definition.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
repair/globals.h | 2 --
repair/sb.c | 63 ++++++++++++++++++--------------------------------------
2 files changed, 20 insertions(+), 45 deletions(-)
diff --git a/repair/globals.h b/repair/globals.h
index 6207ca1..f386686 100644
--- a/repair/globals.h
+++ b/repair/globals.h
@@ -57,7 +57,6 @@
#define XR_LOG2BSIZE_MIN 9 /* min/max fs blocksize (log2) */
#define XR_LOG2BSIZE_MAX 16 /* 2^XR_* == blocksize */
-#define NUM_SBS 8 /* max # of sbs to verify */
#define NUM_AGH_SECTS 4 /* # of components in an ag header */
/*
@@ -88,7 +87,6 @@ EXTERN char *iobuf; /* large buffer */
EXTERN int iobuf_size;
EXTERN char *smallbuf; /* small (1-4 page) buffer */
EXTERN int smallbuf_size;
-EXTERN char *sb_bufs[NUM_SBS]; /* superblock buffers */
EXTERN int sbbuf_size;
/* direct I/O info */
diff --git a/repair/sb.c b/repair/sb.c
index dc154f7..a663728 100644
--- a/repair/sb.c
+++ b/repair/sb.c
@@ -702,20 +702,11 @@ verify_set_primary_sb(xfs_sb_t *rsb,
xfs_sb_t *sb;
fs_geo_list_t *list;
fs_geo_list_t *current;
- char *checked;
xfs_agnumber_t agno;
int num_sbs;
- int skip;
int size;
int num_ok;
int retval;
- int round;
-
- /*
- * select the number of secondaries to try for
- */
- num_sbs = MIN(NUM_SBS, rsb->sb_agcount);
- skip = howmany(num_sbs, rsb->sb_agcount);
/*
* We haven't been able to validate the sector size yet properly
@@ -727,51 +718,38 @@ verify_set_primary_sb(xfs_sb_t *rsb,
list = NULL;
num_ok = 0;
*sb_modified = 0;
+ num_sbs = rsb->sb_agcount;
sb = (xfs_sb_t *) alloc_ag_buf(size);
- checked = calloc(rsb->sb_agcount, sizeof(char));
- if (!checked) {
- do_error(_("calloc failed in verify_set_primary_sb\n"));
- exit(1);
- }
/*
* put the primary sb geometry info onto the geometry list
*/
- checked[sb_index] = 1;
get_sb_geometry(&geo, rsb);
list = add_geo(list, &geo, sb_index);
/*
- * grab N secondaries. check them off as we get them
- * so we only process each one once
+ * scan the secondaries and check them off as we get them so we only
+ * process each one once
*/
- for (round = 0; round < skip; round++) {
- for (agno = round; agno < rsb->sb_agcount; agno += skip) {
- if (checked[agno])
- continue;
+ for (agno = 1; agno < rsb->sb_agcount; agno++) {
+ off = (xfs_off_t)agno * rsb->sb_agblocks << rsb->sb_blocklog;
- off = (xfs_off_t)agno * rsb->sb_agblocks << rsb->sb_blocklog;
-
- checked[agno] = 1;
- retval = get_sb(sb, off, size, agno);
- if (retval == XR_EOF)
- goto out_free_list;
-
- if (retval == XR_OK) {
- /*
- * save away geometry info.
- * don't bother checking the sb
- * against the agi/agf as the odds
- * of the sb being corrupted in a way
- * that it is internally consistent
- * but not consistent with the rest
- * of the filesystem is really really low.
- */
- get_sb_geometry(&geo, sb);
- list = add_geo(list, &geo, agno);
- num_ok++;
- }
+ retval = get_sb(sb, off, size, agno);
+ if (retval == XR_EOF)
+ goto out_free_list;
+
+ if (retval == XR_OK) {
+ /*
+ * save away geometry info. don't bother checking the
+ * sb against the agi/agf as the odds of the sb being
+ * corrupted in a way that it is internally consistent
+ * but not consistent with the rest of the filesystem is
+ * really really low.
+ */
+ get_sb_geometry(&geo, sb);
+ list = add_geo(list, &geo, agno);
+ num_ok++;
}
}
@@ -867,6 +845,5 @@ verify_set_primary_sb(xfs_sb_t *rsb,
out_free_list:
free_geo(list);
free(sb);
- free(checked);
return retval;
}
--
1.8.3.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt
2015-01-13 20:08 ` [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt Brian Foster
@ 2015-01-15 21:44 ` Eric Sandeen
2015-01-16 12:05 ` Brian Foster
0 siblings, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2015-01-15 21:44 UTC (permalink / raw)
To: Brian Foster, xfs
On 1/13/15 2:08 PM, Brian Foster wrote:
> verify_set_primary_sb() scans the secondary superbocks based on the
> geometry specified in the primary and determines the most likely correct
> geometry by tracking how many superblocks are consistent across the set.
> The most frequent geometry is copied into the primary superblock. The
> return value is checked by the caller (phase1()) to determine whether a
> brute force secondary scan is necessary.
>
> This generally occurs when not enough secondary sb's are consistent to
> declare the geometry correct. If enough secondaries are consistent,
> verify_set_primary_sb() returns the status of the last secondary sb that
> was scanned. Corruptions to secondary supers other than the last are
> thus resolved fine. If the last secondary is corrupt, however, an error
> is returned to phase1(). This causes a brute force scan even if enough
> supers were found to repair the last secondary.
>
> Move the initialization of retval to after the sb scan to return an
> error only if not enough secondary supers were found to declare a
> correct geometry.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Nice. Brute-force scan is awful, doing it when unnecessary stinks! :)
could this be fstest-ed?
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> ---
> repair/sb.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/repair/sb.c b/repair/sb.c
> index ad27756..dc154f7 100644
> --- a/repair/sb.c
> +++ b/repair/sb.c
> @@ -724,7 +724,6 @@ verify_set_primary_sb(xfs_sb_t *rsb,
> * sector size rather than the sector size in @rsb.
> */
> size = NUM_AGH_SECTS * (1 << (XFS_MAX_SECTORSIZE_LOG));
> - retval = 0;
> list = NULL;
> num_ok = 0;
> *sb_modified = 0;
> @@ -779,6 +778,7 @@ verify_set_primary_sb(xfs_sb_t *rsb,
> /*
> * see if we have enough superblocks to bother with
> */
> + retval = 0;
> if (num_ok < num_sbs / 2) {
> retval = XR_INSUFF_SEC_SB;
> goto out_free_list;
> @@ -868,5 +868,5 @@ out_free_list:
> free_geo(list);
> free(sb);
> free(checked);
> - return(retval);
> + return retval;
> }
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] repair: remove unused strided secondary sb scan logic
2015-01-13 20:08 ` [PATCH 2/2] repair: remove unused strided secondary sb scan logic Brian Foster
@ 2015-01-15 23:15 ` Eric Sandeen
0 siblings, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2015-01-15 23:15 UTC (permalink / raw)
To: Brian Foster, xfs
On 1/13/15 2:08 PM, Brian Foster wrote:
> verify_set_primary_sb() scans and verifies secondary superblocks based
> on the primary sb. It currently defines a maximum number of 8
> superblocks to scan per iteration. It also implements a strided
> algorithm that appears intended to ultimately scan every secondary,
> albeit in a strided order.
>
> Given that the algorithm is written to hit every sb and the stride value
> is initialized as follows:
>
> num_sbs = MIN(NUM_SBS, rsb->sb_agcount);
> skip = howmany(num_sbs, rsb->sb_agcount);
>
> ... which is guaranteed to be 1 since the howmany() parameters are
> backwards, the strided algorithm doesn't appear to accomplish anything
> that can't be done with a simple for loop. In other words, despite the
> max value and strided algorithm, repair always scans all of the
> secondary superblocks in incremental order.
>
> Therefore, remove the strided algorithm bits and replace with a simple
> for loop. As a result of this cleanup, also remove the 'checked' buffer
> used to track repeated ag visits and the now unused NUM_SBS definition.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
This looks fine too. I can't see any reason to do the strided check,
even if it works...
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> ---
> repair/globals.h | 2 --
> repair/sb.c | 63 ++++++++++++++++++--------------------------------------
> 2 files changed, 20 insertions(+), 45 deletions(-)
>
> diff --git a/repair/globals.h b/repair/globals.h
> index 6207ca1..f386686 100644
> --- a/repair/globals.h
> +++ b/repair/globals.h
> @@ -57,7 +57,6 @@
> #define XR_LOG2BSIZE_MIN 9 /* min/max fs blocksize (log2) */
> #define XR_LOG2BSIZE_MAX 16 /* 2^XR_* == blocksize */
>
> -#define NUM_SBS 8 /* max # of sbs to verify */
> #define NUM_AGH_SECTS 4 /* # of components in an ag header */
>
> /*
> @@ -88,7 +87,6 @@ EXTERN char *iobuf; /* large buffer */
> EXTERN int iobuf_size;
> EXTERN char *smallbuf; /* small (1-4 page) buffer */
> EXTERN int smallbuf_size;
> -EXTERN char *sb_bufs[NUM_SBS]; /* superblock buffers */
> EXTERN int sbbuf_size;
>
> /* direct I/O info */
> diff --git a/repair/sb.c b/repair/sb.c
> index dc154f7..a663728 100644
> --- a/repair/sb.c
> +++ b/repair/sb.c
> @@ -702,20 +702,11 @@ verify_set_primary_sb(xfs_sb_t *rsb,
> xfs_sb_t *sb;
> fs_geo_list_t *list;
> fs_geo_list_t *current;
> - char *checked;
> xfs_agnumber_t agno;
> int num_sbs;
> - int skip;
> int size;
> int num_ok;
> int retval;
> - int round;
> -
> - /*
> - * select the number of secondaries to try for
> - */
> - num_sbs = MIN(NUM_SBS, rsb->sb_agcount);
> - skip = howmany(num_sbs, rsb->sb_agcount);
>
> /*
> * We haven't been able to validate the sector size yet properly
> @@ -727,51 +718,38 @@ verify_set_primary_sb(xfs_sb_t *rsb,
> list = NULL;
> num_ok = 0;
> *sb_modified = 0;
> + num_sbs = rsb->sb_agcount;
>
> sb = (xfs_sb_t *) alloc_ag_buf(size);
> - checked = calloc(rsb->sb_agcount, sizeof(char));
> - if (!checked) {
> - do_error(_("calloc failed in verify_set_primary_sb\n"));
> - exit(1);
> - }
>
> /*
> * put the primary sb geometry info onto the geometry list
> */
> - checked[sb_index] = 1;
> get_sb_geometry(&geo, rsb);
> list = add_geo(list, &geo, sb_index);
>
> /*
> - * grab N secondaries. check them off as we get them
> - * so we only process each one once
> + * scan the secondaries and check them off as we get them so we only
> + * process each one once
> */
> - for (round = 0; round < skip; round++) {
> - for (agno = round; agno < rsb->sb_agcount; agno += skip) {
> - if (checked[agno])
> - continue;
> + for (agno = 1; agno < rsb->sb_agcount; agno++) {
> + off = (xfs_off_t)agno * rsb->sb_agblocks << rsb->sb_blocklog;
>
> - off = (xfs_off_t)agno * rsb->sb_agblocks << rsb->sb_blocklog;
> -
> - checked[agno] = 1;
> - retval = get_sb(sb, off, size, agno);
> - if (retval == XR_EOF)
> - goto out_free_list;
> -
> - if (retval == XR_OK) {
> - /*
> - * save away geometry info.
> - * don't bother checking the sb
> - * against the agi/agf as the odds
> - * of the sb being corrupted in a way
> - * that it is internally consistent
> - * but not consistent with the rest
> - * of the filesystem is really really low.
> - */
> - get_sb_geometry(&geo, sb);
> - list = add_geo(list, &geo, agno);
> - num_ok++;
> - }
> + retval = get_sb(sb, off, size, agno);
> + if (retval == XR_EOF)
> + goto out_free_list;
> +
> + if (retval == XR_OK) {
> + /*
> + * save away geometry info. don't bother checking the
> + * sb against the agi/agf as the odds of the sb being
> + * corrupted in a way that it is internally consistent
> + * but not consistent with the rest of the filesystem is
> + * really really low.
> + */
> + get_sb_geometry(&geo, sb);
> + list = add_geo(list, &geo, agno);
> + num_ok++;
> }
> }
>
> @@ -867,6 +845,5 @@ verify_set_primary_sb(xfs_sb_t *rsb,
> out_free_list:
> free_geo(list);
> free(sb);
> - free(checked);
> return retval;
> }
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt
2015-01-15 21:44 ` Eric Sandeen
@ 2015-01-16 12:05 ` Brian Foster
0 siblings, 0 replies; 6+ messages in thread
From: Brian Foster @ 2015-01-16 12:05 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
On Thu, Jan 15, 2015 at 03:44:43PM -0600, Eric Sandeen wrote:
> On 1/13/15 2:08 PM, Brian Foster wrote:
> > verify_set_primary_sb() scans the secondary superbocks based on the
> > geometry specified in the primary and determines the most likely correct
> > geometry by tracking how many superblocks are consistent across the set.
> > The most frequent geometry is copied into the primary superblock. The
> > return value is checked by the caller (phase1()) to determine whether a
> > brute force secondary scan is necessary.
> >
> > This generally occurs when not enough secondary sb's are consistent to
> > declare the geometry correct. If enough secondaries are consistent,
> > verify_set_primary_sb() returns the status of the last secondary sb that
> > was scanned. Corruptions to secondary supers other than the last are
> > thus resolved fine. If the last secondary is corrupt, however, an error
> > is returned to phase1(). This causes a brute force scan even if enough
> > supers were found to repair the last secondary.
> >
> > Move the initialization of retval to after the sb scan to return an
> > error only if not enough secondary supers were found to declare a
> > correct geometry.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
>
> Nice. Brute-force scan is awful, doing it when unnecessary stinks! :)
>
> could this be fstest-ed?
>
Yeah, the problem is easy to reproduce with xfs_db (just corrupt the
last sb). I think a test that runs repair through a few sets of sb
corruptions should be easy enough. I'll look into it.
Another situation I ran into while playing with these is the brute force
scan finding an older secondary (i.e., from a previous mkfs) and
eventually falling over based on its geometry rather than continuing to
scan for the a secondary of the current fs. I wasn't sure what was going
on at the time so I zeroed off the bdev and retested to confirm that was
the problem. I wonder how likely something like that might be in the
real world...
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
>
Thanks for reviewing these!
Brian
> > ---
> > repair/sb.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/repair/sb.c b/repair/sb.c
> > index ad27756..dc154f7 100644
> > --- a/repair/sb.c
> > +++ b/repair/sb.c
> > @@ -724,7 +724,6 @@ verify_set_primary_sb(xfs_sb_t *rsb,
> > * sector size rather than the sector size in @rsb.
> > */
> > size = NUM_AGH_SECTS * (1 << (XFS_MAX_SECTORSIZE_LOG));
> > - retval = 0;
> > list = NULL;
> > num_ok = 0;
> > *sb_modified = 0;
> > @@ -779,6 +778,7 @@ verify_set_primary_sb(xfs_sb_t *rsb,
> > /*
> > * see if we have enough superblocks to bother with
> > */
> > + retval = 0;
> > if (num_ok < num_sbs / 2) {
> > retval = XR_INSUFF_SEC_SB;
> > goto out_free_list;
> > @@ -868,5 +868,5 @@ out_free_list:
> > free_geo(list);
> > free(sb);
> > free(checked);
> > - return(retval);
> > + return retval;
> > }
> >
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-01-16 12:18 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-13 20:08 [PATCH 0/2] xfsprogs/repair: secondary sb scan cleanups Brian Foster
2015-01-13 20:08 ` [PATCH 1/2] repair: fix unnecessary secondary scan if only last sb is corrupt Brian Foster
2015-01-15 21:44 ` Eric Sandeen
2015-01-16 12:05 ` Brian Foster
2015-01-13 20:08 ` [PATCH 2/2] repair: remove unused strided secondary sb scan logic Brian Foster
2015-01-15 23:15 ` Eric Sandeen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox