* [PATCH 1/5] repair: handle repair of image files on large sector size filesystems
2011-10-10 1:08 [PATCH 0/5, v3] repair: sector size and blkmap fixes Dave Chinner
@ 2011-10-10 1:08 ` Dave Chinner
2011-10-10 14:06 ` Christoph Hellwig
2011-10-13 9:57 ` Alex Elder
2011-10-10 1:08 ` [PATCH 2/5] repair: fix a valgrind reported error on i686 Dave Chinner
` (4 subsequent siblings)
5 siblings, 2 replies; 21+ messages in thread
From: Dave Chinner @ 2011-10-10 1:08 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Because repair uses direct IO, it cannot do IO smaller than a sector
on the underlying device. When repairing a filesystem image, the
filesystem hosting the image may have a sector size larger than the
sector size of the image, and so single image sector reads and
writes will fail.
To avoid this, when checking a file and there is a sector size
mismatch like this, turn off direct IO. While there, fix a compile
bug in the IO_DEBUG option for libxfs which was found during triage.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
libxfs/rdwr.c | 2 +-
repair/sb.c | 9 ++++++++-
repair/xfs_repair.c | 29 +++++++++++++++++++++++++++++
3 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index fe8ae09..c3edb89 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -365,7 +365,7 @@ libxfs_getbufr(dev_t device, xfs_daddr_t blkno, int bblen)
libxfs_initbuf(bp, device, blkno, blen);
#ifdef IO_DEBUG
printf("%lx: %s: allocated %u bytes buffer, key=%llu(%llu), %p\n",
- pthread_self(), __FUNCTION__, BBTOB(len),
+ pthread_self(), __FUNCTION__, blen,
(long long)LIBXFS_BBTOOFF64(blkno), (long long)blkno, bp);
#endif
diff --git a/repair/sb.c b/repair/sb.c
index 6d2e91a..004319f 100644
--- a/repair/sb.c
+++ b/repair/sb.c
@@ -689,7 +689,14 @@ verify_set_primary_sb(xfs_sb_t *rsb,
*/
num_sbs = MIN(NUM_SBS, rsb->sb_agcount);
skip = howmany(num_sbs, rsb->sb_agcount);
- size = NUM_AGH_SECTS * rsb->sb_sectsize;
+
+ /*
+ * We haven't been able to validate the sector size yet properly
+ * (e.g. in the case of repairing an image in a file), so we need to
+ * take into account sector mismatches and so use the maximum possible
+ * sector size rather than the sector size in @rsb.
+ */
+ size = NUM_AGH_SECTS * (1 << (XFS_MAX_SECTORSIZE_LOG));
retval = 0;
list = NULL;
num_ok = 0;
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 4e2b022..69b7eab 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -569,6 +569,35 @@ main(int argc, char **argv)
memset(&xfs_m, 0, sizeof(xfs_mount_t));
libxfs_sb_from_disk(&xfs_m.m_sb, XFS_BUF_TO_SBP(sbp));
+ /*
+ * if the sector size of the filesystem we are trying to repair is
+ * smaller than that of the underlying filesystem (i.e. we are repairing
+ * an image), the we have to turn off direct IO because we cannot do IO
+ * smaller than the host filesystem's sector size.
+ */
+ if (isa_file) {
+ int fd = libxfs_device_to_fd(x.ddev);
+ struct xfs_fsop_geom_v1 geom = { 0 };
+
+ if (ioctl(fd, XFS_IOC_FSGEOMETRY_V1, &geom) < 0) {
+ do_warn(_("Cannot get host filesystem geometry.\n"
+ "Repair may fail if there is a sector size mismatch between\n"
+ "the image and the host filesystem.\n"));
+ geom.sectsize = BBSIZE;
+ }
+
+ if (xfs_m.m_sb.sb_sectsize < geom.sectsize) {
+ long old_flags;
+
+ old_flags = fcntl(fd, F_GETFL, 0);
+ if (fcntl(fd, F_SETFL, old_flags & ~O_DIRECT) < 0) {
+ do_warn(_(
+ "Sector size on host filesystem larger than image sector size.\n"
+ "Cannot turn off direct IO, so exiting.\n"));
+ exit(1);
+ }
+ }
+ }
mp = libxfs_mount(&xfs_m, &xfs_m.m_sb, x.ddev, x.logdev, x.rtdev, 0);
if (!mp) {
--
1.7.5.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH 1/5] repair: handle repair of image files on large sector size filesystems
2011-10-10 1:08 ` [PATCH 1/5] repair: handle repair of image files on large sector size filesystems Dave Chinner
@ 2011-10-10 14:06 ` Christoph Hellwig
2011-10-13 9:57 ` Alex Elder
1 sibling, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2011-10-10 14:06 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Mon, Oct 10, 2011 at 12:08:31PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Because repair uses direct IO, it cannot do IO smaller than a sector
> on the underlying device. When repairing a filesystem image, the
> filesystem hosting the image may have a sector size larger than the
> sector size of the image, and so single image sector reads and
> writes will fail.
>
> To avoid this, when checking a file and there is a sector size
> mismatch like this, turn off direct IO. While there, fix a compile
> bug in the IO_DEBUG option for libxfs which was found during triage.
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/5] repair: handle repair of image files on large sector size filesystems
2011-10-10 1:08 ` [PATCH 1/5] repair: handle repair of image files on large sector size filesystems Dave Chinner
2011-10-10 14:06 ` Christoph Hellwig
@ 2011-10-13 9:57 ` Alex Elder
1 sibling, 0 replies; 21+ messages in thread
From: Alex Elder @ 2011-10-13 9:57 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Mon, 2011-10-10 at 12:08 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Because repair uses direct IO, it cannot do IO smaller than a sector
> on the underlying device. When repairing a filesystem image, the
> filesystem hosting the image may have a sector size larger than the
> sector size of the image, and so single image sector reads and
> writes will fail.
>
> To avoid this, when checking a file and there is a sector size
> mismatch like this, turn off direct IO. While there, fix a compile
> bug in the IO_DEBUG option for libxfs which was found during triage.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Looks good.
Reviewed-by: Alex Elder <aelder@sgi.com>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 2/5] repair: fix a valgrind reported error on i686
2011-10-10 1:08 [PATCH 0/5, v3] repair: sector size and blkmap fixes Dave Chinner
2011-10-10 1:08 ` [PATCH 1/5] repair: handle repair of image files on large sector size filesystems Dave Chinner
@ 2011-10-10 1:08 ` Dave Chinner
2011-10-10 14:07 ` Christoph Hellwig
2011-10-13 9:57 ` Alex Elder
2011-10-10 1:08 ` [PATCH 3/5] repair: handle memory allocation failure from blkmap_grow Dave Chinner
` (3 subsequent siblings)
5 siblings, 2 replies; 21+ messages in thread
From: Dave Chinner @ 2011-10-10 1:08 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Fix a potential prefetch read problem due to the first loop
execution of pf_batch_read potentially not initialising the fsbno
variable:
==10177== Thread 6:
==10177== Conditional jump or move depends on uninitialised value(s)
==10177== at 0x8079CAB: pf_batch_read (prefetch.c:408)
==10177== by 0x6A2996D: clone (clone.S:130)
==10177==
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
repair/prefetch.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/repair/prefetch.c b/repair/prefetch.c
index d2fdf90..da074a8 100644
--- a/repair/prefetch.c
+++ b/repair/prefetch.c
@@ -397,7 +397,7 @@ pf_batch_read(
int len, size;
int i;
int inode_bufs;
- unsigned long fsbno;
+ unsigned long fsbno = 0;
unsigned long max_fsbno;
char *pbuf;
--
1.7.5.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 21+ messages in thread* [PATCH 3/5] repair: handle memory allocation failure from blkmap_grow
2011-10-10 1:08 [PATCH 0/5, v3] repair: sector size and blkmap fixes Dave Chinner
2011-10-10 1:08 ` [PATCH 1/5] repair: handle repair of image files on large sector size filesystems Dave Chinner
2011-10-10 1:08 ` [PATCH 2/5] repair: fix a valgrind reported error on i686 Dave Chinner
@ 2011-10-10 1:08 ` Dave Chinner
2011-10-10 14:14 ` Christoph Hellwig
2011-10-13 9:57 ` Alex Elder
2011-10-10 1:08 ` [PATCH 4/5] repair: don't cache large blkmap allocations Dave Chinner
` (2 subsequent siblings)
5 siblings, 2 replies; 21+ messages in thread
From: Dave Chinner @ 2011-10-10 1:08 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
If blkmap_grow fails to allocate a new chunk of memory, it returns
with a null blkmap. The sole caller of blkmap_grow does not check
for this failure, and so will segfault if this error ever occurs.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
repair/bmap.c | 33 ++++++++++++++++++++++-----------
repair/bmap.h | 2 +-
repair/dinode.c | 22 ++++++++++++++++++++--
3 files changed, 43 insertions(+), 14 deletions(-)
diff --git a/repair/bmap.c b/repair/bmap.c
index 0ff9315..e290515 100644
--- a/repair/bmap.c
+++ b/repair/bmap.c
@@ -207,29 +207,34 @@ blkmap_next_off(
*/
static blkmap_t *
blkmap_grow(
- blkmap_t **blkmapp)
+ blkmap_t *blkmap)
{
pthread_key_t key = dblkmap_key;
- blkmap_t *blkmap = *blkmapp;
+ blkmap_t *new_blkmap;
+ int new_naexts = blkmap->naexts + 4;
if (pthread_getspecific(key) != blkmap) {
key = ablkmap_key;
ASSERT(pthread_getspecific(key) == blkmap);
}
- blkmap->naexts += 4;
- blkmap = realloc(blkmap, BLKMAP_SIZE(blkmap->naexts));
- if (blkmap == NULL)
+ new_blkmap = realloc(blkmap, BLKMAP_SIZE(new_naexts));
+ if (!new_blkmap) {
do_error(_("realloc failed in blkmap_grow\n"));
- *blkmapp = blkmap;
- pthread_setspecific(key, blkmap);
- return blkmap;
+ return NULL;
+ }
+ new_blkmap->naexts = new_naexts;
+ pthread_setspecific(key, new_blkmap);
+ return new_blkmap;
}
/*
* Set an extent into a block map.
+ *
+ * If this function fails, it leaves the blkmapp untouched so the caller can
+ * handle the error and free the blkmap appropriately.
*/
-void
+int
blkmap_set_ext(
blkmap_t **blkmapp,
xfs_dfiloff_t o,
@@ -239,9 +244,14 @@ blkmap_set_ext(
blkmap_t *blkmap = *blkmapp;
xfs_extnum_t i;
- if (blkmap->nexts == blkmap->naexts)
- blkmap = blkmap_grow(blkmapp);
+ if (blkmap->nexts == blkmap->naexts) {
+ blkmap = blkmap_grow(blkmap);
+ if (!blkmap)
+ return ENOMEM;
+ *blkmapp = blkmap;
+ }
+ ASSERT(blkmap->nexts < blkmap->naexts);
for (i = 0; i < blkmap->nexts; i++) {
if (blkmap->exts[i].startoff > o) {
memmove(blkmap->exts + i + 1,
@@ -255,4 +265,5 @@ blkmap_set_ext(
blkmap->exts[i].startblock = b;
blkmap->exts[i].blockcount = c;
blkmap->nexts++;
+ return 0;
}
diff --git a/repair/bmap.h b/repair/bmap.h
index 58abf95..118ae1e 100644
--- a/repair/bmap.h
+++ b/repair/bmap.h
@@ -43,7 +43,7 @@ typedef struct blkmap {
blkmap_t *blkmap_alloc(xfs_extnum_t nex, int whichfork);
void blkmap_free(blkmap_t *blkmap);
-void blkmap_set_ext(blkmap_t **blkmapp, xfs_dfiloff_t o,
+int blkmap_set_ext(blkmap_t **blkmapp, xfs_dfiloff_t o,
xfs_dfsbno_t b, xfs_dfilblks_t c);
xfs_dfsbno_t blkmap_get(blkmap_t *blkmap, xfs_dfiloff_t o);
diff --git a/repair/dinode.c b/repair/dinode.c
index 5a74538..39a0cb1 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -730,9 +730,27 @@ _("inode %" PRIu64 " - extent offset too large - start %" PRIu64 ", "
goto done;
}
- if (blkmapp && *blkmapp)
- blkmap_set_ext(blkmapp, irec.br_startoff,
+ if (blkmapp && *blkmapp) {
+ error = blkmap_set_ext(blkmapp, irec.br_startoff,
irec.br_startblock, irec.br_blockcount);
+ if (error) {
+ /*
+ * we don't want to clear the inode due to an
+ * internal bmap tracking error, but if we've
+ * run out of memory then we simply can't
+ * validate that the filesystem is consistent.
+ * Hence just abort at this point with an ENOMEM
+ * error.
+ */
+ do_abort(
+_("Fatal error: inode %" PRIu64 " - blkmap_set_ext(): %s\n"
+ "\t%s fork, off - %" PRIu64 ", start - %" PRIu64 ", cnt %" PRIu64 "\n"),
+ ino, strerror(error), forkname,
+ irec.br_startoff, irec.br_startblock,
+ irec.br_blockcount);
+ }
+ }
+
/*
* Profiling shows that the following loop takes the
* most time in all of xfs_repair.
--
1.7.5.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH 3/5] repair: handle memory allocation failure from blkmap_grow
2011-10-10 1:08 ` [PATCH 3/5] repair: handle memory allocation failure from blkmap_grow Dave Chinner
@ 2011-10-10 14:14 ` Christoph Hellwig
2011-10-13 9:57 ` Alex Elder
1 sibling, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2011-10-10 14:14 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Mon, Oct 10, 2011 at 12:08:33PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> If blkmap_grow fails to allocate a new chunk of memory, it returns
> with a null blkmap. The sole caller of blkmap_grow does not check
> for this failure, and so will segfault if this error ever occurs.
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
Two comments on the code not directly related to your change:
- it seems like xfs_db has another copy of these blkmap routines,
which even missed the last round of updates during the repair
scalability work. It seems like it should be switched to reuse
the repair code
- growing the map by four seems to be incredibly inefficient for
large files. Given that the only caller actually knows how many
entries it processes in that batch we should grow it at least by
the number, reducing the allocations to one per call to
process_bmbt_reclist_int.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/5] repair: handle memory allocation failure from blkmap_grow
2011-10-10 1:08 ` [PATCH 3/5] repair: handle memory allocation failure from blkmap_grow Dave Chinner
2011-10-10 14:14 ` Christoph Hellwig
@ 2011-10-13 9:57 ` Alex Elder
1 sibling, 0 replies; 21+ messages in thread
From: Alex Elder @ 2011-10-13 9:57 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Mon, 2011-10-10 at 12:08 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> If blkmap_grow fails to allocate a new chunk of memory, it returns
> with a null blkmap. The sole caller of blkmap_grow does not check
> for this failure, and so will segfault if this error ever occurs.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Looks good.
Reviewed-by: Alex Elder <aelder@sgi.com>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 4/5] repair: don't cache large blkmap allocations
2011-10-10 1:08 [PATCH 0/5, v3] repair: sector size and blkmap fixes Dave Chinner
` (2 preceding siblings ...)
2011-10-10 1:08 ` [PATCH 3/5] repair: handle memory allocation failure from blkmap_grow Dave Chinner
@ 2011-10-10 1:08 ` Dave Chinner
2011-10-10 14:15 ` Christoph Hellwig
2011-10-13 9:57 ` Alex Elder
2011-10-10 1:08 ` [PATCH 5/5] repair: prevent blkmap extent count overflows Dave Chinner
2011-10-10 14:18 ` [PATCH 0/5, v3] repair: sector size and blkmap fixes Christoph Hellwig
5 siblings, 2 replies; 21+ messages in thread
From: Dave Chinner @ 2011-10-10 1:08 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
We currently use thread local storage for storing blkmap allocations
from one inode to another as a way of reducing the number of short
term allocations we do. However, the stored allocations can only
ever grow, so once we've done a large allocation we never free than
memory even if we never need that much memory again. This can occur
if we have corrupted extent counts in inodes, and can greatly
increase the memory footprint of the repair process.
Hence if the cached blkmap array id greater than a reasonable number
of extents (say 100,000), then don't store the blkmap in TLS and
instead free it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
repair/bmap.c | 20 +++++++++++++++++++-
repair/dinode.c | 6 ++----
2 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/repair/bmap.c b/repair/bmap.c
index e290515..5fb27bc 100644
--- a/repair/bmap.c
+++ b/repair/bmap.c
@@ -66,12 +66,30 @@ blkmap_alloc(
/*
* Free a block map.
+ *
+ * If the map is a large, uncommon size (say for hundreds of thousands of
+ * extents) then free it to release the memory. This prevents us from pinning
+ * large tracts of memory due to corrupted fork values or one-off fragmented
+ * files. Otherwise we have nothing to do but keep the memory around for the
+ * next inode
*/
void
blkmap_free(
blkmap_t *blkmap)
{
- /* nothing to do! - keep the memory around for the next inode */
+ if (!blkmap)
+ return;
+
+ /* consider more than 100k extents rare */
+ if (blkmap->naexts < 100 * 1024)
+ return;
+
+ if (blkmap == pthread_getspecific(dblkmap_key))
+ pthread_setspecific(dblkmap_key, NULL);
+ else
+ pthread_setspecific(ablkmap_key, NULL);
+
+ free(blkmap);
}
/*
diff --git a/repair/dinode.c b/repair/dinode.c
index 39a0cb1..fb5e53a 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -2807,8 +2807,7 @@ _("bad non-zero extent size %u for non-realtime/extsize inode %" PRIu64 ", "),
break;
}
- if (dblkmap)
- blkmap_free(dblkmap);
+ blkmap_free(dblkmap);
/*
* check nlinks feature, if it's a version 1 inode,
@@ -2827,8 +2826,7 @@ clear_bad_out:
bad_out:
*used = is_free;
*isa_dir = 0;
- if (dblkmap)
- blkmap_free(dblkmap);
+ blkmap_free(dblkmap);
return 1;
}
--
1.7.5.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 21+ messages in thread* [PATCH 5/5] repair: prevent blkmap extent count overflows
2011-10-10 1:08 [PATCH 0/5, v3] repair: sector size and blkmap fixes Dave Chinner
` (3 preceding siblings ...)
2011-10-10 1:08 ` [PATCH 4/5] repair: don't cache large blkmap allocations Dave Chinner
@ 2011-10-10 1:08 ` Dave Chinner
2011-10-10 14:17 ` Christoph Hellwig
2011-10-13 9:58 ` Alex Elder
2011-10-10 14:18 ` [PATCH 0/5, v3] repair: sector size and blkmap fixes Christoph Hellwig
5 siblings, 2 replies; 21+ messages in thread
From: Dave Chinner @ 2011-10-10 1:08 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Fix a bunch of invalid read/write errors due to excessive blkmap
allocations when inode forks are corrupted. These show up some time
after making a blkmap allocation for 536870913 extents on i686,
which is followed some time later by a crash caused bymemory
corruption.
This blkmap allocation size overflows 32 bits in such a
way that it results in a 32 byte allocation and so access to the
second extent results in access beyond the allocated memory and
corrupts random memory.
==5419== Invalid write of size 4
==5419== at 0x80507DA: blkmap_set_ext (bmap.c:260)
==5419== by 0x8055CF4: process_bmbt_reclist_int (dinode.c:712)
==5419== by 0x8056206: process_bmbt_reclist (dinode.c:813)
==5419== by 0x80579DA: process_exinode (dinode.c:1324)
==5419== by 0x8059B77: process_dinode_int (dinode.c:2036)
==5419== by 0x805ABE6: process_dinode (dinode.c:2823)
==5419== by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
==5419== by 0x8054012: process_aginodes (dino_chunks.c:1024)
==5419== by 0xFFF: ???
==5419== Address 0x944cfb8 is 0 bytes after a block of size 32 alloc'd
==5419== at 0x48E1102: realloc (in
/usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==5419== by 0x80501F3: blkmap_alloc (bmap.c:56)
==5419== by 0x80599F5: process_dinode_int (dinode.c:2027)
==5419== by 0x805ABE6: process_dinode (dinode.c:2823)
==5419== by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
==5419== by 0x8054012: process_aginodes (dino_chunks.c:1024)
==5419== by 0xFFF: ???
Add overflow detection code into the blkmap allocation code to avoid
this problem.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
repair/bmap.c | 28 ++++++++++++++++++++++++++++
repair/bmap.h | 13 +++++++++++++
2 files changed, 41 insertions(+), 0 deletions(-)
diff --git a/repair/bmap.c b/repair/bmap.c
index 5fb27bc..2f1c307 100644
--- a/repair/bmap.c
+++ b/repair/bmap.c
@@ -47,6 +47,17 @@ blkmap_alloc(
if (nex < 1)
nex = 1;
+ if (nex > BLKMAP_NEXTS_MAX) {
+#if (BITS_PER_LONG == 32)
+ do_warn(
+ _("Number of extents requested in blkmap_alloc (%d) overflows 32 bits.\n"
+ "If this is not a corruption, then you will need a 64 bit system\n"
+ "to repair this filesystem.\n"),
+ nex);
+#endif
+ return NULL;
+ }
+
key = whichfork ? ablkmap_key : dblkmap_key;
blkmap = pthread_getspecific(key);
if (!blkmap || blkmap->naexts < nex) {
@@ -236,6 +247,23 @@ blkmap_grow(
ASSERT(pthread_getspecific(key) == blkmap);
}
+ if (new_naexts > BLKMAP_NEXTS_MAX) {
+#if (BITS_PER_LONG == 32)
+ do_error(
+ _("Number of extents requested in blkmap_grow (%d) overflows 32 bits.\n"
+ "You need a 64 bit system to repair this filesystem.\n"),
+ new_naexts);
+#endif
+ return NULL;
+ }
+ if (new_naexts <= 0) {
+ do_error(
+ _("Number of extents requested in blkmap_grow (%d) overflowed the\n"
+ "maximum number of supported extents (%d).\n"),
+ new_naexts, BLKMAP_NEXTS_MAX);
+ return NULL;
+ }
+
new_blkmap = realloc(blkmap, BLKMAP_SIZE(new_naexts));
if (!new_blkmap) {
do_error(_("realloc failed in blkmap_grow\n"));
diff --git a/repair/bmap.h b/repair/bmap.h
index 118ae1e..19720b1 100644
--- a/repair/bmap.h
+++ b/repair/bmap.h
@@ -40,6 +40,19 @@ typedef struct blkmap {
#define BLKMAP_SIZE(n) \
(offsetof(blkmap_t, exts) + (sizeof(bmap_ext_t) * (n)))
+/*
+ * For 32 bit platforms, we are limited to extent arrays of 2^31 bytes, which
+ * limits the number of extents in an inode we can check. If we don't limit the
+ * valid range, we can overflow the BLKMAP_SIZE() calculation and allocate less
+ * memory than we think we needed, and hence walk off the end of the array and
+ * corrupt memory.
+ */
+#if BITS_PER_LONG == 32
+#define BLKMAP_NEXTS_MAX ((INT_MAX / sizeof(bmap_ext_t)) - 1)
+#else
+#define BLKMAP_NEXTS_MAX INT_MAX
+#endif
+
blkmap_t *blkmap_alloc(xfs_extnum_t nex, int whichfork);
void blkmap_free(blkmap_t *blkmap);
--
1.7.5.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH 5/5] repair: prevent blkmap extent count overflows
2011-10-10 1:08 ` [PATCH 5/5] repair: prevent blkmap extent count overflows Dave Chinner
@ 2011-10-10 14:17 ` Christoph Hellwig
2011-10-13 9:58 ` Alex Elder
1 sibling, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2011-10-10 14:17 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
> Add overflow detection code into the blkmap allocation code to avoid
> this problem.
The code looks a bit odd with the conditional defintion of
BLKMAP_NEXTS_MAX, and the ifdefed printfs after that, but I can't
think of a better way to handle it either.
Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 5/5] repair: prevent blkmap extent count overflows
2011-10-10 1:08 ` [PATCH 5/5] repair: prevent blkmap extent count overflows Dave Chinner
2011-10-10 14:17 ` Christoph Hellwig
@ 2011-10-13 9:58 ` Alex Elder
1 sibling, 0 replies; 21+ messages in thread
From: Alex Elder @ 2011-10-13 9:58 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Mon, 2011-10-10 at 12:08 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Fix a bunch of invalid read/write errors due to excessive blkmap
> allocations when inode forks are corrupted. These show up some time
> after making a blkmap allocation for 536870913 extents on i686,
> which is followed some time later by a crash caused bymemory
> corruption.
>
> This blkmap allocation size overflows 32 bits in such a
> way that it results in a 32 byte allocation and so access to the
> second extent results in access beyond the allocated memory and
> corrupts random memory.
>
> ==5419== Invalid write of size 4
> ==5419== at 0x80507DA: blkmap_set_ext (bmap.c:260)
> ==5419== by 0x8055CF4: process_bmbt_reclist_int (dinode.c:712)
> ==5419== by 0x8056206: process_bmbt_reclist (dinode.c:813)
> ==5419== by 0x80579DA: process_exinode (dinode.c:1324)
> ==5419== by 0x8059B77: process_dinode_int (dinode.c:2036)
> ==5419== by 0x805ABE6: process_dinode (dinode.c:2823)
> ==5419== by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
> ==5419== by 0x8054012: process_aginodes (dino_chunks.c:1024)
> ==5419== by 0xFFF: ???
> ==5419== Address 0x944cfb8 is 0 bytes after a block of size 32 alloc'd
> ==5419== at 0x48E1102: realloc (in
> /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
> ==5419== by 0x80501F3: blkmap_alloc (bmap.c:56)
> ==5419== by 0x80599F5: process_dinode_int (dinode.c:2027)
> ==5419== by 0x805ABE6: process_dinode (dinode.c:2823)
> ==5419== by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
> ==5419== by 0x8054012: process_aginodes (dino_chunks.c:1024)
> ==5419== by 0xFFF: ???
>
> Add overflow detection code into the blkmap allocation code to avoid
> this problem.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
I still prefer checking for overflow *before* adding
but it's just not that important.
This looks good.
Reviewed-by: Alex Elder <aelder@sgi.com>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/5, v3] repair: sector size and blkmap fixes
2011-10-10 1:08 [PATCH 0/5, v3] repair: sector size and blkmap fixes Dave Chinner
` (4 preceding siblings ...)
2011-10-10 1:08 ` [PATCH 5/5] repair: prevent blkmap extent count overflows Dave Chinner
@ 2011-10-10 14:18 ` Christoph Hellwig
2011-10-10 23:12 ` Dave Chinner
5 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2011-10-10 14:18 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Any chance we can get some coverage for the losts of extents cases
in xfstests? It will probably require pre-created images, but I'd
really hate not beeing able to regression test this.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 0/5, v3] repair: sector size and blkmap fixes
2011-10-10 14:18 ` [PATCH 0/5, v3] repair: sector size and blkmap fixes Christoph Hellwig
@ 2011-10-10 23:12 ` Dave Chinner
0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2011-10-10 23:12 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Mon, Oct 10, 2011 at 10:18:25AM -0400, Christoph Hellwig wrote:
> Any chance we can get some coverage for the losts of extents cases
> in xfstests? It will probably require pre-created images, but I'd
> really hate not beeing able to regression test this.
Test 042 creates files with a lot of extents, but we don't have
coverage of extent countsin the hundreds of thousands or greater.
I can put together a simple test to do it using preallocation and/or
hole punching, though I'm not sure what the runtime will be.
Given we assume we have at least 10GB on the scratch device, I can
easily create a couple of million extent files on a 4k block size
filesystem for repair/db to walk....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 3/5] repair: handle memory allocation failure from blkmap_grow
2011-10-09 23:11 [PATCH 0/5, v2] " Dave Chinner
@ 2011-10-09 23:11 ` Dave Chinner
2011-10-09 23:51 ` Christoph Hellwig
0 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2011-10-09 23:11 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
If blkmap_grow fails to allocate a new chunk of memory, it returns
with a null blkmap. The sole caller of blkmap_grow does not check
for this failure, and so will segfault if this error ever occurs.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
repair/bmap.c | 33 ++++++++++++++++++++++-----------
repair/bmap.h | 2 +-
repair/dinode.c | 22 ++++++++++++++++++++--
3 files changed, 43 insertions(+), 14 deletions(-)
diff --git a/repair/bmap.c b/repair/bmap.c
index 79b9f79..3ee5eff 100644
--- a/repair/bmap.c
+++ b/repair/bmap.c
@@ -207,29 +207,34 @@ blkmap_next_off(
*/
static blkmap_t *
blkmap_grow(
- blkmap_t **blkmapp)
+ blkmap_t *blkmap)
{
pthread_key_t key = dblkmap_key;
- blkmap_t *blkmap = *blkmapp;
+ blkmap_t *new_blkmap;
+ int new_naexts = blkmap->naexts + 4;
if (pthread_getspecific(key) != blkmap) {
key = ablkmap_key;
ASSERT(pthread_getspecific(key) == blkmap);
}
- blkmap->naexts += 4;
- blkmap = realloc(blkmap, BLKMAP_SIZE(blkmap->naexts));
- if (blkmap == NULL)
+ new_blkmap = realloc(blkmap, BLKMAP_SIZE(new_naexts));
+ if (!new_blkmap) {
do_error(_("realloc failed in blkmap_grow\n"));
- *blkmapp = blkmap;
- pthread_setspecific(key, blkmap);
- return blkmap;
+ return NULL;
+ }
+ blkmap->naexts = new_naexts;
+ pthread_setspecific(key, new_blkmap);
+ return new_blkmap;
}
/*
* Set an extent into a block map.
+ *
+ * If this function fails, it leaves the blkmapp untouched so the caller can
+ * handle the error and free the blkmap appropriately.
*/
-void
+int
blkmap_set_ext(
blkmap_t **blkmapp,
xfs_dfiloff_t o,
@@ -239,9 +244,14 @@ blkmap_set_ext(
blkmap_t *blkmap = *blkmapp;
xfs_extnum_t i;
- if (blkmap->nexts == blkmap->naexts)
- blkmap = blkmap_grow(blkmapp);
+ if (blkmap->nexts == blkmap->naexts) {
+ blkmap = blkmap_grow(blkmap);
+ if (!blkmap)
+ return ENOMEM;
+ *blkmapp = blkmap;
+ }
+ ASSERT(blkmap->nexts < blkmap->naexts);
for (i = 0; i < blkmap->nexts; i++) {
if (blkmap->exts[i].startoff > o) {
memmove(blkmap->exts + i + 1,
@@ -255,4 +265,5 @@ blkmap_set_ext(
blkmap->exts[i].startblock = b;
blkmap->exts[i].blockcount = c;
blkmap->nexts++;
+ return 0;
}
diff --git a/repair/bmap.h b/repair/bmap.h
index 58abf95..118ae1e 100644
--- a/repair/bmap.h
+++ b/repair/bmap.h
@@ -43,7 +43,7 @@ typedef struct blkmap {
blkmap_t *blkmap_alloc(xfs_extnum_t nex, int whichfork);
void blkmap_free(blkmap_t *blkmap);
-void blkmap_set_ext(blkmap_t **blkmapp, xfs_dfiloff_t o,
+int blkmap_set_ext(blkmap_t **blkmapp, xfs_dfiloff_t o,
xfs_dfsbno_t b, xfs_dfilblks_t c);
xfs_dfsbno_t blkmap_get(blkmap_t *blkmap, xfs_dfiloff_t o);
diff --git a/repair/dinode.c b/repair/dinode.c
index b208c51..0cedc28 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -708,9 +708,27 @@ process_bmbt_reclist_int(
goto done;
}
- if (blkmapp && *blkmapp)
- blkmap_set_ext(blkmapp, irec.br_startoff,
+ if (blkmapp && *blkmapp) {
+ error = blkmap_set_ext(blkmapp, irec.br_startoff,
irec.br_startblock, irec.br_blockcount);
+ if (error) {
+ /*
+ * we don't want to clear the inode due to an
+ * internal bmap tracking error, but if we've
+ * run out of memory then we simply can't
+ * validate that the filesystem is consistent.
+ * Hence just abort at this point with an ENOMEM
+ * error.
+ */
+ do_abort(
+ _("Fatal error: inode %llu - blkmap_set_ext(): %s\n"
+ "\t%s fork, off - %llu, start - %llu, cnt %llu\n"),
+ ino, strerror(error), forkname,
+ irec.br_startoff, irec.br_startblock,
+ irec.br_blockcount);
+ }
+ }
+
/*
* Profiling shows that the following loop takes the
* most time in all of xfs_repair.
--
1.7.5.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH 3/5] repair: handle memory allocation failure from blkmap_grow
2011-10-09 23:11 ` [PATCH 3/5] repair: handle memory allocation failure from blkmap_grow Dave Chinner
@ 2011-10-09 23:51 ` Christoph Hellwig
2011-10-10 0:10 ` Dave Chinner
0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2011-10-09 23:51 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
> static blkmap_t *
> blkmap_grow(
> - blkmap_t **blkmapp)
> + blkmap_t *blkmap)
> {
> pthread_key_t key = dblkmap_key;
> - blkmap_t *blkmap = *blkmapp;
> + blkmap_t *new_blkmap;
> + int new_naexts = blkmap->naexts + 4;
>
> if (pthread_getspecific(key) != blkmap) {
> key = ablkmap_key;
> ASSERT(pthread_getspecific(key) == blkmap);
> }
>
> - blkmap->naexts += 4;
> - blkmap = realloc(blkmap, BLKMAP_SIZE(blkmap->naexts));
> - if (blkmap == NULL)
> - do_error(_("realloc failed in blkmap_grow\n"));
> - *blkmapp = blkmap;
> - pthread_setspecific(key, blkmap);
> - return blkmap;
> + new_blkmap = realloc(blkmap, BLKMAP_SIZE(new_naexts));
> + if (!new_blkmap) {
> + do_error(_("realloc failed in blkmap_grow\n"));
> + return NULL;
> + }
> + blkmap->naexts = new_naexts;
Why would we modify naexts in the old blkmap?
Otherwise looks fine.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 3/5] repair: handle memory allocation failure from blkmap_grow
2011-10-09 23:51 ` Christoph Hellwig
@ 2011-10-10 0:10 ` Dave Chinner
0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2011-10-10 0:10 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Sun, Oct 09, 2011 at 07:51:26PM -0400, Christoph Hellwig wrote:
> > static blkmap_t *
> > blkmap_grow(
> > - blkmap_t **blkmapp)
> > + blkmap_t *blkmap)
> > {
> > pthread_key_t key = dblkmap_key;
> > - blkmap_t *blkmap = *blkmapp;
> > + blkmap_t *new_blkmap;
> > + int new_naexts = blkmap->naexts + 4;
> >
> > if (pthread_getspecific(key) != blkmap) {
> > key = ablkmap_key;
> > ASSERT(pthread_getspecific(key) == blkmap);
> > }
> >
> > - blkmap->naexts += 4;
> > - blkmap = realloc(blkmap, BLKMAP_SIZE(blkmap->naexts));
> > - if (blkmap == NULL)
> > - do_error(_("realloc failed in blkmap_grow\n"));
> > - *blkmapp = blkmap;
> > - pthread_setspecific(key, blkmap);
> > - return blkmap;
> > + new_blkmap = realloc(blkmap, BLKMAP_SIZE(new_naexts));
> > + if (!new_blkmap) {
> > + do_error(_("realloc failed in blkmap_grow\n"));
> > + return NULL;
> > + }
> > + blkmap->naexts = new_naexts;
>
> Why would we modify naexts in the old blkmap?
Ooops. Will fix.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 21+ messages in thread