public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] repair: couple of bug fixes
@ 2011-10-06  1:01 Dave Chinner
  2011-10-06  1:01 ` [PATCH 1/2] repair: handle repair of image files on large sector size filesystems Dave Chinner
  2011-10-06  1:01 ` [PATCH 2/2] repair: fix some valgrind reported errors on i686 Dave Chinner
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Chinner @ 2011-10-06  1:01 UTC (permalink / raw)
  To: xfs

Hi Alex,

These are a couple of bug fixes to repair that I posted a while ago
that we should probably include in the 3.1.6 release. I don't recall
if they were fully reviewed back when I posted them, so I'm just
sending them again.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] repair: handle repair of image files on large sector size filesystems
  2011-10-06  1:01 [PATCH 0/2] repair: couple of bug fixes Dave Chinner
@ 2011-10-06  1:01 ` Dave Chinner
  2011-10-06 12:17   ` Alex Elder
  2011-10-06  1:01 ` [PATCH 2/2] repair: fix some valgrind reported errors on i686 Dave Chinner
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2011-10-06  1:01 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Because repair uses direct IO, it cannot do IO smaller than a sector
on the underlying device. When repairing a filesystem image, the
filesystem hosting the image may have a sector size larger than the
sector size of the image, and so single image sector reads and
writes will fail.

To avoid this, when checking a file and there is a sector size
mismatch like this, turn off direct IO. While there, fix a compile
bug in the IO_DEBUG option for libxfs which was found during triage.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 libxfs/rdwr.c       |    2 +-
 repair/sb.c         |    9 ++++++++-
 repair/xfs_repair.c |   30 ++++++++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index ec2675e..a656851 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -364,7 +364,7 @@ libxfs_getbufr(dev_t device, xfs_daddr_t blkno, int bblen)
 		libxfs_initbuf(bp, device, blkno, blen);
 #ifdef IO_DEBUG
 	printf("%lx: %s: allocated %u bytes buffer, key=%llu(%llu), %p\n",
-		pthread_self(), __FUNCTION__, BBTOB(len),
+		pthread_self(), __FUNCTION__, blen,
 		(long long)LIBXFS_BBTOOFF64(blkno), (long long)blkno, bp);
 #endif
 
diff --git a/repair/sb.c b/repair/sb.c
index 0ee2345..c8df076 100644
--- a/repair/sb.c
+++ b/repair/sb.c
@@ -689,7 +689,14 @@ verify_set_primary_sb(xfs_sb_t		*rsb,
 	 */
 	num_sbs = MIN(NUM_SBS, rsb->sb_agcount);
 	skip = howmany(num_sbs, rsb->sb_agcount);
-	size = NUM_AGH_SECTS * rsb->sb_sectsize;
+
+	/*
+	 * We haven't been able to validate the sector size yet properly
+	 * (e.g. in the case of repairing an image in a file), so we need to
+	 * take into account sector mismatches and so use the maximum possible
+	 * sector size rather than that in @rsb.
+	 */
+	size = NUM_AGH_SECTS * (1 << (XFS_MAX_SECTORSIZE_LOG));
 	retval = 0;
 	list = NULL;
 	num_ok = 0;
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 4707b83..dda9daa 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -569,6 +569,36 @@ main(int argc, char **argv)
 	memset(&xfs_m, 0, sizeof(xfs_mount_t));
 	libxfs_sb_from_disk(&xfs_m.m_sb, XFS_BUF_TO_SBP(sbp));
 
+	/*
+	 * if the sector size of the filesystem we are trying to repair is
+	 * smaller than that of the underlying filesystem (i.e. we are repairing
+	 * an image), the we have to turn off direct IO because we cannot do IO
+	 * smaller than the host filesystem's sector size.
+	 */
+	if (isa_file) {
+		int     fd = libxfs_device_to_fd(x.ddev);
+		struct xfs_fsop_geom_v1 geom = {
+			.sectsize = BBSIZE,
+		};
+
+		if (ioctl(fd, XFS_IOC_FSGEOMETRY_V1, &geom) < 0) {
+			do_warn(_("Cannot get host fiesystem geometry.\n"
+		"Repair may fail if there is a sector size mismatch between "
+		"the image and the host filesystem.\n"));
+		}
+
+		if (xfs_m.m_sb.sb_sectsize < geom.sectsize) {
+			long	old_flags;
+
+			old_flags = fcntl(fd, F_GETFL, 0);
+			if (fcntl(fd, F_SETFL, old_flags & ~O_DIRECT) < 0) {
+				do_warn(_(
+		"Sector size on host filesystem larger than image sector size.\n"
+		"Cannot turn off direct IO, so exiting.\n"));
+				exit(1);
+			}
+		}
+	}
 	mp = libxfs_mount(&xfs_m, &xfs_m.m_sb, x.ddev, x.logdev, x.rtdev, 0);
 
 	if (!mp)  {
-- 
1.7.5.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] repair: fix some valgrind reported errors on i686
  2011-10-06  1:01 [PATCH 0/2] repair: couple of bug fixes Dave Chinner
  2011-10-06  1:01 ` [PATCH 1/2] repair: handle repair of image files on large sector size filesystems Dave Chinner
@ 2011-10-06  1:01 ` Dave Chinner
  2011-10-06 12:17   ` Alex Elder
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2011-10-06  1:01 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Fix a potential prefetch read problem due to the first loop
execution of pf_batch_read potentially not initialising the fsbno
varaible:

==10177== Thread 6:
==10177== Conditional jump or move depends on uninitialised value(s)
==10177==    at 0x8079CAB: pf_batch_read (prefetch.c:408)
==10177==    by 0x6A2996D: clone (clone.S:130)
==10177==

Fix a bunch of invalid read/write errors due to excessive blkmap
allocations when inode forks are corrupted. These show up some time
after making a blkmap allocation for 536870913 extents on i686,
which is followed some time later by a crash caused bymemory
corruption.

This blkmap allocation size overflows 32 bits in such a
way that it results in a 32 byte allocation and so access to the
second extent results in access beyond the allocated memory and
corrupts random memory.

==5419== Invalid write of size 4
==5419==    at 0x80507DA: blkmap_set_ext (bmap.c:260)
==5419==    by 0x8055CF4: process_bmbt_reclist_int (dinode.c:712)
==5419==    by 0x8056206: process_bmbt_reclist (dinode.c:813)
==5419==    by 0x80579DA: process_exinode (dinode.c:1324)
==5419==    by 0x8059B77: process_dinode_int (dinode.c:2036)
==5419==    by 0x805ABE6: process_dinode (dinode.c:2823)
==5419==    by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
==5419==    by 0x8054012: process_aginodes (dino_chunks.c:1024)
==5419==    by 0xFFF: ???
==5419==  Address 0x944cfb8 is 0 bytes after a block of size 32 alloc'd
==5419==    at 0x48E1102: realloc (in
/usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==5419==    by 0x80501F3: blkmap_alloc (bmap.c:56)
==5419==    by 0x80599F5: process_dinode_int (dinode.c:2027)
==5419==    by 0x805ABE6: process_dinode (dinode.c:2823)
==5419==    by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
==5419==    by 0x8054012: process_aginodes (dino_chunks.c:1024)
==5419==    by 0xFFF: ???

Add overflow detection code into the blkmap allocation code to avoid
this problem, and also free large allocations once they are finished
with to avoid pinning large amounts of memory due to the occasional
large extent list in a filesystem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 repair/bmap.c     |   37 ++++++++++++++++++++++++++++++++++++-
 repair/prefetch.c |    2 +-
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/repair/bmap.c b/repair/bmap.c
index 79b9f79..1127a87 100644
--- a/repair/bmap.c
+++ b/repair/bmap.c
@@ -47,6 +47,17 @@ blkmap_alloc(
 	if (nex < 1)
 		nex = 1;
 
+#if (BITS_PER_LONG != 64)
+	if (nex > (INT_MAX / sizeof(bmap_ext_t) - 1)) {
+		do_warn(
+	_("Number of extents requested in blkmap_alloc (%u) overflows 32 bits.\n"
+	  "If this is not a corruption, then will need a 64 bit system\n"
+	  "to repair this filesystem.\n"),
+			nex);
+		return NULL;
+	}
+#endif
+
 	key = whichfork ? ablkmap_key : dblkmap_key;
 	blkmap = pthread_getspecific(key);
 	if (!blkmap || blkmap->naexts < nex) {
@@ -66,12 +77,27 @@ blkmap_alloc(
 
 /*
  * Free a block map.
+ *
+ * If the map is a large, uncommon size (say for hundreds of thousands of
+ * extents) then free it to release the memory. This prevents us from pinning
+ * large tracts of memory due to corrupted fork values or one-off fragmented
+ * files. Otherwise we have nothing to do but keep the memory around for the
+ * next inode
  */
 void
 blkmap_free(
 	blkmap_t	*blkmap)
 {
-	/* nothing to do! - keep the memory around for the next inode */
+	/* consider more than 100k extents rare */
+	if (blkmap->naexts < 100 * 1024)
+		return;
+
+	if (blkmap == pthread_getspecific(dblkmap_key))
+		pthread_setspecific(dblkmap_key, NULL);
+	else
+		pthread_setspecific(ablkmap_key, NULL);
+
+	free(blkmap);
 }
 
 /*
@@ -218,6 +244,15 @@ blkmap_grow(
 	}
 
 	blkmap->naexts += 4;
+#if (BITS_PER_LONG != 64)
+	if (blkmap->naexts > (INT_MAX / sizeof(bmap_ext_t) - 1)) {
+		do_error(
+	_("Number of extents requested in blkmap_grow (%u) overflows 32 bits.\n"
+	  "You need a 64 bit system to repair this filesystem.\n"),
+			blkmap->naexts);
+		return NULL;
+	}
+#endif
 	blkmap = realloc(blkmap, BLKMAP_SIZE(blkmap->naexts));
 	if (blkmap == NULL)
 		do_error(_("realloc failed in blkmap_grow\n"));
diff --git a/repair/prefetch.c b/repair/prefetch.c
index d2fdf90..da074a8 100644
--- a/repair/prefetch.c
+++ b/repair/prefetch.c
@@ -397,7 +397,7 @@ pf_batch_read(
 	int			len, size;
 	int			i;
 	int			inode_bufs;
-	unsigned long		fsbno;
+	unsigned long		fsbno = 0;
 	unsigned long		max_fsbno;
 	char			*pbuf;
 
-- 
1.7.5.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] repair: handle repair of image files on large sector size filesystems
  2011-10-06  1:01 ` [PATCH 1/2] repair: handle repair of image files on large sector size filesystems Dave Chinner
@ 2011-10-06 12:17   ` Alex Elder
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Elder @ 2011-10-06 12:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-10-06 at 12:01 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Because repair uses direct IO, it cannot do IO smaller than a sector
> on the underlying device. When repairing a filesystem image, the
> filesystem hosting the image may have a sector size larger than the
> sector size of the image, and so single image sector reads and
> writes will fail.
> 
> To avoid this, when checking a file and there is a sector size
> mismatch like this, turn off direct IO. While there, fix a compile
> bug in the IO_DEBUG option for libxfs which was found during triage.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

I had a few suggestions for you to consider when I
reviewed this before.  The second patch in this series
needs an update so you have a chance to address those
suggestions if you so choose.

    http://patchwork.xfs.org/patch/2336/

Either way, I'll mark this as I did before.

Reviewed-by: Alex Elder <aelder@sgi.com>


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] repair: fix some valgrind reported errors on i686
  2011-10-06  1:01 ` [PATCH 2/2] repair: fix some valgrind reported errors on i686 Dave Chinner
@ 2011-10-06 12:17   ` Alex Elder
  2011-10-06 22:15     ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Elder @ 2011-10-06 12:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-10-06 at 12:01 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Fix a potential prefetch read problem due to the first loop
> execution of pf_batch_read potentially not initialising the fsbno
> varaible:
> 
> ==10177== Thread 6:
> ==10177== Conditional jump or move depends on uninitialised value(s)
> ==10177==    at 0x8079CAB: pf_batch_read (prefetch.c:408)
> ==10177==    by 0x6A2996D: clone (clone.S:130)
> ==10177==
> 
> Fix a bunch of invalid read/write errors due to excessive blkmap
> allocations when inode forks are corrupted. These show up some time
> after making a blkmap allocation for 536870913 extents on i686,
> which is followed some time later by a crash caused bymemory
> corruption.
> 
> This blkmap allocation size overflows 32 bits in such a
> way that it results in a 32 byte allocation and so access to the
> second extent results in access beyond the allocated memory and
> corrupts random memory.
> 
> ==5419== Invalid write of size 4
> ==5419==    at 0x80507DA: blkmap_set_ext (bmap.c:260)
> ==5419==    by 0x8055CF4: process_bmbt_reclist_int (dinode.c:712)
> ==5419==    by 0x8056206: process_bmbt_reclist (dinode.c:813)
> ==5419==    by 0x80579DA: process_exinode (dinode.c:1324)
> ==5419==    by 0x8059B77: process_dinode_int (dinode.c:2036)
> ==5419==    by 0x805ABE6: process_dinode (dinode.c:2823)
> ==5419==    by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
> ==5419==    by 0x8054012: process_aginodes (dino_chunks.c:1024)
> ==5419==    by 0xFFF: ???
> ==5419==  Address 0x944cfb8 is 0 bytes after a block of size 32 alloc'd
> ==5419==    at 0x48E1102: realloc (in
> /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
> ==5419==    by 0x80501F3: blkmap_alloc (bmap.c:56)
> ==5419==    by 0x80599F5: process_dinode_int (dinode.c:2027)
> ==5419==    by 0x805ABE6: process_dinode (dinode.c:2823)
> ==5419==    by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
> ==5419==    by 0x8054012: process_aginodes (dino_chunks.c:1024)
> ==5419==    by 0xFFF: ???
> 
> Add overflow detection code into the blkmap allocation code to avoid
> this problem, and also free large allocations once they are finished
> with to avoid pinning large amounts of memory due to the occasional
> large extent list in a filesystem.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

This is good but I have a few comments below, a couple of
which really indicate you need to update this.

					-Alex
 
> ---
>  repair/bmap.c     |   37 ++++++++++++++++++++++++++++++++++++-
>  repair/prefetch.c |    2 +-
>  2 files changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/repair/bmap.c b/repair/bmap.c
> index 79b9f79..1127a87 100644
> --- a/repair/bmap.c
> +++ b/repair/bmap.c
> @@ -47,6 +47,17 @@ blkmap_alloc(
>  	if (nex < 1)
>  		nex = 1;
>  
> +#if (BITS_PER_LONG != 64)

This should be == 32, not != 64.  (And if it
were possible, sizeof (int) == 32.)

> +	if (nex > (INT_MAX / sizeof(bmap_ext_t) - 1)) {

See the comment below about this calculation.

> +		do_warn(
> +	_("Number of extents requested in blkmap_alloc (%u) overflows 32 bits.\n"
> +	  "If this is not a corruption, then will need a 64 bit system\n"
		...then you will need...

> +	  "to repair this filesystem.\n"),
> +			nex);
> +		return NULL;
> +	}
> +#endif
> +
>  	key = whichfork ? ablkmap_key : dblkmap_key;
>  	blkmap = pthread_getspecific(key);
>  	if (!blkmap || blkmap->naexts < nex) {

. . .

> @@ -218,6 +244,15 @@ blkmap_grow(
>  	}
>  
>  	blkmap->naexts += 4;

The check needs to go *before* you update naexts.

> +#if (BITS_PER_LONG != 64)
> +	if (blkmap->naexts > (INT_MAX / sizeof(bmap_ext_t) - 1)) {

I don't really follow this calculation. I would expect
it to be based more closely on how BLKMAP_SIZE() is
defined.

If you move it before the increment I think it would
be better to use:
	if (BLKMAP_SIZE(nex) >= INT_MAX - sizeof (blkent_t *))
And since this would expose the internals of what
BLKMAP_SIZE() does, it might be nicer if some sort of
BLKMAP_NENTS_MAX constant were defined next to the
definition of BLKMAP_SIZE().



> +		do_error(
> +	_("Number of extents requested in blkmap_grow (%u) overflows 32 bits.\n"
> +	  "You need a 64 bit system to repair this filesystem.\n"),
> +			blkmap->naexts);
> +		return NULL;
> +	}
> +#endif
>  	blkmap = realloc(blkmap, BLKMAP_SIZE(blkmap->naexts));
>  	if (blkmap == NULL)
>  		do_error(_("realloc failed in blkmap_grow\n"));


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] repair: fix some valgrind reported errors on i686
  2011-10-06 12:17   ` Alex Elder
@ 2011-10-06 22:15     ` Dave Chinner
  2011-10-06 22:54       ` Alex Elder
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2011-10-06 22:15 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Thu, Oct 06, 2011 at 07:17:52AM -0500, Alex Elder wrote:
> On Thu, 2011-10-06 at 12:01 +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Fix a potential prefetch read problem due to the first loop
> > execution of pf_batch_read potentially not initialising the fsbno
> > varaible:
> > 
> > ==10177== Thread 6:
> > ==10177== Conditional jump or move depends on uninitialised value(s)
> > ==10177==    at 0x8079CAB: pf_batch_read (prefetch.c:408)
> > ==10177==    by 0x6A2996D: clone (clone.S:130)
> > ==10177==
> > 
> > Fix a bunch of invalid read/write errors due to excessive blkmap
> > allocations when inode forks are corrupted. These show up some time
> > after making a blkmap allocation for 536870913 extents on i686,
> > which is followed some time later by a crash caused bymemory
> > corruption.
> > 
> > This blkmap allocation size overflows 32 bits in such a
> > way that it results in a 32 byte allocation and so access to the
> > second extent results in access beyond the allocated memory and
> > corrupts random memory.
> > 
> > ==5419== Invalid write of size 4
> > ==5419==    at 0x80507DA: blkmap_set_ext (bmap.c:260)
> > ==5419==    by 0x8055CF4: process_bmbt_reclist_int (dinode.c:712)
> > ==5419==    by 0x8056206: process_bmbt_reclist (dinode.c:813)
> > ==5419==    by 0x80579DA: process_exinode (dinode.c:1324)
> > ==5419==    by 0x8059B77: process_dinode_int (dinode.c:2036)
> > ==5419==    by 0x805ABE6: process_dinode (dinode.c:2823)
> > ==5419==    by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
> > ==5419==    by 0x8054012: process_aginodes (dino_chunks.c:1024)
> > ==5419==    by 0xFFF: ???
> > ==5419==  Address 0x944cfb8 is 0 bytes after a block of size 32 alloc'd
> > ==5419==    at 0x48E1102: realloc (in
> > /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
> > ==5419==    by 0x80501F3: blkmap_alloc (bmap.c:56)
> > ==5419==    by 0x80599F5: process_dinode_int (dinode.c:2027)
> > ==5419==    by 0x805ABE6: process_dinode (dinode.c:2823)
> > ==5419==    by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
> > ==5419==    by 0x8054012: process_aginodes (dino_chunks.c:1024)
> > ==5419==    by 0xFFF: ???
> > 
> > Add overflow detection code into the blkmap allocation code to avoid
> > this problem, and also free large allocations once they are finished
> > with to avoid pinning large amounts of memory due to the occasional
> > large extent list in a filesystem.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> This is good but I have a few comments below, a couple of
> which really indicate you need to update this.
> 
> 					-Alex
>  
> > ---
> >  repair/bmap.c     |   37 ++++++++++++++++++++++++++++++++++++-
> >  repair/prefetch.c |    2 +-
> >  2 files changed, 37 insertions(+), 2 deletions(-)
> > 
> > diff --git a/repair/bmap.c b/repair/bmap.c
> > index 79b9f79..1127a87 100644
> > --- a/repair/bmap.c
> > +++ b/repair/bmap.c
> > @@ -47,6 +47,17 @@ blkmap_alloc(
> >  	if (nex < 1)
> >  		nex = 1;
> >  
> > +#if (BITS_PER_LONG != 64)
> 
> This should be == 32, not != 64.

OK.

>  (And if it
> were possible, sizeof (int) == 32.)

The BITS_PER_LONG are derived from the output of the configure
process, and that's what the other code uses as well. So I'm just
being consistent ;).

> > +	if (nex > (INT_MAX / sizeof(bmap_ext_t) - 1)) {
> 
> See the comment below about this calculation.
> 
> > +		do_warn(
> > +	_("Number of extents requested in blkmap_alloc (%u) overflows 32 bits.\n"
> > +	  "If this is not a corruption, then will need a 64 bit system\n"
> 		...then you will need...
> 
> > +	  "to repair this filesystem.\n"),
> > +			nex);
> > +		return NULL;
> > +	}
> > +#endif
> > +
> >  	key = whichfork ? ablkmap_key : dblkmap_key;
> >  	blkmap = pthread_getspecific(key);
> >  	if (!blkmap || blkmap->naexts < nex) {
> 
> . . .
> 
> > @@ -218,6 +244,15 @@ blkmap_grow(
> >  	}
> >  
> >  	blkmap->naexts += 4;
> 
> The check needs to go *before* you update naexts.

It's in the right place - adding 4 to the unchecked extent count can
push it over the limit, so we have to check it after adding 4 to it.

> > +#if (BITS_PER_LONG != 64)
> > +	if (blkmap->naexts > (INT_MAX / sizeof(bmap_ext_t) - 1)) {
> 
> I don't really follow this calculation. I would expect
> it to be based more closely on how BLKMAP_SIZE() is
> defined.

> 
> If you move it before the increment I think it would
> be better to use:
> 	if (BLKMAP_SIZE(nex) >= INT_MAX - sizeof (blkent_t *))

We can't use BLKMAP_SIZE(), because it will overflow 32 bits.
Indeed, use of BLKMAP_SIZE() on unchecked extent counts is
*precisely* the cause of the memory corruption this fix addresses.
In more detail:

typedef struct blkmap {
        int             naexts;
        int             nexts;
        bmap_ext_t      exts[1];
} blkmap_t;

#define BLKMAP_SIZE(n)  \
        (offsetof(blkmap_t, exts) + (sizeof(bmap_ext_t) * (n)))

sizeof()/offsetof() on 32 bit platforms return a 32 bit number,
naexts is a 32 bit number, so BLKMAP_SIZE() will overflow if
blkmap->naexts >= INT_MAX / sizeof(bmap_ext_t).

IOWs, we can't use BLKMAP_SIZE to detect an overflow, because it
is the code that overflows...

> And since this would expose the internals of what
> BLKMAP_SIZE() does, it might be nicer if some sort of
> BLKMAP_NENTS_MAX constant were defined next to the
> definition of BLKMAP_SIZE().

I can add a BLKMAP_NENTS_MAX constant to repair/bmap.h.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] repair: fix some valgrind reported errors on i686
  2011-10-06 22:15     ` Dave Chinner
@ 2011-10-06 22:54       ` Alex Elder
  2011-10-07  0:06         ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Elder @ 2011-10-06 22:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, 2011-10-07 at 09:15 +1100, Dave Chinner wrote:
> On Thu, Oct 06, 2011 at 07:17:52AM -0500, Alex Elder wrote:
> > On Thu, 2011-10-06 at 12:01 +1100, Dave Chinner wrote:
. . .

> > > @@ -47,6 +47,17 @@ blkmap_alloc(
> > >  	if (nex < 1)
> > >  		nex = 1;
> > >  
> > > +#if (BITS_PER_LONG != 64)
> > 
> > This should be == 32, not != 64.
> 
> OK.
> 
> >  (And if it
> > were possible, sizeof (int) == 32.)
> 
> The BITS_PER_LONG are derived from the output of the configure
> process, and that's what the other code uses as well. So I'm just
> being consistent ;).

Well, it isn't possible anyway because sizeof is a compile-time
operator, so it can't be used during preprocessing.  You did
the right thing.

> 
> > > +	if (nex > (INT_MAX / sizeof(bmap_ext_t) - 1)) {
> > 
> > See the comment below about this calculation.
> > 
> > > +		do_warn(
> > > +	_("Number of extents requested in blkmap_alloc (%u) overflows 32 bits.\n"
> > > +	  "If this is not a corruption, then will need a 64 bit system\n"
> > 		...then you will need...
> > 
> > > +	  "to repair this filesystem.\n"),
> > > +			nex);
> > > +		return NULL;
> > > +	}
> > > +#endif
> > > +
> > >  	key = whichfork ? ablkmap_key : dblkmap_key;
> > >  	blkmap = pthread_getspecific(key);
> > >  	if (!blkmap || blkmap->naexts < nex) {
> > 
> > . . .
> > 
> > > @@ -218,6 +244,15 @@ blkmap_grow(
> > >  	}
> > >  
> > >  	blkmap->naexts += 4;
> > 
> > The check needs to go *before* you update naexts.
> 
> It's in the right place - adding 4 to the unchecked extent count can
> push it over the limit, so we have to check it after adding 4 to it.

My point was that I'd rather see the check be whether it *will*
overflow, rather than allowing it to overflow.

The other reason though, even if you do the calculation
at that spot, is that you shouldn't update what blkmap
records as the number of allocated extents unless you
have actually updated it the amount of allocated memory.
As it stands, you will have updated blkmap->naexts, then
if it overflows you return before actually attempting
to reallocate.

In other words, blkmap->naexts should reflect the actual
amount of memory allotted in the blkmap->exts array, which
is not going to be the case if it returns early (and this
can be avoided).


> > > +#if (BITS_PER_LONG != 64)
> > > +	if (blkmap->naexts > (INT_MAX / sizeof(bmap_ext_t) - 1)) {
> > 
> > I don't really follow this calculation. I would expect
> > it to be based more closely on how BLKMAP_SIZE() is
> > defined.
> 
> > 
> > If you move it before the increment I think it would
> > be better to use:
> > 	if (BLKMAP_SIZE(nex) >= INT_MAX - sizeof (blkent_t *))
> 
> We can't use BLKMAP_SIZE(), because it will overflow 32 bits.

Which is also a reason I was suggesting to check whether we'd
be exceeding the max *before* calling the macro.

> Indeed, use of BLKMAP_SIZE() on unchecked extent counts is
> *precisely* the cause of the memory corruption this fix addresses.
> In more detail:
> 
> typedef struct blkmap {
>         int             naexts;
>         int             nexts;
>         bmap_ext_t      exts[1];
> } blkmap_t;
> 
> #define BLKMAP_SIZE(n)  \
>         (offsetof(blkmap_t, exts) + (sizeof(bmap_ext_t) * (n)))
> 
> sizeof()/offsetof() on 32 bit platforms return a 32 bit number,
> naexts is a 32 bit number, so BLKMAP_SIZE() will overflow if
> blkmap->naexts >= INT_MAX / sizeof(bmap_ext_t).
> 
> IOWs, we can't use BLKMAP_SIZE to detect an overflow, because it
> is the code that overflows...

I understand that argument.  I'm in a hurry at the moment so
I haven't thought through this very well right now.

But if you do have BLKMAP_NENTS_MAX, you could check nex
against that before attempting to use BLKMAP_SIZE to
compute how many bytes need to be allocated.


> > And since this would expose the internals of what
> > BLKMAP_SIZE() does, it might be nicer if some sort of
> > BLKMAP_NENTS_MAX constant were defined next to the
> > definition of BLKMAP_SIZE().
> 
> I can add a BLKMAP_NENTS_MAX constant to repair/bmap.h.
> 

OK.  I'll check for an update later.  Thanks for the
explanation.

					-Alex

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] repair: fix some valgrind reported errors on i686
  2011-10-06 22:54       ` Alex Elder
@ 2011-10-07  0:06         ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2011-10-07  0:06 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Thu, Oct 06, 2011 at 05:54:18PM -0500, Alex Elder wrote:
> On Fri, 2011-10-07 at 09:15 +1100, Dave Chinner wrote:
> > On Thu, Oct 06, 2011 at 07:17:52AM -0500, Alex Elder wrote:
> > > On Thu, 2011-10-06 at 12:01 +1100, Dave Chinner wrote:
> > > > @@ -218,6 +244,15 @@ blkmap_grow(
> > > >  	}
> > > >  
> > > >  	blkmap->naexts += 4;
> > > 
> > > The check needs to go *before* you update naexts.
> > 
> > It's in the right place - adding 4 to the unchecked extent count can
> > push it over the limit, so we have to check it after adding 4 to it.
> 
> My point was that I'd rather see the check be whether it *will*
> overflow, rather than allowing it to overflow.
> 
> The other reason though, even if you do the calculation
> at that spot, is that you shouldn't update what blkmap
> records as the number of allocated extents unless you
> have actually updated it the amount of allocated memory.
> As it stands, you will have updated blkmap->naexts, then
> if it overflows you return before actually attempting
> to reallocate.
> 
> In other words, blkmap->naexts should reflect the actual
> amount of memory allotted in the blkmap->exts array, which
> is not going to be the case if it returns early (and this
> can be avoided).

Fair point.

And just checking the single caller of blkmap_grow indicates that it
doesn't handle allocation failure *at all*, so I need to fix that as
well.

> > IOWs, we can't use BLKMAP_SIZE to detect an overflow, because it
> > is the code that overflows...
> 
> I understand that argument.  I'm in a hurry at the moment so
> I haven't thought through this very well right now.
> 
> But if you do have BLKMAP_NENTS_MAX, you could check nex
> against that before attempting to use BLKMAP_SIZE to
> compute how many bytes need to be allocated.

Yup, that's what the patch does, just without the BLKMAP_NENTS_MAX
macro. Which I've already added. ;)

FWIW, because the extent count fields are all signed ints,
overflow checks also need to check for values <= 0. So I've added
that too.

And I think now I need to split this into separate patches for each
issue, as there are now about 5 different problems this one patch
fixes.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-10-07  0:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-06  1:01 [PATCH 0/2] repair: couple of bug fixes Dave Chinner
2011-10-06  1:01 ` [PATCH 1/2] repair: handle repair of image files on large sector size filesystems Dave Chinner
2011-10-06 12:17   ` Alex Elder
2011-10-06  1:01 ` [PATCH 2/2] repair: fix some valgrind reported errors on i686 Dave Chinner
2011-10-06 12:17   ` Alex Elder
2011-10-06 22:15     ` Dave Chinner
2011-10-06 22:54       ` Alex Elder
2011-10-07  0:06         ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox