Corrupted XFS log replay oops. (was Re: Warning and BUG with btrfs and corrupted image)

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* Corrupted XFS log replay oops. (was Re: Warning and BUG with btrfs and corrupted image)
       [not found]               ` <20090120173455.GC21339@alice>
@ 2009-01-21  3:57                 ` Dave Chinner
  2009-01-21  4:03                   ` Dave Chinner
       [not found]                   ` <200901211503.07308.nickpiggin@yahoo.com.au>
  0 siblings, 2 replies; 14+ messages in thread
From: Dave Chinner @ 2009-01-21  3:57 UTC (permalink / raw)
  To: Eric Sesterhenn; +Cc: Pavel Machek, npiggin, linux-kernel, Chris Mason, xfs

[drop btrfs list from this thread]

On Tue, Jan 20, 2009 at 06:34:55PM +0100, Eric Sesterhenn wrote:
> * Dave Chinner (david@fromorbit.com) wrote:
> > Sure I am.  It would be good if you could start testing XFS along
> > with all the other filesystems and report anything you find.
> 
> Ok, i wont report stuff with only xfs-internal backtraces from
> xfs_error_report() or are they interesting to you?

If it catches the corruption and shuts down then that's a valid
response to corruption. Mostly they are not interesting.

> This occurs during mount, box is dead afterwards
> Image can be found here :
> http://www.cccmz.de/~snakebyte/xfs.11.img.bz2
> I see this every ~10 images, which makes further testing hard :)

For future bugs, can you start a new thread on xfs@oss.sgi.com
for each report?

> [  235.250167] ------------[ cut here ]------------
> [  235.250354] kernel BUG at mm/vmalloc.c:164!
> [  235.250478] invalid opcode: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> [  235.250869] last sysfs file: /sys/block/ram9/range
> [  235.250998] Modules linked in:
> [  235.251037] 
> [  235.251037] Pid: 5352, comm: mount Not tainted
> (2.6.29-rc2-00021-gd84d31c #216) System Name
> [  235.251037] EIP: 0060:[<c0182af1>] EFLAGS: 00010246 CPU: 0
> [  235.251037] EIP is at vmap_page_range+0x19/0x112
> [  235.251037] EAX: d1000000 EBX: d1000000 ECX: 00000163 EDX: d1000000
> [  235.251037] ESI: 00000003 EDI: d1000000 EBP: cbbd2c08 ESP: cbbd2be8
> [  235.251037]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> [  235.251037] Process mount (pid: 5352, ti=cbbd2000 task=cbb85b00
> task.ti=cbbd2000)
> [  235.251037] Stack:
> [  235.251037]  00000246 cbb85b00 00000163 c01414cf cbbd2c0c d1000000
> 00000003 cba0f810
> [  235.251037]  cbbd2c40 c018367c c848e280 00100000 00000000 c848e280
> 00000000 00000014
> [  235.251037]  d1000000 cba0f944 00000000 c848e160 00000000 c848e160
> cbbd2c54 c03b2e1e
> [  235.251037] Call Trace:
> [  235.251037]  [<c01414cf>] ? trace_hardirqs_on+0xb/0xd
> [  235.251037]  [<c018367c>] ? vm_map_ram+0x36e/0x38a
> [  235.251037]  [<c03b2e1e>] ? _xfs_buf_map_pages+0x42/0x6d
> [  235.251037]  [<c03b3773>] ? xfs_buf_get_noaddr+0xbc/0x11f
> [  235.251037]  [<c03a2406>] ? xlog_get_bp+0x5a/0x5d
> [  235.251037]  [<c03a28fa>] ? xlog_find_verify_log_record+0x26/0x208
> [  235.251037]  [<c03a3521>] ? xlog_find_zeroed+0x1d6/0x214
> [  235.251037]  [<c03a3584>] ? xlog_find_head+0x25/0x358
.....

Ok, that's crashing in the new vmap code. It might take a couple
of days before I get a chance to look at this, but I've cc'd Nick Piggin
in case he has a chance to look at it before that. It's probably
an XFS bug, anyway.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Corrupted XFS log replay oops. (was Re: Warning and BUG with btrfs and corrupted image)
  2009-01-21  3:57                 ` Corrupted XFS log replay oops. (was Re: Warning and BUG with btrfs and corrupted image) Dave Chinner
@ 2009-01-21  4:03                   ` Dave Chinner
       [not found]                   ` <200901211503.07308.nickpiggin@yahoo.com.au>
  1 sibling, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2009-01-21  4:03 UTC (permalink / raw)
  To: Eric Sesterhenn, Pavel Machek, Chris Mason, linux-kernel,
	nickpiggin, xfs

[really cc nick]

On Wed, Jan 21, 2009 at 02:57:03PM +1100, Dave Chinner wrote:
> [drop btrfs list from this thread]
> 
> On Tue, Jan 20, 2009 at 06:34:55PM +0100, Eric Sesterhenn wrote:
> > * Dave Chinner (david@fromorbit.com) wrote:
> > > Sure I am.  It would be good if you could start testing XFS along
> > > with all the other filesystems and report anything you find.
> > 
> > Ok, i wont report stuff with only xfs-internal backtraces from
> > xfs_error_report() or are they interesting to you?
> 
> If it catches the corruption and shuts down then that's a valid
> response to corruption. Mostly they are not interesting.
> 
> > This occurs during mount, box is dead afterwards
> > Image can be found here :
> > http://www.cccmz.de/~snakebyte/xfs.11.img.bz2
> > I see this every ~10 images, which makes further testing hard :)
> 
> For future bugs, can you start a new thread on xfs@oss.sgi.com
> for each report?
> 
> > [  235.250167] ------------[ cut here ]------------
> > [  235.250354] kernel BUG at mm/vmalloc.c:164!
> > [  235.250478] invalid opcode: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> > [  235.250869] last sysfs file: /sys/block/ram9/range
> > [  235.250998] Modules linked in:
> > [  235.251037] 
> > [  235.251037] Pid: 5352, comm: mount Not tainted
> > (2.6.29-rc2-00021-gd84d31c #216) System Name
> > [  235.251037] EIP: 0060:[<c0182af1>] EFLAGS: 00010246 CPU: 0
> > [  235.251037] EIP is at vmap_page_range+0x19/0x112
> > [  235.251037] EAX: d1000000 EBX: d1000000 ECX: 00000163 EDX: d1000000
> > [  235.251037] ESI: 00000003 EDI: d1000000 EBP: cbbd2c08 ESP: cbbd2be8
> > [  235.251037]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> > [  235.251037] Process mount (pid: 5352, ti=cbbd2000 task=cbb85b00
> > task.ti=cbbd2000)
> > [  235.251037] Stack:
> > [  235.251037]  00000246 cbb85b00 00000163 c01414cf cbbd2c0c d1000000
> > 00000003 cba0f810
> > [  235.251037]  cbbd2c40 c018367c c848e280 00100000 00000000 c848e280
> > 00000000 00000014
> > [  235.251037]  d1000000 cba0f944 00000000 c848e160 00000000 c848e160
> > cbbd2c54 c03b2e1e
> > [  235.251037] Call Trace:
> > [  235.251037]  [<c01414cf>] ? trace_hardirqs_on+0xb/0xd
> > [  235.251037]  [<c018367c>] ? vm_map_ram+0x36e/0x38a
> > [  235.251037]  [<c03b2e1e>] ? _xfs_buf_map_pages+0x42/0x6d
> > [  235.251037]  [<c03b3773>] ? xfs_buf_get_noaddr+0xbc/0x11f
> > [  235.251037]  [<c03a2406>] ? xlog_get_bp+0x5a/0x5d
> > [  235.251037]  [<c03a28fa>] ? xlog_find_verify_log_record+0x26/0x208
> > [  235.251037]  [<c03a3521>] ? xlog_find_zeroed+0x1d6/0x214
> > [  235.251037]  [<c03a3584>] ? xlog_find_head+0x25/0x358
> .....
> 
> Ok, that's crashing in the new vmap code. It might take a couple
> of days before I get a chance to look at this, but I've cc'd Nick Piggin
> in case he has a chance to look at it before that. It's probably
> an XFS bug, anyway.
> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

[parent not found: <200901211503.07308.nickpiggin@yahoo.com.au>]

* [PATCH] Re: Corrupted XFS log replay oops.
       [not found]                   ` <200901211503.07308.nickpiggin@yahoo.com.au>
@ 2009-01-22  4:37                     ` Dave Chinner
  2009-01-22  5:50                       ` Felix Blyakher
                                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Dave Chinner @ 2009-01-22  4:37 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Eric Sesterhenn, linux-kernel, xfs, Pavel Machek, npiggin,
	Chris Mason

On Wed, Jan 21, 2009 at 03:03:06PM +1100, Nick Piggin wrote:
> On Wednesday 21 January 2009 14:57:03 Dave Chinner wrote:
> > > [  235.250167] ------------[ cut here ]------------
> > > [  235.250354] kernel BUG at mm/vmalloc.c:164!
> > > [  235.250478] invalid opcode: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> > > [  235.250869] last sysfs file: /sys/block/ram9/range
> > > [  235.250998] Modules linked in:
......
> > > [  235.251037] Call Trace:
> > > [  235.251037]  [<c01414cf>] ? trace_hardirqs_on+0xb/0xd
> > > [  235.251037]  [<c018367c>] ? vm_map_ram+0x36e/0x38a
> > > [  235.251037]  [<c03b2e1e>] ? _xfs_buf_map_pages+0x42/0x6d
> > > [  235.251037]  [<c03b3773>] ? xfs_buf_get_noaddr+0xbc/0x11f
> > > [  235.251037]  [<c03a2406>] ? xlog_get_bp+0x5a/0x5d
> > > [  235.251037]  [<c03a28fa>] ? xlog_find_verify_log_record+0x26/0x208
> > > [  235.251037]  [<c03a3521>] ? xlog_find_zeroed+0x1d6/0x214
> > > [  235.251037]  [<c03a3584>] ? xlog_find_head+0x25/0x358
> >
> > .....
> >
> > Ok, that's crashing in the new vmap code. It might take a couple
> > of days before I get a chance to look at this, but I've cc'd Nick Piggin
> > in case he has a chance to look at it before that. It's probably
> > an XFS bug, anyway.
> 
> Hmm, it is crashing in BUG_ON(addr >= end); where this could happen
> if XFS asks to map a really huge (or -ve) number of pages and wraps
> the range, or if vmap subsystem returns an address right near the
> end of the address range and addr+size wraps (which would be a bug
> in vmap of course, but I think maybe less likely).

It's a zero length range, not a negative value. A debug XFS would
have assert failed on it, but it was completely unchecked on
production builds. The following patch checks the length of blocks
to build/read/write for being valid. Instead of an oops, we get:

[ 1572.665001] XFS mounting filesystem loop0
[ 1572.666942] XFS: Invalid block length (0x0) given for buffer
[ 1572.667141] XFS: Log inconsistent (didn't find previous header)
[ 1572.667141] XFS: empty log check failed
[ 1572.667141] XFS: log mount/recovery failed: error 5
[ 1572.671487] XFS: log mount failed

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

[XFS] Check buffer lengths in log recovery

Before trying to obtain, read or write a buffer,
check that the buffer length is actually valid. If
it is not valid, then something read in the recovery
process has been corrupted and we should abort
recovery. 

Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
---
 fs/xfs/xfs_log_recover.c |   31 +++++++++++++++++++++++++------
 1 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 35cca98..b1047de 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -70,16 +70,21 @@ STATIC void	xlog_recover_check_summary(xlog_t *);
 xfs_buf_t *
 xlog_get_bp(
 	xlog_t		*log,
-	int		num_bblks)
+	int		nbblks)
 {
-	ASSERT(num_bblks > 0);
+	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
+		xlog_warn("XFS: Invalid block length (0x%x) given for buffer", nbblks);
+		XFS_ERROR_REPORT("xlog_get_bp(1)",
+				 XFS_ERRLEVEL_HIGH, log->l_mp);
+		return NULL;
+	}
 
 	if (log->l_sectbb_log) {
-		if (num_bblks > 1)
-			num_bblks += XLOG_SECTOR_ROUNDUP_BBCOUNT(log, 1);
-		num_bblks = XLOG_SECTOR_ROUNDUP_BBCOUNT(log, num_bblks);
+		if (nbblks > 1)
+			nbblks += XLOG_SECTOR_ROUNDUP_BBCOUNT(log, 1);
+		nbblks = XLOG_SECTOR_ROUNDUP_BBCOUNT(log, nbblks);
 	}
-	return xfs_buf_get_noaddr(BBTOB(num_bblks), log->l_mp->m_logdev_targp);
+	return xfs_buf_get_noaddr(BBTOB(nbblks), log->l_mp->m_logdev_targp);
 }
 
 void
@@ -102,6 +107,13 @@ xlog_bread(
 {
 	int		error;
 
+	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
+		xlog_warn("XFS: Invalid block length (0x%x) given for buffer", nbblks);
+		XFS_ERROR_REPORT("xlog_bread(1)",
+				 XFS_ERRLEVEL_HIGH, log->l_mp);
+		return EFSCORRUPTED;
+	}
+
 	if (log->l_sectbb_log) {
 		blk_no = XLOG_SECTOR_ROUNDDOWN_BLKNO(log, blk_no);
 		nbblks = XLOG_SECTOR_ROUNDUP_BBCOUNT(log, nbblks);
@@ -139,6 +151,13 @@ xlog_bwrite(
 {
 	int		error;
 
+	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
+		xlog_warn("XFS: Invalid block length (0x%x) given for buffer", nbblks);
+		XFS_ERROR_REPORT("xlog_bwrite(1)",
+				 XFS_ERRLEVEL_HIGH, log->l_mp);
+		return EFSCORRUPTED;
+	}
+
 	if (log->l_sectbb_log) {
 		blk_no = XLOG_SECTOR_ROUNDDOWN_BLKNO(log, blk_no);
 		nbblks = XLOG_SECTOR_ROUNDUP_BBCOUNT(log, nbblks);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re: Corrupted XFS log replay oops.
  2009-01-22  4:37                     ` [PATCH] Re: Corrupted XFS log replay oops Dave Chinner
@ 2009-01-22  5:50                       ` Felix Blyakher
  2009-01-22  6:11                       ` Christoph Hellwig
  2009-02-03 20:48                       ` Eric Sandeen
  2 siblings, 0 replies; 14+ messages in thread
From: Felix Blyakher @ 2009-01-22  5:50 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Nick Piggin, Eric Sesterhenn, linux-kernel, xfs, Pavel Machek,
	npiggin, Chris Mason


On Jan 21, 2009, at 10:37 PM, Dave Chinner wrote:

> On Wed, Jan 21, 2009 at 03:03:06PM +1100, Nick Piggin wrote:
>> On Wednesday 21 January 2009 14:57:03 Dave Chinner wrote:
>>>> [  235.250167] ------------[ cut here ]------------
>>>> [  235.250354] kernel BUG at mm/vmalloc.c:164!
>>>> [  235.250478] invalid opcode: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
>>>> [  235.250869] last sysfs file: /sys/block/ram9/range
>>>> [  235.250998] Modules linked in:
> ......
>>>> [  235.251037] Call Trace:
>>>> [  235.251037]  [<c01414cf>] ? trace_hardirqs_on+0xb/0xd
>>>> [  235.251037]  [<c018367c>] ? vm_map_ram+0x36e/0x38a
>>>> [  235.251037]  [<c03b2e1e>] ? _xfs_buf_map_pages+0x42/0x6d
>>>> [  235.251037]  [<c03b3773>] ? xfs_buf_get_noaddr+0xbc/0x11f
>>>> [  235.251037]  [<c03a2406>] ? xlog_get_bp+0x5a/0x5d
>>>> [  235.251037]  [<c03a28fa>] ? xlog_find_verify_log_record 
>>>> +0x26/0x208
>>>> [  235.251037]  [<c03a3521>] ? xlog_find_zeroed+0x1d6/0x214
>>>> [  235.251037]  [<c03a3584>] ? xlog_find_head+0x25/0x358
>>>
>>> .....
>>>
>>> Ok, that's crashing in the new vmap code. It might take a couple
>>> of days before I get a chance to look at this, but I've cc'd Nick  
>>> Piggin
>>> in case he has a chance to look at it before that. It's probably
>>> an XFS bug, anyway.
>>
>> Hmm, it is crashing in BUG_ON(addr >= end); where this could happen
>> if XFS asks to map a really huge (or -ve) number of pages and wraps
>> the range, or if vmap subsystem returns an address right near the
>> end of the address range and addr+size wraps (which would be a bug
>> in vmap of course, but I think maybe less likely).
>
> It's a zero length range, not a negative value. A debug XFS would
> have assert failed on it, but it was completely unchecked on
> production builds. The following patch checks the length of blocks
> to build/read/write for being valid. Instead of an oops, we get:
>
> [ 1572.665001] XFS mounting filesystem loop0
> [ 1572.666942] XFS: Invalid block length (0x0) given for buffer
> [ 1572.667141] XFS: Log inconsistent (didn't find previous header)
> [ 1572.667141] XFS: empty log check failed
> [ 1572.667141] XFS: log mount/recovery failed: error 5
> [ 1572.671487] XFS: log mount failed
>
> Cheers,
>
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>
> [XFS] Check buffer lengths in log recovery
>
> Before trying to obtain, read or write a buffer,
> check that the buffer length is actually valid. If
> it is not valid, then something read in the recovery
> process has been corrupted and we should abort
> recovery.
>
> Reported-by: Eric Sesterhenn <snakebyte@gmx.de>

Reviewed-by: Felix Blyakher <felixb@sgi.com>

> ---
>  fs/xfs/xfs_log_recover.c |   31 +++++++++++++++++++++++++------
>  1 files changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 35cca98..b1047de 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -70,16 +70,21 @@ STATIC void	xlog_recover_check_summary(xlog_t *);
>  xfs_buf_t *
>  xlog_get_bp(
>  	xlog_t		*log,
> -	int		num_bblks)
> +	int		nbblks)
>  {
> -	ASSERT(num_bblks > 0);
> +	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
> +		xlog_warn("XFS: Invalid block length (0x%x) given for buffer",  
> nbblks);
> +		XFS_ERROR_REPORT("xlog_get_bp(1)",
> +				 XFS_ERRLEVEL_HIGH, log->l_mp);
> +		return NULL;
> +	}
>
>  	if (log->l_sectbb_log) {
> -		if (num_bblks > 1)
> -			num_bblks += XLOG_SECTOR_ROUNDUP_BBCOUNT(log, 1);
> -		num_bblks = XLOG_SECTOR_ROUNDUP_BBCOUNT(log, num_bblks);
> +		if (nbblks > 1)
> +			nbblks += XLOG_SECTOR_ROUNDUP_BBCOUNT(log, 1);
> +		nbblks = XLOG_SECTOR_ROUNDUP_BBCOUNT(log, nbblks);
>  	}
> -	return xfs_buf_get_noaddr(BBTOB(num_bblks), log->l_mp- 
> >m_logdev_targp);
> +	return xfs_buf_get_noaddr(BBTOB(nbblks), log->l_mp->m_logdev_targp);
>  }
>
>  void
> @@ -102,6 +107,13 @@ xlog_bread(
>  {
>  	int		error;
>
> +	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
> +		xlog_warn("XFS: Invalid block length (0x%x) given for buffer",  
> nbblks);
> +		XFS_ERROR_REPORT("xlog_bread(1)",
> +				 XFS_ERRLEVEL_HIGH, log->l_mp);
> +		return EFSCORRUPTED;
> +	}
> +
>  	if (log->l_sectbb_log) {
>  		blk_no = XLOG_SECTOR_ROUNDDOWN_BLKNO(log, blk_no);
>  		nbblks = XLOG_SECTOR_ROUNDUP_BBCOUNT(log, nbblks);
> @@ -139,6 +151,13 @@ xlog_bwrite(
>  {
>  	int		error;
>
> +	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
> +		xlog_warn("XFS: Invalid block length (0x%x) given for buffer",  
> nbblks);
> +		XFS_ERROR_REPORT("xlog_bwrite(1)",
> +				 XFS_ERRLEVEL_HIGH, log->l_mp);
> +		return EFSCORRUPTED;
> +	}
> +
>  	if (log->l_sectbb_log) {
>  		blk_no = XLOG_SECTOR_ROUNDDOWN_BLKNO(log, blk_no);
>  		nbblks = XLOG_SECTOR_ROUNDUP_BBCOUNT(log, nbblks);
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re: Corrupted XFS log replay oops.
  2009-01-22  4:37                     ` [PATCH] Re: Corrupted XFS log replay oops Dave Chinner
  2009-01-22  5:50                       ` Felix Blyakher
@ 2009-01-22  6:11                       ` Christoph Hellwig
  2009-01-22  8:35                         ` Eric Sesterhenn
                                           ` (3 more replies)
  2009-02-03 20:48                       ` Eric Sandeen
  2 siblings, 4 replies; 14+ messages in thread
From: Christoph Hellwig @ 2009-01-22  6:11 UTC (permalink / raw)
  To: Nick Piggin, Eric Sesterhenn, Pavel Machek, Chris Mason,
	linux-kernel, npiggin, xfs

On Thu, Jan 22, 2009 at 03:37:47PM +1100, Dave Chinner wrote:
>  xfs_buf_t *
>  xlog_get_bp(
>  	xlog_t		*log,
> -	int		num_bblks)
> +	int		nbblks)

Any reason for reanming this variable?  That causes quite a bit of
churn.

>  {
> -	ASSERT(num_bblks > 0);
> +	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
> +		xlog_warn("XFS: Invalid block length (0x%x) given for buffer", nbblks);

And doesn't prevent this line from needing a linebreak to stay under 80
characters :)

Except for these nitpicks it looks fine to me.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re: Corrupted XFS log replay oops.
  2009-01-22  6:11                       ` Christoph Hellwig
@ 2009-01-22  8:35                         ` Eric Sesterhenn
  2009-01-22 10:06                         ` Eric Sesterhenn
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Eric Sesterhenn @ 2009-01-22  8:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Nick Piggin, linux-kernel, xfs, Pavel Machek, npiggin,
	Chris Mason

* Christoph Hellwig (hch@infradead.org) wrote:
> On Thu, Jan 22, 2009 at 03:37:47PM +1100, Dave Chinner wrote:
> >  xfs_buf_t *
> >  xlog_get_bp(
> >  	xlog_t		*log,
> > -	int		num_bblks)
> > +	int		nbblks)
> 
> Any reason for reanming this variable?  That causes quite a bit of
> churn.
> 
> >  {
> > -	ASSERT(num_bblks > 0);
> > +	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
> > +		xlog_warn("XFS: Invalid block length (0x%x) given for buffer", nbblks);
> 
> And doesn't prevent this line from needing a linebreak to stay under 80
> characters :)
> 
> Except for these nitpicks it looks fine to me.

Just want to report that it fixes the issue for me. Thanks,
i will continue xfs testing.

Greetings, Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re: Corrupted XFS log replay oops.
  2009-01-22  6:11                       ` Christoph Hellwig
  2009-01-22  8:35                         ` Eric Sesterhenn
@ 2009-01-22 10:06                         ` Eric Sesterhenn
  2009-01-22 23:37                           ` Dave Chinner
  2009-01-22 23:35                         ` Dave Chinner
  2009-01-23  0:02                         ` Dave Chinner
  3 siblings, 1 reply; 14+ messages in thread
From: Eric Sesterhenn @ 2009-01-22 10:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Nick Piggin, linux-kernel, xfs, Pavel Machek, npiggin,
	Chris Mason

* Christoph Hellwig (hch@infradead.org) wrote:
> On Thu, Jan 22, 2009 at 03:37:47PM +1100, Dave Chinner wrote:
> >  xfs_buf_t *
> >  xlog_get_bp(
> >  	xlog_t		*log,
> > -	int		num_bblks)
> > +	int		nbblks)
> 
> Any reason for reanming this variable?  That causes quite a bit of
> churn.
> 
> >  {
> > -	ASSERT(num_bblks > 0);
> > +	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
> > +		xlog_warn("XFS: Invalid block length (0x%x) given for buffer", nbblks);
> 
> And doesn't prevent this line from needing a linebreak to stay under 80
> characters :)
> 
> Except for these nitpicks it looks fine to me.

Using the image at http://www.cccmz.de/~snakebyte/xfs.254.img.bz2
I was able to produce a pretty similar error with the patch applied

[  227.138277] XFS mounting filesystem loop0
[  227.142703] ------------[ cut here ]------------
[  227.142881] kernel BUG at mm/vmalloc.c:164!
[  227.143009] invalid opcode: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[  227.143040] last sysfs file: /sys/block/ram9/range
[  227.143040] Modules linked in:
[  227.143040] 
[  227.143040] Pid: 5021, comm: mount Tainted: G        W
(2.6.29-rc2-00021-gd84d31c-dirty #221) System Name
[  227.143040] EIP: 0060:[<c0182af1>] EFLAGS: 00010246 CPU: 0
[  227.143040] EIP is at vmap_page_range+0x19/0x112
[  227.143040] EAX: d1000000 EBX: d1000000 ECX: 00000163 EDX: d1000000
[  227.143040] ESI: 00000003 EDI: d1000000 EBP: cb774c48 ESP: cb774c28
[  227.143040]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[  227.143040] Process mount (pid: 5021, ti=cb774000 task=cb760000
task.ti=cb774000)
[  227.143040] Stack:
[  227.143040]  00000246 cb760000 00000163 c01414cf cb774c4c d1000000
00000003 cb66ba40
[  227.143040]  cb774c80 c018367c cb7f7ee0 00100000 00000000 cb7f7ee0
00000000 00000014
[  227.143040]  d1000000 cb66bb74 00000000 cb7f7dc0 00000000 cb7f7dc0
cb774c94 c03b2ed6
[  227.143040] Call Trace:
[  227.143040]  [<c01414cf>] ? trace_hardirqs_on+0xb/0xd
[  227.143040]  [<c018367c>] ? vm_map_ram+0x36e/0x38a
[  227.143040]  [<c03b2ed6>] ? _xfs_buf_map_pages+0x42/0x6d
[  227.143040]  [<c03b382b>] ? xfs_buf_get_noaddr+0xbc/0x11f
[  227.143040]  [<c03a24bc>] ? xlog_get_bp+0x92/0x97
[  227.143040]  [<c03a343a>] ? xlog_find_zeroed+0x37/0x214
[  227.143040]  [<c0141381>] ? trace_hardirqs_on_caller+0x17/0x15a
[  227.143040]  [<c07b07bc>] ? _spin_unlock_irq+0x32/0x47
[  227.143040]  [<c03a363c>] ? xlog_find_head+0x25/0x358
[  227.143040]  [<c011ca1f>] ? __enqueue_entity+0xa1/0xa9
[  227.143040]  [<c03a398a>] ? xlog_find_tail+0x1b/0x3fa
[  227.143040]  [<c01414cf>] ? trace_hardirqs_on+0xb/0xd
[  227.143040]  [<c07b0768>] ? _spin_unlock_irqrestore+0x3b/0x5d
[  227.143040]  [<c07b0774>] ? _spin_unlock_irqrestore+0x47/0x5d
[  227.143040]  [<c01209c6>] ? try_to_wake_up+0x12f/0x13a
[  227.143040]  [<c03a570c>] ? xlog_recover+0x19/0x81
[  227.143040]  [<c03aab7b>] ? xfs_trans_ail_init+0x4b/0x64
[  227.143040]  [<c039f97d>] ? xfs_log_mount+0xef/0x13d
[  227.143040]  [<c03a720a>] ? xfs_mountfs+0x30d/0x5b8
[  227.143040]  [<c05061c1>] ? __debug_object_init+0x28b/0x293
[  227.143040]  [<c012b3b5>] ? init_timer+0x1c/0x1f
[  227.143040]  [<c03a7b59>] ? xfs_mru_cache_create+0x114/0x14e
[  227.143040]  [<c03ba2bd>] ? xfs_fs_fill_super+0x196/0x2e5
[  227.143040]  [<c01919c5>] ? get_sb_bdev+0xf1/0x13f
[  227.143040]  [<c0178896>] ? kstrdup+0x2f/0x51
[  227.143040]  [<c03b88d3>] ? xfs_fs_get_sb+0x18/0x1a
[  227.143040]  [<c03ba127>] ? xfs_fs_fill_super+0x0/0x2e5
[  227.143040]  [<c019159c>] ? vfs_kern_mount+0x40/0x7b
[  227.143040]  [<c0191625>] ? do_kern_mount+0x37/0xbf
[  227.143040]  [<c01a2cb0>] ? do_mount+0x5cc/0x609
[  227.143040]  [<c07b093b>] ? lock_kernel+0x19/0x8c
[  227.143040]  [<c01a2d43>] ? sys_mount+0x56/0xa0
[  227.143040]  [<c01a2d56>] ? sys_mount+0x69/0xa0
[  227.143040]  [<c0102ea1>] ? sysenter_do_call+0x12/0x31
[  227.143040] Code: 0f 0b eb fe ba 01 00 00 00 89 c8 e8 02 ff ff ff 5d
c3 55 89 e5 57 56 53 83 ec 14 0f 1f 44 00 00 39 d0 89 c3 89 d7 89 4d e8
72 04 <0f> 0b eb fe c1 e8 16 8d 34 85 00 00 00 00 03 35 84 91 a3 c0 8d 
[  227.143040] EIP: [<c0182af1>] vmap_page_range+0x19/0x112 SS:ESP
0068:cb774c28
[  227.161166] ---[ end trace a7919e7f17c0a727 ]---
[  553.569010] general protection fault: 0000 [#2] PREEMPT
DEBUG_PAGEALLOC
[  553.569043] last sysfs file: /sys/block/ram9/range
[  553.569043] Modules linked in:
[  553.569043] 
[  553.569043] Pid: 328, comm: kswapd0 Tainted: G      D W
(2.6.29-rc2-00021-gd84d31c-dirty #221) System Name
[  553.569043] EIP: 0060:[<c0181bb6>] EFLAGS: 00010282 CPU: 0
[  553.569043] EIP is at page_referenced+0x85/0xdf
[  553.569043] EAX: 00000000 EBX: c12b5f20 ECX: cf168e60 EDX: ffffffff
[  553.569043] ESI: ffffffcb EDI: cb665580 EBP: cf168e70 ESP: cf168e54
[  553.569043]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[  553.569043] Process kswapd0 (pid: 328, ti=cf168000 task=cf090000
task.ti=cf168000)
[  553.569043] Stack:
[  553.569043]  cb6655a4 00000000 00000000 00000001 c12b5f20 00000000
c0a88964 cf168ef4
[  553.569043]  c0174049 cf168edc cf168f70 00000000 c0a88c00 c0a88bb0
cf090000 00000005
[  553.569043]  c1074578 cf168ea8 cf168ed8 cf168eb4 c013f6f2 c0a88bb0
cf090000 ffffffff
[  553.569043] Call Trace:
[  553.569043]  [<c0174049>] ? shrink_active_list+0x12d/0x2e3
[  553.569043]  [<c013f6f2>] ? print_lock_contention_bug+0x11/0xb2
[  553.569043]  [<c0174c5f>] ? shrink_zone+0x95/0x24b
[  553.569043]  [<c01414cf>] ? trace_hardirqs_on+0xb/0xd
[  553.569043]  [<c07b07bc>] ? _spin_unlock_irq+0x32/0x47
[  553.569043]  [<c0174e00>] ? shrink_zone+0x236/0x24b
[  553.569043]  [<c0175546>] ? kswapd+0x2d1/0x443
[  553.569043]  [<c01735d1>] ? isolate_pages_global+0x0/0x1d2
[  553.569043]  [<c013416c>] ? autoremove_wake_function+0x0/0x35
[  553.569043]  [<c0175275>] ? kswapd+0x0/0x443
[  553.569043]  [<c0134094>] ? kthread+0x3e/0x66
[  553.569043]  [<c0134056>] ? kthread+0x0/0x66
[  553.569043]  [<c01035a7>] ? kernel_thread_helper+0x7/0x10
[  553.569043] Code: 45 f0 8d 47 24 8b 77 24 89 45 e4 83 ee 34 eb 1b 8d
4d f0 89 f2 89 d8 e8 0d f7 ff ff 01 45 ec 83 7d f0 00 74 15 8b 76 34 83
ee 34 <8b> 46 34 0f 18 00 90 8d 46 34 3b 45 e4 75 d6 89 f8 e8 51 f4 ff 
[  553.569043] EIP: [<c0181bb6>] page_referenced+0x85/0xdf SS:ESP
0068:cf168e54
[  553.582779] ---[ end trace a7919e7f17c0a728 ]---
[  553.582900] note: kswapd0[328] exited with preempt_count 1
[  697.933024] BUG: soft lockup - CPU#0 stuck for 61s! [make:14851]
[  697.933024] Modules linked in:
[  697.933024] irq event stamp: 0
[  697.933024] hardirqs last  enabled at (0): [<(null)>] (null)
[  697.933024] hardirqs last disabled at (0): [<c01221f9>]
copy_process+0x31f/0xfef
[  697.933024] softirqs last  enabled at (0): [<c01221f9>]
copy_process+0x31f/0xfef
[  697.933024] softirqs last disabled at (0): [<(null)>] (null)
[  697.933024] 
[  697.933024] Pid: 14851, comm: make Tainted: G      D W
(2.6.29-rc2-00021-gd84d31c-dirty #221) System Name
[  697.933024] EIP: 0060:[<c07b2628>] EFLAGS: 00000286 CPU: 0
[  697.933024] EIP is at add_preempt_count+0x0/0xde
[  697.933024] EAX: 00000001 EBX: 00000001 ECX: c018100f EDX: 00000096
[  697.933024] ESI: 166556bb EDI: 00000001 EBP: c1c83bd0 ESP: c1c83bbc
[  697.933024]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[  697.933024] CR0: 8005003b CR2: 400c3044 CR3: 01c87000 CR4: 000006d0
[  697.933024] DR0: c0122f08 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  697.933024] DR6: ffff0ff0 DR7: 00000400
[  697.933024] Call Trace:
[  697.933024]  [<c04f87fa>] ? delay_tsc+0x1a/0x88
[  697.933024]  [<c04f8778>] __delay+0xe/0x10
[  697.933024]  [<c0505465>] _raw_spin_lock+0x83/0xcb
[  697.933024]  [<c07b00da>] _spin_lock+0x4c/0x5d
[  697.933024]  [<c018100f>] page_lock_anon_vma+0x29/0x37
[  697.933024]  [<c0181b79>] page_referenced+0x48/0xdf
[  697.933024]  [<c07b07bc>] ? _spin_unlock_irq+0x32/0x47
[  697.933024]  [<c0173ddb>] ? __remove_mapping+0xb8/0xcd
[  697.933024]  [<c017433f>] shrink_page_list+0x140/0x5ff
[  697.933024]  [<c017499c>] shrink_list+0x19e/0x3cc
[  697.933024]  [<c0174d9f>] shrink_zone+0x1d5/0x24b
[  697.933024]  [<c0175121>] try_to_free_pages+0x1b4/0x29e
[  697.933024]  [<c01735d1>] ? isolate_pages_global+0x0/0x1d2
[  697.933024]  [<c017022f>] __alloc_pages_internal+0x247/0x3af
[  697.933024]  [<c018b4cd>] ? __slab_alloc+0x181/0x50b
[  697.933024]  [<c018b50d>] __slab_alloc+0x1c1/0x50b
[  697.933024]  [<c018a797>] ? check_bytes_and_report+0x26/0x94
[  697.933024]  [<c018bb5a>] kmem_cache_alloc+0x7c/0xea
[  697.933024]  [<c0121f43>] ? copy_process+0x69/0xfef
[  697.933024]  [<c0121f43>] ? copy_process+0x69/0xfef
[  697.933024]  [<c0121f43>] copy_process+0x69/0xfef
[  697.933024]  [<c013f6f2>] ? print_lock_contention_bug+0x11/0xb2
[  697.933024]  [<c018de9a>] ? fd_install+0x28/0x55
[  697.933024]  [<c0123029>] do_fork+0x121/0x2b8
[  697.933024]  [<c04f9195>] ? copy_to_user+0x38/0x43
[  697.933024]  [<c04f8b60>] ? trace_hardirqs_on_thunk+0xc/0x10
[  697.933024]  [<c0102ecf>] ? sysenter_exit+0xf/0x16
[  697.933024]  [<c01015a2>] sys_vfork+0x1e/0x20
[  697.933024]  [<c0102f7a>] syscall_call+0x7/0xb

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re: Corrupted XFS log replay oops.
  2009-01-22 10:06                         ` Eric Sesterhenn
@ 2009-01-22 23:37                           ` Dave Chinner
  2009-01-23  1:10                             ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2009-01-22 23:37 UTC (permalink / raw)
  To: Eric Sesterhenn
  Cc: Nick Piggin, linux-kernel, xfs, Christoph Hellwig, Pavel Machek,
	npiggin, Chris Mason

On Thu, Jan 22, 2009 at 11:06:48AM +0100, Eric Sesterhenn wrote:
> * Christoph Hellwig (hch@infradead.org) wrote:
> > On Thu, Jan 22, 2009 at 03:37:47PM +1100, Dave Chinner wrote:
> > >  xfs_buf_t *
> > >  xlog_get_bp(
> > >  	xlog_t		*log,
> > > -	int		num_bblks)
> > > +	int		nbblks)
> > 
> > Any reason for reanming this variable?  That causes quite a bit of
> > churn.
> > 
> > >  {
> > > -	ASSERT(num_bblks > 0);
> > > +	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
> > > +		xlog_warn("XFS: Invalid block length (0x%x) given for buffer", nbblks);
> > 
> > And doesn't prevent this line from needing a linebreak to stay under 80
> > characters :)
> > 
> > Except for these nitpicks it looks fine to me.
> 
> Using the image at http://www.cccmz.de/~snakebyte/xfs.254.img.bz2
> I was able to produce a pretty similar error with the patch applied

Different problem, obviously. ;)

I'll have a look at this later today....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re: Corrupted XFS log replay oops.
  2009-01-22 23:37                           ` Dave Chinner
@ 2009-01-23  1:10                             ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2009-01-23  1:10 UTC (permalink / raw)
  To: Eric Sesterhenn, Christoph Hellwig, Nick Piggin, Pavel Machek,
	Chris Mason, linux-kernel, npiggin, xfs

On Fri, Jan 23, 2009 at 10:37:17AM +1100, Dave Chinner wrote:
> On Thu, Jan 22, 2009 at 11:06:48AM +0100, Eric Sesterhenn wrote:
> > * Christoph Hellwig (hch@infradead.org) wrote:
> > > On Thu, Jan 22, 2009 at 03:37:47PM +1100, Dave Chinner wrote:
> > > >  xfs_buf_t *
> > > >  xlog_get_bp(
> > > >  	xlog_t		*log,
> > > > -	int		num_bblks)
> > > > +	int		nbblks)
> > > 
> > > Any reason for reanming this variable?  That causes quite a bit of
> > > churn.
> > > 
> > > >  {
> > > > -	ASSERT(num_bblks > 0);
> > > > +	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
> > > > +		xlog_warn("XFS: Invalid block length (0x%x) given for buffer", nbblks);
> > > 
> > > And doesn't prevent this line from needing a linebreak to stay under 80
> > > characters :)
> > > 
> > > Except for these nitpicks it looks fine to me.
> > 
> > Using the image at http://www.cccmz.de/~snakebyte/xfs.254.img.bz2
> > I was able to produce a pretty similar error with the patch applied
> 
> Different problem, obviously. ;)
> 
> I'll have a look at this later today....

One word: Ouch.

Basically the corruption introduced adds random feature bits into the
superblock that aren't actually in use. And hence instead of having
valid superblock fields for each of those features, they are zero
and so strange stuff happens.

What is really stupid is that the fields are often checked. By
ASSERT(), not by production code so a debug kernel will pick up the
problem and panic, while a production kernel will continue onwards
until it panics. This is not going to be a small patch.....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re: Corrupted XFS log replay oops.
  2009-01-22  6:11                       ` Christoph Hellwig
  2009-01-22  8:35                         ` Eric Sesterhenn
  2009-01-22 10:06                         ` Eric Sesterhenn
@ 2009-01-22 23:35                         ` Dave Chinner
  2009-01-23  0:02                         ` Dave Chinner
  3 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2009-01-22 23:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Nick Piggin, Eric Sesterhenn, linux-kernel, xfs, Pavel Machek,
	npiggin, Chris Mason

On Thu, Jan 22, 2009 at 01:11:58AM -0500, Christoph Hellwig wrote:
> On Thu, Jan 22, 2009 at 03:37:47PM +1100, Dave Chinner wrote:
> >  xfs_buf_t *
> >  xlog_get_bp(
> >  	xlog_t		*log,
> > -	int		num_bblks)
> > +	int		nbblks)
> 
> Any reason for reanming this variable?  That causes quite a bit of
> churn.

Just so it was the same variable name as the other I/O functions
around it. It's kind of strange to have one function use num_bblks
and another 4 or so related functions use nbblks....

> >  {
> > -	ASSERT(num_bblks > 0);
> > +	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
> > +		xlog_warn("XFS: Invalid block length (0x%x) given for buffer", nbblks);
> 
> And doesn't prevent this line from needing a linebreak to stay under 80
> characters :)

I'll respin it to fix this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re: Corrupted XFS log replay oops.
  2009-01-22  6:11                       ` Christoph Hellwig
                                           ` (2 preceding siblings ...)
  2009-01-22 23:35                         ` Dave Chinner
@ 2009-01-23  0:02                         ` Dave Chinner
  2009-01-23  0:06                           ` Christoph Hellwig
  3 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2009-01-23  0:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Nick Piggin, Eric Sesterhenn, linux-kernel, xfs, Pavel Machek,
	npiggin, Chris Mason

On Thu, Jan 22, 2009 at 01:11:58AM -0500, Christoph Hellwig wrote:
> On Thu, Jan 22, 2009 at 03:37:47PM +1100, Dave Chinner wrote:
> >  xfs_buf_t *
> >  xlog_get_bp(
> >  	xlog_t		*log,
> > -	int		num_bblks)
> > +	int		nbblks)
> 
> Any reason for reanming this variable?  That causes quite a bit of
> churn.
> 
> >  {
> > -	ASSERT(num_bblks > 0);
> > +	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
> > +		xlog_warn("XFS: Invalid block length (0x%x) given for buffer", nbblks);
> 
> And doesn't prevent this line from needing a linebreak to stay under 80
> characters :)

And now with line breaks.

------

[XFS] Check buffer lengths in log recovery

Before trying to obtain, read or write a buffer,
check that the buffer length is actually valid. If
it is not valid, then something read in the recovery
process has been corrupted and we should abort
recovery.

Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Tested-by: Eric Sesterhenn <snakebyte@gmx.de>
---
 fs/xfs/xfs_log_recover.c |   34 ++++++++++++++++++++++++++++------
 1 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 35cca98..a37e4aa 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -70,16 +70,22 @@ STATIC void	xlog_recover_check_summary(xlog_t *);
 xfs_buf_t *
 xlog_get_bp(
 	xlog_t		*log,
-	int		num_bblks)
+	int		nbblks)
 {
-	ASSERT(num_bblks > 0);
+	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
+		xlog_warn("XFS: Invalid block length (0x%x) given for buffer",
+					nbblks);
+		XFS_ERROR_REPORT("xlog_get_bp(1)",
+					XFS_ERRLEVEL_HIGH, log->l_mp);
+		return NULL;
+	}
 
 	if (log->l_sectbb_log) {
-		if (num_bblks > 1)
-			num_bblks += XLOG_SECTOR_ROUNDUP_BBCOUNT(log, 1);
-		num_bblks = XLOG_SECTOR_ROUNDUP_BBCOUNT(log, num_bblks);
+		if (nbblks > 1)
+			nbblks += XLOG_SECTOR_ROUNDUP_BBCOUNT(log, 1);
+		nbblks = XLOG_SECTOR_ROUNDUP_BBCOUNT(log, nbblks);
 	}
-	return xfs_buf_get_noaddr(BBTOB(num_bblks), log->l_mp->m_logdev_targp);
+	return xfs_buf_get_noaddr(BBTOB(nbblks), log->l_mp->m_logdev_targp);
 }
 
 void
@@ -102,6 +108,14 @@ xlog_bread(
 {
 	int		error;
 
+	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
+		xlog_warn("XFS: Invalid block length (0x%x) given for buffer",
+					nbblks);
+		XFS_ERROR_REPORT("xlog_bread(1)",
+					XFS_ERRLEVEL_HIGH, log->l_mp);
+		return EFSCORRUPTED;
+	}
+
 	if (log->l_sectbb_log) {
 		blk_no = XLOG_SECTOR_ROUNDDOWN_BLKNO(log, blk_no);
 		nbblks = XLOG_SECTOR_ROUNDUP_BBCOUNT(log, nbblks);
@@ -139,6 +153,14 @@ xlog_bwrite(
 {
 	int		error;
 
+	if (nbblks <= 0 || nbblks > log->l_logBBsize) {
+		xlog_warn("XFS: Invalid block length (0x%x) given for buffer",
+					nbblks);
+		XFS_ERROR_REPORT("xlog_bwrite(1)",
+					XFS_ERRLEVEL_HIGH, log->l_mp);
+		return EFSCORRUPTED;
+	}
+
 	if (log->l_sectbb_log) {
 		blk_no = XLOG_SECTOR_ROUNDDOWN_BLKNO(log, blk_no);
 		nbblks = XLOG_SECTOR_ROUNDUP_BBCOUNT(log, nbblks);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re: Corrupted XFS log replay oops.
  2009-01-23  0:02                         ` Dave Chinner
@ 2009-01-23  0:06                           ` Christoph Hellwig
  2009-01-23  6:20                             ` Felix Blyakher
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2009-01-23  0:06 UTC (permalink / raw)
  To: Christoph Hellwig, Nick Piggin, Eric Sesterhenn, Pavel Machek,
	Chris Mason, linux-kernel, npiggin, xfs

> [XFS] Check buffer lengths in log recovery
> 
> Before trying to obtain, read or write a buffer,
> check that the buffer length is actually valid. If
> it is not valid, then something read in the recovery
> process has been corrupted and we should abort
> recovery.
> 
> Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> Tested-by: Eric Sesterhenn <snakebyte@gmx.de>


Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re: Corrupted XFS log replay oops.
  2009-01-23  0:06                           ` Christoph Hellwig
@ 2009-01-23  6:20                             ` Felix Blyakher
  0 siblings, 0 replies; 14+ messages in thread
From: Felix Blyakher @ 2009-01-23  6:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Nick Piggin, Eric Sesterhenn, linux-kernel, xfs, Pavel Machek,
	npiggin, Chris Mason


On Jan 22, 2009, at 6:06 PM, Christoph Hellwig wrote:

>> [XFS] Check buffer lengths in log recovery
>>
>> Before trying to obtain, read or write a buffer,
>> check that the buffer length is actually valid. If
>> it is not valid, then something read in the recovery
>> process has been corrupted and we should abort
>> recovery.
>>
>> Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
>> Signed-off-by: Dave Chinner <david@fromorbit.com>
>> Tested-by: Eric Sesterhenn <snakebyte@gmx.de>
>
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Felix Blyakher <felixb@sgi.com>


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re: Corrupted XFS log replay oops.
  2009-01-22  4:37                     ` [PATCH] Re: Corrupted XFS log replay oops Dave Chinner
  2009-01-22  5:50                       ` Felix Blyakher
  2009-01-22  6:11                       ` Christoph Hellwig
@ 2009-02-03 20:48                       ` Eric Sandeen
  2 siblings, 0 replies; 14+ messages in thread
From: Eric Sandeen @ 2009-02-03 20:48 UTC (permalink / raw)
  To: Dave Chinner, xfs mailing list

Dave Chinner wrote:
> On Wed, Jan 21, 2009 at 03:03:06PM +1100, Nick Piggin wrote:
>> On Wednesday 21 January 2009 14:57:03 Dave Chinner wrote:
>>>> [  235.250167] ------------[ cut here ]------------
>>>> [  235.250354] kernel BUG at mm/vmalloc.c:164!
>>>> [  235.250478] invalid opcode: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
>>>> [  235.250869] last sysfs file: /sys/block/ram9/range
>>>> [  235.250998] Modules linked in:
> ......
>>>> [  235.251037] Call Trace:
>>>> [  235.251037]  [<c01414cf>] ? trace_hardirqs_on+0xb/0xd
>>>> [  235.251037]  [<c018367c>] ? vm_map_ram+0x36e/0x38a
>>>> [  235.251037]  [<c03b2e1e>] ? _xfs_buf_map_pages+0x42/0x6d
>>>> [  235.251037]  [<c03b3773>] ? xfs_buf_get_noaddr+0xbc/0x11f
>>>> [  235.251037]  [<c03a2406>] ? xlog_get_bp+0x5a/0x5d
>>>> [  235.251037]  [<c03a28fa>] ? xlog_find_verify_log_record+0x26/0x208
>>>> [  235.251037]  [<c03a3521>] ? xlog_find_zeroed+0x1d6/0x214
>>>> [  235.251037]  [<c03a3584>] ? xlog_find_head+0x25/0x358
>>> .....
>>>
>>> Ok, that's crashing in the new vmap code. It might take a couple
>>> of days before I get a chance to look at this, but I've cc'd Nick Piggin
>>> in case he has a chance to look at it before that. It's probably
>>> an XFS bug, anyway.
>> Hmm, it is crashing in BUG_ON(addr >= end); where this could happen
>> if XFS asks to map a really huge (or -ve) number of pages and wraps
>> the range, or if vmap subsystem returns an address right near the
>> end of the address range and addr+size wraps (which would be a bug
>> in vmap of course, but I think maybe less likely).
> 
> It's a zero length range, not a negative value. A debug XFS would
> have assert failed on it, but it was completely unchecked on
> production builds. The following patch checks the length of blocks
> to build/read/write for being valid. Instead of an oops, we get:

Dave, this patch seems like a candidate for 2.6.27-stable too, yes?

-Eric

> [ 1572.665001] XFS mounting filesystem loop0
> [ 1572.666942] XFS: Invalid block length (0x0) given for buffer
> [ 1572.667141] XFS: Log inconsistent (didn't find previous header)
> [ 1572.667141] XFS: empty log check failed
> [ 1572.667141] XFS: log mount/recovery failed: error 5
> [ 1572.671487] XFS: log mount failed
> 
> Cheers,
> 
> Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-02-03 21:19 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20090113142147.GE16333@alice>
     [not found] ` <1231857643.29164.28.camel@think.oraclecorp.com>
     [not found]   ` <20090113144307.GF16333@alice>
     [not found]     ` <20090118174035.GG1944@ucw.cz>
     [not found]       ` <20090120063150.GC5854@alice>
     [not found]         ` <20090120101119.GB10158@disturbed>
     [not found]           ` <20090120101503.GC17377@alice>
     [not found]             ` <20090120125944.GC10158@disturbed>
     [not found]               ` <20090120173455.GC21339@alice>
2009-01-21  3:57                 ` Corrupted XFS log replay oops. (was Re: Warning and BUG with btrfs and corrupted image) Dave Chinner
2009-01-21  4:03                   ` Dave Chinner
     [not found]                   ` <200901211503.07308.nickpiggin@yahoo.com.au>
2009-01-22  4:37                     ` [PATCH] Re: Corrupted XFS log replay oops Dave Chinner
2009-01-22  5:50                       ` Felix Blyakher
2009-01-22  6:11                       ` Christoph Hellwig
2009-01-22  8:35                         ` Eric Sesterhenn
2009-01-22 10:06                         ` Eric Sesterhenn
2009-01-22 23:37                           ` Dave Chinner
2009-01-23  1:10                             ` Dave Chinner
2009-01-22 23:35                         ` Dave Chinner
2009-01-23  0:02                         ` Dave Chinner
2009-01-23  0:06                           ` Christoph Hellwig
2009-01-23  6:20                             ` Felix Blyakher
2009-02-03 20:48                       ` Eric Sandeen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox