public inbox for linux-xfs@vger.kernel.org
* CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)
@ 2010-07-06 10:57 Shaun Adolphson
  2010-07-06 16:17 ` Stan Hoeppner
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Shaun Adolphson @ 2010-07-06 10:57 UTC
  To: xfs

Hi,

We have been able to repeatably produce xfs internal errors
(XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
to locally copy a 248Gig file off a usb drive formatted as NTFS to the
xfs drive. The copy gets about 96% of the way through and we get the
following messages:

Jun 28 22:14:46 terrorserver kernel: XFS internal error
XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
Caller 0xffffffff8837446f
Jun 28 22:14:46 terrorserver kernel:
Jun 28 22:14:46 terrorserver kernel: Call Trace:
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff8837c360>]
:xfs:xfs_bmbt_insert+0xac/0x13a
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff8837446f>]
:xfs:xfs_bmap_add_extent_delay_real+0x8cd/0x103a
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff88368cfa>]
:xfs:xfs_alloc_vextent+0x379/0x3ff
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff8837543a>]
:xfs:xfs_bmap_add_extent+0x1fb/0x390
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff88377f34>]
:xfs:xfs_bmapi+0x895/0xe79
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff88398ff2>]
:xfs:xfs_log_reserve+0xad/0xc9
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff88394082>]
:xfs:xfs_iomap_write_allocate+0x201/0x328
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff88394b09>]
:xfs:xfs_iomap+0x22a/0x2a5
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff883a9ae3>]
:xfs:xfs_map_blocks+0x2d/0x65
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff883aa723>]
:xfs:xfs_page_state_convert+0x2af/0x544
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff883aab04>]
:xfs:xfs_vm_writepage+0xa7/0xdf
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff800cae35>]
shrink_inactive_list+0x3fd/0x8d8
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff8001311b>]
shrink_zone+0x127/0x18d
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff80057e60>] kswapd+0x323/0x46c
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff800a0abe>]
autoremove_wake_function+0x0/0x2e
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff800a08a6>]
keventd_create_kthread+0x0/0xc4
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff80057b3d>] kswapd+0x0/0x46c
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff800a08a6>]
keventd_create_kthread+0x0/0xc4
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff80032894>] kthread+0xfe/0x132
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff8009d734>]
request_module+0x0/0x14d
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff800a08a6>]
keventd_create_kthread+0x0/0xc4
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff80032796>] kthread+0x0/0x132
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
Jun 28 22:14:46 terrorserver kernel:
Jun 28 22:14:46 terrorserver kernel: Filesystem "dm-0": XFS internal
error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c.
Caller 0xffffffff88394186
Jun 28 22:14:46 terrorserver kernel:
Jun 28 22:14:46 terrorserver kernel: Call Trace:
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff883a1b37>]
:xfs:xfs_trans_cancel+0x55/0xfa
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff88394186>]
:xfs:xfs_iomap_write_allocate+0x305/0x328
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff88394b09>]
:xfs:xfs_iomap+0x22a/0x2a5
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff883a9ae3>]
:xfs:xfs_map_blocks+0x2d/0x65
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff883aa723>]
:xfs:xfs_page_state_convert+0x2af/0x544
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff883aab04>]
:xfs:xfs_vm_writepage+0xa7/0xdf
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff800cae35>]
shrink_inactive_list+0x3fd/0x8d8
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff8001311b>]
shrink_zone+0x127/0x18d
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff80057e60>] kswapd+0x323/0x46c
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff800a0abe>]
autoremove_wake_function+0x0/0x2e
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff800a08a6>]
keventd_create_kthread+0x0/0xc4
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff80057b3d>] kswapd+0x0/0x46c
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff800a08a6>]
keventd_create_kthread+0x0/0xc4
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff80032894>] kthread+0xfe/0x132
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff8009d734>]
request_module+0x0/0x14d
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff800a08a6>]
keventd_create_kthread+0x0/0xc4
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff80032796>] kthread+0x0/0x132
Jun 28 22:14:46 terrorserver kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
Jun 28 22:14:46 terrorserver kernel:
Jun 28 22:14:46 terrorserver kernel: xfs_force_shutdown(dm-0,0x8)
called from line 1165 of file fs/xfs/xfs_trans.c.  Return address =
0xffffffff883a1b50
Jun 28 22:14:46 terrorserver kernel: Filesystem "dm-0": Corruption of
in-memory data detected.  Shutting down filesystem: dm-0
Jun 28 22:14:46 terrorserver kernel: Please umount the filesystem, and
rectify the problem(s)
Jun 28 22:14:47 terrorserver kernel: Filesystem "dm-0": xfs_log_force:
error 5 returned.

We have reproduced the condition 3 times and each time we have been
able to remount the drive ( to replay the transaction log ) and then
perform an xfs_repair.

We are just using cp to copy the file.

Some further details about the system:

Software:
- Fresh install of CentOS 5.5 64bit all patches up to date
- Kernel 2.6.18-194.3.1.el5.centos.plus

RAID Hardware:
- 3ware 9650SE 12-port SATA controller
- 6 x 1.5TB disks in a RAID 5 (sde)
- 6 x 2.0TB disks in a RAID 5 (sdf)

Configuration
- LVM across sde and sdf
- Formatted as XFS ( ~16 TB )

Any guidance on resolving this issue would be much appreciated. I am
able to provide any other information that is required.

Thanks for any assistance you can provide.
Regards,
Shaun

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)
  2010-07-06 10:57 CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO) Shaun Adolphson
@ 2010-07-06 16:17 ` Stan Hoeppner
  2010-07-06 22:00 ` Shaun Adolphson
  2010-07-06 23:18 ` Dave Chinner
  2 siblings, 0 replies; 11+ messages in thread
From: Stan Hoeppner @ 2010-07-06 16:17 UTC
  To: xfs

Shaun Adolphson put forth on 7/6/2010 5:57 AM:

> Software:
> - Fresh install of CentOS 5.5 64bit all patches up to date
> - Kernel 2.6.18-194.3.1.el5.centos.plus

First thing that comes to mind is the fact that 2.6.18 is about 3 years old.
I'm not familiar with CentOS patching policies, but even if you've received
and applied some XFS patches, they'd still probably be rather old.

If CentOS 5.5 doesn't have a much newer kernel package available, say 2.6.30
or later, if I were you, I'd grab the kernel source from kernel.org and roll
your own 2.6.33.6 or 2.6.34.1.

-- 
Stan


* Re: Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)
  2010-07-06 10:57 CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO) Shaun Adolphson
  2010-07-06 16:17 ` Stan Hoeppner
@ 2010-07-06 22:00 ` Shaun Adolphson
  2010-07-06 23:18 ` Dave Chinner
  2 siblings, 0 replies; 11+ messages in thread
From: Shaun Adolphson @ 2010-07-06 22:00 UTC
  To: xfs


>> Software:
>> - Fresh install of CentOS 5.5 64bit all patches up to date
>> - Kernel 2.6.18-194.3.1.el5.centos.plus
>
> First thing that comes to mind is the fact that 2.6.18 is about 3 years old.
> I'm not familiar with CentOS patching policies, but even if you've received
> and applied some XFS patches, they'd still probably be rather old.
>
> If CentOS 5.5 doesn't have a much newer kernel package available, say 2.6.30
> or later, if I were you, I'd grab the kernel source from kernel.org and roll
> your own 2.6.33.6 or 2.6.34.1.


Currently that is the latest and greatest kernel available on CentOS. So we
may need to investigate rolling our own kernel if there are no configuration
changes that we can make to resolve the problem.

Thanks,

Shaun


* Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)
  2010-07-06 10:57 CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO) Shaun Adolphson
  2010-07-06 16:17 ` Stan Hoeppner
  2010-07-06 22:00 ` Shaun Adolphson
@ 2010-07-06 23:18 ` Dave Chinner
  2010-07-07  1:51   ` Eric Sandeen
  2010-07-08 11:21   ` Shaun Adolphson
  2 siblings, 2 replies; 11+ messages in thread
From: Dave Chinner @ 2010-07-06 23:18 UTC
  To: Shaun Adolphson; +Cc: xfs

On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
> Hi,
> 
> We have been able to repeatably produce xfs internal errors
> (XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
> to locally copy a 248Gig file off a usb drive formatted as NTFS to the
> xfs drive. The copy gets about 96% of the way through and we get the
> following messages:
> 
> Jun 28 22:14:46 terrorserver kernel: XFS internal error
> XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
> Caller 0xffffffff8837446f

Interesting. That's a corrupted inode extent btree - I haven't seen
one of them for a long while. Were there any errors (like IO errors)
reported before this?

However, the first step is to determine if the error is on disk or an
in-memory error. Can you post output of:

	- xfs_info <mntpt>
	- xfs_repair -n after a shutdown

Can you upgrade xfsprogs (i.e. xfs_repair) to the latest version
(3.1.2) before you do this as well?

> We have reproduced the condition 3 times and each time we have been
> able to remount the drive ( to replay the transaction log ) and then
> perform an xfs_repair.
> 
> We are just using cp to copy the file.
> 
> Some further details about the system:
> 
> Software:
> - Fresh install of CentOS 5.5 64bit all patches up to date
> - Kernel 2.6.18-194.3.1.el5.centos.plus

I've got no idea exactly what version of XFS that has in it, so I
can't say off the top of my head whether this is a fixed bug or not.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)
  2010-07-06 23:18 ` Dave Chinner
@ 2010-07-07  1:51   ` Eric Sandeen
  2010-07-08 11:21   ` Shaun Adolphson
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Sandeen @ 2010-07-07  1:51 UTC
  To: Dave Chinner; +Cc: Shaun Adolphson, xfs@oss.sgi.com

On Jul 6, 2010, at 6:18 PM, Dave Chinner <david@fromorbit.com> wrote:

> On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
>> .
>> 
>> Some further details about the system:
>> 
>> Software:
>> - Fresh install of CentOS 5.5 64bit all patches up to date
>> - Kernel 2.6.18-194.3.1.el5.centos.plus
> 
> I've got no idea exactly what version of XFS that has in it, so I
> can't say off the top of my head whether this is a fixed bug or not.
> 
Assuming it's what is in the real RHEL it's 2.6.28.6 and a few patches IIRC, rpm changelog should say.

-Eric

> Cheers,
> 
> Dave.
> 


* Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)
  2010-07-06 23:18 ` Dave Chinner
  2010-07-07  1:51   ` Eric Sandeen
@ 2010-07-08 11:21   ` Shaun Adolphson
  2010-07-11 11:44     ` Shaun Adolphson
  1 sibling, 1 reply; 11+ messages in thread
From: Shaun Adolphson @ 2010-07-08 11:21 UTC
  To: Dave Chinner; +Cc: xfs

On Wed, Jul 7, 2010 at 9:18 AM, Dave Chinner <david@fromorbit.com> wrote:
>
> On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
> > Hi,
> >
> > We have been able to repeatably produce xfs internal errors
> > (XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
> > to locally copy a 248Gig file off a usb drive formatted as NTFS to the
> > xfs drive. The copy gets about 96% of the way through and we get the
> > following messages:
> >
> > Jun 28 22:14:46 terrorserver kernel: XFS internal error
> > XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
> > Caller 0xffffffff8837446f
>
> Interesting. That's a corrupted inode extent btree - I haven't seen
> one of them for a long while. Were there any errors (like IO errors)
> reported before this?
>
> However, the first step is to determine if the error is on disk or an
> in-memory error. Can you post output of:
>
>        - xfs_info <mntpt>
>        - xfs_repair -n after a shutdown
>
> Can you upgrade xfsprogs (i.e. xfs_repair) to the latest version
> (3.1.2) before you do this as well?

We have upgraded xfsprogs to 3.1.2 and are in the process of
collecting the required information.

>
> > We have reproduced the condition 3 times and each time we have been
> > able to remount the drive ( to replay the transaction log ) and then
> > perform an xfs_repair.
> >
> > We are just using cp to copy the file.
> >
> > Some further details about the system:
> >
> > Software:
> > - Fresh install of CentOS 5.5 64bit all patches up to date
> > - Kernel 2.6.18-194.3.1.el5.centos.plus
>
> I've got no idea exactly what version of XFS that has in it, so I
> can't say off the top of my head whether this is a fixed bug or not.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com



During other testing we have also been able to reproduce the issue by
copying a self-generated 248Gig file from another system disk to the
XFS disk. The file was generated using dd with an input of /dev/zero.

All the existing data (~6TB) was successfully copied onto the storage
without hitting the error. The thing to note is that all the existing
files are much smaller than the one that we are trying to copy in
(248Gig). And since we started having the shutdowns we have copied
many smaller files (files < 30Gig in size) onto the storage area
without issue.


* Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)
  2010-07-08 11:21   ` Shaun Adolphson
@ 2010-07-11 11:44     ` Shaun Adolphson
  2010-07-11 11:47       ` Shaun Adolphson
  2010-07-12  1:08       ` Dave Chinner
  0 siblings, 2 replies; 11+ messages in thread
From: Shaun Adolphson @ 2010-07-11 11:44 UTC
  To: Dave Chinner; +Cc: xfs

On Thu, Jul 8, 2010 at 9:21 PM, Shaun Adolphson <shaun@adolphson.net> wrote:
> On Wed, Jul 7, 2010 at 9:18 AM, Dave Chinner <david@fromorbit.com> wrote:
>>
>> On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
>> > Hi,
>> >
>> > We have been able to repeatably produce xfs internal errors
>> > (XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
>> > to locally copy a 248Gig file off a usb drive formatted as NTFS to the
>> > xfs drive. The copy gets about 96% of the way through and we get the
>> > following messages:
>> >
>> > Jun 28 22:14:46 terrorserver kernel: XFS internal error
>> > XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
>> > Caller 0xffffffff8837446f
>>
>> Interesting. That's a corrupted inode extent btree - I haven't seen
>> one of them for a long while. Were there any errors (like IO errors)
>> reported before this?
>>
>> However, the first step is to determine if the error is on disk or an
>> in-memory error. Can you post output of:
>>
>>        - xfs_info <mntpt>

meta-data=/dev/TerrorVolume/terror isize=256    agcount=130385, agsize=32768 blks
         =                       sectsz=512   attr=1
data     =                       bsize=4096   blocks=4272433152, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=2560, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0


>>        - xfs_repair -n after a shutdown

The output of xfs_repair -n is 6MB; below is the condensed
version. I can post the whole output if required.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
       - agno = 0
.
.
.
       - agno = 130384
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.




>>
>> Can you upgrade xfsprogs (i.e. xfs_repair) to the latest version
>> (3.1.2) before you do this as well?

# xfs_repair -V
xfs_repair version 3.1.2


>
> We have upgraded xfsprogs to 3.1.2 and are in the process of
> collecting the required information.
>
>>
>> > We have reproduced the condition 3 times and each time we have been
>> > able to remount the drive ( to replay the transaction log ) and then
>> > perform an xfs_repair.
>> >
>> > We are just using cp to copy the file.
>> >
>> > Some further details about the system:
>> >
>> > Software:
>> > - Fresh install of CentOS 5.5 64bit all patches up to date
>> > - Kernel 2.6.18-194.3.1.el5.centos.plus
>>
>> I've got no idea exactly what version of XFS that has in it, so I
>> can't say off the top of my head whether this is a fixed bug or not.
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@fromorbit.com
>
>
>
> During other testing we have also been able to reproduce the issue by
> copying a self-generated 248Gig file from another system disk to the
> XFS disk. The file was generated using dd with an input of /dev/zero.
>
> All the existing data (~6TB) was successfully copied onto the storage
> without hitting the error. The thing to note is that all the existing
> files are much smaller than the one that we are trying to copy in
> (248Gig). And since we started having the shutdowns we have copied
> many smaller files (files < 30Gig in size) onto the storage area
> without issue.
>

Thanks,

Shaun


* Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)
  2010-07-11 11:44     ` Shaun Adolphson
@ 2010-07-11 11:47       ` Shaun Adolphson
  2010-07-12  1:08       ` Dave Chinner
  1 sibling, 0 replies; 11+ messages in thread
From: Shaun Adolphson @ 2010-07-11 11:47 UTC
  To: Dave Chinner; +Cc: xfs

On Sun, Jul 11, 2010 at 9:44 PM, Shaun Adolphson <shaun@adolphson.net> wrote:
> On Thu, Jul 8, 2010 at 9:21 PM, Shaun Adolphson <shaun@adolphson.net> wrote:
>> On Wed, Jul 7, 2010 at 9:18 AM, Dave Chinner <david@fromorbit.com> wrote:
>>>
>>> On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
>>> > Hi,
>>> >
>>> > We have been able to repeatably produce xfs internal errors
>>> > (XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
>>> > to locally copy a 248Gig file off a usb drive formatted as NTFS to the
>>> > xfs drive. The copy gets about 96% of the way through and we get the
>>> > following messages:
>>> >
>>> > Jun 28 22:14:46 terrorserver kernel: XFS internal error
>>> > XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
>>> > Caller 0xffffffff8837446f
>>>
>>> Interesting. That's a corrupted inode extent btree - I haven't seen
>>> one of them for a long while. Were there any errors (like IO errors)
>>> reported before this?
>>>
>>> However, the first step is to determine if the error is on disk or an
>>> in-memory error. Can you post output of:
>>>
>>>        - xfs_info <mntpt>
>
> meta-data=/dev/TerrorVolume/terror isize=256    agcount=130385,
> agsize=32768 blks
>              =                      sectsz=512   attr=1
> data        =                      bsize=4096   blocks=4272433152, imaxpct=25
>              =                      sunit=0      swidth=0 blks
> naming   =version 2         bsize=4096   ascii-ci=0
> log         =internal            bsize=4096   blocks=2560, version=1
>             =                       sectsz=512   sunit=0 blks, lazy-count=0
> realtime  =none               extsz=4096   blocks=0, rtextents=0
>
>
>>>        - xfs_repair -n after a shutdown
>
> The output of xfs_repair -n is 6MB; below is the condensed
> version. I can post the whole output if required.
>
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>        - scan filesystem freespace and inode maps...
>        - found root inode chunk
> Phase 3 - for each AG...
>        - scan (but don't clear) agi unlinked lists...
>        - process known inodes and perform inode discovery...
>       - agno = 0
> .
> .
> .
>       - agno = 130384
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
.
.
.
       - agno = 130384
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>        - traversing filesystem ...
>        - traversal finished ...
>        - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
>
>
>
>
>>>
>>> Can you upgrade xfsprogs (i.e. xfs_repair) to the latest version
>>> (3.1.2) before you do this as well?
>
> # xfs_repair -V
> xfs_repair version 3.1.2
>
>
>>
>> We have upgraded xfsprogs to 3.1.2 and are in the process of
>> collecting the required information.
>>
>>>
>>> > We have reproduced the condition 3 times and each time we have been
>>> > able to remount the drive ( to replay the transaction log ) and then
>>> > perform an xfs_repair.
>>> >
>>> > We are just using cp to copy the file.
>>> >
>>> > Some further details about the system:
>>> >
>>> > Software:
>>> > - Fresh install of CentOS 5.5 64bit all patches up to date
>>> > - Kernel 2.6.18-194.3.1.el5.centos.plus
>>>
>>> I've got no idea exactly what version of XFS that has in it, so I
>>> can't say off the top of my head whether this is a fixed bug or not.
>>>
>>> Cheers,
>>>
>>> Dave.
>>> --
>>> Dave Chinner
>>> david@fromorbit.com
>>
>>
>>
>> During other testing we have also been able to reproduce the issue by
>> copying a self-generated 248Gig file from another system disk to the
>> XFS disk. The file was generated using dd with an input of /dev/zero.
>>
>> All the existing data (~6TB) was successfully copied onto the storage
>> without hitting the error. The thing to note is that all the existing
>> files are much smaller than the one that we are trying to copy in
>> (248Gig). And since we started having the shutdowns we have copied
>> many smaller files (files < 30Gig in size) onto the storage area
>> without issue.
>>
>
> Thanks,
>
> Shaun
>


* Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)
  2010-07-11 11:44     ` Shaun Adolphson
  2010-07-11 11:47       ` Shaun Adolphson
@ 2010-07-12  1:08       ` Dave Chinner
  2010-07-12  5:45         ` Dave Chinner
  1 sibling, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2010-07-12  1:08 UTC
  To: Shaun Adolphson; +Cc: xfs

On Sun, Jul 11, 2010 at 09:44:07PM +1000, Shaun Adolphson wrote:
> On Thu, Jul 8, 2010 at 9:21 PM, Shaun Adolphson <shaun@adolphson.net> wrote:
> > On Wed, Jul 7, 2010 at 9:18 AM, Dave Chinner <david@fromorbit.com> wrote:
> >>
> >> On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
> >> > Hi,
> >> >
> >> > We have been able to repeatably produce xfs internal errors
> >> > (XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
> >> > to locally copy a 248Gig file off a usb drive formatted as NTFS to the
> >> > xfs drive. The copy gets about 96% of the way through and we get the
> >> > following messages:
> >> >
> >> > Jun 28 22:14:46 terrorserver kernel: XFS internal error
> >> > XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
> >> > Caller 0xffffffff8837446f
> >>
> >> Interesting. That's a corrupted inode extent btree - I haven't seen
> >> one of them for a long while. Were there any errors (like IO errors)
> >> reported before this?
> >>
> >> However, the first step is to determine if the error is on disk or an
> >> in-memory error. Can you post output of:
> >>
> >>        - xfs_info <mntpt>
> 
> meta-data=/dev/TerrorVolume/terror isize=256    agcount=130385,
> agsize=32768 blks
>               =                      sectsz=512   attr=1
> data        =                      bsize=4096   blocks=4272433152, imaxpct=25
>               =                      sunit=0      swidth=0 blks
> naming   =version 2         bsize=4096   ascii-ci=0
> log         =internal            bsize=4096   blocks=2560, version=1
>              =                       sectsz=512   sunit=0 blks, lazy-count=0
> realtime  =none               extsz=4096   blocks=0, rtextents=0

Why did you make this filesystem with 128MB allocation groups? The
default for a filesystem of this size is 1TB allocation groups.
More than 100k allocation groups will certainly push internal AG
scanning scalability past its tested limits....

Also, a log of 10MB is rather small, and it tells me that you didn't
just create this filesystem directly on the 16TB block device with a
recent mkfs.xfs. That is, at current mkfs.xfs defaults to get a layout like
this you'd have to start with a 512MB filesystem and grow it to
16TB.

Growing a filesystem by 3-4 orders of magnitude does not result in a
particularly sane filesystem layout and pushes it way outside
configurations that are regularly tested.  I strongly suggest you
rebuild this filesystem with a default layout from a recent mkfs.xfs
before going any further....
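
The sizes referred to here follow directly from the xfs_info output quoted
earlier in this message. A minimal sketch of that arithmetic in Python, with
the values copied from the quoted output (the variable names are only
illustrative):

```python
import math

# Values copied from the xfs_info output quoted in this thread.
BLOCK_SIZE = 4096          # bsize: bytes per filesystem block
AG_BLOCKS = 32768          # agsize: blocks per allocation group
DATA_BLOCKS = 4272433152   # blocks: data blocks in the filesystem
LOG_BLOCKS = 2560          # blocks: size of the internal log

MIB = 1024 ** 2
TIB = 1024 ** 4

ag_size_mib = AG_BLOCKS * BLOCK_SIZE // MIB
log_size_mib = LOG_BLOCKS * BLOCK_SIZE // MIB
fs_size_tib = DATA_BLOCKS * BLOCK_SIZE / TIB
# The last allocation group may be only partly filled, so round up.
ag_count = math.ceil(DATA_BLOCKS / AG_BLOCKS)

print(f"AG size:  {ag_size_mib} MiB")
print(f"Log size: {log_size_mib} MiB")
print(f"FS size:  {fs_size_tib:.2f} TiB")
print(f"AG count: {ag_count}")
```

The computed AG count matches the agcount=130385 that xfs_info reports.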

> >>        - xfs_repair -n after a shutdown
> 
> The output of xfs_repair -n is 6MB; below is the condensed
> version. I can post the whole output if required.

If there were no errors, then I don't need to see it. However, if
you trimmed errors out or you don't know what errors look like, then
I need to see the whole output...
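
One rough way to check a multi-megabyte xfs_repair -n log for anything other
than progress messages is sketched below in Python. The list of "routine"
patterns is an assumption taken only from the condensed output posted in this
thread, not from the xfs_repair documentation, so whatever the filter prints
still has to be read by a person, and an empty result is not proof of a clean
filesystem.

```python
import re

# Patterns for routine progress lines, taken only from the condensed
# output posted in this thread (an assumption, not an official list).
ROUTINE = [
    re.compile(r"^Phase \d+ - "),
    re.compile(r"^\s*- agno = \d+$"),
    re.compile(r"^\s*- (scan|found|process|setting up|check for|"
               r"traversing|traversal|moving) "),
    re.compile(r"^No modify flag set, "),
]


def unusual_lines(lines):
    """Return the non-blank lines that match none of the routine patterns."""
    kept = []
    for line in lines:
        text = line.rstrip("\n")
        if not text.strip():
            continue
        if any(pattern.search(text) for pattern in ROUTINE):
            continue
        kept.append(text)
    return kept


sample = [
    "Phase 3 - for each AG...",
    "        - scan (but don't clear) agi unlinked lists...",
    "        - agno = 0",
    "        - agno = 130384",
    "No modify flag set, skipping phase 5",
    "an unexpected line that is not a progress message",  # hypothetical
]
print(unusual_lines(sample))
```

To run it over a saved log, pass the open file object to unusual_lines();
the function accepts any iterable of lines.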

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)
  2010-07-12  1:08       ` Dave Chinner
@ 2010-07-12  5:45         ` Dave Chinner
  2010-08-16 10:32           ` Shaun Adolphson
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2010-07-12  5:45 UTC
  To: Shaun Adolphson; +Cc: xfs

On Mon, Jul 12, 2010 at 11:08:32AM +1000, Dave Chinner wrote:
> On Sun, Jul 11, 2010 at 09:44:07PM +1000, Shaun Adolphson wrote:
> > On Thu, Jul 8, 2010 at 9:21 PM, Shaun Adolphson <shaun@adolphson.net> wrote:
> > > On Wed, Jul 7, 2010 at 9:18 AM, Dave Chinner <david@fromorbit.com> wrote:
> > >>
> > >> On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
> > >> > Hi,
> > >> >
> > >> > We have been able to repeatably produce xfs internal errors
> > >> > (XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
> > >> > to locally copy a 248Gig file off a usb drive formatted as NTFS to the
> > >> > xfs drive. The copy gets about 96% of the way through and we get the
> > >> > following messages:
> > >> >
> > >> > Jun 28 22:14:46 terrorserver kernel: XFS internal error
> > >> > XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
> > >> > Caller 0xffffffff8837446f
> > >>
> > >> Interesting. That's a corrupted inode extent btree - I haven't seen
> > >> one of them for a long while. Were there any errors (like IO errors)
> > >> reported before this?
> > >>
> > >> However, the first step is to determine if the error is on disk or an
> > >> in-memory error. Can you post output of:
> > >>
> > >>        - xfs_info <mntpt>
> > 
> > meta-data=/dev/TerrorVolume/terror isize=256    agcount=130385,
> > agsize=32768 blks
> >               =                      sectsz=512   attr=1
> > data        =                      bsize=4096   blocks=4272433152, imaxpct=25
> >               =                      sunit=0      swidth=0 blks
> > naming   =version 2         bsize=4096   ascii-ci=0
> > log         =internal            bsize=4096   blocks=2560, version=1
> >              =                       sectsz=512   sunit=0 blks, lazy-count=0
> > realtime  =none               extsz=4096   blocks=0, rtextents=0
> 
> Why did you make this filesystem with 128MB allocation groups? The
> default for a filesystem of this size is 1TB allocation groups.
> More than 100k allocation groups will certainly push internal AG
> scanning scalability past its tested limits....
> 
> Also, a log of 10MB is rather small, and it tells me that you didn't
> just create this filesystem directly on the 16TB block device with a
> recent mkfs.xfs. That is, at current mkfs.xfs defaults to get a layout like
> this you'd have to start with a 512MB filesystem and grow it to
> 16TB.

Actually, an old mkfs that defaults to 16 AGs and a filesystem size
of 2GB is needed to get a log of 2560 blocks. I just grew one of
these to roughly 16TB and ended up with 125,000 AGs, so it's in the
ballpark. Also, *allocating* 250GB to a single file (as
preallocation) doesn't appear to have any problems on 2.6.35-rc4, so
there doesn't appear to be any general error caused by this
configuration in mainline....
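
The grow arithmetic can be sketched as follows in Python. It assumes, as
described above, a filesystem that started at 2GB with 16 AGs, and that
growing keeps the AG size fixed and only adds more AGs. Sizes are treated as
exact powers of two, which is an approximation.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

initial_size = 2 * GIB   # assumed original filesystem size
initial_ags = 16         # old mkfs.xfs default mentioned above
grown_size = 16 * TIB    # approximate size after growing

# The AG size is fixed when the filesystem is made; growing adds AGs.
ag_size = initial_size // initial_ags
ags_after_grow = -(-grown_size // ag_size)   # ceiling division

print(f"AG size: {ag_size // 1024 ** 2} MiB")
print(f"AGs after growing to 16 TiB: {ags_after_grow}")
```

That lands in the same range as the 130385 AGs reported by xfs_info and the
roughly 125,000 seen in the test grow.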

Can you run this command:

# xfs_io -f -c "truncate 250g" -c "resvsp 0 250g" <test file>

And see if that generates the same corruption as copying a file?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)
  2010-07-12  5:45         ` Dave Chinner
@ 2010-08-16 10:32           ` Shaun Adolphson
  0 siblings, 0 replies; 11+ messages in thread
From: Shaun Adolphson @ 2010-08-16 10:32 UTC
  To: Dave Chinner; +Cc: xfs

On Mon, Jul 12, 2010 at 3:45 PM, Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Jul 12, 2010 at 11:08:32AM +1000, Dave Chinner wrote:
> > On Sun, Jul 11, 2010 at 09:44:07PM +1000, Shaun Adolphson wrote:
> > > On Thu, Jul 8, 2010 at 9:21 PM, Shaun Adolphson <shaun@adolphson.net> wrote:
> > > > On Wed, Jul 7, 2010 at 9:18 AM, Dave Chinner <david@fromorbit.com> wrote:
> > > >>
> > > >> On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
> > > >> > Hi,
> > > >> >
> > > >> > We have been able to repeatably produce xfs internal errors
> > > >> > (XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
> > > >> > to locally copy a 248Gig file off a usb drive formated as NTFS to the
> > > >> > xfs drive. The copy gets about 96% of the way through and we get the
> > > >> > following messages:
> > > >> >
> > > >> > Jun 28 22:14:46 terrorserver kernel: XFS internal error
> > > >> > XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
> > > >> > Caller 0xffffffff8837446f
> > > >>
> > > >> Interesting. That's a corrupted inode extent btree - I haven't seen
> > > >> one of them for a long while. Were there any errors (like IO errors)
> > > >> reported before this?
> > > >>
> > > >> However, the first step is to determine if the error is on disk or an
> > > >> in-memory error. Can you post output of:
> > > >>
> > > >>        - xfs_info <mntpt>
> > >
> > > meta-data=/dev/TerrorVolume/terror isize=256    agcount=130385,
> > > agsize=32768 blks
> > >               =                      sectsz=512   attr=1
> > > data        =                      bsize=4096   blocks=4272433152, imaxpct=25
> > >               =                      sunit=0      swidth=0 blks
> > > naming   =version 2         bsize=4096   ascii-ci=0
> > > log         =internal            bsize=4096   blocks=2560, version=1
> > >              =                       sectsz=512   sunit=0 blks, lazy-count=0
> > > realtime  =none               extsz=4096   blocks=0, rtextents=0
> >
> > Why did you make this filesystem with 128MB allocation groups? The
> > default for a filesystem of this size is 1TB allocation groups.
> > More than 100k allocation groups will certainly push internal AG
> > scanning scalability past its tested limits....
> >
> > Also, a log of 10MB is rather small, and it tells me that you didn't
> > just create this filesystem directly on the 16TB block device with a
> > recent mkfs.xfs. That is, with current mkfs.xfs defaults, to get a layout
> > like this you'd have to start with a 512MB filesystem and grow it to
> > 16TB.
>
> Actually, an old mkfs that defaults to 16 AGs and a filesystem size
> of 2GB is needed to get a log of 2540 blocks. I just grew one of
> these to roughly 16TB and ended up with 125,000 AGs, so it's in the
> ballpark. Also, *allocating* 250GB to a single file (as
> preallocation) doesn't appear to have any problems on 2.6.35-rc4, so
> there doesn't appear to be any general error caused by this
> configuration in mainline....
>
> Can you run this command:
>
> # xfs_io -f -c "truncate 250g" -c "resvsp 0 250g" <test file>
>
> And see if that generates the same corruption as copying a file?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

Hi David,

After many weeks of planning to move all the data off and back on
again we are happy to say that the partition is now working as
expected.

In the end we managed to back up all the data on our partition and
re-created it using the mkfs.xfs default options. This time we have 16
allocation groups, as you suggested we should have.

It appears that the original partition was grown from an extremely
small size to have created that many allocation groups.

I would like to thank the xfs mailing list for all its help.

Regards,

Shaun

Thread overview: 11+ messages
2010-07-06 10:57 CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO) Shaun Adolphson
2010-07-06 16:17 ` Stan Hoeppner
2010-07-06 22:00 ` Shaun Adolphson
2010-07-06 23:18 ` Dave Chinner
2010-07-07  1:51   ` Eric Sandeen
2010-07-08 11:21   ` Shaun Adolphson
2010-07-11 11:44     ` Shaun Adolphson
2010-07-11 11:47       ` Shaun Adolphson
2010-07-12  1:08       ` Dave Chinner
2010-07-12  5:45         ` Dave Chinner
2010-08-16 10:32           ` Shaun Adolphson