public inbox for linux-xfs@vger.kernel.org
* Seg fault during xfs repair (segmentation fault / segv)
From: Jesse Stroik @ 2009-06-30 18:29 UTC (permalink / raw)
  To: xfs

I have a server with a ~20TB xfs file system on Linux 
(2.6.18-92.1.22.el5) and am running xfsprogs-2.9.4-4.el5.  We had a few 
corrupted files which I believe were due to a SCSI issue after a recent 
power outage.  Due to the corruption, I ran xfs_check and would like to 
run xfs_repair on the system.

The server also hosts several larger file systems that are much fuller 
and have had no issues, so I doubt the problem is related to the amount 
of data on the file system.

I found a thread on this list which addresses a similar problem, but the 
issues are different enough that I believe this warrants another thread:

http://oss.sgi.com/archives/xfs/2009-06/msg00089.html

Here is a description of my issue, as output by xfs_repair.

xfs_repair -n executes properly.  However, xfs_repair without -n 
segfaults at the end of phase 2, right before entering phase 3.  The 
output follows:


xfs_repair
-------------------
...
primary/secondary superblock 31 conflict - AG superblock geometry info 
conflicts with filesystem geometry
bad magic # 0xea84a85b for agf 31
bad version # -1440567846 for agf 31
bad sequence # -1237535942 for agf 31
bad length -834374160 for agf 31, should be 152534208
flfirst -568402337 in agf 31 too large (max = 128)
fllast 258143670 in agf 31 too large (max = 128)
bad magic # 0x2b8d5c56 for agi 31
bad version # -1456306498 for agi 31
bad sequence # 830698397 for agi 31
bad length # 1972157355 for agi 31, should be 152534208
reset bad sb for ag 31
reset bad agf for ag 31
reset bad agi for ag 31
Segmentation fault
-------------------


xfs_repair -n
-------------------
...
primary/secondary superblock 31 conflict - AG superblock geometry info 
conflicts with filesystem geometry
bad flags field in superblock 31
bad shared version number in superblock 31
bad inode alignment field in superblock 31
bad stripe unit/width fields in superblock 31
bad log/data device sector size fields in superblock 31
bad magic # 0xea84a85b for agf 31
bad version # -1440567846 for agf 31
bad sequence # -1237535942 for agf 31
bad length -834374160 for agf 31, should be 152534208
flfirst -568402337 in agf 31 too large (max = 128)
fllast 258143670 in agf 31 too large (max = 128)
bad magic # 0x2b8d5c56 for agi 31
bad version # -1456306498 for agi 31
bad sequence # 830698397 for agi 31
bad length # 1972157355 for agi 31, should be 152534208
would reset bad sb for ag 31
would reset bad agf for ag 31
would reset bad agi for ag 31
bad uncorrected agheader 31, skipping ag...
         - found root inode chunk
Phase 3 - for each AG...
         - scan (but don't clear) agi unlinked lists...
error following ag 8 unlinked list
error following ag 10 unlinked list
-------------------
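
A note on reading the agf 31 lines above: the sketch below, which assumes
the header fields are 32-bit values printed as signed integers, shows why
random data in a header comes out as a bad magic plus large negative
numbers.  The helper names (as_signed32, agf_magic_ok, self_check) are
hypothetical and not taken from xfsprogs; the only real constant is the
AGF magic, which is ASCII "XAGF".

```c
#include <assert.h>
#include <stdint.h>

#define AGF_MAGIC 0x58414746u   /* ASCII "XAGF" */

/*
 * Reinterpret a 32-bit pattern as a signed value without relying on
 * implementation-defined unsigned-to-signed conversion.
 */
static int32_t as_signed32(uint32_t v)
{
    if (v <= (uint32_t)INT32_MAX)
        return (int32_t)v;
    return (int32_t)((int64_t)v - INT64_C(4294967296));
}

/* Returns 1 if the decoded magic matches the AGF magic, 0 otherwise. */
static int agf_magic_ok(uint32_t magic)
{
    return magic == AGF_MAGIC;
}

/* Checks the two values from the log above (illustration only). */
static void self_check(void)
{
    /* 0xea84a85b is the magic reported for agf 31; it is not "XAGF". */
    assert(!agf_magic_ok(0xea84a85bu));
    /* The bit pattern 0xaa22adda, read as signed, is the reported version. */
    assert(as_signed32(0xaa22addau) == -1440567846);
}
```

Read this way, the negative version, sequence and length values are
consistent with the header block holding unrelated data; they do not look
like a lightly damaged AGF.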


I try to avoid hand-compiled system tools, but in this case I would 
make an exception if it is necessary to either diagnose or address this 
problem.

Best,
Jesse Stroik

Space Science and Engineering Center
University of Wisconsin

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Seg fault during xfs repair (segmentation fault / segv)
From: Eric Sandeen @ 2009-06-30 18:41 UTC (permalink / raw)
  To: Jesse Stroik; +Cc: xfs

Jesse Stroik wrote:
> I have a server with a ~20TB xfs file system on Linux 
> (2.6.18-92.1.22.el5) and am running xfsprogs-2.9.4-4.el5.  We had a few 
> corrupted files which I believe were due to a SCSI issue after a recent 
> power outage.  Due to the corruption, I ran xfs_check and would like to 
> run xfs_repair on the system.

It'd really be great to test a more recent xfsprogs first; that one is
about 2 years old.

You can probably grab any recent fedora src.rpm and rebuild it, and
later go back to the centos version if you wish.

If it persists, I can help investigate...

-Eric


* Re: Seg fault during xfs repair (segmentation fault / segv)
From: Jesse Stroik @ 2009-06-30 21:01 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Eric,

Eric Sandeen wrote:
> Jesse Stroik wrote:
>> I have a server with a ~20TB xfs file system on Linux 
>> (2.6.18-92.1.22.el5) and am running xfsprogs-2.9.4-4.el5.  We had a few 
>> corrupted files which I believe were due to a SCSI issue after a recent 
>> power outage.  Due to the corruption, I ran xfs_check and would like to 
>> run xfs_repair on the system.
> 
> It'd really be great to test more recent xfsprogs first, that one is
> about 2 years old.
> 
> You can probably grab any recent fedora src.rpm and rebuild it, and
> later go back to the centos version if you wish.


I fetched the current version from SVN using these directions: 
http://xfs.org/index.php/Getting_the_latest_source_code

I get identical results.

--------
...
reset bad sb for ag 31
reset bad agf for ag 31
reset bad agi for ag 31
Segmentation fault

$ ./xfs_repair -V
xfs_repair version 3.0.2
--------

If you want me to rebuild with debug and get you any specific 
information, let me know.

Best,
Jesse


* Re: Seg fault during xfs repair (segmentation fault / segv)
From: Eric Sandeen @ 2009-06-30 21:11 UTC (permalink / raw)
  To: Jesse Stroik; +Cc: xfs

Jesse Stroik wrote:
> Eric,
> 
> Eric Sandeen wrote:
>> Jesse Stroik wrote:
>>> I have a server with a ~20TB xfs file system on Linux 
>>> (2.6.18-92.1.22.el5) and am running xfsprogs-2.9.4-4.el5.  We had a few 
>>> corrupted files which I believe were due to a SCSI issue after a recent 
>>> power outage.  Due to the corruption, I ran xfs_check and would like to 
>>> run xfs_repair on the system.
>> It'd really be great to test more recent xfsprogs first, that one is
>> about 2 years old.
>>
>> You can probably grab any recent fedora src.rpm and rebuild it, and
>> later go back to the centos version if you wish.
> 
> 
> I fetched the current version from SVN using these directions: 
> http://xfs.org/index.php/Getting_the_latest_source_code
> 
> I get identical results.

Bummer :)

> --------
> ...
> reset bad sb for ag 31
> reset bad agf for ag 31
> reset bad agi for ag 31
> Segmentation fault
> 
> $ ./xfs_repair -V
> xfs_repair version 3.0.2
> --------
> 
> If you want me to rebuild with debug and get you any specific 
> information, let me know.

That'd be great.

Perhaps you can give these a shot:

http://sandeen.fedorapeople.org/test/xfsprogs-3.0.1-8.test1.x86_64.rpm
http://sandeen.fedorapeople.org/test/xfsprogs-debuginfo-3.0.1-8.test1.x86_64.rpm

(they're just rebuilt from fedora, no special sauce in there)

Could you run with ulimit -c unlimited and gather a core dump, for starters?

You could also try creating an xfs_metadump of the filesystem and see if
xfs_repair also segfaults on that; then perhaps you could provide the
metadump for analysis.

Thanks,
-Eric



* Re: Seg fault during xfs repair (segmentation fault / segv)
From: Eric Sandeen @ 2009-07-01 19:53 UTC (permalink / raw)
  To: Jesse Stroik; +Cc: xfs

Jesse Stroik wrote:
> Eric,
> 
> Eric Sandeen wrote:
>> Jesse Stroik wrote:
>>> I have a server with a ~20TB xfs file system on Linux 
>>> (2.6.18-92.1.22.el5) and am running xfsprogs-2.9.4-4.el5.  We had a few 
>>> corrupted files which I believe were due to a SCSI issue after a recent 
>>> power outage.  Due to the corruption, I ran xfs_check and would like to 
>>> run xfs_repair on the system.
>> It'd really be great to test more recent xfsprogs first, that one is
>> about 2 years old.
>>
>> You can probably grab any recent fedora src.rpm and rebuild it, and
>> later go back to the centos version if you wish.
> 
> 
> I fetched the current version from SVN using these directions: 
> http://xfs.org/index.php/Getting_the_latest_source_code
> 
> I get identical results.
> 
> --------
> ...
> reset bad sb for ag 31
> reset bad agf for ag 31
> reset bad agi for ag 31
> Segmentation fault

Ok, from a metadump image Jesse provided (thanks!) it's dying in here:

                bno = be32_to_cpu(agfl->agfl_bno[i]);
                printf("agfl at %p i is %d agfl_bno[i] %u bno is %u\n",
                       agfl, i, agfl->agfl_bno[i], bno);
                if (verify_agbno(mp, be32_to_cpu(agf->agf_seqno), bno))
                        set_agbno_state(mp, be32_to_cpu(agf->agf_seqno),
                                        bno, XR_E_FREE);

agfl_bno looks corrupt, and bno is coming out to be huge.

set_agbno_state() does:

*(ba_bmap[(agno)] + (ag_blockno)/XR_BB_NUM) = ....

where ag_blockno is that bno above; this wanders us off into bad memory
and boom.  I'll see what we can do to fix it up.
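
The failure mode described above can be sketched generically as follows.
This is only an illustration of validating both indices before writing
into a two-level map; it is not the patch, and the thread does not say
what the actual fix looks like.  The struct and the block_map_* functions
are hypothetical stand-ins, and the one-byte-per-block layout is an
assumption that does not match xfs_repair's packed bitmap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-AG block-state map: state[agno][bno], one byte per block. */
struct block_map {
    uint32_t agcount;   /* number of allocation groups */
    uint32_t agblocks;  /* blocks per allocation group */
    uint8_t **state;
};

/* Release a map created by block_map_new(); accepts NULL. */
static void block_map_free(struct block_map *m)
{
    uint32_t i;

    if (!m)
        return;
    if (m->state) {
        for (i = 0; i < m->agcount; i++)
            free(m->state[i]);
        free(m->state);
    }
    free(m);
}

/* Allocate a zeroed map; returns NULL if any allocation fails. */
static struct block_map *block_map_new(uint32_t agcount, uint32_t agblocks)
{
    struct block_map *m;
    uint32_t i;

    assert(agcount > 0 && agblocks > 0);
    m = calloc(1, sizeof(*m));
    if (!m)
        return NULL;
    m->agcount = agcount;
    m->agblocks = agblocks;
    m->state = calloc(agcount, sizeof(*m->state));
    if (!m->state) {
        free(m);
        return NULL;
    }
    for (i = 0; i < agcount; i++) {
        m->state[i] = calloc(agblocks, 1);
        if (!m->state[i]) {
            block_map_free(m);  /* free(NULL) is fine for unset slots */
            return NULL;
        }
    }
    return m;
}

/*
 * Set the state of one block.  Returns 0 on success, or -1 if either
 * index is out of range, so a corrupt on-disk value is rejected instead
 * of being used to compute an address.
 */
static int block_map_set(struct block_map *m, uint32_t agno,
                         uint32_t bno, uint8_t st)
{
    if (agno >= m->agcount || bno >= m->agblocks)
        return -1;
    m->state[agno][bno] = st;
    return 0;
}
```

With a check like this, a garbage block number such as 0xea84a85b is
refused by block_map_set() and the caller can report the entry as corrupt
instead of writing through a wild pointer.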

-Eric


* Re: Seg fault during xfs repair (segmentation fault / segv)
From: Eric Sandeen @ 2009-07-01 20:34 UTC (permalink / raw)
  To: Jesse Stroik; +Cc: xfs

Eric Sandeen wrote:

> Ok, from a metadump image Jesse provided (thanks!) it's dying in here:
> 
>                 bno = be32_to_cpu(agfl->agfl_bno[i]);
>                 printf("agfl at %p i is %d agfl_bno[i] %u bno is %u\n",
> agfl, i, agfl->agfl_bno[i], bno);
>                 if (verify_agbno(mp, be32_to_cpu(agf->agf_seqno), bno))
>                         set_agbno_state(mp, be32_to_cpu(agf->agf_seqno),
>                                         bno, XR_E_FREE);
> 
> agfl_bno looks corrupt, and bno is coming out to be huge.
> 
> set_agbno_state() does:
> 
> *(ba_bmap[(agno)] + (ag_blockno)/XR_BB_NUM) = ....
> 
> where ag_blockno is that bno above; this wanders us off into bad memory
> and boom.  I'll see what we can do to fix it up.

Ok patch sent, but now I hit:

junking entry "soh " in directory inode 128
entry ".nsr" in shortform directory 128 references invalid inode 210397
junking entry ".nsr" in directory inode 128
bogus .. inode number (128) in directory inode 128, clearing inode number
xfs_repair: dir2.c:2123: process_dir2: Assertion `(ino !=
mp->m_sb.sb_rootino && ino != *parent) || (ino == mp->m_sb.sb_rootino &&
(ino == *parent || need_root_dotdot == 1))' failed.
Aborted

that's one crunchy filesystem you've got there; what happened to it?

-Eric


* Re: Seg fault during xfs repair (segmentation fault / segv)
From: Jesse Stroik @ 2009-07-01 20:51 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Eric,

Thanks for addressing the issue with xfs_repair.


> that's one crunchy filesystem you've got there; what happened to it?


It's not entirely clear -- the JBOD and SAS controller seem to have 
gotten into inconsistent states and I was observing a few SCSI errors 
for those particular LUNs.  While the system was exhibiting the SCSI 
errors, the user of this file system (and a few others like it on the 
host) noticed file corruption when reading/writing certain files, then 
spontaneous corruption after making copies of the files.

I'll take a look at the new xfs_check and see what happens.

Best,
Jesse


* Re: Seg fault during xfs repair (segmentation fault / segv)
From: Eric Sandeen @ 2009-07-01 20:52 UTC (permalink / raw)
  To: Jesse Stroik; +Cc: xfs

Jesse Stroik wrote:
> Eric,
> 
> Thanks for addressing the issue with xfs_repair.
> 
> 
>> that's one crunchy filesystem you've got there; what happened to it?
> 
> 
> It's not entirely clear -- the JBOD and SAS controller seem to have 
> gotten into inconsistent states and I was observing a few SCSI errors 
> for those particular LUNs.  While the system was exhibiting the SCSI 
> errors, the user of this file system (and a few others like it on the 
> host) noticed file corruption when reading/writing certain files, then 
> spontaneous corruption after making copies of the files.
> 
> I'll take a look at the new xfs_check and see what happens.
> 
> Best,
> Jesse
> 

Turns out that it runs to completion, but another run still finds
corruption.  And a debug build trips asserts, so I guess there are still
issues.

-Eric
