* Seg fault during xfs repair (segmentation fault / segv)
@ 2009-06-30 18:29 Jesse Stroik
2009-06-30 18:41 ` Eric Sandeen
0 siblings, 1 reply; 8+ messages in thread
From: Jesse Stroik @ 2009-06-30 18:29 UTC (permalink / raw)
To: xfs
I have a server with a ~20TB xfs file system on Linux
(2.6.18-92.1.22.el5) and am running xfsprogs-2.9.4-4.el5. We had a few
corrupted files which I believe were due to a SCSI issue after a recent
power outage. Due to the corruption, I ran xfs_check and would like to
run xfs_repair on the system.
The server also has a variety of larger, much fuller file systems that
have had no issues, so I doubt the problem is related to the amount of
data on the file system.
I found a thread on this list which addresses a similar problem, but the
issues are different enough that I believe this warrants another thread:
http://oss.sgi.com/archives/xfs/2009-06/msg00089.html
Here is a description of my issue, as output by xfs_repair.
xfs_repair -n executes properly. However, xfs_repair without -n
segfaults during phase 2, right before entering phase 3. The output
follows:
xfs_repair
-------------------
...
primary/secondary superblock 31 conflict - AG superblock geometry info
conflicts with filesystem geometry
bad magic # 0xea84a85b for agf 31
bad version # -1440567846 for agf 31
bad sequence # -1237535942 for agf 31
bad length -834374160 for agf 31, should be 152534208
flfirst -568402337 in agf 31 too large (max = 128)
fllast 258143670 in agf 31 too large (max = 128)
bad magic # 0x2b8d5c56 for agi 31
bad version # -1456306498 for agi 31
bad sequence # 830698397 for agi 31
bad length # 1972157355 for agi 31, should be 152534208
reset bad sb for ag 31
reset bad agf for ag 31
reset bad agi for ag 31
Segmentation fault
-------------------
xfs_repair -n
-------------------
...
primary/secondary superblock 31 conflict - AG superblock geometry info
conflicts with filesystem geometry
bad flags field in superblock 31
bad shared version number in superblock 31
bad inode alignment field in superblock 31
bad stripe unit/width fields in superblock 31
bad log/data device sector size fields in superblock 31
bad magic # 0xea84a85b for agf 31
bad version # -1440567846 for agf 31
bad sequence # -1237535942 for agf 31
bad length -834374160 for agf 31, should be 152534208
flfirst -568402337 in agf 31 too large (max = 128)
fllast 258143670 in agf 31 too large (max = 128)
bad magic # 0x2b8d5c56 for agi 31
bad version # -1456306498 for agi 31
bad sequence # 830698397 for agi 31
bad length # 1972157355 for agi 31, should be 152534208
would reset bad sb for ag 31
would reset bad agf for ag 31
would reset bad agi for ag 31
bad uncorrected agheader 31, skipping ag...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
error following ag 8 unlinked list
error following ag 10 unlinked list
-------------------
I try to avoid hand-compiled system tools, but in this case I would
make an exception if it is necessary to diagnose or address this
problem.
Best,
Jesse Stroik
Space Science and Engineering Center
University of Wisconsin
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Seg fault during xfs repair (segmentation fault / segv)
2009-06-30 18:29 Seg fault during xfs repair (segmentation fault / segv) Jesse Stroik
@ 2009-06-30 18:41 ` Eric Sandeen
2009-06-30 21:01 ` Jesse Stroik
0 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-06-30 18:41 UTC (permalink / raw)
To: Jesse Stroik; +Cc: xfs
Jesse Stroik wrote:
> I have a server with a ~20TB xfs file system on Linux
> (2.6.18-92.1.22.el5) and am running xfsprogs-2.9.4-4.el5. We had a few
> corrupted files which I believe were due to a SCSI issue after a recent
> power outage. Due to the corruption, I ran xfs_check and would like to
> run xfs_repair on the system.
It'd really be great to test a more recent xfsprogs first; that one is
about 2 years old.
You can probably grab any recent fedora src.rpm and rebuild it, and
later go back to the centos version if you wish.
If it persists, I can help investigate...
-Eric
* Re: Seg fault during xfs repair (segmentation fault / segv)
2009-06-30 18:41 ` Eric Sandeen
@ 2009-06-30 21:01 ` Jesse Stroik
2009-06-30 21:11 ` Eric Sandeen
2009-07-01 19:53 ` Eric Sandeen
0 siblings, 2 replies; 8+ messages in thread
From: Jesse Stroik @ 2009-06-30 21:01 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
Eric,
Eric Sandeen wrote:
> Jesse Stroik wrote:
>> I have a server with a ~20TB xfs file system on Linux
>> (2.6.18-92.1.22.el5) and am running xfsprogs-2.9.4-4.el5. We had a few
>> corrupted files which I believe were due to a SCSI issue after a recent
>> power outage. Due to the corruption, I ran xfs_check and would like to
>> run xfs_repair on the system.
>
> It'd really be great to test a more recent xfsprogs first; that one is
> about 2 years old.
>
> You can probably grab any recent fedora src.rpm and rebuild it, and
> later go back to the centos version if you wish.
I fetched the current version from SVN using these directions:
http://xfs.org/index.php/Getting_the_latest_source_code
I get identical results.
--------
...
reset bad sb for ag 31
reset bad agf for ag 31
reset bad agi for ag 31
Segmentation fault
$ ./xfs_repair -V
xfs_repair version 3.0.2
--------
If you want me to rebuild with debug and get you any specific
information, let me know.
Best,
Jesse
* Re: Seg fault during xfs repair (segmentation fault / segv)
2009-06-30 21:01 ` Jesse Stroik
@ 2009-06-30 21:11 ` Eric Sandeen
2009-07-01 19:53 ` Eric Sandeen
1 sibling, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2009-06-30 21:11 UTC (permalink / raw)
To: Jesse Stroik; +Cc: xfs
Jesse Stroik wrote:
> Eric,
>
> Eric Sandeen wrote:
>> Jesse Stroik wrote:
>>> I have a server with a ~20TB xfs file system on Linux
>>> (2.6.18-92.1.22.el5) and am running xfsprogs-2.9.4-4.el5. We had a few
>>> corrupted files which I believe were due to a SCSI issue after a recent
>>> power outage. Due to the corruption, I ran xfs_check and would like to
>>> run xfs_repair on the system.
>> It'd really be great to test a more recent xfsprogs first; that one is
>> about 2 years old.
>>
>> You can probably grab any recent fedora src.rpm and rebuild it, and
>> later go back to the centos version if you wish.
>
>
> I fetched the current version from SVN using these directions:
> http://xfs.org/index.php/Getting_the_latest_source_code
>
> I get identical results.
Bummer :)
> --------
> ...
> reset bad sb for ag 31
> reset bad agf for ag 31
> reset bad agi for ag 31
> Segmentation fault
>
> $ ./xfs_repair -V
> xfs_repair version 3.0.2
> --------
>
> If you want me to rebuild with debug and get you any specific
> information, let me know.
That'd be great.
Perhaps you can give these a shot:
http://sandeen.fedorapeople.org/test/xfsprogs-3.0.1-8.test1.x86_64.rpm
http://sandeen.fedorapeople.org/test/xfsprogs-debuginfo-3.0.1-8.test1.x86_64.rpm
(they're just rebuilt from fedora, no special sauce in there)
Run with "ulimit -c unlimited" and gather a core dump for starters?
You could also try creating an xfs_metadump of the filesystem and see if
xfs_repair also segfaults on that; then perhaps you could provide the
metadump for analysis.
Thanks,
-Eric
* Re: Seg fault during xfs repair (segmentation fault / segv)
2009-06-30 21:01 ` Jesse Stroik
2009-06-30 21:11 ` Eric Sandeen
@ 2009-07-01 19:53 ` Eric Sandeen
2009-07-01 20:34 ` Eric Sandeen
1 sibling, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-07-01 19:53 UTC (permalink / raw)
To: Jesse Stroik; +Cc: xfs
Jesse Stroik wrote:
> Eric,
>
> Eric Sandeen wrote:
>> Jesse Stroik wrote:
>>> I have a server with a ~20TB xfs file system on Linux
>>> (2.6.18-92.1.22.el5) and am running xfsprogs-2.9.4-4.el5. We had a few
>>> corrupted files which I believe were due to a SCSI issue after a recent
>>> power outage. Due to the corruption, I ran xfs_check and would like to
>>> run xfs_repair on the system.
>> It'd really be great to test a more recent xfsprogs first; that one is
>> about 2 years old.
>>
>> You can probably grab any recent fedora src.rpm and rebuild it, and
>> later go back to the centos version if you wish.
>
>
> I fetched the current version from SVN using these directions:
> http://xfs.org/index.php/Getting_the_latest_source_code
>
> I get identical results.
>
> --------
> ...
> reset bad sb for ag 31
> reset bad agf for ag 31
> reset bad agi for ag 31
> Segmentation fault
Ok, from a metadump image Jesse provided (thanks!) it's dying in here:
        bno = be32_to_cpu(agfl->agfl_bno[i]);
        printf("agfl at %p i is %d agfl_bno[i] %u bno is %u\n",
                agfl, i, agfl->agfl_bno[i], bno);
        if (verify_agbno(mp, be32_to_cpu(agf->agf_seqno), bno))
                set_agbno_state(mp, be32_to_cpu(agf->agf_seqno),
                        bno, XR_E_FREE);
agfl_bno looks corrupt, and bno is coming out to be huge.
set_agbno_state() does:
        *(ba_bmap[(agno)] + (ag_blockno)/XR_BB_NUM) = ....
where ag_blockno is that bno above; this wanders us off into bad memory
and boom. I'll see what we can do to fix it up.
-Eric
* Re: Seg fault during xfs repair (segmentation fault / segv)
2009-07-01 19:53 ` Eric Sandeen
@ 2009-07-01 20:34 ` Eric Sandeen
2009-07-01 20:51 ` Jesse Stroik
0 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-07-01 20:34 UTC (permalink / raw)
To: Jesse Stroik; +Cc: xfs
Eric Sandeen wrote:
> Ok, from a metadump image Jesse provided (thanks!) it's dying in here:
>
>         bno = be32_to_cpu(agfl->agfl_bno[i]);
>         printf("agfl at %p i is %d agfl_bno[i] %u bno is %u\n",
>                 agfl, i, agfl->agfl_bno[i], bno);
>         if (verify_agbno(mp, be32_to_cpu(agf->agf_seqno), bno))
>                 set_agbno_state(mp, be32_to_cpu(agf->agf_seqno),
>                         bno, XR_E_FREE);
>
> agfl_bno looks corrupt, and bno is coming out to be huge.
>
> set_agbno_state() does:
>
>         *(ba_bmap[(agno)] + (ag_blockno)/XR_BB_NUM) = ....
>
> where ag_blockno is that bno above; this wanders us off into bad memory
> and boom. I'll see what we can do to fix it up.
Ok patch sent, but now I hit:
junking entry "soh " in directory inode 128
entry ".nsr" in shortform directory 128 references invalid inode 210397
junking entry ".nsr" in directory inode 128
bogus .. inode number (128) in directory inode 128, clearing inode number
xfs_repair: dir2.c:2123: process_dir2: Assertion `(ino !=
mp->m_sb.sb_rootino && ino != *parent) || (ino == mp->m_sb.sb_rootino &&
(ino == *parent || need_root_dotdot == 1))' failed.
Aborted
that's one crunchy filesystem you've got there; what happened to it?
-Eric
* Re: Seg fault during xfs repair (segmentation fault / segv)
2009-07-01 20:34 ` Eric Sandeen
@ 2009-07-01 20:51 ` Jesse Stroik
2009-07-01 20:52 ` Eric Sandeen
0 siblings, 1 reply; 8+ messages in thread
From: Jesse Stroik @ 2009-07-01 20:51 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
Eric,
Thanks for addressing the issue with xfs_repair.
> that's one crunchy filesystem you've got there; what happened to it?
It's not entirely clear -- the JBOD and SAS controller seem to have
gotten into inconsistent states and I was observing a few SCSI errors
for those particular LUNs. While the system was exhibiting the SCSI
errors, the user of this file system (and a few others like it on the
host) noticed file corruption when reading/writing certain files, then
spontaneous corruption after making copies of the files.
I'll take a look at the new xfs_check and see what happens.
Best,
Jesse
* Re: Seg fault during xfs repair (segmentation fault / segv)
2009-07-01 20:51 ` Jesse Stroik
@ 2009-07-01 20:52 ` Eric Sandeen
0 siblings, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2009-07-01 20:52 UTC (permalink / raw)
To: Jesse Stroik; +Cc: xfs
Jesse Stroik wrote:
> Eric,
>
> Thanks for addressing the issue with xfs_repair.
>
>
>> that's one crunchy filesystem you've got there; what happened to it?
>
>
> It's not entirely clear -- the JBOD and SAS controller seem to have
> gotten into inconsistent states and I was observing a few SCSI errors
> for those particular LUNs. While the system was exhibiting the SCSI
> errors, the user of this file system (and a few others like it on the
> host) noticed file corruption when reading/writing certain files, then
> spontaneous corruption after making copies of the files.
>
> I'll take a look at the new xfs_check and see what happens.
>
> Best,
> Jesse
>
Turns out that it runs to completion, but another run still finds
corruption. And a debug build trips asserts, so I guess there are still
issues.
-Eric