* XFS-filesystem corrupted by defragmentation
@ 2010-04-13 12:10 Bernhard Gschaider
2010-04-13 14:58 ` Robert Brockway
2010-04-13 16:36 ` Eric Sandeen
0 siblings, 2 replies; 6+ messages in thread
From: Bernhard Gschaider @ 2010-04-13 12:10 UTC (permalink / raw)
To: xfs
Hi!
I'm asking here because I was referred here from the CentOS mailing
list (for the full story see
http://www.pubbs.net/201004/centos/17112-centos-performance-problems-with-xfs-on-centos-54.html
and
http://www.pubbs.net/201004/centos/24542-centos-xfs-filesystem-corrupted-by-defragmentation-was-performance-problems-with-xfs-on-centos-54.html
— what follows is a summary of those threads.)
It was suggested to me that the source of my performance problems might
be fragmentation of the XFS filesystem. I tested for fragmentation and
got
xfs_db> frag
actual 6349355, ideal 4865683, fragmentation factor 23.37%
Before I'd try to defragment my whole filesystem I figured "Let's try
it on some file".
So I did
> xfs_bmap /raid/Temp/someDiskimage.iso
[output shows 101 extents and 1 hole]
Then I defragmented the file
> xfs_fsr /raid/Temp/someDiskimage.iso
extents before:101 after:3 DONE
> xfs_bmap /raid/Temp/someDiskimage.iso
[output shows 3 extents and 1 hole]
And now comes the bummer: I wanted to check the fragmentation of the
whole filesystem (just as a check):
> xfs_db -r /dev/mapper/VolGroup00-LogVol04
xfs_db: unexpected XFS SB magic number 0x00000000
xfs_db: read failed: Invalid argument
xfs_db: data size check failed
cache_node_purge: refcount was 1, not zero (node=0x2a25c20)
xfs_db: cannot read root inode (22)
THAT output was definitely not there when I did this the last time, and
therefore the new fragmentation numbers don't make me happy either:
xfs_db> frag
actual 0, ideal 0, fragmentation factor 0.00%
The filesystem is still mounted and working, and I don't dare do
anything to it (I am in a mild state of panic) because I think it
might not come back if I do.
Any suggestions are most welcome (I am googling myself before I do
anything about it).
I swear to god: I did not do anything with the xfs_* commands
other than the steps shown above.
As far as I understand from other sources, the first thing to do is to
"try to get the in-core copy of the XFS superblock flushed out" before
proceeding (I must find out how to do that). How would you suggest
proceeding from there? And if defragmenting one file messes things up
this badly, how safe is defragmentation in general?
Thanks for your time
Bernhard
Info about my system. Tell me if you need more info:
My system is CentOS 5.4 (equivalent to RHEL 5.4), which means kernel
2.6.18 (64-bit, unmodified Xen kernel). xfs_db -V reports
"xfs_db version 2.9.4".
The machine has 4 GB of memory (2 dual-core Xeons). The filesystem is
3.5 TB, of which 740 GB are used; that is the maximum amount used
during the one year the filesystem has been in service (which is why
the high fragmentation surprises me). The filesystem is on an LVM
volume which sits on a hardware RAID 5 array.
% xfs_info /raid
meta-data=/dev/VolGroup00/LogVol05 isize=256 agcount=32, agsize=29434880 blks
= sectsz=512 attr=0
data = bsize=4096 blocks=941916160, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=32768, version=1
= sectsz=512 sunit=0 blks, lazy-count=0
realtime =none extsz=4096 blocks=0, rtextents=0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: XFS-filesystem corrupted by defragmentation
2010-04-13 12:10 Bernhard Gschaider
@ 2010-04-13 14:58 ` Robert Brockway
2010-04-13 15:24 ` Bernhard Gschaider
2010-04-13 16:36 ` Eric Sandeen
1 sibling, 1 reply; 6+ messages in thread
From: Robert Brockway @ 2010-04-13 14:58 UTC (permalink / raw)
To: Bernhard Gschaider; +Cc: xfs
On Tue, 13 Apr 2010, Bernhard Gschaider wrote:
>> xfs_db -r /dev/mapper/VolGroup00-LogVol04
> xfs_db: unexpected XFS SB magic number 0x00000000
> xfs_db: read failed: Invalid argument
> xfs_db: data size check failed
> cache_node_purge: refcount was 1, not zero (node=0x2a25c20)
> xfs_db: cannot read root inode (22)
Hi Bernhard. Hmm that doesn't sound good.
> The file-system is still mounted and working and I don't dare to do
> anything about it (am in a mild state of panic) because I think it
> might not come back if I do.
I think your choice to sit back and evaluate your options before acting is
a wise one, especially since the filesystem is apparently mounted and
functioning.
Depending on how worried you are, there are various options available.
For example, you could declare an emergency on the server and use
xfs_freeze to freeze the filesystem while you take a backup. Note: I
have never used xfs_freeze like this; it is just a suggestion.
Naturally this will cause an outage and problems for users.
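A minimal sketch of that freeze/backup/thaw sequence, assuming the /raid mount point from this thread and a placeholder backup step (the commands are built as strings rather than executed, since they need root and a live XFS mount):

```shell
# Hypothetical freeze/backup/thaw sequence; requires root on the real system.
# The commands are only echoed here so the sketch is safe to inspect.
MNT=/raid
freeze="xfs_freeze -f $MNT"   # suspend writes and flush dirty data to disk
thaw="xfs_freeze -u $MNT"     # resume normal operation afterwards
echo "$freeze"
echo "take the backup here, e.g. with xfsdump or tar"
echo "$thaw"
```

In a real script the thaw belongs in a trap handler, so the filesystem is not left frozen if the backup step fails partway through.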
Alternatively you could use xfsdump to capture an incremental or full
backup on the running system (depending on whether you already have a
level 0 xfsdump file or not). The developers have confirmed (on this
list) that xfsdump will provide a consistent backup on a live filesystem.
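xfsdump's level scheme can be sketched as follows; the backup paths are placeholders, and -l selects the dump level (0 = full, 1 = changes since the last level 0) while -f names the destination:

```shell
# Hypothetical xfsdump invocations, built as strings (running them needs
# root and the real mount point).
MNT=/raid
level0="xfsdump -l 0 -f /backup/raid.level0 $MNT"   # full backup
level1="xfsdump -l 1 -f /backup/raid.level1 $MNT"   # incremental since level 0
echo "$level0"
echo "$level1"
```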
Please note that any heavy I/O (like a backup) has the potential to cause
problems on a sick filesystem. In my experience XFS is inclined to
automatically remount read-only if it detects problems. While this can be
catastrophic for running processes, it is helpful in protecting data, so
I'm happy it works this way.
One last note. I hope you have good backups already. If you don't then
this is the time to start taking good backups.
These are the notes from my backup talk:
http://www.timetraveller.org/talks/backup_talk.pdf
> I swear to god: I did not do anything else with the xfs_*-commands
> than the stuff mentioned above
I defrag XFS filesystems from cron as recommended by SGI and I've never
had a problem. Maybe defragmentation didn't cause the problem - maybe it
just revealed an underlying problem.
Cheers,
Rob
--
Email: robert@timetraveller.org
IRC: Solver
Web: http://www.practicalsysadmin.com
Open Source: The revolution that silently changed the world
* Re: XFS-filesystem corrupted by defragmentation
2010-04-13 14:58 ` Robert Brockway
@ 2010-04-13 15:24 ` Bernhard Gschaider
0 siblings, 0 replies; 6+ messages in thread
From: Bernhard Gschaider @ 2010-04-13 15:24 UTC (permalink / raw)
To: Robert Brockway; +Cc: xfs
Thanks for the answer
>>>>> On Tue, 13 Apr 2010 10:58:22 -0400 (EDT)
>>>>> "RB" == Robert Brockway <robert@timetraveller.org> wrote:
RB> On Tue, 13 Apr 2010, Bernhard Gschaider wrote:
>>> xfs_db -r /dev/mapper/VolGroup00-LogVol04
>> xfs_db: unexpected XFS SB magic number 0x00000000 xfs_db: read
>> failed: Invalid argument xfs_db: data size check failed
>> cache_node_purge: refcount was 1, not zero (node=0x2a25c20)
>> xfs_db: cannot read root inode (22)
RB> Hi Bernhard. Hmm that doesn't sound good.
http://oss.sgi.com/archives/xfs/2007-04/msg00580.html suggests a sync
for that kind of situation. Any thoughts on this? I know there is no
definite answer to this, only educated guesses from people with more
experience than me.
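Trying the sync route from that archived thread is harmless on a healthy system; a minimal sketch (the follow-up xfs_db check is shown as a string, since it needs the real device):

```shell
# 'sync' asks the kernel to write all dirty buffers to disk, including the
# in-core superblock copy; it is safe to run at any time.
sync
status=$?
echo "sync exit status: $status"
# Afterwards, re-check the filesystem, e.g.:
echo "xfs_db -r -c frag /dev/VolGroup00/LogVol05"
```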
>> The file-system is still mounted and working and I don't dare
>> to do anything about it (am in a mild state of panic) because I
>> think it might not come back if I do.
RB> I think your choice to sit back and evaluate your options
RB> before acting is a wise one, especially since the filesystem
RB> is apparently mounted and functioning.
RB> Depending on how worried you are there are various options
RB> available. Eg you could declare an emergency on the server
RB> and use xfs_freeze to freeze the filesystem while you take a
RB> backup. Note - I have never used xfs_freeze like this, it is
RB> just a suggestion. Naturally this will cause an outage and
RB> problems for users.
They'll have to live with that
RB> Alternatively you could use xfsdump to capture an incremental
RB> or full backup on the running system. (depending on whether
RB> you already have a level 0 xfs dump file or not). The
RB> developers have confirmed (on this list) that xfsdump will
RB> provide a consistent backup on a live filesystem.
RB> Please note that any heavy I/O (like a backup) has the
RB> potential to cause problems on a sick filesystem. In my
RB> experience xfs is inclined to automatically remount read-only
RB> if it detects problems. While this can be catastrophic for
RB> running processes it is helpful in protecting data so I'm
RB> happy it works this way.
RB> One last note. I hope you have good backups already. If you
RB> don't then this is the time to start taking good backups.
I have weekly backups with Amanda. The tapes verify OK, but I have never
tried a full-scale restore before.
RB> These are the notes from my backup talk:
RB> http://www.timetraveller.org/talks/backup_talk.pdf
>> I swear to god: I did not do anything else with the
>> xfs_*-commands than the stuff mentioned above
RB> I defrag XFS filesystems from cron as recommended by SGI and
RB> I've never had a problem. Maybe defragmentation didn't cause
RB> the problem - maybe it just revealed an underlying problem.
But could it have anything to do with the hole that xfs_bmap reported
for that file?
Bernhard
* Re: XFS-filesystem corrupted by defragmentation
@ 2010-04-13 16:08 Sebastian Brings
2010-04-13 17:41 ` Bernhard Gschaider
0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Brings @ 2010-04-13 16:08 UTC (permalink / raw)
To: Bernhard Gschaider, xfs
> [Bernhard's full original message quoted; the relevant excerpt:]
>
>> xfs_db -r /dev/mapper/VolGroup00-LogVol04
> xfs_db: unexpected XFS SB magic number 0x00000000
> xfs_db: read failed: Invalid argument
> xfs_db: data size check failed
> cache_node_purge: refcount was 1, not zero (node=0x2a25c20)
> xfs_db: cannot read root inode (22)
> [...]
> % xfs_info /raid
> meta-data=/dev/VolGroup00/LogVol05 isize=256 agcount=32, agsize=29434880 blks
> [...]
Hi,
Could it be that you specified the wrong device for xfs_db? xfs_info
reports /dev/VolGroup00/LogVol05 as the metadata device, but for xfs_db
you used /dev/mapper/VolGroup00-LogVol04...
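One way to rule out this kind of mix-up is to resolve both device spellings with readlink -f and compare the results. The sketch below demonstrates the check with a temporary symlink so it can run anywhere; on the real system the two paths would be /dev/VolGroup00/LogVol05 and the corresponding /dev/mapper/... name.

```shell
# Demonstration: two spellings of the same object resolve to one real path.
tmp=$(mktemp -d)
ln -s "$tmp" "$tmp/alias"      # 'alias' stands in for the /dev/VolGroup00/... spelling
a=$(readlink -f "$tmp/alias")
b=$(readlink -f "$tmp")
if [ "$a" = "$b" ]; then
    echo "same device"
else
    echo "different devices"
fi
```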
Sebastian
* Re: XFS-filesystem corrupted by defragmentation
2010-04-13 12:10 Bernhard Gschaider
2010-04-13 14:58 ` Robert Brockway
@ 2010-04-13 16:36 ` Eric Sandeen
1 sibling, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2010-04-13 16:36 UTC (permalink / raw)
To: Bernhard Gschaider; +Cc: xfs
On 04/13/2010 07:10 AM, Bernhard Gschaider wrote:
>
> Hi!
>
> I'm asking here because I've been referred here from the CentOS-mailing
> list (for the full story see
> http://www.pubbs.net/201004/centos/17112-centos-performance-problems-with-xfs-on-centos-54.html
> and
> http://www.pubbs.net/201004/centos/24542-centos-xfs-filesystem-corrupted-by-defragmentation-was-performance-problems-with-xfs-on-centos-54.html
> the following stuff is a summary of this)
>
> It was suggested to me that the source of my performance problems might
> be the fragmentation of the XFS-system. I tested for fragmentation and
> got
>
> xfs_db> frag
> actual 6349355, ideal 4865683, fragmentation factor 23.37%
So on average your filesystem has 6349355/4865683 = 1.3 extents per file.
Just as a casual side note, this is not even remotely bad, at least
on average.
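For reference, the frag factor xfs_db prints is (actual - ideal) / actual, so both numbers quoted above can be checked directly:

```shell
# Reproduce Eric's arithmetic from the xfs_db frag output above.
actual=6349355
ideal=4865683
awk -v a="$actual" -v i="$ideal" 'BEGIN {
    printf "extents per file:      %.2f\n", a / i
    printf "fragmentation factor:  %.2f%%\n", (a - i) / a * 100
}'
```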
> Before I'd try to defragment my whole filesystem I figured "Let's try
> it on some file".
>
> So I did
>
>> xfs_bmap /raid/Temp/someDiskimage.iso
> [output shows 101 extents and 1 hole]
>
> Then I defragmented the file
>> xfs_fsr /raid/Temp/someDiskimage.iso
> extents before:101 after:3 DONE
>
>> xfs_bmap /raid/Temp/someDiskimage.iso
> [output shows 3 extents and 1 hole]
>
> and now comes the bummer: i wanted to check the fragmentation of the
> whole filesystem (just for checking):
>
>> xfs_db -r /dev/mapper/VolGroup00-LogVol04
> xfs_db: unexpected XFS SB magic number 0x00000000
> xfs_db: read failed: Invalid argument
> xfs_db: data size check failed
> cache_node_purge: refcount was 1, not zero (node=0x2a25c20)
> xfs_db: cannot read root inode (22)
So here you did:
# xfs_db -r /dev/mapper/VolGroup00-LogVol04
but below you show:
% xfs_info /raid
> meta-data=/dev/VolGroup00/LogVol05
.... wrong device maybe?
-Eric
* Re: XFS-filesystem corrupted by defragmentation
2010-04-13 16:08 XFS-filesystem corrupted by defragmentation Sebastian Brings
@ 2010-04-13 17:41 ` Bernhard Gschaider
0 siblings, 0 replies; 6+ messages in thread
From: Bernhard Gschaider @ 2010-04-13 17:41 UTC (permalink / raw)
To: Sebastian Brings; +Cc: xfs
This is the first of many apologies in this mail: I apologize to everyone
for using German in this list, and to Sebastian for the silly pun
(which he has probably heard many times before): "Herr Brings, Ihr Tipp
bringts" (roughly: "Mr. Brings, your tip delivers").
Sebastian is exactly right: I copy/pasted the device name from the
output of df to use with xfs_db, when I should have used the value from
the fstab. Everything is all right with the filesystem (except for the
guy handling it).
So I apologize to all those whose time I've wasted (and who were nice
enough to answer me). I apologize for doubting the quality of the XFS
filesystem and tools. And last but not least, I apologize for
top-posting.
Bernhard
>>>>> On Tue, 13 Apr 2010 18:08:50 +0200 (CEST)
>>>>> "SB" == Sebastian Brings <Sebastian.Brings@web.de> wrote:
SB> could it be you specified the wrong device for xfs_db? The
SB> xfs_info gives =/dev/VolGroup00/LogVol05 as metadata device,
SB> but for xfs_db you used /dev/mapper/VolGroup00-LogVol04...