public inbox for linux-xfs@vger.kernel.org
From: Martin Steigerwald <ms@teamix.de>
To: Timothy Shimmin <tes@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: Is it possible the check an frozen XFS filesytem to avoid downtime
Date: Tue, 15 Jul 2008 09:44:12 +0200
Message-ID: <200807150944.13277.ms@teamix.de>
In-Reply-To: <487C1BAF.2030404@sgi.com>

On Tuesday, 15 July 2008 05:38:23, Timothy Shimmin wrote:
> Hi Martin,

Hi Tim,

> Martin Steigerwald wrote:
> > Hi!
> >
> > We have seen in-memory corruption on two XFS filesystems on a heartbeat
> > server cluster of one of our customers:
> >
> >
> > XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1563 of file
> > fs/xfs/xfs_alloc.c.  Caller 0xffffffff8824eb5d
> >
> > Call Trace:
> >  [<ffffffff8824cff3>] :xfs:xfs_free_ag_extent+0x1a6/0x6b5
> >  [<ffffffff8824eb5d>] :xfs:xfs_free_extent+0xa9/0xc9
> >  [<ffffffff88258636>] :xfs:xfs_bmap_finish+0xf0/0x169
> >  [<ffffffff88278b4c>] :xfs:xfs_itruncate_finish+0x180/0x2c1
> >  [<ffffffff8829071a>] :xfs:xfs_setattr+0x841/0xe59
> >  [<ffffffff8022e868>] sock_common_recvmsg+0x30/0x45
> >  [<ffffffff8829adc8>] :xfs:xfs_vn_setattr+0x121/0x144
> >  [<ffffffff8022a06d>] notify_change+0x156/0x2ef
> >  [<ffffffff883bf9c6>] :nfsd:nfsd_setattr+0x334/0x4b1
> >  [<ffffffff883c61d6>] :nfsd:nfsd3_proc_setattr+0xa2/0xae
> >  [<ffffffff883bb24d>] :nfsd:nfsd_dispatch+0xdd/0x19e
> >  [<ffffffff8833a10e>] :sunrpc:svc_process+0x3cb/0x6d9
> >  [<ffffffff8025b20b>] __down_read+0x12/0x9a
> >  [<ffffffff883bb816>] :nfsd:nfsd+0x192/0x2b0
> >  [<ffffffff80255f38>] child_rip+0xa/0x12
> >  [<ffffffff883bb684>] :nfsd:nfsd+0x0/0x2b0
> >  [<ffffffff80255f2e>] child_rip+0x0/0x12
> >
> > xfs_force_shutdown(dm-1,0x8) called from line 4261 of file
> > fs/xfs/xfs_bmap.c. Return address = 0xffffffff88258673
> > Filesystem "dm-1": Corruption of in-memory data detected.  Shutting down
> > filesystem: dm-1
> > Please umount the filesystem, and rectify the problem(s)
> >
> > on
> >
> > Linux version 2.6.21-1-amd64 (Debian 2.6.21-4~bpo.1)
> > (nobse@backports.org) (gcc version 4.1.2 20061115 (prerelease) (Debian
> > 4.1.1-21)) #1 SMP Tue Jun 5 07:43:32 UTC 2007
> >
> >
> > We plan to do a takeover so that the server which appears to have memory
> > errors can be memtested.
> >
> > After the takeover we would like to make sure that the XFS filesystems
> > are intact. Is it possible to do so without taking the filesystem
> > completely offline?
> >
> > I thought about mounting read only and it might be the best choice
> > available, but then it will *fail* write accesses. I would prefer if
> > these are just stalled.
> >
> > I tried xfs_freeze -f on my laptop's home directory, but then did not
> > manage to get it checked via xfs_check or xfs_repair -nd... is it possible
> > at all?
> >
> > Ciao,
>
> When I last tried (and I don't think Barry has done anything to it to
> change things) it wouldn't work.
> However, I think it could/should be changed to make it work.
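
For reference, the attempt described above amounts to roughly the following
sequence. This is only a sketch: the device and mount point are placeholders
passed in by the caller, and the run helper and DRY_RUN switch are
illustrative assumptions, not part of any XFS tool. As noted in this thread,
the check step refused to run on a frozen filesystem at the time.

```shell
#!/bin/sh
# Sketch only: try a no-modify check on a frozen XFS filesystem.
# The device and mount point are hypothetical placeholders given as arguments.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

frozen_check() {
    dev=$1
    mnt=$2
    # Freeze: new writes are stalled rather than failed.
    run xfs_freeze -f "$mnt" || return 1
    # -n: no-modify mode; -d: "dangerous" mode, documented for filesystems
    # that are mounted read-only. This is the "-nd" combination tried above.
    run xfs_repair -n -d "$dev"
    rc=$?
    # Always thaw again, even when the check step failed.
    run xfs_freeze -u "$mnt"
    return "$rc"
}
```

Calling it as `DRY_RUN=1 frozen_check /dev/mapper/vg-home /home` only prints
the three commands, which is a safe way to review the sequence first.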

Okay... we recommended that the customer do it the safe way and unmount the
filesystem completely. He did, and the filesystem appears to be intact *phew*.
XFS appears to have detected the in-memory corruption early enough.
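
A minimal sketch of such an offline check in shell, assuming the services
using the filesystem (for example the NFS exports) have already been stopped.
The device and mount point are placeholders, and the run helper and DRY_RUN
switch are illustrative assumptions rather than anything the customer
actually ran.

```shell
#!/bin/sh
# Sketch only: check an XFS filesystem offline without modifying it.
# The device and mount point are hypothetical placeholders given as arguments.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

offline_check() {
    dev=$1
    mnt=$2
    # Take the filesystem fully offline first.
    run umount "$mnt" || return 1
    # No-modify mode: exit status 0 means no corruption was detected,
    # 1 means corruption was detected.
    run xfs_repair -n "$dev"
    rc=$?
    # Remount only when the check came back clean.
    if [ "$rc" -eq 0 ]; then
        run mount "$dev" "$mnt" || return 1
    fi
    return "$rc"
}
```

With DRY_RUN=1 the function only prints the umount, xfs_repair -n and mount
commands, so the sequence can be reviewed before anything is touched.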

It's a bit strange, however, because we now know that the server has ECC RAM.
Well, we will see what memtest86+ has to say about it.

> My notes from the SGI bug:
>
> 958642: running xfs_check and "xfs_repair -n" on a frozen xfs filesystem
>
> > We've been asked a few times about the possibility of running xfs_check
> > or xfs_repair -n on a frozen filesystem.
> > And a while back I looked into what some of the hindrances were.
> > And now I've forgotten ;-))
> >
> > I think there are hindrances for libxfs_init (check_open()) and
> > for having a dirty log.
> >
> > For libxfs_init, I found that I couldn't run the tools without error'ing
> > out. I think I found out that I needed the INACTIVE flag,
> > without READONLY/DANGEROUSLY, like xfs_logprint does.
> >
> > ----------------------------------------
> > Date: Thu, 19 Oct 2006 11:24:06 +1000
> > From: Timothy Shimmin <tes@sgi.com>
> > To: lachlan@sgi.com
> > cc: xfs-dev@sgi.com
> > Subject: Re: init.c patch
> > ------------------------------------------------------
> >   Ok, my understanding of the READONLY/DANGEROUSLY flags was wrong.
> >   I thought they were just overriding flags when you were guaranteeing
> > you were only reading and it would be more permissive,
> >   but they are for doing stuff on readonly (ro) mounts.
> >
> >   They are rather confusing to me. When you go with defaults for repair
> > and db then it doesn't set the INACTIVE flag.
> >   It means if I do _not_ want to be fatal then I need to set INACTIVE but
> > not set READONLY or DANGEROUSLY - which is what logprint does.
> >

I think there should be separate options for read-only / frozen filesystem
checking and for dangerous repair, since a read-only check is a different
thing from repairing a mounted filesystem and hoping that the running XFS
will not choke on the changes xfs_repair makes underneath it.

I expected a "-r" option for read-only in xfs_check and xfs_repair, but for
xfs_repair this option is already taken for specifying the realtime volume.

Ciao,
-- 
Martin Steigerwald - team(ix) GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90
