From: Timothy Shimmin <tes@sgi.com>
To: Martin Steigerwald <ms@teamix.de>
Cc: xfs@oss.sgi.com
Subject: Re: Is it possible to check a frozen XFS filesystem to avoid downtime?
Date: Tue, 15 Jul 2008 13:38:23 +1000
Message-ID: <487C1BAF.2030404@sgi.com>
In-Reply-To: <200807141542.51613.ms@teamix.de>

Hi Martin,

Martin Steigerwald wrote:
> Hi!
> 
> We have seen in-memory corruption on two XFS filesystems on a server in one of 
> our customers' heartbeat clusters:
> 
> 
> XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1563 of file 
> fs/xfs/xfs_alloc.c.  Caller 0xffffffff8824eb5d
> 
> Call Trace:
>  [<ffffffff8824cff3>] :xfs:xfs_free_ag_extent+0x1a6/0x6b5
>  [<ffffffff8824eb5d>] :xfs:xfs_free_extent+0xa9/0xc9
>  [<ffffffff88258636>] :xfs:xfs_bmap_finish+0xf0/0x169
>  [<ffffffff88278b4c>] :xfs:xfs_itruncate_finish+0x180/0x2c1
>  [<ffffffff8829071a>] :xfs:xfs_setattr+0x841/0xe59
>  [<ffffffff8022e868>] sock_common_recvmsg+0x30/0x45
>  [<ffffffff8829adc8>] :xfs:xfs_vn_setattr+0x121/0x144
>  [<ffffffff8022a06d>] notify_change+0x156/0x2ef
>  [<ffffffff883bf9c6>] :nfsd:nfsd_setattr+0x334/0x4b1
>  [<ffffffff883c61d6>] :nfsd:nfsd3_proc_setattr+0xa2/0xae
>  [<ffffffff883bb24d>] :nfsd:nfsd_dispatch+0xdd/0x19e
>  [<ffffffff8833a10e>] :sunrpc:svc_process+0x3cb/0x6d9
>  [<ffffffff8025b20b>] __down_read+0x12/0x9a
>  [<ffffffff883bb816>] :nfsd:nfsd+0x192/0x2b0
>  [<ffffffff80255f38>] child_rip+0xa/0x12
>  [<ffffffff883bb684>] :nfsd:nfsd+0x0/0x2b0
>  [<ffffffff80255f2e>] child_rip+0x0/0x12
> 
> xfs_force_shutdown(dm-1,0x8) called from line 4261 of file fs/xfs/xfs_bmap.c.  
> Return address = 0xffffffff88258673
> Filesystem "dm-1": Corruption of in-memory data detected.  Shutting down 
> filesystem: dm-1
> Please umount the filesystem, and rectify the problem(s)
> 
> on
> 
> Linux version 2.6.21-1-amd64 (Debian 2.6.21-4~bpo.1) (nobse@backports.org) 
> (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Tue Jun 5 
> 07:43:32 UTC 2007
> 
> 
> We plan to do a takeover so that the server which appears to have memory 
> errors can be memtested. 
> 
> After the takeover we would like to make sure that the XFS filesystems are 
> intact. Is it possible to do so without taking the filesystem completely 
> offline?
> 
> I thought about mounting read only and it might be the best choice available, 
> but then it will *fail* write accesses. I would prefer if these are just 
> stalled.
> 
> I tried xfs_freeze -f on my laptop's home directory, but then did not manage to 
> get it checked via xfs_check or xfs_repair -nd... Is it possible at all?
> 
> Ciao,
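For reference, the freeze that xfs_freeze -f requests can be sketched in C. This is a minimal sketch assuming a Linux kernel of 2.6.29 or later, which provides the generic FIFREEZE/FITHAW ioctls in <linux/fs.h>; the 2.6.21 kernel in the report above predates them. The helper name set_frozen() is illustrative only. While the filesystem is frozen, writers block until the thaw instead of failing, which is the "stalled" behaviour asked for above. The call needs CAP_SYS_ADMIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>    /* FIFREEZE, FITHAW (Linux >= 2.6.29) */
#include <sys/ioctl.h>
#include <unistd.h>

/* Illustrative helper (hypothetical name): freeze the filesystem mounted
 * at mountpoint when freeze != 0, thaw it when freeze == 0.
 * Returns 0 on success or a negative errno value on failure. */
static int set_frozen(const char *mountpoint, int freeze)
{
    int fd = open(mountpoint, O_RDONLY);
    if (fd < 0)
        return -errno;

    int ret = 0;
    if (ioctl(fd, freeze ? FIFREEZE : FITHAW, 0) < 0)
        ret = -errno;    /* save errno before close() can change it */

    close(fd);
    return ret;
}
```

A caller would freeze with set_frozen("/home", 1), run its read-only inspection, and thaw with set_frozen("/home", 0); a missing path simply yields -ENOENT.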


When I last tried (and I don't think Barry has done anything to it to change
things) it wouldn't work.
However, I think it could/should be changed to make it work.

My notes from the SGI bug:

958642: running xfs_check and "xfs_repair -n" on a frozen xfs filesystem
> We've been asked a few times about the possibility of running xfs_check
> or xfs_repair -n on a frozen filesystem.
> And a while back I looked into what some of the hindrances were.
> And now I've forgotten ;-))
> 
> I think there are hindrances in libxfs_init (check_open()) and
> from having a dirty log.
> 
> For libxfs_init, I found that I couldn't run the tools without erroring out.
> I think I found out that I needed the INACTIVE flag,
> without READONLY/DANGEROUSLY, like xfs_logprint does.
> 
> ----------------------------------------
> Date: Thu, 19 Oct 2006 11:24:06 +1000
> From: Timothy Shimmin <tes@sgi.com>
> To: lachlan@sgi.com
> cc: xfs-dev@sgi.com
> Subject: Re: init.c patch
> ------------------------------------------------------
>   Ok, my understanding of the READONLY/DANGEROUSLY flags was wrong.
>   I thought they were just overriding flags for when you were guaranteeing you were only reading,
>   making things more permissive,
>   but they are for doing stuff on readonly (ro) mounts.
>   They are rather confusing to me. When you go with the defaults for repair and db,
>   the INACTIVE flag is not set.
>   This means that if I do _not_ want it to be fatal, then I need to set INACTIVE but not
>   READONLY or DANGEROUSLY - which is what logprint does.
> 
>   I would have thought there would be an option for commands which don't modify anything,
>   so that they can read from a non-ro mounted filesystem (at the user's risk) -
>   which is what logprint does, i.e. an option which just sets INACTIVE and only
>   produces a warning.
> 
>   The other alternative is to be able to test for a frozen fs as you suggested.
> ----------------------------------------------------------
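The flag behaviour described in these notes can be modelled as a small decision function. This is a simplified sketch under stated assumptions, not the xfsprogs check_open() code: the F_* bit names, open_verdict and open_check() are hypothetical, the mount state is passed in rather than detected, and READONLY/DANGEROUSLY are treated as applying only to read-only mounts, as the notes describe.

```c
#include <stdbool.h>

/* Hypothetical flag bits named after the flags in the notes;
 * these are NOT the xfsprogs definitions. */
enum {
    F_READONLY    = 1u << 0,
    F_INACTIVE    = 1u << 1,
    F_DANGEROUSLY = 1u << 2
};

typedef enum { OPEN_OK, OPEN_WARN, OPEN_FATAL } open_verdict;

/* Simplified model of the open-time check described in the notes. */
static open_verdict open_check(unsigned flags, bool mounted, bool mounted_ro)
{
    if (!mounted)
        return OPEN_OK;                 /* unmounted device: nothing to guard */

    if (flags & (F_READONLY | F_DANGEROUSLY))
        return mounted_ro ? OPEN_OK     /* these flags target ro mounts */
                          : OPEN_FATAL;

    if (flags & F_INACTIVE)
        return OPEN_WARN;               /* logprint-style: warn, then read */

    return OPEN_FATAL;                  /* defaults used by repair and db */
}
```

With no flags, as for repair and db, a mounted device is fatal; INACTIVE alone downgrades that to a warning, as xfs_logprint does; adding READONLY or DANGEROUSLY on a read-write mount makes it fatal again.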
> 
> Lachlan suggested using a check_isfrozen() routine instead of overriding
> check_isactive().
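Lachlan's alternative can be sketched the same way: keep the "is it mounted?" test strict and add a separate frozen test that only a read-only checker consults. This is a minimal sketch; struct mount_state, check_mounted(), check_frozen() and readonly_check_allowed() are hypothetical names, and how the frozen state would actually be obtained is left open here, as it is in the notes.

```c
#include <stdbool.h>

/* Hypothetical snapshot of a device's mount state. How "frozen" would be
 * detected is not specified in the notes; here it is simply an input. */
struct mount_state {
    bool mounted;
    bool frozen;
};

/* Strict test, unchanged for ordinary tools. */
static bool check_mounted(const struct mount_state *ms)
{
    return ms->mounted;
}

/* Separate frozen test, in the spirit of the suggested check_isfrozen(). */
static bool check_frozen(const struct mount_state *ms)
{
    return ms->mounted && ms->frozen;
}

/* A read-only checker may proceed on an unmounted or a frozen device. */
static bool readonly_check_allowed(const struct mount_state *ms)
{
    if (!check_mounted(ms))
        return true;
    return check_frozen(ms);
}
```

The design point is that ordinary tools keep refusing a live mount, while xfs_check or xfs_repair -n would be let through only when the filesystem is frozen, without a blanket override of the active-mount check.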
> 
> 
> And as far as the dirty log is concerned...
> It will be dirty when it is frozen, but in a special way.
> It will have an unmount record followed by a dummy record -
> solely used so that when mounted again it can do
> the unlinked list processing.
> So we could add code to test if the log just had an unmount record
> followed by a dummy record and continue anyway knowing that
> the metadata was consistent.
> e.g. in xfs_repair/phase2.c:zero_log() it calls xlog_find_tail()
> and tests if (head_blk != tail_blk) to know if the log is dirty.
> I think libxfs (or the libxlog code) should provide a routine,
> e.g. libxfs_dirty_log or something suitably named,
> which could say how dirty the log is ;-)
> Is it dirty such that we have real transactions to replay, or
> does it just need the unlinked list processing, as in the case of
> a frozen filesystem?
> It would be nice anyway to have an abstraction here because
> it is finding out the head and tail blocks solely for this purpose
> and doesn't really care what they are.
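A minimal sketch of the kind of routine suggested above, assuming a heavily simplified view of the log: the record types, log_state and classify_log() are hypothetical (this is not the libxlog API, and libxfs_dirty_log is only a proposed name), and head_blk/tail_blk stand in for the values xlog_find_tail() reports.

```c
#include <stddef.h>

/* Hypothetical, simplified view of the records between log tail and head;
 * real XFS log records are not represented like this. */
typedef enum { REC_TRANS, REC_UNMOUNT, REC_DUMMY } log_rec_type;

typedef enum {
    LOG_CLEAN,          /* head == tail: nothing to do */
    LOG_UNLINKED_ONLY,  /* frozen pattern: metadata consistent */
    LOG_NEEDS_REPLAY    /* real transactions to replay */
} log_state;

static log_state classify_log(long head_blk, long tail_blk,
                              const log_rec_type *recs, size_t nrecs)
{
    if (head_blk == tail_blk)
        return LOG_CLEAN;

    /* Frozen filesystem: an unmount record followed by a dummy record,
     * kept only so the next mount does the unlinked list processing. */
    if (nrecs == 2 && recs[0] == REC_UNMOUNT && recs[1] == REC_DUMMY)
        return LOG_UNLINKED_ONLY;

    return LOG_NEEDS_REPLAY;
}
```

A check tool could then continue on LOG_UNLINKED_ONLY and refuse only on LOG_NEEDS_REPLAY, instead of treating any head_blk != tail_blk as fatal.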
> 
> --Tim


--Tim
