public inbox for linux-xfs@vger.kernel.org
From: Lachlan McIlroy <lmcilroy@redhat.com>
To: John Quigley <jquigley@jquigley.com>
Cc: XFS Development <xfs@oss.sgi.com>
Subject: Re: XFS corruption with failover
Date: Fri, 21 Aug 2009 03:11:15 -0400 (EDT)
Message-ID: <193616431.76011250838675275.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
In-Reply-To: <1194138654.75921250838215929.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>

----- "John Quigley" <jquigley@jquigley.com> wrote:

> Lachlan McIlroy wrote:
> > xfs_logprint doesn't find any problems with this log but that doesn't mean
> > the kernel doesn't - they use different implementations to read the log.  I
> > noticed that the active part of the log wraps around the physical end/start
> > of the log which reminds me of this fix:

Hang on, I made a mistake there.  The xfs_logprint transactional view
of the log didn't find any errors but dumping the contents of the log
shows a different story.

$ xfs_logprint -f xfs-failover-logprint 
xfs_logprint:
    data device: 0xffffffffffffffff
    log device: 0xffffffffffffffff daddr: 0 length: 262144

Header 0xb wanted 0xfeedbabe
**********************************************************************
* ERROR: header cycle=11          block=6168                         *
**********************************************************************
Bad log record header

$ xfs_logprint -d -f xfs-failover-logprint 

xfs_logprint:
    data device: 0xffffffffffffffff
    log device: 0xffffffffffffffff daddr: 0 length: 262144

[00000 - 00000] Cycle 0xffffffff New Cycle 0x0000000c
    32 HEADER Cycle 12 tail 11:257848 len  32256 ops 707
    96 HEADER Cycle 12 tail 11:257848 len  24064 ops 456
   144 HEADER Cycle 12 tail 11:257848 len   3584 ops 25
   152 HEADER Cycle 12 tail 11:257848 len  32256 ops 708
   216 HEADER Cycle 12 tail 11:257848 len  32256 ops 706
   280 HEADER Cycle 12 tail 11:257848 len  32256 ops 709
   344 HEADER Cycle 12 tail 11:257848 len   3584 ops 18
   352 HEADER Cycle 12 tail 11:257848 len  32256 ops 708
   416 HEADER Cycle 12 tail 11:257848 len  32256 ops 706
   480 HEADER Cycle 12 tail 11:257848 len  32256 ops 709
   544 HEADER Cycle 12 tail 11:257848 len  32256 ops 707
   608 HEADER Cycle 12 tail 11:257848 len  32256 ops 710
   672 HEADER Cycle 12 tail 11:257848 len  32256 ops 707
   736 HEADER Cycle 12 tail 11:257848 len  32256 ops 709
   800 HEADER Cycle 12 tail 11:257848 len  32256 ops 707
   864 HEADER Cycle 12 tail 11:257848 len  32256 ops 706
   928 HEADER Cycle 12 tail 11:257848 len  32256 ops 709
   992 HEADER Cycle 12 tail 11:257848 len  32256 ops 707
  1056 HEADER Cycle 12 tail 11:257848 len  32256 ops 710
  1120 HEADER Cycle 12 tail 11:257848 len  32256 ops 707
  1184 HEADER Cycle 12 tail 11:257848 len  32256 ops 709
  1248 HEADER Cycle 12 tail 11:257848 len  32256 ops 707
  1312 HEADER Cycle 12 tail 11:257848 len  32256 ops 706
  1376 HEADER Cycle 12 tail 11:257848 len  32256 ops 709
  1440 HEADER Cycle 12 tail 11:257848 len  32256 ops 707
  1504 HEADER Cycle 12 tail 11:257848 len  32256 ops 710
  1568 HEADER Cycle 12 tail 11:257848 len  24064 ops 437
  1616 HEADER Cycle 12 tail 11:257848 len   3584 ops 25
  1624 HEADER Cycle 12 tail 11:257848 len  32256 ops 708
  1688 HEADER Cycle 12 tail 11:257848 len  32256 ops 706
  1752 HEADER Cycle 12 tail 11:257848 len  32256 ops 709
  1816 HEADER Cycle 12 tail 11:257848 len  32256 ops 707
  1880 HEADER Cycle 12 tail 11:257848 len  32256 ops 710
  1944 HEADER Cycle 12 tail 11:257848 len  32256 ops 707
  2008 HEADER Cycle 12 tail 11:257848 len  32256 ops 709
  2072 HEADER Cycle 11 tail 11:257848 len      0 ops 0
[00000 - 02072] Cycle 0x0000000c New Cycle 0x0000000b
  2073 HEADER Cycle 11 tail 11:257848 len      0 ops 0
  2074 HEADER Cycle 11 tail 11:257848 len      0 ops 0
  2075 HEADER Cycle 11 tail 11:257848 len      0 ops 0
.........
  6165 HEADER Cycle 11 tail 11:257848 len      0 ops 0
  6166 HEADER Cycle 11 tail 11:257848 len      0 ops 0
  6167 HEADER Cycle 11 tail 11:257848 len      0 ops 0
  6184 HEADER Cycle 11 tail 10:260744 len  32256 ops 707
  6248 HEADER Cycle 11 tail 10:260744 len  32256 ops 710
  6312 HEADER Cycle 11 tail 10:260744 len  32256 ops 707
..........

So we get to block 6168 and there's an unexpected state change - instead
of the record header magic number (0xfeedbabe) the block starts with the
cycle number (0xb).
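
The spacing of the HEADER lines in the -d output can be checked
mechanically.  What follows is a minimal sketch in Python, assuming each
record header occupies one 512-byte basic block and is followed by "len"
bytes of payload rounded up to whole blocks.  That rule is an assumption
which agrees with the offsets in this dump (for example 32 + 1 + 32256/512
= 96); it is not taken from the xfs_logprint source.  The function names
are illustrative only.

```python
import re

# Matches lines such as "  6167 HEADER Cycle 11 tail 11:257848 len      0 ops 0"
HEADER_RE = re.compile(
    r"^\s*(\d+) HEADER Cycle (\d+) tail (\d+):(\d+) len\s+(\d+) ops (\d+)"
)
BBSIZE = 512  # basic block size in bytes (assumed)


def parse_headers(lines):
    """Yield (block, length) for each HEADER line of the dump."""
    for line in lines:
        m = HEADER_RE.match(line)
        if m:
            yield int(m.group(1)), int(m.group(5))


def find_gaps(lines):
    """Return (header_block, expected_next, actual_next) where spacing is off."""
    headers = list(parse_headers(lines))
    gaps = []
    for (blk, length), (nxt, _) in zip(headers, headers[1:]):
        # One header block plus the payload rounded up to whole blocks.
        expected = blk + 1 + (length + BBSIZE - 1) // BBSIZE
        if nxt != expected:
            gaps.append((blk, expected, nxt))
    return gaps


# Contiguous lines copied from the start of the dump above.
head = """\
    32 HEADER Cycle 12 tail 11:257848 len  32256 ops 707
    96 HEADER Cycle 12 tail 11:257848 len  24064 ops 456
   144 HEADER Cycle 12 tail 11:257848 len   3584 ops 25
   152 HEADER Cycle 12 tail 11:257848 len  32256 ops 708
   216 HEADER Cycle 12 tail 11:257848 len  32256 ops 706
""".splitlines()

# Contiguous lines copied from around the failing block.
tail = """\
  6165 HEADER Cycle 11 tail 11:257848 len      0 ops 0
  6166 HEADER Cycle 11 tail 11:257848 len      0 ops 0
  6167 HEADER Cycle 11 tail 11:257848 len      0 ops 0
  6184 HEADER Cycle 11 tail 10:260744 len  32256 ops 707
  6248 HEADER Cycle 11 tail 10:260744 len  32256 ops 710
  6312 HEADER Cycle 11 tail 10:260744 len  32256 ops 707
""".splitlines()

print(find_gaps(head))  # []
print(find_gaps(tail))  # [(6167, 6168, 6184)]
```

On the lines shown, the only mismatch is after block 6167: under the
assumed rule the next header belongs at 6168, but the next one listed is
at 6184.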

BLKNO: 6167
 0 bebaedfe  b000000  2000000        0  b000000 17180000  b000000 38ef0300 
 8        0        0        0        0        0        0        0        0 
10        0        0        0        0        0        0        0        0 
18        0        0        0        0        0        0        0        0 
20        0        0        0        0        0        0        0        0 
28        0        0        0        0        0        0        0        0 
30        0        0        0        0        0        0        0        0 
38        0        0        0        0        0        0        0        0 
40        0        0        0        0        0        0        0        0 
48        0        0        0  1000000 af447af9 4a44d930 5a9b0fa0 20d7ba86 
50        0        0        0        0        0        0        0        0 
58        0        0        0        0        0        0        0        0 
60        0        0        0        0        0        0        0        0 
68        0        0        0        0        0        0        0        0 
70        0        0        0        0        0        0        0        0 
78        0        0        0        0        0        0        0        0 

BLKNO: 6168
 0  b000000       69 81a4494e    10201       63       63        1        0 
 8        0    20000 4a857400 207b4682 4a859784 21b73460 4a859784 21b73460 
10  5f60000        0        0        0        0        0  2000000        0 
18        0        0 b00428a0        0      269 780528a0        0      169 
20 780528a0 10000000       69 5452414e        3        0        1 780528a0 
28 38000000       69    2123b        1        0        0   12c126        0 
30        0        0        0        0    96090        0       10      600 
38 780528a0 60000000       69 81a4494e    10201       63       63        1 
40        0        0    20000 4a857400 207b4682 4a859784 21b73460 4a859784 
48 21b73460  5f60000        0        0        0        0        0  2000000 
50        0        0        0 780528a0        0      269 400628a0        0 
58      169 400628a0 10000000       69 5452414e        3        0        1 
60 400628a0 38000000       69    2123b        1        0        0   12c126 
68        0        0        0        0        0    96090        0       10 
70      600 400628a0 60000000       69 81a4494e    10201       63       63 
78        1        0        0    20000 4a857400 207b4682 4a859784 21c676a0 

I don't know what's happened here.  It may not even be related to the log
recovery failure.
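
For reference, the first row of each block dump can be decoded by hand or
with a short script.  The sketch below is Python and rests on two
assumptions: that the dump prints each 32-bit word in little-endian host
order while the log itself is big-endian on disk, and that a record header
starts with magic, cycle, version, len, lsn and tail lsn in that order
(my reading of the kernel's xlog_rec_header; check it against the headers
before relying on it).  The helper names are made up for illustration.

```python
import struct


def dump_words_to_bytes(words):
    """Turn the dump's hex words back into on-disk byte order.

    Assumes each word was printed as a little-endian 32-bit integer.
    """
    return b"".join(struct.pack("<I", int(w, 16)) for w in words)


def first_word(words):
    """Return the first on-disk 32-bit word of a block as an integer."""
    return struct.unpack(">I", dump_words_to_bytes(words[:1]))[0]


def decode_header(words):
    """Decode the first 32 bytes of a block as a log record header."""
    raw = dump_words_to_bytes(words[:8])
    magic, cycle, version, length, lsn, tail = struct.unpack(">IIIIQQ", raw)
    return {
        "magic": magic,
        "cycle": cycle,
        "version": version,
        "len": length,
        # An LSN is assumed to be (cycle << 32) | block.
        "lsn": (lsn >> 32, lsn & 0xFFFFFFFF),
        "tail": (tail >> 32, tail & 0xFFFFFFFF),
    }


# First row of BLKNO 6167 and BLKNO 6168, copied from the dumps above.
blk6167 = "bebaedfe b000000 2000000 0 b000000 17180000 b000000 38ef0300".split()
blk6168 = "b000000 69 81a4494e 10201 63 63 1 0".split()

hdr = decode_header(blk6167)
print(hex(hdr["magic"]), hdr["cycle"], hdr["lsn"], hdr["tail"])
# 0xfeedbabe 11 (11, 6167) (11, 257848)

print(hex(first_word(blk6168)))
# 0xb
```

Under those assumptions block 6167 decodes as a header with cycle 11 and
tail 11:257848, matching the -d listing, while block 6168 starts with 0xb,
matching the "Header 0xb wanted 0xfeedbabe" message.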

> 
> Very interesting indeed, thank you /very/ much for looking at this.
> 
> > I think the fix made it into 2.6.24.
> 
> We're currently using the very latest 2.6.30, unfortunately.  We've
> distilled this into a reproducible environment with a stack of NFS +
> XFS to a local disk + automated sysrq 'b' reboots.  We're working on
> getting this bundled up into a nice little package as a VirtualBox vm
> for your consumption.  Please tell me if this is not desirable.
> 
> Thanks very much again.
> 
> John Quigley
> jquigley.com
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

