public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Timothy Shimmin <tes@sgi.com>
To: lachlan@sgi.com
Cc: xfs@oss.sgi.com
Subject: Re: 2.6.24-rc3 oopses while mounting fs
Date: Wed, 05 Dec 2007 17:38:25 +1100	[thread overview]
Message-ID: <47564761.9070805@sgi.com> (raw)
In-Reply-To: <47563B73.2030403@sgi.com>

Lachlan,

An aside...

I think I see why we couldn't get "xfs_logprint -t" to work on a file (-f).
   > xfs_logprint -t -f ~/debug/sdc.xlg
   xfs_logprint:
       data device: 0xffffffffffffffff
       log device: 0xffffffffffffffff daddr: 0 length: 262144

   XFS: Log inconsistent (didn't find previous header)
   XFS: failed to find log head
   xfs_logprint: failed to find head and tail, error: 5

The problem is that it is using the superblock to determine if it is version 1
or version 2 logs in order to find out the maximum size of the iclog (log record).
And for -f there is no superblock to look at. The version number is, however, in the
log record header.
So when it looks in the undefined data it comes up with _not_ version 2,
and so assumes to look back for the hdr to only 32K when it really needs
to look back to 64k (for v2 it would limit at 256K).
Hence, it can't find the magic# in the header.

The debugging...

====================================================
Breakpoint 1, xlog_find_verify_log_record (log=0x607ffffffea96be0, start_blk=5312, last_blk=0x607ffffffea96bc0, extra_bblks=0)
     at xfs_log_recover.c:234
234             if (!(bp = xlog_get_bp(log, num_blks))) {
(gdb) bt
#0  xlog_find_verify_log_record (log=0x607ffffffea96be0, start_blk=5312, last_blk=0x607ffffffea96bc0, extra_bblks=0) at xfs_log_recover.c:234
#1  0x4000000000092020 in xlog_find_head (log=0x607ffffffea96be0, return_head_blk=0x607ffffffea96bd0) at xfs_log_recover.c:527
#2  0x4000000000094000 in xlog_find_tail (log=0x607ffffffea96be0, head_blk=0x607ffffffea96bd0, tail_blk=0x607ffffffea96bd8, readonly=0)
     at xfs_log_recover.c:616
#3  0x40000000000100a0 in xfs_log_print_trans (log=0x607ffffffea96be0, print_block_start=-1) at log_print_trans.c:49
#4  0x4000000000003600 in main (argc=<value optimized out>, argv=<value optimized out>) at logprint.c:240

(gdb) print *last_blk
$1 = 5376

start_blk = 5312
last_blk = 5376
extra_bblks = 0

Goes from 5376..5312 looking for magic# at start of sector.
Errors out if can't find one.
Which is true since the next magic# is at 5248.

logprint -d output:
   4992 HEADER Cycle 154 tail 153:253056 len  61440 ops 105
   5120 HEADER Cycle 154 tail 153:253056 len  61440 ops 344
   5248 HEADER Cycle 154 tail 153:253056 len  61440 ops 303  <--- previous block
   5376 HEADER Cycle 153 tail 153:253056 len      0 ops 0    <--- hdr
[00000 - 05376] Cycle 0x0000009a New Cycle 0x00000099
   5377 HEADER Cycle 153 tail 153:253056 len      0 ops 0
   5378 HEADER Cycle 153 tail 153:253056 len      0 ops 0

So why is it using a start_blk of 5312?


     522         num_scan_bblks = XLOG_REC_SHIFT(log);
     523         if (head_blk >= num_scan_bblks) {
     524                 start_blk = head_blk - num_scan_bblks; /* don't read head_blk */
     525
     526                 /* start ptr at last block ptr before head_blk */
  ->527                 if ((error = xlog_find_verify_log_record(log, start_blk,
     528                                                         &head_blk, 0)) == -1) {

#define XLOG_REC_SHIFT(log) \
         BTOBB(1 << (XFS_SB_VERSION_HASLOGV2(&log->l_mp->m_sb) ? \
          XLOG_MAX_RECORD_BSHIFT : XLOG_BIG_RECORD_BSHIFT))

#define XLOG_BIG_RECORD_BSHIFT  15              /* 32k == 1 << 15 */
#define XLOG_MAX_RECORD_BSHIFT  18              /* 256k == 1 << 18 */

Number of blocks between rec-hdrs from -d output:
5376-5248 = 128 BB

Looks like it used 5312 = 5376 - num_scan_bblks
=> num_scan_bblks = 64

Yep
(gdb) p num_scan_bblks
$3 = 64
32K
i.e. it thinks it has v1 logs

(gdb) p *(log->l_mp)
$9 = {m_sb = {sb_magicnum = 0, sb_blocksize = 0, sb_dblocks = 0, sb_rblocks = 0, sb_rextents = 0, sb_uuid = '\0' <repeats 15 times>,
     sb_logstart = 2305843009213997776, sb_rootino = 6953557824646229696, sb_rbmino = 6953557824646229688, sb_rsumino = 6953557824646229680,
     sb_rextsize = 284472, sb_agblocks = 536870912, sb_agcount = 4281151176, sb_rbmblocks = 1619001343, sb_logblocks = 0, sb_versionnum = 0,
     sb_sectsize = 0, sb_inodesize = 38256, sb_inopblock = 4, sb_fname = "\000\000\000 H?\a\000\000\000\000 ", sb_blocklog = 0 '\0',
etc...
it is not set

Okay, need to file a bug for this part so that -f can be useful here.
It needs to be using a LR h_version number for this somehow.

--Tim


Lachlan McIlroy wrote:
>> ============================================================================ 
>>
>> cycle: 154    version: 2        lsn: 154,5248    tail_lsn: 153,253056
>> length of Log Record: 61440    prev offset: 5120        num ops: 303
>> uuid: b02fc6bd-662f-40a5-8f0c-e260a881a3e7   format: little endian linux
>> h_size: 262144
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ---------------------------------------------------------------------------- 
>>
>> Oper (0): tid: 5769cd60  len: 0  clientid: TRANS  flags: START
>> ---------------------------------------------------------------------------- 
>>
>> Oper (1): tid: 5769cd60  len: 16  clientid: TRANS  flags: none
>> TRAN:    type: STRAT_WRITE       tid: 0       num_items: 5
>> ---------------------------------------------------------------------------- 
>>
>> Oper (2): tid: 5769cd60  len: 56  clientid: TRANS  flags: none
>> INODE: #regs: 3   ino: 0x1400005c  flags: 0x5   dsize: 16
>>         blkno: 610338736  len: 16  boff: 4096
>> Oper (3): tid: 5769cd60  len: 96  clientid: TRANS  flags: none
>> INODE CORE
>> magic 0x494e mode 0100600 version 1 format 2
>> nlink 1 uid 14029 gid 1474
>> atime 0x4716b870 mtime 0x4716bade ctime 0x4716bade
>> size 0x1ce00000 nblocks 0x1c840 extsize 0x0 nextents 0x1
>> naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
>> flags 0x0 gen 0x4
>> Oper (4): tid: 5769cd60  len: 16  clientid: TRANS  flags: none
>> EXTENTS inode data
>> ---------------------------------------------------------------------------- 
> 
> 
> Another interesting point is that the transaction id has changed between
> these operations from 5769cd60 to 5769cf18.  It should stay the same.
> We may be reading old log data.
> 
>>
>> Oper (5): tid: 5769cf18  len: 24  clientid: TRANS  flags: none
>> BUF:  #regs: 2   start blkno: 610338561 (0x24610701)  len: 1  bmap 
>> size: 1  flags: 0x0
>> Oper (6): tid: 5769cf18  len: 128  clientid: TRANS  flags: none
>> AGF Buffer: XAGF
>> ver: 1  seq#: 5  len: 15258464
>> root BNO: 1  CNT: 2
>> level BNO: 1  CNT: 1
>> 1st: 0  last: 3  cnt: 4  freeblks: 2491090  longest: 581471
>> ---------------------------------------------------------------------------- 
>>
>> Oper (7): tid: 5769cf18  len: 28  clientid: TRANS  flags: none
>> BUF:  #regs: 2   start blkno: 610338568 (0x24610708)  len: 8  bmap 
>> size: 2  flags: 0x0
>> Oper (8): tid: 5769cf18  len: 128  clientid: TRANS  flags: none
>> BUF DATA
>> ---------------------------------------------------------------------------- 
>>

  reply	other threads:[~2007-12-05  6:36 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-11-28 13:45 2.6.24-rc3 oopses while mounting fs KELEMEN Peter
2007-11-28 23:56 ` Lachlan McIlroy
2007-11-30 22:31   ` KELEMEN Peter
2007-12-03  3:24     ` Lachlan McIlroy
2007-12-03 22:02       ` KELEMEN Peter
2007-12-04  0:11         ` Timothy Shimmin
2007-12-04  7:08           ` Lachlan McIlroy
2007-12-04  9:57             ` KELEMEN Peter
2007-12-05  5:47             ` Lachlan McIlroy
2007-12-05  6:38               ` Timothy Shimmin [this message]
2007-12-05  7:04                 ` Lachlan McIlroy
2007-12-04  9:42           ` KELEMEN Peter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47564761.9070805@sgi.com \
    --to=tes@sgi.com \
    --cc=lachlan@sgi.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox