public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* 2.6.24-rc3 oopses while mounting fs
@ 2007-11-28 13:45 KELEMEN Peter
  2007-11-28 23:56 ` Lachlan McIlroy
  0 siblings, 1 reply; 12+ messages in thread
From: KELEMEN Peter @ 2007-11-28 13:45 UTC (permalink / raw)
  To: xfs

Box is x86_64 with RHEL4 userspace.  I have the log image saved
and I can play with the filesystem if needed.

Peter

SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
SGI XFS Quota Management subsystem
XFS mounting filesystem sdc
Starting XFS recovery on filesystem: sdc (logdev: internal)
Unable to handle kernel paging request at ffffc20001c9b000 RIP: 
 [<ffffffff881c6175>] :xfs:xlog_recover_add_to_trans+0x64/0xef
PGD 15781d067 PUD 15781c067 PMD 130fda067 PTE 0
Oops: 0000 [1] SMP 
CPU 0 
Modules linked in: xfs sd_mod 3w_9xxx scsi_mod ohci_hcd uhci_hcd ehci_hcd ipv6 bnx2 sky2 r8169 ns83820 dl2k acenic e100 tg3 e1000 mii
Pid: 2434, comm: mount Not tainted 2.6.24-rc3 #1
RIP: 0010:[<ffffffff881c6175>]  [<ffffffff881c6175>] :xfs:xlog_recover_add_to_trans+0x64/0xef
RSP: 0018:ffff81013240f728  EFLAGS: 00010286
RAX: ffffc20001c9c000 RBX: 000000000020bd78 RCX: 00000000001cc280
RDX: 0000000000000000 RSI: ffffc20001c9b000 RDI: ffffc20001cdbaf8
RBP: ffff81013240f758 R08: ffff8101578011a2 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101318e4220
R13: ffffc20001c5b508 R14: ffff8101318e4770 R15: ffffc20001c5b4fc
FS:  00002b0dc2141b00(0000) GS:ffffffff805ef000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffc20001c9b000 CR3: 000000013243d000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount (pid: 2434, threadinfo ffff81013240e000, task ffff8101320a4000)
Stack:  0020bd785769c5e8 000000005769c5e8 000000000000000a ffffc20001c5b508
 000000000000011e ffffc20001c5b4fc ffff81013240f7b8 ffffffff881c79db
 ffffc20001c6a000 00000001881da0bc ffff81013246c000 ffff81013240f828
Call Trace:
 [<ffffffff881c79db>] :xfs:xlog_recover_process_data+0x184/0x1d9
 [<ffffffff881c89cb>] :xfs:xlog_do_recovery_pass+0x74b/0x795
 [<ffffffff881c8a52>] :xfs:xlog_do_log_recovery+0x3d/0x82
 [<ffffffff881c8aa9>] :xfs:xlog_do_recover+0x12/0x11c
 [<ffffffff881c8c37>] :xfs:xlog_recover+0x84/0x92
 [<ffffffff881c3125>] :xfs:xfs_log_mount+0x8c/0xe4
 [<ffffffff881cad92>] :xfs:xfs_mountfs+0x67d/0x97b
 [<ffffffff881cbcb7>] :xfs:xfs_mru_cache_create+0x170/0x1d5
 [<ffffffff881b2f04>] :xfs:xfs_fstrm_free_func+0x0/0x81
 [<ffffffff881be67c>] :xfs:xfs_ioinit+0xb/0xd
 [<ffffffff881d0cfa>] :xfs:xfs_mount+0x2bb/0x36b
 [<ffffffff881e1355>] :xfs:xfs_fs_fill_super+0xd2/0x245
 [<ffffffff802b7b9f>] get_filesystem+0x17/0x39
 [<ffffffff802a4659>] sget+0x3fb/0x418
 [<ffffffff802a4661>] sget+0x403/0x418
 [<ffffffff802a4d79>] set_bdev_super+0x0/0x14
 [<ffffffff802a4ec2>] get_sb_bdev+0x123/0x16f
 [<ffffffff881e1283>] :xfs:xfs_fs_fill_super+0x0/0x245
 [<ffffffff881e14db>] :xfs:xfs_fs_get_sb+0x13/0x18
 [<ffffffff802a5107>] vfs_kern_mount+0x8f/0x11c
 [<ffffffff802a51d8>] do_kern_mount+0x44/0xf4
 [<ffffffff802ba0d3>] do_mount+0x6d8/0x71e
 [<ffffffff80335682>] __up_read+0x93/0x9b
 [<ffffffff80252261>] up_read+0x23/0x27
 [<ffffffff8047a679>] do_page_fault+0x42e/0x7c7
 [<ffffffff80284d61>] zone_statistics+0x64/0x69
 [<ffffffff8027e4e0>] __alloc_pages+0x6b/0x311
 [<ffffffff802ba3c8>] sys_mount+0x8a/0xcc
 [<ffffffff8020bf8e>] system_call+0x7e/0x83


Code: f3 a4 49 89 c7 49 8b 54 24 08 8b 42 18 85 c0 74 0e 3b 42 14 

-- 
    .+'''+.         .+'''+.         .+'''+.         .+'''+.         .+''
 Kelemen Péter     /       \       /       \     Peter.Kelemen@cern.ch
.+'         `+...+'         `+...+'         `+...+'         `+...+'

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.24-rc3 oopses while mounting fs
  2007-11-28 13:45 2.6.24-rc3 oopses while mounting fs KELEMEN Peter
@ 2007-11-28 23:56 ` Lachlan McIlroy
  2007-11-30 22:31   ` KELEMEN Peter
  0 siblings, 1 reply; 12+ messages in thread
From: Lachlan McIlroy @ 2007-11-28 23:56 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: Type: text/plain, Size: 3677 bytes --]

Peter,

This looks like a problem we've just fixed, try this patch.
We'll get this to mainline soon.

Lachlan

KELEMEN Peter wrote:
> Box is x86_64 with RHEL4 userspace.  I have the log image saved
> and I can play with the filesystem if needed.
> 
> Peter
> 
> SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
> SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
> SGI XFS Quota Management subsystem
> XFS mounting filesystem sdc
> Starting XFS recovery on filesystem: sdc (logdev: internal)
> Unable to handle kernel paging request at ffffc20001c9b000 RIP: 
>  [<ffffffff881c6175>] :xfs:xlog_recover_add_to_trans+0x64/0xef
> PGD 15781d067 PUD 15781c067 PMD 130fda067 PTE 0
> Oops: 0000 [1] SMP 
> CPU 0 
> Modules linked in: xfs sd_mod 3w_9xxx scsi_mod ohci_hcd uhci_hcd ehci_hcd ipv6 bnx2 sky2 r8169 ns83820 dl2k acenic e100 tg3 e1000 mii
> Pid: 2434, comm: mount Not tainted 2.6.24-rc3 #1
> RIP: 0010:[<ffffffff881c6175>]  [<ffffffff881c6175>] :xfs:xlog_recover_add_to_trans+0x64/0xef
> RSP: 0018:ffff81013240f728  EFLAGS: 00010286
> RAX: ffffc20001c9c000 RBX: 000000000020bd78 RCX: 00000000001cc280
> RDX: 0000000000000000 RSI: ffffc20001c9b000 RDI: ffffc20001cdbaf8
> RBP: ffff81013240f758 R08: ffff8101578011a2 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101318e4220
> R13: ffffc20001c5b508 R14: ffff8101318e4770 R15: ffffc20001c5b4fc
> FS:  00002b0dc2141b00(0000) GS:ffffffff805ef000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: ffffc20001c9b000 CR3: 000000013243d000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process mount (pid: 2434, threadinfo ffff81013240e000, task ffff8101320a4000)
> Stack:  0020bd785769c5e8 000000005769c5e8 000000000000000a ffffc20001c5b508
>  000000000000011e ffffc20001c5b4fc ffff81013240f7b8 ffffffff881c79db
>  ffffc20001c6a000 00000001881da0bc ffff81013246c000 ffff81013240f828
> Call Trace:
>  [<ffffffff881c79db>] :xfs:xlog_recover_process_data+0x184/0x1d9
>  [<ffffffff881c89cb>] :xfs:xlog_do_recovery_pass+0x74b/0x795
>  [<ffffffff881c8a52>] :xfs:xlog_do_log_recovery+0x3d/0x82
>  [<ffffffff881c8aa9>] :xfs:xlog_do_recover+0x12/0x11c
>  [<ffffffff881c8c37>] :xfs:xlog_recover+0x84/0x92
>  [<ffffffff881c3125>] :xfs:xfs_log_mount+0x8c/0xe4
>  [<ffffffff881cad92>] :xfs:xfs_mountfs+0x67d/0x97b
>  [<ffffffff881cbcb7>] :xfs:xfs_mru_cache_create+0x170/0x1d5
>  [<ffffffff881b2f04>] :xfs:xfs_fstrm_free_func+0x0/0x81
>  [<ffffffff881be67c>] :xfs:xfs_ioinit+0xb/0xd
>  [<ffffffff881d0cfa>] :xfs:xfs_mount+0x2bb/0x36b
>  [<ffffffff881e1355>] :xfs:xfs_fs_fill_super+0xd2/0x245
>  [<ffffffff802b7b9f>] get_filesystem+0x17/0x39
>  [<ffffffff802a4659>] sget+0x3fb/0x418
>  [<ffffffff802a4661>] sget+0x403/0x418
>  [<ffffffff802a4d79>] set_bdev_super+0x0/0x14
>  [<ffffffff802a4ec2>] get_sb_bdev+0x123/0x16f
>  [<ffffffff881e1283>] :xfs:xfs_fs_fill_super+0x0/0x245
>  [<ffffffff881e14db>] :xfs:xfs_fs_get_sb+0x13/0x18
>  [<ffffffff802a5107>] vfs_kern_mount+0x8f/0x11c
>  [<ffffffff802a51d8>] do_kern_mount+0x44/0xf4
>  [<ffffffff802ba0d3>] do_mount+0x6d8/0x71e
>  [<ffffffff80335682>] __up_read+0x93/0x9b
>  [<ffffffff80252261>] up_read+0x23/0x27
>  [<ffffffff8047a679>] do_page_fault+0x42e/0x7c7
>  [<ffffffff80284d61>] zone_statistics+0x64/0x69
>  [<ffffffff8027e4e0>] __alloc_pages+0x6b/0x311
>  [<ffffffff802ba3c8>] sys_mount+0x8a/0xcc
>  [<ffffffff8020bf8e>] system_call+0x7e/0x83
> 
> 
> Code: f3 a4 49 89 c7 49 8b 54 24 08 8b 42 18 85 c0 74 0e 3b 42 14 
> 

[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 1480 bytes --]

--- a/fs/xfs/linux-2.6/xfs_buf.c	2007-11-29 10:53:01.000000000 +1100
+++ b/fs/xfs/linux-2.6/xfs_buf.c	2007-11-29 10:53:01.000000000 +1100
@@ -725,15 +725,15 @@
 {
 	int			rval;
 	int			i = 0;
-	size_t			ptr;
-	size_t			end, end_cur;
-	off_t			offset;
+	unsigned long		pageaddr;
+	unsigned long		offset;
+	size_t			buflen;
 	int			page_count;
 
-	page_count = PAGE_CACHE_ALIGN(len) >> PAGE_CACHE_SHIFT;
-	offset = (off_t) mem - ((off_t)mem & PAGE_CACHE_MASK);
-	if (offset && (len > PAGE_CACHE_SIZE))
-		page_count++;
+	pageaddr = (unsigned long)mem & PAGE_CACHE_MASK;
+	offset = (unsigned long)mem - pageaddr;
+	buflen = PAGE_CACHE_ALIGN(len + offset);
+	page_count = buflen >> PAGE_CACHE_SHIFT;
 
 	/* Free any previous set of page pointers */
 	if (bp->b_pages)
@@ -747,22 +747,15 @@
 		return rval;
 
 	bp->b_offset = offset;
-	ptr = (size_t) mem & PAGE_CACHE_MASK;
-	end = PAGE_CACHE_ALIGN((size_t) mem + len);
-	end_cur = end;
-	/* set up first page */
-	bp->b_pages[0] = mem_to_page(mem);
-
-	ptr += PAGE_CACHE_SIZE;
-	bp->b_page_count = ++i;
-	while (ptr < end) {
-		bp->b_pages[i] = mem_to_page((void *)ptr);
-		bp->b_page_count = ++i;
-		ptr += PAGE_CACHE_SIZE;
+
+	for (i = 0; i < bp->b_page_count; i++) {
+		bp->b_pages[i] = mem_to_page((void *)pageaddr);
+		pageaddr += PAGE_CACHE_SIZE;
 	}
 	bp->b_locked = 0;
 
-	bp->b_count_desired = bp->b_buffer_length = len;
+	bp->b_count_desired = len;
+	bp->b_buffer_length = buflen;
 	bp->b_flags |= XBF_MAPPED;
 
 	return 0;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.24-rc3 oopses while mounting fs
  2007-11-28 23:56 ` Lachlan McIlroy
@ 2007-11-30 22:31   ` KELEMEN Peter
  2007-12-03  3:24     ` Lachlan McIlroy
  0 siblings, 1 reply; 12+ messages in thread
From: KELEMEN Peter @ 2007-11-30 22:31 UTC (permalink / raw)
  To: xfs

* Lachlan Mcilroy (lachlan@sgi.com) [20071129 10:56]:

Lachlan,

> This looks like a problem we've just fixed, try this patch.
> We'll get this to mainline soon.

Thanks for the patch.  Unfortunately, it does not seem to fix my
problem.  Attempting to mount the filesystem still results in an
oops.

Peter

-----8<---
SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
SGI XFS Quota Management subsystem
XFS mounting filesystem sdc
Starting XFS recovery on filesystem: sdc (logdev: internal)
Unable to handle kernel paging request at ffffc20001c9b000 RIP: 
 [<ffffffff881c6175>] :xfs:xlog_recover_add_to_trans+0x64/0xef
PGD 15781d067 PUD 15781c067 PMD 130fd3067 PTE 0
Oops: 0000 [1] SMP 
CPU 1 
Modules linked in: xfs sd_mod 3w_9xxx scsi_mod ohci_hcd uhci_hcd ehci_hcd ipv6 bnx2 sky2 r8169 ns838
20 dl2k acenic e100 tg3 e1000 mii
Pid: 2431, comm: mount Not tainted 2.6.24-rc3-xfsbuf #2
RIP: 0010:[<ffffffff881c6175>]  [<ffffffff881c6175>] :xfs:xlog_recover_add_to_trans+0x64/0xef
RSP: 0018:ffff8101320d9728  EFLAGS: 00010286
RAX: ffffc20001c9c000 RBX: 000000000020bd78 RCX: 00000000001cc280
RDX: 0000000000000000 RSI: ffffc20001c9b000 RDI: ffffc20001cdbaf8
RBP: ffff8101320d9758 R08: ffff810157801652 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101324633b8
R13: ffffc20001c5b508 R14: ffff810132463110 R15: ffffc20001c5b4fc
FS:  00002abba6e87b00(0000) GS:ffff810157801708(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffc20001c9b000 CR3: 000000013252a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount (pid: 2431, threadinfo ffff8101320d8000, task ffff810131972000)
Stack:  0020bd785769c5e8 000000005769c5e8 000000000000000a ffffc20001c5b508
 000000000000011e ffffc20001c5b4fc ffff8101320d97b8 ffffffff881c79db
 ffffc20001c6a000 00000001881da0bc ffff81013188f000 ffff8101320d9828
Call Trace:
 [<ffffffff881c79db>] :xfs:xlog_recover_process_data+0x184/0x1d9
 [<ffffffff881c89cb>] :xfs:xlog_do_recovery_pass+0x74b/0x795
 [<ffffffff881c8a52>] :xfs:xlog_do_log_recovery+0x3d/0x82
 [<ffffffff881c8aa9>] :xfs:xlog_do_recover+0x12/0x11c
 [<ffffffff881c8c37>] :xfs:xlog_recover+0x84/0x92
 [<ffffffff881c3125>] :xfs:xfs_log_mount+0x8c/0xe4
 [<ffffffff881cad92>] :xfs:xfs_mountfs+0x67d/0x97b
 [<ffffffff881cbcb7>] :xfs:xfs_mru_cache_create+0x170/0x1d5
 [<ffffffff881b2f04>] :xfs:xfs_fstrm_free_func+0x0/0x81
 [<ffffffff881be67c>] :xfs:xfs_ioinit+0xb/0xd
 [<ffffffff881d0cfa>] :xfs:xfs_mount+0x2bb/0x36b
 [<ffffffff881e12d9>] :xfs:xfs_fs_fill_super+0xd2/0x245
 [<ffffffff802b7b9f>] get_filesystem+0x17/0x39
 [<ffffffff802a4659>] sget+0x3fb/0x418
 [<ffffffff802a4661>] sget+0x403/0x418
 [<ffffffff802a4d79>] set_bdev_super+0x0/0x14
 [<ffffffff802a4ec2>] get_sb_bdev+0x123/0x16f
 [<ffffffff881e1207>] :xfs:xfs_fs_fill_super+0x0/0x245
 [<ffffffff881e145f>] :xfs:xfs_fs_get_sb+0x13/0x18
 [<ffffffff802a5107>] vfs_kern_mount+0x8f/0x11c
 [<ffffffff802a51d8>] do_kern_mount+0x44/0xf4
 [<ffffffff802ba0d3>] do_mount+0x6d8/0x71e
 [<ffffffff80335682>] __up_read+0x93/0x9b
 [<ffffffff80252261>] up_read+0x23/0x27
 [<ffffffff8047a679>] do_page_fault+0x42e/0x7c7
 [<ffffffff80284d61>] zone_statistics+0x64/0x69
 [<ffffffff8027e4e0>] __alloc_pages+0x6b/0x311
 [<ffffffff802ba3c8>] sys_mount+0x8a/0xcc
 [<ffffffff8020bf8e>] system_call+0x7e/0x83

Code: f3 a4 49 89 c7 49 8b 54 24 08 8b 42 18 85 c0 74 0e 3b 42 14 

-- 
    .+'''+.         .+'''+.         .+'''+.         .+'''+.         .+''
 Kelemen Péter     /       \       /       \     Peter.Kelemen@cern.ch
.+'         `+...+'         `+...+'         `+...+'         `+...+'

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.24-rc3 oopses while mounting fs
  2007-11-30 22:31   ` KELEMEN Peter
@ 2007-12-03  3:24     ` Lachlan McIlroy
  2007-12-03 22:02       ` KELEMEN Peter
  0 siblings, 1 reply; 12+ messages in thread
From: Lachlan McIlroy @ 2007-12-03  3:24 UTC (permalink / raw)
  To: xfs

Okay, sounds like it might be a corrupt log.
Can you run xfs_logprint on the device or saved log?
Also give xfs_logprint -t -i a go.

You've saved the log into a file?  You can get the
filesystem mounted again by deleting the log with
xfs_repair -L <dev>.  You'll probably need to run
xfs_repair over the filesystem to be safe.

KELEMEN Peter wrote:
> * Lachlan Mcilroy (lachlan@sgi.com) [20071129 10:56]:
> 
> Lachlan,
> 
>> This looks like a problem we've just fixed, try this patch.
>> We'll get this to mainline soon.
> 
> Thanks for the patch.  Unfortunately, it does not seem to fix my
> problem.  Attempting to mount the filesystem still results in an
> oops.
> 
> Peter
> 
> -----8<---
> SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
> SGI XFS Quota Management subsystem
> XFS mounting filesystem sdc
> Starting XFS recovery on filesystem: sdc (logdev: internal)
> Unable to handle kernel paging request at ffffc20001c9b000 RIP: 
>  [<ffffffff881c6175>] :xfs:xlog_recover_add_to_trans+0x64/0xef
> PGD 15781d067 PUD 15781c067 PMD 130fd3067 PTE 0
> Oops: 0000 [1] SMP 
> CPU 1 
> Modules linked in: xfs sd_mod 3w_9xxx scsi_mod ohci_hcd uhci_hcd ehci_hcd ipv6 bnx2 sky2 r8169 ns838
> 20 dl2k acenic e100 tg3 e1000 mii
> Pid: 2431, comm: mount Not tainted 2.6.24-rc3-xfsbuf #2
> RIP: 0010:[<ffffffff881c6175>]  [<ffffffff881c6175>] :xfs:xlog_recover_add_to_trans+0x64/0xef
> RSP: 0018:ffff8101320d9728  EFLAGS: 00010286
> RAX: ffffc20001c9c000 RBX: 000000000020bd78 RCX: 00000000001cc280
> RDX: 0000000000000000 RSI: ffffc20001c9b000 RDI: ffffc20001cdbaf8
> RBP: ffff8101320d9758 R08: ffff810157801652 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101324633b8
> R13: ffffc20001c5b508 R14: ffff810132463110 R15: ffffc20001c5b4fc
> FS:  00002abba6e87b00(0000) GS:ffff810157801708(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: ffffc20001c9b000 CR3: 000000013252a000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process mount (pid: 2431, threadinfo ffff8101320d8000, task ffff810131972000)
> Stack:  0020bd785769c5e8 000000005769c5e8 000000000000000a ffffc20001c5b508
>  000000000000011e ffffc20001c5b4fc ffff8101320d97b8 ffffffff881c79db
>  ffffc20001c6a000 00000001881da0bc ffff81013188f000 ffff8101320d9828
> Call Trace:
>  [<ffffffff881c79db>] :xfs:xlog_recover_process_data+0x184/0x1d9
>  [<ffffffff881c89cb>] :xfs:xlog_do_recovery_pass+0x74b/0x795
>  [<ffffffff881c8a52>] :xfs:xlog_do_log_recovery+0x3d/0x82
>  [<ffffffff881c8aa9>] :xfs:xlog_do_recover+0x12/0x11c
>  [<ffffffff881c8c37>] :xfs:xlog_recover+0x84/0x92
>  [<ffffffff881c3125>] :xfs:xfs_log_mount+0x8c/0xe4
>  [<ffffffff881cad92>] :xfs:xfs_mountfs+0x67d/0x97b
>  [<ffffffff881cbcb7>] :xfs:xfs_mru_cache_create+0x170/0x1d5
>  [<ffffffff881b2f04>] :xfs:xfs_fstrm_free_func+0x0/0x81
>  [<ffffffff881be67c>] :xfs:xfs_ioinit+0xb/0xd
>  [<ffffffff881d0cfa>] :xfs:xfs_mount+0x2bb/0x36b
>  [<ffffffff881e12d9>] :xfs:xfs_fs_fill_super+0xd2/0x245
>  [<ffffffff802b7b9f>] get_filesystem+0x17/0x39
>  [<ffffffff802a4659>] sget+0x3fb/0x418
>  [<ffffffff802a4661>] sget+0x403/0x418
>  [<ffffffff802a4d79>] set_bdev_super+0x0/0x14
>  [<ffffffff802a4ec2>] get_sb_bdev+0x123/0x16f
>  [<ffffffff881e1207>] :xfs:xfs_fs_fill_super+0x0/0x245
>  [<ffffffff881e145f>] :xfs:xfs_fs_get_sb+0x13/0x18
>  [<ffffffff802a5107>] vfs_kern_mount+0x8f/0x11c
>  [<ffffffff802a51d8>] do_kern_mount+0x44/0xf4
>  [<ffffffff802ba0d3>] do_mount+0x6d8/0x71e
>  [<ffffffff80335682>] __up_read+0x93/0x9b
>  [<ffffffff80252261>] up_read+0x23/0x27
>  [<ffffffff8047a679>] do_page_fault+0x42e/0x7c7
>  [<ffffffff80284d61>] zone_statistics+0x64/0x69
>  [<ffffffff8027e4e0>] __alloc_pages+0x6b/0x311
>  [<ffffffff802ba3c8>] sys_mount+0x8a/0xcc
>  [<ffffffff8020bf8e>] system_call+0x7e/0x83
> 
> Code: f3 a4 49 89 c7 49 8b 54 24 08 8b 42 18 85 c0 74 0e 3b 42 14 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.24-rc3 oopses while mounting fs
  2007-12-03  3:24     ` Lachlan McIlroy
@ 2007-12-03 22:02       ` KELEMEN Peter
  2007-12-04  0:11         ` Timothy Shimmin
  0 siblings, 1 reply; 12+ messages in thread
From: KELEMEN Peter @ 2007-12-03 22:02 UTC (permalink / raw)
  To: xfs

* Lachlan Mcilroy (lachlan@sgi.com) [20071203 14:24]:

Lachlan,

> Okay, sounds like it might be a corrupt log.

Yep.

> Can you run xfs_logprint on the device or saved log?

Sure.

http://cern.ch/fuji/lxfsre1103/xfs_logprint1.txt.bz2
It aborted though, with the following message:

	xfs_logprint: unknown log operation type (343b)
	Bad data in log

> Also give xfs_logprint -t -i a go.

http://cern.ch/fuji/lxfsre1103/xfs_logprint2.txt.bz2

> You've saved the log into a file?  You can get the filesystem
> mounted again by deleting the log with xfs_repair -L <dev>.
> You'll probably need to run xfs_repair over the filesystem to be
> safe.

I conserved this filesystem away for further analysis, I'm not
sure how helpful it can be.

Thanks,
Peter

-- 
    .+'''+.         .+'''+.         .+'''+.         .+'''+.         .+''
 Kelemen Péter     /       \       /       \     Peter.Kelemen@cern.ch
.+'         `+...+'         `+...+'         `+...+'         `+...+'

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.24-rc3 oopses while mounting fs
  2007-12-03 22:02       ` KELEMEN Peter
@ 2007-12-04  0:11         ` Timothy Shimmin
  2007-12-04  7:08           ` Lachlan McIlroy
  2007-12-04  9:42           ` KELEMEN Peter
  0 siblings, 2 replies; 12+ messages in thread
From: Timothy Shimmin @ 2007-12-04  0:11 UTC (permalink / raw)
  To: xfs

Hi Peter,

FYI

Just a couple of things in case you weren't aware.
Often it is simplest to save the log using the -C option to a file.
        -C filename
               Copy the log from the filesystem to the file filename.
               The log itself is not printed.
The log can then be analysed in various ways later.

Running logprint in operation mode (the default mode without options)
(non-transaction mode without -t)
is really pretty useless without the -s option.
If you use the -s option then one needs to know where to seek to
(you can use -t to find the head/tail or -d to see where log records start).
The problem here is that if you do logprint
like this (no options) then it will start at offset zero and will invariably have
trouble decoding if the log has wrapped around (the general case) as
you'll start in the middle of a record.
So the -t option is often a more interesting starting point as it
will operate more like the file system recovery does, finding the head and tail,
and processing from the tail to the head of the log.

If you want to save the filesystem away for possible metadata debugging later,
then xfs_metadump is your friend.

Cheers,
Tim.

KELEMEN Peter wrote:
> * Lachlan Mcilroy (lachlan@sgi.com) [20071203 14:24]:
> 
> Lachlan,
> 
>> Okay, sounds like it might be a corrupt log.
> 
> Yep.
> 
>> Can you run xfs_logprint on the device or saved log?
> 
> Sure.
> 
> http://cern.ch/fuji/lxfsre1103/xfs_logprint1.txt.bz2
> It aborted though, with the following message:
> 
> 	xfs_logprint: unknown log operation type (343b)
> 	Bad data in log
> 
>> Also give xfs_logprint -t -i a go.
> 
> http://cern.ch/fuji/lxfsre1103/xfs_logprint2.txt.bz2
> 
>> You've saved the log into a file?  You can get the filesystem
>> mounted again by deleting the log with xfs_repair -L <dev>.
>> You'll probably need to run xfs_repair over the filesystem to be
>> safe.
> 
> I conserved this filesystem away for further analysis, I'm not
> sure how helpful it can be.
> 
> Thanks,
> Peter
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.24-rc3 oopses while mounting fs
  2007-12-04  0:11         ` Timothy Shimmin
@ 2007-12-04  7:08           ` Lachlan McIlroy
  2007-12-04  9:57             ` KELEMEN Peter
  2007-12-05  5:47             ` Lachlan McIlroy
  2007-12-04  9:42           ` KELEMEN Peter
  1 sibling, 2 replies; 12+ messages in thread
From: Lachlan McIlroy @ 2007-12-04  7:08 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: xfs

The transactional view appears to be truncated.

============================================================================
TRANS: tid:0x5769cb28  type:STRAT_WRITE  #items:5  trans:0x0  q:0x56dfa0
INO: cnt:3 total:3 a:0x56f5e0 len:56 a:0x56f570 len:96 a:0x56df80 len:16
         INODE: #regs:3   ino:0x1400005c  flags:0x5   dsize:16
         CORE inode:
                 magic:IN  mode:0x8180  ver:1  format:2  onlink:1
                 uid:14029  gid:1474  nlink:1 projid:0
                 atime:1192671344  mtime:1192672095  ctime:1192672095
                 flushiter:20
                 size:0x20700000  nblks:0x20740  exsize:0  nextents:1  anextents:0
                 forkoff:0  dmevmask:0x0  dmstate:0  flags:0x0  gen:4
                 DATA FORK EXTENTS inode data:
BUF: cnt:2 total:2 a:0x56dc50 len:24 a:0x56dca0 len:128
         BUF:  #regs:2   start blkno:0x24610701   len:1   bmap size:1   flags:0x0
         AGF Buffer: (XAGF)
BUF: cnt:2 total:2 a:0x56ddc0 len:28 a:0x56dd30 len:128
         BUF:  #regs:2   start blkno:0x24610708   len:8   bmap size:2   flags:0x0
         BUF DATA
BUF: cnt:3 total:3 a:0x56df20 len:28 a:0x56f010 len:128 a:0x56f160 len:256
         BUF:  #regs:3   start blkno:0x24610710   len:8   bmap size:2   flags:0x0
         BUF DATA
         BUF DATA
BUF: cnt:2 total:2 a:0x56ddf0 len:24 a:0x56f0a0 len:128
         BUF:  #regs:2   start blkno:0x0   len:1   bmap size:1   flags:0x0
         SUPER Block Buffer:

LOG REC AT LSN cycle 154 block 5248 (0x9a, 0x1480)

... just stops there.  Looking at the non-transactional view at this point:

============================================================================
cycle: 154	version: 2		lsn: 154,5248	tail_lsn: 153,253056
length of Log Record: 61440	prev offset: 5120		num ops: 303
uuid: b02fc6bd-662f-40a5-8f0c-e260a881a3e7   format: little endian linux
h_size: 262144
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
extended-header: cycle: 154
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
extended-header: cycle: 154
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
extended-header: cycle: 154
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
extended-header: cycle: 154
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
extended-header: cycle: 154
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
extended-header: cycle: 154
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
extended-header: cycle: 154
----------------------------------------------------------------------------
Oper (0): tid: 5769cd60  len: 0  clientid: TRANS  flags: START
----------------------------------------------------------------------------
Oper (1): tid: 5769cd60  len: 16  clientid: TRANS  flags: none
TRAN:    type: STRAT_WRITE       tid: 0       num_items: 5
----------------------------------------------------------------------------
Oper (2): tid: 5769cd60  len: 56  clientid: TRANS  flags: none
INODE: #regs: 3   ino: 0x1400005c  flags: 0x5   dsize: 16
         blkno: 610338736  len: 16  boff: 4096
Oper (3): tid: 5769cd60  len: 96  clientid: TRANS  flags: none
INODE CORE
magic 0x494e mode 0100600 version 1 format 2
nlink 1 uid 14029 gid 1474
atime 0x4716b870 mtime 0x4716bade ctime 0x4716bade
size 0x1ce00000 nblocks 0x1c840 extsize 0x0 nextents 0x1
naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
flags 0x0 gen 0x4
Oper (4): tid: 5769cd60  len: 16  clientid: TRANS  flags: none
EXTENTS inode data
----------------------------------------------------------------------------
Oper (5): tid: 5769cf18  len: 24  clientid: TRANS  flags: none
BUF:  #regs: 2   start blkno: 610338561 (0x24610701)  len: 1  bmap size: 1  flags: 0x0
Oper (6): tid: 5769cf18  len: 128  clientid: TRANS  flags: none
AGF Buffer: XAGF
ver: 1  seq#: 5  len: 15258464
root BNO: 1  CNT: 2
level BNO: 1  CNT: 1
1st: 0  last: 3  cnt: 4  freeblks: 2491090  longest: 581471
----------------------------------------------------------------------------
Oper (7): tid: 5769cf18  len: 28  clientid: TRANS  flags: none
BUF:  #regs: 2   start blkno: 610338568 (0x24610708)  len: 8  bmap size: 2  flags: 0x0
Oper (8): tid: 5769cf18  len: 128  clientid: TRANS  flags: none
BUF DATA
----------------------------------------------------------------------------
Oper (9): tid: 5769cf18  len: 28  clientid: TRANS  flags: none
BUF:  #regs: 3   start blkno: 610338576 (0x24610710)  len: 8  bmap size: 2  flags: 0x0
Oper (10): tid: 5769cf18  len: 128  clientid: TRANS  flags: none
BUF DATA
Oper (11): tid: 5769cf18  len: 256  clientid: TRANS  flags: none
BUF DATA
----------------------------------------------------------------------------
Oper (12): tid: 5769cf18  len: 24  clientid: TRANS  flags: none
BUF:  #regs: 2   start blkno: 8647474234504773632 (0x7802000000000000)  len: 1  bmap size: 1  flags: 0x0
Oper (13): tid: 5769cf18  len: 128  clientid: TRANS  flags: none
BUF DATA
----------------------------------------------------------------------------
Oper (14): tid: 5769cf18  len: 0  clientid: TRANS  flags: COMMIT
----------------------------------------------------------------------------
Oper (15): tid: 5769c5e8  len: 0  clientid: TRANS  flags: START
----------------------------------------------------------------------------
Oper (16): tid: 5769c5e8  len: 16  clientid: TRANS  flags: none
TRAN:    type: STRAT_WRITE       tid: 0       num_items: 5
----------------------------------------------------------------------------
Oper (17): tid: 5769c5e8  len: 2145656  clientid: TRANS  flags: none
**********************************************************************
* ERROR: data block=5255                                              *
**********************************************************************

We should have 5 more items in this transaction plus the commit
and then another 286 operations.  The operation number and transaction
id are correct but the length is bogus.  The first item in a STRAT_WRITE
transaction should be an inode and the length should be 56.

Tim, does this ring any bells with you?

The only validation we do on the length is an ASSERT in
xlog_recover_process_data() which isn't much good here.  I don't know
why the value is wrong but we can do better than crashing the system.
I would like to know how much more data is corrupt beyond this point.

Peter, can you try xfs_logprint -c ...?


Timothy Shimmin wrote:
> Hi Peter,
> 
> FYI
> 
> Just a couple of things in case you weren't aware.
> Often it is simplest to save the log using the -C option to a file.
>        -C filename
>               Copy the log from the filesystem to the file filename.
>               The log itself is not printed.
> The log can then be analysed in various ways later.
> 
> Running logprint in operation mode (the default mode without options)
> (non-transaction mode without -t)
> is really pretty useless without the -s option.
> If you use the -s option then one needs to know where to seek to
> (you can use -t to find the head/tail or -d to see where log records 
> start).
> The problem here is that if you do logprint
> like this (no options) then it will start at offset zero and will 
> invariably have
> trouble decoding if the log has wrapped around (the general case) as
> you'll start in the middle of a record.
> So the -t option is often a more interesting starting point as it
> will operate more like the file system recovery does, finding the head 
> and tail,
> and processing from the tail to the head of the log.
> 
> If you want to save the filesystem away for possible metadata debugging 
> later,
> then xfs_metadump is your friend.
> 
> Cheers,
> Tim.
> 
> KELEMEN Peter wrote:
>> * Lachlan Mcilroy (lachlan@sgi.com) [20071203 14:24]:
>>
>> Lachlan,
>>
>>> Okay, sounds like it might be a corrupt log.
>>
>> Yep.
>>
>>> Can you run xfs_logprint on the device or saved log?
>>
>> Sure.
>>
>> http://cern.ch/fuji/lxfsre1103/xfs_logprint1.txt.bz2
>> It aborted though, with the following message:
>>
>>     xfs_logprint: unknown log operation type (343b)
>>     Bad data in log
>>
>>> Also give xfs_logprint -t -i a go.
>>
>> http://cern.ch/fuji/lxfsre1103/xfs_logprint2.txt.bz2
>>
>>> You've saved the log into a file?  You can get the filesystem
>>> mounted again by deleting the log with xfs_repair -L <dev>.
>>> You'll probably need to run xfs_repair over the filesystem to be
>>> safe.
>>
>> I conserved this filesystem away for further analysis, I'm not
>> sure how helpful it can be.
>>
>> Thanks,
>> Peter
>>
> 
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.24-rc3 oopses while mounting fs
  2007-12-04  0:11         ` Timothy Shimmin
  2007-12-04  7:08           ` Lachlan McIlroy
@ 2007-12-04  9:42           ` KELEMEN Peter
  1 sibling, 0 replies; 12+ messages in thread
From: KELEMEN Peter @ 2007-12-04  9:42 UTC (permalink / raw)
  To: xfs

* Timothy Shimmin (tes@sgi.com) [20071204 11:11]:

Timothy,

Thanks for the recommendations.

> Often it is simplest to save the log using the -C option to a
> file.  -C filename [...] The log can then be analysed in various
> ways later.

I did that (I played with it and did the same thing with dd(1),
too).

> If you want to save the filesystem away for possible metadata
> debugging later, then xfs_metadump is your friend.

I'll check this out.  I watched Barry implement the thing but we
were stuck in the past for a while with xfsprogs-2.6.x.

Peter

-- 
    .+'''+.         .+'''+.         .+'''+.         .+'''+.         .+''
 Kelemen Péter     /       \       /       \     Peter.Kelemen@cern.ch
.+'         `+...+'         `+...+'         `+...+'         `+...+'

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.24-rc3 oopses while mounting fs
  2007-12-04  7:08           ` Lachlan McIlroy
@ 2007-12-04  9:57             ` KELEMEN Peter
  2007-12-05  5:47             ` Lachlan McIlroy
  1 sibling, 0 replies; 12+ messages in thread
From: KELEMEN Peter @ 2007-12-04  9:57 UTC (permalink / raw)
  To: xfs

* Lachlan Mcilroy (lachlan@sgi.com) [20071204 18:08]:

Lachlan,

> The transactional view appears to be truncated. [...]
> Peter, can you try xfs_logprint -c ...?

xfs_logprint -c -t -i produces an identical transactional dump
than what you have seen.
http://cern.ch/fuji/lxfsre1103/xfs_logprint_c_t_i.txt

You can grab the whole saved log from here (128M uncompressed):
http://cern.ch/fuji/lxfsre1103/sdc.xlg.bz2
88a6906567ab44994a0b5c4255c5c619428405dc  sdc.xlg

HTH,
Peter

-- 
    .+'''+.         .+'''+.         .+'''+.         .+'''+.         .+''
 Kelemen Péter     /       \       /       \     Peter.Kelemen@cern.ch
.+'         `+...+'         `+...+'         `+...+'         `+...+'

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.24-rc3 oopses while mounting fs
  2007-12-04  7:08           ` Lachlan McIlroy
  2007-12-04  9:57             ` KELEMEN Peter
@ 2007-12-05  5:47             ` Lachlan McIlroy
  2007-12-05  6:38               ` Timothy Shimmin
  1 sibling, 1 reply; 12+ messages in thread
From: Lachlan McIlroy @ 2007-12-05  5:47 UTC (permalink / raw)
  To: lachlan; +Cc: Timothy Shimmin, xfs

> ============================================================================ 
> 
> cycle: 154    version: 2        lsn: 154,5248    tail_lsn: 153,253056
> length of Log Record: 61440    prev offset: 5120        num ops: 303
> uuid: b02fc6bd-662f-40a5-8f0c-e260a881a3e7   format: little endian linux
> h_size: 262144
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
> 
> extended-header: cycle: 154
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
> 
> extended-header: cycle: 154
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
> 
> extended-header: cycle: 154
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
> 
> extended-header: cycle: 154
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
> 
> extended-header: cycle: 154
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
> 
> extended-header: cycle: 154
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
> 
> extended-header: cycle: 154
> ---------------------------------------------------------------------------- 
> 
> Oper (0): tid: 5769cd60  len: 0  clientid: TRANS  flags: START
> ---------------------------------------------------------------------------- 
> 
> Oper (1): tid: 5769cd60  len: 16  clientid: TRANS  flags: none
> TRAN:    type: STRAT_WRITE       tid: 0       num_items: 5
> ---------------------------------------------------------------------------- 
> 
> Oper (2): tid: 5769cd60  len: 56  clientid: TRANS  flags: none
> INODE: #regs: 3   ino: 0x1400005c  flags: 0x5   dsize: 16
>         blkno: 610338736  len: 16  boff: 4096
> Oper (3): tid: 5769cd60  len: 96  clientid: TRANS  flags: none
> INODE CORE
> magic 0x494e mode 0100600 version 1 format 2
> nlink 1 uid 14029 gid 1474
> atime 0x4716b870 mtime 0x4716bade ctime 0x4716bade
> size 0x1ce00000 nblocks 0x1c840 extsize 0x0 nextents 0x1
> naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
> flags 0x0 gen 0x4
> Oper (4): tid: 5769cd60  len: 16  clientid: TRANS  flags: none
> EXTENTS inode data
> ---------------------------------------------------------------------------- 

Another interesting point is that the transaction id has changed between
these operations from 5769cd60 to 5769cf18.  It should stay the same.
We may be reading old log data.

> 
> Oper (5): tid: 5769cf18  len: 24  clientid: TRANS  flags: none
> BUF:  #regs: 2   start blkno: 610338561 (0x24610701)  len: 1  bmap size: 
> 1  flags: 0x0
> Oper (6): tid: 5769cf18  len: 128  clientid: TRANS  flags: none
> AGF Buffer: XAGF
> ver: 1  seq#: 5  len: 15258464
> root BNO: 1  CNT: 2
> level BNO: 1  CNT: 1
> 1st: 0  last: 3  cnt: 4  freeblks: 2491090  longest: 581471
> ---------------------------------------------------------------------------- 
> 
> Oper (7): tid: 5769cf18  len: 28  clientid: TRANS  flags: none
> BUF:  #regs: 2   start blkno: 610338568 (0x24610708)  len: 8  bmap size: 
> 2  flags: 0x0
> Oper (8): tid: 5769cf18  len: 128  clientid: TRANS  flags: none
> BUF DATA
> ---------------------------------------------------------------------------- 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.24-rc3 oopses while mounting fs
  2007-12-05  5:47             ` Lachlan McIlroy
@ 2007-12-05  6:38               ` Timothy Shimmin
  2007-12-05  7:04                 ` Lachlan McIlroy
  0 siblings, 1 reply; 12+ messages in thread
From: Timothy Shimmin @ 2007-12-05  6:38 UTC (permalink / raw)
  To: lachlan; +Cc: xfs

Lachlan,

An aside...

I think I see why we couldn't get "xfs_logprint -t" to work on a file (-f).
   > xfs_logprint -t -f ~/debug/sdc.xlg
   xfs_logprint:
       data device: 0xffffffffffffffff
       log device: 0xffffffffffffffff daddr: 0 length: 262144

   XFS: Log inconsistent (didn't find previous header)
   XFS: failed to find log head
   xfs_logprint: failed to find head and tail, error: 5

The problem is that it is using the superblock to determine if it is version 1
or version 2 logs in order to find out the maximum size of the iclog (log record).
And for -f there is no superblock to look at. The version number is, however, in the
log record header.
So when it looks in the undefined data it comes up with _not_ version 2,
and so assumes to look back for the hdr to only 32K when it really needs
to look back to 64k (for v2 it would limit at 256K).
Hence, it can't find the magic# in the header.

The debugging...

====================================================
Breakpoint 1, xlog_find_verify_log_record (log=0x607ffffffea96be0, start_blk=5312, last_blk=0x607ffffffea96bc0, extra_bblks=0)
     at xfs_log_recover.c:234
234             if (!(bp = xlog_get_bp(log, num_blks))) {
(gdb) bt
#0  xlog_find_verify_log_record (log=0x607ffffffea96be0, start_blk=5312, last_blk=0x607ffffffea96bc0, extra_bblks=0) at xfs_log_recover.c:234
#1  0x4000000000092020 in xlog_find_head (log=0x607ffffffea96be0, return_head_blk=0x607ffffffea96bd0) at xfs_log_recover.c:527
#2  0x4000000000094000 in xlog_find_tail (log=0x607ffffffea96be0, head_blk=0x607ffffffea96bd0, tail_blk=0x607ffffffea96bd8, readonly=0)
     at xfs_log_recover.c:616
#3  0x40000000000100a0 in xfs_log_print_trans (log=0x607ffffffea96be0, print_block_start=-1) at log_print_trans.c:49
#4  0x4000000000003600 in main (argc=<value optimized out>, argv=<value optimized out>) at logprint.c:240

(gdb) print *last_blk
$1 = 5376

start_blk = 5312
last_blk = 5376
extra_bblks = 0

Goes from 5376..5312 looking for magic# at start of sector.
Errors out if can't find one.
Which is true since the next magic# is at 5248.

logprint -d output:
   4992 HEADER Cycle 154 tail 153:253056 len  61440 ops 105
   5120 HEADER Cycle 154 tail 153:253056 len  61440 ops 344
   5248 HEADER Cycle 154 tail 153:253056 len  61440 ops 303  <--- previous block
   5376 HEADER Cycle 153 tail 153:253056 len      0 ops 0    <--- hdr
[00000 - 05376] Cycle 0x0000009a New Cycle 0x00000099
   5377 HEADER Cycle 153 tail 153:253056 len      0 ops 0
   5378 HEADER Cycle 153 tail 153:253056 len      0 ops 0

So why is it using a start_blk of 5312?


     522         num_scan_bblks = XLOG_REC_SHIFT(log);
     523         if (head_blk >= num_scan_bblks) {
     524                 start_blk = head_blk - num_scan_bblks; /* don't read head_blk */
     525
     526                 /* start ptr at last block ptr before head_blk */
  ->527                 if ((error = xlog_find_verify_log_record(log, start_blk,
     528                                                         &head_blk, 0)) == -1) {

#define XLOG_REC_SHIFT(log) \
         BTOBB(1 << (XFS_SB_VERSION_HASLOGV2(&log->l_mp->m_sb) ? \
          XLOG_MAX_RECORD_BSHIFT : XLOG_BIG_RECORD_BSHIFT))

#define XLOG_BIG_RECORD_BSHIFT  15              /* 32k == 1 << 15 */
#define XLOG_MAX_RECORD_BSHIFT  18              /* 256k == 1 << 18 */

Number of blocks between rec-hdrs from -d output:
5376-5248 = 128 BB

Looks like it used 5312 = 5376 - num_scan_bblks
=> num_scan_bblks = 64

Yep
(gdb) p num_scan_bblks
$3 = 64
32K
i.e. it thinks it has v1 logs

(gdb) p *(log->l_mp)
$9 = {m_sb = {sb_magicnum = 0, sb_blocksize = 0, sb_dblocks = 0, sb_rblocks = 0, sb_rextents = 0, sb_uuid = '\0' <repeats 15 times>,
     sb_logstart = 2305843009213997776, sb_rootino = 6953557824646229696, sb_rbmino = 6953557824646229688, sb_rsumino = 6953557824646229680,
     sb_rextsize = 284472, sb_agblocks = 536870912, sb_agcount = 4281151176, sb_rbmblocks = 1619001343, sb_logblocks = 0, sb_versionnum = 0,
     sb_sectsize = 0, sb_inodesize = 38256, sb_inopblock = 4, sb_fname = "\000\000\000 H?\a\000\000\000\000 ", sb_blocklog = 0 '\0',
etc...
it is not set

Okay, need to file a bug for this part so that -f can be useful here.
It needs to be using a LR h_version number for this somehow.

--Tim


Lachlan McIlroy wrote:
>> ============================================================================ 
>>
>> cycle: 154    version: 2        lsn: 154,5248    tail_lsn: 153,253056
>> length of Log Record: 61440    prev offset: 5120        num ops: 303
>> uuid: b02fc6bd-662f-40a5-8f0c-e260a881a3e7   format: little endian linux
>> h_size: 262144
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>
>> extended-header: cycle: 154
>> ---------------------------------------------------------------------------- 
>>
>> Oper (0): tid: 5769cd60  len: 0  clientid: TRANS  flags: START
>> ---------------------------------------------------------------------------- 
>>
>> Oper (1): tid: 5769cd60  len: 16  clientid: TRANS  flags: none
>> TRAN:    type: STRAT_WRITE       tid: 0       num_items: 5
>> ---------------------------------------------------------------------------- 
>>
>> Oper (2): tid: 5769cd60  len: 56  clientid: TRANS  flags: none
>> INODE: #regs: 3   ino: 0x1400005c  flags: 0x5   dsize: 16
>>         blkno: 610338736  len: 16  boff: 4096
>> Oper (3): tid: 5769cd60  len: 96  clientid: TRANS  flags: none
>> INODE CORE
>> magic 0x494e mode 0100600 version 1 format 2
>> nlink 1 uid 14029 gid 1474
>> atime 0x4716b870 mtime 0x4716bade ctime 0x4716bade
>> size 0x1ce00000 nblocks 0x1c840 extsize 0x0 nextents 0x1
>> naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
>> flags 0x0 gen 0x4
>> Oper (4): tid: 5769cd60  len: 16  clientid: TRANS  flags: none
>> EXTENTS inode data
>> ---------------------------------------------------------------------------- 
> 
> 
> Another interesting point is that the transaction id has changed between
> these operations from 5769cd60 to 5769cf18.  It should stay the same.
> We may be reading old log data.
> 
>>
>> Oper (5): tid: 5769cf18  len: 24  clientid: TRANS  flags: none
>> BUF:  #regs: 2   start blkno: 610338561 (0x24610701)  len: 1  bmap 
>> size: 1  flags: 0x0
>> Oper (6): tid: 5769cf18  len: 128  clientid: TRANS  flags: none
>> AGF Buffer: XAGF
>> ver: 1  seq#: 5  len: 15258464
>> root BNO: 1  CNT: 2
>> level BNO: 1  CNT: 1
>> 1st: 0  last: 3  cnt: 4  freeblks: 2491090  longest: 581471
>> ---------------------------------------------------------------------------- 
>>
>> Oper (7): tid: 5769cf18  len: 28  clientid: TRANS  flags: none
>> BUF:  #regs: 2   start blkno: 610338568 (0x24610708)  len: 8  bmap 
>> size: 2  flags: 0x0
>> Oper (8): tid: 5769cf18  len: 128  clientid: TRANS  flags: none
>> BUF DATA
>> ---------------------------------------------------------------------------- 
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.24-rc3 oopses while mounting fs
  2007-12-05  6:38               ` Timothy Shimmin
@ 2007-12-05  7:04                 ` Lachlan McIlroy
  0 siblings, 0 replies; 12+ messages in thread
From: Lachlan McIlroy @ 2007-12-05  7:04 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: xfs

Thanks Tim.

Timothy Shimmin wrote:
> Lachlan,
> 
> An aside...
> 
> I think I see why we couldn't get "xfs_logprint -t" to work on a file (-f).
>   > xfs_logprint -t -f ~/debug/sdc.xlg
>   xfs_logprint:
>       data device: 0xffffffffffffffff
>       log device: 0xffffffffffffffff daddr: 0 length: 262144
> 
>   XFS: Log inconsistent (didn't find previous header)
>   XFS: failed to find log head
>   xfs_logprint: failed to find head and tail, error: 5
> 
> The problem is that it is using the superblock to determine if it is 
> version 1
> or version 2 logs in order to find out the maximum size of the iclog 
> (log record).
> And for -f there is no superblock to look at. The version number is, 
> however, in the
> log record header.
> So when it looks in the undefined data it comes up with _not_ version 2,
> and so assumes to look back for the hdr to only 32K when it really needs
> to look back to 64k (for v2 it would limit at 256K).
> Hence, it can't find the magic# in the header.
> 
> The debugging...
> 
> ====================================================
> Breakpoint 1, xlog_find_verify_log_record (log=0x607ffffffea96be0, 
> start_blk=5312, last_blk=0x607ffffffea96bc0, extra_bblks=0)
>     at xfs_log_recover.c:234
> 234             if (!(bp = xlog_get_bp(log, num_blks))) {
> (gdb) bt
> #0  xlog_find_verify_log_record (log=0x607ffffffea96be0, start_blk=5312, 
> last_blk=0x607ffffffea96bc0, extra_bblks=0) at xfs_log_recover.c:234
> #1  0x4000000000092020 in xlog_find_head (log=0x607ffffffea96be0, 
> return_head_blk=0x607ffffffea96bd0) at xfs_log_recover.c:527
> #2  0x4000000000094000 in xlog_find_tail (log=0x607ffffffea96be0, 
> head_blk=0x607ffffffea96bd0, tail_blk=0x607ffffffea96bd8, readonly=0)
>     at xfs_log_recover.c:616
> #3  0x40000000000100a0 in xfs_log_print_trans (log=0x607ffffffea96be0, 
> print_block_start=-1) at log_print_trans.c:49
> #4  0x4000000000003600 in main (argc=<value optimized out>, argv=<value 
> optimized out>) at logprint.c:240
> 
> (gdb) print *last_blk
> $1 = 5376
> 
> start_blk = 5312
> last_blk = 5376
> extra_bblks = 0
> 
> Goes from 5376..5312 looking for magic# at start of sector.
> Errors out if can't find one.
> Which is true since the next magic# is at 5248.
> 
> logprint -d output:
>   4992 HEADER Cycle 154 tail 153:253056 len  61440 ops 105
>   5120 HEADER Cycle 154 tail 153:253056 len  61440 ops 344
>   5248 HEADER Cycle 154 tail 153:253056 len  61440 ops 303  <--- 
> previous block
>   5376 HEADER Cycle 153 tail 153:253056 len      0 ops 0    <--- hdr
> [00000 - 05376] Cycle 0x0000009a New Cycle 0x00000099
>   5377 HEADER Cycle 153 tail 153:253056 len      0 ops 0
>   5378 HEADER Cycle 153 tail 153:253056 len      0 ops 0
> 
> So why is it using a start_blk of 5312?
> 
> 
>     522         num_scan_bblks = XLOG_REC_SHIFT(log);
>     523         if (head_blk >= num_scan_bblks) {
>     524                 start_blk = head_blk - num_scan_bblks; /* don't 
> read head_blk */
>     525
>     526                 /* start ptr at last block ptr before head_blk */
>  ->527                 if ((error = xlog_find_verify_log_record(log, 
> start_blk,
>     528                                                         
> &head_blk, 0)) == -1) {
> 
> #define XLOG_REC_SHIFT(log) \
>         BTOBB(1 << (XFS_SB_VERSION_HASLOGV2(&log->l_mp->m_sb) ? \
>          XLOG_MAX_RECORD_BSHIFT : XLOG_BIG_RECORD_BSHIFT))
> 
> #define XLOG_BIG_RECORD_BSHIFT  15              /* 32k == 1 << 15 */
> #define XLOG_MAX_RECORD_BSHIFT  18              /* 256k == 1 << 18 */
> 
> Number of blocks between rec-hdrs from -d output:
> 5376-5248 = 128 BB
> 
> Looks like it used 5312 = 5376 - num_scan_bblks
> => num_scan_bblks = 64
> 
> Yep
> (gdb) p num_scan_bblks
> $3 = 64
> 32K
> i.e. it thinks it has v1 logs
> 
> (gdb) p *(log->l_mp)
> $9 = {m_sb = {sb_magicnum = 0, sb_blocksize = 0, sb_dblocks = 0, 
> sb_rblocks = 0, sb_rextents = 0, sb_uuid = '\0' <repeats 15 times>,
>     sb_logstart = 2305843009213997776, sb_rootino = 6953557824646229696, 
> sb_rbmino = 6953557824646229688, sb_rsumino = 6953557824646229680,
>     sb_rextsize = 284472, sb_agblocks = 536870912, sb_agcount = 
> 4281151176, sb_rbmblocks = 1619001343, sb_logblocks = 0, sb_versionnum = 0,
>     sb_sectsize = 0, sb_inodesize = 38256, sb_inopblock = 4, sb_fname = 
> "\000\000\000 H?\a\000\000\000\000 ", sb_blocklog = 0 '\0',
> etc...
> it is not set
> 
> Okay, need to file a bug for this part so that -f can be useful here.
> It needs to be using a LR h_version number for this somehow.
> 
> --Tim
> 
> 
> Lachlan McIlroy wrote:
>>> ============================================================================ 
>>>
>>> cycle: 154    version: 2        lsn: 154,5248    tail_lsn: 153,253056
>>> length of Log Record: 61440    prev offset: 5120        num ops: 303
>>> uuid: b02fc6bd-662f-40a5-8f0c-e260a881a3e7   format: little endian linux
>>> h_size: 262144
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>>
>>> extended-header: cycle: 154
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>>
>>> extended-header: cycle: 154
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>>
>>> extended-header: cycle: 154
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>>
>>> extended-header: cycle: 154
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>>
>>> extended-header: cycle: 154
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>>
>>> extended-header: cycle: 154
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>>
>>> extended-header: cycle: 154
>>> ---------------------------------------------------------------------------- 
>>>
>>> Oper (0): tid: 5769cd60  len: 0  clientid: TRANS  flags: START
>>> ---------------------------------------------------------------------------- 
>>>
>>> Oper (1): tid: 5769cd60  len: 16  clientid: TRANS  flags: none
>>> TRAN:    type: STRAT_WRITE       tid: 0       num_items: 5
>>> ---------------------------------------------------------------------------- 
>>>
>>> Oper (2): tid: 5769cd60  len: 56  clientid: TRANS  flags: none
>>> INODE: #regs: 3   ino: 0x1400005c  flags: 0x5   dsize: 16
>>>         blkno: 610338736  len: 16  boff: 4096
>>> Oper (3): tid: 5769cd60  len: 96  clientid: TRANS  flags: none
>>> INODE CORE
>>> magic 0x494e mode 0100600 version 1 format 2
>>> nlink 1 uid 14029 gid 1474
>>> atime 0x4716b870 mtime 0x4716bade ctime 0x4716bade
>>> size 0x1ce00000 nblocks 0x1c840 extsize 0x0 nextents 0x1
>>> naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
>>> flags 0x0 gen 0x4
>>> Oper (4): tid: 5769cd60  len: 16  clientid: TRANS  flags: none
>>> EXTENTS inode data
>>> ---------------------------------------------------------------------------- 
>>
>>
>>
>> Another interesting point is that the transaction id has changed between
>> these operations from 5769cd60 to 5769cf18.  It should stay the same.
>> We may be reading old log data.
>>
>>>
>>> Oper (5): tid: 5769cf18  len: 24  clientid: TRANS  flags: none
>>> BUF:  #regs: 2   start blkno: 610338561 (0x24610701)  len: 1  bmap 
>>> size: 1  flags: 0x0
>>> Oper (6): tid: 5769cf18  len: 128  clientid: TRANS  flags: none
>>> AGF Buffer: XAGF
>>> ver: 1  seq#: 5  len: 15258464
>>> root BNO: 1  CNT: 2
>>> level BNO: 1  CNT: 1
>>> 1st: 0  last: 3  cnt: 4  freeblks: 2491090  longest: 581471
>>> ---------------------------------------------------------------------------- 
>>>
>>> Oper (7): tid: 5769cf18  len: 28  clientid: TRANS  flags: none
>>> BUF:  #regs: 2   start blkno: 610338568 (0x24610708)  len: 8  bmap 
>>> size: 2  flags: 0x0
>>> Oper (8): tid: 5769cf18  len: 128  clientid: TRANS  flags: none
>>> BUF DATA
>>> ---------------------------------------------------------------------------- 
>>>
> 
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2007-12-05  7:04 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-28 13:45 2.6.24-rc3 oopses while mounting fs KELEMEN Peter
2007-11-28 23:56 ` Lachlan McIlroy
2007-11-30 22:31   ` KELEMEN Peter
2007-12-03  3:24     ` Lachlan McIlroy
2007-12-03 22:02       ` KELEMEN Peter
2007-12-04  0:11         ` Timothy Shimmin
2007-12-04  7:08           ` Lachlan McIlroy
2007-12-04  9:57             ` KELEMEN Peter
2007-12-05  5:47             ` Lachlan McIlroy
2007-12-05  6:38               ` Timothy Shimmin
2007-12-05  7:04                 ` Lachlan McIlroy
2007-12-04  9:42           ` KELEMEN Peter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox