From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: with ECARTIS (v1.0.0; list xfs); Mon, 24 Sep 2007 21:41:21 -0700 (PDT)
Received: from sandeen.net (sandeen.net [209.173.210.139]) by oss.sgi.com
	(8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l8P4fDQ3019986 for ;
	Mon, 24 Sep 2007 21:41:18 -0700
Message-ID: <46F8916D.7000900@sandeen.net>
Date: Mon, 24 Sep 2007 23:41:17 -0500
From: Eric Sandeen
MIME-Version: 1.0
Subject: Re: something very strange w/ filestreams...
References: <46F49C80.60007@sandeen.net> <20070923092444.GQ995458@sgi.com>
	<46F7B04D.70809@sandeen.net> <46F8654B.9010203@sandeen.net>
In-Reply-To: <46F8654B.9010203@sandeen.net>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
List-Id: xfs
To: Barry Naujok
Cc: David Chinner, xfs-oss

Eric Sandeen wrote:
> Barry Naujok wrote:
>
>>>> So, before running this test, you should make sure your test
>>>> partitions are completely zeroed from mkfs's that occurred
>>>> before that recent version of mkfs.xfs was installed.
>>> I dd'd over the whole test partition, ran the sequence, and hit the
>>> problem.
>> Yeah, worked it out yesterday but never got around to doing another
>> email. It's a combination of the two filestreams tests which do
>> small filesystems, and mkfs.xfs doesn't wipe beyond the new
>> filesystem size. Zero the disk, try the attached patch and see
>> if that fixes the problem.
>>
>> Barry.
>
> Ok, but what about that double free?
>
> -Eric

I have a bit of a clue about what's going wrong.
First we get the buffer zone allocated:

	new zone 0x80efd68 for "xfs_buffer", size=116

Set a watchpoint on that, and also break on setup_bmap:

	(gdb) watch *((int *)0x80efd68)
	Hardware watchpoint 1: *(int *) 135200104
	(gdb) break setup_bmap
	(gdb) cont

ba_bmap gets allocated, based on whatever sb_agblocks count is in the
superblock at the time:

	setup_bmap(agcount, mp->m_sb.sb_agblocks, mp->m_sb.sb_rextents);

On this filesystem it's 4096 at this point, like so:

	Breakpoint 3, setup_bmap (agno=64, numblocks=4096, rtblocks=0)
	    at incore.c:59

and from some debugging, the size of each ba_bmap[i] ends up as 2048:

	...
	ba_bmap[31] at 0x80edc58 size 2048
	ba_bmap[32] at 0x80ee460 size 2048
	ba_bmap[33] at 0x80eec68 size 2048
	...

So I set a watch on the zone that ends up corrupted, and:

	Hardware watchpoint 4: *(int *) 135200104

	Old value = 116
	New value = 372
	0x08063a2f in set_agbno_state (mp=0xbf999188, agno=32,
	    ag_blockno=12818, state=1) at incore.c:278
	278                     *addr = (((*addr) &
	(gdb) bt
	#0  0x08063a2f in set_agbno_state (mp=0xbf999188, agno=32,
	    ag_blockno=12818, state=1) at incore.c:278
	#1  0x0807d752 in scanfunc_bno (ablock=0x8187200, level=0, bno=1,
	    agno=32, suspect=0, isroot=1) at scan.c:548
	#2  0x0807c017 in scan_sbtree (root=1, nlevels=1, agno=32, suspect=0,
	    func=0x807d430 <scanfunc_bno>, isroot=1) at scan.c:66
	#3  0x0807d19a in scan_ag (agno=32) at ../include/xfs/swab.h:126
	#4  0x0806751b in phase2 (mp=0xbf999188) at phase2.c:148
	#5  0x08080d77 in main (argc=Cannot access memory at address 0x8
	) at xfs_repair.c:619

So at this point it looks like we're trying to use an ag_blockno of
12818, when we only allocated based on expecting 4096 blocks per AG?

So I guess we've stumbled across another piece of the older, larger
filesystem, and those values cause us to walk off the end of the
ba_bmap array?

Not sure where it goes from here, but bedtime for me. :)

-Eric