Message-ID: <4868781B.40907@pmc-sierra.com>
Date: Mon, 30 Jun 2008 11:37:23 +0530
From: Sagar Borikar
To: xfs@oss.sgi.com
Subject: Re: Xfs Access to block zero exception and system crash
In-Reply-To: <20080630034112.055CF18904C4@bby1mta01.pmc-sierra.bc.ca>
References: <340C71CD25A7EB49BFA81AE8C839266701323BD8@BBY1EXM10.pmc_nt.nt.pmc-sierra.bc.ca>
 <20080625084931.GI16257@build-svl-1.agami.com>
 <340C71CD25A7EB49BFA81AE8C839266701323BE8@BBY1EXM10.pmc_nt.nt.pmc-sierra.bc.ca>
 <20080626070215.GI11558@disturbed>
 <4864BD5D.1050202@pmc-sierra.com>
 <4864C001.2010308@pmc-sierra.com>
 <20080628000516.GD29319@disturbed>
 <340C71CD25A7EB49BFA81AE8C8392667028A1CA7@BBY1EXM10.pmc_nt.nt.pmc-sierra.bc.ca>
 <20080629215647.GJ29319@disturbed>
 <20080630034112.055CF18904C4@bby1mta01.pmc-sierra.bc.ca>
List-Id: xfs

Sagar Borikar wrote:
> Dave Chinner wrote:
>> On Sat, Jun 28, 2008 at 09:47:44AM -0700, Sagar Borikar wrote:
>>       Device Boot   Start    End   Blocks  Id  System
>>> /dev/scsibd1        126    286    20608  83  Linux
>>> /dev/scsibd2        287   1023    94336  83  Linux
>>> /dev/scsibd3       1149   1309    20608  83  Linux
>>> /dev/scsibd4       1310   2046    94336  83  Linux
>>>
>>
>> I'd have to assume thats a flash based root drive, right?
>>
>>
> That's right,
>>> Disk /dev/md0: 251.0 GB, 251000160256 bytes
>>> 2 heads, 4 sectors/track, 61279336 cylinders
>>> Units = cylinders of 8 * 512 = 4096 bytes
>>>
>>> Disk /dev/md0 doesn't contain a valid partition table
>>>
>>> Disk /dev/dm-0: 107.3 GB, 107374182400 bytes
>>> 255 heads, 63 sectors/track, 13054 cylinders
>>> Units = cylinders of 16065 * 512 = 8225280 bytes
>>>
>>
>> Neither of these tell me what /dev/RAIDA/vol is....
>> It is the device node to which /mnt/RAIDA/vol is mapped to. Its a
>> JBOD with 233 GB size.
>>
>>> But still the issue is why doesn't it happen every time and less
>>> stress?
>>>
>>> I am surprised to see to let this happen immediately when the
>>> subdirectories increase more than 30. Else it decays slowly.
>>>
>>
>> So it happens when you get more than 30 entries in a directory
>> under a certain load? That might be an extent->btree format
>> conversion bug or vice versa. I'd suggest setting up a test based
>> around this to try to narrow down the problem.
>>
>> Cheers,
>>
>> Dave.
>>
> Thanks for all your help. Shall keep you posted with the progress on
> debugging.
>
> Regards
> Sagar
>
>

Sorry if I was not clear. As I mentioned, bad extents turn up much more
often when I increase the number of simultaneous transactions to 30 (say
within 5 minutes), but if I run only two copies in an infinite loop, the
issue crops up after roughly 2-3 hours.

All the copies plus pdflush are continuously in uninterruptible sleep.
And it is not the uninterruptible sleep and waiting state (DW) but just
uninterruptible (D).

Thanks
Sagar
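
To keep an eye on which tasks sit in D state while the copies run, something along these lines could be polled from another shell. This is only a sketch and not something used in this thread: the names parse_stat and tasks_in_state are made up for illustration, and it assumes a Linux /proc where each /proc/<pid>/stat starts with "pid (comm) state".

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx])
    comm = stat_line[open_idx + 1:close_idx]
    # The one-letter state is the first field after the closing ')'.
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state


def tasks_in_state(wanted="D", proc_root="/proc"):
    """List (pid, comm) for every process whose state equals `wanted`."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited, or the file was not readable
        if state == wanted:
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    # Print every task currently in uninterruptible sleep.
    for pid, comm in tasks_in_state("D"):
        print(pid, comm)
```

Running it every few seconds during the two-copy loop and the 30-copy run would show whether it is always the same cp and pdflush tasks that stay blocked.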