From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: with ECARTIS (v1.0.0; list xfs); Sun, 06 Jul 2008 20:01:09 -0700 (PDT)
Received: from cuda.sgi.com (cuda1.sgi.com [192.48.168.28])
    by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m673153Q022247
    for ; Sun, 6 Jul 2008 20:01:06 -0700
Received: from bby1mta03.pmc-sierra.bc.ca (localhost [127.0.0.1])
    by cuda.sgi.com (Spam Firewall) with ESMTP id 031CD11DD0A5
    for ; Sun, 6 Jul 2008 20:02:09 -0700 (PDT)
Received: from bby1mta03.pmc-sierra.bc.ca (bby1mta03.pmc-sierra.com [216.241.235.118])
    by cuda.sgi.com with ESMTP id nOZv9jkLuVXAvpZa
    for ; Sun, 06 Jul 2008 20:02:09 -0700 (PDT)
Message-ID: <4871872B.9060107@pmc-sierra.com>
Date: Mon, 07 Jul 2008 08:32:03 +0530
From: Sagar Borikar
MIME-Version: 1.0
Subject: Re: Xfs Access to block zero exception and system crash
References: <486B01A6.4030104@pmc-sierra.com>
    <20080702051337.GX29319@disturbed>
    <486B13AD.2010500@pmc-sierra.com>
    <1214979191.6025.22.camel@verge.scott.net.au>
    <20080702065652.GS14251@build-svl-1.agami.com>
    <486B6062.6040201@pmc-sierra.com>
    <486C4F89.9030009@sandeen.net>
    <486C6053.7010503@pmc-sierra.com>
    <486CE9EA.90502@sandeen.net>
    <486DF8F0.5010700@pmc-sierra.com>
    <20080704122726.GG29319@disturbed>
    <340C71CD25A7EB49BFA81AE8C839266702997641@BBY1EXM10.pmc_nt.nt.pmc-sierra.bc.ca>
    <486E5F4D.1010009@sandeen.net>
    <340C71CD25A7EB49BFA81AE8C839266702997658@BBY1EXM10.pmc_nt.nt.pmc-sierra.bc.ca>
    <486FA095.1050106@sandeen.net>
    <340C71CD25A7EB49BFA81AE8C839266702A084A6@BBY1EXM10.pmc_nt.nt.pmc-sierra.bc.ca>
    <487117FC.9090109@sandeen.net>
In-Reply-To: <487117FC.9090109@sandeen.net>
Content-Type: multipart/mixed; boundary="------------080806010108010907030600"
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
List-Id: xfs
To: Eric Sandeen
Cc: Dave Chinner , Nathan Scott , xfs@oss.sgi.com

This is a multi-part message in MIME format.
--------------080806010108010907030600
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Eric Sandeen wrote:
> Sagar Borikar wrote:
>> Sagar Borikar wrote:
>>> Copy is of the same file to 30 different directories and it is
>>> basically overwrite.
>>>
>>> Here is the setup:
>>>
>>> It's a JBOD with Volume size 20 GB. The directories are empty and this
>>> is basically continuous copy of the file on all thirty directories. But
>>> surprisingly none of the copy succeeds. All the copy processes are in
>>> Uninterruptible sleep state and xfs_repair log I have already attached
>>>
>>> With the prep. As mentioned it is with 2.6.24 Fedora kernel.
>>
>> It would probably be best to try a 2.6.26 kernel from rawhide to be sure
>> you're closest to the bleeding edge.
>>
>> Sure Eric, but I reran the test and I got similar errors with the
>> 2.6.24 kernel on x86. I am still confused by the results that I see on
>> the 2.6.24 kernel on the x86 machine. I see that the used size shown by
>> ls is far larger than the actual size. Here is the log from the system:
>>
>> [root@lab00 ~/test_partition]# ls -lSah
>> total 202M
>> -rw-r--r-- 1 root root 202M Jul 4 14:06 original ---> this is the file
>> which I copy.
>> drwxr-x--- 65 root root 12K Jul 6 21:57 ..
>> -rwxr-xr-x 1 root root 189 Jul 4 16:31 runall
>> -rwxr-xr-x 1 root root 50 Jul 4 16:32 copy
>> drwxr-xr-x 2 root root 45 Jul 6 22:07 .
>>
>
> It'd be great if you provided these actual scripts so we don't have to
> guess at what you're doing or work backwards from the repair output :)
>
I'm attaching the scripts to this mail.
>
>> The dmesg log doesn't give any information. Here is the XFS related
>> info:
>>
>> XFS mounting filesystem loop0
>> Ending clean XFS mount for filesystem: loop0
>>
>> which is basically for mounting XFS cleanly. But there is no exception
>> in XFS.
>>
>
> and nothing else of interest either?
>
Not really. That's why it was surprising.
Even after setting the error_level to 11.
>
>> The filesystem has become completely sluggish and the response time has
>> increased to 3-4 minutes for every command. Not a single copy has
>> completed and all the copy processes are sleeping continuously.
>>
>
> And how did you recover from this; did you power-cycle the box?
>
There was no failure. Only the processes were stalled. The system
remained operational.
> -Eric
>

--------------080806010108010907030600
Content-Type: text/plain;
 name="copy"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="copy"

#! /bin/sh

# Copy $1 over $2 in an endless loop; runs until killed.
while [ 1 ]
do
    cp -f $1 $2
done

--------------080806010108010907030600
Content-Type: text/plain;
 name="runall"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="runall"

#! /bin/sh

# Start two background copy loops per iteration (40 in total).
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
do
    mkdir -p testdir_$i
    ./copy testfile testdir_$i &
    # Note: $1 is runall's first argument, not the loop variable. Without
    # arguments this expands to testdir_/testfile; $i may have been intended.
    rm -Rf testdir_$1/testfile
    ./copy testfile testfile_$i &
done

--------------080806010108010907030600--