Date: Mon, 20 Jun 2011 23:16:07 +0200
From: Markus Trippelsdorf
Subject: Re: long hangs when deleting large directories (3.0-rc3)
Message-ID: <20110620211607.GA1722@x4.trippels.de>
References: <20110618141950.GA1685@x4.trippels.de> <20110620060351.GC1730@x4.trippels.de> <20110620111359.GA12632@x4.trippels.de> <201106201345.30271@zmi.at> <20110620123132.GA1717@x4.trippels.de>
In-Reply-To: <20110620123132.GA1717@x4.trippels.de>
To: Michael Monnerie
Cc: xfs@oss.sgi.com

On 2011.06.20 at 14:31 +0200, Markus Trippelsdorf wrote:
> On 2011.06.20 at 13:45 +0200, Michael Monnerie wrote:
> > On Montag, 20. Juni 2011 Markus Trippelsdorf wrote:
> > > Here are two more examples. The time when the hang occurs is marked
> >
> > Could it be that some sectors on the disk are not easy to read for the
> > drive, and that it simply retries several times until it works again?
> > SATA disks can show that behaviour. You could try with "dd" with
> > seek/skip parameters so you read 1gb at once, then skip 1gb and read 1gb
> > again etc, and compare the throughput over all 1gb areas. If there's one
> > slower, that might be the problem.
> >
> > Maybe a check with "smartctl" could help, too.
>
> Thanks for the hint, Michael. I've just checked the SMART status on
> both disks and the 4kb drive looks indeed suspicious:
>
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -   8
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -   8
>
> The 512 byte drive appears to be fine. But I'm running the long
> SMART self test on both of them right now and will report back
> the result in a few hours.

Hmm, both tests ran fine without any errors. And the two SMART
attributes above are back to zero again (must have been a temporary
firmware hiccup).

As you can see in the data I've posted, the disk workload consists
almost entirely of writes, and I don't think a disk retries writes
several times. On the contrary, a write to a bad sector should fix it,
because the drive can then remap it safely. (Current_Pending_Sector
would decrease and Reallocated_Sector_Ct would increase. But
Reallocated_Sector_Ct is still 0 on both affected drives.)

And shouldn't I see these "hangs" in situations other than "rm -fr", if
the disk drive were responsible?

-- 
Markus
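
For reference, the chunked read test Michael suggested in the quoted
text (read 1gb, skip 1gb, read 1gb again, then compare throughput)
could be sketched roughly as below. This is only an illustration in
Python rather than an actual "dd" invocation. The function names, the
default sizes and the "slower than half the median" threshold are all
assumptions of this sketch, not something from the thread.

```python
import os
import statistics
import time


def scan_throughput(path, chunk_bytes=1 << 30, stride_bytes=2 << 30,
                    block_bytes=1 << 20):
    """Read chunk_bytes at every stride_bytes offset and time each region.

    Returns a list of (offset, bytes_read, seconds) tuples. The defaults
    (read 1 GiB, then skip 1 GiB) are assumptions mirroring the suggestion.
    """
    results = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            try:
                os.lseek(fd, offset, os.SEEK_SET)
            except OSError:
                break  # offset not seekable, e.g. past the end of a device
            done = 0
            start = time.monotonic()
            while done < chunk_bytes:
                data = os.read(fd, min(block_bytes, chunk_bytes - done))
                if not data:
                    break  # end of file or device
                done += len(data)
            elapsed = time.monotonic() - start
            if done == 0:
                break
            results.append((offset, done, elapsed))
            if done < chunk_bytes:
                break  # short final region, nothing left to read
            offset += stride_bytes
    finally:
        os.close(fd)
    return results


def slow_regions(results, factor=0.5):
    """Return (offset, bytes_per_second) for regions well below the median.

    A region counts as slow if its rate is under factor * median rate;
    the factor of 0.5 is an arbitrary choice for this sketch.
    """
    rates = [n / s for _, n, s in results if s > 0]
    if not rates:
        return []
    median = statistics.median(rates)
    return [(off, n / s) for off, n, s in results
            if s > 0 and n / s < factor * median]
```

One would call scan_throughput() on the device node or a large file
and pass the result to slow_regions(). Note that the page cache can
distort repeated runs, and that this only exercises reads, whereas the
workload discussed above is almost entirely writes.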