From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752547Ab2GYXB5 (ORCPT ); Wed, 25 Jul 2012 19:01:57 -0400 Received: from mail-hostigation.familyross.net ([69.85.93.112]:56097 "EHLO mail.familyross.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752249Ab2GYXBy (ORCPT ); Wed, 25 Jul 2012 19:01:54 -0400 X-Greylist: delayed 559 seconds by postgrey-1.27 at vger.kernel.org; Wed, 25 Jul 2012 19:01:54 EDT Message-ID: <501078B2.8070707@familyross.net> Date: Wed, 25 Jul 2012 15:52:34 -0700 From: Kevin Ross User-Agent: Mozilla/5.0 (X11; Linux i686; rv:10.0.5) Gecko/20120624 Icedove/10.0.5 MIME-Version: 1.0 To: linux-kernel@vger.kernel.org Subject: RAID extremely slow Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, I'm having a problem. After a while, my software RAID rebuild becomes extremely slow, and the filesystem on the RAID is essentially blocked. I don't know what is causing this. I guess it could be a bad drive, but how can I find out? I used atop to show the transfer speeds to each drive. Here's a screenshot: http://img402.imageshack.us/img402/6484/screenshotfrom201207251.png "smartctl -a" for all the drives looks good to me, no pending failures, or errors logged. dmesg doesn't report anything wrong with any of the drives. It does, however, report lots of hung tasks, which are trying to access the RAID volume. For example: [51000.672064] INFO: task mythbackend:10677 blocked for more than 120 seconds. [51000.672098] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [51000.672143] mythbackend D 0000000e 0 10677 1 0x00000000 [51000.672146] f38bea00 00000086 c1095415 0000000e 00000002 00000000 00000000 c147aac0 [51000.672152] f38bebac c147aac0 eb2cff04 003d2f4b 00000000 c109cacb 01872f02 eb2cfe50 [51000.672157] c100f28b c13df480 01872f02 eb2cfe68 c10532b1 0069a8d0 f79d6ac0 00000000 [51000.672162] Call Trace: [51000.672169] [] ? find_get_pages_tag+0x2f/0xa2 [51000.672173] [] ? pagevec_lookup_tag+0x18/0x1e [51000.672176] [] ? read_tsc+0xa/0x28 [51000.672179] [] ? timekeeping_get_ns+0x11/0x55 [51000.672182] [] ? ktime_get_ts+0x7a/0x82 [51000.672186] [] ? io_schedule+0x4a/0x5f [51000.672188] [] ? sleep_on_page+0x5/0x8 [51000.672191] [] ? __wait_on_bit+0x2f/0x54 [51000.672193] [] ? lock_page+0x1d/0x1d [51000.672196] [] ? wait_on_page_bit+0x57/0x5e [51000.672199] [] ? autoremove_wake_function+0x29/0x29 [51000.672201] [] ? filemap_fdatawait_range+0x71/0x11e [51000.672205] [] ? filemap_write_and_wait_range+0x3e/0x4c [51000.672232] [] ? xfs_file_fsync+0x68/0x214 [xfs] [51000.672246] [] ? xfs_file_splice_write+0x144/0x144 [xfs] [51000.672249] [] ? vfs_fsync_range+0x27/0x2d [51000.672252] [] ? vfs_fsync+0x11/0x15 [51000.672254] [] ? sys_fdatasync+0x20/0x2e [51000.672258] [] ? sysenter_do_call+0x12/0x28 [51000.672261] [] ? quirk_usb_early_handoff+0x4a9/0x522 Here is some other possibly relevant info: # cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] md0 : active raid6 sdh1[0] sdd1[9] sde1[10] sdb1[6] sdi1[7] sdc1[4] sdf1[3] sdg1[8] sdj1[1] 6837311488 blocks super 1.2 level 6, 512k chunk, algorithm 2 [9/9] [UUUUUUUUU] [==========>..........] resync = 51.3% (501954432/976758784) finish=28755.6min speed=275K/sec unused devices: # cat /proc/sys/dev/raid/speed_limit_min 10000 # cat /proc/sys/dev/raid/speed_limit_max 200000 Thanks in advance! -- Kevin