public inbox for linux-kernel@vger.kernel.org
From: Phil Turmel <philip@turmel.org>
To: Kevin Ross <kevin@familyross.net>
Cc: linux-kernel@vger.kernel.org, linux-raid <linux-raid@vger.kernel.org>
Subject: Re: RAID extremely slow
Date: Wed, 25 Jul 2012 21:00:51 -0400
Message-ID: <501096C3.5060700@turmel.org>
In-Reply-To: <501078B2.8070707@familyross.net>

[Added linux-raid to the CC]

Hi Kevin,

Notes interleaved:

On 07/25/2012 06:52 PM, Kevin Ross wrote:
> Hello,
> 
> I'm having a problem.  After a while, my software RAID rebuild becomes
> extremely slow, and the filesystem on the RAID is essentially blocked. 
> I don't know what is causing this.  I guess it could be a bad drive, but
> how can I find out?

Probably not.  A failing drive pretty much always shows up in dmesg.

> I used atop to show the transfer speeds to each drive. Here's a
> screenshot:
> http://img402.imageshack.us/img402/6484/screenshotfrom201207251.png

Piles of small reads scattered across multiple drives, and a
concentration of queued writes to /dev/sda.  What's on /dev/sda?
It's not a member of the raid, so some other system task must be
involved.

[ The output of "lsdrv" [1] might be useful here, along with
"mdadm -D /dev/md0" and "mdadm -E /dev/sd[b-j]1" ]

> "smartctl -a" for all the drives looks good to me, no pending failures,
> or errors logged.  dmesg doesn't report anything wrong with any of the
> drives.  It does, however, report lots of hung tasks, which are trying
> to access the RAID volume.  For example:
> 
> [51000.672064] INFO: task mythbackend:10677 blocked for more than 120
> seconds.
> [51000.672098] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [51000.672143] mythbackend     D 0000000e     0 10677      1 0x00000000
> [51000.672146]  f38bea00 00000086 c1095415 0000000e 00000002 00000000
> 00000000 c147aac0
> [51000.672152]  f38bebac c147aac0 eb2cff04 003d2f4b 00000000 c109cacb
> 01872f02 eb2cfe50
> [51000.672157]  c100f28b c13df480 01872f02 eb2cfe68 c10532b1 0069a8d0
> f79d6ac0 00000000
> [51000.672162] Call Trace:
> [51000.672169]  [<c1095415>] ? find_get_pages_tag+0x2f/0xa2
> [51000.672173]  [<c109cacb>] ? pagevec_lookup_tag+0x18/0x1e
> [51000.672176]  [<c100f28b>] ? read_tsc+0xa/0x28
> [51000.672179]  [<c10532b1>] ? timekeeping_get_ns+0x11/0x55
> [51000.672182]  [<c10536a4>] ? ktime_get_ts+0x7a/0x82
> [51000.672186]  [<c12bea8b>] ? io_schedule+0x4a/0x5f
> [51000.672188]  [<c1095659>] ? sleep_on_page+0x5/0x8
> [51000.672191]  [<c12bedeb>] ? __wait_on_bit+0x2f/0x54
> [51000.672193]  [<c1095654>] ? lock_page+0x1d/0x1d
> [51000.672196]  [<c1095754>] ? wait_on_page_bit+0x57/0x5e
> [51000.672199]  [<c104d171>] ? autoremove_wake_function+0x29/0x29
> [51000.672201]  [<c1095823>] ? filemap_fdatawait_range+0x71/0x11e
> [51000.672205]  [<c109630f>] ? filemap_write_and_wait_range+0x3e/0x4c
> [51000.672232]  [<f86bfb39>] ? xfs_file_fsync+0x68/0x214 [xfs]
> [51000.672246]  [<f86bfad1>] ? xfs_file_splice_write+0x144/0x144 [xfs]
> [51000.672249]  [<c10e7e3b>] ? vfs_fsync_range+0x27/0x2d
> [51000.672252]  [<c10e7e52>] ? vfs_fsync+0x11/0x15
> [51000.672254]  [<c10e80b8>] ? sys_fdatasync+0x20/0x2e

MythTV is trying to flush recorded video to disk, I presume.  Sync is
known to cause stalls--a great deal of work is ongoing to improve
this.  How old is this kernel?

> [51000.672258]  [<c12c409f>] ? sysenter_do_call+0x12/0x28
> [51000.672261]  [<c12b0000>] ? quirk_usb_early_handoff+0x4a9/0x522
> 
> Here is some other possibly relevant info:
> 
> # cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid6 sdh1[0] sdd1[9] sde1[10] sdb1[6] sdi1[7] sdc1[4]
> sdf1[3] sdg1[8] sdj1[1]
>       6837311488 blocks super 1.2 level 6, 512k chunk, algorithm 2 [9/9]
> [UUUUUUUUU]
>       [==========>..........]  resync = 51.3% (501954432/976758784)
> finish=28755.6min speed=275K/sec

Is this resync a weekly check, or did something else trigger it?
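
For scale, the finish estimate in that status line can be checked by
hand.  A minimal sketch, assuming the usual /proc/mdstat progress format
with positions in 1K blocks and speed in K/sec; the function name and
the regex are mine, not anything md provides:

```python
import re

# Matches the progress part of an md status line, e.g.
# "(501954432/976758784) finish=28755.6min speed=275K/sec"
PROGRESS = re.compile(r"\((\d+)/(\d+)\)\s+finish=[\d.]+min\s+speed=(\d+)K/sec")

def resync_eta_minutes(mdstat_text):
    """Estimate minutes left from done/total (1K blocks) and speed (K/sec)."""
    # The line may have been wrapped, so collapse all whitespace first.
    flat = " ".join(mdstat_text.split())
    m = PROGRESS.search(flat)
    if m is None:
        return None
    done, total, speed = (int(g) for g in m.groups())
    if speed == 0:
        return None
    return (total - done) / speed / 60.0

sample = """[==========>..........]  resync = 51.3% (501954432/976758784)
finish=28755.6min speed=275K/sec"""
print(round(resync_eta_minutes(sample)))  # prints 28776
```

That is roughly 20 days at the current rate, which agrees with the
kernel's own finish= figure to within rounding of the reported speed.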

> unused devices: <none>
> 
> # cat /proc/sys/dev/raid/speed_limit_min
> 10000

MD is unable to reach its minimum rebuild rate (10000K/sec configured,
275K/sec observed) while other system activity is ongoing.  You might
want to lower this number to see if that gets you out of the stalls.

Or temporarily shut down mythtv.
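
If you try lowering the minimum, something along these lines should do
it.  This is only a sketch: the helper name is hypothetical, 1000 is an
arbitrary example value rather than a recommendation, and writing to the
real /proc file needs root.  The demo at the bottom runs against a
scratch file so it changes nothing on a live system:

```python
import tempfile
from pathlib import Path

# The sysctl file shown in the original report.
SPEED_LIMIT_MIN = Path("/proc/sys/dev/raid/speed_limit_min")

def lower_speed_limit_min(new_rate, path=SPEED_LIMIT_MIN):
    """Lower the md minimum resync rate (K/sec); return the previous value."""
    path = Path(path)
    old = int(path.read_text().strip())
    # Only ever lower the value; leave it alone if it is already lower.
    if new_rate < old:
        path.write_text(f"{new_rate}\n")
    return old

# Dry run on a scratch file seeded with the value from the report.
with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp) / "speed_limit_min"
    scratch.write_text("10000\n")
    previous = lower_speed_limit_min(1000, scratch)
    current = int(scratch.read_text())
print(previous, current)
```

On the real system you would call lower_speed_limit_min(1000) as root
and write the returned value back once the resync finishes.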

> # cat /proc/sys/dev/raid/speed_limit_max
> 200000
> 
> Thanks in advance!
> -- Kevin

HTH,

Phil

[1] http://github.com/pturmel/lsdrv