From: BillStuff <billstuff2001@sbcglobal.net>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Raid5 hang in 3.14.19
Date: Sun, 28 Sep 2014 22:56:19 -0500
Message-ID: <5428D863.7090409@sbcglobal.net>
In-Reply-To: <20140929122533.3b91a543@notabene.brown>

On 09/28/2014 09:25 PM, NeilBrown wrote:
> On Fri, 26 Sep 2014 17:33:58 -0500 BillStuff <billstuff2001@sbcglobal.net>
> wrote:
>
>> Hi Neil,
>>
>> I found something that looks similar to the problem described in
>> "Re: seems like a deadlock in workqueue when md do a flush" from Sept 14th.
>>
>> It's on 3.14.19 with 7 recent patches for fixing raid1 recovery hangs.
>>
>> on this array:
>> md3 : active raid5 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
>>         104171200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
>>         bitmap: 1/5 pages [4KB], 2048KB chunk
>>
>> I was running a test doing parallel kernel builds, read/write loops, and
>> disk add / remove / check loops,
>> on both this array and a raid1 array.
>>
>> I was trying to stress test your recent raid1 fixes, which went well,
>> but then after 5 days,
>> the raid5 array hung up with this in dmesg:
> I think this is different to the workqueue problem you mentioned, though as I
> don't know exactly what caused either I cannot be certain.
>
>   From the data you provided it looks like everything is waiting on
> get_active_stripe(), or on a process that is waiting on that.
> That seems pretty common whenever anything goes wrong in raid5 :-(
>
> The md3_raid5 task is listed as blocked, but no stack trace is given.
> If the machine is still in that state, then
>
>   cat /proc/1698/stack
>
> might be useful.
> (echo t > /proc/sysrq-trigger is always a good idea)

Might this help? I believe the array was doing a "check" when things 
hung up.

md3_raid5       D ea49d770     0  1698      2 0x00000000
  e833dda8 00000046 c106d92d ea49d770 e9d38554 1cc20b58 1e79a404 0001721a
  c17d6700 c17d6700 e956d610 c2217470 c13af054 e9e8f000 00000000 00000000
  e833dd78 00000000 00000000 00000271 00000000 00000005 00000000 0000a193
Call Trace:
  [<c106d92d>] ? __enqueue_entity+0x6d/0x80
  [<c13af054>] ? scsi_init_io+0x24/0xb0
  [<c1072683>] ? enqueue_task_fair+0x2d3/0x660
  [<c153e7f3>] schedule+0x23/0x60
  [<c153db85>] schedule_timeout+0x145/0x1c0
  [<c1065698>] ? update_rq_clock.part.92+0x18/0x50
  [<c1067a65>] ? check_preempt_curr+0x65/0x90
  [<c1067aa8>] ? ttwu_do_wakeup+0x18/0x120
  [<c153ef5b>] wait_for_common+0x9b/0x110
  [<c1069ca0>] ? wake_up_process+0x40/0x40
  [<c153f077>] wait_for_completion_killable+0x17/0x30
  [<c105ad0a>] kthread_create_on_node+0x9a/0x110
  [<c1453ecc>] md_register_thread+0x8c/0xc0
  [<c1453f00>] ? md_register_thread+0xc0/0xc0
  [<c145ad14>] md_check_recovery+0x304/0x490
  [<c12b1192>] ? blk_finish_plug+0x12/0x40
  [<f3dc3a10>] raid5d+0x20/0x4c0 [raid456]
  [<c104a022>] ? try_to_del_timer_sync+0x42/0x60
  [<c153db3d>] ? schedule_timeout+0xfd/0x1c0
  [<c1453fe8>] md_thread+0xe8/0x100
  [<c1079990>] ? __wake_up_sync+0x20/0x20
  [<c1453f00>] ? md_register_thread+0xc0/0xc0
  [<c105ae21>] kthread+0xa1/0xc0
  [<c1541837>] ret_from_kernel_thread+0x1b/0x28
  [<c105ad80>] ? kthread_create_on_node+0x110/0x110
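
Entries marked with "?" in a trace like this are addresses found on the
stack that were not confirmed as part of the call chain, so the unmarked
entries are the ones to read first. Below is a minimal sketch of pulling
those out mechanically. The names FRAME_RE, reliable_frames and SAMPLE are
made up for illustration, and the regular expression assumes the
"[<addr>] sym+0xoff/0xlen" layout shown above.

```python
import re

# Matches lines such as "[<c153e7f3>] schedule+0x23/0x60" or
# "[<f3dc3a10>] raid5d+0x20/0x4c0 [raid456]". A "?" after the address marks
# an entry that was not confirmed as part of the call chain.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace):
    """Return the symbols of frames not marked with '?', innermost first."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("guess"):
            frames.append(m.group("sym"))
    return frames


# A few lines copied from the md3_raid5 trace above.
SAMPLE = """\
  [<c153ef5b>] wait_for_common+0x9b/0x110
  [<c1069ca0>] ? wake_up_process+0x40/0x40
  [<c153f077>] wait_for_completion_killable+0x17/0x30
  [<c105ad0a>] kthread_create_on_node+0x9a/0x110
  [<c1453ecc>] md_register_thread+0x8c/0xc0
  [<c1453f00>] ? md_register_thread+0xc0/0xc0
  [<c145ad14>] md_check_recovery+0x304/0x490
  [<f3dc3a10>] raid5d+0x20/0x4c0 [raid456]
"""

if __name__ == "__main__":
    for sym in reliable_frames(SAMPLE):
        print(sym)
```

Read that way, the trace has raid5d calling md_check_recovery, which calls
md_register_thread and then sits in kthread_create_on_node waiting for a
completion.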

I've already rebooted the system, but I did get a snapshot of all the 
blocked processes.
It's kind of long but I can post it if it's useful.
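
For anyone who wants to take a similar snapshot before rebooting, here is
a rough sketch that walks /proc and collects the stack of every task in
uninterruptible sleep (state D). It is not the method used for the
snapshot mentioned above. The function name blocked_task_stacks and its
proc_root parameter are made up for illustration, reading
/proc/<pid>/stack typically needs root and a kernel built with stack
trace support, and only top-level PIDs are scanned, not individual
threads.

```python
import os


def blocked_task_stacks(proc_root="/proc"):
    """Map pid -> (comm, stack text) for tasks in state D."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "sys"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
            # stat looks like "1698 (md3_raid5) D ..."; comm may contain
            # spaces or parentheses, so split on the last ")".
            head, _, tail = stat.rpartition(")")
            comm = head.split("(", 1)[1]
            state = tail.split()[0]
            if state != "D":
                continue
            with open(os.path.join(base, "stack")) as f:
                stack = f.read()
        except (OSError, IndexError):
            # Task exited in the meantime, stat was malformed, or the
            # stack file is not readable; skip it.
            continue
        result[int(entry)] = (comm, stack)
    return result
```

Calling blocked_task_stacks() as root and printing each entry gives
roughly the per-task part of what "echo t > /proc/sysrq-trigger" writes
to dmesg, limited to blocked tasks.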

Thanks,
Bill
