From: BillStuff <billstuff2001@sbcglobal.net>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Raid5 hang in 3.14.19
Date: Sun, 28 Sep 2014 22:56:19 -0500
Message-ID: <5428D863.7090409@sbcglobal.net>
In-Reply-To: <20140929122533.3b91a543@notabene.brown>

On 09/28/2014 09:25 PM, NeilBrown wrote:
> On Fri, 26 Sep 2014 17:33:58 -0500 BillStuff <billstuff2001@sbcglobal.net>
> wrote:
>
>> Hi Neil,
>>
>> I found something that looks similar to the problem described in
>> "Re: seems like a deadlock in workqueue when md do a flush" from Sept 14th.
>>
>> It's on 3.14.19 with 7 recent patches for fixing raid1 recovery hangs.
>>
>> on this array:
>> md3 : active raid5 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
>>         104171200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
>>         bitmap: 1/5 pages [4KB], 2048KB chunk
>>
>> I was running a test doing parallel kernel builds, read/write loops, and
>> disk add / remove / check loops,
>> on both this array and a raid1 array.
>>
>> I was trying to stress test your recent raid1 fixes, which went well,
>> but then after 5 days,
>> the raid5 array hung up with this in dmesg:
> I think this is different to the workqueue problem you mentioned, though as I
> don't know exactly what caused either I cannot be certain.
>
> From the data you provided it looks like everything is waiting on
> get_active_stripe(), or on a process that is waiting on that.
> That seems pretty common whenever anything goes wrong in raid5 :-(
>
> The md3_raid5 task is listed as blocked, but no stack trace is given.
> If the machine is still in that state, then
>
>   cat /proc/1698/stack
>
> might be useful.
> (echo t > /proc/sysrq-trigger is always a good idea)

Might this help? I believe the array was doing a "check" when things 
hung up.

md3_raid5       D ea49d770     0  1698      2 0x00000000
  e833dda8 00000046 c106d92d ea49d770 e9d38554 1cc20b58 1e79a404 0001721a
  c17d6700 c17d6700 e956d610 c2217470 c13af054 e9e8f000 00000000 00000000
  e833dd78 00000000 00000000 00000271 00000000 00000005 00000000 0000a193
Call Trace:
  [<c106d92d>] ? __enqueue_entity+0x6d/0x80
  [<c13af054>] ? scsi_init_io+0x24/0xb0
  [<c1072683>] ? enqueue_task_fair+0x2d3/0x660
  [<c153e7f3>] schedule+0x23/0x60
  [<c153db85>] schedule_timeout+0x145/0x1c0
  [<c1065698>] ? update_rq_clock.part.92+0x18/0x50
  [<c1067a65>] ? check_preempt_curr+0x65/0x90
  [<c1067aa8>] ? ttwu_do_wakeup+0x18/0x120
  [<c153ef5b>] wait_for_common+0x9b/0x110
  [<c1069ca0>] ? wake_up_process+0x40/0x40
  [<c153f077>] wait_for_completion_killable+0x17/0x30
  [<c105ad0a>] kthread_create_on_node+0x9a/0x110
  [<c1453ecc>] md_register_thread+0x8c/0xc0
  [<c1453f00>] ? md_register_thread+0xc0/0xc0
  [<c145ad14>] md_check_recovery+0x304/0x490
  [<c12b1192>] ? blk_finish_plug+0x12/0x40
  [<f3dc3a10>] raid5d+0x20/0x4c0 [raid456]
  [<c104a022>] ? try_to_del_timer_sync+0x42/0x60
  [<c153db3d>] ? schedule_timeout+0xfd/0x1c0
  [<c1453fe8>] md_thread+0xe8/0x100
  [<c1079990>] ? __wake_up_sync+0x20/0x20
  [<c1453f00>] ? md_register_thread+0xc0/0xc0
  [<c105ae21>] kthread+0xa1/0xc0
  [<c1541837>] ret_from_kernel_thread+0x1b/0x28
  [<c105ad80>] ? kthread_create_on_node+0x110/0x110
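
Frames marked "?" in a trace like this are addresses found on the stack that
the unwinder could not confirm as part of the call chain. As a rough reading
aid, the sketch below (Python, with a hypothetical helper name and a shortened
copy of the trace above as sample input) drops those frames and keeps the
rest. On this trace that leaves md3_raid5 sleeping in
wait_for_completion_killable() under kthread_create_on_node(), reached from
md_check_recovery() via md_register_thread().

```python
import re

# Matches frames such as "[<c153e7f3>] schedule+0x23/0x60" and
# "[<f3dc3a10>] raid5d+0x20/0x4c0 [raid456]".
# An optional "?" after the address marks an unreliable frame.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?"
)


def reliable_frames(trace_text):
    """Return the symbols of frames not marked '?', innermost first."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("guess"):
            frames.append(m.group("sym"))
    return frames


# Shortened excerpt of the md3_raid5 trace from this message.
SAMPLE = """\
 [<c106d92d>] ? __enqueue_entity+0x6d/0x80
 [<c153e7f3>] schedule+0x23/0x60
 [<c153db85>] schedule_timeout+0x145/0x1c0
 [<c1065698>] ? update_rq_clock.part.92+0x18/0x50
 [<c153ef5b>] wait_for_common+0x9b/0x110
 [<c153f077>] wait_for_completion_killable+0x17/0x30
 [<c105ad0a>] kthread_create_on_node+0x9a/0x110
 [<c1453ecc>] md_register_thread+0x8c/0xc0
 [<c145ad14>] md_check_recovery+0x304/0x490
 [<f3dc3a10>] raid5d+0x20/0x4c0 [raid456]
 [<c1453fe8>] md_thread+0xe8/0x100
"""

if __name__ == "__main__":
    print(" <- ".join(reliable_frames(SAMPLE)))
```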

I've already rebooted the system, but I did get a snapshot of all the 
blocked processes.
It's kind of long but I can post it if it's useful.
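
For reference, one way such a snapshot could be collected is sketched below.
This is only an illustration, not the method used here: it assumes a kernel
that exposes /proc/<pid>/stack (readable by root), the function names are
made up for the example, and it only walks top-level /proc/<pid> entries.
"echo w > /proc/sysrq-trigger" dumps the blocked tasks to dmesg as well.

```python
import os


def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the first '(' and the last ')'.
    """
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen])
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm, stack) for each task in uninterruptible sleep (D)."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            # Reading the stack file normally requires root.
            with open(os.path.join(proc, entry, "stack")) as f:
                stack = f.read()
        except OSError:
            continue  # task exited, or its stack is not readable
        yield pid, comm, stack


if __name__ == "__main__":
    for pid, comm, stack in blocked_tasks():
        print(f"{comm} (pid {pid}):\n{stack}")
```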

Thanks,
Bill
