From: Thomas Jarosch <thomas.jarosch@intra2net.com>
To: linux-raid@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>
Subject: raid1 boot regression in 2.6.37 [bisected]
Date: Fri, 25 Mar 2011 19:55:34 +0100
Message-ID: <4D8CE526.3050204@intra2net.com>

Hello,

I've just updated from kernel 2.6.34.7 to kernel 2.6.37.5, and one
HP ProLiant DL320 G3 box with a raid1 software RAID stopped booting
(as did two other, non-HP boxes).

We run this script at boot time via dracut:
----------------------------------
#!/bin/sh
. /lib/dracut-lib.sh

info "Telling kernel to auto-detect RAID arrays"
/sbin/initqueue --settled --name kerneldetectraid /sbin/mdadm --auto-detect
----------------------------------

With the "bad" commit in place, the kernel doesn't output
any md message at all. I've bisected it down to this commit:

e804ac780e2f01cb3b914daca2fd4780d1743db1 is the first bad commit
commit e804ac780e2f01cb3b914daca2fd4780d1743db1
Author: Tejun Heo <tj@kernel.org>
Date:   Fri Oct 15 15:36:08 2010 +0200

    md: fix and update workqueue usage

    Workqueue usage in md has two problems.

    * Flush can be used during or depended upon by memory reclaim, but md
      uses the system workqueue for flush_work which may lead to deadlock.

    * md depends on flush_scheduled_work() to achieve exclusion against
      completion of removal of previous instances.  flush_scheduled_work()
      may incur unexpected amount of delay and is scheduled to be removed.

    This patch adds two workqueues to md - md_wq and md_misc_wq.  The
    former is guaranteed to make forward progress under memory pressure
    and serves flush_work.  The latter serves as the flush domain for
    other works.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: NeilBrown <neilb@suse.de>

:040000 040000 f6b6a34a71864263ed253866c5f8abe7f766ac6b dc2eff4a91825142b7c88cf54751fc7acdf1a6d2 M      drivers

I manually verified that the commit before it 
(57dab0bdf689d42972975ec646d862b0900a4bf3) works
and the "bad" commit prevents the box from booting.
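The bisection above was run on a kernel tree, marking each boot test by hand. As an illustration of the same first-bad-commit logic, here is a minimal toy sketch, assuming git is installed; the throwaway repository, the files `status.txt`/`counter.txt`, and the `bisect_demo` function are all hypothetical. The history "regresses" at commit 7, and `git bisect run` treats exit status 0 as good and 1 as bad.

```shell
#!/bin/sh
# Toy bisect session (hypothetical repo and files): the history turns
# "bad" at commit 7 and `git bisect run` has to find that commit.

# bisect_demo: prints the subject line of the first bad commit found.
bisect_demo() (
    repo=$(mktemp -d) || exit 1
    cd "$repo" || exit 1
    git init -q . || exit 1

    # Ten commits; from commit 7 on, status.txt reads "bad" (the simulated regression).
    i=1
    while [ "$i" -le 10 ]; do
        if [ "$i" -ge 7 ]; then echo bad > status.txt; else echo good > status.txt; fi
        echo "$i" > counter.txt   # guarantees every commit contains a change
        git add status.txt counter.txt
        git -c user.name=demo -c user.email=demo@example.com \
            commit -q -m "commit $i" || exit 1
        i=$((i + 1))
    done

    # The root commit is the known-good revision, HEAD the known-bad one.
    good=$(git rev-list --max-parents=0 HEAD)
    git bisect start HEAD "$good" >/dev/null 2>&1 || exit 1

    # grep exits 0 (match: good) or 1 (no match: bad) at each checked-out revision.
    git bisect run grep -qx good status.txt >/dev/null 2>&1

    # When bisection finishes, refs/bisect/bad points at the first bad commit.
    git log -1 --format=%s refs/bisect/bad
    git bisect reset >/dev/null 2>&1
    rm -rf "$repo"
)
```

Calling `bisect_demo` prints `commit 7`. On a kernel, where every step needs a build and a reboot, the automated `git bisect run` step is replaced by marking each tested revision with `git bisect good` or `git bisect bad`.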


Some more info:

# mdadm --version
mdadm - v2.6.9 - 10th March 2009

# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Wed May 27 17:52:40 2009
     Raid Level : raid1
     Array Size : 2562240 (2.44 GiB 2.62 GB)
  Used Dev Size : 2562240 (2.44 GiB 2.62 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Mar 25 17:11:33 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 0ee8da2c:5803478b:e399b924:6520c535
         Events : 0.160

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1



Any idea what might be going wrong? Maybe building a kernel
with lock debugging on Monday will help. I think I also
tried kernel 2.6.38, though I'll verify that on Monday, too.
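For such a lock-debugging build, a minimal sketch of a pre-build check on the kernel's `.config` might look like this. The option list is an assumption, not taken from this report: `CONFIG_PROVE_LOCKING`, `CONFIG_DEBUG_MUTEXES` and `CONFIG_DEBUG_SPINLOCK` for lock validation, plus `CONFIG_DETECT_HUNG_TASK` to report tasks stuck during boot. The helper name `check_lock_debug` is hypothetical.

```shell
#!/bin/sh
# check_lock_debug (hypothetical helper): report whether each assumed
# lock-debugging option is set to "y" in the given kernel .config file.
check_lock_debug() {
    cfg=$1
    missing=0
    for opt in CONFIG_PROVE_LOCKING CONFIG_DEBUG_MUTEXES \
               CONFIG_DEBUG_SPINLOCK CONFIG_DETECT_HUNG_TASK; do
        # Match the whole line, so "# CONFIG_X is not set" does not count.
        if grep -qx "${opt}=y" "$cfg"; then
            echo "$opt=y"
        else
            echo "$opt is not set"
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero if any option is missing.
    [ "$missing" -eq 0 ]
}
```

Run from the top of the kernel tree as `check_lock_debug .config`; a non-zero status means the configuration should be adjusted (e.g. via `make menuconfig`) before building.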


Have a nice weekend,
Thomas

PS: Sorry, Tejun, for the HTML crap in my first mail.
