From: Paul Mackerras <paulus@samba.org>
To: dm-devel@redhat.com, linux-kernel@vger.kernel.org,
	linuxppc-dev@ozlabs.org
Cc: Vladimir Davydov <vdavydov@parallels.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Hannes Reinecke <hare@suse.de>
Subject: Regression in 3.15 on POWER8 with multipath SCSI
Date: Mon, 30 Jun 2014 20:30:59 +1000
Message-ID: <20140630103058.GA17747@iris.ozlabs.ibm.com>

I have a machine on which 3.15 usually fails to boot, and 3.14 boots
every time.  The machine is a POWER8 2-socket server with 20 cores
(thus 160 CPUs), 128GB of RAM, and 7 SCSI disks connected via a
hardware-RAID-capable adapter which appears as two IPR controllers
which are both connected to each disk.  I am booting from a disk that
has Fedora 20 installed on it.

After over two weeks of bisections, I can finally point to the commits
that cause the problems.  The culprits are:

3e9f1be1 dm mpath: remove process_queued_ios()
e8099177 dm mpath: push back requests instead of queueing
bcccff93 kobject: don't block for each kobject_uevent

The interesting thing is that neither e8099177 nor bcccff93 causes
failures on its own, but with both commits applied the system
sometimes fails to find /home.

With 3e9f1be1 included, the system appears to be prone to a deadlock
condition which typically causes the boot process to hang with this
message showing:

A start job is running for Monitoring of LVM2 mirror...rogress polling

(with a [***     ] thing before it where the asterisks move back and
forth).

If I revert 63d832c3 ("dm mpath: really fix lockdep warning"),
4cdd2ad7 ("dm mpath: fix lock order inconsistency in
multipath_ioctl"), 3e9f1be1 and bcccff93, in that order, I get a
kernel that will boot every time.  The first two are later commits
that fix some problems with 3e9f1be1 (though not the problems I am
seeing).

Can anyone see any reason why e8099177 and bcccff93 would interfere
with each other?

-----

The rest of this email outlines the steps I took to identify these
commits.  I first identified that 3.15-rc1 would sometimes fail to
boot, and did a bisection between 3.14 and 3.15-rc1 that identified
3e9f1be1 as the bad commit.  I then took 3.15-rc8 and reverted
63d832c3, 4cdd2ad7 and 3e9f1be1, and tested that.  That didn't fail
with the deadlock, but was still prone to fail to find root or /home
and thus fail to boot.
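
Because the failures described above are intermittent, a single clean
boot does not prove a commit good during bisection.  As a rough sketch
only: if a bad kernel fails each boot independently with probability
p_fail (an assumed figure, not one measured in this report), the number
of consecutive clean boots needed before trusting a "good" verdict can
be estimated as below.  The function name and both parameters are
hypothetical.

```python
import math


def boots_needed(p_fail: float, confidence: float) -> int:
    """Return the smallest n such that a bad kernel would pass n boots
    in a row with probability at most 1 - confidence.

    Assumes each boot fails independently with probability p_fail.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # A bad kernel survives n boots with probability (1 - p_fail) ** n.
    # Solve (1 - p_fail) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))
```

For example, under these assumptions a kernel that fails half of its
boots needs boots_needed(0.5, 0.95) == 5 clean boots in a row before
the chance of mislabelling it as good drops below 5%.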

To debug this second problem, I tested the commit before Linus merged
in the dm modifications: 3f583bc2 ("Merge tag 'iommu-updates-v3.15' of
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu").  It was
fine.  I then took 0596661f ("dm cache: fix a lock-inversion"), which
is what Linus merged in during the 3.15 merge window, reverted
3e9f1be1 on top of that, and tested that, and it also was fine.
The ID of that revert commit was 9cfd3fe8 (that ID doesn't appear in
any public tree, of course).

Interestingly, the merge of 3f583bc2 with 9cfd3fe8 was bad.  To track
this down, I first rebased the commits from the dm-3.15-changes branch
except for 3e9f1be1 on top of 3f583bc2, and bisected between 3f583bc2
and the tip of that branch.  That bisection pointed to e8099177.  I
tried reverting that from 3.15-rc8, but it doesn't revert cleanly, and
the conflicts were too complex for me to resolve by hand.

Next I did a git bisection between 3.14 and 3f583bc2, merging in
9cfd3fe8 at each point before testing.  That identified bcccff93 as
the first bad commit, and indeed 3.15 with bcccff93 reverted was not
prone to failing to find root or /home.
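
The "merge 9cfd3fe8 at each point before testing" step can be scripted
along the lines of the merge-then-test pattern in the git-bisect(1)
examples.  The following is a minimal sketch, not the procedure
actually used here: the build and test commands are placeholders, and
the function names and the injectable `run` parameter are assumptions
made so the logic can be exercised without a kernel tree.

```python
import subprocess
from typing import Callable, Sequence

# Exit code that `git bisect run` interprets as "this commit cannot be tested".
SKIP = 125


def run_cmd(argv: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.call(list(argv))


def bisect_step(
    fixup: str,
    build: Sequence[str],
    test: Sequence[str],
    run: Callable[[Sequence[str]], int] = run_cmd,
) -> int:
    """Merge `fixup` into the commit under test, build, test, then undo.

    Returns 0 (good), 1 (bad), or SKIP if the merge or the build fails.
    """
    merged = run(["git", "merge", "--no-commit", "--no-ff", fixup]) == 0
    if merged and run(build) == 0:
        status = 0 if run(test) == 0 else 1
    else:
        status = SKIP
    # Drop the uncommitted merge so bisect can check out the next commit.
    run(["git", "reset", "--hard"])
    return status
```

A wrapper script that exits with the value of, say,
bisect_step("9cfd3fe8", ["make"], ["./boot_test.sh"]) could then be
passed to "git bisect run"; boot_test.sh is a hypothetical stand-in
for whatever performs the boot check.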

Paul.
