From: Lars Marowsky-Bree <lmb@suse.de>
To: Oktay Akbal <oktay.akbal@s-tec.de>, linux-kernel@vger.kernel.org
Subject: Re: Possible Bug with MD multipath and raid1 on top
Date: Sun, 15 Sep 2002 01:07:53 +0200
Message-ID: <20020914230753.GA3781@marowsky-bree.de>
In-Reply-To: <Pine.LNX.4.44.0209142014080.21833-100000@omega.s-tec.de>

On 2002-09-14T20:33:07,
   Oktay Akbal <oktay.akbal@s-tec.de> said:

> I found a very strange effect when using a raid1 on top of multipathing
> with Kernel 2.4.18 (Suse-version of it) with a 2-Port qlogic HBA
> connecting two arrays.

Is this with or without the patch I recently posted to linux-kernel?

If you are using it, please switch to the patch at
http://lars.marowsky-bree.de/dl/md-mp instead, which is slightly newer and
fixes one important issue (affecting raid0 use) and two minor ones. Please be
aware that you are beta-testing code for the time being ;-) (Which is highly
appreciated!)

> When I now pull out one of the cables two disks are missing and the
> multipath driver correctly uses the second path to the disks and
> continues to work. After plugging out the second cable all drives
> are marked as failed (mdstat), but the raid1 (md2) is still reported
> as functional with one device (md0) missing.

So far this sounds OK. (Note, though, that the updated md-mp patch will _never_
fail the last path; it returns the error to the layer above instead. This
protects against certain scenarios in 2.4 where a device error can't be
distinguished from a failed path, and we don't want that to lead to an
inaccessible device.)

> All Processes using the raid1-device get stuck and this situation
> does not recover. Even some simple process testing the disk-access
> got stuck (I think ps showed state L<D).

That's not OK, obviously ;-)

I will try to reproduce this on Monday. I don't have the hardware, so I use a
loop device instead (which I can make fail on demand). If I can't reproduce it
that way, it might in fact be the FC driver which gets stuck somehow.

> Even if I'm quite sure that this is a bug, how should I test disk access
> without ending in "uninterruptible sleep" ?

Uhm, essentially, you should never get stuck in uninterruptible sleep. All
errors should "eventually" time out.
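
Until that holds in practice, one way to test disk access without hanging the
testing process itself is to do the read in a child process and give up
waiting after a timeout. The following is only a rough sketch, not part of the
md-mp patch; the function name probe_read and its defaults are made up for
illustration, and a read that is served from the page cache may never touch
the device at all.

```python
import subprocess
import sys


def probe_read(path, timeout_s=10.0, nbytes=4096):
    """Read nbytes from path in a child process; return 'ok', 'error' or 'timeout'.

    The caller only waits on the child, so it stays responsive even if the
    child ends up in uninterruptible sleep inside the kernel.
    """
    child_code = (
        "import sys\n"
        "with open(sys.argv[1], 'rb', buffering=0) as f:\n"
        "    f.read(int(sys.argv[2]))\n"
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code, path, str(nbytes)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # A child in D state will not react to the signal until the I/O
        # completes, so we send it and deliberately do not wait again.
        proc.kill()
        return "timeout"
    return "ok" if rc == 0 else "error"
```

A "timeout" result only says that the read did not complete in time; the
child itself may stay stuck for as long as the kernel keeps it blocked.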

Please compile the kernel with magic-sysrq enabled and check where the
processes are stuck using magic-sysrq t. It might help if you piped the
results through ksymoops.
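
To find out which processes to look for in that output, the process state can
be read from /proc/<pid>/stat, where the third field is "D" for uninterruptible
sleep. This is a minimal sketch under that assumption; the helper names are
hypothetical, and it only tells you which processes are blocked, not where in
the kernel they are stuck (that is what the sysrq-t trace is for).

```python
import os


def parse_state(stat_line):
    """Return the state field from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split at the last ')' rather than on whitespace.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def blocked_processes(proc_root="/proc"):
    """List (pid, name) for every process currently in state D."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as f:
                line = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        if parse_state(line) == "D":
            name = line[line.index("(") + 1:line.rindex(")")]
            result.append((int(entry), name))
    return result


if __name__ == "__main__":
    for pid, name in blocked_processes():
        print(pid, name)
```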


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Immortality is an adequate definition of high availability for me.
	--- Gregory F. Pfister

