linux-raid.vger.kernel.org archive mirror
* md multipath corruption in 2.4.x
@ 2005-04-13  9:26 Lars Marowsky-Bree
  2005-04-13 20:13 ` [PATCH] " Lars Marowsky-Bree
  0 siblings, 1 reply; 2+ messages in thread
From: Lars Marowsky-Bree @ 2005-04-13  9:26 UTC
  To: linux-raid

Hi all,

I'm currently having to hunt down a source of corruption in md multipath
in 2.4.21 (the relevant code paths are unchanged up to and including
2.4.30, so it's likely present in all 2.4.x kernels).

The corruption is triggered by a path failure, after which some writes go
wrong. (It's a bit difficult to figure out exactly what is happening, as
it also seems to take a fairly substantial write load.)

I saw that raid1 received locking fixes for access to its internal
structs in 2.4.28, and I've already ported these over to multipath too,
but the missing locking doesn't seem to be what causes this.

I'll have to dig deeper into the code base now and try to figure out
what is going wrong; the problem is that the code _looks_ correct ;-)

So, if anybody has experienced similar issues with md multipath in 2.4,
or has some data points to share (or "what you always thought smelled
fishy about md multipath"), I'd appreciate hearing from you.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* [PATCH] Re: md multipath corruption in 2.4.x
  2005-04-13  9:26 md multipath corruption in 2.4.x Lars Marowsky-Bree
@ 2005-04-13 20:13 ` Lars Marowsky-Bree
  0 siblings, 0 replies; 2+ messages in thread
From: Lars Marowsky-Bree @ 2005-04-13 20:13 UTC
  To: linux-raid

On 2005-04-13T11:26:05, Lars Marowsky-Bree <lmb@suse.de> wrote:

OK, so the subject is a bit of a lie, because I haven't yet brought over
all changes from our 2.4.21 vendor kernel to 2.4.30 (I'd first need to
disentangle some feature additions which Neil doesn't like).

However, I've found the root cause of the data corruption issue with md
multipath, after having been sidetracked by the abhorrent locking
issues ;-)

It's this innocent "bh->b_rsector = bh->b_blocknr;" in
multipath.c:multipathd(), an obvious artifact of having forked this code
from raid1.c, where this line has a clearly different meaning.

For md multipath, this basically implies that _EVERY_ IO which was
in-flight when the error happened and was requeued will be written to a
wrong block on disk.

A minimal fix would be to remove at least this one line, if one doesn't
want to fix the locking raid1.c-style (as introduced in 2.4.28).


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>
