linux-raid.vger.kernel.org archive mirror
From: ptb@lab.it.uc3m.es (Peter T. Breuer)
To: linux-raid@vger.kernel.org
Subject: Re: [PATCH 1/2] md bitmap bug fixes
Date: Fri, 18 Mar 2005 19:43:14 +0100
Message-ID: <23krg2-4rr.ln1@news.it.uc3m.es>
In-Reply-To: 423B09EF.8070708@steeleye.com

Paul Clements <paul.clements@steeleye.com> wrote:
> [ptb]
> > Could you set out the scenario very exactly, please, for those of us at
> > the back of the class :-). I simply don't see it. I'm not saying it's
> > not there to be seen, but that I have been unable to build a mental
> > image of the situation from the description :(.
> 
> Typically, in a cluster environment, you set up a raid1 with a local 
> disk and an nbd (or one of its variants) below it:

      system A
> 
>     [raid1]
>     /     \
> [disk]  [nbd] ---------> other system

Alright.  That's just raid with one nbd device as well as a local device
in the mirror.  On failover from this node we will serve directly from
the remote source instead.


> The situation he's talking about is, as you put it "somebody tripping 
> over the network cables".
> 
> In that case, you'll end up with this:
> 
>     system A       system B
>     [raid1]        [raid1]
>     /     \        /     \
> [disk]  [XXX]  [disk]  [XXX]

Well, that is not what I think you should end up with.  You should end
up (according to me) with the floating IP moved to the other system,
which then runs in degraded raid mode:

                   system B
                   [raid1]
                    /   \ 
                  disk  missing
     
and system A has died - that's what triggered the failover, usually.
And I believe the initial situation was:

      system A      system B
      [raid1]   .--- nbd      
      /     \   |     |
  [disk]  [nbd]-' [disk] 

You are suggesting a failure mode in which A does not die, but B thinks
it has, and takes the floating IP address.  Well, sorry, that's tough,
but the service is wherever the IP address is, no matter what A may
believe. No writes will go to A.

What seems to be the idea is that the failover mechanism has fouled up
- well, that's not a concern of md. If the failover mechanism does that,
it's not right. The failover should tell A to shut down (if it hasn't
already) and tell B to start serving.

Is the problem a race condition? One would want to hold off or even 
reject writes during the seconds of transition.


> Where there's a degraded raid1 writing only to the local disk on each 
> system (and a dirty bitmap on both sides).

This situation is explicitly disallowed by failover designs. The
failover mechanism will direct the reconfiguration so that this does
not happen. I don't even see exactly how it _can_ happen. I'm happy to
consider it, but I don't see how it can arise, since failover
mechanisms do exactly their thing in not permitting it.

> The solution is to combine the bitmaps and resync in one direction or 
> the other. Otherwise, you've got to do a full resync...

I don't see that this solves anything. If you had both sides going at
once, receiving different writes, then you are sc&**ed, and no
resolution of bitmaps will help you, since both sides have received
different (legitimate) data. It doesn't seem relevant to me to consider 
if they are equally up to date wrt the writes they have received. They
will be in the wrong even if they are up to date.
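
To make the disagreement concrete, here is a toy sketch (not md code) of
what "combine the bitmaps and resync in one direction" would amount to if
both sides really had been writing. The function name and the use of plain
sets of sector numbers are assumptions for illustration only.

```python
def plan_resync(dirty_winner, dirty_loser):
    """Plan a one-way resync from the chosen 'winner' to the 'loser'.

    Both arguments are sets of sector numbers written while the mirror
    was split. Returns (sectors to copy, loser writes that are discarded).
    """
    # Every sector either side touched may differ, so all of them are copied.
    to_copy = dirty_winner | dirty_loser
    # Whatever the loser wrote in its own dirty sectors is overwritten.
    discarded = set(dirty_loser)
    return to_copy, discarded


to_copy, discarded = plan_resync({1, 2}, {2, 5})
print(sorted(to_copy), sorted(discarded))
```

The merged set avoids a full resync, which is the point being made in the
quoted text, but every sector the losing side wrote is still thrown away,
which is the objection raised here.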

OK - maybe the problem is in the race between sending the writes
across to system B, and shutting down A, and starting serving from B.
This is the intended sequence:

   1  A sends writes to B
   2  A dies
   3  failover blocks writes
   4  failover moves IP address to B
   5  B drops nbd server
   6  B starts serving directly from a degraded raid, recording in bitmap
   7  failover starts passing writes to B

I can vaguely imagine some of the writes from (1) being still buffered in
B for write to B somewhere about the (6) point. Is that a problem?  I
don't see that it is. The kernel will have them in its buffers.
Applications will see them.
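
The seven steps, and the question of writes from (1) still buffered on B,
can be walked through with a toy model. This is only a sketch: the Node
class, failover() and the sector dictionaries are hypothetical, and it is
assumed that the buffered writes also reached A's local disk before A died.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.disk = {}          # sector -> data
        self.dirty = set()      # sectors written while the mirror is degraded
        self.degraded = False

    def write(self, sector, data):
        self.disk[sector] = data
        if self.degraded:
            self.dirty.add(sector)   # bitmap records what the peer is missing


def failover(b, in_flight):
    """Steps 2-7 as seen from B; in_flight holds step-1 writes sent by A."""
    log = ["2 A dies", "3 writes blocked", "4 IP moved to B"]
    # Step-1 writes still buffered on B land before B goes degraded.
    # Assumption: A has them too, so they need no bitmap entry.
    for sector, data in in_flight:
        b.write(sector, data)
    log.append("5 B drops nbd server")
    b.degraded = True
    log.append("6 B serves from degraded raid, recording in bitmap")
    log.append("7 writes pass to B")
    return log


b = Node("B")
failover(b, in_flight=[(3, "x")])
b.write(7, "y")                      # a write arriving after failover
print(b.disk, b.dirty)
```

In this model only the post-failover write ends up in the bitmap, and there
is a single bitmap, on B.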

What about when A comes back up? We then get a 

                 .--------------.
        system A |    system B  |
          nbd ---'    [raid1]   |
          |           /     \   |
       [disk]     [disk]  [nbd]-'

situation, and a resync is done (skipping clean sectors). 
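
A resync that skips clean sectors can be sketched as below. This is a toy
model, not the md implementation: resync() and the dictionaries are
hypothetical, and the dirty set is tracked per sector here for simplicity
where a real bitmap would cover larger regions.

```python
def resync(source, target, dirty):
    """Copy only the sectors marked dirty from source to target.

    source, target: dicts mapping sector -> data.
    dirty: set of sector numbers written while target was absent.
    Returns the number of sectors copied and clears the dirty set.
    """
    copied = 0
    for sector in sorted(dirty):
        target[sector] = source[sector]
        copied += 1
    dirty.clear()                     # mirror is in sync again
    return copied


b_disk = {1: "a", 2: "b-new", 3: "c-new"}   # B kept serving while A was down
a_disk = {1: "a", 2: "b", 3: "c"}           # A's stale copy
print(resync(b_disk, a_disk, {2, 3}), a_disk == b_disk)
```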

So I don't see where these "two" bitmaps are.


Peter

