From: NeilBrown <neilb@suse.com>
To: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: David Teigland <teigland@redhat.com>, linux-kernel@vger.kernel.org
Subject: Re: clustered MD
Date: Tue, 23 Jun 2015 11:34:43 +1000
Message-ID: <20150623113443.42b65439@noble>
In-Reply-To: <557DFDF3.2060106@suse.com>
On Sun, 14 Jun 2015 17:19:31 -0500
Goldwyn Rodrigues <rgoldwyn@suse.com> wrote:
>
>
> On 06/12/2015 01:46 PM, David Teigland wrote:
> > When a node fails, its dirty areas get special treatment from other nodes
> > using the area_resyncing() function. Should the suspend_list be created
> > before any reads or writes from the file system are processed by md? It
> > seems to me that gfs journal recovery could read/write to dirty regions
> > (from the failed node) before md was finished setting up the suspend_list.
> > md could probably prevent that by using the recover_prep() dlm callback to
> > set a flag that would block any i/o that arrived before the suspend_list
> > was ready.
> >
> > .
>
> Yes, we should call mddev_suspend() in recover_prep() and mddev_resume()
> after suspend_list is created. Thanks for pointing it out.
>
Between the time when some other node disappears and the time when that
disappearance has been completely handled, the only thing that nodes
need to be careful of is reads.
md/raid1 must ensure that if/when the filesystem reads from a region
that the missing node was writing to, the filesystem sees consistent
data - on all nodes.
So it needs to suspend read-balancing while it is uncertain.
Once the bitmap from the node has been loaded, the normal protection
against read-balancing in a "dirty" region is sufficient. While
waiting for the bitmap to be loaded, the safe thing to do would be to
disable read-balancing completely.
So I think that recover_prep() should set a flag which disables all
read balancing, and recover_done() (or similar) should clear that flag.
Probably there should be one flag for each other node.
Calling mddev_suspend to suspend all IO is overkill. Suspending all
read balancing is all that is needed.
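To make the per-node flag idea concrete, here is a rough userspace sketch,
not actual md code. All names (nodes_in_recovery, read_balance_block(),
read_balance_unblock(), choose_mirror()) are hypothetical, the limit of 64
slots is an assumption, and "fall back to the first mirror" stands in for
whatever fixed choice raid1 would make when it must not balance.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64	/* assumption: at most 64 cluster slots */

/* One bit per remote node whose disappearance is still being handled. */
static _Atomic uint64_t nodes_in_recovery;

/* Would be called from recover_prep(): the bitmap of 'slot' is not yet known. */
static void read_balance_block(unsigned int slot)
{
	atomic_fetch_or(&nodes_in_recovery, UINT64_C(1) << slot);
}

/* Would be called from recover_done() (or similar), once that bitmap is loaded. */
static void read_balance_unblock(unsigned int slot)
{
	atomic_fetch_and(&nodes_in_recovery, ~(UINT64_C(1) << slot));
}

/* Read balancing is only safe when no node is in the uncertain window. */
static bool read_balance_allowed(void)
{
	return atomic_load(&nodes_in_recovery) == 0;
}

/*
 * Pick a mirror for a read: the balanced choice normally, but always
 * mirror 0 while any flag is set, so every node reads the same copy.
 */
static int choose_mirror(int balanced_choice)
{
	return read_balance_allowed() ? balanced_choice : 0;
}
```

Writes are never blocked here; only the choice of which mirror to read
from is pinned, and it is released once every flag has been cleared.
Callers are expected to pass a slot below MAX_NODES.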
Thanks,
NeilBrown