[Drbd-dev] roadmap draft


* [Drbd-dev] roadmap draft
@ 2004-09-07 13:49 Philipp Reisner
  2004-09-09 10:05 ` Lars Marowsky-Bree
  0 siblings, 1 reply; 3+ messages in thread
From: Philipp Reisner @ 2004-09-07 13:49 UTC (permalink / raw)
  To: drbd-dev

[-- Attachment #1: Type: text/plain, Size: 221 bytes --]

...
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :

[-- Attachment #2: roadmap.txt --]
[-- Type: text/plain, Size: 3992 bytes --]

DRBD 0.8 Roadmap
----------------

1 Drop support for linux-2.4.x. 
  Do all size calculations in units of sectors (512 bytes), as is
  common in Linux-2.6.x.
  (Currently they are done on a 1k base, for 2.4.x compatibility)
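
  A minimal sketch of what the unit change means, assuming nothing
  about DRBD's actual code: the helper names below are hypothetical.
  It shows why a 1k base is lossy for devices with an odd number of
  512-byte sectors.

```c
#include <stdint.h>

/* Hypothetical helpers, not DRBD functions. */

/* 1 KiB units (the 2.4.x-compatible base) to 512-byte sectors. */
static inline uint64_t kb_to_sectors(uint64_t kb)
{
    return kb << 1;
}

/* 512-byte sectors back to 1 KiB units. This rounds down, so a device
 * with an odd sector count loses its last 512 bytes in this unit. */
static inline uint64_t sectors_to_kb(uint64_t sectors)
{
    return sectors >> 1;
}

/* 512-byte sectors to bytes. */
static inline uint64_t sectors_to_bytes(uint64_t sectors)
{
    return sectors << 9;
}
```

  With sizes kept in sectors throughout, the rounding in
  sectors_to_kb() never has to happen.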

2 Drop the Drbd_Parameter_Packet.
  Replace the Drbd_Parameter_Packet by a more general and 
  extensible mechanism.

3 Changes of state and cstate synchronized by mutex and only done by
  the worker thread.

4 Two new config options, to allow a more fine-grained definition of
  DRBD's behaviour after a split-brain situation:

  after-sb-2pri = 
   disconnect     No automatic resynchronisation gets performed. One
                  node should drop its net-conf (preferably the
                  node that would become sync-target)
                  DEFAULT.
   asf-older      Auto sync from the older primary (curr. behaviour i.t.s.)
   asf-younger    Auto sync from the younger primary
   asf-furthest   Auto sync from the node that did more modifications
   asf-NODENAME   Auto sync from the named node

  pri-sees-sec-with-higher-gc =
   disconnect     (current behaviour)
   asf-primary    Auto sync from the current primary
   panic          The current primary panics. The node with the
                  higher gc should take over.

  Notes:
  1) The disconnect actions cause the sync-target or the secondary
     node to go into StandAlone state.
  2) If two nodes in primary state try to connect one of them goes
     into StandAlone state (=curr. behaviour)
  3) As soon as the decision is taken, the sync-target adopts the
     GC of the sync source.
     [ The whole algorithm would also work if both would reset their 
       GCs to <0,0,0...> after the decision, but since we also
       use the GC to tag the bitmap it is better the current way ]
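
  The after-sb-2pri choices above could be sketched roughly as
  follows. Only the policy names come from the roadmap; the struct,
  its fields (primary_since, modifications) and the function name are
  assumptions made for illustration, and ties fall back to
  "disconnect".

```c
#include <string.h>

/* Policy names follow the roadmap; everything else is hypothetical. */
enum after_sb_2pri {
    SB_DISCONNECT,
    SB_ASF_OLDER,
    SB_ASF_YOUNGER,
    SB_ASF_FURTHEST,
    SB_ASF_NODENAME
};

struct sb_node {
    const char *name;
    unsigned long primary_since;  /* assumed: time the node became primary */
    unsigned long modifications;  /* assumed: changes made since the split */
};

/* Return 0 or 1 (index of the sync source), or -1 when no automatic
 * resync should happen and the nodes should disconnect. */
static int pick_sync_source(enum after_sb_2pri policy,
                            const struct sb_node n[2],
                            const char *wanted_name)
{
    switch (policy) {
    case SB_ASF_OLDER:
        if (n[0].primary_since == n[1].primary_since)
            return -1;
        return n[0].primary_since < n[1].primary_since ? 0 : 1;
    case SB_ASF_YOUNGER:
        if (n[0].primary_since == n[1].primary_since)
            return -1;
        return n[0].primary_since > n[1].primary_since ? 0 : 1;
    case SB_ASF_FURTHEST:
        if (n[0].modifications == n[1].modifications)
            return -1;
        return n[0].modifications > n[1].modifications ? 0 : 1;
    case SB_ASF_NODENAME:
        if (wanted_name && strcmp(n[0].name, wanted_name) == 0)
            return 0;
        if (wanted_name && strcmp(n[1].name, wanted_name) == 0)
            return 1;
        return -1;
    case SB_DISCONNECT:
    default:
        return -1;
    }
}
```

  Per note 3, the node that is not picked would then adopt the GC of
  the picked one.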

5 It is possible that a secondary node crashes a primary by 
  returning invalid block_ids in ACK packets. [This might be 
  either caused by faulty hardware, or by a hostile modification
  of DRBD on the secondary node]

  Proposed solution:

  Extend the block_id field (currently 64 bits) by at least
  32 bits (64?), giving a new block_id_chk field. The primary node
  stores an encrypted (random key, changes every 15 minutes...)
  checksum (=signature) in the second field.

  The secondary node cannot fake (either intentionally or
  unintentionally) this signature.

  The primary node will only dereference the block_id pointers
  if the signature is right.
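
  As a rough illustration of the block_id_chk idea, assuming a 64-bit
  check field: the mixing function below is a stand-in and is NOT a
  cryptographic primitive, and all function names are hypothetical.
  A real implementation would need a proper keyed hash or cipher.

```c
#include <stdint.h>

/* Stand-in 64-bit mixer (not cryptographically secure). It is a
 * bijection, so distinct inputs always give distinct outputs. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Primary side: compute the value stored in block_id_chk. */
static uint64_t sign_block_id(uint64_t block_id, uint64_t key)
{
    return mix64(block_id ^ key);
}

/* Primary side, on ACK: only if this returns non-zero may block_id
 * be treated as a pointer and dereferenced. */
static int ack_is_valid(uint64_t block_id, uint64_t chk, uint64_t key)
{
    return sign_block_id(block_id, key) == chk;
}
```

  The key never leaves the primary, so the secondary can only echo
  back a (block_id, block_id_chk) pair it was actually sent.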

6 Support IO fencing; introduce the "Dead" peer state (o_state)

  New commands:
    drbdadm peer-dead r0
    drbdadm [ considered-dead | die | fence | outdate ] r0 
      ( What do you like best ? Suggestions ? )

  remove option value: on-disconnect=freeze_io

  introduce: 
    peer-state-unknown=freeze_io
    peer-state-unknown=continue_io

  New meta-data flag: "Outdated"

  Let us assume that we have two boxes (N1 and N2) and that these
  two boxes are connected by two networks (net and cnet [ clients' net ]).

  Net is used by DRBD, while heartbeat uses both net and cnet.

  I know that you are talking about fencing by STONITH, but DRBD is
  not limited to that. Here comes my understanding of how fencing
  (other than STONITH) should work with DRBD-0.8 :

   N1  net   N2
   P/S ---  S/P     everything up and running.
   P/? - -  S/?     network breaks ; N1 freezes IO
   P/? - -  S/?     N1 fences N2:
                    In the STONITH case: turn off N2.
                    In the "smart" case: 
                    N1 asks N2 to fence itself from the storage via cnet.
                    HB calls "drbdadm fence r0" on N2.
                    N2 replies to N1 that fencing is done via cnet.
                    N1 calls "drbdadm peer-dead r0".
   P/D - -  S/?     N1 thaws IO

  N2 got the "Outdated" flag set in its meta-data by the "fence"
  command. I am not sure if it should be called "fence"; other ideas:
  "considered-dead","die","fence","outdate". What do you think ?
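
  The "smart" fencing sequence above can be modelled as a small state
  sketch. The type and function names are made up for illustration;
  only the peer-state-unknown values, the "Dead" peer state, the
  "Outdated" flag and the two commands come from the text.

```c
enum peer_state { PEER_PRIMARY, PEER_SECONDARY, PEER_UNKNOWN, PEER_DEAD };
enum psu_policy { PSU_FREEZE_IO, PSU_CONTINUE_IO };

struct node {
    enum peer_state peer;               /* what this node knows about its peer */
    enum psu_policy peer_state_unknown; /* the proposed config option */
    int io_frozen;
    int outdated;                       /* the proposed "Outdated" meta-data flag */
};

/* Network breaks: the peer state becomes unknown ("?"), and IO is
 * frozen if peer-state-unknown=freeze_io. */
static void link_lost(struct node *n)
{
    n->peer = PEER_UNKNOWN;
    n->io_frozen = (n->peer_state_unknown == PSU_FREEZE_IO);
}

/* "drbdadm fence r0" on the node being fenced: mark its data outdated. */
static void cmd_fence(struct node *n)
{
    n->outdated = 1;
}

/* "drbdadm peer-dead r0" on the surviving node: peer goes to Dead and
 * IO is thawed. Returns -1 if the peer state was not unknown. */
static int cmd_peer_dead(struct node *n)
{
    if (n->peer != PEER_UNKNOWN)
        return -1;
    n->peer = PEER_DEAD;
    n->io_frozen = 0;
    return 0;
}
```

  In the table's terms: link_lost() on N1 gives P/? with frozen IO,
  cmd_fence() runs on N2, and cmd_peer_dead() on N1 gives P/D.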


* Re: [Drbd-dev] roadmap draft
  2004-09-07 13:49 [Drbd-dev] roadmap draft Philipp Reisner
@ 2004-09-09 10:05 ` Lars Marowsky-Bree
  2004-09-10  9:11   ` Philipp Reisner
  0 siblings, 1 reply; 3+ messages in thread
From: Lars Marowsky-Bree @ 2004-09-09 10:05 UTC (permalink / raw)
  To: drbd-dev

On 2004-09-07T15:49:15,
   Philipp Reisner <philipp.reisner@linbit.com> said:

> DRBD 0.8 Roadmap
> ----------------
> 
> 1 Drop support for linux-2.4.x. 
>   Do all size calculations in units of sectors (512 bytes), as is
>   common in Linux-2.6.x.
>   (Currently they are done on a 1k base, for 2.4.x compatibility)

For all I care, 2.4 can die die die any second... ;-)

> 4 Two new config options, to allow a more fine-grained definition of
>   DRBD's behaviour after a split-brain situation:
> 
>   after-sb-2pri = 
>    disconnect     No automatic resynchronisation gets performed. One
>                   node should drop its net-conf (preferably the
>                   node that would become sync-target)
>                   DEFAULT.
>    asf-older      Auto sync from the older primary (curr. behaviour i.t.s.)
>    asf-younger    Auto sync from the younger primary
>    asf-furthest   Auto sync from the node that did more modifications
>    asf-NODENAME   Auto sync from the named node

With the 'preferably the node which would become sync-target'
constraint, you would also need to allow specifying one of the other
methods (how else would you determine which node would become sync-target?).

>   pri-sees-sec-with-higher-gc =
>    disconnect     (current behaviour)
>    asf-primary    Auto sync from the current primary
>    panic          The current primary panics. The node with the
>                   higher gc should take over.
>   
>   
>   Notes:
>   1) The disconnect actions cause the sync-target or the secondary
>      node to go into StandAlone state.

>   2) If two nodes in primary state try to connect one of them goes
>      into StandAlone state (=curr. behaviour)

1+2 - I'd rather prefer a symmetric scenario where both nodes go to
StandAlone.

>   3) As soon as the decision is taken, the sync-target adopts the
>      GC of the sync source.
>      [ The whole algorithm would also work if both would reset their 
>        GCs to <0,0,0...> after the decision, but since we also
>        use the GC to tag the bitmap it is better the current way ]
> 
> 5 It is possible that a secondary node crashes a primary by 
>   returning invalid block_ids in ACK packets. [This might be 
>   either caused by faulty hardware, or by a hostile modification
>   of DRBD on the secondary node]
> 
>   Proposed solution:

I'd just keep a map of outstanding ACKs and compare any ACK received
against that list. Wouldn't that solve this?

> 6 Support IO fencing; introduce the "Dead" peer state (o_state)

Dead peer state + Outdated flag seems good, but regarding the 'fence', I
defer this to the other thread ;-)


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering	   \\\  /// 
SUSE Labs, Research and Development \honk/ 
SUSE LINUX AG - A Novell company     \\// 



* Re: [Drbd-dev] roadmap draft
  2004-09-09 10:05 ` Lars Marowsky-Bree
@ 2004-09-10  9:11   ` Philipp Reisner
  0 siblings, 0 replies; 3+ messages in thread
From: Philipp Reisner @ 2004-09-10  9:11 UTC (permalink / raw)
  To: drbd-dev

[-- Attachment #1: Type: text/plain, Size: 1707 bytes --]

[...]
> > 4 Two new config options, to allow a more fine-grained definition of
> >   DRBD's behaviour after a split-brain situation:
> >
> >   after-sb-2pri =
> >    disconnect     No automatic resynchronisation gets performed. One
> >                   node should drop its net-conf (preferably the
> >                   node that would become sync-target)
> >                   DEFAULT.
> >    asf-older      Auto sync from the older primary (curr. behaviour i.t.s.)
> >    asf-younger    Auto sync from the younger primary
> >    asf-furthest   Auto sync from the node that did more modifications
> >    asf-NODENAME   Auto sync from the named node
>
> With the 'preferably the node which would become sync-target'
> constraint, you would also need to allow specifying one of the other
> methods (how else would you determine which node would become sync-target?).
>

Hmm, I just do not get it...

[...]
> >   Notes:
> >   1) The disconnect actions cause the sync-target or the secondary
> >      node to go into StandAlone state.
> >
> >   2) If two nodes in primary state try to connect one of them goes
> >      into StandAlone state (=curr. behaviour)
>
> 1+2 - I'd rather prefer a symmetric scenario where both nodes go to
> StandAlone.
>

Ok. I have put it in.

[...]
> I'd just keep a map of outstanding ACKs and compare any ACK received
> against that list. Wouldn't that solve this?
>

Ok, thought about it again, had a look at the code, changed my mind.
-> Hash.

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :

[-- Attachment #2: roadmap.txt --]
[-- Type: text/plain, Size: 5156 bytes --]

DRBD 0.8 Roadmap
----------------

1 Drop support for linux-2.4.x. 
  Do all size calculations in units of sectors (512 bytes), as is
  common in Linux-2.6.x.
  (Currently they are done on a 1k base, for 2.4.x compatibility)

2 Drop the Drbd_Parameter_Packet.
  Replace the Drbd_Parameter_Packet by a more general and 
  extensible mechanism.

3 Changes of state and cstate synchronized by mutex and only done by
  the worker thread.

4 Two new config options, to allow a more fine-grained definition of
  DRBD's behaviour after a split-brain situation:

  after-sb-2pri = 
   disconnect     No automatic resynchronisation gets performed. One
                  node should drop its net-conf (preferably the
                  node that would become sync-target)
                  DEFAULT.
   asf-older      Auto sync from the older primary (curr. behaviour i.t.s.)
   asf-younger    Auto sync from the younger primary
   asf-furthest   Auto sync from the node that did more modifications
   asf-NODENAME   Auto sync from the named node

  pri-sees-sec-with-higher-gc =
   disconnect     (current behaviour)
   asf-primary    Auto sync from the current primary
   panic          The current primary panics. The node with the
                  higher gc should take over.

  Notes:
  1) The disconnect actions cause the sync-target or the secondary
     node (better: both nodes) to go into StandAlone state.
  2) If two nodes in primary state try to connect, one of them
     (better: both) goes into StandAlone state (=curr. behaviour)
  3) As soon as the decision is taken, the sync-target adopts the
     GC of the sync source.
     [ The whole algorithm would also work if both would reset their 
       GCs to <0,0,0...> after the decision, but since we also
       use the GC to tag the bitmap it is better the current way ]

5 It is possible that a secondary node crashes a primary by 
  returning invalid block_ids in ACK packets. [This might be 
  either caused by faulty hardware, or by a hostile modification
  of DRBD on the secondary node]

  Proposed solution:

  Have a hash table (hlist_head style), add the collision
  member (hlist_node) to drbd_request. 

  Use the pointer to the drbd_request as key to the hash, each
  drbd_request is also put into this hash table. We still use the 
  pointer as block_id. 

  When we get an ACK packet, we look up the block_id in the hash
  table and may find the drbd_request there. Otherwise it
  was a forged ACK.
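
  A userspace sketch of this lookup, under stated assumptions: the
  kernel's hlist_head/hlist_node are replaced by a plain singly linked
  chain so the example is self-contained, and struct req, the table
  size and the hash function are all hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

#define REQ_HASH_BITS 6
#define REQ_HASH_SIZE (1u << REQ_HASH_BITS)

struct req {                 /* stand-in for drbd_request */
    struct req *hash_next;   /* stand-in for the hlist_node collision member */
    int payload;
};

static struct req *req_hash[REQ_HASH_SIZE];

/* The pointer value itself is still used as the block_id on the wire. */
static uint64_t req_to_block_id(const struct req *r)
{
    return (uint64_t)(uintptr_t)r;
}

/* Drop the low (alignment) bits, fold, and mask to the table size. */
static unsigned int req_hash_fn(uint64_t block_id)
{
    return (unsigned int)((block_id >> 4) ^ (block_id >> 12))
           & (REQ_HASH_SIZE - 1);
}

/* Called when a request is sent to the peer. */
static void req_hash_add(struct req *r)
{
    unsigned int slot = req_hash_fn(req_to_block_id(r));

    r->hash_next = req_hash[slot];
    req_hash[slot] = r;
}

/* Called when a request completes. */
static void req_hash_del(struct req *r)
{
    struct req **pp = &req_hash[req_hash_fn(req_to_block_id(r))];

    while (*pp && *pp != r)
        pp = &(*pp)->hash_next;
    if (*pp)
        *pp = r->hash_next;
}

/* Called for every ACK. The block_id is only compared, never
 * dereferenced; NULL means the ACK was forged or stale. */
static struct req *req_hash_find(uint64_t block_id)
{
    struct req *r;

    for (r = req_hash[req_hash_fn(block_id)]; r; r = r->hash_next)
        if (req_to_block_id(r) == block_id)
            return r;
    return NULL;
}
```

  The point of the design is that a bad block_id costs one failed
  chain walk instead of a wild pointer dereference on the primary.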

6 Support IO fencing; introduce the "Dead" peer state (o_state)

  New commands:
    drbdadm peer-dead r0
    drbdadm [ considered-dead | die | fence | outdate ] r0 
      ( What do you like best ? Suggestions ? )

  remove option value: on-disconnect=freeze_io

  introduce: 
    peer-state-unknown=freeze_io
    peer-state-unknown=continue_io

  New meta-data flag: "Outdated"

  Let us assume that we have two boxes (N1 and N2) and that these
  two boxes are connected by two networks (net and cnet [ clients' net ]).

  Net is used by DRBD, while heartbeat uses both net and cnet.

  I know that you are talking about fencing by STONITH, but DRBD is
  not limited to that. Here comes my understanding of how fencing
  (other than STONITH) should work with DRBD-0.8 :

   N1  net   N2
   P/S ---  S/P     everything up and running.
   P/? - -  S/?     network breaks ; N1 freezes IO
   P/? - -  S/?     N1 fences N2:
                    In the STONITH case: turn off N2.
                    In the "smart" case: 
                    N1 asks N2 to fence itself from the storage via cnet.
                    HB calls "drbdadm fence r0" on N2.
                    N2 replies to N1 that fencing is done via cnet.
                    N1 calls "drbdadm peer-dead r0".
   P/D - -  S/?     N1 thaws IO

  N2 got the "Outdated" flag set in its meta-data by the "fence"
  command. I am not sure if it should be called "fence"; other ideas:
  "considered-dead","die","fence","outdate". What do you think ?

7 New command drbdmeta

  We move read_gc.pl/write_gc.pl to the user directory and
  merge them into one C program: drbdmeta
   -> in the future the module never creates the meta data
      block. One can use drbdmeta to create, read and
      modify the meta data block. drbdmeta refuses to write
      to it as long as the module is loaded (configured).

  drbdsetup gets the ability to read the gc values while DRBD
  is set up via an ioctl() call. -- drbdmeta refuses to run
  if DRBD is configured. 

  drbdadm is the nice frontend. It always uses the right
  backend (drbdmeta or drbdsetup)...

  drbdadm md-set-gc 1:2:3:4:5:6 r0
  drbdadm md-get-gc r0
  drbdadm md-get/set-{la-size|consistent|etc...} resources....
  drbdadm md-create r0
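
  For the md-set-gc argument, a parser for the colon-separated form
  might look like the sketch below. The field count of six is taken
  from the "1:2:3:4:5:6" example only, and parse_gc() is a
  hypothetical name, not part of any existing tool.

```c
#include <stdio.h>

#define GC_FIELDS 6  /* assumed from the six-field example above */

/* Parse "a:b:c:d:e:f" into gc[]. Returns 0 on success, -1 if there
 * are too few fields or anything follows the sixth field. */
static int parse_gc(const char *s, unsigned int gc[GC_FIELDS])
{
    char trailing;
    int n = sscanf(s, "%u:%u:%u:%u:%u:%u%c",
                   &gc[0], &gc[1], &gc[2], &gc[3], &gc[4], &gc[5],
                   &trailing);

    /* Exactly six conversions means six numbers and no extra input. */
    return n == GC_FIELDS ? 0 : -1;
}
```

  A stricter tool would likely also range-check each counter; that is
  left out here.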

plus-branches:
--------------

* Implement the checksum based resync. 

* 3 node support. Do and test a 3 node setup (2nd DRBD stacked over
  a DRBD pair). Enhance the user level tools to support the 3 node
  setup.

* Change the bitmap code to work with unmapped highmem pages, instead
  of using vmalloc()ed memory. This allows users of 32bit platforms
  to use drbd on big devices (in the ~3TB range)
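
  To get a feeling for the sizes involved, here is a back-of-envelope
  helper. The granularity of one bitmap bit per 4 KiB of storage is an
  assumption made for this example, not something stated above, and
  bitmap_bytes() is a hypothetical name.

```c
#include <stdint.h>

/* Bytes of bitmap needed to track device_bytes of storage when each
 * bit covers bytes_per_bit bytes. Both divisions round up. */
static uint64_t bitmap_bytes(uint64_t device_bytes, uint64_t bytes_per_bit)
{
    uint64_t bits = (device_bytes + bytes_per_bit - 1) / bytes_per_bit;

    return (bits + 7) / 8;
}
```

  Under that assumption, bitmap_bytes(3ULL << 40, 4096) comes to
  96 MiB for a 3 TiB device, which is a large single allocation to
  take from the limited vmalloc address space of a 32-bit kernel.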
