From: NeilBrown <neilb@suse.de>
To: Francis Moreau <francis.moro@gmail.com>
Cc: Martin Wilck <mwilck@arcor.de>, linux-raid@vger.kernel.org
Subject: Re: mdadm 3.3: issue with mdmon --takeover
Date: Thu, 5 Sep 2013 12:11:23 +1000
Message-ID: <20130905121123.27968f9f@notabene.brown>
In-Reply-To: <CAC9WiBiHkOjqY4nDwMh4S8td9NMz2mrdqK0R7cZp7nKSMyhBvg@mail.gmail.com>

On Wed, 4 Sep 2013 09:36:27 +0200 Francis Moreau <francis.moro@gmail.com>
wrote:

> Hi Neil,
> 
> On Wed, Sep 4, 2013 at 8:08 AM, NeilBrown <neilb@suse.de> wrote:
> > On Tue, 3 Sep 2013 17:54:55 +0200 Francis Moreau <francis.moro@gmail.com>
> > wrote:
> >
> >> Hello Martin :)
> >>
> >> I gave 3.3 release a try and I have a first issue: basically starting
> >> mdmon (3.3) with --takeover twice make mdmon failing on the second
> >> run.
> >>
> >> Please find details below:
> >>
> >> # cat /proc/mdstat
> >> Personalities : [raid1]
> >> md126 : active raid1 sdb[1] sda[0]
> >>       2064384 blocks super external:/md127/0 [2/2] [UU]
> >>
> >> md127 : inactive sdb[1](S) sda[0](S)
> >>       65536 blocks super external:ddf
> >>
> >> # ps aux | grep dmon
> >> root       311  0.4  1.0  80580 10944 ?        SLsl 17:46   0:00
> >> @sbin/mdmon --takeover md127
> >>
> >> # ./mdmon --takeover --all
> >>
> >> # ps aux | grep dmon
> >> root      3182  1.3  1.0  15156 11056 ?        SLsl 17:50   0:00
> >> ./mdmon --takeover md127
> >>
> >> # ./mdmon --takeover --all
> >> ...
> >> monitor: wake ( )
> >> monitor: wake ( )
> >> monitor: wake ( )
> >> monitor: wake ( )
> >> monitor: wake ( )
> >> monitor: wake ( 12:array_state )
> >> read_and_act(0): 1378223477.512347 state:clean prev:clean action:idle
> >> prev: idle start:18446744073709551615
> >> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) clean 18446744073709551615
> >> manage_new: inst: 0 action: 11 state: 12
> >> mdmon: ddf_open_new: subarray 0 doesn't exist
> >> mdmon: failed to monitor external:/md127/0
> >> free_aa: sys_name: md126
> >> read_and_act(0): state:clean action:idle next( )
> >> manage_new: inst: 0 action: 20 state: 21
> >> ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n
> >> free_aa: sys_name: md126
> >> caught sigterm, all clean... exiting
> >> monitor: wake ( )
> >> no arrays to monitor... exiting
> >>
> >> # ps aux | grep dmon
> >> #
> >>
> >> Thanks
> >
> > I can't easily reproduce this.
> 
> This is weird, it's 100% reproductible here.
> 
> >
> > Can you run "mdmon --takeover" in one window, then the next "mdmon
> > --takeover" is a different window so we can clearly see which messages are
> > coming from the mdmon which is exiting and which are coming from the mdmon
> > which is starting.
> 
> 
> Sure.
> 
> A note that I should have probably tell previously: before I'm
> starting manually the first mdmon process, an old mdmon process is
> running which was started by the system at boot and this mdmon is
> 3.2.6.
> 
> ###
> ### window 1: starting manually the first mdmon --takeover process ####
> ###
> 
> # ps aux | grep dmon
> root       312  0.5  1.0  80580 10944 ?        SLsl 09:24   0:00
> @sbin/mdmon --takeover md127
> 
> ## Note: this mdmon process was started at system boot and is 3.2.6
> 
> # ./mdmon --takeover --all
> ...
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> manage_new: inst: 0 action: 11 state: 12
> ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n
> monitor: caught signal
> read_and_act(0): 1378279619.393600 state:clean prev:inactive
> action:idle prev: idle start:18446744073709551615
> pr_state/ddf_set_array_state: 0(s=10 i=02)
> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) dirty 18446744073709551615
> pr_state/ddf_set_array_state: 0(s=00 i=02)
> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) clean 18446744073709551615
> pr_state/__write_init_super_ddf: 0(s=00 i=02)
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> ddf: sync_metadata
> read_and_act(0): state:clean action:idle next( )
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279621.980656 state:write-pending prev:clean
> action:idle prev: idle start:18446744073709551615
> pr_state/ddf_set_array_state: 0(s=10 i=02)
> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (7) dirty 18446744073709551615
> pr_state/__write_init_super_ddf: 0(s=10 i=02)
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> ddf: sync_metadata
> read_and_act(0): state:write-pending action:idle next( state:active )
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279622.381087 state:active prev:write-pending
> action:idle prev: idle start:18446744073709551615
> read_and_act(0): state:active action:idle next( )
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279626.520845 state:active-idle prev:active
> action:idle prev: idle start:18446744073709551615
> read_and_act(0): state:active-idle action:idle next( state:clean )
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279626.524532 state:clean prev:active-idle
> action:idle prev: idle start:18446744073709551615
> pr_state/ddf_set_array_state: 0(s=00 i=02)
> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) clean 18446744073709551615
> pr_state/__write_init_super_ddf: 0(s=00 i=02)
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> ddf: sync_metadata
> read_and_act(0): state:clean action:idle next( )
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279626.981157 state:write-pending prev:clean
> action:idle prev: idle start:18446744073709551615
> pr_state/ddf_set_array_state: 0(s=10 i=02)
> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (7) dirty 18446744073709551615
> pr_state/__write_init_super_ddf: 0(s=10 i=02)
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> ddf: sync_metadata
> read_and_act(0): state:write-pending action:idle next( state:active )
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279627.376402 state:active prev:write-pending
> action:idle prev: idle start:18446744073709551615
> read_and_act(0): state:active action:idle next( )
> 
> [launching new mdmon --takeover....]
> 
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279678.858186 state:clean prev:clean action:idle
> prev: idle start:18446744073709551615
> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) clean 18446744073709551615
> read_and_act(0): state:clean action:idle next( )
> manage_new: inst: 0 action: 20 state: 21
> ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n
> free_aa: sys_name: md126
> caught sigterm, all clean... exiting
> 
> ###
> ### window 2: starting the 2nd mdmon process ###
> ###
> 
> #./mdmon --takeover --all
> ...
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> manage_new: inst: 0 action: 11 state: 12
> mdmon: ddf_open_new: subarray 0 doesn't exist
> mdmon: failed to monitor external:/md127/0
> free_aa: sys_name: md126
> monitor: wake ( )
> no arrays to monitor... exiting
> 

The line

> mdmon: ddf_open_new: subarray 0 doesn't exist

is the problem.  mdmon read the metadata from the array but didn't find
subarray '0' in there even though the previous mdmon clearly did:

> ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n

This suggests that even though it succeeded in reading the metadata (it would
have printed
    Cannot load metadata for md127
and exited if it had failed), the metadata is somehow inconsistent.

Could you try running each mdmon under strace:
  strace -f -o /tmp/str-1 ./mdmon --takeover --all

and attach the two /tmp/str-? files?
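
The two traced runs could be scripted roughly as follows. This is only a
sketch: trace_run is a hypothetical helper name, and the /tmp/str-N file
naming simply follows the command above.

```shell
# Hypothetical helper: run a command under strace, following child
# processes (-f) and writing the trace to /tmp/str-<N> (-o).
trace_run() {
    n=$1
    shift
    strace -f -o "/tmp/str-$n" "$@"
}

# Intended use, one call per window:
#   trace_run 1 ./mdmon --takeover --all
#   trace_run 2 ./mdmon --takeover --all
```

Keeping one output file per run makes it clear which trace belongs to the
mdmon that exits and which to the one that is starting.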

Also what is the difference between
  mdadm --examine /dev/sda
and
  mdadm --examine /dev/sdb
?
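
One possible way to produce that comparison is sketched below. The helper
name examine_diff is hypothetical; it only wraps the two mdadm --examine
commands above and diff, and assumes it is run as root.

```shell
# Hypothetical helper: save the --examine output of two member devices
# and print a unified diff of the two reports.
examine_diff() {
    a=$(mktemp) || return 2
    b=$(mktemp) || return 2
    mdadm --examine "$1" > "$a" 2>&1
    mdadm --examine "$2" > "$b" 2>&1
    diff -u "$a" "$b"
    rc=$?            # 0 = identical, 1 = reports differ
    rm -f "$a" "$b"
    return "$rc"
}

# Intended use:
#   examine_diff /dev/sda /dev/sdb
```

Some per-device fields are expected to differ between the two disks, so the
diff is a starting point for inspection rather than a verdict.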

Thanks,
NeilBrown


