From: NeilBrown <neilb@suse.de>
To: Francis Moreau <francis.moro@gmail.com>
Cc: Martin Wilck <mwilck@arcor.de>, linux-raid@vger.kernel.org
Subject: Re: mdadm 3.3: issue with mdmon --takeover
Date: Thu, 5 Sep 2013 12:11:23 +1000
Message-ID: <20130905121123.27968f9f@notabene.brown>
In-Reply-To: <CAC9WiBiHkOjqY4nDwMh4S8td9NMz2mrdqK0R7cZp7nKSMyhBvg@mail.gmail.com>
On Wed, 4 Sep 2013 09:36:27 +0200 Francis Moreau <francis.moro@gmail.com>
wrote:
> Hi Neil,
>
> On Wed, Sep 4, 2013 at 8:08 AM, NeilBrown <neilb@suse.de> wrote:
> > On Tue, 3 Sep 2013 17:54:55 +0200 Francis Moreau <francis.moro@gmail.com>
> > wrote:
> >
> >> Hello Martin :)
> >>
> >> I gave 3.3 release a try and I have a first issue: basically starting
> >> mdmon (3.3) with --takeover twice make mdmon failing on the second
> >> run.
> >>
> >> Please find details below:
> >>
> >> # cat /proc/mdstat
> >> Personalities : [raid1]
> >> md126 : active raid1 sdb[1] sda[0]
> >> 2064384 blocks super external:/md127/0 [2/2] [UU]
> >>
> >> md127 : inactive sdb[1](S) sda[0](S)
> >> 65536 blocks super external:ddf
> >>
> >> # ps aux | grep dmon
> >> root 311 0.4 1.0 80580 10944 ? SLsl 17:46 0:00
> >> @sbin/mdmon --takeover md127
> >>
> >> # ./mdmon --takeover --all
> >>
> >> # ps aux | grep dmon
> >> root 3182 1.3 1.0 15156 11056 ? SLsl 17:50 0:00
> >> ./mdmon --takeover md127
> >>
> >> # ./mdmon --takeover --all
> >> ...
> >> monitor: wake ( )
> >> monitor: wake ( )
> >> monitor: wake ( )
> >> monitor: wake ( )
> >> monitor: wake ( )
> >> monitor: wake ( 12:array_state )
> >> read_and_act(0): 1378223477.512347 state:clean prev:clean action:idle
> >> prev: idle start:18446744073709551615
> >> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) clean 18446744073709551615
> >> manage_new: inst: 0 action: 11 state: 12
> >> mdmon: ddf_open_new: subarray 0 doesn't exist
> >> mdmon: failed to monitor external:/md127/0
> >> free_aa: sys_name: md126
> >> read_and_act(0): state:clean action:idle next( )
> >> manage_new: inst: 0 action: 20 state: 21
> >> ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n
> >> free_aa: sys_name: md126
> >> caught sigterm, all clean... exiting
> >> monitor: wake ( )
> >> no arrays to monitor... exiting
> >>
> >> # ps aux | grep dmon
> >> #
> >>
> >> Thanks
> >
> > I can't easily reproduce this.
>
> This is weird, it's 100% reproductible here.
>
> >
> > Can you run "mdmon --takeover" in one window, then the next "mdmon
> > --takeover" is a different window so we can clearly see which messages are
> > coming from the mdmon which is exiting and which are coming from the mdmon
> > which is starting.
>
>
> Sure.
>
> A note that I should have probably tell previously: before I'm
> starting manually the first mdmon process, an old mdmon process is
> running which was started by the system at boot and this mdmon is
> 3.2.6.
>
> ###
> ### window 1: starting manually the first mdmon --takeover process ####
> ###
>
> # ps aux | grep dmon
> root 312 0.5 1.0 80580 10944 ? SLsl 09:24 0:00
> @sbin/mdmon --takeover md127
>
> ## Note: this mdmon process was started at system boot and is 3.2.6
>
> # ./mdmon --takeover --all
> ...
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> manage_new: inst: 0 action: 11 state: 12
> ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n
> monitor: caught signal
> read_and_act(0): 1378279619.393600 state:clean prev:inactive
> action:idle prev: idle start:18446744073709551615
> pr_state/ddf_set_array_state: 0(s=10 i=02)
> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) dirty 18446744073709551615
> pr_state/ddf_set_array_state: 0(s=00 i=02)
> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) clean 18446744073709551615
> pr_state/__write_init_super_ddf: 0(s=00 i=02)
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> ddf: sync_metadata
> read_and_act(0): state:clean action:idle next( )
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279621.980656 state:write-pending prev:clean
> action:idle prev: idle start:18446744073709551615
> pr_state/ddf_set_array_state: 0(s=10 i=02)
> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (7) dirty 18446744073709551615
> pr_state/__write_init_super_ddf: 0(s=10 i=02)
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> ddf: sync_metadata
> read_and_act(0): state:write-pending action:idle next( state:active )
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279622.381087 state:active prev:write-pending
> action:idle prev: idle start:18446744073709551615
> read_and_act(0): state:active action:idle next( )
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279626.520845 state:active-idle prev:active
> action:idle prev: idle start:18446744073709551615
> read_and_act(0): state:active-idle action:idle next( state:clean )
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279626.524532 state:clean prev:active-idle
> action:idle prev: idle start:18446744073709551615
> pr_state/ddf_set_array_state: 0(s=00 i=02)
> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) clean 18446744073709551615
> pr_state/__write_init_super_ddf: 0(s=00 i=02)
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> ddf: sync_metadata
> read_and_act(0): state:clean action:idle next( )
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279626.981157 state:write-pending prev:clean
> action:idle prev: idle start:18446744073709551615
> pr_state/ddf_set_array_state: 0(s=10 i=02)
> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (7) dirty 18446744073709551615
> pr_state/__write_init_super_ddf: 0(s=10 i=02)
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk b342fbdc for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> writing conf record 0 on disk 2cf00056 for
> Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
> ddf: sync_metadata
> read_and_act(0): state:write-pending action:idle next( state:active )
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279627.376402 state:active prev:write-pending
> action:idle prev: idle start:18446744073709551615
> read_and_act(0): state:active action:idle next( )
>
> [launching new mdmon --takeover....]
>
> monitor: wake ( 12:array_state )
> read_and_act(0): 1378279678.858186 state:clean prev:clean action:idle
> prev: idle start:18446744073709551615
> ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) clean 18446744073709551615
> read_and_act(0): state:clean action:idle next( )
> manage_new: inst: 0 action: 20 state: 21
> ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n
> free_aa: sys_name: md126
> caught sigterm, all clean... exiting
>
> ###
> ### window 2: starting the 2nd mdmon process ###
> ###
>
> #./mdmon --takeover --all
> ...
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> monitor: wake ( )
> manage_new: inst: 0 action: 11 state: 12
> mdmon: ddf_open_new: subarray 0 doesn't exist
> mdmon: failed to monitor external:/md127/0
> free_aa: sys_name: md126
> monitor: wake ( )
> no arrays to monitor... exiting
>
The line
> mdmon: ddf_open_new: subarray 0 doesn't exist
is the problem. mdmon read the metadata from the array but didn't find
subarray '0' in there even though the previous mdmon clearly did:
> ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n
This suggests that although it succeeded in reading the metadata (had it
failed, it would have printed
    Cannot load metadata for md127
and exited), the metadata is somehow inconsistent.
Could you try running each mdmon under strace:
    strace -f -o /tmp/str-1 ./mdmon --takeover --all
(using /tmp/str-2 for the second run) and attach the two /tmp/str-? files?
Also, what is the difference between
    mdadm --examine /dev/sda
and
    mdadm --examine /dev/sdb
?
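One possible way to capture that comparison is sketched below. It is only an illustration: `examine_diff` is a hypothetical helper name, not something mdadm ships, and the `MDADM` variable is an assumed override so that a locally built binary can be used in place of the installed one. It assumes the two member devices are still /dev/sda and /dev/sdb and that it is run as root.

```shell
# Sketch only: examine_diff is a hypothetical helper, not part of mdadm.
# MDADM is an assumed override for the binary to run (defaults to "mdadm"),
# e.g. MDADM=./mdadm to use a locally built binary.
examine_diff() {
    ed_a=$(mktemp) || return 2
    ed_b=$(mktemp) || { rm -f "$ed_a"; return 2; }
    # Capture stderr too, so a read failure on one disk shows up in the diff.
    # "|| :" keeps going if --examine itself fails; the diff will show why.
    "${MDADM:-mdadm}" --examine "$1" > "$ed_a" 2>&1 || :
    "${MDADM:-mdadm}" --examine "$2" > "$ed_b" 2>&1 || :
    # diff exits 0 when the outputs match and 1 when they differ.
    diff -u "$ed_a" "$ed_b" && ed_rc=0 || ed_rc=$?
    rm -f "$ed_a" "$ed_b"
    return "$ed_rc"
}

# Usage (as root): examine_diff /dev/sda /dev/sdb
```

Some differences between the two members are to be expected, since part of the output describes the individual disk. The lines describing the subarray are the ones of interest here.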
Thanks,
NeilBrown