* mdadm 3.3: issue with mdmon --takeover
@ 2013-09-03 15:54 Francis Moreau
2013-09-04 6:08 ` NeilBrown
0 siblings, 1 reply; 13+ messages in thread
From: Francis Moreau @ 2013-09-03 15:54 UTC (permalink / raw)
To: Martin Wilck; +Cc: linux-raid
Hello Martin :)
I gave the 3.3 release a try and hit a first issue: starting
mdmon (3.3) with --takeover twice makes mdmon fail on the second
run.
Please find details below:
# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sdb[1] sda[0]
2064384 blocks super external:/md127/0 [2/2] [UU]
md127 : inactive sdb[1](S) sda[0](S)
65536 blocks super external:ddf
# ps aux | grep dmon
root 311 0.4 1.0 80580 10944 ? SLsl 17:46 0:00
@sbin/mdmon --takeover md127
# ./mdmon --takeover --all
# ps aux | grep dmon
root 3182 1.3 1.0 15156 11056 ? SLsl 17:50 0:00
./mdmon --takeover md127
# ./mdmon --takeover --all
...
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( 12:array_state )
read_and_act(0): 1378223477.512347 state:clean prev:clean action:idle
prev: idle start:18446744073709551615
ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) clean 18446744073709551615
manage_new: inst: 0 action: 11 state: 12
mdmon: ddf_open_new: subarray 0 doesn't exist
mdmon: failed to monitor external:/md127/0
free_aa: sys_name: md126
read_and_act(0): state:clean action:idle next( )
manage_new: inst: 0 action: 20 state: 21
ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n
free_aa: sys_name: md126
caught sigterm, all clean... exiting
monitor: wake ( )
no arrays to monitor... exiting
# ps aux | grep dmon
#
Thanks
--
Francis
* Re: mdadm 3.3: issue with mdmon --takeover
2013-09-03 15:54 mdadm 3.3: issue with mdmon --takeover Francis Moreau
@ 2013-09-04 6:08 ` NeilBrown
2013-09-04 7:36 ` Francis Moreau
0 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2013-09-04 6:08 UTC (permalink / raw)
To: Francis Moreau; +Cc: Martin Wilck, linux-raid
On Tue, 3 Sep 2013 17:54:55 +0200 Francis Moreau <francis.moro@gmail.com>
wrote:
> [...]
I can't easily reproduce this.
Can you run "mdmon --takeover" in one window, then the next "mdmon
--takeover" in a different window, so we can clearly see which messages are
coming from the mdmon which is exiting and which are coming from the mdmon
which is starting?
Thanks.
NeilBrown
* Re: mdadm 3.3: issue with mdmon --takeover
2013-09-04 6:08 ` NeilBrown
@ 2013-09-04 7:36 ` Francis Moreau
2013-09-05 2:11 ` NeilBrown
0 siblings, 1 reply; 13+ messages in thread
From: Francis Moreau @ 2013-09-04 7:36 UTC (permalink / raw)
To: NeilBrown; +Cc: Martin Wilck, linux-raid
Hi Neil,
On Wed, Sep 4, 2013 at 8:08 AM, NeilBrown <neilb@suse.de> wrote:
> [...]
>
> I can't easily reproduce this.
This is weird, it's 100% reproducible here.
>
> Can you run "mdmon --takeover" in one window, then the next "mdmon
> --takeover" is a different window so we can clearly see which messages are
> coming from the mdmon which is exiting and which are coming from the mdmon
> which is starting.
Sure.
A note that I should probably have mentioned earlier: before I
manually start the first mdmon process, an old mdmon process is
already running. It was started by the system at boot and is
version 3.2.6.
###
### window 1: starting manually the first mdmon --takeover process ####
###
# ps aux | grep dmon
root 312 0.5 1.0 80580 10944 ? SLsl 09:24 0:00
@sbin/mdmon --takeover md127
## Note: this mdmon process was started at system boot and is 3.2.6
# ./mdmon --takeover --all
...
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
manage_new: inst: 0 action: 11 state: 12
ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n
monitor: caught signal
read_and_act(0): 1378279619.393600 state:clean prev:inactive
action:idle prev: idle start:18446744073709551615
pr_state/ddf_set_array_state: 0(s=10 i=02)
ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) dirty 18446744073709551615
pr_state/ddf_set_array_state: 0(s=00 i=02)
ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) clean 18446744073709551615
pr_state/__write_init_super_ddf: 0(s=00 i=02)
writing conf record 0 on disk b342fbdc for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
writing conf record 0 on disk b342fbdc for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
writing conf record 0 on disk 2cf00056 for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
writing conf record 0 on disk 2cf00056 for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
ddf: sync_metadata
read_and_act(0): state:clean action:idle next( )
monitor: wake ( 12:array_state )
read_and_act(0): 1378279621.980656 state:write-pending prev:clean
action:idle prev: idle start:18446744073709551615
pr_state/ddf_set_array_state: 0(s=10 i=02)
ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (7) dirty 18446744073709551615
pr_state/__write_init_super_ddf: 0(s=10 i=02)
writing conf record 0 on disk b342fbdc for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
writing conf record 0 on disk b342fbdc for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
writing conf record 0 on disk 2cf00056 for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
writing conf record 0 on disk 2cf00056 for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
ddf: sync_metadata
read_and_act(0): state:write-pending action:idle next( state:active )
monitor: wake ( 12:array_state )
read_and_act(0): 1378279622.381087 state:active prev:write-pending
action:idle prev: idle start:18446744073709551615
read_and_act(0): state:active action:idle next( )
monitor: wake ( 12:array_state )
read_and_act(0): 1378279626.520845 state:active-idle prev:active
action:idle prev: idle start:18446744073709551615
read_and_act(0): state:active-idle action:idle next( state:clean )
monitor: wake ( 12:array_state )
read_and_act(0): 1378279626.524532 state:clean prev:active-idle
action:idle prev: idle start:18446744073709551615
pr_state/ddf_set_array_state: 0(s=00 i=02)
ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) clean 18446744073709551615
pr_state/__write_init_super_ddf: 0(s=00 i=02)
writing conf record 0 on disk b342fbdc for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
writing conf record 0 on disk b342fbdc for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
writing conf record 0 on disk 2cf00056 for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
writing conf record 0 on disk 2cf00056 for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
ddf: sync_metadata
read_and_act(0): state:clean action:idle next( )
monitor: wake ( 12:array_state )
read_and_act(0): 1378279626.981157 state:write-pending prev:clean
action:idle prev: idle start:18446744073709551615
pr_state/ddf_set_array_state: 0(s=10 i=02)
ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (7) dirty 18446744073709551615
pr_state/__write_init_super_ddf: 0(s=10 i=02)
writing conf record 0 on disk b342fbdc for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
writing conf record 0 on disk b342fbdc for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
writing conf record 0 on disk 2cf00056 for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
writing conf record 0 on disk 2cf00056 for
Linux-MDdeadbeef00000000?Ob79e0c8b1n/0
ddf: sync_metadata
read_and_act(0): state:write-pending action:idle next( state:active )
monitor: wake ( 12:array_state )
read_and_act(0): 1378279627.376402 state:active prev:write-pending
action:idle prev: idle start:18446744073709551615
read_and_act(0): state:active action:idle next( )
[launching new mdmon --takeover....]
monitor: wake ( 12:array_state )
read_and_act(0): 1378279678.858186 state:clean prev:clean action:idle
prev: idle start:18446744073709551615
ddf mark 0/Linux-MDdeadbeef00000000?Ob79e0c8b1n (5) clean 18446744073709551615
read_and_act(0): state:clean action:idle next( )
manage_new: inst: 0 action: 20 state: 21
ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n
free_aa: sys_name: md126
caught sigterm, all clean... exiting
###
### window 2: starting the 2nd mdmon process ###
###
#./mdmon --takeover --all
...
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
monitor: wake ( )
manage_new: inst: 0 action: 11 state: 12
mdmon: ddf_open_new: subarray 0 doesn't exist
mdmon: failed to monitor external:/md127/0
free_aa: sys_name: md126
monitor: wake ( )
no arrays to monitor... exiting
Thanks
--
Francis
* Re: mdadm 3.3: issue with mdmon --takeover
2013-09-04 7:36 ` Francis Moreau
@ 2013-09-05 2:11 ` NeilBrown
[not found] ` <CAC9WiBiHcS126iFv91250d83sMrBYmRbvoqYAEhjJWjb2p5J3A@mail.gmail.com>
0 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2013-09-05 2:11 UTC (permalink / raw)
To: Francis Moreau; +Cc: Martin Wilck, linux-raid
On Wed, 4 Sep 2013 09:36:27 +0200 Francis Moreau <francis.moro@gmail.com>
wrote:
> [...]
>
The line
> mdmon: ddf_open_new: subarray 0 doesn't exist
is the problem. mdmon read the metadata from the array but didn't find
subarray '0' in there even though the previous mdmon clearly did:
> ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n
This suggests that even though it succeeded in reading the metadata (it would
have printed
Cannot load metadata for md127
and exited if it had failed), the metadata is somehow inconsistent.
Could you try running each mdmon under strace:
strace -f -o /tmp/str-1 ./mdmon --takeover --all
and attach the two /tmp/str-? files?
Also what is the difference between
mdadm --examine /dev/sda
and
mdadm --examine /dev/sdb
??
Thanks,
NeilBrown
* Re: mdadm 3.3: issue with mdmon --takeover
[not found] ` <CAC9WiBiHcS126iFv91250d83sMrBYmRbvoqYAEhjJWjb2p5J3A@mail.gmail.com>
@ 2013-09-05 9:03 ` Francis Moreau
2013-09-10 23:35 ` NeilBrown
0 siblings, 1 reply; 13+ messages in thread
From: Francis Moreau @ 2013-09-05 9:03 UTC (permalink / raw)
To: NeilBrown; +Cc: Martin Wilck, linux-raid
On Thu, Sep 5, 2013 at 9:04 AM, Francis Moreau <francis.moro@gmail.com> wrote:
> Hi Neil,
>
> On Thu, Sep 5, 2013 at 4:11 AM, NeilBrown <neilb@suse.de> wrote:
>> On Wed, 4 Sep 2013 09:36:27 +0200 Francis Moreau <francis.moro@gmail.com>
>> wrote:
>
> [...]
>
>>> no arrays to monitor... exiting
>>>
>>
>> The line
>>
>>> mdmon: ddf_open_new: subarray 0 doesn't exist
>>
>> is the problem. mdmon read the metadata from the array but didn't find
>> subarray '0' in there even though the previous mdmon clearly did:
>>
>>> ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n
>>
>> This suggests that even though it succeeded in reading the metadata (it would
>> have printed
>> Cannot load metadata for md127
>> and exited if it had failed), the metadata is somehow inconsistent.
>>
>> Could you try running each mdmon under strace:
>> strace -f -o /tmp/str-1 ./mdmon --takeover --all
>>
>> and attach the two /tmp/str-? files?
>
> This is weird: if I do that, the first strace process ends up in an
> uninterruptible state at some point:
>
> # ps aux | grep dmon
> root 2297 0.1 0.0 4468 736 tty1 D+ 08:39 0:00
> strace -f -o /tmp/str-1 ./mdmon --takeover --all
> root 2301 0.6 1.0 15156 11056 ? SLsl 08:39 0:00
> ./mdmon --takeover md127
>
> Starting the second straced mdmon gives the same result, and the system
> becomes unusable as soon as it tries to write something to the
> disk/raid, I guess.
>
> Note that /tmp on my system is not a tmpfs filesystem but is part of /
> which is ext4.
>
> I gave it a second shot, but this time I put the strace output
> files on /dev/shm, which is a tmpfs FS. This time I didn't hit the
> issue described above where strace is put in D state. But since no
> mdmon process was running anymore after the second run of mdmon,
> it was hard to retrieve the 2 strace output files.
>
> Anyways I'm attaching the 2 files now.
>
>>
>> Also what is the difference between
>> mdadm --examine /dev/sda
>> and
>> mdadm --examine /dev/sdb
>> ??
>>
>
> After the system finishes booting:
>
> # diff -u sda sdb
> --- sda 2013-09-05 09:00:59.554291764 +0200
> +++ sdb 2013-09-05 09:01:01.634279757 +0200
> @@ -1,4 +1,4 @@
> -/dev/sda:
> +/dev/sdb:
> Magic : de11de11
> Version : 01.02.00
> Controller GUID : 4C696E75:782D4D44:20202020:2020206C:6F63616C:686F7374
> @@ -23,5 +23,5 @@
>
> Physical Disks : 2
> Number RefNo Size Device Type/State
> - 0 2cf00056 2064384K /dev/sda active/Online
> - 1 b342fbdc 2064384K active/Online
> + 0 2cf00056 2064384K active/Online
> + 1 b342fbdc 2064384K /dev/sdb active/Online
>
> After starting the first mdmon process:
>
> # mdadm --examine /dev/sda >sda
> Segmentation fault
>
> It looks like mdadm is running an infinite loop or something before segfaulting.
>
I don't know if that can help but it seems to start failing here:
# strace ./mdadm --examine /dev/sda
...
write(2, "mdmon: Failed to load secondary "..., 55) = 55
lseek(3, 2130706944, SEEK_SET) = 2130706944
read(3, "\336\21\336\21\262@8\360Linux-MD\336\255\276\357\0\0\0\0?O\2672\2045b="...,
512) = 512
lseek(3, 2130707456, SEEK_SET) = 2130707456
read(3, "\255\21\21\21etx\241Linux-MD localhost"..., 65536) = 65536
lseek(3, 2130772992, SEEK_SET) = 2130772992
read(3, "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"...,
16384) = 16384
lseek(3, 2131022336, SEEK_SET) = 2131022336
read(3, "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"...,
512) = 512
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 0), ...}) = 0
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 0), ...}) = 0
ioctl(3, BLKGETSIZE64, 2147483648) = 0
lseek(3, 2130789376, SEEK_SET) = 2130789376
read(3, "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"...,
232960) = 232960
close(3) = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(4, 64), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or
TCGETS, {B9600 opost isig icanon echo ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7fcc26b80000
write(1, "/dev/sda:\n", 10) = 10
write(1, " Magic : de11de11\n", 27) = 27
write(1, " Version : 01.02.00\n", 27) = 27
write(1, "Controller GUID : 4C696E75:782D4"..., 72) = 72
write(1, " (Linux-MD)\n", 29) = 29
write(1, " Container GUID : 4C696E75:782D4"..., 72) = 72
open("/etc/localtime", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7fcc26b7f000
read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\f\0\0\0\f\0\0\0\0"...,
4096) = 2945
lseek(3, -1863, SEEK_CUR) = 1082
read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r\0\0\0\r\0\0\0\0"...,
4096) = 1863
lseek(3, 2944, SEEK_SET) = 2944
close(3) = 0
munmap(0x7fcc26b7f000, 4096) = 0
write(1, " (Linux-MD 08/2"..., 47) = 47
write(1, " Seq : 00000016\n", 27) = 27
write(1, " Redundant hdr : no\n", 21) = 21
write(1, " Virtual Disks : 65535\n", 24) = 24
write(1, "\n", 1) = 1
write(1, " VD GUID[7] : DDDDDDDD:0FDC"..., 73) = 73
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0
write(1, " ( 01/01/80 00:"..., 39) = 39
write(1, " unit[7] : 65535\n", 25) = 25
write(1, " state[7] : -reserved-, M"..., 56) = 56
write(1, " init state[7] : *UNKNOWN*\n", 29) = 29
write(1, " access[7] : Blocked (no a"..., 39) = 39
write(1, " Name[7] :
\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 36) = 36
write(1, "\n", 1) = 1
write(1, " VD GUID[8] : 4C696E75:782D"..., 73) = 73
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0
write(1, " (Linux-MD 08/2"..., 47) = 47
write(1, " unit[8] : 126\n", 23) = 23
write(1, " state[8] : Optimal, Not "..., 43) = 43
write(1, " init state[8] : Fully Initial"..., 37) = 37
write(1, " access[8] : Read/Write\n", 30) = 30
write(1, " Name[8] : array1\n", 26) = 26
write(1, "\n", 1) = 1
--
Francis
* Re: mdadm 3.3: issue with mdmon --takeover
2013-09-05 9:03 ` Francis Moreau
@ 2013-09-10 23:35 ` NeilBrown
2013-09-11 7:40 ` Francis Moreau
2013-09-11 20:51 ` Martin Wilck
0 siblings, 2 replies; 13+ messages in thread
From: NeilBrown @ 2013-09-10 23:35 UTC (permalink / raw)
To: Francis Moreau; +Cc: Martin Wilck, linux-raid
On Thu, 5 Sep 2013 11:03:14 +0200 Francis Moreau <francis.moro@gmail.com>
wrote:
> [...]
>
> I don't know if that can help but it seems to start failing here:
>
> # strace ./mdadm --examine /dev/sda
> ...
> write(2, "mdmon: Failed to load secondary "..., 55) = 55
The problem is actually a bit earlier, but it does relate to the secondary
copy of the metadata.
The first sign of trouble is that str-1 has
2435 lseek(6, 2131022336, SEEK_SET) = 2131022336
2435 read(6, "3333\27.3#Linux-MD20130828\3143\177\"\373\324\32\230"..., 512) = 512
while str-2 has
2452 lseek(6, 2131022336, SEEK_SET) = 2131022336
2452 read(6, "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 512) = 512
From the same offset, very different data is read.
Presumably it was written by the first write of mdmon.
Looking further in str-1 we find:
2436 lseek(7, 18446744073709551104, SEEK_SET) = -1 EINVAL (Invalid argument)
2436 write(7, "\336\21\336\21~.\307}Linux-MD\336\255\276\357\0\0\0\0?O\2672\2045b="..., 512) = 512
That is a big number: "-1 << 9".
mdmon is trying to write the secondary metadata but there isn't any.
So it writes it in the wrong place and makes a mess.
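For illustration, the arithmetic behind that offset can be sketched in C. This is a minimal sketch and not the mdadm code: DDF_LBA_UNUSED, lba_to_byte_offset(), should_write_header() and check_trace_values() are hypothetical names, and 512-byte sectors are assumed, as the "<< 9" above implies.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones LBA: the value the patch in this thread treats as
 * "no secondary header". The macro name is hypothetical. */
#define DDF_LBA_UNUSED (~(uint64_t)0)

/* Hypothetical helper: convert an LBA counted in 512-byte sectors
 * into a byte offset, i.e. the "<< 9" seen in the trace. */
static inline uint64_t lba_to_byte_offset(uint64_t lba)
{
    return lba << 9;
}

/* Hypothetical helper mirroring the guard added by the last hunk:
 * returns 0 when the header location is unused and must be skipped. */
static inline int should_write_header(uint64_t lba)
{
    return lba != DDF_LBA_UNUSED;
}

/* Checks the two offsets that appear in the strace output. */
static inline void check_trace_values(void)
{
    /* (2^64 - 1) << 9 wraps to 2^64 - 512, the offset lseek() rejected. */
    assert(lba_to_byte_offset(DDF_LBA_UNUSED) == 18446744073709551104ULL);
    /* The offset read in both traces corresponds to sector 4162153. */
    assert(lba_to_byte_offset(4162153) == 2131022336ULL);
}
```

Interpreted as a signed 64-bit off_t, that wrapped value is -512, which would explain the EINVAL from lseek(). The write() that follows does not check the failed seek, so the header lands wherever the file offset happened to be.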
I think this patch will help. The last hunk in particular should make the
difference.
Please let me know if it fixes the problem.
Thanks,
NeilBrown
diff --git a/super-ddf.c b/super-ddf.c
index 636d7b4..86f9bb0 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -880,7 +880,7 @@ static int load_ddf_headers(int fd, struct ddf_super *super, char *devname)
super->primary.openflag && !super->secondary.openflag)
)
super->active = &super->secondary;
- } else if (devname)
+ } else if (devname && super->anchor.secondary_lba != ~(__u64)0)
pr_err("Failed to load secondary DDF header on %s\n",
devname);
if (super->active == NULL)
@@ -2810,7 +2810,8 @@ static int add_to_super_ddf(struct supertype *st,
} while (0)
__calc_lba(dd, ddf->dlist, workspace_lba, 32);
__calc_lba(dd, ddf->dlist, primary_lba, 16);
- __calc_lba(dd, ddf->dlist, secondary_lba, 32);
+ if (ddf->dlist == NULL || ddf->dlist->secondary_lba != ~(__u64)0)
+ __calc_lba(dd, ddf->dlist, secondary_lba, 32);
pde->config_size = dd->workspace_lba;
sprintf(pde->path, "%17.17s","Information: nil") ;
@@ -2892,6 +2893,8 @@ static int __write_ddf_structure(struct dl *d, struct ddf_super *ddf, __u8 type)
default:
return 0;
}
+ if (sector == ~(__u64)0)
+ return 0;
header->type = type;
header->openflag = 1;
* Re: mdadm 3.3: issue with mdmon --takeover
2013-09-10 23:35 ` NeilBrown
@ 2013-09-11 7:40 ` Francis Moreau
2013-09-11 8:11 ` Francis Moreau
2013-09-12 5:00 ` NeilBrown
2013-09-11 20:51 ` Martin Wilck
1 sibling, 2 replies; 13+ messages in thread
From: Francis Moreau @ 2013-09-11 7:40 UTC (permalink / raw)
To: NeilBrown; +Cc: Martin Wilck, linux-raid
Hi Neil,
On Wed, Sep 11, 2013 at 1:35 AM, NeilBrown <neilb@suse.de> wrote:
> On Thu, 5 Sep 2013 11:03:14 +0200 Francis Moreau <francis.moro@gmail.com>
> wrote:
>
>> On Thu, Sep 5, 2013 at 9:04 AM, Francis Moreau <francis.moro@gmail.com> wrote:
>> > Hi Neil,
>> >
>> > On Thu, Sep 5, 2013 at 4:11 AM, NeilBrown <neilb@suse.de> wrote:
>> >> On Wed, 4 Sep 2013 09:36:27 +0200 Francis Moreau <francis.moro@gmail.com>
>> >> wrote:
>> >
>> > [...]
>> >
>> >>> no arrays to monitor... exiting
>> >>>
>> >>
>> >> The line
>> >>
>> >>> mdmon: ddf_open_new: subarray 0 doesn't exist
>> >>
>> >> is the problem. mdmon read the metadata from the array but didn't find
>> >> subarray '0' in there even though the previous mdmon clearly did:
>> >>
>> >>> ddf_open_new: new subarray 0, GUID: Linux-MDdeadbeef00000000?Ob79e0c8b1n
>> >>
>> >> This suggests that even though it succeeded in reading the metadata (it would
>> >> have printed
>> >> Cannot load metadata for md127
>> >> and exited if it had), the metadata is somehow inconsistent.
>> >>
>> >> Could you trying running each mdmon under strace:
>> >> strace -f -o /tmp/str-1 ./mdmon --takeover --all
>> >>
>> >> and attach the two /tmp/str-? files?
>> >
>> > This is weird: if I'm doing that the first strace process is put in a
>> > uninterruptible state at some point:
>> >
>> > # ps aux | grep dmon
>> > root 2297 0.1 0.0 4468 736 tty1 D+ 08:39 0:00
>> > strace -f -o /tmp/str-1 ./mdmon --takeover --all
>> > root 2301 0.6 1.0 15156 11056 ? SLsl 08:39 0:00
>> > ./mdmon --takeover md127
>> >
>> > Starting the second straced mdmon does the same result, and the system
>> > is becoming unusable as soon as it tries to write something to the
>> > disk/raid I guess.
>> >
>> > Note that /tmp on my system is not a tmpfs filesystem but is part of /
>> > which is ext4.
>> >
>> > I gave a second shot but this time I tried to put the strace output
>> > files on /dev/shm which is a tmpfs FS. This time I didn't have the
>> > issue describes above where strace is put in D state. But since after
>> > the second run of mdmon, there was no running mdmon process anymore,
>> > it was hard to retrieve the 2 strace output files.
>> >
>> > Anyways I'm attaching the 2 files now.
>> >
>> >>
>> >> Also what is the difference between
>> >> mdadm --examine /dev/sda
>> >> and
>> >> mdadm --examine /dev/sdb
>> >> ??
>> >>
>> >
>> > After the system finish booting:
>> >
>> > # diff -u sda sdb
>> > --- sda 2013-09-05 09:00:59.554291764 +0200
>> > +++ sdb 2013-09-05 09:01:01.634279757 +0200
>> > @@ -1,4 +1,4 @@
>> > -/dev/sda:
>> > +/dev/sdb:
>> > Magic : de11de11
>> > Version : 01.02.00
>> > Controller GUID : 4C696E75:782D4D44:20202020:2020206C:6F63616C:686F7374
>> > @@ -23,5 +23,5 @@
>> >
>> > Physical Disks : 2
>> > Number RefNo Size Device Type/State
>> > - 0 2cf00056 2064384K /dev/sda active/Online
>> > - 1 b342fbdc 2064384K active/Online
>> > + 0 2cf00056 2064384K active/Online
>> > + 1 b342fbdc 2064384K /dev/sdb active/Online
>> >
>> > After starting the first mdmon process:
>> >
>> > # mdadm --examine /dev/sda >sda
>> > Segmentation fault
>> >
>> > It looks like mdadm is running an infinite loop or something before segfaulting.
>> >
>>
>> I don't know if that can help but it seems to start failing here:
>>
>> # strace ./mdadm --examine /dev/sda
>> ...
>> write(2, "mdmon: Failed to load secondary "..., 55) = 55
>
> The problem is actually a bit earlier, but it does relate to the secondary
> copy of the metadata.
>
> The first sign of trouble is that str-1 has
>
> 2435 lseek(6, 2131022336, SEEK_SET) = 2131022336
> 2435 read(6, "3333\27.3#Linux-MD20130828\3143\177\"\373\324\32\230"..., 512) = 512
>
> while str-2 has
>
> 2452 lseek(6, 2131022336, SEEK_SET) = 2131022336
> 2452 read(6, "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 512) = 512
>
> From the same offset, very different data is read.
>
> Presumably it was written by the first write of mdmon.
> Looking further in str-1 we find:
>
> 2436 lseek(7, 18446744073709551104, SEEK_SET) = -1 EINVAL (Invalid argument)
> 2436 write(7, "\336\21\336\21~.\307}Linux-MD\336\255\276\357\0\0\0\0?O\2672\2045b="..., 512) = 512
>
> That is a big number: "-1 << 9".
Thanks for the details.
>
> mdmon is trying to write the secondary metadata but there isn't any.
> So it writes it in the wrong place and makes a mess.
Hm, that reminds me of a warning I saw:
mdmon: Failed to load secondary DDF header on /dev/block/8:0
it might be related.
>
> I think this patch will help. The last hunk in particular should make the
> difference.
>
> Please let me know if it fixes the problem.
>
Yes it fixes the problem.
I had to adjust the patch to make it compile by using be64_to_cpu()
where needed.
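For readers following along, here is a rough sketch in C of what such an adjustment might look like. It assumes the on-disk LBA is kept in a wrapper type holding big-endian bytes, so it cannot be compared with ~(__u64)0 directly. The names be64_field, be64_field_to_cpu(), secondary_lba_in_use() and check_sentinel() are hypothetical stand-ins, not the mdadm definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a big-endian 64-bit on-disk field.
 * Wrapping the value in a struct makes the compiler reject a direct
 * comparison against a plain integer. */
typedef struct { uint64_t raw; } be64_field;

/* Decode the stored bytes, most significant byte first, into a host
 * integer. Reading byte by byte works on any host byte order. */
static inline uint64_t be64_field_to_cpu(be64_field f)
{
    const unsigned char *p = (const unsigned char *)&f.raw;
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* The comparison from the patch, with the conversion made explicit. */
static inline int secondary_lba_in_use(be64_field secondary_lba)
{
    return be64_field_to_cpu(secondary_lba) != ~(uint64_t)0;
}

/* An all-ones field decodes to all ones regardless of byte order. */
static inline void check_sentinel(void)
{
    be64_field unused = { ~(uint64_t)0 };
    assert(!secondary_lba_in_use(unused));
}
```

Since the all-ones sentinel reads the same in either byte order, the conversion here matters for type correctness rather than for the value being tested.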
Thanks !
--
Francis
* Re: mdadm 3.3: issue with mdmon --takeover
2013-09-11 7:40 ` Francis Moreau
@ 2013-09-11 8:11 ` Francis Moreau
2013-09-12 5:03 ` NeilBrown
2013-09-12 5:00 ` NeilBrown
1 sibling, 1 reply; 13+ messages in thread
From: Francis Moreau @ 2013-09-11 8:11 UTC (permalink / raw)
To: NeilBrown; +Cc: Martin Wilck, linux-raid
On Wed, Sep 11, 2013 at 9:40 AM, Francis Moreau <francis.moro@gmail.com> wrote:
[...]
>>
>> I think this patch will help. The last hunk in particular should make the
>> difference.
>>
>> Please let me know if it fixes the problem.
>>
>
> Yes it fixes the problem.
>
> I had to adjust the patch to make it compile by using be64_to_cpu()
> where needed.
>
Hmm, unfortunately the following test case seems broken too, though I'm
not sure it's related:
# create a ddf array containing loop0 and loop1
$ cat /proc/mdstat
Personalities : [raid1]
md124 : active raid1 loop0[1] loop1[0]
84416 blocks super external:/md125/0 [2/2] [UU]
md125 : inactive loop1[1](S) loop0[0](S)
65536 blocks super external:ddf
# stop the array
$ mdadm --stop /dev/md124
mdadm: stopped /dev/md124
$ mdadm --stop /dev/md125
mdadm: stopped /dev/md125
# Add only one disk
$ mdadm -I /dev/loop0
mdadm: container /dev/md/ddf1 now has 1 device
mdadm: /dev/md/array1 assembled with 1 device but not started
# start the array
$ mdadm -R /dev/md124
# looks like it failed
$ cat /proc/mdstat
Personalities : [raid1]
md124 : inactive loop0[0]
84416 blocks super external:/md125/0
md125 : inactive loop0[0](S)
32768 blocks super external:ddf
# start mdmon manually with debug trace
$ mdmon /dev/md125
starting mdmon for md125
monitor: wake ( )
no arrays to monitor... exiting
--
Francis
* Re: mdadm 3.3: issue with mdmon --takeover
2013-09-10 23:35 ` NeilBrown
2013-09-11 7:40 ` Francis Moreau
@ 2013-09-11 20:51 ` Martin Wilck
2013-09-12 4:59 ` NeilBrown
1 sibling, 1 reply; 13+ messages in thread
From: Martin Wilck @ 2013-09-11 20:51 UTC (permalink / raw)
To: NeilBrown; +Cc: Francis Moreau, linux-raid
On 09/11/2013 01:35 AM, NeilBrown wrote:
> I think this patch will help. The last hunk in particular should make the
> difference.
>
> Please let me know if it fixes the problem.
Thanks for figuring this out. Indeed I forgot the possibility that
secondary LBA may be unused when I wrote that code.
Are you going to fix this directly? Otherwise I might go over it,
double-check, and submit a patch.
Martin
>
> Thanks,
> NeilBrown
>
>
> diff --git a/super-ddf.c b/super-ddf.c
> index 636d7b4..86f9bb0 100644
> --- a/super-ddf.c
> +++ b/super-ddf.c
> @@ -880,7 +880,7 @@ static int load_ddf_headers(int fd, struct ddf_super *super, char *devname)
> super->primary.openflag && !super->secondary.openflag)
> )
> super->active = &super->secondary;
> - } else if (devname)
> + } else if (devname && super->anchor.secondary_lba != ~(__u64)0)
> pr_err("Failed to load secondary DDF header on %s\n",
> devname);
> if (super->active == NULL)
> @@ -2810,7 +2810,8 @@ static int add_to_super_ddf(struct supertype *st,
> } while (0)
> __calc_lba(dd, ddf->dlist, workspace_lba, 32);
> __calc_lba(dd, ddf->dlist, primary_lba, 16);
> - __calc_lba(dd, ddf->dlist, secondary_lba, 32);
> + if (ddf->dlist == NULL || ddf->dlist->secondary_lba != ~(__u64)0)
> + __calc_lba(dd, ddf->dlist, secondary_lba, 32);
> pde->config_size = dd->workspace_lba;
>
> sprintf(pde->path, "%17.17s","Information: nil") ;
> @@ -2892,6 +2893,8 @@ static int __write_ddf_structure(struct dl *d, struct ddf_super *ddf, __u8 type)
> default:
> return 0;
> }
> + if (sector == ~(__u64)0)
> + return 0;
>
> header->type = type;
> header->openflag = 1;
* Re: mdadm 3.3: issue with mdmon --takeover
2013-09-11 20:51 ` Martin Wilck
@ 2013-09-12 4:59 ` NeilBrown
0 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2013-09-12 4:59 UTC (permalink / raw)
To: Martin Wilck; +Cc: Francis Moreau, linux-raid
On Wed, 11 Sep 2013 22:51:20 +0200 Martin Wilck <mwilck@arcor.de> wrote:
> On 09/11/2013 01:35 AM, NeilBrown wrote:
> > I think this patch will help. The last hunk in particular should make the
> > difference.
> >
> > Please let me know if it fixes the problem.
>
> Thanks for figuring this out. Indeed I forgot the possibility that
> secondary LBA may be unused when I wrote that code.
>
> Are you going to fix this directly? Otherwise I might go over it,
> double-check, and submit a patch.
>
I've applied the patch I proposed earlier (plus the compile fix). Feel free
to double-check anyway and submit a patch if you find anything.
Thanks,
NeilBrown
* Re: mdadm 3.3: issue with mdmon --takeover
2013-09-11 7:40 ` Francis Moreau
2013-09-11 8:11 ` Francis Moreau
@ 2013-09-12 5:00 ` NeilBrown
1 sibling, 0 replies; 13+ messages in thread
From: NeilBrown @ 2013-09-12 5:00 UTC (permalink / raw)
To: Francis Moreau; +Cc: Martin Wilck, linux-raid
On Wed, 11 Sep 2013 09:40:56 +0200 Francis Moreau <francis.moro@gmail.com>
wrote:
> Yes it fixes the problem.
>
> I had to adjust the patch to make it compile by using be64_to_cpu()
> where needed.
Thanks. I've applied with those fixes.
NeilBrown
* Re: mdadm 3.3: issue with mdmon --takeover
2013-09-11 8:11 ` Francis Moreau
@ 2013-09-12 5:03 ` NeilBrown
2013-09-12 7:40 ` Francis Moreau
0 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2013-09-12 5:03 UTC (permalink / raw)
To: Francis Moreau; +Cc: Martin Wilck, linux-raid
On Wed, 11 Sep 2013 10:11:29 +0200 Francis Moreau <francis.moro@gmail.com>
wrote:
> On Wed, Sep 11, 2013 at 9:40 AM, Francis Moreau <francis.moro@gmail.com> wrote:
> [...]
> >>
> >> I think this patch will help. The last hunk in particular should make the
> >> difference.
> >>
> >> Please let me know if it fixes the problem.
> >>
> >
> > Yes it fixes the problem.
> >
> > I had to adjust the patch to make it compile by using be64_to_cpu()
> > where needed.
> >
>
> Hmm unfortunately the following test case seems broken too, I'm not
> sure it's related however:
>
> # create a ddf array containing loop0 and loop1
> $ cat /proc/mdstat
> Personalities : [raid1]
> md124 : active raid1 loop0[1] loop1[0]
> 84416 blocks super external:/md125/0 [2/2] [UU]
>
> md125 : inactive loop1[1](S) loop0[0](S)
> 65536 blocks super external:ddf
>
> # stop the array
> $ mdadm --stop /dev/md124
> mdadm: stopped /dev/md124
> $ mdadm --stop /dev/md125
> mdadm: stopped /dev/md125
>
> # Add only one disk
> $ mdadm -I /dev/loop0
> mdadm: container /dev/md/ddf1 now has 1 device
> mdadm: /dev/md/array1 assembled with 1 device but not started
>
> # start the array
> $ mdadm -R /dev/md124
Does
mdadm -IRs
work at this point?
If so, it is just a problem with my quick reimplementation of "-R".
I've made a note to look at it when I get a chance.
NeilBrown
>
> # looks like it failed
> $ cat /proc/mdstat
> Personalities : [raid1]
> md124 : inactive loop0[0]
> 84416 blocks super external:/md125/0
>
> md125 : inactive loop0[0](S)
> 32768 blocks super external:ddf
>
> # start mdmon manually with debug trace
> $ mdmon /dev/md125
> starting mdmon for md125
> monitor: wake ( )
> no arrays to monitor... exiting
>
>
* Re: mdadm 3.3: issue with mdmon --takeover
2013-09-12 5:03 ` NeilBrown
@ 2013-09-12 7:40 ` Francis Moreau
0 siblings, 0 replies; 13+ messages in thread
From: Francis Moreau @ 2013-09-12 7:40 UTC (permalink / raw)
To: NeilBrown; +Cc: Martin Wilck, linux-raid
On Thu, Sep 12, 2013 at 7:03 AM, NeilBrown <neilb@suse.de> wrote:
> On Wed, 11 Sep 2013 10:11:29 +0200 Francis Moreau <francis.moro@gmail.com>
> wrote:
>
>> On Wed, Sep 11, 2013 at 9:40 AM, Francis Moreau <francis.moro@gmail.com> wrote:
>> [...]
>> >>
>> >> I think this patch will help. The last hunk in particular should make the
>> >> difference.
>> >>
>> >> Please let me know if it fixes the problem.
>> >>
>> >
>> > Yes it fixes the problem.
>> >
>> > I had to adjust the patch to make it compile by using be64_to_cpu()
>> > where needed.
>> >
>>
>> Hmm unfortunately the following test case seems broken too, I'm not
>> sure it's related however:
>>
>> # create a ddf array containing loop0 and loop1
>> $ cat /proc/mdstat
>> Personalities : [raid1]
>> md124 : active raid1 loop0[1] loop1[0]
>> 84416 blocks super external:/md125/0 [2/2] [UU]
>>
>> md125 : inactive loop1[1](S) loop0[0](S)
>> 65536 blocks super external:ddf
>>
>> # stop the array
>> $ mdadm --stop /dev/md124
>> mdadm: stopped /dev/md124
>> $ mdadm --stop /dev/md125
>> mdadm: stopped /dev/md125
>>
>> # Add only one disk
>> $ mdadm -I /dev/loop0
>> mdadm: container /dev/md/ddf1 now has 1 device
>> mdadm: /dev/md/array1 assembled with 1 device but not started
>>
>> # start the array
>> $ mdadm -R /dev/md124
>
> Does
> mdadm -IRs
> work at this point?
Yes it does.
>
> If so, it is just a problem with my quick reimplementation of "-R".
> I've made a note to look at it when I get a chance.
>
ok, thanks.
--
Francis
Thread overview: 13+ messages
2013-09-03 15:54 mdadm 3.3: issue with mdmon --takeover Francis Moreau
2013-09-04 6:08 ` NeilBrown
2013-09-04 7:36 ` Francis Moreau
2013-09-05 2:11 ` NeilBrown
[not found] ` <CAC9WiBiHcS126iFv91250d83sMrBYmRbvoqYAEhjJWjb2p5J3A@mail.gmail.com>
2013-09-05 9:03 ` Francis Moreau
2013-09-10 23:35 ` NeilBrown
2013-09-11 7:40 ` Francis Moreau
2013-09-11 8:11 ` Francis Moreau
2013-09-12 5:03 ` NeilBrown
2013-09-12 7:40 ` Francis Moreau
2013-09-12 5:00 ` NeilBrown
2013-09-11 20:51 ` Martin Wilck
2013-09-12 4:59 ` NeilBrown