* [Ocfs2-devel] ocfs2 hang writing until reboot the cluster-dlm: set_fs_notified: set_fs_notified no nodeid 1812048064#012
@ 2010-08-20 10:50 Roberto Giordani
2010-08-20 13:54 ` Jiaju Zhang
0 siblings, 1 reply; 2+ messages in thread
From: Roberto Giordani @ 2010-08-20 10:50 UTC (permalink / raw)
To: ocfs2-devel
Hello,
I hope this mailing list is correct.
I've a cluster pacemaker with a clone resource ocfs2 with
ocfs2-tools-1.4.1-25.6.x86_64
ocfs2-tools-o2cb-1.4.1-25.6.x86_64
on Opensuse 11.2
After some network problem on my switch I receive on one of 4 nodes of
my cluster the following messages
Aug 18 13:12:28 nodo1 openais[8462]: [TOTEM] The token was lost in the
OPERATIONAL state.
Aug 18 13:12:28 nodo1 openais[8462]: [TOTEM] Receive multicast socket
recv buffer size (262142 bytes).
Aug 18 13:12:28 nodo1 openais[8462]: [TOTEM] Transmit multicast socket
send buffer size (262142 bytes).
Aug 18 13:12:28 nodo1 openais[8462]: [TOTEM] entering GATHER state from 2.
Aug 18 13:12:31 nodo1 cluster-dlm: update_cluster: Processing membership
235508#012
Aug 18 13:12:31 nodo1 cluster-dlm: dlm_process_node: Skipped active node
1778493632 'nodo1': born-on=235504, last-seen=235508, this-event=235508,
last-event=235504#012
Aug 18 13:12:31 nodo1 cluster-dlm: dlm_process_node: Skipped inactive
node 1812048064 'nodo3': born-on=235504, last-seen=235504,
this-event=235508, last-event=235504#012
Aug 18 13:12:31 nodo1 cluster-dlm: add_change: add_change cg 3 remove
nodeid 1812048064 reason 3#012
Aug 18 13:12:31 nodo1 cluster-dlm: add_change: add_change cg 3 counts
member 1 joined 0 remove 1 failed 1#012
Aug 18 13:12:31 nodo1 cluster-dlm: stop_kernel: stop_kernel cg 3#012
Aug 18 13:12:31 nodo1 cluster-dlm: do_sysfs: write "0" to
"/sys/kernel/dlm/0BB443F896254AD3BA8FB960C425B666/control"#012
Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] entering GATHER state from 0.
Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] Creating commit token
because I am the rep.
Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] Saving state aru e1 high
seq received e1
Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] Storing new sequence id for
ring 397f4
Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] entering COMMIT state.
Aug 18 13:12:31 nodo1 cib: [8475]: notice: ais_dispatch: Membership
235508: quorum lost
Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] entering RECOVERY state.
Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] position [0] member
192.168.1.106:
Aug 18 13:12:31 nodo1 crmd: [8479]: notice: ais_dispatch: Membership
235508: quorum lost
Aug 18 13:12:31 nodo1 cluster-dlm: purge_plocks: purged 0 plocks for
1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: [8721]: notice: ais_dispatch:
Membership 235508: quorum lost
Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] previous ring seq 235504
rep 192.168.1.106
Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] aru e1 high delivered e1
received flag 1
Aug 18 13:12:31 nodo1 ocfs2_controld: [8786]: notice: ais_dispatch:
Membership 235508: quorum lost
Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] Did not need to originate
any messages in recovery.
Aug 18 13:12:31 nodo1 cib: [8475]: info: crm_update_peer: Node nodo3:
id=1812048064 state=lost (new) addr=r(0) ip(192.168.1.108) votes=1
born=235504 seen=235504 proc=00000000000000000000000000053312
Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] Sending initial ORF token
Aug 18 13:12:31 nodo1 crmd: [8479]: info: ais_status_callback: status:
nodo3 is now lost (was member)
Aug 18 13:12:31 nodo1 openais[8462]: [CLM ] CLM CONFIGURATION CHANGE
Aug 18 13:12:31 nodo1 cluster-dlm: [8721]: info: crm_update_peer: Node
nodo3: id=1812048064 state=lost (new) addr=r(0) ip(192.168.1.108)
votes=1 born=235504 seen=235504 proc=00000000000000000000000000053312
Aug 18 13:12:31 nodo1 ocfs2_controld: [8786]: info: crm_update_peer:
Node nodo3: id=1812048064 state=lost (new) addr=r(0) ip(192.168.1.108)
votes=1 born=235504 seen=235504 proc=00000000000000000000000000053312
Aug 18 13:12:31 nodo1 cib: [8475]: info: ais_dispatch: Membership
235508: quorum still lost
Aug 18 13:12:31 nodo1 openais[8462]: [CLM ] New Configuration:
Aug 18 13:12:31 nodo1 crmd: [8479]: info: crm_update_peer: Node nodo3:
id=1812048064 state=lost (new) addr=r(0) ip(192.168.1.108) votes=1
born=235504 seen=235504 proc=00000000000000000000000000053312
Aug 18 13:12:31 nodo1 openais[8462]: [CLM ] #011r(0) ip(192.168.1.106)
Aug 18 13:12:31 nodo1 cluster-dlm: [8721]: info: ais_dispatch:
Membership 235508: quorum still lost
Aug 18 13:12:31 nodo1 openais[8462]: [CLM ] Members Left:
Aug 18 13:12:31 nodo1 openais[8462]: [CLM ] #011r(0) ip(192.168.1.108)
Aug 18 13:12:31 nodo1 crmd: [8479]: info: erase_node_from_join: Removed
node nodo3 from join calculations: welcomed=0 itegrated=0 finalized=0
confirmed=1
Aug 18 13:12:31 nodo1 openais[8462]: [CLM ] Members Joined:
Aug 18 13:12:31 nodo1 openais[8462]: [crm ] notice: pcmk_peer_update:
Transitional membership event on ring 235508: memb=1, new=0, lost=1
Aug 18 13:12:31 nodo1 crmd: [8479]: info: crm_update_quorum: Updating
quorum status to false (call=281)
Aug 18 13:12:31 nodo1 openais[8462]: [crm ] info: pcmk_peer_update:
memb: nodo1 1778493632
Aug 18 13:12:31 nodo1 openais[8462]: [crm ] info: pcmk_peer_update:
lost: nodo3 1812048064
Aug 18 13:12:31 nodo1 openais[8462]: [CLM ] CLM CONFIGURATION CHANGE
Aug 18 13:12:31 nodo1 openais[8462]: [CLM ] New Configuration:
Aug 18 13:12:31 nodo1 openais[8462]: [CLM ] #011r(0) ip(192.168.1.106)
Aug 18 13:12:31 nodo1 openais[8462]: [CLM ] Members Left:
Aug 18 13:12:31 nodo1 openais[8462]: [CLM ] Members Joined:
Aug 18 13:12:31 nodo1 openais[8462]: [crm ] notice: pcmk_peer_update:
Stable membership event on ring 235508: memb=1, new=0, lost=0
Aug 18 13:12:31 nodo1 openais[8462]: [crm ] info: pcmk_peer_update:
MEMB: nodo1 1778493632
Aug 18 13:12:31 nodo1 openais[8462]: [crm ] info:
ais_mark_unseen_peer_dead: Node nodo3 was not seen in the previous
transition
Aug 18 13:12:31 nodo1 openais[8462]: [MAIN ] info: update_member: Node
1812048064/nodo3 is now: lost
Aug 18 13:12:31 nodo1 openais[8462]: [crm ] info:
send_member_notification: Sending membership update 235508 to 4 children
Aug 18 13:12:31 nodo1 openais[8462]: [MAIN ] info: update_member:
0x7f12080009a0 Node 1778493632 ((null)) born on: 235504
Aug 18 13:12:31 nodo1 openais[8462]: [SYNC ] This node is within the
primary component and will provide service.
Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] entering OPERATIONAL state.
Aug 18 13:12:31 nodo1 openais[8462]: [MAIN ] info: update_member:
0x7f12080009a0 Node 1778493632 (nodo1) born on: 235504
Aug 18 13:12:31 nodo1 openais[8462]: [crm ] info:
send_member_notification: Sending membership update 235508 to 4 children
Aug 18 13:12:31 nodo1 openais[8462]: [CLM ] got nodejoin message
192.168.1.106
Aug 18 13:12:31 nodo1 openais[8462]: [CPG ] got joinlist message from
node 1778493632
Aug 18 13:12:31 nodo1 ocfs2_controld: [8786]: info: ais_dispatch:
Membership 235508: quorum still lost
Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/279): ok
(rc=0)
Aug 18 13:12:31 nodo1 haclient: on_event:evt:cib_changed
Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_config_changed: Attr changes
Aug 18 13:12:31 nodo1 haclient: on_event:evt:cib_changed
Aug 18 13:12:31 nodo1 cib: [8475]: info: log_data_element: cib:diff: -
<cib have-quorum="1" admin_epoch="0" epoch="1655" num_updates="37" />
Aug 18 13:12:31 nodo1 cib: [8475]: info: log_data_element: cib:diff: +
<cib have-quorum="0" admin_epoch="0" epoch="1656" num_updates="1" />
Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_request: Operation
complete: op cib_modify for section cib (origin=local/crmd/281): ok (rc=0)
Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_xpath: Processing
cib_query op for
//cib/configuration/crm_config//nvpair[@name='expected-quorum-votes']
(/cib/configuration/crm_config/cluster_property_set/nvpair[2])
Aug 18 13:12:31 nodo1 crmd: [8479]: info: abort_transition_graph:
need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Aug 18 13:12:31 nodo1 crmd: [8479]: info: need_abort: Aborting on change
to have-quorum
Aug 18 13:12:31 nodo1 crmd: [8479]: info: ais_dispatch: Membership
235508: quorum still lost
Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/283):
ok (rc=0)
Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/284): ok
(rc=0)
Aug 18 13:12:31 nodo1 cluster-dlm: fence_node_time: Node
1812048064/nodo3 was last shot 'now'#012
Aug 18 13:12:31 nodo1 cluster-dlm: fence_node_time: It does not appear
node 1812048064/nodo3 has been shot#012
Aug 18 13:12:31 nodo1 cluster-dlm: check_fencing_done: check_fencing
1812048064 1282129898 fenced at 1282129951#012
Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_xpath: Processing
cib_query op for
//cib/configuration/crm_config//nvpair[@name='expected-quorum-votes']
(/cib/configuration/crm_config/cluster_property_set/nvpair[2])
Aug 18 13:12:31 nodo1 crmd: [8479]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Aug 18 13:12:31 nodo1 crmd: [8479]: info: do_state_transition: All 1
cluster nodes are eligible to run resources.
Aug 18 13:12:31 nodo1 crmd: [8479]: info: do_pe_invoke: Query 288:
Requesting the current CIB: S_POLICY_ENGINE
Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/287):
ok (rc=0)
Aug 18 13:12:31 nodo1 cluster-dlm: check_fencing_done: check_fencing
done#012
Aug 18 13:12:31 nodo1 cluster-dlm: check_quorum_done: check_quorum
disabled#012
Aug 18 13:12:31 nodo1 cluster-dlm: check_fs_done: check_fs nodeid
1812048064 needs fs notify#012
Aug 18 13:12:31 nodo1 cluster-dlm: check_fencing_done: check_fencing
done#012
Aug 18 13:12:31 nodo1 cluster-dlm: check_quorum_done: check_quorum
disabled#012
Aug 18 13:12:31 nodo1 cluster-dlm: check_fs_done: check_fs nodeid
1812048064 needs fs notify#012
Aug 18 13:12:31 nodo1 cluster-dlm: check_fencing_done: check_fencing
done#012
Aug 18 13:12:31 nodo1 cluster-dlm: check_quorum_done: check_quorum
disabled#012
Aug 18 13:12:31 nodo1 cluster-dlm: check_fs_done: check_fs done#012
Aug 18 13:12:31 nodo1 cluster-dlm: send_info: send_start cg 3 flags 2
counts 2 1 0 1 1#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: receive_start: receive_start
1778493632:3 len 76#012
Aug 18 13:12:31 nodo1 cluster-dlm: match_change: match_change
1778493632:3 matches cg 3#012
Aug 18 13:12:31 nodo1 cluster-dlm: wait_messages_done: wait_messages cg
3 got all 1#012
Aug 18 13:12:31 nodo1 cluster-dlm: start_kernel: start_kernel cg 3
member_count 1#012
Aug 18 13:12:31 nodo1 cluster-dlm: update_dir_members: dir_member
1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: update_dir_members: dir_member
1778493632#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_configfs_members: set_members
rmdir
"/sys/kernel/config/dlm/cluster/spaces/0BB443F896254AD3BA8FB960C425B666/nodes/1812048064"#012
Aug 18 13:12:31 nodo1 cluster-dlm: do_sysfs: write "1" to
"/sys/kernel/dlm/0BB443F896254AD3BA8FB960C425B666/control"#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
nodeid 1812048064#012
...............................
and so on with the same messages until other node fence it.
I've searched on internet some info but nobody has the same problem.
I've tried to post the same error to clusterlabs mailing list without
answer.
I've notice that if I remove the FC cable from the node in error the log
messages stops, so I suppose is some problem due lock manager+ocfs2 and
network problem.
Could someone help me?
Regards,
Roberto.
^ permalink raw reply [flat|nested] 2+ messages in thread
* [Ocfs2-devel] ocfs2 hang writing until reboot the cluster-dlm: set_fs_notified: set_fs_notified no nodeid 1812048064#012
2010-08-20 10:50 [Ocfs2-devel] ocfs2 hang writing until reboot the cluster-dlm: set_fs_notified: set_fs_notified no nodeid 1812048064#012 Roberto Giordani
@ 2010-08-20 13:54 ` Jiaju Zhang
0 siblings, 0 replies; 2+ messages in thread
From: Jiaju Zhang @ 2010-08-20 13:54 UTC (permalink / raw)
To: ocfs2-devel
On Fri, Aug 20, 2010 at 6:50 PM, Roberto Giordani <r.giordani@libero.it> wrote:
> Hello,
> I hope this mailing list is correct.
> I've a cluster pacemaker with a clone resource ocfs2 with
> ocfs2-tools-1.4.1-25.6.x86_64
> ocfs2-tools-o2cb-1.4.1-25.6.x86_64
> on Opensuse 11.2
Since you are using ocfs2 with user-space cluster stack, actually the
package ocfs2-tools-o2cb hasn't been used in your setup.
>
> After some network problem on my switch I receive on one of 4 nodes of
> my cluster the following messages
>
> Aug 18 13:12:28 nodo1 openais[8462]: [TOTEM] The token was lost in the
> OPERATIONAL state.
> Aug 18 13:12:28 nodo1 openais[8462]: [TOTEM] Receive multicast socket
> recv buffer size (262142 bytes).
> Aug 18 13:12:28 nodo1 openais[8462]: [TOTEM] Transmit multicast socket
> send buffer size (262142 bytes).
> Aug 18 13:12:28 nodo1 openais[8462]: [TOTEM] entering GATHER state from 2.
> Aug 18 13:12:31 nodo1 cluster-dlm: update_cluster: Processing membership
> 235508#012
> Aug 18 13:12:31 nodo1 cluster-dlm: dlm_process_node: Skipped active node
> 1778493632 'nodo1': born-on=235504, last-seen=235508, this-event=235508,
> last-event=235504#012
> Aug 18 13:12:31 nodo1 cluster-dlm: dlm_process_node: Skipped inactive
> node 1812048064 'nodo3': born-on=235504, last-seen=235504,
> this-event=235508, last-event=235504#012
> Aug 18 13:12:31 nodo1 cluster-dlm: add_change: add_change cg 3 remove
> nodeid 1812048064 reason 3#012
> Aug 18 13:12:31 nodo1 cluster-dlm: add_change: add_change cg 3 counts
> member 1 joined 0 remove 1 failed 1#012
> Aug 18 13:12:31 nodo1 cluster-dlm: stop_kernel: stop_kernel cg 3#012
> Aug 18 13:12:31 nodo1 cluster-dlm: do_sysfs: write "0" to
> "/sys/kernel/dlm/0BB443F896254AD3BA8FB960C425B666/control"#012
> Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] entering GATHER state from 0.
> Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] Creating commit token
> because I am the rep.
> Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] Saving state aru e1 high
> seq received e1
> Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] Storing new sequence id for
> ring 397f4
> Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] entering COMMIT state.
> Aug 18 13:12:31 nodo1 cib: [8475]: notice: ais_dispatch: Membership
> 235508: quorum lost
> Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] entering RECOVERY state.
> Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] position [0] member
> 192.168.1.106:
> Aug 18 13:12:31 nodo1 crmd: [8479]: notice: ais_dispatch: Membership
> 235508: quorum lost
> Aug 18 13:12:31 nodo1 cluster-dlm: purge_plocks: purged 0 plocks for
> 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: [8721]: notice: ais_dispatch:
> Membership 235508: quorum lost
> Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] previous ring seq 235504
> rep 192.168.1.106
> Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] aru e1 high delivered e1
> received flag 1
> Aug 18 13:12:31 nodo1 ocfs2_controld: [8786]: notice: ais_dispatch:
> Membership 235508: quorum lost
> Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] Did not need to originate
> any messages in recovery.
> Aug 18 13:12:31 nodo1 cib: [8475]: info: crm_update_peer: Node nodo3:
> id=1812048064 state=lost (new) addr=r(0) ip(192.168.1.108) ?votes=1
> born=235504 seen=235504 proc=00000000000000000000000000053312
> Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] Sending initial ORF token
> Aug 18 13:12:31 nodo1 crmd: [8479]: info: ais_status_callback: status:
> nodo3 is now lost (was member)
> Aug 18 13:12:31 nodo1 openais[8462]: [CLM ?] CLM CONFIGURATION CHANGE
> Aug 18 13:12:31 nodo1 cluster-dlm: [8721]: info: crm_update_peer: Node
> nodo3: id=1812048064 state=lost (new) addr=r(0) ip(192.168.1.108)
> votes=1 born=235504 seen=235504 proc=00000000000000000000000000053312
> Aug 18 13:12:31 nodo1 ocfs2_controld: [8786]: info: crm_update_peer:
> Node nodo3: id=1812048064 state=lost (new) addr=r(0) ip(192.168.1.108)
> votes=1 born=235504 seen=235504 proc=00000000000000000000000000053312
> Aug 18 13:12:31 nodo1 cib: [8475]: info: ais_dispatch: Membership
> 235508: quorum still lost
> Aug 18 13:12:31 nodo1 openais[8462]: [CLM ?] New Configuration:
> Aug 18 13:12:31 nodo1 crmd: [8479]: info: crm_update_peer: Node nodo3:
> id=1812048064 state=lost (new) addr=r(0) ip(192.168.1.108) ?votes=1
> born=235504 seen=235504 proc=00000000000000000000000000053312
> Aug 18 13:12:31 nodo1 openais[8462]: [CLM ?] #011r(0) ip(192.168.1.106)
> Aug 18 13:12:31 nodo1 cluster-dlm: [8721]: info: ais_dispatch:
> Membership 235508: quorum still lost
> Aug 18 13:12:31 nodo1 openais[8462]: [CLM ?] Members Left:
> Aug 18 13:12:31 nodo1 openais[8462]: [CLM ?] #011r(0) ip(192.168.1.108)
> Aug 18 13:12:31 nodo1 crmd: [8479]: info: erase_node_from_join: Removed
> node nodo3 from join calculations: welcomed=0 itegrated=0 finalized=0
> confirmed=1
> Aug 18 13:12:31 nodo1 openais[8462]: [CLM ?] Members Joined:
> Aug 18 13:12:31 nodo1 openais[8462]: [crm ?] notice: pcmk_peer_update:
> Transitional membership event on ring 235508: memb=1, new=0, lost=1
> Aug 18 13:12:31 nodo1 crmd: [8479]: info: crm_update_quorum: Updating
> quorum status to false (call=281)
> Aug 18 13:12:31 nodo1 openais[8462]: [crm ?] info: pcmk_peer_update:
> memb: nodo1 1778493632
> Aug 18 13:12:31 nodo1 openais[8462]: [crm ?] info: pcmk_peer_update:
> lost: nodo3 1812048064
> Aug 18 13:12:31 nodo1 openais[8462]: [CLM ?] CLM CONFIGURATION CHANGE
> Aug 18 13:12:31 nodo1 openais[8462]: [CLM ?] New Configuration:
> Aug 18 13:12:31 nodo1 openais[8462]: [CLM ?] #011r(0) ip(192.168.1.106)
> Aug 18 13:12:31 nodo1 openais[8462]: [CLM ?] Members Left:
> Aug 18 13:12:31 nodo1 openais[8462]: [CLM ?] Members Joined:
> Aug 18 13:12:31 nodo1 openais[8462]: [crm ?] notice: pcmk_peer_update:
> Stable membership event on ring 235508: memb=1, new=0, lost=0
> Aug 18 13:12:31 nodo1 openais[8462]: [crm ?] info: pcmk_peer_update:
> MEMB: nodo1 1778493632
> Aug 18 13:12:31 nodo1 openais[8462]: [crm ?] info:
> ais_mark_unseen_peer_dead: Node nodo3 was not seen in the previous
> transition
> Aug 18 13:12:31 nodo1 openais[8462]: [MAIN ] info: update_member: Node
> 1812048064/nodo3 is now: lost
> Aug 18 13:12:31 nodo1 openais[8462]: [crm ?] info:
> send_member_notification: Sending membership update 235508 to 4 children
> Aug 18 13:12:31 nodo1 openais[8462]: [MAIN ] info: update_member:
> 0x7f12080009a0 Node 1778493632 ((null)) born on: 235504
> Aug 18 13:12:31 nodo1 openais[8462]: [SYNC ] This node is within the
> primary component and will provide service.
> Aug 18 13:12:31 nodo1 openais[8462]: [TOTEM] entering OPERATIONAL state.
> Aug 18 13:12:31 nodo1 openais[8462]: [MAIN ] info: update_member:
> 0x7f12080009a0 Node 1778493632 (nodo1) born on: 235504
> Aug 18 13:12:31 nodo1 openais[8462]: [crm ?] info:
> send_member_notification: Sending membership update 235508 to 4 children
> Aug 18 13:12:31 nodo1 openais[8462]: [CLM ?] got nodejoin message
> 192.168.1.106
> Aug 18 13:12:31 nodo1 openais[8462]: [CPG ?] got joinlist message from
> node 1778493632
> Aug 18 13:12:31 nodo1 ocfs2_controld: [8786]: info: ais_dispatch:
> Membership 235508: quorum still lost
> Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_request: Operation
> complete: op cib_modify for section nodes (origin=local/crmd/279): ok
> (rc=0)
> Aug 18 13:12:31 nodo1 haclient: on_event:evt:cib_changed
> Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_config_changed: Attr changes
> Aug 18 13:12:31 nodo1 haclient: on_event:evt:cib_changed
> Aug 18 13:12:31 nodo1 cib: [8475]: info: log_data_element: cib:diff: -
> <cib have-quorum="1" admin_epoch="0" epoch="1655" num_updates="37" />
> Aug 18 13:12:31 nodo1 cib: [8475]: info: log_data_element: cib:diff: +
> <cib have-quorum="0" admin_epoch="0" epoch="1656" num_updates="1" />
> Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_request: Operation
> complete: op cib_modify for section cib (origin=local/crmd/281): ok (rc=0)
> Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_xpath: Processing
> cib_query op for
> //cib/configuration/crm_config//nvpair[@name='expected-quorum-votes']
> (/cib/configuration/crm_config/cluster_property_set/nvpair[2])
> Aug 18 13:12:31 nodo1 crmd: [8479]: info: abort_transition_graph:
> need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> Aug 18 13:12:31 nodo1 crmd: [8479]: info: need_abort: Aborting on change
> to have-quorum
> Aug 18 13:12:31 nodo1 crmd: [8479]: info: ais_dispatch: Membership
> 235508: quorum still lost
> Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_request: Operation
> complete: op cib_modify for section crm_config (origin=local/crmd/283):
> ok (rc=0)
> Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_request: Operation
> complete: op cib_modify for section nodes (origin=local/crmd/284): ok
> (rc=0)
> Aug 18 13:12:31 nodo1 cluster-dlm: fence_node_time: Node
> 1812048064/nodo3 was last shot 'now'#012
> Aug 18 13:12:31 nodo1 cluster-dlm: fence_node_time: It does not appear
> node 1812048064/nodo3 has been shot#012
> Aug 18 13:12:31 nodo1 cluster-dlm: check_fencing_done: check_fencing
> 1812048064 1282129898 fenced at 1282129951#012
> Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_xpath: Processing
> cib_query op for
> //cib/configuration/crm_config//nvpair[@name='expected-quorum-votes']
> (/cib/configuration/crm_config/cluster_property_set/nvpair[2])
> Aug 18 13:12:31 nodo1 crmd: [8479]: info: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Aug 18 13:12:31 nodo1 crmd: [8479]: info: do_state_transition: All 1
> cluster nodes are eligible to run resources.
> Aug 18 13:12:31 nodo1 crmd: [8479]: info: do_pe_invoke: Query 288:
> Requesting the current CIB: S_POLICY_ENGINE
> Aug 18 13:12:31 nodo1 cib: [8475]: info: cib_process_request: Operation
> complete: op cib_modify for section crm_config (origin=local/crmd/287):
> ok (rc=0)
> Aug 18 13:12:31 nodo1 cluster-dlm: check_fencing_done: check_fencing
> done#012
> Aug 18 13:12:31 nodo1 cluster-dlm: check_quorum_done: check_quorum
> disabled#012
> Aug 18 13:12:31 nodo1 cluster-dlm: check_fs_done: check_fs nodeid
> 1812048064 needs fs notify#012
> Aug 18 13:12:31 nodo1 cluster-dlm: check_fencing_done: check_fencing
> done#012
> Aug 18 13:12:31 nodo1 cluster-dlm: check_quorum_done: check_quorum
> disabled#012
> Aug 18 13:12:31 nodo1 cluster-dlm: check_fs_done: check_fs nodeid
> 1812048064 needs fs notify#012
> Aug 18 13:12:31 nodo1 cluster-dlm: check_fencing_done: check_fencing
> done#012
> Aug 18 13:12:31 nodo1 cluster-dlm: check_quorum_done: check_quorum
> disabled#012
> Aug 18 13:12:31 nodo1 cluster-dlm: check_fs_done: check_fs done#012
> Aug 18 13:12:31 nodo1 cluster-dlm: send_info: send_start cg 3 flags 2
> counts 2 1 0 1 1#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: receive_start: receive_start
> 1778493632:3 len 76#012
> Aug 18 13:12:31 nodo1 cluster-dlm: match_change: match_change
> 1778493632:3 matches cg 3#012
> Aug 18 13:12:31 nodo1 cluster-dlm: wait_messages_done: wait_messages cg
> 3 got all 1#012
> Aug 18 13:12:31 nodo1 cluster-dlm: start_kernel: start_kernel cg 3
> member_count 1#012
> Aug 18 13:12:31 nodo1 cluster-dlm: update_dir_members: dir_member
> 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: update_dir_members: dir_member
> 1778493632#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_configfs_members: set_members
> rmdir
> "/sys/kernel/config/dlm/cluster/spaces/0BB443F896254AD3BA8FB960C425B666/nodes/1812048064"#012
>
> Aug 18 13:12:31 nodo1 cluster-dlm: do_sysfs: write "1" to
> "/sys/kernel/dlm/0BB443F896254AD3BA8FB960C425B666/control"#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> Aug 18 13:12:31 nodo1 cluster-dlm: set_fs_notified: set_fs_notified no
> nodeid 1812048064#012
> ...............................
> and so on with the same messages until other node fence it.
>
> I've searched on internet some info but nobody has the same problem.
> I've tried to post the same error to clusterlabs mailing list without
> answer.
>
> I've notice that if I remove the FC cable from the node in error the log
> messages stops, so I suppose is some problem due lock manager+ocfs2 and
> network problem.
>
> Could someone help me?
Could you provide the hb_report? It gathered many useful logs which
can help us analyze if it is a configuration issue or a bug;)
Anyway, you'd better file a bug in Novell bugzilla for this kind of issue.
Thanks,
Jiaju
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2010-08-20 13:54 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-20 10:50 [Ocfs2-devel] ocfs2 hang writing until reboot the cluster-dlm: set_fs_notified: set_fs_notified no nodeid 1812048064#012 Roberto Giordani
2010-08-20 13:54 ` Jiaju Zhang
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.