* [Cluster-devel] Question about /etc/init.d/cman start
From: Dietmar Maurer @ 2011-08-29 6:02 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi all,
the current startup script simply exits if there is no quorum, so fenced and dlm_controld are not started. Even cmannotifyd is not started, so you can't react to quorum changes with cmannotifyd.
What is the suggested way to start those services after the node gets quorum?
And why can't we start those services at boot time (especially cmannotifyd)?
- Dietmar
* [Cluster-devel] Question about /etc/init.d/cman start
From: Fabio M. Di Nitto @ 2011-08-29 6:36 UTC (permalink / raw)
To: cluster-devel.redhat.com
On 08/29/2011 08:02 AM, Dietmar Maurer wrote:
> Hi all,
>
> the current startup script simply exits if there is no quorum, so fenced and dlm_controld are not started. Even cmannotifyd is not started, so you can't react to quorum changes with cmannotifyd.
>
It is actually configurable via /etc/sysconfig/cman (or
/etc/default/cman on Debian-based systems):
# CMAN_QUORUM_TIMEOUT -- amount of time to wait for a quorate cluster on
# startup quorum is needed by many other applications, so we may as
# well wait here. If CMAN_QUORUM_TIMEOUT is zero, quorum will
# be ignored.
[ -z "$CMAN_QUORUM_TIMEOUT" ] && CMAN_QUORUM_TIMEOUT=45
Setting CMAN_QUORUM_TIMEOUT=0 will simply stop waiting for quorum and
continue the execution of the init script.
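The timeout handling described above can be sketched roughly as follows. This is an illustration, not the actual init script: wait_for_quorum and quorum_wait are names chosen for this sketch, and the cman_tool invocation is an assumption to check against the installed version.

```shell
#!/bin/sh
# Sketch only: approximates the CMAN_QUORUM_TIMEOUT behaviour described above.

# Hypothetical wrapper around the quorum wait; the cman_tool flags are assumed.
# Redefine this function to try the logic without a running cluster.
quorum_wait() {
    cman_tool -t "$1" -q wait
}

wait_for_quorum() {
    # Same default as the init script snippet quoted above.
    timeout="${CMAN_QUORUM_TIMEOUT:-45}"
    # A timeout of zero means quorum is ignored and startup simply continues.
    if [ "$timeout" -eq 0 ]; then
        return 0
    fi
    # Otherwise block until quorate or until the timeout expires (non-zero exit).
    quorum_wait "$timeout"
}
```

With CMAN_QUORUM_TIMEOUT=0 set in the config file, wait_for_quorum returns immediately; with any other value, its exit status is that of the wait itself.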
Assuming you want to retain the default behavior, once quorum is gained,
it is enough to execute /etc/init.d/cman start again. The script is
clever enough to start only what is necessary.
You have a good point regarding cmannotifyd. In theory it could be used
to trigger a "/etc/init.d/cman start" once quorum is achieved and
notification dispatched. I can fix this upstream, but for any RHEL6
changes, I'll need you to go via support channels.
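A minimal sketch of what such a cmannotifyd hook might look like is given here for illustration only. It assumes that cmannotifyd runs scripts from its notification directory with CMAN_NOTIFICATION and CMAN_NOTIFICATION_QUORUM set in the environment, as its man page describes; the hook name start-on-quorum and the CMAN_START_CMD override are made up for this example.

```shell
#!/bin/sh
# Hypothetical cmannotifyd hook (the name "start-on-quorum" is made up).
# Assumes cmannotifyd exports CMAN_NOTIFICATION and CMAN_NOTIFICATION_QUORUM;
# verify against the cmannotifyd man page for your version.

# Command to run once quorate; overridable so the logic can be tried by hand.
CMAN_START_CMD="${CMAN_START_CMD:-/etc/init.d/cman start}"

on_notification() {
    # Only react to state changes, not to shutdown or config-update events.
    [ "${CMAN_NOTIFICATION:-}" = "CMAN_REASON_STATECHANGE" ] || return 0
    # Only act when the node is part of a quorate cluster.
    [ "${CMAN_NOTIFICATION_QUORUM:-}" = "1" ] || return 0
    # The hook has to report failures itself; nothing else will.
    if ! $CMAN_START_CMD; then
        echo "start-on-quorum: '$CMAN_START_CMD' failed" >&2
        return 1
    fi
}

on_notification
```

Because the init script starts only what is missing, running it again on every quorate state change should be harmless in principle, but treat this as an illustration rather than a tested setup.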
Fabio
* [Cluster-devel] Question about /etc/init.d/cman start
From: Dietmar Maurer @ 2011-08-29 7:01 UTC (permalink / raw)
To: cluster-devel.redhat.com
> It is actually configurable via /etc/sysconfig/cman (or /etc/default/cman on
> Debian-based systems)
>
> # CMAN_QUORUM_TIMEOUT -- amount of time to wait for a quorate cluster on
> # startup quorum is needed by many other applications, so we may as
> # well wait here. If CMAN_QUORUM_TIMEOUT is zero, quorum will
> # be ignored.
> [ -z "$CMAN_QUORUM_TIMEOUT" ] && CMAN_QUORUM_TIMEOUT=45
>
> Setting CMAN_QUORUM_TIMEOUT=0 will simply stop waiting for quorum and
> continue the execution of the init script.
Sure, but I want to wait for quorum.
> Assuming you want to retain the default behavior, once quorum is gained, it is
> enough to execute /etc/init.d/cman start again. The script is clever enough to
> start only what is necessary.
> You have a good point regarding cmannotifyd. In theory it could be used to
> trigger a "/etc/init.d/cman start" once quorum is achieved and notification
> dispatched. I can fix this upstream, but for any RHEL6 changes, I'll need you to
I compile my own packages for Debian, so an upstream fix would be great. I am just unsure
whether we should call unfence_self() from cmannotifyd. I guess it is OK if we check that
we got quorum for the first time?
Besides, why do you want the extra complexity of running 'cman start' from cmannotifyd? The error handling in particular is unclear to me (what if 'cman start' fails there?). So can't we simply make those daemons smart enough that we can always start them at boot time?
- Dietmar
* [Cluster-devel] Question about /etc/init.d/cman start
From: Fabio M. Di Nitto @ 2011-08-29 7:38 UTC (permalink / raw)
To: cluster-devel.redhat.com
On 08/29/2011 09:01 AM, Dietmar Maurer wrote:
>> It is actually configurable via /etc/sysconfig/cman (or /etc/default/cman on
>> Debian-based systems)
>>
>> # CMAN_QUORUM_TIMEOUT -- amount of time to wait for a quorate cluster on
>> # startup quorum is needed by many other applications, so we may as
>> # well wait here. If CMAN_QUORUM_TIMEOUT is zero, quorum will
>> # be ignored.
>> [ -z "$CMAN_QUORUM_TIMEOUT" ] && CMAN_QUORUM_TIMEOUT=45
>>
>> Setting CMAN_QUORUM_TIMEOUT=0 will simply stop waiting for quorum and
>> continue the execution of the init script.
>
> Sure, but I want to wait for quorum.
>
>> Assuming you want to retain the default behavior, once quorum is gained, it is
>> enough to execute /etc/init.d/cman start again. The script is clever enough to
>> start only what is necessary.
>> You have a good point regarding cmannotifyd. In theory it could be used to
>> trigger a "/etc/init.d/cman start" once quorum is achieved and notification
>> dispatched. I can fix this upstream, but for any RHEL6 changes, I'll need you to
>
> I compile my own packages for Debian, so an upstream fix would be great. I am just unsure
> whether we should call unfence_self() from cmannotifyd. I guess it is OK if we check that
> we got quorum for the first time?
No, you can't call unfencing from cmannotifyd. I honestly don't recall
all the details on why, but one of the reasons is (for example):
- node 1 and node 2 form a cluster
- node 1 starts experiencing network problems
- node 2 fences node 1 with fence_scsi
- node 1 unfences itself in a non-clean state due to cman notifications
of ups/downs
- cluster goes kaboom
> Besides, why do you want the extra complexity of running 'cman start' from cmannotifyd? The error handling in particular is unclear to me (what if 'cman start' fails there?).
Well it's one way to do it.
cmannotifyd (as documented) does not provide error handling itself. The
reason is that you can't really halt all cluster operations because a
bad script is activated by a "random" user via cmannotifyd.
> So can't we simply make those daemons smart enough that we can always start them at boot time?
They are smart enough. You are misreading the comments about wait for
quorum in cman init.
The daemons can be safely started at boot time, even without quorum, but
they can't do anything useful till quorum is achieved. That is why it is
possible to override the wait for quorum.
Most users have requested, and want, to wait for quorum and fail if there
is no quorum, since it really doesn't help to have more daemons running
on top of cman.
So maybe what you want is an option to:
wait for quorum, if there is no quorum after timeout, still allow
everything else to start?
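As a rough sketch, such an option might look like the following. CMAN_QUORUM_FATAL is a made-up setting used only for illustration (it does not exist in the init script), and quorum_wait wraps an assumed cman_tool invocation.

```shell
#!/bin/sh
# Sketch of the proposed behaviour. CMAN_QUORUM_FATAL is a hypothetical knob,
# not an existing setting; the cman_tool flags in quorum_wait are assumed.

quorum_wait() {
    cman_tool -t "$1" -q wait
}

# Returns 0 if startup should continue, 1 if it should abort.
quorum_gate() {
    timeout="${CMAN_QUORUM_TIMEOUT:-45}"
    # Zero keeps its existing meaning: ignore quorum entirely.
    [ "$timeout" -eq 0 ] && return 0
    if quorum_wait "$timeout"; then
        return 0
    fi
    # Timed out: either carry on anyway or fail, depending on the knob.
    if [ "${CMAN_QUORUM_FATAL:-yes}" = "no" ]; then
        echo "no quorum after ${timeout}s, starting remaining daemons anyway" >&2
        return 0
    fi
    echo "no quorum after ${timeout}s, aborting startup" >&2
    return 1
}
```

Defaulting the knob to "yes" keeps the current fail-on-timeout behaviour unless an administrator opts out.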
Fabio
* [Cluster-devel] Question about /etc/init.d/cman start
From: Dietmar Maurer @ 2011-08-29 7:55 UTC (permalink / raw)
To: cluster-devel.redhat.com
> They are smart enough. You are misreading the comments about wait for
> quorum in cman init.
Ah, OK.
> The daemons can be safely started at boot time, even without quorum, but they
> can't do anything useful till quorum is achieved. That is why it is possible to
> override the wait for quorum.
>
> Most users have requested, and want, to wait for quorum and fail if there is no
> quorum, since it really doesn't help to have more daemons running on top of cman.
And they manually log into each node and exec 'cman start' after they have quorum?
> So maybe what you want is an option to:
>
> wait for quorum, if there is no quorum after timeout, still allow everything else
> to start?
Yes, I think that is what I want.
- Dietmar
* [Cluster-devel] Question about /etc/init.d/cman start
From: Fabio M. Di Nitto @ 2011-08-29 9:36 UTC (permalink / raw)
To: cluster-devel.redhat.com
On 08/29/2011 09:55 AM, Dietmar Maurer wrote:
>> They are smart enough. You are misreading the comments about wait for
>> quorum in cman init.
>
> Ah, OK.
>
>> The daemons can be safely started at boot time, even without quorum, but they
>> can't do anything useful till quorum is achieved. That is why it is possible to
>> override the wait for quorum.
>>
>> Most users have requested, and want, to wait for quorum and fail if there is no
>> quorum, since it really doesn't help to have more daemons running on top of cman.
>
> And they manually log into each node and exec 'cman start' after they have quorum?
Yes, generally the fact that quorum is not achieved within N seconds is an
indication of something wrong in the cluster or the hw (for instance
network issues). Users prefer to see an error at that point, rather than
keep executing more daemons that will just make things more confusing on
why it's not working at higher levels (rgmanager or gfs) when the issue
is at the bottom layers.
>
>> So maybe what you want is an option to:
>>
>> wait for quorum, if there is no quorum after timeout, still allow everything else
>> to start?
>
> Yes, I think that is what I want.
Ok, I'll take a look at it once I have some time.
Fabio
* [Cluster-devel] Question about /etc/init.d/cman start
From: Dietmar Maurer @ 2011-08-29 9:51 UTC (permalink / raw)
To: cluster-devel.redhat.com
> Yes, generally the fact that quorum is not achieved within N seconds is an
> indication of something wrong in the cluster or the hw (for instance network
> issues). Users prefer to see an error at that point, rather than keep executing
> more daemons that will just make things more confusing on why it's not working
> at higher levels (rgmanager or gfs) when the issue is at the bottom layers.
Well, maybe that is more reasonable.
> >> So maybe what you want is an option to:
> >>
> >> wait for quorum, if there is no quorum after timeout, still allow
> >> everything else to start?
> >
> > Yes, I think that is what I want.
>
> Ok, I'll take a look at it once I have some time.
Please wait - I will need to run more tests to see if I really need this. Maybe it is not
required at all.
Many thanks for your help.
- Dietmar
* [Cluster-devel] Question about /etc/init.d/cman start
From: Dietmar Maurer @ 2011-08-29 10:11 UTC (permalink / raw)
To: cluster-devel.redhat.com
> Yes, generally the fact that quorum is not achieved within N seconds is an
> indication of something wrong in the cluster or the hw (for instance network
> issues). Users prefer to see an error at that point, rather than keep executing
> more daemons that will just make things more confusing on why it's not working
> at higher levels (rgmanager or gfs) when the issue is at the bottom layers.
That sounds reasonable, but it seems that rgmanager is always started with:
/etc/init.d/rgmanager
So we have rgmanager running, but no dlm_controld and no fenced. Is it expected to
work that way?
- Dietmar
* [Cluster-devel] Question about /etc/init.d/cman start
From: Fabio M. Di Nitto @ 2011-08-29 10:23 UTC (permalink / raw)
To: cluster-devel.redhat.com
On 08/29/2011 12:11 PM, Dietmar Maurer wrote:
>> Yes, generally the fact that quorum is not achieved within N seconds is an
>> indication of something wrong in the cluster or the hw (for instance network
>> issues). Users prefer to see an error at that point, rather than keep executing
>> more daemons that will just make things more confusing on why it's not working
>> at higher levels (rgmanager or gfs) when the issue is at the bottom layers.
>
> That sounds reasonable, but it seems that rgmanager is always started with:
>
> /etc/init.d/rgmanager
>
> So we have rgmanager running, but no dlm_controld and no fenced. Is it expected to
> work that way?
If dlm is not working, rgmanager will exit with error.
rgmanager does not use fenced directly so that's "irrelevant".
Fabio
* [Cluster-devel] Question about /etc/init.d/cman start
From: Dietmar Maurer @ 2011-08-29 10:38 UTC (permalink / raw)
To: cluster-devel.redhat.com
> > So we have rgmanager running, but no dlm_controld and no fenced. Is it
> > expected to work that way?
>
> If dlm is not working, rgmanager will exit with error.
The startup script prints:
rgmanager[XXX]: Waiting for quorum to form
But you are right, it exits after the cluster gets quorum:
Aug 29 12:32:26 5 maui rgmanager[1595]: Quorum formed
Aug 29 12:32:26 3 maui kernel:dlm: no local IP address has been set
Aug 29 12:32:26 3 maui kernel:dlm: cannot start dlm lowcomms -107
I was just confused because it does not exit immediately.
> rgmanager does not use fenced directly so that's "irrelevant".
OK, thanks.
- Dietmar