* Monitor issue
From: Roman Alekseev @ 2012-10-29 14:48 UTC
To: ceph-devel

Hello,

I have 3 monitors on different nodes, and when 'mon.a' was stopped the whole cluster stopped working too.
My conf: http://pastebin.com/hT3qEhUF

Could someone explain how to fix this kind of failure?

--
Kind regards,
R. Alekseev
* Re: Monitor issue
From: Wido den Hollander @ 2012-10-29 14:59 UTC
To: Roman Alekseev; +Cc: ceph-devel

On 10/29/2012 03:48 PM, Roman Alekseev wrote:
> Hello,
>
> I have 3 monitors on different nodes and when 'mon.a' was stopped whole
> cluster stopped work too.
> My conf: http://pastebin.com/hT3qEhUF
>
> Could someone explain how to fix such kind of failure?

Could you explain a bit more about the setup?

Which version are you running?

What do you mean by failure? Is the ceph -s command still working?

How sure are you that you didn't hit a bug that killed all three monitors? Are those processes actually up and running?

Did you check the logs of the monitors?

Could you let us know?

Thanks!

Wido
* Re: Monitor issue
From: Roman Alekseev @ 2012-10-30 6:06 UTC
To: Wido den Hollander; +Cc: ceph-devel

On 29.10.2012 18:59, Wido den Hollander wrote:
> Could you explain a bit more about the setup?
>
> Which version are you running?
>
> What do you mean with failure? Is the ceph -s command still working?
>
> How sure are you that you didn't catch a bug that killed all three
> monitors? Are those processes actually up and running?
>
> Did you check the logs of the monitors?
>
> Could you let us know?
>
> Thanks!
>
> Wido

Hi Wido,

I'm running ceph version 0.48.1argonaut.
The "ceph -s" command doesn't work until I start that monitor again.
By failure I mean that ceph commands (such as ceph -s, ceph -w, ceph mon dump, etc.) don't respond. I've re-added all three mons and found the following situations:

Situation A:
1) mon.a is disabled:
   health HEALTH_WARN 1 mons down, quorum 1,2 b,c (cluster works)
2) mon.b is disabled:
   health HEALTH_WARN 1 mons down, quorum 0,1 a,c (cluster works)
3) mon.c is disabled:
   health HEALTH_WARN 1 mons down, quorum 0,2 a,b (cluster works)

Situation B:
If 2 mons are disabled, the whole cluster stops working.
So the cluster works only when at least 2 monitors are running.

Is that correct?

--
Kind regards,
R. Alekseev
* Re: Monitor issue
From: Joao Eduardo Luis @ 2012-10-30 10:32 UTC
To: Roman Alekseev; +Cc: Wido den Hollander, ceph-devel

On 10/30/2012 06:06 AM, Roman Alekseev wrote:
> Hi Wido,
>
> I'm running ceph version 0.48.1argonaut.
> The "ceph -s" command doesn't work until I start that monitor again.
> Under failure I mean that ceph commands (such as ceph -s , -w, ceph mon
> dump etc) don't respond. I've re-added all three mons and found the
> following situations:
>
> Situation A:
> 1) mon.a is disabled:
> health HEALTH_WARN 1 mons down, quorum 1,2 b,c (cluster works)
> 2) mon.b is disabled:
> health HEALTH_WARN 1 mons down, quorum 0,1 a,c (cluster works)
> 3) mon.c is disabled:
> health HEALTH_WARN 1 mons down, quorum 0,2 a,b (cluster works)
>
> Situation B:
> If 2 mons are disabled all cluster stop working.
> So cluster works only when 2 monitors are running.
>
> Is it correct ?
>

In a nutshell, yes. You need to have a majority of monitors up and running, and in the quorum, for the cluster to work.

So, in 'A', you always have 2 out of the 3 existing monitors up; this means a majority is up and quorum can be formed.
In 'B', however, you have only one monitor up. Given there are 3 monitors in the monmap, you don't have enough monitors to form a quorum, which requires a strict majority: (N+1)/2 for an odd N, or floor(N/2)+1 in general, N being the total number of monitors in your cluster, which is 3 in your case. The same rule applies if you had, say, 5 monitors: you'd need 3 up for the cluster to work; if you only had 2, the monitors wouldn't be able to form a quorum.

And by the way, the cluster doesn't work because, unless you have a healthy monitor cluster with a formed quorum, the monitors that are up and running will basically refuse to answer any kind of request made to them (aside from specific commands made through the admin socket, which are targeted at one specific monitor).

Hope this helps.

-Joao
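
For illustration only, the majority rule described above can be written out as a small Python sketch. The function names (quorum_size, has_quorum, tolerated_failures) are made up for this example and are not part of Ceph or its tooling; the sketch only does the arithmetic and assumes "majority" means strictly more than half of the monitors in the monmap.

```python
def quorum_size(total_monitors: int) -> int:
    """Smallest number of monitors that is a strict majority of the monmap.

    Hypothetical helper for illustration; not a Ceph API.
    """
    if total_monitors < 1:
        raise ValueError("a cluster needs at least one monitor")
    # floor(N/2) + 1; equals (N+1)/2 when N is odd.
    return total_monitors // 2 + 1


def has_quorum(monitors_up: int, total_monitors: int) -> bool:
    """True if the monitors that are up can form a quorum."""
    return monitors_up >= quorum_size(total_monitors)


def tolerated_failures(total_monitors: int) -> int:
    """How many monitors may be down while a quorum is still possible."""
    return total_monitors - quorum_size(total_monitors)


if __name__ == "__main__":
    for total in (3, 4, 5):
        print(
            f"{total} monitors: need {quorum_size(total)} up, "
            f"can lose {tolerated_failures(total)}"
        )
    # Situation A from the thread: 2 of 3 monitors up.
    print("2 of 3 up, quorum possible:", has_quorum(2, 3))
    # Situation B from the thread: 1 of 3 monitors up.
    print("1 of 3 up, quorum possible:", has_quorum(1, 3))
```

Note that going from 3 to 4 monitors does not let you lose any more of them (3 of 4 must still be up), which is one reason odd monitor counts are the usual choice.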