ceph does not work

* ceph does not work
@ 2012-02-23  9:15 Дениска-редиска
  2012-02-23 19:00 ` Tommi Virtanen
  0 siblings, 1 reply; 7+ messages in thread
From: Дениска-редиска @ 2012-02-23  9:15 UTC (permalink / raw)
  To: ceph-devel

Hello there,

I have tried to set up Ceph 0.41 in a simple configuration:
3 nodes, each running mon, mds & osd, with replication level 3 for the data & metadata pools.
Each node mounts Ceph locally via ceph-fuse.
The cluster seems to run well until one of the nodes goes down for a simple reboot.
Then all mount points become inaccessible, data transfer hangs, and the cluster stops working.

What is the purpose of the Ceph software if such a simple case does not work?

* Re: ceph does not work
  2012-02-23  9:15 ceph does not work Дениска-редиска
@ 2012-02-23 19:00 ` Tommi Virtanen
  2012-02-23 19:07   ` Gregory Farnum
  2012-02-23 19:09   ` Sage Weil
  0 siblings, 2 replies; 7+ messages in thread
From: Tommi Virtanen @ 2012-02-23 19:00 UTC (permalink / raw)
  To: Дениска-редиска
  Cc: ceph-devel

On Thu, Feb 23, 2012 at 01:15, Дениска-редиска <slim@inbox.lv> wrote:
> Hello there,
>
> I have tried to set up Ceph 0.41 in a simple configuration:
> 3 nodes, each running mon, mds & osd, with replication level 3 for the data & metadata pools.
> Each node mounts Ceph locally via ceph-fuse.
> The cluster seems to run well until one of the nodes goes down for a simple reboot.
> Then all mount points become inaccessible, data transfer hangs, and the cluster stops working.
>
> What is the purpose of the Ceph software if such a simple case does not work?

You have a replication factor of 3, and 3 OSDs. If one of them is
down, the replication factor of 3 cannot be satisfied anymore. You
need either more nodes, or a smaller replication factor.

Ceph is not an eventually consistent system; building a POSIX
filesystem on top of one is pretty much impossible. With Ceph, all
replicas are always kept up to date.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: ceph does not work
  2012-02-23 19:00 ` Tommi Virtanen
@ 2012-02-23 19:07   ` Gregory Farnum
  2012-02-23 19:11     ` Tommi Virtanen
  2012-02-23 19:09   ` Sage Weil
  1 sibling, 1 reply; 7+ messages in thread
From: Gregory Farnum @ 2012-02-23 19:07 UTC (permalink / raw)
  To: Tommi Virtanen
  Cc: Дениска-редиска,
	ceph-devel

On Thu, Feb 23, 2012 at 11:00 AM, Tommi Virtanen
<tommi.virtanen@dreamhost.com> wrote:
> On Thu, Feb 23, 2012 at 01:15, Дениска-редиска <slim@inbox.lv> wrote:
>> Hello there,
>>
>> I have tried to set up Ceph 0.41 in a simple configuration:
>> 3 nodes, each running mon, mds & osd, with replication level 3 for the data & metadata pools.
>> Each node mounts Ceph locally via ceph-fuse.
>> The cluster seems to run well until one of the nodes goes down for a simple reboot.
>> Then all mount points become inaccessible, data transfer hangs, and the cluster stops working.
>>
>> What is the purpose of the Ceph software if such a simple case does not work?
>
> You have a replication factor of 3, and 3 OSDs. If one of them is
> down, the replication factor of 3 cannot be satisfied anymore. You
> need either more nodes, or a smaller replication factor.
>
> Ceph is not an eventually consistent system; building a POSIX
> filesystem on top of one is pretty much impossible. With Ceph, all
> replicas are always kept up to date.

Actually the OSDs will happily (well, not happily; they will complain.
But they will run) run in degraded mode. However, if you have 3 active
MDSes and you kill one of them without a standby available, you will
lose access to part of your tree. That's probably what happened
here...
-Greg

* Re: ceph does not work
  2012-02-23 19:00 ` Tommi Virtanen
  2012-02-23 19:07   ` Gregory Farnum
@ 2012-02-23 19:09   ` Sage Weil
  1 sibling, 0 replies; 7+ messages in thread
From: Sage Weil @ 2012-02-23 19:09 UTC (permalink / raw)
  To: Tommi Virtanen
  Cc: Дениска-редиска,
	ceph-devel

On Thu, 23 Feb 2012, Tommi Virtanen wrote:
> On Thu, Feb 23, 2012 at 01:15, Дениска-редиска <slim@inbox.lv> wrote:
> > Hello there,
> >
> > I have tried to set up Ceph 0.41 in a simple configuration:
> > 3 nodes, each running mon, mds & osd, with replication level 3 for the data & metadata pools.
> > Each node mounts Ceph locally via ceph-fuse.
> > The cluster seems to run well until one of the nodes goes down for a simple reboot.
> > Then all mount points become inaccessible, data transfer hangs, and the cluster stops working.
> >
> > What is the purpose of the Ceph software if such a simple case does not work?
> 
> You have a replication factor of 3, and 3 OSDs. If one of them is
> down, the replication factor of 3 cannot be satisfied anymore. You
> need either more nodes, or a smaller replication factor.
> 
> Ceph is not an eventually consistent system; building a POSIX
> filesystem on top of one is pretty much impossible. With Ceph, all
> replicas are always kept up to date.

Just to clarify: what should have happened is that after a few seconds (20
by default?) the stopped ceph-osd is marked down and life continues with 2 
replicas.  'ceph -s' or 'ceph health' will report some PGs in the 
'degraded' state.
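
As a rough illustration of the timeout described above, here is a minimal
Python sketch of a grace-period check. It assumes a 20-second grace (the
figure guessed above, not a confirmed default), and the names
HEARTBEAT_GRACE and peer_looks_down are hypothetical; this is not Ceph's
actual code.

```python
from datetime import datetime, timedelta

# Assumption: 20 s grace, taken from the "20 by default?" guess above.
HEARTBEAT_GRACE = timedelta(seconds=20)


def peer_looks_down(last_heartbeat: datetime, now: datetime,
                    grace: timedelta = HEARTBEAT_GRACE) -> bool:
    """Return True when the last heartbeat is older than (now - grace)."""
    cutoff = now - grace
    return last_heartbeat < cutoff


# Hypothetical timestamps for illustration.
last = datetime(2012, 2, 24, 13, 11, 14, 122485)
print(peer_looks_down(last, datetime(2012, 2, 24, 13, 11, 20)))  # False: within grace
print(peer_looks_down(last, datetime(2012, 2, 24, 13, 11, 35)))  # True: past grace
```

Once a peer is past the cutoff it would be reported and eventually marked
down, after which I/O should continue on the remaining replicas.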

sage

* Re: ceph does not work
  2012-02-23 19:07   ` Gregory Farnum
@ 2012-02-23 19:11     ` Tommi Virtanen
  2012-02-24 11:33       ` Дениска-редиска
  0 siblings, 1 reply; 7+ messages in thread
From: Tommi Virtanen @ 2012-02-23 19:11 UTC (permalink / raw)
  To: Gregory Farnum
  Cc: Дениска-редиска,
	ceph-devel

On Thu, Feb 23, 2012 at 11:07, Gregory Farnum
<gregory.farnum@dreamhost.com> wrote:
>>> 3 nodes, each running mon, mds & osd with replication level 3 for data & metadata pools.
...
> Actually the OSDs will happily (well, not happily; they will complain.
> But they will run) run in degraded mode. However, if you have 3 active
> MDSes and you kill one of them without a standby available, you will
> lose access to part of your tree. That's probably what happened
> here...

So let's try that angle. Slim, can you share the output of "ceph -s" with us?

* Re: ceph does not work
  2012-02-23 19:11     ` Tommi Virtanen
@ 2012-02-24 11:33       ` Дениска-редиска
  2012-02-24 16:47         ` Gregory Farnum
  0 siblings, 1 reply; 7+ messages in thread
From: Дениска-редиска @ 2012-02-24 11:33 UTC (permalink / raw)
  To: Tommi Virtanen, ceph-devel; +Cc: Gregory Farnum

running cluster of 3 nodes:

lv-test-2 ~ # ceph -s            
2012-02-24 13:10:35.481248    pg v726: 594 pgs: 594 active+clean; 120 MB data, 683 MB used, 35448 MB / 37967 MB avail
2012-02-24 13:10:35.484463   mds e177: 3/3/3 up {0=shark1=up:active,1=lv-test-1=up:active,2=lv-test-2=up:active}
2012-02-24 13:10:35.484529   osd e64: 3 osds: 3 up, 3 in
2012-02-24 13:10:35.484630   log 2012-02-24 13:09:50.009333 osd.1 10.0.1.246:6801/3929 29 : [INF] 2.5d scrub ok
2012-02-24 13:10:35.484907   mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

mounting by fuse:
lv-test1 ~ # mount
ceph-fuse on /uploads type fuse.ceph-fuse (rw,nosuid,nodev,allow_other,default_permissions)

simulating write:
lv-test-1 ~ # cp -r /usr/src/linux-3.2.2-hardened-r1/ /uploads/

killing one node:
lv-test-2 ~ # killall ceph-mon ceph-mds  ceph-osd
Feb 24 13:11:17 lv-test-2 mon.lv-test-2[3474]: *** Caught signal (Terminated) **
Feb 24 13:11:17 lv-test-2 in thread 3195ce76760. Shutting down.
Feb 24 13:11:17 lv-test-2 mds.lv-test-2[3553]: *** Caught signal (Terminated) **
Feb 24 13:11:17 lv-test-2 in thread 2ee100bb760. Shutting down.
Feb 24 13:11:17 lv-test-2 osd.2[3654]: *** Caught signal (Terminated) **
Feb 24 13:11:17 lv-test-2 in thread 28f75487760. Shutting down.
Feb 24 13:11:35 lv-test-2 client.admin[3885]: 3a6a385b700 monclient: hunting for new mon
Feb 24 13:11:35 lv-test-2 client.admin[3885]: 3a6a385b700 client.5017 ms_handle_reset on 10.0.1.246:6789/0

Feb 24 13:11:17 lv-test-1 mon.lv-test-1[3751]: 2d62330a700 -- 10.0.1.246:6789/0 >> 10.0.1.247:6789/0 pipe(0x522b9ba080 sd=9 pgs=37 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:17 lv-test-1 mds.lv-test-1[3830]: 2e3b9bd0700 -- 10.0.1.246:6800/3829 >> 10.0.1.247:6800/3552 pipe(0x5faea44c0 sd=13 pgs=13 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:17 lv-test-1 client.admin[3151]: 2a9fbe55700 -- 10.0.1.246:0/3151 >> 10.0.1.247:6800/3552 pipe(0x2a9ec00f560 sd=0 pgs=9 cs=3 l=0).fault with nothing to send, going to standby
Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset()
Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset() s=0x22b4580700
Feb 24 13:11:17 lv-test-1 client.admin[3151]: 2a9fd95b700 client.4617 ms_handle_reset on 10.0.1.247:6801/3653
Feb 24 13:11:17 lv-test-1 osd.1[3930]: 2a510df7700 -- 10.0.1.246:6803/3929 >> 10.0.1.247:0/3654 pipe(0x2a50c005000 sd=24 pgs=4 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:18 lv-test-1 osd.1[3930]: 2a5184e6700 -- 10.0.1.246:6802/3929 >> 10.0.1.247:6802/3653 pipe(0x2a5145e9c50 sd=19 pgs=3 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:18 lv-test-1 mds.lv-test-1[3830]: 2e3bcadc700 mds.1.5 ms_handle_reset on 10.0.1.247:6801/3653
Feb 24 13:11:18 lv-test-1 osd.1[3930]: 2a5183e5700 -- 10.0.1.246:0/3930 >> 10.0.1.247:6803/3653 pipe(0x2a5145eaeb0 sd=20 pgs=16 cs=1 l=0).fault with nothing to send, going to standby
Feb 24 13:11:34 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:14.366355)
Feb 24 13:11:34 lv-test-1 osd.1[3930]: 2a5117fa700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:14.826382)
Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:15.369660)
Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a51b9f1700 monclient: hunting for new mon
Feb 24 13:11:35 lv-test-1 osd.1[3930]: 2a51b9f1700 osd.1 64 OSD::ms_handle_reset()
Feb 24 13:11:36 lv-test-1 osd.1[3930]: 2a5117fa700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:16.129635)
Feb 24 13:11:36 lv-test-1 osd.1[3930]: 2a5229ff700 osd.1 64 heartbeat_check: no heartbeat from osd.0 since 2012-02-24 13:11:14.122485 (cutoff 2012-02-24 13:11:16.372900)

The copy hangs (it cannot be killed even with kill -9), and /uploads is not accessible.

lv-test-1 ~ # time ceph -s

^C*** Caught signal (Interrupt) **
 in thread 2da010af760. Shutting down.


real    3m16.481s
user    0m0.037s
sys     0m0.013s

lv-test-2 ~ # time ceph -s 


^C*** Caught signal (Interrupt) **
 in thread 314b193c760. Shutting down.


real    0m35.401s
user    0m0.017s
sys     0m0.007s
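
When the monitors do not answer, "ceph -s" simply blocks, as the timings
above show. A minimal Python sketch for bounding that wait in a script
follows. run_with_timeout is a hypothetical helper, not part of Ceph, and
the 30-second limit in the comment is an arbitrary assumption.

```python
import subprocess
from typing import List, Optional


def run_with_timeout(cmd: List[str], timeout_s: float) -> Optional[str]:
    """Run cmd and return its stdout, or None if it exceeds timeout_s.

    subprocess.run() kills the child process when the timeout expires.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


# Example call (needs the ceph CLI on PATH, so it is left commented out):
# status = run_with_timeout(["ceph", "-s"], 30.0)
# print("no answer from the monitors" if status is None else status)
```

A None result here only says the command did not return in time; it does
not say which daemon is at fault.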

So the cluster has hung and is not responding anymore.

Let's bring the killed node back up:

lv-test-2 ~ # /etc/init.d/ceph restart
lv-test-2 ~ # ceph -s
2012-02-24 13:20:01.996366    pg v734: 594 pgs: 594 active+clean; 120 MB data, 683 MB used, 35448 MB / 37967 MB avail
2012-02-24 13:20:01.999207   mds e177: 3/3/3 up {0=shark1=up:active,1=lv-test-1=up:active,2=lv-test-2=up:active}
2012-02-24 13:20:01.999268   osd e64: 3 osds: 3 up, 3 in
2012-02-24 13:20:01.999368   log 2012-02-24 13:11:02.267947 osd.1 10.0.1.246:6801/3929 41 : [INF] 2.89 scrub ok
2012-02-24 13:20:01.999612   mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

lv-test-2 ~ # ceph -s
2012-02-24 13:20:44.984214    pg v742: 594 pgs: 594 active+clean; 144 MB data, 714 MB used, 35417 MB / 37967 MB avail
2012-02-24 13:20:44.986505   mds e182: 3/3/3 up {0=lv-test-2=up:resolve,1=lv-test-1=up:active,2=lv-test-2=up:active(laggy or crashed)}
2012-02-24 13:20:44.986697   osd e68: 3 osds: 1 up, 3 in
2012-02-24 13:20:44.986918   log 2012-02-24 13:20:42.606730 mon.1 10.0.1.246:6789/0 27 : [INF] mds.1 10.0.1.246:6800/3829 up:active
2012-02-24 13:20:44.987118   mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

Feb 24 13:23:28 lv-test-2 mds.lv-test-2[4608]: 2a93085f700 -- 10.0.1.247:6800/4607 >> 10.0.1.247:6800/3552 pipe(0x19d6f37b40 sd=12 pgs=0 cs=0 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!

Feb 24 13:24:13 lv-test-1 client.admin[3151]: 2a9fc158700 -- 10.0.1.246:0/3151 >> 10.0.1.247:6800/3552 pipe(0x2a9ec00f560 sd=0 pgs=9 cs=4 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!
Feb 24 13:24:14 lv-test-1 mds.lv-test-1[3830]: 2e3b981b700 -- 10.0.1.246:6800/3829 >> 10.0.1.247:6800/3552 pipe(0x5faea44c0 sd=8 pgs=13 cs=2 l=0).connect claims to be 10.0.1.247:6800/4607 not 10.0.1.247:6800/3552 - wrong node!


lv-test-1 ~ # ceph -s
2012-02-24 13:24:36.558927    pg v762: 594 pgs: 594 active+clean; 144 MB data, 741 MB used, 35390 MB / 37967 MB avail
2012-02-24 13:24:36.560927   mds e195: 3/3/3 up {0=lv-test-2=up:resolve,1=lv-test-1=up:active,2=lv-test-2=up:active(laggy or crashed)}
2012-02-24 13:24:36.560988   osd e70: 3 osds: 2 up, 3 in
2012-02-24 13:24:36.561092   log 2012-02-24 13:24:29.691540 osd.2 10.0.1.247:6801/4706 17 : [INF] 0.77 scrub ok
2012-02-24 13:24:36.561201   mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}


The mount point is still inaccessible. That's all, and that sucks.


Is there a proven way to build a cluster of 3 nodes with replication that tolerates the shutdown of two nodes without read/write lockups?



Quoting "Tommi Virtanen" <tommi.virtanen@dreamhost.com>:
> On Thu, Feb 23, 2012 at 11:07, Gregory Farnum
> <gregory.farnum@dreamhost.com> wrote:
>>>> 3 nodes, each running mon, mds & osd with replication level 3 for data & metadata pools.
> ...
>> Actually the OSDs will happily (well, not happily; they will complain.
>> But they will run) run in degraded mode. However, if you have 3 active
>> MDSes and you kill one of them without a standby available, you will
>> lose access to part of your tree. That's probably what happened
>> here...
>
> So let's try that angle. Slim, can you share the output of "ceph -s" with us?

* Re: ceph does not work
  2012-02-24 11:33       ` Дениска-редиска
@ 2012-02-24 16:47         ` Gregory Farnum
  0 siblings, 0 replies; 7+ messages in thread
From: Gregory Farnum @ 2012-02-24 16:47 UTC (permalink / raw)
  To: Дениска-редиска
  Cc: Tommi Virtanen, ceph-devel@vger.kernel.org

On Feb 24, 2012, at 3:33 AM, "Дениска-редиска" <slim@inbox.lv> wrote:

> lv-test-1 ~ # ceph -s
> 2012-02-24 13:24:36.558927    pg v762: 594 pgs: 594 active+clean; 144 MB data, 741 MB used, 35390 MB / 37967 MB avail
> 2012-02-24 13:24:36.560927   mds e195: 3/3/3 up {0=lv-test-2=up:resolve,1=lv-test-1=up:active,2=lv-test-2=up:active(laggy or crashed)}
> 2012-02-24 13:24:36.560988   osd e70: 3 osds: 2 up, 3 in
> 2012-02-24 13:24:36.561092   log 2012-02-24 13:24:29.691540 osd.2 10.0.1.247:6801/4706 17 : [INF] 0.77 scrub ok
> 2012-02-24 13:24:36.561201   mon e1: 3 mons at {lv-test-1=10.0.1.246:6789/0,lv-test-2=10.0.1.247:6789/0,shark1=10.0.1.81:6789/0}

Okay, so you can see here that one MDS is active, one is in the
"resolve" state, and another one has apparently crashed. If you have
logs or core dumps that you can send us we'd appreciate it, but in the
meantime: the Ceph distributed filesystem is not yet production-ready,
and a system with multiple active MDSes is significantly less stable
and less well-tested. If you try using one active MDS and leave the rest in
standby you will almost certainly see better results (and you're
unlikely to be bottlenecked by it). :)
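
To make the MDS states above easier to spot, here is a minimal Python
sketch that pulls the per-rank states out of the "mds" line of the
"ceph -s" output quoted above. The function names are hypothetical, and the
line format is assumed to match what this thread shows; other Ceph
versions may print it differently.

```python
import re
from typing import Dict, Tuple

RankMap = Dict[int, Tuple[str, str]]


def parse_mds_ranks(line: str) -> RankMap:
    """Map each MDS rank to (daemon name, state) from a 'ceph -s' mds line."""
    body = re.search(r"\{(.*)\}", line)
    if body is None:
        return {}
    ranks: RankMap = {}
    for entry in body.group(1).split(","):
        # Each entry looks like "rank=name=state".
        rank, name, state = entry.split("=", 2)
        ranks[int(rank)] = (name, state)
    return ranks


def unhealthy_ranks(ranks: RankMap) -> RankMap:
    """Keep only the ranks whose state is not exactly 'up:active'."""
    return {r: v for r, v in ranks.items() if v[1] != "up:active"}


line = ("mds e195: 3/3/3 up {0=lv-test-2=up:resolve,1=lv-test-1=up:active,"
        "2=lv-test-2=up:active(laggy or crashed)}")
for rank, (name, state) in sorted(unhealthy_ranks(parse_mds_ranks(line)).items()):
    print(rank, name, state)
```

For the line above this lists ranks 0 and 2, the two that are not cleanly
active.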

Also, one of your OSDs is down, and if that crashed it's a much bigger
concern to us right now...can you check the log and see what it says?
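
For the OSD side, a similar minimal Python sketch can count how many OSDs
are down from the "osd" summary line quoted above. OSD_SUMMARY and
osds_down are hypothetical names, and the regular expression assumes the
line format shown in this thread.

```python
import re
from typing import Optional

# Matches e.g. "osd e70: 3 osds: 2 up, 3 in" (format as seen in this thread).
OSD_SUMMARY = re.compile(r"osd e(\d+): (\d+) osds: (\d+) up, (\d+) in")


def osds_down(line: str) -> Optional[int]:
    """Return total OSDs minus OSDs reported up, or None if the line does not match."""
    match = OSD_SUMMARY.search(line)
    if match is None:
        return None
    total, up = int(match.group(2)), int(match.group(3))
    return total - up


print(osds_down("2012-02-24 13:24:36.560988   osd e70: 3 osds: 2 up, 3 in"))
```

A non-zero result is the cue to look at the log of the OSD that did not
come back.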
-Greg
