* mds'es stuck in resolve(and one duplicated?)
@ 2012-09-10 21:36 Andrew Thompson
  2012-09-13 18:37 ` Gregory Farnum
From: Andrew Thompson @ 2012-09-10 21:36 UTC
  To: ceph-devel

[-- Attachment #1: Type: text/plain, Size: 1304 bytes --]

Greetings,

Has anyone seen this or got ideas on how to fix it?

mdsmap e18399: 3/3/3 up {0=b=up:resolve,1=a=up:resolve(laggy or crashed),2=a=up:resolve(laggy or crashed)}

Notice that the 2nd and 3rd mds have the same letter ("a"). I'm not sure
how that happened; I'm guessing a typo in my ceph.conf.

Taking mds.a down doesn't help; b just stays in resolve.

mds.a is only running on a single instance, even though it shows as up 
twice.

When I take an mds down and start it back up, it goes through a couple
of states and then sticks at resolve.

I've tried the method listed here, but can't see any change: 
http://www.sebastien-han.fr/blog/2012/07/04/remove-a-mds-server-from-a-ceph-cluster/

I tried "ceph mds stop X" as mentioned here:
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/2585
but got the results below:

athompson@ceph01:~$ sudo ceph mds stop 0
mds.0 not active (up:resolve)
athompson@ceph01:~$ sudo ceph mds stop 1
mds.1 not active (up:resolve)
athompson@ceph01:~$ sudo ceph mds stop 2
mds.2 not active (up:resolve)

I've attached the results of `ceph mds dump -o -`.

Currently, mds.b.log is full of these reset/connect messages, followed by
the point where I issued a `service ceph stop mds` a few minutes ago
(see attached).

Thanks,
Andrew.

-- 
Andrew Thompson
http://aktzero.com/


[-- Attachment #2: mds-dump.txt --]
[-- Type: text/plain, Size: 847 bytes --]

athompson@ceph01:~$ sudo ceph mds dump -o -
dumped mdsmap epoch 18493
epoch   18493
flags   0
created 2012-08-10 16:25:06.747103
modified        2012-09-10 17:29:20.826226
tableserver     0
root    0
session_timeout 60
session_autoclose       300
last_failure    3430
last_failure_osd_epoch  426
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object}
max_mds 3
in      0,1,2
up      {0=5401,1=5524,2=5506}
failed
stopped
data_pools      [0,0]
metadata_pool   1
5401:   172.19.7.54:6800/13793 'b' mds.0.9 up:resolve seq 149 laggy since 2012-09-10 17:21:05.270280
5524:   172.19.7.39:6800/8536 'a' mds.1.11 up:resolve seq 4 laggy since 2012-09-08 02:52:20.668649
5506:   172.19.7.39:6800/7930 'a' mds.2.3 up:resolve seq 5 laggy since 2012-09-08 02:48:05.433724
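
The duplicated name can be picked out of the per-daemon lines of this dump
mechanically. Below is a minimal Python sketch, assuming the line layout
shown in this attachment (gid, address, quoted name, mds.<rank>.<n>, state).
The regex, the function names, and the reading of the trailing number in
mds.<rank>.<n> as an incarnation counter are assumptions made for
illustration; none of this is part of any Ceph tooling.

```python
import re
from collections import defaultdict

# Per-daemon lines copied from the dump above.
DUMP = """\
5401:   172.19.7.54:6800/13793 'b' mds.0.9 up:resolve seq 149 laggy since 2012-09-10 17:21:05.270280
5524:   172.19.7.39:6800/8536 'a' mds.1.11 up:resolve seq 4 laggy since 2012-09-08 02:52:20.668649
5506:   172.19.7.39:6800/7930 'a' mds.2.3 up:resolve seq 5 laggy since 2012-09-08 02:48:05.433724
"""

# Assumed layout: "<gid>: <addr> '<name>' mds.<rank>.<inc> <state> ..."
ENTRY = re.compile(
    r"^(?P<gid>\d+):\s+(?P<addr>\S+)\s+'(?P<name>[^']+)'\s+"
    r"mds\.(?P<rank>\d+)\.(?P<inc>\d+)\s+(?P<state>\S+)"
)


def ranks_by_name(dump_text):
    """Map each daemon name to the (rank, address) pairs it appears with."""
    by_name = defaultdict(list)
    for line in dump_text.splitlines():
        m = ENTRY.match(line)
        if m:
            by_name[m.group("name")].append((int(m.group("rank")), m.group("addr")))
    return dict(by_name)


def duplicated_names(dump_text):
    """Return only the names that hold more than one rank."""
    return {n: e for n, e in ranks_by_name(dump_text).items() if len(e) > 1}


if __name__ == "__main__":
    for name, entries in duplicated_names(DUMP).items():
        print(name, entries)
```

On the lines above this reports name 'a' holding ranks 1 and 2, both at
172.19.7.39:6800 but with different values after the slash (8536 and 7930).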


[-- Attachment #3: ceph-mds.b.log --]
[-- Type: text/plain, Size: 1158 bytes --]

2012-09-10 16:54:23.595995 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.56:6800/8509
2012-09-10 16:54:23.598638 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.56:6800/8509
2012-09-10 17:09:09.367041 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.39:6804/6522
2012-09-10 17:09:09.370663 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.39:6804/6522
2012-09-10 17:09:22.891795 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.39:6801/6430
2012-09-10 17:09:22.894177 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.39:6801/6430
2012-09-10 17:09:23.210881 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.54:6801/14003
2012-09-10 17:09:23.214310 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.54:6801/14003
2012-09-10 17:09:23.699220 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.56:6800/8509
2012-09-10 17:09:23.701789 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.56:6800/8509
2012-09-10 17:21:28.125699 7f843cd5c700 -1 mds.0.9 *** got signal Terminated ***
2012-09-10 17:21:28.125755 7f843cd5c700  1 mds.0.9 suicide.  wanted down:dne, now up:resolve
2012-09-10 17:21:28.386805 7f84422a6780  0 stopped.
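
To see which peers the reset/connect pairs involve, the log lines can be
tallied per peer address. This is a small Python sketch, assuming the
message format shown in this attachment; the regex and function name are
illustrative only.

```python
import re
from collections import Counter

# ms_handle_* lines copied from the log above.
LOG = """\
2012-09-10 16:54:23.595995 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.56:6800/8509
2012-09-10 16:54:23.598638 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.56:6800/8509
2012-09-10 17:09:09.367041 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.39:6804/6522
2012-09-10 17:09:09.370663 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.39:6804/6522
2012-09-10 17:09:22.891795 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.39:6801/6430
2012-09-10 17:09:22.894177 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.39:6801/6430
2012-09-10 17:09:23.210881 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.54:6801/14003
2012-09-10 17:09:23.214310 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.54:6801/14003
2012-09-10 17:09:23.699220 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.56:6800/8509
2012-09-10 17:09:23.701789 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.56:6800/8509
"""

# Capture the peer address that follows "ms_handle_reset on".
RESET = re.compile(r"ms_handle_reset on (\S+)")


def resets_per_peer(log_text):
    """Count ms_handle_reset events per peer address."""
    return Counter(RESET.findall(log_text))


if __name__ == "__main__":
    for peer, count in resets_per_peer(LOG).most_common():
        print(count, peer)
```

In this excerpt the resets are spread over four peer addresses on three
hosts, with only 172.19.7.56:6800/8509 appearing twice.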
