From: Andrew Thompson <andrewkt@aktzero.com>
To: ceph-devel@vger.kernel.org
Subject: mds'es stuck in resolve(and one duplicated?)
Date: Mon, 10 Sep 2012 17:36:00 -0400
Message-ID: <504E5D40.10006@aktzero.com>

[-- Attachment #1: Type: text/plain, Size: 1304 bytes --]

Greetings,

Has anyone seen this, or does anyone have ideas on how to fix it?

mdsmap e18399: 3/3/3 up {0=b=up:resolve,1=a=up:resolve(laggy or crashed),2=a=up:resolve(laggy or crashed)}

Notice that ranks 1 and 2 (the second and third MDS) have the same name
("a"). I'm not sure how that happened; I'm guessing a typo in my ceph.conf.
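
To make the duplicate easier to spot, here is a minimal Python sketch that maps each daemon name to the ranks it is shown holding. It assumes the `{rank=name=state,...}` layout of the status line above; the function names are made up for illustration and are not part of any Ceph tool.

```python
import re
from collections import defaultdict

# Status line copied from the message above (rejoined onto one line).
STATUS = ("mdsmap e18399: 3/3/3 up {0=b=up:resolve,1=a=up:resolve(laggy or crashed),"
          "2=a=up:resolve(laggy or crashed)}")

# Each entry inside the braces looks like <rank>=<name>=<state>; the state may
# carry a parenthesised note such as "(laggy or crashed)".
ENTRY = re.compile(r"(\d+)=([^=,{}]+)=([^,{}]+)")


def ranks_by_name(status: str) -> dict[str, list[int]]:
    """Map each MDS daemon name to the list of ranks it appears on."""
    body = status[status.index("{") + 1 : status.rindex("}")]
    names: dict[str, list[int]] = defaultdict(list)
    for rank, name, _state in ENTRY.findall(body):
        names[name].append(int(rank))
    return dict(names)


def duplicated_names(status: str) -> dict[str, list[int]]:
    """Keep only the names that appear on more than one rank."""
    return {n: r for n, r in ranks_by_name(status).items() if len(r) > 1}


print(duplicated_names(STATUS))  # {'a': [1, 2]}
```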

Taking mds.a down doesn't help; mds.b just stays in resolve.

Only a single mds.a daemon is running, even though it shows up twice in
the map.

When I take an MDS down and start it back up, it goes through a couple
of states and then gets stuck in resolve.

I've tried the method described here, but I don't see any change:
http://www.sebastien-han.fr/blog/2012/07/04/remove-a-mds-server-from-a-ceph-cluster/

I also tried `ceph mds stop X` as suggested here:
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/2585
but got the results below:

athompson@ceph01:~$ sudo ceph mds stop 0
mds.0 not active (up:resolve)
athompson@ceph01:~$ sudo ceph mds stop 1
mds.1 not active (up:resolve)
athompson@ceph01:~$ sudo ceph mds stop 2
mds.2 not active (up:resolve)

I've attached the output of `ceph mds dump -o -`.

Currently, mds.b.log is full of ms_handle_reset/ms_handle_connect pairs,
followed by the shutdown from when I issued `service ceph stop mds` a few
minutes ago (see attached).

Thanks,
Andrew.

-- 
Andrew Thompson
http://aktzero.com/


[-- Attachment #2: mds-dump.txt --]
[-- Type: text/plain, Size: 847 bytes --]

athompson@ceph01:~$ sudo ceph mds dump -o -
dumped mdsmap epoch 18493
epoch   18493
flags   0
created 2012-08-10 16:25:06.747103
modified        2012-09-10 17:29:20.826226
tableserver     0
root    0
session_timeout 60
session_autoclose       300
last_failure    3430
last_failure_osd_epoch  426
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object}
max_mds 3
in      0,1,2
up      {0=5401,1=5524,2=5506}
failed
stopped
data_pools      [0,0]
metadata_pool   1
5401:   172.19.7.54:6800/13793 'b' mds.0.9 up:resolve seq 149 laggy since 2012-09-10 17:21:05.270280
5524:   172.19.7.39:6800/8536 'a' mds.1.11 up:resolve seq 4 laggy since 2012-09-08 02:52:20.668649
5506:   172.19.7.39:6800/7930 'a' mds.2.3 up:resolve seq 5 laggy since 2012-09-08 02:48:05.433724
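
The dump carries more detail than the status line. A minimal sketch, assuming the `<gid>: <ip>:<port>/<nonce> '<name>' mds.<rank>.<n> <state>` layout of the three lines above (the helper names and the `MdsEntry` type are hypothetical), that groups entries by name and ip:port. It shows that both 'a' entries share 172.19.7.39:6800 and differ only in the number after the slash.

```python
import re
from collections import defaultdict
from typing import NamedTuple

# The three daemon lines from the `ceph mds dump -o -` output above.
DUMP = """\
5401:   172.19.7.54:6800/13793 'b' mds.0.9 up:resolve seq 149 laggy since 2012-09-10 17:21:05.270280
5524:   172.19.7.39:6800/8536 'a' mds.1.11 up:resolve seq 4 laggy since 2012-09-08 02:52:20.668649
5506:   172.19.7.39:6800/7930 'a' mds.2.3 up:resolve seq 5 laggy since 2012-09-08 02:48:05.433724
"""

# Assumed layout: <gid>: <ip>:<port>/<nonce> '<name>' mds.<rank>.<n> <state> ...
LINE = re.compile(
    r"^(\d+):\s+([\d.]+):(\d+)/(\d+)\s+'([^']+)'\s+mds\.(\d+)\.(\d+)\s+(\S+)"
)


class MdsEntry(NamedTuple):
    gid: int
    host: str
    port: int
    nonce: int
    name: str
    rank: int
    state: str


def parse_dump(text: str) -> list[MdsEntry]:
    """Parse every line that matches the assumed daemon-line layout."""
    entries = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            gid, host, port, nonce, name, rank, _n, state = m.groups()
            entries.append(MdsEntry(int(gid), host, int(port), int(nonce),
                                    name, int(rank), state))
    return entries


def same_endpoint_groups(entries: list[MdsEntry]) -> dict:
    """Group entries by (name, host, port) and keep groups with several entries."""
    groups = defaultdict(list)
    for e in entries:
        groups[(e.name, e.host, e.port)].append(e)
    return {k: v for k, v in groups.items() if len(v) > 1}


for (name, host, port), es in same_endpoint_groups(parse_dump(DUMP)).items():
    print(name, f"{host}:{port}", "ranks", [e.rank for e in es],
          "nonces", [e.nonce for e in es])
# a 172.19.7.39:6800 ranks [1, 2] nonces [8536, 7930]
```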


[-- Attachment #3: ceph-mds.b.log --]
[-- Type: text/plain, Size: 1158 bytes --]

2012-09-10 16:54:23.595995 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.56:6800/8509
2012-09-10 16:54:23.598638 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.56:6800/8509
2012-09-10 17:09:09.367041 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.39:6804/6522
2012-09-10 17:09:09.370663 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.39:6804/6522
2012-09-10 17:09:22.891795 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.39:6801/6430
2012-09-10 17:09:22.894177 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.39:6801/6430
2012-09-10 17:09:23.210881 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.54:6801/14003
2012-09-10 17:09:23.214310 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.54:6801/14003
2012-09-10 17:09:23.699220 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.56:6800/8509
2012-09-10 17:09:23.701789 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.56:6800/8509
2012-09-10 17:21:28.125699 7f843cd5c700 -1 mds.0.9 *** got signal Terminated ***
2012-09-10 17:21:28.125755 7f843cd5c700  1 mds.0.9 suicide.  wanted down:dne, now up:resolve
2012-09-10 17:21:28.386805 7f84422a6780  0 stopped.
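
A minimal sketch for summarising the reset/connect churn in mds.b.log. It relies only on the `ms_handle_reset on <addr>` and `ms_handle_connect on <addr>` text visible above; the function name is hypothetical. It counts resets per peer address and reports any peer whose latest reset was not followed by a connect.

```python
import re
from collections import Counter

# The ceph-mds.b.log excerpt attached above.
LOG = """\
2012-09-10 16:54:23.595995 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.56:6800/8509
2012-09-10 16:54:23.598638 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.56:6800/8509
2012-09-10 17:09:09.367041 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.39:6804/6522
2012-09-10 17:09:09.370663 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.39:6804/6522
2012-09-10 17:09:22.891795 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.39:6801/6430
2012-09-10 17:09:22.894177 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.39:6801/6430
2012-09-10 17:09:23.210881 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.54:6801/14003
2012-09-10 17:09:23.214310 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.54:6801/14003
2012-09-10 17:09:23.699220 7f843c55b700  0 mds.0.9 ms_handle_reset on 172.19.7.56:6800/8509
2012-09-10 17:09:23.701789 7f843c55b700  0 mds.0.9 ms_handle_connect on 172.19.7.56:6800/8509
2012-09-10 17:21:28.125699 7f843cd5c700 -1 mds.0.9 *** got signal Terminated ***
2012-09-10 17:21:28.125755 7f843cd5c700  1 mds.0.9 suicide.  wanted down:dne, now up:resolve
2012-09-10 17:21:28.386805 7f84422a6780  0 stopped.
"""

# Only lines of the form "... ms_handle_<reset|connect> on <addr>" are counted;
# everything else (e.g. the shutdown lines) is skipped.
EVENT = re.compile(r"ms_handle_(reset|connect) on (\S+)")


def reset_connect_summary(text: str) -> tuple[Counter, set[str]]:
    """Return (resets per peer, peers still waiting for a connect after a reset)."""
    resets: Counter = Counter()
    pending: set[str] = set()
    for line in text.splitlines():
        m = EVENT.search(line)
        if not m:
            continue
        event, peer = m.groups()
        if event == "reset":
            resets[peer] += 1
            pending.add(peer)
        else:
            pending.discard(peer)
    return resets, pending


resets, pending = reset_connect_summary(LOG)
for peer, count in resets.most_common():
    print(peer, count)
print(sorted(pending))  # []
```

For this excerpt, every reset is followed by a connect to the same address, and 172.19.7.56:6800/8509 is the only peer that was reset twice.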

Thread overview:
2012-09-10 21:36 Andrew Thompson [this message]
2012-09-13 18:37 Gregory Farnum (reply)