From: Joao Eduardo Luis <joao.luis@inktank.com>
To: Steffen Thorhauer <thorhaue@iti.cs.uni-magdeburg.de>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: [ceph-users] ceph 0.59 cephx problem
Date: Fri, 22 Mar 2013 13:47:55 +0000
Message-ID: <514C610B.6000306@inktank.com>
In-Reply-To: <20130322133628.GA28214@geri.cs.uni-magdeburg.de>
(Re-CC'ing the list)
On 03/22/2013 01:36 PM, Steffen Thorhauer wrote:
> I was upgrading from 0.58 to ceph version 0.59 (cbae6a435c62899f857775f66659de052fb0e759)
> Upgrading from 0.57 to 0.58 was an easy one, so I was surprised by the problems
v0.59 is the first dev release with a major monitor rework. We've
tested it thoroughly over the past weeks, but different usages tend to
trigger different behaviours, so you might just have hit one of those
buggers.
> It seems to me that I made a fatal error that I don't understand.
> I had 5 working mons (mon.{0-4}). After the upgrade of the first node I
> lost the mon.4 with the cephx error. Then I upgraded all of the nodes and
> I lost the mon.0 with the starting error.
The v0.59 monitors are unable to communicate with the <=0.58 monitors, so
that's likely why the monitor appeared to be lost: you would need at
least a majority of monitors on v0.59 so they could form a quorum.
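To make the majority arithmetic concrete, here is a minimal sketch in
Python. It is only an illustration, not Ceph code: the helper names
quorum_size and can_form_quorum are made up for this example, and it
assumes a quorum is a strict majority of the monitors in the monmap.

```python
def quorum_size(total_monitors: int) -> int:
    """Smallest strict majority of the monitors listed in the monmap."""
    return total_monitors // 2 + 1


def can_form_quorum(total_monitors: int, same_version: int) -> bool:
    """True if `same_version` monitors that can talk to each other
    are enough to form a quorum on their own."""
    return same_version >= quorum_size(total_monitors)


if __name__ == "__main__":
    total = 5  # mon.0 through mon.4, as in your cluster
    for upgraded in range(total + 1):
        old = total - upgraded
        print(
            f"upgraded={upgraded} old={old} "
            f"v0.59 quorum possible={can_form_quorum(total, upgraded)} "
            f"old quorum possible={can_form_quorum(total, old)}"
        )
```

With 5 monitors the threshold is 3. After upgrading only mon.4, the
four old monitors can still form a quorum among themselves while the
single v0.59 monitor cannot, which would be consistent with mon.4
appearing lost.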
> After some restarts it looks like the other mons lost any quorum
> so ceph -s or any kind of ceph commands didn't work anymore.
As long as you have a majority of monitors running v0.59, they ought to
be able to form a quorum. If they didn't, then something weird must
have happened and logs would be much appreciated!
> So I made today the decision to reinstall the test "cluster".
You decided to go back to v0.58, is that it? Regardless, if you have
logs that could provide some insight into what happened, we'd really
appreciate it.
Thanks!
-Joao
>
> -Steffen
>
> Btw. ceph rbd, adding/removing osds works great.
>
>> On Fri, Mar 22, 2013 at 10:01:10AM +0000, Joao Eduardo Luis wrote:
>> On 03/21/2013 03:47 PM, Steffen Thorhauer wrote:
>>> I think I was impatient and should have waited for the v0.59 announcement.
>>> It seems I should upgrade all monitors.
>>> After upgrading all nodes I get errors like this on 2 monitors:
>>> === mon.0 ===
>>> Starting Ceph mon.0 on u124-161-ceph...
>>> mon fs missing 'monmap/latest' and 'mkfs/monmap'
>>> failed: 'ulimit -n 8192; /usr/bin/ceph-mon -i 0 --pid-file
>>> /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf '
>>>
>>> Steffen
>>
>> Which version are you upgrading from?
>>
>> Also, could you provide us with some logs of those monitors with 'debug
>> mon = 20' ?
>>
>> -Joao
>>
>>>
>>>
>>> On 03/21/2013 02:22 PM, Steffen Thorhauer wrote:
>>>> Hi,
>>>> I just upgraded one node of my ceph "cluster". I wanted to upgrade node
>>>> after node.
>>>> The osd on this node has no problem, but the mon (mon.4) has
>>>> authorization problems.
>>>> I didn't change any config, just ran an apt-get upgrade.
>>>> ceph -s
>>>> health HEALTH_WARN 1 mons down, quorum 0,1,2,3 0,1,2,3
>>>> monmap e2: 5 mons at
>>>> {0=10.37.124.161:6789/0,1=10.37.124.162:6789/0,2=10.37.124.163:6789/0,3=10.37.124.164:6789/0,4=10.37.124.167:6789/0},
>>>> election epoch 162, quorum 0,1,2,3 0,1,2,3
>>>> osdmap e4839: 16 osds: 16 up, 16 in
>>>> pgmap v195213: 3144 pgs: 3144 active+clean; 255 GB data, 820 GB
>>>> used, 778 GB / 1599 GB avail
>>>> mdsmap e54723: 1/1/1 up {0=0=up:active}, 3 up:standby
>>>>
>>>>
>>>> but the mon.4 log file looks like:
>>>>
>>>> 2013-03-21 12:45:15.701747 7f45412c6780 2 mon.4@-1(probing) e2 init
>>>> 2013-03-21 12:45:15.702051 7f45412c6780 10 mon.4@-1(probing) e2 bootstrap
>>>> 2013-03-21 12:45:15.702094 7f45412c6780 10 mon.4@-1(probing) e2
>>>> unregister_cluster_logger - not registered
>>>> 2013-03-21 12:45:15.702121 7f45412c6780 10 mon.4@-1(probing) e2
>>>> cancel_probe_timeout (none scheduled)
>>>> 2013-03-21 12:45:15.702147 7f45412c6780 0 mon.4@-1(probing) e2 my
>>>> rank is now 4 (was -1)
>>>> 2013-03-21 12:45:15.702190 7f45412c6780 10 mon.4@4(probing) e2 reset_sync
>>>> 2013-03-21 12:45:15.702213 7f45412c6780 10 mon.4@4(probing) e2 reset
>>>> 2013-03-21 12:45:15.702238 7f45412c6780 10 mon.4@4(probing) e2
>>>> timecheck_finish
>>>> 2013-03-21 12:45:15.702286 7f45412c6780 10 mon.4@4(probing) e2
>>>> cancel_probe_timeout (none scheduled)
>>>> 2013-03-21 12:45:15.702312 7f45412c6780 10 mon.4@4(probing) e2
>>>> reset_probe_timeout 0x24d6580 after 2 seconds
>>>> 2013-03-21 12:45:15.702387 7f45412c6780 10 mon.4@4(probing) e2 probing
>>>> other monitors
>>>> 2013-03-21 12:45:15.703459 7f453a15f700 10 mon.4@4(probing) e2
>>>> ms_get_authorizer for mon
>>>> 2013-03-21 12:45:15.703641 7f453a15f700 10 cephx: build_service_ticket
>>>> service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
>>>> 2013-03-21 12:45:15.703642 7f453a361700 10 mon.4@4(probing) e2
>>>> ms_get_authorizer for mon
>>>> 2013-03-21 12:45:15.703694 7f453a361700 10 cephx: build_service_ticket
>>>> service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
>>>> 2013-03-21 12:45:15.703869 7f453a260700 10 mon.4@4(probing) e2
>>>> ms_get_authorizer for mon
>>>> 2013-03-21 12:45:15.703957 7f453a260700 10 cephx: build_service_ticket
>>>> service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
>>>> 2013-03-21 12:45:15.704244 7f453a05e700 10 mon.4@4(probing) e2
>>>> ms_get_authorizer for mon
>>>> 2013-03-21 12:45:15.704306 7f453a05e700 10 cephx: build_service_ticket
>>>> service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
>>>> 2013-03-21 12:45:15.704323 7f453a361700 0 cephx: verify_reply
>>>> coudln't decrypt with error: error decoding block for decryption
>>>> 2013-03-21 12:45:15.704333 7f453a361700 0 -- 10.37.124.167:6789/0 >>
>>>> 10.37.124.161:6789/0 pipe(0x24f3c80 sd=29 :42310 s=1 pgs=0 cs=0
>>>> l=0).failed verifying authorize reply
>>>> 2013-03-21 12:45:15.704404 7f453a361700 0 -- 10.37.124.167:6789/0 >>
>>>> 10.37.124.161:6789/0 pipe(0x24f3c80 sd=29 :42310 s=1 pgs=0 cs=0
>>>> l=0).fault
>>>> 2013-03-21 12:45:15.704429 7f453a15f700 0 cephx: verify_reply
>>>> coudln't decrypt with error: error decoding block for decryption
>>>> 2013-03-21 12:45:15.704483 7f453a15f700 0 -- 10.37.124.167:6789/0 >>
>>>> 10.37.124.163:6789/0 pipe(0x24f3500 sd=31 :60255 s=1 pgs=0 cs=0
>>>> l=0).failed verifying authorize reply
>>>> 2013-03-21 12:45:15.704517 7f453a260700 0 cephx: verify_reply
>>>> coudln't decrypt with error: error decoding block for decryption
>>>> 2013-03-21 12:45:15.704578 7f453a15f700 0 -- 10.37.124.167:6789/0 >>
>>>> 10.37.124.163:6789/0 pipe(0x24f3500 sd=31 :60255 s=1 pgs=0 cs=0
>>>> l=0).fault
>>>> 2013-03-21 12:45:15.704529 7f453a260700 0 -- 10.37.124.167:6789/0 >>
>>>> 10.37.124.162:6789/0 pipe(0x24f3a00 sd=30 :55445 s=1 pgs=0 cs=0
>>>> l=0).failed verifying authorize reply
>>>>
>>>> What now??
>>>>
>>>> Regards,
>>>> Steffen
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>