From: Wido den Hollander <wido@widodh.nl>
To: f.wiessner@smart-weblications.de
Cc: ceph-devel@vger.kernel.org
Subject: Re: upgrade from 0.39 to 0.40 failed...
Date: Sat, 21 Jan 2012 17:25:40 +0100
Message-ID: <4F1AE704.6010005@widodh.nl>
In-Reply-To: <4F1ABD5C.1000907@smart-weblications.de>

Hi,

On 01/21/2012 02:27 PM, Smart Weblications GmbH - Florian Wiessner wrote:
> Hi List,
>
>
> Today I upgraded from 0.39 to 0.40, and also from Linux 3.1.5 to 3.2.1, and now
> I have the following problems:
>
> First of all, I have a 4-node Ceph cluster running.
>
> After the kernel upgrade, 2 of the 4 OSDs failed to start because of btrfs bugs, so I
> now only have two OSDs available (I set the replication level to 3, so the data
> should be safe).
>
> I upgraded to
> ceph version 0.40-1-g7ce8b7a (commit:7ce8b7ae3bbad70fe257db00b6fc566f57f17132)
>
> My ceph.conf looks like this:
>
> node03:/etc/ceph# cat ceph.conf
> [global]
>         pid file = /var/run/ceph/$name.pid
>         debug ms = 1
> #       auth supported = cephx
>         osd journal = /data/ceph.journal
>         osd_journal_size = 512
> #       filestore journal writeahead = true
> #       filestore journal parallel = true
>          mds max = 4
>
> [mon]
>         mon data = /data/ceph/mon
> [mon.0]
>         host = node01
>         mon addr = 192.168.0.4:6789
> [mon.1]
>         host = node02
>         mon addr = 192.168.0.5:6789
> [mon.2]
>         host = node03
>         mon addr = 192.168.0.6:6789
> [mon.3]
>         host = node04
>         mon addr = 192.168.0.7:6789

Although I don't think it is related, I'd advise you to switch from
numeric monitor names to alphanumeric ones:

mon.alpha
mon.beta
mon.charlie
mon.delta

This changed a while ago, and I also think the configuration no longer
allows numeric monitor names at all (correct me if I'm wrong!).
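For example, the monitor sections of the ceph.conf quoted above could be
renamed roughly like this. This is only a sketch: mapping the new names to
node01..node04 in order is my assumption, and the hosts and addresses are
copied unchanged from your config. On an existing cluster the monitor names
are probably also recorded in the monmap and the monitors' data directories,
so editing ceph.conf alone may not be enough.

```ini
# Sketch (assumption): mon.0..mon.3 renamed in order to alpha..delta;
# host and mon addr values copied unchanged from the original ceph.conf.
[mon.alpha]
        host = node01
        mon addr = 192.168.0.4:6789
[mon.beta]
        host = node02
        mon addr = 192.168.0.5:6789
[mon.charlie]
        host = node03
        mon addr = 192.168.0.6:6789
[mon.delta]
        host = node04
        mon addr = 192.168.0.7:6789
```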

>
> [mds]
> #       keyring = /etc/ceph/keyring.$name
> #       mds dir max commit size 32
>
> [mds.0]
>         host = node01
> [mds.1]
>         host = node02
> [mds.2]
>         host = node03
> [mds.3]
>         host = node04
>
>
> [osd]
>         sudo = true
>         osd data = /data/ceph/osd
> #       keyring = /etc/ceph/keyring.$name
> [osd.0]
>         host = node01
> [osd.1]
>         host = node02
> [osd.2]
>         host = node03
> [osd.3]
>         host = node04
>
> I set up Ceph without cephx.

When did you enable cephx? Did you do so during the upgrade from 0.39
to 0.40, or earlier?

>
> So now I have upgraded to 0.40, and I have this problem:
>
> node03:/etc/ceph# ceph -w
> 2012-01-21 14:24:16.177441 7f7c45cc0760 -- :/0 messenger.start
> 2012-01-21 14:24:16.177628 7f7c45cc0760 -- :/11220 -->  192.168.0.5:6789/0 --
> auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x22f3f60 con 0x22f3ce0
> 2012-01-21 14:24:16.177926 7f7c45cbf700 -- 192.168.0.6:0/11220 learned my addr
> 192.168.0.6:0/11220
> 2012-01-21 14:24:19.177703 7f7c42244700 -- 192.168.0.6:0/11220 mark_down
> 0x22f3ce0 -- 0x22f3a70
> 2012-01-21 14:24:19.177798 7f7c42244700 -- 192.168.0.6:0/11220 -->
> 192.168.0.6:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x22f3f60 con
> 0x22f51d0
> 2012-01-21 14:24:22.177921 7f7c42244700 -- 192.168.0.6:0/11220 mark_down
> 0x22f51d0 -- 0x22f4f60
> 2012-01-21 14:24:22.177999 7f7c42244700 -- 192.168.0.6:0/11220 -->
> 192.168.0.7:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x22f3f60 con
> 0x22f3ce0
> 2012-01-21 14:24:25.178187 7f7c42244700 -- 192.168.0.6:0/11220 mark_down
> 0x22f3ce0 -- 0x22f3a70
> 2012-01-21 14:24:25.178268 7f7c42244700 -- 192.168.0.6:0/11220 -->
> 192.168.0.6:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x22f3f60 con
> 0x22f4a50
> 2012-01-21 14:24:28.178358 7f7c42244700 -- 192.168.0.6:0/11220 mark_down
> 0x22f4a50 -- 0x22f47e0
> 2012-01-21 14:24:28.178431 7f7c42244700 -- 192.168.0.6:0/11220 -->
> 192.168.0.7:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x22f4310 con
> 0x22f41d0
> 2012-01-21 14:24:31.178511 7f7c42244700 -- 192.168.0.6:0/11220 mark_down
> 0x22f41d0 -- 0x22f3f60
> 2012-01-21 14:24:31.178582 7f7c42244700 -- 192.168.0.6:0/11220 -->
> 192.168.0.6:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x22f4310 con
> 0x22f4a50
> 2012-01-21 14:24:34.178661 7f7c42244700 -- 192.168.0.6:0/11220 mark_down
> 0x22f4a50 -- 0x22f47e0
> 2012-01-21 14:24:34.178729 7f7c42244700 -- 192.168.0.6:0/11220 -->
> 192.168.0.7:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x22f4310 con
> 0x22f41d0
> 2012-01-21 14:24:37.178863 7f7c42244700 -- 192.168.0.6:0/11220 mark_down
> 0x22f41d0 -- 0x22f3f60
> 2012-01-21 14:24:37.178928 7f7c42244700 -- 192.168.0.6:0/11220 -->
> 192.168.0.6:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x22f4310 con
> 0x22f4a50
> 2012-01-21 14:24:40.179067 7f7c42244700 -- 192.168.0.6:0/11220 mark_down
> 0x22f4a50 -- 0x22f47e0
> 2012-01-21 14:24:40.179156 7f7c42244700 -- 192.168.0.6:0/11220 -->
> 192.168.0.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x22f4b90 con
> 0x22f4320
> 2012-01-21 14:24:43.179312 7f7c42244700 -- 192.168.0.6:0/11220 mark_down
> 0x22f4320 -- 0x22f40b0
> 2012-01-21 14:24:43.179380 7f7c42244700 -- 192.168.0.6:0/11220 -->
> 192.168.0.7:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x22f4b90 con
> 0x22f4a50
> 2012-01-21 14:24:46.179464 7f7c42244700 -- 192.168.0.6:0/11220 mark_down
> 0x22f4a50 -- 0x22f47e0
> 2012-01-21 14:24:46.179533 7f7c42244700 -- 192.168.0.6:0/11220 -->
> 192.168.0.6:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x22f4b90 con
> 0x22f41d0
> 2012-01-21 14:24:49.179671 7f7c42244700 -- 192.168.0.6:0/11220 mark_down
> 0x22f41d0 -- 0x22f3f60
> 2012-01-21 14:24:49.179746 7f7c42244700 -- 192.168.0.6:0/11220 -->
> 192.168.0.5:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x22f4310 con
> 0x22f4a50
> ^C*** Caught signal (Interrupt) **
>   in thread 7f7c45cc0760. Shutting down.
>
>
> node03:/etc/ceph# rbd ls
> 2012-01-21 14:25:27.876338 7f80bcf9e760 -- :/0 messenger.start
> 2012-01-21 14:25:27.876499 7f80bcf9e760 -- :/1011679 -->  192.168.0.4:6789/0 --
> auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0xa3c020 con 0xa3bda0
> 2012-01-21 14:25:27.876787 7f80bcf9d700 -- 192.168.0.6:0/1011679 learned my addr
> 192.168.0.6:0/1011679
> 2012-01-21 14:25:30.876586 7f80b9569700 -- 192.168.0.6:0/1011679 mark_down
> 0xa3bda0 -- 0xa3bb30
> 2012-01-21 14:25:30.876675 7f80b9569700 -- 192.168.0.6:0/1011679 -->
> 192.168.0.6:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0xa3c020 con 0xa402c0
> 2012-01-21 14:25:33.876794 7f80b9569700 -- 192.168.0.6:0/1011679 mark_down
> 0xa402c0 -- 0xa40050
> 2012-01-21 14:25:33.876871 7f80b9569700 -- 192.168.0.6:0/1011679 -->
> 192.168.0.7:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0xa3bc80 con 0xa3c9f0
> 2012-01-21 14:25:36.876987 7f80b9569700 -- 192.168.0.6:0/1011679 mark_down
> 0xa3c9f0 -- 0xa3c780
> 2012-01-21 14:25:36.877068 7f80b9569700 -- 192.168.0.6:0/1011679 -->
> 192.168.0.4:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f80b40008d0
> con 0xa3c290
> 2012-01-21 14:25:39.877248 7f80b9569700 -- 192.168.0.6:0/1011679 mark_down
> 0xa3c290 -- 0xa3c020
> 2012-01-21 14:25:39.877324 7f80b9569700 -- 192.168.0.6:0/1011679 -->
> 192.168.0.7:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f80b4000af0
> con 0x7f80b40008b0
> 2012-01-21 14:25:42.877424 7f80b9569700 -- 192.168.0.6:0/1011679 mark_down
> 0x7f80b40008b0 -- 0x7f80b4000e00
> 2012-01-21 14:25:42.877496 7f80b9569700 -- 192.168.0.6:0/1011679 -->
> 192.168.0.5:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f80b4001450
> con 0x7f80b4001310
> 2012-01-21 14:25:45.877624 7f80b9569700 -- 192.168.0.6:0/1011679 mark_down
> 0x7f80b4001310 -- 0x7f80b4000af0
> 2012-01-21 14:25:45.877706 7f80b9569700 -- 192.168.0.6:0/1011679 -->
> 192.168.0.7:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f80b40010c0
> con 0x7f80b40008b0
> 2012-01-21 14:25:48.877815 7f80b9569700 -- 192.168.0.6:0/1011679 mark_down
> 0x7f80b40008b0 -- 0x7f80b4001450
> 2012-01-21 14:25:48.877889 7f80b9569700 -- 192.168.0.6:0/1011679 -->
> 192.168.0.5:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f80b40010c0
> con 0x7f80b40012e0
> 2012-01-21 14:25:51.877988 7f80b9569700 -- 192.168.0.6:0/1011679 mark_down
> 0x7f80b40012e0 -- 0x7f80b4000d70
> 2012-01-21 14:25:51.878060 7f80b9569700 -- 192.168.0.6:0/1011679 -->
> 192.168.0.7:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f80b4000a90
> con 0x7f80b4000950
> 2012-01-21 14:25:54.878187 7f80b9569700 -- 192.168.0.6:0/1011679 mark_down
> 0x7f80b4000950 -- 0x7f80b4000fe0
> 2012-01-21 14:25:54.878260 7f80b9569700 -- 192.168.0.6:0/1011679 -->
> 192.168.0.6:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f80b4000a90
> con 0x7f80b4000e90
> 2012-01-21 14:25:57.876601 7f80bcf9e760 monclient(hunting): authenticate timed
> out after 30
> 2012-01-21 14:25:57.876648 7f80bcf9e760 librados: client.admin authentication
> error (110) Connection timed out
> 2012-01-21 14:25:57.876768 7f80bcf9e760 -- 192.168.0.6:0/1011679 shutdown complete.
> error: couldn't connect to the cluster!
>
> I am now unable to mount Ceph directly as a filesystem or to access my RBD images.
>
> Reverting to 0.39 does not work either: the OSDs then fail to start, claiming that
> the filestore does not belong to the OSD.
>
> Please advise!
>
>


Thread overview: 9+ messages
2012-01-21 13:27 upgrade from 0.39 to 0.40 failed Smart Weblications GmbH - Florian Wiessner
2012-01-21 16:25 ` Wido den Hollander [this message]
2012-01-21 16:30   ` Smart Weblications GmbH - Florian Wiessner
2012-01-21 17:01     ` Gregory Farnum
2012-01-21 17:34       ` Smart Weblications GmbH - Florian Wiessner
2012-01-21 17:43       ` Smart Weblications GmbH - Florian Wiessner
2012-01-22  1:19         ` Yehuda Sadeh Weinraub
2012-01-22 12:25           ` Smart Weblications GmbH - Florian Wiessner
2012-01-23 18:44             ` Gregory Farnum
