* Strange behavior after upgrading to 0.48
@ 2012-07-05 6:41 Xiaopong Tran
2012-07-05 6:47 ` Xiaopong Tran
2012-07-05 14:26 ` Sage Weil
0 siblings, 2 replies; 10+ messages in thread
From: Xiaopong Tran @ 2012-07-05 6:41 UTC (permalink / raw)
To: ceph-devel
Hi,
I set up a small cluster with 3 osds, 2 mds, 3 mons, on 3 machines.
They were running 0.47.2, and this is a test of a rolling upgrade to
0.48.
I shut down, upgraded the software, then restarted, one node at a time.
The first two seemed to be OK. The third one behaved strangely.
While it was doing the conversion and recovering, the command ceph -s
gave output like this:
root@china:/tmp# ceph -s
2012-07-05 14:28:41.069470 7fa3c8443780 2 auth: KeyRing::load: loaded
key file /etc/ceph/client.admin.keyring
2012-07-05 14:28:41.594229 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.596313 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.598949 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.601158 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.603069 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.605020 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.607436 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.609304 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.611047 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.667980 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.670283 7fa3c030e700 0 monclient: hunting for new mon
2012-07-05 14:28:41.672274 7fa3c030e700 0 monclient: hunting for new mon
....
And it never stopped. I thought maybe it just behaved like
that during recovery. But after the recovery was done, I still
got the same thing:
root@china:/tmp# ceph health
2012-07-05 14:28:55.077364 7f8306a0d780 2 auth: KeyRing::load: loaded
key file /etc/ceph/client.admin.keyring
HEALTH_OK
root@china:/tmp# ceph -s
2012-07-05 14:30:49.688017 7feb6338e780 2 auth: KeyRing::load: loaded
key file /etc/ceph/client.admin.keyring
2012-07-05 14:30:49.691690 7feb5b259700 0 monclient: hunting for new mon
2012-07-05 14:30:49.694295 7feb5b259700 0 monclient: hunting for new mon
2012-07-05 14:30:49.696487 7feb5b259700 0 monclient: hunting for new mon
2012-07-05 14:30:49.698953 7feb5b259700 0 monclient: hunting for new mon
2012-07-05 14:30:49.700833 7feb5b259700 0 monclient: hunting for new mon
....
Upgrading the first two nodes caused no such problem. The first two
nodes each run osd, mds, and mon. The third only runs osd and mon.
The mon log on the 3rd node shows this, not sure if this is helpful:
....
925291 lease_expire=2012-07-05 02:38:14.149966 has v44 lc 44
2012-07-05 02:38:12.572107 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap
active c 29531..30031) is_readable now=2012-07-05 02:38:12.572114
lease_expire=2012-07-05 02:38:15.889056 has v0 lc 30031
2012-07-05 02:38:12.572128 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap
active c 29531..30031) is_readable now=2012-07-05 02:38:12.572129
lease_expire=2012-07-05 02:38:15.889056 has v0 lc 30031
2012-07-05 02:38:15.120439 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap
active c 1..44) is_readable now=2012-07-05 02:38:15.120446
lease_expire=2012-07-05 02:38:17.149967 has v44 lc 44
2012-07-05 02:38:15.925349 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap
active c 1..44) is_readable now=2012-07-05 02:38:15.925356
lease_expire=2012-07-05 02:38:20.149971 has v44 lc 44
2012-07-05 02:38:17.572181 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap
active c 29531..30031) is_readable now=2012-07-05 02:38:17.572189
lease_expire=2012-07-05 02:38:21.889065 has v0 lc 30031
2012-07-05 02:38:17.572204 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap
active c 29531..30031) is_readable now=2012-07-05 02:38:17.572205
lease_expire=2012-07-05 02:38:21.889065 has v0 lc 30031
2012-07-05 02:38:19.120463 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap
active c 1..44) is_readable now=2012-07-05 02:38:19.120470
lease_expire=2012-07-05 02:38:23.149973 has v44 lc 44
2012-07-05 02:38:19.925323 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap
active c 1..44) is_readable now=2012-07-05 02:38:19.925330
lease_expire=2012-07-05 02:38:23.149973 has v44 lc 44
Could someone give a hint on this?
Thanks
Xiaopong
* Re: Strange behavior after upgrading to 0.48
2012-07-05 6:41 Strange behavior after upgrading to 0.48 Xiaopong Tran
@ 2012-07-05 6:47 ` Xiaopong Tran
2012-07-05 14:26 ` Sage Weil
1 sibling, 0 replies; 10+ messages in thread
From: Xiaopong Tran @ 2012-07-05 6:47 UTC (permalink / raw)
To: ceph-devel
When I run the command ceph -s, I see the following messages in
the mon log:
2012-07-05 02:44:13.298942 7f7d92b14700 0 can't decode unknown message
type 54 MSG_AUTH=17
2012-07-05 02:44:13.301588 7f7d9401b700 1 mon.a@0(leader).paxos(auth
active c 412..432) is_readable now=2012-07-05 02:44:13.301590
lease_expire=2012-07-05 02:44:17.566529 has v0 lc 432
2012-07-05 02:44:13.302113 7f7d9401b700 1 mon.a@0(leader).paxos(auth
active c 412..432) is_readable now=2012-07-05 02:44:13.302114
lease_expire=2012-07-05 02:44:17.566529 has v0 lc 432
2012-07-05 02:44:13.303072 7f7d92b14700 0 can't decode unknown message
type 54 MSG_AUTH=17
2012-07-05 02:44:13.309450 7f7d9401b700 1 mon.a@0(leader).paxos(auth
active c 412..432) is_readable now=2012-07-05 02:44:13.309452
lease_expire=2012-07-05 02:44:17.566529 has v0 lc 432
2012-07-05 02:44:13.309845 7f7d9401b700 1 mon.a@0(leader).paxos(auth
active c 412..432) is_readable now=2012-07-05 02:44:13.309847
lease_expire=2012-07-05 02:44:17.566529 has v0 lc 432
....
I couldn't find any helpful information about the "can't decode"
error message, short of digging into the code.
Thanks for any hint.
Xiaopong
On 07/05/2012 02:41 PM, Xiaopong Tran wrote:
> Hi,
>
> I put up a small cluster with 3 osds, 2 mds, 3 mons, on 3 machines.
> They were running 0.47.2, and this is a test to do rolling upgrade to
> 0.48.
>
> I shutdown, upgraded the software, then restarted. One node at a time.
> The first two seemed to be ok. The third one gave me some weird thing.
> While it was doing the conversion and recovering, the command ceph -s
> gives things like this:
>
>
> root@china:/tmp# ceph -s
> 2012-07-05 14:28:41.069470 7fa3c8443780 2 auth: KeyRing::load: loaded
> key file /etc/ceph/client.admin.keyring
> 2012-07-05 14:28:41.594229 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.596313 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.598949 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.601158 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.603069 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.605020 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.607436 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.609304 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.611047 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.667980 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.670283 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.672274 7fa3c030e700 0 monclient: hunting for new mon
> ....
>
> And it never stopped. I was thinking, maybe it just behaved like
> that during recovery. But after the recovery is done, it still
> get the same thing:
>
> root@china:/tmp# ceph health
> 2012-07-05 14:28:55.077364 7f8306a0d780 2 auth: KeyRing::load: loaded
> key file /etc/ceph/client.admin.keyring
> HEALTH_OK
> root@china:/tmp# ceph -s
> 2012-07-05 14:30:49.688017 7feb6338e780 2 auth: KeyRing::load: loaded
> key file /etc/ceph/client.admin.keyring
> 2012-07-05 14:30:49.691690 7feb5b259700 0 monclient: hunting for new mon
> 2012-07-05 14:30:49.694295 7feb5b259700 0 monclient: hunting for new mon
> 2012-07-05 14:30:49.696487 7feb5b259700 0 monclient: hunting for new mon
> 2012-07-05 14:30:49.698953 7feb5b259700 0 monclient: hunting for new mon
> 2012-07-05 14:30:49.700833 7feb5b259700 0 monclient: hunting for new mon
> ....
>
> Upgrading the first two nodes have no such problem. This first two
> nodes all run osd, mds, and mon. The third only runs osd and mon.
>
> The mon log on the 3rd node shows this, not sure if this is helpful:
>
> ....
> 925291 lease_expire=2012-07-05 02:38:14.149966 has v44 lc 44
> 2012-07-05 02:38:12.572107 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap
> active c 29531..30031) is_readable now=2012-07-05 02:38:12.572114
> lease_expire=2012-07-05 02:38:15.889056 has v0 lc 30031
> 2012-07-05 02:38:12.572128 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap
> active c 29531..30031) is_readable now=2012-07-05 02:38:12.572129
> lease_expire=2012-07-05 02:38:15.889056 has v0 lc 30031
> 2012-07-05 02:38:15.120439 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap
> active c 1..44) is_readable now=2012-07-05 02:38:15.120446
> lease_expire=2012-07-05 02:38:17.149967 has v44 lc 44
> 2012-07-05 02:38:15.925349 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap
> active c 1..44) is_readable now=2012-07-05 02:38:15.925356
> lease_expire=2012-07-05 02:38:20.149971 has v44 lc 44
> 2012-07-05 02:38:17.572181 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap
> active c 29531..30031) is_readable now=2012-07-05 02:38:17.572189
> lease_expire=2012-07-05 02:38:21.889065 has v0 lc 30031
> 2012-07-05 02:38:17.572204 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap
> active c 29531..30031) is_readable now=2012-07-05 02:38:17.572205
> lease_expire=2012-07-05 02:38:21.889065 has v0 lc 30031
> 2012-07-05 02:38:19.120463 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap
> active c 1..44) is_readable now=2012-07-05 02:38:19.120470
> lease_expire=2012-07-05 02:38:23.149973 has v44 lc 44
> 2012-07-05 02:38:19.925323 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap
> active c 1..44) is_readable now=2012-07-05 02:38:19.925330
> lease_expire=2012-07-05 02:38:23.149973 has v44 lc 44
>
> Could someone give a hint on this?
>
> Thanks
>
> Xiaopong
* Re: Strange behavior after upgrading to 0.48
2012-07-05 6:41 Strange behavior after upgrading to 0.48 Xiaopong Tran
2012-07-05 6:47 ` Xiaopong Tran
@ 2012-07-05 14:26 ` Sage Weil
2012-07-05 14:35 ` Xiaopong Tran
1 sibling, 1 reply; 10+ messages in thread
From: Sage Weil @ 2012-07-05 14:26 UTC (permalink / raw)
To: Xiaopong Tran; +Cc: ceph-devel
Hi,
On Thu, 5 Jul 2012, Xiaopong Tran wrote:
> Hi,
>
> I put up a small cluster with 3 osds, 2 mds, 3 mons, on 3 machines.
> They were running 0.47.2, and this is a test to do rolling upgrade to
> 0.48.
>
> I shutdown, upgraded the software, then restarted. One node at a time.
> The first two seemed to be ok. The third one gave me some weird thing.
> While it was doing the conversion and recovering, the command ceph -s gives
> things like this:
>
>
> root@china:/tmp# ceph -s
> 2012-07-05 14:28:41.069470 7fa3c8443780 2 auth: KeyRing::load: loaded key
> file /etc/ceph/client.admin.keyring
> 2012-07-05 14:28:41.594229 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.596313 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.598949 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.601158 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.603069 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.605020 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.607436 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.609304 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.611047 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.667980 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.670283 7fa3c030e700 0 monclient: hunting for new mon
> 2012-07-05 14:28:41.672274 7fa3c030e700 0 monclient: hunting for new mon
> ....
The problem is that the ceph utility itself is pre-0.48, but the monitors
are running 0.48. You need to upgrade the utility as well. (There was a
note about this in the release announcement.)
This only affects the -s and -w commands.
sage
>
> And it never stopped. I was thinking, maybe it just behaved like
> that during recovery. But after the recovery is done, it still
> get the same thing:
>
> root@china:/tmp# ceph health
> 2012-07-05 14:28:55.077364 7f8306a0d780 2 auth: KeyRing::load: loaded key
> file /etc/ceph/client.admin.keyring
> HEALTH_OK
> root@china:/tmp# ceph -s
> 2012-07-05 14:30:49.688017 7feb6338e780 2 auth: KeyRing::load: loaded key
> file /etc/ceph/client.admin.keyring
> 2012-07-05 14:30:49.691690 7feb5b259700 0 monclient: hunting for new mon
> 2012-07-05 14:30:49.694295 7feb5b259700 0 monclient: hunting for new mon
> 2012-07-05 14:30:49.696487 7feb5b259700 0 monclient: hunting for new mon
> 2012-07-05 14:30:49.698953 7feb5b259700 0 monclient: hunting for new mon
> 2012-07-05 14:30:49.700833 7feb5b259700 0 monclient: hunting for new mon
> ....
>
> Upgrading the first two nodes have no such problem. This first two
> nodes all run osd, mds, and mon. The third only runs osd and mon.
>
> The mon log on the 3rd node shows this, not sure if this is helpful:
>
> ....
> 925291 lease_expire=2012-07-05 02:38:14.149966 has v44 lc 44
> 2012-07-05 02:38:12.572107 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap active
> c 29531..30031) is_readable now=2012-07-05 02:38:12.572114
> lease_expire=2012-07-05 02:38:15.889056 has v0 lc 30031
> 2012-07-05 02:38:12.572128 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap active
> c 29531..30031) is_readable now=2012-07-05 02:38:12.572129
> lease_expire=2012-07-05 02:38:15.889056 has v0 lc 30031
> 2012-07-05 02:38:15.120439 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap active
> c 1..44) is_readable now=2012-07-05 02:38:15.120446 lease_expire=2012-07-05
> 02:38:17.149967 has v44 lc 44
> 2012-07-05 02:38:15.925349 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap active
> c 1..44) is_readable now=2012-07-05 02:38:15.925356 lease_expire=2012-07-05
> 02:38:20.149971 has v44 lc 44
> 2012-07-05 02:38:17.572181 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap active
> c 29531..30031) is_readable now=2012-07-05 02:38:17.572189
> lease_expire=2012-07-05 02:38:21.889065 has v0 lc 30031
> 2012-07-05 02:38:17.572204 7f7d9381a700 1 mon.a@0(leader).paxos(pgmap active
> c 29531..30031) is_readable now=2012-07-05 02:38:17.572205
> lease_expire=2012-07-05 02:38:21.889065 has v0 lc 30031
> 2012-07-05 02:38:19.120463 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap active
> c 1..44) is_readable now=2012-07-05 02:38:19.120470 lease_expire=2012-07-05
> 02:38:23.149973 has v44 lc 44
> 2012-07-05 02:38:19.925323 7f7d9401b700 1 mon.a@0(leader).paxos(mdsmap active
> c 1..44) is_readable now=2012-07-05 02:38:19.925330 lease_expire=2012-07-05
> 02:38:23.149973 has v44 lc 44
>
> Could someone give a hint on this?
>
> Thanks
>
> Xiaopong
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
* Re: Strange behavior after upgrading to 0.48
2012-07-05 14:26 ` Sage Weil
@ 2012-07-05 14:35 ` Xiaopong Tran
2012-07-05 14:38 ` Sage Weil
0 siblings, 1 reply; 10+ messages in thread
From: Xiaopong Tran @ 2012-07-05 14:35 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Sage Weil <sage@inktank.com> wrote:
>Hi,
>
>On Thu, 5 Jul 2012, Xiaopong Tran wrote:
>> Hi,
>>
>> I put up a small cluster with 3 osds, 2 mds, 3 mons, on 3 machines.
>> They were running 0.47.2, and this is a test to do rolling upgrade to
>> 0.48.
>>
>> I shutdown, upgraded the software, then restarted. One node at a
>time.
>> The first two seemed to be ok. The third one gave me some weird
>thing.
>> While it was doing the conversion and recovering, the command ceph -s
>gives
>> things like this:
>>
>>
>> root@china:/tmp# ceph -s
>> 2012-07-05 14:28:41.069470 7fa3c8443780 2 auth: KeyRing::load:
>loaded key
>> file /etc/ceph/client.admin.keyring
>> 2012-07-05 14:28:41.594229 7fa3c030e700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:28:41.596313 7fa3c030e700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:28:41.598949 7fa3c030e700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:28:41.601158 7fa3c030e700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:28:41.603069 7fa3c030e700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:28:41.605020 7fa3c030e700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:28:41.607436 7fa3c030e700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:28:41.609304 7fa3c030e700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:28:41.611047 7fa3c030e700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:28:41.667980 7fa3c030e700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:28:41.670283 7fa3c030e700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:28:41.672274 7fa3c030e700 0 monclient: hunting for new
>mon
>> ....
>
>The problem is that the ceph utility itself is pre-0.48, but the
>monitors
>are running 0.48. You need to upgrade the utility as well. (There was
>a
>note about this in the release announcement.)
>
>This only affects the -s and -w commands.
>
>sage
I have read the notes, and upgraded the utility first. There was no
problem while the first two nodes were upgraded and recovering. This
only happened once the third node was upgraded.
The nodes are running Debian wheezy, while the client admin node is
running Ubuntu 12.04.
thanks
Xiaopong
>
>>
>> And it never stopped. I was thinking, maybe it just behaved like
>> that during recovery. But after the recovery is done, it still
>> get the same thing:
>>
>> root@china:/tmp# ceph health
>> 2012-07-05 14:28:55.077364 7f8306a0d780 2 auth: KeyRing::load:
>loaded key
>> file /etc/ceph/client.admin.keyring
>> HEALTH_OK
>> root@china:/tmp# ceph -s
>> 2012-07-05 14:30:49.688017 7feb6338e780 2 auth: KeyRing::load:
>loaded key
>> file /etc/ceph/client.admin.keyring
>> 2012-07-05 14:30:49.691690 7feb5b259700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:30:49.694295 7feb5b259700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:30:49.696487 7feb5b259700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:30:49.698953 7feb5b259700 0 monclient: hunting for new
>mon
>> 2012-07-05 14:30:49.700833 7feb5b259700 0 monclient: hunting for new
>mon
>> ....
>>
>> Upgrading the first two nodes have no such problem. This first two
>> nodes all run osd, mds, and mon. The third only runs osd and mon.
>>
>> The mon log on the 3rd node shows this, not sure if this is helpful:
>>
>> ....
>> 925291 lease_expire=2012-07-05 02:38:14.149966 has v44 lc 44
>> 2012-07-05 02:38:12.572107 7f7d9381a700 1
>mon.a@0(leader).paxos(pgmap active
>> c 29531..30031) is_readable now=2012-07-05 02:38:12.572114
>> lease_expire=2012-07-05 02:38:15.889056 has v0 lc 30031
>> 2012-07-05 02:38:12.572128 7f7d9381a700 1
>mon.a@0(leader).paxos(pgmap active
>> c 29531..30031) is_readable now=2012-07-05 02:38:12.572129
>> lease_expire=2012-07-05 02:38:15.889056 has v0 lc 30031
>> 2012-07-05 02:38:15.120439 7f7d9401b700 1
>mon.a@0(leader).paxos(mdsmap active
>> c 1..44) is_readable now=2012-07-05 02:38:15.120446
>lease_expire=2012-07-05
>> 02:38:17.149967 has v44 lc 44
>> 2012-07-05 02:38:15.925349 7f7d9401b700 1
>mon.a@0(leader).paxos(mdsmap active
>> c 1..44) is_readable now=2012-07-05 02:38:15.925356
>lease_expire=2012-07-05
>> 02:38:20.149971 has v44 lc 44
>> 2012-07-05 02:38:17.572181 7f7d9381a700 1
>mon.a@0(leader).paxos(pgmap active
>> c 29531..30031) is_readable now=2012-07-05 02:38:17.572189
>> lease_expire=2012-07-05 02:38:21.889065 has v0 lc 30031
>> 2012-07-05 02:38:17.572204 7f7d9381a700 1
>mon.a@0(leader).paxos(pgmap active
>> c 29531..30031) is_readable now=2012-07-05 02:38:17.572205
>> lease_expire=2012-07-05 02:38:21.889065 has v0 lc 30031
>> 2012-07-05 02:38:19.120463 7f7d9401b700 1
>mon.a@0(leader).paxos(mdsmap active
>> c 1..44) is_readable now=2012-07-05 02:38:19.120470
>lease_expire=2012-07-05
>> 02:38:23.149973 has v44 lc 44
>> 2012-07-05 02:38:19.925323 7f7d9401b700 1
>mon.a@0(leader).paxos(mdsmap active
>> c 1..44) is_readable now=2012-07-05 02:38:19.925330
>lease_expire=2012-07-05
>> 02:38:23.149973 has v44 lc 44
>>
>> Could someone give a hint on this?
>>
>> Thanks
>>
>> Xiaopong
* Re: Strange behavior after upgrading to 0.48
2012-07-05 14:35 ` Xiaopong Tran
@ 2012-07-05 14:38 ` Sage Weil
2012-07-06 2:38 ` Xiaopong Tran
0 siblings, 1 reply; 10+ messages in thread
From: Sage Weil @ 2012-07-05 14:38 UTC (permalink / raw)
To: Xiaopong Tran; +Cc: ceph-devel
On Thu, 5 Jul 2012, Xiaopong Tran wrote:
> Sage Weil <sage@inktank.com> wrote:
>
> >Hi,
> >
> >On Thu, 5 Jul 2012, Xiaopong Tran wrote:
> >> Hi,
> >>
> >> I put up a small cluster with 3 osds, 2 mds, 3 mons, on 3 machines.
> >> They were running 0.47.2, and this is a test to do rolling upgrade to
> >> 0.48.
> >>
> >> I shutdown, upgraded the software, then restarted. One node at a
> >time.
> >> The first two seemed to be ok. The third one gave me some weird
> >thing.
> >> While it was doing the conversion and recovering, the command ceph -s
> >gives
> >> things like this:
> >>
> >>
> >> root@china:/tmp# ceph -s
> >> 2012-07-05 14:28:41.069470 7fa3c8443780 2 auth: KeyRing::load:
> >loaded key
> >> file /etc/ceph/client.admin.keyring
> >> 2012-07-05 14:28:41.594229 7fa3c030e700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:28:41.596313 7fa3c030e700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:28:41.598949 7fa3c030e700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:28:41.601158 7fa3c030e700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:28:41.603069 7fa3c030e700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:28:41.605020 7fa3c030e700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:28:41.607436 7fa3c030e700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:28:41.609304 7fa3c030e700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:28:41.611047 7fa3c030e700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:28:41.667980 7fa3c030e700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:28:41.670283 7fa3c030e700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:28:41.672274 7fa3c030e700 0 monclient: hunting for new
> >mon
> >> ....
> >
> >The problem is that the ceph utility itself is pre-0.48, but the
> >monitors
> >are running 0.48. You need to upgrade the utility as well. (There was
> >a
> >note about this in the release announcement.)
> >
> >This only affects the -s and -w commands.
> >
> >sage
>
> I have read the notes, and upgraded the utility first. There was no
> problem when the first two were upgraded and recovering. This only
> happened when the third node is upgraded.
>
> The nodes are running debian wheezy, while the client admin node is
> running ubuntu 12.04.
Oooh, maybe the package for wheezy in the repo is wrong. Can you confirm
which version the ceph utility is with 'ceph -v'?
Thanks!
sage
>
> thanks
>
> Xiaopong
>
> >
> >>
> >> And it never stopped. I was thinking, maybe it just behaved like
> >> that during recovery. But after the recovery is done, it still
> >> get the same thing:
> >>
> >> root@china:/tmp# ceph health
> >> 2012-07-05 14:28:55.077364 7f8306a0d780 2 auth: KeyRing::load:
> >loaded key
> >> file /etc/ceph/client.admin.keyring
> >> HEALTH_OK
> >> root@china:/tmp# ceph -s
> >> 2012-07-05 14:30:49.688017 7feb6338e780 2 auth: KeyRing::load:
> >loaded key
> >> file /etc/ceph/client.admin.keyring
> >> 2012-07-05 14:30:49.691690 7feb5b259700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:30:49.694295 7feb5b259700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:30:49.696487 7feb5b259700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:30:49.698953 7feb5b259700 0 monclient: hunting for new
> >mon
> >> 2012-07-05 14:30:49.700833 7feb5b259700 0 monclient: hunting for new
> >mon
> >> ....
> >>
> >> Upgrading the first two nodes have no such problem. This first two
> >> nodes all run osd, mds, and mon. The third only runs osd and mon.
> >>
> >> The mon log on the 3rd node shows this, not sure if this is helpful:
> >>
> >> ....
> >> 925291 lease_expire=2012-07-05 02:38:14.149966 has v44 lc 44
> >> 2012-07-05 02:38:12.572107 7f7d9381a700 1
> >mon.a@0(leader).paxos(pgmap active
> >> c 29531..30031) is_readable now=2012-07-05 02:38:12.572114
> >> lease_expire=2012-07-05 02:38:15.889056 has v0 lc 30031
> >> 2012-07-05 02:38:12.572128 7f7d9381a700 1
> >mon.a@0(leader).paxos(pgmap active
> >> c 29531..30031) is_readable now=2012-07-05 02:38:12.572129
> >> lease_expire=2012-07-05 02:38:15.889056 has v0 lc 30031
> >> 2012-07-05 02:38:15.120439 7f7d9401b700 1
> >mon.a@0(leader).paxos(mdsmap active
> >> c 1..44) is_readable now=2012-07-05 02:38:15.120446
> >lease_expire=2012-07-05
> >> 02:38:17.149967 has v44 lc 44
> >> 2012-07-05 02:38:15.925349 7f7d9401b700 1
> >mon.a@0(leader).paxos(mdsmap active
> >> c 1..44) is_readable now=2012-07-05 02:38:15.925356
> >lease_expire=2012-07-05
> >> 02:38:20.149971 has v44 lc 44
> >> 2012-07-05 02:38:17.572181 7f7d9381a700 1
> >mon.a@0(leader).paxos(pgmap active
> >> c 29531..30031) is_readable now=2012-07-05 02:38:17.572189
> >> lease_expire=2012-07-05 02:38:21.889065 has v0 lc 30031
> >> 2012-07-05 02:38:17.572204 7f7d9381a700 1
> >mon.a@0(leader).paxos(pgmap active
> >> c 29531..30031) is_readable now=2012-07-05 02:38:17.572205
> >> lease_expire=2012-07-05 02:38:21.889065 has v0 lc 30031
> >> 2012-07-05 02:38:19.120463 7f7d9401b700 1
> >mon.a@0(leader).paxos(mdsmap active
> >> c 1..44) is_readable now=2012-07-05 02:38:19.120470
> >lease_expire=2012-07-05
> >> 02:38:23.149973 has v44 lc 44
> >> 2012-07-05 02:38:19.925323 7f7d9401b700 1
> >mon.a@0(leader).paxos(mdsmap active
> >> c 1..44) is_readable now=2012-07-05 02:38:19.925330
> >lease_expire=2012-07-05
> >> 02:38:23.149973 has v44 lc 44
> >>
> >> Could someone give a hint on this?
> >>
> >> Thanks
> >>
> >> Xiaopong
>
>
* Re: Strange behavior after upgrading to 0.48
2012-07-05 14:38 ` Sage Weil
@ 2012-07-06 2:38 ` Xiaopong Tran
2012-07-06 4:14 ` Mark Kirkwood
0 siblings, 1 reply; 10+ messages in thread
From: Xiaopong Tran @ 2012-07-06 2:38 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On 07/05/2012 10:38 PM, Sage Weil wrote:
> On Thu, 5 Jul 2012, Xiaopong Tran wrote:
>>> The problem is that the ceph utility itself is pre-0.48, but the
>>> monitors
>>> are running 0.48. You need to upgrade the utility as well. (There was
>>> a
>>> note about this in the release announcement.)
>>>
>>> This only affects the -s and -w commands.
>>>
>>> sage
>>
>> I have read the notes, and upgraded the utility first. There was no
>> problem when the first two were upgraded and recovering. This only
>> happened when the third node is upgraded.
>>
>> The nodes are running debian wheezy, while the client admin node is
>> running ubuntu 12.04.
>
> Oooh, maybe the package for wheezy in the repo is wrong. Can you confirm
> which version the ceph utility is with 'ceph -v'?
>
> Thanks!
> sage
>
>
Thanks for the quick reply, I didn't have the computer with me last
night. But you were right. I checked the version of ceph on ubuntu,
and it's still stuck with 0.47.3, despite upgrading. I redid the
upgrade, and it's still stuck with that version. That's something
I didn't pay attention to.
I had to purge the ceph, ceph-common and other related packages,
and reinstall them, and then I got 0.48. And now ceph -s works just
as it should.
So, somehow, the upgrade on ubuntu does not work properly.
Thinking about this issue just right now, I think ceph -s
still worked right because there was still an older version
of mon when the first two nodes were being upgraded. When
the last one was upgraded, there's no mon of the same version
anymore.
Sorry, should have checked if apt upgrade was done properly
first :)
Thanks
Xiaopong
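
The check that would have caught this early can be sketched as a small shell helper. This is only a sketch: the function names are hypothetical, and it assumes that `ceph -v` prints a line of the form "ceph version 0.47.3 (commit:...)".

```shell
# Hypothetical helper: extract the numeric version from `ceph -v` output
# read on stdin. Assumes a line like "ceph version 0.47.3 (commit:...)".
ceph_version_from() {
  sed -n 's/^ceph version \([0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed (exit 0) when version $1 is older than
# version $2, comparing dot-separated numeric fields left to right.
version_lt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi < yi) exit 0
      if (xi > yi) exit 1
    }
    exit 1                            # equal versions are not "older"
  }'
}
```

On the admin node this might be used as `v=$(ceph -v | ceph_version_from); version_lt "$v" 0.48 && echo "client is still $v"`, run before trusting the output of ceph -s or ceph -w against upgraded monitors.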
* Re: Strange behavior after upgrading to 0.48
2012-07-06 2:38 ` Xiaopong Tran
@ 2012-07-06 4:14 ` Mark Kirkwood
2012-07-06 4:17 ` Sage Weil
0 siblings, 1 reply; 10+ messages in thread
From: Mark Kirkwood @ 2012-07-06 4:14 UTC (permalink / raw)
To: Xiaopong Tran; +Cc: Sage Weil, ceph-devel
On 06/07/12 14:38, Xiaopong Tran wrote:
>
> Thanks for the quick reply, I didn't have the computer with me last
> night. But you were right. I checked the version of ceph on ubuntu,
> and it's still stuck with 0.47.3, despite upgrading. I redid the
> upgrade, and it's still stuck with that version. That's something
> I didn't pay attention to.
>
> I had to purge the ceph, ceph-common and other related packages,
> and re-install it, then I got 0.48. And now ceph -s works just
> as it should.
>
> So, somehow, the upgrade on ubuntu does not work properly.
>
> Thinking about this issue just right now, I think ceph -s
> still worked right because there was still an older version
> of mon when the first two nodes were being upgraded. When
> the last one was upgraded, there's no mon of the same version
> anymore.
>
> Sorry, should have checked if apt upgrade was done properly
> first :)
>
>
FYI: I ran into this too - you need to do:
apt-get dist-upgrade
for the 0.47-2 packages to be replaced by 0.48 (of course purging 'em
and reinstalling works too...just a bit more drastic)!
regards
Mark
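
One way to spot this situation before it bites is to look for packages that a plain upgrade holds back. The following is a rough sketch, not a tested procedure: the function name is hypothetical, and it assumes the English-language apt-get output, where held packages are listed on indented lines under "The following packages have been kept back:".

```shell
# Hypothetical helper: read `apt-get -s upgrade` output on stdin and print
# one package name per line for everything apt reports as kept back.
kept_back() {
  awk '
    /^The following packages have been kept back:/ { in_block = 1; next }
    in_block && /^  / { for (i = 1; i <= NF; i++) print $i; next }
    { in_block = 0 }   # any non-indented line ends the list
  '
}
```

Usage would be something like `LC_ALL=C apt-get -s upgrade | kept_back`, where -s only simulates the upgrade. If ceph or ceph-common shows up in the list, apt-get upgrade alone will leave the old version installed.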
* Re: Strange behavior after upgrading to 0.48
2012-07-06 4:14 ` Mark Kirkwood
@ 2012-07-06 4:17 ` Sage Weil
2012-07-06 4:52 ` Mark Kirkwood
0 siblings, 1 reply; 10+ messages in thread
From: Sage Weil @ 2012-07-06 4:17 UTC (permalink / raw)
To: Mark Kirkwood; +Cc: Xiaopong Tran, ceph-devel
On Fri, 6 Jul 2012, Mark Kirkwood wrote:
> On 06/07/12 14:38, Xiaopong Tran wrote:
> >
> > Thanks for the quick reply, I didn't have the computer with me last
> > night. But you were right. I checked the version of ceph on ubuntu,
> > and it's still stuck with 0.47.3, despite upgrading. I redid the
> > upgrade, and it's still stuck with that version. That's something
> > I didn't pay attention to.
> >
> > I had to purge the ceph, ceph-common and other related packages,
> > and re-install it, then I got 0.48. And now ceph -s works just
> > as it should.
> >
> > So, somehow, the upgrade on ubuntu does not work properly.
> >
> > Thinking about this issue just right now, I think ceph -s
> > still worked right because there was still an older version
> > of mon when the first two nodes were being upgraded. When
> > the last one was upgraded, there's no mon of the same version
> > anymore.
> >
> > Sorry, should have checked if apt upgrade was done properly
> > first :)
> >
> >
>
> FYI: I ran into this too - you need to do:
>
> apt-get dist-upgrade
>
> for the 0.47-2 packages to be replaced by 0.48 (of course purging 'em and
> reinstalling works too...just a bit more drastic)!
That's strange... anyone know why?
sage
* Re: Strange behavior after upgrading to 0.48
2012-07-06 4:17 ` Sage Weil
@ 2012-07-06 4:52 ` Mark Kirkwood
2012-07-06 5:07 ` Sage Weil
0 siblings, 1 reply; 10+ messages in thread
From: Mark Kirkwood @ 2012-07-06 4:52 UTC (permalink / raw)
To: Sage Weil; +Cc: Xiaopong Tran, ceph-devel
On 06/07/12 16:17, Sage Weil wrote:
> On Fri, 6 Jul 2012, Mark Kirkwood wrote:
>>
>> FYI: I ran into this too - you need to do:
>>
>> apt-get dist-upgrade
>>
>> for the 0.47-2 packages to be replaced by 0.48 (of course purging 'em and
>> reinstalling works too...just a bit more drastic)!
> That's strange... anyone know why?
>
> sage
From the apt-get manual:

upgrade
upgrade is used to install the newest versions of all packages
currently installed on the system from the sources enumerated in
/etc/apt/sources.list. Packages currently installed with new
versions available are retrieved and upgraded; under no
circumstances are currently installed packages removed, or packages
not already installed retrieved and installed. New versions of
currently installed packages that cannot be upgraded without
changing the install status of another package will be left at
their current version. An update must be performed first so that
apt-get knows that new versions of packages are available.

dist-upgrade
dist-upgrade in addition to performing the function of upgrade,
also intelligently handles changing dependencies with new versions
of packages; apt-get has a "smart" conflict resolution system, and
it will attempt to upgrade the most important packages at the
expense of less important ones if necessary. So, dist-upgrade
command may remove some packages. The /etc/apt/sources.list file
contains a list of locations from which to retrieve desired package
files. See also apt_preferences(5) for a mechanism for overriding
the general settings for individual packages.
Does 0.48 have new dependencies perhaps?
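
The difference described in the manual excerpt can be sketched as a toy model. This is a deliberate simplification and not apt's actual resolver: the function `plan`, its parameters, and the package names and dependency sets used here are all hypothetical, chosen only to mirror the situation in this thread.

```python
from typing import Dict, Set, Tuple


def plan(
    installed: Set[str],
    new_deps: Dict[str, Set[str]],
    dist_upgrade: bool,
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Toy model of 'upgrade' vs 'dist-upgrade' (NOT apt's real algorithm).

    installed    -- names of packages currently installed
    new_deps     -- for each package with a newer version available,
                    the dependencies of that newer version
    dist_upgrade -- if True, missing dependencies may be newly installed

    Returns (upgraded, newly_installed, kept_back).
    """
    upgraded: Set[str] = set()
    newly_installed: Set[str] = set()
    kept_back: Set[str] = set()

    for pkg in sorted(new_deps):
        if pkg not in installed:
            continue  # neither mode installs unrelated new packages here
        missing = new_deps[pkg] - installed
        if not missing:
            upgraded.add(pkg)
        elif dist_upgrade:
            # dist-upgrade is allowed to pull in new dependencies
            upgraded.add(pkg)
            newly_installed |= missing
        else:
            # plain upgrade never installs new packages, so it holds back
            kept_back.add(pkg)

    return upgraded, newly_installed, kept_back


if __name__ == "__main__":
    # Hypothetical state: the new version needs a library not yet installed.
    installed = {"ceph", "ceph-common", "libcrypto++"}
    new_deps = {"ceph": {"ceph-common", "libnss"}, "ceph-common": {"libnss"}}
    print("upgrade:     ", plan(installed, new_deps, dist_upgrade=False))
    print("dist-upgrade:", plan(installed, new_deps, dist_upgrade=True))
```

In this model a plain upgrade leaves both packages at their old version, while dist-upgrade upgrades them and installs the missing library, which matches the symptom reported above.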
* Re: Strange behavior after upgrading to 0.48
2012-07-06 4:52 ` Mark Kirkwood
@ 2012-07-06 5:07 ` Sage Weil
0 siblings, 0 replies; 10+ messages in thread
From: Sage Weil @ 2012-07-06 5:07 UTC (permalink / raw)
To: Mark Kirkwood; +Cc: Xiaopong Tran, ceph-devel
On Fri, 6 Jul 2012, Mark Kirkwood wrote:
> On 06/07/12 16:17, Sage Weil wrote:
> > On Fri, 6 Jul 2012, Mark Kirkwood wrote:
> > >
> > > FYI: I ran into this too - you need to do:
> > >
> > > apt-get dist-upgrade
> > >
> > > for the 0.47-2 packages to be replaced by 0.48 (of course purging 'em and
> > > reinstalling works too...just a bit more drastic)!
> > That's strange... anyone know why?
> >
> > sage
>
> From the apt-get manual:
>
> upgrade
> upgrade is used to install the newest versions of all packages
> currently installed on the system from the sources enumerated in
> /etc/apt/sources.list. Packages currently installed with new
> versions available are retrieved and upgraded; under no
> circumstances are currently installed packages removed, or packages
> not already installed retrieved and installed. New versions of
> currently installed packages that cannot be upgraded without
> changing the install status of another package will be left at
> their current version. An update must be performed first so that
> apt-get knows that new versions of packages are available.
>
> dist-upgrade
> dist-upgrade in addition to performing the function of upgrade,
> also intelligently handles changing dependencies with new versions
> of packages; apt-get has a "smart" conflict resolution system, and
> it will attempt to upgrade the most important packages at the
> expense of less important ones if necessary. So, dist-upgrade
> command may remove some packages. The /etc/apt/sources.list file
> contains a list of locations from which to retrieve desired package
> files. See also apt_preferences(5) for a mechanism for overriding
> the general settings for individual packages.
>
> Does 0.48 have new dependencies perhaps?
Oh, yeah. We switched to libnss from libcrypto++ by default, among other
things; that would explain it!
Thanks-
sage
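
For anyone doing a rolling upgrade, here is a minimal sketch of how one might detect this case before restarting daemons. It assumes apt-get's usual "The following packages have been kept back:" heading followed by indented package names; the helper name `kept_back_packages` and the sample text are illustrative and were not captured from the cluster in this thread.

```python
from typing import List

HEADING = "The following packages have been kept back"


def kept_back_packages(apt_output: str) -> List[str]:
    """Return the package names listed under apt-get's 'kept back' heading.

    apt_output is the text printed by a command such as
    `apt-get -s upgrade` (-s simulates, nothing is changed).
    """
    packages: List[str] = []
    in_block = False
    for line in apt_output.splitlines():
        if line.startswith(HEADING):
            in_block = True
            continue
        if in_block:
            if line.startswith(" "):
                # package names are printed on indented continuation lines
                packages.extend(line.split())
            else:
                break  # first non-indented line ends the block
    return packages


# Illustrative sample only, written by hand for this sketch.
SAMPLE = """\
Reading package lists... Done
Building dependency tree
The following packages have been kept back:
  ceph ceph-common
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
"""

if __name__ == "__main__":
    held = kept_back_packages(SAMPLE)
    if held:
        print("still at the old version:", ", ".join(held))
```

If the list is non-empty, the node is still running the old packages, and `apt-get dist-upgrade` (or a purge and reinstall, as described earlier in the thread) is needed before the daemons are restarted.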