From: Gonzalo Aguilar Delgado
Subject: OSD will never become up. HEALTH_ERR
Date: Wed, 11 May 2016 10:37:07 +0200
Message-ID: <1462955827.13078.22.camel@gmail.com>
To: ceph-devel@vger.kernel.org

Hello,

I just upgraded my cluster to version 10.1.2 and it worked well for a
while, until I saw that the systemd unit ceph-disk@dev-sdc1.service had
failed and I reran it.

From there the OSD stopped working.

This is Ubuntu 16.04.

I asked for help on IRC, where people pointed me to one place or
another, but none of the investigations resolved the problem.

My configuration is rather simple:

root@red-compute:~# ceph osd tree
ID WEIGHT  TYPE NAME                 UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 1.00000 root default
-4 1.00000     rack rack-1
-2 1.00000         host blue-compute
 0 1.00000             osd.0            down        0          1.00000
 2 1.00000             osd.2            down        0          1.00000
-3 1.00000         host red-compute
 1 1.00000             osd.1            down        0          1.00000
 3 0.50000             osd.3              up  1.00000          1.00000
 4 1.00000             osd.4            down        0          1.00000

It seems that all nodes are in preboot status. I was looking at the
latest commits and it seems that there's a patch that makes OSDs wait
for the cluster to become healthy before rejoining. Can this be the
source of my problems?

root@red-compute:/var/lib/ceph/osd/ceph-1# ceph daemon osd.1 status
{
    "cluster_fsid": "9028f4da-0d77-462b-be9b-dbdf7fa57771",
    "osd_fsid": "adf9890a-e680-48e4-82c6-e96f4ed56889",
    "whoami": 1,
    "state": "preboot",
    "oldest_map": 1764,
    "newest_map": 2504,
    "num_pgs": 323
}

root@red-compute:/var/lib/ceph/osd/ceph-1# ceph daemon osd.3 status
{
    "cluster_fsid": "9028f4da-0d77-462b-be9b-dbdf7fa57771",
    "osd_fsid": "8dd085d4-0b50-4c80-a0ca-c5bc4ad972f7",
    "whoami": 3,
    "state": "preboot",
    "oldest_map": 1764,
    "newest_map": 2504,
    "num_pgs": 150
}

osd.3 is up and in.

This is what I have got so far:

Once upgraded, I discovered that the daemon now runs as the ceph user.
I just ran chown on the ceph directories, and it worked.

The firewall is fully disabled. I checked connectivity with nc and nmap.

The configuration seems to be right. I can post it if you want.

Enabling logging on the OSDs shows that, for example, osd.1 is
reconnecting all the time.

2016-05-10 14:35:48.199573 7f53e8f1a700  1 -- 0.0.0.0:6806/13962 >> :/0 pipe(0x556f99413400 sd=84 :6806 s=0 pgs=0 cs=0 l=0 c=0x556f993b3a80).accept sd=84 172.16.0.119:35388/0
2016-05-10 14:35:48.199966 7f53e8f1a700  2 -- 0.0.0.0:6806/13962 >> :/0 pipe(0x556f99413400 sd=84 :6806 s=4 pgs=0 cs=0 l=0 c=0x556f993b3a80).fault (0) Success
2016-05-10 14:35:48.200018 7f53fb941700  1 osd.1 2468 ms_handle_reset con 0x556f993b3a80 session 0

osd.3 is OK because it never went out, because of a Ceph restriction.

I restarted all services at once so that all OSDs would be available at
the same time and none would be marked down. It didn't work.

I forced them up from the command line: ceph osd in 1-5. They appear as
in for a while, then out.

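To compare the admin-socket status of several OSDs at a glance, something like the following could be used. It is only a sketch: it assumes the JSON printed by `ceph daemon osd.N status` has been captured as text, and the helper name `summarize_osd_status` is made up for illustration (it is not part of Ceph). It reads only the fields that appear in the output quoted above and does not talk to the cluster.

```python
import json


def summarize_osd_status(raw):
    """Summarize the JSON text printed by `ceph daemon osd.N status`.

    Hypothetical helper, not a Ceph API. It only reads fields that
    appear in the status output quoted in this message.
    """
    status = json.loads(raw)
    return {
        "osd": status["whoami"],
        "state": status["state"],
        # Number of osdmap epochs between the oldest and newest map held.
        "map_span": status["newest_map"] - status["oldest_map"],
        "num_pgs": status["num_pgs"],
    }


# Sample input copied from the osd.1 status output in this message.
OSD1_STATUS = """
{
    "cluster_fsid": "9028f4da-0d77-462b-be9b-dbdf7fa57771",
    "osd_fsid": "adf9890a-e680-48e4-82c6-e96f4ed56889",
    "whoami": 1,
    "state": "preboot",
    "oldest_map": 1764,
    "newest_map": 2504,
    "num_pgs": 323
}
"""

if __name__ == "__main__":
    summary = summarize_osd_status(OSD1_STATUS)
    print("osd.{osd}: state={state}, map span={map_span}, pgs={num_pgs}".format(**summary))
```

Running it over the output of each OSD makes it easy to see which daemons are still in "preboot" and whether they all hold the same range of maps.
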
We tried ceph-disk activate-all to boot everything. It didn't work.

The strange thing is that the cluster worked fine right after the
upgrade, but the systemctl command broke both servers.

root@blue-compute:~# ceph -w
    cluster 9028f4da-0d77-462b-be9b-dbdf7fa57771
     health HEALTH_ERR
            694 pgs are stuck inactive for more than 300 seconds
            694 pgs stale
            694 pgs stuck stale
            too many PGs per OSD (1528 > max 300)
            mds cluster is degraded
            crush map has straw_calc_version=0
     monmap e10: 2 mons at {blue-compute=172.16.0.119:6789/0,red-compute=172.16.0.100:6789/0}
            election epoch 3600, quorum 0,1 red-compute,blue-compute
      fsmap e673: 1/1/1 up {0:0=blue-compute=up:replay}
     osdmap e2495: 5 osds: 1 up, 1 in; 5 remapped pgs
      pgmap v40765481: 764 pgs, 6 pools, 410 GB data, 103 kobjects
            87641 MB used, 212 GB / 297 GB avail
                 694 stale+active+clean
                  70 active+clean

2016-05-10 17:03:55.822440 mon.0 [INF] HEALTH_ERR; 694 pgs are stuck inactive for more than 300 seconds; 694 pgs stale; 694 pgs stuck stale; too many PGs per OSD (1528 > max 300); mds cluster is degraded; crush map has straw_calc_version=

cat /etc/ceph/ceph.conf
[global]
fsid = 9028f4da-0d77-462b-be9b-dbdf7fa57771
mon_initial_members = blue-compute, red-compute
mon_host = 172.16.0.119, 172.16.0.100
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public_network = 172.16.0.0/24
osd_pool_default_pg_num = 100
osd_pool_default_pgp_num = 100
osd_pool_default_size = 2  # Write an object 3 times.
osd_pool_default_min_size = 1 # Allow writing one copy in a degraded state.

## Required upgrade
osd max object name len = 256
osd max object namespace len = 64

[mon.]
    debug mon = 9
    caps mon = "allow *"

Any help on this? Any clue as to what's going wrong?

I also see this; I don't know if it's related or not:

==> ceph-osd.admin.log <==
2016-05-10 18:21:46.060278 7fa8f30cc8c0  0 ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4), process ceph-osd, pid 14135
2016-05-10 18:21:46.060460 7fa8f30cc8c0 -1 bluestore(/dev/sdc2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2016-05-10 18:21:46.062949 7fa8f30cc8c0  1 journal _open /dev/sdc2 fd 4: 5367660544 bytes, block size 4096 bytes, directio = 0, aio = 0
2016-05-10 18:21:46.062991 7fa8f30cc8c0  1 journal close /dev/sdc2
2016-05-10 18:21:46.063026 7fa8f30cc8c0  0 probe_block_device_fsid /dev/sdc2 is filestore, 119a9f4e-73d8-4a1f-877c-d60b01840c96
2016-05-10 18:21:47.072082 7eff735598c0  0 ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4), process ceph-osd, pid 14177
2016-05-10 18:21:47.072285 7eff735598c0 -1 bluestore(/dev/sdf2) _read_bdev_label unable to decode label at offset 66: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
2016-05-10 18:21:47.074799 7eff735598c0  1 journal _open /dev/sdf2 fd 4: 5367660544 bytes, block size 4096 bytes, directio = 0, aio = 0
2016-05-10 18:21:47.074844 7eff735598c0  1 journal close /dev/sdf2
2016-05-10 18:21:47.074881 7eff735598c0  0 probe_block_device_fsid /dev/sdf2 is filestore, fd069e6a-9a62-4286-99cb-d8a523bd946a
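
P.S. A side note on the "too many PGs per OSD (1528 > max 300)" warning in the ceph -w output. The figure looks consistent with 764 PGs at 2 replicas each landing on the single OSD that is currently in. The sketch below only redoes that arithmetic. It assumes every pool uses size 2 (the default in the ceph.conf shown, though the actual pool sizes are not listed) and that the warning is the number of PG replicas divided by the number of in OSDs. The function name is made up for illustration.

```python
def pg_replicas_per_osd(total_pgs, replica_size, osds_in):
    """Average number of PG replicas each 'in' OSD would have to hold.

    Hypothetical helper for back-of-the-envelope checks only.
    """
    if osds_in <= 0:
        raise ValueError("need at least one OSD that is in")
    return total_pgs * replica_size // osds_in


# 764 PGs and 1 in OSD come from the ceph -w output;
# size 2 for every pool is an assumption.
print(pg_replicas_per_osd(764, 2, 1))  # 1528
# With all five OSDs in, the same assumption gives 305.
print(pg_replicas_per_osd(764, 2, 5))  # 305
```

If that assumption holds, most of the warning would be a side effect of the OSDs being out rather than a separate problem.
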