* problem with degraded PG
@ 2019-06-14 9:19 Luk
[not found] ` <304057881.20190614111947-wYtBgQxc//8@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Luk @ 2019-06-14 9:19 UTC (permalink / raw)
To: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
ceph-devel-u79uwXL29TY76Z2rM5mHXA
Hello,
Maybe somone was fighting with this kind of stuck in ceph already.
This is production cluster, can't/don't want to make wrong steps,
please advice, what to do.
After changing of one failed disk (it was osd-7) on our cluster ceph
didn't recover to HEALTH_OK, it stopped in state:
[root@ceph-mon-01 ~]# ceph -s
cluster:
id: b6f23cff-7279-f4b0-ff91-21fadac95bb5
health: HEALTH_WARN
noout,noscrub,nodeep-scrub flag(s) set
Degraded data redundancy: 24761/45994899 objects degraded (0.054%), 8 pgs degraded, 8 pgs undersized
services:
mon: 3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03
mgr: ceph-mon-03(active), standbys: ceph-mon-02, ceph-mon-01
osd: 144 osds: 144 up, 144 in
flags noout,noscrub,nodeep-scrub
rbd-mirror: 3 daemons active
rgw: 6 daemons active
data:
pools: 18 pools, 2176 pgs
objects: 15.33M objects, 49.3TiB
usage: 151TiB used, 252TiB / 403TiB avail
pgs: 24761/45994899 objects degraded (0.054%)
2168 active+clean
8 active+undersized+degraded
io:
client: 435MiB/s rd, 415MiB/s wr, 7.94kop/s rd, 2.96kop/s wr
Restart of OSD didn't helped, changing choose_total_tries from 50 to 100 didn't help.
I checked one of degraded PG, 10.3c4
[root@ceph-mon-01 ~]# ceph pg dump 2>&1 | grep -w 10.3c4
10.3c4 3593 0 3593 0 0 14769891858 10076 10076 active+undersized+degraded 2019-06-13 08:19:39.802219 37380'71900564 37380:119411139 [9,109] 9 [9,109] 9 33550'69130424 2019-06-08 02:28:40.508790 33550'69130424 2019-06-08 02:28:40.508790 18
[root@ceph-mon-01 ~]# ceph pg 10.3c4 query | jq '.["peer_info"][] | {peer: .peer, last_update:.last_update}'
{
"peer": "0",
"last_update": "36847'71412720"
}
{
"peer": "109",
"last_update": "37380'71900570"
}
{
"peer": "117",
"last_update": "0'0"
}
[root@ceph-mon-01 ~]#
I have checked space taken for this PG on storage nodes:
here is how to check where is particular OSD (on which physical storage node):
[root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 9 "
| 9 | stor-a02 | 2063G | 5386G | 52 | 1347k | 53 | 292k | exists,up |
[root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 109 "
| 109 | stor-a01 | 1285G | 4301G | 5 | 31.0k | 6 | 59.2k | exists,up |
[root@ceph-mon-01 ~]# watch ceph -s
[root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 117 "
| 117 | stor-b02 | 1334G | 4252G | 54 | 1216k | 13 | 27.4k | exists,up |
[root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 0 "
| 0 | stor-a01 | 2156G | 5293G | 58 | 387k | 29 | 30.7k | exists,up |
[root@ceph-mon-01 ~]#
and checking sizes on servers:
stor-a01 (this PG shouldn't be on the same host):
[root@stor-a01 /var/lib/ceph/osd/ceph-0/current]# du -sh 10.3c4_*
2.4G 10.3c4_head
0 10.3c4_TEMP
[root@stor-a01 /var/lib/ceph/osd/ceph-109/current]# du -sh 10.3c4_*
14G 10.3c4_head
0 10.3c4_TEMP
[root@stor-a01 /var/lib/ceph/osd/ceph-109/current]#
stor-a02:
[root@stor-a02 /var/lib/ceph/osd/ceph-9/current]# du -sh 10.3c4_*
14G 10.3c4_head
0 10.3c4_TEMP
[root@stor-a02 /var/lib/ceph/osd/ceph-9/current]#
stor-b02:
[root@stor-b02 /var/lib/ceph/osd/ceph-117/current]# du -sh 10.3c4_*
zsh: no matches found: 10.3c4_*
information about ceph:
[root@ceph-mon-01 ~]# ceph versions
{
"mon": {
"ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
},
"mgr": {
"ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
},
"osd": {
"ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 144
},
"mds": {},
"rbd-mirror": {
"ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
},
"rgw": {
"ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 6
},
"overall": {
"ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 159
}
}
crushmap: https://pastebin.com/cpC2WmyS
ceph osd tree: https://pastebin.com/XvZ2cNZZ
I'm cross-posting this do devel because maybe there is some known bug
in this particular version of ceph, and You could point some
directions to fix this problem.
--
Regards
Lukasz
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: problem with degraded PG
[not found] ` <304057881.20190614111947-wYtBgQxc//8@public.gmane.org>
@ 2019-06-14 9:22 ` Dan van der Ster
[not found] ` <CABZ+qq=Ob4SuvFSbpN-8+NP66_rQJO2_HaNmThVwF6s8vOw+hQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Dan van der Ster @ 2019-06-14 9:22 UTC (permalink / raw)
To: Łukasz Chrustek
Cc: ceph-users, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hi,
This looks like a tunables issue.
What is the output of `ceph osd crush show-tunables `
-- Dan
On Fri, Jun 14, 2019 at 11:19 AM Luk <skidoo-wYtBgQxc//8@public.gmane.org> wrote:
>
> Hello,
>
> Maybe somone was fighting with this kind of stuck in ceph already.
> This is production cluster, can't/don't want to make wrong steps,
> please advice, what to do.
>
> After changing of one failed disk (it was osd-7) on our cluster ceph
> didn't recover to HEALTH_OK, it stopped in state:
>
> [root@ceph-mon-01 ~]# ceph -s
> cluster:
> id: b6f23cff-7279-f4b0-ff91-21fadac95bb5
> health: HEALTH_WARN
> noout,noscrub,nodeep-scrub flag(s) set
> Degraded data redundancy: 24761/45994899 objects degraded (0.054%), 8 pgs degraded, 8 pgs undersized
>
> services:
> mon: 3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03
> mgr: ceph-mon-03(active), standbys: ceph-mon-02, ceph-mon-01
> osd: 144 osds: 144 up, 144 in
> flags noout,noscrub,nodeep-scrub
> rbd-mirror: 3 daemons active
> rgw: 6 daemons active
>
> data:
> pools: 18 pools, 2176 pgs
> objects: 15.33M objects, 49.3TiB
> usage: 151TiB used, 252TiB / 403TiB avail
> pgs: 24761/45994899 objects degraded (0.054%)
> 2168 active+clean
> 8 active+undersized+degraded
>
> io:
> client: 435MiB/s rd, 415MiB/s wr, 7.94kop/s rd, 2.96kop/s wr
>
> Restart of OSD didn't helped, changing choose_total_tries from 50 to 100 didn't help.
>
> I checked one of degraded PG, 10.3c4
>
> [root@ceph-mon-01 ~]# ceph pg dump 2>&1 | grep -w 10.3c4
> 10.3c4 3593 0 3593 0 0 14769891858 10076 10076 active+undersized+degraded 2019-06-13 08:19:39.802219 37380'71900564 37380:119411139 [9,109] 9 [9,109] 9 33550'69130424 2019-06-08 02:28:40.508790 33550'69130424 2019-06-08 02:28:40.508790 18
>
>
> [root@ceph-mon-01 ~]# ceph pg 10.3c4 query | jq '.["peer_info"][] | {peer: .peer, last_update:.last_update}'
> {
> "peer": "0",
> "last_update": "36847'71412720"
> }
> {
> "peer": "109",
> "last_update": "37380'71900570"
> }
> {
> "peer": "117",
> "last_update": "0'0"
> }
>
>
> [root@ceph-mon-01 ~]#
> I have checked space taken for this PG on storage nodes:
> here is how to check where is particular OSD (on which physical storage node):
> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 9 "
> | 9 | stor-a02 | 2063G | 5386G | 52 | 1347k | 53 | 292k | exists,up |
> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 109 "
> | 109 | stor-a01 | 1285G | 4301G | 5 | 31.0k | 6 | 59.2k | exists,up |
> [root@ceph-mon-01 ~]# watch ceph -s
> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 117 "
> | 117 | stor-b02 | 1334G | 4252G | 54 | 1216k | 13 | 27.4k | exists,up |
> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 0 "
> | 0 | stor-a01 | 2156G | 5293G | 58 | 387k | 29 | 30.7k | exists,up |
> [root@ceph-mon-01 ~]#
> and checking sizes on servers:
> stor-a01 (this PG shouldn't be on the same host):
> [root@stor-a01 /var/lib/ceph/osd/ceph-0/current]# du -sh 10.3c4_*
> 2.4G 10.3c4_head
> 0 10.3c4_TEMP
> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]# du -sh 10.3c4_*
> 14G 10.3c4_head
> 0 10.3c4_TEMP
> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]#
> stor-a02:
> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]# du -sh 10.3c4_*
> 14G 10.3c4_head
> 0 10.3c4_TEMP
> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]#
> stor-b02:
> [root@stor-b02 /var/lib/ceph/osd/ceph-117/current]# du -sh 10.3c4_*
> zsh: no matches found: 10.3c4_*
>
> information about ceph:
> [root@ceph-mon-01 ~]# ceph versions
> {
> "mon": {
> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> },
> "mgr": {
> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> },
> "osd": {
> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 144
> },
> "mds": {},
> "rbd-mirror": {
> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> },
> "rgw": {
> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 6
> },
> "overall": {
> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 159
> }
> }
>
> crushmap: https://pastebin.com/cpC2WmyS
> ceph osd tree: https://pastebin.com/XvZ2cNZZ
>
> I'm cross-posting this do devel because maybe there is some known bug
> in this particular version of ceph, and You could point some
> directions to fix this problem.
>
> --
> Regards
> Lukasz
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: problem with degraded PG
[not found] ` <CABZ+qq=Ob4SuvFSbpN-8+NP66_rQJO2_HaNmThVwF6s8vOw+hQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-06-14 9:29 ` Luk
[not found] ` <1171202508.20190614112950-wYtBgQxc//8@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Luk @ 2019-06-14 9:29 UTC (permalink / raw)
To: Dan van der Ster
Cc: ceph-users, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hi,
here is the output:
ceph osd crush show-tunables
{
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 100,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"chooseleaf_stable": 0,
"straw_calc_version": 1,
"allowed_bucket_algs": 22,
"profile": "unknown",
"optimal_tunables": 0,
"legacy_tunables": 0,
"minimum_required_version": "hammer",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 0,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 1,
"require_feature_tunables5": 0,
"has_v5_rules": 0
}
[root@ceph-mon-01 ~]#
--
Regards
Lukasz
> Hi,
> This looks like a tunables issue.
> What is the output of `ceph osd crush show-tunables `
> -- Dan
> On Fri, Jun 14, 2019 at 11:19 AM Luk <skidoo-wYtBgQxc//8@public.gmane.org> wrote:
>>
>> Hello,
>>
>> Maybe somone was fighting with this kind of stuck in ceph already.
>> This is production cluster, can't/don't want to make wrong steps,
>> please advice, what to do.
>>
>> After changing of one failed disk (it was osd-7) on our cluster ceph
>> didn't recover to HEALTH_OK, it stopped in state:
>>
>> [root@ceph-mon-01 ~]# ceph -s
>> cluster:
>> id: b6f23cff-7279-f4b0-ff91-21fadac95bb5
>> health: HEALTH_WARN
>> noout,noscrub,nodeep-scrub flag(s) set
>> Degraded data redundancy: 24761/45994899 objects degraded (0.054%), 8 pgs degraded, 8 pgs undersized
>>
>> services:
>> mon: 3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03
>> mgr: ceph-mon-03(active), standbys: ceph-mon-02, ceph-mon-01
>> osd: 144 osds: 144 up, 144 in
>> flags noout,noscrub,nodeep-scrub
>> rbd-mirror: 3 daemons active
>> rgw: 6 daemons active
>>
>> data:
>> pools: 18 pools, 2176 pgs
>> objects: 15.33M objects, 49.3TiB
>> usage: 151TiB used, 252TiB / 403TiB avail
>> pgs: 24761/45994899 objects degraded (0.054%)
>> 2168 active+clean
>> 8 active+undersized+degraded
>>
>> io:
>> client: 435MiB/s rd, 415MiB/s wr, 7.94kop/s rd, 2.96kop/s wr
>>
>> Restart of OSD didn't helped, changing choose_total_tries from 50 to 100 didn't help.
>>
>> I checked one of degraded PG, 10.3c4
>>
>> [root@ceph-mon-01 ~]# ceph pg dump 2>&1 | grep -w 10.3c4
>> 10.3c4 3593 0 3593 0 0 14769891858 10076 10076 active+undersized+degraded 2019-06-13 08:19:39.802219 37380'71900564 37380:119411139 [9,109] 9 [9,109] 9 33550'69130424 2019-06-08 02:28:40.508790 33550'69130424 2019-06-08 02:28:40.508790 18
>>
>>
>> [root@ceph-mon-01 ~]# ceph pg 10.3c4 query | jq '.["peer_info"][] | {peer: .peer, last_update:.last_update}'
>> {
>> "peer": "0",
>> "last_update": "36847'71412720"
>> }
>> {
>> "peer": "109",
>> "last_update": "37380'71900570"
>> }
>> {
>> "peer": "117",
>> "last_update": "0'0"
>> }
>>
>>
>> [root@ceph-mon-01 ~]#
>> I have checked space taken for this PG on storage nodes:
>> here is how to check where is particular OSD (on which physical storage node):
>> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 9 "
>> | 9 | stor-a02 | 2063G | 5386G | 52 | 1347k | 53 | 292k | exists,up |
>> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 109 "
>> | 109 | stor-a01 | 1285G | 4301G | 5 | 31.0k | 6 | 59.2k | exists,up |
>> [root@ceph-mon-01 ~]# watch ceph -s
>> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 117 "
>> | 117 | stor-b02 | 1334G | 4252G | 54 | 1216k | 13 | 27.4k | exists,up |
>> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 0 "
>> | 0 | stor-a01 | 2156G | 5293G | 58 | 387k | 29 | 30.7k | exists,up |
>> [root@ceph-mon-01 ~]#
>> and checking sizes on servers:
>> stor-a01 (this PG shouldn't be on the same host):
>> [root@stor-a01 /var/lib/ceph/osd/ceph-0/current]# du -sh 10.3c4_*
>> 2.4G 10.3c4_head
>> 0 10.3c4_TEMP
>> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]# du -sh 10.3c4_*
>> 14G 10.3c4_head
>> 0 10.3c4_TEMP
>> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]#
>> stor-a02:
>> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]# du -sh 10.3c4_*
>> 14G 10.3c4_head
>> 0 10.3c4_TEMP
>> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]#
>> stor-b02:
>> [root@stor-b02 /var/lib/ceph/osd/ceph-117/current]# du -sh 10.3c4_*
>> zsh: no matches found: 10.3c4_*
>>
>> information about ceph:
>> [root@ceph-mon-01 ~]# ceph versions
>> {
>> "mon": {
>> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
>> },
>> "mgr": {
>> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
>> },
>> "osd": {
>> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 144
>> },
>> "mds": {},
>> "rbd-mirror": {
>> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
>> },
>> "rgw": {
>> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 6
>> },
>> "overall": {
>> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 159
>> }
>> }
>>
>> crushmap: https://pastebin.com/cpC2WmyS
>> ceph osd tree: https://pastebin.com/XvZ2cNZZ
>>
>> I'm cross-posting this do devel because maybe there is some known bug
>> in this particular version of ceph, and You could point some
>> directions to fix this problem.
>>
>> --
>> Regards
>> Lukasz
>>
--
Pozdrowienia,
Luk
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: problem with degraded PG
[not found] ` <1171202508.20190614112950-wYtBgQxc//8@public.gmane.org>
@ 2019-06-14 9:38 ` Dan van der Ster
[not found] ` <CABZ+qqmN0zG57Oqtv3wMLHrfXEwCPfGQARewq=hkjOxmZ0=G6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Dan van der Ster @ 2019-06-14 9:38 UTC (permalink / raw)
To: Łukasz Chrustek
Cc: ceph-users, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Ahh I was thinking of chooseleaf_vary_r, which you already have.
So probably not related to tunables. What is your `ceph osd tree` ?
By the way, 12.2.9 has an unrelated bug (details
http://tracker.ceph.com/issues/36686)
AFAIU you will just need to update to v12.2.11 or v12.2.12 for that fix.
-- Dan
On Fri, Jun 14, 2019 at 11:29 AM Luk <skidoo-wYtBgQxc//8@public.gmane.org> wrote:
>
> Hi,
>
> here is the output:
>
> ceph osd crush show-tunables
> {
> "choose_local_tries": 0,
> "choose_local_fallback_tries": 0,
> "choose_total_tries": 100,
> "chooseleaf_descend_once": 1,
> "chooseleaf_vary_r": 1,
> "chooseleaf_stable": 0,
> "straw_calc_version": 1,
> "allowed_bucket_algs": 22,
> "profile": "unknown",
> "optimal_tunables": 0,
> "legacy_tunables": 0,
> "minimum_required_version": "hammer",
> "require_feature_tunables": 1,
> "require_feature_tunables2": 1,
> "has_v2_rules": 0,
> "require_feature_tunables3": 1,
> "has_v3_rules": 0,
> "has_v4_buckets": 1,
> "require_feature_tunables5": 0,
> "has_v5_rules": 0
> }
>
> [root@ceph-mon-01 ~]#
>
> --
> Regards
> Lukasz
>
> > Hi,
> > This looks like a tunables issue.
> > What is the output of `ceph osd crush show-tunables `
>
> > -- Dan
>
> > On Fri, Jun 14, 2019 at 11:19 AM Luk <skidoo-wYtBgQxc//8@public.gmane.org> wrote:
> >>
> >> Hello,
> >>
> >> Maybe somone was fighting with this kind of stuck in ceph already.
> >> This is production cluster, can't/don't want to make wrong steps,
> >> please advice, what to do.
> >>
> >> After changing of one failed disk (it was osd-7) on our cluster ceph
> >> didn't recover to HEALTH_OK, it stopped in state:
> >>
> >> [root@ceph-mon-01 ~]# ceph -s
> >> cluster:
> >> id: b6f23cff-7279-f4b0-ff91-21fadac95bb5
> >> health: HEALTH_WARN
> >> noout,noscrub,nodeep-scrub flag(s) set
> >> Degraded data redundancy: 24761/45994899 objects degraded (0.054%), 8 pgs degraded, 8 pgs undersized
> >>
> >> services:
> >> mon: 3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03
> >> mgr: ceph-mon-03(active), standbys: ceph-mon-02, ceph-mon-01
> >> osd: 144 osds: 144 up, 144 in
> >> flags noout,noscrub,nodeep-scrub
> >> rbd-mirror: 3 daemons active
> >> rgw: 6 daemons active
> >>
> >> data:
> >> pools: 18 pools, 2176 pgs
> >> objects: 15.33M objects, 49.3TiB
> >> usage: 151TiB used, 252TiB / 403TiB avail
> >> pgs: 24761/45994899 objects degraded (0.054%)
> >> 2168 active+clean
> >> 8 active+undersized+degraded
> >>
> >> io:
> >> client: 435MiB/s rd, 415MiB/s wr, 7.94kop/s rd, 2.96kop/s wr
> >>
> >> Restart of OSD didn't helped, changing choose_total_tries from 50 to 100 didn't help.
> >>
> >> I checked one of degraded PG, 10.3c4
> >>
> >> [root@ceph-mon-01 ~]# ceph pg dump 2>&1 | grep -w 10.3c4
> >> 10.3c4 3593 0 3593 0 0 14769891858 10076 10076 active+undersized+degraded 2019-06-13 08:19:39.802219 37380'71900564 37380:119411139 [9,109] 9 [9,109] 9 33550'69130424 2019-06-08 02:28:40.508790 33550'69130424 2019-06-08 02:28:40.508790 18
> >>
> >>
> >> [root@ceph-mon-01 ~]# ceph pg 10.3c4 query | jq '.["peer_info"][] | {peer: .peer, last_update:.last_update}'
> >> {
> >> "peer": "0",
> >> "last_update": "36847'71412720"
> >> }
> >> {
> >> "peer": "109",
> >> "last_update": "37380'71900570"
> >> }
> >> {
> >> "peer": "117",
> >> "last_update": "0'0"
> >> }
> >>
> >>
> >> [root@ceph-mon-01 ~]#
> >> I have checked space taken for this PG on storage nodes:
> >> here is how to check where is particular OSD (on which physical storage node):
> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 9 "
> >> | 9 | stor-a02 | 2063G | 5386G | 52 | 1347k | 53 | 292k | exists,up |
> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 109 "
> >> | 109 | stor-a01 | 1285G | 4301G | 5 | 31.0k | 6 | 59.2k | exists,up |
> >> [root@ceph-mon-01 ~]# watch ceph -s
> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 117 "
> >> | 117 | stor-b02 | 1334G | 4252G | 54 | 1216k | 13 | 27.4k | exists,up |
> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 0 "
> >> | 0 | stor-a01 | 2156G | 5293G | 58 | 387k | 29 | 30.7k | exists,up |
> >> [root@ceph-mon-01 ~]#
> >> and checking sizes on servers:
> >> stor-a01 (this PG shouldn't be on the same host):
> >> [root@stor-a01 /var/lib/ceph/osd/ceph-0/current]# du -sh 10.3c4_*
> >> 2.4G 10.3c4_head
> >> 0 10.3c4_TEMP
> >> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]# du -sh 10.3c4_*
> >> 14G 10.3c4_head
> >> 0 10.3c4_TEMP
> >> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]#
> >> stor-a02:
> >> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]# du -sh 10.3c4_*
> >> 14G 10.3c4_head
> >> 0 10.3c4_TEMP
> >> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]#
> >> stor-b02:
> >> [root@stor-b02 /var/lib/ceph/osd/ceph-117/current]# du -sh 10.3c4_*
> >> zsh: no matches found: 10.3c4_*
> >>
> >> information about ceph:
> >> [root@ceph-mon-01 ~]# ceph versions
> >> {
> >> "mon": {
> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >> },
> >> "mgr": {
> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >> },
> >> "osd": {
> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 144
> >> },
> >> "mds": {},
> >> "rbd-mirror": {
> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >> },
> >> "rgw": {
> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 6
> >> },
> >> "overall": {
> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 159
> >> }
> >> }
> >>
> >> crushmap: https://pastebin.com/cpC2WmyS
> >> ceph osd tree: https://pastebin.com/XvZ2cNZZ
> >>
> >> I'm cross-posting this do devel because maybe there is some known bug
> >> in this particular version of ceph, and You could point some
> >> directions to fix this problem.
> >>
> >> --
> >> Regards
> >> Lukasz
> >>
>
>
>
> --
> Pozdrowienia,
> Luk
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: problem with degraded PG
[not found] ` <CABZ+qqmN0zG57Oqtv3wMLHrfXEwCPfGQARewq=hkjOxmZ0=G6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-06-14 9:52 ` Luk
[not found] ` <661486857.20190614115232-wYtBgQxc//8@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Luk @ 2019-06-14 9:52 UTC (permalink / raw)
To: Dan van der Ster
Cc: ceph-users, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Here is ceph osd tree, in first post there is also ceph osd df tree:
https://pastebin.com/Vs75gpwZ
> Ahh I was thinking of chooseleaf_vary_r, which you already have.
> So probably not related to tunables. What is your `ceph osd tree` ?
> By the way, 12.2.9 has an unrelated bug (details
> http://tracker.ceph.com/issues/36686)
> AFAIU you will just need to update to v12.2.11 or v12.2.12 for that fix.
> -- Dan
> On Fri, Jun 14, 2019 at 11:29 AM Luk <skidoo-wYtBgQxc//8@public.gmane.org> wrote:
>>
>> Hi,
>>
>> here is the output:
>>
>> ceph osd crush show-tunables
>> {
>> "choose_local_tries": 0,
>> "choose_local_fallback_tries": 0,
>> "choose_total_tries": 100,
>> "chooseleaf_descend_once": 1,
>> "chooseleaf_vary_r": 1,
>> "chooseleaf_stable": 0,
>> "straw_calc_version": 1,
>> "allowed_bucket_algs": 22,
>> "profile": "unknown",
>> "optimal_tunables": 0,
>> "legacy_tunables": 0,
>> "minimum_required_version": "hammer",
>> "require_feature_tunables": 1,
>> "require_feature_tunables2": 1,
>> "has_v2_rules": 0,
>> "require_feature_tunables3": 1,
>> "has_v3_rules": 0,
>> "has_v4_buckets": 1,
>> "require_feature_tunables5": 0,
>> "has_v5_rules": 0
>> }
>>
>> [root@ceph-mon-01 ~]#
>>
>> --
>> Regards
>> Lukasz
>>
>> > Hi,
>> > This looks like a tunables issue.
>> > What is the output of `ceph osd crush show-tunables `
>>
>> > -- Dan
>>
>> > On Fri, Jun 14, 2019 at 11:19 AM Luk <skidoo-wYtBgQxc//8@public.gmane.org> wrote:
>> >>
>> >> Hello,
>> >>
>> >> Maybe somone was fighting with this kind of stuck in ceph already.
>> >> This is production cluster, can't/don't want to make wrong steps,
>> >> please advice, what to do.
>> >>
>> >> After changing of one failed disk (it was osd-7) on our cluster ceph
>> >> didn't recover to HEALTH_OK, it stopped in state:
>> >>
>> >> [root@ceph-mon-01 ~]# ceph -s
>> >> cluster:
>> >> id: b6f23cff-7279-f4b0-ff91-21fadac95bb5
>> >> health: HEALTH_WARN
>> >> noout,noscrub,nodeep-scrub flag(s) set
>> >> Degraded data redundancy: 24761/45994899 objects degraded (0.054%), 8 pgs degraded, 8 pgs undersized
>> >>
>> >> services:
>> >> mon: 3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03
>> >> mgr: ceph-mon-03(active), standbys: ceph-mon-02, ceph-mon-01
>> >> osd: 144 osds: 144 up, 144 in
>> >> flags noout,noscrub,nodeep-scrub
>> >> rbd-mirror: 3 daemons active
>> >> rgw: 6 daemons active
>> >>
>> >> data:
>> >> pools: 18 pools, 2176 pgs
>> >> objects: 15.33M objects, 49.3TiB
>> >> usage: 151TiB used, 252TiB / 403TiB avail
>> >> pgs: 24761/45994899 objects degraded (0.054%)
>> >> 2168 active+clean
>> >> 8 active+undersized+degraded
>> >>
>> >> io:
>> >> client: 435MiB/s rd, 415MiB/s wr, 7.94kop/s rd, 2.96kop/s wr
>> >>
>> >> Restart of OSD didn't helped, changing choose_total_tries from 50 to 100 didn't help.
>> >>
>> >> I checked one of degraded PG, 10.3c4
>> >>
>> >> [root@ceph-mon-01 ~]# ceph pg dump 2>&1 | grep -w 10.3c4
>> >> 10.3c4 3593 0 3593 0 0 14769891858 10076 10076 active+undersized+degraded 2019-06-13 08:19:39.802219 37380'71900564 37380:119411139 [9,109] 9 [9,109] 9 33550'69130424 2019-06-08 02:28:40.508790 33550'69130424 2019-06-08 02:28:40.508790 18
>> >>
>> >>
>> >> [root@ceph-mon-01 ~]# ceph pg 10.3c4 query | jq '.["peer_info"][] | {peer: .peer, last_update:.last_update}'
>> >> {
>> >> "peer": "0",
>> >> "last_update": "36847'71412720"
>> >> }
>> >> {
>> >> "peer": "109",
>> >> "last_update": "37380'71900570"
>> >> }
>> >> {
>> >> "peer": "117",
>> >> "last_update": "0'0"
>> >> }
>> >>
>> >>
>> >> [root@ceph-mon-01 ~]#
>> >> I have checked space taken for this PG on storage nodes:
>> >> here is how to check where is particular OSD (on which physical storage node):
>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 9 "
>> >> | 9 | stor-a02 | 2063G | 5386G | 52 | 1347k | 53 | 292k | exists,up |
>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 109 "
>> >> | 109 | stor-a01 | 1285G | 4301G | 5 | 31.0k | 6 | 59.2k | exists,up |
>> >> [root@ceph-mon-01 ~]# watch ceph -s
>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 117 "
>> >> | 117 | stor-b02 | 1334G | 4252G | 54 | 1216k | 13 | 27.4k | exists,up |
>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 0 "
>> >> | 0 | stor-a01 | 2156G | 5293G | 58 | 387k | 29 | 30.7k | exists,up |
>> >> [root@ceph-mon-01 ~]#
>> >> and checking sizes on servers:
>> >> stor-a01 (this PG shouldn't be on the same host):
>> >> [root@stor-a01 /var/lib/ceph/osd/ceph-0/current]# du -sh 10.3c4_*
>> >> 2.4G 10.3c4_head
>> >> 0 10.3c4_TEMP
>> >> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]# du -sh 10.3c4_*
>> >> 14G 10.3c4_head
>> >> 0 10.3c4_TEMP
>> >> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]#
>> >> stor-a02:
>> >> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]# du -sh 10.3c4_*
>> >> 14G 10.3c4_head
>> >> 0 10.3c4_TEMP
>> >> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]#
>> >> stor-b02:
>> >> [root@stor-b02 /var/lib/ceph/osd/ceph-117/current]# du -sh 10.3c4_*
>> >> zsh: no matches found: 10.3c4_*
>> >>
>> >> information about ceph:
>> >> [root@ceph-mon-01 ~]# ceph versions
>> >> {
>> >> "mon": {
>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
>> >> },
>> >> "mgr": {
>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
>> >> },
>> >> "osd": {
>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 144
>> >> },
>> >> "mds": {},
>> >> "rbd-mirror": {
>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
>> >> },
>> >> "rgw": {
>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 6
>> >> },
>> >> "overall": {
>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 159
>> >> }
>> >> }
>> >>
>> >> crushmap: https://pastebin.com/cpC2WmyS
>> >> ceph osd tree: https://pastebin.com/XvZ2cNZZ
>> >>
>> >> I'm cross-posting this do devel because maybe there is some known bug
>> >> in this particular version of ceph, and You could point some
>> >> directions to fix this problem.
>> >>
>> >> --
>> >> Regards
>> >> Lukasz
>> >>
>>
>>
>>
>> --
>> Pozdrowienia,
>> Luk
>>
--
Pozdrowienia,
Luk
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: problem with degraded PG
[not found] ` <661486857.20190614115232-wYtBgQxc//8@public.gmane.org>
@ 2019-06-14 10:02 ` Caspar Smit
[not found] ` <CADvHm=ah+SZiZs8apTxC5Uom3NzEydSKswj0hsgC9pgTzgodxQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-06-14 12:02 ` Luk
1 sibling, 1 reply; 9+ messages in thread
From: Caspar Smit @ 2019-06-14 10:02 UTC (permalink / raw)
To: Łukasz Chrustek
Cc: ceph-users, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
[-- Attachment #1.1: Type: text/plain, Size: 7811 bytes --]
Hi,
Something seems off in weight and sizes of hosts: ssdstor-a01, ssdstor-b01
and ssdstor-c01.
ssdstor-c01 has a weight of 0.4 while the size ~10TiB so i would expect the
weight to be around 10 in stead of 0.4
Same goes for the other two nodes i mentioned above.
Could you explain this?
Kind regards,
Caspar Smit
Op vr 14 jun. 2019 om 11:52 schreef Luk <skidoo-wYtBgQxc//8@public.gmane.org>:
> Here is ceph osd tree, in first post there is also ceph osd df tree:
>
> https://pastebin.com/Vs75gpwZ
>
>
>
> > Ahh I was thinking of chooseleaf_vary_r, which you already have.
> > So probably not related to tunables. What is your `ceph osd tree` ?
>
> > By the way, 12.2.9 has an unrelated bug (details
> > http://tracker.ceph.com/issues/36686)
> > AFAIU you will just need to update to v12.2.11 or v12.2.12 for that fix.
>
> > -- Dan
>
> > On Fri, Jun 14, 2019 at 11:29 AM Luk <skidoo-wYtBgQxc//8@public.gmane.org> wrote:
> >>
> >> Hi,
> >>
> >> here is the output:
> >>
> >> ceph osd crush show-tunables
> >> {
> >> "choose_local_tries": 0,
> >> "choose_local_fallback_tries": 0,
> >> "choose_total_tries": 100,
> >> "chooseleaf_descend_once": 1,
> >> "chooseleaf_vary_r": 1,
> >> "chooseleaf_stable": 0,
> >> "straw_calc_version": 1,
> >> "allowed_bucket_algs": 22,
> >> "profile": "unknown",
> >> "optimal_tunables": 0,
> >> "legacy_tunables": 0,
> >> "minimum_required_version": "hammer",
> >> "require_feature_tunables": 1,
> >> "require_feature_tunables2": 1,
> >> "has_v2_rules": 0,
> >> "require_feature_tunables3": 1,
> >> "has_v3_rules": 0,
> >> "has_v4_buckets": 1,
> >> "require_feature_tunables5": 0,
> >> "has_v5_rules": 0
> >> }
> >>
> >> [root@ceph-mon-01 ~]#
> >>
> >> --
> >> Regards
> >> Lukasz
> >>
> >> > Hi,
> >> > This looks like a tunables issue.
> >> > What is the output of `ceph osd crush show-tunables `
> >>
> >> > -- Dan
> >>
> >> > On Fri, Jun 14, 2019 at 11:19 AM Luk <skidoo-wYtBgQxc//8@public.gmane.org> wrote:
> >> >>
> >> >> Hello,
> >> >>
> >> >> Maybe somone was fighting with this kind of stuck in ceph
> already.
> >> >> This is production cluster, can't/don't want to make wrong
> steps,
> >> >> please advice, what to do.
> >> >>
> >> >> After changing of one failed disk (it was osd-7) on our cluster
> ceph
> >> >> didn't recover to HEALTH_OK, it stopped in state:
> >> >>
> >> >> [root@ceph-mon-01 ~]# ceph -s
> >> >> cluster:
> >> >> id: b6f23cff-7279-f4b0-ff91-21fadac95bb5
> >> >> health: HEALTH_WARN
> >> >> noout,noscrub,nodeep-scrub flag(s) set
> >> >> Degraded data redundancy: 24761/45994899 objects
> degraded (0.054%), 8 pgs degraded, 8 pgs undersized
> >> >>
> >> >> services:
> >> >> mon: 3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03
> >> >> mgr: ceph-mon-03(active), standbys: ceph-mon-02,
> ceph-mon-01
> >> >> osd: 144 osds: 144 up, 144 in
> >> >> flags noout,noscrub,nodeep-scrub
> >> >> rbd-mirror: 3 daemons active
> >> >> rgw: 6 daemons active
> >> >>
> >> >> data:
> >> >> pools: 18 pools, 2176 pgs
> >> >> objects: 15.33M objects, 49.3TiB
> >> >> usage: 151TiB used, 252TiB / 403TiB avail
> >> >> pgs: 24761/45994899 objects degraded (0.054%)
> >> >> 2168 active+clean
> >> >> 8 active+undersized+degraded
> >> >>
> >> >> io:
> >> >> client: 435MiB/s rd, 415MiB/s wr, 7.94kop/s rd, 2.96kop/s wr
> >> >>
> >> >> Restart of OSD didn't helped, changing choose_total_tries from 50 to
> 100 didn't help.
> >> >>
> >> >> I checked one of degraded PG, 10.3c4
> >> >>
> >> >> [root@ceph-mon-01 ~]# ceph pg dump 2>&1 | grep -w 10.3c4
> >> >> 10.3c4 3593 0 3593 0 0
> 14769891858 10076 10076 active+undersized+degraded 2019-06-13
> 08:19:39.802219 37380'71900564 37380:119411139 [9,109] 9
> [9,109] 9 33550'69130424 2019-06-08 02:28:40.508790
> 33550'69130424 2019-06-08 02:28:40.508790 18
> >> >>
> >> >>
> >> >> [root@ceph-mon-01 ~]# ceph pg 10.3c4 query | jq '.["peer_info"][] |
> {peer: .peer, last_update:.last_update}'
> >> >> {
> >> >> "peer": "0",
> >> >> "last_update": "36847'71412720"
> >> >> }
> >> >> {
> >> >> "peer": "109",
> >> >> "last_update": "37380'71900570"
> >> >> }
> >> >> {
> >> >> "peer": "117",
> >> >> "last_update": "0'0"
> >> >> }
> >> >>
> >> >>
> >> >> [root@ceph-mon-01 ~]#
> >> >> I have checked space taken for this PG on storage nodes:
> >> >> here is how to check where is particular OSD (on which physical
> storage node):
> >> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 9 "
> >> >> | 9 | stor-a02 | 2063G | 5386G | 52 | 1347k | 53 |
> 292k | exists,up |
> >> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 109 "
> >> >> | 109 | stor-a01 | 1285G | 4301G | 5 | 31.0k | 6 |
> 59.2k | exists,up |
> >> >> [root@ceph-mon-01 ~]# watch ceph -s
> >> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 117 "
> >> >> | 117 | stor-b02 | 1334G | 4252G | 54 | 1216k | 13 |
> 27.4k | exists,up |
> >> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 0 "
> >> >> | 0 | stor-a01 | 2156G | 5293G | 58 | 387k | 29 |
> 30.7k | exists,up |
> >> >> [root@ceph-mon-01 ~]#
> >> >> and checking sizes on servers:
> >> >> stor-a01 (this PG shouldn't be on the same host):
> >> >> [root@stor-a01 /var/lib/ceph/osd/ceph-0/current]# du -sh 10.3c4_*
> >> >> 2.4G 10.3c4_head
> >> >> 0 10.3c4_TEMP
> >> >> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]# du -sh 10.3c4_*
> >> >> 14G 10.3c4_head
> >> >> 0 10.3c4_TEMP
> >> >> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]#
> >> >> stor-a02:
> >> >> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]# du -sh 10.3c4_*
> >> >> 14G 10.3c4_head
> >> >> 0 10.3c4_TEMP
> >> >> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]#
> >> >> stor-b02:
> >> >> [root@stor-b02 /var/lib/ceph/osd/ceph-117/current]# du -sh 10.3c4_*
> >> >> zsh: no matches found: 10.3c4_*
> >> >>
> >> >> information about ceph:
> >> >> [root@ceph-mon-01 ~]# ceph versions
> >> >> {
> >> >> "mon": {
> >> >> "ceph version 12.2.9
> (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >> >> },
> >> >> "mgr": {
> >> >> "ceph version 12.2.9
> (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >> >> },
> >> >> "osd": {
> >> >> "ceph version 12.2.9
> (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 144
> >> >> },
> >> >> "mds": {},
> >> >> "rbd-mirror": {
> >> >> "ceph version 12.2.9
> (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >> >> },
> >> >> "rgw": {
> >> >> "ceph version 12.2.9
> (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 6
> >> >> },
> >> >> "overall": {
> >> >> "ceph version 12.2.9
> (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 159
> >> >> }
> >> >> }
> >> >>
> >> >> crushmap: https://pastebin.com/cpC2WmyS
> >> >> ceph osd tree: https://pastebin.com/XvZ2cNZZ
> >> >>
> >> >> I'm cross-posting this do devel because maybe there is some known
> bug
> >> >> in this particular version of ceph, and You could point
> some
> >> >> directions to fix this problem.
> >> >>
> >> >> --
> >> >> Regards
> >> >> Lukasz
> >> >>
> >>
> >>
> >>
> >> --
> >> Pozdrowienia,
> >> Luk
> >>
>
>
>
> --
> Pozdrowienia,
> Luk
>
> _______________________________________________
> ceph-users mailing list
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
[-- Attachment #1.2: Type: text/html, Size: 12396 bytes --]
[-- Attachment #2: Type: text/plain, Size: 178 bytes --]
_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: problem with degraded PG
[not found] ` <CADvHm=ah+SZiZs8apTxC5Uom3NzEydSKswj0hsgC9pgTzgodxQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-06-14 10:11 ` Luk
0 siblings, 0 replies; 9+ messages in thread
From: Luk @ 2019-06-14 10:11 UTC (permalink / raw)
To: Caspar Smit
Cc: ceph-users, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
[-- Attachment #1: Type: text/html, Size: 14583 bytes --]
[-- Attachment #2: Type: text/plain, Size: 178 bytes --]
_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: problem with degraded PG
[not found] ` <661486857.20190614115232-wYtBgQxc//8@public.gmane.org>
2019-06-14 10:02 ` Caspar Smit
@ 2019-06-14 12:02 ` Luk
[not found] ` <1585044179.20190614140202-wYtBgQxc//8@public.gmane.org>
1 sibling, 1 reply; 9+ messages in thread
From: Luk @ 2019-06-14 12:02 UTC (permalink / raw)
To: Dan van der Ster
Cc: ceph-users, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hello,
All kudos are going to friends from Wroclaw, PL :)
It was as simple as typo...
There was osd added two times to crushmap due to (this commands where
run over week ago - didn't have problem then, it showed up after
replacing another osd - osd-7):
ceph osd crush add osd.112 0.00 root=hdd
ceph osd crush move osd.112 0.00 root=hdd rack=rack-a host=stor-a02
ceph osd crush add osd.112 0.00 host=stor-a02
and the ceph osd tree was like this:
[root@ceph-mon-01 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-100 200.27496 root hdd
-101 67.64999 rack rack-a
-2 33.82500 host stor-a01
0 hdd 7.27499 osd.0 up 1.00000 1.00000
6 hdd 7.27499 osd.6 up 1.00000 1.00000
12 hdd 7.27499 osd.12 up 1.00000 1.00000
108 hdd 4.00000 osd.108 up 1.00000 1.00000
109 hdd 4.00000 osd.109 up 1.00000 1.00000
110 hdd 4.00000 osd.110 up 1.00000 1.00000
-7 33.82500 host stor-a02
5 hdd 7.27499 osd.5 up 1.00000 1.00000
9 hdd 7.27499 osd.9 up 1.00000 1.00000
15 hdd 7.27499 osd.15 up 1.00000 1.00000
111 hdd 4.00000 osd.111 up 1.00000 1.00000
112 hdd 4.00000 osd.112 up 1.00000 1.00000
113 hdd 4.00000 osd.113 up 1.00000 1.00000
-102 60.97498 rack rack-b
-3 27.14998 host stor-b01
1 hdd 7.27499 osd.1 up 1.00000 1.00000
7 hdd 0.59999 osd.7 up 1.00000 1.00000
13 hdd 7.27499 osd.13 up 1.00000 1.00000
114 hdd 4.00000 osd.114 up 1.00000 1.00000
115 hdd 4.00000 osd.115 up 1.00000 1.00000
116 hdd 4.00000 osd.116 up 1.00000 1.00000
-4 33.82500 host stor-b02
2 hdd 7.27499 osd.2 up 1.00000 1.00000
10 hdd 7.27499 osd.10 up 1.00000 1.00000
16 hdd 7.27499 osd.16 up 1.00000 1.00000
117 hdd 4.00000 osd.117 up 1.00000 1.00000
118 hdd 4.00000 osd.118 up 1.00000 1.00000
119 hdd 4.00000 osd.119 up 1.00000 1.00000
-103 67.64999 rack rack-c
-6 33.82500 host stor-c01
4 hdd 7.27499 osd.4 up 1.00000 1.00000
8 hdd 7.27499 osd.8 up 1.00000 1.00000
14 hdd 7.27499 osd.14 up 1.00000 1.00000
120 hdd 4.00000 osd.120 up 1.00000 1.00000
121 hdd 4.00000 osd.121 up 1.00000 1.00000
122 hdd 4.00000 osd.122 up 1.00000 1.00000
-5 33.82500 host stor-c02
3 hdd 7.27499 osd.3 up 1.00000 1.00000
11 hdd 7.27499 osd.11 up 1.00000 1.00000
17 hdd 7.27499 osd.17 up 1.00000 1.00000
123 hdd 4.00000 osd.123 up 1.00000 1.00000
124 hdd 4.00000 osd.124 up 1.00000 1.00000
125 hdd 4.00000 osd.125 up 1.00000 1.00000
112 hdd 4.00000 osd.112 up 1.00000 1.00000
[cut]
after editing crushmap and removing osd.112 from root ceph started
recover and is healthy now :)
Regards
Lukasz
> Here is ceph osd tree, in first post there is also ceph osd df tree:
> https://pastebin.com/Vs75gpwZ
>> Ahh I was thinking of chooseleaf_vary_r, which you already have.
>> So probably not related to tunables. What is your `ceph osd tree` ?
>> By the way, 12.2.9 has an unrelated bug (details
>> http://tracker.ceph.com/issues/36686)
>> AFAIU you will just need to update to v12.2.11 or v12.2.12 for that fix.
>> -- Dan
>> On Fri, Jun 14, 2019 at 11:29 AM Luk <skidoo-wYtBgQxc//8@public.gmane.org> wrote:
>>>
>>> Hi,
>>>
>>> here is the output:
>>>
>>> ceph osd crush show-tunables
>>> {
>>> "choose_local_tries": 0,
>>> "choose_local_fallback_tries": 0,
>>> "choose_total_tries": 100,
>>> "chooseleaf_descend_once": 1,
>>> "chooseleaf_vary_r": 1,
>>> "chooseleaf_stable": 0,
>>> "straw_calc_version": 1,
>>> "allowed_bucket_algs": 22,
>>> "profile": "unknown",
>>> "optimal_tunables": 0,
>>> "legacy_tunables": 0,
>>> "minimum_required_version": "hammer",
>>> "require_feature_tunables": 1,
>>> "require_feature_tunables2": 1,
>>> "has_v2_rules": 0,
>>> "require_feature_tunables3": 1,
>>> "has_v3_rules": 0,
>>> "has_v4_buckets": 1,
>>> "require_feature_tunables5": 0,
>>> "has_v5_rules": 0
>>> }
>>>
>>> [root@ceph-mon-01 ~]#
>>>
>>> --
>>> Regards
>>> Lukasz
>>>
>>> > Hi,
>>> > This looks like a tunables issue.
>>> > What is the output of `ceph osd crush show-tunables `
>>>
>>> > -- Dan
>>>
>>> > On Fri, Jun 14, 2019 at 11:19 AM Luk <skidoo-wYtBgQxc//8@public.gmane.org> wrote:
>>> >>
>>> >> Hello,
>>> >>
>>> >> Maybe somone was fighting with this kind of stuck in ceph already.
>>> >> This is production cluster, can't/don't want to make wrong steps,
>>> >> please advice, what to do.
>>> >>
>>> >> After changing of one failed disk (it was osd-7) on our cluster ceph
>>> >> didn't recover to HEALTH_OK, it stopped in state:
>>> >>
>>> >> [root@ceph-mon-01 ~]# ceph -s
>>> >> cluster:
>>> >> id: b6f23cff-7279-f4b0-ff91-21fadac95bb5
>>> >> health: HEALTH_WARN
>>> >> noout,noscrub,nodeep-scrub flag(s) set
>>> >> Degraded data redundancy: 24761/45994899 objects degraded (0.054%), 8 pgs degraded, 8 pgs undersized
>>> >>
>>> >> services:
>>> >> mon: 3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03
>>> >> mgr: ceph-mon-03(active), standbys: ceph-mon-02, ceph-mon-01
>>> >> osd: 144 osds: 144 up, 144 in
>>> >> flags noout,noscrub,nodeep-scrub
>>> >> rbd-mirror: 3 daemons active
>>> >> rgw: 6 daemons active
>>> >>
>>> >> data:
>>> >> pools: 18 pools, 2176 pgs
>>> >> objects: 15.33M objects, 49.3TiB
>>> >> usage: 151TiB used, 252TiB / 403TiB avail
>>> >> pgs: 24761/45994899 objects degraded (0.054%)
>>> >> 2168 active+clean
>>> >> 8 active+undersized+degraded
>>> >>
>>> >> io:
>>> >> client: 435MiB/s rd, 415MiB/s wr, 7.94kop/s rd, 2.96kop/s wr
>>> >>
>>> >> Restart of OSD didn't helped, changing choose_total_tries from 50 to 100 didn't help.
>>> >>
>>> >> I checked one of degraded PG, 10.3c4
>>> >>
>>> >> [root@ceph-mon-01 ~]# ceph pg dump 2>&1 | grep -w 10.3c4
>>> >> 10.3c4 3593 0 3593 0 0 14769891858 10076 10076 active+undersized+degraded 2019-06-13 08:19:39.802219 37380'71900564 37380:119411139 [9,109] 9 [9,109] 9 33550'69130424 2019-06-08 02:28:40.508790 33550'69130424 2019-06-08 02:28:40.508790 18
>>> >>
>>> >>
>>> >> [root@ceph-mon-01 ~]# ceph pg 10.3c4 query | jq '.["peer_info"][] | {peer: .peer, last_update:.last_update}'
>>> >> {
>>> >> "peer": "0",
>>> >> "last_update": "36847'71412720"
>>> >> }
>>> >> {
>>> >> "peer": "109",
>>> >> "last_update": "37380'71900570"
>>> >> }
>>> >> {
>>> >> "peer": "117",
>>> >> "last_update": "0'0"
>>> >> }
>>> >>
>>> >>
>>> >> [root@ceph-mon-01 ~]#
>>> >> I have checked space taken for this PG on storage nodes:
>>> >> here is how to check where is particular OSD (on which physical storage node):
>>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 9 "
>>> >> | 9 | stor-a02 | 2063G | 5386G | 52 | 1347k | 53 | 292k | exists,up |
>>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 109 "
>>> >> | 109 | stor-a01 | 1285G | 4301G | 5 | 31.0k | 6 | 59.2k | exists,up |
>>> >> [root@ceph-mon-01 ~]# watch ceph -s
>>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 117 "
>>> >> | 117 | stor-b02 | 1334G | 4252G | 54 | 1216k | 13 | 27.4k | exists,up |
>>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 0 "
>>> >> | 0 | stor-a01 | 2156G | 5293G | 58 | 387k | 29 | 30.7k | exists,up |
>>> >> [root@ceph-mon-01 ~]#
>>> >> and checking sizes on servers:
>>> >> stor-a01 (this PG shouldn't be on the same host):
>>> >> [root@stor-a01 /var/lib/ceph/osd/ceph-0/current]# du -sh 10.3c4_*
>>> >> 2.4G 10.3c4_head
>>> >> 0 10.3c4_TEMP
>>> >> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]# du -sh 10.3c4_*
>>> >> 14G 10.3c4_head
>>> >> 0 10.3c4_TEMP
>>> >> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]#
>>> >> stor-a02:
>>> >> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]# du -sh 10.3c4_*
>>> >> 14G 10.3c4_head
>>> >> 0 10.3c4_TEMP
>>> >> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]#
>>> >> stor-b02:
>>> >> [root@stor-b02 /var/lib/ceph/osd/ceph-117/current]# du -sh 10.3c4_*
>>> >> zsh: no matches found: 10.3c4_*
>>> >>
>>> >> information about ceph:
>>> >> [root@ceph-mon-01 ~]# ceph versions
>>> >> {
>>> >> "mon": {
>>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
>>> >> },
>>> >> "mgr": {
>>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
>>> >> },
>>> >> "osd": {
>>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 144
>>> >> },
>>> >> "mds": {},
>>> >> "rbd-mirror": {
>>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
>>> >> },
>>> >> "rgw": {
>>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 6
>>> >> },
>>> >> "overall": {
>>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 159
>>> >> }
>>> >> }
>>> >>
>>> >> crushmap: https://pastebin.com/cpC2WmyS
>>> >> ceph osd tree: https://pastebin.com/XvZ2cNZZ
>>> >>
>>> >> I'm cross-posting this do devel because maybe there is some known bug
>>> >> in this particular version of ceph, and You could point some
>>> >> directions to fix this problem.
>>> >>
>>> >> --
>>> >> Regards
>>> >> Lukasz
>>> >>
>>>
>>>
>>>
>>> --
>>> Pozdrowienia,
>>> Luk
>>>
--
Pozdrowienia,
Luk
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: problem with degraded PG
[not found] ` <1585044179.20190614140202-wYtBgQxc//8@public.gmane.org>
@ 2019-06-16 5:02 ` huang jun
0 siblings, 0 replies; 9+ messages in thread
From: huang jun @ 2019-06-16 5:02 UTC (permalink / raw)
To: Łukasz Chrustek
Cc: ceph-users, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
can you show us the output of 'ceph osd dump' and 'ceph health detail'?
Luk <skidoo@tlen.pl> 于2019年6月14日周五 下午8:02写道:
>
> Hello,
>
> All kudos are going to friends from Wroclaw, PL :)
>
> It was as simple as typo...
>
> There was osd added two times to crushmap due to (this commands where
> run over week ago - didn't have problem then, it showed up after
> replacing another osd - osd-7):
>
> ceph osd crush add osd.112 0.00 root=hdd
> ceph osd crush move osd.112 0.00 root=hdd rack=rack-a host=stor-a02
> ceph osd crush add osd.112 0.00 host=stor-a02
>
> and the ceph osd tree was like this:
> [root@ceph-mon-01 ~]# ceph osd tree
> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
> -100 200.27496 root hdd
> -101 67.64999 rack rack-a
> -2 33.82500 host stor-a01
> 0 hdd 7.27499 osd.0 up 1.00000 1.00000
> 6 hdd 7.27499 osd.6 up 1.00000 1.00000
> 12 hdd 7.27499 osd.12 up 1.00000 1.00000
> 108 hdd 4.00000 osd.108 up 1.00000 1.00000
> 109 hdd 4.00000 osd.109 up 1.00000 1.00000
> 110 hdd 4.00000 osd.110 up 1.00000 1.00000
> -7 33.82500 host stor-a02
> 5 hdd 7.27499 osd.5 up 1.00000 1.00000
> 9 hdd 7.27499 osd.9 up 1.00000 1.00000
> 15 hdd 7.27499 osd.15 up 1.00000 1.00000
> 111 hdd 4.00000 osd.111 up 1.00000 1.00000
> 112 hdd 4.00000 osd.112 up 1.00000 1.00000
> 113 hdd 4.00000 osd.113 up 1.00000 1.00000
> -102 60.97498 rack rack-b
> -3 27.14998 host stor-b01
> 1 hdd 7.27499 osd.1 up 1.00000 1.00000
> 7 hdd 0.59999 osd.7 up 1.00000 1.00000
> 13 hdd 7.27499 osd.13 up 1.00000 1.00000
> 114 hdd 4.00000 osd.114 up 1.00000 1.00000
> 115 hdd 4.00000 osd.115 up 1.00000 1.00000
> 116 hdd 4.00000 osd.116 up 1.00000 1.00000
> -4 33.82500 host stor-b02
> 2 hdd 7.27499 osd.2 up 1.00000 1.00000
> 10 hdd 7.27499 osd.10 up 1.00000 1.00000
> 16 hdd 7.27499 osd.16 up 1.00000 1.00000
> 117 hdd 4.00000 osd.117 up 1.00000 1.00000
> 118 hdd 4.00000 osd.118 up 1.00000 1.00000
> 119 hdd 4.00000 osd.119 up 1.00000 1.00000
> -103 67.64999 rack rack-c
> -6 33.82500 host stor-c01
> 4 hdd 7.27499 osd.4 up 1.00000 1.00000
> 8 hdd 7.27499 osd.8 up 1.00000 1.00000
> 14 hdd 7.27499 osd.14 up 1.00000 1.00000
> 120 hdd 4.00000 osd.120 up 1.00000 1.00000
> 121 hdd 4.00000 osd.121 up 1.00000 1.00000
> 122 hdd 4.00000 osd.122 up 1.00000 1.00000
> -5 33.82500 host stor-c02
> 3 hdd 7.27499 osd.3 up 1.00000 1.00000
> 11 hdd 7.27499 osd.11 up 1.00000 1.00000
> 17 hdd 7.27499 osd.17 up 1.00000 1.00000
> 123 hdd 4.00000 osd.123 up 1.00000 1.00000
> 124 hdd 4.00000 osd.124 up 1.00000 1.00000
> 125 hdd 4.00000 osd.125 up 1.00000 1.00000
> 112 hdd 4.00000 osd.112 up 1.00000 1.00000
>
> [cut]
>
> after editing crushmap and removing osd.112 from root ceph started
> recover and is healthy now :)
>
> Regards
> Lukasz
>
>
> > Here is ceph osd tree, in first post there is also ceph osd df tree:
>
> > https://pastebin.com/Vs75gpwZ
>
>
>
> >> Ahh I was thinking of chooseleaf_vary_r, which you already have.
> >> So probably not related to tunables. What is your `ceph osd tree` ?
>
> >> By the way, 12.2.9 has an unrelated bug (details
> >> http://tracker.ceph.com/issues/36686)
> >> AFAIU you will just need to update to v12.2.11 or v12.2.12 for that fix.
>
> >> -- Dan
>
> >> On Fri, Jun 14, 2019 at 11:29 AM Luk <skidoo@tlen.pl> wrote:
> >>>
> >>> Hi,
> >>>
> >>> here is the output:
> >>>
> >>> ceph osd crush show-tunables
> >>> {
> >>> "choose_local_tries": 0,
> >>> "choose_local_fallback_tries": 0,
> >>> "choose_total_tries": 100,
> >>> "chooseleaf_descend_once": 1,
> >>> "chooseleaf_vary_r": 1,
> >>> "chooseleaf_stable": 0,
> >>> "straw_calc_version": 1,
> >>> "allowed_bucket_algs": 22,
> >>> "profile": "unknown",
> >>> "optimal_tunables": 0,
> >>> "legacy_tunables": 0,
> >>> "minimum_required_version": "hammer",
> >>> "require_feature_tunables": 1,
> >>> "require_feature_tunables2": 1,
> >>> "has_v2_rules": 0,
> >>> "require_feature_tunables3": 1,
> >>> "has_v3_rules": 0,
> >>> "has_v4_buckets": 1,
> >>> "require_feature_tunables5": 0,
> >>> "has_v5_rules": 0
> >>> }
> >>>
> >>> [root@ceph-mon-01 ~]#
> >>>
> >>> --
> >>> Regards
> >>> Lukasz
> >>>
> >>> > Hi,
> >>> > This looks like a tunables issue.
> >>> > What is the output of `ceph osd crush show-tunables `
> >>>
> >>> > -- Dan
> >>>
> >>> > On Fri, Jun 14, 2019 at 11:19 AM Luk <skidoo@tlen.pl> wrote:
> >>> >>
> >>> >> Hello,
> >>> >>
> >>> >> Maybe somone was fighting with this kind of stuck in ceph already.
> >>> >> This is production cluster, can't/don't want to make wrong steps,
> >>> >> please advice, what to do.
> >>> >>
> >>> >> After changing of one failed disk (it was osd-7) on our cluster ceph
> >>> >> didn't recover to HEALTH_OK, it stopped in state:
> >>> >>
> >>> >> [root@ceph-mon-01 ~]# ceph -s
> >>> >> cluster:
> >>> >> id: b6f23cff-7279-f4b0-ff91-21fadac95bb5
> >>> >> health: HEALTH_WARN
> >>> >> noout,noscrub,nodeep-scrub flag(s) set
> >>> >> Degraded data redundancy: 24761/45994899 objects degraded (0.054%), 8 pgs degraded, 8 pgs undersized
> >>> >>
> >>> >> services:
> >>> >> mon: 3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03
> >>> >> mgr: ceph-mon-03(active), standbys: ceph-mon-02, ceph-mon-01
> >>> >> osd: 144 osds: 144 up, 144 in
> >>> >> flags noout,noscrub,nodeep-scrub
> >>> >> rbd-mirror: 3 daemons active
> >>> >> rgw: 6 daemons active
> >>> >>
> >>> >> data:
> >>> >> pools: 18 pools, 2176 pgs
> >>> >> objects: 15.33M objects, 49.3TiB
> >>> >> usage: 151TiB used, 252TiB / 403TiB avail
> >>> >> pgs: 24761/45994899 objects degraded (0.054%)
> >>> >> 2168 active+clean
> >>> >> 8 active+undersized+degraded
> >>> >>
> >>> >> io:
> >>> >> client: 435MiB/s rd, 415MiB/s wr, 7.94kop/s rd, 2.96kop/s wr
> >>> >>
> >>> >> Restart of OSD didn't helped, changing choose_total_tries from 50 to 100 didn't help.
> >>> >>
> >>> >> I checked one of degraded PG, 10.3c4
> >>> >>
> >>> >> [root@ceph-mon-01 ~]# ceph pg dump 2>&1 | grep -w 10.3c4
> >>> >> 10.3c4 3593 0 3593 0 0 14769891858 10076 10076 active+undersized+degraded 2019-06-13 08:19:39.802219 37380'71900564 37380:119411139 [9,109] 9 [9,109] 9 33550'69130424 2019-06-08 02:28:40.508790 33550'69130424 2019-06-08 02:28:40.508790 18
> >>> >>
> >>> >>
> >>> >> [root@ceph-mon-01 ~]# ceph pg 10.3c4 query | jq '.["peer_info"][] | {peer: .peer, last_update:.last_update}'
> >>> >> {
> >>> >> "peer": "0",
> >>> >> "last_update": "36847'71412720"
> >>> >> }
> >>> >> {
> >>> >> "peer": "109",
> >>> >> "last_update": "37380'71900570"
> >>> >> }
> >>> >> {
> >>> >> "peer": "117",
> >>> >> "last_update": "0'0"
> >>> >> }
> >>> >>
> >>> >>
> >>> >> [root@ceph-mon-01 ~]#
> >>> >> I have checked space taken for this PG on storage nodes:
> >>> >> here is how to check where is particular OSD (on which physical storage node):
> >>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 9 "
> >>> >> | 9 | stor-a02 | 2063G | 5386G | 52 | 1347k | 53 | 292k | exists,up |
> >>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 109 "
> >>> >> | 109 | stor-a01 | 1285G | 4301G | 5 | 31.0k | 6 | 59.2k | exists,up |
> >>> >> [root@ceph-mon-01 ~]# watch ceph -s
> >>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 117 "
> >>> >> | 117 | stor-b02 | 1334G | 4252G | 54 | 1216k | 13 | 27.4k | exists,up |
> >>> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 0 "
> >>> >> | 0 | stor-a01 | 2156G | 5293G | 58 | 387k | 29 | 30.7k | exists,up |
> >>> >> [root@ceph-mon-01 ~]#
> >>> >> and checking sizes on servers:
> >>> >> stor-a01 (this PG shouldn't be on the same host):
> >>> >> [root@stor-a01 /var/lib/ceph/osd/ceph-0/current]# du -sh 10.3c4_*
> >>> >> 2.4G 10.3c4_head
> >>> >> 0 10.3c4_TEMP
> >>> >> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]# du -sh 10.3c4_*
> >>> >> 14G 10.3c4_head
> >>> >> 0 10.3c4_TEMP
> >>> >> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]#
> >>> >> stor-a02:
> >>> >> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]# du -sh 10.3c4_*
> >>> >> 14G 10.3c4_head
> >>> >> 0 10.3c4_TEMP
> >>> >> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]#
> >>> >> stor-b02:
> >>> >> [root@stor-b02 /var/lib/ceph/osd/ceph-117/current]# du -sh 10.3c4_*
> >>> >> zsh: no matches found: 10.3c4_*
> >>> >>
> >>> >> information about ceph:
> >>> >> [root@ceph-mon-01 ~]# ceph versions
> >>> >> {
> >>> >> "mon": {
> >>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >>> >> },
> >>> >> "mgr": {
> >>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >>> >> },
> >>> >> "osd": {
> >>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 144
> >>> >> },
> >>> >> "mds": {},
> >>> >> "rbd-mirror": {
> >>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >>> >> },
> >>> >> "rgw": {
> >>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 6
> >>> >> },
> >>> >> "overall": {
> >>> >> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 159
> >>> >> }
> >>> >> }
> >>> >>
> >>> >> crushmap: https://pastebin.com/cpC2WmyS
> >>> >> ceph osd tree: https://pastebin.com/XvZ2cNZZ
> >>> >>
> >>> >> I'm cross-posting this do devel because maybe there is some known bug
> >>> >> in this particular version of ceph, and You could point some
> >>> >> directions to fix this problem.
> >>> >>
> >>> >> --
> >>> >> Regards
> >>> >> Lukasz
> >>> >>
> >>>
> >>>
> >>>
> >>> --
> >>> Pozdrowienia,
> >>> Luk
> >>>
>
>
>
>
>
>
> --
> Pozdrowienia,
> Luk
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2019-06-16 5:02 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-06-14 9:19 problem with degraded PG Luk
[not found] ` <304057881.20190614111947-wYtBgQxc//8@public.gmane.org>
2019-06-14 9:22 ` Dan van der Ster
[not found] ` <CABZ+qq=Ob4SuvFSbpN-8+NP66_rQJO2_HaNmThVwF6s8vOw+hQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-06-14 9:29 ` Luk
[not found] ` <1171202508.20190614112950-wYtBgQxc//8@public.gmane.org>
2019-06-14 9:38 ` Dan van der Ster
[not found] ` <CABZ+qqmN0zG57Oqtv3wMLHrfXEwCPfGQARewq=hkjOxmZ0=G6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-06-14 9:52 ` Luk
[not found] ` <661486857.20190614115232-wYtBgQxc//8@public.gmane.org>
2019-06-14 10:02 ` Caspar Smit
[not found] ` <CADvHm=ah+SZiZs8apTxC5Uom3NzEydSKswj0hsgC9pgTzgodxQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-06-14 10:11 ` Luk
2019-06-14 12:02 ` Luk
[not found] ` <1585044179.20190614140202-wYtBgQxc//8@public.gmane.org>
2019-06-16 5:02 ` huang jun
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.