From: Willem Jan Withagen <wjw@digiware.nl>
To: Samuel Just <sjust@redhat.com>
Cc: M Ranga Swami Reddy <swamireddy@gmail.com>,
Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: OSD not coming up after being set down
Date: Wed, 2 Mar 2016 21:21:38 +0100
Message-ID: <56D74B52.7060104@digiware.nl>
In-Reply-To: <CAN=+7FX-JXXy29y6m4=RkrivuwP0Kc9M7vkqTnmaqjKCdP4yow@mail.gmail.com>
On 2-3-2016 21:11, Samuel Just wrote:
> At this point, you will want to run the script and then dig through
> the logs until you find something that doesn't match.
> - Was osd.0 up to begin with?
> - Is its process running?
> - Did it get the map marking it down?
> - Did it send a boot message back to the mon requesting that it be
> marked back up?
> - Did the mon get that message?
> - Did the mon create a new map marking it up?
Right, this is the sort of handholding I was looking for.
The first 2 items are true.
Who sends "the map marking it down"?
Does it go: ceph osd down 0 => Mon => OSD
Or does it go directly: ceph => OSD
Are there any state machine diagrams of this in the manuals?
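
To make the first checks in Sam's list repeatable, here is a minimal
sketch that reads `ceph osd dump` text on stdin and reports what the
map says. The helper names osd_state and has_flag are hypothetical
(not part of Ceph), and the sketch assumes the dump layout shown
further down in this thread: "osd.N up|down ..." lines, plus a
"flags" line that I assume holds a comma-separated list.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers parse `ceph osd dump` text from stdin.
# osd_state and has_flag are hypothetical names, not Ceph commands.

# Print "up" or "down" for the given OSD id, or "missing" if the
# dump has no "osd.<id>" line.
osd_state() {
    local id="$1"
    awk -v osd="osd.${id}" '
        $1 == osd { print $2; found = 1; exit }
        END { if (!found) print "missing" }'
}

# Succeed (exit 0) if the named flag appears on the "flags" line.
# Assumption: flags are comma-separated, e.g. "flags noup,sortbitwise".
has_flag() {
    local flag="$1"
    awk -v f="$flag" '
        $1 == "flags" {
            n = split($2, a, ",")
            for (i = 1; i <= n; i++) if (a[i] == f) ok = 1
        }
        END { exit (ok ? 0 : 1) }'
}
```

Possible usage against a running test cluster:

    ceph osd dump | osd_state 0
    ceph osd dump | has_flag noup && echo "noup is still set"

If noup is still set, or osd.0 stays "down" across epochs, that
narrows down which of the steps in the list to look for in the logs.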
--WjW
> Etc
> -Sam
>
> On Wed, Mar 2, 2016 at 11:56 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> On 2-3-2016 18:01, M Ranga Swami Reddy wrote:
>>> Please see the below:
>>> ---
>>> If something is causing OSDs to ‘flap’ (repeatedly getting marked
>>> down and then up again), you can force the monitors to stop the
>>> flapping with:
>>>
>>> ceph osd set noup # prevent OSDs from getting marked up
>>> ceph osd set nodown # prevent OSDs from getting marked down
>>> ----
>>> ref: http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-osd/
>>
>> I don't think this is the issue.
>>
>> The test code should run as is. It runs OK on Linux, but FreeBSD is
>> giving trouble.
>> The OSD should come up, but does not. Possible causes:
>> - the OSD is not receiving the UP
>> - the OSD is not able to go UP
>> - or the monitors are not picking it up?
>>
>> --WjW
>>
>>> On Wed, Mar 2, 2016 at 9:33 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>>> Hi,
>>>>
>>>> Any handholding is welcomed!!
>>>>
>>>> In test/cephtool-mon-test.sh, part of the executed code is:
>>>> ceph osd down 0
>>>> ceph osd dump | grep 'osd.0 down'
>>>> ceph osd unset noup
>>>> for ((i=0; i < 120; i++)); do
>>>>     if ! ceph osd dump | grep 'osd.0 up'; then
>>>>         echo "waiting for osd.0 to come back up"
>>>>         sleep 1
>>>>     else
>>>>         break
>>>>     fi
>>>> done
>>>> ceph osd dump | grep 'osd.0 up'
>>>>
>>>> But the OSD refuses to come back up.
>>>> The output of the dump is below.
>>>>
>>>> How would I start analyzing this issue?
>>>> What kind of things would I expect to see in the logfile
>>>> - if the OSD does come up?
>>>> - if the OSD stays down?
>>>>
>>>> Thanx,
>>>> --WjW
>>>>
>>>>
>>>> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
>>>> epoch 170
>>>> fsid 8b5c0b4b-e08c-11e5-8cd4-1c6f6582ec12
>>>> created 2016-03-02 16:36:35.001700
>>>> modified 2016-03-02 16:45:17.802073
>>>> flags sortbitwise
>>>> pool 0 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash
>>>> rjenkins pg_num 8 pgp_num 8 last_change 1 flags hashpspool stripe_width 0
>>>> max_osd 3
>>>> osd.0 down out weight 0 up_from 4 up_thru 163 down_at 166
>>>> last_clean_interval [0,0) 127.0.0.1:6804/2455 127.0.0.1:6805/2455
>>>> 127.0.0.1:6806/2455 127.0.0.1:6807/2455 autoout,exists
>>>> 8bc29c74-e08c-11e5-8cd4-1c6f6582ec12
>>>> osd.1 up in weight 1 up_from 8 up_thru 166 down_at 0
>>>> last_clean_interval [0,0) 127.0.0.1:6808/2475 127.0.0.1:6811/2475
>>>> 127.0.0.1:6813/2475 127.0.0.1:6816/2475 exists,up
>>>> 8d7a2cb5-e08c-11e5-8cd4-1c6f6582ec12
>>>> osd.2 up in weight 1 up_from 13 up_thru 166 down_at 0
>>>> last_clean_interval [0,0) 127.0.0.1:6817/2495 127.0.0.1:6818/2495
>>>> 127.0.0.1:6819/2495 127.0.0.1:6820/2495 exists,up
>>>> 8f46df05-e08c-11e5-8cd4-1c6f6582ec12
>>>> pg_temp 0.0 [0,2,1]
>>>> pg_temp 0.1 [2,0,1]
>>>> pg_temp 0.2 [0,1,2]
>>>> pg_temp 0.3 [2,0,1]
>>>> pg_temp 0.4 [0,2,1]
>>>> pg_temp 0.5 [0,2,1]
>>>> pg_temp 0.6 [0,1,2]
>>>> pg_temp 0.7 [1,0,2]
>>>> 2016-03-02 16:56:11.027977 8021d7800 0 lockdep stop
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>