* Fwd: monitor crashing
[not found] ` <CACx0BdNRPk64Etwvs7_tEce=xwFw7E4Ube=8raav_-EFhhP2Fw@mail.gmail.com>
@ 2015-10-13 11:45 ` Luis Periquito
2015-10-13 13:25 ` Sage Weil
0 siblings, 1 reply; 9+ messages in thread
From: Luis Periquito @ 2015-10-13 11:45 UTC (permalink / raw)
To: ceph-devel
Any ideas? I'm growing desperate :(
I've tried compiling from source with
https://github.com/ceph/ceph/pull/5276 included, but ceph-mon still
crashes on startup.
---------- Forwarded message ----------
From: Luis Periquito <periquito@gmail.com>
Date: Tue, Oct 13, 2015 at 12:26 PM
Subject: Re: monitor crashing
To: Ceph Users <ceph-users@lists.ceph.com>
I'm currently running Hammer (0.94.3). I created an invalid LRC profile
(a typo in l=: it should have been l=4 but was l=3, so now I don't
have enough distinct ruleset-locality buckets) and then created a pool
with it. Is there any way to delete this pool? Remember, I can't start
the ceph-mon...
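A rough sketch of why a one-digit change in l matters, based on the documented LRC rules that k+m must be a multiple of l and that each group of l chunks gets one extra local parity chunk and is placed in its own ruleset-locality bucket. The function name lrc_layout and the k=8, m=4 values are hypothetical; the thread does not state the actual profile.

```shell
#!/bin/sh
# Sketch only: compute how many locality groups and chunks an LRC
# profile with parameters k, m, l implies. k=8 and m=4 used in the
# usage note are assumptions, not values taken from this thread.
lrc_layout() {
    k="$1"; m="$2"; l="$3"
    # The LRC plugin requires k+m to be a multiple of l.
    if [ $(( (k + m) % l )) -ne 0 ]; then
        echo "invalid: k+m=$((k + m)) is not a multiple of l=$l"
        return 1
    fi
    # One locality group per l chunks; each group adds one local parity chunk.
    groups=$(( (k + m) / l ))
    total=$(( k + m + groups ))
    echo "groups=$groups chunks=$total"
}
```

With these hypothetical values, `lrc_layout 8 4 4` reports 3 groups while `lrc_layout 8 4 3` reports 4, so the typo would require one more ruleset-locality bucket than intended.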
On Tue, Oct 13, 2015 at 11:56 AM, Luis Periquito <periquito@gmail.com> wrote:
> It seems I've hit this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1231630
>
> Is there any way I can recover this cluster? It worked in our test
> cluster, but crashed the production one...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fwd: monitor crashing
2015-10-13 11:45 ` Fwd: monitor crashing Luis Periquito
@ 2015-10-13 13:25 ` Sage Weil
2015-10-13 13:29 ` Luis Periquito
0 siblings, 1 reply; 9+ messages in thread
From: Sage Weil @ 2015-10-13 13:25 UTC (permalink / raw)
To: Luis Periquito; +Cc: ceph-devel
On Tue, 13 Oct 2015, Luis Periquito wrote:
> Any ideas? I'm growing desperate :(
>
> I've tried compiling from source, and including
> https://github.com/ceph/ceph/pull/5276, but it still crashes on boot
> of the ceph-mon
If you can email a (link to a) tarball of your mon data directory, I'd love
to extract the osdmap and see why crush is crashing. It's obviously not
supposed to do that (even with a bad rule). You can also use
the ceph-post-file utility.
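The tarball-and-upload step could look roughly like the sketch below. The mon ID, the /var/lib/ceph/mon/ceph-<id> layout (the usual default) and the output path are assumptions, and the run and package_mon_store helpers are hypothetical names; by default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: package a monitor's data directory and upload it.
# Set DRY_RUN=0 to really execute; the default just prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

package_mon_store() {
    mon_id="$1"                          # hypothetical mon ID, e.g. "a"
    out="/tmp/mon-${mon_id}.tar.gz"      # assumed output location
    # The monitor should be stopped so the store is not changing underneath tar.
    run tar -czf "$out" -C /var/lib/ceph/mon "ceph-${mon_id}"
    # ceph-post-file uploads the file for the developers; -d adds a description.
    run ceph-post-file -d "mon store for crashing ceph-mon" "$out"
}
```

Calling `package_mon_store a` prints the two commands for a monitor with ID a; note that a multi-gigabyte store.db makes the tarball correspondingly large.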
Thanks!
sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fwd: monitor crashing
2015-10-13 13:25 ` Sage Weil
@ 2015-10-13 13:29 ` Luis Periquito
2015-10-13 13:41 ` Sage Weil
0 siblings, 1 reply; 9+ messages in thread
From: Luis Periquito @ 2015-10-13 13:29 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
The store.db directory is 3.4 GB :(
Can I do it on my side?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fwd: monitor crashing
2015-10-13 13:29 ` Luis Periquito
@ 2015-10-13 13:41 ` Sage Weil
2015-10-13 13:52 ` Loic Dachary
0 siblings, 1 reply; 9+ messages in thread
From: Sage Weil @ 2015-10-13 13:41 UTC (permalink / raw)
To: Luis Periquito; +Cc: ceph-devel
On Tue, 13 Oct 2015, Luis Periquito wrote:
> the store.db dir is 3.4GB big :(
>
> can I do it on my side?
Never mind, I was able to reproduce it from the bugzilla. I've pushed a
branch, wip-ecpool-hammer. Not sure which distro you're on, but packages
will appear at gitbuilder.ceph.com in 30-45 minutes. This fixes the mon
crash, which will let you delete the pool. I suggest stopping the OSDs
before starting the mon with this build, or else they might get pg create
messages and crash too. Once the pool is removed you can start them
again. They shouldn't need to be upgraded.
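The order of operations above can be sketched as follows. This assumes Ubuntu hosts using the upstart jobs ceph-osd-all and ceph-mon-all, and a hypothetical pool name; the helper names are illustrative, and by default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: recovery order described above. Set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 1, on every OSD host: stop the OSDs so they cannot receive
# pg create messages for the broken pool (upstart job; an assumption).
stop_local_osds() { run stop ceph-osd-all; }

# Step 2, on every mon host, after installing the patched packages.
start_local_mons() { run start ceph-mon-all; }

# Step 3, once the monitors have quorum: delete the broken pool.
# The pool name is passed twice, plus the confirmation flag.
remove_bad_pool() {
    pool="$1"
    run ceph osd pool delete "$pool" "$pool" --yes-i-really-really-mean-it
}

# Step 4, on every OSD host: bring the OSDs back.
start_local_osds() { run start ceph-osd-all; }
```

For a pool hypothetically named lrcpool, the sequence would be `stop_local_osds` on each OSD host, `start_local_mons` on each mon host, `remove_bad_pool lrcpool` once, then `start_local_osds` on each OSD host.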
Note that the latest hammer doesn't let you create the pool at all because
it fails the crush safety check (I had to disable the check to reproduce
this), so that's good at least!
sage
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fwd: monitor crashing
2015-10-13 13:41 ` Sage Weil
@ 2015-10-13 13:52 ` Loic Dachary
2015-10-13 13:59 ` Sage Weil
0 siblings, 1 reply; 9+ messages in thread
From: Loic Dachary @ 2015-10-13 13:52 UTC (permalink / raw)
To: Sage Weil, Luis Periquito; +Cc: ceph-devel
https://github.com/ceph/ceph/compare/hammer...wip-ecpool-hammer
In order to bypass the crush verification, you could:
ceph tell mon.* injectargs --crushtool /bin/true
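A hedged variant of the command above with explicit quoting, so the shell cannot expand mon.* against local file names and the option and its value reach injectargs as one argument. The helper names are hypothetical, and the sketch only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: point the monitors' crushtool option at /bin/true so the
# crush check always "passes". Set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

bypass_crush_check() {
    # injectargs changes the running daemons only; the setting does not
    # persist across a monitor restart.
    run ceph tell 'mon.*' injectargs '--crushtool /bin/true'
}
```

This needs running monitors to talk to, so it does not help while ceph-mon itself crashes on startup.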
Cheers
--
Loïc Dachary, Artisan Logiciel Libre
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fwd: monitor crashing
2015-10-13 13:52 ` Loic Dachary
@ 2015-10-13 13:59 ` Sage Weil
2015-10-13 14:21 ` Luis Periquito
0 siblings, 1 reply; 9+ messages in thread
From: Sage Weil @ 2015-10-13 13:59 UTC (permalink / raw)
To: Loic Dachary; +Cc: Luis Periquito, ceph-devel
On Tue, 13 Oct 2015, Loic Dachary wrote:
> https://github.com/ceph/ceph/compare/hammer...wip-ecpool-hammer
>
> In order to bypass the crush verification, you could:
>
> ceph tell mon.* injectargs --crushtool /bin/true
Ah, good trick!
http://tracker.ceph.com/issues/13477
is the ticket, and my fix for master is
https://github.com/ceph/ceph/pull/6246
sage
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fwd: monitor crashing
2015-10-13 13:59 ` Sage Weil
@ 2015-10-13 14:21 ` Luis Periquito
2015-10-13 14:35 ` Sage Weil
0 siblings, 1 reply; 9+ messages in thread
From: Luis Periquito @ 2015-10-13 14:21 UTC (permalink / raw)
To: Sage Weil; +Cc: Loic Dachary, ceph-devel
Hi Sage,
awesome help.
Sorry for not mentioning this earlier, but I'm running two MONs on precise
and one MON on trusty. Looking at the status page
(http://ceph.com/gitbuilder.cgi), it seems the precise build is
failing... Can you have a look?
thanks,
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fwd: monitor crashing
2015-10-13 14:21 ` Luis Periquito
@ 2015-10-13 14:35 ` Sage Weil
2015-10-13 17:01 ` Luis Periquito
0 siblings, 1 reply; 9+ messages in thread
From: Sage Weil @ 2015-10-13 14:35 UTC (permalink / raw)
To: Luis Periquito; +Cc: Loic Dachary, ceph-devel
On Tue, 13 Oct 2015, Luis Periquito wrote:
> Hi Sage,
>
> awesome help.
>
> Sorry for not telling before, but I'm running 2xMON in precise and
> 1xMON in trusty. Looking at the status page
> (http://ceph.com/gitbuilder.cgi) it seems the precise build is
> failing... Can you have a look?
I've repushed the branch, this time cherry-picking the master fix. Let me
know if you run into other problems!
Thanks-
sage
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fwd: monitor crashing
2015-10-13 14:35 ` Sage Weil
@ 2015-10-13 17:01 ` Luis Periquito
0 siblings, 0 replies; 9+ messages in thread
From: Luis Periquito @ 2015-10-13 17:01 UTC (permalink / raw)
To: Sage Weil; +Cc: Loic Dachary, ceph-devel
Thanks for all the help Sage. The cluster is now back to life with
your awesome patch.
^ permalink raw reply [flat|nested] 9+ messages in thread