From: Wido den Hollander <wido@42on.com>
To: Andrey Korolyov <andrey@xdel.ru>
Cc: "Sébastien Han" <han.sebastien@gmail.com>,
"Gregory Farnum" <greg@inktank.com>,
"Dan Mick" <dan.mick@inktank.com>, "Sage Weil" <sage@inktank.com>,
"Loic Dachary" <loic@dachary.org>,
"Sylvain Munaut" <s.munaut@whatever-company.com>,
ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: [0.48.3] OSD memory leak when scrubbing
Date: Sat, 16 Feb 2013 10:09:00 +0100
Message-ID: <511F4CAC.6020405@42on.com>
In-Reply-To: <CABYiri8C9pSCW0a+ZZhLHGWL3+Eh1=8Y6brsPc2S1fvB1G1dYQ@mail.gmail.com>
On 02/16/2013 08:09 AM, Andrey Korolyov wrote:
> Can anyone who hit this bug please confirm that your system contains libc 2.15+?
>
I've seen this with 0.56.2 as well, on Ubuntu 12.04, which ships
libc 2.15-0ubuntu10.3.
I haven't gotten around to attaching a heap profiler to it yet.
Wido
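For anyone else answering Andrey's question, here is a minimal shell sketch for checking the libc version. It assumes a glibc-based system (where `getconf GNU_LIBC_VERSION` prints something of the form "glibc 2.15") and GNU `sort -V`; the function names `glibc_version` and `version_ge` are made up for illustration.

```shell
# Print the glibc version number only (second field of "glibc X.Y").
# Assumption: the system libc is glibc, so GNU_LIBC_VERSION is defined.
glibc_version() {
    getconf GNU_LIBC_VERSION | awk '{ print $2 }'
}

# Succeed when dotted version $1 is greater than or equal to $2.
# Assumption: sort supports -V (version sort), as GNU coreutils does.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Something like `version_ge "$(glibc_version)" 2.15 && echo "libc is 2.15 or newer"` would then answer the question on a given host.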
> On Tue, Feb 5, 2013 at 1:27 AM, Sébastien Han <han.sebastien@gmail.com> wrote:
>> oh nice, the pattern also matches path :D, didn't know that
>> thanks Greg
>> --
>> Regards,
>> Sébastien Han.
>>
>>
>> On Mon, Feb 4, 2013 at 10:22 PM, Gregory Farnum <greg@inktank.com> wrote:
>>> Set your /proc/sys/kernel/core_pattern file. :) http://linux.die.net/man/5/core
>>> -Greg
>>>
>>> On Mon, Feb 4, 2013 at 1:08 PM, Sébastien Han <han.sebastien@gmail.com> wrote:
>>>> ok I finally managed to get something on my test cluster,
>>>> unfortunately, the dump goes to /
>>>>
>>>> any idea to change the destination path?
>>>>
>>>> My production / won't be big enough...
>>>>
>>>> --
>>>> Regards,
>>>> Sébastien Han.
>>>>
>>>>
>>>> On Mon, Feb 4, 2013 at 10:03 PM, Dan Mick <dan.mick@inktank.com> wrote:
>>>>> ...and/or do you have the corepath set interestingly, or one of the
>>>>> core-trapping mechanisms turned on?
>>>>>
>>>>>
>>>>> On 02/04/2013 11:29 AM, Sage Weil wrote:
>>>>>>
>>>>>> On Mon, 4 Feb 2013, Sébastien Han wrote:
>>>>>>>
>>>>>>> Hum just tried several times on my test cluster and I can't get any
>>>>>>> core dump. Does Ceph commit suicide or something? Is it expected
>>>>>>> behavior?
>>>>>>
>>>>>>
>>>>>> SIGSEGV should trigger the usual path that dumps a stack trace and then
>>>>>> dumps core. Was your ulimit -c set before the daemon was started?
>>>>>>
>>>>>> sage
>>>>>>
>>>>>>
>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Sébastien Han.
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han <han.sebastien@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Loïc,
>>>>>>>>
>>>>>>>> Thanks for bringing our discussion to the ML. I'll check that tomorrow :-).
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Sébastien Han.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Feb 3, 2013 at 10:01 PM, Sébastien Han <han.sebastien@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Loïc,
>>>>>>>>>
>>>>>>>>> Thanks for bringing our discussion to the ML. I'll check that tomorrow :-).
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Sébastien Han.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@dachary.org> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> As discussed during FOSDEM, the script you wrote to kill the OSD when it
>>>>>>>>>> grows too much could be amended to make the OSD dump core instead of just
>>>>>>>>>> killing and restarting it. The binary + core could probably be used to
>>>>>>>>>> figure out where the leak is.
>>>>>>>>>>
>>>>>>>>>> You should make sure the OSD's current working directory is on a file system
>>>>>>>>>> with enough free disk space to accommodate the dump, and set
>>>>>>>>>>
>>>>>>>>>> ulimit -c unlimited
>>>>>>>>>>
>>>>>>>>>> before running it (your system default is probably ulimit -c 0, which
>>>>>>>>>> inhibits core dumps). When you detect that the OSD has grown too much,
>>>>>>>>>> kill it with
>>>>>>>>>>
>>>>>>>>>> kill -SEGV $pid
>>>>>>>>>>
>>>>>>>>>> and upload the core found in the working directory, together with the
>>>>>>>>>> binary, to a public place. If the osd binary is compiled with -g but without
>>>>>>>>>> changing the -O settings, you should have a larger binary file but no
>>>>>>>>>> negative impact on performance. Forensic analysis will be made a lot
>>>>>>>>>> easier with the debugging symbols.
>>>>>>>>>>
>>>>>>>>>> My 2cts
>>>>>>>>>>
>>>>>>>>>> On 01/31/2013 08:57 PM, Sage Weil wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 31 Jan 2013, Sylvain Munaut wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I disabled scrubbing using
>>>>>>>>>>>>
>>>>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
>>>>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> and the leak seems to be gone.
>>>>>>>>>>>>
>>>>>>>>>>>> See the graph at http://i.imgur.com/A0KmVot.png with the OSD memory
>>>>>>>>>>>> for the 12 osd processes over the last 3.5 days.
>>>>>>>>>>>> Memory was rising every 24h. I did the change yesterday around 13h00
>>>>>>>>>>>> and OSDs stopped growing. OSD memory even seems to go down slowly by
>>>>>>>>>>>> small blocks.
>>>>>>>>>>>>
>>>>>>>>>>>> Of course I assume disabling scrubbing is not a long-term solution and
>>>>>>>>>>>> I should re-enable it... (How do I do that, btw? What were the
>>>>>>>>>>>> default values for those parameters?)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It depends on the exact commit you're on. You can see the defaults
>>>>>>>>>>> if you do
>>>>>>>>>>>
>>>>>>>>>>> ceph-osd --show-config | grep osd_scrub
>>>>>>>>>>>
>>>>>>>>>>> Thanks for testing this... I have a few other ideas to try to reproduce.
>>>>>>>>>>>
>>>>>>>>>>> sage
>>>>>>>>>>> --
>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>
--
Wido den Hollander
42on B.V.
Phone: +31 (0)20 700 9902
Skype: contact42on
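
The core-dump steps discussed in the quoted thread (start the daemon with `ulimit -c unlimited`, point `/proc/sys/kernel/core_pattern` at a file system with enough space, then send SIGSEGV once the OSD has grown too large) could be sketched in shell roughly as follows. This is not Sébastien's actual script, which is not shown in the thread; the function names, the `/srv/cores` directory, and the memory threshold are all hypothetical. The `%e`, `%p` and `%t` specifiers (executable name, PID, dump time) are the ones documented in core(5).

```shell
# Build a core_pattern that writes cores into directory $1.
# %e = executable name, %p = PID, %t = time of dump (see core(5)).
core_pattern_for() {
    printf '%s/core.%%e.%%p.%%t\n' "$1"
}

# Print the resident set size of process $1 in kB, read from /proc (Linux only).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Succeed when the first number is strictly greater than the second.
over_limit() {
    [ "$1" -gt "$2" ]
}

# Send SIGSEGV to process $1 if its RSS exceeds $2 kB, so that it dumps
# core instead of simply being killed. Does nothing if the RSS is unreadable.
dump_if_too_big() {
    pid=$1
    limit_kb=$2
    rss=$(rss_kb "$pid")
    if [ -n "$rss" ] && over_limit "$rss" "$limit_kb"; then
        kill -SEGV "$pid"
    fi
}
```

A possible usage, as root: run `core_pattern_for /srv/cores > /proc/sys/kernel/core_pattern`, make sure `ulimit -c unlimited` is in effect in the shell that starts the OSD, and then call `dump_if_too_big "$pid" 4000000` periodically from the monitoring script.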