From: "Suzuki K. Poulose" <suzuki@in.ibm.com>
To: Ondrej Oprala <ooprala@redhat.com>,
Janani Venkataraman <jananive@linux.vnet.ibm.com>,
util-linux@vger.kernel.org
Cc: ananth@linux.vnet.ibm.com, Tarundeep Singh <tarundsk@linux.vnet.ibm.com>
Subject: Re: Non disruptive application core dump infrastructure
Date: Wed, 30 Jul 2014 15:58:20 +0530
Message-ID: <53D8C8C4.7000002@in.ibm.com>
In-Reply-To: <53B69653.4020905@redhat.com>

On 07/04/2014 05:26 PM, Ondrej Oprala wrote:
> On 07/03/2014 02:58 PM, Suzuki K. Poulose wrote:
>> On 07/03/2014 06:06 PM, Ondrej Oprala wrote:
>>> On 07/03/2014 12:30 PM, Suzuki K. Poulose wrote:
>>>> On 05/29/2014 11:53 PM, Suzuki K. Poulose wrote:
>>>>> On 05/29/2014 06:47 PM, Ondrej Oprala wrote:
>>>>>> On 05/29/2014 02:45 PM, Suzuki K. Poulose wrote:
>>>>>>> On 05/29/2014 05:16 PM, Ondrej Oprala wrote:
>>>>>>>> On 05/29/2014 01:44 PM, Janani Venkataraman wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We have developed a tool called "gencore" which captures the
>>>>>>>>> core of
>>>>>>>>> an application without
>>>>>>>>> disrupting its process. The dump is collected non-disruptively and
>>>>>>>>> this tool currently supports
>>>>>>>>> s390, x86 and power systems.
>>>>>>>>>
>>>>>>>>> THE TOOL:
>>>>>>>>>
>>>>>>>>> The tool can perform non-disruptive third party dumps. The tool
>>>>>>>>> also
>>>>>>>>> contains a library "libgencore"
>>>>>>>>> which helps applications to trigger self dumps.
>>>>>>>>>
>>>>>>>>> The tool can perform:
>>>>>>>>>
>>>>>>>>> 1) Third party dump: The pid of the process to be dumped is given
>>>>>>>>> along
>>>>>>>>> with name of the core-file to
>>>>>>>>> be created.
>>>>>>>>>
>>>>>>>>> eg.
>>>>>>>>>
>>>>>>>>> [janani@localhost]:gencore 6616 core.test
>>>>>>>>>
>>>>>>>>> 2) Self dump: The programs can request a self-dump using gencore()
>>>>>>>>> API, provided through libgencore. This
>>>>>>>>> is implemented through a daemon which listens on a UNIX
>>>>>>>>> file socket for
>>>>>>>>> such requests. The daemon is started
>>>>>>>>> immediately post installation. The program which requires the dump
>>>>>>>>> makes use of the gencore() API and provides
>>>>>>>>> the name of the core-file as a parameter.
>>>>>>>>>
>>>>>>>>> eg.
>>>>>>>>>
>>>>>>>>> /* Open the library; in this case it is installed in
>>>>>>>>> /usr/lib64 */
>>>>>>>>> lib = dlopen("libgencore.so", RTLD_LAZY);
>>>>>>>>>
>>>>>>>>> gencore = dlsym(lib, "gencore");
>>>>>>>>>
>>>>>>>>> /* Call the API */
>>>>>>>>> gencore("/home/janani/core_test");
>>>>>>>>>
>>>>>>>>> BASIC IDEA:
>>>>>>>>>
>>>>>>>>> The basic idea is that the threads of the process are held using
>>>>>>>>> ptrace calls and the dump is generated in the
>>>>>>>>> ELF format using the /proc/pid filesystem.
>>>>>>>>>
>>>>>>>>> PATCH SET:
>>>>>>>>> We have designed this tool based on the discussions with linux
>>>>>>>>> kernel
>>>>>>>>> community. The patches have been posted
>>>>>>>>> at: https://lkml.org/lkml/2014/3/20/138
>>>>>>>>>
>>>>>>>>> Do you think this can be part of the util-linux bundle? We can
>>>>>>>>> tweak
>>>>>>>>> it to make it work as a package in util-linux.
>>>>>>>>>
>>>>>>>>> Let us know your reviews and comments.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> Janani
>>>>>>>>>
>>>>>>>> Interesting,
>>>>>>>> but how is this different from attaching to a process with GDB and
>>>>>>>> using
>>>>>>>> the gcore command? Or to automate it more, using the gcore script
>>>>>>>> that
>>>>>>>> comes with GDB?
>>>>>>>> Cheers,
>>>>>>>> Ondrej
>>>>>>>>
>>>>>>> There are two major issues with that.
>>>>>>>
>>>>>>> 1) GDB uses PTRACE_ATTACH and hence the process gets a SIGSTOP.
>>>>>> I fail to see the downside to that.
>>>>>>> 2) A process cannot initiate the request to dump itself, say from a
>>>>>>> signal handler. (since fork() is not signal safe)
>>>>>> This should be possible using libgdb. Let's say forking while in a
>>>>>> SIGSEGV
>>>>>> handler and using the libgdb API to do the dump.
>>>>> That's exactly the problem. Forking within a sighandler is not safe.
>>>>> You
>>>>> could possibly deadlock with glibc locks.
>>>> Ondrej,
>>>>
>>>> What are your thoughts about this ?
>>>>
>>>> Thanks
>>>> Suzuki
>>>>
>>> Hi Suzuki,
>>>
>>> from the LKML mailing list, I can see that the biggest
>>> criticism/confusion
>>> related to gencore comes from your necessity claims around the daemon
>>> part.
>> The daemon part was a shared philosophy from the CRIU project. There is
>> no other reliable way of doing a self dump.
> Yes, I think that you explained the problem with self-ptrace
> clearly enough on the LKML.
>>> I'm not entirely sure what kind of programs gencore is going to be most
>>> used/useful for...
>> This can be used by huge applications, like, JAVA RUNTIME, to trigger a
>> dump when it detects some issues, without actually bringing down the
>> workload.
> Well, on 64-bit archs, huge programs may eat up terabytes of
> virtual memory, so normal dumps are sometimes close to impossible
> (though I'd really like to stress-test gdb with a massive 1TB coredump).
> Do you somehow get the process' VM size before dumping?
> To limit the mappings to be dumped, for example...
>>> but isn't the signalfd API solving the problem of async-signal safety?
>>> Using it, you should be able to catch the signal, safely fork
>>> and happily exec gencore.
>> This imposes a lot of changes in the applications that may want to use
>> the API and is prone to errors in attaining the same.
> But see, now we've moved from "CAN'T be done in any other way"
> to "CAN be done in other ways, although it might be non-trivial
> for some projects". I'm not saying the daemon doesn't have its
> usecases. I'm only trying to point out here, that there indeed ARE
> other ways.
>>> No need for any other daemon running.
>>>
>> The daemon doesn't add much overhead. With systemd, you could make use
>> of the socket option to optimize the triggering of the gencore.
> I still haven't had time to look at the code itself. Does the daemon
> have to be running if I want to use the signalfd + fork + exec(gencore)
> approach mentioned above?
Sorry, this one was lost among other emails.
No, we don't need a daemon if you can reliably invoke gencore.
Cheers
Suzuki
Thread overview: 10+ messages
2014-05-29 11:44 Non disruptive application core dump infrastructure Janani Venkataraman
2014-05-29 11:46 ` Ondrej Oprala
2014-05-29 12:45 ` Suzuki K. Poulose
2014-05-29 13:17 ` Ondrej Oprala
2014-05-29 18:23 ` Suzuki K. Poulose
2014-07-03 10:30 ` Suzuki K. Poulose
2014-07-03 12:36 ` Ondrej Oprala
2014-07-03 12:58 ` Suzuki K. Poulose
2014-07-04 11:56 ` Ondrej Oprala
2014-07-30 10:28 ` Suzuki K. Poulose [this message]