Extending coredump note section to contain filenames

public inbox for linux-kernel@vger.kernel.org

* Extending coredump note section to contain filenames
@ 2012-03-09 17:13 Denys Vlasenko
  2012-03-09 17:29 ` Jan Kratochvil
  0 siblings, 1 reply; 21+ messages in thread
From: Denys Vlasenko @ 2012-03-09 17:13 UTC (permalink / raw)
  To: Jan Kratochvil, Roland McGrath; +Cc: linux-kernel, Oleg Nesterov

Hi Roland, Jan,

While working on coredump analysis, it struck me how much
PITA is caused merely by the fact that the names of the loaded binary
and libraries are not known.

gdb retrieves loaded library names by examining dynamic loader's
data stored in the coredump's data segments. It uses intimate
knowledge how and where dynamic loader keeps the list of loaded
libraries. (Meaning that it will break if non-standard loader
is used).

And, as Jan explained to me, it depends on knowing
where the linked list of libs starts, which requires knowing binary
which was running. IIRC there is no easy and reasonably foolproof
way to determine binary's name. (Looking at argv[0] on stack
is not reasonably foolproof).

Which is *ridiculous*. We *know* the list of mapped files
at the coredump generation time. It can even be accessed as
/proc/PID/maps.

I propose to save this information in coredump.

(For people not very familiar with coredump format:
coredumps have a NOTE segment which contains register values,
PID of the crashed process, and other such info. It looks like this:

Program Headers:
   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
   NOTE           0x000254 0x00000000 0x00000000 0x00814 0x00000     0
   LOAD           0x001000 0x00172000 0x00000000 0x01000 0x01000 R E 0x1000
...
Notes at offset 0x00000254 with length 0x00000814:
   Owner                 Data size       Description
   CORE                 0x00000090       NT_PRSTATUS (prstatus structure)
   CORE                 0x0000007c       NT_PRPSINFO (prpsinfo structure)
   CORE                 0x000000a0       NT_AUXV (auxiliary vector)
   CORE                 0x0000006c       NT_FPREGSET (floating point registers)
   LINUX                0x00000200       NT_PRXFPREG (user_xfpregs structure)
   LINUX                0x00000340       NT_X86_XSTATE (x86 XSAVE extended state)
   LINUX                0x00000030       Unknown note type: (0x00000200)

and I propose to add a new note to this segment)
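
For readers who want to see how such a NOTE segment is laid out, here is a
minimal sketch of walking its records. It assumes the usual ELF note layout
(three 32-bit words namesz, descsz, type, followed by the owner name and the
data, each padded to 4 bytes) and little-endian byte order by default.
iter_notes and make_note are hypothetical helper names, and the demo segment
is synthetic, not taken from a real core file.

```python
import struct

def align4(n):
    """Round n up to the next multiple of 4 (note fields are 4-byte aligned)."""
    return (n + 3) & ~3

def iter_notes(data, little_endian=True):
    """Yield (owner, type, desc) for every note in a raw NOTE segment."""
    fmt = "<III" if little_endian else ">III"
    pos = 0
    while pos + 12 <= len(data):
        namesz, descsz, ntype = struct.unpack_from(fmt, data, pos)
        pos += 12
        owner = data[pos:pos + namesz].rstrip(b"\0").decode("ascii")
        pos += align4(namesz)
        desc = data[pos:pos + descsz]
        pos += align4(descsz)
        yield owner, ntype, desc

def make_note(owner, ntype, desc):
    """Build one little-endian note record; only used to create demo input."""
    name = owner.encode("ascii") + b"\0"
    return (struct.pack("<III", len(name), len(desc), ntype)
            + name.ljust(align4(len(name)), b"\0")
            + desc.ljust(align4(len(desc)), b"\0"))

if __name__ == "__main__":
    # Synthetic segment with two notes, owners as in the listing above.
    segment = make_note("CORE", 1, b"abcdef") + make_note("LINUX", 0x200, b"12345678")
    for owner, ntype, desc in iter_notes(segment):
        print(owner, hex(ntype), len(desc))
```

A new note carrying file names would simply be one more record in this
sequence, which is why existing parsers could skip it if they do not know its
type.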

Do you think such addition would be useful?

What format should this note have? Hmmm. How about this:

Elf_Word count    // how many files are mapped
array of [count] elements of
     Elf_Addr start
     Elf_Addr end
     Elf_Addr file_ofs
followed by filenames in ASCII: "FILE1" NUL "FILE2" NUL "FILE3" NUL...
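
To make the proposed layout concrete, here is a rough sketch of how it could
be serialized and parsed. It is only an illustration under stated
assumptions: Elf_Word is taken as 4 bytes, Elf_Addr as 4 bytes (pass "Q" for
a 64-bit layout), byte order is little-endian, and pack_file_note and
unpack_file_note are hypothetical names, not existing kernel or gdb
functions.

```python
import struct

def pack_file_note(entries, addr_fmt="I"):
    """Serialize a list of (start, end, file_ofs, filename) tuples."""
    out = struct.pack("<I", len(entries))            # Elf_Word count
    for start, end, ofs, _name in entries:           # array of address triples
        out += struct.pack("<3" + addr_fmt, start, end, ofs)
    for _start, _end, _ofs, name in entries:         # NUL-terminated names
        out += name.encode("ascii") + b"\0"
    return out

def unpack_file_note(data, addr_fmt="I"):
    """Parse the layout written by pack_file_note back into tuples."""
    (count,) = struct.unpack_from("<I", data, 0)
    rec = struct.Struct("<3" + addr_fmt)
    pos = 4
    ranges = []
    for _ in range(count):
        ranges.append(rec.unpack_from(data, pos))
        pos += rec.size
    names = data[pos:].split(b"\0")[:count]
    return [(s, e, o, n.decode("ascii")) for (s, e, o), n in zip(ranges, names)]

if __name__ == "__main__":
    note = pack_file_note(
        [(0xb6fa0000, 0xb6fb6000, 0, "/usr/lib/xulrunner-2/libmozjs.so")])
    print(len(note), unpack_file_note(note))
```

Keeping the fixed-size triples in one array and the strings at the end means a
parser can locate entry N without scanning any file names.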

The rationale for not saving some other attributes is that the list
of all mapped files can be somewhat big. For example:

$ cat /proc/`pidof firefox`/maps | wc -c
41553

Thus, we probably want to make it smaller.
We may save a bit by coalescing the adjacent mappings to the same file
which only differ in attributes. Example from my firefox's /proc/pid/maps file:

b6fa0000-b6fb4000 r-xp 00000000 fd:01 671717     /usr/lib/xulrunner-2/libmozjs.so
b6fb4000-b6fb6000 ---p 00014000 fd:01 671717     /usr/lib/xulrunner-2/libmozjs.so

In fact these two mappings map a contiguous area of the file to a contiguous area
of memory, so in coredump we can represent it as one item:

start:b6fa0000 end:b6fb6000 ofs:00000000 name:/usr/lib/xulrunner-2/libmozjs.so

The information about memory attributes is present in coredump anyway,
in program header, so it can be restored by coredump analysis tools.
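
The coalescing rule can be sketched as follows. This is an illustration, not
the kernel implementation: two neighbouring entries are merged only when they
name the same file, are adjacent in memory, and the second file offset
continues exactly where the first mapping ends. parse_maps_line and coalesce
are hypothetical helper names.

```python
def parse_maps_line(line):
    """Split one /proc/PID/maps line into (start, end, file_ofs, name)."""
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    ofs = int(fields[2], 16)
    name = fields[5].strip() if len(fields) > 5 else ""
    return start, end, ofs, name

def coalesce(mappings):
    """Merge neighbours that map a contiguous file range to contiguous memory."""
    merged = []
    for start, end, ofs, name in mappings:
        if merged:
            p_start, p_end, p_ofs, p_name = merged[-1]
            if (name and name == p_name and start == p_end
                    and ofs == p_ofs + (p_end - p_start)):
                # Extend the previous entry instead of adding a new one.
                merged[-1] = (p_start, end, p_ofs, p_name)
                continue
        merged.append((start, end, ofs, name))
    return merged

if __name__ == "__main__":
    lines = [
        "b6fa0000-b6fb4000 r-xp 00000000 fd:01 671717     /usr/lib/xulrunner-2/libmozjs.so",
        "b6fb4000-b6fb6000 ---p 00014000 fd:01 671717     /usr/lib/xulrunner-2/libmozjs.so",
    ]
    for start, end, ofs, name in coalesce(parse_maps_line(l) for l in lines):
        print("start:%08x end:%08x ofs:%08x name:%s" % (start, end, ofs, name))
```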

Maybe we also would want to be able to compress filenames by saying
"take N chars from previous name, then append this suffix".
This means that instead of storing
     "/usr/lib/xulrunner-2/libxul.so" NUL
     "/usr/lib/xulrunner-2/libxul.so" NUL
     "/usr/lib/xulrunner-2/libmozjs.so" NUL
we'd store
     "/usr/lib/xulrunner-2/libxul.so" NUL
     <30> NUL
     <24> "mozjs.so" NUL
But is it worth the pain in the coredump parsers?
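
To judge the parser cost, here is a small sketch of the "take N chars from
the previous name" scheme. The on-disk encoding of the count is left open
above, so this sketch represents each entry as a (keep, suffix) pair and does
not commit to a byte format; compress_names and decompress_names are
hypothetical names.

```python
import os

def compress_names(names):
    """Turn each name into (chars kept from the previous name, new suffix)."""
    out, prev = [], ""
    for name in names:
        keep = len(os.path.commonprefix([prev, name]))
        out.append((keep, name[keep:]))
        prev = name
    return out

def decompress_names(pairs):
    """Rebuild the original list of names from (keep, suffix) pairs."""
    names, prev = [], ""
    for keep, suffix in pairs:
        prev = prev[:keep] + suffix
        names.append(prev)
    return names

if __name__ == "__main__":
    names = ["/usr/lib/xulrunner-2/libxul.so",
             "/usr/lib/xulrunner-2/libxul.so",
             "/usr/lib/xulrunner-2/libmozjs.so"]
    pairs = compress_names(names)
    print(pairs)
    print(decompress_names(pairs) == names)
```

The decoder is only a few lines, but every consumer has to process the names
sequentially, since each entry depends on the one before it.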

Another question is detection of deleted files.
If /usr/lib/xulrunner-2/libmozjs.so was updated while program ran
and now file mapped into process address space does not correspond
to the same-named file on disk, can we help users to detect this? How?
By saving maj/min/inode? Hash thereof?
File size?
File's md5sum (probably not, way too expensive. But nicely robust...)?
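
As an illustration of the cheapest of these options, a tool could compare a
saved maj/min/inode/size tuple with what is on disk now. This is a heuristic
sketch only: inode numbers can be reused, so a match is not proof.
file_identity and still_same_file are hypothetical helper names, and the
"recorded" tuple stands for whatever the note would have stored.

```python
import os

def file_identity(path):
    """Return (major, minor, inode, size) for the file currently at path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev), st.st_ino, st.st_size

def still_same_file(path, recorded):
    """True if path still refers to the file whose identity was recorded."""
    try:
        return file_identity(path) == recorded
    except OSError:
        return False  # deleted or otherwise inaccessible

if __name__ == "__main__":
    recorded = file_identity("/proc/self/exe")
    print(still_same_file("/proc/self/exe", recorded))
```

An update-via-rename creates a new inode, so the comparison fails for a
replaced library even though the name is unchanged.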

-- 
vda

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Extending coredump note section to contain filenames
  2012-03-09 17:13 Extending coredump note section to contain filenames Denys Vlasenko
@ 2012-03-09 17:29 ` Jan Kratochvil
  2012-03-12 12:05   ` Denys Vlasenko
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kratochvil @ 2012-03-09 17:29 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Roland McGrath, linux-kernel, Oleg Nesterov

Hi Denys,

On Fri, 09 Mar 2012 18:13:49 +0100, Denys Vlasenko wrote:
> gdb retrieves loaded library names by examining dynamic loader's
> data stored in the coredump's data segments. It uses intimate
> knowledge how and where dynamic loader keeps the list of loaded
> libraries.

this is the backward compatible way and it is no longer the right one with
build-ids.

GDB should scan the address space for mapped build-ids and map symbol files
accordingly.  The shared library list may be even corrupted in crashes with
memory overruns.  But it has not been implemented yet, although I have been
planning it for years.

(It can use the shared library list as a hint as in some cases the symbol
files could overlap if you load the first page of an ELF file etc., this is
very improbable possibility.)


> Another question is detection of deleted files.
> If /usr/lib/xulrunner-2/libmozjs.so was updated while program ran
> and now file mapped into process address space does not correspond
> to the same-named file on disk, can we help users to detect this? How?
> By saving maj/min/inode? Hash thereof?
> File size?
> File's md5sum (probably not, way too expensive. But nicely robust...)?

build-id is already being saved.  This is all that matters.  Filename does not
say anything - as you noticed it can be even already deleted, it can have
unknown content etc.  I do not see what problems you target here.


Regards,
Jan


* Re: Extending coredump note section to contain filenames
  2012-03-09 17:29 ` Jan Kratochvil
@ 2012-03-12 12:05   ` Denys Vlasenko
  2012-03-12 12:13     ` Denys Vlasenko
  2012-03-12 16:53     ` Jan Kratochvil
  0 siblings, 2 replies; 21+ messages in thread
From: Denys Vlasenko @ 2012-03-12 12:05 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: Roland McGrath, linux-kernel, Oleg Nesterov

On 03/09/2012 06:29 PM, Jan Kratochvil wrote:
> On Fri, 09 Mar 2012 18:13:49 +0100, Denys Vlasenko wrote:
>> gdb retrieves loaded library names by examining dynamic loader's
>> data stored in the coredump's data segments. It uses intimate
>> knowledge how and where dynamic loader keeps the list of loaded
>> libraries.
>
> this is the backward compatible way and it is no longer the right one with
> build-ids.
>
> GDB should scan the address space for mapped build-ids and map symbol files
> accordingly.

Build-ids are useful, but they still don't map directly to the names
of loaded files. You need to rely on /usr/lib/debug/.build-id/XX/YYYYYYYYYY
symlinks to translate build-ids to names.
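
For reference, the translation step looks roughly like this. It is a sketch
that assumes the Fedora-style layout mentioned above, where the first two hex
digits of the build-id form the directory and the rest the symlink name;
build_id_symlink is a hypothetical helper name.

```python
def build_id_symlink(build_id_hex, root="/usr/lib/debug/.build-id"):
    """Return the symlink path under root that corresponds to a build-id."""
    bid = build_id_hex.lower()
    if len(bid) < 3 or any(c not in "0123456789abcdef" for c in bid):
        raise ValueError("not a hex build-id: %r" % build_id_hex)
    return "%s/%s/%s" % (root, bid[:2], bid[2:])

if __name__ == "__main__":
    print(build_id_symlink("ec1fd70dbee0db36eff9527254d9d2bbfd260f13"))
```

If the symlink exists, os.path.realpath() on it gives the file name; if the
directory is absent, there is nothing to resolve.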

For example, on my home machine (linux-from-scratch style) I don't have
/usr/lib/debug/.build-id/* directory at all. So build-ids can't be used
to find the binary and libraries there.

Why we don't save library names in coredump? I see no logical reason
not to do so. Even if those names sometimes won't be reliable
("deleted files" problem), it's not a good reason to shoot ourself
in the food and deprive ourself from this information 100% of the time.

>> Another question is detection of deleted files.
>> If /usr/lib/xulrunner-2/libmozjs.so was updated while program ran
>> and now file mapped into process address space does not correspond
>> to the same-named file on disk, can we help users to detect this? How?
>> By saving maj/min/inode? Hash thereof?
>> File size?
>> File's md5sum (probably not, way too expensive. But nicely robust...)?
>
> build-id is already being saved.  This is all that matters.  Filename does not
> say anything - as you noticed it can be even already deleted,

Yes, the file can be deleted/updated-via-rename. That's the case
I want to be able to detect.

> it can have unknown content etc.

I don't understand. *What* can have unknown content?

> I do not see what problems you target here.

I'm thinking whether we should supply some mechanism for detecting
"deleted/updated file" problem. Even if this would be a heuristic.
I'll be satisfied with 99.9999% success rate instead of 100% :)

-- 
vda


* Re: Extending coredump note section to contain filenames
  2012-03-12 12:05   ` Denys Vlasenko
@ 2012-03-12 12:13     ` Denys Vlasenko
  2012-03-12 16:53     ` Jan Kratochvil
  1 sibling, 0 replies; 21+ messages in thread
From: Denys Vlasenko @ 2012-03-12 12:13 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: Roland McGrath, linux-kernel, Oleg Nesterov

On 03/12/2012 01:05 PM, Denys Vlasenko wrote:
> ("deleted files" problem), it's not a good reason to shoot ourself
> in the food and deprive ourself from this information 100% of the time.

s/food/foot/   :)


* Re: Extending coredump note section to contain filenames
  2012-03-12 12:05   ` Denys Vlasenko
  2012-03-12 12:13     ` Denys Vlasenko
@ 2012-03-12 16:53     ` Jan Kratochvil
  2012-03-12 18:58       ` Denys Vlasenko
  2012-03-12 22:21       ` H. Peter Anvin
  1 sibling, 2 replies; 21+ messages in thread
From: Jan Kratochvil @ 2012-03-12 16:53 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Roland McGrath, linux-kernel, Oleg Nesterov, Kushal Das

On Mon, 12 Mar 2012 13:05:56 +0100, Denys Vlasenko wrote:
> Build-ids are useful, but they still don't map directly to the names
> of loaded files. You need to rely on /usr/lib/debug/.build-id/XX/YYYYYYYYYY
> symlinks to translate build-ids to names.

There is draft of https://fedoraproject.org/wiki/Darkserver but it is not yet
deployed.  This will give you distro and file version / URL for the specified
build-id.  Without any local symlinks/files at all.

Reasons why it has not yet been finished + deployed I find out of the scope of
this thread.

> Why we don't save library names in coredump?

Because they are useless.  If you have a filename there how to find which
content it should match?  Even if you verify the file is still there with the
same content there is a race it can no longer be true when you read the core
file 5 seconds later.

The build-id mapping server above always works and without races.

> > it can have unknown content etc.
> 
> I don't understand. *What* can have unknown content?

You will save there "/lib64/libc-2.14.90.so".  But the next day you have no
idea which compilation or build the core file was generated for, that virtual
machine can be either already updated or even reinstalled from scratch etc.
"/lib64/libc-2.14.90.so" does not say anything about the build.

> I'll be satisfied with 99.9999% success rate instead of 100% :)

I am not, I prefer 100% build-id server.

Regards,
Jan


* Re: Extending coredump note section to contain filenames
  2012-03-12 16:53     ` Jan Kratochvil
@ 2012-03-12 18:58       ` Denys Vlasenko
  2012-03-12 19:08         ` Jan Kratochvil
  2012-03-13 12:12         ` Denys Vlasenko
  2012-03-12 22:21       ` H. Peter Anvin
  1 sibling, 2 replies; 21+ messages in thread
From: Denys Vlasenko @ 2012-03-12 18:58 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: Roland McGrath, linux-kernel, Oleg Nesterov, Kushal Das

On 03/12/2012 05:53 PM, Jan Kratochvil wrote:
> On Mon, 12 Mar 2012 13:05:56 +0100, Denys Vlasenko wrote:
>> Why we don't save library names in coredump?
>
> Because they are useless.

They may be useless in some situations. Not in every situation,
by a long shot. Here is a live example from my system:

$ ulimit -c unlimited
$ md5sum </dev/zero &
$ pid=$!
$ sleep 1
$ kill -ABRT $pid
$ gdb -ex "core core.12977"
GNU gdb (GDB) Fedora (7.3.50.20110722-10.fc16)
...
Missing separate debuginfo for the main executable file
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/ec/1fd70dbee0db36eff9527254d9d2bbfd260f13
[New LWP 12977]
Core was generated by `md5sum'.
Program terminated with signal 6, Aborted.
#0  0x0804b2b0 in ?? ()
(gdb) bt
#0  0x0804b2b0 in ?? ()
Backtrace stopped: Not enough registers or memory available to unwind further

      No backtrace at all.
      Let's tell it which binary that was:

(gdb) file /usr/bin/md5sum
Reading symbols from /usr/bin/md5sum...(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install coreutils-8.12-6.fc16.i686
(gdb) bt
#0  0x0804b2b0 in ?? ()
#1  0x0804bdd8 in ?? ()
#2  0x0804a093 in ?? ()
#3  0x08049659 in ?? ()
#4  0xb760f6b3 in ?? ()
Backtrace stopped: Not enough registers or memory available to unwind further

      This is better, isn't it?
      Wouldn't it be nice if gdb would retrieve binary's name by itself?
      (BTW: nothing prevents it from checking build ids and refusing
      to use it if they don't match.)

> If you have a filename there how to find which
> content it should match?  Even if you verify the file is still there with the
> same content there is a race it can no longer be true when you read the core
> file 5 seconds later.

And maybe root will run "rm -rf /*" in parallel. By this logic,
we should just give up on using computers.

> The build-id mapping server above always works and without races.

But it is not always available. Some people don't want to be connected
to the internet; others can't be connected.

>>> it can have unknown content etc.
>>
>> I don't understand. *What* can have unknown content?
>
> You will save there "/lib64/libc-2.14.90.so".  But the next day you have no
> idea which compilation or build the core file was generated for, that virtual
> machine can be either already updated or even reinstalled from scratch etc.
> "/lib64/libc-2.14.90.so" does not say anything about the build.

Does it follow from the above that filenames are *never* useful?
I don't think so.

-- 
vda


* Re: Extending coredump note section to contain filenames
  2012-03-12 18:58       ` Denys Vlasenko
@ 2012-03-12 19:08         ` Jan Kratochvil
  2012-03-12 19:45           ` Denys Vlasenko
  2012-03-13 12:12         ` Denys Vlasenko
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Kratochvil @ 2012-03-12 19:08 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Roland McGrath, linux-kernel, Oleg Nesterov, Kushal Das

On Mon, 12 Mar 2012 19:58:31 +0100, Denys Vlasenko wrote:
>      This is better, isn't it?
>      Wouldn't it be nice if gdb would retrieve binary's name by itself?

Yes, that yum should have been executed automatically, instead of just
suggested by that:
	Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/ec/1fd70dbee0db36eff9527254d9d2bbfd260f13

But this is a user interface issue, it was discussed with PackageKit people
etc. but it went nowhere.

Sorry but I cannot code the whole OS myself including all the Gnome UI
interfaces.

>      (BTW: nothing prevents it from checking build ids and refusing
>      to use it if they don't match.)

And if they do not match it will need to run the yum command above anyway.
So why to code two ways where the first one (by filename) works only sometimes
while the second way works always?  Isn't it easier to do it always just the
second way?

> >The build-id mapping server above always works and without races.
> 
> But it is not always available. Some people don't want to be connected
> to the internet; others can't be connected.

That 'yum' command above will run in some conditions without any Internet
connectivity.  But in some cases it will have more bandwidth requirements than
a build-id server query.

This is about package management vs. network servers connectivity, this is
also partially distro dependent.

> Does it follow from the above that filenames are *never* useful?

They can be sometimes useful but they are superseded by build-ids; with
build-ids they can be safely ignored.

Regards,
Jan


* Re: Extending coredump note section to contain filenames
  2012-03-12 19:08         ` Jan Kratochvil
@ 2012-03-12 19:45           ` Denys Vlasenko
  2012-03-12 22:07             ` Jan Kratochvil
  2012-03-12 22:16             ` Jan Kratochvil
  0 siblings, 2 replies; 21+ messages in thread
From: Denys Vlasenko @ 2012-03-12 19:45 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: Roland McGrath, linux-kernel, Oleg Nesterov, Kushal Das

On 03/12/2012 08:08 PM, Jan Kratochvil wrote:
> On Mon, 12 Mar 2012 19:58:31 +0100, Denys Vlasenko wrote:
>>       This is better, isn't it?
>>       Wouldn't it be nice if gdb would retrieve binary's name by itself?
>
> Yes, that yum should have been executed automatically, instead of just
> suggested by that:
> 	Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/ec/1fd70dbee0db36eff9527254d9d2bbfd260f13

My network is down. "yum install" won't work.


> But this is a user interface issue, it was discussed with PackageKit people
> etc. but it went nowhere.
>
> Sorry but I cannot code the whole OS myself including all the Gnome UI
> interfaces.
>
>>       (BTW: nothing prevents it from checking build ids and refusing
>>       to use it if they don't match.)
>
> And if they do not match it will need to run the yum command above anyway.
> So why to code two ways where the first one (by filename) works only sometimes
> while the second way works always?

My network is down. "yum install" won't work.


>> Does it follow from the above that filenames are *never* useful?
>
> They can be sometimes useful but they are superseded by build-ids; with
> build-ids they can be safely ignored.

I just gave you an example where filename is useful.
-- 
vda



* Re: Extending coredump note section to contain filenames
  2012-03-12 19:45           ` Denys Vlasenko
@ 2012-03-12 22:07             ` Jan Kratochvil
  2012-03-12 22:16             ` Jan Kratochvil
  1 sibling, 0 replies; 21+ messages in thread
From: Jan Kratochvil @ 2012-03-12 22:07 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Roland McGrath, linux-kernel, Oleg Nesterov, Kushal Das

On Mon, 12 Mar 2012 20:45:47 +0100, Denys Vlasenko wrote:
> On 03/12/2012 08:08 PM, Jan Kratochvil wrote:
> >	Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/ec/1fd70dbee0db36eff9527254d9d2bbfd260f13
> 
> My network is down. "yum install" won't work.

You need to fix yum first on each distro so that it works like apt-get does:
	echo 'metadata_expire=never' >>/etc/yum.conf
	echo "yum --enablerepo='*' makecache" >/etc/cron.daily/yumupdate
	chmod +x /etc/cron.daily/yumupdate

Then all the examples I have given work fine.


Regards,
Jan


* Re: Extending coredump note section to contain filenames
  2012-03-12 19:45           ` Denys Vlasenko
  2012-03-12 22:07             ` Jan Kratochvil
@ 2012-03-12 22:16             ` Jan Kratochvil
  1 sibling, 0 replies; 21+ messages in thread
From: Jan Kratochvil @ 2012-03-12 22:16 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Roland McGrath, linux-kernel, Oleg Nesterov, Kushal Das

On Mon, 12 Mar 2012 20:45:47 +0100, Denys Vlasenko wrote:
> On 03/12/2012 08:08 PM, Jan Kratochvil wrote:
> >	Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/ec/1fd70dbee0db36eff9527254d9d2bbfd260f13
> 
> My network is down. "yum install" won't work.

Now I think you mean that you want to just find /usr/bin/md5sum without ever
using /usr/lib/debug/usr/bin/md5sum.debug, right?

Then you can create your local build-id -> filename database specific to your
machine in some /var/lib/build-id/ .  Such database has nothing to do with the
core file itself.  Core file can be easily transferred to other hosts as it is
machine independent.  Core file is dependent only on the specific binaries
_content_ (not _names_) which is uniquely identified by their build-id.

/var/lib/build-id/ can be created/updated either during 'rpm -i' time or from
'/proc/sys/kernel/core_pattern' hook or via some inotify daemon watching /bin,
/usr/bin and other directories updates etc. etc.

Regards,
Jan


* Re: Extending coredump note section to contain filenames
  2012-03-12 18:58       ` Denys Vlasenko
  2012-03-12 19:08         ` Jan Kratochvil
@ 2012-03-13 12:12         ` Denys Vlasenko
  2012-03-13 12:19           ` Jan Kratochvil
  1 sibling, 1 reply; 21+ messages in thread
From: Denys Vlasenko @ 2012-03-13 12:12 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: Roland McGrath, linux-kernel, Oleg Nesterov, Kushal Das

On 03/12/2012 07:58 PM, Denys Vlasenko wrote:
> On 03/12/2012 05:53 PM, Jan Kratochvil wrote:
>> On Mon, 12 Mar 2012 13:05:56 +0100, Denys Vlasenko wrote:
>>> Why we don't save library names in coredump?
>>
>> Because they are useless.
>
> They may be useless in some situations. Not in every situation,
> by a long shot. Here is a live example from my system:
> ...

Another example. abrt project wants to do better duplicate detection
for coredumps, where duplicate is defined as "this crash is probably
caused by the same bug".

This needs to be done quickly. Downloading and installing debuginfos
are definitely out of the question.

Karel decided to do it by walking the stack and creating a simplified
backtrace of this form:

BUILD_ID OFFSET SYMBOL MODNAME

where
BUILD_ID: build id of the binary file the address is mapped to.
OFFSET: offset from the start of the executable section of the file
     the stored instruction pointer points to.
SYMBOL: name of the function if it is known.
MODNAME: name of the binary or library.

We wrote a small standalone tool which generates such trace.
The code we have now frequently generates something like this:

ec1fd70dbee0db36eff9527254d9d2bbfd260f13 0x34a7 close_stdout [exe]
ec1fd70dbee0db36eff9527254d9d2bbfd260f13 0x3dd8 close_stdout [exe]
ec1fd70dbee0db36eff9527254d9d2bbfd260f13 0x2093 - [exe]
ec1fd70dbee0db36eff9527254d9d2bbfd260f13 0x1659 - [exe]

Naturally, "[exe]" is there because we don't know executable names.
I have a new code which uses saved /proc/pid/maps and it generates:

ec1fd70dbee0db36eff9527254d9d2bbfd260f13 0x34a7 close_stdout /usr/bin/md5sum
ec1fd70dbee0db36eff9527254d9d2bbfd260f13 0x3dd8 close_stdout /usr/bin/md5sum
ec1fd70dbee0db36eff9527254d9d2bbfd260f13 0x2093 - /usr/bin/md5sum
ec1fd70dbee0db36eff9527254d9d2bbfd260f13 0x1659 - /usr/bin/md5sum

If anyone would want to use this tool on a large collection
of coredumps with the intent of fishing out similar crashes
(this is not a theoretical assumption, we *do* have people
who badly need this feature), they won't get nice names of binaries,
they'll get non-informative "[exe]" things - because /proc/pid/maps
info is not saved in coredump, and won't be available.
This will impair ability to perform crash similarity analysis.
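
To illustrate the difference, here is a sketch of how the MODNAME column
could be filled in when a list of file mappings is available. It is not the
actual abrt tool: lookup_modname and format_frame are hypothetical names, the
offset is assumed to have been computed elsewhere, and the mapping range in
the demo is made up for the example.

```python
def lookup_modname(addr, mappings):
    """Return the file name of the mapping containing addr, or None."""
    for start, end, name in mappings:
        if start <= addr < end:
            return name
    return None

def format_frame(build_id, offset, symbol, addr, mappings):
    """Format one 'BUILD_ID OFFSET SYMBOL MODNAME' line.

    Falls back to "[exe]" when no mapping information covers addr, and to
    "-" when the symbol name is unknown.
    """
    modname = lookup_modname(addr, mappings) or "[exe]"
    return "%s 0x%x %s %s" % (build_id, offset, symbol or "-", modname)

if __name__ == "__main__":
    bid = "ec1fd70dbee0db36eff9527254d9d2bbfd260f13"
    # Hypothetical mapping range for the binary; real values would come from
    # the saved mapping information.
    maps = [(0x08048000, 0x08051000, "/usr/bin/md5sum")]
    print(format_frame(bid, 0x3dd8, "close_stdout", 0x0804bdd8, maps))
    print(format_frame(bid, 0x2093, None, 0x0804a093, []))
```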

-- 
vda


* Re: Extending coredump note section to contain filenames
  2012-03-13 12:12         ` Denys Vlasenko
@ 2012-03-13 12:19           ` Jan Kratochvil
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Kratochvil @ 2012-03-13 12:19 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Roland McGrath, linux-kernel, Oleg Nesterov, Kushal Das

On Tue, 13 Mar 2012 13:12:41 +0100, Denys Vlasenko wrote:
> ec1fd70dbee0db36eff9527254d9d2bbfd260f13 0x2093 - /usr/bin/md5sum
> ec1fd70dbee0db36eff9527254d9d2bbfd260f13 0x1659 - /usr/bin/md5sum
> 
> If anyone would want to use this tool on a large collection
> of coredumps with the intent of fishing out similar crashes

At that point it is already on the public Internet and the build-id -> name
mapping database is available.

Or it rather would be available if the Darkserver would be finished.  So far
I have only always seen every project for build-id targeted at getting rid of
the build-id and map it back to the ambiguous filenames / package names.

In such a case there could be a filename / package name already in the ELF header.
build-id should be there as a feature, not as a complication.

> (this is not a theoretical assumption, we *do* have people
> who badly need this feature), they won't get nice names of binaries,

If someone does 'cp /usr/bin/md5sum ~/bin/mysum' then you either get
unidentifiable names or even misleading names.

Regards,
Jan


* Re: Extending coredump note section to contain filenames
  2012-03-12 16:53     ` Jan Kratochvil
  2012-03-12 18:58       ` Denys Vlasenko
@ 2012-03-12 22:21       ` H. Peter Anvin
  2012-03-12 22:31         ` Jan Kratochvil
  1 sibling, 1 reply; 21+ messages in thread
From: H. Peter Anvin @ 2012-03-12 22:21 UTC (permalink / raw)
  To: Jan Kratochvil
  Cc: Denys Vlasenko, Roland McGrath, linux-kernel, Oleg Nesterov,
	Kushal Das

On 03/12/2012 09:53 AM, Jan Kratochvil wrote:
> 
>> Why we don't save library names in coredump?
> 
> Because they are useless.  If you have a filename there how to find which
> content it should match?  Even if you verify the file is still there with the
> same content there is a race it can no longer be true when you read the core
> file 5 seconds later.
> 
> The build-id mapping server above always works and without races.
> 

It seems to me that there would be value in having *both*.

In particular, for libraries which aren't installed in standard system
directories (because they are test versions or mapped with dlopen()) it
would be good to have a hint of where to find them.  The build-id can
then tell you if you have the right version.

	-hpa



* Re: Extending coredump note section to contain filenames
  2012-03-12 22:21       ` H. Peter Anvin
@ 2012-03-12 22:31         ` Jan Kratochvil
  2012-03-13  0:16           ` H. Peter Anvin
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kratochvil @ 2012-03-12 22:31 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Denys Vlasenko, Roland McGrath, linux-kernel, Oleg Nesterov,
	Kushal Das

On Mon, 12 Mar 2012 23:21:02 +0100, H. Peter Anvin wrote:
> In particular, for libraries which aren't installed in standard system
> directories (because they are test versions or mapped with dlopen()) it
> would be good to have a hint of where to find them.  The build-id can
> then tell you if you have the right version.

I believe "/proc/sys/kernel/core_pattern" was created so that the Linux kernel
does not have to contain code for dumping lots of associated info; the filenames
seem to me to belong in this category.


Regards,
Jan


* Re: Extending coredump note section to contain filenames
  2012-03-12 22:31         ` Jan Kratochvil
@ 2012-03-13  0:16           ` H. Peter Anvin
  2012-03-13  0:27             ` Jan Kratochvil
  0 siblings, 1 reply; 21+ messages in thread
From: H. Peter Anvin @ 2012-03-13  0:16 UTC (permalink / raw)
  To: Jan Kratochvil
  Cc: Denys Vlasenko, Roland McGrath, linux-kernel, Oleg Nesterov,
	Kushal Das

On 03/12/2012 03:31 PM, Jan Kratochvil wrote:
> On Mon, 12 Mar 2012 23:21:02 +0100, H. Peter Anvin wrote:
>> In particular, for libraries which aren't installed in standard system
>> directories (because they are test versions or mapped with dlopen()) it
>> would be good to have a hint of where to find them.  The build-id can
>> then tell you if you have the right version.
> 
> I believe "/proc/sys/kernel/core_pattern" was created so that the Linux kernel
> does not have to contain code for dumping lots of associated info; the filenames
> seem to me to belong in this category.
> 

That's quite absurd if you think about it... we're talking about each
individual library, not necessarily the root binary.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: Extending coredump note section to contain filenames
  2012-03-13  0:16           ` H. Peter Anvin
@ 2012-03-13  0:27             ` Jan Kratochvil
  2012-03-13  0:31               ` H. Peter Anvin
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kratochvil @ 2012-03-13  0:27 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Denys Vlasenko, Roland McGrath, linux-kernel, Oleg Nesterov,
	Kushal Das

On Tue, 13 Mar 2012 01:16:30 +0100, H. Peter Anvin wrote:
> That's quite absurd if you think about it... we're talking about each
> individual library, not necessarily the root binary.

Denys is also working on ABRT which already dumps a lot of useful info
associated with a core file:
	abrt_version analyzer architecture cmdline component coredump count
	dso_list environ executable hostname kernel maps os_release package
	pid pwd reason time uid username uuid var_log_messages

I find the local copies of the files of a specific build to be the same or
even less useful info than those files above.

One should start implementing dumping all the info above into the Linux kernel
before the filenames are worth it.

The problem with filenames is they work in the most common end-user host
cases.  Which means tools get built on top of it that way - like what AFAIK
happened in the Ubuntu Apport case.  The resulting toolchain is then not
applicable for the longterm running server setups, it is that 99% working
solution which needs to be rewritten from scratch for that final 1%.

Regards,
Jan


* Re: Extending coredump note section to contain filenames
  2012-03-13  0:27             ` Jan Kratochvil
@ 2012-03-13  0:31               ` H. Peter Anvin
  2012-03-13  0:36                 ` Jan Kratochvil
  0 siblings, 1 reply; 21+ messages in thread
From: H. Peter Anvin @ 2012-03-13  0:31 UTC (permalink / raw)
  To: Jan Kratochvil
  Cc: Denys Vlasenko, Roland McGrath, linux-kernel, Oleg Nesterov,
	Kushal Das

On 03/12/2012 05:27 PM, Jan Kratochvil wrote:
> On Tue, 13 Mar 2012 01:16:30 +0100, H. Peter Anvin wrote:
>> That's quite absurd if you think about it... we're talking about each
>> individual library, not necessarily the root binary.
> 
> Denys is also working on ABRT which already dumps a lot of useful info
> associated with a core file:
> 	abrt_version analyzer architecture cmdline component coredump count
> 	dso_list environ executable hostname kernel maps os_release package
> 	pid pwd reason time uid username uuid var_log_messages
> 
> I find the local copies of the files of a specific build to be the same or
> even less useful info than those files above.
> 
> One should start implementing dumping all the info above into the Linux kernel
> before the filenames are worth it.
> 

This is basically providing the dso_list and executable information.
Several of the other things you list above (cmdline, environ, pid,
reason, maps) are already part of a core file.  The rest sounds like an
RFE to me.

	-hpa


* Re: Extending coredump note section to contain filenames
  2012-03-13  0:31               ` H. Peter Anvin
@ 2012-03-13  0:36                 ` Jan Kratochvil
  2012-03-13  0:42                   ` H. Peter Anvin
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kratochvil @ 2012-03-13  0:36 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Denys Vlasenko, Roland McGrath, linux-kernel, Oleg Nesterov,
	Kushal Das

On Tue, 13 Mar 2012 01:31:43 +0100, H. Peter Anvin wrote:
> This is basically providing the dso_list and executable information.
> Several of the other things you list above (cmdline, environ, pid,
> reason, maps) are already part of a core file.

dso_list and executable identification (their build-id) are also part of the
core file.  It is only a defect of the filesystem that it can map
filename -> content but cannot map build-id -> content.  There are many
indexing solutions for that, including slocate, now that you have started to
talk about solutions that are not 100% reliable.
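
One common form such a build-id -> content index takes is a directory tree
keyed by the build-id itself.  A minimal sketch, assuming the
.build-id/xx/yyyy.debug layout that GDB searches under its debug-file
directory; the function name, default root and suffix are illustrative
assumptions, not something proposed in this thread:

```python
import posixpath


def build_id_path(build_id, root="/usr/lib/debug/.build-id", suffix=".debug"):
    """Return the index path for a hex build-id (hypothetical helper).

    The first two hex digits name a subdirectory and the remaining digits
    name the entry, e.g. ab/cdef0123.debug under the assumed root.
    """
    hex_id = build_id.strip().lower()
    # A usable build-id needs at least one byte for the directory part
    # and something left over for the file name.
    if len(hex_id) < 3 or any(c not in "0123456789abcdef" for c in hex_id):
        raise ValueError("build-id must be a hex string of at least 3 digits")
    return posixpath.join(root, hex_id[:2], hex_id[2:] + suffix)
```

The lookup only computes a path; whether a file actually exists there
depends on what has been installed or indexed on the analysing machine.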

> The rest sounds like an RFE to me.

If you really mean that all that info belongs in a core file then I no longer
have any objections; the Linux kernel can put a lot of redundant info into the
core file.

Regards,
Jan

* Re: Extending coredump note section to contain filenames
  2012-03-13  0:36                 ` Jan Kratochvil
@ 2012-03-13  0:42                   ` H. Peter Anvin
  2012-03-13  0:46                     ` Jan Kratochvil
  0 siblings, 1 reply; 21+ messages in thread
From: H. Peter Anvin @ 2012-03-13  0:42 UTC (permalink / raw)
  To: Jan Kratochvil
  Cc: Denys Vlasenko, Roland McGrath, linux-kernel, Oleg Nesterov,
	Kushal Das

On 03/12/2012 05:36 PM, Jan Kratochvil wrote:
> On Tue, 13 Mar 2012 01:31:43 +0100, H. Peter Anvin wrote:
>> This is basically providing the dso_list and executable information.
>> Several of the other things you list above (cmdline, environ, pid,
>> reason, maps) are already part of a core file.
> 
> dso_list and executable identification (their build-id) are also part of the
> core file.  It is only a defect of the filesystem that it can map
> filename -> content but cannot map build-id -> content.  There are many
> indexing solutions for that, including slocate, now that you have started to
> talk about solutions that are not 100% reliable.

There is no 100% reliable solution possible -- you have no guarantee of
any kind that the library executable still exists.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: Extending coredump note section to contain filenames
  2012-03-13  0:42                   ` H. Peter Anvin
@ 2012-03-13  0:46                     ` Jan Kratochvil
  2012-03-13  0:50                       ` H. Peter Anvin
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kratochvil @ 2012-03-13  0:46 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Denys Vlasenko, Roland McGrath, linux-kernel, Oleg Nesterov,
	Kushal Das

On Tue, 13 Mar 2012 01:42:18 +0100, H. Peter Anvin wrote:
> There is no 100% reliable solution possible -- you have no guarantee of
> any kind that the library executable still exists.

I have a guarantee that the library binary mapped in memory, identified by
build-id, can be found out there in the cloud.  There is no other guarantee.
And this guarantee fails with other solutions.

If you say that it is _additional_ info to build-id then yes, one can always
use build-id if everything else fails.  But then the non-build-id information
is redundant and it can just lead to wrong toolchain solutions - which has
already happened (Apport).

Regards,
Jan

* Re: Extending coredump note section to contain filenames
  2012-03-13  0:46                     ` Jan Kratochvil
@ 2012-03-13  0:50                       ` H. Peter Anvin
  0 siblings, 0 replies; 21+ messages in thread
From: H. Peter Anvin @ 2012-03-13  0:50 UTC (permalink / raw)
  To: Jan Kratochvil
  Cc: Denys Vlasenko, Roland McGrath, linux-kernel, Oleg Nesterov,
	Kushal Das

On 03/12/2012 05:46 PM, Jan Kratochvil wrote:
> On Tue, 13 Mar 2012 01:42:18 +0100, H. Peter Anvin wrote:
>> There is no 100% reliable solution possible -- you have no guarantee of
>> any kind that the library executable still exists.
> 
> I have a guarantee that the library binary mapped in memory, identified by
> build-id, can be found out there in the cloud.  There is no other guarantee.
> And this guarantee fails with other solutions.
> 
> If you say that it is _additional_ info to build-id then yes, one can always
> use build-id if everything else fails.  But then the non-build-id information
> is redundant and it can just lead to wrong toolchain solutions - which has
> already happened (Apport).

You're thinking of a particular use case which isn't necessarily the
only one that matters.  It might be the only one that matters to *you*,
but that's very different to what matters to a developer, for example.

And yes of course the build-id should be included.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


Thread overview: 21+ messages
2012-03-09 17:13 Extending coredump note section to contain filenames Denys Vlasenko
2012-03-09 17:29 ` Jan Kratochvil
2012-03-12 12:05   ` Denys Vlasenko
2012-03-12 12:13     ` Denys Vlasenko
2012-03-12 16:53     ` Jan Kratochvil
2012-03-12 18:58       ` Denys Vlasenko
2012-03-12 19:08         ` Jan Kratochvil
2012-03-12 19:45           ` Denys Vlasenko
2012-03-12 22:07             ` Jan Kratochvil
2012-03-12 22:16             ` Jan Kratochvil
2012-03-13 12:12         ` Denys Vlasenko
2012-03-13 12:19           ` Jan Kratochvil
2012-03-12 22:21       ` H. Peter Anvin
2012-03-12 22:31         ` Jan Kratochvil
2012-03-13  0:16           ` H. Peter Anvin
2012-03-13  0:27             ` Jan Kratochvil
2012-03-13  0:31               ` H. Peter Anvin
2012-03-13  0:36                 ` Jan Kratochvil
2012-03-13  0:42                   ` H. Peter Anvin
2012-03-13  0:46                     ` Jan Kratochvil
2012-03-13  0:50                       ` H. Peter Anvin
