* uniquely identifying KDUMP files that originate from QEMU
@ 2014-11-11 11:22 Laszlo Ersek
2014-11-11 11:46 ` [Qemu-devel] " Peter Maydell
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Laszlo Ersek @ 2014-11-11 11:22 UTC (permalink / raw)
To: Qiao Nuohan, Wen Congyang, kumagai-atsushi
Cc: Dave Anderson, Ekaterina Tumanova, kexec, qemu devel list,
crash-utility
(Note: I'm not subscribed to either qemu-devel or the kexec list; please
keep me CC'd.)
QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
The resultant vmcore is usually analyzed with the "crash" utility.
The original tool producing such files is kdump. Unlike the procedure
performed by QEMU, kdump runs from *within* the guest (under a kexec'd
kdump kernel), and has more information about the original guest kernel
state (which is being dumped) than QEMU. To QEMU, the guest kernel state
is opaque.
For this reason, the kdump preparation logic in QEMU hardcodes a number
of fields in the kdump header. The direct issue is the "phys_base"
field. Refer to dump.c, functions create_header32(), create_header64(),
and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
"0").
http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
This works in most cases, because the guest Linux kernel indeed tends to
be loaded at guest-phys address 0. However, when the guest Linux kernel
is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
then the guest Linux kernel is loaded at 16MB, thereby getting out of
sync with the phys_base=0 setting visible in the KDUMP header.
This trips up the "crash" utility.
Dave worked around the issue in "crash" for ELF format dumps -- "crash"
can identify QEMU as the originator of the vmcore by finding the QEMU
notes in the ELF vmcore. If those are present, then "crash" employs a
heuristic, probing for a phys_base up to 32MB, in 1MB steps.
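The probing heuristic described above can be sketched roughly as follows. This is only an illustration, not the code in "crash": the names probe_phys_base(), phys_base_check_fn and matches_expected() are hypothetical, the inclusive 32MB bound is an assumption, and the way a candidate is validated (for instance, by trying to read a known kernel symbol through it) is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEGABYTE    (1024ULL * 1024ULL)
#define PROBE_LIMIT (32 * MEGABYTE)  /* upper bound from the text; inclusive by assumption */
#define PROBE_STEP  (1 * MEGABYTE)   /* step size from the text */

/* Hypothetical callback: returns true if kernel data can be read
 * sensibly when 'candidate' is assumed to be phys_base. */
typedef bool (*phys_base_check_fn)(uint64_t candidate, void *opaque);

/* Try each candidate from 0 to 32MB in 1MB steps. The first candidate the
 * callback accepts is stored in *found. Returns false if none is accepted. */
bool probe_phys_base(phys_base_check_fn check, void *opaque, uint64_t *found)
{
    for (uint64_t candidate = 0; candidate <= PROBE_LIMIT; candidate += PROBE_STEP) {
        if (check(candidate, opaque)) {
            *found = candidate;
            return true;
        }
    }
    return false;
}

/* Demo callback for experiments: accepts exactly the value that
 * 'opaque' points to. A real check would inspect guest memory. */
bool matches_expected(uint64_t candidate, void *opaque)
{
    return candidate == *(const uint64_t *)opaque;
}
```

With a real validation callback, an OVMF guest whose kernel sits at 16MB would be found on the 17th iteration, while a load address above the probe limit would not be found at all.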
Alas, the QEMU notes are not present in the KDUMP-format vmcores that
QEMU produces (they cannot be), hence crash has no way to tell apart
such files from those generated by genuine kdump. As an end result,
"crash" cannot automatically find the phys_base of OVMF-based Linux vmcores.
Dave suggested that a new flag, or a special phys_base value (like ~0UL)
be introduced as a distinguishing mark for QEMU-produced kdumps.
Implementing this in QEMU wouldn't be hard. The big question is
compatibility -- whose analysis tools would be broken by a (phys_base ==
~0UL) setting, or by a new flag?
Note that this change would affect SeaBIOS-based vmcores too. QEMU can't
(and shouldn't) discriminate the vmcores it dumps based on guest
firmware. (If QEMU did that, then it might as well try to figure out the
real phys_base value, which is clearly out of scope for qemu. One of the
selling points of the paging=false dump is that it doesn't involve
parsing guest RAM.)
Thanks
Laszlo
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU
2014-11-11 11:22 uniquely identifying KDUMP files that originate from QEMU Laszlo Ersek
@ 2014-11-11 11:46 ` Peter Maydell
2014-11-11 12:09 ` Petr Tesarik
2014-11-11 17:27 ` [Qemu-devel] " Christopher Covington
2 siblings, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2014-11-11 11:46 UTC (permalink / raw)
To: Laszlo Ersek
Cc: Wen Congyang, Ekaterina Tumanova, kexec, qemu devel list,
Qiao Nuohan, Dave Anderson, kumagai-atsushi, crash-utility

On 11 November 2014 11:22, Laszlo Ersek <lersek@redhat.com> wrote:
> For this reason, the kdump preparation logic in QEMU hardcodes a number
> of fields in the kdump header. The direct issue is the "phys_base"
> field. Refer to dump.c, functions create_header32(), create_header64(),
> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
> "0").
>
> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
>
> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
>
> This works in most cases, because the guest Linux kernel indeed tends to
> be loaded at guest-phys address 0. However, when the guest Linux kernel
> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
> then the guest Linux kernel is loaded at 16MB, thereby getting out of
> sync with the phys_base=0 setting visible in the KDUMP header.

Presumably this is also not going to work for machines other
than the x86 PC, where physical memory may well not start at
address zero...

-- PMM

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: uniquely identifying KDUMP files that originate from QEMU
2014-11-11 11:22 uniquely identifying KDUMP files that originate from QEMU Laszlo Ersek
2014-11-11 11:46 ` [Qemu-devel] " Peter Maydell
@ 2014-11-11 12:09 ` Petr Tesarik
2014-11-12 3:08 ` HATAYAMA Daisuke
2014-11-11 17:27 ` [Qemu-devel] " Christopher Covington
2 siblings, 1 reply; 29+ messages in thread
From: Petr Tesarik @ 2014-11-11 12:09 UTC (permalink / raw)
To: kexec

On Tue, 11 Nov 2014 12:22:52 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

>[...]
> This trips up the "crash" utility.
>
> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
> can identify QEMU as the originator of the vmcore by finding the QEMU
> notes in the ELF vmcore. If those are present, then "crash" employs a
> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
>
> Alas, the QEMU notes are not present in the KDUMP-format vmcores that
> QEMU produces (they cannot be),

Why? Since KDUMP format version 4, the complete ELF notes can be stored
in the file (see offset_note, size_note fields in the sub-header).

Petr Tesarik

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: uniquely identifying KDUMP files that originate from QEMU
2014-11-11 12:09 ` Petr Tesarik
@ 2014-11-12 3:08 ` HATAYAMA Daisuke
2014-11-12 8:04 ` Petr Tesarik
0 siblings, 1 reply; 29+ messages in thread
From: HATAYAMA Daisuke @ 2014-11-12 3:08 UTC (permalink / raw)
To: ptesarik; +Cc: lersek, kexec

From: Petr Tesarik <ptesarik@suse.cz>
Subject: Re: uniquely identifying KDUMP files that originate from QEMU
Date: Tue, 11 Nov 2014 13:09:13 +0100

> On Tue, 11 Nov 2014 12:22:52 +0100
> Laszlo Ersek <lersek@redhat.com> wrote:
>
>>[...]
>> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
>> can identify QEMU as the originator of the vmcore by finding the QEMU
>> notes in the ELF vmcore. If those are present, then "crash" employs a
>> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
>>
>> Alas, the QEMU notes are not present in the KDUMP-format vmcores that
>> QEMU produces (they cannot be),
>
> Why? Since KDUMP format version 4, the complete ELF notes can be stored
> in the file (see offset_note, size_note fields in the sub-header).
>

Yes, the QEMU notes are present in the kdump-compressed format. But
phys_base cannot be calculated from the qemu side alone. We cannot do more
than what the crash utility already does as a workaround. So, the phys_base
value in the kdump sub-header is currently designed to be 0.

Anyway, phys_base is kernel information. To make it available to the qemu
side, a mechanism is needed that gives qemu some access to it.

One ad-hoc but simple way is to put the phys_base value into the
VMCOREINFO note information in the kernel. There is already a similar
entry in VMCOREINFO:

arch/x86/kernel/
==
void arch_crash_save_vmcoreinfo(void)
{
        VMCOREINFO_SYMBOL(phys_base); <---- This
        VMCOREINFO_SYMBOL(init_level4_pgt);

...
==

but this is meaningless here, because the recorded value is the virtual
address assigned to the phys_base symbol. To read the value of phys_base
through that address, we need the very phys_base value we are trying to
get.

So, instead, if we change this to save the value of phys_base, not the
address of the phys_base symbol, we can get phys_base from the VMCOREINFO.

The VMCOREINFO consists simply of strings. So it's easy to search a
vmcore for it, e.g. using strings and grep like this:

$ strings vmcore-3.10.0-121.el7.x86_64 | grep -E ".*VMCOREINFO.*" -A 100
VMCOREINFO
OSRELEASE=3.10.0-121.el7.x86_64
PAGESIZE=4096
...
SYMBOL(phys_base)=ffffffff818e5010 <-- though this is address of phys_base now...
SYMBOL(init_level4_pgt)=ffffffff818de000
SYMBOL(node_data)=ffffffff819f1cc0
LENGTH(node_data)=1024
CRASHTIME=1399460394
...

This should also be useful for getting the phys_base of the 2nd kernel,
which is inherently a relocated kernel, from a vmcore generated by a qemu
dump.

This is far from well-designed from qemu's point of view, but it would
make getting phys_base by hand easier than it is now.

Obviously, the VMCOREINFO is available only if CONFIG_KEXEC is
enabled. Other users cannot use this.

--
Thanks.
HATAYAMA, Daisuke

^ permalink raw reply [flat|nested] 29+ messages in thread
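Since VMCOREINFO is plain "KEY=value" text, the lookup that the strings/grep pipeline above does by hand could be sketched in C as below. This is a minimal sketch under stated assumptions: vmcoreinfo_lookup_hex() is a hypothetical helper, it expects the VMCOREINFO text to be already extracted into a NUL-terminated buffer, and it only handles hexadecimal values such as the SYMBOL() entries.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find the line "key=<hex>" in newline-separated
 * VMCOREINFO text and parse the hexadecimal value into *value.
 * Returns false if the key is absent or the value is not a hex number. */
bool vmcoreinfo_lookup_hex(const char *text, const char *key, uint64_t *value)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* Only match the key at the start of a line, followed by '='. */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == '=') {
            const char *digits = line + keylen + 1;
            char *end;

            errno = 0;
            unsigned long long parsed = strtoull(digits, &end, 16);
            if (errno != 0 || end == digits) {
                return false;
            }
            *value = parsed;
            return true;
        }
        /* Advance to the character after the next newline, if any. */
        line = strchr(line, '\n');
        if (line != NULL) {
            line++;
        }
    }
    return false;
}
```

Looking up "SYMBOL(phys_base)" in the sample output above yields the address of the symbol (ffffffff818e5010), not the value of phys_base, which is exactly the limitation described in this message.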
* Re: uniquely identifying KDUMP files that originate from QEMU
2014-11-12 3:08 ` HATAYAMA Daisuke
@ 2014-11-12 8:04 ` Petr Tesarik
2014-11-12 14:50 ` Laszlo Ersek
0 siblings, 1 reply; 29+ messages in thread
From: Petr Tesarik @ 2014-11-12 8:04 UTC (permalink / raw)
To: HATAYAMA Daisuke; +Cc: kexec, lersek

On Wed, 12 Nov 2014 12:08:38 +0900 (JST)
HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:

> From: Petr Tesarik <ptesarik@suse.cz>
> Subject: Re: uniquely identifying KDUMP files that originate from QEMU
> Date: Tue, 11 Nov 2014 13:09:13 +0100
>
> > On Tue, 11 Nov 2014 12:22:52 +0100
> > Laszlo Ersek <lersek@redhat.com> wrote:
>[...]
> >> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
> >> can identify QEMU as the originator of the vmcore by finding the QEMU
> >> notes in the ELF vmcore. If those are present, then "crash" employs a
> >> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
> >>
> >> Alas, the QEMU notes are not present in the KDUMP-format vmcores that
> >> QEMU produces (they cannot be),
> >
> > Why? Since KDUMP format version 4, the complete ELF notes can be stored
> > in the file (see offset_note, size_note fields in the sub-header).
> >
>
> Yes, the QEMU notes are present in kdump-compressed format. But
> phys_base cannot be calculated only from qemu-side. We cannot do more

Yes, this part is obvious. I was referring to this sentence:

"Alas, the QEMU notes are not present in the KDUMP-format vmcores."

My understanding was that crash cannot detect a KDUMP file created by
QEMU, and so it does not apply the workaround. Sorry for the confusion if
this was not your problem.

> than the efforts crash utility does for workaround. So, the phys_base
> value in kdump-sub header is now designed to have 0 now.
>
> Anyway, phys_base is kernel information. To make it available for qemu
> side, there's need to prepare a mechanism for qemu to have any access
> to it.

Yes. I wonder if you can have access without some sort of co-operation
from the guest kernel itself. I guess not.

> One ad-hoc but simple way is to put phys_base value as part of
> VMCOREINFO note information on kernel.

YES! In fact, this has been on my TODO list for a few weeks now.

> Although there has already been a similar one in VMCOREINFO, like
>
> arch/x86/kernel/
> ==
> void arch_crash_save_vmcoreinfo(void)
> {
>         VMCOREINFO_SYMBOL(phys_base); <---- This
>         VMCOREINFO_SYMBOL(init_level4_pgt);
>
> ...
> ==
>
> this is meaningless, because this value is a virtual address assigned to
> phys_base symbol.

Yes, again. I have already done some research and *nobody* needs the
actual symbol value. For example, makedumpfile only checks if the
symbol exists and sets phys_base to 0 unconditionally if not. That's so
wrong...

> To refer to the value of phys_base itself, we need
> the phys_base value we are about to get now.
>
> So, instead, if we change this to save the value, not value of symbol
> phys_base, we can get phys_base from the VMCOREINFO.

Yes, please do that. It should be sufficient to replace this line in
kernel's arch/x86/kernel/machine_kexec_64.c:

        VMCOREINFO_SYMBOL(phys_base);

with:

        VMCOREINFO_NUMBER(phys_base);

> The VMCOREINFO consists simply of string. So it's easy to search
> vmcore for it e.g. using strings and grep like this:
>
> $ strings vmcore-3.10.0-121.el7.x86_64 | grep -E ".*VMCOREINFO.*" -A 100

If vmcore-3.10.0-121.el7.x86_64 is a standard kernel ELF dump file, you
can actually run elfutil's "readelf -n" on it and get the VMCOREINFO
directly (or use my libkdumpfile library to read the kernel core file,
see https://github.com/ptesarik/libkdumpfile).

If it is simply a QEMU dump file (without the VMCOREINFO ELF note),
then running strings on it seems like the only sensible workaround. I
tried to solve a similar problem in kdumpid
(http://sourceforge.net/projects/kdumpid/), and the best I could do is
very similar to the workaround in the crash utility (scanning physical
memory for something that looks like kernel text).

Petr T

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: uniquely identifying KDUMP files that originate from QEMU
2014-11-12 8:04 ` Petr Tesarik
@ 2014-11-12 14:50 ` Laszlo Ersek
2014-11-12 18:43 ` Petr Tesarik
0 siblings, 1 reply; 29+ messages in thread
From: Laszlo Ersek @ 2014-11-12 14:50 UTC (permalink / raw)
To: Petr Tesarik, HATAYAMA Daisuke; +Cc: kexec

On 11/12/14 09:04, Petr Tesarik wrote:
> On Wed, 12 Nov 2014 12:08:38 +0900 (JST)
> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:

>> Anyway, phys_base is kernel information. To make it available for qemu
>> side, there's need to prepare a mechanism for qemu to have any access
>> to it.
>
> Yes. I wonder if you can have access without some sort of co-operation
> from the guest kernel itself. I guess not.

Propagating any kind of additional information from the guest kernel
(which is unprivileged and potentially malicious) to the host-side qemu
process (which is by definition more privileged, although still confined
by various measures) is something we'd explicitly like to avoid.

Think of it like this. I throw a physical box at you, running Linux,
that has frozen in time. Can "crash" work with nothing else but the
contents of the memory, and information about the CPUs?

It certainly can, it already does. It might need more heuristics than
when the guest kernel offers those bits of info on a plate, but those
heuristics already work. We just need to tell "crash" that this
particular box, flying in the air, has been launched by qemu, so that
"crash" enables its heuristics.

Thanks
Laszlo

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: uniquely identifying KDUMP files that originate from QEMU
2014-11-12 14:50 ` Laszlo Ersek
@ 2014-11-12 18:43 ` Petr Tesarik
2014-11-12 20:30 ` Laszlo Ersek
0 siblings, 1 reply; 29+ messages in thread
From: Petr Tesarik @ 2014-11-12 18:43 UTC (permalink / raw)
To: Laszlo Ersek; +Cc: HATAYAMA Daisuke, kexec

On Wed, 12 Nov 2014 15:50:32 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> On 11/12/14 09:04, Petr Tesarik wrote:
> > On Wed, 12 Nov 2014 12:08:38 +0900 (JST)
> > HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>
> >> Anyway, phys_base is kernel information. To make it available for qemu
> >> side, there's need to prepare a mechanism for qemu to have any access
> >> to it.
> >
> > Yes. I wonder if you can have access without some sort of co-operation
> > from the guest kernel itself. I guess not.
>
> Propagating any kind of additional information from the guest kernel
> (which is unprivileged and potentially malicious) to the host-side qemu
> process (which is by definition more privileged, although still confined
> by various measures) is something we'd explicitly like to avoid.
>
> Think of it like this. I throw a physical box at you, running Linux,
> that has frozen in time. Can "crash" work with nothing else but the
> contents of the memory, and information about the CPUs?

If only you could save the _complete_ state of the CPU... For example
the content of CR3 would be quite useful.

Petr T

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: uniquely identifying KDUMP files that originate from QEMU
2014-11-12 18:43 ` Petr Tesarik
@ 2014-11-12 20:30 ` Laszlo Ersek
2014-11-12 20:41 ` Dave Anderson
2014-11-12 21:20 ` Petr Tesarik
0 siblings, 2 replies; 29+ messages in thread
From: Laszlo Ersek @ 2014-11-12 20:30 UTC (permalink / raw)
To: Petr Tesarik, anderson
Cc: wency, tumanova, kexec, qiaonuohan, qemu-devel, HATAYAMA Daisuke,
kumagai-atsushi, crash-utility

adding back a few CC's because this discussion is useful

On 11/12/14 19:43, Petr Tesarik wrote:
> On Wed, 12 Nov 2014 15:50:32 +0100
> Laszlo Ersek <lersek@redhat.com> wrote:
>
>> On 11/12/14 09:04, Petr Tesarik wrote:
>>> On Wed, 12 Nov 2014 12:08:38 +0900 (JST)
>>> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>>
>>>> Anyway, phys_base is kernel information. To make it available for qemu
>>>> side, there's need to prepare a mechanism for qemu to have any access
>>>> to it.
>>>
>>> Yes. I wonder if you can have access without some sort of co-operation
>>> from the guest kernel itself. I guess not.
>>
>> Propagating any kind of additional information from the guest kernel
>> (which is unprivileged and potentially malicious) to the host-side qemu
>> process (which is by definition more privileged, although still confined
>> by various measures) is something we'd explicitly like to avoid.
>>
>> Think of it like this. I throw a physical box at you, running Linux,
>> that has frozen in time. Can "crash" work with nothing else but the
>> contents of the memory, and information about the CPUs?
>
> If only you could save the _complete_ state of the CPU... For example
> the content of CR3 would be quite useful.

(1) CR3 is already saved, in both the ELF and the kdump compressed formats.

- ELF case:

  qmp_dump_guest_memory()                    [dump.c]
    create_vmcore()
      dump_begin()
        write_elf64_notes()

          loop from 1 to #vcpu:
            cpu_write_elf64_note()           [qom/cpu.c]
              x86_64_write_elf64_note()      [target-i386/arch_dump.c]
                writes "CORE"

          loop from 1 to #vcpu:
            cpu_write_elf64_qemunote()       [qom/cpu.c]
              x86_cpu_write_elf64_qemunote() [target-i386/arch_dump.c]
                cpu_write_qemu_note()
                  qemu_get_cpustate()
                    s->cr[3] = env->cr[3];   <---------- here
                  writes "QEMU"

Hence, the information is part of the QEMU note.

- kdump case:

  qmp_dump_guest_memory()                    [dump.c]
    create_kdump_vmcore()
      write_dump_header()
        create_header64()
          write_elf64_notes()
            [... same as above ...]

The trick here is that the note-writer functions use a callback function
for actually outputting the data. So while in the ELF case the stuff
goes directly to a file, in the kdump case the notes are first saved in
a memory buffer, and then later saved in the file at offset
KdumpSubHeader64.offset_note. (... Which is then represented in the
flattened file format of course.)

So, the information is there in both cases.


(2) Dave -- this just made me realize that the QEMU note is *already*
there in the kdump file as well; pointed-to by
KdumpSubHeader64.offset_note, for a length of KdumpSubHeader64.note_size.

From your other email
<http://thread.gmane.org/gmane.linux.kernel.kexec/12787/focus=12797>:

> sub_header_kdump: 1c9cff0
>            phys_base: 0
>           dump_level: 1 (0x1) (DUMP_EXCLUDE_ZERO)
>                split: 0
>            start_pfn: (unused)
>              end_pfn: (unused)
>    offset_vmcoreinfo: 0 (0x0)
>      size_vmcoreinfo: 0 (0x0)
>          offset_note: 4200 (0x1068)   <----------- here
>            size_note: 3232 (0xca0)    <-----------
>   num_prstatus_notes: 4
>            notes_buf: 1c9e000
>             notes[0]: 1c9e000
>             notes[1]: 1c9e164
>             notes[2]: 1c9e2c8
>             notes[3]: 1c9e42c
>   NT_PRSTATUS_offset: 1068
>                       11cc
>                       1330
>                       1494
>     offset_eraseinfo: 0 (0x0)
>       size_eraseinfo: 0 (0x0)
>         start_pfn_64: (unused)
>           end_pfn_64: (unused)
>         max_mapnr_64: 1245184 (0x130000)

Can you fetch that in "crash"? If you can, then there's nothing to do on
the qemu side (and I'll have to apologize for spamming a bunch of lists :/).

I think "crash" already iterates over all of the notes in the note
buffer, but skips everything different from NT_PRSTATUS.


(3) Regarding the structure of the notes, we have to consider the
placement of the notes and their internal structure. The placement is
different between the ELF and the KDUMP file format. The internal
structure of the notes is identical between the two file formats.

For example, for a 4 VCPU guest, you end up with note names like

  CORE
  CORE
  CORE
  CORE
  QEMU
  QEMU
  QEMU
  QEMU

All of these are Elf64_Nhdr structures. The CORE ones have type
NT_PRSTATUS, and the QEMU ones have type 0.

(3a) The placement in the ELF file is already handled by "crash". Each
note "simply" gets its own ELF note segment/section.

(3b) In the kdump file, the Elf64_Nhdr structures (8 pieces in total, in
the above example -- 4x CORE, 4x QEMU) are concatenated in that order,
and finally stored at "offset_note".

(3c) Regarding the internal structure of the notes. The CORE ones are
already known and handled. The QEMU notes have the following structure:

> Elf64_Nhdr:
>   n_namesz: 5 ("QEMU")
>   n_descsz: 432
>     n_type: 0 (?)

> 000001b000000001 0000000000000000
  |------||------| |--------------|
    size  version        rax

> 0000000000000000 0000000000000000
  |--------------| |--------------|
        rbx              rcx

> 0000000000000000 0000000000000001
  |--------------| |--------------|
        rdx              rsi

> ffffffff81dd5228 ffffffff81a01ec8
  |--------------| |--------------|
        rdi              rsp

> ffffffff81a01ec8 0000000000000000
  |--------------| |--------------|
        rbp              r8

> 0000000000000000 00000013911d5f29
  |--------------| |--------------|
        r9               r10

> 0000000000000000 ffffffff81c00480
  |--------------| |--------------|
        r11              r12

> 0000000000000000 ffffffffffffffff
  |--------------| |--------------|
        r13              r14

> 000000000309f000 ffffffff810375ab
  |--------------| |--------------|
        r15              rip

> 0000000000000246 ffffffff00000010
  |--------------| |------||------|
       rflags       cs/lim  cs/sel

> 0000000000a09b00 0000000000000000
  |------||------| |--------------|
   cs/pad cs/flags     cs/base

> ffffffff00000018 0000000000c09300
  |------||------| |------||------|
   ds/lim  ds/sel   ds/pad ds/flags

> 0000000000000000 ffffffff00000018
  |--------------| |------||------|
      ds/base       es/lim  es/sel

> 0000000000c09300 0000000000000000
  |------||------| |--------------|
   es/pad es/flags     es/base

> ffffffff00000000 0000000000000000
  |------||------| |------||------|
   fs/lim  fs/sel   fs/pad fs/flags

> 0000000000000000 ffffffff00000000
  |--------------| |------||------|
      fs/base       gs/lim  gs/sel

> 0000000000000000 ffff880003200000
  |------||------| |--------------|
   gs/pad gs/flags     gs/base

> ffffffff00000018 0000000000c09300
  |------||------| |------||------|
   ss/lim  ss/sel   ss/pad ss/flags

> 0000000000000000 ffffffff00000000
  |--------------| |------||------|
      ss/base           ldt...

> 0000000000000000 0000000000000000
  |------||------| |--------------|
               ...ldt

> 0000208700000040 0000000000008b00
  |------||------| |------||------|
                tr...

> ffff880003213b40 0000007f00000000
  |--------------| |------||------|
       ...tr            gdt...

> 0000000000000000 ffff880003204000
  |------||------| |--------------|
               ...gdt

> 00000fff00000000 0000000000000000
  |------||------| |------||------|
               idt...

> ffffffff81dd2000 000000008005003b
  |--------------| |--------------|
       ...idt            cr0

> 0000000000000000 0000000001b2e000
  |--------------| |--------------|
        cr1              cr2

> 0000000007b18000 00000000000006f0
  |--------------| |--------------|
        cr3              cr4

From "target-i386/arch_dump.c":

> struct QEMUCPUSegment {
>     uint32_t selector;
>     uint32_t limit;
>     uint32_t flags;
>     uint32_t pad;
>     uint64_t base;
> };
>
> typedef struct QEMUCPUSegment QEMUCPUSegment;
>
> struct QEMUCPUState {
>     uint32_t version;
>     uint32_t size;
>     uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp;
>     uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
>     uint64_t rip, rflags;
>     QEMUCPUSegment cs, ds, es, fs, gs, ss;
>     QEMUCPUSegment ldt, tr, gdt, idt;
>     uint64_t cr[5];
> };
>
> typedef struct QEMUCPUState QEMUCPUState;


Summary: I think the info is all there.

Thanks
Laszlo

^ permalink raw reply [flat|nested] 29+ messages in thread
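Points (3b) and (3c) above suggest how a reader of the kdump file could recognize a QEMU-produced dump: walk the concatenated notes and look for one named "QEMU". The following is a minimal sketch of such a walk, not the code used by "crash". The names NoteHeader and find_note() are hypothetical, the buffer is assumed to hold the bytes found at offset_note, name and descriptor are assumed to be padded to 4-byte boundaries, and the dump is assumed to have the same byte order as the host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as Elf64_Nhdr in <elf.h>; redefined here so the
 * sketch builds on systems that lack that header. */
typedef struct {
    uint32_t n_namesz;  /* name length, including the trailing NUL */
    uint32_t n_descsz;  /* descriptor (payload) length */
    uint32_t n_type;
} NoteHeader;

/* Round a length up to the next multiple of 4 (assumed note padding). */
static uint64_t note_align(uint32_t n)
{
    return ((uint64_t)n + 3) & ~UINT64_C(3);
}

/* Return a pointer to the descriptor of the first note called 'name' and
 * store its length in *descsz. Returns NULL if there is no such note or
 * if the buffer ends in the middle of a note. */
const uint8_t *find_note(const uint8_t *buf, size_t len,
                         const char *name, uint32_t *descsz)
{
    size_t want = strlen(name) + 1;
    size_t off = 0;

    while (len - off >= sizeof(NoteHeader)) {
        NoteHeader hdr;
        memcpy(&hdr, buf + off, sizeof(hdr));

        uint64_t rest = len - off - sizeof(hdr);
        uint64_t name_sz = note_align(hdr.n_namesz);
        uint64_t desc_sz = note_align(hdr.n_descsz);
        if (name_sz > rest || desc_sz > rest - name_sz) {
            break;  /* truncated or corrupt note */
        }

        const uint8_t *note_name = buf + off + sizeof(hdr);
        if (hdr.n_namesz == want && memcmp(note_name, name, want) == 0) {
            *descsz = hdr.n_descsz;
            return note_name + name_sz;
        }
        off += sizeof(hdr) + (size_t)name_sz + (size_t)desc_sz;
    }
    return NULL;
}
```

A caller would treat a non-NULL result for "QEMU" as the distinguishing mark and enable the phys_base heuristic, without any change to the kdump header.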
* Re: uniquely identifying KDUMP files that originate from QEMU 2014-11-12 20:30 ` Laszlo Ersek @ 2014-11-12 20:41 ` Dave Anderson 2014-11-12 21:21 ` [Crash-utility] " Dave Anderson 2014-11-12 21:20 ` Petr Tesarik 1 sibling, 1 reply; 29+ messages in thread From: Dave Anderson @ 2014-11-12 20:41 UTC (permalink / raw) To: Laszlo Ersek Cc: wency, Petr Tesarik, tumanova, kexec, qiaonuohan, qemu-devel, HATAYAMA Daisuke, kumagai-atsushi, crash-utility ----- Original Message ----- > adding back a few CC's because this discussion is useful > > On 11/12/14 19:43, Petr Tesarik wrote: > > V Wed, 12 Nov 2014 15:50:32 +0100 > > Laszlo Ersek <lersek@redhat.com> napsáno: > > > >> On 11/12/14 09:04, Petr Tesarik wrote: > >>> On Wed, 12 Nov 2014 12:08:38 +0900 (JST) > >>> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote: > >> > >>>> Anyway, phys_base is kernel information. To make it available for qemu > >>>> side, there's need to prepare a mechanism for qemu to have any access > >>>> to it. > >>> > >>> Yes. I wonder if you can have access without some sort of co-operation > >>> from the guest kernel itself. I guess not. > >> > >> Propagating any kind of additional information from the guest kernel > >> (which is unprivileged and potentially malicious) to the host-side qemu > >> process (which is by definition more privileged, although still confined > >> by various measures) is something we'd explicitly like to avoid. > >> > >> Think of it like this. I throw a physical box at you, running Linux, > >> that has frozen in time. Can "crash" work with nothing else but the > >> contents of the memory, and information about the CPUs? > > > > If only you could save the _complete_ state of the CPU... For example > > the content of CR3 would be quite useful. > > (1) CR3 is already saved, in both the ELF and the kdump compressed formats. 
> > - ELF case: > > qmp_dump_guest_memory() [dump.c] > create_vmcore() > dump_begin() > write_elf64_notes() > > loop from 1 to #vcpu: > cpu_write_elf64_note() [qom/cpu.c] > x86_64_write_elf64_note() [target-i386/arch_dump.c] > writes "CORE" > > loop from 1 to #vcpu: > cpu_write_elf64_qemunote() [qom/cpu.c] > x86_cpu_write_elf64_qemunote() [target-i386/arch_dump.c] > cpu_write_qemu_note() > qemu_get_cpustate() > s->cr[3] = env->cr[3]; <---------- here > writes "QEMU" > > Hence, the information is part of the QEMU note. > > - kdump case: > > qmp_dump_guest_memory() [dump.c] > create_kdump_vmcore() > write_dump_header() > create_header64() > write_elf64_notes() > [... same as above ...] > > The trick here is that the note-writer functions use a callback function > for actually outputting the data. So while in the ELF case the stuff > goes directly to a file, in the kdump case the notes are first saved in > a memory buffer, and then later saved in the file at offset > KdumpSubHeader64.offset_note. (... Which is then represented in the > flattened file format of course.) > > So, the information is there in both cases. > > > (2) Dave -- this just made me realize that the QEMU note is *already* > there in the kdump file as well; pointed-to by > KdumpSubHeader64.offset_note, for a length of KdumpSubHeader64.note_size. 
> > From your other email > <http://thread.gmane.org/gmane.linux.kernel.kexec/12787/focus=12797>: > > > sub_header_kdump: 1c9cff0 > > phys_base: 0 > > dump_level: 1 (0x1) (DUMP_EXCLUDE_ZERO) > > split: 0 > > start_pfn: (unused) > > end_pfn: (unused) > > offset_vmcoreinfo: 0 (0x0) > > size_vmcoreinfo: 0 (0x0) > > offset_note: 4200 (0x1068) <----------- here > > size_note: 3232 (0xca0) <----------- > > num_prstatus_notes: 4 > > notes_buf: 1c9e000 > > notes[0]: 1c9e000 > > notes[1]: 1c9e164 > > notes[2]: 1c9e2c8 > > notes[3]: 1c9e42c > > NT_PRSTATUS_offset: 1068 > > 11cc > > 1330 > > 1494 > > offset_eraseinfo: 0 (0x0) > > size_eraseinfo: 0 (0x0) > > start_pfn_64: (unused) > > end_pfn_64: (unused) > > max_mapnr_64: 1245184 (0x130000) > > Can you fetch that in "crash"? If you can, then there's nothing to do on > the qemu side (and I'll have to apologize for spamming a bunch of lists :/). Sure enough... I was just playing with process_el64_notes() to check/read the note name strings, and noticed that I can certainly see them. But as you noted, only the NT_PRSTATUS notes are stored in the "notes[]" array. so I was under the impression that the QEMU notes were completely missing. That being the case -- we're pretty much done! I'll put a patch in the next upstream release of crash. Thanks, Dave > > I think "crash" already iterates over all of the notes in the note > buffer, but skips everything different from NT_PRSTATUS. > > > (3) Regarding the structure of the notes, we have to consider the > placement of the notes and their internal structure. The placement is > different between the ELF and the KDUMP file format. The internal > structure of the notes is identical between the two file formats. > > For example, for a 4 VCPU guest, you end up with note names like > > CORE > CORE > CORE > CORE > QEMU > QEMU > QEMU > QEMU > > All of these are Elf64_Nhdr structures. The CORE ones have type > NT_PRSTATUS, and the QEMU ones have type 0. 
> > (3a) The placement in the ELF file is already handled by "crash". Each > note "simply" gets its own ELF note segment/section. > > (3b) In the kdump file, the Elf64_Nhdr structures (8 pieces in total, in > the above example -- 4x CORE, 4x QEMU) are concatenated in that order, > and finally stored at "offset_note". > > (3c) Regarding the internal structure of the notes. The CORE ones are > already known and handled. The QEMU notes have the following structure: > > > Elf64_Nhdr: > > n_namesz: 5 ("QEMU") > > n_descsz: 432 > > n_type: 0 (?) > > 000001b000000001 0000000000000000 > |------||------| |--------------| > size version rax > > > 0000000000000000 0000000000000000 > |--------------| |--------------| > rbx rcx > > > 0000000000000000 0000000000000001 > |--------------| |--------------| > rdx rsi > > > ffffffff81dd5228 ffffffff81a01ec8 > |--------------| |--------------| > rdi rsp > > > ffffffff81a01ec8 0000000000000000 > |--------------| |--------------| > rbp r8 > > > 0000000000000000 00000013911d5f29 > |--------------| |--------------| > r9 r10 > > > 0000000000000000 ffffffff81c00480 > |--------------| |--------------| > r11 r12 > > > 0000000000000000 ffffffffffffffff > |--------------| |--------------| > r13 r14 > > > 000000000309f000 ffffffff810375ab > |--------------| |--------------| > r15 rip > > > 0000000000000246 ffffffff00000010 > |--------------| |------||------| > rflags cs/lim cs/sel > > > 0000000000a09b00 0000000000000000 > |------||------| |--------------| > cs/pad cs/flags cs/base > > > ffffffff00000018 0000000000c09300 > |------||------| |------||------| > ds/lim ds/sel ds/pad ds/flags > > > 0000000000000000 ffffffff00000018 > |--------------| |------||------| > ds/base es/lim es/sel > > > 0000000000c09300 0000000000000000 > |------||------| |--------------| > es/pad es/flags es/base > > > ffffffff00000000 0000000000000000 > |------||------| |------||------| > fs/lim fs/sel fs/pad fs/flags > > > 0000000000000000 ffffffff00000000 > 
|--------------| |------||------| > fs/base gs/lim gs/sel > > > 0000000000000000 ffff880003200000 > |------||------| |--------------| > gs/pad gs/flags gs/base > > > ffffffff00000018 0000000000c09300 > |------||------| |------||------| > ss/lim ss/sel ss/pad ss/flags > > > 0000000000000000 ffffffff00000000 > |--------------| |------||------| > ss/base ldt... > > > 0000000000000000 0000000000000000 > |------||------| |--------------| > ...ldt > > > 0000208700000040 0000000000008b00 > |------||------| |------||------| > tr... > > > ffff880003213b40 0000007f00000000 > |--------------| |------||------| > ...tr gdt... > > > 0000000000000000 ffff880003204000 > |------||------| |--------------| > ...gdt > > > 00000fff00000000 0000000000000000 > |------||------| |------||------| > idt... > > > ffffffff81dd2000 000000008005003b > |--------------| |--------------| > ...idt cr0 > > > 0000000000000000 0000000001b2e000 > |--------------| |--------------| > cr1 cr2 > > > 0000000007b18000 00000000000006f0 > |--------------| |--------------| > cr3 cr4 > > From "target-i386/arch_dump.c": > > > struct QEMUCPUSegment { > > uint32_t selector; > > uint32_t limit; > > uint32_t flags; > > uint32_t pad; > > uint64_t base; > > }; > > > > typedef struct QEMUCPUSegment QEMUCPUSegment; > > > > struct QEMUCPUState { > > uint32_t version; > > uint32_t size; > > uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp; > > uint64_t r8, r9, r10, r11, r12, r13, r14, r15; > > uint64_t rip, rflags; > > QEMUCPUSegment cs, ds, es, fs, gs, ss; > > QEMUCPUSegment ldt, tr, gdt, idt; > > uint64_t cr[5]; > > }; > > > > typedef struct QEMUCPUState QEMUCPUState; > > > Summary: I think the info is all there. > > Thanks > Laszlo > _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 29+ messages in thread
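The note layout described in (3b) and (3c) above can be sketched as a small scanner. This is only an illustration, not code from QEMU or "crash": the helper names find_qemu_cr3() and align4() and the local struct note_hdr (which mirrors Elf64_Nhdr) are assumptions made for this example, and the sketch assumes a little-endian host reading an x86_64 dump. The two QEMU structures are copied from the quoted "target-i386/arch_dump.c" excerpt.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same three fields as Elf64_Nhdr; redefined here to stay self-contained. */
struct note_hdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

/* Layouts as quoted from target-i386/arch_dump.c in the message above. */
struct QEMUCPUSegment {
    uint32_t selector;
    uint32_t limit;
    uint32_t flags;
    uint32_t pad;
    uint64_t base;
};

struct QEMUCPUState {
    uint32_t version;
    uint32_t size;
    uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    uint64_t rip, rflags;
    struct QEMUCPUSegment cs, ds, es, fs, gs, ss;
    struct QEMUCPUSegment ldt, tr, gdt, idt;
    uint64_t cr[5];
};

/* Matches n_descsz = 432 (0x1b0) seen in the dump above. */
_Static_assert(sizeof(struct QEMUCPUState) == 432, "unexpected QEMU note size");

/* ELF note names and descriptors are padded to 4-byte boundaries. */
static size_t align4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/*
 * Hypothetical helper: walk `size` bytes of concatenated notes (the region
 * at offset_note in a kdump file) and copy CR3 from the idx-th "QEMU" note.
 * Returns 0 on success, -1 if no such note exists or the buffer is malformed.
 */
static int find_qemu_cr3(const uint8_t *buf, size_t size, unsigned idx,
                         uint64_t *cr3)
{
    size_t off = 0;
    unsigned seen = 0;

    while (size - off >= sizeof(struct note_hdr)) {
        struct note_hdr nh;
        struct QEMUCPUState st;
        size_t name_off, desc_off;

        memcpy(&nh, buf + off, sizeof(nh));
        name_off = off + sizeof(nh);
        if (align4(nh.n_namesz) > size - name_off)
            return -1;
        desc_off = name_off + align4(nh.n_namesz);
        if (align4(nh.n_descsz) > size - desc_off)
            return -1;

        /* n_namesz counts the terminating NUL, hence 5 for "QEMU". */
        if (nh.n_namesz == 5 && memcmp(buf + name_off, "QEMU", 5) == 0 &&
            nh.n_descsz >= sizeof(st) && seen++ == idx) {
            memcpy(&st, buf + desc_off, sizeof(st));
            *cr3 = st.cr[3];
            return 0;
        }
        off = desc_off + align4(nh.n_descsz);
    }
    return -1;
}
```

With the 4 VCPU example above, idx 0 through 3 would select the four QEMU notes that follow the four CORE notes.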
* Re: [Crash-utility] uniquely identifying KDUMP files that originate from QEMU
2014-11-12 20:41 ` Dave Anderson
@ 2014-11-12 21:21 ` Dave Anderson
0 siblings, 0 replies; 29+ messages in thread
From: Dave Anderson @ 2014-11-12 21:21 UTC (permalink / raw)
To: Discussion list for crash utility usage, maintenance and development
Cc: kexec, tumanova, Laszlo Ersek, qemu-devel

----- Original Message -----

> > Can you fetch that in "crash"? If you can, then there's nothing to do on
> > the qemu side (and I'll have to apologize for spamming a bunch of lists
> > :/).

Well, let's be clear -- I was the one who put you up to it...

But no apology is required -- and in fact, if today's discussion results
in the "phys_base" vmcoreinfo issue being resolved in the kernel, we'll
all be eternally grateful.

Thanks again,
  Dave
* Re: uniquely identifying KDUMP files that originate from QEMU 2014-11-12 20:30 ` Laszlo Ersek 2014-11-12 20:41 ` Dave Anderson @ 2014-11-12 21:20 ` Petr Tesarik 1 sibling, 0 replies; 29+ messages in thread From: Petr Tesarik @ 2014-11-12 21:20 UTC (permalink / raw) To: Laszlo Ersek Cc: wency, tumanova, kexec, qiaonuohan, qemu-devel, HATAYAMA Daisuke, kumagai-atsushi, anderson, crash-utility On Wed, 12 Nov 2014 21:30:20 +0100 Laszlo Ersek <lersek@redhat.com> wrote: > adding back a few CC's because this discussion is useful > > On 11/12/14 19:43, Petr Tesarik wrote: > > V Wed, 12 Nov 2014 15:50:32 +0100 > > Laszlo Ersek <lersek@redhat.com> napsáno: > > > >> On 11/12/14 09:04, Petr Tesarik wrote: > >>> On Wed, 12 Nov 2014 12:08:38 +0900 (JST) > >>> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote: > >> > >>>> Anyway, phys_base is kernel information. To make it available for qemu > >>>> side, there's need to prepare a mechanism for qemu to have any access > >>>> to it. > >>> > >>> Yes. I wonder if you can have access without some sort of co-operation > >>> from the guest kernel itself. I guess not. > >> > >> Propagating any kind of additional information from the guest kernel > >> (which is unprivileged and potentially malicious) to the host-side qemu > >> process (which is by definition more privileged, although still confined > >> by various measures) is something we'd explicitly like to avoid. > >> > >> Think of it like this. I throw a physical box at you, running Linux, > >> that has frozen in time. Can "crash" work with nothing else but the > >> contents of the memory, and information about the CPUs? > > > > If only you could save the _complete_ state of the CPU... For example > > the content of CR3 would be quite useful. > > (1) CR3 is already saved, in both the ELF and the kdump compressed formats. Sweet. :-) So, there's no need for any heuristics. 
Since CR3 gives the physical address of the PML4 table, I can use it to
translate __START_KERNEL_map (0xffffffff80000000UL on all Linux kernels
since the introduction of x86_64) to a physical address and compute
phys_base from that.

In fact, QEMU could do the same if you can live with hardcoding a
Linux-kernel specific constant into the tool...

Petr T
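The CR3-based idea above can be sketched as a 4-level x86_64 page-table walk followed by one subtraction. This is a rough illustration under stated assumptions, not code from "crash" or QEMU: read_phys_fn, struct flat_mem, flat_read(), virt_to_phys() and guess_phys_base() are hypothetical names, and a real tool would read guest-physical memory from the dump file instead of a flat buffer. It also assumes the usual Linux relation pa = va - __START_KERNEL_map + phys_base for kernel-text addresses, and it takes a mapped kernel virtual address as input because the address __START_KERNEL_map itself may not be mapped in a running kernel.

```c
#include <stdint.h>
#include <string.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL  /* __START_KERNEL_map */
#define PTE_PRESENT      0x1ULL
#define PTE_PSE          0x80ULL                /* large page (PDPT/PD level) */
#define PTE_ADDR_MASK    0x000ffffffffff000ULL

/* Hypothetical callback: read 8 bytes of guest-physical memory. */
typedef int (*read_phys_fn)(void *ctx, uint64_t paddr, uint64_t *val);

/* Hypothetical backing store for the example: one flat buffer from paddr 0. */
struct flat_mem {
    const uint8_t *base;
    uint64_t size;
};

static int flat_read(void *ctx, uint64_t paddr, uint64_t *val)
{
    const struct flat_mem *m = ctx;

    if (paddr > m->size || m->size - paddr < sizeof(*val))
        return -1;
    memcpy(val, m->base + paddr, sizeof(*val));
    return 0;
}

/* Walk PML4 -> PDPT -> PD -> PT, honouring 1 GiB and 2 MiB mappings. */
static int virt_to_phys(read_phys_fn rd, void *ctx, uint64_t cr3,
                        uint64_t va, uint64_t *pa)
{
    uint64_t table = cr3 & PTE_ADDR_MASK;
    int shift;

    for (shift = 39; shift >= 12; shift -= 9) {
        uint64_t idx = (va >> shift) & 0x1ff;
        uint64_t entry;

        if (rd(ctx, table + idx * 8, &entry) != 0 || !(entry & PTE_PRESENT))
            return -1;
        /* A leaf is a 4 KiB PTE, or a PSE entry at the PDPT or PD level. */
        if (shift == 12 ||
            ((entry & PTE_PSE) && (shift == 30 || shift == 21))) {
            uint64_t page_mask = (1ULL << shift) - 1;

            *pa = (entry & PTE_ADDR_MASK & ~page_mask) | (va & page_mask);
            return 0;
        }
        table = entry & PTE_ADDR_MASK;
    }
    return -1;
}

/* phys_base = pa - (va - __START_KERNEL_map), for a mapped kernel-text va. */
static int guess_phys_base(read_phys_fn rd, void *ctx, uint64_t cr3,
                           uint64_t kernel_va, uint64_t *phys_base)
{
    uint64_t pa;

    if (kernel_va < START_KERNEL_MAP ||
        virt_to_phys(rd, ctx, cr3, kernel_va, &pa) != 0)
        return -1;
    *phys_base = pa - (kernel_va - START_KERNEL_MAP);
    return 0;
}
```

How a tool would pick a suitable kernel_va without symbol information is left open here; the sketch only shows the translation step that CR3 makes possible.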
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU 2014-11-11 11:22 uniquely identifying KDUMP files that originate from QEMU Laszlo Ersek 2014-11-11 11:46 ` [Qemu-devel] " Peter Maydell 2014-11-11 12:09 ` Petr Tesarik @ 2014-11-11 17:27 ` Christopher Covington 2014-11-12 8:05 ` Petr Tesarik 2014-11-12 14:37 ` Laszlo Ersek 2 siblings, 2 replies; 29+ messages in thread From: Christopher Covington @ 2014-11-11 17:27 UTC (permalink / raw) To: Laszlo Ersek Cc: Wen Congyang, Ekaterina Tumanova, kexec, qemu devel list, Qiao Nuohan, Dave Anderson, kumagai-atsushi, crash-utility On 11/11/2014 06:22 AM, Laszlo Ersek wrote: > (Note: I'm not subscribed to either qemu-devel or the kexec list; please > keep me CC'd.) > > QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, > kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. > > The resultant vmcore is usually analyzed with the "crash" utility. > > The original tool producing such files is kdump. Unlike the procedure > performed by QEMU, kdump runs from *within* the guest (under a kexec'd > kdump kernel), and has more information about the original guest kernel > state (which is being dumped) than QEMU. To QEMU, the guest kernel state > is opaque. > > For this reason, the kdump preparation logic in QEMU hardcodes a number > of fields in the kdump header. The direct issue is the "phys_base" > field. Refer to dump.c, functions create_header32(), create_header64(), > and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text > "0"). > > http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD > > http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD > > This works in most cases, because the guest Linux kernel indeed tends to > be loaded at guest-phys address 0. 
However, when the guest Linux kernel > is booted on top of OVMF (which has a somewhat unusual UEFI memory map), > then the guest Linux kernel is loaded at 16MB, thereby getting out of > sync with the phys_base=0 setting visible in the KDUMP header. > > This trips up the "crash" utility. > > Dave worked around the issue in "crash" for ELF format dumps -- "crash" > can identify QEMU as the originator of the vmcore by finding the QEMU > notes in the ELF vmcore. If those are present, then "crash" employs a > heuristic, probing for a phys_base up to 32MB, in 1MB steps. What advantages does KDUMP have over ELF? Thanks, Chris -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU 2014-11-11 17:27 ` [Qemu-devel] " Christopher Covington @ 2014-11-12 8:05 ` Petr Tesarik 2014-11-12 13:18 ` Christopher Covington 2014-11-12 14:37 ` Laszlo Ersek 1 sibling, 1 reply; 29+ messages in thread From: Petr Tesarik @ 2014-11-12 8:05 UTC (permalink / raw) To: Christopher Covington Cc: Wen Congyang, Ekaterina Tumanova, kexec, qemu devel list, Qiao Nuohan, Dave Anderson, kumagai-atsushi, Laszlo Ersek, crash-utility On Tue, 11 Nov 2014 12:27:44 -0500 Christopher Covington <cov@codeaurora.org> wrote: > On 11/11/2014 06:22 AM, Laszlo Ersek wrote: > > (Note: I'm not subscribed to either qemu-devel or the kexec list; please > > keep me CC'd.) > > > > QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, > > kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. > > > > The resultant vmcore is usually analyzed with the "crash" utility. > > > > The original tool producing such files is kdump. Unlike the procedure > > performed by QEMU, kdump runs from *within* the guest (under a kexec'd > > kdump kernel), and has more information about the original guest kernel > > state (which is being dumped) than QEMU. To QEMU, the guest kernel state > > is opaque. > > > > For this reason, the kdump preparation logic in QEMU hardcodes a number > > of fields in the kdump header. The direct issue is the "phys_base" > > field. Refer to dump.c, functions create_header32(), create_header64(), > > and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text > > "0"). > > > > http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD > > > > http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD > > > > This works in most cases, because the guest Linux kernel indeed tends to > > be loaded at guest-phys address 0. 
However, when the guest Linux kernel > > is booted on top of OVMF (which has a somewhat unusual UEFI memory map), > > then the guest Linux kernel is loaded at 16MB, thereby getting out of > > sync with the phys_base=0 setting visible in the KDUMP header. > > > > This trips up the "crash" utility. > > > > Dave worked around the issue in "crash" for ELF format dumps -- "crash" > > can identify QEMU as the originator of the vmcore by finding the QEMU > > notes in the ELF vmcore. If those are present, then "crash" employs a > > heuristic, probing for a phys_base up to 32MB, in 1MB steps. > > What advantages does KDUMP have over ELF? It's smaller (data is compressed), and it contains a header with some useful information (e.g. the crashed kernel's version and release). HTH, Petr T _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU 2014-11-12 8:05 ` Petr Tesarik @ 2014-11-12 13:18 ` Christopher Covington 2014-11-12 13:26 ` Petr Tesarik 0 siblings, 1 reply; 29+ messages in thread From: Christopher Covington @ 2014-11-12 13:18 UTC (permalink / raw) To: Petr Tesarik Cc: Wen Congyang, Ekaterina Tumanova, kexec, qemu devel list, Qiao Nuohan, Dave Anderson, kumagai-atsushi, Laszlo Ersek, crash-utility On 11/12/2014 03:05 AM, Petr Tesarik wrote: > On Tue, 11 Nov 2014 12:27:44 -0500 > Christopher Covington <cov@codeaurora.org> wrote: > >> On 11/11/2014 06:22 AM, Laszlo Ersek wrote: >>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please >>> keep me CC'd.) >>> >>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, >>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. >>> >>> The resultant vmcore is usually analyzed with the "crash" utility. >>> >>> The original tool producing such files is kdump. Unlike the procedure >>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd >>> kdump kernel), and has more information about the original guest kernel >>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state >>> is opaque. >>> >>> For this reason, the kdump preparation logic in QEMU hardcodes a number >>> of fields in the kdump header. The direct issue is the "phys_base" >>> field. Refer to dump.c, functions create_header32(), create_header64(), >>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text >>> "0"). >>> >>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD >>> >>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD >>> >>> This works in most cases, because the guest Linux kernel indeed tends to >>> be loaded at guest-phys address 0. 
However, when the guest Linux kernel >>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map), >>> then the guest Linux kernel is loaded at 16MB, thereby getting out of >>> sync with the phys_base=0 setting visible in the KDUMP header. >>> >>> This trips up the "crash" utility. >>> >>> Dave worked around the issue in "crash" for ELF format dumps -- "crash" >>> can identify QEMU as the originator of the vmcore by finding the QEMU >>> notes in the ELF vmcore. If those are present, then "crash" employs a >>> heuristic, probing for a phys_base up to 32MB, in 1MB steps. >> >> What advantages does KDUMP have over ELF? > > It's smaller (data is compressed), and it contains a header with some > useful information (e.g. the crashed kernel's version and release). What if the ELF dumper used SHF_COMPRESSED or could dump an ELF.xz? How does QEMU figure out the kernel version information? Thanks, Chris -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU 2014-11-12 13:18 ` Christopher Covington @ 2014-11-12 13:26 ` Petr Tesarik 2014-11-12 13:28 ` Christopher Covington 2014-11-12 14:10 ` Laszlo Ersek 0 siblings, 2 replies; 29+ messages in thread From: Petr Tesarik @ 2014-11-12 13:26 UTC (permalink / raw) To: Christopher Covington Cc: Wen Congyang, Ekaterina Tumanova, kexec, qemu devel list, Qiao Nuohan, Dave Anderson, kumagai-atsushi, Laszlo Ersek, crash-utility On Wed, 12 Nov 2014 08:18:04 -0500 Christopher Covington <cov@codeaurora.org> wrote: > On 11/12/2014 03:05 AM, Petr Tesarik wrote: > > On Tue, 11 Nov 2014 12:27:44 -0500 > > Christopher Covington <cov@codeaurora.org> wrote: > > > >> On 11/11/2014 06:22 AM, Laszlo Ersek wrote: > >>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please > >>> keep me CC'd.) > >>> > >>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, > >>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. > >>> > >>> The resultant vmcore is usually analyzed with the "crash" utility. > >>> > >>> The original tool producing such files is kdump. Unlike the procedure > >>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd > >>> kdump kernel), and has more information about the original guest kernel > >>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state > >>> is opaque. > >>> > >>> For this reason, the kdump preparation logic in QEMU hardcodes a number > >>> of fields in the kdump header. The direct issue is the "phys_base" > >>> field. Refer to dump.c, functions create_header32(), create_header64(), > >>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text > >>> "0"). 
> >>> > >>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD > >>> > >>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD > >>> > >>> This works in most cases, because the guest Linux kernel indeed tends to > >>> be loaded at guest-phys address 0. However, when the guest Linux kernel > >>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map), > >>> then the guest Linux kernel is loaded at 16MB, thereby getting out of > >>> sync with the phys_base=0 setting visible in the KDUMP header. > >>> > >>> This trips up the "crash" utility. > >>> > >>> Dave worked around the issue in "crash" for ELF format dumps -- "crash" > >>> can identify QEMU as the originator of the vmcore by finding the QEMU > >>> notes in the ELF vmcore. If those are present, then "crash" employs a > >>> heuristic, probing for a phys_base up to 32MB, in 1MB steps. > >> > >> What advantages does KDUMP have over ELF? > > > > It's smaller (data is compressed), and it contains a header with some > > useful information (e.g. the crashed kernel's version and release). > > What if the ELF dumper used SHF_COMPRESSED or could dump an ELF.xz? Not the same thing. With KDUMP, each page is compressed separately, so if a utility like crash needs a page from the middle, it can find it and unpack it immediately. If we had an ELF.xz, then the whole file must be unpacked before it can be used. And unpacking a few terabytes takes ... a while. ;-) > How does QEMU figure out the kernel version information? Good question. Who can answer this part? Petr T _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU 2014-11-12 13:26 ` Petr Tesarik @ 2014-11-12 13:28 ` Christopher Covington 2014-11-12 14:36 ` Petr Tesarik 2014-11-12 14:40 ` Laszlo Ersek 2014-11-12 14:10 ` Laszlo Ersek 1 sibling, 2 replies; 29+ messages in thread From: Christopher Covington @ 2014-11-12 13:28 UTC (permalink / raw) To: Petr Tesarik Cc: Wen Congyang, Ekaterina Tumanova, kexec, qemu devel list, Qiao Nuohan, Dave Anderson, kumagai-atsushi, Laszlo Ersek, crash-utility On 11/12/2014 08:26 AM, Petr Tesarik wrote: > On Wed, 12 Nov 2014 08:18:04 -0500 > Christopher Covington <cov@codeaurora.org> wrote: > >> On 11/12/2014 03:05 AM, Petr Tesarik wrote: >>> On Tue, 11 Nov 2014 12:27:44 -0500 >>> Christopher Covington <cov@codeaurora.org> wrote: >>> >>>> On 11/11/2014 06:22 AM, Laszlo Ersek wrote: >>>>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please >>>>> keep me CC'd.) >>>>> >>>>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, >>>>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. >>>>> >>>>> The resultant vmcore is usually analyzed with the "crash" utility. >>>>> >>>>> The original tool producing such files is kdump. Unlike the procedure >>>>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd >>>>> kdump kernel), and has more information about the original guest kernel >>>>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state >>>>> is opaque. >>>>> >>>>> For this reason, the kdump preparation logic in QEMU hardcodes a number >>>>> of fields in the kdump header. The direct issue is the "phys_base" >>>>> field. Refer to dump.c, functions create_header32(), create_header64(), >>>>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text >>>>> "0"). 
>>>>> >>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD >>>>> >>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD >>>>> >>>>> This works in most cases, because the guest Linux kernel indeed tends to >>>>> be loaded at guest-phys address 0. However, when the guest Linux kernel >>>>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map), >>>>> then the guest Linux kernel is loaded at 16MB, thereby getting out of >>>>> sync with the phys_base=0 setting visible in the KDUMP header. >>>>> >>>>> This trips up the "crash" utility. >>>>> >>>>> Dave worked around the issue in "crash" for ELF format dumps -- "crash" >>>>> can identify QEMU as the originator of the vmcore by finding the QEMU >>>>> notes in the ELF vmcore. If those are present, then "crash" employs a >>>>> heuristic, probing for a phys_base up to 32MB, in 1MB steps. >>>> >>>> What advantages does KDUMP have over ELF? >>> >>> It's smaller (data is compressed), and it contains a header with some >>> useful information (e.g. the crashed kernel's version and release). >> >> What if the ELF dumper used SHF_COMPRESSED or could dump an ELF.xz? > > Not the same thing. With KDUMP, each page is compressed separately, so > if a utility like crash needs a page from the middle, it can find it > and unpack it immediately. If we had an ELF.xz, then the whole file > must be unpacked before it can be used. And unpacking a few terabytes > takes ... a while. ;-) Understood on the ELF.xz approach, but why couldn't each page (or maybe a configurable size) be a SHF_COMPRESSED section? Thanks, Chris -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. 
is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU 2014-11-12 13:28 ` Christopher Covington @ 2014-11-12 14:36 ` Petr Tesarik 2014-11-12 14:40 ` Laszlo Ersek 1 sibling, 0 replies; 29+ messages in thread From: Petr Tesarik @ 2014-11-12 14:36 UTC (permalink / raw) To: Christopher Covington Cc: Wen Congyang, Ekaterina Tumanova, kexec, qemu devel list, Qiao Nuohan, Dave Anderson, kumagai-atsushi, Laszlo Ersek, crash-utility V Wed, 12 Nov 2014 08:28:54 -0500 Christopher Covington <cov@codeaurora.org> napsáno: > On 11/12/2014 08:26 AM, Petr Tesarik wrote: > > On Wed, 12 Nov 2014 08:18:04 -0500 > > Christopher Covington <cov@codeaurora.org> wrote: > > > >> On 11/12/2014 03:05 AM, Petr Tesarik wrote: > >>> On Tue, 11 Nov 2014 12:27:44 -0500 > >>> Christopher Covington <cov@codeaurora.org> wrote: > >>> > >>>> On 11/11/2014 06:22 AM, Laszlo Ersek wrote: > >>>>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please > >>>>> keep me CC'd.) > >>>>> > >>>>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, > >>>>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. > >>>>> > >>>>> The resultant vmcore is usually analyzed with the "crash" utility. > >>>>> > >>>>> The original tool producing such files is kdump. Unlike the procedure > >>>>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd > >>>>> kdump kernel), and has more information about the original guest kernel > >>>>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state > >>>>> is opaque. > >>>>> > >>>>> For this reason, the kdump preparation logic in QEMU hardcodes a number > >>>>> of fields in the kdump header. The direct issue is the "phys_base" > >>>>> field. Refer to dump.c, functions create_header32(), create_header64(), > >>>>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text > >>>>> "0"). 
> >>>>> > >>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD > >>>>> > >>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD > >>>>> > >>>>> This works in most cases, because the guest Linux kernel indeed tends to > >>>>> be loaded at guest-phys address 0. However, when the guest Linux kernel > >>>>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map), > >>>>> then the guest Linux kernel is loaded at 16MB, thereby getting out of > >>>>> sync with the phys_base=0 setting visible in the KDUMP header. > >>>>> > >>>>> This trips up the "crash" utility. > >>>>> > >>>>> Dave worked around the issue in "crash" for ELF format dumps -- "crash" > >>>>> can identify QEMU as the originator of the vmcore by finding the QEMU > >>>>> notes in the ELF vmcore. If those are present, then "crash" employs a > >>>>> heuristic, probing for a phys_base up to 32MB, in 1MB steps. > >>>> > >>>> What advantages does KDUMP have over ELF? > >>> > >>> It's smaller (data is compressed), and it contains a header with some > >>> useful information (e.g. the crashed kernel's version and release). > >> > >> What if the ELF dumper used SHF_COMPRESSED or could dump an ELF.xz? > > > > Not the same thing. With KDUMP, each page is compressed separately, so > > if a utility like crash needs a page from the middle, it can find it > > and unpack it immediately. If we had an ELF.xz, then the whole file > > must be unpacked before it can be used. And unpacking a few terabytes > > takes ... a while. ;-) > > Understood on the ELF.xz approach, but why couldn't each page (or maybe a > configurable size) be a SHF_COMPRESSED section? A machine with 64TB of RAM (already manufactured by SGI) has 17,179,869,184 pages. When KDUMP (or, actually diskdump) format was invented, ELF files could have at most 2^16 = 65,536 program headers. 
Since then, the ELF specification has been extended (PN_XNUM), so the
number of program headers can be stored in the sh_info field of the first
ELF section header, but that only increases the number of possible
program headers to 2^32 = 4,294,967,296.

Yes, we could divide memory into larger chunks than pages, but:

  1. you're probably the first one to have the idea, and
  2. this is easy if you save the complete RAM content, but not quite
     that easy if some pages should be filtered out (makedumpfile).

There are a few other (minor) points, e.g.:

  * Each program header consumes 56 bytes in ELF64, while a single bit
    is sufficient in KDUMP compressed files to tell if the corresponding
    page is stored or not.

  * SHF_COMPRESSED currently supports only zlib compression, which is
    rather slow. KDUMP supports zlib, lzo and snappy.

  * Support for KDUMP files is already present in the crash utility,
    while I don't think there is any support for SHF_COMPRESSED
    segments.

In short, SHF_COMPRESSED looks like a viable alternative, but right now
KDUMP is the better choice in terms of features and interoperability.

Just my two cents,
Petr T
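The figures in the message above can be checked with a few lines of arithmetic. This is only a worked example using the numbers quoted there (4 KiB pages, 56 bytes per ELF64 program header, one bitmap bit per page); the function names are made up for the illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT       12   /* 4 KiB pages, as on x86_64 */
#define ELF64_PHDR_SIZE  56   /* bytes per ELF64 program header */

/* Number of pages in a machine with ram_bytes of memory. */
static uint64_t page_count(uint64_t ram_bytes)
{
    return ram_bytes >> PAGE_SHIFT;
}

/* Header overhead if every page had its own ELF64 program header. */
static uint64_t phdr_table_bytes(uint64_t pages)
{
    return pages * ELF64_PHDR_SIZE;
}

/* Overhead of a presence bitmap with one bit per page. */
static uint64_t bitmap_bytes(uint64_t pages)
{
    return (pages + 7) / 8;
}
```

For 64 TiB of RAM this gives 2^34 = 17,179,869,184 pages, which is four times the 2^32 limit mentioned above. One program header per page would cost 896 GiB of headers, against 2 GiB for a one-bit-per-page bitmap.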
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU 2014-11-12 13:28 ` Christopher Covington 2014-11-12 14:36 ` Petr Tesarik @ 2014-11-12 14:40 ` Laszlo Ersek 1 sibling, 0 replies; 29+ messages in thread From: Laszlo Ersek @ 2014-11-12 14:40 UTC (permalink / raw) To: Christopher Covington, Petr Tesarik Cc: Wen Congyang, Ekaterina Tumanova, kexec, qemu devel list, Qiao Nuohan, Dave Anderson, kumagai-atsushi, crash-utility On 11/12/14 14:28, Christopher Covington wrote: > On 11/12/2014 08:26 AM, Petr Tesarik wrote: >> On Wed, 12 Nov 2014 08:18:04 -0500 >> Christopher Covington <cov@codeaurora.org> wrote: >> >>> On 11/12/2014 03:05 AM, Petr Tesarik wrote: >>>> On Tue, 11 Nov 2014 12:27:44 -0500 >>>> Christopher Covington <cov@codeaurora.org> wrote: >>>> >>>>> On 11/11/2014 06:22 AM, Laszlo Ersek wrote: >>>>>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please >>>>>> keep me CC'd.) >>>>>> >>>>>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, >>>>>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. >>>>>> >>>>>> The resultant vmcore is usually analyzed with the "crash" utility. >>>>>> >>>>>> The original tool producing such files is kdump. Unlike the procedure >>>>>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd >>>>>> kdump kernel), and has more information about the original guest kernel >>>>>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state >>>>>> is opaque. >>>>>> >>>>>> For this reason, the kdump preparation logic in QEMU hardcodes a number >>>>>> of fields in the kdump header. The direct issue is the "phys_base" >>>>>> field. Refer to dump.c, functions create_header32(), create_header64(), >>>>>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text >>>>>> "0"). 
>>>>>> >>>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD >>>>>> >>>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD >>>>>> >>>>>> This works in most cases, because the guest Linux kernel indeed tends to >>>>>> be loaded at guest-phys address 0. However, when the guest Linux kernel >>>>>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map), >>>>>> then the guest Linux kernel is loaded at 16MB, thereby getting out of >>>>>> sync with the phys_base=0 setting visible in the KDUMP header. >>>>>> >>>>>> This trips up the "crash" utility. >>>>>> >>>>>> Dave worked around the issue in "crash" for ELF format dumps -- "crash" >>>>>> can identify QEMU as the originator of the vmcore by finding the QEMU >>>>>> notes in the ELF vmcore. If those are present, then "crash" employs a >>>>>> heuristic, probing for a phys_base up to 32MB, in 1MB steps. >>>>> >>>>> What advantages does KDUMP have over ELF? >>>> >>>> It's smaller (data is compressed), and it contains a header with some >>>> useful information (e.g. the crashed kernel's version and release). >>> >>> What if the ELF dumper used SHF_COMPRESSED or could dump an ELF.xz? >> >> Not the same thing. With KDUMP, each page is compressed separately, so >> if a utility like crash needs a page from the middle, it can find it >> and unpack it immediately. If we had an ELF.xz, then the whole file >> must be unpacked before it can be used. And unpacking a few terabytes >> takes ... a while. ;-) > > Understood on the ELF.xz approach, but why couldn't each page (or maybe a > configurable size) be a SHF_COMPRESSED section? Perhaps it could, technically -- it's just not how Qiao Nuohan implemented the feature. I didn't research the background for this. 
Thanks
Laszlo

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply [flat|nested] 29+ messages in thread
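Petr's point about per-page compression can be illustrated with a minimal sketch. It assumes a 4096-byte page and uses zlib with a simple in-memory index; the real KDUMP on-disk layout is different, and the names compress_pages and read_page are made up for this example only.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size for this sketch


def compress_pages(memory: bytes):
    """Compress every page on its own and remember where each one landed."""
    blob = bytearray()
    index = []  # one (offset, compressed_length) entry per page
    for start in range(0, len(memory), PAGE_SIZE):
        packed = zlib.compress(memory[start:start + PAGE_SIZE])
        index.append((len(blob), len(packed)))
        blob += packed
    return bytes(blob), index


def read_page(blob: bytes, index, page_number: int) -> bytes:
    """Unpack exactly one page; no other page has to be decompressed."""
    offset, length = index[page_number]
    return zlib.decompress(blob[offset:offset + length])


# Eight pages with distinct contents stand in for guest memory.
memory = b"".join(bytes([n]) * PAGE_SIZE for n in range(8))
blob, index = compress_pages(memory)
middle = read_page(blob, index, 5)
```

With a single compressed stream (the ELF.xz case), reaching page 5 would require decompressing everything in front of it; here the index gives the position directly.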
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU 2014-11-12 13:26 ` Petr Tesarik 2014-11-12 13:28 ` Christopher Covington @ 2014-11-12 14:10 ` Laszlo Ersek 2014-11-12 14:48 ` Christopher Covington 1 sibling, 1 reply; 29+ messages in thread From: Laszlo Ersek @ 2014-11-12 14:10 UTC (permalink / raw) To: Petr Tesarik, Christopher Covington Cc: Wen Congyang, Ekaterina Tumanova, kexec, qemu devel list, Qiao Nuohan, Dave Anderson, kumagai-atsushi, crash-utility On 11/12/14 14:26, Petr Tesarik wrote: > On Wed, 12 Nov 2014 08:18:04 -0500 > Christopher Covington <cov@codeaurora.org> wrote: > >> On 11/12/2014 03:05 AM, Petr Tesarik wrote: >>> On Tue, 11 Nov 2014 12:27:44 -0500 >>> Christopher Covington <cov@codeaurora.org> wrote: >>> >>>> On 11/11/2014 06:22 AM, Laszlo Ersek wrote: >>>>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please >>>>> keep me CC'd.) >>>>> >>>>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, >>>>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. >>>>> >>>>> The resultant vmcore is usually analyzed with the "crash" utility. >>>>> >>>>> The original tool producing such files is kdump. Unlike the procedure >>>>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd >>>>> kdump kernel), and has more information about the original guest kernel >>>>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state >>>>> is opaque. >>>>> >>>>> For this reason, the kdump preparation logic in QEMU hardcodes a number >>>>> of fields in the kdump header. The direct issue is the "phys_base" >>>>> field. Refer to dump.c, functions create_header32(), create_header64(), >>>>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text >>>>> "0"). 
>>>>> >>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD >>>>> >>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD >>>>> >>>>> This works in most cases, because the guest Linux kernel indeed tends to >>>>> be loaded at guest-phys address 0. However, when the guest Linux kernel >>>>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map), >>>>> then the guest Linux kernel is loaded at 16MB, thereby getting out of >>>>> sync with the phys_base=0 setting visible in the KDUMP header. >>>>> >>>>> This trips up the "crash" utility. >>>>> >>>>> Dave worked around the issue in "crash" for ELF format dumps -- "crash" >>>>> can identify QEMU as the originator of the vmcore by finding the QEMU >>>>> notes in the ELF vmcore. If those are present, then "crash" employs a >>>>> heuristic, probing for a phys_base up to 32MB, in 1MB steps. >>>> >>>> What advantages does KDUMP have over ELF? >>> >>> It's smaller (data is compressed), and it contains a header with some >>> useful information (e.g. the crashed kernel's version and release). Another advantage is that all zero-filled pages are represented in the kdump file by one shared zero page. The difference in speed of dumping is stunning. >> What if the ELF dumper used SHF_COMPRESSED or could dump an ELF.xz? > > Not the same thing. With KDUMP, each page is compressed separately, so > if a utility like crash needs a page from the middle, it can find it > and unpack it immediately. If we had an ELF.xz, then the whole file > must be unpacked before it can be used. And unpacking a few terabytes > takes ... a while. ;-) > >> How does QEMU figure out the kernel version information? > > Good question. Who can answer this part? I can. (Apologies for being a bit non-responsive, I'm swamped. I figured I'd let the discussion unfold a bit between the kdump experts.) 
So, QEMU doesn't figure out the kernel version information. It just
dumps the guest-physical frames, and that's it. Earlier I linked the
code that populates the kdump header. The links and function names are
still visible above.

Thanks
Laszlo

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply [flat|nested] 29+ messages in thread
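The zero-page sharing mentioned in this message could look roughly like the sketch below. This is only an illustration of the idea (every all-zero page maps to one stored copy); it is not the layout QEMU or makedumpfile actually writes, and layout_pages is a hypothetical helper name.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch
ZERO_PAGE = bytes(PAGE_SIZE)


def layout_pages(pages):
    """Return (stored, slots): stored holds each page written to the file,
    slots maps every input page to its position in stored. All zero-filled
    pages share a single stored copy."""
    stored = []
    slots = []
    zero_slot = None
    for page in pages:
        if page == ZERO_PAGE:
            if zero_slot is None:
                zero_slot = len(stored)
                stored.append(page)
            slots.append(zero_slot)
        else:
            slots.append(len(stored))
            stored.append(page)
    return stored, slots


pages = [ZERO_PAGE, b"\x01" * PAGE_SIZE, ZERO_PAGE, ZERO_PAGE, b"\x02" * PAGE_SIZE]
stored, slots = layout_pages(pages)
rebuilt = [stored[slot] for slot in slots]
```

Five input pages end up as three stored pages, and the original sequence can still be rebuilt from the slot table.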
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU 2014-11-12 14:10 ` Laszlo Ersek @ 2014-11-12 14:48 ` Christopher Covington 2014-11-12 15:03 ` Laszlo Ersek 0 siblings, 1 reply; 29+ messages in thread From: Christopher Covington @ 2014-11-12 14:48 UTC (permalink / raw) To: Laszlo Ersek, Petr Tesarik Cc: Wen Congyang, Ekaterina Tumanova, kexec, qemu devel list, Qiao Nuohan, Dave Anderson, kumagai-atsushi, crash-utility Thanks Petr and Laszlo for entertaining my questions. I've got one last one if you have the time. On 11/12/2014 09:10 AM, Laszlo Ersek wrote: > On 11/12/14 14:26, Petr Tesarik wrote: >> On Wed, 12 Nov 2014 08:18:04 -0500 >> Christopher Covington <cov@codeaurora.org> wrote: >> >>> On 11/12/2014 03:05 AM, Petr Tesarik wrote: >>>> On Tue, 11 Nov 2014 12:27:44 -0500 >>>> Christopher Covington <cov@codeaurora.org> wrote: >>>> >>>>> On 11/11/2014 06:22 AM, Laszlo Ersek wrote: >>>>>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please >>>>>> keep me CC'd.) >>>>>> >>>>>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, >>>>>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. >>>>>> >>>>>> The resultant vmcore is usually analyzed with the "crash" utility. >>>>>> >>>>>> The original tool producing such files is kdump. Unlike the procedure >>>>>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd >>>>>> kdump kernel), and has more information about the original guest kernel >>>>>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state >>>>>> is opaque. >>>>>> >>>>>> For this reason, the kdump preparation logic in QEMU hardcodes a number >>>>>> of fields in the kdump header. The direct issue is the "phys_base" >>>>>> field. Refer to dump.c, functions create_header32(), create_header64(), >>>>>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text >>>>>> "0"). 
>>>>>> >>>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD >>>>>> >>>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD >>>>>> >>>>>> This works in most cases, because the guest Linux kernel indeed tends to >>>>>> be loaded at guest-phys address 0. However, when the guest Linux kernel >>>>>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map), >>>>>> then the guest Linux kernel is loaded at 16MB, thereby getting out of >>>>>> sync with the phys_base=0 setting visible in the KDUMP header. >>>>>> >>>>>> This trips up the "crash" utility. >>>>>> >>>>>> Dave worked around the issue in "crash" for ELF format dumps -- "crash" >>>>>> can identify QEMU as the originator of the vmcore by finding the QEMU >>>>>> notes in the ELF vmcore. If those are present, then "crash" employs a >>>>>> heuristic, probing for a phys_base up to 32MB, in 1MB steps. >>>>> >>>>> What advantages does KDUMP have over ELF? >>>> >>>> It's smaller (data is compressed), and it contains a header with some >>>> useful information (e.g. the crashed kernel's version and release). > > Another advantage is that all zero-filled pages are represented in the > kdump file by one shared zero page. > > The difference in speed of dumping is stunning. Would you expect using SHT_NOBITS to give a similar speedup to the ELF dumper? Thanks, Chris -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU 2014-11-12 14:48 ` Christopher Covington @ 2014-11-12 15:03 ` Laszlo Ersek 2014-11-12 15:43 ` Christopher Covington 0 siblings, 1 reply; 29+ messages in thread From: Laszlo Ersek @ 2014-11-12 15:03 UTC (permalink / raw) To: Christopher Covington, Petr Tesarik Cc: Wen Congyang, Ekaterina Tumanova, kexec, qemu devel list, Qiao Nuohan, Dave Anderson, kumagai-atsushi, crash-utility On 11/12/14 15:48, Christopher Covington wrote: > Thanks Petr and Laszlo for entertaining my questions. I've got one last one if > you have the time. > > On 11/12/2014 09:10 AM, Laszlo Ersek wrote: >> On 11/12/14 14:26, Petr Tesarik wrote: >>> On Wed, 12 Nov 2014 08:18:04 -0500 >>> Christopher Covington <cov@codeaurora.org> wrote: >>> >>>> On 11/12/2014 03:05 AM, Petr Tesarik wrote: >>>>> On Tue, 11 Nov 2014 12:27:44 -0500 >>>>> Christopher Covington <cov@codeaurora.org> wrote: >>>>> >>>>>> On 11/11/2014 06:22 AM, Laszlo Ersek wrote: >>>>>>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please >>>>>>> keep me CC'd.) >>>>>>> >>>>>>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, >>>>>>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. >>>>>>> >>>>>>> The resultant vmcore is usually analyzed with the "crash" utility. >>>>>>> >>>>>>> The original tool producing such files is kdump. Unlike the procedure >>>>>>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd >>>>>>> kdump kernel), and has more information about the original guest kernel >>>>>>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state >>>>>>> is opaque. >>>>>>> >>>>>>> For this reason, the kdump preparation logic in QEMU hardcodes a number >>>>>>> of fields in the kdump header. The direct issue is the "phys_base" >>>>>>> field. 
Refer to dump.c, functions create_header32(), create_header64(), >>>>>>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text >>>>>>> "0"). >>>>>>> >>>>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD >>>>>>> >>>>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD >>>>>>> >>>>>>> This works in most cases, because the guest Linux kernel indeed tends to >>>>>>> be loaded at guest-phys address 0. However, when the guest Linux kernel >>>>>>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map), >>>>>>> then the guest Linux kernel is loaded at 16MB, thereby getting out of >>>>>>> sync with the phys_base=0 setting visible in the KDUMP header. >>>>>>> >>>>>>> This trips up the "crash" utility. >>>>>>> >>>>>>> Dave worked around the issue in "crash" for ELF format dumps -- "crash" >>>>>>> can identify QEMU as the originator of the vmcore by finding the QEMU >>>>>>> notes in the ELF vmcore. If those are present, then "crash" employs a >>>>>>> heuristic, probing for a phys_base up to 32MB, in 1MB steps. >>>>>> >>>>>> What advantages does KDUMP have over ELF? >>>>> >>>>> It's smaller (data is compressed), and it contains a header with some >>>>> useful information (e.g. the crashed kernel's version and release). >> >> Another advantage is that all zero-filled pages are represented in the >> kdump file by one shared zero page. >> >> The difference in speed of dumping is stunning. > > Would you expect using SHT_NOBITS to give a similar speedup to the ELF dumper? Sorry, I don't know what SHT_NOBITS is. Laszlo _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU 2014-11-12 15:03 ` Laszlo Ersek @ 2014-11-12 15:43 ` Christopher Covington 2014-11-12 21:10 ` Petr Tesarik 0 siblings, 1 reply; 29+ messages in thread From: Christopher Covington @ 2014-11-12 15:43 UTC (permalink / raw) To: Laszlo Ersek Cc: Wen Congyang, Petr Tesarik, Ekaterina Tumanova, kexec, qemu devel list, Qiao Nuohan, Dave Anderson, kumagai-atsushi, crash-utility On 11/12/2014 10:03 AM, Laszlo Ersek wrote: > On 11/12/14 15:48, Christopher Covington wrote: >> Thanks Petr and Laszlo for entertaining my questions. I've got one last one if >> you have the time. >> >> On 11/12/2014 09:10 AM, Laszlo Ersek wrote: >>> On 11/12/14 14:26, Petr Tesarik wrote: >>>> On Wed, 12 Nov 2014 08:18:04 -0500 >>>> Christopher Covington <cov@codeaurora.org> wrote: >>>> >>>>> On 11/12/2014 03:05 AM, Petr Tesarik wrote: >>>>>> On Tue, 11 Nov 2014 12:27:44 -0500 >>>>>> Christopher Covington <cov@codeaurora.org> wrote: >>>>>> >>>>>>> On 11/11/2014 06:22 AM, Laszlo Ersek wrote: >>>>>>>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please >>>>>>>> keep me CC'd.) >>>>>>>> >>>>>>>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, >>>>>>>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. >>>>>>>> >>>>>>>> The resultant vmcore is usually analyzed with the "crash" utility. >>>>>>>> >>>>>>>> The original tool producing such files is kdump. Unlike the procedure >>>>>>>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd >>>>>>>> kdump kernel), and has more information about the original guest kernel >>>>>>>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state >>>>>>>> is opaque. >>>>>>>> >>>>>>>> For this reason, the kdump preparation logic in QEMU hardcodes a number >>>>>>>> of fields in the kdump header. The direct issue is the "phys_base" >>>>>>>> field. 
Refer to dump.c, functions create_header32(), create_header64(), >>>>>>>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text >>>>>>>> "0"). >>>>>>>> >>>>>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD >>>>>>>> >>>>>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD >>>>>>>> >>>>>>>> This works in most cases, because the guest Linux kernel indeed tends to >>>>>>>> be loaded at guest-phys address 0. However, when the guest Linux kernel >>>>>>>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map), >>>>>>>> then the guest Linux kernel is loaded at 16MB, thereby getting out of >>>>>>>> sync with the phys_base=0 setting visible in the KDUMP header. >>>>>>>> >>>>>>>> This trips up the "crash" utility. >>>>>>>> >>>>>>>> Dave worked around the issue in "crash" for ELF format dumps -- "crash" >>>>>>>> can identify QEMU as the originator of the vmcore by finding the QEMU >>>>>>>> notes in the ELF vmcore. If those are present, then "crash" employs a >>>>>>>> heuristic, probing for a phys_base up to 32MB, in 1MB steps. >>>>>>> >>>>>>> What advantages does KDUMP have over ELF? >>>>>> >>>>>> It's smaller (data is compressed), and it contains a header with some >>>>>> useful information (e.g. the crashed kernel's version and release). >>> >>> Another advantage is that all zero-filled pages are represented in the >>> kdump file by one shared zero page. >>> >>> The difference in speed of dumping is stunning. >> >> Would you expect using SHT_NOBITS to give a similar speedup to the ELF dumper? > > Sorry, I don't know what SHT_NOBITS is. My newbie understanding is that SHT_NOBITS is the section type of the .bss section in an everyday executable. It's what makes the section only need a header table entry in the ELF file, and not sh_size (Size) worth of zeros. 
The ELF loader will essentially zero memory beginning at sh_addr (Address)
for sh_size (Size).

http://stackoverflow.com/questions/22855320/address-space-of-a-bss-section-space-in-elf-file
http://people.redhat.com/mpolacek/src/devconf2012.pdf

Chris

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU 2014-11-12 15:43 ` Christopher Covington @ 2014-11-12 21:10 ` Petr Tesarik 0 siblings, 0 replies; 29+ messages in thread From: Petr Tesarik @ 2014-11-12 21:10 UTC (permalink / raw) To: Christopher Covington Cc: Wen Congyang, Ekaterina Tumanova, kexec, qemu devel list, Qiao Nuohan, Dave Anderson, kumagai-atsushi, Laszlo Ersek, crash-utility On Wed, 12 Nov 2014 10:43:59 -0500 Christopher Covington <cov@codeaurora.org> wrote: > On 11/12/2014 10:03 AM, Laszlo Ersek wrote: > > On 11/12/14 15:48, Christopher Covington wrote: > >> Thanks Petr and Laszlo for entertaining my questions. I've got one last one if > >> you have the time. > >> > >> On 11/12/2014 09:10 AM, Laszlo Ersek wrote: > >>> On 11/12/14 14:26, Petr Tesarik wrote: > >>>> On Wed, 12 Nov 2014 08:18:04 -0500 > >>>> Christopher Covington <cov@codeaurora.org> wrote: > >>>> > >>>>> On 11/12/2014 03:05 AM, Petr Tesarik wrote: > >>>>>> On Tue, 11 Nov 2014 12:27:44 -0500 > >>>>>> Christopher Covington <cov@codeaurora.org> wrote: > >>>>>> > >>>>>>> On 11/11/2014 06:22 AM, Laszlo Ersek wrote: > >>>>>>>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please > >>>>>>>> keep me CC'd.) > >>>>>>>> > >>>>>>>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, > >>>>>>>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. > >>>>>>>> > >>>>>>>> The resultant vmcore is usually analyzed with the "crash" utility. > >>>>>>>> > >>>>>>>> The original tool producing such files is kdump. Unlike the procedure > >>>>>>>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd > >>>>>>>> kdump kernel), and has more information about the original guest kernel > >>>>>>>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state > >>>>>>>> is opaque. > >>>>>>>> > >>>>>>>> For this reason, the kdump preparation logic in QEMU hardcodes a number > >>>>>>>> of fields in the kdump header. 
The direct issue is the "phys_base" > >>>>>>>> field. Refer to dump.c, functions create_header32(), create_header64(), > >>>>>>>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text > >>>>>>>> "0"). > >>>>>>>> > >>>>>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD > >>>>>>>> > >>>>>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD > >>>>>>>> > >>>>>>>> This works in most cases, because the guest Linux kernel indeed tends to > >>>>>>>> be loaded at guest-phys address 0. However, when the guest Linux kernel > >>>>>>>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map), > >>>>>>>> then the guest Linux kernel is loaded at 16MB, thereby getting out of > >>>>>>>> sync with the phys_base=0 setting visible in the KDUMP header. > >>>>>>>> > >>>>>>>> This trips up the "crash" utility. > >>>>>>>> > >>>>>>>> Dave worked around the issue in "crash" for ELF format dumps -- "crash" > >>>>>>>> can identify QEMU as the originator of the vmcore by finding the QEMU > >>>>>>>> notes in the ELF vmcore. If those are present, then "crash" employs a > >>>>>>>> heuristic, probing for a phys_base up to 32MB, in 1MB steps. > >>>>>>> > >>>>>>> What advantages does KDUMP have over ELF? > >>>>>> > >>>>>> It's smaller (data is compressed), and it contains a header with some > >>>>>> useful information (e.g. the crashed kernel's version and release). > >>> > >>> Another advantage is that all zero-filled pages are represented in the > >>> kdump file by one shared zero page. > >>> > >>> The difference in speed of dumping is stunning. > >> > >> Would you expect using SHT_NOBITS to give a similar speedup to the ELF dumper? > > > > Sorry, I don't know what SHT_NOBITS is. > > My newbie understanding is that SHT_NOBITS is the section type of the .bss > section in an everyday executable. Heh, yes and no. Let's clarify a few things. 
First, a Linux kernel dump (or a QEMU ELF dump) does not contain any
sections. It only contains program headers. The reason is that program
headers can specify both the virtual address and the physical address.
BTW this feature is used to determine the physical base of the Linux
kernel when dumping via kexec. Sections can only specify the virtual
address.

Of course, program headers can achieve an effect similar to SHT_NOBITS:
by specifying a larger memory size than file size.

Now, all this does not mean you can't create a new standard that uses
sections instead (in fact, Xen DomU dumps already use ELF sections). But
of course, this new standard won't be understood by the existing tools
until somebody (you?) adds support for it. Same thing is true for the
SHF_COMPRESSED section flag.

Anyway, reuse of the zero page is a minor point. The main difference in
speed comes from using a faster compression algorithm (LZO or snappy).
Of course, nothing prevents you from using those algorithms in ELF
compressed sections, but you'd have to extend the ELF standard first,
adding magic numbers for these algorithms. To me it sounds like a long
way to go.

In short, there's no technical reason why ELF couldn't achieve results
similar to KDUMP. It's "merely" not implemented, and for sure, I'm not
going to push all the necessary changes. ;-)

Petr T

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply [flat|nested] 29+ messages in thread
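The program-header trick Petr describes (memory size larger than file size) can be sketched as follows. The field order is the standard Elf64_Phdr layout; little-endian byte order is assumed (as on x86_64), and the segment values are invented for illustration, with the physical address set to 16MB only to echo the OVMF case from the thread.

```python
import struct

# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
#             p_filesz, p_memsz, p_align (little-endian assumed)
ELF64_PHDR = struct.Struct("<IIQQQQQQ")
PT_LOAD = 1


def zero_fill_bytes(raw_phdr: bytes) -> int:
    """Bytes a loader must zero-fill: the part of the segment that exists
    in memory (p_memsz) but has no backing data in the file (p_filesz)."""
    (p_type, _flags, _offset, _vaddr, _paddr,
     p_filesz, p_memsz, _align) = ELF64_PHDR.unpack(raw_phdr)
    if p_type != PT_LOAD:
        return 0
    return p_memsz - p_filesz


# Hypothetical segment: 0x2000 bytes stored in the file, 0x6000 in memory.
raw = ELF64_PHDR.pack(PT_LOAD, 0, 0x1000,
                      0xFFFFFFFF81000000,  # p_vaddr (made up)
                      0x1000000,           # p_paddr = 16MB (made up)
                      0x2000, 0x6000, 0x1000)
tail = zero_fill_bytes(raw)
```

Note that the same header carries both p_vaddr and p_paddr, which is the property Petr points to when explaining why dumps use program headers rather than sections.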
* Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU 2014-11-11 17:27 ` [Qemu-devel] " Christopher Covington 2014-11-12 8:05 ` Petr Tesarik @ 2014-11-12 14:37 ` Laszlo Ersek 1 sibling, 0 replies; 29+ messages in thread From: Laszlo Ersek @ 2014-11-12 14:37 UTC (permalink / raw) To: Christopher Covington Cc: Wen Congyang, Ekaterina Tumanova, kexec, qemu devel list, Qiao Nuohan, Dave Anderson, kumagai-atsushi, crash-utility On 11/11/14 18:27, Christopher Covington wrote: > On 11/11/2014 06:22 AM, Laszlo Ersek wrote: >> (Note: I'm not subscribed to either qemu-devel or the kexec list; please >> keep me CC'd.) >> >> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, >> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. >> >> The resultant vmcore is usually analyzed with the "crash" utility. >> >> The original tool producing such files is kdump. Unlike the procedure >> performed by QEMU, kdump runs from *within* the guest (under a kexec'd >> kdump kernel), and has more information about the original guest kernel >> state (which is being dumped) than QEMU. To QEMU, the guest kernel state >> is opaque. >> >> For this reason, the kdump preparation logic in QEMU hardcodes a number >> of fields in the kdump header. The direct issue is the "phys_base" >> field. Refer to dump.c, functions create_header32(), create_header64(), >> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text >> "0"). >> >> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD >> >> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD >> >> This works in most cases, because the guest Linux kernel indeed tends to >> be loaded at guest-phys address 0. 
However, when the guest Linux kernel
>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
>> then the guest Linux kernel is loaded at 16MB, thereby getting out of
>> sync with the phys_base=0 setting visible in the KDUMP header.
>>
>> This trips up the "crash" utility.
>>
>> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
>> can identify QEMU as the originator of the vmcore by finding the QEMU
>> notes in the ELF vmcore. If those are present, then "crash" employs a
>> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
>
> What advantages does KDUMP have over ELF?

This has been discussed, but I'd like to give a short perspective from
personal experience.

The more obvious advantage is the smaller size, due to (a) per-page
compression (which preserves random-access for "crash"), and (b) zero
page sharing. A smaller dump file is easier to store, and easier to
upload if you're requesting assistance with debugging.

The perhaps less obvious advantage is the speed at which qemu writes the
dump. We're talking orders of magnitude, especially on rotational media.
This is because lzo and snappy are *incredibly* fast (put differently:
they incur very little CPU penalty for the same guest RAM size). The CPU
penalty is actually so small that in almost all cases the dumping
procedure stays IO-bound (in my experience: even on an SSD!). Now
combine that with a potential reduction of 4GB -> 256MB in size: that's
a sixteen-fold speedup.

(I'm allowed to praise this qemu feature, I didn't write it. :))

Thanks
Laszlo

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply [flat|nested] 29+ messages in thread
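The sixteen-fold figure follows directly from the sizes quoted above if the dump stays IO-bound, so that write time scales with the number of bytes written. A small sketch of that arithmetic (binary units assumed; the sizes are the ones from the message, not measurements):

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

raw_size = 4 * GIB           # uncompressed guest RAM from the example
compressed_size = 256 * MIB  # resulting dump size from the example

# With an IO-bound writer, time is proportional to bytes written,
# so the speedup equals the size ratio regardless of disk throughput.
speedup = raw_size / compressed_size
```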
[parent not found: <mailman.20827.1415774425.22890.kexec@lists.infradead.org>]
* Re: uniquely identifying KDUMP files that originate from QEMU [not found] <mailman.20827.1415774425.22890.kexec@lists.infradead.org> @ 2014-11-12 14:09 ` Dave Anderson 2014-11-12 15:01 ` Laszlo Ersek 2014-11-13 1:08 ` HATAYAMA Daisuke 0 siblings, 2 replies; 29+ messages in thread From: Dave Anderson @ 2014-11-12 14:09 UTC (permalink / raw) To: kexec Cc: Laszlo Ersek, HATAYAMA Daisuke, Petr Tesarik, Discussion list for crash utility usage, maintenance and development ----- Original Message ----- > From: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> > To: ptesarik@suse.cz > Cc: lersek@redhat.com, kexec@lists.infradead.org > Subject: Re: uniquely identifying KDUMP files that originate from QEMU > Message-ID: > <20141112.120838.303682123986142686.d.hatayama@jp.fujitsu.com> > Content-Type: Text/Plain; charset=us-ascii > > From: Petr Tesarik <ptesarik@suse.cz> > Subject: Re: uniquely identifying KDUMP files that originate from QEMU > Date: Tue, 11 Nov 2014 13:09:13 +0100 > > > On Tue, 11 Nov 2014 12:22:52 +0100 > > Laszlo Ersek <lersek@redhat.com> wrote: > > > >> (Note: I'm not subscribed to either qemu-devel or the kexec list; please > >> keep me CC'd.) > >> > >> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib, > >> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command. > >> > >> The resultant vmcore is usually analyzed with the "crash" utility. > >> > >> The original tool producing such files is kdump. Unlike the procedure > >> performed by QEMU, kdump runs from *within* the guest (under a kexec'd > >> kdump kernel), and has more information about the original guest kernel > >> state (which is being dumped) than QEMU. To QEMU, the guest kernel state > >> is opaque. > >> > >> For this reason, the kdump preparation logic in QEMU hardcodes a number > >> of fields in the kdump header. The direct issue is the "phys_base" > >> field. 
Refer to dump.c, functions create_header32(), create_header64(), > >> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text > >> "0"). > >> > >> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD > >> > >> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD > >> > >> This works in most cases, because the guest Linux kernel indeed tends to > >> be loaded at guest-phys address 0. However, when the guest Linux kernel > >> is booted on top of OVMF (which has a somewhat unusual UEFI memory map), > >> then the guest Linux kernel is loaded at 16MB, thereby getting out of > >> sync with the phys_base=0 setting visible in the KDUMP header. > >> > >> This trips up the "crash" utility. > >> > >> Dave worked around the issue in "crash" for ELF format dumps -- "crash" > >> can identify QEMU as the originator of the vmcore by finding the QEMU > >> notes in the ELF vmcore. If those are present, then "crash" employs a > >> heuristic, probing for a phys_base up to 32MB, in 1MB steps. > >> > >> Alas, the QEMU notes are not present in the KDUMP-format vmcores that > >> QEMU produces (they cannot be), > > > > Why? Since KDUMP format version 4, the complete ELF notes can be stored > > in the file (see offset_note, size_note fields in the sub-header). > > > > Yes, the QEMU notes is present in kdump-compressed format. But > phys_base cannot be calculated only from qemu-side. We cannot do more > than the efforts crash utility does for workaround. So, the phys_base > value in kdump-sub header is now designed to have 0 now. > > Anyway, phys_base is kernel information. To make it available for qemu > side, there's need to prepare a mechanism for qemu to have any access > to it. > > One ad-hoc but simple way is to put phys_base value as part of > VMCOREINFO note information on kernel. 
>
> Although there has already been a similar one in VMCOREINFO, like
>
> arch/x86/kernel/
> ==
> void arch_crash_save_vmcoreinfo(void)
> {
>         VMCOREINFO_SYMBOL(phys_base); <---- This
>         VMCOREINFO_SYMBOL(init_level4_pgt);
>
> ...
> ==
>
> this is meaningless, because this value is a virtual address assigned to
> phys_base symbol. To refer to the value of phys_base itself, we need
> the phys_base value we are about to get now.
>
> So, instead, if we change this to save the value, not value of symbol
> phys_base, we can get phys_base from the VMCOREINFO.
>
> The VMCOREINFO consists simply of string. So it's easy to search
> vmcore for it e.g. using strings and grep like this:
>
> $ strings vmcore-3.10.0-121.el7.x86_64 | grep -E ".*VMCOREINFO.*" -A 100
> VMCOREINFO
> OSRELEASE=3.10.0-121.el7.x86_64
> PAGESIZE=4096
> ...
> SYMBOL(phys_base)=ffffffff818e5010 <-- though this is address of phys_base
> now...
> SYMBOL(init_level4_pgt)=ffffffff818de000
> SYMBOL(node_data)=ffffffff819f1cc0
> LENGTH(node_data)=1024
> CRASHTIME=1399460394
> ...
>
> This should also be useful to get phys_base of 2nd kernel, which is
> inherently relocated kernel from a vmcore generated using qemu dump.
>
> This is far from well-designed from qemu's point of view, but it would
> be manually easier to get phys_base than now.
>
> Obviously, the VMCOREINFO is available only if CONFIG_KEXEC is
> enabled. Other users cannot use this.
>
> --
> Thanks.
> HATAYAMA, Daisuke

I agree that the actual value of phys_base should be included in the
vmcoreinfo. However, it won't help in this case because the vmcoreinfo
data is not copied into the compressed dumpfile header. The
offset_vmcoreinfo and size_vmcoreinfo fields are zero.
Here's an example header dump of a QEMU-generated dumpfile:

crash> help -n
makedumpfile header:
  signature: "makedumpfile"
  type: 1
  version: 1
  all_flat_data:
  num_array: 18695
  array: 7f484b760010
  file_size: 0

diskdump_data:
  filename: vmcore.ovmf.rhel7.kdump-snappy
  flags: c6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED|LZO_SUPPORTED|SNAPPY_SUPPORTED) [FLAT]
  dfd: 3
  ofp: 3e441b1260
  machine_type: 62 (EM_X86_64)

  header: 1a68fe0
  signature: "KDUMP "
  header_version: 6
  utsname:
    sysname:
    nodename:
    release:
    version:
    machine: x86_64
    domainname:
  timestamp:
    tv_sec: 0
    tv_usec: 0
  status: 4 (DUMP_DH_COMPRESSED_SNAPPY)
  block_size: 4096
  sub_hdr_size: 1
  bitmap_blocks: 76
  max_mapnr: 1245184
  total_ram_blocks: 0
  device_blocks: 0
  written_blocks: 0
  current_cpu: 0
  nr_cpus: 4
  tasks[nr_cpus]: 0 0 0 0

  sub_header: 0 (n/a)

  sub_header_kdump: 1a69ff0
  phys_base: 0
  dump_level: 1 (0x1) (DUMP_EXCLUDE_ZERO)
  split: 0
  start_pfn: (unused)
  end_pfn: (unused)
  offset_vmcoreinfo: 0 (0x0)
  size_vmcoreinfo: 0 (0x0)
  offset_note: 4200 (0x1068)
  size_note: 3232 (0xca0)
  num_prstatus_notes: 4
  notes_buf: 1a6b000
  notes[0]: 1a6b000
  notes[1]: 1a6b164
  notes[2]: 1a6b2c8
  notes[3]: 1a6b42c
  NT_PRSTATUS_offset: 1068 11cc 1330 1494
  offset_eraseinfo: 0 (0x0)
  size_eraseinfo: 0 (0x0)
  start_pfn_64: (unused)
  end_pfn_64: (unused)
  max_mapnr_64: 1245184 (0x130000)

  data_offset: 4e000
  block_size: 4096
  block_shift: 12
  bitmap: 7f484b713010
  bitmap_len: 311296
  max_mapnr: 1245184 (0x130000)
  dumpable_bitmap: 7f484b6c6010
  byte: 0
  bit: 0
  compressed_page: 1a8c660
  curbufptr: 1a7f650
  ...

Note that QEMU does add self-generated register dumps above, but the
special "QEMU" note that is added to ELF kdumps is not included. Also
note that the kernel version information is also left zero-filled.

In any case, if either a QEMU note or a diskdump.data flag were added,
I would be more than happy.

Dave

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

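The VMCOREINFO text quoted above is a plain list of KEY=VALUE lines, so telling the address of phys_base apart from its value is simple once the text has been extracted. What follows is a minimal Python sketch, assuming the text was already pulled out of the vmcore (for example with strings, as in the quoted command). The helper names parse_vmcoreinfo and symbol_address are hypothetical and are not part of crash, makedumpfile or QEMU.

```python
def parse_vmcoreinfo(text):
    """Split VMCOREINFO text into a dict of KEY -> VALUE strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and anything that is not KEY=VALUE
        # (e.g. the "VMCOREINFO" note name printed by strings).
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value
    return info


def symbol_address(info, name):
    """Return the virtual address recorded as SYMBOL(name), or None."""
    value = info.get("SYMBOL(%s)" % name)
    return int(value, 16) if value is not None else None


# Sample lines copied from the strings output quoted in this thread.
SAMPLE = """\
VMCOREINFO
OSRELEASE=3.10.0-121.el7.x86_64
PAGESIZE=4096
SYMBOL(phys_base)=ffffffff818e5010
SYMBOL(init_level4_pgt)=ffffffff818de000
LENGTH(node_data)=1024
CRASHTIME=1399460394
"""

info = parse_vmcoreinfo(SAMPLE)
# This is where the phys_base variable lives, not the value it holds.
phys_base_symbol = symbol_address(info, "phys_base")
```

The sketch only illustrates the point made in the quoted text: with the current SYMBOL(phys_base) entry, a reader gets a kernel virtual address and still has to translate it before the variable's contents can be read.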
* Re: uniquely identifying KDUMP files that originate from QEMU
2014-11-12 14:09 ` Dave Anderson
@ 2014-11-12 15:01 ` Laszlo Ersek
2014-11-12 15:45 ` Dave Anderson
2014-11-13 1:08 ` HATAYAMA Daisuke
1 sibling, 1 reply; 29+ messages in thread
From: Laszlo Ersek @ 2014-11-12 15:01 UTC (permalink / raw)
To: Dave Anderson, kexec
Cc: HATAYAMA Daisuke, Petr Tesarik,
Discussion list for crash utility usage, maintenance and development

On 11/12/14 15:09, Dave Anderson wrote:
>
>
> ----- Original Message -----
>> From: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
>> To: ptesarik@suse.cz
>> Cc: lersek@redhat.com, kexec@lists.infradead.org
>> Subject: Re: uniquely identifying KDUMP files that originate from QEMU
>> Message-ID:
>> <20141112.120838.303682123986142686.d.hatayama@jp.fujitsu.com>
>> Content-Type: Text/Plain; charset=us-ascii
>>
>> From: Petr Tesarik <ptesarik@suse.cz>
>> Subject: Re: uniquely identifying KDUMP files that originate from QEMU
>> Date: Tue, 11 Nov 2014 13:09:13 +0100
>>
>>> On Tue, 11 Nov 2014 12:22:52 +0100
>>> Laszlo Ersek <lersek@redhat.com> wrote:
>>>
>>>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please
>>>> keep me CC'd.)
>>>>
>>>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
>>>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
>>>>
>>>> The resultant vmcore is usually analyzed with the "crash" utility.
>>>>
>>>> The original tool producing such files is kdump. Unlike the procedure
>>>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd
>>>> kdump kernel), and has more information about the original guest kernel
>>>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state
>>>> is opaque.
>>>>
>>>> For this reason, the kdump preparation logic in QEMU hardcodes a number
>>>> of fields in the kdump header. The direct issue is the "phys_base"
>>>> field. Refer to dump.c, functions create_header32(), create_header64(),
>>>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
>>>> "0").
>>>>
>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
>>>>
>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
>>>>
>>>> This works in most cases, because the guest Linux kernel indeed tends to
>>>> be loaded at guest-phys address 0. However, when the guest Linux kernel
>>>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
>>>> then the guest Linux kernel is loaded at 16MB, thereby getting out of
>>>> sync with the phys_base=0 setting visible in the KDUMP header.
>>>>
>>>> This trips up the "crash" utility.
>>>>
>>>> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
>>>> can identify QEMU as the originator of the vmcore by finding the QEMU
>>>> notes in the ELF vmcore. If those are present, then "crash" employs a
>>>> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
>>>>
>>>> Alas, the QEMU notes are not present in the KDUMP-format vmcores that
>>>> QEMU produces (they cannot be),
>>>
>>> Why? Since KDUMP format version 4, the complete ELF notes can be stored
>>> in the file (see offset_note, size_note fields in the sub-header).
>>>
>>
>> Yes, the QEMU notes is present in kdump-compressed format. But
>> phys_base cannot be calculated only from qemu-side. We cannot do more
>> than the efforts crash utility does for workaround. So, the phys_base
>> value in kdump-sub header is now designed to have 0 now.
>>
>> Anyway, phys_base is kernel information. To make it available for qemu
>> side, there's need to prepare a mechanism for qemu to have any access
>> to it.
>>
>> One ad-hoc but simple way is to put phys_base value as part of
>> VMCOREINFO note information on kernel.
>> [...]
> [...]
>
> In any case, if either a QEMU note or a diskdump.data flag were added, I would
> be more than happy.

Looks like a new flag needs to be negotiated with many stake-holders,
but a QEMU note could be included even in the kdump format (not only the
ELF format) freely, and tools that don't recognize it would simply
ignore it. (And other tools that generate custom notes probably won't
clash with it.)

Is that correct? Because if it is, then (a) I didn't know it, (b) we
only need an agreement between "crash" and qemu.

Is the kdump format specified somewhere (as in, a PDF or text file)? I'd
like to look into this option if possible.

Also, is there a command line tool that dumps metadata from a kdump
file? (Quite like your "crash" invocation above, but I believe crash
won't even start without a matching symbol file.)

Thank you
Laszlo

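Short of a full metadata dumper, the container type of a dump can at least be told apart from the first bytes of the file. The following is a minimal Python sketch under stated assumptions: the signatures are the ones visible in the header dumps in this thread ("makedumpfile" for the flattened wrapper, "KDUMP" for the compressed kdump header) plus the standard ELF magic, and each is assumed to sit at offset 0. The function name classify_dump is hypothetical and does not correspond to any existing tool.

```python
import io

ELF_MAGIC = b"\x7fELF"            # standard ELF identification bytes
FLAT_SIGNATURE = b"makedumpfile"  # flattened-format wrapper, as shown by crash
KDUMP_SIGNATURE = b"KDUMP"        # compressed kdump header, as shown by crash


def classify_dump(stream):
    """Guess the container format from the leading bytes of a dump."""
    head = stream.read(16)
    if head.startswith(ELF_MAGIC):
        return "elf"
    if head.startswith(FLAT_SIGNATURE):
        return "flattened"
    if head.startswith(KDUMP_SIGNATURE):
        return "kdump-compressed"
    return "unknown"


# Synthetic inputs, not real dump data.
example = classify_dump(io.BytesIO(b"KDUMP   " + bytes(8)))
```

With a real file this would be called as classify_dump(open("vmcore", "rb")). Note that it only identifies the container; it says nothing about whether QEMU produced the file, which is the open question here.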
* Re: uniquely identifying KDUMP files that originate from QEMU
2014-11-12 15:01 ` Laszlo Ersek
@ 2014-11-12 15:45 ` Dave Anderson
0 siblings, 0 replies; 29+ messages in thread
From: Dave Anderson @ 2014-11-12 15:45 UTC (permalink / raw)
To: Laszlo Ersek
Cc: Petr Tesarik, HATAYAMA Daisuke, kexec,
Discussion list for crash utility usage, maintenance and development

----- Original Message -----
> On 11/12/14 15:09, Dave Anderson wrote:
> > [...]
> >
> > Note that QEMU does add self-generated register dumps above, but the special
> > "QEMU" note that is added to ELF kdumps is not included.
> >
> > Also note that the kernel version information is also left zero-filled.
> >
> > In any case, if either a QEMU note or a diskdump.data flag were added, I would
> > be more than happy.
>
> Looks like a new flag needs to be negotiated with many stake-holders,
> but a QEMU note could be included even in the kdump format (not only the
> ELF format) freely, and tools that don't recognize it would simply
> ignore it. (And other tools that generate custom notes probably won't
> clash with it.)
>
> Is that correct? Because if it is, then (a) I didn't know it, (b) we
> only need an agreement between "crash" and qemu.

Agreed.

> Is the kdump format specified somewhere (as in, a PDF or text file)? I'd
> like to look into this option if possible.

I don't know. Bernhard Walle, formerly of SUSE, used to keep a text file
stored, but all the links are dead now:

  [Crash-utility] Re: Kdump compressed format
  https://www.redhat.com/archives/crash-utility/2008-August/msg00014.html

But his document certainly wouldn't have anything with respect to QEMU notes.
Maybe Petr or the Fujitsu guys have a pointer?

But anyway, as it turns out, QEMU ELF kdumps contain one of the special "QEMU"
notes for each cpu:

$ readelf --notes vmcore

Notes at offset 0x000001c8 with length 0x00000ca0:
  Owner      Data size    Description
  CORE       0x00000150   NT_PRSTATUS (prstatus structure)
  CORE       0x00000150   NT_PRSTATUS (prstatus structure)
  CORE       0x00000150   NT_PRSTATUS (prstatus structure)
  CORE       0x00000150   NT_PRSTATUS (prstatus structure)
  QEMU       0x000001b0   Unknown note type: (0x00000000)
  QEMU       0x000001b0   Unknown note type: (0x00000000)
  QEMU       0x000001b0   Unknown note type: (0x00000000)
  QEMU       0x000001b0   Unknown note type: (0x00000000)
$

Here are the contents of each QEMU note:

crash> help -n
... [ cut ] ...
Elf64_Nhdr:
  n_namesz: 5 ("QEMU")
  n_descsz: 432
  n_type: 0 (?)
  000001b000000001 0000000000000000 0000000000000000 0000000000000000
  0000000000000000 0000000000000001 ffffffff81dd5228 ffffffff81a01ec8
  ffffffff81a01ec8 0000000000000000 0000000000000000 00000013911d5f29
  0000000000000000 ffffffff81c00480 0000000000000000 ffffffffffffffff
  000000000309f000 ffffffff810375ab 0000000000000246 ffffffff00000010
  0000000000a09b00 0000000000000000 ffffffff00000018 0000000000c09300
  0000000000000000 ffffffff00000018 0000000000c09300 0000000000000000
  ffffffff00000000 0000000000000000 0000000000000000 ffffffff00000000
  0000000000000000 ffff880003200000 ffffffff00000018 0000000000c09300
  0000000000000000 ffffffff00000000 0000000000000000 0000000000000000
  0000208700000040 0000000000008b00 ffff880003213b40 0000007f00000000
  0000000000000000 ffff880003204000 00000fff00000000 0000000000000000
  ffffffff81dd2000 000000008005003b 0000000000000000 0000000001b2e000
  0000000007b18000 00000000000006f0
Elf64_Nhdr:
  n_namesz: 5 ("QEMU")
  n_descsz: 432
  n_type: 0 (?)
  000001b000000001 ffffffff81a93760 000000000000000c 0000000080802001
  0000000000000000 00000000000000ff 00000000000000f0 ffff880002287e88
  ffff880002287e88 ffff880002287e50 ffff880002287e54 000009149661fc2b
  ffff88001e6abe78 ffff880002287ef8 00000000fffffffe 0000000000000000
  ffffffff81bfed40 ffffffff810375ba 0000000000000002 ffffffff00000010
  0000000000a09b00 0000000000000000 ffffffff00000018 0000000000c09300
  0000000000000000 ffffffff00000018 0000000000c09300 0000000000000000
  ffffffff00000000 0000000000000000 0000000000000000 ffffffff00000000
  0000000000000000 ffff880002280000 ffffffff00000018 0000000000c09300
  0000000000000000 ffffffff00000000 0000000000000000 0000000000000000
  0000208700000040 0000000000008b00 ffff880002293b40 0000007f00000000
  0000000000000000 ffff880002284000 00000fff00000000 0000000000000000
  ffffffff81dd2000 000000008005003b 0000000000000000 0000000002162570
  000000001aab8000 00000000000006e0
Elf64_Nhdr:
  n_namesz: 5 ("QEMU")
  n_descsz: 432
  n_type: 0 (?)
  000001b000000001 ffffffff81a93760 000000000000000c 0000000080802001
  0000000000000000 00000000000000ff 00000000000000f0 ffff880002307e88
  ffff880002307e88 ffff880002307e50 ffff880002307e54 000009143aed494c
  ffff88001e6dfe78 ffff880002307ef8 00000000fffffffe 0000000000000000
  ffffffff81bfed40 ffffffff810375ba 0000000000000002 ffffffff00000010
  0000000000a09b00 0000000000000000 ffffffff00000018 0000000000c09300
  0000000000000000 ffffffff00000018 0000000000c09300 0000000000000000
  ffffffff00000000 0000000000000000 0000000000000000 ffffffff00000000
  0000000000000000 ffff880002300000 ffffffff00000018 0000000000c09300
  0000000000000000 ffffffff00000000 0000000000000000 0000000000000000
  0000208700000040 0000000000008b00 ffff880002313b40 0000007f00000000
  0000000000000000 ffff880002304000 00000fff00000000 0000000000000000
  ffffffff81dd2000 000000008005003b 0000000000000000 00007fd1a029c000
  000000001d5c7000 00000000000006e0
Elf64_Nhdr:
  n_namesz: 5 ("QEMU")
  n_descsz: 432
  n_type: 0 (?)
  000001b000000001 ffffffff81a93760 000000000000000c 0000000080802001
  0000000000000000 00000000000000ff 00000000000000f0 ffff880002387e88
  ffff880002387e88 ffff880002387e50 ffff880002387e54 0000091497285969
  0000000000000000 ffff880002387ef8 00000000fffffffe 0000000000000000
  ffffffff81bfed40 ffffffff810375ba 0000000000000046 ffffffff00000010
  0000000000a09b00 0000000000000000 ffffffff00000018 0000000000c09300
  0000000000000000 ffffffff00000018 0000000000c09300 0000000000000000
  ffffffff00000000 0000000000000000 0000000000000000 ffffffff00000000
  0000000000000000 ffff880002380000 ffffffff00000018 0000000000c09300
  0000000000000000 ffffffff00000000 0000000000000000 0000000000000000
  0000208700000040 0000000000008b00 ffff880002393b40 0000007f00000000
  0000000000000000 ffff880002384000 00000fff00000000 0000000000000000
  ffffffff81dd2000 000000008005003b 0000000000000000 00007f981fe51000
  000000001f214000 00000000000006e0
crash>

I'm not sure what the data consists of.
The crash utility simply checks for the existence of a note with a "QEMU"
name string.

> Also, is there a command line tool that dumps metadata from a kdump
> file? (Quite like your "crash" invocation above, but I believe crash
> won't even start without a matching symbol file.)

I don't know of any, although Petr mentioned something about a "kdumpid" tool?
It's on sourceforge, but I had never heard of it until today, and I don't know
if it dumps the full contents of headers.

However, you can get the header dump I showed before without a vmlinux file
by using the -d debug flag on the vmcore:

$ crash -d1 vmcore.ovmf.rhel7.kdump-zlib

crash 7.0.8
Copyright (C) 2002-2014 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

vmcore.ovmf.rhel7.kdump-zlib: FLAT
compressed kdump: header->utsname.machine: x86_64
makedumpfile header:
  signature: "makedumpfile"
  type: 1
  version: 1
  all_flat_data:
  num_array: 13851
  array: 7f3c0a0da010
  file_size: 0

diskdump_data:
  filename: vmcore.ovmf.rhel7.kdump-zlib
  flags: 6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED) [FLAT]
  dfd: 3
  ofp: 0
  machine_type: 62 (EM_X86_64)

  header: 1c9bfe0
    signature: "KDUMP "
    header_version: 6
    utsname:
      sysname:
      nodename:
      release:
      version:
      machine: x86_64
      domainname:
    timestamp:
      tv_sec: 0
      tv_usec: 0
    status: 1 (DUMP_DH_COMPRESSED_ZLIB)
    block_size: 4096
    sub_hdr_size: 1
    bitmap_blocks: 76
    max_mapnr: 1245184
    total_ram_blocks: 0
    device_blocks: 0
    written_blocks: 0
    current_cpu: 0
    nr_cpus: 4
    tasks[nr_cpus]: 0
                    0
                    0
                    0

  sub_header: 0 (n/a)

  sub_header_kdump: 1c9cff0
    phys_base: 0
    dump_level: 1 (0x1) (DUMP_EXCLUDE_ZERO)
    split: 0
    start_pfn: (unused)
    end_pfn: (unused)
    offset_vmcoreinfo: 0 (0x0)
    size_vmcoreinfo: 0 (0x0)
    offset_note: 4200 (0x1068)
    size_note: 3232 (0xca0)
    num_prstatus_notes: 4
    notes_buf: 1c9e000
    notes[0]: 1c9e000
    notes[1]: 1c9e164
    notes[2]: 1c9e2c8
    notes[3]: 1c9e42c
    NT_PRSTATUS_offset: 1068
                        11cc
                        1330
                        1494
    offset_eraseinfo: 0 (0x0)
    size_eraseinfo: 0 (0x0)
    start_pfn_64: (unused)
    end_pfn_64: (unused)
    max_mapnr_64: 1245184 (0x130000)

  data_offset: 4e000
  block_size: 4096
  block_shift: 12
  bitmap: 7f3c0a08d010
  bitmap_len: 311296
  max_mapnr: 1245184 (0x130000)
  dumpable_bitmap: 7f3c0a040010
  byte: 0
  bit: 0
  compressed_page: 1cbf660
  curbufptr: 0

  page_cache_hdr[0]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1caf650
    pg_hit_count: 0
  page_cache_hdr[1]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cb0650
    pg_hit_count: 0
  page_cache_hdr[2]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cb1650
    pg_hit_count: 0
  page_cache_hdr[3]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cb2650
    pg_hit_count: 0
  page_cache_hdr[4]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cb3650
    pg_hit_count: 0
  page_cache_hdr[5]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cb4650
    pg_hit_count: 0
  page_cache_hdr[6]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cb5650
    pg_hit_count: 0
  page_cache_hdr[7]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cb6650
    pg_hit_count: 0
  page_cache_hdr[8]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cb7650
    pg_hit_count: 0
  page_cache_hdr[9]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cb8650
    pg_hit_count: 0
  page_cache_hdr[10]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cb9650
    pg_hit_count: 0
  page_cache_hdr[11]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cba650
    pg_hit_count: 0
  page_cache_hdr[12]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cbb650
    pg_hit_count: 0
  page_cache_hdr[13]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cbc650
    pg_hit_count: 0
  page_cache_hdr[14]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cbd650
    pg_hit_count: 0
  page_cache_hdr[15]:
    pg_flags: 0 ()
    pg_addr: 0
    pg_bufptr: 1cbe650
    pg_hit_count: 0

  page_cache_buf: 1caf650
  evict_index: 0
  evictions: 0
  accesses: 0
  cached_reads: 0
  valid_pages: 1caecc0

crash: namelist argument required

Usage:

  crash [OPTION]... NAMELIST MEMORY-IMAGE[@ADDRESS]     (dumpfile form)
  crash [OPTION]... [NAMELIST]                          (live system form)

Enter "crash -h" for details.
$

Dave

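The check described above (look for a note whose name string is "QEMU") can be sketched by walking an ELF note region. This is a minimal Python illustration, not crash's actual code. It assumes little-endian data, the usual three-word note header (n_namesz, n_descsz, n_type) and 4-byte padding of name and descriptor; the 0x164 spacing between the notes[] pointers in the header dump is consistent with that padding. The helpers build_note, iter_notes and has_qemu_note are hypothetical names, and the buffer is synthetic (zero-filled descriptors), not real dump data.

```python
import struct

NHDR = struct.Struct("<III")  # n_namesz, n_descsz, n_type (little-endian)
NT_PRSTATUS = 1


def pad4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def build_note(name, desc, n_type):
    """Serialize one note: header, NUL-terminated name, descriptor."""
    raw_name = name.encode("ascii") + b"\0"
    return (NHDR.pack(len(raw_name), len(desc), n_type)
            + raw_name.ljust(pad4(len(raw_name)), b"\0")
            + desc.ljust(pad4(len(desc)), b"\0"))


def iter_notes(buf):
    """Yield (name, n_type, desc) for each note in a note region."""
    offset = 0
    while offset + NHDR.size <= len(buf):
        namesz, descsz, n_type = NHDR.unpack_from(buf, offset)
        offset += NHDR.size
        name = buf[offset:offset + namesz].rstrip(b"\0").decode("ascii", "replace")
        offset += pad4(namesz)
        desc = buf[offset:offset + descsz]
        offset += pad4(descsz)
        yield name, n_type, desc


def has_qemu_note(buf):
    """True if any note in the region carries the name "QEMU"."""
    return any(name == "QEMU" for name, _, _ in iter_notes(buf))


# Four CORE notes (0x150 bytes each) and four QEMU notes (0x1b0 bytes each),
# the sizes readelf reports for the 4-cpu ELF dump above.
notes = (build_note("CORE", bytes(0x150), NT_PRSTATUS) * 4
         + build_note("QEMU", bytes(0x1b0), 0) * 4)
```

Under these assumptions the synthetic region is 0xca0 bytes long, which equals the note length readelf reports for the ELF vmcore. The same number appears as size_note in the kdump header dumps, although a matching size alone says nothing about what that region actually contains.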
* Re: uniquely identifying KDUMP files that originate from QEMU
2014-11-12 14:09 ` Dave Anderson
2014-11-12 15:01 ` Laszlo Ersek
@ 2014-11-13 1:08 ` HATAYAMA Daisuke
2014-11-13 15:21 ` Dave Anderson
1 sibling, 1 reply; 29+ messages in thread
From: HATAYAMA Daisuke @ 2014-11-13 1:08 UTC (permalink / raw)
To: anderson; +Cc: lersek, kexec, ptesarik, crash-utility

From: Dave Anderson <anderson@redhat.com>
Subject: Re: uniquely identifying KDUMP files that originate from QEMU
Date: Wed, 12 Nov 2014 09:09:34 -0500

> [...]
>
> I agree that the actual value of phys_base should be included in the vmcoreinfo.
>
> However, it won't help in this case because the vmcoreinfo data is not
> copied into the compressed dumpfile header. The offset_vmcoreinfo and
> size_vmcoreinfo fields are zero.

Yes, so I said:

>> This is far from well-designed from qemu's point of view, but it would
>> be manually easier to get phys_base than now.

This is just an ad-hoc way.

>
> Here's an example header dump of a QEMU-generated dumpfile:
>
> crash> help -n
> makedumpfile header:
>   signature: "makedumpfile"
>   type: 1
>   version: 1
> all_flat_data:
>   num_array: 18695
>   array: 7f484b760010
>   file_size: 0
>
> diskdump_data:
>   filename: vmcore.ovmf.rhel7.kdump-snappy
>   flags: c6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED|LZO_SUPPORTED|SNAPPY_SUPPORTED) [FLAT]
>   dfd: 3
>   ofp: 3e441b1260
>   machine_type: 62 (EM_X86_64)
>
>   header: 1a68fe0
>     signature: "KDUMP "
>     header_version: 6
>     utsname:
>       sysname:
>       nodename:
>       release:
>       version:
>       machine: x86_64
>       domainname:
>     timestamp:
>       tv_sec: 0
>       tv_usec: 0
>     status: 4 (DUMP_DH_COMPRESSED_SNAPPY)
>     block_size: 4096
>     sub_hdr_size: 1
>     bitmap_blocks: 76
>     max_mapnr: 1245184
>     total_ram_blocks: 0
>     device_blocks: 0
>     written_blocks: 0
>     current_cpu: 0
>     nr_cpus: 4
>     tasks[nr_cpus]: 0
>                     0
>                     0
>                     0
>
>   sub_header: 0 (n/a)
>
>   sub_header_kdump: 1a69ff0
>     phys_base: 0
>     dump_level: 1 (0x1) (DUMP_EXCLUDE_ZERO)
>     split: 0
>     start_pfn: (unused)
>     end_pfn: (unused)
>     offset_vmcoreinfo: 0 (0x0)
>     size_vmcoreinfo: 0 (0x0)
>     offset_note: 4200 (0x1068)
>     size_note: 3232 (0xca0)
>     num_prstatus_notes: 4
>     notes_buf: 1a6b000
>     notes[0]: 1a6b000
>     notes[1]: 1a6b164
>     notes[2]: 1a6b2c8
>     notes[3]: 1a6b42c
>     NT_PRSTATUS_offset: 1068
>                         11cc
>                         1330
>                         1494
>     offset_eraseinfo: 0 (0x0)
>     size_eraseinfo: 0 (0x0)
>     start_pfn_64: (unused)
>     end_pfn_64: (unused)
>     max_mapnr_64: 1245184 (0x130000)
>
>   data_offset: 4e000
>   block_size: 4096
>   block_shift: 12
>   bitmap: 7f484b713010
>   bitmap_len: 311296
>   max_mapnr: 1245184 (0x130000)
>   dumpable_bitmap: 7f484b6c6010
>   byte: 0
>   bit: 0
>   compressed_page: 1a8c660
>   curbufptr: 1a7f650
> ...
>
> Note that QEMU does add self-generated register dumps above, but the special
> "QEMU" note that is added to ELF kdumps is not included.
>

Sorry, I didn't know this, and there's no reason not to add it.

> Also note that the kernel version information is also left zero-filled.
>

This is what I intended. Retrieving data from the vmcore should be done in
the crash utility or makedumpfile.

> In any case, if either a QEMU note or a diskdump.data flag were added, I would
> be more than happy.
>
> Dave

The absence of the QEMU note is different from my intention. This is a
regression against ELF. We must add it.

--
Thanks.
HATAYAMA, Daisuke

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
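
The numbers in the header dump quoted in the message above can be cross-checked with a little arithmetic. The sketch below makes two assumptions that are not stated in the thread: that the file is laid out as one header block, then the sub-header blocks, then the bitmap blocks, then the data; and that the bitmap area holds two bitmaps of equal size with one bit per page frame. All constants are copied from the dump; the helper name blocks_for_bits() is made up for the example.

```python
# Values copied from the "help -n" output quoted in this thread.
BLOCK_SIZE = 4096
SUB_HDR_SIZE = 1          # in blocks
BITMAP_BLOCKS = 76
MAX_MAPNR = 1245184       # number of page frames covered by the dump
DATA_OFFSET = 0x4E000
SIZE_NOTE = 0xCA0
PRSTATUS_OFFSETS = [0x1068, 0x11CC, 0x1330, 0x1494]


def blocks_for_bits(nbits, block_size):
    """Number of whole blocks needed to hold a bitmap of nbits bits."""
    nbytes = (nbits + 7) // 8
    return (nbytes + block_size - 1) // block_size


# Assumed layout: 1 header block, the sub-header, then the bitmaps.
data_offset = (1 + SUB_HDR_SIZE + BITMAP_BLOCKS) * BLOCK_SIZE

# Assumed: two bitmaps of equal size, each with one bit per page frame.
bitmap_blocks = 2 * blocks_for_bits(MAX_MAPNR, BLOCK_SIZE)

# Distance between consecutive NT_PRSTATUS notes, and the bytes of the
# notes area that are left over after four notes of that size.
stride = PRSTATUS_OFFSETS[1] - PRSTATUS_OFFSETS[0]
prstatus_bytes = stride * len(PRSTATUS_OFFSETS)
remaining = SIZE_NOTE - prstatus_bytes

print(hex(data_offset), bitmap_blocks, stride, remaining)
```

Under these assumptions the computed data offset and bitmap block count match the dump. The four NT_PRSTATUS notes account for 1424 of the 3232 bytes in the notes area, so 1808 bytes of that area hold something other than those four notes.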
* Re: uniquely identifying KDUMP files that originate from QEMU
2014-11-13 1:08 ` HATAYAMA Daisuke
@ 2014-11-13 15:21 ` Dave Anderson
0 siblings, 0 replies; 29+ messages in thread
From: Dave Anderson @ 2014-11-13 15:21 UTC (permalink / raw)
To: HATAYAMA Daisuke; +Cc: lersek, kexec, ptesarik, crash-utility

----- Original Message -----
> From: Dave Anderson <anderson@redhat.com>
> Subject: Re: uniquely identifying KDUMP files that originate from QEMU
> Date: Wed, 12 Nov 2014 09:09:34 -0500
>
> >
> >
> > ----- Original Message -----
> >> From: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> >> To: ptesarik@suse.cz
> >> Cc: lersek@redhat.com, kexec@lists.infradead.org
> >> Subject: Re: uniquely identifying KDUMP files that originate from QEMU
> >> Message-ID: <20141112.120838.303682123986142686.d.hatayama@jp.fujitsu.com>
> >> Content-Type: Text/Plain; charset=us-ascii
> >>
> >> From: Petr Tesarik <ptesarik@suse.cz>
> >> Subject: Re: uniquely identifying KDUMP files that originate from QEMU
> >> Date: Tue, 11 Nov 2014 13:09:13 +0100
> >>
> >> > On Tue, 11 Nov 2014 12:22:52 +0100
> >> > Laszlo Ersek <lersek@redhat.com> wrote:
> >> >
> >> >> (Note: I'm not subscribed to either qemu-devel or the kexec list; please
> >> >> keep me CC'd.)
> >> >>
> >> >> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
> >> >> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
> >> >>
> >> >> The resultant vmcore is usually analyzed with the "crash" utility.
> >> >>
> >> >> The original tool producing such files is kdump. Unlike the procedure
> >> >> performed by QEMU, kdump runs from *within* the guest (under a kexec'd
> >> >> kdump kernel), and has more information about the original guest kernel
> >> >> state (which is being dumped) than QEMU. To QEMU, the guest kernel state
> >> >> is opaque.
> >> >>
> >> >> For this reason, the kdump preparation logic in QEMU hardcodes a number
> >> >> of fields in the kdump header. The direct issue is the "phys_base"
> >> >> field. Refer to dump.c, functions create_header32(), create_header64(),
> >> >> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
> >> >> "0").
> >> >>
> >> >> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
> >> >>
> >> >> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
> >> >>
> >> >> This works in most cases, because the guest Linux kernel indeed tends to
> >> >> be loaded at guest-phys address 0. However, when the guest Linux kernel
> >> >> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
> >> >> then the guest Linux kernel is loaded at 16MB, thereby getting out of
> >> >> sync with the phys_base=0 setting visible in the KDUMP header.
> >> >>
> >> >> This trips up the "crash" utility.
> >> >>
> >> >> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
> >> >> can identify QEMU as the originator of the vmcore by finding the QEMU
> >> >> notes in the ELF vmcore. If those are present, then "crash" employs a
> >> >> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
> >> >>
> >> >> Alas, the QEMU notes are not present in the KDUMP-format vmcores that
> >> >> QEMU produces (they cannot be),
> >> >
> >> > Why? Since KDUMP format version 4, the complete ELF notes can be stored
> >> > in the file (see offset_note, size_note fields in the sub-header).
> >> >
> >>
> >> Yes, the QEMU notes are present in the kdump-compressed format. But
> >> phys_base cannot be calculated from the qemu side alone. We cannot do
> >> more than what the crash utility already does as a workaround. So the
> >> phys_base value in the kdump sub-header is currently designed to be 0.
> >>
> >> Anyway, phys_base is kernel information. To make it available on the
> >> qemu side, we need to prepare a mechanism that gives qemu access to it.
> >>
> >> One ad-hoc but simple way is to put the phys_base value into the
> >> VMCOREINFO note information in the kernel.
> >>
> >> Although there is already a similar entry in VMCOREINFO, like
> >>
> >> arch/x86/kernel/
> >> ==
> >> void arch_crash_save_vmcoreinfo(void)
> >> {
> >>         VMCOREINFO_SYMBOL(phys_base); <---- This
> >>         VMCOREINFO_SYMBOL(init_level4_pgt);
> >>
> >> ...
> >> ==
> >>
> >> this is meaningless, because the value saved is the virtual address
> >> assigned to the phys_base symbol. To read the value of phys_base itself
> >> through that address, we need the very phys_base value we are trying to
> >> obtain.
> >>
> >> So, instead, if we change this to save the value of phys_base rather than
> >> the address of the phys_base symbol, we can get phys_base from the
> >> VMCOREINFO.
> >>
> >> The VMCOREINFO consists simply of strings. So it's easy to search a
> >> vmcore for it, e.g. using strings and grep like this:
> >>
> >> $ strings vmcore-3.10.0-121.el7.x86_64 | grep -E ".*VMCOREINFO.*" -A 100
> >> VMCOREINFO
> >> OSRELEASE=3.10.0-121.el7.x86_64
> >> PAGESIZE=4096
> >> ...
> >> SYMBOL(phys_base)=ffffffff818e5010 <-- though this is the address of phys_base now...
> >> SYMBOL(init_level4_pgt)=ffffffff818de000
> >> SYMBOL(node_data)=ffffffff819f1cc0
> >> LENGTH(node_data)=1024
> >> CRASHTIME=1399460394
> >> ...
> >>
> >> This should also be useful to get the phys_base of the 2nd kernel, which is
> >> inherently a relocated kernel, from a vmcore generated using qemu dump.
> >>
> >> This is far from well-designed from qemu's point of view, but it would
> >> make it easier to get phys_base manually than it is now.
> >>
> >> Obviously, the VMCOREINFO is available only if CONFIG_KEXEC is
> >> enabled. Other users cannot use this.
> >>
> >> --
> >> Thanks.
> >> HATAYAMA, Daisuke
> >
> > I agree that the actual value of phys_base should be included in the
> > vmcoreinfo.
> >
> > However, it won't help in this case because the vmcoreinfo data is not
> > copied into the compressed dumpfile header. The offset_vmcoreinfo and
> > size_vmcoreinfo fields are zero.
>
> Yes, so I said:
>
> >> This is far from well-designed from qemu's point of view, but it would
> >> make it easier to get phys_base manually than it is now.
>
> This is just an ad-hoc way.
>
> >
> > Here's an example header dump of a QEMU-generated dumpfile:
> >
> > crash> help -n
> > makedumpfile header:
> >   signature: "makedumpfile"
> >   type: 1
> >   version: 1
> > all_flat_data:
> >   num_array: 18695
> >   array: 7f484b760010
> >   file_size: 0
> >
> > diskdump_data:
> >   filename: vmcore.ovmf.rhel7.kdump-snappy
> >   flags: c6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED|LZO_SUPPORTED|SNAPPY_SUPPORTED) [FLAT]
> >   dfd: 3
> >   ofp: 3e441b1260
> >   machine_type: 62 (EM_X86_64)
> >
> >   header: 1a68fe0
> >     signature: "KDUMP "
> >     header_version: 6
> >     utsname:
> >       sysname:
> >       nodename:
> >       release:
> >       version:
> >       machine: x86_64
> >       domainname:
> >     timestamp:
> >       tv_sec: 0
> >       tv_usec: 0
> >     status: 4 (DUMP_DH_COMPRESSED_SNAPPY)
> >     block_size: 4096
> >     sub_hdr_size: 1
> >     bitmap_blocks: 76
> >     max_mapnr: 1245184
> >     total_ram_blocks: 0
> >     device_blocks: 0
> >     written_blocks: 0
> >     current_cpu: 0
> >     nr_cpus: 4
> >     tasks[nr_cpus]: 0
> >                     0
> >                     0
> >                     0
> >
> >   sub_header: 0 (n/a)
> >
> >   sub_header_kdump: 1a69ff0
> >     phys_base: 0
> >     dump_level: 1 (0x1) (DUMP_EXCLUDE_ZERO)
> >     split: 0
> >     start_pfn: (unused)
> >     end_pfn: (unused)
> >     offset_vmcoreinfo: 0 (0x0)
> >     size_vmcoreinfo: 0 (0x0)
> >     offset_note: 4200 (0x1068)
> >     size_note: 3232 (0xca0)
> >     num_prstatus_notes: 4
> >     notes_buf: 1a6b000
> >     notes[0]: 1a6b000
> >     notes[1]: 1a6b164
> >     notes[2]: 1a6b2c8
> >     notes[3]: 1a6b42c
> >     NT_PRSTATUS_offset: 1068
> >                         11cc
> >                         1330
> >                         1494
> >     offset_eraseinfo: 0 (0x0)
> >     size_eraseinfo: 0 (0x0)
> >     start_pfn_64: (unused)
> >     end_pfn_64: (unused)
> >     max_mapnr_64: 1245184 (0x130000)
> >
> >   data_offset: 4e000
> >   block_size: 4096
> >   block_shift: 12
> >   bitmap: 7f484b713010
> >   bitmap_len: 311296
> >   max_mapnr: 1245184 (0x130000)
> >   dumpable_bitmap: 7f484b6c6010
> >   byte: 0
> >   bit: 0
> >   compressed_page: 1a8c660
> >   curbufptr: 1a7f650
> > ...
> >
> > Note that QEMU does add self-generated register dumps above, but the special
> > "QEMU" note that is added to ELF kdumps is not included.
> >
>
> Sorry, I didn't know this, and there's no reason not to add it.
>
> > Also note that the kernel version information is also left zero-filled.
> >
>
> This is what I intended. Retrieving data from the vmcore should be done in
> the crash utility or makedumpfile.
>
> > In any case, if either a QEMU note or a diskdump.data flag were added, I would
> > be more than happy.
> >
> > Dave
>
> The absence of the QEMU note is different from my intention. This is a
> regression against ELF. We must add it.

Not necessary -- as it turns out, the QEMU notes are located in the
compressed kdump notes section following the NT_PRSTATUS notes:

  http://lists.infradead.org/pipermail/kexec/2014-November/012974.html

It's just that the notes-gathering code in the crash utility was only looking
for and storing NT_PRSTATUS note information.

Thanks,
  Dave

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
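
The point about the notes-gathering code can be illustrated with a small note walker that records every note instead of stopping at NT_PRSTATUS. This is a sketch under assumptions, not the crash utility's code: it assumes little-endian notes in the standard ELF note layout (three 32-bit words, then name and descriptor each padded to 4 bytes), and the function names, the descriptor contents and the note type 0 used for the "QEMU" notes are placeholders invented for the example.

```python
import struct

NT_PRSTATUS = 1  # standard ELF note type for per-CPU register state


def align4(n):
    """Round n up to the next multiple of 4 (ELF note padding)."""
    return (n + 3) & ~3


def iter_notes(buf):
    """Yield (name, type, desc) for each ELF note in a little-endian buffer."""
    pos = 0
    while pos + 12 <= len(buf):
        namesz, descsz, ntype = struct.unpack_from("<III", buf, pos)
        pos += 12
        name = buf[pos:pos + namesz].rstrip(b"\0").decode("ascii", "replace")
        pos += align4(namesz)
        desc = buf[pos:pos + descsz]
        pos += align4(descsz)
        yield name, ntype, desc


def make_note(name, ntype, desc):
    """Build one note; used here only to create synthetic test input."""
    raw_name = name.encode("ascii") + b"\0"
    return (struct.pack("<III", len(raw_name), len(desc), ntype)
            + raw_name.ljust(align4(len(raw_name)), b"\0")
            + desc.ljust(align4(len(desc)), b"\0"))


# Synthetic notes area: two NT_PRSTATUS notes followed by two "QEMU" notes.
# The descriptor contents and the QEMU note type (0) are placeholders.
notes = (make_note("CORE", NT_PRSTATUS, b"\x11" * 8)
         + make_note("CORE", NT_PRSTATUS, b"\x22" * 8)
         + make_note("QEMU", 0, b"\x33" * 6)
         + make_note("QEMU", 0, b"\x44" * 6))

found = [(name, ntype, len(desc)) for name, ntype, desc in iter_notes(notes)]
print(found)
```

A reader that keeps only the entries with type NT_PRSTATUS would miss the trailing "QEMU" notes even though they sit in the same buffer, which matches the behaviour described above.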
Thread overview: 29+ messages
2014-11-11 11:22 uniquely identifying KDUMP files that originate from QEMU Laszlo Ersek
2014-11-11 11:46 ` [Qemu-devel] " Peter Maydell
2014-11-11 12:09 ` Petr Tesarik
2014-11-12 3:08 ` HATAYAMA Daisuke
2014-11-12 8:04 ` Petr Tesarik
2014-11-12 14:50 ` Laszlo Ersek
2014-11-12 18:43 ` Petr Tesarik
2014-11-12 20:30 ` Laszlo Ersek
2014-11-12 20:41 ` Dave Anderson
2014-11-12 21:21 ` [Crash-utility] " Dave Anderson
2014-11-12 21:20 ` Petr Tesarik
2014-11-11 17:27 ` [Qemu-devel] " Christopher Covington
2014-11-12 8:05 ` Petr Tesarik
2014-11-12 13:18 ` Christopher Covington
2014-11-12 13:26 ` Petr Tesarik
2014-11-12 13:28 ` Christopher Covington
2014-11-12 14:36 ` Petr Tesarik
2014-11-12 14:40 ` Laszlo Ersek
2014-11-12 14:10 ` Laszlo Ersek
2014-11-12 14:48 ` Christopher Covington
2014-11-12 15:03 ` Laszlo Ersek
2014-11-12 15:43 ` Christopher Covington
2014-11-12 21:10 ` Petr Tesarik
2014-11-12 14:37 ` Laszlo Ersek
[not found] <mailman.20827.1415774425.22890.kexec@lists.infradead.org>
2014-11-12 14:09 ` Dave Anderson
2014-11-12 15:01 ` Laszlo Ersek
2014-11-12 15:45 ` Dave Anderson
2014-11-13 1:08 ` HATAYAMA Daisuke
2014-11-13 15:21 ` Dave Anderson