From: "HATAYAMA, Daisuke" <d.hatayama@jp.fujitsu.com>
To: "Wang, Xiao/Wang Xiao" <wangx.fnst@cn.fujitsu.com>
Cc: kexec@lists.infradead.org
Subject: Re: [PATCH 2/2] makedumpfile: make the incomplete vmcore generated by ENOSPC error analyzable
Date: Thu, 18 Sep 2014 21:29:03 +0900 [thread overview]
Message-ID: <541AD00F.1070004@jp.fujitsu.com> (raw)
In-Reply-To: <54180781.2080702@cn.fujitsu.com>
Hello Wang,
(2014/09/16 18:48), Wang, Xiao/Wang Xiao wrote:
> Since the incomplete vmcore generated by ENOSPC error can't be anylyzed by
> crash utility, but sometimes this file may contain important information
> and the panic problem won't be reproduced, then we came up with an idea to
> modify the exist data of the incomplete vmcore file to make it analyzable
> by crash utility. As there are two formats of the vmcore file, different
> methods are needed to deal with each of them,
>
> elf:
> Modify the value of the PT_LOAD program header to reflect the actual size
> of the incomplete vmcore file. This method can't be used to modify the
> vmcore written in flattened mode.
>
> kdump-compressed:
> Dump the bitmap before any page header and page data.
>
Rigorously, this design depends on implementation of filesystems where it's
possible to lseek and write data onto already allocated blocks even after the
filesystem is full. To show correctness of this design, you should show that
test results for a variety of filesystems such as ext3, ext4, xfs, btrfs, etc.
Of course, if this design doesn't work well on some filesystem, then we just
see the second ENOSPC and we give up addtional work. This is the same situation
as now. There's no demerit.
> Signed-of-by: Wang Xiao <wangx.fnst@cn.fujitsu.com>
> ---
> makedumpfile.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 files changed, 143 insertions(+), 7 deletions(-)
>
> diff --git a/makedumpfile.c b/makedumpfile.c
> index ce4a866..debc15b 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -3621,6 +3621,128 @@ write_cache(struct cache_data *cd, void *buf, size_t size)
> }
>
> int
> +reserve_diskspace(int fd, off_t start_offset, off_t end_offset, char *file_name)
> +{
> + size_t buf_size;
> + char *buf = NULL;
> +
> + int ret = FALSE;
> +
> + if (start_offset >= end_offset) {
> + ERRMSG("The start offset of diskspace to be reserved must smaller than "
> + "the end offset.\n");
> + return FALSE;
> + }
I feel this check should be assert()-like. This is not a message when makedumpfile runs.
At least there's no need to print this message.
> +
> + buf_size = end_offset - start_offset;
> +
> + if ((buf = malloc(buf_size)) == NULL) {
> + ERRMSG("Can't allocate memory for the size of reserved diskspace. %s\n",
> + strerror(errno));
> + return FALSE;
> + }
> +
> + memset(buf, 0, buf_size);
> + if (!write_buffer(fd, start_offset, buf, buf_size, file_name))
> + goto out;
> +
> + ret = TRUE;
> +out:
> + if (buf != NULL) {
> + free(buf);
> + }
> +
> + return ret;
> +}
> +
> +int
> +check_and_modify_elf_header() {
check_and_modify_elf_header*s*() is better?
It's confusing to the ELF header that always occurs at the head of ELF file.
> + int i, phnum;
> + off_t file_end, offset, end_offset;
> + Elf64_Ehdr ehdr64;
> + Elf32_Ehdr ehdr32;
> + Elf64_Phdr phdr64;
> + Elf32_Phdr phdr32;
> +
> + /*
> + * the is_elf64_memory() function still can be used.
> + */
> + if (is_elf64_memory()) { /* ELF64 */
ehdr64 should be declared in this scope.
> + if (!get_elf64_ehdr(info->fd_dumpfile, info->name_dumpfile, &ehdr64)) {
> + ERRMSG("Can't get ehdr64.\n");
> + return FALSE;
> + }
> + phnum = ehdr64.e_phnum;
> + } else { /* ELF32 */
ehdr32 should be declared in this scope.
> + if (!get_elf32_ehdr(info->fd_dumpfile, info->name_dumpfile, &ehdr32)) {
> + ERRMSG("Can't get ehdr32.\n");
> + return FALSE;
> + }
> + phnum = ehdr32.e_phnum;
> + }
> +
> + file_end = lseek(info->fd_dumpfile, 0, SEEK_END);
> + if (file_end < 0) {
> + ERRMSG("Can't detect the size of %s. %s\n",
> + info->name_dumpfile,
> + strerror(errno));
> + return FALSE;
> + }
> +
> + for (i = 0; i < phnum; i++) {
> + if (is_elf64_memory()) {
phdr64.
> + if (!get_elf64_phdr(info->fd_dumpfile,
> + info->name_dumpfile,
> + i, &phdr64)) {
> + ERRMSG("Can't find Phdr %d.\n", i);
> + return FALSE;
> + }
> +
> + offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr) * i;
> + end_offset = phdr64.p_offset + phdr64.p_filesz;
> +
> + /*
> + * Check the program header and modify its value according
> + * to the actual written size.
> + */
> + if (file_end >= end_offset)
> + continue;
> + else if (file_end >= phdr64.p_offset && file_end < end_offset)
> + phdr64.p_filesz = file_end - phdr64.p_offset;
So, this is the first case that p_filesz becomes strictly less than p_memsz
except for filtered pages.
> + else if (file_end < phdr64.p_offset)
> + memset(&phdr64, 0, sizeof(Elf64_Phdr));
This should be phdr64.p_filesz = 0 and phdr64.p_offset = 0.
The phdr64 has still other information on system memory such p_memsz and p_paddr.
Please don't remove them.
> +
> + if (!write_buffer(info->fd_dumpfile, offset, &phdr64,
> + sizeof(Elf64_Phdr), info->name_dumpfile))
> + return FALSE;
> + } else {
phdr32.
> + if (!get_elf32_phdr(info->fd_dumpfile,
> + info->name_dumpfile,
> + i, &phdr32)) {
> + ERRMSG("Can't find Phdr %d.\n", i);
> + return FALSE;
> + }
> +
> + offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr) * i;
> + end_offset = phdr32.p_offset + phdr32.p_filesz;
> +
> + if (file_end >= end_offset)
> + continue;
> + else if (file_end >= phdr32.p_offset && file_end < end_offset)
> + phdr32.p_filesz = file_end - phdr32.p_offset;
> + else if (file_end < phdr32.p_offset)
> + memset(&phdr32, 0, sizeof(Elf32_Phdr));
same as the above phdr64 case.
> +
> + if (!write_buffer(info->fd_dumpfile, offset, &phdr32,
> + sizeof(Elf32_Phdr), info->name_dumpfile))
> + return FALSE;
> + }
> + }
> +
> + return TRUE;
> +}
> +
> +int
> write_cache_bufsz(struct cache_data *cd)
> {
> if (!cd->buf_size)
> @@ -5432,6 +5554,13 @@ write_elf_header(struct cache_data *cd_header)
> size_note = note.p_filesz;
>
> /*
> + * Reserve a space to store the whole program headers.
> + */
> + if (!reserve_diskspace(cd_header->fd, cd_header->offset,
> + offset_note_dumpfile, cd_header->file_name))
> + goto out;
> +
> + /*
> * Modify the note size in PT_NOTE header to accomodate eraseinfo data.
> * Eraseinfo will be written later.
> */
> @@ -6956,11 +7085,11 @@ write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_d
> if (!exclude_unnecessary_pages_cyclic(&cycle))
> return FALSE;
>
> - if (!write_kdump_pages_cyclic(cd_header, cd_page, &pd_zero,
> - &offset_data, &cycle))
> + if (!write_kdump_bitmap2_cyclic(&cycle))
> return FALSE;
>
> - if (!write_kdump_bitmap2_cyclic(&cycle))
> + if (!write_kdump_pages_cyclic(cd_header, cd_page, &pd_zero,
> + &offset_data, &cycle))
> return FALSE;
> }
>
> @@ -7906,10 +8035,10 @@ writeout_dumpfile(void)
> goto out;
> if (info->flag_cyclic) {
> if (!write_elf_pages_cyclic(&cd_header, &cd_page))
> - goto out;
> + goto chk_enospc;
> } else {
> if (!write_elf_pages(&cd_header, &cd_page))
> - goto out;
> + goto chk_enospc;
> }
> if (!write_elf_eraseinfo(&cd_header))
> goto out;
> @@ -7923,12 +8052,12 @@ writeout_dumpfile(void)
> } else {
> if (!write_kdump_header())
> goto out;
> + if (!write_kdump_bitmap())
> + goto out;
> if (!write_kdump_pages(&cd_header, &cd_page))
> goto out;
> if (!write_kdump_eraseinfo(&cd_page))
> goto out;
> - if (!write_kdump_bitmap())
> - goto out;
> }
> if (info->flag_flatten) {
> if (!write_end_flat_header())
> @@ -7936,6 +8065,13 @@ writeout_dumpfile(void)
> }
>
> ret = TRUE;
> +chk_enospc:
> + if ((ret == FALSE) && info->flag_nospace && !info->flag_flatten) {
> + if (!write_cache_bufsz(&cd_header))
> + goto out;
> + if (!check_and_modify_elf_header())
> + ERRMSG("Can't modify the elf header.\n");
> + }
There's a feature to repeat trying to capture dumpfile in
multiple dump levels in increasing order if no space error is
returned. So, this part should be moved upward.
This is part of man 8 makedumpfile:
-d dump_level
Specify the type of unnecessary page for analysis.
Pages of the specified type are not copied to
DUMPFILE. The page type marked in the following table
is excluded. A user can specify multiple page types by
setting the sum of each page type for dump_level. The
maximum of dump_level is 31. Note that a dump_level for
Xen dump filtering is 0 or 1 on a machine other than
x86_64 (On an x86_64 machine, it is possible to specify
2 or bigger as a dump_level).
If specifying multiple dump_levels with the delimiter
',', makedumpfile retries to create a DUMPFILE by other
dump_level when "No space on device" error happens. For
example, if dump_level is "11,31" and makedumpfile
fails by dump_level 11, makedumpfile retries it by
dump_level 31.
Example:
# makedumpfile -d 11 -x vmlinux /proc/vmcore dumpfile
# makedumpfile -d 11,31 -x vmlinux /proc/vmcore
dumpfile
The process of multiple dump levels is done in create_dumpfile() below.
The above check should be done after even the last trial fails.
if (info->flag_split) {
if ((status = writeout_multiple_dumpfiles()) == FALSE)
return FALSE;
} else {
if ((status = writeout_dumpfile()) == FALSE)
return FALSE;
}
if (status == NOSPACE) {
/*
* If specifying the other dump_level, makedumpfile tries
* to create a dumpfile with it again.
*/
num_retry++;
if ((info->dump_level = get_next_dump_level(num_retry)) < 0)
return FALSE;
MSG("Retry to create a dumpfile by dump_level(%d).\n",
info->dump_level);
if (!delete_dumpfile())
return FALSE;
goto retry;
}
--
Thanks.
HATAYAMA, Daisuke
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
next prev parent reply other threads:[~2014-09-18 12:30 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-16 9:48 [PATCH 2/2] makedumpfile: make the incomplete vmcore generated by ENOSPC error analyzable Wang, Xiao/Wang Xiao
2014-09-17 6:47 ` Petr Tesarik
2014-09-18 7:26 ` Wang, Xiao/Wang Xiao
2014-09-18 11:18 ` HATAYAMA, Daisuke
2014-09-18 12:29 ` HATAYAMA, Daisuke [this message]
2014-09-19 9:16 ` Wang, Xiao/Wang Xiao
2014-09-19 9:22 ` [PATCH 1/3] makedumpfile: make get_elf64_phdr()/get_elf32_phdr() public Wang, Xiao/Wang Xiao
[not found] <mailman.12983.1410860958.22890.kexec@lists.infradead.org>
2014-09-16 13:20 ` [PATCH 2/2] makedumpfile: make the incomplete vmcore generated by ENOSPC error analyzable Dave Anderson
2014-09-17 2:41 ` Wang, Xiao/Wang Xiao
2014-09-17 13:55 ` Dave Anderson
2014-09-18 7:19 ` Wang, Xiao/Wang Xiao
2014-09-18 13:56 ` Dave Anderson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=541AD00F.1070004@jp.fujitsu.com \
--to=d.hatayama@jp.fujitsu.com \
--cc=kexec@lists.infradead.org \
--cc=wangx.fnst@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox