Kexec Archive on lore.kernel.org
From: Dave Anderson <anderson@redhat.com>
To: Jay Lan <jlan@sgi.com>
Cc: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>,
	Bernhard Walle <bwalle@suse.de>,
	kexec@lists.infradead.org
Subject: Re: the exiting makedumpfile is almost there... :)
Date: Thu, 11 Sep 2008 10:13:55 -0400
Message-ID: <48C927A3.2000302@redhat.com>
In-Reply-To: <48C85836.8080606@sgi.com>

Jay Lan wrote:
> After getting around a few kdump kernel panic/hang, i finally was
> able to complete a kdump vmcore with 2.6.27-rc5. The system under
> testing was an IA64 with 128 cpu and 256G memory A4700 system.
> 
> The /proc/vmcore is:
> a4700rac:/boot # ll /proc/vmcore
> -r-------- 1 root root 263006257684 2008-09-10 14:45 /proc/vmcore
> a4700rac:/boot # ls -lh /proc/vmcore
> -r-------- 1 root root 245G 2008-09-10 14:44 /proc/vmcore
> 
> Time spent in saving the vmcore using cp was 7 min 17 sec:
> 
> a4700rac:/boot # date; cp /proc/vmcore /mnt/sda9/diskdump/vmcore-cp; date
> Wed Sep 10 14:34:18 PDT 2008
> Wed Sep 10 14:41:35 PDT 2008
> 
> Time spent with 'makedumpfile -c -d31' was 1 min 40 sec:
> 
> a4700rac:/boot # date; makedumpfile -c -d31 -x
> /boot/vmlinux-2.6.27-rc5-default /proc/vmcore
> /mnt/sda9/diskdump/vmcore-2.6.27-rc5-default; date
> Wed Sep 10 14:31:56 PDT 2008
> Can't distinguish the pgtable.
> The kernel version is not supported.
> The created dumpfile may be incomplete.
> Copying data                       : [100 %]
> 
> The dumpfile is saved to /mnt/sda9/diskdump/vmcore-2.6.27-rc5-default.
> 
> makedumpfile Completed.
> Wed Sep 10 14:33:36 PDT 2008
> 
> 
> The fact that it took only 1 min 40 sec in running makedumpfile was
> EXCELLENT and EXCITING!!! Remember last time i tested on a 256 cpu
> 1TB A4700? It took 18 hours to complete the makedumpfile. What an
> improvement!
> 
> Hmmm, the reason it is only "almost there" was that crash failed
> to analyze the output of makedumpfile. :(  Crash was happy with
> the vmcore saved with 'cp' command.
> 
> a4700rac:/var/tmp/jlan # crash -d 1 /boot/vmlinux-2.6.27-rc5-default
> /mnt/sda9/diskdump/vmcore-2.6.27-rc5-default
> 
> crash 4.0-4.10
> Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007  Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006  IBM Corporation
> Copyright (C) 1999-2006  Hewlett-Packard Co
> Copyright (C) 2005, 2006  Fujitsu Limited
> Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> Copyright (C) 2005  NEC Corporation
> Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions.  Enter "help copying" to see the conditions.
> This program has absolutely no warranty.  Enter "help warranty" for details.
> 
> crash: xc_core_elf_verify: not a xen ELF core file
> diskdump_data:
>              flags: 6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED)
>                dfd: 3
>                ofp: 0
>       machine_type: 50 (EM_IA_64)
> 
>             header: 6000000001142c70
>            signature: "KDUMP   "
>       header_version: 1
>              utsname:
>                sysname:
>               nodename:
>                release:
>                version:
>                machine:
>             domainname:
>            timestamp:
>                 tv_sec: 0
>                tv_usec: 0
>               status: 0 ()
>           block_size: 65536
>         sub_hdr_size: 1
>        bitmap_blocks: 2076
>            max_mapnr: 543813611
>     total_ram_blocks: 0
>        device_blocks: 0
>       written_blocks: 0
>          current_cpu: 0
>              nr_cpus: 1
>       tasks[nr_cpus]: 0
> 
>         sub_header: 0 (n/a)
> 
>   sub_header_kdump: 6000000001152c80
>            phys_base: 6044000000
>           dump_level: 31 (0x1f)
> (DUMP_EXCLUDE_ZERO|DUMP_EXCLUDE_CACHE|DUMP_EXCLUDE_CACHE_PRI|DUMP_EXCLUDE_USER_DATA|DUMP_EXCLUDE_FREE)
> 
>        data_offset: 81e0000
>         block_size: 65536
>        block_shift: 16
>             bitmap: 2000000000530010
>         bitmap_len: 136052736
>    dumpable_bitmap: 2000000008700010
>               byte: 0
>                bit: 0
>    compressed_page: 6000000001162c90
>          curbufptr: 0
> 
>  page_cache_hdr[0]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 20000000109e0010
>         pg_hit_count: 0
>  page_cache_hdr[1]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 20000000109f0010
>         pg_hit_count: 0
>  page_cache_hdr[2]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010a00010
>         pg_hit_count: 0
>  page_cache_hdr[3]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010a10010
>         pg_hit_count: 0
>  page_cache_hdr[4]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010a20010
>         pg_hit_count: 0
>  page_cache_hdr[5]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010a30010
>         pg_hit_count: 0
>  page_cache_hdr[6]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010a40010
>         pg_hit_count: 0
>  page_cache_hdr[7]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010a50010
>         pg_hit_count: 0
>  page_cache_hdr[8]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010a60010
>         pg_hit_count: 0
>  page_cache_hdr[9]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010a70010
>         pg_hit_count: 0
> page_cache_hdr[10]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010a80010
>         pg_hit_count: 0
> page_cache_hdr[11]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010a90010
>         pg_hit_count: 0
> page_cache_hdr[12]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010aa0010
>         pg_hit_count: 0
> page_cache_hdr[13]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010ab0010
>         pg_hit_count: 0
> page_cache_hdr[14]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010ac0010
>         pg_hit_count: 0
> page_cache_hdr[15]:
>             pg_flags: 0 ()
>              pg_addr: 0
>            pg_bufptr: 2000000010ad0010
>         pg_hit_count: 0
> 
>     page_cache_buf: 20000000109e0010
>        evict_index: 0
>          evictions: 0
>           accesses: 0
>       cached_reads: 0
>        valid_pages: 20000000108d0010
> compressed kdump: phys_start: 6044000000
> gdb /boot/vmlinux-2.6.27-rc5-default
> GNU gdb 6.1
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "ia64-unknown-linux-gnu"...
> 
> crash: CONFIG_HZ: 250
> crash: CONFIG_NR_CPUS: 512
> verify_namelist:
> /proc/version:
> Linux version 2.6.27-rc5-default (jlan@jackhammer) (gcc version 4.1.2
> 20070115 (SUSE Linux)) #61 SMP Wed Sep 10 14:21:26 PDT 2008
> utsname version: #61 SMP Wed Sep 10 14:21:26 PDT 2008
> /boot/vmlinux-2.6.27-rc5-default:
> Linux version 2.6.27-rc5-default (jlan@jackhammer) (gcc version 4.1.2
> 20070115 (SUSE Linux)) #61 SMP Wed Sep 10 14:21:26 PDT 2008
> 
> WARNING: Because this kernel was compiled with gcc version 4.1.2, certain
>          commands or command options may fail unless crash is invoked with
>          the  "--readnow" command line option.
> 
> crash: get_cpus_online: online: 128
> node_table[0]:
>              id: 0
>           pgdat: 0
>            size: 543813632
>         present: 73014444033
>         mem_map: 0
>     start_paddr: 0
>     start_mapnr: 0
> NOTE: page_hash_table does not exist in this kernel
> crash: page excluded: kernel virtual address: e000006003108e00  type:
> "runqueues entry (per_cpu)"
> a4700rac:/var/tmp/jlan #
> 

Jay,

Ken'ichi's suggestion to update your crash version is a good one,
although it's noteworthy that "Crash was happy with the vmcore saved
with 'cp' command".

At first I thought that the "phys_start" value of 6044000000 was
bizarre, but then again, this is an SGI machine, and it must
be correct since it was able to read the "linux_banner" string
from the mapped kernel region (as evidenced by the output above
showing "/proc/version: ...").  You can always verify that value
by running on the live system or against the "cp" generated dump:

   crash> help -m | grep phys_start

In any case, the node_table data looks bogus, and there was a
change in 4.0-4.12 that comes to mind:

4.0-4.12 - Fix for the "kmem -n" command to handle the 2.6.24 kernel replacement
            of the "node_online_map" nodemask with its appropriate entry in the
            new "node_states[]" nodemask array.  Without the patch, the per-node
            zone data would not be displayed, and any commands depending upon
            the node table data would be affected.  (anderson@redhat.com)

But the crash session would at least initialize properly, as yours did when
running with the "cp" dumpfile.  Anyway, please update your crash version.
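One way to see why the node_table output looks suspect is to compare its
"size" and "present" fields.  The sketch below is only an illustration: it
assumes that "size" is the number of pages the node spans and "present" is
the number of pages actually populated, so "present" should never be the
larger of the two.  The function name is hypothetical; the numbers are
copied from the debug output quoted above.

```python
def node_counts_plausible(size_pages, present_pages):
    """A node cannot have more present pages than the pages it spans.

    Assumption: 'size' is the spanned page count and 'present' is the
    populated page count, as the field names suggest.
    """
    return 0 <= present_pages <= size_pages


size = 543813632        # "size" from node_table[0]
present = 73014444033   # "present" from node_table[0]

print(node_counts_plausible(size, present))
# Printing the value in hex can make a misread field easier to spot.
print(hex(present))
```

Under that assumption the check fails for node_table[0], which is consistent
with the node data having been read incorrectly rather than being a real
page count.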

Then, when it tried to read a per-cpu runqueue structure it ran into
the "page excluded" error.  One thing to verify is that the per-cpu
address is being correctly generated.  Using the "cp"-generated dumpfile,
enter "per_cpu__runqueues" on the command line, as in this RHEL5/ia64
example:

   crash> per_cpu__runqueues
   PER-CPU DATA TYPE:
     struct rq per_cpu__runqueues;
   PER-CPU ADDRESSES:
     [0]: e000000004e04be0
     [1]: e000000004e14be0
   crash>

My guess is that the runqueue address you see for cpu 0 will be the excluded
e000006003108e00.  If that's true, then makedumpfile does appear to be
excluding the page, and that page -- where the runqueue data structure(s)
exist -- is absolutely essential to initializing the crash session.
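To make the exclusion check concrete, here is a minimal Python sketch of how
a kernel virtual address could be mapped to a bit in a per-page bitmap.  It
rests on several assumptions that are not stated in this thread: that the
address lies in the ia64 identity-mapped kernel region with a base of
0xe000000000000000, that the page frame number is the physical address
shifted right by the "block_shift" of 16 shown in the debug output, and that
the bitmap holds one bit per page, least-significant bit first.  The helper
names and the toy bitmap are hypothetical; nothing here is read from the
actual dumpfile.

```python
# Assumed ia64 region-7 base (identity-mapped kernel memory); not from the thread.
IA64_PAGE_OFFSET = 0xE000000000000000
BLOCK_SHIFT = 16  # "block_shift: 16" in the crash -d 1 output


def vaddr_to_pfn(vaddr, page_offset=IA64_PAGE_OFFSET, shift=BLOCK_SHIFT):
    """Translate an identity-mapped kernel virtual address to a page frame number."""
    paddr = vaddr - page_offset
    return paddr >> shift


def page_is_dumpable(bitmap, pfn):
    """Return True if the bit for pfn is set (one bit per page, LSB first)."""
    return bool(bitmap[pfn >> 3] & (1 << (pfn & 7)))


excluded_vaddr = 0xE000006003108E00  # address from the "page excluded" error
pfn = vaddr_to_pfn(excluded_vaddr)
print(hex(pfn))

# Toy bitmap (hypothetical): just large enough to hold the bit for this pfn.
toy_bitmap = bytearray((pfn >> 3) + 1)
print(page_is_dumpable(toy_bitmap, pfn))   # bit clear: page treated as excluded

toy_bitmap[pfn >> 3] |= 1 << (pfn & 7)
print(page_is_dumpable(toy_bitmap, pfn))   # bit set: page treated as present
```

If the real dumpable bitmap has the bit for this page frame cleared, a read
of the runqueue address would fail in the way shown in the quoted output.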

Dave







