* "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash
@ 2008-10-01 19:19 Worth, Kevin
       [not found] ` <48E3D2EB.4030301@redhat.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Worth, Kevin @ 2008-10-01 19:19 UTC (permalink / raw)
  To: kexec-ml, crash-utility@redhat.com


[-- Attachment #1.1: Type: text/plain, Size: 3449 bytes --]

Hello kexec and crash mailing lists,

Sorry to spam whoever's code this ISN'T an issue with, but I really am unsure whether this is a kdump or a crash issue. I am running Ubuntu 7.04 with a 2.6.20 kernel (includes Ubuntu's patches; source at http://packages.ubuntu.com/feisty/linux-source-2.6.20 ) and a modified VMSPLIT/PAGE_OFFSET value (see bottom for details) on an i386 machine with 4GB of memory. At first I thought this could be an issue with makedumpfile stripping out things it shouldn't, but I've found that setting up my initrd script so that it simply performs "cp /proc/vmcore /var/crash/vmcore" results in the same issue.

I've tried this with both crash 4.0-6.3 and 4.0-7.2 and get the same result. Unfortunately I'm locked at kernel 2.6.20 for other reasons, or else I would try a newer kernel.

If anyone can offer suggestions of what to try, please let me know. If this is something that has already been resolved elsewhere, sorry to waste time, and if someone can point me to what resolved it, perhaps I can look at backporting the fix myself. Thanks for your time.

crash-4.0-7.2$ ./crash ~/vmcore ~/targetfiles/vmlinux-2.6.20-17.39-custom2

crash 4.0-7.2
<snip>Copyright notices...</snip>
GNU gdb 6.1
<snip>Copyright notices...</snip>
This GDB was configured as "i686-pc-linux-gnu"...

please wait... (gathering module symbol data)
WARNING: cannot access vmalloc'd module memory

      KERNEL: /home/worthk/targetfiles/vmlinux-2.6.20-17.39-custom2
    DUMPFILE: /home/worthk/vmcore
        CPUS: 2
        DATE: Wed Oct  1 12:30:50 2008
      UPTIME: 00:35:11
LOAD AVERAGE: 0.07, 0.09, 0.08
       TASKS: 94
    NODENAME: test-module
     RELEASE: 2.6.20-17.39-custom2
     VERSION: #3 SMP Wed Sep 24 10:11:03 PDT 2008
     MACHINE: i686  (2200 Mhz)
      MEMORY: 5 GB
<6>SysRq : Trigger a crashdump"
         PID: 4304
     COMMAND: "bash"
        TASK: 5d7e9030  [THREAD_INFO: f4b70000]
         CPU: 0
       STATE: TASK_RUNNING (SYSRQ)

crash> mod -s test
mod: cannot access vmalloc'd module memory


My kernel config is a bit outside the norm, in that the VMSPLIT value has been modified to give 3GB of memory to kernelspace and 1GB of memory to userspace. Below is a diff between the default Ubuntu "generic" config and mine:

diff /boot/config-2.6.20-17-generic /boot/config-2.6.20-17.37-custom2
3,4c3,4
< # Linux kernel version: 2.6.20-17-generic
< # Wed Aug 20 14:43:36 2008
---
> # Linux kernel version: 2.6.20-17.37-custom2
> # Tue Aug 19 18:50:53 2008
33c33
< CONFIG_VERSION_SIGNATURE="Ubuntu 2.6.20-17.39-generic"
---
> CONFIG_VERSION_SIGNATURE="Ubuntu 2.6.20-17.37-generic"
51c51
< # CONFIG_EMBEDDED is not set
---
> CONFIG_EMBEDDED=y
188,190c188,194
< CONFIG_HIGHMEM4G=y
< # CONFIG_HIGHMEM64G is not set
< CONFIG_PAGE_OFFSET=0xC0000000
---
> # CONFIG_HIGHMEM4G is not set
> CONFIG_HIGHMEM64G=y
> # CONFIG_VMSPLIT_3G is not set
> # CONFIG_VMSPLIT_3G_OPT is not set
> # CONFIG_VMSPLIT_2G is not set
> CONFIG_VMSPLIT_1G=y
> CONFIG_PAGE_OFFSET=0x40000000
191a196
> CONFIG_X86_PAE=y
204c209
< # CONFIG_RESOURCES_64BIT is not set
---
> CONFIG_RESOURCES_64BIT=y
1161a1167
> CONFIG_IDE_MAX_HWIFS=4
1443a1450
> # CONFIG_PATA_PLATFORM is not set
1525a1533
> CONFIG_I2O_EXT_ADAPTEC_DMA64=y


Kevin Worth
Network Security Software Engineer
ProCurve networking by HP
kevin.worth@hp.com<mailto:kevin.worth@hp.com>
ph 916.785.4528
fx 916.785.1196

[-- Attachment #1.2: Type: text/html, Size: 12842 bytes --]

[-- Attachment #2: Type: text/plain, Size: 143 bytes --]

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* Re: [Crash-utility] "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash
       [not found]       ` <28113A0489833849A1CF9EC16F4215434376E9D234@GVW1097EXB.americas.hpqcorp.net>
@ 2008-10-03 15:43         ` Dave Anderson
  2008-10-04 17:34           ` Worth, Kevin
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Anderson @ 2008-10-03 15:43 UTC (permalink / raw)
  To: Discussion list for crash utility usage, maintenance and development,
	kexec


NOTE: I've restored the kexec list to this discussion because
this 1G/3G issue does have ramifications w/respect to kexec-tools.
I'm first going to ramble on about crash utility debugging for a
bit here, but for the kexec/kdump masters in the audience, please
at least take a look at the end of this message (do a "find in this
message" for "KEXEC-KDUMP") where I discuss the kexec-tools
hardwiring of the x86 PAGE_OFFSET to c0000000, and whether it
could screw up the dumpfile contents for Kevin's 1G/3G split
where his PAGE_OFFSET is 40000000.

First, the crash discussion...

Worth, Kevin wrote:
> Yep, I can run mod commands on a live system just fine.
> 
> Looks like "next" doesn't point to fffffffc...
> 

No, but it's 0x0, and therefore the "next" module in the
list gets calculated as 0 - offset-of-list-member, or
fffffffc.  And "MODULE_STATE_LIVE" is being shown by dumb
luck because its enumerator value is 0:

> crash> module f9088280
> struct module {
>   state = MODULE_STATE_LIVE,
>   list = {
>     next = 0x0,
>     prev = 0x0
>   },
>   name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000",
>   mkobj = {
>     kobj = {
>       k_name = 0x0,
>       name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000",
>       kref = {
>         refcount = {
>           counter = 0
...
>
> ...and all the rest of the struct is zeros too...

Right, so we know bogus data is being read from the dumpfile.  The question is:

(1) whether the virtual-to-physical address translation is
     failing somehow, or
(2) the dumpfile is screwed up.
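
For anyone following along, the fffffffc value mentioned above is just
32-bit pointer arithmetic wrapping around.  A minimal sketch, assuming
the "list" member sits at offset 4 in struct module (the function name
and constants are made up for illustration):

```python
MASK32 = 0xFFFFFFFF  # i386 pointers are 32 bits wide
LIST_OFFSET = 4      # assumed: 'list' follows the 4-byte 'state' enum


def module_from_list_ptr(list_ptr, list_offset=LIST_OFFSET):
    """Return the struct module address that contains the given list pointer.

    Mimics unsigned 32-bit arithmetic: container = list_ptr - offset.
    """
    return (list_ptr - list_offset) & MASK32


# A valid list pointer yields the address of the enclosing module.
print(hex(module_from_list_ptr(0xF9088284)))
# A zeroed 'next' pointer wraps around instead of going negative.
print(hex(module_from_list_ptr(0x0)))
```

So a module whose list.next reads back as 0x0 makes the list walk jump
to a nonsensical address near the top of the address space.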

> Does the following mean that user virtual address translations are failing too?
> 
> crash> set
>     PID: 4304
> COMMAND: "bash"
>    TASK: 5d7e9030  [THREAD_INFO: f4b70000]
>     CPU: 0
>   STATE: TASK_RUNNING (SYSRQ)
> crash> vm
> PID: 4304   TASK: 5d7e9030  CPU: 0   COMMAND: "bash"
>    MM       PGD      RSS    TOTAL_VM
> f7e7f040  5d5002c0  2616k    3972k
>   VMA       START      END    FLAGS  FILE
> 5fe454ec   8048000   80ee000   1875  /bin/bash
> 5fe45e34   80ee000   80f3000 101877  /bin/bash
> ...
> 
> crash> rd 8048000
> rd: invalid kernel virtual address: 8048000  type: "32-bit KVADDR"
> crash> rd -u 8048000
> rd: invalid user virtual address: 8048000  type: "32-bit UVADDR"
> crash> rd 80ee000
> rd: invalid kernel virtual address: 80ee000  type: "32-bit KVADDR"
> crash> rd -u 80ee000
> rd: invalid user virtual address: 80ee000  type: "32-bit UVADDR"
>

The fact that crash initially presumes that 8048000 and 80ee000 are
kernel virtual addresses can be explained by this part of "help -v"
debug output:

flags: 515a
  (NODES_ONLINE|ZONES|PERCPU_KMALLOC_V2|COMMON_VADDR|KMEM_CACHE_INIT|FLATMEM|PERCPU_KMALLOC_V2_NODES)

The "COMMON_VADDR" flag should *only* be set in the case of
the Red Hat hugemem 4G/4G split kernel.  However, I believe that
crash should be able to continue even if the bit is set, as is
the case when you run live.  It is a crash issue having to
do with your 40000000 PAGE_OFFSET, but I think it's benign,
especially if user virtual address accesses run OK on your
live system.

That's one thing that needs verification.  The "invalid user
virtual address" messages above that you get *even* when you
use "-u" would typically be generated as a result of a failed user
virtual-to-physical address translation. However, they also
could be generated if the virtual page being accessed has been
swapped out.

A better test would be to translate all virtual addresses in the
user address space in one fell swoop with "vm -p".  It's
a verbose command, but for each user virtual page in the
current context, it will translate it to:

(1) the current physical address location, or
(2) if it's not in memory, but is backed by a file, it will show
     what file it comes from, or
(3) if it's been swapped out, what swapfile location it has been
     swapped out to, or
(4) if it's an anonymous page (with no file backing) that hasn't
     been touched yet, it will show "(not mapped)"

Here's a truncated example:

   PID: 19839  TASK: f7b03000  CPU: 1   COMMAND: "bash"
      MM       PGD      RSS    TOTAL_VM
   f6dc5740  f745c9c0  1392k    4532k
     VMA       START      END    FLAGS  FILE
   f69019bc    6fa000    703000     75  /lib/libnss_files-2.5.so
   VIRTUAL   PHYSICAL
   6fa000    12fdba000
   6fb000    12fdbb000
   6fc000    FILE: /lib/libnss_files-2.5.so  OFFSET: 2000
   6fd000    FILE: /lib/libnss_files-2.5.so  OFFSET: 3000
   6fe000    12f660000
   6ff000    12f2cf000
   700000    FILE: /lib/libnss_files-2.5.so  OFFSET: 6000
   701000    FILE: /lib/libnss_files-2.5.so  OFFSET: 7000
   702000    12fc6f000
     VMA       START      END    FLAGS  FILE
   f69013e4    703000    704000 100071  /lib/libnss_files-2.5.so
   VIRTUAL   PHYSICAL
   703000     54791000
     VMA       START      END    FLAGS  FILE
   f6901d84    704000    705000 100073  /lib/libnss_files-2.5.so
   VIRTUAL   PHYSICAL
   704000    12450d000
     VMA       START      END    FLAGS  FILE
   f6901284    a7c000    a96000    875  /lib/ld-2.5.so
   VIRTUAL   PHYSICAL
   a7c000     6ea28000
   a7d000    101f62000
   a7e000     6e6f3000
   a7f000     6e07e000
   a80000     6e084000
   a81000    114c8e000
   ...

Run the command above on a "bash" context on *both* the live
system and the dumpfile -- they should behave in a similar manner,
but I'm guessing you may get some bizarre errors when you
run it on the dumpfile.

Getting back to the base problem with the bogus module read,
here's a suggestion for debugging this.  It requires that you
run the live system, gather some basic data with the crash
utility, and then enter "alt-sysrq-c".  What we want to see
is a virtual-to-physical translation of the first module in
the module list on the live system.  Then crash the system.
Then we want to do the same thing on the subsequent vmcore
to see if the same physical address references are made during
the translation.

So for example, on my live system, the "/dev/crash" kernel
module is the last module entered, and therefore is pointed
to by the base kernel's "modules" list_head:

   crash> p modules
   modules = $2 = {
     next = 0xf8bd0904,
     prev = 0xf882b104
   }

Subtract 4 from the "next" pointer, and display the module:

   crash> module 0xf8bd0900
   struct module {
     state = MODULE_STATE_LIVE,
     list = {
       next = 0xf8caf984,
       prev = 0xc06787b0
     },
     name =    "crash",
     mkobj = {
       kobj = {
         k_name = 0xf8bd094c "crash",
         name = "crash",
         kref = {
           refcount = {
             counter = 2
           }
         },
     ...

Then translate it:

   crash> vtop 0xf8bd0900
   VIRTUAL   PHYSICAL
   f8bd0900  48ba1900

   PAGE DIRECTORY: c0724000
     PGD: c0724018 => 4001
     PMD:     4e28 => 37ae067
     PTE:  37aee80 => 48ba1163
    PAGE: 48ba1000

     PTE     PHYSICAL  FLAGS
   48ba1163  48ba1000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)

     PAGE     PHYSICAL   MAPPING    INDEX CNT FLAGS
   c1917420   48ba1000         0    785045  1 c0000000
   crash>

Do the same type of thing on your live system (where you'll
have a different module), and save the output in a file.
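
For reference, the PGD/PMD/PTE lines in the vtop output above follow
the usual three-level PAE walk.  The sketch below only illustrates how
the entry addresses are derived; it assumes 4 KiB pages (no large-page
handling), and the function name and the dictionary standing in for
memory reads are hypothetical, not crash internals:

```python
ADDR_MASK = 0x000FFFFFFFFFF000  # frame-address bits of a 64-bit PAE entry


def pae_walk(vaddr, pgd_base, read_entry):
    """Return (pgd_addr, pmd_addr, pte_addr, phys) for a 4 KiB-page mapping.

    pgd_base is shown by vtop as a kernel virtual address; the PMD and PTE
    entry addresses derived from the entries are physical addresses.
    """
    # Bits 31-30 select one of the 4 page-directory-pointer entries.
    pgd_addr = pgd_base + ((vaddr >> 30) & 0x3) * 8
    # Bits 29-21 index the 512-entry page middle directory.
    pmd_addr = (read_entry(pgd_addr) & ADDR_MASK) + ((vaddr >> 21) & 0x1FF) * 8
    # Bits 20-12 index the 512-entry page table.
    pte_addr = (read_entry(pmd_addr) & ADDR_MASK) + ((vaddr >> 12) & 0x1FF) * 8
    # Bits 11-0 are the byte offset inside the page.
    phys = (read_entry(pte_addr) & ADDR_MASK) + (vaddr & 0xFFF)
    return pgd_addr, pmd_addr, pte_addr, phys


# Entry values copied from the vtop output above.
entries = {0xC0724018: 0x4001, 0x4E28: 0x37AE067, 0x37AEE80: 0x48BA1163}
print([hex(a) for a in pae_walk(0xF8BD0900, 0xC0724000, entries.__getitem__)])
```

If the live system and the dumpfile show the same entry values at each
level, the translation itself is behaving identically in both cases.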

Then immediately enter "alt-sysrq-c".

With the resultant dumpfile, perform the same "p modules",
"module <next-address-4>", and "vtop <next-address-4>" steps
as done above.  The output *should* be identical, although
we're primarily interested in the vtop output given that
the "module <next-address-4>" will probably show garbage.

(BTW, this presumes that the first module in the kernel list
will still return bogus data like your current dumpfile.  That
may not be the case, and if so, we'll need to do something
similar but different. For example, on the live system, capture
the address of the "ext3" module, vtop it, crash the system,
and then do the same thing in the dumpfile.  You might want to
do that anyway, just in case the default behavior is different.
Then again, maybe it will work both live and in the dumpfile
for the ext3 module address, in which case we'll need to go in a
different debug-direction...)

Show the outputs of the live system and the subsequent dumpfile.
If they both end up resolving to the same physical address,
then there's an issue with the dumpfile.

KEXEC-KDUMP:

I talked to Vivek Goyal, who originally wrote the kexec-tools
facility, and he pointed me to this in the kexec-tools package's
"kexec/arch/i386/crashdump-x86.h" file:

   #define PAGE_OFFSET     0xc0000000
   #define __pa(x)         ((unsigned long)(x)-PAGE_OFFSET)

   #define __VMALLOC_RESERVE       (128 << 20)
   #define MAXMEM                  (-PAGE_OFFSET-__VMALLOC_RESERVE)

where for x86, it hard-wires the x86 PAGE_OFFSET to c0000000,
and will certainly result in a bogus MAXMEM given that your
PAGE_OFFSET is 40000000.  I don't know if that is related to
the problem, but if you do a "readelf -a" of your vmcore file,
you'll see some funky virtual address values for each PT_LOAD
segment.  They were dumped in the crash.log you sent me.  Note
that the virtual address regions (p_vaddr) are c0000000,
c0100000, c5000000, ffffffffffffffff and ffffffffffffffff,
all of which are incorrect or nonsensical w/respect to your
1G/3G split:

Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 728 (2d8)
                 p_vaddr: c0000000
                 p_paddr: 0
                p_filesz: 655360 (a0000)
                 p_memsz: 655360 (a0000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 656088 (a02d8)
                 p_vaddr: c0100000
                 p_paddr: 100000
                p_filesz: 15728640 (f00000)
                 p_memsz: 15728640 (f00000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 16384728 (fa02d8)
                 p_vaddr: c5000000
                 p_paddr: 5000000
                p_filesz: 855638016 (33000000)
                 p_memsz: 855638016 (33000000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 872022744 (33fa02d8)
                 p_vaddr: ffffffffffffffff
                 p_paddr: 38000000
                p_filesz: 2272854016 (87790000)
                 p_memsz: 2272854016 (87790000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 3144876760 (bb7302d8)
                 p_vaddr: ffffffffffffffff
                 p_paddr: 100000000
                p_filesz: 1073741824 (40000000)
                 p_memsz: 1073741824 (40000000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
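
To put numbers on the MAXMEM remark, here is a rough sketch of what
the macro evaluates to for each PAGE_OFFSET.  It assumes the
expression is computed in 32-bit unsigned arithmetic (i.e. a 32-bit
kexec-tools build, where unsigned long is 32 bits); the Python names
are only for illustration:

```python
MASK32 = 0xFFFFFFFF
VMALLOC_RESERVE = 128 << 20  # same value as __VMALLOC_RESERVE in the header


def maxmem(page_offset):
    """Evaluate (-PAGE_OFFSET - __VMALLOC_RESERVE) modulo 2**32."""
    return (-page_offset - VMALLOC_RESERVE) & MASK32


# With the hard-wired PAGE_OFFSET from crashdump-x86.h.
print(hex(maxmem(0xC0000000)))
# With the PAGE_OFFSET actually configured in Kevin's kernel.
print(hex(maxmem(0x40000000)))
```

The first result is 38000000, which happens to match the p_paddr at
which p_vaddr switches to ffffffffffffffff in the listing above, so
that may be where the cut-over comes from.  With a PAGE_OFFSET of
40000000 the same expression gives b8000000.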

Now, the crash utility only uses the p_paddr physical address
fields for x86 dumpfiles, so that shouldn't be a problem.

But I wonder whether, when /proc/vmcore is put together,
there is some problem with the data that it accesses?

Thanks,
   Dave

* RE: [Crash-utility] "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash
  2008-10-03 15:43         ` [Crash-utility] " Dave Anderson
@ 2008-10-04 17:34           ` Worth, Kevin
  2008-10-04 17:47             ` Worth, Kevin
  2008-10-06 15:10             ` Dave Anderson
  0 siblings, 2 replies; 8+ messages in thread
From: Worth, Kevin @ 2008-10-04 17:34 UTC (permalink / raw)
  To: Discussion list for crash utility usage,	maintenance and development,
	kexec-ml

Hi Dave (and kexec list),

Including kexec list in this email because Dave mentioned: "Show the outputs of the live system and the subsequent dumpfile. If they both end up resolving to the same physical address, then there's an issue with the dumpfile." Things appear to be resolving to the same address (though I suspect Dave can confirm).

Please see below. I did have to censor one of the lines a bit: I do have a proprietary kernel module that I'm not able to discuss much about other than that it involves network traffic and should not be causing any funky behavior here (the ext3 module seems to behave the same way). I just changed its name to "custom_lkm".

One additional note, although my "running system" kernel has the modified 3G kernel / 1G user split, my "capture kernel" is just the standard Ubuntu kernel (with 1G kernel / 3G user). The system panicked immediately if I tried to use my modified kernel as the "capture kernel". I figured this was outside the norm so I've been using the standard kernel to perform the capture.

First I ran through some commands Dave suggested on the live system (my contexts for the live system and dump were different, but what might be more important is that "vm -p" on a live system produced errors, while on the dump it did not):

crash> vm
PID: 32227  TASK: 47bc8030  CPU: 0   COMMAND: "crash"
   MM       PGD      RSS    TOTAL_VM
f7e67040  5fddfe00  63336k   67412k
  VMA       START      END    FLAGS  FILE
f3ed61d4   8048000   83e5000   1875  /root/crash
f3ed6d84   83e5000   83fc000 101877  /root/crash
....


crash> vm -p
PID: 32227  TASK: 47bc8030  CPU: 0   COMMAND: "crash"
   MM       PGD      RSS    TOTAL_VM
f7e67040  5fddfe00  63336k   67412k
  VMA       START      END    FLAGS  FILE
f3ed61d4   8048000   83e5000   1875  /root/crash
VIRTUAL   PHYSICAL
vm: read error: physical address: 10b60b000  type: "page table"


crash> p modules
modules = $2 = {
  next = 0xf9088284,
  prev = 0xf8842104
}
crash> module 0xf9088280
struct module {
  state = MODULE_STATE_LIVE,
  list = {
    next = 0xf8ff9d84,
    prev = 0x403c63a4
  },
  name = "custom_lkm\000\000\000\000\000\000\000\000\000\000\000\000\000\000\00
0\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\00
0\000\000\000\000\000\000\000\000\000\000\000\000\000",
  mkobj = {
    kobj = {
      k_name = 0xf90882cc "custom_lkm",
      name = "custom_lkm\000\000\000\000\000\000\000\000",
      kref = {
        refcount = {
          counter = 3
        }
      },
      entry = {
        next = 0x403c6068,
        prev = 0xf8ff9de4
      },
...


crash> vtop 0xf9088280
VIRTUAL   PHYSICAL
f9088280  119b98280

PAGE DIRECTORY: 4044b000
  PGD: 4044b018 => 6001
  PMD:     6e40 => 1d515067
  PTE: 1d515440 => 119b98163
 PAGE: 119b98000

   PTE     PHYSICAL   FLAGS
119b98163  119b98000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)


crash> mod | grep ext3
f88c8000  ext3             132616  (not loaded)  [CONFIG_KALLSYMS]

crash> module 0xf88c8000
struct module {
  state = MODULE_STATE_LIVE,
  list = {
    next = 0xf88a6604,
    prev = 0xf885d584
  },
  name = "ext3\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
  mkobj = {
    kobj = {
      k_name = 0xf88c804c "ext3",
      name = "ext3\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",

      kref = {
        refcount = {
          counter = 3
        }
      },
      entry = {
        next = 0xf885d5e4,
        prev = 0xf88a6664
      },
...

(Realized afterward that I forgot to vtop ext3. Let me know if it's needed and I can repeat this procedure)


From dump file:

crash> vm
PID: 4323   TASK: 47be0a90  CPU: 0   COMMAND: "bash"
   MM       PGD      RSS    TOTAL_VM
5d683580  5d500dc0  2616k    3968k
  VMA       START      END    FLAGS  FILE
5fc2aac4   8048000   80ee000   1875  /bin/bash
5fe5f0cc   80ee000   80f3000 101877  /bin/bash
...


crash> vm -p
PID: 4323   TASK: 47be0a90  CPU: 0   COMMAND: "bash"
   MM       PGD      RSS    TOTAL_VM
5d683580  5d500dc0  2616k    3968k
  VMA       START      END    FLAGS  FILE
5fc2aac4   8048000   80ee000   1875  /bin/bash
VIRTUAL   PHYSICAL
8048000   FILE: /bin/bash  OFFSET: 0
8049000   FILE: /bin/bash  OFFSET: 1000
804a000   FILE: /bin/bash  OFFSET: 2000
...no errors, lots of output


crash> modules
modules = $2 = {
  next = 0xf9088284,
  prev = 0xf8842104
}

crash> module 0xf9088280
struct module {
  state = MODULE_STATE_LIVE,
  list = {
    next = 0x0,
    prev = 0x0
  },
  name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000",
  mkobj = {
    kobj = {
      k_name = 0x0,
      name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000",
      kref = {
        refcount = {
          counter = 0
        }
      },
      entry = {
        next = 0x0,
        prev = 0x0
...

crash> vtop 0xf9088280
VIRTUAL   PHYSICAL
f9088280  119b98280

PAGE DIRECTORY: 4044b000
  PGD: 4044b018 => 6001
  PMD:     6e40 => 1d515067
  PTE: 1d515440 => 119b98163
 PAGE: 119b98000

   PTE     PHYSICAL   FLAGS
119b98163  119b98000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)

  PAGE     PHYSICAL   MAPPING    INDEX CNT FLAGS
47337300  119b98000         0         0  1 80000000

crash> mod | grep ext3
mod: cannot access vmalloc'd module memory

(using the same address that ext3 had in a running system)
crash> module 0xf88c8000
struct module {
  state = MODULE_STATE_LIVE,
  list = {
    next = 0x0,
    prev = 0x0
  },
  name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000",
  mkobj = {
    kobj = {
      k_name = 0x0,
      name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000",
      kref = {
        refcount = {
          counter = 0
        }
      },
      entry = {
        next = 0x0,
        prev = 0x0
...

crash> vtop 0xf88c8000
VIRTUAL   PHYSICAL
f88c8000  13905f000

PAGE DIRECTORY: 4044b000
  PGD: 4044b018 => 6001
  PMD:     6e20 => 1d5fc067
  PTE: 1d5fc640 => 13905f163
 PAGE: 13905f000

   PTE     PHYSICAL   FLAGS
13905f163  13905f000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)

  PAGE     PHYSICAL   MAPPING    INDEX CNT FLAGS
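
Since crash is said to locate x86 dumpfile data through the p_paddr
fields, the two physical addresses from the vtop output above can be
mapped onto the PT_LOAD segments with a short sketch.  The segment
table is copied from the readelf listing of the earlier vmcore, so it
assumes this dump has the same layout; the function name is made up:

```python
# (p_offset, p_paddr, p_filesz) for each PT_LOAD segment in the listing
PT_LOAD = [
    (0x2D8,      0x0,         0xA0000),
    (0xA02D8,    0x100000,    0xF00000),
    (0xFA02D8,   0x5000000,   0x33000000),
    (0x33FA02D8, 0x38000000,  0x87790000),
    (0xBB7302D8, 0x100000000, 0x40000000),
]


def paddr_to_file_offset(paddr, segments=PT_LOAD):
    """Return (segment_index, file_offset) for paddr, or None if unmapped."""
    for index, (p_offset, p_paddr, p_filesz) in enumerate(segments):
        if p_paddr <= paddr < p_paddr + p_filesz:
            return index, p_offset + (paddr - p_paddr)
    return None


for paddr in (0x119B98280, 0x13905F000):  # custom_lkm and ext3 module pages
    print(hex(paddr), paddr_to_file_offset(paddr))
```

Under that assumption both module addresses fall in the last segment,
the one whose p_paddr starts at 100000000.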


Thanks again for your continued assistance. I hope this is helpful information.

-Kevin


-----Original Message-----
From: crash-utility-bounces@redhat.com [mailto:crash-utility-bounces@redhat.com] On Behalf Of Dave Anderson
Sent: Friday, October 03, 2008 8:44 AM
To: Discussion list for crash utility usage, maintenance and development; kexec@lists.infradead.org
Subject: Re: [Crash-utility] "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash


NOTE: I've restored the kexec list to this discussion because
this 1G/3G issue does have ramifications w/respect to kexec-tools.
I'm first going to ramble on about crash utility debugging for a
bit here, but for the kexec/kdump masters in the audience, please
at least take a look at the end of this message (do a "find in this
message" for "KEXEC-KDUMP") where I discuss the kexec-tools
hardwiring of the x86 PAGE_OFFSET to c0000000, and whether it
could screw up the dumpfile contents for Kevin's 1G/3G split
where his PAGE_OFFSET is 40000000.

First, the crash discussion...

Worth, Kevin wrote:
> Yep, I can run mod commands on a live system just fine.
>
> Looks like "next" doesn't point to fffffffc...
>

No, but it's 0x0, and therefore the "next" module in the
list gets calculated as 0 - offset-of-list-member, or
fffffffc.  And "MODULE_STATE_LIVE" is being shown by dumb
luck because its enumerator value is 0:

> crash> module f9088280
> struct module {
>   state = MODULE_STATE_LIVE,
>   list = {
>     next = 0x0,
>     prev = 0x0
>   },
>   name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000",
>   mkobj = {
>     kobj = {
>       k_name = 0x0,
>       name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000",
>       kref = {
>         refcount = {
>           counter = 0
...
>
> ...and all the rest of the struct is zeros too...

Right, so we know bogus data is being read from the dumpfile.  The question is:

(1) whether the virtual-to-physical address translation is
     failing somehow, or
(2) the dumpfile is screwed up.

> Does the following mean that user virtual address translations are failing too?
>
> crash> set
>     PID: 4304
> COMMAND: "bash"
>    TASK: 5d7e9030  [THREAD_INFO: f4b70000]
>     CPU: 0
>   STATE: TASK_RUNNING (SYSRQ)
> crash> vm
> PID: 4304   TASK: 5d7e9030  CPU: 0   COMMAND: "bash"
>    MM       PGD      RSS    TOTAL_VM
> f7e7f040  5d5002c0  2616k    3972k
>   VMA       START      END    FLAGS  FILE
> 5fe454ec   8048000   80ee000   1875  /bin/bash
> 5fe45e34   80ee000   80f3000 101877  /bin/bash
> ...
>
> crash> rd 8048000
> rd: invalid kernel virtual address: 8048000  type: "32-bit KVADDR"
> crash> rd -u 8048000
> rd: invalid user virtual address: 8048000  type: "32-bit UVADDR"
> crash> rd 80ee000
> rd: invalid kernel virtual address: 80ee000  type: "32-bit KVADDR"
> crash> rd -u 80ee000
> rd: invalid user virtual address: 80ee000  type: "32-bit UVADDR"
>

The fact that crash initially presumes that 8048000 and 80ee000 are
kernel virtual addresses can be explained by this part of "help -v"
debug output:

flags: 515a
  (NODES_ONLINE|ZONES|PERCPU_KMALLOC_V2|COMMON_VADDR|KMEM_CACHE_INIT|FLATMEM|PERCPU_KMALLOC_V2_NODES)

The "COMMON_VADDR" flag should *only* be set in the case of
the Red Hat hugemem 4G/4G split kernel.  However, I believe that
crash should be able to continue even if the bit is set, as is
the case when you run live.  It is a crash issue having to
do with your 40000000 PAGE_OFFSET, but I think it's benign,
especially if user virtual address accesses run OK on your
live system.

That's one thing that needs verification.  The "invalid user
virtual address" messages above that you get *even* when you
use "-u" would typically be generated as a result of a failed user
virtual-to-physical address translation. However, they also
could be generated if the virtual page being accessed has been
swapped out.

A better test would be to translate all virtual addresses in the
user address space in one fell swoop with "vm -p".  It's
a verbose command, but for each user virtual page in the
current context, it will translate it to:

(1) the current physical address location, or
(2) if it's not in memory, but is backed by a file, it will show
     what file it comes from, or
(3) if it's been swapped out, what swapfile location it has been
     swapped out to, or
(4) if it's an anonymous page (with no file backing) that hasn't
     been touched yet, it will show "(not mapped)"

Here's a truncated example:

   PID: 19839  TASK: f7b03000  CPU: 1   COMMAND: "bash"
      MM       PGD      RSS    TOTAL_VM
   f6dc5740  f745c9c0  1392k    4532k
     VMA       START      END    FLAGS  FILE
   f69019bc    6fa000    703000     75  /lib/libnss_files-2.5.so
   VIRTUAL   PHYSICAL
   6fa000    12fdba000
   6fb000    12fdbb000
   6fc000    FILE: /lib/libnss_files-2.5.so  OFFSET: 2000
   6fd000    FILE: /lib/libnss_files-2.5.so  OFFSET: 3000
   6fe000    12f660000
   6ff000    12f2cf000
   700000    FILE: /lib/libnss_files-2.5.so  OFFSET: 6000
   701000    FILE: /lib/libnss_files-2.5.so  OFFSET: 7000
   702000    12fc6f000
     VMA       START      END    FLAGS  FILE
   f69013e4    703000    704000 100071  /lib/libnss_files-2.5.so
   VIRTUAL   PHYSICAL
   703000     54791000
     VMA       START      END    FLAGS  FILE
   f6901d84    704000    705000 100073  /lib/libnss_files-2.5.so
   VIRTUAL   PHYSICAL
   704000    12450d000
     VMA       START      END    FLAGS  FILE
   f6901284    a7c000    a96000    875  /lib/ld-2.5.so
   VIRTUAL   PHYSICAL
   a7c000     6ea28000
   a7d000    101f62000
   a7e000     6e6f3000
   a7f000     6e07e000
   a80000     6e084000
   a81000    114c8e000
   ...

Run the command above on a "bash" context on *both* the live
system and the dumpfile -- they should behave in a similar manner,
but I'm guessing you may get some bizarre errors when you
run it on the dumpfile.

Getting back to the base problem with the bogus module read,
here's a suggestion for debugging this.  It requires that you
run the live system, gather some basic data with the crash
utility, and then enter "alt-sysrq-c".  What we want to see
is a virtual-to-physical translation of the first module in
the module list on the live system.  Then crash the system.
Then we want to do the same thing on the subsequent vmcore
to see if the same physical address references are made during
the translation.

So for example, on my live system, the "/dev/crash" kernel
module is the last module entered, and therefore is pointed
to by the base kernel's "modules" list_head:

   crash> p modules
   modules = $2 = {
     next = 0xf8bd0904,
     prev = 0xf882b104
   }

Subtract 4 from the "next" pointer, and display the module:

   crash> module 0xf8bd0900
   struct module {
     state = MODULE_STATE_LIVE,
     list = {
       next = 0xf8caf984,
       prev = 0xc06787b0
     },
     name =    "crash",
     mkobj = {
       kobj = {
         k_name = 0xf8bd094c "crash",
         name = "crash",
         kref = {
           refcount = {
             counter = 2
           }
         },
     ...

Then translate it:

   crash> vtop 0xf8bd0900
   VIRTUAL   PHYSICAL
   f8bd0900  48ba1900

   PAGE DIRECTORY: c0724000
     PGD: c0724018 => 4001
     PMD:     4e28 => 37ae067
     PTE:  37aee80 => 48ba1163
    PAGE: 48ba1000

     PTE     PHYSICAL  FLAGS
   48ba1163  48ba1000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)

     PAGE     PHYSICAL   MAPPING    INDEX CNT FLAGS
   c1917420   48ba1000         0    785045  1 c0000000
   crash>

Do the same type of thing on your live system (where you'll
have a different module), and save the output in a file.

Then immediately enter "alt-sysrq-c".

With the resultant dumpfile, perform the same "p modules",
"module <next-address-4>", and "vtop <next-address-4>" steps
as done above.  The output *should* be identical, although
we're primarily interested in the vtop output given that
the "module <next-address-4>" will probably show garbage.

(BTW, this presumes that the first module in the kernel list
will still return bogus data like your current dumpfile.  That
may not be the case, and if so, we'll need to do something
similar but different. For example, on the live system, capture
the address of the "ext3" module, vtop it, crash the system,
and then do the same thing in the dumpfile.  You might want to
do that anyway, just in case the default behavior is different.
Then again, maybe it will work both live and in the dumpfile
for the ext3 module address, in which case we'll need to go in a
different debug-direction...)

Show the outputs of the live system and the subsequent dumpfile.
If they both end up resolving to the same physical address,
then there's an issue with the dumpfile.

KEXEC-KDUMP:

I talked to Vivek Goyal, who originally wrote the kexec-tools
facility, and he pointed me to this in the kexec-tools package's
"kexec/arch/i386/crashdump-x86.h" file:

   #define PAGE_OFFSET     0xc0000000
   #define __pa(x)         ((unsigned long)(x)-PAGE_OFFSET)

   #define __VMALLOC_RESERVE       (128 << 20)
   #define MAXMEM                  (-PAGE_OFFSET-__VMALLOC_RESERVE)

where for x86, it hard-wires the x86 PAGE_OFFSET to c0000000,
and will certainly result in a bogus MAXMEM given that your
PAGE_OFFSET is 40000000.  I don't know if that is related to
the problem, but if you do a "readelf -a" of your vmcore file,
you'll see some funky virtual address values for each PT_LOAD
segment.  They were dumped in the crash.log you sent me.  Note
that the virtual address regions (p_vaddr) are c0000000,
c0100000, c5000000, ffffffffffffffff and ffffffffffffffff,
all of which are incorrect or nonsensical w/respect to your
1G/3G split:

Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 728 (2d8)
                 p_vaddr: c0000000
                 p_paddr: 0
                p_filesz: 655360 (a0000)
                 p_memsz: 655360 (a0000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 656088 (a02d8)
                 p_vaddr: c0100000
                 p_paddr: 100000
                p_filesz: 15728640 (f00000)
                 p_memsz: 15728640 (f00000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 16384728 (fa02d8)
                 p_vaddr: c5000000
                 p_paddr: 5000000
                p_filesz: 855638016 (33000000)
                 p_memsz: 855638016 (33000000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 872022744 (33fa02d8)
                 p_vaddr: ffffffffffffffff
                 p_paddr: 38000000
                p_filesz: 2272854016 (87790000)
                 p_memsz: 2272854016 (87790000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 3144876760 (bb7302d8)
                 p_vaddr: ffffffffffffffff
                 p_paddr: 100000000
                p_filesz: 1073741824 (40000000)
                 p_memsz: 1073741824 (40000000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0

Now, the crash utility only uses the p_paddr physical address
fields for x86 dumpfiles, so that shouldn't be a problem.
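
As a rough illustration of a p_paddr-only lookup (a sketch of the
idea, not the crash utility's actual code), the following maps a
physical address to a vmcore file offset using the five PT_LOAD
segments listed above.  The struct and function names are made up
for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* One PT_LOAD entry, reduced to the fields a physical lookup needs. */
struct pt_load {
    uint64_t p_offset;  /* file offset of the segment data */
    uint64_t p_paddr;   /* first physical address covered */
    uint64_t p_filesz;  /* number of bytes present in the file */
};

/* The five segments from the readelf output above. */
static const struct pt_load segments[] = {
    { 0x2d8ULL,      0x0ULL,         0xa0000ULL    },
    { 0xa02d8ULL,    0x100000ULL,    0xf00000ULL   },
    { 0xfa02d8ULL,   0x5000000ULL,   0x33000000ULL },
    { 0x33fa02d8ULL, 0x38000000ULL,  0x87790000ULL },
    { 0xbb7302d8ULL, 0x100000000ULL, 0x40000000ULL },
};

/* Return the file offset that holds physical address paddr, or
 * UINT64_MAX if no segment covers it.  p_vaddr is never consulted. */
static uint64_t paddr_to_file_offset(uint64_t paddr)
{
    for (size_t i = 0; i < sizeof(segments) / sizeof(segments[0]); i++) {
        const struct pt_load *s = &segments[i];

        if (paddr >= s->p_paddr && paddr - s->p_paddr < s->p_filesz)
            return s->p_offset + (paddr - s->p_paddr);
    }
    return UINT64_MAX;
}
```

With this table, physical address 0x100000 lands at file offset
0xa02d8, while an address such as 0x4000000 falls in the gap between
the second and third segments and is not in the file at all.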

But I wonder whether when the /proc/vmcore is put together
that there isn't some problem with the data that it accesses?

Thanks,
   Dave








--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [Crash-utility] "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash
  2008-10-04 17:34           ` Worth, Kevin
@ 2008-10-04 17:47             ` Worth, Kevin
  2008-10-06 15:10             ` Dave Anderson
  1 sibling, 0 replies; 8+ messages in thread
From: Worth, Kevin @ 2008-10-04 17:47 UTC (permalink / raw)
  To: Discussion list for crash utility usage,	maintenance and development,
	kexec-ml

Whoops! Important typo....  "other than that it involves network traffic and should NOT be causing any funky behavior here"

-Kevin

-----Original Message-----
From: crash-utility-bounces@redhat.com [mailto:crash-utility-bounces@redhat.com] On Behalf Of Worth, Kevin
Sent: Saturday, October 04, 2008 10:35 AM
To: Discussion list for crash utility usage, maintenance and development; kexec-ml
Subject: RE: [Crash-utility] "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash

Hi Dave (and kexec list),

Including kexec list in this email because Dave mentioned: "Show the outputs of the live system and the subsequent dumpfile. If they both end up resolving to the same physical address, then there's an issue with the dumpfile." Things appear to be resolving to the same address (though I suspect Dave can confirm).

Please see below. I did have to censor one of the lines a bit- I do have a proprietary kernel module that I'm not able to discuss much about other than that it involves network traffic and should be causing any funky behavior here (the ext3 module seems to behave the same way). I just changed its name to "custom_lkm".

One additional note, although my "running system" kernel has the modified 3G kernel / 1G user split, my "capture kernel" is just the standard Ubuntu kernel (with 1G kernel / 3G user). The system panic'ed immediately if I tried to use my modified kernel as the "capture kernel". I figured this was outside the norm so I've been using the standard kernel to perform the capture.

First I ran through some commands Dave suggested on the live system (my contexts for the live system and dump were different, but what might be more important is that "vm -p" on a live system produced errors, while on the dump it did not):

crash> vm
PID: 32227  TASK: 47bc8030  CPU: 0   COMMAND: "crash"
   MM       PGD      RSS    TOTAL_VM
f7e67040  5fddfe00  63336k   67412k
  VMA       START      END    FLAGS  FILE
f3ed61d4   8048000   83e5000   1875  /root/crash
f3ed6d84   83e5000   83fc000 101877  /root/crash
....


crash> vm -p
PID: 32227  TASK: 47bc8030  CPU: 0   COMMAND: "crash"
   MM       PGD      RSS    TOTAL_VM
f7e67040  5fddfe00  63336k   67412k
  VMA       START      END    FLAGS  FILE
f3ed61d4   8048000   83e5000   1875  /root/crash
VIRTUAL   PHYSICAL
vm: read error: physical address: 10b60b000  type: "page table"


crash> p modules
modules = $2 = {
  next = 0xf9088284,
  prev = 0xf8842104
}
crash> module 0xf9088280
struct module {
  state = MODULE_STATE_LIVE,
  list = {
    next = 0xf8ff9d84,
    prev = 0x403c63a4
  },
  name = "custom_lkm\000\000\000\000\000\000\000\000\000\000\000\000\000\000\00
0\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\00
0\000\000\000\000\000\000\000\000\000\000\000\000\000",
  mkobj = {
    kobj = {
      k_name = 0xf90882cc "custom_lkm",
      name = "custom_lkm\000\000\000\000\000\000\000\000",
      kref = {
        refcount = {
          counter = 3
        }
      },
      entry = {
        next = 0x403c6068,
        prev = 0xf8ff9de4
      },
...


crash> vtop 0xf9088280
VIRTUAL   PHYSICAL
f9088280  119b98280

PAGE DIRECTORY: 4044b000
  PGD: 4044b018 => 6001
  PMD:     6e40 => 1d515067
  PTE: 1d515440 => 119b98163
 PAGE: 119b98000

   PTE     PHYSICAL   FLAGS
119b98163  119b98000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)


crash> mod | grep ext3
f88c8000  ext3             132616  (not loaded)  [CONFIG_KALLSYMS]

crash> module 0xf88c8000
struct module {
  state = MODULE_STATE_LIVE,
  list = {
    next = 0xf88a6604,
    prev = 0xf885d584
  },
  name = "ext3\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
  mkobj = {
    kobj = {
      k_name = 0xf88c804c "ext3",
      name = "ext3\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",

      kref = {
        refcount = {
          counter = 3
        }
      },
      entry = {
        next = 0xf885d5e4,
        prev = 0xf88a6664
      },
...

(Realized afterward that I forgot to vtop ext3. Let me know if it's needed and I can repeat this procedure)


From dump file:

crash> vm
PID: 4323   TASK: 47be0a90  CPU: 0   COMMAND: "bash"
   MM       PGD      RSS    TOTAL_VM
5d683580  5d500dc0  2616k    3968k
  VMA       START      END    FLAGS  FILE
5fc2aac4   8048000   80ee000   1875  /bin/bash
5fe5f0cc   80ee000   80f3000 101877  /bin/bash
...


crash> vm -p
PID: 4323   TASK: 47be0a90  CPU: 0   COMMAND: "bash"
   MM       PGD      RSS    TOTAL_VM
5d683580  5d500dc0  2616k    3968k
  VMA       START      END    FLAGS  FILE
5fc2aac4   8048000   80ee000   1875  /bin/bash
VIRTUAL   PHYSICAL
8048000   FILE: /bin/bash  OFFSET: 0
8049000   FILE: /bin/bash  OFFSET: 1000
804a000   FILE: /bin/bash  OFFSET: 2000
...no errors, lots of output


crash> modules
modules = $2 = {
  next = 0xf9088284,
  prev = 0xf8842104
}

crash> module 0xf9088280
struct module {
  state = MODULE_STATE_LIVE,
  list = {
    next = 0x0,
    prev = 0x0
  },
  name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000",
  mkobj = {
    kobj = {
      k_name = 0x0,
      name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000",
      kref = {
        refcount = {
          counter = 0
        }
      },
      entry = {
        next = 0x0,
        prev = 0x0
...

crash> vtop 0xf9088280
VIRTUAL   PHYSICAL
f9088280  119b98280

PAGE DIRECTORY: 4044b000
  PGD: 4044b018 => 6001
  PMD:     6e40 => 1d515067
  PTE: 1d515440 => 119b98163
 PAGE: 119b98000

   PTE     PHYSICAL   FLAGS
119b98163  119b98000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)

  PAGE     PHYSICAL   MAPPING    INDEX CNT FLAGS
47337300  119b98000         0         0  1 80000000

crash> mod | grep ext3
mod: cannot access vmalloc'd module memory

(using the same address that ext3 had in a running system)
crash> module 0xf88c8000
struct module {
  state = MODULE_STATE_LIVE,
  list = {
    next = 0x0,
    prev = 0x0
  },
  name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000",
  mkobj = {
    kobj = {
      k_name = 0x0,
      name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00\000\000",
      kref = {
        refcount = {
          counter = 0
        }
      },
      entry = {
        next = 0x0,
        prev = 0x0
...

crash> vtop 0xf88c8000
VIRTUAL   PHYSICAL
f88c8000  13905f000

PAGE DIRECTORY: 4044b000
  PGD: 4044b018 => 6001
  PMD:     6e20 => 1d5fc067
  PTE: 1d5fc640 => 13905f163
 PAGE: 13905f000

   PTE     PHYSICAL   FLAGS
13905f163  13905f000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)

  PAGE     PHYSICAL   MAPPING    INDEX CNT FLAGS


Thanks again for your continued assistance. I hope this is helpful information.

-Kevin


-----Original Message-----
From: crash-utility-bounces@redhat.com [mailto:crash-utility-bounces@redhat.com] On Behalf Of Dave Anderson
Sent: Friday, October 03, 2008 8:44 AM
To: Discussion list for crash utility usage, maintenance and development; kexec@lists.infradead.org
Subject: Re: [Crash-utility] "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash


NOTE: I've restored the kexec list to this discussion because
this 1G/3G issue does have ramifications w/respect to kexec-tools.
I'm first going to ramble on about crash utility debugging for a
bit here, but for the kexec/kdump masters in the audience, please
at least take a look at the end of this message (do a "find in this
message" for "KEXEC-KDUMP") where I discuss the kexec-tools
hardwiring of the x86 PAGE_OFFSET to c0000000, and whether it
could screw up the dumpfile contents for Kevin's 1G/3G split
where his PAGE_OFFSET is 40000000.

First, the crash discussion...

Worth, Kevin wrote:
> Yep, I can run mod commands on a live system just fine.
>
> Looks like "next" doesn't point to fffffffc...
>

No, but it's 0x0, and therefore the "next" module in the
list gets calculated as 0 - offset-of-list-member, or
fffffffc.  And "MODULE_STATE_LIVE" is being shown by dumb
luck because its enumerator value is 0:

> crash> module f9088280
> struct module {
>   state = MODULE_STATE_LIVE,
>   list = {
>     next = 0x0,
>     prev = 0x0
>   },
>   name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000",
>   mkobj = {
>     kobj = {
>       k_name = 0x0,
>       name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000",
>       kref = {
>         refcount = {
>           counter = 0
...
>
> ...and all the rest of the struct is zeros too...

Right, so we know bogus data is being read from the dumpfile.  The question is:

(1) whether the virtual-to-physical address translation is
     failing somehow, or
(2) the dumpfile is screwed up.

> Does the following mean that user virtual address translations are failing too?
>
> crash> set
>     PID: 4304
> COMMAND: "bash"
>    TASK: 5d7e9030  [THREAD_INFO: f4b70000]
>     CPU: 0
>   STATE: TASK_RUNNING (SYSRQ)
> crash> vm
> PID: 4304   TASK: 5d7e9030  CPU: 0   COMMAND: "bash"
>    MM       PGD      RSS    TOTAL_VM
> f7e7f040  5d5002c0  2616k    3972k
>   VMA       START      END    FLAGS  FILE
> 5fe454ec   8048000   80ee000   1875  /bin/bash
> 5fe45e34   80ee000   80f3000 101877  /bin/bash
> ...
>
> crash> rd 8048000
> rd: invalid kernel virtual address: 8048000  type: "32-bit KVADDR"
> crash> rd -u 8048000
> rd: invalid user virtual address: 8048000  type: "32-bit UVADDR"
> crash> rd 80ee000
> rd: invalid kernel virtual address: 80ee000  type: "32-bit KVADDR"
> crash> rd -u 80ee000
> rd: invalid user virtual address: 80ee000  type: "32-bit UVADDR"
>

The fact that crash initially presumes that 8048000 and 80ee000 are
kernel virtual addresses can be explained by this part of "help -v"
debug output:

flags: 515a
  (NODES_ONLINE|ZONES|PERCPU_KMALLOC_V2|COMMON_VADDR|KMEM_CACHE_INIT|FLATMEM|PERCPU_KMALLOC_V2_NODES)

The "COMMON_VADDR" flag should *only* be set in the case of
the Red Hat hugemem 4G/4G split kernel.  However, I believe that
crash should be able to continue even if the bit is set, as is
the case when you run live.  It is a crash issue having to
do with your 40000000 PAGE_OFFSET, but I think it's benign,
especially if user virtual address accesses run OK on your
live system.

That's one thing that needs verification.  The "invalid user
virtual address" messages above that you get *even* when you
use "-u" would typically be generated as a result of the user
virtual-to-physical address translation. However, they also
could be generated if the virtual page being accessed has been
swapped out.

A better test would be to translate all virtual addresses in the
user address space in one fell swoop with "vm -p".  It's
a verbose command, but for each user virtual page in the
current context, it will translate it to:

(1) the current physical address location, or
(2) if it's not in memory, but is backed by a file, it will show
     what file it comes from, or
(3) if it's been swapped out, what swapfile location it has been
     swapped out to, or
(4) if it's an anonymous page (with no file backing) that hasn't
     been touched yet, it will show "(not mapped)"

Here's a truncated example:

   PID: 19839  TASK: f7b03000  CPU: 1   COMMAND: "bash"
      MM       PGD      RSS    TOTAL_VM
   f6dc5740  f745c9c0  1392k    4532k
     VMA       START      END    FLAGS  FILE
   f69019bc    6fa000    703000     75  /lib/libnss_files-2.5.so
   VIRTUAL   PHYSICAL
   6fa000    12fdba000
   6fb000    12fdbb000
   6fc000    FILE: /lib/libnss_files-2.5.so  OFFSET: 2000
   6fd000    FILE: /lib/libnss_files-2.5.so  OFFSET: 3000
   6fe000    12f660000
   6ff000    12f2cf000
   700000    FILE: /lib/libnss_files-2.5.so  OFFSET: 6000
   701000    FILE: /lib/libnss_files-2.5.so  OFFSET: 7000
   702000    12fc6f000
     VMA       START      END    FLAGS  FILE
   f69013e4    703000    704000 100071  /lib/libnss_files-2.5.so
   VIRTUAL   PHYSICAL
   703000     54791000
     VMA       START      END    FLAGS  FILE
   f6901d84    704000    705000 100073  /lib/libnss_files-2.5.so
   VIRTUAL   PHYSICAL
   704000    12450d000
     VMA       START      END    FLAGS  FILE
   f6901284    a7c000    a96000    875  /lib/ld-2.5.so
   VIRTUAL   PHYSICAL
   a7c000     6ea28000
   a7d000    101f62000
   a7e000     6e6f3000
   a7f000     6e07e000
   a80000     6e084000
   a81000    114c8e000
   ...

Run the command above on a "bash" context on *both* the live
system and the dumpfile -- they should behave in a similar manner,
but I'm guessing you may get some bizarre errors when you
run it on the dumpfile.

Getting back to the base problem with the bogus module read,
here's a suggestion for debugging this.  It requires that you
run the live system, gather some basic data with the crash
utility, and then enter "alt-sysrq-c".  What we want to see
is a virtual-to-physical translation of the first module in
the module list on the live system.  Then crash the system.
Then we want to do the same thing on the subsequent vmcore
to see if the same physical address references are made during
the translation.

So for example, on my live system, the "/dev/crash" kernel
module is the last module entered, and therefore is pointed
to by the base kernel's "modules" list_head:

   crash> p modules
   modules = $2 = {
     next = 0xf8bd0904,
     prev = 0xf882b104
   }

Subtract 4 from the "next" pointer, and display the module:

   crash> module 0xf8bd0900
   struct module {
     state = MODULE_STATE_LIVE,
     list = {
       next = 0xf8caf984,
       prev = 0xc06787b0
     },
     name =    "crash",
     mkobj = {
       kobj = {
         k_name = 0xf8bd094c "crash",
         name = "crash",
         kref = {
           refcount = {
             counter = 2
           }
         },
     ...

Then translate it:

   crash> vtop 0xf8bd0900
   VIRTUAL   PHYSICAL
   f8bd0900  48ba1900

   PAGE DIRECTORY: c0724000
     PGD: c0724018 => 4001
     PMD:     4e28 => 37ae067
     PTE:  37aee80 => 48ba1163
    PAGE: 48ba1000

     PTE     PHYSICAL  FLAGS
   48ba1163  48ba1000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)

     PAGE     PHYSICAL   MAPPING    INDEX CNT FLAGS
   c1917420   48ba1000         0    785045  1 c0000000
   crash>

Do the same type of thing on your live system (where you'll
have a different module), and save the output in a file.

Then immediately enter "alt-sysrq-c".

With the resultant dumpfile, perform the same "p modules",
"module <next-address-4>", and "vtop <next-address-4>" steps
as done above.  The output *should* be identical, although
we're primarily interested in the vtop output given that
the "module <next-address-4>" will probably show garbage.

(BTW, this presumes that the first module in the kernel list
will still return bogus data like your current dumpfile.  That
may not be the case, and if so, we'll need to do something
similar but different. For example, on the live system, capture
the address of the "ext3" module, vtop it, crash the system,
and then do the same thing in the dumpfile.  You might want to
do that anyway, just in case the default behavior is different.
Then again, maybe it will work both live and in the dumpfile
for the ext3 module address, in which case we'll need to go in a
different debug-direction...)

Show the outputs of the live system and the subsequent dumpfile.
If they both end up resolving to the same physical address,
then there's an issue with the dumpfile.

KEXEC-KDUMP:

I talked to Vivek Goyal, who originally wrote the kexec-tools
facility, and he pointed me to this in the kexec-tools package's
"kexec/arch/i386/crashdump-x86.h" file:

   #define PAGE_OFFSET     0xc0000000
   #define __pa(x)         ((unsigned long)(x)-PAGE_OFFSET)

   #define __VMALLOC_RESERVE       (128 << 20)
   #define MAXMEM                  (-PAGE_OFFSET-__VMALLOC_RESERVE)

where for x86, it hard-wires the x86 PAGE_OFFSET to c0000000,
and will certainly result in a bogus MAXMEM given that your
PAGE_OFFSET is 40000000.  I don't know if that is related to
the problem, but if you do a "readelf -a" of your vmcore file,
you'll see some funky virtual address values for each PT_LOAD
segment.  They were dumped in the crash.log you sent me.  Note
that the virtual address regions (p_vaddr) are c0000000,
c0100000, c5000000, ffffffffffffffff and ffffffffffffffff,
all of which are incorrect or nonsensical w/respect to your
1G/3G split:

Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 728 (2d8)
                 p_vaddr: c0000000
                 p_paddr: 0
                p_filesz: 655360 (a0000)
                 p_memsz: 655360 (a0000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 656088 (a02d8)
                 p_vaddr: c0100000
                 p_paddr: 100000
                p_filesz: 15728640 (f00000)
                 p_memsz: 15728640 (f00000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 16384728 (fa02d8)
                 p_vaddr: c5000000
                 p_paddr: 5000000
                p_filesz: 855638016 (33000000)
                 p_memsz: 855638016 (33000000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 872022744 (33fa02d8)
                 p_vaddr: ffffffffffffffff
                 p_paddr: 38000000
                p_filesz: 2272854016 (87790000)
                 p_memsz: 2272854016 (87790000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf64_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 3144876760 (bb7302d8)
                 p_vaddr: ffffffffffffffff
                 p_paddr: 100000000
                p_filesz: 1073741824 (40000000)
                 p_memsz: 1073741824 (40000000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0

Now, the crash utility only uses the p_paddr physical address
fields for x86 dumpfiles, so that shouldn't be a problem.

But I wonder whether when the /proc/vmcore is put together
that there isn't some problem with the data that it accesses?

Thanks,
   Dave









^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Crash-utility] "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash
  2008-10-04 17:34           ` Worth, Kevin
  2008-10-04 17:47             ` Worth, Kevin
@ 2008-10-06 15:10             ` Dave Anderson
  2008-10-06 16:15               ` Worth, Kevin
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Anderson @ 2008-10-06 15:10 UTC (permalink / raw)
  To: Discussion list for crash utility usage, maintenance and development
  Cc: kexec-ml


----- "Kevin Worth" <kevin.worth@hp.com> wrote:

OK, let's skip the user-space angle for now, because I keep
forgetting that you are running with /dev/mem as the memory
source.  And there is an inconsistency with your debug output
that I cannot explain.

As I mentioned before, the /dev/mem driver has this immediate 
restriction in "drivers/char/mem.c":

  static ssize_t read_mem(struct file * file, char __user * buf,
                          size_t count, loff_t *ppos)
  {
          unsigned long p = *ppos;
          ssize_t read, sz;
          char *ptr;
  
          if (!valid_phys_addr_range(p, count))
                  return -EFAULT;
          ...
  
where for x86, it looks like this:
  
  static inline int valid_phys_addr_range(unsigned long addr, size_t count)
  {
          if (addr + count > __pa(high_memory))
                  return 0;
  
          return 1;
  }

That restricts it from reading "highmem", that is, anything beyond
the extent of physical memory that can be unity-mapped.  Unity-mapped
memory is what the kernel can directly access by simply adding the
PAGE_OFFSET value to the physical address.  In your case, your
PAGE_OFFSET is 0x40000000.  With your 1G/3G split, you've got 3GB of
kernel virtual address space that you can directly access, minus 128MB
at the top that is used for the vmalloc() address range.
(3GB - 128MB) is 0xb8000000.  Therefore, your "high_memory" maximum
unity-mapped kernel virtual address is (0xb8000000 + PAGE_OFFSET),
which in your case is 0xf8000000.

In any case, on your live system, whenever a crash utility readmem()
is done that accesses a physical address beyond 0xb8000000, it *should* 
get back the EFAULT above and fail, and therefore the crash command
making the readmem() fails.
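
The arithmetic and the range check can be sketched in user-space C
as follows.  This is only a model under stated assumptions: the
machine has at least 0xb8000000 bytes of RAM, addresses are treated
as 32-bit values as on i386, and the helper names other than
valid_phys_addr_range() are invented for the example.  It does not
model what happens to physical addresses that do not fit in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_OFFSET     0x40000000u   /* the modified split discussed here */
#define VMALLOC_RESERVE (128u << 20)  /* 128MB kept for vmalloc() at the top */

/* Size of the unity-mapped region: (3GB - 128MB) for this split. */
static uint32_t lowmem_limit(void)
{
    return (uint32_t)(0u - PAGE_OFFSET - VMALLOC_RESERVE);
}

/* Highest unity-mapped kernel virtual address ("high_memory"). */
static uint32_t high_memory_vaddr(void)
{
    return (uint32_t)(PAGE_OFFSET + lowmem_limit());
}

/* Unity-mapped virtual to physical: strip PAGE_OFFSET. */
static uint32_t pa(uint32_t vaddr)
{
    return (uint32_t)(vaddr - PAGE_OFFSET);
}

/* Same test as the x86 valid_phys_addr_range() quoted above; the sum
 * is widened to 64 bits here only so that the model cannot wrap. */
static int valid_phys_addr_range(uint32_t addr, size_t count)
{
    if ((uint64_t)addr + count > pa(high_memory_vaddr()))
        return 0;

    return 1;
}
```

In this model lowmem_limit() is 0xb8000000 and high_memory_vaddr() is
0xf8000000.  A 4KB read at physical 0xb7fff000 passes the check, and
any read starting at 0xb8000000 is rejected.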

Accordingly, when you did this on your live system:

> crash> vm -p
> PID: 32227  TASK: 47bc8030  CPU: 0   COMMAND: "crash"
>    MM       PGD      RSS    TOTAL_VM
> f7e67040  5fddfe00  63336k   67412k
>   VMA       START      END    FLAGS  FILE
> f3ed61d4   8048000   83e5000   1875  /root/crash
> VIRTUAL   PHYSICAL
> vm: read error: physical address: 10b60b000  type: "page table"

It ended up translating the first user virtual address (8048000),
requiring a page-table translation, and ended up trying to access
a page table page at physical address 0x10b60b000, which /dev/mem
did not allow, because you got a "read error".

However -- and this is what I cannot explain -- the above can also
happen on a live system when accessing vmalloc() kernel virtual space 
as well *if* any PTE or page table read to make the translation, or 
*if* the ending physical page itself, are beyond the /dev/mem restriction
(again, which should be 0xb8000000 in your case).  

So when you did this on your live system, you referenced the vmalloc
address of your custom module at address 0xf9088280, and successfully
read and displayed its contents:
 
> 
> crash> p modules
> modules = $2 = {
>   next = 0xf9088284,
>   prev = 0xf8842104
> }
> crash> module 0xf9088280
> struct module {
>   state = MODULE_STATE_LIVE,
>   list = {
>     next = 0xf8ff9d84,
>     prev = 0x403c63a4
>   },
>   name = "custom_lkm\000\000\000\000\000\000\000\000\000\000\000\000\000\000\00
> 0\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\00
> 0\000\000\000\000\000\000\000\000\000\000\000\000\000",
>   mkobj = {
>     kobj = {
>       k_name = 0xf90882cc "custom_lkm",
>       name = "custom_lkm\000\000\000\000\000\000\000\000",
>       kref = {
>         refcount = {
>           counter = 3
>         }
>       },
>       entry = {
>         next = 0x403c6068,
>         prev = 0xf8ff9de4
>       },
> ...
> 

But when you did vtop of 0xf9088280, it ended up translating
to 119b98000, which is well beyond 4GB (never mind 0xb8000000), so
/dev/mem should not have been able to read it:

> 
> crash> vtop 0xf9088280
> VIRTUAL   PHYSICAL
> f9088280  119b98280
> 
> PAGE DIRECTORY: 4044b000
>   PGD: 4044b018 => 6001
>   PMD:     6e40 => 1d515067
>   PTE: 1d515440 => 119b98163
>  PAGE: 119b98000
> 
>    PTE     PHYSICAL   FLAGS
> 119b98163  119b98000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)
> 
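
For reference, the entry addresses in that vtop output can be
reproduced by hand.  The sketch below assumes the 3-level PAE layout
that the PGD/PMD/PTE lines suggest (2 + 9 + 9 index bits, 8-byte
entries); the function names are illustrative and are not crash
internals.

```c
#include <stdint.h>

#define ENTRY_SIZE 8u                        /* PAE entries are 64 bits */
#define ADDR_MASK  0x000ffffffffff000ULL     /* page frame bits of an entry */

/* Address of the PGD entry: top 2 bits of the virtual address index
 * the page directory.  The result is in the same address space as
 * pgd_base (vtop shows it as a kernel virtual address). */
static uint64_t pgd_entry_addr(uint64_t pgd_base, uint32_t vaddr)
{
    return pgd_base + (uint64_t)(vaddr >> 30) * ENTRY_SIZE;
}

/* Physical address of the PMD entry, from the PGD entry contents. */
static uint64_t pmd_entry_addr(uint64_t pgd_entry, uint32_t vaddr)
{
    return (pgd_entry & ADDR_MASK) +
           (uint64_t)((vaddr >> 21) & 0x1ff) * ENTRY_SIZE;
}

/* Physical address of the PTE, from the PMD entry contents. */
static uint64_t pte_entry_addr(uint64_t pmd_entry, uint32_t vaddr)
{
    return (pmd_entry & ADDR_MASK) +
           (uint64_t)((vaddr >> 12) & 0x1ff) * ENTRY_SIZE;
}

/* Final physical address: page frame from the PTE plus page offset. */
static uint64_t phys_addr(uint64_t pte, uint32_t vaddr)
{
    return (pte & ADDR_MASK) + (vaddr & 0xfffu);
}
```

Feeding in 0xf9088280 with the entries shown above (6001, 1d515067,
119b98163) yields 4044b018, 6e40, 1d515440 and finally 119b98280,
matching the quoted output line for line.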

By any chance has the /dev/mem driver been modified on your kernel?

In any case, I can't explain why you are apparently able to access
physical addresses beyond your "high_memory".

Anyway, the ext3 translation is useless without the accompanying "vtop":

> 
> crash> mod | grep ext3
> f88c8000  ext3             132616  (not loaded)  [CONFIG_KALLSYMS]
> 
> ... [ snip ] ...
> 
> (Realized afterward that I forgot to vtop ext3. Let me know if it's needed and I can repeat this procedure)
> 

And the "bash" vm output only makes sense with respect to
its output on the live system:

> From dump file:
> 
> crash> vm
> PID: 4323   TASK: 47be0a90  CPU: 0   COMMAND: "bash"
>    MM       PGD      RSS    TOTAL_VM
> 5d683580  5d500dc0  2616k    3968k
>   VMA       START      END    FLAGS  FILE
> 5fc2aac4   8048000   80ee000   1875  /bin/bash
> 5fe5f0cc   80ee000   80f3000 101877  /bin/bash
> ...
> 
> 
> crash> vm -p
> PID: 4323   TASK: 47be0a90  CPU: 0   COMMAND: "bash"
>    MM       PGD      RSS    TOTAL_VM
> 5d683580  5d500dc0  2616k    3968k
>   VMA       START      END    FLAGS  FILE
> 5fc2aac4   8048000   80ee000   1875  /bin/bash
> VIRTUAL   PHYSICAL
> 8048000   FILE: /bin/bash  OFFSET: 0
> 8049000   FILE: /bin/bash  OFFSET: 1000
> 804a000   FILE: /bin/bash  OFFSET: 2000
> ...no errors, lots of output
> 

But getting back to vmalloc'd module space, your dumpfile access of the
module at vmalloc-address-f9088280/physical-address-119b98000 showed
that it's getting back a page of zeroes, even though it is accessing
the same physical address (0x119b98000) that you successfully read
(but how?) on the live system:

> 
> crash> modules
> modules = $2 = {
>   next = 0xf9088284,
>   prev = 0xf8842104
> }
> 
> crash> module 0xf9088280
> struct module {
>   state = MODULE_STATE_LIVE,
>   list = {
>     next = 0x0,
>     prev = 0x0
>   },
>   name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000",
>   mkobj = {
>     kobj = {
>       k_name = 0x0,
>       name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
> 00\000\000",
>       kref = {
>         refcount = {
>           counter = 0
>         }
>       },
>       entry = {
>         next = 0x0,
>         prev = 0x0
> ...
> 
> crash> vtop 0xf9088280
> VIRTUAL   PHYSICAL
> f9088280  119b98280
> 
> PAGE DIRECTORY: 4044b000
>   PGD: 4044b018 => 6001
>   PMD:     6e40 => 1d515067
>   PTE: 1d515440 => 119b98163
>  PAGE: 119b98000
> 
>    PTE     PHYSICAL   FLAGS
> 119b98163  119b98000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)
> 
>   PAGE     PHYSICAL   MAPPING    INDEX CNT FLAGS
> 47337300  119b98000         0         0  1 80000000

And so even though the analogous readmem() on the dumpfile reads the
same physical location -- and seems to just return zeroes -- that is
not enough for me to simply state that it's a problem with
kexec/kdump.
 
Because, again, I cannot explain how you are able to access 
physical address 0x119b98000 from /dev/mem on your live 
system?

Can you check whether your kernel source has modified
the read_mem() or valid_phys_addr_range() functions?
If they are unchanged from what I showed above (from 2.6.20),
then I'm stumped, because it makes no sense to me how you
can read from those physical addresses on your live system.

For verification, if you do this:

  crash> p high_memory

it should show 0xf8000000.  If you then do a vtop of 0xf8000000,
it will simply end up stripping off the PAGE_OFFSET of 0x40000000, 
resulting in the maximum-accessible physical address of 0xb8000000.
And if you can do this:

  crash> rd -p 0xb8000000

it should fail -- as should any address equal to or above it.
But your output above that translates the module vmalloc
addresses seemingly reads physical addresses well beyond
4GB (0x100000000).  And that's what I cannot begin to explain.

So I'm running out of ideas here...

One thing I can suggest is to rebuild your kexec-tools package
that you're using, and correct the PAGE_OFFSET value to equal
your system's.  The version of "kexec/arch/i386/crashdump-x86.h"
that we (Red Hat) are using looks like this: 
  
  #ifndef CRASHDUMP_X86_H
  #define CRASHDUMP_X86_H
  
  struct kexec_info;
  int load_crashdump_segments(struct kexec_info *info, char *mod_cmdline,
                                  unsigned long max_addr, unsigned long min_base);
  
  #define PAGE_OFFSET     0xc0000000
  #define __pa(x)         ((unsigned long)(x)-PAGE_OFFSET)
  
  #define __VMALLOC_RESERVE       (128 << 20)
  #define MAXMEM                  (-PAGE_OFFSET-__VMALLOC_RESERVE)
  
  #define CRASH_MAX_MEMMAP_NR     (KEXEC_MAX_SEGMENTS + 1)
  #define CRASH_MAX_MEMORY_RANGES (MAX_MEMORY_RANGES + 2)
  
  /* Backup Region, First 640K of System RAM. */
  #define BACKUP_SRC_START        0x00000000
  #define BACKUP_SRC_END          0x0009ffff
  #define BACKUP_SRC_SIZE (BACKUP_SRC_END - BACKUP_SRC_START + 1)
  
  #endif /* CRASHDUMP_X86_H */
  
Try rebuilding your package with PAGE_OFFSET defined as 0x40000000,
and then see what happens.
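
To see what the PAGE_OFFSET change does to the derived MAXMEM value,
here is a minimal sketch that mirrors the macro arithmetic from the
header above in 32-bit unsigned math, as it would behave on i386.
The function name maxmem_for() is hypothetical and exists only for
this illustration.

```c
#include <stdint.h>

#define VMALLOC_RESERVE (128u << 20)   /* 128MB, as in the header above */

/*
 * Mirrors MAXMEM = (-PAGE_OFFSET - __VMALLOC_RESERVE), evaluated with
 * 32-bit unsigned wraparound.
 */
uint32_t maxmem_for(uint32_t page_offset)
{
    return (uint32_t)(0u - page_offset - VMALLOC_RESERVE);
}
```

maxmem_for(0xc0000000) gives 0x38000000 (896MB), while
maxmem_for(0x40000000) gives 0xb8000000, the same limit discussed
above for the 1G/3G split.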
  
Dave

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* RE: [Crash-utility] "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash
  2008-10-06 15:10             ` Dave Anderson
@ 2008-10-06 16:15               ` Worth, Kevin
  0 siblings, 0 replies; 8+ messages in thread
From: Worth, Kevin @ 2008-10-06 16:15 UTC (permalink / raw)
  To: Discussion list for crash utility usage,	maintenance and development
  Cc: kexec-ml

Dave,

That does seem pretty strange that the physical address is coming out beyond the 4GB mark and that the read actually succeeds. Just checked on the Ubuntu patches to the 2.6.20 kernel ( http://archive.ubuntu.com/ubuntu/pool/main/l/linux-source-2.6.20/linux-source-2.6.20_2.6.20-17.39.diff.gz ) and no mention of mem.c or either of those two functions.

Let me try the kexec PAGE_OFFSET modification today or tomorrow and reply back on how it goes. If that produces no change I'll try to do a re-run of the previous email's process with some more careful attention paid (that I get a vtop of everything and that my context examples are the same process).

-Kevin


* Re: [Crash-utility] "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash
       [not found] <1077541243.786041223321889818.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com>
@ 2008-10-06 19:39 ` Dave Anderson
  2008-10-10 19:31   ` Worth, Kevin
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Anderson @ 2008-10-06 19:39 UTC (permalink / raw)
  To: Discussion list for crash utility usage, maintenance and development
  Cc: kexec-ml


----- "Kevin Worth" <kevin.worth@hp.com> wrote:

> Dave,
> 
> That does seem pretty strange that the physical address is coming out
> beyond the 4GB mark and that the read actually succeeds. Just checked
> on the Ubuntu patches to the 2.6.20 kernel (
> http://archive.ubuntu.com/ubuntu/pool/main/l/linux-source-2.6.20/linux-source-2.6.20_2.6.20-17.39.diff.gz
> ) and no mention of mem.c or either of those two functions.


Hmmm -- I do see one thing with the /dev/mem driver that could
be an explanation.  Maybe...

Prior to the read() call to /dev/mem, crash does an llseek() to
the target physical address, which gets stored in the open file
structure's file.f_pos member, which is a 64-bit loff_t.  Then when
the subsequent read() call is made, the file.f_pos member gets 
passed by reference to the /dev/mem driver's read_mem() function 
via the "ppos" argument: 
 
  static ssize_t read_mem(struct file * file, char __user * buf,
                          size_t count, loff_t *ppos)
  {
          unsigned long p = *ppos;
          ssize_t read, sz;
          char *ptr;
  
          if (!valid_phys_addr_range(p, count))
                  return -EFAULT;
  
But its value is then pulled from *ppos into "p", an unsigned long,
which is only 32 bits wide on i386, and "p" is what gets used from
then on.  So it looks like the bits above bit 31 of a physical address
at or above 4GB (0x100000000) would get stripped, and the truncated
value would then erroneously pass the valid_phys_addr_range() check.

So in your case, physical addresses from ~3GB-up-to-4GB would 
be rejected, but those at and above 4GB would be inadvertently
accepted.  However, if that were the case, the *wrong* physical address
would be accessed -- but your "module" reads seemingly return the correct
data!  So I still don't get it...
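
The suspected truncation can be modelled in user space.  This is a
sketch of the hypothesis only, not the driver code: it assumes a
32-bit unsigned long as on i386 and a __pa(high_memory) of 0xb8000000
as derived earlier in the thread, and the names truncate_ppos() and
passes_range_check() are invented for the example.

```c
#include <stdint.h>

#define PA_HIGH_MEMORY 0xb8000000u  /* assumed __pa(high_memory) here */

/* Models "unsigned long p = *ppos;" where unsigned long is 32 bits. */
uint32_t truncate_ppos(uint64_t ppos)
{
    return (uint32_t)ppos;
}

/*
 * Models valid_phys_addr_range() applied to the truncated value,
 * using the same 32-bit sum the i386 kernel would compute.
 */
int passes_range_check(uint64_t ppos, uint32_t count)
{
    uint32_t p = truncate_ppos(ppos);

    if (p + count > PA_HIGH_MEMORY)
        return 0;
    return 1;
}
```

Under this model, an llseek() to 0x119b98000 becomes 0x19b98000 and
passes the check, whereas 0xc0000000 is rejected.  The read would then
hit physical address 0x19b98000 rather than the intended page.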

I haven't tinkered with the 32-bit /dev/mem driver in years, because
Red Hat kernels not only have the "high_memory" restriction, they also
have a devmem_is_allowed() function that further restricts /dev/mem to
the first 256 pages (1MB) of physical memory.  (I note that upstream
kernels have recently added a CONFIG_STRICT_DEVMEM config option to do
the same thing.)  And, FYI, the Red Hat /dev/crash
"replacement-for-/dev/mem" driver correctly reads *ppos into a u64.

So when you test this again on your live system, after printing the
module via "p <virtual-address-of-module>", do a vtop of the 
<virtual-address-of-module>, take the translated-to physical address
and dump it to verify the contents.  Like this:
  
  crash> p modules
  modules = $2 = {
    next = 0xf8bf5904, 
    prev = 0xf8836004
  }
  crash> module 0xf8bf5900
  struct module {
    state = MODULE_STATE_LIVE, 
    list = {
      next = 0xf8a60d84, 
      prev = 0xc06787b0
    }, 
    name = "crash",
    mkobj = {
      kobj = {
        k_name = 0xf8bf594c "crash", 
        name = "crash", 
        kref = {
          refcount = {
            counter = 2
          }
        }, 
    ...
  crash> vtop 0xf8bf5900
  VIRTUAL   PHYSICAL
  f8bf5900  2412c900
  ...
  crash> rd -p 2412c900 30
  2412c900:  00000000 f8a60d84 c06787b0 73617263   ..........g.cras
  2412c910:  00000068 00000000 00000000 00000000   h...............
  2412c920:  00000000 00000000 00000000 00000000   ................
  2412c930:  00000000 00000000 00000000 00000000   ................
  2412c940:  00000000 00000000 f8bf594c 73617263   ........LY..cras
  2412c950:  00000068 00000000 00000000 00000000   h...............
  2412c960:  00000002 c06783e8 f8a60de4 c06783f4   ......g.......g.
  2412c970:  c06783e0 00000000                     ..g.....
  crash>

Lastly, try this set of crash commands on your live system:
  
  rd -p 0
  rd -p 0x20000000
  rd -p 0x40000000
  rd -p 0x60000000
  rd -p 0x80000000
  rd -p 0xa0000000
  rd -p 0xb8000000
  rd -p 0xc0000000
  rd -p 0xe0000000
  rd -p 0x100000000
  rd -p 0x120000000
  rd -p 0x140000000
  
Theoretically, anything at and above 0xb8000000 should fail.  

> Let me try the kexec PAGE_OFFSET modification today or tomorrow and
> reply back on how it goes. If that produces no change I'll try do a
> re-run of the previous email's process with some more careful
> attention paid (that I get a vtop of everything and that my context
> examples are the same process).

OK fine...

Thanks,
  Dave

 


* RE: [Crash-utility] "cannot access vmalloc'd module memory" when loading kdump'ed vmcore in crash
  2008-10-06 19:39 ` Dave Anderson
@ 2008-10-10 19:31   ` Worth, Kevin
  0 siblings, 0 replies; 8+ messages in thread
From: Worth, Kevin @ 2008-10-10 19:31 UTC (permalink / raw)
  To: Discussion list for crash utility usage,	maintenance and development
  Cc: kexec-ml

Hi Dave,

I tried changing the PAGE_OFFSET definition in kexec-tools. It didn't seem to have any effect: crash still fails to load the vmalloc'ed memory. If that seems like it absolves kexec-tools of any sins then perhaps we can drop the kexec-ml off the CC list.

Your statement "Theoretically, anything at and above 0xb8000000 should fail." was accurate; that is what I saw on my live system (with no dump involved). I hope this provides some insight.

-Kevin


crash> p modules
modules = $2 = {
  next = 0xf9102284,
  prev = 0xf8842104
}

crash> module 0xf9102280
struct module {
  state = MODULE_STATE_LIVE,
  list = {
    next = 0xf9073d84,
    prev = 0x403c63a4
  },
  name = "custom_lkm\000\000\000\000\000\000\000\000\000\000\000\000\000\000\00
0\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\00
0\000\000\000\000\000\000\000\000\000\000\000\000\000",
  mkobj = {
    kobj = {
      k_name = 0xf91022cc "custom_lkm",
      name = "custom_lkm\000\000\000\000\000\000\000\000",
      kref = {
        refcount = {
          counter = 3
        }
      },
      entry = {
        next = 0x403c6068,
        prev = 0xf9073de4
      },
      parent = 0x403c6074,
...

crash> vtop 0xf9102280
VIRTUAL   PHYSICAL
f9102280  119b76280

PAGE DIRECTORY: 4044b000
  PGD: 4044b018 => 6001
  PMD:     6e40 => 1d515067
  PTE: 1d515810 => 119b76163
 PAGE: 119b76000

   PTE     PHYSICAL   FLAGS
119b76163  119b76000  (PRESENT|RW|ACCESSED|DIRTY|GLOBAL)

crash> rd -p 119b76000 30
rd: read error: physical address: 119b76000  type: "32-bit PHYSADDR"

crash> rd -p 0
       0:  00000001                              ....

crash> rd -p 0x20000000
20000000:  00000000                              ....

crash> rd -p 0x40000000
40000000:  00000000                              ....

crash> rd -p 0x60000000
60000000:  00000000                              ....

crash> rd -p 0x80000000
80000000:  00000000                              ....

crash> rd -p 0xa0000000
a0000000:  00000000                              ....

crash> rd -p 0xb0000000
b0000000:  00000000                              ....

crash> rd -p 0xc0000000
rd: read error: physical address: c0000000  type: "32-bit PHYSADDR"

crash> rd -p 0xb8000000
rd: read error: physical address: b8000000  type: "32-bit PHYSADDR"

...snip out some incremental testing to find the exact point where it fails...

crash> rd -p 0xb7fffffc
b7fffffc:  00000000                              ....

crash> rd -p 0xb7fffffd
rd: read error: physical address: b8000000  type: "32-bit PHYSADDR"
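
The exact failure point above is consistent with the
"addr + count > __pa(high_memory)" test quoted earlier in the thread.
As a small sketch, assuming a 4-byte read and a limit of 0xb8000000
(the function name is made up for illustration; crash may also split
the read so that the failing piece is reported at b8000000):

```c
#include <stdint.h>

#define PA_HIGH_MEMORY 0xb8000000u  /* assumed __pa(high_memory) here */

/*
 * Highest start address for which addr + count <= PA_HIGH_MEMORY,
 * assuming count does not exceed the limit.
 */
uint32_t last_readable_start(uint32_t count)
{
    return PA_HIGH_MEMORY - count;
}
```

last_readable_start(4) is 0xb7fffffc, matching the last address that
could be read; a 4-byte read starting at 0xb7fffffd would extend past
the limit by one byte.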



