* Re: [Fastboot] Re: Kdump Testing
@ 2005-04-23 3:30 Vivek Goyal
2005-04-25 12:15 ` Nagesh Sharyathi
0 siblings, 1 reply; 14+ messages in thread
From: Vivek Goyal @ 2005-04-23 3:30 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Nagesh Sharyathi, akpm, fastboot, linux-kernel, maneesh
Quoting "Eric W. Biederman" <ebiederm@xmission.com>:
> Nagesh Sharyathi <sharyathi@in.ibm.com> writes:
>
> > Here is the console boot log, before the machine jumps to BIOS
> > after hang during panic kernel boot
>
> Ok thanks. So this is manually triggered with SysRq
> and the kexec part works but the recover kernel simply fails
> to boot.
>
> It looks like that hunk of the ACPI code that messes up maxcpus=1
> needs to be looked at.
I faced a similar issue on one of my machines. A little debugging showed that
the boot CPU sends an INIT IPI to the application processor to wake it up, and
then the boot CPU loses its way and jumps to the BIOS. Strange....
Further, in my case this problem was noticed only if the crash happened on a
non-boot CPU.
It works well with a uniprocessor capture kernel. For the time being that is
sufficient to capture the dump, but it is always a good idea to be able to boot
an SMP kernel as well.
Thanks
Vivek
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Fastboot] Re: Kdump Testing
2005-04-23 3:30 [Fastboot] Re: Kdump Testing Vivek Goyal
@ 2005-04-25 12:15 ` Nagesh Sharyathi
2005-04-25 23:09 ` Randy.Dunlap
0 siblings, 1 reply; 14+ messages in thread
From: Nagesh Sharyathi @ 2005-04-25 12:15 UTC (permalink / raw)
To: vgoyal, akpm, Eric W. Biederman, fastboot, linux-kernel, maneesh
vgoyal@in.ltcfwd.linux.ibm.com wrote on 23/04/2005 09:00:03:
> Quoting "Eric W. Biederman" <ebiederm@xmission.com>:
> > Nagesh Sharyathi <sharyathi@in.ibm.com> writes:
> >
> > > Here is the console boot log, before the machine jumps to BIOS
> > > after hang during panic kernel boot
> >
> > Ok thanks. So this is manually triggered with SysRq
> > and the kexec part works but the recover kernel simply fails
> > to boot.
> >
> > It looks like that hunk of the ACPI code that messes up maxcpus=1
> > needs to be looked at.
> It works well with Uniprocessor capture kernel. For the time being sufficient
> to capture the dump but it is always good idea to be able to boot
> an SMP kernel as well.
>
> Vivek

I verified this on my machine, where kdump used to fail earlier: after
disabling CONFIG_SMP (i.e. CONFIG_SMP=n), the crash kernel boots properly and
I am able to take the memory dump.

Regards
Sharyathi
* Re: [Fastboot] Re: Kdump Testing
2005-04-25 12:15 ` Nagesh Sharyathi
@ 2005-04-25 23:09 ` Randy.Dunlap
2005-04-26 8:54 ` Vivek Goyal
0 siblings, 1 reply; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-25 23:09 UTC (permalink / raw)
To: Nagesh Sharyathi; +Cc: vgoyal, akpm, ebiederm, fastboot, linux-kernel, maneesh
On Mon, 25 Apr 2005 17:45:43 +0530 Nagesh Sharyathi <sharyathi@in.ibm.com> wrote:
> vgoyal@in.ltcfwd.linux.ibm.com wrote on 23/04/2005 09:00:03:
>
> > Quoting "Eric W. Biederman" <ebiederm@xmission.com>:
> > > Nagesh Sharyathi <sharyathi@in.ibm.com> writes:
> > >
> > > > Here is the console boot log, before the machine jumps to BIOS
> > > > after hang during panic kernel boot
> > >
> > > Ok thanks. So this is manually triggered with SysRq
> > > and the kexec part works but the recover kernel simply fails
> > > to boot.
> > >
> > > It looks like that hunk of the ACPI code that messes up maxcpus=1
> > > needs to be looked at.
>
> > It works well with Uniprocessor capture kernel. For the time being
> > sufficient to capture the dump but it is always good idea to be able
> > to boot an SMP kernel as well.
> >
> > Vivek
> I verified on my machine where earlier kdump used to fail and after
> disabling CONFIG_SMP(ie CONFIG_SMP=n) crash kernel boots properly and I am
> able to take the memory dump

Thanks for those hints. However, my testing didn't go quite
as well as that.

2.6.12-rc2-mm3 reboots vmlinux-recover-UP on panic.
(vmlinux-recover-SMP hangs during [early] reboot, but -UP
goes further....)

(BTW, how do I do serial console from the second
kernel...? It has the drivers, but not the command
line info? TBD.)

vmlinux-recover-UP gets to this point, hand-written,
several lines missing:

kfree_debugcheck: bad ptr c3dbffb0h. ( == %esi)
kernel BUG at <bad filename>:23128!
invalid operand: 0000 [#1]
DEBUG_PAGEALLOC
EIP is at kfree_debugcheck+0x45/0x50

Stack dump shows lots of ext3 cache and inode functions...

On a dual-proc P4 with 1 GB RAM.

--
~Randy
* Re: [Fastboot] Re: Kdump Testing
2005-04-25 23:09 ` Randy.Dunlap
@ 2005-04-26 8:54 ` Vivek Goyal
2005-04-27 16:46 ` Randy.Dunlap
2005-04-27 19:23 ` Randy.Dunlap
0 siblings, 2 replies; 14+ messages in thread
From: Vivek Goyal @ 2005-04-26 8:54 UTC (permalink / raw)
To: Randy.Dunlap
Cc: Nagesh Sharyathi, vgoyal, akpm, ebiederm, fastboot, linux-kernel, maneesh
>
> 2.6.12-rc2-mm3 reboots vmlinux-recover-UP on panic.
> (vmlinux-recover-SMP hangs during [early] reboot, but -UP
> goes further....)
>
> (BTW, how do I do serial console from the second
> kernel...? It has the drivers, but not the command
> line info? TBD.)
>

While pre-loading the capture kernel using kexec, you can specify the command
line options for the second kernel using --append="". You must already be
passing the root device. Add your serial console parameters as well, something
like --append="console=ttyS0,38400" (no space after the comma).

> vmlinux-recover-UP gets to this point, hand-written,
> several lines missing:
>
> kfree_debugcheck: bad ptr c3dbffb0h. ( == %esi)
> kernel BUG at <bad filename>:23128!
> invalid operand: 0000 [#1]
> DEBUG_PAGEALLOC
> EIP is at kfree_debugcheck+0x45/0x50
>
> Stack dump shows lots of ext3 cache and inode functions...
>

Can you post the full serial console output of the second kernel? That would help.

Thanks
Vivek
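The --append advice above can be sketched as a small helper that assembles the kexec invocation. This is only an illustration, not something posted in the thread: the helper name `kexec_panic_argv` is hypothetical, the image name and root device are taken from elsewhere in this thread, and `-p` is assumed to be the kexec-tools option for loading a panic (capture) kernel. The script only prints the command; it does not run kexec.

```python
import shlex


def kexec_panic_argv(kernel_image, root_device, console):
    """Build an argv list for pre-loading a capture kernel with kexec.

    Hypothetical helper for illustration; it only assembles arguments.
    """
    # A space inside the console= value (e.g. "ttyS0, 38400") would split it
    # into two separate kernel parameters, so reject any whitespace early.
    if any(ch.isspace() for ch in console):
        raise ValueError("console= value must not contain whitespace")
    append = f"root={root_device} console={console}"
    # "-p" is assumed here to mean "load as panic kernel" (kexec-tools).
    return ["kexec", "-p", kernel_image, f"--append={append}"]


argv = kexec_panic_argv("vmlinux-recover-UP", "/dev/hda9", "ttyS0,38400")
# shlex.join quotes the --append argument so it can be pasted into a shell.
print(shlex.join(argv))
```

Keeping the whole --append value as a single argv element is what the shell quoting around --append="..." achieves on the command line.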
* Re: [Fastboot] Re: Kdump Testing
2005-04-26 8:54 ` Vivek Goyal
@ 2005-04-27 16:46 ` Randy.Dunlap
2005-04-27 19:23 ` Randy.Dunlap
1 sibling, 0 replies; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-27 16:46 UTC (permalink / raw)
To: vgoyal; +Cc: akpm, sharyathi, fastboot, linux-kernel, ebiederm
On Tue, 26 Apr 2005 14:24:48 +0530 Vivek Goyal <vgoyal@in.ibm.com> wrote:
> >
> > 2.6.12-rc2-mm3 reboots vmlinux-recover-UP on panic.
> > (vmlinux-recover-SMP hangs during [early] reboot, but -UP
> > goes further....)
> >
> > (BTW, how do I do serial console from the second
> > kernel...? It has the drivers, but not the command
> > line info? TBD.)
> >
>
> While pre-loading the capture kernel using kexec, you can specify the command
> line options to second kernel using --append="". You must already be passing
> the root device. Add your serial console parameters as well something like
> --append="console=ttyS0,38400"

Yes, that's what I was planning to try anyway, thanks for
the confirmation. Finally got it working.

> > vmlinux-recover-UP gets to this point, hand-written,
> > several lines missing:
> >
> > kfree_debugcheck: bad ptr c3dbffb0h. ( == %esi)
> > kernel BUG at <bad filename>:23128!
> > invalid operand: 0000 [#1]
> > DEBUG_PAGEALLOC
> > EIP is at kfree_debugcheck+0x45/0x50
> >
> > Stack dump shows lots of ext3 cache and inode functions...
> >
>
> Can you post a full serial console output of second kernel? That would help.

Here:

Linux version 2.6.12-rc2-mm3 (rddunlap@gargoyle) (gcc version 3.3.3 (SuSE Linux)) #25 Tue Apr 26 17:52:39 PDT 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000100 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
user-defined physical RAM map:
user: 0000000000000000 - 00000000000a0000 (usable)
user: 0000000001000000 - 000000000144d000 (usable)
user: 00000000014ed400 - 0000000005000000 (usable)
0MB HIGHMEM available.
80MB LOWMEM available.
DMI 2.3 present.
Allocating PCI resources starting at 05000000 (gap: 05000000:fb000000)
Built 1 zonelists
Initializing CPU#0
Kernel command line: root=/dev/hda9 nosmp console=ttyS0,115200n8 console=tty0 init 1 memmap=exactmap memmap=640K@0K memmap=4404K@16384K memmap=60491K@21429K elfcorehdr=21428K
PID hash table entries: 512 (order: 9, 8192 bytes)
Detected 1685.910 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Unknown interrupt or fault at EIP 00000246 00000060 c13d6653 [*1]
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 59468k/81920k available (2561k kernel code, 5956k reserved, 1311k data, 220k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.

---
[1] c13d6653 is vfs_caches_init_early
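As a cross-check of the log above, the memmap= parameters on the kernel command line can be converted into address ranges and compared with the "user-defined physical RAM map" lines. The sketch below is a simplified illustration, not the kernel's own parser: it assumes decimal sizes with an optional K, M or G suffix, which is all this command line uses, and the function names are made up for the example.

```python
import re

# Kernel command line copied from the boot log above.
CMDLINE = ("root=/dev/hda9 nosmp console=ttyS0,115200n8 console=tty0 init 1 "
           "memmap=exactmap memmap=640K@0K memmap=4404K@16384K "
           "memmap=60491K@21429K elfcorehdr=21428K")

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a string such as '640K' into a byte count (simplified)."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text)
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def memmap_ranges(cmdline):
    """Return (start, end) byte ranges for every memmap=SIZE@START word."""
    ranges = []
    for word in cmdline.split():
        # memmap=exactmap has no '@' and carries no range, so skip it.
        if not word.startswith("memmap=") or "@" not in word:
            continue
        size, start = word[len("memmap="):].split("@", 1)
        base = parse_size(start)
        ranges.append((base, base + parse_size(size)))
    return ranges


for start, end in memmap_ranges(CMDLINE):
    print(f"user: {start:016x} - {end:016x} (usable)")
```

The three computed ranges end at 0xa0000, 0x144d000 and 0x5000000, which agrees with the "user:" lines printed by the capture kernel and with the "80MB LOWMEM available" line.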
* Re: [Fastboot] Re: Kdump Testing
2005-04-26 8:54 ` Vivek Goyal
2005-04-27 16:46 ` Randy.Dunlap
@ 2005-04-27 19:23 ` Randy.Dunlap
2005-04-28 11:44 ` Vivek Goyal
1 sibling, 1 reply; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-27 19:23 UTC (permalink / raw)
To: vgoyal; +Cc: akpm, sharyathi, fastboot, linux-kernel, ebiederm
On Tue, 26 Apr 2005 14:24:48 +0530 Vivek Goyal <vgoyal@in.ibm.com> wrote:
> >
> > 2.6.12-rc2-mm3 reboots vmlinux-recover-UP on panic.
> > (vmlinux-recover-SMP hangs during [early] reboot, but -UP
> > goes further....)
> >
> > (BTW, how do I do serial console from the second
> > kernel...? It has the drivers, but not the command
> > line info? TBD.)
> >
>
> While pre-loading the capture kernel using kexec, you can specify the command
> line options to second kernel using --append="". You must already be passing
> the root device. Add your serial console parameters as well something like
> --append="console=ttyS0,38400"
>
> > vmlinux-recover-UP gets to this point, hand-written,
> > several lines missing:
> >
> > kfree_debugcheck: bad ptr c3dbffb0h. ( == %esi)
> > kernel BUG at <bad filename>:23128!
> > invalid operand: 0000 [#1]
> > DEBUG_PAGEALLOC
> > EIP is at kfree_debugcheck+0x45/0x50
> >
> > Stack dump shows lots of ext3 cache and inode functions...
> >
>
> Can you post a full serial console output of second kernel? That would help.

I did another test run, same kernels (both running and recovery).
The recovery kernel got a little further this time, still had
Badness and a BUG.

---
Kernel panic - not syncing: crashtest
Linux version 2.6.12-rc2-mm3 (rddunlap@gargoyle) (gcc version 3.3.3 (SuSE Linux)) #25 Tue Apr 26 17:52:39 PDT 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000100 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
user-defined physical RAM map:
user: 0000000000000000 - 00000000000a0000 (usable)
user: 0000000001000000 - 000000000144d000 (usable)
user: 00000000014ed400 - 0000000005000000 (usable)
0MB HIGHMEM available.
80MB LOWMEM available.
DMI 2.3 present.
Allocating PCI resources starting at 05000000 (gap: 05000000:fb000000)
Built 1 zonelists
Initializing CPU#0
Kernel command line: root=/dev/hda9 nosmp console=ttyS0,115200n8 console=tty0 init 1 memmap=exactmap memmap=640K@0K memmap=4404K@16384K memmap=60491K@21429K elfcorehdr=21428K
PID hash table entries: 512 (order: 9, 8192 bytes)
Detected 1685.983 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Unknown interrupt or fault at EIP 00000246 00000060 c13d6653
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 59468k/81920k available (2561k kernel code, 5956k reserved, 1311k data, 220k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 256K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU: Intel(R) Xeon(TM) CPU 1.70GHz stepping 02
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
softlockup thread 0 started up.
NET: Registered protocol family 16
EISA bus registered
PCI: PCI BIOS revision 2.10 entry at 0xfb110, last bus=4
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
Linux Plug and Play Support v0.97 (c) Adam Belay
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Using IRQ router PIIX/ICH [8086/2440] at 0000:00:1f.0
fscache: general fs caching registered
CacheFS: general fs caching v0.1 registered
inotify device minor=63
Initializing Cryptographic API
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
lp: driver loaded but no devices found
Real Time Clock Driver v1.12
Non-volatile memory driver v1.2
Software Watchdog Timer: 0.07 initialized. soft_noboot=0 soft_margin=60 sec (nowayout= 0)
Linux agpgart interface v0.101 (c) Dave Jones
agpgart: Detected an Intel i860 Chipset.
agpgart: AGP aperture is 64M @ 0xe8000000
Hangcheck: starting hangcheck timer 0.5.0 (tick is 180 seconds, margin is 60 seconds).
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport0: PC-style at 0x378 (0x778) [PCSPP,TRISTATE,EPP]
parport0: irq 7 detected
lp0: using parport0 (polling).
lp0: console ready
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
loop: loaded (max 8 devices)
pktcdvd: v0.2.0a 2004-07-14 Jens Axboe (axboe@suse.de) and petero2@telia.com
Intel(R) PRO/1000 Network Driver - version 5.7.6-k2
Copyright (c) 1999-2004 Intel Corporation.
e100: Intel(R) PRO/100 Network Driver, 3.3.6-k2-NAPI
e100: Copyright(c) 1999-2004 Intel Corporation
PCI: Found IRQ 10 for device 0000:04:04.0
e100: eth0: e100_probe: addr 0xf4020000, irq 10, MAC addr 00:02:55:1A:35:D4
Linux video capture interface: v1.00
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH2: IDE controller at PCI slot 0000:00:1f.1
ICH2: chipset revision 4
ICH2: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
hda: ST3160023A, ATA DISK drive
hdb: ST3160023A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: LTN486S, ATAPI CD/DVD-ROM drive
hdd: SONY CD-RW CRX140E, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 1024KiB
hda: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
hda: cache flushes supported
hda: hda1 < hda5 hda6 hda7 hda8 hda9 >
hdb: max request size: 1024KiB
hdb: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
hdb: cache flushes supported
hdb: hdb1 hdb2 hdb3 hdb4
hdc: ATAPI 48X CD-ROM drive, 120kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
hdd: ATAPI 32X CD-ROM CD-R/RW drive, 4096kB Cache, UDMA(33)
PCI: Enabling device 0000:03:01.0 (0006 -> 0007)
PCI: Found IRQ 11 for device 0000:03:01.0
PCI: Sharing IRQ 11 with 0000:03:01.1
PCI: Enabling device 0000:03:01.1 (0006 -> 0007)
PCI: Found IRQ 11 for device 0000:03:01.1
PCI: Sharing IRQ 11 with 0000:03:01.0
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
<Adaptec aic7899 Ultra160 SCSI adapter>
aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
<Adaptec aic7899 Ultra160 SCSI adapter>
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
scsi2 : scsi_debug, version 1.75 [20050113], dev_size_mb=8, opts=0x0
Vendor: Linux Model: scsi_debug Rev: 0004
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 16384 512-byte hdwr sectors (8 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 16384 512-byte hdwr sectors (8 MB)
SCSI device sda: drive cache: write back
sda: unknown partition table
Attached scsi disk sda at scsi2, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi2, channel 0, id 0, lun 0, type 0
SCSI Media Changer driver v0.24
USB Universal Host Controller Interface driver v2.2
PCI: Found IRQ 11 for device 0000:00:1f.2
uhci_hcd 0000:00:1f.2: Intel Corporation 82801BA/BAM USB (Hub #1)
uhci_hcd 0000:00:1f.2: new USB bus registered, assigned bus number 1
uhci_hcd 0000:00:1f.2: irq 11, io base 0x0000b000
uhci_hcd 0000:00:1f.2: detected 2 ports
usb usb1: Product: Intel Corporation 82801BA/BAM USB (Hub #1)
usb usb1: Manufacturer: Linux 2.6.12-rc2-mm3 uhci_hcd
usb usb1: SerialNumber: 0000:00:1f.2
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.01:USB HID core driver
mice: PS/2 mouse device common for all mice
input: PC Speaker
i2c /dev entries driver
EISA: Probing bus 0 at eisa.0
Cannot allocate resource for EISA slot 4
Cannot allocate resource for EISA slot 5
EISA: Detected 0 cards.
Advanced Linux Sound Architecture Driver Version 1.0.9rc2 (Thu Mar 24 10:33:39 2005 UTC).
PCI: Found IRQ 11 for device 0000:00:1f.5
PCI: Sharing IRQ 11 with 0000:00:1f.3
input: AT Translated Set 2 keyboard on isa0060/serio0
intel8x0_measure_ac97_clock: measured 49559 usecs
intel8x0: clocking to 48000
ALSA device list:
#0: Intel 82801BA-ICH2 with AD1885 at 0xb800, irq 11
NET: Registered protocol family 26
NET: Registered protocol family 2
IP: routing cache hash table of 128 buckets, 4Kbytes
TCP established hash table entries: 4096 (order: 3, 32768 bytes)
TCP bind hash table entries: 4096 (order: 4, 114688 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
NET: Registered protocol family 1
NET: Registered protocol family 17
CacheFS: Wrong magic number on cache
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
input: ImPS/2 Generic Wheel Mouse on isa0060/serio1
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 220k freed
Adding 2104472k swap on /dev/hda7. Priority:42 extents:1
mismatch in kmem_cache_free: expected cache c168fc80, got c4daca80
c4daca80 is ext3_inode_cache.
c168fc80 is skbuff_head_cache.
Badness in cache_free_debugcheck at mm/slab.c:1917
[<c1003368>] dump_stack+0x16/0x18
[<c1041a94>] cache_free_debugcheck+0x88/0x1d5
[<c10424fd>] kmem_cache_free+0x26/0x65
[<c10a8c01>] ext3_destroy_inode+0x17/0x19
[<c10784c9>] destroy_inode+0x27/0x3d
[<c1078837>] dispose_list+0x60/0x178
[<c1078f81>] prune_icache+0x363/0x399
[<c1078fd0>] shrink_icache_memory+0x19/0x32
[<c1044dd7>] shrink_slab+0x104/0x172
[<c104641e>] try_to_free_pages+0xbe/0x16f
[<c103d9a0>] __alloc_pages+0x1d3/0x393
[<c104037c>] kmem_getpages+0x2d/0x7f
[<c1041869>] cache_grow+0x155/0x2a8
[<c1041f1f>] cache_alloc_refill+0x285/0x2c2
[<c10423c6>] kmem_cache_alloc+0x5d/0x77
[<c1075dac>] d_alloc+0x16/0x27a
[<c106b2b9>] real_lookup+0x40/0xc2
[<c106b68e>] do_lookup+0x41/0x75
[<c106c3a7>] __link_path_walk+0xce5/0x1066
[<c106c768>] link_path_walk+0x40/0xc7
[<c106ca87>] path_lookup+0xec/0xf7
[<c106cbc9>] __user_walk+0x28/0x42
[<c10667b3>] vfs_lstat+0x17/0x3f
[<c1066d1e>] sys_lstat64+0x13/0x29
[<c1002c5f>] sysenter_past_esp+0x54/0x75
slab error in cache_free_debugcheck(): cache `ext3_inode_cache': double free, or memory outside object was overwritten
[<c1003368>] dump_stack+0x16/0x18
[<c1041ad2>] cache_free_debugcheck+0xc6/0x1d5
[<c10424fd>] kmem_cache_free+0x26/0x65
[<c10a8c01>] ext3_destroy_inode+0x17/0x19
[<c10784c9>] destroy_inode+0x27/0x3d
[<c1078837>] dispose_list+0x60/0x178
[<c1078f81>] prune_icache+0x363/0x399
[<c1078fd0>] shrink_icache_memory+0x19/0x32
[<c1044dd7>] shrink_slab+0x104/0x172
[<c104641e>] try_to_free_pages+0xbe/0x16f
[<c103d9a0>] __alloc_pages+0x1d3/0x393
[<c104037c>] kmem_getpages+0x2d/0x7f
[<c1041869>] cache_grow+0x155/0x2a8
[<c1041f1f>] cache_alloc_refill+0x285/0x2c2
[<c10423c6>] kmem_cache_alloc+0x5d/0x77
[<c1075dac>] d_alloc+0x16/0x27a
[<c106b2b9>] real_lookup+0x40/0xc2
[<c106b68e>] do_lookup+0x41/0x75
[<c106c3a7>] __link_path_walk+0xce5/0x1066
[<c106c768>] link_path_walk+0x40/0xc7
[<c106ca87>] path_lookup+0xec/0xf7
[<c106cbc9>] __user_walk+0x28/0x42
[<c10667b3>] vfs_lstat+0x17/0x3f
[<c1066d1e>] sys_lstat64+0x13/0x29
[<c1002c5f>] sysenter_past_esp+0x54/0x75
c3d7afb0: redzone 1: 0x0, redzone 2: 0x0.
------------[ cut here ]------------
kernel BUG at <bad filename>:18422!
invalid operand: 0000 [#1]
DEBUG_PAGEALLOC
Modules linked in:
CPU: 0
EIP: 0060:[<c1041b46>] Not tainted VLI
EFLAGS: 00010002 (2.6.12-rc2-mm3)
EIP is at cache_free_debugcheck+0x13a/0x1d5
eax: c3d7a000 ebx: c3d7a000 ecx: 00001000 edx: 00000fb0
esi: c3d7afb0 edi: c4daca80 ebp: c2f73bb8 esp: c2f73bac
ds: 007b es: 007b ss: 0068
Process showconsole (pid: 1264, threadinfo=c2f72000 task=c2f68ac0)
Stack: c4d0fec4 c4daca80 c3d7bd44 c2f73be0 c10424fd c4daca80 c3d7bd44 c10a8c01
       00000080 00000286 c3d7bddc c2f73c2c 00000080 c2f73bf0 c10a8c01 c4daca80
       c3d7bd44 c2f73c00 c10784c9 c3d7bddc c3d7bddc c2f73c1c c1078837 c3d7bddc
Call Trace:
[<c100334a>] show_stack+0x7a/0x82
[<c1003453>] show_registers+0xe9/0x153
[<c100369f>] die+0x15c/0x23d
[<c1003a79>] do_invalid_op+0x90/0x97
[<c1002ed3>] error_code+0x4f/0x54
[<c10424fd>] kmem_cache_free+0x26/0x65
[<c10a8c01>] ext3_destroy_inode+0x17/0x19
[<c10784c9>] destroy_inode+0x27/0x3d
[<c1078837>] dispose_list+0x60/0x178
[<c1078f81>] prune_icache+0x363/0x399
[<c1078fd0>] shrink_icache_memory+0x19/0x32
[<c1044dd7>] shrink_slab+0x104/0x172
[<c104641e>] try_to_free_pages+0xbe/0x16f
[<c103d9a0>] __alloc_pages+0x1d3/0x393
[<c104037c>] kmem_getpages+0x2d/0x7f
[<c1041869>] cache_grow+0x155/0x2a8
[<c1041f1f>] cache_alloc_refill+0x285/0x2c2
[<c10423c6>] kmem_cache_alloc+0x5d/0x77
[<c1075dac>] d_alloc+0x16/0x27a
[<c106b2b9>] real_lookup+0x40/0xc2
[<c106b68e>] do_lookup+0x41/0x75
[<c106c3a7>] __link_path_walk+0xce5/0x1066
[<c106c768>] link_path_walk+0x40/0xc7
[<c106ca87>] path_lookup+0xec/0xf7
[<c106cbc9>] __user_walk+0x28/0x42
[<c10667b3>] vfs_lstat+0x17/0x3f
[<c1066d1e>] sys_lstat64+0x13/0x29
[<c1002c5f>] sysenter_past_esp+0x54/0x75
Code: e8 bc e4 ff ff 8b 55 10 89 10 58 5a 8b 5b 0c 89 f0 31 d2 8b 4f 34 29 d8 f7 f1 3b 47 3c 72 02 0f 0b 0f af c1 8d 04 18 39 c6 74 02 <0f> 0b f6 47 39 02 74 15 6a 05 57 57 e8 1d e4 ff ff 8d 04 30 89
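The register dump in the oops above can be read as a simple pointer consistency check. The sketch below is an interpretation, not kernel code: it assumes, based on the "Code:" bytes (a division by %ecx followed by a compare against %esi just before the trapping instruction), that %esi is the pointer being freed, %ebx the address of the first object in the slab, and %ecx the per-object size. All names are illustrative.

```python
# Register values copied from the oops above.
ESI = 0xC3D7AFB0  # assumed: pointer passed to kmem_cache_free()
EBX = 0xC3D7A000  # assumed: address of the first object in the slab
ECX = 0x00001000  # assumed: per-object size of the cache


def check_object_pointer(objp, first_obj, obj_size):
    """Return (index, offset, aligned) for a pointer inside a slab.

    A pointer is consistent only if it sits exactly on an object
    boundary, that is, first_obj + index * obj_size == objp.
    """
    index, offset = divmod(objp - first_obj, obj_size)
    return index, offset, offset == 0


index, offset, aligned = check_object_pointer(ESI, EBX, ECX)
print(f"index={index} offset={offset:#x} aligned={aligned}")
# prints "index=0 offset=0xfb0 aligned=False"
```

Under these assumptions the remainder 0xfb0 matches the value left in %edx, and the recomputed object address 0xc3d7a000 matches %eax, so the pointer being freed does not sit on an object boundary of the cache it is being returned to.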
* Re: [Fastboot] Re: Kdump Testing
2005-04-27 19:23 ` Randy.Dunlap
@ 2005-04-28 11:44 ` Vivek Goyal
2005-04-28 16:11 ` Randy.Dunlap
0 siblings, 1 reply; 14+ messages in thread
From: Vivek Goyal @ 2005-04-28 11:44 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: vgoyal, akpm, sharyathi, fastboot, linux-kernel, ebiederm
On Wed, Apr 27, 2005 at 12:23:12PM -0700, Randy.Dunlap wrote:
> On Tue, 26 Apr 2005 14:24:48 +0530
> Vivek Goyal <vgoyal@in.ibm.com> wrote:
>
> > >
> > > 2.6.12-rc2-mm3 reboots vmlinux-recover-UP on panic.
> > > (vmlinux-recover-SMP hangs during [early] reboot, but -UP
> > > goes further....)
> > >
> > > (BTW, how do I do serial console from the second
> > > kernel...? It has the drivers, but not the command
> > > line info? TBD.)
> > >
> >
> > While pre-loading the capture kernel using kexec, you can specify the command
> > line options to second kernel using --append="". You must already be passing
> > the root device. Add your serial console parameters as well something like
> > --append="console=ttyS0,38400"
> >
> > > vmlinux-recover-UP gets to this point, hand-written,
> > > several lines missing:
> > >
> > > kfree_debugcheck: bad ptr c3dbffb0h. ( == %esi)
> > > kernel BUG at <bad filename>:23128!
> > > invalid operand: 0000 [#1]
> > > DEBUG_PAGEALLOC
> > > EIP is at kfree_debugcheck+0x45/0x50
> > >
> > > Stack dump shows lots of ext3 cache and inode functions...
> > >
> >
> > Can you post a full serial console output of second kernel? That would help.
>
> I did another test run, same kernels (both running and recovery).
> The recovery kernel got a little further this time, still had
> Badness and a BUG.
>
> ---

OK. I am also able to see this slab corruption on my machine. I can avoid
the problem if I disable cachefs support.

In fact, I can reproduce the problem if I boot the capture kernel normally
through the BIOS with the command line "mem=64M". It looks like a generic
problem, not one associated with kexec/kdump. Cachefs might be causing some
corruption.

> CacheFS: Wrong magic number on cache
> EXT3-fs: INFO: recovery required on readonly filesystem.
> EXT3-fs: write access will be enabled during recovery.
> input: ImPS/2 Generic Wheel Mouse on isa0060/serio1
> kjournald starting. Commit interval 5 seconds
> EXT3-fs: recovery complete.
> EXT3-fs: mounted filesystem with ordered data mode.
> VFS: Mounted root (ext3 filesystem) readonly.
> Freeing unused kernel memory: 220k freed
> Adding 2104472k swap on /dev/hda7. Priority:42 extents:1
> mismatch in kmem_cache_free: expected cache c168fc80, got c4daca80
> c4daca80 is ext3_inode_cache.
> c168fc80 is skbuff_head_cache.
> Badness in cache_free_debugcheck at mm/slab.c:1917
> [<c1003368>] dump_stack+0x16/0x18
> [<c1041a94>] cache_free_debugcheck+0x88/0x1d5
> [<c10424fd>] kmem_cache_free+0x26/0x65
> [<c10a8c01>] ext3_destroy_inode+0x17/0x19
> [<c10784c9>] destroy_inode+0x27/0x3d
> [<c1078837>] dispose_list+0x60/0x178
> [<c1078f81>] prune_icache+0x363/0x399
> [<c1078fd0>] shrink_icache_memory+0x19/0x32
> [<c1044dd7>] shrink_slab+0x104/0x172
> [<c104641e>] try_to_free_pages+0xbe/0x16f
> [<c103d9a0>] __alloc_pages+0x1d3/0x393
> [<c104037c>] kmem_getpages+0x2d/0x7f
> [<c1041869>] cache_grow+0x155/0x2a8
> [<c1041f1f>] cache_alloc_refill+0x285/0x2c2
> [<c10423c6>] kmem_cache_alloc+0x5d/0x77
> [<c1075dac>] d_alloc+0x16/0x27a
> [<c106b2b9>] real_lookup+0x40/0xc2
> [<c106b68e>] do_lookup+0x41/0x75
> [<c106c3a7>] __link_path_walk+0xce5/0x1066
> [<c106c768>] link_path_walk+0x40/0xc7
> [<c106ca87>] path_lookup+0xec/0xf7
> [<c106cbc9>] __user_walk+0x28/0x42
> [<c10667b3>] vfs_lstat+0x17/0x3f
> [<c1066d1e>] sys_lstat64+0x13/0x29
> [<c1002c5f>] sysenter_past_esp+0x54/0x75
> slab error in cache_free_debugcheck(): cache `ext3_inode_cache': double free, or memory outside object was overwritten
> [<c1003368>] dump_stack+0x16/0x18
> [<c1041ad2>] cache_free_debugcheck+0xc6/0x1d5
> [<c10424fd>] kmem_cache_free+0x26/0x65
> [<c10a8c01>] ext3_destroy_inode+0x17/0x19
> [<c10784c9>] destroy_inode+0x27/0x3d
> [<c1078837>] dispose_list+0x60/0x178
> [<c1078f81>] prune_icache+0x363/0x399
> [<c1078fd0>] shrink_icache_memory+0x19/0x32
> [<c1044dd7>] shrink_slab+0x104/0x172
> [<c104641e>] try_to_free_pages+0xbe/0x16f
> [<c103d9a0>] __alloc_pages+0x1d3/0x393
> [<c104037c>] kmem_getpages+0x2d/0x7f
> [<c1041869>] cache_grow+0x155/0x2a8
> [<c1041f1f>] cache_alloc_refill+0x285/0x2c2
> [<c10423c6>] kmem_cache_alloc+0x5d/0x77
> [<c1075dac>] d_alloc+0x16/0x27a
> [<c106b2b9>] real_lookup+0x40/0xc2
> [<c106b68e>] do_lookup+0x41/0x75
> [<c106c3a7>] __link_path_walk+0xce5/0x1066
> [<c106c768>] link_path_walk+0x40/0xc7
> [<c106ca87>] path_lookup+0xec/0xf7
> [<c106cbc9>] __user_walk+0x28/0x42
> [<c10667b3>] vfs_lstat+0x17/0x3f
> [<c1066d1e>] sys_lstat64+0x13/0x29
> [<c1002c5f>] sysenter_past_esp+0x54/0x75
> c3d7afb0: redzone 1: 0x0, redzone 2: 0x0.
> ------------[ cut here ]------------
> kernel BUG at <bad filename>:18422!
> invalid operand: 0000 [#1]
> DEBUG_PAGEALLOC
> Modules linked in:
> CPU: 0
> EIP: 0060:[<c1041b46>] Not tainted VLI
> EFLAGS: 00010002 (2.6.12-rc2-mm3)
> EIP is at cache_free_debugcheck+0x13a/0x1d5
> eax: c3d7a000 ebx: c3d7a000 ecx: 00001000 edx: 00000fb0
> esi: c3d7afb0 edi: c4daca80 ebp: c2f73bb8 esp: c2f73bac
> ds: 007b es: 007b ss: 0068
> Process showconsole (pid: 1264, threadinfo=c2f72000 task=c2f68ac0)
> Stack: c4d0fec4 c4daca80 c3d7bd44 c2f73be0 c10424fd c4daca80 c3d7bd44 c10a8c01
>        00000080 00000286 c3d7bddc c2f73c2c 00000080 c2f73bf0 c10a8c01 c4daca80
>        c3d7bd44 c2f73c00 c10784c9 c3d7bddc c3d7bddc c2f73c1c c1078837 c3d7bddc
> Call Trace:
> [<c100334a>] show_stack+0x7a/0x82
> [<c1003453>] show_registers+0xe9/0x153
> [<c100369f>] die+0x15c/0x23d
> [<c1003a79>] do_invalid_op+0x90/0x97
> [<c1002ed3>] error_code+0x4f/0x54
> [<c10424fd>] kmem_cache_free+0x26/0x65
> [<c10a8c01>] ext3_destroy_inode+0x17/0x19
> [<c10784c9>] destroy_inode+0x27/0x3d
> [<c1078837>] dispose_list+0x60/0x178
> [<c1078f81>] prune_icache+0x363/0x399
> [<c1078fd0>] shrink_icache_memory+0x19/0x32
> [<c1044dd7>] shrink_slab+0x104/0x172
> [<c104641e>] try_to_free_pages+0xbe/0x16f
> [<c103d9a0>] __alloc_pages+0x1d3/0x393
> [<c104037c>] kmem_getpages+0x2d/0x7f
> [<c1041869>] cache_grow+0x155/0x2a8
> [<c1041f1f>] cache_alloc_refill+0x285/0x2c2
> [<c10423c6>] kmem_cache_alloc+0x5d/0x77
> [<c1075dac>] d_alloc+0x16/0x27a
> [<c106b2b9>] real_lookup+0x40/0xc2
> [<c106b68e>] do_lookup+0x41/0x75
> [<c106c3a7>] __link_path_walk+0xce5/0x1066
> [<c106c768>] link_path_walk+0x40/0xc7
> [<c106ca87>] path_lookup+0xec/0xf7
> [<c106cbc9>] __user_walk+0x28/0x42
> [<c10667b3>] vfs_lstat+0x17/0x3f
> [<c1066d1e>] sys_lstat64+0x13/0x29
> [<c1002c5f>] sysenter_past_esp+0x54/0x75
> Code: e8 bc e4 ff ff 8b 55 10 89 10 58 5a 8b 5b 0c 89 f0 31 d2 8b 4f 34 29 d8 f7 f1 3b 47 3c 72 02 0f 0b 0f af c1 8d 04 18 39 c6 74 02 <0f> 0b f6 47 39 02 74 15 6a 05 57 57 e8 1d e4 ff ff 8d 04 30 89
>
* Re: [Fastboot] Re: Kdump Testing
2005-04-28 11:44 ` Vivek Goyal
@ 2005-04-28 16:11 ` Randy.Dunlap
2005-04-28 19:08 ` Eric W. Biederman
2005-04-29 3:08 ` [PATCH] Kdump docs Randy.Dunlap
0 siblings, 2 replies; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-28 16:11 UTC (permalink / raw)
To: vgoyal; +Cc: akpm, ebiederm, fastboot, linux-kernel, sharyathi
On Thu, 28 Apr 2005 17:14:16 +0530 Vivek Goyal <vgoyal@in.ibm.com> wrote:
> > > Can you post a full serial console output of second kernel? That would help.
> >
> > I did another test run, same kernels (both running and recovery).
> > The recovery kernel got a little further this time, still had
> > Badness and a BUG.
> >
> > ---
>
> Ok. I am also able to see this slab corruption occurring on my machine. I can
> get away with the problem if I disable cachefs support.
>
> In fact, I can reproduce the problem if I boot capture kernel normally through
> BIOS with commandline "mem=64M". Looks like it is generic problem and not
> associated with kexec/kdump. Cachefs might be doing some corruption.
                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Wheeeeeeeeee. Great, we (I) can do without cachefs,
and when I do that, kexec + kdump works.
First time that I've seen kdump work. :)

-rw-r--r-- 1 root root 1.0G Apr 28 08:41 oldmem.0428
-r-------- 1 root root 960M Apr 28 08:36 vmcore.0428

My (crashing/panic) kernel is built without -g, but gdb
can still tell me this much:

(gdb) bt
#0 0xc010ef95 in crash_get_current_regs ()
#1 0x00000000 in ?? ()
#2 0xee821ea0 in ?? ()
#3 0xee821ea0 in ?? ()
#4 0xee821ea0 in ?? ()
#5 0x00000046 in ?? ()
#6 0x00000000 in ?? ()
#7 0x00000000 in ?? ()
#8 0x00000000 in ?? ()
#9 0xee82c000 in ?? ()
#10 0x00000000 in ?? ()
#11 0xc010ed38 in machine_kexec ()

Thanks for following up, tracking, working on this.

---
~Randy
* Re: [Fastboot] Re: Kdump Testing
2005-04-28 16:11 ` Randy.Dunlap
@ 2005-04-28 19:08 ` Eric W. Biederman
2005-04-29 3:08 ` [PATCH] Kdump docs Randy.Dunlap
1 sibling, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2005-04-28 19:08 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: vgoyal, akpm, fastboot, linux-kernel, sharyathi

"Randy.Dunlap" <rddunlap@osdl.org> writes:

> On Thu, 28 Apr 2005 17:14:16 +0530
> Vivek Goyal <vgoyal@in.ibm.com> wrote:
>
> > > > Can you post a full serial console output of second kernel? That would
> help.
>
> > > >
> > > I did another test run, same kernels (both running and recovery).
> > > The recovery kernel got a little further this time, still had
> > > Badness and a BUG.
> > >
> > > ---
> >
> > Ok. I am also able to see this slab corruption occurring on my machine. I can
>
> > get away with the problem if I disable cachefs support.
> >
> > Infact, I can reproduce the problem if I boot capture kernel normally through
>
> > BIOS with commandline "mem=64M". Looks like it is generic problem and not
> > associated with kexec/kdump. Cachefs might be doing some corruption.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Wheeeeeeeeee. Great, we (I) can do without cachefs,
> and when I do that, kexec + kdump works.
> First time that I've seen kdump work. :)
>
> -rw-r--r-- 1 root root 1.0G Apr 28 08:41 oldmem.0428
> -r-------- 1 root root 960M Apr 28 08:36 vmcore.0428
>
> My (crashing/panic) kernel is built without -g, but gdb
> can still tell me this much:
>
> (gdb) bt
> #0 0xc010ef95 in crash_get_current_regs ()
> #1 0x00000000 in ?? ()
> #2 0xee821ea0 in ?? ()
> #3 0xee821ea0 in ?? ()
> #4 0xee821ea0 in ?? ()
> #5 0x00000046 in ?? ()
> #6 0x00000000 in ?? ()
> #7 0x00000000 in ?? ()
> #8 0x00000000 in ?? ()
> #9 0xee82c000 in ?? ()
> #10 0x00000000 in ?? ()
> #11 0xc010ed38 in machine_kexec ()
>
>
> Thanks for following up, tracking, working on this.

Congratulations everyone.

The good really good news is when the recovery kernel failed
it failed early enough it did not make things worse. It is
good to see that prediction confirmed :)

Eric
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] Kdump docs.
2005-04-28 16:11 ` Randy.Dunlap
2005-04-28 19:08 ` Eric W. Biederman
@ 2005-04-29 3:08 ` Randy.Dunlap
2005-04-29 5:07 ` Vivek Goyal
1 sibling, 1 reply; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-29 3:08 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: vgoyal, akpm, sharyathi, fastboot, ebiederm, linux-kernel

On Thu, 28 Apr 2005 09:11:19 -0700 Randy.Dunlap wrote:

| Wheeeeeeeeee. Great, we (I) can do without cachefs,
| and when I do that, kexec + kdump works.
| First time that I've seen kdump work. :)

Vivek, Hari, Andrew-

Here's a patch to make Documentation/kdump.txt cleaner & clearer.

---

From: Randy Dunlap <rddunlap@osdl.org>

Cleanups and clear-ups for kdump doc:
typos, punctuation, 80 columns, examples.

Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
---

 Documentation/kdump.txt | 89 ++++++++++++++++++++++++++++--------------------
 1 files changed, 52 insertions(+), 37 deletions(-)

diff -Naurp ./Documentation/kdump.txt~kdump_docco ./Documentation/kdump.txt
--- ./Documentation/kdump.txt~kdump_docco 2005-04-22 10:01:39.000000000 -0700
+++ ./Documentation/kdump.txt 2005-04-28 19:55:03.000000000 -0700
@@ -1,4 +1,4 @@
-Documentation for kdump - the kexec based crash dumping solution
+Documentation for kdump - the kexec-based crash dumping solution
 ================================================================

 DESIGN
@@ -11,10 +11,10 @@ DMA from the first kernel does not corru

 All the necessary information about Core image is encoded in ELF format and
 stored in reserved area of memory before crash. Physical address of start of
-elf header is passed to new kernel through command line parameter elfcorehdr=.
+ELF header is passed to new kernel through command line parameter elfcorehdr=.

-On i386, first 640k of physical memory is needed to boot, irrespctive of where
-the kernel loads at. Hence, this region is backed up by kexec just before
+On i386, the first 640 KB of physical memory is needed to boot, irrespective
+of where the kernel loads. Hence, this region is backed up by kexec just before
 rebooting into the new kernel.

 In the second kernel, "old memory" can be accessed in two ways.
@@ -22,59 +22,72 @@ In the second kernel, "old memory" can b
 - The first one is through a /dev/oldmem device interface. A capture utility
   can read the device file and write out the memory in raw format. This is raw
   dump of memory and analysis/capture tool should be intelligent enough to
-  determine where to look for the right information. Elf headers (elfcorehdr=)
+  determine where to look for the right information. ELF headers (elfcorehdr=)
   can become handy here.

 - The second interface is through /proc/vmcore. This exports the dump as an ELF
   format file which can be written out using any file copy command
   (cp, scp, etc). Further, gdb can be used to perform limited debugging on
   the dump file. This method ensures methods ensure that there is correct
-  ordering of the dump pages (corresponding to the first 640k that has been
+  ordering of the dump pages (corresponding to the first 640 KB that has been
   relocated).

 SETUP
 =====

-1) Obtain the appropriate -mm tree patch and apply it on to the vanilla
-   kernel tree.
+1) Download and build the appropriate version of kexec-tools.

-2) Obtain appropriate version of kexec-tools.
+2) Download and build the appropriate (latest) kexec/kdump (-mm) kernel
+   patchset and apply it to the vanilla kernel tree.

-3) Two kernels need to be built in order to get this feature working.
+   Two kernels need to be built in order to get this feature working.

-   First kernel:
-   a) Enable "kexec system call" feature.
-   b) Enable "sysfs file system support" (Pseudo filesystems).
-   c) Boot into first kernel with command line "crashkernel=Y@X". Put
-      appropriate values for X and Y. Y denotes, how much memory to reserve for
-      second kernel, and X denotes at what physical address reserved memory
-      section starts. For example, crashkernel=48M@16M.
-
-   Second kernel:
-   a) Enable "kernel crash dumps" feature.
-   b) Specifiy a suitable value for "Physical address where the kernel is
-      loaded". Typically this value should be same as X (See option c) above).
-   c) Enable "/proc/vmcore support" (Optional).
-
-   Note: Option a) and b) depend upon "Configure standard kernel feature
-         (for small systems)".
-         Option a) also depends on CONFIG_HIGHMEM.
-         Both option a) and b) are under "Processor Types and Features"
+  A) First kernel:
+   a) Enable "kexec system call" feature (in Processor type and features).
+      CONFIG_KEXEC=y
+   b) This kernel's physical load address should be the default value of
+      0x100000 (0x100000, 1 MB) (in Processor type and features).
+      CONFIG_PHYSICAL_START=0x100000
+   c) Enable "sysfs file system support" (in Pseudo filesystems).
+      CONFIG_SYSFS=y
+   d) Boot into first kernel with the command line parameter "crashkernel=Y@X".
+      Use appropriate values for X and Y. Y denotes how much memory to reserve
+      for the second kernel, and X denotes at what physical address the reserved
+      memory section starts. For example: "crashkernel=64M@16M".
+
+  B) Second kernel:
+   a) Enable "kernel crash dumps" feature (in Processor type and features).
+      CONFIG_CRASH_DUMP=y
+   b) Specify a suitable value for "Physical address where the kernel is
+      loaded" (in Processor type and features). Typically this value
+      should be same as X (See option b) above, e.g., 16 MB or 0x1000000.
+      CONFIG_PHYSICAL_START=0x1000000
+   c) Enable "/proc/vmcore support" (Optional, in Pseudo filesystems).
+      CONFIG_PROC_VMCORE=y
+
+   Note: Options a) and b) depend upon "Configure standard kernel features
+         (for small systems)" (under General setup).
+         Option a) also depends on CONFIG_HIGHMEM (under Processor
+         type and features).
+         Both option a) and b) are under "Processor type and features".

-3) Boot into the first kernel. You are now ready to try out kexec based crash
+3) Boot into the first kernel. You are now ready to try out kexec-based crash
    dumps.

-4) Load the second kernel to be booted using
+4) Load the second kernel to be booted using:

    kexec -p <second-kernel> --crash-dump --args-linux
    --append="root=<root-dev> maxcpus=1 init 1"

    Note: i) <second-kernel> has to be a vmlinux image. bzImage will not work,
             as of now.
-         ii) By default elf headers are stored in ELF32 format(for i386). This is
-             sufficient to represent the physical memory up to 4GB. To store
-             headers in ELF64 format, specifiy "--elf64-core-headers" on kexec
-             command line additionally.
+         ii) By default ELF headers are stored in ELF32 format (for i386). This
+             is sufficient to represent the physical memory up to 4GB. To store
+             headers in ELF64 format, specifiy "--elf64-core-headers" on the
+             kexec command line additionally.
+         iii) For now (or until it is fixed), it's best to build the
+             second-kernel without multi-processor support, i.e., make it
+             a uniprocessor kernel.

 5) System reboots into the second kernel when a panic occurs. A module can be
    written to force the panic, for testing purposes.
@@ -83,14 +96,16 @@ SETUP

    cp /proc/vmcore <dump-file>

-   Dump can also be accessed as a /dev/oldmem device for a linear/raw view.
-   To create the device, type
+   Dump memory can also be accessed as a /dev/oldmem device for a linear/raw
+   view. To create the device, type:

    mknod /dev/oldmem c 1 12

    Use "dd" with suitable options for count, bs and skip to access specific
    portions of the dump.

+   Entire memory: dd if=/dev/oldmem of=oldmem.001
+
 ANALYSIS
 ========

@@ -102,7 +117,7 @@ Limited analysis can be done using gdb o
 Stack trace for the task on processor 0, register display, memory display
 work fine.

-Note: gdb can not analyse core files generated in ELF64 format for i386.
+Note: gdb cannot analyse core files generated in ELF64 format for i386.

 TODO
 ====
^ permalink raw reply [flat|nested] 14+ messages in thread
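[Editorial sketch, not part of the thread.] The patched doc says to use "dd" with suitable count, bs and skip values to read specific portions of /dev/oldmem. A minimal Python sketch of that arithmetic follows. The helper name dd_command, the output file name region.bin, and the 4096-byte block size are assumptions chosen for illustration; only the dd operands if=, of=, bs=, skip= and count= are standard.

```python
MB = 1024 * 1024


def dd_command(offset, length, bs=4096, src="/dev/oldmem", dst="region.bin"):
    """Build a dd command line that copies `length` bytes starting at byte `offset`.

    dd's skip= and count= are expressed in units of bs, so both the offset
    and the length must be whole multiples of the block size.
    """
    if bs <= 0:
        raise ValueError("bs must be positive")
    if offset % bs or length % bs:
        raise ValueError("offset and length must be multiples of bs")
    return f"dd if={src} of={dst} bs={bs} skip={offset // bs} count={length // bs}"


# Hypothetical example: the first 1 MB of old memory starting at the 16 MB mark.
print(dd_command(16 * MB, 1 * MB))
```

With a 4096-byte block size, an offset of 16 MB corresponds to skip=4096 and a length of 1 MB to count=256.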
* Re: [PATCH] Kdump docs.
2005-04-29 3:08 ` [PATCH] Kdump docs Randy.Dunlap
@ 2005-04-29 5:07 ` Vivek Goyal
2005-04-29 14:26 ` [Fastboot] " Randy.Dunlap
2005-04-30 3:04 ` [PATCH] Kdump doc. fix option typo Randy.Dunlap
0 siblings, 2 replies; 14+ messages in thread
From: Vivek Goyal @ 2005-04-29 5:07 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: vgoyal, akpm, sharyathi, fastboot, ebiederm, linux-kernel

Hi Randy,

> +  A) First kernel:
> +   a) Enable "kexec system call" feature (in Processor type and features).
> +      CONFIG_KEXEC=y
> +   b) This kernel's physical load address should be the default value of
> +      0x100000 (0x100000, 1 MB) (in Processor type and features).
> +      CONFIG_PHYSICAL_START=0x100000
> +   c) Enable "sysfs file system support" (in Pseudo filesystems).
> +      CONFIG_SYSFS=y
> +   d) Boot into first kernel with the command line parameter "crashkernel=Y@X".
> +      Use appropriate values for X and Y. Y denotes how much memory to reserve
> +      for the second kernel, and X denotes at what physical address the reserved
> +      memory section starts. For example: "crashkernel=64M@16M".
> +
> +  B) Second kernel:
> +   a) Enable "kernel crash dumps" feature (in Processor type and features).
> +      CONFIG_CRASH_DUMP=y
> +   b) Specify a suitable value for "Physical address where the kernel is
> +      loaded" (in Processor type and features). Typically this value
> +      should be same as X (See option b) above, e.g., 16 MB or 0x1000000.

Should above line be as follows.
"should be same as X (See option d) above."

This will make clear what is X and what should be the new value of
CONFIG_PHYSICAL_START.

Thanks for testing out and providing a clearer documentation.

Thanks
Vivek
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Fastboot] Re: [PATCH] Kdump docs.
2005-04-29 5:07 ` Vivek Goyal
@ 2005-04-29 14:26 ` Randy.Dunlap
2005-04-30 3:04 ` [PATCH] Kdump doc. fix option typo Randy.Dunlap
1 sibling, 0 replies; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-29 14:26 UTC (permalink / raw)
To: vgoyal; +Cc: akpm, sharyathi, fastboot, linux-kernel, ebiederm

On Fri, 29 Apr 2005 10:37:29 +0530 Vivek Goyal wrote:

| Hi Randy,
|
| > +  A) First kernel:
| > +   a) Enable "kexec system call" feature (in Processor type and features).
| > +      CONFIG_KEXEC=y
| > +   b) This kernel's physical load address should be the default value of
| > +      0x100000 (0x100000, 1 MB) (in Processor type and features).
| > +      CONFIG_PHYSICAL_START=0x100000
| > +   c) Enable "sysfs file system support" (in Pseudo filesystems).
| > +      CONFIG_SYSFS=y
| > +   d) Boot into first kernel with the command line parameter "crashkernel=Y@X".
| > +      Use appropriate values for X and Y. Y denotes how much memory to reserve
| > +      for the second kernel, and X denotes at what physical address the reserved
| > +      memory section starts. For example: "crashkernel=64M@16M".
| > +
| > +  B) Second kernel:
| > +   a) Enable "kernel crash dumps" feature (in Processor type and features).
| > +      CONFIG_CRASH_DUMP=y
| > +   b) Specify a suitable value for "Physical address where the kernel is
| > +      loaded" (in Processor type and features). Typically this value
| > +      should be same as X (See option b) above, e.g., 16 MB or 0x1000000.
|
| Should above line be as follows.
| "should be same as X (See option d) above."

Yes, thanks for catching that. Now how to update it....?

| This will make clear what is X and what should be the new value of
| CONFIG_PHYSICAL_START.

---
~Randy
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] Kdump doc. fix option typo.
2005-04-29 5:07 ` Vivek Goyal
2005-04-29 14:26 ` [Fastboot] " Randy.Dunlap
@ 2005-04-30 3:04 ` Randy.Dunlap
1 sibling, 0 replies; 14+ messages in thread
From: Randy.Dunlap @ 2005-04-30 3:04 UTC (permalink / raw)
To: vgoyal; +Cc: akpm, sharyathi, fastboot, linux-kernel, ebiederm

On Fri, 29 Apr 2005 10:37:29 +0530 Vivek Goyal wrote:

| Should above line be as follows.
| "should be same as X (See option d) above."
|
| This will make clear what is X and what should be the new value of
| CONFIG_PHYSICAL_START.

From: Randy Dunlap <rddunlap@osdl.org>

Fix one-letter typo of option b->d.

Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
---

 Documentation/kdump.txt | 2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -Naurp ./Documentation/kdump.txt~kdump_doc_fix_optionb ./Documentation/kdump.txt
--- ./Documentation/kdump.txt~kdump_doc_fix_optionb 2005-04-28 19:55:03.000000000 -0700
+++ ./Documentation/kdump.txt 2005-04-29 19:59:32.000000000 -0700
@@ -60,7 +60,7 @@ SETUP
       CONFIG_CRASH_DUMP=y
    b) Specify a suitable value for "Physical address where the kernel is
       loaded" (in Processor type and features). Typically this value
-      should be same as X (See option b) above, e.g., 16 MB or 0x1000000.
+      should be same as X (See option d) above, e.g., 16 MB or 0x1000000.
       CONFIG_PHYSICAL_START=0x1000000
    c) Enable "/proc/vmcore support" (Optional, in Pseudo filesystems).
       CONFIG_PROC_VMCORE=y

---
^ permalink raw reply [flat|nested] 14+ messages in thread
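[Editorial sketch, not part of the thread.] The fix above ties the second kernel's CONFIG_PHYSICAL_START to the X in "crashkernel=Y@X". As a rough illustration of that relationship, the Python sketch below parses the parameter and prints the matching config line. The function names parse_size, parse_crashkernel and physical_start_config are hypothetical, and only the plain K/M/G suffixes seen in this thread are handled.

```python
import re

# Size suffixes as they appear in this thread (binary multiples).
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a string such as '64M' or '16384K' into a byte count."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_crashkernel(param):
    """Split 'crashkernel=Y@X' into (Y, X): reserved size and start address, in bytes."""
    value = param.split("=", 1)[1] if "=" in param else param
    size, sep, base = value.partition("@")
    if not sep:
        raise ValueError("expected the form crashkernel=Y@X")
    return parse_size(size), parse_size(base)


def physical_start_config(param):
    """Return the CONFIG_PHYSICAL_START line for a second kernel loaded at X."""
    _size, base = parse_crashkernel(param)
    return f"CONFIG_PHYSICAL_START={base:#x}"


# The doc's own example: 64 MB reserved starting at the 16 MB mark.
print(physical_start_config("crashkernel=64M@16M"))
```

For the documented example this yields 0x1000000, the value the doc gives for the second kernel.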
* Re: Kdump Testing
@ 2005-04-22 10:46 Nagesh Sharyathi
2005-04-22 12:32 ` [Fastboot] " Eric W. Biederman
0 siblings, 1 reply; 14+ messages in thread
From: Nagesh Sharyathi @ 2005-04-22 10:46 UTC (permalink / raw)
To: linux-kernel, fastboot; +Cc: akpm, vgoyal

Here is the console boot log, before the machine jumps to BIOS
after hang during panic kerenl boot
----------------------------------------------------
x235

x235b!login: SysRq : Trigger a crashdump
Linux version 2.6.12-rc2-mm1 (root@x235b) (gcc version 3.3.3 (SuSE Linux)) #2 SMP PREEMPT Tue Apr 19 08:55:24 PDT 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000100 - 000000000009c000 (usable)
 BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
 BIOS-e820: 0000000000100000 - 000000005ffd8740 (usable)
 BIOS-e820: 000000005ffd8740 - 000000005ffe0000 (ACPI data)
 BIOS-e820: 000000005ffe0000 - 0000000060000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
user-defined physical RAM map:
 user: 0000000000000000 - 00000000000a0000 (usable)
 user: 0000000001000000 - 0000000001518000 (usable)
 user: 00000000015b8400 - 0000000004000000 (usable)
0MB HIGHMEM available.
64MB LOWMEM available.
found SMP MP-table at 0009c140
DMI 2.3 present.
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x06] enabled)
Processor #6 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled)
Processor #7 15:2 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x06] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x07] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x0e] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 14, version 17, address 0xfec00000, GSI 0-15
ACPI: IOAPIC (id[0x0d] address[0xfec01000] gsi_base[16])
IOAPIC[1]: apic_id 13, version 17, address 0xfec01000, GSI 16-31
ACPI: IOAPIC (id[0x0c] address[0xfec02000] gsi_base[32])
IOAPIC[2]: apic_id 12, version 17, address 0xfec02000, GSI 32-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Enabling APIC mode: Flat. Using 3 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 04000000 (gap: 04000000:fc000000)
Built 1 zonelists
Initializing CPU#0
Kernel command line: root=/dev/sda1 init 1 vga=0x31a selinux=0 splash=silent resume=/dev/sda2 elevator=cfq showopts console=tty0 console=ttyS0,38400n1 memmap=exactmap memmap=640K@0K memmap=5216K@16384K memmap=43295K@22241K elfcorehdr=22240K
PID hash table entries: 512 (order: 9, 8192 bytes)
Detected 2795.976 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 42992k/65536k available (3523k kernel code, 6060k reserved, 1121k data, 228k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Xeon(TM) CPU 2.80GHz stepping 07
Booting processor 1/0 eip 3000
----------------------------------------------------
x206

SysRq : Trigger a crashdump
Linux version 2.6.12-rc2-mm1-II (root@x206h) (gcc version 3.3.3 (SuSE Linux)) #2 SMP PREEMPT Wed Apr 20 18:58:46 IST 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000100 - 000000000009b400 (usable)
 BIOS-e820: 000000000009b400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000d6000 - 00000000000d8000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ff70000 (usable)
 BIOS-e820: 000000007ff70000 - 000000007ff76000 (ACPI data)
 BIOS-e820: 000000007ff76000 - 000000007ff80000 (ACPI NVS)
 BIOS-e820: 000000007ff80000 - 0000000080000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ff800000 - 00000000ffc00000 (reserved)
 BIOS-e820: 00000000fffffc00 - 0000000100000000 (reserved)
user-defined physical RAM map:
 user: 0000000000000000 - 00000000000a0000 (usable)
 user: 0000000001000000 - 00000000014e4000 (usable)
 user: 0000000001584400 - 0000000004000000 (usable)
0MB HIGHMEM available.
64MB LOWMEM available.
found SMP MP-table at 000f6140
DMI present.
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:3 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 15:3 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x03] address[0xfec10000] gsi_base[24])
IOAPIC[1]: apic_id 3, version 32, address 0xfec10000, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Enabling APIC mode: Flat. Using 2 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 04000000 (gap: 04000000:fc000000)
Built 1 zonelists
Initializing CPU#0
Kernel command line: root=/dev/sda5 init 1 vga=0x31a selinux=0 splash=silent acpi=oldboot resume=/dev/sda6 elevator=cfq showopts console=tty0 console=ttyS0,38400n1 memmap=exactmap memmap=640K@0K memmap=5008K@16384K memmap=43503K@22033K elfcorehdr=22032K
PID hash table entries: 512 (order: 9, 8192 bytes)
Detected 2801.477 MHz processor.
Using tsc for high-res timesource
Console: colour dummy device 80x25
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 43208k/65536k available (3269k kernel code, 5848k reserved, 1147k data, 244k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Mount-cache hash table entries: 512
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 04
Booting processor 1/0 eip 3000

linux-kernel-owner@vger.kernel.org wrote on 21/04/2005 19:26:11:

> Hi,
> I tested the kdump tool on x235 and x206 machines and found this problem
> where on kernel Panic, system instead of booting into the panic kernel
> jumps into BIOS and machine restarts.
> (I have given the hardware specifications at the bottom of the mail)
>
> Software:
> - 2.6.12-rc2-mm1
> - kexec-tools-1.101
> - Five kdump user space patches
> [http://marc.theaimsgroup.com/?l=linux-kernel&m=111201661400892&w=2]
>
> Test Procedure:
> - Built first kernel for 1M location with CONFIG_KEXEC enabled.
> - Booted into first kernel with command line options crashkernel=48M@16M.
> - Built second kernel for 16M location with CONFIG_CRASH_DUMP, and
> CONFIG_PROC_VMCORE enabled.
> - Loaded second kernel with following kexec command.
>
> kexec -p vmlinux-16M --args-linux --crash-dump --append="root=<root-dev>
> init 1"
>
> - Inserted a module or echo into sysrq-trigger to invoke panic.
> - System jumps into BIOS directly instead of booting into secondary
> kernel.
>
> Summary Observation:
>
> - Earlier I was able to make kdump work on x330 machine by removing
> maxcpus=1 (as specified in kdump.txt) option during loading panic kernel,
> through kexec tool. But this work around doesn't seems to work with the
> hardware x235 and x206. On kernel panic machine jumps to BIOS rather than
> to panic kernel without displaying any error message.
>
>
> HARDWARE SPECIFICATIONS
> ------------
>
> A) Hardware x330:
> - SMP, 2way, Pentium III (Coppermine) 1 GHz, 1.3G RAM
> - Network Interface (e100)
> - Disk I/O
> SCSI storage controller: Adaptec Ultra160
> -----------
> B)Hardware x235
> - SMP, 2way, Xeon TM 2.8GHz, 1.5g RAM
> - Network Interface (Tigon3)
> - Disk I/O
> SCSI storage controller: IBM Serve RAID
> -------------
> C)Hardware x206
> - SMP, 1way, Pentium IV 2.8GHz, 2g RAM
> - Network Interface (e1000)
> - Disk I/O
> SCSI storage controller: Adaptec Ultra320
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
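[Editorial sketch, not part of the thread.] The capture kernel's "user-defined physical RAM map" in the logs above follows directly from the memmap=SIZE@START options on its command line. The Python sketch below redoes that arithmetic for the x235 log. The function names parse_memmap and format_user_map are hypothetical, the command line is abbreviated to its memory-related options, and the output format only imitates the log lines.

```python
import re

# Size suffixes as they appear on the command lines in this thread.
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# Memory-related options copied from the x235 capture kernel command line.
X235_CMDLINE = (
    "memmap=exactmap memmap=640K@0K memmap=5216K@16384K "
    "memmap=43295K@22241K elfcorehdr=22240K"
)


def parse_memmap(cmdline):
    """Return (start, end) byte ranges for every memmap=SIZE@START option.

    'memmap=exactmap' carries no range and is skipped because it does not
    match the SIZE@START pattern.
    """
    ranges = []
    for m in re.finditer(r"\bmemmap=(\d+)([KMG]?)@(\d+)([KMG]?)", cmdline):
        size = int(m.group(1)) * UNITS[m.group(2)]
        start = int(m.group(3)) * UNITS[m.group(4)]
        ranges.append((start, start + size))
    return ranges


def format_user_map(ranges):
    """Render ranges in a style resembling the kernel's 'user:' map lines."""
    return [f"user: {start:016x} - {end:016x} (usable)" for start, end in ranges]


for line in format_user_map(parse_memmap(X235_CMDLINE)):
    print(line)
```

The three computed ranges match the x235 "user:" lines in the log, and elfcorehdr=22240K falls in the gap between the second range (ending at 21600K) and the third (starting at 22241K).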
* Re: [Fastboot] Re: Kdump Testing
2005-04-22 10:46 Kdump Testing Nagesh Sharyathi
@ 2005-04-22 12:32 ` Eric W. Biederman
0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2005-04-22 12:32 UTC (permalink / raw)
To: Nagesh Sharyathi; +Cc: linux-kernel, fastboot, akpm

Nagesh Sharyathi <sharyathi@in.ibm.com> writes:

> Here is the console boot log, before the machine jumps to BIOS
> after hang during panic kerenl boot

Ok thanks. So this is manually triggered with SysRq
and the kexec part works but the recover kernel simply fails
to boot.

It looks like that hunk of the ACPI code that messes up maxcpus=1
needs to be looked at.

Eric
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2005-04-30 3:05 UTC]

Thread overview: 14+ messages
2005-04-23 3:30 [Fastboot] Re: Kdump Testing Vivek Goyal
2005-04-25 12:15 ` Nagesh Sharyathi
2005-04-25 23:09 ` Randy.Dunlap
2005-04-26 8:54 ` Vivek Goyal
2005-04-27 16:46 ` Randy.Dunlap
2005-04-27 19:23 ` Randy.Dunlap
2005-04-28 11:44 ` Vivek Goyal
2005-04-28 16:11 ` Randy.Dunlap
2005-04-28 19:08 ` Eric W. Biederman
2005-04-29 3:08 ` [PATCH] Kdump docs Randy.Dunlap
2005-04-29 5:07 ` Vivek Goyal
2005-04-29 14:26 ` [Fastboot] " Randy.Dunlap
2005-04-30 3:04 ` [PATCH] Kdump doc. fix option typo Randy.Dunlap
-- strict thread matches above, loose matches on Subject: below --
2005-04-22 10:46 Kdump Testing Nagesh Sharyathi
2005-04-22 12:32 ` [Fastboot] " Eric W. Biederman