* very strange issue with sata,<4G Ram, and ext3
@ 2005-04-28 16:16 Rick Warner
2005-04-28 17:32 ` Rick Warner
2005-04-28 22:48 ` Alan Cox
0 siblings, 2 replies; 8+ messages in thread
From: Rick Warner @ 2005-04-28 16:16 UTC
To: linux-kernel
Hello,
We are having a very strange issue on some 64bit systems. We have a 32 node
cluster of EM64T's (supermicro boards). We are using our node restore
software to propagate a linux install onto them. We do a pxe boot to a
kernel and initrd image. The initrd has some config info, a basic root
filesystem, and a restore script. The kernel is passed init=/restore (the
restore script itself). The script runs dhcp, gets an ip, then nfs mounts
the master node of the cluster. The backup image is stored on the master
node's nfs mount. The script then applies a backed up partition table and
then mkfs's the partitions, mounts them, untars a backup tar to the drive,
and then makes it bootable with grub.
On these systems, we are getting ext2 errors from the initrd during the
untarring. Soon after, we start getting seg faults on random things (looks
like stuff caused by the still running dhcp client), and then a continuous
stream of segfaults on the restore script itself (restore[1]).
The systems being restored are dual em64t's with 2G of ram and 200G sata
drives. If we up the memory to 4G, the restores complete without error. If
we reduce down to 512M, the segfaults start at the mkfs stage instead of the
untar stage. We've tried different sata drives and controllers without
change. Switching to ide drives works. Switching to reiserfs instead of
ext3 for the destination drives works too. We've tried enabling the scsi
debug stuff as well as the jbd debug stuff for ext3 without getting any more
info. We also enabled the kernel debug options too. We've also tried using
the deprecated ide based sata drivers instead of the scsi based ones without
success. We have tried restoring to Intel's Jarell EM64T systems as well as
an Arima HDAMA opteron with the same errors. We've also tried adding swap
space ASAP in the initrd image.
This problem is really baffling us and we're not quite sure what to check
into next. Any ideas?
--
Richard Warner
Lead Systems Integrator
Microway, Inc
(508)732-5517
* Re: very strange issue with sata,<4G Ram, and ext3
2005-04-28 16:16 very strange issue with sata,<4G Ram, and ext3 Rick Warner
@ 2005-04-28 17:32 ` Rick Warner
2005-04-28 22:48 ` Alan Cox
1 sibling, 0 replies; 8+ messages in thread
From: Rick Warner @ 2005-04-28 17:32 UTC
To: linux-kernel
I forgot to mention the kernels that have been tried: 2.6.8.1, 2.6.11.7,
2.6.12-rc3, and a Red Hat 2.6.9.
On Thursday 28 April 2005 12:16 pm, Rick Warner wrote:
> Hello,
> We are having a very strange issue on some 64bit systems. We have a 32
> node cluster of EM64T's (supermicro boards). We are using our node restore
> software to propagate a linux install onto them. We do a pxe boot to a
> kernel and initrd image. The initrd has some config info, a basic root
> filesystem, and a restore script. The kernel is passed init=/restore (the
> restore script itself). The script runs dhcp, gets an ip, then nfs mounts
> the master node of the cluster. The backup image is stored on the master
> node's nfs mount. The script then applies a backed up partition table and
> then mkfs's the partitions, mounts them, untars a backup tar to the drive,
> and then makes it bootable with grub.
>
> On these systems, we are getting ext2 errors from the initrd during the
> untarring. Soon after, we start getting seg faults on random things (looks
> like stuff caused by the still running dhcp client), and then a continuous
> stream of segfaults on the restore script itself (restore[1]).
>
> The systems being restored are dual em64t's with 2G of ram and 200G sata
> drives. If we up the memory to 4G, the restores complete without error. If
> we reduce down to 512M, the segfaults start at the mkfs stage instead of
> the untar stage. We've tried different sata drives and controllers without
> change. Switching to ide drives works. Switching to reiserfs instead of
> ext3 for the destination drives works too. We've tried enabling the scsi
> debug stuff as well as the jbd debug stuff for ext3 without getting any
> more info. We also enabled the kernel debug options too. We've also tried
> using the deprecated ide based sata drivers instead of the scsi based ones
> without success. We have tried restoring to Intel's Jarell EM64T systems
> as well as an Arima HDAMA opteron with the same errors. We've also tried
> adding swap space ASAP in the initrd image.
>
> This problem is really baffling us and we're not quite sure what to check
> into next. Any ideas?
--
Richard Warner
Lead Systems Integrator
Microway, Inc
(508)732-5517
* Re: very strange issue with sata,<4G Ram, and ext3
2005-04-28 16:16 very strange issue with sata,<4G Ram, and ext3 Rick Warner
2005-04-28 17:32 ` Rick Warner
@ 2005-04-28 22:48 ` Alan Cox
2005-04-29 14:45 ` Rick Warner
1 sibling, 1 reply; 8+ messages in thread
From: Alan Cox @ 2005-04-28 22:48 UTC
To: Rick Warner; +Cc: Linux Kernel Mailing List
On Thu, 2005-04-28 at 17:16, Rick Warner wrote:
> On these systems, we are getting ext2 errors from the initrd during the
> untarring. Soon after, we start getting seg faults on random things (looks
> like stuff caused by the still running dhcp client), and then a continuous
> stream of segfaults on the restore script itself (restore[1]).
This sounds almost like the pxe/boot code is still using ram that the
kernel has now used (e.g. the PXE layer or pxe booter forgot to close the
client and it's still DMAing happily into the kernel).
* Re: very strange issue with sata,<4G Ram, and ext3
2005-04-28 22:48 ` Alan Cox
@ 2005-04-29 14:45 ` Rick Warner
2005-05-04 19:29 ` Rick Warner
2005-05-05 21:37 ` Krzysztof Halasa
0 siblings, 2 replies; 8+ messages in thread
From: Rick Warner @ 2005-04-29 14:45 UTC
To: Alan Cox; +Cc: Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 1261 bytes --]
On Thursday 28 April 2005 06:48 pm, Alan Cox wrote:
> On Thu, 2005-04-28 at 17:16, Rick Warner wrote:
> > On these systems, we are getting ext2 errors from the initrd during the
> > untarring. Soon after, we start getting seg faults on random things
> > (looks like stuff caused by the still running dhcp client), and then a
> > continuous stream of segfaults on the restore script itself (restore[1]).
>
> This sounds almost like the pxe/boot code is still using ram that the
> kernel has now used (eg the PXE layer or pxe booter forgot to close the
> client and
> its still DMAing happily into the kernel)
This morning, we tried updating to a newer pxelinux (3.07) and had the same
results. We then tried using etherboot with a mknbi tagged image and also
had the same results. Since we are getting the same problem on 3 different
motherboards with 2 different network adapters, I have not looked into
updating the boot ROM on the NICs. Should I?
What should I look into next? I have attached a serial console log of the
system and errors. The slashes and pipes you see are from a spinning bar
thing. If you want output that is cleaned up without that, I can provide it.
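For readers who want the cleaned-up output mentioned above, here is a minimal sketch of one way to strip the spinner from a console log line. It is a heuristic, not the restore software's own tooling: the name `strip_spinner` and the regular expression are assumptions made for illustration. It only removes a leading run of spinner frames (`|`, `/`, `-`, `\` each followed by a space), plus one final frame fused to the next message.

```python
import re

# Heuristic (assumption): a leading run of spinner frames such as "| ", "/ ",
# "- ", "\ ", optionally followed by one last frame glued to the next message.
SPINNER_PREFIX = re.compile(r"^(?:[|/\\-] )+[|/\\-]?")


def strip_spinner(line: str) -> str:
    """Remove a leading progress-spinner run from one console log line."""
    return SPINNER_PREFIX.sub("", line)


# One line taken from the attached log, spinner included.
sample = (
    "\\ | / - \\ | / - \\ |uname[1150]: segfault at 0000000000000008 "
    "rip 00002aaaaaab3ce7 rsp 00007fffffac5a40 error 4"
)
print(strip_spinner(sample))

# Lines that merely start with a path are left alone, because the run must
# contain at least one "frame followed by a space".
print(strip_spinner("/restore: line 180:  1516 Segmentation fault"))
```

The pattern deliberately errs on the side of leaving text alone: a single frame glued to a message (for example `/sed[1155]: ...`) is not stripped, since it cannot be told apart from a path such as `/restore`.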
--
Richard Warner
Lead Systems Integrator
Microway, Inc
(508)732-5517
[-- Attachment #2: new-pxelinux.txt --]
[-- Type: text/plain, Size: 19985 bytes --]
Bootdata ok (command line is initrd=initrd.img.gz ramdisk_size=46080 rw root=/dev/ram0 devfs=nomount init=/restore console=tty0 console=ttyS0,115200 BOOT_IMAGE=vmlinuz )
Linux version 2.6.12-rc3-em64t-mcms (root@master.cl.usgs.gov) (gcc version 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)) #6 Thu Apr 28 10:11:16 EDT 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009b400 (usable)
BIOS-e820: 000000000009b400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007ff70000 (usable)
BIOS-e820: 000000007ff70000 - 000000007ff78000 (ACPI data)
BIOS-e820: 000000007ff78000 - 000000007ff80000 (ACPI NVS)
BIOS-e820: 000000007ff80000 - 0000000080000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ff800000 - 00000000ffc00000 (reserved)
BIOS-e820: 00000000fffffc00 - 0000000100000000 (reserved)
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: INTEL <6>Product ID: Lindenhurst <6>APIC at: 0xFEE00000
Processor #0 15:4 APIC version 20
Processor #6 15:4 APIC version 20
WARNING: NR_CPUS limit of 1 reached. Processor ignored.
I/O APIC #2 Version 32 at 0xFEC00000.
I/O APIC #3 Version 32 at 0xFEC80000.
I/O APIC #4 Version 32 at 0xFEC80400.
I/O APIC #5 Version 32 at 0xFEC84000.
I/O APIC #8 Version 32 at 0xFEC84400.
Setting APIC routing to flat
Processors: 1
Allocating PCI resources starting at 80000000 (gap: 80000000:60000000)
Checking aperture...
Built 1 zonelists
Kernel command line: initrd=initrd.img.gz ramdisk_size=46080 rw root=/dev/ram0 devfs=nomount init=/restore console=tty0 console=ttyS0,115200 BOOT_IMAGE=vmlinuz
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 1.193182 MHz PIT timer.
time.c: Detected 3000.254 MHz processor.
time.c: Using PIT/TSC based timekeeping.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Placing software IO TLB between 0x3243000 - 0x5243000
Memory: 2007900k/2096576k available (2932k kernel code, 87916k reserved, 1263k data, 140k init)
Mount-cache hash table entries: 256
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
using mwait in idle threads.
CPU0: Thermal monitoring enabled (TM1)
CPU: Intel(R) Xeon(TM) CPU 3.00GHz stepping 01
Using IO APIC NMI watchdog
Using IO-APIC 2
Using IO-APIC 3
Using IO-APIC 4
Using IO-APIC 5
Using IO-APIC 8
activating NMI Watchdog ... done.
testing NMI watchdog ... OK.
Using local APIC timer interrupts.
Detected 12.500 MHz APIC timer.
checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
NET: Registered protocol family 16
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
SCSI subsystem initialized
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Transparent bridge - 0000:00:1e.0
PCI: Using IRQ router PIIX/ICH [8086/24d0] at 0000:00:1f.0
PCI->APIC IRQ transform: 0000:00:02.0[A] -> IRQ 16
PCI->APIC IRQ transform: 0000:00:04.0[A] -> IRQ 16
PCI->APIC IRQ transform: 0000:00:06.0[A] -> IRQ 16
PCI->APIC IRQ transform: 0000:00:1d.0[A] -> IRQ 16
PCI->APIC IRQ transform: 0000:00:1d.1[B] -> IRQ 19
PCI->APIC IRQ transform: 0000:00:1d.2[C] -> IRQ 18
PCI->APIC IRQ transform: 0000:00:1d.3[A] -> IRQ 16
PCI->APIC IRQ transform: 0000:00:1d.7[D] -> IRQ 23
PCI->APIC IRQ transform: 0000:00:1f.2[A] -> IRQ 18
PCI->APIC IRQ transform: 0000:00:1f.3[B] -> IRQ 17
PCI->APIC IRQ transform: 0000:03:02.0[A] -> IRQ 54
PCI->APIC IRQ transform: 0000:03:02.1[B] -> IRQ 55
PCI->APIC IRQ transform: 0000:08:01.0[A] -> IRQ 17
PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
Total HugeTLB memory allocated, 0
JFS: nTxBlock = 8192, nTxLock = 65536
SGI XFS with large block/inode numbers, no debug enabled
Linux agpgart interface v0.101 (c) Dave Jones
Hangcheck: starting hangcheck timer 0.5.0 (tick is 180 seconds, margin is 60 seconds).
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
RAMDISK driver initialized: 8 RAM disks of 46080K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ata1: SATA max UDMA/133 cmd 0x14E8 ctl 0x14DE bmdma 0x14B0 irq 18
ata2: SATA max UDMA/133 cmd 0x14E0 ctl 0x14DA bmdma 0x14B8 irq 18
ata1: dev 0 ATA, max UDMA/133, 398297088 sectors: lba48
ata1: dev 0 configured for UDMA/133
scsi0 : ata_piix
ata2: SATA port has no device.
scsi1 : ata_piix
Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 398297088 512-byte hdwr sectors (203928 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 398297088 512-byte hdwr sectors (203928 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
mice: PS/2 mouse device common for all mice
NET: Registered protocol family 2
IP: routing cache hash table of 2048 buckets, 112Kbytes
TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 9, 3670016 bytes)
TCP: Hash tables configured (established 524288 bind 65536)
NET: Registered protocol family 1
NET: Registered protocol family 17
RAMDISK: Compressed image found at block 0
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 140k freed
Turning on debugging options.
Intel(R) PRO/1000 Network Driver - version 5.7.6-k2
Copyright (c) 1999-2004 Intel Corporation.
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
Internet Software Consortium DHCP Client V3.0.1rc13
Copyright 1995-2002 Internet Software Consortium.
All rights reserved.
For info, please visit http://www.isc.org/products/DHCP
e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
Listening on LPF/eth1/00:30:48:74:a5:71
Sending on LPF/eth1/00:30:48:74:a5:71
Listening on LPF/eth0/00:30:48:74:a5:70
Sending on LPF/eth0/00:30:48:74:a5:70
Listening on LPF/lo/
Sending on LPF/lo/
Sending on Socket/fallback
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3
DHCPDISCOVER on lo to 255.255.255.255 port 67 interval 2
DHCPOFFER from 10.0.0.1
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3
DHCPDISCOVER on lo to 255.255.255.255 port 67 interval 3
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 4
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPDISCOVER on lo to 255.255.255.255 port 67 interval 5
DHCPACK from 10.0.0.1
bound to 10.0.0.100 -- renewal in 1455 seconds.
Brought up network devices.
Setting date to match master
Permission denied.
Fri Apr 29 01:51:44 EDT 2005
nfs warning: mount version older than kernel
Checking that noSCSI device sda: 398297088 512-byte hdwr sectors (203928 MB)
-one is using thSCSI device sda: drive cache: write back
is disk right no sda:w ...
sda1 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 >
OK
Disk /dev/sda: 24792 cylinders, 255 heads, 63 sectors/track
Old situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
Device Boot Start End #cyls #blocks Id System
/dev/sda1 0+ 9 10- 80293+ 83 Linux
/dev/sda2 0 - 0 0 0 Empty
/dev/sda3 0 - 0 0 0 Empty
/dev/sda4 10 24791 24782 199061415 5 Extended
/dev/sda5 10+ 253 244- 1959898+ 83 Linux
/dev/sda6 254+ 1470 1217- 9775521 83 Linux
/dev/sda7 1471+ 1836 366- 2939863+ 83 Linux
/dev/sda8 1837+ 2202 366- 2939863+ 83 Linux
/dev/sda9 2203+ 2689 487- 3911796 82 Linux swap
/dev/sda10 2690+ 24791 22102- 177534283+ 83 Linux
New situation:
Units = sectors of 512 bytes, counting from 0
Device Boot Start End #sectors Id System
/dev/sda1 63 160649 160587 83 Linux
/dev/sda2 0 - 0 0 Empty
/dev/sda3 0 - 0 0 Empty
/dev/sda4 160650 398283479 398122830 5 Extended
/dev/sda5 160713 4080509 3919797 83 Linux
/dev/sda6 4080573 23631614 19551042 83 Linux
/dev/sda7 23631678 29511404 5879727 83 Linux
/dev/sda8 29511468 35391194 5879727 83 Linux
/dev/sda9 35391258 43214849 7823592 82 Linux swap
/dev/sda10 43214913 398283479 355068567 83 Linux
Warning: no primary partition is marked bootable (active)
This does not matter for LILO, but the DOS MBR will not boot this disk.
Successfully wrote the new partition table
Re-reading the partition table ...
SCSI device sda: 398297088 512-byte hdwr sectors (203928 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 >
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
grep: /mnt/raidtab: No such file or directory
Created ext2/3 filesystem on /dev/sda1
Created ext2/3 filesystem on /dev/sda5
Created ext2/3 filesystem on /dev/sda6
Created ext2/3 filesystem on /dev/sda7
Created ext2/3 filesystem on /dev/sda8
Created ext2/3 filesystem on /dev/sda10
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Mounted /dev/sda5 at /drive
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Mounted /dev/sda1 at /drive/boot
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda6, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Mounted /dev/sda6 at /drive/usr
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda7, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Mounted /dev/sda7 at /drive/var
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda8, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Mounted /dev/sda8 at /drive/tmp
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda10, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Mounted /dev/sda10 at /drive/home
Adding 3911788k swap on /dev/sda9. Priority:-1 extents:1
Swapspace /dev/sda9 initialized and added
Restoring drive....
| / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ |EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=24576, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=28672, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=32768, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=36864, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=40960, inode=0, rec_len=0, name_len=0
/ - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / -EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=12288, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=16384, inode=0, rec_len=0, name_len=0
uname[1129]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffefe920 error 4
sed[1133]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffb99d90 error 4
sed[1136]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffcbdf90 error 4
\ | / - \ | / - \ |uname[1150]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffac5a40 error 4
/sed[1155]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffb24b70 error 4
sed[1158]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffa44d50 error 4
- \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | /EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=12288, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=16384, inode=0, rec_len=0, name_len=0
uname[1231]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007ffffff0d1f0 error 4
sed[1235]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffd9aea0 error 4
sed[1238]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffbb5710 error 4
-uname[1242]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffcc1300 error 4
sed[1246]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007ffffffe9910 error 4
sed[1249]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffef7ce0 error 4
\ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=20480, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=24576, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=28672, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=32768, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=36864, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=40960, inode=0, rec_len=0, name_len=0
uname[1307]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffbce170 error 4
sed[1311]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffff96b900 error 4
sed[1314]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffff9557d0 error 4
| / - \ | / - \ | / - \uname[1330]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffbb4930 error 4
sed[1334]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffff9200d0 error 4
sed[1337]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007ffffff780f0 error 4
| / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / -uname[1395]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffa83250 error 4
sed[1399]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007ffffffa8380 error 4
sed[1402]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffbc0690 error 4
\ | / - \ | / - \ |uname[1419]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007ffffff07bf0 error 4
sed[1423]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffff82cb00 error 4
sed[1426]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffa072b0 error 4
/ - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / -EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=12288, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ram0): ext2_check_page: bad entry in directory #3345: rec_len is smaller than minimal - offset=16384, inode=0, rec_len=0, name_len=0
uname[1482]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffc3af00 error 4
\sed[1487]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffd3c3c0 error 4
sed[1490]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffbe9b10 error 4
| / - \ | / - \ | / - \uname[1505]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffeda150 error 4
sed[1509]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007ffffff6b310 error 4
sed[1512]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffff9bed80 error 4
|
mkdir[1514]: segfault at 0000000000000008 rip 00002aaaaaab1dff rsp 00007fffff815d60 error 4
/restore: line 1mkdir[1515]: segfault at 0000000000000008 rip 00002aaaaaab1dff rsp 00007fffff994c70 error 4
72: 1514 Segmenchmod[1516]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffa8f3b0 error 4
tation fault restore[1517]: segfault at 0000000000000004 rip 00000000004322a2 rsp 00007fffffca6e58 error 6
mkdir /drive/drive
File Restoration complete.Kernel panic - not syncing: Attempted to kill init!
Ensuring /medi a/floppy and /media/cdrom have been created
/restore: line 177: 1515 Segmentation fault mkdir -p /drive/media/floppy /drive/media/cdrom /drive/media/dvd
Ensuring correct permissions on tmp
/restore: line 180: 1516 Segmentation fault chmod 1777 /drive/tmp
/restore: line 190: 1517 Segmentation fault chroot /drive $GRUB --batch --no-floppy >&/dev/null <<EOF
device (hd0) ${device_save[0]}
root (hd0,0)
setup (hd0)
EOF
Unable to run grub on /dev/sda
/restore: line 195: 1518 Segmentation fault bash
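The segfault lines in the log above can be tallied to look for patterns. The following is a minimal sketch under stated assumptions: the helper names (`parse_segfaults`, `decode_error`) are hypothetical, the regular expression is written only against the line format seen in this log, and the sample lines are copied from the attachment. The error-code decoding uses the low three bits of the x86 page-fault error code (present, write, user).

```python
import re
from collections import Counter

# Matches kernel lines of the form seen in this log, for example:
#   sed[1133]: segfault at 0000000000000008 rip ... rsp ... error 4
SEGFAULT = re.compile(
    r"([a-z_]+)\[(\d+)\]: segfault at ([0-9a-f]+) rip ([0-9a-f]+) "
    r"rsp ([0-9a-f]+) error (\d+)"
)


def decode_error(code: int) -> str:
    """Decode the low three bits of an x86 page-fault error code."""
    present = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {present}"


def parse_segfaults(lines):
    """Yield (program, pid, fault_addr, rip, error_code) for each match."""
    for line in lines:
        m = SEGFAULT.search(line)
        if m:
            prog, pid, addr, rip, _rsp, err = m.groups()
            yield prog, int(pid), int(addr, 16), int(rip, 16), int(err)


# Lines copied from the attached console log (spinner residue left in place).
sample = [
    "uname[1129]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffefe920 error 4",
    "sed[1133]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffb99d90 error 4",
    "\\ | / - \\ | / - \\ |uname[1150]: segfault at 0000000000000008 rip 00002aaaaaab3ce7 rsp 00007fffffac5a40 error 4",
    "mkdir[1514]: segfault at 0000000000000008 rip 00002aaaaaab1dff rsp 00007fffff815d60 error 4",
    "tation fault restore[1517]: segfault at 0000000000000004 rip 00000000004322a2 rsp 00007fffffca6e58 error 6",
]

faults = list(parse_segfaults(sample))
by_site = Counter((addr, rip) for _, _, addr, rip, _ in faults)
for (addr, rip), count in by_site.most_common():
    print(f"{count} fault(s) at address {addr:#x}, rip {rip:#x}")
for code in sorted({err for *_, err in faults}):
    print(f"error {code}: {decode_error(code)}")
```

In this log, different programs (`uname`, `sed`, `chmod`) fault at the same instruction pointer with the same near-null address. That would be consistent with a fault in code or data shared by those programs rather than in each binary separately, though the log alone does not prove it.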
* Re: very strange issue with sata,<4G Ram, and ext3
2005-04-29 14:45 ` Rick Warner
@ 2005-05-04 19:29 ` Rick Warner
2005-05-05 15:00 ` Alan Cox
2005-05-05 21:37 ` Krzysztof Halasa
1 sibling, 1 reply; 8+ messages in thread
From: Rick Warner @ 2005-05-04 19:29 UTC
To: linux-kernel
Just sending out a ping on this.. anyone have any ideas?
On Friday 29 April 2005 10:45 am, you wrote:
> On Thursday 28 April 2005 06:48 pm, Alan Cox wrote:
> > On Thu, 2005-04-28 at 17:16, Rick Warner wrote:
> > > On these systems, we are getting ext2 errors from the initrd during
> > > the untarring. Soon after, we start getting seg faults on random
> > > things (looks like stuff caused by the still running dhcp client), and
> > > then a continuous stream of segfaults on the restore script itself
> > > (restore[1]).
> >
> > This sounds almost like the pxe/boot code is still using ram that the
> > kernel has now used (eg the PXE layer or pxe booter forgot to close the
> > client and
> > its still DMAing happily into the kernel)
>
> This morning, we tried updating to a newer pxelinux (3.07) and had the same
> results. We then tried using etherboot with a mknbi tagged image and also
> had the same results. Since we are getting the same problem on 3
> different motherboards with 2 different network adapters, I have not looked
> into updating the boot rom on the nics. Should I?
>
> What should I look into next? I have attached a serial console log of the
> system and errors. The slashes and pipes you see are from a spinning bar
> thing. If you want output that is cleaned up without that, I can provide
> it.
--
Richard Warner
Lead Systems Integrator
Microway, Inc
(508)732-5517
* Re: very strange issue with sata,<4G Ram, and ext3
2005-05-04 19:29 ` Rick Warner
@ 2005-05-05 15:00 ` Alan Cox
0 siblings, 0 replies; 8+ messages in thread
From: Alan Cox @ 2005-05-05 15:00 UTC
To: Rick Warner; +Cc: Linux Kernel Mailing List
On Wed, 2005-05-04 at 20:29, Rick Warner wrote:
> Just sending out a ping on this.. anyone have any ideas?
The best I can think of right now going forward is to check:
    32 v 64 bit kernel
    32bit highmem-aware kernel v 32bit non-highmem (1GB limit) kernel
    PATA boot v SATA boot v network boot
just to try and find any patterns.
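One way to keep track of the comparisons suggested above is to enumerate them as a test matrix and record a result per combination. This is only a sketch of that bookkeeping: crossing every kernel variant with every boot method is an assumption about how the checks might be organised, and the labels and the `result` field are hypothetical.

```python
from itertools import product

# Variants named in the suggestion above; labels are illustrative.
kernels = ["64-bit", "32-bit highmem", "32-bit non-highmem (1GB limit)"]
boot_methods = ["PATA", "SATA", "network"]

# One entry per combination; "result" is filled in by hand after each run.
matrix = [
    {"kernel": kernel, "boot": boot, "result": None}
    for kernel, boot in product(kernels, boot_methods)
]

for number, case in enumerate(matrix, start=1):
    print(f"{number}. kernel={case['kernel']}, boot={case['boot']}, result={case['result']}")
```

Filling in pass or fail for each row makes it easier to see whether failures follow the kernel word size, the highmem setting, or the boot path.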
* Re: very strange issue with sata,<4G Ram, and ext3
2005-04-29 14:45 ` Rick Warner
2005-05-04 19:29 ` Rick Warner
@ 2005-05-05 21:37 ` Krzysztof Halasa
2005-05-06 13:39 ` Rick Warner
1 sibling, 1 reply; 8+ messages in thread
From: Krzysztof Halasa @ 2005-05-05 21:37 UTC
To: Rick Warner; +Cc: Alan Cox, Linux Kernel Mailing List
Rick Warner <rick@microway.com> writes:
> This morning, we tried updating to a newer pxelinux (3.07) and had the same
> results. We then tried using etherboot with a mknbi tagged image and also
> had the same results. Since we are getting the same problem on 3 different
> motherboards with 2 different network adapters, I have not looked into
> updating the boot rom on the nics. Should I?
I remember I had memory corruption problems with an old version of
Etherboot a few years ago. The machines were mostly AMD K6 based,
network cards were SMC EPIC100 (Etherpower II) and/or RTL 8139.
Memtest86 (downloaded with Etherboot) complained about random errors.
I think Linux didn't show any such illness.
This was Etherboot 4.something. Upgrading to 5.something fixed the
problem.
I suspect you're using Etherboot newer than 4.x though. I'd probably
give memtest86 loaded from network a try.
--
Krzysztof Halasa
* Re: very strange issue with sata,<4G Ram, and ext3
2005-05-05 21:37 ` Krzysztof Halasa
@ 2005-05-06 13:39 ` Rick Warner
0 siblings, 0 replies; 8+ messages in thread
From: Rick Warner @ 2005-05-06 13:39 UTC
To: Krzysztof Halasa; +Cc: Alan Cox, Linux Kernel Mailing List
On Thursday 05 May 2005 05:37 pm, Krzysztof Halasa wrote:
> Rick Warner <rick@microway.com> writes:
> > This morning, we tried updating to a newer pxelinux (3.07) and had the
> > same results. We then tried using etherboot with a mknbi tagged image
> > and also had the same results. Since we are getting the same problem on
> > 3 different motherboards with 2 different network adapters, I have not
> > looked into updating the boot rom on the nics. Should I?
>
> I remember I had memory corruption problems with an old version of
> Etherboot few years ago. The machines were mostly AMD K6 based,
> network cards were SMC EPIC100 (Etherpower II) and/or RTL 8139.
>
> Memtest86 (downloaded with Etherboot) complained about random errors.
> I think Linux didn't show any such illness.
> This was Etherboot 4.something. Upgrading to 5.something fixed the
> problem.
>
> I suspect you're using Etherboot newer than 4.x though. I'd probably
> give memtest86 loaded from network a try.
We actually run memtest86 from the network regularly. This cluster had run
dozens of passes of memtest booted over the network before doing any of this.
We also did an md5sum of our initrd from the network boot server, and then
had the initrd do an md5sum of itself on the network boot. They matched.
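The checksum comparison described above can be sketched as follows. This is an illustration, not the commands actually used: the function names are hypothetical, and two throwaway temporary files stand in for the copy of the initrd on the boot server and the copy seen after the network boot.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(server_copy: str, booted_copy: str) -> bool:
    """Compare two copies of an image by their MD5 digests."""
    return md5_of_file(server_copy) == md5_of_file(booted_copy)


# Demo: two temporary files with identical contents stand in for the
# two copies of the initrd.
payload = b"stand-in for initrd.img.gz contents"
paths = []
for _ in range(2):
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        paths.append(tmp.name)

print(images_match(*paths))

for path in paths:
    os.remove(path)
```

A matching digest shows the image was intact when it was hashed. It says nothing about memory contents being altered later, which is when the errors in the log appear.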
Thanks for the advice though! I appreciate it.
--
Richard Warner
Lead Systems Integrator
Microway, Inc
(508)732-5517
Thread overview: 8+ messages
2005-04-28 16:16 very strange issue with sata,<4G Ram, and ext3 Rick Warner
2005-04-28 17:32 ` Rick Warner
2005-04-28 22:48 ` Alan Cox
2005-04-29 14:45 ` Rick Warner
2005-05-04 19:29 ` Rick Warner
2005-05-05 15:00 ` Alan Cox
2005-05-05 21:37 ` Krzysztof Halasa
2005-05-06 13:39 ` Rick Warner