* Re: 2.6.31-rc5-git2 crash on a idle system.
       [not found] ` <1249421223.18245.36.camel@pasglop>
@ 2009-08-05 9:17 ` Sachin Sant
  2009-08-05 9:52   ` Benjamin Herrenschmidt
  2009-08-09 18:55   ` 2.6.31-rc5-git2 crash on a idle system Louwrentius
  0 siblings, 2 replies; 19+ messages in thread
From: Sachin Sant @ 2009-08-05 9:17 UTC
  To: Benjamin Herrenschmidt; +Cc: neilb, linuxppc-dev, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1296 bytes --]

Benjamin Herrenschmidt wrote:
> On Tue, 2009-08-04 at 17:57 +0530, Sachin Sant wrote:
>
>> I have a power6 blade [IBM,7998-61X] running 2.6.31-rc5-git2
>> kernel (a33a052f19a21d727847391c8c1aff3fb221c472). After some
>> period of inactivity the machine drops into xmon with following
>> traces.
>>
>
> Looks like code has been overwritten with data. Would be useful
> to try to track down when that happens.
>
> Ben.

2.6.31-rc5-git1 (4905f92ed752d49ebe9cce4fe78a4bc39e710523) works fine
on this box without any problem. So the problem was introduced between
2.6.31-rc5-git1 (4905f92ed752d49ebe9cce4fe78a4bc39e710523) and
2.6.31-rc5-git2 (a33a052f19a21d727847391c8c1aff3fb221c472).

Looking at the changelog, all the changes are confined to the
drivers/md/ directory (probably the following five commits). I do have
all the CONFIG_MD* options enabled in my kernel.

449aad3e25358812c43afc60918c5ad3819488e7
70471dafe3390243c598a3165dfb86b8b8b3f4fe
3673f305faf1bc66ead751344f8262ace851ff44
3a981b03f38dc3b8a69b77cbc679e66c1318a44a
ac5e7113e74872928844d00085bd47c988f12728

I will try to find out the exact commit.
Thanks
-Sachin

-- 
---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

[-- Attachment #2: rc5-git2-log --]
[-- Type: text/plain, Size: 19467 bytes --]

[root@mjs22lp1 home]# cpu 0x0: Vector: 700 (Program Check) at [c00000000ffffa90]
pc: c000000000600000: .flow_cache_new_hashrnd+0x3c/0xcc
lr: c0000000000c6038: .run_timer_softirq+0x20c/0x2f4
sp: c00000000ffffd10
msr: 8000000000089032
current = 0xc000000000f58b70
paca = 0xc0000000010b2400
pid = 0, comm = swapper
enter ? for help
[c00000000ffffda0] c0000000000c6038 .run_timer_softirq+0x20c/0x2f4
[c00000000ffffea0] c0000000000be8a0 .__do_softirq+0x174/0x2c8
[c00000000fffff90] c0000000000307b0 .call_do_softirq+0x14/0x24
[c000000001013870] c00000000000ecf8 .do_softirq+0xa0/0x104
[c000000001013910] c0000000000be1d0 .irq_exit+0x74/0xd4
[c000000001013990] c00000000002ce20 .timer_interrupt+0x1cc/0x200
[c000000001013a30] c000000000003728 decrementer_common+0x128/0x180
--- Exception: 901 (Decrementer) at c00000000000ec3c .raw_local_irq_restore+0xc0/0xdc
[c000000001013dc0] c000000000015824 .cpu_idle+0x13c/0x1e0
[c000000001013e60] c000000000009fe8 .rest_init+0x94/0xcc
[c000000001013ee0] c000000000990cf4 .start_kernel+0x484/0x4a8
[c000000001013f90] c000000000008408 .start_here_common+0x2c/0xa4
0:mon> e
cpu 0x0: Vector: 700 (Program Check) at [c00000000ffffa90]
pc: c000000000600000: .flow_cache_new_hashrnd+0x3c/0xcc
lr: c0000000000c6038: .run_timer_softirq+0x20c/0x2f4
sp: c00000000ffffd10
msr: 8000000000089032
current = 0xc000000000f58b70
paca = 0xc0000000010b2400
pid = 0, comm = swapper
0:mon> di c000000000600000
c000000000600000 00001010 .long 0x1010
c000000000600004 00000008 .long 0x8
c000000000600008 00001013 .long 0x1013
c00000000060000c 0000000f .long 0xf
c000000000600010 7961626f rldimi. r1,r11,44,41
c000000000600014 6f740000 xoris r20,r27,0
c000000000600018 00101600 .long 0x101600
c00000000060001c 00000c00 .long 0xc00
c000000000600020 00000400 .long 0x400
c000000000600024 00101100 .long 0x101100
c000000000600028 000008e9 .long 0x8e9
c00000000060002c 60000000 nop
c000000000600030 e93e8050 ld r9,-32688(r30)
c000000000600034 7c6507b4 extsw r5,r3
c000000000600038 78ab4da4 rldicr r11,r5,9,54
c00000000060003c 396b0040 addi r11,r11,64
0:mon> r
R00 = c0000000000c6038 R16 = 0000000000000000
R01 = c00000000ffffd10 R17 = 0000000000000100
R02 = c000000001011110 R18 = c000000001059300
R03 = 0000000000000000 R19 = c000000001102f98
R04 = c00000000ffffd60 R20 = c000000001103398
R05 = ffffffffffffffff R21 = c000000001103798
R06 = 0000000000000700 R22 = c000000001103b98
R07 = 0000000000000001 R23 = 0000000000000000
R08 = 0000000000830000 R24 = c00000000fffc000
R09 = c00000000072f890 R25 = 0000000000000000
R10 = c00000000120fbf8 R26 = 0000000000200200
R11 = c000000000600090 R27 = c00000000ffffe10
R12 = 0000000028000022 R28 = 0000000000000001
R13 = c0000000010b2400 R29 = c0000000010805e8
R14 = 0000000000382800 R30 = c000000000fb4f78
R15 = 0000000000000000 R31 = c00000000ffffd10
pc = c000000000600000 .flow_cache_new_hashrnd+0x3c/0xcc
lr = c0000000000c6038 .run_timer_softirq+0x20c/0x2f4
msr = 8000000000089032 cr = 28000024
ctr = c0000000005fffc4 xer = 0000000000000000 trap = 700
0:mon> dl
<6>Phyp-dump disabled at boot time
<6>Using pSeries machine description
<7>Page orders: linear mapping = 24, virtual = 16, io = 12, vmemmap = 24
<6>Using 1TB segments
<4>Found initrd at 0xc000000002000000:0xc000000002382800
<6>console [udbg0] enabled
<6>Partition configured for 8 cpus.
<6>CPU maps initialized for 2 threads per core
<7> (thread shift is 1)
<4>Starting Linux PPC64 #1 SMP Tue Aug 4 12:15:55 IST 2009
<4>-----------------------------------------------------
<4>ppc64_pft_size = 0x1a
<4>physicalMemorySize = 0xa0000000
<4>htab_hash_mask = 0x7ffff
<4>-----------------------------------------------------
<6>Initializing cgroup subsys cpuset
<6>Initializing cgroup subsys cpu
<5>Linux version 2.6.31-rc5-git2 (root@mjs22lp1) (gcc version 4.4.0 20090307 (Red Hat 4.4.0-0.23) (GCC) ) #1 SMP Tue Aug 4 12:15:55 IST 2009
<4>[boot]0012 Setup Arch
<7>Node 0 Memory: 0x0-0x54000000
<7>Node 1 Memory: 0x54000000-0xa0000000
<4>EEH: No capable adapters found
<6>PPC64 nvram contains 15360 bytes
<7>Using shared processor idle loop
<4>Zone PFN ranges:
<4> DMA 0x00000000 -> 0x0000a000
<4> Normal 0x0000a000 -> 0x0000a000
<4>Movable zone start PFN for each node
<4>early_node_map[2] active PFN ranges
<4> 0: 0x00000000 -> 0x00005400
<4> 1: 0x00005400 -> 0x0000a000
<7>On node 0 totalpages: 21504
<7> DMA zone: 19 pages used for memmap
<7> DMA zone: 0 pages reserved
<7> DMA zone: 21485 pages, LIFO batch:1
<7>On node 1 totalpages: 19456
<7> DMA zone: 17 pages used for memmap
<7> DMA zone: 0 pages reserved
<7> DMA zone: 19439 pages, LIFO batch:1
<4>[boot]0015 Setup Done
<4>Built 2 zonelists in Node order, mobility grouping on. Total pages: 40924
<4>Policy zone: DMA
<5>Kernel command line: ro console=hvc0 root=UUID=9448b379-7842-486c-8bc8-22a1427c4462
<4>PID hash table entries: 4096 (order: 12, 32768 bytes)
<4>freeing bootmem node 0
<4>freeing bootmem node 1
<6>Memory: 2570560k/2621440k available (15296k kernel code, 50880k reserved, 1152k data, 1488k bss, 5504k init)
<6>SLUB: Genslabs=18, HWalign=128, Order=0-3, MinObjects=0, CPUs=8, Nodes=16
<6>Hierarchical RCU implementation.
<6>NR_IRQS:512
<4>[boot]0020 XICS Init
<4>[boot]0021 XICS Done
<7>pic: no ISA interrupt controller
<7>time_init: decrementer frequency = 512.000000 MHz
<7>time_init: processor frequency = 4005.000000 MHz
<6>clocksource: timebase mult[7d0000] shift[22] registered
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[0]
<4>Console: colour dummy device 80x25
<6>console handover: boot [udbg0] -> real [hvc0]
<6>allocated 1638400 bytes of page_cgroup
<6>please try 'cgroup_disable=memory' option if you don't want memory cgroups
<6>Security Framework initialized
<6>SELinux: Initializing.
<7>SELinux: Starting in permissive mode
<6>Dentry cache hash table entries: 524288 (order: 6, 4194304 bytes)
<6>Inode-cache hash table entries: 262144 (order: 5, 2097152 bytes)
<4>Mount-cache hash table entries: 4096
<6>Initializing cgroup subsys ns
<6>Initializing cgroup subsys cpuacct
<6>Initializing cgroup subsys memory
<6>Initializing cgroup subsys devices
<6>Initializing cgroup subsys freezer
<6>Initializing cgroup subsys net_cls
<6>ftrace: allocating 19543 entries in 8 pages
<7>irq: irq 2 on host null mapped to virtual irq 16
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[2]
<4>Processor 2 found.
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[3]
<4>Processor 3 found.
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[4]
<4>Processor 4 found.
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[5]
<4>Processor 5 found.
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[6]
<4>Processor 6 found.
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[7]
<4>Processor 7 found.
<6>Brought up 8 CPUs
<7>Node 0 CPUs: 0-7
<7>Node 1 CPUs:
<7>CPU0 attaching sched-domain:
<7> domain 0: span 0-1 level SIBLING
<7> groups: 0 1
<7> domain 1: span 0-7 level CPU
<7> groups: 0-1 2-3 4-5 6-7
<7> domain 2: span 0-7 level NODE
<7> groups: 0-7 (__cpu_power = 4096)
<7>CPU1 attaching sched-domain:
<7> domain 0: span 0-1 level SIBLING
<7> groups: 1 0
<7> domain 1: span 0-7 level CPU
<7> groups: 0-1 2-3 4-5 6-7
<7> domain 2: span 0-7 level NODE
<7> groups: 0-7 (__cpu_power = 4096)
<7>CPU2 attaching sched-domain:
<7> domain 0: span 2-3 level SIBLING
<7> groups: 2 3
<7> domain 1: span 0-7 level CPU
<7> groups: 2-3 4-5 6-7 0-1
<7> domain 2: span 0-7 level NODE
<7> groups: 0-7 (__cpu_power = 4096)
<7>CPU3 attaching sched-domain:
<7> domain 0: span 2-3 level SIBLING
<7> groups: 3 2
<7> domain 1: span 0-7 level CPU
<7> groups: 2-3 4-5 6-7 0-1
<7> domain 2: span 0-7 level NODE
<7> groups: 0-7 (__cpu_power = 4096)
<7>CPU4 attaching sched-domain:
<7> domain 0: span 4-5 level SIBLING
<7> groups: 4 5
<7> domain 1: span 0-7 level CPU
<7> groups: 4-5 6-7 0-1 2-3
<7> domain 2: span 0-7 level NODE
<7> groups: 0-7 (__cpu_power = 4096)
<7>CPU5 attaching sched-domain:
<7> domain 0: span 4-5 level SIBLING
<7> groups: 5 4
<7> domain 1: span 0-7 level CPU
<7> groups: 4-5 6-7 0-1 2-3
<7> domain 2: span 0-7 level NODE
<7> groups: 0-7 (__cpu_power = 4096)
<7>CPU6 attaching sched-domain:
<7> domain 0: span 6-7 level SIBLING
<7> groups: 6 7
<7> domain 1: span 0-7 level CPU
<7> groups: 6-7 0-1 2-3 4-5
<7> domain 2: span 0-7 level NODE
<7> groups: 0-7 (__cpu_power = 4096)
<7>CPU7 attaching sched-domain:
<7> domain 0: span 6-7 level SIBLING
<7> groups: 7 6
<7> domain 1: span 0-7 level CPU
<7> groups: 6-7 0-1 2-3 4-5
<7> domain 2: span 0-7 level NODE
<7> groups: 0-7 (__cpu_power = 4096)
<6>regulator: core version 0.5
<6>NET: Registered protocol family 16
<6>IBM eBus Device Driver
<6>POWER6 performance monitor hardware support registered
<6>PCI: Probing PCI hardware
<7>PCI: Probing PCI hardware done
<4>bio: create slab <bio-0> at 0
<5>SCSI subsystem initialized
<7>libata version 3.00 loaded.
<6>usbcore: registered new interface driver usbfs
<6>usbcore: registered new interface driver hub
<6>usbcore: registered new device driver usb
<6>NetLabel: Initializing
<6>NetLabel: domain hash size = 128
<6>NetLabel: protocols = UNLABELED CIPSOv4
<6>NetLabel: unlabeled traffic allowed by default
<7>Switched to high resolution mode on CPU 0
<7>Switched to high resolution mode on CPU 3
<7>Switched to high resolution mode on CPU 4
<7>Switched to high resolution mode on CPU 5
<7>Switched to high resolution mode on CPU 1
<7>Switched to high resolution mode on CPU 6
<7>Switched to high resolution mode on CPU 2
<7>Switched to high resolution mode on CPU 7
<6>NET: Registered protocol family 2
<6>IP route cache hash table entries: 32768 (order: 2, 262144 bytes)
<6>TCP established hash table entries: 131072 (order: 5, 2097152 bytes)
<6>TCP bind hash table entries: 65536 (order: 4, 1048576 bytes)
<6>TCP: Hash tables configured (established 131072 bind 65536)
<6>TCP reno registered
<6>NET: Registered protocol family 1
<6>Trying to unpack rootfs image as initramfs...
<4>Freeing initrd memory: 3594k freed
<7>irq: irq 655360 on host null mapped to virtual irq 17
<7>irq: irq 655362 on host null mapped to virtual irq 18
<6>IOMMU table initialized, virtual merging enabled
<7>irq: irq 655364 on host null mapped to virtual irq 19
<7>irq: irq 655365 on host null mapped to virtual irq 20
<7>irq: irq 589825 on host null mapped to virtual irq 21
<7>RTAS daemon started
<7>RTAS: event: 23, Type: Platform Information Event, Severity: 1
<6>audit: initializing netlink socket (disabled)
<5>type=2000 audit(1249386596.125:1): initialized
<6>HugeTLB registered 16 MB page size, pre-allocated 0 pages
<6>HugeTLB registered 16 GB page size, pre-allocated 0 pages
<5>VFS: Disk quotas dquot_6.5.2
<4>Dquot-cache hash table entries: 8192 (order 0, 65536 bytes)
<6>msgmni has been set to 5024
<7>SELinux: Registering netfilter hooks
<6>alg: No test for stdrng (krng)
<6>Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
<6>io scheduler noop registered
<6>io scheduler anticipatory registered
<6>io scheduler deadline registered
<6>io scheduler cfq registered (default)
<6>pci_hotplug: PCI Hot Plug PCI Core version: 0.5
<6>pciehp: PCI Express Hot Plug Controller Driver version: 0.4
<7>vio_register_driver: driver hvc_console registering
<7>HVSI: registered 0 devices
<6>Linux agpgart interface v0.103
<6>Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
<4>Platform driver 'serial8250' needs updating - please use dev_pm_ops
<6>TX39/49 Serial driver version 1.11
<4>Platform driver 'serial_txx9' needs updating - please use dev_pm_ops
<6>brd: module loaded
<6>loop: module loaded
<6>input: Macintosh mouse button emulation as /devices/virtual/input/input0
<6>Uniform Multi-Platform E-IDE driver
<6>ide-gd driver 1.18
<6>Fixed MDIO Bus: probed
<6>ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
<6>ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
<6>uhci_hcd: USB Universal Host Controller Interface driver
<6>mice: PS/2 mouse device common for all mice
<6>device-mapper: uevent: version 1.0.3
<6>device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com
<6>usbcore: registered new interface driver hiddev
<6>usbcore: registered new interface driver usbhid
<6>usbhid: v2.6:USB HID core driver
<4>nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
<4>CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
<4>nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
<4>sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
<6>ip_tables: (C) 2000-2006 Netfilter Core Team
<6>TCP cubic registered
<6>Initializing XFRM netlink socket
<6>NET: Registered protocol family 17
<7>Running MSI bitmap self-tests ...
<7>PM: Resume from disk failed.
<4>registered taskstats version 1
<6>Initalizing network drop monitor service
<4>Freeing unused kernel memory: 5504k freed
<7>vio_register_driver: driver ibmvscsi registering
<6>ibmvscsi 30000002: SRP_VERSION: 16.a
<6>scsi0 : IBM POWER Virtual SCSI Adapter 1.5.8
<6>ibmvscsi 30000002: partner initialization complete
<6>ibmvscsi 30000002: host srp version: 16.a, host partition 06-1C12A (1), OS 3, max io 262144
<6>ibmvscsi 30000002: Client reserve enabled
<6>ibmvscsi 30000002: sent SRP login
<6>ibmvscsi 30000002: SRP_LOGIN succeeded
<5>scsi 0:0:1:0: Direct-Access AIX VDASD 0001 PQ: 0 ANSI: 3
<5>scsi 0:0:2:0: CD-ROM AIX VOPTA PQ: 0 ANSI: 4
<6>scsi: waiting for bus probes to complete ...
<5>sd 0:0:1:0: Attached scsi generic sg0 type 0
<5>sd 0:0:1:0: [sda] 147324928 512-byte logical blocks: (75.4 GB/70.2 GiB)
<5>sd 0:0:1:0: [sda] Write Protect is off
<7>sd 0:0:1:0: [sda] Mode Sense: 17 00 00 08
<5>sd 0:0:1:0: [sda] Cache data unavailable
<3>sd 0:0:1:0: [sda] Assuming drive cache: write through
<4>sr0: scsi-1 drive
<6>Uniform CD-ROM driver Revision: 3.20
<7>sr 0:0:2:0: Attached scsi CD-ROM sr0
<5>sr 0:0:2:0: Attached scsi generic sg1 type 5
<5>sd 0:0:1:0: [sda] Cache data unavailable
<3>sd 0:0:1:0: [sda] Assuming drive cache: write through
<6> sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 >
<5>sd 0:0:1:0: [sda] Cache data unavailable
<3>sd 0:0:1:0: [sda] Assuming drive cache: write through
<5>sd 0:0:1:0: [sda] Attached SCSI disk
<6>EXT4-fs (sda3): INFO: recovery required on readonly filesystem
<6>EXT4-fs (sda3): write access will be enabled during recovery
<6>EXT4-fs (sda3): barriers enabled
<6>kjournald2 starting: pid 126, dev sda3:8, commit interval 5 seconds
<6>EXT4-fs (sda3): delayed allocation enabled
<6>EXT4-fs: file extents enabled
<6>EXT4-fs: mballoc enabled
<6>EXT4-fs (sda3): recovery complete
<6>EXT4-fs (sda3): mounted filesystem with ordered data mode
<5>type=1404 audit(1249386597.188:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
<7>SELinux: 8192 avtab hash slots, 117789 rules.
<7>SELinux: 8192 avtab hash slots, 117789 rules.
<7>SELinux: 8 users, 13 roles, 2653 types, 121 bools, 1 sens, 1024 cats
<7>SELinux: 74 classes, 117789 rules
<7>SELinux: Completing initialization.
<7>SELinux: Setting up existing superblocks.
<7>SELinux: initialized (dev sda3, type ext4), uses xattr
<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
<7>SELinux: initialized (dev selinuxfs, type selinuxfs), uses genfs_contexts
<7>SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs
<7>SELinux: initialized (dev hugetlbfs, type hugetlbfs), uses genfs_contexts
<7>SELinux: initialized (dev devpts, type devpts), uses transition SIDs
<7>SELinux: initialized (dev inotifyfs, type inotifyfs), uses genfs_contexts
<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
<7>SELinux: initialized (dev anon_inodefs, type anon_inodefs), uses genfs_contexts
<7>SELinux: initialized (dev pipefs, type pipefs), uses task SIDs
<7>SELinux: initialized (dev debugfs, type debugfs), uses genfs_contexts
<7>SELinux: initialized (dev sockfs, type sockfs), uses task SIDs
<7>SELinux: initialized (dev proc, type proc), uses genfs_contexts
<7>SELinux: initialized (dev bdev, type bdev), uses genfs_contexts
<7>SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts
<7>SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts
<5>type=1403 audit(1249386597.481:3): policy loaded auid=4294967295 ses=4294967295
<6>udev: starting version 141
<7>drivers/net/ibmveth.c: ibmveth: IBM i/pSeries Virtual Ethernet Driver 1.03
<7>vio_register_driver: driver ibmveth registering
<6>IBM eHEA ethernet device driver (Release EHEA_0101)
<7>irq: irq 590080 on host null mapped to virtual irq 256
<6>ehea: eth0: Jumbo frames are enabled
<6>ehea: eth0 -> logical port id #3
<6>ehea: eth2: Jumbo frames are enabled
<6>ehea: eth2 -> logical port id #4
<6>udev: renamed network interface eth1 to eth3
<6>udev: renamed network interface eth0_rename to eth1
<6>device-mapper: multipath: version 1.1.0 loaded
<6>EXT4-fs (sda3): internal journal on sda3:8
<6>kjournald starting. Commit interval 5 seconds
<6>EXT3 FS on sda2, internal journal
<6>EXT3-fs: mounted filesystem with writeback data mode.
<7>SELinux: initialized (dev sda2, type ext3), uses xattr
<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
<7>SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts
<6>NET: Registered protocol family 10
<6>lo: Disabled Privacy Extensions
<6>RPC: Registered udp transport module.
<6>RPC: Registered tcp transport module.
<7>SELinux: initialized (dev rpc_pipefs, type rpc_pipefs), uses genfs_contexts
<7>irq: irq 779 on host null mapped to virtual irq 267
<6>ehea: eth2: Physical port up
<6>ehea: External switch port is backup port
<7>irq: irq 780 on host null mapped to virtual irq 268
<7>irq: irq 781 on host null mapped to virtual irq 269
<6>ehea: eth0: Physical port up
<6>ehea: External switch port is backup port
<7>irq: irq 782 on host null mapped to virtual irq 270
<6>Bluetooth: Core ver 2.15
<6>NET: Registered protocol family 31
<6>Bluetooth: HCI device and connection manager initialized
<6>Bluetooth: HCI socket layer initialized
<6>Bluetooth: L2CAP ver 2.13
<6>Bluetooth: L2CAP socket layer initialized
<6>Bluetooth: BNEP (Ethernet Emulation) ver 1.3
<6>Bluetooth: BNEP filters: protocol multicast
<6>Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
<7>SELinux: initialized (dev nfsd, type nfsd), uses genfs_contexts
<4>NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
<6>NFSD: starting 90-second grace period
<5>Bridge firewalling registered
<6>Bluetooth: SCO (Voice Link) ver 0.6
<6>Bluetooth: SCO socket layer initialized
<6>virbr0: starting userspace STP failed, starting kernel STP
<7>SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs
<7>SELinux: initialized (dev proc, type proc), uses genfs_contexts
<7>SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs
<6>lo: Disabled Privacy Extensions
<7>SELinux: initialized (dev proc, type proc), uses genfs_contexts
<7>eth0: no IPv6 routers present
<7>eth2: no IPv6 routers present
<7>eth1: no IPv6 routers present
<7>eth3: no IPv6 routers present
<7>virbr0: no IPv6 routers present
0:mon>

[-- Attachment #3: Type: text/plain, Size: 150 bytes --]

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

* Re: 2.6.31-rc5-git2 crash on a idle system.
  2009-08-05 9:17 ` 2.6.31-rc5-git2 crash on a idle system Sachin Sant
@ 2009-08-05 9:52 ` Benjamin Herrenschmidt
  2009-08-06 13:33   ` Sachin Sant
  2009-08-09 18:55 ` 2.6.31-rc5-git2 crash on a idle system Louwrentius
  1 sibling, 1 reply; 19+ messages in thread
From: Benjamin Herrenschmidt @ 2009-08-05 9:52 UTC
  To: Sachin Sant; +Cc: linuxppc-dev, neilb, linux-raid

> 2.6.31-rc5-git1 (4905f92ed752d49ebe9cce4fe78a4bc39e710523) works fine
> on this box without any problem. So the problem was introduced between
> 2.6.31-rc5-git1 (4905f92ed752d49ebe9cce4fe78a4bc39e710523) and
> 2.6.31-rc5-git2 (a33a052f19a21d727847391c8c1aff3fb221c472).
>
> Looking at the changelog all the changes are confined to
> drivers/md/ directory ( probably following five commits ). I do have
> all the CONFIG_MD* options enabled in my kernel.
>
> 449aad3e25358812c43afc60918c5ad3819488e7
> 70471dafe3390243c598a3165dfb86b8b8b3f4fe
> 3673f305faf1bc66ead751344f8262ace851ff44
> 3a981b03f38dc3b8a69b77cbc679e66c1318a44a
> ac5e7113e74872928844d00085bd47c988f12728
>
> I will try to find out the exact commit.

Thanks. However, since it's a memory corruption (or seems to be), it's
possible that the bisection will mislead you. That is, the culprit could
be somewhere else, and the commit you find via bisection may just happen
to move things around in the kernel in such a way that the corruption
hits that code path instead of another, rarely used one.

I would suggest using printk to print out the content of the memory
where the code appears to have been smashed, at different stages during
boot (maybe even in the initcalls loop in init/main.c), to try to
pinpoint what is causing the corruption.

Cheers,
Ben.
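As a side note to the "code has been overwritten with data" diagnosis: the words from the `di c000000000600000` output in the attached log can be decoded as raw bytes to check whether the overwriting data contains anything recognizable. The following is only a minimal Python sketch and is not something posted in the thread. The helper names `words_to_bytes` and `printable_runs` are made up for illustration, and the code assumes big-endian byte order, which is what this POWER6 kernel uses.

```python
import struct

# The eleven clobbered words copied from the `di c000000000600000` output
# in the attached rc5-git2-log (the words before the first valid `nop`).
DUMP_WORDS = [
    0x00001010, 0x00000008, 0x00001013, 0x0000000F,
    0x7961626F, 0x6F740000, 0x00101600, 0x00000C00,
    0x00000400, 0x00101100, 0x000008E9,
]


def words_to_bytes(words):
    """Pack 32-bit words as big-endian bytes (assumed PPC64 byte order)."""
    return b"".join(struct.pack(">I", w) for w in words)


def printable_runs(data, min_len=4):
    """Return (offset, text) for each printable-ASCII run of >= min_len bytes."""
    runs = []
    start = None
    for i, byte in enumerate(data):
        if 0x20 <= byte < 0x7F:
            if start is None:
                start = i  # a new printable run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, data[start:i].decode("ascii")))
            start = None
    # Flush a run that extends to the end of the buffer.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, data[start:].decode("ascii")))
    return runs


if __name__ == "__main__":
    for offset, text in printable_runs(words_to_bytes(DUMP_WORDS)):
        print(f"+0x{offset:02x}: {text!r}")
```

Run on the words above, this reports a single printable run, `'yaboot'`, at offset 0x10 from c000000000600000 (the two words that xmon disassembled as `rldimi.` and `xoris`). What that string says about the source of the corruption is not established anywhere in this thread; it is only a hint about what kind of data landed on the code.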
<6>uhci_hcd: USB Universal Host Controller Interface driver > <6>mice: PS/2 mouse device common for all mice > <6>device-mapper: uevent: version 1.0.3 > <6>device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com > <6>usbcore: registered new interface driver hiddev > <6>usbcore: registered new interface driver usbhid > <6>usbhid: v2.6:USB HID core driver > <4>nf_conntrack version 0.5.0 (16384 buckets, 65536 max) > <4>CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use > <4>nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or > <4>sysctl net.netfilter.nf_conntrack_acct=1 to enable it. > <6>ip_tables: (C) 2000-2006 Netfilter Core Team > <6>TCP cubic registered > <6>Initializing XFRM netlink socket > <6>NET: Registered protocol family 17 > <7>Running MSI bitmap self-tests ... > <7>PM: Resume from disk failed. > <4>registered taskstats version 1 > <6>Initalizing network drop monitor service > <4>Freeing unused kernel memory: 5504k freed > <7>vio_register_driver: driver ibmvscsi registering > <6>ibmvscsi 30000002: SRP_VERSION: 16.a > <6>scsi0 : IBM POWER Virtual SCSI Adapter 1.5.8 > <6>ibmvscsi 30000002: partner initialization complete > <6>ibmvscsi 30000002: host srp version: 16.a, host partition 06-1C12A (1), OS 3, max io 262144 > <6>ibmvscsi 30000002: Client reserve enabled > <6>ibmvscsi 30000002: sent SRP login > <6>ibmvscsi 30000002: SRP_LOGIN succeeded > <5>scsi 0:0:1:0: Direct-Access AIX VDASD 0001 PQ: 0 ANSI: 3 > <5>scsi 0:0:2:0: CD-ROM AIX VOPTA PQ: 0 ANSI: 4 > <6>scsi: waiting for bus probes to complete ... 
> <5>sd 0:0:1:0: Attached scsi generic sg0 type 0 > <5>sd 0:0:1:0: [sda] 147324928 512-byte logical blocks: (75.4 GB/70.2 GiB) > <5>sd 0:0:1:0: [sda] Write Protect is off > <7>sd 0:0:1:0: [sda] Mode Sense: 17 00 00 08 > <5>sd 0:0:1:0: [sda] Cache data unavailable > <3>sd 0:0:1:0: [sda] Assuming drive cache: write through > <4>sr0: scsi-1 drive > <6>Uniform CD-ROM driver Revision: 3.20 > <7>sr 0:0:2:0: Attached scsi CD-ROM sr0 > <5>sr 0:0:2:0: Attached scsi generic sg1 type 5 > <5>sd 0:0:1:0: [sda] Cache data unavailable > <3>sd 0:0:1:0: [sda] Assuming drive cache: write through > <6> sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 > > <5>sd 0:0:1:0: [sda] Cache data unavailable > <3>sd 0:0:1:0: [sda] Assuming drive cache: write through > <5>sd 0:0:1:0: [sda] Attached SCSI disk > <6>EXT4-fs (sda3): INFO: recovery required on readonly filesystem > <6>EXT4-fs (sda3): write access will be enabled during recovery > <6>EXT4-fs (sda3): barriers enabled > <6>kjournald2 starting: pid 126, dev sda3:8, commit interval 5 seconds > <6>EXT4-fs (sda3): delayed allocation enabled > <6>EXT4-fs: file extents enabled > <6>EXT4-fs: mballoc enabled > <6>EXT4-fs (sda3): recovery complete > <6>EXT4-fs (sda3): mounted filesystem with ordered data mode > <5>type=1404 audit(1249386597.188:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295 > <7>SELinux: 8192 avtab hash slots, 117789 rules. > <7>SELinux: 8192 avtab hash slots, 117789 rules. > <7>SELinux: 8 users, 13 roles, 2653 types, 121 bools, 1 sens, 1024 cats > <7>SELinux: 74 classes, 117789 rules > <7>SELinux: Completing initialization. > <7>SELinux: Setting up existing superblocks. 
> <7>SELinux: initialized (dev sda3, type ext4), uses xattr > <7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs > <7>SELinux: initialized (dev selinuxfs, type selinuxfs), uses genfs_contexts > <7>SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs > <7>SELinux: initialized (dev hugetlbfs, type hugetlbfs), uses genfs_contexts > <7>SELinux: initialized (dev devpts, type devpts), uses transition SIDs > <7>SELinux: initialized (dev inotifyfs, type inotifyfs), uses genfs_contexts > <7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs > <7>SELinux: initialized (dev anon_inodefs, type anon_inodefs), uses genfs_contexts > <7>SELinux: initialized (dev pipefs, type pipefs), uses task SIDs > <7>SELinux: initialized (dev debugfs, type debugfs), uses genfs_contexts > <7>SELinux: initialized (dev sockfs, type sockfs), uses task SIDs > <7>SELinux: initialized (dev proc, type proc), uses genfs_contexts > <7>SELinux: initialized (dev bdev, type bdev), uses genfs_contexts > <7>SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts > <7>SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts > <5>type=1403 audit(1249386597.481:3): policy loaded auid=4294967295 ses=4294967295 > <6>udev: starting version 141 > <7>drivers/net/ibmveth.c: ibmveth: IBM i/pSeries Virtual Ethernet Driver 1.03 > <7>vio_register_driver: driver ibmveth registering > <6>IBM eHEA ethernet device driver (Release EHEA_0101) > <7>irq: irq 590080 on host null mapped to virtual irq 256 > <6>ehea: eth0: Jumbo frames are enabled > <6>ehea: eth0 -> logical port id #3 > <6>ehea: eth2: Jumbo frames are enabled > <6>ehea: eth2 -> logical port id #4 > <6>udev: renamed network interface eth1 to eth3 > <6>udev: renamed network interface eth0_rename to eth1 > <6>device-mapper: multipath: version 1.1.0 loaded > <6>EXT4-fs (sda3): internal journal on sda3:8 > <6>kjournald starting. 
Commit interval 5 seconds > <6>EXT3 FS on sda2, internal journal > <6>EXT3-fs: mounted filesystem with writeback data mode. > <7>SELinux: initialized (dev sda2, type ext3), uses xattr > <7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs > <7>SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts > <6>NET: Registered protocol family 10 > <6>lo: Disabled Privacy Extensions > <6>RPC: Registered udp transport module. > <6>RPC: Registered tcp transport module. > <7>SELinux: initialized (dev rpc_pipefs, type rpc_pipefs), uses genfs_contexts > <7>irq: irq 779 on host null mapped to virtual irq 267 > <6>ehea: eth2: Physical port up > <6>ehea: External switch port is backup port > <7>irq: irq 780 on host null mapped to virtual irq 268 > <7>irq: irq 781 on host null mapped to virtual irq 269 > <6>ehea: eth0: Physical port up > <6>ehea: External switch port is backup port > <7>irq: irq 782 on host null mapped to virtual irq 270 > <6>Bluetooth: Core ver 2.15 > <6>NET: Registered protocol family 31 > <6>Bluetooth: HCI device and connection manager initialized > <6>Bluetooth: HCI socket layer initialized > <6>Bluetooth: L2CAP ver 2.13 > <6>Bluetooth: L2CAP socket layer initialized > <6>Bluetooth: BNEP (Ethernet Emulation) ver 1.3 > <6>Bluetooth: BNEP filters: protocol multicast > <6>Installing knfsd (copyright (C) 1996 okir@monad.swb.de). 
> <7>SELinux: initialized (dev nfsd, type nfsd), uses genfs_contexts > <4>NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory > <6>NFSD: starting 90-second grace period > <5>Bridge firewalling registered > <6>Bluetooth: SCO (Voice Link) ver 0.6 > <6>Bluetooth: SCO socket layer initialized > <6>virbr0: starting userspace STP failed, starting kernel STP > <7>SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs > <7>SELinux: initialized (dev proc, type proc), uses genfs_contexts > <7>SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs > <6>lo: Disabled Privacy Extensions > <7>SELinux: initialized (dev proc, type proc), uses genfs_contexts > <7>eth0: no IPv6 routers present > <7>eth2: no IPv6 routers present > <7>eth1: no IPv6 routers present > <7>eth3: no IPv6 routers present > <7>virbr0: no IPv6 routers present > 0:mon> > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.31-rc5-git2 crash on a idle system.
From: Sachin Sant @ 2009-08-06 13:33 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, neilb, linux-raid

Benjamin Herrenschmidt wrote:
> Thanks. Since it's a memory corruption (or seems to be) however, it's
> possible that the bisection will mislead you. IE. The culprit could be
> somewhere else, and the commit you'll find via bisection just happens to
> move things around in the kernel in such a way that the corruption hits
> that code path instead of another rarely used one.
>
> I would suggest using printk to print out the content of memory where
> the code appears to have been smashed at different stages during boot
> (maybe even in the initcalls loop in init/main.c) to try to point out
> what appears to be causing the corruption.
>
By the time the machine is up and running, the particular memory location
in question is already overwritten. So it seems the corruption occurs
during boot.

I added a few printks in the initcall debug code path. The output suggests
that by the time the first initcall debug message is printed, the code is
already corrupted. Further debugging suggests that when start_kernel() is
called, the code at address 0xc000000000600000 is already corrupted.
About 28 bytes of code starting from that address are overwritten.

I will try to add a few more debug statements to find the place where
this corruption might be happening.

Thanks
-Sachin

--
---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------
* Re: 2.6.31-rc5-git2 crash on a idle system.
From: Michael Ellerman @ 2009-08-06 13:40 UTC
To: Sachin Sant; +Cc: Benjamin Herrenschmidt, neilb, linuxppc-dev, linux-raid

On Thu, 2009-08-06 at 19:03 +0530, Sachin Sant wrote:
> Benjamin Herrenschmidt wrote:
> > Thanks. Since it's a memory corruption (or seems to be) however, it's
> > possible that the bisection will mislead you. IE. The culprit could be
> > somewhere else, and the commit you'll find via bisection just happens to
> > move things around in the kernel in such a way that the corruption hits
> > that code path instead of another rarely used one.
> >
> > I would suggest using printk to print out the content of memory where
> > the code appears to have been smashed at different stages during boot
> > (maybe even in the initcalls loop in init/main.c) to try to point out
> > what appears to be causing the corruption.
> >
> By the time machine is up and running the particular memory location
> in question is already overwritten. So seems like the corruption occurs
> during the boot.
>
> I added few printks in the initcall debug code path. The o/p suggests
> that by the time first initcall debug message is printed the code is
> already corrupted. Further debug suggests, when start_kernel() is
> called the code at address(0xc000000000600000) is already corrupted.
> About 28 bytes of code starting from the above address is overwritten.
>
> I will try to add few more debug statements to find the place where
> this corruption might be happening.

Is it always the exact same pattern at the exact same address? Or does
it change, and if so, how?

cheers
* Re: 2.6.31-rc5-git2 crash on a idle system.
From: Benjamin Herrenschmidt @ 2009-08-06 21:51 UTC
To: Sachin Sant; +Cc: neilb, linuxppc-dev, linux-raid

On Thu, 2009-08-06 at 19:03 +0530, Sachin Sant wrote:
> I added few printks in the initcall debug code path. The o/p suggests
> that by the time first initcall debug message is printed the code is
> already corrupted. Further debug suggests, when start_kernel() is
> called the code at address(0xc000000000600000) is already corrupted.
> About 28 bytes of code starting from the above address is
> overwritten.
>
> I will try to add few more debug statements to find the place where
> this corruption might be happening.

Hrm... start_kernel is very, very early... strange. Can you double check
that the actual kernel image contains the right stuff? Also, what distro
are you using to test that, and how are you booting that kernel?

You can always add something to prom_init.c to test (though beware that
you aren't relocated yet, so you need to offset the addresses).

Cheers,
Ben.
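One way to act on the "does the image contain the right stuff" question is to compare the bytes the kernel image should hold at that location with what a memory dump shows, and report the extent of the damage. The following is only a rough sketch in Python: the function name corrupted_span and both byte buffers are made up for illustration (the expected buffer is just a run of PowerPC nop words, and the 0xff filler is not the pattern seen in the xmon dump). How the two buffers are extracted from vmlinux and from xmon is left out.

```python
def corrupted_span(expected: bytes, observed: bytes):
    """Return (first_offset, length) of the region that differs, or None.

    `expected` is what the kernel image says should be there,
    `observed` is what was actually dumped from memory.
    """
    if len(expected) != len(observed):
        raise ValueError("buffers must be the same length")
    diffs = [i for i, (a, b) in enumerate(zip(expected, observed)) if a != b]
    if not diffs:
        return None
    return diffs[0], diffs[-1] - diffs[0] + 1


# Hypothetical data: 64 bytes of PowerPC nops (0x60000000) as the
# "expected" code, with the first 28 bytes overwritten in the dump.
EXPECTED = b"\x60\x00\x00\x00" * 16
OBSERVED = b"\xff" * 28 + EXPECTED[28:]

if __name__ == "__main__":
    print(corrupted_span(EXPECTED, OBSERVED))
    print(corrupted_span(EXPECTED, EXPECTED))
```

If the image on disk already differs from what the build produced, the problem is in how the kernel is built or installed rather than a runtime overwrite, which is the distinction the question above is trying to draw.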
* Draft Mirrored Linux Mini How-to
From: Harold Pritchett @ 2009-08-07 0:08 UTC
To: linux-raid

Following this note is a draft of a "Mirrored Linux Mini How-to" document.
It's primarily for my own use, but I will make it available to anyone who
would like a copy.

Before I do this, I would like input from this group as to whether it:
  1. is a good idea?
  2. has no obvious errors or problems?
  3. will reliably work?
  4. Should I provide more details? Or at least pointers to using fdisk, etc.

Thanks for the assistance

Harold

==============================================================================

Mirrored Linux Mini How-to

Install linux on two identical disk drives in such a way that the failure
of either of the drives will allow the system to be recovered without any
loss of data.

Both of the drives are partitioned exactly the same:
  1. 3 primary partitions
  2. Partition 1 - size = 1GB, format as linux raid (fd)
  3. Partition 2 - size = real memory size, format as linux swap (82)
  4. Partition 3 - size = remainder of disk, format as linux raid (fd)

To ensure that the partitions are correct, and in the correct order, boot from
the install CD/DVD and run the rescue system. Use fdisk from this system
to partition the two drives. (The GUI disk partitioner which runs in the
anaconda installer has a tendency to move partitions around without asking.
We don't want this to happen!) After saving the partition information on
both drives, you can run the Linux installer. Select "custom" installation.

Define a raid 1 mirror named fd0 using the first partition on both drives
Define a raid 1 mirror named fd1 using the third partition of both drives
Define a linux ext3 filesystem on fd0. Mount on /boot
Define a linux Logical Volume Group named vg0 on fd1.
Within vg0, define the following Logical Volumes:
  lv00 - /
  lv01 - /tmp
  lv02 - /var
  lv03 - /usr
  lv04 - /usr/local
  lv05 - /home
  lv06 - /opt

I'm from the old school. I believe in lots of partitions. Create any
additional partitions you may need. If you really want to, you can just
create a single partition and put "/" in it. You will still need the "/boot"
partition since you can't boot from an LVM partition.

You can now continue to install linux normally. I usually do it twice. The
first time is to get an idea of how big each partition should be and the
second time is to get it right. Remember to leave some extra unallocated space
in the volume group to be allocated later if one of your file systems fills
up. Read the linux logical volume manager docs.

In the event of a disk failure, follow the procedure below.

  1.  Shut down the system and remove the failed drive.
  2.  If the second (non boot) drive is the failing drive, skip to step 9
      below.
  3.  Move the second (non boot) drive to the boot drive position.
  4.  Boot from the linux rescue CD/DVD and start the system, no network.
  5.  Use the "chroot /mnt/sysimage" command to mount the system.
  6.  Run the command "/sbin/grub-install /dev/sda" or whatever your drive is
      to write a boot image into the MBR of the new boot drive.
  7.  Remove the CD/DVD and re-boot. The system should boot on a single drive
      and both raid arrays will show no spare available.
  8.  Shut down the system.
  9.  Install a new drive, identical to the existing drive if possible.
  10. Bring up the system and use fdisk to partition the new drive EXACTLY the
      same as the existing drive.
  11. Format the swap partition with "mkswap /dev/sdb2"
  12. Add the mirror for the /boot partition with
      "mdadm --add /dev/md0 /dev/sdb1" or whatever is appropriate for your
      system...
  13. Add the mirror for the LVM partition with
      "mdadm --add /dev/md1 /dev/sdb3" or whatever is appropriate for your
      system...
  14. Wait for the mirror to sync. It may take several hours.

------------------------------------------------------------------------------

Example:

Disk /dev/sda: 200.0 GB, 200049647616 bytes
255 heads, 63 sectors/track, 24321 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         131     1052226   fd  Linux raid autodetect
/dev/sda2             132         392     2096482+  82  Linux swap / Solaris
/dev/sda3             393       24321   192209692+  fd  Linux raid autodetect

/dev/sdb is exactly the same

md0 : active raid1 sdb1[1] sda1[0]
      1052160 blocks [2/2] [UU]

md1 : active raid1 sdb3[1] sda3[0]
      192209600 blocks [2/2] [UU]

Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              996M   52M  893M   6% /boot
/dev/mapper/vg0-lv00  992M  373M  568M  40% /
/dev/mapper/vg0-lv01  2.0G  101M  1.8G   6% /tmp
/dev/mapper/vg0-lv02  2.0G  266M  1.6G  15% /var
/dev/mapper/vg0-lv03  4.9G  2.8G  1.9G  60% /usr
/dev/mapper/vg0-lv04  992M  104M  837M  12% /usr/local
/dev/mapper/vg0-lv05  2.0G  132M  1.8G   7% /var/www
/dev/mapper/vg0-lv06   20G  6.6G   12G  36% /home
/dev/mapper/vg0-lv07   44G   39G  3.2G  93% /opt
tmpfs                1010M     0 1010M   0% /dev/shm
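Step 14 of the recovery procedure above comes down to watching /proc/mdstat until every array reports all of its members up. A minimal sketch in Python follows, assuming the usual two-line-per-array layout (an "mdN :" header line followed by a "[n/m] [UU]" status line). The function name parse_mdstat and the DEGRADED sample are illustrative only and not part of the original how-to; SAMPLE repeats the md0/md1 status from the example above.

```python
import re


def parse_mdstat(text):
    """Return {array_name: (member_flags, is_fully_synced)}.

    member_flags is the string inside the last brackets, e.g. "UU" or "U_".
    """
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status line looks like: "1052160 blocks [2/2] [UU]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status:
            total, active, flags = status.groups()
            arrays[current] = (flags, total == active and "_" not in flags)
            current = None
    return arrays


SAMPLE = """\
md0 : active raid1 sdb1[1] sda1[0]
      1052160 blocks [2/2] [UU]

md1 : active raid1 sdb3[1] sda3[0]
      192209600 blocks [2/2] [UU]
"""

# Hypothetical view of md0 while it is running on a single drive.
DEGRADED = """\
md0 : active raid1 sda1[0]
      1052160 blocks [2/1] [U_]
"""

if __name__ == "__main__":
    print(parse_mdstat(SAMPLE))
    print(parse_mdstat(DEGRADED))
```

On a live system the text would come from reading /proc/mdstat; the resync is finished once every array is reported as fully synced.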
* Re: Draft Mirrored Linux Mini How-to 2009-08-07 0:08 ` Draft Mirrored Linux Mini How-to Harold Pritchett @ 2009-08-07 2:09 ` Goswin von Brederlow 2009-08-07 3:53 ` Tapani Tarvainen ` (3 subsequent siblings) 4 siblings, 0 replies; 19+ messages in thread From: Goswin von Brederlow @ 2009-08-07 2:09 UTC (permalink / raw) To: harold; +Cc: linux-raid Harold Pritchett <harold@uga.edu> writes: > Following this note is a draft of a "Mirrored Linux Mini How-to" document. > It's primarily for my own use, but I will make it available to anyone who > would like a copy. > > Before I do this, I would like input from this group as to whether it: > 1. is a good idea? > 2. has no obvious errors or problems? > 3. will reliably work? > 4. Should I provide more details? Or at least pointers to using fdisk, etc. > > Thanks for the assistance > > Harold > > ============================================================================== > > Mirrored Linux Mini How-to > > Install linux on two identical disk drives in such a way that the failure > of either of the drives will allow the system to be recovered without any > loss of data > > Both of the drives are partitioned exactly the same: > 1. 3 primary partitions > 2. Partition 1 - size - 1GB format as Linux Raid (fd) > 3. Partition 2 - size = real memory size, format as linux swap (82) > 4. Partition 3 = size = remainder of disk, format as linux raid (fd) Grub2 can boot from lvm on raid and swap can be on lvm too. So all you really need is one partition. Also the size of swap depends on the amount of ram. The more ram you have the less swap you need. Unless you want to do suspend to swap picking the size of swap by the ram size is pointless. > To ensure that the partitions are correct, and in the correct order, boot from > the install CD/DVD and run the rescue system. Use fdisk from this system > to partition the two drives. 
(The GUI disk partitioner which runs in the > anaconda installer has a tendency to move partitions around without asking. > We don't want this to happen!) After saving the partition information on > both drives, you can Run the Linux installer. Select "custom" installation. Insert Debian CD/DVD. Boot installer and configure raid1 when you get to the point about partitioning. Do you really need a HowTo for that? MfG Goswin ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Draft Mirrored Linux Mini How-to 2009-08-07 0:08 ` Draft Mirrored Linux Mini How-to Harold Pritchett 2009-08-07 2:09 ` Goswin von Brederlow @ 2009-08-07 3:53 ` Tapani Tarvainen 2009-08-08 1:11 ` Goswin von Brederlow 2009-08-07 8:10 ` Fredrik Pettersson ` (2 subsequent siblings) 4 siblings, 1 reply; 19+ messages in thread From: Tapani Tarvainen @ 2009-08-07 3:53 UTC (permalink / raw) To: linux-raid On Thu, Aug 06, 2009 at 08:08:47PM -0400, Harold Pritchett (harold@uga.edu) wrote: > Mirrored Linux Mini How-to A few quick observations: > Install linux on two identical disk drives in such a way that the failure > of either of the drives will allow the system to be recovered without any > loss of data > > Both of the drives are partitioned exactly the same: > 1. 3 primary partitions > 2. Partition 1 - size - 1GB format as Linux Raid (fd) > 3. Partition 2 - size = real memory size, format as linux swap (82) > 4. Partition 3 = size = remainder of disk, format as linux raid (fd) If I read correctly, you are not only leaving swap out of lvm, you are not mirroring it at all - which would make the system crash if the swap disk breaks. Putting swap on lvm would also allow growing it easily as needed. Another point is that sometimes it is useful to have multiple partitions separately mirrored and then combined with lvm: it allows things like changing the raid configuration from two-disk raid1 to three-disk raid5 without moving data via backup and yet avoiding windows of vulnerability to single-disk failure during the transition. (Perhaps not common enough to be worth mentioning here, but I've found it useful.) > Define a raid 1 mirror named fd0 using the first partition on both drives > Define a raid 1 mirror named fd1 using the third partition of both drives Why "fd"? Sounds like floppy disks to me... why not md1 &c? (Please enlighten me if some raid tool normally uses fd.) > I'm from the old school. I believe in lots of partitions. 
So do I (indeed I'd add /var/tmp as a separate partition to your list), but it's independent of mirroring and not worth much space in a mirroring how-to (let alone mini how-to). Anyway, your method in all sounds rather too complicated (modern installers make mirroring during normal installation a breeze), and although doing it "the hard way" can be useful for learning purposes, I'd rather write a how-to using easier methods, possibly adding instructions for other tasks like how to handle broken disks, how to replace disks with bigger ones without reinstallation, how to add more disks, how to change raid configuration, &c. -- Tapani Tarvainen ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Draft Mirrored Linux Mini How-to
From: Goswin von Brederlow @ 2009-08-08 1:11 UTC
To: Tapani Tarvainen; +Cc: linux-raid

Tapani Tarvainen <raid@tapanitarvainen.fi> writes:

> On Thu, Aug 06, 2009 at 08:08:47PM -0400, Harold Pritchett (harold@uga.edu) wrote:
>
>> Mirrored Linux Mini How-to
>
> A few quick observations:
>
>> Install linux on two identical disk drives in such a way that the failure
>> of either of the drives will allow the system to be recovered without any
>> loss of data
>>
>> Both of the drives are partitioned exactly the same:
>> 1. 3 primary partitions
>> 2. Partition 1 - size - 1GB format as Linux Raid (fd)
>> 3. Partition 2 - size = real memory size, format as linux swap (82)
>> 4. Partition 3 = size = remainder of disk, format as linux raid (fd)
>
> If I read correctly, you are not only leaving swap out of lvm,
> you are not mirroring it at all - which would make the system
> crash if the swap disk breaks.
> Putting swap on lvm would also allow growing it easily as needed.

On the other hand, don't forget that raid1 is buggy with swap and the
page contents might change between writes to the first and second
disk. Or has that been fixed?

> Another point is that sometimes it is useful to have multiple
> partitions separately mirrored and then combined with lvm:
> it allows things like changing the raid configuration from
> two-disk raid1 to three-disk raid5 without moving data
> via backup and yet avoiding windows of vulnerability
> to single-disk failure during the transition.
> (Perhaps not common enough to be worth mentioning here,
> but I've found it useful.)

You can transform raid1 to raid5 without loss of redundancy, so I don't
quite see what you mean here.

MfG
Goswin
* Re: Draft Mirrored Linux Mini How-to 2009-08-08 1:11 ` Goswin von Brederlow @ 2009-08-08 1:41 ` NeilBrown 2009-08-08 7:59 ` Goswin von Brederlow 2009-08-08 14:09 ` Bill Davidsen 0 siblings, 2 replies; 19+ messages in thread From: NeilBrown @ 2009-08-08 1:41 UTC (permalink / raw) To: Goswin von Brederlow; +Cc: Tapani Tarvainen, linux-raid On Sat, August 8, 2009 11:11 am, Goswin von Brederlow wrote: > Tapani Tarvainen <raid@tapanitarvainen.fi> writes: > >> On Thu, Aug 06, 2009 at 08:08:47PM -0400, Harold Pritchett >> (harold@uga.edu) wrote: >> >>> Mirrored Linux Mini How-to >> >> A few quick observations: >> >>> Install linux on two identical disk drives in such a way that the >>> failure >>> of either of the drives will allow the system to be recovered without >>> any >>> loss of data >>> >>> Both of the drives are partitioned exactly the same: >>> 1. 3 primary partitions >>> 2. Partition 1 - size - 1GB format as Linux Raid (fd) >>> 3. Partition 2 - size = real memory size, format as linux swap >>> (82) >>> 4. Partition 3 = size = remainder of disk, format as linux raid >>> (fd) >> >> If I read correctly, you are not only leaving swap out of lvm, >> you are not mirroring it at all - which would make the system >> crash if the swap disk breaks. >> Putting swap on lvm would also allow growing it easily as needed. > > On the other hand don't forget that raid1 is buggy with swap and the > page contents might change between writes to the first and second > disk. Or has that been fixed? There is no bug here. The behaviour is a little unexpected but it is perfectly "correct" in that there is never any risk to data. 
NeilBrown

>
>> Another point is that sometimes it is useful to have multiple
>> partitions separately mirrored and then combined with lvm:
>> it allows things like changing the raid configuration from
>> two-disk raid1 to three-disk raid5 without moving data
>> via backup and yet avoiding windows of vulnerability
>> to single-disk failure during the transition.
>> (Perhaps not common enough to be worth mentioning here,
>> but I've found it useful.)
>
> You can transform raid1 to raid5 without loss of redundancy so I don't
> quite see what you mean here.
>
> MfG
> Goswin
* Re: Draft Mirrored Linux Mini How-to 2009-08-08 1:41 ` NeilBrown @ 2009-08-08 7:59 ` Goswin von Brederlow 2009-08-08 15:24 ` John Robinson 2009-08-08 14:09 ` Bill Davidsen 1 sibling, 1 reply; 19+ messages in thread From: Goswin von Brederlow @ 2009-08-08 7:59 UTC (permalink / raw) To: NeilBrown; +Cc: Goswin von Brederlow, Tapani Tarvainen, linux-raid "NeilBrown" <neilb@suse.de> writes: > On Sat, August 8, 2009 11:11 am, Goswin von Brederlow wrote: >> Tapani Tarvainen <raid@tapanitarvainen.fi> writes: >> >>> On Thu, Aug 06, 2009 at 08:08:47PM -0400, Harold Pritchett >>> (harold@uga.edu) wrote: >>> >>>> Mirrored Linux Mini How-to >>> >>> A few quick observations: >>> >>>> Install linux on two identical disk drives in such a way that the >>>> failure >>>> of either of the drives will allow the system to be recovered without >>>> any >>>> loss of data >>>> >>>> Both of the drives are partitioned exactly the same: >>>> 1. 3 primary partitions >>>> 2. Partition 1 - size - 1GB format as Linux Raid (fd) >>>> 3. Partition 2 - size = real memory size, format as linux swap >>>> (82) >>>> 4. Partition 3 = size = remainder of disk, format as linux raid >>>> (fd) >>> >>> If I read correctly, you are not only leaving swap out of lvm, >>> you are not mirroring it at all - which would make the system >>> crash if the swap disk breaks. >>> Putting swap on lvm would also allow growing it easily as needed. >> >> On the other hand don't forget that raid1 is buggy with swap and the >> page contents might change between writes to the first and second >> disk. Or has that been fixed? > > There is no bug here. The behaviour is a little unexpected > but it is perfectly "correct" in that there is never any risk to > data. > > NeilBrown Disk 1 writes, page is modified, disk 2 writes, page is swapped in from disk 1, something crashes because old data is swapped in. Or did I miss something? MfG Goswin ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Draft Mirrored Linux Mini How-to 2009-08-08 7:59 ` Goswin von Brederlow @ 2009-08-08 15:24 ` John Robinson 0 siblings, 0 replies; 19+ messages in thread From: John Robinson @ 2009-08-08 15:24 UTC (permalink / raw) To: Goswin von Brederlow; +Cc: NeilBrown, Linux RAID On 08/08/2009 08:59, Goswin von Brederlow wrote: > "NeilBrown" <neilb@suse.de> writes: >> On Sat, August 8, 2009 11:11 am, Goswin von Brederlow wrote: [...] >>> On the other hand don't forget that raid1 is buggy with swap and the >>> page contents might change between writes to the first and second >>> disk. Or has that been fixed? Can you give a reference for this? >> There is no bug here. The behaviour is a little unexpected >> but it is perfectly "correct" in that there is never any risk to >> data. > > Disk 1 writes, page is modified, disk 2 writes, page is swapped in > from disk 1, something crashes because old data is swapped in. > > Or did I miss something? If the page is modified then it won't be swapped back in when you said, because the swap out wasn't completed and the page wasn't reallocated. Well, that's my guess. I presume this is allowed for performance reasons. Cheers, John. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Draft Mirrored Linux Mini How-to 2009-08-08 1:41 ` NeilBrown 2009-08-08 7:59 ` Goswin von Brederlow @ 2009-08-08 14:09 ` Bill Davidsen 1 sibling, 0 replies; 19+ messages in thread From: Bill Davidsen @ 2009-08-08 14:09 UTC (permalink / raw) To: NeilBrown; +Cc: Goswin von Brederlow, Tapani Tarvainen, linux-raid NeilBrown wrote: > On Sat, August 8, 2009 11:11 am, Goswin von Brederlow wrote: > >> Tapani Tarvainen <raid@tapanitarvainen.fi> writes: >> >> >>> On Thu, Aug 06, 2009 at 08:08:47PM -0400, Harold Pritchett >>> (harold@uga.edu) wrote: >>> >>> >>>> Mirrored Linux Mini How-to >>>> >>> A few quick observations: >>> >>> >>>> Install linux on two identical disk drives in such a way that the >>>> failure >>>> of either of the drives will allow the system to be recovered without >>>> any >>>> loss of data >>>> >>>> Both of the drives are partitioned exactly the same: >>>> 1. 3 primary partitions >>>> 2. Partition 1 - size - 1GB format as Linux Raid (fd) >>>> 3. Partition 2 - size = real memory size, format as linux swap >>>> (82) >>>> 4. Partition 3 = size = remainder of disk, format as linux raid >>>> (fd) >>>> >>> If I read correctly, you are not only leaving swap out of lvm, >>> you are not mirroring it at all - which would make the system >>> crash if the swap disk breaks. >>> Putting swap on lvm would also allow growing it easily as needed. >>> >> On the other hand don't forget that raid1 is buggy with swap and the >> page contents might change between writes to the first and second >> disk. Or has that been fixed? >> > > There is no bug here. The behaviour is a little unexpected > but it is perfectly "correct" in that there is never any risk to > data. > > > The reason people use RAID is to protect their data, and with hardware raid there is no problem, the data is cached in the controller and sent to multiple devices, and only transferred over the bus once. 
Software raid can't avoid multiple bus transfers, but it could prevent the case where data "mirrored" is actually inconsistent on at least one copy. There are two ways to prevent this: one would be to always copy data to a buffer rather than write from user memory (this sounds like a lot of overhead); the other would be marking the page copy-on-write, which sounds far more efficient, but is probably more complex, particularly for a threaded application. Perhaps an option on a per-array basis would be useful: people who worry about this could set the option and have every write copied to a buffer, and people who don't worry about it can leave things as they are now. >>> Another point is that sometimes it is useful to have multiple >>> partitions separately mirrored and then combined with lvm: >>> it allows things like changing the raid configuration from >>> two-disk raid1 to three-disk raid5 without moving data >>> via backup and yet avoiding windows of vulnerability >>> to single-disk failure during the transition. >>> (Perhaps not common enough to be worth mentioning here, >>> but I've found it useful.) >>> >> You can transform raid1 to raid5 without loss of redundancy so I don't >> quite see what you mean here. >> >> MfG >> Goswin >> -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc "You are disgraced professional losers. And by the way, give us our money back." - Representative Earl Pomeroy, Democrat of North Dakota on the A.I.G. executives who were paid bonuses after a federal bailout. ^ permalink raw reply [flat|nested] 19+ messages in thread
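For readers who want to see whether the two halves of a mirror have actually diverged in the way discussed above, the md driver exposes a consistency check through sysfs. The following is a minimal sketch, not something proposed in the thread: the function names and the optional sysfs-root argument are made up for illustration, and only the `sync_action` and `mismatch_cnt` files are part of the md interface. A nonzero count on a raid1 that holds swap does not by itself mean data is at risk, which matches NeilBrown's point.

```shell
#!/bin/sh
# Hypothetical helpers around the md sysfs files.
# $1 = md device name (for example md1)
# $2 = sysfs root; defaults to /sys, overridable so the helpers can be
#      tried against a scratch directory without touching a real array.

# Ask md to read all copies and count differences (needs root on /sys).
start_check() {
    echo check > "${2:-/sys}/block/$1/md/sync_action"
}

# Print the mismatch count recorded by the last check.
mismatch_count() {
    cat "${2:-/sys}/block/$1/md/mismatch_cnt"
}
```

Typical use would be `start_check md1`, waiting until /proc/mdstat no longer shows a check in progress, then `mismatch_count md1`.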
* Re: Draft Mirrored Linux Mini How-to 2009-08-07 0:08 ` Draft Mirrored Linux Mini How-to Harold Pritchett 2009-08-07 2:09 ` Goswin von Brederlow 2009-08-07 3:53 ` Tapani Tarvainen @ 2009-08-07 8:10 ` Fredrik Pettersson 2009-08-07 9:51 ` Keld Jørn Simonsen 2009-08-07 15:27 ` Draft Mirrored Linux Mini How-to - sfdisk suggestion Maurice Hilarius 4 siblings, 0 replies; 19+ messages in thread From: Fredrik Pettersson @ 2009-08-07 8:10 UTC (permalink / raw) To: Harold Pritchett; +Cc: linux-raid On Thu, 6 Aug 2009, Harold Pritchett wrote: ... > In the event of a disk failure, follow the below procedure. > > 1. Shutdown the system and remove the failed drive. > > 2. If the second (non boot) drive is the failing drive, skip to step 9 > below. > > 3. Move the second (non boot) drive to the boot drive position. > > 4. Boot from the linux rescue CD/DVD and start the system, no network. > > 5. Use the "chroot /mnt/sysimage" command to mount the system. > > 6. Run the command "/sbin/grub-install /dev/sda" or whatever your drive is > to write a boot image into the MBR of the new boot drive. I think you may be making things a bit more complicated than they need to be. Would it not make sense to just install grub on both drives right from the start (so on both /dev/sda and /dev/sdb)? If you then configure the BIOS to boot from sda first and, if that fails, boot from sdb, then the system should be able to keep on rebooting etc. even if one drive has failed. It also means there is no need to move disks around or boot from a rescue CD to recover from a failed drive: just shut down, replace the drive, reboot, partition, mdadm --add, and finally install grub on the newly replaced drive.
In fact I'm pretty sure I've seen it described like that in other howtos; a quick google found this guy (using lilo, but the theory is the same): http://www.willert.dk/geek/raid.html This one is a bit old, but the theory is the same there as well and it is using grub: http://www.somedec.com/downloads/howto-bootable-linux-raid1.html Other than that I don't see any obvious problems with your description of the process. BR, /Fredrk Pettersson ^ permalink raw reply [flat|nested] 19+ messages in thread
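Fredrik's suggestion above (GRUB on both drives from the start, then re-add and re-install after a replacement) could be sketched roughly as follows. This is an illustration under assumptions, not a tested procedure: the device names /dev/sda, /dev/sdb, /dev/md0 and /dev/md1 and the partition numbers 1 and 3 come from Harold's draft, while the function names and the DRY_RUN switch are invented here. By default the commands are only printed.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root) to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Write a boot loader to the MBR of every drive given, so the BIOS can
# fall back to the second drive when the first one is dead.
install_grub_on_all() {
    for disk in "$@"; do
        run /sbin/grub-install "$disk"
    done
}

# After a replacement drive has been fitted and partitioned: re-add its
# partitions to the two mirrors and make the new drive bootable again.
# $1 = replacement drive, for example /dev/sdb
readd_replacement() {
    new="$1"
    run mdadm --add /dev/md0 "${new}1"
    run mdadm --add /dev/md1 "${new}3"
    run /sbin/grub-install "$new"
}
```

At install time this would be `install_grub_on_all /dev/sda /dev/sdb`, and after swapping a drive `readd_replacement /dev/sdb`. Some RAID1 howtos for GRUB legacy additionally map the second disk to (hd0) in the grub shell before installing; whether that is needed depends on the BIOS and GRUB version.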
* Re: Draft Mirrored Linux Mini How-to 2009-08-07 0:08 ` Draft Mirrored Linux Mini How-to Harold Pritchett ` (2 preceding siblings ...) 2009-08-07 8:10 ` Fredrik Pettersson @ 2009-08-07 9:51 ` Keld Jørn Simonsen [not found] ` <4A7C5BD2.80508@uga.edu> 2009-08-07 15:27 ` Draft Mirrored Linux Mini How-to - sfdisk suggestion Maurice Hilarius 4 siblings, 1 reply; 19+ messages in thread From: Keld Jørn Simonsen @ 2009-08-07 9:51 UTC (permalink / raw) To: Harold Pritchett; +Cc: linux-raid On Thu, Aug 06, 2009 at 08:08:47PM -0400, Harold Pritchett wrote: > Following this note is a draft of a "Mirrored Linux Mini How-to" document. > It's primarily for my own use, but I will make it available to anyone who > would like a copy. > > Before I do this, I would like input from this group as to whether it: > 1. is a good idea? It is a good idea to have a how-to. But there are already a few around. I wrote something like it for our wiki at http://linux-raid.osdl.org/index.php/Preventing_against_a_failing_disk but with some more advanced features, such as you do not crash if one disk fails, and you can reboot the system without a rescue disk, and you get faster mirrored raid, avoiding the slow raid 1. It does not do LVM, however, and I think that how-to should be enhanced with LVM. Maybe we could work together on an enhanced wiki description. > 2. has no obvious errors or problems? I think it could be improved to have additional features, using the same equipment. And I think it can be simplified. > 3. will reliably work? Most likely. However I think it can be made more robust. > 4. Should I provide more details? Or at least pointers to using fdisk, etc. I would recommend that actual CLI commands be described. I think that helps more novice users a lot.
> > Thanks for the assistance > > Harold > > ============================================================================== > > Mirrored Linux Mini How-to > > Install linux on two identical disk drives in such a way that the failure > of either of the drives will allow the system to be recovered without any > loss of data > > Both of the drives are partitioned exactly the same: > 1. 3 primary partitions > 2. Partition 1 - size - 1GB format as Linux Raid (fd) > 3. Partition 2 - size = real memory size, format as linux swap (82) > 4. Partition 3 = size = remainder of disk, format as linux raid (fd) > > To ensure that the partitions are correct, and in the correct order, boot from > the install CD/DVD and run the rescue system. Use fdisk from this system > to partition the two drives. (The GUI disk partitioner which runs in the > anaconda installer has a tendency to move partitions around without asking. > We don't want this to happen!) After saving the partition information on > both drives, you can run the Linux installer. Select "custom" installation. Why not use sfdisk to copy the layout of one disk to another? Using sfdisk you can actually script much of this. That makes the how-to more robust. > > Define a raid 1 mirror named fd0 using the first partition on both drives > Define a raid 1 mirror named fd1 using the third partition of both drives > > Define a linux ext3 filesystem on fd0. Mount on /boot > Define a linux Logical Volume Group named vg0 on fd1. > Within vg0, define the following Logical Volumes: > lv00 - / > lv01 - /tmp > lv02 - /var > lv03 - /usr > lv04 - /usr/local > lv05 - /home > lv06 - /opt > I'm from the old school. I believe in lots of partitions. Create any > additional partitions you may need. If you really want to, you can > just create a single partition and put "/" in it. You still will > need the "/boot" partition since you can't boot from an LVM partition.
I think it is good to have a / , a /home, and possibly a /boot partition, but having more partitions is probably just shooting yourself in the foot, because you may create space problems. Those small partitions can easily hit some roof, like /var (when logs run full) and /tmp (doing temporary work like editing in big files), /opt and /usr/local (installing big new packages) and why should /usr and / be on different partitions - that beats me. I don't remember, but can you change LVM sizes on active LVMs? Anyway you are likely to run into space constraints on such a multi-partition system, and overrun of just one of the partitions will create a severe operation problem. Better pool all the space together... Having the system on one partition helps if you want to upgrade your system or shift to another distribution: you can then make another partition with the new system and have a quick fallback path, while preserving your actual data on a /home partition. > > You can now continue to install linux normally. I usually do it twice. > the first time is to get an idea of how big each partition should be > and the second time is to get it right. That is cumbersome, and probably caused by your use of many partitions. It will turn some novices off. > Remember to leave some extra > unallocated space in the volume group to be allocated later if one of your > file systems fills up. Read the linux logical volume manager docs. A URL to one would be fine. > > > In the event of a disk failure, follow the below procedure. > > 1. Shutdown the system and remove the failed drive. > > 2. If the second (non boot) drive is the failing drive, skip to step 9 below. > > 3. Move the second (non boot) drive to the boot drive position. Why can't you boot from the working drive? The system should be configured to do this. > 4. Boot from the linux rescue CD/DVD and start the system, no network. Better avoid the rescue CD by making the system bootable from both drives. > 5.
Use the "chroot /mnt/sysimage" command to mount the system. > > 6. Run the command "/sbin/grub-install /dev/sda" or whatever your drive is > to write a boot image into the MBR of the new boot drive. > > 7. Remove the CD/DVD and re-boot. The system should boot on a single drive > and both raid arrays will show no spare available. > > 8. Shutdown the system > > 9. Install a new drive, identical to the existing drive if possible. > > 10. Bring up the system and use fdisk to partition the new drive EXACTLY > the same as the existing drive. > > 11. Format the swap partition with "mkswap /dev/sdb2" > > 12. Add the mirror for the /boot partition with "mdadm --add /dev/md0 /dev/sdb1" > or whatever is appropriate for your system... > > 13. Add the mirror for the LVM partition with "mdadm --add /dev/md1 /dev/sdb3" > or whatever is appropriate for your system... > > 14. Wait for the mirror to sync. It may take several hours You can begin using the system immediately, while the raids are syncing. It would be nice if you could reference our wiki, wherever you put up your howto. best regards keld ^ permalink raw reply [flat|nested] 19+ messages in thread
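Steps 11 to 14 of the draft, together with Keld's remark that the system is usable while the mirrors rebuild, suggest checking progress instead of simply waiting. Below is a small sketch of that; the helper names are hypothetical, and the optional file argument exists only so the helpers can be tried on a saved copy of /proc/mdstat. The pattern assumes the usual progress line written by the md driver (for example "recovery = 12.6% ...").

```shell
#!/bin/sh
# Steps 11-13 from the draft, as they would be typed on the example system:
#   mkswap /dev/sdb2
#   mdadm --add /dev/md0 /dev/sdb1
#   mdadm --add /dev/md1 /dev/sdb3

# Succeeds (exit status 0) while any md array is still rebuilding.
# $1 = mdstat file to read; defaults to the live /proc/mdstat.
sync_in_progress() {
    grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}"
}

# Print only the progress lines of arrays that are rebuilding.
sync_progress() {
    grep -E '(recovery|resync) *=' "${1:-/proc/mdstat}"
}
```

A loop such as `while sync_in_progress; do sync_progress; sleep 60; done` would then report progress once a minute while normal work continues.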
[parent not found: <4A7C5BD2.80508@uga.edu>]
[parent not found: <20090807173423.GA32127@rap.rap.dk>]
* Re: Draft Mirrored Linux Mini How-to [not found] ` <20090807173423.GA32127@rap.rap.dk> @ 2009-08-08 11:36 ` Keld Jørn Simonsen 0 siblings, 0 replies; 19+ messages in thread From: Keld Jørn Simonsen @ 2009-08-08 11:36 UTC (permalink / raw) To: Harold Pritchett; +Cc: linux-raid Forgot to cc linux-raid... keld On Fri, Aug 07, 2009 at 07:34:23PM +0200, Keld Jørn Simonsen wrote: > On Fri, Aug 07, 2009 at 12:52:34PM -0400, Harold Pritchett wrote: > > Keld Jørn Simonsen wrote: > > > >> It is a good idea to have a how-to. But there are already a few around. > >> I wrote something like it for our wiki at > >> http://linux-raid.osdl.org/index.php/Preventing_against_a_failing_disk > >> but with some more advanced features, such as you do not crash if one > >> disk fails, and you can reboot the system without a rescue disk, and you > >> get faster mirrored raid, avoiding the slow raid 1. It does not do LVM, > >> however, and I think that how-to should be enhanced with LVM. > > > > I guess this would depend upon the linux. Currently, I am working with > > Centos 5.3 and the only raid personalities available in the kernel on > > the DVD appear to be RAID0, RAID1, RAID5, and RAID6. With two disks > > this limits us to RAID0 and RAID1. > > I am also running centos 5.3 and raid10 is supported, and I run most of > my data on it. > > >>> I'm from the old school. I believe in lots of partitions. Create any > >>> additional partitions you may need. If you really want to, you can > >>> just create a single partition and put "/" in it. You still will > >>> need the "/boot" partition since you can't boot from an LVM partition. > >> > >> I think it is good to have a / , a /home, and possibly a /boot > >> partition, but having more partitions is probably just shooting > >> yourself in the > >> foot, because you may create space problems. 
Those small partitions can > >> easily hit some roof, like /var (when logs run full) and /tmp (doing temporary > >> work like editing in big files), /opt and /usr/local (installing big new packages) > >> and why should /usr and / be on different partitions - that beats me. > > > > As I said, I'm from the old school originally cutting my teeth on BSD unix in > > the 1980's. In those days, disks were always too small. A couple of 20 MB > > disks (that's MB, not GB) was a LOT of space back then. By using multiple > > partitions you could keep a run-away from crashing the whole system when it > > filled up /tmp or /var. > > I am also from that time, starting out with UNIX V6 on RL05 with 2.5 MB on a PDP-11/45. > And we had a *big* 40 MB disk in the corner. Later we ran VAX'en and BSD > 4.2 - still I think it is better to keep the system things in one > partition. Anyway why not describe both, and tell of pros and cons. > > > But once again, this is personal preference. All you really need is /boot, / and > > some swap space. > > I agree with that. > > >>> You can now continue to install linux normally. I usually do it twice. > >>> the first time is to get an idea of how big each partition should be > >>> and the second time is to get it right. > >> > >> That is cumbersome, and probably caused by your use of many partitions. > >> It will turn some novices off. > > > > Make the default a single partition and put the multiple partition version in an > > appendix... In today's world of TeraByte disk drives for under $100.00 It may > > be the best idea to just put it all in a single file system. > > As said, you could describe both. For pedagogical reasons, and because it > does not matter very much, you could probably benefit from describing > the simpler version. > > >> why can't you boot from the working drive? > >> The system should be configured to do this. > >> > >>> 4. Boot from the linux rescue CD/DVD and start the system, no network.
> >> > >> better avoid the rescue cd by making the system bootable from both drives.' > > > > can do... I just didn't think of this... > > > >>> 14. Wait for the mirror to sync. It may take several hours > >> > >> you can begin using the system immediately, while the raids are syncing. > > > > Good point. I knew that and just didn't say it. > > > >> it would be nice if you could reference our wiki, wherever you put up > >> your howto. > > > > I would be glad to... In fact, if we get something useful, you might want > > to put a copy on the wiki... > > to me it is very overlapping with what is already up there. > For now I cannot see the benefit of two howtos with the same aim. > Better consolidate it. > > best regards > keld ^ permalink raw reply [flat|nested] 19+ messages in thread
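On the question raised above of which raid personalities a given kernel offers, the first line of /proc/mdstat lists the ones currently loaded. The sketch below checks for one by name; the function name and the optional file argument are illustrative only. Note that a personality built as a module does not appear until the module is loaded (for example with `modprobe raid10`), so a missing entry is not proof that the kernel lacks support.

```shell
#!/bin/sh
# Succeeds if the named personality (raid1, raid10, ...) is listed on
# the "Personalities" line.
# $1 = personality name
# $2 = mdstat file to read; defaults to the live /proc/mdstat
has_personality() {
    grep '^Personalities' "${2:-/proc/mdstat}" | grep -qF "[$1]"
}

# If raid10 is available, a two-disk array could be created along these
# lines. The far layout (f2) and the partition names are assumptions
# for illustration, not something stated in this thread:
#   mdadm --create /dev/md1 --level=10 --layout=f2 --raid-devices=2 /dev/sda3 /dev/sdb3
```

For example, `has_personality raid10 && echo "raid10 is loaded"`.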
* Re: Draft Mirrored Linux Mini How-to - sfdisk suggestion 2009-08-07 0:08 ` Draft Mirrored Linux Mini How-to Harold Pritchett ` (3 preceding siblings ...) 2009-08-07 9:51 ` Keld Jørn Simonsen @ 2009-08-07 15:27 ` Maurice Hilarius 2009-08-08 1:13 ` Goswin von Brederlow 4 siblings, 1 reply; 19+ messages in thread From: Maurice Hilarius @ 2009-08-07 15:27 UTC (permalink / raw) To: harold, vger majordomo for lists Harold Pritchett wrote: > Following this note is a draft of a "Mirrored Linux Mini How-to" > document. > .. > To ensure that the partitions are correct, and in the correct order, > boot from > the install CD/DVD and run the rescue system. Use fdisk from this system > to partition the two drives. > .. > In the event of a disk failure, follow the below procedure. > .. > 10. Bring up the system and use fdisk to partition the new drive EXACTLY > the same as the existing drive. I suggest that the steps involving partitioning multiple drives are prone to error. I offer that the use of sfdisk is more accurate, and user friendly. By using sfdisk with the -d option we can get a dump of the current partition table in a regular file, and if needed we can restore it from that file. Example: sfdisk -d /dev/sda | sfdisk /dev/sdb This is a very simple way to duplicate the partition table from one disk to another. -- Regards, Maurice ^ permalink raw reply [flat|nested] 19+ messages in thread
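Maurice's two uses of `sfdisk -d` (keeping a dump in a regular file, and piping it straight to a second disk) can be wrapped in small helpers. This is only a sketch: the function names are made up, the device names are the ones used in the thread, and the same-device guard is an added precaution rather than part of the original suggestion. Writing a partition table is destructive, so the target should be double-checked before running anything like this as root.

```shell
#!/bin/sh
# Save the partition table of disk $1 into the regular file $2.
backup_table() {
    sfdisk -d "$1" > "$2"
}

# Write a previously saved table from file $2 onto disk $1.
restore_table() {
    sfdisk "$1" < "$2"
}

# Copy the partition table of disk $1 directly onto disk $2.
clone_table() {
    if [ "$1" = "$2" ]; then
        echo "source and target must differ" >&2
        return 1
    fi
    sfdisk -d "$1" | sfdisk "$2"
}
```

For step 10 of the draft this would be `clone_table /dev/sda /dev/sdb`; keeping a copy made with `backup_table /dev/sda sda.table` somewhere off the machine also gives a way back if a table is damaged later.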
* Re: Draft Mirrored Linux Mini How-to - sfdisk suggestion 2009-08-07 15:27 ` Draft Mirrored Linux Mini How-to - sfdisk suggestion Maurice Hilarius @ 2009-08-08 1:13 ` Goswin von Brederlow 0 siblings, 0 replies; 19+ messages in thread From: Goswin von Brederlow @ 2009-08-08 1:13 UTC (permalink / raw) To: Maurice Hilarius; +Cc: harold, vger majordomo for lists Maurice Hilarius <maurice@harddata.com> writes: > Harold Pritchett wrote: >> Following this note is a draft of a "Mirrored Linux Mini How-to" >> document. >> .. >> To ensure that the partitions are correct, and in the correct order, >> boot from >> the install CD/DVD and run the rescue system. Use fdisk from this system >> to partition the two drives. >> .. >> In the event of a disk failure, follow the below procedure. >> .. >> 10. Bring up the system and use fdisk to partition the new drive EXACTLY >> the same as the existing drive. > > I suggest that the steps involving partitioning multiple drives are > prone to error. > > I offer that the use sfdisk is more accurate, and user friendly. > > By using sfdisk with the -d option we can get a dump of the current > partition table in a regular file, > and if needed we can restore it from that file. > > Example: > > | sfdisk -d /dev/sda | sfdisk /dev/sdb > > This is a very simple way to duplicate the partition table from one disk to another. > > | ACK. I do that too. Now we only need the same thing directly in the installer. MfG Goswin ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.31-rc5-git2 crash on a idle system. 2009-08-05 9:17 ` 2.6.31-rc5-git2 crash on a idle system Sachin Sant 2009-08-05 9:52 ` Benjamin Herrenschmidt @ 2009-08-09 18:55 ` Louwrentius 1 sibling, 0 replies; 19+ messages in thread From: Louwrentius @ 2009-08-09 18:55 UTC (permalink / raw) To: linux-raid unsubscribe ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2009-08-09 18:55 UTC | newest]
Thread overview: 19+ messages
[not found] <4A78292A.5000607@in.ibm.com>
[not found] ` <1249421223.18245.36.camel@pasglop>
2009-08-05 9:17 ` 2.6.31-rc5-git2 crash on a idle system Sachin Sant
2009-08-05 9:52 ` Benjamin Herrenschmidt
2009-08-06 13:33 ` Sachin Sant
2009-08-06 13:40 ` Michael Ellerman
2009-08-06 21:51 ` Benjamin Herrenschmidt
2009-08-07 0:08 ` Draft Mirrored Linux Mini How-to Harold Pritchett
2009-08-07 2:09 ` Goswin von Brederlow
2009-08-07 3:53 ` Tapani Tarvainen
2009-08-08 1:11 ` Goswin von Brederlow
2009-08-08 1:41 ` NeilBrown
2009-08-08 7:59 ` Goswin von Brederlow
2009-08-08 15:24 ` John Robinson
2009-08-08 14:09 ` Bill Davidsen
2009-08-07 8:10 ` Fredrik Pettersson
2009-08-07 9:51 ` Keld Jørn Simonsen
[not found] ` <4A7C5BD2.80508@uga.edu>
[not found] ` <20090807173423.GA32127@rap.rap.dk>
2009-08-08 11:36 ` Keld Jørn Simonsen
2009-08-07 15:27 ` Draft Mirrored Linux Mini How-to - sfdisk suggestion Maurice Hilarius
2009-08-08 1:13 ` Goswin von Brederlow
2009-08-09 18:55 ` 2.6.31-rc5-git2 crash on a idle system Louwrentius