From mboxrd@z Thu Jan 1 00:00:00 1970
From: "David C. Rankin"
Subject: kernel update and dmraid causing grub errors
Date: Mon, 01 Nov 2010 17:27:16 -0500
Message-ID: <4CCF3EC4.1020708@suddenlinkmail.com>
Reply-To: device-mapper development
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com
To: dm-devel@redhat.com
List-Id: dm-devel.ids

dmraid devs,

Over the past 8-9 months, I have had numerous dmraid related boot failures with
the past 6-8 kernels. It seems like a Russian-roulette type problem. Some
kernels work with dmraid, some cause grub errors. The problem is most acute on
an MSI SLI Platinum Based board (MS-7374), Phenom X4 (9850), with the following
pci bus config:

[15:48 archangel:/home/david/bugs/aa] # lspci
00:00.0 RAM memory: nVidia Corporation MCP78S [GeForce 8200] Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP78S [GeForce 8200] LPC Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation MCP78S [GeForce 8200] SMBus (rev a1)
00:01.2 RAM memory: nVidia Corporation MCP78S [GeForce 8200] Memory Controller (rev a1)
00:01.3 Co-processor: nVidia Corporation MCP78S [GeForce 8200] Co-Processor (rev a2)
00:01.4 RAM memory: nVidia Corporation MCP78S [GeForce 8200] Memory Controller (rev a1)
00:02.0 USB Controller: nVidia Corporation MCP78S [GeForce 8200] OHCI USB 1.1 Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP78S [GeForce 8200] EHCI USB 2.0 Controller (rev a1)
00:04.0 USB Controller: nVidia Corporation MCP78S [GeForce 8200] OHCI USB 1.1 Controller (rev a1)
00:04.1 USB Controller: nVidia Corporation MCP78S [GeForce 8200] EHCI USB 2.0 Controller (rev a1)
00:06.0 IDE interface: nVidia Corporation MCP78S [GeForce 8200] IDE (rev a1)
00:07.0 Audio device: nVidia Corporation MCP72XE/MCP72P/MCP78U/MCP78S High Definition Audio (rev a1)
00:08.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1)
00:09.0 RAID bus controller: nVidia Corporation MCP78S [GeForce 8200] SATA Controller (RAID mode) (rev a2)
00:0a.0 Ethernet controller: nVidia Corporation MCP77 Ethernet (rev a2)
00:10.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1)
00:12.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1)
00:13.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1)
00:14.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
01:06.0 Serial controller: 3Com Corp, Modem Division 56K FaxModem Model 5610 (rev 01)
01:09.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0)
02:00.0 VGA compatible controller: nVidia Corporation G92 [GeForce 8800 GT] (rev a2)
04:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
04:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)

full dmidecode information at:

http://www.3111skyline.com/dl/Archlinux/bugs/aa-dmidecode.txt

Booting the current Arch Linux kernel (2.6.35.8-1) fails and the boot hangs at
the very start.
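
Of the devices in the lspci listing above, only the storage controllers are likely to matter for a dmraid problem. As a reading aid, here is a minimal Python sketch that picks them out of a few lines copied from that listing. The function name and the set of device classes are illustrative assumptions, not part of lspci or dmraid.

```python
# Sample lines copied from the lspci listing above (not the full output).
LSPCI_SAMPLE = """\
00:06.0 IDE interface: nVidia Corporation MCP78S [GeForce 8200] IDE (rev a1)
00:08.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1)
00:09.0 RAID bus controller: nVidia Corporation MCP78S [GeForce 8200] SATA Controller (RAID mode) (rev a2)
00:0a.0 Ethernet controller: nVidia Corporation MCP77 Ethernet (rev a2)
04:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
04:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
"""

# Device classes treated as storage controllers here. This set is an
# assumption chosen for this listing, not an exhaustive list.
STORAGE_CLASSES = {"IDE interface", "RAID bus controller", "SATA controller"}


def storage_controllers(lspci_text):
    """Return (slot, device_class, description) for storage-class devices."""
    found = []
    for line in lspci_text.splitlines():
        if not line.strip():
            continue
        # "00:09.0 RAID bus controller: nVidia ..." -> slot, then "class: description"
        slot, _, rest = line.partition(" ")
        device_class, _, description = rest.partition(": ")
        if device_class in STORAGE_CLASSES:
            found.append((slot, device_class, description))
    return found


for slot, device_class, description in storage_controllers(LSPCI_SAMPLE):
    print(f"{slot}  {device_class:<20} {description}")
```

On this sample the filter keeps the nVidia IDE and RAID-mode SATA functions (00:06.0, 00:09.0) and the two JMicron functions (04:00.0, 04:00.1).
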
The kernel line I use hasn't changed in a long time:

kernel /vmlinuz root=/dev/mapper/nvidia_baaccajap5 ro vga=0x31a

Booting first stopped with the following error:

Booting 'Arch Linux on Archangel'

root (hd1,5)
Filesystem type is ext2fs, Partition type 0x83
Kernel /vmlinuz26 root=/dev/mapper/nvidia_baacca_jap5 ro vga=794

Error 24: Attempt to access block outside partition

Press any key to continue...

Upgrading to device-mapper-2.02.75-1 completely changes the error to:

Error 5: Partition table invalid or corrupt

Rebooting to 2.6.35.7-1, or 2.6.32.25-1 (the Arch LTS kernel) works just fine.
So the problem is not a partition or partition table problem. The Arch Linux
developer (Tobias Powalowski) has referred me here as the problem isn't a kernel
problem, but something strange that is happening with dmraid. The only guess I
have is that it is a dmraid/GeForce controller issue that is triggered when
dmraid loads under certain circumstances.

This box has 2 dmraid arrays:

[17:15 archangel:/home/david/bugs/aa] # dmraid -r
/dev/sdd: nvidia, "nvidia_baaccaja", mirror, ok, 1465149166 sectors, data@ 0
/dev/sda: nvidia, "nvidia_fdaacfde", mirror, ok, 976773166 sectors, data@ 0
/dev/sdb: nvidia, "nvidia_baaccaja", mirror, ok, 1465149166 sectors, data@ 0
/dev/sdc: nvidia, "nvidia_fdaacfde", mirror, ok, 976773166 sectors, data@ 0

[17:15 archangel:/home/david/bugs/aa] # dmraid -s
*** Active Set
name   : nvidia_baaccaja
size   : 1465149056
stride : 128
type   : mirror
status : ok
subsets: 0
devs   : 2
spares : 0
*** Active Set
name   : nvidia_fdaacfde
size   : 976773120
stride : 128
type   : mirror
status : ok
subsets: 0
devs   : 2
spares : 0

All disks check out fine with smartctl, so it isn't a disk-hardware problem.

The detailed information on the GeForce controller (lspci -vv) is:

00:09.0 RAID bus controller: nVidia Corporation MCP78S [GeForce 8200] SATA Controller (RAID mode) (rev a2)
        Subsystem: Micro-Star International Co., Ltd. Device 7374
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-