* Linux software RAID assistance
From: Simon McNair @ 2011-02-10 16:16 UTC
To: linux-raid

Hi all

I use a 3ware 9500 12-port SATA card (JBOD) which will not work without a
128MB SODIMM. The SODIMM socket is flaky and the result is that the
machine occasionally crashes. Yesterday I finally gave in and put together
another machine so that I can rsync between them. When I turned the machine
on today to set up rsync, the RAID array was not gone, but corrupted.
Typical...

I built the array in Aug 2010 using the following command:

mdadm --create --verbose /dev/md0 --metadata=1.1 --level=5
--raid-devices=10 /dev/sd{b,c,d,e,f,g,h,i,j,k}1 --chunk=64

Using LVM, I did the following:
pvscan
pvcreate -M2 /dev/md0
vgcreate lvm-raid /dev/md0
vgdisplay lvm-raid
vgscan
lvscan
lvcreate -v -l 100%VG -n RAID lvm-raid
lvdisplay /dev/lvm-raid/lvm0

I then formatted using:
mkfs -t ext4 -v -m .1 -b 4096 -E stride=16,stripe-width=144
/dev/lvm-raid/RAID

This has worked perfectly since I created the array. Now mdadm is coming up
with

proxmox:/dev/md# mdadm --assemble --scan --verbose
mdadm: looking for devices for further assembly
mdadm: no recogniseable superblock on /dev/md/ubuntu:0
mdadm: cannot open device /dev/dm-2: Device or resource busy
mdadm: cannot open device /dev/dm-1: Device or resource busy
mdadm: cannot open device /dev/dm-0: Device or resource busy
mdadm: cannot open device /dev/sdm1: Device or resource busy
mdadm: cannot open device /dev/sdm: Device or resource busy
mdadm: cannot open device /dev/sdl1: Device or resource busy
mdadm: cannot open device /dev/sdl: Device or resource busy
mdadm: cannot open device /dev/sdk1: Device or resource busy
mdadm: cannot open device /dev/sdk: Device or resource busy
mdadm: cannot open device /dev/sdj1: Device or resource busy
mdadm: cannot open device /dev/sdj: Device or resource busy
mdadm: cannot open device /dev/sdh1: Device or resource busy
mdadm: cannot open device /dev/sdh: Device or resource busy
mdadm: cannot open device /dev/sdi1: Device or resource busy
mdadm: cannot open device /dev/sdi: Device or resource busy
mdadm: cannot open device /dev/sdg1: Device or resource busy
mdadm: cannot open device /dev/sdg: Device or resource busy
mdadm: cannot open device /dev/sdf1: Device or resource busy
mdadm: cannot open device /dev/sdf: Device or resource busy
mdadm: cannot open device /dev/sde1: Device or resource busy
mdadm: cannot open device /dev/sde: Device or resource busy
mdadm: no RAID superblock on /dev/sdd
mdadm: cannot open device /dev/sda2: Device or resource busy
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sdd1 is identified as a member of /dev/md/proølox:0, slot 0.
mdadm: no uptodate device for slot 1 of /dev/md/proølox:0
mdadm: no uptodate device for slot 2 of /dev/md/proølox:0
mdadm: no uptodate device for slot 3 of /dev/md/proølox:0
mdadm: no uptodate device for slot 4 of /dev/md/proølox:0
mdadm: no uptodate device for slot 5 of /dev/md/proølox:0
mdadm: no uptodate device for slot 6 of /dev/md/proølox:0
mdadm: no uptodate device for slot 7 of /dev/md/proølox:0
mdadm: no uptodate device for slot 8 of /dev/md/proølox:0
mdadm: no uptodate device for slot 9 of /dev/md/proølox:0
mdadm: failed to add /dev/sdd1 to /dev/md/proølox:0: Invalid argument
mdadm: /dev/md/proølox:0 assembled from 0 drives - not enough to start
the array.
mdadm: looking for devices for further assembly
mdadm: no recogniseable superblock on /dev/sdd
mdadm: No arrays found in config file or automatically

pvscan and vgscan show nothing.

So I tried running

mdadm --create --verbose /dev/md0 --metadata=1.1 --level=5
--raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
/dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64

as it seemed that /dev/sdd1 failed to be added to the array. This did
nothing.

dmesg contains:

md: invalid superblock checksum on sdd1
md: sdd1 does not have a valid v1.1 superblock, not importing!
md: md_import_device returned -22

The output of mdadm -E is as follows:

proxmox:~# mdadm -E /dev/sd[d-m]1
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : c4f62f32:4244a1db:b7746203:f10b5227
           Name : proølox:0
  Creation Time : Sat Aug 21 19:16:38 2010
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : b0ffd0a1:866676af:3ae4c03c:d219676e

    Update Time : Mon Feb  7 21:08:29 2011
       Checksum : 13aa9685 - expected 93aa9672
         Events : 60802

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 0
    Array State : AAAAAAAAAA ('A' == active, '.' == missing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 4121fa31:7543218c:f42a937d:fddf04e8

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 5f965c84 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 1
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : e70da8ed:c80e9533:7d8e200e:dc285255

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : c092c0e5 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 2
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 5878fdbb:d0c7c892:40c8fa6c:0c5e257b

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 72c95353 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 3
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 7bced3d0:5ebac414:02effc8b:ac1ad69e

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 4c4e7f67 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 4
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdi1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 56513558:a25025bb:11e8b563:ae42f8d4

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 87ebb998 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 5
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdj1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 90621937:23cff36c:c55b1581:ed28b433

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 94b9a346 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 6
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdk1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 72d5ce06:855aa34c:3a354abc:0ddb6e80

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : cc0e2d20 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 7
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdl1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 8792e04a:ee298ace:fec79850:ddd3167b

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 20fd4534 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 8
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdm1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 8916d09c:448a5791:c2923182:ab99eb92

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 7f27ba1f - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 9
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)

I'm confident the data is on there, I just need to get a backup of it off!

Yours desperately,
Simon McNair
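[Aside: the --create in the post above turns out to be the critical misstep
in this thread, so it is worth noting how to capture the surviving metadata
with read-only commands before trying anything destructive. A minimal
sketch, assuming the members are /dev/sd[d-m]1 as above; the
/root/raid-rescue path is hypothetical:

    mkdir -p /root/raid-rescue
    for d in /dev/sd[d-m]1; do
        b=$(basename "$d")
        # record each member's superblock before anything rewrites it
        mdadm -E "$d" > "/root/raid-rescue/$b.examine" 2>&1
        # keep a copy of the first MiB, where the v1.1 superblock lives
        dd if="$d" of="/root/raid-rescue/$b.head" bs=512 count=2048
    done
    cat /proc/mdstat > /root/raid-rescue/mdstat.txt

Assembly, even with --force, only touches event counters and flags;
--create rewrites the superblocks wholesale, which is why the examine
output is worth saving first.]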
* Re: Linux software RAID assistance
From: Phil Turmel @ 2011-02-10 18:24 UTC
To: simonmcnair; +Cc: linux-raid

Hi Simon,

On 02/10/2011 11:16 AM, Simon McNair wrote:
> [trim /]
>
> So I tried running mdadm --create --verbose /dev/md0 --metadata=1.1
> --level=5 --raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1
> /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64

Uh-oh. You told mdadm to create an array with a missing member, but you
didn't tell it to --assume-clean.

> as it seemed that /dev/sdd1 failed to be added to the array. This did
> nothing.

According to your mdadm -E output below, it did what you asked: it created
a new array from the devices given (note the new array UUID on the nine
devices). Since you have one device missing, it can't rebuild, so you might
not have lost your data yet.

That mdadm -E /dev/sdd1 reports its array assembled and clean is odd.

Please show us the output of /proc/mdstat and then "mdadm -D /dev/md*" or
"mdadm -D /dev/md/*"

Phil

> dmesg contains:
>
> md: invalid superblock checksum on sdd1
> md: sdd1 does not have a valid v1.1 superblock, not importing!
> md: md_import_device returned -22
>
> The output of mdadm -E is as follows:
>
> proxmox:~# mdadm -E /dev/sd[d-m]1
> /dev/sdd1:
>           Magic : a92b4efc
>         Version : 1.1
>     Feature Map : 0x0
>      Array UUID : c4f62f32:4244a1db:b7746203:f10b5227
>            Name : proølox:0
>   Creation Time : Sat Aug 21 19:16:38 2010
>      Raid Level : raid5
>    Raid Devices : 10
>
>  Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
>      Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
>   Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
>     Data Offset : 264 sectors
>    Super Offset : 0 sectors
>           State : clean
>     Device UUID : b0ffd0a1:866676af:3ae4c03c:d219676e
>
>     Update Time : Mon Feb  7 21:08:29 2011
>        Checksum : 13aa9685 - expected 93aa9672
>          Events : 60802
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Device Role : Active device 0
>     Array State : AAAAAAAAAA ('A' == active, '.' == missing)
> /dev/sde1:
>           Magic : a92b4efc
>         Version : 1.1
>     Feature Map : 0x0
>      Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
>            Name : ubuntu:0
>   Creation Time : Thu Feb 10 11:59:48 2011
> [trim /]
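[Aside: the --assume-clean flag Phil mentions tells md not to write
anything after creating the array. A sketch only, under the assumption that
every parameter (metadata version, chunk, level, device order) matches the
original exactly; it is illustrative, not a recommendation:

    # --assume-clean skips the initial sync pass, and "missing" keeps the
    # array degraded so no parity reconstruction is written either
    mdadm --create /dev/md0 --metadata=1.1 --level=5 --chunk=64 \
          --raid-devices=10 --assume-clean \
          missing /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 \
          /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1

As this thread later shows, the mdadm version also matters: newer releases
choose a different data offset, which silently moves where the array's data
starts on each member.]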
* Re: Linux software RAID assistance
From: NeilBrown @ 2011-02-15 4:53 UTC
To: simonmcnair; +Cc: linux-raid

On Thu, 10 Feb 2011 16:16:44 +0000 Simon McNair <simonmcnair@gmail.com> wrote:

> I use a 3ware 9500 12-port SATA card (JBOD) which will not work without a
> 128MB SODIMM. The SODIMM socket is flaky and the result is that the
> machine occasionally crashes. Yesterday I finally gave in and put together
> another machine so that I can rsync between them. When I turned the machine
> on today to set up rsync, the RAID array was not gone, but corrupted.
> Typical...

Presumably the old machine was called 'ubuntu' and the new machine 'proølox'

> [trim /]
>
> proxmox:/dev/md# mdadm --assemble --scan --verbose
> mdadm: looking for devices for further assembly
> mdadm: no recogniseable superblock on /dev/md/ubuntu:0

And it seems that ubuntu:0 has been successfully assembled.
It is missing one device for some reason (sdd1) but RAID can cope with that.

> [trim /]
>
> mdadm: /dev/sdd1 is identified as a member of /dev/md/proølox:0, slot 0.
> mdadm: no uptodate device for slot 1 of /dev/md/proølox:0
> [trim /]
> mdadm: no uptodate device for slot 9 of /dev/md/proølox:0
> mdadm: failed to add /dev/sdd1 to /dev/md/proølox:0: Invalid argument
> mdadm: /dev/md/proølox:0 assembled from 0 drives - not enough to start
> the array.

This looks like it is *after* trying the --create command you give
below. It is best to report things in the order they happen, else you can
confuse people (or get caught out!).

> mdadm: looking for devices for further assembly
> mdadm: no recogniseable superblock on /dev/sdd
> mdadm: No arrays found in config file or automatically
>
> pvscan and vgscan show nothing.
>
> So I tried running mdadm --create --verbose /dev/md0 --metadata=1.1
> --level=5 --raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1
> /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64
>
> as it seemed that /dev/sdd1 failed to be added to the array. This did
> nothing.

It did not do nothing. It wrote a superblock to /dev/sdd1 and complained
that it couldn't write to all the others --- didn't it?

> dmesg contains:
>
> md: invalid superblock checksum on sdd1

I guess that is why sdd1 was missing from 'ubuntu:0'. Though as I cannot
tell whether this happened before or after any of the various things
reported above, it is hard to be sure.

The real mystery is why 'pvscan' reports nothing.

What about

pvscan --verbose

or

blkid -p /dev/md/ubuntu:0

or even

dd if=/dev/md/ubuntu:0 count=8 | od -c

??

NeilBrown
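[Aside: a concrete way to run Neil's last check. LVM2 stamps each PV with a
"LABELONE" label in one of the first four 512-byte sectors, so if the PV
really does start at the beginning of the assembled array it shows up like
this (a sketch, assuming the array device name above):

    dd if=/dev/md/ubuntu:0 bs=512 count=4 2>/dev/null | strings | grep LABELONE

No output here, with the label later found elsewhere on the raw disks, is
exactly the symptom of a shifted data offset.]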
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-15 8:48 UTC
To: NeilBrown; +Cc: linux-raid

Hi Neil,
Thanks for the response. Phil Turmel has been giving me some kind
assistance. After some investigation we have found that the 3ware card I'm
using as a controller (as JBOD) is making the whole thing far too flaky and
causing some data corruption (hopefully not for too long). Thankfully this
is mainly a data archive so the data changes very infrequently.

I have ordered a Supermicro 8-port SATA controller and intend to use that,
in conjunction with the 6 onboard SATA ports, to try and get the array back
in shape. Some seriously bad stuff has happened, so I'm hoping the data is
still retrievable. I'll cc the latest update to the list. Any input you can
provide would be seriously welcome.

regards
Simon

On 15/02/2011 04:53, NeilBrown wrote:
> [trim /]
* Re: Linux software RAID assistance 2011-02-15 4:53 ` NeilBrown 2011-02-15 8:48 ` Simon McNair @ 2011-02-15 14:51 ` Phil Turmel 2011-02-15 19:04 ` Simon McNair ` (2 more replies) 1 sibling, 3 replies; 64+ messages in thread From: Phil Turmel @ 2011-02-15 14:51 UTC (permalink / raw) To: NeilBrown; +Cc: simonmcnair, linux-raid Hi Neil, Since Simon has responded, let me summarize the assistance I provided per his off-list request: On 02/14/2011 11:53 PM, NeilBrown wrote: > On Thu, 10 Feb 2011 16:16:44 +0000 Simon McNair <simonmcnair@gmail.com> wrote: > >> >> Hi all >> >> I use a 3ware 9500-12 port sata card (JBOD) which will not work without a >> 128mb sodimm. The sodimm socket is flakey and the result is that the >> machine occasionally crashes. Yesterday I finally gave in and put >> together another >> machine so that I can rsync between them. When I turned the machine >> on today to set up rync, the RAID array was not gone, but corrupted. >> Typical... > > Presumably the old machine was called 'ubuntu' and the new machine 'proølox' > > >> >> I built the array in Aug 2010 using the following command: >> >> mdadm --create --verbose /dev/md0 --metadata=1.1 --level=5 >> --raid-devices=10 /dev/sd{b,c,d,e,f,g,h,i,j,k}1 --chunk=64 >> >> Using LVM, I did the following: >> pvscan >> pvcreate -M2 /dev/md0 >> vgcreate lvm-raid /dev/md0 >> vgdisplay lvm-raid >> vgscan >> lvscan >> lvcreate -v -l 100%VG -n RAID lvm-raid >> lvdisplay /dev/lvm-raid/lvm0 >> >> I then formatted using: >> mkfs -t ext4 -v -m .1 -b 4096 -E stride=16,stripe-width=144 >> /dev/lvm-raid/RAID >> >> This worked perfectly since I created the array. Now mdadm is coming up >> with >> >> proxmox:/dev/md# mdadm --assemble --scan --verbose >> mdadm: looking for devices for further assembly >> mdadm: no recogniseable superblock on /dev/md/ubuntu:0 > > And it seems that ubuntu:0 have been successfully assembled. > It is missing one device for some reason (sdd1) but RAID can cope with that. 3ware card is compromised, with a loose buffer memory dimm. Some of its ECC errors were caught and reported in dmesg. Its likely, based on the loose memory socket, that many multiple-bit errors got through. [trim /] >> mdadm: no uptodate device for slot 8 of /dev/md/pro�lox:0 >> mdadm: no uptodate device for slot 9 of /dev/md/pro�lox:0 >> mdadm: failed to add /dev/sdd1 to /dev/md/pro�lox:0: Invalid argument >> mdadm: /dev/md/pro�lox:0 assembled from 0 drives - not enough to start >> the array. > > This looks like it is *after* to trying the --create command you give > below.. It is best to report things in the order they happen, else you can > confuse people (or get caught out!). Yes, this was after. >> mdadm: looking for devices for further assembly >> mdadm: no recogniseable superblock on /dev/sdd >> mdadm: No arrays found in config file or automatically >> >> pvscan and vgscan show nothing. >> >> So I tried running mdadm --create --verbose /dev/md0 --metadata=1.1 >> --level=5 --raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1 >> /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64 >> >> as it seemed that /dev/sdd1 failed to be added to the array. This did >> nothing. > > It did not to nothing. It wrote a superblock to /dev/sdd1 and complained > that it couldn't write to all the others --- didn't it? There were multiple attempts to create. One wrote to just sdd1, another succeeded with all but sdd1. >> dmesg contains: >> >> md: invalid superblock checksum on sdd1 > > I guess that is why sdd1 was missing from 'ubuntu:0'. 
Though as I cannot > tell if this happened before or after any of the various things reported > above, it is hard to be sure. > > > The real mystery is why 'pvscan' reports nothing. The original array was created with mdadm v2.6.7, and had a data offset of 264 sectors. After Simon's various attempts to --create, he ended up with data offset of 2048, using mdadm v3.1.4. The mdadm -E reports he posted to the list showed the 264 offset. We didn't realize the offset had been updated until somewhat later in our troubleshooting efforts. In any case, pvscan couldn't see the LVM signature because it wasn't there (at offset 2048). > What about > pvscan --verbose > > or > > blkid -p /dev/md/ubuntu:0 > > or even > > dd of=/dev/md/ubuntu:0 count=8 | od -c Fortunately, Simon did have a copy of his LVM configuration. With the help of dd, strings, and grep, we did locate his LVM sig at the correct location on sdd1 (for data offset 264). After a number of attempts to bypass LVM and access his single LV with dmsetup (based on his backed up configuration, on the assembled new array less sdd1), I realized that the data offset was wrong on the recreated array, and went looking for the cause. I found your git commit that changed that logic last spring, and recommended that Simon revert to the default package for his ubuntu install, which is v2.6.7. Simon has now attempted to recreate the array with v2.6.7, but the controller is throwing too many errors to succeed, and I suggested it was too flakey to trust any further. Based on the existence of the LVM sig on sdd1, I believe Simon's data is (mostly) intact, and only needs a successful create operation with a properly functioning controller. (He might also need to perform an lvm vgcfgrestore, but he has the necessary backup file.) A new controller is on order. Phil -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
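[Aside: a sketch of the dd/strings/grep hunt Phil describes, under the
assumptions from the thread: data offset of 264 sectors, and device 0 of a
left-symmetric RAID5 carrying the first data chunk, so the start of the old
array - and hence the LVM label - should sit 264 sectors into sdd1:

    # offsets printed by strings -t d are relative to where the read starts
    dd if=/dev/sdd1 bs=512 skip=264 count=16 2>/dev/null \
        | strings -t d | grep LABELONE

Once an array with the right geometry exposes the PV again, "vgcfgrestore
lvm-raid" can reapply the backed-up volume group metadata Phil mentions.]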
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-15 19:04 UTC
To: Phil Turmel; +Cc: NeilBrown, linux-raid

Phil,
Thanks for filling in the gaps, I had forgotten quite how much help and
assistance you had provided up to now. You're a real godsend.

To fill in some of the other pieces of info:
The original machine was called proxmox (KVM virtualisation machine) and I
used an Ubuntu live CD to see if it was the software which was preventing
progress or not. It does seem like there is data corruption in the machine
name for some reason.

The original company I ordered the controller through sent me an email at
4pm saying that they could not fulfill the order; 4pm was too late for them
to pick & pack, so there was another 24hr turnaround delay. The Supermicro
card should arrive here tomorrow and hopefully I'll be able to get a dd of
each of the drives prior to Phil coming online.

For some reason blkid doesn't exist on my machine even though I have
e2fs-utils installed. V weird.

Simon

On 15/02/2011 14:51, Phil Turmel wrote:
> [trim /]
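[Aside: "a dd of each of the drives" is the right instinct before any
further create attempts. A sketch, assuming enough scratch space mounted at
/mnt/backup (a hypothetical path); GNU ddrescue is preferable to plain dd
here because it retries around bad sectors and keeps a resumable map file:

    for d in d e f g h i j k l m; do
        ddrescue /dev/sd${d}1 /mnt/backup/sd${d}1.img /mnt/backup/sd${d}1.map
    done

Recovery experiments can then be run against loop devices over the images
instead of against the only copies of the data.]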
* Re: Linux software RAID assistance
From: Phil Turmel @ 2011-02-15 19:37 UTC
To: simonmcnair; +Cc: NeilBrown, linux-raid

On 02/15/2011 02:04 PM, Simon McNair wrote:
> [trim /]
>
> For some reason blkid doesn't exist on my machine even though I have
> e2fs-utils installed. V weird.

FWIW, on my VPS running Ubuntu Server 10.10, the blkid executable is part
of util-linux, and installed in /sbin. You might need to re-install it.

Phil
* Re: Linux software RAID assistance
From: Roman Mamedov @ 2011-02-15 19:45 UTC
To: Phil Turmel; +Cc: simonmcnair, NeilBrown, linux-raid

On Tue, 15 Feb 2011 14:37:55 -0500 Phil Turmel <philip@turmel.org> wrote:

>> For some reason blkid doesn't exist on my machine even though I have
>> e2fs-utils installed. V weird.

What's weird is that you think it is somehow an "e2fs-utils" program.

dpkg -S blkid
...
util-linux: /sbin/blkid
...

> FWIW, on my VPS running Ubuntu Server 10.10, the blkid executable is part
> of util-linux, and installed in /sbin. You might need to re-install it.

--
With respect,
Roman
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-15 21:09 UTC
To: Roman Mamedov; +Cc: Phil Turmel, NeilBrown, linux-raid

I got my info from: http://linux.die.net/man/8/blkid

"blkid is part of the e2fsprogs package since version 1.26 and is available
from http://e2fsprogs.sourceforge.net."

I also did apt-file search blkid and it showed up as e2fsprogs or was it
e2fs-utils (I forget). The system I was/am running was Debian Lenny x64.
This just shows how little I know about Linux (I've only been dabbling for
6 months or so). I'll install util-linux next time the machine starts up.
Thanks for the pointer.

regards
Simon

On 15/02/2011 19:45, Roman Mamedov wrote:
> [trim /]
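[Aside: the confusion here is easy to hit. dpkg -S only searches files
belonging to packages that are already installed, while apt-file (which
Simon used) searches the archive index, so it can answer "which package
would give me this file?". A minimal sketch, assuming apt-file is
available:

    apt-get install apt-file
    apt-file update
    apt-file search sbin/blkid

On Simon's Lenny system this evidently points at e2fsprogs (the --reinstall
later in the thread bears that out); on later releases blkid moved to
util-linux, which explains the conflicting answers above.]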
* Re: Linux software RAID assistance
From: Simon Mcnair @ 2011-02-17 15:10 UTC
To: Roman Mamedov; +Cc: Phil Turmel, NeilBrown, linux-raid

proxmox:~# dpkg -S blkid
libblkid1: /lib/libblkid.so.1.0
libblkid1: /lib/libblkid.so.1
libblkid1: /usr/share/doc/libblkid1/changelog.Debian.gz
libblkid1: /usr/share/doc/libblkid1/copyright
libblkid1: /usr/share/doc/libblkid1
e2fsprogs-dbg: /usr/lib/debug/sbin/blkid

proxmox:~# apt-get install e2fsprogs-dbg
Reading package lists... Done
Building dependency tree
Reading state information... Done
e2fsprogs-dbg is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

I still can't get blkid installed, even though I have e2fsprogs-dbg
installed...

On 15 February 2011 19:45, Roman Mamedov <rm@romanrm.ru> wrote:
> [trim /]
* Re: Linux software RAID assistance
From: Roman Mamedov @ 2011-02-17 15:42 UTC
To: Simon Mcnair; +Cc: Phil Turmel, NeilBrown, linux-raid

On Thu, 17 Feb 2011 15:10:32 +0000 Simon Mcnair <simonmcnair@gmail.com> wrote:

> proxmox:~# dpkg -S blkid
> [trim /]
> e2fsprogs-dbg: /usr/lib/debug/sbin/blkid
>
> I still can't get blkid installed, even though I have e2fsprogs-dbg
> installed...

/usr/lib/debug/sbin/ is not in the system PATH, so you won't be able to run
the program just by entering 'blkid' into the shell prompt.
Either use the complete file name and path, e.g.

# /usr/lib/debug/sbin/blkid --help

or symlink /usr/lib/debug/sbin/blkid to something like /usr/local/sbin/blkid.

--
With respect,
Roman
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-18 9:13 UTC
To: Roman Mamedov; +Cc: Phil Turmel, linux-raid

Hi Roman,
Sorry to be so persistent on something that is not Linux raid specific,
but I still can't get blkid to run; the message is:

proxmox:~# ls -la /usr/lib/debug/sbin/blkid
-rwx------ 1 root root 19887 2008-10-13 04:54 /usr/lib/debug/sbin/blkid

proxmox:~# /usr/lib/debug/sbin/blkid --help
-bash: /usr/lib/debug/sbin/blkid: cannot execute binary file

If I can't run it directly I'm pretty sure symlinking it will not make any
difference. It's weird; normally when I apt-get things it modifies the
path etc and does everything required to make it work, so I would have
thought if I installed a debug version of something it should add the
debug path as well?

blkid always works on my Ubuntu boxes, I don't know why this is any
different (apart from being a different distribution, it's still a mature
program, package and platform). :-)

Simon

On 17/02/2011 15:42, Roman Mamedov wrote:
> [trim /]
* Re: Linux software RAID assistance
From: Robin Hill @ 2011-02-18 9:38 UTC
To: Simon McNair; +Cc: Roman Mamedov, Phil Turmel, linux-raid

On Fri Feb 18, 2011 at 09:13:33AM +0000, Simon McNair wrote:

> proxmox:~# ls -la /usr/lib/debug/sbin/blkid
> -rwx------ 1 root root 19887 2008-10-13 04:54 /usr/lib/debug/sbin/blkid
>
> proxmox:~# /usr/lib/debug/sbin/blkid --help
> -bash: /usr/lib/debug/sbin/blkid: cannot execute binary file
> [trim /]

I suspect that those are just the debug symbols - they're split off from
the binary so they can be automatically pulled in when needed, but don't
bloat the application itself. From the package naming, the actual blkid
binary should be in the e2fsprogs package - do you have that installed?

Cheers,
    Robin
--
     ___
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |
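[Aside: Robin's diagnosis can be confirmed directly. A detached debug-info
file is an ELF object carrying symbols rather than a runnable program,
which is why bash reports "cannot execute binary file". A sketch, using
only paths already named in the thread:

    # inspect what kind of file this actually is
    file /usr/lib/debug/sbin/blkid
    # and list where the package says the real binary should live
    dpkg -L e2fsprogs | grep -w blkid
]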
* Re: Linux software RAID assistance
From: Simon Mcnair @ 2011-02-18 10:38 UTC
To: Simon McNair, Roman Mamedov, Phil Turmel, linux-raid; +Cc: Robin Hill

Hi Robin,

proxmox:~# apt-get install e2fsprogs
Reading package lists... Done
Building dependency tree
Reading state information... Done
e2fsprogs is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.

Yeah, it's installed :-)

Simon

On 18 February 2011 09:38, Robin Hill <robin@robinhill.me.uk> wrote:
> [trim /]
* Re: Linux software RAID assistance
From: Jan Ceuleers @ 2011-02-19 11:46 UTC
To: Simon Mcnair; +Cc: Roman Mamedov, Phil Turmel, linux-raid, Robin Hill

On 18/02/11 11:38, Simon Mcnair wrote:
> Yeah, it's installed :-)

apt-get install --reinstall e2fsprogs
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-19 12:40 UTC
To: Jan Ceuleers; +Cc: Roman Mamedov, Phil Turmel, linux-raid, Robin Hill

Jan,
Yay. Thanks very much. That worked. I don't quite understand why something
should need a reinstall to work, but thanks v much.

regards
Simon

On 19/02/2011 11:46, Jan Ceuleers wrote:
> On 18/02/11 11:38, Simon Mcnair wrote:
>> Yeah, it's installed :-)
>
> apt-get install --reinstall e2fsprogs
* Re: Linux software RAID assistance
From: Jan Ceuleers @ 2011-02-19 17:37 UTC
To: simonmcnair; +Cc: Roman Mamedov, Phil Turmel, linux-raid, Robin Hill

On 19/02/11 13:40, Simon McNair wrote:
> Yay. Thanks very much. That worked. I don't quite understand why
> something should need a reinstall to work, but thanks v much.

If a package's binaries have been damaged or deleted, on purpose or
otherwise, then --reinstall helps. I have no idea what happened to your
copy of blkid, but --reinstall has brought it back for you.

Jan
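[Aside: when installed files have gone missing or been corrupted, Debian
can verify a package's files against the md5sums it shipped with. A sketch,
assuming the debsums package is available in the configured repositories:

    apt-get install debsums
    debsums -s e2fsprogs    # -s: silent, report only errors

That would have flagged the vanished /sbin/blkid without guesswork, before
reaching for --reinstall.]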
* Re: Linux software RAID assistance 2011-02-15 14:51 ` Phil Turmel 2011-02-15 19:04 ` Simon McNair @ 2011-02-16 13:51 ` Simon McNair 2011-02-16 14:37 ` Phil Turmel 2011-02-16 13:56 ` Simon McNair 2 siblings, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 13:51 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid latest update. A bit long I'm afraid... hi, card installed and devices plugged in. Now have sda sda2 sdb1 sdc1 sde sdf1 sdg1 sdi sdj1 sdk1 sdl1 sdm1 sda1 sdb sdc sdd sdf sdg sdh sdj sdk sdl sdm proxmox:/home/simon# ./lsdrv.sh Controller device @ pci0000:00/0000:00:01.0/0000:01:00.0 [mvsas] SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 Sys tem Controller, Revision B (rev 01) host4: /dev/sdf ATA Hitachi HDS72101 {SN: GTA000PAGABXRA} host4: /dev/sdg ATA Hitachi HDS72101 {SN: GTA000PAGAA5DA} host4: /dev/sdh ATA Hitachi HDS72101 {SN: GTA000PAG9NL9A} host4: /dev/sdi ATA Hitachi HDS72101 {SN: GTA000PAGA8V4A} host4: /dev/sdj ATA Hitachi HDS72101 {SN: GTD000PAGMT9GD} host4: /dev/sdk ATA Hitachi HDS72101 {SN: GTG000PAG18BJC} host4: /dev/sdl ATA Hitachi HDS72101 {SN: GTG000PAG1DPLC} host4: /dev/sdm ATA Hitachi HDS72101 {SN: GTA000PAG7WMEA} Controller device @ pci0000:00/0000:00:1a.7/usb1/1-4/1-4.1/1-4.1.1/1-4.1.1:1.0 [ usb-storage] Bus 001 Device 007: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reade r {SN: 08050920003A} host9: /dev/sdd Generic Flash HS-CF host9: /dev/sde Generic Flash HS-COMBO Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.0 [ahci] SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) host7: [Empty] host8: [Empty] Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.1 [pata_jmicron] IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (r ev 03) host5: [Empty] host6: [Empty] Controller device @ pci0000:00/0000:00:1f.2 [ata_piix] IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Contro ller #1 host0: /dev/sda ATA STM3500418AS {SN: 9VM3QJ5C} host1: /dev/sr0 Optiarc DVD RW AD-5240S Controller device @ pci0000:00/0000:00:1f.5 [ata_piix] IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Contro ller #2 host2: /dev/sdb ATA Hitachi HDS72101 {SN: GTD000PAGMT8DD} host3: /dev/sdc ATA Hitachi HDS72101 {SN: GTG000PAG04V0C} proxmox:/home/simon# parted -l Model: ATA STM3500418AS (scsi) Disk /dev/sda: 500GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End Size Type File system Flags 1 32.8kB 537MB 537MB primary ext3 boot 2 537MB 500GB 500GB primary lvm Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? 
c Model: Linux device-mapper (linear) (dm) Disk /dev/mapper/pve-data: 380GB Sector size (logical/physical): 512B/512B Partition Table: loop Number Start End Size File system Flags 1 0.00B 380GB 380GB ext3 c Model: Linux device-mapper (linear) (dm) Disk /dev/mapper/pve-root: 103GB Sector size (logical/physical): 512B/512B Partition Table: loop Number Start End Size File system Flags 1 0.00B 103GB 103GB ext3 Model: Linux device-mapper (linear) (dm) Disk /dev/mapper/pve-swap: 11.8GB Sector size (logical/physical): 512B/512B Partition Table: loop Number Start End Size File system Flags 1 0.00B 11.8GB 11.8GB linux-swap Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: /dev/sdh: unrecognised disk label Error: /dev/sdi: unrecognised disk label Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: /dev/md0: unrecognised disk label proxmox:/home/simon# mdadm --assemble --scan --verbose mdadm: looking for devices for /dev/md/0 mdadm: cannot open device /dev/dm-2: Device or resource busy mdadm: /dev/dm-2 has wrong uuid. mdadm: cannot open device /dev/dm-1: Device or resource busy mdadm: /dev/dm-1 has wrong uuid. mdadm: cannot open device /dev/dm-0: Device or resource busy mdadm: /dev/dm-0 has wrong uuid. mdadm: no RAID superblock on /dev/sdf1 mdadm: /dev/sdf1 has wrong uuid. mdadm: no RAID superblock on /dev/sdf mdadm: /dev/sdf has wrong uuid. mdadm: no RAID superblock on /dev/sdm1 mdadm: /dev/sdm1 has wrong uuid. mdadm: no RAID superblock on /dev/sdm mdadm: /dev/sdm has wrong uuid. mdadm: no RAID superblock on /dev/sdl1 mdadm: /dev/sdl1 has wrong uuid. mdadm: no RAID superblock on /dev/sdl mdadm: /dev/sdl has wrong uuid. mdadm: no RAID superblock on /dev/sdk1 mdadm: /dev/sdk1 has wrong uuid. mdadm: no RAID superblock on /dev/sdk mdadm: /dev/sdk has wrong uuid. mdadm: no RAID superblock on /dev/sdj1 mdadm: /dev/sdj1 has wrong uuid. mdadm: no RAID superblock on /dev/sdj mdadm: /dev/sdj has wrong uuid. mdadm: /dev/sdi has wrong uuid. mdadm: /dev/sdh has wrong uuid. mdadm: no RAID superblock on /dev/sdg1 mdadm: /dev/sdg1 has wrong uuid. mdadm: no RAID superblock on /dev/sdg mdadm: /dev/sdg has wrong uuid. mdadm: no RAID superblock on /dev/sdc1 mdadm: /dev/sdc1 has wrong uuid. mdadm: no RAID superblock on /dev/sdc mdadm: /dev/sdc has wrong uuid. mdadm: no RAID superblock on /dev/sdb1 mdadm: /dev/sdb1 has wrong uuid. mdadm: no RAID superblock on /dev/sdb mdadm: /dev/sdb has wrong uuid. 
mdadm: cannot open device /dev/sda2: Device or resource busy mdadm: /dev/sda2 has wrong uuid. mdadm: cannot open device /dev/sda1: Device or resource busy mdadm: /dev/sda1 has wrong uuid. mdadm: cannot open device /dev/sda: Device or resource busy mdadm: /dev/sda has wrong uuid. mdadm: no devices found for /dev/md/0 mdadm: looking for devices for further assembly mdadm: cannot open device /dev/dm-2: Device or resource busy mdadm: cannot open device /dev/dm-1: Device or resource busy mdadm: cannot open device /dev/dm-0: Device or resource busy mdadm: no recogniseable superblock on /dev/sdf1 mdadm: no recogniseable superblock on /dev/sdf mdadm: no recogniseable superblock on /dev/sdm1 mdadm: no recogniseable superblock on /dev/sdm mdadm: no recogniseable superblock on /dev/sdl1 mdadm: no recogniseable superblock on /dev/sdl mdadm: no recogniseable superblock on /dev/sdk1 mdadm: no recogniseable superblock on /dev/sdk mdadm: no recogniseable superblock on /dev/sdj1 mdadm: no recogniseable superblock on /dev/sdj mdadm: /dev/sdi is not built for host proxmox. mdadm: /dev/sdh is not built for host proxmox. mdadm: no recogniseable superblock on /dev/sdg1 mdadm: no recogniseable superblock on /dev/sdg mdadm: no recogniseable superblock on /dev/sdc1 mdadm: no recogniseable superblock on /dev/sdc mdadm: no recogniseable superblock on /dev/sdb1 mdadm: no recogniseable superblock on /dev/sdb mdadm: cannot open device /dev/sda2: Device or resource busy mdadm: cannot open device /dev/sda1: Device or resource busy mdadm: cannot open device /dev/sda: Device or resource busy proxmox:/home/simon# apt-show-versions -a mdadm mdadm 2.6.7.2-3 install ok installed mdadm 2.6.7.2-3 lenny ftp.uk.debian.org No stable version No testing version mdadm 3.1.4-1+8efb9d1 sid ftp.uk.debian.org mdadm/lenny uptodate 2.6.7.2-3 anything else you want ? Simon On 15/02/2011 14:51, Phil Turmel wrote: > Hi Neil, > > Since Simon has responded, let me summarize the assistance I provided per his off-list request: > > On 02/14/2011 11:53 PM, NeilBrown wrote: >> On Thu, 10 Feb 2011 16:16:44 +0000 Simon McNair<simonmcnair@gmail.com> wrote: >> >>> Hi all >>> >>> I use a 3ware 9500-12 port sata card (JBOD) which will not work without a >>> 128mb sodimm. The sodimm socket is flakey and the result is that the >>> machine occasionally crashes. Yesterday I finally gave in and put >>> together another >>> machine so that I can rsync between them. When I turned the machine >>> on today to set up rync, the RAID array was not gone, but corrupted. >>> Typical... >> Presumably the old machine was called 'ubuntu' and the new machine 'proølox' >> >> >>> I built the array in Aug 2010 using the following command: >>> >>> mdadm --create --verbose /dev/md0 --metadata=1.1 --level=5 >>> --raid-devices=10 /dev/sd{b,c,d,e,f,g,h,i,j,k}1 --chunk=64 >>> >>> Using LVM, I did the following: >>> pvscan >>> pvcreate -M2 /dev/md0 >>> vgcreate lvm-raid /dev/md0 >>> vgdisplay lvm-raid >>> vgscan >>> lvscan >>> lvcreate -v -l 100%VG -n RAID lvm-raid >>> lvdisplay /dev/lvm-raid/lvm0 >>> >>> I then formatted using: >>> mkfs -t ext4 -v -m .1 -b 4096 -E stride=16,stripe-width=144 >>> /dev/lvm-raid/RAID >>> >>> This worked perfectly since I created the array. Now mdadm is coming up >>> with >>> >>> proxmox:/dev/md# mdadm --assemble --scan --verbose >>> mdadm: looking for devices for further assembly >>> mdadm: no recogniseable superblock on /dev/md/ubuntu:0 >> And it seems that ubuntu:0 have been successfully assembled. 
>> It is missing one device for some reason (sdd1) but RAID can cope with that. > 3ware card is compromised, with a loose buffer memory dimm. Some of its ECC errors were caught and reported in dmesg. It's likely, based on the loose memory socket, that many multiple-bit errors got through. > > [trim /] > >>> mdadm: no uptodate device for slot 8 of /dev/md/proølox:0 >>> mdadm: no uptodate device for slot 9 of /dev/md/proølox:0 >>> mdadm: failed to add /dev/sdd1 to /dev/md/proølox:0: Invalid argument >>> mdadm: /dev/md/proølox:0 assembled from 0 drives - not enough to start >>> the array. >> This looks like it is *after* trying the --create command you give >> below. It is best to report things in the order they happen, else you can >> confuse people (or get caught out!). > Yes, this was after. > >>> mdadm: looking for devices for further assembly >>> mdadm: no recogniseable superblock on /dev/sdd >>> mdadm: No arrays found in config file or automatically >>> >>> pvscan and vgscan show nothing. >>> >>> So I tried running mdadm --create --verbose /dev/md0 --metadata=1.1 >>> --level=5 --raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1 >>> /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64 >>> >>> as it seemed that /dev/sdd1 failed to be added to the array. This did >>> nothing. >> It did not do nothing. It wrote a superblock to /dev/sdd1 and complained >> that it couldn't write to all the others --- didn't it? > There were multiple attempts to create. One wrote to just sdd1, another succeeded with all but sdd1. > >>> dmesg contains: >>> >>> md: invalid superblock checksum on sdd1 >> I guess that is why sdd1 was missing from 'ubuntu:0'. Though as I cannot >> tell if this happened before or after any of the various things reported >> above, it is hard to be sure. >> >> >> The real mystery is why 'pvscan' reports nothing. > The original array was created with mdadm v2.6.7, and had a data offset of 264 sectors. After Simon's various attempts to --create, he ended up with data offset of 2048, using mdadm v3.1.4. The mdadm -E reports he posted to the list showed the 264 offset. We didn't realize the offset had been updated until somewhat later in our troubleshooting efforts. > > In any case, pvscan couldn't see the LVM signature because it wasn't there (at offset 2048). > >> What about >> pvscan --verbose >> >> or >> >> blkid -p /dev/md/ubuntu:0 >> >> or even >> >> dd if=/dev/md/ubuntu:0 count=8 | od -c > Fortunately, Simon did have a copy of his LVM configuration. With the help of dd, strings, and grep, we did locate his LVM sig at the correct location on sdd1 (for data offset 264). After a number of attempts to bypass LVM and access his single LV with dmsetup (based on his backed up configuration, on the assembled new array less sdd1), I realized that the data offset was wrong on the recreated array, and went looking for the cause. I found your git commit that changed that logic last spring, and recommended that Simon revert to the default package for his ubuntu install, which is v2.6.7. > > Simon has now attempted to recreate the array with v2.6.7, but the controller is throwing too many errors to succeed, and I suggested it was too flakey to trust any further. Based on the existence of the LVM sig on sdd1, I believe Simon's data is (mostly) intact, and only needs a successful create operation with a properly functioning controller. (He might also need to perform an lvm vgcfgrestore, but he has the necessary backup file.) > > A new controller is on order. 
> > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
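The dd/strings/grep hunt Phil summarizes above can be sketched like this, assuming the PV really does start at the old 264-sector data offset (LVM2 stamps an ASCII "LABELONE" label in the first few sectors of every PV):

# read a handful of sectors starting at the suspected md data offset (264)
dd if=/dev/sdd1 bs=512 skip=264 count=8 2>/dev/null | strings | grep LABELONE
# a hit means the old array's contents, LVM metadata included,
# still begin 264 sectors into the member partition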
* Re: Linux software RAID assistance 2011-02-16 13:51 ` Simon McNair @ 2011-02-16 14:37 ` Phil Turmel 2011-02-16 17:49 ` Simon McNair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 14:37 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid Good morning, Simon, On 02/16/2011 08:51 AM, Simon McNair wrote: > latest update. A bit long I'm afraid... > > hi, card installed and devices plugged in. Now have > sda sda2 sdb1 sdc1 sde sdf1 sdg1 sdi sdj1 sdk1 sdl1 sdm1 > sda1 sdb sdc sdd sdf sdg sdh sdj sdk sdl sdm Hmmm. > proxmox:/home/simon# ./lsdrv.sh > Controller device @ pci0000:00/0000:00:01.0/0000:01:00.0 [mvsas] > SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 Sys tem Controller, Revision B (rev 01) > host4: /dev/sdf ATA Hitachi HDS72101 {SN: GTA000PAGABXRA} > host4: /dev/sdg ATA Hitachi HDS72101 {SN: GTA000PAGAA5DA} > host4: /dev/sdh ATA Hitachi HDS72101 {SN: GTA000PAG9NL9A} > host4: /dev/sdi ATA Hitachi HDS72101 {SN: GTA000PAGA8V4A} > host4: /dev/sdj ATA Hitachi HDS72101 {SN: GTD000PAGMT9GD} > host4: /dev/sdk ATA Hitachi HDS72101 {SN: GTG000PAG18BJC} > host4: /dev/sdl ATA Hitachi HDS72101 {SN: GTG000PAG1DPLC} > host4: /dev/sdm ATA Hitachi HDS72101 {SN: GTA000PAG7WMEA} > Controller device @ pci0000:00/0000:00:1a.7/usb1/1-4/1-4.1/1-4.1.1/1-4.1.1:1.0 [ usb-storage] > Bus 001 Device 007: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reade r {SN: 08050920003A} > host9: /dev/sdd Generic Flash HS-CF > host9: /dev/sde Generic Flash HS-COMBO > Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.0 [ahci] > SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) > host7: [Empty] > host8: [Empty] > Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.1 [pata_jmicron] > IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (r ev 03) > host5: [Empty] > host6: [Empty] > Controller device @ pci0000:00/0000:00:1f.2 [ata_piix] > IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Contro ller #1 > host0: /dev/sda ATA STM3500418AS {SN: 9VM3QJ5C} > host1: /dev/sr0 Optiarc DVD RW AD-5240S > Controller device @ pci0000:00/0000:00:1f.5 [ata_piix] > IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Contro ller #2 > host2: /dev/sdb ATA Hitachi HDS72101 {SN: GTD000PAGMT8DD} > host3: /dev/sdc ATA Hitachi HDS72101 {SN: GTG000PAG04V0C} Good thing we recorded the serial numbers. From an earlier run of "lsdrv": > Phil, > sg3-utils did the job :-) > Sorry for doubting you. > > host6: /dev/sdd AMCC 9500S-12 DISK {SN: PAGA8V4A3B8378002254} > host6: /dev/sde AMCC 9500S-12 DISK {SN: PAG9NL9A3B8387004C88} > host6: /dev/sdf AMCC 9500S-12 DISK {SN: PAGAA5DA3B8396001AE0} > host6: /dev/sdg AMCC 9500S-12 DISK {SN: PAGABXRA3B83A0005EFB} > host6: /dev/sdh AMCC 9500S-12 DISK {SN: PAG7WMEA3B83AA003E4F} > host6: /dev/sdi AMCC 9500S-12 DISK {SN: PAG1DPLC3B83B40026E6} > host6: /dev/sdj AMCC 9500S-12 DISK {SN: PAG18BJC3B83C3004760} > host6: /dev/sdk AMCC 9500S-12 DISK {SN: PAGMT9GD3B83CD004B18} > host6: /dev/sdl AMCC 9500S-12 DISK {SN: PAGMT8DD3B83D70021FF} > host6: /dev/sdm AMCC 9500S-12 DISK {SN: PAG04V0C3B83E100ACA0} > > Simon I don't know why the serial numbers are formatted differently, but we can still tell them apart (the eight characters starting with "PAG"). So, our device order in your new setup is: [ihgfmlkjbc], where /dev/sdi corresponds to the original report's /dev/sdd, which matches the sig grep in your other note. 
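(As a cross-check, udev's persistent symlinks usually embed the same serial numbers, so the mapping can be re-derived at any time; exact link names depend on the distro's udev rules, and smartctl comes from smartmontools:)

# stable names that include model and serial, pointing at the current sdX node
ls -l /dev/disk/by-id/ | grep -v part
# or ask a single drive directly
smartctl -i /dev/sdf | grep -i serial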
Another note: The controller for sd[abc] is still showing ata_piix as its controller. That means you cannot hot-plug those ports. If you change your BIOS to AHCI mode instead of "Compatibility" or "Emulation", the full-featured ahci driver will run those ports. Not urgent, but I highly recommend it. > proxmox:/home/simon# parted -l > Model: ATA STM3500418AS (scsi) > Disk /dev/sda: 500GB > Sector size (logical/physical): 512B/512B > Partition Table: msdos > > Number Start End Size Type File system Flags > 1 32.8kB 537MB 537MB primary ext3 boot > 2 537MB 500GB 500GB primary lvm > > > Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating > system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? > Fix/Cancel? c The 3ware controller must have reserved some space at the end of each drive for its own use. Didn't know it'd do that. You will have to fix that. [trim /] > Error: /dev/sdh: unrecognised disk label > > Error: /dev/sdi: unrecognised disk label It seems the flaky controller took out these partition tables. Hope that's all it got. They'll have to be re-created with parted. Please run parted on each of the ten drives, and make sure they end up like so: > proxmox:/home/simon# for x in sd{d..m} ; do parted -s /dev/$x unit s > print ; done > Model: AMCC 9500S-12 DISK (scsi) > Disk /dev/sdd: 1953103872s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid Make sure you request "unit s" before your other commands so we can make sure it matches. When you think you are done, check them all: for x in /dev/sd{i,h,g,f,m,l,k,j,b,c} ; do parted -s $x unit s print ; done After that, create: mdadm --create --verbose --assume-clean /dev/md0 --metadata=1.1 --level=5 --raid-devices=10 /dev/sd{i,h,g,f,m,l,k,j,b,c}1 --chunk=64 And finally: pvscan --verbose Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
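Since a silently changed data offset is what sank the first recovery attempt, a quick sanity check right after the --create, before touching LVM, is cheap insurance; a sketch (the 264-sector figure assumes the create is done with mdadm v2.6.7, like the original array):

# the new superblock should report the old 264-sector data offset, not 2048
mdadm -E /dev/sdi1 | egrep 'Data Offset|Array UUID|Device Role'
# and the array should be up with all ten members
mdadm --detail /dev/md0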
* Re: Linux software RAID assistance 2011-02-16 14:37 ` Phil Turmel @ 2011-02-16 17:49 ` Simon McNair 2011-02-16 18:14 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 17:49 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid Hi Phil, A couple of questions please. On 16/02/2011 14:37, Phil Turmel wrote: > Good morning, Simon, > > On 02/16/2011 08:51 AM, Simon McNair wrote: >> latest update. A bit long I'm afraid... >> >> hi, card installed and devices plugged in. Now have >> sda sda2 sdb1 sdc1 sde sdf1 sdg1 sdi sdj1 sdk1 sdl1 sdm1 >> sda1 sdb sdc sdd sdf sdg sdh sdj sdk sdl sdm > Hmmm. > >> proxmox:/home/simon# ./lsdrv.sh >> Controller device @ pci0000:00/0000:00:01.0/0000:01:00.0 [mvsas] >> SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 Sys tem Controller, Revision B (rev 01) >> host4: /dev/sdf ATA Hitachi HDS72101 {SN: GTA000PAGABXRA} >> host4: /dev/sdg ATA Hitachi HDS72101 {SN: GTA000PAGAA5DA} >> host4: /dev/sdh ATA Hitachi HDS72101 {SN: GTA000PAG9NL9A} >> host4: /dev/sdi ATA Hitachi HDS72101 {SN: GTA000PAGA8V4A} >> host4: /dev/sdj ATA Hitachi HDS72101 {SN: GTD000PAGMT9GD} >> host4: /dev/sdk ATA Hitachi HDS72101 {SN: GTG000PAG18BJC} >> host4: /dev/sdl ATA Hitachi HDS72101 {SN: GTG000PAG1DPLC} >> host4: /dev/sdm ATA Hitachi HDS72101 {SN: GTA000PAG7WMEA} >> Controller device @ pci0000:00/0000:00:1a.7/usb1/1-4/1-4.1/1-4.1.1/1-4.1.1:1.0 [ usb-storage] >> Bus 001 Device 007: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reade r {SN: 08050920003A} >> host9: /dev/sdd Generic Flash HS-CF >> host9: /dev/sde Generic Flash HS-COMBO >> Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.0 [ahci] >> SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) >> host7: [Empty] >> host8: [Empty] >> Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.1 [pata_jmicron] >> IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (r ev 03) >> host5: [Empty] >> host6: [Empty] >> Controller device @ pci0000:00/0000:00:1f.2 [ata_piix] >> IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Contro ller #1 >> host0: /dev/sda ATA STM3500418AS {SN: 9VM3QJ5C} >> host1: /dev/sr0 Optiarc DVD RW AD-5240S >> Controller device @ pci0000:00/0000:00:1f.5 [ata_piix] >> IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Contro ller #2 >> host2: /dev/sdb ATA Hitachi HDS72101 {SN: GTD000PAGMT8DD} >> host3: /dev/sdc ATA Hitachi HDS72101 {SN: GTG000PAG04V0C} > Good thing we recorded the serial numbers. From an earlier run of "lsdrv": > >> Phil, >> sg3-utils did the job :-) >> Sorry for doubting you. >> >> host6: /dev/sdd AMCC 9500S-12 DISK {SN: PAGA8V4A3B8378002254} >> host6: /dev/sde AMCC 9500S-12 DISK {SN: PAG9NL9A3B8387004C88} >> host6: /dev/sdf AMCC 9500S-12 DISK {SN: PAGAA5DA3B8396001AE0} >> host6: /dev/sdg AMCC 9500S-12 DISK {SN: PAGABXRA3B83A0005EFB} >> host6: /dev/sdh AMCC 9500S-12 DISK {SN: PAG7WMEA3B83AA003E4F} >> host6: /dev/sdi AMCC 9500S-12 DISK {SN: PAG1DPLC3B83B40026E6} >> host6: /dev/sdj AMCC 9500S-12 DISK {SN: PAG18BJC3B83C3004760} >> host6: /dev/sdk AMCC 9500S-12 DISK {SN: PAGMT9GD3B83CD004B18} >> host6: /dev/sdl AMCC 9500S-12 DISK {SN: PAGMT8DD3B83D70021FF} >> host6: /dev/sdm AMCC 9500S-12 DISK {SN: PAG04V0C3B83E100ACA0} >> >> Simon > I don't know why the serial numbers are formatted differently, but we can still tell them apart (the eight characters starting with "PAG"). 
> > So, our device order in your new setup is: [ihgfmlkjbc], where /dev/sdi corresponds to the original report's /dev/sdd, which matches the sig grep in your other note. > > Another note: The controller for sd[abc] is still showing ata_piix as its controller. That means you cannot hot-plug those ports. If you change your BIOS to AHCI mode instead of "Compatibility" or "Emulation", the full-featured ahci driver will run those ports. Not urgent, but I highly recommend it. > Will do that now, before I forget >> proxmox:/home/simon# parted -l >> Model: ATA STM3500418AS (scsi) >> Disk /dev/sda: 500GB >> Sector size (logical/physical): 512B/512B >> Partition Table: msdos >> >> Number Start End Size Type File system Flags >> 1 32.8kB 537MB 537MB primary ext3 boot >> 2 537MB 500GB 500GB primary lvm >> >> >> Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating >> system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? >> Fix/Cancel? c > The 3ware controller must have reserved some space at the end of each drive for its own use. Didn't know it'd do that. You will have to fix that. > > [trim /] > Do you have any suggestions on how I can fix that ? I don't have a clue >> Error: /dev/sdh: unrecognised disk label >> >> Error: /dev/sdi: unrecognised disk label > It seems the flaky controller took out these partition tables. Hope that's all it got. They'll have to be re-created with parted. > > Please run parted on each of the ten drives, and make sure they end up like so: > >> proxmox:/home/simon# for x in sd{d..m} ; do parted -s /dev/$x unit s >> print ; done >> Model: AMCC 9500S-12 DISK (scsi) >> Disk /dev/sdd: 1953103872s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid > Make sure you request "unit s" before your other commands so we can make sure it matches. > > When you think you are done, check them all: > when I was trying to figure out the command for this using 'man parted' I came across this: " rescue start end Rescue a lost partition that was located somewhere between start and end. If a partition is found, parted will ask if you want to create an entry for it in the partition table." Is it worth trying ? I originally created the partitions like so: parted -s /dev/sdb rm 1 parted -s /dev/sdb mklabel gpt parted -s --align optimal /dev/sdb mkpart primary ext4 .512 100% parted -s /dev/sdb set 1 raid on parted -s /dev/sdb align-check optimal 1 so to recreate the above I would do: parted -s /dev/sdb mkpart primary ext4 2048s 1953101823s parted -s /dev/sdc mkpart primary ext4 2048s 1953101823s parted -s /dev/sdf mkpart primary ext4 2048s 1953101823s parted -s /dev/sdg mkpart primary ext4 2048s 1953101823s parted -s /dev/sdh mkpart primary ext4 2048s 1953101823s parted -s /dev/sdi mkpart primary ext4 2048s 1953101823s parted -s /dev/sdj mkpart primary ext4 2048s 1953101823s parted -s /dev/sdk mkpart primary ext4 2048s 1953101823s parted -s /dev/sdl mkpart primary ext4 2048s 1953101823s parted -s /dev/sdm mkpart primary ext4 2048s 1953101823s I'm guessing the backups that I want to do can wait until any potential fsck ? sorry if the questions are dumb but I'm not sure what I'm doing and I'd rather ask more questions than fewer and understand the implications of what I'm doing. 
thanks Simon > for x in /dev/sd{i,h,g,f,m,l,k,j,b,c} ; do parted -s $x unit s print ; done > > After that, create: > > mdadm --create --verbose --assume-clean /dev/md0 --metadata=1.1 --level=5 --raid-devices=10 /dev/sd{i,h,g,f,m,l,k,j,b,c}1 --chunk=64 > > And finally: > > pvscan --verbose > > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 17:49 ` Simon McNair @ 2011-02-16 18:14 ` Phil Turmel 2011-02-16 18:18 ` Simon McNair 2011-02-19 8:49 ` Simon Mcnair 0 siblings, 2 replies; 64+ messages in thread From: Phil Turmel @ 2011-02-16 18:14 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 12:49 PM, Simon McNair wrote: > Hi Phil, > A couple of questions please. [trim /] >>> Simon >> I don't know why the serial numbers are formatted differently, but we can still tell them apart (the eight characters starting with "PAG"). >> >> So, our device order in your new setup is: [ihgfmlkjbc], where /dev/sdi corresponds to the original report's /dev/sdd, which matches the sig grep in your other note. >> >> Another note: The controller for sd[abc] is still showing ata_piix as its controller. That means you cannot hot-plug those ports. If you change your BIOS to AHCI mode instead of "Compatibility" or "Emulation", the full-featured ahci driver will run those ports. Not urgent, but I highly recommend it. >> > Will do that now, before I forget Hot-pluggability with suitable trays is very handy! :) [trim /] >>> Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating >>> system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? >>> Fix/Cancel? c >> The 3ware controller must have reserved some space at the end of each drive for its own use. Didn't know it'd do that. You will have to fix that. >> >> [trim /] >> > Do you have any suggestions on how I can fix that ? I don't have a clue Just do 'parted /dev/sd?' and on the ones it offers to fix, say yes. Then request 'unit s' and 'print' to verify that it is correct. [trim /] > when I was trying to figure out the command for this using 'man parted' I came across this: > " rescue start end > Rescue a lost partition that was located somewhere between start and end. If a partition is > found, parted will ask if you want to create an entry for it in the partition table." > Is it worth trying ? Nah. That's for when you don't know exactly where the partition is. We know. > I originally created the partitions like so: > parted -s /dev/sdb rm 1 > parted -s /dev/sdb mklabel gpt > parted -s --align optimal /dev/sdb mkpart primary ext4 .512 100% > parted -s /dev/sdb set 1 raid on > parted -s /dev/sdb align-check optimal 1 > > so to recreate the above I would do: > parted -s /dev/sdb mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdc mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdf mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdg mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdh mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdi mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdj mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdk mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdl mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdm mkpart primary ext4 2048s 1953101823s Only recreate the partition tables where you have to, i.e., the 'Fix' option above didn't work. And don't specify a filesystem. Probably just /dev/sdh and /dev/sdi. Like so, though: parted -s /dev/sdh mklabel gpt mkpart primary 2048s 1953101823s set 1 raid on parted -s /dev/sdi mklabel gpt mkpart primary 2048s 1953101823s set 1 raid on > I'm guessing the backups that I want to do can wait until any potential fsck ? 
Do an 'fsck -N' first, and if it passes, or has few errors, mount the filesystem readonly and grab your backup. Then let fsck have at it for real. If anything gets fixed, compare your backup from the read-only fs to the fixed fs. Given your flaky old controller, I expect there'll be *some* problems. > sorry if the questions are dumb but I'm not sure what I'm doing and I'd rather ask more questions than fewer and understand the implications of what I'm doing. Oh, no. You are right to be paranoid. If anything looks funny, stop. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
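A sketch of that order of operations; the mount point and rsync target are placeholders:

fsck -N /dev/lvm-raid/RAID          # dry run: only shows what would be checked
fsck.ext4 -n /dev/lvm-raid/RAID     # read-only pass, answers 'no' to every fix
mkdir -p /mnt/raid
mount -o ro /dev/lvm-raid/RAID /mnt/raid
rsync -a /mnt/raid/ /backup/raid/   # grab the backup before any repairs
umount /mnt/raid
fsck.ext4 -f /dev/lvm-raid/RAID     # only now let fsck fix things for real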
* Re: Linux software RAID assistance 2011-02-16 18:14 ` Phil Turmel @ 2011-02-16 18:18 ` Simon McNair 2011-02-16 18:22 ` Phil Turmel 2011-02-19 8:49 ` Simon Mcnair 1 sibling, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 18:18 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid parted -l gives Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? fix Warning: Not all of the space available to /dev/sdb appears to be used, you can fix the GPT to use all of the space (an extra 421296 blocks) or continue with the current setting? Fix/Ignore? I'm guessing ignore ? Simon On 16/02/2011 18:14, Phil Turmel wrote: > On 02/16/2011 12:49 PM, Simon McNair wrote: >> Hi Phil, >> A couple of questions please. > [trim /] > >>>> Simon >>> I don't know why the serial numbers are formatted differently, but we can still tell them apart (the eight characters starting with "PAG"). >>> >>> So, our device order in your new setup is: [ihgfmlkjbc], where /dev/sdi corresponds to the original report's /dev/sdd, which matches the sig grep in your other note. >>> >>> Another note: The controller for sd[abc] is still showing ata_piix as its controller. That means you cannot hot-plug those ports. If you change your BIOS to AHCI mode instead of "Compatibility" or "Emulation", the full-featured ahci driver will run those ports. Not urgent, but I highly recommend it. >>> >> Will do that now, before I forget > Hot-pluggability with suitable trays is very handy! :) > > [trim /] > >>>> Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating >>>> system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? >>>> Fix/Cancel? c >>> The 3ware controller must have reserved some space at the end of each drive for its own use. Didn't know it'd do that. You will have to fix that. >>> >>> [trim /] >>> >> Do you have any suggestions on how I can fix that ? I don't have a clue > Just do 'parted /dev/sd?' and on the ones it offers to fix, say yes. Then request 'unit s' and 'print' to verify that it is correct. > > [trim /] > >> when I was trying to figure out the command for this using 'man parted' I came across this: >> " rescue start end >> Rescue a lost partition that was located somewhere between start and end. If a partition is >> found, parted will ask if you want to create an entry for it in the partition table." >> Is it worth trying ? > Nah. That's for when you don't know exactly where the partition is. We know. 
> >> I originally created the partitions like so: >> parted -s /dev/sdb rm 1 >> parted -s /dev/sdb mklabel gpt >> parted -s --align optimal /dev/sdb mkpart primary ext4 .512 100% >> parted -s /dev/sdb set 1 raid on >> parted -s /dev/sdb align-check optimal 1 >> >> so to recreate the above I would do: >> parted -s /dev/sdb mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdc mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdf mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdg mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdh mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdi mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdj mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdk mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdl mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdm mkpart primary ext4 2048s 1953101823s > Only recreate the partition tables where you have to, i.e., the 'Fix' option above didn't work. And don't specify a filesystem. > > Probably just /dev/sdh and /dev/sdi. Like so, though: > > parted -s /dev/sdh mklabel gpt mkpart primary 2048s 1953101823s set 1 raid on > parted -s /dev/sdi mklabel gpt mkpart primary 2048s 1953101823s set 1 raid on > >> I'm guessing the backups that I want to do can wait until any potential fsck ? > Do an 'fsck -N' first, and if it passes, or has few errors, mount the filesystem readonly and grab your backup. Then let fsck have at it for real. If anything gets fixed, compare your backup from the read-only fs to the fixed fs. > > Given your flaky old controller, I expect there'll be *some* problems. > >> sorry if the questions are dumb but I'm not sure what I'm doing and I'd rather ask more questions than fewer and understand the implications of what I'm doing. > Oh, no. You are right to be paranoid. If anything looks funny, stop. > > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 18:18 ` Simon McNair @ 2011-02-16 18:22 ` Phil Turmel 2011-02-16 18:25 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 18:22 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 01:18 PM, Simon McNair wrote: > parted -l gives > > Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating > system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? > Fix/Cancel? fix > Warning: Not all of the space available to /dev/sdb appears to be used, you can fix the GPT to use all of the space (an > extra 421296 blocks) or continue with the current setting? > Fix/Ignore? > > I'm guessing ignore ? Either should work. If you 'Fix' for all of the drives, you could later 'grow' your array to include the extra space. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
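For reference, the later 'grow' would look roughly like this; a sketch only, to be run on a healthy array, one member at a time, with /dev/sdX standing in for each member (the partition start must stay at 2048s):

# recreate the member partition with the same start and a larger end
parted -s /dev/sdX rm 1
parted -s /dev/sdX mkpart primary 2048s 100%
parted -s /dev/sdX set 1 raid on
# then grow the layers above it
mdadm --grow /dev/md0 --size=max
pvresize /dev/md0
lvextend -l +100%FREE /dev/lvm-raid/RAID
resize2fs /dev/lvm-raid/RAID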
* Re: Linux software RAID assistance 2011-02-16 18:22 ` Phil Turmel @ 2011-02-16 18:25 ` Phil Turmel 2011-02-16 18:52 ` Simon McNair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 18:25 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 01:22 PM, Phil Turmel wrote: > On 02/16/2011 01:18 PM, Simon McNair wrote: >> parted -l gives >> >> Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating >> system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? >> Fix/Cancel? fix >> Warning: Not all of the space available to /dev/sdb appears to be used, you can fix the GPT to use all of the space (an >> extra 421296 blocks) or continue with the current setting? >> Fix/Ignore? >> >> I'm guessing ignore ? > > Either should work. If you 'Fix' for all of the drives, you could later 'grow' your array to include the extra space. It's vital that the start sector be 2048, though. Please do recheck after all are done. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
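A sketch of a quick recheck, using the device letters from the earlier loop (adjust if they shuffle again):

for x in /dev/sd{i,h,g,f,m,l,k,j,b,c} ; do
  parted -s $x unit s print | awk -v d=$x '$1 == 1 && $2 != "2048s" { print d ": starts at " $2 }'
done
# no output means every partition starts at sector 2048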
* Re: Linux software RAID assistance 2011-02-16 18:25 ` Phil Turmel @ 2011-02-16 18:52 ` Simon McNair 2011-02-16 18:57 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 18:52 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid have done this.... Results are: proxmox:/home/simon# ./lsdrv.sh Controller device @ pci0000:00/0000:00:01.0/0000:01:00.0 [mvsas] SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01) host0: /dev/sdf ATA Hitachi HDS72101 {SN: GTA000PAGABXRA} host0: /dev/sdg ATA Hitachi HDS72101 {SN: GTA000PAGAA5DA} host0: /dev/sdh ATA Hitachi HDS72101 {SN: GTA000PAG9NL9A} host0: /dev/sdi ATA Hitachi HDS72101 {SN: GTA000PAGA8V4A} host0: /dev/sdj ATA Hitachi HDS72101 {SN: GTD000PAGMT9GD} host0: /dev/sdk ATA Hitachi HDS72101 {SN: GTG000PAG18BJC} host0: /dev/sdl ATA Hitachi HDS72101 {SN: GTG000PAG1DPLC} host0: /dev/sdm ATA Hitachi HDS72101 {SN: GTA000PAG7WMEA} Controller device @ pci0000:00/0000:00:1a.7/usb1/1-4/1-4.1/1-4.1.1/1-4.1.1:1.0 [usb-storage] Bus 001 Device 007: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reader {SN: 08050920003A} host11: /dev/sdb Generic Flash HS-CF host11: /dev/sdc Generic Flash HS-COMBO Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.0 [ahci] SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) host9: [Empty] host10: [Empty] Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.1 [pata_jmicron] IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) host1: [Empty] host2: [Empty] Controller device @ pci0000:00/0000:00:1f.2 [ahci] SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller host3: /dev/sda ATA STM3500418AS {SN: 9VM3QJ5C} host4: /dev/sr0 Optiarc DVD RW AD-5240S host5: [Empty] host6: [Empty] host7: /dev/sdd ATA Hitachi HDS72101 {SN: GTD000PAGMT8DD} host8: /dev/sde ATA Hitachi HDS72101 {SN: GTG000PAG04V0C} enabling ahci has switched the drive letters around for a couple of drives again. parted list is: proxmox:/home/simon# for x in /dev/sd{i,h,g,f,m,l,k,j,d,e} ; do parted -s $x unit s print ; done Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdi: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdh: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdg: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdf: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdm: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdl: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdk: 1953525168s Sector size 
(logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdj: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdd: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sde: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid is it now: mdadm --create --verbose --assume-clean /dev/md0 --metadata=1.1 --level=5 --raid-devices=10 /dev/sd{i,h,g,f,m,l,k,j,d,e}1 --chunk=64 ? regards Simon On 16/02/2011 18:25, Phil Turmel wrote: > On 02/16/2011 01:22 PM, Phil Turmel wrote: >> On 02/16/2011 01:18 PM, Simon McNair wrote: >>> parted -l gives >>> >>> Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating >>> system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? >>> Fix/Cancel? fix >>> Warning: Not all of the space available to /dev/sdb appears to be used, you can fix the GPT to use all of the space (an >>> extra 421296 blocks) or continue with the current setting? >>> Fix/Ignore? >>> >>> I'm guessing ignore ? >> Either should work. If you 'Fix' for all of the drives, you could later 'grow' your array to include the extra space. > It's vital that the start sector be 2048, though. Please do recheck after all are done. > > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 18:52 ` Simon McNair @ 2011-02-16 18:57 ` Phil Turmel 2011-02-16 19:07 ` Simon McNair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 18:57 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 01:52 PM, Simon McNair wrote: > have done this.... Results are: > > proxmox:/home/simon# ./lsdrv.sh > Controller device @ pci0000:00/0000:00:01.0/0000:01:00.0 [mvsas] > SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01) > host0: /dev/sdf ATA Hitachi HDS72101 {SN: GTA000PAGABXRA} > host0: /dev/sdg ATA Hitachi HDS72101 {SN: GTA000PAGAA5DA} > host0: /dev/sdh ATA Hitachi HDS72101 {SN: GTA000PAG9NL9A} > host0: /dev/sdi ATA Hitachi HDS72101 {SN: GTA000PAGA8V4A} > host0: /dev/sdj ATA Hitachi HDS72101 {SN: GTD000PAGMT9GD} > host0: /dev/sdk ATA Hitachi HDS72101 {SN: GTG000PAG18BJC} > host0: /dev/sdl ATA Hitachi HDS72101 {SN: GTG000PAG1DPLC} > host0: /dev/sdm ATA Hitachi HDS72101 {SN: GTA000PAG7WMEA} > Controller device @ pci0000:00/0000:00:1a.7/usb1/1-4/1-4.1/1-4.1.1/1-4.1.1:1.0 [usb-storage] > Bus 001 Device 007: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reader {SN: 08050920003A} > host11: /dev/sdb Generic Flash HS-CF > host11: /dev/sdc Generic Flash HS-COMBO > Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.0 [ahci] > SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) > host9: [Empty] > host10: [Empty] > Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.1 [pata_jmicron] > IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) > host1: [Empty] > host2: [Empty] > Controller device @ pci0000:00/0000:00:1f.2 [ahci] > SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller > host3: /dev/sda ATA STM3500418AS {SN: 9VM3QJ5C} > host4: /dev/sr0 Optiarc DVD RW AD-5240S > host5: [Empty] > host6: [Empty] > host7: /dev/sdd ATA Hitachi HDS72101 {SN: GTD000PAGMT8DD} > host8: /dev/sde ATA Hitachi HDS72101 {SN: GTG000PAG04V0C} > > enabling achi has switched the drive letters around for a couple of drives again > > parted list is: > > proxmox:/home/simon# for x in /dev/sd{i,h,g,f,m,l,k,j,d,e} ; do parted -s $x unit s print ; done > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdi: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdh: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdg: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdf: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdm: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdl: 1953525168s > Sector size 
(logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdk: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdj: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdd: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sde: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid All looks good. > is it now: > > mdadm --create --verbose --assume-clean /dev/md0 --metadata=1.1 --level=5 --raid-devices=10 /dev/sd{i,h,g,f,m,l,k,j,d,e}1 --chunk=64 Yes. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 18:57 ` Phil Turmel @ 2011-02-16 19:07 ` Simon McNair 2011-02-16 19:10 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 19:07 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid pvscan is: proxmox:/home/simon# pvscan --verbose Wiping cache of LVM-capable devices Wiping internal VG cache Walking through all physical volumes PV /dev/sda2 VG pve lvm2 [465.26 GB / 4.00 GB free] PV /dev/md0 VG lvm-raid lvm2 [8.19 TB / 0 free] Total: 2 [655.05 GB] / in use: 2 [655.05 GB] / in no VG: 0 [0 ] lvm-raid is there .... proxmox:/home/simon# fsck.ext4 -n /dev/md0 e2fsck 1.41.3 (12-Oct-2008) fsck.ext4: Superblock invalid, trying backup blocks... Superblock has an invalid ext3 journal (inode 8). Clear? no fsck.ext4: Illegal inode number while checking ext3 journal for /dev/md0 I thought I needed to run fsck against the lvm group ? I used to do... fsck.ext4 /dev/lvm-raid/RAID so do I need to mount it or something ? cheers Simon On 16/02/2011 18:57, Phil Turmel wrote: > On 02/16/2011 01:52 PM, Simon McNair wrote: >> have done this.... Results are: >> >> proxmox:/home/simon# ./lsdrv.sh >> Controller device @ pci0000:00/0000:00:01.0/0000:01:00.0 [mvsas] >> SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01) >> host0: /dev/sdf ATA Hitachi HDS72101 {SN: GTA000PAGABXRA} >> host0: /dev/sdg ATA Hitachi HDS72101 {SN: GTA000PAGAA5DA} >> host0: /dev/sdh ATA Hitachi HDS72101 {SN: GTA000PAG9NL9A} >> host0: /dev/sdi ATA Hitachi HDS72101 {SN: GTA000PAGA8V4A} >> host0: /dev/sdj ATA Hitachi HDS72101 {SN: GTD000PAGMT9GD} >> host0: /dev/sdk ATA Hitachi HDS72101 {SN: GTG000PAG18BJC} >> host0: /dev/sdl ATA Hitachi HDS72101 {SN: GTG000PAG1DPLC} >> host0: /dev/sdm ATA Hitachi HDS72101 {SN: GTA000PAG7WMEA} >> Controller device @ pci0000:00/0000:00:1a.7/usb1/1-4/1-4.1/1-4.1.1/1-4.1.1:1.0 [usb-storage] >> Bus 001 Device 007: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reader {SN: 08050920003A} >> host11: /dev/sdb Generic Flash HS-CF >> host11: /dev/sdc Generic Flash HS-COMBO >> Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.0 [ahci] >> SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) >> host9: [Empty] >> host10: [Empty] >> Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.1 [pata_jmicron] >> IDE interface: JMicron Technology Corp. 
JMB362/JMB363 Serial ATA Controller (rev 03) >> host1: [Empty] >> host2: [Empty] >> Controller device @ pci0000:00/0000:00:1f.2 [ahci] >> SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller >> host3: /dev/sda ATA STM3500418AS {SN: 9VM3QJ5C} >> host4: /dev/sr0 Optiarc DVD RW AD-5240S >> host5: [Empty] >> host6: [Empty] >> host7: /dev/sdd ATA Hitachi HDS72101 {SN: GTD000PAGMT8DD} >> host8: /dev/sde ATA Hitachi HDS72101 {SN: GTG000PAG04V0C} >> >> enabling achi has switched the drive letters around for a couple of drives again >> >> parted list is: >> >> proxmox:/home/simon# for x in /dev/sd{i,h,g,f,m,l,k,j,d,e} ; do parted -s $x unit s print ; done >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdi: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdh: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdg: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdf: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdm: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdl: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdk: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdj: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdd: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sde: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid > All looks good. > >> is it now: >> >> mdadm --create --verbose --assume-clean /dev/md0 --metadata=1.1 --level=5 --raid-devices=10 /dev/sd{i,h,g,f,m,l,k,j,d,e}1 --chunk=64 > Yes. > > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 19:07 ` Simon McNair @ 2011-02-16 19:10 ` Phil Turmel 2011-02-16 19:15 ` Simon McNair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 19:10 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 02:07 PM, Simon McNair wrote: > pvscan is: > proxmox:/home/simon# pvscan --verbose > Wiping cache of LVM-capable devices > Wiping internal VG cache > Walking through all physical volumes > PV /dev/sda2 VG pve lvm2 [465.26 GB / 4.00 GB free] > PV /dev/md0 VG lvm-raid lvm2 [8.19 TB / 0 free] > Total: 2 [655.05 GB] / in use: 2 [655.05 GB] / in no VG: 0 [0 ] > > lvm-raid is there .... Very good. > proxmox:/home/simon# fsck.ext4 -n /dev/md0 Uh, no. > e2fsck 1.41.3 (12-Oct-2008) > fsck.ext4: Superblock invalid, trying backup blocks... > Superblock has an invalid ext3 journal (inode 8). > Clear? no > > fsck.ext4: Illegal inode number while checking ext3 journal for /dev/md0 > I thought I needed to run fsck against the lvm group ? I used to do... fsck.ext4 /dev/lvm-raid/RAID so do I need to mount it or something ? vgscan --verbose lvscan --verbose Then either: fsck -N /dev/mapper/lvm-raid-RAID or: fsck -N /dev/lvm-raid/RAID depends on what udev is doing. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
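The udev wrinkle: device-mapper escapes a hyphen inside a VG or LV name by doubling it, and uses a single hyphen to join VG to LV, so VG "lvm-raid" plus LV "RAID" shows up as lvm--raid-RAID. A sketch for finding the real node (names assume that VG/LV pair):

# list what device-mapper actually created
dmsetup ls
ls -l /dev/mapper/
fsck -N /dev/mapper/lvm--raid-RAID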
* Re: Linux software RAID assistance 2011-02-16 19:10 ` Phil Turmel @ 2011-02-16 19:15 ` Simon McNair 2011-02-16 19:36 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 19:15 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid proxmox:/home/simon# vgscan --verbose Wiping cache of LVM-capable devices Wiping internal VG cache Reading all physical volumes. This may take a while... Finding all volume groups Finding volume group "pve" Found volume group "pve" using metadata type lvm2 Finding volume group "lvm-raid" Found volume group "lvm-raid" using metadata type lvm2 proxmox:/home/simon# proxmox:/home/simon# lvscan --verbose Finding all logical volumes ACTIVE '/dev/pve/swap' [11.00 GB] inherit ACTIVE '/dev/pve/root' [96.00 GB] inherit ACTIVE '/dev/pve/data' [354.26 GB] inherit inactive '/dev/lvm-raid/RAID' [8.19 TB] inherit proxmox:/home/simon# vgchange -ay 3 logical volume(s) in volume group "pve" now active 1 logical volume(s) in volume group "lvm-raid" now active proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm-raid-RAID e2fsck 1.41.3 (12-Oct-2008) fsck.ext4: No such file or directory while trying to open /dev/mapper/lvm-raid-RAID The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 <device> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/ control lvm--raid-RAID pve-data pve-root pve-swap proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm--raid-RAID e2fsck 1.41.3 (12-Oct-2008) /dev/mapper/lvm--raid-RAID has unsupported feature(s): FEATURE_I31 e2fsck: Get a newer version of e2fsck! my version of e2fsck always worked before ? Simon On 16/02/2011 19:10, Phil Turmel wrote: > On 02/16/2011 02:07 PM, Simon McNair wrote: >> pvscan is: >> proxmox:/home/simon# pvscan --verbose >> Wiping cache of LVM-capable devices >> Wiping internal VG cache >> Walking through all physical volumes >> PV /dev/sda2 VG pve lvm2 [465.26 GB / 4.00 GB free] >> PV /dev/md0 VG lvm-raid lvm2 [8.19 TB / 0 free] >> Total: 2 [655.05 GB] / in use: 2 [655.05 GB] / in no VG: 0 [0 ] >> >> lvm-raid is there .... > Very good. > >> proxmox:/home/simon# fsck.ext4 -n /dev/md0 > Uh, no. > >> e2fsck 1.41.3 (12-Oct-2008) >> fsck.ext4: Superblock invalid, trying backup blocks... >> Superblock has an invalid ext3 journal (inode 8). >> Clear? no >> >> fsck.ext4: Illegal inode number while checking ext3 journal for /dev/md0 > >> I thought I needed to run fsck against the lvm group ? I used to do... fsck.ext4 /dev/lvm-raid/RAID so do I need to mount it or something ? > vgscan --verbose > > lvscan --verbose > > > Then either: > > fsck -N /dev/mapper/lvm-raid-RAID > > or: > > fsck -N /dev/lvm-raid/RAID > > depends on what udev is doing. > > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 19:15 ` Simon McNair @ 2011-02-16 19:36 ` Phil Turmel 2011-02-16 21:28 ` Simon McNair ` (2 more replies) 0 siblings, 3 replies; 64+ messages in thread From: Phil Turmel @ 2011-02-16 19:36 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 02:15 PM, Simon McNair wrote: > proxmox:/home/simon# vgscan --verbose > Wiping cache of LVM-capable devices > Wiping internal VG cache > Reading all physical volumes. This may take a while... > Finding all volume groups > Finding volume group "pve" > Found volume group "pve" using metadata type lvm2 > Finding volume group "lvm-raid" > Found volume group "lvm-raid" using metadata type lvm2 > proxmox:/home/simon# > proxmox:/home/simon# lvscan --verbose > Finding all logical volumes > ACTIVE '/dev/pve/swap' [11.00 GB] inherit > ACTIVE '/dev/pve/root' [96.00 GB] inherit > ACTIVE '/dev/pve/data' [354.26 GB] inherit > inactive '/dev/lvm-raid/RAID' [8.19 TB] inherit > > proxmox:/home/simon# vgchange -ay > 3 logical volume(s) in volume group "pve" now active > 1 logical volume(s) in volume group "lvm-raid" now active Heh. Figures. > proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm-raid-RAID Actually, I wanted you to try with a capital N. Lower case 'n' is similar, but not quite the same. > e2fsck 1.41.3 (12-Oct-2008) > fsck.ext4: No such file or directory while trying to open /dev/mapper/lvm-raid-RAID > > The superblock could not be read or does not describe a correct ext2 > filesystem. If the device is valid and it really contains an ext2 > filesystem (and not swap or ufs or something else), then the superblock > is corrupt, and you might try running e2fsck with an alternate superblock: > e2fsck -b 8193 <device> > > proxmox:/home/simon# fsck.ext4 -n /dev/mapper/ > control lvm--raid-RAID pve-data pve-root pve-swap Strange. I guess it does that to distinguish dashes in the VG name from dashes between VG and LV names. > proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm--raid-RAID > e2fsck 1.41.3 (12-Oct-2008) > /dev/mapper/lvm--raid-RAID has unsupported feature(s): FEATURE_I31 > e2fsck: Get a newer version of e2fsck! > > my version of e2fsck always worked before ? v1.41.14 was released 7 weeks ago. But, I suspect there's corruption in the superblock. Do you still have your disk images tucked away somewhere safe? If so, try: 1) The '-b' option to e2fsck. We need to experiment with '-n -b offset' to find the alternate superblock, trying 'offset' equal to 8193, 16384, and 32768, per the man-page. 2) A newer e2fsprogs. Finally, 3) mount -r /dev/lvm-raid/RAID /mnt/whatever Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
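One way to get trustworthy '-b' candidates instead of guessing: mke2fs -n does a dry run that prints where the backup superblocks would land for the given parameters, writing nothing. A sketch, assuming the original 4096-byte block size from the mkfs command quoted earlier in the thread:

# -n: do NOT create a filesystem, just report what would be done,
# including the backup superblock locations
mke2fs -n -b 4096 /dev/lvm-raid/RAID
# then probe a reported backup read-only, e.g. the first one for 4k blocks:
e2fsck -n -B 4096 -b 32768 /dev/lvm-raid/RAID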
* Re: Linux software RAID assistance 2011-02-16 19:36 ` Phil Turmel @ 2011-02-16 21:28 ` Simon McNair 2011-02-16 21:30 ` Phil Turmel 2011-02-18 9:31 ` Simon Mcnair [not found] ` <AANLkTi=RmR5nVnmFLuqK5anHc3WDPxjuYjitT6+5wAqS@mail.gmail.com> 2 siblings, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 21:28 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid Phil, Jeez I'm having a bad week (my windows 7x64 machine has just started randomly crashing, my Thecus n5200 is playing up and the weather has been dire so I've not been able to put my new shed up...oh and I have ongoing 'other' issues as you're well aware ;-)) The thecus n5200 has 5x2TB hdd's. I wiped out the existing raid5 array to create a jbod span of 10tb in order to hold my 9tb of backups. The Thecus has had a hissy fit and I've had to set the process off again, so you can bet it'll be a day or two before it get's the drives formatted (it's not a very powerful nas), then I'll do the backups, then I'll try as you suggested. Thanks for the ongoing assistance. Simon On 16/02/2011 19:36, Phil Turmel wrote: > On 02/16/2011 02:15 PM, Simon McNair wrote: >> proxmox:/home/simon# vgscan --verbose >> Wiping cache of LVM-capable devices >> Wiping internal VG cache >> Reading all physical volumes. This may take a while... >> Finding all volume groups >> Finding volume group "pve" >> Found volume group "pve" using metadata type lvm2 >> Finding volume group "lvm-raid" >> Found volume group "lvm-raid" using metadata type lvm2 >> proxmox:/home/simon# >> proxmox:/home/simon# lvscan --verbose >> Finding all logical volumes >> ACTIVE '/dev/pve/swap' [11.00 GB] inherit >> ACTIVE '/dev/pve/root' [96.00 GB] inherit >> ACTIVE '/dev/pve/data' [354.26 GB] inherit >> inactive '/dev/lvm-raid/RAID' [8.19 TB] inherit >> >> proxmox:/home/simon# vgchange -ay >> 3 logical volume(s) in volume group "pve" now active >> 1 logical volume(s) in volume group "lvm-raid" now active > Heh. Figures. > >> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm-raid-RAID > Actually, I wanted you to try with a capital N. Lower case 'n' is similar, but not quite the same. > >> e2fsck 1.41.3 (12-Oct-2008) >> fsck.ext4: No such file or directory while trying to open /dev/mapper/lvm-raid-RAID >> >> The superblock could not be read or does not describe a correct ext2 >> filesystem. If the device is valid and it really contains an ext2 >> filesystem (and not swap or ufs or something else), then the superblock >> is corrupt, and you might try running e2fsck with an alternate superblock: >> e2fsck -b 8193<device> >> >> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/ >> control lvm--raid-RAID pve-data pve-root pve-swap > Strange. I guess it does that to distinguish dashes in the VG name from dashes between VG and LV names. > >> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm--raid-RAID >> e2fsck 1.41.3 (12-Oct-2008) >> /dev/mapper/lvm--raid-RAID has unsupported feature(s): FEATURE_I31 >> e2fsck: Get a newer version of e2fsck! >> >> my version of e2fsck always worked before ? > v1.41.14 was release 7 weeks ago. But, I suspect there's corruption in the superblock. Do you still have your disk images tucked away somewhere safe? > > If so, try: > > 1) The '-b' option to e2fsck. We need to experiment with '-n -b offset' to find the alternate superblock. Trying 'offset' = to 8193, 16384, and 32768, per the man-page. > > 2) A newer e2fsprogs. 
> > Finally, > 3) mount -r /dev/lvm-raid/RAID /mnt/whatever > > Phil > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 21:28 ` Simon McNair @ 2011-02-16 21:30 ` Phil Turmel 2011-02-16 22:44 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 21:30 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 04:28 PM, Simon McNair wrote: > Phil, > Jeez I'm having a bad week (my windows 7x64 machine has just started randomly crashing, my Thecus n5200 is playing up and the weather has been dire so I've not been able to put my new shed up...oh and I have ongoing 'other' issues as you're well aware ;-)) The thecus n5200 has 5x2TB hdd's. I wiped out the existing raid5 array to create a jbod span of 10tb in order to hold my 9tb of backups. The Thecus has had a hissy fit and I've had to set the process off again, so you can bet it'll be a day or two before it get's the drives formatted (it's not a very powerful nas), then I'll do the backups, then I'll try as you suggested. That's fine. > > Thanks for the ongoing assistance. No problem. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 21:30 ` Phil Turmel @ 2011-02-16 22:44 ` Simon Mcnair 2011-02-16 23:39 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-16 22:44 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org I just went to bed and one last question popped in to my mind. Since there is a fair timezone gap I thought I'd be presumptuous and ask it in the hope I can turn it around a bit quicker in the morning. My suspicion is that once I have 5 formatted 2tb drives I may be lucky to get 10x 1tb dd images on to it. Can I feed the dd process in to tar, bzip2, zip or something else which will give me enough space to fit the images on ? Will I get more usable space from 5x2tb partitions or from 1xspanned volume ? (the thecus pretty much only allows you to create raid volumes so a jbod needs to be 5x1tb arrays or 1 spanned volume or stripe). TIA Simon On 16 Feb 2011, at 21:30, Phil Turmel <philip@turmel.org> wrote: > On 02/16/2011 04:28 PM, Simon McNair wrote: >> Phil, >> Jeez I'm having a bad week (my windows 7x64 machine has just started randomly crashing, my Thecus n5200 is playing up and the weather has been dire so I've not been able to put my new shed up...oh and I have ongoing 'other' issues as you're well aware ;-)) The thecus n5200 has 5x2TB hdd's. I wiped out the existing raid5 array to create a jbod span of 10tb in order to hold my 9tb of backups. The Thecus has had a hissy fit and I've had to set the process off again, so you can bet it'll be a day or two before it get's the drives formatted (it's not a very powerful nas), then I'll do the backups, then I'll try as you suggested. > > That's fine. >> >> Thanks for the ongoing assistance. > > No problem. > > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 22:44 ` Simon Mcnair @ 2011-02-16 23:39 ` Phil Turmel 2011-02-17 13:26 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 23:39 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/16/2011 05:44 PM, Simon Mcnair wrote: > I just went to bed and one last question popped in to my mind. Since > there is a fair timezone gap I thought I'd be presumptuous and ask it > in the hope I can turn it around a bit quicker in the morning. > > My suspicion is that once I have 5 formatted 2tb drives I may be lucky > to get 10x 1tb dd images on to it. Can I feed the dd process in to > tar, bzip2, zip or something else which will give me enough space to > fit the images on ? > > Will I get more usable space from 5x2tb partitions or from 1xspanned > volume ? (the thecus pretty much only allows you to create raid > volumes so a jbod needs to be 5x1tb arrays or 1 spanned volume or > stripe). I'd use one spanned volume, and gzip. I'd simultaneously generate an md5sum while streaming to your thecus. A script like so: #! /bin/bash # function usage() { printf "Usage:\n\t%s devname\n\n" "`basename \"$0\"`" printf "'devname' must be a relative path in /dev/ to the desired block device.\n" exit 1 } # Verify the supplied name is a device test -b "/dev/$1" || usage # Convert path separators and spaces into dashes outfile="`echo \"$1\" |sed -r -e 's:[ /]+:-:g'`" # Create a side-stream for computing the MD5 of the data read fifo=`mktemp -u` mkfifo $fifo || exit md5sum -b <$fifo >/mnt/thecus/$outfile.md5 & # Read the device and compress it dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >/mnt/thecus/$outfile.gz # Wait for the background task to close wait ^ permalink raw reply [flat|nested] 64+ messages in thread
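Editor's aside: the point of the MD5 side-stream in Phil's script is that it hashes the raw device data before compression, so an image can be verified later without re-reading the source disk. A sketch of that check, assuming the file naming the script produces (sdd used as a hypothetical example):

  stored=$(cut -d' ' -f1 /mnt/thecus/sdd.md5)
  actual=$(gunzip -c /mnt/thecus/sdd.gz | md5sum | cut -d' ' -f1)
  [ "$stored" = "$actual" ] && echo "sdd image OK" || echo "sdd image MISMATCH"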
* Re: Linux software RAID assistance 2011-02-16 23:39 ` Phil Turmel @ 2011-02-17 13:26 ` Simon Mcnair 2011-02-17 13:48 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-17 13:26 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org Phil, After a couple of attempts I realised that all I needed to do was ./backup.sh sdi rather than specifying ./backup.sh /dev/sdi schoolboy error huh, rtfm. I tried modifying the code, so I could kick them all off at once, but I wanted to check that this would work from a multithreaded/sane perspective (yes I know it's a bit of IO, but iotop seems to infer I'm only getting 10M/s throughput from the disk anyway). is this code kosha ? #! /bin/bash # function usage() { printf "Usage:\n\t%s devname\n\n" "`basename \"$0\"`" printf "'devname' must be a relative path in /dev/ to the desired block device.\n" exit 1 } # Verify the supplied name is a device test -b "/dev/$1" || usage # Convert path separators and spaces into dashes outfile="`echo \"$1\" |sed -r -e 's:[ /]+:-:g'`" # Create a side-stream for computing the MD5 of the data read fifo=`mktemp -u` mkfifo $fifo || exit md5sum -b <$fifo >$2/$outfile.md5 & # Read the device and compress it dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >$2/$outfile.gz # Wait for the background task to close wait I was a little concerned as I didn't see the hdd drive LED light up for my attempt even though the file was growing nicely on my CIFS share. I'm dumping it on the 5x2TB disks (JBOD) in my windows 7 box as my Thecus is not a happy chappy and I need time to mount the DOM module in another machine to find out why. cheers Simon On 16 February 2011 23:39, Phil Turmel <philip@turmel.org> wrote: > On 02/16/2011 05:44 PM, Simon Mcnair wrote: >> I just went to bed and one last question popped in to my mind. Since >> there is a fair timezone gap I thought I'd be presumptuous and ask it >> in the hope I can turn it around a bit quicker in the morning. >> >> My suspicion is that once I have 5 formatted 2tb drives I may be lucky >> to get 10x 1tb dd images on to it. Can I feed the dd process in to >> tar, bzip2, zip or something else which will give me enough space to >> fit the images on ? >> >> Will I get more usable space from 5x2tb partitions or from 1xspanned >> volume ? (the thecus pretty much only allows you to create raid >> volumes so a jbod needs to be 5x1tb arrays or 1 spanned volume or >> stripe). > > I'd use one spanned volume, and gzip. I'd simultaneously generate an md5sum while streaming to your thecus. A script like so: > > #! /bin/bash > # > function usage() { > printf "Usage:\n\t%s devname\n\n" "`basename \"$0\"`" > printf "'devname' must be a relative path in /dev/ to the desired block device.\n" > exit 1 > } > > # Verify the supplied name is a device > test -b "/dev/$1" || usage > > # Convert path separators and spaces into dashes > outfile="`echo \"$1\" |sed -r -e 's:[ /]+:-:g'`" > > # Create a side-stream for computing the MD5 of the data read > fifo=`mktemp -u` > mkfifo $fifo || exit > md5sum -b <$fifo >/mnt/thecus/$outfile.md5 & > > # Read the device and compress it > dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >/mnt/thecus/$outfile.gz > > # Wait for the background task to close > wait > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-17 13:26 ` Simon Mcnair @ 2011-02-17 13:48 ` Phil Turmel 2011-02-17 13:56 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-17 13:48 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1979 bytes --] On 02/17/2011 08:26 AM, Simon Mcnair wrote: > Phil, > After a couple of attempts I realised that all I needed to do was > ./backup.sh sdi rather than specifying ./backup.sh /dev/sdi schoolboy > error huh, rtfm. > > I tried modifying the code, so I could kick them all off at once, but > I wanted to check that this would work from a multithreaded/sane > perspective (yes I know it's a bit of IO, but iotop seems to infer I'm > only getting 10M/s throughput from the disk anyway). That sounds suspiciously like a 100 meg ethernet bottleneck, so parallel operation won't help. If so, this'll take a very long time. > is this code kosha ? Looks pretty good. I've attached a slight update with your changes and a bugfix. > #! /bin/bash > # > function usage() { > printf "Usage:\n\t%s devname\n\n" "`basename \"$0\"`" > printf "'devname' must be a relative path in /dev/ to the > desired block device.\n" > exit 1 > } > > # Verify the supplied name is a device > test -b "/dev/$1" || usage > > # Convert path separators and spaces into dashes > outfile="`echo \"$1\" |sed -r -e 's:[ /]+:-:g'`" > > # Create a side-stream for computing the MD5 of the data read > fifo=`mktemp -u` > mkfifo $fifo || exit > md5sum -b <$fifo >$2/$outfile.md5 & > > # Read the device and compress it > dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >$2/$outfile.gz > > # Wait for the background task to close > wait > > I was a little concerned as I didn't see the hdd drive LED light up > for my attempt even though the file was growing nicely on my CIFS > share. I'm dumping it on the 5x2TB disks (JBOD) in my windows 7 box > as my Thecus is not a happy chappy and I need time to mount the DOM > module in another machine to find out why. If you're willing to dismantle the thecus, directly connecting its drives to your crippled system will make things go much faster. You've got four empty motherboard SATA ports. You just have to watch out for the power load. Phil [-- Attachment #2: block2gz.sh --] [-- Type: application/x-sh, Size: 803 bytes --] ^ permalink raw reply [flat|nested] 64+ messages in thread
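Editor's aside: the block2gz.sh attachment isn't preserved inline in this archive, so the exact bugfix Phil mentions is a guess. The obvious candidate is the tee invocation in the version quoted above: "tee 2>$fifo" only redirects tee's (empty) stderr into the FIFO, so md5sum would hash an empty stream. tee needs the FIFO as a file argument to duplicate the data into it:

  # as posted: only tee's stderr reaches the FIFO, md5sum sees nothing
  #   dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >$2/$outfile.gz
  # likely fix: tee copies the data stream into the FIFO as well as stdout
  dd if="/dev/$1" bs=1M | tee "$fifo" | gzip >"$2/$outfile.gz"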
* Re: Linux software RAID assistance 2011-02-17 13:48 ` Phil Turmel @ 2011-02-17 13:56 ` Simon Mcnair 2011-02-17 14:34 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-17 13:56 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org Phil, I think I need to connect them to the crippled machine directly, I was just trying to leave it alone, I know that doesn't make much sense but I'm more familiar with windows than Linux. The cards that I have are all connected at gigabit and bonded. That machine has two ports, so does the other. I'm running two asus p6ts (1 deluxe and 1 se) with i7 920's I cant understand why it's so slow though I have seen some reports that the supermicro card is quite slow. Simon Ps thanks for the bugfixes and updates Simon On 17 Feb 2011, at 13:48, Phil Turmel <philip@turmel.org> wrote: > On 02/17/2011 08:26 AM, Simon Mcnair wrote: >> Phil, >> After a couple of attempts I realised that all I needed to do was >> ./backup.sh sdi rather than specifying ./backup.sh /dev/sdi schoolboy >> error huh, rtfm. >> >> I tried modifying the code, so I could kick them all off at once, but >> I wanted to check that this would work from a multithreaded/sane >> perspective (yes I know it's a bit of IO, but iotop seems to infer I'm >> only getting 10M/s throughput from the disk anyway). > > That sounds suspiciously like a 100 meg ethernet bottleneck, so parallel operation won't help. If so, this'll take a very long time. > >> is this code kosha ? > > Looks pretty good. I've attached a slight update with your changes and a bugfix. > >> #! /bin/bash >> # >> function usage() { >> printf "Usage:\n\t%s devname\n\n" "`basename \"$0\"`" >> printf "'devname' must be a relative path in /dev/ to the >> desired block device.\n" >> exit 1 >> } >> >> # Verify the supplied name is a device >> test -b "/dev/$1" || usage >> >> # Convert path separators and spaces into dashes >> outfile="`echo \"$1\" |sed -r -e 's:[ /]+:-:g'`" >> >> # Create a side-stream for computing the MD5 of the data read >> fifo=`mktemp -u` >> mkfifo $fifo || exit >> md5sum -b <$fifo >$2/$outfile.md5 & >> >> # Read the device and compress it >> dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >$2/$outfile.gz >> >> # Wait for the background task to close >> wait >> >> I was a little concerned as I didn't see the hdd drive LED light up >> for my attempt even though the file was growing nicely on my CIFS >> share. I'm dumping it on the 5x2TB disks (JBOD) in my windows 7 box >> as my Thecus is not a happy chappy and I need time to mount the DOM >> module in another machine to find out why. > > If you're willing to dismantle the thecus, directly connecting its drives to your crippled system will make things go much faster. You've got four empty motherboard SATA ports. You just have to watch out for the power load. > > Phil > <block2gz.sh> ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-17 13:56 ` Simon Mcnair @ 2011-02-17 14:34 ` Simon Mcnair 2011-02-17 16:54 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-17 14:34 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org 19390 root 17.49 M/s 0 B/s 0.00 % 21.36 % dd if /dev/sde bs 1M 19333 root 16.52 M/s 0 B/s 0.00 % 17.07 % dd if /dev/sdd bs 1M 19503 root 15.79 M/s 0 B/s 0.00 % 13.91 % dd if /dev/sdf bs 1M 18896 root 0 B/s 16.10 M/s 0.00 % 0.00 % ntfs-3g /dev/sdb1 /media/ntfs3g/sdb 18909 root 0 B/s 13.49 M/s 0.00 % 0.00 % ntfs-3g /dev/sdc1 /media/ntfs3g/sdc 18920 root 0 B/s 15.73 M/s 0.00 % 0.00 % ntfs-3g /dev/sdn1 /media/ntfs3g/sdn this will certainly be quicker :-) On 17 February 2011 13:56, Simon Mcnair <simonmcnair@gmail.com> wrote: > Phil, I think I need to connect them to the crippled machine directly, > I was just trying to leave it alone, I know that doesn't make much > sense but I'm more familiar with windows than Linux. The cards that I > have are all connected at gigabit and bonded. That machine has two > ports, so does the other. I'm running two asus p6ts (1 deluxe and 1 > se) with i7 920's > > I cant understand why it's so slow though I have seen some reports > that the supermicro card is quite slow. > Simon > Ps thanks for the bugfixes and updates > Simon > > On 17 Feb 2011, at 13:48, Phil Turmel <philip@turmel.org> wrote: > >> On 02/17/2011 08:26 AM, Simon Mcnair wrote: >>> Phil, >>> After a couple of attempts I realised that all I needed to do was >>> ./backup.sh sdi rather than specifying ./backup.sh /dev/sdi schoolboy >>> error huh, rtfm. >>> >>> I tried modifying the code, so I could kick them all off at once, but >>> I wanted to check that this would work from a multithreaded/sane >>> perspective (yes I know it's a bit of IO, but iotop seems to infer I'm >>> only getting 10M/s throughput from the disk anyway). >> >> That sounds suspiciously like a 100 meg ethernet bottleneck, so parallel operation won't help. If so, this'll take a very long time. >> >>> is this code kosha ? >> >> Looks pretty good. I've attached a slight update with your changes and a bugfix. >> >>> #! /bin/bash >>> # >>> function usage() { >>> printf "Usage:\n\t%s devname\n\n" "`basename \"$0\"`" >>> printf "'devname' must be a relative path in /dev/ to the >>> desired block device.\n" >>> exit 1 >>> } >>> >>> # Verify the supplied name is a device >>> test -b "/dev/$1" || usage >>> >>> # Convert path separators and spaces into dashes >>> outfile="`echo \"$1\" |sed -r -e 's:[ /]+:-:g'`" >>> >>> # Create a side-stream for computing the MD5 of the data read >>> fifo=`mktemp -u` >>> mkfifo $fifo || exit >>> md5sum -b <$fifo >$2/$outfile.md5 & >>> >>> # Read the device and compress it >>> dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >$2/$outfile.gz >>> >>> # Wait for the background task to close >>> wait >>> >>> I was a little concerned as I didn't see the hdd drive LED light up >>> for my attempt even though the file was growing nicely on my CIFS >>> share. I'm dumping it on the 5x2TB disks (JBOD) in my windows 7 box >>> as my Thecus is not a happy chappy and I need time to mount the DOM >>> module in another machine to find out why. >> >> If you're willing to dismantle the thecus, directly connecting its drives to your crippled system will make things go much faster. You've got four empty motherboard SATA ports. You just have to watch out for the power load. 
>> >> Phil >> <block2gz.sh> > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-17 14:34 ` Simon Mcnair @ 2011-02-17 16:54 ` Phil Turmel 2011-02-19 8:43 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-17 16:54 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/17/2011 09:34 AM, Simon Mcnair wrote: > 19390 root 17.49 M/s 0 B/s 0.00 % 21.36 % dd if /dev/sde bs 1M > 19333 root 16.52 M/s 0 B/s 0.00 % 17.07 % dd if /dev/sdd bs 1M > 19503 root 15.79 M/s 0 B/s 0.00 % 13.91 % dd if /dev/sdf bs 1M > 18896 root 0 B/s 16.10 M/s 0.00 % 0.00 % ntfs-3g > /dev/sdb1 /media/ntfs3g/sdb > 18909 root 0 B/s 13.49 M/s 0.00 % 0.00 % ntfs-3g > /dev/sdc1 /media/ntfs3g/sdc > 18920 root 0 B/s 15.73 M/s 0.00 % 0.00 % ntfs-3g > /dev/sdn1 /media/ntfs3g/sdn > > this will certainly be quicker :-) But still a long time... Let me know. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-17 16:54 ` Phil Turmel @ 2011-02-19 8:43 ` Simon Mcnair 2011-02-19 15:30 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-19 8:43 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org Phil, Sorry for the spamming, but I'm just keeping you informed :-). proxmox:/home/simon# ./block2gz.sh sdd /media/ntfs3g/sdb 953869+1 records in 953869+1 records out 1000204886016 bytes (1.0 TB) copied, 95384 s, 10.5 MB/s proxmox:/home/simon# ./block2gz.sh sde /media/ntfs3g/sdn 953869+1 records in 953869+1 records out 1000204886016 bytes (1.0 TB) copied, 106317 s, 9.4 MB/s proxmox:/home/simon# ./block2gz.sh sdf /media/ntfs3g/sdc 953869+1 records in 953869+1 records out 1000204886016 bytes (1.0 TB) copied, 104711 s, 9.6 MB/s proxmox:/home/simon# ./block2gz.sh sdg /media/ntfs3g/sdb 953869+1 records in 953869+1 records out 1000204886016 bytes (1.0 TB) copied, 104980 s, 9.5 MB/s proxmox:/home/simon# ./block2gz.sh sdh /media/ntfs3g/sdn 953869+1 records in 953869+1 records out 1000204886016 bytes (1.0 TB) copied, 98105.1 s, 10.2 MB/s proxmox:/home/simon# ./block2gz.sh sdi /media/ntfs3g/sdc 953869+1 records in 953869+1 records out 1000204886016 bytes (1.0 TB) copied, 98291 s, 10.2 MB/s proxmox:/home/simon# Have just started sdj,sdk,sdl,sdm. I was thinking of renaming the .gz's with the serial number as this would seem to me as more useful, is this a good idea ? These numbers are not at all reflective of the drive or controller speed as I took the lazy route and was writing 2 sets of images at the same time to the same drive and in addition gzip was also running (although not really stressing the system from what I could tell). I suspect also that the drives may be pretty fragmented as the space was not allocated at the start of the write, so that may have had some impact too. Simon On 17 February 2011 16:54, Phil Turmel <philip@turmel.org> wrote: > On 02/17/2011 09:34 AM, Simon Mcnair wrote: >> 19390 root 17.49 M/s 0 B/s 0.00 % 21.36 % dd if /dev/sde bs 1M >> 19333 root 16.52 M/s 0 B/s 0.00 % 17.07 % dd if /dev/sdd bs 1M >> 19503 root 15.79 M/s 0 B/s 0.00 % 13.91 % dd if /dev/sdf bs 1M >> 18896 root 0 B/s 16.10 M/s 0.00 % 0.00 % ntfs-3g >> /dev/sdb1 /media/ntfs3g/sdb >> 18909 root 0 B/s 13.49 M/s 0.00 % 0.00 % ntfs-3g >> /dev/sdc1 /media/ntfs3g/sdc >> 18920 root 0 B/s 15.73 M/s 0.00 % 0.00 % ntfs-3g >> /dev/sdn1 /media/ntfs3g/sdn >> >> this will certainly be quicker :-) > > But still a long time... > > Let me know. > > Phil > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
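Editor's aside: Simon's serial-number idea can be applied to images already written. A sketch, assuming hdparm is installed and reusing the device and path names from this thread (sdd as a hypothetical example):

  serial=$(hdparm -I /dev/sdd | awk '/Serial Number/ {print $NF}')
  mv /media/ntfs3g/sdb/sdd.gz  "/media/ntfs3g/sdb/${serial}.gz"
  mv /media/ntfs3g/sdb/sdd.md5 "/media/ntfs3g/sdb/${serial}.md5"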
* Re: Linux software RAID assistance 2011-02-19 8:43 ` Simon Mcnair @ 2011-02-19 15:30 ` Phil Turmel [not found] ` <AANLkTinOXJWRw_et2U43R_T9XPBzQLnN56Kf2bOAz=_c@mail.gmail.com> 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-19 15:30 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/19/2011 03:43 AM, Simon Mcnair wrote: > Phil, > Sorry for the spamming, but I'm just keeping you informed :-). > > proxmox:/home/simon# ./block2gz.sh sdd /media/ntfs3g/sdb > 953869+1 records in > 953869+1 records out > 1000204886016 bytes (1.0 TB) copied, 95384 s, 10.5 MB/s [trim /] > Have just started sdj,sdk,sdl,sdm. I was thinking of renaming the > .gz's with the serial number as this would seem to me as more useful, > is this a good idea ? Yes. I guess they'll finish some time tomorrow? > These numbers are not at all reflective of the drive or controller > speed as I took the lazy route and was writing 2 sets of images at the > same time to the same drive and in addition gzip was also running > (although not really stressing the system from what I could tell). The CPU was busy, but not overloaded. The per-drive data rates were 1/5th to 1/10th what I would have expected. The Asus P6T SE motherboard has a lot of bandwidth. Something's odd. Can you attach a copy of your dmesg? > I suspect also that the drives may be pretty fragmented as the space > was not allocated at the start of the write, so that may have had some > impact too. Ntfs-3g can push 70+ MB/s onto my heavily fragged Windows laptop partition with only 50% of a Core2 Duo. I doubt that's it. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
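Editor's aside: one way to separate raw drive/controller speed from the gzip-plus-NTFS pipeline is a direct-I/O read test on an idle drive (GNU dd's iflag=direct bypasses the page cache; not run in the thread):

  dd if=/dev/sdd of=/dev/null bs=1M count=1024 iflag=direct

A healthy 1 TB SATA drive of that era reads roughly 80-120 MB/s on outer tracks when nothing else is touching it; rates far below that on this test point at the controller path rather than at gzip.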
[parent not found: <AANLkTinOXJWRw_et2U43R_T9XPBzQLnN56Kf2bOAz=_c@mail.gmail.com>]
* Re: Linux software RAID assistance [not found] ` <AANLkTinOXJWRw_et2U43R_T9XPBzQLnN56Kf2bOAz=_c@mail.gmail.com> @ 2011-02-19 16:19 ` Phil Turmel 2011-02-20 9:56 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-19 16:19 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/19/2011 10:36 AM, Simon Mcnair wrote: > as attached (hopefully). > regards > Simon Nothing obvious to me. Are you running irqbalance? cat /proc/interrupts ? ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-19 16:19 ` Phil Turmel @ 2011-02-20 9:56 ` Simon Mcnair 2011-02-20 19:50 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-20 9:56 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org Phil, I don't know how to find out if I'm running irqbalance, it's whatever was in the proxmox iso that I installed the OS from. I ran a ps -aux | grep irq in case it shows anything of interest: proxmox:/home/simon# ps -aux | grep irq Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html root 3 0.0 0.0 0 0 ? S Feb17 0:22 [ksoftirqd/0] root 7 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/1] root 10 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/2] root 13 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/3] root 16 0.0 0.0 0 0 ? S Feb17 0:11 [ksoftirqd/4] root 19 0.0 0.0 0 0 ? S Feb17 0:13 [ksoftirqd/5] root 22 0.0 0.0 0 0 ? S Feb17 0:06 [ksoftirqd/6] root 25 0.0 0.0 0 0 ? S Feb17 0:06 [ksoftirqd/7] root 4024 0.0 0.0 0 0 ? S Feb17 0:00 [kvm-irqfd-clean] proxmox:/home/simon# cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 0: 66008972 0 0 0 0 0 0 0 IR-IO-APIC-edge timer 1: 265829 0 0 0 0 0 0 0 IR-IO-APIC-edge i8042 8: 1 0 0 0 0 0 0 0 IR-IO-APIC-edge rtc0 9: 0 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi acpi 16: 4432639 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb3, ahci 17: 124325 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi pata_jmicron, eth1 18: 954710 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb8 19: 5994 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb5, uhci_hcd:usb7, firewire_ohci 21: 0 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb4 23: 62 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb6 24: 273558 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi nvidia 28: 30234710 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi mvsas 64: 0 0 0 0 0 0 0 0 DMAR_MSI-edge dmar0 65: 0 0 0 0 0 0 0 0 DMAR_MSI-edge dmar1 73: 26838727 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0 74: 27057631 0 0 0 0 0 0 0 IR-PCI-MSI-edge ahci 75: 247 0 0 0 0 0 0 0 IR-PCI-MSI-edge hda_intel NMI: 0 0 0 0 0 0 0 0 Non-maskable interrupts LOC: 23087679 23088282 21638323 20580136 22974769 20856094 20144310 20084801 Local timer interrupts SPU: 0 0 0 0 0 0 0 0 Spurious interrupts PMI: 0 0 0 0 0 0 0 0 Performance monitoring interrupts PND: 0 0 0 0 0 0 0 0 Performance pending work RES: 15236622 9388860 8532928 7181879 13444866 6005599 4856933 3713707 Rescheduling interrupts CAL: 3384 5773 5839 5859 4590 5748 5781 5691 Function call interrupts TLB: 201989 195383 193545 196155 251338 249053 272032 319233 TLB shootdowns TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts MCE: 0 0 0 0 0 0 0 0 Machine check exceptions MCP: 830 830 830 830 830 830 830 830 Machine check polls ERR: 7 MIS: 0 cheers Simon On 19 February 2011 16:19, Phil Turmel <philip@turmel.org> wrote: > On 02/19/2011 10:36 AM, Simon Mcnair wrote: >> as attached (hopefully). >> regards >> Simon > > Nothing obvious to me. > > Are you running irqbalance? > > cat /proc/interrupts ? > ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-20 9:56 ` Simon Mcnair @ 2011-02-20 19:50 ` Phil Turmel 2011-02-20 23:17 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-20 19:50 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/20/2011 04:56 AM, Simon Mcnair wrote: > Phil, > I don't know how to find out if I'm running irqbalance, it's whatever > was in the proxmox iso that I installed the OS from. I ran a ps -aux > | grep irq in case it shows anything of interest: > > proxmox:/home/simon# ps -aux | grep irq > Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html > root 3 0.0 0.0 0 0 ? S Feb17 0:22 [ksoftirqd/0] > root 7 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/1] > root 10 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/2] > root 13 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/3] > root 16 0.0 0.0 0 0 ? S Feb17 0:11 [ksoftirqd/4] > root 19 0.0 0.0 0 0 ? S Feb17 0:13 [ksoftirqd/5] > root 22 0.0 0.0 0 0 ? S Feb17 0:06 [ksoftirqd/6] > root 25 0.0 0.0 0 0 ? S Feb17 0:06 [ksoftirqd/7] > root 4024 0.0 0.0 0 0 ? S Feb17 0:00 > [kvm-irqfd-clean] Not there. On ubuntu 10.10, the package is called "irqbalance", and the executable daemon is "irqbalance". > proxmox:/home/simon# cat /proc/interrupts > CPU0 CPU1 CPU2 CPU3 CPU4 > CPU5 CPU6 CPU7 > 0: 66008972 0 0 0 0 > 0 0 0 IR-IO-APIC-edge timer > 1: 265829 0 0 0 0 > 0 0 0 IR-IO-APIC-edge i8042 > 8: 1 0 0 0 0 > 0 0 0 IR-IO-APIC-edge rtc0 > 9: 0 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi acpi > 16: 4432639 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb3, ahci > 17: 124325 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi pata_jmicron, eth1 > 18: 954710 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb1, > uhci_hcd:usb8 > 19: 5994 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb5, > uhci_hcd:usb7, firewire_ohci > 21: 0 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb4 > 23: 62 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb2, > uhci_hcd:usb6 > 24: 273558 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi nvidia > 28: 30234710 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi mvsas > 64: 0 0 0 0 0 > 0 0 0 DMAR_MSI-edge dmar0 > 65: 0 0 0 0 0 > 0 0 0 DMAR_MSI-edge dmar1 > 73: 26838727 0 0 0 0 > 0 0 0 IR-PCI-MSI-edge eth0 > 74: 27057631 0 0 0 0 > 0 0 0 IR-PCI-MSI-edge ahci > 75: 247 0 0 0 0 > 0 0 0 IR-PCI-MSI-edge hda_intel > NMI: 0 0 0 0 0 > 0 0 0 Non-maskable interrupts > LOC: 23087679 23088282 21638323 20580136 22974769 > 20856094 20144310 20084801 Local timer interrupts > SPU: 0 0 0 0 0 > 0 0 0 Spurious interrupts > PMI: 0 0 0 0 0 > 0 0 0 Performance monitoring interrupts > PND: 0 0 0 0 0 > 0 0 0 Performance pending work > RES: 15236622 9388860 8532928 7181879 13444866 > 6005599 4856933 3713707 Rescheduling interrupts > CAL: 3384 5773 5839 5859 4590 > 5748 5781 5691 Function call interrupts > TLB: 201989 195383 193545 196155 251338 > 249053 272032 319233 TLB shootdowns > TRM: 0 0 0 0 0 > 0 0 0 Thermal event interrupts > THR: 0 0 0 0 0 > 0 0 0 Threshold APIC interrupts > MCE: 0 0 0 0 0 > 0 0 0 Machine check exceptions > MCP: 830 830 830 830 830 > 830 830 830 Machine check polls > ERR: 7 > MIS: 0 CPU0 is handling every single I/O interrupt. I really think you need irqbalance. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
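Editor's aside: on Debian-based Proxmox the fix Phil suggests would look something like the following (package and init-script names assumed, not confirmed in the thread). The manual form writes a hex CPU bitmask per IRQ; irqbalance just automates the same thing:

  apt-get install irqbalance
  /etc/init.d/irqbalance start
  # or by hand, e.g. move mvsas (IRQ 28 in the listing above) to CPU1:
  echo 2 > /proc/irq/28/smp_affinity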
* Re: Linux software RAID assistance 2011-02-20 19:50 ` Phil Turmel @ 2011-02-20 23:17 ` Simon Mcnair 2011-02-20 23:39 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-20 23:17 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org Phil, You are a god amongst men :-) I seem to have the vast majority of my data. I can't see anything in lost+found and the fsck log doesn't list any filenames etc of missing data. Is there any way for me to find anything that I've lost ? :-) :-) :-) Simon On 20 February 2011 19:50, Phil Turmel <philip@turmel.org> wrote: > On 02/20/2011 04:56 AM, Simon Mcnair wrote: >> Phil, >> I don't know how to find out if I'm running irqbalance, it's whatever >> was in the proxmox iso that I installed the OS from. I ran a ps -aux >> | grep irq in case it shows anything of interest: >> >> proxmox:/home/simon# ps -aux | grep irq >> Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html >> root 3 0.0 0.0 0 0 ? S Feb17 0:22 [ksoftirqd/0] >> root 7 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/1] >> root 10 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/2] >> root 13 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/3] >> root 16 0.0 0.0 0 0 ? S Feb17 0:11 [ksoftirqd/4] >> root 19 0.0 0.0 0 0 ? S Feb17 0:13 [ksoftirqd/5] >> root 22 0.0 0.0 0 0 ? S Feb17 0:06 [ksoftirqd/6] >> root 25 0.0 0.0 0 0 ? S Feb17 0:06 [ksoftirqd/7] >> root 4024 0.0 0.0 0 0 ? S Feb17 0:00 >> [kvm-irqfd-clean] > > Not there. On ubuntu 10.10, the package is called "irqbalance", and the executable daemon is "irqbalance". > >> proxmox:/home/simon# cat /proc/interrupts >> CPU0 CPU1 CPU2 CPU3 CPU4 >> CPU5 CPU6 CPU7 >> 0: 66008972 0 0 0 0 >> 0 0 0 IR-IO-APIC-edge timer >> 1: 265829 0 0 0 0 >> 0 0 0 IR-IO-APIC-edge i8042 >> 8: 1 0 0 0 0 >> 0 0 0 IR-IO-APIC-edge rtc0 >> 9: 0 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi acpi >> 16: 4432639 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb3, ahci >> 17: 124325 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi pata_jmicron, eth1 >> 18: 954710 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb1, >> uhci_hcd:usb8 >> 19: 5994 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb5, >> uhci_hcd:usb7, firewire_ohci >> 21: 0 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb4 >> 23: 62 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb2, >> uhci_hcd:usb6 >> 24: 273558 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi nvidia >> 28: 30234710 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi mvsas >> 64: 0 0 0 0 0 >> 0 0 0 DMAR_MSI-edge dmar0 >> 65: 0 0 0 0 0 >> 0 0 0 DMAR_MSI-edge dmar1 >> 73: 26838727 0 0 0 0 >> 0 0 0 IR-PCI-MSI-edge eth0 >> 74: 27057631 0 0 0 0 >> 0 0 0 IR-PCI-MSI-edge ahci >> 75: 247 0 0 0 0 >> 0 0 0 IR-PCI-MSI-edge hda_intel >> NMI: 0 0 0 0 0 >> 0 0 0 Non-maskable interrupts >> LOC: 23087679 23088282 21638323 20580136 22974769 >> 20856094 20144310 20084801 Local timer interrupts >> SPU: 0 0 0 0 0 >> 0 0 0 Spurious interrupts >> PMI: 0 0 0 0 0 >> 0 0 0 Performance monitoring interrupts >> PND: 0 0 0 0 0 >> 0 0 0 Performance pending work >> RES: 15236622 9388860 8532928 7181879 13444866 >> 6005599 4856933 3713707 Rescheduling interrupts >> CAL: 3384 5773 5839 5859 4590 >> 5748 5781 5691 Function call interrupts >> TLB: 201989 195383 193545 196155 251338 >> 249053 272032 319233 TLB shootdowns >> TRM: 0 0 0 0 0 >> 0 0 0 Thermal event interrupts >> THR: 0 0 0 0 0 >> 0 0 0 Threshold APIC interrupts >> MCE: 0 0 0 0 0 >> 0 0 0 Machine check exceptions >> MCP: 830 830 830 830 830 >> 830 830 830 Machine check polls >> ERR: 7 >> MIS: 0 > > CPU0 is handling every single 
I/O interrupt. I really think you need irqbalance. > > Phil > ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-20 23:17 ` Simon Mcnair @ 2011-02-20 23:39 ` Phil Turmel 2011-02-22 17:12 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-20 23:39 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/20/2011 06:17 PM, Simon Mcnair wrote: > Phil, > You are a god amongst men :-) Heh. I read that out loud to my wife.... > I seem to have the vast majority of my data. I can't see anything in > lost+found and the fsck log doesn't list any filenames etc of missing > data. Is there any way for me to find anything that I've lost ? Not that I know of, short of a list of names from before the crash. Glad to hear that the family treasures are safe. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance
  2011-02-20 23:39 ` Phil Turmel
@ 2011-02-22 17:12 ` Simon Mcnair
  2011-02-22 17:14 ` Simon Mcnair
  0 siblings, 1 reply; 64+ messages in thread
From: Simon Mcnair @ 2011-02-22 17:12 UTC (permalink / raw)
To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org

Phil,
I thought it was fixed, but every time I do a reboot it disappears again. I did:

pvcreate --restorefile 'simons-lvm-backup' --uuid 9fAJEz-HcaP-RQ51-fV8b-nxrN-Uqwb-PPnOLJ /dev/md0

and the vgcfgrestore. The only piece of your advice that I didn't follow was "zero the superblock on the removed drive", as I thought re-adding it to the array would do that anyway. Did this cause me problems?

proxmox:~# mdadm --assemble --scan -v
mdadm: looking for devices for /dev/md/0
mdadm: cannot open device /dev/dm-2: Device or resource busy
mdadm: /dev/dm-2 has wrong uuid.
mdadm: cannot open device /dev/dm-1: Device or resource busy
mdadm: /dev/dm-1 has wrong uuid.
mdadm: cannot open device /dev/dm-0: Device or resource busy
mdadm: /dev/dm-0 has wrong uuid.
mdadm: /dev/sdo1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdo
mdadm: /dev/sdo has wrong uuid.
mdadm: /dev/sdn1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdn
mdadm: /dev/sdn has wrong uuid.
mdadm: /dev/sdm1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdm
mdadm: /dev/sdm has wrong uuid.
mdadm: /dev/sdl1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdl
mdadm: /dev/sdl has wrong uuid.
mdadm: /dev/sdk1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdk
mdadm: /dev/sdk has wrong uuid.
mdadm: /dev/sdj1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdj
mdadm: /dev/sdj has wrong uuid.
mdadm: /dev/sdi1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdi
mdadm: /dev/sdi has wrong uuid.
mdadm: /dev/sdh1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdh
mdadm: /dev/sdh has wrong uuid.
mdadm: /dev/sdg1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdg
mdadm: /dev/sdg has wrong uuid.
mdadm: /dev/sdf1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdf
mdadm: /dev/sdf has wrong uuid.
mdadm: cannot open device /dev/sdc1: Device or resource busy
mdadm: /dev/sdc1 has wrong uuid.
mdadm: cannot open device /dev/sdc: Device or resource busy
mdadm: /dev/sdc has wrong uuid.
mdadm: cannot open device /dev/sdb1: Device or resource busy
mdadm: /dev/sdb1 has wrong uuid.
mdadm: cannot open device /dev/sdb: Device or resource busy
mdadm: /dev/sdb has wrong uuid.
mdadm: cannot open device /dev/sda2: Device or resource busy
mdadm: /dev/sda2 has wrong uuid.
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: /dev/sda1 has wrong uuid.
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has wrong uuid.

I'm getting frustrated now. It seems like it's fixed, then next reboot, BAM, it's gone again. I'm so out of my depth here.

Please can you help me fix this once and for all?

cheers
Simon

On 20 February 2011 23:39, Phil Turmel <philip@turmel.org> wrote:
> On 02/20/2011 06:17 PM, Simon Mcnair wrote:
>> Phil,
>> You are a god amongst men :-)
>
> Heh. I read that out loud to my wife....
>
>> I seem to have the vast majority of my data. I can't see anything in
>> lost+found and the fsck log doesn't list any filenames etc of missing
>> data. Is there any way for me to find anything that I've lost ?
>
> Not that I know of, short of a list of names from before the crash.
>
> Glad to hear that the family treasures are safe.
>
> Phil
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-22 17:12 ` Simon Mcnair @ 2011-02-22 17:14 ` Simon Mcnair 2011-02-22 18:23 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-22 17:14 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org was this anything to do with the dist-upgrade that I performed ? mdadm has upgraded again to proxmox:~# mdadm -V mdadm - v3.1.4 - 31st August 2010 if that's important. Simon On 22 February 2011 17:12, Simon Mcnair <simonmcnair@gmail.com> wrote: > Phil, > I thought it was fixed, but every time I do a reboot it disappears > again. I did the: > pvcreate --restorefile 'simons-lvm-backup' --uuid > 9fAJEz-HcaP-RQ51-fV8b-nxrN-Uqwb-PPnOLJ /dev/md0 and the vgcfgrestore > > the only piece of your advice that I didn't follow was "zero the > superblock on the removed drive" as I thought re-adding it to the > array would do that anyway. Did this cause me problems ? > > proxmox:~# mdadm --assemble --scan -v > mdadm: looking for devices for /dev/md/0 > mdadm: cannot open device /dev/dm-2: Device or resource busy > mdadm: /dev/dm-2 has wrong uuid. > mdadm: cannot open device /dev/dm-1: Device or resource busy > mdadm: /dev/dm-1 has wrong uuid. > mdadm: cannot open device /dev/dm-0: Device or resource busy > mdadm: /dev/dm-0 has wrong uuid. > mdadm: /dev/sdo1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdo > mdadm: /dev/sdo has wrong uuid. > mdadm: /dev/sdn1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdn > mdadm: /dev/sdn has wrong uuid. > mdadm: /dev/sdm1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdm > mdadm: /dev/sdm has wrong uuid. > mdadm: /dev/sdl1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdl > mdadm: /dev/sdl has wrong uuid. > mdadm: /dev/sdk1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdk > mdadm: /dev/sdk has wrong uuid. > mdadm: /dev/sdj1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdj > mdadm: /dev/sdj has wrong uuid. > mdadm: /dev/sdi1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdi > mdadm: /dev/sdi has wrong uuid. > mdadm: /dev/sdh1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdh > mdadm: /dev/sdh has wrong uuid. > mdadm: /dev/sdg1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdg > mdadm: /dev/sdg has wrong uuid. > mdadm: /dev/sdf1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdf > mdadm: /dev/sdf has wrong uuid. > mdadm: cannot open device /dev/sdc1: Device or resource busy > mdadm: /dev/sdc1 has wrong uuid. > mdadm: cannot open device /dev/sdc: Device or resource busy > mdadm: /dev/sdc has wrong uuid. > mdadm: cannot open device /dev/sdb1: Device or resource busy > mdadm: /dev/sdb1 has wrong uuid. > mdadm: cannot open device /dev/sdb: Device or resource busy > mdadm: /dev/sdb has wrong uuid. > mdadm: cannot open device /dev/sda2: Device or resource busy > mdadm: /dev/sda2 has wrong uuid. > mdadm: cannot open device /dev/sda1: Device or resource busy > mdadm: /dev/sda1 has wrong uuid. > mdadm: cannot open device /dev/sda: Device or resource busy > mdadm: /dev/sda has wrong uuid. > > I'm getting frustrated now. It seems like it's fixzed, then next > reboot, BAM, it's gone again. I'm so out of my depth here. > > Please can you help me fix this once and for all ? > > cheers > Simon > > On 20 February 2011 23:39, Phil Turmel <philip@turmel.org> wrote: >> On 02/20/2011 06:17 PM, Simon Mcnair wrote: >>> Phil, >>> You are a god amongst men :-) >> >> Heh. I read that out loud to my wife.... 
>> >>> I seem to have the vast majority of my data. I can't see anything in >>> lost+found and the fsck log doesn't list any filenames etc of missing >>> data. Is there any way for me to find anything that I've lost ? >> >> Not that I know of, short of a list of names from before the crash. >> >> Glad to hear that the family treasures are safe. >> >> Phil >> > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-22 17:14 ` Simon Mcnair @ 2011-02-22 18:23 ` Phil Turmel 2011-02-22 18:36 ` Simon McNair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-22 18:23 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/22/2011 12:14 PM, Simon Mcnair wrote: > was this anything to do with the dist-upgrade that I performed ? > > mdadm has upgraded again to > proxmox:~# mdadm -V > mdadm - v3.1.4 - 31st August 2010 No, but you probably can't do any more "--create --assume-clean" operations with that array. As for the assembly errors, its probably an out-of-date mdadm.conf. My server's looks like this: > DEVICE /dev/sd[a-z][1-9] > > ARRAY /dev/md23 UUID=c3cbe096:fc43d939:8aa66230:708c5670 > ARRAY /dev/md22 UUID=1438a239:aa03c3f9:68e051d7:b59a6219 > ARRAY /dev/md21 UUID=4b47d0f6:16f0e352:c67a2185:feb0b573 > ARRAY /dev/md20 UUID=7676c77e:6ae70f65:30a170a2:b1a1b242 Yours probably needs to be updated with the new md0 uuid. Its not your boot drive, so your initramfs shouldn't matter. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
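Editor's aside: the comparison Phil describes is quick to run. A sketch, assuming the Debian-style config path (on some systems it is /etc/mdadm.conf instead):

  mdadm --detail /dev/md0 | grep -i uuid    # the running array's UUID
  grep ^ARRAY /etc/mdadm/mdadm.conf         # the UUID assembly will look for
  mdadm --detail --scan                     # prints a ready-made ARRAY line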
* Re: Linux software RAID assistance
  2011-02-22 18:23 ` Phil Turmel
@ 2011-02-22 18:36 ` Simon McNair
  2011-02-22 19:06 ` Phil Turmel
  0 siblings, 1 reply; 64+ messages in thread
From: Simon McNair @ 2011-02-22 18:36 UTC (permalink / raw)
To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org

Phil,
Phew, I didn't know that:

"mdadm: /dev/sdo has wrong uuid.
mdadm: /dev/sdn1 has wrong uuid."

just meant that the array UUID didn't match mdadm.conf. It would be nice if it said:

"mdadm: /dev/sdo uuid does not match mdadm.conf.
mdadm: /dev/sdn1 uuid does not match mdadm.conf"

My mdadm.conf now reads:

DEVICE partitions
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
MAILADDR root
# ARRAY /dev/md/0 level=raid5 metadata=1.1 num-devices=10 UUID=0a72e40f:aec6f80f:a7004457:1a84a7a8 name=pro�lox:0
ARRAY /dev/md/0 metadata=1.1 UUID=12c2af00:10681e10:fb17e449:1404739c name=proxmox:0

I'm afraid to do another reboot in case something else goes wrong ;-). Just for my info, when I only have a single array there isn't much chance of it being assigned anything other than md0, so could I not just leave mdadm.conf empty?

cheers
Simon

On 22/02/2011 18:23, Phil Turmel wrote:
> On 02/22/2011 12:14 PM, Simon Mcnair wrote:
>> was this anything to do with the dist-upgrade that I performed ?
>>
>> mdadm has upgraded again to
>> proxmox:~# mdadm -V
>> mdadm - v3.1.4 - 31st August 2010
> No, but you probably can't do any more "--create --assume-clean" operations with that array.
>
> As for the assembly errors, its probably an out-of-date mdadm.conf.
>
> My server's looks like this:
>
>> DEVICE /dev/sd[a-z][1-9]
>>
>> ARRAY /dev/md23 UUID=c3cbe096:fc43d939:8aa66230:708c5670
>> ARRAY /dev/md22 UUID=1438a239:aa03c3f9:68e051d7:b59a6219
>> ARRAY /dev/md21 UUID=4b47d0f6:16f0e352:c67a2185:feb0b573
>> ARRAY /dev/md20 UUID=7676c77e:6ae70f65:30a170a2:b1a1b242
> Yours probably needs to be updated with the new md0 uuid. Its not your boot drive, so your initramfs shouldn't matter.
>
> Phil
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-22 18:36 ` Simon McNair @ 2011-02-22 19:06 ` Phil Turmel 0 siblings, 0 replies; 64+ messages in thread From: Phil Turmel @ 2011-02-22 19:06 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/22/2011 01:36 PM, Simon McNair wrote: > Phil, > phew I didn't know that: > "mdadm: /dev/sdo has wrong uuid. > mdadm: /dev/sdn1 has wrong uuid." > > was just that the array UUID didn't match mdadm.conf, it would be nice if it said: > > "mdadm: /dev/sdo uuid does not match mdadm.conf. > mdadm: /dev/sdn1 uuid does not match mdadm.conf" > my mdadm.conf now reads > > DEVICE partitions > CREATE owner=root group=disk mode=0660 auto=yes > HOMEHOST <system> > MAILADDR root > # ARRAY /dev/md/0 level=raid5 metadata=1.1 num-devices=10 UUID=0a72e40f:aec6f80f:a7004457:1a84a7a8 name=pro�lox:0 > ARRAY /dev/md/0 metadata=1.1 UUID=12c2af00:10681e10:fb17e449:1404739c name=proxmox:0 You can strip it down to be effectively "auto", but I wouldn't remove it. Since you are using partitioned devices throughout your system, something like this: DEVICE /dev/sd*[0-9] AUTO all Or slightly more restrictive: DEVICE /dev/sd*[0-9] ARRAY /dev/md/0 UUID=12c2af00:10681e10:fb17e449:1404739c > I'm afraid to do another reboot incase something else goes wrong ;-). Just for my info, when I only have a single array there isn't much chance of it being assigned anything other than md0, so could I not just leave mdadm.conf empty ? LVM doesn't care what device name the physical volumes end up with. Please reboot when the system is quiet again to make sure your new mdadm.conf works. Or stop your array, and then do a "mdadm --assemble --scan", which is effectively the same. Phil -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
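Editor's aside: Phil's no-reboot test, spelled out under the assumption that nothing from the LV is mounted; the VG on top of md0 has to be deactivated first, since the array cannot stop while LVM holds it:

  vgchange -an lvm-raid        # release the VG so md0 is no longer busy
  mdadm --stop /dev/md0
  mdadm --assemble --scan -v   # should reassemble via the new mdadm.conf
  vgchange -ay lvm-raid        # bring the LV back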
* Re: Linux software RAID assistance 2011-02-16 19:36 ` Phil Turmel 2011-02-16 21:28 ` Simon McNair @ 2011-02-18 9:31 ` Simon Mcnair 2011-02-18 13:16 ` Phil Turmel [not found] ` <AANLkTi=RmR5nVnmFLuqK5anHc3WDPxjuYjitT6+5wAqS@mail.gmail.com> 2 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-18 9:31 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid time passes. You are eaten by a Grue. Sheesh this is taking a long time. simon@proxmox:~$ top top - 09:26:19 up 20:36, 11 users, load average: 2.46, 2.49, 2.39 Tasks: 334 total, 9 running, 325 sleeping, 0 stopped, 0 zombie Cpu(s): 52.8%us, 9.4%sy, 0.0%ni, 35.2%id, 2.5%wa, 0.0%hi, 0.2%si, 0.0%st Mem: 12299244k total, 12227584k used, 71660k free, 11042868k buffers Swap: 11534332k total, 0k used, 11534332k free, 208204k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 18896 root 20 0 12288 1072 460 S 63 0.0 431:16.55 ntfs-3g 18909 root 20 0 12288 1064 460 R 62 0.0 419:17.11 ntfs-3g 18920 root 20 0 12288 1100 492 R 62 0.0 428:17.33 ntfs-3g 9210 root 20 0 4068 520 328 S 54 0.0 661:59.92 gzip 9138 root 20 0 4068 524 328 R 53 0.0 647:29.07 gzip 9247 root 20 0 4068 524 328 S 53 0.0 651:28.36 gzip 25678 root 20 0 4068 524 328 R 52 0.0 439:49.20 gzip 24957 root 20 0 4068 524 328 R 51 0.0 437:44.01 gzip 25792 root 20 0 4068 524 328 S 48 0.0 433:28.14 gzip This is hardly touching the CPU on the box (in my opinion), any advice on using renice ? I've never used it before, but now seems like a good time ? tia Simon On 16 February 2011 19:36, Phil Turmel <philip@turmel.org> wrote: > On 02/16/2011 02:15 PM, Simon McNair wrote: >> proxmox:/home/simon# vgscan --verbose >> Wiping cache of LVM-capable devices >> Wiping internal VG cache >> Reading all physical volumes. This may take a while... >> Finding all volume groups >> Finding volume group "pve" >> Found volume group "pve" using metadata type lvm2 >> Finding volume group "lvm-raid" >> Found volume group "lvm-raid" using metadata type lvm2 >> proxmox:/home/simon# >> proxmox:/home/simon# lvscan --verbose >> Finding all logical volumes >> ACTIVE '/dev/pve/swap' [11.00 GB] inherit >> ACTIVE '/dev/pve/root' [96.00 GB] inherit >> ACTIVE '/dev/pve/data' [354.26 GB] inherit >> inactive '/dev/lvm-raid/RAID' [8.19 TB] inherit >> >> proxmox:/home/simon# vgchange -ay >> 3 logical volume(s) in volume group "pve" now active >> 1 logical volume(s) in volume group "lvm-raid" now active > > Heh. Figures. > >> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm-raid-RAID > > Actually, I wanted you to try with a capital N. Lower case 'n' is similar, but not quite the same. > >> e2fsck 1.41.3 (12-Oct-2008) >> fsck.ext4: No such file or directory while trying to open /dev/mapper/lvm-raid-RAID >> >> The superblock could not be read or does not describe a correct ext2 >> filesystem. If the device is valid and it really contains an ext2 >> filesystem (and not swap or ufs or something else), then the superblock >> is corrupt, and you might try running e2fsck with an alternate superblock: >> e2fsck -b 8193 <device> >> >> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/ >> control lvm--raid-RAID pve-data pve-root pve-swap > > Strange. I guess it does that to distinguish dashes in the VG name from dashes between VG and LV names. > >> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm--raid-RAID >> e2fsck 1.41.3 (12-Oct-2008) >> /dev/mapper/lvm--raid-RAID has unsupported feature(s): FEATURE_I31 >> e2fsck: Get a newer version of e2fsck! >> >> my version of e2fsck always worked before ? 
> > v1.41.14 was release 7 weeks ago. But, I suspect there's corruption in the superblock. Do you still have your disk images tucked away somewhere safe? > > If so, try: > > 1) The '-b' option to e2fsck. We need to experiment with '-n -b offset' to find the alternate superblock. Trying 'offset' = to 8193, 16384, and 32768, per the man-page. > > 2) A newer e2fsprogs. > > Finally, > 3) mount -r /dev/lvm-raid/RAID /mnt/whatever > > Phil > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-18 9:31 ` Simon Mcnair @ 2011-02-18 13:16 ` Phil Turmel 2011-02-18 13:21 ` Roberto Spadim 2011-02-18 13:29 ` Simon Mcnair 0 siblings, 2 replies; 64+ messages in thread From: Phil Turmel @ 2011-02-18 13:16 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid On 02/18/2011 04:31 AM, Simon Mcnair wrote: > time passes. You are eaten by a Grue. Sheesh this is taking a long time. > > simon@proxmox:~$ top > > top - 09:26:19 up 20:36, 11 users, load average: 2.46, 2.49, 2.39 > Tasks: 334 total, 9 running, 325 sleeping, 0 stopped, 0 zombie > Cpu(s): 52.8%us, 9.4%sy, 0.0%ni, 35.2%id, 2.5%wa, 0.0%hi, 0.2%si, 0.0%st > Mem: 12299244k total, 12227584k used, 71660k free, 11042868k buffers > Swap: 11534332k total, 0k used, 11534332k free, 208204k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 18896 root 20 0 12288 1072 460 S 63 0.0 431:16.55 ntfs-3g > 18909 root 20 0 12288 1064 460 R 62 0.0 419:17.11 ntfs-3g > 18920 root 20 0 12288 1100 492 R 62 0.0 428:17.33 ntfs-3g > 9210 root 20 0 4068 520 328 S 54 0.0 661:59.92 gzip > 9138 root 20 0 4068 524 328 R 53 0.0 647:29.07 gzip > 9247 root 20 0 4068 524 328 S 53 0.0 651:28.36 gzip > 25678 root 20 0 4068 524 328 R 52 0.0 439:49.20 gzip > 24957 root 20 0 4068 524 328 R 51 0.0 437:44.01 gzip > 25792 root 20 0 4068 524 328 S 48 0.0 433:28.14 gzip > > This is hardly touching the CPU on the box (in my opinion), any advice > on using renice ? I've never used it before, but now seems like a > good time ? Probably wouldn't help. You have a total usage of 65%. I suspect that each of the processors running gzip are nearly pegged, and everything else is loafing along. Hit '1' in top to show the CPU usage per-cpu. Single-threaded gzip is holding you back. If you had enough space for saving the partitions uncompressed, it would be much faster. The PCIe x4 interface on the SuperMicro has a theoretical performance of 1GByte/s, which would be ~ 125MB/s per drive. From what I've read, that card actually delivers ~ 75MB/s per drive when they're all busy. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
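Editor's aside: since single-threaded gzip is the ceiling here, pigz -- a parallel gzip-compatible compressor, not mentioned in the thread and assumed to be installable -- would be a drop-in replacement for the compression leg of the script:

  dd if="/dev/$1" bs=1M | tee "$fifo" | pigz -p 4 >"$2/$outfile.gz"

The output is ordinary gzip format, so nothing downstream changes; -p sets the worker thread count.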
* Re: Linux software RAID assistance
From: Roberto Spadim @ 2011-02-18 13:21 UTC
To: Phil Turmel; +Cc: Simon Mcnair, NeilBrown, linux-raid

PCI Express x4 = 2.5 Gb/s * 4 = 10 Gb/s, no?!

2011/2/18 Phil Turmel <philip@turmel.org>:
> On 02/18/2011 04:31 AM, Simon Mcnair wrote:
>> Time passes. You are eaten by a Grue. Sheesh, this is taking a long time.
>>
>> simon@proxmox:~$ top
[trim /]
>> This is hardly touching the CPU on the box (in my opinion). Any advice
>> on using renice? I've never used it before, but now seems like a good
>> time.
>
> Probably wouldn't help.
>
> You have a total usage of 65%.  I suspect that each of the processors running gzip is nearly pegged, and everything else is loafing along.  Hit '1' in top to show the CPU usage per-cpu.  Single-threaded gzip is holding you back.
>
> If you had enough space for saving the partitions uncompressed, it would be much faster.  The PCIe x4 interface on the SuperMicro has a theoretical performance of 1 GByte/s, which would be ~125 MB/s per drive.  From what I've read, that card actually delivers ~75 MB/s per drive when they're all busy.
>
> Phil

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: Linux software RAID assistance
From: Phil Turmel @ 2011-02-18 13:26 UTC
To: Roberto Spadim; +Cc: Simon Mcnair, NeilBrown, linux-raid

On 02/18/2011 08:21 AM, Roberto Spadim wrote:
> PCI Express x4 = 2.5 Gb/s * 4 = 10 Gb/s, no?!

Subtract transfer overhead (framing, ECC, addressing, etc.) and it takes more than 8 bits on the wire to transfer 8 bits of data.  I haven't checked the spec, but it looks like framing and ECC alone add +25%.
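That +25% matches PCIe 1.x's 8b/10b line coding, which sends every 8-bit byte as a 10-bit symbol. Worked through for this slot:

    2.5 Gb/s per lane x 4 lanes    = 10 Gb/s raw
    10 Gb/s x 8/10 (8b/10b coding) =  8 Gb/s = 1 GB/s payload

before any packet framing — which is where the 1 GByte/s theoretical figure for the x4 interface comes from.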
* Re: Linux software RAID assistance
From: Simon Mcnair @ 2011-02-18 13:29 UTC
To: Phil Turmel; +Cc: NeilBrown, linux-raid

top - 13:28:45 up 1 day, 39 min, 12 users,  load average: 2.65, 2.49, 2.37
Tasks: 340 total,   7 running, 333 sleeping,   0 stopped,   0 zombie
Cpu0  : 47.9%us, 15.0%sy,  0.0%ni, 13.7%id, 21.5%wa,  0.0%hi,  2.0%si,  0.0%st
Cpu1  : 54.8%us,  8.7%sy,  0.0%ni, 33.3%id,  3.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 46.5%us, 13.7%sy,  0.0%ni, 37.9%id,  1.9%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 49.5%us,  9.5%sy,  0.0%ni, 37.6%id,  3.4%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  : 48.7%us,  8.2%sy,  0.0%ni, 17.3%id, 25.5%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu5  : 50.3%us,  7.2%sy,  0.0%ni, 41.8%id,  0.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  : 53.5%us,  6.6%sy,  0.0%ni, 38.4%id,  1.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  : 52.0%us,  6.2%sy,  0.0%ni, 41.2%id,  0.6%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  12299244k total, 12227064k used,    72180k free, 10754600k buffers
Swap: 11534332k total,        0k used, 11534332k free,   175916k cached

On 18 February 2011 13:16, Phil Turmel <philip@turmel.org> wrote:
> On 02/18/2011 04:31 AM, Simon Mcnair wrote:
>> Time passes. You are eaten by a Grue. Sheesh, this is taking a long time.
>>
>> simon@proxmox:~$ top
[trim /]
>> This is hardly touching the CPU on the box (in my opinion). Any advice
>> on using renice? I've never used it before, but now seems like a good
>> time.
>
> Probably wouldn't help.
>
> You have a total usage of 65%.  I suspect that each of the processors running gzip is nearly pegged, and everything else is loafing along.  Hit '1' in top to show the CPU usage per-cpu.  Single-threaded gzip is holding you back.
>
> If you had enough space for saving the partitions uncompressed, it would be much faster.  The PCIe x4 interface on the SuperMicro has a theoretical performance of 1 GByte/s, which would be ~125 MB/s per drive.  From what I've read, that card actually delivers ~75 MB/s per drive when they're all busy.
>
> Phil
* Re: Linux software RAID assistance
From: Phil Turmel @ 2011-02-18 13:34 UTC
To: Simon Mcnair; +Cc: NeilBrown, linux-raid

On 02/18/2011 08:29 AM, Simon Mcnair wrote:
> top - 13:28:45 up 1 day, 39 min, 12 users,  load average: 2.65, 2.49, 2.37
> Tasks: 340 total,   7 running, 333 sleeping,   0 stopped,   0 zombie
> Cpu0  : 47.9%us, 15.0%sy,  0.0%ni, 13.7%id, 21.5%wa,  0.0%hi,  2.0%si,  0.0%st
> Cpu1  : 54.8%us,  8.7%sy,  0.0%ni, 33.3%id,  3.2%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu2  : 46.5%us, 13.7%sy,  0.0%ni, 37.9%id,  1.9%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu3  : 49.5%us,  9.5%sy,  0.0%ni, 37.6%id,  3.4%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu4  : 48.7%us,  8.2%sy,  0.0%ni, 17.3%id, 25.5%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu5  : 50.3%us,  7.2%sy,  0.0%ni, 41.8%id,  0.6%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu6  : 53.5%us,  6.6%sy,  0.0%ni, 38.4%id,  1.6%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu7  : 52.0%us,  6.2%sy,  0.0%ni, 41.2%id,  0.6%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  12299244k total, 12227064k used,    72180k free, 10754600k buffers
> Swap: 11534332k total,        0k used, 11534332k free,   175916k cached

Hmmm.  Not what I expected at all.

I'd be very interested in some performance tests of the Supermicro ports vs. motherboard ports on that system.

Phil
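Reading those per-CPU lines: no core is anywhere near pegged (the busiest shows ~55% user time), while Cpu0 and Cpu4 spend 21-25% in iowait. That suggests the copy is pacing on the disks or the controller at this point rather than on gzip — consistent with the surprise above, though it's an inference from this one sample.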
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-18 14:12 UTC
To: Phil Turmel; +Cc: NeilBrown, linux-raid

Phil,
Tell me what you want me to do and I'm game for it.  When I bought it
the reviews were not all that promising
(http://www.google.co.uk/products/catalog?hl=&q=AOC-SASLP-MV8&cid=15596376504116122379&os=reviews)
but I figure getting up and running is more important than SATA speed
at the moment.

cheers
Simon

On 18/02/2011 13:34, Phil Turmel wrote:
> On 02/18/2011 08:29 AM, Simon Mcnair wrote:
>> top - 13:28:45 up 1 day, 39 min, 12 users,  load average: 2.65, 2.49, 2.37
[trim /]
> Hmmm.  Not what I expected at all.
>
> I'd be very interested in some performance tests of the Supermicro ports vs. motherboard ports on that system.
>
> Phil
* Re: Linux software RAID assistance
From: Phil Turmel @ 2011-02-18 16:10 UTC
To: simonmcnair; +Cc: NeilBrown, linux-raid

On 02/18/2011 09:12 AM, Simon McNair wrote:
> Phil,
> Tell me what you want me to do and I'm game for it.  When I bought it
> the reviews were not all that promising
> (http://www.google.co.uk/products/catalog?hl=&q=AOC-SASLP-MV8&cid=15596376504116122379&os=reviews)
> but I figure getting up and running is more important than SATA speed
> at the moment.

I perused the cards reviewed here:

http://blog.zorinaq.com/?e=10

I was aiming for the ~$100 mark, for personal use.  I'd have chosen one of the LSI 1068E units, but the cheaper ones are reversed boards (parts on the opposite side from normal).  The Marvell 6480 was apparently the next best deal.  I may have to rethink that.

In any case, some benchmarking options:

1) Spot check with "hdparm -t /dev/sd?".  Not considered very accurate.  It's read-only, though, so it can run in parallel with other device activity (with correspondingly reduced accuracy).

2) Run a packaged set of benchmarks:
http://www.coker.com.au/bonnie++/
http://www.iozone.org/
http://dbench.samba.org/

3) Script a series of basic tests of various tasks:
https://raid.wiki.kernel.org/index.php/Home_grown_testing_methods

Most of these tests require a writable filesystem on the target device to work in.

Phil
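The spot check in option 1) is easy to script across the whole set — a small sketch using the device names from this thread, best run while the disks are otherwise idle:

    for d in /dev/sd[d-m]; do
        printf '%s: ' "$d"
        hdparm -t "$d" | grep Timing
    done

Since hdparm -t reads straight from the block device, it needs no filesystem; it's the packaged benchmarks in 2) and the scripted tests in 3) that want a writable filesystem to work in.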
* Re: Linux software RAID assistance
From: Roberto Spadim @ 2011-02-18 16:38 UTC
To: Phil Turmel; +Cc: simonmcnair, NeilBrown, linux-raid

dd if=/dev/XXX of=/dev/null iflag=direct

and in another terminal run:

iostat -d 1 -k

2011/2/18 Phil Turmel <philip@turmel.org>:
> On 02/18/2011 09:12 AM, Simon McNair wrote:
[trim /]
> In any case, some benchmarking options:
>
> 1) Spot check with "hdparm -t /dev/sd?".  Not considered very accurate.  It's read-only, though, so it can run in parallel with other device activity (with correspondingly reduced accuracy).
>
> 2) Run a packaged set of benchmarks:
> http://www.coker.com.au/bonnie++/
> http://www.iozone.org/
> http://dbench.samba.org/
>
> 3) Script a series of basic tests of various tasks:
> https://raid.wiki.kernel.org/index.php/Home_grown_testing_methods
>
> Most of these tests require a writable filesystem on the target device to work in.
>
> Phil

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
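One hedge on that dd line: with no bs= argument, dd issues 512-byte direct reads, which will badly understate a drive's sequential rate. Something along these lines (sdX and the 4 GiB count are placeholders) should be more representative:

    dd if=/dev/sdX of=/dev/null bs=1M count=4096 iflag=direct

with the iostat in the other terminal showing per-device throughput as it runs.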
[parent not found: <AANLkTi=RmR5nVnmFLuqK5anHc3WDPxjuYjitT6+5wAqS@mail.gmail.com>]
* Re: Linux software RAID assistance
From: Phil Turmel @ 2011-02-20 18:48 UTC
To: Simon Mcnair; +Cc: NeilBrown, linux-raid

On 02/20/2011 12:03 PM, Simon Mcnair wrote:
> Hi Phil,
>
> Is this fsck (fsck.ext4 -n -b 32768 /dev/mapper/lvm--raid-RAID >> fsck.txt)
> as bad as it looks? :-(

It's bad.  Either the original sdd has a lot more corruption than I expected, or the 3ware spread corruption over all the drives.

If the former, failing it out of the array might help.  If the latter, your data is likely toast.  Some identifiable data is being found, based on the used vs. free block/inode/directory counts in that report.  That's good.

I suggest you do "mdadm /dev/md0 --fail /dev/sdi1" and repeat the "fsck -n" as above.

(It'll be noticeably slower, as it'll be using parity to reconstruct 1 out of every 9 chunks.)

If the fsck results improve, or stay the same, proceed to "fsck -y", and we'll see.

Wouldn't hurt to run "iostat -xm 5" in another terminal during the fsck to see what kind of performance that array is getting.

Phil
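Laid out as commands — read-only until the final step, and using the device and LV names from earlier in the thread — that plan is roughly:

    mdadm /dev/md0 --fail /dev/sdi1
    fsck.ext4 -n -b 32768 /dev/mapper/lvm--raid-RAID    # repeat the read-only check
    # only if the report is no worse than before:
    fsck.ext4 -y -b 32768 /dev/mapper/lvm--raid-RAID

Failing sdi1 makes md reconstruct that member's chunks from the remaining drives plus parity, which is why the check runs slower but also why it sidesteps corruption confined to that one disk.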
* Re: Linux software RAID assistance
From: Simon Mcnair @ 2011-02-20 19:25 UTC
To: Phil Turmel; +Cc: NeilBrown, linux-raid

That's not good :-(  I've done the --fail and am running fsck at the
moment.  iostat as below:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.96    0.00    4.01   10.98    0.00   74.05

Device:  rrqm/s  wrqm/s      r/s   w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sda1       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sda2       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sdd     4455.20    0.00   497.80  0.00  19.35   0.00    79.59     1.66   3.34   1.11  55.20
sdd1    4455.20    0.00   497.80  0.00  19.35   0.00    79.59     1.66   3.34   1.11  55.20
sde     4454.20    0.00   498.60  0.00  19.35   0.00    79.47     1.63   3.26   1.13  56.20
sde1    4454.20    0.00   498.60  0.00  19.35   0.00    79.47     1.63   3.26   1.13  56.20
sdf     4311.60    0.00   615.60  0.00  19.33   0.00    64.31     1.22   1.98   0.60  37.20
sdf1    4311.60    0.00   615.60  0.00  19.33   0.00    64.31     1.22   1.98   0.60  37.20
sdg     4262.60    0.00   659.60  0.00  19.35   0.00    60.08     1.83   2.77   0.79  52.20
sdg1    4262.60    0.00   659.60  0.00  19.35   0.00    60.08     1.83   2.77   0.79  52.20
sdh     4242.20    0.00   665.80  0.00  19.36   0.00    59.54     1.67   2.51   0.63  42.00
sdh1    4242.20    0.00   665.80  0.00  19.36   0.00    59.54     1.67   2.51   0.63  42.00
sdi        0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sdi1       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sdj     4382.00    0.00   567.20  0.00  19.34   0.00    69.82     1.35   2.38   0.74  42.20
sdj1    4382.00    0.00   567.20  0.00  19.34   0.00    69.82     1.35   2.38   0.74  42.20
sdk     4341.40    0.00   605.60  0.00  19.34   0.00    65.42     1.71   2.82   0.89  53.80
sdk1    4341.40    0.00   605.60  0.00  19.34   0.00    65.42     1.71   2.82   0.89  53.80
sdl     4368.20    0.00   579.20  0.00  19.33   0.00    68.36     1.78   3.07   0.99  57.60
sdl1    4368.20    0.00   579.20  0.00  19.33   0.00    68.36     1.78   3.07   0.99  57.60
sdm     4351.00    0.00   591.40  0.00  19.32   0.00    66.92     2.17   3.68   1.09  64.60
sdm1    4351.00    0.00   591.40  0.00  19.32   0.00    66.92     2.17   3.68   1.09  64.60
dm-0       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
dm-1       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
dm-2       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
md0        0.00    0.00 24808.00  0.00  96.91   0.00     8.00     0.00   0.00   0.00   0.00
dm-3       0.00    0.00  5735.60  0.00  22.40   0.00     8.00    26.08   4.55   0.17  99.20
sdr        0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sdr1       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sds        0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sds1       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00

Simon

On 20 February 2011 18:48, Phil Turmel <philip@turmel.org> wrote:
> On 02/20/2011 12:03 PM, Simon Mcnair wrote:
>> Hi Phil,
>>
>> Is this fsck (fsck.ext4 -n -b 32768 /dev/mapper/lvm--raid-RAID >> fsck.txt)
>> as bad as it looks? :-(
>
> It's bad.  Either the original sdd has a lot more corruption than I expected, or the 3ware spread corruption over all the drives.
>
> If the former, failing it out of the array might help.  If the latter, your data is likely toast.  Some identifiable data is being found, based on the used vs. free block/inode/directory counts in that report.  That's good.
>
> I suggest you do "mdadm /dev/md0 --fail /dev/sdi1" and repeat the "fsck -n" as above.
>
> (It'll be noticeably slower, as it'll be using parity to reconstruct 1 out of every 9 chunks.)
>
> If the fsck results improve, or stay the same, proceed to "fsck -y", and we'll see.
>
> Wouldn't hurt to run "iostat -xm 5" in another terminal during the fsck to see what kind of performance that array is getting.
>
> Phil
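For what it's worth, that sample reads sensibly: md0 is delivering ~97 MB/s aggregate in 4k requests (avgrq-sz of 8 sectors), roughly 19 MB/s from each of the nine remaining members, with sdi all zeros now that it's failed out. Meanwhile dm-3 — the logical volume fsck is walking — sits at 99% utilization with a queue of ~26, suggesting the fsck's small reads, not the new controller, are the limiting factor here.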
* Re: Linux software RAID assistance
From: Simon Mcnair @ 2011-02-19  8:49 UTC
To: Phil Turmel; +Cc: NeilBrown, linux-raid

Hot plug isn't a problem, I have two Supermicro CSE-M35T1's.  It's
weird though that the drive order has changed so much, because I
plugged the SATA ports in to the board in the same order as the 3ware
card, i.e.:

port 0 -> 0
port 1 -> 1
port 2 -> 2
port 3 -> 3
port 4 -> 4
port 5 -> 5
port 6 -> 6
port 7 -> 7
port 8 -> m/b 7
port 9 -> m/b 8

Maybe it's the way linux/udev allocated them...

One last thing: is

echo 1 > /sys/block/sda/device/delete

the correct way to stop an AHCI SATA device?

cheers
Simon

On 16 February 2011 18:14, Phil Turmel <philip@turmel.org> wrote:
> On 02/16/2011 12:49 PM, Simon McNair wrote:
>> Hi Phil,
>> A couple of questions please.
>
> [trim /]
>
>>>> Simon
>>> I don't know why the serial numbers are formatted differently, but we can still tell them apart (the eight characters starting with "PAG").
>>>
>>> So, our device order in your new setup is: [ihgfmlkjbc], where /dev/sdi corresponds to the original report's /dev/sdd, which matches the sig grep in your other note.
>>>
>>> Another note: The controller for sd[abc] is still showing ata_piix as its controller.  That means you cannot hot-plug those ports.  If you change your BIOS to AHCI mode instead of "Compatibility" or "Emulation", the full-featured ahci driver will run those ports.  Not urgent, but I highly recommend it.
>>>
>> Will do that now, before I forget
>
> Hot-pluggability with suitable trays is very handy!  :)
>
> [trim /]
>
>>>> Error: The backup GPT table is not at the end of the disk, as it should be.  This might mean that another operating
>>>> system believes the disk is smaller.  Fix, by moving the backup to the end (and removing the old backup)?
>>>> Fix/Cancel? c
>>> The 3ware controller must have reserved some space at the end of each drive for its own use.  Didn't know it'd do that.  You will have to fix that.
>>>
>>> [trim /]
>>>
>> Do you have any suggestions on how I can fix that?  I don't have a clue.
>
> Just do 'parted /dev/sd?' and on the ones it offers to fix, say yes.  Then request 'unit s' and 'print' to verify that it is correct.
>
> [trim /]
>
>> When I was trying to figure out the command for this using 'man parted' I came across this:
>> "  rescue start end
>>    Rescue a lost partition that was located somewhere between start and end.  If a partition is
>>    found, parted will ask if you want to create an entry for it in the partition table."
>> Is it worth trying?
>
> Nah.  That's for when you don't know exactly where the partition is.  We know.
>
>> I originally created the partitions like so:
>>
>> parted -s /dev/sdb rm 1
>> parted -s /dev/sdb mklabel gpt
>> parted -s --align optimal /dev/sdb mkpart primary ext4 .512 100%
>> parted -s /dev/sdb set 1 raid on
>> parted -s /dev/sdb align-check optimal 1
>>
>> so to recreate the above I would do:
>>
>> parted -s /dev/sdb mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdc mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdf mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdg mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdh mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdi mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdj mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdk mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdl mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdm mkpart primary ext4 2048s 1953101823s
>
> Only recreate the partition tables where you have to, i.e. where the 'Fix' option above didn't work.  And don't specify a filesystem.
>
> Probably just /dev/sdh and /dev/sdi.  Like so, though:
>
> parted -s /dev/sdh mklabel gpt mkpart primary 2048s 1953101823s set 1 raid on
> parted -s /dev/sdi mklabel gpt mkpart primary 2048s 1953101823s set 1 raid on
>
>> I'm guessing the backups that I want to do can wait until any potential fsck?
>
> Do an 'fsck -N' first, and if it passes, or has few errors, mount the filesystem readonly and grab your backup.  Then let fsck have at it for real.  If anything gets fixed, compare your backup from the read-only fs to the fixed fs.
>
> Given your flaky old controller, I expect there'll be *some* problems.
>
>> Sorry if the questions are dumb, but I'm not sure what I'm doing and I'd rather ask more questions than fewer and understand the implications of what I'm doing.
>
> Oh, no.  You are right to be paranoid.  If anything looks funny, stop.
>
> Phil
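On the sysfs question left open above: writing 1 to /sys/block/sdX/device/delete does cleanly offline and remove a SATA disk from the kernel, and on an AHCI port the disk can be brought back afterwards by rescanning its SCSI host. A sketch, with sdX and hostN as placeholders:

    echo 1 > /sys/block/sdX/device/delete            # flush and detach sdX
    echo '- - -' > /sys/class/scsi_host/hostN/scan   # re-probe that host's ports

The matching hostN appears in the device's sysfs path (readlink /sys/block/sdX), and any md array using the disk should be stopped first so the disappearance isn't treated as a member failure.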
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-16 13:56 UTC
To: Phil Turmel; +Cc: NeilBrown, linux-raid

One other snippet:

proxmox:/home/simon# for x in /dev/sd{d..m} ; do echo $x ; dd if=$x skip=2312 count=128 2>/dev/null | strings | grep 9fAJEz-HcaP-RQ51-fV8b-nxrN-Uqwb-PPnOLJ ; done
/dev/sdd
/dev/sde
/dev/sdf
/dev/sdg
/dev/sdh
/dev/sdi
id = "9fAJEz-HcaP-RQ51-fV8b-nxrN-Uqwb-PPnOLJ"
id = "9fAJEz-HcaP-RQ51-fV8b-nxrN-Uqwb-PPnOLJ"
/dev/sdj
/dev/sdk
/dev/sdl
/dev/sdm

On 15/02/2011 14:51, Phil Turmel wrote:
> Hi Neil,
>
> Since Simon has responded, let me summarize the assistance I provided per his off-list request:
>
> On 02/14/2011 11:53 PM, NeilBrown wrote:
>> On Thu, 10 Feb 2011 16:16:44 +0000 Simon McNair <simonmcnair@gmail.com> wrote:
>>
>>> Hi all
>>>
>>> I use a 3ware 9500-12 port sata card (JBOD) which will not work without a
>>> 128mb sodimm.  The sodimm socket is flakey and the result is that the
>>> machine occasionally crashes.  Yesterday I finally gave in and put
>>> together another machine so that I can rsync between them.  When I turned
>>> the machine on today to set up rsync, the RAID array was not gone, but
>>> corrupted.  Typical...
>>
>> Presumably the old machine was called 'ubuntu' and the new machine 'proølox'
>>
>>> I built the array in Aug 2010 using the following command:
>>>
>>> mdadm --create --verbose /dev/md0 --metadata=1.1 --level=5
>>> --raid-devices=10 /dev/sd{b,c,d,e,f,g,h,i,j,k}1 --chunk=64
>>>
>>> Using LVM, I did the following:
>>> pvscan
>>> pvcreate -M2 /dev/md0
>>> vgcreate lvm-raid /dev/md0
>>> vgdisplay lvm-raid
>>> vgscan
>>> lvscan
>>> lvcreate -v -l 100%VG -n RAID lvm-raid
>>> lvdisplay /dev/lvm-raid/lvm0
>>>
>>> I then formatted using:
>>> mkfs -t ext4 -v -m .1 -b 4096 -E stride=16,stripe-width=144
>>> /dev/lvm-raid/RAID
>>>
>>> This worked perfectly since I created the array.  Now mdadm is coming up
>>> with
>>>
>>> proxmox:/dev/md# mdadm --assemble --scan --verbose
>>> mdadm: looking for devices for further assembly
>>> mdadm: no recogniseable superblock on /dev/md/ubuntu:0
>>
>> And it seems that ubuntu:0 has been successfully assembled.
>> It is missing one device for some reason (sdd1) but RAID can cope with that.
>
> The 3ware card is compromised, with a loose buffer memory dimm.  Some of its ECC errors were caught and reported in dmesg.  It's likely, based on the loose memory socket, that many multiple-bit errors got through.
>
> [trim /]
>
>>> mdadm: no uptodate device for slot 8 of /dev/md/proølox:0
>>> mdadm: no uptodate device for slot 9 of /dev/md/proølox:0
>>> mdadm: failed to add /dev/sdd1 to /dev/md/proølox:0: Invalid argument
>>> mdadm: /dev/md/proølox:0 assembled from 0 drives - not enough to start
>>> the array.
>>
>> This looks like it is *after* trying the --create command you give
>> below.  It is best to report things in the order they happen, else you can
>> confuse people (or get caught out!).
>
> Yes, this was after.
>
>>> mdadm: looking for devices for further assembly
>>> mdadm: no recogniseable superblock on /dev/sdd
>>> mdadm: No arrays found in config file or automatically
>>>
>>> pvscan and vgscan show nothing.
>>>
>>> So I tried running mdadm --create --verbose /dev/md0 --metadata=1.1
>>> --level=5 --raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1
>>> /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64
>>>
>>> as it seemed that /dev/sdd1 failed to be added to the array.  This did
>>> nothing.
>>
>> It did not do nothing.  It wrote a superblock to /dev/sdd1 and complained
>> that it couldn't write to all the others --- didn't it?
>
> There were multiple attempts to create.  One wrote to just sdd1, another succeeded with all but sdd1.
>
>>> dmesg contains:
>>>
>>> md: invalid superblock checksum on sdd1
>>
>> I guess that is why sdd1 was missing from 'ubuntu:0'.  Though as I cannot
>> tell if this happened before or after any of the various things reported
>> above, it is hard to be sure.
>>
>> The real mystery is why 'pvscan' reports nothing.
>
> The original array was created with mdadm v2.6.7, and had a data offset of 264 sectors.  After Simon's various attempts to --create, he ended up with a data offset of 2048, using mdadm v3.1.4.  The mdadm -E reports he posted to the list showed the 264 offset.  We didn't realize the offset had been updated until somewhat later in our troubleshooting efforts.
>
> In any case, pvscan couldn't see the LVM signature because it wasn't there (at offset 2048).
>
>> What about
>>    pvscan --verbose
>> or
>>    blkid -p /dev/md/ubuntu:0
>> or even
>>    dd if=/dev/md/ubuntu:0 count=8 | od -c
>
> Fortunately, Simon did have a copy of his LVM configuration.  With the help of dd, strings, and grep, we did locate his LVM sig at the correct location on sdd1 (for data offset 264).  After a number of attempts to bypass LVM and access his single LV with dmsetup (based on his backed-up configuration, on the assembled new array less sdd1), I realized that the data offset was wrong on the recreated array, and went looking for the cause.  I found your git commit that changed that logic last spring, and recommended that Simon revert to the default package for his ubuntu install, which is v2.6.7.
>
> Simon has now attempted to recreate the array with v2.6.7, but the controller is throwing too many errors to succeed, and I suggested it was too flaky to trust any further.  Based on the existence of the LVM sig on sdd1, I believe Simon's data is (mostly) intact, and only needs a successful create operation with a properly functioning controller.  (He might also need to perform an lvm vgcfgrestore, but he has the necessary backup file.)
>
> A new controller is on order.
>
> Phil
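Two details in that summary are worth spelling out. First, the skip=2312 in the dd snippet above is just 2048 + 264: each member partition starts at sector 2048 of its disk, and the original v1.1 superblock put the data 264 sectors into the partition, so the LVM label and metadata text begin 2312 sectors into the raw disk — finding the id from the LVM backup there is what confirmed the old layout. Second, the offset in question is visible directly in the examine output, so any re-created array can be sanity-checked immediately, along the lines of:

    mdadm -E /dev/sdd1 | grep -i 'data offset'

which should report 264 sectors under the old mdadm behaviour and 2048 under the newer default described above.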
End of thread.

Thread overview: 64+ messages

2011-02-10 16:16 Linux software RAID assistance Simon McNair
2011-02-10 18:24 ` Phil Turmel
2011-02-15  4:53 ` NeilBrown
2011-02-15  8:48 ` Simon McNair
2011-02-15 14:51 ` Phil Turmel
2011-02-15 19:04 ` Simon McNair
2011-02-15 19:37 ` Phil Turmel
2011-02-15 19:45 ` Roman Mamedov
2011-02-15 21:09 ` Simon McNair
2011-02-17 15:10 ` Simon Mcnair
2011-02-17 15:42 ` Roman Mamedov
2011-02-18  9:13 ` Simon McNair
2011-02-18  9:38 ` Robin Hill
2011-02-18 10:38 ` Simon Mcnair
2011-02-19 11:46 ` Jan Ceuleers
2011-02-19 12:40 ` Simon McNair
2011-02-19 17:37 ` Jan Ceuleers
2011-02-16 13:51 ` Simon McNair
2011-02-16 14:37 ` Phil Turmel
2011-02-16 17:49 ` Simon McNair
2011-02-16 18:14 ` Phil Turmel
2011-02-16 18:18 ` Simon McNair
2011-02-16 18:22 ` Phil Turmel
2011-02-16 18:25 ` Phil Turmel
2011-02-16 18:52 ` Simon McNair
2011-02-16 18:57 ` Phil Turmel
2011-02-16 19:07 ` Simon McNair
2011-02-16 19:10 ` Phil Turmel
2011-02-16 19:15 ` Simon McNair
2011-02-16 19:36 ` Phil Turmel
2011-02-16 21:28 ` Simon McNair
2011-02-16 21:30 ` Phil Turmel
2011-02-16 22:44 ` Simon Mcnair
2011-02-16 23:39 ` Phil Turmel
2011-02-17 13:26 ` Simon Mcnair
2011-02-17 13:48 ` Phil Turmel
2011-02-17 13:56 ` Simon Mcnair
2011-02-17 14:34 ` Simon Mcnair
2011-02-17 16:54 ` Phil Turmel
2011-02-19  8:43 ` Simon Mcnair
2011-02-19 15:30 ` Phil Turmel
[not found] ` <AANLkTinOXJWRw_et2U43R_T9XPBzQLnN56Kf2bOAz=_c@mail.gmail.com>
2011-02-19 16:19 ` Phil Turmel
2011-02-20  9:56 ` Simon Mcnair
2011-02-20 19:50 ` Phil Turmel
2011-02-20 23:17 ` Simon Mcnair
2011-02-20 23:39 ` Phil Turmel
2011-02-22 17:12 ` Simon Mcnair
2011-02-22 17:14 ` Simon Mcnair
2011-02-22 18:23 ` Phil Turmel
2011-02-22 18:36 ` Simon McNair
2011-02-22 19:06 ` Phil Turmel
2011-02-18  9:31 ` Simon Mcnair
2011-02-18 13:16 ` Phil Turmel
2011-02-18 13:21 ` Roberto Spadim
2011-02-18 13:26 ` Phil Turmel
2011-02-18 13:29 ` Simon Mcnair
2011-02-18 13:34 ` Phil Turmel
2011-02-18 14:12 ` Simon McNair
2011-02-18 16:10 ` Phil Turmel
2011-02-18 16:38 ` Roberto Spadim
[not found] ` <AANLkTi=RmR5nVnmFLuqK5anHc3WDPxjuYjitT6+5wAqS@mail.gmail.com>
2011-02-20 18:48 ` Phil Turmel
2011-02-20 19:25 ` Simon Mcnair
2011-02-19  8:49 ` Simon Mcnair
2011-02-16 13:56 ` Simon McNair