* Linux software RAID assistance
From: Simon McNair @ 2011-02-10 16:16 UTC
To: linux-raid

Hi all

I use a 3ware 9500 12-port SATA card (JBOD) which will not work without a
128MB SODIMM. The SODIMM socket is flaky and the result is that the
machine occasionally crashes. Yesterday I finally gave in and put together
another machine so that I can rsync between them. When I turned the machine
on today to set up rsync, the RAID array was not gone, but corrupted.
Typical...

I built the array in Aug 2010 using the following command:

mdadm --create --verbose /dev/md0 --metadata=1.1 --level=5
--raid-devices=10 /dev/sd{b,c,d,e,f,g,h,i,j,k}1 --chunk=64

Using LVM, I did the following:
pvscan
pvcreate -M2 /dev/md0
vgcreate lvm-raid /dev/md0
vgdisplay lvm-raid
vgscan
lvscan
lvcreate -v -l 100%VG -n RAID lvm-raid
lvdisplay /dev/lvm-raid/lvm0

I then formatted using:
mkfs -t ext4 -v -m .1 -b 4096 -E stride=16,stripe-width=144
/dev/lvm-raid/RAID

This has worked perfectly since I created the array. Now mdadm is coming up
with

proxmox:/dev/md# mdadm --assemble --scan --verbose
mdadm: looking for devices for further assembly
mdadm: no recogniseable superblock on /dev/md/ubuntu:0
mdadm: cannot open device /dev/dm-2: Device or resource busy
mdadm: cannot open device /dev/dm-1: Device or resource busy
mdadm: cannot open device /dev/dm-0: Device or resource busy
mdadm: cannot open device /dev/sdm1: Device or resource busy
mdadm: cannot open device /dev/sdm: Device or resource busy
mdadm: cannot open device /dev/sdl1: Device or resource busy
mdadm: cannot open device /dev/sdl: Device or resource busy
mdadm: cannot open device /dev/sdk1: Device or resource busy
mdadm: cannot open device /dev/sdk: Device or resource busy
mdadm: cannot open device /dev/sdj1: Device or resource busy
mdadm: cannot open device /dev/sdj: Device or resource busy
mdadm: cannot open device /dev/sdh1: Device or resource busy
mdadm: cannot open device /dev/sdh: Device or resource busy
mdadm: cannot open device /dev/sdi1: Device or resource busy
mdadm: cannot open device /dev/sdi: Device or resource busy
mdadm: cannot open device /dev/sdg1: Device or resource busy
mdadm: cannot open device /dev/sdg: Device or resource busy
mdadm: cannot open device /dev/sdf1: Device or resource busy
mdadm: cannot open device /dev/sdf: Device or resource busy
mdadm: cannot open device /dev/sde1: Device or resource busy
mdadm: cannot open device /dev/sde: Device or resource busy
mdadm: no RAID superblock on /dev/sdd
mdadm: cannot open device /dev/sda2: Device or resource busy
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sdd1 is identified as a member of /dev/md/proølox:0, slot 0.
mdadm: no uptodate device for slot 1 of /dev/md/proølox:0
mdadm: no uptodate device for slot 2 of /dev/md/proølox:0
mdadm: no uptodate device for slot 3 of /dev/md/proølox:0
mdadm: no uptodate device for slot 4 of /dev/md/proølox:0
mdadm: no uptodate device for slot 5 of /dev/md/proølox:0
mdadm: no uptodate device for slot 6 of /dev/md/proølox:0
mdadm: no uptodate device for slot 7 of /dev/md/proølox:0
mdadm: no uptodate device for slot 8 of /dev/md/proølox:0
mdadm: no uptodate device for slot 9 of /dev/md/proølox:0
mdadm: failed to add /dev/sdd1 to /dev/md/proølox:0: Invalid argument
mdadm: /dev/md/proølox:0 assembled from 0 drives - not enough to start
the array.
mdadm: looking for devices for further assembly
mdadm: no recogniseable superblock on /dev/sdd
mdadm: No arrays found in config file or automatically

pvscan and vgscan show nothing.

So I tried running

mdadm --create --verbose /dev/md0 --metadata=1.1 --level=5
--raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
/dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64

as it seemed that /dev/sdd1 failed to be added to the array. This did
nothing.

dmesg contains:

md: invalid superblock checksum on sdd1
md: sdd1 does not have a valid v1.1 superblock, not importing!
md: md_import_device returned -22

The output of mdadm -E is as follows:

proxmox:~# mdadm -E /dev/sd[d-m]1
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : c4f62f32:4244a1db:b7746203:f10b5227
           Name : proølox:0
  Creation Time : Sat Aug 21 19:16:38 2010
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : b0ffd0a1:866676af:3ae4c03c:d219676e

    Update Time : Mon Feb  7 21:08:29 2011
       Checksum : 13aa9685 - expected 93aa9672
         Events : 60802

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 0
    Array State : AAAAAAAAAA ('A' == active, '.' == missing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 4121fa31:7543218c:f42a937d:fddf04e8

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 5f965c84 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 1
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : e70da8ed:c80e9533:7d8e200e:dc285255

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : c092c0e5 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 2
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 5878fdbb:d0c7c892:40c8fa6c:0c5e257b

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 72c95353 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 3
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 7bced3d0:5ebac414:02effc8b:ac1ad69e

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 4c4e7f67 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 4
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdi1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 56513558:a25025bb:11e8b563:ae42f8d4

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 87ebb998 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 5
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdj1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 90621937:23cff36c:c55b1581:ed28b433

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 94b9a346 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 6
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdk1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 72d5ce06:855aa34c:3a354abc:0ddb6e80

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : cc0e2d20 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 7
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdl1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 8792e04a:ee298ace:fec79850:ddd3167b

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 20fd4534 - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 8
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)
/dev/sdm1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
           Name : ubuntu:0
  Creation Time : Thu Feb 10 11:59:48 2011
     Raid Level : raid5
   Raid Devices : 10

 Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
     Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
  Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 8916d09c:448a5791:c2923182:ab99eb92

    Update Time : Thu Feb 10 12:17:59 2011
       Checksum : 7f27ba1f - correct
         Events : 4

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 9
    Array State : .AAAAAAAAA ('A' == active, '.' == missing)

I'm confident the data is on there, I just need to get a backup of it off!

Yours desperately,
Simon McNair
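[Aside: the --create in the post above turns out to be the critical misstep
in this thread, so it is worth noting how to capture the surviving metadata
with read-only commands before trying anything destructive. A minimal
sketch, assuming the members are /dev/sd[d-m]1 as above; the
/root/raid-rescue path is hypothetical:

    mkdir -p /root/raid-rescue
    for d in /dev/sd[d-m]1; do
        b=$(basename "$d")
        # record each member's superblock before anything rewrites it
        mdadm -E "$d" > "/root/raid-rescue/$b.examine" 2>&1
        # keep a copy of the first MiB, where the v1.1 superblock lives
        dd if="$d" of="/root/raid-rescue/$b.head" bs=512 count=2048
    done
    cat /proc/mdstat > /root/raid-rescue/mdstat.txt

Assembly, even with --force, only touches event counters and flags;
--create rewrites the superblocks wholesale, which is why the examine
output is worth saving first.]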
* Re: Linux software RAID assistance
From: Phil Turmel @ 2011-02-10 18:24 UTC
To: simonmcnair; +Cc: linux-raid

Hi Simon,

On 02/10/2011 11:16 AM, Simon McNair wrote:
> [trim /]
>
> So I tried running mdadm --create --verbose /dev/md0 --metadata=1.1
> --level=5 --raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1
> /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64

Uh-oh. You told mdadm to create an array with a missing member, but you
didn't tell it to --assume-clean.

> as it seemed that /dev/sdd1 failed to be added to the array. This did
> nothing.

According to your mdadm -E output below, it did what you asked: it created
a new array from the devices given (note the new array UUID on the nine
devices). Since you have one device missing, it can't rebuild, so you might
not have lost your data yet.

That mdadm -E /dev/sdd1 reports its array assembled and clean is odd.

Please show us the output of /proc/mdstat and then "mdadm -D /dev/md*" or
"mdadm -D /dev/md/*"

Phil

> dmesg contains:
>
> md: invalid superblock checksum on sdd1
> md: sdd1 does not have a valid v1.1 superblock, not importing!
> md: md_import_device returned -22
>
> The output of mdadm -E is as follows:
>
> proxmox:~# mdadm -E /dev/sd[d-m]1
> /dev/sdd1:
>           Magic : a92b4efc
>         Version : 1.1
>     Feature Map : 0x0
>      Array UUID : c4f62f32:4244a1db:b7746203:f10b5227
>            Name : proølox:0
>   Creation Time : Sat Aug 21 19:16:38 2010
>      Raid Level : raid5
>    Raid Devices : 10
>
>  Avail Dev Size : 1953099512 (931.31 GiB 999.99 GB)
>      Array Size : 17577894528 (8381.79 GiB 8999.88 GB)
>   Used Dev Size : 1953099392 (931.31 GiB 999.99 GB)
>     Data Offset : 264 sectors
>    Super Offset : 0 sectors
>           State : clean
>     Device UUID : b0ffd0a1:866676af:3ae4c03c:d219676e
>
>     Update Time : Mon Feb  7 21:08:29 2011
>        Checksum : 13aa9685 - expected 93aa9672
>          Events : 60802
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Device Role : Active device 0
>     Array State : AAAAAAAAAA ('A' == active, '.' == missing)
> /dev/sde1:
>           Magic : a92b4efc
>         Version : 1.1
>     Feature Map : 0x0
>      Array UUID : 7d55c29e:273c35da:f6438197:0365f95f
>            Name : ubuntu:0
>   Creation Time : Thu Feb 10 11:59:48 2011
> [trim /]
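[Aside: the --assume-clean flag Phil mentions tells md not to write
anything after creating the array. A sketch only, under the assumption that
every parameter (metadata version, chunk, level, device order) matches the
original exactly; it is illustrative, not a recommendation:

    # --assume-clean skips the initial sync pass, and "missing" keeps the
    # array degraded so no parity reconstruction is written either
    mdadm --create /dev/md0 --metadata=1.1 --level=5 --chunk=64 \
          --raid-devices=10 --assume-clean \
          missing /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 \
          /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1

As this thread later shows, the mdadm version also matters: newer releases
choose a different data offset, which silently moves where the array's data
starts on each member.]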
* Re: Linux software RAID assistance
From: NeilBrown @ 2011-02-15 4:53 UTC
To: simonmcnair; +Cc: linux-raid

On Thu, 10 Feb 2011 16:16:44 +0000 Simon McNair <simonmcnair@gmail.com> wrote:

> I use a 3ware 9500 12-port SATA card (JBOD) which will not work without a
> 128MB SODIMM. The SODIMM socket is flaky and the result is that the
> machine occasionally crashes. Yesterday I finally gave in and put together
> another machine so that I can rsync between them. When I turned the machine
> on today to set up rsync, the RAID array was not gone, but corrupted.
> Typical...

Presumably the old machine was called 'ubuntu' and the new machine 'proølox'

> [trim /]
>
> proxmox:/dev/md# mdadm --assemble --scan --verbose
> mdadm: looking for devices for further assembly
> mdadm: no recogniseable superblock on /dev/md/ubuntu:0

And it seems that ubuntu:0 has been successfully assembled.
It is missing one device for some reason (sdd1) but RAID can cope with that.

> [trim /]
>
> mdadm: /dev/sdd1 is identified as a member of /dev/md/proølox:0, slot 0.
> mdadm: no uptodate device for slot 1 of /dev/md/proølox:0
> [trim /]
> mdadm: no uptodate device for slot 9 of /dev/md/proølox:0
> mdadm: failed to add /dev/sdd1 to /dev/md/proølox:0: Invalid argument
> mdadm: /dev/md/proølox:0 assembled from 0 drives - not enough to start
> the array.

This looks like it is *after* trying the --create command you give
below. It is best to report things in the order they happen, else you can
confuse people (or get caught out!).

> mdadm: looking for devices for further assembly
> mdadm: no recogniseable superblock on /dev/sdd
> mdadm: No arrays found in config file or automatically
>
> pvscan and vgscan show nothing.
>
> So I tried running mdadm --create --verbose /dev/md0 --metadata=1.1
> --level=5 --raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1
> /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64
>
> as it seemed that /dev/sdd1 failed to be added to the array. This did
> nothing.

It did not do nothing. It wrote a superblock to /dev/sdd1 and complained
that it couldn't write to all the others --- didn't it?

> dmesg contains:
>
> md: invalid superblock checksum on sdd1

I guess that is why sdd1 was missing from 'ubuntu:0'. Though as I cannot
tell whether this happened before or after any of the various things
reported above, it is hard to be sure.

The real mystery is why 'pvscan' reports nothing.

What about

pvscan --verbose

or

blkid -p /dev/md/ubuntu:0

or even

dd if=/dev/md/ubuntu:0 count=8 | od -c

??

NeilBrown
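[Aside: a concrete way to run Neil's last check. LVM2 stamps each PV with a
"LABELONE" label in one of the first four 512-byte sectors, so if the PV
really does start at the beginning of the assembled array it shows up like
this (a sketch, assuming the array device name above):

    dd if=/dev/md/ubuntu:0 bs=512 count=4 2>/dev/null | strings | grep LABELONE

No output here, with the label later found elsewhere on the raw disks, is
exactly the symptom of a shifted data offset.]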
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-15 8:48 UTC
To: NeilBrown; +Cc: linux-raid

Hi Neil,
Thanks for the response. Phil Turmel has been giving me some kind
assistance. After some investigation we have found that the 3ware card I'm
using as a controller (as JBOD) is making the whole thing far too flaky and
causing some data corruption (hopefully not for too long). Thankfully this
is mainly a data archive so the data changes very infrequently.

I have ordered a Supermicro 8-port SATA controller and intend to use that,
in conjunction with the 6 onboard SATA ports, to try and get the array back
in shape. Some seriously bad stuff has happened, so I'm hoping the data is
still retrievable. I'll cc the latest update to the list. Any input you can
provide would be seriously welcome.

regards
Simon

On 15/02/2011 04:53, NeilBrown wrote:
> [trim /]
* Re: Linux software RAID assistance 2011-02-15 4:53 ` NeilBrown 2011-02-15 8:48 ` Simon McNair @ 2011-02-15 14:51 ` Phil Turmel 2011-02-15 19:04 ` Simon McNair ` (2 more replies) 1 sibling, 3 replies; 64+ messages in thread From: Phil Turmel @ 2011-02-15 14:51 UTC (permalink / raw) To: NeilBrown; +Cc: simonmcnair, linux-raid Hi Neil, Since Simon has responded, let me summarize the assistance I provided per his off-list request: On 02/14/2011 11:53 PM, NeilBrown wrote: > On Thu, 10 Feb 2011 16:16:44 +0000 Simon McNair <simonmcnair@gmail.com> wrote: > >> >> Hi all >> >> I use a 3ware 9500-12 port sata card (JBOD) which will not work without a >> 128mb sodimm. The sodimm socket is flakey and the result is that the >> machine occasionally crashes. Yesterday I finally gave in and put >> together another >> machine so that I can rsync between them. When I turned the machine >> on today to set up rync, the RAID array was not gone, but corrupted. >> Typical... > > Presumably the old machine was called 'ubuntu' and the new machine 'proølox' > > >> >> I built the array in Aug 2010 using the following command: >> >> mdadm --create --verbose /dev/md0 --metadata=1.1 --level=5 >> --raid-devices=10 /dev/sd{b,c,d,e,f,g,h,i,j,k}1 --chunk=64 >> >> Using LVM, I did the following: >> pvscan >> pvcreate -M2 /dev/md0 >> vgcreate lvm-raid /dev/md0 >> vgdisplay lvm-raid >> vgscan >> lvscan >> lvcreate -v -l 100%VG -n RAID lvm-raid >> lvdisplay /dev/lvm-raid/lvm0 >> >> I then formatted using: >> mkfs -t ext4 -v -m .1 -b 4096 -E stride=16,stripe-width=144 >> /dev/lvm-raid/RAID >> >> This worked perfectly since I created the array. Now mdadm is coming up >> with >> >> proxmox:/dev/md# mdadm --assemble --scan --verbose >> mdadm: looking for devices for further assembly >> mdadm: no recogniseable superblock on /dev/md/ubuntu:0 > > And it seems that ubuntu:0 have been successfully assembled. > It is missing one device for some reason (sdd1) but RAID can cope with that. 3ware card is compromised, with a loose buffer memory dimm. Some of its ECC errors were caught and reported in dmesg. Its likely, based on the loose memory socket, that many multiple-bit errors got through. [trim /] >> mdadm: no uptodate device for slot 8 of /dev/md/pro�lox:0 >> mdadm: no uptodate device for slot 9 of /dev/md/pro�lox:0 >> mdadm: failed to add /dev/sdd1 to /dev/md/pro�lox:0: Invalid argument >> mdadm: /dev/md/pro�lox:0 assembled from 0 drives - not enough to start >> the array. > > This looks like it is *after* to trying the --create command you give > below.. It is best to report things in the order they happen, else you can > confuse people (or get caught out!). Yes, this was after. >> mdadm: looking for devices for further assembly >> mdadm: no recogniseable superblock on /dev/sdd >> mdadm: No arrays found in config file or automatically >> >> pvscan and vgscan show nothing. >> >> So I tried running mdadm --create --verbose /dev/md0 --metadata=1.1 >> --level=5 --raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1 >> /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64 >> >> as it seemed that /dev/sdd1 failed to be added to the array. This did >> nothing. > > It did not to nothing. It wrote a superblock to /dev/sdd1 and complained > that it couldn't write to all the others --- didn't it? There were multiple attempts to create. One wrote to just sdd1, another succeeded with all but sdd1. >> dmesg contains: >> >> md: invalid superblock checksum on sdd1 > > I guess that is why sdd1 was missing from 'ubuntu:0'. 
Though as I cannot > tell if this happened before or after any of the various things reported > above, it is hard to be sure. > > > The real mystery is why 'pvscan' reports nothing. The original array was created with mdadm v2.6.7, and had a data offset of 264 sectors. After Simon's various attempts to --create, he ended up with data offset of 2048, using mdadm v3.1.4. The mdadm -E reports he posted to the list showed the 264 offset. We didn't realize the offset had been updated until somewhat later in our troubleshooting efforts. In any case, pvscan couldn't see the LVM signature because it wasn't there (at offset 2048). > What about > pvscan --verbose > > or > > blkid -p /dev/md/ubuntu:0 > > or even > > dd of=/dev/md/ubuntu:0 count=8 | od -c Fortunately, Simon did have a copy of his LVM configuration. With the help of dd, strings, and grep, we did locate his LVM sig at the correct location on sdd1 (for data offset 264). After a number of attempts to bypass LVM and access his single LV with dmsetup (based on his backed up configuration, on the assembled new array less sdd1), I realized that the data offset was wrong on the recreated array, and went looking for the cause. I found your git commit that changed that logic last spring, and recommended that Simon revert to the default package for his ubuntu install, which is v2.6.7. Simon has now attempted to recreate the array with v2.6.7, but the controller is throwing too many errors to succeed, and I suggested it was too flakey to trust any further. Based on the existence of the LVM sig on sdd1, I believe Simon's data is (mostly) intact, and only needs a successful create operation with a properly functioning controller. (He might also need to perform an lvm vgcfgrestore, but he has the necessary backup file.) A new controller is on order. Phil -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
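[Aside: a sketch of the dd/strings/grep hunt Phil describes, under the
assumptions from the thread: data offset of 264 sectors, and device 0 of a
left-symmetric RAID5 carrying the first data chunk, so the start of the old
array - and hence the LVM label - should sit 264 sectors into sdd1:

    # offsets printed by strings -t d are relative to where the read starts
    dd if=/dev/sdd1 bs=512 skip=264 count=16 2>/dev/null \
        | strings -t d | grep LABELONE

Once an array with the right geometry exposes the PV again, "vgcfgrestore
lvm-raid" can reapply the backed-up volume group metadata Phil mentions.]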
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-15 19:04 UTC
To: Phil Turmel; +Cc: NeilBrown, linux-raid

Phil,
Thanks for filling in the gaps, I had forgotten quite how much help and
assistance you had provided up to now. You're a real godsend.

To fill in some of the other pieces of info:
The original machine was called proxmox (KVM virtualisation machine) and I
used an Ubuntu live CD to see if it was the software which was preventing
progress or not. It does seem like there is data corruption in the machine
name for some reason.

The original company I ordered the controller through sent me an email at
4pm saying that they could not fulfill the order; 4pm was too late for them
to pick & pack, so there was another 24hr turnaround delay. The Supermicro
card should arrive here tomorrow and hopefully I'll be able to get a dd of
each of the drives prior to Phil coming online.

For some reason blkid doesn't exist on my machine even though I have
e2fs-utils installed. V weird.

Simon

On 15/02/2011 14:51, Phil Turmel wrote:
> [trim /]
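[Aside: "a dd of each of the drives" is the right instinct before any
further create attempts. A sketch, assuming enough scratch space mounted at
/mnt/backup (a hypothetical path); GNU ddrescue is preferable to plain dd
here because it retries around bad sectors and keeps a resumable map file:

    for d in d e f g h i j k l m; do
        ddrescue /dev/sd${d}1 /mnt/backup/sd${d}1.img /mnt/backup/sd${d}1.map
    done

Recovery experiments can then be run against loop devices over the images
instead of against the only copies of the data.]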
* Re: Linux software RAID assistance
From: Phil Turmel @ 2011-02-15 19:37 UTC
To: simonmcnair; +Cc: NeilBrown, linux-raid

On 02/15/2011 02:04 PM, Simon McNair wrote:
> [trim /]
>
> For some reason blkid doesn't exist on my machine even though I have
> e2fs-utils installed. V weird.

FWIW, on my VPS running Ubuntu Server 10.10, the blkid executable is part
of util-linux, and installed in /sbin. You might need to re-install it.

Phil
* Re: Linux software RAID assistance
From: Roman Mamedov @ 2011-02-15 19:45 UTC
To: Phil Turmel; +Cc: simonmcnair, NeilBrown, linux-raid

On Tue, 15 Feb 2011 14:37:55 -0500 Phil Turmel <philip@turmel.org> wrote:

>> For some reason blkid doesn't exist on my machine even though I have
>> e2fs-utils installed. V weird.

What's weird is that you think it is somehow an "e2fs-utils" program.

dpkg -S blkid
...
util-linux: /sbin/blkid
...

> FWIW, on my VPS running Ubuntu Server 10.10, the blkid executable is part
> of util-linux, and installed in /sbin. You might need to re-install it.

--
With respect,
Roman
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-15 21:09 UTC
To: Roman Mamedov; +Cc: Phil Turmel, NeilBrown, linux-raid

I got my info from: http://linux.die.net/man/8/blkid

"blkid is part of the e2fsprogs package since version 1.26 and is available
from http://e2fsprogs.sourceforge.net."

I also did apt-file search blkid and it showed up as e2fsprogs or was it
e2fs-utils (I forget). The system I was/am running was Debian Lenny x64.
This just shows how little I know about Linux (I've only been dabbling for
6 months or so). I'll install util-linux next time the machine starts up.
Thanks for the pointer.

regards
Simon

On 15/02/2011 19:45, Roman Mamedov wrote:
> [trim /]
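[Aside: the confusion here is easy to hit. dpkg -S only searches files
belonging to packages that are already installed, while apt-file (which
Simon used) searches the archive index, so it can answer "which package
would give me this file?". A minimal sketch, assuming apt-file is
available:

    apt-get install apt-file
    apt-file update
    apt-file search sbin/blkid

On Simon's Lenny system this evidently points at e2fsprogs (the --reinstall
later in the thread bears that out); on later releases blkid moved to
util-linux, which explains the conflicting answers above.]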
* Re: Linux software RAID assistance
From: Simon Mcnair @ 2011-02-17 15:10 UTC
To: Roman Mamedov; +Cc: Phil Turmel, NeilBrown, linux-raid

proxmox:~# dpkg -S blkid
libblkid1: /lib/libblkid.so.1.0
libblkid1: /lib/libblkid.so.1
libblkid1: /usr/share/doc/libblkid1/changelog.Debian.gz
libblkid1: /usr/share/doc/libblkid1/copyright
libblkid1: /usr/share/doc/libblkid1
e2fsprogs-dbg: /usr/lib/debug/sbin/blkid

proxmox:~# apt-get install e2fsprogs-dbg
Reading package lists... Done
Building dependency tree
Reading state information... Done
e2fsprogs-dbg is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

I still can't get blkid installed, even though I have e2fsprogs-dbg
installed...

On 15 February 2011 19:45, Roman Mamedov <rm@romanrm.ru> wrote:
> [trim /]
* Re: Linux software RAID assistance
From: Roman Mamedov @ 2011-02-17 15:42 UTC
To: Simon Mcnair; +Cc: Phil Turmel, NeilBrown, linux-raid

On Thu, 17 Feb 2011 15:10:32 +0000 Simon Mcnair <simonmcnair@gmail.com> wrote:

> proxmox:~# dpkg -S blkid
> [trim /]
> e2fsprogs-dbg: /usr/lib/debug/sbin/blkid
>
> I still can't get blkid installed, even though I have e2fsprogs-dbg
> installed...

/usr/lib/debug/sbin/ is not in the system PATH, so you won't be able to run
the program just by entering 'blkid' into the shell prompt.
Either use the complete file name and path, e.g.

# /usr/lib/debug/sbin/blkid --help

or symlink /usr/lib/debug/sbin/blkid to something like /usr/local/sbin/blkid.

--
With respect,
Roman
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-18 9:13 UTC
To: Roman Mamedov; +Cc: Phil Turmel, linux-raid

Hi Roman,
Sorry to be so persistent on something that is not Linux raid specific,
but I still can't get blkid to run; the message is:

proxmox:~# ls -la /usr/lib/debug/sbin/blkid
-rwx------ 1 root root 19887 2008-10-13 04:54 /usr/lib/debug/sbin/blkid

proxmox:~# /usr/lib/debug/sbin/blkid --help
-bash: /usr/lib/debug/sbin/blkid: cannot execute binary file

If I can't run it directly I'm pretty sure symlinking it will not make any
difference. It's weird; normally when I apt-get things it modifies the
path etc and does everything required to make it work, so I would have
thought if I installed a debug version of something it should add the
debug path as well?

blkid always works on my Ubuntu boxes, I don't know why this is any
different (apart from being a different distribution, it's still a mature
program, package and platform). :-)

Simon

On 17/02/2011 15:42, Roman Mamedov wrote:
> [trim /]
* Re: Linux software RAID assistance
From: Robin Hill @ 2011-02-18 9:38 UTC
To: Simon McNair; +Cc: Roman Mamedov, Phil Turmel, linux-raid

On Fri Feb 18, 2011 at 09:13:33AM +0000, Simon McNair wrote:

> proxmox:~# ls -la /usr/lib/debug/sbin/blkid
> -rwx------ 1 root root 19887 2008-10-13 04:54 /usr/lib/debug/sbin/blkid
>
> proxmox:~# /usr/lib/debug/sbin/blkid --help
> -bash: /usr/lib/debug/sbin/blkid: cannot execute binary file
> [trim /]

I suspect that those are just the debug symbols - they're split off from
the binary so they can be automatically pulled in when needed, but don't
bloat the application itself. From the package naming, the actual blkid
binary should be in the e2fsprogs package - do you have that installed?

Cheers,
    Robin
--
     ___
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |
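[Aside: Robin's diagnosis can be confirmed directly. A detached debug-info
file is an ELF object carrying symbols rather than a runnable program,
which is why bash reports "cannot execute binary file". A sketch, using
only paths already named in the thread:

    # inspect what kind of file this actually is
    file /usr/lib/debug/sbin/blkid
    # and list where the package says the real binary should live
    dpkg -L e2fsprogs | grep -w blkid
]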
* Re: Linux software RAID assistance
From: Simon Mcnair @ 2011-02-18 10:38 UTC
To: Simon McNair, Roman Mamedov, Phil Turmel, linux-raid; +Cc: Robin Hill

Hi Robin,

proxmox:~# apt-get install e2fsprogs
Reading package lists... Done
Building dependency tree
Reading state information... Done
e2fsprogs is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.

Yeah, it's installed :-)

Simon

On 18 February 2011 09:38, Robin Hill <robin@robinhill.me.uk> wrote:
> [trim /]
* Re: Linux software RAID assistance
From: Jan Ceuleers @ 2011-02-19 11:46 UTC
To: Simon Mcnair; +Cc: Roman Mamedov, Phil Turmel, linux-raid, Robin Hill

On 18/02/11 11:38, Simon Mcnair wrote:
> Yeah, it's installed :-)

apt-get install --reinstall e2fsprogs
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-19 12:40 UTC
To: Jan Ceuleers; +Cc: Roman Mamedov, Phil Turmel, linux-raid, Robin Hill

Jan,
Yay. Thanks very much. That worked. I don't quite understand why something
should need a reinstall to work, but thanks v much.

regards
Simon

On 19/02/2011 11:46, Jan Ceuleers wrote:
> On 18/02/11 11:38, Simon Mcnair wrote:
>> Yeah, it's installed :-)
>
> apt-get install --reinstall e2fsprogs
* Re: Linux software RAID assistance
From: Jan Ceuleers @ 2011-02-19 17:37 UTC
To: simonmcnair; +Cc: Roman Mamedov, Phil Turmel, linux-raid, Robin Hill

On 19/02/11 13:40, Simon McNair wrote:
> Yay. Thanks very much. That worked. I don't quite understand why
> something should need a reinstall to work, but thanks v much.

If a package's binaries have been damaged or deleted, on purpose or
otherwise, then --reinstall helps. I have no idea what happened to your
copy of blkid, but --reinstall has brought it back for you.

Jan
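[Aside: when installed files have gone missing or been corrupted, Debian
can verify a package's files against the md5sums it shipped with. A sketch,
assuming the debsums package is available in the configured repositories:

    apt-get install debsums
    debsums -s e2fsprogs    # -s: silent, report only errors

That would have flagged the vanished /sbin/blkid without guesswork, before
reaching for --reinstall.]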
* Re: Linux software RAID assistance 2011-02-15 14:51 ` Phil Turmel 2011-02-15 19:04 ` Simon McNair @ 2011-02-16 13:51 ` Simon McNair 2011-02-16 14:37 ` Phil Turmel 2011-02-16 13:56 ` Simon McNair 2 siblings, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 13:51 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid latest update. A bit long I'm afraid... hi, card installed and devices plugged in. Now have sda sda2 sdb1 sdc1 sde sdf1 sdg1 sdi sdj1 sdk1 sdl1 sdm1 sda1 sdb sdc sdd sdf sdg sdh sdj sdk sdl sdm proxmox:/home/simon# ./lsdrv.sh Controller device @ pci0000:00/0000:00:01.0/0000:01:00.0 [mvsas] SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 Sys tem Controller, Revision B (rev 01) host4: /dev/sdf ATA Hitachi HDS72101 {SN: GTA000PAGABXRA} host4: /dev/sdg ATA Hitachi HDS72101 {SN: GTA000PAGAA5DA} host4: /dev/sdh ATA Hitachi HDS72101 {SN: GTA000PAG9NL9A} host4: /dev/sdi ATA Hitachi HDS72101 {SN: GTA000PAGA8V4A} host4: /dev/sdj ATA Hitachi HDS72101 {SN: GTD000PAGMT9GD} host4: /dev/sdk ATA Hitachi HDS72101 {SN: GTG000PAG18BJC} host4: /dev/sdl ATA Hitachi HDS72101 {SN: GTG000PAG1DPLC} host4: /dev/sdm ATA Hitachi HDS72101 {SN: GTA000PAG7WMEA} Controller device @ pci0000:00/0000:00:1a.7/usb1/1-4/1-4.1/1-4.1.1/1-4.1.1:1.0 [ usb-storage] Bus 001 Device 007: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reade r {SN: 08050920003A} host9: /dev/sdd Generic Flash HS-CF host9: /dev/sde Generic Flash HS-COMBO Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.0 [ahci] SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) host7: [Empty] host8: [Empty] Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.1 [pata_jmicron] IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (r ev 03) host5: [Empty] host6: [Empty] Controller device @ pci0000:00/0000:00:1f.2 [ata_piix] IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Contro ller #1 host0: /dev/sda ATA STM3500418AS {SN: 9VM3QJ5C} host1: /dev/sr0 Optiarc DVD RW AD-5240S Controller device @ pci0000:00/0000:00:1f.5 [ata_piix] IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Contro ller #2 host2: /dev/sdb ATA Hitachi HDS72101 {SN: GTD000PAGMT8DD} host3: /dev/sdc ATA Hitachi HDS72101 {SN: GTG000PAG04V0C} proxmox:/home/simon# parted -l Model: ATA STM3500418AS (scsi) Disk /dev/sda: 500GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End Size Type File system Flags 1 32.8kB 537MB 537MB primary ext3 boot 2 537MB 500GB 500GB primary lvm Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? 
c Model: Linux device-mapper (linear) (dm) Disk /dev/mapper/pve-data: 380GB Sector size (logical/physical): 512B/512B Partition Table: loop Number Start End Size File system Flags 1 0.00B 380GB 380GB ext3 c Model: Linux device-mapper (linear) (dm) Disk /dev/mapper/pve-root: 103GB Sector size (logical/physical): 512B/512B Partition Table: loop Number Start End Size File system Flags 1 0.00B 103GB 103GB ext3 Model: Linux device-mapper (linear) (dm) Disk /dev/mapper/pve-swap: 11.8GB Sector size (logical/physical): 512B/512B Partition Table: loop Number Start End Size File system Flags 1 0.00B 11.8GB 11.8GB linux-swap Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: /dev/sdh: unrecognised disk label Error: /dev/sdi: unrecognised disk label Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? c Error: /dev/md0: unrecognised disk label proxmox:/home/simon# mdadm --assemble --scan --verbose mdadm: looking for devices for /dev/md/0 mdadm: cannot open device /dev/dm-2: Device or resource busy mdadm: /dev/dm-2 has wrong uuid. mdadm: cannot open device /dev/dm-1: Device or resource busy mdadm: /dev/dm-1 has wrong uuid. mdadm: cannot open device /dev/dm-0: Device or resource busy mdadm: /dev/dm-0 has wrong uuid. mdadm: no RAID superblock on /dev/sdf1 mdadm: /dev/sdf1 has wrong uuid. mdadm: no RAID superblock on /dev/sdf mdadm: /dev/sdf has wrong uuid. mdadm: no RAID superblock on /dev/sdm1 mdadm: /dev/sdm1 has wrong uuid. mdadm: no RAID superblock on /dev/sdm mdadm: /dev/sdm has wrong uuid. mdadm: no RAID superblock on /dev/sdl1 mdadm: /dev/sdl1 has wrong uuid. mdadm: no RAID superblock on /dev/sdl mdadm: /dev/sdl has wrong uuid. mdadm: no RAID superblock on /dev/sdk1 mdadm: /dev/sdk1 has wrong uuid. mdadm: no RAID superblock on /dev/sdk mdadm: /dev/sdk has wrong uuid. mdadm: no RAID superblock on /dev/sdj1 mdadm: /dev/sdj1 has wrong uuid. mdadm: no RAID superblock on /dev/sdj mdadm: /dev/sdj has wrong uuid. mdadm: /dev/sdi has wrong uuid. mdadm: /dev/sdh has wrong uuid. mdadm: no RAID superblock on /dev/sdg1 mdadm: /dev/sdg1 has wrong uuid. mdadm: no RAID superblock on /dev/sdg mdadm: /dev/sdg has wrong uuid. mdadm: no RAID superblock on /dev/sdc1 mdadm: /dev/sdc1 has wrong uuid. mdadm: no RAID superblock on /dev/sdc mdadm: /dev/sdc has wrong uuid. mdadm: no RAID superblock on /dev/sdb1 mdadm: /dev/sdb1 has wrong uuid. mdadm: no RAID superblock on /dev/sdb mdadm: /dev/sdb has wrong uuid. 
mdadm: cannot open device /dev/sda2: Device or resource busy mdadm: /dev/sda2 has wrong uuid. mdadm: cannot open device /dev/sda1: Device or resource busy mdadm: /dev/sda1 has wrong uuid. mdadm: cannot open device /dev/sda: Device or resource busy mdadm: /dev/sda has wrong uuid. mdadm: no devices found for /dev/md/0 mdadm: looking for devices for further assembly mdadm: cannot open device /dev/dm-2: Device or resource busy mdadm: cannot open device /dev/dm-1: Device or resource busy mdadm: cannot open device /dev/dm-0: Device or resource busy mdadm: no recogniseable superblock on /dev/sdf1 mdadm: no recogniseable superblock on /dev/sdf mdadm: no recogniseable superblock on /dev/sdm1 mdadm: no recogniseable superblock on /dev/sdm mdadm: no recogniseable superblock on /dev/sdl1 mdadm: no recogniseable superblock on /dev/sdl mdadm: no recogniseable superblock on /dev/sdk1 mdadm: no recogniseable superblock on /dev/sdk mdadm: no recogniseable superblock on /dev/sdj1 mdadm: no recogniseable superblock on /dev/sdj mdadm: /dev/sdi is not built for host proxmox. mdadm: /dev/sdh is not built for host proxmox. mdadm: no recogniseable superblock on /dev/sdg1 mdadm: no recogniseable superblock on /dev/sdg mdadm: no recogniseable superblock on /dev/sdc1 mdadm: no recogniseable superblock on /dev/sdc mdadm: no recogniseable superblock on /dev/sdb1 mdadm: no recogniseable superblock on /dev/sdb mdadm: cannot open device /dev/sda2: Device or resource busy mdadm: cannot open device /dev/sda1: Device or resource busy mdadm: cannot open device /dev/sda: Device or resource busy proxmox:/home/simon# apt-show-versions -a mdadm mdadm 2.6.7.2-3 install ok installed mdadm 2.6.7.2-3 lenny ftp.uk.debian.org No stable version No testing version mdadm 3.1.4-1+8efb9d1 sid ftp.uk.debian.org mdadm/lenny uptodate 2.6.7.2-3 anything else you want ? Simon On 15/02/2011 14:51, Phil Turmel wrote: > Hi Neil, > > Since Simon has responded, let me summarize the assistance I provided per his off-list request: > > On 02/14/2011 11:53 PM, NeilBrown wrote: >> On Thu, 10 Feb 2011 16:16:44 +0000 Simon McNair<simonmcnair@gmail.com> wrote: >> >>> Hi all >>> >>> I use a 3ware 9500-12 port sata card (JBOD) which will not work without a >>> 128mb sodimm. The sodimm socket is flakey and the result is that the >>> machine occasionally crashes. Yesterday I finally gave in and put >>> together another >>> machine so that I can rsync between them. When I turned the machine >>> on today to set up rync, the RAID array was not gone, but corrupted. >>> Typical... >> Presumably the old machine was called 'ubuntu' and the new machine 'proølox' >> >> >>> I built the array in Aug 2010 using the following command: >>> >>> mdadm --create --verbose /dev/md0 --metadata=1.1 --level=5 >>> --raid-devices=10 /dev/sd{b,c,d,e,f,g,h,i,j,k}1 --chunk=64 >>> >>> Using LVM, I did the following: >>> pvscan >>> pvcreate -M2 /dev/md0 >>> vgcreate lvm-raid /dev/md0 >>> vgdisplay lvm-raid >>> vgscan >>> lvscan >>> lvcreate -v -l 100%VG -n RAID lvm-raid >>> lvdisplay /dev/lvm-raid/lvm0 >>> >>> I then formatted using: >>> mkfs -t ext4 -v -m .1 -b 4096 -E stride=16,stripe-width=144 >>> /dev/lvm-raid/RAID >>> >>> This worked perfectly since I created the array. Now mdadm is coming up >>> with >>> >>> proxmox:/dev/md# mdadm --assemble --scan --verbose >>> mdadm: looking for devices for further assembly >>> mdadm: no recogniseable superblock on /dev/md/ubuntu:0 >> And it seems that ubuntu:0 have been successfully assembled. 
>> It is missing one device for some reason (sdd1) but RAID can cope with that. > 3ware card is compromised, with a loose buffer memory dimm. Some of its ECC errors were caught and reported in dmesg. It's likely, based on the loose memory socket, that many multiple-bit errors got through. > > [trim /] > >>> mdadm: no uptodate device for slot 8 of /dev/md/proølox:0 >>> mdadm: no uptodate device for slot 9 of /dev/md/proølox:0 >>> mdadm: failed to add /dev/sdd1 to /dev/md/proølox:0: Invalid argument >>> mdadm: /dev/md/proølox:0 assembled from 0 drives - not enough to start >>> the array. >> This looks like it is *after* trying the --create command you give >> below. It is best to report things in the order they happen, else you can >> confuse people (or get caught out!). > Yes, this was after. > >>> mdadm: looking for devices for further assembly >>> mdadm: no recogniseable superblock on /dev/sdd >>> mdadm: No arrays found in config file or automatically >>> >>> pvscan and vgscan show nothing. >>> >>> So I tried running mdadm --create --verbose /dev/md0 --metadata=1.1 >>> --level=5 --raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1 >>> /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64 >>> >>> as it seemed that /dev/sdd1 failed to be added to the array. This did >>> nothing. >> It did not do nothing. It wrote a superblock to /dev/sdd1 and complained >> that it couldn't write to all the others --- didn't it? > There were multiple attempts to create. One wrote to just sdd1, another succeeded with all but sdd1. > >>> dmesg contains: >>> >>> md: invalid superblock checksum on sdd1 >> I guess that is why sdd1 was missing from 'ubuntu:0'. Though as I cannot >> tell if this happened before or after any of the various things reported >> above, it is hard to be sure. >> >> >> The real mystery is why 'pvscan' reports nothing. > The original array was created with mdadm v2.6.7, and had a data offset of 264 sectors. After Simon's various attempts to --create, he ended up with data offset of 2048, using mdadm v3.1.4. The mdadm -E reports he posted to the list showed the 264 offset. We didn't realize the offset had been updated until somewhat later in our troubleshooting efforts. > > In any case, pvscan couldn't see the LVM signature because it wasn't there (at offset 2048). > >> What about >> pvscan --verbose >> >> or >> >> blkid -p /dev/md/ubuntu:0 >> >> or even >> >> dd if=/dev/md/ubuntu:0 count=8 | od -c > Fortunately, Simon did have a copy of his LVM configuration. With the help of dd, strings, and grep, we did locate his LVM sig at the correct location on sdd1 (for data offset 264). After a number of attempts to bypass LVM and access his single LV with dmsetup (based on his backed up configuration, on the assembled new array less sdd1), I realized that the data offset was wrong on the recreated array, and went looking for the cause. I found your git commit that changed that logic last spring, and recommended that Simon revert to the default package for his ubuntu install, which is v2.6.7. > > Simon has now attempted to recreate the array with v2.6.7, but the controller is throwing too many errors to succeed, and I suggested it was too flakey to trust any further. Based on the existence of the LVM sig on sdd1, I believe Simon's data is (mostly) intact, and only needs a successful create operation with a properly functioning controller. (He might also need to perform an lvm vgcfgrestore, but he has the necessary backup file.) > > A new controller is on order. 
> > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
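The dd/strings/grep hunt Phil summarizes above can be sketched like this, assuming the PV really does start at the old 264-sector data offset (LVM2 stamps an ASCII "LABELONE" label in the first few sectors of every PV):

# read a handful of sectors starting at the suspected md data offset (264)
dd if=/dev/sdd1 bs=512 skip=264 count=8 2>/dev/null | strings | grep LABELONE
# a hit means the old array's contents, LVM metadata included,
# still begin 264 sectors into the member partition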
* Re: Linux software RAID assistance 2011-02-16 13:51 ` Simon McNair @ 2011-02-16 14:37 ` Phil Turmel 2011-02-16 17:49 ` Simon McNair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 14:37 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid Good morning, Simon, On 02/16/2011 08:51 AM, Simon McNair wrote: > latest update. A bit long I'm afraid... > > hi, card installed and devices plugged in. Now have > sda sda2 sdb1 sdc1 sde sdf1 sdg1 sdi sdj1 sdk1 sdl1 sdm1 > sda1 sdb sdc sdd sdf sdg sdh sdj sdk sdl sdm Hmmm. > proxmox:/home/simon# ./lsdrv.sh > Controller device @ pci0000:00/0000:00:01.0/0000:01:00.0 [mvsas] > SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 Sys tem Controller, Revision B (rev 01) > host4: /dev/sdf ATA Hitachi HDS72101 {SN: GTA000PAGABXRA} > host4: /dev/sdg ATA Hitachi HDS72101 {SN: GTA000PAGAA5DA} > host4: /dev/sdh ATA Hitachi HDS72101 {SN: GTA000PAG9NL9A} > host4: /dev/sdi ATA Hitachi HDS72101 {SN: GTA000PAGA8V4A} > host4: /dev/sdj ATA Hitachi HDS72101 {SN: GTD000PAGMT9GD} > host4: /dev/sdk ATA Hitachi HDS72101 {SN: GTG000PAG18BJC} > host4: /dev/sdl ATA Hitachi HDS72101 {SN: GTG000PAG1DPLC} > host4: /dev/sdm ATA Hitachi HDS72101 {SN: GTA000PAG7WMEA} > Controller device @ pci0000:00/0000:00:1a.7/usb1/1-4/1-4.1/1-4.1.1/1-4.1.1:1.0 [ usb-storage] > Bus 001 Device 007: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reade r {SN: 08050920003A} > host9: /dev/sdd Generic Flash HS-CF > host9: /dev/sde Generic Flash HS-COMBO > Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.0 [ahci] > SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) > host7: [Empty] > host8: [Empty] > Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.1 [pata_jmicron] > IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (r ev 03) > host5: [Empty] > host6: [Empty] > Controller device @ pci0000:00/0000:00:1f.2 [ata_piix] > IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Contro ller #1 > host0: /dev/sda ATA STM3500418AS {SN: 9VM3QJ5C} > host1: /dev/sr0 Optiarc DVD RW AD-5240S > Controller device @ pci0000:00/0000:00:1f.5 [ata_piix] > IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Contro ller #2 > host2: /dev/sdb ATA Hitachi HDS72101 {SN: GTD000PAGMT8DD} > host3: /dev/sdc ATA Hitachi HDS72101 {SN: GTG000PAG04V0C} Good thing we recorded the serial numbers. From an earlier run of "lsdrv": > Phil, > sg3-utils did the job :-) > Sorry for doubting you. > > host6: /dev/sdd AMCC 9500S-12 DISK {SN: PAGA8V4A3B8378002254} > host6: /dev/sde AMCC 9500S-12 DISK {SN: PAG9NL9A3B8387004C88} > host6: /dev/sdf AMCC 9500S-12 DISK {SN: PAGAA5DA3B8396001AE0} > host6: /dev/sdg AMCC 9500S-12 DISK {SN: PAGABXRA3B83A0005EFB} > host6: /dev/sdh AMCC 9500S-12 DISK {SN: PAG7WMEA3B83AA003E4F} > host6: /dev/sdi AMCC 9500S-12 DISK {SN: PAG1DPLC3B83B40026E6} > host6: /dev/sdj AMCC 9500S-12 DISK {SN: PAG18BJC3B83C3004760} > host6: /dev/sdk AMCC 9500S-12 DISK {SN: PAGMT9GD3B83CD004B18} > host6: /dev/sdl AMCC 9500S-12 DISK {SN: PAGMT8DD3B83D70021FF} > host6: /dev/sdm AMCC 9500S-12 DISK {SN: PAG04V0C3B83E100ACA0} > > Simon I don't know why the serial numbers are formatted differently, but we can still tell them apart (the eight characters starting with "PAG"). So, our device order in your new setup is: [ihgfmlkjbc], where /dev/sdi corresponds to the original report's /dev/sdd, which matches the sig grep in your other note. 
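(As a cross-check, udev's persistent symlinks usually embed the same serial numbers, so the mapping can be re-derived at any time; exact link names depend on the distro's udev rules, and smartctl comes from smartmontools:)

# stable names that include model and serial, pointing at the current sdX node
ls -l /dev/disk/by-id/ | grep -v part
# or ask a single drive directly
smartctl -i /dev/sdf | grep -i serial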
Another note: The controller for sd[abc] is still showing ata_piix as its controller. That means you cannot hot-plug those ports. If you change your BIOS to AHCI mode instead of "Compatibility" or "Emulation", the full-featured ahci driver will run those ports. Not urgent, but I highly recommend it. > proxmox:/home/simon# parted -l > Model: ATA STM3500418AS (scsi) > Disk /dev/sda: 500GB > Sector size (logical/physical): 512B/512B > Partition Table: msdos > > Number Start End Size Type File system Flags > 1 32.8kB 537MB 537MB primary ext3 boot > 2 537MB 500GB 500GB primary lvm > > > Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating > system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? > Fix/Cancel? c The 3ware controller must have reserved some space at the end of each drive for its own use. Didn't know it'd do that. You will have to fix that. [trim /] > Error: /dev/sdh: unrecognised disk label > > Error: /dev/sdi: unrecognised disk label It seems the flaky controller took out these partition tables. Hope that's all it got. They'll have to be re-created with parted. Please run parted on each of the ten drives, and make sure they end up like so: > proxmox:/home/simon# for x in sd{d..m} ; do parted -s /dev/$x unit s > print ; done > Model: AMCC 9500S-12 DISK (scsi) > Disk /dev/sdd: 1953103872s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid Make sure you request "unit s" before your other commands so we can make sure it matches. When you think you are done, check them all: for x in /dev/sd{i,h,g,f,m,l,k,j,b,c} ; do parted -s $x unit s print ; done After that, create: mdadm --create --verbose --assume-clean /dev/md0 --metadata=1.1 --level=5 --raid-devices=10 /dev/sd{i,h,g,f,m,l,k,j,b,c}1 --chunk=64 And finally: pvscan --verbose Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
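Since a silently changed data offset is what sank the first recovery attempt, a quick sanity check right after the --create, before touching LVM, is cheap insurance; a sketch (the 264-sector figure assumes the create is done with mdadm v2.6.7, like the original array):

# the new superblock should report the old 264-sector data offset, not 2048
mdadm -E /dev/sdi1 | egrep 'Data Offset|Array UUID|Device Role'
# and the array should be up with all ten members
mdadm --detail /dev/md0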
* Re: Linux software RAID assistance 2011-02-16 14:37 ` Phil Turmel @ 2011-02-16 17:49 ` Simon McNair 2011-02-16 18:14 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 17:49 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid Hi Phil, A couple of questions please. On 16/02/2011 14:37, Phil Turmel wrote: > Good morning, Simon, > > On 02/16/2011 08:51 AM, Simon McNair wrote: >> latest update. A bit long I'm afraid... >> >> hi, card installed and devices plugged in. Now have >> sda sda2 sdb1 sdc1 sde sdf1 sdg1 sdi sdj1 sdk1 sdl1 sdm1 >> sda1 sdb sdc sdd sdf sdg sdh sdj sdk sdl sdm > Hmmm. > >> proxmox:/home/simon# ./lsdrv.sh >> Controller device @ pci0000:00/0000:00:01.0/0000:01:00.0 [mvsas] >> SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 Sys tem Controller, Revision B (rev 01) >> host4: /dev/sdf ATA Hitachi HDS72101 {SN: GTA000PAGABXRA} >> host4: /dev/sdg ATA Hitachi HDS72101 {SN: GTA000PAGAA5DA} >> host4: /dev/sdh ATA Hitachi HDS72101 {SN: GTA000PAG9NL9A} >> host4: /dev/sdi ATA Hitachi HDS72101 {SN: GTA000PAGA8V4A} >> host4: /dev/sdj ATA Hitachi HDS72101 {SN: GTD000PAGMT9GD} >> host4: /dev/sdk ATA Hitachi HDS72101 {SN: GTG000PAG18BJC} >> host4: /dev/sdl ATA Hitachi HDS72101 {SN: GTG000PAG1DPLC} >> host4: /dev/sdm ATA Hitachi HDS72101 {SN: GTA000PAG7WMEA} >> Controller device @ pci0000:00/0000:00:1a.7/usb1/1-4/1-4.1/1-4.1.1/1-4.1.1:1.0 [ usb-storage] >> Bus 001 Device 007: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reade r {SN: 08050920003A} >> host9: /dev/sdd Generic Flash HS-CF >> host9: /dev/sde Generic Flash HS-COMBO >> Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.0 [ahci] >> SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) >> host7: [Empty] >> host8: [Empty] >> Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.1 [pata_jmicron] >> IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (r ev 03) >> host5: [Empty] >> host6: [Empty] >> Controller device @ pci0000:00/0000:00:1f.2 [ata_piix] >> IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Contro ller #1 >> host0: /dev/sda ATA STM3500418AS {SN: 9VM3QJ5C} >> host1: /dev/sr0 Optiarc DVD RW AD-5240S >> Controller device @ pci0000:00/0000:00:1f.5 [ata_piix] >> IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Contro ller #2 >> host2: /dev/sdb ATA Hitachi HDS72101 {SN: GTD000PAGMT8DD} >> host3: /dev/sdc ATA Hitachi HDS72101 {SN: GTG000PAG04V0C} > Good thing we recorded the serial numbers. From an earlier run of "lsdrv": > >> Phil, >> sg3-utils did the job :-) >> Sorry for doubting you. >> >> host6: /dev/sdd AMCC 9500S-12 DISK {SN: PAGA8V4A3B8378002254} >> host6: /dev/sde AMCC 9500S-12 DISK {SN: PAG9NL9A3B8387004C88} >> host6: /dev/sdf AMCC 9500S-12 DISK {SN: PAGAA5DA3B8396001AE0} >> host6: /dev/sdg AMCC 9500S-12 DISK {SN: PAGABXRA3B83A0005EFB} >> host6: /dev/sdh AMCC 9500S-12 DISK {SN: PAG7WMEA3B83AA003E4F} >> host6: /dev/sdi AMCC 9500S-12 DISK {SN: PAG1DPLC3B83B40026E6} >> host6: /dev/sdj AMCC 9500S-12 DISK {SN: PAG18BJC3B83C3004760} >> host6: /dev/sdk AMCC 9500S-12 DISK {SN: PAGMT9GD3B83CD004B18} >> host6: /dev/sdl AMCC 9500S-12 DISK {SN: PAGMT8DD3B83D70021FF} >> host6: /dev/sdm AMCC 9500S-12 DISK {SN: PAG04V0C3B83E100ACA0} >> >> Simon > I don't know why the serial numbers are formatted differently, but we can still tell them apart (the eight characters starting with "PAG"). 
> > So, our device order in your new setup is: [ihgfmlkjbc], where /dev/sdi corresponds to the original report's /dev/sdd, which matches the sig grep in your other note. > > Another note: The controller for sd[abc] is still showing ata_piix as its controller. That means you cannot hot-plug those ports. If you change your BIOS to AHCI mode instead of "Compatibility" or "Emulation", the full-featured ahci driver will run those ports. Not urgent, but I highly recommend it. > Will do that now, before I forget >> proxmox:/home/simon# parted -l >> Model: ATA STM3500418AS (scsi) >> Disk /dev/sda: 500GB >> Sector size (logical/physical): 512B/512B >> Partition Table: msdos >> >> Number Start End Size Type File system Flags >> 1 32.8kB 537MB 537MB primary ext3 boot >> 2 537MB 500GB 500GB primary lvm >> >> >> Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating >> system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? >> Fix/Cancel? c > The 3ware controller must have reserved some space at the end of each drive for its own use. Didn't know it'd do that. You will have to fix that. > > [trim /] > Do you have any suggestions on how I can fix that ? I don't have a clue >> Error: /dev/sdh: unrecognised disk label >> >> Error: /dev/sdi: unrecognised disk label > It seems the flaky controller took out these partition tables. Hope that's all it got. They'll have to be re-created with parted. > > Please run parted on each of the ten drives, and make sure they end up like so: > >> proxmox:/home/simon# for x in sd{d..m} ; do parted -s /dev/$x unit s >> print ; done >> Model: AMCC 9500S-12 DISK (scsi) >> Disk /dev/sdd: 1953103872s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid > Make sure you request "unit s" before your other commands so we can make sure it matches. > > When you think you are done, check them all: > when I was trying to figure out the command for this using 'man parted' I came across this: " rescue start end Rescue a lost partition that was located somewhere between start and end. If a partition is found, parted will ask if you want to create an entry for it in the partition table." Is it worth trying ? I originally created the partitions like so: parted -s /dev/sdb rm 1 parted -s /dev/sdb mklabel gpt parted -s --align optimal /dev/sdb mkpart primary ext4 .512 100% parted -s /dev/sdb set 1 raid on parted -s /dev/sdb align-check optimal 1 so to recreate the above I would do: parted -s /dev/sdb mkpart primary ext4 2048s 1953101823s parted -s /dev/sdc mkpart primary ext4 2048s 1953101823s parted -s /dev/sdf mkpart primary ext4 2048s 1953101823s parted -s /dev/sdg mkpart primary ext4 2048s 1953101823s parted -s /dev/sdh mkpart primary ext4 2048s 1953101823s parted -s /dev/sdi mkpart primary ext4 2048s 1953101823s parted -s /dev/sdj mkpart primary ext4 2048s 1953101823s parted -s /dev/sdk mkpart primary ext4 2048s 1953101823s parted -s /dev/sdl mkpart primary ext4 2048s 1953101823s parted -s /dev/sdm mkpart primary ext4 2048s 1953101823s I'm guessing the backups that I want to do can wait until any potential fsck ? sorry if the questions are dumb but I'm not sure what I'm doing and I'd rather ask more questions than fewer and understand the implications of what I'm doing. 
thanks Simon > for x in /dev/sd{i,h,g,f,m,l,k,j,b,c} ; do parted -s $x unit s print ; done > > After that, create: > > mdadm --create --verbose --assume-clean /dev/md0 --metadata=1.1 --level=5 --raid-devices=10 /dev/sd{i,h,g,f,m,l,k,j,b,c}1 --chunk=64 > > And finally: > > pvscan --verbose > > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 17:49 ` Simon McNair @ 2011-02-16 18:14 ` Phil Turmel 2011-02-16 18:18 ` Simon McNair 2011-02-19 8:49 ` Simon Mcnair 0 siblings, 2 replies; 64+ messages in thread From: Phil Turmel @ 2011-02-16 18:14 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 12:49 PM, Simon McNair wrote: > Hi Phil, > A couple of questions please. [trim /] >>> Simon >> I don't know why the serial numbers are formatted differently, but we can still tell them apart (the eight characters starting with "PAG"). >> >> So, our device order in your new setup is: [ihgfmlkjbc], where /dev/sdi corresponds to the original report's /dev/sdd, which matches the sig grep in your other note. >> >> Another note: The controller for sd[abc] is still showing ata_piix as its controller. That means you cannot hot-plug those ports. If you change your BIOS to AHCI mode instead of "Compatibility" or "Emulation", the full-featured ahci driver will run those ports. Not urgent, but I highly recommend it. >> > Will do that now, before I forget Hot-pluggability with suitable trays is very handy! :) [trim /] >>> Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating >>> system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? >>> Fix/Cancel? c >> The 3ware controller must have reserved some space at the end of each drive for its own use. Didn't know it'd do that. You will have to fix that. >> >> [trim /] >> > Do you have any suggestions on how I can fix that ? I don't have a clue Just do 'parted /dev/sd?' and on the ones it offers to fix, say yes. Then request 'unit s' and 'print' to verify that it is correct. [trim /] > when I was trying to figure out the command for this using 'man parted' I came across this: > " rescue start end > Rescue a lost partition that was located somewhere between start and end. If a partition is > found, parted will ask if you want to create an entry for it in the partition table." > Is it worth trying ? Nah. That's for when you don't know exactly where the partition is. We know. > I originally created the partitions like so: > parted -s /dev/sdb rm 1 > parted -s /dev/sdb mklabel gpt > parted -s --align optimal /dev/sdb mkpart primary ext4 .512 100% > parted -s /dev/sdb set 1 raid on > parted -s /dev/sdb align-check optimal 1 > > so to recreate the above I would do: > parted -s /dev/sdb mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdc mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdf mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdg mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdh mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdi mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdj mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdk mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdl mkpart primary ext4 2048s 1953101823s > parted -s /dev/sdm mkpart primary ext4 2048s 1953101823s Only recreate the partition tables where you have to, i.e., the 'Fix' option above didn't work. And don't specify a filesystem. Probably just /dev/sdh and /dev/sdi. Like so, though: parted -s /dev/sdh mklabel gpt mkpart primary 2048s 1953101823s set 1 raid on parted -s /dev/sdi mklabel gpt mkpart primary 2048s 1953101823s set 1 raid on > I'm guessing the backups that I want to do can wait until any potential fsck ? 
Do an 'fsck -N' first, and if it passes, or has few errors, mount the filesystem readonly and grab your backup. Then let fsck have at it for real. If anything gets fixed, compare your backup from the read-only fs to the fixed fs. Given your flaky old controller, I expect there'll be *some* problems. > sorry if the questions are dumb but I'm not sure what I'm doing and I'd rather ask more questions than fewer and understand the implications of what I'm doing. Oh, no. You are right to be paranoid. If anything looks funny, stop. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
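A sketch of that order of operations; the mount point and rsync target are placeholders:

fsck -N /dev/lvm-raid/RAID          # dry run: only shows what would be checked
fsck.ext4 -n /dev/lvm-raid/RAID     # read-only pass, answers 'no' to every fix
mkdir -p /mnt/raid
mount -o ro /dev/lvm-raid/RAID /mnt/raid
rsync -a /mnt/raid/ /backup/raid/   # grab the backup before any repairs
umount /mnt/raid
fsck.ext4 -f /dev/lvm-raid/RAID     # only now let fsck fix things for real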
* Re: Linux software RAID assistance 2011-02-16 18:14 ` Phil Turmel @ 2011-02-16 18:18 ` Simon McNair 2011-02-16 18:22 ` Phil Turmel 2011-02-19 8:49 ` Simon Mcnair 1 sibling, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 18:18 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid parted -l gives Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? Fix/Cancel? fix Warning: Not all of the space available to /dev/sdb appears to be used, you can fix the GPT to use all of the space (an extra 421296 blocks) or continue with the current setting? Fix/Ignore? I'm guessing ignore ? Simon On 16/02/2011 18:14, Phil Turmel wrote: > On 02/16/2011 12:49 PM, Simon McNair wrote: >> Hi Phil, >> A couple of questions please. > [trim /] > >>>> Simon >>> I don't know why the serial numbers are formatted differently, but we can still tell them apart (the eight characters starting with "PAG"). >>> >>> So, our device order in your new setup is: [ihgfmlkjbc], where /dev/sdi corresponds to the original report's /dev/sdd, which matches the sig grep in your other note. >>> >>> Another note: The controller for sd[abc] is still showing ata_piix as its controller. That means you cannot hot-plug those ports. If you change your BIOS to AHCI mode instead of "Compatibility" or "Emulation", the full-featured ahci driver will run those ports. Not urgent, but I highly recommend it. >>> >> Will do that now, before I forget > Hot-pluggability with suitable trays is very handy! :) > > [trim /] > >>>> Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating >>>> system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? >>>> Fix/Cancel? c >>> The 3ware controller must have reserved some space at the end of each drive for its own use. Didn't know it'd do that. You will have to fix that. >>> >>> [trim /] >>> >> Do you have any suggestions on how I can fix that ? I don't have a clue > Just do 'parted /dev/sd?' and on the ones it offers to fix, say yes. Then request 'unit s' and 'print' to verify that it is correct. > > [trim /] > >> when I was trying to figure out the command for this using 'man parted' I came across this: >> " rescue start end >> Rescue a lost partition that was located somewhere between start and end. If a partition is >> found, parted will ask if you want to create an entry for it in the partition table." >> Is it worth trying ? > Nah. That's for when you don't know exactly where the partition is. We know. 
> >> I originally created the partitions like so: >> parted -s /dev/sdb rm 1 >> parted -s /dev/sdb mklabel gpt >> parted -s --align optimal /dev/sdb mkpart primary ext4 .512 100% >> parted -s /dev/sdb set 1 raid on >> parted -s /dev/sdb align-check optimal 1 >> >> so to recreate the above I would do: >> parted -s /dev/sdb mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdc mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdf mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdg mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdh mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdi mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdj mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdk mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdl mkpart primary ext4 2048s 1953101823s >> parted -s /dev/sdm mkpart primary ext4 2048s 1953101823s > Only recreate the partition tables where you have to, i.e., the 'Fix' option above didn't work. And don't specify a filesystem. > > Probably just /dev/sdh and /dev/sdi. Like so, though: > > parted -s /dev/sdh mklabel gpt mkpart primary 2048s 1953101823s set 1 raid on > parted -s /dev/sdi mklabel gpt mkpart primary 2048s 1953101823s set 1 raid on > >> I'm guessing the backups that I want to do can wait until any potential fsck ? > Do an 'fsck -N' first, and if it passes, or has few errors, mount the filesystem readonly and grab your backup. Then let fsck have at it for real. If anything gets fixed, compare your backup from the read-only fs to the fixed fs. > > Given your flaky old controller, I expect there'll be *some* problems. > >> sorry if the questions are dumb but I'm not sure what I'm doing and I'd rather ask more questions than fewer and understand the implications of what I'm doing. > Oh, no. You are right to be paranoid. If anything looks funny, stop. > > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 18:18 ` Simon McNair @ 2011-02-16 18:22 ` Phil Turmel 2011-02-16 18:25 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 18:22 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 01:18 PM, Simon McNair wrote: > parted -l gives > > Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating > system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? > Fix/Cancel? fix > Warning: Not all of the space available to /dev/sdb appears to be used, you can fix the GPT to use all of the space (an > extra 421296 blocks) or continue with the current setting? > Fix/Ignore? > > I'm guessing ignore ? Either should work. If you 'Fix' for all of the drives, you could later 'grow' your array to include the extra space. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
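For reference, the later 'grow' would look roughly like this; a sketch only, to be run on a healthy array, one member at a time, with /dev/sdX standing in for each member (the partition start must stay at 2048s):

# recreate the member partition with the same start and a larger end
parted -s /dev/sdX rm 1
parted -s /dev/sdX mkpart primary 2048s 100%
parted -s /dev/sdX set 1 raid on
# then grow the layers above it
mdadm --grow /dev/md0 --size=max
pvresize /dev/md0
lvextend -l +100%FREE /dev/lvm-raid/RAID
resize2fs /dev/lvm-raid/RAID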
* Re: Linux software RAID assistance 2011-02-16 18:22 ` Phil Turmel @ 2011-02-16 18:25 ` Phil Turmel 2011-02-16 18:52 ` Simon McNair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 18:25 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 01:22 PM, Phil Turmel wrote: > On 02/16/2011 01:18 PM, Simon McNair wrote: >> parted -l gives >> >> Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating >> system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? >> Fix/Cancel? fix >> Warning: Not all of the space available to /dev/sdb appears to be used, you can fix the GPT to use all of the space (an >> extra 421296 blocks) or continue with the current setting? >> Fix/Ignore? >> >> I'm guessing ignore ? > > Either should work. If you 'Fix' for all of the drives, you could later 'grow' your array to include the extra space. It's vital that the start sector be 2048, though. Please do recheck after all are done. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
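A sketch of a quick recheck, using the device letters from the earlier loop (adjust if they shuffle again):

for x in /dev/sd{i,h,g,f,m,l,k,j,b,c} ; do
  parted -s $x unit s print | awk -v d=$x '$1 == 1 && $2 != "2048s" { print d ": starts at " $2 }'
done
# no output means every partition starts at sector 2048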
* Re: Linux software RAID assistance 2011-02-16 18:25 ` Phil Turmel @ 2011-02-16 18:52 ` Simon McNair 2011-02-16 18:57 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 18:52 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid have done this.... Results are: proxmox:/home/simon# ./lsdrv.sh Controller device @ pci0000:00/0000:00:01.0/0000:01:00.0 [mvsas] SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01) host0: /dev/sdf ATA Hitachi HDS72101 {SN: GTA000PAGABXRA} host0: /dev/sdg ATA Hitachi HDS72101 {SN: GTA000PAGAA5DA} host0: /dev/sdh ATA Hitachi HDS72101 {SN: GTA000PAG9NL9A} host0: /dev/sdi ATA Hitachi HDS72101 {SN: GTA000PAGA8V4A} host0: /dev/sdj ATA Hitachi HDS72101 {SN: GTD000PAGMT9GD} host0: /dev/sdk ATA Hitachi HDS72101 {SN: GTG000PAG18BJC} host0: /dev/sdl ATA Hitachi HDS72101 {SN: GTG000PAG1DPLC} host0: /dev/sdm ATA Hitachi HDS72101 {SN: GTA000PAG7WMEA} Controller device @ pci0000:00/0000:00:1a.7/usb1/1-4/1-4.1/1-4.1.1/1-4.1.1:1.0 [usb-storage] Bus 001 Device 007: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reader {SN: 08050920003A} host11: /dev/sdb Generic Flash HS-CF host11: /dev/sdc Generic Flash HS-COMBO Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.0 [ahci] SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) host9: [Empty] host10: [Empty] Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.1 [pata_jmicron] IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) host1: [Empty] host2: [Empty] Controller device @ pci0000:00/0000:00:1f.2 [ahci] SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller host3: /dev/sda ATA STM3500418AS {SN: 9VM3QJ5C} host4: /dev/sr0 Optiarc DVD RW AD-5240S host5: [Empty] host6: [Empty] host7: /dev/sdd ATA Hitachi HDS72101 {SN: GTD000PAGMT8DD} host8: /dev/sde ATA Hitachi HDS72101 {SN: GTG000PAG04V0C} enabling ahci has switched the drive letters around for a couple of drives again. parted list is: proxmox:/home/simon# for x in /dev/sd{i,h,g,f,m,l,k,j,d,e} ; do parted -s $x unit s print ; done Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdi: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdh: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdg: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdf: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdm: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdl: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdk: 1953525168s Sector size 
(logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdj: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sdd: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid Model: ATA Hitachi HDS72101 (scsi) Disk /dev/sde: 1953525168s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 2048s 1953101823s 1953099776s primary raid is it now: mdadm --create --verbose --assume-clean /dev/md0 --metadata=1.1 --level=5 --raid-devices=10 /dev/sd{i,h,g,f,m,l,k,j,d,e}1 --chunk=64 ? regards Simon On 16/02/2011 18:25, Phil Turmel wrote: > On 02/16/2011 01:22 PM, Phil Turmel wrote: >> On 02/16/2011 01:18 PM, Simon McNair wrote: >>> parted -l gives >>> >>> Error: The backup GPT table is not at the end of the disk, as it should be. This might mean that another operating >>> system believes the disk is smaller. Fix, by moving the backup to the end (and removing the old backup)? >>> Fix/Cancel? fix >>> Warning: Not all of the space available to /dev/sdb appears to be used, you can fix the GPT to use all of the space (an >>> extra 421296 blocks) or continue with the current setting? >>> Fix/Ignore? >>> >>> I'm guessing ignore ? >> Either should work. If you 'Fix' for all of the drives, you could later 'grow' your array to include the extra space. > It's vital that the start sector be 2048, though. Please do recheck after all are done. > > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 18:52 ` Simon McNair @ 2011-02-16 18:57 ` Phil Turmel 2011-02-16 19:07 ` Simon McNair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 18:57 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 01:52 PM, Simon McNair wrote: > have done this.... Results are: > > proxmox:/home/simon# ./lsdrv.sh > Controller device @ pci0000:00/0000:00:01.0/0000:01:00.0 [mvsas] > SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01) > host0: /dev/sdf ATA Hitachi HDS72101 {SN: GTA000PAGABXRA} > host0: /dev/sdg ATA Hitachi HDS72101 {SN: GTA000PAGAA5DA} > host0: /dev/sdh ATA Hitachi HDS72101 {SN: GTA000PAG9NL9A} > host0: /dev/sdi ATA Hitachi HDS72101 {SN: GTA000PAGA8V4A} > host0: /dev/sdj ATA Hitachi HDS72101 {SN: GTD000PAGMT9GD} > host0: /dev/sdk ATA Hitachi HDS72101 {SN: GTG000PAG18BJC} > host0: /dev/sdl ATA Hitachi HDS72101 {SN: GTG000PAG1DPLC} > host0: /dev/sdm ATA Hitachi HDS72101 {SN: GTA000PAG7WMEA} > Controller device @ pci0000:00/0000:00:1a.7/usb1/1-4/1-4.1/1-4.1.1/1-4.1.1:1.0 [usb-storage] > Bus 001 Device 007: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reader {SN: 08050920003A} > host11: /dev/sdb Generic Flash HS-CF > host11: /dev/sdc Generic Flash HS-COMBO > Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.0 [ahci] > SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) > host9: [Empty] > host10: [Empty] > Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.1 [pata_jmicron] > IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) > host1: [Empty] > host2: [Empty] > Controller device @ pci0000:00/0000:00:1f.2 [ahci] > SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller > host3: /dev/sda ATA STM3500418AS {SN: 9VM3QJ5C} > host4: /dev/sr0 Optiarc DVD RW AD-5240S > host5: [Empty] > host6: [Empty] > host7: /dev/sdd ATA Hitachi HDS72101 {SN: GTD000PAGMT8DD} > host8: /dev/sde ATA Hitachi HDS72101 {SN: GTG000PAG04V0C} > > enabling achi has switched the drive letters around for a couple of drives again > > parted list is: > > proxmox:/home/simon# for x in /dev/sd{i,h,g,f,m,l,k,j,d,e} ; do parted -s $x unit s print ; done > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdi: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdh: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdg: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdf: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdm: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdl: 1953525168s > Sector size 
(logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdk: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdj: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sdd: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid > > Model: ATA Hitachi HDS72101 (scsi) > Disk /dev/sde: 1953525168s > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 2048s 1953101823s 1953099776s primary raid All looks good. > is it now: > > mdadm --create --verbose --assume-clean /dev/md0 --metadata=1.1 --level=5 --raid-devices=10 /dev/sd{i,h,g,f,m,l,k,j,d,e}1 --chunk=64 Yes. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 18:57 ` Phil Turmel @ 2011-02-16 19:07 ` Simon McNair 2011-02-16 19:10 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 19:07 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid pvscan is: proxmox:/home/simon# pvscan --verbose Wiping cache of LVM-capable devices Wiping internal VG cache Walking through all physical volumes PV /dev/sda2 VG pve lvm2 [465.26 GB / 4.00 GB free] PV /dev/md0 VG lvm-raid lvm2 [8.19 TB / 0 free] Total: 2 [655.05 GB] / in use: 2 [655.05 GB] / in no VG: 0 [0 ] lvm-raid is there .... proxmox:/home/simon# fsck.ext4 -n /dev/md0 e2fsck 1.41.3 (12-Oct-2008) fsck.ext4: Superblock invalid, trying backup blocks... Superblock has an invalid ext3 journal (inode 8). Clear? no fsck.ext4: Illegal inode number while checking ext3 journal for /dev/md0 I thought I needed to run fsck against the lvm group ? I used to do... fsck.ext4 /dev/lvm-raid/RAID so do I need to mount it or something ? cheers Simon On 16/02/2011 18:57, Phil Turmel wrote: > On 02/16/2011 01:52 PM, Simon McNair wrote: >> have done this.... Results are: >> >> proxmox:/home/simon# ./lsdrv.sh >> Controller device @ pci0000:00/0000:00:01.0/0000:01:00.0 [mvsas] >> SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01) >> host0: /dev/sdf ATA Hitachi HDS72101 {SN: GTA000PAGABXRA} >> host0: /dev/sdg ATA Hitachi HDS72101 {SN: GTA000PAGAA5DA} >> host0: /dev/sdh ATA Hitachi HDS72101 {SN: GTA000PAG9NL9A} >> host0: /dev/sdi ATA Hitachi HDS72101 {SN: GTA000PAGA8V4A} >> host0: /dev/sdj ATA Hitachi HDS72101 {SN: GTD000PAGMT9GD} >> host0: /dev/sdk ATA Hitachi HDS72101 {SN: GTG000PAG18BJC} >> host0: /dev/sdl ATA Hitachi HDS72101 {SN: GTG000PAG1DPLC} >> host0: /dev/sdm ATA Hitachi HDS72101 {SN: GTA000PAG7WMEA} >> Controller device @ pci0000:00/0000:00:1a.7/usb1/1-4/1-4.1/1-4.1.1/1-4.1.1:1.0 [usb-storage] >> Bus 001 Device 007: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reader {SN: 08050920003A} >> host11: /dev/sdb Generic Flash HS-CF >> host11: /dev/sdc Generic Flash HS-COMBO >> Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.0 [ahci] >> SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03) >> host9: [Empty] >> host10: [Empty] >> Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.1 [pata_jmicron] >> IDE interface: JMicron Technology Corp. 
JMB362/JMB363 Serial ATA Controller (rev 03) >> host1: [Empty] >> host2: [Empty] >> Controller device @ pci0000:00/0000:00:1f.2 [ahci] >> SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller >> host3: /dev/sda ATA STM3500418AS {SN: 9VM3QJ5C} >> host4: /dev/sr0 Optiarc DVD RW AD-5240S >> host5: [Empty] >> host6: [Empty] >> host7: /dev/sdd ATA Hitachi HDS72101 {SN: GTD000PAGMT8DD} >> host8: /dev/sde ATA Hitachi HDS72101 {SN: GTG000PAG04V0C} >> >> enabling achi has switched the drive letters around for a couple of drives again >> >> parted list is: >> >> proxmox:/home/simon# for x in /dev/sd{i,h,g,f,m,l,k,j,d,e} ; do parted -s $x unit s print ; done >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdi: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdh: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdg: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdf: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdm: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdl: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdk: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdj: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sdd: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid >> >> Model: ATA Hitachi HDS72101 (scsi) >> Disk /dev/sde: 1953525168s >> Sector size (logical/physical): 512B/512B >> Partition Table: gpt >> >> Number Start End Size File system Name Flags >> 1 2048s 1953101823s 1953099776s primary raid > All looks good. > >> is it now: >> >> mdadm --create --verbose --assume-clean /dev/md0 --metadata=1.1 --level=5 --raid-devices=10 /dev/sd{i,h,g,f,m,l,k,j,d,e}1 --chunk=64 > Yes. > > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 19:07 ` Simon McNair @ 2011-02-16 19:10 ` Phil Turmel 2011-02-16 19:15 ` Simon McNair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 19:10 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 02:07 PM, Simon McNair wrote: > pvscan is: > proxmox:/home/simon# pvscan --verbose > Wiping cache of LVM-capable devices > Wiping internal VG cache > Walking through all physical volumes > PV /dev/sda2 VG pve lvm2 [465.26 GB / 4.00 GB free] > PV /dev/md0 VG lvm-raid lvm2 [8.19 TB / 0 free] > Total: 2 [655.05 GB] / in use: 2 [655.05 GB] / in no VG: 0 [0 ] > > lvm-raid is there .... Very good. > proxmox:/home/simon# fsck.ext4 -n /dev/md0 Uh, no. > e2fsck 1.41.3 (12-Oct-2008) > fsck.ext4: Superblock invalid, trying backup blocks... > Superblock has an invalid ext3 journal (inode 8). > Clear? no > > fsck.ext4: Illegal inode number while checking ext3 journal for /dev/md0 > I thought I needed to run fsck against the lvm group ? I used to do... fsck.ext4 /dev/lvm-raid/RAID so do I need to mount it or something ? vgscan --verbose lvscan --verbose Then either: fsck -N /dev/mapper/lvm-raid-RAID or: fsck -N /dev/lvm-raid/RAID depends on what udev is doing. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
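The udev wrinkle: device-mapper escapes a hyphen inside a VG or LV name by doubling it, and uses a single hyphen to join VG to LV, so VG "lvm-raid" plus LV "RAID" shows up as lvm--raid-RAID. A sketch for finding the real node (names assume that VG/LV pair):

# list what device-mapper actually created
dmsetup ls
ls -l /dev/mapper/
fsck -N /dev/mapper/lvm--raid-RAID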
* Re: Linux software RAID assistance 2011-02-16 19:10 ` Phil Turmel @ 2011-02-16 19:15 ` Simon McNair 2011-02-16 19:36 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 19:15 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid proxmox:/home/simon# vgscan --verbose Wiping cache of LVM-capable devices Wiping internal VG cache Reading all physical volumes. This may take a while... Finding all volume groups Finding volume group "pve" Found volume group "pve" using metadata type lvm2 Finding volume group "lvm-raid" Found volume group "lvm-raid" using metadata type lvm2 proxmox:/home/simon# proxmox:/home/simon# lvscan --verbose Finding all logical volumes ACTIVE '/dev/pve/swap' [11.00 GB] inherit ACTIVE '/dev/pve/root' [96.00 GB] inherit ACTIVE '/dev/pve/data' [354.26 GB] inherit inactive '/dev/lvm-raid/RAID' [8.19 TB] inherit proxmox:/home/simon# vgchange -ay 3 logical volume(s) in volume group "pve" now active 1 logical volume(s) in volume group "lvm-raid" now active proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm-raid-RAID e2fsck 1.41.3 (12-Oct-2008) fsck.ext4: No such file or directory while trying to open /dev/mapper/lvm-raid-RAID The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 <device> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/ control lvm--raid-RAID pve-data pve-root pve-swap proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm--raid-RAID e2fsck 1.41.3 (12-Oct-2008) /dev/mapper/lvm--raid-RAID has unsupported feature(s): FEATURE_I31 e2fsck: Get a newer version of e2fsck! my version of e2fsck always worked before ? Simon On 16/02/2011 19:10, Phil Turmel wrote: > On 02/16/2011 02:07 PM, Simon McNair wrote: >> pvscan is: >> proxmox:/home/simon# pvscan --verbose >> Wiping cache of LVM-capable devices >> Wiping internal VG cache >> Walking through all physical volumes >> PV /dev/sda2 VG pve lvm2 [465.26 GB / 4.00 GB free] >> PV /dev/md0 VG lvm-raid lvm2 [8.19 TB / 0 free] >> Total: 2 [655.05 GB] / in use: 2 [655.05 GB] / in no VG: 0 [0 ] >> >> lvm-raid is there .... > Very good. > >> proxmox:/home/simon# fsck.ext4 -n /dev/md0 > Uh, no. > >> e2fsck 1.41.3 (12-Oct-2008) >> fsck.ext4: Superblock invalid, trying backup blocks... >> Superblock has an invalid ext3 journal (inode 8). >> Clear? no >> >> fsck.ext4: Illegal inode number while checking ext3 journal for /dev/md0 > >> I thought I needed to run fsck against the lvm group ? I used to do... fsck.ext4 /dev/lvm-raid/RAID so do I need to mount it or something ? > vgscan --verbose > > lvscan --verbose > > > Then either: > > fsck -N /dev/mapper/lvm-raid-RAID > > or: > > fsck -N /dev/lvm-raid/RAID > > depends on what udev is doing. > > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 19:15 ` Simon McNair @ 2011-02-16 19:36 ` Phil Turmel 2011-02-16 21:28 ` Simon McNair ` (2 more replies) 0 siblings, 3 replies; 64+ messages in thread From: Phil Turmel @ 2011-02-16 19:36 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 02:15 PM, Simon McNair wrote: > proxmox:/home/simon# vgscan --verbose > Wiping cache of LVM-capable devices > Wiping internal VG cache > Reading all physical volumes. This may take a while... > Finding all volume groups > Finding volume group "pve" > Found volume group "pve" using metadata type lvm2 > Finding volume group "lvm-raid" > Found volume group "lvm-raid" using metadata type lvm2 > proxmox:/home/simon# > proxmox:/home/simon# lvscan --verbose > Finding all logical volumes > ACTIVE '/dev/pve/swap' [11.00 GB] inherit > ACTIVE '/dev/pve/root' [96.00 GB] inherit > ACTIVE '/dev/pve/data' [354.26 GB] inherit > inactive '/dev/lvm-raid/RAID' [8.19 TB] inherit > > proxmox:/home/simon# vgchange -ay > 3 logical volume(s) in volume group "pve" now active > 1 logical volume(s) in volume group "lvm-raid" now active Heh. Figures. > proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm-raid-RAID Actually, I wanted you to try with a capital N. Lower case 'n' is similar, but not quite the same. > e2fsck 1.41.3 (12-Oct-2008) > fsck.ext4: No such file or directory while trying to open /dev/mapper/lvm-raid-RAID > > The superblock could not be read or does not describe a correct ext2 > filesystem. If the device is valid and it really contains an ext2 > filesystem (and not swap or ufs or something else), then the superblock > is corrupt, and you might try running e2fsck with an alternate superblock: > e2fsck -b 8193 <device> > > proxmox:/home/simon# fsck.ext4 -n /dev/mapper/ > control lvm--raid-RAID pve-data pve-root pve-swap Strange. I guess it does that to distinguish dashes in the VG name from dashes between VG and LV names. > proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm--raid-RAID > e2fsck 1.41.3 (12-Oct-2008) > /dev/mapper/lvm--raid-RAID has unsupported feature(s): FEATURE_I31 > e2fsck: Get a newer version of e2fsck! > > my version of e2fsck always worked before ? v1.41.14 was released 7 weeks ago. But, I suspect there's corruption in the superblock. Do you still have your disk images tucked away somewhere safe? If so, try: 1) The '-b' option to e2fsck. We need to experiment with '-n -b offset' to find the alternate superblock, trying 'offset' equal to 8193, 16384, and 32768, per the man-page. 2) A newer e2fsprogs. Finally, 3) mount -r /dev/lvm-raid/RAID /mnt/whatever Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
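One way to get trustworthy '-b' candidates instead of guessing: mke2fs -n does a dry run that prints where the backup superblocks would land for the given parameters, writing nothing. A sketch, assuming the original 4096-byte block size from the mkfs command quoted earlier in the thread:

# -n: do NOT create a filesystem, just report what would be done,
# including the backup superblock locations
mke2fs -n -b 4096 /dev/lvm-raid/RAID
# then probe a reported backup read-only, e.g. the first one for 4k blocks:
e2fsck -n -B 4096 -b 32768 /dev/lvm-raid/RAID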
* Re: Linux software RAID assistance 2011-02-16 19:36 ` Phil Turmel @ 2011-02-16 21:28 ` Simon McNair 2011-02-16 21:30 ` Phil Turmel 2011-02-18 9:31 ` Simon Mcnair [not found] ` <AANLkTi=RmR5nVnmFLuqK5anHc3WDPxjuYjitT6+5wAqS@mail.gmail.com> 2 siblings, 1 reply; 64+ messages in thread From: Simon McNair @ 2011-02-16 21:28 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid Phil, Jeez I'm having a bad week (my windows 7x64 machine has just started randomly crashing, my Thecus n5200 is playing up and the weather has been dire so I've not been able to put my new shed up...oh and I have ongoing 'other' issues as you're well aware ;-)) The thecus n5200 has 5x2TB hdd's. I wiped out the existing raid5 array to create a jbod span of 10tb in order to hold my 9tb of backups. The Thecus has had a hissy fit and I've had to set the process off again, so you can bet it'll be a day or two before it get's the drives formatted (it's not a very powerful nas), then I'll do the backups, then I'll try as you suggested. Thanks for the ongoing assistance. Simon On 16/02/2011 19:36, Phil Turmel wrote: > On 02/16/2011 02:15 PM, Simon McNair wrote: >> proxmox:/home/simon# vgscan --verbose >> Wiping cache of LVM-capable devices >> Wiping internal VG cache >> Reading all physical volumes. This may take a while... >> Finding all volume groups >> Finding volume group "pve" >> Found volume group "pve" using metadata type lvm2 >> Finding volume group "lvm-raid" >> Found volume group "lvm-raid" using metadata type lvm2 >> proxmox:/home/simon# >> proxmox:/home/simon# lvscan --verbose >> Finding all logical volumes >> ACTIVE '/dev/pve/swap' [11.00 GB] inherit >> ACTIVE '/dev/pve/root' [96.00 GB] inherit >> ACTIVE '/dev/pve/data' [354.26 GB] inherit >> inactive '/dev/lvm-raid/RAID' [8.19 TB] inherit >> >> proxmox:/home/simon# vgchange -ay >> 3 logical volume(s) in volume group "pve" now active >> 1 logical volume(s) in volume group "lvm-raid" now active > Heh. Figures. > >> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm-raid-RAID > Actually, I wanted you to try with a capital N. Lower case 'n' is similar, but not quite the same. > >> e2fsck 1.41.3 (12-Oct-2008) >> fsck.ext4: No such file or directory while trying to open /dev/mapper/lvm-raid-RAID >> >> The superblock could not be read or does not describe a correct ext2 >> filesystem. If the device is valid and it really contains an ext2 >> filesystem (and not swap or ufs or something else), then the superblock >> is corrupt, and you might try running e2fsck with an alternate superblock: >> e2fsck -b 8193<device> >> >> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/ >> control lvm--raid-RAID pve-data pve-root pve-swap > Strange. I guess it does that to distinguish dashes in the VG name from dashes between VG and LV names. > >> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm--raid-RAID >> e2fsck 1.41.3 (12-Oct-2008) >> /dev/mapper/lvm--raid-RAID has unsupported feature(s): FEATURE_I31 >> e2fsck: Get a newer version of e2fsck! >> >> my version of e2fsck always worked before ? > v1.41.14 was release 7 weeks ago. But, I suspect there's corruption in the superblock. Do you still have your disk images tucked away somewhere safe? > > If so, try: > > 1) The '-b' option to e2fsck. We need to experiment with '-n -b offset' to find the alternate superblock. Trying 'offset' = to 8193, 16384, and 32768, per the man-page. > > 2) A newer e2fsprogs. 
> > Finally, > 3) mount -r /dev/lvm-raid/RAID /mnt/whatever > > Phil > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 21:28 ` Simon McNair @ 2011-02-16 21:30 ` Phil Turmel 2011-02-16 22:44 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 21:30 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid On 02/16/2011 04:28 PM, Simon McNair wrote: > Phil, > Jeez I'm having a bad week (my windows 7x64 machine has just started randomly crashing, my Thecus n5200 is playing up and the weather has been dire so I've not been able to put my new shed up...oh and I have ongoing 'other' issues as you're well aware ;-)) The thecus n5200 has 5x2TB hdd's. I wiped out the existing raid5 array to create a jbod span of 10tb in order to hold my 9tb of backups. The Thecus has had a hissy fit and I've had to set the process off again, so you can bet it'll be a day or two before it get's the drives formatted (it's not a very powerful nas), then I'll do the backups, then I'll try as you suggested. That's fine. > > Thanks for the ongoing assistance. No problem. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 21:30 ` Phil Turmel @ 2011-02-16 22:44 ` Simon Mcnair 2011-02-16 23:39 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-16 22:44 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org I just went to bed and one last question popped in to my mind. Since there is a fair timezone gap I thought I'd be presumptuous and ask it in the hope I can turn it around a bit quicker in the morning. My suspicion is that once I have 5 formatted 2tb drives I may be lucky to get 10x 1tb dd images on to it. Can I feed the dd process in to tar, bzip2, zip or something else which will give me enough space to fit the images on ? Will I get more usable space from 5x2tb partitions or from 1xspanned volume ? (the thecus pretty much only allows you to create raid volumes so a jbod needs to be 5x1tb arrays or 1 spanned volume or stripe). TIA Simon On 16 Feb 2011, at 21:30, Phil Turmel <philip@turmel.org> wrote: > On 02/16/2011 04:28 PM, Simon McNair wrote: >> Phil, >> Jeez I'm having a bad week (my windows 7x64 machine has just started randomly crashing, my Thecus n5200 is playing up and the weather has been dire so I've not been able to put my new shed up...oh and I have ongoing 'other' issues as you're well aware ;-)) The thecus n5200 has 5x2TB hdd's. I wiped out the existing raid5 array to create a jbod span of 10tb in order to hold my 9tb of backups. The Thecus has had a hissy fit and I've had to set the process off again, so you can bet it'll be a day or two before it get's the drives formatted (it's not a very powerful nas), then I'll do the backups, then I'll try as you suggested. > > That's fine. >> >> Thanks for the ongoing assistance. > > No problem. > > Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-16 22:44 ` Simon Mcnair @ 2011-02-16 23:39 ` Phil Turmel 2011-02-17 13:26 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-16 23:39 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/16/2011 05:44 PM, Simon Mcnair wrote: > I just went to bed and one last question popped in to my mind. Since > there is a fair timezone gap I thought I'd be presumptuous and ask it > in the hope I can turn it around a bit quicker in the morning. > > My suspicion is that once I have 5 formatted 2tb drives I may be lucky > to get 10x 1tb dd images on to it. Can I feed the dd process in to > tar, bzip2, zip or something else which will give me enough space to > fit the images on ? > > Will I get more usable space from 5x2tb partitions or from 1xspanned > volume ? (the thecus pretty much only allows you to create raid > volumes so a jbod needs to be 5x1tb arrays or 1 spanned volume or > stripe). I'd use one spanned volume, and gzip. I'd simultaneously generate an md5sum while streaming to your thecus. A script like so: #! /bin/bash # function usage() { printf "Usage:\n\t%s devname\n\n" "`basename \"$0\"`" printf "'devname' must be a relative path in /dev/ to the desired block device.\n" exit 1 } # Verify the supplied name is a device test -b "/dev/$1" || usage # Convert path separators and spaces into dashes outfile="`echo \"$1\" |sed -r -e 's:[ /]+:-:g'`" # Create a side-stream for computing the MD5 of the data read fifo=`mktemp -u` mkfifo $fifo || exit md5sum -b <$fifo >/mnt/thecus/$outfile.md5 & # Read the device and compress it dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >/mnt/thecus/$outfile.gz # Wait for the background task to close wait ^ permalink raw reply [flat|nested] 64+ messages in thread
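Editor's aside: the point of the MD5 side-stream in Phil's script is that it hashes the raw device data before compression, so an image can be verified later without re-reading the source disk. A sketch of that check, assuming the file naming the script produces (sdd used as a hypothetical example):

  stored=$(cut -d' ' -f1 /mnt/thecus/sdd.md5)
  actual=$(gunzip -c /mnt/thecus/sdd.gz | md5sum | cut -d' ' -f1)
  [ "$stored" = "$actual" ] && echo "sdd image OK" || echo "sdd image MISMATCH"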
* Re: Linux software RAID assistance 2011-02-16 23:39 ` Phil Turmel @ 2011-02-17 13:26 ` Simon Mcnair 2011-02-17 13:48 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-17 13:26 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org Phil, After a couple of attempts I realised that all I needed to do was ./backup.sh sdi rather than specifying ./backup.sh /dev/sdi schoolboy error huh, rtfm. I tried modifying the code, so I could kick them all off at once, but I wanted to check that this would work from a multithreaded/sane perspective (yes I know it's a bit of IO, but iotop seems to infer I'm only getting 10M/s throughput from the disk anyway). is this code kosha ? #! /bin/bash # function usage() { printf "Usage:\n\t%s devname\n\n" "`basename \"$0\"`" printf "'devname' must be a relative path in /dev/ to the desired block device.\n" exit 1 } # Verify the supplied name is a device test -b "/dev/$1" || usage # Convert path separators and spaces into dashes outfile="`echo \"$1\" |sed -r -e 's:[ /]+:-:g'`" # Create a side-stream for computing the MD5 of the data read fifo=`mktemp -u` mkfifo $fifo || exit md5sum -b <$fifo >$2/$outfile.md5 & # Read the device and compress it dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >$2/$outfile.gz # Wait for the background task to close wait I was a little concerned as I didn't see the hdd drive LED light up for my attempt even though the file was growing nicely on my CIFS share. I'm dumping it on the 5x2TB disks (JBOD) in my windows 7 box as my Thecus is not a happy chappy and I need time to mount the DOM module in another machine to find out why. cheers Simon On 16 February 2011 23:39, Phil Turmel <philip@turmel.org> wrote: > On 02/16/2011 05:44 PM, Simon Mcnair wrote: >> I just went to bed and one last question popped in to my mind. Since >> there is a fair timezone gap I thought I'd be presumptuous and ask it >> in the hope I can turn it around a bit quicker in the morning. >> >> My suspicion is that once I have 5 formatted 2tb drives I may be lucky >> to get 10x 1tb dd images on to it. Can I feed the dd process in to >> tar, bzip2, zip or something else which will give me enough space to >> fit the images on ? >> >> Will I get more usable space from 5x2tb partitions or from 1xspanned >> volume ? (the thecus pretty much only allows you to create raid >> volumes so a jbod needs to be 5x1tb arrays or 1 spanned volume or >> stripe). > > I'd use one spanned volume, and gzip. I'd simultaneously generate an md5sum while streaming to your thecus. A script like so: > > #! /bin/bash > # > function usage() { > printf "Usage:\n\t%s devname\n\n" "`basename \"$0\"`" > printf "'devname' must be a relative path in /dev/ to the desired block device.\n" > exit 1 > } > > # Verify the supplied name is a device > test -b "/dev/$1" || usage > > # Convert path separators and spaces into dashes > outfile="`echo \"$1\" |sed -r -e 's:[ /]+:-:g'`" > > # Create a side-stream for computing the MD5 of the data read > fifo=`mktemp -u` > mkfifo $fifo || exit > md5sum -b <$fifo >/mnt/thecus/$outfile.md5 & > > # Read the device and compress it > dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >/mnt/thecus/$outfile.gz > > # Wait for the background task to close > wait > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-17 13:26 ` Simon Mcnair @ 2011-02-17 13:48 ` Phil Turmel 2011-02-17 13:56 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-17 13:48 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1979 bytes --] On 02/17/2011 08:26 AM, Simon Mcnair wrote: > Phil, > After a couple of attempts I realised that all I needed to do was > ./backup.sh sdi rather than specifying ./backup.sh /dev/sdi schoolboy > error huh, rtfm. > > I tried modifying the code, so I could kick them all off at once, but > I wanted to check that this would work from a multithreaded/sane > perspective (yes I know it's a bit of IO, but iotop seems to infer I'm > only getting 10M/s throughput from the disk anyway). That sounds suspiciously like a 100 meg ethernet bottleneck, so parallel operation won't help. If so, this'll take a very long time. > is this code kosha ? Looks pretty good. I've attached a slight update with your changes and a bugfix. > #! /bin/bash > # > function usage() { > printf "Usage:\n\t%s devname\n\n" "`basename \"$0\"`" > printf "'devname' must be a relative path in /dev/ to the > desired block device.\n" > exit 1 > } > > # Verify the supplied name is a device > test -b "/dev/$1" || usage > > # Convert path separators and spaces into dashes > outfile="`echo \"$1\" |sed -r -e 's:[ /]+:-:g'`" > > # Create a side-stream for computing the MD5 of the data read > fifo=`mktemp -u` > mkfifo $fifo || exit > md5sum -b <$fifo >$2/$outfile.md5 & > > # Read the device and compress it > dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >$2/$outfile.gz > > # Wait for the background task to close > wait > > I was a little concerned as I didn't see the hdd drive LED light up > for my attempt even though the file was growing nicely on my CIFS > share. I'm dumping it on the 5x2TB disks (JBOD) in my windows 7 box > as my Thecus is not a happy chappy and I need time to mount the DOM > module in another machine to find out why. If you're willing to dismantle the thecus, directly connecting its drives to your crippled system will make things go much faster. You've got four empty motherboard SATA ports. You just have to watch out for the power load. Phil [-- Attachment #2: block2gz.sh --] [-- Type: application/x-sh, Size: 803 bytes --] ^ permalink raw reply [flat|nested] 64+ messages in thread
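Editor's aside: the block2gz.sh attachment isn't preserved inline in this archive, so the exact bugfix Phil mentions is a guess. The obvious candidate is the tee invocation in the version quoted above: "tee 2>$fifo" only redirects tee's (empty) stderr into the FIFO, so md5sum would hash an empty stream. tee needs the FIFO as a file argument to duplicate the data into it:

  # as posted: only tee's stderr reaches the FIFO, md5sum sees nothing
  #   dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >$2/$outfile.gz
  # likely fix: tee copies the data stream into the FIFO as well as stdout
  dd if="/dev/$1" bs=1M | tee "$fifo" | gzip >"$2/$outfile.gz"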
* Re: Linux software RAID assistance 2011-02-17 13:48 ` Phil Turmel @ 2011-02-17 13:56 ` Simon Mcnair 2011-02-17 14:34 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-17 13:56 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org Phil, I think I need to connect them to the crippled machine directly, I was just trying to leave it alone, I know that doesn't make much sense but I'm more familiar with windows than Linux. The cards that I have are all connected at gigabit and bonded. That machine has two ports, so does the other. I'm running two asus p6ts (1 deluxe and 1 se) with i7 920's I cant understand why it's so slow though I have seen some reports that the supermicro card is quite slow. Simon Ps thanks for the bugfixes and updates Simon On 17 Feb 2011, at 13:48, Phil Turmel <philip@turmel.org> wrote: > On 02/17/2011 08:26 AM, Simon Mcnair wrote: >> Phil, >> After a couple of attempts I realised that all I needed to do was >> ./backup.sh sdi rather than specifying ./backup.sh /dev/sdi schoolboy >> error huh, rtfm. >> >> I tried modifying the code, so I could kick them all off at once, but >> I wanted to check that this would work from a multithreaded/sane >> perspective (yes I know it's a bit of IO, but iotop seems to infer I'm >> only getting 10M/s throughput from the disk anyway). > > That sounds suspiciously like a 100 meg ethernet bottleneck, so parallel operation won't help. If so, this'll take a very long time. > >> is this code kosha ? > > Looks pretty good. I've attached a slight update with your changes and a bugfix. > >> #! /bin/bash >> # >> function usage() { >> printf "Usage:\n\t%s devname\n\n" "`basename \"$0\"`" >> printf "'devname' must be a relative path in /dev/ to the >> desired block device.\n" >> exit 1 >> } >> >> # Verify the supplied name is a device >> test -b "/dev/$1" || usage >> >> # Convert path separators and spaces into dashes >> outfile="`echo \"$1\" |sed -r -e 's:[ /]+:-:g'`" >> >> # Create a side-stream for computing the MD5 of the data read >> fifo=`mktemp -u` >> mkfifo $fifo || exit >> md5sum -b <$fifo >$2/$outfile.md5 & >> >> # Read the device and compress it >> dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >$2/$outfile.gz >> >> # Wait for the background task to close >> wait >> >> I was a little concerned as I didn't see the hdd drive LED light up >> for my attempt even though the file was growing nicely on my CIFS >> share. I'm dumping it on the 5x2TB disks (JBOD) in my windows 7 box >> as my Thecus is not a happy chappy and I need time to mount the DOM >> module in another machine to find out why. > > If you're willing to dismantle the thecus, directly connecting its drives to your crippled system will make things go much faster. You've got four empty motherboard SATA ports. You just have to watch out for the power load. > > Phil > <block2gz.sh> ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-17 13:56 ` Simon Mcnair @ 2011-02-17 14:34 ` Simon Mcnair 2011-02-17 16:54 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-17 14:34 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org 19390 root 17.49 M/s 0 B/s 0.00 % 21.36 % dd if /dev/sde bs 1M 19333 root 16.52 M/s 0 B/s 0.00 % 17.07 % dd if /dev/sdd bs 1M 19503 root 15.79 M/s 0 B/s 0.00 % 13.91 % dd if /dev/sdf bs 1M 18896 root 0 B/s 16.10 M/s 0.00 % 0.00 % ntfs-3g /dev/sdb1 /media/ntfs3g/sdb 18909 root 0 B/s 13.49 M/s 0.00 % 0.00 % ntfs-3g /dev/sdc1 /media/ntfs3g/sdc 18920 root 0 B/s 15.73 M/s 0.00 % 0.00 % ntfs-3g /dev/sdn1 /media/ntfs3g/sdn this will certainly be quicker :-) On 17 February 2011 13:56, Simon Mcnair <simonmcnair@gmail.com> wrote: > Phil, I think I need to connect them to the crippled machine directly, > I was just trying to leave it alone, I know that doesn't make much > sense but I'm more familiar with windows than Linux. The cards that I > have are all connected at gigabit and bonded. That machine has two > ports, so does the other. I'm running two asus p6ts (1 deluxe and 1 > se) with i7 920's > > I cant understand why it's so slow though I have seen some reports > that the supermicro card is quite slow. > Simon > Ps thanks for the bugfixes and updates > Simon > > On 17 Feb 2011, at 13:48, Phil Turmel <philip@turmel.org> wrote: > >> On 02/17/2011 08:26 AM, Simon Mcnair wrote: >>> Phil, >>> After a couple of attempts I realised that all I needed to do was >>> ./backup.sh sdi rather than specifying ./backup.sh /dev/sdi schoolboy >>> error huh, rtfm. >>> >>> I tried modifying the code, so I could kick them all off at once, but >>> I wanted to check that this would work from a multithreaded/sane >>> perspective (yes I know it's a bit of IO, but iotop seems to infer I'm >>> only getting 10M/s throughput from the disk anyway). >> >> That sounds suspiciously like a 100 meg ethernet bottleneck, so parallel operation won't help. If so, this'll take a very long time. >> >>> is this code kosha ? >> >> Looks pretty good. I've attached a slight update with your changes and a bugfix. >> >>> #! /bin/bash >>> # >>> function usage() { >>> printf "Usage:\n\t%s devname\n\n" "`basename \"$0\"`" >>> printf "'devname' must be a relative path in /dev/ to the >>> desired block device.\n" >>> exit 1 >>> } >>> >>> # Verify the supplied name is a device >>> test -b "/dev/$1" || usage >>> >>> # Convert path separators and spaces into dashes >>> outfile="`echo \"$1\" |sed -r -e 's:[ /]+:-:g'`" >>> >>> # Create a side-stream for computing the MD5 of the data read >>> fifo=`mktemp -u` >>> mkfifo $fifo || exit >>> md5sum -b <$fifo >$2/$outfile.md5 & >>> >>> # Read the device and compress it >>> dd if="/dev/$1" bs=1M | tee 2>$fifo | gzip >$2/$outfile.gz >>> >>> # Wait for the background task to close >>> wait >>> >>> I was a little concerned as I didn't see the hdd drive LED light up >>> for my attempt even though the file was growing nicely on my CIFS >>> share. I'm dumping it on the 5x2TB disks (JBOD) in my windows 7 box >>> as my Thecus is not a happy chappy and I need time to mount the DOM >>> module in another machine to find out why. >> >> If you're willing to dismantle the thecus, directly connecting its drives to your crippled system will make things go much faster. You've got four empty motherboard SATA ports. You just have to watch out for the power load. 
>> >> Phil >> <block2gz.sh> > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-17 14:34 ` Simon Mcnair @ 2011-02-17 16:54 ` Phil Turmel 2011-02-19 8:43 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-17 16:54 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/17/2011 09:34 AM, Simon Mcnair wrote: > 19390 root 17.49 M/s 0 B/s 0.00 % 21.36 % dd if /dev/sde bs 1M > 19333 root 16.52 M/s 0 B/s 0.00 % 17.07 % dd if /dev/sdd bs 1M > 19503 root 15.79 M/s 0 B/s 0.00 % 13.91 % dd if /dev/sdf bs 1M > 18896 root 0 B/s 16.10 M/s 0.00 % 0.00 % ntfs-3g > /dev/sdb1 /media/ntfs3g/sdb > 18909 root 0 B/s 13.49 M/s 0.00 % 0.00 % ntfs-3g > /dev/sdc1 /media/ntfs3g/sdc > 18920 root 0 B/s 15.73 M/s 0.00 % 0.00 % ntfs-3g > /dev/sdn1 /media/ntfs3g/sdn > > this will certainly be quicker :-) But still a long time... Let me know. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-17 16:54 ` Phil Turmel @ 2011-02-19 8:43 ` Simon Mcnair 2011-02-19 15:30 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-19 8:43 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org Phil, Sorry for the spamming, but I'm just keeping you informed :-). proxmox:/home/simon# ./block2gz.sh sdd /media/ntfs3g/sdb 953869+1 records in 953869+1 records out 1000204886016 bytes (1.0 TB) copied, 95384 s, 10.5 MB/s proxmox:/home/simon# ./block2gz.sh sde /media/ntfs3g/sdn 953869+1 records in 953869+1 records out 1000204886016 bytes (1.0 TB) copied, 106317 s, 9.4 MB/s proxmox:/home/simon# ./block2gz.sh sdf /media/ntfs3g/sdc 953869+1 records in 953869+1 records out 1000204886016 bytes (1.0 TB) copied, 104711 s, 9.6 MB/s proxmox:/home/simon# ./block2gz.sh sdg /media/ntfs3g/sdb 953869+1 records in 953869+1 records out 1000204886016 bytes (1.0 TB) copied, 104980 s, 9.5 MB/s proxmox:/home/simon# ./block2gz.sh sdh /media/ntfs3g/sdn 953869+1 records in 953869+1 records out 1000204886016 bytes (1.0 TB) copied, 98105.1 s, 10.2 MB/s proxmox:/home/simon# ./block2gz.sh sdi /media/ntfs3g/sdc 953869+1 records in 953869+1 records out 1000204886016 bytes (1.0 TB) copied, 98291 s, 10.2 MB/s proxmox:/home/simon# Have just started sdj,sdk,sdl,sdm. I was thinking of renaming the .gz's with the serial number as this would seem to me as more useful, is this a good idea ? These numbers are not at all reflective of the drive or controller speed as I took the lazy route and was writing 2 sets of images at the same time to the same drive and in addition gzip was also running (although not really stressing the system from what I could tell). I suspect also that the drives may be pretty fragmented as the space was not allocated at the start of the write, so that may have had some impact too. Simon On 17 February 2011 16:54, Phil Turmel <philip@turmel.org> wrote: > On 02/17/2011 09:34 AM, Simon Mcnair wrote: >> 19390 root 17.49 M/s 0 B/s 0.00 % 21.36 % dd if /dev/sde bs 1M >> 19333 root 16.52 M/s 0 B/s 0.00 % 17.07 % dd if /dev/sdd bs 1M >> 19503 root 15.79 M/s 0 B/s 0.00 % 13.91 % dd if /dev/sdf bs 1M >> 18896 root 0 B/s 16.10 M/s 0.00 % 0.00 % ntfs-3g >> /dev/sdb1 /media/ntfs3g/sdb >> 18909 root 0 B/s 13.49 M/s 0.00 % 0.00 % ntfs-3g >> /dev/sdc1 /media/ntfs3g/sdc >> 18920 root 0 B/s 15.73 M/s 0.00 % 0.00 % ntfs-3g >> /dev/sdn1 /media/ntfs3g/sdn >> >> this will certainly be quicker :-) > > But still a long time... > > Let me know. > > Phil > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
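Editor's aside: Simon's serial-number idea can be applied to images already written. A sketch, assuming hdparm is installed and reusing the device and path names from this thread (sdd as a hypothetical example):

  serial=$(hdparm -I /dev/sdd | awk '/Serial Number/ {print $NF}')
  mv /media/ntfs3g/sdb/sdd.gz  "/media/ntfs3g/sdb/${serial}.gz"
  mv /media/ntfs3g/sdb/sdd.md5 "/media/ntfs3g/sdb/${serial}.md5"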
* Re: Linux software RAID assistance 2011-02-19 8:43 ` Simon Mcnair @ 2011-02-19 15:30 ` Phil Turmel [not found] ` <AANLkTinOXJWRw_et2U43R_T9XPBzQLnN56Kf2bOAz=_c@mail.gmail.com> 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-19 15:30 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/19/2011 03:43 AM, Simon Mcnair wrote: > Phil, > Sorry for the spamming, but I'm just keeping you informed :-). > > proxmox:/home/simon# ./block2gz.sh sdd /media/ntfs3g/sdb > 953869+1 records in > 953869+1 records out > 1000204886016 bytes (1.0 TB) copied, 95384 s, 10.5 MB/s [trim /] > Have just started sdj,sdk,sdl,sdm. I was thinking of renaming the > .gz's with the serial number as this would seem to me as more useful, > is this a good idea ? Yes. I guess they'll finish some time tomorrow? > These numbers are not at all reflective of the drive or controller > speed as I took the lazy route and was writing 2 sets of images at the > same time to the same drive and in addition gzip was also running > (although not really stressing the system from what I could tell). The CPU was busy, but not overloaded. The per-drive data rates were 1/5th to 1/10th what I would have expected. The Asus P6T SE motherboard has a lot of bandwidth. Something's odd. Can you attach a copy of your dmesg? > I suspect also that the drives may be pretty fragmented as the space > was not allocated at the start of the write, so that may have had some > impact too. Ntfs-3g can push 70+ MB/s onto my heavily fragged Windows laptop partition with only 50% of a Core2 Duo. I doubt that's it. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
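Editor's aside: one way to separate raw drive/controller speed from the gzip-plus-NTFS pipeline is a direct-I/O read test on an idle drive (GNU dd's iflag=direct bypasses the page cache; not run in the thread):

  dd if=/dev/sdd of=/dev/null bs=1M count=1024 iflag=direct

A healthy 1 TB SATA drive of that era reads roughly 80-120 MB/s on outer tracks when nothing else is touching it; rates far below that on this test point at the controller path rather than at gzip.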
[parent not found: <AANLkTinOXJWRw_et2U43R_T9XPBzQLnN56Kf2bOAz=_c@mail.gmail.com>]
* Re: Linux software RAID assistance [not found] ` <AANLkTinOXJWRw_et2U43R_T9XPBzQLnN56Kf2bOAz=_c@mail.gmail.com> @ 2011-02-19 16:19 ` Phil Turmel 2011-02-20 9:56 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-19 16:19 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/19/2011 10:36 AM, Simon Mcnair wrote: > as attached (hopefully). > regards > Simon Nothing obvious to me. Are you running irqbalance? cat /proc/interrupts ? ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-19 16:19 ` Phil Turmel @ 2011-02-20 9:56 ` Simon Mcnair 2011-02-20 19:50 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-20 9:56 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org Phil, I don't know how to find out if I'm running irqbalance, it's whatever was in the proxmox iso that I installed the OS from. I ran a ps -aux | grep irq in case it shows anything of interest: proxmox:/home/simon# ps -aux | grep irq Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html root 3 0.0 0.0 0 0 ? S Feb17 0:22 [ksoftirqd/0] root 7 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/1] root 10 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/2] root 13 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/3] root 16 0.0 0.0 0 0 ? S Feb17 0:11 [ksoftirqd/4] root 19 0.0 0.0 0 0 ? S Feb17 0:13 [ksoftirqd/5] root 22 0.0 0.0 0 0 ? S Feb17 0:06 [ksoftirqd/6] root 25 0.0 0.0 0 0 ? S Feb17 0:06 [ksoftirqd/7] root 4024 0.0 0.0 0 0 ? S Feb17 0:00 [kvm-irqfd-clean] proxmox:/home/simon# cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 0: 66008972 0 0 0 0 0 0 0 IR-IO-APIC-edge timer 1: 265829 0 0 0 0 0 0 0 IR-IO-APIC-edge i8042 8: 1 0 0 0 0 0 0 0 IR-IO-APIC-edge rtc0 9: 0 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi acpi 16: 4432639 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb3, ahci 17: 124325 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi pata_jmicron, eth1 18: 954710 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb8 19: 5994 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb5, uhci_hcd:usb7, firewire_ohci 21: 0 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb4 23: 62 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb6 24: 273558 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi nvidia 28: 30234710 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi mvsas 64: 0 0 0 0 0 0 0 0 DMAR_MSI-edge dmar0 65: 0 0 0 0 0 0 0 0 DMAR_MSI-edge dmar1 73: 26838727 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0 74: 27057631 0 0 0 0 0 0 0 IR-PCI-MSI-edge ahci 75: 247 0 0 0 0 0 0 0 IR-PCI-MSI-edge hda_intel NMI: 0 0 0 0 0 0 0 0 Non-maskable interrupts LOC: 23087679 23088282 21638323 20580136 22974769 20856094 20144310 20084801 Local timer interrupts SPU: 0 0 0 0 0 0 0 0 Spurious interrupts PMI: 0 0 0 0 0 0 0 0 Performance monitoring interrupts PND: 0 0 0 0 0 0 0 0 Performance pending work RES: 15236622 9388860 8532928 7181879 13444866 6005599 4856933 3713707 Rescheduling interrupts CAL: 3384 5773 5839 5859 4590 5748 5781 5691 Function call interrupts TLB: 201989 195383 193545 196155 251338 249053 272032 319233 TLB shootdowns TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts MCE: 0 0 0 0 0 0 0 0 Machine check exceptions MCP: 830 830 830 830 830 830 830 830 Machine check polls ERR: 7 MIS: 0 cheers Simon On 19 February 2011 16:19, Phil Turmel <philip@turmel.org> wrote: > On 02/19/2011 10:36 AM, Simon Mcnair wrote: >> as attached (hopefully). >> regards >> Simon > > Nothing obvious to me. > > Are you running irqbalance? > > cat /proc/interrupts ? > ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-20 9:56 ` Simon Mcnair @ 2011-02-20 19:50 ` Phil Turmel 2011-02-20 23:17 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-20 19:50 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/20/2011 04:56 AM, Simon Mcnair wrote: > Phil, > I don't know how to find out if I'm running irqbalance, it's whatever > was in the proxmox iso that I installed the OS from. I ran a ps -aux > | grep irq in case it shows anything of interest: > > proxmox:/home/simon# ps -aux | grep irq > Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html > root 3 0.0 0.0 0 0 ? S Feb17 0:22 [ksoftirqd/0] > root 7 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/1] > root 10 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/2] > root 13 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/3] > root 16 0.0 0.0 0 0 ? S Feb17 0:11 [ksoftirqd/4] > root 19 0.0 0.0 0 0 ? S Feb17 0:13 [ksoftirqd/5] > root 22 0.0 0.0 0 0 ? S Feb17 0:06 [ksoftirqd/6] > root 25 0.0 0.0 0 0 ? S Feb17 0:06 [ksoftirqd/7] > root 4024 0.0 0.0 0 0 ? S Feb17 0:00 > [kvm-irqfd-clean] Not there. On ubuntu 10.10, the package is called "irqbalance", and the executable daemon is "irqbalance". > proxmox:/home/simon# cat /proc/interrupts > CPU0 CPU1 CPU2 CPU3 CPU4 > CPU5 CPU6 CPU7 > 0: 66008972 0 0 0 0 > 0 0 0 IR-IO-APIC-edge timer > 1: 265829 0 0 0 0 > 0 0 0 IR-IO-APIC-edge i8042 > 8: 1 0 0 0 0 > 0 0 0 IR-IO-APIC-edge rtc0 > 9: 0 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi acpi > 16: 4432639 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb3, ahci > 17: 124325 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi pata_jmicron, eth1 > 18: 954710 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb1, > uhci_hcd:usb8 > 19: 5994 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb5, > uhci_hcd:usb7, firewire_ohci > 21: 0 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb4 > 23: 62 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb2, > uhci_hcd:usb6 > 24: 273558 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi nvidia > 28: 30234710 0 0 0 0 > 0 0 0 IR-IO-APIC-fasteoi mvsas > 64: 0 0 0 0 0 > 0 0 0 DMAR_MSI-edge dmar0 > 65: 0 0 0 0 0 > 0 0 0 DMAR_MSI-edge dmar1 > 73: 26838727 0 0 0 0 > 0 0 0 IR-PCI-MSI-edge eth0 > 74: 27057631 0 0 0 0 > 0 0 0 IR-PCI-MSI-edge ahci > 75: 247 0 0 0 0 > 0 0 0 IR-PCI-MSI-edge hda_intel > NMI: 0 0 0 0 0 > 0 0 0 Non-maskable interrupts > LOC: 23087679 23088282 21638323 20580136 22974769 > 20856094 20144310 20084801 Local timer interrupts > SPU: 0 0 0 0 0 > 0 0 0 Spurious interrupts > PMI: 0 0 0 0 0 > 0 0 0 Performance monitoring interrupts > PND: 0 0 0 0 0 > 0 0 0 Performance pending work > RES: 15236622 9388860 8532928 7181879 13444866 > 6005599 4856933 3713707 Rescheduling interrupts > CAL: 3384 5773 5839 5859 4590 > 5748 5781 5691 Function call interrupts > TLB: 201989 195383 193545 196155 251338 > 249053 272032 319233 TLB shootdowns > TRM: 0 0 0 0 0 > 0 0 0 Thermal event interrupts > THR: 0 0 0 0 0 > 0 0 0 Threshold APIC interrupts > MCE: 0 0 0 0 0 > 0 0 0 Machine check exceptions > MCP: 830 830 830 830 830 > 830 830 830 Machine check polls > ERR: 7 > MIS: 0 CPU0 is handling every single I/O interrupt. I really think you need irqbalance. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
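Editor's aside: on Debian-based Proxmox the fix Phil suggests would look something like the following (package and init-script names assumed, not confirmed in the thread). The manual form writes a hex CPU bitmask per IRQ; irqbalance just automates the same thing:

  apt-get install irqbalance
  /etc/init.d/irqbalance start
  # or by hand, e.g. move mvsas (IRQ 28 in the listing above) to CPU1:
  echo 2 > /proc/irq/28/smp_affinity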
* Re: Linux software RAID assistance 2011-02-20 19:50 ` Phil Turmel @ 2011-02-20 23:17 ` Simon Mcnair 2011-02-20 23:39 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-20 23:17 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org Phil, You are a god amongst men :-) I seem to have the vast majority of my data. I can't see anything in lost+found and the fsck log doesn't list any filenames etc of missing data. Is there any way for me to find anything that I've lost ? :-) :-) :-) Simon On 20 February 2011 19:50, Phil Turmel <philip@turmel.org> wrote: > On 02/20/2011 04:56 AM, Simon Mcnair wrote: >> Phil, >> I don't know how to find out if I'm running irqbalance, it's whatever >> was in the proxmox iso that I installed the OS from. I ran a ps -aux >> | grep irq in case it shows anything of interest: >> >> proxmox:/home/simon# ps -aux | grep irq >> Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html >> root 3 0.0 0.0 0 0 ? S Feb17 0:22 [ksoftirqd/0] >> root 7 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/1] >> root 10 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/2] >> root 13 0.0 0.0 0 0 ? S Feb17 0:05 [ksoftirqd/3] >> root 16 0.0 0.0 0 0 ? S Feb17 0:11 [ksoftirqd/4] >> root 19 0.0 0.0 0 0 ? S Feb17 0:13 [ksoftirqd/5] >> root 22 0.0 0.0 0 0 ? S Feb17 0:06 [ksoftirqd/6] >> root 25 0.0 0.0 0 0 ? S Feb17 0:06 [ksoftirqd/7] >> root 4024 0.0 0.0 0 0 ? S Feb17 0:00 >> [kvm-irqfd-clean] > > Not there. On ubuntu 10.10, the package is called "irqbalance", and the executable daemon is "irqbalance". > >> proxmox:/home/simon# cat /proc/interrupts >> CPU0 CPU1 CPU2 CPU3 CPU4 >> CPU5 CPU6 CPU7 >> 0: 66008972 0 0 0 0 >> 0 0 0 IR-IO-APIC-edge timer >> 1: 265829 0 0 0 0 >> 0 0 0 IR-IO-APIC-edge i8042 >> 8: 1 0 0 0 0 >> 0 0 0 IR-IO-APIC-edge rtc0 >> 9: 0 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi acpi >> 16: 4432639 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb3, ahci >> 17: 124325 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi pata_jmicron, eth1 >> 18: 954710 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb1, >> uhci_hcd:usb8 >> 19: 5994 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb5, >> uhci_hcd:usb7, firewire_ohci >> 21: 0 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi uhci_hcd:usb4 >> 23: 62 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb2, >> uhci_hcd:usb6 >> 24: 273558 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi nvidia >> 28: 30234710 0 0 0 0 >> 0 0 0 IR-IO-APIC-fasteoi mvsas >> 64: 0 0 0 0 0 >> 0 0 0 DMAR_MSI-edge dmar0 >> 65: 0 0 0 0 0 >> 0 0 0 DMAR_MSI-edge dmar1 >> 73: 26838727 0 0 0 0 >> 0 0 0 IR-PCI-MSI-edge eth0 >> 74: 27057631 0 0 0 0 >> 0 0 0 IR-PCI-MSI-edge ahci >> 75: 247 0 0 0 0 >> 0 0 0 IR-PCI-MSI-edge hda_intel >> NMI: 0 0 0 0 0 >> 0 0 0 Non-maskable interrupts >> LOC: 23087679 23088282 21638323 20580136 22974769 >> 20856094 20144310 20084801 Local timer interrupts >> SPU: 0 0 0 0 0 >> 0 0 0 Spurious interrupts >> PMI: 0 0 0 0 0 >> 0 0 0 Performance monitoring interrupts >> PND: 0 0 0 0 0 >> 0 0 0 Performance pending work >> RES: 15236622 9388860 8532928 7181879 13444866 >> 6005599 4856933 3713707 Rescheduling interrupts >> CAL: 3384 5773 5839 5859 4590 >> 5748 5781 5691 Function call interrupts >> TLB: 201989 195383 193545 196155 251338 >> 249053 272032 319233 TLB shootdowns >> TRM: 0 0 0 0 0 >> 0 0 0 Thermal event interrupts >> THR: 0 0 0 0 0 >> 0 0 0 Threshold APIC interrupts >> MCE: 0 0 0 0 0 >> 0 0 0 Machine check exceptions >> MCP: 830 830 830 830 830 >> 830 830 830 Machine check polls >> ERR: 7 >> MIS: 0 > > CPU0 is handling every single 
I/O interrupt. I really think you need irqbalance. > > Phil > ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-20 23:17 ` Simon Mcnair @ 2011-02-20 23:39 ` Phil Turmel 2011-02-22 17:12 ` Simon Mcnair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-20 23:39 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/20/2011 06:17 PM, Simon Mcnair wrote: > Phil, > You are a god amongst men :-) Heh. I read that out loud to my wife.... > I seem to have the vast majority of my data. I can't see anything in > lost+found and the fsck log doesn't list any filenames etc of missing > data. Is there any way for me to find anything that I've lost ? Not that I know of, short of a list of names from before the crash. Glad to hear that the family treasures are safe. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance
  2011-02-20 23:39 ` Phil Turmel
@ 2011-02-22 17:12 ` Simon Mcnair
  2011-02-22 17:14 ` Simon Mcnair
  0 siblings, 1 reply; 64+ messages in thread
From: Simon Mcnair @ 2011-02-22 17:12 UTC (permalink / raw)
To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org

Phil,
I thought it was fixed, but every time I do a reboot it disappears again. I did:

pvcreate --restorefile 'simons-lvm-backup' --uuid 9fAJEz-HcaP-RQ51-fV8b-nxrN-Uqwb-PPnOLJ /dev/md0

and the vgcfgrestore. The only piece of your advice that I didn't follow was "zero the superblock on the removed drive", as I thought re-adding it to the array would do that anyway. Did this cause me problems?

proxmox:~# mdadm --assemble --scan -v
mdadm: looking for devices for /dev/md/0
mdadm: cannot open device /dev/dm-2: Device or resource busy
mdadm: /dev/dm-2 has wrong uuid.
mdadm: cannot open device /dev/dm-1: Device or resource busy
mdadm: /dev/dm-1 has wrong uuid.
mdadm: cannot open device /dev/dm-0: Device or resource busy
mdadm: /dev/dm-0 has wrong uuid.
mdadm: /dev/sdo1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdo
mdadm: /dev/sdo has wrong uuid.
mdadm: /dev/sdn1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdn
mdadm: /dev/sdn has wrong uuid.
mdadm: /dev/sdm1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdm
mdadm: /dev/sdm has wrong uuid.
mdadm: /dev/sdl1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdl
mdadm: /dev/sdl has wrong uuid.
mdadm: /dev/sdk1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdk
mdadm: /dev/sdk has wrong uuid.
mdadm: /dev/sdj1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdj
mdadm: /dev/sdj has wrong uuid.
mdadm: /dev/sdi1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdi
mdadm: /dev/sdi has wrong uuid.
mdadm: /dev/sdh1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdh
mdadm: /dev/sdh has wrong uuid.
mdadm: /dev/sdg1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdg
mdadm: /dev/sdg has wrong uuid.
mdadm: /dev/sdf1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdf
mdadm: /dev/sdf has wrong uuid.
mdadm: cannot open device /dev/sdc1: Device or resource busy
mdadm: /dev/sdc1 has wrong uuid.
mdadm: cannot open device /dev/sdc: Device or resource busy
mdadm: /dev/sdc has wrong uuid.
mdadm: cannot open device /dev/sdb1: Device or resource busy
mdadm: /dev/sdb1 has wrong uuid.
mdadm: cannot open device /dev/sdb: Device or resource busy
mdadm: /dev/sdb has wrong uuid.
mdadm: cannot open device /dev/sda2: Device or resource busy
mdadm: /dev/sda2 has wrong uuid.
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: /dev/sda1 has wrong uuid.
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has wrong uuid.

I'm getting frustrated now. It seems like it's fixed, then next reboot, BAM, it's gone again. I'm so out of my depth here.

Please can you help me fix this once and for all?

cheers
Simon

On 20 February 2011 23:39, Phil Turmel <philip@turmel.org> wrote:
> On 02/20/2011 06:17 PM, Simon Mcnair wrote:
>> Phil,
>> You are a god amongst men :-)
>
> Heh. I read that out loud to my wife....
>
>> I seem to have the vast majority of my data. I can't see anything in
>> lost+found and the fsck log doesn't list any filenames etc of missing
>> data. Is there any way for me to find anything that I've lost ?
>
> Not that I know of, short of a list of names from before the crash.
>
> Glad to hear that the family treasures are safe.
>
> Phil
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-22 17:12 ` Simon Mcnair @ 2011-02-22 17:14 ` Simon Mcnair 2011-02-22 18:23 ` Phil Turmel 0 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-22 17:14 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org was this anything to do with the dist-upgrade that I performed ? mdadm has upgraded again to proxmox:~# mdadm -V mdadm - v3.1.4 - 31st August 2010 if that's important. Simon On 22 February 2011 17:12, Simon Mcnair <simonmcnair@gmail.com> wrote: > Phil, > I thought it was fixed, but every time I do a reboot it disappears > again. I did the: > pvcreate --restorefile 'simons-lvm-backup' --uuid > 9fAJEz-HcaP-RQ51-fV8b-nxrN-Uqwb-PPnOLJ /dev/md0 and the vgcfgrestore > > the only piece of your advice that I didn't follow was "zero the > superblock on the removed drive" as I thought re-adding it to the > array would do that anyway. Did this cause me problems ? > > proxmox:~# mdadm --assemble --scan -v > mdadm: looking for devices for /dev/md/0 > mdadm: cannot open device /dev/dm-2: Device or resource busy > mdadm: /dev/dm-2 has wrong uuid. > mdadm: cannot open device /dev/dm-1: Device or resource busy > mdadm: /dev/dm-1 has wrong uuid. > mdadm: cannot open device /dev/dm-0: Device or resource busy > mdadm: /dev/dm-0 has wrong uuid. > mdadm: /dev/sdo1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdo > mdadm: /dev/sdo has wrong uuid. > mdadm: /dev/sdn1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdn > mdadm: /dev/sdn has wrong uuid. > mdadm: /dev/sdm1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdm > mdadm: /dev/sdm has wrong uuid. > mdadm: /dev/sdl1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdl > mdadm: /dev/sdl has wrong uuid. > mdadm: /dev/sdk1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdk > mdadm: /dev/sdk has wrong uuid. > mdadm: /dev/sdj1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdj > mdadm: /dev/sdj has wrong uuid. > mdadm: /dev/sdi1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdi > mdadm: /dev/sdi has wrong uuid. > mdadm: /dev/sdh1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdh > mdadm: /dev/sdh has wrong uuid. > mdadm: /dev/sdg1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdg > mdadm: /dev/sdg has wrong uuid. > mdadm: /dev/sdf1 has wrong uuid. > mdadm: no RAID superblock on /dev/sdf > mdadm: /dev/sdf has wrong uuid. > mdadm: cannot open device /dev/sdc1: Device or resource busy > mdadm: /dev/sdc1 has wrong uuid. > mdadm: cannot open device /dev/sdc: Device or resource busy > mdadm: /dev/sdc has wrong uuid. > mdadm: cannot open device /dev/sdb1: Device or resource busy > mdadm: /dev/sdb1 has wrong uuid. > mdadm: cannot open device /dev/sdb: Device or resource busy > mdadm: /dev/sdb has wrong uuid. > mdadm: cannot open device /dev/sda2: Device or resource busy > mdadm: /dev/sda2 has wrong uuid. > mdadm: cannot open device /dev/sda1: Device or resource busy > mdadm: /dev/sda1 has wrong uuid. > mdadm: cannot open device /dev/sda: Device or resource busy > mdadm: /dev/sda has wrong uuid. > > I'm getting frustrated now. It seems like it's fixzed, then next > reboot, BAM, it's gone again. I'm so out of my depth here. > > Please can you help me fix this once and for all ? > > cheers > Simon > > On 20 February 2011 23:39, Phil Turmel <philip@turmel.org> wrote: >> On 02/20/2011 06:17 PM, Simon Mcnair wrote: >>> Phil, >>> You are a god amongst men :-) >> >> Heh. I read that out loud to my wife.... 
>> >>> I seem to have the vast majority of my data. I can't see anything in >>> lost+found and the fsck log doesn't list any filenames etc of missing >>> data. Is there any way for me to find anything that I've lost ? >> >> Not that I know of, short of a list of names from before the crash. >> >> Glad to hear that the family treasures are safe. >> >> Phil >> > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-22 17:14 ` Simon Mcnair @ 2011-02-22 18:23 ` Phil Turmel 2011-02-22 18:36 ` Simon McNair 0 siblings, 1 reply; 64+ messages in thread From: Phil Turmel @ 2011-02-22 18:23 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/22/2011 12:14 PM, Simon Mcnair wrote: > was this anything to do with the dist-upgrade that I performed ? > > mdadm has upgraded again to > proxmox:~# mdadm -V > mdadm - v3.1.4 - 31st August 2010 No, but you probably can't do any more "--create --assume-clean" operations with that array. As for the assembly errors, its probably an out-of-date mdadm.conf. My server's looks like this: > DEVICE /dev/sd[a-z][1-9] > > ARRAY /dev/md23 UUID=c3cbe096:fc43d939:8aa66230:708c5670 > ARRAY /dev/md22 UUID=1438a239:aa03c3f9:68e051d7:b59a6219 > ARRAY /dev/md21 UUID=4b47d0f6:16f0e352:c67a2185:feb0b573 > ARRAY /dev/md20 UUID=7676c77e:6ae70f65:30a170a2:b1a1b242 Yours probably needs to be updated with the new md0 uuid. Its not your boot drive, so your initramfs shouldn't matter. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
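Editor's aside: the comparison Phil describes is quick to run. A sketch, assuming the Debian-style config path (on some systems it is /etc/mdadm.conf instead):

  mdadm --detail /dev/md0 | grep -i uuid    # the running array's UUID
  grep ^ARRAY /etc/mdadm/mdadm.conf         # the UUID assembly will look for
  mdadm --detail --scan                     # prints a ready-made ARRAY line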
* Re: Linux software RAID assistance
  2011-02-22 18:23 ` Phil Turmel
@ 2011-02-22 18:36 ` Simon McNair
  2011-02-22 19:06 ` Phil Turmel
  0 siblings, 1 reply; 64+ messages in thread
From: Simon McNair @ 2011-02-22 18:36 UTC (permalink / raw)
To: Phil Turmel; +Cc: NeilBrown, linux-raid@vger.kernel.org

Phil,
Phew, I didn't know that:

"mdadm: /dev/sdo has wrong uuid.
mdadm: /dev/sdn1 has wrong uuid."

just meant that the array UUID didn't match mdadm.conf. It would be nice if it said:

"mdadm: /dev/sdo uuid does not match mdadm.conf.
mdadm: /dev/sdn1 uuid does not match mdadm.conf"

My mdadm.conf now reads:

DEVICE partitions
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
MAILADDR root
# ARRAY /dev/md/0 level=raid5 metadata=1.1 num-devices=10 UUID=0a72e40f:aec6f80f:a7004457:1a84a7a8 name=pro�lox:0
ARRAY /dev/md/0 metadata=1.1 UUID=12c2af00:10681e10:fb17e449:1404739c name=proxmox:0

I'm afraid to do another reboot in case something else goes wrong ;-). Just for my info, when I only have a single array there isn't much chance of it being assigned anything other than md0, so could I not just leave mdadm.conf empty?

cheers
Simon

On 22/02/2011 18:23, Phil Turmel wrote:
> On 02/22/2011 12:14 PM, Simon Mcnair wrote:
>> was this anything to do with the dist-upgrade that I performed ?
>>
>> mdadm has upgraded again to
>> proxmox:~# mdadm -V
>> mdadm - v3.1.4 - 31st August 2010
> No, but you probably can't do any more "--create --assume-clean" operations with that array.
>
> As for the assembly errors, its probably an out-of-date mdadm.conf.
>
> My server's looks like this:
>
>> DEVICE /dev/sd[a-z][1-9]
>>
>> ARRAY /dev/md23 UUID=c3cbe096:fc43d939:8aa66230:708c5670
>> ARRAY /dev/md22 UUID=1438a239:aa03c3f9:68e051d7:b59a6219
>> ARRAY /dev/md21 UUID=4b47d0f6:16f0e352:c67a2185:feb0b573
>> ARRAY /dev/md20 UUID=7676c77e:6ae70f65:30a170a2:b1a1b242
> Yours probably needs to be updated with the new md0 uuid. Its not your boot drive, so your initramfs shouldn't matter.
>
> Phil
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-22 18:36 ` Simon McNair @ 2011-02-22 19:06 ` Phil Turmel 0 siblings, 0 replies; 64+ messages in thread From: Phil Turmel @ 2011-02-22 19:06 UTC (permalink / raw) To: simonmcnair; +Cc: NeilBrown, linux-raid@vger.kernel.org On 02/22/2011 01:36 PM, Simon McNair wrote: > Phil, > phew I didn't know that: > "mdadm: /dev/sdo has wrong uuid. > mdadm: /dev/sdn1 has wrong uuid." > > was just that the array UUID didn't match mdadm.conf, it would be nice if it said: > > "mdadm: /dev/sdo uuid does not match mdadm.conf. > mdadm: /dev/sdn1 uuid does not match mdadm.conf" > my mdadm.conf now reads > > DEVICE partitions > CREATE owner=root group=disk mode=0660 auto=yes > HOMEHOST <system> > MAILADDR root > # ARRAY /dev/md/0 level=raid5 metadata=1.1 num-devices=10 UUID=0a72e40f:aec6f80f:a7004457:1a84a7a8 name=pro�lox:0 > ARRAY /dev/md/0 metadata=1.1 UUID=12c2af00:10681e10:fb17e449:1404739c name=proxmox:0 You can strip it down to be effectively "auto", but I wouldn't remove it. Since you are using partitioned devices throughout your system, something like this: DEVICE /dev/sd*[0-9] AUTO all Or slightly more restrictive: DEVICE /dev/sd*[0-9] ARRAY /dev/md/0 UUID=12c2af00:10681e10:fb17e449:1404739c > I'm afraid to do another reboot incase something else goes wrong ;-). Just for my info, when I only have a single array there isn't much chance of it being assigned anything other than md0, so could I not just leave mdadm.conf empty ? LVM doesn't care what device name the physical volumes end up with. Please reboot when the system is quiet again to make sure your new mdadm.conf works. Or stop your array, and then do a "mdadm --assemble --scan", which is effectively the same. Phil -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
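Editor's aside: Phil's no-reboot test, spelled out under the assumption that nothing from the LV is mounted; the VG on top of md0 has to be deactivated first, since the array cannot stop while LVM holds it:

  vgchange -an lvm-raid        # release the VG so md0 is no longer busy
  mdadm --stop /dev/md0
  mdadm --assemble --scan -v   # should reassemble via the new mdadm.conf
  vgchange -ay lvm-raid        # bring the LV back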
* Re: Linux software RAID assistance 2011-02-16 19:36 ` Phil Turmel 2011-02-16 21:28 ` Simon McNair @ 2011-02-18 9:31 ` Simon Mcnair 2011-02-18 13:16 ` Phil Turmel [not found] ` <AANLkTi=RmR5nVnmFLuqK5anHc3WDPxjuYjitT6+5wAqS@mail.gmail.com> 2 siblings, 1 reply; 64+ messages in thread From: Simon Mcnair @ 2011-02-18 9:31 UTC (permalink / raw) To: Phil Turmel; +Cc: NeilBrown, linux-raid time passes. You are eaten by a Grue. Sheesh this is taking a long time. simon@proxmox:~$ top top - 09:26:19 up 20:36, 11 users, load average: 2.46, 2.49, 2.39 Tasks: 334 total, 9 running, 325 sleeping, 0 stopped, 0 zombie Cpu(s): 52.8%us, 9.4%sy, 0.0%ni, 35.2%id, 2.5%wa, 0.0%hi, 0.2%si, 0.0%st Mem: 12299244k total, 12227584k used, 71660k free, 11042868k buffers Swap: 11534332k total, 0k used, 11534332k free, 208204k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 18896 root 20 0 12288 1072 460 S 63 0.0 431:16.55 ntfs-3g 18909 root 20 0 12288 1064 460 R 62 0.0 419:17.11 ntfs-3g 18920 root 20 0 12288 1100 492 R 62 0.0 428:17.33 ntfs-3g 9210 root 20 0 4068 520 328 S 54 0.0 661:59.92 gzip 9138 root 20 0 4068 524 328 R 53 0.0 647:29.07 gzip 9247 root 20 0 4068 524 328 S 53 0.0 651:28.36 gzip 25678 root 20 0 4068 524 328 R 52 0.0 439:49.20 gzip 24957 root 20 0 4068 524 328 R 51 0.0 437:44.01 gzip 25792 root 20 0 4068 524 328 S 48 0.0 433:28.14 gzip This is hardly touching the CPU on the box (in my opinion), any advice on using renice ? I've never used it before, but now seems like a good time ? tia Simon On 16 February 2011 19:36, Phil Turmel <philip@turmel.org> wrote: > On 02/16/2011 02:15 PM, Simon McNair wrote: >> proxmox:/home/simon# vgscan --verbose >> Wiping cache of LVM-capable devices >> Wiping internal VG cache >> Reading all physical volumes. This may take a while... >> Finding all volume groups >> Finding volume group "pve" >> Found volume group "pve" using metadata type lvm2 >> Finding volume group "lvm-raid" >> Found volume group "lvm-raid" using metadata type lvm2 >> proxmox:/home/simon# >> proxmox:/home/simon# lvscan --verbose >> Finding all logical volumes >> ACTIVE '/dev/pve/swap' [11.00 GB] inherit >> ACTIVE '/dev/pve/root' [96.00 GB] inherit >> ACTIVE '/dev/pve/data' [354.26 GB] inherit >> inactive '/dev/lvm-raid/RAID' [8.19 TB] inherit >> >> proxmox:/home/simon# vgchange -ay >> 3 logical volume(s) in volume group "pve" now active >> 1 logical volume(s) in volume group "lvm-raid" now active > > Heh. Figures. > >> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm-raid-RAID > > Actually, I wanted you to try with a capital N. Lower case 'n' is similar, but not quite the same. > >> e2fsck 1.41.3 (12-Oct-2008) >> fsck.ext4: No such file or directory while trying to open /dev/mapper/lvm-raid-RAID >> >> The superblock could not be read or does not describe a correct ext2 >> filesystem. If the device is valid and it really contains an ext2 >> filesystem (and not swap or ufs or something else), then the superblock >> is corrupt, and you might try running e2fsck with an alternate superblock: >> e2fsck -b 8193 <device> >> >> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/ >> control lvm--raid-RAID pve-data pve-root pve-swap > > Strange. I guess it does that to distinguish dashes in the VG name from dashes between VG and LV names. > >> proxmox:/home/simon# fsck.ext4 -n /dev/mapper/lvm--raid-RAID >> e2fsck 1.41.3 (12-Oct-2008) >> /dev/mapper/lvm--raid-RAID has unsupported feature(s): FEATURE_I31 >> e2fsck: Get a newer version of e2fsck! >> >> my version of e2fsck always worked before ? 
> > v1.41.14 was release 7 weeks ago. But, I suspect there's corruption in the superblock. Do you still have your disk images tucked away somewhere safe? > > If so, try: > > 1) The '-b' option to e2fsck. We need to experiment with '-n -b offset' to find the alternate superblock. Trying 'offset' = to 8193, 16384, and 32768, per the man-page. > > 2) A newer e2fsprogs. > > Finally, > 3) mount -r /dev/lvm-raid/RAID /mnt/whatever > > Phil > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: Linux software RAID assistance 2011-02-18 9:31 ` Simon Mcnair @ 2011-02-18 13:16 ` Phil Turmel 2011-02-18 13:21 ` Roberto Spadim 2011-02-18 13:29 ` Simon Mcnair 0 siblings, 2 replies; 64+ messages in thread From: Phil Turmel @ 2011-02-18 13:16 UTC (permalink / raw) To: Simon Mcnair; +Cc: NeilBrown, linux-raid On 02/18/2011 04:31 AM, Simon Mcnair wrote: > time passes. You are eaten by a Grue. Sheesh this is taking a long time. > > simon@proxmox:~$ top > > top - 09:26:19 up 20:36, 11 users, load average: 2.46, 2.49, 2.39 > Tasks: 334 total, 9 running, 325 sleeping, 0 stopped, 0 zombie > Cpu(s): 52.8%us, 9.4%sy, 0.0%ni, 35.2%id, 2.5%wa, 0.0%hi, 0.2%si, 0.0%st > Mem: 12299244k total, 12227584k used, 71660k free, 11042868k buffers > Swap: 11534332k total, 0k used, 11534332k free, 208204k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 18896 root 20 0 12288 1072 460 S 63 0.0 431:16.55 ntfs-3g > 18909 root 20 0 12288 1064 460 R 62 0.0 419:17.11 ntfs-3g > 18920 root 20 0 12288 1100 492 R 62 0.0 428:17.33 ntfs-3g > 9210 root 20 0 4068 520 328 S 54 0.0 661:59.92 gzip > 9138 root 20 0 4068 524 328 R 53 0.0 647:29.07 gzip > 9247 root 20 0 4068 524 328 S 53 0.0 651:28.36 gzip > 25678 root 20 0 4068 524 328 R 52 0.0 439:49.20 gzip > 24957 root 20 0 4068 524 328 R 51 0.0 437:44.01 gzip > 25792 root 20 0 4068 524 328 S 48 0.0 433:28.14 gzip > > This is hardly touching the CPU on the box (in my opinion), any advice > on using renice ? I've never used it before, but now seems like a > good time ? Probably wouldn't help. You have a total usage of 65%. I suspect that each of the processors running gzip are nearly pegged, and everything else is loafing along. Hit '1' in top to show the CPU usage per-cpu. Single-threaded gzip is holding you back. If you had enough space for saving the partitions uncompressed, it would be much faster. The PCIe x4 interface on the SuperMicro has a theoretical performance of 1GByte/s, which would be ~ 125MB/s per drive. From what I've read, that card actually delivers ~ 75MB/s per drive when they're all busy. Phil ^ permalink raw reply [flat|nested] 64+ messages in thread
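Editor's aside: since single-threaded gzip is the ceiling here, pigz -- a parallel gzip-compatible compressor, not mentioned in the thread and assumed to be installable -- would be a drop-in replacement for the compression leg of the script:

  dd if="/dev/$1" bs=1M | tee "$fifo" | pigz -p 4 >"$2/$outfile.gz"

The output is ordinary gzip format, so nothing downstream changes; -p sets the worker thread count.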
* Re: Linux software RAID assistance
From: Roberto Spadim @ 2011-02-18 13:21 UTC
To: Phil Turmel; +Cc: Simon Mcnair, NeilBrown, linux-raid

PCI Express x4 = 2.5 Gb/s * 4 = 10 Gb/s, no?!

2011/2/18 Phil Turmel <philip@turmel.org>:
> On 02/18/2011 04:31 AM, Simon Mcnair wrote:
>> Time passes. You are eaten by a Grue. Sheesh, this is taking a long time.
>>
>> simon@proxmox:~$ top
[trim /]
>> This is hardly touching the CPU on the box (in my opinion). Any advice
>> on using renice? I've never used it before, but now seems like a good
>> time.
>
> Probably wouldn't help.
>
> You have a total usage of 65%.  I suspect that each of the processors running gzip is nearly pegged, and everything else is loafing along.  Hit '1' in top to show the CPU usage per-cpu.  Single-threaded gzip is holding you back.
>
> If you had enough space for saving the partitions uncompressed, it would be much faster.  The PCIe x4 interface on the SuperMicro has a theoretical performance of 1 GByte/s, which would be ~125 MB/s per drive.  From what I've read, that card actually delivers ~75 MB/s per drive when they're all busy.
>
> Phil

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: Linux software RAID assistance
From: Phil Turmel @ 2011-02-18 13:26 UTC
To: Roberto Spadim; +Cc: Simon Mcnair, NeilBrown, linux-raid

On 02/18/2011 08:21 AM, Roberto Spadim wrote:
> PCI Express x4 = 2.5 Gb/s * 4 = 10 Gb/s, no?!

Subtract transfer overhead (framing, ECC, addressing, etc.) and it takes more than 8 bits on the wire to transfer 8 bits of data.  I haven't checked the spec, but it looks like framing and ECC alone add +25%.
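That +25% matches PCIe 1.x's 8b/10b line coding, which sends every 8-bit byte as a 10-bit symbol. Worked through for this slot:

    2.5 Gb/s per lane x 4 lanes    = 10 Gb/s raw
    10 Gb/s x 8/10 (8b/10b coding) =  8 Gb/s = 1 GB/s payload

before any packet framing — which is where the 1 GByte/s theoretical figure for the x4 interface comes from.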
* Re: Linux software RAID assistance
From: Simon Mcnair @ 2011-02-18 13:29 UTC
To: Phil Turmel; +Cc: NeilBrown, linux-raid

top - 13:28:45 up 1 day, 39 min, 12 users,  load average: 2.65, 2.49, 2.37
Tasks: 340 total,   7 running, 333 sleeping,   0 stopped,   0 zombie
Cpu0  : 47.9%us, 15.0%sy,  0.0%ni, 13.7%id, 21.5%wa,  0.0%hi,  2.0%si,  0.0%st
Cpu1  : 54.8%us,  8.7%sy,  0.0%ni, 33.3%id,  3.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 46.5%us, 13.7%sy,  0.0%ni, 37.9%id,  1.9%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 49.5%us,  9.5%sy,  0.0%ni, 37.6%id,  3.4%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  : 48.7%us,  8.2%sy,  0.0%ni, 17.3%id, 25.5%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu5  : 50.3%us,  7.2%sy,  0.0%ni, 41.8%id,  0.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  : 53.5%us,  6.6%sy,  0.0%ni, 38.4%id,  1.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  : 52.0%us,  6.2%sy,  0.0%ni, 41.2%id,  0.6%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  12299244k total, 12227064k used,    72180k free, 10754600k buffers
Swap: 11534332k total,        0k used, 11534332k free,   175916k cached

On 18 February 2011 13:16, Phil Turmel <philip@turmel.org> wrote:
> On 02/18/2011 04:31 AM, Simon Mcnair wrote:
>> Time passes. You are eaten by a Grue. Sheesh, this is taking a long time.
>>
>> simon@proxmox:~$ top
[trim /]
>> This is hardly touching the CPU on the box (in my opinion). Any advice
>> on using renice? I've never used it before, but now seems like a good
>> time.
>
> Probably wouldn't help.
>
> You have a total usage of 65%.  I suspect that each of the processors running gzip is nearly pegged, and everything else is loafing along.  Hit '1' in top to show the CPU usage per-cpu.  Single-threaded gzip is holding you back.
>
> If you had enough space for saving the partitions uncompressed, it would be much faster.  The PCIe x4 interface on the SuperMicro has a theoretical performance of 1 GByte/s, which would be ~125 MB/s per drive.  From what I've read, that card actually delivers ~75 MB/s per drive when they're all busy.
>
> Phil
* Re: Linux software RAID assistance
From: Phil Turmel @ 2011-02-18 13:34 UTC
To: Simon Mcnair; +Cc: NeilBrown, linux-raid

On 02/18/2011 08:29 AM, Simon Mcnair wrote:
> top - 13:28:45 up 1 day, 39 min, 12 users,  load average: 2.65, 2.49, 2.37
> Tasks: 340 total,   7 running, 333 sleeping,   0 stopped,   0 zombie
> Cpu0  : 47.9%us, 15.0%sy,  0.0%ni, 13.7%id, 21.5%wa,  0.0%hi,  2.0%si,  0.0%st
> Cpu1  : 54.8%us,  8.7%sy,  0.0%ni, 33.3%id,  3.2%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu2  : 46.5%us, 13.7%sy,  0.0%ni, 37.9%id,  1.9%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu3  : 49.5%us,  9.5%sy,  0.0%ni, 37.6%id,  3.4%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu4  : 48.7%us,  8.2%sy,  0.0%ni, 17.3%id, 25.5%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu5  : 50.3%us,  7.2%sy,  0.0%ni, 41.8%id,  0.6%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu6  : 53.5%us,  6.6%sy,  0.0%ni, 38.4%id,  1.6%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu7  : 52.0%us,  6.2%sy,  0.0%ni, 41.2%id,  0.6%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  12299244k total, 12227064k used,    72180k free, 10754600k buffers
> Swap: 11534332k total,        0k used, 11534332k free,   175916k cached

Hmmm.  Not what I expected at all.

I'd be very interested in some performance tests of the Supermicro ports vs. motherboard ports on that system.

Phil
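Reading those per-CPU lines: no core is anywhere near pegged (the busiest shows ~55% user time), while Cpu0 and Cpu4 spend 21-25% in iowait. That suggests the copy is pacing on the disks or the controller at this point rather than on gzip — consistent with the surprise above, though it's an inference from this one sample.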
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-18 14:12 UTC
To: Phil Turmel; +Cc: NeilBrown, linux-raid

Phil,
Tell me what you want me to do and I'm game for it.  When I bought it
the reviews were not all that promising
(http://www.google.co.uk/products/catalog?hl=&q=AOC-SASLP-MV8&cid=15596376504116122379&os=reviews)
but I figure getting up and running is more important than SATA speed
at the moment.

cheers
Simon

On 18/02/2011 13:34, Phil Turmel wrote:
> On 02/18/2011 08:29 AM, Simon Mcnair wrote:
>> top - 13:28:45 up 1 day, 39 min, 12 users,  load average: 2.65, 2.49, 2.37
[trim /]
> Hmmm.  Not what I expected at all.
>
> I'd be very interested in some performance tests of the Supermicro ports vs. motherboard ports on that system.
>
> Phil
* Re: Linux software RAID assistance
From: Phil Turmel @ 2011-02-18 16:10 UTC
To: simonmcnair; +Cc: NeilBrown, linux-raid

On 02/18/2011 09:12 AM, Simon McNair wrote:
> Phil,
> Tell me what you want me to do and I'm game for it.  When I bought it
> the reviews were not all that promising
> (http://www.google.co.uk/products/catalog?hl=&q=AOC-SASLP-MV8&cid=15596376504116122379&os=reviews)
> but I figure getting up and running is more important than SATA speed
> at the moment.

I perused the cards reviewed here:

http://blog.zorinaq.com/?e=10

I was aiming for the ~$100 mark, for personal use.  I'd have chosen one of the LSI 1068E units, but the cheaper ones are reversed boards (parts on the opposite side from normal).  The Marvell 6480 was apparently the next best deal.  I may have to rethink that.

In any case, some benchmarking options:

1) Spot check with "hdparm -t /dev/sd?".  Not considered very accurate.  It's read-only, though, so it can run in parallel with other device activity (with correspondingly reduced accuracy).

2) Run a packaged set of benchmarks:
http://www.coker.com.au/bonnie++/
http://www.iozone.org/
http://dbench.samba.org/

3) Script a series of basic tests of various tasks:
https://raid.wiki.kernel.org/index.php/Home_grown_testing_methods

Most of these tests require a writable filesystem on the target device to work in.

Phil
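The spot check in option 1) is easy to script across the whole set — a small sketch using the device names from this thread, best run while the disks are otherwise idle:

    for d in /dev/sd[d-m]; do
        printf '%s: ' "$d"
        hdparm -t "$d" | grep Timing
    done

Since hdparm -t reads straight from the block device, it needs no filesystem; it's the packaged benchmarks in 2) and the scripted tests in 3) that want a writable filesystem to work in.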
* Re: Linux software RAID assistance
From: Roberto Spadim @ 2011-02-18 16:38 UTC
To: Phil Turmel; +Cc: simonmcnair, NeilBrown, linux-raid

dd if=/dev/XXX of=/dev/null iflag=direct

and in another terminal run:

iostat -d 1 -k

2011/2/18 Phil Turmel <philip@turmel.org>:
> On 02/18/2011 09:12 AM, Simon McNair wrote:
[trim /]
> In any case, some benchmarking options:
>
> 1) Spot check with "hdparm -t /dev/sd?".  Not considered very accurate.  It's read-only, though, so it can run in parallel with other device activity (with correspondingly reduced accuracy).
>
> 2) Run a packaged set of benchmarks:
> http://www.coker.com.au/bonnie++/
> http://www.iozone.org/
> http://dbench.samba.org/
>
> 3) Script a series of basic tests of various tasks:
> https://raid.wiki.kernel.org/index.php/Home_grown_testing_methods
>
> Most of these tests require a writable filesystem on the target device to work in.
>
> Phil

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
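One hedge on that dd line: with no bs= argument, dd issues 512-byte direct reads, which will badly understate a drive's sequential rate. Something along these lines (sdX and the 4 GiB count are placeholders) should be more representative:

    dd if=/dev/sdX of=/dev/null bs=1M count=4096 iflag=direct

with the iostat in the other terminal showing per-device throughput as it runs.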
[parent not found: <AANLkTi=RmR5nVnmFLuqK5anHc3WDPxjuYjitT6+5wAqS@mail.gmail.com>]
* Re: Linux software RAID assistance
From: Phil Turmel @ 2011-02-20 18:48 UTC
To: Simon Mcnair; +Cc: NeilBrown, linux-raid

On 02/20/2011 12:03 PM, Simon Mcnair wrote:
> Hi Phil,
>
> Is this fsck (fsck.ext4 -n -b 32768 /dev/mapper/lvm--raid-RAID >> fsck.txt)
> as bad as it looks? :-(

It's bad.  Either the original sdd has a lot more corruption than I expected, or the 3ware spread corruption over all the drives.

If the former, failing it out of the array might help.  If the latter, your data is likely toast.  Some identifiable data is being found, based on the used vs. free block/inode/directory counts in that report.  That's good.

I suggest you do "mdadm /dev/md0 --fail /dev/sdi1" and repeat the "fsck -n" as above.

(It'll be noticeably slower, as it'll be using parity to reconstruct 1 out of every 9 chunks.)

If the fsck results improve, or stay the same, proceed to "fsck -y", and we'll see.

Wouldn't hurt to run "iostat -xm 5" in another terminal during the fsck to see what kind of performance that array is getting.

Phil
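Laid out as commands — read-only until the final step, and using the device and LV names from earlier in the thread — that plan is roughly:

    mdadm /dev/md0 --fail /dev/sdi1
    fsck.ext4 -n -b 32768 /dev/mapper/lvm--raid-RAID    # repeat the read-only check
    # only if the report is no worse than before:
    fsck.ext4 -y -b 32768 /dev/mapper/lvm--raid-RAID

Failing sdi1 makes md reconstruct that member's chunks from the remaining drives plus parity, which is why the check runs slower but also why it sidesteps corruption confined to that one disk.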
* Re: Linux software RAID assistance
From: Simon Mcnair @ 2011-02-20 19:25 UTC
To: Phil Turmel; +Cc: NeilBrown, linux-raid

That's not good :-(  I've done the --fail and am running fsck at the
moment.  iostat as below:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.96    0.00    4.01   10.98    0.00   74.05

Device:  rrqm/s  wrqm/s      r/s   w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sda1       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sda2       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sdd     4455.20    0.00   497.80  0.00  19.35   0.00    79.59     1.66   3.34   1.11  55.20
sdd1    4455.20    0.00   497.80  0.00  19.35   0.00    79.59     1.66   3.34   1.11  55.20
sde     4454.20    0.00   498.60  0.00  19.35   0.00    79.47     1.63   3.26   1.13  56.20
sde1    4454.20    0.00   498.60  0.00  19.35   0.00    79.47     1.63   3.26   1.13  56.20
sdf     4311.60    0.00   615.60  0.00  19.33   0.00    64.31     1.22   1.98   0.60  37.20
sdf1    4311.60    0.00   615.60  0.00  19.33   0.00    64.31     1.22   1.98   0.60  37.20
sdg     4262.60    0.00   659.60  0.00  19.35   0.00    60.08     1.83   2.77   0.79  52.20
sdg1    4262.60    0.00   659.60  0.00  19.35   0.00    60.08     1.83   2.77   0.79  52.20
sdh     4242.20    0.00   665.80  0.00  19.36   0.00    59.54     1.67   2.51   0.63  42.00
sdh1    4242.20    0.00   665.80  0.00  19.36   0.00    59.54     1.67   2.51   0.63  42.00
sdi        0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sdi1       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sdj     4382.00    0.00   567.20  0.00  19.34   0.00    69.82     1.35   2.38   0.74  42.20
sdj1    4382.00    0.00   567.20  0.00  19.34   0.00    69.82     1.35   2.38   0.74  42.20
sdk     4341.40    0.00   605.60  0.00  19.34   0.00    65.42     1.71   2.82   0.89  53.80
sdk1    4341.40    0.00   605.60  0.00  19.34   0.00    65.42     1.71   2.82   0.89  53.80
sdl     4368.20    0.00   579.20  0.00  19.33   0.00    68.36     1.78   3.07   0.99  57.60
sdl1    4368.20    0.00   579.20  0.00  19.33   0.00    68.36     1.78   3.07   0.99  57.60
sdm     4351.00    0.00   591.40  0.00  19.32   0.00    66.92     2.17   3.68   1.09  64.60
sdm1    4351.00    0.00   591.40  0.00  19.32   0.00    66.92     2.17   3.68   1.09  64.60
dm-0       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
dm-1       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
dm-2       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
md0        0.00    0.00 24808.00  0.00  96.91   0.00     8.00     0.00   0.00   0.00   0.00
dm-3       0.00    0.00  5735.60  0.00  22.40   0.00     8.00    26.08   4.55   0.17  99.20
sdr        0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sdr1       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sds        0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sds1       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00

Simon

On 20 February 2011 18:48, Phil Turmel <philip@turmel.org> wrote:
> On 02/20/2011 12:03 PM, Simon Mcnair wrote:
>> Hi Phil,
>>
>> Is this fsck (fsck.ext4 -n -b 32768 /dev/mapper/lvm--raid-RAID >> fsck.txt)
>> as bad as it looks? :-(
>
> It's bad.  Either the original sdd has a lot more corruption than I expected, or the 3ware spread corruption over all the drives.
>
> If the former, failing it out of the array might help.  If the latter, your data is likely toast.  Some identifiable data is being found, based on the used vs. free block/inode/directory counts in that report.  That's good.
>
> I suggest you do "mdadm /dev/md0 --fail /dev/sdi1" and repeat the "fsck -n" as above.
>
> (It'll be noticeably slower, as it'll be using parity to reconstruct 1 out of every 9 chunks.)
>
> If the fsck results improve, or stay the same, proceed to "fsck -y", and we'll see.
>
> Wouldn't hurt to run "iostat -xm 5" in another terminal during the fsck to see what kind of performance that array is getting.
>
> Phil
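For what it's worth, that sample reads sensibly: md0 is delivering ~97 MB/s aggregate in 4k requests (avgrq-sz of 8 sectors), roughly 19 MB/s from each of the nine remaining members, with sdi all zeros now that it's failed out. Meanwhile dm-3 — the logical volume fsck is walking — sits at 99% utilization with a queue of ~26, suggesting the fsck's small reads, not the new controller, are the limiting factor here.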
* Re: Linux software RAID assistance
From: Simon Mcnair @ 2011-02-19  8:49 UTC
To: Phil Turmel; +Cc: NeilBrown, linux-raid

Hot plug isn't a problem, I have two Supermicro CSE-M35T1's.  It's
weird though that the drive order has changed so much, because I
plugged the SATA ports in to the board in the same order as the 3ware
card, i.e.:

port 0 -> 0
port 1 -> 1
port 2 -> 2
port 3 -> 3
port 4 -> 4
port 5 -> 5
port 6 -> 6
port 7 -> 7
port 8 -> m/b 7
port 9 -> m/b 8

Maybe it's the way linux/udev allocated them...

One last thing: is

echo 1 > /sys/block/sda/device/delete

the correct way to stop an AHCI SATA device?

cheers
Simon

On 16 February 2011 18:14, Phil Turmel <philip@turmel.org> wrote:
> On 02/16/2011 12:49 PM, Simon McNair wrote:
>> Hi Phil,
>> A couple of questions please.
>
> [trim /]
>
>>>> Simon
>>> I don't know why the serial numbers are formatted differently, but we can still tell them apart (the eight characters starting with "PAG").
>>>
>>> So, our device order in your new setup is: [ihgfmlkjbc], where /dev/sdi corresponds to the original report's /dev/sdd, which matches the sig grep in your other note.
>>>
>>> Another note: The controller for sd[abc] is still showing ata_piix as its controller.  That means you cannot hot-plug those ports.  If you change your BIOS to AHCI mode instead of "Compatibility" or "Emulation", the full-featured ahci driver will run those ports.  Not urgent, but I highly recommend it.
>>>
>> Will do that now, before I forget
>
> Hot-pluggability with suitable trays is very handy!  :)
>
> [trim /]
>
>>>> Error: The backup GPT table is not at the end of the disk, as it should be.  This might mean that another operating
>>>> system believes the disk is smaller.  Fix, by moving the backup to the end (and removing the old backup)?
>>>> Fix/Cancel? c
>>> The 3ware controller must have reserved some space at the end of each drive for its own use.  Didn't know it'd do that.  You will have to fix that.
>>>
>>> [trim /]
>>>
>> Do you have any suggestions on how I can fix that?  I don't have a clue.
>
> Just do 'parted /dev/sd?' and on the ones it offers to fix, say yes.  Then request 'unit s' and 'print' to verify that it is correct.
>
> [trim /]
>
>> When I was trying to figure out the command for this using 'man parted' I came across this:
>> "  rescue start end
>>    Rescue a lost partition that was located somewhere between start and end.  If a partition is
>>    found, parted will ask if you want to create an entry for it in the partition table."
>> Is it worth trying?
>
> Nah.  That's for when you don't know exactly where the partition is.  We know.
>
>> I originally created the partitions like so:
>>
>> parted -s /dev/sdb rm 1
>> parted -s /dev/sdb mklabel gpt
>> parted -s --align optimal /dev/sdb mkpart primary ext4 .512 100%
>> parted -s /dev/sdb set 1 raid on
>> parted -s /dev/sdb align-check optimal 1
>>
>> so to recreate the above I would do:
>>
>> parted -s /dev/sdb mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdc mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdf mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdg mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdh mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdi mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdj mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdk mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdl mkpart primary ext4 2048s 1953101823s
>> parted -s /dev/sdm mkpart primary ext4 2048s 1953101823s
>
> Only recreate the partition tables where you have to, i.e. where the 'Fix' option above didn't work.  And don't specify a filesystem.
>
> Probably just /dev/sdh and /dev/sdi.  Like so, though:
>
> parted -s /dev/sdh mklabel gpt mkpart primary 2048s 1953101823s set 1 raid on
> parted -s /dev/sdi mklabel gpt mkpart primary 2048s 1953101823s set 1 raid on
>
>> I'm guessing the backups that I want to do can wait until any potential fsck?
>
> Do an 'fsck -N' first, and if it passes, or has few errors, mount the filesystem readonly and grab your backup.  Then let fsck have at it for real.  If anything gets fixed, compare your backup from the read-only fs to the fixed fs.
>
> Given your flaky old controller, I expect there'll be *some* problems.
>
>> Sorry if the questions are dumb, but I'm not sure what I'm doing and I'd rather ask more questions than fewer and understand the implications of what I'm doing.
>
> Oh, no.  You are right to be paranoid.  If anything looks funny, stop.
>
> Phil
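On the sysfs question left open above: writing 1 to /sys/block/sdX/device/delete does cleanly offline and remove a SATA disk from the kernel, and on an AHCI port the disk can be brought back afterwards by rescanning its SCSI host. A sketch, with sdX and hostN as placeholders:

    echo 1 > /sys/block/sdX/device/delete            # flush and detach sdX
    echo '- - -' > /sys/class/scsi_host/hostN/scan   # re-probe that host's ports

The matching hostN appears in the device's sysfs path (readlink /sys/block/sdX), and any md array using the disk should be stopped first so the disappearance isn't treated as a member failure.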
* Re: Linux software RAID assistance
From: Simon McNair @ 2011-02-16 13:56 UTC
To: Phil Turmel; +Cc: NeilBrown, linux-raid

One other snippet:

proxmox:/home/simon# for x in /dev/sd{d..m} ; do echo $x ; dd if=$x skip=2312 count=128 2>/dev/null | strings | grep 9fAJEz-HcaP-RQ51-fV8b-nxrN-Uqwb-PPnOLJ ; done
/dev/sdd
/dev/sde
/dev/sdf
/dev/sdg
/dev/sdh
/dev/sdi
id = "9fAJEz-HcaP-RQ51-fV8b-nxrN-Uqwb-PPnOLJ"
id = "9fAJEz-HcaP-RQ51-fV8b-nxrN-Uqwb-PPnOLJ"
/dev/sdj
/dev/sdk
/dev/sdl
/dev/sdm

On 15/02/2011 14:51, Phil Turmel wrote:
> Hi Neil,
>
> Since Simon has responded, let me summarize the assistance I provided per his off-list request:
>
> On 02/14/2011 11:53 PM, NeilBrown wrote:
>> On Thu, 10 Feb 2011 16:16:44 +0000 Simon McNair <simonmcnair@gmail.com> wrote:
>>
>>> Hi all
>>>
>>> I use a 3ware 9500-12 port sata card (JBOD) which will not work without a
>>> 128mb sodimm.  The sodimm socket is flakey and the result is that the
>>> machine occasionally crashes.  Yesterday I finally gave in and put
>>> together another machine so that I can rsync between them.  When I turned
>>> the machine on today to set up rsync, the RAID array was not gone, but
>>> corrupted.  Typical...
>>
>> Presumably the old machine was called 'ubuntu' and the new machine 'proølox'
>>
>>> I built the array in Aug 2010 using the following command:
>>>
>>> mdadm --create --verbose /dev/md0 --metadata=1.1 --level=5
>>> --raid-devices=10 /dev/sd{b,c,d,e,f,g,h,i,j,k}1 --chunk=64
>>>
>>> Using LVM, I did the following:
>>> pvscan
>>> pvcreate -M2 /dev/md0
>>> vgcreate lvm-raid /dev/md0
>>> vgdisplay lvm-raid
>>> vgscan
>>> lvscan
>>> lvcreate -v -l 100%VG -n RAID lvm-raid
>>> lvdisplay /dev/lvm-raid/lvm0
>>>
>>> I then formatted using:
>>> mkfs -t ext4 -v -m .1 -b 4096 -E stride=16,stripe-width=144
>>> /dev/lvm-raid/RAID
>>>
>>> This worked perfectly since I created the array.  Now mdadm is coming up
>>> with
>>>
>>> proxmox:/dev/md# mdadm --assemble --scan --verbose
>>> mdadm: looking for devices for further assembly
>>> mdadm: no recogniseable superblock on /dev/md/ubuntu:0
>>
>> And it seems that ubuntu:0 has been successfully assembled.
>> It is missing one device for some reason (sdd1) but RAID can cope with that.
>
> The 3ware card is compromised, with a loose buffer memory dimm.  Some of its ECC errors were caught and reported in dmesg.  It's likely, based on the loose memory socket, that many multiple-bit errors got through.
>
> [trim /]
>
>>> mdadm: no uptodate device for slot 8 of /dev/md/proølox:0
>>> mdadm: no uptodate device for slot 9 of /dev/md/proølox:0
>>> mdadm: failed to add /dev/sdd1 to /dev/md/proølox:0: Invalid argument
>>> mdadm: /dev/md/proølox:0 assembled from 0 drives - not enough to start
>>> the array.
>>
>> This looks like it is *after* trying the --create command you give
>> below.  It is best to report things in the order they happen, else you can
>> confuse people (or get caught out!).
>
> Yes, this was after.
>
>>> mdadm: looking for devices for further assembly
>>> mdadm: no recogniseable superblock on /dev/sdd
>>> mdadm: No arrays found in config file or automatically
>>>
>>> pvscan and vgscan show nothing.
>>>
>>> So I tried running mdadm --create --verbose /dev/md0 --metadata=1.1
>>> --level=5 --raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1
>>> /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64
>>>
>>> as it seemed that /dev/sdd1 failed to be added to the array.  This did
>>> nothing.
>>
>> It did not do nothing.  It wrote a superblock to /dev/sdd1 and complained
>> that it couldn't write to all the others --- didn't it?
>
> There were multiple attempts to create.  One wrote to just sdd1, another succeeded with all but sdd1.
>
>>> dmesg contains:
>>>
>>> md: invalid superblock checksum on sdd1
>>
>> I guess that is why sdd1 was missing from 'ubuntu:0'.  Though as I cannot
>> tell if this happened before or after any of the various things reported
>> above, it is hard to be sure.
>>
>> The real mystery is why 'pvscan' reports nothing.
>
> The original array was created with mdadm v2.6.7, and had a data offset of 264 sectors.  After Simon's various attempts to --create, he ended up with a data offset of 2048, using mdadm v3.1.4.  The mdadm -E reports he posted to the list showed the 264 offset.  We didn't realize the offset had been updated until somewhat later in our troubleshooting efforts.
>
> In any case, pvscan couldn't see the LVM signature because it wasn't there (at offset 2048).
>
>> What about
>>    pvscan --verbose
>> or
>>    blkid -p /dev/md/ubuntu:0
>> or even
>>    dd if=/dev/md/ubuntu:0 count=8 | od -c
>
> Fortunately, Simon did have a copy of his LVM configuration.  With the help of dd, strings, and grep, we did locate his LVM sig at the correct location on sdd1 (for data offset 264).  After a number of attempts to bypass LVM and access his single LV with dmsetup (based on his backed-up configuration, on the assembled new array less sdd1), I realized that the data offset was wrong on the recreated array, and went looking for the cause.  I found your git commit that changed that logic last spring, and recommended that Simon revert to the default package for his ubuntu install, which is v2.6.7.
>
> Simon has now attempted to recreate the array with v2.6.7, but the controller is throwing too many errors to succeed, and I suggested it was too flaky to trust any further.  Based on the existence of the LVM sig on sdd1, I believe Simon's data is (mostly) intact, and only needs a successful create operation with a properly functioning controller.  (He might also need to perform an lvm vgcfgrestore, but he has the necessary backup file.)
>
> A new controller is on order.
>
> Phil
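Two details in that summary are worth spelling out. First, the skip=2312 in the dd snippet above is just 2048 + 264: each member partition starts at sector 2048 of its disk, and the original v1.1 superblock put the data 264 sectors into the partition, so the LVM label and metadata text begin 2312 sectors into the raw disk — finding the id from the LVM backup there is what confirmed the old layout. Second, the offset in question is visible directly in the examine output, so any re-created array can be sanity-checked immediately, along the lines of:

    mdadm -E /dev/sdd1 | grep -i 'data offset'

which should report 264 sectors under the old mdadm behaviour and 2048 under the newer default described above.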
End of thread.

Thread overview: 64+ messages

2011-02-10 16:16 Linux software RAID assistance Simon McNair
2011-02-10 18:24 ` Phil Turmel
2011-02-15  4:53 ` NeilBrown
2011-02-15  8:48 ` Simon McNair
2011-02-15 14:51 ` Phil Turmel
2011-02-15 19:04 ` Simon McNair
2011-02-15 19:37 ` Phil Turmel
2011-02-15 19:45 ` Roman Mamedov
2011-02-15 21:09 ` Simon McNair
2011-02-17 15:10 ` Simon Mcnair
2011-02-17 15:42 ` Roman Mamedov
2011-02-18  9:13 ` Simon McNair
2011-02-18  9:38 ` Robin Hill
2011-02-18 10:38 ` Simon Mcnair
2011-02-19 11:46 ` Jan Ceuleers
2011-02-19 12:40 ` Simon McNair
2011-02-19 17:37 ` Jan Ceuleers
2011-02-16 13:51 ` Simon McNair
2011-02-16 14:37 ` Phil Turmel
2011-02-16 17:49 ` Simon McNair
2011-02-16 18:14 ` Phil Turmel
2011-02-16 18:18 ` Simon McNair
2011-02-16 18:22 ` Phil Turmel
2011-02-16 18:25 ` Phil Turmel
2011-02-16 18:52 ` Simon McNair
2011-02-16 18:57 ` Phil Turmel
2011-02-16 19:07 ` Simon McNair
2011-02-16 19:10 ` Phil Turmel
2011-02-16 19:15 ` Simon McNair
2011-02-16 19:36 ` Phil Turmel
2011-02-16 21:28 ` Simon McNair
2011-02-16 21:30 ` Phil Turmel
2011-02-16 22:44 ` Simon Mcnair
2011-02-16 23:39 ` Phil Turmel
2011-02-17 13:26 ` Simon Mcnair
2011-02-17 13:48 ` Phil Turmel
2011-02-17 13:56 ` Simon Mcnair
2011-02-17 14:34 ` Simon Mcnair
2011-02-17 16:54 ` Phil Turmel
2011-02-19  8:43 ` Simon Mcnair
2011-02-19 15:30 ` Phil Turmel
[not found] ` <AANLkTinOXJWRw_et2U43R_T9XPBzQLnN56Kf2bOAz=_c@mail.gmail.com>
2011-02-19 16:19 ` Phil Turmel
2011-02-20  9:56 ` Simon Mcnair
2011-02-20 19:50 ` Phil Turmel
2011-02-20 23:17 ` Simon Mcnair
2011-02-20 23:39 ` Phil Turmel
2011-02-22 17:12 ` Simon Mcnair
2011-02-22 17:14 ` Simon Mcnair
2011-02-22 18:23 ` Phil Turmel
2011-02-22 18:36 ` Simon McNair
2011-02-22 19:06 ` Phil Turmel
2011-02-18  9:31 ` Simon Mcnair
2011-02-18 13:16 ` Phil Turmel
2011-02-18 13:21 ` Roberto Spadim
2011-02-18 13:26 ` Phil Turmel
2011-02-18 13:29 ` Simon Mcnair
2011-02-18 13:34 ` Phil Turmel
2011-02-18 14:12 ` Simon McNair
2011-02-18 16:10 ` Phil Turmel
2011-02-18 16:38 ` Roberto Spadim
[not found] ` <AANLkTi=RmR5nVnmFLuqK5anHc3WDPxjuYjitT6+5wAqS@mail.gmail.com>
2011-02-20 18:48 ` Phil Turmel
2011-02-20 19:25 ` Simon Mcnair
2011-02-19  8:49 ` Simon Mcnair
2011-02-16 13:56 ` Simon McNair