Linux RAID subsystem development


* Re: Unstable speed or correct?
From: Pol Hallen @ 2011-06-09 11:52 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <201106091325.54782.raid1@fuckaround.org>

init 1 raid 6 performance:

http://fuckaround.org/nuvola/?p=17

iostat md0 -x 1:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

:-(((

Pol


* Re: Triple-parity raid6
From: David Brown @ 2011-06-09 11:32 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <20110609114954.243e9e22@notabene.brown>

On 09/06/2011 03:49, NeilBrown wrote:
> On Thu, 09 Jun 2011 02:01:06 +0200 David Brown<david.brown@hesbynett.no>
> wrote:
>
>> Has anyone considered triple-parity raid6?  As far as I can see, it
>> should not be significantly harder than normal raid6 - either to
>> implement, or for the processor at run-time.  Once you have the GF(2⁸)
>> field arithmetic in place for raid6, it's just a matter of making
>> another parity block in the same way but using a different generator:
>>
>> P = D_0 + D_1 + D_2 + .. + D_(n-1)
>> Q = D_0 + g.D_1 + g².D_2 + .. + g^(n-1).D_(n-1)
>> R = D_0 + h.D_1 + h².D_2 + .. + h^(n-1).D_(n-1)
>>
>> The raid6 implementation in mdraid uses g = 0x02 to generate the second
>> parity (based on "The mathematics of RAID-6" - I haven't checked the
>> source code).  You can make a third parity using h = 0x04 and then get a
>> redundancy of 3 disks.  (Note - I haven't yet confirmed that this is
>> valid for more than 100 data disks - I need to make my checker program
>> more efficient first.)
>>
>> Rebuilding a disk, or running in degraded mode, is just an obvious
>> extension to the current raid6 algorithms.  If you are missing three
>> data blocks, the maths looks hard to start with - but if you express the
>> equations as a set of linear equations and use standard matrix inversion
>> techniques, it should not be hard to implement.  You only need to do
>> this inversion once when you find that one or more disks have failed -
>> then you pre-compute the multiplication tables in the same way as is
>> done for raid6 today.
>>
>> In normal use, calculating the R parity is no more demanding than
>> calculating the Q parity.  And most rebuilds or degraded situations will
>> only involve a single disk, and the data can thus be re-constructed
>> using the P parity just like raid5 or two-parity raid6.
>>
>>
>> I'm sure there are situations where triple-parity raid6 would be
>> appealing - it has already been implemented in ZFS, and it is only a
>> matter of time before two-parity raid6 has a real probability of hitting
>> an unrecoverable read error during a rebuild.
>>
>>
>> And of course, there is no particular reason to stop at three parity
>> blocks - the maths can easily be generalised.  1, 2, 4 and 8 can be used
>> as generators for quad-parity (checked up to 60 disks), and adding 16
>> gives you quintuple parity (checked up to 30 disks) - but that's maybe
>> getting a bit paranoid.
>>
>>
>> ref.:
>>
>> <http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf>
>> <http://blogs.oracle.com/ahl/entry/acm_triple_parity_raid>
>> <http://queue.acm.org/detail.cfm?id=1670144>
>> <http://blogs.oracle.com/ahl/entry/triple_parity_raid_z>
>>
>
>   -ENOPATCH  :-)
>
> I have a series of patches nearly ready which removes a lot of the remaining
> duplication in raid5.c between raid5 and raid6 paths.  So there will be
> relatively few places where RAID5 and RAID6 do different things - only the
> places where they *must* do different things.
> After that, adding a new level or layout which has 'max_degraded == 3' would
> be quite easy.
> The most difficult part would be the enhancements to libraid6 to generate the
> new 'syndrome', and to handle the different recovery possibilities.
>
> So if you're not otherwise busy this weekend, a patch would be nice :-)
>

I'm not going to promise any patches, but maybe I can help with the 
maths.  You say the difficult part is the syndrome calculations and 
recovery - I've got these bits figured out on paper and some 
quick-and-dirty python test code.  On the other hand, I don't really 
want to get into the md kernel code, or the mdadm code - I haven't done 
Linux kernel development before (I mostly program 8-bit microcontrollers 
- when I code on Linux, I use Python), and I fear it would take me a 
long time to get up to speed.
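
For illustration, the P/Q/R calculation from the quoted proposal can be sketched in a few lines of Python. This is only a sketch under stated assumptions: it uses the GF(2⁸) representation from "The mathematics of RAID-6" (reduction polynomial 0x11d), the generators g = 0x02 and h = 0x04 suggested in the quoted text, and one byte per disk instead of whole blocks. The function names are made up for this example and are not taken from libraid6.

```python
def gf_mul(a, b, poly=0x11d):
    """Multiply two elements of GF(2^8), reducing by the given polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly            # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def syndromes(data, generators=(1, 2, 4)):
    """Return one parity byte per generator: sum over i of gen^i * D_i."""
    out = []
    for gen in generators:
        s = 0
        for i, d in enumerate(data):
            s ^= gf_mul(gf_pow(gen, i), d)
        out.append(s)
    return out


# One byte from each of three hypothetical data disks.
p, q, r = syndromes([0x01, 0x02, 0x03])
print(hex(p), hex(q), hex(r))
```

With generator 1 the loop reduces to a plain XOR, which is the usual P parity; the Q and R rows differ only in the generator used.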

However, if the parity generation and recovery is neatly separated into 
a libraid6 library, the whole thing becomes much more tractable from my 
viewpoint.  Since I am new to this, can you tell me where I should get 
the current libraid6 code?  I'm sure google will find some sources for 
me, but I'd like to make sure I start with whatever version /you/ have.
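
The validity check mentioned in the quoted proposal (confirming that a set of generators tolerates any combination of lost disks) might look roughly like the sketch below. It is a guess at the approach, not the actual checker program: for every choice of f lost data disks and f surviving parity rows, it tests by Gaussian elimination over GF(2⁸) that the corresponding square matrix is invertible. The polynomial 0x11d and all function names are assumptions made for this example, and the brute-force search grows quickly with the number of disks.

```python
from itertools import combinations


def gf_mul(a, b, poly=0x11d):
    """Multiply two elements of GF(2^8), reducing by the given polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Inverse of a nonzero element: a^254, since a^255 == 1."""
    return gf_pow(a, 254)


def invertible(matrix):
    """Gaussian elimination over GF(2^8); True if the square matrix is nonsingular."""
    m = [row[:] for row in matrix]
    n = len(m)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            return False
        m[col], m[pivot] = m[pivot], m[col]
        inv = gf_inv(m[col][col])
        for r in range(col + 1, n):
            factor = gf_mul(m[r][col], inv)
            if factor:
                m[r] = [x ^ gf_mul(factor, y) for x, y in zip(m[r], m[col])]
    return True


def generators_ok(generators, n_data):
    """True if every failure pattern within the parity count is recoverable."""
    rows = [[gf_pow(g, i) for i in range(n_data)] for g in generators]
    for f in range(1, len(generators) + 1):
        for parity in combinations(range(len(generators)), f):
            for lost in combinations(range(n_data), f):
                sub = [[rows[p][d] for d in lost] for p in parity]
                if not invertible(sub):
                    return False
    return True


print(generators_ok((1, 2, 4), 8))
```

The inversion that this check performs is the same step a recovery routine would need once per failure pattern, before pre-computing its multiplication tables.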




--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Unstable speed or correct?
From: Pol Hallen @ 2011-06-09 11:25 UTC (permalink / raw)
  To: Durval Menezes; +Cc: linux-raid
In-Reply-To: <BANLkTinrqCiyj+Z2zTAfS9Ar8MxUTn7Khw@mail.gmail.com>

> I'm not familiar with this program (BTW, you mean "palimpsest",
> right?)

yep :-)

> I would try running some other benchmark

iozone3 and bonnie?

Do you know any other benchmark programs?

> that can be run on
> single-user

Tomorrow I'll try running these benchmarks in init 1.

> if it does, then I
> would conclude that there's something wrong (possibly on your
> hardware, like a failing disk doing retries/reseeks).

I have already checked all the disks with smartctl and hdparm, and there don't
seem to be any problems :-/

Could the RAM be the problem?

Which hardware tests should I run?

thanks :-)

Pol


* Re: Possible to use multiple disk to bypass I/O wait?
From: Mathias Burén @ 2011-06-09 10:19 UTC (permalink / raw)
  To: Emmanuel Noobadmin; +Cc: CentOS mailing list, linux-raid
In-Reply-To: <BANLkTimFOaJoMnwid1F+ghVwkBgJi2FymQ@mail.gmail.com>

On 9 June 2011 10:24, Emmanuel Noobadmin <centos.admin@gmail.com> wrote:
> I'm trying to resolve an I/O problem on a CentOS 5.6 server. The
> process basically scans through Maildirs, checking for space usage and
> quota. Because there are hundred odd user folders and several 10s of
> thousands of small files, this sends the I/O wait % way high. The
> server hits a very high load level and stops responding to other
> requests until the crawl is done.
>
> I am wondering if I add another disk and symlink the sub-directories
> to that, would that free up the server to respond to other requests
> despite the wait on that disk?
>
> Alternatively, if I mdraid mirror the existing disk, would md be smart
> enough to read using the other disk while the first's tied up with the
> first process?

The first thing that comes to my mind: Have you tried another IO scheduler?

/M
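
For reference, the kernel exposes the I/O scheduler of each block device through sysfs at /sys/block/<dev>/queue/scheduler, with the active one shown in square brackets. A minimal Python sketch for reading it follows; the device name "sda" is only an assumed example and the helper names are illustrative.

```python
from pathlib import Path


def parse_scheduler(text):
    """Split a sysfs scheduler line such as 'noop anticipatory deadline [cfq]'
    into (active, available)."""
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]    # the bracketed entry is the one in use
            active = name
        available.append(name)
    return active, available


def current_scheduler(device="sda"):
    """Read the scheduler line for one device; "sda" is an assumed name."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler(path.read_text())


# Parsing a sample line, so the example runs without touching sysfs.
print(parse_scheduler("noop anticipatory deadline [cfq]"))
```

Writing one of the listed names back to the same file as root switches the scheduler for that device until the next reboot.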


* Unstable speed or correct?
From: Durval Menezes @ 2011-06-09  9:36 UTC (permalink / raw)
  To: Pol Hallen, linux-raid
In-Reply-To: <BANLkTimQ854noJpPEstnaG2CvsWvXFiWoQ@mail.gmail.com>

Hello Pol.

On Thu, Jun 9, 2011 at 6:02 AM, Pol Hallen <raid2@fuckaround.org> wrote:
>
> > What benchmark program did you use for the test? Also, were there any other
> > programs running at the same time? Was the machine in single-user (init 1)
> > state?
>
> hello :-)
>
> dd if=/dev/zero of=/share/raid6/1Gb bs=1024 count=1000000
> 1000000+0 records in
> 1000000+0 records out
> 1024000000 bytes (1.0 GB) copied, 12.1632 s, 84.2 MB/s
>
> dd if=/dev/zero of=/share/raid6/10Gb bs=1024 count=10000000
> 10000000+0 records in
> 10000000+0 records out
> 10240000000 bytes (10 GB) copied, 142.021 s, 72.1 MB/s

The above dd's seem much more consistent than the graph.

> That graph is from palimptest (gnome-disk-utility).

I'm not familiar with this program (BTW, you mean "palimpsest",
right?), so I ran it for the first time on my own system: an M4A785-M
with a Phenom II X6 1055T, 4GB RAM, and 3 x ST31500541AS 1.5TB 3.5"
5900RPM HDs plugged directly into the motherboard's SB700 SATA
controller, running Ubuntu Lucid Lynx 10.04.1 LTS, kernel
2.6.32-32-generic. The 3 HDs form a RAID5 md device dedicated to
user data and mounted as /usr2; the root filesystem is on a separate
SSD. Here's my palimpsest graph:
http://durval.com/felwithe_raid5_palimpsest_ro_20110609.jpg

Note that it shows much less "jitter" (local variation) than your graph.

> No other applications are using the raid (I'm on init 2, but samba, ftp, and
> other daemons are off).
>
> iostat doesn't show me any activity on the raid, so I don't think any programs
> are running on the array.

This is very strange... concurrent access by other programs would
certainly explain the variation on your graph.

I would try running some other benchmark that can be run in
single-user state to see if the variation remains. If it does, then I
would conclude that there's something wrong (possibly with your
hardware, like a failing disk doing retries/reseeks).

Cheers,
--
  Durval Menezes.

>
> thanks :-)
>
> Pol


* Possible to use multiple disk to bypass I/O wait?
From: Emmanuel Noobadmin @ 2011-06-09  9:24 UTC (permalink / raw)
  To: CentOS mailing list, linux-raid

I'm trying to resolve an I/O problem on a CentOS 5.6 server. The
process basically scans through Maildirs, checking for space usage and
quota. Because there are hundred odd user folders and several 10s of
thousands of small files, this sends the I/O wait % way high. The
server hits a very high load level and stops responding to other
requests until the crawl is done.

I am wondering if I add another disk and symlink the sub-directories
to that, would that free up the server to respond to other requests
despite the wait on that disk?

Alternatively, if I mdraid mirror the existing disk, would md be smart
enough to read using the other disk while the first's tied up with the
first process?


* Re: Unstable speed or correct?
From: Pol Hallen @ 2011-06-09  9:02 UTC (permalink / raw)
  To: Durval Menezes; +Cc: linux-raid
In-Reply-To: <BANLkTimbhe+qmTyVWKbRt6TZ9cynmmVNRw@mail.gmail.com>

> What benchmark program did you use for the test? Also, were there any other
> programs running at the same time? Was the machine in single-user (init 1)
> state?

hello :-)

dd if=/dev/zero of=/share/raid6/1Gb bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 12.1632 s, 84.2 MB/s

dd if=/dev/zero of=/share/raid6/10Gb bs=1024 count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 142.021 s, 72.1 MB/s
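
As a quick sanity check on these figures, the rate dd prints is simply the bytes copied divided by the elapsed seconds, expressed in decimal megabytes. A short Python sketch of that arithmetic (the function name is illustrative):

```python
def dd_rate_mb_s(total_bytes, seconds):
    """Throughput in decimal megabytes per second (1 MB = 10**6 bytes)."""
    return total_bytes / seconds / 1e6


# Byte counts are bs * count from the two dd runs; times are as reported.
runs = {
    "1Gb": (1024 * 1000000, 12.1632),
    "10Gb": (1024 * 10000000, 142.021),
}
for name, (nbytes, secs) in runs.items():
    print(f"{name}: {dd_rate_mb_s(nbytes, secs):.1f} MB/s")
```

Both results round to the values dd reported (84.2 and 72.1 MB/s), so the two runs differ by roughly 14%.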

That graph is from palimptest (gnome-disk-utility).

No other applications are using the raid (I'm on init 2, but samba, ftp, and
other daemons are off).

iostat doesn't show me any activity on the raid, so I don't think any programs
are running on the array.

thanks :-) 

Pol


* Unstable speed or correct?
From: Pol Hallen @ 2011-06-09  7:09 UTC (permalink / raw)
  To: linux-raid

Hi folks :-)

I have a software raid6 (6 disks of 1.5TB each) on an asus p5b-e.

Yesterday I ran some tests and saw this:

http://fuckaround.org/nuvola/

Is this a correct (read) speed, or is there a problem?

The read line oscillates and I can't tell whether there is a problem.

hdparm, smartctl and other diagnostic tools don't report any problems.

The raid6 is running correctly.

Any ideas?

thanks!

Pol


* (unknown)
From: Dragon @ 2011-06-09  6:50 UTC (permalink / raw)
  To: philip; +Cc: linux-raid

Hi Phil,
I know that there is something odd with the raid; that's why I need help.
No, I didn't scramble the report. That's what the system output. Sorry for the confusion with sdo: this is my USB disk and doesn't belong to the raid. Because of the size I didn't have any backup ;(

I do not let the system run 24/7, and when I started it in the morning the sequence had changed.
 fdisk -l |grep sd
Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdc: 20.4 GB, 20409532416 bytes
/dev/sdc1   *           1        2372    19053058+  83  Linux
/dev/sdc2            2373        2481      875542+   5  Extended
/dev/sdc5            2373        2481      875511   82  Linux swap / Solaris
Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes
Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdg: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdf: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdh: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdi: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdj: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdk: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdl: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdm: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdn: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
Yesterday the system was on disk sdk; now it's on sdc?! The system is online now and will stay up until the evening.
Here is the current data of the drives again:
mdadm -E /dev/sda
/dev/sda:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee4232 - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8      176        4      active sync   /dev/sdl

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
mdadm -E /dev/sdb
/dev/sdb:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee4244 - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8      192        5      active sync   /dev/sdm

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
 mdadm -E /dev/sdd
/dev/sdd:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee418e - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this    13       8        0       13      spare   /dev/sda

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
mdadm -E /dev/sde
/dev/sde:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee4196 - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     6       8       16        6      active sync   /dev/sdb

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
mdadm -E /dev/sdf
/dev/sdf:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41aa - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     8       8       32        8      active sync   /dev/sdc

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
mdadm -E /dev/sdg
/dev/sdg:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41bc - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     9       8       48        9      active sync   /dev/sdd

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
mdadm -E /dev/sdh
/dev/sdh:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41ce - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this    10       8       64       10      active sync   /dev/sde

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
mdadm -E /dev/sdi
/dev/sdi:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41e0 - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this    11       8       80       11      active sync   /dev/sdf

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
mdadm -E /dev/sdj
/dev/sdj:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41f2 - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this    12       8       96       12      active sync   /dev/sdg

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
mdadm -E /dev/sdk
/dev/sdk:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41ea - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8      112        0      active sync   /dev/sdh

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
mdadm -E /dev/sdl
/dev/sdl:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41fe - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8      128        2      active sync   /dev/sdi

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
mdadm -E /dev/sdm
/dev/sdm:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee4210 - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8      144        3      active sync   /dev/sdj

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
mdadm -E /dev/sdn
/dev/sdn:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 22:49:22 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee3313 - correct
         Events : 156606

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this    13       8      160       13      spare   /dev/sdk

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8      160       13      spare   /dev/sdk

As far as I can see, there is now no error about a missing superblock on any disk.

How can I download lsdrv with "wget"? And yes, trying to go backwards by shrinking is what led to the current problem.

^ permalink raw reply

* RE: [PATCH 01/21] imsm: FIX: Cannot create volume
From: Kwolek, Adam @ 2011-06-09  6:40 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-raid@vger.kernel.org, Williams, Dan J, Ciechanowski, Ed,
	Neubauer, Wojciech
In-Reply-To: <20110609124218.1b7f7457@notabene.brown>



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Thursday, June 09, 2011 4:42 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH 01/21] imsm: FIX: Cannot create volume
> 
> On Wed, 08 Jun 2011 18:09:41 +0200 Adam Kwolek <adam.kwolek@intel.com>
> wrote:
> 
> > Clearing the info structure leaves mdadm unable to create a workable
> volume.
> >
> > During volume creation info structure passed to getinfo() function
> > contains some information already and cannot be cleared.
> >
> > Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> > ---
> >
> >  super-intel.c |    1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> >
> > diff --git a/super-intel.c b/super-intel.c
> > index b8d8b4e..471dbd2 100644
> > --- a/super-intel.c
> > +++ b/super-intel.c
> > @@ -2075,7 +2075,6 @@ static void getinfo_super_imsm_volume(struct
> supertype *st, struct mdinfo *info,
> >  	unsigned int component_size_alligment;
> >  	int map_disks = info->array.raid_disks;
> >
> > -	memset(info, 0, sizeof(*info));
> >  	if (prev_map)
> >  		map_to_analyse = prev_map;
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid"
> in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> I'm sorry, but this is just a very silly patch.


You are right in your suspicion about the root cause. It is a linking problem involving the 'next' field.
(I can see the problem now.)

I'll review it and I'll make it less silly ;)

BR
Adam

> 
> In many cases the 'info' structure is completely uninitialised, so it
> really
> does make sense to initialise it, which is why I added that.
> Just removing it *must* be wrong.
> 
> It would be much more helpful if you had explained what the "some
> information" was.
> 
> Maybe it is the '->next' field that container_content_imsm() sets before
> calling getinfo_super_imsm_volume()?? Well that
> getinfo_super_imsm_volume
> doesn't use that field, so we can just move the assignment afterwards.
> 
> or maybe ... getinfo_super_imsm_volume uses info->disk.raid_disk, but no
> caller ever sets that up that I can see, so that code is plainly wrong
> (though ddf makes the same mistake so that is probably my fault).
> I've 'fixed' them both to just report on the 'first' disk as that is no
> worse
> than what they currently do.
> 
> If that doesn't fix your problem, please explain exactly what is being
> cleared
> that shouldn't be.
> 
> NeilBrown
> 
> commit 9894ec0d64a9faab719d016bbbf5fbc842757df6
> Author: NeilBrown <neilb@suse.de>
> Date:   Thu Jun 9 12:42:02 2011 +1000
> 
>     Fix some fall-out from recent memset-zero for getinfo_super
> 
>     container_content_imsm was setting info->next before calling
>     getinfo_super_imsm_container which now zeros everything.
>     So move that assignment to afterwards.
> 
>     So both imsm and ddf were assuming info->disk.raid_disk means
>     something but it doesn't.  So fix those.
> 
>     Signed-off-by: NeilBrown <neilb@suse.de>
> 
> diff --git a/super-ddf.c b/super-ddf.c
> index 21a917e..3fba2eb 100644
> --- a/super-ddf.c
> +++ b/super-ddf.c
> @@ -1429,9 +1429,7 @@ static void getinfo_super_ddf_bvd(struct supertype
> *st, struct mdinfo *info, cha
>  			info->component_size = __be64_to_cpu(vc->conf.blocks);
>  	}
> 
> -	for (dl = ddf->dlist; dl ; dl = dl->next)
> -		if (dl->raiddisk == info->disk.raid_disk)
> -			break;
> +	dl = ddf->dlist;
>  	info->disk.major = 0;
>  	info->disk.minor = 0;
>  	if (dl) {
> diff --git a/super-intel.c b/super-intel.c
> index b8d8b4e..6fed9eb 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -2079,9 +2079,8 @@ static void getinfo_super_imsm_volume(struct
> supertype *st, struct mdinfo *info,
>  	if (prev_map)
>  		map_to_analyse = prev_map;
> 
> -	for (dl = super->disks; dl; dl = dl->next)
> -		if (dl->raiddisk == info->disk.raid_disk)
> -			break;
> +	dl = super->disks;
> +
>  	info->container_member	  = super->current_vol;
>  	info->array.raid_disks    = map->num_members;
>  	info->array.level	  = get_imsm_raid_level(map_to_analyse);
> @@ -5446,11 +5445,10 @@ static struct mdinfo
> *container_content_imsm(struct supertype *st, char *subarra
>  				sizeof(*this));
>  			break;
>  		}
> -		memset(this, 0, sizeof(*this));
> -		this->next = rest;
> 
>  		super->current_vol = i;
>  		getinfo_super_imsm_volume(st, this, NULL);
> +		this->next = rest;
>  		for (slot = 0 ; slot <  map->num_members; slot++) {
>  			unsigned long long recovery_start;
>  			struct mdinfo *info_d;


^ permalink raw reply

* Re: [PATCH 00/21] IMSM Checkpointing Bug Fix Series
From: NeilBrown @ 2011-06-09  3:03 UTC (permalink / raw)
  To: Adam Kwolek
  Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer
In-Reply-To: <20110608160222.24327.71439.stgit@gklab-128-013.igk.intel.com>

On Wed, 08 Jun 2011 18:09:33 +0200 Adam Kwolek <adam.kwolek@intel.com> wrote:

> The following series fixes problems found in IMSM's checkpointing.
> It contains rework based on Neil's comments to previous/initial checkpointing
> series and it should be applied on neil_master branch (on the top 
> of previous checkpointing patches).
> 
> BR
> Adam
> 
> 
> ---
> 
> Adam Kwolek (21):
>       MAN: Man update for check-pointing
>       imsm: Optimize expansion speed when no backup is required
>       imsm: FIX: Remove timeout from wait_for_reshape_imsm()
>       imsm: FIX: wait_for_reshape_imsm() cleanup
>       imsm: FIX: Do not continue reshape when backup exists
>       FIX: Move buffer to next location
>       imsm: FIX: Remove unused variables and code
>       imsm: FIX: Move reshape_progress forward
>       imsm: FIX: Detect failed devices during recover_backup_imsm()
>       imsm: FIX: Use metadata information for restore_stripes() and save_stripes()
>       imsm: FIX: Remove unused parameter from save_backup_imsm() interface
>       imsm: FIX: Do not use pba_of_lba0 for copy position calculation
>       imsm: FIX: Do not verify unused parameters
>       imsm: FIX: Calculate backup location based on metadata information
>       imsm: FIX: Use macros to data access
>       imsm: FIX: Check layout for level migration
>       imsm: FIX: Max position could not be rounded to MB
>       imsm: FIX: Detect migration end during migration record saving
>       imsm: FIX: Verify if migration record is loaded correctly
>       imsm: FIX: Opened handle is not closed
>       imsm: FIX: Cannot create volume
> 
> 
>  mdadm.8.in    |    9 ++
>  restripe.c    |    6 +
>  super-intel.c |  237 ++++++++++++++++++++++++++++++++++-----------------------
>  3 files changed, 153 insertions(+), 99 deletions(-)
> 

Thanks.
Apart from the first one which I have already commented on, these look good.
I have applied them all, thanks.

Please confirm that it all still works for you and let me know if you have
any other changes pending that I should know about.

Thanks,
NeilBrown


^ permalink raw reply

* Re: [PATCH 01/21] imsm: FIX: Cannot create volume
From: NeilBrown @ 2011-06-09  2:42 UTC (permalink / raw)
  To: Adam Kwolek
  Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer
In-Reply-To: <20110608160941.24327.74788.stgit@gklab-128-013.igk.intel.com>

On Wed, 08 Jun 2011 18:09:41 +0200 Adam Kwolek <adam.kwolek@intel.com> wrote:

> Clearing the info structure leaves mdadm unable to create a workable volume.
> 
> During volume creation info structure passed to getinfo() function
> contains some information already and cannot be cleared.
> 
> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> ---
> 
>  super-intel.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/super-intel.c b/super-intel.c
> index b8d8b4e..471dbd2 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -2075,7 +2075,6 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
>  	unsigned int component_size_alligment;
>  	int map_disks = info->array.raid_disks;
>  
> -	memset(info, 0, sizeof(*info));
>  	if (prev_map)
>  		map_to_analyse = prev_map;
>  
> 


I'm sorry, but this is just a very silly patch.

In many cases the 'info' structure is completely uninitialised, so it really
does make sense to initialise it, which is why I added that.
Just removing it *must* be wrong.

It would be much more helpful if you had explained what the "some
information" was.

Maybe it is the '->next' field that container_content_imsm() sets before
calling getinfo_super_imsm_volume()?? Well that getinfo_super_imsm_volume
doesn't use that field, so we can just move the assignment afterwards.
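The ordering problem described above can be sketched in isolation. This is only an illustration, not mdadm code: `struct info`, `fill_info()` and `prepend()` are hypothetical stand-ins for `struct mdinfo`, a getinfo-style function that zeros its argument, and a caller that builds a linked list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for mdadm's struct mdinfo. */
struct info {
	struct info *next;
	int raid_disks;
};

/* Stand-in for a getinfo-style function that zeros its output first. */
static void fill_info(struct info *info, int raid_disks)
{
	assert(info != NULL);
	memset(info, 0, sizeof(*info));
	info->raid_disks = raid_disks;
}

/* Prepend a filled-in entry to a list; link it only after the fill. */
static struct info *prepend(struct info *this, struct info *rest,
			    int raid_disks)
{
	fill_info(this, raid_disks);	/* wipes every field, including next */
	this->next = rest;		/* so the link is assigned afterwards */
	return this;
}
```

Setting `next` before calling `fill_info()` would lose the link, which is the kind of fall-out the memset can cause for a caller that pre-populates the structure.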

or maybe ... getinfo_super_imsm_volume uses info->disk.raid_disk, but no
caller ever sets that up that I can see, so that code is plainly wrong
(though ddf makes the same mistake so that is probably my fault).
I've 'fixed' them both to just report on the 'first' disk as that is no worse
than what they currently do.

If that doesn't fix your problem, please explain exactly what is being cleared
that shouldn't be.

NeilBrown

commit 9894ec0d64a9faab719d016bbbf5fbc842757df6
Author: NeilBrown <neilb@suse.de>
Date:   Thu Jun 9 12:42:02 2011 +1000

    Fix some fall-out from recent memset-zero for getinfo_super
    
    container_content_imsm was setting info->next before calling
    getinfo_super_imsm_container which now zeros everything.
    So move that assignment to afterwards.
    
    So both imsm and ddf were assuming info->disk.raid_disk means
    something but it doesn't.  So fix those.
    
    Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/super-ddf.c b/super-ddf.c
index 21a917e..3fba2eb 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -1429,9 +1429,7 @@ static void getinfo_super_ddf_bvd(struct supertype *st, struct mdinfo *info, cha
 			info->component_size = __be64_to_cpu(vc->conf.blocks);
 	}
 
-	for (dl = ddf->dlist; dl ; dl = dl->next)
-		if (dl->raiddisk == info->disk.raid_disk)
-			break;
+	dl = ddf->dlist;
 	info->disk.major = 0;
 	info->disk.minor = 0;
 	if (dl) {
diff --git a/super-intel.c b/super-intel.c
index b8d8b4e..6fed9eb 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -2079,9 +2079,8 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
 	if (prev_map)
 		map_to_analyse = prev_map;
 
-	for (dl = super->disks; dl; dl = dl->next)
-		if (dl->raiddisk == info->disk.raid_disk)
-			break;
+	dl = super->disks;
+
 	info->container_member	  = super->current_vol;
 	info->array.raid_disks    = map->num_members;
 	info->array.level	  = get_imsm_raid_level(map_to_analyse);
@@ -5446,11 +5445,10 @@ static struct mdinfo *container_content_imsm(struct supertype *st, char *subarra
 				sizeof(*this));
 			break;
 		}
-		memset(this, 0, sizeof(*this));
-		this->next = rest;
 
 		super->current_vol = i;
 		getinfo_super_imsm_volume(st, this, NULL);
+		this->next = rest;
 		for (slot = 0 ; slot <  map->num_members; slot++) {
 			unsigned long long recovery_start;
 			struct mdinfo *info_d;


^ permalink raw reply related

* Re: [PATCH] md/raid10: share pages between read and write bio's during recovery
From: Namhyung Kim @ 2011-06-09  2:33 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid
In-Reply-To: <20110609122109.080e82de@notabene.brown>

2011-06-09 (목), 12:21 +1000, NeilBrown:
> On Thu,  9 Jun 2011 03:04:33 +0900 Namhyung Kim <namhyung@gmail.com> wrote:
> 
> > When performing a recovery, only first 2 slots in r10_bio are in use,
> > for read and write respectively. However all of pages in the write bio
> > are never used and just replaced to read bio's when the read completes.
> > 
> > Get rid of those unused pages and share read pages properly.
> > 
> > Signed-off-by: Namhyung Kim <namhyung@gmail.com>
> > ---
> >  drivers/md/raid10.c |   13 ++++++++++---
> >  1 files changed, 10 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index a53779ffdf89..621594981339 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -116,6 +116,13 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
> >  			goto out_free_bio;
> >  		r10_bio->devs[j].bio = bio;
> >  	}
> > +
> > +	/*
> > +	 * We can share bv_page's during the recovery
> > +	 */
> > +	if (!test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery))
> > +		nalloc--;
> > +
> >  	/*
> >  	 * Allocate RESYNC_PAGES data pages and attach them
> >  	 * where needed.
> > @@ -1363,16 +1370,16 @@ static void recovery_request_write(mddev_t *mddev, r10bio_t *r10_bio)
> >  	int i, d;
> >  	struct bio *bio, *wbio;
> >  
> > -
> > -	/* move the pages across to the second bio
> > +	/*
> > +	 * share the pages with the first bio
> >  	 * and submit the write request
> >  	 */
> >  	bio = r10_bio->devs[0].bio;
> >  	wbio = r10_bio->devs[1].bio;
> >  	for (i=0; i < wbio->bi_vcnt; i++) {
> >  		struct page *p = bio->bi_io_vec[i].bv_page;
> > -		bio->bi_io_vec[i].bv_page = wbio->bi_io_vec[i].bv_page;
> >  		wbio->bi_io_vec[i].bv_page = p;
> > +		get_page(p);
> >  	}
> >  	d = r10_bio->devs[1].devnum;
> >  
> 
> 
> Thanks.   Interesting idea, but I don't think this code is safe.
> 
> We end up calling bio_add_page with an uninitialised 'page' (in
> sync_request()).  This could result in BIOVEC_PHYS_MERGEABLE doing funny
> things.
> 

Arrrh, right. I missed that part.


> It would be OK to set up the two links to the one page in r10buf_pool_alloc,
> so recovery_request_write doesn't need to do anything with pages.
> That would probably be even more safe than the current code.
> 

Agreed, will resend v2. Thanks for the review.


-- 
Regards,
Namhyung Kim



^ permalink raw reply

* Re: [PATCH] md/raid10: share pages between read and write bio's during recovery
From: NeilBrown @ 2011-06-09  2:21 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: linux-raid
In-Reply-To: <1307556273-16730-1-git-send-email-namhyung@gmail.com>

On Thu,  9 Jun 2011 03:04:33 +0900 Namhyung Kim <namhyung@gmail.com> wrote:

> When performing a recovery, only first 2 slots in r10_bio are in use,
> for read and write respectively. However all of pages in the write bio
> are never used and just replaced to read bio's when the read completes.
> 
> Get rid of those unused pages and share read pages properly.
> 
> Signed-off-by: Namhyung Kim <namhyung@gmail.com>
> ---
>  drivers/md/raid10.c |   13 ++++++++++---
>  1 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index a53779ffdf89..621594981339 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -116,6 +116,13 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
>  			goto out_free_bio;
>  		r10_bio->devs[j].bio = bio;
>  	}
> +
> +	/*
> +	 * We can share bv_page's during the recovery
> +	 */
> +	if (!test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery))
> +		nalloc--;
> +
>  	/*
>  	 * Allocate RESYNC_PAGES data pages and attach them
>  	 * where needed.
> @@ -1363,16 +1370,16 @@ static void recovery_request_write(mddev_t *mddev, r10bio_t *r10_bio)
>  	int i, d;
>  	struct bio *bio, *wbio;
>  
> -
> -	/* move the pages across to the second bio
> +	/*
> +	 * share the pages with the first bio
>  	 * and submit the write request
>  	 */
>  	bio = r10_bio->devs[0].bio;
>  	wbio = r10_bio->devs[1].bio;
>  	for (i=0; i < wbio->bi_vcnt; i++) {
>  		struct page *p = bio->bi_io_vec[i].bv_page;
> -		bio->bi_io_vec[i].bv_page = wbio->bi_io_vec[i].bv_page;
>  		wbio->bi_io_vec[i].bv_page = p;
> +		get_page(p);
>  	}
>  	d = r10_bio->devs[1].devnum;
>  


Thanks.   Interesting idea, but I don't think this code is safe.

We end up calling bio_add_page with an uninitialised 'page' (in
sync_request()).  This could result in BIOVEC_PHYS_MERGEABLE doing funny
things.

It would be OK to set up the two links to the one page in r10buf_pool_alloc,
so recovery_request_write doesn't need to do anything with pages.
That would probably be even more safe than the current code.

Thanks,
NeilBrown

^ permalink raw reply

* Re: Triple-parity raid6
From: NeilBrown @ 2011-06-09  1:49 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid
In-Reply-To: <isp2g2$rf$1@dough.gmane.org>

On Thu, 09 Jun 2011 02:01:06 +0200 David Brown <david.brown@hesbynett.no>
wrote:

> Has anyone considered triple-parity raid6 ?  As far as I can see, it 
> should not be significantly harder than normal raid6 - either  to 
> implement, or for the processor at run-time.  Once you have the GF(2⁸) 
> field arithmetic in place for raid6, it's just a matter of making 
> another parity block in the same way but using a different generator:
> 
> P = D_0 + D_1 + D_2 + .. + D_(n-1)
> Q = D_0 + g.D_1 + g².D_2 + .. + g^(n-1).D_(n-1)
> R = D_0 + h.D_1 + h².D_2 + .. + h^(n-1).D_(n-1)
> 
> The raid6 implementation in mdraid uses g = 0x02 to generate the second 
> parity (based on "The mathematics of RAID-6" - I haven't checked the 
> source code).  You can make a third parity using h = 0x04 and then get a 
> redundancy of 3 disks.  (Note - I haven't yet confirmed that this is 
> valid for more than 100 data disks - I need to make my checker program 
> more efficient first.)
> 
> Rebuilding a disk, or running in degraded mode, is just an obvious 
> extension to the current raid6 algorithms.  If you are missing three 
> data blocks, the maths looks hard to start with - but if you express the 
> equations as a set of linear equations and use standard matrix inversion 
> techniques, it should not be hard to implement.  You only need to do 
> this inversion once when you find that one or more disks have failed - 
> then you pre-compute the multiplication tables in the same way as is 
> done for raid6 today.
> 
> In normal use, calculating the R parity is no more demanding than 
> calculating the Q parity.  And most rebuilds or degraded situations will 
> only involve a single disk, and the data can thus be re-constructed 
> using the P parity just like raid5 or two-parity raid6.
> 
> 
> I'm sure there are situations where triple-parity raid6 would be 
> appealing - it has already been implemented in ZFS, and it is only a 
> matter of time before two-parity raid6 has a real probability of hitting 
> an unrecoverable read error during a rebuild.
> 
> 
> And of course, there is no particular reason to stop at three parity 
> blocks - the maths can easily be generalised.  1, 2, 4 and 8 can be used 
> as generators for quad-parity (checked up to 60 disks), and adding 16 
> gives you quintuple parity (checked up to 30 disks) - but that's maybe 
> getting a bit paranoid.
> 
> 
> ref.:
> 
> <http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf>
> <http://blogs.oracle.com/ahl/entry/acm_triple_parity_raid>
> <http://queue.acm.org/detail.cfm?id=1670144>
> <http://blogs.oracle.com/ahl/entry/triple_parity_raid_z>
> 

 -ENOPATCH  :-)

I have a series of patches nearly ready which removes a lot of the remaining
duplication in raid5.c between raid5 and raid6 paths.  So there will be
relatively few places where RAID5 and RAID6 do different things - only the
places where they *must* do different things.
After that, adding a new level or layout which has 'max_degraded == 3' would
be quite easy.
The most difficult part would be the enhancements to libraid6 to generate the
new 'syndrome', and to handle the different recovery possibilities.

So if you're not otherwise busy this weekend, a patch would be nice :-)

Thanks,
NeilBrown

^ permalink raw reply

* Re: [PATCH] MD:  use is_power_of_2 macro
From: NeilBrown @ 2011-06-09  1:44 UTC (permalink / raw)
  To: Jonathan Brassow; +Cc: linux-raid
In-Reply-To: <1307556070.9555.11.camel@f14.redhat.com>

On Wed, 08 Jun 2011 13:01:10 -0500 Jonathan Brassow <jbrassow@redhat.com>
wrote:

> Make use of is_power_of_2 macro.
> 
> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
> 
> Index: linux-2.6/drivers/md/bitmap.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/bitmap.c
> +++ linux-2.6/drivers/md/bitmap.c
> @@ -651,7 +651,7 @@ static int bitmap_read_sb(struct bitmap 
>  		reason = "unrecognized superblock version";
>  	else if (chunksize < 512)
>  		reason = "bitmap chunksize too small";
> -	else if ((1 << ffz(~chunksize)) != chunksize)
> +	else if (!is_power_of_2(chunksize))
>  		reason = "bitmap chunksize not a power of 2";
>  	else if (daemon_sleep < 1 || daemon_sleep > MAX_SCHEDULE_TIMEOUT)
>  		reason = "daemon sleep period out of range";
> 
> 

Hi Jon,
 thanks for this and the others.
I'll push them into my 'for-next' branch today so they should appear in -next
soon.  I expect to send a pull request to Linus on Tuesday - after the long
weekend.

Thanks,
NeilBrown
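
For readers outside the kernel tree, the two forms of the check in the patch can be sketched in user space. This is an illustration under assumptions: `is_pow2()` mirrors the usual "non-zero and only one bit set" test, and `is_pow2_lowbit()` is a hypothetical equivalent of the older `1 << ffz(~chunksize)` comparison, written with `n & -n` to isolate the lowest set bit.

```c
#include <assert.h>
#include <stdbool.h>

/* True when exactly one bit is set; zero is not a power of two. */
static bool is_pow2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Older style: isolate the lowest set bit and compare it with n. */
static bool is_pow2_lowbit(unsigned long n)
{
	/* The bit-search form has no defined answer for zero. */
	assert(n != 0);
	return (n & -n) == n;
}
```

Both agree for any non-zero value; the first form also handles zero, although the patch already rejects a chunksize below 512 before reaching this test.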

^ permalink raw reply

* Re: Growing a RAID-0 "internally"?
From: Stan Hoeppner @ 2011-06-09  0:50 UTC (permalink / raw)
  To: Patrick J. LoPresti; +Cc: linux-raid
In-Reply-To: <BANLkTi=kM_3Aadv7Y8ime2MGgPEJL_+7Yw@mail.gmail.com>

On 6/8/2011 3:34 PM, Patrick J. LoPresti wrote:
> Hi.  I have a RAID-0 striped across two partitions; let's call them
> /dev/sda2 and /dev/sdb2 (equal size).  These partitions do not stretch
> to the end of the respective drives; that is, there is also a
> /dev/sda3 and /dev/sdb3 (also equal size).
> 
> I want to delete partitions sda3 and sdb3, resize sda2 and sdb2 to
> consume the additional space, and then grow my RAID-0 array to use it.
> 
> I know how to perform every step (e.g., modifying partition tables,
> resizing the file system) except for one:  How do I use mdadm to
> change the size of the drives inside the RAID-0 array?  Or is this not
> supported?

AFAIK you cannot currently grow (or shrink) an md RAID0 or RAID10.
According to 'man mdadm':

Grow

Grow (or shrink) an array, or otherwise reshape it in some way.
Currently supported growth options including changing the active size of
component devices in RAID level 1/4/5/6 and changing the number of
active devices in RAID1.

-- 
Stan

^ permalink raw reply

* Triple-parity raid6
From: David Brown @ 2011-06-09  0:01 UTC (permalink / raw)
  To: linux-raid

Has anyone considered triple-parity raid6 ?  As far as I can see, it 
should not be significantly harder than normal raid6 - either  to 
implement, or for the processor at run-time.  Once you have the GF(2⁸) 
field arithmetic in place for raid6, it's just a matter of making 
another parity block in the same way but using a different generator:

P = D_0 + D_1 + D_2 + .. + D_(n-1)
Q = D_0 + g.D_1 + g².D_2 + .. + g^(n-1).D_(n-1)
R = D_0 + h.D_1 + h².D_2 + .. + h^(n-1).D_(n-1)

The raid6 implementation in mdraid uses g = 0x02 to generate the second 
parity (based on "The mathematics of RAID-6" - I haven't checked the 
source code).  You can make a third parity using h = 0x04 and then get a 
redundancy of 3 disks.  (Note - I haven't yet confirmed that this is 
valid for more than 100 data disks - I need to make my checker program 
more efficient first.)
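
As a rough illustration of the three sums above, here is a byte-at-a-time sketch in C. It is not taken from the mdraid source: `gf_mul()` and `pqr_syndrome()` are hypothetical names, and the reduction polynomial 0x11d (x^8 + x^4 + x^3 + x^2 + 1) is assumed from "The mathematics of RAID-6". A real implementation would use lookup tables or SIMD rather than a bitwise multiply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two elements of GF(2^8), reducing modulo 0x11d (assumed). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t r = 0;

	while (b) {
		if (b & 1)
			r ^= a;
		/* multiply a by x, folding the overflow bit back in */
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return r;
}

/* Compute one byte of P, Q and R from one byte of each data disk. */
static void pqr_syndrome(const uint8_t *d, size_t n,
			 uint8_t *p, uint8_t *q, uint8_t *r)
{
	uint8_t gi = 1, hi = 1;	/* g^i and h^i, starting at i = 0 */
	size_t i;

	/* powers of the generator repeat after 255 steps */
	assert(n > 0 && n <= 255);
	*p = *q = *r = 0;
	for (i = 0; i < n; i++) {
		*p ^= d[i];			/* addition in GF(2^8) is XOR */
		*q ^= gf_mul(gi, d[i]);
		*r ^= gf_mul(hi, d[i]);
		gi = gf_mul(gi, 0x02);		/* g = 0x02 */
		hi = gf_mul(hi, 0x04);		/* h = 0x04 */
	}
}
```

Calling `pqr_syndrome()` once per byte offset across the stripe gives the three parity blocks. The sketch says nothing about whether every three-disk failure is recoverable; that is the property the checker program mentioned above is meant to confirm.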

Rebuilding a disk, or running in degraded mode, is just an obvious 
extension to the current raid6 algorithms.  If you are missing three 
data blocks, the maths looks hard to start with - but if you express the 
equations as a set of linear equations and use standard matrix inversion 
techniques, it should not be hard to implement.  You only need to do 
this inversion once when you find that one or more disks have failed - 
then you pre-compute the multiplication tables in the same way as is 
done for raid6 today.

In normal use, calculating the R parity is no more demanding than 
calculating the Q parity.  And most rebuilds or degraded situations will 
only involve a single disk, and the data can thus be re-constructed 
using the P parity just like raid5 or two-parity raid6.

I'm sure there are situations where triple-parity raid6 would be 
appealing - it has already been implemented in ZFS, and it is only a 
matter of time before two-parity raid6 has a real probability of hitting 
an unrecoverable read error during a rebuild.

And of course, there is no particular reason to stop at three parity 
blocks - the maths can easily be generalised.  1, 2, 4 and 8 can be used 
as generators for quad-parity (checked up to 60 disks), and adding 16 
gives you quintuple parity (checked up to 30 disks) - but that's maybe 
getting a bit paranoid.

ref.:

<http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf>
<http://blogs.oracle.com/ahl/entry/acm_triple_parity_raid>
<http://queue.acm.org/detail.cfm?id=1670144>
<http://blogs.oracle.com/ahl/entry/triple_parity_raid_z>

mvh.,

David


^ permalink raw reply

* Growing a RAID-0 "internally"?
From: Patrick J. LoPresti @ 2011-06-08 20:34 UTC (permalink / raw)
  To: linux-raid

Hi.  I have a RAID-0 striped across two partitions; let's call them
/dev/sda2 and /dev/sdb2 (equal size).  These partitions do not stretch
to the end of the respective drives; that is, there is also a
/dev/sda3 and /dev/sdb3 (also equal size).

I want to delete partitions sda3 and sdb3, resize sda2 and sdb2 to
consume the additional space, and then grow my RAID-0 array to use it.

I know how to perform every step (e.g., modifying partition tables,
resizing the file system) except for one:  How do I use mdadm to
change the size of the drives inside the RAID-0 array?  Or is this not
supported?

Thank you.

 - Pat

^ permalink raw reply

* Re: SRaid with 13 Disks crashed
From: Phil Turmel @ 2011-06-08 20:32 UTC (permalink / raw)
  To: Dragon; +Cc: linux-raid
In-Reply-To: <20110608200246.135930@gmx.net>

Hi Dragon,

On 06/08/2011 04:02 PM, Dragon wrote:
[...]

> Please share the output of "mdadm -E /dev/sd[abcdefghijklm]"
> 
> Phil
> -------------------
> Sorry, my mistake; of course I meant "resize2fs".
> "Was /dev/sdk supposed to be in this list?"
> ->Yes, it was part of the raid, but I think something happened with the uuid and now it is the system and boot disk. To explain: the raid is stand-alone.
> "This is suspicious.  Looks like sda was added as a spare?"
> ->Most recently I wanted to reduce the raid from 13 to 12 disks, so I took a disk and marked it as spare, but it was sda. A uuid problem?
> ->
>  mdadm -E /dev/sda
> /dev/sda:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 975d6eb2:285eed11:021df236:c2d05073
>   Creation Time : Tue Oct 13 23:26:17 2009
>      Raid Level : raid5
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>    Raid Devices : 13
>   Total Devices : 12
> Preferred Minor : 0
> 
>     Update Time : Fri Jun  3 23:47:53 2011
>           State : clean
>  Active Devices : 11
> Working Devices : 12
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : 1dee418e - correct
>          Events : 156864
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this    13       8        0       13      spare   /dev/sda
> 
>    0     0       8      112        0      active sync   /dev/sdh
>    1     1       0        0        1      faulty removed
>    2     2       8      128        2      active sync   /dev/sdi
>    3     3       8      144        3      active sync   /dev/sdj
>    4     4       8      176        4      active sync   /dev/sdl
>    5     5       8      192        5      active sync   /dev/sdm
>    6     6       8       16        6      active sync   /dev/sdb
>    7     7       0        0        7      faulty removed
>    8     8       8       32        8      active sync   /dev/sdc
>    9     9       8       48        9      active sync   /dev/sdd
>   10    10       8       64       10      active sync   /dev/sde
>   11    11       8       80       11      active sync   /dev/sdf
>   12    12       8       96       12      active sync   /dev/sdg
>   13    13       8        0       13      spare   /dev/sda
>  mdadm -E /dev/sdb
> /dev/sdb:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 975d6eb2:285eed11:021df236:c2d05073
>   Creation Time : Tue Oct 13 23:26:17 2009
>      Raid Level : raid5
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>    Raid Devices : 13
>   Total Devices : 12
> Preferred Minor : 0
> 
>     Update Time : Fri Jun  3 23:47:53 2011
>           State : clean
>  Active Devices : 11
> Working Devices : 12
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : 1dee4196 - correct
>          Events : 156864
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     6       8       16        6      active sync   /dev/sdb
> 
>    0     0       8      112        0      active sync   /dev/sdh
>    1     1       0        0        1      faulty removed
>    2     2       8      128        2      active sync   /dev/sdi
>    3     3       8      144        3      active sync   /dev/sdj
>    4     4       8      176        4      active sync   /dev/sdl
>    5     5       8      192        5      active sync   /dev/sdm
>    6     6       8       16        6      active sync   /dev/sdb
>    7     7       0        0        7      faulty removed
>    8     8       8       32        8      active sync   /dev/sdc
>    9     9       8       48        9      active sync   /dev/sdd
>   10    10       8       64       10      active sync   /dev/sde
>   11    11       8       80       11      active sync   /dev/sdf
>   12    12       8       96       12      active sync   /dev/sdg
>   13    13       8        0       13      spare   /dev/sda
>  mdadm -E /dev/sdc
> /dev/sdc:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 975d6eb2:285eed11:021df236:c2d05073
>   Creation Time : Tue Oct 13 23:26:17 2009
>      Raid Level : raid5
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>    Raid Devices : 13
>   Total Devices : 12
> Preferred Minor : 0
> 
>     Update Time : Fri Jun  3 23:47:53 2011
>           State : clean
>  Active Devices : 11
> Working Devices : 12
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : 1dee41aa - correct
>          Events : 156864
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     8       8       32        8      active sync   /dev/sdc
> 
>    0     0       8      112        0      active sync   /dev/sdh
>    1     1       0        0        1      faulty removed
>    2     2       8      128        2      active sync   /dev/sdi
>    3     3       8      144        3      active sync   /dev/sdj
>    4     4       8      176        4      active sync   /dev/sdl
>    5     5       8      192        5      active sync   /dev/sdm
>    6     6       8       16        6      active sync   /dev/sdb
>    7     7       0        0        7      faulty removed
>    8     8       8       32        8      active sync   /dev/sdc
>    9     9       8       48        9      active sync   /dev/sdd
>   10    10       8       64       10      active sync   /dev/sde
>   11    11       8       80       11      active sync   /dev/sdf
>   12    12       8       96       12      active sync   /dev/sdg
>   13    13       8        0       13      spare   /dev/sda
>  mdadm -E /dev/sdd
> /dev/sdd:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 975d6eb2:285eed11:021df236:c2d05073
>   Creation Time : Tue Oct 13 23:26:17 2009
>      Raid Level : raid5
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>    Raid Devices : 13
>   Total Devices : 12
> Preferred Minor : 0
> 
>     Update Time : Fri Jun  3 23:47:53 2011
>           State : clean
>  Active Devices : 11
> Working Devices : 12
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : 1dee41bc - correct
>          Events : 156864
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     9       8       48        9      active sync   /dev/sdd
> 
>    0     0       8      112        0      active sync   /dev/sdh
>    1     1       0        0        1      faulty removed
>    2     2       8      128        2      active sync   /dev/sdi
>    3     3       8      144        3      active sync   /dev/sdj
>    4     4       8      176        4      active sync   /dev/sdl
>    5     5       8      192        5      active sync   /dev/sdm
>    6     6       8       16        6      active sync   /dev/sdb
>    7     7       0        0        7      faulty removed
>    8     8       8       32        8      active sync   /dev/sdc
>    9     9       8       48        9      active sync   /dev/sdd
>   10    10       8       64       10      active sync   /dev/sde
>   11    11       8       80       11      active sync   /dev/sdf
>   12    12       8       96       12      active sync   /dev/sdg
>   13    13       8        0       13      spare   /dev/sda

OK, something is odd here:

>  mdadm -E /dev/sde
> /dev/sde:
       ^^^
You requested /dev/sde

>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 975d6eb2:285eed11:021df236:c2d05073
>   Creation Time : Tue Oct 13 23:26:17 2009
>      Raid Level : raid5
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>    Raid Devices : 13
>   Total Devices : 12
> Preferred Minor : 0
> 
>     Update Time : Fri Jun  3 23:47:53 2011
>           State : clean
>  Active Devices : 11
> Working Devices : 12
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : 1dee41ea - correct
>          Events : 156864
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     0       8      112        0      active sync   /dev/sdh

We get a report for /dev/sdh  (did you scramble the report?)   ^^^

>    0     0       8      112        0      active sync   /dev/sdh
>    1     1       0        0        1      faulty removed
>    2     2       8      128        2      active sync   /dev/sdi
>    3     3       8      144        3      active sync   /dev/sdj
>    4     4       8      176        4      active sync   /dev/sdl
>    5     5       8      192        5      active sync   /dev/sdm
>    6     6       8       16        6      active sync   /dev/sdb
>    7     7       0        0        7      faulty removed
>    8     8       8       32        8      active sync   /dev/sdc
>    9     9       8       48        9      active sync   /dev/sdd
>   10    10       8       64       10      active sync   /dev/sde
>   11    11       8       80       11      active sync   /dev/sdf
>   12    12       8       96       12      active sync   /dev/sdg
>   13    13       8        0       13      spare   /dev/sda

And again here, and for all the rest:

>  mdadm -E /dev/sdf
> /dev/sdf:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 975d6eb2:285eed11:021df236:c2d05073
>   Creation Time : Tue Oct 13 23:26:17 2009
>      Raid Level : raid5
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>    Raid Devices : 13
>   Total Devices : 12
> Preferred Minor : 0
> 
>     Update Time : Fri Jun  3 23:47:53 2011
>           State : clean
>  Active Devices : 11
> Working Devices : 12
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : 1dee41fe - correct
>          Events : 156864
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     2       8      128        2      active sync   /dev/sdi
> 
>    0     0       8      112        0      active sync   /dev/sdh
>    1     1       0        0        1      faulty removed
>    2     2       8      128        2      active sync   /dev/sdi
>    3     3       8      144        3      active sync   /dev/sdj
>    4     4       8      176        4      active sync   /dev/sdl
>    5     5       8      192        5      active sync   /dev/sdm
>    6     6       8       16        6      active sync   /dev/sdb
>    7     7       0        0        7      faulty removed
>    8     8       8       32        8      active sync   /dev/sdc
>    9     9       8       48        9      active sync   /dev/sdd
>   10    10       8       64       10      active sync   /dev/sde
>   11    11       8       80       11      active sync   /dev/sdf
>   12    12       8       96       12      active sync   /dev/sdg
>   13    13       8        0       13      spare   /dev/sda
>  mdadm -E /dev/sdg
> /dev/sdg:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 975d6eb2:285eed11:021df236:c2d05073
>   Creation Time : Tue Oct 13 23:26:17 2009
>      Raid Level : raid5
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>    Raid Devices : 13
>   Total Devices : 12
> Preferred Minor : 0
> 
>     Update Time : Fri Jun  3 23:47:53 2011
>           State : clean
>  Active Devices : 11
> Working Devices : 12
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : 1dee4210 - correct
>          Events : 156864
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     3       8      144        3      active sync   /dev/sdj
> 
>    0     0       8      112        0      active sync   /dev/sdh
>    1     1       0        0        1      faulty removed
>    2     2       8      128        2      active sync   /dev/sdi
>    3     3       8      144        3      active sync   /dev/sdj
>    4     4       8      176        4      active sync   /dev/sdl
>    5     5       8      192        5      active sync   /dev/sdm
>    6     6       8       16        6      active sync   /dev/sdb
>    7     7       0        0        7      faulty removed
>    8     8       8       32        8      active sync   /dev/sdc
>    9     9       8       48        9      active sync   /dev/sdd
>   10    10       8       64       10      active sync   /dev/sde
>   11    11       8       80       11      active sync   /dev/sdf
>   12    12       8       96       12      active sync   /dev/sdg
>   13    13       8        0       13      spare   /dev/sda

This one is odd:

> mdadm -E /dev/sdh
> /dev/sdh:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 975d6eb2:285eed11:021df236:c2d05073
>   Creation Time : Tue Oct 13 23:26:17 2009
>      Raid Level : raid5
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>    Raid Devices : 13
>   Total Devices : 12
> Preferred Minor : 0
> 
>     Update Time : Fri Jun  3 22:49:22 2011
>           State : clean
>  Active Devices : 11
> Working Devices : 12
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : 1dee3313 - correct
>          Events : 156606
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this    13       8      160       13      spare   /dev/sdk

You supposedly requested /dev/sdh, but this is a report for /dev/sdk, which is not a raid device and yet is listed as a spare here.
> 
>    0     0       8      112        0      active sync   /dev/sdh
>    1     1       0        0        1      faulty removed
>    2     2       8      128        2      active sync   /dev/sdi
>    3     3       8      144        3      active sync   /dev/sdj
>    4     4       8      176        4      active sync   /dev/sdl
>    5     5       8      192        5      active sync   /dev/sdm
>    6     6       8       16        6      active sync   /dev/sdb
>    7     7       0        0        7      faulty removed
>    8     8       8       32        8      active sync   /dev/sdc
>    9     9       8       48        9      active sync   /dev/sdd
>   10    10       8       64       10      active sync   /dev/sde
>   11    11       8       80       11      active sync   /dev/sdf
>   12    12       8       96       12      active sync   /dev/sdg
>   13    13       8      160       13      spare   /dev/sdk
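
The Events counter in this report (156606) is also lower than in the other reports (156864), and the Update Time is about an hour older. A minimal sketch for spotting such stale superblocks, assuming the event counts have been copied by hand from the reports above (the dictionary and the function name are hypothetical, not part of mdadm):

```python
# Event counts copied from the quoted "mdadm -E" reports, keyed by the
# device name that was requested on the command line.
EVENTS = {
    "/dev/sda": 156864,
    "/dev/sdb": 156864,
    "/dev/sdh": 156606,  # the report that names itself /dev/sdk
}


def stale_members(events):
    """Return {device: events_behind} for every superblock older than the newest."""
    newest = max(events.values())
    return {dev: newest - count for dev, count in events.items() if count < newest}


if __name__ == "__main__":
    for dev, lag in stale_members(EVENTS).items():
        print(f"{dev} is {lag} events behind the newest superblock")
```

A lag only says that this superblock was last written earlier than the others. It does not say anything about whether the data on that drive is usable.
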
>  mdadm -E /dev/sdi
> /dev/sdi:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 975d6eb2:285eed11:021df236:c2d05073
>   Creation Time : Tue Oct 13 23:26:17 2009
>      Raid Level : raid5
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>    Raid Devices : 13
>   Total Devices : 12
> Preferred Minor : 0
> 
>     Update Time : Fri Jun  3 23:47:53 2011
>           State : clean
>  Active Devices : 11
> Working Devices : 12
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : 1dee4232 - correct
>          Events : 156864
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     4       8      176        4      active sync   /dev/sdl
> 
>    0     0       8      112        0      active sync   /dev/sdh
>    1     1       0        0        1      faulty removed
>    2     2       8      128        2      active sync   /dev/sdi
>    3     3       8      144        3      active sync   /dev/sdj
>    4     4       8      176        4      active sync   /dev/sdl
>    5     5       8      192        5      active sync   /dev/sdm
>    6     6       8       16        6      active sync   /dev/sdb
>    7     7       0        0        7      faulty removed
>    8     8       8       32        8      active sync   /dev/sdc
>    9     9       8       48        9      active sync   /dev/sdd
>   10    10       8       64       10      active sync   /dev/sde
>   11    11       8       80       11      active sync   /dev/sdf
>   12    12       8       96       12      active sync   /dev/sdg
>   13    13       8        0       13      spare   /dev/sda
>  mdadm -E /dev/sdj
> /dev/sdj:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 975d6eb2:285eed11:021df236:c2d05073
>   Creation Time : Tue Oct 13 23:26:17 2009
>      Raid Level : raid5
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>    Raid Devices : 13
>   Total Devices : 12
> Preferred Minor : 0
> 
>     Update Time : Fri Jun  3 23:47:53 2011
>           State : clean
>  Active Devices : 11
> Working Devices : 12
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : 1dee4244 - correct
>          Events : 156864
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     5       8      192        5      active sync   /dev/sdm
> 
>    0     0       8      112        0      active sync   /dev/sdh
>    1     1       0        0        1      faulty removed
>    2     2       8      128        2      active sync   /dev/sdi
>    3     3       8      144        3      active sync   /dev/sdj
>    4     4       8      176        4      active sync   /dev/sdl
>    5     5       8      192        5      active sync   /dev/sdm
>    6     6       8       16        6      active sync   /dev/sdb
>    7     7       0        0        7      faulty removed
>    8     8       8       32        8      active sync   /dev/sdc
>    9     9       8       48        9      active sync   /dev/sdd
>   10    10       8       64       10      active sync   /dev/sde
>   11    11       8       80       11      active sync   /dev/sdf
>   12    12       8       96       12      active sync   /dev/sdg
>   13    13       8        0       13      spare   /dev/sda

Now, this is significant:

>  mdadm -E /dev/sdk
> mdadm: No md superblock detected on /dev/sdk.

As we see below, this is not one of your 1.5T drives.

>  mdadm -E /dev/sdl
> /dev/sdl:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 975d6eb2:285eed11:021df236:c2d05073
>   Creation Time : Tue Oct 13 23:26:17 2009
>      Raid Level : raid5
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>    Raid Devices : 13
>   Total Devices : 12
> Preferred Minor : 0
> 
>     Update Time : Fri Jun  3 23:47:53 2011
>           State : clean
>  Active Devices : 11
> Working Devices : 12
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : 1dee41ce - correct
>          Events : 156864
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this    10       8       64       10      active sync   /dev/sde
> 
>    0     0       8      112        0      active sync   /dev/sdh
>    1     1       0        0        1      faulty removed
>    2     2       8      128        2      active sync   /dev/sdi
>    3     3       8      144        3      active sync   /dev/sdj
>    4     4       8      176        4      active sync   /dev/sdl
>    5     5       8      192        5      active sync   /dev/sdm
>    6     6       8       16        6      active sync   /dev/sdb
>    7     7       0        0        7      faulty removed
>    8     8       8       32        8      active sync   /dev/sdc
>    9     9       8       48        9      active sync   /dev/sdd
>   10    10       8       64       10      active sync   /dev/sde
>   11    11       8       80       11      active sync   /dev/sdf
>   12    12       8       96       12      active sync   /dev/sdg
>   13    13       8        0       13      spare   /dev/sda
>  mdadm -E /dev/sdm
> /dev/sdm:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 975d6eb2:285eed11:021df236:c2d05073
>   Creation Time : Tue Oct 13 23:26:17 2009
>      Raid Level : raid5
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>    Raid Devices : 13
>   Total Devices : 12
> Preferred Minor : 0
> 
>     Update Time : Fri Jun  3 23:47:53 2011
>           State : clean
>  Active Devices : 11
> Working Devices : 12
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : 1dee41e0 - correct
>          Events : 156864
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this    11       8       80       11      active sync   /dev/sdf
> 
>    0     0       8      112        0      active sync   /dev/sdh
>    1     1       0        0        1      faulty removed
>    2     2       8      128        2      active sync   /dev/sdi
>    3     3       8      144        3      active sync   /dev/sdj
>    4     4       8      176        4      active sync   /dev/sdl
>    5     5       8      192        5      active sync   /dev/sdm
>    6     6       8       16        6      active sync   /dev/sdb
>    7     7       0        0        7      faulty removed
>    8     8       8       32        8      active sync   /dev/sdc
>    9     9       8       48        9      active sync   /dev/sdd
>   10    10       8       64       10      active sync   /dev/sde
>   11    11       8       80       11      active sync   /dev/sdf
>   12    12       8       96       12      active sync   /dev/sdg
>   13    13       8        0       13      spare   /dev/sda
>  mdadm -E /dev/sdn
> /dev/sdn:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 975d6eb2:285eed11:021df236:c2d05073
>   Creation Time : Tue Oct 13 23:26:17 2009
>      Raid Level : raid5
>   Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
>      Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
>    Raid Devices : 13
>   Total Devices : 12
> Preferred Minor : 0
> 
>     Update Time : Fri Jun  3 23:47:53 2011
>           State : clean
>  Active Devices : 11
> Working Devices : 12
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : 1dee41f2 - correct
>          Events : 156864
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this    12       8       96       12      active sync   /dev/sdg
> 
>    0     0       8      112        0      active sync   /dev/sdh
>    1     1       0        0        1      faulty removed
>    2     2       8      128        2      active sync   /dev/sdi
>    3     3       8      144        3      active sync   /dev/sdj
>    4     4       8      176        4      active sync   /dev/sdl
>    5     5       8      192        5      active sync   /dev/sdm
>    6     6       8       16        6      active sync   /dev/sdb
>    7     7       0        0        7      faulty removed
>    8     8       8       32        8      active sync   /dev/sdc
>    9     9       8       48        9      active sync   /dev/sdd
>   10    10       8       64       10      active sync   /dev/sde
>   11    11       8       80       11      active sync   /dev/sdf
>   12    12       8       96       12      active sync   /dev/sdg
>   13    13       8        0       13      spare   /dev/sda
>  mdadm -E /dev/sdo
> mdadm: No md superblock detected on /dev/sdo.
> nassrv01:~# fdisk -l |grep sd
> Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
> Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
> Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
> Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes
> Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
> Disk /dev/sdf: 1500.3 GB, 1500301910016 bytes
> Disk /dev/sdg: 1500.3 GB, 1500301910016 bytes
> Disk /dev/sdh: 1500.3 GB, 1500301910016 bytes
> Disk /dev/sdi: 1500.3 GB, 1500301910016 bytes
> Disk /dev/sdj: 1500.3 GB, 1500301910016 bytes
> Disk /dev/sdk: 20.4 GB, 20409532416 bytes
> /dev/sdk1   *           1        2372    19053058+  83  Linux
> /dev/sdk2            2373        2481      875542+   5  Extended
> /dev/sdk5            2373        2481      875511   82  Linux swap / Solaris
> Disk /dev/sdl: 1500.3 GB, 1500301910016 bytes
> Disk /dev/sdm: 1500.3 GB, 1500301910016 bytes
> Disk /dev/sdn: 1500.3 GB, 1500301910016 bytes
> Disk /dev/sdo: 1500.3 GB, 1500301910016 bytes
> /dev/sdo1               1      182401  1465136001   83  Linux

Uh, oh.  /dev/sdo has a partition table.  Please show "mdadm -E /dev/sdo1"

I'm guessing that drive letters have changed since you first put this together.  Could you please get my "lsdrv" script from:

http://github.com/pturmel/lsdrv

and show us the output.  This will help make sure we don't lose track of the roles of each of these drives.
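
In the meantime, the mismatch between the device that was examined and the name stored in its superblock can be tabulated from the reports already posted. The following is only a sketch: it assumes the 0.90 report layout shown above, the function names are made up for illustration, and the name decoding only covers major 8 (sda through sdp, 16 minors per disk).

```python
import re

# "/dev/sde:" header line that starts each "mdadm -E" report.
HEADER = re.compile(r"^(/dev/\w+):$")
# "this  <number> <major> <minor> <raid slot> <state> <recorded name>"
THIS = re.compile(r"^this\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.+?)\s+(/dev/\w+)$")


def sd_name(major, minor):
    """Decode a major 8 device number into its kernel name (sda..sdp only)."""
    if major != 8:
        raise ValueError("this sketch only handles major 8")
    disk, part = divmod(minor, 16)
    name = "/dev/sd" + chr(ord("a") + disk)
    return name + str(part) if part else name


def role_map(report):
    """Return (requested, recorded, raid_slot, state) for each examined device."""
    rows = []
    requested = None
    for raw in report.splitlines():
        line = raw.lstrip("> ").strip()  # drop mail quote markers
        header = HEADER.match(line)
        if header:
            requested = header.group(1)
            continue
        this = THIS.match(line)
        if this and requested:
            _number, _major, _minor, slot, state, recorded = this.groups()
            rows.append((requested, recorded, int(slot), state))
    return rows


# Three excerpts from the quoted output above.
SAMPLE = """\
> /dev/sdd:
> this     9       8       48        9      active sync   /dev/sdd
> /dev/sde:
> this     0       8      112        0      active sync   /dev/sdh
> /dev/sdh:
> this    13       8      160       13      spare   /dev/sdk
"""

if __name__ == "__main__":
    for requested, recorded, slot, state in role_map(SAMPLE):
        note = "" if requested == recorded else "  <-- name changed"
        print(f"{requested}: slot {slot} ({state}), recorded as {recorded}{note}")
```

The name in the superblock is the one the drive had when the superblock was last written, so a difference here points to renumbered drives rather than to a scrambled report.
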

My first impression from the above output is that the shrink from 13 back to 12 did not actually happen.  Shrinking is dangerous enough that mdadm stays in test mode unless you set a sysfs variable with the new size before trying to shrink.
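
The numbers in the superblocks fit that impression: a raid5 array holds (Raid Devices - 1) members' worth of data, and the reported Array Size matches 13 devices, not 12. A rough check, assuming the "Used Dev Size" value from the reports and, for the filesystem limit, 32-bit block numbers with 4 KiB blocks (that limit is my assumption about where the 16tb figure comes from):

```python
USED_DEV_KIB = 1465138496   # "Used Dev Size" from the superblocks, in KiB
REPORTED_KIB = 17581661952  # "Array Size" from the superblocks, in KiB

# Assumed filesystem limit: 2**32 blocks of 4 KiB each = 16 TiB.
FS_LIMIT_KIB = 2**32 * 4


def raid5_size_kib(raid_devices, used_dev_kib=USED_DEV_KIB):
    """Usable raid5 capacity: one member's worth of space goes to parity."""
    return (raid_devices - 1) * used_dev_kib


if __name__ == "__main__":
    for n in (12, 13):
        size = raid5_size_kib(n)
        fits = "fits" if size <= FS_LIMIT_KIB else "exceeds the limit"
        print(f"{n} devices: {size} KiB ({size / 2**20:.2f} GiB), {fits}")
```

With 13 devices the result equals the reported Array Size and is above the assumed 16 TiB limit. With 12 devices it would be below it.
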

Another question:  Do you have a backup?

Phil


* Re: SRaid with 13 Disks crashed
From: Dragon @ 2011-06-08 20:02 UTC
  To: philip; +Cc: linux-raid

Hi Dragon,

On 06/08/2011 10:24 AM, Dragon wrote:
> SRaid with 13 Disks crashed
> Hello,
> 
> 
> This seems to be my last chance to get back all of my data from a sw-raid5 with 12-13 disks.
> I use Debian (2.6.32-bpo.5-amd64) and recently wanted to grow the raid from 12 to 13 disks, for a total size of 18tb. After running mke2fs I found that the tool allows a maximum size of 16tb on ext4. After that I wanted to shrink the raid back to 12 disks, and now the raid is gone.

Did you actually mean "mke2fs" ?  It destroys existing data.  I hope you meant
"resize2fs".

> i tried some assemble and examine things but without success.
> 
> here some information:
>  cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : inactive sdh[0](S) sda[13](S) sdg[12](S) sdf[11](S) sde[10](S) sdd[9](S) sdc[8](S) sdb[6](S) sdm[5](S) sdl[4](S) sdj[3](S) sdi[2](S)
>       17581661952 blocks
> 
> unused devices: <none>
> 
> mdadm --detail /dev/md0
> mdadm: md device /dev/md0 does not appear to be active.
> 
>  mdadm --assemble --force -v /dev/md0 /dev/sdh /dev/sda /dev/sdg /dev/sdf /dev/sde /dev/sdd /dev/sdc /dev/sdb /dev/sdm /dev/sdl /dev/sdj /dev/sdi --update=super-minor /dev/sdh

Was /dev/sdk supposed to be in this list?

> mdadm: looking for devices for /dev/md0
> mdadm: updating superblock of /dev/sdh with minor number 0
> mdadm: /dev/sdh is identified as a member of /dev/md0, slot 0.
> mdadm: updating superblock of /dev/sda with minor number 0
> mdadm: /dev/sda is identified as a member of /dev/md0, slot 13.

This is suspicious.  Looks like sda was added as a spare?

> mdadm: updating superblock of /dev/sdg with minor number 0
> mdadm: /dev/sdg is identified as a member of /dev/md0, slot 12.
> mdadm: updating superblock of /dev/sdf with minor number 0
> mdadm: /dev/sdf is identified as a member of /dev/md0, slot 11.
> mdadm: updating superblock of /dev/sde with minor number 0
> mdadm: /dev/sde is identified as a member of /dev/md0, slot 10.
> mdadm: updating superblock of /dev/sdd with minor number 0
> mdadm: /dev/sdd is identified as a member of /dev/md0, slot 9.
> mdadm: updating superblock of /dev/sdc with minor number 0
> mdadm: /dev/sdc is identified as a member of /dev/md0, slot 8.
> mdadm: updating superblock of /dev/sdb with minor number 0
> mdadm: /dev/sdb is identified as a member of /dev/md0, slot 6.
> mdadm: updating superblock of /dev/sdm with minor number 0
> mdadm: /dev/sdm is identified as a member of /dev/md0, slot 5.
> mdadm: updating superblock of /dev/sdl with minor number 0
> mdadm: /dev/sdl is identified as a member of /dev/md0, slot 4.
> mdadm: updating superblock of /dev/sdj with minor number 0
> mdadm: /dev/sdj is identified as a member of /dev/md0, slot 3.
> mdadm: updating superblock of /dev/sdi with minor number 0
> mdadm: /dev/sdi is identified as a member of /dev/md0, slot 2.
> mdadm: updating superblock of /dev/sdh with minor number 0
> mdadm: /dev/sdh is identified as a member of /dev/md0, slot 0.
> mdadm: no uptodate device for slot 1 of /dev/md0
> mdadm: added /dev/sdi to /dev/md0 as 2
> mdadm: added /dev/sdj to /dev/md0 as 3
> mdadm: added /dev/sdl to /dev/md0 as 4
> mdadm: added /dev/sdm to /dev/md0 as 5
> mdadm: added /dev/sdb to /dev/md0 as 6
> mdadm: no uptodate device for slot 7 of /dev/md0
> mdadm: added /dev/sdc to /dev/md0 as 8
> mdadm: added /dev/sdd to /dev/md0 as 9
> mdadm: added /dev/sde to /dev/md0 as 10
> mdadm: added /dev/sdf to /dev/md0 as 11
> mdadm: added /dev/sdg to /dev/md0 as 12
> mdadm: added /dev/sda to /dev/md0 as 13
> mdadm: added /dev/sdh to /dev/md0 as 0
> mdadm: /dev/md0 assembled from 11 drives and 1 spare - not enough to start the array.

Indeed.  Your problem is likely to be /dev/sda.
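
The reason the array will not start follows from the slot list in the log: raid5 can run with at most one member missing, spares do not count, and here slots 1 and 7 are both empty. A small sketch of that check, with the slot numbers copied from the "added ... as N" lines above (the function names are hypothetical):

```python
RAID_DEVICES = 13
# Slots filled by active members in the assemble log. Slot 13 (/dev/sda)
# is a spare and is therefore not listed.
ACTIVE_SLOTS = [0, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12]


def missing_slots(raid_devices, active_slots):
    """Slots in 0..raid_devices-1 that have no active member."""
    return sorted(set(range(raid_devices)) - set(active_slots))


def raid5_can_start(raid_devices, active_slots):
    """raid5 tolerates at most one missing member."""
    return len(missing_slots(raid_devices, active_slots)) <= 1


if __name__ == "__main__":
    print("missing slots:", missing_slots(RAID_DEVICES, ACTIVE_SLOTS))
    print("can start:", raid5_can_start(RAID_DEVICES, ACTIVE_SLOTS))
```

So at least one of the two missing slots has to be filled by a drive with usable data before the array can run, even in degraded mode.
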

> mdadm.conf
> #old=ARRAY /dev/md0 level=raid5 num-devices=13 metadata=0.90 UUID=975d6eb2:285eed11:021df236:c2d05073
> ARRAY /dev/md0 UUID=975d6eb2:285eed11:021df236:c2d05073
> 
> Hope some can help. Thx

Please share the output of "mdadm -E /dev/sd[abcdefghijklm]"

Phil
-------------------
Sorry, my mistake, of course "resize2fs".
"Was /dev/sdk supposed to be in this list?"
-> Yes, it was part of the raid, but I think something happened with the UUID and now it is the system and boot disk. To explain: the raid is standalone.
"This is suspicious.  Looks like sda was added as a spare?"
-> Lastly I wanted to shrink the raid from 13 to 12 disks and therefore took a disk and marked it as a spare, but it was sda. A UUID problem?
->
 mdadm -E /dev/sda
/dev/sda:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee418e - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this    13       8        0       13      spare   /dev/sda

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
 mdadm -E /dev/sdb
/dev/sdb:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee4196 - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     6       8       16        6      active sync   /dev/sdb

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
 mdadm -E /dev/sdc
/dev/sdc:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41aa - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     8       8       32        8      active sync   /dev/sdc

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
 mdadm -E /dev/sdd
/dev/sdd:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41bc - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     9       8       48        9      active sync   /dev/sdd

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
 mdadm -E /dev/sde
/dev/sde:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41ea - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8      112        0      active sync   /dev/sdh

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
 mdadm -E /dev/sdf
/dev/sdf:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41fe - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8      128        2      active sync   /dev/sdi

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
 mdadm -E /dev/sdg
/dev/sdg:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee4210 - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8      144        3      active sync   /dev/sdj

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
mdadm -E /dev/sdh
/dev/sdh:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 22:49:22 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee3313 - correct
         Events : 156606

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this    13       8      160       13      spare   /dev/sdk

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8      160       13      spare   /dev/sdk
 mdadm -E /dev/sdi
/dev/sdi:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee4232 - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8      176        4      active sync   /dev/sdl

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
 mdadm -E /dev/sdj
/dev/sdj:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee4244 - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8      192        5      active sync   /dev/sdm

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
 mdadm -E /dev/sdk
mdadm: No md superblock detected on /dev/sdk.
 mdadm -E /dev/sdl
/dev/sdl:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41ce - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this    10       8       64       10      active sync   /dev/sde

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
 mdadm -E /dev/sdm
/dev/sdm:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41e0 - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this    11       8       80       11      active sync   /dev/sdf

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
 mdadm -E /dev/sdn
/dev/sdn:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 975d6eb2:285eed11:021df236:c2d05073
  Creation Time : Tue Oct 13 23:26:17 2009
     Raid Level : raid5
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
     Array Size : 17581661952 (16767.18 GiB 18003.62 GB)
   Raid Devices : 13
  Total Devices : 12
Preferred Minor : 0

    Update Time : Fri Jun  3 23:47:53 2011
          State : clean
 Active Devices : 11
Working Devices : 12
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 1dee41f2 - correct
         Events : 156864

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this    12       8       96       12      active sync   /dev/sdg

   0     0       8      112        0      active sync   /dev/sdh
   1     1       0        0        1      faulty removed
   2     2       8      128        2      active sync   /dev/sdi
   3     3       8      144        3      active sync   /dev/sdj
   4     4       8      176        4      active sync   /dev/sdl
   5     5       8      192        5      active sync   /dev/sdm
   6     6       8       16        6      active sync   /dev/sdb
   7     7       0        0        7      faulty removed
   8     8       8       32        8      active sync   /dev/sdc
   9     9       8       48        9      active sync   /dev/sdd
  10    10       8       64       10      active sync   /dev/sde
  11    11       8       80       11      active sync   /dev/sdf
  12    12       8       96       12      active sync   /dev/sdg
  13    13       8        0       13      spare   /dev/sda
 mdadm -E /dev/sdo
mdadm: No md superblock detected on /dev/sdo.
nassrv01:~# fdisk -l |grep sd
Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes
Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdf: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdg: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdh: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdi: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdj: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdk: 20.4 GB, 20409532416 bytes
/dev/sdk1   *           1        2372    19053058+  83  Linux
/dev/sdk2            2373        2481      875542+   5  Extended
/dev/sdk5            2373        2481      875511   82  Linux swap / Solaris
Disk /dev/sdl: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdm: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdn: 1500.3 GB, 1500301910016 bytes
Disk /dev/sdo: 1500.3 GB, 1500301910016 bytes
/dev/sdo1               1      182401  1465136001   83  Linux

Hope that helps. thx

^ permalink raw reply

* [PATCH] md/raid10: share pages between read and write bio's during recovery
From: Namhyung Kim @ 2011-06-08 18:04 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

When performing a recovery, only the first 2 slots in r10_bio are in use,
for read and write respectively. However, the pages allocated for the write
bio are never used; they are just replaced with the read bio's pages when
the read completes.

Get rid of those unused pages and share read pages properly.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
---
 drivers/md/raid10.c |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a53779ffdf89..621594981339 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -116,6 +116,13 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
 			goto out_free_bio;
 		r10_bio->devs[j].bio = bio;
 	}
+
+	/*
+	 * We can share bv_page's during the recovery
+	 */
+	if (!test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery))
+		nalloc--;
+
 	/*
 	 * Allocate RESYNC_PAGES data pages and attach them
 	 * where needed.
@@ -1363,16 +1370,16 @@ static void recovery_request_write(mddev_t *mddev, r10bio_t *r10_bio)
 	int i, d;
 	struct bio *bio, *wbio;
 
-
-	/* move the pages across to the second bio
+	/*
+	 * share the pages with the first bio
 	 * and submit the write request
 	 */
 	bio = r10_bio->devs[0].bio;
 	wbio = r10_bio->devs[1].bio;
 	for (i=0; i < wbio->bi_vcnt; i++) {
 		struct page *p = bio->bi_io_vec[i].bv_page;
-		bio->bi_io_vec[i].bv_page = wbio->bi_io_vec[i].bv_page;
 		wbio->bi_io_vec[i].bv_page = p;
+		get_page(p);
 	}
 	d = r10_bio->devs[1].devnum;
 
-- 
1.7.5.2
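
For readers following the thread, here is a rough userspace sketch of the
sharing idea described above: the write bio points at the read bio's pages
and takes an extra reference on each, so a page is only freed once both
sides have dropped it. This is not kernel code. struct page_model,
struct bio_model, NPAGES and every helper below are hypothetical stand-ins
invented for illustration, and the refcounting is deliberately simplified.

```c
#include <stdlib.h>

#define NPAGES 4	/* arbitrary; stands in for RESYNC_PAGES */

/* Hypothetical userspace model of a reference-counted page. */
struct page_model {
	int refcount;
};

/* Hypothetical model of a bio: just an array of page pointers. */
struct bio_model {
	struct page_model *pages[NPAGES];
};

static int pages_freed;	/* counts frees so the behaviour can be observed */

/* Allocate a page that starts with a single reference. */
static struct page_model *alloc_page_model(void)
{
	struct page_model *p = malloc(sizeof(*p));

	if (p)
		p->refcount = 1;
	return p;
}

/* Take one more reference (models get_page()). */
static void get_page_model(struct page_model *p)
{
	p->refcount++;
}

/* Drop one reference and free the page when the last one goes away. */
static void put_page_model(struct page_model *p)
{
	if (--p->refcount == 0) {
		free(p);
		pages_freed++;
	}
}

/* Point each write slot at the read bio's page and take a reference. */
static void share_pages(const struct bio_model *rbio, struct bio_model *wbio)
{
	int i;

	for (i = 0; i < NPAGES; i++) {
		wbio->pages[i] = rbio->pages[i];
		get_page_model(wbio->pages[i]);
	}
}

/* Drop this bio's reference on every page it points at. */
static void release_bio(struct bio_model *bio)
{
	int i;

	for (i = 0; i < NPAGES; i++) {
		put_page_model(bio->pages[i]);
		bio->pages[i] = NULL;
	}
}
```

In this model, releasing the write bio first leaves every page alive
(refcount 1), and only the second release_bio() call frees them. That is
the property the added get_page() in the patch is there to preserve.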


^ permalink raw reply related

* [PATCH] MD:  use is_power_of_2 macro
From: Jonathan Brassow @ 2011-06-08 18:01 UTC (permalink / raw)
  To: linux-raid

Use is_power_of_2() in place of the open-coded ffz() test.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>

Index: linux-2.6/drivers/md/bitmap.c
===================================================================
--- linux-2.6.orig/drivers/md/bitmap.c
+++ linux-2.6/drivers/md/bitmap.c
@@ -651,7 +651,7 @@ static int bitmap_read_sb(struct bitmap 
 		reason = "unrecognized superblock version";
 	else if (chunksize < 512)
 		reason = "bitmap chunksize too small";
-	else if ((1 << ffz(~chunksize)) != chunksize)
+	else if (!is_power_of_2(chunksize))
 		reason = "bitmap chunksize not a power of 2";
 	else if (daemon_sleep < 1 || daemon_sleep > MAX_SCHEDULE_TIMEOUT)
 		reason = "daemon sleep period out of range";
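
As a side note on why the two tests agree: ffz(~x) is the index of the
lowest set bit of x, so (1 << ffz(~x)) == x holds only when x has exactly
one bit set. A small userspace sketch of both forms follows. The names
is_pow2(), lowest_set_bit() and old_check_is_pow2() are made up for this
illustration; they are not the kernel helpers themselves.

```c
#include <stdbool.h>

/* Userspace stand-in for the is_power_of_2() test: exactly one bit set. */
static bool is_pow2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Index of the lowest set bit; plays the role of ffz(~n). Needs n != 0. */
static unsigned int lowest_set_bit(unsigned long n)
{
	unsigned int i = 0;

	while (!(n & 1UL)) {
		n >>= 1;
		i++;
	}
	return i;
}

/* The old open-coded form of the test. Only valid for n != 0. */
static bool old_check_is_pow2(unsigned long n)
{
	return (1UL << lowest_set_bit(n)) == n;
}
```

The open-coded form has no defined answer for 0, which is harmless in
bitmap_read_sb() only because the "chunksize < 512" branch is tested
first. The is_pow2() form handles 0 by itself.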



^ permalink raw reply

* [PATCH 8 of 8 - v2] MD:  raid5 do not set fullsync
From: Jonathan Brassow @ 2011-06-08 18:00 UTC (permalink / raw)
  To: linux-raid

Neil, thanks for the insight on 'saved_raid_disk' - this is indeed better.

 brassow
==========

Add check to determine if a device needs full resync or if partial resync will do

RAID 5 was assuming that if a device was not In_sync, it must undergo a full
resync.  We add a check to see if 'saved_raid_disk' is the same as 'raid_disk'.
If it is, we can safely skip the full resync and rely on the bitmap for
partial recovery instead.  This is the legitimate purpose of 'saved_raid_disk',
from md.h:
int saved_raid_disk;            /* role that device used to have in the
                                 * array and could again if we did a partial
                                 * resync from the bitmap
                                 */

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>

Index: linux-2.6/drivers/md/raid5.c
===================================================================
--- linux-2.6.orig/drivers/md/raid5.c
+++ linux-2.6/drivers/md/raid5.c
@@ -4858,7 +4858,7 @@ static raid5_conf_t *setup_conf(mddev_t 
 			printk(KERN_INFO "md/raid:%s: device %s operational as raid"
 			       " disk %d\n",
 			       mdname(mddev), bdevname(rdev->bdev, b), raid_disk);
-		} else
+		} else if (rdev->saved_raid_disk != raid_disk)
 			/* Cannot rely on bitmap to complete recovery */
 			conf->fullsync = 1;
 	}
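
To make the decision easier to see in isolation, here is a minimal
userspace sketch of the check as described in the commit message. It is
only a model: struct rdev_model and needs_fullsync() are hypothetical
names, and the in_sync field stands in for the In_sync flag test that the
surrounding kernel code performs.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for the per-device state. */
struct rdev_model {
	bool in_sync;		/* models the In_sync flag */
	int raid_disk;		/* slot the device occupies now */
	int saved_raid_disk;	/* slot it occupied before, or -1 */
};

/*
 * Return true if this device forces a full resync, i.e. the bitmap
 * alone cannot be relied on to bring it up to date.
 */
static bool needs_fullsync(const struct rdev_model *rdev)
{
	if (rdev->in_sync)
		return false;	/* already operational, nothing to recover */

	/* Same slot as before: bitmap-based partial recovery is enough. */
	return rdev->saved_raid_disk != rdev->raid_disk;
}
```

A device that returns to its old slot (saved_raid_disk == raid_disk)
gives false here, while a device with no previous role, modelled as
saved_raid_disk == -1, gives true.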



^ permalink raw reply

* [PATCH 7 of 8 - v2] MD:  add bitmap support
From: Jonathan Brassow @ 2011-06-08 17:59 UTC (permalink / raw)
  To: linux-raid

Neil, fixed power-of-two check and switched read_sb_page to just alloc_page.  Also found another
place in bitmap.c that needs to switch to is_power_of_2, but I'll put that in another patch.

 brassow
==========

Add bitmap support to the device-mapper specific metadata area.

This patch allows the creation of the bitmap metadata area upon initial array
creation via device-mapper.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>

Index: linux-2.6/drivers/md/bitmap.c
===================================================================
--- linux-2.6.orig/drivers/md/bitmap.c
+++ linux-2.6/drivers/md/bitmap.c
@@ -534,6 +534,82 @@ void bitmap_print_sb(struct bitmap *bitm
 	kunmap_atomic(sb, KM_USER0);
 }
 
+/*
+ * bitmap_new_disk_sb
+ * @bitmap
+ *
+ * This function is somewhat the reverse of bitmap_read_sb.  bitmap_read_sb
+ * reads and verifies the on-disk bitmap superblock and populates bitmap_info.
+ * This function verifies 'bitmap_info' and populates the on-disk bitmap
+ * structure, which is to be written to disk.
+ *
+ * Returns: 0 on success, -Exxx on error
+ */
+static int bitmap_new_disk_sb(struct bitmap *bitmap)
+{
+	bitmap_super_t *sb;
+	unsigned long chunksize, daemon_sleep, write_behind;
+	int err = -EINVAL;
+
+	bitmap->sb_page = alloc_page(GFP_KERNEL);
+	if (!bitmap->sb_page) {
+		/* alloc_page() returns NULL on failure, not an ERR_PTR() */
+		err = -ENOMEM;
+		return err;
+	}
+	bitmap->sb_page->index = 0;
+
+	sb = kmap_atomic(bitmap->sb_page, KM_USER0);
+
+	sb->magic = cpu_to_le32(BITMAP_MAGIC);
+	sb->version = cpu_to_le32(BITMAP_MAJOR_HI);
+
+	chunksize = bitmap->mddev->bitmap_info.chunksize;
+	BUG_ON(!chunksize);
+	if (!is_power_of_2(chunksize)) {
+		kunmap_atomic(sb, KM_USER0);
+		printk(KERN_ERR "bitmap chunksize not a power of 2\n");
+		return -EINVAL;
+	}
+	sb->chunksize = cpu_to_le32(chunksize);
+
+	daemon_sleep = bitmap->mddev->bitmap_info.daemon_sleep;
+	if (!daemon_sleep ||
+	    (daemon_sleep < 1) || (daemon_sleep > MAX_SCHEDULE_TIMEOUT)) {
+		printk(KERN_INFO "Choosing daemon_sleep default (5 sec)\n");
+		daemon_sleep = 5 * HZ;
+	}
+	sb->daemon_sleep = cpu_to_le32(daemon_sleep);
+	bitmap->mddev->bitmap_info.daemon_sleep = daemon_sleep;
+
+	/*
+	 * FIXME: write_behind for RAID1.  If not specified, what
+	 * is a good choice?  We choose COUNTER_MAX / 2 arbitrarily.
+	 */
+	write_behind = bitmap->mddev->bitmap_info.max_write_behind;
+	if (write_behind > COUNTER_MAX)
+		write_behind = COUNTER_MAX / 2;
+	sb->write_behind = cpu_to_le32(write_behind);
+	bitmap->mddev->bitmap_info.max_write_behind = write_behind;
+
+	/* keep the array size field of the bitmap superblock up to date */
+	sb->sync_size = cpu_to_le64(bitmap->mddev->resync_max_sectors);
+
+	memcpy(sb->uuid, bitmap->mddev->uuid, 16);
+
+	bitmap->flags |= BITMAP_STALE;
+	sb->state |= cpu_to_le32(BITMAP_STALE);
+	bitmap->events_cleared = bitmap->mddev->events;
+	sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
+
+	bitmap->flags |= BITMAP_HOSTENDIAN;
+	sb->version = cpu_to_le32(BITMAP_MAJOR_HOSTENDIAN);
+
+	kunmap_atomic(sb, KM_USER0);
+
+	return 0;
+}
+
 /* read the superblock from the bitmap file and initialize some bitmap fields */
 static int bitmap_read_sb(struct bitmap *bitmap)
 {
@@ -1076,8 +1152,8 @@ static int bitmap_init_from_disk(struct 
 	}
 
 	printk(KERN_INFO "%s: bitmap initialized from disk: "
-		"read %lu/%lu pages, set %lu bits\n",
-		bmname(bitmap), bitmap->file_pages, num_pages, bit_cnt);
+	       "read %lu/%lu pages, set %lu of %lu bits\n",
+	       bmname(bitmap), bitmap->file_pages, num_pages, bit_cnt, chunks);
 
 	return 0;
 
@@ -1728,9 +1804,16 @@ int bitmap_create(mddev_t *mddev)
 		vfs_fsync(file, 1);
 	}
 	/* read superblock from bitmap file (this sets mddev->bitmap_info.chunksize) */
-	if (!mddev->bitmap_info.external)
-		err = bitmap_read_sb(bitmap);
-	else {
+	if (!mddev->bitmap_info.external) {
+		/*
+		 * If 'MD_ARRAY_FIRST_USE' is set, then device-mapper is
+		 * instructing us to create a new on-disk bitmap instance.
+		 */
+		if (test_and_clear_bit(MD_ARRAY_FIRST_USE, &mddev->flags))
+			err = bitmap_new_disk_sb(bitmap);
+		else
+			err = bitmap_read_sb(bitmap);
+	} else {
 		err = 0;
 		if (mddev->bitmap_info.chunksize == 0 ||
 		    mddev->bitmap_info.daemon_sleep == 0)
Index: linux-2.6/drivers/md/md.h
===================================================================
--- linux-2.6.orig/drivers/md/md.h
+++ linux-2.6/drivers/md/md.h
@@ -124,6 +124,7 @@ struct mddev_s
 #define MD_CHANGE_DEVS	0	/* Some device status has changed */
 #define MD_CHANGE_CLEAN 1	/* transition to or from 'clean' */
 #define MD_CHANGE_PENDING 2	/* switch from 'clean' to 'active' in progress */
+#define MD_ARRAY_FIRST_USE 3    /* First use of array, needs initialization */
 
 	int				suspended;
 	atomic_t			active_io;
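
For illustration, the parameter handling in bitmap_new_disk_sb() can be
modelled in userspace roughly as below. This is a sketch under stated
assumptions, not the kernel function. struct bitmap_params_model and
normalize_bitmap_params() are hypothetical names. The values of HZ,
MAX_SCHEDULE_TIMEOUT and COUNTER_MAX are passed in as arguments because
they are not given in this mail. A zero chunksize returns -EINVAL here,
whereas the patch uses BUG_ON() for that case.

```c
#include <errno.h>

/* Hypothetical container for the three tunables the patch looks at. */
struct bitmap_params_model {
	unsigned long chunksize;
	unsigned long daemon_sleep;
	unsigned long write_behind;
};

/*
 * Validate chunksize and fall back to defaults for the other two
 * fields, mirroring the order of checks in the patch.
 */
static int normalize_bitmap_params(struct bitmap_params_model *p,
				   unsigned long hz,
				   unsigned long max_timeout,
				   unsigned long counter_max)
{
	/* Zero or non-power-of-2 chunk sizes are rejected. */
	if (p->chunksize == 0 || (p->chunksize & (p->chunksize - 1)) != 0)
		return -EINVAL;

	/* Out-of-range sleep period: fall back to 5 seconds of ticks. */
	if (p->daemon_sleep < 1 || p->daemon_sleep > max_timeout)
		p->daemon_sleep = 5 * hz;

	/* Oversized write-behind: fall back to half the counter maximum. */
	if (p->write_behind > counter_max)
		p->write_behind = counter_max / 2;

	return 0;
}
```

Taking the limits as arguments keeps the sketch independent of any
particular kernel configuration. In-range values pass through unchanged.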



^ permalink raw reply
