Linux RAID subsystem development


* Re: Why do I get different results for 'mdadm --detail' & 'mdadm --examine' for the same array?
From: jeffs_linux @ 2011-06-12  1:49 UTC
  To: linux-raid
In-Reply-To: <1307842966.26762.1462147225@webmail.messagingengine.com>

And one more bit of information:

ls -al /dev/md*
brw-rw---- 1 root disk 9, 126 Jun 11 16:52 /dev/md126
brw-rw---- 1 root disk 9, 127 Jun 11 16:51 /dev/md127

/dev/md:
total 0
drwxr-xr-x  2 root root   60 Jun 11 16:52 ./
drwxr-xr-x 22 root root 5560 Jun 11 17:15 ../
lrwxrwxrwx  1 root root    8 Jun 11 16:52 0_0 -> ../md126

Jeff

* Re: Why do I get different results for 'mdadm --detail' & 'mdadm --examine' for the same array?
From: jeffs_linux @ 2011-06-12  1:42 UTC
  To: linux-raid
In-Reply-To: <1307840471.18429.1462140465@webmail.messagingengine.com>


On Sat, 11 Jun 2011 18:01 -0700, jeffs_linux@123mail.org wrote:
> I'm working on setting up my 1st Linux production server with RAID for
> our office.

I was digging through my logs and found these 'failed' notices,

grep mdadm /var/log/*
/var/log/boot.msg:udevd[360]: exec of program '/bin/grep -qs '^AUTO -all' /etc/mdadm.conf' failed
/var/log/boot.msg:udevd[363]: exec of program '/bin/grep -qs '^AUTO -all' /etc/mdadm.conf' failed
/var/log/boot.msg:udevd[364]: exec of program '/bin/grep -qs '^AUTO -all' /etc/mdadm.conf' failed
/var/log/boot.msg:udevd[366]: exec of program '/bin/grep -qs '^AUTO -all' /etc/mdadm.conf' failed
/var/log/boot.msg:udevd[367]: exec of program '/bin/grep -qs '^AUTO -all' /etc/mdadm.conf' failed
/var/log/boot.msg:udevd[377]: exec of program '/bin/grep -qs '^AUTO -all' /etc/mdadm.conf' failed
/var/log/boot.msg:udevd[382]: exec of program '/bin/grep -qs '^AUTO -all' /etc/mdadm.conf' failed
/var/log/boot.msg:udevd[383]: exec of program '/bin/grep -qs '^AUTO -all' /etc/mdadm.conf' failed
/var/log/boot.msg:udevd[426]: exec of program '/bin/grep -qs '^AUTO -all' /etc/mdadm.conf' failed
/var/log/boot.msg:udevd[427]: exec of program '/bin/grep -qs '^AUTO -all' /etc/mdadm.conf' failed
/var/log/boot.msg:udevd[428]: exec of program '/bin/grep -qs '^AUTO -all' /etc/mdadm.conf' failed
/var/log/boot.msg:udevd[431]: exec of program '/bin/grep -qs '^AUTO -all' /etc/mdadm.conf' failed
/var/log/boot.msg:mdadm: /dev/md127 has been started with 4 drives.
/var/log/boot.msg:Starting MD Raid mdadm: /dev/md/0_0 has been started with 4 drives.
/var/log/boot.msg:<notice -- Jun 11 16:52:56.214405000> mdadmd start
/var/log/boot.msg:Starting mdadmd
/var/log/boot.msg:<notice -- Jun 11 16:52:56.293717000> startproc:
execve (/sbin/mdadm) [ /sbin/mdadm -F -d 60 -m root@localhost -s -c
/etc/mdadm.conf ], [ DO_FASTBOOT=no CONSOLE=/dev/console SELINUX_INIT=NO
ROOTFS_FSTYPE=ext4 SHELL=/bin/sh TERM=dumb ROOTFS_FSCK=0 LC_ALL=POSIX
INIT_VERSION=sysvinit-2.89 DO_BLOGD=yes REDIRECT=/dev/ttyS0 COLUMNS=80
PATH=/bin:/sbin:/usr/bin:/usr/sbin vga=0x31a DO_CONFIRM=no RUNLEVEL=5
PWD=/
SPLASHCFG=/etc/bootsplash/themes/openSUSE/config/bootsplash-1280x1024.cfg
DO_QUIET=no PREVLEVEL=N LINES=24 HOME=/ SHLVL=2 DO_FORCEFSCK=no
SPLASH=yes ROOTFS_BLKDEV=/dev/VG0/ROOT0 _=/sbin/startproc
DAEMON=/sbin/mdadm ]
/var/log/boot.msg:'mdadmd start' exits with status 0

I also found this file,

cat /dev/.mdadm/map
---------------------------------------------------------------
md126 0.90 19f2b21c:e54f9e1a:be5ad16e:9754ab5e /dev/md/0_0
md127 1.2 79fb7ad4:289bfae5:86c535ff:202960f2 /dev/md127
---------------------------------------------------------------

which I think may be created by the same udevd that's complaining about
all those failures above?

What's unexpected is that the 'map' file's UUID for /dev/md/0_0 matches the
"--detail" scan, but the UUID for the /dev/md127 device is different!?

mdadm --detail --scan
        ARRAY /dev/md127 metadata=1.2 name=jeffadm:jeffadm1
        UUID=d84afb64:e6fa2b64:ff21c975:f9765431
        ARRAY /dev/md/0_0 metadata=0.90
        UUID=19f2b21c:e54f9e1a:be5ad16e:9754ab5e

I'm pretty confused about what IS versus what SHOULD BE happening, or
whether I should delete/modify something, so I'm going to hold-off any
changes and hope somebody can shed some light.

Jeff

* Why do I get different results for 'mdadm --detail' & 'mdadm --examine' for the same array?
From: jeffs_linux @ 2011-06-12  1:01 UTC
  To: linux-raid

Hi,

I'm working on setting up my 1st Linux production server with RAID for
our office.

I have four drives, across which I created two arrays.

mdadm --create /dev/md0 --verbose --bitmap=internal --metadata=0.90 \
    --raid-devices=4 --homehost=jeffadm --name=jeffadm0 --level=raid1 \
    /dev/sd[abcd]1

mdadm --create /dev/md1 --verbose --bitmap=internal --metadata=1.2 \
    --raid-devices=4 --homehost=jeffadm --name=jeffadm1 --level=raid10 \
    --layout=f2 --chunk=512 /dev/sd[abcd]2

After letting the arrays build and so on, I checked:

fdisk -l | grep -i /dev/md | grep bytes
	Disk /dev/md127 doesn't contain a valid partition table
	Disk /dev/md126 doesn't contain a valid partition table
	Disk /dev/md127: 1998.2 GB, 1998231437312 bytes
	Disk /dev/md126: 1085 MB, 1085603840 bytes

cat /proc/mdstat
-----------------------------------
Personalities : [raid10] [raid0] [raid1] [raid6] [raid5] [raid4] [linear]
md126 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      1060160 blocks [4/4] [UUUU]
      bitmap: 0/130 pages [0KB], 4KB chunk

md127 : active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1]
      1951397888 blocks super 1.2 512K chunks 2 far-copies [4/4] [UUUU]
      bitmap: 11/466 pages [44KB], 2048KB chunk

unused devices: <none>
-----------------------------------

For assembly at boot-up, I created this file by hand:

cat /etc/mdadm.conf
-----------------------------------
DEVICE /dev/disk/by-id/ata-ST31000528AS_9VP37KJF-part1
/dev/disk/by-id/ata-ST31000528AS_9VP18C2L-part1
/dev/disk/by-id/ata-ST31000528AS_9VP18JXF-part1
/dev/disk/by-id/ata-ST31000528AS_6FD23G3U-part1

DEVICE /dev/disk/by-id/ata-ST31000528AS_9VP37KJF-part2
/dev/disk/by-id/ata-ST31000528AS_9VP18C2L-part2
/dev/disk/by-id/ata-ST31000528AS_9VP18JXF-part2
/dev/disk/by-id/ata-ST31000528AS_6FD23G3U-part2

ARRAY /dev/md/0_0 level=raid1  num-devices=4 metadata=0.90
UUID=19f2b21c:e54f9e1a:be5ad16e:9754ab5e

ARRAY /dev/md/jeffadm:jeffadm1 level=raid10 num-devices=4 metadata=1.02
UUID=d84afb64:e6fa2b64:ff21c975:f9765431 name=name=jeffadm:jeffadm1
-----------------------------------
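A minimal Python sketch for sanity-checking an mdadm.conf like the one above. It assumes the mdadm.conf(5) rule that a line beginning with whitespace continues the previous line (in the listing above the continuation lines appear flush-left, which may just be mail wrapping), and it flags a key whose value repeats the key, as in 'name=name=...'. The function names are hypothetical; this is not part of mdadm.

```python
def logical_lines(text):
    """Join continuation lines: a line starting with a space or tab is
    appended to the previous line. Blank and '#' comment lines are skipped."""
    lines = []
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if raw[0] in " \t" and lines:
            lines[-1] += " " + stripped
        else:
            lines.append(stripped)
    return lines


def parse_arrays(text):
    """Return one dict per ARRAY line: the device name, its key=value
    settings (keys lower-cased) and a list of suspicious words."""
    arrays = []
    for line in logical_lines(text):
        words = line.split()
        if words[0] != "ARRAY" or len(words) < 2:
            continue
        entry = {"device": words[1], "problems": []}
        for word in words[2:]:
            key, sep, value = word.partition("=")
            if not sep:
                entry["problems"].append("not key=value: " + word)
                continue
            if value.startswith(key + "="):
                entry["problems"].append("repeated key: " + word)
            entry[key.lower()] = value
        arrays.append(entry)
    return arrays
```

Fed the two ARRAY entries above (with indented continuation lines), the sketch reports nothing for /dev/md/0_0 and flags 'name=name=jeffadm:jeffadm1' on the second entry.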

I installed a Linux system, across the RAID arrays.  It boots up like I
expect.  As far as I can tell, everything seems to work ok.



In case it's helpful,

dmesg | grep md
[    5.312800] md: raid10 personality registered for level 10
[    5.364552] md: raid0 personality registered for level 0
[    5.379499] md: raid1 personality registered for level 1
[    5.649211] md: raid6 personality registered for level 6
[    5.649213] md: raid5 personality registered for level 5
[    5.649214] md: raid4 personality registered for level 4
[    8.450420] md: md127 stopped.
[    8.461620] md: bind<sdb2>
[    8.470479] md: bind<sdc2>
[    8.479231] md: bind<sdd2>
[    8.487931] md: bind<sda2>
[    8.512436] md/raid10:md127: active with 4 out of 4 devices
[    8.529191] created bitmap (466 pages) for device md127
[    8.554742] md127: bitmap initialized from disk: read 30/30 pages, set 3952 bits
[    8.599314] md127: detected capacity change from 0 to 1998231437312
[    8.621493]  md127: unknown partition table
[    9.645318] md: linear personality registered for level -1
[   12.397689] ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xff00 irq 14
[   12.397690] ata6: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xff08 irq 15
[   18.127526] md: md126 stopped.
[   18.138594] md: bind<sdb1>
[   18.147113] md: bind<sdc1>
[   18.155444] md: bind<sdd1>
[   18.163804] md: bind<sda1>
[   18.172913] md/raid1:md126: active with 4 out of 4 mirrors
[   18.186626] created bitmap (130 pages) for device md126
[   18.201996] md126: bitmap initialized from disk: read 9/9 pages, set 0 bits
[   18.225079] md126: detected capacity change from 0 to 1085603840
[   18.240775]  md126: unknown partition table
[   20.429304] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: acl,user_xattr,barrier=1
[   84.109701] EXT4-fs (md126): re-mounted. Opts: acl,user_xattr,barrier=1,commit=0


Now, I'm going about characterizing the arrays, and the Volumes on them,
so I can deal with recovery if & when it's necessary.

When I "look" at the array with these two commands,

mdadm --examine --scan
	ARRAY /dev/md/jeffadm1 metadata=1.2
	UUID=d84afb64:e6fa2b64:ff21c975:f9765431 name=jeffadm:jeffadm1
	ARRAY /dev/md126 UUID=19f2b21c:e54f9e1a:be5ad16e:9754ab5e

mdadm --detail --scan
	ARRAY /dev/md127 metadata=1.2 name=jeffadm:jeffadm1
	UUID=d84afb64:e6fa2b64:ff21c975:f9765431
	ARRAY /dev/md/0_0 metadata=0.90
	UUID=19f2b21c:e54f9e1a:be5ad16e:9754ab5e


I get different results for each one.
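'--examine' reads the superblocks on the member devices, while '--detail' reports on arrays that are currently assembled, so the suggested device names come from different places; the UUID is the field both listings share. A small Python sketch (function names hypothetical) that pairs the two outputs quoted above by UUID:

```python
def arrays_by_uuid(scan_output):
    """Map UUID -> device name for each ARRAY entry in 'mdadm --examine --scan'
    or 'mdadm --detail --scan' output. Splitting on whitespace means
    entries wrapped onto several lines are handled too."""
    result = {}
    words = scan_output.split()
    device = None
    for i, word in enumerate(words):
        if word == "ARRAY" and i + 1 < len(words):
            device = words[i + 1]
        elif word.startswith("UUID=") and device is not None:
            result[word[len("UUID="):].lower()] = device
    return result


def compare_scans(examine_output, detail_output):
    """Return (uuid, name from --examine, name from --detail) per array."""
    examine = arrays_by_uuid(examine_output)
    detail = arrays_by_uuid(detail_output)
    return [(uuid, examine.get(uuid), detail.get(uuid))
            for uuid in sorted(set(examine) | set(detail))]


examine_out = """ARRAY /dev/md/jeffadm1 metadata=1.2
    UUID=d84afb64:e6fa2b64:ff21c975:f9765431 name=jeffadm:jeffadm1
ARRAY /dev/md126 UUID=19f2b21c:e54f9e1a:be5ad16e:9754ab5e"""

detail_out = """ARRAY /dev/md127 metadata=1.2 name=jeffadm:jeffadm1
    UUID=d84afb64:e6fa2b64:ff21c975:f9765431
ARRAY /dev/md/0_0 metadata=0.90
    UUID=19f2b21c:e54f9e1a:be5ad16e:9754ab5e"""

for uuid, by_examine, by_detail in compare_scans(examine_out, detail_out):
    print(f"{uuid}  examine={by_examine}  detail={by_detail}")
```

Both UUIDs appear in both listings; only the names attached to them differ.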

From my reading about naming in mdadm.conf, I was expecting to see:

  /dev/md/0_0
  /dev/md/jeffadm:jeffadm1


Why do I get this mix of different results,

	/dev/md/jeffadm1
	/dev/md126

from the "--examine" output, and

	/dev/md127 metadata=1.2 name=jeffadm:jeffadm1
	/dev/md/0_0

according to the "--detail" output?

Is my mdadm.conf OK?  What really should I expect to see for the names
of my arrays?

Jeff

* Re: Maximizing failed disk replacement on a RAID5 array
From: Durval Menezes @ 2011-06-11 22:35 UTC
  To: John Robinson; +Cc: Linux RAID
In-Reply-To: <4DF1F117.5010604@anonymous.org.uk>

Hello John,

On Fri, Jun 10, 2011 at 7:25 AM, John Robinson
<john.robinson@anonymous.org.uk> wrote:
> On 07/06/2011 09:52, John Robinson wrote:
>>
>> On 06/06/2011 19:06, Durval Menezes wrote:
>> [...]
>>>
>>> It would be great to have a
>>> "duplicate-this-bad-old-disk-into-this-shiny-new-disk" functionality,
>>> as it would enable an almost-no-downtime disk replacement with
>>> minimum risk, but it seems we can't have everything... :-0 Maybe it's
>>> something for the wishlist?
>>
>> It's already on the wishlist, described as a hot replace.
>
> Actually I've been thinking about this. I think I'd rather the hot replace
> functionality did a normal rebuild from the still-good drives, and only if
> it came across a read error from those would it attempt to refer to the
> contents of the known-to-be-failing drive (and then also attempt to repair
> the read error on the supposedly-still-good drive that gave a read error, as
> already happens).

This looks like a very good idea. The old (failing) drive would be
kept "on reserve", ready to be accessed for eventual failed sectors on
the other old (good) drives...

> My rationale for this is as follows: if we want to hot-replace a drive
> that's known to be failing, we should trust it less than the remaining
> still-good drives, and treat it with kid gloves. It may be suffering from
> bit-rot. We'd rather not hit all the bad sectors on the failing drive,
> because each time we do that we send the drive into 7 seconds (or more, for
> cheap drives without TLER) of re-reading, plus any Linux-level re-reading
> there might be. Further, making the known-to-be-failing drive work extra
> hard (doing the equivalent of dd'ing from it while also still using it to
> serve its contents as an array member) might make it die completely before
> we've finished.

I agree completely.

> What will this do for rebuild time? Well, I don't think it'll be any slower.

I think it will actually be faster.

> On the one hand, you'd think that copying from one drive to another would be
> faster than a rebuild, because you're only reading 1 drive instead of N-1,
> but on the other, your array is going to run slowly (pretty much degraded
> speed) anyway because you're keeping one drive in constant use reading from
> it, and you risk it becoming much, much slower if you do run in to hundreds
> or thousands of read errors on the failing drive.
>
> So overall I think hot-replace should be a normal replace with a possible
> second source of data/parity.

Your reasoning sounds good to me.
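A toy Python model of John's proposal, assuming a plain RAID5 set where the blocks of each stripe XOR to zero. 'hot_replace' and the list-of-blocks representation are hypothetical and not how md structures its code; the point is that the failing drive is read only for stripes where a supposedly good drive returns a read error.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (RAID5 parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def hot_replace(drives, failing, n_stripes):
    """drives[d][s] is drive d's block in stripe s, or None for an
    unreadable sector. Returns the blocks for the replacement drive and
    the (drive, stripe) sectors rewritten on the "good" drives."""
    new_drive, repaired = [], []
    others = [d for d in range(len(drives)) if d != failing]
    for s in range(n_stripes):
        bad = [d for d in others if drives[d][s] is None]
        if not bad:
            # Normal rebuild: the failing drive is not touched at all.
            new_drive.append(xor_blocks([drives[d][s] for d in others]))
        elif len(bad) == 1 and drives[failing][s] is not None:
            # Fallback: read the failing drive for this stripe only, then
            # recompute the unreadable block and rewrite it in place.
            new_drive.append(drives[failing][s])
            survivors = [drives[d][s] for d in others if d != bad[0]]
            drives[bad[0]][s] = xor_blocks(survivors + [drives[failing][s]])
            repaired.append((bad[0], s))
        else:
            raise OSError(f"stripe {s}: not enough readable blocks")
    return new_drive, repaired
```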

> Thoughts?

Only sadness that it's not implemented yet... :-)

> Yes, I know, -ENOPATCH

Exactly :-)

Cheers,
-- 
   Durval Menezes.


>
> Cheers,
>
> John.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

* Re: Triple-parity raid6
From: David Brown @ 2011-06-11 18:05 UTC
  To: linux-raid
In-Reply-To: <4DF3A27E.8080806@gmail.com>

On 11/06/11 19:14, Joe Landman wrote:
> A quick note of caution (and someone from Netapp, feel free to speak up).
>
> Netapp has a patent on triple parity raid (cf.
> http://www.freepatentsonline.com/7640484.html). A quick look over this
> suggests that the major innovation is the layout and computation which
> they simplified in a particular manner. That is, I don't think their
> patent covers triple parity RAID in general, but does cover their
> implementation, and the diagonal parity with anti-diagonal parity
> (effectively counter propagating or orthogonalized parity).
>
> I am not sure what this means from a coding sense, other than not to use
> their techniques without a license to do so. If Netapp wants to grant
> such a license, this would be good, but I suspect that it wouldn't be
> quite as simple as this.
>
> Just a note so that we don't encounter problems. I think it's very
> possible to avoid their IP, as it would be somewhat hard to claim ownership
> of the Galois Field math behind RAID calculations. They can (and do)
> claim a particular implementation and algorithm.
>
> [also not trying to open the patents-on-code wars here, just pointing out
> the current situation]
>
>

I've read a little about diagonal parities - I can see some advantage in 
their simplicity, but I think that they are a poor choice for raid. 
Raid5+ already suffers from performance issues because you often have to 
read a whole stripe at a time just to change a few blocks - with 
diagonal parity, you'd have to read a whole 2-D set of stripes.
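For comparison, with row-based P/Q syndromes a change to one data block can in principle be folded into the existing parity without reading the rest of the stripe, because both syndromes are linear. A byte-at-a-time Python sketch, assuming the GF(2^8) field with polynomial 0x11d and generator g = 2 that Linux raid6 uses; the function names are hypothetical.

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, the raid6 field polynomial


def gf_mul(a, b):
    """Shift-and-XOR multiply in GF(2^8) modulo POLY."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return p


def gf_pow2(i):
    """g^i for the generator g = 2."""
    r = 1
    for _ in range(i):
        r = gf_mul(r, 2)
    return r


def syndromes(data):
    """Full recomputation: P = XOR of D_i, Q = XOR of g^i * D_i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow2(i), d)
    return p, q


def rmw_update(p, q, i, old, new):
    """Small write: needs only old D_i, P and Q, not the other data blocks."""
    delta = old ^ new
    return p ^ delta, q ^ gf_mul(gf_pow2(i), delta)
```

The cost is reading the old data block and both parities, then writing all three; whether that beats reading the whole stripe depends on how wide the array is.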



* Re: Triple-parity raid6
From: Joe Landman @ 2011-06-11 17:14 UTC
  To: David Brown; +Cc: linux-raid
In-Reply-To: <it058n$1ju$1@dough.gmane.org>

A quick note of caution (and someone from Netapp, feel free to speak up).

Netapp has a patent on triple parity raid (cf.
http://www.freepatentsonline.com/7640484.html).  A quick look over this
suggests that the major innovation is the layout and computation which
they simplified in a particular manner.  That is, I don't think their 
patent covers triple parity RAID in general, but does cover their 
implementation, and the diagonal parity with anti-diagonal parity 
(effectively counter propagating or orthogonalized parity).

I am not sure what this means from a coding sense, other than not to use 
their techniques without a license to do so.  If Netapp wants to grant 
such a license, this would be good, but I suspect that it wouldn't be 
quite as simple as this.

Just a note so that we don't encounter problems.  I think it's very
possible to avoid their IP, as it would be somewhat hard to claim ownership
of the Galois Field math behind RAID calculations.  They can (and do) 
claim a particular implementation and algorithm.

[also not trying to open the patents-on-code wars here, just pointing out
the current situation]

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

* Re: Triple-parity raid6
From: Joe Landman @ 2011-06-11 16:57 UTC
  To: David Brown; +Cc: linux-raid
In-Reply-To: <it058n$1ju$1@dough.gmane.org>

On 06/11/2011 12:31 PM, David Brown wrote:

> What has changed over the years is that there is no longer such a need
> for manual assembly code to get optimal speed out of the cpu. While

Hmmm ... I've done studies on this using an incredibly simple function
(Riemann Zeta Function, cf. http://scalability.org/?p=470 ).  The short
version is that hand-optimized SSE2 is ~4x faster (for this case) than
the best optimization of high-level code.  Hand-optimized assembler is
even better.

> writing such assembly is fun, it is time-consuming to write and hard to
> maintain, especially for code that must run on so many different platforms.

Yes, it is generally hard to write and maintain.  But it lets you get the
rest of the language semantics out of the way.  If you look at the tests
that Linux does when it starts up, you can see a fairly wide
distribution in the performance.

raid5: using function: generic_sse (13356.000 MB/sec)
raid6: int64x1   3507 MB/s
raid6: int64x2   3886 MB/s
raid6: int64x4   3257 MB/s
raid6: int64x8   3054 MB/s
raid6: sse2x1    8347 MB/s
raid6: sse2x2    9695 MB/s
raid6: sse2x4   10972 MB/s

Some of these are hand coded assembly. See 
${KERNEL_SOURCE}/drivers/md/raid6sse2.c and look at the 
raid6_sse24_gen_syndrome code.
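For reference, the computation those routines vectorise fits in a few lines. This Python sketch assumes the raid6 field (polynomial 0x11d, generator 2) and evaluates Q by Horner's rule from the highest data disk down, which appears to match the structure of the kernel's generic integer routines; it takes data disks only (the kernel passes the P and Q buffers in the same pointer array), and the names are hypothetical.

```python
def gf_mul2(x):
    """Multiply a byte by g = 2 in GF(2^8), reducing by 0x11d."""
    x <<= 1
    return x ^ 0x11D if x & 0x100 else x


def gen_syndrome(disks):
    """Compute P and Q over equal-length data blocks D_0 .. D_{n-1}.

    P = D_0 ^ D_1 ^ ... ^ D_{n-1}
    Q = D_0 ^ g*D_1 ^ g^2*D_2 ^ ...  evaluated with Horner's rule,
        Q = D_0 ^ g*(D_1 ^ g*(D_2 ^ ...)), so only multiply-by-2 is needed.
    """
    length = len(disks[0])
    p = bytearray(length)
    q = bytearray(length)
    for off in range(length):
        wp = wq = disks[-1][off]          # start at the highest data disk
        for d in reversed(disks[:-1]):
            wp ^= d[off]
            wq = gf_mul2(wq) ^ d[off]
        p[off], q[off] = wp, wq
    return bytes(p), bytes(q)
```

The inner loop is the part the SSE2 variants unroll across several registers; the dependency chain through 'wq' is one reason interleaving independent columns helps.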

Really, getting the best performance out of the system requires a fairly
deep understanding of how the processor/memory system operates.  These
functions do use the SSE registers, but we can have only so many SSE
operations in flight at once.  These processors can generally have quite
a few simultaneous operations in flight at once, so knowledge of that,
of the mix of operations, and of how they interact with the instruction
scheduler in the hardware, is fairly essential to getting good
performance.

>
>> We are interested in working on this capability (and more generic
>> capability) as well.
>>
>> Is anyone in particular starting to design/code this? Please let me know.
>>
>
> Well, I am currently trying to write up some of the maths - I started
> the thread because I had been playing around with the maths, and thought
> it should work. I made a brief stab at writing a
> "raid7_int$#_gen_syndrome()" function, but I haven't done any testing
> with it (or even tried to compile it) - first I want to be sure of the
> algorithms.

I've been coding various bits as "pseudocode" using Octave.  That makes
checking against the built-in Galois field functions pretty easy.

I haven't looked at the math behind the triple parity syndrome calc yet, 
though I'd imagine someone has, and can write it down.  If someone 
hasn't done that yet, it's a good first step.  Then we can code the
simple version from there with test drivers/cases, and then start 
optimizing the implementation.
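Along those lines, a Python sketch of the "simple version with test drivers" stage for the field itself: log/antilog tables for GF(2^8) built from the polynomial 0x11d (the one Linux raid6 uses), checked exhaustively against a shift-and-XOR multiply. The table layout and names are illustrative assumptions, not kernel code.

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1


def build_tables():
    """exp[i] = 2^i and log[2^i] = i; exp is doubled so log sums need no mod."""
    exp, log = [0] * 510, [0] * 256
    x = 1
    for i in range(255):
        exp[i] = exp[i + 255] = x
        log[x] = i
        x <<= 1
        if x & 0x100:
            x ^= POLY
    return exp, log


EXP, LOG = build_tables()


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def inv(a):
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return EXP[255 - LOG[a]]


def slow_mul(a, b):
    """Reference shift-and-XOR multiply, used only to test the tables."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return p


def self_test():
    assert len(set(EXP[:255])) == 255, "2 must generate all 255 nonzero bytes"
    for a in range(256):
        for b in range(256):
            assert mul(a, b) == slow_mul(a, b)
        if a:
            assert mul(a, inv(a)) == 1
    return True
```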

* Re: Triple-parity raid6
From: David Brown @ 2011-06-11 16:31 UTC
  To: linux-raid
In-Reply-To: <4DF38424.1010500@gmail.com>

On 11/06/11 17:05, Joe Landman wrote:
> On 06/11/2011 10:53 AM, David Brown wrote:
>
>> Yes - we've already established that the implementation will be
>> possible, and that there are people willing and able to help with it.
>> And I believe that much of the optimisation can be handled by the
>> compiler - gcc has come a long way since raid6 was first implemented in
>> mdraid.
>
> Hmmm ... don't be too overly reliant upon optimization from a compiler
> for your performance. It makes sense to have a well designed (and proven
> correct) algorithm first that is hand optimizable. There are many
> designs which are anathema to performance, and you have to be careful to
> avoid using them in your coding.
>

Absolutely - it's the designer's job to pick a good algorithm, and the 
programmer's job to make a good implementation of it.  But it's the 
compiler's job to turn that into tight object code.  And for code like 
this, the algorithm designer has to work closely with the programmer (or 
be the same person, of course) to pick algorithms that can be 
implemented well.  Similarly, the programmer has to have a good 
understanding of the compiler and be able to understand the generated 
assembly, in order to get the best from the tools.

What has changed over the years is that there is no longer such a need 
for manual assembly code to get optimal speed out of the cpu.  While 
writing such assembly is fun, it is time-consuming to write and hard to 
maintain, especially for code that must run on so many different platforms.

> We are interested in working on this capability (and more generic
> capability) as well.
>
> Is anyone in particular starting to design/code this? Please let me know.
>

Well, I am currently trying to write up some of the maths - I started 
the thread because I had been playing around with the maths, and thought 
it should work.  I made a brief stab at writing a 
"raid7_int$#_gen_syndrome()" function, but I haven't done any testing 
with it (or even tried to compile it) - first I want to be sure of the 
algorithms.

* Re: Triple-parity raid6
From: Joe Landman @ 2011-06-11 15:05 UTC
  To: David Brown; +Cc: linux-raid
In-Reply-To: <isvvhi$2vf$1@dough.gmane.org>

On 06/11/2011 10:53 AM, David Brown wrote:

> Yes - we've already established that the implementation will be
> possible, and that there are people willing and able to help with it.
> And I believe that much of the optimisation can be handled by the
> compiler - gcc has come a long way since raid6 was first implemented in
> mdraid.

Hmmm ... don't be too overly reliant upon optimization from a compiler 
for your performance.  It makes sense to have a well designed (and 
proven correct) algorithm first that is hand optimizable.  There are 
many designs which are anathema to performance, and you have to be 
careful to avoid using them in your coding.

We are interested in working on this capability (and more generic 
capability) as well.

Is anyone in particular starting to design/code this?  Please let me know.

* Re: Triple-parity raid6
From: David Brown @ 2011-06-11 14:53 UTC
  To: linux-raid
In-Reply-To: <20110611131801.GA2764@lazy.lzy>

On 11/06/11 15:18, Piergiorgio Sartor wrote:
> On Sat, Jun 11, 2011 at 01:51:12PM +0200, David Brown wrote:
>> On 11/06/11 12:13, Piergiorgio Sartor wrote:
>>> [snip]
>>>> Of course, all this assumes that my maths is correct!
>>>
>>> I would suggest to check out the Reed-Solomon thing
>>> in the more friendly form of Vandermonde matrix.
>>>
>>> It will be completely clear how to generate k parity
>>> sets from n data sets (disks), so that n+k < 258 for the
>>> GF(256) space.
>>>
>>> It will also be much more clear how to re-construct
>>> the data set in case of erasure (known data lost).
>>>
>>> You can have a look, for reference, at:
>>>
>>> http://lsirwww.epfl.ch/wdas2004/Presentations/manasse.ppt
>>>
>>> If you search for something like "Reed Solomon Vandermonde"
>>> you'll find even more information.
>>>
>>> Hope this helps.
>>>
>>> bye,
>>>
>>
>> That presentation is using Vandermonde matrices, which are the same
>> as the ones used in James Plank's papers.  As far as I can see,
>> these are limited in how well you can recover from missing disks
>> (the presentation here says it only works for up to triple parity).
>
> As far as I understood, 3 is only an example, it works
> up to k lost disks, with n+k < 258 (or 259).
> I mean, I do not see why it should not.
>

I also don't see why 3 parity should be a limitation - I think it must 
be because of the choice of syndrome calculations.  But the presentation 
you linked to specifically says on page 3 that it will say "Why it stops 
at three erasures, and works only for GF(2^k)".  I haven't investigated 
anything other than GF(2^8), since that is optimal for implementing raid 
(well, 2^1 is easier - but that only gives you raid5).  Unfortunately, 
the paper doesn't give details there.  Adam Leventhal's blog (mentioned 
earlier in this thread) also says that implementation of triple-parity 
for ZFS was relatively easy, but not for more than three parity bits.
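The brute-force determinant check David describes can be sketched in Python. The sketch assumes syndromes of the form S_j = sum over i of g_j^i * D_i in GF(2^8) with polynomial 0x11d, with generators (1, 2, 4) for P, Q and R, a choice commonly described for ZFS's triple parity; David's own syndromes may differ. For every combination of m lost data disks and m surviving parities, it checks that the m x m system is solvable. All names are hypothetical.

```python
from itertools import combinations

POLY = 0x11D


def gf_mul(a, b):
    """Shift-and-XOR multiply in GF(2^8) modulo POLY."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return p


def gf_pow(a, n):
    r = 1
    while n:
        if n & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        n >>= 1
    return r


def is_invertible(matrix):
    """Gauss-Jordan elimination over GF(2^8); True if the square matrix
    is nonsingular (addition is XOR, a^-1 = a^254)."""
    m = [row[:] for row in matrix]
    n = len(m)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            return False
        m[col], m[pivot] = m[pivot], m[col]
        scale = gf_pow(m[col][col], 254)
        m[col] = [gf_mul(scale, x) for x in m[col]]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [x ^ gf_mul(f, y) for x, y in zip(m[r], m[col])]
    return True


def check_syndromes(generators, n_data):
    """Return every (lost data disks, surviving parities) case that cannot
    be solved. Losing m data disks plus k - m parities leaves exactly m
    parities, so every m-subset of the parities must work."""
    k = len(generators)
    failures = []
    for m in range(1, k + 1):
        for lost in combinations(range(n_data), m):
            for used in combinations(range(k), m):
                matrix = [[gf_pow(generators[j], i) for i in lost] for j in used]
                if not is_invertible(matrix):
                    failures.append((lost, used))
    return failures
```

With generators (1, 2, 4) and 8 data disks the list comes back empty. Adding a fourth generator, 8, breaks the pattern: the rows become 1, x, x^3 with x = 2^i, whose determinant vanishes when three of the x values XOR to zero, and with this polynomial 2^25 = 3 = 1 ^ 2. Losing data disks 0, 1 and 25 while relying on P, Q and that fourth syndrome is therefore unsolvable. That may be the kind of limit the presentation has in mind.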

> Of course, you have to know which disks have failed.

That's normally the case for disk applications.

> On the other hand, having k parities allows you to find
> up to k/2 error positions.
> This is a bit more complicated, I guess.
> You can search for Berlekamp-Massey Algorithm (and related)
> in order to see how to *find* the errors.
>

I've worked with ECC systems for data transmission and communications 
systems, when you don't know if there are any errors or where the errors 
might be.  But although there is a fair overlap of the theory here, 
there are big differences in the way you implement such checking and 
correction, and your priorities.  With raid, you know either that your 
block read is correct (because of the ECC handled at the drive firmware 
level), or incorrect.

To deal with unknown errors or error positions, you have to read in
everything in a stripe and run your error checking for every read - that
would be a significant run-time cost, which would normally be wasted (as
the raid set is normally consistent).

One situation where that might be useful, however, is for scrubbing or 
checking when the array is known to be inconsistent (such as after a
power failure).  Neil has already argued that the simple approach of 
re-creating the parity blocks (rather than identifying incorrect blocks) 
is better, or at least no worse, than being "smart".  But the balance of 
that argument might change with more parity blocks.

<http://neil.brown.name/blog/20100211050355>
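For the scrubbing case, two syndromes are already enough to locate, not just detect, a single silently corrupted data block. If block z is off by an error e, then with P and Q the stored syndromes and P' and Q' recomputed from the data as read, P ^ P' = e and Q ^ Q' = g^z * e, so z = log(Q ^ Q') - log(P ^ P'). A byte-at-a-time Python sketch, assuming the raid6 field (polynomial 0x11d, g = 2) and at most one bad block per stripe; the names are hypothetical.

```python
POLY = 0x11D


def gf_mul2(x):
    x <<= 1
    return x ^ POLY if x & 0x100 else x


def _build_log():
    """LOG[2^i] = i for the 255 nonzero bytes."""
    log, x = [None] * 256, 1
    for i in range(255):
        log[x] = i
        x = gf_mul2(x)
    return log


LOG = _build_log()


def syndromes(data):
    """P and Q for one byte column, Q by Horner's rule from the top disk."""
    p = q = 0
    for d in reversed(data):
        p ^= d
        q = gf_mul2(q) ^ d
    return p, q


def locate_single_error(data, p, q):
    """Compare stored P/Q with values recomputed from 'data'.

    Returns None if consistent, "P" or "Q" if only that parity disagrees,
    the index of the corrupt data byte if one block explains the mismatch,
    or "unknown" if no single block can (more than one error)."""
    p_calc, q_calc = syndromes(data)
    dp, dq = p ^ p_calc, q ^ q_calc
    if dp == 0 and dq == 0:
        return None
    if dq == 0:
        return "P"
    if dp == 0:
        return "Q"
    z = (LOG[dq] - LOG[dp]) % 255
    return z if z < len(data) else "unknown"
```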

>> The Vandermonde matrices have the advantage that the determinants
>> are easily calculated - I haven't yet figured out an analytical
>> method of calculating the determinants in my equations, and have
>> just used brute force checking. (My syndromes also have the
>> advantage of being easy to calculate quickly.)
>
> I think the point of Reed Solomon (with Vandermonde or Cauchy
> matrices) is also that it generalizes the parity concept.
>
> This means you do not have to care if it is 2 or 3 or 7 or ...
>
> In this way you can have as many parities as you like, up to
> the limitation of Reed Solomon in GF(256).
>

I agree.  However, I'm not sure there is much practical use of going 
beyond perhaps 4 parity blocks - at that stage you are probably better 
dividing up your array, or (if you need more protection) using n-parity 
raid6 over raid1 mirror sets.

>> Still, I think the next step for me should be to write up the maths
>> a bit more formally, rather than just hints in mailing list posts.
>> Then others can have a look, and have an opinion on whether I've got
>> it right or not.  It makes sense to be sure the algorithms will work
>> before spending much time implementing them!
>
> I tend to agree. At first you should set up the background
> theory, then the algorithm, later the implementation and
> eventually the optimization.
>

Yes - we've already established that the implementation will be 
possible, and that there are people willing and able to help with it. 
And I believe that much of the optimisation can be handled by the 
compiler - gcc has come a long way since raid6 was first implemented in 
mdraid.

>> I certainly /believe/ my maths is correct here - but it's nearly
>> twenty years since I did much formal algebra.  I studied maths at
>> university, but I don't use group theory often in my daily job as an
>> embedded programmer.
>
> Well, I, for sure, will stay tuned for your results!
>
> bye,
>

* Re: Triple-parity raid6
From: Piergiorgio Sartor @ 2011-06-11 13:18 UTC
  To: David Brown; +Cc: linux-raid
In-Reply-To: <isvkrg$c79$1@dough.gmane.org>

On Sat, Jun 11, 2011 at 01:51:12PM +0200, David Brown wrote:
> On 11/06/11 12:13, Piergiorgio Sartor wrote:
> >[snip]
> >>Of course, all this assumes that my maths is correct!
> >
> >I would suggest to check out the Reed-Solomon thing
> >in the more friendly form of Vandermonde matrix.
> >
> >It will be completely clear how to generate k parity
> >sets from n data sets (disks), so that n+k < 258 for the
> >GF(256) space.
> >
> >It will also be much more clear how to re-construct
> >the data set in case of erasure (known data lost).
> >
> >You can have a look, for reference, at:
> >
> >http://lsirwww.epfl.ch/wdas2004/Presentations/manasse.ppt
> >
> >If you search for something like "Reed Solomon Vandermonde"
> >you'll find even more information.
> >
> >Hope this helps.
> >
> >bye,
> >
> 
> That presentation is using Vandermonde matrices, which are the same
> as the ones used in James Plank's papers.  As far as I can see,
> these are limited in how well you can recover from missing disks
> (the presentation here says it only works for up to triple parity).

As far as I understood, 3 is only an example, it works
up to k lost disks, with n+k < 258 (or 259).
I mean, I do not see why it should not.

Of course, you have to know which disks have failed.
On the other hand, having k parities allows you to find
up to k/2 error positions.
This is a bit more complicated, I guess.
You can search for Berlekamp-Massey Algorithm (and related)
in order to see how to *find* the errors.

> The Vandermonde matrices have the advantage that the determinants
> are easily calculated - I haven't yet figured out an analytical
> method of calculating the determinants in my equations, and have
> just used brute force checking. (My syndromes also have the
> advantage of being easy to calculate quickly.)

I think the point of Reed Solomon (with Vandermonde or Cauchy
matrices) is also that it generalizes the parity concept.

This means you do not have to care if it is 2 or 3 or 7 or ...

In this way you can have as many parities as you like, up to
the limitation of Reed Solomon in GF(256).
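On the Cauchy option: a Cauchy matrix C[j][i] = 1 / (x_j + y_i), with all x_j and y_i distinct, has the property that every square submatrix is nonsingular, so using it directly as the parity coefficients lets any k erased data blocks be solved for from any k surviving parities. A byte-at-a-time Python sketch of k-parity encoding and erasure recovery, assuming GF(2^8) with polynomial 0x11d and the arbitrary choice x_j = n + j, y_i = i (so n + k <= 256 here); this is an illustration, not a proposed md layout, and the names are hypothetical.

```python
POLY = 0x11D


def gf_mul(a, b):
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return p


def gf_inv(a):
    """a^254 == a^-1 for nonzero a, since a^255 == 1."""
    r, n = 1, 254
    while n:
        if n & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        n >>= 1
    return r


def cauchy_matrix(k, n):
    """k x n coefficients 1 / (x_j ^ y_i) with x_j = n + j, y_i = i."""
    if n + k > 256:
        raise ValueError("need n + k <= 256 distinct field elements")
    return [[gf_inv((n + j) ^ i) for i in range(n)] for j in range(k)]


def encode(data, coeffs):
    """One parity byte per coefficient row: XOR of coeff * data byte."""
    parities = []
    for row in coeffs:
        acc = 0
        for c, d in zip(row, data):
            acc ^= gf_mul(c, d)
        parities.append(acc)
    return parities


def recover(data, parities, coeffs):
    """Fill in data bytes given as None, using any surviving parities."""
    lost = [i for i, d in enumerate(data) if d is None]
    use = [j for j, p in enumerate(parities) if p is not None][:len(lost)]
    if len(use) < len(lost):
        raise ValueError("more erasures than surviving parities")
    # Augmented rows: [coefficients of the lost disks | parity minus known terms].
    rows = []
    for j in use:
        rhs = parities[j]
        for i, d in enumerate(data):
            if d is not None:
                rhs ^= gf_mul(coeffs[j][i], d)
        rows.append([coeffs[j][i] for i in lost] + [rhs])
    m = len(lost)
    for col in range(m):  # Gauss-Jordan; a nonzero pivot always exists here
        piv = next(r for r in range(col, m) if rows[r][col])
        rows[col], rows[piv] = rows[piv], rows[col]
        s = gf_inv(rows[col][col])
        rows[col] = [gf_mul(s, v) for v in rows[col]]
        for r in range(m):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [v ^ gf_mul(f, w) for v, w in zip(rows[r], rows[col])]
    out = list(data)
    for row, i in zip(rows, lost):
        out[i] = row[m]
    return out
```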

> Still, I think the next step for me should be to write up the maths
> a bit more formally, rather than just hints in mailing list posts.
> Then others can have a look, and have an opinion on whether I've got
> it right or not.  It makes sense to be sure the algorithms will work
> before spending much time implementing them!

I tend to agree. At first you should set up the background
theory, then the algorithm, later the implementation and
eventually the optimization.
 
> I certainly /believe/ my maths is correct here - but it's nearly
> twenty years since I did much formal algebra.  I studied maths at
> university, but I don't use group theory often in my daily job as an
> embedded programmer.

Well, I, for sure, will stay tuned for your results!

bye,

-- 

piergiorgio

* mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
From: Pol Hallen @ 2011-06-11 12:36 UTC
  To: linux-raid

Hi all :-)

After disk problems, I now can't assemble my array:

mdadm -A --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sda
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error

Stopping it with 'mdadm -S /dev/md0' and trying to assemble again gives the same error.

mdadm -E /dev/sda
 
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 9bd6372e:e2eab1d5:d2bdc3cb:ad12f41d
  Creation Time : Mon Sep 27 14:19:15 2010
     Raid Level : raid6
  Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
     Array Size : 5860543744 (5589.05 GiB 6001.20 GB)
   Raid Devices : 6
  Total Devices : 5
Preferred Minor : 0

    Update Time : Sat Jun 11 13:59:43 2011
          State : active
 Active Devices : 4
Working Devices : 5
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 99b0bd41 - correct
         Events : 701273

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       80        3      active sync   /dev/sdf

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       80        3      active sync   /dev/sdf
   4     4       8       17        4      active sync   /dev/sdb1
   5     5       0        0        5      faulty removed
   6     6       8       49        6      spare   /dev/sdd1

How can I start my array?

thanks!

Pol

^ permalink raw reply

* Re: Triple-parity raid6
From: David Brown @ 2011-06-11 11:51 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <20110611101312.GA3528@lazy.lzy>

On 11/06/11 12:13, Piergiorgio Sartor wrote:
> [snip]
>> Of course, all this assumes that my maths is correct!
>
> I would suggest checking out the Reed-Solomon thing
> in the friendlier form of the Vandermonde matrix.
>
> It will be completely clear how to generate k parity
> sets from n data sets (disks), so that n+k < 258 for the
> GF(256) space.
>
> It will also be much clearer how to reconstruct
> the data sets in case of erasure (known data lost).
>
> You can have a look, for reference, at:
>
> http://lsirwww.epfl.ch/wdas2004/Presentations/manasse.ppt
>
> If you search for something like "Reed Solomon Vandermonde"
> you'll find even more information.
>
> Hope this helps.
>
> bye,
>

That presentation is using Vandermonde matrices, which are the same as 
the ones used in James Plank's papers.  As far as I can see, these are 
limited in how well you can recover from missing disks (the presentation 
here says it only works for up to triple parity).  The Vandermonde 
matrices have the advantage that the determinants are easily calculated; 
I haven't yet figured out an analytical method of calculating the 
determinants in my equations, and have just used brute force checking. 
(My syndromes also have the advantage of being easy to calculate quickly.)
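The brute-force determinant checking mentioned above can be sketched for a plain Vandermonde layout. This Python is a minimal sketch under stated assumptions. It is not David's syndromes, and it is not necessarily the presentation's exact construction.

- It assumes the Linux RAID6 field (polynomial 0x11d, generator 2).
- It assumes parity row i holds (2^j)^i for data disk j, so row 0 is plain XOR (P) and row 1 is RAID6's Q.
- A systematic code can rebuild every combination of up to k lost disks only if every square submatrix of its parity rows is nonsingular, so the sketch simply tries them all.
- All function names are hypothetical.

```python
from itertools import combinations

POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, the field polynomial Linux RAID6 uses

# Antilog (EXP) and log (LOG) tables for GF(256) with generator 2.
# EXP is doubled in length so LOG[a] + LOG[b] never needs a modulo.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[255 - LOG[a]]


def invertible(matrix):
    """Gaussian elimination over GF(256); True if the square matrix is nonsingular."""
    m = [row[:] for row in matrix]
    size = len(m)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col]), None)
        if pivot is None:
            return False
        m[col], m[pivot] = m[pivot], m[col]
        inv = gf_inv(m[col][col])
        for r in range(col + 1, size):
            factor = gf_mul(m[r][col], inv)
            if factor:  # subtract (XOR) factor * pivot row to clear m[r][col]
                m[r] = [x ^ gf_mul(factor, y) for x, y in zip(m[r], m[col])]
    return True


def parity_rows(n, k):
    """k x n matrix with entry (2^j)^i = EXP[(i * j) % 255]; n must be <= 255."""
    return [[EXP[(i * j) % 255] for j in range(n)] for i in range(k)]


def all_erasures_recoverable(n, k):
    """True if every square submatrix of the parity rows is nonsingular."""
    rows = parity_rows(n, k)
    for size in range(1, k + 1):
        for lost in combinations(range(n), size):      # lost data disks
            for used in combinations(range(k), size):  # surviving parity rows
                sub = [[rows[i][j] for j in lost] for i in used]
                if not invertible(sub):
                    return False
    return True


if __name__ == "__main__":
    print(all_erasures_recoverable(8, 3))   # three parities over 8 data disks
    print(all_erasures_recoverable(26, 4))  # four parities over 26 data disks
```

For three rows the check should pass for any n up to 255. The possible determinants reduce to a+b, (a+b)^2, ab(a+b) and a Vandermonde determinant, none of which vanish for distinct nonzero a and b.

For four rows it fails by n = 26. In this field 2^25 = 3 and 1 XOR 2 XOR 3 = 0, which makes the submatrix of rows {0, 1, 3} over data disks {0, 1, 25} singular. That is in line with the presentation's "up to triple parity" limit.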


Still, I think the next step for me should be to write up the maths a 
bit more formally, rather than just hints in mailing list posts.  Then 
others can have a look, and have an opinion on whether I've got it right 
or not.  It makes sense to be sure the algorithms will work before 
spending much time implementing them!

I certainly /believe/ my maths is correct here - but it's nearly twenty 
years since I did much formal algebra.  I studied maths at university, 
but I don't use group theory often in my daily job as an embedded 
programmer.



^ permalink raw reply

* Re: Triple-parity raid6
From: Piergiorgio Sartor @ 2011-06-11 10:13 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid
In-Reply-To: <ist9n7$khq$1@dough.gmane.org>

[snip]
> Of course, all this assumes that my maths is correct!

I would suggest checking out the Reed-Solomon thing
in the friendlier form of the Vandermonde matrix.

It will be completely clear how to generate k parity
sets from n data sets (disks), so that n+k < 258 for the
GF(256) space.

It will also be much clearer how to reconstruct
the data sets in case of erasure (known data lost).
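Here is a minimal Python sketch that illustrates both points: generating k parities from n data disks, and rebuilding data whose position is known to be lost. The assumptions are:

- The same field as Linux RAID6 (polynomial 0x11d, generator 2).
- Vandermonde parity row i is (2^j)^i for data disk j.
- It works on a single byte per disk, where real code would loop over the whole stripe.
- The function names are hypothetical.

Keep in mind David's point in the reply above that this plain layout is only dependable up to three parities.

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, as used by Linux RAID6

# Antilog (EXP) and log (LOG) tables for GF(256) with generator 2.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[255 - LOG[a]]


def vandermonde_rows(n, k):
    """k x n parity matrix; row 0 is plain XOR (P), row 1 is RAID6's Q."""
    return [[EXP[(i * j) % 255] for j in range(n)] for i in range(k)]


def encode(data, rows):
    """One parity byte per row: the GF(256) dot product of the row and the data."""
    parities = []
    for row in rows:
        acc = 0
        for coef, d in zip(row, data):
            acc ^= gf_mul(coef, d)
        parities.append(acc)
    return parities


def recover(data, parities, rows):
    """Rebuild the None entries of `data`, assuming parities 0..m-1 survive."""
    lost = [j for j, d in enumerate(data) if d is None]
    m = len(lost)
    a = [[rows[i][j] for j in lost] for i in range(m)]
    b = []
    for i in range(m):
        acc = parities[i]  # move the known data terms to the right-hand side
        for j, d in enumerate(data):
            if d is not None:
                acc ^= gf_mul(rows[i][j], d)
        b.append(acc)
    for col in range(m):  # Gauss-Jordan elimination over GF(256)
        pivot = next((r for r in range(col, m) if a[r][col]), None)
        if pivot is None:
            raise ValueError("erasure pattern not solvable with these rows")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        inv = gf_inv(a[col][col])
        a[col] = [gf_mul(inv, x) for x in a[col]]
        b[col] = gf_mul(inv, b[col])
        for r in range(m):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [x ^ gf_mul(f, y) for x, y in zip(a[r], a[col])]
                b[r] ^= gf_mul(f, b[col])
    restored = list(data)
    for idx, j in enumerate(lost):
        restored[j] = b[idx]
    return restored


if __name__ == "__main__":
    data = [0x11, 0x22, 0x33, 0x44, 0x55]
    rows = vandermonde_rows(len(data), 3)
    parities = encode(data, rows)
    damaged = [0x11, None, 0x33, None, None]  # three data disks lost
    print(recover(damaged, parities, rows) == data)
```

With k = 2 the two rows are exactly RAID6's P and Q. Each further parity is one more row, which is the generalization of the parity concept described in the follow-up.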

You can have a look, for reference, at:

http://lsirwww.epfl.ch/wdas2004/Presentations/manasse.ppt

If you search for something like "Reed Solomon Vandermonde"
you'll find even more information.

Hope this helps.

bye,

-- 

piergiorgio

^ permalink raw reply

* Re: HDD reports errors while completing RAID6 array check
From: Mathias Burén @ 2011-06-11  9:49 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Linux-RAID
In-Reply-To: <BANLkTinPfZeSwJ8Tckc0usXFpLHHuPSz4Q@mail.gmail.com>

On 10 June 2011 19:23, Mathias Burén <mathias.buren@gmail.com> wrote:
> On 10 June 2011 19:00, Roman Mamedov <rm@romanrm.ru> wrote:
>> On Fri, 10 Jun 2011 18:37:06 +0100
>> Mathias Burén <mathias.buren@gmail.com> wrote:
>>
>>>   9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7781
>>
>>
>>> # 1  Extended offline    Completed without error       00%      6827         -
>>> # 2  Extended offline    Completed without error       00%      6550         -
>>> # 3  Extended offline    Completed without error       00%      6468         -
>>> # 4  Extended offline    Completed without error       00%      6329         -
>>> # 5  Extended offline    Completed without error       00%      6040         -
>>> # 6  Extended offline    Completed without error       00%      5584         -
>>> # 7  Extended offline    Completed without error       00%      5178         -
>>> # 8  Extended offline    Completed without error       00%      4761         -
>>> # 9  Short offline       Completed without error       00%      2285         -
>>> #10  Extended offline    Completed without error       00%      1514
>>
>> I suggest that you do another "smartctl -t long" on it; the latest one was
>> done almost 1000 hours ago which is also much longer than the period between
>> previous tests. Freezes on reads could be a symptom of a bad (unreadable, or
>> very slowly readable - which is worse) sector, perhaps it could be detected by
>> the SMART test. Or also do a full read of the drive directly (not through the
>> RAID) e.g. with "badblocks" and see if you get any I/O errors that way.
>>
>> --
>> With respect,
>> Roman
>>
>
> Thanks for the suggestions, I'll start the long selftest now.
>
> /M
>

Things look OK after the test:

 $ sudo smartctl -a /dev/sdd
Password:
smartctl 5.40 2010-10-16 r3189 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format) family
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WMAZ20188479
Firmware Version: 50.0AB50
User Capacity:    2,000,398,934,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Jun 11 10:48:05 2011 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (36000) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   176   162   021    Pre-fail  Always       -       6183
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       59
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7797
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       53
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       32
193 Load_Cycle_Count        0x0032   162   162   000    Old_age   Always       -       114863
194 Temperature_Celsius     0x0022   109   102   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7788         -
# 2  Extended offline    Completed without error       00%      6827         -
# 3  Extended offline    Completed without error       00%      6550         -
# 4  Extended offline    Completed without error       00%      6468         -
# 5  Extended offline    Completed without error       00%      6329         -
# 6  Extended offline    Completed without error       00%      6040         -
# 7  Extended offline    Completed without error       00%      5584         -
# 8  Extended offline    Completed without error       00%      5178         -
# 9  Extended offline    Completed without error       00%      4761         -
#10  Short offline       Completed without error       00%      2285         -
#11  Extended offline    Completed without error       00%      1514         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I initiated a self-test on each of the other HDDs as well. It's time
to run badblocks then!

/M
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Disk upgrade
From: Stan Hoeppner @ 2011-06-11  3:00 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Linux RAID
In-Reply-To: <4DF22587.60509@tmr.com>

On 6/10/2011 9:09 AM, Bill Davidsen wrote:
> I'm running out of room in a box to add drives, so I want to go to
> larger drives. Unfortunately I have but one bay left. 

Cases are cheap.  Buy a new case with plenty of bays and cooling:
http://www.newegg.com/Product/Product.aspx?Item=N82E16811147154
8 x 3.5" internal bays w/drive rails, 3 x 5.25" external bays, excellent
cooling, $80 USD, reuse your current PSU and internals, grab a new PSU
if needed to handle all the drives.

If you start running out of 3.5" internal bays, start adding these to
your 5.25" bays:
http://www.newegg.com/Product/Product.aspx?Item=N82E16817994095

With current 2.5" drives you can get up to 12TB raw in the 3 5.25" bays
on top of the 24TB raw available with the 8 3.5" bays, 36TB total.

-- 
Stan

^ permalink raw reply

* Re:
From: Phil Turmel @ 2011-06-11  2:06 UTC (permalink / raw)
  To: Dragon; +Cc: linux-raid
In-Reply-To: <20110610202638.220520@gmx.net>

On 06/10/2011 04:26 PM, Dragon wrote:
> "No, it must be "Used Device Size" * 11 = 16116523456.  Try it without the 'k'."
> -> That worked better:

[...]

> -> fsck -n /dev/md0 was OK
> -> now: mdadm /dev/md0 --grow -n 12 --backup-file=/reshape.bak
> -> and after that, how do I get the disk out of the RAID?

Monitor your background reshape with "cat /proc/mdstat".

When the reshape is complete, the extra disk will be marked "spare".

Then you can use "mdadm --remove".
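You may prefer to watch the reshape from a script rather than by eye. A minimal Python sketch can pull the progress line for md0 out of /proc/mdstat. The regular expression assumes the usual "reshape = 7.5% (done/total) finish=...min speed=...K/sec" layout of that file. The sample text in the demo is made up for illustration, not taken from Dragon's system, and the function name is hypothetical.

```python
import re

# Assumed layout of an md progress line in /proc/mdstat, e.g.
#   [=>....]  reshape =  7.5% (110000000/1465138496) finish=1000.0min speed=22000K/sec
PROGRESS = re.compile(
    r"(?P<action>resync|recovery|reshape|check)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*finish=(?P<finish>[\d.]+)min\s*"
    r"speed=(?P<speed>\d+)K/sec"
)


def md_progress(mdstat_text, device="md0"):
    """Return the progress fields for `device`, or None if it shows no progress line."""
    in_device = False
    for line in mdstat_text.splitlines():
        if re.match(r"^md\d+\s*:", line):  # start of an array's stanza
            in_device = line.split()[0] == device
            continue
        if in_device:
            m = PROGRESS.search(line)
            if m:
                return {
                    "action": m["action"],
                    "percent": float(m["pct"]),
                    "done": int(m["done"]),
                    "total": int(m["total"]),
                    "finish_min": float(m["finish"]),
                    "speed_k_per_sec": int(m["speed"]),
                }
    return None


if __name__ == "__main__":
    # Hypothetical sample; the numbers are illustrative only.
    sample = (
        "md0 : active raid5 sdj[12] sdi[11] sdh[10]\n"
        "      [=>...................]  reshape =  7.5% (110000000/1465138496)"
        " finish=1000.0min speed=22000K/sec\n"
    )
    print(md_progress(sample))
```

On the real machine you would pass it the contents of /proc/mdstat in a loop with a sleep. Parsing the text keeps the sketch independent of any particular mdadm version.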

> At this point I think I will take the disk out of the RAID, because I need that disk's space.

Understood, but you are living on the edge.  You have no backup, and only one drive of redundancy.  If one of your drives does fail, the odds of losing the whole array while replacing it are significant.  Your Samsung drives claim a non-recoverable read error rate of 1 per 1x10^15 bits.  Your eleven data disks contain 1.32x10^14 bits, all of which must be read during rebuild.  That means a _13%_ chance of total failure while replacing a failed drive.
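Phil's figure can be reproduced in a few lines of Python. The sketch makes two assumptions, as his estimate does:

- Unrecoverable read errors occur independently, at exactly the datasheet rate of 1 per 10^15 bits.
- Rebuilding the 12-device RAID5 reads all 11 surviving members in full, at 1465138496 KiB each (the "Used Dev Size" from Dragon's mdadm output).

Both are simplifications, since real drives do not fail that uniformly. The function name is hypothetical.

```python
import math


def rebuild_ure_risk(surviving_disks, dev_kib, bits_per_error=1e15):
    """Return (bits read, first-order estimate, independent-trials probability)."""
    bits = surviving_disks * dev_kib * 1024 * 8
    first_order = bits / bits_per_error  # expected number of read errors
    # P(at least one error) = 1 - (1 - p)^bits, computed without precision loss
    exact = -math.expm1(bits * math.log1p(-1.0 / bits_per_error))
    return bits, first_order, exact


if __name__ == "__main__":
    bits, approx, exact = rebuild_ure_risk(11, 1465138496)
    print(f"{bits:.3g} bits, ~{approx:.1%} first-order, {exact:.1%} independent trials")
```

The first-order estimate (bits read times error rate) comes out at about 13.2%, matching Phil's 13%. Treating every bit as an independent trial gives about 12.4%. Either way, roughly one rebuild in eight would hit an unreadable sector.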

I hope your 16T of data is not terribly important to you, or is otherwise replaceable.

> I need another piece of advice from you. The computer is currently built with 13 disks, I will get more data next month, and the limit of power supply connectors has been reached, so I am looking for another solution. One possibility is to build a better computer with more SATA and SAS connectors and add further RAID controller cards. Another idea is to build a kind of cluster or DFS with two and later 3, 4... computers. I read something about gluster.org. Do you have a tip for me or experience with this?

Unfortunately, no.  Although I skirt the edges in my engineering work, I'm primarily an end-user.  Both personal and work projects have relatively modest needs.  From the engineering side, I do recommend you spend extra on power supplies & UPS.

Phil

^ permalink raw reply

* (unknown)
From: Dragon @ 2011-06-10 20:26 UTC (permalink / raw)
  To: philip; +Cc: linux-raid

"No, it must be "Used Device Size" * 11 = 16116523456.  Try it without the 'k'."
-> That worked better:
mdadm /dev/md0 --grow --array-size=16116523456
mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Fri Jun 10 14:19:24 2011
     Raid Level : raid5
     Array Size : 16116523456 (15369.91 GiB 16503.32 GB)
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
   Raid Devices : 13
  Total Devices : 13
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jun 10 16:49:37 2011
          State : clean
 Active Devices : 13
Working Devices : 13
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 8c4d8438:42aa49f9:a6d866f6:b6ea6b93 (local to host nassrv01)
         Events : 0.2

    Number   Major   Minor   RaidDevice State
       0       8      160        0      active sync   /dev/sdk
       1       8      208        1      active sync   /dev/sdn
       2       8      176        2      active sync   /dev/sdl
       3       8      192        3      active sync   /dev/sdm
       4       8        0        4      active sync   /dev/sda
       5       8       16        5      active sync   /dev/sdb
       6       8       64        6      active sync   /dev/sde
       7       8       48        7      active sync   /dev/sdd
       8       8       80        8      active sync   /dev/sdf
       9       8       96        9      active sync   /dev/sdg
      10       8      112       10      active sync   /dev/sdh
      11       8      128       11      active sync   /dev/sdi
      12       8      144       12      active sync   /dev/sdj
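The --array-size value Phil gave follows from RAID5 keeping one device's worth of parity. A 12-device RAID5 holds 11 devices of data, so the target is 11 x "Used Dev Size". mdadm --detail reports these sizes in KiB, which is what the GiB and GB figures in the output above correspond to. Here is a small Python check with hypothetical helper names:

```python
def raid_array_kib(level, raid_devices, used_dev_kib):
    """Usable size of an md RAID5/RAID6 array, in the same units as used_dev_kib."""
    parity_devices = {5: 1, 6: 2}[level]
    return (raid_devices - parity_devices) * used_dev_kib


def as_gib_gb(kib):
    """Convert a KiB count to the (GiB, GB) pair mdadm --detail prints."""
    size_bytes = kib * 1024
    return round(size_bytes / 2**30, 2), round(size_bytes / 1e9, 2)


if __name__ == "__main__":
    target = raid_array_kib(5, 12, 1465138496)  # shrink target before 13 -> 12 devices
    print(target, as_gib_gb(target))
```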

-> fsck -n /dev/md0 was OK
-> now: mdadm /dev/md0 --grow -n 12 --backup-file=/reshape.bak
-> and after that, how do I get the disk out of the RAID?
--

At this point I think I will take the disk out of the RAID, because I need that disk's space.

I need another piece of advice from you. The computer is currently built with 13 disks, I will get more data next month, and the limit of power supply connectors has been reached, so I am looking for another solution. One possibility is to build a better computer with more SATA and SAS connectors and add further RAID controller cards. Another idea is to build a kind of cluster or DFS with two and later 3, 4... computers. I read something about gluster.org. Do you have a tip for me or experience with this?
-- 
NEW: FreePhone - free mobile calls!
Find out more: http://www.gmx.net/de/go/freephone



^ permalink raw reply

* Re: HDD reports errors while completing RAID6 array check
From: Mathias Burén @ 2011-06-10 18:23 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Linux-RAID
In-Reply-To: <20110611000000.76ca58e7@natsu>

On 10 June 2011 19:00, Roman Mamedov <rm@romanrm.ru> wrote:
> On Fri, 10 Jun 2011 18:37:06 +0100
> Mathias Burén <mathias.buren@gmail.com> wrote:
>
>>   9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7781
>
>
>> # 1  Extended offline    Completed without error       00%      6827         -
>> # 2  Extended offline    Completed without error       00%      6550         -
>> # 3  Extended offline    Completed without error       00%      6468         -
>> # 4  Extended offline    Completed without error       00%      6329         -
>> # 5  Extended offline    Completed without error       00%      6040         -
>> # 6  Extended offline    Completed without error       00%      5584         -
>> # 7  Extended offline    Completed without error       00%      5178         -
>> # 8  Extended offline    Completed without error       00%      4761         -
>> # 9  Short offline       Completed without error       00%      2285         -
>> #10  Extended offline    Completed without error       00%      1514
>
> I suggest that you do another "smartctl -t long" on it; the latest one was
> done almost 1000 hours ago which is also much longer than the period between
> previous tests. Freezes on reads could be a symptom of a bad (unreadable, or
> very slowly readable - which is worse) sector, perhaps it could be detected by
> the SMART test. Or also do a full read of the drive directly (not through the
> RAID) e.g. with "badblocks" and see if you get any I/O errors that way.
>
> --
> With respect,
> Roman
>

Thanks for the suggestions, I'll start the long selftest now.

/M

^ permalink raw reply

* Re: HDD reports errors while completing RAID6 array check
From: Roman Mamedov @ 2011-06-10 18:00 UTC (permalink / raw)
  To: Mathias Burén; +Cc: Linux-RAID
In-Reply-To: <BANLkTik87PZChAzgBnDM170uJHU=3Yc6Eg@mail.gmail.com>

On Fri, 10 Jun 2011 18:37:06 +0100
Mathias Burén <mathias.buren@gmail.com> wrote:

>   9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7781


> # 1  Extended offline    Completed without error       00%      6827         -
> # 2  Extended offline    Completed without error       00%      6550         -
> # 3  Extended offline    Completed without error       00%      6468         -
> # 4  Extended offline    Completed without error       00%      6329         -
> # 5  Extended offline    Completed without error       00%      6040         -
> # 6  Extended offline    Completed without error       00%      5584         -
> # 7  Extended offline    Completed without error       00%      5178         -
> # 8  Extended offline    Completed without error       00%      4761         -
> # 9  Short offline       Completed without error       00%      2285         -
> #10  Extended offline    Completed without error       00%      1514

I suggest that you do another "smartctl -t long" on it; the latest one was
done almost 1000 hours ago which is also much longer than the period between
previous tests. Freezes on reads could be a symptom of a bad (unreadable, or
very slowly readable - which is worse) sector, perhaps it could be detected by
the SMART test. Or also do a full read of the drive directly (not through the
RAID) e.g. with "badblocks" and see if you get any I/O errors that way.
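The full direct read suggested here is what badblocks does in its default read-only mode. For illustration only, here is a minimal Python sketch of the same idea that also records slow reads, since those can be worse than outright errors. The assumptions are:

- The 1 MiB block size and 5-second threshold are arbitrary.
- A failing block is reported as a whole, whereas badblocks narrows errors down to individual blocks.
- Reads go through the page cache, so recently read data may come back from RAM.

The function name is hypothetical. On a real system the path would be the raw disk, e.g. /dev/sdd, read as root.

```python
import os
import tempfile
import time


def read_scan(path, block_size=1 << 20, slow_seconds=5.0):
    """Read `path` front to back without writing anything.

    Returns (errors, slow): lists of (byte_offset, errno) for blocks that
    failed and (byte_offset, seconds) for blocks slower than slow_seconds.
    """
    errors, slow = [], []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            start = time.monotonic()
            try:
                chunk = os.pread(fd, block_size, offset)
            except OSError as exc:
                errors.append((offset, exc.errno))
                offset += block_size  # skip the unreadable block and carry on
                continue
            elapsed = time.monotonic() - start
            if not chunk:  # end of device or file
                break
            if elapsed > slow_seconds:
                slow.append((offset, elapsed))
            offset += len(chunk)
    finally:
        os.close(fd)
    return errors, slow


if __name__ == "__main__":
    # Demo on a scratch file instead of a real disk.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(3 * (1 << 20) + 100))
    try:
        print(read_scan(f.name))
    finally:
        os.unlink(f.name)
```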

-- 
With respect,
Roman


^ permalink raw reply

* HDD reports errors while completing RAID6 array check
From: Mathias Burén @ 2011-06-10 17:37 UTC (permalink / raw)
  To: Linux-RAID

Hi list,

When I run a check (echo check > /sys/block/md0/md/sync_action) on
my RAID6 array I see errors regarding ata4 in dmesg. When I check the
SMART data all appears to be fine, which is what confuses me. I saw
that someone on the list posted a link to a blog post about Google's
analysis of its drive failures, which concluded that many drives fail
without reporting any type of issue in the SMART table. Therefore I'm
wondering: could what I'm seeing here be an indicator of a pending
drive failure? (heh, aren't all drives pending failures...)

The array is healthy and working.

Here is the dmesg:

[774777.586500] md: data-check of RAID array md0
[774777.586510] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[774777.586516] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[774777.586537] md: using 128k window, over a total of 1950351360 blocks.
[804382.162553] forcedeth 0000:00:0a.0: eth0: link down
[804382.181483] br0: port 2(eth0) entering forwarding state
[804384.461232] forcedeth 0000:00:0a.0: eth0: link up
[804384.462858] br0: port 2(eth0) entering learning state
[804384.462866] br0: port 2(eth0) entering learning state
[804399.492930] br0: port 2(eth0) entering forwarding state
[816754.318388] ata4.00: exception Emask 0x0 SAct 0x1fc1f SErr 0x0 action 0x6 frozen
[816754.318397] ata4.00: failed command: READ FPDMA QUEUED
[816754.318409] ata4.00: cmd 60/88:00:18:69:46/00:00:e5:00:00/40 tag 0 ncq 69632 in
[816754.318411]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[816754.318417] ata4.00: status: { DRDY }
[816754.318421] ata4.00: failed command: READ FPDMA QUEUED
[816754.318432] ata4.00: cmd 60/38:08:00:6b:46/00:00:e5:00:00/40 tag 1 ncq 28672 in
[816754.318434]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[816754.318439] ata4.00: status: { DRDY }
[816754.318444] ata4.00: failed command: READ FPDMA QUEUED
[816754.318454] ata4.00: cmd 60/b0:10:00:6c:46/00:00:e5:00:00/40 tag 2 ncq 90112 in
[816754.318457]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[816754.318462] ata4.00: status: { DRDY }
[816754.318466] ata4.00: failed command: READ FPDMA QUEUED
[816754.318476] ata4.00: cmd 60/18:18:00:6e:46/00:00:e5:00:00/40 tag 3 ncq 12288 in
[816754.318479]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[816754.318484] ata4.00: status: { DRDY }
[816754.318488] ata4.00: failed command: READ FPDMA QUEUED
[816754.318499] ata4.00: cmd 60/c8:20:00:6f:46/00:00:e5:00:00/40 tag 4 ncq 102400 in
[816754.318501]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[816754.318506] ata4.00: status: { DRDY }
[816754.318511] ata4.00: failed command: READ FPDMA QUEUED
[816754.318521] ata4.00: cmd 60/60:50:a0:69:46/00:00:e5:00:00/40 tag 10 ncq 49152 in
[816754.318524]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[816754.318529] ata4.00: status: { DRDY }
[816754.318533] ata4.00: failed command: READ FPDMA QUEUED
[816754.318543] ata4.00: cmd 60/00:58:00:6a:46/01:00:e5:00:00/40 tag 11 ncq 131072 in
[816754.318546]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[816754.318551] ata4.00: status: { DRDY }
[816754.318555] ata4.00: failed command: READ FPDMA QUEUED
[816754.318566] ata4.00: cmd 60/60:60:38:6b:46/00:00:e5:00:00/40 tag 12 ncq 49152 in
[816754.318568]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[816754.318573] ata4.00: status: { DRDY }
[816754.318577] ata4.00: failed command: READ FPDMA QUEUED
[816754.318588] ata4.00: cmd 60/68:68:98:6b:46/00:00:e5:00:00/40 tag 13 ncq 53248 in
[816754.318590]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[816754.318595] ata4.00: status: { DRDY }
[816754.318600] ata4.00: failed command: READ FPDMA QUEUED
[816754.318610] ata4.00: cmd 60/50:70:b0:6c:46/00:00:e5:00:00/40 tag 14 ncq 40960 in
[816754.318613]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[816754.318617] ata4.00: status: { DRDY }
[816754.318622] ata4.00: failed command: READ FPDMA QUEUED
[816754.318632] ata4.00: cmd 60/00:78:00:6d:46/01:00:e5:00:00/40 tag 15 ncq 131072 in
[816754.318635]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[816754.318640] ata4.00: status: { DRDY }
[816754.318644] ata4.00: failed command: READ FPDMA QUEUED
[816754.318654] ata4.00: cmd 60/e8:80:18:6e:46/00:00:e5:00:00/40 tag 16 ncq 118784 in
[816754.318657]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[816754.318662] ata4.00: status: { DRDY }
[816754.318670] ata4: hard resetting link
[816754.638361] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[816754.645454] ata4.00: configured for UDMA/133
[816754.645466] ata4.00: device reported invalid CHS sector 0
[816754.645472] ata4.00: device reported invalid CHS sector 0
[816754.645477] ata4.00: device reported invalid CHS sector 0
[816754.645482] ata4.00: device reported invalid CHS sector 0
[816754.645488] ata4.00: device reported invalid CHS sector 0
[816754.645493] ata4.00: device reported invalid CHS sector 0
[816754.645498] ata4.00: device reported invalid CHS sector 0
[816754.645502] ata4.00: device reported invalid CHS sector 0
[816754.645507] ata4.00: device reported invalid CHS sector 0
[816754.645512] ata4.00: device reported invalid CHS sector 0
[816754.645516] ata4.00: device reported invalid CHS sector 0
[816754.645521] ata4.00: device reported invalid CHS sector 0
[816754.645555] ata4: EH complete
[817317.467510] md: md0: data-check done.
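The "cmd" fields in these messages are the raw ATA taskfile, so they can show where on ata4 the timed-out reads were aimed. The Python sketch below makes two assumptions:

- Field order: the order libata prints, i.e. command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device.
- NCQ convention for READ FPDMA QUEUED (0x60): the sector count sits in the feature registers and the queue tag in bits 7:3 of the sector-count register.

The decoded sizes match the "ncq ... in" byte counts on the same lines, and the decoded tags match the "tag" numbers, which supports that reading. The function name is hypothetical.

```python
def decode_ncq_taskfile(taskfile):
    """Decode e.g. '60/88:00:18:69:46/00:00:e5:00:00/40' from a libata 'cmd' line."""
    command, low, high, _device = taskfile.split("/")
    feature, nsect, lbal, lbam, lbah = (int(v, 16) for v in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(v, 16) for v in high.split(":")
    )
    sectors = (hob_feature << 8) | feature
    if sectors == 0:
        sectors = 65536  # a count of zero means 65536 sectors for 48-bit commands
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "command": int(command, 16),  # 0x60 = READ FPDMA QUEUED
        "tag": nsect >> 3,            # NCQ tag lives in bits 7:3 of the count register
        "sectors": sectors,
        "bytes": sectors * 512,       # assumes 512-byte logical sectors
        "lba": lba,
    }


if __name__ == "__main__":
    print(decode_ncq_taskfile("60/88:00:18:69:46/00:00:e5:00:00/40"))
```

For tag 0 this gives 136 sectors (the logged 69632 bytes) starting at LBA 3846596888. The drive reports 2,000,398,934,016 bytes, or 3907029168 sectors. All of the timed-out reads therefore fall within less than 1 MiB, about 98% of the way into the disk. That appears consistent with the error arriving near the end of the data-check.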

Here is mdadm -D /dev/md0:

$ mdadm -D /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Tue Oct 19 08:58:41 2010
     Raid Level : raid6
     Array Size : 9751756800 (9300.00 GiB 9985.80 GB)
  Used Dev Size : 1950351360 (1860.00 GiB 1997.16 GB)
   Raid Devices : 7
  Total Devices : 7
    Persistence : Superblock is persistent

    Update Time : Fri Jun 10 18:30:56 2011
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : ion:0  (local to host ion)
           UUID : e6595c64:b3ae90b3:f01133ac:3f402d20
         Events : 6158735

    Number   Major   Minor   RaidDevice State
       0       8       97        0      active sync   /dev/sdg1
       1       8       17        1      active sync   /dev/sdb1
       4       8       49        2      active sync   /dev/sdd1
       3       8       33        3      active sync   /dev/sdc1
       5       8       81        4      active sync   /dev/sdf1
       6       8      113        5      active sync   /dev/sdh1
       7       8       65        6      active sync   /dev/sde1

Below is the output of the lsdrv Python script; ata4, mentioned in
dmesg, should be sdd:

 $ sudo python2 lsdrv
PCI [ahci] 00:0b.0 SATA controller: nVidia Corporation MCP79 AHCI Controller (rev b1)
 ├─scsi 0:0:0:0 ATA Corsair CSSD-F60 {10326505580009990027}
 │  └─sda: Partitioned (dos) 55.90g
 │     ├─sda1: (ext4) 100.00m 'ssd_boot' {ae879f86-73a4-451f-bb6b-e778ad1b57d6}
 │     │  └─Mounted as /dev/sda1 @ /boot
 │     ├─sda2: (swap) 2.00g 'ssd_swap' {a28e32fa-628c-419a-9693-ca88166d230f}
 │     └─sda3: (ext4) 53.80g 'ssd_root' {6e812ed7-01c4-4a76-ae31-7b3d36d847f5}
 │        └─Mounted as /dev/disk/by-label/ssd_root @ /
 ├─scsi 1:0:0:0 ATA WDC WD20EARS-00M {WD-WCAZA1022443}
 │  └─sdb: Partitioned (dos) 1.82t
 │     └─sdb1: MD raid6 (1/7) 1.82t md0 clean in_sync 'ion:0' {e6595c64-b3ae-90b3-f011-33ac3f402d20}
 │        └─md0: PV LVM2_member 9.08t/9.08t VG lvstorage 9.08t {YLEUKB-klxF-X3gF-6dG3-DL4R-xebv-6gKQc2}
 │            └─Volume Group lvstorage (md0) 0 free {Xd0HTM-azdN-v9kJ-C7vD-COcU-Cnn8-6AJ6hI}
 │              └─dm-0: (ext4) 9.08t 'storage' {0ca82f13-680f-4b0d-a5d0-08c246a838e5}
 │                 └─Mounted as /dev/mapper/lvstorage-storage @ /raid6volume
 ├─scsi 2:0:0:0 ATA WDC WD20EARS-00M {WD-WMAZ20152590}
 │  └─sdc: Partitioned (dos) 1.82t
 │     └─sdc1: MD raid6 (3/7) 1.82t md0 clean in_sync 'ion:0' {e6595c64-b3ae-90b3-f011-33ac3f402d20}
 ├─scsi 3:0:0:0 ATA WDC WD20EARS-00M {WD-WMAZ20188479}
 │  └─sdd: Partitioned (dos) 1.82t
 │     └─sdd1: MD raid6 (2/7) 1.82t md0 clean in_sync 'ion:0' {e6595c64-b3ae-90b3-f011-33ac3f402d20}
 ├─scsi 4:x:x:x [Empty]
 └─scsi 5:x:x:x [Empty]
PCI [sata_mv] 05:00.0 SCSI storage controller: HighPoint Technologies, Inc. RocketRAID 230x 4 Port SATA-II Controller (rev 02)
 ├─scsi 6:0:0:0 ATA WDC WD20EARS-00M {WD-WCAZA3609190}
 │  └─sde: Partitioned (dos) 1.82t
 │     └─sde1: MD raid6 (6/7) 1.82t md0 clean in_sync 'ion:0' {e6595c64-b3ae-90b3-f011-33ac3f402d20}
 ├─scsi 7:0:0:0 ATA SAMSUNG HD204UI {S2HGJ1RZ800964}
 │  └─sdf: Partitioned (dos) 1.82t
 │     └─sdf1: MD raid6 (4/7) 1.82t md0 clean in_sync 'ion:0' {e6595c64-b3ae-90b3-f011-33ac3f402d20}
 ├─scsi 8:0:0:0 ATA WDC WD20EARS-00M {WD-WCAZA1000331}
 │  └─sdg: Partitioned (dos) 1.82t
 │     └─sdg1: MD raid6 (0/7) 1.82t md0 clean in_sync 'ion:0' {e6595c64-b3ae-90b3-f011-33ac3f402d20}
 └─scsi 9:0:0:0 ATA SAMSUNG HD204UI {S2HGJ1RZ800850}
    └─sdh: Partitioned (dos) 1.82t
       └─sdh1: MD raid6 (5/7) 1.82t md0 clean in_sync 'ion:0' {e6595c64-b3ae-90b3-f011-33ac3f402d20}
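The "ata4 should be sdd" inference relies on libata numbering. Ata ports are usually numbered from 1 and SCSI hosts from 0, so ata4 normally sits on scsi host 3, which lsdrv lists as sdd. On kernels that expose the ata port in sysfs, the mapping can be read from the device path instead of inferred. The following is a minimal Python sketch. The function names and the example path in the comment are assumptions about a typical libata sysfs layout.

```python
import os
import re

# Typical (hypothetical) resolved path for a libata disk:
#   /sys/devices/pci0000:00/0000:00:0b.0/ata4/host3/target3:0:0/3:0:0:0/block/sdd


def ata_port_from_sysfs_path(path):
    """Extract the 'ataN' component from a resolved /sys/block/<dev> path."""
    match = re.search(r"/(ata\d+)/", path)
    return match.group(1) if match else None


def ata_port_of(block_device):
    """Map e.g. 'sdd' to 'ata4' by resolving the /sys/block symlink (Linux only)."""
    return ata_port_from_sysfs_path(os.path.realpath(f"/sys/block/{block_device}"))


if __name__ == "__main__" and os.path.isdir("/sys/block"):
    for dev in sorted(os.listdir("/sys/block")):
        print(dev, ata_port_of(dev))  # None for md, dm, loop and non-ATA devices
```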

Here's mdadm -E /dev/sdd1:

 $ sudo mdadm -E /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e6595c64:b3ae90b3:f01133ac:3f402d20
           Name : ion:0  (local to host ion)
  Creation Time : Tue Oct 19 08:58:41 2010
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 19503513600 (9300.00 GiB 9985.80 GB)
  Used Dev Size : 3900702720 (1860.00 GiB 1997.16 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f9f0e2b3:ab659d71:66cfdb9d:2b87dcea

    Update Time : Fri Jun 10 18:33:55 2011
       Checksum : 6c60e800 - correct
         Events : 6158735

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : AAAAAAA ('A' == active, '.' == missing)
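The array looks twice as large in "mdadm -E" as in "mdadm -D" only because of units. Here --examine on the 1.2-metadata member counts 512-byte sectors, while --detail counts KiB, as the identical GiB figures show. Either way, a RAID6 array holds (raid devices - 2) x per-device size. Here is a quick Python check using the numbers from this post; the names are hypothetical:

```python
def raid6_array_size(raid_devices, used_dev_size):
    """RAID6 keeps two devices' worth of parity; units follow the input."""
    return (raid_devices - 2) * used_dev_size


DETAIL_KIB = {"array": 9751756800, "used_dev": 1950351360}        # mdadm -D /dev/md0
EXAMINE_SECTORS = {"array": 19503513600, "used_dev": 3900702720}  # mdadm -E /dev/sdd1

if __name__ == "__main__":
    for name, report, unit_bytes in (
        ("detail", DETAIL_KIB, 1024),
        ("examine", EXAMINE_SECTORS, 512),
    ):
        assert raid6_array_size(7, report["used_dev"]) == report["array"]
        print(name, report["array"] * unit_bytes / 2**30, "GiB")  # both print 9300.0
```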

And finally the SMART status of sdd:

 $ sudo smartctl -a /dev/sdd
smartctl 5.40 2010-10-16 r3189 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format) family
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WMAZ20188479
Firmware Version: 50.0AB50
User Capacity:    2,000,398,934,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jun 10 18:35:58 2011 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (36000) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   176   162   021    Pre-fail  Always       -       6183
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       59
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7781
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       53
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       32
193 Load_Cycle_Count        0x0032   162   162   000    Old_age   Always       -       114636
194 Temperature_Celsius     0x0022   111   102   000    Old_age   Always       -       39
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1
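When comparing several drives it can help to pull values out of this attribute table programmatically. The sketch below parses one row, assuming it sits on a single line in the column order printed above; struct smart_attr and parse_smart_attr() are hypothetical names, and other smartctl versions may format some raw values differently (e.g. temperature with extra min/max text, which this parser simply stops before).

```c
#include <stdio.h>

/* One row of the "Vendor Specific SMART Attributes" table,
 * in the column order smartctl prints. Hypothetical helper type. */
struct smart_attr {
	int id;
	char name[32];
	unsigned int flag;
	int value, worst, thresh;
	char type[16], updated[16], when_failed[16];
	unsigned long long raw;
};

/* Returns 1 if all ten columns were parsed, 0 otherwise.
 * %x accepts the "0x" prefix of FLAG; %d reads "051" as decimal 51. */
static int parse_smart_attr(const char *line, struct smart_attr *a)
{
	int n = sscanf(line, "%d %31s %x %d %d %d %15s %15s %15s %llu",
		       &a->id, a->name, &a->flag, &a->value, &a->worst,
		       &a->thresh, a->type, a->updated, a->when_failed,
		       &a->raw);
	return n == 10;
}
```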

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      6827         -
# 2  Extended offline    Completed without error       00%      6550         -
# 3  Extended offline    Completed without error       00%      6468         -
# 4  Extended offline    Completed without error       00%      6329         -
# 5  Extended offline    Completed without error       00%      6040         -
# 6  Extended offline    Completed without error       00%      5584         -
# 7  Extended offline    Completed without error       00%      5178         -
# 8  Extended offline    Completed without error       00%      4761         -
# 9  Short offline       Completed without error       00%      2285         -
#10  Extended offline    Completed without error       00%      1514         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Sure, the Load_Cycle_Count is a tad high, but the drive is not new
either. Multi_Zone_Error_Rate is 1, but I'm not sure what that even
is; some vendors don't have it in their SMART table, AFAIK.
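To put "a tad high" into numbers, the average load-cycle rate follows from two raw counters in the table: attribute 193 (Load_Cycle_Count) divided by attribute 9 (Power_On_Hours). A small C sketch; the function names are illustrative, and the rated cycle count passed to the second function is a placeholder you would take from the drive's datasheet, not a figure from this thread.

```c
/* Average load/unload cycles per day of power-on time.
 * Integer division truncates. */
static unsigned long long load_cycles_per_day(unsigned long long load_cycles,
					      unsigned long long power_on_hours)
{
	if (power_on_hours == 0)
		return 0;
	return load_cycles * 24 / power_on_hours;
}

/* Further power-on hours until rated_cycles is reached, assuming the
 * average rate so far continues. Returns 0 if already at or past it. */
static unsigned long long hours_to_rating(unsigned long long load_cycles,
					  unsigned long long power_on_hours,
					  unsigned long long rated_cycles)
{
	if (load_cycles == 0 || load_cycles >= rated_cycles)
		return 0;
	return (rated_cycles - load_cycles) * power_on_hours / load_cycles;
}
```

For sdd, load_cycles_per_day(114636, 7781) is 353, roughly one load/unload cycle every four minutes of power-on time; with a placeholder rating of 300000 cycles, hours_to_rating() gives 12581 more hours at the same pace.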

If anyone could give me any clues, that would be appreciated. Thanks!

/M

* RE: Backup Server RAID Array Event Notification
From: Leslie Rhorer @ 2011-06-10 17:32 UTC
  To: 'NeilBrown'; +Cc: linux-raid
In-Reply-To: <34.8F.15242.4FE61FD4@cdptpa-omtalb.mail.rr.com>

> > "mdadm --monitor" does not monitor RAID0 or Linear arrays.  There is nothing
> > to see.  Nothing can fail, they don't rebuild, they are really just AID, not
> > RAID.

	I've been thinking about this.  It's true they are just AID, but
they most certainly can fail.  If the array truly disappears, perhaps
because a drive fails, I certainly should be notified of it.

> 	That would make sense if I had started the monitor daemon and it had
> sent the e-mail, but the monitor has been running for nearly two days, since
> the system was rebooted.  Why send the message nearly a day after the daemon
> was started, and why send it more than once (for each array)?
> 
> 	By the same token, why did it wait nearly 8 hours, and then again more
> than a day and a half after the array was created, to send the messages,
> instead of immediately after it was created?
> 
> 	This suggests I am going to be treated to a pair of spurious e-mails
> every day or so telling me the device has disappeared, when it is perfectly
> good.  After a few months of that, what happens when one of the devices
> really does disappear?  We all know what happens to the system that cries,
> "Wolf!" all the time.

	Yeah, it looks like it's going to send this message out once a day
for both arrays.  Mdadm sent out another pair of e-mails at 07:44 this
morning. Is no one else seeing this with RAID0 arrays?  Is there some way I
can stop it without impacting any real notifications?  I could intercept the
message in the script run by mdadm to send the e-mail, but if I do, I fear I
might also incorrectly trash a real error message.  I don't suppose
"Wrong-Level" would ever appear in a valid notification for a properly
configured system, would it?  If not, I suppose I could grep for
"Wrong-Level" in the e-mail packet and trash it if the text is found.

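Filtering in the hook need not touch real alerts if the match is narrow. mdadm --monitor passes the program named by PROGRAM in mdadm.conf (or by --program) the event name, the md device and, for some events, a related device as arguments. Assuming the spurious RAID0 alert arrives as a DeviceDisappeared event carrying "Wrong-Level" in that third position (worth confirming against the actual notification first), a minimal C sketch of the decision could look like this; suppress_md_event() is a hypothetical name.

```c
#include <string.h>

/* Decide whether an mdadm --monitor event should be dropped.
 * Assumed argv layout for the PROGRAM hook:
 *   argv[1] = event name, argv[2] = md device,
 *   argv[3] = related device (only present for some events).
 * Only DeviceDisappeared carrying "Wrong-Level" is suppressed;
 * everything else, including a DeviceDisappeared without that
 * marker, is passed through. */
static int suppress_md_event(int argc, char *const argv[])
{
	if (argc < 4)
		return 0;
	return strcmp(argv[1], "DeviceDisappeared") == 0 &&
	       strcmp(argv[3], "Wrong-Level") == 0;
}
```

A wrapper's main() would call this first and, when it returns 0, hand the same arguments on to the existing mail script, so genuine failures still generate e-mail.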

* [PATCH 2/2] imsm: FIX: klocwork: passed dev pointer to is_gen_migration() can be NULL
From: Adam Kwolek @ 2011-06-10 15:56 UTC
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer
In-Reply-To: <20110610154856.24539.28960.stgit@gklab-128-013.igk.intel.com>

The dev2 pointer passed to is_gen_migration() in write_super_imsm() (line 4451) can be NULL.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 8dd0805..3b4010d 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -5321,6 +5321,9 @@ static int update_subarray_imsm(struct supertype *st, char *subarray,
 
 static int is_gen_migration(struct imsm_dev *dev)
 {
+	if (dev == NULL)
+		return 0;
+
 	if (!dev->vol.migr_state)
 		return 0;
 

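The effect of the new guard can be checked in isolation. The sketch below mirrors the patched control flow using cut-down stand-in types; dev_sketch, vol_sketch and the SKETCH_MIGR_* values are hypothetical and are not the real super-intel.c definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real struct imsm_dev has many more fields. */
enum sketch_migr_type { SKETCH_MIGR_OTHER, SKETCH_MIGR_GEN };

struct vol_sketch {
	unsigned char migr_state;	/* non-zero while a migration is running */
	unsigned char migr_type;	/* one of enum sketch_migr_type */
};

struct dev_sketch {
	struct vol_sketch vol;
};

/* Same shape as the patched is_gen_migration(): a NULL device is
 * reported as "not migrating" instead of being dereferenced. */
static int is_gen_migration_sketch(const struct dev_sketch *dev)
{
	if (dev == NULL)
		return 0;
	if (!dev->vol.migr_state)
		return 0;
	return dev->vol.migr_type == SKETCH_MIGR_GEN;
}
```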

* [PATCH 1/2] imsm: Fix: klocwork: targets variable can be used uninitialized
From: Adam Kwolek @ 2011-06-10 15:56 UTC
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer
In-Reply-To: <20110610154856.24539.28960.stgit@gklab-128-013.igk.intel.com>

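When the target_offsets allocation fails, execution jumps to the abort
label, where the elements of the targets table are closed.

Initialize every entry of the targets table to -1 right after allocation,
so the abort path never operates on uninitialized entries.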

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 40fd940..8dd0805 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -7724,6 +7724,9 @@ int save_backup_imsm(struct supertype *st,
 	if (!targets)
 		goto abort;
 
+	for (i = 0; i < new_disks; i++)
+		targets[i] = -1;
+
 	target_offsets = malloc(new_disks * sizeof(unsigned long long));
 	if (!target_offsets)
 		goto abort;

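The bug class fixed here is the usual goto-cleanup hazard: if the cleanup loop closes every slot, each slot must hold a sentinel before anything can fail. A generic, self-contained C sketch of the pattern (not the actual save_backup_imsm() code; open_targets() and close_targets() are hypothetical names):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Close and free a descriptor table; -1 marks "never opened". */
static void close_targets(int *targets, int n)
{
	int i;

	if (!targets)
		return;
	for (i = 0; i < n; i++)
		if (targets[i] >= 0)
			close(targets[i]);
	free(targets);
}

/* Open n paths read-only. Every slot is set to -1 before any step
 * that can fail, so the abort path never passes garbage to close(). */
static int *open_targets(const char *const paths[], int n)
{
	int i;
	int *targets = malloc(n * sizeof(*targets));

	if (!targets)
		goto abort;
	for (i = 0; i < n; i++)
		targets[i] = -1;
	for (i = 0; i < n; i++) {
		targets[i] = open(paths[i], O_RDONLY);
		if (targets[i] < 0)
			goto abort;
	}
	return targets;
abort:
	close_targets(targets, n);	/* safe for NULL and partial tables */
	return NULL;
}
```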

* [PATCH 0/2] IMSM Checkpointing Bug Fix Series (3)
From: Adam Kwolek @ 2011-06-10 15:56 UTC
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

The following series implements fixes for potential problems found
using Klocwork.

For a complete solution, all seven patches sent so far need to be applied:
  1. imsm: FIX: Cannot create volume
  2. FIX: Cannot create volume
  3. imsm: FIX: Use function to obtain array layout
  4. imsm: FIX: Disable automatic metadata rollback for broken reshape
  5. imsm: FIX: Raid5 data corruption data recovering from backup
  6. imsm: Fix: klocwork: targets variable can be used uninitialized
  7. imsm: FIX: klocwork: passed dev pointer to is_gen_migration() can be NULL

IMSM Checkpointing Status: All unit tests passed

BR
Adam

---

Adam Kwolek (2):
      imsm: FIX: klocwork: passed dev pointer to is_gen_migration() can be NULL
      imsm: Fix: klocwork: targets variable can be used uninitialized

 super-intel.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox