Linux RAID subsystem development
* Re: Unable to grow RAID 6 array
From: NeilBrown @ 2011-05-18  8:56 UTC
  To: John Robinson; +Cc: Linux RAID, Laurent CARON
In-Reply-To: <4DD37C24.6040307@anonymous.org.uk>

On Wed, 18 May 2011 08:58:28 +0100 John Robinson
<john.robinson@anonymous.org.uk> wrote:

> On 18/05/2011 08:31, Laurent CARON wrote:
> > Hi,
> >
> > I'm basically trying to grow a RAID-6 array (5 disks).
> >
> > I did change the disks one by one (change, rebuild, ...).
> >
> > I finally deleted the last partition (the one I wanted to enlarge) and
> > recreated it with exactly the same start but a different end.
> >
> > When I try to grow the array, I get:
> >
> > # mdadm --grow --size max /dev/md2
> > mdadm: component size of /dev/md2 has been set to 732458496K
> [...]
> >
> > If any of you can help me to sort this out it would be nice.
> 
> You need to tell mdadm the underlying devices have grown; do this by 
> stopping the array then assembling it again with --update=devicesize. 
> Then when you --grow --size max you will get the result you are looking for.
> 

Alternately you could get the latest mdadm from 
   git://neil.brown.name/mdadm/

and run the "--grow --size max" command again.  It now updates the device
size too.
But John's answer is probably easiest.

NeilBrown


* Re: Unable to grow RAID 6 array
From: John Robinson @ 2011-05-18  7:58 UTC
  To: Linux RAID; +Cc: Laurent CARON
In-Reply-To: <20110518092327.airaicao@trusted.unix-scripts.info>

On 18/05/2011 08:31, Laurent CARON wrote:
> Hi,
>
> I'm basically trying to grow a RAID-6 array (5 disks).
>
> I did change the disks one by one (change, rebuild, ...).
>
> I finally deleted the last partition (the one I wanted to enlarge) and
> recreated it with exactly the same start but a different end.
>
> When I try to grow the array, I get:
>
> # mdadm --grow --size max /dev/md2
> mdadm: component size of /dev/md2 has been set to 732458496K
[...]
>
> If any of you can help me to sort this out it would be nice.

You need to tell mdadm the underlying devices have grown; do this by 
stopping the array then assembling it again with --update=devicesize. 
Then when you --grow --size max you will get the result you are looking for.

Cheers,

John.
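
A minimal dry-run sketch of the sequence John describes above. The array
name and member partitions are taken from the /proc/mdstat output in the
original report; the run() helper and the DRY_RUN variable are
illustrative assumptions, not part of mdadm. By default the script only
prints the commands it would execute.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands unless DRY_RUN=0 is set.
# MD and MEMBERS come from the mdstat output in this thread; adjust them
# for any other system before running for real.
DRY_RUN="${DRY_RUN:-1}"
MD=/dev/md2
MEMBERS="/dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3"

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

grow_after_partition_resize() {
    # 1. Stop the array (any filesystem on it must be unmounted first).
    run mdadm --stop "$MD" || return 1
    # 2. Reassemble and ask mdadm to refresh the recorded device size.
    #    $MEMBERS is left unquoted on purpose so it splits into arguments.
    run mdadm --assemble "$MD" --update=devicesize $MEMBERS || return 1
    # 3. Grow the component size to the maximum now available.
    run mdadm --grow "$MD" --size=max
}

grow_after_partition_resize
```

Run as-is to review the three commands; set DRY_RUN=0 only once the
printed commands match the system at hand.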



* Unable to grow RAID 6 array
From: Laurent CARON @ 2011-05-18  7:31 UTC
  To: linux-raid

Hi,

I'm basically trying to grow a RAID-6 array (5 disks).

I did change the disks one by one (change, rebuild, ...).

I finally deleted the last partition (the one I wanted to enlarge) and
recreated it with exactly the same start but a different end.

When I try to grow the array, I get: 

# mdadm --grow --size max /dev/md2
mdadm: component size of /dev/md2 has been set to 732458496K

# cat /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty] 
md2 : active raid6 sda3[0] sde3[4] sdd3[3] sdc3[2] sdb3[1]
      2197375488 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]

# mdadm -E /dev/sda3
/dev/sda3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 2687b3cf:4c884e3d:75aff183:342fe97d
           Name : gw:2  (local to host gw)
  Creation Time : Mon Apr 25 22:51:23 2011
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 1464917107 (698.53 GiB 750.04 GB)
     Array Size : 4394750976 (2095.58 GiB 2250.11 GB)
  Used Dev Size : 1464916992 (698.53 GiB 750.04 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 98d5d478:4a58c880:a70646d8:d3b92777

    Update Time : Wed May 18 09:25:25 2011
       Checksum : 7cbdbbb9 - correct
         Events : 84

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing)

# mdadm -D /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Mon Apr 25 22:51:23 2011
     Raid Level : raid6
     Array Size : 2197375488 (2095.58 GiB 2250.11 GB)
  Used Dev Size : 732458496 (698.53 GiB 750.04 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

    Update Time : Wed May 18 09:26:36 2011
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : gw:2  (local to host gw)
           UUID : 2687b3cf:4c884e3d:75aff183:342fe97d
         Events : 84

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
       4       8       67        4      active sync   /dev/sde3

# fdisk -l /dev/sda
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x09cd117c

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          19      152586   fd  Linux raid autodetect
/dev/sda2              20       30414   244147837+  fd  Linux raid autodetect
/dev/sda3           30415      243201  1709211577+  fd  Linux raid autodetect

The partitions I try to use are now 1.7TB so my raid array should be
able to grow without any trouble.

Needless to say I already rebooted for the kernel to re-read the
partition table.

If any of you can help me to sort this out it would be nice.

Thanks
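
A small arithmetic sketch relating the numbers in the report above. It
assumes the usual RAID-6 rule that usable capacity is (devices - 2) times
the per-device component size; the function name is hypothetical. The
second call is only a rough upper bound, since it ignores the data offset
and any rounding mdadm applies.

```shell
#!/bin/sh
# Usable capacity of a RAID-6 array in KiB:
# (number of devices - 2 parity devices) * per-device component size.
raid6_capacity_kib() {
    devices=$1
    component_kib=$2
    echo $(( (devices - 2) * component_kib ))
}

# Current state: 5 devices, component size 732458496K as printed by
# "mdadm --grow". The result equals the 2197375488 blocks in /proc/mdstat.
raid6_capacity_kib 5 732458496

# After a successful grow: fdisk lists 1709211577 1K-blocks for sda3.
# Estimate only; the real figure will be somewhat smaller.
raid6_capacity_kib 5 1709211577
```

The first result reproducing the mdstat figure shows that the array is
still using the old component size even though the partitions are larger.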



* Re: Incompatibility of internal bitmap with ext4 barriers?
From: John Robinson @ 2011-05-17 20:20 UTC
  To: Roman Mamedov; +Cc: Jason Tinker, linux-raid
In-Reply-To: <20110518010020.353e5865@natsu>

On 17/05/2011 20:00, Roman Mamedov wrote:
> On Tue, 17 May 2011 22:43:48 +0400
> Jason Tinker<jsntinker@gmail.com>  wrote:
>
>> Do you have RAID5 over RAID0's or an ordinary RAID5? On my box
>> ordinary RAID5 works perfectly too.
>
> Yes, I ran RAID5 of 4x2TB+(1+1TB RAID0)+(1.5+0.5TB RAID0), recently changed to
> RAID6 though, also replaced one of the RAID0s with another 2TB drive.
>
>> If your configuration is also RAID5 over RAID0 and it works fine on
>> 2.6.38 then I'll just wait for next ubuntu lts...
>
> I am not saying you should just give up. The logs you posted seem like
> they can be very helpful in tracking this down. But if it is indeed a kernel
> issue, how do you fix or even debug it without replacing/compiling a new
> kernel, which is exactly what you do not want to do. And if you're already
> replacing a kernel, why not try a 2.6.38 right away, to check if your issue is
> already solved in there (after all 2.6.32 vs 2.6.38 are eons apart BOTH
> mdadm-wise and ext4-wise).

I've said it before and no doubt I'll say it again, but the RHEL kernels 
are often a lot closer to the most recent than their version numbers 
suggest. It's a little more difficult to tell with EL6, because Red Hat 
don't ship vanilla+patches sources any more, but their .32 series 
includes lots of backported fixes from .38 etc.

Jason, it's probably worth testing with vanilla .38 to see if that does 
fix your problem, then post to Red Hat bugzilla saying so, especially as 
it looks to me like the configuration you're trying to use ought to be a 
supported one - then a future RHEL .32 might well include the fix.

Cheers,

John.
(Not a spokesman for Red Hat, nor anyone else other than myself, and 
sometimes not even that ;-)



* Re: permanently removing a spare
From: NeilBrown @ 2011-05-17 19:44 UTC
  To: Tobias McNulty; +Cc: linux-raid
In-Reply-To: <BANLkTim+8mu9xuUsQvCXPXUSVLfpveGofg@mail.gmail.com>

On Tue, 17 May 2011 15:09:18 -0400 Tobias McNulty <tobias@caktusgroup.com>
wrote:

> On Wed, May 11, 2011 at 9:49 PM, Tobias McNulty <tobias@caktusgroup.com> wrote:
> >
> > Hi all,
> >
> > After successfully converting my raid6 array to raid5, I of course
> > neglected to update mdadm.conf, so the array was absent on reboot.  A
> > quick mdadm --assemble brought the array back online.
> >
> > However, now I am trying to update mdadm.conf, and I hit what I think
> > is this bug:
> >
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=610184
> >
> > So I thought I'd try to remove the spare from my raid5 array.  I
> > marked it as failed and then removed it, and the spare no longer shows
> > in /proc/mdstat:
> >
> > md0 : active raid5 sda[0] sdd[3] sdc[2] sdb[1]
> >      5860543488 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
> >
> > However, when I do mdadm -Es, I still see it:
> >
> > ARRAY /dev/md0 UUID=25a818ff:68f07e28:0d7656f3:2f233380
> >   spares=1
> >
> > And I still see the "error: superfluous RAID member (4 found)." error
> > when running update-grub (even if I leave out the spares=1 part).
> >
> > Is there a way to "permanently" remove the spare "slot" from the
> > array?  I tried mdadm --grow --spare-devices=0, since the man page
> > arguably suggests that --spare-devices should work in grow mode, but
> > running the command reports that it's not actually supported.  To the
> > credit of the man page, the description of --spare-devices *does* say
> > it is used in the *initial* array creation.
> >
> > Does what I'm trying to do make sense?  Is there a way to make the
> > array forget that it ever had a spare in the first place?
> >
> > I'm a little afraid to reboot until I get this figured out.
> 
> Hey all - I think I am missing something obvious but I am not sure
> what it is, and I still haven't turned up anything in my own
> searching.
> 
> Do you have any advice for what I need to do so that the array is
> mounted automatically on boot again?
>

If you want to stop a spare from looking like part of the array, simply

   mdadm --zero-superblock /dev/DEVICENAME

But you really want it to assemble automatically at boot, and I cannot see how
a spare would interfere with that. bugs.debian.org isn't responding just now.

NeilBrown
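
A dry-run sketch of the suggestion above, assuming the spare has already
been failed and removed from the array as described in the question.
/dev/DEVICENAME is the same placeholder used in the reply; the run()
helper and DRY_RUN variable are illustrative assumptions. Note that
--zero-superblock erases the md metadata on the named device, so the
device is examined first.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

retire_spare() {
    dev=$1
    # Inspect the superblock first to confirm this is the former spare.
    run mdadm --examine "$dev" || return 1
    # Destructive: wipe the md superblock so the device no longer
    # looks like a member of the array.
    run mdadm --zero-superblock "$dev" || return 1
    # Repeat the scan from the question (long form of "mdadm -Es")
    # to see whether "spares=1" is still reported.
    run mdadm --examine --scan
}

retire_spare /dev/DEVICENAME
```

Replace /dev/DEVICENAME with the removed spare and set DRY_RUN=0 only
after checking the --examine output by hand.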
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: permanently removing a spare
From: Tobias McNulty @ 2011-05-17 19:09 UTC
  To: linux-raid
In-Reply-To: <BANLkTinbk9VMuiLNUA8VW-gE1Znbf9caWA@mail.gmail.com>

On Wed, May 11, 2011 at 9:49 PM, Tobias McNulty <tobias@caktusgroup.com> wrote:
>
> Hi all,
>
> After successfully converting my raid6 array to raid5, I of course
> neglected to update mdadm.conf, so the array was absent on reboot.  A
> quick mdadm --assemble brought the array back online.
>
> However, now I am trying to update mdadm.conf, and I hit what I think
> is this bug:
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=610184
>
> So I thought I'd try to remove the spare from my raid5 array.  I
> marked it as failed and then removed it, and the spare no longer shows
> in /proc/mdstat:
>
> md0 : active raid5 sda[0] sdd[3] sdc[2] sdb[1]
>      5860543488 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
>
> However, when I do mdadm -Es, I still see it:
>
> ARRAY /dev/md0 UUID=25a818ff:68f07e28:0d7656f3:2f233380
>   spares=1
>
> And I still see the "error: superfluous RAID member (4 found)." error
> when running update-grub (even if I leave out the spares=1 part).
>
> Is there a way to "permanently" remove the spare "slot" from the
> array?  I tried mdadm --grow --spare-devices=0, since the man page
> arguably suggests that --spare-devices should work in grow mode, but
> running the command reports that it's not actually supported.  To the
> credit of the man page, the description of --spare-devices *does* say
> it is used in the *initial* array creation.
>
> Does what I'm trying to do make sense?  Is there a way to make the
> array forget that it ever had a spare in the first place?
>
> I'm a little afraid to reboot until I get this figured out.

Hey all - I think I am missing something obvious but I am not sure
what it is, and I still haven't turned up anything in my own
searching.

Do you have any advice for what I need to do so that the array is
mounted automatically on boot again?

Thank you!
Tobias
--
Tobias McNulty, Managing Member
Caktus Consulting Group, LLC
http://www.caktusgroup.com


* Re: Incompatibility of internal bitmap with ext4 barriers?
From: Roman Mamedov @ 2011-05-17 19:00 UTC
  To: Jason Tinker; +Cc: linux-raid
In-Reply-To: <BANLkTikvZZYy9fm_sA8oAmkqim1tkbaC_A@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 980 bytes --]

On Tue, 17 May 2011 22:43:48 +0400
Jason Tinker <jsntinker@gmail.com> wrote:

> Do you have RAID5 over RAID0's or an ordinary RAID5? On my box
> ordinary RAID5 works perfectly too.

Yes, I ran RAID5 of 4x2TB+(1+1TB RAID0)+(1.5+0.5TB RAID0), recently changed to
RAID6 though, also replaced one of the RAID0s with another 2TB drive.

> If your configuration is also RAID5 over RAID0 and it works fine on
> 2.6.38 then I'll just wait for next ubuntu lts...

I am not saying you should just give up. The logs you posted seem like
they can be very helpful in tracking this down. But if it is indeed a kernel
issue, how do you fix or even debug it without replacing/compiling a new
kernel, which is exactly what you do not want to do. And if you're already
replacing a kernel, why not try a 2.6.38 right away, to check if your issue is
already solved in there (after all 2.6.32 vs 2.6.38 are eons apart BOTH
mdadm-wise and ext4-wise).

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: Incompatibility of internal bitmap with ext4 barriers?
From: Jason Tinker @ 2011-05-17 18:43 UTC
  To: Roman Mamedov; +Cc: linux-raid
In-Reply-To: <20110517191746.556b10fa@natsu>

2011/5/17 Roman Mamedov <rm@romanrm.ru>:
> On Tue, 17 May 2011 16:10:07 +0400
> Jason Tinker <jsntinker@gmail.com> wrote:
>
> The same combination works perfectly for me on 2.6.38.

Do you have RAID5 over RAID0's or an ordinary RAID5? On my box
ordinary RAID5 works perfectly too.

>
> How about a newer kernel, maybe?

This is out of the question, unfortunately. Compiling my own kernel on
top of RHEL would kind of defeat the purpose of having a safe,
low-maintenance distro. And both stable Debian and Ubuntu LTS have the
same kernel.

If your configuration is also RAID5 over RAID0 and it works fine on
2.6.38 then I'll just wait for next ubuntu lts...


* Re: Incompatibility of internal bitmap with ext4 barriers?
From: Roman Mamedov @ 2011-05-17 13:17 UTC
  To: Jason Tinker; +Cc: linux-raid
In-Reply-To: <BANLkTin9h+feRNWg1pWFq1t6dmBpn9keKw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 934 bytes --]

On Tue, 17 May 2011 16:10:07 +0400
Jason Tinker <jsntinker@gmail.com> wrote:

> I have encountered a weird bug when trying to use ext4 partition on
> top of mdadm RAID5 array. Mdadm array has internal bitmap, and ext4 is
> mounted with default options - which means that barriers are enabled.

The same combination works perfectly for me on 2.6.38.

> When trying to write a large enough number of files, the system just locks up
> indefinitely, only hard reset helps. This seems to happen at random
> times, yet consistently after several minutes of usage.
> I tried different configuration options and it seems to happen only
> when both barriers on ext4 and mdadm's internal bitmaps are enabled.
> After disabling ext4 barriers for good (it seemed like a lesser evil)
> no lock ups happened for 3 months.
> Mdadm is version 3.1.3, kernel is 2.6.32 (rhel6)

How about a newer kernel, maybe?

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: repartitioning disks
From: CoolCold @ 2011-05-17 12:31 UTC
  To: NeilBrown; +Cc: Linux RAID
In-Reply-To: <20110517203944.65d8114e@notabene.brown>

On Tue, May 17, 2011 at 2:39 PM, NeilBrown <neilb@suse.de> wrote:
> On Tue, 17 May 2011 12:20:50 +0400 CoolCold <coolthecold@gmail.com> wrote:
>
>> Wiki says: "Never NEVER never re-partition disks that are part of a
>> running RAID. If you must alter the partition table on a disk which is
>> a part of a RAID, stop the array first, then repartition. " -
>> https://raid.wiki.kernel.org/index.php/Tweaking,_tuning_and_troubleshooting#Pitfalls
>>
>> Is it really true for situations like - I have 2x1Tb drives, which are
>> already partitioned like /dev/sd{a,b}1 - 500mb, /boot & /dev/sd{a,b}2
>> - 20gb, / and are assembled in RAID1 arrays md0 & md1 accordingly. So,
>> if I want to create one more RAID1 array , say md3 from the rest of
>> the drives.
>> So i take my cfdisk ,add new partition with some space 100-150mb from
>> the end, do write changes & partprobe the drives, then creating new
>> array.
>>
>> Is it bad? To be honest i'm doing this all the time and can't
>> understand how this gonna hurt md. Neil and/or others, please clarify
>> this.
>>
>>
>
> There shouldn't be any problem with that as long as you are careful (and if
> you aren't careful, there are plenty of other ways to destroy your data).
>
> I wasn't aware of partprobe.    Just telling the kernel to reread the
> partition table won't work when a partition is in use.
> But partprobe seems to just tell the kernel about the partitions that have
> changed, using a different ioctl, and that seem to work.
Hmm... when I'm adding new partitions (or even deleting partitions which
are not part of any array), I'm not changing the existing ones, which
are part of arrays, so even without partprobe this action should be
OK, as it doesn't do anything to md's metadata?
Changing a partition which is part of a running array would be a bad
thing; that is clear to me.

>
> NeilBrown
>



-- 
Best regards,
[COOLCOLD-RIPN]


* Incompatibility of internal bitmap with ext4 barriers?
From: Jason Tinker @ 2011-05-17 12:10 UTC
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3639 bytes --]

Hello

I have encountered a weird bug when trying to use ext4 partition on
top of mdadm RAID5 array. Mdadm array has internal bitmap, and ext4 is
mounted with default options - which means that barriers are enabled.
When trying to write a large enough number of files, the system just locks up
indefinitely, only hard reset helps. This seems to happen at random
times, yet consistently after several minutes of usage.
I tried different configuration options and it seems to happen only
when both barriers on ext4 and mdadm's internal bitmaps are enabled.
After disabling ext4 barriers for good (it seemed like a lesser evil)
no lock ups happened for 3 months.
Mdadm is version 3.1.3, kernel is 2.6.32 (rhel6)

Initially it happened on this configuration:
1 RAID0 of 2x1TB drives + 1 RAID0 of 2x1TB drives +1x2TB drive
Each array had 1 hdd on PCI SATA (SiliconImage) controller and 1 on
internal ICH7 (Intel G41 chipset), 2Tb drive was on PCI SATA
controller.

Later I successfully reproduced the same bug in a test setup with all
partitions on a single drive.
The following steps were taken:
1. Created 6 identical blank partitions (I made 6*1GB partitions on a
single hdd).
2. Created 3 RAID0 arrays from these partitions: 0&1, 2&3, 4&5.
3. Created MBR and blank primary partition on each of these arrays
using fdisk (this step is probably optional).
4. Created 1 RAID5 array from these 3 partitions with --bitmap=internal,
everything else default.
5. Created ext4 filesystem on RAID5 with default options, mount with default
options (barriers are enabled by default).
6. Tried to rsync several hundred ~1-20 MB files to the mounted directory.

/var/log/messages at the moment of lock up:

kernel: INFO: task md90_raid5:13736 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: md90_raid5    D 0000000000000000     0 13736      2 0x00000080
kernel: ffff88003e0b7ab0 0000000000000046 ffff88006d273e60 ffff8800716b8240
kernel: ffff88003e0b7a50 ffffffff8123a553 ffff880072ea4800 0000000000000810
kernel: ffff88006f50ba98 ffff88003e0b7fd8 0000000000010518 ffff88006f50ba98
kernel: Call Trace:
kernel: [<ffffffff8123a553>] ? elv_insert+0x133/0x1f0
kernel: [<ffffffff810920ce>] ? prepare_to_wait+0x4e/0x80
kernel: [<ffffffff813d0535>] md_make_request+0x85/0x230
kernel: [<ffffffff81091de0>] ? autoremove_wake_function+0x0/0x40
kernel: [<ffffffff81241652>] ? generic_make_request+0x1b2/0x4f0
kernel: [<ffffffff81241652>] generic_make_request+0x1b2/0x4f0
kernel: [<ffffffff8106333a>] ? find_busiest_group+0x96a/0xb40
kernel: [<ffffffffa03d8d9d>] ops_run_io+0x22d/0x330 [raid456]
kernel: [<ffffffff813d1ef6>] ? md_super_write+0xd6/0xe0
kernel: [<ffffffffa03db9f5>] handle_stripe+0x4d5/0x22e0 [raid456]
kernel: [<ffffffff81059db2>] ? finish_task_switch+0x42/0xd0
kernel: [<ffffffffa03ddc9f>] raid5d+0x49f/0x690 [raid456]
kernel: [<ffffffff813d182c>] md_thread+0x5c/0x130
kernel: [<ffffffff81091de0>] ? autoremove_wake_function+0x0/0x40
kernel: [<ffffffff813d17d0>] ? md_thread+0x0/0x130
kernel: [<ffffffff81091a76>] kthread+0x96/0xa0
kernel: [<ffffffff810141ca>] child_rip+0xa/0x20
kernel: [<ffffffff810919e0>] ? kthread+0x0/0xa0
kernel: [<ffffffff810141c0>] ? child_rip+0x0/0x20

I have included all additional info and logs about test setup in
separate attachments.
According to logs it seems that the bug is in mdadm, but I'm not sure
since I haven't found any similar reports anywhere.
It would be great if someone tried to reproduce it on their machine,
this shouldn't take long, maybe 20 minutes or so...
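
The six reproduction steps listed above can be sketched as a dry-run
script. The md device names (md90, md910 to md912) and the mount point
are taken from the attached log; which sdb partitions are paired into
which RAID0 array is an assumption based on the attached fdisk listing,
and the run() helper and DRY_RUN variable are illustrative. Step 3
(partitioning each RAID0 array with fdisk) is interactive and is left as
a comment.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands unless DRY_RUN=0 is set.
# Do not run for real on devices that hold data.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce_lockup() {
    # Steps 1-2: three RAID0 arrays from six ~1GB partitions
    # (the pairing of partitions is assumed, not stated in the report).
    run mdadm --create /dev/md910 --level=0 --raid-devices=2 /dev/sdb7 /dev/sdb8 || return 1
    run mdadm --create /dev/md911 --level=0 --raid-devices=2 /dev/sdb9 /dev/sdb10 || return 1
    run mdadm --create /dev/md912 --level=0 --raid-devices=2 /dev/sdb11 /dev/sdb12 || return 1
    # Step 3: create one primary partition on each RAID0 array with fdisk
    # (done by hand; yields md910p1, md911p1 and md912p1).
    # Step 4: RAID5 with an internal bitmap over those partitions.
    run mdadm --create /dev/md90 --level=5 --raid-devices=3 --bitmap=internal \
        /dev/md910p1 /dev/md911p1 /dev/md912p1 || return 1
    # Step 5: ext4 with default options (barriers enabled by default).
    run mkfs.ext4 /dev/md90 || return 1
    run mount /dev/md90 /mnt/test-raid || return 1
    # Step 6: copy several hundred files onto the array.
    run rsync -av /somewhere /mnt/test-raid
}

reproduce_lockup
```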

[-- Attachment #2: fdisk-info --]
[-- Type: application/octet-stream, Size: 1116 bytes --]

Disk /dev/sdb: 1000.2 GB, 1000203804160 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953523055 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000c1b3

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *        2048     1050623      524288   83  Linux
/dev/sdb2         1050624    63965183    31457280   83  Linux
/dev/sdb3        63965184   126879743    31457280   83  Linux
/dev/sdb4       126879744  1953521663   913320960    5  Extended
/dev/sdb5       126883840   189798399    31457280   83  Linux
/dev/sdb6       189800448   198189055     4194304   82  Linux swap / Solaris
/dev/sdb7       198193968   200137769      971901   83  Linux
/dev/sdb8       200137833   202097699      979933+  83  Linux
/dev/sdb9       202097763   204057629      979933+  83  Linux
/dev/sdb10      204057693   206017559      979933+  83  Linux
/dev/sdb11      206017623   207977489      979933+  83  Linux
/dev/sdb12      207977553   209937419      979933+  83  Linux


[-- Attachment #3: messages --]
[-- Type: application/octet-stream, Size: 17016 bytes --]

Feb  8 19:04:24 store-el6 kernel: md: bind<md910p1>
Feb  8 19:04:24 store-el6 kernel: md: bind<md911p1>
Feb  8 19:04:24 store-el6 kernel: md: bind<md912p1>
Feb  8 19:04:24 store-el6 kernel: raid5: device md911p1 operational as raid disk 1
Feb  8 19:04:24 store-el6 kernel: raid5: device md910p1 operational as raid disk 0
Feb  8 19:04:24 store-el6 kernel: raid5: allocated 3230kB for md90
Feb  8 19:04:24 store-el6 kernel: 1: w=1 pa=0 pr=3 m=1 a=2 r=3 op1=0 op2=0
Feb  8 19:04:24 store-el6 kernel: 0: w=2 pa=0 pr=3 m=1 a=2 r=3 op1=0 op2=0
Feb  8 19:04:24 store-el6 kernel: raid5: raid level 5 set md90 active with 2 out of 3 devices, algorithm 2
Feb  8 19:04:24 store-el6 kernel: RAID5 conf printout:
Feb  8 19:04:24 store-el6 kernel: --- rd:3 wd:2
Feb  8 19:04:24 store-el6 kernel: disk 0, o:1, dev:md910p1
Feb  8 19:04:24 store-el6 kernel: disk 1, o:1, dev:md911p1
Feb  8 19:04:24 store-el6 kernel: md90: bitmap initialized from disk: read 1/1 pages, set 30 bits
Feb  8 19:04:24 store-el6 kernel: created bitmap (1 pages) for device md90
Feb  8 19:04:24 store-el6 kernel: md90: detected capacity change from 0 to 3986685952
Feb  8 19:04:24 store-el6 kernel: RAID5 conf printout:
Feb  8 19:04:24 store-el6 kernel: --- rd:3 wd:2
Feb  8 19:04:24 store-el6 kernel: disk 0, o:1, dev:md910p1
Feb  8 19:04:24 store-el6 kernel: disk 1, o:1, dev:md911p1
Feb  8 19:04:24 store-el6 kernel: disk 2, o:1, dev:md912p1
Feb  8 19:04:24 store-el6 kernel: md: recovery of RAID array md90
Feb  8 19:04:24 store-el6 kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Feb  8 19:04:24 store-el6 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Feb  8 19:04:24 store-el6 kernel: md: using 128k window, over a total of 1946624 blocks.
Feb  8 19:04:25 store-el6 kernel: md90: unknown partition table
Feb  8 19:05:50 store-el6 kernel: EXT4-fs (md90): mounted filesystem with ordered data mode
Feb  8 19:06:31 store-el6 kernel: md: md90: recovery done.
Feb  8 19:06:32 store-el6 kernel: RAID5 conf printout:
Feb  8 19:06:32 store-el6 kernel: --- rd:3 wd:3
Feb  8 19:06:32 store-el6 kernel: disk 0, o:1, dev:md910p1
Feb  8 19:06:32 store-el6 kernel: disk 1, o:1, dev:md911p1
Feb  8 19:06:32 store-el6 kernel: disk 2, o:1, dev:md912p1
Feb  8 19:06:35 store-el6 kernel: EXT4-fs (md90): mounted filesystem with ordered data mode

#
# here i started "rsync -av /somewhere /mnt/test-raid" in gnome-terminal
#

Feb  8 19:09:39 store-el6 kernel: INFO: task gnome-terminal:5463 blocked for more than 120 seconds.
Feb  8 19:09:39 store-el6 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  8 19:09:39 store-el6 kernel: gnome-termina D ffff8800413ed998     0  5463   5459 0x00000080
Feb  8 19:09:39 store-el6 kernel: ffff8800413ed8b8 0000000000000086 ffff88004bd2e138 ffff88004bd2e138
Feb  8 19:09:39 store-el6 kernel: ffff880001e96980 ffff880001e96980 ffff880066e48080 ffff880001e96980
Feb  8 19:09:39 store-el6 kernel: ffff880066e48638 ffff8800413edfd8 0000000000010518 ffff880066e48638
Feb  8 19:09:39 store-el6 kernel: Call Trace:
Feb  8 19:09:39 store-el6 kernel: [<ffffffff814c9ad5>] schedule_timeout+0x225/0x2f0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8101ae45>] ? native_sched_clock+0x15/0x70
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8101a4c9>] ? sched_clock+0x9/0x10
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81097f25>] ? sched_clock_local+0x25/0x90
Feb  8 19:09:39 store-el6 kernel: [<ffffffff814c9743>] wait_for_common+0x123/0x180
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8105c530>] ? default_wake_function+0x0/0x20
Feb  8 19:09:39 store-el6 kernel: [<ffffffff814c985d>] wait_for_completion+0x1d/0x20
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8108d107>] flush_work+0x77/0xc0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8108cb90>] ? wq_barrier_func+0x0/0x20
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8108d324>] flush_delayed_work+0x54/0x70
Feb  8 19:09:39 store-el6 kernel: [<ffffffff812fd5d5>] tty_flush_to_ldisc+0x15/0x20
Feb  8 19:09:39 store-el6 kernel: [<ffffffff812f8117>] n_tty_poll+0x67/0x1d0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff812f3baa>] tty_poll+0x8a/0xa0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff811824ab>] do_sys_poll+0x29b/0x520
Feb  8 19:09:39 store-el6 kernel: [<ffffffff811820c0>] ? __pollwait+0x0/0xf0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff811821b0>] ? pollwake+0x0/0x60
Feb  8 19:09:39 store-el6 kernel: [<ffffffff811821b0>] ? pollwake+0x0/0x60
Feb  8 19:09:39 store-el6 kernel: [<ffffffff811821b0>] ? pollwake+0x0/0x60
Feb  8 19:09:39 store-el6 kernel: [<ffffffff811821b0>] ? pollwake+0x0/0x60
Feb  8 19:09:39 store-el6 kernel: [<ffffffff811821b0>] ? pollwake+0x0/0x60
Feb  8 19:09:39 store-el6 kernel: [<ffffffff811821b0>] ? pollwake+0x0/0x60
Feb  8 19:09:39 store-el6 kernel: [<ffffffff811821b0>] ? pollwake+0x0/0x60
Feb  8 19:09:39 store-el6 kernel: [<ffffffff811821b0>] ? pollwake+0x0/0x60
Feb  8 19:09:39 store-el6 kernel: [<ffffffff811821b0>] ? pollwake+0x0/0x60
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8120c6cf>] ? selinux_file_permission+0xbf/0x150
Feb  8 19:09:39 store-el6 kernel: [<ffffffff811ffb76>] ? security_file_permission+0x16/0x20
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8116d9d1>] ? vfs_read+0x181/0x1a0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8118292c>] sys_poll+0x7c/0x110
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
Feb  8 19:09:39 store-el6 kernel: INFO: task md90_raid5:13736 blocked for more than 120 seconds.
Feb  8 19:09:39 store-el6 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  8 19:09:39 store-el6 kernel: md90_raid5    D 0000000000000000     0 13736      2 0x00000080
Feb  8 19:09:39 store-el6 kernel: ffff88003e0b7ab0 0000000000000046 ffff88006d273e60 ffff8800716b8240
Feb  8 19:09:39 store-el6 kernel: ffff88003e0b7a50 ffffffff8123a553 ffff880072ea4800 0000000000000810
Feb  8 19:09:39 store-el6 kernel: ffff88006f50ba98 ffff88003e0b7fd8 0000000000010518 ffff88006f50ba98
Feb  8 19:09:39 store-el6 kernel: Call Trace:
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8123a553>] ? elv_insert+0x133/0x1f0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff810920ce>] ? prepare_to_wait+0x4e/0x80
Feb  8 19:09:39 store-el6 kernel: [<ffffffff813d0535>] md_make_request+0x85/0x230
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81091de0>] ? autoremove_wake_function+0x0/0x40
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81241652>] ? generic_make_request+0x1b2/0x4f0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81241652>] generic_make_request+0x1b2/0x4f0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8106333a>] ? find_busiest_group+0x96a/0xb40
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa03d8d9d>] ops_run_io+0x22d/0x330 [raid456]
Feb  8 19:09:39 store-el6 kernel: [<ffffffff813d1ef6>] ? md_super_write+0xd6/0xe0
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa03db9f5>] handle_stripe+0x4d5/0x22e0 [raid456]
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81059db2>] ? finish_task_switch+0x42/0xd0
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa03ddc9f>] raid5d+0x49f/0x690 [raid456]
Feb  8 19:09:39 store-el6 kernel: [<ffffffff813d182c>] md_thread+0x5c/0x130
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81091de0>] ? autoremove_wake_function+0x0/0x40
Feb  8 19:09:39 store-el6 kernel: [<ffffffff813d17d0>] ? md_thread+0x0/0x130
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81091a76>] kthread+0x96/0xa0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff810141ca>] child_rip+0xa/0x20
Feb  8 19:09:39 store-el6 kernel: [<ffffffff810919e0>] ? kthread+0x0/0xa0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff810141c0>] ? child_rip+0x0/0x20
Feb  8 19:09:39 store-el6 kernel: INFO: task flush-9:90:13883 blocked for more than 120 seconds.
Feb  8 19:09:39 store-el6 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  8 19:09:39 store-el6 kernel: flush-9:90    D 0000000000000001     0 13883      2 0x00000080
Feb  8 19:09:39 store-el6 kernel: ffff88003e16b6d0 0000000000000046 ffff88003e16b6a0 ffff880075384dd8
Feb  8 19:09:39 store-el6 kernel: ffff880070efa618 ffffffffa01e5a00 ffff880073744610 0000000000000000
Feb  8 19:09:39 store-el6 kernel: ffff88005023d0e8 ffff88003e16bfd8 0000000000010518 ffff88005023d0e8
Feb  8 19:09:39 store-el6 kernel: Call Trace:
Feb  8 19:09:39 store-el6 kernel: [<ffffffff810920ce>] ? prepare_to_wait+0x4e/0x80
Feb  8 19:09:39 store-el6 kernel: [<ffffffff813d0535>] md_make_request+0x85/0x230
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81091de0>] ? autoremove_wake_function+0x0/0x40
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81241652>] generic_make_request+0x1b2/0x4f0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8110e865>] ? mempool_alloc_slab+0x15/0x20
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81091dcf>] ? wake_up_bit+0x2f/0x40
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81241a1f>] submit_bio+0x8f/0x120
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119d324>] submit_bh+0xf4/0x140
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119f320>] __block_write_full_page+0x1e0/0x3b0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119e9b0>] ? end_buffer_async_write+0x0/0x190
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa01b4c00>] ? noalloc_get_block_write+0x0/0x60 [ext4]
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa01b4c00>] ? noalloc_get_block_write+0x0/0x60 [ext4]
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119fc20>] block_write_full_page_endio+0xe0/0x120
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa01af960>] ? ext4_bh_delay_or_unwritten+0x0/0x30 [ext4]
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119fc75>] block_write_full_page+0x15/0x20
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa01b0ea6>] ext4_writepage+0xd6/0x3a0 [ext4]
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa01b0d2c>] mpage_da_submit_io+0x14c/0x1d0 [ext4]
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa01b6149>] ext4_da_writepages+0x3d9/0x600 [ext4]
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81120bc1>] do_writepages+0x21/0x40
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119656d>] writeback_single_inode+0xdd/0x2c0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119696e>] writeback_sb_inodes+0xce/0x180
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81196ac3>] writeback_inodes_wb+0xa3/0x1a0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81196e5b>] wb_writeback+0x29b/0x3f0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff814c8d96>] ? thread_return+0x4e/0x778
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81197149>] wb_do_writeback+0x199/0x240
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81197253>] bdi_writeback_task+0x63/0x1b0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81091ca7>] ? bit_waitqueue+0x17/0xd0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8112ef70>] ? bdi_start_fn+0x0/0x100
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8112eff6>] bdi_start_fn+0x86/0x100
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8112ef70>] ? bdi_start_fn+0x0/0x100
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81091a76>] kthread+0x96/0xa0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff810141ca>] child_rip+0xa/0x20
Feb  8 19:09:39 store-el6 kernel: [<ffffffff810919e0>] ? kthread+0x0/0xa0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff810141c0>] ? child_rip+0x0/0x20
Feb  8 19:09:39 store-el6 kernel: INFO: task jbd2/md90-8:17803 blocked for more than 120 seconds.
Feb  8 19:09:39 store-el6 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  8 19:09:39 store-el6 kernel: jbd2/md90-8   D 0000000000000002     0 17803      2 0x00000080
Feb  8 19:09:39 store-el6 kernel: ffff8800741d5c10 0000000000000046 ffff8800741d5bb0 ffffffffa03d92e8
Feb  8 19:09:39 store-el6 kernel: ffff880047bc9c00 ffff880047bc9da0 ffff880047bc9c00 ffff880072044200
Feb  8 19:09:39 store-el6 kernel: ffff88003e183a98 ffff8800741d5fd8 0000000000010518 ffff88003e183a98
Feb  8 19:09:39 store-el6 kernel: Call Trace:
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa03d92e8>] ? unplug_slaves+0x98/0xe0 [raid456]
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8109bae9>] ? ktime_get_ts+0xa9/0xe0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119e580>] ? sync_buffer+0x0/0x50
Feb  8 19:09:39 store-el6 kernel: [<ffffffff814c9533>] io_schedule+0x73/0xc0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119e5c0>] sync_buffer+0x40/0x50
Feb  8 19:09:39 store-el6 kernel: [<ffffffff814c9daf>] __wait_on_bit+0x5f/0x90
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119e580>] ? sync_buffer+0x0/0x50
Feb  8 19:09:39 store-el6 kernel: [<ffffffff814c9e58>] out_of_line_wait_on_bit+0x78/0x90
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81091e20>] ? wake_bit_function+0x0/0x50
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119e576>] __wait_on_buffer+0x26/0x30
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa0184801>] jbd2_journal_commit_transaction+0x10d1/0x14e0 [jbd2]
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81091de0>] ? autoremove_wake_function+0x0/0x40
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa018a0b8>] kjournald2+0xb8/0x220 [jbd2]
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81091de0>] ? autoremove_wake_function+0x0/0x40
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa018a000>] ? kjournald2+0x0/0x220 [jbd2]
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81091a76>] kthread+0x96/0xa0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff810141ca>] child_rip+0xa/0x20
Feb  8 19:09:39 store-el6 kernel: [<ffffffff810919e0>] ? kthread+0x0/0xa0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff810141c0>] ? child_rip+0x0/0x20
Feb  8 19:09:39 store-el6 kernel: INFO: task rsync:17880 blocked for more than 120 seconds.
Feb  8 19:09:39 store-el6 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  8 19:09:39 store-el6 kernel: rsync         D 0000000000000100     0 17880  17879 0x00000080
Feb  8 19:09:39 store-el6 kernel: ffff880012625918 0000000000000086 ffff880012625968 ffffffff8111e411
Feb  8 19:09:39 store-el6 kernel: ffff8800000126c0 ffffea0000697188 ffff8800772f5118 0000000000001000
Feb  8 19:09:39 store-el6 kernel: ffff880037db5a58 ffff880012625fd8 0000000000010518 ffff880037db5a58
Feb  8 19:09:39 store-el6 kernel: Call Trace:
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8111e411>] ? __alloc_pages_nodemask+0x111/0x850
Feb  8 19:09:39 store-el6 kernel: [<ffffffff810920ce>] ? prepare_to_wait+0x4e/0x80
Feb  8 19:09:39 store-el6 kernel: [<ffffffff813d0535>] md_make_request+0x85/0x230
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81091de0>] ? autoremove_wake_function+0x0/0x40
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119d5ef>] ? __find_get_block_slow+0xaf/0x130
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81241652>] generic_make_request+0x1b2/0x4f0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8110e865>] ? mempool_alloc_slab+0x15/0x20
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119dcfc>] ? __getblk+0x9c/0x2e0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81241a1f>] submit_bio+0x8f/0x120
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119d324>] submit_bh+0xf4/0x140
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8119ec83>] ll_rw_block+0x143/0x150
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa01baa59>] ext4_find_entry+0x179/0x4c0 [ext4]
Feb  8 19:09:39 store-el6 kernel: [<ffffffffa01baded>] ext4_lookup+0x4d/0x140 [ext4]
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8117ae0b>] do_lookup+0x18b/0x220
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8117b9c5>] __link_path_walk+0x6f5/0x1040
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8117c59a>] path_walk+0x6a/0xe0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8117c76b>] do_path_lookup+0x5b/0xa0
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8116e6a1>] ? get_empty_filp+0xa1/0x170
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8117d6a6>] do_filp_open+0x106/0xd50
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8120c6cf>] ? selinux_file_permission+0xbf/0x150
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81189b82>] ? alloc_fd+0x92/0x160
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8116a1a9>] do_sys_open+0x69/0x140
Feb  8 19:09:39 store-el6 kernel: [<ffffffff8116a2c0>] sys_open+0x20/0x30
Feb  8 19:09:39 store-el6 kernel: [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
Feb  8 19:09:50 store-el6 ntpd[2024]: synchronized to 92.43.184.44, stratum 2

#
# here I was forced to do a hard reset; neither the remote X session nor the local ttys were responding
#

Feb  8 19:11:35 store-el6 kernel: imklog 4.6.2, log source = /proc/kmsg started.
Feb  8 19:11:35 store-el6 rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="1670" x-info="http://www.rsyslog.com"] (re)start
Feb  8 19:11:35 store-el6 kernel: Initializing cgroup subsys cpuset
Feb  8 19:11:35 store-el6 kernel: Initializing cgroup subsys cpu
Feb  8 19:11:35 store-el6 kernel: Linux version 2.6.32-71.14.1.el6.x86_64 (mockbuild@ls20-bc2-14.build.redhat.com) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #1 SMP Wed Jan 5 17:01:01 EST 2011
Feb  8 19:11:35 store-el6 kernel: Command line: ro root=UUID=a59ea22b-6adf-497b-9409-4cf9062c0b99 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto rhgb quiet


[-- Attachment #4: raid-conf --]
[-- Type: application/octet-stream, Size: 651 bytes --]

DEVICE partitions
MAILADDR root

ARRAY /dev/md910 level=raid0 metadata=1.2 num-devices=2 
	UUID=5ebddd81:9a2d0aeb:d55f3c13:e1c532cf name=store-el6:test1
	devices=/dev/sdb8,/dev/sdb7
ARRAY /dev/md911 level=raid0 metadata=1.2 num-devices=2 
	UUID=6a06b35a:b39e97e4:77b26ada:f324e8fa name=store-el6:test2
	devices=/dev/sdb10,/dev/sdb9
ARRAY /dev/md912 level=raid0 metadata=1.2 num-devices=2 
	UUID=65c9e5f3:1594e59e:b41d1d8e:6f2671de name=store-el6:test3
	devices=/dev/sdb12,/dev/sdb11
ARRAY /dev/md90 level=raid5 metadata=1.2 num-devices=3 
	UUID=ae8ba75d:d0415f07:5d60eedd:394f28eb name=store-el6:test
	devices=/dev/md910p1,/dev/md911p1,/dev/md912p1



[-- Attachment #5: raid-details --]
[-- Type: application/octet-stream, Size: 898 bytes --]

/dev/md90:
        Version : 1.2
  Creation Time : Tue Feb  8 19:04:24 2011
     Raid Level : raid5
     Array Size : 3893248 (3.71 GiB 3.99 GB)
  Used Dev Size : 1946624 (1901.32 MiB 1993.34 MB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Feb  8 19:32:28 2011
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : store-el6:test  (local to host store-el6)
           UUID : 86914486:82756172:cf377f53:95c77f0a
         Events : 46

    Number   Major   Minor   RaidDevice State
       0     259        2        0      active sync   /dev/md910p1
       1     259        3        1      active sync   /dev/md911p1
       3     259        4        2      active sync   /dev/md912p1


[-- Attachment #6: raid-mdstat --]
[-- Type: application/octet-stream, Size: 482 bytes --]

Personalities : [raid0] [raid6] [raid5] [raid4] 
md90 : active raid5 md910p1[0] md912p1[3] md911p1[1]
      3893248 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md912 : active raid0 sdb12[1] sdb11[0]
      1956864 blocks super 1.2 512k chunks
      
md910 : active raid0 sdb7[0] sdb8[1]
      1949184 blocks super 1.2 512k chunks
      
md911 : active raid0 sdb9[0] sdb10[1]
      1956864 blocks super 1.2 512k chunks
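
The nesting shown in this mdstat output (one RAID-5 built from partitions on three RAID-0 arrays) is easier to see when each array is reduced to one summary line. A minimal sketch, assuming mdstat-style text like the attachment above; the function name list_arrays is made up for illustration, and arrays whose state is not plain "active" (for example inactive ones) are simply skipped:

```shell
#!/bin/sh
# Sketch only: print "name level member-count" for each active md array.
# Reads the file given as $1, or /proc/mdstat when no argument is given.
list_arrays() {
    awk '$2 == ":" && $3 == "active" {
        # Fields are: name : active level member[n] member[n] ...
        printf "%s %s %d\n", $1, $4, NF - 4
    }' "${1:-/proc/mdstat}"
}
```

Saved to a file and sourced, "list_arrays raid-mdstat" would be run against a saved copy of the attachment, and plain "list_arrays" against the live /proc/mdstat.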



* Re: repartitioning disks
From: NeilBrown @ 2011-05-17 10:39 UTC (permalink / raw)
  To: CoolCold; +Cc: Linux RAID
In-Reply-To: <BANLkTim5A1HEiPrRMxiUFZ4PGuyc4pbG-Q@mail.gmail.com>

On Tue, 17 May 2011 12:20:50 +0400 CoolCold <coolthecold@gmail.com> wrote:

> Wiki says: "Never NEVER never re-partition disks that are part of a
> running RAID. If you must alter the partition table on a disk which is
> a part of a RAID, stop the array first, then repartition. " -
> https://raid.wiki.kernel.org/index.php/Tweaking,_tuning_and_troubleshooting#Pitfalls
> 
> Is this really true in a situation like mine? I have 2x1TB drives, already
> partitioned as /dev/sd{a,b}1 (500MB, /boot) and /dev/sd{a,b}2 (20GB, /),
> which are assembled into RAID1 arrays md0 and md1 respectively. Now I want
> to create one more RAID1 array, say md3, from the rest of the drives.
> So I take cfdisk, add a new partition, leaving some 100-150MB of space at
> the end, write the changes, run partprobe on the drives, and then create
> the new array.
> 
> Is that bad? To be honest, I do this all the time and can't see how it
> is going to hurt md. Neil and/or others, please clarify this.
> 
> 

There shouldn't be any problem with that as long as you are careful (and if
you aren't careful, there are plenty of other ways to destroy your data).

I wasn't aware of partprobe.  Just telling the kernel to reread the
partition table won't work when a partition is in use.
But partprobe seems to just tell the kernel about the partitions that have
changed, using a different ioctl, and that seems to work.

NeilBrown
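
The difference described above can be sketched as two commands. This is only an illustration, not something taken from the thread: /dev/sdX is a placeholder, the helper names are made up, and DRY_RUN=1 (the default here) prints the commands instead of running them. blockdev --rereadpt asks for a full re-read of the partition table, which the kernel refuses while any partition on the disk is in use; partprobe is the incremental route CoolCold mentions.

```shell
#!/bin/sh
# Sketch only: full partition-table re-read versus incremental update.
# DISK is a placeholder; with DRY_RUN=1 nothing is actually executed.
DISK="${DISK:-/dev/sdX}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Full re-read (BLKRRPART ioctl): fails with "Device or resource busy"
# while any partition on the disk is open, e.g. as a member of an md array.
reread_all() {
    run blockdev --rereadpt "$1"
}

# Incremental update: only the changed partitions are reported to the
# kernel, so partitions that are in use are left alone.
update_changed() {
    run partprobe "$1"
}

reread_all "$DISK"
update_changed "$DISK"
```

Comparing /proc/partitions before and after is a simple way to check that the kernel picked up the new partition.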


* repartitioning disks
From: CoolCold @ 2011-05-17  8:20 UTC (permalink / raw)
  To: Linux RAID

Wiki says: "Never NEVER never re-partition disks that are part of a
running RAID. If you must alter the partition table on a disk which is
a part of a RAID, stop the array first, then repartition. " -
https://raid.wiki.kernel.org/index.php/Tweaking,_tuning_and_troubleshooting#Pitfalls

Is this really true in a situation like mine? I have 2x1TB drives, already
partitioned as /dev/sd{a,b}1 (500MB, /boot) and /dev/sd{a,b}2 (20GB, /),
which are assembled into RAID1 arrays md0 and md1 respectively. Now I want
to create one more RAID1 array, say md3, from the rest of the drives.
So I take cfdisk, add a new partition, leaving some 100-150MB of space at
the end, write the changes, run partprobe on the drives, and then create
the new array.

Is that bad? To be honest, I do this all the time and can't see how it
is going to hurt md. Neil and/or others, please clarify this.


-- 
Best regards,
[COOLCOLD-RIPN]



* Best way to create RAID-6 for swap partition - existing one failed
From: Gavin Flower @ 2011-05-16 21:41 UTC (permalink / raw)
  To: linux-raid; +Cc: neilb, mb

Hi,

Motivation: my existing RAID-6 swap partition failed.  I am thinking I should recreate it in a new format, as it is currently 'Version : 0.90', rather than simply rebuild it.

So, 3 questions:

(1) What further diagnostics should I run first, if any?  Note that I am currently running badblocks on the drive that dropped out, and I have put the existing diagnostic info at the end of this email.


(2) What is the most appropriate RAID-6 format for a swap partition, keeping the same number of drives and overall capacity?

(3) How do I convert the existing /dev/md0 to the new format?


Cheers,
Gavin


# grep md0 /var/log/messages

May 16 04:05:47 saturn kernel: [    3.658644] md: md0 stopped.
May 16 04:05:47 saturn kernel: [    3.933910] md/raid:md0: not clean -- starting background reconstruction
May 16 04:05:47 saturn kernel: [    3.937796] md/raid:md0: device sda3 operational as raid disk 0
May 16 04:05:47 saturn kernel: [    3.941540] md/raid:md0: device sdb3 operational as raid disk 4
May 16 04:05:47 saturn kernel: [    3.945161] md/raid:md0: device sdd3 operational as raid disk 3
May 16 04:05:47 saturn kernel: [    3.948706] md/raid:md0: device sdc3 operational as raid disk 2
May 16 04:05:47 saturn kernel: [    3.953408] md/raid:md0: allocated 5334kB
May 16 04:05:47 saturn kernel: [    3.956939] md/raid:md0: cannot start dirty degraded array.
May 16 04:05:47 saturn kernel: [    3.961082] md/raid:md0: failed to run raid set.
May 16 04:05:47 saturn kernel: [    3.968237] dracut: mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
May 16 04:05:47 saturn kernel: [    4.239948] dracut: mdadm: /dev/md0 is already in use.
May 16 04:05:47 saturn kernel: [    4.340048] dracut: mdadm: /dev/md0 is already in use.
May 16 04:08:28 saturn kernel: [    3.038486] md: md0 stopped.
May 16 04:08:28 saturn kernel: [    3.205219] md/raid:md0: not clean -- starting background reconstruction
May 16 04:08:28 saturn kernel: [    3.206711] md/raid:md0: device sda3 operational as raid disk 0
May 16 04:08:28 saturn kernel: [    3.208501] md/raid:md0: device sdb3 operational as raid disk 4
May 16 04:08:28 saturn kernel: [    3.210254] md/raid:md0: device sdd3 operational as raid disk 3
May 16 04:08:28 saturn kernel: [    3.211979] md/raid:md0: device sdc3 operational as raid disk 2
May 16 04:08:28 saturn kernel: [    3.214179] md/raid:md0: allocated 5334kB
May 16 04:08:28 saturn kernel: [    3.215917] md/raid:md0: cannot start dirty degraded array.
May 16 04:08:28 saturn kernel: [    3.217880] md/raid:md0: failed to run raid set.
May 16 04:08:28 saturn kernel: [    3.221377] dracut: mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
May 16 04:08:28 saturn kernel: [    3.425089] dracut: mdadm: /dev/md0 is already in use.
May 16 04:08:28 saturn kernel: [    4.118667] dracut: mdadm: /dev/md0 is already in use.
May 17 00:58:12 saturn kernel: [    3.006195] md: md0 stopped.
May 17 00:58:12 saturn kernel: [    3.174154] md/raid:md0: not clean -- starting background reconstruction
May 17 00:58:12 saturn kernel: [    3.175688] md/raid:md0: device sda3 operational as raid disk 0
May 17 00:58:12 saturn kernel: [    3.177218] md/raid:md0: device sdb3 operational as raid disk 4
May 17 00:58:12 saturn kernel: [    3.178717] md/raid:md0: device sdd3 operational as raid disk 3
May 17 00:58:12 saturn kernel: [    3.180196] md/raid:md0: device sdc3 operational as raid disk 2
May 17 00:58:12 saturn kernel: [    3.182161] md/raid:md0: allocated 5334kB
May 17 00:58:12 saturn kernel: [    3.183976] md/raid:md0: cannot start dirty degraded array.
May 17 00:58:12 saturn kernel: [    3.186002] md/raid:md0: failed to run raid set.
May 17 00:58:12 saturn kernel: [    3.189615] dracut: mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
May 17 00:58:12 saturn kernel: [    3.540474] dracut: mdadm: /dev/md0 is already in use.
May 17 00:58:12 saturn kernel: [    3.614348] dracut: mdadm: /dev/md0 is already in use.

# 


# grep md0 /var/log/messages-20110515

[…]
May 12 03:05:12 saturn kernel: [132994.557873] md: delaying data-check of md0 until md1 has finished (they share one or more physical units)
May 12 03:24:41 saturn kernel: [134160.574564] md: data-check of RAID array md0
May 12 03:25:22 saturn kernel: [134202.299274] md: md0: data-check done.
May 13 00:15:00 saturn kernel: [    3.046117] md: md0 stopped.
May 13 00:15:00 saturn kernel: [    3.208950] md/raid:md0: device sda3 operational as raid disk 0
May 13 00:15:00 saturn kernel: [    3.210743] md/raid:md0: device sdb3 operational as raid disk 4
May 13 00:15:00 saturn kernel: [    3.212501] md/raid:md0: device sdd3 operational as raid disk 3
May 13 00:15:00 saturn kernel: [    3.214246] md/raid:md0: device sdc3 operational as raid disk 2
May 13 00:15:00 saturn kernel: [    3.215974] md/raid:md0: device sde3 operational as raid disk 1
May 13 00:15:00 saturn kernel: [    3.218201] md/raid:md0: allocated 5334kB
May 13 00:15:00 saturn kernel: [    3.219955] md/raid:md0: raid level 6 active with 5 out of 5 devices, algorithm 2
May 13 00:15:00 saturn kernel: [    3.221442] md0: detected capacity change from 0 to 11009851392
May 13 00:15:00 saturn kernel: [    3.223285] dracut: mdadm: /dev/md0 has been started with 5 drives.
May 13 00:15:00 saturn kernel: [    3.223971]  md0: unknown partition table
May 13 00:15:00 saturn kernel: [   12.055465] Adding 10751804k swap on /dev/md0.  Priority:-1 extents:1 across:10751804k


# date ; cat /proc/mdstat
Tue May 17 08:52:16 NZST 2011
Personalities : [raid6] [raid5] [raid4] 
md2 : active raid6 sda4[0] sdc4[6] sdd4[3] sdb4[5] sde4[1]
      1114745856 blocks super 1.1 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 2/3 pages [8KB], 65536KB chunk

md1 : active raid6 sda2[0] sdc2[4] sdd2[3] sde2[2] sdb2[1]
      307198464 blocks level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]

md0 : inactive sda3[0] sdb3[4] sdd3[3] sdc3[2]
      14335744 blocks

unused devices: <none>


# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Thu Dec  3 13:05:42 2009
     Raid Level : raid6
  Used Dev Size : 3583936 (3.42 GiB 3.67 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon May 16 03:56:48 2011
          State : active, degraded, Not Started
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 3b76ac20:8253f696:bfe78010:bc810f04
         Events : 0.11171

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       0        0        1      removed
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
       4       8       19        4      active sync   /dev/sdb3


# mdadm --stop /dev/md0
mdadm: stopped /dev/md0

# date ; cat /proc/mdstat
Tue May 17 09:04:49 NZST 2011
Personalities : [raid6] [raid5] [raid4] 
md2 : active raid6 sda4[0] sdc4[6] sdd4[3] sdb4[5] sde4[1]
      1114745856 blocks super 1.1 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 2/3 pages [8KB], 65536KB chunk

md1 : active raid6 sda2[0] sdc2[4] sdd2[3] sde2[2] sdb2[1]
      307198464 blocks level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>

# 
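
For question (3) above, one possible approach is sketched below. It is an assumption, not advice given in this thread: since swap holds nothing that must survive a reboot, the array could simply be created afresh with v1.2 metadata instead of being converted. The member list is taken from the output above and includes sde3, which would first have to pass the badblocks run; DRY_RUN=1 (the default here) only prints the commands. The usable size may differ slightly from the 0.90 array, because v1.2 keeps its metadata near the start of each member.

```shell
#!/bin/sh
# Sketch only: recreate a swap-only RAID-6 with v1.2 metadata.
# MD and MEMBERS are assumptions based on the output above.
# With DRY_RUN=1 nothing is actually executed.
MD="${MD:-/dev/md0}"
MEMBERS="${MEMBERS:-/dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recreate_swap_array() {
    # Split the member list into positional parameters so it can be counted.
    set -- $MEMBERS
    run mdadm --create "$MD" --metadata=1.2 --level=6 \
        --raid-devices="$#" "$@" || return 1
    run mkswap "$MD" || return 1
    run swapon "$MD"
}

recreate_swap_array
```

A freshly created array gets a new UUID, so any reference to the old one in mdadm.conf or /etc/fstab would need updating.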


--
All Adults share the Responsibility
to help Raise Today's Children,
for they are Tomorrow's Society!
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] RAID-6 check standalone suspend array V2.0
From: NeilBrown @ 2011-05-16 10:08 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid
In-Reply-To: <20110515211515.GA30260@lazy.lzy>

On Sun, 15 May 2011 23:15:15 +0200 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> Hi Neil,
> 
> reminder for the suspend patch.
> 
> Thank you so much for the code review.
> 
> I modified it in order to fix, hopefully, all the flaws.
> 
> New patch attached below.
> 
> Please note that "sigblock()" cannot be used, since it is
> declared, at least on my system, as "deprecated".
> Furthermore, I noticed that "Grow.c" is not checking the
> return value of "sysfs_set_num()" while suspending the
> array, maybe you'll need to look at this.
> 
> Finally, please check the new patch too: while I can
> confirm the software is doing what it is supposed to do,
> I still need support in order to confirm the suspend
> and resume code.
> 
> Thanks again for your help; again, let me know what
> the next expected step is.

That all looks fine, thanks.  I've applied it and pushed it out.

I'm not sure what you mean exactly by the 'next expected step'... 

Thanks,
NeilBrown


> 
> bye,
> 
> --- cut here ---
> 
> diff -uNr a/raid6check.c b/raid6check.c
> --- a/raid6check.c	2011-05-07 20:35:18.693370007 +0200
> +++ b/raid6check.c	2011-05-09 20:32:14.551695036 +0200
> @@ -24,6 +24,8 @@
>  
>  #include "mdadm.h"
>  #include <stdint.h>
> +#include <signal.h>
> +#include <sys/mman.h>
>  
>  int geo_map(int block, unsigned long long stripe, int raid_disks,
>  	    int level, int layout);
> @@ -99,7 +101,7 @@
>  	return curr_broken_disk;
>  }
>  
> -int check_stripes(int *source, unsigned long long *offsets,
> +int check_stripes(struct mdinfo *info, int *source, unsigned long long *offsets,
>  		  int raid_disks, int chunk_size, int level, int layout,
>  		  unsigned long long start, unsigned long long length, char *name[])
>  {
> @@ -115,6 +117,8 @@
>  	int diskP, diskQ;
>  	int data_disks = raid_disks - 2;
>  	int err = 0;
> +	sighandler_t sig[3];
> +	int rv;
>  
>  	extern int tables_ready;
>  
> @@ -139,10 +143,35 @@
>  
>  		printf("pos --> %llu\n", start);
>  
> +		if(mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
> +			err = 2;
> +			goto exitCheck;
> +		}
> +		sig[0] = signal(SIGTERM, SIG_IGN);
> +		sig[1] = signal(SIGINT, SIG_IGN);
> +		sig[2] = signal(SIGQUIT, SIG_IGN);
> +		rv = sysfs_set_num(info, NULL, "suspend_lo", start * chunk_size * data_disks);
> +		rv |= sysfs_set_num(info, NULL, "suspend_hi", (start + 1) * chunk_size * data_disks);
>  		for (i = 0 ; i < raid_disks ; i++) {
>  			lseek64(source[i], offsets[i] + start * chunk_size, 0);
>  			read(source[i], stripes[i], chunk_size);
>  		}
> +		rv |= sysfs_set_num(info, NULL, "suspend_lo", 0x7FFFFFFFFFFFFFFFULL);
> +		rv |= sysfs_set_num(info, NULL, "suspend_hi", 0);
> +		rv |= sysfs_set_num(info, NULL, "suspend_lo", 0);
> +		signal(SIGQUIT, sig[2]);
> +		signal(SIGINT, sig[1]);
> +		signal(SIGTERM, sig[0]);
> +		if(munlockall() != 0) {
> +			err = 3;
> +			goto exitCheck;
> +		}
> +
> +		if(rv != 0) {
> +			err = rv * 256;
> +			goto exitCheck;
> +		}
> +
>  		for (i = 0 ; i < data_disks ; i++) {
>  			int disk = geo_map(i, start, raid_disks, level, layout);
>  			blocks[i] = stripes[disk];
> @@ -214,7 +243,7 @@
>  	unsigned long long start, length;
>  	int i;
>  	int mdfd;
> -	struct mdinfo *info, *comp;
> +	struct mdinfo *info = NULL, *comp = NULL;
>  	char *err = NULL;
>  	int exit_err = 0;
>  	int close_flag = 0;
> @@ -250,6 +279,12 @@
>  			  GET_OFFSET|
>  			  GET_SIZE);
>  
> +	if(info == NULL) {
> +		fprintf(stderr, "%s: Error reading sysfs information of %s\n", prg, argv[1]);
> +		exit_err = 9;
> +		goto exitHere;
> +	}
> +
>  	if(info->array.level != level) {
>  		fprintf(stderr, "%s: %s not a RAID-6\n", prg, argv[1]);
>  		exit_err = 3;
> @@ -343,7 +378,7 @@
>  		comp = comp->next;
>  	}
>  
> -	int rv = check_stripes(fds, offsets,
> +	int rv = check_stripes(info, fds, offsets,
>  			       raid_disks, chunk_size, level, layout,
>  			       start, length, disk_name);
>  	if (rv != 0) {
> 
> --- cut here ---
> 
> bye,
> 
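
The suspend window that the patch above sets around each stripe read can be worked through with concrete numbers. This is only an illustration of the arithmetic in the two sysfs_set_num() calls: the function name and the example values (5 disks, 512 KiB chunk, stripe 10) are made up, and the results are simply the numbers the patch would write to suspend_lo and suspend_hi.

```shell
#!/bin/sh
# Sketch only: the suspend_lo / suspend_hi values the patch computes
# for one stripe, using the same expressions as the C code.
suspend_window() {
    stripe=$1
    chunk_size=$2
    raid_disks=$3
    data_disks=$((raid_disks - 2))      # RAID-6: two parity blocks per stripe
    lo=$((stripe * chunk_size * data_disks))
    hi=$(((stripe + 1) * chunk_size * data_disks))
    echo "$lo $hi"
}

suspend_window 10 524288 5    # prints "15728640 17301504"
```

The window covers exactly one stripe of array data, so writes elsewhere in the array are not held up while the chunks are being read.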



* RE: /dev/md2 stopped after changing SAS controller
From: Leslie Rhorer @ 2011-05-16  9:27 UTC (permalink / raw)
  To: 'Stan Hoeppner', linux-raid
In-Reply-To: <4DD0C059.4010102@hardwarefreak.com>



> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Stan Hoeppner
> Sent: Monday, May 16, 2011 1:13 AM
> To: linux-raid@vger.kernel.org
> Subject: Re: /dev/md2 stopped after changing SAS controller
> 
> On 5/15/2011 9:37 AM, Louis-David Mitterrand wrote:
> 
> > Someone pointed me to ARECA non-raid controllers:
> >
> > http://www.areca.com.tw/products/sasnoneraid3g.htm
> >
> > Does anyone have good or bad experiences with these?
> 
> http://www.newegg.com/Product/Product.aspx?Item=N82E16816151061
> 
> Read the 3 reviews.  Probably best to stay away from the Areca JBOD
> HBAs.  All 4 use a Marvell chip.  Many NewEgg reviews are valuable, as
> in this case.
> 
> Lest anyone denigrate the validity of quoting NewEgg due to their
> "perceived customer base", note that today's NewEgg ships plenty of
> mid/upper range gear (see links below) into corporations and
> universities, including blade chassis/blades, FC and converged switches,
> FC disk arrays, LTO libraries, etc.  They now have a business division
> that competes to a degree with the likes of CDW et al.

	Well, yes, but the reviews are sometimes a different matter.  The
best I can advise (and this is true for any review on any site) is to look
closely at the tone and clarity of each review.  Many are clearly written by
fools who haven't a clue.  Others are better written, but the level of
expertise behind the review may not be as high as one might first think.
Basically, no matter where one might read or see it, I recommend one take
all reviews with a grain of salt.

	That said, I have seen some useful reviews on NewEgg.



* Re: /dev/md2 stopped after changing SAS controller
From: Stan Hoeppner @ 2011-05-16  6:12 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <20110515143718.GA8667@apartia.fr>

On 5/15/2011 9:37 AM, Louis-David Mitterrand wrote:

> Someone pointed me to ARECA non-raid controllers: 
> 
> http://www.areca.com.tw/products/sasnoneraid3g.htm
> 
> Does anyone have good or bad experiences with these?

http://www.newegg.com/Product/Product.aspx?Item=N82E16816151061

Read the 3 reviews.  Probably best to stay away from the Areca JBOD
HBAs.  All 4 use a Marvell chip.  Many NewEgg reviews are valuable, as
in this case.

Lest anyone denigrate the validity of quoting NewEgg due to their
"perceived customer base", note that today's NewEgg ships plenty of
mid/upper range gear (see links below) into corporations and
universities, including blade chassis/blades, FC and converged switches,
FC disk arrays, LTO libraries, etc.  They now have a business division
that competes to a degree with the likes of CDW et al.

http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&IsNodeId=1&Description=hewlett%20packard&bop=And&Order=PRICED&PageSize=100

http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&IsNodeId=1&Description=ibm&bop=And&Order=PRICED&PageSize=100

-- 
Stan


* Re: /dev/md2 stopped after changing SAS controller
From: Stan Hoeppner @ 2011-05-16  5:01 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <4DCFE337.7090306@gmail.com>

On 5/15/2011 9:29 AM, Joe Landman wrote:

> Again, this is why making sure that the bona fides of those making
> recommendations don't begin and end with a google search ...

It would benefit some list members to read entire threads before making
comments such as this, demonstrating a total lack of clue, pettiness,
and a grudge holding personality.

Someone must have wounded your Id deeply for you to take a swipe at that
person in every post you make to this list...

-- 
Stan


* Re: /dev/md2 stopped after changing SAS controller
From: Stan Hoeppner @ 2011-05-16  4:43 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-raid
In-Reply-To: <alpine.DEB.2.00.1105150922510.20305@uplift.swm.pp.se>

On 5/15/2011 2:26 AM, Mikael Abrahamsson wrote:
> On Sat, 14 May 2011, Stan Hoeppner wrote:

>> http://www.newegg.com/Product/Product.aspx?Item=N82E16816101358
>> http://www.supermicro.com/manuals/other/AOC-SASLP-MV8.pdf
>>
>> Simple JBOD only HBA, no fakeRAID.  Uses a Marvell 88SE6480 chip, 8
>> SAS/SATA ports via two SFF8087.  $110 USD.  I've heard minor rumblings
>> WRT the mvsas driver though I don't recall specifics.  The board itself
>> is good quality, as with most things SuperMicro.
> 
> MINOR!??? I've had this card for 1.5 years, and it's still not usable (as far
> as I can discern) as of 2.6.38. Do NOT buy it. Please stop recommending
> this card for Linux use unless it's to your enemies.

Now you get the idea, Mikael.  ;) If the OP is so convinced that LSI
sucks, with ample evidence to the contrary, let him try something that
*everyone* knows sucks.

-- 
Stan


* Re: raid5 reshape failure - restart?
From: Glen Dragon @ 2011-05-15 21:45 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid
In-Reply-To: <20110516073702.6b6b9bb2@notabene.brown>

On Sun, May 15, 2011 at 5:37 PM, NeilBrown <neilb@suse.de> wrote:
> On Sun, 15 May 2011 13:33:28 -0400 Glen Dragon <glen.dragon@gmail.com> wrote:
>
>> In trying to reshape a raid5 array, I encountered some problems.
>> I was trying to reshape from raid5 3->4 devices.  The reshape process
>> started with seemingly no problems; however, I noticed in the kernel log
>> a number of ata3.00: failed command: WRITE FPDMA QUEUED errors.
>> In trying to determine if this was going to be bad for me, I disabled
>> NCQ on this device. Looking at the log, I noticed that around the same time
>> /dev/sdd reported problems and took itself offline.
>> At this point the reshape seemed to be continuing without issue, even
>> though one of the drives was offline. I wasn't sure that this made
>> sense.
>>
>> Shortly after, I noticed that the progress on the reshape had stalled.
>>  I tried changing the stripe_cache_size from 256 to [1024|2048|4096],
>> but the reshape did not resume.  top reported that the reshape process
>> was using 100% of one core, and the load average was climbing into the
>> 50's
>>
>> At this point I rebooted.   The array does not start.
>>
>> Can the reshape be restarted?  I cannot figure out where the backup
>> file ended up.  It does not seem to be where I thought I saved it.
>
> When a reshape is increasing the size of the array the backup file is only
> needed for the first few stripes.  After that it is irrelevant and is removed.
>
> You should be able to simply reassemble the array and it should continue the
> reshape.
>
> What happens when you  try:
>
>  mdadm -S /dev/md_d2
>  mdadm -A /dev/md_d2 /dev/sd[abc]5 -vv
>
> Please report both the messages from mdadm and any new messages in "dmesg" at
> the time.
>
> NeilBrown
>

 # mdadm -S /dev/md_d2
mdadm: stopped /dev/md_d2


 # mdadm -A /dev/md_d2  /dev/sd[abcd]5 -vv
mdadm: looking for devices for /dev/md_d2
mdadm: /dev/sda5 is identified as a member of /dev/md_d2, slot 0.
mdadm: /dev/sdb5 is identified as a member of /dev/md_d2, slot 1.
mdadm: /dev/sdc5 is identified as a member of /dev/md_d2, slot 3.
mdadm: /dev/sdd5 is identified as a member of /dev/md_d2, slot 2.
mdadm:/dev/md_d2 has an active reshape - checking if critical section needs to be restored
mdadm: No backup metadata on device-3
mdadm: added /dev/sdb5 to /dev/md_d2 as 1
mdadm: added /dev/sdd5 to /dev/md_d2 as 2
mdadm: added /dev/sdc5 to /dev/md_d2 as 3
mdadm: added /dev/sda5 to /dev/md_d2 as 0
mdadm: /dev/md_d2 assembled from 3 drives - not enough to start the array while not clean - consider --force.

 # mdadm -D /dev/md_d2
mdadm: md device /dev/md_d2 does not appear to be active.

 # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [multipath] [raid1]
md_d2 : inactive sda5[0](S) sdc5[3](S) sdd5[2](S) sdb5[1](S)
      2799357952 blocks super 0.91

md8 : active raid5 sdh1[0] sdg1[4] sdf1[1] sdi1[3] sde1[2]
      5860542464 blocks level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

md1 : active raid5 sdd3[2] sdb3[1] sda3[0]
      62926336 blocks level 5, 256k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 sdb1[1] sda1[0] sdd1[2]
      208704 blocks [3/3] [UUU]


kernel log:
md: md_d2 stopped.
md: unbind<sda5>
md: export_rdev(sda5)
md: unbind<sdc5>
md: export_rdev(sdc5)
md: unbind<sdd5>
md: export_rdev(sdd5)
md: unbind<sdb5>
md: export_rdev(sdb5)
md: md_d2 stopped.
md: bind<sdb5>
md: bind<sdd5>
md: bind<sdc5>
md: bind<sda5>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: raid5 reshape failure - restart?
From: NeilBrown @ 2011-05-15 21:37 UTC (permalink / raw)
  To: Glen Dragon; +Cc: linux-raid
In-Reply-To: <BANLkTi=-QZaQD6itGGZeyFekb2Kq5=_1iA@mail.gmail.com>

On Sun, 15 May 2011 13:33:28 -0400 Glen Dragon <glen.dragon@gmail.com> wrote:

> In trying to reshape a raid5 array, I encountered some problems.
> I was trying to reshape from raid5 3->4 devices.  The reshape process
> started with seemingly no problems; however, I noticed in the kernel log
> a number of ata3.00: failed command: WRITE FPDMA QUEUED errors.
> In trying to determine if this was going to be bad for me, I disabled
> NCQ on this device. Looking at the log, I noticed that around the same time
> /dev/sdd reported problems and took itself offline.
> At this point the reshape seemed to be continuing without issue, even
> though one of the drives was offline. I wasn't sure that this made
> sense.
> 
> Shortly after, I noticed that the progress on the reshape had stalled.
>  I tried changing the stripe_cache_size from 256 to [1024|2048|4096],
> but the reshape did not resume.  top reported that the reshape process
> was using 100% of one core, and the load average was climbing into the
> 50's
> 
> At this point I rebooted.   The array does not start.
> 
> Can the reshape be restarted?  I cannot figure out where the backup
> file ended up.  It does not seem to be where I thought I saved it.

When a reshape is increasing the size of the array the backup file is only
needed for the first few stripes.  After that it is irrelevant and is removed.

You should be able to simply reassemble the array and it should continue the
reshape.

What happens when you  try:

 mdadm -S /dev/md_d2
 mdadm -A /dev/md_d2 /dev/sd[abc]5 -vv

Please report both the messages from mdadm and any new messages in "dmesg" at
the time.

NeilBrown



> 
> Can I assemble this array with only the 3 original devices? Is there a
> way to recover at least some of the data on the array?  I have various
> backups, but there is some stuff that was not "critical" but would
> still be handy not to lose.
> 
> Various logs that could be helpful:  md_d2 is the array in question.
> Thanks..
> --Glen
> 
> # mdadm --version
> mdadm - v3.1.4 - 31st August 2010
> 
>  # uname -a
> Linux palidor 2.6.36-gentoo-r5 #1 SMP Wed Mar 2 20:54:16 EST 2011
> x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel
> GNU/Linux
> 
> current state:
> 
> # cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [multipath] [raid1]
> md8 : active raid5 sdh1[0] sdg1[4] sdf1[1] sdi1[3] sde1[2]
>       5860542464 blocks level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
> 
> md_d2 : inactive sdb5[1](S) sda5[0](S) sdd5[2](S) sdc5[3](S)
>       2799357952 blocks super 0.91
> 
> md1 : active raid5 sdd3[2] sdb3[1] sda3[0]
>       62926336 blocks level 5, 256k chunk, algorithm 2 [3/3] [UUU]
> 
> md0 : active raid1 sdb1[1] sda1[0] sdd1[2]
>       208704 blocks [3/3] [UUU]
> 
> 
> # mdadm -E /dev/sdb5   ([abc]) are all similar.
> /dev/sdb5:
>           Magic : a92b4efc
>         Version : 0.91.00
>            UUID : 2803efc9:c5d2ec1e:9894605d:35c5ea6f
>   Creation Time : Sat Oct  3 11:01:02 2009
>      Raid Level : raid5
>   Used Dev Size : 699839488 (667.42 GiB 716.64 GB)
>      Array Size : 2099518464 (2002.26 GiB 2149.91 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 2
> 
>   Reshape pos'n : 62731776 (59.83 GiB 64.24 GB)
>   Delta Devices : 1 (3->4)
> 
>     Update Time : Sun May 15 11:25:21 2011
>           State : active
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 0
>        Checksum : 2f2eac3a - correct
>          Events : 114069
> 
>          Layout : left-symmetric
>      Chunk Size : 256K
> 
>       Number   Major   Minor   RaidDevice State
> this     1       8       21        1      active sync   /dev/sdb5
> 
>    0     0       8        5        0      active sync   /dev/sda5
>    1     1       8       21        1      active sync   /dev/sdb5
>    2     2       0        0        2      faulty removed
>    3     3       8       37        3      active sync   /dev/sdc5
> 
> # mdadm -E /dev/sdd5
> /dev/sdd5:
>           Magic : a92b4efc
>         Version : 0.91.00
>            UUID : 2803efc9:c5d2ec1e:9894605d:35c5ea6f
>   Creation Time : Sat Oct  3 11:01:02 2009
>      Raid Level : raid5
>   Used Dev Size : 699839488 (667.42 GiB 716.64 GB)
>      Array Size : 2099518464 (2002.26 GiB 2149.91 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 2
> 
>   Reshape pos'n : 18048768 (17.21 GiB 18.48 GB)
>   Delta Devices : 1 (3->4)
> 
>     Update Time : Sun May 15 10:51:41 2011
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 29dcc275 - correct
>          Events : 113870
> 
>          Layout : left-symmetric
>      Chunk Size : 256K
> 
>       Number   Major   Minor   RaidDevice State
> this     2       8       53        2      active sync   /dev/sdd5
> 
>    0     0       8        5        0      active sync   /dev/sda5
>    1     1       8       21        1      active sync   /dev/sdb5
>    2     2       8       53        2      active sync   /dev/sdd5
>    3     3       8       37        3      active sync   /dev/sdc5


^ permalink raw reply

* [PATCH] RAID-6 check standalone suspend array V2.0
From: Piergiorgio Sartor @ 2011-05-15 21:15 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: NeilBrown, linux-raid
In-Reply-To: <20110509184333.GA28743@lazy.lzy>

Hi Neil,

reminder for the suspend patch.

Thank you so much for the code review.

I modified it in order to fix, hopefully, all the flaws.

New patch attached below.

Please note that "sigblock()" cannot be used, since it is
declared, at least on my system, as "deprecated".
Furthermore, I noticed that "Grow.c" is not checking the
return value of "sysfs_set_num()" while suspending the
array, maybe you'll need to look at this.

Finally, please check the new patch too: while I can
confirm the software is doing what it is supposed to do,
I still need support in order to confirm the suspend
and resume code.

Thanks again for your help, and please let me know what
the next expected step is.

bye,

--- cut here ---

diff -uNr a/raid6check.c b/raid6check.c
--- a/raid6check.c	2011-05-07 20:35:18.693370007 +0200
+++ b/raid6check.c	2011-05-09 20:32:14.551695036 +0200
@@ -24,6 +24,8 @@
 
 #include "mdadm.h"
 #include <stdint.h>
+#include <signal.h>
+#include <sys/mman.h>
 
 int geo_map(int block, unsigned long long stripe, int raid_disks,
 	    int level, int layout);
@@ -99,7 +101,7 @@
 	return curr_broken_disk;
 }
 
-int check_stripes(int *source, unsigned long long *offsets,
+int check_stripes(struct mdinfo *info, int *source, unsigned long long *offsets,
 		  int raid_disks, int chunk_size, int level, int layout,
 		  unsigned long long start, unsigned long long length, char *name[])
 {
@@ -115,6 +117,8 @@
 	int diskP, diskQ;
 	int data_disks = raid_disks - 2;
 	int err = 0;
+	sighandler_t sig[3];
+	int rv;
 
 	extern int tables_ready;
 
@@ -139,10 +143,35 @@
 
 		printf("pos --> %llu\n", start);
 
+		if(mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
+			err = 2;
+			goto exitCheck;
+		}
+		sig[0] = signal(SIGTERM, SIG_IGN);
+		sig[1] = signal(SIGINT, SIG_IGN);
+		sig[2] = signal(SIGQUIT, SIG_IGN);
+		rv = sysfs_set_num(info, NULL, "suspend_lo", start * chunk_size * data_disks);
+		rv |= sysfs_set_num(info, NULL, "suspend_hi", (start + 1) * chunk_size * data_disks);
 		for (i = 0 ; i < raid_disks ; i++) {
 			lseek64(source[i], offsets[i] + start * chunk_size, 0);
 			read(source[i], stripes[i], chunk_size);
 		}
+		rv |= sysfs_set_num(info, NULL, "suspend_lo", 0x7FFFFFFFFFFFFFFFULL);
+		rv |= sysfs_set_num(info, NULL, "suspend_hi", 0);
+		rv |= sysfs_set_num(info, NULL, "suspend_lo", 0);
+		signal(SIGQUIT, sig[2]);
+		signal(SIGINT, sig[1]);
+		signal(SIGTERM, sig[0]);
+		if(munlockall() != 0) {
+			err = 3;
+			goto exitCheck;
+		}
+
+		if(rv != 0) {
+			err = rv * 256;
+			goto exitCheck;
+		}
+
 		for (i = 0 ; i < data_disks ; i++) {
 			int disk = geo_map(i, start, raid_disks, level, layout);
 			blocks[i] = stripes[disk];
@@ -214,7 +243,7 @@
 	unsigned long long start, length;
 	int i;
 	int mdfd;
-	struct mdinfo *info, *comp;
+	struct mdinfo *info = NULL, *comp = NULL;
 	char *err = NULL;
 	int exit_err = 0;
 	int close_flag = 0;
@@ -250,6 +279,12 @@
 			  GET_OFFSET|
 			  GET_SIZE);
 
+	if(info == NULL) {
+		fprintf(stderr, "%s: Error reading sysfs information of %s\n", prg, argv[1]);
+		exit_err = 9;
+		goto exitHere;
+	}
+
 	if(info->array.level != level) {
 		fprintf(stderr, "%s: %s not a RAID-6\n", prg, argv[1]);
 		exit_err = 3;
@@ -343,7 +378,7 @@
 		comp = comp->next;
 	}
 
-	int rv = check_stripes(fds, offsets,
+	int rv = check_stripes(info, fds, offsets,
 			       raid_disks, chunk_size, level, layout,
 			       start, length, disk_name);
 	if (rv != 0) {

--- cut here ---
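
For readers following the patch: around each stripe read, it sets "suspend_lo" and "suspend_hi" to bracket one stripe's worth of array address space, then clears them again. The sketch below only mirrors the arithmetic of the two expressions passed to sysfs_set_num(); it is not part of the patch, the helper name is hypothetical, and the units are simply whatever the patch passes through.

```python
def suspend_window(stripe, chunk_size, raid_disks):
    """Return (suspend_lo, suspend_hi) as the patch computes them for one stripe."""
    # RAID-6 keeps two parity blocks (P and Q) per stripe, as in the patch's
    # "data_disks = raid_disks - 2".
    data_disks = raid_disks - 2
    stripe_span = chunk_size * data_disks
    return stripe * stripe_span, (stripe + 1) * stripe_span


# Illustrative values only (not taken from the patch): 5 disks, 512 KiB chunks.
lo, hi = suspend_window(10, 512 * 1024, 5)
print(lo, hi)
```

The window always covers exactly chunk_size * data_disks, so consecutive stripes produce adjacent, non-overlapping ranges.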

bye,

-- 

piergiorgio

^ permalink raw reply

* raid5 reshape failure - restart?
From: Glen Dragon @ 2011-05-15 17:33 UTC (permalink / raw)
  To: linux-raid

In trying to reshape a raid5 array, I encountered some problems.
I was trying to reshape from raid5 3->4 devices.  The reshape process
started with seemingly no problems; however, I noticed in the kernel log
a number of ata3.00: failed command: WRITE FPDMA QUEUED errors.
In trying to determine if this was going to be bad for me, I disabled
NCQ on this device. Looking at the log, I noticed that around the same time
/dev/sdd reported problems and took itself offline.
At this point the reshape seemed to be continuing without issue, even
though one of the drives was offline. I wasn't sure that this made
sense.

Shortly after, I noticed that the progress on the reshape had stalled.
 I tried changing the stripe_cache_size from 256 to [1024|2048|4096],
but the reshape did not resume.  top reported that the reshape process
was using 100% of one core, and the load average was climbing into the
50's

At this point I rebooted.   The array does not start.

Can the reshape be restarted?  I cannot figure out where the backup
file ended up.  It does not seem to be where I thought I saved it.

Can I assemble this array with only the 3 original devices? Is there a
way to recover at least some of the data on the array?  I have various
backups, but there is some stuff that was not "critical" but would
still be handy not to lose.

Various logs that could be helpful:  md_d2 is the array in question.
Thanks..
--Glen

# mdadm --version
mdadm - v3.1.4 - 31st August 2010

 # uname -a
Linux palidor 2.6.36-gentoo-r5 #1 SMP Wed Mar 2 20:54:16 EST 2011
x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel
GNU/Linux

current state:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [multipath] [raid1]
md8 : active raid5 sdh1[0] sdg1[4] sdf1[1] sdi1[3] sde1[2]
      5860542464 blocks level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

md_d2 : inactive sdb5[1](S) sda5[0](S) sdd5[2](S) sdc5[3](S)
      2799357952 blocks super 0.91

md1 : active raid5 sdd3[2] sdb3[1] sda3[0]
      62926336 blocks level 5, 256k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 sdb1[1] sda1[0] sdd1[2]
      208704 blocks [3/3] [UUU]


# mdadm -E /dev/sdb5   ([abc]) are all similar.
/dev/sdb5:
          Magic : a92b4efc
        Version : 0.91.00
           UUID : 2803efc9:c5d2ec1e:9894605d:35c5ea6f
  Creation Time : Sat Oct  3 11:01:02 2009
     Raid Level : raid5
  Used Dev Size : 699839488 (667.42 GiB 716.64 GB)
     Array Size : 2099518464 (2002.26 GiB 2149.91 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

  Reshape pos'n : 62731776 (59.83 GiB 64.24 GB)
  Delta Devices : 1 (3->4)

    Update Time : Sun May 15 11:25:21 2011
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 2f2eac3a - correct
         Events : 114069

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     1       8       21        1      active sync   /dev/sdb5

   0     0       8        5        0      active sync   /dev/sda5
   1     1       8       21        1      active sync   /dev/sdb5
   2     2       0        0        2      faulty removed
   3     3       8       37        3      active sync   /dev/sdc5
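
As a sanity check on the superblock fields above: for RAID-5, the usable array size should be the per-device size multiplied by the number of data devices (Raid Devices minus one for parity). A minimal sketch of that arithmetic, using the values reported by mdadm -E; the function name is illustrative and not part of mdadm.

```python
def raid5_array_size_kib(used_dev_size_kib, raid_devices):
    """Usable RAID-5 capacity: one device's worth of space holds parity."""
    if raid_devices < 2:
        raise ValueError("RAID-5 needs at least 2 devices")
    return used_dev_size_kib * (raid_devices - 1)


# Values copied from the mdadm -E output above (sizes in KiB).
used_dev_size = 699839488
print(raid5_array_size_kib(used_dev_size, 4))  # size with 4 devices (after 3->4)
print(raid5_array_size_kib(used_dev_size, 3))  # size with the original 3 devices
```

The four-device result equals the reported Array Size of 2099518464, so the superblock already describes the post-reshape geometry. The 2799357952 blocks shown for the inactive md_d2 in /proc/mdstat equals 4 x 699839488, which looks like the plain sum of the member sizes rather than a usable capacity.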

# mdadm -E /dev/sdd5
/dev/sdd5:
          Magic : a92b4efc
        Version : 0.91.00
           UUID : 2803efc9:c5d2ec1e:9894605d:35c5ea6f
  Creation Time : Sat Oct  3 11:01:02 2009
     Raid Level : raid5
  Used Dev Size : 699839488 (667.42 GiB 716.64 GB)
     Array Size : 2099518464 (2002.26 GiB 2149.91 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

  Reshape pos'n : 18048768 (17.21 GiB 18.48 GB)
  Delta Devices : 1 (3->4)

    Update Time : Sun May 15 10:51:41 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 29dcc275 - correct
         Events : 113870

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     2       8       53        2      active sync   /dev/sdd5

   0     0       8        5        0      active sync   /dev/sda5
   1     1       8       21        1      active sync   /dev/sdb5
   2     2       8       53        2      active sync   /dev/sdd5
   3     3       8       37        3      active sync   /dev/sdc5
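
The two superblocks above disagree: sdd5 was last updated earlier than the other members. A minimal sketch of how one might compare the relevant fields, assuming the mdadm -E text has been captured into strings; the parser and its names are hypothetical and only handle the two "Key : value" lines of interest here.

```python
import re


def parse_examine(text):
    """Extract 'Events' and "Reshape pos'n" (leading integer only) from mdadm -E text."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*(Events|Reshape pos'n)\s*:\s*(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


# Excerpts copied from the two mdadm -E outputs above.
sdb5 = """
  Reshape pos'n : 62731776 (59.83 GiB 64.24 GB)
         Events : 114069
"""
sdd5 = """
  Reshape pos'n : 18048768 (17.21 GiB 18.48 GB)
         Events : 113870
"""

members = {"sdb5": parse_examine(sdb5), "sdd5": parse_examine(sdd5)}
newest = max(f["Events"] for f in members.values())
stale = [name for name, f in members.items() if f["Events"] < newest]
print(stale)
```

Here sdd5 is 199 events behind and its recorded reshape position is much lower, which is consistent with it having dropped out partway through the reshape.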

^ permalink raw reply

* Re: /dev/md2 stopped after changing SAS controller
From: Louis-David Mitterrand @ 2011-05-15 14:37 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <4DCFE337.7090306@gmail.com>

On Sun, May 15, 2011 at 10:29:11AM -0400, Joe Landman wrote:
> 
> +1 to Mikael
> 
> We have a number of these.  They are pretty much useless.  We *do
> not recommend them for any purpose whatsoever*.  We got them to
> test, and they were and are wastes of money.
> 
> Again, this is why making sure that the bona fides of those making
> recommendations don't begin and end with a google search ...
> 
> >
> >Buy a 1068E (3081E) based card, they've been working fine for a long time.
> >
> 
> +1 on that.  We like the 3081E, and the 9211-8i for these purposes.
> Both work quite well.
> 
> Again, ignore these cards, they will absorb your money and your time.

Someone pointed me to ARECA non-raid controllers: 

http://www.areca.com.tw/products/sasnoneraid3g.htm

Does anyone have good or bad experiences with these?

Thanks,

^ permalink raw reply

