disk order problem in a raid 10 array

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* disk order problem in a raid 10 array
@ 2011-03-18 14:49 Xavier Brochard
  2011-03-18 17:22 ` hansbkk
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-18 14:49 UTC (permalink / raw)
  To: linux-raid

Hello

trying to solve my problem with a unusable raid10 array, I discovered that 
disk order is mixed between each boot - even with live-cd.
Here's an extract from dmesg:
[   12.5]  sda:
[   12.5]  sdc:
[   12.5]  sdd:
[   12.5]  sde: sdd1
[   12.5]  sdf: sdc1
[   12.5]  sda1 sda2
[   12.5]  sdg: sde1
[   12.5]  sdf1

is that normal?
could this be a sign of hardware controler problem?
could this happen because all disks are sata-3 except 1 SSD which is sata-2?

Xavier
xavier@alternatif.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-18 14:49 disk order problem in a raid 10 array Xavier Brochard
@ 2011-03-18 17:22 ` hansbkk
  2011-03-18 20:09   ` Xavier Brochard
  2011-03-18 20:12   ` Xavier Brochard
  2011-03-18 20:26 ` Adaptive throttling for RAID1 background resync Hari Subramanian
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 28+ messages in thread
From: hansbkk @ 2011-03-18 17:22 UTC (permalink / raw)
  To: Xavier Brochard; +Cc: linux-raid

On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard <xavier@alternatif.org> wrote:
> disk order is mixed between each boot - even with live-cd.

> is that normal?

If nothing is changing and the order is swapping really every boot,
then IMO that is odd.

But it's very normal for  ordering to change from time to time, and
definitely when elements change - kernel version/flavor, drivers, BIOS
settings, etc.

Part of my SOP is now to record both mdadm and the boot loader's
ordering against serial number and UUID of drives when creating an
array, and to put the relevant information on labels securely attached
to the physical drives, along with creating a map of their physical
location and taping that inside the case.

It's critical to know what's what in a crisis. . .

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-18 17:22 ` hansbkk
@ 2011-03-18 20:09   ` Xavier Brochard
  2011-03-18 20:12   ` Xavier Brochard
  1 sibling, 0 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-18 20:09 UTC (permalink / raw)
  To: hansbkk; +Cc: linux-raid

Le vendredi 18 mars 2011 18:22:34 hansbkk@gmail.com, vous avez écrit :
> On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard <xavier@alternatif.org> 
wrote:
> > disk order is mixed between each boot - even with live-cd.
> > is that normal?
> 
> If nothing is changing and the order is swapping really every boot,
> then IMO that is odd.

nothing has changed, except kernel minor version

> 
> Part of my SOP is now to record both mdadm and the boot loader's
> ordering against serial number and UUID of drives when creating an
> array, and to put the relevant information on labels securely attached
> to the physical drives, along with creating a map of their physical
> location and taping that inside the case.nt
> 
> It's critical to know what's what in a crisis. . .

mdadm --examine output is somewhat weird as it shows
/dev/sde1
this     0       8       33        0      active sync   /dev/sdd1
/dev/sdd1
this     0       8       33        0      active sync   /dev/sdc1
/dev/sdc1
this     0       8       33        0      active sync   /dev/sde1
and /dev/sdf1 as sdf1

I think I can believe mdadm?
and that /proc/mdstat content comes directly from mdadm (that is with "real" 
sdc,d,e)?

what trouble me is that after I removed 2 disk drive from the bay, mdadm start 
to recover: 
md0 : active raid10 sdb1[1] sdc1[4] sdd1[3]
      976767872 blocks 64K chunks 2 near-copies [4/2] [_U_U]
      [=>...................]  recovery =  5.0% (24436736/488383936) 
finish=56.2min speed=137513K/sec

I guess that it is ok, and that it is recovering with the spare. But I wld 
like to be sure...


Xavier
xavier@alternatif.org - 09 54 06 16 26
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-18 17:22 ` hansbkk
  2011-03-18 20:09   ` Xavier Brochard
@ 2011-03-18 20:12   ` Xavier Brochard
  2011-03-18 22:22     ` NeilBrown
  1 sibling, 1 reply; 28+ messages in thread
From: Xavier Brochard @ 2011-03-18 20:12 UTC (permalink / raw)
  To: hansbkk; +Cc: linux-raid

Le vendredi 18 mars 2011 18:22:34 hansbkk@gmail.com, vous avez écrit :
> On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard <xavier@alternatif.org> 
wrote:
> > disk order is mixed between each boot - even with live-cd.
> > is that normal?
> 
> If nothing is changing and the order is swapping really every boot,
> then IMO that is odd.

nothing has changed, except kernel minor version

> 
> Part of my SOP is now to record both mdadm and the boot loader's
> ordering against serial number and UUID of drives when creating an
> array, and to put the relevant information on labels securely attached
> to the physical drives, along with creating a map of their physical
> location and taping that inside the case.nt
> 
> It's critical to know what's what in a crisis. . .

exactly, in my case mdadm --examine output is somewhat weird as it shows:
/dev/sde1
this     0       8       33        0      active sync   /dev/sdd1
/dev/sdd1
this     0       8       33        0      active sync   /dev/sdc1
/dev/sdc1
this     0       8       33        0      active sync   /dev/sde1
and /dev/sdf1 as sdf1

I think I can believe mdadm?
and that /proc/mdstat content comes directly from mdadm (that is with "exact"
sdc,d,e)?

what trouble me is that after I removed 2 disk drive from the bay, mdadm start 
to recover: 
md0 : active raid10 sdb1[1] sdc1[4] sdd1[3]
      976767872 blocks 64K chunks 2 near-copies [4/2] [_U_U]
      [=>...................]  recovery =  5.0% (24436736/488383936) 
finish=56.2min speed=137513K/sec

I guess that it is ok, and that it is recovering with the spare. But I would 
like to be sure...


Xavier
xavier@alternatif.org - 09 54 06 16 26
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-18 20:12   ` Xavier Brochard
@ 2011-03-18 22:22     ` NeilBrown
  0 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2011-03-18 22:22 UTC (permalink / raw)
  To: Xavier Brochard; +Cc: hansbkk, linux-raid

On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard <xavier@alternatif.org>
wrote:

> Le vendredi 18 mars 2011 18:22:34 hansbkk@gmail.com, vous avez écrit :
> > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard <xavier@alternatif.org> 
> wrote:
> > > disk order is mixed between each boot - even with live-cd.
> > > is that normal?
> > 
> > If nothing is changing and the order is swapping really every boot,
> > then IMO that is odd.
> 
> nothing has changed, except kernel minor version

Yet you don't tell us what the kernel minor version changed from or to.

That may not be important, but it might and you obviously don't know which.
It is always better to give too much information rather than not enough.


> 
> > 
> > Part of my SOP is now to record both mdadm and the boot loader's
> > ordering against serial number and UUID of drives when creating an
> > array, and to put the relevant information on labels securely attached
> > to the physical drives, along with creating a map of their physical
> > location and taping that inside the case.nt
> > 
> > It's critical to know what's what in a crisis. . .
> 
> exactly, in my case mdadm --examine output is somewhat weird as it shows:
> /dev/sde1
> this     0       8       33        0      active sync   /dev/sdd1
> /dev/sdd1
> this     0       8       33        0      active sync   /dev/sdc1
> /dev/sdc1
> this     0       8       33        0      active sync   /dev/sde1
> and /dev/sdf1 as sdf1

You are hiding lots of details again...

Are these all from different arrays?  They all claim to be 'device 0' of some
array.

Infact,  "8, 33" is *always* /dev/sdc1,  so I think the above lines have been
edited by hand because I'm 100% certain mdadm didn't output them.


> 
> I think I can believe mdadm?

Yes, you can believe mdadm - but only if you understand what it is saying,
and there are times when that is not as easy as one might like....

> and that /proc/mdstat content comes directly from mdadm (that is with "exact"
> sdc,d,e)?
> 
> what trouble me is that after I removed 2 disk drive from the bay, mdadm start 
> to recover: 
> md0 : active raid10 sdb1[1] sdc1[4] sdd1[3]
>       976767872 blocks 64K chunks 2 near-copies [4/2] [_U_U]
>       [=>...................]  recovery =  5.0% (24436736/488383936) 
> finish=56.2min speed=137513K/sec

Why exactly does this trouble you?  It seems to be doing exactly the right
thing.

> 
> I guess that it is ok, and that it is recovering with the spare. But I would 
> like to be sure...

Sure of what?  If you want a clear answer you need to ask a clear question.

NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Adaptive throttling for RAID1 background resync
  2011-03-18 14:49 disk order problem in a raid 10 array Xavier Brochard
  2011-03-18 17:22 ` hansbkk
@ 2011-03-18 20:26 ` Hari Subramanian
  2011-03-18 20:28   ` Roberto Spadim
  2011-03-18 22:11   ` NeilBrown
  2011-03-18 22:14 ` disk order problem in a raid 10 array NeilBrown
       [not found] ` <201103182350.19281.xavier@alternatif.org>
  3 siblings, 2 replies; 28+ messages in thread
From: Hari Subramanian @ 2011-03-18 20:26 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org

I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.

Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)

From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.

Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.

Thanks
~ Hari

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Adaptive throttling for RAID1 background resync
  2011-03-18 20:26 ` Adaptive throttling for RAID1 background resync Hari Subramanian
@ 2011-03-18 20:28   ` Roberto Spadim
  2011-03-18 20:31     ` Hari Subramanian
  2011-03-18 22:11   ` NeilBrown
  1 sibling, 1 reply; 28+ messages in thread
From: Roberto Spadim @ 2011-03-18 20:28 UTC (permalink / raw)
  To: Hari Subramanian; +Cc: linux-raid@vger.kernel.org

maybe this could be better solved at queue linux kernel area... at
elevators or block devices

2011/3/18 Hari Subramanian <hari@vmware.com>:
> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
>
> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>
> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>
> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.
>
> Thanks
> ~ Hari
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: Adaptive throttling for RAID1 background resync
  2011-03-18 20:28   ` Roberto Spadim
@ 2011-03-18 20:31     ` Hari Subramanian
  2011-03-18 20:36       ` Roberto Spadim
  0 siblings, 1 reply; 28+ messages in thread
From: Hari Subramanian @ 2011-03-18 20:31 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: linux-raid@vger.kernel.org

Roberto, My use case involves both foreground and background resyncs happening at the same time. So, by throttling it at the block or IO queues, I would be limiting my throughout for foreground IOs as well which is undesirable.

~ Hari

-----Original Message-----
From: rspadim@gmail.com [mailto:rspadim@gmail.com] On Behalf Of Roberto Spadim
Sent: Friday, March 18, 2011 4:29 PM
To: Hari Subramanian
Cc: linux-raid@vger.kernel.org
Subject: Re: Adaptive throttling for RAID1 background resync

maybe this could be better solved at queue linux kernel area... at
elevators or block devices

2011/3/18 Hari Subramanian <hari@vmware.com>:
> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
>
> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>
> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>
> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.
>
> Thanks
> ~ Hari
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Adaptive throttling for RAID1 background resync
  2011-03-18 20:31     ` Hari Subramanian
@ 2011-03-18 20:36       ` Roberto Spadim
  2011-03-18 20:54         ` Hari Subramanian
  0 siblings, 1 reply; 28+ messages in thread
From: Roberto Spadim @ 2011-03-18 20:36 UTC (permalink / raw)
  To: Hari Subramanian; +Cc: linux-raid@vger.kernel.org

hum, it´s not a io queue size (very big ram memory queue) problem?
maybe getting it smaller could help?
resync is something like read here write there, if you have write
problem, read should stop when async writes can´t work more (no ram
memory)
i´m right? if true, that´s why i think queue is a point to check

2011/3/18 Hari Subramanian <hari@vmware.com>:
> Roberto, My use case involves both foreground and background resyncs happening at the same time. So, by throttling it at the block or IO queues, I would be limiting my throughout for foreground IOs as well which is undesirable.
>
> ~ Hari
>
> -----Original Message-----
> From: rspadim@gmail.com [mailto:rspadim@gmail.com] On Behalf Of Roberto Spadim
> Sent: Friday, March 18, 2011 4:29 PM
> To: Hari Subramanian
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Adaptive throttling for RAID1 background resync
>
> maybe this could be better solved at queue linux kernel area... at
> elevators or block devices
>
> 2011/3/18 Hari Subramanian <hari@vmware.com>:
>> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
>>
>> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>>
>> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>>
>> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.
>>
>> Thanks
>> ~ Hari
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: Adaptive throttling for RAID1 background resync
  2011-03-18 20:36       ` Roberto Spadim
@ 2011-03-18 20:54         ` Hari Subramanian
  2011-03-18 21:02           ` Roberto Spadim
  0 siblings, 1 reply; 28+ messages in thread
From: Hari Subramanian @ 2011-03-18 20:54 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: linux-raid@vger.kernel.org

Roberto, I still think the solution you point out has the potential for throttling foreground IOs issued to MD from the filesystem as well as the MD initiated background resyncs. So, I don't want to limit the IO queues, esp since our foreground workload involves a LOT of small random IO.

Thanks
~ Hari

-----Original Message-----
From: rspadim@gmail.com [mailto:rspadim@gmail.com] On Behalf Of Roberto Spadim
Sent: Friday, March 18, 2011 4:36 PM
To: Hari Subramanian
Cc: linux-raid@vger.kernel.org
Subject: Re: Adaptive throttling for RAID1 background resync

hum, it´s not a io queue size (very big ram memory queue) problem?
maybe getting it smaller could help?
resync is something like read here write there, if you have write
problem, read should stop when async writes can´t work more (no ram
memory)
i´m right? if true, that´s why i think queue is a point to check

2011/3/18 Hari Subramanian <hari@vmware.com>:
> Roberto, My use case involves both foreground and background resyncs happening at the same time. So, by throttling it at the block or IO queues, I would be limiting my throughout for foreground IOs as well which is undesirable.
>
> ~ Hari
>
> -----Original Message-----
> From: rspadim@gmail.com [mailto:rspadim@gmail.com] On Behalf Of Roberto Spadim
> Sent: Friday, March 18, 2011 4:29 PM
> To: Hari Subramanian
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Adaptive throttling for RAID1 background resync
>
> maybe this could be better solved at queue linux kernel area... at
> elevators or block devices
>
> 2011/3/18 Hari Subramanian <hari@vmware.com>:
>> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
>>
>> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>>
>> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>>
>> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.
>>
>> Thanks
>> ~ Hari
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Adaptive throttling for RAID1 background resync
  2011-03-18 20:54         ` Hari Subramanian
@ 2011-03-18 21:02           ` Roberto Spadim
  0 siblings, 0 replies; 28+ messages in thread
From: Roberto Spadim @ 2011-03-18 21:02 UTC (permalink / raw)
  To: Hari Subramanian; +Cc: linux-raid@vger.kernel.org

humm, let´s wait anothers ideas from list

2011/3/18 Hari Subramanian <hari@vmware.com>:
> Roberto, I still think the solution you point out has the potential for throttling foreground IOs issued to MD from the filesystem as well as the MD initiated background resyncs. So, I don't want to limit the IO queues, esp since our foreground workload involves a LOT of small random IO.
>
> Thanks
> ~ Hari
>
> -----Original Message-----
> From: rspadim@gmail.com [mailto:rspadim@gmail.com] On Behalf Of Roberto Spadim
> Sent: Friday, March 18, 2011 4:36 PM
> To: Hari Subramanian
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Adaptive throttling for RAID1 background resync
>
> hum, it´s not a io queue size (very big ram memory queue) problem?
> maybe getting it smaller could help?
> resync is something like read here write there, if you have write
> problem, read should stop when async writes can´t work more (no ram
> memory)
> i´m right? if true, that´s why i think queue is a point to check
>
> 2011/3/18 Hari Subramanian <hari@vmware.com>:
>> Roberto, My use case involves both foreground and background resyncs happening at the same time. So, by throttling it at the block or IO queues, I would be limiting my throughout for foreground IOs as well which is undesirable.
>>
>> ~ Hari
>>
>> -----Original Message-----
>> From: rspadim@gmail.com [mailto:rspadim@gmail.com] On Behalf Of Roberto Spadim
>> Sent: Friday, March 18, 2011 4:29 PM
>> To: Hari Subramanian
>> Cc: linux-raid@vger.kernel.org
>> Subject: Re: Adaptive throttling for RAID1 background resync
>>
>> maybe this could be better solved at queue linux kernel area... at
>> elevators or block devices
>>
>> 2011/3/18 Hari Subramanian <hari@vmware.com>:
>>> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
>>>
>>> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>>>
>>> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>>>
>>> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.
>>>
>>> Thanks
>>> ~ Hari
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Adaptive throttling for RAID1 background resync
  2011-03-18 20:26 ` Adaptive throttling for RAID1 background resync Hari Subramanian
  2011-03-18 20:28   ` Roberto Spadim
@ 2011-03-18 22:11   ` NeilBrown
  2011-03-21 21:02     ` Hari Subramanian
  1 sibling, 1 reply; 28+ messages in thread
From: NeilBrown @ 2011-03-18 22:11 UTC (permalink / raw)
  To: Hari Subramanian; +Cc: linux-raid@vger.kernel.org

On Fri, 18 Mar 2011 13:26:52 -0700 Hari Subramanian <hari@vmware.com> wrote:

> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
> 
> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
> 
> >From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
> 
> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.

The thing you are missing that already exists is 

#define RESYNC_DEPTH 32

which is a limit places on conf->barrier, where conf->barrier is incremented
before submitting a resync IO, and decremented after completing a resync IO.

So there can never be more than 32 bios per device in use for resync.


12,000 active biovec-64s sounds a lot like a memory leak - something isn't
freeing them.
Is there some 'bio-XXX' slab with a similar count.  If there isn't, then the
bio was released without releasing the biovec, which would be bad.
If there is - that information would help.

NeilBrown

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: Adaptive throttling for RAID1 background resync
  2011-03-18 22:11   ` NeilBrown
@ 2011-03-21 21:02     ` Hari Subramanian
  0 siblings, 0 replies; 28+ messages in thread
From: Hari Subramanian @ 2011-03-21 21:02 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

Hi Neil,

There are an equal number of BIOs as there are biovec-64s. But I understand why that is the case now. Turns out that some changes that were made by one of our performance engineers in the interest of increasing the performance of background resyncs when there are foreground I/Os gets in the way (or effectively neuters) the RESYNC_DEPTH throttle that exists today.

The gist of these changes is that:
 - We hold the barrier across the resync window to disallow foreground IOs from interrupting background resyncs and raise_barrier is not being invoked raid1 sync_request
 - We increase the resync window to 8M and resync chunk size to 256K

The combination of these factors caused us to have a huge number of IOs outstanding and as much as 256M of resync data pages. We are working on a fix for this. I can share a patch to MD that implements these changes if someone is interested.

Thanks again for your help!
~ Hari

-----Original Message-----
From: NeilBrown [mailto:neilb@suse.de] 
Sent: Friday, March 18, 2011 6:12 PM
To: Hari Subramanian
Cc: linux-raid@vger.kernel.org
Subject: Re: Adaptive throttling for RAID1 background resync

On Fri, 18 Mar 2011 13:26:52 -0700 Hari Subramanian <hari@vmware.com> wrote:

> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
> 
> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
> 
> >From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
> 
> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.

The thing you are missing that already exists is 

#define RESYNC_DEPTH 32

which is a limit places on conf->barrier, where conf->barrier is incremented
before submitting a resync IO, and decremented after completing a resync IO.

So there can never be more than 32 bios per device in use for resync.

12,000 active biovec-64s sounds a lot like a memory leak - something isn't
freeing them.
Is there some 'bio-XXX' slab with a similar count.  If there isn't, then the
bio was released without releasing the biovec, which would be bad.
If there is - that information would help.

NeilBrown

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-18 14:49 disk order problem in a raid 10 array Xavier Brochard
  2011-03-18 17:22 ` hansbkk
  2011-03-18 20:26 ` Adaptive throttling for RAID1 background resync Hari Subramanian
@ 2011-03-18 22:14 ` NeilBrown
       [not found] ` <201103182350.19281.xavier@alternatif.org>
  3 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2011-03-18 22:14 UTC (permalink / raw)
  To: Xavier Brochard; +Cc: linux-raid

On Fri, 18 Mar 2011 15:49:20 +0100 Xavier Brochard <xavier@alternatif.org>
wrote:

> Hello
> 
> trying to solve my problem with a unusable raid10 array, I discovered that 
> disk order is mixed between each boot - even with live-cd.
> Here's an extract from dmesg:
> [   12.5]  sda:
> [   12.5]  sdc:
> [   12.5]  sdd:
> [   12.5]  sde: sdd1
> [   12.5]  sdf: sdc1
> [   12.5]  sda1 sda2
> [   12.5]  sdg: sde1
> [   12.5]  sdf1
> 
> is that normal?
> could this be a sign of hardware controler problem?
> could this happen because all disks are sata-3 except 1 SSD which is sata-2?

You are saying that something changes between each boot, but only giving one
example so that we cannot see the change.  That is not particularly helpful.

The output above is a bit odd, but I think it is simply that the devices are
all being examined in parallel so the per-device messages are being mingled
together.
Certainly 'sdd1' is on 'sdd', not no 'sde' as the message seems to show.

NeilBrown

^ permalink raw reply	[flat|nested] 28+ messages in thread

[parent not found: <201103182350.19281.xavier@alternatif.org>]

[parent not found: <20110319102039.52cc2282@notabene.brown>]

* Re: disk order problem in a raid 10 array
       [not found]   ` <20110319102039.52cc2282@notabene.brown>
@ 2011-03-18 23:59     ` Xavier Brochard
  2011-03-19  0:05       ` Xavier Brochard
  2011-03-19  1:42       ` NeilBrown
  2011-03-19 12:01     ` Xavier Brochard
  1 sibling, 2 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-18 23:59 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Le samedi 19 mars 2011 00:20:39 NeilBrown, vous avez écrit :
> On Fri, 18 Mar 2011 23:50:18 +0100 Xavier Brochard <xavier@alternatif.org>
> > Le vendredi 18 mars 2011 23:22:51, NeilBrown  écrivait :
> > > On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard
> > > > Le vendredi 18 mars 2011 18:22:34 hansbkk@gmail.com, vous avez écrit :
> > > > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard
> > > > > <xavier@alternatif.org>
> > > > 
> > > > wrote:
> > > > > > disk order is mixed between each boot - even with live-cd.
> > > > > > is that normal?
> > > > > 
> > > > > If nothing is changing and the order is swapping really every boot,
> > > > > then IMO that is odd.
> > > > 
> > > > nothing has changed, except kernel minor version
> > > 
> > > Yet you don't tell us what the kernel minor version changed from or to.
> > 
> > Previously it was ubuntu 2.6.32-27-server or 2.6.32-28-server and now it
> > is ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13
> > 
> > > That may not be important, but it might and you obviously don't know
> > > which. It is always better to give too much information rather than
> > > not enough.

> > 
> > Here's full output of mdadm --examine /dev/sd[cdefg]1
> > As you can see, disks sdc, sdd and sde claims to be different, is it a
> > problem?
> 
> Where all of these outputs collected at the same time?

yes

> They seem
> inconsistent.

> In particular, sdc1 has a higher 'events' number than the others (154 vs
> 102) yet an earlier Update Time.  It also thinks that the array is
> completely failed.

When I removed that disk (sdc is number 2) and another one (I tried with 
different disks), all other disks display (with mdadm -E):
0	Active
1	Active
2	Active
3 	Active
4	Spare

But when I removed that disk (#2) and #0, it start to recover and all other 
disks display (with mdadm -E):
0	Removed
1	Active
2	Faulty removed
3	Active
4	Spare
That looks coherent for me, now.

> So I suspect that device is badly confused and you probably want to zero
> it's metadata ... but don't do that too hastily.
> 
> All the other devices think the array is working correctly with a full
> compliment of devices.  However there is no device which claims to
> be "RaidDevice 2" - except sdc1 and it is obviously confused..
> 
> The device name listed in the table at the end of --examine output.
> It is the name that the device had when the metadata was last written.  And
> device names can change on reboot.
> The fact that the names don't line up suggest that the metadata hasn't been
> written since the last reboot - so presumably you aren't really using the
> array.(???)
> 
> [the newer 1.x metadata format doesn't try to record the names of devices
> in the superblock so it doesn't result in some of this confusion).
> 
> 
> Based on your earlier email, it would appear that the device discovery for
> some of your devices is happening in parallel at boot time, so or ordering
> could be random - each time you boot you get a different order.  This will
> not confuse md or mdadm - they look at the content of the devices rather
> than the name.
> If you want a definitive name for each device, it might be a good idea to
> look in /dev/disk/by-path or /dev/disk/by-id and use names from there.
> 
> Could you please sent a complete output of:
> 
>    cat /proc/mdstat
>    mdadm -D /dev/md0
>    mdadm -E /dev/sd?1
> 
> all collected at the same time.  Then I will suggest if there is any action
> you should take to repair anything.

Here it is, thankyou for you help

mdstat:
=====
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] 
[raid10] 
md0 : inactive sdb1[2](S) sdf1[4](S) sdd1[3](S) sdc1[1](S) sde1[0](S)
      2441919680 blocks
       
unused devices: <none>
====
obviously, mdadm -D /dev/md0 output nothing

mdadm -E /dev/sd?1
====
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
  Creation Time : Sun Jan  2 16:41:45 2011
     Raid Level : raid10
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 0

    Update Time : Wed Mar 16 09:50:03 2011
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 2
  Spare Devices : 0
       Checksum : ec151590 - correct
         Events : 154

         Layout : near=2
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       65        2      active sync   /dev/sde1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       65        2      active sync   /dev/sde1
   3     3       0        0        3      faulty removed
/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
  Creation Time : Sun Jan  2 16:41:45 2011
     Raid Level : raid10
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Fri Mar 18 16:37:45 2011
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 1
       Checksum : ec181672 - correct
         Events : 107

         Layout : near=2
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       0        0        0      removed
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       0        0        2      faulty removed
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       33        4      spare   /dev/sdc1
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
  Creation Time : Sun Jan  2 16:41:45 2011
     Raid Level : raid10
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Fri Mar 18 16:37:45 2011
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 1
       Checksum : ec181696 - correct
         Events : 107

         Layout : near=2
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       49        3      active sync   /dev/sdd1

   0     0       0        0        0      removed
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       0        0        2      faulty removed
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       33        4      spare   /dev/sdc1
/dev/sde1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
  Creation Time : Sun Jan  2 16:41:45 2011
     Raid Level : raid10
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 0

    Update Time : Wed Mar 16 07:43:45 2011
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : ec14f740 - correct
         Events : 102

         Layout : near=2
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       33        0      active sync   /dev/sdc1

   0     0       8       33        0      active sync   /dev/sdc1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       81        3      active sync   /dev/sdf1
   4     4       8       97        4      spare
/dev/sdf1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
  Creation Time : Sun Jan  2 16:41:45 2011
     Raid Level : raid10
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Fri Mar 18 16:37:45 2011
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 1
       Checksum : ec181682 - correct
         Events : 107

         Layout : near=2
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       33        4      spare   /dev/sdc1

   0     0       0        0        0      removed
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       0        0        2      faulty removed
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       33        4      spare   /dev/sdc1
====



Xavier
xavier@alternatif.org - 09 54 06 16 26
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-18 23:59     ` Xavier Brochard
@ 2011-03-19  0:05       ` Xavier Brochard
  2011-03-19  0:07         ` Roberto Spadim
  2011-03-19  1:42       ` NeilBrown
  1 sibling, 1 reply; 28+ messages in thread
From: Xavier Brochard @ 2011-03-19  0:05 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Le samedi 19 mars 2011 00:59:07 Xavier Brochard, vous avez écrit :
> > Could you please sent a complete output of:
> > 
> >    cat /proc/mdstat
> >    mdadm -D /dev/md0
> >    mdadm -E /dev/sd?1
> > 
> >
> > all collected at the same time.  Then I will suggest if there is any
> > action you should take to repair anything.
> 
> Here it is, thankyou for you help

don't now if it is important, but I collected them from Sysrescue-cd, with a 
2.6.35-std163-amd64 kernel

Xavier
xavier@alternatif.org - 09 54 06 16 26
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-19  0:05       ` Xavier Brochard
@ 2011-03-19  0:07         ` Roberto Spadim
  2011-03-19  0:25           ` Xavier Brochard
  0 siblings, 1 reply; 28+ messages in thread
From: Roberto Spadim @ 2011-03-19  0:07 UTC (permalink / raw)
  To: Xavier Brochard; +Cc: NeilBrown, linux-raid

hum, can you check if you updated udev ?
i think it´s a udev with wrong configuration (maybe, maybe not)

2011/3/18 Xavier Brochard <xavier@alternatif.org>:
> Le samedi 19 mars 2011 00:59:07 Xavier Brochard, vous avez écrit :
>> > Could you please sent a complete output of:
>> >
>> >    cat /proc/mdstat
>> >    mdadm -D /dev/md0
>> >    mdadm -E /dev/sd?1
>> >
>> >
>> > all collected at the same time.  Then I will suggest if there is any
>> > action you should take to repair anything.
>>
>> Here it is, thankyou for you help
>
> don't now if it is important, but I collected them from Sysrescue-cd, with a
> 2.6.35-std163-amd64 kernel
>
> Xavier
> xavier@alternatif.org - 09 54 06 16 26
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-19  0:07         ` Roberto Spadim
@ 2011-03-19  0:25           ` Xavier Brochard
  0 siblings, 0 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-19  0:25 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: linux-raid

Le samedi 19 mars 2011 01:07:59, Roberto Spadim écrivait :
> hum, can you check if you updated udev ?
> i think it´s a udev with wrong configuration (maybe, maybe not)

Unfortunatly /var is on the faulty raid.
But I'm pretty sure that it was not updated because last updates for udev 
package in ubuntu lucid and lucid-updates repositories were all before my 
installation of the server (respectively in april and november).


Xavier
xavier@alternatif.org - 09 54 06 16 26
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-18 23:59     ` Xavier Brochard
  2011-03-19  0:05       ` Xavier Brochard
@ 2011-03-19  1:42       ` NeilBrown
  2011-03-19 13:44         ` Xavier Brochard
  1 sibling, 1 reply; 28+ messages in thread
From: NeilBrown @ 2011-03-19  1:42 UTC (permalink / raw)
  To: Xavier Brochard; +Cc: linux-raid

On Sat, 19 Mar 2011 00:59:07 +0100 Xavier Brochard <xavier@alternatif.org>
wrote:

> Le samedi 19 mars 2011 00:20:39 NeilBrown, vous avez écrit :
> > On Fri, 18 Mar 2011 23:50:18 +0100 Xavier Brochard <xavier@alternatif.org>
> > > Le vendredi 18 mars 2011 23:22:51, NeilBrown  écrivait :
> > > > On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard
> > > > > Le vendredi 18 mars 2011 18:22:34 hansbkk@gmail.com, vous avez écrit :
> > > > > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard
> > > > > > <xavier@alternatif.org>
> > > > > 
> > > > > wrote:
> > > > > > > disk order is mixed between each boot - even with live-cd.
> > > > > > > is that normal?
> > > > > > 
> > > > > > If nothing is changing and the order is swapping really every boot,
> > > > > > then IMO that is odd.
> > > > > 
> > > > > nothing has changed, except kernel minor version
> > > > 
> > > > Yet you don't tell us what the kernel minor version changed from or to.
> > > 
> > > Previously it was ubuntu 2.6.32-27-server or 2.6.32-28-server and now it
> > > is ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13
> > > 
> > > > That may not be important, but it might and you obviously don't know
> > > > which. It is always better to give too much information rather than
> > > > not enough.
> 
> > > 
> > > Here's full output of mdadm --examine /dev/sd[cdefg]1
> > > As you can see, disks sdc, sdd and sde claims to be different, is it a
> > > problem?
> > 
> > Where all of these outputs collected at the same time?
> 
> yes
> 
> > They seem
> > inconsistent.
> 
> > In particular, sdc1 has a higher 'events' number than the others (154 vs
> > 102) yet an earlier Update Time.  It also thinks that the array is
> > completely failed.
> 
> When I removed that disk (sdc is number 2) and another one (I tried with 
> different disks), all other disks display (with mdadm -E):
> 0	Active
> 1	Active
> 2	Active
> 3 	Active
> 4	Spare
> 
> But when I removed that disk (#2) and #0, it start to recover and all other 
> disks display (with mdadm -E):
> 0	Removed
> 1	Active
> 2	Faulty removed
> 3	Active
> 4	Spare
> That looks coherent for me, now.
> 
> > So I suspect that device is badly confused and you probably want to zero
> > it's metadata ... but don't do that too hastily.
> > 
> > All the other devices think the array is working correctly with a full
> > compliment of devices.  However there is no device which claims to
> > be "RaidDevice 2" - except sdc1 and it is obviously confused..
> > 
> > The device name listed in the table at the end of --examine output.
> > It is the name that the device had when the metadata was last written.  And
> > device names can change on reboot.
> > The fact that the names don't line up suggest that the metadata hasn't been
> > written since the last reboot - so presumably you aren't really using the
> > array.(???)
> > 
> > [the newer 1.x metadata format doesn't try to record the names of devices
> > in the superblock so it doesn't result in some of this confusion).
> > 
> > 
> > Based on your earlier email, it would appear that the device discovery for
> > some of your devices is happening in parallel at boot time, so or ordering
> > could be random - each time you boot you get a different order.  This will
> > not confuse md or mdadm - they look at the content of the devices rather
> > than the name.
> > If you want a definitive name for each device, it might be a good idea to
> > look in /dev/disk/by-path or /dev/disk/by-id and use names from there.
> > 
> > Could you please sent a complete output of:
> > 
> >    cat /proc/mdstat
> >    mdadm -D /dev/md0
> >    mdadm -E /dev/sd?1
> > 
> > all collected at the same time.  Then I will suggest if there is any action
> > you should take to repair anything.
> 
> Here it is, thankyou for you help
> 

I suggest you:

  mdadm --zero /dev/sdb1

having first double-checked that sdb1 is the devices with Events of 154,

then

 mdadm -S /dev/md0
 mdadm -As /dev/md0


and let the  array rebuild the spare.
Then check the data and make sure it is all good.
Then add /dev/sdb1 back in as the spare
  mdadm /dev/md0 --add /dev/sdb1

and everything should be fine - providing you don't hit any hardware errors
etc.


NeilBrown




> mdstat:
> =====
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] 
> [raid10] 
> md0 : inactive sdb1[2](S) sdf1[4](S) sdd1[3](S) sdc1[1](S) sde1[0](S)
>       2441919680 blocks
>        
> unused devices: <none>
> ====
> obviously, mdadm -D /dev/md0 output nothing
> 
> mdadm -E /dev/sd?1
> ====
> /dev/sdb1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan  2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 5
> Preferred Minor : 0
> 
>     Update Time : Wed Mar 16 09:50:03 2011
>           State : clean
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 2
>   Spare Devices : 0
>        Checksum : ec151590 - correct
>          Events : 154
> 
>          Layout : near=2
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     2       8       65        2      active sync   /dev/sde1
> 
>    0     0       0        0        0      removed
>    1     1       0        0        1      faulty removed
>    2     2       8       65        2      active sync   /dev/sde1
>    3     3       0        0        3      faulty removed
> /dev/sdc1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan  2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
> 
>     Update Time : Fri Mar 18 16:37:45 2011
>           State : clean
>  Active Devices : 2
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 1
>        Checksum : ec181672 - correct
>          Events : 107
> 
>          Layout : near=2
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     1       8       17        1      active sync   /dev/sdb1
> 
>    0     0       0        0        0      removed
>    1     1       8       17        1      active sync   /dev/sdb1
>    2     2       0        0        2      faulty removed
>    3     3       8       49        3      active sync   /dev/sdd1
>    4     4       8       33        4      spare   /dev/sdc1
> /dev/sdd1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan  2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
> 
>     Update Time : Fri Mar 18 16:37:45 2011
>           State : clean
>  Active Devices : 2
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 1
>        Checksum : ec181696 - correct
>          Events : 107
> 
>          Layout : near=2
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     3       8       49        3      active sync   /dev/sdd1
> 
>    0     0       0        0        0      removed
>    1     1       8       17        1      active sync   /dev/sdb1
>    2     2       0        0        2      faulty removed
>    3     3       8       49        3      active sync   /dev/sdd1
>    4     4       8       33        4      spare   /dev/sdc1
> /dev/sde1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan  2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 5
> Preferred Minor : 0
> 
>     Update Time : Wed Mar 16 07:43:45 2011
>           State : clean
>  Active Devices : 4
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 1
>        Checksum : ec14f740 - correct
>          Events : 102
> 
>          Layout : near=2
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     0       8       33        0      active sync   /dev/sdc1
> 
>    0     0       8       33        0      active sync   /dev/sdc1
>    1     1       8       49        1      active sync   /dev/sdd1
>    2     2       8       65        2      active sync   /dev/sde1
>    3     3       8       81        3      active sync   /dev/sdf1
>    4     4       8       97        4      spare
> /dev/sdf1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan  2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
> 
>     Update Time : Fri Mar 18 16:37:45 2011
>           State : clean
>  Active Devices : 2
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 1
>        Checksum : ec181682 - correct
>          Events : 107
> 
>          Layout : near=2
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     4       8       33        4      spare   /dev/sdc1
> 
>    0     0       0        0        0      removed
>    1     1       8       17        1      active sync   /dev/sdb1
>    2     2       0        0        2      faulty removed
>    3     3       8       49        3      active sync   /dev/sdd1
>    4     4       8       33        4      spare   /dev/sdc1
> ====
> 
> 
> 
> Xavier
> xavier@alternatif.org - 09 54 06 16 26

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-19  1:42       ` NeilBrown
@ 2011-03-19 13:44         ` Xavier Brochard
  2011-03-19 15:14           ` Xavier Brochard
  2011-03-20  3:53           ` NeilBrown
  0 siblings, 2 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-19 13:44 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Le samedi 19 mars 2011 02:42:47 NeilBrown, vous avez écrit :
> I suggest you:
> 
>   mdadm --zero /dev/sdb1
> 
> having first double-checked that sdb1 is the devices with Events of 154,
> 
> then
> 
>  mdadm -S /dev/md0
>  mdadm -As /dev/md0
> 
> 
> and let the  array rebuild the spare.
> Then check the data and make sure it is all good.
> Then add /dev/sdb1 back in as the spare
>   mdadm /dev/md0 --add /dev/sdb1
> 
> and everything should be fine - providing you don't hit any hardware errors
> etc.

It didnt work, until I've stopped the raid array:
mdadm --zero /dev/sdg1
mdadm: Couldn't open /dev/sdg1 for write - not zeroing

is that normal, can I continue?

Xavier
xavier@alternatif.org
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-19 13:44         ` Xavier Brochard
@ 2011-03-19 15:14           ` Xavier Brochard
  2011-03-20  3:53           ` NeilBrown
  1 sibling, 0 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-19 15:14 UTC (permalink / raw)
  To: linux-raid


Le samedi 19 mars 2011 14:44:40, Xavier Brochard écrivait :
> Le samedi 19 mars 2011 02:42:47 NeilBrown, vous avez écrit :
> > I suggest you:
> >   mdadm --zero /dev/sdb1
> > 
> > having first double-checked that sdb1 is the devices with Events of 154,
> > 
> > then
> > 
> >  mdadm -S /dev/md0
> >  mdadm -As /dev/md0
> > 
> > and let the  array rebuild the spare.
> > Then check the data and make sure it is all good.
> > Then add /dev/sdb1 back in as the spare
> > 
> >   mdadm /dev/md0 --add /dev/sdb1
> > 
> > and everything should be fine - providing you don't hit any hardware
> > errors etc.
> 
> It didnt work, until I've stopped the raid array:
> mdadm --zero /dev/sdg1
> mdadm: Couldn't open /dev/sdg1 for write - not zeroing
> 
> is that normal, can I continue?

so far I've done:
mdam -S /dev/md0
mdadm --zero /dev/sdg1
mdadm -As /dev/md0 --config=/path/to/config

mdadm: /dev/md0 has been started with 2 drives (out of 4) and 1 spare.

ot started to recover:
md0 : active raid10 sdc1[1] sdf1[4] sde1[3]
      976767872 blocks 64K chunks 2 near-copies [4/2] [_U_U]
      [>....................]  recovery =  0.3% (1468160/488383936) 
finish=66.3min speed=122346K/sec

but why did it started md0 with 2 drives and not with 3 drives?

Xavier
xavier@alternatif.org - 09 54 06 16 26
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-19 13:44         ` Xavier Brochard
  2011-03-19 15:14           ` Xavier Brochard
@ 2011-03-20  3:53           ` NeilBrown
  2011-03-20 10:40             ` Xavier Brochard
  1 sibling, 1 reply; 28+ messages in thread
From: NeilBrown @ 2011-03-20  3:53 UTC (permalink / raw)
  To: Xavier Brochard; +Cc: linux-raid

On Sat, 19 Mar 2011 14:44:40 +0100 Xavier Brochard <xavier@alternatif.org>
wrote:

> Le samedi 19 mars 2011 02:42:47 NeilBrown, vous avez écrit :
> > I suggest you:
> > 
> >   mdadm --zero /dev/sdb1
> > 
> > having first double-checked that sdb1 is the devices with Events of 154,
> > 
> > then
> > 
> >  mdadm -S /dev/md0
> >  mdadm -As /dev/md0
> > 
> > 
> > and let the  array rebuild the spare.
> > Then check the data and make sure it is all good.
> > Then add /dev/sdb1 back in as the spare
> >   mdadm /dev/md0 --add /dev/sdb1
> > 
> > and everything should be fine - providing you don't hit any hardware errors
> > etc.
> 
> It didnt work, until I've stopped the raid array:
> mdadm --zero /dev/sdg1
> mdadm: Couldn't open /dev/sdg1 for write - not zeroing
> 
> is that normal, can I continue?
>

Yes, you are right.  You need to stop the array before you zero things.

So:
 
   mdadm -S /dev/md0
   mdadm --zero /dev/the-device-which-thinks-most-of-the-other-devices-have-failed
   mdadm -As /dev/md0

That last command might need to be
   mdadm -As /dev/md0 /dev/sdc1 /dev/sdd1 ..... list of all member devices.

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-20  3:53           ` NeilBrown
@ 2011-03-20 10:40             ` Xavier Brochard
  0 siblings, 0 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-20 10:40 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Le dimanche 20 mars 2011 04:53:23 NeilBrown, vous avez écrit :
> On Sat, 19 Mar 2011 14:44:40 +0100 Xavier Brochard <xavier@alternatif.org>
> 
> wrote:
> > Le samedi 19 mars 2011 02:42:47 NeilBrown, vous avez écrit :
> > > I suggest you:
> > >   mdadm --zero /dev/sdb1
> > > 
> > > having first double-checked that sdb1 is the devices with Events of
> > > 154,
> > > 
> > > then
> > > 
> > >  mdadm -S /dev/md0
> > >  mdadm -As /dev/md0
> > > 
> > > and let the  array rebuild the spare.
> > > Then check the data and make sure it is all good.
> > > Then add /dev/sdb1 back in as the spare
> > > 
> > >   mdadm /dev/md0 --add /dev/sdb1
> > > 
> > > and everything should be fine - providing you don't hit any hardware
> > > errors etc.
> > 
> > It didnt work, until I've stopped the raid array:
> > mdadm --zero /dev/sdg1
> > mdadm: Couldn't open /dev/sdg1 for write - not zeroing
> > 
> > is that normal, can I continue?
> 
> Yes, you are right.  You need to stop the array before you zero things.
> 
> So:
> 
>    mdadm -S /dev/md0
>    mdadm --zero
> /dev/the-device-which-thinks-most-of-the-other-devices-have-failed mdadm
> -As /dev/md0
> 
> That last command might need to be
>    mdadm -As /dev/md0 /dev/sdc1 /dev/sdd1 ..... list of all member devices.

I would like to thank you very much for you help.
You helped me to understand mdadm message and to keep only relevant part of 
the various errors I could see in the logs. Your explanation and instructions 
were a great help to repair .

Everything is working now.
Thanks again.

Xavier
xavier@alternatif.org 
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
       [not found]   ` <20110319102039.52cc2282@notabene.brown>
  2011-03-18 23:59     ` Xavier Brochard
@ 2011-03-19 12:01     ` Xavier Brochard
  1 sibling, 0 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-19 12:01 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Le samedi 19 mars 2011 00:20:39, NeilBrown  écrivait :
> On Fri, 18 Mar 2011 23:50:18 +0100 Xavier Brochard <xavier@alternatif.org>
> > Le vendredi 18 mars 2011 23:22:51, NeilBrown  écrivait :
> > > On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard
> > > > Le vendredi 18 mars 2011 18:22:34 hansbkk@gmail.com, vous avez écrit :
> > > > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard
> > > > > > disk order is mixed between each boot - even with live-cd.
> > > > > > is that normal?
> > > > > 
> > > > > If nothing is changing and the order is swapping really every boot,
> > > > > then IMO that is odd.
> > > > 
> > > > nothing has changed, except kernel minor version
> > > 
> > > Yet you don't tell us what the kernel minor version changed from or to.
> > 
> > Previously it was ubuntu 2.6.32-27-server or 2.6.32-28-server and now it
> > is ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13
> > 
> > > That may not be important, but it might and you obviously don't know
> > > which. It is always better to give too much information rather than
> > > not enough.

> > Here's full output of mdadm --examine /dev/sd[cdefg]1
> > As you can see, disks sdc, sdd and sde claims to be different, is it a
> > problem?
> 
> Where all of these outputs collected at the same time?  They seem
> inconsistent.
> 
> In particular, sdc1 has a higher 'events' number than the others (154 vs
> 102) yet an earlier Update Time.  It also thinks that the array is
> completely failed.
> So I suspect that device is badly confused and you probably want to zero
> it's metadata ... but don't do that too hastily.
> 
> All the other devices think the array is working correctly with a full
> compliment of devices.  However there is no device which claims to
> be "RaidDevice 2" - except sdc1 and it is obviously confused..
> 
> The device name listed in the table at the end of --examine output.
> It is the name that the device had when the metadata was last written.  And
> device names can change on reboot.
> The fact that the names don't line up suggest that the metadata hasn't been
> written since the last reboot - so presumably you aren't really using the
> array.(???)

The array was in use 24/24.
But the last reboot using it  was after the first error (I described it 
extensively in wednesday email). As I first thought it was a file system error, 
I've launched fsck to check the /tmp FS with fsck /dev/mapper/tout-tmp (it is 
a Raid10 + lvm setup). 
Can it be the reason for the metadata not written?

> [the newer 1.x metadata format doesn't try to record the names of devices
> in the superblock so it doesn't result in some of this confusion).

Yes it's really confusing:
the SAS/SATA controler card gives "numbers" for the hard drives
which doesn't correspond to the /dev/sd? names 
which doesn't correspond to the drive numer in the array
etc.

> Based on your earlier email, it would appear that the device discovery for
> some of your devices is happening in parallel at boot time, so or ordering
> could be random - each time you boot you get a different order.  This will
> not confuse md or mdadm - they look at the content of the devices rather
> than the name.

ok, thanks for making it very clear

> If you want a definitive name for each device, it might be a good idea to
> look in /dev/disk/by-path or /dev/disk/by-id and use names from there.

I think I can't: 
With System Rescue CD (2.6.35-std163-amd64 kernel) I have only one path 
available:
pci-0000:00:14.1-scsi-0:0:0:0
which, according to lspci, is not the LSI sas/sata controler, but the IDE 
interface:
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
While the LSI controler is at
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 
PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)

This make me a bit anxious to start the raid recovery!


Xavier
xavier@alternatif.org - 09 54 06 16 26
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
@ 2011-03-18 23:06 Xavier Brochard
  0 siblings, 0 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-18 23:06 UTC (permalink / raw)
  To: linux-raid

Le vendredi 18 mars 2011 23:14:05, NeilBrown  écrivait :
> On Fri, 18 Mar 2011 15:49:20 +0100 Xavier Brochard <xavier@alternatif.org>
> > trying to solve my problem with a unusable raid10 array, I discovered
> > that disk order is mixed between each boot - even with live-cd.
> > Here's an extract from dmesg:
> > [   12.5]  sda:
> > [   12.5]  sdc:
> > [   12.5]  sdd:
> > [   12.5]  sde: sdd1
> > [   12.5]  sdf: sdc1
> > [   12.5]  sda1 sda2
> > [   12.5]  sdg: sde1
> > [   12.5]  sdf1
> > 
> > is that normal?
> 
> You are saying that something changes between each boot, but only giving
> one example so that we cannot see the change.  That is not particularly
> helpful.

sorry, I didn't want to send too long email
as each dmesg show diffent but similar output


> The output above is a bit odd, but I think it is simply that the devices
> are all being examined in parallel so the per-device messages are being
> mingled together.
> Certainly 'sdd1' is on 'sdd', not no 'sde' as the message seems to show.

ok thanks


Xavier
xavier@alternatif.org - 09 54 06 16 26
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
@ 2011-03-18 23:06 Xavier Brochard
  2011-03-18 23:57 ` Roberto Spadim
  0 siblings, 1 reply; 28+ messages in thread
From: Xavier Brochard @ 2011-03-18 23:06 UTC (permalink / raw)
  To: linux-raid; +Cc: hansbkk

Hello,

Le vendredi 18 mars 2011 23:22:51, NeilBrown  écrivait :
> On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard <xavier@alternatif.org>
> > Le vendredi 18 mars 2011 18:22:34 hansbkk@gmail.com, vous avez écrit :
> > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard
> > > <xavier@alternatif.org>
> > 
> > wrote:
> > > > disk order is mixed between each boot - even with live-cd.
> > > > is that normal?
> > > 
> > > If nothing is changing and the order is swapping really every boot,
> > > then IMO that is odd.
> > 
> > nothing has changed, except kernel minor version
> 
> Yet you don't tell us what the kernel minor version changed from or to.

Previously it was ubuntu 2.6.32-27-server or 2.6.32-28-server and now it is 
ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13

> That may not be important, but it might and you obviously don't know which.
> It is always better to give too much information rather than not enough.

Again sorry, my wednesday email was long and I thought it was too long! 

> > exactly, in my case mdadm --examine output is somewhat weird as it shows:
> > /dev/sde1
> > this     0       8       33        0      active sync   /dev/sdd1
> > /dev/sdd1
> > this     0       8       33        0      active sync   /dev/sdc1
> > /dev/sdc1
> > this     0       8       33        0      active sync   /dev/sde1
> > and /dev/sdf1 as sdf1
> 
> You are hiding lots of details again...
> 
> Are these all from different arrays?  They all claim to be 'device 0' of
> some array.

They are all from same md RAID10 array

> Infact,  "8, 33" is *always* /dev/sdc1,  so I think the above lines have
> been edited by hand because I'm 100% certain mdadm didn't output them.

You're right, I'm sorry. I  have copied this line, just changing the /dev/sd?

Here's full output of mdadm --examine /dev/sd[cdefg]1
As you can see, disks sdc, sdd and sde claims to be different, is it a problem?
======================================
/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
  Creation Time : Sun Jan  2 16:41:45 2011
     Raid Level : raid10
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 0

    Update Time : Wed Mar 16 09:50:03 2011
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 2
  Spare Devices : 0
       Checksum : ec151590 - correct
         Events : 154

         Layout : near=2
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       65        2      active sync   /dev/sde1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       65        2      active sync   /dev/sde1
   3     3       0        0        3      faulty removed
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
  Creation Time : Sun Jan  2 16:41:45 2011
     Raid Level : raid10
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 0

    Update Time : Wed Mar 16 07:43:45 2011
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : ec14f740 - correct
         Events : 102

         Layout : near=2
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       33        0      active sync   /dev/sdc1

   0     0       8       33        0      active sync   /dev/sdc1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       81        3      active sync   /dev/sdf1
   4     4       8       97        4      spare   /dev/sdg1
/dev/sde1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
  Creation Time : Sun Jan  2 16:41:45 2011
     Raid Level : raid10
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 0

    Update Time : Wed Mar 16 07:43:45 2011
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : ec14f752 - correct
         Events : 102

         Layout : near=2
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       49        1      active sync   /dev/sdd1

   0     0       8       33        0      active sync   /dev/sdc1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       81        3      active sync   /dev/sdf1
   4     4       8       97        4      spare   /dev/sdg1
/dev/sdf1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
  Creation Time : Sun Jan  2 16:41:45 2011
     Raid Level : raid10
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 0

    Update Time : Wed Mar 16 07:43:45 2011
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : ec14f776 - correct
         Events : 102

         Layout : near=2
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       81        3      active sync   /dev/sdf1

   0     0       8       33        0      active sync   /dev/sdc1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       81        3      active sync   /dev/sdf1
   4     4       8       97        4      spare   /dev/sdg1
/dev/sdg1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
  Creation Time : Sun Jan  2 16:41:45 2011
     Raid Level : raid10
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 0

    Update Time : Wed Mar 16 07:43:45 2011
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1
       Checksum : ec14f782 - correct
         Events : 102

         Layout : near=2
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       97        4      spare   /dev/sdg1

   0     0       8       33        0      active sync   /dev/sdc1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       81        3      active sync   /dev/sdf1
   4     4       8       97        4      spare   /dev/sdg1
===========


> > I think I can believe mdadm?
> 
> Yes, you can believe mdadm - but only if you understand what it is saying,
> and there are times when that is not as easy as one might like....

Specially when a raid system is broken! One mind looks broken too and it's a 
bit hard to think clearly :-)

Thanks for the help

Xavier
xavier@alternatif.org - 09 54 06 16 26
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-18 23:06 Xavier Brochard
@ 2011-03-18 23:57 ` Roberto Spadim
  2011-03-19  0:03   ` Xavier Brochard
  0 siblings, 1 reply; 28+ messages in thread
From: Roberto Spadim @ 2011-03-18 23:57 UTC (permalink / raw)
  To: Xavier Brochard; +Cc: linux-raid, hansbkk

did you try to change udev configuration?

2011/3/18 Xavier Brochard <xavier@alternatif.org>:
> Hello,
>
> Le vendredi 18 mars 2011 23:22:51, NeilBrown  écrivait :
>> On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard <xavier@alternatif.org>
>> > Le vendredi 18 mars 2011 18:22:34 hansbkk@gmail.com, vous avez écrit :
>> > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard
>> > > <xavier@alternatif.org>
>> >
>> > wrote:
>> > > > disk order is mixed between each boot - even with live-cd.
>> > > > is that normal?
>> > >
>> > > If nothing is changing and the order is swapping really every boot,
>> > > then IMO that is odd.
>> >
>> > nothing has changed, except kernel minor version
>>
>> Yet you don't tell us what the kernel minor version changed from or to.
>
> Previously it was ubuntu 2.6.32-27-server or 2.6.32-28-server and now it is
> ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13
>
>> That may not be important, but it might and you obviously don't know which.
>> It is always better to give too much information rather than not enough.
>
> Again sorry, my wednesday email was long and I thought it was too long!
>
>> > exactly, in my case mdadm --examine output is somewhat weird as it shows:
>> > /dev/sde1
>> > this     0       8       33        0      active sync   /dev/sdd1
>> > /dev/sdd1
>> > this     0       8       33        0      active sync   /dev/sdc1
>> > /dev/sdc1
>> > this     0       8       33        0      active sync   /dev/sde1
>> > and /dev/sdf1 as sdf1
>>
>> You are hiding lots of details again...
>>
>> Are these all from different arrays?  They all claim to be 'device 0' of
>> some array.
>
> They are all from same md RAID10 array
>
>> Infact,  "8, 33" is *always* /dev/sdc1,  so I think the above lines have
>> been edited by hand because I'm 100% certain mdadm didn't output them.
>
> You're right, I'm sorry. I  have copied this line, just changing the /dev/sd?
>
> Here's full output of mdadm --examine /dev/sd[cdefg]1
> As you can see, disks sdc, sdd and sde claims to be different, is it a problem?
> ======================================
> /dev/sdc1:
>          Magic : a92b4efc
>        Version : 0.90.00
>           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>  Creation Time : Sun Jan  2 16:41:45 2011
>     Raid Level : raid10
>  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>     Array Size : 976767872 (931.52 GiB 1000.21 GB)
>   Raid Devices : 4
>  Total Devices : 5
> Preferred Minor : 0
>
>    Update Time : Wed Mar 16 09:50:03 2011
>          State : clean
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 2
>  Spare Devices : 0
>       Checksum : ec151590 - correct
>         Events : 154
>
>         Layout : near=2
>     Chunk Size : 64K
>
>      Number   Major   Minor   RaidDevice State
> this     2       8       65        2      active sync   /dev/sde1
>
>   0     0       0        0        0      removed
>   1     1       0        0        1      faulty removed
>   2     2       8       65        2      active sync   /dev/sde1
>   3     3       0        0        3      faulty removed
> /dev/sdd1:
>          Magic : a92b4efc
>        Version : 0.90.00
>           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>  Creation Time : Sun Jan  2 16:41:45 2011
>     Raid Level : raid10
>  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>     Array Size : 976767872 (931.52 GiB 1000.21 GB)
>   Raid Devices : 4
>  Total Devices : 5
> Preferred Minor : 0
>
>    Update Time : Wed Mar 16 07:43:45 2011
>          State : clean
>  Active Devices : 4
> Working Devices : 5
>  Failed Devices : 0
>  Spare Devices : 1
>       Checksum : ec14f740 - correct
>         Events : 102
>
>         Layout : near=2
>     Chunk Size : 64K
>
>      Number   Major   Minor   RaidDevice State
> this     0       8       33        0      active sync   /dev/sdc1
>
>   0     0       8       33        0      active sync   /dev/sdc1
>   1     1       8       49        1      active sync   /dev/sdd1
>   2     2       8       65        2      active sync   /dev/sde1
>   3     3       8       81        3      active sync   /dev/sdf1
>   4     4       8       97        4      spare   /dev/sdg1
> /dev/sde1:
>          Magic : a92b4efc
>        Version : 0.90.00
>           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>  Creation Time : Sun Jan  2 16:41:45 2011
>     Raid Level : raid10
>  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>     Array Size : 976767872 (931.52 GiB 1000.21 GB)
>   Raid Devices : 4
>  Total Devices : 5
> Preferred Minor : 0
>
>    Update Time : Wed Mar 16 07:43:45 2011
>          State : clean
>  Active Devices : 4
> Working Devices : 5
>  Failed Devices : 0
>  Spare Devices : 1
>       Checksum : ec14f752 - correct
>         Events : 102
>
>         Layout : near=2
>     Chunk Size : 64K
>
>      Number   Major   Minor   RaidDevice State
> this     1       8       49        1      active sync   /dev/sdd1
>
>   0     0       8       33        0      active sync   /dev/sdc1
>   1     1       8       49        1      active sync   /dev/sdd1
>   2     2       8       65        2      active sync   /dev/sde1
>   3     3       8       81        3      active sync   /dev/sdf1
>   4     4       8       97        4      spare   /dev/sdg1
> /dev/sdf1:
>          Magic : a92b4efc
>        Version : 0.90.00
>           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>  Creation Time : Sun Jan  2 16:41:45 2011
>     Raid Level : raid10
>  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>     Array Size : 976767872 (931.52 GiB 1000.21 GB)
>   Raid Devices : 4
>  Total Devices : 5
> Preferred Minor : 0
>
>    Update Time : Wed Mar 16 07:43:45 2011
>          State : clean
>  Active Devices : 4
> Working Devices : 5
>  Failed Devices : 0
>  Spare Devices : 1
>       Checksum : ec14f776 - correct
>         Events : 102
>
>         Layout : near=2
>     Chunk Size : 64K
>
>      Number   Major   Minor   RaidDevice State
> this     3       8       81        3      active sync   /dev/sdf1
>
>   0     0       8       33        0      active sync   /dev/sdc1
>   1     1       8       49        1      active sync   /dev/sdd1
>   2     2       8       65        2      active sync   /dev/sde1
>   3     3       8       81        3      active sync   /dev/sdf1
>   4     4       8       97        4      spare   /dev/sdg1
> /dev/sdg1:
>          Magic : a92b4efc
>        Version : 0.90.00
>           UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>  Creation Time : Sun Jan  2 16:41:45 2011
>     Raid Level : raid10
>  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>     Array Size : 976767872 (931.52 GiB 1000.21 GB)
>   Raid Devices : 4
>  Total Devices : 5
> Preferred Minor : 0
>
>    Update Time : Wed Mar 16 07:43:45 2011
>          State : clean
>  Active Devices : 4
> Working Devices : 5
>  Failed Devices : 0
>  Spare Devices : 1
>       Checksum : ec14f782 - correct
>         Events : 102
>
>         Layout : near=2
>     Chunk Size : 64K
>
>      Number   Major   Minor   RaidDevice State
> this     4       8       97        4      spare   /dev/sdg1
>
>   0     0       8       33        0      active sync   /dev/sdc1
>   1     1       8       49        1      active sync   /dev/sdd1
>   2     2       8       65        2      active sync   /dev/sde1
>   3     3       8       81        3      active sync   /dev/sdf1
>   4     4       8       97        4      spare   /dev/sdg1
> ===========
>
>
>> > I think I can believe mdadm?
>>
>> Yes, you can believe mdadm - but only if you understand what it is saying,
>> and there are times when that is not as easy as one might like....
>
> Specially when a raid system is broken! One mind looks broken too and it's a
> bit hard to think clearly :-)
>
> Thanks for the help
>
> Xavier
> xavier@alternatif.org - 09 54 06 16 26
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: disk order problem in a raid 10 array
  2011-03-18 23:57 ` Roberto Spadim
@ 2011-03-19  0:03   ` Xavier Brochard
  0 siblings, 0 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-19  0:03 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: linux-raid

Le samedi 19 mars 2011 00:57:15, Roberto Spadim écrivait :
> did you try to change udev configuration?

no

But attempting to boot on a 2.6.32-27 or 2.6.32-24 kernel result in a freeze 
short after:
ata_id[919] : HDIO_GET_IDENTITY failed for '/dev/sdb'
And while rebooting with alt + Sysrq + REISUB keys, I saw that udev was stuck 
on waiting a logitech usb mouse. 

Xavier
xavier@alternatif.org
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2011-03-21 21:02 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-18 14:49 disk order problem in a raid 10 array Xavier Brochard
2011-03-18 17:22 ` hansbkk
2011-03-18 20:09   ` Xavier Brochard
2011-03-18 20:12   ` Xavier Brochard
2011-03-18 22:22     ` NeilBrown
2011-03-18 20:26 ` Adaptive throttling for RAID1 background resync Hari Subramanian
2011-03-18 20:28   ` Roberto Spadim
2011-03-18 20:31     ` Hari Subramanian
2011-03-18 20:36       ` Roberto Spadim
2011-03-18 20:54         ` Hari Subramanian
2011-03-18 21:02           ` Roberto Spadim
2011-03-18 22:11   ` NeilBrown
2011-03-21 21:02     ` Hari Subramanian
2011-03-18 22:14 ` disk order problem in a raid 10 array NeilBrown
     [not found] ` <201103182350.19281.xavier@alternatif.org>
     [not found]   ` <20110319102039.52cc2282@notabene.brown>
2011-03-18 23:59     ` Xavier Brochard
2011-03-19  0:05       ` Xavier Brochard
2011-03-19  0:07         ` Roberto Spadim
2011-03-19  0:25           ` Xavier Brochard
2011-03-19  1:42       ` NeilBrown
2011-03-19 13:44         ` Xavier Brochard
2011-03-19 15:14           ` Xavier Brochard
2011-03-20  3:53           ` NeilBrown
2011-03-20 10:40             ` Xavier Brochard
2011-03-19 12:01     ` Xavier Brochard
  -- strict thread matches above, loose matches on Subject: below --
2011-03-18 23:06 Xavier Brochard
2011-03-18 23:06 Xavier Brochard
2011-03-18 23:57 ` Roberto Spadim
2011-03-19  0:03   ` Xavier Brochard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).