* disk order problem in a raid 10 array
@ 2011-03-18 14:49 Xavier Brochard
2011-03-18 17:22 ` hansbkk
` (3 more replies)
0 siblings, 4 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-18 14:49 UTC (permalink / raw)
To: linux-raid
Hello
trying to solve my problem with an unusable raid10 array, I discovered that
the disk order changes between boots - even with a live CD.
Here's an extract from dmesg:
[ 12.5] sda:
[ 12.5] sdc:
[ 12.5] sdd:
[ 12.5] sde: sdd1
[ 12.5] sdf: sdc1
[ 12.5] sda1 sda2
[ 12.5] sdg: sde1
[ 12.5] sdf1
is that normal?
could this be a sign of a hardware controller problem?
could it happen because all the disks are SATA-3 except one SSD, which is SATA-2?
Xavier
xavier@alternatif.org
* Re: disk order problem in a raid 10 array
2011-03-18 14:49 disk order problem in a raid 10 array Xavier Brochard
@ 2011-03-18 17:22 ` hansbkk
2011-03-18 20:09 ` Xavier Brochard
2011-03-18 20:12 ` Xavier Brochard
2011-03-18 20:26 ` Adaptive throttling for RAID1 background resync Hari Subramanian
` (2 subsequent siblings)
3 siblings, 2 replies; 28+ messages in thread
From: hansbkk @ 2011-03-18 17:22 UTC (permalink / raw)
To: Xavier Brochard; +Cc: linux-raid
On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard <xavier@alternatif.org> wrote:
> disk order is mixed between each boot - even with live-cd.
> is that normal?
If nothing is changing and the order really is swapping on every boot,
then IMO that is odd.
But it's very normal for ordering to change from time to time, and
definitely when elements change - kernel version/flavor, drivers, BIOS
settings, etc.
Part of my SOP is now to record both mdadm and the boot loader's
ordering against serial number and UUID of drives when creating an
array, and to put the relevant information on labels securely attached
to the physical drives, along with creating a map of their physical
location and taping that inside the case.
It's critical to know what's what in a crisis. . .
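A rough sketch of the kind of record I mean (assuming smartmontools is
installed; the device names and md0 are purely illustrative):

  # map each current sdX name to its serial number
  for d in /dev/sd[a-g]; do
      printf '%s  ' "$d"
      smartctl -i "$d" | grep -i 'serial number'
  done
  mdadm --detail /dev/md0      # which sdX1 currently sits in which RAID slot
  blkid /dev/sd[a-g]1          # the member/filesystem UUIDs

A copy of that output kept off the machine is worth as much as the labels
themselves.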
* Re: disk order problem in a raid 10 array
2011-03-18 17:22 ` hansbkk
@ 2011-03-18 20:09 ` Xavier Brochard
2011-03-18 20:12 ` Xavier Brochard
1 sibling, 0 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-18 20:09 UTC (permalink / raw)
To: hansbkk; +Cc: linux-raid
On Friday 18 March 2011 18:22:34, hansbkk@gmail.com wrote:
> On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard <xavier@alternatif.org> wrote:
> > disk order is mixed between each boot - even with live-cd.
> > is that normal?
>
> If nothing is changing and the order is swapping really every boot,
> then IMO that is odd.
nothing has changed, except the kernel minor version
>
> Part of my SOP is now to record both mdadm and the boot loader's
> ordering against serial number and UUID of drives when creating an
> array, and to put the relevant information on labels securely attached
> to the physical drives, along with creating a map of their physical
> location and taping that inside the case.
>
> It's critical to know what's what in a crisis. . .
mdadm --examine output is somewhat weird as it shows
/dev/sde1
this 0 8 33 0 active sync /dev/sdd1
/dev/sdd1
this 0 8 33 0 active sync /dev/sdc1
/dev/sdc1
this 0 8 33 0 active sync /dev/sde1
and /dev/sdf1 as sdf1
I think I can believe mdadm?
and that the /proc/mdstat content comes directly from mdadm (that is, with the
"real" sdc, sdd, sde)?
what troubles me is that after I removed 2 disk drives from the bay, mdadm started
to recover:
md0 : active raid10 sdb1[1] sdc1[4] sdd1[3]
976767872 blocks 64K chunks 2 near-copies [4/2] [_U_U]
[=>...................] recovery = 5.0% (24436736/488383936)
finish=56.2min speed=137513K/sec
I guess that it is ok, and that it is recovering with the spare. But I would
like to be sure...
Xavier
xavier@alternatif.org - 09 54 06 16 26
* Re: disk order problem in a raid 10 array
2011-03-18 17:22 ` hansbkk
2011-03-18 20:09 ` Xavier Brochard
@ 2011-03-18 20:12 ` Xavier Brochard
2011-03-18 22:22 ` NeilBrown
1 sibling, 1 reply; 28+ messages in thread
From: Xavier Brochard @ 2011-03-18 20:12 UTC (permalink / raw)
To: hansbkk; +Cc: linux-raid
On Friday 18 March 2011 18:22:34, hansbkk@gmail.com wrote:
> On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard <xavier@alternatif.org> wrote:
> > disk order is mixed between each boot - even with live-cd.
> > is that normal?
>
> If nothing is changing and the order is swapping really every boot,
> then IMO that is odd.
nothing has changed, except the kernel minor version
>
> Part of my SOP is now to record both mdadm and the boot loader's
> ordering against serial number and UUID of drives when creating an
> array, and to put the relevant information on labels securely attached
> to the physical drives, along with creating a map of their physical
> location and taping that inside the case.
>
> It's critical to know what's what in a crisis. . .
exactly, in my case mdadm --examine output is somewhat weird as it shows:
/dev/sde1
this 0 8 33 0 active sync /dev/sdd1
/dev/sdd1
this 0 8 33 0 active sync /dev/sdc1
/dev/sdc1
this 0 8 33 0 active sync /dev/sde1
and /dev/sdf1 as sdf1
I think I can believe mdadm?
and that the /proc/mdstat content comes directly from mdadm (that is, with the
"exact" sdc, sdd, sde)?
what troubles me is that after I removed 2 disk drives from the bay, mdadm started
to recover:
md0 : active raid10 sdb1[1] sdc1[4] sdd1[3]
976767872 blocks 64K chunks 2 near-copies [4/2] [_U_U]
[=>...................] recovery = 5.0% (24436736/488383936)
finish=56.2min speed=137513K/sec
I guess that it is ok, and that it is recovering with the spare. But I would
like to be sure...
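For what it's worth, here is roughly how I plan to double-check which device it
is rebuilding onto (assuming the array is /dev/md0) - please correct me if this
is not a valid check:

  mdadm --detail /dev/md0    # the rebuilding member should be listed as "spare rebuilding"
  cat /proc/mdstat           # the underscores in [_U_U] show which slots are still missing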
Xavier
xavier@alternatif.org - 09 54 06 16 26
* Re: disk order problem in a raid 10 array
2011-03-18 20:12 ` Xavier Brochard
@ 2011-03-18 22:22 ` NeilBrown
0 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2011-03-18 22:22 UTC (permalink / raw)
To: Xavier Brochard; +Cc: hansbkk, linux-raid
On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard <xavier@alternatif.org>
wrote:
> On Friday 18 March 2011 18:22:34, hansbkk@gmail.com wrote:
> > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard <xavier@alternatif.org>
> wrote:
> > > disk order is mixed between each boot - even with live-cd.
> > > is that normal?
> >
> > If nothing is changing and the order is swapping really every boot,
> > then IMO that is odd.
>
> nothing has changed, except kernel minor version
Yet you don't tell us what the kernel minor version changed from or to.
That may not be important, but it might be, and you obviously don't know which.
It is always better to give too much information rather than not enough.
>
> >
> > Part of my SOP is now to record both mdadm and the boot loader's
> > ordering against serial number and UUID of drives when creating an
> > array, and to put the relevant information on labels securely attached
> > to the physical drives, along with creating a map of their physical
> > location and taping that inside the case.
> >
> > It's critical to know what's what in a crisis. . .
>
> exactly, in my case mdadm --examine output is somewhat weird as it shows:
> /dev/sde1
> this 0 8 33 0 active sync /dev/sdd1
> /dev/sdd1
> this 0 8 33 0 active sync /dev/sdc1
> /dev/sdc1
> this 0 8 33 0 active sync /dev/sde1
> and /dev/sdf1 as sdf1
You are hiding lots of details again...
Are these all from different arrays? They all claim to be 'device 0' of some
array.
In fact, "8, 33" is *always* /dev/sdc1, so I think the above lines have been
edited by hand, because I'm 100% certain mdadm didn't output them.
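For reference, that fixed mapping is easy to check from userspace, e.g. (using
the device names from your mail):

  grep 'sd[c-e]' /proc/partitions    # first two columns are major and minor; 8 33 is sdc1
  ls -l /dev/sd[c-e]1                # the "8, 33" style pair appears where a file size normally would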
>
> I think I can believe mdadm?
Yes, you can believe mdadm - but only if you understand what it is saying,
and there are times when that is not as easy as one might like....
> and that /proc/mdstat content comes directly from mdadm (that is with "exact"
> sdc,d,e)?
>
> what trouble me is that after I removed 2 disk drive from the bay, mdadm start
> to recover:
> md0 : active raid10 sdb1[1] sdc1[4] sdd1[3]
> 976767872 blocks 64K chunks 2 near-copies [4/2] [_U_U]
> [=>...................] recovery = 5.0% (24436736/488383936)
> finish=56.2min speed=137513K/sec
Why exactly does this trouble you? It seems to be doing exactly the right
thing.
>
> I guess that it is ok, and that it is recovering with the spare. But I would
> like to be sure...
Sure of what? If you want a clear answer you need to ask a clear question.
NeilBrown
* Adaptive throttling for RAID1 background resync
2011-03-18 14:49 disk order problem in a raid 10 array Xavier Brochard
2011-03-18 17:22 ` hansbkk
@ 2011-03-18 20:26 ` Hari Subramanian
2011-03-18 20:28 ` Roberto Spadim
2011-03-18 22:11 ` NeilBrown
2011-03-18 22:14 ` disk order problem in a raid 10 array NeilBrown
[not found] ` <201103182350.19281.xavier@alternatif.org>
3 siblings, 2 replies; 28+ messages in thread
From: Hari Subramanian @ 2011-03-18 20:26 UTC (permalink / raw)
To: linux-raid@vger.kernel.org
I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s are active in the slab cache.
Our guess is that MD is allowing data to be read from the fast disk at a rate much higher than what the slow disk is able to write. This continues for a long time (> 1 minute) in an unbounded fashion, resulting in a buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected).
From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
Can someone confirm or deny these claims, and also the need for a new solution? Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for the varying IO throughputs from the different RAID1 replicas.
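For context, the static limits I am referring to are the standard md knobs; we set them roughly like this (the md0 name and the exact numbers are only illustrative):

  echo 20000  > /sys/block/md0/md/sync_speed_min    # KB/s, per device
  echo 200000 > /sys/block/md0/md/sync_speed_max

or the global equivalents in /proc/sys/dev/raid/speed_limit_min and speed_limit_max.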
Thanks
~ Hari
* Re: Adaptive throttling for RAID1 background resync
2011-03-18 20:26 ` Adaptive throttling for RAID1 background resync Hari Subramanian
@ 2011-03-18 20:28 ` Roberto Spadim
2011-03-18 20:31 ` Hari Subramanian
2011-03-18 22:11 ` NeilBrown
1 sibling, 1 reply; 28+ messages in thread
From: Roberto Spadim @ 2011-03-18 20:28 UTC (permalink / raw)
To: Hari Subramanian; +Cc: linux-raid@vger.kernel.org
maybe this could be better solved in the Linux kernel queue area... at the
elevators or block devices
2011/3/18 Hari Subramanian <hari@vmware.com>:
> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
>
> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>
> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>
> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.
>
> Thanks
> ~ Hari
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
* RE: Adaptive throttling for RAID1 background resync
2011-03-18 20:28 ` Roberto Spadim
@ 2011-03-18 20:31 ` Hari Subramanian
2011-03-18 20:36 ` Roberto Spadim
0 siblings, 1 reply; 28+ messages in thread
From: Hari Subramanian @ 2011-03-18 20:31 UTC (permalink / raw)
To: Roberto Spadim; +Cc: linux-raid@vger.kernel.org
Roberto, my use case involves both foreground IO and background resyncs happening at the same time. So, by throttling at the block or IO queues, I would be limiting my throughput for the foreground IOs as well, which is undesirable.
~ Hari
-----Original Message-----
From: rspadim@gmail.com [mailto:rspadim@gmail.com] On Behalf Of Roberto Spadim
Sent: Friday, March 18, 2011 4:29 PM
To: Hari Subramanian
Cc: linux-raid@vger.kernel.org
Subject: Re: Adaptive throttling for RAID1 background resync
maybe this could be better solved at queue linux kernel area... at
elevators or block devices
2011/3/18 Hari Subramanian <hari@vmware.com>:
> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
>
> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>
> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>
> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.
>
> Thanks
> ~ Hari
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: Adaptive throttling for RAID1 background resync
2011-03-18 20:31 ` Hari Subramanian
@ 2011-03-18 20:36 ` Roberto Spadim
2011-03-18 20:54 ` Hari Subramanian
0 siblings, 1 reply; 28+ messages in thread
From: Roberto Spadim @ 2011-03-18 20:36 UTC (permalink / raw)
To: Hari Subramanian; +Cc: linux-raid@vger.kernel.org
hum, isn't it an IO queue size (very big RAM memory queue) problem?
maybe making it smaller could help?
resync is something like "read here, write there"; if you have a write
problem, reads should stop when async writes can't go any further (no RAM
memory left)
am I right? if true, that's why I think the queue is a point to check
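something like this to check it and shrink it (just an example, assuming
/dev/sdb is the slow member):

  cat /sys/block/sdb/queue/nr_requests     # default is usually 128
  echo 64 > /sys/block/sdb/queue/nr_requests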
2011/3/18 Hari Subramanian <hari@vmware.com>:
> Roberto, My use case involves both foreground and background resyncs happening at the same time. So, by throttling it at the block or IO queues, I would be limiting my throughout for foreground IOs as well which is undesirable.
>
> ~ Hari
>
> -----Original Message-----
> From: rspadim@gmail.com [mailto:rspadim@gmail.com] On Behalf Of Roberto Spadim
> Sent: Friday, March 18, 2011 4:29 PM
> To: Hari Subramanian
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Adaptive throttling for RAID1 background resync
>
> maybe this could be better solved at queue linux kernel area... at
> elevators or block devices
>
> 2011/3/18 Hari Subramanian <hari@vmware.com>:
>> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
>>
>> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>>
>> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>>
>> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.
>>
>> Thanks
>> ~ Hari
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
* RE: Adaptive throttling for RAID1 background resync
2011-03-18 20:36 ` Roberto Spadim
@ 2011-03-18 20:54 ` Hari Subramanian
2011-03-18 21:02 ` Roberto Spadim
0 siblings, 1 reply; 28+ messages in thread
From: Hari Subramanian @ 2011-03-18 20:54 UTC (permalink / raw)
To: Roberto Spadim; +Cc: linux-raid@vger.kernel.org
Roberto, I still think the solution you point out has the potential to throttle foreground IOs issued to MD from the filesystem as well as the MD-initiated background resyncs. So I don't want to limit the IO queues, especially since our foreground workload involves a LOT of small random IO.
Thanks
~ Hari
-----Original Message-----
From: rspadim@gmail.com [mailto:rspadim@gmail.com] On Behalf Of Roberto Spadim
Sent: Friday, March 18, 2011 4:36 PM
To: Hari Subramanian
Cc: linux-raid@vger.kernel.org
Subject: Re: Adaptive throttling for RAID1 background resync
hum, it´s not a io queue size (very big ram memory queue) problem?
maybe getting it smaller could help?
resync is something like read here write there, if you have write
problem, read should stop when async writes can´t work more (no ram
memory)
i´m right? if true, that´s why i think queue is a point to check
2011/3/18 Hari Subramanian <hari@vmware.com>:
> Roberto, My use case involves both foreground and background resyncs happening at the same time. So, by throttling it at the block or IO queues, I would be limiting my throughout for foreground IOs as well which is undesirable.
>
> ~ Hari
>
> -----Original Message-----
> From: rspadim@gmail.com [mailto:rspadim@gmail.com] On Behalf Of Roberto Spadim
> Sent: Friday, March 18, 2011 4:29 PM
> To: Hari Subramanian
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Adaptive throttling for RAID1 background resync
>
> maybe this could be better solved at queue linux kernel area... at
> elevators or block devices
>
> 2011/3/18 Hari Subramanian <hari@vmware.com>:
>> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
>>
>> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>>
>> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>>
>> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.
>>
>> Thanks
>> ~ Hari
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: Adaptive throttling for RAID1 background resync
2011-03-18 20:54 ` Hari Subramanian
@ 2011-03-18 21:02 ` Roberto Spadim
0 siblings, 0 replies; 28+ messages in thread
From: Roberto Spadim @ 2011-03-18 21:02 UTC (permalink / raw)
To: Hari Subramanian; +Cc: linux-raid@vger.kernel.org
humm, let's wait for other ideas from the list
2011/3/18 Hari Subramanian <hari@vmware.com>:
> Roberto, I still think the solution you point out has the potential for throttling foreground IOs issued to MD from the filesystem as well as the MD initiated background resyncs. So, I don't want to limit the IO queues, esp since our foreground workload involves a LOT of small random IO.
>
> Thanks
> ~ Hari
>
> -----Original Message-----
> From: rspadim@gmail.com [mailto:rspadim@gmail.com] On Behalf Of Roberto Spadim
> Sent: Friday, March 18, 2011 4:36 PM
> To: Hari Subramanian
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Adaptive throttling for RAID1 background resync
>
> hum, it´s not a io queue size (very big ram memory queue) problem?
> maybe getting it smaller could help?
> resync is something like read here write there, if you have write
> problem, read should stop when async writes can´t work more (no ram
> memory)
> i´m right? if true, that´s why i think queue is a point to check
>
> 2011/3/18 Hari Subramanian <hari@vmware.com>:
>> Roberto, My use case involves both foreground and background resyncs happening at the same time. So, by throttling it at the block or IO queues, I would be limiting my throughout for foreground IOs as well which is undesirable.
>>
>> ~ Hari
>>
>> -----Original Message-----
>> From: rspadim@gmail.com [mailto:rspadim@gmail.com] On Behalf Of Roberto Spadim
>> Sent: Friday, March 18, 2011 4:29 PM
>> To: Hari Subramanian
>> Cc: linux-raid@vger.kernel.org
>> Subject: Re: Adaptive throttling for RAID1 background resync
>>
>> maybe this could be better solved at queue linux kernel area... at
>> elevators or block devices
>>
>> 2011/3/18 Hari Subramanian <hari@vmware.com>:
>>> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
>>>
>>> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>>>
>>> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>>>
>>> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.
>>>
>>> Thanks
>>> ~ Hari
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: Adaptive throttling for RAID1 background resync
2011-03-18 20:26 ` Adaptive throttling for RAID1 background resync Hari Subramanian
2011-03-18 20:28 ` Roberto Spadim
@ 2011-03-18 22:11 ` NeilBrown
2011-03-21 21:02 ` Hari Subramanian
1 sibling, 1 reply; 28+ messages in thread
From: NeilBrown @ 2011-03-18 22:11 UTC (permalink / raw)
To: Hari Subramanian; +Cc: linux-raid@vger.kernel.org
On Fri, 18 Mar 2011 13:26:52 -0700 Hari Subramanian <hari@vmware.com> wrote:
> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
>
> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>
> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>
> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.
The thing you are missing that already exists is
#define RESYNC_DEPTH 32
which is a limit placed on conf->barrier, where conf->barrier is incremented
before submitting a resync IO, and decremented after completing a resync IO.
So there can never be more than 32 bios per device in use for resync.
12,000 active biovec-64s sounds a lot like a memory leak - something isn't
freeing them.
Is there some 'bio-XXX' slab with a similar count? If there isn't, then the
bio was released without releasing the biovec, which would be bad.
If there is - that information would help.
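Something along these lines will show it (exact slab names vary a little
between kernels):

  grep -E '^bio|biovec' /proc/slabinfo
  slabtop -o | grep -i bio      # same information, sorted by usage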
NeilBrown
* RE: Adaptive throttling for RAID1 background resync
2011-03-18 22:11 ` NeilBrown
@ 2011-03-21 21:02 ` Hari Subramanian
0 siblings, 0 replies; 28+ messages in thread
From: Hari Subramanian @ 2011-03-21 21:02 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid@vger.kernel.org
Hi Neil,
There are an equal number of BIOs and biovec-64s. But I understand why that is the case now. It turns out that some changes made by one of our performance engineers, in the interest of increasing the performance of background resyncs when there are foreground I/Os, get in the way of (or effectively neuter) the RESYNC_DEPTH throttle that exists today.
The gist of these changes is that:
- We hold the barrier across the resync window to disallow foreground IOs from interrupting background resyncs, and raise_barrier is not being invoked in raid1 sync_request
- We increase the resync window to 8M and resync chunk size to 256K
The combination of these factors caused us to have a huge number of IOs outstanding and as much as 256M of resync data pages. We are working on a fix for this. I can share a patch to MD that implements these changes if someone is interested.
Thanks again for your help!
~ Hari
-----Original Message-----
From: NeilBrown [mailto:neilb@suse.de]
Sent: Friday, March 18, 2011 6:12 PM
To: Hari Subramanian
Cc: linux-raid@vger.kernel.org
Subject: Re: Adaptive throttling for RAID1 background resync
On Fri, 18 Mar 2011 13:26:52 -0700 Hari Subramanian <hari@vmware.com> wrote:
> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see that a whole lot (12,000+) of biovec-64s that are active on the slab cache.
>
> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>
> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>
> Can someone confirm or deny these claims and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs form the different RAID1 replicas.
The thing you are missing that already exists is
#define RESYNC_DEPTH 32
which is a limit places on conf->barrier, where conf->barrier is incremented
before submitting a resync IO, and decremented after completing a resync IO.
So there can never be more than 32 bios per device in use for resync.
12,000 active biovec-64s sounds a lot like a memory leak - something isn't
freeing them.
Is there some 'bio-XXX' slab with a similar count. If there isn't, then the
bio was released without releasing the biovec, which would be bad.
If there is - that information would help.
NeilBrown
* Re: disk order problem in a raid 10 array
2011-03-18 14:49 disk order problem in a raid 10 array Xavier Brochard
2011-03-18 17:22 ` hansbkk
2011-03-18 20:26 ` Adaptive throttling for RAID1 background resync Hari Subramanian
@ 2011-03-18 22:14 ` NeilBrown
[not found] ` <201103182350.19281.xavier@alternatif.org>
3 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2011-03-18 22:14 UTC (permalink / raw)
To: Xavier Brochard; +Cc: linux-raid
On Fri, 18 Mar 2011 15:49:20 +0100 Xavier Brochard <xavier@alternatif.org>
wrote:
> Hello
>
> trying to solve my problem with a unusable raid10 array, I discovered that
> disk order is mixed between each boot - even with live-cd.
> Here's an extract from dmesg:
> [ 12.5] sda:
> [ 12.5] sdc:
> [ 12.5] sdd:
> [ 12.5] sde: sdd1
> [ 12.5] sdf: sdc1
> [ 12.5] sda1 sda2
> [ 12.5] sdg: sde1
> [ 12.5] sdf1
>
> is that normal?
> could this be a sign of hardware controler problem?
> could this happen because all disks are sata-3 except 1 SSD which is sata-2?
You are saying that something changes between each boot, but only giving one
example so that we cannot see the change. That is not particularly helpful.
The output above is a bit odd, but I think it is simply that the devices are
all being examined in parallel so the per-device messages are being mingled
together.
Certainly 'sdd1' is on 'sdd', not on 'sde' as the message seems to show.
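If you want an unambiguous picture of which physical disk currently owns which
sdX name, something like this is more reliable than the interleaved boot
messages:

  ls -l /dev/disk/by-id/ | grep -v part    # persistent model+serial names -> current sdX
  cat /proc/partitions                     # the partitions the kernel found, and on which disk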
NeilBrown
[parent not found: <201103182350.19281.xavier@alternatif.org>]
* Re: disk order problem in a raid 10 array
@ 2011-03-18 23:06 Xavier Brochard
0 siblings, 0 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-18 23:06 UTC (permalink / raw)
To: linux-raid
On Friday 18 March 2011 23:14:05, NeilBrown wrote:
> On Fri, 18 Mar 2011 15:49:20 +0100 Xavier Brochard <xavier@alternatif.org>
> > trying to solve my problem with a unusable raid10 array, I discovered
> > that disk order is mixed between each boot - even with live-cd.
> > Here's an extract from dmesg:
> > [ 12.5] sda:
> > [ 12.5] sdc:
> > [ 12.5] sdd:
> > [ 12.5] sde: sdd1
> > [ 12.5] sdf: sdc1
> > [ 12.5] sda1 sda2
> > [ 12.5] sdg: sde1
> > [ 12.5] sdf1
> >
> > is that normal?
>
> You are saying that something changes between each boot, but only giving
> one example so that we cannot see the change. That is not particularly
> helpful.
sorry, I didn't want to send too long an email,
as each dmesg shows different but similar output
> The output above is a bit odd, but I think it is simply that the devices
> are all being examined in parallel so the per-device messages are being
> mingled together.
> Certainly 'sdd1' is on 'sdd', not no 'sde' as the message seems to show.
ok thanks
Xavier
xavier@alternatif.org - 09 54 06 16 26
* Re: disk order problem in a raid 10 array
@ 2011-03-18 23:06 Xavier Brochard
2011-03-18 23:57 ` Roberto Spadim
0 siblings, 1 reply; 28+ messages in thread
From: Xavier Brochard @ 2011-03-18 23:06 UTC (permalink / raw)
To: linux-raid; +Cc: hansbkk
Hello,
On Friday 18 March 2011 23:22:51, NeilBrown wrote:
> On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard <xavier@alternatif.org>
> > On Friday 18 March 2011 18:22:34, hansbkk@gmail.com wrote:
> > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard
> > > <xavier@alternatif.org>
> >
> > wrote:
> > > > disk order is mixed between each boot - even with live-cd.
> > > > is that normal?
> > >
> > > If nothing is changing and the order is swapping really every boot,
> > > then IMO that is odd.
> >
> > nothing has changed, except kernel minor version
>
> Yet you don't tell us what the kernel minor version changed from or to.
Previously it was Ubuntu 2.6.32-27-server or 2.6.32-28-server, and now it is
Ubuntu 2.6.32-29.58-server (2.6.32.28+drm33.13)
> That may not be important, but it might and you obviously don't know which.
> It is always better to give too much information rather than not enough.
Again sorry, my Wednesday email was long and I thought it was too long!
> > exactly, in my case mdadm --examine output is somewhat weird as it shows:
> > /dev/sde1
> > this 0 8 33 0 active sync /dev/sdd1
> > /dev/sdd1
> > this 0 8 33 0 active sync /dev/sdc1
> > /dev/sdc1
> > this 0 8 33 0 active sync /dev/sde1
> > and /dev/sdf1 as sdf1
>
> You are hiding lots of details again...
>
> Are these all from different arrays? They all claim to be 'device 0' of
> some array.
They are all from same md RAID10 array
> Infact, "8, 33" is *always* /dev/sdc1, so I think the above lines have
> been edited by hand because I'm 100% certain mdadm didn't output them.
You're right, I'm sorry. I had copied that line, just changing the /dev/sd? part.
Here's the full output of mdadm --examine /dev/sd[cdefg]1.
As you can see, disks sdc, sdd and sde claim to be different devices - is that a problem?
======================================
/dev/sdc1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Wed Mar 16 09:50:03 2011
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 2
Spare Devices : 0
Checksum : ec151590 - correct
Events : 154
Layout : near=2
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 65 2 active sync /dev/sde1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 65 2 active sync /dev/sde1
3 3 0 0 3 faulty removed
/dev/sdd1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Wed Mar 16 07:43:45 2011
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ec14f740 - correct
Events : 102
Layout : near=2
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 33 0 active sync /dev/sdc1
0 0 8 33 0 active sync /dev/sdc1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 65 2 active sync /dev/sde1
3 3 8 81 3 active sync /dev/sdf1
4 4 8 97 4 spare /dev/sdg1
/dev/sde1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Wed Mar 16 07:43:45 2011
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ec14f752 - correct
Events : 102
Layout : near=2
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 49 1 active sync /dev/sdd1
0 0 8 33 0 active sync /dev/sdc1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 65 2 active sync /dev/sde1
3 3 8 81 3 active sync /dev/sdf1
4 4 8 97 4 spare /dev/sdg1
/dev/sdf1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Wed Mar 16 07:43:45 2011
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ec14f776 - correct
Events : 102
Layout : near=2
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 81 3 active sync /dev/sdf1
0 0 8 33 0 active sync /dev/sdc1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 65 2 active sync /dev/sde1
3 3 8 81 3 active sync /dev/sdf1
4 4 8 97 4 spare /dev/sdg1
/dev/sdg1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Wed Mar 16 07:43:45 2011
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ec14f782 - correct
Events : 102
Layout : near=2
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 8 97 4 spare /dev/sdg1
0 0 8 33 0 active sync /dev/sdc1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 65 2 active sync /dev/sde1
3 3 8 81 3 active sync /dev/sdf1
4 4 8 97 4 spare /dev/sdg1
===========
> > I think I can believe mdadm?
>
> Yes, you can believe mdadm - but only if you understand what it is saying,
> and there are times when that is not as easy as one might like....
Especially when a RAID system is broken! One's mind feels a bit broken too and
it's hard to think clearly :-)
Thanks for the help
Xavier
xavier@alternatif.org - 09 54 06 16 26
* Re: disk order problem in a raid 10 array
2011-03-18 23:06 Xavier Brochard
@ 2011-03-18 23:57 ` Roberto Spadim
2011-03-19 0:03 ` Xavier Brochard
0 siblings, 1 reply; 28+ messages in thread
From: Roberto Spadim @ 2011-03-18 23:57 UTC (permalink / raw)
To: Xavier Brochard; +Cc: linux-raid, hansbkk
did you try to change udev configuration?
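for example, to see what udev knows about each disk (serial, model) that a
rule could key on - /dev/sda here is just an example device:

  udevadm info --query=property --name=/dev/sda | grep -E 'ID_SERIAL|ID_MODEL'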
2011/3/18 Xavier Brochard <xavier@alternatif.org>:
> Hello,
>
> On Friday 18 March 2011 23:22:51, NeilBrown wrote:
>> On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard <xavier@alternatif.org>
>> > On Friday 18 March 2011 18:22:34, hansbkk@gmail.com wrote:
>> > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard
>> > > <xavier@alternatif.org>
>> >
>> > wrote:
>> > > > disk order is mixed between each boot - even with live-cd.
>> > > > is that normal?
>> > >
>> > > If nothing is changing and the order is swapping really every boot,
>> > > then IMO that is odd.
>> >
>> > nothing has changed, except kernel minor version
>>
>> Yet you don't tell us what the kernel minor version changed from or to.
>
> Previously it was ubuntu 2.6.32-27-server or 2.6.32-28-server and now it is
> ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13
>
>> That may not be important, but it might and you obviously don't know which.
>> It is always better to give too much information rather than not enough.
>
> Again sorry, my wednesday email was long and I thought it was too long!
>
>> > exactly, in my case mdadm --examine output is somewhat weird as it shows:
>> > /dev/sde1
>> > this 0 8 33 0 active sync /dev/sdd1
>> > /dev/sdd1
>> > this 0 8 33 0 active sync /dev/sdc1
>> > /dev/sdc1
>> > this 0 8 33 0 active sync /dev/sde1
>> > and /dev/sdf1 as sdf1
>>
>> You are hiding lots of details again...
>>
>> Are these all from different arrays? They all claim to be 'device 0' of
>> some array.
>
> They are all from same md RAID10 array
>
>> Infact, "8, 33" is *always* /dev/sdc1, so I think the above lines have
>> been edited by hand because I'm 100% certain mdadm didn't output them.
>
> You're right, I'm sorry. I have copied this line, just changing the /dev/sd?
>
> Here's full output of mdadm --examine /dev/sd[cdefg]1
> As you can see, disks sdc, sdd and sde claims to be different, is it a problem?
> ======================================
> /dev/sdc1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
> Creation Time : Sun Jan 2 16:41:45 2011
> Raid Level : raid10
> Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
> Array Size : 976767872 (931.52 GiB 1000.21 GB)
> Raid Devices : 4
> Total Devices : 5
> Preferred Minor : 0
>
> Update Time : Wed Mar 16 09:50:03 2011
> State : clean
> Active Devices : 1
> Working Devices : 1
> Failed Devices : 2
> Spare Devices : 0
> Checksum : ec151590 - correct
> Events : 154
>
> Layout : near=2
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 2 8 65 2 active sync /dev/sde1
>
> 0 0 0 0 0 removed
> 1 1 0 0 1 faulty removed
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 0 0 3 faulty removed
> /dev/sdd1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
> Creation Time : Sun Jan 2 16:41:45 2011
> Raid Level : raid10
> Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
> Array Size : 976767872 (931.52 GiB 1000.21 GB)
> Raid Devices : 4
> Total Devices : 5
> Preferred Minor : 0
>
> Update Time : Wed Mar 16 07:43:45 2011
> State : clean
> Active Devices : 4
> Working Devices : 5
> Failed Devices : 0
> Spare Devices : 1
> Checksum : ec14f740 - correct
> Events : 102
>
> Layout : near=2
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 0 8 33 0 active sync /dev/sdc1
>
> 0 0 8 33 0 active sync /dev/sdc1
> 1 1 8 49 1 active sync /dev/sdd1
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 81 3 active sync /dev/sdf1
> 4 4 8 97 4 spare /dev/sdg1
> /dev/sde1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
> Creation Time : Sun Jan 2 16:41:45 2011
> Raid Level : raid10
> Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
> Array Size : 976767872 (931.52 GiB 1000.21 GB)
> Raid Devices : 4
> Total Devices : 5
> Preferred Minor : 0
>
> Update Time : Wed Mar 16 07:43:45 2011
> State : clean
> Active Devices : 4
> Working Devices : 5
> Failed Devices : 0
> Spare Devices : 1
> Checksum : ec14f752 - correct
> Events : 102
>
> Layout : near=2
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 1 8 49 1 active sync /dev/sdd1
>
> 0 0 8 33 0 active sync /dev/sdc1
> 1 1 8 49 1 active sync /dev/sdd1
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 81 3 active sync /dev/sdf1
> 4 4 8 97 4 spare /dev/sdg1
> /dev/sdf1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
> Creation Time : Sun Jan 2 16:41:45 2011
> Raid Level : raid10
> Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
> Array Size : 976767872 (931.52 GiB 1000.21 GB)
> Raid Devices : 4
> Total Devices : 5
> Preferred Minor : 0
>
> Update Time : Wed Mar 16 07:43:45 2011
> State : clean
> Active Devices : 4
> Working Devices : 5
> Failed Devices : 0
> Spare Devices : 1
> Checksum : ec14f776 - correct
> Events : 102
>
> Layout : near=2
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 3 8 81 3 active sync /dev/sdf1
>
> 0 0 8 33 0 active sync /dev/sdc1
> 1 1 8 49 1 active sync /dev/sdd1
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 81 3 active sync /dev/sdf1
> 4 4 8 97 4 spare /dev/sdg1
> /dev/sdg1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
> Creation Time : Sun Jan 2 16:41:45 2011
> Raid Level : raid10
> Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
> Array Size : 976767872 (931.52 GiB 1000.21 GB)
> Raid Devices : 4
> Total Devices : 5
> Preferred Minor : 0
>
> Update Time : Wed Mar 16 07:43:45 2011
> State : clean
> Active Devices : 4
> Working Devices : 5
> Failed Devices : 0
> Spare Devices : 1
> Checksum : ec14f782 - correct
> Events : 102
>
> Layout : near=2
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 4 8 97 4 spare /dev/sdg1
>
> 0 0 8 33 0 active sync /dev/sdc1
> 1 1 8 49 1 active sync /dev/sdd1
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 81 3 active sync /dev/sdf1
> 4 4 8 97 4 spare /dev/sdg1
> ===========
>
>
>> > I think I can believe mdadm?
>>
>> Yes, you can believe mdadm - but only if you understand what it is saying,
>> and there are times when that is not as easy as one might like....
>
> Specially when a raid system is broken! One mind looks broken too and it's a
> bit hard to think clearly :-)
>
> Thanks for the help
>
> Xavier
> xavier@alternatif.org - 09 54 06 16 26
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: disk order problem in a raid 10 array
2011-03-18 23:57 ` Roberto Spadim
@ 2011-03-19 0:03 ` Xavier Brochard
0 siblings, 0 replies; 28+ messages in thread
From: Xavier Brochard @ 2011-03-19 0:03 UTC (permalink / raw)
To: Roberto Spadim; +Cc: linux-raid
On Saturday 19 March 2011 00:57:15, Roberto Spadim wrote:
> did you try to change udev configuration?
no
But attempting to boot a 2.6.32-27 or 2.6.32-24 kernel results in a freeze
shortly after:
ata_id[919] : HDIO_GET_IDENTITY failed for '/dev/sdb'
And while rebooting with the Alt + SysRq + REISUB keys, I saw that udev was stuck
waiting for a Logitech USB mouse.
Xavier
xavier@alternatif.org
end of thread, other threads:[~2011-03-21 21:02 UTC | newest]
Thread overview: 28+ messages
2011-03-18 14:49 disk order problem in a raid 10 array Xavier Brochard
2011-03-18 17:22 ` hansbkk
2011-03-18 20:09 ` Xavier Brochard
2011-03-18 20:12 ` Xavier Brochard
2011-03-18 22:22 ` NeilBrown
2011-03-18 20:26 ` Adaptive throttling for RAID1 background resync Hari Subramanian
2011-03-18 20:28 ` Roberto Spadim
2011-03-18 20:31 ` Hari Subramanian
2011-03-18 20:36 ` Roberto Spadim
2011-03-18 20:54 ` Hari Subramanian
2011-03-18 21:02 ` Roberto Spadim
2011-03-18 22:11 ` NeilBrown
2011-03-21 21:02 ` Hari Subramanian
2011-03-18 22:14 ` disk order problem in a raid 10 array NeilBrown
[not found] ` <201103182350.19281.xavier@alternatif.org>
[not found] ` <20110319102039.52cc2282@notabene.brown>
2011-03-18 23:59 ` Xavier Brochard
2011-03-19 0:05 ` Xavier Brochard
2011-03-19 0:07 ` Roberto Spadim
2011-03-19 0:25 ` Xavier Brochard
2011-03-19 1:42 ` NeilBrown
2011-03-19 13:44 ` Xavier Brochard
2011-03-19 15:14 ` Xavier Brochard
2011-03-20 3:53 ` NeilBrown
2011-03-20 10:40 ` Xavier Brochard
2011-03-19 12:01 ` Xavier Brochard
-- strict thread matches above, loose matches on Subject: below --
2011-03-18 23:06 Xavier Brochard
2011-03-18 23:06 Xavier Brochard
2011-03-18 23:57 ` Roberto Spadim
2011-03-19 0:03 ` Xavier Brochard