linux-raid.vger.kernel.org archive mirror
* question about MD raid rebuild performance degradation even with speed_limit_min/speed_limit_max set.
       [not found] <5445332B.9060009@cse.yorku.ca>
@ 2014-10-20 16:19 ` Jason Keltz
  2014-10-20 21:07   ` Jason Keltz
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Keltz @ 2014-10-20 16:19 UTC (permalink / raw)
  To: linux-raid

Hi.

I'm creating a 22 x 2 TB SATA disk MD RAID10 on a new RHEL6 system. I've 
experimented with setting "speed_limit_min" and "speed_limit_max" kernel 
variables so that I get the best balance of performance during a RAID 
rebuild of one of the RAID1 pairs. If, for example, I set 
speed_limit_min AND speed_limit_max to 80000 then fail a disk when there 
is no other disk activity, then I do get a rebuild rate of around 80 
MB/s. However, if I then start up a write intensive operation on the MD 
array (eg. a dd, or a mkfs on an LVM logical volume that is created on 
that MD), then, my write operation seems to get "full power", and my 
rebuild drops to around 25 MB/s. This means that the rebuild of my 
RAID10 disk is going to take a huge amount of time (>12 hours!!!). When 
I set speed_limit_min and speed_limit_max to the same value, am I not 
guaranteeing the rebuild speed? Is this a bug that I should be reporting 
to Red Hat, or a "feature"?
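
For reference, here is roughly what I'm doing to set the limits and
watch the rebuild -- a quick Python sketch of my procedure (the array
name, the member device and the 80000 figure are just from my test
setup):

    # Sketch of my procedure; array/device names and the 80000 value are
    # just examples from my setup.  Needs root for the /proc writes.

    def set_raid_limit(name, kb_per_sec):
        # Global md limits, in KB/sec per device.
        with open('/proc/sys/dev/raid/' + name, 'w') as f:
            f.write('%d\n' % kb_per_sec)

    set_raid_limit('speed_limit_min', 80000)
    set_raid_limit('speed_limit_max', 80000)

    # Then I fail and re-add one member of the RAID10, e.g.:
    #   mdadm /dev/md0 --fail /dev/sdx
    #   mdadm /dev/md0 --remove /dev/sdx
    #   mdadm /dev/md0 --add /dev/sdx
    # and watch the "speed=" figure that /proc/mdstat reports:
    with open('/proc/mdstat') as f:
        print(f.read())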

Thanks in advance for any help that you can provide...

Jason.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: question about MD raid rebuild performance degradation even with speed_limit_min/speed_limit_max set.
  2014-10-20 16:19 ` question about MD raid rebuild performance degradation even with speed_limit_min/speed_limit_max set Jason Keltz
@ 2014-10-20 21:07   ` Jason Keltz
  2014-10-28 22:38     ` NeilBrown
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Keltz @ 2014-10-20 21:07 UTC (permalink / raw)
  To: linux-raid

On 10/20/2014 12:19 PM, Jason Keltz wrote:
> Hi.
>
> I'm creating a 22 x 2 TB SATA disk MD RAID10 on a new RHEL6 system. 
> I've experimented with setting "speed_limit_min" and "speed_limit_max" 
> kernel variables so that I get the best balance of performance during 
> a RAID rebuild of one of the RAID1 pairs. If, for example, I set 
> speed_limit_min AND speed_limit_max to 80000 then fail a disk when 
> there is no other disk activity, then I do get a rebuild rate of 
> around 80 MB/s. However, if I then start up a write intensive 
> operation on the MD array (eg. a dd, or a mkfs on an LVM logical 
> volume that is created on that MD), then, my write operation seems to 
> get "full power", and my rebuild drops to around 25 MB/s. This means 
> that the rebuild of my RAID10 disk is going to take a huge amount of 
> time (>12 hours!!!). When I set speed_limit_min and speed_limit_max to 
> the same value, am I not guaranteeing the rebuild speed? Is this a bug 
> that I should be reporting to Red Hat, or a "feature"?
>
> Thanks in advance for any help that you can provide...
>
> Jason.

I would like to add that I downloaded the latest version of Ubuntu, and 
am running it on the same server with the same MD.
When I set speed_limit_min and speed_limit_max to 80000, I was able to 
start two large dds on the md array, and the rebuild stuck at around 71 
MB/s, which is close enough.  This leads me to believe that the problem 
above is probably a RHEL6 issue.  However, after I stopped the two dd 
operations, and raised both speed_limit_min and speed_limit_max to
120000, the rebuild stayed between 71-73 MB/s for more than 10 minutes
... now it seems to be at 100 MB/s, but doesn't seem to get any higher
(even though I had 120 MB/s and above on the RHEL system without any
load)... Hmm.
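
For what it's worth, this is roughly how I'm watching the resync rate
here, using the per-array sysfs files rather than the global /proc
knobs (a small sketch; "md0" is just what my array is called):

    # Sketch: poll the per-array resync speed and estimate time remaining.
    import time

    md = '/sys/block/md0/md'   # adjust for your array name

    for _ in range(10):
        raw = open(md + '/sync_completed').read().strip()
        if raw in ('none', ''):
            print('no resync in progress')
            break
        done, total = (int(x) for x in raw.split('/'))   # units: sectors
        speed_raw = open(md + '/sync_speed').read().split()[0]
        speed = int(speed_raw) if speed_raw.isdigit() else 0  # KB/sec per device
        remaining_kb = (total - done) / 2.0                   # sectors -> KB
        if speed:
            print('%d KB/sec, roughly %.1f hours to go'
                  % (speed, remaining_kb / speed / 3600.0))
        time.sleep(60)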

Jason.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: question about MD raid rebuild performance degradation even with speed_limit_min/speed_limit_max set.
  2014-10-20 21:07   ` Jason Keltz
@ 2014-10-28 22:38     ` NeilBrown
  2014-10-29  2:34       ` Jason Keltz
  0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2014-10-28 22:38 UTC (permalink / raw)
  To: Jason Keltz; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2825 bytes --]

On Mon, 20 Oct 2014 17:07:38 -0400 Jason Keltz <jas@cse.yorku.ca> wrote:

> On 10/20/2014 12:19 PM, Jason Keltz wrote:
> > Hi.
> >
> > I'm creating a 22 x 2 TB SATA disk MD RAID10 on a new RHEL6 system. 
> > I've experimented with setting "speed_limit_min" and "speed_limit_max" 
> > kernel variables so that I get the best balance of performance during 
> > a RAID rebuild of one of the RAID1 pairs. If, for example, I set 
> > speed_limit_min AND speed_limit_max to 80000 then fail a disk when 
> > there is no other disk activity, then I do get a rebuild rate of 
> > around 80 MB/s. However, if I then start up a write intensive 
> > operation on the MD array (eg. a dd, or a mkfs on an LVM logical 
> > volume that is created on that MD), then, my write operation seems to 
> > get "full power", and my rebuild drops to around 25 MB/s. This means 
> > that the rebuild of my RAID10 disk is going to take a huge amount of 
> > time (>12 hours!!!). When I set speed_limit_min and speed_limit_max to 
> > the same value, am I not guaranteeing the rebuild speed? Is this a bug 
> > that I should be reporting to Red Hat, or a "feature"?
> >
> > Thanks in advance for any help that you can provide...
> >
> > Jason.
> 
> I would like to add that I downloaded the latest version of Ubuntu, and 
> am running it on the same server with the same MD.
> When I set speed_limit_min and speed_limit_max to 80000, I was able to 
> start two large dds on the md array, and the rebuild stuck at around 71 
> MB/s, which is close enough.  This leads me to believe that the problem 
> above is probably a RHEL6 issue.  However, after I stopped the two dd 
> operations,  and raised both speed_limit_min and speed_limit_max to 
> 120000, the rebuild stayed between 71-73 MB/s for more than 10 minutes
> .. now it seems to be at 100 MB/s... but doesn't seem to get any higher 
> (even though I had 120 MB/s and above on the RHEL system without any 
> load)... Hmm.
>

md certainly cannot "guarantee" any speed - it can only deliver what the
underlying devices deliver.
I know the kernel logs say something about a "guarantee".  That was added
before my time and I haven't had occasion to remove it.

md will normally just try to recover as fast as it can unless that exceeds
one of the limits - then it will back off.
What speed is actually achieved depends on other load and the behaviour of
the IO scheduler.
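
Very roughly, the decision the resync thread makes each time around its
loop looks like this (a simplified Python sketch of the shape of the
logic, not the actual kernel code):

    # Simplified sketch of md's resync throttling decision -- not the real
    # kernel code, just the shape of it.
    def should_back_off(current_speed, speed_min, speed_max, array_is_idle):
        # current_speed: recent average resync rate, KB/sec per device.
        if current_speed <= speed_min:
            return False           # at/below the minimum: never sleep
        if current_speed > speed_max:
            return True            # above the maximum: always back off
        return not array_is_idle   # between the limits: yield to other I/O

    # Note what this does not do: once competing I/O has dragged the resync
    # below speed_limit_min, md merely stops sleeping.  It has no way to
    # slow the other I/O down, so the rate you get is whatever the drives
    # and the I/O scheduler hand back.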

"RHEL6" and "Ubuntu" don't mean a lot to me.  Specific kernel version might,
though in the case of Redhat I know that backport lots of stuff so even the
kernel version isn't very helpful.  I'm must prefer having report against
mainline kernels.

Rotating drives do get lower transfer speeds at higher addresses.  That might
explain the 120 / 100 difference.

NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: question about MD raid rebuild performance degradation even with speed_limit_min/speed_limit_max set.
  2014-10-28 22:38     ` NeilBrown
@ 2014-10-29  2:34       ` Jason Keltz
  2014-10-29  2:57         ` NeilBrown
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Keltz @ 2014-10-29  2:34 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 28/10/2014 6:38 PM, NeilBrown wrote:
> On Mon, 20 Oct 2014 17:07:38 -0400 Jason Keltz<jas@cse.yorku.ca>  wrote:
>
>> On 10/20/2014 12:19 PM, Jason Keltz wrote:
>>> Hi.
>>>
>>> I'm creating a 22 x 2 TB SATA disk MD RAID10 on a new RHEL6 system.
>>> I've experimented with setting "speed_limit_min" and "speed_limit_max"
>>> kernel variables so that I get the best balance of performance during
>>> a RAID rebuild of one of the RAID1 pairs. If, for example, I set
>>> speed_limit_min AND speed_limit_max to 80000 then fail a disk when
>>> there is no other disk activity, then I do get a rebuild rate of
>>> around 80 MB/s. However, if I then start up a write intensive
>>> operation on the MD array (eg. a dd, or a mkfs on an LVM logical
>>> volume that is created on that MD), then, my write operation seems to
>>> get "full power", and my rebuild drops to around 25 MB/s. This means
>>> that the rebuild of my RAID10 disk is going to take a huge amount of
>>> time (>12 hours!!!). When I set speed_limit_min and speed_limit_max to
>>> the same value, am I not guaranteeing the rebuild speed? Is this a bug
>>> that I should be reporting to Red Hat, or a "feature"?
>>>
>>> Thanks in advance for any help that you can provide...
>>>
>>> Jason.
>> I would like to add that I downloaded the latest version of Ubuntu, and
>> am running it on the same server with the same MD.
>> When I set speed_limit_min and speed_limit_max to 80000, I was able to
>> start two large dds on the md array, and the rebuild stuck at around 71
>> MB/s, which is close enough.  This leads me to believe that the problem
>> above is probably a RHEL6 issue.  However, after I stopped the two dd
>> operations,  and raised both speed_limit_min and speed_limit_max to
>> 120000, the rebuild stayed between 71-73 MB/s for more than 10 minutes
>> .. now it seems to be at 100 MB/s... but doesn't seem to get any higher
>> (even though I had 120 MB/s and above on the RHEL system without any
>> load)... Hmm.
>>
> md certainly cannot "guarantee" any speed - it can only deliver what the
> underlying devices deliver.
> I know the kernel logs say something about a "guarantee".  That was added
> before my time and I haven't had occasion to remove it.
>
> md will normally just try to recover as fast as it can unless that exceeds
> one of the limits - then it will back off.
> What speed is actually achieved depends on other load and the behaviour of
> the IO scheduler.
>
> "RHEL6" and "Ubuntu" don't mean a lot to me.  A specific kernel version might,
> though in the case of Red Hat I know they backport lots of stuff, so even the
> kernel version isn't very helpful.  I much prefer having reports against
> mainline kernels.
>
> Rotating drives do get lower transfer speeds at higher addresses.  That might
> explain the 120 / 100 difference.
Hi Neil,
Thanks very much for your response.
I must say that I'm a little puzzled though. I'm coming from using a 
3Ware hardware RAID controller where I could configure how much of the 
disk bandwidth is to be used for a rebuild versus I/O.   From what I 
understand, you're saying that MD can only use the disk bandwidth 
available to it.  It seems that it doesn't take any priority in the I/O 
chain.  It will only attempt to use no less than min bandwidth, and no 
more than max bandwidth for the rebuild, but if you're on a busy system, 
and other system I/O needs that disk bandwidth, then there's nothing it 
can do about it.  I guess I just don't understand why.  Why can't md
be given a priority in the kernel that lets the admin decide how much
bandwidth goes to system I/O versus rebuild I/O?  Even on a busy
system, I still want to allocate at least some minimum bandwidth to
MD.  In fact, in the event of a disk failure, I want a whole lot of
the disk bandwidth dedicated to MD.  It's a case of short-term pain
for long-term gain: I'd rather not have the users suffer at all, but
if they do have to suffer, I'd rather they suffer for a few hours,
knowing that after that the RAID system is in a perfectly good state
with no bad disks, as opposed to letting a bad disk resync take days
because the system is really busy... days during which another
failure might occur!

Jason.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: question about MD raid rebuild performance degradation even with speed_limit_min/speed_limit_max set.
  2014-10-29  2:34       ` Jason Keltz
@ 2014-10-29  2:57         ` NeilBrown
  2014-10-29 20:56           ` Jason Keltz
  0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2014-10-29  2:57 UTC (permalink / raw)
  To: Jason Keltz; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 5940 bytes --]

On Tue, 28 Oct 2014 22:34:07 -0400 Jason Keltz <jas@cse.yorku.ca> wrote:

> On 28/10/2014 6:38 PM, NeilBrown wrote:
> > On Mon, 20 Oct 2014 17:07:38 -0400 Jason Keltz<jas@cse.yorku.ca>  wrote:
> >
> >> On 10/20/2014 12:19 PM, Jason Keltz wrote:
> >>> Hi.
> >>>
> >>> I'm creating a 22 x 2 TB SATA disk MD RAID10 on a new RHEL6 system.
> >>> I've experimented with setting "speed_limit_min" and "speed_limit_max"
> >>> kernel variables so that I get the best balance of performance during
> >>> a RAID rebuild of one of the RAID1 pairs. If, for example, I set
> >>> speed_limit_min AND speed_limit_max to 80000 then fail a disk when
> >>> there is no other disk activity, then I do get a rebuild rate of
> >>> around 80 MB/s. However, if I then start up a write intensive
> >>> operation on the MD array (eg. a dd, or a mkfs on an LVM logical
> >>> volume that is created on that MD), then, my write operation seems to
> >>> get "full power", and my rebuild drops to around 25 MB/s. This means
> >>> that the rebuild of my RAID10 disk is going to take a huge amount of
> >>> time (>12 hours!!!). When I set speed_limit_min and speed_limit_max to
> >>> the same value, am I not guaranteeing the rebuild speed? Is this a bug
> >>> that I should be reporting to Red Hat, or a "feature"?
> >>>
> >>> Thanks in advance for any help that you can provide...
> >>>
> >>> Jason.
> >> I would like to add that I downloaded the latest version of Ubuntu, and
> >> am running it on the same server with the same MD.
> >> When I set speed_limit_min and speed_limit_max to 80000, I was able to
> >> start two large dds on the md array, and the rebuild stuck at around 71
> >> MB/s, which is close enough.  This leads me to believe that the problem
> >> above is probably a RHEL6 issue.  However, after I stopped the two dd
> >> operations,  and raised both speed_limit_min and speed_limit_max to
> >> 120000, the rebuild stayed between 71-73 MB/s for more than 10 minutes
> >> .. now it seems to be at 100 MB/s... but doesn't seem to get any higher
> >> (even though I had 120 MB/s and above on the RHEL system without any
> >> load)... Hmm.
> >>
> > md certainly cannot "guarantee" any speed - it can only deliver what the
> > underlying devices deliver.
> > I know the kernel logs say something about a "guarantee".  That was added
> > before my time and I haven't had occasion to remove it.
> >
> > md will normally just try to recover as fast as it can unless that exceeds
> > one of the limits - then it will back off.
> > What speed is actually achieved depends on other load and the behaviour of
> > the IO scheduler.
> >
> > "RHEL6" and "Ubuntu" don't mean a lot to me.  A specific kernel version might,
> > though in the case of Red Hat I know they backport lots of stuff, so even the
> > kernel version isn't very helpful.  I much prefer having reports against
> > mainline kernels.
> >
> > Rotating drives do get lower transfer speeds at higher addresses.  That might
> > explain the 120 / 100 difference.
> Hi Neil,
> Thanks very much for your response.
> I must say that I'm a little puzzled though. I'm coming from using a 
> 3Ware hardware RAID controller where I could configure how much of the 
> disk bandwidth is to be used for a rebuild versus I/O.   From what I 
> understand, you're saying that MD can only use the disk bandwidth 
> available to it.  It seems that it doesn't take any priority in the I/O 
> chain.  It will only attempt to use no less than min bandwidth, and no 
> more than max bandwidth for the rebuild, but if you're on a busy system, 
> and other system I/O needs that disk bandwidth, then there's nothing it 
> can do about it.  I guess I just don't understand why.  Why can't md be 
> given a priority in the kernel to allow the admin to decide how much 
> bandwidth goes to system I/O versus rebuild I/O.  Even in a busy system, 
> I still want to allocate at least some minimum bandwidth to MD.  In 
> fact, in the event of a disk failure, I want to have a whole lot of the 
> disk bandwidth dedicated to MD.  It's something about short term pain 
> for long term gain? I'd rather not have the users suffer at all, but if 
> they do have to suffer, I'd rather them suffer for a few hours, knowing 
> that after that, the RAID system is in a perfectly good state with no 
> bad disks as opposed to letting a bad disk resync take days because the 
> system is really busy... days during which another failure might occur!
> 
> Jason.

It isn't so much "that MD can only use..." but rather "that MD does only
use ...".

This is how the code has "always" worked and no-one has ever bothered to
change it, or to ask for it to be changed (that I recall).

There are difficulties in guaranteeing a minimum when the array uses
partitions from devices on which other partitions are used for other things.
In that case I don't think it is practical to make guarantees, but that
needn't stop us making guarantees when we can I guess.

If the configured bandwidth exceeded the physically available bandwidth I
don't think we would want to exclude non-resync IO completely, so the
guarantee would have to be:
   N MB/sec or M% of available, whichever is less

We could even implement a different approach in a backward-compatible way.
Introduce a new setting "max_sync_percent".  By default that is unset and the
current algorithm applies.
If it is set to something below 100, non-resync IO is throttled to
an appropriate fraction of the actual resync throughput whenever that is
below sync_speed_min.

Or something like that.
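
To make that concrete, here is a back-of-envelope sketch of the policy
(remember "max_sync_percent" is only a proposed setting; nothing like
it exists today):

    # Back-of-envelope sketch of the proposed policy.  "max_sync_percent"
    # is hypothetical; nothing below exists in the kernel.
    def allowed_non_sync_rate(sync_rate, sync_speed_min, max_sync_percent):
        # Rates in KB/sec per device; max_sync_percent in (0, 100] or None.
        if max_sync_percent is None or sync_rate >= sync_speed_min:
            return float('inf')   # resync is keeping up: don't throttle anyone
        share = max_sync_percent / 100.0
        # Cap ordinary I/O so the resync ends up with at least `share` of the
        # total throughput the device is actually delivering:
        #   sync / (sync + non_sync) >= share  =>  non_sync <= sync*(1-share)/share
        return sync_rate * (1.0 - share) / share

    # e.g. resync measured at 25000 KB/sec, sync_speed_min 80000, limit 75%:
    # ordinary I/O would be capped around 8333 KB/sec until the resync rate
    # recovers above the minimum.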

Some care would be needed in comparing throughputs, as sync throughput is
measured per-device, while non-resync throughput might be measured per-array.
Maybe the throttling would happen per-device?

All we need now is for someone to firm up the design and then write the code.

NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: question about MD raid rebuild performance degradation even with speed_limit_min/speed_limit_max set.
  2014-10-29  2:57         ` NeilBrown
@ 2014-10-29 20:56           ` Jason Keltz
  2014-10-31 19:44             ` Peter Grandi
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Keltz @ 2014-10-29 20:56 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 10/28/2014 10:57 PM, NeilBrown wrote:
> On Tue, 28 Oct 2014 22:34:07 -0400 Jason Keltz <jas@cse.yorku.ca> wrote:
>
>> On 28/10/2014 6:38 PM, NeilBrown wrote:
>>> On Mon, 20 Oct 2014 17:07:38 -0400 Jason Keltz<jas@cse.yorku.ca>  wrote:
>>>
>>>> On 10/20/2014 12:19 PM, Jason Keltz wrote:
>>>>> Hi.
>>>>>
>>>>> I'm creating a 22 x 2 TB SATA disk MD RAID10 on a new RHEL6 system.
>>>>> I've experimented with setting "speed_limit_min" and "speed_limit_max"
>>>>> kernel variables so that I get the best balance of performance during
>>>>> a RAID rebuild of one of the RAID1 pairs. If, for example, I set
>>>>> speed_limit_min AND speed_limit_max to 80000 then fail a disk when
>>>>> there is no other disk activity, then I do get a rebuild rate of
>>>>> around 80 MB/s. However, if I then start up a write intensive
>>>>> operation on the MD array (eg. a dd, or a mkfs on an LVM logical
>>>>> volume that is created on that MD), then, my write operation seems to
>>>>> get "full power", and my rebuild drops to around 25 MB/s. This means
>>>>> that the rebuild of my RAID10 disk is going to take a huge amount of
>>>>> time (>12 hours!!!). When I set speed_limit_min and speed_limit_max to
>>>>> the same value, am I not guaranteeing the rebuild speed? Is this a bug
>>>>> that I should be reporting to Red Hat, or a "feature"?
>>>>>
>>>>> Thanks in advance for any help that you can provide...
>>>>>
>>>>> Jason.
>>>> I would like to add that I downloaded the latest version of Ubuntu, and
>>>> am running it on the same server with the same MD.
>>>> When I set speed_limit_min and speed_limit_max to 80000, I was able to
>>>> start two large dds on the md array, and the rebuild stuck at around 71
>>>> MB/s, which is close enough.  This leads me to believe that the problem
>>>> above is probably a RHEL6 issue.  However, after I stopped the two dd
>>>> operations,  and raised both speed_limit_min and speed_limit_max to
>>>> 120000, the rebuild stayed between 71-73 MB/s for more than 10 minutes
>>>> .. now it seems to be at 100 MB/s... but doesn't seem to get any higher
>>>> (even though I had 120 MB/s and above on the RHEL system without any
>>>> load)... Hmm.
>>>>
>>> md certainly cannot "guarantee" any speed - it can only deliver what the
>>> underlying devices deliver.
>>> I know the kernel logs say something about a "guarantee".  That was added
>>> before my time and I haven't had occasion to remove it.
>>>
>>> md will normally just try to recover as fast as it can unless that exceeds
>>> one of the limits - then it will back off.
>>> What speed is actually achieved depends on other load and the behaviour of
>>> the IO scheduler.
>>>
>>> "RHEL6" and "Ubuntu" don't mean a lot to me.  A specific kernel version might,
>>> though in the case of Red Hat I know they backport lots of stuff, so even the
>>> kernel version isn't very helpful.  I much prefer having reports against
>>> mainline kernels.
>>>
>>> Rotating drives do get lower transfer speeds at higher addresses.  That might
>>> explain the 120 / 100 difference.
>> Hi Neil,
>> Thanks very much for your response.
>> I must say that I'm a little puzzled though. I'm coming from using a
>> 3Ware hardware RAID controller where I could configure how much of the
>> disk bandwidth is to be used for a rebuild versus I/O.   From what I
>> understand, you're saying that MD can only use the disk bandwidth
>> available to it.  It seems that it doesn't take any priority in the I/O
>> chain.  It will only attempt to use no less than min bandwidth, and no
>> more than max bandwidth for the rebuild, but if you're on a busy system,
>> and other system I/O needs that disk bandwidth, then there's nothing it
>> can do about it.  I guess I just don't understand why.  Why can't md be
>> given a priority in the kernel to allow the admin to decide how much
>> bandwidth goes to system I/O versus rebuild I/O.  Even in a busy system,
>> I still want to allocate at least some minimum bandwidth to MD.  In
>> fact, in the event of a disk failure, I want to have a whole lot of the
>> disk bandwidth dedicated to MD.  It's something about short term pain
>> for long term gain? I'd rather not have the users suffer at all, but if
>> they do have to suffer, I'd rather them suffer for a few hours, knowing
>> that after that, the RAID system is in a perfectly good state with no
>> bad disks as opposed to letting a bad disk resync take days because the
>> system is really busy... days during which another failure might occur!
>>
>> Jason.
> It isn't so much "that MD can only use..." but rather "that MD does only
> use ...".
Got it..

> This is how the code has "always" worked and no-one has ever bothered to
> change it, or to ask for it to be changed (that I recall).
I'm actually not surprised to hear that, since I spent considerable 
time trying to find articles talking about this topic and couldn't find 
a single thing! :)  I, on the other hand, am replacing a hardware RAID 
system with MD, so I've been "spoiled" already! (not by the 
performance of hardware RAID, but by the functionality) :)

> There are difficulties in guaranteeing a minimum when the array uses
> partitions from devices on which other partitions are used for other things.
> In that case I don't think it is practical to make guarantees, but that
> needn't stop us making guarantees when we can I guess.
>
> If the configured bandwidth exceeded the physically available bandwidth I
> don't think we would want to exclude non-resync IO completely, so the
> guarantee would have to be:
>     N MB/sec or M% of available, whichever is less
>
> We could even implement a different approach in a backward-compatible way.
> Introduce a new setting "max_sync_percent".  By default that is unset and the
> current algorithm applies.
> If it is set to something below 100, non-resync IO is throttled to
> an appropriate fraction of the actual resync throughput whenever that is
> below sync_speed_min.
>
> Or something like that.
That actually sounds great!  I can certainly understand and appreciate 
how it would be difficult to handle arrays using partitions from 
multiple devices. Maybe the functionality only works if you're using 
full disk devices. :)  (Okay, on second thought, that's about 99% of the 
people using MD -- apparently, word has it that only you and I are using 
MD full disk devices).

> Some care would be needed in comparing throughputs, as sync throughput is
> measured per-device, while non-resync throughput might be measured per-array.
> Maybe the throttling would happen per-device??
>
> All we need now is for someone to firm up the design and then write the code.
Maybe someone on the list will step forward :D

The truth is, as people start to combine larger and larger disks and 
rebuild times go up and up and up, this type of request will become 
more common (a single 6 TB member resyncing at 100 MB/s already needs 
roughly 17 hours of uninterrupted sequential I/O)...

Jas.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: question about MD raid rebuild performance degradation even with speed_limit_min/speed_limit_max set.
  2014-10-29 20:56           ` Jason Keltz
@ 2014-10-31 19:44             ` Peter Grandi
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Grandi @ 2014-10-31 19:44 UTC (permalink / raw)
  To: Linux RAID

>>>>>> If, for example, I set speed_limit_min AND speed_limit_max to
>>>>>> 80000 then fail a disk when there is no other disk activity, then
>>>>>> I do get a rebuild rate of around 80 MB/s. However, if I then
>>>>>> start up a write intensive operation on the MD array (eg. a dd,
>>>>>> or a mkfs on an LVM logical volume that is created on that MD),
>>>>>> then, my write operation seems to get "full power", and my
>>>>>> rebuild drops to around 25 MB/s.

Linux MD RAID is fundamentally an IO address remapper, and actual IO is
scheduled and executed by the Linux block (page) IO subsystem. This
separation is beneficial in many ways.
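
To illustrate the "remapper" point with a toy example -- a deliberate
simplification of md's near-2 RAID10 layout, assuming an even number of
whole-disk members (the real code also handles far/offset layouts, odd
member counts, and so on):

    # Toy model: a near-2 RAID10 over an even number of devices behaves like
    # a stripe over mirrored pairs.  Pure address arithmetic, no I/O here.
    def remap(chunk, n_devices, chunk_sectors):
        pairs = n_devices // 2
        stripe, pair = divmod(chunk, pairs)
        dev_offset = stripe * chunk_sectors
        return [(2 * pair, dev_offset), (2 * pair + 1, dev_offset)]

    # e.g. a 22-disk near-2 array with 512-sector (256KiB) chunks:
    print(remap(0, 22, 512))    # [(0, 0), (1, 0)]
    print(remap(11, 22, 512))   # [(0, 512), (1, 512)]

The actual I/O on the remapped addresses is then queued, merged and
scheduled by the block layer like anybody else's I/O.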

Also the bandwidth delivered by a storage device is not a single
number. Disk drive transfer rates depend a lot on the degree of
randomness of access, and on outer vs. inner regions. Common disks can
therefore deliver anywhere between 150MB/s and 0.5MB/s depending on the
overall traffic hitting them (purely sequential reads on outer tracks
approach the top figure; small random reads at 100-150 IOPS deliver
well under 1MB/s).

Therefore in order to deliver a consistent transfer rate to the MD
resync kernel process the Linux block (page) IO subsystem would have to
be quite clever in controlling the rates of usage of all the processes
using a disk.

>>> I'm coming from using a 3Ware hardware RAID controller where I
>>> could configure how much of the disk bandwidth is to be used for a
>>> rebuild versus I/O.

That (usually) works because the disk is completely dedicated to the
RAID card and the RAID card can schedule all IO to it.

>>> From what I understand, you're saying that MD [ ... ] and other
>>> system I/O needs that disk bandwidth, then there's nothing it can do
>>> about it. I guess I just don't understand why. Why can't md be given
>>> a priority in the kernel to allow the admin to decide how much
>>> bandwidth goes to system I/O versus rebuild I/O.

There is something you can do about it: rewrite the block IO subsystem
in the Linux kernel so that it can be configured to allocate IOPS and/or
bandwidth quotas to different processes, among them the MD resync kernel
process (extra awesome if that is also isochronous).

Because, as well summarized below, that process is just one of many
possible users of a given disk in a Linux-based system:

>> There are difficulties in guaranteeing a minimum when the array uses
>> partitions from devices on which other partitions are used for other
>> things.

Put another way, designing MD RAID as fundamentally an IO address
remapper, and letting the MD resync kernel process run as "just another
process", has some big advantages and gives a lot of flexibility, but
means relying on the kernel block subsystem to do actual IO, and
accepting its current limitations. That is a free choice.

> The truth is, as people start to combine larger and larger
> disks, and rebuild times go up and up and up, this type of
> request will become more common....

Requests of the type "I decided to use a physical storage design that
behaves in a way that I don't like, so MD should do magic and work
around my decision" are already common. :-)

Using «larger and larger disks» is a *choice people make*, and if they
don't like the obvious consequences they should not make that choice;
they can instead choose to use smaller disks, or just the outer part of
larger disks (which can be cheaper).

A critical metric for physical storage is the IOPS/GB ratio (and its
variability with workload). Looking at those ratios, I personally think
that common disks larger than 1TB are not suitable for many cases of
"typical" live data usage, and in the day job we sometimes build MD
RAID sets out of 146GB 15k disks, because fortunately my colleagues
understand the relevant tradeoffs too.
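
To put rough numbers on that (the IOPS figures below are typical
ballpark values for each class of drive, not measurements):

    # Ballpark IOPS/GB comparison; the IOPS numbers are typical
    # small-random-read rates for each class of drive, not measurements.
    drives = {
        '146GB 15k SAS': (146.0, 175),    # (capacity in GB, random IOPS)
        '2TB 7.2k SATA': (2000.0, 80),
    }
    for name, (gb, iops) in sorted(drives.items()):
        print('%-14s  ~%.2f IOPS/GB' % (name, iops / gb))
    # ~1.20 IOPS/GB for the small 15k drive vs ~0.04 for the big SATA one:
    # roughly a 30x difference in random I/O capability per GB of data.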
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-10-31 19:44 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <5445332B.9060009@cse.yorku.ca>
2014-10-20 16:19 ` question about MD raid rebuild performance degradation even with speed_limit_min/speed_limit_max set Jason Keltz
2014-10-20 21:07   ` Jason Keltz
2014-10-28 22:38     ` NeilBrown
2014-10-29  2:34       ` Jason Keltz
2014-10-29  2:57         ` NeilBrown
2014-10-29 20:56           ` Jason Keltz
2014-10-31 19:44             ` Peter Grandi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).