Speeding up chunk size change?

linux-raid.vger.kernel.org archive mirror

* Speeding up chunk size change?
From: Steven Haigh @ 2012-03-03 19:36 UTC (permalink / raw)
  To: linux-raid

Hi all,

I just wanted to run this past a few folk here as I want to make sure 
I'm doing it the Right Way(tm).

I've decided to experiment with using a 128KB chunk size on my RAID6 
instead of a 64KB chunk. To prepare for this, I removed the internal 
bitmap, then started the reshape via:

# mdadm --grow /dev/md2 --chunk=128K --backup-file=/md2-backup-file

This is progressing slowly, and it will take much longer than I expected:

md2 : active raid6 sdf[6] sde[5] sdb[4] sda[3]
       1953520896 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
       [>....................]  reshape =  0.6% (6649600/976760448) finish=2576.1min speed=6275K/sec

I set a few 'optimisations' that I believe should help:
## Tweak the RAIDs
blockdev --setra 8192 /dev/sd[abcdefg]
blockdev --setra 8192 /dev/md0
blockdev --setra 8192 /dev/md1
blockdev --setra 16384 /dev/md2
echo 16384 > /sys/block/md2/md/stripe_cache_size

for i in sda sdb sdc sdd sde sdf; do
         echo "Setting options for $i"
         echo 256 > /sys/block/$i/queue/nr_requests
         echo 4096 > /sys/block/$i/queue/read_ahead_kb
         echo 1 > /sys/block/$i/device/queue_depth
         echo deadline > /sys/block/$i/queue/scheduler
done
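
For reference, a rough sketch of what a stripe_cache_size of 16384 costs in RAM. This assumes the formula given in the kernel's md documentation (page size * number of member devices * stripe_cache_size), a 4096-byte page, and the 4 devices shown in the mdstat output; the function name is made up for illustration.

```shell
#!/bin/sh
# stripe_cache_bytes PAGE_SIZE NUM_DEVICES CACHE_ENTRIES
# Prints the approximate memory used by the md stripe cache, in bytes.
# Assumes: memory = page_size * nr_disks * stripe_cache_size.
stripe_cache_bytes() {
    page_size=$1
    disks=$2
    entries=$3
    echo $(( page_size * disks * entries ))
}

# md2 from this thread: 4 KiB pages (assumed), 4 devices, 16384 entries.
bytes=$(stripe_cache_bytes 4096 4 16384)
echo "$bytes bytes = $(( bytes / 1048576 )) MiB"   # 268435456 bytes = 256 MiB
```

So on a small Dom0 that value is not free; the actual page size can be checked with `getconf PAGESIZE`.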

Just wondering if anyone knows of any possible way to speed up the 
reshape a little, or if (like I suspect) it will take ~2 days to 
complete the reshape.
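
As a sanity check on that estimate, the finish= figure can be roughly reproduced from the other numbers on the mdstat progress line. This is only a sketch: it assumes the two counters are in 1 KiB blocks and that the current speed holds for the whole reshape, and the function name is made up for illustration.

```shell
#!/bin/sh
# reshape_eta_minutes DONE_KB TOTAL_KB SPEED_KB_PER_SEC
# Prints the remaining time in whole minutes, assuming constant speed.
reshape_eta_minutes() {
    done_kb=$1
    total_kb=$2
    speed_kbs=$3
    echo $(( (total_kb - done_kb) / speed_kbs / 60 ))
}

# Numbers taken from the mdstat output above.
mins=$(reshape_eta_minutes 6649600 976760448 6275)
echo "$mins min, about $(( mins / 60 )) hours"   # 2576 min, about 42 hours
```

That is a little under two days, in line with the finish=2576.1min that md reports.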

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299



* Re: Speeding up chunk size change?
From: Stan Hoeppner @ 2012-03-03 21:42 UTC (permalink / raw)
  To: Steven Haigh; +Cc: linux-raid

On 3/3/2012 1:36 PM, Steven Haigh wrote:
> Hi all,
> 
> I just wanted to run this past a few folk here as I want to make sure
> I'm doing it the Right Way(tm).
> 
> I've decided to experiment with using a 128KB chunk size on my RAID6
> instead of a 64KB chunk. 

Why?  Do your target applications perform better with a larger
chunk, and therefore larger total stripe size?  If you're strictly after
larger dd copy numbers then you're wasting everyone's time, including
yours, as such has almost zero bearing on real world performance, as
most workloads are far more random than sequential.

And apparently you're not using XFS.  This reshape will screw up your
alignment, and you'll need to change your fstab mount to reflect the new
RAID geometry.  But my guess is you're not using it.  If you were you'd
probably be experienced enough to know that doubling your chunk size
isn't going to make much difference, if any, in real world system usage.
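
To illustrate the geometry change, here is a small sketch of how the XFS stripe values could be recomputed. It assumes the 4-drive RAID6 from the original post (so 2 data disks) and that the sunit/swidth mount options are expressed in 512-byte sectors; the function name is hypothetical.

```shell
#!/bin/sh
# xfs_geometry CHUNK_KB NUM_DISKS NUM_PARITY
# Prints sunit/swidth in 512-byte sectors for a striped md array.
xfs_geometry() {
    chunk_kb=$1
    disks=$2
    parity=$3
    sunit=$(( chunk_kb * 1024 / 512 ))         # one chunk, in sectors
    swidth=$(( sunit * (disks - parity) ))     # one full stripe of data
    echo "sunit=$sunit,swidth=$swidth"
}

xfs_geometry 64 4 2     # old geometry: sunit=128,swidth=256
xfs_geometry 128 4 2    # new geometry: sunit=256,swidth=512
```

The second line is what the mount options would need to carry after the reshape, if the filesystem sits directly on the array.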

> I set a few 'optimisations' that I believe should help:
> ## Tweak the RAIDs
> blockdev --setra 8192 /dev/sd[abcdefg]

Read-ahead is per file descriptor, and occurs at the filesystem level.
The read-ahead value used is that of the device immediately underlying
the filesystem.  So don't bother setting these above.

> blockdev --setra 8192 /dev/md0
> blockdev --setra 8192 /dev/md1
> blockdev --setra 16384 /dev/md2

This is fine.  You could theoretically set this to 1GB or more if you
always read entire files, with no ill effects, as read-ahead doesn't go
past EOF.  However if you do any mmap reads (many apps do) of portions
of large files, this will hammer performance, obviously, as you're
reading entire large files speculatively when not needed.  Play with
this at your own risk.

> echo 16384 > /sys/block/md2/md/stripe_cache_size
> 
> for i in sda sdb sdc sdd sde sdf; do
>         echo "Setting options for $i"
>         echo 256 > /sys/block/$i/queue/nr_requests
>         echo 4096 > /sys/block/$i/queue/read_ahead_kb
Eliminate this line ^^^^

>         echo 1 > /sys/block/$i/device/queue_depth
>         echo deadline > /sys/block/$i/queue/scheduler
> done
> 
> Just wondering if anyone knows of any possible way to speed up the
> reshape a little, or if (like I suspect) it will take ~2 days to
> complete the reshape.

Considering how expensive such operations are in both time and wear on
the disk drives, it's better to read everything available to you on the
subject and ask questions *before* performing expensive experiments on
your array.  If you currently have a performance problem you're trying
to solve, the cause lies somewhere other than your chunk size.

-- 
Stan

* Re: Speeding up chunk size change?
From: Steven Haigh @ 2012-03-04  0:56 UTC (permalink / raw)
  To: stan; +Cc: linux-raid

On 4/03/2012 8:42 AM, Stan Hoeppner wrote:
> On 3/3/2012 1:36 PM, Steven Haigh wrote:
>> Hi all,
>>
>> I just wanted to run this past a few folk here as I want to make sure
>> I'm doing it the Right Way(tm).
>>
>> I've decided to experiment with using a 128KB chunk size on my RAID6
>> instead of a 64KB chunk.
>
> Why?  Do your target applications perform better with a larger
> chunk, and therefore larger total stripe size?  If you're strictly after
> larger dd copy numbers then you're wasting everyone's time, including
> yours, as such has almost zero bearing on real world performance, as
> most workloads are far more random than sequential.

Purely experimental for fun and education. I actually thought that a 
reshape would go at somewhat near the resync speeds I get of 
~60-90MB/sec. I guess this shows I'm wrong ;)

> And apparently you're not using XFS.  This reshape will screw up your
> alignment, and you'll need to change your fstab mount to reflect the new
> RAID geometry.  But my guess is you're not using it.  If you were you'd
> probably be experienced enough to know that doubling your chunk size
> isn't going to make much difference, if any, in real world system usage.

I do use XFS - but this machine's role is a Xen Dom0 - so md2 holds the 
filesystems for the guest VMs in LVs. One of those guest filesystems is 
an LV of the VG on md2 formatted as XFS. It will be interesting to see 
how this affects things :)

>> I set a few 'optimisations' that I believe should help:
>> ## Tweak the RAIDs
>> blockdev --setra 8192 /dev/sd[abcdefg]
>
> Read-ahead is per file descriptor, and occurs at the filesystem level.
> The read-ahead value used is that of the device immediately underlying
> the filesystem.  So don't bother setting these above.

Interesting - I didn't think that was the case for whole disk arrays - 
but there you go... Learnt something else :)

>> blockdev --setra 8192 /dev/md0
>> blockdev --setra 8192 /dev/md1
>> blockdev --setra 16384 /dev/md2
>
> This is fine.  You could theoretically set this to 1GB or more if you
> always read entire files, with no ill effects, as read-ahead doesn't go
> past EOF.  However if you do any mmap reads (many apps do) of portions
> of large files, this will hammer performance, obviously, as you're
> reading entire large files speculatively when not needed.  Play with
> this at your own risk.

The workloads of the array (having LVM on top) for the VMs would 
probably make it quite random. This is part of the reason I am playing 
here - pure experimentation. I am very curious to see if it works better 
or worse after the reshape. I honestly don't know :)

>> echo 16384 > /sys/block/md2/md/stripe_cache_size
>>
>> for i in sda sdb sdc sdd sde sdf; do
>>          echo "Setting options for $i"
>>          echo 256 > /sys/block/$i/queue/nr_requests
>>          echo 4096 > /sys/block/$i/queue/read_ahead_kb
> Eliminate this line ^^^^

Any insight into why? I would have thought that this would help - 
however I'm not quite sure as to the values - as this is much less than 
one chunk... That also being said, wouldn't it be a good idea to have 
*some* readahead?

>>          echo 1 > /sys/block/$i/device/queue_depth
>>          echo deadline > /sys/block/$i/queue/scheduler
>> done
>>
>> Just wondering if anyone knows of any possible way to speed up the
>> reshape a little, or if (like I suspect) it will take ~2 days to
>> complete the reshape.
>
> Considering how expensive such operations are in both time and wear on
> the disk drives, it's better to read everything available to you on the
> subject and ask questions *before* performing expensive experiments on
> your array.  If you currently have a performance problem you're trying
> to solve, the cause lies somewhere other than your chunk size.

As I said above, there really is no 'problem' I'm trying to solve. The 
whole reason is experimentation and education - really to see a 'what 
if' case. The last reshape I did on this array was a RAID5->RAID6 grow 
which went very well - however I have never experimented with chunk size 
on a mdadm raid.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299



* Re: Speeding up chunk size change?
From: Stan Hoeppner @ 2012-03-04  2:24 UTC (permalink / raw)
  To: Steven Haigh; +Cc: linux-raid

On 3/3/2012 6:56 PM, Steven Haigh wrote:
> On 4/03/2012 8:42 AM, Stan Hoeppner wrote:
[snip]
>>> blockdev --setra 8192 /dev/sd[abcdefg]
>>
>> Read-ahead is per file descriptor, and occurs at the filesystem level.
>> The read-ahead value used is that of the device immediately underlying
>> the filesystem.  So don't bother setting these above.
> 
> Interesting - I didn't think that was the case for whole disk arrays -
> but there you go... Learnt something else :)
[snip]
>>>          echo 4096 > /sys/block/$i/queue/read_ahead_kb
>> Eliminate this line ^^^^
> 
> Any insight into why? I would have thought that this would help -
> however I'm not quite sure as to the values - as this is much less than
> one chunk... That also being said, wouldn't it be a good idea to have
> *some* readahead?

You read the answer up above, and commented on it.  Maybe you didn't
fully understand?  Or maybe it's because you don't know that these two
are functionally equivalent?

blockdev --setra X
echo X >  /sys/block/$i/queue/read_ahead_kb
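
One caveat on units, offered as a sketch rather than a correction: as far as the blockdev man page goes, --setra takes its value in 512-byte sectors, while read_ahead_kb is in kilobytes. Both adjust the same read-ahead setting, but the same X means a different amount in each. The helper names below are made up for illustration.

```shell
#!/bin/sh
# Convert between the units used by "blockdev --setra" (512-byte sectors)
# and the sysfs read_ahead_kb file (KiB).
setra_to_kb() {
    echo $(( $1 * 512 / 1024 ))
}
kb_to_setra() {
    echo $(( $1 * 1024 / 512 ))
}

setra_to_kb 8192     # 4096  (blockdev --setra 8192 is 4096 KiB)
setra_to_kb 16384    # 8192
kb_to_setra 4096     # 8192  (read_ahead_kb 4096 is --setra 8192)
```

So the `--setra 8192` and `read_ahead_kb 4096` lines in the original script ask for the same 4 MiB of read-ahead on the member disks.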

-- 
Stan

* Re: Speeding up chunk size change?
From: Steven Haigh @ 2012-03-04  2:27 UTC (permalink / raw)
  To: stan; +Cc: linux-raid

On 4/03/2012 1:24 PM, Stan Hoeppner wrote:
> On 3/3/2012 6:56 PM, Steven Haigh wrote:
>> On 4/03/2012 8:42 AM, Stan Hoeppner wrote:
> [snip]
>>>> blockdev --setra 8192 /dev/sd[abcdefg]
>>>
>>> Read-ahead is per file descriptor, and occurs at the filesystem level.
>>> The read-ahead value used is that of the device immediately underlying
>>> the filesystem.  So don't bother setting these above.
>>
>> Interesting - I didn't think that was the case for whole disk arrays -
>> but there you go... Learnt something else :)
> [snip]
>>>>           echo 4096 > /sys/block/$i/queue/read_ahead_kb
>>> Eliminate this line ^^^^
>>
>> Any insight into why? I would have thought that this would help -
>> however I'm not quite sure as to the values - as this is much less than
>> one chunk... That also being said, wouldn't it be a good idea to have
>> *some* readahead?
>
> You read the answer up above, and commented on it.  Maybe you didn't
> fully understand?  Or maybe it's because you don't know that these two
> are functionally equivalent?
>
> blockdev --setra X
> echo X > /sys/block/$i/queue/read_ahead_kb

Ahhh - you're spot on... I didn't think they had the same functionality!

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299



* Re: Speeding up chunk size change?
From: Bernd Schubert @ 2012-03-04  2:37 UTC (permalink / raw)
  To: Steven Haigh; +Cc: stan, linux-raid

On 03/04/2012 03:27 AM, Steven Haigh wrote:
> On 4/03/2012 1:24 PM, Stan Hoeppner wrote:
>> On 3/3/2012 6:56 PM, Steven Haigh wrote:
>>> On 4/03/2012 8:42 AM, Stan Hoeppner wrote:
>> [snip]
>>>>> blockdev --setra 8192 /dev/sd[abcdefg]
>>>>
>>>> Read-ahead is per file descriptor, and occurs at the filesystem level.
>>>> The read-ahead value used is that of the device immediately underlying
>>>> the filesystem. So don't bother setting these above.
>>>
>>> Interesting - I didn't think that was the case for whole disk arrays -
>>> but there you go... Learnt something else :)
>> [snip]
>>>>> echo 4096 > /sys/block/$i/queue/read_ahead_kb
>>>> Eliminate this line ^^^^
>>>
>>> Any insight into why? I would have thought that this would help -
>>> however I'm not quite sure as to the values - as this is much less than
>>> one chunk... That also being said, wouldn't it be a good idea to have
>>> *some* readahead?
>>
>> You read the answer up above, and commented on it. Maybe you didn't
>> fully understand? Or maybe it's because you don't know that these two
>> are functionally equivalent?
>>
>> blockdev --setra X
>> echo X > /sys/block/$i/queue/read_ahead_kb
>
> Ahhh - you're spot on... I didn't think they had the same functionality!
>

Btw, /sys/block/$i/queue/read_ahead_kb is deprecated, as it's not a 
kernel-internal queue setting, but rather bdi (backing device) related.
So /sys/block/$i/bdi/read_ahead_kb is recommended instead (I guess 
queue/read_ahead_kb will go away sometime in the future). In 
/sys/class/bdi you can even control the read-ahead of file systems that 
are not backed by a block device.

Cheers,
Bernd
