* internal write-intent bitmap is horribly slow with RAID10 over 20 drives
@ 2017-06-05 10:06 CoolCold
2017-06-05 10:55 ` David Brown
2017-06-06 3:40 ` NeilBrown
0 siblings, 2 replies; 9+ messages in thread
From: CoolCold @ 2017-06-05 10:06 UTC (permalink / raw)
To: Linux RAID
Hello!
I keep testing the new box, and while the sync speed is not the best,
that is not the worst thing I have found.
Doing fio testing on a RAID10 array over 20 10k RPM drives, I get very
bad random write performance: only about _45_ iops.
Drives:
=== START OF INFORMATION SECTION ===
Vendor: TOSHIBA
Product: AL14SEB18EQ
Revision: 0101
User Capacity: 1,800,360,124,416 bytes [1.80 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
Lowest aligned LBA: 0
Rotation Rate: 10500 rpm
Form Factor: 2.5 inches
Logical Unit id: 0x500003975840f759
Serial number: X6K0A0D5FZRC
Device type: disk
Transport protocol: SAS
Local Time is: Mon Jun 5 09:51:56 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
Output from fio with internal write-intent bitmap:
Jobs: 1 (f=1): [w(1)] [28.3% done] [0KB/183KB/0KB /s] [0/45/0 iops] [eta 07m:11s]
array definition:
[root@spare-a17484327407661 rovchinnikov]# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid10 sdx[19] sdw[18] sdv[17] sdu[16] sdt[15] sds[14]
sdr[13] sdq[12] sdp[11] sdo[10] sdn[9] sdm[8] sdl[7] sdk[6] sdj[5]
sdi[4] sdh[3] sdg[2] sdf[1] sde[0]
17580330880 blocks super 1.2 64K chunks 2 near-copies [20/20]
[UUUUUUUUUUUUUUUUUUUU]
bitmap: 0/66 pages [0KB], 131072KB chunk
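
For reference, an equivalent array can be created with something like the
following (device names are taken from the mdstat output above; the exact
command originally used is not shown in this message):

mdadm --create /dev/md1 --level=10 --layout=n2 --raid-devices=20 \
      --chunk=64 --bitmap=internal --bitmap-chunk=131072 \
      /dev/sd[e-x]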
Setting the bitmap to be
1) external, on an SSD (separate drives), shows
Jobs: 1 (f=1): [w(1)] [5.0% done] [0KB/18783KB/0KB /s] [0/4695/0 iops] [eta 09m:31s]
2) 'none' (disabling it) shows
Jobs: 1 (f=1): [w(1)] [14.0% done] [0KB/18504KB/0KB /s] [0/4626/0 iops] [eta 08m:36s]
All bitmap manipulations were done on the running array.
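
The exact commands are not quoted here; on a running array the bitmap is
normally switched with mdadm --grow, roughly along these lines (an existing
bitmap has to be removed before a new one is added, and the external bitmap
file path is only an example):

# drop the current bitmap (this is the "none" case)
mdadm --grow /dev/md1 --bitmap=none
# add an external bitmap kept on an SSD-backed filesystem
mdadm --grow /dev/md1 --bitmap=/mnt/ssd/md1-bitmap --bitmap-chunk=131072
# go back to an internal bitmap
mdadm --grow /dev/md1 --bitmap=none
mdadm --grow /dev/md1 --bitmap=internal --bitmap-chunk=131072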
Sample output from iostat with internal bitmap (array of interest is md1):
Device:   rrqm/s  wrqm/s    r/s    w/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sdz         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sdy         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sda         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sdb         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sdc         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sdd         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sde         0.00    0.00   0.00  47.00    0.00   542.00    23.06     0.67  14.34    0.00   14.34  14.28  67.10
sdh         0.00    0.00   0.00  50.00    0.00   734.00    29.36     0.68  13.66    0.00   13.66  13.60  68.00
sdi         0.00    0.00   0.00  46.00    0.00   478.00    20.78     0.67  14.63    0.00   14.63  14.54  66.90
sdf         0.00    0.00   0.00  47.00    0.00   542.00    23.06     0.67  14.28    0.00   14.28  14.21  66.80
sdl         0.00    0.00   0.00  45.00    0.00   414.00    18.40     0.66  14.78    0.00   14.78  14.69  66.10
sdn         0.00    0.00   0.00  44.00    0.00   350.00    15.91     0.66  14.95    0.00   14.95  14.95  65.80
sdj         0.00    0.00   0.00  46.00    0.00   478.00    20.78     0.65  14.22    0.00   14.22  14.13  65.00
sdk         0.00    0.00   0.00  46.00    0.00   418.00    18.17     0.65  14.35    0.00   14.35  13.98  64.30
sdm         0.00    0.00   0.00  44.00    0.00   350.00    15.91     0.62  14.18    0.00   14.18  14.14  62.20
sdp         0.00    0.00   0.00  43.00    0.00   286.00    13.30     0.62  14.56    0.00   14.56  14.51  62.40
sdg         0.00    0.00   0.00  50.00    0.00   734.00    29.36     0.68  13.58    0.00   13.58  13.52  67.60
sdv         0.00    0.00   0.00  43.00    0.00   346.00    16.09     0.67  15.19    0.00   15.19  15.49  66.60
sdr         0.00    0.00   0.00  44.00    0.00   410.00    18.64     0.66  14.59    0.00   14.59  14.86  65.40
sds         0.00    0.00   0.00  43.00    0.00   286.00    13.30     0.65  15.07    0.00   15.07  15.02  64.60
sdt         0.00    0.00   0.00  42.00    0.00   282.00    13.43     0.65  15.17    0.00   15.17  15.48  65.00
sdo         0.00    0.00   0.00  43.00    0.00   286.00    13.30     0.64  14.93    0.00   14.93  14.88  64.00
sdq         0.00    0.00   0.00  45.00    0.00   414.00    18.40     0.66  14.67    0.00   14.67  14.62  65.80
sdw         0.00    0.00   0.00  43.00    0.00   346.00    16.09     0.66  14.93    0.00   14.93  15.21  65.40
sdu         0.00    0.00   0.00  44.00    0.00   350.00    15.91     0.64  14.55    0.00   14.55  14.50  63.80
sdx         0.00    0.00   0.00  44.00    0.00   350.00    15.91     0.64  14.55    0.00   14.55  14.48  63.70
md127       0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
md126       0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
md125       0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
md1         0.00    0.00   0.00  41.00    0.00  2624.00   128.00     0.00   0.00    0.00    0.00   0.00   0.00
md2         0.00    0.00   0.00   0.00    0.00     0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00

The picture is the same on the 3.10.0-327.el7.x86_64 and 4.11 kernels.
Any advice would be very helpful.
--
Best regards,
[COOLCOLD-RIPN]
^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: internal write-intent bitmap is horribly slow with RAID10 over 20 drives
  2017-06-05 10:06 internal write-intent bitmap is horribly slow with RAID10 over 20 drives CoolCold
@ 2017-06-05 10:55 ` David Brown
  2017-06-05 12:30   ` CoolCold
  2017-06-06 3:40 ` NeilBrown
  1 sibling, 1 reply; 9+ messages in thread
From: David Brown @ 2017-06-05 10:55 UTC (permalink / raw)
To: CoolCold, Linux RAID

On 05/06/17 12:06, CoolCold wrote:
> Hello!
> I keep testing the new box, and while the sync speed is not the best,
> that is not the worst thing I have found.
>
> Doing fio testing on a RAID10 array over 20 10k RPM drives, I get very
> bad random write performance: only about _45_ iops.
>
<snip>
>
> Any advice would be very helpful.
>

The best advice I can give you is to take a step back, and try to be
clear what problem you are trying to solve here.  What is this system
supposed to do?  What are your requirements?  What are your use-cases?

Attempting to optimise the sync speed (or IOPS) of a 20 drive RAID10
set is a totally pointless exercise in itself.  At best, it would be a
torture test for md raid.  There is no system in the world where the
specifications are "make the fastest 20 drive RAID10 setup" - at least,
not from a sane IT management.

When you have a clear picture of what you actually /need/, what you
/want/, what you /have/, and how you want to use it all - /then/, and
only then, is it time to look at possible RAID setups, filesystem
arrangements, alternative hardware, etc.  Then you ask for advice on
this list (or elsewhere - but this list is a good starting point),
learning about the pros and cons of a variety of possible arrangements.

Once you have collected these ideas, with different balances in
performance, safety, space efficiency, features, cost, etc., you can
then test them out and look at benchmarks to see if they will work in
practice.

^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: internal write-intent bitmap is horribly slow with RAID10 over 20 drives
  2017-06-05 10:55 ` David Brown
@ 2017-06-05 12:30   ` CoolCold
  2017-06-05 16:16     ` David Brown
  0 siblings, 1 reply; 9+ messages in thread
From: CoolCold @ 2017-06-05 12:30 UTC (permalink / raw)
To: David Brown; +Cc: Linux RAID

Hello!
Thanks for your proposals.

From my POV, having ~100 iops per drive would be completely okay; as you
can see from the iostat/fio output, we are not even near that.
Hope this clarifies.

On Mon, Jun 5, 2017 at 5:55 PM, David Brown <david.brown@hesbynett.no> wrote:
> On 05/06/17 12:06, CoolCold wrote:
>> Doing fio testing on a RAID10 array over 20 10k RPM drives, I get very
>> bad random write performance: only about _45_ iops.
>
> The best advice I can give you is to take a step back, and try to be
> clear what problem you are trying to solve here.  What is this system
> supposed to do?  What are your requirements?  What are your use-cases?
<snip>

--
Best regards,
[COOLCOLD-RIPN]

^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: internal write-intent bitmap is horribly slow with RAID10 over 20 drives
  2017-06-05 12:30 ` CoolCold
@ 2017-06-05 16:16   ` David Brown
  0 siblings, 0 replies; 9+ messages in thread
From: David Brown @ 2017-06-05 16:16 UTC (permalink / raw)
To: CoolCold; +Cc: Linux RAID

On 05/06/17 14:30, CoolCold wrote:
> Hello!
> Thanks for your proposals.
>
> From my POV, having ~100 iops per drive would be completely okay; as you
> can see from the iostat/fio output, we are not even near that.
> Hope this clarifies.

No, that tells me absolutely /nothing/.  Please re-read what I wrote,
and think about it.

When you can tell us what you plan to /do/ with your disk array, it
will be possible for people to give you /real/ help and useful advice.
As it is, the best you can get is suggestions for the colour of your
bike shed.

mvh.,

David

<snip>

^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: internal write-intent bitmap is horribly slow with RAID10 over 20 drives
  2017-06-05 10:06 internal write-intent bitmap is horribly slow with RAID10 over 20 drives CoolCold
  2017-06-05 10:55 ` David Brown
@ 2017-06-06 3:40 ` NeilBrown
  2017-06-06 11:31   ` CoolCold
  1 sibling, 1 reply; 9+ messages in thread
From: NeilBrown @ 2017-06-06 3:40 UTC (permalink / raw)
To: CoolCold, Linux RAID

On Mon, Jun 05 2017, CoolCold wrote:
> Doing fio testing on a RAID10 array over 20 10k RPM drives, I get very
> bad random write performance: only about _45_ iops.
<snip>
> Output from fio with internal write-intent bitmap:
> Jobs: 1 (f=1): [w(1)] [28.3% done] [0KB/183KB/0KB /s] [0/45/0 iops] [eta 07m:11s]
<snip>
> Setting the bitmap to be
> 1) external, on an SSD (separate drives), shows
> Jobs: 1 (f=1): [w(1)] [5.0% done] [0KB/18783KB/0KB /s] [0/4695/0 iops] [eta 09m:31s]
> 2) 'none' (disabling it) shows
> Jobs: 1 (f=1): [w(1)] [14.0% done] [0KB/18504KB/0KB /s] [0/4626/0 iops] [eta 08m:36s]

These numbers suggest that the write-intent bitmap causes a 100-fold
slowdown, i.e. 45 iops instead of 4500 iops (roughly).

That is certainly more than I would expect, so maybe there is a bug.

A large RAID10 is a worst case for bitmap updates, as the bitmap is
written to all devices instead of just those devices that contain the
data which the bit corresponds to.  So every bitmap update goes to all
20 devices.

Your bitmap chunk size of 128M is nice and large, but making it larger
might help - maybe 1GB.

Still, 100-fold ... that's a lot.

A potentially useful exercise would be to run a series of tests,
changing the number of devices in the array from 2 to 10, changing the
RAID chunk size from 64K to 64M, and changing the bitmap chunk size
from 64M to 4G.  In each configuration, run the same test and record
the iops.  (You don't need to wait for a resync each time, just use
--assume-clean.)  Then graph all this data (or just provide the table
and I'll graph it).  That might provide an insight into where to start
looking for the slowdown.

NeilBrown

^ permalink raw reply [flat|nested] 9+ messages in thread

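A rough sketch of such a test sweep, assuming spare devices /dev/sde through
/dev/sdn and the randwrite.conf fio job used later in the thread (the device
list, loop values and result handling are illustrative only):

#!/bin/bash
# Sweep device count, RAID chunk size (KB) and bitmap chunk size (KB),
# re-creating the test array each time and running the same fio job.
DEVICES=(/dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi
         /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn)
for ndev in 2 4 6 8 10; do
  for chunk in 64 1024 16384 65536; do               # 64K .. 64M RAID chunk
    for bchunk in 65536 262144 1048576 4194304; do   # 64M .. 4G bitmap chunk
      devs="${DEVICES[@]:0:$ndev}"
      mdadm --create /dev/md1 --run --assume-clean \
            --level=10 --layout=n2 --raid-devices=$ndev \
            --chunk=$chunk --bitmap=internal --bitmap-chunk=$bchunk \
            $devs
      # record the iops for this configuration
      fio --runtime 60 randwrite.conf \
          --output=result-${ndev}dev-${chunk}k-${bchunk}k.txt
      mdadm --stop /dev/md1
      mdadm --zero-superblock $devs
    done
  done
done
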
* Re: internal write-intent bitmap is horribly slow with RAID10 over 20 drives
  2017-06-06 3:40 ` NeilBrown
@ 2017-06-06 11:31   ` CoolCold
  2017-06-06 22:02     ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: CoolCold @ 2017-06-06 11:31 UTC (permalink / raw)
To: NeilBrown; +Cc: Linux RAID

Hello!
Neil, thanks for the reply; answers inline.

On Tue, Jun 6, 2017 at 10:40 AM, NeilBrown <neilb@suse.com> wrote:
> These numbers suggest that the write-intent bitmap causes a 100-fold
> slowdown, i.e. 45 iops instead of 4500 iops (roughly).
>
> That is certainly more than I would expect, so maybe there is a bug.

I suppose no one is using RAID10 over more than 4 drives then; I can't
believe I'm the only one who has hit this problem.

> A large RAID10 is a worst case for bitmap updates, as the bitmap is
> written to all devices instead of just those devices that contain the
> data which the bit corresponds to.  So every bitmap update goes to all
> 20 devices.
>
> Your bitmap chunk size of 128M is nice and large, but making it larger
> might help - maybe 1GB.

Tried that already, it didn't make much difference, but I will gather
more statistics.

> A potentially useful exercise would be to run a series of tests,
> changing the number of devices in the array from 2 to 10, changing the
> RAID chunk size from 64K to 64M, and changing the bitmap chunk size
> from 64M to 4G.

Is changing the RAID chunk size up to 64M just to gather statistics, or
do you think it may have some practical use?

> In each configuration, run the same test and record the iops.
> (You don't need to wait for a resync each time, just use
> --assume-clean.)

This helps, thanks.

--
Best regards,
[COOLCOLD-RIPN]

^ permalink raw reply [flat|nested] 9+ messages in thread

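For the record, bumping the bitmap chunk on the running array (the "tried
that already" above) would look roughly like this; the bitmap has to be
removed and re-added, since its chunk size cannot be changed in place:

mdadm --grow /dev/md1 --bitmap=none
mdadm --grow /dev/md1 --bitmap=internal --bitmap-chunk=1048576   # 1GiB, value in KB
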
* Re: internal write-intent bitmap is horribly slow with RAID10 over 20 drives
  2017-06-06 11:31 ` CoolCold
@ 2017-06-06 22:02   ` NeilBrown
  2017-06-12 6:12     ` CoolCold
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2017-06-06 22:02 UTC (permalink / raw)
To: CoolCold; +Cc: Linux RAID

On Tue, Jun 06 2017, CoolCold wrote:
>> That is certainly more than I would expect, so maybe there is a bug.
> I suppose no one is using RAID10 over more than 4 drives then; I can't
> believe I'm the only one who has hit this problem.

We have customers who use RAID10 with many more than 4 drives, but I
haven't had reports like this.  Presumably whatever problem is
affecting you is not affecting them.  We cannot know until we drill
down to understand the problem.

>> A potentially useful exercise would be to run a series of tests,
>> changing the number of devices in the array from 2 to 10, changing the
>> RAID chunk size from 64K to 64M, and changing the bitmap chunk size
>> from 64M to 4G.
> Is changing the RAID chunk size up to 64M just to gather statistics, or
> do you think it may have some practical use?

I don't have any particular reason to expect this to have an effect.
But it is easy to change, and changing it might provide some hints.
So probably "just to gather statistics".

NeilBrown

^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: internal write-intent bitmap is horribly slow with RAID10 over 20 drives
  2017-06-06 22:02 ` NeilBrown
@ 2017-06-12 6:12   ` CoolCold
  2017-06-14 1:40     ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: CoolCold @ 2017-06-12 6:12 UTC (permalink / raw)
To: NeilBrown; +Cc: Linux RAID

Hello!
I've started doing the testing you proposed, and found other strange
behaviour: with _4_ drives I also get only ~44 iops:

mdadm --create --assume-clean -c $((64*1)) -b internal --bitmap-chunk=$((128*1024)) -n 4 -l 10 /dev/md1 /dev/sde /dev/sdf /dev/sdg /dev/sdh

mdstat:
[root@spare-a17484327407661 rovchinnikov]# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid10 sdh[3] sdg[2] sdf[1] sde[0]
      3516066176 blocks super 1.2 64K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 0/14 pages [0KB], 131072KB chunk

fio:
[root@spare-a17484327407661 tests]# fio --runtime 60 randwrite.conf
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=512
fio-2.2.8
Starting 1 process
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/179KB/0KB /s] [0/44/0 iops] [eta 00m:00s]
randwrite: (groupid=0, jobs=1): err= 0: pid=35048: Mon Jun 12 06:03:28 2017
  write: io=35728KB, bw=609574B/s, iops=148, runt= 60018msec
    slat (usec): min=6, max=3006.6K, avg=6714.32, stdev=33548.39
    clat (usec): min=137, max=14323K, avg=3430245.54, stdev=4822029.01
     lat (msec): min=22, max=14323, avg=3436.96, stdev=4830.87
    clat percentiles (msec):
     |  1.00th=[   40],  5.00th=[   76], 10.00th=[   87], 20.00th=[  115],
     | 30.00th=[  437], 40.00th=[  510], 50.00th=[  553], 60.00th=[  619],
     | 70.00th=[ 2376], 80.00th=[11600], 90.00th=[11731], 95.00th=[11863],
     | 99.00th=[12387], 99.50th=[13435], 99.90th=[14091], 99.95th=[14222],
     | 99.99th=[14353]
    bw (KB  /s): min=  111, max=14285, per=95.41%, avg=567.70, stdev=1623.95
    lat (usec) : 250=0.01%
    lat (msec) : 50=2.02%, 100=12.52%, 250=7.01%, 500=17.02%, 750=30.62%
    lat (msec) : 1000=0.12%, 2000=0.50%, >=2000=30.18%
  cpu          : usr=0.06%, sys=0.34%, ctx=2607, majf=0, minf=30
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=0/w=8932/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=512

Run status group 0 (all jobs):
  WRITE: io=35728KB, aggrb=595KB/s, minb=595KB/s, maxb=595KB/s, mint=60018msec, maxt=60018msec

Disk stats (read/write):
    md1: ios=61/8928, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=16/6693, aggrmerge=0/0, aggrticks=13/512251, aggrin_queue=512265, aggrutil=83.63%
  sde: ios=40/6787, merge=0/0, ticks=15/724812, in_queue=724824, util=83.63%
  sdf: ios=2/6787, merge=0/0, ticks=5/694057, in_queue=694061, util=82.20%
  sdg: ios=24/6599, merge=0/0, ticks=27/154988, in_queue=155022, util=80.72%
  sdh: ios=1/6599, merge=0/0, ticks=6/475150, in_queue=475155, util=82.29%

Comparing to the same drives in RAID5, fio shows ~142 iops:
[root@spare-a17484327407661 tests]# fio --runtime 60 randwrite.conf
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=512
fio-2.2.8
Starting 1 process
Jobs: 1 (f=1): [w(1)] [0.0% done] [0KB/571KB/0KB /s] [0/142/0 iops] [eta 93d:11h:20m:52s]
randwrite: (groupid=0, jobs=1): err= 0: pid=34914: Mon Jun 12 05:59:23 2017
  write: io=41880KB, bw=707115B/s, iops=172, runt= 60648msec

The RAID5 array was created basically the same way as the RAID10 one:
mdadm --create --assume-clean -c $((64*1)) -b internal --bitmap-chunk=$((128*1024)) -n 4 -l 5 /dev/md1 /dev/sde /dev/sdf /dev/sdg /dev/sdh

mdstat output for raid5:
[root@spare-a17484327407661 rovchinnikov]# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid5 sdh[3] sdg[2] sdf[1] sde[0]
      5274099264 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 7/7 pages [28KB], 131072KB chunk

For both cases, the same fio config:
[root@spare-a17484327407661 tests]# cat randwrite.conf
[randwrite]
blocksize=4k
#blocksize=64k
filename=/dev/md1
#filename=/dev/md2
readwrite=randwrite
#rwmixread=75
direct=1
buffered=0
ioengine=libaio
iodepth=512
#numjobs=4
group_reporting=1

From iostat, the hard drives are seeing more write requests than md1
(compare 40-43 w/s on md1 with ~60 w/s per device):
[root@spare-a17484327407661 rovchinnikov]# iostat -d -xk 1 /dev/md1 /dev/sde /dev/sdf /dev/sdg /dev/sdh
Linux 3.10.0-327.el7.x86_64 (spare-a17484327407661.sgdc)  06/12/2017  _x86_64_  (40 CPU)

Device:  rrqm/s wrqm/s  r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sde        0.00   0.00 0.00 59.00   0.00 236.00     8.00     0.76  12.76    0.00   12.76  12.75  75.20
sdh        0.00   0.00 0.00 59.00   0.00 236.00     8.00     0.76  12.85    0.00   12.85  12.88  76.00
sdf        0.00   0.00 0.00 62.00   0.00 248.00     8.00     0.78  12.89    0.00   12.89  12.58  78.00
sdg        0.00   0.00 0.00 62.00   0.00 248.00     8.00     0.77  12.71    0.00   12.71  12.45  77.20
md1        0.00   0.00 0.00 40.00   0.00 160.00     8.00     0.00   0.00    0.00    0.00   0.00   0.00

Device:  rrqm/s wrqm/s  r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sde        0.00   0.00 0.00 66.00   0.00 264.00     8.00     0.80  12.39    0.00   12.39  12.09  79.80
sdh        0.00   0.00 0.00 62.00   0.00 248.00     8.00     0.78  12.87    0.00   12.87  12.58  78.00
sdf        0.00   0.00 0.00 66.00   0.00 264.00     8.00     0.78  11.82    0.00   11.82  11.82  78.00
sdg        0.00   0.00 0.00 62.00   0.00 248.00     8.00     0.80  12.82    0.00   12.82  12.85  79.70
md1        0.00   0.00 0.00 43.00   0.00 172.00     8.00     0.00   0.00    0.00    0.00   0.00   0.00

Device:  rrqm/s wrqm/s  r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sde        0.00   0.00 0.00 65.00   0.00 260.00     8.00     0.81  12.43    0.00   12.43  12.40  80.60
sdh        0.00   0.00 0.00 58.00   0.00 232.00     8.00     0.74  12.81    0.00   12.81  12.78  74.10
sdf        0.00   0.00 0.00 71.00   0.00 284.00     8.00     0.81  11.38    0.00   11.38  11.34  80.50
sdg        0.00   0.00 0.00 64.00   0.00 256.00     8.00     0.82  12.77    0.00   12.77  12.73  81.50
md1        0.00   0.00 0.00 43.00   0.00 172.00     8.00     0.00   0.00    0.00    0.00   0.00   0.00

I don't see any good explanation for this; kindly waiting for your advice.

On Wed, Jun 7, 2017 at 5:02 AM, NeilBrown <neilb@suse.com> wrote:
<snip>

--
Best regards,
[COOLCOLD-RIPN]

^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: internal write-intent bitmap is horribly slow with RAID10 over 20 drives
  2017-06-12 6:12 ` CoolCold
@ 2017-06-14 1:40   ` NeilBrown
  0 siblings, 0 replies; 9+ messages in thread
From: NeilBrown @ 2017-06-14 1:40 UTC (permalink / raw)
To: CoolCold; +Cc: Linux RAID

On Mon, Jun 12 2017, CoolCold wrote:
> Hello!
> I've started doing the testing you proposed, and found other strange
> behaviour: with _4_ drives I also get only ~44 iops:
<snip>
> From iostat, the hard drives are seeing more write requests than md1
> (compare 40-43 w/s on md1 with ~60 w/s per device):
<snip>
> I don't see any good explanation for this; kindly waiting for your advice.

I really did want to see the multi-dimensional collection of data
points, rather than just one.  It is hard to see patterns in a single
number.

RAID5 and RAID10 are not directly comparable.  For every block written
to the array, RAID10 writes 2 blocks, and RAID5 writes 1.33 (on
average).  So you would expect 50% more writes to the member devices in
a (4 device) RAID10.  Also, each bit in the bitmap for RAID10 covers
less space, so you get more bitmap updates.

I don't think this quite covers the difference though.  40 writes to
the array and 60 writes to each device is a little high.  I think that
is the worst case.  Every write to the array updates the bitmap on all
devices, and the data on 2 devices.  So it seems like every write is
being handled synchronously, with no write combining.  Normally
multiple bitmap updates are handled with a single write.

Having only one job doing direct IO would quite possibly cause this
worst-case performance (though I don't know the details of how fio
works).  Try using buffered IO (that easily allows more parallelism),
and try multiple concurrent threads.

NeilBrown

^ permalink raw reply [flat|nested] 9+ messages in thread

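The worst case described above matches the iostat numbers from the 4-drive
test: each array write then costs 2 data writes plus a bitmap update on all
4 devices, i.e. 6 device writes, so 40 writes/s to md1 gives about
40*6/4 = 60 writes/s per member device.  A variant of the randwrite.conf
job from earlier in the thread that follows this suggestion might look
roughly like this (the numjobs and iodepth values are illustrative, not
from the thread):

[randwrite]
blocksize=4k
filename=/dev/md1
readwrite=randwrite
# buffered, page-cache backed IO instead of O_DIRECT
direct=0
buffered=1
ioengine=libaio
iodepth=64
# several concurrent jobs, to give md a chance to combine bitmap updates
numjobs=4
group_reporting=1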