* slow sequential read on partitioned raid6
From: Nicolae Mihalache
Date: 2010-03-16 19:05 UTC
To: linux-raid

Hello,

I have created a partitioned raid6 array over 6x 1TB SATA disks using the
command (from memory):

  mdadm --create /dev/md_d1 --auto=mdp --level=6 --raid-devices=6 /dev/sd[b-g]

When I run a sequential read test using

  dd if=/dev/md_d1p1 of=/dev/null bs=1M

I get low read speeds of around 80 MB/s, but only when the partition is
mounted. If I unmount it, the speed is around 350 MB/s. The filesystems I
tried are ext3 and xfs.

The partitions were created with gparted, the partition table being of type
GPT.

If I create normal /dev/sdx1 partitions on each disk and then make a
/dev/md1 raid6 array over them, the read speed is fine.

I played with different read-ahead settings; while they changed the read
speed, the change was only marginal, staying around the values reported
above.

Can somebody explain what the difference is when accessing the raw device
while it is mounted versus unmounted? Also, when playing with those
read-ahead settings it was not clear how, or whether, the read-ahead of the
individual disks is taken into account.

With large read-ahead values, iostat shows that the tps of the individual
disks roughly doubles when reading the mounted partition compared to reading
it unmounted, despite the speed being three times lower. It is as if reading
the mounted partition also reads some other parts of the disks. I could not
find a way to print the blocks read from the individual disks: the sysctl
vm.block_dump=1 makes the kernel print the block numbers on the md array,
but not on the components of the array.

The system is Debian 5 with kernel 2.6.26-2-686.

Thanks for any hint on how to further debug the problem.

nicolae
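blktrace can show the per-request traffic on each component disk, which is
something vm.block_dump cannot do for array members. A minimal sketch,
assuming the blktrace and blkparse tools are installed (they are not
mentioned in the report above):

  # trace one component disk while the dd test runs in another terminal;
  # blkparse prints each request's sector offset and size as it completes
  blktrace -d /dev/sdb -o - | blkparse -i -

  # or capture traces from two components for 30 seconds and read them later
  blktrace -d /dev/sdb -d /dev/sdc -w 30
  blkparse sdb sdc > trace.txt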
* Re: slow sequential read on partitioned raid6
From: Neil Brown
Date: 2010-03-16 22:22 UTC
To: Nicolae Mihalache; Cc: linux-raid

On Tue, 16 Mar 2010 20:05:45 +0100 Nicolae Mihalache <mache@abcpages.com> wrote:

> Hello,
>
> I have created a partitioned raid6 array over 6x 1TB SATA disks using the
> command (from memory):
>   mdadm --create /dev/md_d1 --auto=mdp --level=6 --raid-devices=6 /dev/sd[b-g]
>
> When I run a sequential read test using
>   dd if=/dev/md_d1p1 of=/dev/null bs=1M
> I get low read speeds of around 80 MB/s, but only when the partition is
> mounted.
>
> If I unmount it, the speed is around 350 MB/s. The filesystems I tried are
> ext3 and xfs.

Thanks for reporting this.

I just did some testing and I get the reverse!!

When a filesystem is mounted I get 135MB/s. When it isn't mounted
I get 64MB/s.

I cannot think what could cause this. I will have to explore.
Can you please double-check your results and confirm that it definitely
is faster when unmounted.

> If I create normal /dev/sdx1 partitions on each disk and then make a
> /dev/md1 raid6 array over them, the read speed is fine.
>
> I played with different read-ahead settings; while they changed the read
> speed, the change was only marginal, staying around the values reported
> above.
>
> Can somebody explain what the difference is when accessing the raw device
> while it is mounted versus unmounted? Also, when playing with those
> read-ahead settings it was not clear how, or whether, the read-ahead of
> the individual disks is taken into account.

Only the read-ahead value of the array is considered. The read-ahead
settings of the individual devices in the array are ignored.

NeilBrown
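For reference, those read-ahead values can be inspected and changed with
blockdev(8). A small sketch using the device names from this thread; values
are reported in 512-byte sectors:

  blockdev --getra /dev/md_d1      # whole-array device
  blockdev --getra /dev/md_d1p1    # partition on the array
  blockdev --getra /dev/sdb        # component disk (ignored by md, per the reply above)
  blockdev --setra 4096 /dev/md_d1 # example of tuning it on the array device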
* Re: slow sequential read on partitioned raid6
From: Nicolae Mihalache
Date: 2010-03-16 23:16 UTC
To: Neil Brown; Cc: linux-raid

On 03/16/2010 11:22 PM, Neil Brown wrote:
> [...]
> Thanks for reporting this.
>
> I just did some testing and I get the reverse!!
>
> When a filesystem is mounted I get 135MB/s. When it isn't mounted
> I get 64MB/s.
>
> I cannot think what could cause this. I will have to explore.
> Can you please double-check your results and confirm that it definitely
> is faster when unmounted.

I'm positive that it's slow when mounted; that's how I discovered the
problem. See below (I recreated the array over 1/10 of the original disks
to be able to test more easily). In fact, the highest speed I get is when
accessing the entire device directly, even while one partition is mounted.

bacula:~# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md_d1 : active raid6 sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      390668288 blocks level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md2 : active raid1 sdi1[0] sdj1[1]
      1462750272 blocks [2/2] [UU]

unused devices: <none>

bacula:~# parted /dev/md_d1
GNU Parted 1.8.8
Using /dev/md_d1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: Unknown (unknown)
Disk /dev/md_d1: 400GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  50.0GB  50.0GB  ext3         primary

(parted) quit

bacula:~# umount /dev/md_d1p1
umount: /dev/md_d1p1: not mounted

bacula:~# dd if=/dev/md_d1p1 of=/dev/null bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 37.4938 s, 280 MB/s

bacula:~# mount /dev/md_d1p1 /mnt

bacula:~# dd if=/dev/md_d1p1 of=/dev/null bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 132.894 s, 78.9 MB/s

bacula:~# dd if=/dev/md_d1 of=/dev/null bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 28.222 s, 372 MB/s
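When repeating back-to-back dd reads like these, the page cache can skew the
numbers; dropping it between runs keeps the comparison clean. A small sketch,
not something done in the test above:

  sync                                # flush dirty data first
  echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes
  dd if=/dev/md_d1p1 of=/dev/null bs=1M count=10000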
[parent not found: <1268783497.3781.14.camel@localhost.localdomain>]
* Re: slow sequential read on partitioned raid6
From: Nicolae Mihalache
Date: 2010-03-17 8:23 UTC
To: linux-raid

I created a second 100GB partition on all the disks and then made a normal
/dev/md1 raid6 array out of them. The results I get:

bacula:~# dd if=/dev/zero of=/mnt1/test-file bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 72.6303 s, 144 MB/s

bacula:~# dd if=/mnt1/test-file of=/dev/null bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 29.1241 s, 360 MB/s

I really believe it's something with the partitioned array.
/proc/devices shows:

Block devices:
...
  9 md
...
253 mdp

All the md_d1 partitions have major number 253. I don't know if this means
something, but maybe there is a bug in the mdp driver (or whatever it is
called).

nicolae

Daniel Reurich wrote:
> On Wed, 2010-03-17 at 00:16 +0100, Nicolae Mihalache wrote:
>> [...]
>> I'm positive that it's slow when mounted; that's how I discovered the
>> problem. See below (I recreated the array over 1/10 of the original
>> disks to be able to test more easily). In fact, the highest speed I get
>> is when accessing the entire device directly, even while one partition
>> is mounted.
>> [...]
>
> Why are you reading directly from the block devices when they contain a
> mounted filesystem? Surely the fs layer would be holding locks on the
> block device, causing it to slow down raw-layer access.
>
> Might I suggest you should be reading files that are located within the
> mounted filesystem.
>
> I suggest you try this in the mounted filesystem:
>
>   dd if=/dev/zero of=/mnt/test-file bs=1M count=10000
>   dd if=/mnt/test-file of=/dev/null bs=1M
>   rm /mnt/test-file
>
> I hope this helps.
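While either of those dd runs is in progress, per-disk traffic can be watched
from a second terminal, which makes the doubled tps mentioned earlier in the
thread directly visible. A sketch, using the component disk names from this
thread and the documented sysstat argument order (devices before interval):

  # extended per-device statistics, refreshed every second
  iostat -x sdb sdc sdd sde sdf sdg 1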
* Re: slow sequential read on partitioned raid6
From: Michael Evans
Date: 2010-03-18 2:40 UTC
To: Nicolae Mihalache; Cc: linux-raid

On Wed, Mar 17, 2010 at 1:23 AM, Nicolae Mihalache <mache@abcpages.com> wrote:
> I created a second 100GB partition on all the disks and then made a normal
> /dev/md1 raid6 array out of them. The results I get:
>
> bacula:~# dd if=/dev/zero of=/mnt1/test-file bs=1M count=10000
> 10485760000 bytes (10 GB) copied, 72.6303 s, 144 MB/s
>
> bacula:~# dd if=/mnt1/test-file of=/dev/null bs=1M count=10000
> 10485760000 bytes (10 GB) copied, 29.1241 s, 360 MB/s
>
> I really believe it's something with the partitioned array.
> /proc/devices shows:
>
> Block devices:
> ...
>   9 md
> ...
> 253 mdp
>
> All the md_d1 partitions have major number 253. I don't know if this means
> something, but maybe there is a bug in the mdp driver (or whatever it is
> called).
>
> nicolae
> [...]

First off, why not use a hard-disk benchmark utility (their names escape me
aside from Bonnie++) which has these issues worked out?

Second, if you absolutely must do a benchmark with basic tools (which buffer
and use the cache), try this:

  dd if=/dev/zero bs=1M count=10000 | tr '\0' 't' > testfile
  dd if=testfile of=/dev/null bs=1M

You may note that you'll be writing a file of 't' characters instead of a
file of zeros; my method should not be detected as sparse, whereas the case
with zeros probably will be detected as sparse and simply not stored.

If in doubt you can check the size of the file on disk with ls -ls.
If I'm reading the output correctly, the leftmost column (size on disk) is
in kilobyte units, even on an ext4 filesystem with 4 KB blocks.
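To make the on-disk versus apparent size comparison explicit, a small sketch
(GNU coreutils assumed; 'testfile' is the file created above):

  ls -ls testfile                  # first column: blocks actually allocated, 1 KB units by default
  du -k testfile                   # allocated size in KB
  du -k --apparent-size testfile   # logical length in KB; much larger than the
                                   # allocated size if the file is sparse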
* Re: slow sequential read on partitioned raid6
From: Nicolae Mihalache
Date: 2010-03-19 6:47 UTC
To: Michael Evans; Cc: linux-raid

Actually my problem, as written in the subject of the mail, was that the
sequential read was slow. Somebody suggested using a file instead of the
raw partition. If the file were detected as sparse (who does that??), it
would be even faster to read, not slower.

nicolae

On 03/18/2010 03:40 AM, Michael Evans wrote:
> First off, why not use a hard-disk benchmark utility (their names escape
> me aside from Bonnie++) which has these issues worked out?
>
> Second, if you absolutely must do a benchmark with basic tools (which
> buffer and use the cache), try this:
>
>   dd if=/dev/zero bs=1M count=10000 | tr '\0' 't' > testfile
>   dd if=testfile of=/dev/null bs=1M
>
> You may note that you'll be writing a file of 't' characters instead of a
> file of zeros; my method should not be detected as sparse, whereas the
> case with zeros probably will be detected as sparse and simply not stored.
>
> If in doubt you can check the size of the file on disk with ls -ls.
> If I'm reading the output correctly, the leftmost column (size on disk)
> is in kilobyte units, even on an ext4 filesystem with 4 KB blocks.
* Re: slow sequential read on partitioned raid6
From: Michael Evans
Date: 2010-03-19 8:16 UTC
To: Nicolae Mihalache; Cc: linux-raid

On Thu, Mar 18, 2010 at 11:47 PM, Nicolae Mihalache <mache@abcpages.com> wrote:
> Actually my problem, as written in the subject of the mail, was that the
> sequential read was slow. Somebody suggested using a file instead of the
> raw partition. If the file were detected as sparse (who does that??), it
> would be even faster to read, not slower.
>
> nicolae
> [...]

Some versions of standard system utilities may do that by default. They only
have to preserve the file's contents, not its exact on-disk structure. I've
been told (by developers on the GNU project that includes it) that dd is
supposed to do it, at least in recent versions; probably other utilities like
cp could have it on by default as well.
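For reference, in GNU coreutils the sparse handling is an explicit option
rather than purely silent behaviour. A sketch, where the file names are made
up and conv=sparse exists only in newer dd releases:

  dd if=origfile of=sparsecopy bs=1M conv=sparse   # skip writing blocks that are all zeros
  cp --sparse=always origfile sparsecopy2          # create holes wherever possible
  cp --sparse=never  origfile densecopy            # force full allocation
  ls -ls sparsecopy2 densecopy                     # compare the allocated sizes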