linux-raid.vger.kernel.org archive mirror
* MD RAID1 performance very different from non-RAID partition
@ 2007-09-15  5:28 Jordan Russell
  2007-09-15  7:09 ` Iustin Pop
  0 siblings, 1 reply; 9+ messages in thread
From: Jordan Russell @ 2007-09-15  5:28 UTC (permalink / raw)
  To: linux-raid

(Kernel: 2.6.18, x86_64)

Is it normal for an MD RAID1 partition with 1 active disk to perform
differently from a non-RAID partition?

md0 : active raid1 sda2[0]
      8193024 blocks [2/1] [U_]

I'm building a search engine database onto this partition. All of the
source data is cached into memory already (i.e., only writes should be
hitting the disk).
If I mount the partition as /dev/md0, building the database consistently
takes 18 minutes.
If I stop /dev/md0 and mount the partition as /dev/sda2, building the
database consistently takes 31 minutes.

Why the difference?

The "fast" time seen when the partition is mounted as /dev/md0 actually
creates a serious problem: the kernel apparently flushes dirty pages so
aggressively that other processes attempting to write to the same
partition during the database build become blocked for several
minutes(!) at a time.

When mounted as /dev/sda2, that doesn't happen: other processes writing
to the same partition are blocked for no more than a few seconds at a time.
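The flushing behaviour described here is governed by the vm.dirty_ratio and vm.dirty_background_ratio settings: the kernel starts background writeback once dirty pages pass the background percentage, and makes writing processes wait once they pass the higher one. The following is only a rough sketch of those two thresholds, using the MemTotal, dirty_ratio and dirty_background_ratio values reported later in this message. It assumes the percentages apply to MemTotal directly; the kernel uses a somewhat smaller "dirtyable memory" figure, so these are upper-bound estimates. The function name is made up for illustration.

```python
# Rough upper-bound estimate of the page-cache writeback thresholds.
# Assumption: percentages are applied to MemTotal as-is; the kernel really
# uses a smaller "dirtyable memory" figure, so actual thresholds are lower.

def dirty_thresholds_kb(mem_total_kb, dirty_ratio, dirty_background_ratio):
    """Return (background_kb, blocking_kb) estimates in kB."""
    background_kb = mem_total_kb * dirty_background_ratio // 100
    blocking_kb = mem_total_kb * dirty_ratio // 100
    return background_kb, blocking_kb


# Values taken from the /proc output quoted in this message.
background_kb, blocking_kb = dirty_thresholds_kb(2059784, 40, 10)
dd_kb = 400 * 1024  # size of the dd test file (bs=1M count=400)

print(f"background writeback starts near {background_kb} kB dirty")
print(f"writers are throttled near       {blocking_kb} kB dirty")
print(f"dd test writes                   {dd_kb} kB")
```

Under these assumptions the 400 MB dd test is large enough to trigger background writeback but stays below the point where the writer itself is throttled. The md0 bursts shown in the iostat output further down are close to the background figure, which may or may not be a coincidence.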

I don't know if it's relevant, but the results from iostat when writing
large chunks of data to RAID1 partitions seem somewhat curious, as if MD
is telling the I/O layer "all done!" before it's actually finished
writing the data out to the member disks. Note the unrealistically high
kB_wrtn/s numbers on md0 in the following test. (And why does it show
50000 tps?)

# iostat -dk 1 md0 sda

# fgrep MemTotal /proc/meminfo
MemTotal:      2059784 kB
# cat /proc/sys/vm/dirty_ratio
40
# cat /proc/sys/vm/dirty_background_ratio
10
# dd if=/dev/zero of=/testpart/bigfile bs=1M count=400

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              91.00         0.00     46592.00          0      46592
md0           48692.00         0.00    194768.00          0     194768

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              99.00         0.00     50176.00          0      50176
md0               0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              96.00         0.00     49152.00          0      49152
md0               0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              86.00         0.00     44032.00          0      44032
md0               0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              96.00         0.00     48160.00          0      48160
md0           51636.00         0.00    206544.00          0     206544

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              95.05         0.00     48665.35          0      49152
md0               0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              92.00         0.00     46596.00          0      46596
md0               0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              95.96         0.00     48614.14          0      48128
md0               0.00         0.00         0.00          0          0

...
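As a sanity check on the figures above, dividing kB_wrtn/s by tps gives the average size of one transfer as iostat counts it. The sketch below does that for two of the samples; the rows are copied from the output above, and the helper name is made up for illustration.

```python
# (device, tps, kB_wrtn/s) rows copied from the first and fifth iostat samples.
SAMPLES = [
    ("sda", 91.00, 46592.00),
    ("md0", 48692.00, 194768.00),
    ("sda", 96.00, 48160.00),
    ("md0", 51636.00, 206544.00),
]

# kB_wrtn for sda over the first four one-second samples.
SDA_FIRST_FOUR_KB = [46592, 50176, 49152, 44032]


def avg_write_kb(tps, kb_written_per_s):
    """Average kB per transfer, or None for an idle interval."""
    if tps == 0:
        return None
    return kb_written_per_s / tps


for device, tps, kb_s in SAMPLES:
    print(f"{device}: {avg_write_kb(tps, kb_s):.1f} kB per transfer")

sda_total_kb = sum(SDA_FIRST_FOUR_KB)
print(f"sda wrote {sda_total_kb} kB in 4 s; md0 reported 194768 kB in 1 s")
```

Every md0 transfer works out to exactly 4 kB, while sda's are around 512 kB, so the ~50000 tps figure is simply the burst expressed in 4 kB units. The amount md0 reports in one second is also close to what sda writes over the following four seconds. One possible reading, which is an assumption and not something established in this message, is that md0's counters are updated when page-sized requests are submitted, while sda's counters reflect merged requests as the disk actually services them.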

-- 
Jordan Russell


* Re: MD RAID1 performance very different from non-RAID partition
  2007-09-15  5:28 MD RAID1 performance very different from non-RAID partition Jordan Russell
@ 2007-09-15  7:09 ` Iustin Pop
  2007-09-15 12:18   ` Goswin von Brederlow
  2007-09-15 18:11   ` Jordan Russell
  0 siblings, 2 replies; 9+ messages in thread
From: Iustin Pop @ 2007-09-15  7:09 UTC (permalink / raw)
  To: Jordan Russell; +Cc: linux-raid

On Sat, Sep 15, 2007 at 12:28:07AM -0500, Jordan Russell wrote:
> (Kernel: 2.6.18, x86_64)
> 
> Is it normal for an MD RAID1 partition with 1 active disk to perform
> differently from a non-RAID partition?
> 
> md0 : active raid1 sda2[0]
>       8193024 blocks [2/1] [U_]
> 
> I'm building a search engine database onto this partition. All of the
> source data is cached into memory already (i.e., only writes should be
> hitting the disk).
> If I mount the partition as /dev/md0, building the database consistently
> takes 18 minutes.
> If I stop /dev/md0 and mount the partition as /dev/sda2, building the
> database consistently takes 31 minutes.
> 
> Why the difference?

Maybe it's because md doesn't support barriers whereas the disks
support them? In this case some filesystems, for example XFS, will work
faster on raid1 because they can't force the flush to disk using
barriers.

Just a guess...

regards,
iustin


* Re: MD RAID1 performance very different from non-RAID partition
  2007-09-15  7:09 ` Iustin Pop
@ 2007-09-15 12:18   ` Goswin von Brederlow
  2007-09-15 12:32     ` Iustin Pop
  2007-09-15 18:11   ` Jordan Russell
  1 sibling, 1 reply; 9+ messages in thread
From: Goswin von Brederlow @ 2007-09-15 12:18 UTC (permalink / raw)
  To: Jordan Russell; +Cc: linux-raid

Iustin Pop <iusty@k1024.org> writes:

> On Sat, Sep 15, 2007 at 12:28:07AM -0500, Jordan Russell wrote:
>> (Kernel: 2.6.18, x86_64)
>> 
>> Is it normal for an MD RAID1 partition with 1 active disk to perform
>> differently from a non-RAID partition?
>> 
>> md0 : active raid1 sda2[0]
>>       8193024 blocks [2/1] [U_]
>> 
>> I'm building a search engine database onto this partition. All of the
>> source data is cached into memory already (i.e., only writes should be
>> hitting the disk).
>> If I mount the partition as /dev/md0, building the database consistently
>> takes 18 minutes.
>> If I stop /dev/md0 and mount the partition as /dev/sda2, building the
>> database consistently takes 31 minutes.
>> 
>> Why the difference?
>
> Maybe it's because md doesn't support barriers whereas the disks
> support them? In this case some filesystems, for example XFS, will work
> faster on raid1 because they can't force the flush to disk using
> barriers.
>
> Just a guess...
>
> regards,
> iustin

Shouldn't it be the other way around? With a barrier the filesystem
can enforce an order on the data written and can then continue writing
data to the cache. More data is queued up for write. Without barriers
the filesystem should do a sync at that point and have to wait for the
write to fully finish. So less is put into cache.

Or not?

MfG
        Goswin


* Re: MD RAID1 performance very different from non-RAID partition
  2007-09-15 12:18   ` Goswin von Brederlow
@ 2007-09-15 12:32     ` Iustin Pop
  0 siblings, 0 replies; 9+ messages in thread
From: Iustin Pop @ 2007-09-15 12:32 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Jordan Russell, linux-raid

On Sat, Sep 15, 2007 at 02:18:19PM +0200, Goswin von Brederlow wrote:
> Shouldn't it be the other way around? With a barrier the filesystem
> can enforce an order on the data written and can then continue writing
> data to the cache. More data is queued up for write. Without barriers
> the filesystem should do a sync at that point and have to wait for the
> write to fully finish. So less is put into cache.

I don't know in general, but XFS will simply not issue any sync at all
if the block device doesn't support barriers. It's the sysadmin's job to
either ensure you have barriers or turn off the write cache on the disk
(see the XFS FAQ, for example).

However, I never saw such behaviour from MD (i.e. claiming the write has
completed while the disk underneath is still receiving data to write
from Linux) so I'm not sure this is what happens here. In my experience,
MD acknowledges a write only when it has been pushed to the drive (write
cache enabled or not) and there is no buffer between MD and the drive.
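One way to look at this from the application side is to time small writes that are each followed by fsync(), once with the filesystem mounted via /dev/md0 and once via /dev/sda2. The sketch below is a hypothetical probe, not something used in this thread; the function name and default sizes are arbitrary. It only shows how long the kernel takes to report completion and cannot prove where the data physically is at that moment.

```python
import os
import tempfile
import time


def time_synced_writes(directory, count=100, block_size=4096):
    """Write `count` blocks, fsync after each, return per-write latencies in s."""
    payload = b"\0" * block_size
    latencies = []
    # The scratch file is created inside the filesystem under test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.monotonic()
            os.write(fd, payload)
            os.fsync(fd)  # returns once the kernel reports the data as written
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


if __name__ == "__main__":
    samples = time_synced_writes(".", count=100)
    mean_ms = sum(samples) / len(samples) * 1000
    print(f"mean {mean_ms:.2f} ms, max {max(samples) * 1000:.2f} ms")
```

Running it from the mounted partition under both configurations, ideally while the database build is in progress, would show whether synced writes complete noticeably faster or stall noticeably longer in one of them.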

regards,
iustin


* Re: MD RAID1 performance very different from non-RAID partition
  2007-09-15  7:09 ` Iustin Pop
  2007-09-15 12:18   ` Goswin von Brederlow
@ 2007-09-15 18:11   ` Jordan Russell
  2007-09-16 22:08     ` Goswin von Brederlow
  1 sibling, 1 reply; 9+ messages in thread
From: Jordan Russell @ 2007-09-15 18:11 UTC (permalink / raw)
  To: linux-raid

Iustin Pop wrote:
> Maybe it's because md doesn't support barriers whereas the disks
> support them? In this case some filesystems, for example XFS, will work
> faster on raid1 because they can't force the flush to disk using
> barriers.

It's an ext3 partition, so I guess that doesn't apply?

I tried remounting /dev/sda2 with the "barrier=0" option (which I assume
disables barriers, looking at the source), though, just to see if it
would make any difference, but it didn't; the database build still took
31 minutes.

-- 
Jordan Russell


* Re: MD RAID1 performance very different from non-RAID partition
  2007-09-15 18:11   ` Jordan Russell
@ 2007-09-16 22:08     ` Goswin von Brederlow
  2007-09-17 15:58       ` Jordan Russell
  0 siblings, 1 reply; 9+ messages in thread
From: Goswin von Brederlow @ 2007-09-16 22:08 UTC (permalink / raw)
  To: Jordan Russell; +Cc: linux-raid

Jordan Russell <jr-list-2007@quo.to> writes:

> Iustin Pop wrote:
>> Maybe it's because md doesn't support barriers whereas the disks
>> support them? In this case some filesystems, for example XFS, will work
>> faster on raid1 because they can't force the flush to disk using
>> barriers.
>
> It's an ext3 partition, so I guess that doesn't apply?
>
> I tried remounting /dev/sda2 with the "barrier=0" option (which I assume
> disables barriers, looking at the source), though, just to see if it
> would make any difference, but it didn't; the database build still took
> 31 minutes.

Compare the read ahead settings.

MfG
        Goswin


* Re: MD RAID1 performance very different from non-RAID partition
  2007-09-16 22:08     ` Goswin von Brederlow
@ 2007-09-17 15:58       ` Jordan Russell
  2007-09-18 13:44         ` Luca Berra
  0 siblings, 1 reply; 9+ messages in thread
From: Jordan Russell @ 2007-09-17 15:58 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: linux-raid

Goswin von Brederlow wrote:
> Jordan Russell <jr-list-2007@quo.to> writes:
>> It's an ext3 partition, so I guess that doesn't apply?
>>
>> I tried remounting /dev/sda2 with the "barrier=0" option (which I assume
>> disables barriers, looking at the source), though, just to see if it
>> would make any difference, but it didn't; the database build still took
>> 31 minutes.
> 
> Compare the read ahead settings.

I'm not sure what you mean. There's a per-mount read ahead setting?

-- 
Jordan Russell


* Re: MD RAID1 performance very different from non-RAID partition
  2007-09-17 15:58       ` Jordan Russell
@ 2007-09-18 13:44         ` Luca Berra
  2007-09-19  5:28           ` Jordan Russell
  0 siblings, 1 reply; 9+ messages in thread
From: Luca Berra @ 2007-09-18 13:44 UTC (permalink / raw)
  To: linux-raid

On Mon, Sep 17, 2007 at 10:58:11AM -0500, Jordan Russell wrote:
>Goswin von Brederlow wrote:
>> Jordan Russell <jr-list-2007@quo.to> writes:
>>> It's an ext3 partition, so I guess that doesn't apply?
>>>
>>> I tried remounting /dev/sda2 with the "barrier=0" option (which I assume
>>> disables barriers, looking at the source), though, just to see if it
>>> would make any difference, but it didn't; the database build still took
>>> 31 minutes.
>> 
>> Compare the read ahead settings.
>
>I'm not sure what you mean. There's a per-mount read ahead setting?
>

per device

compare
blockdev --getra /dev/sda2
and 
blockdev --getra /dev/md0

L.
-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \


* Re: MD RAID1 performance very different from non-RAID partition
  2007-09-18 13:44         ` Luca Berra
@ 2007-09-19  5:28           ` Jordan Russell
  0 siblings, 0 replies; 9+ messages in thread
From: Jordan Russell @ 2007-09-19  5:28 UTC (permalink / raw)
  To: linux-raid

Luca Berra wrote:
> On Mon, Sep 17, 2007 at 10:58:11AM -0500, Jordan Russell wrote:
>> Goswin von Brederlow wrote:
>>> Compare the read ahead settings.
>> I'm not sure what you mean. There's a per-mount read ahead setting?
>>
> 
> per device
> 
> compare
> blockdev --getra /dev/sda2
> and 
> blockdev --getra /dev/md0

Ah.

256 on /dev/sda2 with md0 stopped
256 on /dev/sda2 with md0 active
256 on /dev/md0 with md0 active
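For reference, blockdev --getra reports read-ahead in 512-byte sectors, so the readings above can be converted to KiB and compared as in this small sketch (the helper name is made up for illustration):

```python
SECTOR_BYTES = 512  # blockdev --getra counts 512-byte sectors


def readahead_kib(sectors):
    """Convert a blockdev --getra reading to KiB."""
    return sectors * SECTOR_BYTES // 1024


# Readings reported above.
readings = {
    "/dev/sda2 (md0 stopped)": 256,
    "/dev/sda2 (md0 active)": 256,
    "/dev/md0 (md0 active)": 256,
}

for label, sectors in readings.items():
    print(f"{label}: {readahead_kib(sectors)} KiB")

identical = len(set(readings.values())) == 1
print("read-ahead identical across configurations:", identical)
```

All three readings are the same, so a read-ahead difference does not appear to account for the 18 versus 31 minute build times.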

-- 
Jordan Russell

