From: Jan Kasprzak <kas@fi.muni.cz>
To: John Robinson <john.robinson@anonymous.org.uk>
Cc: linux-raid@vger.kernel.org, Neil Brown <neilb@suse.de>
Subject: Re: RAID-10 initial sync is CPU-limited
Date: Tue, 4 Jan 2011 18:13:24 +0100
Message-ID: <20110104171324.GU17455@fi.muni.cz>
In-Reply-To: <4D2332F1.6090205@anonymous.org.uk>

John Robinson wrote:
: >	According to dmesg(8) my hardware is able to do XOR
: >at 9864 MB/s using generic_sse, and 2167 MB/s using int64x1. So I assume
: >memcmp+memcpy would not be much slower. According to /proc/mdstat, the
: >resync is running at 449 MB/s. So I expect just memcmp+memcpy cannot be a
: >bottleneck here.
: 
: I think it can. Those XOR benchmarks only tell you what the CPU core can 
: do internally, and don't reflect FSB/RAM bandwidth.

	Fair enough.

: My Core 2 Quad 
: 3.2GHz on 1.6GHz FSB with dual-channel memory at 800MHz each (P45 
: chipset) has maximum memory bandwidth of about 4.5GB/s with two sticks 
: of RAM, according to memtest86+. With 4 sticks of RAM it's 3.5GB/s. In 
: real use it'll be rather less.

	My system has 16 DIMMs at 1333MHz, so I would expect the total
available bandwidth to be much higher than 6x 449 MB/s.
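The bandwidth comparison above can be made explicit with a little arithmetic. This is a rough sketch that assumes, as the thread does, that every resynced byte crosses RAM a fixed number of times (the thread mentions 6 and 4 passes), and that a DDR3 channel is 64 bits (8 bytes) wide. The channel count is a hypothetical parameter, not a figure for the Opteron 6128, and a theoretical peak is well above what memtest86+ or real workloads achieve.

```python
# Rough memory-bandwidth budget for an md resync (illustrative only).

def resync_memory_traffic_mb_s(resync_mb_s: float, ram_passes: int) -> float:
    """RAM traffic needed if every resynced byte crosses RAM `ram_passes` times."""
    return resync_mb_s * ram_passes


def ddr_peak_mb_s(mega_transfers_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak: MT/s x bus width (8 bytes per 64-bit channel) x channels."""
    return mega_transfers_per_s * bus_bytes * channels


if __name__ == "__main__":
    rate = 449.0  # MB/s, the resync speed reported by /proc/mdstat above
    for passes in (6, 4):
        print(f"{passes} passes: {resync_memory_traffic_mb_s(rate, passes):.0f} MB/s")
    # A single DDR3-1333 channel; the real channel count is an assumption to fill in.
    print(f"one DDR3-1333 channel peak: {ddr_peak_mb_s(1333, 1):.0f} MB/s")
```

Even one channel's theoretical peak (10664 MB/s) is well above 2694 MB/s, which is consistent with the expectation above; achievable bandwidth is lower than the peak, as John's memtest86+ figures illustrate.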

: One core can easily saturate the memory bandwidth, so having multiple 
: threads would not help at all.

	I am not sure about that, especially on NUMA systems
(mine is a dual-socket Opteron 6128). I would think that having at least
two threads, each running on a core in a different socket, could help.

: (a) if you memcpy it, you go through RAM 4 times instead of 6;

	Yes, I was wondering why the resync does a memcpy at all, instead
of passing the buffer to the other half of the mirror and doing DMA from it
as soon as memcmp fails.
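The two strategies can be contrasted in a simplified model of a mirror resync pass: compare each block of one copy with the other (the memcmp) and rewrite the second copy only on a mismatch. This is not md's raid10 code; the function name and the list-of-buffers representation are hypothetical, and the "write" is just an assignment standing in for DMA to the mirror disk.

```python
# Simplified model of a mirror resync pass (NOT md's raid10 implementation).

def resync(primary: list[bytearray], mirror: list[bytearray],
           copy_first: bool) -> tuple[list[int], int]:
    """Return (indices of rewritten blocks, bytes copied in RAM)."""
    rewritten: list[int] = []
    copied = 0
    for i, (a, b) in enumerate(zip(primary, mirror)):
        if a == b:              # blocks already agree: nothing to write
            continue
        if copy_first:
            buf = bytearray(a)  # memcpy into the mirror's buffer (extra RAM pass)
            copied += len(a)
        else:
            buf = a             # hand the primary's buffer straight to the write
        mirror[i] = buf         # stands in for the write (DMA) to the mirror disk
        rewritten.append(i)
    return rewritten, copied
```

For example, with `p = [bytearray(b"aaaa"), bytearray(b"bbbb")]` and `m = [bytearray(b"aaaa"), bytearray(b"xxxx")]`, `resync(p, m, copy_first=False)` rewrites only block 1 and copies nothing. In the direct variant both copies share one buffer, so a real implementation would presumably have to keep that buffer untouched until the write completes.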

: In the meantime, wiping your discs before you create the array with
: `dd if=/dev/zero of=/dev/disk` would only go from RAM to disc twice (once
: for each disc), then create the array with --assume-clean.

	I think it is possible to use --assume-clean even without
wiping the disks, provided that the resulting md device is used by a
filesystem. I don't think there is any filesystem that reads blocks
it has not previously written.

	Anyway, I have tried running "echo check > /sys/block/md1/md/sync_action",
and it appears that just checking the array without writing (i.e. just memcmp
without memcpy) can sometimes keep the disks at 100% utilization
according to iostat. In /proc/mdstat I can see a rebuild speed of about
520 MB/s. md1_resync uses about 40-50% of a single CPU, and md1_raid10
still uses 90-100%.
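The check described above can be started and monitored from a script. This is a sketch under stated assumptions: `md1` is the device from this message, writing to `sync_action` needs root, and the parser assumes the `speed=NNNK/sec` field that /proc/mdstat prints during a resync or check. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Matches the "speed=NNNK/sec" field shown in /proc/mdstat during resync/check.
_SPEED_RE = re.compile(r"speed=(\d+)K/sec")


def start_check(md: str = "md1") -> None:
    """Start a read-only scrub, like `echo check > .../sync_action` (needs root)."""
    Path(f"/sys/block/{md}/md/sync_action").write_text("check\n")


def mdstat_speeds_k(text: str) -> list[int]:
    """Return every speed value (in K/sec, as mdstat prints it) found in the text."""
    return [int(s) for s in _SPEED_RE.findall(text)]
```

A typical call would be `mdstat_speeds_k(Path("/proc/mdstat").read_text())`, polled while the check runs alongside iostat.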

	Another possible source of the overhead is that the resync
uses page-sized chunks instead of something bigger, and relies on the
block layer to merge requests. I observe high variance in the avgrq-sz
value in iostat (between about 120 and 280 sectors, i.e. roughly 60 to
140 KiB). Maybe this is what causes the high CPU utilization of md1_raid10?
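How page-sized chunks turn into larger requests can be illustrated with a toy model of back-merging: contiguous requests are joined until a size cap is reached. This is not the kernel's merge logic; the function names and the 256-sector cap are arbitrary illustrative assumptions. It only shows how avgrq-sz, which iostat reports in 512-byte sectors, depends on how much contiguity the block layer sees.

```python
# Toy model of block-layer back-merging (not the kernel's actual algorithm).

SECTOR = 512
PAGE_SECTORS = 4096 // SECTOR  # one 4 KiB page = 8 sectors


def merge(requests: list[tuple[int, int]], max_sectors: int) -> list[tuple[int, int]]:
    """Merge (start_sector, nr_sectors) requests, assumed sorted by start."""
    merged: list[tuple[int, int]] = []
    for start, length in requests:
        if merged:
            prev_start, prev_len = merged[-1]
            # Back-merge only if contiguous and the result stays under the cap.
            if prev_start + prev_len == start and prev_len + length <= max_sectors:
                merged[-1] = (prev_start, prev_len + length)
                continue
        merged.append((start, length))
    return merged


def avgrq_sz(requests: list[tuple[int, int]]) -> float:
    """Average request size in sectors, the unit of iostat's avgrq-sz."""
    return sum(n for _, n in requests) / len(requests)
```

With 64 contiguous pages and a 256-sector cap this yields two 256-sector requests; dropping a single page from the stream leaves requests of 128 and 248 sectors (average 188). So anything that breaks contiguity ends a merge early and moves avgrq-sz around. Whether that is what drives md1_raid10's CPU use remains the open question above.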

	Sincerely,

-Yenya

-- 
| Jan "Yenya" Kasprzak  <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839      Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/    Journal: http://www.fi.muni.cz/~kas/blog/ |
Please don't top post and in particular don't attach entire digests to your
mail or we'll all soon be using bittorrent to read the list.     --Alan Cox

Thread overview: 10+ messages
2011-01-03 16:32 RAID-10 initial sync is CPU-limited Jan Kasprzak
2011-01-04  5:24 ` NeilBrown
2011-01-04  8:29   ` Jan Kasprzak
2011-01-04 11:15     ` NeilBrown
2011-01-04 14:47     ` John Robinson
2011-01-04 17:13       ` Jan Kasprzak [this message]
2011-01-04 14:54 ` John Robinson
2011-01-04 16:41   ` Jan Kasprzak
2011-01-04 17:05     ` John Robinson
2011-01-04 17:17       ` Jan Kasprzak
