* Interesting RAID checking observations
@ 2006-08-27  6:46 linux
From: linux @ 2006-08-27  6:46 UTC (permalink / raw)
  To: linux-raid

Two discoveries: first, I locked up my machine, and second, checking is
surprisingly slow.

I think the former is pilot error, and not a bug, but after applying
the raid-1 check patch (cherry-picked from the v2.6.18-rc4-mm3 tree at
git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git)
to a couple of machines, I decided to try it out.

This is an older 466 MHz P-II Celeron with 6 drives; 1 GB of each drive
is mirrored in pairs as swap space, giving three RAID-1 arrays (md5,
md6, md7).  So I did a quick

# for i in /sys/block/md[567]/md/sync_action; do echo check > $i ; done

to watch them all proceeding in parallel.

But... I had /proc/sys/dev/raid/speed_limit_max set at 200000.

The machine became quite unresponsive for a minute or two
as the check proceeded.  Caps lock and console-switching worked,
as did Alt-SysRq, but I couldn't type a single character at the
console until the first check ended.

I repeated the experiment a few times, starting the checks one at a
time: the machine gets jerky with two checks running, and becomes
unresponsive with three.

The drives are all PATA, one pair on the motherboard (good ol' 440BX
chipset), and the others on a pair of Promise PDC20268 PCI cards.

I think this is a "duh! that's why speed_limit_max is there" thing,
but I didn't expect it to saturate the processor before the drives.
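
(In case anyone else trips over this, the obvious workaround is to
clamp the throttle before kicking off parallel checks.  A sketch of
what I'd try on a box this slow; 10000 KB/s is just an example value,
and you'd want to restore the old limit afterwards:

# echo 10000 > /proc/sys/dev/raid/speed_limit_max
# for i in /sys/block/md[567]/md/sync_action; do echo check > $i ; done
)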


Second, trying checks on a fast (2.2 GHz AMD64) machine, I'm surprised
at how slow it is:

md4 : active raid10 sdf3[4] sde3[3] sdd3[2] sdc3[1] sdb3[0] sda3[5]
      131837184 blocks 256K chunks 2 near-copies [6/6] [UUUUUU]
      [==================>..]  resync = 90.5% (119333248/131837184) finish=4.2min speed=48628K/sec

This is 6 ST3400832AS 400 GB SATA drives, each capable of 60 MB/s
sustained, on Sil3132 PCIe controllers with NCQ enabled.  I measured
over 300 MB/s sustained aggregate off a temporary RAID-0 device during
installation.
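
(For the curious, that sort of aggregate figure can be roughly
reproduced by reading all the disks at once and summing the per-device
rates; a sketch using GNU dd, with the count chosen arbitrarily:

# for i in /dev/sd[abcdef]3; do dd if=$i of=/dev/null bs=1M count=1024 iflag=direct & done; wait
)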

RAID-5 is even slower:
md5 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
      1719155200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  resync =  2.4% (8401536/343831040) finish=242.9min speed=23012K/sec
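
(One knob that might matter here, assuming the kernel exposes it for
raid5 arrays, is the stripe cache size; something like

# echo 4096 > /sys/block/md5/md/stripe_cache_size

where 4096 is an arbitrary trial value.  I haven't measured the effect.)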


To illustrate the hardware's capabilities:

# hdparm --direct -tT /dev/sd[abcdef]3

/dev/sda3:
 Timing O_DIRECT cached reads:   232 MB in  2.01 seconds = 115.37 MB/sec
 Timing O_DIRECT disk reads:  200 MB in  3.01 seconds =  66.48 MB/sec

/dev/sdb3:
 Timing O_DIRECT cached reads:   232 MB in  2.01 seconds = 115.62 MB/sec
 Timing O_DIRECT disk reads:  200 MB in  3.03 seconds =  66.06 MB/sec

/dev/sdc3:
 Timing O_DIRECT cached reads:   240 MB in  2.02 seconds = 118.81 MB/sec
 Timing O_DIRECT disk reads:  196 MB in  3.01 seconds =  65.13 MB/sec

/dev/sdd3:
 Timing O_DIRECT cached reads:   236 MB in  2.01 seconds = 117.47 MB/sec
 Timing O_DIRECT disk reads:  174 MB in  3.00 seconds =  57.98 MB/sec

/dev/sde3:
 Timing O_DIRECT cached reads:   236 MB in  2.02 seconds = 117.04 MB/sec
 Timing O_DIRECT disk reads:  198 MB in  3.03 seconds =  65.38 MB/sec

/dev/sdf3:
 Timing O_DIRECT cached reads:   240 MB in  2.00 seconds = 119.75 MB/sec
 Timing O_DIRECT disk reads:  186 MB in  3.01 seconds =  61.77 MB/sec

Or, more to the point (interleaved output fixed up by hand):

# for i in /dev/sd[abcdef]3; do hdparm --direct -tT $i & done
[2] 4104 [3] 4105 [4] 4106 [5] 4107 [6] 4108 [7] 4109
/dev/sda3: /dev/sdb3: /dev/sdc3: /dev/sdd3: /dev/sde3: /dev/sdf3:
 Timing O_DIRECT cached reads:   232 MB in  2.00 seconds = 115.85 MB/sec
 Timing O_DIRECT cached reads:   240 MB in  2.00 seconds = 119.75 MB/sec
 Timing O_DIRECT cached reads:   240 MB in  2.02 seconds = 118.84 MB/sec
 Timing O_DIRECT cached reads:   240 MB in  2.02 seconds = 118.66 MB/sec
 Timing O_DIRECT cached reads:   240 MB in  2.02 seconds = 118.62 MB/sec
 Timing O_DIRECT cached reads:   236 MB in  2.02 seconds = 116.57 MB/sec
 Timing O_DIRECT disk reads:  200 MB in  3.01 seconds =  66.41 MB/sec
 Timing O_DIRECT disk reads:  198 MB in  3.01 seconds =  65.80 MB/sec
 Timing O_DIRECT disk reads:  178 MB in  3.01 seconds =  59.07 MB/sec
 Timing O_DIRECT disk reads:  190 MB in  3.02 seconds =  62.88 MB/sec
 Timing O_DIRECT disk reads:  194 MB in  3.03 seconds =  64.09 MB/sec
 Timing O_DIRECT disk reads:  200 MB in  3.03 seconds =  66.08 MB/sec
[2]   Done                    hdparm --direct -tT $i
[3]   Done                    hdparm --direct -tT $i
[4]   Done                    hdparm --direct -tT $i
[5]   Done                    hdparm --direct -tT $i
[6]   Done                    hdparm --direct -tT $i
[7]-  Done                    hdparm --direct -tT $i

A quick test program emulating RAID-1 (appended) produced:

# ./xor /dev/sdb3 /dev/sdc3
Read 131072 K in 1944478 usec (69025068/sec)
Read 131072 K in 1952476 usec (68742318/sec)
Final sum: 0000000000000000
XOR time: 77007 usec (1742928928 bytes/sec)
# ./xor /dev/md4 /dev/md4 
Read 131072 K in 580483 usec (231217327/sec)
Read 131072 K in 583844 usec (229886284/sec)
Final sum: 0000000000000000
XOR time: 76901 usec (1745331374 bytes/sec)
# ./xor /dev/md5 /dev/md5
Read 131072 K in 484162 usec (277216568/sec)
Read 131072 K in 458060 usec (293013421/sec)
Final sum: 0000000000000000
XOR time: 76752 usec (1748719616 bytes/sec)

And that's without using prefetch or SSE, so I don't think the processor
is a bottleneck.  Any ideas why checking is not 3x faster?
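
(For anyone who wants to try it, the appended program builds with a
plain

# gcc -O2 -Wall -o xor xor.c

with -O2 worth having for the XOR loop.)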

=== xor.c ===
#define _GNU_SOURCE	/* For O_DIRECT */
#include <stdio.h>
#include <malloc.h>	/* For valloc */
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>

#define WORDS (16*1024*1024)	/* longs per buffer */
#define BYTES (WORDS * (unsigned)sizeof(long))	/* 128 MB with 8-byte longs */

static unsigned __attribute__((pure))
tv_diff(struct timeval const *start, struct timeval const *stop)
{
	return 1000000u * (stop->tv_sec - start->tv_sec) + stop->tv_usec - start->tv_usec;
}

int
main(int argc, char **argv)
{
	int fd1, fd2;
	long *p1 = valloc(2 * WORDS * sizeof *p1);	/* page-aligned, as O_DIRECT needs */
	long *p2 = p1 + WORDS;	/* second half of the same allocation */
	long sum = 0;
	unsigned i;
	struct timeval start, stop;
	ssize_t ss;

	if (argc != 3) {
		fputs("Expecting 2 arguments\n", stderr);
		return 1;
	}
	if (!p1) {
		perror("valloc");
		return 1;
	}
	fd1 = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd1 < 0) {
		perror(argv[1]);
		return 1;
	}
	fd2 = open(argv[2], O_RDONLY | O_DIRECT);
	if (fd2 < 0) {
		perror(argv[2]);
		return 1;
	}

	gettimeofday(&start, 0);
	ss = read(fd1, p1, BYTES);
	if (ss < (ssize_t)BYTES) {
		if (ss < 0)
			perror(argv[1]);
		else
			fprintf(stderr, "%s: short read (%zd)\n", argv[1], ss);
		return 1;
	}
	gettimeofday(&stop, 0);
	i = tv_diff(&start, &stop);
	printf("Read %u K in %u usec (%lu/sec)\n", BYTES/1024, i, 1000000ul*BYTES/i);

	ss = read(fd2, p2, BYTES);
	if (ss < (ssize_t)BYTES) {
		if (ss < 0)
			perror(argv[2]);
		else
			fprintf(stderr, "%s: short read (%zd)\n", argv[2], ss);
		return 1;
	}
	gettimeofday(&start, 0);
	i = tv_diff(&stop, &start);	/* interval includes the printf above */
	printf("Read %u K in %u usec (%lu/sec)\n", BYTES/1024, i, 1000000ul*BYTES/i);

	/* Identical mirrors XOR to zero, so sum should end up zero. */
	for (i = 0; i < WORDS; i++)
		sum |= p1[i] ^ p2[i];

	gettimeofday(&stop, 0);

	printf("Final sum: %016lx\n", sum);
	i = tv_diff(&start, &stop);
	printf("XOR time: %u usec (%lu bytes/sec)\n", i, 1000000ul * BYTES / i);
	return 0;
}

* Re: Interesting RAID checking observations
@ 2006-09-21  6:58 linux
From: linux @ 2006-09-21  6:58 UTC (permalink / raw)
  To: linux-raid; +Cc: linux, neilb

Just to follow up on my speed observations from last month on a 6x SATA
<-> 3x PCIe <-> AMD64 system: as of 2.6.18 final, RAID-10 checking runs
at a reasonable ~156 MB/s (which I presume means 312 MB/s of reads),
and raid5 is better than the 23 MB/s I complained about earlier, but
still a bit sluggish...

md5 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
      1719155200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [=>...................]  resync =  6.2% (21564928/343831040) finish=86.0min speed=62429K/sec

I'm not sure why the raid5 check can't run at 250 MB/s (the drives can
sustain 300 MB/s aggregate).  The processor is idle and can do a lot more than that:

raid5: automatically using best checksumming function: generic_sse
   generic_sse:  6769.000 MB/sec
raid5: using function: generic_sse (6769.000 MB/sec)
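
(A trick worth trying if the check is being throttled against phantom
"other activity" is to raise the floor, e.g.

# echo 100000 > /proc/sys/dev/raid/speed_limit_min

since the default minimum is only 1000 KB/s.  I haven't confirmed
that's what is limiting it here.)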


But anyway, it's better, so thank you!  I haven't rebooted the Celeron
that I hung for the duration of a RAID-1 check, so I haven't retested
it with 2.6.18 yet.
