linux-raid.vger.kernel.org archive mirror
* Interesting RAID checking observations
@ 2006-08-27  6:46 linux
  2006-08-27 11:18 ` Justin Piszcz
  2006-08-28  1:04 ` Neil Brown
  0 siblings, 2 replies; 11+ messages in thread
From: linux @ 2006-08-27  6:46 UTC (permalink / raw)
  To: linux-raid

Two discoveries: first, I locked up my machine, and second, it's surprisingly
slow.

I think the former is pilot error, and not a bug, but after applying
the raid-1 check patch (cherry-picked from the v2.6.18-rc4-mm3 tree at
git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git)
to a couple of machines, I decided to try it out.

This is an older 466 MHz P-II Celeron with 6 drives, and 1 GB of
each drive mirrored in pairs as swap space.  So I did a quick

# for i in /sys/block/md[567]/md/sync_action; do echo check > $i ; done

to watch them all proceeding in parallel.

But... I had /proc/sys/dev/raid/speed_limit_max set at 200000.

The machine became quite unresponsive for a minute or two
as the check proceeded.  Caps lock and console-switching worked,
as did Alt-SysRq, but I couldn't type a single character at the
console until the first check ended.

Trying it again, I repeated it a few times, starting the checks
one at a time: the machine gets jerky with two checks running,
and becomes unresponsive with three.

The drives are all PATA, one pair on the motherboard (good ol' 440BX
chipset), and the others on a pair of Promise PDC20268 PCI cards.

I think this is a "duh! that's why speed_limit_max is there" thing,
but I didn't expect it to saturate the processor before the drives.
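
(The obvious workaround, which I should have done first, is to throttle
the resync before kicking off several checks in parallel, something like

# echo 10000 > /proc/sys/dev/raid/speed_limit_max

where 10000 (KB/sec) is just an illustrative guess; I haven't measured
what this box can actually tolerate while staying responsive.)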


Second, trying checks on a fast (2.2 GHz AMD64) machine, I'm surprised
at how slow it is:

md4 : active raid10 sdf3[4] sde3[3] sdd3[2] sdc3[1] sdb3[0] sda3[5]
      131837184 blocks 256K chunks 2 near-copies [6/6] [UUUUUU]
      [==================>..]  resync = 90.5% (119333248/131837184) finish=4.2min speed=48628K/sec

This is 6 ST3400832AS 400 GB SATA drives, each capable of 60 MB/s
sustained, on Sil3132 PCIe controllers with NCQ enabled.  I measured
> 300 MB/sec sustained aggregate off a temporary RAID-0 device during
installation.

RAID-5 is even slower:
md5 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
      1719155200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  resync =  2.4% (8401536/343831040) finish=242.9min speed=23012K/sec
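
(One knob I haven't experimented with is the raid5 stripe cache; if this
kernel exposes it, something like

# echo 8192 > /sys/block/md5/md/stripe_cache_size

might affect the check speed, but that's untested guesswork on my part.)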


To illustrate the hardware's capabilities:

# hdparm --direct -tT /dev/sd[abcdef]3

/dev/sda3:
 Timing O_DIRECT cached reads:   232 MB in  2.01 seconds = 115.37 MB/sec
 Timing O_DIRECT disk reads:  200 MB in  3.01 seconds =  66.48 MB/sec

/dev/sdb3:
 Timing O_DIRECT cached reads:   232 MB in  2.01 seconds = 115.62 MB/sec
 Timing O_DIRECT disk reads:  200 MB in  3.03 seconds =  66.06 MB/sec

/dev/sdc3:
 Timing O_DIRECT cached reads:   240 MB in  2.02 seconds = 118.81 MB/sec
 Timing O_DIRECT disk reads:  196 MB in  3.01 seconds =  65.13 MB/sec

/dev/sdd3:
 Timing O_DIRECT cached reads:   236 MB in  2.01 seconds = 117.47 MB/sec
 Timing O_DIRECT disk reads:  174 MB in  3.00 seconds =  57.98 MB/sec

/dev/sde3:
 Timing O_DIRECT cached reads:   236 MB in  2.02 seconds = 117.04 MB/sec
 Timing O_DIRECT disk reads:  198 MB in  3.03 seconds =  65.38 MB/sec

/dev/sdf3:
 Timing O_DIRECT cached reads:   240 MB in  2.00 seconds = 119.75 MB/sec
 Timing O_DIRECT disk reads:  186 MB in  3.01 seconds =  61.77 MB/sec

Or, more to the point (interleaved output fixed up by hand):

# for i in /dev/sd[abcdef]3; do hdparm --direct -tT $i & done
[2] 4104 [3] 4105 [4] 4106 [5] 4107 [6] 4108 [7] 4109
/dev/sda3: /dev/sdb3: /dev/sdc3: /dev/sdd3: /dev/sde3: /dev/sdf3:
 Timing O_DIRECT cached reads:   232 MB in  2.00 seconds = 115.85 MB/sec
 Timing O_DIRECT cached reads:   240 MB in  2.00 seconds = 119.75 MB/sec
 Timing O_DIRECT cached reads:   240 MB in  2.02 seconds = 118.84 MB/sec
 Timing O_DIRECT cached reads:   240 MB in  2.02 seconds = 118.66 MB/sec
 Timing O_DIRECT cached reads:   240 MB in  2.02 seconds = 118.62 MB/sec
 Timing O_DIRECT cached reads:   236 MB in  2.02 seconds = 116.57 MB/sec
 Timing O_DIRECT disk reads:  200 MB in  3.01 seconds =  66.41 MB/sec
 Timing O_DIRECT disk reads:  198 MB in  3.01 seconds =  65.80 MB/sec
 Timing O_DIRECT disk reads:  178 MB in  3.01 seconds =  59.07 MB/sec
 Timing O_DIRECT disk reads:  190 MB in  3.02 seconds =  62.88 MB/sec
 Timing O_DIRECT disk reads:  194 MB in  3.03 seconds =  64.09 MB/sec
 Timing O_DIRECT disk reads:  200 MB in  3.03 seconds =  66.08 MB/sec
[2]   Done                    hdparm --direct -tT $i
[3]   Done                    hdparm --direct -tT $i
[4]   Done                    hdparm --direct -tT $i
[5]   Done                    hdparm --direct -tT $i
[6]   Done                    hdparm --direct -tT $i
[7]-  Done                    hdparm --direct -tT $i

A quick test program emulating RAID-1 (appended) produced:

# ./xor /dev/sdb3 /dev/sdc3
Read 131072 K in 1944478 usec (69025068/sec)
Read 131072 K in 1952476 usec (68742318/sec)
Final sum: 0000000000000000
XOR time: 77007 usec (1742928928 bytes/sec)
# ./xor /dev/md4 /dev/md4 
Read 131072 K in 580483 usec (231217327/sec)
Read 131072 K in 583844 usec (229886284/sec)
Final sum: 0000000000000000
XOR time: 76901 usec (1745331374 bytes/sec)
# ./xor /dev/md5 /dev/md5
Read 131072 K in 484162 usec (277216568/sec)
Read 131072 K in 458060 usec (293013421/sec)
Final sum: 0000000000000000
XOR time: 76752 usec (1748719616 bytes/sec)

And that's without using prefetch or SSE, so I don't think the processor
is a bottleneck.  Any ideas why checking is not 3x faster?

=== xor.c ===
#define _GNU_SOURCE	/* For O_DIRECT */
#include <stdio.h>
#include <malloc.h>	/* For valloc */
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>

#define WORDS (16*1024*1024)
#define BYTES (WORDS * (unsigned)sizeof(long))

static unsigned __attribute__((pure))
tv_diff(struct timeval const *start, struct timeval const *stop)
{
	return 1000000u * (stop->tv_sec - start->tv_sec) + stop->tv_usec - start->tv_usec;
}

int
main(int argc, char **argv)
{
	int fd1, fd2;
	long *p1 = valloc(2 * WORDS * sizeof *p1);
	long *p2 = p1 + WORDS;
	long sum = 0;
	unsigned i;
	struct timeval start, stop;
	ssize_t ss;

	if (argc != 3) {
		fputs("Expecting 2 arguments\n", stderr);
		return 1;
	}
	fd1 = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd1 < 0) {
		perror(argv[1]);
		return 1;
	}
	fd2 = open(argv[2], O_RDONLY | O_DIRECT);
	if (fd2 < 0) {
		perror(argv[2]);
		return 1;
	}

	gettimeofday(&start, 0);
	ss = read(fd1, p1, BYTES);
	if (ss < (ssize_t)BYTES) {
		if (ss < 0)
			perror(argv[1]);
		else
			fprintf(stderr, "%s: short read (%zd)\n", argv[1], ss);
		return 1;
	}
	gettimeofday(&stop, 0);
	i = tv_diff(&start, &stop);
	printf("Read %u K in %u usec (%lu/sec)\n", BYTES/1024, i, 1000000ul*BYTES/i);

	ss = read(fd2, p2, BYTES);
	if (ss < BYTES) {
		if (ss < 0)
			perror(argv[2]);
		else
			fprintf(stderr, "%s: short read (%zd)\n", argv[2], ss);
		return 1;
	}
	gettimeofday(&start, 0);
	i = tv_diff(&stop, &start);
	printf("Read %u K in %u usec (%lu/sec)\n", BYTES/1024, i, 1000000ul*BYTES/i);

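	/* OR together the XOR of every word pair; any nonzero bit means the copies differ */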
	for (i = 0; i < WORDS; i++)
		sum |= p1[i] ^ p2[i];

	gettimeofday(&stop, 0);

	printf("Final sum: %016lx\n", sum);
	i = tv_diff(&start, &stop);
	printf("XOR time: %u usec (%lu bytes/sec)\n", i, 1000000ul * BYTES / i);
	return 0;
}


* Re: Interesting RAID checking observations
  2006-08-27  6:46 Interesting RAID checking observations linux
@ 2006-08-27 11:18 ` Justin Piszcz
  2006-08-28 13:00   ` linux
  2006-08-28  1:04 ` Neil Brown
  1 sibling, 1 reply; 11+ messages in thread
From: Justin Piszcz @ 2006-08-27 11:18 UTC (permalink / raw)
  To: linux; +Cc: linux-raid

> Second, trying checks on a fast (2.2 GHz AMD64) machine, I'm surprised
> at how slow it is:

The PCI bus is only capable of 133 MB/s max.  Unless you have dedicated
SATA ports, each on its own PCI-e bus, you will not get speeds in excess
of 133 MB/s.  For 200 MB/s+, I have read reports of someone using 4-5 SATA
controllers (SiI 3112 cards on PCI-e x1 ports) who got around 200 MB/s
or so on a RAID-5; I assume that was read performance.
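
(For reference: classic 32-bit/33 MHz PCI tops out around 133 MB/s
theoretical, shared by everything on the bus, while six drives at
~60 MB/s each would need roughly 360 MB/s -- hence the suggestion to
spread controllers across dedicated PCI-e links.)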



On Sun, 27 Aug 2006, linux@horizon.com wrote:

> Two discoveries: first, I locked up my machine, and second, it's surprisingly
> slow.
>
> I think the former is pilot error, and not a bug, but after applying
> the raid-1 check patch (cherry-picked from the v2.6.18-rc4-mm3 tree at
> git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git)
> to a couple of machines, I decided to try it out.
>
> This is an older 466 MHz P-II Celeron with 6 drives, and 1 GB of
> each drive mirrored in pairs as swap space.  So I did a quick
>
> # for i in /sys/block/md[567]/md/sync_action; do echo check > $i ; done
>
> to watch them all proceeding in parallel.
>
> But... I had /proc/sys/dev/raid/speed_limit_max set at 200000.
>
> The machine became quite unresponsive for a minute or two
> as the check proceeded.  Caps lock and console-switching worked,
> as did Alt-SysRq, but I couldn't type a single character at the
> console until the first check ended.
>
> Trying it again, I repeated it a few times, starting the checks
> one at a time: the machine gets jerky with two checks running,
> and becomes unresponsive with three.
>
> The drives are all PATA, one pair on the motherboard (good ol' 440BX
> chipset), and the others on a pair of Promise PDC20268 PCI cards.
>
> I think this is a "duh! that's why speed_limit_max is there" thing,
> but I didn't expect it to saturate the processor before the drives.
>
>
> Second, trying checks on a fast (2.2 GHz AMD64) machine, I'm surprised
> at how slow it is:
>
> md4 : active raid10 sdf3[4] sde3[3] sdd3[2] sdc3[1] sdb3[0] sda3[5]
>      131837184 blocks 256K chunks 2 near-copies [6/6] [UUUUUU]
>      [==================>..]  resync = 90.5% (119333248/131837184) finish=4.2min speed=48628K/sec
>
> This is 6 ST3400832AS 400 GB SATA drives, each capable of 60 MB/s
> sustained, on Sil3132 PCIe controllers with NCQ enabled.  I measured
>> 300 MB/sec sustained aggregate off a temporary RAID-0 device during
> installation.
>
> RAID-5 is even slower:
> md5 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
>      1719155200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
>      [>....................]  resync =  2.4% (8401536/343831040) finish=242.9min speed=23012K/sec
>
>
> To illustrate the hardware's capabilities:
>
> # hdparm --direct -tT /dev/sd[abcdef]3
>
> /dev/sda3:
> Timing O_DIRECT cached reads:   232 MB in  2.01 seconds = 115.37 MB/sec
> Timing O_DIRECT disk reads:  200 MB in  3.01 seconds =  66.48 MB/sec
>
> /dev/sdb3:
> Timing O_DIRECT cached reads:   232 MB in  2.01 seconds = 115.62 MB/sec
> Timing O_DIRECT disk reads:  200 MB in  3.03 seconds =  66.06 MB/sec
>
> /dev/sdc3:
> Timing O_DIRECT cached reads:   240 MB in  2.02 seconds = 118.81 MB/sec
> Timing O_DIRECT disk reads:  196 MB in  3.01 seconds =  65.13 MB/sec
>
> /dev/sdd3:
> Timing O_DIRECT cached reads:   236 MB in  2.01 seconds = 117.47 MB/sec
> Timing O_DIRECT disk reads:  174 MB in  3.00 seconds =  57.98 MB/sec
>
> /dev/sde3:
> Timing O_DIRECT cached reads:   236 MB in  2.02 seconds = 117.04 MB/sec
> Timing O_DIRECT disk reads:  198 MB in  3.03 seconds =  65.38 MB/sec
>
> /dev/sdf3:
> Timing O_DIRECT cached reads:   240 MB in  2.00 seconds = 119.75 MB/sec
> Timing O_DIRECT disk reads:  186 MB in  3.01 seconds =  61.77 MB/sec
>
> Or, more to the point (interleaved output fixed up by hand):
>
> # for i in /dev/sd[abcdef]3; do hdparm --direct -tT $i & done
> [2] 4104 [3] 4105 [4] 4106 [5] 4107 [6] 4108 [7] 4109
> /dev/sda3: /dev/sdb3: /dev/sdc3: /dev/sdd3: /dev/sde3: /dev/sdf3:
> Timing O_DIRECT cached reads:   232 MB in  2.00 seconds = 115.85 MB/sec
> Timing O_DIRECT cached reads:   240 MB in  2.00 seconds = 119.75 MB/sec
> Timing O_DIRECT cached reads:   240 MB in  2.02 seconds = 118.84 MB/sec
> Timing O_DIRECT cached reads:   240 MB in  2.02 seconds = 118.66 MB/sec
> Timing O_DIRECT cached reads:   240 MB in  2.02 seconds = 118.62 MB/sec
> Timing O_DIRECT cached reads:   236 MB in  2.02 seconds = 116.57 MB/sec
> Timing O_DIRECT disk reads:  200 MB in  3.01 seconds =  66.41 MB/sec
> Timing O_DIRECT disk reads:  198 MB in  3.01 seconds =  65.80 MB/sec
> Timing O_DIRECT disk reads:  178 MB in  3.01 seconds =  59.07 MB/sec
> Timing O_DIRECT disk reads:  190 MB in  3.02 seconds =  62.88 MB/sec
> Timing O_DIRECT disk reads:  194 MB in  3.03 seconds =  64.09 MB/sec
> Timing O_DIRECT disk reads:  200 MB in  3.03 seconds =  66.08 MB/sec
> [2]   Done                    hdparm --direct -tT $i
> [3]   Done                    hdparm --direct -tT $i
> [4]   Done                    hdparm --direct -tT $i
> [5]   Done                    hdparm --direct -tT $i
> [6]   Done                    hdparm --direct -tT $i
> [7]-  Done                    hdparm --direct -tT $i
>
> A quick test program emulating RAID-1 (appended) produced:
>
> # ./xor /dev/sdb3 /dev/sdc3
> Read 131072 K in 1944478 usec (69025068/sec)
> Read 131072 K in 1952476 usec (68742318/sec)
> Final sum: 0000000000000000
> XOR time: 77007 usec (1742928928 bytes/sec)
> # ./xor /dev/md4 /dev/md4
> Read 131072 K in 580483 usec (231217327/sec)
> Read 131072 K in 583844 usec (229886284/sec)
> Final sum: 0000000000000000
> XOR time: 76901 usec (1745331374 bytes/sec)
> # ./xor /dev/md5 /dev/md5
> Read 131072 K in 484162 usec (277216568/sec)
> Read 131072 K in 458060 usec (293013421/sec)
> Final sum: 0000000000000000
> XOR time: 76752 usec (1748719616 bytes/sec)
>
> And that's without using prefetch or SSE, so I don't think the processor
> is a bottleneck.  Any ideas why checking is not 3x faster?
>
> === xor.c ===
> #define _GNU_SOURCE	/* For O_DIRECT */
> #include <stdio.h>
> #include <malloc.h>	/* For valloc */
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/time.h>
>
> #define WORDS (16*1024*1024)
> #define BYTES (WORDS * (unsigned)sizeof(long))
>
> static unsigned __attribute__((pure))
> tv_diff(struct timeval const *start, struct timeval const *stop)
> {
> 	return 1000000u * (stop->tv_sec - start->tv_sec) + stop->tv_usec - start->tv_usec;
> }
>
> int
> main(int argc, char **argv)
> {
> 	int fd1, fd2;
> 	long *p1 = valloc(2 * WORDS * sizeof *p1);
> 	long *p2 = p1 + WORDS;
> 	long sum = 0;
> 	unsigned i;
> 	struct timeval start, stop;
> 	ssize_t ss;
>
> 	if (argc != 3) {
> 		fputs("Expecting 2 arguments\n", stderr);
> 		return 1;
> 	}
> 	fd1 = open(argv[1], O_RDONLY | O_DIRECT);
> 	if (fd1 < 0) {
> 		perror(argv[1]);
> 		return 1;
> 	}
> 	fd2 = open(argv[2], O_RDONLY | O_DIRECT);
> 	if (fd2 < 0) {
> 		perror(argv[2]);
> 		return 1;
> 	}
>
> 	gettimeofday(&start, 0);
> 	ss = read(fd1, p1, BYTES);
> 	if (ss < (ssize_t)BYTES) {
> 		if (ss < 0)
> 			perror(argv[1]);
> 		else
> 			fprintf(stderr, "%s: short read (%zd)\n", argv[1], ss);
> 		return 1;
> 	}
> 	gettimeofday(&stop, 0);
> 	i = tv_diff(&start, &stop);
> 	printf("Read %u K in %u usec (%lu/sec)\n", BYTES/1024, i, 1000000ul*BYTES/i);
>
> 	ss = read(fd2, p2, BYTES);
> 	if (ss < BYTES) {
> 		if (ss < 0)
> 			perror(argv[2]);
> 		else
> 			fprintf(stderr, "%s: short read (%zd)\n", argv[2], ss);
> 		return 1;
> 	}
> 	gettimeofday(&start, 0);
> 	i = tv_diff(&stop, &start);
> 	printf("Read %u K in %u usec (%lu/sec)\n", BYTES/1024, i, 1000000ul*BYTES/i);
>
> 	for (i = 0; i < WORDS; i++)
> 		sum |= p1[i] ^ p2[i];
>
> 	gettimeofday(&stop, 0);
>
> 	printf("Final sum: %016lx\n", sum);
> 	i = tv_diff(&start, &stop);
> 	printf("XOR time: %u usec (%lu bytes/sec)\n", i, 1000000ul * BYTES / i);
> 	return 0;
> }


* Re: Interesting RAID checking observations
  2006-08-27  6:46 Interesting RAID checking observations linux
  2006-08-27 11:18 ` Justin Piszcz
@ 2006-08-28  1:04 ` Neil Brown
  2006-08-28 13:15   ` linux
  1 sibling, 1 reply; 11+ messages in thread
From: Neil Brown @ 2006-08-28  1:04 UTC (permalink / raw)
  To: linux; +Cc: linux-raid

On August 27, linux@horizon.com wrote:
> Two discoveries: first, I locked up my machine, and second, it's surprisingly
> slow.
...
> 
> I think this is a "duh! that's why seed_limit_max is there" thing,
> but I didn't expect it to saturate the processor before the drives.
> 

I don't think the processor is saturating.  I've seen reports of this
sort of thing before and until recently had no idea what was happening,
couldn't reproduce it, and couldn't think of any more useful data to
collect.
I think I may have recently stumbled across the problem though.
md doesn't support 'bdi_write_congested' simply because I never knew
about it until recently.
This means that when the VM is trying to gently flush out dirty memory,
it checks to see if an MD device is congested, sees (wrongly) that it
isn't, and happily tries to write data to it.  But if it really is
congested (e.g. because a resync is using up all the bandwidth so the
device queues are full), the VM blocks when it doesn't expect to.
I'm not sure exactly how that translates into the machine being
unresponsive, but it wouldn't surprise me.  I'll try to post a patch
for that to linux-raid soonish.

How much RAM does this system have?


> 
> Second, trying checks on a fast (2.2 GHz AMD64) machine, I'm surprised
> at how slow it is:
> 
> md4 : active raid10 sdf3[4] sde3[3] sdd3[2] sdc3[1] sdb3[0] sda3[5]
>       131837184 blocks 256K chunks 2 near-copies [6/6] [UUUUUU]
>       [==================>..]  resync = 90.5% (119333248/131837184) finish=4.2min speed=48628K/sec
> 
> This is 6 ST3400832AS 400 GB SATA drives, each capable of 60 MB/s
> sustained, on Sil3132 PCIe controllers with NCQ enabled.  I measured
> > 300 MB/sec sustained aggregate off a temporary RAID-0 device during
> installation.
> 
> RAID-5 is even slower:
> md5 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
>       1719155200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
>       [>....................]  resync =  2.4% (8401536/343831040) finish=242.9min speed=23012K/sec
> 

Again, you aren't the first to report this.  I haven't seen it myself,
but my regular test machine is busy at the moment.  I'll try to
organise some testing soon.

Thanks for the reports.

NeilBrown


* Re: Interesting RAID checking observations
  2006-08-27 11:18 ` Justin Piszcz
@ 2006-08-28 13:00   ` linux
  0 siblings, 0 replies; 11+ messages in thread
From: linux @ 2006-08-28 13:00 UTC (permalink / raw)
  To: jpiszcz, linux; +Cc: linux-raid

> The PCI bus is only capable of 133 MB/s max.  Unless you have dedicated
> SATA ports, each on its own PCI-e bus, you will not get speeds in excess
> of 133 MB/s.  For 200 MB/s+, I have read reports of someone using 4-5 SATA
> controllers (SiI 3112 cards on PCI-e x1 ports) who got around 200 MB/s
> or so on a RAID-5; I assume that was read performance.

Um, I *do* have dedicated SATA ports, each pair on its own PCIe bus.
You might recall I wrote:

>> This is 6 ST3400832AS 400 GB SATA drives, each capable of 60 MB/s
>> sustained, on Sil3132 PCIe controllers with NCQ enabled.  I measured
>> > 300 MB/sec sustained aggregate off a temporary RAID-0 device during
>> installation.

I also included examples of reading 60 MB/s off 6 drives in parallel
(360 MB/s aggregate), and reading off RAID-5 at 277 MB/s.

To be specific, the narrowest bottleneck between drive and RAM is the
250 MB/s PCIe link shared by each pair of drives.  To quote approximate
one-way bandwidths:

    1.5 Gb/s SATA         2.5 Gb/s PCIe
Drive <--------> Sil3132
Drive <--------> Dual SATA <--------\
                                     \ 
Drive <--------> Sil3132              \  16x HyperTransport     2x DDR
Drive <--------> Dual SATA <-------> nForce4 <-----> CPU <-----> RAM
                                      /     4000 MB/s    6400 MB/s
Drive <--------> Sil3132             /
Drive <--------> Dual SATA <--------/
    150 MB/s each         250 MB/s each
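
So each 250 MB/s PCIe link only has to carry two drives' worth of data
(2 x ~60-66 MB/s, about 130 MB/s), and the HyperTransport and memory
links have far more headroom than that, which is why I don't see an
obvious bus bottleneck.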


* Re: Interesting RAID checking observations
  2006-08-28  1:04 ` Neil Brown
@ 2006-08-28 13:15   ` linux
  2006-09-04 17:16     ` Bill Davidsen
  0 siblings, 1 reply; 11+ messages in thread
From: linux @ 2006-08-28 13:15 UTC (permalink / raw)
  To: linux, neilb; +Cc: linux-raid

> I don't think the processor is saturating.  I've seen reports of this
> sort of thing before and until recently had no idea what was happening,
> couldn't reproduce it, and couldn't think of any more useful data to
> collect.

Well I can reproduce it easily enough.  It's a production server, but
I can do low-risk experiments after hours.

I'd like to note that the symptoms include not even being
able to *type* at the console, which I thought was all in-kernel
code, not subject to being swapped out.  But whatever.

> I'm not sure exactly how that translates into the machine being
> unresponsive, but it wouldn't surprise me.  I'll try to post a patch
> for that to linux-raid soonish.

I look forward to testing it!

> How much RAM does this system have?

1 GB (ECC).  133 MHz SDR SDRAM.

> Again, you aren't the first to report this.  I haven't seen it myself,
> but my regular test machine is busy at the moment.  I'll try to
> organise some testing soon.

I can do a fair bit of experimentation with this machine, as long as I
don't lose the data that's on it.

> Thanks for the reports.

Thanks for the response!  Sorry to be a pain.


* Re: Interesting RAID checking observations
  2006-08-28 13:15   ` linux
@ 2006-09-04 17:16     ` Bill Davidsen
  2006-09-04 21:15       ` linux
  0 siblings, 1 reply; 11+ messages in thread
From: Bill Davidsen @ 2006-09-04 17:16 UTC (permalink / raw)
  To: linux; +Cc: neilb, linux-raid

linux@horizon.com wrote:

>>I don't think the processor is saturating.  I've seen reports of this
>>sort of thing before and until recently had no idea what was happening,
>>couldn't reproduce it, and couldn't think of any more useful data to
>>collect.
>>    
>>
>
>Well I can reproduce it easily enough.  It's a production server, but
>I can do low-risk experiments after hours.
>
>I'd like to note that the symptoms include not even being
>able to *type* at the console, which I thought was all in-kernel
>code, not subject to being swapped out.  But whatever.
>
Really? Or is it just that you can type but the characters don't get
echoed? The typing part is in the kernel, but the display involves X
unless you run a direct console.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: Interesting RAID checking observations
  2006-09-04 17:16     ` Bill Davidsen
@ 2006-09-04 21:15       ` linux
  0 siblings, 0 replies; 11+ messages in thread
From: linux @ 2006-09-04 21:15 UTC (permalink / raw)
  To: davidsen; +Cc: linux-raid, linux, neilb

>> I'd like to note that the symptoms include not even being
>> able to *type* at the console, which I thought was all in-kernel
>> code, not subject to being swapped out.  But whatever.
>
> Really? Or is it just that you can type but the characters don't get
> echoed? The typing part is in the kernel, but the display involves X
> unless you run a direct console.

No, as I said, it was at a (text-mode) console, /dev/tty2 as I recall.
You're right, I expect it was just not echoing, as I could switch
consoles fine, implying that the lowest-level interrupt handler was
working, but it's still surprising.

The server stopped responding to my login, so I wandered over to the
console to look for kernel panics and encountered the symptom.


* Re: Interesting RAID checking observations
@ 2006-09-21  6:58 linux
  2006-09-21 18:48 ` linux
  2006-09-21 18:55 ` Rob Bray
  0 siblings, 2 replies; 11+ messages in thread
From: linux @ 2006-09-21  6:58 UTC (permalink / raw)
  To: linux-raid; +Cc: linux, neilb

Just to follow up my speed observations last month on a 6x SATA <-> 3x
PCIe <-> AMD64 system, as of 2.6.18 final, RAID-10 checking is running
at a reasonable ~156 MB/s (which I presume means 312 MB/s of reads),
and raid5 is better than the 23 MB/s I complained about earlier, but
still a bit sluggish...

md5 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
      1719155200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [=>...................]  resync =  6.2% (21564928/343831040) finish=86.0min speed=62429K/sec

I'm not sure why the raid5 check can't run at 250 MB/s (300 MB/s disk
speed).  The processor is idle and can do a lot more than that:

raid5: automatically using best checksumming function: generic_sse
   generic_sse:  6769.000 MB/sec
raid5: using function: generic_sse (6769.000 MB/sec)
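
(One thing I haven't ruled out is the sync speed cap itself.  If this
kernel has the per-array knobs, something like

# cat /sys/block/md5/md/sync_speed_max
# echo 250000 > /sys/block/md5/md/sync_speed_max

would show whether a speed_limit_max-style ceiling is what's holding it
near 62 MB/s, but I haven't tried that yet.)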


But anyway, it's better, so thank you!  I haven't rebooted the celeron
I hung for the duration of a RAID-1 check, so I haven't checked that with
2.6.18 yet.


* Re: Interesting RAID checking observations
  2006-09-21  6:58 linux
@ 2006-09-21 18:48 ` linux
  2006-09-21 18:55 ` Rob Bray
  1 sibling, 0 replies; 11+ messages in thread
From: linux @ 2006-09-21 18:48 UTC (permalink / raw)
  To: linux-raid; +Cc: linux, neilb

> But anyway, it's better, so thank you!  I haven't rebooted the celeron
> I hung for the duration of a RAID-1 check, so I haven't checked that with
> 2.6.18 yet.

Followup: 2.6.18 installed and tested; no change.  The machine still
"goes away" for the duration if three RAID-1 checks are run in parallel,
and comes back to life only when the first one finishes.


* Re: Interesting RAID checking observations
  2006-09-21  6:58 linux
  2006-09-21 18:48 ` linux
@ 2006-09-21 18:55 ` Rob Bray
  2006-09-22  2:13   ` linux
  1 sibling, 1 reply; 11+ messages in thread
From: Rob Bray @ 2006-09-21 18:55 UTC (permalink / raw)
  Cc: linux-raid, linux

> Just to follow up my speed observations last month on a 6x SATA <-> 3x
> PCIe <-> AMD64 system, as of 2.6.18 final, RAID-10 checking is running
> at a reasonable ~156 MB/s (which I presume means 312 MB/s of reads),
> and raid5 is better than the 23 MB/s I complained about earlier, but
> still a bit sluggish...
>
> md5 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
>       1719155200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
>       [=>...................]  resync =  6.2% (21564928/343831040)
> finish=86.0min speed=62429K/sec
>
> I'm not sure why the raid5 check can't run at 250 MB/s (300 MB/s disk
> speed).  The processor is idle and can do a lot more than that:
>
> raid5: automatically using best checksumming function: generic_sse
>    generic_sse:  6769.000 MB/sec
> raid5: using function: generic_sse (6769.000 MB/sec)
>
>
> But anyway, it's better, so thank you!  I haven't rebooted the celeron
> I hung for the duration of a RAID-1 check, so I haven't checked that with
> 2.6.18 yet.

Check the I/O performance on the box. I think the "speed" indicator comes
out of calculations to determine how fast a failing drive would be
rebuilt, were you doing a rebuild instead of a check. I like using the
"dstat" tool to get that info at-a-glance
(http://dag.wieers.com/home-made/dstat/).
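
Something along the lines of

# dstat -d -D sda,sdb,sdc,sdd,sde,sdf,total 1

(adjust the device list to match your drives) will show per-drive
throughput while the check runs, though I'm going from memory on the
exact options.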



* Re: Interesting RAID checking observations
  2006-09-21 18:55 ` Rob Bray
@ 2006-09-22  2:13   ` linux
  0 siblings, 0 replies; 11+ messages in thread
From: linux @ 2006-09-22  2:13 UTC (permalink / raw)
  To: raid; +Cc: linux-raid, linux

> Check the I/O performance on the box. I think the "speed" indicator comes
> out of calculations to determine how fast a failing drive would be
> rebuilt, were you doing a rebuild instead of a check. I like using the
> "dstat" tool to get that info at-a-glance
> (http://dag.wieers.com/home-made/dstat/).

Thank you, you're quite right!  Running dstat during the check shows

--dsk/sda-----dsk/sdb-----dsk/sdc-----dsk/sdd-----dsk/sde-----dsk/sdf----dsk/total-
_read _writ:_read _writ:_read _writ:_read _writ:_read _writ:_read _writ:_read _writ
2954k   37k:2952k   36k:2954k   37k:2952k   35k:2955k   37k:2952k   35k:  17M  217k
  62M    0 :  62M    0 :  62M    0 :  62M    0 :  61M    0 :  62M    0 : 370M    0 
  62M    0 :  62M    0 :  62M    0 :  62M    0 :  62M    0 :  62M    0 : 372M    0 
  62M 8192B:  62M   76k:  62M   76k:  62M   36k:  62M   36k:  62M 8192B: 373M  240k
  47M 8192B:  47M 8192B:  47M 8192B:  47M 8192B:  47M 8192B:  47M 8192B: 283M   48k
  59M   16k:  59M   16k:  59M   16k:  59M  100k:  59M  100k:  59M   16k: 353M  264k
  63M    0 :  63M    0 :  63M    0 :  63M    0 :  63M    0 :  63M    0 : 378M    0 
  58M    0 :  58M    0 :  58M    0 :  58M    0 :  58M    0 :  58M    0 : 348M    0 
  57M    0 :  58M    0 :  58M    0 :  57M    0 :  58M    0 :  58M    0 : 345M    0 
  61M    0 :  61M    0 :  61M    0 :  61M    0 :  61M    0 :  61M    0 : 365M    0 
  61M    0 :  60M    0 :  60M    0 :  61M    0 :  60M    0 :  60M    0 : 363M    0 
  60M   16k:  60M   20k:  59M   20k:  59M   28k:  59M   28k:  60M   16k: 356M  128k

Which is 60 MB/s per drive, exactly as expected.

On RAID-5, I could call it a "redundancy checking rate", but that doesn't
correspond to the RAID-1 numbers:

md0 : active raid1 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
      979840 blocks [6/6] [UUUUUU]
      [==>..................]  resync = 14.3% (140480/979840) finish=0.4min speed=28096K/sec

Judging from dstat, that 28 MB/s figure is per-drive.  (And I wonder
why *that* is so low.)


But anyway, good work on RAID-5!  Multi-way RAID-1 isn't economical on large
volumes of data, so it's hard to get very concerned about the speed.  I *can*
get > 60 MB/s per drive on 3 parallel 2-way RAID-1 checks.


Thread overview: 11+ messages
2006-08-27  6:46 Interesting RAID checking observations linux
2006-08-27 11:18 ` Justin Piszcz
2006-08-28 13:00   ` linux
2006-08-28  1:04 ` Neil Brown
2006-08-28 13:15   ` linux
2006-09-04 17:16     ` Bill Davidsen
2006-09-04 21:15       ` linux
  -- strict thread matches above, loose matches on Subject: below --
2006-09-21  6:58 linux
2006-09-21 18:48 ` linux
2006-09-21 18:55 ` Rob Bray
2006-09-22  2:13   ` linux
