* non volatile ram devices
@ 2002-12-04 19:59 Russell Coker
2002-12-04 20:24 ` Ragnar Kjørstad
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Russell Coker @ 2002-12-04 19:59 UTC (permalink / raw)
To: linux-ide-arrays; +Cc: ReiserFS
I have some servers that are giving inadequate disk performance for Maildir
mail spools. They are running kernel 2.4.19 (2.4.20 upgrade is planned) and
using ReiserFS for everything that's important.
At this stage it is impossible for me to replace disks, RAID controllers, or
anything else really significant.
What I am thinking of doing is using a kernel that supports data journalling
which should increase performance, but still probably won't give me enough.
So I am thinking of using an "external journal" (or using software RAID to
put the part of the partition containing the journal on a different device).
The device containing the journal would be something much faster than physical
media. I have been doing some research on non-volatile memory devices. I
only found one company producing disks that are RAM based with battery
backup, and they seem to start at $10K (too expensive - probably because they
are much larger than I need, I need 128M at most, they provide 2G). I found
many companies selling flash memory, but that only takes a million writes
(that'll last about an hour for the use I plan). I found one company selling
PC-Card devices that have two batteries for backup, but that requires getting
a PCI controller for PC-Cards (something I haven't tried before).
Does anyone know of an affordable ($1000 or less) device that can survive
unexpected power outages of at least 24 hours duration, can commit a write in
less than 1ms, supports unlimited writes, and connects to an IDE or SCSI bus
(or PCI if there's a suitable Linux driver)?
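The "about an hour" estimate for flash can be checked with a little arithmetic. The sketch below is an illustration only: it assumes the worst case in which every write lands on the same flash block (no wear levelling), and the function names are made up for this example. The figure of a million writes comes from the message above; a write rate has to be supplied by the caller.

```c
#include <stdio.h>

/*
 * Seconds until one flash block wears out, assuming (worst case) that every
 * write lands on the same block, i.e. no wear levelling at all.
 * Returns -1.0 if the write rate is not positive.
 */
double flash_lifetime_seconds(double endurance_writes, double writes_per_second)
{
    if (writes_per_second <= 0.0)
        return -1.0;
    return endurance_writes / writes_per_second;
}

/* Print the worst-case lifetime in hours for one write rate. */
void print_flash_lifetime(double endurance_writes, double writes_per_second)
{
    double seconds = flash_lifetime_seconds(endurance_writes, writes_per_second);

    if (seconds < 0.0) {
        printf("no writes, no wear\n");
        return;
    }
    printf("%.0f writes/s -> %.2f hours\n", writes_per_second, seconds / 3600.0);
}
```

With the peak of 160 writes per second reported later in this thread, `flash_lifetime_seconds(1e6, 160.0)` gives 6250 seconds, a little under two hours, which is the same order of magnitude as the estimate in the message.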
--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: non volatile ram devices
2002-12-04 19:59 non volatile ram devices Russell Coker
@ 2002-12-04 20:24 ` Ragnar Kjørstad
2002-12-05 9:00 ` Russell Coker
2002-12-04 22:05 ` Hans Reiser
2002-12-05 6:32 ` Oleg Drokin
2 siblings, 1 reply; 20+ messages in thread
From: Ragnar Kjørstad @ 2002-12-04 20:24 UTC (permalink / raw)
To: Russell Coker; +Cc: linux-ide-arrays, ReiserFS

On Wed, Dec 04, 2002 at 08:59:35PM +0100, Russell Coker wrote:
> I have some servers that are giving inadequate disk performance for Maildir
> mail spools. They are running kernel 2.4.19 (2.4.20 upgrade is planned) and
> using ReiserFS for everything that's important.

One thing you might consider is replacing the reiserfs hash with a
maildir-specific hash. In my rather limited testing I found that it was
significantly faster; I think some tests gave 200-300% speed
improvement.

But, as I said, there was only limited testing. Don't go this route
unless you have the time to test it properly both for stability and
performance.

> What I am thinking of doing is using a kernel that supports data journalling
> which should increase performance, but still probably won't give me enough.
> So I am thinking of using an "external journal" (or using software RAID to
> put the part of the partition containing the journal on a different device).
>
> The device containing the journal would be something much faster than physical
> media.

Even if the device is just a regular disk it should give you a real
performance boost. Depending on your RAID-setup, it may not be the
throughput, but the seeking back and forth between the journal and the
rest of the disk that kills performance. Having the journal on a
separate disk solves that problem.

> Does anyone know of an affordable ($1000 or less) device that can survive
> unexpected power outages of at least 24 hours duration, can commit a write in
> less than 1ms, supports unlimited writes, and connects to an IDE or SCSI bus
> (or PCI if there's a suitable Linux driver)?

Did you check out Micro Memory Inc? (http://www.umem.com/)
I think they have some PCI-cards (with linux-drivers) which may be
suitable for this.

However, the main strength of flash/RAM devices is that you can do
random writes very fast. For a journal device all access will be
sequential, so there may not be much advantage compared to using a
separate disk for the journal? I've never tried, so I'm not sure exactly
how well it would work.

Is your server read- or write-bound? I've found that some mailservers
are IO-bound because of reads (I guess pop- and imap-servers that are
polling), and then the external journal is not likely to help.

--
Ragnar Kjørstad
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: non volatile ram devices
2002-12-04 20:24 ` Ragnar Kjørstad
@ 2002-12-05 9:00 ` Russell Coker
2002-12-05 10:38 ` Ragnar Kjørstad
2002-12-05 13:23 ` Chris Mason
0 siblings, 2 replies; 20+ messages in thread
From: Russell Coker @ 2002-12-05 9:00 UTC (permalink / raw)
To: Ragnar Kjørstad; +Cc: linux-ide-arrays, ReiserFS, Mike Jadon

On Wed, 4 Dec 2002 21:24, Ragnar Kjørstad wrote:
> On Wed, Dec 04, 2002 at 08:59:35PM +0100, Russell Coker wrote:
> > I have some servers that are giving inadequate disk performance for
> > Maildir mail spools. They are running kernel 2.4.19 (2.4.20 upgrade is
> > planned) and using ReiserFS for everything that's important.
>
> One thing you might consider is replacing the reiserfs hash with a
> maildir-specific hash. In my rather limited testing I found that it was
> significantly faster; I think some tests gave 200-300% speed
> improvement.
>
> But, as I said, there was only limited testing. Don't go this route
> unless you have the time to test it properly both for stability and
> performance.

Thanks for the suggestion. However I don't think that I have the resources to
develop and adequately test such a change.

Also I doubt that this will help much for my use: I am seeing 160 writes per
second but only 20 reads per second at peak load times. So I think that the
caching is doing well (and directory sizes aren't too big because of quotas).

> > The device containing the journal would be something much faster than
> > physical media.
>
> Even if the device is just a regular disk it should give you a real
> performance boost. Depending on your RAID-setup, it may not be the
> throughput, but the seeking back and forth between the journal and the
> rest of the disk that kills performance. Having the journal on a
> separate disk solves that problem.

True. However I could only put in a single extra disk, and I don't want to
use non-RAID...

> > Does anyone know of an affordable ($1000 or less) device that can survive
> > unexpected power outages of at least 24 hours duration, can commit a
> > write in less than 1ms, supports unlimited writes, and connects to an IDE
> > or SCSI bus (or PCI if there's a suitable Linux driver)?
>
> Did you check out Micro Memory Inc? (http://www.umem.com/)
> I think they have some PCI-cards (with linux-drivers) which may be
> suitable for this.

Thanks to everyone who recommended that, I'll check it out.

Based on the prices that Mike offered it seems crazy to go for a mere 128M; I
think that a 1G card would do best. I could use it for the journal of the mail
store file system and for the entire mail spool. This should multiply mail
delivery performance by a factor of at least 4 I think!

Given the price difference between 128M and 1G, maybe I should be looking at
2G...

> However, the main strength of flash/RAM devices is that you can do
> random writes very fast. For a journal device all access will be
> sequential, so there may not be much advantage compared to using a
> separate disk for the journal? I've never tried, so I'm not sure exactly
> how well it would work.

One significant advantage of RAM is that there's almost zero latency. Doing a
synchronous write to disk (i.e. any journal write) takes a significant amount
of time. Moving that to RAM should improve things a lot.

> Is your server read- or write-bound? I've found that some mailservers
> are IO-bound because of reads (I guess pop- and imap-servers that are
> polling), and then the external journal is not likely to help.

In this case the machines each have 4G of RAM. The total RAM for the mail
cluster is four times what was used for the Solaris cluster, and Intel X86
architecture uses less RAM than SPARC (32bit CISC vs 64bit RISC) and I
suspect that the software we're now using (Qmail and Courier) is more memory
efficient than Netscape too. Overall we have heaps more cache memory than
before, I'm seeing 20 reads per second and 160 writes per second at times of
peak load.

I don't think that the ratio of reads and writes will change: the people who
have their machines constantly polling for mail are the ones who receive the
most mail, so for reads it all stays in cache. When we scale the machines up
to more users, if the RAM proves inadequate then we can always upgrade the
servers to 8G of RAM each if necessary...

--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
^ permalink raw reply [flat|nested] 20+ messages in thread
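The latency argument in the message above (a synchronous write to disk is slow, the same write to battery-backed RAM should be nearly free) is easy to measure on any candidate device. The following is a minimal sketch, not a benchmark from this thread: the function name and its parameters are invented for illustration, and it simply times a loop of write() plus fsync() on one file using standard POSIX calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#endif

#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Append `count` records of `size` bytes to `path`, calling fsync() after
 * each one, and return the mean wall-clock seconds per write+fsync pair.
 * Returns -1.0 on any error. The file is removed afterwards.
 */
double mean_sync_write_seconds(const char *path, int count, size_t size)
{
    char buf[4096];
    struct timespec start, end;
    int fd, i;

    if (count <= 0 || size == 0 || size > sizeof(buf))
        return -1.0;
    memset(buf, 'x', sizeof(buf));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1.0;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        goto fail;
    for (i = 0; i < count; i++) {
        if (write(fd, buf, size) != (ssize_t)size || fsync(fd) != 0)
            goto fail;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        goto fail;

    close(fd);
    unlink(path);
    return ((double)(end.tv_sec - start.tv_sec) +
            (double)(end.tv_nsec - start.tv_nsec) / 1e9) / count;

fail:
    close(fd);
    unlink(path);
    return -1.0;
}
```

Calling it once with a path on the disk array and once with a path on the RAM device gives a rough per-write latency for each. It measures a single process only, so it says nothing about how the two behave under many concurrent writers.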
* Re: non volatile ram devices
2002-12-05 9:00 ` Russell Coker
@ 2002-12-05 10:38 ` Ragnar Kjørstad
2002-12-05 10:45 ` Russell Coker
2002-12-05 13:23 ` Chris Mason
1 sibling, 1 reply; 20+ messages in thread
From: Ragnar Kjørstad @ 2002-12-05 10:38 UTC (permalink / raw)
To: Russell Coker; +Cc: linux-ide-arrays, ReiserFS, Mike Jadon

On Thu, Dec 05, 2002 at 10:00:32AM +0100, Russell Coker wrote:
> > Even if the device is just a regular disk it should give you a real
> > performance boost. Depending on your RAID-setup, it may not be the
> > throughput, but the seeking back and forth between the journal and the
> > rest of the disk that kills performance. Having the journal on a
> > separate disk solves that problem.
>
> True. However I could only put in a single extra disk, and I don't want to
> use non-RAID...

Unless you use two ramdisks you still have a single point of failure.

Not sure exactly how the reliability of the ramdrive compares to a
disk?

--
Ragnar Kjørstad
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: non volatile ram devices
2002-12-05 10:38 ` Ragnar Kjørstad
@ 2002-12-05 10:45 ` Russell Coker
0 siblings, 0 replies; 20+ messages in thread
From: Russell Coker @ 2002-12-05 10:45 UTC (permalink / raw)
To: Ragnar Kjørstad; +Cc: linux-ide-arrays, ReiserFS, Mike Jadon

On Thu, 5 Dec 2002 11:38, Ragnar Kjørstad wrote:
> On Thu, Dec 05, 2002 at 10:00:32AM +0100, Russell Coker wrote:
> > > Even if the device is just a regular disk it should give you a real
> > > performance boost. Depending on your RAID-setup, it may not be the
> > > throughput, but the seeking back and forth between the journal and the
> > > rest of the disk that kills performance. Having the journal on a
> > > separate disk solves that problem.
> >
> > True. However I could only put in a single extra disk, and I don't want
> > to use non-RAID...
>
> Unless you use two ramdisks you still have a single point of failure.

True, but then both the RAID controller and the motherboard are single points
of failure already.

> Not sure exactly how the reliability of the ramdrive compares to a
> disk?

The RAM drive has no moving parts and should be inherently more reliable. I
don't recall ever having RAM die on a machine that had been functioning
properly except when mechanical issues apply (i.e. clumsy people taking
machines apart).

Hard drives die regularly. Get a busy machine like a news server or a mail
server and you expect to keep replacing dead hard drives as they wear out.

--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: non volatile ram devices
2002-12-05 9:00 ` Russell Coker
2002-12-05 10:38 ` Ragnar Kjørstad
@ 2002-12-05 13:23 ` Chris Mason
2002-12-06 9:52 ` Russell Coker
1 sibling, 1 reply; 20+ messages in thread
From: Chris Mason @ 2002-12-05 13:23 UTC (permalink / raw)
To: Russell Coker; +Cc: Ragnar Kjørstad, ReiserFS, Mike Jadon

On Thu, 2002-12-05 at 04:00, Russell Coker wrote:
> In this case the machines each have 4G of RAM. The total RAM for the mail
> cluster is four times what was used for the Solaris cluster, and Intel X86
> architecture uses less RAM than SPARC (32bit CISC vs 64bit RISC) and I
> suspect that the software we're now using (Qmail and Courier) is more memory
> efficient than Netscape too. Overall we have heaps more cache memory than
> before, I'm seeing 20 reads per second and 160 writes per second at times of
> peak load.

Have you benchmarked these machines to determine the max write load
capacity on reiserfs? Are you using a vanilla kernel or one with
patches applied?

I've done a few of my own benchmarks of the data logging patches, but it
would be great to see some independent verification of the speedups in a
real mail server workload.

-chris
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: non volatile ram devices
2002-12-05 13:23 ` Chris Mason
@ 2002-12-06 9:52 ` Russell Coker
2002-12-06 13:03 ` Chris Mason
2002-12-06 23:50 ` Matthias Andree
0 siblings, 2 replies; 20+ messages in thread
From: Russell Coker @ 2002-12-06 9:52 UTC (permalink / raw)
To: Chris Mason; +Cc: Ragnar Kjørstad, ReiserFS, Mike Jadon

[-- Attachment #1: Type: text/plain, Size: 2617 bytes --]

On Thu, 5 Dec 2002 14:23, Chris Mason wrote:
> On Thu, 2002-12-05 at 04:00, Russell Coker wrote:
> > In this case the machines each have 4G of RAM. The total RAM for the
> > mail cluster is four times what was used for the Solaris cluster, and
> > Intel X86 architecture uses less RAM than SPARC (32bit CISC vs 64bit
> > RISC) and I suspect that the software we're now using (Qmail and Courier)
> > is more memory efficient than Netscape too. Overall we have heaps more
> > cache memory than before, I'm seeing 20 reads per second and 160 writes
> > per second at times of peak load.
>
> Have you benchmarked these machines to determine the max write load
> capacity on reiserfs? Are you using a vanilla kernel or one with
> patches applied?

I'm using a fairly vanilla kernel. Its performance is 2 messages per second
taken from qmail spool and delivered while there is a background load of pop
access and new incoming mail. I.e. if there is a backlog of mail to deliver
the backlog gets smaller by 120 messages per minute.

> I've done a few of my own benchmarks of the data logging patches, but it
> would be great to see some independent verification of the speedups in a
> real mail server workload.

I've attached the results from a quick bonnie++ run of a vanilla system, a
system with the patches you referred me to, and finally with the file system
mounted with data journalling. The test was pretty quick (only a single pass)
because I've spent too much time fiddling with the crappy test hardware to be
inclined to spend more effort on it (how the hell is a P3-600 with a 6G IDE
drive and 128M of RAM supposed to be used for evaluating software to deploy
on a server with 2*1.8GHz CPUs, 196G of hardware RAID, and 4G of RAM?).

The results seem to show that the patches do some good on their own, nothing
really exciting but worth having. The data journalling improves performance
of synchronously creating files in the 512b to 16K size range (the issue I am
interested in) by a factor of 7! This is very promising, I only hope that
the performance gains when 200 processes are hitting a hardware RAID array of
4 U160 disks are as good as when a single process is hitting a cheap old IDE
disk.

This may even remove the immediate need for umem devices. But I think I'll
try and get them anyway. Extra speed is always useful.

--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page

[-- Attachment #2: res.html --]
[-- Type: text/html, Size: 4706 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: non volatile ram devices
2002-12-06 9:52 ` Russell Coker
@ 2002-12-06 13:03 ` Chris Mason
2002-12-06 23:53 ` Matthias Andree
2002-12-06 23:50 ` Matthias Andree
1 sibling, 1 reply; 20+ messages in thread
From: Chris Mason @ 2002-12-06 13:03 UTC (permalink / raw)
To: Russell Coker; +Cc: Ragnar Kjørstad, ReiserFS, Mike Jadon

[-- Attachment #1: Type: text/plain, Size: 1442 bytes --]

On Fri, 2002-12-06 at 04:52, Russell Coker wrote:
> The results seem to show that the patches do some good on their own, nothing
> really exciting but worth having. The data journalling improves performance
> of synchronously creating files in the 512b to 16K size range (the issue I am
> interested in) by a factor of 7! This is very promising, I only hope that
> the performance gains when 200 processes are hitting a hardware RAID array of
> 4 U160 disks are as good as when a single process is hitting a cheap old IDE
> disk.
>

You should see a significant improvement over the old code as the number
of procs involved goes up. The data logging patches have an
optimization Andrew Morton suggested, which is to schedule for a bit
during an fsync to allow other procs to get some work done and increase
the size of the transaction.

I've attached his synctest.c, which tries to approximate a postfix mail
load. Check the difference between data=journal and a pure kernel for

time synctest -F -f -n 1 -t 100 dir_name

(it does no timing on its own, you'll have to run it under time)

This does fsyncs on both the file and the directory in a simulated
delivery. It isn't a perfect benchmark, but it does hammer on fsyncs
nicely.

Another interesting metric is to use the reiserfs proc interface to
count the number of transactions required to finish each run.
(check the transid in /proc/fs/reiserfs/<disk>/journal)

-chris

[-- Attachment #2: synctest.c --]
[-- Type: text/plain, Size: 7642 bytes --]

/*
 * Test and benchmark synchronous operations.
 */

#undef _XOPEN_SOURCE		/* MAP_ANONYMOUS */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <stdarg.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>		/* needed by wait3() on some systems */
#include <sys/resource.h>
#include <sys/wait.h>
#include <sys/mman.h>

/*
 * Lots of yummy globals!
 */
char *progname, *dirname;
int verbose, use_fsync, use_osync;
int fsync_dir;
int n_threads = 1, n_iters = 100;
int *child_status;
int this_child_index;
int dir_fd;
int show_tids;
int threads_per_dir = 1;
int thread_group;
int do_unlink;
int rename_pass;

#define N_FILES		100
#define UNLINK_LAG	30
#define RENAME_PASSES	3

/* printf-style trace output, only when -v was given */
void show(char *fmt, ...)
{
	if (verbose) {
		va_list ap;

		va_start(ap, fmt);
		vfprintf(stdout, fmt, ap);
		fflush( stdout );
		va_end(ap);
	}
}

/*
 * - Create a file.
 * - Write some data to it
 * - Maybe fsync() it.
 * - Close it
 * - Maybe fsync() its parent dir
 * - rename() it.
 * - maybe fsync() its parent dir
 * - rename() it.
 * - maybe fsync() its parent dir
 * - rename() it.
 * - maybe fsync() its parent dir
 * - UNLINK_LAG files later, maybe unlink it.
 * - maybe fsync() its parent dir
 *
 * Repeat the above N_FILES times
 */

char *mk_dirname(void)
{
	char *ret = malloc(strlen(dirname) + 64);
	sprintf(ret, "%s/%05d", dirname, thread_group);
	return ret;
}

char *mk_filename(int fileno)
{
	char *ret = malloc(strlen(dirname) + 64);
	sprintf(ret, "%s/%05d/%05d-%05d",
		dirname, thread_group, getpid(), fileno);
	return ret;
}

char *mk_new_filename(int fileno, int pass)
{
	char *ret = malloc(strlen(dirname) + 64);
	sprintf(ret, "%s/%05d/%02d-%05d-%05d",
		dirname, thread_group, pass, getpid(), fileno);
	return ret;
}

/* fsync() the per-group directory opened in do_child(), if -F was given */
void sync_dir(void)
{
	if (fsync_dir) {
		show("fsync(%s)\n", dirname);
		if (fsync(dir_fd) < 0) {
			fprintf(stderr, "%s: failed to fsync dir `%s': %s\n",
				progname, dirname, strerror(errno));
			exit(1);
		}
	}
}

void make_dir(void)
{
	char *n = mk_dirname();

	show("mkdir(%s)\n", n);
	if (mkdir(n, 0777) < 0) {
		fprintf(stderr, "%s: Cannot make directory `%s': %s\n",
			progname, n, strerror(errno));
		exit(1);
	}
	free(n);
}

void remove_dir(void)
{
	char *n = mk_dirname();

	show("rmdir(%s)\n", n);
	rmdir(n);
	free(n);
}

/* Write 5000..250000 bytes, growing by 10% on each call */
void write_stuff_to(int fd, char *name)
{
	static char buf[500000];
	static int to_write = 5000;

	/* report the number of bytes actually written, not sizeof(buf) */
	show("write %d bytes to `%s'\n", to_write, name);
	if (write(fd, buf, to_write) != to_write) {
		fprintf(stderr, "%s: failed to write %d bytes to `%s': %s\n",
			progname, to_write, name, strerror(errno));
		exit(1);
	}
	to_write *= 1.1;
	if (to_write > 250000)
		to_write = 5000;
}

void unlink_one_file(int fileno, int pass)
{
	if (do_unlink) {
		char *name = mk_new_filename(fileno, pass);

		show("unlink(%s)\n", name);
		if (unlink(name) < 0) {
			fprintf(stderr, "%s: failed to unlink `%s': %s\n",
				progname, name, strerror(errno));
			exit(1);
		}
		sync_dir();
		free(name);
	}
}

void do_one_file(int fileno)
{
	char *name = mk_filename(fileno);
	int fd, flags;

	flags = O_RDWR|O_CREAT|O_TRUNC;
	if (use_osync)
		flags |= O_SYNC;

	show("open(%s)\n", name);
	fd = open(name, flags, 0666);
	if (fd < 0) {
		fprintf(stderr, "%s: failed to create file `%s': %s\n",
			progname, name, strerror(errno));
		exit(1);
	}

	write_stuff_to(fd, name);

	if (use_fsync) {
		show("fsync(%s)\n", name);
		if (fsync(fd) < 0) {
			fprintf(stderr, "%s: failed to fsync `%s': %s\n",
				progname, name, strerror(errno));
			exit(1);
		}
	}

	show("close(%s)\n", name);
	if (close(fd) < 0) {
		fprintf(stderr, "%s: failed to close `%s': %s\n",
			progname, name, strerror(errno));
		exit(1);
	}

	sync_dir();

	for (rename_pass = 0; rename_pass < RENAME_PASSES; rename_pass++) {
		char *newname = mk_new_filename(fileno, rename_pass);

		show("rename(%s, %s)\n", name, newname);
		if (rename(name, newname) < 0) {
			fprintf(stderr,
				"%s: failed to rename `%s' to `%s': %s\n",
				progname, name, newname, strerror(errno));
			exit(1);
		}
		sync_dir();
		free(name);
		name = newname;
	}
	rename_pass--;
	free(name);
}

void do_child(void)
{
	int fileno;
	char *dn = mk_dirname();
	int dotcount;

	dir_fd = open(dn, O_RDONLY);
	if (dir_fd < 0) {
		fprintf(stderr, "%s: failed to open dir `%s': %s\n",
			progname, dn, strerror(errno));
		exit(1);
	}
	free(dn);

	dotcount = N_FILES / 10;
	if (dotcount == 0)
		dotcount = 1;

	for (fileno = 0; fileno < N_FILES; fileno++) {
		if (fileno % dotcount == 0) {
			printf(".");
			fflush(stdout);
		}

		do_one_file(fileno);
		if (fileno >= UNLINK_LAG)
			unlink_one_file(fileno - UNLINK_LAG,
					RENAME_PASSES - 1);
	}

	for (fileno = N_FILES - UNLINK_LAG; fileno < N_FILES; fileno++)
		unlink_one_file(fileno, RENAME_PASSES - 1);
}

void doit(void)
{
	int child;
	int children_left;

	/* shared with the children so they can report completion */
	child_status = (int *)mmap( 0,
				n_threads * sizeof(*child_status),
				PROT_READ|PROT_WRITE,
				MAP_SHARED|MAP_ANONYMOUS,
				-1,
				0);
	if (child_status == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	memset(child_status, 0, n_threads * sizeof(*child_status));

	thread_group = -1;
	for (this_child_index = 0;
			this_child_index < n_threads;
			this_child_index++) {
		if (this_child_index % threads_per_dir == 0) {
			thread_group++;
			make_dir();
		}

		if (fork() == 0) {
			int iter;

			for (iter = 0; iter < n_iters; iter++)
				do_child();
			child_status[this_child_index] = 1;
			exit(0);
		}
	}

	/* Parent */
	children_left = n_threads;
	while (children_left) {
		int status;

		if( wait3(&status, 0, 0) < 0 ) {
			if( errno != EINTR ) {
				perror("wait3");
				exit(1);
			}
			continue;
		}

		for (child = 0; child < n_threads; child++) {
			if (child_status[child] == 1) {
				child_status[child] = 2;
				printf("*");
				fflush(stdout);
				children_left--;
			}
		}
	}

	for (thread_group = 0;
			thread_group < ( n_threads / threads_per_dir );
			thread_group++ )
		remove_dir();
	printf("\n");
}

void usage(void)
{
	fprintf(stderr,
		"Usage: %s [-fFouv] [-p threads-per-dir] [-n iters] [-t threads] dirname\n",
		progname);
	fprintf(stderr, "      -f:         Use fsync() on close\n");
	fprintf(stderr, "      -F:         Use fsync() on parent dir\n");
	fprintf(stderr, "      -n:         Number of iterations\n");
	fprintf(stderr, "      -o:         Open files O_SYNC\n");
	fprintf(stderr, "      -p:         Number of threads per directory\n");
	fprintf(stderr, "      -t:         Number of threads\n");
	fprintf(stderr, "      -u:         Unlink files during test\n");
	fprintf(stderr, "      -v:         Verbose\n");
	fprintf(stderr, "      dirname:    Directory to run tests in\n");
	exit(1);
}

int main(int argc, char *argv[])
{
	int c;

	progname = argv[0];
	while ((c = getopt(argc, argv, "vFfout:n:p:")) != -1) {
		switch (c) {
		case 'f':
			use_fsync++;
			break;
		case 'F':
			fsync_dir++;
			break;
		case 'n':
			n_iters = strtol(optarg, NULL, 10);
			break;
		case 'o':
			use_osync++;
			break;
		case 'p':
			threads_per_dir = strtol(optarg, NULL, 10);
			break;
		case 't':
			n_threads = strtol(optarg, NULL, 10);
			break;
		case 'u':
			do_unlink++;
			break;
		case 'v':
			verbose++;
			break;
		default:
			/* unknown option: print help and exit */
			usage();
		}
	}

	if (optind == argc)
		usage();
	dirname = argv[optind++];
	if (optind != argc)
		usage();
	doit();
	exit(0);
}
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: non volatile ram devices
2002-12-06 13:03 ` Chris Mason
@ 2002-12-06 23:53 ` Matthias Andree
0 siblings, 0 replies; 20+ messages in thread
From: Matthias Andree @ 2002-12-06 23:53 UTC (permalink / raw)
To: reiserfs-list

Chris Mason <mason@suse.com> writes:

> This does fsyncs on both the file and the directory in a simulated
> delivery. It isn't a perfect benchmark, but it does hammer on fsyncs
> nicely.

No need to sync the directory in Postfix if the fsync() makes sure the
filename of a newly created file cannot be lost. I heard this was true
for ReiserFS v3.6 ;-)

--
Matthias Andree
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: non volatile ram devices
2002-12-06 9:52 ` Russell Coker
2002-12-06 13:03 ` Chris Mason
@ 2002-12-06 23:50 ` Matthias Andree
2002-12-07 4:09 ` Todd Lyons
2002-12-07 10:03 ` Russell Coker
1 sibling, 2 replies; 20+ messages in thread
From: Matthias Andree @ 2002-12-06 23:50 UTC (permalink / raw)
To: reiserfs-list

Russell Coker <russell@coker.com.au> writes:

> I'm using a fairly vanilla kernel. Its performance is 2 messages per second
> taken from qmail spool and delivered while there is a background load of pop
> access and new incoming mail. I.e. if there is a backlog of mail to deliver
> the backlog gets smaller by 120 messages per minute.

In my benchmarks on a plain FreeBSD ffs and a Micropolis 4345WS UWSCSI
disk drive (7200/min) that was otherwise idle, qmail maxes out for
remote 1-to-1 deliveries at a good 3 deliveries/s. It might improve a
little with André Oppenheimer's patches; I didn't bother to check.
Postfix does 15/s on softupdates FreeBSD ffs; qmail does not support
softupdates. I didn't check Linux file systems on a current disk drive
such as Fujitsu MAH (7200/min U160 SCSI).

So I believe on ATA or loaded SCSI 2 messages per second is as good as
qmail gets with its 13+ synchronous writes per delivery. It's a
pig. Retrying with -o dirsync instead of -o sync might be worthwhile
though. Kernel patches needed.

--
Matthias Andree
^ permalink raw reply [flat|nested] 20+ messages in thread
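The link between "13+ synchronous writes per delivery" and a delivery rate of a few messages per second can be made concrete with a back-of-the-envelope bound. The sketch below rests on two simplifying assumptions that are not from the thread: the synchronous writes of one delivery happen strictly one after another, and on a disk each one costs at least one platter rotation. Function names are invented for illustration. The inputs (13 writes, 7200/min, 1 ms) are the figures quoted in this thread.

```c
/* Time for one full platter rotation, in seconds, at a given spindle speed. */
double rotation_seconds(double rpm)
{
    return 60.0 / rpm;
}

/*
 * Upper bound on deliveries per second for a single delivery stream when
 * each delivery issues `sync_writes` synchronous writes back to back and
 * each write takes `seconds_per_write`. Returns -1.0 for invalid input.
 */
double max_deliveries_per_second(int sync_writes, double seconds_per_write)
{
    if (sync_writes <= 0 || seconds_per_write <= 0.0)
        return -1.0;
    return 1.0 / (sync_writes * seconds_per_write);
}
```

Under these assumptions, `max_deliveries_per_second(13, rotation_seconds(7200.0))` is a little over 9 per second, and `max_deliveries_per_second(13, 0.001)` is roughly 77 per second for a device that commits in 1 ms. The measured 2 to 3 deliveries per second sit below the disk bound, as expected once seeks and competing load are added.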
* Re: non volatile ram devices
2002-12-06 23:50 ` Matthias Andree
@ 2002-12-07 4:09 ` Todd Lyons
2002-12-07 17:13 ` Matthias Andree
2002-12-07 10:03 ` Russell Coker
1 sibling, 1 reply; 20+ messages in thread
From: Todd Lyons @ 2002-12-07 4:09 UTC (permalink / raw)
To: reiserfs-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Matthias Andree wanted us to know:

>> I'm using a fairly vanilla kernel. Its performance is 2 messages per second
>> taken from qmail spool and delivered while there is a background load of pop
>> access and new incoming mail. I.e. if there is a backlog of mail to deliver
>> the backlog gets smaller by 120 messages per minute.
>In my benchmarks on a plain FreeBSD ffs and a Micropolis 4345WS UWSCSI
>disk drive (7200/min) that was otherwise idle, qmail maxes out for
>remote 1-to-1 deliveries at a good 3 deliveries/s. It might improve a

You need to increase your remoteconcurrency limit. Unless your emails
are tens of megabytes each, 3/s is way low.

- --
Blue skies...                   Todd
| Get a bigger hammer!  | All vendors suck, but different ones |
| http://www.mrball.net | suck less in different applications. |
| http://faq.mrball.net | --Andy Walden on NANOG               |
Linux kernel 2.4.19-16mdk 1 user, load average: 0.00, 0.00, 0.00
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQE98XRlIBT1264ScBURAltmAJsED+JgbSx0CKWb1PIf5iopOXXLBQCeKebg
3LeZt00jkHlER19Mqt/bBCU=
=RldY
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: non volatile ram devices
2002-12-07 4:09 ` Todd Lyons
@ 2002-12-07 17:13 ` Matthias Andree
0 siblings, 0 replies; 20+ messages in thread
From: Matthias Andree @ 2002-12-07 17:13 UTC (permalink / raw)
To: reiserfs-list

Todd Lyons <todd@mrball.net> writes:

>>> I'm using a fairly vanilla kernel. Its performance is 2 messages per second
>>> taken from qmail spool and delivered while there is a background load of pop
>>> access and new incoming mail. I.e. if there is a backlog of mail to deliver
>>> the backlog gets smaller by 120 messages per minute.
>>In my benchmarks on a plain FreeBSD ffs and a Micropolis 4345WS UWSCSI
>>disk drive (7200/min) that was otherwise idle, qmail maxes out for
>>remote 1-to-1 deliveries at a good 3 deliveries/s. It might improve a
>
> You need to increase your remoteconcurrency limit. Unless your emails
> are tens of megabytes each, 3/s is way low.

No, I don't -- for most of the test, qmail was running from 0 to 2
qmail-remote processes. This only changes when the todo queue has been
drained completely and no more mail needs to be preprocessed. Only after
the todo is empty, qmail ramps up into the remoteconcurrency limit. And
I'd certainly not raise remoteconcurrency above 20 because qmail would
easily trample the destination host if I had many recipients in one
domain, giving me "false" deferrals (because it has run into the
tcpserver limit of the MX it's talking to).

See André Oppenheim's "silly qmail syndrome patch" and the corresponding
graphs at http://www.nrg4u.com/

--
Matthias Andree
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: non volatile ram devices
2002-12-06 23:50 ` Matthias Andree
2002-12-07 4:09 ` Todd Lyons
@ 2002-12-07 10:03 ` Russell Coker
2002-12-07 10:44 ` Valdis.Kletnieks
1 sibling, 1 reply; 20+ messages in thread
From: Russell Coker @ 2002-12-07 10:03 UTC (permalink / raw)
To: Matthias Andree, reiserfs-list

On Sat, 7 Dec 2002 00:50, Matthias Andree wrote:
> Russell Coker <russell@coker.com.au> writes:
> > I'm using a fairly vanilla kernel. Its performance is 2 messages per
> > second taken from qmail spool and delivered while there is a background
> > load of pop access and new incoming mail. I.e., if there is a backlog of
> > mail to deliver the backlog gets smaller by 120 messages per minute.
>
> In my benchmarks on a plain FreeBSD ffs and a Micropolis 4345WS UWSCSI
> disk drive (7200/min) that was otherwise idle, qmail maxes out for
> remote 1-to-1 deliveries at a good 3 deliveries/s. It might improve a
> little with André Oppermann's patches, I didn't bother to check.
> Postfix does 15/s on softupdates FreeBSD ffs, qmail does not support
> softupdates. I didn't check Linux file systems on a current disk drive
> such as Fujitsu MAH (7200/min U160 SCSI).

How does qmail not support softupdates?

> So I believe on ATA or loaded SCSI 2 messages per second is as good as
> qmail gets with its 13+ synchronous writes per delivery. It's a
> pig. Retrying with -o dirsync instead of -o sync might be worthwhile
> though. Kernel patches needed.

Well, these 13 synchronous writes are what I am trying to solve. With
data=journal and the journal on a RAM device I expect that performance will
improve massively.

This is not a Qmail bottleneck AFAIK; Qmail is using all the disk capacity.
If I add any extra disk IO load (such as starting a process to deliver
bulletins by hard-linking directly into users' Maildirs) then the system load
average increases dramatically (load average goes from ~2 to ~10 if I add an
extra process doing heavy disk writes).

--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
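
To put a rough number on the synchronous-write cost discussed above, here is a minimal sketch in Python. It is not anything qmail itself does: it simply times small files written with fsync() and divides the resulting rate by the 13 synchronous writes per delivery quoted in this thread. The function names, the 512-byte payload, and the file naming are assumptions made for illustration, and the result is only an upper bound that ignores reads and every other source of load.

```python
import os
import tempfile
import time


def timed_sync_writes(directory, count=50, payload=b"x" * 512):
    """Write `count` small files, calling fsync() on each; return seconds elapsed.

    Hypothetical helper: file names and payload size are illustrative only.
    """
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, "msg.%d" % i)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, payload)
            os.fsync(fd)  # wait until the kernel reports the file data as written out
        finally:
            os.close(fd)
    return time.perf_counter() - start


def estimated_deliveries_per_second(sync_writes_per_second, syncs_per_delivery=13):
    """Upper bound on deliveries/s if synchronous writes were the only bottleneck.

    The default of 13 is the figure quoted in the thread, not a measured value.
    """
    return sync_writes_per_second / syncs_per_delivery


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        n = 50
        elapsed = timed_sync_writes(spool, count=n)
        rate = n / elapsed
        print("%.0f synchronous writes/s" % rate)
        print("~%.1f deliveries/s at 13 syncs each"
              % estimated_deliveries_per_second(rate))
```

Run it with the temporary directory placed on the file system of interest (for example by setting TMPDIR) to compare a disk-backed journal against a RAM-backed one.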
* Re: non volatile ram devices
2002-12-07 10:03 ` Russell Coker
@ 2002-12-07 10:44 ` Valdis.Kletnieks
0 siblings, 0 replies; 20+ messages in thread
From: Valdis.Kletnieks @ 2002-12-07 10:44 UTC (permalink / raw)
To: Russell Coker; +Cc: reiserfs-list

On Sat, 07 Dec 2002 11:03:00 +0100, Russell Coker <russell@coker.com.au> said:

> How does qmail not support softupdates?

I can't speak for qmail directly, but I've heard of other software that gets
indigestion because softupdates doesn't present the exact same API and view
of the world. I think it had to do with exactly how you fsync() a directory,
and when the syscall returned, and when the data was *really* on disk.
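
As an illustration of what "fsync() a directory" means in practice, here is a minimal Python sketch of the usual pattern on Linux: sync the new file, then open its parent directory and sync that too, so the directory entry is also pushed out. This is an assumption about the general technique, not a description of what qmail or any other MTA actually does, and `durable_create` is a hypothetical name. Whether the data is *really* on disk when the call returns still depends on the file system and the drive, which is exactly the point raised above.

```python
import os
import tempfile


def durable_create(directory, name, data):
    """Create directory/name containing `data`, then fsync the file and its directory.

    Hypothetical helper for illustration; assumes a POSIX system where a
    directory can be opened read-only and passed to fsync().
    """
    path = os.path.join(directory, name)
    # O_EXCL makes the create fail if the name already exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, data)  # small payloads only; no partial-write loop here
        os.fsync(fd)        # flush the file's data and metadata
    finally:
        os.close(fd)

    # Sync the directory so the new entry itself is written out.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as maildir_tmp:
        print(durable_create(maildir_tmp, "example.msg", b"Subject: test\n\nbody\n"))
```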
* Re: non volatile ram devices
2002-12-04 19:59 non volatile ram devices Russell Coker
2002-12-04 20:24 ` Ragnar Kjørstad
@ 2002-12-04 22:05 ` Hans Reiser
2002-12-04 21:17 ` Mike Jadon
2002-12-05 6:32 ` Oleg Drokin
2 siblings, 1 reply; 20+ messages in thread
From: Hans Reiser @ 2002-12-04 22:05 UTC (permalink / raw)
To: Russell Coker; +Cc: linux-ide-arrays, ReiserFS, mikej, Edward Shishkin

Russell Coker wrote:

>I have some servers that are giving inadequate disk performance for Maildir
>mail spools. They are running kernel 2.4.19 (2.4.20 upgrade is planned) and
>using ReiserFS for everything that's important.
>
>At this stage it is impossible for me to replace disks, RAID controllers, or
>anything else really significant.
>
>What I am thinking of doing is using a kernel that supports data journalling
>which should increase performance, but still probably won't give me enough.
>So I am thinking of using an "external journal" (or using software RAID to
>put the part of the partition containing the journal on a different device).
>
>The device containing the journal would be something much faster than physical
>media. I have been doing some research on non-volatile memory devices. I
>only found one company producing disks that are RAM based with battery
>backup, and they seem to start at $10K (too expensive - probably because they
>are much larger than I need, I need 128M at most, they provide 2G). I found
>many companies selling flash memory, but that only takes a million writes
>(that'll last about an hour for the use I plan). I found one company selling
>PC-Card devices that have two batteries for backup, but that requires getting
>a PCI controller for PC-Cards (something I haven't tried before).
>
>Does anyone know of an affordable ($1000 or less) device that can survive
>unexpected power outages of at least 24 hours duration, can commit a write in
>less than 1ms, supports unlimited writes, and connects to an IDE or SCSI bus
>(or PCI if there's a suitable Linux driver)?
>

The umem.com folks sell a device that we have tested and benchmarked
reiserfs on. If I could get Edward to format benchmarks in a way that
conveys the information that is relevant to persons reading them, I would
post them on our mailing list....

Hans
* Re: non volatile ram devices
2002-12-04 22:05 ` Hans Reiser
@ 2002-12-04 21:17 ` Mike Jadon
0 siblings, 0 replies; 20+ messages in thread
From: Mike Jadon @ 2002-12-04 21:17 UTC (permalink / raw)
To: reiser, Russell Coker
Cc: linux-ide-arrays, ReiserFS, Edward Shishkin, lrm, rmathews

Hans,

Many thanks for the referral.

Russell,

Our qty. 1-9 pricing is $730/unit for the 128MB card and $960/unit for the
1GB card. A driver is included in the 2.4.19 kernel.

Thanks,
Mike

At 02:05 PM 12/4/2002, Hans Reiser wrote:
>Russell Coker wrote:
>
>>I have some servers that are giving inadequate disk performance for
>>Maildir mail spools. They are running kernel 2.4.19 (2.4.20 upgrade is
>>planned) and using ReiserFS for everything that's important.
>>
>>At this stage it is impossible for me to replace disks, RAID controllers,
>>or anything else really significant.
>>
>>What I am thinking of doing is using a kernel that supports data
>>journalling which should increase performance, but still probably won't
>>give me enough.
>>So I am thinking of using an "external journal" (or using software RAID
>>to put the part of the partition containing the journal on a different device).
>>
>>The device containing the journal would be something much faster than
>>physical media. I have been doing some research on non-volatile memory
>>devices. I only found one company producing disks that are RAM based
>>with battery backup, and they seem to start at $10K (too expensive -
>>probably because they are much larger than I need, I need 128M at most,
>>they provide 2G). I found many companies selling flash memory, but that
>>only takes a million writes (that'll last about an hour for the use I
>>plan). I found one company selling PC-Card devices that have two
>>batteries for backup, but that requires getting a PCI controller for
>>PC-Cards (something I haven't tried before).
>>
>>Does anyone know of an affordable ($1000 or less) device that can survive
>>unexpected power outages of at least 24 hours duration, can commit a
>>write in less than 1ms, supports unlimited writes, and connects to an IDE
>>or SCSI bus (or PCI if there's a suitable Linux driver)?
>>
>The umem.com folks sell a device that we have tested and benchmarked
>reiserfs on. If I could get Edward to format benchmarks in a way that
>conveys the information that is relevant to persons reading them, I would
>post them on our mailing list....
>
>Hans
>

Mike Jadon
Micro Memory, LLC
(US) Tel 818 998 0070 x 318
(US) Fax 818 998 4459
mikej@umem.com
www.umem.com
9540 Vassar
Chatsworth, Ca. USA 91311
* Re: non volatile ram devices
2002-12-04 19:59 non volatile ram devices Russell Coker
2002-12-04 20:24 ` Ragnar Kjørstad
2002-12-04 22:05 ` Hans Reiser
@ 2002-12-05 6:32 ` Oleg Drokin
2002-12-05 8:36 ` Russell Coker
2 siblings, 1 reply; 20+ messages in thread
From: Oleg Drokin @ 2002-12-05 6:32 UTC (permalink / raw)
To: Russell Coker; +Cc: ReiserFS

Hello!

On Wed, Dec 04, 2002 at 08:59:35PM +0100, Russell Coker wrote:

> I have some servers that are giving inadequate disk performance for Maildir
> mail spools. They are running kernel 2.4.19 (2.4.20 upgrade is planned) and
> using ReiserFS for everything that's important.

May I ask what kind of inadequacy you observe, and on what kinds of
operations?

Thank you.

Bye,
Oleg
* Re: non volatile ram devices
2002-12-05 6:32 ` Oleg Drokin
@ 2002-12-05 8:36 ` Russell Coker
2002-12-05 16:21 ` Todd Lyons
0 siblings, 1 reply; 20+ messages in thread
From: Russell Coker @ 2002-12-05 8:36 UTC (permalink / raw)
To: Oleg Drokin; +Cc: ReiserFS

On Thu, 5 Dec 2002 07:32, Oleg Drokin wrote:
> > I have some servers that are giving inadequate disk performance for
> > Maildir mail spools. They are running kernel 2.4.19 (2.4.20 upgrade is
> > planned) and using ReiserFS for everything that's important.
>
> May I ask what kind of inadequacy on what kinds of operations do you
> observe?

It just generally isn't fast enough. The servers in question have 4 * 72G
U160 SCSI disks in RAID-5 arrays on MegaRAID controllers. They are designed
to handle 300,000 accounts for POP and IMAP.

At times of high load there are 20 reads per second and 160 writes per second.
* Re: non volatile ram devices
2002-12-05 8:36 ` Russell Coker
@ 2002-12-05 16:21 ` Todd Lyons
2002-12-05 22:51 ` Russell Coker
0 siblings, 1 reply; 20+ messages in thread
From: Todd Lyons @ 2002-12-05 16:21 UTC (permalink / raw)
To: ReiserFS

Russell Coker wanted us to know:

>It just generally isn't fast enough. The servers in question have 4 * 72G
>U160 SCSI disks in RAID-5 arrays on MegaRAID controllers. They are designed
>to handle 300,000 accounts for POP and IMAP.
>At times of high load there's 20 reads per second and 160 writes per second.

Let me ask some really stupid questions.

What kind of logging are your pop, imap, and mail services doing? If
logging to syslog, redirect the mail logging facility to tty12 instead
of a file on the hard drive. If syslog is logging to a network log
server, then there's not much you can do. If logging to /dev/null, this
is a non-issue.

Is the Maildir spool on its own partition? (I can't see how it's not,
since it's you, Russell.) Is /var/log on its own partition?

What I'm getting at with all of this is that syslog can create
significant load on a machine if the machine is really busy.

Russell, if you've already tried all this, happily ignore this message
and let us know what you find.

--
Blue skies...                       Todd
| Get a bigger hammer!  | Sometimes you get what you want. |
| http://www.mrball.net | Sometimes you get experience.    |
| http://faq.mrball.net |                 --unknown origin |
Linux kernel 2.4.19-16mdk 1 user, load average: 0.00, 0.00, 0.00
* Re: non volatile ram devices
2002-12-05 16:21 ` Todd Lyons
@ 2002-12-05 22:51 ` Russell Coker
0 siblings, 0 replies; 20+ messages in thread
From: Russell Coker @ 2002-12-05 22:51 UTC (permalink / raw)
To: Todd Lyons, ReiserFS

On Thu, 5 Dec 2002 17:21, Todd Lyons wrote:
> Russell Coker wanted us to know:
> >It just generally isn't fast enough. The servers in question have 4 * 72G
> >U160 SCSI disks in RAID-5 arrays on MegaRAID controllers. They are
> > designed to handle 300,000 accounts for POP and IMAP.
> >At times of high load there's 20 reads per second and 160 writes per
> > second.
>
> Let me ask some really stupid questions.
> What kind of logging are your pop, imap, and mail services doing? If
> logging to syslog, redirect the mail logging facility to tty12 instead
> of a file on the harddrive. If syslog is logging to a network log
> server, then there's not much you can do. If logging to /dev/null, this
> is a non-issue.

Logging is to files, but the log file names in the syslog configuration have
"-" in front of them to stop them being sync'd, so I doubt that they have a
great impact.

Turning off logs on live production servers is something that I am hesitant
to do, and I don't expect it to improve performance much as the qmail spool
dir (with synchronous writes) is also on the /var file system.

> Is the Maildir spool on its own partition? (I can't see
> how it's not since it's you, Russell). Is /var/log on its own
> partition.

Partitions are /, /var, and /mail. Not my choice, but it's not too bad
either.

> What I'm getting at with all of this is that syslog can create
> significant load on a machine if the machine is really busy.

Non-synchronous writes for logging, when a machine has 4G of RAM to allow big
caches, should not be a performance issue. If they are, then there's something
wrong with the logging.

I may give it a try, but I'll try data logging first.

Thanks for the suggestion.