* 2 times faster rawio and several fixes (2.4.3aa3)

From: Andrea Arcangeli @ 2001-04-06 16:34 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Linus Torvalds, linux-kernel

I merged some of SCT's fixes, plus I fixed another couple of bugs, and then I
boosted the code to run faster. There's still room for improvement, for
example by using a ring of iobufs to walk the pagetables and lock down the
pages for the next atomic I/O chunk while the I/O of the previous iobuf is in
progress (before waiting synchronously), but with these first basic
improvements it already runs exactly 2 times faster than vanilla 2.4.3 on my
hardware.

NOTE: since I made the atomic I/O 512k, to stay in sync with the max size of
an io-request and to take advantage of the large I/O requests, MAX_KIO_SECTORS
grew so much that the buffer array cannot be placed on the stack anymore (it
was a bad idea to put it on the stack in the first place anyway), so for
things like the buffer array I preallocate a helper buffer in the kiovec
structure.

This should very significantly boost Oracle when the working set doesn't fit
in cache, because the rawio path should be quite efficient now (comparable to
regular I/O through the cache).
2.4.3aa3 without rawio-1:

alpha:/home/andrea # time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m10.323s
user    0m0.002s
sys     0m1.248s
alpha:/home/andrea # time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m10.299s
user    0m0.002s
sys     0m1.247s
alpha:/home/andrea # time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m10.557s
user    0m0.004s
sys     0m1.267s
alpha:/home/andrea # time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m10.310s
user    0m0.003s
sys     0m1.282s
alpha:/home/andrea #

2.4.3aa3 with rawio-1:

root@alpha:/home/andrea > time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m5.208s
user    0m0.001s
sys     0m1.162s
root@alpha:/home/andrea > time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m5.233s
user    0m0.002s
sys     0m1.184s
root@alpha:/home/andrea > time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m5.378s
user    0m0.002s
sys     0m1.213s
root@alpha:/home/andrea > time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m5.258s
user    0m0.001s
sys     0m1.183s
root@alpha:/home/andrea >

The original patch is here:

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.3aa3/20_rawio-1

however, to apply it cleanly on top of lvm you first need to apply the lvm
patches in the 2.4.3aa3 directory to upgrade to 0.9.1 beta6 (btw, I very much
appreciated that the Sistina folks went back to IOP 10 as suggested a few
weeks ago, thanks!
:)

I also ported the patch to vanilla 2.4.3 for inclusion (that version is
untested, but the only rejects were in lvm-snap.c and they were obvious enough
not to require testing). LVM people: please look at the other patch instead,
which will apply cleanly to your CVS tree:

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.3/rawio-1

Andrea
* Re: 2 times faster rawio and several fixes (2.4.3aa3)

From: Andrea Arcangeli @ 2001-04-06 17:07 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Linus Torvalds, linux-kernel

On Fri, Apr 06, 2001 at 06:34:40PM +0200, Andrea Arcangeli wrote:
> 2.4.3aa3 with rawio-1:
>
> root@alpha:/home/andrea > time ./rawio-bench
> Opening /dev/raw1
> Allocating 50MB of memory
> Reading from /dev/raw1
> Writing data to /dev/raw1
>
> real    0m5.208s
> user    0m0.001s
> sys     0m1.162s
> root@alpha:/home/andrea > time ./rawio-bench
> Opening /dev/raw1
> Allocating 50MB of memory
> Reading from /dev/raw1
> Writing data to /dev/raw1
>
> real    0m5.233s
> user    0m0.002s
> sys     0m1.184s
> root@alpha:/home/andrea > time ./rawio-bench
> Opening /dev/raw1
> Allocating 50MB of memory
> Reading from /dev/raw1
> Writing data to /dev/raw1
>
> real    0m5.378s
> user    0m0.002s
> sys     0m1.213s
> root@alpha:/home/andrea > time ./rawio-bench
> Opening /dev/raw1
> Allocating 50MB of memory
> Reading from /dev/raw1
> Writing data to /dev/raw1
>
> real    0m5.258s
> user    0m0.001s
> sys     0m1.183s
> root@alpha:/home/andrea >

with this patch:

--- 2.4.3aa/include/linux/iobuf.h	Fri Apr  6 16:33:12 2001
+++ /misc/andrea-alpha/2.4.3aa/include/linux/iobuf.h	Fri Apr  6 18:31:23 2001
@@ -24,7 +24,7 @@
  * entire iovec.
  */
-#define KIO_MAX_ATOMIC_IO	512	/* in kb */
+#define KIO_MAX_ATOMIC_IO	1024	/* in kb */
 
 #define KIO_STATIC_PAGES	(KIO_MAX_ATOMIC_IO / (PAGE_SIZE >> 10) + 1)
 #define KIO_MAX_SECTORS	(KIO_MAX_ATOMIC_IO * 2)

applied on top of 2.4.3aa3 I get even better results:

alpha:/home/andrea # time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m4.898s
user    0m0.003s
sys     0m1.138s
alpha:/home/andrea # time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m4.935s
user    0m0.002s
sys     0m1.159s
alpha:/home/andrea # time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m4.925s
user    0m0.003s
sys     0m1.162s
alpha:/home/andrea # time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m4.941s
user    0m0.004s
sys     0m1.166s
alpha:/home/andrea #

This is most probably because I'm striping over two SCSI disks, and this way
we can send 512k requests to each disk.

NOTE: userspace reads and writes also have to be >=512 kbytes in granularity,
or you'll generate small requests, because rawio is always synchronous. Using
decently sized writes/reads is a good idea anyway, to reduce the kernel
enter/exit overhead.

However we can probably stay with 512k of atomic I/O, since otherwise the
iobuf structure will again grow by an order of 2. With 512k of atomic I/O the
kiovec structure is just 8756 bytes in size (in fact I should probably
allocate some of the structures dynamically instead of statically inside the
kiobuf; as it is now with my patch it's not very reliable, since it needs an
order-2 allocation).

BTW, some more description of the testcase: it first reads 50 mbytes
physically contiguous, then it lseeks to zero and writes 50 mbytes. Mean disk
throughput is 100 mbytes / 5 sec = 20 mbytes/sec. It uses anonymous memory as
the in-core backend.
It looks like the perfect testcase to me, and they're the fastest disks I have
around here. Here's the proggy:

/* 2001 Andrea Arcangeli <andrea@suse.de> */

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <asm/page.h>

#define MB (1024*1024)
#define BUFSIZE (50*MB)

int main(void)
{
	int fd, size, ret;
	int filemap;
	char *buf, *end;

	printf("Opening /dev/raw1\n");
	fd = open("/dev/raw1", O_RDWR);
	if (fd < 0)
		perror("open /dev/raw1"), exit(1);

#if 1
	printf("Allocating %dMB of memory\n", BUFSIZE/MB);
	buf = (char *) malloc(BUFSIZE);
	if (!buf)
		perror("malloc"), exit(1);
	/* round the buffer to page boundaries as rawio requires */
	end = (char *) ((unsigned long) (buf + BUFSIZE) & PAGE_MASK);
	buf = (char *) ((unsigned long) (buf + ~PAGE_MASK) & PAGE_MASK);
#else
	printf("Mapping %dMB of memory\n", BUFSIZE/MB);
	filemap = open("deleteme", O_RDWR|O_TRUNC|O_CREAT, 0644);
	if (filemap < 0)
		perror("open"), exit(1);
	{
		int i;
		char buf[4096];
		for (i = 0; i < BUFSIZE; i += 4096)
			write(filemap, &buf, 4096);
	}
	ftruncate(filemap, BUFSIZE);
	buf = mmap(0, BUFSIZE, PROT_READ|PROT_WRITE, MAP_SHARED, filemap, 0);
	if (buf == MAP_FAILED)
		perror("mmap"), exit(1);
	if ((unsigned long) buf & ~PAGE_MASK)
		fprintf(stderr, "mmap misaligned\n"), exit(1);
	end = buf + BUFSIZE;
#endif
	size = end - buf;

	printf("Reading from /dev/raw1\n");
	ret = read(fd, buf, size);
	if (ret < 0)
		perror("read /dev/raw1"), exit(1);
	if (ret != size)
		fprintf(stderr, "read only %d of %d bytes\n", ret, size);

	printf("Writing data to /dev/raw1\n");
	if (lseek(fd, 0, SEEK_SET) < 0)
		perror("lseek"), exit(1);
	ret = write(fd, buf, size);
	if (ret < 0)
		perror("write /dev/raw1"), exit(1);
	if (ret != size)
		fprintf(stderr, "write only %d of %d bytes\n", ret, size);
	return 0;
}

Andrea
* Re: 2 times faster rawio and several fixes (2.4.3aa3)

From: Andi Kleen @ 2001-04-06 17:02 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Stephen C. Tweedie, Linus Torvalds, linux-kernel

On Fri, Apr 06, 2001 at 07:07:01PM +0200, Andrea Arcangeli wrote:
> However we can probably stay with the 512k atomic I/O otherwise the iobuf
> structure will grow again of an order of 2. With 512k of atomic I/O the kiovec
> structure is just 8756 in size (infact probably I should allocate some of the
> structures dynamically instead of statics inside the kiobuf.. as it is now
> with my patch it's not very reliable as it needs an allocation of order 2).

8756 bytes wastes most of an order-2 allocation. Wouldn't it make more sense
to round it up to 16k, to use the four pages fully? (If the increased atomic
size doesn't have other bad effects -- I guess it's no problem anymore to lock
down that much memory?)

-Andi
* Re: 2 times faster rawio and several fixes (2.4.3aa3)

From: Andrea Arcangeli @ 2001-04-06 17:36 UTC (permalink / raw)
To: Andi Kleen; +Cc: Stephen C. Tweedie, Linus Torvalds, linux-kernel

On Fri, Apr 06, 2001 at 07:02:32PM +0200, Andi Kleen wrote:
> On Fri, Apr 06, 2001 at 07:07:01PM +0200, Andrea Arcangeli wrote:
> > However we can probably stay with the 512k atomic I/O otherwise the iobuf
> > structure will grow again of an order of 2. With 512k of atomic I/O the kiovec
> > structure is just 8756 in size (infact probably I should allocate some of the
> > structures dynamically instead of statics inside the kiobuf.. as it is now
> > with my patch it's not very reliable as it needs an allocation of order 2).
>
> 8756bytes wastes most of an order 2 allocation. Wouldn't it make more sense to
> round it up to 16k to use the four pages fully ? (if the increased atomic

I prefer to get rid of the order-2 allocation entirely, to avoid having to
deal with fragmentation. The arrays the patch introduces take 1 page each (on
x86 and alpha) if the atomic I/O is 512k, so I can allocate them with a
separate kmalloc. OTOH on x86-64 we have a 4k PAGE_SIZE and 8-byte words, so
maybe I should use vmalloc instead?

Performance of vmalloc is not an issue, because those allocations no longer
happen in any fast path; the only worry in using vmalloc is the 3 additional
global tlb entries (but OTOH with kmalloc there's also the chance the code
will use a few more global tlb entries, if the memory returned for all the
kiovec structures doesn't fit in the same naturally aligned 2/4 mbyte area).
So I will probably take the vmalloc way, which is more generic and shouldn't
hurt performance (I will measure it to be sure, though).

Andrea
* Re: 2 times faster rawio and several fixes (2.4.3aa3)

From: Andrea Arcangeli @ 2001-04-06 18:22 UTC (permalink / raw)
To: Andi Kleen; +Cc: Stephen C. Tweedie, Linus Torvalds, linux-kernel

On Fri, Apr 06, 2001 at 07:36:21PM +0200, Andrea Arcangeli wrote:
> 2/4Mbytes naturally aligned area). so probably I will take the vmalloc way

As expected, vmalloc's 2 additional tlb entries aren't visible in the numbers
(which are mostly dominated by I/O anyway); I think it's the best solution to
avoid the order-2 multipage allocation:

alpha:/home/andrea # time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m5.241s
user    0m0.002s
sys     0m1.119s
alpha:/home/andrea # time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m5.176s
user    0m0.003s
sys     0m1.128s
alpha:/home/andrea # time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m5.196s
user    0m0.002s
sys     0m1.132s
alpha:/home/andrea # time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m5.477s
user    0m0.004s
sys     0m1.146s
alpha:/home/andrea # time ./rawio-bench
Opening /dev/raw1
Allocating 50MB of memory
Reading from /dev/raw1
Writing data to /dev/raw1

real    0m5.217s
user    0m0.004s
sys     0m1.149s
alpha:/home/andrea #

Tomorrow maybe I will try to speed it up further, using the design described
in the first email. The s/kmem_cache_alloc/vmalloc/ change is here for now,
and it is rock solid for me (regression testing is still happy):

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.3/rawio-2

I think it's ok for inclusion.

Andrea