* partially uptodate page reads
@ 2008-07-24 15:17 Nick Piggin
  2008-07-24 17:59 ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2008-07-24 15:17 UTC (permalink / raw)
  To: hifumi.hisashi, jack, linux-ext4, linux-fsdevel, akpm

Hi, I have some questions about your patch in -mm

vfs-pagecache-usage-optimization-onpagesize=blocksize-environment.patch

I have no particular problem with something like this, but leaving the
implementation details aside for the moment, can we discuss the
justification for this?

Are there significant numbers of people using block size < page size in
situations where performance is important and significantly improved by
this patch? Can you give any performance numbers to illustrate perhaps?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: partially uptodate page reads
  2008-07-24 15:17 partially uptodate page reads Nick Piggin
@ 2008-07-24 17:59 ` Christoph Hellwig
  2008-07-24 19:08   ` Andrew Morton
  2008-07-25 9:22   ` Nick Piggin
  0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2008-07-24 17:59 UTC (permalink / raw)
  To: Nick Piggin; +Cc: hifumi.hisashi, jack, linux-ext4, linux-fsdevel, akpm, xfs

On Fri, Jul 25, 2008 at 01:17:11AM +1000, Nick Piggin wrote:
> Hi, I have some questions about your patch in -mm
>
> vfs-pagecache-usage-optimization-onpagesize=blocksize-environment.patch
>
> I have no particular problem with something like this, but leaving the
> implementation details aside for the moment, can we discuss the
> justification for this?
>
> Are there significant numbers of people using block size < page size in
> situations where performance is important and significantly improved by
> this patch? Can you give any performance numbers to illustrate perhaps?

With XFS lots of people use 4k blocksize filesystems on ia64 systems
with 16k pages, so an optimization like this would be useful.

But as mentioned in one of your previous comments I'd rather prefer
a readpage interface change to deal with this.
* Re: partially uptodate page reads
  2008-07-24 17:59 ` Christoph Hellwig
@ 2008-07-24 19:08   ` Andrew Morton
  2008-07-28 4:34     ` Hisashi Hifumi
  2008-07-25 9:22   ` Nick Piggin
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2008-07-24 19:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Nick Piggin, hifumi.hisashi, jack, linux-ext4, linux-fsdevel, xfs

On Thu, 24 Jul 2008 13:59:13 -0400 Christoph Hellwig <hch@infradead.org> wrote:

> On Fri, Jul 25, 2008 at 01:17:11AM +1000, Nick Piggin wrote:
> > Hi, I have some questions about your patch in -mm
> >
> > vfs-pagecache-usage-optimization-onpagesize=blocksize-environment.patch
> >
> > I have no particular problem with something like this, but leaving the
> > implementation details aside for the moment, can we discuss the
> > justification for this?
> >
> > Are there significant numbers of people using block size < page size in
> > situations where performance is important and significantly improved by
> > this patch? Can you give any performance numbers to illustrate perhaps?
>
> With XFS lots of people use 4k blocksize filesystems on ia64 systems
> with 16k pages, so an optimization like this would be useful.

As Nick says, we really should have some measurement results which
confirm this theory. Maybe we did do some but they didn't find their
way into the changelog.

I've put the patch on hold until this confirmation data is available.

> But as mentioned in one of your previous comments I'd rather prefer
> a readpage interface change to deal with this.
* Re: partially uptodate page reads
  2008-07-24 19:08   ` Andrew Morton
@ 2008-07-28 4:34     ` Hisashi Hifumi
  2008-07-28 6:51       ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Hisashi Hifumi @ 2008-07-28 4:34 UTC (permalink / raw)
  To: Andrew Morton, Christoph Hellwig
  Cc: Nick Piggin, jack, linux-ext4, linux-fsdevel, xfs

Hi

>> >
>> > Are there significant numbers of people using block size < page size in
>> > situations where performance is important and significantly improved by
>> > this patch? Can you give any performance numbers to illustrate perhaps?
>>
>> With XFS lots of people use 4k blocksize filesystems on ia64 systems
>> with 16k pages, so an optimization like this would be useful.
>
>As Nick says, we really should have some measurement results which
>confirm this theory. Maybe we did do some but they didn't find their
>way into the changelog.
>
>I've put the patch on hold until this confirmation data is available.
>

I've got some performance numbers.
I wrote a benchmark program and got result numbers with this program.
This benchmark does:
1, mount and open a test file.
2, create a 512MB file.
3, close a file and umount.
4, mount and again open a test file.
5, pwrite randomly 300000 times on a test file. Offset is aligned to the IO size (1024 bytes).
6, measure time of preading randomly 100000 times on a test file.

The result was:
	2.6.26
	330 sec

	2.6.26-patched
	226 sec

Arch:i386
Filesystem:ext3
Blocksize:1024 bytes
Memory: 1GB

On ext3/4, a file is written through buffer/block. So random read/write mixed workloads
or random read after random write workloads are optimized with this patch under
pagesize != blocksize environment. This test result showed this.
The benchmark program is as follows:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>

#define LEN	1024		/* I/O size in bytes */
#define LOOP	(1024 * 512)	/* LEN * LOOP = 512MB */

int main(void)
{
	unsigned long i, offset, filesize;
	int fd;
	char buf[LEN];
	time_t t1, t2;

	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount");
		exit(1);
	}
	memset(buf, 0, LEN);
	/* O_CREAT requires a mode argument */
	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC, 0644);
	if (fd < 0) {
		perror("cannot open file");
		exit(1);
	}
	/* create the 512MB test file */
	for (i = 0; i < LOOP; i++)
		write(fd, buf, LEN);
	close(fd);
	/* umount and mount again so the page cache starts cold */
	if (umount("/root/test1/") < 0) {
		perror("cannot umount");
		exit(1);
	}
	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount");
		exit(1);
	}
	fd = open("/root/test1/testfile", O_RDWR);
	if (fd < 0) {
		perror("cannot open file");
		exit(1);
	}

	filesize = (unsigned long)LEN * LOOP;
	/* random writes, offset aligned to LEN */
	for (i = 0; i < 300000; i++) {
		offset = (random() % filesize) & (~(LEN - 1));
		pwrite(fd, buf, LEN, offset);
	}
	printf("start test\n");
	time(&t1);
	/* timed part: random reads, offset aligned to LEN */
	for (i = 0; i < 100000; i++) {
		offset = (random() % filesize) & (~(LEN - 1));
		pread(fd, buf, LEN, offset);
	}
	time(&t2);
	printf("%ld sec\n", (long)(t2 - t1));
	close(fd);
	if (umount("/root/test1/") < 0) {
		perror("cannot umount");
		exit(1);
	}
	return 0;
}
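For readers following the thread, the case the benchmark exercises can be modelled in user space. The sketch below is illustrative only and is not the code from the -mm patch: the names sketch_page and sketch_range_uptodate, and the 4096-byte page with 1024-byte blocks, are assumptions chosen to match the i386/ext3 setup reported above. It keeps one up-to-date flag per block of a page and reports whether a read range is fully covered by up-to-date blocks, which is the condition under which a read could plausibly be served from the page cache without first reading the whole page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model, not kernel code. Sizes are assumptions. */
#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_BLOCK_SIZE	1024u
#define SKETCH_BLOCKS_PER_PAGE	(SKETCH_PAGE_SIZE / SKETCH_BLOCK_SIZE)

/* One flag per block: true once that block of the page holds valid data. */
struct sketch_page {
	bool block_uptodate[SKETCH_BLOCKS_PER_PAGE];
};

/*
 * Return true if every block overlapping [from, from + count) inside the
 * page is up to date. Empty or out-of-page ranges are rejected.
 */
static bool sketch_range_uptodate(const struct sketch_page *pg,
				  size_t from, size_t count)
{
	size_t first, last, i;

	if (count == 0 || from >= SKETCH_PAGE_SIZE ||
	    count > SKETCH_PAGE_SIZE - from)
		return false;

	first = from / SKETCH_BLOCK_SIZE;
	last = (from + count - 1) / SKETCH_BLOCK_SIZE;

	for (i = first; i <= last; i++)
		if (!pg->block_uptodate[i])
			return false;
	return true;
}
```

With a page whose second and third blocks were filled by 1024-byte writes, sketch_range_uptodate(&pg, 1024, 2048) is true while sketch_range_uptodate(&pg, 0, 1024) is false. A per-block flag array is used here only because it is the simplest way to show the idea.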
* Re: partially uptodate page reads
  2008-07-28 4:34     ` Hisashi Hifumi
@ 2008-07-28 6:51       ` Andrew Morton
  2008-07-28 6:56         ` Nick Piggin
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2008-07-28 6:51 UTC (permalink / raw)
  To: Hisashi Hifumi
  Cc: Christoph Hellwig, Nick Piggin, jack, linux-ext4, linux-fsdevel, xfs

On Mon, 28 Jul 2008 13:34:12 +0900 Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> wrote:

> Hi
>
> >> >
> >> > Are there significant numbers of people using block size < page size in
> >> > situations where performance is important and significantly improved by
> >> > this patch? Can you give any performance numbers to illustrate perhaps?
> >>
> >> With XFS lots of people use 4k blocksize filesystems on ia64 systems
> >> with 16k pages, so an optimization like this would be useful.
> >
> >As Nick says, we really should have some measurement results which
> >confirm this theory. Maybe we did do some but they didn't find their
> >way into the changelog.
> >
> >I've put the patch on hold until this confirmation data is available.
> >
>
> I've got some performance numbers.
> I wrote a benchmark program and got result numbers with this program.
> This benchmark does:
> 1, mount and open a test file.
> 2, create a 512MB file.
> 3, close a file and umount.
> 4, mount and again open a test file.
> 5, pwrite randomly 300000 times on a test file. Offset is aligned to the IO size (1024 bytes).
> 6, measure time of preading randomly 100000 times on a test file.
>
> The result was:
> 	2.6.26
> 	330 sec
>
> 	2.6.26-patched
> 	226 sec
>
> Arch:i386
> Filesystem:ext3
> Blocksize:1024 bytes
> Memory: 1GB
>
> On ext3/4, a file is written through buffer/block. So random read/write mixed workloads
> or random read after random write workloads are optimized with this patch under
> pagesize != blocksize environment. This test result showed this.

OK, thanks. Those are pretty nice numbers for what is probably a
fairly common workload.
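To put the reported figures in proportion, the two times (330 sec unpatched, 226 sec patched) can be turned into a relative reduction and a speedup factor. The helper names below are hypothetical and serve only as a worked calculation on the numbers quoted in the thread.

```c
/* Hypothetical helpers for a worked calculation; not part of the benchmark. */

/* Fraction of elapsed time saved: (before - after) / before. */
static double sketch_time_reduction(double before_sec, double after_sec)
{
	return (before_sec - after_sec) / before_sec;
}

/* Speedup factor: before / after. */
static double sketch_speedup(double before_sec, double after_sec)
{
	return before_sec / after_sec;
}
```

For 330 and 226 seconds this gives a reduction of 104/330, about 31.5%, and a speedup of 330/226, about 1.46 times, for this particular random-read-after-random-write test.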
* Re: partially uptodate page reads
  2008-07-28 6:51       ` Andrew Morton
@ 2008-07-28 6:56         ` Nick Piggin
  2008-07-28 7:09           ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2008-07-28 6:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hisashi Hifumi, Christoph Hellwig, jack, linux-ext4, linux-fsdevel, xfs

On Monday 28 July 2008 16:51, Andrew Morton wrote:
> On Mon, 28 Jul 2008 13:34:12 +0900 Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> wrote:
> > Hi
> >
> > >> > Are there significant numbers of people using block size < page size
> > >> > in situations where performance is important and significantly
> > >> > improved by this patch? Can you give any performance numbers to
> > >> > illustrate perhaps?
> > >>
> > >> With XFS lots of people use 4k blocksize filesystems on ia64 systems
> > >> with 16k pages, so an optimization like this would be useful.
> > >
> > >As Nick says, we really should have some measurement results which
> > >confirm this theory. Maybe we did do some but they didn't find their
> > >way into the changelog.
> > >
> > >I've put the patch on hold until this confirmation data is available.
> >
> > I've got some performance numbers.
> > I wrote a benchmark program and got result numbers with this program.
> > This benchmark does:
> > 1, mount and open a test file.
> > 2, create a 512MB file.
> > 3, close a file and umount.
> > 4, mount and again open a test file.
> > 5, pwrite randomly 300000 times on a test file. Offset is aligned to the
> > IO size (1024 bytes).
> > 6, measure time of preading randomly 100000 times on a test file.
> >
> > The result was:
> > 	2.6.26
> > 	330 sec
> >
> > 	2.6.26-patched
> > 	226 sec
> >
> > Arch:i386
> > Filesystem:ext3
> > Blocksize:1024 bytes
> > Memory: 1GB
> >
> > On ext3/4, a file is written through buffer/block. So random read/write
> > mixed workloads or random read after random write workloads are optimized
> > with this patch under pagesize != blocksize environment. This test result
> > showed this.

Yeah, thanks for the numbers.

> OK, thanks. Those are pretty nice numbers for what is probably a
> fairly common workload.

What kind of workloads does this kind of thing?
* Re: partially uptodate page reads
  2008-07-28 6:56         ` Nick Piggin
@ 2008-07-28 7:09           ` Andrew Morton
  2008-07-28 7:22             ` Nick Piggin
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2008-07-28 7:09 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Hisashi Hifumi, Christoph Hellwig, jack, linux-ext4, linux-fsdevel, xfs

On Mon, 28 Jul 2008 16:56:37 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> On Monday 28 July 2008 16:51, Andrew Morton wrote:
> > On Mon, 28 Jul 2008 13:34:12 +0900 Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> wrote:
> > > Hi
> > >
> > > >> > Are there significant numbers of people using block size < page size
> > > >> > in situations where performance is important and significantly
> > > >> > improved by this patch? Can you give any performance numbers to
> > > >> > illustrate perhaps?
> > > >>
> > > >> With XFS lots of people use 4k blocksize filesystems on ia64 systems
> > > >> with 16k pages, so an optimization like this would be useful.
> > > >
> > > >As Nick says, we really should have some measurement results which
> > > >confirm this theory. Maybe we did do some but they didn't find their
> > > >way into the changelog.
> > > >
> > > >I've put the patch on hold until this confirmation data is available.
> > >
> > > I've got some performance numbers.
> > > I wrote a benchmark program and got result numbers with this program.
> > > This benchmark does:
> > > 1, mount and open a test file.
> > > 2, create a 512MB file.
> > > 3, close a file and umount.
> > > 4, mount and again open a test file.
> > > 5, pwrite randomly 300000 times on a test file. Offset is aligned to the
> > > IO size (1024 bytes).
> > > 6, measure time of preading randomly 100000 times on a test file.
> > >
> > > The result was:
> > > 	2.6.26
> > > 	330 sec
> > >
> > > 	2.6.26-patched
> > > 	226 sec
> > >
> > > Arch:i386
> > > Filesystem:ext3
> > > Blocksize:1024 bytes
> > > Memory: 1GB
> > >
> > > On ext3/4, a file is written through buffer/block. So random read/write
> > > mixed workloads or random read after random write workloads are optimized
> > > with this patch under pagesize != blocksize environment. This test result
> > > showed this.
>
> Yeah, thanks for the numbers.
>
> > OK, thanks. Those are pretty nice numbers for what is probably a
> > fairly common workload.
>
> What kind of workloads does this kind of thing?

Various databases? (confused).

More likely pattern is 8k IOs with 16k pagesize or thereabouts.
* Re: partially uptodate page reads
  2008-07-28 7:09           ` Andrew Morton
@ 2008-07-28 7:22             ` Nick Piggin
  0 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2008-07-28 7:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hisashi Hifumi, Christoph Hellwig, jack, linux-ext4, linux-fsdevel, xfs

On Monday 28 July 2008 17:09, Andrew Morton wrote:
> On Mon, 28 Jul 2008 16:56:37 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > On Monday 28 July 2008 16:51, Andrew Morton wrote:
> > > On Mon, 28 Jul 2008 13:34:12 +0900 Hisashi Hifumi
> >
> > Yeah, thanks for the numbers.
> >
> > > OK, thanks. Those are pretty nice numbers for what is probably a
> > > fairly common workload.
> >
> > What kind of workloads does this kind of thing?
>
> Various databases? (confused).

I guess so, I was thinking of direct IO, but I guess there are good
open source ones which go through pagecache.

> More likely pattern is 8k IOs with 16k pagesize or thereabouts.

Right, but it won't be a completely random workload. Also, it would be
interesting to know if there are any 8k database block size databases
on 4k block size filesystems, running on 16k page size machines, which
are very performance critical ;)

But I guess it is only a small amount of code in order to get a pretty
good speedup. So while those are probably very few installations, it is
probably as much because we do a bad job of it as it just isn't a good
idea in general ;)

The improvement is quite significant, even if it is the artificial best
possible case... I suppose let's just merge it then?
* Re: partially uptodate page reads
  2008-07-24 17:59 ` Christoph Hellwig
  2008-07-24 19:08   ` Andrew Morton
@ 2008-07-25 9:22   ` Nick Piggin
  1 sibling, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2008-07-25 9:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: hifumi.hisashi, jack, linux-ext4, linux-fsdevel, akpm, xfs

On Friday 25 July 2008 03:59, Christoph Hellwig wrote:
> On Fri, Jul 25, 2008 at 01:17:11AM +1000, Nick Piggin wrote:
> > Hi, I have some questions about your patch in -mm
> >
> > vfs-pagecache-usage-optimization-onpagesize=blocksize-environment.patch
> >
> > I have no particular problem with something like this, but leaving the
> > implementation details aside for the moment, can we discuss the
> > justification for this?
> >
> > Are there significant numbers of people using block size < page size in
> > situations where performance is important and significantly improved by
> > this patch? Can you give any performance numbers to illustrate perhaps?
>
> With XFS lots of people use 4k blocksize filesystems on ia64 systems
> with 16k pages, so an optimization like this would be useful.
>
> But as mentioned in one of your previous comments I'd rather prefer
> a readpage interface change to deal with this.

Yeah... actually if it is a nice win I don't mind too much to go with
this API to start with, and consolidate with readpage later. I am
thinking about making a few other changes to readpage as well, so I am
happy to look at folding in this partially-uptodate API with it as well.

If we just get some numbers (maybe SGI can help out?), I'm happy enough
with this approach.