* partially uptodate page reads
From: Nick Piggin @ 2008-07-24 15:17 UTC (permalink / raw)
To: hifumi.hisashi, jack, linux-ext4, linux-fsdevel, akpm
Hi, I have some questions about your patch in -mm
vfs-pagecache-usage-optimization-onpagesize=blocksize-environment.patch
I have no particular problem with something like this, but leaving the
implementation details aside for the moment, can we discuss the
justification for this?
Are there significant numbers of people using block size < page size in
situations where performance is important and significantly improved by
this patch? Can you give any performance numbers to illustrate perhaps?
Thanks,
Nick
* Re: partially uptodate page reads
From: Christoph Hellwig @ 2008-07-24 17:59 UTC (permalink / raw)
To: Nick Piggin; +Cc: hifumi.hisashi, jack, linux-ext4, linux-fsdevel, akpm, xfs
On Fri, Jul 25, 2008 at 01:17:11AM +1000, Nick Piggin wrote:
> Hi, I have some questions about your patch in -mm
>
> vfs-pagecache-usage-optimization-onpagesize=blocksize-environment.patch
>
> I have no particular problem with something like this, but leaving the
> implementation details aside for the moment, can we discuss the
> justification for this?
>
> Are there significant numbers of people using block size < page size in
> situations where performance is important and significantly improved by
> this patch? Can you give any performance numbers to illustrate perhaps?
With XFS lots of people use 4k blocksize filesystems on ia64 systems
with 16k pages, so an optimization like this would be useful.
But as mentioned in one of your previous comments, I'd prefer a
readpage interface change to deal with this.
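For reference, as I read the patch in -mm it adds a new address_space
operation rather than touching ->readpage itself; roughly along these
lines (a sketch, exact names and details may differ):

/*
 * Sketch of the hook added by the -mm patch (not the literal patch
 * text).  Called for a !PageUptodate page; returns nonzero if the
 * blocks covering the requested range are all uptodate, so the
 * generic read path can copy the data out without issuing ->readpage.
 */
struct address_space_operations {
	/* ... existing operations ... */
	int (*is_partially_uptodate)(struct page *page,
				     read_descriptor_t *desc,
				     unsigned long from);
};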
* Re: partially uptodate page reads
From: Andrew Morton @ 2008-07-24 19:08 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Nick Piggin, hifumi.hisashi, jack, linux-ext4, linux-fsdevel, xfs
On Thu, 24 Jul 2008 13:59:13 -0400 Christoph Hellwig <hch@infradead.org> wrote:
> On Fri, Jul 25, 2008 at 01:17:11AM +1000, Nick Piggin wrote:
> > Hi, I have some questions about your patch in -mm
> >
> > vfs-pagecache-usage-optimization-onpagesize=blocksize-environment.patch
> >
> > I have no particular problem with something like this, but leaving the
> > implementation details aside for the moment, can we discuss the
> > justification for this?
> >
> > Are there significant numbers of people using block size < page size in
> > situations where performance is important and significantly improved by
> > this patch? Can you give any performance numbers to illustrate perhaps?
>
> With XFS lots of people use 4k blocksize filesystems on ia64 systems
> with 16k pages, so an optimization like this would be useful.
As Nick says, we really should have some measurement results which
confirm this theory. Maybe we did do some, but they didn't find their
way into the changelog.
I've put the patch on hold until this confirmation data is available.
> But as mentioned in one of your previous comments, I'd prefer a
> readpage interface change to deal with this.
* Re: partially uptodate page reads
From: Nick Piggin @ 2008-07-25 9:22 UTC (permalink / raw)
To: Christoph Hellwig
Cc: hifumi.hisashi, jack, linux-ext4, linux-fsdevel, akpm, xfs
On Friday 25 July 2008 03:59, Christoph Hellwig wrote:
> On Fri, Jul 25, 2008 at 01:17:11AM +1000, Nick Piggin wrote:
> > Hi, I have some questions about your patch in -mm
> >
> > vfs-pagecache-usage-optimization-onpagesize=blocksize-environment.patch
> >
> > I have no particular problem with something like this, but leaving the
> > implementation details aside for the moment, can we discuss the
> > justification for this?
> >
> > Are there significant numbers of people using block size < page size in
> > situations where performance is important and significantly improved by
> > this patch? Can you give any performance numbers to illustrate perhaps?
>
> With XFS lots of people use 4k blocksize filesystems on ia64 systems
> with 16k pages, so an optimization like this would be useful.
>
> But as mentioned in one of your previous comments, I'd prefer a
> readpage interface change to deal with this.
Yeah... actually, if it is a nice win I don't mind too much going
with this API to start with, and consolidating with readpage later.
I am thinking about making a few other changes to readpage as well,
so I am happy to look at folding this partially-uptodate API in with
those.
If we just get some numbers (maybe SGI can help out?), I'm happy
enough with this approach.
* Re: partially uptodate page reads
From: Hisashi Hifumi @ 2008-07-28 4:34 UTC (permalink / raw)
To: Andrew Morton, Christoph Hellwig
Cc: Nick Piggin, jack, linux-ext4, linux-fsdevel, xfs
Hi
>> >
>> > Are there significant numbers of people using block size < page size in
>> > situations where performance is important and significantly improved by
>> > this patch? Can you give any performance numbers to illustrate perhaps?
>>
>> With XFS lots of people use 4k blocksize filesystems on ia64 systems
>> with 16k pages, so an optimization like this would be useful.
>
>As Nick says, we really should have some measurement results which
>confirm this theory. Maybe we did do some, but they didn't find their
>way into the changelog.
>
>I've put the patch on hold until this confirmation data is available.
>
I've got some performance numbers.
I wrote a benchmark program and measured the following with it.
The benchmark does:
1. mount and open a test file.
2. create a 512MB file.
3. close the file and umount.
4. mount and open the test file again.
5. pwrite randomly 300000 times on the test file; offsets are aligned to the IO size (1024 bytes).
6. measure the time of 100000 random preads on the test file.
The result was:
2.6.26:          330 sec
2.6.26-patched:  226 sec

Arch: i386
Filesystem: ext3
Blocksize: 1024 bytes
Memory: 1GB
On ext3/4, file data is written through buffer heads, block by block. So mixed
random read/write workloads, or random reads after random writes, benefit from
this patch in a pagesize != blocksize environment. This test result shows that.
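Conceptually, the check the patch adds walks the page's buffer_heads and only
falls back to ->readpage when a buffer in the requested range is not uptodate.
A simplified sketch of that walk (ignoring locking, EOF and error details, so
not the exact patch code):

#include <linux/buffer_head.h>

/*
 * Return nonzero if every block backing [from, from + count) in this
 * page is uptodate, so a read of that range can skip ->readpage.
 * Simplified sketch: the real code must also handle pages whose
 * buffers have not been created yet, ranges beyond EOF, and locking.
 */
static int page_range_uptodate(struct page *page,
			       unsigned long from, unsigned long count)
{
	struct buffer_head *bh, *head;
	unsigned long block_start = 0;

	if (!page_has_buffers(page))
		return 0;

	bh = head = page_buffers(page);
	do {
		unsigned long block_end = block_start + bh->b_size;

		/* Only blocks overlapping the requested range matter. */
		if (block_end > from && block_start < from + count &&
		    !buffer_uptodate(bh))
			return 0;

		block_start = block_end;
		bh = bh->b_this_page;
	} while (bh != head);

	return 1;
}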
The benchmark program is as follows:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>

#define LEN  1024		/* I/O size == filesystem block size */
#define LOOP (1024*512)		/* LEN * LOOP = 512MB */

int main(void)
{
	unsigned long i, offset, filesize;
	int fd;
	char buf[LEN];
	time_t t1, t2;

	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount");
		exit(1);
	}
	memset(buf, 0, LEN);

	/* Create the 512MB test file with sequential 1KB writes.
	 * (O_CREAT requires a mode argument, hence the 0644.) */
	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC, 0644);
	if (fd < 0) {
		perror("cannot open file");
		exit(1);
	}
	for (i = 0; i < LOOP; i++)
		write(fd, buf, LEN);
	close(fd);

	/* Remount so the pagecache is dropped and reads go to disk. */
	if (umount("/root/test1/") < 0) {
		perror("cannot umount");
		exit(1);
	}
	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount");
		exit(1);
	}
	fd = open("/root/test1/testfile", O_RDWR);
	if (fd < 0) {
		perror("cannot open file");
		exit(1);
	}
	filesize = LEN * LOOP;

	/* Dirty random blocks; offsets are aligned to the 1KB I/O size. */
	for (i = 0; i < 300000; i++) {
		offset = (random() % filesize) & ~(LEN - 1);
		pwrite(fd, buf, LEN, offset);
	}

	/* Time 100000 random 1KB reads over the same file. */
	printf("start test\n");
	time(&t1);
	for (i = 0; i < 100000; i++) {
		offset = (random() % filesize) & ~(LEN - 1);
		pread(fd, buf, LEN, offset);
	}
	time(&t2);
	printf("%ld sec\n", (long)(t2 - t1));

	close(fd);
	if (umount("/root/test1/") < 0) {
		perror("cannot umount");
		exit(1);
	}
	return 0;
}
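(Note: the device and mount point are hard-coded, so the program has to run
as root against a scratch ext3 partition created with a 1024-byte block size.)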
* Re: partially uptodate page reads
From: Andrew Morton @ 2008-07-28 6:51 UTC (permalink / raw)
To: Hisashi Hifumi
Cc: Christoph Hellwig, Nick Piggin, jack, linux-ext4, linux-fsdevel,
xfs
On Mon, 28 Jul 2008 13:34:12 +0900 Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> wrote:
> Hi
>
> >> >
> >> > Are there significant numbers of people using block size < page size in
> >> > situations where performance is important and significantly improved by
> >> > this patch? Can you give any performance numbers to illustrate perhaps?
> >>
> >> With XFS lots of people use 4k blocksize filesystems on ia64 systems
> >> with 16k pages, so an optimization like this would be useful.
> >
> >As Nick says, we really should have some measurement results which
> >confirm this theory. Maybe we did do some, but they didn't find their
> >way into the changelog.
> >
> >I've put the patch on hold until this confirmation data is available.
> >
>
> I've got some performance numbers.
> I wrote a benchmark program and measured the following with it.
> The benchmark does:
> 1. mount and open a test file.
> 2. create a 512MB file.
> 3. close the file and umount.
> 4. mount and open the test file again.
> 5. pwrite randomly 300000 times on the test file; offsets are aligned to the IO size (1024 bytes).
> 6. measure the time of 100000 random preads on the test file.
>
> The result was:
> 2.6.26:          330 sec
> 2.6.26-patched:  226 sec
>
> Arch: i386
> Filesystem: ext3
> Blocksize: 1024 bytes
> Memory: 1GB
>
> On ext3/4, file data is written through buffer heads, block by block. So mixed
> random read/write workloads, or random reads after random writes, benefit from
> this patch in a pagesize != blocksize environment. This test result shows that.
OK, thanks. Those are pretty nice numbers for what is probably a
fairly common workload.
* Re: partially uptodate page reads
From: Nick Piggin @ 2008-07-28 6:56 UTC (permalink / raw)
To: Andrew Morton
Cc: Hisashi Hifumi, Christoph Hellwig, jack, linux-ext4,
linux-fsdevel, xfs
On Monday 28 July 2008 16:51, Andrew Morton wrote:
> On Mon, 28 Jul 2008 13:34:12 +0900 Hisashi Hifumi
<hifumi.hisashi@oss.ntt.co.jp> wrote:
> > Hi
> >
> > >> > Are there significant numbers of people using block size < page size
> > >> > in situations where performance is important and significantly
> > >> > improved by this patch? Can you give any performance numbers to
> > >> > illustrate perhaps?
> > >>
> > >> With XFS lots of people use 4k blocksize filesystems on ia64 systems
> > >> with 16k pages, so an optimization like this would be useful.
> > >
> > >As Nick says, we really should have some measurement results which
> > >confirm this theory. Maybe we did do some, but they didn't find their
> > >way into the changelog.
> > >
> > >I've put the patch on hold until this confirmation data is available.
> >
> > I've got some performance numbers.
> > I wrote a benchmark program and measured the following with it.
> > The benchmark does:
> > 1. mount and open a test file.
> > 2. create a 512MB file.
> > 3. close the file and umount.
> > 4. mount and open the test file again.
> > 5. pwrite randomly 300000 times on the test file; offsets are aligned
> > to the IO size (1024 bytes).
> > 6. measure the time of 100000 random preads on the test file.
> >
> > The result was:
> > 2.6.26:          330 sec
> > 2.6.26-patched:  226 sec
> >
> > Arch: i386
> > Filesystem: ext3
> > Blocksize: 1024 bytes
> > Memory: 1GB
> >
> > On ext3/4, file data is written through buffer heads, block by block. So
> > mixed random read/write workloads, or random reads after random writes,
> > benefit from this patch in a pagesize != blocksize environment. This test
> > result shows that.
Yeah, thanks for the numbers.
> OK, thanks. Those are pretty nice numbers for what is probably a
> fairly common workload.
What kind of workloads do this kind of thing?
* Re: partially uptodate page reads
From: Andrew Morton @ 2008-07-28 7:09 UTC (permalink / raw)
To: Nick Piggin
Cc: Hisashi Hifumi, Christoph Hellwig, jack, linux-ext4,
linux-fsdevel, xfs
On Mon, 28 Jul 2008 16:56:37 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> On Monday 28 July 2008 16:51, Andrew Morton wrote:
> > On Mon, 28 Jul 2008 13:34:12 +0900 Hisashi Hifumi
> <hifumi.hisashi@oss.ntt.co.jp> wrote:
> > > Hi
> > >
> > > >> > Are there significant numbers of people using block size < page size
> > > >> > in situations where performance is important and significantly
> > > >> > improved by this patch? Can you give any performance numbers to
> > > >> > illustrate perhaps?
> > > >>
> > > >> With XFS lots of people use 4k blocksize filesystems on ia64 systems
> > > >> with 16k pages, so an optimization like this would be useful.
> > > >
> > > >As Nick says, we really should have some measurement results which
> > > >confirm this theory. Maybe we did do some, but they didn't find their
> > > >way into the changelog.
> > > >
> > > >I've put the patch on hold until this confirmation data is available.
> > >
> > > I've got some performance numbers.
> > > I wrote a benchmark program and measured the following with it.
> > > The benchmark does:
> > > 1. mount and open a test file.
> > > 2. create a 512MB file.
> > > 3. close the file and umount.
> > > 4. mount and open the test file again.
> > > 5. pwrite randomly 300000 times on the test file; offsets are aligned
> > > to the IO size (1024 bytes).
> > > 6. measure the time of 100000 random preads on the test file.
> > >
> > > The result was:
> > > 2.6.26:          330 sec
> > > 2.6.26-patched:  226 sec
> > >
> > > Arch: i386
> > > Filesystem: ext3
> > > Blocksize: 1024 bytes
> > > Memory: 1GB
> > >
> > > On ext3/4, file data is written through buffer heads, block by block. So
> > > mixed random read/write workloads, or random reads after random writes,
> > > benefit from this patch in a pagesize != blocksize environment. This test
> > > result shows that.
>
> Yeah, thanks for the numbers.
>
>
> > OK, thanks. Those are pretty nice numbers for what is probably a
> > fairly common workload.
>
> What kind of workloads do this kind of thing?
Various databases? (confused).
A more likely pattern is 8k IOs with a 16k pagesize, or thereabouts.
* Re: partially uptodate page reads
From: Nick Piggin @ 2008-07-28 7:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Hisashi Hifumi, Christoph Hellwig, jack, linux-ext4,
linux-fsdevel, xfs
On Monday 28 July 2008 17:09, Andrew Morton wrote:
> On Mon, 28 Jul 2008 16:56:37 +1000 Nick Piggin <nickpiggin@yahoo.com.au>
wrote:
> > On Monday 28 July 2008 16:51, Andrew Morton wrote:
> > > On Mon, 28 Jul 2008 13:34:12 +0900 Hisashi Hifumi
> > Yeah, thanks for the numbers.
> >
> > > OK, thanks. Those are pretty nice numbers for what is probably a
> > > fairly common workload.
> >
> > What kind of workloads do this kind of thing?
>
> Various databases? (confused).
I guess so; I was thinking of direct IO, but I suppose there are good
open source databases which go through the pagecache.
> A more likely pattern is 8k IOs with a 16k pagesize, or thereabouts.
Right, but it won't be a completely random workload. Also, it would be
interesting to know if there are any databases with an 8k block size
running on 4k block size filesystems, on 16k page size machines, which
are very performance critical ;)
But I guess it is only a small amount of code for a pretty good speedup.
So while those are probably very few installations, that is probably as
much because we currently do a bad job of it as because it just isn't a
good idea in general ;)
The improvement is quite significant, even if it is the artificial best
possible case... I suppose let's just merge it then?
Thread overview: 9 messages
2008-07-24 15:17 partially uptodate page reads Nick Piggin
2008-07-24 17:59 ` Christoph Hellwig
2008-07-24 19:08 ` Andrew Morton
2008-07-28 4:34 ` Hisashi Hifumi
2008-07-28 6:51 ` Andrew Morton
2008-07-28 6:56 ` Nick Piggin
2008-07-28 7:09 ` Andrew Morton
2008-07-28 7:22 ` Nick Piggin
2008-07-25 9:22 ` Nick Piggin