From: Andrew Morton <akpm@linux-foundation.org>
To: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] VFS: Pagecache usage optimization on pagesize != blocksize environment
Date: Wed, 21 May 2008 00:19:30 -0700
Message-ID: <20080521001930.202446eb.akpm@linux-foundation.org>
In-Reply-To: <6.0.0.20.2.20080513205758.03a7a6b0@172.19.0.2>

On Wed, 21 May 2008 15:52:04 +0900 Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> wrote:

> Hi.
>
> When we read some part of a file through pagecache, if there is a pagecache
> of corresponding index but this page is not uptodate, read IO is issued and
> this page will be uptodate.
> I think this is good for pagesize == blocksize environment but there is room
> for improvement on pagesize != blocksize environment. Because in this case
> a page can have multiple buffers and even if a page is not uptodate, some buffers
> can be uptodate. So I suggest that when all buffers which correspond to a part
> of a file that we want to read are uptodate, use this pagecache and copy data
> from this pagecache to user buffer even if a page is not uptodate. This can
> reduce read IO and improve system throughput.

I suppose that makes sense.

> I did a performance test using the sysbench.

That's not a terribly good benchmark, IMO. It's too complex.

To work out the best-case for a change like this I'd suggest a
microbenchmark which does something such as seeking all around a file
doing single-byte reads.

Then one should think up a benchmark which demonstrates the worst-case,
such as reading one-byte-quantities from a file at offsets 0, 0x2000,
0x4000, 0x6000, ... and then read more one-byte-quantities at offsets
0x1000, 0x3000, 0x5000, etc. That would be a pretty cruel comparison,
but as one tosses in more such artificial workloads, one is in a better
position to work out whether the change is an aggregate benefit.

The results from a great big lumped-together benchmark such as sysbench
aren't a lot of use to us in predicting how effective this change will
be across all the workloads which the kernel implements.
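
For illustration, the worst-case access pattern described above could be
driven from userspace roughly as follows. This is only a sketch: the
function names read_single_bytes() and worst_case_passes() are
hypothetical, timing and cache dropping are left to the caller, and the
test file is assumed to live on a filesystem whose block size is smaller
than the page size.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* expose pread() under strict -std= modes */
#endif
#include <sys/types.h>
#include <unistd.h>

/*
 * Read one byte at start, start + stride, start + 2*stride, ... for as
 * long as the offset stays below limit.  Returns the number of bytes
 * actually read, or -1 on a bad stride or the first pread() error.
 */
long read_single_bytes(int fd, off_t start, off_t stride, off_t limit)
{
	long nread = 0;
	char byte;

	if (stride <= 0)
		return -1;
	for (off_t off = start; off < limit; off += stride) {
		ssize_t ret = pread(fd, &byte, 1, off);

		if (ret < 0)
			return -1;
		nread += ret;	/* 0 at or past EOF, 1 otherwise */
	}
	return nread;
}

/*
 * The two passes of the worst case (hypothetical wrapper): first offsets
 * 0, 0x2000, 0x4000, ..., then offsets 0x1000, 0x3000, 0x5000, ...
 */
long worst_case_passes(int fd, off_t limit)
{
	long even = read_single_bytes(fd, 0x0000, 0x2000, limit);
	long odd = read_single_bytes(fd, 0x1000, 0x2000, limit);

	if (even < 0 || odd < 0)
		return -1;
	return even + odd;
}
```

To get a meaningful number one would presumably start each run with a
cold pagecache and time the two passes separately, with and without the
patch applied.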

> @@ -932,8 +932,16 @@ find_page:
>  					ra, filp, page,
>  					index, last_index - index);
>  		}
> -		if (!PageUptodate(page))
> -			goto page_not_up_to_date;
> +		if (!PageUptodate(page)) {
> +			if (inode->i_blkbits == PAGE_CACHE_SHIFT)
> +				goto page_not_up_to_date;
> +			if (TestSetPageLocked(page))
> +				goto page_not_up_to_date;
> +			if (!page_has_buffers(page) ||
> +			    !check_buffers_uptodate(offset, desc, page))

We shouldn't do this.

> +				goto page_not_up_to_date_locked;
> +			unlock_page(page);
> +		}

See, the code which you have here is assuming that if PagePrivate is
set, then the thing which is at page.private is a ring of buffer_heads.

But this code (do_generic_file_read) doesn't know that! Take a look at
afs, nfs, perhaps other filesystems, grep for set_page_private().

Only the address_space implementation (ie: the filesystem) knows
whether page.private holds buffer_heads and only the
address_space_operations functions are allowed to call into library
functions which treat page.private as a buffer_head ring.

Now, your code _may_ not crash, because perhaps there is no filesystem
which puts something else into page.private which also uses
do_generic_file_read(). But it's still wrong.

I guess a suitable fix might be to implement the above using a new
address_space_operations callback:

	if (PagePrivate(page) && aops->is_partially_uptodate) {
		if (aops->is_partially_uptodate(page, desc, offset))
			<OK, we can copy the data>

then implement a generic_file_is_partially_uptodate() in fs/buffer.c
and wire that up in the filesystems.

Note that things like network filesystems can then implement this also.
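
To make the check such a callback would perform concrete, here is a
small userspace model. It is not kernel code and does not use
buffer_heads: struct model_page, MODEL_PAGE_SIZE and
model_is_partially_uptodate() are all hypothetical names, and the page
is assumed to be 4096 bytes split into equal-sized buffers. It only
shows the idea that a read may be served from a not-uptodate page when
every buffer overlapping the requested byte range is itself uptodate.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE   4096u	/* assumed page size for this model */
#define MODEL_MAX_BUFFERS 8u	/* 4096 / 512, smallest block size modelled */

/* Userspace stand-in for a page plus its per-block uptodate state. */
struct model_page {
	unsigned int block_size;	/* bytes per buffer */
	bool buffer_uptodate[MODEL_MAX_BUFFERS];
};

/*
 * Return true if every buffer overlapping [offset, offset + count) within
 * the page is uptodate.  The range is clamped to the end of the page.
 */
bool model_is_partially_uptodate(const struct model_page *page,
				 size_t offset, size_t count)
{
	size_t first, last, i;

	/* Reject block sizes that do not tile the page or overflow the array. */
	if (page->block_size == 0 ||
	    MODEL_PAGE_SIZE % page->block_size != 0 ||
	    MODEL_PAGE_SIZE / page->block_size > MODEL_MAX_BUFFERS)
		return false;
	if (count == 0 || offset >= MODEL_PAGE_SIZE)
		return false;
	if (count > MODEL_PAGE_SIZE - offset)
		count = MODEL_PAGE_SIZE - offset;

	first = offset / page->block_size;
	last = (offset + count - 1) / page->block_size;
	for (i = first; i <= last; i++) {
		if (!page->buffer_uptodate[i])
			return false;	/* this part would need read IO */
	}
	return true;
}
```

In this model the caller never looks at the per-buffer state itself; it
only asks the page's owner whether the range is usable, which is the
point of putting the check behind an address_space_operations callback.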