From: Shaohua Li <shaohua.li@intel.com>
To: "Wu, Fengguang" <fengguang.wu@intel.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Chris Mason <chris.mason@oracle.com>,
Christoph Hellwig <hch@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Arjan van de Ven <arjan@infradead.org>,
"Yan, Zheng" <zheng.z.yan@linux.intel.com>,
"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
"mtk.manpages@gmail.com" <mtk.manpages@gmail.com>
Subject: Re: [PATCH v2 0/5] add new ioctls to do metadata readahead in btrfs
Date: Tue, 18 Jan 2011 14:35:35 +0800 [thread overview]
Message-ID: <1295332535.1949.810.camel@sli10-conroe> (raw)
In-Reply-To: <20110118062253.GA7370@localhost>
On Tue, 2011-01-18 at 14:22 +0800, Wu, Fengguang wrote:
> On Tue, Jan 18, 2011 at 01:15:27PM +0800, Li, Shaohua wrote:
> > On Tue, 2011-01-18 at 12:41 +0800, Wu, Fengguang wrote:
> > > On Mon, Jan 17, 2011 at 09:32:37AM +0800, Li, Shaohua wrote:
> > > > On Sun, 2011-01-16 at 11:38 +0800, Wu, Fengguang wrote:
> > > > > On Wed, Jan 12, 2011 at 10:55:16AM +0800, Li, Shaohua wrote:
> > > > > > On Tue, Jan 11, 2011 at 05:13:53PM +0800, Wu, Fengguang wrote:
> > > > > > > On Tue, Jan 11, 2011 at 11:27:33AM +0800, Li, Shaohua wrote:
> > > > > > > > On Tue, 2011-01-11 at 11:07 +0800, Wu, Fengguang wrote:
> > > > > > > > > On Tue, Jan 11, 2011 at 10:03:16AM +0800, Li, Shaohua wrote:
> > >
> > > > > > > > > > fincore can takes a parameter or it returns a bit to distinguish
> > > > > > > > > > referenced pages, but I don't think it's a good API. This should be
> > > > > > > > > > transparent to userspace.
> > > > > > > > >
> > > > > > > > > Users care about the "cached" status may well be interested in the
> > > > > > > > > "active/referenced" status. They are co-related information. fincore()
> > > > > > > > > won't be a simple replication of mincore() anyway. fincore() has to
> > > > > > > > > deal with huge sparsely accessed files. The accessed bits of a file
> > > > > > > > > page are normally more meaningful than the accessed bits of mapped
> > > > > > > > > (anonymous) pages.
> > > > > > > > if all filesystems have the bit set, I'll buy-in. Otherwise, this isn't generic enough.
> > > > > > >
> > > > > > > It's a reasonable thing to set the accessed bits. So I believe the
> > > > > > > various filesystems are calling mark_page_accessed() on their metadata
> > > > > > > inode, or can be changed to do it.
> > > > > > yes, we can, with a lot of pain. And filesystems must be smart to avoid marking the bit
> > > > > > for pages which are readahead in but actually are invalid. The second patch in the series
> > > > >
> > > > > "invalid" means !PG_uptodate? I wonder why there is a need to test
> > > > > that bit at all. !PG_uptodate seems an unrelated transitional state.
> > > > not PG_update, it's referenced bit. A readahead metadata page will have update bit set,
> > > > but it might not have referenced bit if it's an obsolete page. btrfs
> > > > doesn't use the buffer_head
> > >
> > > I do see PageUptodate() tests in your patch, perhaps they be removed?
> > uptodate bit isn't really needed, but I added it to make sure the page
> > is valid.
>
> It may be nit pick, but I always try to remove optional code. The
> PageUptodate() looks like an irrelevant test and a good candidate to
> remove.
ok, I can do this.
> > > > > > has more detailed infomation about this issue. The problem is if this is really worthy
> > > > > > for metadata readahead. Some filesystems might don't care about metadata readahead. If
> > > > > > we make fincore check the bit, then fincore syscall will not work for such filesystems,
> > > > > > which is bad.
> > > > >
> > > > > fincore() will always work as is. If the filesystem don't care about
> > > > > metadata readahead, then the metadata readahead that makes use of the
> > > > > bits will naturally not work for them?
> > > > yes, they don't care about readahead, but they do care about fincore
> > > > output.
> > >
> > > fincore() just reports the accessed bits as is. If the filesystem does
> > > not use blockdev or export its internal metadata inode, the user won't
> > > be able to run fincore() on the metadata inode at all.
> > >
> > > > if fincore() checks the bits, it doesn't work even for normal file
> > > > pages, if the pages get deactivated.
> > >
> > > That's a problem independent of the interface. And for user space
> > > readahead, it can be nicely fixed by collecting the pages-to-readahead
> > > before the free pages drop low, ie. before any page reclaim actions.
> > > It's "nice" because you don't want to readahead more data than
> > > cache-able anyway and avoid thrashing for small memory systems.
> > My point is fincore() isn't designed only for readahead. People will use
> > it like mincore, which is its normal usage. Checking the bits will break
> > its normal usage, because fincore just doesn't check if the fd means a
> > metadata inode.
>
> Sorry, you missed my point :) I mean to export the accessed bits as-is
> via the fincore() interface, not to check the accessed bits and then
> report "page not cached" to user space for !PG_referenced pages.
I thought you said this before, and I think it's a bad API. userspace
should not be aware of such bits, because they are kernel internal.
Except the readahead usage, I can't imagine why userspace needs to know
the bits.
Thanks,
Shaohua
prev parent reply other threads:[~2011-01-18 6:35 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-04 5:40 [PATCH v2 0/5] add new ioctls to do metadata readahead in btrfs Shaohua Li
2011-01-04 16:14 ` Jeff Moyer
[not found] ` <x498vz0abov.fsf-RRHT56Q3PSP4kTEheFKJxxDDeQx5vsVwAInAS/Ez/D0@public.gmane.org>
2011-01-05 2:10 ` Shaohua Li
2011-01-10 14:26 ` Wu Fengguang
2011-01-11 0:15 ` Shaohua Li
2011-01-11 1:38 ` Wu Fengguang
2011-01-11 2:03 ` Shaohua Li
2011-01-11 3:07 ` Wu Fengguang
2011-01-11 3:27 ` Shaohua Li
2011-01-11 9:13 ` Wu Fengguang
2011-01-12 2:55 ` Shaohua Li
[not found] ` <20110112025516.GA11303-yAZKuqJtXNMXR+D7ky4Foa2pdiUAq4bhAL8bYrjMMd8@public.gmane.org>
2011-01-16 3:38 ` Wu Fengguang
2011-01-17 1:32 ` Shaohua Li
2011-01-18 4:41 ` Wu Fengguang
2011-01-18 5:15 ` Shaohua Li
2011-01-18 6:22 ` Wu Fengguang
2011-01-18 6:35 ` Shaohua Li [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1295332535.1949.810.camel@sli10-conroe \
--to=shaohua.li@intel.com \
--cc=akpm@linux-foundation.org \
--cc=arjan@infradead.org \
--cc=chris.mason@oracle.com \
--cc=fengguang.wu@intel.com \
--cc=hch@infradead.org \
--cc=linux-api@vger.kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=mtk.manpages@gmail.com \
--cc=zheng.z.yan@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).