linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: "Li, Shaohua" <shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: "linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Chris Mason <chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Arjan van de Ven <arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	"Yan,
	Zheng" <zheng.z.yan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>,
	"linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org"
	<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: [PATCH v2 0/5] add new ioctls to do metadata readahead in btrfs
Date: Tue, 18 Jan 2011 14:22:53 +0800	[thread overview]
Message-ID: <20110118062253.GA7370@localhost> (raw)
In-Reply-To: <1295327727.1949.730.camel@sli10-conroe>

On Tue, Jan 18, 2011 at 01:15:27PM +0800, Li, Shaohua wrote:
> On Tue, 2011-01-18 at 12:41 +0800, Wu, Fengguang wrote:
> > On Mon, Jan 17, 2011 at 09:32:37AM +0800, Li, Shaohua wrote:
> > > On Sun, 2011-01-16 at 11:38 +0800, Wu, Fengguang wrote:
> > > > On Wed, Jan 12, 2011 at 10:55:16AM +0800, Li, Shaohua wrote:
> > > > > On Tue, Jan 11, 2011 at 05:13:53PM +0800, Wu, Fengguang wrote:
> > > > > > On Tue, Jan 11, 2011 at 11:27:33AM +0800, Li, Shaohua wrote:
> > > > > > > On Tue, 2011-01-11 at 11:07 +0800, Wu, Fengguang wrote:
> > > > > > > > On Tue, Jan 11, 2011 at 10:03:16AM +0800, Li, Shaohua wrote:
> > 
> > > > > > > > > fincore can takes a parameter or it returns a bit to distinguish
> > > > > > > > > referenced pages, but I don't think it's a good API. This should be
> > > > > > > > > transparent to userspace.
> > > > > > > >
> > > > > > > > Users care about the "cached" status may well be interested in the
> > > > > > > > "active/referenced" status. They are co-related information. fincore()
> > > > > > > > won't be a simple replication of mincore() anyway. fincore() has to
> > > > > > > > deal with huge sparsely accessed files. The accessed bits of a file
> > > > > > > > page are normally more meaningful than the accessed bits of mapped
> > > > > > > > (anonymous) pages.
> > > > > > > if all filesystems have the bit set, I'll buy-in. Otherwise, this isn't generic enough.
> > > > > > 
> > > > > > It's a reasonable thing to set the accessed bits. So I believe the
> > > > > > various filesystems are calling mark_page_accessed() on their metadata
> > > > > > inode, or can be changed to do it.
> > > > > yes, we can, with a lot of pain. And filesystems must be smart to avoid marking the bit
> > > > > for pages which are readahead in but actually are invalid. The second patch in the series
> > > > 
> > > > "invalid" means !PG_uptodate? I wonder why there is a need to test
> > > > that bit at all. !PG_uptodate seems an unrelated transitional state.
> > > not PG_update, it's referenced bit. A readahead metadata page will have update bit set,
> > > but it might not have referenced bit if it's an obsolete page. btrfs
> > > doesn't use the buffer_head
> > 
> > I do see PageUptodate() tests in your patch, perhaps they be removed?
> uptodate bit isn't really needed, but I added it to make sure the page
> is valid.

It may be nit pick, but I always try to remove optional code.  The
PageUptodate() looks like an irrelevant test and a good candidate to
remove.

> > > > > has more detailed infomation about this issue. The problem is if this is really worthy
> > > > > for metadata readahead. Some filesystems might don't care about metadata readahead. If
> > > > > we make fincore check the bit, then fincore syscall will not work for such filesystems,
> > > > > which is bad.
> > > > 
> > > > fincore() will always work as is. If the filesystem don't care about
> > > > metadata readahead, then the metadata readahead that makes use of the
> > > > bits will naturally not work for them?
> > > yes, they don't care about readahead, but they do care about fincore
> > > output.
> > 
> > fincore() just reports the accessed bits as is. If the filesystem does
> > not use blockdev or export its internal metadata inode, the user won't
> > be able to run fincore() on the metadata inode at all.
> > 
> > > if fincore() checks the bits, it doesn't work even for normal file
> > > pages, if the pages get deactivated.
> > 
> > That's a problem independent of the interface. And for user space
> > readahead, it can be nicely fixed by collecting the pages-to-readahead
> > before the free pages drop low, ie. before any page reclaim actions.
> > It's "nice" because you don't want to readahead more data than
> > cache-able anyway and avoid thrashing for small memory systems.
> My point is fincore() isn't designed only for readahead. People will use
> it like mincore, which is its normal usage. Checking the bits will break
> its normal usage, because fincore just doesn't check if the fd means a
> metadata inode.

Sorry, you missed my point :) I mean to export the accessed bits as-is
via the fincore() interface, not to check the accessed bits and then
report "page not cached" to user space for !PG_referenced pages.

Thanks,
Fengguang

  reply	other threads:[~2011-01-18  6:22 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-04  5:40 [PATCH v2 0/5] add new ioctls to do metadata readahead in btrfs Shaohua Li
2011-01-04 16:14 ` Jeff Moyer
     [not found]   ` <x498vz0abov.fsf-RRHT56Q3PSP4kTEheFKJxxDDeQx5vsVwAInAS/Ez/D0@public.gmane.org>
2011-01-05  2:10     ` Shaohua Li
2011-01-10 14:26 ` Wu Fengguang
2011-01-11  0:15   ` Shaohua Li
2011-01-11  1:38     ` Wu Fengguang
2011-01-11  2:03       ` Shaohua Li
2011-01-11  3:07         ` Wu Fengguang
2011-01-11  3:27           ` Shaohua Li
2011-01-11  9:13             ` Wu Fengguang
2011-01-12  2:55               ` Shaohua Li
     [not found]                 ` <20110112025516.GA11303-yAZKuqJtXNMXR+D7ky4Foa2pdiUAq4bhAL8bYrjMMd8@public.gmane.org>
2011-01-16  3:38                   ` Wu Fengguang
2011-01-17  1:32                     ` Shaohua Li
2011-01-18  4:41                       ` Wu Fengguang
2011-01-18  5:15                         ` Shaohua Li
2011-01-18  6:22                           ` Wu Fengguang [this message]
2011-01-18  6:35                             ` Shaohua Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110118062253.GA7370@localhost \
    --to=fengguang.wu-ral2jqcrhueavxtiumwx3w@public.gmane.org \
    --cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
    --cc=arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    --cc=hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    --cc=zheng.z.yan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).