All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: "Li, Shaohua" <shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: "linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Chris Mason <chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Arjan van de Ven <arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	"Yan,
	Zheng" <zheng.z.yan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>,
	"linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org"
	<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: [PATCH v2 0/5] add new ioctls to do metadata readahead in btrfs
Date: Tue, 18 Jan 2011 14:22:53 +0800	[thread overview]
Message-ID: <20110118062253.GA7370@localhost> (raw)
In-Reply-To: <1295327727.1949.730.camel@sli10-conroe>

On Tue, Jan 18, 2011 at 01:15:27PM +0800, Li, Shaohua wrote:
> On Tue, 2011-01-18 at 12:41 +0800, Wu, Fengguang wrote:
> > On Mon, Jan 17, 2011 at 09:32:37AM +0800, Li, Shaohua wrote:
> > > On Sun, 2011-01-16 at 11:38 +0800, Wu, Fengguang wrote:
> > > > On Wed, Jan 12, 2011 at 10:55:16AM +0800, Li, Shaohua wrote:
> > > > > On Tue, Jan 11, 2011 at 05:13:53PM +0800, Wu, Fengguang wrote:
> > > > > > On Tue, Jan 11, 2011 at 11:27:33AM +0800, Li, Shaohua wrote:
> > > > > > > On Tue, 2011-01-11 at 11:07 +0800, Wu, Fengguang wrote:
> > > > > > > > On Tue, Jan 11, 2011 at 10:03:16AM +0800, Li, Shaohua wrote:
> > 
> > > > > > > > > fincore can takes a parameter or it returns a bit to distinguish
> > > > > > > > > referenced pages, but I don't think it's a good API. This should be
> > > > > > > > > transparent to userspace.
> > > > > > > >
> > > > > > > > Users care about the "cached" status may well be interested in the
> > > > > > > > "active/referenced" status. They are co-related information. fincore()
> > > > > > > > won't be a simple replication of mincore() anyway. fincore() has to
> > > > > > > > deal with huge sparsely accessed files. The accessed bits of a file
> > > > > > > > page are normally more meaningful than the accessed bits of mapped
> > > > > > > > (anonymous) pages.
> > > > > > > if all filesystems have the bit set, I'll buy-in. Otherwise, this isn't generic enough.
> > > > > > 
> > > > > > It's a reasonable thing to set the accessed bits. So I believe the
> > > > > > various filesystems are calling mark_page_accessed() on their metadata
> > > > > > inode, or can be changed to do it.
> > > > > yes, we can, with a lot of pain. And filesystems must be smart to avoid marking the bit
> > > > > for pages which are readahead in but actually are invalid. The second patch in the series
> > > > 
> > > > "invalid" means !PG_uptodate? I wonder why there is a need to test
> > > > that bit at all. !PG_uptodate seems an unrelated transitional state.
> > > not PG_update, it's referenced bit. A readahead metadata page will have update bit set,
> > > but it might not have referenced bit if it's an obsolete page. btrfs
> > > doesn't use the buffer_head
> > 
> > I do see PageUptodate() tests in your patch, perhaps they be removed?
> uptodate bit isn't really needed, but I added it to make sure the page
> is valid.

It may be nit pick, but I always try to remove optional code.  The
PageUptodate() looks like an irrelevant test and a good candidate to
remove.

> > > > > has more detailed infomation about this issue. The problem is if this is really worthy
> > > > > for metadata readahead. Some filesystems might don't care about metadata readahead. If
> > > > > we make fincore check the bit, then fincore syscall will not work for such filesystems,
> > > > > which is bad.
> > > > 
> > > > fincore() will always work as is. If the filesystem don't care about
> > > > metadata readahead, then the metadata readahead that makes use of the
> > > > bits will naturally not work for them?
> > > yes, they don't care about readahead, but they do care about fincore
> > > output.
> > 
> > fincore() just reports the accessed bits as is. If the filesystem does
> > not use blockdev or export its internal metadata inode, the user won't
> > be able to run fincore() on the metadata inode at all.
> > 
> > > if fincore() checks the bits, it doesn't work even for normal file
> > > pages, if the pages get deactivated.
> > 
> > That's a problem independent of the interface. And for user space
> > readahead, it can be nicely fixed by collecting the pages-to-readahead
> > before the free pages drop low, ie. before any page reclaim actions.
> > It's "nice" because you don't want to readahead more data than
> > cache-able anyway and avoid thrashing for small memory systems.
> My point is fincore() isn't designed only for readahead. People will use
> it like mincore, which is its normal usage. Checking the bits will break
> its normal usage, because fincore just doesn't check if the fd means a
> metadata inode.

Sorry, you missed my point :) I mean to export the accessed bits as-is
via the fincore() interface, not to check the accessed bits and then
report "page not cached" to user space for !PG_referenced pages.

Thanks,
Fengguang

  reply	other threads:[~2011-01-18  6:22 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-04  5:40 [PATCH v2 0/5] add new ioctls to do metadata readahead in btrfs Shaohua Li
2011-01-04 16:14 ` Jeff Moyer
2011-01-04 16:14   ` Jeff Moyer
     [not found]   ` <x498vz0abov.fsf-RRHT56Q3PSP4kTEheFKJxxDDeQx5vsVwAInAS/Ez/D0@public.gmane.org>
2011-01-05  2:10     ` Shaohua Li
2011-01-10 14:26 ` Wu Fengguang
2011-01-11  0:15   ` Shaohua Li
2011-01-11  1:38     ` Wu Fengguang
2011-01-11  2:03       ` Shaohua Li
2011-01-11  3:07         ` Wu Fengguang
2011-01-11  3:27           ` Shaohua Li
2011-01-11  9:13             ` Wu Fengguang
2011-01-12  2:55               ` Shaohua Li
     [not found]                 ` <20110112025516.GA11303-yAZKuqJtXNMXR+D7ky4Foa2pdiUAq4bhAL8bYrjMMd8@public.gmane.org>
2011-01-16  3:38                   ` Wu Fengguang
2011-01-17  1:32                     ` Shaohua Li
2011-01-18  4:41                       ` Wu Fengguang
2011-01-18  5:15                         ` Shaohua Li
2011-01-18  6:22                           ` Wu Fengguang [this message]
2011-01-18  6:35                             ` Shaohua Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110118062253.GA7370@localhost \
    --to=fengguang.wu-ral2jqcrhueavxtiumwx3w@public.gmane.org \
    --cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
    --cc=arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    --cc=hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    --cc=zheng.z.yan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.