linux-fsdevel.vger.kernel.org archive mirror
From: Andy Isaacson <adi@hexapodia.org>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Chris Frost <chris@frostnet.net>,
	Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Benny Halevy <bhalevy@panasas.com>,
	"Andrew@firstfloor.org" <Andrew@firstfloor.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Steve VanDeBogart <vandebo-lkml@nerdbox.net>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Matt Mackall <mpm@selenic.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH] fs: add fincore(2) (mincore(2) for file descriptors)
Date: Tue, 23 Feb 2010 08:39:26 -0800
Message-ID: <20100223163926.GC18096@hexapodia.org>
In-Reply-To: <20100221032533.GB14056@localhost>

On Sun, Feb 21, 2010 at 11:25:33AM +0800, Wu Fengguang wrote:
> Andy and Chris,
> On Sun, Feb 21, 2010 at 11:02:38AM +0800, Andy Isaacson wrote:
> > On Tue, Feb 16, 2010 at 10:13:12AM -0800, Chris Frost wrote:
> > > Add the fincore() system call. fincore() is mincore() for file descriptors.
> > > 
> > > The functionality of fincore() can be emulated with an mmap(), mincore(),
> > > and munmap(), but this emulation requires more system calls and requires
> > > page table modifications. fincore() can provide a significant performance
> > > improvement for non-sequential in-core queries.
> > 
> > In addition to being expensive, mmap/mincore/munmap perturb the VM's
> > eviction algorithm -- a page is less likely to be evicted if it's
> > mmapped when being considered for eviction.
> > 
> > I frequently see this happen when using mincore(1) from
> > http://bitbucket.org/radii/mincore/ -- "watch mincore -v *.big" while
> > *.big are being sequentially read results in a significant number of
> > pages remaining in-core, whereas if I only run mincore after the
> > sequential read is complete, the large files will be nearly-completely
> > out of core (except for the tail of the last file, of course).
> > 
> > It's very interesting to watch
> > % watch --interval=.5 mincore -v *
> > 
> > while an IO-intensive process is happening, such as mke2fs on a
> > filesystem image.
> > 
> > So, I support the addition of fincore(2) and would use it if it were
> > merged.
> 
> I'd like to advocate the "pagecache object collections", a ftrace
> based alternative:
> 
>         http://lkml.org/lkml/2010/2/9/156
> 
> Which will provide much more information than fincore(). I'd really
> appreciate it if you can join and use the general "pagecache object
> collections" facility.

1. The ftrace alternative appears to require root.  That's a complete
   non-starter for my use case.

2. I can imagine advocating that other UNIXes adopt fincore.  It's
   unrealistic to pretend that other UNIXes will adopt our trace/
   infrastructure.  (If anything we should have adopted DTrace.)

3. It appears to expose a significantly more complicated userland API.
   (But this doesn't matter until (1) is addressed.)  Also, it looks
   like it'll be a lot more expensive for high-frequency queries.  Note
   that in the library-helper use case, the library's exposed API may
   prevent it from keeping file descriptors open across calls.  Does
   ftrace really require the kernel to format data to ASCII so that it
   can be fscanf()ed by userland?  I hope that's just a convenience and
   there's a binary output path.

4. How committed is the ftrace API and ABI?  Is it guaranteed to
   continue to be supported for the next 2 decades?

I'd much rather have the simple, supportable, explainable, performant
API that fits well into the standard UNIX paradigm than add
dependencies on Linux-specific APIs that appear to be in extreme flux.
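
For reference, the mmap()/mincore()/munmap() emulation mentioned in the
patch description can be sketched roughly as below.  This is only an
illustration of the existing workaround, not the proposed fincore(2)
interface; the helper name resident_pages() and its signature are made
up for this example.

```c
#define _DEFAULT_SOURCE         /* for mincore() and mkstemp() in glibc */
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of the file behind fd are
 * resident in the page cache, using the mmap/mincore/munmap sequence.
 * Stores the file's total page count in *total_pages.
 * Returns the resident page count, or -1 on error.
 */
static long resident_pages(int fd, long *total_pages)
{
    struct stat st;
    long page = sysconf(_SC_PAGESIZE);
    long resident = -1;

    if (fstat(fd, &st) < 0)
        return -1;

    *total_pages = (long)((st.st_size + page - 1) / page);
    if (*total_pages == 0)
        return 0;               /* mmap() rejects a zero-length mapping */

    /* Mapping the file is the step that touches the page tables. */
    void *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc((size_t)*total_pages);
    if (vec != NULL && mincore(map, (size_t)st.st_size, vec) == 0) {
        resident = 0;
        /* The low bit of each vector byte is set if that page is in core. */
        for (long i = 0; i < *total_pages; i++)
            resident += vec[i] & 1;
    }

    free(vec);
    munmap(map, (size_t)st.st_size);
    return resident;
}
```

A caller would open() each file, pass the descriptor to
resident_pages(), and close() it again, so every query costs at least
three system calls plus the mapping setup and teardown.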

My apologies if I've missed anything in the above; please let me know if
I'm wrong.

Thanks,
-andy

