From: Dave Chinner <david@fromorbit.com>
To: Nick White <nwhite@palantir.com>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH] procfs: expose page cache contents
Date: Mon, 29 Apr 2013 11:38:05 +1000
Message-ID: <20130429013805.GX10481@dastard>
In-Reply-To: <1F4A5D0CEBD3124E92D83D35F27C0734017A634D@EX02-EAST.YOJOE.local>
On Sun, Apr 28, 2013 at 12:05:47AM +0000, Nick White wrote:
> > what use does this information have?
>
> There are two main ways I'd find this data (as distinct from this format)
> useful:
>
> Some applications would benefit from knowing which files are cheaper to
> access. A good example would be a database's query planner, when deciding
> whether to use an index or just sequentially scan a table. If the table's
> blocks were resident in memory but the index's weren't, then it might be
> faster just to sequentially scan the table.
Sounds like a severe case of premature optimisation to me. Indeed,
most databases use direct IO, so there aren't any cached pages in
kernel memory and nothing you do here will tell you anything about
the best query method.
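(For reference, "direct IO" here means the database opens its data
files with O_DIRECT, so reads go straight from disk into the
application's buffer and leave nothing in the page cache for a
residency check to find. An untested sketch - the 4096 byte alignment
is just an assumption for illustration, the real requirement depends
on the device's logical block size:)

/* Untested illustration: an O_DIRECT read bypasses the page cache
 * entirely, so nothing ends up cached for a residency check to see.
 * 4096 byte alignment is assumed here for simplicity. */
#define _GNU_SOURCE		/* for O_DIRECT */
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	void *buf;
	int fd;

	if (argc != 2)
		return 1;
	fd = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd < 0)
		return 1;
	if (posix_memalign(&buf, 4096, 4096))
		return 1;
	if (read(fd, buf, 4096) < 0)	/* disk straight to buf, no caching */
		return 1;
	free(buf);
	close(fd);
	return 0;
}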
> While mmap / mincore'ing the files would provide this information
> for a specific file, when the size of the files you're interested
> in exceeds the address space available (admittedly unlikely on 64-bit
> machines, but easy on 32-bit machines) you'd have to start
> processing the files in chunks; this would take much longer and so
> increase the accuracy problems you highlight.
And that points out the silliness of attempting to use "what is
cached" as a method of determining the best algorithm to use - it
simply doesn't scale up. Further, if you optimise for whatever access
method gives the best physical IO patterns, you'll end up with the
most robust and consistently performing solution.
There's nothing more irritating than a database that randomly
changes performance on the same workload for no obvious reason....
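FWIW, the mmap()/mincore() scheme you describe above boils down to
something like the untested sketch below. It walks the file in
fixed-size windows so it also fits in a 32-bit address space; the
64MB window size is an arbitrary choice for illustration:

/* Untested sketch: report how much of a file is resident in the page
 * cache via mmap()/mincore(), walking the file in fixed-size windows
 * so it also works within a 32 bit address space. */
#define _DEFAULT_SOURCE		/* mincore() */
#define _FILE_OFFSET_BITS 64	/* large file support on 32 bit */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>

#define WINDOW_BYTES	(64UL << 20)	/* arbitrary 64MB mapping window */

int main(int argc, char **argv)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	size_t resident = 0, total = 0;
	struct stat st;
	off_t off;
	int fd;

	if (argc != 2)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0)
		return 1;

	for (off = 0; off < st.st_size; off += WINDOW_BYTES) {
		off_t remain = st.st_size - off;
		size_t len = remain > (off_t)WINDOW_BYTES ?
					WINDOW_BYTES : (size_t)remain;
		size_t pages = (len + pagesize - 1) / pagesize;
		unsigned char *vec = malloc(pages);
		void *map;
		size_t i;

		map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
		if (!vec || map == MAP_FAILED)
			return 1;
		if (mincore(map, len, vec) < 0)
			return 1;
		for (i = 0; i < pages; i++)
			if (vec[i] & 1)		/* low bit set = resident */
				resident++;
		total += pages;
		free(vec);
		munmap(map, len);
	}
	printf("%zu of %zu pages resident\n", resident, total);
	close(fd);
	return 0;
}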
> This scenario actually highlights an algorithmic problem with my
> solution - it loops through the inodes of each (block-device)
> super-block, querying if any of their pages are resident.
Well, yes. Think of a machine with a couple of TB of RAM and tens of
millions of cached inodes....
> It'd be far more efficient to look through the resident pages and
> see which inodes (if any) they point at, possibly by walking the
> memory zones (like /proc/zoneinfo), iterating over the per_cpu_pages
> and mapping them back to inodes (where applicable) via
> page->mapping->host?
That doesn't make the TB of page cache case any better - it's just
as gross as your current patch....
> The other use-case I had in mind was profiling existing processes
> that either memory-map their data or otherwise rely on the kernel
> to cache the data they frequently access.
Go google for the recent hot data tracking patch series.
> I understand your concerns, but I believe more transparency around
> what the page cache is doing would be useful due to its
> significant impact on a system's performance.
You don't need to scan the page cache to understand what it is
doing. strace will tell you the IO your application is doing,
blktrace will tell you the IO that the page cache is doing, various
tracepoints will tell you what pages are being reclaimed, etc. If
this isn't sufficient for you to understand what your application is
doing and you really need fine-grained, custom information about
what is cached in the page cache, then perhaps systemtap would be a
better solution for your purposes.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com