linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	Cyrill Gorcunov <gorcunov@openvz.org>
Subject: Re: [PATCH 0/4] pagecache scanning with /proc/kpagecache
Date: Thu, 22 May 2014 13:36:32 +0300	[thread overview]
Message-ID: <20140522103632.GA23680@node.dhcp.inet.fi> (raw)
In-Reply-To: <CALYGNiMeDtiaA6gfbEYcXbwkuFvTRCLC9KmMOPtopAgGg5b6AA@mail.gmail.com>

On Thu, May 22, 2014 at 01:50:22PM +0400, Konstantin Khlebnikov wrote:
> On Thu, May 22, 2014 at 6:33 AM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > On Wed, 21 May 2014 22:19:55 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> >
> >> > A much nicer interface would be for us to (finally!) implement
> >> > fincore(), perhaps with an enhanced per-present-page payload which
> >> > presents the info which you need (although we don't actually know what
> >> > that info is!).
> >>
> >> page/pfn of each page slot and its page cache tag as shown in patch 4/4.
> >>
> >> > This would require open() - it appears to be a requirement that the
> >> > caller not open the file, but no reason was given for this.
> >> >
> >> > Requiring open() would address some of the obvious security concerns,
> >> > but it will still be possible for processes to poke around and get some
> >> > understanding of the behaviour of other processes.  Careful attention
> >> > should be paid to this aspect of any such patchset.
> >>
> >> Sorry if I missed your point, but this interface defines fixed mapping
> >> between file position in /proc/kpagecache and in-file page offset of
> >> the target file. So we do not need to use seq_file mechanism, that's
> >> why open() is not defined and default one is used.
> >> The same thing is true for /proc/{kpagecount,kpageflags}, from which
> >> I copied/pasted some basic code.
> >
> > I think you did miss my point ;) Please do a web search for fincore -
> > it's a syscall similar to mincore(), only it queries pagecache:
> > fincore(int fd, loff_t offset, ...).  In its simplest form it queries
> > just for present/absent, but we could increase the query payload to
> > incorporate additional per-page info.
> >
> > It would take a lot of thought and discussion to nail down the
> > fincore() interface (we've already tried a couple of times).  But
> > unfortunately, fincore() is probably going to be implemented one day
> > and it will (or at least could) make /proc/kpagecache obsolete.
> >
> 
> It seems fincore() also might obsolete /proc/kpageflags and /proc/pid/pagemap.
> because it might be implemented for /dev/mem and /proc/pid/mem as well
> as for normal files.
> 
> Something like this:
> int fincore(int fd, u64 *kpf, u64 *pfn, size_t length, off_t offset)

As always with new syscalls flags are missing ;)

u64 for kpf doesn't sound future proof enough. What about this:

int fincore(int fd, size_t length, off_t offset,
	unsigned long flags, void *records);

Format of records is defined by what user asks in flags. Like:

 - FINCORE_PFN: records are 64-bit each with pfn;
 - FINCORE_PAGE_FLAGS: records are 64-bit each with flags;
 - FINCORE_PFN | FINCORE_PAGE_FLAGS: records are 128-bit each with pfns
   followed by flags (or vice versa);

New flags can extend the format if we would want to expose more info.

Comments?

BTW, does everybody happy with mincore() interface? We report 1 there if
pte is present, but it doesn't really say much about the page for cases
like zero page...

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-05-22 10:37 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-21  2:26 [PATCH 0/4] pagecache scanning with /proc/kpagecache Naoya Horiguchi
2014-05-21  2:26 ` [PATCH 1/4] radix-tree: add end_index to support ranged iteration Naoya Horiguchi
2014-05-21  8:21   ` Konstantin Khlebnikov
2014-05-21 19:26     ` Naoya Horiguchi
2014-05-21  2:26 ` [PATCH 2/4] fs/proc/page.c: introduce /proc/kpagecache interface Naoya Horiguchi
2014-05-21  2:26 ` [PATCH 3/4] tools/vm/page-types.c: rework on file cache scanning mode Naoya Horiguchi
2014-05-21  2:26 ` [PATCH 4/4] Documentation: update Documentation/vm/pagemap.txt Naoya Horiguchi
2014-05-21 22:42 ` [PATCH 0/4] pagecache scanning with /proc/kpagecache Andrew Morton
2014-05-22  2:19   ` Naoya Horiguchi
     [not found]   ` <537d5ee4.4914e00a.5672.ffff85d5SMTPIN_ADDED_BROKEN@mx.google.com>
2014-05-22  2:33     ` Andrew Morton
2014-05-22  9:50       ` Konstantin Khlebnikov
2014-05-22 10:36         ` Kirill A. Shutemov [this message]
2014-05-22 17:47           ` Naoya Horiguchi
2014-05-22 21:02             ` Naoya Horiguchi
2014-06-02  5:24       ` [RFC][PATCH 0/3] mm: introduce fincore() Naoya Horiguchi
2014-06-02  5:24         ` [PATCH 1/3] replace PAGECACHE_TAG_* definition with enumeration Naoya Horiguchi
2014-06-02 16:12           ` Dave Hansen
2014-06-02 16:37             ` Naoya Horiguchi
     [not found]             ` <1401727052-f7v7kykv@n-horiguchi@ah.jp.nec.com>
2014-06-02 16:45               ` Dave Hansen
2014-06-02 17:14                 ` Naoya Horiguchi
2014-06-02 18:19                   ` Dave Hansen
2014-06-02 18:48                     ` Naoya Horiguchi
2014-06-02 21:16             ` Andrew Morton
2014-06-02 21:51               ` Naoya Horiguchi
2014-06-02  5:24         ` [PATCH 2/3] mm: introduce fincore() Naoya Horiguchi
2014-06-02  6:42           ` Christoph Hellwig
2014-06-02 14:19             ` Naoya Horiguchi
2014-06-02  7:06           ` Michael Kerrisk
2014-06-02 14:21             ` Naoya Horiguchi
2014-06-02 12:23           ` Kirill A. Shutemov
2014-06-02 14:52             ` Naoya Horiguchi
2014-06-02 16:11           ` Dave Hansen
2014-06-02 16:22             ` Naoya Horiguchi
2014-06-02  5:24         ` [PATCH 3/3] selftest: add test code for fincore() Naoya Horiguchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140522103632.GA23680@node.dhcp.inet.fi \
    --to=kirill@shutemov.name \
    --cc=acme@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=fengguang.wu@intel.com \
    --cc=gorcunov@openvz.org \
    --cc=koct9i@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=n-horiguchi@ah.jp.nec.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).