From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	Cyrill Gorcunov <gorcunov@openvz.org>
Subject: Re: [PATCH 0/4] pagecache scanning with /proc/kpagecache
Date: Thu, 22 May 2014 13:36:32 +0300
Message-ID: <20140522103632.GA23680@node.dhcp.inet.fi>
In-Reply-To: <CALYGNiMeDtiaA6gfbEYcXbwkuFvTRCLC9KmMOPtopAgGg5b6AA@mail.gmail.com>

On Thu, May 22, 2014 at 01:50:22PM +0400, Konstantin Khlebnikov wrote:
> On Thu, May 22, 2014 at 6:33 AM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > On Wed, 21 May 2014 22:19:55 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> >
> >> > A much nicer interface would be for us to (finally!) implement
> >> > fincore(), perhaps with an enhanced per-present-page payload which
> >> > presents the info which you need (although we don't actually know what
> >> > that info is!).
> >>
> >> page/pfn of each page slot and its page cache tag as shown in patch 4/4.
> >>
> >> > This would require open() - it appears to be a requirement that the
> >> > caller not open the file, but no reason was given for this.
> >> >
> >> > Requiring open() would address some of the obvious security concerns,
> >> > but it will still be possible for processes to poke around and get some
> >> > understanding of the behaviour of other processes.  Careful attention
> >> > should be paid to this aspect of any such patchset.
> >>
> >> Sorry if I missed your point, but this interface defines fixed mapping
> >> between file position in /proc/kpagecache and in-file page offset of
> >> the target file. So we do not need to use seq_file mechanism, that's
> >> why open() is not defined and default one is used.
> >> The same thing is true for /proc/{kpagecount,kpageflags}, from which
> >> I copied/pasted some basic code.
> >
> > I think you did miss my point ;) Please do a web search for fincore -
> > it's a syscall similar to mincore(), only it queries pagecache:
> > fincore(int fd, loff_t offset, ...).  In its simplest form it queries
> > just for present/absent, but we could increase the query payload to
> > incorporate additional per-page info.
> >
> > It would take a lot of thought and discussion to nail down the
> > fincore() interface (we've already tried a couple of times).  But
> > unfortunately, fincore() is probably going to be implemented one day
> > and it will (or at least could) make /proc/kpagecache obsolete.
> >
> 
> It seems fincore() also might obsolete /proc/kpageflags and /proc/pid/pagemap.
> because it might be implemented for /dev/mem and /proc/pid/mem as well
> as for normal files.
> 
> Something like this:
> int fincore(int fd, u64 *kpf, u64 *pfn, size_t length, off_t offset)

As always with new syscalls, the flags argument is missing ;)

u64 for kpf doesn't sound future-proof enough. What about this:

int fincore(int fd, size_t length, off_t offset,
	unsigned long flags, void *records);

The format of the records is defined by what the user asks for in flags. For example:

 - FINCORE_PFN: records are 64-bit each with pfn;
 - FINCORE_PAGE_FLAGS: records are 64-bit each with flags;
 - FINCORE_PFN | FINCORE_PAGE_FLAGS: records are 128-bit each with the pfn
   followed by the flags (or vice versa);

New flags can extend the format if we want to expose more info.
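
To make the record layout concrete, here is a minimal userspace sketch of how
a caller might size and decode the records buffer. fincore() is only a
proposal in this thread, so nothing below is an existing kernel API: the
numeric flag values, struct fincore_record, and both helper functions are
assumptions made for illustration, and the sketch arbitrarily picks "pfn
first, then flags" for the combined format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values: the proposal names the flags but assigns no numbers. */
#define FINCORE_PFN        (1UL << 0)
#define FINCORE_PAGE_FLAGS (1UL << 1)

/* Decoded view of one per-page record (illustrative, not a kernel type). */
struct fincore_record {
	uint64_t pfn;        /* meaningful only if FINCORE_PFN was requested */
	uint64_t page_flags; /* meaningful only if FINCORE_PAGE_FLAGS was requested */
};

/* Size in bytes of one record: 64 bits per requested field. */
size_t fincore_record_size(unsigned long flags)
{
	size_t size = 0;

	if (flags & FINCORE_PFN)
		size += sizeof(uint64_t);
	if (flags & FINCORE_PAGE_FLAGS)
		size += sizeof(uint64_t);
	return size;
}

/* Decode record number idx, assuming the pfn precedes the flags. */
struct fincore_record fincore_decode(const void *records,
				     unsigned long flags, size_t idx)
{
	struct fincore_record out = { 0, 0 };
	const unsigned char *p = (const unsigned char *)records +
				 idx * fincore_record_size(flags);

	if (flags & FINCORE_PFN) {
		memcpy(&out.pfn, p, sizeof(out.pfn));
		p += sizeof(out.pfn);
	}
	if (flags & FINCORE_PAGE_FLAGS)
		memcpy(&out.page_flags, p, sizeof(out.page_flags));
	return out;
}
```

With this layout a caller would allocate fincore_record_size(flags) bytes per
page in the queried range, and a flag added later would only grow the record
size rather than change the syscall signature.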

Comments?

BTW, is everybody happy with the mincore() interface? We report 1 there if
the pte is present, but that doesn't really say much about the page in cases
like the zero page...
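
A small Linux-only sketch of the zero page case, using the existing
mincore(2) call. The helper name mincore_after_read_fault() is made up for
this example. It maps one anonymous page read-only, touches it with a read
(which would normally map the shared zero page), and returns what mincore()
reports for it. No particular result is claimed here, since it depends on
the kernel.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for mincore() and MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns the residency bit mincore() reports for an anonymous page that
 * has only ever been read, or -1 if any call fails.
 */
int mincore_after_read_fault(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char vec = 0;
	int ret = -1;

	if (page <= 0)
		return -1;

	volatile unsigned char *p = mmap(NULL, (size_t)page, PROT_READ,
					 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	(void)p[0]; /* read fault only, the page is never written */

	if (mincore((void *)p, (size_t)page, &vec) == 0)
		ret = vec & 1; /* only the low bit of each vec byte is defined */

	munmap((void *)p, (size_t)page);
	return ret;
}
```

If this returns 1, the caller still cannot tell a privately allocated page
from the shared zero page, which is the ambiguity raised above.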

-- 
 Kirill A. Shutemov



Thread overview: 58+ messages
2014-05-21  2:26 [PATCH 0/4] pagecache scanning with /proc/kpagecache Naoya Horiguchi
2014-05-21  2:26 ` Naoya Horiguchi
2014-05-21  2:26 ` [PATCH 1/4] radix-tree: add end_index to support ranged iteration Naoya Horiguchi
2014-05-21  2:26   ` Naoya Horiguchi
2014-05-21  8:21   ` Konstantin Khlebnikov
2014-05-21  8:21     ` Konstantin Khlebnikov
2014-05-21 19:26     ` Naoya Horiguchi
2014-05-21  2:26 ` [PATCH 2/4] fs/proc/page.c: introduce /proc/kpagecache interface Naoya Horiguchi
2014-05-21  2:26   ` Naoya Horiguchi
2014-05-21  2:26 ` [PATCH 3/4] tools/vm/page-types.c: rework on file cache scanning mode Naoya Horiguchi
2014-05-21  2:26   ` Naoya Horiguchi
2014-05-21  2:26 ` [PATCH 4/4] Documentation: update Documentation/vm/pagemap.txt Naoya Horiguchi
2014-05-21  2:26   ` Naoya Horiguchi
2014-05-21 22:42 ` [PATCH 0/4] pagecache scanning with /proc/kpagecache Andrew Morton
2014-05-21 22:42   ` Andrew Morton
2014-05-22  2:19   ` Naoya Horiguchi
     [not found]   ` <537d5ee4.4914e00a.5672.ffff85d5SMTPIN_ADDED_BROKEN@mx.google.com>
2014-05-22  2:33     ` Andrew Morton
2014-05-22  2:33       ` Andrew Morton
2014-05-22  9:50       ` Konstantin Khlebnikov
2014-05-22  9:50         ` Konstantin Khlebnikov
2014-05-22 10:36         ` Kirill A. Shutemov [this message]
2014-05-22 10:36           ` Kirill A. Shutemov
2014-05-22 17:47           ` Naoya Horiguchi
2014-05-22 21:02             ` Naoya Horiguchi
2014-06-02  5:24       ` [RFC][PATCH 0/3] mm: introduce fincore() Naoya Horiguchi
2014-06-02  5:24         ` Naoya Horiguchi
2014-06-02  5:24         ` [PATCH 1/3] replace PAGECACHE_TAG_* definition with enumeration Naoya Horiguchi
2014-06-02  5:24           ` Naoya Horiguchi
2014-06-02 16:12           ` Dave Hansen
2014-06-02 16:12             ` Dave Hansen
2014-06-02 16:37             ` Naoya Horiguchi
     [not found]             ` <1401727052-f7v7kykv@n-horiguchi@ah.jp.nec.com>
2014-06-02 16:45               ` Dave Hansen
2014-06-02 16:45                 ` Dave Hansen
2014-06-02 17:14                 ` Naoya Horiguchi
2014-06-02 18:19                   ` Dave Hansen
2014-06-02 18:19                     ` Dave Hansen
2014-06-02 18:48                     ` Naoya Horiguchi
2014-06-02 21:16             ` Andrew Morton
2014-06-02 21:16               ` Andrew Morton
2014-06-02 21:51               ` Naoya Horiguchi
2014-06-02  5:24         ` [PATCH 2/3] mm: introduce fincore() Naoya Horiguchi
2014-06-02  5:24           ` Naoya Horiguchi
     [not found]           ` <1401686699-9723-3-git-send-email-n-horiguchi-PaJj6Psr51x8UrSeD/g0lQ@public.gmane.org>
2014-06-02  6:42             ` Christoph Hellwig
2014-06-02  6:42               ` Christoph Hellwig
2014-06-02  6:42               ` Christoph Hellwig
2014-06-02 14:19               ` Naoya Horiguchi
2014-06-02  7:06             ` Michael Kerrisk
2014-06-02  7:06               ` Michael Kerrisk
2014-06-02  7:06               ` Michael Kerrisk
2014-06-02 14:21               ` Naoya Horiguchi
2014-06-02 12:23           ` Kirill A. Shutemov
2014-06-02 12:23             ` Kirill A. Shutemov
2014-06-02 14:52             ` Naoya Horiguchi
2014-06-02 16:11           ` Dave Hansen
2014-06-02 16:11             ` Dave Hansen
2014-06-02 16:22             ` Naoya Horiguchi
2014-06-02  5:24         ` [PATCH 3/3] selftest: add test code for fincore() Naoya Horiguchi
2014-06-02  5:24           ` Naoya Horiguchi
