From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Wu Fengguang <fengguang.wu@intel.com>,
Arnaldo Carvalho de Melo <acme@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Cyrill Gorcunov <gorcunov@openvz.org>
Subject: Re: [PATCH 0/4] pagecache scanning with /proc/kpagecache
Date: Thu, 22 May 2014 13:36:32 +0300
Message-ID: <20140522103632.GA23680@node.dhcp.inet.fi>
In-Reply-To: <CALYGNiMeDtiaA6gfbEYcXbwkuFvTRCLC9KmMOPtopAgGg5b6AA@mail.gmail.com>
On Thu, May 22, 2014 at 01:50:22PM +0400, Konstantin Khlebnikov wrote:
> On Thu, May 22, 2014 at 6:33 AM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > On Wed, 21 May 2014 22:19:55 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> >
> >> > A much nicer interface would be for us to (finally!) implement
> >> > fincore(), perhaps with an enhanced per-present-page payload which
> >> > presents the info which you need (although we don't actually know what
> >> > that info is!).
> >>
> >> page/pfn of each page slot and its page cache tag as shown in patch 4/4.
> >>
> >> > This would require open() - it appears to be a requirement that the
> >> > caller not open the file, but no reason was given for this.
> >> >
> >> > Requiring open() would address some of the obvious security concerns,
> >> > but it will still be possible for processes to poke around and get some
> >> > understanding of the behaviour of other processes. Careful attention
> >> > should be paid to this aspect of any such patchset.
> >>
> >> Sorry if I missed your point, but this interface defines a fixed mapping
> >> between the file position in /proc/kpagecache and the in-file page offset
> >> of the target file. So we do not need the seq_file mechanism; that's
> >> why open() is not defined and the default one is used.
> >> The same thing is true for /proc/{kpagecount,kpageflags}, from which
> >> I copied/pasted some basic code.
> >
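The fixed mapping described above can be illustrated with the existing /proc/kpageflags convention, where the 64-bit entry for PFN i sits at byte offset i * 8. Because the offset is a pure function of the index, open() plus pread() is all a reader needs. This is a minimal sketch that assumes a flat array of u64 values; read_u64_entry() is a hypothetical helper, not a kernel or libc API.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read entry `index` from a file laid out as a flat
 * array of 64-bit values (the /proc/kpageflags and /proc/kpagecount style,
 * where entry i lives at byte offset i * 8). No per-open state such as
 * seq_file is needed, because the offset alone identifies the entry.
 * Returns 0 on success, -1 on error with errno set (EIO on a short read,
 * i.e. an index past the end of the file).
 */
int read_u64_entry(const char *path, uint64_t index, uint64_t *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    off_t off = (off_t)(index * sizeof(uint64_t));
    ssize_t n = pread(fd, out, sizeof(*out), off);
    int saved = errno;  /* close() may clobber errno */
    close(fd);

    if (n == (ssize_t)sizeof(*out))
        return 0;
    errno = (n < 0) ? saved : EIO;
    return -1;
}
```

Against the real file this would be called as read_u64_entry("/proc/kpageflags", pfn, &flags), which normally requires root, since the file is readable only by its owner.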
> > I think you did miss my point ;) Please do a web search for fincore -
> > it's a syscall similar to mincore(), only it queries pagecache:
> > fincore(int fd, loff_t offset, ...). In its simplest form it queries
> > just for present/absent, but we could increase the query payload to
> > incorporate additional per-page info.
> >
> > It would take a lot of thought and discussion to nail down the
> > fincore() interface (we've already tried a couple of times). But
> > unfortunately, fincore() is probably going to be implemented one day
> > and it will (or at least could) make /proc/kpagecache obsolete.
> >
>
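Without a fincore() syscall, the "simplest form" described above (present or absent per page) is usually approximated from userspace with mmap() plus mincore(). This is a rough sketch that assumes a regular file and accepts mapping the whole file. count_cached_pages() is a hypothetical helper, not part of any proposed interface.

```c
#define _DEFAULT_SOURCE  /* glibc: declare mincore() */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of `path` are resident in the
 * page cache. Stores the file's total page count in *total and returns the
 * resident count, or -1 on error.
 */
long count_cached_pages(const char *path, long *total)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    long psz = sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    *total = (long)((len + psz - 1) / psz);
    if (len == 0) {  /* mmap() of length 0 would fail */
        close(fd);
        return 0;
    }

    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping holds its own reference to the file */
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc((size_t)*total);  /* one byte per page */
    if (!vec) {
        munmap(map, len);
        return -1;
    }

    long resident = -1;
    if (mincore(map, len, vec) == 0) {
        resident = 0;
        for (long i = 0; i < *total; i++)
            resident += vec[i] & 1;  /* bit 0 set: page is resident */
    }

    free(vec);
    munmap(map, len);
    return resident;
}
```

Mapping a file just to query it is part of what a real fincore() would avoid. Note also that newer kernels restrict mincore() on file mappings: for files the caller neither owns nor could open for writing, every page is reported as resident.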
> It seems fincore() might also obsolete /proc/kpageflags and /proc/pid/pagemap,
> because it could be implemented for /dev/mem and /proc/pid/mem as well
> as for normal files.
>
> Something like this:
> int fincore(int fd, u64 *kpf, u64 *pfn, size_t length, off_t offset)
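For comparison with what fincore() is suggested to replace, this is roughly what a /proc/pid/pagemap entry already carries per virtual page (the entry for address addr is at offset (addr / page_size) * 8). This decoding sketch covers only the long-standing bits described in the kernel's pagemap documentation. pm_decode(), struct pm_entry and the PM_* names are illustrative, not a kernel header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of a pagemap entry (long-standing fields only). */
#define PM_PFN_MASK    ((UINT64_C(1) << 55) - 1)  /* bits 0-54: PFN if present */
#define PM_FILE_SHARED (UINT64_C(1) << 61)        /* file page or shared anon */
#define PM_SWAPPED     (UINT64_C(1) << 62)        /* page is swapped out */
#define PM_PRESENT     (UINT64_C(1) << 63)        /* page is present in RAM */

struct pm_entry {
    bool present;
    bool swapped;
    bool file_or_shared;
    uint64_t pfn;  /* only meaningful when present */
};

/* Hypothetical helper: decode one raw 64-bit pagemap entry. */
struct pm_entry pm_decode(uint64_t raw)
{
    struct pm_entry e;

    e.present = (raw & PM_PRESENT) != 0;
    e.swapped = (raw & PM_SWAPPED) != 0;
    e.file_or_shared = (raw & PM_FILE_SHARED) != 0;
    /* When swapped, bits 0-54 hold swap type/offset, not a PFN. */
    e.pfn = e.present ? (raw & PM_PFN_MASK) : 0;
    return e;
}
```

Newer kernels also zero the PFN field for readers without CAP_SYS_ADMIN, which is one of the security aspects any fincore() payload carrying PFNs would have to address as well.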

As always with new syscalls, flags are missing ;)
u64 for kpf doesn't sound future-proof enough. What about this:

  int fincore(int fd, size_t length, off_t offset,
              unsigned long flags, void *records);

The format of records is defined by what the user asks for in flags, e.g.:
- FINCORE_PFN: records are 64-bit each with pfn;
- FINCORE_PAGE_FLAGS: records are 64-bit each with flags;
- FINCORE_PFN | FINCORE_PAGE_FLAGS: records are 128-bit each with pfns
followed by flags (or vice versa);
New flags can extend the format if we want to expose more info.
Comments?
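A sketch of how a caller might size and walk the records buffer under the flags-driven layout proposed above. FINCORE_PFN and FINCORE_PAGE_FLAGS are the names from the proposal, but their bit values, the pfn-before-flags field order, the page-aligned offset, and all three helper functions are assumptions for illustration; no syscall with this exact shape exists.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Names from the proposal; these bit values are assumed. */
#define FINCORE_PFN        (1UL << 0)
#define FINCORE_PAGE_FLAGS (1UL << 1)

/* Each requested field adds one 64-bit word to every record. */
size_t fincore_record_size(unsigned long flags)
{
    size_t words = 0;

    if (flags & FINCORE_PFN)
        words++;
    if (flags & FINCORE_PAGE_FLAGS)
        words++;
    return words * sizeof(uint64_t);
}

/* Buffer needed for `length` bytes of file, one record per page
 * (assumes a page-aligned offset). */
size_t fincore_buf_size(size_t length, unsigned long flags, size_t page_size)
{
    size_t pages = (length + page_size - 1) / page_size;

    return pages * fincore_record_size(flags);
}

/* Extract record `i`; fields are laid out in flag-bit order (pfn first,
 * then page flags). Fields the caller did not request are left untouched. */
void fincore_get_record(const void *records, unsigned long flags, size_t i,
                        uint64_t *pfn, uint64_t *kpf)
{
    const unsigned char *rec =
        (const unsigned char *)records + i * fincore_record_size(flags);

    if (flags & FINCORE_PFN) {
        memcpy(pfn, rec, sizeof(*pfn));
        rec += sizeof(*pfn);
    }
    if (flags & FINCORE_PAGE_FLAGS)
        memcpy(kpf, rec, sizeof(*kpf));
}
```

Ordering fields by flag bit means a new flag only appends a word to each record, so callers that do not set it keep seeing an unchanged layout.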
BTW, is everybody happy with the mincore() interface? We report 1 there if
the pte is present, but that doesn't really say much about the page in cases
like the zero page...
--
Kirill A. Shutemov