linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
To: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: Andres Lagar-Cavilla
	<andreslc-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Minchan Kim <minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Raghavendra K T
	<raghavendra.kt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
	Greg Thelen <gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Michel Lespinasse
	<walken-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>,
	Cyrill Gorcunov
	<gorcunov-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>,
	Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org>,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-doc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
Subject: Re: [PATCH -mm v9 0/8] idle memory tracking
Date: Sat, 25 Jul 2015 19:24:56 +0300	[thread overview]
Message-ID: <20150725162456.GM8100@esperanza> (raw)
In-Reply-To: <20150722162353.GM23374@esperanza>

On Wed, Jul 22, 2015 at 07:23:53PM +0300, Vladimir Davydov wrote:
> On Tue, Jul 21, 2015 at 04:34:02PM -0700, Andrew Morton wrote:
> > On Sun, 19 Jul 2015 15:31:09 +0300 Vladimir Davydov <vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> wrote:

> > > Documentation/vm/pagemap.txt           |  22 ++-
> > 
> > I think we'll need quite a lot more than this to fully describe the
> > interface?
> 
> Agree, the documentation sucks :-( Will try to forge something more
> thorough.

The incremental patch is attached. Could you please merge it into
proc-add-kpageidle-file?
---
From: Vladimir Davydov <vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Subject: [PATCH] Documentation: Add idle page tracking description

Signed-off-by: Vladimir Davydov <vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>

diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX
index 081c49777abb..6a5e2a102a45 100644
--- a/Documentation/vm/00-INDEX
+++ b/Documentation/vm/00-INDEX
@@ -14,6 +14,8 @@ hugetlbpage.txt
 	- a brief summary of hugetlbpage support in the Linux kernel.
 hwpoison.txt
 	- explains what hwpoison is
+idle_page_tracking.txt
+	- description of the idle page tracking feature.
 ksm.txt
 	- how to use the Kernel Samepage Merging feature.
 numa
diff --git a/Documentation/vm/idle_page_tracking.txt b/Documentation/vm/idle_page_tracking.txt
new file mode 100644
index 000000000000..d0f332d544c4
--- /dev/null
+++ b/Documentation/vm/idle_page_tracking.txt
@@ -0,0 +1,94 @@
+MOTIVATION
+
+The idle page tracking feature allows to track which memory pages are being
+accessed by a workload and which are idle. This information can be useful for
+estimating the workload's working set size, which, in turn, can be taken into
+account when configuring the workload parameters, setting memory cgroup limits,
+or deciding where to place the workload within a compute cluster.
+
+USER API
+
+If CONFIG_IDLE_PAGE_TRACKING was enabled on compile time, a new read-write file
+is present on the proc filesystem, /proc/kpageidle.
+
+The file implements a bitmap where each bit corresponds to a memory page. The
+bitmap is represented by an array of 8-byte integers, and the page at PFN #i is
+mapped to bit #i%64 of array element #i/64, byte order is native. When a bit is
+set, the corresponding page is idle.
+
+A page is considered idle if it has not been accessed since it was marked idle
+(for more details on what "accessed" actually means see the IMPLEMENTATION
+DETAILS section). To mark a page idle one has to set the bit corresponding to
+the page by writing to the file. A value written to the file is OR-ed with the
+current bitmap value.
+
+Only accesses to user memory pages are tracked. These are pages mapped to a
+process address space, page cache and buffer pages, swap cache pages. For other
+page types (e.g. SLAB pages) an attempt to mark a page idle is silently ignored,
+and hence such pages are never reported idle.
+
+For huge pages the idle flag is set only on the head page, so one has to read
+/proc/kpageflags in order to correctly count idle huge pages.
+
+Reading from or writing to /proc/kpageidle will return -EINVAL if you are not
+starting the read/write on an 8-byte boundary, or if the size of the read/write
+is not a multiple of 8 bytes. Writing to this file beyond max PFN will return
+-ENXIO.
+
+That said, in order to estimate the amount of pages that are not used by a
+workload one should:
+
+ 1. Mark all the workload's pages as idle by setting corresponding bits in the
+    /proc/kpageidle bitmap. The pages can be found by reading /proc/pid/pagemap
+    if the workload is represented by a process, or by filtering out alien pages
+    using /proc/kpagecgroup in case the workload is placed in a memory cgroup.
+
+ 2. Wait until the workload accesses its working set.
+
+ 3. Read /proc/kpageidle and count the number of bits set. If one wants to
+    ignore certain types of pages, e.g. mlocked pages since they are not
+    reclaimable, he or she can filter them out using /proc/kpageflags.
+
+See Documentation/vm/pagemap.txt for more information about /proc/pid/pagemap,
+/proc/kpageflags, and /proc/kpagecgroup.
+
+IMPLEMENTATION DETAILS
+
+The kernel internally keeps track of accesses to user memory pages in order to
+reclaim unreferenced pages first on memory shortage conditions. A page is
+considered referenced if it has been recently accessed via a process address
+space, in which case one or more PTEs it is mapped to will have the Accessed bit
+set, or marked accessed explicitly by the kernel (see mark_page_accessed()). The
+latter happens when:
+
+ - a userspace process reads or writes a page using a system call (e.g. read(2)
+   or write(2))
+
+ - a page that is used for storing filesystem buffers is read or written,
+   because a process needs filesystem metadata stored in it (e.g. lists a
+   directory tree)
+
+ - a page is accessed by a device driver using get_user_pages()
+
+When a dirty page is written to swap or disk as a result of memory reclaim or
+exceeding the dirty memory limit, it is not marked referenced.
+
+The idle memory tracking feature adds a new page flag, the Idle flag. This flag
+is set manually, by writing to /proc/kpageidle (see the USER API section), and
+cleared automatically whenever a page is referenced as defined above.
+
+When a page is marked idle, the Accessed bit must be cleared in all PTEs it is
+mapped to, otherwise we will not be able to detect accesses to the page coming
+from a process address space. To avoid interference with the reclaimer, which,
+as noted above, uses the Accessed bit to promote actively referenced pages, one
+more page flag is introduced, the Young flag. When the PTE Accessed bit is
+cleared as a result of setting or updating a page's Idle flag, the Young flag
+is set on the page. The reclaimer treats the Young flag as an extra PTE
+Accessed bit and therefore will consider such a page as referenced.
+
+Since the idle memory tracking feature is based on the memory reclaimer logic,
+it only works with pages that are on an LRU list, other pages are silently
+ignored. That means it will ignore a user memory page if it is isolated, but
+since there are usually not many of them, it should not affect the overall
+result noticeably. In order not to stall scanning of /proc/kpageidle, locked
+pages may be skipped too.
diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index 538735465693..cff513e28a13 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -71,15 +71,8 @@ There are five components to pagemap:
    memory cgroup each page is charged to, indexed by PFN. Only available when
    CONFIG_MEMCG is set.
 
- * /proc/kpageidle.  This file implements a bitmap where each bit corresponds
-   to a page, indexed by PFN. When the bit is set, the corresponding page is
-   idle. A page is considered idle if it has not been accessed since it was
-   marked idle. To mark a page idle one should set the bit corresponding to the
-   page by writing to the file. A value written to the file is OR-ed with the
-   current bitmap value. Only user memory pages can be marked idle, for other
-   page types input is silently ignored. Writing to this file beyond max PFN
-   results in the ENXIO error. Only available when CONFIG_IDLE_PAGE_TRACKING is
-   set.
+ * /proc/kpageidle.  This file comprises API of the idle page tracking feature.
+   See Documentation/vm/idle_page_tracking.txt for more details.
 
 Short descriptions to the page flags:
 
diff --git a/mm/Kconfig b/mm/Kconfig
index a1de09926171..90fa89175102 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -666,4 +666,4 @@ config IDLE_PAGE_TRACKING
 	  be useful to tune memory cgroup limits and/or for job placement
 	  within a compute cluster.
 
-	  See Documentation/vm/pagemap.txt for more details.
+	  See Documentation/vm/idle_page_tracking.txt for more details.

  reply	other threads:[~2015-07-25 16:24 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-19 12:31 [PATCH -mm v9 0/8] idle memory tracking Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 1/8] memcg: add page_cgroup_ino helper Vladimir Davydov
2015-07-21 23:34   ` Andrew Morton
2015-07-22  9:21     ` Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 2/8] hwpoison: use page_cgroup_ino for filtering by memcg Vladimir Davydov
2015-07-21 23:34   ` Andrew Morton
     [not found]     ` <20150721163412.1b44e77f5ac3b742734d1ce6-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2015-07-22  9:45       ` Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 3/8] memcg: zap try_get_mem_cgroup_from_page Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 4/8] proc: add kpagecgroup file Vladimir Davydov
2015-07-21 23:34   ` Andrew Morton
     [not found]     ` <20150721163433.618855e1f61536a09dfac30b-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2015-07-22 10:33       ` Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 5/8] mmu-notifier: add clear_young callback Vladimir Davydov
2015-07-20 18:34   ` Andres Lagar-Cavilla
2015-07-21  8:51     ` Vladimir Davydov
2015-07-22 16:33       ` Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 6/8] proc: add kpageidle file Vladimir Davydov
2015-07-21 23:34   ` Andrew Morton
     [not found]     ` <20150721163452.c1e4075a2b193bcd325fad56-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2015-07-22 15:20       ` Vladimir Davydov
     [not found]   ` <d7a78b72053cf529c0c9ff6cbc02ffbb3d58fe35.1437303956.git.vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2015-07-24 14:08     ` Paul Gortmaker
     [not found]       ` <CAP=VYLqiNfQJ6oyQg2GszeHwdOmeY_uD3XPvw=++weJOKdx4_g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-07-24 14:17         ` Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 7/8] proc: export idle flag via kpageflags Vladimir Davydov
2015-07-21 23:35   ` Andrew Morton
     [not found]     ` <20150721163500.528bd39bbbc71abc3c8d429b-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2015-07-22 16:25       ` Vladimir Davydov
2015-07-22 19:44         ` Andrew Morton
2015-07-22 20:46           ` Andres Lagar-Cavilla
2015-07-23  7:57             ` Vladimir Davydov
2015-07-19 12:31 ` [PATCH -mm v9 8/8] proc: add cond_resched to /proc/kpage* read/write loop Vladimir Davydov
     [not found] ` <cover.1437303956.git.vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2015-07-19 12:37   ` [PATCH -mm v9 0/8] idle memory tracking Vladimir Davydov
2015-07-21 21:39 ` Andres Lagar-Cavilla
2015-07-21 23:34 ` Andrew Morton
2015-07-22 16:23   ` Vladimir Davydov
2015-07-25 16:24     ` Vladimir Davydov [this message]
2015-07-27 19:18   ` Kees Cook
2015-07-27 19:25     ` Andrew Morton
2015-07-29 12:36 ` Michal Hocko
     [not found]   ` <20150729123629.GI15801-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2015-07-29 13:59     ` Vladimir Davydov
2015-07-29 14:12       ` Michel Lespinasse
     [not found]         ` <CANN689HJX2ZL891uOd8TW9ct4PNH9d5odQZm86WMxkpkCWhA-w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-07-29 14:13           ` Michel Lespinasse
2015-07-29 14:45           ` Vladimir Davydov
2015-07-29 15:08             ` Michel Lespinasse
     [not found]               ` <CANN689Euq3Y-CHQo8q88vzFAYZX4S6rK+rZRfbuSKfS74u=gcg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-07-29 15:31                 ` Vladimir Davydov
2015-07-29 15:34                   ` Michel Lespinasse
2015-07-29 15:08             ` Michal Hocko
     [not found]               ` <20150729150855.GM15801-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2015-07-29 15:36                 ` Vladimir Davydov
2015-07-29 15:58                   ` Michal Hocko
2015-07-29 14:26       ` Michal Hocko
2015-07-29 15:28         ` Vladimir Davydov
2015-07-29 15:47           ` Michal Hocko
     [not found]             ` <20150729154718.GN15801-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2015-07-29 16:29               ` Vladimir Davydov
2015-07-29 21:30                 ` Andrew Morton
2015-07-30  9:12                   ` Vladimir Davydov
2015-07-30 13:01                     ` Vladimir Davydov
2015-07-31  9:34                       ` Vladimir Davydov
2015-07-30  9:07                 ` Michal Hocko
     [not found]                   ` <20150730090708.GE9387-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2015-07-30  9:31                     ` Vladimir Davydov
2015-07-29 15:55           ` Andres Lagar-Cavilla
     [not found]             ` <CAJu=L59RdowYjTyVM0Vhz79A4d=d8=ZmU7PB59CmEj5B0_c48Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-07-29 16:37               ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150725162456.GM8100@esperanza \
    --to=vdavydov-bzqdu9zft3wakbo8gow8eq@public.gmane.org \
    --cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
    --cc=andreslc-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=corbet-T1hC0tSOHrs@public.gmane.org \
    --cc=gorcunov-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org \
    --cc=gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org \
    --cc=keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org \
    --cc=linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-doc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
    --cc=mhocko-AlSwsSmVLrQ@public.gmane.org \
    --cc=minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=raghavendra.kt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org \
    --cc=rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=walken-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).