From: Andrew Morton <akpm@linux-foundation.org>
To: Pavel Emelyanov <xemul@parallels.com>
Cc: Hugh Dickins <hughd@google.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Michal Hocko <mhocko@suse.cz>, Mel Gorman <mgorman@suse.de>,
Johannes Weiner <hannes@cmpxchg.org>,
Linux MM <linux-mm@kvack.org>, Rik van Riel <riel@redhat.com>,
Matt Mackall <mpm@selenic.com>,
Wu Fengguang <fengguang.wu@intel.com>
Subject: Re: [RFC PATCH 0/2] mm: Add ability to monitor task's memory changes
Date: Tue, 4 Dec 2012 15:21:21 -0800
Message-ID: <20121204152121.e5c33938.akpm@linux-foundation.org>
In-Reply-To: <50BD86DE.6050700@parallels.com>
On Tue, 04 Dec 2012 09:15:10 +0400
Pavel Emelyanov <xemul@parallels.com> wrote:
>
> > Two alternatives come to mind:
> >
> > 1) Use /proc/pid/pagemap (Documentation/vm/pagemap.txt) in some
> > fashion to determine which pages have been touched.
>
> I thought about this. Unfortunately there are no free bits left in the pagemap
> entry. What can we do about it (other than introducing the pagemap2 file)?
urgh, we were pretty careless in laying out the /proc/pid/pagemap
entries.
Probably the 55 bits for pfn/swap were excessive.
The page shift didn't need six bits! Simply predividing the page shift
by 1k would have saved a few bits, and permitting expansion to a 2^63
byte page size is nuts.
Sigh. I wonder how traumatic it would be to put the pagemap record on
a diet and make up some free space.
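
To make the bit budget concrete, here is a minimal sketch of decoding one
pagemap entry. It assumes the layout described in
Documentation/vm/pagemap.txt at the time of this thread (bits 0-54 PFN,
bits 55-60 page shift, bit 61 file/shared-anon, bit 62 swapped, bit 63
present). The struct and function names are made up for illustration and
are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout (Documentation/vm/pagemap.txt, circa late 2012):
 *   bits 0-54   PFN if present (swap type/offset if swapped)
 *   bits 55-60  page shift
 *   bit  61     file-mapped or shared-anon page
 *   bit  62     swapped
 *   bit  63     present
 */
static_assert(sizeof(uint64_t) == 8, "pagemap entries are 64 bits wide");

#define PM_PFN_BITS     55
#define PM_PFN_MASK     ((UINT64_C(1) << PM_PFN_BITS) - 1)
#define PM_SHIFT_OFFSET 55
#define PM_SHIFT_MASK   UINT64_C(0x3f)   /* six bits: shifts 0..63 */
#define PM_FILE         (UINT64_C(1) << 61)
#define PM_SWAPPED      (UINT64_C(1) << 62)
#define PM_PRESENT      (UINT64_C(1) << 63)

/* Hypothetical helper type, for illustration only. */
struct pm_entry {
    uint64_t pfn;         /* meaningful only when present is set */
    unsigned page_shift;  /* log2 of the page size in bytes */
    bool file;
    bool swapped;
    bool present;
};

/* Split a raw 64-bit pagemap entry into its fields. */
static struct pm_entry pm_decode(uint64_t raw)
{
    struct pm_entry e;

    e.pfn = raw & PM_PFN_MASK;
    e.page_shift = (unsigned)((raw >> PM_SHIFT_OFFSET) & PM_SHIFT_MASK);
    e.file = (raw & PM_FILE) != 0;
    e.swapped = (raw & PM_SWAPPED) != 0;
    e.present = (raw & PM_PRESENT) != 0;
    return e;
}
```

All 64 bits are spoken for in this layout, which is why a new per-page flag
has nowhere to go without shrinking the PFN or page-shift fields.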
Anyway, do you actually need to add another bit? /proc/pid/pagemap
gives you the pfn which can then be used to look up the page's flags in
/proc/kpageflags. Could you add a "touched" flag to /proc/kpageflags?
But that would require grabbing another bit in struct page.flags, I
assume.
And it would be very expensive. An in-kernel loop which searches the
MM spitting out a string of touched-pages would be faster, but still
slow.
hm.
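
The two-step lookup described above (virtual address -> pagemap entry ->
pfn -> kpageflags entry) comes down to two file offsets. A minimal sketch,
assuming both files are arrays of 64-bit entries, pagemap indexed by
virtual page number and kpageflags indexed by pfn; the helper names are
hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_BYTES 8  /* both files hold one 64-bit word per page */

/* Byte offset in /proc/<pid>/pagemap of the entry covering vaddr. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
    assert(page_size != 0);
    return (vaddr / page_size) * ENTRY_BYTES;
}

/* Byte offset in /proc/kpageflags of the flags word for a given pfn. */
static uint64_t kpageflags_offset(uint64_t pfn)
{
    return pfn * ENTRY_BYTES;
}
```

Userspace would pread() eight bytes at each offset, so checking every page
costs two reads per page. That per-page cost is what makes this approach
expensive for large mappings.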
> > 2) At pagefault time, don't send an event: just mark the vma as
> > "touched". Then add a userspace interface to sweep the vma tree
> > testing, clearing and reporting the touched flags.
>
> Per-vma granularity is not enough. In OpenVZ we've observed Oracle touching
> several pages in a hundred-megs anon mapping. Marking _part_ of the vma with
> the "node write-faults" bit would help, but there are currently no APIs that
> modify a vma and report some info back at the same time. Can you propose what
> it could look like?
I don't see a need to report the info back at the same time? You want
to *record* that information but only report it when someone does a
query?
Dunno. One could add a radix-tree to the vma and store 32 or 64
per-page bits in each slots[] entry. Worst case that would consume
approx one bit of kernel memory for each 4k of instantiated user pages
- an increase of 1/32768. Not too bad. Use the tagged-lookup facility
to efficiently query that bitmap at query-time.
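
A rough userspace sketch of the record-then-query idea, with one bit per
page packed 64 to a slot. To stay short it uses a fixed flat array in place
of a radix tree, so there is no tagged lookup and empty slots are simply
skipped during the sweep. NR_SLOTS and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_SLOT 64
#define NR_SLOTS      1024  /* hypothetical: covers 65536 pages */

/* Stand-in for a per-vma structure; a real one would be sparse. */
struct touched_map {
    uint64_t slots[NR_SLOTS];
};

/* Fault path: record that the page at offset pgoff was touched. */
static void touched_mark(struct touched_map *m, size_t pgoff)
{
    assert(pgoff < (size_t)NR_SLOTS * BITS_PER_SLOT);
    m->slots[pgoff / BITS_PER_SLOT] |= UINT64_C(1) << (pgoff % BITS_PER_SLOT);
}

/* Query path: report up to max touched page offsets into out[], clearing
 * each reported bit. Returns the number of offsets written. */
static size_t touched_sweep(struct touched_map *m, size_t *out, size_t max)
{
    size_t n = 0;

    for (size_t s = 0; s < NR_SLOTS && n < max; s++) {
        /* The inner loop exits at once for slots with no bits set. */
        for (unsigned b = 0;
             b < BITS_PER_SLOT && m->slots[s] != 0 && n < max; b++) {
            uint64_t bit = UINT64_C(1) << b;

            if (m->slots[s] & bit) {
                m->slots[s] &= ~bit;
                out[n++] = s * BITS_PER_SLOT + b;
            }
        }
    }
    return n;
}
```

One bit per 4096-byte page is one bit per 32768 bits of user memory, which
matches the 1/32768 overhead figure above.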