From: Pavel Emelyanov <xemul@parallels.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Michal Hocko <mhocko@suse.cz>, Mel Gorman <mgorman@suse.de>,
Johannes Weiner <hannes@cmpxchg.org>,
Linux MM <linux-mm@kvack.org>, Rik van Riel <riel@redhat.com>
Subject: Re: [RFC PATCH 0/2] mm: Add ability to monitor task's memory changes
Date: Tue, 04 Dec 2012 09:15:10 +0400
Message-ID: <50BD86DE.6050700@parallels.com>
In-Reply-To: <20121203144310.7ccdbeb4.akpm@linux-foundation.org>
On 12/04/2012 02:43 AM, Andrew Morton wrote:
> On Fri, 30 Nov 2012 21:55:00 +0400
> Pavel Emelyanov <xemul@parallels.com> wrote:
>
>> This is an attempt to implement support for memory snapshots for the
>> checkpoint-restore project (http://criu.org).
>>
>> To create a dump of an application(s) we save all the information about it
>> to files. No surprise, the biggest part of such dump is the contents of tasks'
>> memory. However, in some usage scenarios it's not required to get _all_ the
>> task memory while creating a dump. For example, when doing periodic dumps
>> a full memory dump is required only at the first step; after that, only
>> incremental changes of memory need to be taken. Another example is live
>> migration. In the simplest form it looks like -- create dump, copy it to the
>> remote node, then restore tasks from dump files. While this whole
>> dump-copy-restore sequence runs, all the processes must be stopped. However,
>> if we can monitor how tasks change their
>> memory, we can dump and copy it in smaller chunks, periodically updating it
>> and thus freezing tasks only at the very end for the very short time to pick
>> up the recent changes.
>>
>> That said, some help from kernel to watch how processes modify the contents of
>> their memory is required. I'd like to propose one possible solution of this
>> task -- with the help of page-faults and trace events.
>>
>> Briefly the approach is -- remap some memory regions as read-only, get the #pf
>> on task's attempt to modify the memory and issue a trace event of that. Since
>> we're only interested in parts of memory of some tasks, make it possible to mark
>> the vmas we're interested in and issue events for them only. Also, to be aware
>> of tasks unmapping the vma-s being watched, also issue an event when the marked
>> vma is removed (and for symmetry -- an event when a vma is marked).
>>
>> What do you think about this approach? Is this way of supporting mem snapshot
>> OK for you, or should we invent some better one?
>
> The patches look pretty simple.
>
> Some performance numbers would be useful.
>
> Is it reliable? Under what circumstances will the trace system drop
> events?
AFAIS, events are dropped when the buffer for events overflows, but the
buffer size can be tuned. I will write some more descriptive text about it
if the tracing approach is considered to be the way to go.
> Please cc Steven Rostedt on tracing stuff - he is a diligent reviewer.
OK.
> The proposed interface might be useful to things other than c/r. But
> it hasn't actually been described. Please include a full description
> of the proposed kernel/userspace interface.
OK, will try to address that.
> Two alternatives come to mind:
>
> 1) Use /proc/pid/pagemap (Documentation/vm/pagemap.txt) in some
> fashion to determine which pages have been touched.
I thought about this. Unfortunately there are no free bits left in the pagemap
entry. What can we do about it (other than introducing a pagemap2 file)?
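For reference, a minimal sketch of how a 64-bit pagemap entry is carved up,
assuming the bit assignments described in Documentation/vm/pagemap.txt for
kernels of this era (3.5 and later). The macro and function names are made up
for illustration and are not kernel identifiers.

```c
#include <stdint.h>
#include <sys/types.h>

/*
 * Assumed layout of one /proc/pid/pagemap entry (one 64-bit word per
 * virtual page), per Documentation/vm/pagemap.txt in 3.5+ era kernels:
 *   bits 0-54   PFN if present (swap type/offset if swapped)
 *   bits 55-60  page shift (page size = 1 << page shift)
 *   bit  61     page is file-page or shared-anon
 *   bit  62     page swapped
 *   bit  63     page present
 * All names below are hypothetical, for illustration only.
 */
#define PM_ENTRY_BYTES   8
#define PM_PFN_MASK      ((1ULL << 55) - 1)
#define PM_PSHIFT_SHIFT  55
#define PM_PSHIFT_MASK   0x3fULL
#define PM_FILE_SHARED   (1ULL << 61)
#define PM_SWAPPED       (1ULL << 62)
#define PM_PRESENT       (1ULL << 63)

/* Byte offset in the pagemap file of the entry describing vaddr. */
static off_t pm_file_offset(uintptr_t vaddr, long page_size)
{
    return (off_t)(vaddr / (uintptr_t)page_size) * PM_ENTRY_BYTES;
}

static int pm_is_present(uint64_t entry)
{
    return (entry & PM_PRESENT) != 0;
}

static int pm_is_swapped(uint64_t entry)
{
    return (entry & PM_SWAPPED) != 0;
}

/* The PFN field is only meaningful when the page is present. */
static uint64_t pm_pfn(uint64_t entry)
{
    return pm_is_present(entry) ? (entry & PM_PFN_MASK) : 0;
}

static unsigned pm_page_shift(uint64_t entry)
{
    return (unsigned)((entry >> PM_PSHIFT_SHIFT) & PM_PSHIFT_MASK);
}
```

OR-ing the five fields together covers all 64 bits, which is why a "page was
written" flag cannot be added without changing the format.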
> 2) At pagefault time, don't send an event: just mark the vma as
> "touched". Then add a userspace interface to sweep the vma tree
> testing, clearing and reporting the touched flags.
Per-vma granularity is not enough. In OpenVZ we've observed Oracle touching
only several pages in a hundred-megabyte anon mapping. Marking _part_ of the
vma with the "note write-faults" bit would help, but there are currently no
APIs that modify a vma and report some info back at the same time. Can you
propose what it could look like?
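To illustrate what page granularity means here, a userspace analogue of the
write-protect-and-fault idea, sketched with mprotect() and a SIGSEGV handler.
This is not the proposed kernel interface, only the same technique applied
inside one process; all names (track_start, dirty[], MAX_PAGES) are
hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 1024                 /* arbitrary limit for the sketch */

static char *track_base;               /* start of the watched region */
static size_t track_pages;             /* its length in pages */
static long track_page_size;
static volatile unsigned char dirty[MAX_PAGES]; /* one flag per page */

/* Write fault on a watched page: note the page, then let the write go on. */
static void on_write_fault(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)track_base;
    size_t idx;

    (void)sig;
    (void)ctx;
    if (addr < base || addr >= base + track_pages * (size_t)track_page_size)
        _exit(1);                      /* not ours: treat as a real crash */

    idx = (addr - base) / (size_t)track_page_size;
    dirty[idx] = 1;
    /* mprotect() is not on the POSIX async-signal-safe list; calling it
     * here is a common practice that works on Linux, assumed for brevity. */
    mprotect(track_base + idx * (size_t)track_page_size,
             (size_t)track_page_size, PROT_READ | PROT_WRITE);
}

/* Make the region read-only so the first write to each page faults once. */
static int track_start(void *base, size_t pages)
{
    struct sigaction sa;
    size_t i;

    if (pages > MAX_PAGES)
        return -1;
    track_page_size = sysconf(_SC_PAGESIZE);
    track_base = base;
    track_pages = pages;
    for (i = 0; i < MAX_PAGES; i++)
        dirty[i] = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) < 0)
        return -1;
    return mprotect(base, pages * (size_t)track_page_size, PROT_READ);
}
```

After track_start(buf, n) on a page-aligned mmap()ed buffer, a write to any
page sets only that page's dirty[] flag, so touching several pages of a large
mapping reports just those pages rather than the whole vma.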
> 2a) Avoid the full linear search by propagating the "touched" flag
> up the rbtree and do the sweep in a fashion similar to
> radix_tree_for_each_tagged().
> .
Thanks,
Pavel
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .