Re: [CRIU] Periodic checkpointing (using perf and signals?)

linux-perf-users.vger.kernel.org archive mirror

From: Pavel Emelyanov <xemul@parallels.com>
To: Christopher Covington <cov@codeaurora.org>
Cc: linux-perf-users@vger.kernel.org, criu@openvz.org
Subject: Re: [CRIU] Periodic checkpointing (using perf and signals?)
Date: Wed, 17 Jul 2013 19:57:11 +0400
Message-ID: <51E6BED7.20402@parallels.com>
In-Reply-To: <51E6BBC0.1080102@codeaurora.org>

On 07/17/2013 07:44 PM, Christopher Covington wrote:
> Hi,
> 
> I'm interested in taking checkpoints of processes from fast systems like
> hardware and restoring them on really slow software models for performance
> analysis.

Great idea! I will add it to http://criu.org/Usage_scenarios :)

> So far I've been able to save and restore checkpoints on the
> different systems using CRIU. Now I'm looking for some way to trigger the
> checkpointing. One basic use case might be to take a process that runs for say
> 100M instructions and take a checkpoint every 10M instructions to be restored
> as 10 parallel runs of the model.
> 
> I'm thinking of trying to use performance counters to trigger such behavior.
> Does perf already have support for triggering things like this?

I'm not 100% sure, but I've seen examples of Python plugins for perf. Judging
by these examples, I believe it's possible to write a plugin that will run
some code after noticing 100M instructions.

> If not, I'm
> thinking of trying to work in the ability to send a signal, like stop, to the
> process of interest once the specified count, such as 10M instructions, has
> been reached. CRIU or a wrapper could then wait for process of interest to
> stop, take the checkpoint, let the process continue, and then wait for it to
> stop again or exit. Would such an approach make sense?

It makes perfect sense! Several things to note from my side.

1. This is a perfect case for the --track-mem and --prev-images-dir options.
They will make subsequent dumps take MUCH less time, since with them CRIU
will not take a full task dump, but will only grab what has changed since
the last dump.

2. The current version of CRIU doesn't work with stopped tasks. We're working
on that support now, and it will only be available starting with v0.7. However,
I think it's OK to just start the "criu dump" command after the perf trigger.
The dump would work on a process that has executed slightly more than 10M
instructions, but that would also be the case if you sent it a STOP signal.
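
As a rough illustration of the two points above, here is a minimal Python
sketch of how a wrapper might build the successive "criu dump" command lines.
The helper name build_dump_command and the ckpt-NNN directory layout are
hypothetical. It also assumes that --leave-running keeps the task alive after
the dump and that --prev-images-dir is interpreted relative to the images
directory; please check both against the CRIU version in use. The sketch only
builds the argument list and does not invoke criu itself.

```python
import os
from typing import List


def build_dump_command(pid: int, images_root: str, index: int) -> List[str]:
    """Return the argv for the index-th (0-based) dump of process `pid`.

    Hypothetical helper: each dump goes into its own images_root/ckpt-NNN
    directory, and every dump after the first points at the previous one.
    """
    images_dir = os.path.join(images_root, f"ckpt-{index:03d}")
    cmd = [
        "criu", "dump",
        "--tree", str(pid),
        "--images-dir", images_dir,
        "--track-mem",        # track memory changes for the next dump
        "--leave-running",    # assumption: keep the task running afterwards
    ]
    if index > 0:
        # Assumption: the path is resolved relative to --images-dir.
        prev_dir = os.path.join("..", f"ckpt-{index - 1:03d}")
        cmd += ["--prev-images-dir", prev_dir]
    return cmd


if __name__ == "__main__":
    # Print the commands a perf-triggered wrapper would run for three dumps
    # of an example PID.
    for i in range(3):
        print(" ".join(build_dump_command(1234, "/tmp/ckpts", i)))
```

A wrapper would run the command for index 0 at the first trigger, index 1 at
the second, and so on, creating each ckpt-NNN directory beforehand.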

> Thanks,
> Christopher

Thanks,
Pavel

