public inbox for kvm@vger.kernel.org
From: "Huang, Kai" <kai.huang@linux.intel.com>
To: "Cao, Lei" <Lei.Cao@stratus.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH 6/6] KVM: Dirty memory tracking for performant checkpointing and improved live migration
Date: Tue, 3 May 2016 19:10:54 +1200	[thread overview]
Message-ID: <2c122d8a-6633-9812-5f44-47bb50db07fa@linux.intel.com> (raw)
In-Reply-To: <BL2PR08MB4814C8EBEC9E7A82E01EC39F0630@BL2PR08MB481.namprd08.prod.outlook.com>

Hi,

On 4/27/2016 7:26 AM, Cao, Lei wrote:
> Updates to KVM API documentation
>
> ---
>  Documentation/virtual/kvm/api.txt | 170 ++++++++++++++++++++++++++++
>  1 file changed, 170 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 4d0542c..3f5367a 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3120,6 +3120,176 @@ struct kvm_reinject_control {
>  pit_reinject = 0 (!reinject mode) is recommended, unless running an old
>  operating system that uses the PIT for timing (e.g. Linux 2.4.x).
>
> +4.99 KVM_INIT_MT
> +
> +Capability: basic
> +Architectures: x86

Shall we make the new IOCTLs available to all archs? In my 
understanding your memory tracking mechanism doesn't depend on any 
specific arch. :)

Thanks,
-Kai

> +Type: vm ioctl
> +Parameters: struct mt_setup (in)
> +Returns: 0 on success, -1 on error
> +
> +/* for KVM_INIT_MT */
> +#define KVM_MT_VERSION                  1
> +
> +struct mt_setup {
> +        __u32 version;
> +
> +        /* which operation to perform? */
> +#define KVM_MT_OP_INIT           1
> +#define KVM_MT_OP_CLEANUP        2
> +        __u32 op;
> +
> +        /*
> +         * Features.
> +         * 1. Avoid logging duplicate entries
> +         */
> +#define KVM_MT_OPTION_NO_DUPS           (1 << 2)
> +
> +        __u32 flags;
> +
> +        /* max number of dirty pages per checkpoint cycle */
> +        __u32 max_dirty;
> +};
> +
> +This instructs the memory tracking (MT) subsystem to initialize or
> +clean up memory tracking data structures. Userspace specifies the
> +memory tracking version to make sure it and KVM are on the same
> +page. For initialization, userspace specifies the maximum number
> +of dirty pages that is allowed per checkpoint cycle. It can tell
> +KVM to avoid logging duplicate pages via 'flags', in which case
> +KVM creates a bitmap to track dirty pages.
> +
> +Called once during initialization.
> +
> +4.100 KVM_ENABLE_MT
> +
> +Capability: basic
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct mt_enable (in)
> +Returns: 0 on success, -1 on error
> +
> +/* for KVM_ENABLE_MT */
> +struct mt_enable {
> +       __u32 flags;            /* 1 -> on, 0 -> off */
> +};
> +
> +This instructs the MT subsystem to start/stop logging dirty
> +VM pages. On hosts that support fault based memory tracking, KVM
> +write-protects all VM pages to start dirty logging. On hosts that
> +support PML, KVM clears the dirty bits for all VM pages to start
> +dirty logging, and sets the dirty bits to stop dirty logging.
> +
> +Called once when entering/exiting live migration/checkpoint mode.
> +
> +4.101 KVM_PREPARE_MT_CP
> +
> +Capability: basic
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct mt_prepare_cp (in)
> +Returns: 0 on success, -1 on error
> +
> +/* for KVM_PREPARE_MT_CP */
> +struct mt_prepare_cp {
> +        __s64   cpid;
> +};
> +
> +This instructs the MT subsystem that a new checkpoint cycle is
> +about to start and provides the cycle ID. The MT subsystem resets
> +all the relevant variables, assuming all dirty pages have been
> +fetched.
> +
> +Called once per checkpoint cycle.
> +
> +4.102 KVM_MT_SUBLIST_FETCH
> +
> +Capability: basic
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct mt_sublist_fetch_info (in/out)
> +Returns: 0 on success, -1 on error
> +
> +/* for KVM_MT_SUBLIST_FETCH */
> +struct mt_gfn_list {
> +        __s32   count;
> +        __u32   max_dirty;
> +        __u64   *gfnlist;
> +};
> +
> +struct mt_sublist_fetch_info {
> +        struct mt_gfn_list  gfn_info;
> +
> +        /*
> +         * flags bit defs:
> +         */
> +
> +        /* caller sleeps until dirty count is reached */
> +#define MT_FETCH_WAIT           (1 << 0)
> +        /* dirty tracking is re-armed for each page in returned list */
> +#define MT_FETCH_REARM          (1 << 1)
> +
> +        __u32 flags;
> +};
> +
> +This fetches a subset of the current dirty pages. The userspace
> +thread specifies the maximum number of dirty pages it wants to
> +fetch via (struct mt_gfn_list).count. It also tells the MT
> +subsystem whether to wait until the specified maximum number is
> +reached, and can instruct the MT subsystem to re-arm the dirty
> +trap for each page that is fetched. The dirty pages are returned
> +to userspace in (struct mt_gfn_list).gfnlist, and (struct
> +mt_gfn_list).count indicates the number of dirty pages returned.
> +
> +Called multiple times by multiple threads per checkpoint cycle.
> +
> +4.103 KVM_REARM_DIRTY_PAGES
> +
> +Capability: basic
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: none
> +Returns: 0 on success, -1 on error
> +
> +This instructs the MT subsystem to rearm the dirty traps for all
> +the pages that were dirtied during the last checkpoint cycle.
> +
> +Called once per checkpoint cycle. The call is not necessary if dirty
> +traps are rearmed when dirty pages are being fetched.
> +
> +4.104 KVM_MT_VM_QUIESCED
> +
> +Capability: basic
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: none
> +Returns: 0 on success, -1 on error
> +
> +This instructs the MT subsystem that the VM has been quiesced and no
> +more pages will be dirtied this checkpoint cycle. The MT subsystem
> +will wake up userspace threads that are waiting for new dirty pages
> +to fetch, if any.
> +
> +Called once per checkpoint cycle.
> +
> +4.105 KVM_MT_DIRTY_TRIGGER
> +
> +Capability: basic
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct mt_dirty_trigger (in)
> +Returns: 0 on success, -1 on error
> +
> +/* for KVM_MT_DIRTY_TRIGGER */
> +struct mt_dirty_trigger {
> +        /* force vcpus to exit when trigger is reached */
> +        __u32 dirty_trigger;
> +};
> +
> +This sets the VM exit trigger point based on dirty page count.
> +
> +Called once when entering live migration/checkpoint mode.
> +
>  5. The kvm_run structure
>  ------------------------
>
>

Thread overview: 6+ messages
     [not found] <201604261856.u3QIuJMs025122@dev1.sn.stratus.com>
2016-04-26 19:26 ` [PATCH 6/6] KVM: Dirty memory tracking for performant checkpointing and improved live migration Cao, Lei
2016-04-28 18:08   ` Radim Krčmář
2016-04-29 18:47     ` Cao, Lei
2016-05-02 16:23       ` Radim Krčmář
2016-05-03 13:34         ` Cao, Lei
2016-05-03  7:10   ` Huang, Kai [this message]
