All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: gleb@redhat.com, avi.kivity@gmail.com, pbonzini@redhat.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable
Date: Tue, 19 Nov 2013 22:29:20 -0200	[thread overview]
Message-ID: <20131120002920.GA14230@amt.cnet> (raw)
In-Reply-To: <5201C7D9.20004@linux.vnet.ibm.com>

On Wed, Aug 07, 2013 at 12:06:49PM +0800, Xiao Guangrong wrote:
> On 08/07/2013 09:48 AM, Marcelo Tosatti wrote:
> > On Tue, Jul 30, 2013 at 09:02:02PM +0800, Xiao Guangrong wrote:
> >> Make sure we can see the writable spte before the dirt bitmap is visible
> >>
> >> We do this is for kvm_vm_ioctl_get_dirty_log() write-protects the spte based
> >> on the dirty bitmap, we should ensure the writable spte can be found in rmap
> >> before the dirty bitmap is visible. Otherwise, we cleared the dirty bitmap and
> >> failed to write-protect the page
> >>
> >> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> >> ---
> >>  arch/x86/kvm/mmu.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > Can you explain why this is safe, with regard to the rule 
> > at edde99ce05290e50 ?
> 
> BTW, this log fixed this case:
> 
>      VCPU 0                KVM migration control
> 
>                            write-protects all pages
> #Pf happen then the page
> become writable, set dirty
> bit on the bitmap
> 
>                           swap the bitmap, current bitmap is empty
> 
> write the page (no dirty log)
> 
>                           stop the guest and push
>                           the remaining dirty pages
> Stopped
>                           See current bitmap is empty that means
>                           no page is dirty.
> > 
> > "The rule is that all pages are either dirty in the current bitmap,
> > or write-protected, which is violated here."
> 
> Actually, this rule is not complete true, there's the 3th case:
> the window between write guest page and set dirty bitmap is valid.
> In that window, page is write-free and not dirty logged.
> 
> This case is based on the fact that at the final step of live migration,
> kvm should stop the guest and push the remaining dirty pages to the
> destination.
> 
> They're some examples in the current code:
> example 1, in fast_pf_fix_direct_spte():
> 	if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
> 		/* The window in here... */
> 		mark_page_dirty(vcpu->kvm, gfn);
> 
> example 2, in kvm_write_guest_page():
> 	r = __copy_to_user((void __user *)addr + offset, data, len);
> 	if (r)
> 		return -EFAULT;
> 	/*
> 	 * The window is here, the page is dirty but not logged in
>          * The bitmap.
> 	 */
> 	mark_page_dirty(kvm, gfn);
> 	return 0;

Why is this valid ? That is, the obviously correct rule is

"that all pages are either dirty in the current bitmap,
or write-protected, which is violated here."

With the window above, GET_DIRTY_LOG can be called 100 times while the 
page is dirty, but the corresponding bit not set in the dirty bitmap.

It violates the documentation:

/* for KVM_GET_DIRTY_LOG */
struct kvm_dirty_log {
        __u32 slot;
        __u32 padding;
        union {
                void __user *dirty_bitmap; /* one bit per page */
                __u64 padding;
        };
};

Given a memory slot, return a bitmap containing any pages dirtied
since the last call to this ioctl.  Bit 0 is the first page in the
memory slot.  Ensure the entire structure is cleared to avoid padding
issues.

The point about migration, is that GET_DIRTY_LOG is strictly correct
because it stops vcpus.

But what guarantee does userspace require, from GET_DIRTY_LOG, while vcpus are
executing? 

With fast page fault:

       if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
               /* The window in here... */
               mark_page_dirty(vcpu->kvm, gfn);
 
And the $SUBJECT set_spte reordering, the rule becomes

A call to GET_DIRTY_LOG guarantees to return correct information about 
dirty pages before invocation of the previous GET_DIRTY_LOG call.

(see example 1: the next GET_DIRTY_LOG will return the dirty information
there).

The rule for sptes that is, because kvm_write_guest does not match the
documentation at all.

So before example 1 and this patch, the rule (well for sptes at least) was

"Given a memory slot, return a bitmap containing any pages dirtied
since the last call to this ioctl.  Bit 0 is the first page in the
memory slot.  Ensure the entire structure is cleared to avoid padding
issues."

Can you explain why it is OK to relax this rule?

Thanks

(reviewing the nulls desc patch...).



  parent reply	other threads:[~2013-11-20  0:30 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-30 13:01 [RFC PATCH 00/12] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-07-30 13:01 ` [PATCH 01/12] KVM: MMU: remove unused parameter Xiao Guangrong
2013-08-29  7:22   ` Gleb Natapov
2013-07-30 13:02 ` [PATCH 02/12] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
2013-07-30 13:02 ` [PATCH 03/12] KVM: MMU: lazily drop large spte Xiao Guangrong
2013-08-02 14:55   ` Marcelo Tosatti
2013-08-02 15:42     ` Xiao Guangrong
2013-08-02 20:27       ` Marcelo Tosatti
2013-08-02 22:56         ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable Xiao Guangrong
2013-07-30 13:26   ` Paolo Bonzini
2013-07-31  7:25     ` Xiao Guangrong
2013-08-07  1:48   ` Marcelo Tosatti
2013-08-07  4:06     ` Xiao Guangrong
2013-08-08 15:06       ` Marcelo Tosatti
2013-08-08 16:26         ` Xiao Guangrong
2013-11-20  0:29       ` Marcelo Tosatti [this message]
2013-11-20  0:35         ` Marcelo Tosatti
2013-11-20 14:20         ` Xiao Guangrong
2013-11-20 19:47           ` Marcelo Tosatti
2013-11-21  4:26             ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 05/12] KVM: MMU: add spte into rmap before logging dirty page Xiao Guangrong
2013-07-30 13:27   ` Paolo Bonzini
2013-07-31  7:33     ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 06/12] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
2013-08-28  7:23   ` Gleb Natapov
2013-08-28  7:50     ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 07/12] KVM: MMU: redesign the algorithm of pte_list Xiao Guangrong
2013-08-28  8:12   ` Gleb Natapov
2013-08-28  8:37     ` Xiao Guangrong
2013-08-28  8:58       ` Gleb Natapov
2013-08-28  9:19         ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 08/12] KVM: MMU: introduce nulls desc Xiao Guangrong
2013-08-28  8:40   ` Gleb Natapov
2013-08-28  8:54     ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker Xiao Guangrong
2013-08-28  9:20   ` Gleb Natapov
2013-08-28  9:33     ` Xiao Guangrong
2013-08-28  9:46       ` Gleb Natapov
2013-08-28 10:13         ` Xiao Guangrong
2013-08-28 10:49           ` Gleb Natapov
2013-08-28 12:15             ` Xiao Guangrong
2013-08-28 13:36               ` Gleb Natapov
2013-08-29  6:50                 ` Xiao Guangrong
2013-08-29  9:08                   ` Gleb Natapov
2013-08-29  9:31                     ` Xiao Guangrong
2013-08-29  9:51                       ` Gleb Natapov
2013-08-29 11:26                         ` Xiao Guangrong
2013-08-30 11:38                           ` Gleb Natapov
2013-09-02  7:02                             ` Xiao Guangrong
2013-08-29  9:31                   ` Gleb Natapov
2013-08-29 11:33                     ` Xiao Guangrong
2013-08-29 12:02                       ` Xiao Guangrong
2013-08-30 11:44                         ` Gleb Natapov
2013-09-02  8:50                           ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 10/12] KVM: MMU: allow locklessly access shadow page table out of vcpu thread Xiao Guangrong
2013-08-07 13:09   ` Takuya Yoshikawa
2013-08-07 13:19     ` Xiao Guangrong
2013-08-29  9:10   ` Gleb Natapov
2013-08-29  9:25     ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 11/12] KVM: MMU: locklessly write-protect the page Xiao Guangrong
2013-07-30 13:02 ` [PATCH 12/12] KVM: MMU: clean up spte_write_protect Xiao Guangrong
2013-07-30 13:11 ` [RFC PATCH 00/12] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-08-03  5:09 ` Takuya Yoshikawa
2013-08-04 14:15   ` Xiao Guangrong
2013-08-29  7:16   ` Gleb Natapov
2013-08-06 13:16 ` Xiao Guangrong
2013-08-08 17:38   ` Paolo Bonzini
2013-08-09  4:51     ` Xiao Guangrong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131120002920.GA14230@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=avi.kivity@gmail.com \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=xiaoguangrong@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.