Re: [patch 0/3] QEMU balloon support

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Avi Kivity <avi@qumranet.com>
To: Marcelo Tosatti <marcelo@kvack.org>
Cc: kvm-devel@lists.sourceforge.net
Subject: Re: [patch 0/3] QEMU balloon support
Date: Mon, 17 Mar 2008 13:11:11 +0200	[thread overview]
Message-ID: <47DE51CF.1030905@qumranet.com> (raw)
In-Reply-To: <20080316185951.GA931@dmt>

Marcelo Tosatti wrote:
> Hi Avi,
>
> Good that you're back.
>
> On Sun, Mar 16, 2008 at 04:00:06PM +0200, Avi Kivity wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> This patchset resends Anthony's QEMU balloon support plus:
>>>
>>> - Truncates the target size to ram size
>>> - Enables madvise() conditioned on KVM_ZAP_GFN ioctl
>>>
>>>  
>>>       
>> Once mmu notifiers are in, KVM_ZAP_GFN isn't needed.  So we have three 
>> possible situations:
>>
>> - zap needed, but not available: don't madvise()
>> - zap needed and available: zap and madvise()
>> - zap unneeded: madvise()
>>     
>
> Right. That is what the patch does. You just have to fill in
> "have_mmu_notifiers" here:
>
> +int kvm_zap_single_gfn(struct kvm *kvm, gfn_t gfn)
> +{
> +       unsigned long addr;
> +       int have_mmu_notifiers = 0;
> +
> +       down_read(&kvm->slots_lock);
> +       addr = gfn_to_hva(kvm, gfn);
> +
> +       if (kvm_is_error_hva(addr)) {
> +               up_read(&kvm->slots_lock);
> +               return -EINVAL;
> +       }
> +
> +       if (!have_mmu_notifiers) {
> +               spin_lock(&kvm->mmu_lock);
> +               rmap_nuke(kvm, gfn);
> +               spin_unlock(&kvm->mmu_lock);
> +       }
> +       up_read(&kvm->slots_lock);
>
> So as to avoid rmap_nuke, since that will be done through the madvise()
> path.
>
>   

Why not do it in userspace?

I'll likely merge zap directly into the backwards compatibility support 
(it won't ever appear in mainline since mmu notifiers will be merged 
sooner).

>> Did you find out what's causing the errors in the first place (if zap is 
>> not used)?  It worries me greatly.
>>     
>
> Yes, the problem is that the rmap code does not handle the qemu process
> mappings from vanishing while there is a present rmap. If that happens,
> and there is a fault for a gfn whose qemu mapping has been removed, a
> different physical zero page will be allocated:
>
> 	rmap a -> gfn 0 -> physical host page 0
> 	mapping for gfn 0 gets removed
> 	guest faults in gfn 0 through the same pte "chain"
> 	rmap a -> gfn 0 -> physical host page 1
>
> When instantiating the shadow mapping for the second time, the
> "is_rmap_pte" check succeeds, so we release the reference grabbed by
> gfn_to_page() at mmu_set_spte(). We now have a shadow mapping pointing
> to a physical page without having an additional reference on that page.
>
> The following makes the host not crash under such a condition, but the
> condition itself is invalid leading to inconsistent state on the guest.
> So IMHO it shouldnt be allowed to happen in the first place.
>
>   

The only way to prevent it is with mmu notifiers, which we may not have 
for 2.6.25.  So I'd like to send this for 2.6.25-rc.

Please add a signoff.

> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index f0cdfba..4c93b79 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1009,6 +1009,21 @@ struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t
> +gva)
>         return page;
>  }
>
> +static int was_spte_rmapped(struct kvm *kvm, u64 *spte, struct page *page)
> +{
> +       int ret = 0;
> +       unsigned long host_pfn = (*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
>   

hfn_t hfn

> +
> +       if (is_rmap_pte(*spte)) {
> +               if (host_pfn != page_to_pfn(page))
> +                       rmap_remove(kvm, spte);
> +               else
> +                       ret = 1;
> +       }
> +
> +       return ret;
> +}
> +
>  static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
>                          unsigned pt_access, unsigned pte_access,
>                          int user_fault, int write_fault, int dirty,
> @@ -1016,7 +1031,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64
> +*shadow_pte,
>                          struct page *page)
>  {
>         u64 spte;
> -       int was_rmapped = is_rmap_pte(*shadow_pte);
> +       int was_rmapped = was_spte_rmapped(vcpu->kvm, shadow_pte, page);
>         int was_writeble = is_writeble_pte(*shadow_pte);
>   

Calling code with side effects in an initializer is a bit confusing.  I 
suggest simply inlining the code into mmu_set_spte().

-- 
error compiling committee.c: too many arguments to function


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

next prev parent reply	other threads:[~2008-03-17 11:11 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-12 19:17 [patch 0/3] QEMU balloon support Marcelo Tosatti
2008-03-12 19:17 ` [patch 1/3] QEMU support for virtio balloon driver Marcelo Tosatti
2008-03-12 19:17 ` [patch 2/3] QEMU: balloon: dont allow target larger than ram size Marcelo Tosatti
2008-03-12 19:17 ` [patch 3/3] QEMU: balloon: zap any shadow mappings to a page before madvise(MADV_DONTNEED) Marcelo Tosatti
2008-03-16 14:00 ` [patch 0/3] QEMU balloon support Avi Kivity
2008-03-16 18:59   ` Marcelo Tosatti
2008-03-17 11:11     ` Avi Kivity [this message]
2008-03-17 13:08       ` Marcelo Tosatti
2008-03-17 14:56         ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47DE51CF.1030905@qumranet.com \
    --to=avi@qumranet.com \
    --cc=kvm-devel@lists.sourceforge.net \
    --cc=marcelo@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox