All of lore.kernel.org
 help / color / mirror / Atom feed
From: Avi Kivity <avi@qumranet.com>
To: Marcelo Tosatti <marcelo@kvack.org>
Cc: kvm-devel@lists.sourceforge.net
Subject: Re: [patch 0/3] QEMU balloon support
Date: Mon, 17 Mar 2008 13:11:11 +0200	[thread overview]
Message-ID: <47DE51CF.1030905@qumranet.com> (raw)
In-Reply-To: <20080316185951.GA931@dmt>

Marcelo Tosatti wrote:
> Hi Avi,
>
> Good that you're back.
>
> On Sun, Mar 16, 2008 at 04:00:06PM +0200, Avi Kivity wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> This patchset resends Anthony's QEMU balloon support plus:
>>>
>>> - Truncates the target size to ram size
>>> - Enables madvise() conditioned on KVM_ZAP_GFN ioctl
>>>
>>>  
>>>       
>> Once mmu notifiers are in, KVM_ZAP_GFN isn't needed.  So we have three 
>> possible situations:
>>
>> - zap needed, but not available: don't madvise()
>> - zap needed and available: zap and madvise()
>> - zap unneeded: madvise()
>>     
>
> Right. That is what the patch does. You just have to fill in
> "have_mmu_notifiers" here:
>
> +int kvm_zap_single_gfn(struct kvm *kvm, gfn_t gfn)
> +{
> +       unsigned long addr;
> +       int have_mmu_notifiers = 0;
> +
> +       down_read(&kvm->slots_lock);
> +       addr = gfn_to_hva(kvm, gfn);
> +
> +       if (kvm_is_error_hva(addr)) {
> +               up_read(&kvm->slots_lock);
> +               return -EINVAL;
> +       }
> +
> +       if (!have_mmu_notifiers) {
> +               spin_lock(&kvm->mmu_lock);
> +               rmap_nuke(kvm, gfn);
> +               spin_unlock(&kvm->mmu_lock);
> +       }
> +       up_read(&kvm->slots_lock);
>
> So as to avoid rmap_nuke, since that will be done through the madvise()
> path.
>
>   

Why not do it in userspace?

I'll likely merge zap directly into the backwards compatibility support 
(it won't ever appear in mainline since mmu notifiers will be merged 
sooner).

>> Did you find out what's causing the errors in the first place (if zap is 
>> not used)?  It worries me greatly.
>>     
>
> Yes, the problem is that the rmap code does not handle the qemu process
> mappings from vanishing while there is a present rmap. If that happens,
> and there is a fault for a gfn whose qemu mapping has been removed, a
> different physical zero page will be allocated:
>
> 	rmap a -> gfn 0 -> physical host page 0
> 	mapping for gfn 0 gets removed
> 	guest faults in gfn 0 through the same pte "chain"
> 	rmap a -> gfn 0 -> physical host page 1
>
> When instantiating the shadow mapping for the second time, the
> "is_rmap_pte" check succeeds, so we release the reference grabbed by
> gfn_to_page() at mmu_set_spte(). We now have a shadow mapping pointing
> to a physical page without having an additional reference on that page.
>
> The following makes the host not crash under such a condition, but the
> condition itself is invalid leading to inconsistent state on the guest.
> So IMHO it shouldnt be allowed to happen in the first place.
>
>   

The only way to prevent it is with mmu notifiers, which we may not have 
for 2.6.25.  So I'd like to send this for 2.6.25-rc.

Please add a signoff.

> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index f0cdfba..4c93b79 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1009,6 +1009,21 @@ struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t
> +gva)
>         return page;
>  }
>
> +static int was_spte_rmapped(struct kvm *kvm, u64 *spte, struct page *page)
> +{
> +       int ret = 0;
> +       unsigned long host_pfn = (*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
>   

hfn_t hfn

> +
> +       if (is_rmap_pte(*spte)) {
> +               if (host_pfn != page_to_pfn(page))
> +                       rmap_remove(kvm, spte);
> +               else
> +                       ret = 1;
> +       }
> +
> +       return ret;
> +}
> +
>  static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
>                          unsigned pt_access, unsigned pte_access,
>                          int user_fault, int write_fault, int dirty,
> @@ -1016,7 +1031,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64
> +*shadow_pte,
>                          struct page *page)
>  {
>         u64 spte;
> -       int was_rmapped = is_rmap_pte(*shadow_pte);
> +       int was_rmapped = was_spte_rmapped(vcpu->kvm, shadow_pte, page);
>         int was_writeble = is_writeble_pte(*shadow_pte);
>   

Calling code with side effects in an initializer is a bit confusing.  I 
suggest simply inlining the code into mmu_set_spte().

-- 
error compiling committee.c: too many arguments to function


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

  reply	other threads:[~2008-03-17 11:11 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-12 19:17 [patch 0/3] QEMU balloon support Marcelo Tosatti
2008-03-12 19:17 ` [patch 1/3] QEMU support for virtio balloon driver Marcelo Tosatti
2008-03-12 19:17 ` [patch 2/3] QEMU: balloon: dont allow target larger than ram size Marcelo Tosatti
2008-03-12 19:17 ` [patch 3/3] QEMU: balloon: zap any shadow mappings to a page before madvise(MADV_DONTNEED) Marcelo Tosatti
2008-03-16 14:00 ` [patch 0/3] QEMU balloon support Avi Kivity
2008-03-16 18:59   ` Marcelo Tosatti
2008-03-17 11:11     ` Avi Kivity [this message]
2008-03-17 13:08       ` Marcelo Tosatti
2008-03-17 14:56         ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47DE51CF.1030905@qumranet.com \
    --to=avi@qumranet.com \
    --cc=kvm-devel@lists.sourceforge.net \
    --cc=marcelo@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.