All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Avi Kivity <avi@redhat.com>, Paul Mackerras <paulus@samba.org>,
	Gleb Natapov <gleb@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>,
	Alexander Graf <agraf@suse.de>,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 3/5] KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly
Date: Thu, 23 Aug 2012 13:55:37 +0000	[thread overview]
Message-ID: <20120823135537.GC4747@amt.cnet> (raw)
In-Reply-To: <1345235545.11751.89.camel@pasglop>

On Sat, Aug 18, 2012 at 06:32:25AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2012-08-17 at 15:39 -0300, Marcelo Tosatti wrote:
> > On Fri, Aug 17, 2012 at 05:06:18PM +1000, Benjamin Herrenschmidt wrote:
> > > On Wed, 2012-08-15 at 14:59 -0300, Marcelo Tosatti wrote:
> > > > 
> > > > The guest should not expect memory accesses to an address
> > > > to behave sanely while changing a BAR anyway.
> > > > 
> > > > Yes, compatibility for change of GPA base can be done in the
> > > > kernel. I can look into it next week if nobody has done so at
> > > > that point. 
> > > 
> > > There's one thing to be extra careful about here is if we start
> > > doing that for normal memory (in case we start breaking it up
> > > in slots, such as NUMA setups etc...).
> > > 
> > > The problem is that we must not allow normal memory accesses to be
> > > handled via the "emulation" code (ie MMIO emulation or load/store
> > > emulation, whatever we call it).
> > > 
> > > Part of the issues is that on architectures that don't use IPIs for
> > > TLB invalidations but instead use some form of HW broadcast such as
> > > PowerPC or ARM, there is an inherent race in that the emulation code can
> > > keep a guest physical address (and perform the relevant access to the
> > > corresponding memory region) way beyond the point where the guest
> > > virtual->physical translation leading to that address has been
> > > invalidated.
> > > 
> > > This doesn't happen on x86 because essentially the completion of the
> > > invalidation IPI has to wait for all VCPUs to "respond" and thus to
> > > finish whatever emulation they are doing. This is not the case on archs
> > > with a HW invalidate broadcast.
> > > 
> > > This is a nasty race, and while we more/less decided that it was
> > > survivable as long as we only go through emulation for devices (as we
> > > don't play swapping games with them in the guest kernel), the minute we
> > > allow normal guest memory access to "slip through", we have broken the
> > > guest virtual memory model.
> > 
> > This emulation is in hardware, yes? It is the lack of a TLB entry (or
> > the lack of a valid pagetable to fill the TLB) that triggers it?
> 
> What do you mean ? I'm talking about KVM emulating load and store
> instructions (either in the kernel or passing them down to qemu).
> 
> This happens whenever an access triggers a host page fault which we
> can't resolve because there is no memory slot. In that case, the access
> is treated as "emulation".
> 
> Thus removing a memory slot and later on adding it back is broken for
> that reason on those architectures if that memory slot is used to cover
> actual guest memory or anything for which the guest kernel can
> potentially play mapping game (yes, potentially this can be an issue
> with emulated graphics BARs if/when we start doing fancy stuff with them
> such as using the DRM with the TTM which can "Swap" objects in/out of
> the emulated vram and play with mappings).
> 
> The memory slot update must either be atomic or as you mention below,
> whoever does the update must stop all vcpu's before doing the update
> which sucks as well.

There are a number of options to consider:

1) Merge the current patchset from Paul, which has two downsides:
	1-1) It contains an unfixable race.
	1-2) It splits the rcu/invalidation steps in generic code
             and subarch code. It opens the precedent for other 
             arches to do the same.
You'd still have to implement kvm_arch_flush_shadow to support proper 
deletion of memslots (without races there), for example.

2) Disallow GPA base change, require userspace to stop vcpus if it 
needs atomicity while changing the GPA base of a slot (and then 
introduce the two new callbacks as discussed in this thread).

3) Introduce a mechanism in kernel to serialize guest access (such as a mutex) 
while memory slot updates are performed, thus retaining atomicity.

It appears to me that given the relative rarity (as compared to
vmentry/vmexits, say) of change of GPA base, 2) is preferred. But i
might be wrong.

What do you prefer?

> > > So if we are manipulated memory slots used for guest RAM we must -not-
> > > break atomicity, since during the time the slot is gone, it will
> > > fallback to emulation, introducing the above race (at least on PowerPC
> > > and ARM).
> > 
> > You could say get the vcpus to a known state (which has a side effect of
> > making sure that emulation is stopped), no? (just as a mental exercise).
> 
> Yes, you could do that.
> 
> > > Cheers,
> > > Ben.
> > 
> > Yes. Well, Avi mentioned earlier that there are users for change of GPA
> > base. But, if my understanding is correct, the code that emulates
> > change of BAR in QEMU is:
> > 
> >         /* now do the real mapping */
> >         if (r->addr != PCI_BAR_UNMAPPED) {
> >             memory_region_del_subregion(r->address_space, r->memory);
> >         }
> >         r->addr = new_addr;
> >         if (r->addr != PCI_BAR_UNMAPPED) {
> >             memory_region_add_subregion_overlap(r->address_space,
> >                                                 r->addr, r->memory, 1);
> > 
> > These translate to two kvm_set_user_memory ioctls. 
> > 
> > "> Without taking into consideration backwards compatibility, userspace 
> >  > can first delete the slot and later create a new one.
> > 
> >  Current qemu will in fact do that.  Not sure about older ones.
> > "
> > 
> > Avi, where it does that?
> > 
> Cheers,
> Ben.
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

WARNING: multiple messages have this Message-ID (diff)
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Avi Kivity <avi@redhat.com>, Paul Mackerras <paulus@samba.org>,
	Gleb Natapov <gleb@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>,
	Alexander Graf <agraf@suse.de>,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 3/5] KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly
Date: Thu, 23 Aug 2012 10:55:37 -0300	[thread overview]
Message-ID: <20120823135537.GC4747@amt.cnet> (raw)
In-Reply-To: <1345235545.11751.89.camel@pasglop>

On Sat, Aug 18, 2012 at 06:32:25AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2012-08-17 at 15:39 -0300, Marcelo Tosatti wrote:
> > On Fri, Aug 17, 2012 at 05:06:18PM +1000, Benjamin Herrenschmidt wrote:
> > > On Wed, 2012-08-15 at 14:59 -0300, Marcelo Tosatti wrote:
> > > > 
> > > > The guest should not expect memory accesses to an address
> > > > to behave sanely while changing a BAR anyway.
> > > > 
> > > > Yes, compatibility for change of GPA base can be done in the
> > > > kernel. I can look into it next week if nobody has done so at
> > > > that point. 
> > > 
> > > There's one thing to be extra careful about here is if we start
> > > doing that for normal memory (in case we start breaking it up
> > > in slots, such as NUMA setups etc...).
> > > 
> > > The problem is that we must not allow normal memory accesses to be
> > > handled via the "emulation" code (ie MMIO emulation or load/store
> > > emulation, whatever we call it).
> > > 
> > > Part of the issues is that on architectures that don't use IPIs for
> > > TLB invalidations but instead use some form of HW broadcast such as
> > > PowerPC or ARM, there is an inherent race in that the emulation code can
> > > keep a guest physical address (and perform the relevant access to the
> > > corresponding memory region) way beyond the point where the guest
> > > virtual->physical translation leading to that address has been
> > > invalidated.
> > > 
> > > This doesn't happen on x86 because essentially the completion of the
> > > invalidation IPI has to wait for all VCPUs to "respond" and thus to
> > > finish whatever emulation they are doing. This is not the case on archs
> > > with a HW invalidate broadcast.
> > > 
> > > This is a nasty race, and while we more/less decided that it was
> > > survivable as long as we only go through emulation for devices (as we
> > > don't play swapping games with them in the guest kernel), the minute we
> > > allow normal guest memory access to "slip through", we have broken the
> > > guest virtual memory model.
> > 
> > This emulation is in hardware, yes? It is the lack of a TLB entry (or
> > the lack of a valid pagetable to fill the TLB) that triggers it?
> 
> What do you mean ? I'm talking about KVM emulating load and store
> instructions (either in the kernel or passing them down to qemu).
> 
> This happens whenever an access triggers a host page fault which we
> can't resolve because there is no memory slot. In that case, the access
> is treated as "emulation".
> 
> Thus removing a memory slot and later on adding it back is broken for
> that reason on those architectures if that memory slot is used to cover
> actual guest memory or anything for which the guest kernel can
> potentially play mapping game (yes, potentially this can be an issue
> with emulated graphics BARs if/when we start doing fancy stuff with them
> such as using the DRM with the TTM which can "Swap" objects in/out of
> the emulated vram and play with mappings).
> 
> The memory slot update must either be atomic or as you mention below,
> whoever does the update must stop all vcpu's before doing the update
> which sucks as well.

There are a number of options to consider:

1) Merge the current patchset from Paul, which has two downsides:
	1-1) It contains an unfixable race.
	1-2) It splits the rcu/invalidation steps in generic code
             and subarch code. It opens the precedent for other 
             arches to do the same.
You'd still have to implement kvm_arch_flush_shadow to support proper 
deletion of memslots (without races there), for example.

2) Disallow GPA base change, require userspace to stop vcpus if it 
needs atomicity while changing the GPA base of a slot (and then 
introduce the two new callbacks as discussed in this thread).

3) Introduce a mechanism in kernel to serialize guest access (such as a mutex) 
while memory slot updates are performed, thus retaining atomicity.

It appears to me that given the relative rarity (as compared to
vmentry/vmexits, say) of change of GPA base, 2) is preferred. But i
might be wrong.

What do you prefer?

> > > So if we are manipulated memory slots used for guest RAM we must -not-
> > > break atomicity, since during the time the slot is gone, it will
> > > fallback to emulation, introducing the above race (at least on PowerPC
> > > and ARM).
> > 
> > You could say get the vcpus to a known state (which has a side effect of
> > making sure that emulation is stopped), no? (just as a mental exercise).
> 
> Yes, you could do that.
> 
> > > Cheers,
> > > Ben.
> > 
> > Yes. Well, Avi mentioned earlier that there are users for change of GPA
> > base. But, if my understanding is correct, the code that emulates
> > change of BAR in QEMU is:
> > 
> >         /* now do the real mapping */
> >         if (r->addr != PCI_BAR_UNMAPPED) {
> >             memory_region_del_subregion(r->address_space, r->memory);
> >         }
> >         r->addr = new_addr;
> >         if (r->addr != PCI_BAR_UNMAPPED) {
> >             memory_region_add_subregion_overlap(r->address_space,
> >                                                 r->addr, r->memory, 1);
> > 
> > These translate to two kvm_set_user_memory ioctls. 
> > 
> > "> Without taking into consideration backwards compatibility, userspace 
> >  > can first delete the slot and later create a new one.
> > 
> >  Current qemu will in fact do that.  Not sure about older ones.
> > "
> > 
> > Avi, where it does that?
> > 
> Cheers,
> Ben.
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2012-08-23 13:55 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-06 10:02 [PATCH 0/5] Improve memory slot handling and other fixes Paul Mackerras
2012-08-06 10:02 ` Paul Mackerras
2012-08-06 10:03 ` [PATCH 1/5] KVM: PPC: Book3S HV: Fix incorrect branch in H_CEDE code Paul Mackerras
2012-08-06 10:03   ` Paul Mackerras
2012-08-06 10:04 ` [PATCH 2/5] KVM: PPC: Quieten message about allocating linear regions Paul Mackerras
2012-08-06 10:04   ` Paul Mackerras
2012-08-06 10:06 ` [PATCH 3/5] KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly Paul Mackerras
2012-08-06 10:06   ` Paul Mackerras
2012-08-09 18:16   ` Marcelo Tosatti
2012-08-09 18:16     ` Marcelo Tosatti
2012-08-10  0:34     ` Paul Mackerras
2012-08-10  0:34       ` Paul Mackerras
2012-08-10  1:25       ` Marcelo Tosatti
2012-08-10  1:25         ` Marcelo Tosatti
2012-08-10  1:33         ` Marcelo Tosatti
2012-08-10  1:33           ` Marcelo Tosatti
2012-08-10  2:09         ` Takuya Yoshikawa
2012-08-10  2:09           ` Takuya Yoshikawa
2012-08-10 18:35           ` Marcelo Tosatti
2012-08-10 18:35             ` Marcelo Tosatti
2012-08-11  0:37             ` Paul Mackerras
2012-08-11  0:37               ` Paul Mackerras
2012-08-13 16:34               ` Marcelo Tosatti
2012-08-13 16:34                 ` Marcelo Tosatti
2012-08-13 22:04                 ` Marcelo Tosatti
2012-08-13 22:04                   ` Marcelo Tosatti
2012-08-15  9:26                   ` Avi Kivity
2012-08-15  9:26                     ` Avi Kivity
2012-08-15 17:59                     ` Marcelo Tosatti
2012-08-15 17:59                       ` Marcelo Tosatti
2012-08-17  7:06                       ` Benjamin Herrenschmidt
2012-08-17  7:06                         ` Benjamin Herrenschmidt
2012-08-17 18:39                         ` Marcelo Tosatti
2012-08-17 18:39                           ` Marcelo Tosatti
2012-08-17 20:32                           ` Benjamin Herrenschmidt
2012-08-17 20:32                             ` Benjamin Herrenschmidt
2012-08-23 13:55                             ` Marcelo Tosatti [this message]
2012-08-23 13:55                               ` Marcelo Tosatti
2012-08-24  9:29                               ` Paul Mackerras
2012-08-24  9:29                                 ` Paul Mackerras
2012-08-24 18:58                                 ` Marcelo Tosatti
2012-08-24 18:58                                   ` Marcelo Tosatti
2012-08-19  9:39                           ` Avi Kivity
2012-08-19  9:39                             ` Avi Kivity
2012-08-15  6:06                 ` Paul Mackerras
2012-08-15  6:06                   ` Paul Mackerras
2012-08-15  9:23                 ` Avi Kivity
2012-08-15  9:23                   ` Avi Kivity
2012-08-06 10:06 ` [PATCH 4/5] KVM: PPC: Book3S HV: Take the SRCU read lock before looking up memslots Paul Mackerras
2012-08-06 10:06   ` Paul Mackerras
2012-08-09 18:22   ` Marcelo Tosatti
2012-08-09 18:22     ` Marcelo Tosatti
2012-08-10  0:45     ` Paul Mackerras
2012-08-10  0:45       ` Paul Mackerras
2012-08-06 10:08 ` [RFC PATCH 5/5] KVM: PPC: Take the SRCU lock around memslot use Paul Mackerras
2012-08-06 10:08   ` Paul Mackerras
2012-08-09 18:27   ` Marcelo Tosatti
2012-08-09 18:27     ` Marcelo Tosatti
2012-08-10  0:37     ` Paul Mackerras
2012-08-10  0:37       ` Paul Mackerras
2012-08-10  9:27       ` Alexander Graf
2012-08-10  9:27         ` Alexander Graf
2012-08-15  8:16         ` Benjamin Herrenschmidt
2012-08-15  8:16           ` Benjamin Herrenschmidt
2012-08-10  9:23 ` [PATCH 0/5] Improve memory slot handling and other fixes Alexander Graf
2012-08-10  9:23   ` Alexander Graf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120823135537.GC4747@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=agraf@suse.de \
    --cc=alex.williamson@redhat.com \
    --cc=avi@redhat.com \
    --cc=benh@kernel.crashing.org \
    --cc=gleb@redhat.com \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=paulus@samba.org \
    --cc=yoshikawa.takuya@oss.ntt.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.