public inbox for kvm-ppc@vger.kernel.org
From: Avi Kivity <avi@redhat.com>
To: Scott Wood <scottwood@freescale.com>
Cc: Alexander Graf <agraf@suse.de>,
	kvm-ppc@vger.kernel.org, kvm list <kvm@vger.kernel.org>,
	Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [PATCH 04/14] KVM: PPC: e500: MMU API
Date: Tue, 01 Nov 2011 08:58:33 +0000
Message-ID: <4EAFB4B9.2040806@redhat.com>
In-Reply-To: <4EAF013C.7050206@freescale.com>

On 10/31/2011 10:12 PM, Scott Wood wrote:
> >> +4.59 KVM_DIRTY_TLB
> >> +
> >> +Capability: KVM_CAP_SW_TLB
> >> +Architectures: ppc
> >> +Type: vcpu ioctl
> >> +Parameters: struct kvm_dirty_tlb (in)
> >> +Returns: 0 on success, -1 on error
> >> +
> >> +struct kvm_dirty_tlb {
> >> +	__u64 bitmap;
> >> +	__u32 num_dirty;
> >> +};
> > 
> > This is not 32/64 bit safe.  e500 is 32-bit only, yes?
>
> e5500 is 64-bit -- we don't support it with KVM yet, but it's planned.
>
> > but what if someone wants to emulate an e500 on a ppc64?  maybe it's better to add
> > padding here.
>
> What is unsafe about it?  Are you picturing TLBs with more than 4
> billion entries?

sizeof(struct kvm_dirty_tlb) is 12 for 32-bit userspace, but 16 for
64-bit userspace and the kernel.  ABI structures must have the same
alignment and size for 32/64 bit userspace, or they need compat handling.
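
A minimal sketch of the padding idea, for illustration only.  The struct
names and the "pad" field below are hypothetical and are not part of the
posted patch.  The 12-byte case arises on 32-bit ABIs that align 64-bit
integers to 4 bytes (i386 is the usual example); with an explicit trailing
pad the size is 16 under either alignment rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the struct in the patch: 8 bytes, then 4 bytes.
 * sizeof() is 12 or 16 depending on how the ABI aligns uint64_t. */
struct dirty_tlb_unpadded {
	uint64_t bitmap;
	uint32_t num_dirty;
};

/* Hypothetical variant: explicit tail padding makes the size 16
 * regardless of whether uint64_t is 4- or 8-byte aligned. */
struct dirty_tlb_padded {
	uint64_t bitmap;
	uint32_t num_dirty;
	uint32_t pad;	/* assumed reserved field, set to zero */
};

/* Runtime check that the padded layout is what we expect. */
void check_padded_layout(void)
{
	assert(sizeof(struct dirty_tlb_padded) == 16);
	assert(offsetof(struct dirty_tlb_padded, bitmap) == 0);
	assert(offsetof(struct dirty_tlb_padded, num_dirty) == 8);
	assert(offsetof(struct dirty_tlb_padded, pad) == 12);
}
```

Building the same file with a 32-bit and a 64-bit compiler and printing
sizeof(struct dirty_tlb_unpadded) is a quick way to see whether a given
pair of ABIs disagrees.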

> There shouldn't be any alignment issues.
>
> > Another alternative is to drop the num_dirty field (and let the kernel
> > compute it instead, shouldn't take long?), and have the third argument
> > to ioctl() reference the bitmap directly.
>
> The idea was to make it possible for the kernel to apply a threshold
> above which it would be better to ignore the bitmap entirely and flush
> everything:
>
> http://www.spinics.net/lists/kvm/msg50079.html
>
> Currently we always just flush everything, and QEMU always says
> everything is dirty when it makes a change, but the API is there if needed.

Right, but you don't need num_dirty for it.  There are typically only a
few dozen entries, yes?  It should take a trivial amount of time to
calculate the bitmap's weight (its number of set bits).
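
As a rough userspace sketch of that calculation (not kernel code): the
function names and the threshold parameter are assumptions made up for
this example, and the bitmap is assumed to be an array of 64-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in a bitmap stored as an array of 64-bit words. */
unsigned int dirty_weight(const uint64_t *bitmap, size_t nwords)
{
	unsigned int weight = 0;

	assert(bitmap != NULL || nwords == 0);

	for (size_t i = 0; i < nwords; i++) {
		uint64_t w = bitmap[i];

		/* Clear the lowest set bit until none remain. */
		while (w) {
			w &= w - 1;
			weight++;
		}
	}
	return weight;
}

/* Hypothetical policy: above "threshold" dirty entries, ignore the
 * bitmap and flush everything instead. */
int should_flush_all(const uint64_t *bitmap, size_t nwords,
		     unsigned int threshold)
{
	return dirty_weight(bitmap, nwords) > threshold;
}
```

With a few dozen entries the loop touches one or two words, so the count
costs very little compared with the flush it decides on.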

> >> +Configures the virtual CPU's TLB array, establishing a shared memory area
> >> +between userspace and KVM.  The "params" and "array" fields are userspace
> >> +addresses of mmu-type-specific data structures.  The "array_len" field is a
> >> +safety mechanism, and should be set to the size in bytes of the memory that
> >> +userspace has reserved for the array.  It must be at least the size dictated
> >> +by "mmu_type" and "params".
> >> +
> >> +While KVM_RUN is active, the shared region is under control of KVM.  Its
> >> +contents are undefined, and any modification by userspace results in
> >> +boundedly undefined behavior.
> >> +
> >> +On return from KVM_RUN, the shared region will reflect the current state of
> >> +the guest's TLB.  If userspace makes any changes, it must call KVM_DIRTY_TLB
> >> +to tell KVM which entries have been changed, prior to calling KVM_RUN again
> >> +on this vcpu.
> > 
> > We already have another mechanism for such shared memory,
> > mmap(vcpu_fd).  x86 uses it for the coalesced mmio region as well as the
> > traditional kvm_run area.  Please consider using it.
>
> What does it buy us, other than needing a separate codepath in QEMU to
> allocate the memory differently based on whether KVM (and this feature)

The ability to use get_free_pages() and ordinary kernel memory directly,
instead of indirection through a struct page ** array.

> are being used, since QEMU uses this for its own MMU representation?
>
> This API has been discussed extensively, and the code using it is
> already in mainline QEMU.  This aspect of it hasn't changed since the
> discussion back in February:
>
> http://www.spinics.net/lists/kvm/msg50102.html
>
> I'd prefer to avoid another round of major overhaul without a really
> good reason.

Me too, but I also prefer not to make ABI choices by inertia.  ABI is
practically the only thing I care about wrt non-x86 (other than
whitespace, of course).  Please involve me in the discussions earlier in
the future.

> >> +For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
> >> + - The "params" field is of type "struct kvm_book3e_206_tlb_params".
> >> + - The "array" field points to an array of type "struct
> >> +   kvm_book3e_206_tlb_entry".
> >> + - The array consists of all entries in the first TLB, followed by all
> >> +   entries in the second TLB.
> >> + - Within a TLB, entries are ordered first by increasing set number.  Within a
> >> +   set, entries are ordered by way (increasing ESEL).
> >> + - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
> >> +   where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
> >> + - The tsize field of mas1 shall be set to 4K on TLB0, even though the
> >> +   hardware ignores this value for TLB0.
> > 
> > Holy shit.
>
> You were the one that first suggested we use shared data:
> http://www.spinics.net/lists/kvm/msg49802.html
>
> These are the assumptions needed to make such an interface well-defined.

Just remarking on the complexity, don't take it personally.
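
For reference, the ordering rules quoted above can be sketched as index
arithmetic.  This is an illustration under assumptions: "struct
tlb_geometry" and the function names are hypothetical stand-ins, only the
tlb_sizes[] and tlb_ways[] values mentioned in the documentation text are
modelled, and num_sets is assumed to be a power of two so that the mask
in the quoted hash works.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the two per-TLB parameters named in the text. */
struct tlb_geometry {
	uint32_t tlb_sizes[2];	/* total entries in TLB0 and TLB1 */
	uint32_t tlb_ways[2];	/* associativity of TLB0 and TLB1 */
};

/* Set number in TLB0: (MAS2 >> 12) & (num_sets - 1). */
uint32_t tlb0_set(const struct tlb_geometry *g, uint64_t mas2)
{
	uint32_t num_sets = g->tlb_sizes[0] / g->tlb_ways[0];

	/* The mask is only a valid modulo for power-of-two set counts. */
	assert(num_sets != 0 && (num_sets & (num_sets - 1)) == 0);
	return (uint32_t)(mas2 >> 12) & (num_sets - 1);
}

/* Index of a TLB0 entry in the shared array: sets in increasing
 * order, ways in increasing order within each set. */
uint32_t tlb0_index(const struct tlb_geometry *g, uint64_t mas2, uint32_t way)
{
	assert(way < g->tlb_ways[0]);
	return tlb0_set(g, mas2) * g->tlb_ways[0] + way;
}

/* TLB1 entries start right after all TLB0 entries. */
uint32_t tlb1_base(const struct tlb_geometry *g)
{
	return g->tlb_sizes[0];
}
```

For example, with a 512-entry, 4-way TLB0 there are 128 sets, so a MAS2
value of 0x12345000 lands in set 0x45 and way 2 of that set is array
index 278.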

> >> @@ -95,6 +90,9 @@ struct kvmppc_vcpu_e500 {
> >>  	u32 tlb1cfg;
> >>  	u64 mcar;
> >>  
> >> +	struct page **shared_tlb_pages;
> >> +	int num_shared_tlb_pages;
> >> +
> > 
> > I missed the requirement that things be page aligned.
>
> They don't need to be, we'll ignore the data before and after the shared
> area.
>
> > If you use mmap(vcpu_fd) this becomes simpler; you can use
> > get_free_pages() and have a single pointer.  You can also use vmap() on
> > this array (but get_free_pages() is faster).
>
> We do use vmap().  This is just the bookkeeping so we know what pages to
> free later.
>

Ah, I missed that (and the pointer).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

