From: Avi Kivity <avi@redhat.com>
To: Scott Wood <scottwood@freescale.com>
Cc: Alexander Graf <agraf@suse.de>,
kvm-ppc@vger.kernel.org, kvm list <kvm@vger.kernel.org>,
Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [PATCH 04/14] KVM: PPC: e500: MMU API
Date: Tue, 01 Nov 2011 08:58:33 +0000
Message-ID: <4EAFB4B9.2040806@redhat.com>
In-Reply-To: <4EAF013C.7050206@freescale.com>

On 10/31/2011 10:12 PM, Scott Wood wrote:
> >> +4.59 KVM_DIRTY_TLB
> >> +
> >> +Capability: KVM_CAP_SW_TLB
> >> +Architectures: ppc
> >> +Type: vcpu ioctl
> >> +Parameters: struct kvm_dirty_tlb (in)
> >> +Returns: 0 on success, -1 on error
> >> +
> >> +struct kvm_dirty_tlb {
> >> + __u64 bitmap;
> >> + __u32 num_dirty;
> >> +};
> >
> > This is not 32/64 bit safe. e500 is 32-bit only, yes?
>
> e5500 is 64-bit -- we don't support it with KVM yet, but it's planned.
>
> > but what if someone wants to emulate an e500 on a ppc64? maybe it's better to add
> > padding here.
>
> What is unsafe about it? Are you picturing TLBs with more than 4
> billion entries?
sizeof(struct kvm_dirty_tlb) can be 12 for 32-bit userspace (when the ABI
aligns __u64 to 4 bytes, as on i386), but is 16 for 64-bit userspace and
the kernel. ABI structures must have the same alignment and size for
32-bit and 64-bit userspace, or they need compat handling.
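To illustrate, here is a minimal sketch of the padding option. It assumes
the fix is an explicit trailing pad field; the struct name
kvm_dirty_tlb_padded and the field name pad are hypothetical and not part
of the patch. With the padding spelled out, the size no longer depends on
how a given ABI aligns a 64-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical padded variant of the struct quoted above.
 * The first two fields mirror the patch; "pad" is an assumption.
 */
struct kvm_dirty_tlb_padded {
	uint64_t bitmap;     /* same field as in the patch */
	uint32_t num_dirty;  /* same field as in the patch */
	uint32_t pad;        /* assumption: explicit padding, left as zero */
};

/* Every byte is accounted for, so no implicit tail padding is needed. */
static_assert(sizeof(struct kvm_dirty_tlb_padded) == 16,
	      "size must not depend on the ABI");
static_assert(offsetof(struct kvm_dirty_tlb_padded, num_dirty) == 8,
	      "num_dirty follows the 64-bit field");
static_assert(offsetof(struct kvm_dirty_tlb_padded, pad) == 12,
	      "pad fills the last four bytes");
```

The compile-time checks fail the build if a compiler ever lays the struct
out differently, which is the property an ioctl argument needs.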
> There shouldn't be any alignment issues.
>
> > Another alternative is to drop the num_dirty field (and let the kernel
> > compute it instead, shouldn't take long?), and have the third argument
> > to ioctl() reference the bitmap directly.
>
> The idea was to make it possible for the kernel to apply a threshold
> above which it would be better to ignore the bitmap entirely and flush
> everything:
>
> http://www.spinics.net/lists/kvm/msg50079.html
>
> Currently we always just flush everything, and QEMU always says
> everything is dirty when it makes a change, but the API is there if needed.
Right, but you don't need num_dirty for it. There are typically only a
few dozen entries, yes? It should take a trivial amount of time to
calculate the bitmap's weight.
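As a rough sketch of that calculation: this assumes the bitmap is an array
of 64-bit words with one bit per TLB entry. The function names and the
threshold parameter are hypothetical. In the kernel, bitmap_weight() does
this job; the version below is plain portable C.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in one 64-bit word (Kernighan's method). */
static unsigned int popcount64(uint64_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Number of dirty entries in a bitmap made of nwords 64-bit words. */
static unsigned int dirty_weight(const uint64_t *bitmap, size_t nwords)
{
	unsigned int total = 0;

	for (size_t i = 0; i < nwords; i++)
		total += popcount64(bitmap[i]);
	return total;
}

/*
 * Hypothetical policy: ignore the bitmap and flush everything when
 * more than "threshold" entries are dirty.
 */
static int should_flush_all(const uint64_t *bitmap, size_t nwords,
			    unsigned int threshold)
{
	return dirty_weight(bitmap, nwords) > threshold;
}
```

For a few dozen entries this is one or two words, so the count costs a
handful of loop iterations and num_dirty need not cross the ABI at all.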
> >> +Configures the virtual CPU's TLB array, establishing a shared memory area
> >> +between userspace and KVM. The "params" and "array" fields are userspace
> >> +addresses of mmu-type-specific data structures. The "array_len" field is an
> >> +safety mechanism, and should be set to the size in bytes of the memory that
> >> +userspace has reserved for the array. It must be at least the size dictated
> >> +by "mmu_type" and "params".
> >> +
> >> +While KVM_RUN is active, the shared region is under control of KVM. Its
> >> +contents are undefined, and any modification by userspace results in
> >> +boundedly undefined behavior.
> >> +
> >> +On return from KVM_RUN, the shared region will reflect the current state of
> >> +the guest's TLB. If userspace makes any changes, it must call KVM_DIRTY_TLB
> >> +to tell KVM which entries have been changed, prior to calling KVM_RUN again
> >> +on this vcpu.
> >
> > We already have another mechanism for such shared memory,
> > mmap(vcpu_fd). x86 uses it for the coalesced mmio region as well as the
> > traditional kvm_run area. Please consider using it.
>
> What does it buy us, other than needing a separate codepath in QEMU to
> allocate the memory differently based on whether KVM (and this feature)
The ability to use get_free_pages() and ordinary kernel memory directly,
instead of indirection through a struct page ** array.
> are being used, since QEMU uses this for its own MMU representation?
>
> This API has been discussed extensively, and the code using it is
> already in mainline QEMU. This aspect of it hasn't changed since the
> discussion back in February:
>
> http://www.spinics.net/lists/kvm/msg50102.html
>
> I'd prefer to avoid another round of major overhaul without a really
> good reason.
Me too, but I also prefer not to make ABI choices by inertia. ABI is
practically the only thing I care about wrt non-x86 (other than
whitespace, of course). Please involve me in the discussions earlier in
the future.
> >> +For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
> >> + - The "params" field is of type "struct kvm_book3e_206_tlb_params".
> >> + - The "array" field points to an array of type "struct
> >> + kvm_book3e_206_tlb_entry".
> >> + - The array consists of all entries in the first TLB, followed by all
> >> + entries in the second TLB.
> >> + - Within a TLB, entries are ordered first by increasing set number. Within a
> >> + set, entries are ordered by way (increasing ESEL).
> >> + - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
> >> + where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
> >> + - The tsize field of mas1 shall be set to 4K on TLB0, even though the
> >> + hardware ignores this value for TLB0.
> >
> > Holy shit.
>
> You were the one that first suggested we use shared data:
> http://www.spinics.net/lists/kvm/msg49802.html
>
> These are the assumptions needed to make such an interface well-defined.
Just remarking on the complexity, don't take it personally.
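For reference, the ordering rules quoted above can be sketched as an index
calculation into the shared array. This is a sketch under stated
assumptions: struct tlb_geometry and both function names are hypothetical
stand-ins, only two TLBs are modelled, and num_sets is assumed to be a
power of two so that the mask in the quoted hash works.

```c
#include <stdint.h>

/* Hypothetical stand-in for the tlb_sizes[] and tlb_ways[] values. */
struct tlb_geometry {
	uint32_t tlb_sizes[2];	/* total entries in each TLB */
	uint32_t tlb_ways[2];	/* associativity of each TLB */
};

/* Set number in TLB0, using the hash given in the documentation. */
static uint32_t tlb0_set(const struct tlb_geometry *g, uint64_t mas2)
{
	uint32_t num_sets = g->tlb_sizes[0] / g->tlb_ways[0];

	return (uint32_t)(mas2 >> 12) & (num_sets - 1);
}

/*
 * Index into the shared array: all TLB0 entries come first, then all
 * TLB1 entries. Within a TLB, entries are ordered by set, then by way.
 */
static uint32_t tlb_array_index(const struct tlb_geometry *g, int tlbsel,
				uint32_t set, uint32_t way)
{
	uint32_t base = tlbsel ? g->tlb_sizes[0] : 0;

	return base + set * g->tlb_ways[tlbsel] + way;
}
```

With an illustrative geometry of 512 entries and 4 ways in TLB0, a MAS2
value of 0x12345000 hashes to set 0x45, and way 2 of that set sits at
index 0x45 * 4 + 2 in the array.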
> >> @@ -95,6 +90,9 @@ struct kvmppc_vcpu_e500 {
> >> u32 tlb1cfg;
> >> u64 mcar;
> >>
> >> + struct page **shared_tlb_pages;
> >> + int num_shared_tlb_pages;
> >> +
> >
> > I missed the requirement that things be page aligned.
>
> They don't need to be, we'll ignore the data before and after the shared
> area.
>
> > If you use mmap(vcpu_fd) this becomes simpler; you can use
> > get_free_pages() and have a single pointer. You can also use vmap() on
> > this array (but get_free_pages() is faster).
>
> We do use vmap(). This is just the bookkeeping so we know what pages to
> free later.
>
Ah, I missed that (and the pointer).
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.