From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: kvm@vger.kernel.org, Gleb Natapov <gleb@kernel.org>,
Alexander Graf <agraf@suse.de>,
kvm-ppc@vger.kernel.org, linux-kernel@vger.kernel.org,
Paul Mackerras <paulus@samba.org>,
Paolo Bonzini <pbonzini@redhat.com>,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 3/3] PPC: KVM: Add support for 64bit TCE windows
Date: Thu, 05 Jun 2014 19:26:51 +1000 [thread overview]
Message-ID: <539037DB.5080706@ozlabs.ru> (raw)
In-Reply-To: <1401953908.3247.121.camel@pasglop>
On 06/05/2014 05:38 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-06-05 at 17:25 +1000, Alexey Kardashevskiy wrote:
>> +This creates a virtual TCE (translation control entry) table, which
>> +is an IOMMU for PAPR-style virtual I/O. It is used to translate
>> +logical addresses used in virtual I/O into guest physical addresses,
>> +and provides a scatter/gather capability for PAPR virtual I/O.
>> +
>> +/* for KVM_CAP_SPAPR_TCE_64 */
>> +struct kvm_create_spapr_tce_64 {
>> + __u64 liobn;
>> + __u64 window_size;
>> + __u64 bus_offset;
>> + __u32 page_shift;
>> + __u32 flags;
>> +};
>> +
>> +The liobn field gives the logical IO bus number for which to create a
>> +TCE table. The window_size field specifies the size of the DMA window
>> +which this TCE table will translate - the table will contain one 64
>> +bit TCE entry for every IOMMU page. The bus_offset field tells where
>> +this window is mapped on the IO bus.
>
> Hrm, the bus_offset cannot be set arbitrarily, it has some pretty strong
> HW limits depending on the type of bridge & architecture version...
>
> Do you plan to have that knowledge in qemu ? Or do you have some other
> mechanism to query it ? (I might be missing a piece of the puzzle here).
Yes. QEMU will have this knowledge as it has to implement
ibm,create-pe-dma-window and return this address to the guest. There will
be a container API to receive it from powernv code via funky ppc_md callback.
There are 2 steps:
1. query + create window
2. enable in-kernel KVM acceleration for it.
Everything will work without step2 and, frankly speaking, we do not need it
too much for DDW but it does not cost much.
By having bus_offset in ioctl which is only used for step2, I reduce
dependance from powernv.
> Also one thing I've been pondering ...
>
> We'll end up wasting a ton of memory with those TCE tables. If you have
> 3 PEs mapped into a guest, it will try to create 3 DDW's mapping the
> entire guest memory and so 3 TCE tables large enough for that ... and
> which will contain exactly the same entries !
This is in the plan too, do not rush :)
> We really want to look into extending PAPR to allow the creation of
> table "aliases" so that the guest can essentially create one table and
> associate it with multiple PEs. We might still decide to do multiple
> copies for NUMA reasons but no more than one per node for example... at
> least we can have the policy in qemu/kvm.
>
> Also, do you currently require allocating a single physically contiguous
> table or do you support TCE trees in your implementation ?
No trees yet. For 64GB window we need (64<<30)/(16<<20)*8 = 32K TCE table.
Do we really need trees?
--
Alexey
next prev parent reply other threads:[~2014-06-05 9:27 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-05 7:25 [PATCH 0/3] Prepare for in-kernel VFIO DMA operations acceleration Alexey Kardashevskiy
2014-06-05 7:25 ` [PATCH 1/3] PPC: KVM: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number Alexey Kardashevskiy
2014-06-05 7:25 ` [PATCH 2/3] PPC: KVM: Reserve KVM_CAP_SPAPR_TCE_64 " Alexey Kardashevskiy
2014-06-05 7:25 ` [PATCH 3/3] PPC: KVM: Add support for 64bit TCE windows Alexey Kardashevskiy
2014-06-05 7:38 ` Benjamin Herrenschmidt
2014-06-05 9:26 ` Alexey Kardashevskiy [this message]
2014-06-05 10:27 ` Benjamin Herrenschmidt
2014-06-05 11:56 ` Alexander Graf
2014-06-05 12:30 ` Benjamin Herrenschmidt
2014-06-05 12:32 ` Alexander Graf
2014-06-05 13:04 ` Alexey Kardashevskiy
2014-06-05 11:57 ` [PATCH 0/3] Prepare for in-kernel VFIO DMA operations acceleration Alexander Graf
2014-06-06 0:20 ` Alexey Kardashevskiy
2014-06-25 21:12 ` Alexander Graf
2014-06-25 23:59 ` Alexey Kardashevskiy
2014-06-26 10:37 ` Alexander Graf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=539037DB.5080706@ozlabs.ru \
--to=aik@ozlabs.ru \
--cc=agraf@suse.de \
--cc=benh@kernel.crashing.org \
--cc=gleb@kernel.org \
--cc=kvm-ppc@vger.kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=paulus@samba.org \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).