From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Alexander Graf <agraf@suse.de>, qemu-devel@nongnu.org
Cc: Alex Williamson <alex.williamson@redhat.com>,
	qemu-ppc@nongnu.org, Gavin Shan <gwshan@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH v7 1/4] spapr_iommu: Make in-kernel TCE table optional
Date: Fri, 06 Jun 2014 09:48:16 +1000
Message-ID: <539101C0.2090004@ozlabs.ru>
In-Reply-To: <5390FEF3.4080108@suse.de>

On 06/06/2014 09:36 AM, Alexander Graf wrote:
> 
> On 06.06.14 01:17, Alexey Kardashevskiy wrote:
>> On 06/06/2014 02:51 AM, Alexander Graf wrote:
>>> On 05.06.14 16:33, Alexey Kardashevskiy wrote:
>>>> On 06/05/2014 11:36 PM, Alexander Graf wrote:
>>>>> On 05.06.14 15:33, Alexey Kardashevskiy wrote:
>>>>>> On 06/05/2014 11:15 PM, Alexander Graf wrote:
>>>>>>> On 05.06.14 15:10, Alexey Kardashevskiy wrote:
>>>>>>>> On 06/05/2014 11:06 PM, Alexander Graf wrote:
>>>>>>>>> On 05.06.14 08:43, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 06/05/2014 03:49 PM, Alexey Kardashevskiy wrote:
>>>>>>>>>>> POWER KVM supports a KVM_CAP_SPAPR_TCE capability which
>>>>>>>>>>> allows allocating TCE tables in the host kernel memory and
>>>>>>>>>>> handling H_PUT_TCE requests targeted to a specific LIOBN
>>>>>>>>>>> (logical I/O bus number) right in the host without switching
>>>>>>>>>>> to QEMU. At the moment this is used for emulated devices only,
>>>>>>>>>>> and the handler only puts the TCE into the table. If the
>>>>>>>>>>> in-kernel H_PUT_TCE handler finds the LIOBN and its table, it
>>>>>>>>>>> puts the TCE into the table and completes the hypercall. User
>>>>>>>>>>> space is not notified.
>>>>>>>>>>>
>>>>>>>>>>> Upcoming VFIO support is going to use the same sPAPRTCETable
>>>>>>>>>>> device class, so KVM_CAP_SPAPR_TCE is going to be used as well.
>>>>>>>>>>> That means that TCE tables for VFIO are going to be allocated in
>>>>>>>>>>> the host too. However, VFIO operates on real IOMMU tables, and
>>>>>>>>>>> simply copying a TCE to the real hardware TCE table will not
>>>>>>>>>>> work, as guest physical to host physical address translation is
>>>>>>>>>>> required.
>>>>>>>>>>>
>>>>>>>>>>> So until the host kernel gets VFIO support for H_PUT_TCE, we
>>>>>>>>>>> had better not register VFIO's TCE tables in the host.
>>>>>>>>>>>
>>>>>>>>>>> This adds a bool @kvm_accel flag to the sPAPRTCETable device
>>>>>>>>>>> telling whether sPAPRTCETable may try allocating the TCE table
>>>>>>>>>>> in the host kernel. If it is false, the table is created in
>>>>>>>>>>> QEMU instead.
>>>>>>>>>>>
>>>>>>>>>>> This adds a kvm_accel parameter to spapr_tce_new_table() to let
>>>>>>>>>>> users choose whether to use acceleration or not. At the moment
>>>>>>>>>>> it is enabled for VIO and emulated PCI. Upcoming VFIO support
>>>>>>>>>>> will set it to false.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>>> ---
>>>>>>>>>>>
>>>>>>>>>>> This is a workaround, but it lets me have one IOMMU device for
>>>>>>>>>>> VIO, emulated PCI and VFIO, which is a good thing.
>>>>>>>>>>>
>>>>>>>>>>> The other way would be a new KVM_CAP_SPAPR_TCE_VFIO capability,
>>>>>>>>>>> but this needs a kernel update.
>>>>>>>>>> Never mind, I'll make it a capability. I'll post the capability
>>>>>>>>>> reservation patch separately.
>>>>>>>>> Just rename the flag from "kvm_accel" to "vfio_accel", set it to
>>>>>>>>> true for vfio and false for emulated devices. Then the spapr_iommu
>>>>>>>>> file can check on the capability (and default to false for now,
>>>>>>>>> since it doesn't exist yet).
>>>>>>>> Is that ok if the flag does not have anything to do with VFIO per
>>>>>>>> se? :)
>>>>>>> The flag means "use in-kernel acceleration if the vfio coupling
>>>>>>> capability is available", no?
>>>>>> It is a flag of sPAPRTCETable, which is not supposed to know about
>>>>>> VFIO at all; it is just an IOMMU. But if you are ok with it, I have
>>>>>> no reason to be unhappy either :)
>>>>>>
>>>>>>>>> That way you don't have to reserve a CAP today.
>>>>>>>> Why exactly can we not do that today?
>>>>>>> Because the CAP namespace isn't a garbage bin we can just throw IDs
>>>>>>> at. Maybe we realize during patch review that we need completely
>>>>>>> different CAPs.
>>>>>> That was my first plan: to wait for KVM_CAP_SPAPR_TCE_64 to be
>>>>>> available in the kernel.
>>>>> So all you need are 64bit TCEs with bus_offset?
>>>> No. I need 64-bit IOBAs, a.k.a. PCI bus addresses. The default DMA
>>>> window is just 1 or 2GB and it is mapped at 0 on the PCI bus.
>>>>
>>>> TCEs are 64-bit already.
>>> Ok, so the guest has to tell the PCI device to write to a specific window.
>>> That's a shame :).
>> No. The guest tells the device some address, that's it. The guest
>> allocates those addresses from some window which the host, guest and
>> PHB know about but the device does not. What is a shame here?
> 
> It would be nicer if the guest had full control over the virtual address
> range of a PCI device.
>
>>>>> What about the missing in-kernel modification of the shadow TCEs on
>>>>> H_PUT_TCE? I thought that's what this is really about.
>>>> This I do not understand :(
>>> How does real-mode H_PUT_TCE emulation know that it needs to notify
>>> user space to establish the map?
>> If it wants to pass control to user space, it returns H_TOO_HARD. This
>> happens, for example, if the LIOBN was not registered in KVM.
> 
> So how does KVM_CAP_SPAPR_TCE_64 help here? With KVM_CAP_SPAPR_TCE_64 we
> still cannot map VFIO devices' TCE tables because we're missing all the
> magic to link the virtual TCE table to a physical TCE table.
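To make the H_PUT_TCE / H_TOO_HARD exchange concrete, here is a minimal, simplified sketch of what an in-kernel H_PUT_TCE fast path does. It is not the real KVM code: every identifier (sketch_h_put_tce, sketch_registry, sketch_kernel_tce_table, the SKETCH_H_* return codes) is hypothetical, the numeric return values are placeholders for this sketch only, and 4K IOMMU pages with a DMA window starting at bus address 0 are assumed. It shows why a registered LIOBN never reaches QEMU, why an unregistered one falls back to user space with H_TOO_HARD, and why simply storing the TCE cannot work for VFIO: the stored value is still a guest physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder return codes for this sketch only. */
#define SKETCH_H_SUCCESS    0
#define SKETCH_H_PARAMETER  (-4)
#define SKETCH_H_TOO_HARD   9999

#define SKETCH_TCE_PAGE_SHIFT 12 /* assumed 4K IOMMU pages */

/* Hypothetical in-kernel view of one registered TCE table. */
typedef struct {
    uint64_t liobn;
    uint64_t window_size; /* bytes of IOBA space starting at bus address 0 */
    uint64_t *tces;       /* one entry per IOMMU page in the window */
} sketch_kernel_tce_table;

/* Hypothetical set of LIOBNs that user space registered with KVM. */
typedef struct {
    sketch_kernel_tce_table *tables;
    size_t count;
} sketch_registry;

static sketch_kernel_tce_table *sketch_find_table(const sketch_registry *reg,
                                                  uint64_t liobn)
{
    assert(reg != NULL);
    for (size_t i = 0; i < reg->count; i++) {
        if (reg->tables[i].liobn == liobn) {
            return &reg->tables[i];
        }
    }
    return NULL;
}

long sketch_h_put_tce(const sketch_registry *reg, uint64_t liobn,
                      uint64_t ioba, uint64_t tce)
{
    sketch_kernel_tce_table *tbl = sketch_find_table(reg, liobn);

    if (!tbl) {
        /* LIOBN not registered in KVM: exit to QEMU to handle it. */
        return SKETCH_H_TOO_HARD;
    }
    if (ioba & ((1ULL << SKETCH_TCE_PAGE_SHIFT) - 1)) {
        return SKETCH_H_PARAMETER; /* IOBA not page aligned */
    }
    if (ioba >= tbl->window_size) {
        return SKETCH_H_PARAMETER; /* IOBA outside the DMA window */
    }
    /* The TCE is stored as-is: no guest-to-host address translation. */
    tbl->tces[ioba >> SKETCH_TCE_PAGE_SHIFT] = tce;
    return SKETCH_H_SUCCESS;
}
```

Linking such a table to a real hardware TCE table would need a translation step at the store, which is the missing piece discussed above.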


It does not help here indeed; I did not say it would ;) I just wanted to
do the preparations first, and this means I need to reserve capability
numbers (which is normally a very tough process). Since one capability is
straightforward to implement, I included it in the set.
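A minimal sketch of the kvm_accel behaviour described in the patch, assuming a simplified table type. sketch_tce_table, sketch_tce_new_table() and sketch_tce_free() are hypothetical stand-ins, not QEMU's sPAPRTCETable or spapr_tce_new_table(), and the host_has_cap argument only simulates whether the host offers KVM_CAP_SPAPR_TCE; no real ioctl is issued. VIO and emulated PCI would pass kvm_accel=true; VFIO would pass false so its table stays in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for sPAPRTCETable. */
typedef struct {
    uint32_t liobn;     /* logical I/O bus number */
    uint32_t nb_table;  /* number of TCE entries */
    bool kvm_accel;     /* caller permits an in-kernel table */
    bool in_kernel;     /* true if the host kernel owns the table */
    uint64_t *table;    /* QEMU-side table; NULL when in_kernel */
} sketch_tce_table;

/*
 * Hypothetical constructor. host_has_cap simulates whether the host
 * kernel offers KVM_CAP_SPAPR_TCE; no real ioctl is issued here.
 */
sketch_tce_table *sketch_tce_new_table(uint32_t liobn, uint32_t nb_table,
                                       bool kvm_accel, bool host_has_cap)
{
    assert(nb_table > 0);

    sketch_tce_table *tbl = calloc(1, sizeof(*tbl));
    if (!tbl) {
        return NULL;
    }
    tbl->liobn = liobn;
    tbl->nb_table = nb_table;
    tbl->kvm_accel = kvm_accel;

    if (kvm_accel && host_has_cap) {
        /* H_PUT_TCE for this LIOBN would be handled in the host kernel. */
        tbl->in_kernel = true;
        return tbl;
    }

    /* kvm_accel == false (e.g. VFIO) or no capability: keep it in QEMU. */
    tbl->table = calloc(nb_table, sizeof(*tbl->table));
    if (!tbl->table) {
        free(tbl);
        return NULL;
    }
    return tbl;
}

void sketch_tce_free(sketch_tce_table *tbl)
{
    if (tbl) {
        free(tbl->table); /* free(NULL) is a no-op for in-kernel tables */
        free(tbl);
    }
}
```

With the vfio_accel naming suggested earlier in the thread, tables with the flag set would additionally require the (not yet existing) VFIO capability before going in-kernel; the QEMU fallback path would stay the same.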



-- 
Alexey


Thread overview: 24+ messages
2014-06-05  5:49 [Qemu-devel] [PATCH v7 0/4] vfio on spapr-ppc64 Alexey Kardashevskiy
2014-06-05  5:49 ` [Qemu-devel] [PATCH v7 1/4] spapr_iommu: Make in-kernel TCE table optional Alexey Kardashevskiy
2014-06-05  6:43   ` Alexey Kardashevskiy
2014-06-05 13:06     ` Alexander Graf
2014-06-05 13:10       ` Alexey Kardashevskiy
2014-06-05 13:15         ` Alexander Graf
2014-06-05 13:33           ` Alexey Kardashevskiy
2014-06-05 13:36             ` Alexander Graf
2014-06-05 14:33               ` Alexey Kardashevskiy
2014-06-05 16:51                 ` Alexander Graf
2014-06-05 23:17                   ` Alexey Kardashevskiy
2014-06-05 23:36                     ` Alexander Graf
2014-06-05 23:48                       ` Alexey Kardashevskiy [this message]
2014-06-06  3:38                       ` Benjamin Herrenschmidt
2014-06-05  5:49 ` [Qemu-devel] [PATCH v7 2/4] vfio: Add vfio_container_spapr_get_info() Alexey Kardashevskiy
2014-06-05 19:27   ` Alex Williamson
2014-06-05 23:40     ` Alexey Kardashevskiy
2014-06-06  1:32       ` Gavin Shan
2014-06-05  5:50 ` [Qemu-devel] [PATCH v7 3/4] spapr_pci_vfio: Add spapr-pci-vfio-host-bridge to support vfio Alexey Kardashevskiy
2014-06-05 13:34   ` Alexander Graf
2014-06-05 14:37     ` Alexey Kardashevskiy
2014-06-05  5:50 ` [Qemu-devel] [PATCH v7 4/4] vfio: Enable for spapr Alexey Kardashevskiy
2014-06-05 19:31   ` Alex Williamson
2014-06-05 23:39     ` Alexey Kardashevskiy
