From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: linuxppc-dev@lists.ozlabs.org, Paul Mackerras <paulus@samba.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH kernel 8/9] KVM: PPC: Add in-kernel handling for VFIO
Date: Wed, 09 Mar 2016 08:46:47 +0000
Message-ID: <56DFE2F7.80300@ozlabs.ru>
In-Reply-To: <20160308110812.GC22546@voom.fritz.box>

On 03/08/2016 10:08 PM, David Gibson wrote:
> On Mon, Mar 07, 2016 at 02:41:16PM +1100, Alexey Kardashevskiy wrote:
>> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
>> and H_STUFF_TCE requests targeted at an IOMMU TCE table used for VFIO
>> without passing them to user space, which saves time on switching
>> to user space and back.
>>
>> Both real and virtual modes are supported. The kernel tries to
>> handle a TCE request in real mode; if that fails, it passes the request
>> to virtual mode to complete the operation. If the virtual mode
>> handler fails, the request is passed to user space; this is not expected
>> to ever happen, though.
>
> Well... not expected to happen with a qemu which uses this.  Presumably
> it will fall back to userspace routinely if you have an old qemu that
> doesn't add the liobn mappings.


Ah. Ok, thanks, I'll add this to the commit log.
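
The fallback order discussed above (real mode, then virtual mode, then user space) can be sketched in plain user-space C. This is only a model under stated assumptions: the sk_/SK_ names, the request flags and the numeric return values are hypothetical and are not taken from the patch or from asm/hvcall.h. It assumes that a "too hard" status is what hands a request to the next level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative status codes; the real values live in asm/hvcall.h. */
enum { SK_H_SUCCESS = 0, SK_H_TOO_HARD = 1 };

enum sk_handled_by { SK_BY_REAL_MODE, SK_BY_VIRT_MODE, SK_BY_USERSPACE };

/* Hypothetical request description used only by this sketch. */
struct sk_tce_request {
	bool liobn_registered;	/* user space attached the LIOBN to a VFIO group */
	bool needs_virt_mode;	/* real mode cannot complete this request */
};

/* Real mode gives up on anything it cannot finish by itself. */
static long sk_real_mode_handler(const struct sk_tce_request *req)
{
	if (!req->liobn_registered || req->needs_virt_mode)
		return SK_H_TOO_HARD;
	return SK_H_SUCCESS;
}

/* Virtual mode only gives up when the kernel has no LIOBN mapping at all. */
static long sk_virt_mode_handler(const struct sk_tce_request *req)
{
	if (!req->liobn_registered)
		return SK_H_TOO_HARD;
	return SK_H_SUCCESS;
}

/* Try each level in turn; user space is the last resort. */
enum sk_handled_by sk_dispatch(const struct sk_tce_request *req)
{
	assert(req != NULL);
	if (sk_real_mode_handler(req) != SK_H_TOO_HARD)
		return SK_BY_REAL_MODE;
	if (sk_virt_mode_handler(req) != SK_H_TOO_HARD)
		return SK_BY_VIRT_MODE;
	return SK_BY_USERSPACE;
}
```

With an old qemu that never registers the LIOBN, every request in this model ends up at SK_BY_USERSPACE, which is the routine fallback David describes.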


>> The first user of this is VFIO on POWER. Trampolines to the VFIO external
>> user API functions are required for this patch.
>
> I'm not sure what you mean by "trampoline" here.

For example, look at kvm_vfio_group_get_external_user. It calls 
symbol_get(vfio_group_get_external_user) and then calls a function via the 
returned pointer.

Is there a better word for this?
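
To make the pattern concrete, here is a user-space model of what is meant by "trampoline": look a symbol up at call time, call through the returned pointer, then drop the reference. All demo_ names are hypothetical stand-ins for this sketch; the kernel code uses symbol_get()/symbol_put() instead of the small table below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long (*demo_group_fn)(int fd);

struct demo_sym {
	const char *name;
	demo_group_fn fn;
	int refs;		/* references currently held on the symbol */
};

/* Stand-in for a function exported by an optional module. */
static long demo_vfio_group_get(int fd)
{
	return fd + 100;
}

static struct demo_sym demo_symtab[] = {
	{ "demo_vfio_group_get", demo_vfio_group_get, 0 },
};
static int demo_module_loaded;	/* 0 until the "module" is present */

/* Model of symbol_get(): NULL if the provider is not loaded. */
static struct demo_sym *demo_symbol_get(const char *name)
{
	size_t i;

	if (!demo_module_loaded)
		return NULL;
	for (i = 0; i < sizeof(demo_symtab) / sizeof(demo_symtab[0]); i++) {
		if (strcmp(demo_symtab[i].name, name) == 0) {
			demo_symtab[i].refs++;
			return &demo_symtab[i];
		}
	}
	return NULL;
}

/* Model of symbol_put(): release the reference taken above. */
static void demo_symbol_put(struct demo_sym *sym)
{
	assert(sym->refs > 0);
	sym->refs--;
}

/* The "trampoline": resolve late, call via the pointer, release. */
long demo_trampoline(int fd)
{
	struct demo_sym *sym = demo_symbol_get("demo_vfio_group_get");
	long ret;

	if (!sym)
		return -1;	/* provider missing, caller must cope */
	ret = sym->fn(fd);
	demo_symbol_put(sym);
	return ret;
}
```

The point of the indirection is that the caller has no link-time dependency on the provider and simply gets an error when it is absent.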


>> This uses a VFIO KVM device to associate a logical bus number (LIOBN)
>> with a VFIO IOMMU group fd and enable in-kernel handling of map/unmap
>> requests.
>
> Group fd?  Or container fd?  The group fd wouldn't make a lot of
> sense.


Group. KVM has no idea about containers.


>> To make use of the feature, user space has to create a guest view
>> of the TCE table via KVM_CAP_SPAPR_TCE/KVM_CAP_SPAPR_TCE_64 and
>> then associate a LIOBN with this table via the VFIO KVM device, using
>> the KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN property (which is added in
>> the next patch).
>>
>> Tests show that this patch increases transmission speed from 220MB/s
>> to 750..1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb ethernet card).
>
> Is that with or without DDW (i.e. with or without a 64-bit DMA window)?


Without DDW; I should have mentioned this. The patch dates from the time when 
there was no DDW :(



>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>   arch/powerpc/kvm/book3s_64_vio.c    | 184 +++++++++++++++++++++++++++++++++++
>>   arch/powerpc/kvm/book3s_64_vio_hv.c | 186 ++++++++++++++++++++++++++++++++++++
>>   2 files changed, 370 insertions(+)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 7965fc7..9417d12 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -33,6 +33,7 @@
>>   #include <asm/kvm_ppc.h>
>>   #include <asm/kvm_book3s.h>
>>   #include <asm/mmu-hash64.h>
>> +#include <asm/mmu_context.h>
>>   #include <asm/hvcall.h>
>>   #include <asm/synch.h>
>>   #include <asm/ppc-opcode.h>
>> @@ -317,11 +318,161 @@ fail:
>>   	return ret;
>>   }
>>
>> +static long kvmppc_tce_iommu_mapped_dec(struct iommu_table *tbl,
>> +		unsigned long entry)
>> +{
>> +	struct mm_iommu_table_group_mem_t *mem = NULL;
>> +	const unsigned long pgsize = 1ULL << tbl->it_page_shift;
>> +	unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
>> +
>> +	if (!pua)
>> +		return H_HARDWARE;
>> +
>> +	mem = mm_iommu_lookup(*pua, pgsize);
>> +	if (!mem)
>> +		return H_HARDWARE;
>> +
>> +	mm_iommu_mapped_dec(mem);
>> +
>> +	*pua = 0;
>> +
>> +	return H_SUCCESS;
>> +}
>> +
>> +static long kvmppc_tce_iommu_unmap(struct iommu_table *tbl,
>> +		unsigned long entry)
>> +{
>> +	enum dma_data_direction dir = DMA_NONE;
>> +	unsigned long hpa = 0;
>> +
>> +	if (iommu_tce_xchg(tbl, entry, &hpa, &dir))
>> +		return H_HARDWARE;
>> +
>> +	if (dir == DMA_NONE)
>> +		return H_SUCCESS;
>> +
>> +	return kvmppc_tce_iommu_mapped_dec(tbl, entry);
>> +}
>> +
>> +long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>> +		unsigned long entry, unsigned long gpa,
>> +		enum dma_data_direction dir)
>> +{
>> +	long ret;
>> +	unsigned long hpa, ua, *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
>> +	struct mm_iommu_table_group_mem_t *mem;
>> +
>> +	if (!pua)
>> +		return H_HARDWARE;
>
> H_HARDWARE?  Or H_PARAMETER?  This essentially means the guest has
> supplied a bad physical address, doesn't it?

Well, maybe. I'll change it. If it is not H_TOO_HARD, it does not make any 
difference after all :)



>> +	if (kvmppc_gpa_to_ua(kvm, gpa, &ua, NULL))
>> +		return H_HARDWARE;
>> +
>> +	mem = mm_iommu_lookup(ua, 1ULL << tbl->it_page_shift);
>> +	if (!mem)
>> +		return H_HARDWARE;
>> +
>> +	if (mm_iommu_ua_to_hpa(mem, ua, &hpa))
>> +		return H_HARDWARE;
>> +
>> +	if (mm_iommu_mapped_inc(mem))
>> +		return H_HARDWARE;
>> +
>> +	ret = iommu_tce_xchg(tbl, entry, &hpa, &dir);
>> +	if (ret) {
>> +		mm_iommu_mapped_dec(mem);
>> +		return H_TOO_HARD;
>> +	}
>> +
>> +	if (dir != DMA_NONE)
>> +		kvmppc_tce_iommu_mapped_dec(tbl, entry);
>> +
>> +	*pua = ua;
>
> IIUC this means you have a copy of the UA for every group attached to
> the TCE table, but they'll all be the same. Any way to avoid that
> duplication?

It is for every container, not for every group. On P8, I allow multiple groups 
to go to the same container, which means that a container has one or two 
iommu_table structs, and each iommu_table has this "ua" list. Since the tables 
are different (window size, page size, content), these "ua" arrays are also 
different.
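
As a rough illustration of why the arrays differ per table, here is a simplified user-space model of the map/unmap accounting. All sk_ names are hypothetical. Unlike the patch, which learns whether an entry was previously mapped from the direction returned by iommu_tce_xchg(), this sketch treats a stored ua of 0 as "unmapped".

```c
#include <assert.h>
#include <stddef.h>

#define SK_ENTRIES 4

/* A preregistered memory region with a count of live TCE mappings. */
struct sk_mem {
	unsigned long ua;
	unsigned long size;
	long mapped;
};

/* One table, one private array of userspace addresses (0 = unmapped). */
struct sk_table {
	unsigned long ua[SK_ENTRIES];
};

static struct sk_mem sk_regions[] = {
	{ 0x10000, 0x10000, 0 },
	{ 0x40000, 0x10000, 0 },
};

/* Find the region that contains ua, or NULL if it was never registered. */
static struct sk_mem *sk_lookup(unsigned long ua)
{
	size_t i;

	for (i = 0; i < sizeof(sk_regions) / sizeof(sk_regions[0]); i++) {
		if (ua >= sk_regions[i].ua &&
		    ua - sk_regions[i].ua < sk_regions[i].size)
			return &sk_regions[i];
	}
	return NULL;
}

/* Clear an entry and drop the reference its old mapping held. */
int sk_unmap(struct sk_table *tbl, unsigned int entry)
{
	struct sk_mem *mem;

	if (entry >= SK_ENTRIES)
		return -1;
	if (tbl->ua[entry] == 0)
		return 0;	/* nothing was mapped here */
	mem = sk_lookup(tbl->ua[entry]);
	if (!mem)
		return -1;
	assert(mem->mapped > 0);
	mem->mapped--;
	tbl->ua[entry] = 0;
	return 0;
}

/* Take a reference on the new region, release the old one, record ua. */
int sk_map(struct sk_table *tbl, unsigned int entry, unsigned long ua)
{
	struct sk_mem *mem;

	if (entry >= SK_ENTRIES)
		return -1;
	mem = sk_lookup(ua);
	if (!mem)
		return -1;
	mem->mapped++;
	if (sk_unmap(tbl, entry)) {
		mem->mapped--;	/* undo on failure */
		return -1;
	}
	tbl->ua[entry] = ua;
	return 0;
}
```

Two sk_table instances filled by different sequences of sk_map() calls end up with different ua[] contents, even though they share the same set of regions.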





-- 
Alexey

