From: Alexander Graf <agraf@suse.de>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Christoffer Dall <cdall@cs.columbia.edu>,
kvm@vger.kernel.org,
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
Alexey Kardashevskiy <aik@ozlabs.ru>,
linux-kernel@vger.kernel.org, kvm-ppc@vger.kernel.org,
linux-mm@kvack.org, Alex Williamson <alex.williamson@redhat.com>,
Paul Mackerras <paulus@samba.org>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
linuxppc-dev@lists.ozlabs.org,
David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH 4/8] powerpc: Prepare to support kernel handling of IOMMU map/unmap
Date: Tue, 09 Jul 2013 17:54:42 +0200
Message-ID: <51DC3242.30802@suse.de>
In-Reply-To: <1373247199.4446.29.camel@pasglop>

On 07/08/2013 03:33 AM, Benjamin Herrenschmidt wrote:
> On Sun, 2013-07-07 at 01:07 +1000, Alexey Kardashevskiy wrote:
>> The current VFIO-on-POWER implementation supports only user mode
>> driven mapping, i.e. QEMU is sending requests to map/unmap pages.
>> However this approach is really slow, so we want to move that to KVM.
>> Since H_PUT_TCE can be extremely performance sensitive (especially with
>> network adapters where each packet needs to be mapped/unmapped) we chose
>> to implement that as a "fast" hypercall directly in "real
>> mode" (processor still in the guest context but MMU off).
>>
>> To be able to do that, we need to provide some facilities to
>> access the struct page count within that real mode environment as things
>> like the sparsemem vmemmap mappings aren't accessible.
>>
>> This adds an API to increment/decrement page counter as
>> get_user_pages API used for user mode mapping does not work
>> in the real mode.
>>
>> CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported.
> This patch will need an ack from "mm" people to make sure they are ok
> with our approach and ack the change to the generic header.
>
> (Added linux-mm).
>
> Cheers,
> Ben.
>
>> Reviewed-by: Paul Mackerras<paulus@samba.org>
>> Signed-off-by: Paul Mackerras<paulus@samba.org>
>> Signed-off-by: Alexey Kardashevskiy<aik@ozlabs.ru>
>>
>> ---
>>
>> Changes:
>> 2013/06/27:
>> * realmode_get_page() fixed to use get_page_unless_zero(). If failed,
>> the call will be passed from real to virtual mode and safely handled.
>> * added comment to PageCompound() in include/linux/page-flags.h.
>>
>> 2013/05/20:
>> * PageTail() is replaced by PageCompound() in order to have the same checks
>> for whether the page is huge in realmode_get_page() and realmode_put_page()
>>
>> Signed-off-by: Alexey Kardashevskiy<aik@ozlabs.ru>
>> ---
>> arch/powerpc/include/asm/pgtable-ppc64.h | 4 ++
>> arch/powerpc/mm/init_64.c | 78 +++++++++++++++++++++++++++++++-
>> include/linux/page-flags.h | 4 +-
>> 3 files changed, 84 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
>> index e3d55f6f..7b46e5f 100644
>> --- a/arch/powerpc/include/asm/pgtable-ppc64.h
>> +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
>> @@ -376,6 +376,10 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
>> }
>> #endif /* !CONFIG_HUGETLB_PAGE */
>>
>> +struct page *realmode_pfn_to_page(unsigned long pfn);
>> +int realmode_get_page(struct page *page);
>> +int realmode_put_page(struct page *page);
>> +
>> #endif /* __ASSEMBLY__ */
>>
>> #endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
>> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
>> index a90b9c4..7031be3 100644
>> --- a/arch/powerpc/mm/init_64.c
>> +++ b/arch/powerpc/mm/init_64.c
>> @@ -297,5 +297,81 @@ void vmemmap_free(unsigned long start, unsigned long end)
>> {
>> }
>>
>> -#endif /* CONFIG_SPARSEMEM_VMEMMAP */
>> +/*
>> + * We do not have access to the sparsemem vmemmap, so we fallback to
>> + * walking the list of sparsemem blocks which we already maintain for
>> + * the sake of crashdump. In the long run, we might want to maintain
>> + * a tree if performance of that linear walk becomes a problem.
>> + *
>> + * Any of realmode_XXXX functions can fail due to:
>> + * 1) As real sparsemem blocks do not lay in RAM continously (they
>> + * are in virtual address space which is not available in the real mode),
>> + * the requested page struct can be split between blocks so get_page/put_page
>> + * may fail.
>> + * 2) When huge pages are used, the get_page/put_page API will fail
>> + * in real mode as the linked addresses in the page struct are virtual
>> + * too.
>> + * When 1) or 2) takes place, the API returns an error code to cause
>> + * an exit to kernel virtual mode where the operation will be completed.

I don't see where these functions enter kernel virtual mode. I think
it's best to just remove the last sentence. It doesn't belong here.

Alex

>> + */
>> +struct page *realmode_pfn_to_page(unsigned long pfn)
>> +{
>> + struct vmemmap_backing *vmem_back;
>> + struct page *page;
>> + unsigned long page_size = 1<< mmu_psize_defs[mmu_vmemmap_psize].shift;
>> + unsigned long pg_va = (unsigned long) pfn_to_page(pfn);
>>
>> + for (vmem_back = vmemmap_list; vmem_back; vmem_back = vmem_back->list) {
>> + if (pg_va< vmem_back->virt_addr)
>> + continue;
>> +
>> + /* Check that page struct is not split between real pages */
>> + if ((pg_va + sizeof(struct page))>
>> + (vmem_back->virt_addr + page_size))
>> + return NULL;
>> +
>> + page = (struct page *) (vmem_back->phys + pg_va -
>> + vmem_back->virt_addr);
>> + return page;
>> + }
>> +
>> + return NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
>> +
>> +#elif defined(CONFIG_FLATMEM)
>> +
>> +struct page *realmode_pfn_to_page(unsigned long pfn)
>> +{
>> + struct page *page = pfn_to_page(pfn);
>> + return page;
>> +}
>> +EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
>> +
>> +#endif /* CONFIG_SPARSEMEM_VMEMMAP/CONFIG_FLATMEM */
>> +
>> +#if defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_FLATMEM)
>> +int realmode_get_page(struct page *page)
>> +{
>> + if (PageCompound(page))
>> + return -EAGAIN;
>> +
>> + if (!get_page_unless_zero(page))
>> + return -EAGAIN;
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(realmode_get_page);
>> +
>> +int realmode_put_page(struct page *page)
>> +{
>> + if (PageCompound(page))
>> + return -EAGAIN;
>> +
>> + if (!atomic_add_unless(&page->_count, -1, 1))
>> + return -EAGAIN;
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(realmode_put_page);
>> +#endif
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index 6d53675..98ada58 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -329,7 +329,9 @@ static inline void set_page_writeback(struct page *page)
>> * System with lots of page flags available. This allows separate
>> * flags for PageHead() and PageTail() checks of compound pages so that bit
>> * tests can be used in performance sensitive paths. PageCompound is
>> - * generally not used in hot code paths.
>> + * generally not used in hot code paths except arch/powerpc/mm/init_64.c
>> + * and arch/powerpc/kvm/book3s_64_vio_hv.c which use it to detect huge pages
>> + * and avoid handling those in real mode.
>> */
>> __PAGEFLAG(Head, head) CLEARPAGEFLAG(Head, head)
>> __PAGEFLAG(Tail, tail)
>
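
For readers following the quoted patch, the linear walk in realmode_pfn_to_page() can be modelled in userspace. What follows is only a sketch under stated assumptions, not the kernel code: struct backing_block, demo_virt_to_phys() and DEMO_PAGE_STRUCT_SIZE are hypothetical names, plain integers stand in for addresses, and the block size is passed in rather than read from mmu_psize_defs[]. Unlike the quoted patch, the sketch also checks the upper bound of each block explicitly, so it does not depend on the order of the list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct vmemmap_backing. */
struct backing_block {
	struct backing_block *next;	/* next block in the list, or NULL */
	uintptr_t phys;			/* physical base of the block */
	uintptr_t virt_addr;		/* virtual base of the block */
};

/* Assumed size of one page struct; illustrative value only. */
#define DEMO_PAGE_STRUCT_SIZE 64u

/*
 * Translate the virtual address of a page struct into a physical one by
 * walking the list of backing blocks.
 *
 * Returns 0 when no block covers pg_va, or when the page struct would
 * straddle the end of its block. The caller is then expected to fall
 * back to a slower path; this function does not do so itself.
 */
static uintptr_t demo_virt_to_phys(const struct backing_block *list,
				   uintptr_t pg_va, uintptr_t block_size)
{
	const struct backing_block *b;

	for (b = list; b; b = b->next) {
		/* Skip blocks that do not contain the start of the struct. */
		if (pg_va < b->virt_addr ||
		    pg_va >= b->virt_addr + block_size)
			continue;

		/* The struct must not be split across two blocks. */
		if (pg_va + DEMO_PAGE_STRUCT_SIZE > b->virt_addr + block_size)
			return 0;

		return b->phys + (pg_va - b->virt_addr);
	}

	return 0;
}
```

With two 64K blocks chained together, an address inside a block translates to the matching offset from that block's physical base, while an address whose last bytes would cross the block boundary yields 0. This mirrors the "return NULL and let the caller decide" behaviour of the quoted function.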