linuxppc-dev.lists.ozlabs.org archive mirror
From: Alexander Graf <agraf@suse.de>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Christoffer Dall <cdall@cs.columbia.edu>,
	kvm@vger.kernel.org,
	Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
	Alexey Kardashevskiy <aik@ozlabs.ru>,
	linux-kernel@vger.kernel.org, kvm-ppc@vger.kernel.org,
	linux-mm@kvack.org, Alex Williamson <alex.williamson@redhat.com>,
	Paul Mackerras <paulus@samba.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH 4/8] powerpc: Prepare to support kernel handling of IOMMU map/unmap
Date: Tue, 09 Jul 2013 17:54:42 +0200
Message-ID: <51DC3242.30802@suse.de>
In-Reply-To: <1373247199.4446.29.camel@pasglop>

On 07/08/2013 03:33 AM, Benjamin Herrenschmidt wrote:
> On Sun, 2013-07-07 at 01:07 +1000, Alexey Kardashevskiy wrote:
>> The current VFIO-on-POWER implementation supports only user mode
>> driven mapping, i.e. QEMU is sending requests to map/unmap pages.
>> However this approach is really slow, so we want to move that to KVM.
>> Since H_PUT_TCE can be extremely performance sensitive (especially with
>> network adapters where each packet needs to be mapped/unmapped) we chose
>> to implement that as a "fast" hypercall directly in "real
>> mode" (processor still in the guest context but MMU off).
>>
>> To be able to do that, we need to provide some facilities to
>> access the struct page count within that real mode environment as things
>> like the sparsemem vmemmap mappings aren't accessible.
>>
>> This adds an API to increment/decrement page counter as
>> get_user_pages API used for user mode mapping does not work
>> in the real mode.
>>
>> CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported.
> This patch will need an ack from "mm" people to make sure they are ok
> with our approach and ack the change to the generic header.
>
> (Added linux-mm).
>
> Cheers,
> Ben.
>
>> Reviewed-by: Paul Mackerras <paulus@samba.org>
>> Signed-off-by: Paul Mackerras <paulus@samba.org>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> ---
>>
>> Changes:
>> 2013/06/27:
>> * realmode_get_page() fixed to use get_page_unless_zero(). If failed,
>> the call will be passed from real to virtual mode and safely handled.
>> * added comment to PageCompound() in include/linux/page-flags.h.
>>
>> 2013/05/20:
>> * PageTail() is replaced by PageCompound() in order to have the same checks
>> for whether the page is huge in realmode_get_page() and realmode_put_page()
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>   arch/powerpc/include/asm/pgtable-ppc64.h |  4 ++
>>   arch/powerpc/mm/init_64.c                | 78 +++++++++++++++++++++++++++++++-
>>   include/linux/page-flags.h               |  4 +-
>>   3 files changed, 84 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
>> index e3d55f6f..7b46e5f 100644
>> --- a/arch/powerpc/include/asm/pgtable-ppc64.h
>> +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
>> @@ -376,6 +376,10 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
>>   }
>>   #endif /* !CONFIG_HUGETLB_PAGE */
>>
>> +struct page *realmode_pfn_to_page(unsigned long pfn);
>> +int realmode_get_page(struct page *page);
>> +int realmode_put_page(struct page *page);
>> +
>>   #endif /* __ASSEMBLY__ */
>>
>>   #endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
>> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
>> index a90b9c4..7031be3 100644
>> --- a/arch/powerpc/mm/init_64.c
>> +++ b/arch/powerpc/mm/init_64.c
>> @@ -297,5 +297,81 @@ void vmemmap_free(unsigned long start, unsigned long end)
>>   {
>>   }
>>
>> -#endif /* CONFIG_SPARSEMEM_VMEMMAP */
>> +/*
>> + * We do not have access to the sparsemem vmemmap, so we fall back to
>> + * walking the list of sparsemem blocks which we already maintain for
>> + * the sake of crashdump. In the long run, we might want to maintain
>> + * a tree if performance of that linear walk becomes a problem.
>> + *
>> + * Any of realmode_XXXX functions can fail due to:
>> + * 1) As real sparsemem blocks do not lie in RAM contiguously (they
>> + * are in virtual address space which is not available in the real mode),
>> + * the requested page struct can be split between blocks so get_page/put_page
>> + * may fail.
>> + * 2) When huge pages are used, the get_page/put_page API will fail
>> + * in real mode as the linked addresses in the page struct are virtual
>> + * too.
>> + * When 1) or 2) takes place, the API returns an error code to cause
>> + * an exit to kernel virtual mode where the operation will be completed.

I don't see where these functions enter kernel virtual mode. I think 
it's best to just remove the last sentence. It doesn't belong here.
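
To illustrate what I mean: the helpers only ever report -EAGAIN, and it
is the caller that decides to retry on the slow path. Below is a rough
userspace sketch of that split, not the actual KVM code. Every demo_*
name and struct demo_page are made up for illustration; the real virtual
mode handler is not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: a reference count plus a flag
 * saying the page belongs to a compound (huge) page. */
struct demo_page {
	int count;
	int compound;
};

/* Same shape as realmode_get_page(): refuse what cannot be handled
 * safely and report -EAGAIN. It never switches modes itself. */
int demo_realmode_get_page(struct demo_page *page)
{
	assert(page != NULL);

	if (page->compound)
		return -EAGAIN;

	if (page->count == 0)	/* where get_page_unless_zero() would fail */
		return -EAGAIN;

	page->count++;
	return 0;
}

/* Hypothetical slow path standing in for the virtual mode handler. */
int demo_virtmode_get_page(struct demo_page *page)
{
	assert(page != NULL);

	if (page->count == 0)
		return -EFAULT;

	page->count++;
	return 0;
}

/* The caller owns the retry decision. *took_slow_path is set to 1 when
 * the fast path bailed out and the slow path had to finish the job. */
int demo_handle_map(struct demo_page *page, int *took_slow_path)
{
	int ret = demo_realmode_get_page(page);

	*took_slow_path = 0;
	if (ret == -EAGAIN) {
		*took_slow_path = 1;
		ret = demo_virtmode_get_page(page);
	}

	return ret;
}
```

So a comment on the helpers should describe the error code they return,
and the fallback belongs in the documentation of whoever calls them.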


Alex

>> + */
>> +struct page *realmode_pfn_to_page(unsigned long pfn)
>> +{
>> +	struct vmemmap_backing *vmem_back;
>> +	struct page *page;
>> +	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
>> +	unsigned long pg_va = (unsigned long) pfn_to_page(pfn);
>>
>> +	for (vmem_back = vmemmap_list; vmem_back; vmem_back = vmem_back->list) {
>> +		if (pg_va < vmem_back->virt_addr)
>> +			continue;
>> +
>> +		/* Check that page struct is not split between real pages */
>> +		if ((pg_va + sizeof(struct page))>
>> +				(vmem_back->virt_addr + page_size))
>> +			return NULL;
>> +
>> +		page = (struct page *) (vmem_back->phys + pg_va -
>> +				vmem_back->virt_addr);
>> +		return page;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
>> +
>> +#elif defined(CONFIG_FLATMEM)
>> +
>> +struct page *realmode_pfn_to_page(unsigned long pfn)
>> +{
>> +	struct page *page = pfn_to_page(pfn);
>> +	return page;
>> +}
>> +EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
>> +
>> +#endif /* CONFIG_SPARSEMEM_VMEMMAP/CONFIG_FLATMEM */
>> +
>> +#if defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_FLATMEM)
>> +int realmode_get_page(struct page *page)
>> +{
>> +	if (PageCompound(page))
>> +		return -EAGAIN;
>> +
>> +	if (!get_page_unless_zero(page))
>> +		return -EAGAIN;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(realmode_get_page);
>> +
>> +int realmode_put_page(struct page *page)
>> +{
>> +	if (PageCompound(page))
>> +		return -EAGAIN;
>> +
>> +	if (!atomic_add_unless(&page->_count, -1, 1))
>> +		return -EAGAIN;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(realmode_put_page);
>> +#endif
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index 6d53675..98ada58 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -329,7 +329,9 @@ static inline void set_page_writeback(struct page *page)
>>    * System with lots of page flags available. This allows separate
>>    * flags for PageHead() and PageTail() checks of compound pages so that bit
>>    * tests can be used in performance sensitive paths. PageCompound is
>> - * generally not used in hot code paths.
>> + * generally not used in hot code paths except arch/powerpc/mm/init_64.c
>> + * and arch/powerpc/kvm/book3s_64_vio_hv.c which use it to detect huge pages
>> + * and avoid handling those in real mode.
>>    */
>>   __PAGEFLAG(Head, head) CLEARPAGEFLAG(Head, head)
>>   __PAGEFLAG(Tail, tail)
>
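
For anyone following along, the lookup in realmode_pfn_to_page() above
boils down to a linear walk over the backing blocks that rebases a
virtual address onto the matching physical block. A minimal userspace
sketch of that idea follows. struct demo_backing and demo_virt_to_phys()
are invented names, and unlike the quoted loop the sketch also skips
blocks that end below the address, so it does not rely on the order of
the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of struct vmemmap_backing: one block of
 * block_size bytes mapped at virt_addr and backed at phys. */
struct demo_backing {
	struct demo_backing *next;
	uintptr_t phys;
	uintptr_t virt_addr;
};

/*
 * Translate the virtual address of an object of obj_size bytes into a
 * physical address. Returns 0 and fills *phys_out on success. Returns
 * -1 when no block covers the address or when the object would
 * straddle the end of its block.
 */
int demo_virt_to_phys(const struct demo_backing *list, uintptr_t va,
		      size_t obj_size, size_t block_size,
		      uintptr_t *phys_out)
{
	const struct demo_backing *b;

	for (b = list; b; b = b->next) {
		/* Skip blocks that do not contain the start address. */
		if (va < b->virt_addr || va >= b->virt_addr + block_size)
			continue;

		/* The object must not be split between two blocks. */
		if (va + obj_size > b->virt_addr + block_size)
			return -1;

		*phys_out = b->phys + (va - b->virt_addr);
		return 0;
	}

	return -1;
}
```

The straddle check is what maps to case 1) in the comment: the caller
gets a failure and has to take another path for that page.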
