From: Janosch Frank <frankja@linux.vnet.ibm.com>
To: David Hildenbrand <david@redhat.com>, kvm@vger.kernel.org
Cc: schwidefsky@de.ibm.com, borntraeger@de.ibm.com,
dominik.dingel@gmail.com, linux-s390@vger.kernel.org
Subject: Re: [RFC/PATCH v3 04/16] s390/mm: add gmap PMD invalidation notification
Date: Tue, 13 Feb 2018 15:54:43 +0100
Message-ID: <e9024cdf-7f44-cae3-c839-96eb2e9a6f37@linux.vnet.ibm.com>
In-Reply-To: <38c32f80-9225-8e71-7a8a-aa759cdbbfc7@redhat.com>
On 13.02.2018 15:36, David Hildenbrand wrote:
> On 09.02.2018 10:34, Janosch Frank wrote:
>> For later migration of huge pages we want to write-protect guest
>> PMDs. While doing this, we have to make absolutely sure, that the
>> guest's lowcore is always accessible when the VCPU is running. With
>> PTEs, this is solved by marking the PGSTEs of the lowcore pages with
>> the invalidation notification bit and kicking the guest out of the SIE
>> via a notifier function if we need to invalidate such a page.
>>
>> With PMDs we do not have PGSTEs or some other bits we could use in the
>> host PMD. Instead we pick one of the free bits in the gmap PMD. Every
>> time a host pmd will be invalidated, we will check if the respective
>> gmap PMD has the bit set and in that case fire up the notifier.
>>
>> In the first step we only support setting the invalidation bit, but we
>> do not support restricting access of guest pmds. It will follow
>> shortly.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
>
> [...]
>
>> +
>> +/**
>> + * gmap_pmd_split - Split a huge gmap pmd and use a page table instead
>> + * @gmap: pointer to guest mapping meta data structure
>> + * @gaddr: virtual address in the guest address space
>> + * @pmdp: pointer to the pmd that will be split
>> + *
>> + * When splitting gmap pmds, we have to make the resulting page table
>> + * look like it's a normal one to be able to use the common pte
>> + * handling functions. Also we need to track these new tables as they
>> + * aren't tracked anywhere else.
>> + */
>> +static int gmap_pmd_split(struct gmap *gmap, unsigned long gaddr, pmd_t *pmdp)
>> +{
>> + unsigned long *table;
>> + struct page *page;
>> + pmd_t new;
>> + int i;
>> +
>
> That's interesting, because the SIE can now suddenly work on these
> PGSTEs, e.g. not leading to intercepts on certain events (like setting
> storage keys).
>
> How is that intended to be handled? I assume we would somehow have to
> forbid the SIE from making use of the PGSTE. But that involves clearing
> certain interception controls, which might be problematic.
Well, CMMA is disabled, and storage keys should only be a problem when a
pte is made invalid without the pgste lock being held, which should
never be the case for split pmds.
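To spell out the invariant I'm relying on (an illustrative sketch only,
not code from this series; split_pte_invalidate() is a made-up name,
and pgste_get_lock()/pgste_set_unlock() are the existing pgtable.c
helpers this pattern leans on):

/*
 * Illustration: for a split pmd the ptes only ever go invalid while
 * the pgste lock is held, so the SIE never sees an invalid pte
 * together with an unlocked pgste and storage key interpretation
 * stays consistent.
 */
static inline void split_pte_invalidate(pte_t *ptep)
{
	pgste_t pgste;

	pgste = pgste_get_lock(ptep);		/* serialize against the SIE */
	pte_val(*ptep) |= _PAGE_INVALID;	/* pte goes invalid only under the lock */
	pgste_set_unlock(ptep, pgste);		/* publish pte and pgste together */
}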
>
>> + page = page_table_alloc_pgste(gmap->mm);
>> + if (!page)
>> + return -ENOMEM;
>> + table = (unsigned long *) page_to_phys(page);
>> + for (i = 0; i < 256; i++) {
>> + table[i] = (pmd_val(*pmdp) & HPAGE_MASK) + i * PAGE_SIZE;
>> + /* pmd_large() implies pmd/pte_present() */
>> + table[i] |= _PAGE_PRESENT | _PAGE_READ | _PAGE_WRITE;
>> + /* ptes are directly marked as dirty */
>> + table[i + PTRS_PER_PTE] |= PGSTE_UC_BIT;
>> + }
>> +
>> + pmd_val(new) = ((unsigned long)table | _SEGMENT_ENTRY |
>> + (_SEGMENT_ENTRY_GMAP_SPLIT));
>> + list_add(&page->lru, &gmap->split_list);
>> + gmap_pmdp_xchg(gmap, pmdp, new, gaddr);
>> + return 0;
>> +}
>> +
>> /*
>> * gmap_protect_pte - remove access rights to memory and set pgste bits
>> * @gmap: pointer to guest mapping meta data structure
>> @@ -941,7 +1021,7 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
>> spinlock_t *ptl = NULL;
>> unsigned long pbits = 0;
>>
>> - ptep = pte_alloc_map_lock(gmap->mm, pmdp, gaddr, &ptl);
>> + ptep = gmap_pte_from_pmd(gmap, pmdp, gaddr, &ptl);
>> if (!ptep)
>> return -ENOMEM;
>>
>> @@ -979,15 +1059,21 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>> rc = -EAGAIN;
>> pmdp = gmap_pmd_op_walk(gmap, gaddr);
>> if (pmdp && !(pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)) {
>> - rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
>> - bits);
>> - if (!rc) {
>> - len -= PAGE_SIZE;
>> - gaddr += PAGE_SIZE;
>> + if (!pmd_large(*pmdp)) {
>> + rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
>> + bits);
>> + if (!rc) {
>> + len -= PAGE_SIZE;
>> + gaddr += PAGE_SIZE;
>> + }
>> + } else {
>> + rc = gmap_pmd_split(gmap, gaddr, pmdp);
>> + if (!rc)
>> + rc = -EFAULT;
>
> Not sure if -EFAULT is the right thing to do here. Actually this would
> be a much better use case for -EAGAIN.
>
> (or simply avoid this by doing an gmap_pmd_op_end() + continue;)
That is a leftover from the dark days of pmd protecting/notifying.
Will fix.
>
>> }
>> gmap_pmd_op_end(gmap, pmdp);
>> }
>> - if (rc) {
>> + if (rc && rc != -EFAULT) {
>> vmaddr = __gmap_translate(gmap, gaddr);
>> if (IS_ERR_VALUE(vmaddr))
>> return vmaddr;
>> @@ -2133,6 +2219,39 @@ static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr,
>> spin_unlock(&sg->guest_table_lock);
>> }
>> /**
>> * ptep_notify - call all invalidation callbacks for a specific pte.
>> * @mm: pointer to the process mm_struct
>> @@ -2177,6 +2296,23 @@ void ptep_notify(struct mm_struct *mm, unsigned long vmaddr,
>> }
>> EXPORT_SYMBOL_GPL(ptep_notify);
>>
>> +static void pmdp_notify_split(struct mm_struct *mm, unsigned long vmaddr,
>> + unsigned long *table)
>> +{
>> + int i = 0;
>> + unsigned long bits;
>> + unsigned long *ptep = (unsigned long *)(*table & PAGE_MASK);
>> + unsigned long *pgste = ptep + PTRS_PER_PTE;
>> +
>> + for (; i < 256; i++, vmaddr += PAGE_SIZE, ptep++, pgste++) {
>> + bits = *pgste & (PGSTE_IN_BIT | PGSTE_VSIE_BIT);
>> + if (bits) {
>> + *pgste ^= bits;
>> + ptep_notify_gmap(mm, vmaddr, (pte_t *)ptep, bits);
>
> All existing callers of ptep_notify_gmap() are called with the pgste
> lock being held. I guess we don't need that here as we always take the
> guest_table_lock when working with split PMDs.
That's my understanding.
>
> Complicated stuff :)
Yes, and this is also rather horrible performance-wise, as we do another
run through the gmap list in ptep_notify_gmap. We need that extra walk
because we don't have the gmap at hand in
ptep_remove_dirty_protection_split and test_and_clear_guest_dirty_split.
The reason is that we need the pgste lock for our dirty changes, which
we can only take in pgtable.c to keep a more or less proper separation
of the mm and gmap functions. I've yet to find a nice and clean solution
for that.
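For clarity, the extra walk has roughly this shape (a sketch of the
pattern only, not the patch code; notify_split_page() is a made-up name,
and the list/radix-tree walk mirrors what the existing ptep_notify()
already does):

/*
 * The dirty helpers in pgtable.c only have the mm, so the gmap(s)
 * have to be re-derived from mm->context.gmap_list, just like
 * ptep_notify() does it today.
 */
static void notify_split_page(struct mm_struct *mm, unsigned long vmaddr)
{
	struct gmap *gmap;
	unsigned long *table;
	unsigned long gaddr = 0;

	rcu_read_lock();
	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
		spin_lock(&gmap->guest_table_lock);
		/* map the host address back to a guest segment */
		table = radix_tree_lookup(&gmap->host_to_guest,
					  vmaddr >> PMD_SHIFT);
		if (table)
			gaddr = __gmap_segment_gaddr(table) +
				(vmaddr & ~PMD_MASK);
		spin_unlock(&gmap->guest_table_lock);
		if (!table)
			continue;
		gmap_call_notifier(gmap, gaddr, gaddr + PAGE_SIZE - 1);
	}
	rcu_read_unlock();
}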
>
>> + }
>> + }
>> +}
>
>