From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Janosch Frank <frankja@linux.vnet.ibm.com>
Cc: David Hildenbrand <david@redhat.com>,
kvm@vger.kernel.org, borntraeger@de.ibm.com,
dominik.dingel@gmail.com, linux-s390@vger.kernel.org
Subject: Re: [RFC/PATCH 03/22] s390/mm: add gmap PMD invalidation notification
Date: Fri, 17 Nov 2017 10:19:28 +0100 [thread overview]
Message-ID: <20171117101928.1ea9d354@mschwideX1> (raw)
In-Reply-To: <eebe97cd-5c3d-f54f-6dfc-e6a105c57dc4@linux.vnet.ibm.com>
On Fri, 17 Nov 2017 10:02:57 +0100
Janosch Frank <frankja@linux.vnet.ibm.com> wrote:
> On 15.11.2017 10:55, David Hildenbrand wrote:
> > On 06.11.2017 23:29, Janosch Frank wrote:
> >> For later migration of huge pages we want to write-protect guest
> >> PMDs. While doing this, we have to make absolutely sure that the
> >> guest's lowcore is always accessible while the VCPU is running. With
> >> PTEs, this is solved by marking the PGSTEs of the lowcore pages with
> >> the invalidation notification bit and kicking the guest out of the SIE
> >> via a notifier function if we need to invalidate such a page.
> >>
> >> With PMDs we do not have PGSTEs or any other bits we could use in the
> >> host PMD. Instead we pick one of the free bits in the gmap PMD. Every
> >> time a host pmd is invalidated, we check whether the respective
> >> gmap PMD has the bit set and, in that case, fire up the notifier.
> >>
> >> In the first step we only support setting the invalidation bit; we
> >> do not yet support restricting access to guest pmds. That will
> >> follow shortly.
> >>
> >> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> >> ---
>
> >> index 74e2062..3961589 100644
> >> --- a/arch/s390/mm/gmap.c
> >> +++ b/arch/s390/mm/gmap.c
> >> @@ -595,10 +595,17 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
> >> if (*table == _SEGMENT_ENTRY_EMPTY) {
> >> rc = radix_tree_insert(&gmap->host_to_guest,
> >> vmaddr >> PMD_SHIFT, table);
> >> - if (!rc)
> >> - *table = pmd_val(*pmd);
> >> - } else
> >> - rc = 0;
> >> + if (!rc) {
> >> + if (pmd_large(*pmd)) {
> >> + *table = pmd_val(*pmd) &
> >> + (_SEGMENT_ENTRY_ORIGIN_LARGE
> >> + | _SEGMENT_ENTRY_INVALID
> >> + | _SEGMENT_ENTRY_LARGE
> >> + | _SEGMENT_ENTRY_PROTECT);
> >
> > Can we reuse/define a mask for that?
>
> We could define GMAP masks that keep only the bits needed for GMAP
> usage. Integrating this into pgtable.h might be hard, and we only have
> one user of this.
>
> >
> > Like _SEGMENT_ENTRY_BITS_LARGE
> >
> >> + } else
> >> + *table = pmd_val(*pmd) & ~0x03UL;
> >
> > Where exactly does the 0x03UL come from? Can we reuse e.g.
> > _SEGMENT_ENTRY_BITS here?
>
> No, _SEGMENT_ENTRY_BITS contains these bits.
>
> This is effectively ~(_SEGMENT_ENTRY_READ | _SEGMENT_ENTRY_WRITE), to get
> rid of the software bits Linux uses for tracking.
>
> >
> > Or is there a way to unify both cases?
>
> That will be difficult because of the different origin masks.
>
> >>
> >> /*
> >> + * gmap_protect_large - set pmd notification bits
> >
> > Why not gmap_protect_pmd just like gmap_protect_pte?
>
> When I started, I thought adding EDAT2 would not be harder than adding
> EDAT1. The "large" suffix could later have been used for both 1M and 2G
> handling, because the bits are in the same places.
>
> I'll change this one and all later occurrences.
>
> >
> >> + * @pmdp: pointer to the pmd to be protected
> >> + * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
> >> + * @bits: notification bits to set
> >> + *
> >> + * Returns 0 if successfully protected, -ENOMEM if out of memory and
> >> + * -EAGAIN if a fixup is needed.
> >> + *
> >> + * Expected to be called with sg->mm->mmap_sem in read and
> >> + * guest_table_lock held.
> >
> > gmap_pmd_op_walk() guarantees the latter, correct?
>
> Yes
>
> >
> >> + */
> >> +static int gmap_protect_large(struct gmap *gmap, unsigned long gaddr,
> >> + pmd_t *pmdp, int prot, unsigned long bits)
> >> +{
> >> + int pmd_i, pmd_p;
> >> +
> >> + pmd_i = pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID;
> >> + pmd_p = pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT;
> >
> > initialize both directly and make them const.
>
> Sure
>
> >> static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
> >> unsigned long len, int prot, unsigned long bits)
> >> @@ -990,11 +1026,20 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
> >> rc = -EAGAIN;
> >> pmdp = gmap_pmd_op_walk(gmap, gaddr);
> >> if (pmdp) {
> >> - rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
> >> - bits);
> >> - if (!rc) {
> >> - len -= PAGE_SIZE;
> >> - gaddr += PAGE_SIZE;
> >> + if (!pmd_large(*pmdp)) {
> >> + rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
> >> + bits);
> >> + if (!rc) {
> >> + len -= PAGE_SIZE;
> >> + gaddr += PAGE_SIZE;
> >> + }
> >> + } else {
> >> + rc = gmap_protect_large(gmap, gaddr, pmdp,
> >> + prot, bits);
> >> + if (!rc) {
> >> + len = len < HPAGE_SIZE ? 0 : len - HPAGE_SIZE;
> >
> > Shouldn't you also have to take the difference to (gaddr & HPAGE_MASK)
> > into account, like you do in the next step? (so always subtracting
> > HPAGE_SIZE looks suspicious)
>
> Yes, it seems we have to do that.
> We just never ran into trouble because len is generally < HPAGE_SIZE
> and tables don't seem to cross segment boundaries.
>
> >> +/**
> >> + * pmdp_notify - call all invalidation callbacks for a specific pmd
> >> + * @mm: pointer to the process mm_struct
> >> + * @vmaddr: virtual address in the process address space
> >> + *
> >> + * This function is expected to be called with mmap_sem held in read.
> >> + */
> >> +void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
> >> +{
> >> + unsigned long *table, gaddr;
> >> + struct gmap *gmap;
> >> +
> >> + rcu_read_lock();
> >> + list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
> >> + spin_lock(&gmap->guest_table_lock);
> >> + table = radix_tree_lookup(&gmap->host_to_guest,
> >> + vmaddr >> PMD_SHIFT);
> >> + if (!table || !(*table & _SEGMENT_ENTRY_GMAP_IN)) {
> >> + spin_unlock(&gmap->guest_table_lock);
> >> + continue;
> >> + }
> >> + gaddr = __gmap_segment_gaddr(table);
> >> + *table &= ~_SEGMENT_ENTRY_GMAP_IN;
> >
> > We can perform this page table change without any flushing/stuff like
> > that because we don't modify bits used by the HW. Is that guaranteed by
> > the architecture or are there any special conditions to take into
> > account? (applies also to were we set the _SEGMENT_ENTRY_GMAP_IN)
>
> ACC is ignored when the validity bit is 0, and bits 62 and 63 (0x3) are
> "available for programming".
> It pretty much sounds like these are OS bits free for any use; I'll
> ask the hardware team when I find time.
Bits 62 and 63 used to be reserved bits but they have been defined for OS
usage with the z13 PoP.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.