public inbox for linux-s390@vger.kernel.org
From: Janosch Frank <frankja@linux.vnet.ibm.com>
To: David Hildenbrand <david@redhat.com>, kvm@vger.kernel.org
Cc: schwidefsky@de.ibm.com, borntraeger@de.ibm.com,
	dominik.dingel@gmail.com, linux-s390@vger.kernel.org
Subject: Re: [RFC/PATCH 03/22] s390/mm: add gmap PMD invalidation notification
Date: Fri, 17 Nov 2017 10:02:57 +0100	[thread overview]
Message-ID: <eebe97cd-5c3d-f54f-6dfc-e6a105c57dc4@linux.vnet.ibm.com> (raw)
In-Reply-To: <3b439732-70a4-8b95-4b9e-87f230dd7aaa@redhat.com>



On 15.11.2017 10:55, David Hildenbrand wrote:
> On 06.11.2017 23:29, Janosch Frank wrote:
>> For later migration of huge pages we want to write-protect guest
>> PMDs. While doing this, we have to make absolutely sure, that the
>> guest's lowcore is always accessible when the VCPU is running. With
>> PTEs, this is solved by marking the PGSTEs of the lowcore pages with
>> the invalidation notification bit and kicking the guest out of the SIE
>> via a notifier function if we need to invalidate such a page.
>>
>> With PMDs we do not have PGSTEs or some other bits we could use in the
>> host PMD. Instead we pick one of the free bits in the gmap PMD. Every
>> time a host pmd will be invalidated, we will check if the respective
>> gmap PMD has the bit set and in that case fire up the notifier.
>>
>> In the first step we only support setting the invalidation bit, but we
>> do not support restricting access of guest pmds. It will follow
>> shortly.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
>> ---

>> index 74e2062..3961589 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -595,10 +595,17 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
>>  	if (*table == _SEGMENT_ENTRY_EMPTY) {
>>  		rc = radix_tree_insert(&gmap->host_to_guest,
>>  				       vmaddr >> PMD_SHIFT, table);
>> -		if (!rc)
>> -			*table = pmd_val(*pmd);
>> -	} else
>> -		rc = 0;
>> +		if (!rc) {
>> +			if (pmd_large(*pmd)) {
>> +				*table = pmd_val(*pmd) &
>> +					(_SEGMENT_ENTRY_ORIGIN_LARGE
>> +					 | _SEGMENT_ENTRY_INVALID
>> +					 | _SEGMENT_ENTRY_LARGE
>> +					 | _SEGMENT_ENTRY_PROTECT);
> 
> Can we reuse/define a mask for that?

We could define GMAP masks that only leave the bits needed for GMAP
usage. Integrating this into pgtable.h might be hard, and we only have
one user of it.

> 
> Like _SEGMENT_ENTRY_BITS_LARGE
> 
>> +			} else
>> +				*table = pmd_val(*pmd) & ~0x03UL;
> 
> Where exactly does the 0x03UL come from? Can we reuse e.g.
> _SEGMENT_ENTRY_BITS here?

No, _SEGMENT_ENTRY_BITS contains these bits.

This is effectively ~(_SEGMENT_ENTRY_READ | _SEGMENT_ENTRY_WRITE), i.e.
it strips the software bits Linux uses for tracking.

> 
> Or is there a way to unify both cases?

That will be difficult because of the different origin masks.

>>  
>>  /*
>> + * gmap_protect_large - set pmd notification bits
> 
> Why not gmap_protect_pmd just like gmap_protect_pte?

When I started, I thought adding EDAT2 would not be harder than adding
EDAT1. The "large" suffix could later have been used for both 1M and 2G
handling, because the bits are in the same places.

I'll change this one and all later occurrences.

> 
>> + * @pmdp: pointer to the pmd to be protected
>> + * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
>> + * @bits: notification bits to set
>> + *
>> + * Returns 0 if successfully protected, -ENOMEM if out of memory and
>> + * -EAGAIN if a fixup is needed.
>> + *
>> + * Expected to be called with sg->mm->mmap_sem in read and
>> + * guest_table_lock held.
> 
> gmap_pmd_op_walk() guarantees the latter, correct?

Yes

> 
>> + */
>> +static int gmap_protect_large(struct gmap *gmap, unsigned long gaddr,
>> +			      pmd_t *pmdp, int prot, unsigned long bits)
>> +{
>> +	int pmd_i, pmd_p;
>> +
>> +	pmd_i = pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID;
>> +	pmd_p = pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT;
> 
> initialize both directly and make them const.

Sure

>>  static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>>  			      unsigned long len, int prot, unsigned long bits)
>> @@ -990,11 +1026,20 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>>  		rc = -EAGAIN;
>>  		pmdp = gmap_pmd_op_walk(gmap, gaddr);
>>  		if (pmdp) {
>> -			rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
>> -					      bits);
>> -			if (!rc) {
>> -				len -= PAGE_SIZE;
>> -				gaddr += PAGE_SIZE;
>> +			if (!pmd_large(*pmdp)) {
>> +				rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
>> +						      bits);
>> +				if (!rc) {
>> +					len -= PAGE_SIZE;
>> +					gaddr += PAGE_SIZE;
>> +				}
>> +			} else {
>> +				rc =  gmap_protect_large(gmap, gaddr, pmdp,
>> +							 prot, bits);
>> +				if (!rc) {
>> +					len = len < HPAGE_SIZE ? 0 : len - HPAGE_SIZE;
> 
> Shouldn't you also have to take the difference to (gaddr & HPAGE_MASK)
> into account, like you do in the next step? (so always subtracting
> HPAGE_SIZE looks suspicious)

Yes, it seems we have to do that.
We just never ran into trouble, because len is generally < HPAGE_SIZE
and page tables don't seem to cross segment boundaries.

>> +/**
>> + * pmdp_notify - call all invalidation callbacks for a specific pmd
>> + * @mm: pointer to the process mm_struct
>> + * @vmaddr: virtual address in the process address space
>> + *
>> + * This function is expected to be called with mmap_sem held in read.
>> + */
>> +void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
>> +{
>> +	unsigned long *table, gaddr;
>> +	struct gmap *gmap;
>> +
>> +	rcu_read_lock();
>> +	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
>> +		spin_lock(&gmap->guest_table_lock);
>> +		table = radix_tree_lookup(&gmap->host_to_guest,
>> +					  vmaddr >> PMD_SHIFT);
>> +		if (!table || !(*table & _SEGMENT_ENTRY_GMAP_IN)) {
>> +			spin_unlock(&gmap->guest_table_lock);
>> +			continue;
>> +		}
>> +		gaddr = __gmap_segment_gaddr(table);
>> +		*table &= ~_SEGMENT_ENTRY_GMAP_IN;
> 
> We can perform this page table change without any flushing/stuff like
> that because we don't modify bits used by the HW. Is that guaranteed by
> the architecture or are there any special conditions to take into
> account? (applies also to were we set the _SEGMENT_ENTRY_GMAP_IN)

ACC is ignored when the validity is 0, and bits 62 and 63 (0x3) are
"available for programming".
It pretty much sounds like these are OS bits free for any use; I'll
ask the hardware team when I find time.



