* [PATCH v2 0/1] KVM: s390: pv: fix asynchronous teardown for small VMs
@ 2023-04-21 8:50 Claudio Imbrenda
2023-04-21 8:50 ` [PATCH v2 1/1] " Claudio Imbrenda
0 siblings, 1 reply; 4+ messages in thread
From: Claudio Imbrenda @ 2023-04-21 8:50 UTC (permalink / raw)
To: kvm; +Cc: borntraeger, nrb, nsg, frankja, mhartmay, kvm390-list, linux-s390
On machines without the Destroy Secure Configuration Fast UVC, the
topmost level of page tables is set aside and freed asynchronously
as last step of the asynchronous teardown.
Each gmap has a host_to_guest radix tree mapping host (userspace)
addresses (with 1M granularity) to gmap segment table entries (pmds).
If a guest is smaller than 2GB, the topmost level of page tables is the
segment table (i.e. there are only 2 levels). Replacing it means that
the pointers in the host_to_guest mapping would become stale and cause
all kinds of nasty issues.
This patch fixes the issue by disallowing asynchronous teardown for
guests with only 2 levels of page tables. Userspace should (and already
does) try using the normal destroy if the asynchronous one fails.
Update s390_replace_asce so it refuses to replace segment type ASCEs.
v1->v2:
After talking with Marc, I decided to throw away most of the patch and
instead simply refuse to prepare for asynchronous teardown if the VM has a
segment type ASCE.
Claudio Imbrenda (1):
KVM: s390: pv: fix asynchronous teardown for small VMs
arch/s390/kvm/pv.c | 5 +++++
arch/s390/mm/gmap.c | 7 +++++++
2 files changed, 12 insertions(+)
--
2.40.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 1/1] KVM: s390: pv: fix asynchronous teardown for small VMs
2023-04-21 8:50 [PATCH v2 0/1] KVM: s390: pv: fix asynchronous teardown for small VMs Claudio Imbrenda
@ 2023-04-21 8:50 ` Claudio Imbrenda
2023-04-21 8:57 ` Janosch Frank
2023-04-21 9:30 ` Marc Hartmayer
0 siblings, 2 replies; 4+ messages in thread
From: Claudio Imbrenda @ 2023-04-21 8:50 UTC (permalink / raw)
To: kvm; +Cc: borntraeger, nrb, nsg, frankja, mhartmay, kvm390-list, linux-s390
On machines without the Destroy Secure Configuration Fast UVC, the
topmost level of page tables is set aside and freed asynchronously
as last step of the asynchronous teardown.
Each gmap has a host_to_guest radix tree mapping host (userspace)
addresses (with 1M granularity) to gmap segment table entries (pmds).
If a guest is smaller than 2GB, the topmost level of page tables is the
segment table (i.e. there are only 2 levels). Replacing it means that
the pointers in the host_to_guest mapping would become stale and cause
all kinds of nasty issues.
This patch fixes the issue by disallowing asynchronous teardown for
guests with only 2 levels of page tables. Userspace should (and already
does) try using the normal destroy if the asynchronous one fails.
Update s390_replace_asce so it refuses to replace segment type ASCEs.
This is still needed in case the normal destroy VM fails.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot")
---
arch/s390/kvm/pv.c | 5 +++++
arch/s390/mm/gmap.c | 7 +++++++
2 files changed, 12 insertions(+)
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index e032ebbf51b9..3ce5f4351156 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -314,6 +314,11 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16 *rrc)
*/
if (kvm->arch.pv.set_aside)
return -EINVAL;
+
+ /* Guest with segment type ASCE, refuse to destroy asynchronously */
+ if ((kvm->arch.gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
+ return -EINVAL;
+
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 5a716bdcba05..2267cf9819b2 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2833,6 +2833,9 @@ EXPORT_SYMBOL_GPL(s390_unlist_old_asce);
* s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
* @gmap: the gmap whose ASCE needs to be replaced
*
+ * If the ASCE is a SEGMENT type then this function will return -EINVAL,
+ * otherwise the pointers in the host_to_guest radix tree will keep pointing
+ * to the wrong pages, causing use-after-free and memory corruption.
* If the allocation of the new top level page table fails, the ASCE is not
* replaced.
* In any case, the old ASCE is always removed from the gmap CRST list.
@@ -2847,6 +2850,10 @@ int s390_replace_asce(struct gmap *gmap)
s390_unlist_old_asce(gmap);
+ /* Replacing segment type ASCEs would cause serious issues */
+ if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
+ return -EINVAL;
+
page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
if (!page)
return -ENOMEM;
--
2.40.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/1] KVM: s390: pv: fix asynchronous teardown for small VMs
2023-04-21 8:50 ` [PATCH v2 1/1] " Claudio Imbrenda
@ 2023-04-21 8:57 ` Janosch Frank
2023-04-21 9:30 ` Marc Hartmayer
1 sibling, 0 replies; 4+ messages in thread
From: Janosch Frank @ 2023-04-21 8:57 UTC (permalink / raw)
To: Claudio Imbrenda, kvm
Cc: borntraeger, nrb, nsg, mhartmay, kvm390-list, linux-s390
On 4/21/23 10:50, Claudio Imbrenda wrote:
> On machines without the Destroy Secure Configuration Fast UVC, the
> topmost level of page tables is set aside and freed asynchronously
> as last step of the asynchronous teardown.
>
> Each gmap has a host_to_guest radix tree mapping host (userspace)
> addresses (with 1M granularity) to gmap segment table entries (pmds).
>
> If a guest is smaller than 2GB, the topmost level of page tables is the
> segment table (i.e. there are only 2 levels). Replacing it means that
> the pointers in the host_to_guest mapping would become stale and cause
> all kinds of nasty issues.
>
> This patch fixes the issue by disallowing asynchronous teardown for
> guests with only 2 levels of page tables. Userspace should (and already
> does) try using the normal destroy if the asynchronous one fails.
>
> Update s390_replace_asce so it refuses to replace segment type ASCEs.
> This is still needed in case the normal destroy VM fails.
>
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot")
Since QEMU will do a normal PV disable on a rc != 0 this should work out
just fine. The less code to fix this, the better.
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
> ---
> arch/s390/kvm/pv.c | 5 +++++
> arch/s390/mm/gmap.c | 7 +++++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index e032ebbf51b9..3ce5f4351156 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -314,6 +314,11 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16 *rrc)
> */
> if (kvm->arch.pv.set_aside)
> return -EINVAL;
> +
> + /* Guest with segment type ASCE, refuse to destroy asynchronously */
> + if ((kvm->arch.gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
> + return -EINVAL;
> +
> priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> if (!priv)
> return -ENOMEM;
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 5a716bdcba05..2267cf9819b2 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2833,6 +2833,9 @@ EXPORT_SYMBOL_GPL(s390_unlist_old_asce);
> * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
> * @gmap: the gmap whose ASCE needs to be replaced
> *
> + * If the ASCE is a SEGMENT type then this function will return -EINVAL,
> + * otherwise the pointers in the host_to_guest radix tree will keep pointing
> + * to the wrong pages, causing use-after-free and memory corruption.
> * If the allocation of the new top level page table fails, the ASCE is not
> * replaced.
> * In any case, the old ASCE is always removed from the gmap CRST list.
> @@ -2847,6 +2850,10 @@ int s390_replace_asce(struct gmap *gmap)
>
> s390_unlist_old_asce(gmap);
>
> + /* Replacing segment type ASCEs would cause serious issues */
> + if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
> + return -EINVAL;
> +
> page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> if (!page)
> return -ENOMEM;
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/1] KVM: s390: pv: fix asynchronous teardown for small VMs
2023-04-21 8:50 ` [PATCH v2 1/1] " Claudio Imbrenda
2023-04-21 8:57 ` Janosch Frank
@ 2023-04-21 9:30 ` Marc Hartmayer
1 sibling, 0 replies; 4+ messages in thread
From: Marc Hartmayer @ 2023-04-21 9:30 UTC (permalink / raw)
To: Claudio Imbrenda, kvm
Cc: borntraeger, nrb, nsg, frankja, kvm390-list, linux-s390
Claudio Imbrenda <imbrenda@linux.ibm.com> writes:
> On machines without the Destroy Secure Configuration Fast UVC, the
> topmost level of page tables is set aside and freed asynchronously
> as last step of the asynchronous teardown.
>
> Each gmap has a host_to_guest radix tree mapping host (userspace)
> addresses (with 1M granularity) to gmap segment table entries (pmds).
>
> If a guest is smaller than 2GB, the topmost level of page tables is the
> segment table (i.e. there are only 2 levels). Replacing it means that
> the pointers in the host_to_guest mapping would become stale and cause
> all kinds of nasty issues.
>
> This patch fixes the issue by disallowing asynchronous teardown for
> guests with only 2 levels of page tables. Userspace should (and already
> does) try using the normal destroy if the asynchronous one fails.
>
> Update s390_replace_asce so it refuses to replace segment type ASCEs.
> This is still needed in case the normal destroy VM fails.
>
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot")
> ---
> arch/s390/kvm/pv.c | 5 +++++
> arch/s390/mm/gmap.c | 7 +++++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index e032ebbf51b9..3ce5f4351156 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -314,6 +314,11 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16 *rrc)
> */
> if (kvm->arch.pv.set_aside)
> return -EINVAL;
> +
> + /* Guest with segment type ASCE, refuse to destroy asynchronously */
> + if ((kvm->arch.gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
> + return -EINVAL;
> +
> priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> if (!priv)
> return -ENOMEM;
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 5a716bdcba05..2267cf9819b2 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2833,6 +2833,9 @@ EXPORT_SYMBOL_GPL(s390_unlist_old_asce);
> * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
> * @gmap: the gmap whose ASCE needs to be replaced
> *
> + * If the ASCE is a SEGMENT type then this function will return -EINVAL,
> + * otherwise the pointers in the host_to_guest radix tree will keep pointing
> + * to the wrong pages, causing use-after-free and memory corruption.
> * If the allocation of the new top level page table fails, the ASCE is not
> * replaced.
> * In any case, the old ASCE is always removed from the gmap CRST list.
> @@ -2847,6 +2850,10 @@ int s390_replace_asce(struct gmap *gmap)
>
> s390_unlist_old_asce(gmap);
>
> + /* Replacing segment type ASCEs would cause serious issues */
> + if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
> + return -EINVAL;
As discussed... not sure if this is a valid scenario or if it can be
considered a bug if it happens.
> +
> page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> if (!page)
> return -ENOMEM;
> --
> 2.40.0
IMO, much better.
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-04-21 9:30 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-21 8:50 [PATCH v2 0/1] KVM: s390: pv: fix asynchronous teardown for small VMs Claudio Imbrenda
2023-04-21 8:50 ` [PATCH v2 1/1] " Claudio Imbrenda
2023-04-21 8:57 ` Janosch Frank
2023-04-21 9:30 ` Marc Hartmayer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox