* Re: [PATCH v2 1/1] KVM: s390: pv: fix asynchronous teardown for small VMs
2023-04-21 8:50 ` [PATCH v2 1/1] " Claudio Imbrenda
@ 2023-04-21 8:57 ` Janosch Frank
2023-04-21 9:30 ` Marc Hartmayer
1 sibling, 0 replies; 4+ messages in thread
From: Janosch Frank @ 2023-04-21 8:57 UTC (permalink / raw)
To: Claudio Imbrenda, kvm
Cc: borntraeger, nrb, nsg, mhartmay, kvm390-list, linux-s390
On 4/21/23 10:50, Claudio Imbrenda wrote:
> On machines without the Destroy Secure Configuration Fast UVC, the
> topmost level of page tables is set aside and freed asynchronously
> as last step of the asynchronous teardown.
>
> Each gmap has a host_to_guest radix tree mapping host (userspace)
> addresses (with 1M granularity) to gmap segment table entries (pmds).
>
> If a guest is smaller than 2GB, the topmost level of page tables is the
> segment table (i.e. there are only 2 levels). Replacing it means that
> the pointers in the host_to_guest mapping would become stale and cause
> all kinds of nasty issues.
>
> This patch fixes the issue by disallowing asynchronous teardown for
> guests with only 2 levels of page tables. Userspace should (and already
> does) try using the normal destroy if the asynchronous one fails.
>
> Update s390_replace_asce so it refuses to replace segment type ASCEs.
> This is still needed in case the normal destroy VM fails.
>
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot")
Since QEMU will do a normal PV disable on a rc != 0 this should work out
just fine. The less code to fix this, the better.
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
> ---
> arch/s390/kvm/pv.c | 5 +++++
> arch/s390/mm/gmap.c | 7 +++++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index e032ebbf51b9..3ce5f4351156 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -314,6 +314,11 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16 *rrc)
> */
> if (kvm->arch.pv.set_aside)
> return -EINVAL;
> +
> + /* Guest with segment type ASCE, refuse to destroy asynchronously */
> + if ((kvm->arch.gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
> + return -EINVAL;
> +
> priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> if (!priv)
> return -ENOMEM;
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 5a716bdcba05..2267cf9819b2 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2833,6 +2833,9 @@ EXPORT_SYMBOL_GPL(s390_unlist_old_asce);
> * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
> * @gmap: the gmap whose ASCE needs to be replaced
> *
> + * If the ASCE is a SEGMENT type then this function will return -EINVAL,
> + * otherwise the pointers in the host_to_guest radix tree will keep pointing
> + * to the wrong pages, causing use-after-free and memory corruption.
> * If the allocation of the new top level page table fails, the ASCE is not
> * replaced.
> * In any case, the old ASCE is always removed from the gmap CRST list.
> @@ -2847,6 +2850,10 @@ int s390_replace_asce(struct gmap *gmap)
>
> s390_unlist_old_asce(gmap);
>
> + /* Replacing segment type ASCEs would cause serious issues */
> + if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
> + return -EINVAL;
> +
> page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> if (!page)
> return -ENOMEM;
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH v2 1/1] KVM: s390: pv: fix asynchronous teardown for small VMs
2023-04-21 8:50 ` [PATCH v2 1/1] " Claudio Imbrenda
2023-04-21 8:57 ` Janosch Frank
@ 2023-04-21 9:30 ` Marc Hartmayer
1 sibling, 0 replies; 4+ messages in thread
From: Marc Hartmayer @ 2023-04-21 9:30 UTC (permalink / raw)
To: Claudio Imbrenda, kvm
Cc: borntraeger, nrb, nsg, frankja, kvm390-list, linux-s390
Claudio Imbrenda <imbrenda@linux.ibm.com> writes:
> On machines without the Destroy Secure Configuration Fast UVC, the
> topmost level of page tables is set aside and freed asynchronously
> as last step of the asynchronous teardown.
>
> Each gmap has a host_to_guest radix tree mapping host (userspace)
> addresses (with 1M granularity) to gmap segment table entries (pmds).
>
> If a guest is smaller than 2GB, the topmost level of page tables is the
> segment table (i.e. there are only 2 levels). Replacing it means that
> the pointers in the host_to_guest mapping would become stale and cause
> all kinds of nasty issues.
>
> This patch fixes the issue by disallowing asynchronous teardown for
> guests with only 2 levels of page tables. Userspace should (and already
> does) try using the normal destroy if the asynchronous one fails.
>
> Update s390_replace_asce so it refuses to replace segment type ASCEs.
> This is still needed in case the normal destroy VM fails.
>
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot")
> ---
> arch/s390/kvm/pv.c | 5 +++++
> arch/s390/mm/gmap.c | 7 +++++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index e032ebbf51b9..3ce5f4351156 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -314,6 +314,11 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16 *rrc)
> */
> if (kvm->arch.pv.set_aside)
> return -EINVAL;
> +
> + /* Guest with segment type ASCE, refuse to destroy asynchronously */
> + if ((kvm->arch.gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
> + return -EINVAL;
> +
> priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> if (!priv)
> return -ENOMEM;
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 5a716bdcba05..2267cf9819b2 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2833,6 +2833,9 @@ EXPORT_SYMBOL_GPL(s390_unlist_old_asce);
> * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
> * @gmap: the gmap whose ASCE needs to be replaced
> *
> + * If the ASCE is a SEGMENT type then this function will return -EINVAL,
> + * otherwise the pointers in the host_to_guest radix tree will keep pointing
> + * to the wrong pages, causing use-after-free and memory corruption.
> * If the allocation of the new top level page table fails, the ASCE is not
> * replaced.
> * In any case, the old ASCE is always removed from the gmap CRST list.
> @@ -2847,6 +2850,10 @@ int s390_replace_asce(struct gmap *gmap)
>
> s390_unlist_old_asce(gmap);
>
> + /* Replacing segment type ASCEs would cause serious issues */
> + if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
> + return -EINVAL;
As discussed... not sure if this is a valid scenario or if it can be
considered a bug if it happens.
> +
> page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> if (!page)
> return -ENOMEM;
> --
> 2.40.0
IMO, much better.
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
^ permalink raw reply [flat|nested] 4+ messages in thread