* [PATCH v2 2/2] KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
2019-11-13 16:46 [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one Greg Kurz
@ 2019-11-13 16:46 ` Greg Kurz
2019-11-13 17:00 ` Cédric Le Goater
2019-11-20 16:19 ` [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one Lijun Pan
2019-11-20 16:38 ` Lijun Pan
2 siblings, 1 reply; 5+ messages in thread
From: Greg Kurz @ 2019-11-13 16:46 UTC (permalink / raw)
To: Paul Mackerras
Cc: Laurent Vivier, linux-kernel, kvm-ppc, Satheesh Rajendran,
Cédric Le Goater, Lijun Pan, stable, linuxppc-dev,
David Gibson
We need to check the host page size is big enough to accomodate the
EQ. Let's do this before taking a reference on the EQ page to avoid
a potential leak if the check fails.
Cc: stable@vger.kernel.org # v5.2
Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
Signed-off-by: Greg Kurz <groug@kaod.org>
---
arch/powerpc/kvm/book3s_xive_native.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 0e1fc5a16729..d83adb1e1490 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -630,12 +630,6 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
srcu_idx = srcu_read_lock(&kvm->srcu);
gfn = gpa_to_gfn(kvm_eq.qaddr);
- page = gfn_to_page(kvm, gfn);
- if (is_error_page(page)) {
- srcu_read_unlock(&kvm->srcu, srcu_idx);
- pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
- return -EINVAL;
- }
page_size = kvm_host_page_size(kvm, gfn);
if (1ull << kvm_eq.qshift > page_size) {
@@ -644,6 +638,13 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
return -EINVAL;
}
+ page = gfn_to_page(kvm, gfn);
+ if (is_error_page(page)) {
+ srcu_read_unlock(&kvm->srcu, srcu_idx);
+ pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
+ return -EINVAL;
+ }
+
qaddr = page_to_virt(page) + (kvm_eq.qaddr & ~PAGE_MASK);
srcu_read_unlock(&kvm->srcu, srcu_idx);
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH v2 2/2] KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
2019-11-13 16:46 ` [PATCH v2 2/2] KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path Greg Kurz
@ 2019-11-13 17:00 ` Cédric Le Goater
0 siblings, 0 replies; 5+ messages in thread
From: Cédric Le Goater @ 2019-11-13 17:00 UTC (permalink / raw)
To: Greg Kurz, Paul Mackerras
Cc: Laurent Vivier, linux-kernel, kvm-ppc, Satheesh Rajendran, stable,
Lijun Pan, linuxppc-dev, David Gibson
On 13/11/2019 17:46, Greg Kurz wrote:
> We need to check the host page size is big enough to accomodate the
> EQ. Let's do this before taking a reference on the EQ page to avoid
> a potential leak if the check fails.
>
> Cc: stable@vger.kernel.org # v5.2
> Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
> Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
> ---
> arch/powerpc/kvm/book3s_xive_native.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index 0e1fc5a16729..d83adb1e1490 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -630,12 +630,6 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
>
> srcu_idx = srcu_read_lock(&kvm->srcu);
> gfn = gpa_to_gfn(kvm_eq.qaddr);
> - page = gfn_to_page(kvm, gfn);
> - if (is_error_page(page)) {
> - srcu_read_unlock(&kvm->srcu, srcu_idx);
> - pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
> - return -EINVAL;
> - }
>
> page_size = kvm_host_page_size(kvm, gfn);
> if (1ull << kvm_eq.qshift > page_size) {
> @@ -644,6 +638,13 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
> return -EINVAL;
> }
>
> + page = gfn_to_page(kvm, gfn);
> + if (is_error_page(page)) {
> + srcu_read_unlock(&kvm->srcu, srcu_idx);
> + pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
> + return -EINVAL;
> + }
> +
> qaddr = page_to_virt(page) + (kvm_eq.qaddr & ~PAGE_MASK);
> srcu_read_unlock(&kvm->srcu, srcu_idx);
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
2019-11-13 16:46 [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one Greg Kurz
2019-11-13 16:46 ` [PATCH v2 2/2] KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path Greg Kurz
@ 2019-11-20 16:19 ` Lijun Pan
2019-11-20 16:38 ` Lijun Pan
2 siblings, 0 replies; 5+ messages in thread
From: Lijun Pan @ 2019-11-20 16:19 UTC (permalink / raw)
To: Greg Kurz
Cc: Laurent Vivier, linux-kernel, kvm-ppc, Satheesh Rajendran,
Cédric Le Goater, Lijun Pan, stable, linuxppc-dev,
David Gibson
[-- Attachment #1: Type: text/plain, Size: 4150 bytes --]
> On Nov 13, 2019, at 10:46 AM, Greg Kurz <groug@kaod.org> wrote:
>
> The EQ page is allocated by the guest and then passed to the hypervisor
> with the H_INT_SET_QUEUE_CONFIG hcall. A reference is taken on the page
> before handing it over to the HW. This reference is dropped either when
> the guest issues the H_INT_RESET hcall or when the KVM device is released.
> But, the guest can legitimately call H_INT_SET_QUEUE_CONFIG several times,
> either to reset the EQ (vCPU hot unplug) or to set a new EQ (guest reboot).
> In both cases the existing EQ page reference is leaked because we simply
> overwrite it in the XIVE queue structure without calling put_page().
>
> This is especially visible when the guest memory is backed with huge pages:
> start a VM up to the guest userspace, either reboot it or unplug a vCPU,
> quit QEMU. The leak is observed by comparing the value of HugePages_Free in
> /proc/meminfo before and after the VM is run.
>
> Ideally we'd want the XIVE code to handle the EQ page de-allocation at the
> platform level. This isn't the case right now because the various XIVE
> drivers have different allocation needs. It could maybe worth introducing
> hooks for this purpose instead of exposing XIVE internals to the drivers,
> but this is certainly a huge work to be done later.
>
> In the meantime, for easier backport, fix both vCPU unplug and guest reboot
> leaks by introducing a wrapper around xive_native_configure_queue() that
> does the necessary cleanup.
>
> Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> Cc: stable@vger.kernel.org # v5.2
> Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> Signed-off-by: Greg Kurz <groug@kaod.org>
Tested-by: Lijun Pan <ljp@linux.ibm <mailto:ljp@linux.ibm>.com>
> ---
> v2: use wrapper as suggested by Cedric
> ---
> arch/powerpc/kvm/book3s_xive_native.c | 31 ++++++++++++++++++++++---------
> 1 file changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index 34bd123fa024..0e1fc5a16729 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -50,6 +50,24 @@ static void kvmppc_xive_native_cleanup_queue(struct kvm_vcpu *vcpu, int prio)
> }
> }
>
> +static int kvmppc_xive_native_configure_queue(u32 vp_id, struct xive_q *q,
> + u8 prio, __be32 *qpage,
> + u32 order, bool can_escalate)
> +{
> + int rc;
> + __be32 *qpage_prev = q->qpage;
> +
> + rc = xive_native_configure_queue(vp_id, q, prio, qpage, order,
> + can_escalate);
> + if (rc)
> + return rc;
> +
> + if (qpage_prev)
> + put_page(virt_to_page(qpage_prev));
> +
> + return rc;
> +}
> +
> void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu)
> {
> struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
> @@ -575,19 +593,14 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
> q->guest_qaddr = 0;
> q->guest_qshift = 0;
>
> - rc = xive_native_configure_queue(xc->vp_id, q, priority,
> - NULL, 0, true);
> + rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
> + NULL, 0, true);
> if (rc) {
> pr_err("Failed to reset queue %d for VCPU %d: %d\n",
> priority, xc->server_num, rc);
> return rc;
> }
>
> - if (q->qpage) {
> - put_page(virt_to_page(q->qpage));
> - q->qpage = NULL;
> - }
> -
> return 0;
> }
>
> @@ -646,8 +659,8 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
> * OPAL level because the use of END ESBs is not supported by
> * Linux.
> */
> - rc = xive_native_configure_queue(xc->vp_id, q, priority,
> - (__be32 *) qaddr, kvm_eq.qshift, true);
> + rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
> + (__be32 *) qaddr, kvm_eq.qshift, true);
> if (rc) {
> pr_err("Failed to configure queue %d for VCPU %d: %d\n",
> priority, xc->server_num, rc);
>
[-- Attachment #2: Type: text/html, Size: 11679 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
2019-11-13 16:46 [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one Greg Kurz
2019-11-13 16:46 ` [PATCH v2 2/2] KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path Greg Kurz
2019-11-20 16:19 ` [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one Lijun Pan
@ 2019-11-20 16:38 ` Lijun Pan
2 siblings, 0 replies; 5+ messages in thread
From: Lijun Pan @ 2019-11-20 16:38 UTC (permalink / raw)
To: Greg Kurz
Cc: Laurent Vivier, linux-kernel, kvm-ppc, Satheesh Rajendran,
Cédric Le Goater, Lijun Pan, stable, linuxppc-dev,
David Gibson
> On Nov 13, 2019, at 10:46 AM, Greg Kurz <groug@kaod.org> wrote:
>
> The EQ page is allocated by the guest and then passed to the hypervisor
> with the H_INT_SET_QUEUE_CONFIG hcall. A reference is taken on the page
> before handing it over to the HW. This reference is dropped either when
> the guest issues the H_INT_RESET hcall or when the KVM device is released.
> But, the guest can legitimately call H_INT_SET_QUEUE_CONFIG several times,
> either to reset the EQ (vCPU hot unplug) or to set a new EQ (guest reboot).
> In both cases the existing EQ page reference is leaked because we simply
> overwrite it in the XIVE queue structure without calling put_page().
>
> This is especially visible when the guest memory is backed with huge pages:
> start a VM up to the guest userspace, either reboot it or unplug a vCPU,
> quit QEMU. The leak is observed by comparing the value of HugePages_Free in
> /proc/meminfo before and after the VM is run.
>
> Ideally we'd want the XIVE code to handle the EQ page de-allocation at the
> platform level. This isn't the case right now because the various XIVE
> drivers have different allocation needs. It could maybe worth introducing
> hooks for this purpose instead of exposing XIVE internals to the drivers,
> but this is certainly a huge work to be done later.
>
> In the meantime, for easier backport, fix both vCPU unplug and guest reboot
> leaks by introducing a wrapper around xive_native_configure_queue() that
> does the necessary cleanup.
>
> Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> Cc: stable@vger.kernel.org # v5.2
> Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> Signed-off-by: Greg Kurz <groug@kaod.org>
Tested-by: Lijun Pan <ljp@linux.ibm.com>
> ---
> v2: use wrapper as suggested by Cedric
> ---
> arch/powerpc/kvm/book3s_xive_native.c | 31 ++++++++++++++++++++++---------
> 1 file changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index 34bd123fa024..0e1fc5a16729 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -50,6 +50,24 @@ static void kvmppc_xive_native_cleanup_queue(struct kvm_vcpu *vcpu, int prio)
> }
> }
>
> +static int kvmppc_xive_native_configure_queue(u32 vp_id, struct xive_q *q,
> + u8 prio, __be32 *qpage,
> + u32 order, bool can_escalate)
> +{
> + int rc;
> + __be32 *qpage_prev = q->qpage;
> +
> + rc = xive_native_configure_queue(vp_id, q, prio, qpage, order,
> + can_escalate);
> + if (rc)
> + return rc;
> +
> + if (qpage_prev)
> + put_page(virt_to_page(qpage_prev));
> +
> + return rc;
> +}
> +
> void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu)
> {
> struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
> @@ -575,19 +593,14 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
> q->guest_qaddr = 0;
> q->guest_qshift = 0;
>
> - rc = xive_native_configure_queue(xc->vp_id, q, priority,
> - NULL, 0, true);
> + rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
> + NULL, 0, true);
> if (rc) {
> pr_err("Failed to reset queue %d for VCPU %d: %d\n",
> priority, xc->server_num, rc);
> return rc;
> }
>
> - if (q->qpage) {
> - put_page(virt_to_page(q->qpage));
> - q->qpage = NULL;
> - }
> -
> return 0;
> }
>
> @@ -646,8 +659,8 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
> * OPAL level because the use of END ESBs is not supported by
> * Linux.
> */
> - rc = xive_native_configure_queue(xc->vp_id, q, priority,
> - (__be32 *) qaddr, kvm_eq.qshift, true);
> + rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
> + (__be32 *) qaddr, kvm_eq.qshift, true);
> if (rc) {
> pr_err("Failed to configure queue %d for VCPU %d: %d\n",
> priority, xc->server_num, rc);
>
^ permalink raw reply [flat|nested] 5+ messages in thread