From: David Gibson <david@gibson.dropbear.id.au>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: linuxppc-dev@lists.ozlabs.org, Paul Mackerras <paulus@samba.org>,
kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH kernel v2 3/6] KVM: PPC: Account TCE-containing pages in locked_vm
Date: Mon, 25 Jan 2016 10:57:14 +1100
Message-ID: <20160124235714.GC27454@voom.redhat.com>
In-Reply-To: <1453361977-19589-4-git-send-email-aik@ozlabs.ru>
On Thu, Jan 21, 2016 at 06:39:34PM +1100, Alexey Kardashevskiy wrote:
> At the moment pages used for TCE tables (in addition to pages addressed
> by TCEs) are not counted in locked_vm counter so a malicious userspace
> tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as RLIMIT_NOFILE and
> lock a lot of memory.
>
> This adds counting for pages used for TCE tables.
>
> This counts the number of pages required for a table plus pages for
> the kvmppc_spapr_tce_table struct (TCE table descriptor) itself.
>
> This changes release_spapr_tce_table() to store @npages on stack to
> avoid calling kvmppc_stt_npages() in the loop (tiny optimization,
> probably).
>
> This does not change the amount of (de)allocated memory.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * switched from long to unsigned long types
> * added WARN_ON_ONCE() in locked_vm decrement case
> ---
> arch/powerpc/kvm/book3s_64_vio.c | 55 +++++++++++++++++++++++++++++++++++++---
> 1 file changed, 52 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 9526c34..ea498b4 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -39,19 +39,62 @@
>
> #define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64))
>
> -static long kvmppc_stt_npages(unsigned long window_size)
> +static unsigned long kvmppc_stt_npages(unsigned long window_size)
> {
> return ALIGN((window_size >> SPAPR_TCE_SHIFT)
> * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
> }
>
> +static long kvmppc_account_memlimit(unsigned long npages, bool inc)
> +{
> + long ret = 0;
> + const unsigned long bytes = sizeof(struct kvmppc_spapr_tce_table) +
> + (npages * sizeof(struct page *));
> + const unsigned long stt_pages = ALIGN(bytes, PAGE_SIZE) / PAGE_SIZE;

Urgh, this is pretty hard to follow, because in some places
npages / stt_pages refers to the number of pages occupied by the
actual TCE tables, and in other places to the number of pages
occupied by the overhead data structures. Please use different (and
consistent) variables for the two things to make this clearer.

It also seems odd that the overhead pages are calculated here, while
the base number of pages is calculated in the caller, even though
both quantities come from the stt structure itself.

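To illustrate the kind of split meant here, below is a rough userspace
sketch, not kernel code and not a proposed patch. The helper names
(pages_for, stt_table_pages, stt_overhead_pages, stt_locked_pages), the
64K page size and the 64-byte descriptor size are all assumptions made
up for the example; the real code would use PAGE_SIZE, SPAPR_TCE_SHIFT
and sizeof(struct kvmppc_spapr_tce_table).

```c
#include <stddef.h>

/* Assumed values, for illustration only. */
#define SKETCH_PAGE_SIZE	65536UL	/* stands in for PAGE_SIZE */
#define SKETCH_TCE_SHIFT	12	/* 4K IOMMU pages, one TCE each */
#define SKETCH_STT_HDR		64UL	/* hypothetical descriptor size */

/* Round a byte count up to whole pages. */
static unsigned long pages_for(unsigned long bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Pages occupied by the TCE entries themselves (8 bytes per TCE). */
static unsigned long stt_table_pages(unsigned long window_size)
{
	return pages_for((window_size >> SKETCH_TCE_SHIFT) *
			 sizeof(unsigned long long));
}

/* Pages occupied by the descriptor plus its array of page pointers. */
static unsigned long stt_overhead_pages(unsigned long table_pages)
{
	return pages_for(SKETCH_STT_HDR + table_pages * sizeof(void *));
}

/* Total to account in locked_vm, computed in one place. */
static unsigned long stt_locked_pages(unsigned long window_size)
{
	unsigned long table_pages = stt_table_pages(window_size);

	return table_pages + stt_overhead_pages(table_pages);
}
```

With something along these lines, both the create and release paths
could ask for stt_locked_pages(window_size) and pass that single total
to the accounting function, so the two quantities never share a
variable name.
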
> + if (!current || !current->mm)
> + return ret; /* process exited */
> +
> + npages += stt_pages;
> +
> + down_write(&current->mm->mmap_sem);
> +
> + if (inc) {
> + unsigned long locked, lock_limit;
> +
> + locked = current->mm->locked_vm + npages;
> + lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> + if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> + ret = -ENOMEM;
> + else
> + current->mm->locked_vm += npages;
> + } else {
> + if (WARN_ON_ONCE(npages > current->mm->locked_vm))
> + npages = current->mm->locked_vm;
> +
> + current->mm->locked_vm -= npages;
> + }
> +
> + pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
> + inc ? '+' : '-',
> + npages << PAGE_SHIFT,
> + current->mm->locked_vm << PAGE_SHIFT,
> + rlimit(RLIMIT_MEMLOCK),
> + ret ? " - exceeded" : "");
> +
> + up_write(&current->mm->mmap_sem);
> +
> + return ret;
> +}
> +
> static void release_spapr_tce_table(struct rcu_head *head)
> {
> struct kvmppc_spapr_tce_table *stt = container_of(head,
> struct kvmppc_spapr_tce_table, rcu);
> int i;
> + unsigned long npages = kvmppc_stt_npages(stt->window_size);
>
> - for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
> + for (i = 0; i < npages; i++)
> __free_page(stt->pages[i]);
>
> kfree(stt);
> @@ -89,6 +132,7 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
>
> kvm_put_kvm(stt->kvm);
>
> + kvmppc_account_memlimit(kvmppc_stt_npages(stt->window_size), false);
> call_rcu(&stt->rcu, release_spapr_tce_table);
>
> return 0;
> @@ -103,7 +147,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> struct kvm_create_spapr_tce *args)
> {
> struct kvmppc_spapr_tce_table *stt = NULL;
> - long npages;
> + unsigned long npages;
> int ret = -ENOMEM;
> int i;
>
> @@ -114,6 +158,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> }
>
> npages = kvmppc_stt_npages(args->window_size);
> + ret = kvmppc_account_memlimit(npages, true);
> + if (ret) {
> + stt = NULL;
> + goto fail;
> + }
>
> stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
> GFP_KERNEL);
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson