From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Alexander Graf <agraf@suse.de>
Cc: linuxppc-dev@lists.ozlabs.org, Paul Mackerras <paulus@samba.org>,
Gleb Natapov <gleb@kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
kvm-ppc@vger.kernel.org
Subject: Re: [PATCH 3/3] PPC: KVM: Add support for 64bit TCE windows
Date: Thu, 05 Jun 2014 13:04:01 +0000
Message-ID: <53906AC1.6000404@ozlabs.ru>
In-Reply-To: <1401971411.3247.132.camel@pasglop>
On 06/05/2014 10:30 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-06-05 at 13:56 +0200, Alexander Graf wrote:
>> What if we ask user space to give us a pointer to user space allocated
>> memory along with the TCE registration? We would still ask user space to
>> only use the returned fd for TCE modifications, but would have some
>> nicely swappable memory we can store the TCE entries in.
>
> That isn't going to work terribly well for VFIO :-) But yes, for
> emulated devices, we could improve things a bit, including for
> the 32-bit TCE tables.
>
> For emulated, the real mode path could walk the page tables and fallback
> to virtual mode & get_user if the page isn't present, thus operating
> directly on qemu memory TCE tables instead of the current pinned stuff.
>
> However that has a cost in performance, but since that's really only
> used for emulated devices and PAPR VIOs, it might not be a huge issue.
>
> But for VFIO we don't have much choice, we need to create something the
> HW can access.
You are confusing things here.
There are 2 tables:
1. guest-visible TCE table, this is what is allocated for VIO or emulated PCI;
2. real HW DMA window, one exists already for DMA32 and one I will
allocate for a huge window.
I have just #2 for VFIO now but we will need both in order to implement
H_GET_TCE correctly, and this is the table I will allocate by this new ioctl.
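
To illustrate the split between the two tables, here is a minimal userspace sketch, not the actual patch. All names (struct dma_window, sketch_put_tce, sketch_get_tce, fake_gpa_to_hpa) and the sizes are made up for illustration, and the fixed-offset translation only stands in for the real guest-physical to host-physical lookup. The point is that H_PUT_TCE has to update both views, while H_GET_TCE can only be answered from the guest-visible one, since the HW table holds host addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCE_PAGE_SHIFT 12u        /* assumed 4K IOMMU pages */
#define TCE_ENTRIES    16u        /* tiny window, illustration only */
#define TCE_PERM_MASK  0x3ull     /* low two bits: read/write permission */
#define H_SUCCESS      0
#define H_PARAMETER    (-4)

/* Hypothetical pairing of the two tables for one DMA window. */
struct dma_window {
    uint64_t guest_view[TCE_ENTRIES]; /* #1: what the guest wrote (guest physical) */
    uint64_t hw_table[TCE_ENTRIES];   /* #2: what the HW walks (host physical) */
};

/* Stand-in translation: a fixed offset replaces the real GPA->HPA lookup. */
uint64_t fake_gpa_to_hpa(uint64_t gpa)
{
    return gpa + 0x100000000ull;
}

/* H_PUT_TCE-like update: remember the guest value and program the HW entry. */
long sketch_put_tce(struct dma_window *w, uint64_t ioba, uint64_t tce)
{
    uint64_t idx = ioba >> TCE_PAGE_SHIFT;
    uint64_t perm = tce & TCE_PERM_MASK;
    uint64_t gpa = tce & ~((1ull << TCE_PAGE_SHIFT) - 1);

    assert(w != NULL);
    if (idx >= TCE_ENTRIES)
        return H_PARAMETER;

    w->guest_view[idx] = tce;
    /* An entry without permission bits is treated as "clear the mapping". */
    w->hw_table[idx] = perm ? (fake_gpa_to_hpa(gpa) | perm) : 0;
    return H_SUCCESS;
}

/* H_GET_TCE-like read: answered from the guest-visible table only. */
long sketch_get_tce(const struct dma_window *w, uint64_t ioba, uint64_t *tce)
{
    uint64_t idx = ioba >> TCE_PAGE_SHIFT;

    assert(w != NULL && tce != NULL);
    if (idx >= TCE_ENTRIES)
        return H_PARAMETER;

    *tce = w->guest_view[idx];
    return H_SUCCESS;
}
```

With only the HW table there would be nothing to return from sketch_get_tce() except a host physical address, which the guest must never see.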
>> In fact, the code as is today can allocate an arbitrary amount of pinned
>> kernel memory from within user space without any checks.
>
> Right. We should at least account it in the locked limit.
Yup. And (probably) this thing will keep a counter of how many windows were
created per KVM instance to avoid having multiple copies of the same table.
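
A rough sketch of what that accounting could look like, again in plain userspace C and with made-up names (struct vm_sketch, sketch_create_window). Identifying a window by a LIOBN-style number is my assumption here, as are the limits and the error codes; the real code would charge against the process locked-memory limit rather than a field in a struct.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WINDOWS     4u      /* assumed per-VM cap, illustration only */
#define PAGE_SIZE_BYTES 4096u

/* Hypothetical per-VM bookkeeping for TCE tables. */
struct vm_sketch {
    uint64_t locked_pages;          /* pages already charged to this VM */
    uint64_t locked_limit;          /* stand-in for the locked-memory limit */
    size_t   nr_windows;            /* how many windows were created */
    uint32_t liobn[MAX_WINDOWS];    /* which windows already have a table */
};

/* Create a table of nr_entries 64-bit TCEs, charging it to the limit. */
int sketch_create_window(struct vm_sketch *vm, uint32_t liobn, uint64_t nr_entries)
{
    uint64_t bytes, pages;
    size_t i;

    assert(vm != NULL && vm->locked_pages <= vm->locked_limit);

    /* Reject sizes whose byte count would overflow. */
    if (nr_entries == 0 || nr_entries > UINT64_MAX / sizeof(uint64_t))
        return -EINVAL;
    bytes = nr_entries * sizeof(uint64_t);
    pages = bytes / PAGE_SIZE_BYTES + (bytes % PAGE_SIZE_BYTES != 0);

    /* Refuse a second copy of a table for the same window. */
    for (i = 0; i < vm->nr_windows; i++)
        if (vm->liobn[i] == liobn)
            return -EBUSY;
    if (vm->nr_windows >= MAX_WINDOWS)
        return -ENOSPC;

    /* Account the pinned pages before "allocating" anything. */
    if (pages > vm->locked_limit - vm->locked_pages)
        return -ENOMEM;

    vm->locked_pages += pages;
    vm->liobn[vm->nr_windows++] = liobn;
    return 0;
}
```

The order matters: the duplicate check and the limit check both happen before anything is charged, so a failed request leaves the counters untouched.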
--
Alexey