From: Jason Wang <jasowang@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
peterx@redhat.com, virtualization@lists.linux-foundation.org,
pbonzini@redhat.com
Subject: Re: [RFC PATCH V2 2/2] vhost: device IOTLB API
Date: Thu, 28 Apr 2016 14:37:16 +0800 [thread overview]
Message-ID: <5721AF9C.9030209@redhat.com> (raw)
In-Reply-To: <20160427143057-mutt-send-email-mst@redhat.com>
On 04/27/2016 07:45 PM, Michael S. Tsirkin wrote:
> On Fri, Mar 25, 2016 at 10:34:34AM +0800, Jason Wang wrote:
>> This patch tries to implement an device IOTLB for vhost. This could be
>> used with for co-operation with userspace(qemu) implementation of DMA
>> remapping.
>>
>> The idea is simple. When vhost meets an IOTLB miss, it will request
>> the assistance of userspace to do the translation, this is done
>> through:
>>
>> - Fill the translation request in a preset userspace address (This
>> address is set through ioctl VHOST_SET_IOTLB_REQUEST_ENTRY).
>> - Notify userspace through eventfd (This eventfd was set through ioctl
>> VHOST_SET_IOTLB_FD).
> Why use an eventfd for this?
The aim is to implement the API all through ioctls.
> We use them for interrupts because
> that happens to be what kvm wants, but here - why don't we
> just add a generic support for reading out events
> on the vhost fd itself?
I've considered this approach, but what's the advantages of this? I mean
looks like all other ioctls could be done through vhost fd
reading/writing too.
>
>> - device IOTLB were started and stopped through VHOST_RUN_IOTLB ioctl
>>
>> When userspace finishes the translation, it will update the vhost
>> IOTLB through VHOST_UPDATE_IOTLB ioctl. Userspace is also in charge of
>> snooping the IOTLB invalidation of IOMMU IOTLB and use
>> VHOST_UPDATE_IOTLB to invalidate the possible entry in vhost.
> There's one problem here, and that is that VQs still do not undergo
> translation. In theory VQ could be mapped in such a way
> that it's not contigious in userspace memory.
I'm not sure I get the issue, current vhost API support setting
desc_user_addr, used_user_addr and avail_user_addr independently. So
looks ok? If not, looks not a problem to device IOTLB API itself.
>
>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> What limits amount of entries that kernel keeps around?
It depends on guest working set I think. Looking at
http://dpdk.org/doc/guides/linux_gsg/sys_reqs.html:
- For 2MB page size in guest, it suggests hugepages=1024
- For 1GB page size, it suggests a hugepages=4
So I choose 2048 to make sure it can cover this.
> Do we want at least a mod parameter for this?
Maybe.
>
>> ---
>> drivers/vhost/net.c | 6 +-
>> drivers/vhost/vhost.c | 301 +++++++++++++++++++++++++++++++++++++++------
>> drivers/vhost/vhost.h | 17 ++-
>> fs/eventfd.c | 3 +-
>> include/uapi/linux/vhost.h | 35 ++++++
>> 5 files changed, 320 insertions(+), 42 deletions(-)
>>
[...]
>> +struct vhost_iotlb_entry {
>> + __u64 iova;
>> + __u64 size;
>> + __u64 userspace_addr;
> Alignment requirements?
The API does not require any alignment. Will add a comment for this.
>
>> + struct {
>> +#define VHOST_ACCESS_RO 0x1
>> +#define VHOST_ACCESS_WO 0x2
>> +#define VHOST_ACCESS_RW 0x3
>> + __u8 perm;
>> +#define VHOST_IOTLB_MISS 1
>> +#define VHOST_IOTLB_UPDATE 2
>> +#define VHOST_IOTLB_INVALIDATE 3
>> + __u8 type;
>> +#define VHOST_IOTLB_INVALID 0x1
>> +#define VHOST_IOTLB_VALID 0x2
>> + __u8 valid;
> why do we need this flag?
Useless, will remove.
>
>> + __u8 u8_padding;
>> + __u32 padding;
>> + } flags;
>> +};
>> +
>> +struct vhost_vring_iotlb_entry {
>> + unsigned int index;
>> + __u64 userspace_addr;
>> +};
>> +
>> struct vhost_memory_region {
>> __u64 guest_phys_addr;
>> __u64 memory_size; /* bytes */
>> @@ -127,6 +153,15 @@ struct vhost_memory {
>> /* Set eventfd to signal an error */
>> #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
>>
>> +/* IOTLB */
>> +/* Specify an eventfd file descriptor to signle on IOTLB miss */
> typo
Will fix it.
>
>> +#define VHOST_SET_VRING_IOTLB_CALL _IOW(VHOST_VIRTIO, 0x23, struct \
>> + vhost_vring_file)
>> +#define VHOST_SET_VRING_IOTLB_REQUEST _IOW(VHOST_VIRTIO, 0x25, struct \
>> + vhost_vring_iotlb_entry)
>> +#define VHOST_UPDATE_IOTLB _IOW(VHOST_VIRTIO, 0x24, struct vhost_iotlb_entry)
>> +#define VHOST_RUN_IOTLB _IOW(VHOST_VIRTIO, 0x26, int)
>> +
> Is the assumption that userspace must dedicate a thread to running the iotlb?
> I dislike this one.
> Please support asynchronous APIs at least optionally to make
> userspace make its own threading decisions.
Nope, my qemu patches does not use a dedicated thread. This API is used
to start or top DMAR according to e.g whether guest enable DMAR for
intel IOMMU.
>
>> /* VHOST_NET specific defines */
>>
>> /* Attach virtio net ring to a raw socket, or tap device.
> Don't we need a feature bit for this?
Yes we need it. The feature bit is not considered in this patch and
looks like it was still under discussion. After we finalize it, I will add.
> Are we short on feature bits? If yes maybe it's time to add
> something like PROTOCOL_FEATURES that we have in vhost-user.
>
I believe it can just work like VERSION_1, or is there anything I missed?
next prev parent reply other threads:[~2016-04-28 6:37 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-25 2:34 [RFC PATCH V2 0/2] basic device IOTLB support Jason Wang
2016-03-25 2:34 ` [RFC PATCH V2 1/2] vhost: convert pre sorted vhost memory array to interval tree Jason Wang
2016-04-27 11:30 ` Michael S. Tsirkin
2016-04-28 6:20 ` Jason Wang
2016-03-25 2:34 ` [RFC PATCH V2 2/2] vhost: device IOTLB API Jason Wang
2016-04-27 11:45 ` Michael S. Tsirkin
[not found] ` <20160427143057-mutt-send-email-mst@redhat.com>
2016-04-28 6:37 ` Jason Wang [this message]
2016-04-28 14:43 ` Michael S. Tsirkin
2016-04-29 1:12 ` Jason Wang
2016-04-29 4:44 ` Jason Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5721AF9C.9030209@redhat.com \
--to=jasowang@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).