From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next 3/3] vhost: access vq metadata through kernel virtual address
Date: Fri, 14 Dec 2018 07:36:45 -0500
Message-ID: <20181214073332-mutt-send-email-mst@kernel.org>
In-Reply-To: <d70c5ac1-c805-cefd-ace1-643a2e271ba0@redhat.com>
On Fri, Dec 14, 2018 at 11:57:35AM +0800, Jason Wang wrote:
>
> > On 2018/12/13 11:44 PM, Michael S. Tsirkin wrote:
> > On Thu, Dec 13, 2018 at 06:10:22PM +0800, Jason Wang wrote:
> > > It was noticed that the copy_user() friends used to access
> > > virtqueue metadata tend to be very expensive for a dataplane
> > > implementation like vhost, since they involve lots of software checks,
> > > speculation barriers, and hardware feature toggling (e.g. SMAP). The
> > > extra cost is more obvious when transferring small packets.
> > >
> > > This patch tries to eliminate that overhead by pinning the vq metadata
> > > pages and accessing them through vmap(). During SET_VRING_ADDR, we set
> > > up those mappings, and the memory accessors are modified to use
> > > pointers to access the metadata directly.
> > >
> > > Note, this is only done when the device IOTLB is not enabled. We could
> > > use a similar method to optimize that case in the future.
> > >
> > > Tests show about a ~24% improvement in TX PPS when using virtio-user +
> > > vhost_net + xdp1 on TAP (CONFIG_HARDENED_USERCOPY not enabled):
> > >
> > > Before: ~5.0Mpps
> > > After: ~6.1Mpps
> > >
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > > drivers/vhost/vhost.c | 178 ++++++++++++++++++++++++++++++++++++++++++
> > > drivers/vhost/vhost.h | 11 +++
> > > 2 files changed, 189 insertions(+)
> > >
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index bafe39d2e637..1bd24203afb6 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -443,6 +443,9 @@ void vhost_dev_init(struct vhost_dev *dev,
> > >                  vq->indirect = NULL;
> > >                  vq->heads = NULL;
> > >                  vq->dev = dev;
> > > +                memset(&vq->avail_ring, 0, sizeof(vq->avail_ring));
> > > +                memset(&vq->used_ring, 0, sizeof(vq->used_ring));
> > > +                memset(&vq->desc_ring, 0, sizeof(vq->desc_ring));
> > >                  mutex_init(&vq->mutex);
> > >                  vhost_vq_reset(dev, vq);
> > >                  if (vq->handle_kick)
> > > @@ -614,6 +617,102 @@ static void vhost_clear_msg(struct vhost_dev *dev)
> > >          spin_unlock(&dev->iotlb_lock);
> > >  }
> > > +static int vhost_init_vmap(struct vhost_vmap *map, unsigned long uaddr,
> > > +                           size_t size, int write)
> > > +{
> > > +        struct page **pages;
> > > +        int npages = DIV_ROUND_UP(size, PAGE_SIZE);
> > > +        int npinned;
> > > +        void *vaddr;
> > > +
> > > +        pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
> > > +        if (!pages)
> > > +                return -ENOMEM;
> > > +
> > > +        npinned = get_user_pages_fast(uaddr, npages, write, pages);
> > > +        if (npinned != npages)
> > > +                goto err;
> > > +
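(The quoted hunk is cut short above. Purely for context, here is a minimal, self-contained sketch of the pin-and-vmap pattern under discussion; the demo_* names are hypothetical and this is not the patch's actual continuation, just get_user_pages_fast() followed by vmap(), with cleanup on failure.)

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

struct demo_vmap {
        struct page **pages;
        int npages;
        void *addr;             /* kernel alias of the pinned user memory */
};

static int demo_vmap_init(struct demo_vmap *map, unsigned long uaddr,
                          size_t size, int write)
{
        int npages = DIV_ROUND_UP(size + (uaddr & ~PAGE_MASK), PAGE_SIZE);
        int npinned;

        map->pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
        if (!map->pages)
                return -ENOMEM;

        /* Pin the user pages so they stay resident while in use. */
        npinned = get_user_pages_fast(uaddr, npages, write, map->pages);
        if (npinned != npages)
                goto err;

        /* Map the pinned pages contiguously into kernel virtual space. */
        map->addr = vmap(map->pages, npages, VM_MAP, PAGE_KERNEL);
        if (!map->addr)
                goto err;

        map->npages = npages;
        return 0;

err:
        while (npinned > 0)
                put_page(map->pages[--npinned]);
        kfree(map->pages);
        map->pages = NULL;
        return -EFAULT;
}

static void demo_vmap_uninit(struct demo_vmap *map)
{
        int i;

        if (!map->addr)
                return;
        vunmap(map->addr);
        for (i = 0; i < map->npages; i++)
                put_page(map->pages[i]);
        kfree(map->pages);
        map->pages = NULL;
        map->addr = NULL;
}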
> > As I said I have doubts about the whole approach, but this
> > implementation in particular isn't a good idea
> > as it keeps the page around forever.
> > So no THP, no NUMA rebalancing,
>
>
> This is the price paid by all GUP users, not only vhost itself.
Yes. GUP is just not a great interface for vhost to use.
> What's more important, the goal is not to fall too far behind other
> backends like DPDK or AF_XDP (all of which use GUP).
So these guys assume userspace knows what it's doing.
We can't assume that.
>
> > userspace-controlled
> > amount of memory locked up and not accounted for.
>
>
> It's pretty easy to add this since the slow path is still kept. If we
> exceed the limit, we can switch back to the slow path.
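(A hedged sketch of the accounting being described, modeled on what other long-term GUP users such as RDMA and VFIO do rather than on this patch: charge the pinned pages against RLIMIT_MEMLOCK and, if the charge is refused, keep using the copy_user() slow path. The demo_* name is hypothetical.)

#include <linux/capability.h>
#include <linux/mm.h>
#include <linux/resource.h>
#include <linux/sched/signal.h>

/* Returns true if npages could be charged to the mm's pinned-page
 * accounting; false means the caller should stay on the slow path. */
static bool demo_charge_pinned_pages(struct mm_struct *mm, int npages)
{
        unsigned long lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
        bool ok = true;

        down_write(&mm->mmap_sem);
        if (mm->pinned_vm + npages > lock_limit && !capable(CAP_IPC_LOCK))
                ok = false;
        else
                mm->pinned_vm += npages;
        up_write(&mm->mmap_sem);

        return ok;
}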
>
> >
> > Don't get me wrong, it's a great patch in an ideal world.
> > But then in an ideal world no barriers, SMAP, etc. are necessary at all.
>
>
> Again, this is only for metadata access, not for the data, which has been
> accessed this way for years in real use cases.
>
> For SMAP, it makes sense for addresses the kernel cannot forecast. But
> that's not the case for the vhost metadata, since we know the address will
> be accessed very frequently. As for the speculation barrier, it helps
> nothing for the data path of vhost, which runs in a kthread.
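(To make the disputed cost concrete, a hedged illustration follows; the demo_* helpers are hypothetical, not the real vhost accessors. The user-pointer path pays for the uaccess machinery on every access, while the vmap()ed alias is an ordinary load.)

#include <linux/compiler.h>
#include <linux/uaccess.h>
#include <linux/virtio_types.h>

/* Slow path: each read goes through access_ok(), SMAP open/close
 * (STAC/CLAC on x86), hardened-usercopy checks when enabled, and the
 * uaccess speculation hardening. */
static inline int demo_get_avail_idx_user(const __virtio16 __user *p,
                                          __virtio16 *val)
{
        return copy_from_user(val, p, sizeof(*val)) ? -EFAULT : 0;
}

/* Fast path: the same field read through the vmap()ed kernel alias of
 * the pinned avail ring - a plain load with none of the above. */
static inline __virtio16 demo_get_avail_idx_vmap(const __virtio16 *avail_idx)
{
        return READ_ONCE(*avail_idx);
}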
I don't see how a kthread makes any difference. We do have a validation
step which makes some difference.
> Packet sockets or AF_XDP benefit from accessing metadata directly; we
> should do it as well.
>
> Thanks
Thread overview: 39+ messages
2018-12-13 10:10 [PATCH net-next 0/3] vhost: accelerate metadata access through vmap() Jason Wang
2018-12-13 10:10 ` [PATCH net-next 1/3] vhost: generalize adding used elem Jason Wang
2018-12-13 19:41 ` Michael S. Tsirkin
2018-12-14 4:00 ` Jason Wang
2018-12-13 10:10 ` [PATCH net-next 2/3] vhost: fine grain userspace memory accessors Jason Wang
2018-12-13 10:10 ` [PATCH net-next 3/3] vhost: access vq metadata through kernel virtual address Jason Wang
2018-12-13 15:44 ` Michael S. Tsirkin
2018-12-13 21:18 ` Konrad Rzeszutek Wilk
2018-12-13 21:58 ` Michael S. Tsirkin
2018-12-14 3:57 ` Jason Wang
2018-12-14 12:36 ` Michael S. Tsirkin [this message]
2018-12-24 7:53 ` Jason Wang
2018-12-24 18:10 ` Michael S. Tsirkin
2018-12-25 10:05 ` Jason Wang
2018-12-25 12:50 ` Michael S. Tsirkin
2018-12-26 3:57 ` Jason Wang
2018-12-26 15:02 ` Michael S. Tsirkin
2018-12-27 9:39 ` Jason Wang
2018-12-30 18:30 ` Michael S. Tsirkin
2019-01-02 11:38 ` Jason Wang
2018-12-15 21:15 ` David Miller
2018-12-14 14:48 ` kbuild test robot
2018-12-13 15:27 ` [PATCH net-next 0/3] vhost: accelerate metadata access through vmap() Michael S. Tsirkin
2018-12-14 3:42 ` Jason Wang
2018-12-14 12:33 ` Michael S. Tsirkin
2018-12-14 15:31 ` Michael S. Tsirkin
2018-12-24 8:32 ` Jason Wang
2018-12-24 18:12 ` Michael S. Tsirkin
2018-12-25 10:09 ` Jason Wang
2018-12-25 12:52 ` Michael S. Tsirkin
2018-12-26 3:59 ` Jason Wang
2018-12-13 20:12 ` Michael S. Tsirkin
2018-12-14 4:29 ` Jason Wang
2018-12-14 12:52 ` Michael S. Tsirkin
2018-12-15 19:43 ` David Miller
2018-12-16 19:57 ` Michael S. Tsirkin
2018-12-24 8:44 ` Jason Wang
2018-12-24 19:09 ` Michael S. Tsirkin
2018-12-14 15:16 ` Michael S. Tsirkin