From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, virtualization@lists.linux-foundation.org
Subject: Re: [RFC PATCH net-next 10/12] vhost_net: build xdp buff
Date: Tue, 22 May 2018 01:21:11 +0300
Message-ID: <20180522012008-mutt-send-email-mst__38174.064565874$1526941158$gmane$org@kernel.org>
In-Reply-To: <20180521095611.00005caa@intel.com>

On Mon, May 21, 2018 at 09:56:11AM -0700, Jesse Brandeburg wrote:
> On Mon, 21 May 2018 17:04:31 +0800 Jason wrote:
> > This patch implements building XDP buffers in vhost_net. The idea is to do
> > the userspace copy in vhost_net and build the XDP buff based on the
> > page. Vhost_net can then submit one or an array of XDP buffs to the
> > underlying socket (e.g. TUN). TUN can choose to do XDP or call
> > build_skb() to build an skb. To support building the skb, the vnet header
> > is also stored in the header of the XDP buff.
> >
> > This userspace copy and XDP buff building is key to achieving XDP
> > batching in TUN, since TUN does not need to care about the userspace copy
> > and can then disable preemption for several XDP buffs to achieve
> > batching from XDP.
> >
> > TODO: reserve headroom based on the TUN XDP.
> >
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > drivers/vhost/net.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 74 insertions(+)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index f0639d7..1209e84 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -492,6 +492,80 @@ static bool vhost_has_more_pkts(struct vhost_net *net,
> > likely(!vhost_exceeds_maxpend(net));
> > }
> >
> > +#define VHOST_NET_HEADROOM 256
> > +#define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
> > +
> > +static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
> > + struct iov_iter *from,
> > + struct xdp_buff *xdp)
> > +{
> > + struct vhost_virtqueue *vq = &nvq->vq;
> > + struct page_frag *alloc_frag = &current->task_frag;
> > + struct virtio_net_hdr *gso;
> > + size_t len = iov_iter_count(from);
> > + int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > + int pad = SKB_DATA_ALIGN(VHOST_NET_RX_PAD + VHOST_NET_HEADROOM
> > + + nvq->sock_hlen);
> > + int sock_hlen = nvq->sock_hlen;
> > + void *buf;
> > + int copied;
> > +
> > + if (len < nvq->sock_hlen)
> > + return -EFAULT;
> > +
> > + if (SKB_DATA_ALIGN(len + pad) +
> > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE)
> > + return -ENOSPC;
> > +
> > + buflen += SKB_DATA_ALIGN(len + pad);
>
> maybe store the result of SKB_DATA_ALIGN in a local instead of doing
> the work twice?

I don't mind, but I guess gcc can always do it itself?

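For illustration only, the suggestion to keep the aligned lengths in locals could look roughly like the user-space sketch below. It is not the patch code: demo_data_align(), DEMO_CACHE_BYTES, DEMO_PAGE_SIZE, DEMO_SHINFO_SIZE and demo_xdp_buflen() are hypothetical stand-ins with assumed values, standing in for SKB_DATA_ALIGN(), SMP_CACHE_BYTES, PAGE_SIZE and sizeof(struct skb_shared_info). Since the alignment is pure arithmetic, a compiler may well fold the repeated expression on its own, so the local mostly helps readability.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; the kernel's real ones depend on the arch. */
#define DEMO_CACHE_BYTES  64u
#define DEMO_PAGE_SIZE    4096u
#define DEMO_SHINFO_SIZE  320u  /* hypothetical sizeof(struct skb_shared_info) */

/* Round x up to the next multiple of DEMO_CACHE_BYTES (a power of two). */
static size_t demo_data_align(size_t x)
{
	return (x + (DEMO_CACHE_BYTES - 1)) & ~((size_t)DEMO_CACHE_BYTES - 1);
}

/*
 * Compute the total buffer length for a packet of len bytes with pad bytes
 * of headroom. Each aligned length is computed once and reused for both the
 * size check and the final sum. Returns 0 if the result exceeds one page.
 */
static size_t demo_xdp_buflen(size_t len, size_t pad)
{
	size_t data_len = demo_data_align(len + pad);
	size_t shinfo_len = demo_data_align(DEMO_SHINFO_SIZE);

	if (data_len + shinfo_len > DEMO_PAGE_SIZE)
		return 0;

	return data_len + shinfo_len;
}
```
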
> > + alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
> > + if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL)))
> > + return -ENOMEM;
> > +
> > + buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > +
> > + /* We store two kinds of metadata in the header which will be
> > + * used for XDP_PASS to do build_skb():
> > + * offset 0: buflen
> > + * offset sizeof(int): vnet header
> > + */
> > + copied = copy_page_from_iter(alloc_frag->page,
> > + alloc_frag->offset + sizeof(int), sock_hlen, from);
> > + if (copied != sock_hlen)
> > + return -EFAULT;
> > +
> > + gso = (struct virtio_net_hdr *)(buf + sizeof(int));
> > +
> > + if ((gso->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
> > + vhost16_to_cpu(vq, gso->csum_start) +
> > + vhost16_to_cpu(vq, gso->csum_offset) + 2 >
> > + vhost16_to_cpu(vq, gso->hdr_len)) {
> > + gso->hdr_len = cpu_to_vhost16(vq,
> > + vhost16_to_cpu(vq, gso->csum_start) +
> > + vhost16_to_cpu(vq, gso->csum_offset) + 2);
> > +
> > + if (vhost16_to_cpu(vq, gso->hdr_len) > len)
> > + return -EINVAL;
> > + }
> > +
> > + len -= sock_hlen;
> > + copied = copy_page_from_iter(alloc_frag->page,
> > + alloc_frag->offset + pad,
> > + len, from);
> > + if (copied != len)
> > + return -EFAULT;
> > +
> > + xdp->data_hard_start = buf;
> > + xdp->data = buf + pad;
> > + xdp->data_end = xdp->data + len;
> > + *(int *)(xdp->data_hard_start)= buflen;
>
> space before =
>
> > +
> > + get_page(alloc_frag->page);
> > + alloc_frag->offset += buflen;
> > +
> > + return 0;
> > +}
> > +
> > static void handle_tx_copy(struct vhost_net *net)
> > {
> > struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
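
As a reading aid for the metadata comment in the quoted patch (buflen at offset 0, the vnet header at offset sizeof(int), packet data at offset pad), here is a minimal user-space sketch of that layout. It is an illustration under assumptions, not the vhost_net implementation: struct demo_xdp, demo_build_xdp() and demo_stored_buflen() are hypothetical names, plain memcpy() stands in for copy_page_from_iter(), and the explicit overlap and bounds checks are additions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the three xdp_buff pointers the patch sets. */
struct demo_xdp {
	unsigned char *data_hard_start;
	unsigned char *data;
	unsigned char *data_end;
};

/*
 * Lay out one packet in buf:
 *   offset 0:            buflen (int)
 *   offset sizeof(int):  vnet header (hlen bytes)
 *   offset pad:          packet payload (len - hlen bytes)
 * Returns 0 on success, -1 if the pieces would not fit.
 */
static int demo_build_xdp(struct demo_xdp *xdp, unsigned char *buf,
			  size_t bufsz, const unsigned char *pkt, size_t len,
			  size_t hlen, size_t pad, int buflen)
{
	if (len < hlen)
		return -1;	/* packet shorter than its own header */
	if (sizeof(int) + hlen > pad)
		return -1;	/* metadata would overlap the payload */
	if (pad > bufsz || len - hlen > bufsz - pad)
		return -1;	/* payload would overrun the buffer */

	memcpy(buf, &buflen, sizeof(buflen));		/* offset 0 */
	memcpy(buf + sizeof(int), pkt, hlen);		/* vnet header */
	memcpy(buf + pad, pkt + hlen, len - hlen);	/* payload */

	xdp->data_hard_start = buf;
	xdp->data = buf + pad;
	xdp->data_end = xdp->data + (len - hlen);
	return 0;
}

/* Read back the length stored at the start of the buffer. */
static int demo_stored_buflen(const struct demo_xdp *xdp)
{
	int buflen;

	memcpy(&buflen, xdp->data_hard_start, sizeof(buflen));
	return buflen;
}
```

A consumer that takes the XDP_PASS path could then recover buflen from data_hard_start and the vnet header from the bytes right after it, which is the information the patch description says build_skb() needs.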