From: John Fastabend <john.fastabend@gmail.com>
To: Jason Wang <jasowang@redhat.com>,
kubakici@wp.pl, ast@fb.com, mst@redhat.com
Cc: john.r.fastabend@intel.com, netdev@vger.kernel.org
Subject: Re: [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head
Date: Mon, 6 Feb 2017 11:29:59 -0800 [thread overview]
Message-ID: <5898CEB7.1020509@gmail.com> (raw)
In-Reply-To: <993d31de-57d9-fb7f-568e-c18fc9fadbc6@redhat.com>
On 17-02-05 11:08 PM, Jason Wang wrote:
>
>
> On 2017年02月03日 11:16, John Fastabend wrote:
>> Add support for XDP adjust head by allocating a 256B header region
>> that XDP programs can grow into. This is only enabled when a XDP
>> program is loaded.
>>
>> In order to ensure that we do not have to unwind queue headroom push
>> queue setup below bpf_prog_add. It reads better to do a prog ref
>> unwind vs another queue setup call.
>>
>> At the moment this code must do a full reset to ensure old buffers
>> without headroom on program add or with headroom on program removal
>> are not used incorrectly in the datapath. Ideally we would only
>> have to disable/enable the RX queues being updated but there is no
>> API to do this at the moment in virtio so use the big hammer. In
>> practice it is likely not that big of a problem as this will only
>> happen when XDP is enabled/disabled changing programs does not
>> require the reset. There is some risk that the driver may either
>> have an allocation failure or for some reason fail to correctly
>> negotiate with the underlying backend in this case the driver will
>> be left uninitialized. I have not seen this ever happen on my test
>> systems and for what its worth this same failure case can occur
>> from probe and other contexts in virtio framework.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
[...]
>> @@ -412,7 +418,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>> struct bpf_prog *xdp_prog;
>> len -= vi->hdr_len;
>> - skb_trim(skb, len);
>> rcu_read_lock();
>> xdp_prog = rcu_dereference(rq->xdp_prog);
>> @@ -424,12 +429,16 @@ static struct sk_buff *receive_small(struct net_device
>> *dev,
>> if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>> goto err_xdp;
>> - xdp.data = skb->data;
>> + xdp.data_hard_start = skb->data;
>> + xdp.data = skb->data + VIRTIO_XDP_HEADROOM;
>> xdp.data_end = xdp.data + len;
>> act = bpf_prog_run_xdp(xdp_prog, &xdp);
>> switch (act) {
>> case XDP_PASS:
>> + /* Recalculate length in case bpf program changed it */
>> + __skb_pull(skb, xdp.data - xdp.data_hard_start);
>
> But skb->len were trimmed to len below which seems wrong.
I believe this is correct and it passes my basic iperf/ping tests.
When we are using small buffers with XDP, skb->data is pointing to the front
of the buffer. This space includes the XDP headroom. When we pass the skb up
to the stack we need to pull this off and point to the start of the data. But
there still is likely a bunch of room at the end of the buffer assuming the
packet is smaller than the buffer side.
>
>> + len = xdp.data_end - xdp.data;
>> break;
>> case XDP_TX:
>> if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, skb)))
>> @@ -446,6 +455,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>> }
>> rcu_read_unlock();
>> + skb_trim(skb, len);
So here we trim the packet to set the length to the actual payload size. The
'len' parameter passed into receive_small does not include the headroom so this
gives us the correct length of the payload.
Make sense?
>> return skb;
>> err_xdp:
[...]
>> @@ -569,7 +580,7 @@ static struct sk_buff *receive_mergeable(struct net_device
>> *dev,
>> page, offset, &len);
>> if (!xdp_page)
>> goto err_xdp;
>> - offset = 0;
>> + offset = VIRTIO_XDP_HEADROOM;
>> } else {
>> xdp_page = page;
>> }
>> @@ -582,19 +593,30 @@ static struct sk_buff *receive_mergeable(struct
>> net_device *dev,
>> if (unlikely(hdr->hdr.gso_type))
>> goto err_xdp;
>> + /* Allow consuming headroom but reserve enough space to push
>> + * the descriptor on if we get an XDP_TX return code.
>> + */
>> data = page_address(xdp_page) + offset;
>> + xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
>
> Should be data - VIRTIO_XDP_HEADROOM I think?
>
If the XDP program does an adjust_head() and then a XDP_TX I want to ensure
we reserve enough headroom to push the header onto the buffer when the packet
is sent. So the additional hdr_len reserve here is intentional. Otherwise we
would need to detect this and do some type of linearize action.
>> xdp.data = data + vi->hdr_len;
>> xdp.data_end = xdp.data + (len - vi->hdr_len);
>> act = bpf_prog_run_xdp(xdp_prog, &xdp);
>> switch (act) {
>> case XDP_PASS:
>> + /* recalculate offset to account for any header
>> + * adjustments. Note other cases do not build an
>> + * skb and avoid using offset
>> + */
>> + offset = xdp.data -
>> + page_address(xdp_page) - vi->hdr_len;
>> +
>> /* We can only create skb based on xdp_page. */
>> if (unlikely(xdp_page != page)) {
>> rcu_read_unlock();
>> put_page(page);
>> head_skb = page_to_skb(vi, rq, xdp_page,
>> - 0, len, PAGE_SIZE);
>> + offset, len, PAGE_SIZE);
>> ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>> return head_skb;
[...]
>> -static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
>> +static int add_recvbuf_mergeable(struct virtnet_info *vi,
>> + struct receive_queue *rq, gfp_t gfp)
>> {
>> struct page_frag *alloc_frag = &rq->alloc_frag;
>> + unsigned int headroom = virtnet_get_headroom(vi);
>> char *buf;
>> unsigned long ctx;
>> int err;
>> unsigned int len, hole;
>> len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
>> - if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
>> + if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
>> return -ENOMEM;
>> buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>> + buf += headroom; /* advance address leaving hole at front of pkt */
>
> Note: the headroom will reduce the possibility of frag coalescing which may
> damage the performance more or less.
>
> [...]
Right there are a few other performance optimizations I am looking at in
virtio as well but these should go in as follow on series.
Specifically, I'm looking at recycling buffers to see what sort of performance
increase we can get out of that. Many of the hardware drivers do this and see
a performance boost from it. However dynamic buffer sizes like this make it a
bit challenging.
.John
next prev parent reply other threads:[~2017-02-06 19:30 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-02-03 3:14 [net-next PATCH v2 0/5] XDP adjust head support for virtio John Fastabend
2017-02-03 3:14 ` [net-next PATCH v2 1/5] virtio_net: wrap rtnl_lock in test for calling with lock already held John Fastabend
2017-02-06 6:48 ` Jason Wang
2017-02-03 3:15 ` [net-next PATCH v2 2/5] virtio_net: factor out xdp handler for readability John Fastabend
2017-02-06 6:49 ` Jason Wang
2017-02-03 3:15 ` [net-next PATCH v2 3/5] virtio_net: remove duplicate queue pair binding in XDP John Fastabend
2017-02-06 7:06 ` Jason Wang
2017-02-03 3:16 ` [net-next PATCH v2 4/5] virtio_net: refactor freeze/restore logic into virtnet reset logic John Fastabend
2017-02-06 7:07 ` Jason Wang
2017-02-03 3:16 ` [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head John Fastabend
2017-02-03 4:04 ` Michael S. Tsirkin
2017-02-06 7:08 ` Jason Wang
2017-02-06 19:29 ` John Fastabend [this message]
2017-02-07 2:23 ` Jason Wang
2017-02-03 3:29 ` [net-next PATCH v2 0/5] XDP adjust head support for virtio Alexei Starovoitov
2017-02-03 3:55 ` Jakub Kicinski
2017-02-05 22:36 ` David Miller
2017-02-06 4:39 ` Michael S. Tsirkin
2017-02-06 7:12 ` Jason Wang
2017-02-06 16:37 ` David Miller
2017-02-07 4:15 ` Michael S. Tsirkin
2017-02-07 15:05 ` David Miller
2017-02-08 16:39 ` John Fastabend
2017-02-08 16:50 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5898CEB7.1020509@gmail.com \
--to=john.fastabend@gmail.com \
--cc=ast@fb.com \
--cc=jasowang@redhat.com \
--cc=john.r.fastabend@intel.com \
--cc=kubakici@wp.pl \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).