From: Jason Wang <jasowang@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>,
kubakici@wp.pl, ast@fb.com, mst@redhat.com
Cc: john.r.fastabend@intel.com, netdev@vger.kernel.org
Subject: Re: [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head
Date: Tue, 7 Feb 2017 10:23:23 +0800 [thread overview]
Message-ID: <abeb882e-0b44-12d6-d30d-d037c0813941@redhat.com> (raw)
In-Reply-To: <5898CEB7.1020509@gmail.com>
On 02/07/2017 03:29, John Fastabend wrote:
> On 17-02-05 11:08 PM, Jason Wang wrote:
>>
>> On 02/03/2017 11:16, John Fastabend wrote:
>>> Add support for XDP adjust head by allocating a 256B header region
>>> that XDP programs can grow into. This is only enabled when an XDP
>>> program is loaded.
>>>
>>> In order to ensure that we do not have to unwind queue headroom, push
>>> queue setup below bpf_prog_add. It reads better to do a prog ref
>>> unwind than another queue setup call.
>>>
>>> At the moment this code must do a full reset to ensure old buffers
>>> without headroom on program add, or with headroom on program removal,
>>> are not used incorrectly in the datapath. Ideally we would only
>>> have to disable/enable the RX queues being updated, but there is no
>>> API to do this at the moment in virtio, so use the big hammer. In
>>> practice it is likely not that big of a problem, as this will only
>>> happen when XDP is enabled/disabled; changing programs does not
>>> require the reset. There is some risk that the driver may either
>>> have an allocation failure or for some reason fail to correctly
>>> negotiate with the underlying backend; in this case the driver will
>>> be left uninitialized. I have not seen this ever happen on my test
>>> systems and, for what it's worth, this same failure case can occur
>>> from probe and other contexts in the virtio framework.
>>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>> ---
>
> [...]
>
>>> @@ -412,7 +418,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>> struct bpf_prog *xdp_prog;
>>> len -= vi->hdr_len;
>>> - skb_trim(skb, len);
>>> rcu_read_lock();
>>> xdp_prog = rcu_dereference(rq->xdp_prog);
>>> @@ -424,12 +429,16 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>> if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>>> goto err_xdp;
>>> - xdp.data = skb->data;
>>> + xdp.data_hard_start = skb->data;
>>> + xdp.data = skb->data + VIRTIO_XDP_HEADROOM;
>>> xdp.data_end = xdp.data + len;
>>> act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>> switch (act) {
>>> case XDP_PASS:
>>> + /* Recalculate length in case bpf program changed it */
>>> + __skb_pull(skb, xdp.data - xdp.data_hard_start);
>> But skb->len was trimmed to len below, which seems wrong.
> I believe this is correct and it passes my basic iperf/ping tests.
>
> When we are using small buffers with XDP, skb->data points to the front
> of the buffer. This space includes the XDP headroom. When we pass the skb up
> to the stack we need to pull this off and point to the start of the data. But
> there is still likely a bunch of room at the end of the buffer, assuming the
> packet is smaller than the buffer size.
>
>>> + len = xdp.data_end - xdp.data;
>>> break;
>>> case XDP_TX:
>>> if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, skb)))
>>> @@ -446,6 +455,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>> }
>>> rcu_read_unlock();
>>> + skb_trim(skb, len);
> So here we trim the packet to set the length to the actual payload size. The
> 'len' parameter passed into receive_small does not include the headroom, so
> this gives us the correct length of the payload.
>
> Make sense?
Yes, you are right.
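
Just to double check that I read the arithmetic the same way, here is a
throwaway userspace sketch (values are made up and this is obviously not the
driver code, only the pointer math from the hunk above):

#include <stdio.h>

#define VIRTIO_XDP_HEADROOM 256

int main(void)
{
	char buf[2048];
	unsigned int len = 988;	/* payload length as passed to receive_small() */

	char *data_hard_start = buf;			/* skb->data */
	char *data = buf + VIRTIO_XDP_HEADROOM;		/* packet start for XDP */
	char *data_end = data + len;

	/* say the program popped 64 bytes off the front with
	 * bpf_xdp_adjust_head() (positive delta) */
	data += 64;
	len = data_end - data;

	/* XDP_PASS: __skb_pull() drops everything in front of data ... */
	printf("pull %ld bytes\n", (long)(data - data_hard_start));
	/* ... and the later skb_trim(skb, len) cuts the tail down to len */
	printf("payload is now %u bytes\n", len);
	return 0;
}

This prints "pull 320 bytes" and "payload is now 924 bytes", i.e. the headroom
(plus whatever the program consumed) is pulled off the front and the unused
tail is trimmed, as you describe.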
>
>>> return skb;
>>> err_xdp:
> [...]
>
>>> @@ -569,7 +580,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>> page, offset, &len);
>>> if (!xdp_page)
>>> goto err_xdp;
>>> - offset = 0;
>>> + offset = VIRTIO_XDP_HEADROOM;
>>> } else {
>>> xdp_page = page;
>>> }
>>> @@ -582,19 +593,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>> if (unlikely(hdr->hdr.gso_type))
>>> goto err_xdp;
>>> + /* Allow consuming headroom but reserve enough space to push
>>> + * the descriptor on if we get an XDP_TX return code.
>>> + */
>>> data = page_address(xdp_page) + offset;
>>> + xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
>> Should be data - VIRTIO_XDP_HEADROOM I think?
>>
> If the XDP program does an adjust_head() and then an XDP_TX, I want to ensure
> we reserve enough headroom to push the header onto the buffer when the packet
> is sent. So the additional hdr_len reserve here is intentional. Otherwise we
> would need to detect this and do some type of linearize action.
I get the point.
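
For the record, my reading of the layout, as a rough userspace sketch (the
hdr_len value below is just illustrative):

#include <stdio.h>

#define VIRTIO_XDP_HEADROOM 256

int main(void)
{
	char page[4096];
	unsigned int hdr_len = 12;	/* illustrative vnet header size */

	/* layout: [ hole = VIRTIO_XDP_HEADROOM | vnet hdr | packet ... ] */
	char *data = page + VIRTIO_XDP_HEADROOM;	/* points at the vnet hdr */
	char *hard_start = data - VIRTIO_XDP_HEADROOM + hdr_len;
	char *xdp_data = data + hdr_len;		/* packet start */

	printf("headroom the program may grow into: %ld\n",
	       (long)(xdp_data - hard_start));	/* == VIRTIO_XDP_HEADROOM */
	printf("reserved in front for a fresh vnet hdr on XDP_TX: %ld\n",
	       (long)(hard_start - page));	/* == hdr_len */
	return 0;
}

So the program still sees VIRTIO_XDP_HEADROOM bytes of headroom, but hdr_len
bytes at the very front of the hole stay reserved for the header that
virtnet_xdp_xmit() pushes on transmit.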
>
>>> xdp.data = data + vi->hdr_len;
>>> xdp.data_end = xdp.data + (len - vi->hdr_len);
>>> act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>> switch (act) {
>>> case XDP_PASS:
>>> + /* recalculate offset to account for any header
>>> + * adjustments. Note other cases do not build an
>>> + * skb and avoid using offset
>>> + */
>>> + offset = xdp.data -
>>> + page_address(xdp_page) - vi->hdr_len;
>>> +
>>> /* We can only create skb based on xdp_page. */
>>> if (unlikely(xdp_page != page)) {
>>> rcu_read_unlock();
>>> put_page(page);
>>> head_skb = page_to_skb(vi, rq, xdp_page,
>>> - 0, len, PAGE_SIZE);
>>> + offset, len, PAGE_SIZE);
>>> ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>>> return head_skb;
> [...]
>
>>> -static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
>>> +static int add_recvbuf_mergeable(struct virtnet_info *vi,
>>> + struct receive_queue *rq, gfp_t gfp)
>>> {
>>> struct page_frag *alloc_frag = &rq->alloc_frag;
>>> + unsigned int headroom = virtnet_get_headroom(vi);
>>> char *buf;
>>> unsigned long ctx;
>>> int err;
>>> unsigned int len, hole;
>>> len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
>>> - if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
>>> + if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
>>> return -ENOMEM;
>>> buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>>> + buf += headroom; /* advance address leaving hole at front of pkt */
>> Note: the headroom will reduce the possibility of frag coalescing, which may
>> hurt performance somewhat.
>>
>> [...]
> Right, there are a few other performance optimizations I am looking at in
> virtio as well, but these should go in as a follow-on series.
>
> Specifically, I'm looking at recycling buffers to see what sort of performance
> increase we can get out of that. Many of the hardware drivers do this and see
> a performance boost from it. However, dynamic buffer sizes like this make it a
> bit challenging.
>
> .John
Right.
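
To put a rough number on the coalescing point, a trivial sketch with made-up
values (the posted buffer length here is an assumption, not a measurement):

#include <stdio.h>

#define VIRTIO_XDP_HEADROOM 256

int main(void)
{
	unsigned int offset = 0;	/* alloc_frag->offset */
	unsigned int len = 1536;	/* assumed posted buffer length */

	/* first buffer: skip the hole, then the buffer itself */
	unsigned int buf1 = offset + VIRTIO_XDP_HEADROOM;
	offset = buf1 + len;

	/* second buffer carved from the same frag */
	unsigned int buf2 = offset + VIRTIO_XDP_HEADROOM;

	/* without the headroom buf2 would start right at buf1 + len and the
	 * two frags could be coalesced; with it there is always a gap */
	printf("gap between end of buf1 and start of buf2: %u bytes\n",
	       buf2 - (buf1 + len));
	return 0;
}

Every posted buffer pays for its own hole, so consecutive buffers in a frag
are no longer back to back while XDP is enabled, which is why coalescing
suffers. Not a blocker for this series, just something to keep in mind for the
follow-ups.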
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks