From: Jason Wang <jasowang@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>,
	kubakici@wp.pl, ast@fb.com, mst@redhat.com
Cc: john.r.fastabend@intel.com, netdev@vger.kernel.org
Subject: Re: [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head
Date: Tue, 7 Feb 2017 10:23:23 +0800
Message-ID: <abeb882e-0b44-12d6-d30d-d037c0813941@redhat.com>
In-Reply-To: <5898CEB7.1020509@gmail.com>



On 2017-02-07 03:29, John Fastabend wrote:
> On 17-02-05 11:08 PM, Jason Wang wrote:
>>
>> On 2017-02-03 11:16, John Fastabend wrote:
>>> Add support for XDP adjust head by allocating a 256B header region
>>> that XDP programs can grow into. This is only enabled when an XDP
>>> program is loaded.
>>>
>>> In order to ensure that we do not have to unwind queue headroom,
>>> push queue setup below bpf_prog_add. It reads better to do a prog
>>> ref unwind vs another queue setup call.
>>>
>>> At the moment this code must do a full reset to ensure old buffers
>>> without headroom on program add, or with headroom on program removal,
>>> are not used incorrectly in the datapath. Ideally we would only have
>>> to disable/enable the RX queues being updated, but there is no API to
>>> do this in virtio at the moment, so use the big hammer. In practice
>>> it is likely not that big of a problem, as this only happens when XDP
>>> is enabled/disabled; changing programs does not require the reset.
>>> There is some risk that the driver may either have an allocation
>>> failure or for some reason fail to correctly negotiate with the
>>> underlying backend; in this case the driver will be left
>>> uninitialized. I have never seen this happen on my test systems, and
>>> for what it's worth the same failure case can occur from probe and
>>> other contexts in the virtio framework.
>>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>> ---
>
> [...]
>
>>> @@ -412,7 +418,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>        struct bpf_prog *xdp_prog;
>>>
>>>        len -= vi->hdr_len;
>>> -    skb_trim(skb, len);
>>>
>>>        rcu_read_lock();
>>>        xdp_prog = rcu_dereference(rq->xdp_prog);
>>> @@ -424,12 +429,16 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>            if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>>>                goto err_xdp;
>>>
>>> -        xdp.data = skb->data;
>>> +        xdp.data_hard_start = skb->data;
>>> +        xdp.data = skb->data + VIRTIO_XDP_HEADROOM;
>>>            xdp.data_end = xdp.data + len;
>>>            act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>
>>>            switch (act) {
>>>            case XDP_PASS:
>>> +            /* Recalculate length in case bpf program changed it */
>>> +            __skb_pull(skb, xdp.data - xdp.data_hard_start);
>> But skb->len was trimmed to len below, which seems wrong.
> I believe this is correct and it passes my basic iperf/ping tests.
>
> When we are using small buffers with XDP, skb->data points to the front
> of the buffer. This space includes the XDP headroom. When we pass the skb up
> to the stack we need to pull this off and point to the start of the data. But
> there is still likely a bunch of room at the end of the buffer, assuming the
> packet is smaller than the buffer size.
>
>>> +            len = xdp.data_end - xdp.data;
>>>                break;
>>>            case XDP_TX:
>>>                if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, skb)))
>>> @@ -446,6 +455,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>        }
>>>        rcu_read_unlock();
>>>
>>> +    skb_trim(skb, len);
> So here we trim the packet to set the length to the actual payload size. The
> 'len' parameter passed into receive_small does not include the headroom so this
> gives us the correct length of the payload.
>
> Make sense?

Yes, you are right.
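
To spell it out for the archive: with this patch the XDP_PASS flow in
receive_small() is roughly the following (a simplified sketch of the
hunks above, error handling omitted):

    len -= vi->hdr_len;                          /* len excludes headroom */

    xdp.data_hard_start = skb->data;             /* front of the buffer   */
    xdp.data = skb->data + VIRTIO_XDP_HEADROOM;  /* start of the packet   */
    xdp.data_end = xdp.data + len;

    act = bpf_prog_run_xdp(xdp_prog, &xdp);
    if (act == XDP_PASS) {
        /* The program may have moved xdp.data with bpf_xdp_adjust_head(),
         * so drop whatever headroom is left in front of the packet ...
         */
        __skb_pull(skb, xdp.data - xdp.data_hard_start);
        /* ... and recompute the payload length from the new bounds. */
        len = xdp.data_end - xdp.data;
    }

    /* Trim the tail; the buffer is normally larger than the packet. */
    skb_trim(skb, len);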

>
>>>        return skb;
>>>
>>>  err_xdp:
> [...]
>
>>> @@ -569,7 +580,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>                                  page, offset, &len);
>>>                if (!xdp_page)
>>>                    goto err_xdp;
>>> -            offset = 0;
>>> +            offset = VIRTIO_XDP_HEADROOM;
>>>            } else {
>>>                xdp_page = page;
>>>            }
>>> @@ -582,19 +593,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>            if (unlikely(hdr->hdr.gso_type))
>>>                goto err_xdp;
>>>
>>> +        /* Allow consuming headroom but reserve enough space to push
>>> +         * the descriptor on if we get an XDP_TX return code.
>>> +         */
>>>            data = page_address(xdp_page) + offset;
>>> +        xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
>> Should be data - VIRTIO_XDP_HEADROOM I think?
>>
> If the XDP program does an adjust_head() and then an XDP_TX, I want to
> ensure we reserve enough headroom to push the header onto the buffer when
> the packet is sent. So the additional hdr_len reserve here is intentional.
> Otherwise we would need to detect this and do some type of linearize action.

I get the point.
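
So with offset == VIRTIO_XDP_HEADROOM after linearizing, the layout works
out as below (a sketch of the pointer arithmetic, not the literal patch
code):

    /*  page_address(xdp_page)
     *  |<- hdr_len ->|<----- VIRTIO_XDP_HEADROOM ----->|<-- packet -->|
     *  ^ reserved    ^ xdp.data_hard_start             ^ xdp.data
     *
     * The first hdr_len bytes stay out of the program's reach, so an
     * XDP_TX always has room to push the virtio-net header back on.
     */
    data = page_address(xdp_page) + offset;         /* virtio-net header */
    xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
    xdp.data = data + vi->hdr_len;                  /* packet start      */
    xdp.data_end = xdp.data + (len - vi->hdr_len);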

>
>>>            xdp.data = data + vi->hdr_len;
>>>            xdp.data_end = xdp.data + (len - vi->hdr_len);
>>>            act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>
>>>            switch (act) {
>>>            case XDP_PASS:
>>> +            /* recalculate offset to account for any header
>>> +             * adjustments. Note other cases do not build an
>>> +             * skb and avoid using offset
>>> +             */
>>> +            offset = xdp.data -
>>> +                    page_address(xdp_page) - vi->hdr_len;
>>> +
>>>                /* We can only create skb based on xdp_page. */
>>>                if (unlikely(xdp_page != page)) {
>>>                    rcu_read_unlock();
>>>                    put_page(page);
>>>                    head_skb = page_to_skb(vi, rq, xdp_page,
>>> -                               0, len, PAGE_SIZE);
>>> +                               offset, len, PAGE_SIZE);
>>>                    ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>>>                    return head_skb;
> [...]
>
>>>
>>> -static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
>>> +static int add_recvbuf_mergeable(struct virtnet_info *vi,
>>> +                 struct receive_queue *rq, gfp_t gfp)
>>>    {
>>>        struct page_frag *alloc_frag = &rq->alloc_frag;
>>> +    unsigned int headroom = virtnet_get_headroom(vi);
>>>        char *buf;
>>>        unsigned long ctx;
>>>        int err;
>>>        unsigned int len, hole;
>>>
>>>        len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
>>> -    if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
>>> +    if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
>>>            return -ENOMEM;
>>>
>>>        buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>>> +    buf += headroom; /* advance address leaving hole at front of pkt */
>> Note: the headroom will reduce the possibility of frag coalescing, which
>> may hurt performance to some degree.
>>
>> [...]
> Right, there are a few other performance optimizations I am looking at in
> virtio as well, but those should go in as a follow-on series.
>
> Specifically, I'm looking at recycling buffers to see what sort of
> performance increase we can get out of that. Many of the hardware drivers
> do this and see a performance boost from it. However, dynamic buffer sizes
> like this make it a bit challenging.
>
> .John

Right.
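
(For completeness: the refill-path headroom comes from a small helper that
only reserves space while an XDP program is attached, presumably along
these lines; a sketch, the actual patch may differ in detail.)

    static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
    {
        /* Reserve headroom only when XDP is enabled, so the non-XDP
         * case keeps buffers densely packed and coalescing intact.
         */
        return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
    }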

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks
