Re: [RFC PATCH] virtio_net: XDP support for adjust_head

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
	john.r.fastabend@intel.com, netdev@vger.kernel.org,
	alexei.starovoitov@gmail.com, daniel@iogearbox.net
Subject: Re: [RFC PATCH] virtio_net: XDP support for adjust_head
Date: Wed, 4 Jan 2017 00:16:14 +0200	[thread overview]
Message-ID: <20170104001332-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <73715f7a-eeeb-679f-a7b8-7b1fefe1757e@redhat.com>

On Tue, Jan 03, 2017 at 02:01:27PM +0800, Jason Wang wrote:
> 
> 
> On 2017年01月03日 03:44, John Fastabend wrote:
> > Add support for XDP adjust head by allocating a 256B header region
> > that XDP programs can grow into. This is only enabled when a XDP
> > program is loaded.
> > 
> > In order to ensure that we do not have to unwind queue headroom push
> > queue setup below bpf_prog_add. It reads better to do a prog ref
> > unwind vs another queue setup call.
> > 
> > : There is a problem with this patch as is. When xdp prog is loaded
> >    the old buffers without the 256B headers need to be flushed so that
> >    the bpf prog has the necessary headroom. This patch does this by
> >    calling the virtqueue_detach_unused_buf() and followed by the
> >    virtnet_set_queues() call to reinitialize the buffers. However I
> >    don't believe this is safe per comment in virtio_ring this API
> >    is not valid on an active queue and the only thing we have done
> >    here is napi_disable/napi_enable wrappers which doesn't do anything
> >    to the emulation layer.
> > 
> >    So the RFC is really to find the best solution to this problem.
> >    A couple things come to mind, (a) always allocate the necessary
> >    headroom but this is a bit of a waste (b) add some bit somewhere
> >    to check if the buffer has headroom but this would mean XDP programs
> >    would be broke for a cycle through the ring, (c) figure out how
> >    to deactivate a queue, free the buffers and finally reallocate.
> >    I think (c) is the best choice for now but I'm not seeing the
> >    API to do this so virtio/qemu experts anyone know off-hand
> >    how to make this work? I started looking into the PCI callbacks
> >    reset() and virtio_device_ready() or possibly hitting the right
> >    set of bits with vp_set_status() but my first attempt just hung
> >    the device.
> 
> Hi John:
> 
> AFAIK, disabling a specific queue was supported only by virtio 1.0 through
> queue_enable field in pci common cfg.

In fact 1.0 only allows enabling queues selectively.
We can add disabling by a spec enhancement but
for now reset is the only way.


> But unfortunately, qemu does not
> emulate this at all and legacy device does not even support this. So the
> safe way is probably reset the device and redo the initialization here.

You will also have to re-apply rx filtering if you do this.
Probably sending notification uplink.

> > 
> > Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> > ---
> >   drivers/net/virtio_net.c |  106 +++++++++++++++++++++++++++++++++++-----------
> >   1 file changed, 80 insertions(+), 26 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 5deeda6..fcc5bd7 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -159,6 +159,9 @@ struct virtnet_info {
> >   	/* Ethtool settings */
> >   	u8 duplex;
> >   	u32 speed;
> > +
> > +	/* Headroom allocated in RX Queue */
> > +	unsigned int headroom;
> >   };
> >   struct padded_vnet_hdr {
> > @@ -355,6 +358,7 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
> >   	}
> >   	if (vi->mergeable_rx_bufs) {
> > +		xdp->data -= sizeof(struct virtio_net_hdr_mrg_rxbuf);
> >   		/* Zero header and leave csum up to XDP layers */
> >   		hdr = xdp->data;
> >   		memset(hdr, 0, vi->hdr_len);
> > @@ -371,7 +375,7 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
> >   		num_sg = 2;
> >   		sg_init_table(sq->sg, 2);
> >   		sg_set_buf(sq->sg, hdr, vi->hdr_len);
> > -		skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
> > +		skb_to_sgvec(skb, sq->sg + 1, vi->headroom, xdp->data_end - xdp->data);
> 
> vi->headroom look suspicious, should it be xdp->data - xdp->data_hard_start?
> 
> >   	}
> >   	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
> >   				   data, GFP_ATOMIC);
> > @@ -393,34 +397,39 @@ static u32 do_xdp_prog(struct virtnet_info *vi,
> >   		       struct bpf_prog *xdp_prog,
> >   		       void *data, int len)
> >   {
> > -	int hdr_padded_len;
> >   	struct xdp_buff xdp;
> > -	void *buf;
> >   	unsigned int qp;
> >   	u32 act;
> > +
> >   	if (vi->mergeable_rx_bufs) {
> > -		hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> > -		xdp.data = data + hdr_padded_len;
> > +		int desc_room = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> > +
> > +		/* Allow consuming headroom but reserve enough space to push
> > +		 * the descriptor on if we get an XDP_TX return code.
> > +		 */
> > +		xdp.data_hard_start = data - vi->headroom + desc_room;
> > +		xdp.data = data + desc_room;
> >   		xdp.data_end = xdp.data + (len - vi->hdr_len);
> > -		buf = data;
> >   	} else { /* small buffers */
> >   		struct sk_buff *skb = data;
> > -		xdp.data = skb->data;
> > +		xdp.data_hard_start = skb->data;
> > +		xdp.data = skb->data + vi->headroom;
> >   		xdp.data_end = xdp.data + len;
> > -		buf = skb->data;
> >   	}
> >   	act = bpf_prog_run_xdp(xdp_prog, &xdp);
> >   	switch (act) {
> >   	case XDP_PASS:
> > +		if (!vi->mergeable_rx_bufs)
> > +			__skb_pull((struct sk_buff *) data,
> > +				   xdp.data - xdp.data_hard_start);
> 
> Instead of doing things here and virtnet_xdp_xmit(). How about always making
> skb->data point to the buffer head like:
> 
> 1) reserve headroom in add_recvbuf_small()
> 2) skb_push(xdp->data - xdp_data_hard_start, skb) if we detect xdp->data was
> modified afer bpf_prog_run_xdp()
> 
> Then there's no special code in either XDP_PASS or XDP_TX?
> 
> >   		return XDP_PASS;
> >   	case XDP_TX:
> >   		qp = vi->curr_queue_pairs -
> >   			vi->xdp_queue_pairs +
> >   			smp_processor_id();
> 
> [...]
> 
> > +#define VIRTIO_XDP_HEADROOM 256
> > +
> >   static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
> >   {
> >   	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
> >   	struct virtnet_info *vi = netdev_priv(dev);
> >   	struct bpf_prog *old_prog;
> >   	u16 xdp_qp = 0, curr_qp;
> > +	unsigned int old_hr;
> >   	int i, err;
> >   	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > @@ -1736,19 +1751,58 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
> >   		return -ENOMEM;
> >   	}
> > +	old_hr = vi->headroom;
> > +	if (prog) {
> > +		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
> > +		if (IS_ERR(prog))
> > +			return PTR_ERR(prog);
> > +		vi->headroom = VIRTIO_XDP_HEADROOM;
> > +	} else {
> > +		vi->headroom = 0;
> > +	}
> > +
> > +	/* Changing the headroom in buffers is a disruptive operation because
> > +	 * existing buffers must be flushed and reallocated. This will happen
> > +	 * when a xdp program is initially added or xdp is disabled by removing
> > +	 * the xdp program.
> > +	 */
> 
> We probably need reset the device here, but maybe Michale has more ideas.
> And if we do this, another interesting thing to do is to disable EWMA and
> always use a single page for each packet, this could almost eliminate
> linearizing.
> 
> Thanks

next prev parent reply	other threads:[~2017-01-03 22:16 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-02 19:44 [RFC PATCH] virtio_net: XDP support for adjust_head John Fastabend
2017-01-02 19:47 ` John Fastabend
2017-01-03  6:01 ` Jason Wang
2017-01-03 16:54   ` John Fastabend
2017-01-03 16:57     ` John Fastabend
2017-01-04  3:22       ` Jason Wang
2017-01-04  3:21     ` Jason Wang
2017-01-03 22:16   ` Michael S. Tsirkin [this message]
2017-01-05 22:57     ` John Fastabend
2017-01-06  0:39       ` Michael S. Tsirkin
2017-01-06  3:28         ` John Fastabend
2017-01-04 18:58   ` John Fastabend
2017-01-05  3:10     ` Jason Wang
  -- strict thread matches above, loose matches on Subject: below --
2016-12-23 18:43 John Fastabend
2016-12-23 21:46 ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170104001332-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=daniel@iogearbox.net \
    --cc=jasowang@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=john.r.fastabend@intel.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).