From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jesper Dangaard Brouer Subject: Re: [PATCH RFC 6/9] veth: Add ndo_xdp_xmit Date: Mon, 30 Apr 2018 19:27:09 +0200 Message-ID: <20180430192709.1fc47265@redhat.com> References: <20180424143923.26519-1-toshiaki.makita1@gmail.com> <20180424143923.26519-7-toshiaki.makita1@gmail.com> <20180425222452.1ea5c69f@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: Toshiaki Makita , netdev@vger.kernel.org, brouer@redhat.com, Tariq Toukan To: Toshiaki Makita Return-path: Received: from mx3-rdu2.redhat.com ([66.187.233.73]:39796 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751938AbeD3R1P (ORCPT ); Mon, 30 Apr 2018 13:27:15 -0400 In-Reply-To: Sender: netdev-owner@vger.kernel.org List-ID: On Thu, 26 Apr 2018 19:52:40 +0900 Toshiaki Makita wrote: > On 2018/04/26 5:24, Jesper Dangaard Brouer wrote: > > On Tue, 24 Apr 2018 23:39:20 +0900 > > Toshiaki Makita wrote: > > > >> +static int veth_xdp_xmit(struct net_device *dev, struct xdp_frame *frame) > >> +{ > >> + struct veth_priv *rcv_priv, *priv = netdev_priv(dev); > >> + int headroom = frame->data - (void *)frame; > >> + struct net_device *rcv; > >> + int err = 0; > >> + > >> + rcv = rcu_dereference(priv->peer); > >> + if (unlikely(!rcv)) > >> + return -ENXIO; > >> + > >> + rcv_priv = netdev_priv(rcv); > >> + /* xdp_ring is initialized on receive side? */ > >> + if (rcu_access_pointer(rcv_priv->xdp_prog)) { > >> + err = xdp_ok_fwd_dev(rcv, frame->len); > >> + if (unlikely(err)) > >> + return err; > >> + > >> + err = veth_xdp_enqueue(rcv_priv, veth_xdp_to_ptr(frame)); > >> + } else { > >> + struct sk_buff *skb; > >> + > >> + skb = veth_build_skb(frame, headroom, frame->len, 0); > >> + if (unlikely(!skb)) > >> + return -ENOMEM; > >> + > >> + /* Get page ref in case skb is dropped in netif_rx. > >> + * The caller is responsible for freeing the page on error. > >> + */ > >> + get_page(virt_to_page(frame->data)); > > > > I'm not sure you can make this assumption, that xdp_frames coming from > > another device driver uses a refcnt based memory model. But maybe I'm > > confused, as this looks like an SKB receive path, but in the > > ndo_xdp_xmit(). > > I find this path similar to cpumap, which creates skb from redirected > xdp frame. Once it is converted to skb, skb head is freed by > page_frag_free, so anyway I needed to get the refcount here regardless > of memory model. Yes I know, I wrote cpumap ;-) First of all, I don't want to see such xdp_frame to SKB conversion code in every driver. Because that increase the chances of errors. And when looking at the details, then it seems that you have made the mistake of making it possible to leak xdp_frame info to the SKB (which cpumap takes into account). Second, I think the refcnt scheme here is wrong. The xdp_frame should be "owned" by XDP and have the proper refcnt to deliver it directly to the network stack. Third, if we choose that we want a fallback, in-case XDP is not enabled on egress dev (but it have an ndo_xdp_xmit), then it should be placed in the generic/core code. E.g. __bpf_tx_xdp_map() could look at the return code from dev->netdev_ops->ndo_xdp() and create an SKB. (Hint, this would make it easy to implement TX bulking towards the dev). -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer +static struct sk_buff *veth_build_skb(void *head, int headroom, int len, + int buflen) +{ + struct sk_buff *skb; + + if (!buflen) { + buflen = SKB_DATA_ALIGN(headroom + len) + + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + } + skb = build_skb(head, buflen); + if (!skb) + return NULL; + + skb_reserve(skb, headroom); + skb_put(skb, len); + + return skb; +}