From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Edward Cree <ecree@solarflare.com>
Cc: <netdev@vger.kernel.org>, <jakub.kicinski@netronome.com>,
"Michael S. Tsirkin" <mst@redhat.com>, <pavel.odintsov@gmail.com>,
Jason Wang <jasowang@redhat.com>, <mchan@broadcom.com>,
John Fastabend <john.fastabend@gmail.com>,
<peter.waskiewicz.jr@intel.com>, <ast@fiberby.dk>,
Daniel Borkmann <borkmann@iogearbox.net>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Andy Gospodarek <andy@greyhouse.net>,
brouer@redhat.com
Subject: Re: [net-next V7 PATCH 3/5] bpf: cpumap xdp_buff to skb conversion and allocation
Date: Fri, 13 Oct 2017 11:13:47 +0200 [thread overview]
Message-ID: <20171013111347.1472fcfb@redhat.com> (raw)
In-Reply-To: <21efa4a5-1e2f-bbe9-c1d0-115843fb885b@solarflare.com>
On Thu, 12 Oct 2017 22:13:43 +0100
Edward Cree <ecree@solarflare.com> wrote:
> On 12/10/17 13:26, Jesper Dangaard Brouer wrote:
> > This patch makes cpumap functional, by adding SKB allocation and
> > invoking the network stack on the dequeuing CPU.
> >
> > For constructing the SKB on the remote CPU, the xdp_buff in converted
> > into a struct xdp_pkt, and it mapped into the top headroom of the
> > packet, to avoid allocating separate mem. For now, struct xdp_pkt is
> > just a cpumap internal data structure, with info carried between
> > enqueue to dequeue.
>
> <snip>
>
> > +struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
> > + struct xdp_pkt *xdp_pkt)
> > +{
> > + unsigned int frame_size;
> > + void *pkt_data_start;
> > + struct sk_buff *skb;
> > +
> > + /* build_skb need to place skb_shared_info after SKB end, and
> > + * also want to know the memory "truesize". Thus, need to
> > + * know the memory frame size backing xdp_buff.
> > + *
> > + * XDP was designed to have PAGE_SIZE frames, but this
> > + * assumption is not longer true with ixgbe and i40e. It
> > + * would be preferred to set frame_size to 2048 or 4096
> > + * depending on the driver.
> > + * frame_size = 2048;
> > + * frame_len = frame_size - sizeof(*xdp_pkt);
> > + *
> > + * Instead, with info avail, skb_shared_info in placed after
> > + * packet len. This, unfortunately fakes the truesize.
> > + * Another disadvantage of this approach, the skb_shared_info
> > + * is not at a fixed memory location, with mixed length
> > + * packets, which is bad for cache-line hotness.
> > + */
> > + frame_size = SKB_DATA_ALIGN(xdp_pkt->len) + xdp_pkt->headroom +
> > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > +
> > + pkt_data_start = xdp_pkt->data - xdp_pkt->headroom;
> > + skb = build_skb(pkt_data_start, frame_size);
> > + if (!skb)
> > + return NULL;
> > +
> > + skb_reserve(skb, xdp_pkt->headroom);
> > + __skb_put(skb, xdp_pkt->len);
> > + if (xdp_pkt->metasize)
> > + skb_metadata_set(skb, xdp_pkt->metasize);
> > +
> > + /* Essential SKB info: protocol and skb->dev */
> > + skb->protocol = eth_type_trans(skb, xdp_pkt->dev_rx);
> > +
> > + /* Optional SKB info, currently missing:
> > + * - HW checksum info (skb->ip_summed)
> > + * - HW RX hash (skb_set_hash)
> > + * - RX ring dev queue index (skb_record_rx_queue)
> > + */
> One possibility for dealing with these and related issues — also things
> like the proper way to free an xdp_buff if SKB creation fails, which
> might not be page_frag_free() for some drivers with unusual recycle ring
> implementations — is to have a new ndo for 'receiving' an xdp_pkt from a
> cpumap redirect.
> Since you're always receiving from the same driver that enqueued it, even
> the structure of the metadata stored in the top of the packet page
> doesn't have to be standardised; instead, each driver can put there just
> whatever happens to be needed for its ndo_xdp_rx routine. (Though there
> would probably be standard enqueue and dequeue functions that the
> 'common-case' drivers could use.)
> In some cases, the driver could even just leave in the page the packet
> prefix it got from the NIC, rather than reading it and then writing an
> interpreted version back, thus minimising the number of packet-page
> cachelines the 'bottom half' RX function has to touch (it would still
> need to write in anything it got from the RX event, of course).
> It shouldn't be much work as many driver RX routines are already
> structured this way — sfc, for instance, has a split into efx_rx_packet()
> and __efx_rx_packet(), as a software pipeline for prefetching.
This is all talking about future work. I'll be happy to discuss this
outside/after this patchset. I see nothing blocking these ideas.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
next prev parent reply other threads:[~2017-10-13 9:13 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-12 12:26 [net-next V7 PATCH 0/5] New bpf cpumap type for XDP_REDIRECT Jesper Dangaard Brouer
2017-10-12 12:26 ` [net-next V7 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP Jesper Dangaard Brouer
2017-10-12 20:35 ` Edward Cree
2017-10-13 8:17 ` Jesper Dangaard Brouer
2017-10-14 0:36 ` kbuild test robot
2017-10-12 12:26 ` [net-next V7 PATCH 2/5] bpf: XDP_REDIRECT enable use of cpumap Jesper Dangaard Brouer
2017-10-12 12:26 ` [net-next V7 PATCH 3/5] bpf: cpumap xdp_buff to skb conversion and allocation Jesper Dangaard Brouer
2017-10-12 21:13 ` Edward Cree
2017-10-13 9:13 ` Jesper Dangaard Brouer [this message]
2017-10-12 12:27 ` [net-next V7 PATCH 4/5] bpf: cpumap add tracepoints Jesper Dangaard Brouer
2017-10-14 16:54 ` David Miller
2017-10-12 12:27 ` [net-next V7 PATCH 5/5] samples/bpf: add cpumap sample program xdp_redirect_cpu Jesper Dangaard Brouer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171013111347.1472fcfb@redhat.com \
--to=brouer@redhat.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andy@greyhouse.net \
--cc=ast@fiberby.dk \
--cc=borkmann@iogearbox.net \
--cc=ecree@solarflare.com \
--cc=jakub.kicinski@netronome.com \
--cc=jasowang@redhat.com \
--cc=john.fastabend@gmail.com \
--cc=mchan@broadcom.com \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=pavel.odintsov@gmail.com \
--cc=peter.waskiewicz.jr@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).