From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: netdev@vger.kernel.org, "Björn Töpel" <bjorn.topel@intel.com>,
magnus.karlsson@intel.com, eugenia@mellanox.com,
"John Fastabend" <john.fastabend@gmail.com>,
"Eran Ben Elisha" <eranbe@mellanox.com>,
"Saeed Mahameed" <saeedm@mellanox.com>,
galp@mellanox.com, "Daniel Borkmann" <borkmann@iogearbox.net>,
"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
"Tariq Toukan" <tariqt@mellanox.com>,
brouer@redhat.com
Subject: Re: [bpf-next V2 PATCH 02/15] xdp: introduce xdp_return_frame API and use in cpumap
Date: Fri, 9 Mar 2018 10:35:36 +0100 [thread overview]
Message-ID: <20180309103536.1d7fdc58@redhat.com> (raw)
In-Reply-To: <2fcc8356-4d68-30dc-5424-4dea618ac6b5@redhat.com>
On Fri, 9 Mar 2018 15:24:10 +0800
Jason Wang <jasowang@redhat.com> wrote:
> On 2018-03-08 21:07, Jesper Dangaard Brouer wrote:
> > Introduce an xdp_return_frame API, and convert over cpumap as
> > the first user, given it has a queued XDP frame structure to leverage.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> > include/net/xdp.h | 32 +++++++++++++++++++++++++++
> > kernel/bpf/cpumap.c | 60 +++++++++++++++++++++++++++++++--------------------
> > net/core/xdp.c | 18 +++++++++++++++
> > 3 files changed, 86 insertions(+), 24 deletions(-)
> >
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index b2362ddfa694..3cb726a6dc5b 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -33,16 +33,48 @@
> > * also mandatory during RX-ring setup.
> > */
> >
> > +enum mem_type {
> > + MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */
> > + MEM_TYPE_PAGE_ORDER0, /* Orig XDP full page model */
> > + // Possible new ideas for types:
> > + // MEM_TYPE_PAGE_POOL, /* Will be added later */
> > + // MEM_TYPE_AF_XDP,
> > + // MEM_TYPE_DEVICE_OWNED -- invoking an dev->ndo?
> > + MEM_TYPE_MAX,
> > +};
>
> So if we plan to support dev->ndo, it looks to me like the two
> types AF_XDP and DEVICE_OWNED are sufficient?  The driver can do
> what it wants (e.g. page pool or ordinary page allocator) in the
> ndo or whatever other callbacks.
So the design is open to go both ways; we can figure that out later.

The reason I'm not calling page_pool from a driver-level dev->ndo is
that I'm trying to avoid invoking code in a module, as the driver
module code can be (in the process of being) unloaded from the kernel
while another driver is calling xdp_return_frame.  The current design
in patch 10 uses the RCU grace period to guarantee that the allocator
pointer stays valid as long as the ID lookup succeeded.
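
To illustrate the idea, a rough sketch only (names like mem_id_ht,
mem_id_rht_params and the release member are illustrative, see
patch 10 for the real code):

  struct xdp_mem_allocator *xa;

  rcu_read_lock();
  /* Resolve mem->id to the allocator.  The unregister side removes
   * the ID from the rhashtable first and only frees the allocator
   * after an RCU grace period, so a successful lookup means the
   * pointer is safe to use within this RCU section.
   */
  xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
  if (xa)
          xa->release(xa->allocator, data); /* type-specific release */
  rcu_read_unlock();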

To do what you propose, we also need to guarantee that the net_device
cannot disappear, so that it is safe to invoke dev->ndo.  To do so,
the driver would likely have to add an rcu_barrier before unloading.
I'm also thinking that for MEM_TYPE_DEVICE_OWNED the allocator
pointer ("under protection") could be the net_device pointer, and we
could take a refcnt on the dev (dev_hold/dev_put).  Thus, I think it
is doable, but let's figure this out later.
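
For MEM_TYPE_DEVICE_OWNED I imagine something like this completely
hypothetical sketch (the ndo below is made up; nothing like it exists
in this patchset):

  case MEM_TYPE_DEVICE_OWNED: {
          /* The "allocator" pointer doubles as the net_device.  A
           * refcnt was taken with dev_hold() when the memory model
           * was registered; dev_put() happens at unregister, after
           * the RCU grace period has passed.
           */
          struct net_device *dev = xa->allocator;

          dev->netdev_ops->ndo_xdp_frame_release(dev, data); /* made up */
          break;
  }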

Another important design consideration is that the xdp core needs to
know how to release the memory in case the ID lookup fails.  This is
another argument for keeping the memory release code inside the xdp
core, and for not leaving too much freedom to the drivers.
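
E.g. something along these lines (a sketch, not the actual patch
code):

  xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
  if (unlikely(!xa)) {
          /* ID lookup failed (allocator already unregistered), but
           * the xdp core still knows how to free the basic memory
           * types itself, without depending on any driver code.
           */
          if (mem->type == MEM_TYPE_PAGE_SHARED)
                  page_frag_free(data);
          else
                  put_page(virt_to_page(data));
          return;
  }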
> > +struct xdp_mem_info {
> > + u32 type; /* enum mem_type, but known size type */
> > + u32 id; // Needed later (to lookup struct xdp_rxq_info)
> > +};
> > +
> > struct xdp_rxq_info {
> > struct net_device *dev;
> > u32 queue_index;
> > u32 reg_state;
> > + struct xdp_mem_info mem;
> > } ____cacheline_aligned; /* perf critical, avoid false-sharing */
> >
> > +
> > +static inline
> > +void xdp_return_frame(void *data, struct xdp_mem_info *mem)
> > +{
> > + if (mem->type == MEM_TYPE_PAGE_SHARED)
> > + page_frag_free(data);
> > +
> > + if (mem->type == MEM_TYPE_PAGE_ORDER0) {
> > + struct page *page = virt_to_page(data); /* Assumes order0 page */
> > +
> > + put_page(page);
> > + }
> > +}
> > +
> > int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
> > struct net_device *dev, u32 queue_index);
> > void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
> > void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
> > bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
> > +int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
> > + enum mem_type type, void *allocator);
> >
> > #endif /* __LINUX_NET_XDP_H__ */
> > diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> > index a4bb0b34375a..3e4bbcbe3e86 100644
> > --- a/kernel/bpf/cpumap.c
> > +++ b/kernel/bpf/cpumap.c
[...]
> > static void get_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
> > {
> > atomic_inc(&rcpu->refcnt);
> > @@ -188,6 +168,10 @@ struct xdp_pkt {
> > u16 len;
> > u16 headroom;
> > u16 metasize;
> > + /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
> > + * while mem info is valid on remote CPU.
> > + */
>
> Can we simply move the xdp_mem_info to xdp_buff to avoid conversion?
No, xdp_buff is a stack-allocated piece of memory, so we do need a
conversion into another piece of memory that stays valid after the
driver's NAPI call returns.
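
The trick (see patch 05, which introduces xdp_frame) is that the
conversion does not allocate anything: the xdp_frame is placed at the
top of the packet's own headroom.  Roughly (a sketch; field names may
differ slightly from the patch):

  struct xdp_frame *xdpf = xdp->data_hard_start;

  /* The xdp_frame lives in the headroom of the packet it describes */
  xdpf->data     = xdp->data;
  xdpf->len      = xdp->data_end - xdp->data;
  xdpf->headroom = xdp->data - xdp->data_hard_start - sizeof(*xdpf);
  xdpf->mem      = xdp->rxq->mem; /* mem info stays valid on remote CPU */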
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer