Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v3] xen-netback: prefer xenbus_scanf() over xenbus_gather()
From: David Miller @ 2016-11-10  1:24 UTC (permalink / raw)
  To: JBeulich; +Cc: paul.durrant, wei.liu2, xen-devel, netdev
In-Reply-To: <582190C1020000780011CF5C@prv-mh.provo.novell.com>

From: "Jan Beulich" <JBeulich@suse.com>
Date: Tue, 08 Nov 2016 00:45:53 -0700

> For single items being collected this should be preferred as being more
> typesafe (as the compiler can check format string and to-be-written-to
> variable match) and more efficient (requiring one less parameter to be
> passed).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v3: For consistency with other code don't consider zero an error
>     (utilizing that xenbus_scanf() at present won't return zero).
> v2: Avoid commit message to continue from subject.

Applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH net-next] igmp: Document sysctl force_igmp_version
From: David Miller @ 2016-11-10  1:23 UTC (permalink / raw)
  To: liuhangbin; +Cc: netdev, hannes, daniel
In-Reply-To: <1478501483-6047-1-git-send-email-liuhangbin@gmail.com>

From: Hangbin Liu <liuhangbin@gmail.com>
Date: Mon,  7 Nov 2016 14:51:23 +0800

> There is some difference between force_igmp_version and force_mld_version.
> Add document to make users aware of this.
> 
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

Applied, thank you.

^ permalink raw reply

* Re: [Regression w/ patch] Restore network resistance to weird ICMP messages
From: David Miller @ 2016-11-10  1:22 UTC (permalink / raw)
  To: googuy; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <CAO1wt+Y7JfeCXnrR5BeZ7K6dErmB3nfdA4mRkFQ21S=VPCkLXg@mail.gmail.com>

From: Vicente Jiménez <googuy@gmail.com>
Date: Mon, 7 Nov 2016 12:11:59 +0100

> From bfc9a00e6b78d8eb60e46dacd7d761669d29a573 Mon Sep 17 00:00:00 2001
> From: Vicente Jimenez Aguilar <googuy@gmail.com>
> Date: Mon, 31 Oct 2016 13:10:29 +0100
> Subject: [PATCH] ipv4: icmp: Fix pMTU handling for rarest case
> 
> Restore network resistance to weird ICMP fragmentation needed messages
> with next hop MTU equal to (or exceeding) dropped packet size
> 
> Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
> Signed-off-by: Vicente Jimenez Aguilar <googuy@gmail.com>
> ---
>  net/ipv4/icmp.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 38abe70..c0af1d2 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -776,6 +776,7 @@ static bool icmp_unreach(struct sk_buff *skb)
>  	struct icmphdr *icmph;
>  	struct net *net;
>  	u32 info = 0;
> +	unsigned short old_mtu;
>  
>  	net = dev_net(skb_dst(skb)->dev);
>  

Order local variable declarations from longest to shortest line
please.

> +				if ( info >= old_mtu )

There should be no space after the '(' and before the ')' in this
conditional.

^ permalink raw reply

* Re: [PATCH net,v2] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
From: David Miller @ 2016-11-10  1:20 UTC (permalink / raw)
  To: stephen.suryaputra.lin; +Cc: netdev, ssurya
In-Reply-To: <1478558919-12014-1-git-send-email-ssurya@ieee.org>

From: Stephen Suryaputra Lin <stephen.suryaputra.lin@gmail.com>
Date: Mon,  7 Nov 2016 17:48:39 -0500

> ICMP redirects behavior is different after the commit above. An email
> requesting the explanation on why the behavior needs to be different
> was sent earlier to netdev (https://patchwork.ozlabs.org/patch/687728/).
> Since there isn't a reply yet, I decided to prepare this formal patch.
> 
> In v2.6 kernel, it used to be that ip_rt_redirect() calls
> arp_bind_neighbour() which returns 0 and then the state of the neigh for
> the new_gw is checked. If the state isn't valid then the redirected
> route is deleted. This behavior is maintained up to v3.5.7 by
> check_peer_redirect() because rt->rt_gateway is assigned to
> peer->redirect_learned.a4 before calling ipv4_neigh_lookup().
> 
> After the commit, ipv4_neigh_lookup() is performed without the
> rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
> isn't zero, the function uses it as the key. The neigh is most likely valid
> since the old_gw is the one that sends the ICMP redirect message. Then the
> new_gw is assigned to fib_nh_exception. The problem is: the new_gw ARP may
> never gets resolved and the traffic is blackholed.
> 
> Changes from v1:
>  - use __ipv4_neigh_lookup instead (per Eric Dumazet).
> 
> Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org>

The Fixes tag belongs in the commit message body, right before the
signoff(s).  And you need to therefore write an appropriate subject
line of the form:

	[PATCH net,v2] $SUBSYSTEM: $DESCRIPTION

^ permalink raw reply

* Re: [PATCH] rtnl: reset calcit fptr in rtnl_unregister()
From: David Miller @ 2016-11-10  1:18 UTC (permalink / raw)
  To: minipli; +Cc: netdev, jeffrey.t.kirsher, gregory.v.rose
In-Reply-To: <1478557339-16039-1-git-send-email-minipli@googlemail.com>

From: Mathias Krause <minipli@googlemail.com>
Date: Mon,  7 Nov 2016 23:22:19 +0100

> To avoid having dangling function pointers left behind, reset calcit in
> rtnl_unregister(), too.
> 
> This is no issue so far, as only the rtnl core registers a netlink
> handler with a calcit hook which won't be unregistered, but may become
> one if new code makes use of the calcit hook.
> 
> Fixes: c7ac8679bec9 ("rtnetlink: Compute and store minimum ifinfo...")
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: Greg Rose <gregory.v.rose@intel.com>
> Signed-off-by: Mathias Krause <minipli@googlemail.com>

Applied, thanks.

^ permalink raw reply

* Re: linux-next: manual merge of the net-next tree with the netfilter tree
From: Pablo Neira Ayuso @ 2016-11-10  0:31 UTC (permalink / raw)
  To: David Miller
  Cc: Stephen Rothwell, Networking, NetFilter, linux-next, linux-kernel,
	WANG Cong, Johannes Berg
In-Reply-To: <20161110105633.31ebdc76@canb.auug.org.au>

Hi David,

On Thu, Nov 10, 2016 at 10:56:33AM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the net-next tree got a conflict in:
> 
>   net/netfilter/ipvs/ip_vs_ctl.c
> 
> between commit:
> 
>   8fbfef7f505b ("ipvs: use IPVS_CMD_ATTR_MAX for family.maxattr")
> 
> from the netfilter tree and commit:
> 
>   489111e5c25b ("genetlink: statically initialize families")
> 
> from the net-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

I think I cannot help to address this conflict myself.

8fbfef7f505b is in my nf tree, while 489111e5c25b is in net-next. So
you will hit this conflict by when you pull net into net-next.

So please keep this patch from Stephen to resolve the conflict in your
radar to solve this.

Or let me know if you come up with any way I can handle this from here
to reduce your burden. Thanks.

> diff --cc net/netfilter/ipvs/ip_vs_ctl.c
> index a6e44ef2ec9a,6b85ded4f91d..000000000000
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@@ -3872,10 -3865,20 +3865,20 @@@ static const struct genl_ops ip_vs_genl
>   	},
>   };
>   
> + static struct genl_family ip_vs_genl_family __ro_after_init = {
> + 	.hdrsize	= 0,
> + 	.name		= IPVS_GENL_NAME,
> + 	.version	= IPVS_GENL_VERSION,
>  -	.maxattr	= IPVS_CMD_MAX,
> ++	.maxattr	= IPVS_CMD_ATTR_MAX,
> + 	.netnsok        = true,         /* Make ipvsadm to work on netns */
> + 	.module		= THIS_MODULE,
> + 	.ops		= ip_vs_genl_ops,
> + 	.n_ops		= ARRAY_SIZE(ip_vs_genl_ops),
> + };
> + 
>   static int __init ip_vs_genl_register(void)
>   {
> - 	return genl_register_family_with_ops(&ip_vs_genl_family,
> - 					     ip_vs_genl_ops);
> + 	return genl_register_family(&ip_vs_genl_family);
>   }
>   
>   static void ip_vs_genl_unregister(void)

^ permalink raw reply

* [PATCH 13/14] netfilter: conntrack: refine gc worker heuristics
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

Nicolas Dichtel says:
  After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to
  remove timed-out entries"), netlink conntrack deletion events may be
  sent with a huge delay.

Nicolas further points at this line:

  goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);

and indeed, this isn't optimal at all.  Rationale here was to ensure that
we don't block other work items for too long, even if
nf_conntrack_htable_size is huge.  But in order to have some guarantee
about maximum time period where a scan of the full conntrack table
completes we should always use a fixed slice size, so that once every
N scans the full table has been examined at least once.

We also need to balance this vs. the case where the system is either idle
(i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens
from packet path).

So, after some discussion with Nicolas:

1. want hard guarantee that we scan entire table at least once every X s
-> need to scan fraction of table (get rid of upper bound)

2. don't want to eat cycles on idle or very busy system
-> increase interval if we did not evict any entries

3. don't want to block other worker items for too long
-> make fraction really small, and prefer small scan interval instead

4. Want reasonable short time where we detect timed-out entry when
system went idle after a burst of traffic, while not doing scans
all the time.
-> Store next gc scan in worker, increasing delays when no eviction
happened and shrinking delay when we see timed out entries.

The old gc interval is turned into a max number, scans can now happen
every jiffy if stale entries are present.

Longest possible time period until an entry is evicted is now 2 minutes
in worst case (entry expires right after it was deemed 'not expired').

Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_core.c | 49 ++++++++++++++++++++++++++++++++-------
 1 file changed, 41 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index df2f5a3901df..0f87e5d21be7 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -76,6 +76,7 @@ struct conntrack_gc_work {
 	struct delayed_work	dwork;
 	u32			last_bucket;
 	bool			exiting;
+	long			next_gc_run;
 };
 
 static __read_mostly struct kmem_cache *nf_conntrack_cachep;
@@ -83,9 +84,11 @@ static __read_mostly spinlock_t nf_conntrack_locks_all_lock;
 static __read_mostly DEFINE_SPINLOCK(nf_conntrack_locks_all_lock);
 static __read_mostly bool nf_conntrack_locks_all;
 
+/* every gc cycle scans at most 1/GC_MAX_BUCKETS_DIV part of table */
 #define GC_MAX_BUCKETS_DIV	64u
-#define GC_MAX_BUCKETS		8192u
-#define GC_INTERVAL		(5 * HZ)
+/* upper bound of scan intervals */
+#define GC_INTERVAL_MAX		(2 * HZ)
+/* maximum conntracks to evict per gc run */
 #define GC_MAX_EVICTS		256u
 
 static struct conntrack_gc_work conntrack_gc_work;
@@ -936,13 +939,13 @@ static noinline int early_drop(struct net *net, unsigned int _hash)
 static void gc_worker(struct work_struct *work)
 {
 	unsigned int i, goal, buckets = 0, expired_count = 0;
-	unsigned long next_run = GC_INTERVAL;
-	unsigned int ratio, scanned = 0;
 	struct conntrack_gc_work *gc_work;
+	unsigned int ratio, scanned = 0;
+	unsigned long next_run;
 
 	gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
 
-	goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);
+	goal = nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV;
 	i = gc_work->last_bucket;
 
 	do {
@@ -982,17 +985,47 @@ static void gc_worker(struct work_struct *work)
 	if (gc_work->exiting)
 		return;
 
+	/*
+	 * Eviction will normally happen from the packet path, and not
+	 * from this gc worker.
+	 *
+	 * This worker is only here to reap expired entries when system went
+	 * idle after a busy period.
+	 *
+	 * The heuristics below are supposed to balance conflicting goals:
+	 *
+	 * 1. Minimize time until we notice a stale entry
+	 * 2. Maximize scan intervals to not waste cycles
+	 *
+	 * Normally, expired_count will be 0, this increases the next_run time
+	 * to priorize 2) above.
+	 *
+	 * As soon as a timed-out entry is found, move towards 1) and increase
+	 * the scan frequency.
+	 * In case we have lots of evictions next scan is done immediately.
+	 */
 	ratio = scanned ? expired_count * 100 / scanned : 0;
-	if (ratio >= 90 || expired_count == GC_MAX_EVICTS)
+	if (ratio >= 90 || expired_count == GC_MAX_EVICTS) {
+		gc_work->next_gc_run = 0;
 		next_run = 0;
+	} else if (expired_count) {
+		gc_work->next_gc_run /= 2U;
+		next_run = msecs_to_jiffies(1);
+	} else {
+		if (gc_work->next_gc_run < GC_INTERVAL_MAX)
+			gc_work->next_gc_run += msecs_to_jiffies(1);
+
+		next_run = gc_work->next_gc_run;
+	}
 
 	gc_work->last_bucket = i;
-	schedule_delayed_work(&gc_work->dwork, next_run);
+	queue_delayed_work(system_long_wq, &gc_work->dwork, next_run);
 }
 
 static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work)
 {
 	INIT_DELAYED_WORK(&gc_work->dwork, gc_worker);
+	gc_work->next_gc_run = GC_INTERVAL_MAX;
 	gc_work->exiting = false;
 }
 
@@ -1885,7 +1918,7 @@ int nf_conntrack_init_start(void)
 	nf_ct_untracked_status_or(IPS_CONFIRMED | IPS_UNTRACKED);
 
 	conntrack_gc_work_init(&conntrack_gc_work);
-	schedule_delayed_work(&conntrack_gc_work.dwork, GC_INTERVAL);
+	queue_delayed_work(system_long_wq, &conntrack_gc_work.dwork, GC_INTERVAL_MAX);
 
 	return 0;
 
-- 
2.1.4

^ permalink raw reply related

* [PATCH 14/14] netfilter: nf_tables: fix oops when inserting an element into a verdict map
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <zlpnobody@gmail.com>

Dalegaard says:
 The following ruleset, when loaded with 'nft -f bad.txt'
 ----snip----
 flush ruleset
 table ip inlinenat {
   map sourcemap {
     type ipv4_addr : verdict;
   }

   chain postrouting {
     ip saddr vmap @sourcemap accept
   }
 }
 add chain inlinenat test
 add element inlinenat sourcemap { 100.123.10.2 : jump test }
 ----snip----

 results in a kernel oops:
 BUG: unable to handle kernel paging request at 0000000000001344
 IP: [<ffffffffa07bf704>] nf_tables_check_loops+0x114/0x1f0 [nf_tables]
 [...]
 Call Trace:
  [<ffffffffa07c2aae>] ? nft_data_init+0x13e/0x1a0 [nf_tables]
  [<ffffffffa07c1950>] nft_validate_register_store+0x60/0xb0 [nf_tables]
  [<ffffffffa07c74b5>] nft_add_set_elem+0x545/0x5e0 [nf_tables]
  [<ffffffffa07bfdd0>] ? nft_table_lookup+0x30/0x60 [nf_tables]
  [<ffffffff8132c630>] ? nla_strcmp+0x40/0x50
  [<ffffffffa07c766e>] nf_tables_newsetelem+0x11e/0x210 [nf_tables]
  [<ffffffff8132c400>] ? nla_validate+0x60/0x80
  [<ffffffffa030d9b4>] nfnetlink_rcv+0x354/0x5a7 [nfnetlink]

Because we forget to fill the net pointer in bind_ctx, so dereferencing
it may cause kernel crash.

Reported-by: Dalegaard <dalegaard@gmail.com>
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 7d6a626b08f1..026581b04ea8 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3568,6 +3568,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 		dreg = nft_type_to_reg(set->dtype);
 		list_for_each_entry(binding, &set->bindings, list) {
 			struct nft_ctx bind_ctx = {
+				.net	= ctx->net,
 				.afi	= ctx->afi,
 				.table	= ctx->table,
 				.chain	= (struct nft_chain *)binding->chain,
-- 
2.1.4

^ permalink raw reply related

* [PATCH 11/14] netfilter: connmark: ignore skbs with magic untracked conntrack objects
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

The (percpu) untracked conntrack entries can end up with nonzero connmarks.

The 'untracked' conntrack objects are merely a way to distinguish INVALID
(i.e. protocol connection tracker says payload doesn't meet some
requirements or packet was never seen by the connection tracking code)
from packets that are intentionally not tracked (some icmpv6 types such as
neigh solicitation, or by using 'iptables -j CT --notrack' option).

Untracked conntrack objects are implementation detail, we might as well use
invalid magic address instead to tell INVALID and UNTRACKED apart.

Check skb->nfct for untracked dummy and behave as if skb->nfct is NULL.

Reported-by: XU Tianwen <evan.xu.tianwen@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/xt_connmark.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/xt_connmark.c b/net/netfilter/xt_connmark.c
index 69f78e96fdb4..b83e158e116a 100644
--- a/net/netfilter/xt_connmark.c
+++ b/net/netfilter/xt_connmark.c
@@ -44,7 +44,7 @@ connmark_tg(struct sk_buff *skb, const struct xt_action_param *par)
 	u_int32_t newmark;

 	ct = nf_ct_get(skb, &ctinfo);
-	if (ct == NULL)
+	if (ct == NULL || nf_ct_is_untracked(ct))
 		return XT_CONTINUE;

 	switch (info->mode) {
@@ -97,7 +97,7 @@ connmark_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	const struct nf_conn *ct;

 	ct = nf_ct_get(skb, &ctinfo);
-	if (ct == NULL)
+	if (ct == NULL || nf_ct_is_untracked(ct))
 		return false;

 	return ((ct->mark & info->mask) == info->mark) ^ info->invert;
-- 
2.1.4

^ permalink raw reply related

* [PATCH 05/14] netfilter: nf_tables: fix type mismatch with error return from nft_parse_u32_check
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: "John W. Linville" <linville@tuxdriver.com>

Commit 36b701fae12ac ("netfilter: nf_tables: validate maximum value of
u32 netlink attributes") introduced nft_parse_u32_check with a return
value of "unsigned int", yet on error it returns "-ERANGE".

This patch corrects the mismatch by changing the return value to "int",
which happens to match the actual users of nft_parse_u32_check already.

Found by Coverity, CID 1373930.

Note that commit 21a9e0f1568ea ("netfilter: nft_exthdr: fix error
handling in nft_exthdr_init()) attempted to address the issue, but
did not address the return type of nft_parse_u32_check.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Laura Garcia Liebana <nevola@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 36b701fae12ac ("netfilter: nf_tables: validate maximum value...")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h | 2 +-
 net/netfilter/nf_tables_api.c     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 741dcded5b4f..d79d1e9b9546 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -145,7 +145,7 @@ static inline enum nft_registers nft_type_to_reg(enum nft_data_types type)
 	return type == NFT_DATA_VERDICT ? NFT_REG_VERDICT : NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE;
 }
 
-unsigned int nft_parse_u32_check(const struct nlattr *attr, int max, u32 *dest);
+int nft_parse_u32_check(const struct nlattr *attr, int max, u32 *dest);
 unsigned int nft_parse_register(const struct nlattr *attr);
 int nft_dump_register(struct sk_buff *skb, unsigned int attr, unsigned int reg);
 
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 86e48aeb20be..365d31b86816 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4422,7 +4422,7 @@ static int nf_tables_check_loops(const struct nft_ctx *ctx,
  *	Otherwise a 0 is returned and the attribute value is stored in the
  *	destination variable.
  */
-unsigned int nft_parse_u32_check(const struct nlattr *attr, int max, u32 *dest)
+int nft_parse_u32_check(const struct nlattr *attr, int max, u32 *dest)
 {
 	u32 val;
 
-- 
2.1.4

^ permalink raw reply related

* [PATCH 08/14] netfilter: nf_tables: destroy the set if fail to add transaction
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <zlpnobody@gmail.com>

When the memory is exhausted, then we will fail to add the NFT_MSG_NEWSET
transaction. In such case, we should destroy the set before we free it.

Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 365d31b86816..7d6a626b08f1 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2956,12 +2956,14 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk,
 
 	err = nft_trans_set_add(&ctx, NFT_MSG_NEWSET, set);
 	if (err < 0)
-		goto err2;
+		goto err3;
 
 	list_add_tail_rcu(&set->list, &table->sets);
 	table->use++;
 	return 0;
 
+err3:
+	ops->destroy(set);
 err2:
 	kfree(set);
 err1:
-- 
2.1.4

^ permalink raw reply related

* [PATCH 09/14] netfilter: nft_dup: do not use sreg_dev if the user doesn't specify it
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <zlpnobody@gmail.com>

The NFTA_DUP_SREG_DEV attribute is not a must option, so we should use it
in routing lookup only when the user specify it.

Fixes: d877f07112f1 ("netfilter: nf_tables: add nft_dup expression")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/nft_dup_ipv4.c | 6 ++++--
 net/ipv6/netfilter/nft_dup_ipv6.c | 6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/netfilter/nft_dup_ipv4.c b/net/ipv4/netfilter/nft_dup_ipv4.c
index bf855e64fc45..0c01a270bf9f 100644
--- a/net/ipv4/netfilter/nft_dup_ipv4.c
+++ b/net/ipv4/netfilter/nft_dup_ipv4.c
@@ -28,7 +28,7 @@ static void nft_dup_ipv4_eval(const struct nft_expr *expr,
 	struct in_addr gw = {
 		.s_addr = (__force __be32)regs->data[priv->sreg_addr],
 	};
-	int oif = regs->data[priv->sreg_dev];
+	int oif = priv->sreg_dev ? regs->data[priv->sreg_dev] : -1;
 
 	nf_dup_ipv4(pkt->net, pkt->skb, pkt->hook, &gw, oif);
 }
@@ -59,7 +59,9 @@ static int nft_dup_ipv4_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
 	struct nft_dup_ipv4 *priv = nft_expr_priv(expr);
 
-	if (nft_dump_register(skb, NFTA_DUP_SREG_ADDR, priv->sreg_addr) ||
+	if (nft_dump_register(skb, NFTA_DUP_SREG_ADDR, priv->sreg_addr))
+		goto nla_put_failure;
+	if (priv->sreg_dev &&
 	    nft_dump_register(skb, NFTA_DUP_SREG_DEV, priv->sreg_dev))
 		goto nla_put_failure;
 
diff --git a/net/ipv6/netfilter/nft_dup_ipv6.c b/net/ipv6/netfilter/nft_dup_ipv6.c
index 8bfd470cbe72..831f86e1ec08 100644
--- a/net/ipv6/netfilter/nft_dup_ipv6.c
+++ b/net/ipv6/netfilter/nft_dup_ipv6.c
@@ -26,7 +26,7 @@ static void nft_dup_ipv6_eval(const struct nft_expr *expr,
 {
 	struct nft_dup_ipv6 *priv = nft_expr_priv(expr);
 	struct in6_addr *gw = (struct in6_addr *)&regs->data[priv->sreg_addr];
-	int oif = regs->data[priv->sreg_dev];
+	int oif = priv->sreg_dev ? regs->data[priv->sreg_dev] : -1;
 
 	nf_dup_ipv6(pkt->net, pkt->skb, pkt->hook, gw, oif);
 }
@@ -57,7 +57,9 @@ static int nft_dup_ipv6_dump(struct sk_buff *skb, const struct nft_expr *expr)
 {
 	struct nft_dup_ipv6 *priv = nft_expr_priv(expr);
 
-	if (nft_dump_register(skb, NFTA_DUP_SREG_ADDR, priv->sreg_addr) ||
+	if (nft_dump_register(skb, NFTA_DUP_SREG_ADDR, priv->sreg_addr))
+		goto nla_put_failure;
+	if (priv->sreg_dev &&
 	    nft_dump_register(skb, NFTA_DUP_SREG_DEV, priv->sreg_dev))
 		goto nla_put_failure;
 
-- 
2.1.4

^ permalink raw reply related

* [PATCH 06/14] netfilter: conntrack: avoid excess memory allocation
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

This is now a fixed-size extension, so we don't need to pass a variable
alloc size.  This (harmless) error results in allocating 32 instead of
the needed 16 bytes for this extension as the size gets passed twice.

Fixes: 23014011ba420 ("netfilter: conntrack: support a fixed size of 128 distinct labels")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_conntrack_labels.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_labels.h b/include/net/netfilter/nf_conntrack_labels.h
index 498814626e28..1723a67c0b0a 100644
--- a/include/net/netfilter/nf_conntrack_labels.h
+++ b/include/net/netfilter/nf_conntrack_labels.h
@@ -30,8 +30,7 @@ static inline struct nf_conn_labels *nf_ct_labels_ext_add(struct nf_conn *ct)
 	if (net->ct.labels_used == 0)
 		return NULL;
 
-	return nf_ct_ext_add_length(ct, NF_CT_EXT_LABELS,
-				    sizeof(struct nf_conn_labels), GFP_ATOMIC);
+	return nf_ct_ext_add(ct, NF_CT_EXT_LABELS, GFP_ATOMIC);
 #else
 	return NULL;
 #endif
-- 
2.1.4

^ permalink raw reply related

* [PATCH 03/14] netfilter: nf_tables: fix race when create new element in dynset
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <zlpnobody@gmail.com>

Packets may race when create the new element in nft_hash_update:
       CPU0                 CPU1
  lookup_fast - fail     lookup_fast - fail
       new - ok             new - ok
     insert - ok         insert - fail(EEXIST)

So when race happened, we reuse the existing element. Otherwise,
these *racing* packets will not be handled properly.

Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_set_hash.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 88d9fc8343e7..a3dface3e6e6 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -98,7 +98,7 @@ static bool nft_hash_update(struct nft_set *set, const u32 *key,
 			    const struct nft_set_ext **ext)
 {
 	struct nft_hash *priv = nft_set_priv(set);
-	struct nft_hash_elem *he;
+	struct nft_hash_elem *he, *prev;
 	struct nft_hash_cmp_arg arg = {
 		.genmask = NFT_GENMASK_ANY,
 		.set	 = set,
@@ -112,9 +112,18 @@ static bool nft_hash_update(struct nft_set *set, const u32 *key,
 	he = new(set, expr, regs);
 	if (he == NULL)
 		goto err1;
-	if (rhashtable_lookup_insert_key(&priv->ht, &arg, &he->node,
-					 nft_hash_params))
+
+	prev = rhashtable_lookup_get_insert_key(&priv->ht, &arg, &he->node,
+						nft_hash_params);
+	if (IS_ERR(prev))
 		goto err2;
+
+	/* Another cpu may race to insert the element with the same key */
+	if (prev) {
+		nft_set_elem_destroy(set, he, true);
+		he = prev;
+	}
+
 out:
 	*ext = &he->ext;
 	return true;
-- 
2.1.4

^ permalink raw reply related

* [PATCH 02/14] netfilter: nf_tables: fix *leak* when expr clone fail
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <zlpnobody@gmail.com>

When nft_expr_clone failed, a series of problems will happen:

1. module refcnt will leak, we call __module_get at the beginning but
   we forget to put it back if ops->clone returns fail
2. memory will be leaked, if clone fail, we just return NULL and forget
   to free the alloced element
3. set->nelems will become incorrect when set->size is specified. If
   clone fail, we should decrease the set->nelems

Now this patch fixes these problems. And fortunately, clone fail will
only happen on counter expression when memory is exhausted.

Fixes: 086f332167d6 ("netfilter: nf_tables: add clone interface to expression operations")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h |  6 ++++--
 net/netfilter/nf_tables_api.c     | 11 ++++++-----
 net/netfilter/nft_dynset.c        | 16 ++++++++++------
 net/netfilter/nft_set_hash.c      |  4 ++--
 net/netfilter/nft_set_rbtree.c    |  2 +-
 5 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 5031e072567b..741dcded5b4f 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -542,7 +542,8 @@ void *nft_set_elem_init(const struct nft_set *set,
 			const struct nft_set_ext_tmpl *tmpl,
 			const u32 *key, const u32 *data,
 			u64 timeout, gfp_t gfp);
-void nft_set_elem_destroy(const struct nft_set *set, void *elem);
+void nft_set_elem_destroy(const struct nft_set *set, void *elem,
+			  bool destroy_expr);
 
 /**
  *	struct nft_set_gc_batch_head - nf_tables set garbage collection batch
@@ -693,7 +694,6 @@ static inline int nft_expr_clone(struct nft_expr *dst, struct nft_expr *src)
 {
 	int err;
 
-	__module_get(src->ops->type->owner);
 	if (src->ops->clone) {
 		dst->ops = src->ops;
 		err = src->ops->clone(dst, src);
@@ -702,6 +702,8 @@ static inline int nft_expr_clone(struct nft_expr *dst, struct nft_expr *src)
 	} else {
 		memcpy(dst, src, src->ops->size);
 	}
+
+	__module_get(src->ops->type->owner);
 	return 0;
 }
 
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 24db22257586..86e48aeb20be 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3452,14 +3452,15 @@ void *nft_set_elem_init(const struct nft_set *set,
 	return elem;
 }
 
-void nft_set_elem_destroy(const struct nft_set *set, void *elem)
+void nft_set_elem_destroy(const struct nft_set *set, void *elem,
+			  bool destroy_expr)
 {
 	struct nft_set_ext *ext = nft_set_elem_ext(set, elem);
 
 	nft_data_uninit(nft_set_ext_key(ext), NFT_DATA_VALUE);
 	if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA))
 		nft_data_uninit(nft_set_ext_data(ext), set->dtype);
-	if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPR))
+	if (destroy_expr && nft_set_ext_exists(ext, NFT_SET_EXT_EXPR))
 		nf_tables_expr_destroy(NULL, nft_set_ext_expr(ext));
 
 	kfree(elem);
@@ -3812,7 +3813,7 @@ void nft_set_gc_batch_release(struct rcu_head *rcu)
 
 	gcb = container_of(rcu, struct nft_set_gc_batch, head.rcu);
 	for (i = 0; i < gcb->head.cnt; i++)
-		nft_set_elem_destroy(gcb->head.set, gcb->elems[i]);
+		nft_set_elem_destroy(gcb->head.set, gcb->elems[i], true);
 	kfree(gcb);
 }
 EXPORT_SYMBOL_GPL(nft_set_gc_batch_release);
@@ -4030,7 +4031,7 @@ static void nf_tables_commit_release(struct nft_trans *trans)
 		break;
 	case NFT_MSG_DELSETELEM:
 		nft_set_elem_destroy(nft_trans_elem_set(trans),
-				     nft_trans_elem(trans).priv);
+				     nft_trans_elem(trans).priv, true);
 		break;
 	}
 	kfree(trans);
@@ -4171,7 +4172,7 @@ static void nf_tables_abort_release(struct nft_trans *trans)
 		break;
 	case NFT_MSG_NEWSETELEM:
 		nft_set_elem_destroy(nft_trans_elem_set(trans),
-				     nft_trans_elem(trans).priv);
+				     nft_trans_elem(trans).priv, true);
 		break;
 	}
 	kfree(trans);
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index bfdb689664b0..31ca94793aa9 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -44,18 +44,22 @@ static void *nft_dynset_new(struct nft_set *set, const struct nft_expr *expr,
 				 &regs->data[priv->sreg_key],
 				 &regs->data[priv->sreg_data],
 				 timeout, GFP_ATOMIC);
-	if (elem == NULL) {
-		if (set->size)
-			atomic_dec(&set->nelems);
-		return NULL;
-	}
+	if (elem == NULL)
+		goto err1;
 
 	ext = nft_set_elem_ext(set, elem);
 	if (priv->expr != NULL &&
 	    nft_expr_clone(nft_set_ext_expr(ext), priv->expr) < 0)
-		return NULL;
+		goto err2;
 
 	return elem;
+
+err2:
+	nft_set_elem_destroy(set, elem, false);
+err1:
+	if (set->size)
+		atomic_dec(&set->nelems);
+	return NULL;
 }
 
 static void nft_dynset_eval(const struct nft_expr *expr,
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 3794cb2fc788..88d9fc8343e7 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -120,7 +120,7 @@ static bool nft_hash_update(struct nft_set *set, const u32 *key,
 	return true;
 
 err2:
-	nft_set_elem_destroy(set, he);
+	nft_set_elem_destroy(set, he, true);
 err1:
 	return false;
 }
@@ -332,7 +332,7 @@ static int nft_hash_init(const struct nft_set *set,
 
 static void nft_hash_elem_destroy(void *ptr, void *arg)
 {
-	nft_set_elem_destroy((const struct nft_set *)arg, ptr);
+	nft_set_elem_destroy((const struct nft_set *)arg, ptr, true);
 }
 
 static void nft_hash_destroy(const struct nft_set *set)
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 38b5bda242f8..36493a7cae88 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -266,7 +266,7 @@ static void nft_rbtree_destroy(const struct nft_set *set)
 	while ((node = priv->root.rb_node) != NULL) {
 		rb_erase(node, &priv->root);
 		rbe = rb_entry(node, struct nft_rbtree_elem, node);
-		nft_set_elem_destroy(set, rbe);
+		nft_set_elem_destroy(set, rbe, true);
 	}
 }
 
-- 
2.1.4

^ permalink raw reply related

* [PATCH 00/14] Netfilter fixes for net
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The following patchset contains a larger than usual batch of Netfilter
fixes for your net tree. This series contains a mixture of old bugs and
recently introduced bugs, they are:

1) Fix a crash when using nft_dynset with nft_set_rbtree, which doesn't
   support the set element updates from the packet path. From Liping
   Zhang.

2) Fix leak when nft_expr_clone() fails, from Liping Zhang.

3) Fix a race when inserting new elements to the set hash from the
   packet path, also from Liping.

4) Handle segmented TCP SIP packets properly, basically avoid that the
   INVITE in the allow header create bogus expectations by performing
   stricter SIP message parsing, from Ulrich Weber.

5) nft_parse_u32_check() should return signed integer for errors, from
   John Linville.

6) Fix wrong allocation instead of connlabels, allocate 16 instead of
   32 bytes, from Florian Westphal.

7) Fix compilation breakage when building the ip_vs_sync code with
   CONFIG_OPTIMIZE_INLINING on x86, from Arnd Bergmann.

8) Destroy the new set if the transaction object cannot be allocated,
   also from Liping Zhang.

9) Use device to route duplicated packets via nft_dup only when set by
   the user, otherwise packets may not follow the right route, again
   from Liping.

10) Fix wrong maximum genetlink attribute definition in IPVS, from
    WANG Cong.

11) Ignore untracked conntrack objects from xt_connmark, from Florian
    Westphal.

12) Allow to use conntrack helpers that are registered NFPROTO_UNSPEC
    via CT target, otherwise we cannot use the h.245 helper, from
    Florian.

13) Revisit garbage collection heuristic in the new workqueue-based
    timer approach for conntrack to evict objects earlier, again from
    Florian.

14) Fix crash in nf_tables when inserting an element into a verdict map,
    from Liping Zhang.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

----------------------------------------------------------------

The following changes since commit 67f0160fe34ec5391a428603b9832c9f99d8f3a1:

  MAINTAINERS: Update qlogic networking drivers (2016-10-26 23:29:12 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 58c78e104d937c1f560fb10ed9bb2dcde0db4fcf:

  netfilter: nf_tables: fix oops when inserting an element into a verdict map (2016-11-08 23:53:39 +0100)

----------------------------------------------------------------
Arnd Bergmann (1):
      netfilter: ip_vs_sync: fix bogus maybe-uninitialized warning

Florian Westphal (4):
      netfilter: conntrack: avoid excess memory allocation
      netfilter: connmark: ignore skbs with magic untracked conntrack objects
      netfilter: conntrack: fix CT target for UNSPEC helpers
      netfilter: conntrack: refine gc worker heuristics

John W. Linville (1):
      netfilter: nf_tables: fix type mismatch with error return from nft_parse_u32_check

Liping Zhang (6):
      netfilter: nft_dynset: fix panic if NFT_SET_HASH is not enabled
      netfilter: nf_tables: fix *leak* when expr clone fail
      netfilter: nf_tables: fix race when create new element in dynset
      netfilter: nf_tables: destroy the set if fail to add transaction
      netfilter: nft_dup: do not use sreg_dev if the user doesn't specify it
      netfilter: nf_tables: fix oops when inserting an element into a verdict map

Ulrich Weber (1):
      netfilter: nf_conntrack_sip: extend request line validation

WANG Cong (1):
      ipvs: use IPVS_CMD_ATTR_MAX for family.maxattr

 include/net/netfilter/nf_conntrack_labels.h |  3 +-
 include/net/netfilter/nf_tables.h           |  8 +++--
 net/ipv4/netfilter/nft_dup_ipv4.c           |  6 ++--
 net/ipv6/netfilter/nft_dup_ipv6.c           |  6 ++--
 net/netfilter/ipvs/ip_vs_ctl.c              |  2 +-
 net/netfilter/ipvs/ip_vs_sync.c             |  7 +++--
 net/netfilter/nf_conntrack_core.c           | 49 ++++++++++++++++++++++++-----
 net/netfilter/nf_conntrack_helper.c         | 11 +++++--
 net/netfilter/nf_conntrack_sip.c            |  5 ++-
 net/netfilter/nf_tables_api.c               | 18 ++++++-----
 net/netfilter/nft_dynset.c                  | 19 +++++++----
 net/netfilter/nft_set_hash.c                | 19 ++++++++---
 net/netfilter/nft_set_rbtree.c              |  2 +-
 net/netfilter/xt_connmark.c                 |  4 +--
 14 files changed, 114 insertions(+), 45 deletions(-)

^ permalink raw reply

* [PATCH 12/14] netfilter: conntrack: fix CT target for UNSPEC helpers
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

Thomas reports its not possible to attach the H.245 helper:

iptables -t raw -A PREROUTING -p udp -j CT --helper H.245
iptables: No chain/target/match by that name.
xt_CT: No such helper "H.245"

This is because H.245 registers as NFPROTO_UNSPEC, but the CT target
passes NFPROTO_IPV4/IPV6 to nf_conntrack_helper_try_module_get.

We should treat UNSPEC as wildcard and ignore the l3num instead.

Reported-by: Thomas Woerner <twoerner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_helper.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 336e21559e01..7341adf7059d 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -138,9 +138,14 @@ __nf_conntrack_helper_find(const char *name, u16 l3num, u8 protonum)
 
 	for (i = 0; i < nf_ct_helper_hsize; i++) {
 		hlist_for_each_entry_rcu(h, &nf_ct_helper_hash[i], hnode) {
-			if (!strcmp(h->name, name) &&
-			    h->tuple.src.l3num == l3num &&
-			    h->tuple.dst.protonum == protonum)
+			if (strcmp(h->name, name))
+				continue;
+
+			if (h->tuple.src.l3num != NFPROTO_UNSPEC &&
+			    h->tuple.src.l3num != l3num)
+				continue;
+
+			if (h->tuple.dst.protonum == protonum)
 				return h;
 		}
 	}
-- 
2.1.4


^ permalink raw reply related

* [PATCH 10/14] ipvs: use IPVS_CMD_ATTR_MAX for family.maxattr
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: WANG Cong <xiyou.wangcong@gmail.com>

family.maxattr is the max index for policy[], the size of
ops[] is determined with ARRAY_SIZE().

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index c3c809b2e712..a6e44ef2ec9a 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2845,7 +2845,7 @@ static struct genl_family ip_vs_genl_family = {
 	.hdrsize	= 0,
 	.name		= IPVS_GENL_NAME,
 	.version	= IPVS_GENL_VERSION,
-	.maxattr	= IPVS_CMD_MAX,
+	.maxattr	= IPVS_CMD_ATTR_MAX,
 	.netnsok        = true,         /* Make ipvsadm to work on netns */
 };
 
-- 
2.1.4


^ permalink raw reply related

* [PATCH 07/14] netfilter: ip_vs_sync: fix bogus maybe-uninitialized warning
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: Arnd Bergmann <arnd@arndb.de>

Building the ip_vs_sync code with CONFIG_OPTIMIZE_INLINING on x86
confuses the compiler to the point where it produces a rather
dubious warning message:

net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  struct ip_vs_sync_conn_options opt;
                                 ^~~
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

The problem appears to be a combination of a number of factors, including
the __builtin_bswap32 compiler builtin being slightly odd, having a large
amount of code inlined into a single function, and the way that some
functions only get partially inlined here.

I've spent way too much time trying to work out a way to improve the
code, but the best I've come up with is to add an explicit memset
right before the ip_vs_seq structure is first initialized here. When
the compiler works correctly, this has absolutely no effect, but in the
case that produces the warning, the warning disappears.

In the process of analysing this warning, I also noticed that
we use memcpy to copy the larger ip_vs_sync_conn_options structure
over two members of the ip_vs_conn structure. This works because
the layout is identical, but seems error-prone, so I'm changing
this in the process to directly copy the two members. This change
seemed to have no effect on the object code or the warning, but
it deals with the same data, so I kept the two changes together.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/ipvs/ip_vs_sync.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 1b07578bedf3..9350530c16c1 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -283,6 +283,7 @@ struct ip_vs_sync_buff {
  */
 static void ntoh_seq(struct ip_vs_seq *no, struct ip_vs_seq *ho)
 {
+	memset(ho, 0, sizeof(*ho));
 	ho->init_seq       = get_unaligned_be32(&no->init_seq);
 	ho->delta          = get_unaligned_be32(&no->delta);
 	ho->previous_delta = get_unaligned_be32(&no->previous_delta);
@@ -917,8 +918,10 @@ static void ip_vs_proc_conn(struct netns_ipvs *ipvs, struct ip_vs_conn_param *pa
 			kfree(param->pe_data);
 	}
 
-	if (opt)
-		memcpy(&cp->in_seq, opt, sizeof(*opt));
+	if (opt) {
+		cp->in_seq = opt->in_seq;
+		cp->out_seq = opt->out_seq;
+	}
 	atomic_set(&cp->in_pkts, sysctl_sync_threshold(ipvs));
 	cp->state = state;
 	cp->old_state = cp->state;
-- 
2.1.4


^ permalink raw reply related

* [PATCH 04/14] netfilter: nf_conntrack_sip: extend request line validation
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: Ulrich Weber <ulrich.weber@riverbed.com>

on SIP requests, so a fragmented TCP SIP packet from an allow header starting with
 INVITE,NOTIFY,OPTIONS,REFER,REGISTER,UPDATE,SUBSCRIBE
 Content-Length: 0

will not bet interpreted as an INVITE request. Also Request-URI must start with an alphabetic character.

Confirm with RFC 3261
 Request-Line   =  Method SP Request-URI SP SIP-Version CRLF

Fixes: 30f33e6dee80 ("[NETFILTER]: nf_conntrack_sip: support method specific request/response handling")
Signed-off-by: Ulrich Weber <ulrich.weber@riverbed.com>
Acked-by: Marco Angaroni <marcoangaroni@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_sip.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index 621b81c7bddc..c3fc14e021ec 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -1436,9 +1436,12 @@ static int process_sip_request(struct sk_buff *skb, unsigned int protoff,
 		handler = &sip_handlers[i];
 		if (handler->request == NULL)
 			continue;
-		if (*datalen < handler->len ||
+		if (*datalen < handler->len + 2 ||
 		    strncasecmp(*dptr, handler->method, handler->len))
 			continue;
+		if ((*dptr)[handler->len] != ' ' ||
+		    !isalpha((*dptr)[handler->len+1]))
+			continue;
 
 		if (ct_sip_get_header(ct, *dptr, 0, *datalen, SIP_HDR_CSEQ,
 				      &matchoff, &matchlen) <= 0) {
-- 
2.1.4


^ permalink raw reply related

* [PATCH 01/14] netfilter: nft_dynset: fix panic if NFT_SET_HASH is not enabled
From: Pablo Neira Ayuso @ 2016-11-10  0:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1478737427-1574-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang <zlpnobody@gmail.com>

When CONFIG_NFT_SET_HASH is not enabled and I input the following rule:
"nft add rule filter output flow table test {ip daddr counter }", kernel
panic happened on my system:
 BUG: unable to handle kernel NULL pointer dereference at (null)
 IP: [<          (null)>]           (null)
 [...]
 Call Trace:
 [<ffffffffa0590466>] ? nft_dynset_eval+0x56/0x100 [nf_tables]
 [<ffffffffa05851bb>] nft_do_chain+0xfb/0x4e0 [nf_tables]
 [<ffffffffa0432f01>] ? nf_conntrack_tuple_taken+0x61/0x210 [nf_conntrack]
 [<ffffffffa0459ea6>] ? get_unique_tuple+0x136/0x560 [nf_nat]
 [<ffffffffa043bca1>] ? __nf_ct_ext_add_length+0x111/0x130 [nf_conntrack]
 [<ffffffffa045a357>] ? nf_nat_setup_info+0x87/0x3b0 [nf_nat]
 [<ffffffff81761e27>] ? ipt_do_table+0x327/0x610
 [<ffffffffa045a6d7>] ? __nf_nat_alloc_null_binding+0x57/0x80 [nf_nat]
 [<ffffffffa059f21f>] nft_ipv4_output+0xaf/0xd0 [nf_tables_ipv4]
 [<ffffffff81702515>] nf_iterate+0x55/0x60
 [<ffffffff81702593>] nf_hook_slow+0x73/0xd0

Because in rbtree type set, ops->update is not implemented. So just keep
it simple, in such case, report -EOPNOTSUPP to the user space.

Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_dynset.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index 517f08767a3c..bfdb689664b0 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -139,6 +139,9 @@ static int nft_dynset_init(const struct nft_ctx *ctx,
 			return PTR_ERR(set);
 	}

+	if (set->ops->update == NULL)
+		return -EOPNOTSUPP;
+
 	if (set->flags & NFT_SET_CONSTANT)
 		return -EBUSY;

-- 
2.1.4

^ permalink raw reply related

* [PATCH iproute2 -net-next] bpf: make tc's bpf loader generic and move into lib
From: Daniel Borkmann @ 2016-11-10  0:20 UTC (permalink / raw)
  To: stephen; +Cc: tgraf, alexei.starovoitov, netdev, Daniel Borkmann

This work moves the bpf loader into the iproute2 library and reworks
the tc specific parts into generic code. It's useful as we can then
more easily support new program types by just having the same ELF
loader backend. Joint work with Thomas Graf. I hacked a rough start
of a test suite to make sure nothing breaks [1] and looks all good.

  [1] https://github.com/borkmann/clsact/blob/master/test_bpf.sh

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 Makefile           |   13 +
 configure          |    2 +-
 include/bpf_api.h  |   26 +-
 include/bpf_util.h |   95 +++
 lib/Makefile       |    2 +-
 lib/bpf.c          | 2262 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 tc/Makefile        |    7 +-
 tc/e_bpf.c         |    2 +-
 tc/f_bpf.c         |   58 +-
 tc/m_bpf.c         |   47 +-
 tc/tc_bpf.c        | 2010 ----------------------------------------------
 tc/tc_bpf.h        |   82 --
 12 files changed, 2467 insertions(+), 2139 deletions(-)
 create mode 100644 include/bpf_util.h
 create mode 100644 lib/bpf.c
 delete mode 100644 tc/tc_bpf.c
 delete mode 100644 tc/tc_bpf.h

diff --git a/Makefile b/Makefile
index fa200dd..37b68ad 100644
--- a/Makefile
+++ b/Makefile
@@ -1,3 +1,8 @@
+# Include "Config" if already generated
+ifneq ($(wildcard Config),)
+include Config
+endif
+
 ifndef VERBOSE
 MAKEFLAGS += --no-print-directory
 endif
@@ -7,6 +12,7 @@ LIBDIR?=$(PREFIX)/lib
 SBINDIR?=/sbin
 CONFDIR?=/etc/iproute2
 DATADIR?=$(PREFIX)/share
+HDRDIR?=$(PREFIX)/include/iproute2
 DOCDIR?=$(DATADIR)/doc/iproute2
 MANDIR?=$(DATADIR)/man
 ARPDDIR?=/var/lib/arpd
@@ -51,6 +57,11 @@ SUBDIRS=lib ip tc bridge misc netem genl tipc devlink man
 LIBNETLINK=../lib/libnetlink.a ../lib/libutil.a
 LDLIBS += $(LIBNETLINK)
 
+ifeq ($(HAVE_ELF),y)
+CFLAGS += -DHAVE_ELF
+LDLIBS += -lelf
+endif
+
 all: Config
 	@set -e; \
 	for i in $(SUBDIRS); \
@@ -63,6 +74,7 @@ install: all
 	install -m 0755 -d $(DESTDIR)$(SBINDIR)
 	install -m 0755 -d $(DESTDIR)$(CONFDIR)
 	install -m 0755 -d $(DESTDIR)$(ARPDDIR)
+	install -m 0755 -d $(DESTDIR)$(HDRDIR)
 	install -m 0755 -d $(DESTDIR)$(DOCDIR)/examples
 	install -m 0755 -d $(DESTDIR)$(DOCDIR)/examples/diffserv
 	install -m 0644 README.iproute2+tc $(shell find examples -maxdepth 1 -type f) \
@@ -73,6 +85,7 @@ install: all
 	install -m 0644 $(shell find etc/iproute2 -maxdepth 1 -type f) $(DESTDIR)$(CONFDIR)
 	install -m 0755 -d $(DESTDIR)$(BASH_COMPDIR)
 	install -m 0644 bash-completion/tc $(DESTDIR)$(BASH_COMPDIR)
+	install -m 0644 include/bpf_elf.h $(DESTDIR)$(HDRDIR)
 
 snapshot:
 	echo "static const char SNAPSHOT[] = \""`date +%y%m%d`"\";" \
diff --git a/configure b/configure
index c978da3..6c431c3 100755
--- a/configure
+++ b/configure
@@ -272,7 +272,7 @@ EOF
 
     if $CC -I$INCLUDE -o $TMPDIR/elftest $TMPDIR/elftest.c -lelf >/dev/null 2>&1
     then
-	echo "TC_CONFIG_ELF:=y" >>Config
+	echo "HAVE_ELF:=y" >>Config
 	echo "yes"
     else
 	echo "no"
diff --git a/include/bpf_api.h b/include/bpf_api.h
index 1b250d2..7642623 100644
--- a/include/bpf_api.h
+++ b/include/bpf_api.h
@@ -107,9 +107,14 @@
 
 /** BPF helper functions for tc. Individual flags are in linux/bpf.h */
 
+#ifndef __BPF_FUNC
+# define __BPF_FUNC(NAME, ...)						\
+	(* NAME)(__VA_ARGS__) __maybe_unused
+#endif
+
 #ifndef BPF_FUNC
 # define BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused = (void *) BPF_FUNC_##NAME
+	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
 #endif
 
 /* Map access/manipulation */
@@ -147,10 +152,15 @@ static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
 
 /* System helpers */
 static uint32_t BPF_FUNC(get_smp_processor_id);
+static uint32_t BPF_FUNC(get_numa_node_id);
 
 /* Packet misc meta data */
 static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
+static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
+
 static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
+static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
+static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
 
 /* Packet redirection */
 static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
@@ -169,6 +179,20 @@ static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
 		    uint32_t from, uint32_t to, uint32_t flags);
 static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
 		    const void *to, uint32_t to_size, uint32_t seed);
+static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
+
+static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
+static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
+		    uint32_t flags);
+static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
+		    uint32_t flags);
+
+static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
+
+/* Event notification */
+static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
+		      uint64_t index, const void *data, uint32_t size) =
+		      (void *) BPF_FUNC_perf_event_output;
 
 /* Packet vlan encap/decap */
 static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
diff --git a/include/bpf_util.h b/include/bpf_util.h
new file mode 100644
index 0000000..05baeec
--- /dev/null
+++ b/include/bpf_util.h
@@ -0,0 +1,95 @@
+/*
+ * bpf_util.h	BPF common code
+ *
+ *		This program is free software; you can distribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Daniel Borkmann <daniel@iogearbox.net>
+ *		Jiri Pirko <jiri@resnulli.us>
+ */
+
+#ifndef __BPF_UTIL__
+#define __BPF_UTIL__
+
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/magic.h>
+#include <linux/elf-em.h>
+#include <linux/if_alg.h>
+
+#include "utils.h"
+#include "bpf_scm.h"
+
+#define BPF_ENV_UDS	"TC_BPF_UDS"
+#define BPF_ENV_MNT	"TC_BPF_MNT"
+
+#ifndef BPF_MAX_LOG
+# define BPF_MAX_LOG	4096
+#endif
+
+#define BPF_DIR_GLOBALS	"globals"
+
+#ifndef BPF_FS_MAGIC
+# define BPF_FS_MAGIC	0xcafe4a11
+#endif
+
+#define BPF_DIR_MNT	"/sys/fs/bpf"
+
+#ifndef TRACEFS_MAGIC
+# define TRACEFS_MAGIC	0x74726163
+#endif
+
+#define TRACE_DIR_MNT	"/sys/kernel/tracing"
+
+#ifndef AF_ALG
+# define AF_ALG		38
+#endif
+
+#ifndef EM_BPF
+# define EM_BPF		247
+#endif
+
+struct bpf_cfg_ops {
+	void (*cbpf_cb)(void *nl, const struct sock_filter *ops, int ops_len);
+	void (*ebpf_cb)(void *nl, int fd, const char *annotation);
+};
+
+struct bpf_cfg_in {
+	const char *object;
+	const char *section;
+	const char *uds;
+	int argc;
+	char **argv;
+	struct sock_filter *ops;
+};
+
+int bpf_parse_common(enum bpf_prog_type type, struct bpf_cfg_in *cfg,
+		     const struct bpf_cfg_ops *ops, void *nl);
+
+const char *bpf_prog_to_default_section(enum bpf_prog_type type);
+
+int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv);
+int bpf_trace_pipe(void);
+
+void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len);
+
+#ifdef HAVE_ELF
+int bpf_send_map_fds(const char *path, const char *obj);
+int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
+		     unsigned int entries);
+#else
+static inline int bpf_send_map_fds(const char *path, const char *obj)
+{
+	return 0;
+}
+
+static inline int bpf_recv_map_fds(const char *path, int *fds,
+				   struct bpf_map_aux *aux,
+				   unsigned int entries)
+{
+	return -1;
+}
+#endif /* HAVE_ELF */
+#endif /* __BPF_UTIL__ */
diff --git a/lib/Makefile b/lib/Makefile
index 52e016d..5b7ec16 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -8,7 +8,7 @@ CFLAGS += -fPIC
 
 UTILOBJ = utils.o rt_names.o ll_types.o ll_proto.o ll_addr.o \
 	inet_proto.o namespace.o json_writer.o \
-	names.o color.o
+	names.o color.o bpf.o
 
 NLOBJ=libgenl.o ll_map.o libnetlink.o
 
diff --git a/lib/bpf.c b/lib/bpf.c
new file mode 100644
index 0000000..8a5b84b
--- /dev/null
+++ b/lib/bpf.c
@@ -0,0 +1,2262 @@
+/*
+ * bpf.c	BPF common code
+ *
+ *		This program is free software; you can distribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Daniel Borkmann <daniel@iogearbox.net>
+ *		Jiri Pirko <jiri@resnulli.us>
+ *		Alexei Starovoitov <ast@kernel.org>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdarg.h>
+#include <limits.h>
+#include <assert.h>
+
+#ifdef HAVE_ELF
+#include <libelf.h>
+#include <gelf.h>
+#endif
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/un.h>
+#include <sys/vfs.h>
+#include <sys/mount.h>
+#include <sys/syscall.h>
+#include <sys/sendfile.h>
+#include <sys/resource.h>
+
+#include <arpa/inet.h>
+
+#include "utils.h"
+
+#include "bpf_util.h"
+#include "bpf_elf.h"
+#include "bpf_scm.h"
+
+struct bpf_prog_meta {
+	const char *type;
+	const char *subdir;
+	const char *section;
+	bool may_uds_export;
+};
+
+static const enum bpf_prog_type __bpf_types[] = {
+	BPF_PROG_TYPE_SCHED_CLS,
+	BPF_PROG_TYPE_SCHED_ACT,
+};
+
+static const struct bpf_prog_meta __bpf_prog_meta[] = {
+	[BPF_PROG_TYPE_SCHED_CLS] = {
+		.type		= "cls",
+		.subdir		= "tc",
+		.section	= ELF_SECTION_CLASSIFIER,
+		.may_uds_export	= true,
+	},
+	[BPF_PROG_TYPE_SCHED_ACT] = {
+		.type		= "act",
+		.subdir		= "tc",
+		.section	= ELF_SECTION_ACTION,
+		.may_uds_export	= true,
+	},
+};
+
+static const char *bpf_prog_to_subdir(enum bpf_prog_type type)
+{
+	assert(type < ARRAY_SIZE(__bpf_prog_meta) &&
+	       __bpf_prog_meta[type].subdir);
+	return __bpf_prog_meta[type].subdir;
+}
+
+const char *bpf_prog_to_default_section(enum bpf_prog_type type)
+{
+	assert(type < ARRAY_SIZE(__bpf_prog_meta) &&
+	       __bpf_prog_meta[type].section);
+	return __bpf_prog_meta[type].section;
+}
+
+#ifdef HAVE_ELF
+static int bpf_obj_open(const char *path, enum bpf_prog_type type,
+			const char *sec, bool verbose);
+#else
+static int bpf_obj_open(const char *path, enum bpf_prog_type type,
+			const char *sec, bool verbose)
+{
+	fprintf(stderr, "No ELF library support compiled in.\n");
+	errno = ENOSYS;
+	return -1;
+}
+#endif
+
+static inline __u64 bpf_ptr_to_u64(const void *ptr)
+{
+	return (__u64)(unsigned long)ptr;
+}
+
+static int bpf(int cmd, union bpf_attr *attr, unsigned int size)
+{
+#ifdef __NR_bpf
+	return syscall(__NR_bpf, cmd, attr, size);
+#else
+	fprintf(stderr, "No bpf syscall, kernel headers too old?\n");
+	errno = ENOSYS;
+	return -1;
+#endif
+}
+
+static int bpf_map_update(int fd, const void *key, const void *value,
+			  uint64_t flags)
+{
+	union bpf_attr attr = {};
+
+	attr.map_fd = fd;
+	attr.key = bpf_ptr_to_u64(key);
+	attr.value = bpf_ptr_to_u64(value);
+	attr.flags = flags;
+
+	return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
+}
+
+static int bpf_parse_string(char *arg, bool from_file, __u16 *bpf_len,
+			    char **bpf_string, bool *need_release,
+			    const char separator)
+{
+	char sp;
+
+	if (from_file) {
+		size_t tmp_len, op_len = sizeof("65535 255 255 4294967295,");
+		char *tmp_string, *last;
+		FILE *fp;
+
+		tmp_len = sizeof("4096,") + BPF_MAXINSNS * op_len;
+		tmp_string = calloc(1, tmp_len);
+		if (tmp_string == NULL)
+			return -ENOMEM;
+
+		fp = fopen(arg, "r");
+		if (fp == NULL) {
+			perror("Cannot fopen");
+			free(tmp_string);
+			return -ENOENT;
+		}
+
+		if (!fgets(tmp_string, tmp_len, fp)) {
+			free(tmp_string);
+			fclose(fp);
+			return -EIO;
+		}
+
+		fclose(fp);
+
+		last = &tmp_string[strlen(tmp_string) - 1];
+		if (*last == '\n')
+			*last = 0;
+
+		*need_release = true;
+		*bpf_string = tmp_string;
+	} else {
+		*need_release = false;
+		*bpf_string = arg;
+	}
+
+	if (sscanf(*bpf_string, "%hu%c", bpf_len, &sp) != 2 ||
+	    sp != separator) {
+		if (*need_release)
+			free(*bpf_string);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int bpf_ops_parse(int argc, char **argv, struct sock_filter *bpf_ops,
+			 bool from_file)
+{
+	char *bpf_string, *token, separator = ',';
+	int ret = 0, i = 0;
+	bool need_release;
+	__u16 bpf_len = 0;
+
+	if (argc < 1)
+		return -EINVAL;
+	if (bpf_parse_string(argv[0], from_file, &bpf_len, &bpf_string,
+			     &need_release, separator))
+		return -EINVAL;
+	if (bpf_len == 0 || bpf_len > BPF_MAXINSNS) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	token = bpf_string;
+	while ((token = strchr(token, separator)) && (++token)[0]) {
+		if (i >= bpf_len) {
+			fprintf(stderr, "Real program length exceeds encoded length parameter!\n");
+			ret = -EINVAL;
+			goto out;
+		}
+
+		if (sscanf(token, "%hu %hhu %hhu %u,",
+			   &bpf_ops[i].code, &bpf_ops[i].jt,
+			   &bpf_ops[i].jf, &bpf_ops[i].k) != 4) {
+			fprintf(stderr, "Error at instruction %d!\n", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		i++;
+	}
+
+	if (i != bpf_len) {
+		fprintf(stderr, "Parsed program length is less than encoded length parameter!\n");
+		ret = -EINVAL;
+		goto out;
+	}
+	ret = bpf_len;
+out:
+	if (need_release)
+		free(bpf_string);
+
+	return ret;
+}
+
+void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len)
+{
+	struct sock_filter *ops = (struct sock_filter *) RTA_DATA(bpf_ops);
+	int i;
+
+	if (len == 0)
+		return;
+
+	fprintf(f, "bytecode \'%u,", len);
+
+	for (i = 0; i < len - 1; i++)
+		fprintf(f, "%hu %hhu %hhu %u,", ops[i].code, ops[i].jt,
+			ops[i].jf, ops[i].k);
+
+	fprintf(f, "%hu %hhu %hhu %u\'", ops[i].code, ops[i].jt,
+		ops[i].jf, ops[i].k);
+}
+
+static void bpf_map_pin_report(const struct bpf_elf_map *pin,
+			       const struct bpf_elf_map *obj)
+{
+	fprintf(stderr, "Map specification differs from pinned file!\n");
+
+	if (obj->type != pin->type)
+		fprintf(stderr, " - Type:         %u (obj) != %u (pin)\n",
+			obj->type, pin->type);
+	if (obj->size_key != pin->size_key)
+		fprintf(stderr, " - Size key:     %u (obj) != %u (pin)\n",
+			obj->size_key, pin->size_key);
+	if (obj->size_value != pin->size_value)
+		fprintf(stderr, " - Size value:   %u (obj) != %u (pin)\n",
+			obj->size_value, pin->size_value);
+	if (obj->max_elem != pin->max_elem)
+		fprintf(stderr, " - Max elems:    %u (obj) != %u (pin)\n",
+			obj->max_elem, pin->max_elem);
+	if (obj->flags != pin->flags)
+		fprintf(stderr, " - Flags:        %#x (obj) != %#x (pin)\n",
+			obj->flags, pin->flags);
+
+	fprintf(stderr, "\n");
+}
+
+static int bpf_map_selfcheck_pinned(int fd, const struct bpf_elf_map *map,
+				    int length)
+{
+	char file[PATH_MAX], buff[4096];
+	struct bpf_elf_map tmp = {}, zero = {};
+	unsigned int val;
+	FILE *fp;
+
+	snprintf(file, sizeof(file), "/proc/%d/fdinfo/%d", getpid(), fd);
+
+	fp = fopen(file, "r");
+	if (!fp) {
+		fprintf(stderr, "No procfs support?!\n");
+		return -EIO;
+	}
+
+	while (fgets(buff, sizeof(buff), fp)) {
+		if (sscanf(buff, "map_type:\t%u", &val) == 1)
+			tmp.type = val;
+		else if (sscanf(buff, "key_size:\t%u", &val) == 1)
+			tmp.size_key = val;
+		else if (sscanf(buff, "value_size:\t%u", &val) == 1)
+			tmp.size_value = val;
+		else if (sscanf(buff, "max_entries:\t%u", &val) == 1)
+			tmp.max_elem = val;
+		else if (sscanf(buff, "map_flags:\t%i", &val) == 1)
+			tmp.flags = val;
+	}
+
+	fclose(fp);
+
+	if (!memcmp(&tmp, map, length)) {
+		return 0;
+	} else {
+		/* If kernel doesn't have eBPF-related fdinfo, we cannot do much,
+		 * so just accept it. We know we do have an eBPF fd and in this
+		 * case, everything is 0. It is guaranteed that no such map exists
+		 * since map type of 0 is unloadable BPF_MAP_TYPE_UNSPEC.
+		 */
+		if (!memcmp(&tmp, &zero, length))
+			return 0;
+
+		bpf_map_pin_report(&tmp, map);
+		return -EINVAL;
+	}
+}
+
+static int bpf_mnt_fs(const char *target)
+{
+	bool bind_done = false;
+
+	while (mount("", target, "none", MS_PRIVATE | MS_REC, NULL)) {
+		if (errno != EINVAL || bind_done) {
+			fprintf(stderr, "mount --make-private %s failed: %s\n",
+				target,	strerror(errno));
+			return -1;
+		}
+
+		if (mount(target, target, "none", MS_BIND, NULL)) {
+			fprintf(stderr, "mount --bind %s %s failed: %s\n",
+				target,	target, strerror(errno));
+			return -1;
+		}
+
+		bind_done = true;
+	}
+
+	if (mount("bpf", target, "bpf", 0, "mode=0700")) {
+		fprintf(stderr, "mount -t bpf bpf %s failed: %s\n",
+			target,	strerror(errno));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int bpf_valid_mntpt(const char *mnt, unsigned long magic)
+{
+	struct statfs st_fs;
+
+	if (statfs(mnt, &st_fs) < 0)
+		return -ENOENT;
+	if ((unsigned long)st_fs.f_type != magic)
+		return -ENOENT;
+
+	return 0;
+}
+
+static const char *bpf_find_mntpt(const char *fstype, unsigned long magic,
+				  char *mnt, int len,
+				  const char * const *known_mnts)
+{
+	const char * const *ptr;
+	char type[100];
+	FILE *fp;
+
+	if (known_mnts) {
+		ptr = known_mnts;
+		while (*ptr) {
+			if (bpf_valid_mntpt(*ptr, magic) == 0) {
+				strncpy(mnt, *ptr, len - 1);
+				mnt[len - 1] = 0;
+				return mnt;
+			}
+			ptr++;
+		}
+	}
+
+	fp = fopen("/proc/mounts", "r");
+	if (fp == NULL || len != PATH_MAX)
+		return NULL;
+
+	while (fscanf(fp, "%*s %" textify(PATH_MAX) "s %99s %*s %*d %*d\n",
+		      mnt, type) == 2) {
+		if (strcmp(type, fstype) == 0)
+			break;
+	}
+
+	fclose(fp);
+	if (strcmp(type, fstype) != 0)
+		return NULL;
+
+	return mnt;
+}
+
+int bpf_trace_pipe(void)
+{
+	char tracefs_mnt[PATH_MAX] = TRACE_DIR_MNT;
+	static const char * const tracefs_known_mnts[] = {
+		TRACE_DIR_MNT,
+		"/sys/kernel/debug/tracing",
+		"/tracing",
+		"/trace",
+		0,
+	};
+	char tpipe[PATH_MAX];
+	const char *mnt;
+	int fd;
+
+	mnt = bpf_find_mntpt("tracefs", TRACEFS_MAGIC, tracefs_mnt,
+			     sizeof(tracefs_mnt), tracefs_known_mnts);
+	if (!mnt) {
+		fprintf(stderr, "tracefs not mounted?\n");
+		return -1;
+	}
+
+	snprintf(tpipe, sizeof(tpipe), "%s/trace_pipe", mnt);
+
+	fd = open(tpipe, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	fprintf(stderr, "Running! Hang up with ^C!\n\n");
+	while (1) {
+		static char buff[4096];
+		ssize_t ret;
+
+		ret = read(fd, buff, sizeof(buff) - 1);
+		if (ret > 0) {
+			write(2, buff, ret);
+			fflush(stderr);
+		}
+	}
+
+	return 0;
+}
+
+static int bpf_gen_global(const char *bpf_sub_dir)
+{
+	char bpf_glo_dir[PATH_MAX];
+	int ret;
+
+	snprintf(bpf_glo_dir, sizeof(bpf_glo_dir), "%s/%s/",
+		 bpf_sub_dir, BPF_DIR_GLOBALS);
+
+	ret = mkdir(bpf_glo_dir, S_IRWXU);
+	if (ret && errno != EEXIST) {
+		fprintf(stderr, "mkdir %s failed: %s\n", bpf_glo_dir,
+			strerror(errno));
+		return ret;
+	}
+
+	return 0;
+}
+
+static int bpf_gen_master(const char *base, const char *name)
+{
+	char bpf_sub_dir[PATH_MAX];
+	int ret;
+
+	snprintf(bpf_sub_dir, sizeof(bpf_sub_dir), "%s%s/", base, name);
+
+	ret = mkdir(bpf_sub_dir, S_IRWXU);
+	if (ret && errno != EEXIST) {
+		fprintf(stderr, "mkdir %s failed: %s\n", bpf_sub_dir,
+			strerror(errno));
+		return ret;
+	}
+
+	return bpf_gen_global(bpf_sub_dir);
+}
+
+static int bpf_slave_via_bind_mnt(const char *full_name,
+				  const char *full_link)
+{
+	int ret;
+
+	ret = mkdir(full_name, S_IRWXU);
+	if (ret) {
+		assert(errno != EEXIST);
+		fprintf(stderr, "mkdir %s failed: %s\n", full_name,
+			strerror(errno));
+		return ret;
+	}
+
+	ret = mount(full_link, full_name, "none", MS_BIND, NULL);
+	if (ret) {
+		rmdir(full_name);
+		fprintf(stderr, "mount --bind %s %s failed: %s\n",
+			full_link, full_name, strerror(errno));
+	}
+
+	return ret;
+}
+
+static int bpf_gen_slave(const char *base, const char *name,
+			 const char *link)
+{
+	char bpf_lnk_dir[PATH_MAX];
+	char bpf_sub_dir[PATH_MAX];
+	struct stat sb = {};
+	int ret;
+
+	snprintf(bpf_lnk_dir, sizeof(bpf_lnk_dir), "%s%s/", base, link);
+	snprintf(bpf_sub_dir, sizeof(bpf_sub_dir), "%s%s",  base, name);
+
+	ret = symlink(bpf_lnk_dir, bpf_sub_dir);
+	if (ret) {
+		if (errno != EEXIST) {
+			if (errno != EPERM) {
+				fprintf(stderr, "symlink %s failed: %s\n",
+					bpf_sub_dir, strerror(errno));
+				return ret;
+			}
+
+			return bpf_slave_via_bind_mnt(bpf_sub_dir,
+						      bpf_lnk_dir);
+		}
+
+		ret = lstat(bpf_sub_dir, &sb);
+		if (ret) {
+			fprintf(stderr, "lstat %s failed: %s\n",
+				bpf_sub_dir, strerror(errno));
+			return ret;
+		}
+
+		if ((sb.st_mode & S_IFMT) != S_IFLNK)
+			return bpf_gen_global(bpf_sub_dir);
+	}
+
+	return 0;
+}
+
+static int bpf_gen_hierarchy(const char *base)
+{
+	int ret, i;
+
+	ret = bpf_gen_master(base, bpf_prog_to_subdir(__bpf_types[0]));
+	for (i = 1; i < ARRAY_SIZE(__bpf_types) && !ret; i++)
+		ret = bpf_gen_slave(base,
+				    bpf_prog_to_subdir(__bpf_types[i]),
+				    bpf_prog_to_subdir(__bpf_types[0]));
+	return ret;
+}
+
+static const char *bpf_get_work_dir(enum bpf_prog_type type)
+{
+	static char bpf_tmp[PATH_MAX] = BPF_DIR_MNT;
+	static char bpf_wrk_dir[PATH_MAX];
+	static const char *mnt;
+	static bool bpf_mnt_cached;
+	static const char * const bpf_known_mnts[] = {
+		BPF_DIR_MNT,
+		"/bpf",
+		0,
+	};
+	int ret;
+
+	if (bpf_mnt_cached) {
+		const char *out = mnt;
+
+		if (out) {
+			snprintf(bpf_tmp, sizeof(bpf_tmp), "%s%s/",
+				 out, bpf_prog_to_subdir(type));
+			out = bpf_tmp;
+		}
+		return out;
+	}
+
+	mnt = bpf_find_mntpt("bpf", BPF_FS_MAGIC, bpf_tmp, sizeof(bpf_tmp),
+			     bpf_known_mnts);
+	if (!mnt) {
+		mnt = getenv(BPF_ENV_MNT);
+		if (!mnt)
+			mnt = BPF_DIR_MNT;
+		ret = bpf_mnt_fs(mnt);
+		if (ret) {
+			mnt = NULL;
+			goto out;
+		}
+	}
+
+	snprintf(bpf_wrk_dir, sizeof(bpf_wrk_dir), "%s/", mnt);
+
+	ret = bpf_gen_hierarchy(bpf_wrk_dir);
+	if (ret) {
+		mnt = NULL;
+		goto out;
+	}
+
+	mnt = bpf_wrk_dir;
+out:
+	bpf_mnt_cached = true;
+	return mnt;
+}
+
+static int bpf_obj_get(const char *pathname, enum bpf_prog_type type)
+{
+	union bpf_attr attr = {};
+	char tmp[PATH_MAX];
+
+	if (strlen(pathname) > 2 && pathname[0] == 'm' &&
+	    pathname[1] == ':' && bpf_get_work_dir(type)) {
+		snprintf(tmp, sizeof(tmp), "%s/%s",
+			 bpf_get_work_dir(type), pathname + 2);
+		pathname = tmp;
+	}
+
+	attr.pathname = bpf_ptr_to_u64(pathname);
+
+	return bpf(BPF_OBJ_GET, &attr, sizeof(attr));
+}
+
+enum bpf_mode {
+	CBPF_BYTECODE,
+	CBPF_FILE,
+	EBPF_OBJECT,
+	EBPF_PINNED,
+	BPF_MODE_MAX,
+};
+
+static int bpf_parse(enum bpf_prog_type *type, enum bpf_mode *mode,
+		     struct bpf_cfg_in *cfg, const bool *opt_tbl)
+{
+	const char *file, *section, *uds_name;
+	bool verbose = false;
+	int i, ret, argc;
+	char **argv;
+
+	argv = cfg->argv;
+	argc = cfg->argc;
+
+	if (opt_tbl[CBPF_BYTECODE] &&
+	    (matches(*argv, "bytecode") == 0 ||
+	     strcmp(*argv, "bc") == 0)) {
+		*mode = CBPF_BYTECODE;
+	} else if (opt_tbl[CBPF_FILE] &&
+		   (matches(*argv, "bytecode-file") == 0 ||
+		    strcmp(*argv, "bcf") == 0)) {
+		*mode = CBPF_FILE;
+	} else if (opt_tbl[EBPF_OBJECT] &&
+		   (matches(*argv, "object-file") == 0 ||
+		    strcmp(*argv, "obj") == 0)) {
+		*mode = EBPF_OBJECT;
+	} else if (opt_tbl[EBPF_PINNED] &&
+		   (matches(*argv, "object-pinned") == 0 ||
+		    matches(*argv, "pinned") == 0 ||
+		    matches(*argv, "fd") == 0)) {
+		*mode = EBPF_PINNED;
+	} else {
+		fprintf(stderr, "What mode is \"%s\"?\n", *argv);
+		return -1;
+	}
+
+	NEXT_ARG();
+	file = section = uds_name = NULL;
+	if (*mode == EBPF_OBJECT || *mode == EBPF_PINNED) {
+		file = *argv;
+		NEXT_ARG_FWD();
+
+		if (*type == BPF_PROG_TYPE_UNSPEC) {
+			if (argc > 0 && matches(*argv, "type") == 0) {
+				NEXT_ARG();
+				for (i = 0; i < ARRAY_SIZE(__bpf_prog_meta);
+				     i++) {
+					if (!__bpf_prog_meta[i].type)
+						continue;
+					if (!matches(*argv,
+						     __bpf_prog_meta[i].type)) {
+						*type = i;
+						break;
+					}
+				}
+
+				if (*type == BPF_PROG_TYPE_UNSPEC) {
+					fprintf(stderr, "What type is \"%s\"?\n",
+						*argv);
+					return -1;
+				}
+				NEXT_ARG_FWD();
+			} else {
+				*type = BPF_PROG_TYPE_SCHED_CLS;
+			}
+		}
+
+		section = bpf_prog_to_default_section(*type);
+		if (argc > 0 && matches(*argv, "section") == 0) {
+			NEXT_ARG();
+			section = *argv;
+			NEXT_ARG_FWD();
+		}
+
+		if (__bpf_prog_meta[*type].may_uds_export) {
+			uds_name = getenv(BPF_ENV_UDS);
+			if (argc > 0 && !uds_name &&
+			    matches(*argv, "export") == 0) {
+				NEXT_ARG();
+				uds_name = *argv;
+				NEXT_ARG_FWD();
+			}
+		}
+
+		if (argc > 0 && matches(*argv, "verbose") == 0) {
+			verbose = true;
+			NEXT_ARG_FWD();
+		}
+
+		PREV_ARG();
+	}
+
+	if (*mode == CBPF_BYTECODE || *mode == CBPF_FILE)
+		ret = bpf_ops_parse(argc, argv, cfg->ops, *mode == CBPF_FILE);
+	else if (*mode == EBPF_OBJECT)
+		ret = bpf_obj_open(file, *type, section, verbose);
+	else if (*mode == EBPF_PINNED)
+		ret = bpf_obj_get(file, *type);
+	else
+		return -1;
+
+	cfg->object  = file;
+	cfg->section = section;
+	cfg->uds     = uds_name;
+	cfg->argc    = argc;
+	cfg->argv    = argv;
+
+	return ret;
+}
+
+static int bpf_parse_opt_tbl(enum bpf_prog_type type, struct bpf_cfg_in *cfg,
+			     const struct bpf_cfg_ops *ops, void *nl,
+			     const bool *opt_tbl)
+{
+	struct sock_filter opcodes[BPF_MAXINSNS];
+	char annotation[256];
+	enum bpf_mode mode;
+	int ret;
+
+	cfg->ops = opcodes;
+	ret = bpf_parse(&type, &mode, cfg, opt_tbl);
+	cfg->ops = NULL;
+	if (ret < 0)
+		return ret;
+
+	if (mode == CBPF_BYTECODE || mode == CBPF_FILE)
+		ops->cbpf_cb(nl, opcodes, ret);
+	if (mode == EBPF_OBJECT || mode == EBPF_PINNED) {
+		snprintf(annotation, sizeof(annotation), "%s:[%s]",
+			 basename(cfg->object), mode == EBPF_PINNED ?
+			 "*fsobj" : cfg->section);
+		ops->ebpf_cb(nl, ret, annotation);
+	}
+
+	return 0;
+}
+
+int bpf_parse_common(enum bpf_prog_type type, struct bpf_cfg_in *cfg,
+		     const struct bpf_cfg_ops *ops, void *nl)
+{
+	bool opt_tbl[BPF_MODE_MAX] = {};
+
+	if (ops->cbpf_cb) {
+		opt_tbl[CBPF_BYTECODE] = true;
+		opt_tbl[CBPF_FILE]     = true;
+	}
+
+	if (ops->ebpf_cb) {
+		opt_tbl[EBPF_OBJECT]   = true;
+		opt_tbl[EBPF_PINNED]   = true;
+	}
+
+	return bpf_parse_opt_tbl(type, cfg, ops, nl, opt_tbl);
+}
+
+int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv)
+{
+	enum bpf_prog_type type = BPF_PROG_TYPE_UNSPEC;
+	const bool opt_tbl[BPF_MODE_MAX] = {
+		[EBPF_OBJECT]	= true,
+		[EBPF_PINNED]	= true,
+	};
+	const struct bpf_elf_map test = {
+		.type		= BPF_MAP_TYPE_PROG_ARRAY,
+		.size_key	= sizeof(int),
+		.size_value	= sizeof(int),
+	};
+	struct bpf_cfg_in cfg = {
+		.argc		= argc,
+		.argv		= argv,
+	};
+	int ret, prog_fd, map_fd;
+	enum bpf_mode mode;
+	uint32_t map_key;
+
+	prog_fd = bpf_parse(&type, &mode, &cfg, opt_tbl);
+	if (prog_fd < 0)
+		return prog_fd;
+	if (key) {
+		map_key = *key;
+	} else {
+		ret = sscanf(cfg.section, "%*i/%i", &map_key);
+		if (ret != 1) {
+			fprintf(stderr, "Couldn\'t infer map key from section name! Please provide \'key\' argument!\n");
+			ret = -EINVAL;
+			goto out_prog;
+		}
+	}
+
+	map_fd = bpf_obj_get(map_path, type);
+	if (map_fd < 0) {
+		fprintf(stderr, "Couldn\'t retrieve pinned map \'%s\': %s\n",
+			map_path, strerror(errno));
+		ret = map_fd;
+		goto out_prog;
+	}
+
+	ret = bpf_map_selfcheck_pinned(map_fd, &test,
+				       offsetof(struct bpf_elf_map, max_elem));
+	if (ret < 0) {
+		fprintf(stderr, "Map \'%s\' self-check failed!\n", map_path);
+		goto out_map;
+	}
+
+	ret = bpf_map_update(map_fd, &map_key, &prog_fd, BPF_ANY);
+	if (ret < 0)
+		fprintf(stderr, "Map update failed: %s\n", strerror(errno));
+out_map:
+	close(map_fd);
+out_prog:
+	close(prog_fd);
+	return ret;
+}
+
+#ifdef HAVE_ELF
+struct bpf_elf_prog {
+	enum bpf_prog_type	type;
+	const struct bpf_insn	*insns;
+	size_t			size;
+	const char		*license;
+};
+
+struct bpf_hash_entry {
+	unsigned int		pinning;
+	const char		*subpath;
+	struct bpf_hash_entry	*next;
+};
+
+struct bpf_elf_ctx {
+	Elf			*elf_fd;
+	GElf_Ehdr		elf_hdr;
+	Elf_Data		*sym_tab;
+	Elf_Data		*str_tab;
+	int			obj_fd;
+	int			map_fds[ELF_MAX_MAPS];
+	struct bpf_elf_map	maps[ELF_MAX_MAPS];
+	int			sym_num;
+	int			map_num;
+	int			map_len;
+	bool			*sec_done;
+	int			sec_maps;
+	char			license[ELF_MAX_LICENSE_LEN];
+	enum bpf_prog_type	type;
+	bool			verbose;
+	struct bpf_elf_st	stat;
+	struct bpf_hash_entry	*ht[256];
+	char			*log;
+	size_t			log_size;
+};
+
+struct bpf_elf_sec_data {
+	GElf_Shdr		sec_hdr;
+	Elf_Data		*sec_data;
+	const char		*sec_name;
+};
+
+struct bpf_map_data {
+	int			*fds;
+	const char		*obj;
+	struct bpf_elf_st	*st;
+	struct bpf_elf_map	*ent;
+};
+
+static __check_format_string(2, 3) void
+bpf_dump_error(struct bpf_elf_ctx *ctx, const char *format, ...)
+{
+	va_list vl;
+
+	va_start(vl, format);
+	vfprintf(stderr, format, vl);
+	va_end(vl);
+
+	if (ctx->log && ctx->log[0]) {
+		if (ctx->verbose) {
+			fprintf(stderr, "%s\n", ctx->log);
+		} else {
+			unsigned int off = 0, len = strlen(ctx->log);
+
+			if (len > BPF_MAX_LOG) {
+				off = len - BPF_MAX_LOG;
+				fprintf(stderr, "Skipped %u bytes, use \'verb\' option for the full verbose log.\n[...]\n",
+					off);
+			}
+			fprintf(stderr, "%s\n", ctx->log + off);
+		}
+
+		memset(ctx->log, 0, ctx->log_size);
+	}
+}
+
+static int bpf_log_realloc(struct bpf_elf_ctx *ctx)
+{
+	size_t log_size = ctx->log_size;
+	void *ptr;
+
+	if (!ctx->log) {
+		log_size = 65536;
+	} else {
+		log_size <<= 1;
+		if (log_size > (UINT_MAX >> 8))
+			return -EINVAL;
+	}
+
+	ptr = realloc(ctx->log, log_size);
+	if (!ptr)
+		return -ENOMEM;
+
+	ctx->log = ptr;
+	ctx->log_size = log_size;
+
+	return 0;
+}
+
+static int bpf_map_create(enum bpf_map_type type, uint32_t size_key,
+			  uint32_t size_value, uint32_t max_elem,
+			  uint32_t flags)
+{
+	union bpf_attr attr = {};
+
+	attr.map_type = type;
+	attr.key_size = size_key;
+	attr.value_size = size_value;
+	attr.max_entries = max_elem;
+	attr.map_flags = flags;
+
+	return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
+}
+
+static int bpf_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns,
+			 size_t size_insns, const char *license, char *log,
+			 size_t size_log)
+{
+	union bpf_attr attr = {};
+
+	attr.prog_type = type;
+	attr.insns = bpf_ptr_to_u64(insns);
+	attr.insn_cnt = size_insns / sizeof(struct bpf_insn);
+	attr.license = bpf_ptr_to_u64(license);
+
+	if (size_log > 0) {
+		attr.log_buf = bpf_ptr_to_u64(log);
+		attr.log_size = size_log;
+		attr.log_level = 1;
+	}
+
+	return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
+}
+
+static int bpf_obj_pin(int fd, const char *pathname)
+{
+	union bpf_attr attr = {};
+
+	attr.pathname = bpf_ptr_to_u64(pathname);
+	attr.bpf_fd = fd;
+
+	return bpf(BPF_OBJ_PIN, &attr, sizeof(attr));
+}
+
+static int bpf_obj_hash(const char *object, uint8_t *out, size_t len)
+{
+	struct sockaddr_alg alg = {
+		.salg_family	= AF_ALG,
+		.salg_type	= "hash",
+		.salg_name	= "sha1",
+	};
+	int ret, cfd, ofd, ffd;
+	struct stat stbuff;
+	ssize_t size;
+
+	if (!object || len != 20)
+		return -EINVAL;
+
+	cfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
+	if (cfd < 0) {
+		fprintf(stderr, "Cannot get AF_ALG socket: %s\n",
+			strerror(errno));
+		return cfd;
+	}
+
+	ret = bind(cfd, (struct sockaddr *)&alg, sizeof(alg));
+	if (ret < 0) {
+		fprintf(stderr, "Error binding socket: %s\n", strerror(errno));
+		goto out_cfd;
+	}
+
+	ofd = accept(cfd, NULL, 0);
+	if (ofd < 0) {
+		fprintf(stderr, "Error accepting socket: %s\n",
+			strerror(errno));
+		ret = ofd;
+		goto out_cfd;
+	}
+
+	ffd = open(object, O_RDONLY);
+	if (ffd < 0) {
+		fprintf(stderr, "Error opening object %s: %s\n",
+			object, strerror(errno));
+		ret = ffd;
+		goto out_ofd;
+	}
+
+	ret = fstat(ffd, &stbuff);
+	if (ret < 0) {
+		fprintf(stderr, "Error doing fstat: %s\n",
+			strerror(errno));
+		goto out_ffd;
+	}
+
+	size = sendfile(ofd, ffd, NULL, stbuff.st_size);
+	if (size != stbuff.st_size) {
+		fprintf(stderr, "Error from sendfile (%zd vs %zu bytes): %s\n",
+			size, stbuff.st_size, strerror(errno));
+		ret = -1;
+		goto out_ffd;
+	}
+
+	size = read(ofd, out, len);
+	if (size != len) {
+		fprintf(stderr, "Error from read (%zd vs %zu bytes): %s\n",
+			size, len, strerror(errno));
+		ret = -1;
+	} else {
+		ret = 0;
+	}
+out_ffd:
+	close(ffd);
+out_ofd:
+	close(ofd);
+out_cfd:
+	close(cfd);
+	return ret;
+}
+
+static const char *bpf_get_obj_uid(const char *pathname)
+{
+	static bool bpf_uid_cached;
+	static char bpf_uid[64];
+	uint8_t tmp[20];
+	int ret;
+
+	if (bpf_uid_cached)
+		goto done;
+
+	ret = bpf_obj_hash(pathname, tmp, sizeof(tmp));
+	if (ret) {
+		fprintf(stderr, "Object hashing failed!\n");
+		return NULL;
+	}
+
+	hexstring_n2a(tmp, sizeof(tmp), bpf_uid, sizeof(bpf_uid));
+	bpf_uid_cached = true;
+done:
+	return bpf_uid;
+}
+
+static int bpf_init_env(const char *pathname)
+{
+	struct rlimit limit = {
+		.rlim_cur = RLIM_INFINITY,
+		.rlim_max = RLIM_INFINITY,
+	};
+
+	/* Don't bother in case we fail! */
+	setrlimit(RLIMIT_MEMLOCK, &limit);
+
+	if (!bpf_get_work_dir(BPF_PROG_TYPE_UNSPEC)) {
+		fprintf(stderr, "Continuing without mounted eBPF fs. Too old kernel?\n");
+		return 0;
+	}
+
+	if (!bpf_get_obj_uid(pathname))
+		return -1;
+
+	return 0;
+}
+
+static const char *bpf_custom_pinning(const struct bpf_elf_ctx *ctx,
+				      uint32_t pinning)
+{
+	struct bpf_hash_entry *entry;
+
+	entry = ctx->ht[pinning & (ARRAY_SIZE(ctx->ht) - 1)];
+	while (entry && entry->pinning != pinning)
+		entry = entry->next;
+
+	return entry ? entry->subpath : NULL;
+}
+
+static bool bpf_no_pinning(const struct bpf_elf_ctx *ctx,
+			   uint32_t pinning)
+{
+	switch (pinning) {
+	case PIN_OBJECT_NS:
+	case PIN_GLOBAL_NS:
+		return false;
+	case PIN_NONE:
+		return true;
+	default:
+		return !bpf_custom_pinning(ctx, pinning);
+	}
+}
+
+static void bpf_make_pathname(char *pathname, size_t len, const char *name,
+			      const struct bpf_elf_ctx *ctx, uint32_t pinning)
+{
+	switch (pinning) {
+	case PIN_OBJECT_NS:
+		snprintf(pathname, len, "%s/%s/%s",
+			 bpf_get_work_dir(ctx->type),
+			 bpf_get_obj_uid(NULL), name);
+		break;
+	case PIN_GLOBAL_NS:
+		snprintf(pathname, len, "%s/%s/%s",
+			 bpf_get_work_dir(ctx->type),
+			 BPF_DIR_GLOBALS, name);
+		break;
+	default:
+		snprintf(pathname, len, "%s/../%s/%s",
+			 bpf_get_work_dir(ctx->type),
+			 bpf_custom_pinning(ctx, pinning), name);
+		break;
+	}
+}
+
+static int bpf_probe_pinned(const char *name, const struct bpf_elf_ctx *ctx,
+			    uint32_t pinning)
+{
+	char pathname[PATH_MAX];
+
+	if (bpf_no_pinning(ctx, pinning) || !bpf_get_work_dir(ctx->type))
+		return 0;
+
+	bpf_make_pathname(pathname, sizeof(pathname), name, ctx, pinning);
+	return bpf_obj_get(pathname, ctx->type);
+}
+
+static int bpf_make_obj_path(const struct bpf_elf_ctx *ctx)
+{
+	char tmp[PATH_MAX];
+	int ret;
+
+	snprintf(tmp, sizeof(tmp), "%s/%s", bpf_get_work_dir(ctx->type),
+		 bpf_get_obj_uid(NULL));
+
+	ret = mkdir(tmp, S_IRWXU);
+	if (ret && errno != EEXIST) {
+		fprintf(stderr, "mkdir %s failed: %s\n", tmp, strerror(errno));
+		return ret;
+	}
+
+	return 0;
+}
+
+static int bpf_make_custom_path(const struct bpf_elf_ctx *ctx,
+				const char *todo)
+{
+	char tmp[PATH_MAX], rem[PATH_MAX], *sub;
+	int ret;
+
+	snprintf(tmp, sizeof(tmp), "%s/../", bpf_get_work_dir(ctx->type));
+	snprintf(rem, sizeof(rem), "%s/", todo);
+	sub = strtok(rem, "/");
+
+	while (sub) {
+		if (strlen(tmp) + strlen(sub) + 2 > PATH_MAX)
+			return -EINVAL;
+
+		strcat(tmp, sub);
+		strcat(tmp, "/");
+
+		ret = mkdir(tmp, S_IRWXU);
+		if (ret && errno != EEXIST) {
+			fprintf(stderr, "mkdir %s failed: %s\n", tmp,
+				strerror(errno));
+			return ret;
+		}
+
+		sub = strtok(NULL, "/");
+	}
+
+	return 0;
+}
+
+static int bpf_place_pinned(int fd, const char *name,
+			    const struct bpf_elf_ctx *ctx, uint32_t pinning)
+{
+	char pathname[PATH_MAX];
+	const char *tmp;
+	int ret = 0;
+
+	if (bpf_no_pinning(ctx, pinning) || !bpf_get_work_dir(ctx->type))
+		return 0;
+
+	if (pinning == PIN_OBJECT_NS)
+		ret = bpf_make_obj_path(ctx);
+	else if ((tmp = bpf_custom_pinning(ctx, pinning)))
+		ret = bpf_make_custom_path(ctx, tmp);
+	if (ret < 0)
+		return ret;
+
+	bpf_make_pathname(pathname, sizeof(pathname), name, ctx, pinning);
+	return bpf_obj_pin(fd, pathname);
+}
+
+static void bpf_prog_report(int fd, const char *section,
+			    const struct bpf_elf_prog *prog,
+			    struct bpf_elf_ctx *ctx)
+{
+	unsigned int insns = prog->size / sizeof(struct bpf_insn);
+
+	fprintf(stderr, "\nProg section \'%s\' %s%s (%d)!\n", section,
+		fd < 0 ? "rejected: " : "loaded",
+		fd < 0 ? strerror(errno) : "",
+		fd < 0 ? errno : fd);
+
+	fprintf(stderr, " - Type:         %u\n", prog->type);
+	fprintf(stderr, " - Instructions: %u (%u over limit)\n",
+		insns, insns > BPF_MAXINSNS ? insns - BPF_MAXINSNS : 0);
+	fprintf(stderr, " - License:      %s\n\n", prog->license);
+
+	bpf_dump_error(ctx, "Verifier analysis:\n\n");
+}
+
+static int bpf_prog_attach(const char *section,
+			   const struct bpf_elf_prog *prog,
+			   struct bpf_elf_ctx *ctx)
+{
+	int tries = 0, fd;
+retry:
+	errno = 0;
+	fd = bpf_prog_load(prog->type, prog->insns, prog->size,
+			   prog->license, ctx->log, ctx->log_size);
+	if (fd < 0 || ctx->verbose) {
+		/* The verifier log is pretty chatty, sometimes so chatty
+		 * on larger programs, that we could fail to dump everything
+		 * into our buffer. Still, try to give a debuggable error
+		 * log for the user, so enlarge it and re-fail.
+		 */
+		if (fd < 0 && (errno == ENOSPC || !ctx->log_size)) {
+			if (tries++ < 6 && !bpf_log_realloc(ctx))
+				goto retry;
+
+			fprintf(stderr, "Log buffer too small to dump verifier log %zu bytes (%d tries)!\n",
+				ctx->log_size, tries);
+			return fd;
+		}
+
+		bpf_prog_report(fd, section, prog, ctx);
+	}
+
+	return fd;
+}
+
+static void bpf_map_report(int fd, const char *name,
+			   const struct bpf_elf_map *map,
+			   struct bpf_elf_ctx *ctx)
+{
+	fprintf(stderr, "Map object \'%s\' %s%s (%d)!\n", name,
+		fd < 0 ? "rejected: " : "loaded",
+		fd < 0 ? strerror(errno) : "",
+		fd < 0 ? errno : fd);
+
+	fprintf(stderr, " - Type:         %u\n", map->type);
+	fprintf(stderr, " - Identifier:   %u\n", map->id);
+	fprintf(stderr, " - Pinning:      %u\n", map->pinning);
+	fprintf(stderr, " - Size key:     %u\n", map->size_key);
+	fprintf(stderr, " - Size value:   %u\n", map->size_value);
+	fprintf(stderr, " - Max elems:    %u\n", map->max_elem);
+	fprintf(stderr, " - Flags:        %#x\n\n", map->flags);
+}
+
+static int bpf_map_attach(const char *name, const struct bpf_elf_map *map,
+			  struct bpf_elf_ctx *ctx)
+{
+	int fd, ret;
+
+	fd = bpf_probe_pinned(name, ctx, map->pinning);
+	if (fd > 0) {
+		ret = bpf_map_selfcheck_pinned(fd, map,
+					       offsetof(struct bpf_elf_map,
+							id));
+		if (ret < 0) {
+			close(fd);
+			fprintf(stderr, "Map \'%s\' self-check failed!\n",
+				name);
+			return ret;
+		}
+		if (ctx->verbose)
+			fprintf(stderr, "Map \'%s\' loaded as pinned!\n",
+				name);
+		return fd;
+	}
+
+	errno = 0;
+	fd = bpf_map_create(map->type, map->size_key, map->size_value,
+			    map->max_elem, map->flags);
+	if (fd < 0 || ctx->verbose) {
+		bpf_map_report(fd, name, map, ctx);
+		if (fd < 0)
+			return fd;
+	}
+
+	ret = bpf_place_pinned(fd, name, ctx, map->pinning);
+	if (ret < 0 && errno != EEXIST) {
+		fprintf(stderr, "Could not pin %s map: %s\n", name,
+			strerror(errno));
+		close(fd);
+		return ret;
+	}
+
+	return fd;
+}
+
+static const char *bpf_str_tab_name(const struct bpf_elf_ctx *ctx,
+				    const GElf_Sym *sym)
+{
+	return ctx->str_tab->d_buf + sym->st_name;
+}
+
+static const char *bpf_map_fetch_name(struct bpf_elf_ctx *ctx, int which)
+{
+	GElf_Sym sym;
+	int i;
+
+	for (i = 0; i < ctx->sym_num; i++) {
+		if (gelf_getsym(ctx->sym_tab, i, &sym) != &sym)
+			continue;
+
+		if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
+		    GELF_ST_TYPE(sym.st_info) != STT_NOTYPE ||
+		    sym.st_shndx != ctx->sec_maps ||
+		    sym.st_value / ctx->map_len != which)
+			continue;
+
+		return bpf_str_tab_name(ctx, &sym);
+	}
+
+	return NULL;
+}
+
+static int bpf_maps_attach_all(struct bpf_elf_ctx *ctx)
+{
+	const char *map_name;
+	int i, fd;
+
+	for (i = 0; i < ctx->map_num; i++) {
+		map_name = bpf_map_fetch_name(ctx, i);
+		if (!map_name)
+			return -EIO;
+
+		fd = bpf_map_attach(map_name, &ctx->maps[i], ctx);
+		if (fd < 0)
+			return fd;
+
+		ctx->map_fds[i] = fd;
+	}
+
+	return 0;
+}
+
+static int bpf_map_num_sym(struct bpf_elf_ctx *ctx)
+{
+	int i, num = 0;
+	GElf_Sym sym;
+
+	for (i = 0; i < ctx->sym_num; i++) {
+		if (gelf_getsym(ctx->sym_tab, i, &sym) != &sym)
+			continue;
+
+		if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
+		    GELF_ST_TYPE(sym.st_info) != STT_NOTYPE ||
+		    sym.st_shndx != ctx->sec_maps)
+			continue;
+		num++;
+	}
+
+	return num;
+}
+
+static int bpf_fill_section_data(struct bpf_elf_ctx *ctx, int section,
+				 struct bpf_elf_sec_data *data)
+{
+	Elf_Data *sec_edata;
+	GElf_Shdr sec_hdr;
+	Elf_Scn *sec_fd;
+	char *sec_name;
+
+	memset(data, 0, sizeof(*data));
+
+	sec_fd = elf_getscn(ctx->elf_fd, section);
+	if (!sec_fd)
+		return -EINVAL;
+	if (gelf_getshdr(sec_fd, &sec_hdr) != &sec_hdr)
+		return -EIO;
+
+	sec_name = elf_strptr(ctx->elf_fd, ctx->elf_hdr.e_shstrndx,
+			      sec_hdr.sh_name);
+	if (!sec_name || !sec_hdr.sh_size)
+		return -ENOENT;
+
+	sec_edata = elf_getdata(sec_fd, NULL);
+	if (!sec_edata || elf_getdata(sec_fd, sec_edata))
+		return -EIO;
+
+	memcpy(&data->sec_hdr, &sec_hdr, sizeof(sec_hdr));
+
+	data->sec_name = sec_name;
+	data->sec_data = sec_edata;
+	return 0;
+}
+
+struct bpf_elf_map_min {
+	__u32 type;
+	__u32 size_key;
+	__u32 size_value;
+	__u32 max_elem;
+};
+
+static int bpf_fetch_maps_begin(struct bpf_elf_ctx *ctx, int section,
+				struct bpf_elf_sec_data *data)
+{
+	ctx->map_num = data->sec_data->d_size;
+	ctx->sec_maps = section;
+	ctx->sec_done[section] = true;
+
+	if (ctx->map_num > sizeof(ctx->maps)) {
+		fprintf(stderr, "Too many BPF maps in ELF section!\n");
+		return -ENOMEM;
+	}
+
+	memcpy(ctx->maps, data->sec_data->d_buf, ctx->map_num);
+	return 0;
+}
+
+static int bpf_map_verify_all_offs(struct bpf_elf_ctx *ctx, int end)
+{
+	GElf_Sym sym;
+	int off, i;
+
+	for (off = 0; off < end; off += ctx->map_len) {
+		/* Order doesn't need to be linear here, hence we walk
+		 * the table again.
+		 */
+		for (i = 0; i < ctx->sym_num; i++) {
+			if (gelf_getsym(ctx->sym_tab, i, &sym) != &sym)
+				continue;
+			if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
+			    GELF_ST_TYPE(sym.st_info) != STT_NOTYPE ||
+			    sym.st_shndx != ctx->sec_maps)
+				continue;
+			if (sym.st_value == off)
+				break;
+			if (i == ctx->sym_num - 1)
+				return -1;
+		}
+	}
+
+	return off == end ? 0 : -1;
+}
+
+static int bpf_fetch_maps_end(struct bpf_elf_ctx *ctx)
+{
+	struct bpf_elf_map fixup[ARRAY_SIZE(ctx->maps)] = {};
+	int i, sym_num = bpf_map_num_sym(ctx);
+	__u8 *buff;
+
+	if (sym_num == 0 || sym_num > ARRAY_SIZE(ctx->maps)) {
+		fprintf(stderr, "%u maps not supported in current map section!\n",
+			sym_num);
+		return -EINVAL;
+	}
+
+	if (ctx->map_num % sym_num != 0 ||
+	    ctx->map_num % sizeof(__u32) != 0) {
+		fprintf(stderr, "Number BPF map symbols are not multiple of struct bpf_elf_map!\n");
+		return -EINVAL;
+	}
+
+	ctx->map_len = ctx->map_num / sym_num;
+	if (bpf_map_verify_all_offs(ctx, ctx->map_num)) {
+		fprintf(stderr, "Different struct bpf_elf_map in use!\n");
+		return -EINVAL;
+	}
+
+	if (ctx->map_len == sizeof(struct bpf_elf_map)) {
+		ctx->map_num = sym_num;
+		return 0;
+	} else if (ctx->map_len > sizeof(struct bpf_elf_map)) {
+		fprintf(stderr, "struct bpf_elf_map not supported, coming from future version?\n");
+		return -EINVAL;
+	} else if (ctx->map_len < sizeof(struct bpf_elf_map_min)) {
+		fprintf(stderr, "struct bpf_elf_map too small, not supported!\n");
+		return -EINVAL;
+	}
+
+	ctx->map_num = sym_num;
+	for (i = 0, buff = (void *)ctx->maps; i < ctx->map_num;
+	     i++, buff += ctx->map_len) {
+		/* The fixup leaves the rest of the members as zero, which
+		 * is fine currently, but option exist to set some other
+		 * default value as well when needed in future.
+		 */
+		memcpy(&fixup[i], buff, ctx->map_len);
+	}
+
+	memcpy(ctx->maps, fixup, sizeof(fixup));
+
+	printf("Note: %zu bytes struct bpf_elf_map fixup performed due to size mismatch!\n",
+	       sizeof(struct bpf_elf_map) - ctx->map_len);
+	return 0;
+}
+
+static int bpf_fetch_license(struct bpf_elf_ctx *ctx, int section,
+			     struct bpf_elf_sec_data *data)
+{
+	if (data->sec_data->d_size > sizeof(ctx->license))
+		return -ENOMEM;
+
+	memcpy(ctx->license, data->sec_data->d_buf, data->sec_data->d_size);
+	ctx->sec_done[section] = true;
+	return 0;
+}
+
+static int bpf_fetch_symtab(struct bpf_elf_ctx *ctx, int section,
+			    struct bpf_elf_sec_data *data)
+{
+	ctx->sym_tab = data->sec_data;
+	ctx->sym_num = data->sec_hdr.sh_size / data->sec_hdr.sh_entsize;
+	ctx->sec_done[section] = true;
+	return 0;
+}
+
+static int bpf_fetch_strtab(struct bpf_elf_ctx *ctx, int section,
+			    struct bpf_elf_sec_data *data)
+{
+	ctx->str_tab = data->sec_data;
+	ctx->sec_done[section] = true;
+	return 0;
+}
+
+static bool bpf_has_map_data(const struct bpf_elf_ctx *ctx)
+{
+	return ctx->sym_tab && ctx->str_tab && ctx->sec_maps;
+}
+
+static int bpf_fetch_ancillary(struct bpf_elf_ctx *ctx)
+{
+	struct bpf_elf_sec_data data;
+	int i, ret = -1;
+
+	for (i = 1; i < ctx->elf_hdr.e_shnum; i++) {
+		ret = bpf_fill_section_data(ctx, i, &data);
+		if (ret < 0)
+			continue;
+
+		if (data.sec_hdr.sh_type == SHT_PROGBITS &&
+		    !strcmp(data.sec_name, ELF_SECTION_MAPS))
+			ret = bpf_fetch_maps_begin(ctx, i, &data);
+		else if (data.sec_hdr.sh_type == SHT_PROGBITS &&
+			 !strcmp(data.sec_name, ELF_SECTION_LICENSE))
+			ret = bpf_fetch_license(ctx, i, &data);
+		else if (data.sec_hdr.sh_type == SHT_SYMTAB &&
+			 !strcmp(data.sec_name, ".symtab"))
+			ret = bpf_fetch_symtab(ctx, i, &data);
+		else if (data.sec_hdr.sh_type == SHT_STRTAB &&
+			 !strcmp(data.sec_name, ".strtab"))
+			ret = bpf_fetch_strtab(ctx, i, &data);
+		if (ret < 0) {
+			fprintf(stderr, "Error parsing section %d! Perhaps check with readelf -a?\n",
+				i);
+			return ret;
+		}
+	}
+
+	if (bpf_has_map_data(ctx)) {
+		ret = bpf_fetch_maps_end(ctx);
+		if (ret < 0) {
+			fprintf(stderr, "Error fixing up map structure, incompatible struct bpf_elf_map used?\n");
+			return ret;
+		}
+
+		ret = bpf_maps_attach_all(ctx);
+		if (ret < 0) {
+			fprintf(stderr, "Error loading maps into kernel!\n");
+			return ret;
+		}
+	}
+
+	return ret;
+}
+
+static int bpf_fetch_prog(struct bpf_elf_ctx *ctx, const char *section,
+			  bool *sseen)
+{
+	struct bpf_elf_sec_data data;
+	struct bpf_elf_prog prog;
+	int ret, i, fd = -1;
+
+	for (i = 1; i < ctx->elf_hdr.e_shnum; i++) {
+		if (ctx->sec_done[i])
+			continue;
+
+		ret = bpf_fill_section_data(ctx, i, &data);
+		if (ret < 0 ||
+		    !(data.sec_hdr.sh_type == SHT_PROGBITS &&
+		      data.sec_hdr.sh_flags & SHF_EXECINSTR &&
+		      !strcmp(data.sec_name, section)))
+			continue;
+
+		*sseen = true;
+
+		memset(&prog, 0, sizeof(prog));
+		prog.type    = ctx->type;
+		prog.insns   = data.sec_data->d_buf;
+		prog.size    = data.sec_data->d_size;
+		prog.license = ctx->license;
+
+		fd = bpf_prog_attach(section, &prog, ctx);
+		if (fd < 0)
+			return fd;
+
+		ctx->sec_done[i] = true;
+		break;
+	}
+
+	return fd;
+}
+
+static int bpf_apply_relo_data(struct bpf_elf_ctx *ctx,
+			       struct bpf_elf_sec_data *data_relo,
+			       struct bpf_elf_sec_data *data_insn)
+{
+	Elf_Data *idata = data_insn->sec_data;
+	GElf_Shdr *rhdr = &data_relo->sec_hdr;
+	int relo_ent, relo_num = rhdr->sh_size / rhdr->sh_entsize;
+	struct bpf_insn *insns = idata->d_buf;
+	unsigned int num_insns = idata->d_size / sizeof(*insns);
+
+	for (relo_ent = 0; relo_ent < relo_num; relo_ent++) {
+		unsigned int ioff, rmap;
+		GElf_Rel relo;
+		GElf_Sym sym;
+
+		if (gelf_getrel(data_relo->sec_data, relo_ent, &relo) != &relo)
+			return -EIO;
+
+		ioff = relo.r_offset / sizeof(struct bpf_insn);
+		if (ioff >= num_insns ||
+		    insns[ioff].code != (BPF_LD | BPF_IMM | BPF_DW)) {
+			fprintf(stderr, "ELF contains relo data for non ld64 instruction at offset %u! Compiler bug?!\n",
+				ioff);
+			if (ioff < num_insns &&
+			    insns[ioff].code == (BPF_JMP | BPF_CALL))
+				fprintf(stderr, " - Try to annotate functions with always_inline attribute!\n");
+			return -EINVAL;
+		}
+
+		if (gelf_getsym(ctx->sym_tab, GELF_R_SYM(relo.r_info), &sym) != &sym)
+			return -EIO;
+		if (sym.st_shndx != ctx->sec_maps) {
+			fprintf(stderr, "ELF contains non-map related relo data in entry %u pointing to section %u! Compiler bug?!\n",
+				relo_ent, sym.st_shndx);
+			return -EIO;
+		}
+
+		rmap = sym.st_value / ctx->map_len;
+		if (rmap >= ARRAY_SIZE(ctx->map_fds))
+			return -EINVAL;
+		if (!ctx->map_fds[rmap])
+			return -EINVAL;
+
+		if (ctx->verbose)
+			fprintf(stderr, "Map \'%s\' (%d) injected into prog section \'%s\' at offset %u!\n",
+				bpf_str_tab_name(ctx, &sym), ctx->map_fds[rmap],
+				data_insn->sec_name, ioff);
+
+		insns[ioff].src_reg = BPF_PSEUDO_MAP_FD;
+		insns[ioff].imm     = ctx->map_fds[rmap];
+	}
+
+	return 0;
+}
+
+static int bpf_fetch_prog_relo(struct bpf_elf_ctx *ctx, const char *section,
+			       bool *lderr, bool *sseen)
+{
+	struct bpf_elf_sec_data data_relo, data_insn;
+	struct bpf_elf_prog prog;
+	int ret, idx, i, fd = -1;
+
+	for (i = 1; i < ctx->elf_hdr.e_shnum; i++) {
+		ret = bpf_fill_section_data(ctx, i, &data_relo);
+		if (ret < 0 || data_relo.sec_hdr.sh_type != SHT_REL)
+			continue;
+
+		idx = data_relo.sec_hdr.sh_info;
+
+		ret = bpf_fill_section_data(ctx, idx, &data_insn);
+		if (ret < 0 ||
+		    !(data_insn.sec_hdr.sh_type == SHT_PROGBITS &&
+		      data_insn.sec_hdr.sh_flags & SHF_EXECINSTR &&
+		      !strcmp(data_insn.sec_name, section)))
+			continue;
+
+		*sseen = true;
+
+		ret = bpf_apply_relo_data(ctx, &data_relo, &data_insn);
+		if (ret < 0)
+			return ret;
+
+		memset(&prog, 0, sizeof(prog));
+		prog.type    = ctx->type;
+		prog.insns   = data_insn.sec_data->d_buf;
+		prog.size    = data_insn.sec_data->d_size;
+		prog.license = ctx->license;
+
+		fd = bpf_prog_attach(section, &prog, ctx);
+		if (fd < 0) {
+			*lderr = true;
+			return fd;
+		}
+
+		ctx->sec_done[i]   = true;
+		ctx->sec_done[idx] = true;
+		break;
+	}
+
+	return fd;
+}
+
+static int bpf_fetch_prog_sec(struct bpf_elf_ctx *ctx, const char *section)
+{
+	bool lderr = false, sseen = false;
+	int ret = -1;
+
+	if (bpf_has_map_data(ctx))
+		ret = bpf_fetch_prog_relo(ctx, section, &lderr, &sseen);
+	if (ret < 0 && !lderr)
+		ret = bpf_fetch_prog(ctx, section, &sseen);
+	if (ret < 0 && !sseen)
+		fprintf(stderr, "Program section \'%s\' not found in ELF file!\n",
+			section);
+	return ret;
+}
+
+static int bpf_find_map_by_id(struct bpf_elf_ctx *ctx, uint32_t id)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(ctx->map_fds); i++)
+		if (ctx->map_fds[i] && ctx->maps[i].id == id &&
+		    ctx->maps[i].type == BPF_MAP_TYPE_PROG_ARRAY)
+			return i;
+	return -1;
+}
+
+static int bpf_fill_prog_arrays(struct bpf_elf_ctx *ctx)
+{
+	struct bpf_elf_sec_data data;
+	uint32_t map_id, key_id;
+	int fd, i, ret, idx;
+
+	for (i = 1; i < ctx->elf_hdr.e_shnum; i++) {
+		if (ctx->sec_done[i])
+			continue;
+
+		ret = bpf_fill_section_data(ctx, i, &data);
+		if (ret < 0)
+			continue;
+
+		ret = sscanf(data.sec_name, "%i/%i", &map_id, &key_id);
+		if (ret != 2)
+			continue;
+
+		idx = bpf_find_map_by_id(ctx, map_id);
+		if (idx < 0)
+			continue;
+
+		fd = bpf_fetch_prog_sec(ctx, data.sec_name);
+		if (fd < 0)
+			return -EIO;
+
+		ret = bpf_map_update(ctx->map_fds[idx], &key_id,
+				     &fd, BPF_ANY);
+		if (ret < 0) {
+			if (errno == E2BIG)
+				fprintf(stderr, "Tail call key %u for map %u out of bounds?\n",
+					key_id, map_id);
+			return -errno;
+		}
+
+		ctx->sec_done[i] = true;
+	}
+
+	return 0;
+}
+
+static void bpf_save_finfo(struct bpf_elf_ctx *ctx)
+{
+	struct stat st;
+	int ret;
+
+	memset(&ctx->stat, 0, sizeof(ctx->stat));
+
+	ret = fstat(ctx->obj_fd, &st);
+	if (ret < 0) {
+		fprintf(stderr, "Stat of elf file failed: %s\n",
+			strerror(errno));
+		return;
+	}
+
+	ctx->stat.st_dev = st.st_dev;
+	ctx->stat.st_ino = st.st_ino;
+}
+
+static int bpf_read_pin_mapping(FILE *fp, uint32_t *id, char *path)
+{
+	char buff[PATH_MAX];
+
+	while (fgets(buff, sizeof(buff), fp)) {
+		char *ptr = buff;
+
+		while (*ptr == ' ' || *ptr == '\t')
+			ptr++;
+
+		if (*ptr == '#' || *ptr == '\n' || *ptr == 0)
+			continue;
+
+		if (sscanf(ptr, "%i %s\n", id, path) != 2 &&
+		    sscanf(ptr, "%i %s #", id, path) != 2) {
+			strcpy(path, ptr);
+			return -1;
+		}
+
+		return 1;
+	}
+
+	return 0;
+}
+
+static bool bpf_pinning_reserved(uint32_t pinning)
+{
+	switch (pinning) {
+	case PIN_NONE:
+	case PIN_OBJECT_NS:
+	case PIN_GLOBAL_NS:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void bpf_hash_init(struct bpf_elf_ctx *ctx, const char *db_file)
+{
+	struct bpf_hash_entry *entry;
+	char subpath[PATH_MAX] = {};
+	uint32_t pinning;
+	FILE *fp;
+	int ret;
+
+	fp = fopen(db_file, "r");
+	if (!fp)
+		return;
+
+	while ((ret = bpf_read_pin_mapping(fp, &pinning, subpath))) {
+		if (ret == -1) {
+			fprintf(stderr, "Database %s is corrupted at: %s\n",
+				db_file, subpath);
+			fclose(fp);
+			return;
+		}
+
+		if (bpf_pinning_reserved(pinning)) {
+			fprintf(stderr, "Database %s, id %u is reserved - ignoring!\n",
+				db_file, pinning);
+			continue;
+		}
+
+		entry = malloc(sizeof(*entry));
+		if (!entry) {
+			fprintf(stderr, "No memory left for db entry!\n");
+			continue;
+		}
+
+		entry->pinning = pinning;
+		entry->subpath = strdup(subpath);
+		if (!entry->subpath) {
+			fprintf(stderr, "No memory left for db entry!\n");
+			free(entry);
+			continue;
+		}
+
+		entry->next = ctx->ht[pinning & (ARRAY_SIZE(ctx->ht) - 1)];
+		ctx->ht[pinning & (ARRAY_SIZE(ctx->ht) - 1)] = entry;
+	}
+
+	fclose(fp);
+}
+
+static void bpf_hash_destroy(struct bpf_elf_ctx *ctx)
+{
+	struct bpf_hash_entry *entry;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(ctx->ht); i++) {
+		while ((entry = ctx->ht[i]) != NULL) {
+			ctx->ht[i] = entry->next;
+			free((char *)entry->subpath);
+			free(entry);
+		}
+	}
+}
+
+static int bpf_elf_check_ehdr(const struct bpf_elf_ctx *ctx)
+{
+	if (ctx->elf_hdr.e_type != ET_REL ||
+	    (ctx->elf_hdr.e_machine != EM_NONE &&
+	     ctx->elf_hdr.e_machine != EM_BPF) ||
+	    ctx->elf_hdr.e_version != EV_CURRENT) {
+		fprintf(stderr, "ELF format error, ELF file not for eBPF?\n");
+		return -EINVAL;
+	}
+
+	switch (ctx->elf_hdr.e_ident[EI_DATA]) {
+	default:
+		fprintf(stderr, "ELF format error, wrong endianness info?\n");
+		return -EINVAL;
+	case ELFDATA2LSB:
+		if (htons(1) == 1) {
+			fprintf(stderr,
+				"We are big endian, eBPF object is little endian!\n");
+			return -EIO;
+		}
+		break;
+	case ELFDATA2MSB:
+		if (htons(1) != 1) {
+			fprintf(stderr,
+				"We are little endian, eBPF object is big endian!\n");
+			return -EIO;
+		}
+		break;
+	}
+
+	return 0;
+}
+
+static int bpf_elf_ctx_init(struct bpf_elf_ctx *ctx, const char *pathname,
+			    enum bpf_prog_type type, bool verbose)
+{
+	int ret = -EINVAL;
+
+	if (elf_version(EV_CURRENT) == EV_NONE ||
+	    bpf_init_env(pathname))
+		return ret;
+
+	memset(ctx, 0, sizeof(*ctx));
+	ctx->verbose = verbose;
+	ctx->type    = type;
+
+	ctx->obj_fd = open(pathname, O_RDONLY);
+	if (ctx->obj_fd < 0)
+		return ctx->obj_fd;
+
+	ctx->elf_fd = elf_begin(ctx->obj_fd, ELF_C_READ, NULL);
+	if (!ctx->elf_fd) {
+		ret = -EINVAL;
+		goto out_fd;
+	}
+
+	if (elf_kind(ctx->elf_fd) != ELF_K_ELF) {
+		ret = -EINVAL;
+		goto out_fd;
+	}
+
+	if (gelf_getehdr(ctx->elf_fd, &ctx->elf_hdr) !=
+	    &ctx->elf_hdr) {
+		ret = -EIO;
+		goto out_elf;
+	}
+
+	ret = bpf_elf_check_ehdr(ctx);
+	if (ret < 0)
+		goto out_elf;
+
+	ctx->sec_done = calloc(ctx->elf_hdr.e_shnum,
+			       sizeof(*(ctx->sec_done)));
+	if (!ctx->sec_done) {
+		ret = -ENOMEM;
+		goto out_elf;
+	}
+
+	if (ctx->verbose && bpf_log_realloc(ctx)) {
+		ret = -ENOMEM;
+		goto out_free;
+	}
+
+	bpf_save_finfo(ctx);
+	bpf_hash_init(ctx, CONFDIR "/bpf_pinning");
+
+	return 0;
+out_free:
+	free(ctx->sec_done);
+out_elf:
+	elf_end(ctx->elf_fd);
+out_fd:
+	close(ctx->obj_fd);
+	return ret;
+}
+
+static int bpf_maps_count(struct bpf_elf_ctx *ctx)
+{
+	int i, count = 0;
+
+	for (i = 0; i < ARRAY_SIZE(ctx->map_fds); i++) {
+		if (!ctx->map_fds[i])
+			break;
+		count++;
+	}
+
+	return count;
+}
+
+static void bpf_maps_teardown(struct bpf_elf_ctx *ctx)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(ctx->map_fds); i++) {
+		if (ctx->map_fds[i])
+			close(ctx->map_fds[i]);
+	}
+}
+
+static void bpf_elf_ctx_destroy(struct bpf_elf_ctx *ctx, bool failure)
+{
+	if (failure)
+		bpf_maps_teardown(ctx);
+
+	bpf_hash_destroy(ctx);
+
+	free(ctx->sec_done);
+	free(ctx->log);
+
+	elf_end(ctx->elf_fd);
+	close(ctx->obj_fd);
+}
+
+static struct bpf_elf_ctx __ctx;
+
+static int bpf_obj_open(const char *pathname, enum bpf_prog_type type,
+			const char *section, bool verbose)
+{
+	struct bpf_elf_ctx *ctx = &__ctx;
+	int fd = 0, ret;
+
+	ret = bpf_elf_ctx_init(ctx, pathname, type, verbose);
+	if (ret < 0) {
+		fprintf(stderr, "Cannot initialize ELF context!\n");
+		return ret;
+	}
+
+	ret = bpf_fetch_ancillary(ctx);
+	if (ret < 0) {
+		fprintf(stderr, "Error fetching ELF ancillary data!\n");
+		goto out;
+	}
+
+	fd = bpf_fetch_prog_sec(ctx, section);
+	if (fd < 0) {
+		fprintf(stderr, "Error fetching program/map!\n");
+		ret = fd;
+		goto out;
+	}
+
+	ret = bpf_fill_prog_arrays(ctx);
+	if (ret < 0)
+		fprintf(stderr, "Error filling program arrays!\n");
+out:
+	bpf_elf_ctx_destroy(ctx, ret < 0);
+	if (ret < 0) {
+		if (fd)
+			close(fd);
+		return ret;
+	}
+
+	return fd;
+}
+
+static int
+bpf_map_set_send(int fd, struct sockaddr_un *addr, unsigned int addr_len,
+		 const struct bpf_map_data *aux, unsigned int entries)
+{
+	struct bpf_map_set_msg msg = {
+		.aux.uds_ver = BPF_SCM_AUX_VER,
+		.aux.num_ent = entries,
+	};
+	int *cmsg_buf, min_fd;
+	char *amsg_buf;
+	int i;
+
+	strncpy(msg.aux.obj_name, aux->obj, sizeof(msg.aux.obj_name));
+	memcpy(&msg.aux.obj_st, aux->st, sizeof(msg.aux.obj_st));
+
+	cmsg_buf = bpf_map_set_init(&msg, addr, addr_len);
+	amsg_buf = (char *)msg.aux.ent;
+
+	for (i = 0; i < entries; i += min_fd) {
+		int ret;
+
+		min_fd = min(BPF_SCM_MAX_FDS * 1U, entries - i);
+		bpf_map_set_init_single(&msg, min_fd);
+
+		memcpy(cmsg_buf, &aux->fds[i], sizeof(aux->fds[0]) * min_fd);
+		memcpy(amsg_buf, &aux->ent[i], sizeof(aux->ent[0]) * min_fd);
+
+		ret = sendmsg(fd, &msg.hdr, 0);
+		if (ret <= 0)
+			return ret ? : -1;
+	}
+
+	return 0;
+}
+
+static int
+bpf_map_set_recv(int fd, int *fds,  struct bpf_map_aux *aux,
+		 unsigned int entries)
+{
+	struct bpf_map_set_msg msg;
+	int *cmsg_buf, min_fd;
+	char *amsg_buf, *mmsg_buf;
+	unsigned int needed = 1;
+	int i;
+
+	cmsg_buf = bpf_map_set_init(&msg, NULL, 0);
+	amsg_buf = (char *)msg.aux.ent;
+	mmsg_buf = (char *)&msg.aux;
+
+	for (i = 0; i < min(entries, needed); i += min_fd) {
+		struct cmsghdr *cmsg;
+		int ret;
+
+		min_fd = min(entries, entries - i);
+		bpf_map_set_init_single(&msg, min_fd);
+
+		ret = recvmsg(fd, &msg.hdr, 0);
+		if (ret <= 0)
+			return ret ? : -1;
+
+		cmsg = CMSG_FIRSTHDR(&msg.hdr);
+		if (!cmsg || cmsg->cmsg_type != SCM_RIGHTS)
+			return -EINVAL;
+		if (msg.hdr.msg_flags & MSG_CTRUNC)
+			return -EIO;
+		if (msg.aux.uds_ver != BPF_SCM_AUX_VER)
+			return -ENOSYS;
+
+		min_fd = (cmsg->cmsg_len - sizeof(*cmsg)) / sizeof(fd);
+		if (min_fd > entries || min_fd <= 0)
+			return -EINVAL;
+
+		memcpy(&fds[i], cmsg_buf, sizeof(fds[0]) * min_fd);
+		memcpy(&aux->ent[i], amsg_buf, sizeof(aux->ent[0]) * min_fd);
+		memcpy(aux, mmsg_buf, offsetof(struct bpf_map_aux, ent));
+
+		needed = aux->num_ent;
+	}
+
+	return 0;
+}
+
+int bpf_send_map_fds(const char *path, const char *obj)
+{
+	struct bpf_elf_ctx *ctx = &__ctx;
+	struct sockaddr_un addr = { .sun_family = AF_UNIX };
+	struct bpf_map_data bpf_aux = {
+		.fds = ctx->map_fds,
+		.ent = ctx->maps,
+		.st  = &ctx->stat,
+		.obj = obj,
+	};
+	int fd, ret;
+
+	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		fprintf(stderr, "Cannot open socket: %s\n",
+			strerror(errno));
+		return -1;
+	}
+
+	strncpy(addr.sun_path, path, sizeof(addr.sun_path));
+
+	ret = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
+	if (ret < 0) {
+		fprintf(stderr, "Cannot connect to %s: %s\n",
+			path, strerror(errno));
+		return -1;
+	}
+
+	ret = bpf_map_set_send(fd, &addr, sizeof(addr), &bpf_aux,
+			       bpf_maps_count(ctx));
+	if (ret < 0)
+		fprintf(stderr, "Cannot send fds to %s: %s\n",
+			path, strerror(errno));
+
+	bpf_maps_teardown(ctx);
+	close(fd);
+	return ret;
+}
+
+int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
+		     unsigned int entries)
+{
+	struct sockaddr_un addr = { .sun_family = AF_UNIX };
+	int fd, ret;
+
+	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		fprintf(stderr, "Cannot open socket: %s\n",
+			strerror(errno));
+		return -1;
+	}
+
+	strncpy(addr.sun_path, path, sizeof(addr.sun_path));
+
+	ret = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
+	if (ret < 0) {
+		fprintf(stderr, "Cannot bind to socket: %s\n",
+			strerror(errno));
+		return -1;
+	}
+
+	ret = bpf_map_set_recv(fd, fds, aux, entries);
+	if (ret < 0)
+		fprintf(stderr, "Cannot recv fds from %s: %s\n",
+			path, strerror(errno));
+
+	unlink(addr.sun_path);
+	close(fd);
+	return ret;
+}
+#endif /* HAVE_ELF */
diff --git a/tc/Makefile b/tc/Makefile
index dfa875b..f986fcb 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -1,5 +1,5 @@
 TCOBJ= tc.o tc_qdisc.o tc_class.o tc_filter.o tc_util.o tc_monitor.o \
-       tc_exec.o tc_bpf.o m_police.o m_estimator.o m_action.o m_ematch.o \
+       tc_exec.o m_police.o m_estimator.o m_action.o m_ematch.o \
        emp_ematch.yacc.o emp_ematch.lex.o
 
 include ../Config
@@ -94,11 +94,6 @@ ifneq ($(TC_CONFIG_NO_XT),y)
   endif
 endif
 
-ifeq ($(TC_CONFIG_ELF),y)
-  CFLAGS += -DHAVE_ELF
-  LDLIBS += -lelf
-endif
-
 TCOBJ += $(TCMODULES)
 LDLIBS += -L. -ltc -lm
 
diff --git a/tc/e_bpf.c b/tc/e_bpf.c
index d1f5d87..84f43e6 100644
--- a/tc/e_bpf.c
+++ b/tc/e_bpf.c
@@ -15,8 +15,8 @@
 #include "utils.h"
 
 #include "tc_util.h"
-#include "tc_bpf.h"
 
+#include "bpf_util.h"
 #include "bpf_elf.h"
 #include "bpf_scm.h"
 
diff --git a/tc/f_bpf.c b/tc/f_bpf.c
index 665bc66..c4764d8 100644
--- a/tc/f_bpf.c
+++ b/tc/f_bpf.c
@@ -6,7 +6,7 @@
  *		as published by the Free Software Foundation; either version
  *		2 of the License, or (at your option) any later version.
  *
- * Authors:	Daniel Borkmann <dborkman@redhat.com>
+ * Authors:	Daniel Borkmann <daniel@iogearbox.net>
  */
 
 #include <stdio.h>
@@ -15,18 +15,12 @@
 #include <linux/bpf.h>
 
 #include "utils.h"
+
 #include "tc_util.h"
-#include "tc_bpf.h"
+#include "bpf_util.h"
 
 static const enum bpf_prog_type bpf_type = BPF_PROG_TYPE_SCHED_CLS;
 
-static const int nla_tbl[BPF_NLA_MAX] = {
-	[BPF_NLA_OPS_LEN]	= TCA_BPF_OPS_LEN,
-	[BPF_NLA_OPS]		= TCA_BPF_OPS,
-	[BPF_NLA_FD]		= TCA_BPF_FD,
-	[BPF_NLA_NAME]		= TCA_BPF_NAME,
-};
-
 static void explain(void)
 {
 	fprintf(stderr, "Usage: ... bpf ...\n");
@@ -52,7 +46,7 @@ static void explain(void)
 	fprintf(stderr, "pinned eBPF program.\n");
 	fprintf(stderr, "\n");
 	fprintf(stderr, "Where CLS_NAME refers to the section name containing the\n");
-	fprintf(stderr, "classifier (default \'%s\').\n", bpf_default_section(bpf_type));
+	fprintf(stderr, "classifier (default \'%s\').\n", bpf_prog_to_default_section(bpf_type));
 	fprintf(stderr, "\n");
 	fprintf(stderr, "Where UDS_FILE points to a unix domain socket file in order\n");
 	fprintf(stderr, "to hand off control of all created eBPF maps to an agent.\n");
@@ -61,6 +55,24 @@ static void explain(void)
 	fprintf(stderr, "NOTE: CLASSID is parsed as hexadecimal input.\n");
 }
 
+static void bpf_cbpf_cb(void *nl, const struct sock_filter *ops, int ops_len)
+{
+	addattr16(nl, MAX_MSG, TCA_BPF_OPS_LEN, ops_len);
+	addattr_l(nl, MAX_MSG, TCA_BPF_OPS, ops,
+		  ops_len * sizeof(struct sock_filter));
+}
+
+static void bpf_ebpf_cb(void *nl, int fd, const char *annotation)
+{
+	addattr32(nl, MAX_MSG, TCA_BPF_FD, fd);
+	addattrstrz(nl, MAX_MSG, TCA_BPF_NAME, annotation);
+}
+
+static const struct bpf_cfg_ops bpf_cb_ops = {
+	.cbpf_cb = bpf_cbpf_cb,
+	.ebpf_cb = bpf_ebpf_cb,
+};
+
 static int bpf_parse_opt(struct filter_util *qu, char *handle,
 			 int argc, char **argv, struct nlmsghdr *n)
 {
@@ -68,6 +80,7 @@ static int bpf_parse_opt(struct filter_util *qu, char *handle,
 	struct tcmsg *t = NLMSG_DATA(n);
 	unsigned int bpf_gen_flags = 0;
 	unsigned int bpf_flags = 0;
+	struct bpf_cfg_in cfg = {};
 	bool seen_run = false;
 	struct rtattr *tail;
 	int ret = 0;
@@ -90,11 +103,17 @@ static int bpf_parse_opt(struct filter_util *qu, char *handle,
 			NEXT_ARG();
 opt_bpf:
 			seen_run = true;
-			if (bpf_parse_common(&argc, &argv, nla_tbl, bpf_type,
-					     &bpf_obj, &bpf_uds_name, n)) {
-				fprintf(stderr, "Failed to retrieve (e)BPF data!\n");
+			cfg.argc = argc;
+			cfg.argv = argv;
+
+			if (bpf_parse_common(bpf_type, &cfg, &bpf_cb_ops, n))
 				return -1;
-			}
+
+			argc = cfg.argc;
+			argv = cfg.argv;
+
+			bpf_obj = cfg.object;
+			bpf_uds_name = cfg.uds;
 		} else if (matches(*argv, "classid") == 0 ||
 			   matches(*argv, "flowid") == 0) {
 			unsigned int handle;
@@ -143,7 +162,7 @@ opt_bpf:
 
 	if (bpf_gen_flags)
 		addattr32(n, MAX_MSG, TCA_BPF_FLAGS_GEN, bpf_gen_flags);
-	if (bpf_obj && bpf_flags)
+	if (bpf_flags)
 		addattr32(n, MAX_MSG, TCA_BPF_FLAGS, bpf_flags);
 
 	tail->rta_len = (((void *)n) + n->nlmsg_len) - (void *)tail;
@@ -175,8 +194,6 @@ static int bpf_print_opt(struct filter_util *qu, FILE *f,
 
 	if (tb[TCA_BPF_NAME])
 		fprintf(f, "%s ", rta_getattr_str(tb[TCA_BPF_NAME]));
-	else if (tb[TCA_BPF_FD])
-		fprintf(f, "pfd %u ", rta_getattr_u32(tb[TCA_BPF_FD]));
 
 	if (tb[TCA_BPF_FLAGS]) {
 		unsigned int flags = rta_getattr_u32(tb[TCA_BPF_FLAGS]);
@@ -195,20 +212,17 @@ static int bpf_print_opt(struct filter_util *qu, FILE *f,
 			fprintf(f, "skip_sw ");
 	}
 
-	if (tb[TCA_BPF_OPS] && tb[TCA_BPF_OPS_LEN]) {
+	if (tb[TCA_BPF_OPS] && tb[TCA_BPF_OPS_LEN])
 		bpf_print_ops(f, tb[TCA_BPF_OPS],
 			      rta_getattr_u16(tb[TCA_BPF_OPS_LEN]));
-		fprintf(f, "\n");
-	}
 
 	if (tb[TCA_BPF_POLICE]) {
 		fprintf(f, "\n");
 		tc_print_police(f, tb[TCA_BPF_POLICE]);
 	}
 
-	if (tb[TCA_BPF_ACT]) {
+	if (tb[TCA_BPF_ACT])
 		tc_print_action(f, tb[TCA_BPF_ACT]);
-	}
 
 	return 0;
 }
diff --git a/tc/m_bpf.c b/tc/m_bpf.c
index 9bf2a85..e26b85d 100644
--- a/tc/m_bpf.c
+++ b/tc/m_bpf.c
@@ -17,18 +17,12 @@
 #include <linux/tc_act/tc_bpf.h>
 
 #include "utils.h"
+
 #include "tc_util.h"
-#include "tc_bpf.h"
+#include "bpf_util.h"
 
 static const enum bpf_prog_type bpf_type = BPF_PROG_TYPE_SCHED_ACT;
 
-static const int nla_tbl[BPF_NLA_MAX] = {
-	[BPF_NLA_OPS_LEN]	= TCA_ACT_BPF_OPS_LEN,
-	[BPF_NLA_OPS]		= TCA_ACT_BPF_OPS,
-	[BPF_NLA_FD]		= TCA_ACT_BPF_FD,
-	[BPF_NLA_NAME]		= TCA_ACT_BPF_NAME,
-};
-
 static void explain(void)
 {
 	fprintf(stderr, "Usage: ... bpf ... [ index INDEX ]\n");
@@ -50,7 +44,7 @@ static void explain(void)
 	fprintf(stderr, "pinned eBPF program.\n");
 	fprintf(stderr, "\n");
 	fprintf(stderr, "Where ACT_NAME refers to the section name containing the\n");
-	fprintf(stderr, "action (default \'%s\').\n", bpf_default_section(bpf_type));
+	fprintf(stderr, "action (default \'%s\').\n", bpf_prog_to_default_section(bpf_type));
 	fprintf(stderr, "\n");
 	fprintf(stderr, "Where UDS_FILE points to a unix domain socket file in order\n");
 	fprintf(stderr, "to hand off control of all created eBPF maps to an agent.\n");
@@ -59,11 +53,30 @@ static void explain(void)
 	fprintf(stderr, "explicitly specifies an action index upon creation.\n");
 }
 
+static void bpf_cbpf_cb(void *nl, const struct sock_filter *ops, int ops_len)
+{
+	addattr16(nl, MAX_MSG, TCA_ACT_BPF_OPS_LEN, ops_len);
+	addattr_l(nl, MAX_MSG, TCA_ACT_BPF_OPS, ops,
+		  ops_len * sizeof(struct sock_filter));
+}
+
+static void bpf_ebpf_cb(void *nl, int fd, const char *annotation)
+{
+	addattr32(nl, MAX_MSG, TCA_ACT_BPF_FD, fd);
+	addattrstrz(nl, MAX_MSG, TCA_ACT_BPF_NAME, annotation);
+}
+
+static const struct bpf_cfg_ops bpf_cb_ops = {
+	.cbpf_cb = bpf_cbpf_cb,
+	.ebpf_cb = bpf_ebpf_cb,
+};
+
 static int bpf_parse_opt(struct action_util *a, int *ptr_argc, char ***ptr_argv,
 			 int tca_id, struct nlmsghdr *n)
 {
 	const char *bpf_obj = NULL, *bpf_uds_name = NULL;
 	struct tc_act_bpf parm = { .action = TC_ACT_PIPE };
+	struct bpf_cfg_in cfg = {};
 	bool seen_run = false;
 	struct rtattr *tail;
 	int argc, ret = 0;
@@ -85,11 +98,17 @@ static int bpf_parse_opt(struct action_util *a, int *ptr_argc, char ***ptr_argv,
 			NEXT_ARG();
 opt_bpf:
 			seen_run = true;
-			if (bpf_parse_common(&argc, &argv, nla_tbl, bpf_type,
-					     &bpf_obj, &bpf_uds_name, n)) {
-				fprintf(stderr, "Failed to retrieve (e)BPF data!\n");
+			cfg.argc = argc;
+			cfg.argv = argv;
+
+			if (bpf_parse_common(bpf_type, &cfg, &bpf_cb_ops, n))
 				return -1;
-			}
+
+			argc = cfg.argc;
+			argv = cfg.argv;
+
+			bpf_obj = cfg.object;
+			bpf_uds_name = cfg.uds;
 		} else if (matches(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -151,8 +170,6 @@ static int bpf_print_opt(struct action_util *au, FILE *f, struct rtattr *arg)
 
 	if (tb[TCA_ACT_BPF_NAME])
 		fprintf(f, "%s ", rta_getattr_str(tb[TCA_ACT_BPF_NAME]));
-	else if (tb[TCA_ACT_BPF_FD])
-		fprintf(f, "pfd %u ", rta_getattr_u32(tb[TCA_ACT_BPF_FD]));
 
 	if (tb[TCA_ACT_BPF_OPS] && tb[TCA_ACT_BPF_OPS_LEN]) {
 		bpf_print_ops(f, tb[TCA_ACT_BPF_OPS],
diff --git a/tc/tc_bpf.c b/tc/tc_bpf.c
deleted file mode 100644
index b390f7e..0000000
--- a/tc/tc_bpf.c
+++ /dev/null
@@ -1,2010 +0,0 @@
-/*
- * tc_bpf.c	BPF common code
- *
- *		This program is free software; you can distribute it and/or
- *		modify it under the terms of the GNU General Public License
- *		as published by the Free Software Foundation; either version
- *		2 of the License, or (at your option) any later version.
- *
- * Authors:	Daniel Borkmann <dborkman@redhat.com>
- *		Jiri Pirko <jiri@resnulli.us>
- *		Alexei Starovoitov <ast@plumgrid.com>
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <string.h>
-#include <stdbool.h>
-#include <stdint.h>
-#include <errno.h>
-#include <fcntl.h>
-#include <stdarg.h>
-#include <limits.h>
-
-#ifdef HAVE_ELF
-#include <libelf.h>
-#include <gelf.h>
-#endif
-
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <sys/un.h>
-#include <sys/vfs.h>
-#include <sys/mount.h>
-#include <sys/syscall.h>
-#include <sys/sendfile.h>
-#include <sys/resource.h>
-
-#include <linux/bpf.h>
-#include <linux/filter.h>
-#include <linux/if_alg.h>
-
-#include <arpa/inet.h>
-
-#include "utils.h"
-
-#include "bpf_elf.h"
-#include "bpf_scm.h"
-
-#include "tc_util.h"
-#include "tc_bpf.h"
-
-#ifndef AF_ALG
-#define AF_ALG 38
-#endif
-
-#ifndef EM_BPF
-#define EM_BPF 247
-#endif
-
-#ifdef HAVE_ELF
-static int bpf_obj_open(const char *path, enum bpf_prog_type type,
-			const char *sec, bool verbose);
-#else
-static int bpf_obj_open(const char *path, enum bpf_prog_type type,
-			const char *sec, bool verbose)
-{
-	fprintf(stderr, "No ELF library support compiled in.\n");
-	errno = ENOSYS;
-	return -1;
-}
-#endif
-
-static inline __u64 bpf_ptr_to_u64(const void *ptr)
-{
-	return (__u64)(unsigned long)ptr;
-}
-
-static int bpf(int cmd, union bpf_attr *attr, unsigned int size)
-{
-#ifdef __NR_bpf
-	return syscall(__NR_bpf, cmd, attr, size);
-#else
-	fprintf(stderr, "No bpf syscall, kernel headers too old?\n");
-	errno = ENOSYS;
-	return -1;
-#endif
-}
-
-static int bpf_map_update(int fd, const void *key, const void *value,
-			  uint64_t flags)
-{
-	union bpf_attr attr = {};
-
-	attr.map_fd = fd;
-	attr.key = bpf_ptr_to_u64(key);
-	attr.value = bpf_ptr_to_u64(value);
-	attr.flags = flags;
-
-	return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
-}
-
-static int bpf_parse_string(char *arg, bool from_file, __u16 *bpf_len,
-			    char **bpf_string, bool *need_release,
-			    const char separator)
-{
-	char sp;
-
-	if (from_file) {
-		size_t tmp_len, op_len = sizeof("65535 255 255 4294967295,");
-		char *tmp_string;
-		FILE *fp;
-
-		tmp_len = sizeof("4096,") + BPF_MAXINSNS * op_len;
-		tmp_string = calloc(1, tmp_len);
-		if (tmp_string == NULL)
-			return -ENOMEM;
-
-		fp = fopen(arg, "r");
-		if (fp == NULL) {
-			perror("Cannot fopen");
-			free(tmp_string);
-			return -ENOENT;
-		}
-
-		if (!fgets(tmp_string, tmp_len, fp)) {
-			free(tmp_string);
-			fclose(fp);
-			return -EIO;
-		}
-
-		fclose(fp);
-
-		*need_release = true;
-		*bpf_string = tmp_string;
-	} else {
-		*need_release = false;
-		*bpf_string = arg;
-	}
-
-	if (sscanf(*bpf_string, "%hu%c", bpf_len, &sp) != 2 ||
-	    sp != separator) {
-		if (*need_release)
-			free(*bpf_string);
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-static int bpf_ops_parse(int argc, char **argv, struct sock_filter *bpf_ops,
-			 bool from_file)
-{
-	char *bpf_string, *token, separator = ',';
-	int ret = 0, i = 0;
-	bool need_release;
-	__u16 bpf_len = 0;
-
-	if (argc < 1)
-		return -EINVAL;
-	if (bpf_parse_string(argv[0], from_file, &bpf_len, &bpf_string,
-			     &need_release, separator))
-		return -EINVAL;
-	if (bpf_len == 0 || bpf_len > BPF_MAXINSNS) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	token = bpf_string;
-	while ((token = strchr(token, separator)) && (++token)[0]) {
-		if (i >= bpf_len) {
-			fprintf(stderr, "Real program length exceeds encoded length parameter!\n");
-			ret = -EINVAL;
-			goto out;
-		}
-
-		if (sscanf(token, "%hu %hhu %hhu %u,",
-			   &bpf_ops[i].code, &bpf_ops[i].jt,
-			   &bpf_ops[i].jf, &bpf_ops[i].k) != 4) {
-			fprintf(stderr, "Error at instruction %d!\n", i);
-			ret = -EINVAL;
-			goto out;
-		}
-
-		i++;
-	}
-
-	if (i != bpf_len) {
-		fprintf(stderr, "Parsed program length is less than encoded length parameter!\n");
-		ret = -EINVAL;
-		goto out;
-	}
-	ret = bpf_len;
-out:
-	if (need_release)
-		free(bpf_string);
-
-	return ret;
-}
-
-void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len)
-{
-	struct sock_filter *ops = (struct sock_filter *) RTA_DATA(bpf_ops);
-	int i;
-
-	if (len == 0)
-		return;
-
-	fprintf(f, "bytecode \'%u,", len);
-
-	for (i = 0; i < len - 1; i++)
-		fprintf(f, "%hu %hhu %hhu %u,", ops[i].code, ops[i].jt,
-			ops[i].jf, ops[i].k);
-
-	fprintf(f, "%hu %hhu %hhu %u\'", ops[i].code, ops[i].jt,
-		ops[i].jf, ops[i].k);
-}
-
-static void bpf_map_pin_report(const struct bpf_elf_map *pin,
-			       const struct bpf_elf_map *obj)
-{
-	fprintf(stderr, "Map specification differs from pinned file!\n");
-
-	if (obj->type != pin->type)
-		fprintf(stderr, " - Type:         %u (obj) != %u (pin)\n",
-			obj->type, pin->type);
-	if (obj->size_key != pin->size_key)
-		fprintf(stderr, " - Size key:     %u (obj) != %u (pin)\n",
-			obj->size_key, pin->size_key);
-	if (obj->size_value != pin->size_value)
-		fprintf(stderr, " - Size value:   %u (obj) != %u (pin)\n",
-			obj->size_value, pin->size_value);
-	if (obj->max_elem != pin->max_elem)
-		fprintf(stderr, " - Max elems:    %u (obj) != %u (pin)\n",
-			obj->max_elem, pin->max_elem);
-	if (obj->flags != pin->flags)
-		fprintf(stderr, " - Flags:        %#x (obj) != %#x (pin)\n",
-			obj->flags, pin->flags);
-
-	fprintf(stderr, "\n");
-}
-
-static int bpf_map_selfcheck_pinned(int fd, const struct bpf_elf_map *map,
-				    int length)
-{
-	char file[PATH_MAX], buff[4096];
-	struct bpf_elf_map tmp = {}, zero = {};
-	unsigned int val;
-	FILE *fp;
-
-	snprintf(file, sizeof(file), "/proc/%d/fdinfo/%d", getpid(), fd);
-
-	fp = fopen(file, "r");
-	if (!fp) {
-		fprintf(stderr, "No procfs support?!\n");
-		return -EIO;
-	}
-
-	while (fgets(buff, sizeof(buff), fp)) {
-		if (sscanf(buff, "map_type:\t%u", &val) == 1)
-			tmp.type = val;
-		else if (sscanf(buff, "key_size:\t%u", &val) == 1)
-			tmp.size_key = val;
-		else if (sscanf(buff, "value_size:\t%u", &val) == 1)
-			tmp.size_value = val;
-		else if (sscanf(buff, "max_entries:\t%u", &val) == 1)
-			tmp.max_elem = val;
-		else if (sscanf(buff, "map_flags:\t%i", &val) == 1)
-			tmp.flags = val;
-	}
-
-	fclose(fp);
-
-	if (!memcmp(&tmp, map, length)) {
-		return 0;
-	} else {
-		/* If kernel doesn't have eBPF-related fdinfo, we cannot do much,
-		 * so just accept it. We know we do have an eBPF fd and in this
-		 * case, everything is 0. It is guaranteed that no such map exists
-		 * since map type of 0 is unloadable BPF_MAP_TYPE_UNSPEC.
-		 */
-		if (!memcmp(&tmp, &zero, length))
-			return 0;
-
-		bpf_map_pin_report(&tmp, map);
-		return -EINVAL;
-	}
-}
-
-static int bpf_mnt_fs(const char *target)
-{
-	bool bind_done = false;
-
-	while (mount("", target, "none", MS_PRIVATE | MS_REC, NULL)) {
-		if (errno != EINVAL || bind_done) {
-			fprintf(stderr, "mount --make-private %s failed: %s\n",
-				target,	strerror(errno));
-			return -1;
-		}
-
-		if (mount(target, target, "none", MS_BIND, NULL)) {
-			fprintf(stderr, "mount --bind %s %s failed: %s\n",
-				target,	target, strerror(errno));
-			return -1;
-		}
-
-		bind_done = true;
-	}
-
-	if (mount("bpf", target, "bpf", 0, NULL)) {
-		fprintf(stderr, "mount -t bpf bpf %s failed: %s\n",
-			target,	strerror(errno));
-		return -1;
-	}
-
-	return 0;
-}
-
-static int bpf_valid_mntpt(const char *mnt, unsigned long magic)
-{
-	struct statfs st_fs;
-
-	if (statfs(mnt, &st_fs) < 0)
-		return -ENOENT;
-	if ((unsigned long)st_fs.f_type != magic)
-		return -ENOENT;
-
-	return 0;
-}
-
-static const char *bpf_find_mntpt(const char *fstype, unsigned long magic,
-				  char *mnt, int len,
-				  const char * const *known_mnts)
-{
-	const char * const *ptr;
-	char type[100];
-	FILE *fp;
-
-	if (known_mnts) {
-		ptr = known_mnts;
-		while (*ptr) {
-			if (bpf_valid_mntpt(*ptr, magic) == 0) {
-				strncpy(mnt, *ptr, len - 1);
-				mnt[len - 1] = 0;
-				return mnt;
-			}
-			ptr++;
-		}
-	}
-
-	fp = fopen("/proc/mounts", "r");
-	if (fp == NULL || len != PATH_MAX)
-		return NULL;
-
-	while (fscanf(fp, "%*s %" textify(PATH_MAX) "s %99s %*s %*d %*d\n",
-		      mnt, type) == 2) {
-		if (strcmp(type, fstype) == 0)
-			break;
-	}
-
-	fclose(fp);
-	if (strcmp(type, fstype) != 0)
-		return NULL;
-
-	return mnt;
-}
-
-int bpf_trace_pipe(void)
-{
-	char tracefs_mnt[PATH_MAX] = TRACE_DIR_MNT;
-	static const char * const tracefs_known_mnts[] = {
-		TRACE_DIR_MNT,
-		"/sys/kernel/debug/tracing",
-		"/tracing",
-		"/trace",
-		0,
-	};
-	char tpipe[PATH_MAX];
-	const char *mnt;
-	int fd;
-
-	mnt = bpf_find_mntpt("tracefs", TRACEFS_MAGIC, tracefs_mnt,
-			     sizeof(tracefs_mnt), tracefs_known_mnts);
-	if (!mnt) {
-		fprintf(stderr, "tracefs not mounted?\n");
-		return -1;
-	}
-
-	snprintf(tpipe, sizeof(tpipe), "%s/trace_pipe", mnt);
-
-	fd = open(tpipe, O_RDONLY);
-	if (fd < 0)
-		return -1;
-
-	fprintf(stderr, "Running! Hang up with ^C!\n\n");
-	while (1) {
-		static char buff[4096];
-		ssize_t ret;
-
-		ret = read(fd, buff, sizeof(buff) - 1);
-		if (ret > 0) {
-			write(2, buff, ret);
-			fflush(stderr);
-		}
-	}
-
-	return 0;
-}
-
-static const char *bpf_get_tc_dir(void)
-{
-	static bool bpf_mnt_cached;
-	static char bpf_tc_dir[PATH_MAX];
-	static const char *mnt;
-	static const char * const bpf_known_mnts[] = {
-		BPF_DIR_MNT,
-		0,
-	};
-	char bpf_mnt[PATH_MAX] = BPF_DIR_MNT;
-	char bpf_glo_dir[PATH_MAX];
-	int ret;
-
-	if (bpf_mnt_cached)
-		goto done;
-
-	mnt = bpf_find_mntpt("bpf", BPF_FS_MAGIC, bpf_mnt, sizeof(bpf_mnt),
-			     bpf_known_mnts);
-	if (!mnt) {
-		mnt = getenv(BPF_ENV_MNT);
-		if (!mnt)
-			mnt = BPF_DIR_MNT;
-		ret = bpf_mnt_fs(mnt);
-		if (ret) {
-			mnt = NULL;
-			goto out;
-		}
-	}
-
-	snprintf(bpf_tc_dir, sizeof(bpf_tc_dir), "%s/%s", mnt, BPF_DIR_TC);
-	ret = mkdir(bpf_tc_dir, S_IRWXU);
-	if (ret && errno != EEXIST) {
-		fprintf(stderr, "mkdir %s failed: %s\n", bpf_tc_dir,
-			strerror(errno));
-		mnt = NULL;
-		goto out;
-	}
-
-	snprintf(bpf_glo_dir, sizeof(bpf_glo_dir), "%s/%s",
-		 bpf_tc_dir, BPF_DIR_GLOBALS);
-	ret = mkdir(bpf_glo_dir, S_IRWXU);
-	if (ret && errno != EEXIST) {
-		fprintf(stderr, "mkdir %s failed: %s\n", bpf_glo_dir,
-			strerror(errno));
-		mnt = NULL;
-		goto out;
-	}
-
-	mnt = bpf_tc_dir;
-out:
-	bpf_mnt_cached = true;
-done:
-	return mnt;
-}
-
-static int bpf_obj_get(const char *pathname)
-{
-	union bpf_attr attr = {};
-	char tmp[PATH_MAX];
-
-	if (strlen(pathname) > 2 && pathname[0] == 'm' &&
-	    pathname[1] == ':' && bpf_get_tc_dir()) {
-		snprintf(tmp, sizeof(tmp), "%s/%s",
-			 bpf_get_tc_dir(), pathname + 2);
-		pathname = tmp;
-	}
-
-	attr.pathname = bpf_ptr_to_u64(pathname);
-
-	return bpf(BPF_OBJ_GET, &attr, sizeof(attr));
-}
-
-const char *bpf_default_section(const enum bpf_prog_type type)
-{
-	switch (type) {
-	case BPF_PROG_TYPE_SCHED_CLS:
-		return ELF_SECTION_CLASSIFIER;
-	case BPF_PROG_TYPE_SCHED_ACT:
-		return ELF_SECTION_ACTION;
-	default:
-		return NULL;
-	}
-}
-
-enum bpf_mode {
-	CBPF_BYTECODE = 0,
-	CBPF_FILE,
-	EBPF_OBJECT,
-	EBPF_PINNED,
-	__BPF_MODE_MAX,
-#define BPF_MODE_MAX	__BPF_MODE_MAX
-};
-
-static int bpf_parse(int *ptr_argc, char ***ptr_argv, const bool *opt_tbl,
-		     enum bpf_prog_type *type, enum bpf_mode *mode,
-		     const char **ptr_object, const char **ptr_section,
-		     const char **ptr_uds_name, struct sock_filter *opcodes)
-{
-	const char *file, *section, *uds_name;
-	bool verbose = false;
-	int ret, argc;
-	char **argv;
-
-	argv = *ptr_argv;
-	argc = *ptr_argc;
-
-	if (opt_tbl[CBPF_BYTECODE] &&
-	    (matches(*argv, "bytecode") == 0 ||
-	     strcmp(*argv, "bc") == 0)) {
-		*mode = CBPF_BYTECODE;
-	} else if (opt_tbl[CBPF_FILE] &&
-		   (matches(*argv, "bytecode-file") == 0 ||
-		    strcmp(*argv, "bcf") == 0)) {
-		*mode = CBPF_FILE;
-	} else if (opt_tbl[EBPF_OBJECT] &&
-		   (matches(*argv, "object-file") == 0 ||
-		    strcmp(*argv, "obj") == 0)) {
-		*mode = EBPF_OBJECT;
-	} else if (opt_tbl[EBPF_PINNED] &&
-		   (matches(*argv, "object-pinned") == 0 ||
-		    matches(*argv, "pinned") == 0 ||
-		    matches(*argv, "fd") == 0)) {
-		*mode = EBPF_PINNED;
-	} else {
-		fprintf(stderr, "What mode is \"%s\"?\n", *argv);
-		return -1;
-	}
-
-	NEXT_ARG();
-	file = section = uds_name = NULL;
-	if (*mode == EBPF_OBJECT || *mode == EBPF_PINNED) {
-		file = *argv;
-		NEXT_ARG_FWD();
-
-		if (*type == BPF_PROG_TYPE_UNSPEC) {
-			if (argc > 0 && matches(*argv, "type") == 0) {
-				NEXT_ARG();
-				if (matches(*argv, "cls") == 0) {
-					*type = BPF_PROG_TYPE_SCHED_CLS;
-				} else if (matches(*argv, "act") == 0) {
-					*type = BPF_PROG_TYPE_SCHED_ACT;
-				} else {
-					fprintf(stderr, "What type is \"%s\"?\n",
-						*argv);
-					return -1;
-				}
-				NEXT_ARG_FWD();
-			} else {
-				*type = BPF_PROG_TYPE_SCHED_CLS;
-			}
-		}
-
-		section = bpf_default_section(*type);
-		if (argc > 0 && matches(*argv, "section") == 0) {
-			NEXT_ARG();
-			section = *argv;
-			NEXT_ARG_FWD();
-		}
-
-		uds_name = getenv(BPF_ENV_UDS);
-		if (argc > 0 && !uds_name &&
-		    matches(*argv, "export") == 0) {
-			NEXT_ARG();
-			uds_name = *argv;
-			NEXT_ARG_FWD();
-		}
-
-		if (argc > 0 && matches(*argv, "verbose") == 0) {
-			verbose = true;
-			NEXT_ARG_FWD();
-		}
-
-		PREV_ARG();
-	}
-
-	if (*mode == CBPF_BYTECODE || *mode == CBPF_FILE)
-		ret = bpf_ops_parse(argc, argv, opcodes, *mode == CBPF_FILE);
-	else if (*mode == EBPF_OBJECT)
-		ret = bpf_obj_open(file, *type, section, verbose);
-	else if (*mode == EBPF_PINNED)
-		ret = bpf_obj_get(file);
-	else
-		return -1;
-
-	if (ptr_object)
-		*ptr_object = file;
-	if (ptr_section)
-		*ptr_section = section;
-	if (ptr_uds_name)
-		*ptr_uds_name = uds_name;
-
-	*ptr_argc = argc;
-	*ptr_argv = argv;
-
-	return ret;
-}
-
-int bpf_parse_common(int *ptr_argc, char ***ptr_argv, const int *nla_tbl,
-		     enum bpf_prog_type type, const char **ptr_object,
-		     const char **ptr_uds_name, struct nlmsghdr *n)
-{
-	struct sock_filter opcodes[BPF_MAXINSNS];
-	const bool opt_tbl[BPF_MODE_MAX] = {
-		[CBPF_BYTECODE]	= true,
-		[CBPF_FILE]	= true,
-		[EBPF_OBJECT]	= true,
-		[EBPF_PINNED]	= true,
-	};
-	char annotation[256];
-	const char *section;
-	enum bpf_mode mode;
-	int ret;
-
-	ret = bpf_parse(ptr_argc, ptr_argv, opt_tbl, &type, &mode,
-			ptr_object, &section, ptr_uds_name, opcodes);
-	if (ret < 0)
-		return ret;
-
-	if (mode == CBPF_BYTECODE || mode == CBPF_FILE) {
-		addattr16(n, MAX_MSG, nla_tbl[BPF_NLA_OPS_LEN], ret);
-		addattr_l(n, MAX_MSG, nla_tbl[BPF_NLA_OPS], opcodes,
-			  ret * sizeof(struct sock_filter));
-	}
-
-	if (mode == EBPF_OBJECT || mode == EBPF_PINNED) {
-		snprintf(annotation, sizeof(annotation), "%s:[%s]",
-			 basename(*ptr_object), mode == EBPF_PINNED ?
-			 "*fsobj" : section);
-
-		addattr32(n, MAX_MSG, nla_tbl[BPF_NLA_FD], ret);
-		addattrstrz(n, MAX_MSG, nla_tbl[BPF_NLA_NAME], annotation);
-	}
-
-	return 0;
-}
-
-int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv)
-{
-	enum bpf_prog_type type = BPF_PROG_TYPE_UNSPEC;
-	const bool opt_tbl[BPF_MODE_MAX] = {
-		[CBPF_BYTECODE]	= false,
-		[CBPF_FILE]	= false,
-		[EBPF_OBJECT]	= true,
-		[EBPF_PINNED]	= true,
-	};
-	const struct bpf_elf_map test = {
-		.type		= BPF_MAP_TYPE_PROG_ARRAY,
-		.size_key	= sizeof(int),
-		.size_value	= sizeof(int),
-	};
-	int ret, prog_fd, map_fd;
-	const char *section;
-	enum bpf_mode mode;
-	uint32_t map_key;
-
-	prog_fd = bpf_parse(&argc, &argv, opt_tbl, &type, &mode,
-			    NULL, &section, NULL, NULL);
-	if (prog_fd < 0)
-		return prog_fd;
-	if (key) {
-		map_key = *key;
-	} else {
-		ret = sscanf(section, "%*i/%i", &map_key);
-		if (ret != 1) {
-			fprintf(stderr, "Couldn\'t infer map key from section name! Please provide \'key\' argument!\n");
-			ret = -EINVAL;
-			goto out_prog;
-		}
-	}
-
-	map_fd = bpf_obj_get(map_path);
-	if (map_fd < 0) {
-		fprintf(stderr, "Couldn\'t retrieve pinned map \'%s\': %s\n",
-			map_path, strerror(errno));
-		ret = map_fd;
-		goto out_prog;
-	}
-
-	ret = bpf_map_selfcheck_pinned(map_fd, &test,
-				       offsetof(struct bpf_elf_map, max_elem));
-	if (ret < 0) {
-		fprintf(stderr, "Map \'%s\' self-check failed!\n", map_path);
-		goto out_map;
-	}
-
-	ret = bpf_map_update(map_fd, &map_key, &prog_fd, BPF_ANY);
-	if (ret < 0)
-		fprintf(stderr, "Map update failed: %s\n", strerror(errno));
-out_map:
-	close(map_fd);
-out_prog:
-	close(prog_fd);
-	return ret;
-}
-
-#ifdef HAVE_ELF
-struct bpf_elf_prog {
-	enum bpf_prog_type	type;
-	const struct bpf_insn	*insns;
-	size_t			size;
-	const char		*license;
-};
-
-struct bpf_hash_entry {
-	unsigned int		pinning;
-	const char		*subpath;
-	struct bpf_hash_entry	*next;
-};
-
-struct bpf_elf_ctx {
-	Elf			*elf_fd;
-	GElf_Ehdr		elf_hdr;
-	Elf_Data		*sym_tab;
-	Elf_Data		*str_tab;
-	int			obj_fd;
-	int			map_fds[ELF_MAX_MAPS];
-	struct bpf_elf_map	maps[ELF_MAX_MAPS];
-	int			sym_num;
-	int			map_num;
-	bool			*sec_done;
-	int			sec_maps;
-	char			license[ELF_MAX_LICENSE_LEN];
-	enum bpf_prog_type	type;
-	bool			verbose;
-	struct bpf_elf_st	stat;
-	struct bpf_hash_entry	*ht[256];
-	char			*log;
-	size_t			log_size;
-};
-
-struct bpf_elf_sec_data {
-	GElf_Shdr		sec_hdr;
-	Elf_Data		*sec_data;
-	const char		*sec_name;
-};
-
-struct bpf_map_data {
-	int			*fds;
-	const char		*obj;
-	struct bpf_elf_st	*st;
-	struct bpf_elf_map	*ent;
-};
-
-static __check_format_string(2, 3) void
-bpf_dump_error(struct bpf_elf_ctx *ctx, const char *format, ...)
-{
-	va_list vl;
-
-	va_start(vl, format);
-	vfprintf(stderr, format, vl);
-	va_end(vl);
-
-	if (ctx->log && ctx->log[0]) {
-		if (ctx->verbose) {
-			fprintf(stderr, "%s\n", ctx->log);
-		} else {
-			unsigned int off = 0, len = strlen(ctx->log);
-
-			if (len > BPF_MAX_LOG) {
-				off = len - BPF_MAX_LOG;
-				fprintf(stderr, "Skipped %u bytes, use \'verb\' option for the full verbose log.\n[...]\n",
-					off);
-			}
-			fprintf(stderr, "%s\n", ctx->log + off);
-		}
-
-		memset(ctx->log, 0, ctx->log_size);
-	}
-}
-
-static int bpf_log_realloc(struct bpf_elf_ctx *ctx)
-{
-	size_t log_size = ctx->log_size;
-	void *ptr;
-
-	if (!ctx->log) {
-		log_size = 65536;
-	} else {
-		log_size <<= 1;
-		if (log_size > (UINT_MAX >> 8))
-			return -EINVAL;
-	}
-
-	ptr = realloc(ctx->log, log_size);
-	if (!ptr)
-		return -ENOMEM;
-
-	ctx->log = ptr;
-	ctx->log_size = log_size;
-
-	return 0;
-}
-
-static int bpf_map_create(enum bpf_map_type type, uint32_t size_key,
-			  uint32_t size_value, uint32_t max_elem,
-			  uint32_t flags)
-{
-	union bpf_attr attr = {};
-
-	attr.map_type = type;
-	attr.key_size = size_key;
-	attr.value_size = size_value;
-	attr.max_entries = max_elem;
-	attr.map_flags = flags;
-
-	return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
-}
-
-static int bpf_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns,
-			 size_t size_insns, const char *license, char *log,
-			 size_t size_log)
-{
-	union bpf_attr attr = {};
-
-	attr.prog_type = type;
-	attr.insns = bpf_ptr_to_u64(insns);
-	attr.insn_cnt = size_insns / sizeof(struct bpf_insn);
-	attr.license = bpf_ptr_to_u64(license);
-
-	if (size_log > 0) {
-		attr.log_buf = bpf_ptr_to_u64(log);
-		attr.log_size = size_log;
-		attr.log_level = 1;
-	}
-
-	return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
-}
-
-static int bpf_obj_pin(int fd, const char *pathname)
-{
-	union bpf_attr attr = {};
-
-	attr.pathname = bpf_ptr_to_u64(pathname);
-	attr.bpf_fd = fd;
-
-	return bpf(BPF_OBJ_PIN, &attr, sizeof(attr));
-}
-
-static int bpf_obj_hash(const char *object, uint8_t *out, size_t len)
-{
-	struct sockaddr_alg alg = {
-		.salg_family	= AF_ALG,
-		.salg_type	= "hash",
-		.salg_name	= "sha1",
-	};
-	int ret, cfd, ofd, ffd;
-	struct stat stbuff;
-	ssize_t size;
-
-	if (!object || len != 20)
-		return -EINVAL;
-
-	cfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
-	if (cfd < 0) {
-		fprintf(stderr, "Cannot get AF_ALG socket: %s\n",
-			strerror(errno));
-		return cfd;
-	}
-
-	ret = bind(cfd, (struct sockaddr *)&alg, sizeof(alg));
-	if (ret < 0) {
-		fprintf(stderr, "Error binding socket: %s\n", strerror(errno));
-		goto out_cfd;
-	}
-
-	ofd = accept(cfd, NULL, 0);
-	if (ofd < 0) {
-		fprintf(stderr, "Error accepting socket: %s\n",
-			strerror(errno));
-		ret = ofd;
-		goto out_cfd;
-	}
-
-	ffd = open(object, O_RDONLY);
-	if (ffd < 0) {
-		fprintf(stderr, "Error opening object %s: %s\n",
-			object, strerror(errno));
-		ret = ffd;
-		goto out_ofd;
-	}
-
-	ret = fstat(ffd, &stbuff);
-	if (ret < 0) {
-		fprintf(stderr, "Error doing fstat: %s\n",
-			strerror(errno));
-		goto out_ffd;
-	}
-
-	size = sendfile(ofd, ffd, NULL, stbuff.st_size);
-	if (size != stbuff.st_size) {
-		fprintf(stderr, "Error from sendfile (%zd vs %zu bytes): %s\n",
-			size, stbuff.st_size, strerror(errno));
-		ret = -1;
-		goto out_ffd;
-	}
-
-	size = read(ofd, out, len);
-	if (size != len) {
-		fprintf(stderr, "Error from read (%zd vs %zu bytes): %s\n",
-			size, len, strerror(errno));
-		ret = -1;
-	} else {
-		ret = 0;
-	}
-out_ffd:
-	close(ffd);
-out_ofd:
-	close(ofd);
-out_cfd:
-	close(cfd);
-	return ret;
-}
-
-static const char *bpf_get_obj_uid(const char *pathname)
-{
-	static bool bpf_uid_cached;
-	static char bpf_uid[64];
-	uint8_t tmp[20];
-	int ret;
-
-	if (bpf_uid_cached)
-		goto done;
-
-	ret = bpf_obj_hash(pathname, tmp, sizeof(tmp));
-	if (ret) {
-		fprintf(stderr, "Object hashing failed!\n");
-		return NULL;
-	}
-
-	hexstring_n2a(tmp, sizeof(tmp), bpf_uid, sizeof(bpf_uid));
-	bpf_uid_cached = true;
-done:
-	return bpf_uid;
-}
-
-static int bpf_init_env(const char *pathname)
-{
-	struct rlimit limit = {
-		.rlim_cur = RLIM_INFINITY,
-		.rlim_max = RLIM_INFINITY,
-	};
-
-	/* Don't bother in case we fail! */
-	setrlimit(RLIMIT_MEMLOCK, &limit);
-
-	if (!bpf_get_tc_dir()) {
-		fprintf(stderr, "Continuing without mounted eBPF fs. Too old kernel?\n");
-		return 0;
-	}
-
-	if (!bpf_get_obj_uid(pathname))
-		return -1;
-
-	return 0;
-}
-
-static const char *bpf_custom_pinning(const struct bpf_elf_ctx *ctx,
-				      uint32_t pinning)
-{
-	struct bpf_hash_entry *entry;
-
-	entry = ctx->ht[pinning & (ARRAY_SIZE(ctx->ht) - 1)];
-	while (entry && entry->pinning != pinning)
-		entry = entry->next;
-
-	return entry ? entry->subpath : NULL;
-}
-
-static bool bpf_no_pinning(const struct bpf_elf_ctx *ctx,
-			   uint32_t pinning)
-{
-	switch (pinning) {
-	case PIN_OBJECT_NS:
-	case PIN_GLOBAL_NS:
-		return false;
-	case PIN_NONE:
-		return true;
-	default:
-		return !bpf_custom_pinning(ctx, pinning);
-	}
-}
-
-static void bpf_make_pathname(char *pathname, size_t len, const char *name,
-			      const struct bpf_elf_ctx *ctx, uint32_t pinning)
-{
-	switch (pinning) {
-	case PIN_OBJECT_NS:
-		snprintf(pathname, len, "%s/%s/%s", bpf_get_tc_dir(),
-			 bpf_get_obj_uid(NULL), name);
-		break;
-	case PIN_GLOBAL_NS:
-		snprintf(pathname, len, "%s/%s/%s", bpf_get_tc_dir(),
-			 BPF_DIR_GLOBALS, name);
-		break;
-	default:
-		snprintf(pathname, len, "%s/../%s/%s", bpf_get_tc_dir(),
-			 bpf_custom_pinning(ctx, pinning), name);
-		break;
-	}
-}
-
-static int bpf_probe_pinned(const char *name, const struct bpf_elf_ctx *ctx,
-			    uint32_t pinning)
-{
-	char pathname[PATH_MAX];
-
-	if (bpf_no_pinning(ctx, pinning) || !bpf_get_tc_dir())
-		return 0;
-
-	bpf_make_pathname(pathname, sizeof(pathname), name, ctx, pinning);
-	return bpf_obj_get(pathname);
-}
-
-static int bpf_make_obj_path(void)
-{
-	char tmp[PATH_MAX];
-	int ret;
-
-	snprintf(tmp, sizeof(tmp), "%s/%s", bpf_get_tc_dir(),
-		 bpf_get_obj_uid(NULL));
-
-	ret = mkdir(tmp, S_IRWXU);
-	if (ret && errno != EEXIST) {
-		fprintf(stderr, "mkdir %s failed: %s\n", tmp, strerror(errno));
-		return ret;
-	}
-
-	return 0;
-}
-
-static int bpf_make_custom_path(const char *todo)
-{
-	char tmp[PATH_MAX], rem[PATH_MAX], *sub;
-	int ret;
-
-	snprintf(tmp, sizeof(tmp), "%s/../", bpf_get_tc_dir());
-	snprintf(rem, sizeof(rem), "%s/", todo);
-	sub = strtok(rem, "/");
-
-	while (sub) {
-		if (strlen(tmp) + strlen(sub) + 2 > PATH_MAX)
-			return -EINVAL;
-
-		strcat(tmp, sub);
-		strcat(tmp, "/");
-
-		ret = mkdir(tmp, S_IRWXU);
-		if (ret && errno != EEXIST) {
-			fprintf(stderr, "mkdir %s failed: %s\n", tmp,
-				strerror(errno));
-			return ret;
-		}
-
-		sub = strtok(NULL, "/");
-	}
-
-	return 0;
-}
-
-static int bpf_place_pinned(int fd, const char *name,
-			    const struct bpf_elf_ctx *ctx, uint32_t pinning)
-{
-	char pathname[PATH_MAX];
-	const char *tmp;
-	int ret = 0;
-
-	if (bpf_no_pinning(ctx, pinning) || !bpf_get_tc_dir())
-		return 0;
-
-	if (pinning == PIN_OBJECT_NS)
-		ret = bpf_make_obj_path();
-	else if ((tmp = bpf_custom_pinning(ctx, pinning)))
-		ret = bpf_make_custom_path(tmp);
-	if (ret < 0)
-		return ret;
-
-	bpf_make_pathname(pathname, sizeof(pathname), name, ctx, pinning);
-	return bpf_obj_pin(fd, pathname);
-}
-
-static void bpf_prog_report(int fd, const char *section,
-			    const struct bpf_elf_prog *prog,
-			    struct bpf_elf_ctx *ctx)
-{
-	unsigned int insns = prog->size / sizeof(struct bpf_insn);
-
-	fprintf(stderr, "\nProg section \'%s\' %s%s (%d)!\n", section,
-		fd < 0 ? "rejected: " : "loaded",
-		fd < 0 ? strerror(errno) : "",
-		fd < 0 ? errno : fd);
-
-	fprintf(stderr, " - Type:         %u\n", prog->type);
-	fprintf(stderr, " - Instructions: %u (%u over limit)\n",
-		insns, insns > BPF_MAXINSNS ? insns - BPF_MAXINSNS : 0);
-	fprintf(stderr, " - License:      %s\n\n", prog->license);
-
-	bpf_dump_error(ctx, "Verifier analysis:\n\n");
-}
-
-static int bpf_prog_attach(const char *section,
-			   const struct bpf_elf_prog *prog,
-			   struct bpf_elf_ctx *ctx)
-{
-	int tries = 0, fd;
-retry:
-	errno = 0;
-	fd = bpf_prog_load(prog->type, prog->insns, prog->size,
-			   prog->license, ctx->log, ctx->log_size);
-	if (fd < 0 || ctx->verbose) {
-		/* The verifier log is pretty chatty, sometimes so chatty
-		 * on larger programs, that we could fail to dump everything
-		 * into our buffer. Still, try to give a debuggable error
-		 * log for the user, so enlarge it and re-fail.
-		 */
-		if (fd < 0 && (errno == ENOSPC || !ctx->log_size)) {
-			if (tries++ < 6 && !bpf_log_realloc(ctx))
-				goto retry;
-
-			fprintf(stderr, "Log buffer too small to dump verifier log %zu bytes (%d tries)!\n",
-				ctx->log_size, tries);
-			return fd;
-		}
-
-		bpf_prog_report(fd, section, prog, ctx);
-	}
-
-	return fd;
-}
-
-static void bpf_map_report(int fd, const char *name,
-			   const struct bpf_elf_map *map,
-			   struct bpf_elf_ctx *ctx)
-{
-	fprintf(stderr, "Map object \'%s\' %s%s (%d)!\n", name,
-		fd < 0 ? "rejected: " : "loaded",
-		fd < 0 ? strerror(errno) : "",
-		fd < 0 ? errno : fd);
-
-	fprintf(stderr, " - Type:         %u\n", map->type);
-	fprintf(stderr, " - Identifier:   %u\n", map->id);
-	fprintf(stderr, " - Pinning:      %u\n", map->pinning);
-	fprintf(stderr, " - Size key:     %u\n", map->size_key);
-	fprintf(stderr, " - Size value:   %u\n", map->size_value);
-	fprintf(stderr, " - Max elems:    %u\n", map->max_elem);
-	fprintf(stderr, " - Flags:        %#x\n\n", map->flags);
-}
-
-static int bpf_map_attach(const char *name, const struct bpf_elf_map *map,
-			  struct bpf_elf_ctx *ctx)
-{
-	int fd, ret;
-
-	fd = bpf_probe_pinned(name, ctx, map->pinning);
-	if (fd > 0) {
-		ret = bpf_map_selfcheck_pinned(fd, map,
-					       offsetof(struct bpf_elf_map,
-							id));
-		if (ret < 0) {
-			close(fd);
-			fprintf(stderr, "Map \'%s\' self-check failed!\n",
-				name);
-			return ret;
-		}
-		if (ctx->verbose)
-			fprintf(stderr, "Map \'%s\' loaded as pinned!\n",
-				name);
-		return fd;
-	}
-
-	errno = 0;
-	fd = bpf_map_create(map->type, map->size_key, map->size_value,
-			    map->max_elem, map->flags);
-	if (fd < 0 || ctx->verbose) {
-		bpf_map_report(fd, name, map, ctx);
-		if (fd < 0)
-			return fd;
-	}
-
-	ret = bpf_place_pinned(fd, name, ctx, map->pinning);
-	if (ret < 0 && errno != EEXIST) {
-		fprintf(stderr, "Could not pin %s map: %s\n", name,
-			strerror(errno));
-		close(fd);
-		return ret;
-	}
-
-	return fd;
-}
-
-static const char *bpf_str_tab_name(const struct bpf_elf_ctx *ctx,
-				    const GElf_Sym *sym)
-{
-	return ctx->str_tab->d_buf + sym->st_name;
-}
-
-static const char *bpf_map_fetch_name(struct bpf_elf_ctx *ctx, int which)
-{
-	GElf_Sym sym;
-	int i;
-
-	for (i = 0; i < ctx->sym_num; i++) {
-		if (gelf_getsym(ctx->sym_tab, i, &sym) != &sym)
-			continue;
-
-		if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
-		    GELF_ST_TYPE(sym.st_info) != STT_NOTYPE ||
-		    sym.st_shndx != ctx->sec_maps ||
-		    sym.st_value / sizeof(struct bpf_elf_map) != which)
-			continue;
-
-		return bpf_str_tab_name(ctx, &sym);
-	}
-
-	return NULL;
-}
-
-static int bpf_maps_attach_all(struct bpf_elf_ctx *ctx)
-{
-	const char *map_name;
-	int i, fd;
-
-	for (i = 0; i < ctx->map_num; i++) {
-		map_name = bpf_map_fetch_name(ctx, i);
-		if (!map_name)
-			return -EIO;
-
-		fd = bpf_map_attach(map_name, &ctx->maps[i], ctx);
-		if (fd < 0)
-			return fd;
-
-		ctx->map_fds[i] = fd;
-	}
-
-	return 0;
-}
-
-static int bpf_fill_section_data(struct bpf_elf_ctx *ctx, int section,
-				 struct bpf_elf_sec_data *data)
-{
-	Elf_Data *sec_edata;
-	GElf_Shdr sec_hdr;
-	Elf_Scn *sec_fd;
-	char *sec_name;
-
-	memset(data, 0, sizeof(*data));
-
-	sec_fd = elf_getscn(ctx->elf_fd, section);
-	if (!sec_fd)
-		return -EINVAL;
-	if (gelf_getshdr(sec_fd, &sec_hdr) != &sec_hdr)
-		return -EIO;
-
-	sec_name = elf_strptr(ctx->elf_fd, ctx->elf_hdr.e_shstrndx,
-			      sec_hdr.sh_name);
-	if (!sec_name || !sec_hdr.sh_size)
-		return -ENOENT;
-
-	sec_edata = elf_getdata(sec_fd, NULL);
-	if (!sec_edata || elf_getdata(sec_fd, sec_edata))
-		return -EIO;
-
-	memcpy(&data->sec_hdr, &sec_hdr, sizeof(sec_hdr));
-
-	data->sec_name = sec_name;
-	data->sec_data = sec_edata;
-	return 0;
-}
-
-static int bpf_fetch_maps(struct bpf_elf_ctx *ctx, int section,
-			  struct bpf_elf_sec_data *data)
-{
-	if (data->sec_data->d_size % sizeof(struct bpf_elf_map) != 0)
-		return -EINVAL;
-
-	ctx->map_num = data->sec_data->d_size / sizeof(struct bpf_elf_map);
-	ctx->sec_maps = section;
-	ctx->sec_done[section] = true;
-
-	if (ctx->map_num > ARRAY_SIZE(ctx->map_fds)) {
-		fprintf(stderr, "Too many BPF maps in ELF section!\n");
-		return -ENOMEM;
-	}
-
-	memcpy(ctx->maps, data->sec_data->d_buf, data->sec_data->d_size);
-	return 0;
-}
-
-static int bpf_fetch_license(struct bpf_elf_ctx *ctx, int section,
-			     struct bpf_elf_sec_data *data)
-{
-	if (data->sec_data->d_size > sizeof(ctx->license))
-		return -ENOMEM;
-
-	memcpy(ctx->license, data->sec_data->d_buf, data->sec_data->d_size);
-	ctx->sec_done[section] = true;
-	return 0;
-}
-
-static int bpf_fetch_symtab(struct bpf_elf_ctx *ctx, int section,
-			    struct bpf_elf_sec_data *data)
-{
-	ctx->sym_tab = data->sec_data;
-	ctx->sym_num = data->sec_hdr.sh_size / data->sec_hdr.sh_entsize;
-	ctx->sec_done[section] = true;
-	return 0;
-}
-
-static int bpf_fetch_strtab(struct bpf_elf_ctx *ctx, int section,
-			    struct bpf_elf_sec_data *data)
-{
-	ctx->str_tab = data->sec_data;
-	ctx->sec_done[section] = true;
-	return 0;
-}
-
-static bool bpf_has_map_data(const struct bpf_elf_ctx *ctx)
-{
-	return ctx->sym_tab && ctx->str_tab && ctx->sec_maps;
-}
-
-static int bpf_fetch_ancillary(struct bpf_elf_ctx *ctx)
-{
-	struct bpf_elf_sec_data data;
-	int i, ret = -1;
-
-	for (i = 1; i < ctx->elf_hdr.e_shnum; i++) {
-		ret = bpf_fill_section_data(ctx, i, &data);
-		if (ret < 0)
-			continue;
-
-		if (data.sec_hdr.sh_type == SHT_PROGBITS &&
-		    !strcmp(data.sec_name, ELF_SECTION_MAPS))
-			ret = bpf_fetch_maps(ctx, i, &data);
-		else if (data.sec_hdr.sh_type == SHT_PROGBITS &&
-			 !strcmp(data.sec_name, ELF_SECTION_LICENSE))
-			ret = bpf_fetch_license(ctx, i, &data);
-		else if (data.sec_hdr.sh_type == SHT_SYMTAB &&
-			 !strcmp(data.sec_name, ".symtab"))
-			ret = bpf_fetch_symtab(ctx, i, &data);
-		else if (data.sec_hdr.sh_type == SHT_STRTAB &&
-			 !strcmp(data.sec_name, ".strtab"))
-			ret = bpf_fetch_strtab(ctx, i, &data);
-		if (ret < 0) {
-			fprintf(stderr, "Error parsing section %d! Perhaps check with readelf -a?\n",
-				i);
-			break;
-		}
-	}
-
-	if (bpf_has_map_data(ctx)) {
-		ret = bpf_maps_attach_all(ctx);
-		if (ret < 0) {
-			fprintf(stderr, "Error loading maps into kernel!\n");
-			return ret;
-		}
-	}
-
-	return ret;
-}
-
-static int bpf_fetch_prog(struct bpf_elf_ctx *ctx, const char *section)
-{
-	struct bpf_elf_sec_data data;
-	struct bpf_elf_prog prog;
-	int ret, i, fd = -1;
-
-	for (i = 1; i < ctx->elf_hdr.e_shnum; i++) {
-		if (ctx->sec_done[i])
-			continue;
-
-		ret = bpf_fill_section_data(ctx, i, &data);
-		if (ret < 0 ||
-		    !(data.sec_hdr.sh_type == SHT_PROGBITS &&
-		      data.sec_hdr.sh_flags & SHF_EXECINSTR &&
-		      !strcmp(data.sec_name, section)))
-			continue;
-
-		memset(&prog, 0, sizeof(prog));
-		prog.type    = ctx->type;
-		prog.insns   = data.sec_data->d_buf;
-		prog.size    = data.sec_data->d_size;
-		prog.license = ctx->license;
-
-		fd = bpf_prog_attach(section, &prog, ctx);
-		if (fd < 0)
-			break;
-
-		ctx->sec_done[i] = true;
-		break;
-	}
-
-	return fd;
-}
-
-static int bpf_apply_relo_data(struct bpf_elf_ctx *ctx,
-			       struct bpf_elf_sec_data *data_relo,
-			       struct bpf_elf_sec_data *data_insn)
-{
-	Elf_Data *idata = data_insn->sec_data;
-	GElf_Shdr *rhdr = &data_relo->sec_hdr;
-	int relo_ent, relo_num = rhdr->sh_size / rhdr->sh_entsize;
-	struct bpf_insn *insns = idata->d_buf;
-	unsigned int num_insns = idata->d_size / sizeof(*insns);
-
-	for (relo_ent = 0; relo_ent < relo_num; relo_ent++) {
-		unsigned int ioff, rmap;
-		GElf_Rel relo;
-		GElf_Sym sym;
-
-		if (gelf_getrel(data_relo->sec_data, relo_ent, &relo) != &relo)
-			return -EIO;
-
-		ioff = relo.r_offset / sizeof(struct bpf_insn);
-		if (ioff >= num_insns ||
-		    insns[ioff].code != (BPF_LD | BPF_IMM | BPF_DW)) {
-			fprintf(stderr, "ELF contains relo data for non ld64 instruction at offset %u! Compiler bug?!\n",
-				ioff);
-			if (ioff < num_insns &&
-			    insns[ioff].code == (BPF_JMP | BPF_CALL))
-				fprintf(stderr, " - Try to annotate functions with always_inline attribute!\n");
-			return -EINVAL;
-		}
-
-		if (gelf_getsym(ctx->sym_tab, GELF_R_SYM(relo.r_info), &sym) != &sym)
-			return -EIO;
-		if (sym.st_shndx != ctx->sec_maps) {
-			fprintf(stderr, "ELF contains non-map related relo data in entry %u pointing to section %u! Compiler bug?!\n",
-				relo_ent, sym.st_shndx);
-			return -EIO;
-		}
-
-		rmap = sym.st_value / sizeof(struct bpf_elf_map);
-		if (rmap >= ARRAY_SIZE(ctx->map_fds))
-			return -EINVAL;
-		if (!ctx->map_fds[rmap])
-			return -EINVAL;
-
-		if (ctx->verbose)
-			fprintf(stderr, "Map \'%s\' (%d) injected into prog section \'%s\' at offset %u!\n",
-				bpf_str_tab_name(ctx, &sym), ctx->map_fds[rmap],
-				data_insn->sec_name, ioff);
-
-		insns[ioff].src_reg = BPF_PSEUDO_MAP_FD;
-		insns[ioff].imm     = ctx->map_fds[rmap];
-	}
-
-	return 0;
-}
-
-static int bpf_fetch_prog_relo(struct bpf_elf_ctx *ctx, const char *section,
-			       bool *lderr)
-{
-	struct bpf_elf_sec_data data_relo, data_insn;
-	struct bpf_elf_prog prog;
-	int ret, idx, i, fd = -1;
-
-	for (i = 1; i < ctx->elf_hdr.e_shnum; i++) {
-		ret = bpf_fill_section_data(ctx, i, &data_relo);
-		if (ret < 0 || data_relo.sec_hdr.sh_type != SHT_REL)
-			continue;
-
-		idx = data_relo.sec_hdr.sh_info;
-		ret = bpf_fill_section_data(ctx, idx, &data_insn);
-		if (ret < 0 ||
-		    !(data_insn.sec_hdr.sh_type == SHT_PROGBITS &&
-		      data_insn.sec_hdr.sh_flags & SHF_EXECINSTR &&
-		      !strcmp(data_insn.sec_name, section)))
-			continue;
-
-		ret = bpf_apply_relo_data(ctx, &data_relo, &data_insn);
-		if (ret < 0)
-			continue;
-
-		memset(&prog, 0, sizeof(prog));
-		prog.type    = ctx->type;
-		prog.insns   = data_insn.sec_data->d_buf;
-		prog.size    = data_insn.sec_data->d_size;
-		prog.license = ctx->license;
-
-		fd = bpf_prog_attach(section, &prog, ctx);
-		if (fd < 0) {
-			*lderr = true;
-			break;
-		}
-
-		ctx->sec_done[i]   = true;
-		ctx->sec_done[idx] = true;
-		break;
-	}
-
-	return fd;
-}
-
-static int bpf_fetch_prog_sec(struct bpf_elf_ctx *ctx, const char *section)
-{
-	bool lderr = false;
-	int ret = -1;
-
-	if (bpf_has_map_data(ctx))
-		ret = bpf_fetch_prog_relo(ctx, section, &lderr);
-	if (ret < 0 && !lderr)
-		ret = bpf_fetch_prog(ctx, section);
-
-	return ret;
-}
-
-static int bpf_find_map_by_id(struct bpf_elf_ctx *ctx, uint32_t id)
-{
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(ctx->map_fds); i++)
-		if (ctx->map_fds[i] && ctx->maps[i].id == id &&
-		    ctx->maps[i].type == BPF_MAP_TYPE_PROG_ARRAY)
-			return i;
-	return -1;
-}
-
-static int bpf_fill_prog_arrays(struct bpf_elf_ctx *ctx)
-{
-	struct bpf_elf_sec_data data;
-	uint32_t map_id, key_id;
-	int fd, i, ret, idx;
-
-	for (i = 1; i < ctx->elf_hdr.e_shnum; i++) {
-		if (ctx->sec_done[i])
-			continue;
-
-		ret = bpf_fill_section_data(ctx, i, &data);
-		if (ret < 0)
-			continue;
-
-		ret = sscanf(data.sec_name, "%i/%i", &map_id, &key_id);
-		if (ret != 2)
-			continue;
-
-		idx = bpf_find_map_by_id(ctx, map_id);
-		if (idx < 0)
-			continue;
-
-		fd = bpf_fetch_prog_sec(ctx, data.sec_name);
-		if (fd < 0)
-			return -EIO;
-
-		ret = bpf_map_update(ctx->map_fds[idx], &key_id,
-				     &fd, BPF_ANY);
-		if (ret < 0) {
-			if (errno == E2BIG)
-				fprintf(stderr, "Tail call key %u for map %u out of bounds?\n",
-					key_id, map_id);
-			return -errno;
-		}
-
-		ctx->sec_done[i] = true;
-	}
-
-	return 0;
-}
-
-static void bpf_save_finfo(struct bpf_elf_ctx *ctx)
-{
-	struct stat st;
-	int ret;
-
-	memset(&ctx->stat, 0, sizeof(ctx->stat));
-
-	ret = fstat(ctx->obj_fd, &st);
-	if (ret < 0) {
-		fprintf(stderr, "Stat of elf file failed: %s\n",
-			strerror(errno));
-		return;
-	}
-
-	ctx->stat.st_dev = st.st_dev;
-	ctx->stat.st_ino = st.st_ino;
-}
-
-static int bpf_read_pin_mapping(FILE *fp, uint32_t *id, char *path)
-{
-	char buff[PATH_MAX];
-
-	while (fgets(buff, sizeof(buff), fp)) {
-		char *ptr = buff;
-
-		while (*ptr == ' ' || *ptr == '\t')
-			ptr++;
-
-		if (*ptr == '#' || *ptr == '\n' || *ptr == 0)
-			continue;
-
-		if (sscanf(ptr, "%i %s\n", id, path) != 2 &&
-		    sscanf(ptr, "%i %s #", id, path) != 2) {
-			strcpy(path, ptr);
-			return -1;
-		}
-
-		return 1;
-	}
-
-	return 0;
-}
-
-static bool bpf_pinning_reserved(uint32_t pinning)
-{
-	switch (pinning) {
-	case PIN_NONE:
-	case PIN_OBJECT_NS:
-	case PIN_GLOBAL_NS:
-		return true;
-	default:
-		return false;
-	}
-}
-
-static void bpf_hash_init(struct bpf_elf_ctx *ctx, const char *db_file)
-{
-	struct bpf_hash_entry *entry;
-	char subpath[PATH_MAX] = {};
-	uint32_t pinning;
-	FILE *fp;
-	int ret;
-
-	fp = fopen(db_file, "r");
-	if (!fp)
-		return;
-
-	while ((ret = bpf_read_pin_mapping(fp, &pinning, subpath))) {
-		if (ret == -1) {
-			fprintf(stderr, "Database %s is corrupted at: %s\n",
-				db_file, subpath);
-			fclose(fp);
-			return;
-		}
-
-		if (bpf_pinning_reserved(pinning)) {
-			fprintf(stderr, "Database %s, id %u is reserved - ignoring!\n",
-				db_file, pinning);
-			continue;
-		}
-
-		entry = malloc(sizeof(*entry));
-		if (!entry) {
-			fprintf(stderr, "No memory left for db entry!\n");
-			continue;
-		}
-
-		entry->pinning = pinning;
-		entry->subpath = strdup(subpath);
-		if (!entry->subpath) {
-			fprintf(stderr, "No memory left for db entry!\n");
-			free(entry);
-			continue;
-		}
-
-		entry->next = ctx->ht[pinning & (ARRAY_SIZE(ctx->ht) - 1)];
-		ctx->ht[pinning & (ARRAY_SIZE(ctx->ht) - 1)] = entry;
-	}
-
-	fclose(fp);
-}
-
-static void bpf_hash_destroy(struct bpf_elf_ctx *ctx)
-{
-	struct bpf_hash_entry *entry;
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(ctx->ht); i++) {
-		while ((entry = ctx->ht[i]) != NULL) {
-			ctx->ht[i] = entry->next;
-			free((char *)entry->subpath);
-			free(entry);
-		}
-	}
-}
-
-static int bpf_elf_check_ehdr(const struct bpf_elf_ctx *ctx)
-{
-	if (ctx->elf_hdr.e_type != ET_REL ||
-	    (ctx->elf_hdr.e_machine != EM_NONE &&
-	     ctx->elf_hdr.e_machine != EM_BPF) ||
-	    ctx->elf_hdr.e_version != EV_CURRENT) {
-		fprintf(stderr, "ELF format error, ELF file not for eBPF?\n");
-		return -EINVAL;
-	}
-
-	switch (ctx->elf_hdr.e_ident[EI_DATA]) {
-	default:
-		fprintf(stderr, "ELF format error, wrong endianness info?\n");
-		return -EINVAL;
-	case ELFDATA2LSB:
-		if (htons(1) == 1) {
-			fprintf(stderr,
-				"We are big endian, eBPF object is little endian!\n");
-			return -EIO;
-		}
-		break;
-	case ELFDATA2MSB:
-		if (htons(1) != 1) {
-			fprintf(stderr,
-				"We are little endian, eBPF object is big endian!\n");
-			return -EIO;
-		}
-		break;
-	}
-
-	return 0;
-}
-
-static int bpf_elf_ctx_init(struct bpf_elf_ctx *ctx, const char *pathname,
-			    enum bpf_prog_type type, bool verbose)
-{
-	int ret = -EINVAL;
-
-	if (elf_version(EV_CURRENT) == EV_NONE ||
-	    bpf_init_env(pathname))
-		return ret;
-
-	memset(ctx, 0, sizeof(*ctx));
-	ctx->verbose = verbose;
-	ctx->type    = type;
-
-	ctx->obj_fd = open(pathname, O_RDONLY);
-	if (ctx->obj_fd < 0)
-		return ctx->obj_fd;
-
-	ctx->elf_fd = elf_begin(ctx->obj_fd, ELF_C_READ, NULL);
-	if (!ctx->elf_fd) {
-		ret = -EINVAL;
-		goto out_fd;
-	}
-
-	if (elf_kind(ctx->elf_fd) != ELF_K_ELF) {
-		ret = -EINVAL;
-		goto out_fd;
-	}
-
-	if (gelf_getehdr(ctx->elf_fd, &ctx->elf_hdr) !=
-	    &ctx->elf_hdr) {
-		ret = -EIO;
-		goto out_elf;
-	}
-
-	ret = bpf_elf_check_ehdr(ctx);
-	if (ret < 0)
-		goto out_elf;
-
-	ctx->sec_done = calloc(ctx->elf_hdr.e_shnum,
-			       sizeof(*(ctx->sec_done)));
-	if (!ctx->sec_done) {
-		ret = -ENOMEM;
-		goto out_elf;
-	}
-
-	if (ctx->verbose && bpf_log_realloc(ctx)) {
-		ret = -ENOMEM;
-		goto out_free;
-	}
-
-	bpf_save_finfo(ctx);
-	bpf_hash_init(ctx, CONFDIR "/bpf_pinning");
-
-	return 0;
-out_free:
-	free(ctx->sec_done);
-out_elf:
-	elf_end(ctx->elf_fd);
-out_fd:
-	close(ctx->obj_fd);
-	return ret;
-}
-
-static int bpf_maps_count(struct bpf_elf_ctx *ctx)
-{
-	int i, count = 0;
-
-	for (i = 0; i < ARRAY_SIZE(ctx->map_fds); i++) {
-		if (!ctx->map_fds[i])
-			break;
-		count++;
-	}
-
-	return count;
-}
-
-static void bpf_maps_teardown(struct bpf_elf_ctx *ctx)
-{
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(ctx->map_fds); i++) {
-		if (ctx->map_fds[i])
-			close(ctx->map_fds[i]);
-	}
-}
-
-static void bpf_elf_ctx_destroy(struct bpf_elf_ctx *ctx, bool failure)
-{
-	if (failure)
-		bpf_maps_teardown(ctx);
-
-	bpf_hash_destroy(ctx);
-
-	free(ctx->sec_done);
-	free(ctx->log);
-
-	elf_end(ctx->elf_fd);
-	close(ctx->obj_fd);
-}
-
-static struct bpf_elf_ctx __ctx;
-
-static int bpf_obj_open(const char *pathname, enum bpf_prog_type type,
-			const char *section, bool verbose)
-{
-	struct bpf_elf_ctx *ctx = &__ctx;
-	int fd = 0, ret;
-
-	ret = bpf_elf_ctx_init(ctx, pathname, type, verbose);
-	if (ret < 0) {
-		fprintf(stderr, "Cannot initialize ELF context!\n");
-		return ret;
-	}
-
-	ret = bpf_fetch_ancillary(ctx);
-	if (ret < 0) {
-		fprintf(stderr, "Error fetching ELF ancillary data!\n");
-		goto out;
-	}
-
-	fd = bpf_fetch_prog_sec(ctx, section);
-	if (fd < 0) {
-		fprintf(stderr, "Error fetching program/map!\n");
-		ret = fd;
-		goto out;
-	}
-
-	ret = bpf_fill_prog_arrays(ctx);
-	if (ret < 0)
-		fprintf(stderr, "Error filling program arrays!\n");
-out:
-	bpf_elf_ctx_destroy(ctx, ret < 0);
-	if (ret < 0) {
-		if (fd)
-			close(fd);
-		return ret;
-	}
-
-	return fd;
-}
-
-static int
-bpf_map_set_send(int fd, struct sockaddr_un *addr, unsigned int addr_len,
-		 const struct bpf_map_data *aux, unsigned int entries)
-{
-	struct bpf_map_set_msg msg = {
-		.aux.uds_ver = BPF_SCM_AUX_VER,
-		.aux.num_ent = entries,
-	};
-	int *cmsg_buf, min_fd;
-	char *amsg_buf;
-	int i;
-
-	strncpy(msg.aux.obj_name, aux->obj, sizeof(msg.aux.obj_name));
-	memcpy(&msg.aux.obj_st, aux->st, sizeof(msg.aux.obj_st));
-
-	cmsg_buf = bpf_map_set_init(&msg, addr, addr_len);
-	amsg_buf = (char *)msg.aux.ent;
-
-	for (i = 0; i < entries; i += min_fd) {
-		int ret;
-
-		min_fd = min(BPF_SCM_MAX_FDS * 1U, entries - i);
-		bpf_map_set_init_single(&msg, min_fd);
-
-		memcpy(cmsg_buf, &aux->fds[i], sizeof(aux->fds[0]) * min_fd);
-		memcpy(amsg_buf, &aux->ent[i], sizeof(aux->ent[0]) * min_fd);
-
-		ret = sendmsg(fd, &msg.hdr, 0);
-		if (ret <= 0)
-			return ret ? : -1;
-	}
-
-	return 0;
-}
-
-static int
-bpf_map_set_recv(int fd, int *fds,  struct bpf_map_aux *aux,
-		 unsigned int entries)
-{
-	struct bpf_map_set_msg msg;
-	int *cmsg_buf, min_fd;
-	char *amsg_buf, *mmsg_buf;
-	unsigned int needed = 1;
-	int i;
-
-	cmsg_buf = bpf_map_set_init(&msg, NULL, 0);
-	amsg_buf = (char *)msg.aux.ent;
-	mmsg_buf = (char *)&msg.aux;
-
-	for (i = 0; i < min(entries, needed); i += min_fd) {
-		struct cmsghdr *cmsg;
-		int ret;
-
-		min_fd = min(entries, entries - i);
-		bpf_map_set_init_single(&msg, min_fd);
-
-		ret = recvmsg(fd, &msg.hdr, 0);
-		if (ret <= 0)
-			return ret ? : -1;
-
-		cmsg = CMSG_FIRSTHDR(&msg.hdr);
-		if (!cmsg || cmsg->cmsg_type != SCM_RIGHTS)
-			return -EINVAL;
-		if (msg.hdr.msg_flags & MSG_CTRUNC)
-			return -EIO;
-		if (msg.aux.uds_ver != BPF_SCM_AUX_VER)
-			return -ENOSYS;
-
-		min_fd = (cmsg->cmsg_len - sizeof(*cmsg)) / sizeof(fd);
-		if (min_fd > entries || min_fd <= 0)
-			return -EINVAL;
-
-		memcpy(&fds[i], cmsg_buf, sizeof(fds[0]) * min_fd);
-		memcpy(&aux->ent[i], amsg_buf, sizeof(aux->ent[0]) * min_fd);
-		memcpy(aux, mmsg_buf, offsetof(struct bpf_map_aux, ent));
-
-		needed = aux->num_ent;
-	}
-
-	return 0;
-}
-
-int bpf_send_map_fds(const char *path, const char *obj)
-{
-	struct bpf_elf_ctx *ctx = &__ctx;
-	struct sockaddr_un addr = { .sun_family = AF_UNIX };
-	struct bpf_map_data bpf_aux = {
-		.fds = ctx->map_fds,
-		.ent = ctx->maps,
-		.st  = &ctx->stat,
-		.obj = obj,
-	};
-	int fd, ret;
-
-	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
-	if (fd < 0) {
-		fprintf(stderr, "Cannot open socket: %s\n",
-			strerror(errno));
-		return -1;
-	}
-
-	strncpy(addr.sun_path, path, sizeof(addr.sun_path));
-
-	ret = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
-	if (ret < 0) {
-		fprintf(stderr, "Cannot connect to %s: %s\n",
-			path, strerror(errno));
-		return -1;
-	}
-
-	ret = bpf_map_set_send(fd, &addr, sizeof(addr), &bpf_aux,
-			       bpf_maps_count(ctx));
-	if (ret < 0)
-		fprintf(stderr, "Cannot send fds to %s: %s\n",
-			path, strerror(errno));
-
-	bpf_maps_teardown(ctx);
-	close(fd);
-	return ret;
-}
-
-int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
-		     unsigned int entries)
-{
-	struct sockaddr_un addr = { .sun_family = AF_UNIX };
-	int fd, ret;
-
-	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
-	if (fd < 0) {
-		fprintf(stderr, "Cannot open socket: %s\n",
-			strerror(errno));
-		return -1;
-	}
-
-	strncpy(addr.sun_path, path, sizeof(addr.sun_path));
-
-	ret = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
-	if (ret < 0) {
-		fprintf(stderr, "Cannot bind to socket: %s\n",
-			strerror(errno));
-		return -1;
-	}
-
-	ret = bpf_map_set_recv(fd, fds, aux, entries);
-	if (ret < 0)
-		fprintf(stderr, "Cannot recv fds from %s: %s\n",
-			path, strerror(errno));
-
-	unlink(addr.sun_path);
-	close(fd);
-	return ret;
-}
-#endif /* HAVE_ELF */
diff --git a/tc/tc_bpf.h b/tc/tc_bpf.h
deleted file mode 100644
index 30306de..0000000
--- a/tc/tc_bpf.h
+++ /dev/null
@@ -1,82 +0,0 @@
-/*
- * tc_bpf.h	BPF common code
- *
- *		This program is free software; you can distribute it and/or
- *		modify it under the terms of the GNU General Public License
- *		as published by the Free Software Foundation; either version
- *		2 of the License, or (at your option) any later version.
- *
- * Authors:	Daniel Borkmann <dborkman@redhat.com>
- *		Jiri Pirko <jiri@resnulli.us>
- */
-
-#ifndef _TC_BPF_H_
-#define _TC_BPF_H_ 1
-
-#include <linux/netlink.h>
-#include <linux/bpf.h>
-#include <linux/magic.h>
-
-#include "utils.h"
-#include "bpf_scm.h"
-
-enum {
-	BPF_NLA_OPS_LEN = 0,
-	BPF_NLA_OPS,
-	BPF_NLA_FD,
-	BPF_NLA_NAME,
-	__BPF_NLA_MAX,
-};
-
-#define BPF_NLA_MAX	__BPF_NLA_MAX
-
-#define BPF_ENV_UDS	"TC_BPF_UDS"
-#define BPF_ENV_MNT	"TC_BPF_MNT"
-
-#ifndef BPF_MAX_LOG
-# define BPF_MAX_LOG	4096
-#endif
-
-#ifndef BPF_FS_MAGIC
-# define BPF_FS_MAGIC	0xcafe4a11
-#endif
-
-#define BPF_DIR_MNT	"/sys/fs/bpf"
-
-#define BPF_DIR_TC	"tc"
-#define BPF_DIR_GLOBALS	"globals"
-
-#ifndef TRACEFS_MAGIC
-# define TRACEFS_MAGIC	0x74726163
-#endif
-
-#define TRACE_DIR_MNT	"/sys/kernel/tracing"
-
-int bpf_trace_pipe(void);
-const char *bpf_default_section(const enum bpf_prog_type type);
-
-int bpf_parse_common(int *ptr_argc, char ***ptr_argv, const int *nla_tbl,
-		     enum bpf_prog_type type, const char **ptr_object,
-		     const char **ptr_uds_name, struct nlmsghdr *n);
-int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv);
-
-void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len);
-
-#ifdef HAVE_ELF
-int bpf_send_map_fds(const char *path, const char *obj);
-int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
-		     unsigned int entries);
-#else
-static inline int bpf_send_map_fds(const char *path, const char *obj)
-{
-	return 0;
-}
-
-static inline int bpf_recv_map_fds(const char *path, int *fds,
-				   struct bpf_map_aux *aux,
-				   unsigned int entries)
-{
-	return -1;
-}
-#endif /* HAVE_ELF */
-#endif /* _TC_BPF_H_ */
-- 
1.9.3

^ permalink raw reply related

* [PATCH net] net: __skb_flow_dissect() must cap its return value
From: Eric Dumazet @ 2016-11-10  0:04 UTC (permalink / raw)
  To: David Miller
  Cc: Yibin Yang, Tom Herbert, Jojy Varghese, Alexander Duyck,
	Alexei Starovoitov, Willem de Bruijn, netdev
In-Reply-To: <1478718427.16809.7.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <edumazet@google.com>

After Tom patch, thoff field could point past the end of the buffer,
this could fool some callers.

If an skb was provided, skb->len should be the upper limit.
If not, hlen is supposed to be the upper limit.

Fixes: a6e544b0a88b ("flow_dissector: Jump to exit code in __skb_flow_dissect")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yibin Yang <yibyang@cisco.com
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 net/core/flow_dissector.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 44e6ba9d3a6b..5a908c534bec 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -122,7 +122,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 	struct flow_dissector_key_keyid *key_keyid;
 	bool skip_vlan = false;
 	u8 ip_proto = 0;
-	bool ret = false;
+	bool ret;
 
 	if (!data) {
 		data = skb->data;
@@ -549,12 +549,17 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 out_good:
 	ret = true;
 
-out_bad:
+	key_control->thoff = (u16)nhoff;
+out:
 	key_basic->n_proto = proto;
 	key_basic->ip_proto = ip_proto;
-	key_control->thoff = (u16)nhoff;
 
 	return ret;
+
+out_bad:
+	ret = false;
+	key_control->thoff = min_t(u16, nhoff, skb ? skb->len : hlen);
+	goto out;
 }
 EXPORT_SYMBOL(__skb_flow_dissect);
 

^ permalink raw reply related

* Re: [PATCH] vxlan: hide unused local variable
From: David Miller @ 2016-11-10  0:00 UTC (permalink / raw)
  To: arnd; +Cc: jbenc, hannes, aduyck, pshelar, netdev, linux-kernel
In-Reply-To: <20161107211017.857340-1-arnd@arndb.de>

From: Arnd Bergmann <arnd@arndb.de>
Date: Mon,  7 Nov 2016 22:09:07 +0100

> A bugfix introduced a harmless warning in v4.9-rc4:
> 
> drivers/net/vxlan.c: In function 'vxlan_group_used':
> drivers/net/vxlan.c:947:21: error: unused variable 'sock6' [-Werror=unused-variable]
> 
> This hides the variable inside of the same #ifdef that is
> around its user. The extraneous initialization is removed
> at the same time, it was accidentally introduced in the
> same commit.
> 
> Fixes: c6fcc4fc5f8b ("vxlan: avoid using stale vxlan socket.")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Applied.

^ permalink raw reply

* Re: [PATCH net-next v2 5/5] net: l2tp: fix negative assignment to unsigned int
From: David Miller @ 2016-11-09 23:56 UTC (permalink / raw)
  To: asbjorn; +Cc: jchapman, netdev, linux-kernel
In-Reply-To: <20161107203928.30111-5-asbjorn@asbjorn.st>

From: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Date: Mon,  7 Nov 2016 20:39:28 +0000

> recv_seq, send_seq and lns_mode mode are all defined as
> unsigned int foo:1;
> 
> Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>

Applied.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox