Netdev List
* [PATCH RFC] ipv6: Implement limits on hop by hop and destination options
From: Tom Herbert @ 2017-04-27 20:57 UTC
  To: netdev; +Cc: Tom Herbert

RFC 2460 (IPv6) defines hop by hop options and destination options
extension headers. Both of these carry a list of TLVs which is
only limited by the maximum length of the extension header (2048
bytes). By the spec a host must process all the TLVs in these
options, however these could be used as a fairly obvious
denial of service attack. I think this could in fact be
a significant DoS vector on the Internet. One mitigating
factor might be that many firewalls drop all packets with extension
headers (and obviously this is only IPv6), so an Internet-wide
attack might not be so effective (yet!).
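
For reference, the 2048 byte ceiling follows from the 8-bit Hdr Ext Len
field, which counts 8-octet units beyond the first 8 octets. A minimal
sketch of that arithmetic is below; the helper names ext_hdr_len() and
hdr_ext_len_for() are made up for illustration and are not part of the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Length in bytes of a hop by hop or destination options extension
 * header, given its 8-bit Hdr Ext Len field. This is the same
 * expression the kernel code uses: (hdr_ext_len + 1) << 3.
 */
static unsigned int ext_hdr_len(uint8_t hdr_ext_len)
{
	return ((unsigned int)hdr_ext_len + 1) << 3;
}

/* Inverse mapping (illustrative only): the Hdr Ext Len value for a
 * header of `bytes` bytes. The length must be a multiple of 8 in the
 * range 8..2048.
 */
static uint8_t hdr_ext_len_for(unsigned int bytes)
{
	assert(bytes >= 8 && bytes <= 2048 && bytes % 8 == 0);
	return (uint8_t)((bytes >> 3) - 1);
}
```

ext_hdr_len(0) gives the 8 byte minimum and ext_hdr_len(255) gives the
2048 byte maximum mentioned above.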

By my calculation, the worst case packet with TLVs in a standard
1500 byte MTU packet that would be processed by the stack contains
1282 individual TLVs (including pad TLVs) or 724 two byte TLVs. I
wrote a quick test program that floods a whole bunch of these
packets to a host and sure enough there is substantial time spent
in ip6_parse_tlv. These packets contain nothing but unknown TLVs
(that are ignored), TLV padding, and a bogus UDP header with zero
payload length.

  25.38%  [kernel]                    [k] __fib6_clean_all
  21.63%  [kernel]                    [k] ip6_parse_tlv
   4.21%  [kernel]                    [k] __local_bh_enable_ip
   2.18%  [kernel]                    [k] ip6_pol_route.isra.39
   1.98%  [kernel]                    [k] fib6_walk_continue
   1.88%  [kernel]                    [k] _raw_write_lock_bh
   1.65%  [kernel]                    [k] dst_release

This patch adds configurable limits to destination and hop by hop
options. There are three limits that may be set:
  - Limit the number of non-padding TLVs that may be in an extension header
  - Limit the length of a hop by hop or destination options extension header
  - Disallow unknown options

The limits are set in corresponding sysctls:

  ipv6.sysctl.max_dst_opts_cnt
  ipv6.sysctl.max_hbh_opts_cnt
  ipv6.sysctl.max_dst_opts_len
  ipv6.sysctl.max_hbh_opts_len

If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed.
The number of known TLVs that are allowed is the absolute value of
this number.

If a limit is exceeded when processing an extension header the packet is
dropped.
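
The count semantics above can be sketched in user space roughly as
follows. This is only an illustration under stated assumptions, not the
kernel code: tlv_limits_ok() and tlv_is_known() are hypothetical names,
the `known` array stands in for the stack's table of supported options,
and the sketch ignores the per-type action bits that the kernel applies
to unknown options when they are allowed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLV_PAD1 0	/* one byte of padding, no length field */
#define TLV_PADN 1	/* multi-byte padding option */

/* Hypothetical helper: is `type` in the caller's list of known options? */
static bool tlv_is_known(uint8_t type, const uint8_t *known, size_t nknown)
{
	for (size_t i = 0; i < nknown; i++)
		if (known[i] == type)
			return true;
	return false;
}

/* Walk the option area of an extension header (the bytes after the
 * Next Header and Hdr Ext Len fields). Returns true if the limits are
 * respected, false if the packet would be dropped.
 */
static bool tlv_limits_ok(const uint8_t *opts, size_t len, int max_count,
			  const uint8_t *known, size_t nknown)
{
	bool disallow_unknowns = false;
	int tlv_count = 0;
	size_t off = 0;

	assert(opts != NULL || len == 0);

	/* A negative limit means "known options only"; its absolute
	 * value is the count limit. Guard against negating INT_MIN.
	 */
	if (max_count < 0) {
		disallow_unknowns = true;
		max_count = (max_count == INT_MIN) ? INT_MAX : -max_count;
	}

	while (off < len) {
		uint8_t type = opts[off];
		size_t optlen;

		if (type == TLV_PAD1) {
			off++;
			continue;
		}
		if (len - off < 2)
			return false;	/* truncated TLV header */
		optlen = (size_t)opts[off + 1] + 2;
		if (optlen > len - off)
			return false;	/* TLV overruns the header */

		if (type != TLV_PADN) {
			/* Only non-padding TLVs count against the limit */
			if (++tlv_count > max_count)
				return false;
			if (disallow_unknowns &&
			    !tlv_is_known(type, known, nknown))
				return false;
		}
		off += optlen;
	}
	return true;
}
```

With a limit of 8, a header holding two non-padding TLVs plus padding
passes; with a limit of -8 the same header is rejected as soon as one of
those TLVs is not in the known list.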

Default values are set to 8 for options counts, and set to INT_MAX
for maximum length. Note the choice to limit options to 8 is an
arbitrary guess (roughly based on the fact that the stack supports
three HBH options and just one destination option).

Tested: I've only compiled this code. I'm working on getting a test
environment set up, which is why this is an RFC. If anyone has
resources and time to do some testing or development, let me know!
---
 Documentation/networking/ip-sysctl.txt | 22 +++++++++++++++++
 include/net/ipv6.h                     | 33 +++++++++++++++++++++++++
 include/net/netns/ipv6.h               |  4 ++++
 net/ipv6/af_inet6.c                    |  4 ++++
 net/ipv6/exthdrs.c                     | 44 ++++++++++++++++++++++++++++++----
 net/ipv6/sysctl_net_ipv6.c             | 32 +++++++++++++++++++++++++
 6 files changed, 134 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 974ab47..476a5c5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1379,6 +1379,28 @@ mld_qrv - INTEGER
 	Default: 2 (as specified by RFC3810 9.1)
 	Minimum: 1 (as specified by RFC6636 4.5)
 
+max_dst_opts_cnt - INTEGER
+	Maximum number of non-padding TLVs allowed in a destination
+	options extension header. If this value is less than zero
+	then unknown options are disallowed and the number of known
+	TLVs allowed is the absolute value of this number.
+
+	Default: 8
+
+max_hbh_opts_cnt - INTEGER
+	Maximum number of non-padding TLVs allowed in a hop by hop
+	options extension header. If this value is less than zero
+	then unknown options are disallowed and the number of known
+	TLVs allowed is the absolute value of this number.
+
+max_dst_opts_len - INTEGER
+	Maximum length allowed for a destination options extension
+	header.
+
+max_hbh_opts_len - INTEGER
+	Maximum length allowed for a hop by hop options extension
+	header.
+
 IPv6 Fragmentation:
 
 ip6frag_high_thresh - INTEGER
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index dbf0abb..9f724ae 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -50,6 +50,39 @@
 #define IPV6_DEFAULT_HOPLIMIT   64
 #define IPV6_DEFAULT_MCASTHOPS	1
 
+/* Limits on hop by hop and destination options.
+ *
+ * Per RFC2460 there is no limit on the maximum number or lengths of TLVs in
+ * hop by hop or destination options other than the packet must fit in an MTU.
+ * We allow configurable limits in order to mitigate potential denial of
+ * service attacks.
+ *
+ * There are three limits that may be set:
+ *   - Limit the number of non-padding TLVs that may be in an extension header
+ *   - Limit the length of a hop by hop or destination options extension header
+ *   - Disallow unknown options
+ *
+ * The limits are set in corresponding sysctls:
+ *
+ * ipv6.sysctl.max_dst_opts_cnt
+ * ipv6.sysctl.max_hbh_opts_cnt
+ * ipv6.sysctl.max_dst_opts_len
+ * ipv6.sysctl.max_hbh_opts_len
+ *
+ * If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed.
+ * The number of known TLVs that are allowed is the absolute value of
+ * this number.
+ *
+ * If a limit is exceeded when processing an extension header the packet is
+ * dropped.
+ */
+
+/* Default limits for hop by hop and destination options */
+#define IP6_DEFAULT_MAX_DST_OPTS_CNT	8
+#define IP6_DEFAULT_MAX_HBH_OPTS_CNT	8
+#define IP6_DEFAULT_MAX_DST_OPTS_LEN	INT_MAX /* No limit */
+#define IP6_DEFAULT_MAX_HBH_OPTS_LEN	INT_MAX /* No limit */
+
 /*
  *	Addr type
  *	
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index de7745e..655bd236 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -36,6 +36,10 @@ struct netns_sysctl_ipv6 {
 	int idgen_retries;
 	int idgen_delay;
 	int flowlabel_state_ranges;
+	int max_dst_opts_cnt;
+	int max_hbh_opts_cnt;
+	int max_dst_opts_len;
+	int max_hbh_opts_len;
 };
 
 struct netns_ipv6 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index a88b5b5..38e1079 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -807,6 +807,10 @@ static int __net_init inet6_net_init(struct net *net)
 	net->ipv6.sysctl.idgen_retries = 3;
 	net->ipv6.sysctl.idgen_delay = 1 * HZ;
 	net->ipv6.sysctl.flowlabel_state_ranges = 0;
+	net->ipv6.sysctl.max_dst_opts_cnt = IP6_DEFAULT_MAX_DST_OPTS_CNT;
+	net->ipv6.sysctl.max_hbh_opts_cnt = IP6_DEFAULT_MAX_HBH_OPTS_CNT;
+	net->ipv6.sysctl.max_dst_opts_len = IP6_DEFAULT_MAX_DST_OPTS_LEN;
+	net->ipv6.sysctl.max_hbh_opts_len = IP6_DEFAULT_MAX_HBH_OPTS_LEN;
 	atomic_set(&net->ipv6.fib6_sernum, 1);
 
 	err = ipv6_init_mibs(net);
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index d32e211..d86aebf 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -100,13 +100,22 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int optoff)
 
 /* Parse tlv encoded option header (hop-by-hop or destination) */
 
-static bool ip6_parse_tlv(const struct tlvtype_proc *procs, struct sk_buff *skb)
+static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
+			  struct sk_buff *skb,
+			  int max_count)
 {
 	const struct tlvtype_proc *curr;
 	const unsigned char *nh = skb_network_header(skb);
 	int off = skb_network_header_len(skb);
 	int len = (skb_transport_header(skb)[1] + 1) << 3;
 	int padlen = 0;
+	int tlv_count = 0;
+	bool disallow_unknowns = false;
+
+	if (unlikely(max_count < 0)) {
+		disallow_unknowns = true;
+		max_count = -max_count;
+	}
 
 	if (skb_transport_offset(skb) + len > skb_headlen(skb))
 		goto bad;
@@ -148,6 +157,11 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs, struct sk_buff *skb)
 		default: /* Other TLV code so scan list */
 			if (optlen > len)
 				goto bad;
+
+			tlv_count++;
+			if (tlv_count > max_count)
+				goto bad;
+
 			for (curr = procs; curr->type >= 0; curr++) {
 				if (curr->type == nh[off]) {
 					/* type specific length/alignment
@@ -161,7 +175,10 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs, struct sk_buff *skb)
 			if (curr->type < 0) {
 				if (ip6_tlvopt_unknown(skb, off) == 0)
 					return false;
+				if (disallow_unknowns)
+					goto bad;
 			}
+
 			padlen = 0;
 			break;
 		}
@@ -260,23 +277,31 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
 	__u16 dstbuf;
 #endif
 	struct dst_entry *dst = skb_dst(skb);
+	struct net *net = dev_net(skb->dev);
+	int extlen;
 
 	if (!pskb_may_pull(skb, skb_transport_offset(skb) + 8) ||
 	    !pskb_may_pull(skb, (skb_transport_offset(skb) +
 				 ((skb_transport_header(skb)[1] + 1) << 3)))) {
+fail_and_free:
 		__IP6_INC_STATS(dev_net(dst->dev), ip6_dst_idev(dst),
 				IPSTATS_MIB_INHDRERRORS);
 		kfree_skb(skb);
 		return -1;
 	}
 
+	extlen = (skb_transport_header(skb)[1] + 1) << 3;
+	if (extlen > net->ipv6.sysctl.max_dst_opts_len)
+		goto fail_and_free;
+
 	opt->lastopt = opt->dst1 = skb_network_header_len(skb);
 #if IS_ENABLED(CONFIG_IPV6_MIP6)
 	dstbuf = opt->dst1;
 #endif
 
-	if (ip6_parse_tlv(tlvprocdestopt_lst, skb)) {
-		skb->transport_header += (skb_transport_header(skb)[1] + 1) << 3;
+	if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
+			  net->ipv6.sysctl.max_dst_opts_cnt)) {
+		skb->transport_header += extlen;
 		opt = IP6CB(skb);
 #if IS_ENABLED(CONFIG_IPV6_MIP6)
 		opt->nhoff = dstbuf;
@@ -804,6 +829,8 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
 int ipv6_parse_hopopts(struct sk_buff *skb)
 {
 	struct inet6_skb_parm *opt = IP6CB(skb);
+	struct net *net = dev_net(skb->dev);
+	int extlen;
 
 	/*
 	 * skb_network_header(skb) is equal to skb->data, and
@@ -818,9 +845,16 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
 		return -1;
 	}
 
+	extlen = (skb_transport_header(skb)[1] + 1) << 3;
+	if (extlen > net->ipv6.sysctl.max_hbh_opts_len) {
+		kfree_skb(skb);
+		return -1;
+	}
+
 	opt->flags |= IP6SKB_HOPBYHOP;
-	if (ip6_parse_tlv(tlvprochopopt_lst, skb)) {
-		skb->transport_header += (skb_transport_header(skb)[1] + 1) << 3;
+	if (ip6_parse_tlv(tlvprochopopt_lst, skb,
+			  net->ipv6.sysctl.max_hbh_opts_cnt)) {
+		skb->transport_header += extlen;
 		opt = IP6CB(skb);
 		opt->nhoff = sizeof(struct ipv6hdr);
 		return 1;
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 69c50e7..054cabe 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -90,6 +90,34 @@ static struct ctl_table ipv6_table_template[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "max_dst_opts_number",
+		.data		= &init_net.ipv6.sysctl.max_dst_opts_cnt,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
+		.procname	= "max_hbh_opts_number",
+		.data		= &init_net.ipv6.sysctl.max_hbh_opts_cnt,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
+		.procname	= "max_dst_opts_length",
+		.data		= &init_net.ipv6.sysctl.max_dst_opts_len,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
+		.procname	= "max_hbh_length",
+		.data		= &init_net.ipv6.sysctl.max_hbh_opts_len,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{ }
 };
 
@@ -149,6 +177,10 @@ static int __net_init ipv6_sysctl_net_init(struct net *net)
 	ipv6_table[6].data = &net->ipv6.sysctl.idgen_delay;
 	ipv6_table[7].data = &net->ipv6.sysctl.flowlabel_state_ranges;
 	ipv6_table[8].data = &net->ipv6.sysctl.ip_nonlocal_bind;
+	ipv6_table[9].data = &net->ipv6.sysctl.max_dst_opts_cnt;
+	ipv6_table[10].data = &net->ipv6.sysctl.max_hbh_opts_cnt;
+	ipv6_table[11].data = &net->ipv6.sysctl.max_dst_opts_len;
+	ipv6_table[12].data = &net->ipv6.sysctl.max_hbh_opts_len;
 
 	ipv6_route_table = ipv6_route_sysctl_init(net);
 	if (!ipv6_route_table)
-- 
2.7.4


* [PATCH v2 binutils] Add BPF support to binutils...
From: David Miller @ 2017-04-27 21:09 UTC
  To: ast; +Cc: daniel, netdev, xdp-newbies


Here is what I have after today's work.  I think I sorted out the
endianness issues.

gas can be controlled explicitly using "-EB" and "-EL" options.  The
default is whatever endianness the host has.  The elf names for the
two variants are "elf64-bpfbe" and "elf64-bpfle".
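
As a rough illustration of that selection rule (this is not the gas
code, and the function names are invented for the example), the choice
of target name could be sketched like this, with a runtime probe
standing in for however the build actually determines host endianness:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when the host stores the most significant byte first. */
static int host_is_big_endian(void)
{
	const uint16_t probe = 0x0102;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 0x01;
}

/* endian_opt is "-EB", "-EL", or NULL when neither option was given.
 * Any other string is treated as a caller bug in this sketch.
 */
static const char *bpf_elf_target_name(const char *endian_opt)
{
	int big = host_is_big_endian();

	if (endian_opt != NULL) {
		assert(strcmp(endian_opt, "-EB") == 0 ||
		       strcmp(endian_opt, "-EL") == 0);
		big = (strcmp(endian_opt, "-EB") == 0);
	}
	return big ? "elf64-bpfbe" : "elf64-bpfle";
}
```

An explicit option always wins; with no option the result depends on
the machine the sketch runs on.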

I fleshed out all the rest of the assembler parsing for instructions
and added many entries to the gas testsuite.

They are all explicitly in little endian, although I should add big
endian versions too of course.

If someone is looking for a way to help, could you please verify the
testsuite output to make sure the opcode and fields are correctly
set in the testsuite.  Just look in:

	gas/testsuite/gas/bpf/

and there are two files for every test.  One is the "foo.s" file which
gets built using gas into an object file "foo.o".  The other is a
dump file named "foo.d" which optionally specifies how to run gas and
with what options, and then what to dump with (usually "objdump -dr").
After that comes the text which the testsuite compares with the dump
of the resulting "foo.o" file.

The testsuite is driven by bpf.exp which has pretty straightforward
syntax.

Anyways, enjoy.  I'll keep cracking on this tomorrow.

====================
From 6a99a5754ae20bc8b857b3ebb814a18759ad3f4e Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Wed, 26 Apr 2017 14:27:53 -0400
Subject: [PATCH] Start adding BPF support...

---
 bfd/Makefile.am                 |   2 +
 bfd/Makefile.in                 |   3 +
 bfd/archures.c                  |   3 +
 bfd/bfd-in2.h                   |   8 +
 bfd/config.bfd                  |   6 +
 bfd/configure                   |   2 +
 bfd/configure.ac                |   2 +
 bfd/cpu-bpf.c                   |  41 +++
 bfd/elf64-bpf.c                 |  49 ++++
 bfd/libbfd.h                    |   4 +
 bfd/reloc.c                     |  11 +
 bfd/targets.c                   |   5 +
 config.sub                      |   5 +-
 gas/Makefile.am                 |   2 +
 gas/Makefile.in                 |  17 ++
 gas/config/tc-bpf.c             | 583 ++++++++++++++++++++++++++++++++++++++++
 gas/config/tc-bpf.h             |  45 ++++
 gas/configure.tgt               |   3 +
 gas/testsuite/gas/bpf/arith.d   |  61 +++++
 gas/testsuite/gas/bpf/arith.s   |  53 ++++
 gas/testsuite/gas/bpf/atomics.d |  12 +
 gas/testsuite/gas/bpf/atomics.s |   4 +
 gas/testsuite/gas/bpf/bpf.exp   |  28 ++
 gas/testsuite/gas/bpf/call.d    |  18 ++
 gas/testsuite/gas/bpf/call.s    |  10 +
 gas/testsuite/gas/bpf/imm64.d   |  30 +++
 gas/testsuite/gas/bpf/imm64.s   |  12 +
 gas/testsuite/gas/bpf/jump.d    |  43 +++
 gas/testsuite/gas/bpf/jump.s    |  35 +++
 gas/testsuite/gas/bpf/loads.d   |  23 ++
 gas/testsuite/gas/bpf/loads.s   |  15 ++
 gas/testsuite/gas/bpf/move.d    |  19 ++
 gas/testsuite/gas/bpf/move.s    |  11 +
 gas/testsuite/gas/bpf/stores.d  |  17 ++
 gas/testsuite/gas/bpf/stores.s  |   9 +
 gdb/bpf-tdep.c                  | 229 ++++++++++++++++
 gdb/bpf-tdep.h                  |  40 +++
 gdb/configure.tgt               |   4 +
 include/dis-asm.h               |   1 +
 include/elf/bpf.h               |  35 +++
 include/opcode/bpf.h            |  16 ++
 ld/Makefile.am                  |   4 +
 ld/Makefile.in                  |   5 +
 ld/configure.tgt                |   2 +
 ld/emulparams/elf64_bpf.sh      |   8 +
 opcodes/Makefile.am             |   2 +
 opcodes/bpf-dis.c               | 152 +++++++++++
 opcodes/bpf-opc.c               | 147 ++++++++++
 opcodes/configure               |   1 +
 opcodes/configure.ac            |   1 +
 opcodes/disassemble.c           |   6 +
 51 files changed, 1842 insertions(+), 2 deletions(-)
 create mode 100644 bfd/cpu-bpf.c
 create mode 100644 bfd/elf64-bpf.c
 create mode 100644 gas/config/tc-bpf.c
 create mode 100644 gas/config/tc-bpf.h
 create mode 100644 gas/testsuite/gas/bpf/arith.d
 create mode 100644 gas/testsuite/gas/bpf/arith.s
 create mode 100644 gas/testsuite/gas/bpf/atomics.d
 create mode 100644 gas/testsuite/gas/bpf/atomics.s
 create mode 100644 gas/testsuite/gas/bpf/bpf.exp
 create mode 100644 gas/testsuite/gas/bpf/call.d
 create mode 100644 gas/testsuite/gas/bpf/call.s
 create mode 100644 gas/testsuite/gas/bpf/imm64.d
 create mode 100644 gas/testsuite/gas/bpf/imm64.s
 create mode 100644 gas/testsuite/gas/bpf/jump.d
 create mode 100644 gas/testsuite/gas/bpf/jump.s
 create mode 100644 gas/testsuite/gas/bpf/loads.d
 create mode 100644 gas/testsuite/gas/bpf/loads.s
 create mode 100644 gas/testsuite/gas/bpf/move.d
 create mode 100644 gas/testsuite/gas/bpf/move.s
 create mode 100644 gas/testsuite/gas/bpf/stores.d
 create mode 100644 gas/testsuite/gas/bpf/stores.s
 create mode 100644 gdb/bpf-tdep.c
 create mode 100644 gdb/bpf-tdep.h
 create mode 100644 include/elf/bpf.h
 create mode 100644 include/opcode/bpf.h
 create mode 100644 ld/emulparams/elf64_bpf.sh
 create mode 100644 opcodes/bpf-dis.c
 create mode 100644 opcodes/bpf-opc.c

diff --git a/bfd/Makefile.am b/bfd/Makefile.am
index 97b608c..911655a 100644
--- a/bfd/Makefile.am
+++ b/bfd/Makefile.am
@@ -95,6 +95,7 @@ ALL_MACHINES = \
 	cpu-arm.lo \
 	cpu-avr.lo \
 	cpu-bfin.lo \
+	cpu-bpf.lo \
 	cpu-cr16.lo \
 	cpu-cr16c.lo \
 	cpu-cris.lo \
@@ -185,6 +186,7 @@ ALL_MACHINES_CFILES = \
 	cpu-arm.c \
 	cpu-avr.c \
 	cpu-bfin.c \
+	cpu-bpf.c \
 	cpu-cr16.c \
 	cpu-cr16c.c \
 	cpu-cris.c \
diff --git a/bfd/Makefile.in b/bfd/Makefile.in
index e48abaf..930aa09 100644
--- a/bfd/Makefile.in
+++ b/bfd/Makefile.in
@@ -428,6 +428,7 @@ ALL_MACHINES = \
 	cpu-arm.lo \
 	cpu-avr.lo \
 	cpu-bfin.lo \
+	cpu-bpf.lo \
 	cpu-cr16.lo \
 	cpu-cr16c.lo \
 	cpu-cris.lo \
@@ -518,6 +519,7 @@ ALL_MACHINES_CFILES = \
 	cpu-arm.c \
 	cpu-avr.c \
 	cpu-bfin.c \
+	cpu-bpf.c \
 	cpu-cr16.c \
 	cpu-cr16c.c \
 	cpu-cris.c \
@@ -1380,6 +1382,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-arm.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-avr.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-bfin.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-bpf.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-cr16.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-cr16c.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-cris.Plo@am__quote@
diff --git a/bfd/archures.c b/bfd/archures.c
index c6e7152..f096d73 100644
--- a/bfd/archures.c
+++ b/bfd/archures.c
@@ -447,6 +447,8 @@ DESCRIPTION
 .#define bfd_mach_avrxmega7 107
 .  bfd_arch_bfin,        {* ADI Blackfin *}
 .#define bfd_mach_bfin          1
+.  bfd_arch_bpf,        {* eBPF *}
+.#define bfd_mach_bpf           1
 .  bfd_arch_cr16,       {* National Semiconductor CompactRISC (ie CR16). *}
 .#define bfd_mach_cr16		1
 .  bfd_arch_cr16c,       {* National Semiconductor CompactRISC. *}
@@ -582,6 +584,7 @@ extern const bfd_arch_info_type bfd_arc_arch;
 extern const bfd_arch_info_type bfd_arm_arch;
 extern const bfd_arch_info_type bfd_avr_arch;
 extern const bfd_arch_info_type bfd_bfin_arch;
+extern const bfd_arch_info_type bfd_bpf_arch;
 extern const bfd_arch_info_type bfd_cr16_arch;
 extern const bfd_arch_info_type bfd_cr16c_arch;
 extern const bfd_arch_info_type bfd_cris_arch;
diff --git a/bfd/bfd-in2.h b/bfd/bfd-in2.h
index 17a35c0..6d44534 100644
--- a/bfd/bfd-in2.h
+++ b/bfd/bfd-in2.h
@@ -2304,6 +2304,8 @@ enum bfd_architecture
 #define bfd_mach_avrxmega7 107
   bfd_arch_bfin,        /* ADI Blackfin */
 #define bfd_mach_bfin          1
+  bfd_arch_bpf,        /* eBPF */
+#define bfd_mach_bpf           1
   bfd_arch_cr16,       /* National Semiconductor CompactRISC (ie CR16). */
 #define bfd_mach_cr16          1
   bfd_arch_cr16c,       /* National Semiconductor CompactRISC. */
@@ -3910,6 +3912,12 @@ pc-relative or some form of GOT-indirect relocation.  */
 /* ADI Blackfin arithmetic relocation.  */
   BFD_ARELOC_BFIN_ADDR,
 
+/* BPF relocations  */
+  BFD_RELOC_BPF_16,
+  BFD_RELOC_BPF_32,
+  BFD_RELOC_BPF_64,
+  BFD_RELOC_BPF_WDISP16,
+
 /* Mitsubishi D10V relocs.
 This is a 10-bit reloc with the right 2 bits
 assumed to be 0.  */
diff --git a/bfd/config.bfd b/bfd/config.bfd
index 151de95..f6d90cd 100644
--- a/bfd/config.bfd
+++ b/bfd/config.bfd
@@ -161,6 +161,7 @@ am33_2.0*)	 targ_archs=bfd_mn10300_arch ;;
 arc*)		 targ_archs=bfd_arc_arch ;;
 arm*)		 targ_archs=bfd_arm_arch ;;
 bfin*)		 targ_archs=bfd_bfin_arch ;;
+bpf*)		 targ_archs=bfd_bpf_arch ;;
 c30*)		 targ_archs=bfd_tic30_arch ;;
 c4x*)		 targ_archs=bfd_tic4x_arch ;;
 c54x*)		 targ_archs=bfd_tic54x_arch ;;
@@ -471,6 +472,11 @@ case "${targ}" in
     targ_underscore=yes
     ;;
 
+  bpf-*-*)
+    targ_defvec=bpf_elf64_be_vec
+    targ_selvecs=bpf_elf64_le_vec
+    ;;
+
   c30-*-*aout* | tic30-*-*aout*)
     targ_defvec=tic30_aout_vec
     ;;
diff --git a/bfd/configure b/bfd/configure
index 24e3e2f..2a5ba40 100755
--- a/bfd/configure
+++ b/bfd/configure
@@ -14298,6 +14298,8 @@ do
     avr_elf32_vec)		 tb="$tb elf32-avr.lo elf32.lo $elf" ;;
     bfin_elf32_vec)		 tb="$tb elf32-bfin.lo elf32.lo $elf" ;;
     bfin_elf32_fdpic_vec)	 tb="$tb elf32-bfin.lo elf32.lo $elf" ;;
+    bpf_elf64_le_vec)		 tb="$tb elf64-bpf.lo elf64.lo $elf" ;;
+    bpf_elf64_be_vec)		 tb="$tb elf64-bpf.lo elf64.lo $elf" ;;
     bout_be_vec)		 tb="$tb bout.lo aout32.lo" ;;
     bout_le_vec)		 tb="$tb bout.lo aout32.lo" ;;
     cr16_elf32_vec)		 tb="$tb elf32-cr16.lo elf32.lo $elf" ;;
diff --git a/bfd/configure.ac b/bfd/configure.ac
index e568847..0dd7139 100644
--- a/bfd/configure.ac
+++ b/bfd/configure.ac
@@ -429,6 +429,8 @@ do
     avr_elf32_vec)		 tb="$tb elf32-avr.lo elf32.lo $elf" ;;
     bfin_elf32_vec)		 tb="$tb elf32-bfin.lo elf32.lo $elf" ;;
     bfin_elf32_fdpic_vec)	 tb="$tb elf32-bfin.lo elf32.lo $elf" ;;
+    bpf_elf64_le_vec)		 tb="$tb elf64-bpf.lo elf64.lo $elf" ;;
+    bpf_elf64_be_vec)		 tb="$tb elf64-bpf.lo elf64.lo $elf" ;;
     bout_be_vec)		 tb="$tb bout.lo aout32.lo" ;;
     bout_le_vec)		 tb="$tb bout.lo aout32.lo" ;;
     cr16_elf32_vec)		 tb="$tb elf32-cr16.lo elf32.lo $elf" ;;
diff --git a/bfd/cpu-bpf.c b/bfd/cpu-bpf.c
new file mode 100644
index 0000000..551e42e
--- /dev/null
+++ b/bfd/cpu-bpf.c
@@ -0,0 +1,41 @@
+/* BFD Support for the eBPF.
+
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of BFD, the Binary File Descriptor library.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+#include "sysdep.h"
+#include "bfd.h"
+#include "libbfd.h"
+
+const bfd_arch_info_type bfd_bpf_arch =
+  {
+    64,     		/* Bits in a word.  */
+    64,  		/* Bits in an address.  */
+    8,     		/* Bits in a byte.  */
+    bfd_arch_bpf,
+    0,                	/* Only one machine.  */
+    "bpf",        	/* Arch name.  */
+    "bpf",        	/* Arch printable name.  */
+    3,                	/* Section align power.  */
+    TRUE,             	/* The one and only.  */
+    bfd_default_compatible,
+    bfd_default_scan,
+    bfd_arch_default_fill,
+    0,
+  };
diff --git a/bfd/elf64-bpf.c b/bfd/elf64-bpf.c
new file mode 100644
index 0000000..9944bb4
--- /dev/null
+++ b/bfd/elf64-bpf.c
@@ -0,0 +1,49 @@
+#include "sysdep.h"
+#include "bfd.h"
+#include "libbfd.h"
+#include "elf-bfd.h"
+#include "opcode/bpf.h"
+
+static void
+check_for_relocs (bfd * abfd, asection * o, void * failed)
+{
+  if ((o->flags & SEC_RELOC) != 0)
+    {
+      Elf_Internal_Ehdr *ehdrp;
+
+      ehdrp = elf_elfheader (abfd);
+      /* xgettext:c-format */
+      _bfd_error_handler (_("%B: Relocations in generic ELF (EM: %d)"),
+			  abfd, ehdrp->e_machine);
+
+      bfd_set_error (bfd_error_wrong_format);
+      * (bfd_boolean *) failed = TRUE;
+    }
+}
+
+static bfd_boolean
+elf64_generic_link_add_symbols (bfd *abfd, struct bfd_link_info *info)
+{
+  bfd_boolean failed = FALSE;
+
+  /* Check if there are any relocations.  */
+  bfd_map_over_sections (abfd, check_for_relocs, & failed);
+
+  if (failed)
+    return FALSE;
+  return bfd_elf_link_add_symbols (abfd, info);
+}
+
+#define TARGET_LITTLE_SYM	bpf_elf64_le_vec
+#define TARGET_LITTLE_NAME	"elf64-bpfle"
+#define TARGET_BIG_SYM		bpf_elf64_be_vec
+#define TARGET_BIG_NAME		"elf64-bpfbe"
+#define ELF_ARCH		bfd_arch_bpf
+#define ELF_MAXPAGESIZE		0x100000
+#define ELF_MACHINE_CODE	EM_BPF
+
+#define bfd_elf64_bfd_reloc_type_lookup bfd_default_reloc_type_lookup
+#define bfd_elf64_bfd_reloc_name_lookup _bfd_norelocs_bfd_reloc_name_lookup
+#define bfd_elf64_bfd_link_add_symbols	elf64_generic_link_add_symbols
+
+#include "elf64-target.h"
diff --git a/bfd/libbfd.h b/bfd/libbfd.h
index 8bac650..1a3001d 100644
--- a/bfd/libbfd.h
+++ b/bfd/libbfd.h
@@ -1794,6 +1794,10 @@ static const char *const bfd_reloc_code_real_names[] = { "@@uninitialized@@",
   "BFD_ARELOC_BFIN_PAGE",
   "BFD_ARELOC_BFIN_HWPAGE",
   "BFD_ARELOC_BFIN_ADDR",
+  "BFD_RELOC_BPF_16",
+  "BFD_RELOC_BPF_32",
+  "BFD_RELOC_BPF_64",
+  "BFD_RELOC_BPF_WDISP16",
   "BFD_RELOC_D10V_10_PCREL_R",
   "BFD_RELOC_D10V_10_PCREL_L",
   "BFD_RELOC_D10V_18",
diff --git a/bfd/reloc.c b/bfd/reloc.c
index 9a04022..4100caf 100644
--- a/bfd/reloc.c
+++ b/bfd/reloc.c
@@ -3854,6 +3854,17 @@ ENUMDOC
   ADI Blackfin arithmetic relocation.
 
 ENUM
+  BFD_RELOC_BPF_16
+ENUMX
+  BFD_RELOC_BPF_32
+ENUMX
+  BFD_RELOC_BPF_64
+ENUMX
+  BFD_RELOC_BPF_WDISP16
+ENUMDOC
+  BPF relocations
+
+ENUM
   BFD_RELOC_D10V_10_PCREL_R
 ENUMDOC
   Mitsubishi D10V relocs.
diff --git a/bfd/targets.c b/bfd/targets.c
index 5841e8d..c38c4fb 100644
--- a/bfd/targets.c
+++ b/bfd/targets.c
@@ -619,6 +619,8 @@ extern const bfd_target arm_pei_wince_le_vec;
 extern const bfd_target avr_elf32_vec;
 extern const bfd_target bfin_elf32_vec;
 extern const bfd_target bfin_elf32_fdpic_vec;
+extern const bfd_target bpf_elf64_le_vec;
+extern const bfd_target bpf_elf64_be_vec;
 extern const bfd_target bout_be_vec;
 extern const bfd_target bout_le_vec;
 extern const bfd_target cr16_elf32_vec;
@@ -1029,6 +1031,9 @@ static const bfd_target * const _bfd_target_vector[] =
 	&bfin_elf32_vec,
 	&bfin_elf32_fdpic_vec,
 
+	&bpf_elf64_le_vec,
+	&bpf_elf64_be_vec,
+
 	&bout_be_vec,
 	&bout_le_vec,
 
diff --git a/config.sub b/config.sub
index 40ea5df..942989e 100755
--- a/config.sub
+++ b/config.sub
@@ -2,7 +2,7 @@
 # Configuration validation subroutine script.
 #   Copyright 1992-2017 Free Software Foundation, Inc.
 
-timestamp='2017-04-02'
+timestamp='2017-04-25'
 
 # This file is free software; you can redistribute it and/or modify it
 # under the terms of the GNU General Public License as published by
@@ -257,6 +257,7 @@ case $basic_machine in
 	| ba \
 	| be32 | be64 \
 	| bfin \
+	| bpf \
 	| c4x | c8051 | clipper \
 	| d10v | d30v | dlx | dsp16xx \
 	| e2k | epiphany \
@@ -380,7 +381,7 @@ case $basic_machine in
 	| avr-* | avr32-* \
 	| ba-* \
 	| be32-* | be64-* \
-	| bfin-* | bs2000-* \
+	| bfin-* | bpf-* | bs2000-* \
 	| c[123]* | c30-* | [cjt]90-* | c4x-* \
 	| c8051-* | clipper-* | craynv-* | cydra-* \
 	| d10v-* | d30v-* | dlx-* \
diff --git a/gas/Makefile.am b/gas/Makefile.am
index c9f9de0..bfd6ed9 100644
--- a/gas/Makefile.am
+++ b/gas/Makefile.am
@@ -135,6 +135,7 @@ TARGET_CPU_CFILES = \
 	config/tc-arm.c \
 	config/tc-avr.c \
 	config/tc-bfin.c \
+	config/tc-bpf.c \
 	config/tc-cr16.c \
 	config/tc-cris.c \
 	config/tc-crx.c \
@@ -212,6 +213,7 @@ TARGET_CPU_HFILES = \
 	config/tc-arm.h \
 	config/tc-avr.h \
 	config/tc-bfin.h \
+	config/tc-bpf.h \
 	config/tc-cr16.h \
 	config/tc-cris.h \
 	config/tc-crx.h \
diff --git a/gas/Makefile.in b/gas/Makefile.in
index 1927de5..ee62f1a 100644
--- a/gas/Makefile.in
+++ b/gas/Makefile.in
@@ -431,6 +431,7 @@ TARGET_CPU_CFILES = \
 	config/tc-arm.c \
 	config/tc-avr.c \
 	config/tc-bfin.c \
+	config/tc-bpf.c \
 	config/tc-cr16.c \
 	config/tc-cris.c \
 	config/tc-crx.c \
@@ -508,6 +509,7 @@ TARGET_CPU_HFILES = \
 	config/tc-arm.h \
 	config/tc-avr.h \
 	config/tc-bfin.h \
+	config/tc-bpf.h \
 	config/tc-cr16.h \
 	config/tc-cris.h \
 	config/tc-crx.h \
@@ -868,6 +870,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-arm.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-avr.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-bfin.Po@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-bpf.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-cr16.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-cris.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-crx.Po@am__quote@
@@ -1045,6 +1048,20 @@ tc-bfin.obj: config/tc-bfin.c
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o tc-bfin.obj `if test -f 'config/tc-bfin.c'; then $(CYGPATH_W) 'config/tc-bfin.c'; else $(CYGPATH_W) '$(srcdir)/config/tc-bfin.c'; fi`
 
+tc-bpf.o: config/tc-bpf.c
+@am__fastdepCC_TRUE@	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT tc-bpf.o -MD -MP -MF $(DEPDIR)/tc-bpf.Tpo -c -o tc-bpf.o `test -f 'config/tc-bpf.c' || echo '$(srcdir)/'`config/tc-bpf.c
+@am__fastdepCC_TRUE@	$(am__mv) $(DEPDIR)/tc-bpf.Tpo $(DEPDIR)/tc-bpf.Po
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='config/tc-bpf.c' object='tc-bpf.o' libtool=no @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o tc-bpf.o `test -f 'config/tc-bpf.c' || echo '$(srcdir)/'`config/tc-bpf.c
+
+tc-bpf.obj: config/tc-bpf.c
+@am__fastdepCC_TRUE@	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT tc-bpf.obj -MD -MP -MF $(DEPDIR)/tc-bpf.Tpo -c -o tc-bpf.obj `if test -f 'config/tc-bpf.c'; then $(CYGPATH_W) 'config/tc-bpf.c'; else $(CYGPATH_W) '$(srcdir)/config/tc-bpf.c'; fi`
+@am__fastdepCC_TRUE@	$(am__mv) $(DEPDIR)/tc-bpf.Tpo $(DEPDIR)/tc-bpf.Po
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='config/tc-bpf.c' object='tc-bpf.obj' libtool=no @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o tc-bpf.obj `if test -f 'config/tc-bpf.c'; then $(CYGPATH_W) 'config/tc-bpf.c'; else $(CYGPATH_W) '$(srcdir)/config/tc-bpf.c'; fi`
+
 tc-cr16.o: config/tc-cr16.c
 @am__fastdepCC_TRUE@	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT tc-cr16.o -MD -MP -MF $(DEPDIR)/tc-cr16.Tpo -c -o tc-cr16.o `test -f 'config/tc-cr16.c' || echo '$(srcdir)/'`config/tc-cr16.c
 @am__fastdepCC_TRUE@	$(am__mv) $(DEPDIR)/tc-cr16.Tpo $(DEPDIR)/tc-cr16.Po
diff --git a/gas/config/tc-bpf.c b/gas/config/tc-bpf.c
new file mode 100644
index 0000000..f5fb308
--- /dev/null
+++ b/gas/config/tc-bpf.c
@@ -0,0 +1,583 @@
+/* tc-bpf.c -- Assemble for BPF
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of GAS, the GNU Assembler.
+
+   GAS is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GAS is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public
+   License along with GAS; see the file COPYING.  If not, write
+   to the Free Software Foundation, 51 Franklin Street - Fifth Floor,
+   Boston, MA 02110-1301, USA.  */
+
+#include "as.h"
+#include "safe-ctype.h"
+#include "subsegs.h"
+#include "opcode/bpf.h"
+#ifdef OBJ_ELF
+#include "elf/bpf.h"
+#include "dwarf2dbg.h"
+#endif
+
+const pseudo_typeS md_pseudo_table[] =
+{
+  {"align", s_align_bytes, 0},	/* Defaulting is invalid (0).  */
+  {"global", s_globl, 0},
+  {"half", cons, 2},
+  {"skip", s_space, 0},
+  {"word", cons, 4},
+  {"xword", cons, 8},
+  {NULL, 0, 0},
+};
+
+const char comment_chars[] = "!";
+const char line_comment_chars[] = "#";
+const char line_separator_chars[] = ";";
+const char EXP_CHARS[] = "eE";
+const char FLT_CHARS[] = "rRsSfFdDxXpP";
+
+const char *md_shortopts = "V";
+struct option md_longopts[] =
+{
+#define OPTION_LITTLE_ENDIAN (OPTION_MD_BASE + 8)
+  {"EL", no_argument, NULL, OPTION_LITTLE_ENDIAN},
+#define OPTION_BIG_ENDIAN (OPTION_MD_BASE + 9)
+  {"EB", no_argument, NULL, OPTION_BIG_ENDIAN},
+  { NULL, no_argument, NULL, 0 },
+};
+size_t md_longopts_size = sizeof (md_longopts);
+
+/* Whether or not we've set target_big_endian.  */
+static int set_target_endian = 0;
+
+int
+md_parse_option (int c, const char *arg ATTRIBUTE_UNUSED)
+{
+  switch (c)
+    {
+    case OPTION_LITTLE_ENDIAN:
+      target_big_endian = 0;
+      set_target_endian = 1;
+      break;
+    case OPTION_BIG_ENDIAN:
+      target_big_endian = 1;
+      set_target_endian = 1;
+      break;
+    case 'V':
+      print_version_id ();
+      break;
+    default:
+      return 0;
+    }
+  return 1;
+}
+
+void
+md_show_usage (FILE *stream)
+{
+  fprintf (stream, _("BPF options:\n"));
+}
+
+/* Handle of the OPCODE hash table.  */
+static struct hash_control *op_hash;
+
+void
+md_begin (void)
+{
+  const char *retval = NULL;
+  unsigned int i = 0;
+  int lose = 0;
+
+  op_hash = hash_new ();
+  while (i < (unsigned int) bpf_num_opcodes)
+    {
+      const char *name = bpf_opcodes[i].name;
+      retval = hash_insert (op_hash, name, (void *) &bpf_opcodes[i]);
+      if (retval != NULL)
+	{
+	  as_bad (_("Internal error: can't hash `%s': %s\n"),
+		  bpf_opcodes[i].name, retval);
+	  lose = 1;
+	}
+      do
+	{
+	  ++i;
+	}
+      while (i < (unsigned int) bpf_num_opcodes
+	     && !strcmp (bpf_opcodes[i].name, name));
+    }
+  if (lose)
+    as_fatal (_("Broken assembler.  No assembly attempted."));
+
+  if (!set_target_endian)
+    {
+      /* Default to host endianness. */
+#ifdef WORDS_BIGENDIAN
+      target_big_endian = 1;
+#else
+      target_big_endian = 0;
+#endif
+      set_target_endian = 1;
+    }
+}
+
+const char *
+bpf_target_format (void)
+{
+  return target_big_endian ? "elf64-bpfbe" : "elf64-bpfle";
+}
+
+struct bpf_it
+  {
+    const char *error;
+    valueT opcode;
+    valueT high64;
+    expressionS exp;
+    int pcrel;
+    int imm64;
+    bfd_reloc_code_real_type reloc;
+  };
+
+/* Subroutine of md_assemble to output one insn.  */
+
+static void
+output_insn (struct bpf_it *theinsn)
+{
+  valueT opc = theinsn->opcode;
+  char *toP = frag_more (theinsn->imm64 ? 16 : 8);
+  char code, regs;
+  
+  code = opc >> (64 - 8);
+  regs = opc >> (64 - (8 + 8));
+
+  toP[0] = code;
+  toP[1] = regs;
+
+  /* Put out the opcode.  */
+  if (target_big_endian)
+    {
+      number_to_chars_bigendian (toP + 2, opc >> 32, 2);
+      number_to_chars_bigendian (toP + 4, opc, 4);
+    }
+  else
+    {
+      number_to_chars_littleendian (toP + 2, opc >> 32, 2);
+      number_to_chars_littleendian (toP + 4, opc, 4);
+    }
+
+  if (theinsn->imm64)
+    {
+      toP[8] = 0;
+      toP[9] = 0;
+      toP[10] = 0;
+      toP[11] = 0;
+      if (target_big_endian)
+	{
+	  number_to_chars_bigendian (toP + 12, theinsn->high64, 4);
+	}
+      else
+	{
+	  number_to_chars_littleendian (toP + 12, theinsn->high64, 4);
+	}
+    }
+
+  /* Put out the symbol-dependent stuff.  */
+  if (theinsn->reloc != BFD_RELOC_NONE)
+    {
+      fixS *fixP =  fix_new_exp (frag_now,	/* Which frag.  */
+				 (toP - frag_now->fr_literal),	/* Where.  */
+				 4,		/* Size.  */
+				 &theinsn->exp,
+				 theinsn->pcrel,
+				 theinsn->reloc);
+      /* Turn off overflow checking in fixup_segment.  We'll do our
+	 own overflow checking in md_apply_fix.  This is necessary because
+	 the insn size is 4 and fixup_segment will signal an overflow for
+	 large 8 byte quantities.  */
+      fixP->fx_no_overflow = 1;
+    }
+
+#ifdef OBJ_ELF
+  dwarf2_emit_insn (theinsn->imm64 ? 16 : 8);
+#endif
+}
+
+static struct bpf_it the_insn;
+static char *expr_end;
+
+static int
+get_expression (char *str, expressionS *exp)
+{
+  char *save_in;
+  segT seg;
+
+  save_in = input_line_pointer;
+  input_line_pointer = str;
+  seg = expression (exp);
+  if (seg != absolute_section
+      && seg != text_section
+      && seg != data_section
+      && seg != bss_section
+      && seg != undefined_section)
+    {
+      the_insn.error = _("bad segment");
+      expr_end = input_line_pointer;
+      input_line_pointer = save_in;
+      return 1;
+    }
+  expr_end = input_line_pointer;
+  input_line_pointer = save_in;
+  return 0;
+}
+
+void
+md_assemble (char *str)
+{
+  const struct bpf_opcode *insn;
+  const char *args;
+  char *argsStart;
+  int match = 0;
+  valueT mask;
+  char *s, c;
+
+  s = str;
+  if (ISLOWER (*s))
+    {
+      do
+	++s;
+      while (ISLOWER (*s) || ISDIGIT (*s) || *s == '_');
+    }
+
+  switch (*s)
+    {
+    case '\0':
+      break;
+
+    case ' ':
+      *s++ = '\0';
+      break;
+
+    default:
+      as_bad (_("Unknown opcode: `%s'"), str);
+      return;
+    }
+  insn = (struct bpf_opcode *) hash_find (op_hash, str);
+
+  if (insn == NULL)
+    {
+      as_bad (_("Unknown opcode: `%s'"), str);
+      return;
+    }
+
+  argsStart = s;
+  for (;;)
+    {
+      memset (&the_insn, '\0', sizeof (the_insn));
+      the_insn.reloc = BFD_RELOC_NONE;
+      the_insn.opcode = ((valueT)insn->code << 56);
+
+      for (args = insn->args;; args++)
+	{
+	  switch (*args)
+	    {
+	    case '+':
+	    case ',':
+	    case '[':
+	    case ']':
+	      if (*s++ == *args)
+		continue;
+	      break;
+	    case '1':
+	      if (*s++ == 'r')
+		{
+		  if (!ISDIGIT ((c = *s++)))
+		    {
+		      goto error;
+		    }
+		  c -= '0';
+		  mask = c;
+		  if (ISDIGIT (*s))
+		    {
+		      c = *s++;
+		      if (c != '0' || mask != 1)
+			goto error;
+		      mask = 10;
+		    }			  
+		  the_insn.opcode |= (mask << 52);
+		  continue;
+		}
+	      break;
+	    case '2':
+	      if (*s++ == 'r')
+		{
+		  if (!ISDIGIT ((c = *s++)))
+		    {
+		      goto error;
+		    }
+		  c -= '0';
+		  mask = c;
+		  if (ISDIGIT (*s))
+		    {
+		      c = *s++;
+		      if (c != '0' || mask != 1)
+			goto error;
+		      mask = 10;
+		    }			  
+		  the_insn.opcode |= (mask << 48);
+		  continue;
+		}
+	      break;
+	    case 'i':
+	    case 'C':
+	      the_insn.reloc = BFD_RELOC_BPF_32;
+	      if (*s == ' ')
+		s++;
+	      get_expression (s, &the_insn.exp);
+	      s = expr_end;
+	      if (the_insn.exp.X_op == O_constant
+		  && the_insn.exp.X_add_symbol == 0
+		  && the_insn.exp.X_op_symbol == 0)
+		{
+		  valueT val = the_insn.exp.X_add_number;
+
+		  the_insn.reloc = BFD_RELOC_NONE;
+		  val &= 0xffffffff;
+		  the_insn.opcode |= val;
+		}
+	      continue;
+	    case 'O':
+	      the_insn.reloc = BFD_RELOC_BPF_16;
+	      if (*s == ' ')
+		s++;
+	      get_expression (s, &the_insn.exp);
+	      s = expr_end;
+	      if (the_insn.exp.X_op == O_constant
+		  && the_insn.exp.X_add_symbol == 0
+		  && the_insn.exp.X_op_symbol == 0)
+		{
+		  valueT val = the_insn.exp.X_add_number;
+
+		  the_insn.reloc = BFD_RELOC_NONE;
+		  val &= 0xffff;
+		  the_insn.opcode |= val << 32;
+		}
+	      continue;
+	    case 'L':
+	      the_insn.reloc = BFD_RELOC_BPF_WDISP16;
+	      the_insn.pcrel = 1;
+	      if (*s == ' ')
+		s++;
+	      get_expression (s, &the_insn.exp);
+	      s = expr_end;
+	      if (the_insn.exp.X_op == O_constant
+		  && the_insn.exp.X_add_symbol == 0
+		  && the_insn.exp.X_op_symbol == 0)
+		{
+		  valueT val = the_insn.exp.X_add_number;
+
+		  the_insn.reloc = BFD_RELOC_NONE;
+		  val &= 0xffff;
+		  the_insn.opcode |= val << 32;
+		}
+	      continue;
+	    case 'D':
+	      the_insn.reloc = BFD_RELOC_BPF_64;
+	      the_insn.imm64 = 1;
+	      if (*s == ' ')
+		s++;
+	      get_expression (s, &the_insn.exp);
+	      s = expr_end;
+	      if (the_insn.exp.X_op == O_constant
+		  && the_insn.exp.X_add_symbol == 0
+		  && the_insn.exp.X_op_symbol == 0)
+		{
+		  valueT val = the_insn.exp.X_add_number;
+
+		  the_insn.reloc = BFD_RELOC_NONE;
+		  the_insn.opcode |= (val & 0xffffffff);
+		  the_insn.high64 = ((val >> 32) & 0xffffffff);
+		}
+	      continue;
+	    case '\0':		/* End of args.  */
+	      match = 1;
+	      break;
+	    default:
+	      as_fatal (_("failed sanity check."));
+	    }
+
+	  /* Break out of for() loop.  */
+	  break;
+	}
+    error:
+      if (match == 0)
+	{
+	  /* Args don't match.  */
+	  if (&insn[1] - bpf_opcodes < bpf_num_opcodes
+	      && (insn->name == insn[1].name
+		  || !strcmp (insn->name, insn[1].name)))
+	    {
+	      ++insn;
+	      s = argsStart;
+	      continue;
+	    }
+	  else
+	    {
+	      as_bad (_("Illegal operands%s"), "");
+	      return;
+	    }
+	}
+      break;
+    }
+
+  output_insn (&the_insn);
+}
+
+void
+md_number_to_chars (char *buf, valueT val, int n)
+{
+  if (target_big_endian)
+    number_to_chars_bigendian (buf, val, n);
+  else
+    number_to_chars_littleendian (buf, val, n);
+}
+
+static void
+md_apply_u16 (offsetT val, char *buf)
+{
+  long off;
+
+  if (target_big_endian)
+    off = bfd_getb16 ((unsigned char *) buf + 2);
+  else
+    off = bfd_getl16 ((unsigned char *) buf + 2);
+  off |= val;
+  if (target_big_endian)
+    bfd_putb16 (off, (unsigned char *) buf + 2);
+  else
+    bfd_putl16 (off, (unsigned char *) buf + 2);
+}
+
+static void
+md_apply_u32 (offsetT val, char *buf)
+{
+  long imm;
+
+  if (target_big_endian)
+    imm = bfd_getb32 ((unsigned char *) buf + 4);
+  else
+    imm = bfd_getl32 ((unsigned char *) buf + 4);
+  imm |= val;
+  if (target_big_endian)
+    bfd_putb32 (imm, (unsigned char *) buf + 4);
+  else
+    bfd_putl32 (imm, (unsigned char *) buf + 4);
+}
+
+static void
+md_apply_u64 (offsetT val, char *buf)
+{
+  md_apply_u32 (val & 0xffffffff, buf);
+  md_apply_u32 ((val >> 32) & 0xffffffff, buf + 8);
+}
+
+void
+md_apply_fix (fixS *fixP, valueT *valP, segT segment ATTRIBUTE_UNUSED)
+{
+  char *buf = fixP->fx_where + fixP->fx_frag->fr_literal;
+  offsetT val = * (offsetT *) valP;
+
+  gas_assert (fixP->fx_r_type < BFD_RELOC_UNUSED);
+  /* If this is a data relocation, just output VAL.  */
+
+  if (fixP->fx_r_type == BFD_RELOC_8)
+    {
+      md_number_to_chars (buf, val, 1);
+    }
+  else if (fixP->fx_r_type == BFD_RELOC_16)
+    {
+      md_number_to_chars (buf, val, 2);
+    }
+  else if (fixP->fx_r_type == BFD_RELOC_32)
+    {
+      md_number_to_chars (buf, val, 4);
+    }
+  else if (fixP->fx_r_type == BFD_RELOC_64)
+    {
+      md_number_to_chars (buf, val, 8);
+    }
+  else if (fixP->fx_r_type == BFD_RELOC_VTABLE_INHERIT
+           || fixP->fx_r_type == BFD_RELOC_VTABLE_ENTRY)
+    {
+      fixP->fx_done = 0;
+      return;
+    }
+  else
+    {
+      /* It's a relocation against an instruction.  */
+
+      switch (fixP->fx_r_type)
+	{
+	case BFD_RELOC_BPF_WDISP16:
+	  val = val >> 3;
+	  md_apply_u16 ((val + 1) & 0xffff, buf);
+	  break;
+	case BFD_RELOC_BPF_16:
+	  md_apply_u16 (val & 0xffff, buf);
+	  break;
+	case BFD_RELOC_BPF_32:
+	  md_apply_u32 (val & 0xffffffff, buf);
+	  break;
+	case BFD_RELOC_BPF_64:
+	  md_apply_u64 (val, buf);
+	  break;
+	case BFD_RELOC_NONE:
+	default:
+	  as_bad_where (fixP->fx_file, fixP->fx_line,
+			_("bad or unhandled relocation type: 0x%02x"),
+			fixP->fx_r_type);
+	  break;
+	}
+
+    }
+}
+
+arelent *
+tc_gen_reloc (asection *section ATTRIBUTE_UNUSED, fixS *fixp ATTRIBUTE_UNUSED)
+{
+  return NULL;
+}
+
+symbolS *
+md_undefined_symbol (char *name ATTRIBUTE_UNUSED)
+{
+  return 0;
+}
+
+valueT
+md_section_align (segT segment ATTRIBUTE_UNUSED, valueT size)
+{
+  return size;
+}
+
+long
+md_pcrel_from (fixS *fixP)
+{
+  long ret;
+
+  ret = fixP->fx_where + fixP->fx_frag->fr_address;
+  /* XXX */
+  return ret;
+}
+
+const char *
+md_atof (int type, char *litP, int *sizeP)
+{
+  return ieee_md_atof (type, litP, sizeP, target_big_endian);
+}
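
Note on the encoding produced by output_insn in tc-bpf.c: the following is a minimal sketch, not part of the patch, of the 8-byte layout as it appears with -EL. It assumes what the code above and the expected dumps in the testsuite show: byte 0 is the opcode, byte 1 holds the '1' operand register in its high nibble and the '2' operand register in its low nibble, bytes 2-3 are the 16-bit offset, and bytes 4-7 are the 32-bit immediate. The names bpf_sketch_encode_le and put_le are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the low N bytes of V at P, least
   significant byte first.  */
static void
put_le (unsigned char *p, uint64_t v, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    p[i] = (unsigned char) (v >> (8 * i));
}

/* Hypothetical sketch of the little-endian layout written by
   output_insn: the 64-bit opcode word is built the same way
   md_assemble builds the_insn.opcode, then split into bytes.  */
void
bpf_sketch_encode_le (unsigned char out[8], unsigned int code,
		      unsigned int reg1, unsigned int reg2,
		      uint16_t off, uint32_t imm)
{
  uint64_t opc;

  /* The operand parser above only accepts r0 through r10.  */
  assert (reg1 <= 10 && reg2 <= 10);

  opc = ((uint64_t) (code & 0xff) << 56)
	| ((uint64_t) reg1 << 52)
	| ((uint64_t) reg2 << 48)
	| ((uint64_t) off << 32)
	| imm;

  out[0] = (unsigned char) (opc >> 56);	/* Opcode byte.  */
  out[1] = (unsigned char) (opc >> 48);	/* Register byte.  */
  put_le (out + 2, opc >> 32, 2);	/* 16-bit offset.  */
  put_le (out + 4, opc, 4);		/* 32-bit immediate.  */
}
```

With code 0x0f, reg1 = 1 and reg2 = 2 the sketch yields 0f 12 00 00 00 00 00 00, which is the first line expected by gas/testsuite/gas/bpf/arith.d.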
diff --git a/gas/config/tc-bpf.h b/gas/config/tc-bpf.h
new file mode 100644
index 0000000..45ab5d2
--- /dev/null
+++ b/gas/config/tc-bpf.h
@@ -0,0 +1,45 @@
+/* tc-bpf.h - Macros and type defines for BPF.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of GAS, the GNU Assembler.
+
+   GAS is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as
+   published by the Free Software Foundation; either version 3,
+   or (at your option) any later version.
+
+   GAS is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
+   the GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public
+   License along with GAS; see the file COPYING.  If not, write
+   to the Free Software Foundation, 51 Franklin Street - Fifth Floor,
+   Boston, MA 02110-1301, USA.  */
+
+#ifndef TC_BPF
+#define TC_BPF 1
+
+#define TARGET_ARCH			bfd_arch_bpf
+
+#ifdef WORDS_BIGENDIAN
+#define TARGET_BYTES_BIG_ENDIAN		1
+#else
+#define TARGET_BYTES_BIG_ENDIAN		0
+#endif
+
+#define TARGET_FORMAT (bpf_target_format ())
+extern const char *bpf_target_format (void);
+
+#define md_convert_frag(b,s,f) \
+  as_fatal (_("bpf convert_frag\n"))
+#define md_estimate_size_before_relax(f,s) \
+  (as_fatal (_("estimate_size_before_relax called")), 1)
+#define md_operand(x)
+
+#define LISTING_HEADER "BPF GAS "
+
+#define WORKING_DOT_WORD
+
+#endif
diff --git a/gas/configure.tgt b/gas/configure.tgt
index ca58b69..fa959c3 100644
--- a/gas/configure.tgt
+++ b/gas/configure.tgt
@@ -54,6 +54,7 @@ case ${cpu} in
   arm*be|arm*b)		cpu_type=arm endian=big ;;
   arm*)			cpu_type=arm endian=little ;;
   bfin*)		cpu_type=bfin endian=little ;;
+  bpf*)			cpu_type=bpf ;;
   c4x*)			cpu_type=tic4x ;;
   cr16*)		cpu_type=cr16 endian=little ;;
   crisv32)		cpu_type=cris arch=crisv32 ;;
@@ -171,6 +172,8 @@ case ${generic_target} in
   bfin-*-uclinux*)			fmt=elf em=linux ;;
   bfin-*elf)				fmt=elf ;;
 
+  bpf-*elf)				fmt=elf ;;
+
   cr16-*-elf*)				fmt=elf ;;
 
   cris-*-linux-* | crisv32-*-linux-*)
diff --git a/gas/testsuite/gas/bpf/arith.d b/gas/testsuite/gas/bpf/arith.d
new file mode 100644
index 0000000..d63de38
--- /dev/null
+++ b/gas/testsuite/gas/bpf/arith.d
@@ -0,0 +1,61 @@
+#as: -EL
+#objdump: -dr
+#name: arith
+
+.*: +file format elf64-bpfle
+
+Disassembly of section .text:
+
+0000000000000000 <.text>:
+   0:	0f 12 00 00 00 00 00 00 	add	r1, r2
+   8:	07 10 00 00 05 00 00 00 	add	r1, 5
+  10:	0c 12 00 00 00 00 00 00 	add32	r1, r2
+  18:	04 10 00 00 05 00 00 00 	add32	r1, 5
+  20:	1f 12 00 00 00 00 00 00 	sub	r1, r2
+  28:	17 10 00 00 05 00 00 00 	sub	r1, 5
+  30:	1c 12 00 00 00 00 00 00 	sub32	r1, r2
+  38:	14 10 00 00 05 00 00 00 	sub32	r1, 5
+  40:	5f 12 00 00 00 00 00 00 	and	r1, r2
+  48:	57 10 00 00 ff 00 00 00 	and	r1, 255
+  50:	5c 12 00 00 00 00 00 00 	and32	r1, r2
+  58:	54 10 00 00 ff 00 00 00 	and32	r1, 255
+  60:	4f 12 00 00 00 00 00 00 	or	r1, r2
+  68:	a7 10 00 00 80 00 00 00 	or	r1, 128
+  70:	4c 12 00 00 00 00 00 00 	or32	r1, r2
+  78:	a4 10 00 00 80 00 00 00 	or32	r1, 128
+  80:	af 12 00 00 00 00 00 00 	xor	r1, r2
+  88:	47 10 00 00 1f 00 00 00 	xor	r1, 31
+  90:	ac 12 00 00 00 00 00 00 	xor32	r1, r2
+  98:	44 10 00 00 1f 00 00 00 	xor32	r1, 31
+  a0:	2f 12 00 00 00 00 00 00 	mul	r1, r2
+  a8:	27 10 00 00 05 00 00 00 	mul	r1, 5
+  b0:	2c 12 00 00 00 00 00 00 	mul32	r1, r2
+  b8:	24 10 00 00 05 00 00 00 	mul32	r1, 5
+  c0:	3f 12 00 00 00 00 00 00 	div	r1, r2
+  c8:	37 10 00 00 02 00 00 00 	div	r1, 2
+  d0:	3c 12 00 00 00 00 00 00 	div32	r1, r2
+  d8:	34 10 00 00 02 00 00 00 	div32	r1, 2
+  e0:	9f 12 00 00 00 00 00 00 	mod	r1, r2
+  e8:	97 10 00 00 03 00 00 00 	mod	r1, 3
+  f0:	9c 12 00 00 00 00 00 00 	mod32	r1, r2
+  f8:	94 10 00 00 03 00 00 00 	mod32	r1, 3
+ 100:	6f 12 00 00 00 00 00 00 	lsh	r1, r2
+ 108:	67 10 00 00 01 00 00 00 	lsh	r1, 1
+ 110:	6c 12 00 00 00 00 00 00 	lsh32	r1, r2
+ 118:	64 10 00 00 01 00 00 00 	lsh32	r1, 1
+ 120:	7f 12 00 00 00 00 00 00 	rsh	r1, r2
+ 128:	77 10 00 00 01 00 00 00 	rsh	r1, 1
+ 130:	7c 12 00 00 00 00 00 00 	rsh32	r1, r2
+ 138:	74 10 00 00 01 00 00 00 	rsh32	r1, 1
+ 140:	cf 12 00 00 00 00 00 00 	arsh	r1, r2
+ 148:	c7 10 00 00 04 00 00 00 	arsh	r1, 4
+ 150:	cc 12 00 00 00 00 00 00 	arsh32	r1, r2
+ 158:	c4 10 00 00 04 00 00 00 	arsh32	r1, 4
+ 160:	8f 10 00 00 00 00 00 00 	neg	r1
+ 168:	8c 10 00 00 00 00 00 00 	neg32	r1
+ 170:	dc 10 00 00 10 00 00 00 	endbe	r1, 16
+ 178:	dc 10 00 00 20 00 00 00 	endbe	r1, 32
+ 180:	dc 10 00 00 40 00 00 00 	endbe	r1, 64
+ 188:	d4 10 00 00 10 00 00 00 	endle	r1, 16
+ 190:	d4 10 00 00 20 00 00 00 	endle	r1, 32
+ 198:	d4 10 00 00 40 00 00 00 	endle	r1, 64
diff --git a/gas/testsuite/gas/bpf/arith.s b/gas/testsuite/gas/bpf/arith.s
new file mode 100644
index 0000000..58bf2a5
--- /dev/null
+++ b/gas/testsuite/gas/bpf/arith.s
@@ -0,0 +1,53 @@
+	.text
+	add	r1, r2
+	add	r1, 5
+	add32	r1, r2
+	add32	r1, 5
+	sub	r1, r2
+	sub	r1, 5
+	sub32	r1, r2
+	sub32	r1, 5
+	and	r1, r2
+	and	r1, 0xff
+	and32	r1, r2
+	and32	r1, 0xff
+	or	r1, r2
+	or	r1, 0x80
+	or32	r1, r2
+	or32	r1, 0x80
+	xor	r1, r2
+	xor	r1, 0x1f
+	xor32	r1, r2
+	xor32	r1, 0x1f
+	mul	r1, r2
+	mul	r1, 5
+	mul32	r1, r2
+	mul32	r1, 5
+	div	r1, r2
+	div	r1, 2
+	div32	r1, r2
+	div32	r1, 2
+	mod	r1, r2
+	mod	r1, 3
+	mod32	r1, r2
+	mod32	r1, 3
+	lsh	r1, r2
+	lsh	r1, 1
+	lsh32	r1, r2
+	lsh32	r1, 1
+	rsh	r1, r2
+	rsh	r1, 1
+	rsh32	r1, r2
+	rsh32	r1, 1
+	arsh	r1, r2
+	arsh	r1, 4
+	arsh32	r1, r2
+	arsh32	r1, 4
+	neg	r1
+	neg32	r1
+	endbe	r1, 16
+	endbe	r1, 32
+	endbe	r1, 64
+	endle	r1, 16
+	endle	r1, 32
+	endle	r1, 64
diff --git a/gas/testsuite/gas/bpf/atomics.d b/gas/testsuite/gas/bpf/atomics.d
new file mode 100644
index 0000000..fc710d6
--- /dev/null
+++ b/gas/testsuite/gas/bpf/atomics.d
@@ -0,0 +1,12 @@
+#as: -EL
+#objdump: -dr
+#name: atomics
+
+.*: +file format elf64-bpfle
+
+Disassembly of section .text:
+
+0000000000000000 <.text>:
+   0:	b7 20 00 00 06 00 00 00 	mov	r2, 6
+   8:	db 12 00 00 00 00 00 00 	xadddw	\[r1\+0\], r2
+  10:	c3 12 08 00 00 00 00 00 	xaddw	\[r1\+8\], r2
diff --git a/gas/testsuite/gas/bpf/atomics.s b/gas/testsuite/gas/bpf/atomics.s
new file mode 100644
index 0000000..6552ef3
--- /dev/null
+++ b/gas/testsuite/gas/bpf/atomics.s
@@ -0,0 +1,4 @@
+	.text
+	mov	r2, 6
+	xadddw	[r1+0], r2
+	xaddw	[r1+8], r2
diff --git a/gas/testsuite/gas/bpf/bpf.exp b/gas/testsuite/gas/bpf/bpf.exp
new file mode 100644
index 0000000..363fd2c
--- /dev/null
+++ b/gas/testsuite/gas/bpf/bpf.exp
@@ -0,0 +1,28 @@
+# Copyright (C) 2017 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.  
+
+# BPF assembler testsuite
+
+if [istarget bpf*-*-*] {
+    run_dump_test "arith"
+    run_dump_test "jump"
+    run_dump_test "move"
+    run_dump_test "loads"
+    run_dump_test "stores"
+    run_dump_test "atomics"
+    run_dump_test "call"
+    run_dump_test "imm64"
+}
diff --git a/gas/testsuite/gas/bpf/call.d b/gas/testsuite/gas/bpf/call.d
new file mode 100644
index 0000000..e142050
--- /dev/null
+++ b/gas/testsuite/gas/bpf/call.d
@@ -0,0 +1,18 @@
+#as: -EL
+#objdump: -dr
+#name: call
+
+.*: +file format elf64-bpfle
+
+Disassembly of section .text:
+
+0000000000000000 <.text>:
+   0:	85 00 00 00 01 00 00 00 	call	0x1
+   8:	85 00 00 00 02 00 00 00 	call	0x2
+  10:	85 00 00 00 03 00 00 00 	call	0x3
+  18:	85 00 00 00 04 00 00 00 	call	0x4
+  20:	8d 00 00 00 05 00 00 00 	tailcall	0x5
+  28:	8d 00 00 00 06 00 00 00 	tailcall	0x6
+  30:	8d 00 00 00 07 00 00 00 	tailcall	0x7
+  38:	8d 00 00 00 09 00 00 00 	tailcall	0x9
+  40:	95 00 00 00 00 00 00 00 	exit	
diff --git a/gas/testsuite/gas/bpf/call.s b/gas/testsuite/gas/bpf/call.s
new file mode 100644
index 0000000..6fdc4f2
--- /dev/null
+++ b/gas/testsuite/gas/bpf/call.s
@@ -0,0 +1,10 @@
+	.text
+	call	1
+	call	2
+	call	3
+	call	4
+	tailcall 5
+	tailcall 6
+	tailcall 7
+	tailcall 9
+	exit
diff --git a/gas/testsuite/gas/bpf/imm64.d b/gas/testsuite/gas/bpf/imm64.d
new file mode 100644
index 0000000..4dcaf7b
--- /dev/null
+++ b/gas/testsuite/gas/bpf/imm64.d
@@ -0,0 +1,30 @@
+#as: -EL
+#objdump: -dr
+#name: imm64
+
+.*: +file format elf64-bpfle
+
+Disassembly of section .text:
+
+0000000000000000 <.text>:
+   0:	18 10 00 00 01 00 00 00 	ldimm64	r1, 1
+   8:	00 00 00 00 00 00 00 00 
+  10:	18 10 00 00 02 00 00 00 	ldimm64	r1, 2
+  18:	00 00 00 00 00 00 00 00 
+  20:	18 10 00 00 00 00 01 00 	ldimm64	r1, 65536
+  28:	00 00 00 00 00 00 00 00 
+  30:	18 10 00 00 ff ff ff ff 	ldimm64	r1, 4294967295
+  38:	00 00 00 00 00 00 00 00 
+  40:	18 10 00 00 01 00 00 00 	ldimm64	r1, -4294967295
+  48:	00 00 00 00 ff ff ff ff 
+  50:	18 10 00 00 ff ff ff ff 	ldimm64	r1, -1
+  58:	00 00 00 00 ff ff ff ff 
+  60:	18 20 00 00 00 ff ff ff 	ldimm64	r2, -256
+  68:	00 00 00 00 ff ff ff ff 
+  70:	18 30 00 00 00 00 ff ff 	ldimm64	r3, -65536
+  78:	00 00 00 00 ff ff ff ff 
+  80:	18 40 00 00 00 00 00 00 	ldimm64	r4, 4294967296
+  88:	00 00 00 00 01 00 00 00 
+  90:	18 50 00 00 00 00 00 00 	ldimm64	r5, -9223372036854775808
+  98:	00 00 00 00 00 00 00 80 
+  a0:	95 00 00 00 00 00 00 00 	exit	
diff --git a/gas/testsuite/gas/bpf/imm64.s b/gas/testsuite/gas/bpf/imm64.s
new file mode 100644
index 0000000..929e357
--- /dev/null
+++ b/gas/testsuite/gas/bpf/imm64.s
@@ -0,0 +1,12 @@
+	.text
+	ldimm64	r1, 1
+	ldimm64	r1, 2
+	ldimm64	r1, 65536
+	ldimm64	r1, 4294967295
+	ldimm64	r1, -4294967295
+	ldimm64	r1, -1
+	ldimm64	r2, -256
+	ldimm64	r3, -65536
+	ldimm64	r4, 4294967296
+	ldimm64 r5, -9223372036854775808
+	exit
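
Note on the imm64 test: a minimal sketch, not part of the patch, of how the expected bytes in imm64.d follow from the 'D' operand handling in md_assemble and from output_insn. It assumes the -EL layout shown in the dump: opcode byte 0x18, the destination register in the high nibble of byte 1, the low 32 bits of the value in bytes 4-7, bytes 8-11 zero, and the high 32 bits in bytes 12-15. The names bpf_sketch_ldimm64_le and put_le32 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store V at P as 4 little-endian bytes.  */
static void
put_le32 (unsigned char *p, uint32_t v)
{
  p[0] = (unsigned char) (v & 0xff);
  p[1] = (unsigned char) ((v >> 8) & 0xff);
  p[2] = (unsigned char) ((v >> 16) & 0xff);
  p[3] = (unsigned char) ((v >> 24) & 0xff);
}

/* Hypothetical sketch of the 16-byte ldimm64 encoding with -EL.  */
void
bpf_sketch_ldimm64_le (unsigned char out[16], unsigned int reg,
		       int64_t value)
{
  uint64_t v = (uint64_t) value;

  assert (reg <= 10);

  memset (out, 0, 16);
  out[0] = 0x18;			/* Opcode byte as shown in imm64.d.  */
  out[1] = (unsigned char) (reg << 4);	/* Register in the high nibble.  */
  put_le32 (out + 4, (uint32_t) (v & 0xffffffff));	/* Low half.  */
  put_le32 (out + 12, (uint32_t) (v >> 32));		/* High half.  */
}
```

For the value -4294967295 (0xffffffff00000001) the low half is 1 and the high half is 0xffffffff, which gives the two rows "18 10 00 00 01 00 00 00" and "00 00 00 00 ff ff ff ff" expected at offsets 0x40 and 0x48.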
diff --git a/gas/testsuite/gas/bpf/jump.d b/gas/testsuite/gas/bpf/jump.d
new file mode 100644
index 0000000..fc1e6bd
--- /dev/null
+++ b/gas/testsuite/gas/bpf/jump.d
@@ -0,0 +1,43 @@
+#as: -EL
+#objdump: -dr
+#name: jump
+
+.*: +file format elf64-bpfle
+
+Disassembly of section .text:
+
+0000000000000000 <.text>:
+   0:	05 00 03 00 00 00 00 00 	ja	0x10
+   8:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  10:	b7 20 00 00 03 00 00 00 	mov	r2, 3
+  18:	25 20 06 00 02 00 00 00 	jgt	r2, 2, 0x40
+  20:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  28:	b7 30 00 00 03 00 00 00 	mov	r3, 3
+  30:	15 30 03 00 03 00 00 00 	jeq	r3, 3, 0x40
+  38:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  40:	1d 32 03 00 00 00 00 00 	jeq	r3, r2, 0x50
+  48:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  50:	b7 40 00 00 04 00 00 00 	mov	r4, 4
+  58:	2d 43 03 00 00 00 00 00 	jgt	r4, r3, 0x68
+  60:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  68:	3d 43 03 00 00 00 00 00 	jge	r4, r3, 0x78
+  70:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  78:	35 30 03 00 03 00 00 00 	jge	r3, 3, 0x88
+  80:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  88:	5d 43 03 00 00 00 00 00 	jne	r4, r3, 0x98
+  90:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  98:	55 30 03 00 03 00 00 00 	jne	r3, 3, 0xa8
+  a0:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  a8:	6d 43 03 00 00 00 00 00 	jsgt	r4, r3, 0xb8
+  b0:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  b8:	65 30 03 00 03 00 00 00 	jsgt	r3, 3, 0xc8
+  c0:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  c8:	7d 43 03 00 00 00 00 00 	jsge	r4, r3, 0xd8
+  d0:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  d8:	75 30 03 00 03 00 00 00 	jsge	r3, 3, 0xe8
+  e0:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  e8:	4d 43 03 00 00 00 00 00 	jset	r4, r3, 0xf8
+  f0:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+  f8:	45 30 03 00 03 00 00 00 	jset	r3, 3, 0x108
+ 100:	bf 11 00 00 00 00 00 00 	mov	r1, r1
+ 108:	95 00 00 00 00 00 00 00 	exit	
diff --git a/gas/testsuite/gas/bpf/jump.s b/gas/testsuite/gas/bpf/jump.s
new file mode 100644
index 0000000..4e084b4
--- /dev/null
+++ b/gas/testsuite/gas/bpf/jump.s
@@ -0,0 +1,35 @@
+	.text
+	ja	1f
+	mov	r1, r1
+1:	mov	r2, 3
+	jgt	r2, 2, 1f
+	mov	r1, r1
+	mov	r3, 3
+	jeq	r3, 3, 1f
+	mov	r1, r1
+1:	jeq	r3, r2, 1f
+	mov	r1, r1
+1:	mov	r4, 4
+	jgt	r4, r3, 1f
+	mov	r1, r1
+1:	jge	r4, r3, 1f
+	mov	r1, r1
+1:	jge	r3, 3, 1f
+	mov	r1, r1
+1:	jne	r4, r3, 1f
+	mov	r1, r1
+1:	jne	r3, 3, 1f
+	mov	r1, r1
+1:	jsgt	r4, r3, 1f
+	mov	r1, r1
+1:	jsgt	r3, 3, 1f
+	mov	r1, r1
+1:	jsge	r4, r3, 1f
+	mov	r1, r1
+1:	jsge	r3, 3, 1f
+	mov	r1, r1
+1:	jset	r4, r3, 1f
+	mov	r1, r1
+1:	jset	r3, 3, 1f
+	mov	r1, r1
+1:	exit
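
Note on the jump test: a minimal sketch, not part of the patch, of how the 16-bit branch field in jump.d follows from the BFD_RELOC_BPF_WDISP16 case in md_apply_fix, assuming md_pcrel_from returns the address of the branch instruction itself as in the code above. The function name bpf_sketch_wdisp16 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: compute the branch field the way the patch's
   md_apply_fix does for BFD_RELOC_BPF_WDISP16, i.e. the byte distance
   from the branch to its target, shifted right by 3, plus 1.  */
uint16_t
bpf_sketch_wdisp16 (uint64_t insn_addr, uint64_t target_addr)
{
  int64_t delta;

  /* Instructions are 8 bytes, so both addresses are 8-byte aligned.  */
  assert ((insn_addr & 7) == 0 && (target_addr & 7) == 0);

  delta = (int64_t) (target_addr - insn_addr);
  /* Right-shifting a negative value is implementation-defined in C;
     the patch applies the same shift to a signed offsetT.  */
  return (uint16_t) (((delta >> 3) + 1) & 0xffff);
}
```

For the first line of the dump (ja at 0x0 targeting 0x10) this gives 3, and for jgt at 0x18 targeting 0x40 it gives 6, matching the "03 00" and "06 00" offset bytes expected by jump.d.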
diff --git a/gas/testsuite/gas/bpf/loads.d b/gas/testsuite/gas/bpf/loads.d
new file mode 100644
index 0000000..d981ef6
--- /dev/null
+++ b/gas/testsuite/gas/bpf/loads.d
@@ -0,0 +1,23 @@
+#as: -EL
+#objdump: -dr
+#name: loads
+
+.*: +file format elf64-bpfle
+
+Disassembly of section .text:
+
+0000000000000000 <.text>:
+   0:	71 12 03 00 00 00 00 00 	ldb	r1, \[r2\+3\]
+   8:	69 12 02 00 00 00 00 00 	ldh	r1, \[r2\+2\]
+  10:	61 12 04 00 00 00 00 00 	ldw	r1, \[r2\+4\]
+  18:	79 12 08 00 00 00 00 00 	lddw	r1, \[r2\+8\]
+  20:	61 34 04 00 00 00 00 00 	ldw	r3, \[r4\+4\]
+  28:	61 44 08 00 00 00 00 00 	ldw	r4, \[r4\+8\]
+  30:	61 54 00 00 00 00 00 00 	ldw	r5, \[r4\+0\]
+  38:	69 33 02 00 00 00 00 00 	ldh	r3, \[r3\+2\]
+  40:	69 43 04 00 00 00 00 00 	ldh	r4, \[r3\+4\]
+  48:	69 53 00 00 00 00 00 00 	ldh	r5, \[r3\+0\]
+  50:	71 33 01 00 00 00 00 00 	ldb	r3, \[r3\+1\]
+  58:	71 43 02 00 00 00 00 00 	ldb	r4, \[r3\+2\]
+  60:	71 53 03 00 00 00 00 00 	ldb	r5, \[r3\+3\]
+  68:	71 63 00 00 00 00 00 00 	ldb	r6, \[r3\+0\]
diff --git a/gas/testsuite/gas/bpf/loads.s b/gas/testsuite/gas/bpf/loads.s
new file mode 100644
index 0000000..8602897
--- /dev/null
+++ b/gas/testsuite/gas/bpf/loads.s
@@ -0,0 +1,15 @@
+	.text
+	ldb	r1, [r2+3]
+	ldh	r1, [r2+2]
+	ldw	r1, [r2+4]
+	lddw	r1, [r2+8]
+	ldw	r3, [r4+4]
+	ldw	r4, [r4+8]
+	ldw	r5, [r4+0]
+	ldh	r3, [r3+2]
+	ldh	r4, [r3+4]
+	ldh	r5, [r3+0]
+	ldb	r3, [r3+1]
+	ldb	r4, [r3+2]
+	ldb	r5, [r3+3]
+	ldb	r6, [r3+0]
diff --git a/gas/testsuite/gas/bpf/move.d b/gas/testsuite/gas/bpf/move.d
new file mode 100644
index 0000000..f15ad23
--- /dev/null
+++ b/gas/testsuite/gas/bpf/move.d
@@ -0,0 +1,19 @@
+#as: -EL
+#objdump: -dr
+#name: move
+
+.*: +file format elf64-bpfle
+
+Disassembly of section .text:
+
+0000000000000000 <.text>:
+   0:	bf 12 00 00 00 00 00 00 	mov	r1, r2
+   8:	b7 10 00 00 ef 00 00 00 	mov	r1, 239
+  10:	bc 12 00 00 00 00 00 00 	mov32	r1, r2
+  18:	b4 10 00 00 ef 00 00 00 	mov32	r1, 239
+  20:	bf 36 00 00 00 00 00 00 	mov	r3, r6
+  28:	bf 63 00 00 00 00 00 00 	mov	r6, r3
+  30:	bf 89 00 00 00 00 00 00 	mov	r8, r9
+  38:	bf a1 00 00 00 00 00 00 	mov	r10, r1
+  40:	bf 73 00 00 00 00 00 00 	mov	r7, r3
+  48:	b7 50 00 00 02 00 00 00 	mov	r5, 2
diff --git a/gas/testsuite/gas/bpf/move.s b/gas/testsuite/gas/bpf/move.s
new file mode 100644
index 0000000..36797b3
--- /dev/null
+++ b/gas/testsuite/gas/bpf/move.s
@@ -0,0 +1,11 @@
+	.text
+	mov	r1, r2
+	mov	r1, 0xef
+	mov32	r1, r2
+	mov32	r1, 0xef
+	mov	r3, r6
+	mov	r6, r3
+	mov	r8, r9
+	mov	r10, r1
+	mov	r7, r3
+	mov	r5, 2
diff --git a/gas/testsuite/gas/bpf/stores.d b/gas/testsuite/gas/bpf/stores.d
new file mode 100644
index 0000000..0f416e0
--- /dev/null
+++ b/gas/testsuite/gas/bpf/stores.d
@@ -0,0 +1,17 @@
+#as: -EL
+#objdump: -dr
+#name: stores
+
+.*: +file format elf64-bpfle
+
+Disassembly of section .text:
+
+0000000000000000 <.text>:
+   0:	63 12 00 00 00 00 00 00 	stw	\[r1\+0\], r2
+   8:	62 10 04 00 00 00 00 00 	stw	\[r1\+4\], 0
+  10:	6b 13 00 00 00 00 00 00 	sth	\[r1\+0\], r3
+  18:	6a 10 02 00 01 00 00 00 	sth	\[r1\+2\], 1
+  20:	73 14 00 00 00 00 00 00 	stb	\[r1\+0\], r4
+  28:	72 10 02 00 02 00 00 00 	stb	\[r1\+2\], 2
+  30:	7b 15 08 00 00 00 00 00 	stdw	\[r1\+8\], r5
+  38:	7a 10 10 00 10 00 00 00 	stdw	\[r1\+16\], 16
diff --git a/gas/testsuite/gas/bpf/stores.s b/gas/testsuite/gas/bpf/stores.s
new file mode 100644
index 0000000..d164f2a
--- /dev/null
+++ b/gas/testsuite/gas/bpf/stores.s
@@ -0,0 +1,9 @@
+	.text
+	stw	[r1+0], r2
+	stw	[r1+4], 0
+	sth	[r1+0], r3
+	sth	[r1+2], 1
+	stb	[r1+0], r4
+	stb	[r1+2], 2
+	stdw	[r1+8], r5
+	stdw	[r1+16], 16
diff --git a/gdb/bpf-tdep.c b/gdb/bpf-tdep.c
new file mode 100644
index 0000000..6629f73
--- /dev/null
+++ b/gdb/bpf-tdep.c
@@ -0,0 +1,229 @@
+/* Target-dependent code for eBPF, for GDB.
+
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "inferior.h"
+#include "gdbcore.h"
+#include "arch-utils.h"
+#include "regcache.h"
+#include "frame.h"
+#include "frame-unwind.h"
+#include "frame-base.h"
+#include "trad-frame.h"
+#include "dis-asm.h"
+#include "dwarf2-frame.h"
+#include "symtab.h"
+#include "elf-bfd.h"
+#include "osabi.h"
+#include "infcall.h"
+#include "bpf-tdep.h"
+
+static const char * const bpf_register_name_strings[] =
+{
+  "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
+  "r8", "r9", "r10", "pc",
+};
+
+#define NUM_BPF_REGNAMES ARRAY_SIZE (bpf_register_name_strings)
+
+/* Return the BPF register name corresponding to register I.  */
+
+static const char *
+bpf_register_name (struct gdbarch *gdbarch, int i)
+{
+  return bpf_register_name_strings[i];
+}
+
+/* Return the GDB type object for the "standard" data type of data in
+   register N.  */
+
+static struct type *
+bpf_register_type (struct gdbarch *gdbarch, int regnum)
+{
+  if (regnum == BPF_R10_REGNUM)
+    return builtin_type (gdbarch)->builtin_data_ptr;
+
+  if (regnum == BPF_PC_REGNUM)
+    return builtin_type (gdbarch)->builtin_func_ptr;
+
+  return builtin_type (gdbarch)->builtin_int64;
+}
+
+/* Convert DWARF2 register number REG to the appropriate register number
+   used by GDB.  */
+
+static int
+bpf_reg_to_regnum (struct gdbarch *gdbarch, int reg)
+{
+  if (reg < 0 || reg >= BPF_NUM_REGS)
+    return -1;
+
+  return reg;
+}
+
+static struct frame_id
+bpf_dummy_id (struct gdbarch *gdbarch, struct frame_info *this_frame)
+{
+  CORE_ADDR sp;
+
+  sp = get_frame_register_unsigned (this_frame, BPF_R10_REGNUM);
+
+  return frame_id_build (sp, get_frame_pc (this_frame));
+}
+
+static CORE_ADDR
+bpf_push_dummy_call (struct gdbarch *gdbarch,
+		      struct value *function,
+		      struct regcache *regcache,
+		      CORE_ADDR bp_addr,
+		      int nargs,
+		      struct value **args,
+		      CORE_ADDR sp,
+		      int struct_return,
+		      CORE_ADDR struct_addr)
+{
+  return sp; /* XXX */
+}
+
+/* Extract a function return value of TYPE from REGCACHE, and copy
+   that into VALBUF.  */
+
+static void
+bpf_extract_return_value (struct type *type, struct regcache *regcache,
+			  gdb_byte *valbuf)
+{
+  int len = TYPE_LENGTH (type);
+  gdb_byte buf[8];
+
+  regcache_cooked_read (regcache, BPF_R0_REGNUM, buf);
+  memcpy (valbuf, buf + 8 - len, len);
+}
+
+/* Store the function return value of type TYPE from VALBUF into
+   REGCACHE.  */
+
+static void
+bpf_store_return_value (struct type *type, struct regcache *regcache,
+			const gdb_byte *valbuf)
+{
+  int len = TYPE_LENGTH (type);
+  gdb_byte buf[8];
+
+  memcpy (buf + 8 - len, valbuf, len);
+  regcache_cooked_write (regcache, BPF_R0_REGNUM, buf);
+}
+
+/* Determine, for architecture GDBARCH, how a return value of TYPE
+   should be returned.  If it is supposed to be returned in registers,
+   and READBUF is nonzero, read the appropriate value from REGCACHE,
+   and copy it into READBUF.  If WRITEBUF is nonzero, write the value
+   from WRITEBUF into REGCACHE.  */
+
+static enum return_value_convention
+bpf_return_value (struct gdbarch *gdbarch,
+		   struct value *function,
+		   struct type *type,
+		   struct regcache *regcache,
+		   gdb_byte *readbuf,
+		   const gdb_byte *writebuf)
+{
+  if (TYPE_LENGTH (type) > 8)
+    return RETURN_VALUE_STRUCT_CONVENTION;
+
+  if (readbuf)
+    bpf_extract_return_value (type, regcache, readbuf);
+
+  if (writebuf)
+    bpf_store_return_value (type, regcache, writebuf);
+
+  return RETURN_VALUE_REGISTER_CONVENTION;
+}
+
+static CORE_ADDR
+bpf_unwind_pc (struct gdbarch *gdbarch, struct frame_info *next_frame)
+{
+  return frame_unwind_register_unsigned (next_frame, BPF_PC_REGNUM);
+}
+
+/* Skip all the insns that appear in generated function prologues.  */
+
+static CORE_ADDR
+bpf_skip_prologue (struct gdbarch *gdbarch, CORE_ADDR pc)
+{
+  return pc;
+}
+
+/* Implement the breakpoint_kind_from_pc gdbarch method.  */
+
+static int
+bpf_breakpoint_kind_from_pc (struct gdbarch *gdbarch, CORE_ADDR *pcptr)
+{
+  return 8;
+}
+
+/* Initialize the current architecture based on INFO.  If possible,
+   re-use an architecture from ARCHES, which is a list of
+   architectures already created during this debugging session.
+
+   Called e.g. at program startup, when reading a core file, and when
+   reading a binary file.  */
+
+static struct gdbarch *
+bpf_gdbarch_init (struct gdbarch_info info, struct gdbarch_list *arches)
+{
+  struct gdbarch_tdep *tdep;
+  struct gdbarch *gdbarch;
+
+  tdep = XNEW (struct gdbarch_tdep);
+  gdbarch = gdbarch_alloc (&info, tdep);
+  
+  tdep->xxx = 0;
+
+  set_gdbarch_num_regs (gdbarch, BPF_NUM_REGS);
+  set_gdbarch_sp_regnum (gdbarch, BPF_R10_REGNUM);
+  set_gdbarch_pc_regnum (gdbarch, BPF_PC_REGNUM);
+  set_gdbarch_dwarf2_reg_to_regnum (gdbarch, bpf_reg_to_regnum);
+  set_gdbarch_register_name (gdbarch, bpf_register_name);
+  set_gdbarch_register_type (gdbarch, bpf_register_type);
+  set_gdbarch_dummy_id (gdbarch, bpf_dummy_id);
+  set_gdbarch_push_dummy_call (gdbarch, bpf_push_dummy_call);
+  set_gdbarch_return_value (gdbarch, bpf_return_value);
+  set_gdbarch_inner_than (gdbarch, core_addr_lessthan);
+  set_gdbarch_frame_args_skip (gdbarch, 8);
+  set_gdbarch_unwind_pc (gdbarch, bpf_unwind_pc);
+  set_gdbarch_print_insn (gdbarch, print_insn_bpf);
+
+  set_gdbarch_skip_prologue (gdbarch, bpf_skip_prologue);
+  set_gdbarch_breakpoint_kind_from_pc (gdbarch, bpf_breakpoint_kind_from_pc);
+
+  /* Hook in ABI-specific overrides, if they have been registered.  */
+  gdbarch_init_osabi (info, gdbarch);
+
+  dwarf2_append_unwinders (gdbarch);
+  return gdbarch;
+}
+
+/* Provide a prototype to silence -Wmissing-prototypes.  */
+extern initialize_file_ftype _initialize_bpf_tdep;
+
+void
+_initialize_bpf_tdep (void)
+{
+  register_gdbarch_init (bfd_arch_bpf, bpf_gdbarch_init);
+}
diff --git a/gdb/bpf-tdep.h b/gdb/bpf-tdep.h
new file mode 100644
index 0000000..52cae6d
--- /dev/null
+++ b/gdb/bpf-tdep.h
@@ -0,0 +1,40 @@
+/* Target-dependent code for eBPF, for GDB.
+
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+enum gdb_regnum {
+  BPF_R0_REGNUM = 0,
+  BPF_R1_REGNUM,
+  BPF_R2_REGNUM,
+  BPF_R3_REGNUM,
+  BPF_R4_REGNUM,
+  BPF_R5_REGNUM,
+  BPF_R6_REGNUM,
+  BPF_R7_REGNUM,
+  BPF_R8_REGNUM,
+  BPF_R9_REGNUM,
+  BPF_R10_REGNUM,
+  BPF_PC_REGNUM,
+};
+
+#define BPF_NUM_REGS	(BPF_PC_REGNUM + 1)
+
+struct gdbarch_tdep
+{
+  int xxx;
+};
diff --git a/gdb/configure.tgt b/gdb/configure.tgt
index fdcb7b1..e8d5fb4 100644
--- a/gdb/configure.tgt
+++ b/gdb/configure.tgt
@@ -142,6 +142,10 @@ bfin-*-*)
 	gdb_sim=../sim/bfin/libsim.a
 	;;
 
+bpf*)
+	# Target: eBPF
+	gdb_target_obs="bpf-tdep.o"
+	;;
 cris*)
 	# Target: CRIS
 	gdb_target_obs="cris-tdep.o cris-linux-tdep.o linux-tdep.o solib-svr4.o"
diff --git a/include/dis-asm.h b/include/dis-asm.h
index 6f1801d..cbfebc8 100644
--- a/include/dis-asm.h
+++ b/include/dis-asm.h
@@ -241,6 +241,7 @@ extern int print_insn_aarch64		(bfd_vma, disassemble_info *);
 extern int print_insn_alpha		(bfd_vma, disassemble_info *);
 extern int print_insn_avr		(bfd_vma, disassemble_info *);
 extern int print_insn_bfin		(bfd_vma, disassemble_info *);
+extern int print_insn_bpf		(bfd_vma, disassemble_info *);
 extern int print_insn_big_arm		(bfd_vma, disassemble_info *);
 extern int print_insn_big_mips		(bfd_vma, disassemble_info *);
 extern int print_insn_big_nios2		(bfd_vma, disassemble_info *);
diff --git a/include/elf/bpf.h b/include/elf/bpf.h
new file mode 100644
index 0000000..3a84d9a
--- /dev/null
+++ b/include/elf/bpf.h
@@ -0,0 +1,35 @@
+/* BPF ELF support for BFD.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of BFD, the Binary File Descriptor library.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+#ifndef _ELF_BPF_H
+#define _ELF_BPF_H
+
+#include "elf/reloc-macros.h"
+
+/* Relocation types.  */
+START_RELOC_NUMBERS (elf_bpf_reloc_type)
+  RELOC_NUMBER (R_BPF_NONE, 0)
+  RELOC_NUMBER (R_BPF_16, 1)
+  RELOC_NUMBER (R_BPF_32, 2)
+  RELOC_NUMBER (R_BPF_64, 3)
+  RELOC_NUMBER (R_BPF_WDISP16, 4)
+END_RELOC_NUMBERS (R_BPF_max)
+
+#endif /* _ELF_BPF_H */
diff --git a/include/opcode/bpf.h b/include/opcode/bpf.h
new file mode 100644
index 0000000..298ed1b
--- /dev/null
+++ b/include/opcode/bpf.h
@@ -0,0 +1,16 @@
+#ifndef OPCODE_BPF_H
+#define OPCODE_BPF_H
+
+/* Structure of an opcode table entry.  */
+
+typedef struct bpf_opcode
+{
+  const char *name;
+  unsigned char code;
+  const char *args;
+} bpf_opcode;
+
+extern const struct bpf_opcode bpf_opcodes[];
+extern const int bpf_num_opcodes;
+
+#endif /* OPCODE_BPF_H */
diff --git a/ld/Makefile.am b/ld/Makefile.am
index 3aa7e80..d840bed 100644
--- a/ld/Makefile.am
+++ b/ld/Makefile.am
@@ -477,6 +477,7 @@ ALL_64_EMULATION_SOURCES = \
 	eelf32ltsmipn32_fbsd.c \
 	eelf32mipswindiss.c \
 	eelf64_aix.c \
+	eelf64_bpf.c \
 	eelf64_ia64.c \
 	eelf64_ia64_fbsd.c \
 	eelf64_ia64_vms.c \
@@ -1920,6 +1921,9 @@ eelf32_x86_64_nacl.c: $(srcdir)/emulparams/elf32_x86_64_nacl.sh \
 eelf64_aix.c: $(srcdir)/emulparams/elf64_aix.sh \
   $(ELF_DEPS) $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
 
+eelf64_bpf.c: $(srcdir)/emulparams/elf64_bpf.sh \
+  $(ELF_DEPS) $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
+
 eelf64_ia64.c: $(srcdir)/emulparams/elf64_ia64.sh \
   $(ELF_DEPS) $(srcdir)/emultempl/ia64elf.em \
   $(srcdir)/emultempl/needrelax.em \
diff --git a/ld/Makefile.in b/ld/Makefile.in
index f485f4f..706a889 100644
--- a/ld/Makefile.in
+++ b/ld/Makefile.in
@@ -845,6 +845,7 @@ ALL_64_EMULATION_SOURCES = \
 	eelf32ltsmipn32_fbsd.c \
 	eelf32mipswindiss.c \
 	eelf64_aix.c \
+	eelf64_bpf.c \
 	eelf64_ia64.c \
 	eelf64_ia64_fbsd.c \
 	eelf64_ia64_vms.c \
@@ -1292,6 +1293,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf32xstormy16.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf32xtensa.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf64_aix.Po@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf64_bpf.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf64_ia64.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf64_ia64_fbsd.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf64_ia64_vms.Po@am__quote@
@@ -3484,6 +3486,9 @@ eelf32_x86_64_nacl.c: $(srcdir)/emulparams/elf32_x86_64_nacl.sh \
 eelf64_aix.c: $(srcdir)/emulparams/elf64_aix.sh \
   $(ELF_DEPS) $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
 
+eelf64_bpf.c: $(srcdir)/emulparams/elf64_bpf.sh \
+  $(ELF_DEPS) $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
+
 eelf64_ia64.c: $(srcdir)/emulparams/elf64_ia64.sh \
   $(ELF_DEPS) $(srcdir)/emultempl/ia64elf.em \
   $(srcdir)/emultempl/needrelax.em \
diff --git a/ld/configure.tgt b/ld/configure.tgt
index 895f0fb..13645f5 100644
--- a/ld/configure.tgt
+++ b/ld/configure.tgt
@@ -177,6 +177,8 @@ bfin-*-linux-uclibc*)	targ_emul=elf32bfinfd;
 			targ_extra_emuls="elf32bfin"
 			targ_extra_libpath=$targ_extra_emuls
 			;;
+bpf-*-elf)		targ_emul=elf64_bpf
+			;;
 cr16-*-elf*)            targ_emul=elf32cr16 ;;
 cr16c-*-elf*)           targ_emul=elf32cr16c
 			;;
diff --git a/ld/emulparams/elf64_bpf.sh b/ld/emulparams/elf64_bpf.sh
new file mode 100644
index 0000000..0e1e549
--- /dev/null
+++ b/ld/emulparams/elf64_bpf.sh
@@ -0,0 +1,8 @@
+# See genscripts.sh and ../scripttempl/elf.sc for the meaning of these.
+SCRIPT_NAME=elf
+ELFSIZE=64
+TEMPLATE_NAME=elf32
+OUTPUT_FORMAT="elf64-bpf"
+TARGET_PAGE_SIZE=0x1000
+ARCH=bpf
+MACHINE=
diff --git a/opcodes/Makefile.am b/opcodes/Makefile.am
index 1ac6bb1..ccc9453 100644
--- a/opcodes/Makefile.am
+++ b/opcodes/Makefile.am
@@ -105,6 +105,8 @@ TARGET_LIBOPCODES_CFILES = \
 	arm-dis.c \
 	avr-dis.c \
 	bfin-dis.c \
+	bpf-dis.c \
+	bpf-opc.c \
 	cgen-asm.c \
 	cgen-bitset.c \
 	cgen-dis.c \
diff --git a/opcodes/bpf-dis.c b/opcodes/bpf-dis.c
new file mode 100644
index 0000000..2a0b7da
--- /dev/null
+++ b/opcodes/bpf-dis.c
@@ -0,0 +1,152 @@
+#include "sysdep.h"
+#include <stdio.h>
+#include "opcode/bpf.h"
+#include "dis-asm.h"
+#include "libiberty.h"
+
+#define HASH_SIZE 256
+#define HASH_INSN(CODE)	(CODE)
+
+typedef struct bpf_opcode_hash
+{
+  struct bpf_opcode_hash *next;
+  const bpf_opcode *opcode;
+} bpf_opcode_hash;
+
+static bpf_opcode_hash *opcode_hash_table[HASH_SIZE];
+
+static void
+build_hash_table (const bpf_opcode *opcode_table,
+		  bpf_opcode_hash **hash_table,
+		  int num_opcodes)
+{
+  static bpf_opcode_hash *hash_buf = NULL;
+  int i;
+
+  memset (hash_table, 0, HASH_SIZE * sizeof (hash_table[0]));
+  if (hash_buf != NULL)
+    free (hash_buf);
+  hash_buf = xmalloc (sizeof (* hash_buf) * num_opcodes);
+  for (i = num_opcodes - 1; i >= 0; --i)
+    {
+      int hash = HASH_INSN (opcode_table[i].code);
+      bpf_opcode_hash *h = &hash_buf[i];
+
+      h->next = hash_table[hash];
+      h->opcode = &opcode_table[i];
+      hash_table[hash] = h;
+    }
+}
+
+int
+print_insn_bpf (bfd_vma memaddr, disassemble_info *info)
+{
+  static unsigned long current_mach = 0;
+  static int opcodes_initialized = 0;
+  bfd_vma (*getword) (const void *);
+  bfd_vma (*gethalf) (const void *);
+  FILE *stream = info->stream;
+  bpf_opcode_hash *op;
+  int code, dest, src;
+  bfd_byte buffer[8];
+  unsigned short off;
+  int status, ret;
+  signed int imm;
+
+  if (!opcodes_initialized
+      || info->mach != current_mach)
+    {
+      build_hash_table (bpf_opcodes, opcode_hash_table, bpf_num_opcodes);
+      current_mach = info->mach;
+      opcodes_initialized = 1;
+    }
+
+  info->bytes_per_line = 8;
+
+  status = (*info->read_memory_func) (memaddr, buffer, sizeof (buffer), info);
+  if (status != 0)
+    {
+      (*info->memory_error_func) (status, memaddr, info);
+      return -1;
+    }
+
+  if (info->endian == BFD_ENDIAN_BIG)
+    {
+      getword = bfd_getb32;
+      gethalf = bfd_getb16;
+    }
+  else
+    {
+      getword = bfd_getl32;
+      gethalf = bfd_getl16;
+    }
+
+  code = buffer[0];
+  dest = (buffer[1] & 0xf0) >> 4;
+  src = buffer[1] & 0x0f;
+  off = gethalf (&buffer[2]);
+  imm = getword (&buffer[4]);
+
+  ret = sizeof (buffer);
+  for (op = opcode_hash_table[HASH_INSN (code)]; op; op = op->next)
+    {
+      const bpf_opcode *opcode = op->opcode;
+      BFD_HOST_U_64_BIT value;
+      signed int imm2;
+      const char *s;
+
+      if (opcode->code != code)
+	continue;
+
+      (*info->fprintf_func) (stream, "%s\t", opcode->name);
+      for (s = opcode->args; *s != '\0'; s++)
+	{
+	  switch (*s)
+	    {
+	    case '+':
+	    default:
+	      (*info->fprintf_func) (stream, "%c", *s);
+	      break;
+	    case ',':
+	      (*info->fprintf_func) (stream, ", ");
+	      break;
+	    case '1':
+	      (*info->fprintf_func) (stream, "r%d", dest);
+	      break;
+	    case '2':
+	      (*info->fprintf_func) (stream, "r%d", src);
+	      break;
+	    case 'i':
+	      (*info->fprintf_func) (stream, "%d", imm);
+	      break;
+	    case 'O':
+	      (*info->fprintf_func) (stream, "%d", off);
+	      break;
+	    case 'L':
+	      info->target = memaddr + ((off - 1) * 8);
+	      (*info->print_address_func) (info->target, info);
+	      break;
+	    case 'C':
+	      info->target = imm;
+	      (*info->print_address_func) (info->target, info);
+	      break;
+	    case 'D':
+	      status = (*info->read_memory_func) (memaddr + 8, buffer,
+						  sizeof (buffer), info);
+	      if (status != 0)
+		{
+		  (*info->memory_error_func) (status, memaddr + 8, info);
+		  return -1;
+		}
+	      ret += sizeof (buffer);
+	      imm2 = getword (&buffer[4]);
+	      value = ((BFD_HOST_U_64_BIT) (unsigned) imm2) << 32;
+	      value |= (BFD_HOST_U_64_BIT) (unsigned) imm;
+	      (*info->fprintf_func) (stream, "%lld", (long long) value);
+	      break;
+	    }
+	}
+    }
+
+  return ret;
+}
diff --git a/opcodes/bpf-opc.c b/opcodes/bpf-opc.c
new file mode 100644
index 0000000..bca8e47
--- /dev/null
+++ b/opcodes/bpf-opc.c
@@ -0,0 +1,147 @@
+#include "sysdep.h"
+#include <stdio.h>
+#include "opcode/bpf.h"
+
+#define BPF_OPC_ALU64	0x07
+#define BPF_OPC_DW	0x18
+#define BPF_OPC_XADD	0xc0
+#define BPF_OPC_MOV	0xb0
+#define BPF_OPC_ARSH	0xc0
+#define BPF_OPC_END	0xd0
+#define BPF_OPC_TO_LE	0x00
+#define BPF_OPC_TO_BE	0x08
+#define BPF_OPC_JNE	0x50
+#define BPF_OPC_JSGT	0x60
+#define BPF_OPC_JSGE	0x70
+#define BPF_OPC_CALL	0x80
+#define BPF_OPC_EXIT	0x90
+
+#define BPF_OPC_LD	0x00
+#define BPF_OPC_LDX	0x01
+#define BPF_OPC_ST	0x02
+#define BPF_OPC_STX	0x03
+#define BPF_OPC_ALU	0x04
+#define BPF_OPC_JMP	0x05
+#define BPF_OPC_RET	0x06
+#define BPF_OPC_MISC	0x07
+
+#define BPF_OPC_W	0x00
+#define BPF_OPC_H	0x08
+#define BPF_OPC_B	0x10
+
+#define BPF_OPC_IMM	0x00
+#define BPF_OPC_ABS	0x20
+#define BPF_OPC_IND	0x40
+#define BPF_OPC_MEM	0x60
+#define BPF_OPC_LEN	0x80
+#define BPF_OPC_MSH	0xa0
+
+#define BPF_OPC_ADD	0x00
+#define BPF_OPC_SUB	0x10
+#define BPF_OPC_MUL	0x20
+#define BPF_OPC_DIV	0x30
+#define BPF_OPC_OR	0x40
+#define BPF_OPC_AND	0x50
+#define BPF_OPC_LSH	0x60
+#define BPF_OPC_RSH	0x70
+#define BPF_OPC_NEG	0x80
+#define BPF_OPC_MOD	0x90
+#define BPF_OPC_XOR	0xa0
+
+#define BPF_OPC_JA	0x00
+#define BPF_OPC_JEQ	0x10
+#define BPF_OPC_JGT	0x20
+#define BPF_OPC_JGE	0x30
+#define BPF_OPC_JSET	0x40
+
+#define BPF_OPC_K	0x00
+#define BPF_OPC_X	0x08
+
+const struct bpf_opcode bpf_opcodes[] = {
+  { "mov32",   BPF_OPC_ALU   | BPF_OPC_MOV  | BPF_OPC_X,     "1,2" },
+  { "mov32",   BPF_OPC_ALU   | BPF_OPC_MOV  | BPF_OPC_K,     "1,i" },
+  { "mov",     BPF_OPC_ALU64 | BPF_OPC_MOV  | BPF_OPC_X,     "1,2" },
+  { "mov",     BPF_OPC_ALU64 | BPF_OPC_MOV  | BPF_OPC_K,     "1,i" },
+  { "add32",   BPF_OPC_ALU   | BPF_OPC_ADD  | BPF_OPC_X,     "1,2" },
+  { "add32",   BPF_OPC_ALU   | BPF_OPC_ADD  | BPF_OPC_K,     "1,i" },
+  { "add",     BPF_OPC_ALU64 | BPF_OPC_ADD  | BPF_OPC_X,     "1,2" },
+  { "add",     BPF_OPC_ALU64 | BPF_OPC_ADD  | BPF_OPC_K,     "1,i" },
+  { "sub32",   BPF_OPC_ALU   | BPF_OPC_SUB  | BPF_OPC_X,     "1,2" },
+  { "sub32",   BPF_OPC_ALU   | BPF_OPC_SUB  | BPF_OPC_K,     "1,i" },
+  { "sub",     BPF_OPC_ALU64 | BPF_OPC_SUB  | BPF_OPC_X,     "1,2" },
+  { "sub",     BPF_OPC_ALU64 | BPF_OPC_SUB  | BPF_OPC_K,     "1,i" },
+  { "and32",   BPF_OPC_ALU   | BPF_OPC_AND  | BPF_OPC_X,     "1,2" },
+  { "and32",   BPF_OPC_ALU   | BPF_OPC_AND  | BPF_OPC_K,     "1,i" },
+  { "and",     BPF_OPC_ALU64 | BPF_OPC_AND  | BPF_OPC_X,     "1,2" },
+  { "and",     BPF_OPC_ALU64 | BPF_OPC_AND  | BPF_OPC_K,     "1,i" },
+  { "or32",    BPF_OPC_ALU   | BPF_OPC_OR   | BPF_OPC_X,     "1,2" },
+  { "or32",    BPF_OPC_ALU   | BPF_OPC_OR   | BPF_OPC_K,     "1,i" },
+  { "or",      BPF_OPC_ALU64 | BPF_OPC_OR   | BPF_OPC_X,     "1,2" },
+  { "or",      BPF_OPC_ALU64 | BPF_OPC_OR   | BPF_OPC_K,     "1,i" },
+  { "xor32",   BPF_OPC_ALU   | BPF_OPC_XOR  | BPF_OPC_X,     "1,2" },
+  { "xor32",   BPF_OPC_ALU   | BPF_OPC_XOR  | BPF_OPC_K,     "1,i" },
+  { "xor",     BPF_OPC_ALU64 | BPF_OPC_XOR  | BPF_OPC_X,     "1,2" },
+  { "xor",     BPF_OPC_ALU64 | BPF_OPC_XOR  | BPF_OPC_K,     "1,i" },
+  { "mul32",   BPF_OPC_ALU   | BPF_OPC_MUL  | BPF_OPC_X,     "1,2" },
+  { "mul32",   BPF_OPC_ALU   | BPF_OPC_MUL  | BPF_OPC_K,     "1,i" },
+  { "mul",     BPF_OPC_ALU64 | BPF_OPC_MUL  | BPF_OPC_X,     "1,2" },
+  { "mul",     BPF_OPC_ALU64 | BPF_OPC_MUL  | BPF_OPC_K,     "1,i" },
+  { "div32",   BPF_OPC_ALU   | BPF_OPC_DIV  | BPF_OPC_X,     "1,2" },
+  { "div32",   BPF_OPC_ALU   | BPF_OPC_DIV  | BPF_OPC_K,     "1,i" },
+  { "div",     BPF_OPC_ALU64 | BPF_OPC_DIV  | BPF_OPC_X,     "1,2" },
+  { "div",     BPF_OPC_ALU64 | BPF_OPC_DIV  | BPF_OPC_K,     "1,i" },
+  { "mod32",   BPF_OPC_ALU   | BPF_OPC_MOD  | BPF_OPC_X,     "1,2" },
+  { "mod32",   BPF_OPC_ALU   | BPF_OPC_MOD  | BPF_OPC_K,     "1,i" },
+  { "mod",     BPF_OPC_ALU64 | BPF_OPC_MOD  | BPF_OPC_X,     "1,2" },
+  { "mod",     BPF_OPC_ALU64 | BPF_OPC_MOD  | BPF_OPC_K,     "1,i" },
+  { "lsh32",   BPF_OPC_ALU   | BPF_OPC_LSH  | BPF_OPC_X,     "1,2" },
+  { "lsh32",   BPF_OPC_ALU   | BPF_OPC_LSH  | BPF_OPC_K,     "1,i" },
+  { "lsh",     BPF_OPC_ALU64 | BPF_OPC_LSH  | BPF_OPC_X,     "1,2" },
+  { "lsh",     BPF_OPC_ALU64 | BPF_OPC_LSH  | BPF_OPC_K,     "1,i" },
+  { "rsh32",   BPF_OPC_ALU   | BPF_OPC_RSH  | BPF_OPC_X,     "1,2" },
+  { "rsh32",   BPF_OPC_ALU   | BPF_OPC_RSH  | BPF_OPC_K,     "1,i" },
+  { "rsh",     BPF_OPC_ALU64 | BPF_OPC_RSH  | BPF_OPC_X,     "1,2" },
+  { "rsh",     BPF_OPC_ALU64 | BPF_OPC_RSH  | BPF_OPC_K,     "1,i" },
+  { "arsh32",  BPF_OPC_ALU   | BPF_OPC_ARSH | BPF_OPC_X,     "1,2" },
+  { "arsh32",  BPF_OPC_ALU   | BPF_OPC_ARSH | BPF_OPC_K,     "1,i" },
+  { "arsh",    BPF_OPC_ALU64 | BPF_OPC_ARSH | BPF_OPC_X,     "1,2" },
+  { "arsh",    BPF_OPC_ALU64 | BPF_OPC_ARSH | BPF_OPC_K,     "1,i" },
+  { "neg32",   BPF_OPC_ALU   | BPF_OPC_NEG  | BPF_OPC_X,     "1" },
+  { "neg",     BPF_OPC_ALU64 | BPF_OPC_NEG  | BPF_OPC_X,     "1" },
+  { "endbe",   BPF_OPC_ALU   | BPF_OPC_END  | BPF_OPC_TO_BE, "1,i" },
+  { "endle",   BPF_OPC_ALU   | BPF_OPC_END  | BPF_OPC_TO_LE, "1,i" },
+  { "ja",      BPF_OPC_JMP   | BPF_OPC_JA,                   "L" },
+  { "jeq",     BPF_OPC_JMP   | BPF_OPC_JEQ  | BPF_OPC_X,     "1,2,L" },
+  { "jeq",     BPF_OPC_JMP   | BPF_OPC_JEQ  | BPF_OPC_K,     "1,i,L" },
+  { "jgt",     BPF_OPC_JMP   | BPF_OPC_JGT  | BPF_OPC_X,     "1,2,L" },
+  { "jgt",     BPF_OPC_JMP   | BPF_OPC_JGT  | BPF_OPC_K,     "1,i,L" },
+  { "jge",     BPF_OPC_JMP   | BPF_OPC_JGE  | BPF_OPC_X,     "1,2,L" },
+  { "jge",     BPF_OPC_JMP   | BPF_OPC_JGE  | BPF_OPC_K,     "1,i,L" },
+  { "jne",     BPF_OPC_JMP   | BPF_OPC_JNE  | BPF_OPC_X,     "1,2,L" },
+  { "jne",     BPF_OPC_JMP   | BPF_OPC_JNE  | BPF_OPC_K,     "1,i,L" },
+  { "jsgt",    BPF_OPC_JMP   | BPF_OPC_JSGT | BPF_OPC_X,     "1,2,L" },
+  { "jsgt",    BPF_OPC_JMP   | BPF_OPC_JSGT | BPF_OPC_K,     "1,i,L" },
+  { "jsge",    BPF_OPC_JMP   | BPF_OPC_JSGE | BPF_OPC_X,     "1,2,L" },
+  { "jsge",    BPF_OPC_JMP   | BPF_OPC_JSGE | BPF_OPC_K,     "1,i,L" },
+  { "jset",    BPF_OPC_JMP   | BPF_OPC_JSET | BPF_OPC_X,     "1,2,L" },
+  { "jset",    BPF_OPC_JMP   | BPF_OPC_JSET | BPF_OPC_K,     "1,i,L" },
+  { "call",    BPF_OPC_JMP   | BPF_OPC_CALL,                 "C" },
+  { "tailcall",BPF_OPC_JMP   | BPF_OPC_CALL | BPF_OPC_X,     "C" },
+  { "exit",    BPF_OPC_JMP   | BPF_OPC_EXIT,                 "" },
+  { "ldimm64", BPF_OPC_LD    | BPF_OPC_IMM  | BPF_OPC_DW,    "1,D" },
+  { "ldw",     BPF_OPC_LDX   | BPF_OPC_MEM  | BPF_OPC_W,     "1,[2+O]" },
+  { "ldh",     BPF_OPC_LDX   | BPF_OPC_MEM  | BPF_OPC_H,     "1,[2+O]" },
+  { "ldb",     BPF_OPC_LDX   | BPF_OPC_MEM  | BPF_OPC_B,     "1,[2+O]" },
+  { "lddw",    BPF_OPC_LDX   | BPF_OPC_MEM  | BPF_OPC_DW,    "1,[2+O]" },
+  { "stw",     BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_W,     "[1+O],2" },
+  { "stw",     BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_W,     "[1+O],i" },
+  { "sth",     BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_H,     "[1+O],2" },
+  { "sth",     BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_H,     "[1+O],i" },
+  { "stb",     BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_B,     "[1+O],2" },
+  { "stb",     BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_B,     "[1+O],i" },
+  { "stdw",    BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_DW,    "[1+O],2" },
+  { "stdw",    BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_DW,    "[1+O],i" },
+  { "xaddw",   BPF_OPC_STX   | BPF_OPC_XADD | BPF_OPC_W,     "[1+O],2" },
+  { "xadddw",  BPF_OPC_STX   | BPF_OPC_XADD | BPF_OPC_DW,    "[1+O],2" },
+};
+const int bpf_num_opcodes = ((sizeof bpf_opcodes)/(sizeof bpf_opcodes[0]));
diff --git a/opcodes/configure b/opcodes/configure
index 27d1472..7583220 100755
--- a/opcodes/configure
+++ b/opcodes/configure
@@ -12634,6 +12634,7 @@ if test x${all_targets} = xfalse ; then
 	bfd_arm_arch)		ta="$ta arm-dis.lo" ;;
 	bfd_avr_arch)		ta="$ta avr-dis.lo" ;;
 	bfd_bfin_arch)		ta="$ta bfin-dis.lo" ;;
+	bfd_bpf_arch)		ta="$ta bpf-dis.lo bpf-opc.lo" ;;
 	bfd_cr16_arch)		ta="$ta cr16-dis.lo cr16-opc.lo" ;;
 	bfd_cris_arch)		ta="$ta cris-dis.lo cris-opc.lo cgen-bitset.lo" ;;
 	bfd_crx_arch)		ta="$ta crx-dis.lo crx-opc.lo" ;;
diff --git a/opcodes/configure.ac b/opcodes/configure.ac
index a9fbfd6..7dc6a92 100644
--- a/opcodes/configure.ac
+++ b/opcodes/configure.ac
@@ -258,6 +258,7 @@ if test x${all_targets} = xfalse ; then
 	bfd_arm_arch)		ta="$ta arm-dis.lo" ;;
 	bfd_avr_arch)		ta="$ta avr-dis.lo" ;;
 	bfd_bfin_arch)		ta="$ta bfin-dis.lo" ;;
+	bfd_bpf_arch)		ta="$ta bpf-dis.lo bpf-opc.lo" ;;
 	bfd_cr16_arch)		ta="$ta cr16-dis.lo cr16-opc.lo" ;;
 	bfd_cris_arch)		ta="$ta cris-dis.lo cris-opc.lo cgen-bitset.lo" ;;
 	bfd_crx_arch)		ta="$ta crx-dis.lo crx-opc.lo" ;;
diff --git a/opcodes/disassemble.c b/opcodes/disassemble.c
index dd7d3a3..e594f86 100644
--- a/opcodes/disassemble.c
+++ b/opcodes/disassemble.c
@@ -29,6 +29,7 @@
 #define ARCH_arm
 #define ARCH_avr
 #define ARCH_bfin
+#define ARCH_bpf
 #define ARCH_cr16
 #define ARCH_cris
 #define ARCH_crx
@@ -151,6 +152,11 @@ disassembler (bfd *abfd)
       disassemble = print_insn_bfin;
       break;
 #endif
+#ifdef ARCH_bpf
+    case bfd_arch_bpf:
+      disassemble = print_insn_bpf;
+      break;
+#endif
 #ifdef ARCH_cr16
     case bfd_arch_cr16:
       disassemble = print_insn_cr16;
-- 
2.4.11
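
For readers following the disassembler in the patch above, here is a minimal, standalone sketch of the 8-byte instruction layout that print_insn_bpf reads: opcode in byte 0, two 4-bit register fields in byte 1, a 16-bit offset in bytes 2-3 and a 32-bit immediate in bytes 4-7. The names bpf_insn_fields, decode_bpf_insn and combine_imm64 are hypothetical and not part of the patch. The sketch assumes little-endian operand encoding, keeps the patch's nibble order (dest in the high nibble), and treats the offset as signed, whereas the patch stores it in an unsigned short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields print_insn_bpf extracts.  */
struct bpf_insn_fields
{
  uint8_t code;   /* byte 0: opcode (class | operation | source).  */
  uint8_t dest;   /* byte 1, high nibble, as read by the patch.  */
  uint8_t src;    /* byte 1, low nibble.  */
  int16_t off;    /* bytes 2-3; signed here by assumption.  */
  int32_t imm;    /* bytes 4-7.  */
};

/* Read a little-endian 16-bit value (what bfd_getl16 does).  */
static uint16_t
get_le16 (const uint8_t *p)
{
  return (uint16_t) (p[0] | (p[1] << 8));
}

/* Read a little-endian 32-bit value (what bfd_getl32 does).  */
static uint32_t
get_le32 (const uint8_t *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
	 | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Split one 8-byte instruction slot into its fields.  */
static struct bpf_insn_fields
decode_bpf_insn (const uint8_t *raw)
{
  struct bpf_insn_fields f;

  assert (raw != NULL);
  f.code = raw[0];
  f.dest = (raw[1] & 0xf0) >> 4;
  f.src = raw[1] & 0x0f;
  f.off = (int16_t) get_le16 (&raw[2]);
  f.imm = (int32_t) get_le32 (&raw[4]);
  return f;
}

/* Build the 64-bit immediate of a two-slot load: the first slot
   supplies the low 32 bits, the second slot the high 32 bits.  */
static uint64_t
combine_imm64 (int32_t lo, int32_t hi)
{
  return ((uint64_t) (uint32_t) hi << 32) | (uint64_t) (uint32_t) lo;
}
```

Reading the 16-bit field with a 32-bit accessor would pull the first two bytes of the immediate into the offset, so the half-word and word readers need to stay distinct.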

* Re: [RFC net-next 2/2] bpf: Test for bpf_prog ID and BPF_PROG_GET_NEXT_ID
From: Martin KaFai Lau @ 2017-04-27 21:10 UTC (permalink / raw)
  To: Alexander Alemayhu
  Cc: netdev, Daniel Borkmann, Hannes Frederic Sowa, Alexei Starovoitov,
	kernel-team
In-Reply-To: <20170427072318.GA4734@gmail.com>

On Thu, Apr 27, 2017 at 09:23:18AM +0200, Alexander Alemayhu wrote:
> On Wed, Apr 26, 2017 at 11:24:49PM -0700, Martin KaFai Lau wrote:
> > Add test to exercise the bpf_prog id generation
> > and iteration.
> >
> Could test_prog_id be a function in tools/testing/selftests/bpf/test_progs.c
> instead? bpf_prog_load is already available there.
Will refactor to avoid duplication.

>
> --
> Mit freundlichen Grüßen
>
> Alexander Alemayhu

* Re: [RFC net-next 0/2] Introduce bpf_prog ID and iteration
From: Martin KaFai Lau @ 2017-04-27 21:14 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev, Daniel Borkmann, Alexei Starovoitov, kernel-team
In-Reply-To: <e81805c8-0499-a0a5-b788-0168947d9b8c@stressinduktion.org>

On Thu, Apr 27, 2017 at 03:36:59PM +0200, Hannes Frederic Sowa wrote:
> It would help a lot if you could pass the prog_id back during program
> creation, otherwise it will be kind of difficult to get a hold on which
> program is where. ;)
Thanks for your feedback :).  Makes sense.  I will look into it.

* Re: rhashtable - Cap total number of entries to 2^31
From: Florian Fainelli @ 2017-04-27 21:16 UTC (permalink / raw)
  To: Herbert Xu, David Miller; +Cc: fw, netdev, Thomas Graf
In-Reply-To: <20170427054451.GA529@gondor.apana.org.au>

Hi Herbert,

On 04/26/2017 10:44 PM, Herbert Xu wrote:
> On Tue, Apr 25, 2017 at 10:48:22AM -0400, David Miller wrote:
>> From: Florian Westphal <fw@strlen.de>
>> Date: Tue, 25 Apr 2017 16:17:49 +0200
>>
>>> I'd have less of an issue with this if we'd be talking about
>>> something computationally expensive, but this is about storing
>>> an extra value inside a struct just to avoid one "shr" in insert path...
>>
>> Agreed, this shift is probably filling an available cpu cycle :-)
> 
> OK, but we need to have an extra field for another reason anyway.
> The problem is that we're not capping the total number of elements
> in the hashtable when max_size is not set, this means that nelems
> can overflow which will cause havoc with the automatic shrinking
> when it tries to fit 2^32 entries into a minimum-sized table.
> 
> So I'm taking that hole back for now :)
> 
> ---8<---
> When max_size is not set or if it is set to a sufficiently large
> value, the nelems counter can overflow.  This would cause havoc
> with the automatic shrinking as it would then attempt to fit a
> huge number of entries into a tiny hash table.
> 
> This patch fixes this by adding max_elems to struct rhashtable
> to cap the number of elements.  This is set to 2^31 as nelems is
> not a precise count.  This is sufficiently smaller than UINT_MAX
> that it should be safe.
> 
> When max_size is set max_elems will be lowered to at most twice
> max_size as is the status quo.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

This commit:

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=6d684e54690caef45cf14051ddeb7c71beeb681b

makes my ARMv7 (32-bit) system panic on boot with the log below. I can
test net-next (or net) and report back if you want me to test anything.
Thanks!

[    0.158619] futex hash table entries: 1024 (order: 4, 65536 bytes)
[    0.166386] NET: Registered protocol family 16
[    0.179596] Kernel panic - not syncing: rtnetlink_init: cannot
initialize rtnetlink
[    0.179596]
[    0.189350] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.11.0-rc8-02028-g6d684e54690c #37
[    0.197908] Hardware name: Broadcom STB (Flattened Device Tree)
[    0.204254] [<c020fa18>] (unwind_backtrace) from [<c020b294>]
(show_stack+0x10/0x14)
[    0.212447] [<c020b294>] (show_stack) from [<c04bc454>]
(dump_stack+0x90/0xa4)
[    0.220144] [<c04bc454>] (dump_stack) from [<c02ab684>]
(panic+0xf0/0x270)
[    0.227460] [<c02ab684>] (panic) from [<c0c2705c>]
(rtnetlink_init+0x24/0x1d4)
[    0.235145] [<c0c2705c>] (rtnetlink_init) from [<c0c27630>]
(netlink_proto_init+0x124/0x148)
[    0.244124] [<c0c27630>] (netlink_proto_init) from [<c02017f8>]
(do_one_initcall+0x40/0x168)
[    0.253072] [<c02017f8>] (do_one_initcall) from [<c0c00dfc>]
(kernel_init_freeable+0x164/0x200)
[    0.262304] [<c0c00dfc>] (kernel_init_freeable) from [<c087bfd8>]
(kernel_init+0x8/0x110)
[    0.270970] [<c087bfd8>] (kernel_init) from [<c0207fa8>]
(ret_from_fork+0x14/0x2c)
[    0.279014] CPU1: stopping
[    0.281916] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
4.11.0-rc8-02028-g6d684e54690c #37
[    0.290499] Hardware name: Broadcom STB (Flattened Device Tree)
[    0.296796] [<c020fa18>] (unwind_backtrace) from [<c020b294>]
(show_stack+0x10/0x14)
[    0.305018] [<c020b294>] (show_stack) from [<c04bc454>]
(dump_stack+0x90/0xa4)
[    0.312684] [<c04bc454>] (dump_stack) from [<c020e984>]
(handle_IPI+0x170/0x190)
[    0.320531] [<c020e984>] (handle_IPI) from [<c020144c>]
(gic_handle_irq+0x88/0x8c)
[    0.328586] [<c020144c>] (gic_handle_irq) from [<c020bd78>]
(__irq_svc+0x58/0x74)
[    0.336543] Exception stack(0xee055f68 to 0xee055fb0)
[    0.341938] 5f60:                   00000001 00000000 ee055fc0
c0219b60 ee054000 c1603cc8
[    0.350661] 5f80: c1603c6c 00000000 00000000 c1486188 ee055fc0
c1603cd4 c1483408 ee055fb8
[    0.359323] 5fa0: c0208a40 c0208a44 60000013 ffffffff
[    0.364745] [<c020bd78>] (__irq_svc) from [<c0208a44>]
(arch_cpu_idle+0x38/0x3c)
[    0.372613] [<c0208a44>] (arch_cpu_idle) from [<c0255e98>]
(do_idle+0x168/0x204)
[    0.380479] [<c0255e98>] (do_idle) from [<c02561ac>]
(cpu_startup_entry+0x18/0x1c)
[    0.388493] [<c02561ac>] (cpu_startup_entry) from [<002014ec>] (0x2014ec)
[    0.395687] CPU3: stopping
[    0.398606] CPU: 3 PID: 0 Comm: swapper/3 Not tainted
4.11.0-rc8-02028-g6d684e54690c #37
[    0.407242] Hardware name: Broadcom STB (Flattened Device Tree)
[    0.413564] [<c020fa18>] (unwind_backtrace) from [<c020b294>]
(show_stack+0x10/0x14)
[    0.421795] [<c020b294>] (show_stack) from [<c04bc454>]
(dump_stack+0x90/0xa4)
[    0.429495] [<c04bc454>] (dump_stack) from [<c020e984>]
(handle_IPI+0x170/0x190)
[    0.437394] [<c020e984>] (handle_IPI) from [<c020144c>]
(gic_handle_irq+0x88/0x8c)
[    0.445475] [<c020144c>] (gic_handle_irq) from [<c020bd78>]
(__irq_svc+0x58/0x74)
[    0.453406] Exception stack(0xee059f68 to 0xee059fb0)
[    0.458792] 9f60:                   00000001 00000000 ee059fc0
c0219b60 ee058000 c1603cc8
[    0.467489] 9f80: c1603c6c 00000000 00000000 c1486188 ee059fc0
c1603cd4 c1483408 ee059fb8
[    0.476177] 9fa0: c0208a40 c0208a44 60000013 ffffffff
[    0.481581] [<c020bd78>] (__irq_svc) from [<c0208a44>]
(arch_cpu_idle+0x38/0x3c)
[    0.489474] [<c0208a44>] (arch_cpu_idle) from [<c0255e98>]
(do_idle+0x168/0x204)
[    0.497331] [<c0255e98>] (do_idle) from [<c02561ac>]
(cpu_startup_entry+0x18/0x1c)
[    0.505369] [<c02561ac>] (cpu_startup_entry) from [<002014ec>] (0x2014ec)
[    0.512562] CPU2: stopping
[    0.515463] CPU: 2 PID: 0 Comm: swapper/2 Not tainted
4.11.0-rc8-02028-g6d684e54690c #37
[    0.524047] Hardware name: Broadcom STB (Flattened Device Tree)
[    0.530368] [<c020fa18>] (unwind_backtrace) from [<c020b294>]
(show_stack+0x10/0x14)
[    0.538573] [<c020b294>] (show_stack) from [<c04bc454>]
(dump_stack+0x90/0xa4)
[    0.546195] [<c04bc454>] (dump_stack) from [<c020e984>]
(handle_IPI+0x170/0x190)
[    0.554050] [<c020e984>] (handle_IPI) from [<c020144c>]
(gic_handle_irq+0x88/0x8c)
[    0.562096] [<c020144c>] (gic_handle_irq) from [<c020bd78>]
(__irq_svc+0x58/0x74)
[    0.570044] Exception stack(0xee057f68 to 0xee057fb0)
[    0.575465] 7f60:                   00000001 00000000 ee057fc0
c0219b60 ee056000 c1603cc8
[    0.584145] 7f80: c1603c6c 00000000 00000000 c1486188 ee057fc0
c1603cd4 c1483408 ee057fb8
[    0.592806] 7fa0: c0208a40 c0208a44 60000013 ffffffff
[    0.598220] [<c020bd78>] (__irq_svc) from [<c0208a44>]
(arch_cpu_idle+0x38/0x3c)
[    0.606103] [<c0208a44>] (arch_cpu_idle) from [<c0255e98>]
(do_idle+0x168/0x204)
[    0.613960] [<c0255e98>] (do_idle) from [<c02561ac>]
(cpu_startup_entry+0x18/0x1c)
[    0.621990] [<c02561ac>] (cpu_startup_entry) from [<002014ec>] (0x2014ec)
[    0.629201] ---[ end Kernel panic - not syncing: rtnetlink_init:
cannot initialize rtnetlink
[    0.629201]

-- 
Florian

^ permalink raw reply

* Re: pull-request: wireless-drivers-next 2017-04-27
From: David Miller @ 2017-04-27 21:16 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <87a872xk4u.fsf@kamboji.qca.qualcomm.com>

From: Kalle Valo <kvalo@codeaurora.org>
Date: Thu, 27 Apr 2017 12:41:37 +0300

> here's a pull request for net-next, more info in the tag below. This
> should be the last pull request to net-next for 4.12. Please let me know
> if there are any problems.


Pulled, thanks Kalle.

^ permalink raw reply

* [PATCH net-next] geneve: fix incorrect setting of UDP checksum flag
From: Girish Moodalbail @ 2017-04-27 21:11 UTC (permalink / raw)
  To: davem; +Cc: netdev, pshelar

Creating a geneve link with 'udpcsum' set results in the creation of a
link for which the UDP checksum will NOT be computed on outbound packets,
as can be seen below.

11: gen0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether c2:85:27:b6:b4:15 brd ff:ff:ff:ff:ff:ff promiscuity 0
    geneve id 200 remote 192.168.13.1 dstport 6081 noudpcsum

Similarly, creating a link with 'noudpcsum' set results in the creation
of a link for which the UDP checksum will be computed on outbound packets.
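
The inversion described above can be modelled in a few lines of plain C.
This is only a sketch: EXAMPLE_TUNNEL_CSUM and both helper functions are
hypothetical stand-ins rather than kernel symbols, and it assumes that
'udpcsum' reaches the driver as a non-zero attribute value and
'noudpcsum' as zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the kernel's TUNNEL_CSUM flag bit. */
#define EXAMPLE_TUNNEL_CSUM 0x01u

/*
 * Model of the check before the fix: the flag is set when the attribute
 * is present and its value is zero, i.e. when checksums were NOT requested.
 */
static uint16_t tun_flags_before_fix(bool attr_present, uint8_t attr_value)
{
	uint16_t flags = 0;

	if (attr_present && !attr_value)
		flags |= EXAMPLE_TUNNEL_CSUM;
	return flags;
}

/* Model of the check after the fix: a non-zero value sets the flag. */
static uint16_t tun_flags_after_fix(bool attr_present, uint8_t attr_value)
{
	uint16_t flags = 0;

	if (attr_present && attr_value)
		flags |= EXAMPLE_TUNNEL_CSUM;
	return flags;
}
```

Under those assumptions, tun_flags_before_fix(true, 1) leaves the flag
clear while tun_flags_after_fix(true, 1) sets it, which matches the
behaviour change the patch is after.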

Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
---
 drivers/net/geneve.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 7074b40..dec5d56 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1244,7 +1244,7 @@ static int geneve_newlink(struct net *net, struct net_device *dev,
 		metadata = true;
 
 	if (data[IFLA_GENEVE_UDP_CSUM] &&
-	    !nla_get_u8(data[IFLA_GENEVE_UDP_CSUM]))
+	    nla_get_u8(data[IFLA_GENEVE_UDP_CSUM]))
 		info.key.tun_flags |= TUNNEL_CSUM;
 
 	if (data[IFLA_GENEVE_UDP_ZERO_CSUM6_TX] &&
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next] net: Initialise init_net.count to 1
From: David Howells @ 2017-04-27 21:40 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

Initialise init_net.count to 1 for its pointer from init_nsproxy lest
someone tries to do a get_net() and a put_net() in a process in which
current->nsproxy->net_ns points to the initial network namespace.
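
Why the starting value matters can be shown with a toy reference count.
The sketch below is a userspace model only: struct example_ns,
example_get() and example_put() are hypothetical names, and the
assumption is that dropping the count to zero is what starts teardown.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted namespace object. */
struct example_ns {
	int count;      /* reference count */
	bool released;  /* set when the count drops to zero */
};

static void example_get(struct example_ns *ns)
{
	ns->count++;
}

static void example_put(struct example_ns *ns)
{
	if (--ns->count == 0)
		ns->released = true;  /* real code would begin cleanup here */
}

/* Returns true if a balanced get/put pair ends up releasing the object. */
static bool balanced_pair_releases(int initial_count)
{
	struct example_ns ns = { .count = initial_count, .released = false };

	example_get(&ns);
	example_put(&ns);
	return ns.released;
}
```

In this model a balanced pair releases the object when the count starts
at 0 and leaves it alone when the count starts at 1.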

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/core/net_namespace.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 652468ff65b7..adb97ca141b7 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -35,7 +35,8 @@ LIST_HEAD(net_namespace_list);
 EXPORT_SYMBOL_GPL(net_namespace_list);
 
 struct net init_net = {
-	.dev_base_head = LIST_HEAD_INIT(init_net.dev_base_head),
+	.count		= ATOMIC_INIT(1),
+	.dev_base_head	= LIST_HEAD_INIT(init_net.dev_base_head),
 };
 EXPORT_SYMBOL(init_net);
 

^ permalink raw reply related

* Re: [PATCH v2 15/21] xen-blkfront: Make use of the new sg_map helper function
From: Logan Gunthorpe @ 2017-04-27 21:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Boris Ostrovsky, linux-nvdimm-y27Ovi1pjclAfugRpC6u6w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b, James E.J. Bottomley,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Matthew Wilcox,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sumit Semwal,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	linux-media-u79uwXL29TY76Z2rM5mHXA, Juergen Gross, Julien Grall,
	intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	sparmaintainer-GLv8BlqOqDDQT0dZR+AlfA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w, Jens Axboe,
	Martin K. Petersen, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman
In-Reply-To: <20170427205339.GB26330-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>



On 27/04/17 02:53 PM, Jason Gunthorpe wrote:
> blkfront is one of the drivers I looked at, and it appears to only be
> memcpying with the bvec_data pointer, so I wonder why it does not use
> sg_copy_X_buffer instead..

Yes, sort of...

But you'd potentially end up calling sg_copy_to_buffer multiple times
per page within the sg (given that gnttab_foreach_grant_in_range might
call blkif_copy_from_grant/blkif_setup_rw_req_grant multiple times).
Even calling sg_copy_to_buffer once per page seems rather inefficient as
it uses sg_miter internally.

Switching the for_each_sg to sg_miter is probably the nicer solution as
it takes care of the mapping and the offset/length accounting for you
and will have similar performance.
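
To make the "offset/length accounting" point concrete, here is a
userspace sketch of copying a segmented buffer into a linear one. It is
not the kernel scatterlist API: struct example_seg and
example_copy_to_buffer() are hypothetical, and no page mapping is
modelled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical segment descriptor; not the kernel's struct scatterlist. */
struct example_seg {
	const unsigned char *page;  /* start of the backing buffer */
	size_t offset;              /* where the data starts within it */
	size_t length;              /* number of valid bytes */
};

/*
 * Copy up to buflen bytes from the segment list into buf and return the
 * number of bytes copied. The loop owns the offset/length bookkeeping,
 * so callers never do per-segment arithmetic themselves.
 */
static size_t example_copy_to_buffer(const struct example_seg *segs,
				     size_t nsegs, void *buf, size_t buflen)
{
	unsigned char *dst = buf;
	size_t copied = 0;

	for (size_t i = 0; i < nsegs && copied < buflen; i++) {
		size_t n = segs[i].length;

		if (n > buflen - copied)
			n = buflen - copied;  /* clamp to remaining space */
		memcpy(dst + copied, segs[i].page + segs[i].offset, n);
		copied += n;
	}
	return copied;
}
```

A caller that invokes such a helper once per grant would repeat the
walk setup each time, which is the inefficiency being discussed here.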

But, yes, if performance is not an issue, switching it to
sg_copy_to_buffer would be a less invasive change than sg_miter. The
same might be said about a lot of these cases.

Unfortunately, changing from kmap_atomic (which is a null operation in a
lot of cases) to sg_copy_X_buffer is a pretty big performance hit.

Logan

^ permalink raw reply

* Re: [PATCH v2 15/21] xen-blkfront: Make use of the new sg_map helper function
From: Jason Gunthorpe @ 2017-04-27 22:11 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Boris Ostrovsky, linux-nvdimm-y27Ovi1pjclAfugRpC6u6w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b, James E.J. Bottomley,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Matthew Wilcox,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sumit Semwal,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	linux-media-u79uwXL29TY76Z2rM5mHXA, Juergen Gross, Julien Grall,
	intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	sparmaintainer-GLv8BlqOqDDQT0dZR+AlfA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w, Jens Axboe,
	Martin K. Petersen, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman
In-Reply-To: <02ba3c7b-5fab-a06c-fbbf-c3be1c0fae1b-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>

On Thu, Apr 27, 2017 at 03:53:37PM -0600, Logan Gunthorpe wrote:
> On 27/04/17 02:53 PM, Jason Gunthorpe wrote:
> > blkfront is one of the drivers I looked at, and it appears to only be
> > memcpying with the bvec_data pointer, so I wonder why it does not use
> > sg_copy_X_buffer instead..
> 
> But you'd potentially end up calling sg_copy_to_buffer multiple times
> per page within the sg (given that gnttab_foreach_grant_in_range might
> call blkif_copy_from_grant/blkif_setup_rw_req_grant multiple times).
> Even calling sg_copy_to_buffer once per page seems rather inefficient as
> it uses sg_miter internally.

Well, that is in the current form; with more users it would make sense
to optimize for the single page case, e.g. by keeping the existing
call and providing a faster single-page-only variant of the copy, perhaps
even one that is inlined.

> Switching the for_each_sg to sg_miter is probably the nicer solution as
> it takes care of the mapping and the offset/length accounting for you
> and will have similar performance.

sg_miter will still fail when the sg contains __iomem, however I would
expect that the sg_copy will work with iomem, by using the __iomem
memcpy variant.

So, sg_copy should always be preferred in this new world with mixed
__iomem since it is the only primitive that can transparently handle
it.

Jason

^ permalink raw reply

* Re: ipsec doesn't route TCP with 4.11 kernel
From: Don Bowman @ 2017-04-27 22:15 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Cong Wang, linux-kernel@vger.kernel.org, Herbert Xu,
	Linux Kernel Network Developers
In-Reply-To: <20170427084238.GX2649@secunet.com>

On 27 April 2017 at 04:42, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> On Wed, Apr 26, 2017 at 10:01:34PM -0700, Cong Wang wrote:
>> (Cc'ing netdev and IPSec maintainers)
>>
>> On Tue, Apr 25, 2017 at 6:08 PM, Don Bowman <db@donbowman.ca> wrote:

For the 'esp' question, I have 'esp = aes256-sha256-modp1536!'; is that
what you mean?
It's a NAT-aware tunnel (from my desktop PC to my office).

root@office:~# ip -s x s
src 172.16.0.8 dst 64.7.137.180
        proto esp spi 0x0d588366(223904614) reqid 1(0x00000001) mode tunnel
        replay-window 0 seq 0x00000000 flag af-unspec (0x00100000)
        auth-trunc hmac(sha256)
0x046cafdf19c5d78d1c29165d96a0b9fce1c500029d77be0fe956dce1bf80a86a
(256 bits) 128
        enc cbc(aes)
0x79ff2fbc2178eb468de6ff16612f0603b514a1d1d5f375c67222294463ec7c62
(256 bits)
        encap type espinudp sport 4500 dport 4500 addr 0.0.0.0
        anti-replay context: seq 0x0, oseq 0x28, bitmap 0x00000000
        lifetime config:
          limit: soft (INF)(bytes), hard (INF)(bytes)
          limit: soft (INF)(packets), hard (INF)(packets)
          expire add: soft 42843(sec), hard 43200(sec)
          expire use: soft 0(sec), hard 0(sec)
        lifetime current:
          2986(bytes), 40(packets)
          add 2017-04-27 18:08:12 use 2017-04-27 18:08:12
        stats:
          replay-window 0 replay 0 failed 0
src 64.7.137.180 dst 172.16.0.8
        proto esp spi 0xcd366c03(3442895875) reqid 1(0x00000001) mode tunnel
        replay-window 32 seq 0x00000000 flag af-unspec (0x00100000)
        auth-trunc hmac(sha256)
0x4158741cc971c49417d60165f19ed02249385c7bba808927d4a9e7b45fb30c5b
(256 bits) 128
        enc cbc(aes)
0x77592c79c964787bca5012214b85172b06deb7b3f06aac02e3934dd9ead67c15
(256 bits)
        encap type espinudp sport 4500 dport 4500 addr 0.0.0.0
        anti-replay context: seq 0x27, oseq 0x0, bitmap 0xffffffff
        lifetime config:
          limit: soft (INF)(bytes), hard (INF)(bytes)
          limit: soft (INF)(packets), hard (INF)(packets)
          expire add: soft 42873(sec), hard 43200(sec)
          expire use: soft 0(sec), hard 0(sec)
        lifetime current:
          4501(bytes), 38(packets)
          add 2017-04-27 18:08:12 use 2017-04-27 18:08:12
        stats:
          replay-window 0 replay 0 failed 0


>> >
>> > My ipsec tunnel comes up ok.
>
> When talking about IPsec, I guess you use ESP, right?
 ...

>
> If it is a GRO issue, then it is on the receive side, could you do
> tcpdump on the receiving interface to see what you get there?

I'm not sure what you mean by the receiving interface; do you mean the
outer, native interface?
listening on eno1, link-type EN10MB (Ethernet), capture size 262144 bytes
18:11:32.061501 IP 172.16.0.8.3416 > 64.7.137.180.33638: truncated-udplength 0
18:11:32.788091 IP 64.7.137.180.4500 > 172.16.0.8.4500: NONESP-encap:
isakmp: child_sa  inf2
18:11:32.788354 IP 172.16.0.8.4500 > 64.7.137.180.4500: NONESP-encap:
isakmp: child_sa  inf2[IR]
18:11:33.066830 IP 172.16.0.8.3416 > 64.7.137.180.33638: truncated-udplength 0
18:11:35.082839 IP 172.16.0.8.3416 > 64.7.137.180.33638: truncated-udplength 0
18:11:37.807945 IP 64.7.137.180.4500 > 172.16.0.8.4500: NONESP-encap:
isakmp: child_sa  inf2
18:11:37.808300 IP 172.16.0.8.4500 > 64.7.137.180.4500: NONESP-encap:
isakmp: child_sa  inf2[IR]

That is what I see there for the 'curl' command that doesn't complete.

>
> What shows /proc/net/xfrm_stat?

root@office:~# cat /proc/net/xfrm_stat
XfrmInError                     0
XfrmInBufferError               0
XfrmInHdrError                  0
XfrmInNoStates                  0
XfrmInStateProtoError           0
XfrmInStateModeError            0
XfrmInStateSeqError             0
XfrmInStateExpired              0
XfrmInStateMismatch             0
XfrmInStateInvalid              0
XfrmInTmplMismatch              0
XfrmInNoPols                    0
XfrmInPolBlock                  0
XfrmInPolError                  0
XfrmOutError                    0
XfrmOutBundleGenError           0
XfrmOutBundleCheckError         0
XfrmOutNoStates                 0
XfrmOutStateProtoError          0
XfrmOutStateModeError           0
XfrmOutStateSeqError            0
XfrmOutStateExpired             0
XfrmOutPolBlock                 0
XfrmOutPolDead                  0
XfrmOutPolError                 0
XfrmFwdHdrError                 0
XfrmOutStateInvalid             0
XfrmAcquireError                0
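
Checking a dump like the one above for non-zero counters can be
scripted. The helpers below are a small sketch that assumes each line
has the "Name value" shape shown; the function names are made up for
illustration and the caller is expected to supply the lines, for
example after reading them from the file.

```c
#include <stdio.h>

/*
 * Parse one "Name   value" line.
 * Returns 1 on success and 0 if the line does not match.
 * name must have room for at least 64 bytes.
 */
static int example_parse_stat_line(const char *line, char name[64],
				   unsigned long *value)
{
	return sscanf(line, "%63s %lu", name, value) == 2;
}

/* Count the lines in a NULL-terminated array that carry a non-zero counter. */
static int example_count_nonzero(const char *const *lines)
{
	char name[64];
	unsigned long value;
	int nonzero = 0;

	for (; *lines; lines++)
		if (example_parse_stat_line(*lines, name, &value) && value != 0)
			nonzero++;
	return nonzero;
}
```

For the dump above every counter is 0, so a check of this kind would
report no non-zero xfrm error counters.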

>
> Can you do 'ip -s x s' to see if the SAs are used?
>
> Do you have INET_ESP_OFFLOAD enabled?
>

CONFIG_INET_ESP=m
CONFIG_INET_ESP_OFFLOAD=m
CONFIG_INET6_ESP=m
CONFIG_INET6_ESP_OFFLOAD=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y


# lsmod |grep esp
esp4                   20480  2
xfrm_algo              16384  5 xfrm_user,esp4,ah4,af_key,xfrm_ipcomp

^ permalink raw reply

* Re: rhashtable - Cap total number of entries to 2^31
From: Florian Fainelli @ 2017-04-27 22:21 UTC (permalink / raw)
  To: Herbert Xu, David Miller; +Cc: fw, netdev, Thomas Graf
In-Reply-To: <0cd0286d-b81d-7bf4-d345-7ef098b9a998@broadcom.com>

On 04/27/2017 02:16 PM, Florian Fainelli wrote:
> Hi Herbert,
> 
> On 04/26/2017 10:44 PM, Herbert Xu wrote:
>> On Tue, Apr 25, 2017 at 10:48:22AM -0400, David Miller wrote:
>>> From: Florian Westphal <fw@strlen.de>
>>> Date: Tue, 25 Apr 2017 16:17:49 +0200
>>>
>>>> I'd have less of an issue with this if we'd be talking about
>>>> something computationally expensive, but this is about storing
>>>> an extra value inside a struct just to avoid one "shr" in insert path...
>>>
>>> Agreed, this shift is probably filling an available cpu cycle :-)
>>
>> OK, but we need to have an extra field for another reason anyway.
>> The problem is that we're not capping the total number of elements
>> in the hashtable when max_size is not set, this means that nelems
>> can overflow which will cause havoc with the automatic shrinking
>> when it tries to fit 2^32 entries into a minimum-sized table.
>>
>> So I'm taking that hole back for now :)
>>
>> ---8<---
>> When max_size is not set or if it set to a sufficiently large
>> value, the nelems counter can overflow.  This would cause havoc
>> with the automatic shrinking as it would then attempt to fit a
>> huge number of entries into a tiny hash table.
>>
>> This patch fixes this by adding max_elems to struct rhashtable
>> to cap the number of elements.  This is set to 2^31 as nelems is
>> not a precise count.  This is sufficiently smaller than UINT_MAX
>> that it should be safe.
>>
>> When max_size is set max_elems will be lowered to at most twice
>> max_size as is the status quo.
>>
>> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> This commit:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=6d684e54690caef45cf14051ddeb7c71beeb681b
> 
> makes my ARMv7 (32-bit) system panic on boot with the log below. I can
> test net-next (or net) and report back if you want me to test anything.
> Thanks!

And another one with a QEMU guest:

[    0.389212] NET: Registered protocol family 16
[    0.388807] Kernel panic - not syncing: rtnetlink_init: cannot
initialize rtnetlink
[    0.388807]
[    0.389445] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.11.0-rc8-02077-ge221c1f0fe25 #1
[    0.389745] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[    0.390219] Call Trace:
[    0.391406]  dump_stack+0x51/0x78
[    0.391585]  panic+0xc7/0x20e
[    0.391740]  ? register_pernet_operations+0xa1/0xd0
[    0.392031]  rtnetlink_init+0x22/0x1a0
[    0.392190]  netlink_proto_init+0x168/0x184
[    0.392359]  ? ptp_classifier_init+0x26/0x30
[    0.392528]  ? netlink_net_init+0x2e/0x2e
[    0.392692]  do_one_initcall+0x54/0x190
[    0.392852]  ? parse_args+0x248/0x400
[    0.393033]  kernel_init_freeable+0x127/0x1b6
[    0.393208]  ? kernel_init_freeable+0x1b6/0x1b6
[    0.393389]  ? rest_init+0x70/0x70
[    0.393533]  kernel_init+0x9/0x100
[    0.393676]  ret_from_fork+0x29/0x40
[    0.394555] ---[ end Kernel panic - not syncing: rtnetlink_init:
cannot initialize rtnetlink
[    0.394555]

I traced this down to:

rtnetlink_net_init()
  netlink_kernel_create()
     netlink_insert()
	__netlink_insert()
	   rhashtable_lookup_insert_key()
	      __rhashtable_insert_fast()
                rht_grow_above_max()

And indeed we have:

ht->nelems = 0
ht->max_elems = 0

such that rht_grow_above_max() returns true.

With your commit we actually take this branch:

if (ht->p.max_size < ht->max_elems / 2)
	ht->max_elems = ht->p.max_size * 2;

since max_size = 0 we have max_elems = 0 as well.
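
The arithmetic can be reproduced outside the kernel. The following is a
simplified userspace model of the two snippets quoted above: struct
example_ht and the function names are hypothetical, and only the fields
mentioned in this mail are represented.

```c
#include <stdbool.h>

/* Simplified stand-in for the fields discussed above; not the kernel struct. */
struct example_ht {
	unsigned int max_size;   /* models ht->p.max_size, 0 means "not set" */
	unsigned int max_elems;  /* models ht->max_elems */
	unsigned int nelems;     /* models ht->nelems */
};

/* Models the max_elems setup as quoted from commit 6d684e54690c. */
static void example_init_as_committed(struct example_ht *ht,
				      unsigned int max_size)
{
	ht->max_size = max_size;
	ht->nelems = 0;
	ht->max_elems = 1u << 31;
	if (ht->max_size < ht->max_elems / 2)
		ht->max_elems = ht->max_size * 2;
}

/* Models rht_grow_above_max(): true means the insert is refused. */
static bool example_grow_above_max(const struct example_ht *ht)
{
	return ht->nelems >= ht->max_elems;
}
```

With max_size == 0 the model ends up with max_elems == 0, so the
comparison 0 >= 0 already refuses the very first insert.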

Candidate fix #1:

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 45f89369c4c8..ad9020e1609c 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -329,7 +329,7 @@ static inline bool rht_grow_above_100(const struct
rhashtable *ht,
 static inline bool rht_grow_above_max(const struct rhashtable *ht,
                                      const struct bucket_table *tbl)
 {
-       return atomic_read(&ht->nelems) >= ht->max_elems;
+       return ht->p.max_size && atomic_read(&ht->nelems) >= ht->max_elems;
 }

Candidate fix #2:

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 751630bbe409..6b4f07760fec 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -963,7 +963,7 @@ int rhashtable_init(struct rhashtable *ht,

        /* Cap total entries at 2^31 to avoid nelems overflow. */
        ht->max_elems = 1u << 31;
-       if (ht->p.max_size < ht->max_elems / 2)
+       if (ht->p.max_size && (ht->p.max_size < ht->max_elems / 2))
                ht->max_elems = ht->p.max_size * 2;

        ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);

Fix #2 does not introduce an additional conditional on the fast path,
so I suppose that is what we would prefer?
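
For comparison, both candidates can be put side by side in a userspace
model. This is a sketch under the same simplifications as the diffs
above: struct example_table and all function names are hypothetical,
and nothing here is taken from the final kernel code.

```c
#include <stdbool.h>

/* Simplified stand-in for the relevant rhashtable fields. */
struct example_table {
	unsigned int max_size;   /* models ht->p.max_size, 0 means "not set" */
	unsigned int max_elems;  /* models ht->max_elems */
	unsigned int nelems;     /* models ht->nelems */
};

/* Candidate fix #1: init unchanged, extra max_size test on insert. */
static void example_init_fix1(struct example_table *t, unsigned int max_size)
{
	t->max_size = max_size;
	t->nelems = 0;
	t->max_elems = 1u << 31;
	if (t->max_size < t->max_elems / 2)
		t->max_elems = t->max_size * 2;
}

static bool example_above_max_fix1(const struct example_table *t)
{
	return t->max_size && t->nelems >= t->max_elems;
}

/* Candidate fix #2: guard in init, insert path unchanged. */
static void example_init_fix2(struct example_table *t, unsigned int max_size)
{
	t->max_size = max_size;
	t->nelems = 0;
	t->max_elems = 1u << 31;
	if (t->max_size && t->max_size < t->max_elems / 2)
		t->max_elems = t->max_size * 2;
}

static bool example_above_max_fix2(const struct example_table *t)
{
	return t->nelems >= t->max_elems;
}
```

In this model both variants let the first insert through. They differ
when max_size is 0: fix #1 never reports "above max", whereas fix #2
keeps the 2^31 cap, which may be a second argument in its favour.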

> 
> [    0.158619] futex hash table entries: 1024 (order: 4, 65536 bytes)
> [    0.166386] NET: Registered protocol family 16
> [    0.179596] Kernel panic - not syncing: rtnetlink_init: cannot
> initialize rtnetlink
> [    0.179596]
> [    0.189350] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 4.11.0-rc8-02028-g6d684e54690c #37
> [    0.197908] Hardware name: Broadcom STB (Flattened Device Tree)
> [    0.204254] [<c020fa18>] (unwind_backtrace) from [<c020b294>]
> (show_stack+0x10/0x14)
> [    0.212447] [<c020b294>] (show_stack) from [<c04bc454>]
> (dump_stack+0x90/0xa4)
> [    0.220144] [<c04bc454>] (dump_stack) from [<c02ab684>]
> (panic+0xf0/0x270)
> [    0.227460] [<c02ab684>] (panic) from [<c0c2705c>]
> (rtnetlink_init+0x24/0x1d4)
> [    0.235145] [<c0c2705c>] (rtnetlink_init) from [<c0c27630>]
> (netlink_proto_init+0x124/0x148)
> [    0.244124] [<c0c27630>] (netlink_proto_init) from [<c02017f8>]
> (do_one_initcall+0x40/0x168)
> [    0.253072] [<c02017f8>] (do_one_initcall) from [<c0c00dfc>]
> (kernel_init_freeable+0x164/0x200)
> [    0.262304] [<c0c00dfc>] (kernel_init_freeable) from [<c087bfd8>]
> (kernel_init+0x8/0x110)
> [    0.270970] [<c087bfd8>] (kernel_init) from [<c0207fa8>]
> (ret_from_fork+0x14/0x2c)
> [    0.279014] CPU1: stopping
> [    0.281916] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> 4.11.0-rc8-02028-g6d684e54690c #37
> [    0.290499] Hardware name: Broadcom STB (Flattened Device Tree)
> [    0.296796] [<c020fa18>] (unwind_backtrace) from [<c020b294>]
> (show_stack+0x10/0x14)
> [    0.305018] [<c020b294>] (show_stack) from [<c04bc454>]
> (dump_stack+0x90/0xa4)
> [    0.312684] [<c04bc454>] (dump_stack) from [<c020e984>]
> (handle_IPI+0x170/0x190)
> [    0.320531] [<c020e984>] (handle_IPI) from [<c020144c>]
> (gic_handle_irq+0x88/0x8c)
> [    0.328586] [<c020144c>] (gic_handle_irq) from [<c020bd78>]
> (__irq_svc+0x58/0x74)
> [    0.336543] Exception stack(0xee055f68 to 0xee055fb0)
> [    0.341938] 5f60:                   00000001 00000000 ee055fc0
> c0219b60 ee054000 c1603cc8
> [    0.350661] 5f80: c1603c6c 00000000 00000000 c1486188 ee055fc0
> c1603cd4 c1483408 ee055fb8
> [    0.359323] 5fa0: c0208a40 c0208a44 60000013 ffffffff
> [    0.364745] [<c020bd78>] (__irq_svc) from [<c0208a44>]
> (arch_cpu_idle+0x38/0x3c)
> [    0.372613] [<c0208a44>] (arch_cpu_idle) from [<c0255e98>]
> (do_idle+0x168/0x204)
> [    0.380479] [<c0255e98>] (do_idle) from [<c02561ac>]
> (cpu_startup_entry+0x18/0x1c)
> [    0.388493] [<c02561ac>] (cpu_startup_entry) from [<002014ec>] (0x2014ec)
> [    0.395687] CPU3: stopping
> [    0.398606] CPU: 3 PID: 0 Comm: swapper/3 Not tainted
> 4.11.0-rc8-02028-g6d684e54690c #37
> [    0.407242] Hardware name: Broadcom STB (Flattened Device Tree)
> [    0.413564] [<c020fa18>] (unwind_backtrace) from [<c020b294>]
> (show_stack+0x10/0x14)
> [    0.421795] [<c020b294>] (show_stack) from [<c04bc454>]
> (dump_stack+0x90/0xa4)
> [    0.429495] [<c04bc454>] (dump_stack) from [<c020e984>]
> (handle_IPI+0x170/0x190)
> [    0.437394] [<c020e984>] (handle_IPI) from [<c020144c>]
> (gic_handle_irq+0x88/0x8c)
> [    0.445475] [<c020144c>] (gic_handle_irq) from [<c020bd78>]
> (__irq_svc+0x58/0x74)
> [    0.453406] Exception stack(0xee059f68 to 0xee059fb0)
> [    0.458792] 9f60:                   00000001 00000000 ee059fc0
> c0219b60 ee058000 c1603cc8
> [    0.467489] 9f80: c1603c6c 00000000 00000000 c1486188 ee059fc0
> c1603cd4 c1483408 ee059fb8
> [    0.476177] 9fa0: c0208a40 c0208a44 60000013 ffffffff
> [    0.481581] [<c020bd78>] (__irq_svc) from [<c0208a44>]
> (arch_cpu_idle+0x38/0x3c)
> [    0.489474] [<c0208a44>] (arch_cpu_idle) from [<c0255e98>]
> (do_idle+0x168/0x204)
> [    0.497331] [<c0255e98>] (do_idle) from [<c02561ac>]
> (cpu_startup_entry+0x18/0x1c)
> [    0.505369] [<c02561ac>] (cpu_startup_entry) from [<002014ec>] (0x2014ec)
> [    0.512562] CPU2: stopping
> [    0.515463] CPU: 2 PID: 0 Comm: swapper/2 Not tainted
> 4.11.0-rc8-02028-g6d684e54690c #37
> [    0.524047] Hardware name: Broadcom STB (Flattened Device Tree)
> [    0.530368] [<c020fa18>] (unwind_backtrace) from [<c020b294>]
> (show_stack+0x10/0x14)
> [    0.538573] [<c020b294>] (show_stack) from [<c04bc454>]
> (dump_stack+0x90/0xa4)
> [    0.546195] [<c04bc454>] (dump_stack) from [<c020e984>]
> (handle_IPI+0x170/0x190)
> [    0.554050] [<c020e984>] (handle_IPI) from [<c020144c>]
> (gic_handle_irq+0x88/0x8c)
> [    0.562096] [<c020144c>] (gic_handle_irq) from [<c020bd78>]
> (__irq_svc+0x58/0x74)
> [    0.570044] Exception stack(0xee057f68 to 0xee057fb0)
> [    0.575465] 7f60:                   00000001 00000000 ee057fc0
> c0219b60 ee056000 c1603cc8
> [    0.584145] 7f80: c1603c6c 00000000 00000000 c1486188 ee057fc0
> c1603cd4 c1483408 ee057fb8
> [    0.592806] 7fa0: c0208a40 c0208a44 60000013 ffffffff
> [    0.598220] [<c020bd78>] (__irq_svc) from [<c0208a44>]
> (arch_cpu_idle+0x38/0x3c)
> [    0.606103] [<c0208a44>] (arch_cpu_idle) from [<c0255e98>]
> (do_idle+0x168/0x204)
> [    0.613960] [<c0255e98>] (do_idle) from [<c02561ac>]
> (cpu_startup_entry+0x18/0x1c)
> [    0.621990] [<c02561ac>] (cpu_startup_entry) from [<002014ec>] (0x2014ec)
> [    0.629201] ---[ end Kernel panic - not syncing: rtnetlink_init:
> cannot initialize rtnetlink
> [    0.629201]
> 


-- 
Florian

^ permalink raw reply related

* [PATCH net-next] rhashtable: Make sure max_size is non zero
From: Florian Fainelli @ 2017-04-27 22:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, herbert, fw, tgraf, Florian Fainelli
In-Reply-To: <56843a86-9a09-16e8-acec-05a80396f282@gmail.com>

After commit 6d684e54690c ("rhashtable: Cap total number of
entries to 2^31"), we would be hitting a panic() in net/core/rtnetlink.c
during initialization. The call stack would look like this:

register_pernet_subsys()
    ...
    ops->init()
	rtnetlink_net_init()
	  netlink_kernel_create()
	     netlink_insert()
		__netlink_insert()
		   rhashtable_lookup_insert_key()
		      __rhashtable_insert_fast()
			rht_grow_above_max()

And here, rht_grow_above_max() returns true, because ht->nelems = 0 (legit)
&& ht->max_elems = 0 (looks bogus).

Eventually, we would return -E2BIG from __rhashtable_insert_fast()
and propagate this all the way back to the caller.

After commit 6d684e54690c what changed is that we would take the
following condition:

if (ht->p.max_size < ht->max_elems / 2)
	ht->max_elems = ht->p.max_size * 2;

and since ht->p.max_size = 0, we would set ht->max_elems to 0 as well.

Fix this by taking this branch only when ht->p.max_size is non-zero.

Fixes: 6d684e54690c ("rhashtable: Cap total number of entries to 2^31")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
---
 lib/rhashtable.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 751630bbe409..6b4f07760fec 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -963,7 +963,7 @@ int rhashtable_init(struct rhashtable *ht,
 
 	/* Cap total entries at 2^31 to avoid nelems overflow. */
 	ht->max_elems = 1u << 31;
-	if (ht->p.max_size < ht->max_elems / 2)
+	if (ht->p.max_size && (ht->p.max_size < ht->max_elems / 2))
 		ht->max_elems = ht->p.max_size * 2;
 
 	ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
-- 
2.12.2

^ permalink raw reply related

* [PATCH net-next] rhashtable: Make sure max_size is non zero
From: Florian Fainelli @ 2017-04-27 22:30 UTC (permalink / raw)
  To: netdev; +Cc: davem, herbert, fw, tgraf, Florian Fainelli
In-Reply-To: <56843a86-9a09-16e8-acec-05a80396f282@gmail.com>

After commit 6d684e54690c ("rhashtable: Cap total number of
entries to 2^31"), we would be hitting a panic() in net/core/rtnetlink.c
during initialization. The call stack would look like this:

register_pernet_subsys()
    ...
    ops->init()
	rtnetlink_net_init()
	  netlink_kernel_create()
	     netlink_insert()
		__netlink_insert()
		   rhashtable_lookup_insert_key()
		      __rhashtable_insert_fast()
			rht_grow_above_max()

And here, rht_grow_above_max() returns true, because ht->nelems = 0 (legit)
&& ht->max_elems = 0 (looks bogus).

Eventually, we would return -E2BIG from __rhashtable_insert_fast()
and propagate this all the way back to the caller.

After commit 6d684e54690c what changed is that we would take the
following condition:

if (ht->p.max_size < ht->max_elems / 2)
	ht->max_elems = ht->p.max_size * 2;

and since ht->p.max_size = 0, we would set ht->max_elems to 0 as well.

Fix this by taking this branch only when ht->p.max_size is non-zero.

Fixes: 6d684e54690c ("rhashtable: Cap total number of entries to 2^31")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 lib/rhashtable.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 751630bbe409..6b4f07760fec 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -963,7 +963,7 @@ int rhashtable_init(struct rhashtable *ht,
 
 	/* Cap total entries at 2^31 to avoid nelems overflow. */
 	ht->max_elems = 1u << 31;
-	if (ht->p.max_size < ht->max_elems / 2)
+	if (ht->p.max_size && (ht->p.max_size < ht->max_elems / 2))
 		ht->max_elems = ht->p.max_size * 2;
 
 	ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
-- 
2.12.2

^ permalink raw reply related

* Re: [PATCH net-next] rhashtable: Make sure max_size is non zero
From: Florian Fainelli @ 2017-04-27 22:32 UTC (permalink / raw)
  To: netdev; +Cc: davem, herbert, fw, tgraf
In-Reply-To: <20170427222824.31936-1-florian.fainelli@broadcom.com>

On 04/27/2017 03:28 PM, Florian Fainelli wrote:
> After commit 6d684e54690c ("rhashtable: Cap total number of
> entries to 2^31"), we would be hitting a panic() in net/core/rtnetlink.c
> during initialization. The call stack would look like this:
> 
> register_pernet_subsys()
>     ...
>     ops->init()
> 	rtnetlink_net_init()
> 	  netlink_kernel_create()
> 	     netlink_insert()
> 		__netlink_insert()
> 		   rhashtable_lookup_insert_key()
> 		      __rhashtable_insert_fast()
> 			rht_grow_above_max()
> 
> And here, rht_grow_above_max() returns true, because ht->nelems = 0 (legit)
> && ht->max_elems = 0 (looks bogus).
> 
> Eventually, we would return -E2BIG from __rhashtable_insert_fast()
> and propagate this all the way back to the caller.
> 
> After commit 6d684e54690c what changed is that we would take the
> following condition:
> 
> if (ht->p.max_size < ht->max_elems / 2)
> 	ht->max_elems = ht->p.max_size * 2;
> 
> and since ht->p.max_size = 0, we would set ht->max_elems to 0 as well.
> 
> Fix this by taking this branch only when ht->p.max_size is non-zero.
> 
> Fixes: 6d684e54690c ("rhashtable: Cap total number of entries to 2^31")
> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>

I sent another version with the correct email address and marked this one
as superseded in patchwork. It's not that this email address is invalid;
it's all about consistency.

David, please apply this one instead:
http://patchwork.ozlabs.org/patch/756172/

/me remembers to stop switching between machines.
-- 
Florian

^ permalink raw reply

* Re: [PATCH net] bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal
From: Marcelo Ricardo Leitner @ 2017-04-27 22:54 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Jay Vosburgh, David S. Miller,
	Honggang LI, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <733d454d3c36e99b55de5374c7664364975b171d.1493313626.git.pabeni-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Thu, Apr 27, 2017 at 07:29:34PM +0200, Paolo Abeni wrote:
> On slave list updates, the bonding driver computes its hard_header_len
> as the maximum of all enslaved devices' hard_header_len.
> If the slave list is empty, e.g. on last enslaved device removal,
> ETH_HLEN is used.
> 
> Since the bonding header_ops are set only when the first enslaved
> device is attached, the above can lead to header_ops->create()
> being called with the wrong skb headroom in place.
> 
> If bond0 is configured on top of ipoib devices, with the
> following commands:
> 
> ifup bond0
> for slave in $BOND_SLAVES_LIST; do
> 	ip link set dev $slave nomaster
> done
> ping -c 1 <ip on bond0 subnet>
> 
> we will obtain a skb_under_panic() with a similar call trace:
> 	skb_push+0x3d/0x40
> 	push_pseudo_header+0x17/0x30 [ib_ipoib]
> 	ipoib_hard_header+0x4e/0x80 [ib_ipoib]
> 	arp_create+0x12f/0x220
> 	arp_send_dst.part.19+0x28/0x50
> 	arp_solicit+0x115/0x290
> 	neigh_probe+0x4d/0x70
> 	__neigh_event_send+0xa7/0x230
> 	neigh_resolve_output+0x12e/0x1c0
> 	ip_finish_output2+0x14b/0x390
> 	ip_finish_output+0x136/0x1e0
> 	ip_output+0x76/0xe0
> 	ip_local_out+0x35/0x40
> 	ip_send_skb+0x19/0x40
> 	ip_push_pending_frames+0x33/0x40
> 	raw_sendmsg+0x7d3/0xb50
> 	inet_sendmsg+0x31/0xb0
> 	sock_sendmsg+0x38/0x50
> 	SYSC_sendto+0x102/0x190
> 	SyS_sendto+0xe/0x10
> 	do_syscall_64+0x67/0x180
> 	entry_SYSCALL64_slow_path+0x25/0x25
> 
> This change addresses the issue avoiding updating the bonding device
> hard_header_len when the slaves list become empty, forbidding to
> shrink it below the value used by header_ops->create().
> 
> The bug is there since commit 54ef31371407 ("[PATCH] bonding: Handle large
> hard_header_len") but the panic can be triggered only since
> commit fc791b633515 ("IB/ipoib: move back IB LL address into the hard
> header").
> 
> Reported-by: Norbert P <noe-PRwTpj6vllL463JZfw7VRw@public.gmane.org>
> Fixes: 54ef31371407 ("[PATCH] bonding: Handle large hard_header_len")
> Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header")
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Paolo Abeni <pabeni-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---

Thanks Paolo.

>  drivers/net/bonding/bond_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 8a4ba8b..34481c9 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1104,11 +1104,11 @@ static void bond_compute_features(struct bonding *bond)
>  		gso_max_size = min(gso_max_size, slave->dev->gso_max_size);
>  		gso_max_segs = min(gso_max_segs, slave->dev->gso_max_segs);
>  	}
> +	bond_dev->hard_header_len = max_hard_header_len;
>  
>  done:
>  	bond_dev->vlan_features = vlan_features;
>  	bond_dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL;
> -	bond_dev->hard_header_len = max_hard_header_len;
>  	bond_dev->gso_max_segs = gso_max_segs;
>  	netif_set_gso_max_size(bond_dev, gso_max_size);
>  
> -- 
> 2.9.3
> 
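
For readers following along, the effect of moving that one assignment can
be modelled in user space. This is only a sketch, not the driver code:
model_bond and model_compute_hlen() are made-up names, the early jump to
"done" on an empty slave list is assumed from the commit message, and the
IPoIB header length of 24 is an assumed value used for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN        14  /* Ethernet header length */
#define IPOIB_HARD_LEN  24  /* assumed IPoIB hard header length, illustration only */

/* Simplified stand-in for the bonding net_device (hypothetical). */
struct model_bond {
	unsigned short hard_header_len;
};

/*
 * Model of the hard_header_len handling described in the commit message.
 * slave_hlen[] holds each slave's hard_header_len; 'patched' selects the
 * behaviour before (0) or after (1) the change.
 */
static void model_compute_hlen(struct model_bond *bond,
			       const unsigned short *slave_hlen,
			       size_t nslaves, int patched)
{
	unsigned short max_hard_header_len = ETH_HLEN;
	size_t i;

	assert(bond != NULL);

	if (nslaves == 0)
		goto done;	/* empty slave list: the loop is skipped */

	for (i = 0; i < nslaves; i++)
		if (slave_hlen[i] > max_hard_header_len)
			max_hard_header_len = slave_hlen[i];

	/* Patched: only update while at least one slave is present. */
	if (patched)
		bond->hard_header_len = max_hard_header_len;

done:
	/* Unpatched: always update, so an empty list falls back to ETH_HLEN. */
	if (!patched)
		bond->hard_header_len = max_hard_header_len;
}
```

With the patched path, removing the last slave leaves hard_header_len at
the value header_ops->create() was set up for; the unpatched path shrinks
it back to ETH_HLEN.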
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 15/21] xen-blkfront: Make use of the new sg_map helper function
From: Logan Gunthorpe @ 2017-04-27 23:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Boris Ostrovsky, linux-nvdimm-y27Ovi1pjclAfugRpC6u6w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b, James E.J. Bottomley,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Matthew Wilcox,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sumit Semwal,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	linux-media-u79uwXL29TY76Z2rM5mHXA, Juergen Gross, Julien Grall,
	intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	sparmaintainer-GLv8BlqOqDDQT0dZR+AlfA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w, Jens Axboe,
	Martin K. Petersen, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman
In-Reply-To: <20170427221132.GA30036-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>



On 27/04/17 04:11 PM, Jason Gunthorpe wrote:
> On Thu, Apr 27, 2017 at 03:53:37PM -0600, Logan Gunthorpe wrote:
> Well, that is in the current form, with more users it would make sense
> to optimize for the single page case, eg by providing the existing
> call, providing a faster single-page-only variant of the copy, perhaps
> even one that is inlined.

Ok, does it make sense then to have an sg_copy_page_to_buffer (or some
such... I'm having trouble thinking of a sane name that isn't too long).
That just does k(un)map_atomic and memcpy? I could try that if it makes
sense to people.

>> Switching the for_each_sg to sg_miter is probably the nicer solution as
>> it takes care of the mapping and the offset/length accounting for you
>> and will have similar performance.
> 
> sg_miter will still fail when the sg contains __iomem, however I would
> expect that the sg_copy will work with iomem, by using the __iomem
> memcpy variant.

Yes, that's true. Any sg_miters that ever see iomem will need to be
converted to support it. This isn't much different than the other
kmap(sg_page()) users I was converting that will also fail if they see
iomem. Though, I suspect an sg_miter user would be easier to convert to
iomem than a random kmap user.

Logan

^ permalink raw reply

* Re: [PATCH net] bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal
From: Jay Vosburgh @ 2017-04-27 23:08 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, David S. Miller, Honggang LI,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <733d454d3c36e99b55de5374c7664364975b171d.1493313626.git.pabeni-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Paolo Abeni <pabeni-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

>On slave list updates, the bonding driver computes its hard_header_len
>as the maximum of all enslaved devices's hard_header_len.
>If the slave list is empty, e.g. on last enslaved device removal,
>ETH_HLEN is used.
>
>Since the bonding header_ops are set only when the first enslaved
>device is attached, the above can lead to header_ops->create()
>being called with the wrong skb headroom in place.
>
>If bond0 is configured on top of ipoib devices, with the
>following commands:
>
>ifup bond0
>for slave in $BOND_SLAVES_LIST; do
>	ip link set dev $slave nomaster
>done
>ping -c 1 <ip on bond0 subnet>
>
>we will obtain a skb_under_panic() with a similar call trace:
>	skb_push+0x3d/0x40
>	push_pseudo_header+0x17/0x30 [ib_ipoib]
>	ipoib_hard_header+0x4e/0x80 [ib_ipoib]
>	arp_create+0x12f/0x220
>	arp_send_dst.part.19+0x28/0x50
>	arp_solicit+0x115/0x290
>	neigh_probe+0x4d/0x70
>	__neigh_event_send+0xa7/0x230
>	neigh_resolve_output+0x12e/0x1c0
>	ip_finish_output2+0x14b/0x390
>	ip_finish_output+0x136/0x1e0
>	ip_output+0x76/0xe0
>	ip_local_out+0x35/0x40
>	ip_send_skb+0x19/0x40
>	ip_push_pending_frames+0x33/0x40
>	raw_sendmsg+0x7d3/0xb50
>	inet_sendmsg+0x31/0xb0
>	sock_sendmsg+0x38/0x50
>	SYSC_sendto+0x102/0x190
>	SyS_sendto+0xe/0x10
>	do_syscall_64+0x67/0x180
>	entry_SYSCALL64_slow_path+0x25/0x25
>
>This change addresses the issue avoiding updating the bonding device
>hard_header_len when the slaves list become empty, forbidding to
>shrink it below the value used by header_ops->create().
>
>The bug is there since commit 54ef31371407 ("[PATCH] bonding: Handle large
>hard_header_len") but the panic can be triggered only since
>commit fc791b633515 ("IB/ipoib: move back IB LL address into the hard
>header").
>
>Reported-by: Norbert P <noe-PRwTpj6vllL463JZfw7VRw@public.gmane.org>
>Fixes: 54ef31371407 ("[PATCH] bonding: Handle large hard_header_len")
>Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header")
>Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>Signed-off-by: Paolo Abeni <pabeni-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Signed-off-by: Jay Vosburgh <jay.vosburgh-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>


> drivers/net/bonding/bond_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 8a4ba8b..34481c9 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1104,11 +1104,11 @@ static void bond_compute_features(struct bonding *bond)
> 		gso_max_size = min(gso_max_size, slave->dev->gso_max_size);
> 		gso_max_segs = min(gso_max_segs, slave->dev->gso_max_segs);
> 	}
>+	bond_dev->hard_header_len = max_hard_header_len;
> 
> done:
> 	bond_dev->vlan_features = vlan_features;
> 	bond_dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL;
>-	bond_dev->hard_header_len = max_hard_header_len;
> 	bond_dev->gso_max_segs = gso_max_segs;
> 	netif_set_gso_max_size(bond_dev, gso_max_size);
> 
>-- 
>2.9.3
>

^ permalink raw reply

* Re: [PATCH v2 15/21] xen-blkfront: Make use of the new sg_map helper function
From: Jason Gunthorpe @ 2017-04-27 23:20 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Boris Ostrovsky, linux-nvdimm-y27Ovi1pjclAfugRpC6u6w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b, James E.J. Bottomley,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Matthew Wilcox,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sumit Semwal,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	linux-media-u79uwXL29TY76Z2rM5mHXA, Juergen Gross, Julien Grall,
	intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	sparmaintainer-GLv8BlqOqDDQT0dZR+AlfA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w, Jens Axboe,
	Martin K. Petersen, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman
In-Reply-To: <3a7c0d27-0744-4e91-b37f-3885c50455e8-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>

On Thu, Apr 27, 2017 at 05:03:45PM -0600, Logan Gunthorpe wrote:
> 
> 
> On 27/04/17 04:11 PM, Jason Gunthorpe wrote:
> > On Thu, Apr 27, 2017 at 03:53:37PM -0600, Logan Gunthorpe wrote:
> > Well, that is in the current form, with more users it would make sense
> > to optimize for the single page case, eg by providing the existing
> > call, providing a faster single-page-only variant of the copy, perhaps
> > even one that is inlined.
> 
> Ok, does it make sense then to have an sg_copy_page_to_buffer (or some
> such... I'm having trouble thinking of a sane name that isn't too long).
> That just does k(un)map_atomic and memcpy? I could try that if it makes
> sense to people.

It seems the most robust: test for iomem, and jump to a slow path
copy, otherwise inline the kmap and memcpy

Every place doing memcpy from sgl will need that pattern to be
correct.
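
A rough user-space sketch of that pattern, for illustration only. Nothing
here is kernel API: model_sg, its is_iomem flag and both helpers are
hypothetical, and the plain memcpy() stands in for what would be
kmap_atomic() + memcpy() + kunmap_atomic() in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatterlist entry; not the kernel struct. */
struct model_sg {
	void *mem;	/* backing memory */
	size_t length;	/* bytes in this entry */
	int is_iomem;	/* assumed flag: entry refers to I/O memory */
};

/* Slow path: stands in for an __iomem-safe copy, done byte by byte. */
static void model_copy_fromio(void *dst, const volatile void *src, size_t len)
{
	const volatile unsigned char *s = src;
	unsigned char *d = dst;

	while (len--)
		*d++ = *s++;
}

/*
 * Copy up to len bytes starting at offset within a single entry.
 * Returns the number of bytes actually copied.
 */
static inline size_t model_sg_copy_to_buffer(const struct model_sg *sg,
					     size_t offset, void *buf,
					     size_t len)
{
	const unsigned char *src;

	assert(sg != NULL);

	if (offset >= sg->length)
		return 0;
	if (len > sg->length - offset)
		len = sg->length - offset;

	src = (const unsigned char *)sg->mem + offset;

	if (sg->is_iomem)
		model_copy_fromio(buf, src, len);	/* slow path */
	else
		memcpy(buf, src, len);			/* inlined fast path */

	return len;
}
```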

> > sg_miter will still fail when the sg contains __iomem, however I would
> > expect that the sg_copy will work with iomem, by using the __iomem
> > memcpy variant.
> 
> Yes, that's true. Any sg_miters that ever see iomem will need to be
> converted to support it. This isn't much different than the other
> kmap(sg_page()) users I was converting that will also fail if they see
> iomem. Though, I suspect an sg_miter user would be easier to convert to
> iomem than a random kmap user.

How? sg_miter seems like the next nightmare down this path, what is
sg_miter_next supposed to do when something hits an iomem sgl?

miter.addr is supposed to be a kernel pointer that must not be
__iomem..

Jason

^ permalink raw reply

* Re: [PATCH v2 15/21] xen-blkfront: Make use of the new sg_map helper function
From: Logan Gunthorpe @ 2017-04-27 23:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Boris Ostrovsky, linux-nvdimm, dri-devel, Stephen Bates, dm-devel,
	target-devel, Christoph Hellwig, devel, James E.J. Bottomley,
	linux-scsi, Matthew Wilcox, linux-rdma, Sumit Semwal,
	Ross Zwisler, open-iscsi, linux-media, Juergen Gross,
	Julien Grall, Konrad Rzeszutek Wilk, intel-gfx, sparmaintainer,
	linux-raid, Dan Williams, megaraidlinux.pdl, Jens Axboe,
	"Martin K. Petersen" <martin.p
In-Reply-To: <20170427232022.GA30398@obsidianresearch.com>



On 27/04/17 05:20 PM, Jason Gunthorpe wrote:
> It seems the most robust: test for iomem, and jump to a slow path
> copy, otherwise inline the kmap and memcpy
> 
> Every place doing memcpy from sgl will need that pattern to be
> correct.

Ok, sounds like a good place to start to me. I'll see what I can do for
a v3 of this set. Though, I probably won't send anything until after the
merge window.

>>> sg_miter will still fail when the sg contains __iomem, however I would
>>> expect that the sg_copy will work with iomem, by using the __iomem
>>> memcpy variant.
>>
>> Yes, that's true. Any sg_miters that ever see iomem will need to be
>> converted to support it. This isn't much different than the other
>> kmap(sg_page()) users I was converting that will also fail if they see
>> iomem. Though, I suspect an sg_miter user would be easier to convert to
>> iomem than a random kmap user.
> 
> How? sg_miter seems like the next nightmare down this path, what is
> sg_miter_next supposed to do when something hits an iomem sgl?

My proposal is roughly included in the draft I sent upthread. We add an
sg_miter flag indicating the iteratee supports iomem and if miter finds
iomem (with the support flag set) it sets ioaddr which is __iomem. The
iteratee then just needs to null check addr and ioaddr and perform the
appropriate action.
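
To make the idea concrete, here is a small user-space model. It is a
sketch under assumptions, not the draft itself: model_miter,
MODEL_MITER_SUPPORTS_IOMEM and the is_iomem marker are invented names, and
stopping the iteration when iomem is found without the flag is just one
possible behaviour picked for the model.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MITER_SUPPORTS_IOMEM 0x1	/* hypothetical flag name */

/* Hypothetical stand-in for one scatterlist entry. */
struct model_sg {
	void *mem;
	size_t length;
	int is_iomem;	/* assumed marker for I/O memory */
};

/* Hypothetical iterator state, loosely shaped after sg_mapping_iter. */
struct model_miter {
	const struct model_sg *sgl;
	size_t nents;
	size_t idx;
	unsigned int flags;
	void *addr;	/* set when the entry is normal memory */
	void *ioaddr;	/* set when the entry is I/O memory (__iomem in the kernel) */
	size_t length;
};

static void model_miter_start(struct model_miter *m,
			      const struct model_sg *sgl, size_t nents,
			      unsigned int flags)
{
	assert(m != NULL);
	m->sgl = sgl;
	m->nents = nents;
	m->idx = 0;
	m->flags = flags;
	m->addr = NULL;
	m->ioaddr = NULL;
	m->length = 0;
}

/*
 * Advance to the next entry. Returns 1 if an entry was mapped, 0 at the
 * end of the list or when iomem is met by a caller that did not opt in.
 */
static int model_miter_next(struct model_miter *m)
{
	const struct model_sg *sg;

	m->addr = NULL;
	m->ioaddr = NULL;
	m->length = 0;

	if (m->idx >= m->nents)
		return 0;

	sg = &m->sgl[m->idx];
	if (sg->is_iomem) {
		if (!(m->flags & MODEL_MITER_SUPPORTS_IOMEM))
			return 0;	/* model choice: stop, caller cannot cope */
		m->ioaddr = sg->mem;
	} else {
		m->addr = sg->mem;
	}
	m->length = sg->length;
	m->idx++;
	return 1;
}
```

The iteratee then checks which of addr and ioaddr is non-NULL after each
successful step and picks the matching copy routine.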

Logan

^ permalink raw reply

* Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?
From: Alexei Starovoitov @ 2017-04-27 23:31 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Andy Gospodarek
  Cc: John Fastabend, Alexei Starovoitov, Daniel Borkmann,
	Daniel Borkmann, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org
In-Reply-To: <20170427104121.32df2178@redhat.com>

On 4/27/17 1:41 AM, Jesper Dangaard Brouer wrote:
> When registering/attaching a XDP/bpf program, we would just send the
> file-descriptor for this port-map along (like we do with the bpf_prog
> FD). Plus, its own ingress-port number for this program in the port-map.
>
> It is not clear to me, in-which-data-structure on the kernel-side we
> store this reference to the port-map and ingress-port. As today we only
> have the "raw" struct bpf_prog pointer. I see several options:
>
> 1. Create a new xdp_prog struct that contains existing bpf_prog,
> a port-map pointer and ingress-port. (IMHO easiest solution)
>
> 2. Just create a new pointer to port-map and store it in driver rx-ring
> struct (like existing bpf_prog), but this create a race-challenge
> replacing (cmpxchg) the program (or perhaps it's not a problem as it
> runs under rcu and RTNL-lock).
>
> 3. Extend bpf_prog to store this port-map and ingress-port, and have a
> fast-way to access it.  I assume it will be accessible via
> bpf_prog->bpf_prog_aux->used_maps[X] but it will be too slow for XDP.

I'm not sure I completely follow the 3 proposals.
Are you suggesting to have only one netdev_array per program?
Why not allow any number like we do for tailcall+prog_array, etc?
We can teach verifier to allow new helper
bpf_tx_port(netdev_array, port_num);
to only be used with netdev_array map type.
It will fetch netdevice pointer from netdev_array[port_num]
and will tx the packet into it.
We can make it similar to bpf_tail_call(), so that program will
finish on successful bpf_tx_port() or
make it into 'delayed' tx which will be executed when program finishes.
Not sure which approach is better.

We can also extend this netdev_array into broadcast/multicast. Like
bpf_tx_allports(&netdev_array);
call from the program will xmit the packet to all netdevices
in that 'netdev_array' map type.

The map-in-map support can be trivially extended to allow netdev_array,
then the program can create N multicast groups of netdevices.
Each multicast group == one netdev_array map.
The user space will populate a hashmap with these netdev_arrays and
bpf kernel side can select dynamically which multicast group to use
to send the packets to.
bpf kernel side may look like:
struct bpf_netdev_array *netdev_array = bpf_map_lookup_elem(&hash, key);
if (!netdev_array)
   ...
if (my_condition)
    bpf_tx_allports(netdev_array);  /* broadcast to all netdevices */
else
    bpf_tx_port(netdev_array, port_num); /* tx into one netdevice */

that's an artificial example. Just trying to point out
that we shouldn't restrict the feature too soon.
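
A plain C model of the two proposed helpers may make the semantics easier
to picture. It runs in user space and is only a sketch: bpf_tx_port() and
bpf_tx_allports() are proposals in this mail, not existing helpers, and
every model_* name, the fixed array size and the tx counter are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PORTS 4	/* arbitrary size for the model */

/* Hypothetical stand-in for a netdevice; counts transmitted packets. */
struct model_netdev {
	int ifindex;
	unsigned long tx_packets;
};

/* Hypothetical stand-in for the proposed netdev_array map type. */
struct model_netdev_array {
	struct model_netdev *ports[MODEL_MAX_PORTS];
};

/*
 * Model of bpf_tx_port(): look up netdev_array[port_num] and transmit.
 * Returns 0 on success, -1 if the slot is empty or out of range.
 */
static int model_tx_port(struct model_netdev_array *arr, unsigned int port_num)
{
	assert(arr != NULL);

	if (port_num >= MODEL_MAX_PORTS || !arr->ports[port_num])
		return -1;

	arr->ports[port_num]->tx_packets++;
	return 0;
}

/*
 * Model of bpf_tx_allports(): transmit to every populated slot.
 * Returns the number of devices the packet was sent to.
 */
static int model_tx_allports(struct model_netdev_array *arr)
{
	unsigned int i;
	int sent = 0;

	for (i = 0; i < MODEL_MAX_PORTS; i++)
		if (model_tx_port(arr, i) == 0)
			sent++;

	return sent;
}
```

One multicast group then corresponds to one such array, and the program
picks the array at run time.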

^ permalink raw reply

* Re: [Patch net-next] ipv4: get rid of ip_ra_lock
From: Cong Wang @ 2017-04-27 23:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers
In-Reply-To: <1493297179.6453.105.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, Apr 27, 2017 at 5:46 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2017-04-26 at 13:55 -0700, Cong Wang wrote:
>> After commit 1215e51edad1 ("ipv4: fix a deadlock in ip_ra_control")
>> we always take RTNL lock for ip_ra_control() which is the only place
>> we update the list ip_ra_chain, so the ip_ra_lock is no longer needed,
>> we just need to disable BH there.
>>
>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>> ---
>
> Looks great, but reading again this code, I believe we do not need to
> disable BH at all ?
>

Hmm, if we don't disable BH here, could a reader in BH jump in and
break this critical section? Or is that fine for RCU?

^ permalink raw reply

* Re: [PATCH iproute2] routel: fix infinite loop in line parser
From: Stephen Hemminger @ 2017-04-27 23:46 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: netdev
In-Reply-To: <20170427094347.3EB63A0F1C@unicorn.suse.cz>

On Thu, 27 Apr 2017 11:43:47 +0200 (CEST)
Michal Kubecek <mkubecek@suse.cz> wrote:

> As noticed by one of the few users of routel script, it ends up in an
> infinite loop when they pull out the cable from the NIC used for some
> route. This is caused by its parser expecting the line of "ip route show"
> output consists of "key value" pairs (except for the initial target range),
> together with an old trap of Bourne style shells that "shift 2" does
> nothing if there is only one argument left. Some keywords, e.g. "linkdown",
> are not followed by a value.
> 
> Improve the parser to
> 
>   (1) only set variables for keywords we care about
>   (2) recognize (currently) known keywords without value
> 
> This is still far from perfect (and certainly not future proof) but to
> fully fix the script, one would probably have to rewrite the logic
> completely (and I'm not sure it's worth the effort).
> 
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>

Applied, but this really needs to be done better:
either as a simplified output of the route command (see ip -br link),
or ip route should have a JSON output option and use python/perl/xss
to reformat.
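
For illustration, the parsing rule from the commit message can be sketched
in C. This is not the routel script and not the patch: the model_* names
are made up, and the list of valueless keywords is an assumption beyond
"linkdown", which is the only one named above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Only the keywords the model cares about are stored (hypothetical). */
struct model_route {
	const char *via;
	const char *dev;
	int linkdown;
};

/* Keywords that carry no value; the list itself is an assumption. */
static int model_is_flag(const char *kw)
{
	static const char *const flags[] = { "linkdown", "onlink", "dead" };
	size_t i;

	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++)
		if (strcmp(kw, flags[i]) == 0)
			return 1;
	return 0;
}

/*
 * Parse the tokens that follow the target of one "ip route show" line.
 * Returns the number of tokens consumed, which always reaches ntok, so
 * the loop cannot spin the way "shift 2" does with one argument left.
 */
static size_t model_parse(const char *const *tok, size_t ntok,
			  struct model_route *r)
{
	size_t i = 0;

	assert(r != NULL);
	r->via = NULL;
	r->dev = NULL;
	r->linkdown = 0;

	while (i < ntok) {
		if (model_is_flag(tok[i])) {
			if (strcmp(tok[i], "linkdown") == 0)
				r->linkdown = 1;
			i += 1;		/* valueless keyword: consume one token */
			continue;
		}
		if (i + 1 >= ntok) {
			i += 1;		/* dangling key: still make progress */
			break;
		}
		if (strcmp(tok[i], "via") == 0)
			r->via = tok[i + 1];
		else if (strcmp(tok[i], "dev") == 0)
			r->dev = tok[i + 1];
		i += 2;			/* "key value" pair */
	}
	return i;
}
```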

^ permalink raw reply

* Re: [Patch net-next] ipv4: get rid of ip_ra_lock
From: Eric Dumazet @ 2017-04-27 23:54 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAM_iQpXexP9OzzqyPJ-yHmyq-ZF=cNboaG8566yaxZKbn0+TTg@mail.gmail.com>

On Thu, 2017-04-27 at 16:46 -0700, Cong Wang wrote:
> On Thu, Apr 27, 2017 at 5:46 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Wed, 2017-04-26 at 13:55 -0700, Cong Wang wrote:
> >> After commit 1215e51edad1 ("ipv4: fix a deadlock in ip_ra_control")
> >> we always take RTNL lock for ip_ra_control() which is the only place
> >> we update the list ip_ra_chain, so the ip_ra_lock is no longer needed,
> >> we just need to disable BH there.
> >>
> >> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> >> ---
> >
> > Looks great, but reading again this code, I believe we do not need to
> > disable BH at all ?
> >
> 
> Hmm, if we don't disable BH here, could a reader in BH jump in and
> break this critical section? Or is that fine for RCU?

It should be fine for RCU.

The spinlock (or mutex if this is RTNL) is protecting writers among
themselves. Here it should run in process context, with no specific
rules to disable preemption, hard or soft irqs.

The reader(s) do not care of how writer(s) enforce their mutual
protection, and if writer(s) disable hard or soft irqs.
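
A user-space sketch of that split, using C11 atomics in place of the
kernel RCU primitives. All names are hypothetical: the atomic_flag stands
in for RTNL, the release store plays the role of rcu_assign_pointer() and
the acquire loads the role of rcu_dereference(). Removal and deferred
freeing are left out of the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node, loosely modelled on an ip_ra_chain entry. */
struct model_ra {
	int id;
	struct model_ra *_Atomic next;
};

static struct model_ra *_Atomic model_chain;	/* head, read without locks */
static atomic_flag model_writer_lock = ATOMIC_FLAG_INIT; /* stands in for RTNL */

static void model_writer_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&model_writer_lock,
						 memory_order_acquire))
		;	/* spin: only other writers are excluded */
}

static void model_writer_lock_release(void)
{
	atomic_flag_clear_explicit(&model_writer_lock, memory_order_release);
}

/* Writer: serialized against other writers, never against readers. */
static void model_ra_add(struct model_ra *ra)
{
	assert(ra != NULL);

	model_writer_lock_acquire();
	atomic_store_explicit(&ra->next,
			      atomic_load_explicit(&model_chain,
						   memory_order_relaxed),
			      memory_order_relaxed);
	/* Publish only after the node is fully initialised. */
	atomic_store_explicit(&model_chain, ra, memory_order_release);
	model_writer_lock_release();
}

/* Reader: takes no lock and does not care how writers exclude each other. */
static int model_ra_count(void)
{
	struct model_ra *ra;
	int n = 0;

	for (ra = atomic_load_explicit(&model_chain, memory_order_acquire);
	     ra != NULL;
	     ra = atomic_load_explicit(&ra->next, memory_order_acquire))
		n++;

	return n;
}
```

A reader that runs between the two stores in model_ra_add() simply sees
the old list, which is why the writer has no need to keep it out.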

^ permalink raw reply

* Re: [PATCH net-next V3 2/2] rtnl: Add support for netdev event attribute to link messages
From: Roopa Prabhu @ 2017-04-28  0:11 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: David Ahern, Vladislav Yasevich, netdev@vger.kernel.org,
	Jiri Pirko
In-Reply-To: <8986b8c8-9bf1-21fc-49e5-e196630cd318@redhat.com>

On Thu, Apr 27, 2017 at 12:51 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:
> On 04/24/2017 11:14 AM, Roopa Prabhu wrote:
>> On Sun, Apr 23, 2017 at 6:07 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
>>>
>>> On 4/21/17 11:31 AM, Vladislav Yasevich wrote:
>>>> @@ -1276,9 +1277,40 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
>>>>       return err;
>>>>  }
>>>>
>>>> +static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event)
>>>> +{
>>>> +     u32 rtnl_event;
>>>> +
>>>> +     switch (event) {
>>>> +     case NETDEV_REBOOT:
>>>> +             rtnl_event = IFLA_EVENT_REBOOT;
>>>> +             break;
>>>> +     case NETDEV_FEAT_CHANGE:
>>>> +             rtnl_event = IFLA_EVENT_FEAT_CHANGE;
>>>> +             break;
>>>> +     case NETDEV_BONDING_FAILOVER:
>>>> +             rtnl_event = IFLA_EVENT_BONDING_FAILOVER;
>>>> +             break;
>>>> +     case NETDEV_NOTIFY_PEERS:
>>>> +             rtnl_event = IFLA_EVENT_NOTIFY_PEERS;
>>>> +             break;
>>>> +     case NETDEV_RESEND_IGMP:
>>>> +             rtnl_event = IFLA_EVENT_RESEND_IGMP;
>>>> +             break;
>>>> +     case NETDEV_CHANGEINFODATA:
>>>> +             rtnl_event = IFLA_EVENT_CHANGE_INFO_DATA;
>>>> +             break;
>>>> +     default:
>>>> +             return 0;
>>>> +     }
>>>> +
>>>> +     return nla_put_u32(skb, IFLA_EVENT, rtnl_event);
>>>> +}
>>>> +
>>>
>>> I still have doubts about encoding kernel events into a uapi.
>>
>> agree. I don't see why user-space will need NETDEV_CHANGEINFODATA and
>> others david listed.
>>
>
> Well, I am not sure about CHANGEINFODATA as well, but I can see use
> cases for others.
>
>> My other concerns are, once we have this exposed to user-space and
>> user-space starts relying on it, it will need accurate information and
>> will expect to have this event information all the time.
>> IIUC, we cannot cover multiple events in a single notification and not
>> all link notifications will contain an IFLA_EVENT attribute.
>
> Uhm...  If the rtnetlink message was a result of an event, it will have
> an IFLA_EVENT.  If a message is something else, then it will not have
> an event.  That's the point.  Not all netlink attributes are in every
> netlink message.
>
>> In other
>> words, we will be telling user-space to not expect that the kernel
>> will send IFLA_EVENT every time.
>>
>
> No, we are telling the user that if it is interested in a specific event
> (let's say NOTIFY_PEERS or RESEND_IGMP), then it now can monitor netlink
> traffic for those events.
> As things stand right now, that's not possible.
>
> I've done this specifically for all events for which we currently generate
> a netlink message.
>
> The only concern I have is that if in the future we remove a certain netdev
> event, it may impact applications.  But we may be doing it right now as well,
> only silently, and the apps may have to find some ways to work around it.
>

OK, fair enough. It might be OK then, except for the specific
attributes that user-space may not be interested in, like CHANGEINFODATA.

^ permalink raw reply

