Netdev List
* Re: [PATCH v1 net-next 5/6] net: allow simultaneous SW and HW transmit timestamping
From: Willem de Bruijn @ 2017-04-27 16:48 UTC
  To: Miroslav Lichvar
  Cc: Network Development, Richard Cochran, Willem de Bruijn,
	Soheil Hassas Yeganeh, Keller, Jacob E, Denny Page, Jiri Benc
In-Reply-To: <20170427163911.GC3401@localhost>

On Thu, Apr 27, 2017 at 12:39 PM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> On Thu, Apr 27, 2017 at 12:21:00PM -0400, Willem de Bruijn wrote:
>> >> > @@ -720,6 +720,7 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
>> >> >                 empty = 0;
>> >> >         if (shhwtstamps &&
>> >> >             (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
>> >> > +           (empty || !skb_is_err_queue(skb)) &&
>> >> >             ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) {
>> >>
>> >> I find skb->tstamp == 0 easier to understand than the condition on empty.
>> >>
>> >> Indeed, this is so non-obvious that I would suggest another helper function
>> >> skb_is_hwtx_tstamp with a concise comment about the race condition
>> >> between tx software and hardware timestamps (as in the last sentence of
>> >> the commit message).
>> >
>> > Should it include also the skb_is_err_queue() check? If it returned
>> > true for both TX and RX HW timestamps, maybe it could be called
>> > skb_has_hw_tstamp?
>>
>> For the purpose of documenting why this complex condition exists,
>> I would call the skb_is_err_queue in that helper function and make
>> it tx + hw specific.
>
> Hm, like this?
>
>         if (shhwtstamps &&
>             (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
> +           (skb_is_hwtx_tstamp(skb) || !skb_is_err_queue(skb)) &&
>             ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) {
>
> where skb_is_hwtx_tstamp() has
>         return skb->tstamp == 0 && skb_is_err_queue(skb);
>
> I was just not sure about the unnecessary skb_is_err_queue() call.

Oh, good point. If the condition is

  (skb_is_err_queue(skb) && !skb->tstamp) || !skb_is_err_queue(skb)

then it makes more sense to just use the simpler expression

  (!skb_is_err_queue(skb)) || (!skb->tstamp)

This cannot be called skb_is_hwtx_tstamp, as the condition does not imply
skb_hwtstamps(skb). Perhaps instead define

  /* On transmit, software and hardware timestamps are returned independently.
   * Do not return a hardware timestamp, even if skb_hwtstamps(skb) is set,
   * when skb->tstamp is set.
   */
  static bool skb_is_swtx_tstamp(const struct sk_buff *skb)
  {
          return skb_is_err_queue(skb) && skb->tstamp;
  }

and use !skb_is_swtx_tstamp(skb) in this condition. The comment is
just a quick first try and can probably be improved.
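
To make the simplification above easy to check, here is a small
userspace sketch (not kernel code) that compares the long form, the
simplified form and the negated helper over all four input combinations.
The struct and function names are hypothetical stand-ins:
on_err_queue models skb_is_err_queue(skb) and has_sw_tstamp models
a non-zero skb->tstamp.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two facts the condition looks at.
 * on_err_queue models skb_is_err_queue(skb); has_sw_tstamp models
 * skb->tstamp != 0. This is an illustration, not kernel code. */
struct tx_state {
	bool on_err_queue;
	bool has_sw_tstamp;
};

/* Long form: (err_queue && !tstamp) || !err_queue */
static bool allow_hw_long(struct tx_state s)
{
	return (s.on_err_queue && !s.has_sw_tstamp) || !s.on_err_queue;
}

/* Simplified form: !err_queue || !tstamp */
static bool allow_hw_short(struct tx_state s)
{
	return !s.on_err_queue || !s.has_sw_tstamp;
}

/* Shape of the proposed helper; the caller negates it. */
static bool is_swtx_tstamp(struct tx_state s)
{
	return s.on_err_queue && s.has_sw_tstamp;
}

/* Returns true if all three forms agree on every possible input. */
static bool forms_agree(void)
{
	for (int i = 0; i < 4; i++) {
		struct tx_state s = { (i & 1) != 0, (i & 2) != 0 };

		if (allow_hw_long(s) != allow_hw_short(s) ||
		    allow_hw_short(s) != !is_swtx_tstamp(s))
			return false;
	}
	return true;
}
```

The only input for which the hardware timestamp is suppressed is an
error-queue skb that already carries a software timestamp, which is the
case the helper is meant to name.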


* Re: [PATCH net-next 6/6] bpf: show bpf programs
From: David Miller @ 2017-04-27 16:40 UTC
  To: hannes; +Cc: daniel, netdev, ast, daniel, jbenc, aconole
In-Reply-To: <d99407db-895c-01c0-8e26-c6bd2b79d4ff@stressinduktion.org>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 27 Apr 2017 18:28:17 +0200

> Merely I tried to establish the procfs interface as quick look
> interface

Show me that "quick look" nftables dumping facility via procfs
and I'll start to listen to you.

What you are proposing has no real value once we have bpf() system
call based traversal, and it has no strict precedent across the
networking subsystem.

Thank you.


* status of bpf binutils
From: David Miller @ 2017-04-27 16:39 UTC
  To: ast; +Cc: daniel, netdev, xdp-newbies


As I hinted the other day I'm hacking on BPF support in binutils.

Here is what works right now:

1) Disassembly of object files made by existing tools.  I can build things
   with clang/llvm on my sparc and analyze the resulting object files
   using objdump and gdb.

[davem@dhcp-10-15-49-210 build-bpf]$ ./binutils/objdump -d ./x.o

./x.o:     file format elf64-bpf


Disassembly of section test1:

0000000000000000 <process>:
   0:	b7 00 00 00 00 00 00 02 	mov	r0, 2
   8:	61 21 00 50 00 00 00 00 	ldw	r2, [r1+80]
 ...

[davem@dhcp-10-15-49-210 build-bpf]$ ./gdb/gdb ./x.o
GNU gdb (GDB) 8.0.50.20170427-git
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=bpf-elf".
 ...
(gdb) x/10i process
   0x0 <process>:	mov	r0, 2
   0x8 <process+8>:	ldw	r2, [r1+80]
   0x10 <process+16>:	ldw	r1, [r1+76]
   0x18 <process+24>:	mov	r4, r1
   0x20 <process+32>:	add	r4, 14
   0x28 <process+40>:	jgt	r4, r2, 0x148 <LBB0_11>
   0x30 <process+48>:	ldb	r5, [r1+13]
   0x38 <process+56>:	ldb	r3, [r1+12]
   0x40 <process+64>:	lsh	r3, 8
   0x48 <process+72>:	or	r3, r5
(gdb)

2) Simple assembler programs compile.

[davem@dhcp-10-15-49-210 build-bpf]$ cat gas/x.s
	.text
	.align	8
	.globl	foo
foo:	mov	r1, 2
	ja	1f
	mov	r1, 3
1:	exit
[davem@dhcp-10-15-49-210 build-bpf]$ gas/as-new -o gas/x.o gas/x.s
[davem@dhcp-10-15-49-210 build-bpf]$ ./binutils/objdump -d gas/x.o

gas/x.o:     file format elf64-bpf


Disassembly of section .text:

0000000000000000 <foo>:
   0:	b7 10 00 00 00 00 00 02 	mov	r1, 2
   8:	05 00 00 03 00 00 00 00 	ja	18 <foo+0x18>
  10:	b7 10 00 00 00 00 00 03 	mov	r1, 3
  18:	95 00 00 00 00 00 00 00 	exit	
[davem@dhcp-10-15-49-210 build-bpf]$
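
For readers following the hex columns in the objdump output above, here
is a minimal C sketch of how those 8 bytes split into fields. It assumes
the big-endian layout this work-in-progress port currently emits: opcode
in byte 0, two register nibbles in byte 1, a 16-bit offset in bytes 2..3
and a 32-bit immediate in bytes 4..7. The struct and function names are
made up for illustration and are not part of the patch. Calling the high
nibble "dst" is a reading of the "ldw r2, [r1+80]" line (bytes
61 21 00 50 ...), not something the patch states.

```c
#include <stdint.h>

/* Fields of one 8-byte instruction under the assumed big-endian layout.
 * Names are illustrative only. */
struct bpf_fields {
	uint8_t opcode;	/* byte 0 */
	uint8_t dst;	/* high nibble of byte 1 */
	uint8_t src;	/* low nibble of byte 1 */
	int16_t off;	/* bytes 2..3, big endian */
	int32_t imm;	/* bytes 4..7, big endian */
};

/* Split 8 raw instruction bytes into their fields. */
static struct bpf_fields bpf_decode_be(const uint8_t b[8])
{
	struct bpf_fields f;

	f.opcode = b[0];
	f.dst = b[1] >> 4;
	f.src = b[1] & 0x0f;
	f.off = (int16_t)(uint16_t)((b[2] << 8) | b[3]);
	f.imm = (int32_t)(((uint32_t)b[4] << 24) | ((uint32_t)b[5] << 16) |
			  ((uint32_t)b[6] << 8) | (uint32_t)b[7]);
	return f;
}
```

Fed the bytes b7 10 00 00 00 00 00 03, this yields opcode 0xb7, dst 1
and imm 3, in line with the "mov r1, 3" line above.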

I've created a few ELF relocations for BPF.  There are really only 3
main things to consider:

1) Immediate field, 32-bit
2) Offset field, 16-bit absolute
3) Offset field, 16-bit PC-relative displacement

and thus:

/* Relocation types.  */
START_RELOC_NUMBERS (elf_bpf_reloc_type)
  RELOC_NUMBER (R_BPF_NONE, 0)
  RELOC_NUMBER (R_BPF_16, 1)
  RELOC_NUMBER (R_BPF_32, 2)
  RELOC_NUMBER (R_BPF_WDISP16, 3)
END_RELOC_NUMBERS (R_BPF_max)

is what goes into include/elf/bpf.h

I just realized while writing this that I'll need to add an R_BPF_64
to handle ldimm64 instructions, but that's not a big deal.
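
As a rough illustration of what the two absolute relocation kinds in the
list above have to touch, the sketch below stores a resolved value into
the immediate field and into the offset field of one instruction. It
assumes the big-endian layout the assembler in the patch currently
produces (offset in bytes 2..3, immediate in bytes 4..7). The function
names are hypothetical, and this is not how the BFD backend is or will be
written. The PC-relative case is left out, since it also needs the
branch's own address and a displacement scaling.

```c
#include <stdint.h>

/* Store a resolved 32-bit value into the immediate field (bytes 4..7),
 * roughly what an R_BPF_32-style fixup would need to do. */
static void patch_imm32_be(uint8_t insn[8], uint32_t value)
{
	insn[4] = (uint8_t)(value >> 24);
	insn[5] = (uint8_t)(value >> 16);
	insn[6] = (uint8_t)(value >> 8);
	insn[7] = (uint8_t)value;
}

/* Store a resolved 16-bit value into the offset field (bytes 2..3),
 * roughly what an R_BPF_16-style fixup would need to do. */
static void patch_off16_be(uint8_t insn[8], uint16_t value)
{
	insn[2] = (uint8_t)(value >> 8);
	insn[3] = (uint8_t)value;
}
```

Neither helper touches the opcode or register bytes, so a fixup can be
applied after the rest of the instruction has been emitted.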

I'm going to concentrate on the assembler for now, and start writing
test cases.

Another area I have not resolved completely is endianness.  Right now,
just for my hacking and testing, I'm forcing everything to be big
endian, which of course will not be the final default :-)

The current patch against:

	git://sourceware.org/git/binutils-gdb.git

is below.

If you want to play with this, configure with "--target=bpf-elf".


From 2e193eecf3eee1c0632f5c1932f76ff387c49ae2 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Wed, 26 Apr 2017 14:27:53 -0400
Subject: [PATCH] Start adding BPF support...

---
 bfd/Makefile.am            |   2 +
 bfd/Makefile.in            |   3 +
 bfd/archures.c             |   3 +
 bfd/bfd-in2.h              |   5 +
 bfd/config.bfd             |   5 +
 bfd/configure              |   1 +
 bfd/configure.ac           |   1 +
 bfd/cpu-bpf.c              |  41 +++++
 bfd/elf64-bpf.c            |  47 +++++
 bfd/libbfd.h               |   1 +
 bfd/reloc.c                |   5 +
 bfd/targets.c              |   3 +
 config.sub                 |   5 +-
 gas/Makefile.am            |   2 +
 gas/Makefile.in            |  17 ++
 gas/config/tc-bpf.c        | 447 +++++++++++++++++++++++++++++++++++++++++++++
 gas/config/tc-bpf.h        |  38 ++++
 gas/configure.tgt          |   3 +
 gdb/bpf-tdep.c             | 229 +++++++++++++++++++++++
 gdb/bpf-tdep.h             |  40 ++++
 gdb/configure.tgt          |   4 +
 include/dis-asm.h          |   1 +
 include/elf/bpf.h          |  34 ++++
 include/opcode/bpf.h       |  16 ++
 ld/Makefile.am             |   4 +
 ld/Makefile.in             |   5 +
 ld/configure.tgt           |   2 +
 ld/emulparams/elf64_bpf.sh |   8 +
 opcodes/Makefile.am        |   2 +
 opcodes/bpf-dis.c          | 152 +++++++++++++++
 opcodes/bpf-opc.c          | 147 +++++++++++++++
 opcodes/configure          |   1 +
 opcodes/configure.ac       |   1 +
 opcodes/disassemble.c      |   6 +
 34 files changed, 1279 insertions(+), 2 deletions(-)
 create mode 100644 bfd/cpu-bpf.c
 create mode 100644 bfd/elf64-bpf.c
 create mode 100644 gas/config/tc-bpf.c
 create mode 100644 gas/config/tc-bpf.h
 create mode 100644 gdb/bpf-tdep.c
 create mode 100644 gdb/bpf-tdep.h
 create mode 100644 include/elf/bpf.h
 create mode 100644 include/opcode/bpf.h
 create mode 100644 ld/emulparams/elf64_bpf.sh
 create mode 100644 opcodes/bpf-dis.c
 create mode 100644 opcodes/bpf-opc.c

diff --git a/bfd/Makefile.am b/bfd/Makefile.am
index 97b608c..911655a 100644
--- a/bfd/Makefile.am
+++ b/bfd/Makefile.am
@@ -95,6 +95,7 @@ ALL_MACHINES = \
 	cpu-arm.lo \
 	cpu-avr.lo \
 	cpu-bfin.lo \
+	cpu-bpf.lo \
 	cpu-cr16.lo \
 	cpu-cr16c.lo \
 	cpu-cris.lo \
@@ -185,6 +186,7 @@ ALL_MACHINES_CFILES = \
 	cpu-arm.c \
 	cpu-avr.c \
 	cpu-bfin.c \
+	cpu-bpf.c \
 	cpu-cr16.c \
 	cpu-cr16c.c \
 	cpu-cris.c \
diff --git a/bfd/Makefile.in b/bfd/Makefile.in
index e48abaf..930aa09 100644
--- a/bfd/Makefile.in
+++ b/bfd/Makefile.in
@@ -428,6 +428,7 @@ ALL_MACHINES = \
 	cpu-arm.lo \
 	cpu-avr.lo \
 	cpu-bfin.lo \
+	cpu-bpf.lo \
 	cpu-cr16.lo \
 	cpu-cr16c.lo \
 	cpu-cris.lo \
@@ -518,6 +519,7 @@ ALL_MACHINES_CFILES = \
 	cpu-arm.c \
 	cpu-avr.c \
 	cpu-bfin.c \
+	cpu-bpf.c \
 	cpu-cr16.c \
 	cpu-cr16c.c \
 	cpu-cris.c \
@@ -1380,6 +1382,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-arm.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-avr.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-bfin.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-bpf.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-cr16.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-cr16c.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/cpu-cris.Plo@am__quote@
diff --git a/bfd/archures.c b/bfd/archures.c
index c6e7152..f096d73 100644
--- a/bfd/archures.c
+++ b/bfd/archures.c
@@ -447,6 +447,8 @@ DESCRIPTION
 .#define bfd_mach_avrxmega7 107
 .  bfd_arch_bfin,        {* ADI Blackfin *}
 .#define bfd_mach_bfin          1
+.  bfd_arch_bpf,        {* eBPF *}
+.#define bfd_mach_bpf           1
 .  bfd_arch_cr16,       {* National Semiconductor CompactRISC (ie CR16). *}
 .#define bfd_mach_cr16		1
 .  bfd_arch_cr16c,       {* National Semiconductor CompactRISC. *}
@@ -582,6 +584,7 @@ extern const bfd_arch_info_type bfd_arc_arch;
 extern const bfd_arch_info_type bfd_arm_arch;
 extern const bfd_arch_info_type bfd_avr_arch;
 extern const bfd_arch_info_type bfd_bfin_arch;
+extern const bfd_arch_info_type bfd_bpf_arch;
 extern const bfd_arch_info_type bfd_cr16_arch;
 extern const bfd_arch_info_type bfd_cr16c_arch;
 extern const bfd_arch_info_type bfd_cris_arch;
diff --git a/bfd/bfd-in2.h b/bfd/bfd-in2.h
index 17a35c0..b4db6b2 100644
--- a/bfd/bfd-in2.h
+++ b/bfd/bfd-in2.h
@@ -2304,6 +2304,8 @@ enum bfd_architecture
 #define bfd_mach_avrxmega7 107
   bfd_arch_bfin,        /* ADI Blackfin */
 #define bfd_mach_bfin          1
+  bfd_arch_bpf,        /* eBPF */
+#define bfd_mach_bpf           1
   bfd_arch_cr16,       /* National Semiconductor CompactRISC (ie CR16). */
 #define bfd_mach_cr16          1
   bfd_arch_cr16c,       /* National Semiconductor CompactRISC. */
@@ -3910,6 +3912,9 @@ pc-relative or some form of GOT-indirect relocation.  */
 /* ADI Blackfin arithmetic relocation.  */
   BFD_ARELOC_BFIN_ADDR,
 
+/* BPF relocations  */
+  BFD_RELOC_BPF_WDISP16,
+
 /* Mitsubishi D10V relocs.
 This is a 10-bit reloc with the right 2 bits
 assumed to be 0.  */
diff --git a/bfd/config.bfd b/bfd/config.bfd
index 151de95..0cbccae 100644
--- a/bfd/config.bfd
+++ b/bfd/config.bfd
@@ -161,6 +161,7 @@ am33_2.0*)	 targ_archs=bfd_mn10300_arch ;;
 arc*)		 targ_archs=bfd_arc_arch ;;
 arm*)		 targ_archs=bfd_arm_arch ;;
 bfin*)		 targ_archs=bfd_bfin_arch ;;
+bpf*)		 targ_archs=bfd_bpf_arch ;;
 c30*)		 targ_archs=bfd_tic30_arch ;;
 c4x*)		 targ_archs=bfd_tic4x_arch ;;
 c54x*)		 targ_archs=bfd_tic54x_arch ;;
@@ -471,6 +472,10 @@ case "${targ}" in
     targ_underscore=yes
     ;;
 
+  bpf-*-*)
+    targ_defvec=bpf_elf64_vec
+    ;;
+
   c30-*-*aout* | tic30-*-*aout*)
     targ_defvec=tic30_aout_vec
     ;;
diff --git a/bfd/configure b/bfd/configure
index 24e3e2f..d1a67bb 100755
--- a/bfd/configure
+++ b/bfd/configure
@@ -14298,6 +14298,7 @@ do
     avr_elf32_vec)		 tb="$tb elf32-avr.lo elf32.lo $elf" ;;
     bfin_elf32_vec)		 tb="$tb elf32-bfin.lo elf32.lo $elf" ;;
     bfin_elf32_fdpic_vec)	 tb="$tb elf32-bfin.lo elf32.lo $elf" ;;
+    bpf_elf64_vec)		 tb="$tb elf64-bpf.lo elf64.lo $elf" ;;
     bout_be_vec)		 tb="$tb bout.lo aout32.lo" ;;
     bout_le_vec)		 tb="$tb bout.lo aout32.lo" ;;
     cr16_elf32_vec)		 tb="$tb elf32-cr16.lo elf32.lo $elf" ;;
diff --git a/bfd/configure.ac b/bfd/configure.ac
index e568847..00c6690 100644
--- a/bfd/configure.ac
+++ b/bfd/configure.ac
@@ -429,6 +429,7 @@ do
     avr_elf32_vec)		 tb="$tb elf32-avr.lo elf32.lo $elf" ;;
     bfin_elf32_vec)		 tb="$tb elf32-bfin.lo elf32.lo $elf" ;;
     bfin_elf32_fdpic_vec)	 tb="$tb elf32-bfin.lo elf32.lo $elf" ;;
+    bpf_elf64_vec)		 tb="$tb elf64-bpf.lo elf64.lo $elf" ;;
     bout_be_vec)		 tb="$tb bout.lo aout32.lo" ;;
     bout_le_vec)		 tb="$tb bout.lo aout32.lo" ;;
     cr16_elf32_vec)		 tb="$tb elf32-cr16.lo elf32.lo $elf" ;;
diff --git a/bfd/cpu-bpf.c b/bfd/cpu-bpf.c
new file mode 100644
index 0000000..551e42e
--- /dev/null
+++ b/bfd/cpu-bpf.c
@@ -0,0 +1,41 @@
+/* BFD Support for the eBPF.
+
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of BFD, the Binary File Descriptor library.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+#include "sysdep.h"
+#include "bfd.h"
+#include "libbfd.h"
+
+const bfd_arch_info_type bfd_bpf_arch =
+  {
+    64,     		/* Bits in a word.  */
+    64,  		/* Bits in an address.  */
+    8,     		/* Bits in a byte.  */
+    bfd_arch_bpf,
+    0,                	/* Only one machine.  */
+    "bpf",        	/* Arch name.  */
+    "bpf",        	/* Arch printable name.  */
+    3,                	/* Section align power.  */
+    TRUE,             	/* The one and only.  */
+    bfd_default_compatible,
+    bfd_default_scan,
+    bfd_arch_default_fill,
+    0,
+  };
diff --git a/bfd/elf64-bpf.c b/bfd/elf64-bpf.c
new file mode 100644
index 0000000..76f14e7
--- /dev/null
+++ b/bfd/elf64-bpf.c
@@ -0,0 +1,47 @@
+#include "sysdep.h"
+#include "bfd.h"
+#include "libbfd.h"
+#include "elf-bfd.h"
+#include "opcode/bpf.h"
+
+static void
+check_for_relocs (bfd * abfd, asection * o, void * failed)
+{
+  if ((o->flags & SEC_RELOC) != 0)
+    {
+      Elf_Internal_Ehdr *ehdrp;
+
+      ehdrp = elf_elfheader (abfd);
+      /* xgettext:c-format */
+      _bfd_error_handler (_("%B: Relocations in generic ELF (EM: %d)"),
+			  abfd, ehdrp->e_machine);
+
+      bfd_set_error (bfd_error_wrong_format);
+      * (bfd_boolean *) failed = TRUE;
+    }
+}
+
+static bfd_boolean
+elf64_generic_link_add_symbols (bfd *abfd, struct bfd_link_info *info)
+{
+  bfd_boolean failed = FALSE;
+
+  /* Check if there are any relocations.  */
+  bfd_map_over_sections (abfd, check_for_relocs, & failed);
+
+  if (failed)
+    return FALSE;
+  return bfd_elf_link_add_symbols (abfd, info);
+}
+
+#define TARGET_BIG_SYM		bpf_elf64_vec
+#define TARGET_BIG_NAME		"elf64-bpf"
+#define ELF_ARCH		bfd_arch_bpf
+#define ELF_MAXPAGESIZE		0x100000
+#define ELF_MACHINE_CODE	EM_BPF
+
+#define bfd_elf64_bfd_reloc_type_lookup bfd_default_reloc_type_lookup
+#define bfd_elf64_bfd_reloc_name_lookup _bfd_norelocs_bfd_reloc_name_lookup
+#define bfd_elf64_bfd_link_add_symbols	elf64_generic_link_add_symbols
+
+#include "elf64-target.h"
diff --git a/bfd/libbfd.h b/bfd/libbfd.h
index 8bac650..01c6d84 100644
--- a/bfd/libbfd.h
+++ b/bfd/libbfd.h
@@ -1794,6 +1794,7 @@ static const char *const bfd_reloc_code_real_names[] = { "@@uninitialized@@",
   "BFD_ARELOC_BFIN_PAGE",
   "BFD_ARELOC_BFIN_HWPAGE",
   "BFD_ARELOC_BFIN_ADDR",
+  "BFD_RELOC_BPF_WDISP16",
   "BFD_RELOC_D10V_10_PCREL_R",
   "BFD_RELOC_D10V_10_PCREL_L",
   "BFD_RELOC_D10V_18",
diff --git a/bfd/reloc.c b/bfd/reloc.c
index 9a04022..39dc3b2 100644
--- a/bfd/reloc.c
+++ b/bfd/reloc.c
@@ -3854,6 +3854,11 @@ ENUMDOC
   ADI Blackfin arithmetic relocation.
 
 ENUM
+  BFD_RELOC_BPF_WDISP16
+ENUMDOC
+  BPF relocations
+
+ENUM
   BFD_RELOC_D10V_10_PCREL_R
 ENUMDOC
   Mitsubishi D10V relocs.
diff --git a/bfd/targets.c b/bfd/targets.c
index 5841e8d..799e2bb 100644
--- a/bfd/targets.c
+++ b/bfd/targets.c
@@ -619,6 +619,7 @@ extern const bfd_target arm_pei_wince_le_vec;
 extern const bfd_target avr_elf32_vec;
 extern const bfd_target bfin_elf32_vec;
 extern const bfd_target bfin_elf32_fdpic_vec;
+extern const bfd_target bpf_elf64_vec;
 extern const bfd_target bout_be_vec;
 extern const bfd_target bout_le_vec;
 extern const bfd_target cr16_elf32_vec;
@@ -1029,6 +1030,8 @@ static const bfd_target * const _bfd_target_vector[] =
 	&bfin_elf32_vec,
 	&bfin_elf32_fdpic_vec,
 
+	&bpf_elf64_vec,
+
 	&bout_be_vec,
 	&bout_le_vec,
 
diff --git a/config.sub b/config.sub
index 40ea5df..942989e 100755
--- a/config.sub
+++ b/config.sub
@@ -2,7 +2,7 @@
 # Configuration validation subroutine script.
 #   Copyright 1992-2017 Free Software Foundation, Inc.
 
-timestamp='2017-04-02'
+timestamp='2017-04-25'
 
 # This file is free software; you can redistribute it and/or modify it
 # under the terms of the GNU General Public License as published by
@@ -257,6 +257,7 @@ case $basic_machine in
 	| ba \
 	| be32 | be64 \
 	| bfin \
+	| bpf \
 	| c4x | c8051 | clipper \
 	| d10v | d30v | dlx | dsp16xx \
 	| e2k | epiphany \
@@ -380,7 +381,7 @@ case $basic_machine in
 	| avr-* | avr32-* \
 	| ba-* \
 	| be32-* | be64-* \
-	| bfin-* | bs2000-* \
+	| bfin-* | bpf-* | bs2000-* \
 	| c[123]* | c30-* | [cjt]90-* | c4x-* \
 	| c8051-* | clipper-* | craynv-* | cydra-* \
 	| d10v-* | d30v-* | dlx-* \
diff --git a/gas/Makefile.am b/gas/Makefile.am
index c9f9de0..bfd6ed9 100644
--- a/gas/Makefile.am
+++ b/gas/Makefile.am
@@ -135,6 +135,7 @@ TARGET_CPU_CFILES = \
 	config/tc-arm.c \
 	config/tc-avr.c \
 	config/tc-bfin.c \
+	config/tc-bpf.c \
 	config/tc-cr16.c \
 	config/tc-cris.c \
 	config/tc-crx.c \
@@ -212,6 +213,7 @@ TARGET_CPU_HFILES = \
 	config/tc-arm.h \
 	config/tc-avr.h \
 	config/tc-bfin.h \
+	config/tc-bpf.h \
 	config/tc-cr16.h \
 	config/tc-cris.h \
 	config/tc-crx.h \
diff --git a/gas/Makefile.in b/gas/Makefile.in
index 1927de5..ee62f1a 100644
--- a/gas/Makefile.in
+++ b/gas/Makefile.in
@@ -431,6 +431,7 @@ TARGET_CPU_CFILES = \
 	config/tc-arm.c \
 	config/tc-avr.c \
 	config/tc-bfin.c \
+	config/tc-bpf.c \
 	config/tc-cr16.c \
 	config/tc-cris.c \
 	config/tc-crx.c \
@@ -508,6 +509,7 @@ TARGET_CPU_HFILES = \
 	config/tc-arm.h \
 	config/tc-avr.h \
 	config/tc-bfin.h \
+	config/tc-bpf.h \
 	config/tc-cr16.h \
 	config/tc-cris.h \
 	config/tc-crx.h \
@@ -868,6 +870,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-arm.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-avr.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-bfin.Po@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-bpf.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-cr16.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-cris.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tc-crx.Po@am__quote@
@@ -1045,6 +1048,20 @@ tc-bfin.obj: config/tc-bfin.c
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o tc-bfin.obj `if test -f 'config/tc-bfin.c'; then $(CYGPATH_W) 'config/tc-bfin.c'; else $(CYGPATH_W) '$(srcdir)/config/tc-bfin.c'; fi`
 
+tc-bpf.o: config/tc-bpf.c
+@am__fastdepCC_TRUE@	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT tc-bpf.o -MD -MP -MF $(DEPDIR)/tc-bpf.Tpo -c -o tc-bpf.o `test -f 'config/tc-bpf.c' || echo '$(srcdir)/'`config/tc-bpf.c
+@am__fastdepCC_TRUE@	$(am__mv) $(DEPDIR)/tc-bpf.Tpo $(DEPDIR)/tc-bpf.Po
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='config/tc-bpf.c' object='tc-bpf.o' libtool=no @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o tc-bpf.o `test -f 'config/tc-bpf.c' || echo '$(srcdir)/'`config/tc-bpf.c
+
+tc-bpf.obj: config/tc-bpf.c
+@am__fastdepCC_TRUE@	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT tc-bpf.obj -MD -MP -MF $(DEPDIR)/tc-bpf.Tpo -c -o tc-bpf.obj `if test -f 'config/tc-bpf.c'; then $(CYGPATH_W) 'config/tc-bpf.c'; else $(CYGPATH_W) '$(srcdir)/config/tc-bpf.c'; fi`
+@am__fastdepCC_TRUE@	$(am__mv) $(DEPDIR)/tc-bpf.Tpo $(DEPDIR)/tc-bpf.Po
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='config/tc-bpf.c' object='tc-bpf.obj' libtool=no @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o tc-bpf.obj `if test -f 'config/tc-bpf.c'; then $(CYGPATH_W) 'config/tc-bpf.c'; else $(CYGPATH_W) '$(srcdir)/config/tc-bpf.c'; fi`
+
 tc-cr16.o: config/tc-cr16.c
 @am__fastdepCC_TRUE@	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT tc-cr16.o -MD -MP -MF $(DEPDIR)/tc-cr16.Tpo -c -o tc-cr16.o `test -f 'config/tc-cr16.c' || echo '$(srcdir)/'`config/tc-cr16.c
 @am__fastdepCC_TRUE@	$(am__mv) $(DEPDIR)/tc-cr16.Tpo $(DEPDIR)/tc-cr16.Po
diff --git a/gas/config/tc-bpf.c b/gas/config/tc-bpf.c
new file mode 100644
index 0000000..334e228
--- /dev/null
+++ b/gas/config/tc-bpf.c
@@ -0,0 +1,447 @@
+/* tc-bpf.c -- Assemble for BPF
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of GAS, the GNU Assembler.
+
+   GAS is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GAS is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public
+   License along with GAS; see the file COPYING.  If not, write
+   to the Free Software Foundation, 51 Franklin Street - Fifth Floor,
+   Boston, MA 02110-1301, USA.  */
+
+#include "as.h"
+#include "safe-ctype.h"
+#include "subsegs.h"
+#include "opcode/bpf.h"
+#ifdef OBJ_ELF
+#include "elf/bpf.h"
+#include "dwarf2dbg.h"
+#endif
+
+const pseudo_typeS md_pseudo_table[] =
+{
+  {"align", s_align_bytes, 0},	/* Defaulting is invalid (0).  */
+  {"global", s_globl, 0},
+  {"half", cons, 2},
+  {"skip", s_space, 0},
+  {"word", cons, 4},
+  {"xword", cons, 8},
+  {NULL, 0, 0},
+};
+
+const char comment_chars[] = "!";
+const char line_comment_chars[] = "#";
+const char line_separator_chars[] = ";";
+const char EXP_CHARS[] = "eE";
+const char FLT_CHARS[] = "rRsSfFdDxXpP";
+
+const char *md_shortopts = "";
+struct option md_longopts[] =
+{
+  { NULL,		no_argument,		NULL, 0                 },
+};
+size_t md_longopts_size = sizeof (md_longopts);
+
+int
+md_parse_option (int c ATTRIBUTE_UNUSED, const char *arg ATTRIBUTE_UNUSED)
+{
+  return 0;
+}
+
+void
+md_show_usage (FILE *stream)
+{
+  fprintf (stream, _("BPF options:\n"));
+}
+
+/* Handle of the OPCODE hash table.  */
+static struct hash_control *op_hash;
+
+void
+md_begin (void)
+{
+  const char *retval = NULL;
+  unsigned int i = 0;
+  int lose = 0;
+
+  op_hash = hash_new ();
+  while (i < (unsigned int) bpf_num_opcodes)
+    {
+      const char *name = bpf_opcodes[i].name;
+      retval = hash_insert (op_hash, name, (void *) &bpf_opcodes[i]);
+      if (retval != NULL)
+	{
+	  as_bad (_("Internal error: can't hash `%s': %s\n"),
+		  bpf_opcodes[i].name, retval);
+	  lose = 1;
+	}
+      do
+	{
+	  ++i;
+	}
+      while (i < (unsigned int) bpf_num_opcodes
+	     && !strcmp (bpf_opcodes[i].name, name));
+    }
+  if (lose)
+    as_fatal (_("Broken assembler.  No assembly attempted."));
+
+
+}
+
+struct bpf_it
+  {
+    const char *error;
+    valueT opcode;
+    expressionS exp;
+    int pcrel;
+    bfd_reloc_code_real_type reloc;
+  };
+
+/* Subroutine of md_assemble to output one insn.  */
+
+static void
+output_insn (struct bpf_it *theinsn)
+{
+  char *toP = frag_more (8);
+
+  /* Put out the opcode.  */
+  if (target_big_endian)
+    {
+      number_to_chars_bigendian (toP, theinsn->opcode, 8);
+    }
+  else
+    {
+      number_to_chars_littleendian (toP, theinsn->opcode, 8);
+    }
+
+  /* Put out the symbol-dependent stuff.  */
+  if (theinsn->reloc != BFD_RELOC_NONE)
+    {
+      fixS *fixP =  fix_new_exp (frag_now,	/* Which frag.  */
+				 (toP - frag_now->fr_literal),	/* Where.  */
+				 4,		/* Size.  */
+				 &theinsn->exp,
+				 theinsn->pcrel,
+				 theinsn->reloc);
+      /* Turn off overflow checking in fixup_segment.  We'll do our
+	 own overflow checking in md_apply_fix.  This is necessary because
+	 the insn size is 4 and fixup_segment will signal an overflow for
+	 large 8 byte quantities.  */
+      fixP->fx_no_overflow = 1;
+    }
+
+#ifdef OBJ_ELF
+  dwarf2_emit_insn (8);
+#endif
+}
+
+static struct bpf_it the_insn;
+static char *expr_end;
+
+static int
+get_expression (char *str, expressionS *exp)
+{
+  char *save_in;
+  segT seg;
+
+  save_in = input_line_pointer;
+  input_line_pointer = str;
+  seg = expression (exp);
+  if (seg != absolute_section
+      && seg != text_section
+      && seg != data_section
+      && seg != bss_section
+      && seg != undefined_section)
+    {
+      the_insn.error = _("bad segment");
+      expr_end = input_line_pointer;
+      input_line_pointer = save_in;
+      return 1;
+    }
+  expr_end = input_line_pointer;
+  input_line_pointer = save_in;
+  return 0;
+}
+
+void
+md_assemble (char *str ATTRIBUTE_UNUSED)
+{
+  const struct bpf_opcode *insn;
+  const char *args;
+  char *argsStart;
+  int match = 0;
+  valueT mask;
+  char *s, c;
+
+  s = str;
+  if (ISLOWER (*s))
+    {
+      do
+	++s;
+      while (ISLOWER (*s) || ISDIGIT (*s) || *s == '_');
+    }
+
+  switch (*s)
+    {
+    case '\0':
+      break;
+
+    case ' ':
+      *s++ = '\0';
+      break;
+
+    default:
+      as_bad (_("Unknown opcode: `%s'"), str);
+      return;
+    }
+  insn = (struct bpf_opcode *) hash_find (op_hash, str);
+
+  if (insn == NULL)
+    {
+      as_bad (_("Unknown opcode: `%s'"), str);
+      return;
+    }
+
+  argsStart = s;
+  for (;;)
+    {
+      memset (&the_insn, '\0', sizeof (the_insn));
+      the_insn.reloc = BFD_RELOC_NONE;
+      the_insn.opcode = ((valueT)insn->code << 56);
+
+      for (args = insn->args;; args++)
+	{
+	  switch (*args)
+	    {
+	    case '+':
+	    case ',':
+	    case '[':
+	    case ']':
+	      if (*s++ == *args)
+		continue;
+	      break;
+	    case '1':
+	      if (*s++ == 'r')
+		{
+		  if (!ISDIGIT ((c = *s++)))
+		    {
+		      goto error;
+		    }
+		  c -= '0';
+		  mask = c;
+		  the_insn.opcode |= (mask << 52);
+		  continue;
+		}
+	      break;
+	    case '2':
+	      if (*s++ == 'r')
+		{
+		  if (!ISDIGIT ((c = *s++)))
+		    {
+		      goto error;
+		    }
+		  c -= '0';
+		  mask = c;
+		  the_insn.opcode |= (mask << 48);
+		  continue;
+		}
+	      break;
+	    case 'i':
+	      the_insn.reloc = BFD_RELOC_32;
+	      if (*s == ' ')
+		s++;
+	      get_expression (s, &the_insn.exp);
+	      s = expr_end;
+	      if (the_insn.exp.X_op == O_constant
+		  && the_insn.exp.X_add_symbol == 0
+		  && the_insn.exp.X_op_symbol == 0)
+		{
+		  valueT val = the_insn.exp.X_add_number;
+
+		  the_insn.reloc = BFD_RELOC_NONE;
+		  val &= 0xffffffff;
+		  the_insn.opcode |= val;
+		}
+	      continue;
+	    case 'O':
+	      the_insn.reloc = BFD_RELOC_16;
+	      if (*s == ' ')
+		s++;
+	      get_expression (s, &the_insn.exp);
+	      s = expr_end;
+	      if (the_insn.exp.X_op == O_constant
+		  && the_insn.exp.X_add_symbol == 0
+		  && the_insn.exp.X_op_symbol == 0)
+		{
+		  valueT val = the_insn.exp.X_add_number;
+
+		  the_insn.reloc = BFD_RELOC_NONE;
+		  val &= 0xffff;
+		  the_insn.opcode |= val << 32;
+		}
+	      continue;
+	    case 'L':
+	      the_insn.reloc = BFD_RELOC_BPF_WDISP16;
+	      the_insn.pcrel = 1;
+	      if (*s == ' ')
+		s++;
+	      get_expression (s, &the_insn.exp);
+	      s = expr_end;
+	      if (the_insn.exp.X_op == O_constant
+		  && the_insn.exp.X_add_symbol == 0
+		  && the_insn.exp.X_op_symbol == 0)
+		{
+		  valueT val = the_insn.exp.X_add_number;
+
+		  the_insn.reloc = BFD_RELOC_NONE;
+		  val &= 0xffff;
+		  the_insn.opcode |= val << 32;
+		}
+	      continue;
+	    case 'C':
+	      break;
+	    case 'D':
+	      break;
+	    case '\0':		/* End of args.  */
+	      match = 1;
+	      break;
+	    default:
+	      as_fatal (_("failed sanity check."));
+	    }
+
+	  /* Break out of for() loop.  */
+	  break;
+	}
+    error:
+      if (match == 0)
+	{
+	  /* Args don't match.  */
+	  if (&insn[1] - bpf_opcodes < bpf_num_opcodes
+	      && (insn->name == insn[1].name
+		  || !strcmp (insn->name, insn[1].name)))
+	    {
+	      ++insn;
+	      s = argsStart;
+	      continue;
+	    }
+	  else
+	    {
+	      as_bad (_("Illegal operands%s"), "");
+	      return;
+	    }
+	}
+      break;
+    }
+
+  output_insn (&the_insn);
+}
+
+void
+md_number_to_chars (char *buf ATTRIBUTE_UNUSED, valueT val ATTRIBUTE_UNUSED, int n ATTRIBUTE_UNUSED)
+{
+}
+
+void
+md_apply_fix (fixS *fixP, valueT *valP ATTRIBUTE_UNUSED, segT segment ATTRIBUTE_UNUSED)
+{
+  char *buf = fixP->fx_where + fixP->fx_frag->fr_literal;
+  offsetT val = * (offsetT *) valP;
+
+  gas_assert (fixP->fx_r_type < BFD_RELOC_UNUSED);
+  /* If this is a data relocation, just output VAL.  */
+
+  if (fixP->fx_r_type == BFD_RELOC_8)
+    {
+      md_number_to_chars (buf, val, 1);
+    }
+  else if (fixP->fx_r_type == BFD_RELOC_16)
+    {
+      md_number_to_chars (buf, val, 2);
+    }
+  else if (fixP->fx_r_type == BFD_RELOC_32)
+    {
+      md_number_to_chars (buf, val, 4);
+    }
+  else if (fixP->fx_r_type == BFD_RELOC_64)
+    {
+      md_number_to_chars (buf, val, 8);
+    }
+  else if (fixP->fx_r_type == BFD_RELOC_VTABLE_INHERIT
+           || fixP->fx_r_type == BFD_RELOC_VTABLE_ENTRY)
+    {
+      fixP->fx_done = 0;
+      return;
+    }
+  else
+    {
+      long insn;
+
+      if (target_big_endian)
+	insn = bfd_getb32 ((unsigned char *) buf);
+      else
+	insn = bfd_getl32 ((unsigned char *) buf);
+
+      /* It's a relocation against an instruction.  */
+
+      switch (fixP->fx_r_type)
+	{
+	case BFD_RELOC_BPF_WDISP16:
+	  val = val  >> 3;
+	  insn |= (val + 1) & 0xffff;
+	  break;
+	case BFD_RELOC_NONE:
+	default:
+	  as_bad_where (fixP->fx_file, fixP->fx_line,
+			_("bad or unhandled relocation type: 0x%02x"),
+			fixP->fx_r_type);
+	  break;
+	}
+
+      if (target_big_endian)
+	bfd_putb32 (insn, (unsigned char *) buf);
+      else
+	bfd_putl32 (insn, (unsigned char *) buf);
+    }
+}
+
+arelent *
+tc_gen_reloc (asection *section ATTRIBUTE_UNUSED, fixS *fixp ATTRIBUTE_UNUSED)
+{
+  return NULL;
+}
+
+symbolS *
+md_undefined_symbol (char *name ATTRIBUTE_UNUSED)
+{
+  return 0;
+}
+
+valueT
+md_section_align (segT segment ATTRIBUTE_UNUSED, valueT size)
+{
+  return size;
+}
+
+long
+md_pcrel_from (fixS *fixP)
+{
+  long ret;
+
+  ret = fixP->fx_where + fixP->fx_frag->fr_address;
+  /* XXX */
+  return ret;
+}
+
+const char *
+md_atof (int type, char *litP, int *sizeP)
+{
+  return ieee_md_atof (type, litP, sizeP, target_big_endian);
+}
diff --git a/gas/config/tc-bpf.h b/gas/config/tc-bpf.h
new file mode 100644
index 0000000..013e5ed
--- /dev/null
+++ b/gas/config/tc-bpf.h
@@ -0,0 +1,38 @@
+/* tc-bpf.h - Macros and type defines for the bpf.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of GAS, the GNU Assembler.
+
+   GAS is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as
+   published by the Free Software Foundation; either version 3,
+   or (at your option) any later version.
+
+   GAS is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
+   the GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public
+   License along with GAS; see the file COPYING.  If not, write
+   to the Free Software Foundation, 51 Franklin Street - Fifth Floor,
+   Boston, MA 02110-1301, USA.  */
+
+#ifndef TC_BPF
+#define TC_BPF 1
+
+#define TARGET_ARCH			bfd_arch_bpf
+#define TARGET_FORMAT			"elf64-bpf"
+#define TARGET_BYTES_BIG_ENDIAN		1
+
+#define md_convert_frag(b,s,f) \
+  as_fatal (_("bpf convert_frag\n"))
+#define md_estimate_size_before_relax(f,s) \
+  (as_fatal (_("estimate_size_before_relax called")), 1)
+#define md_operand(x)
+
+#define LISTING_HEADER "BPF GAS "
+
+#define WORKING_DOT_WORD
+
+#endif
diff --git a/gas/configure.tgt b/gas/configure.tgt
index ca58b69..fa959c3 100644
--- a/gas/configure.tgt
+++ b/gas/configure.tgt
@@ -54,6 +54,7 @@ case ${cpu} in
   arm*be|arm*b)		cpu_type=arm endian=big ;;
   arm*)			cpu_type=arm endian=little ;;
   bfin*)		cpu_type=bfin endian=little ;;
+  bpf*)			cpu_type=bpf ;;
   c4x*)			cpu_type=tic4x ;;
   cr16*)		cpu_type=cr16 endian=little ;;
   crisv32)		cpu_type=cris arch=crisv32 ;;
@@ -171,6 +172,8 @@ case ${generic_target} in
   bfin-*-uclinux*)			fmt=elf em=linux ;;
   bfin-*elf)				fmt=elf ;;
 
+  bpf-*elf)				fmt=elf ;;
+
   cr16-*-elf*)				fmt=elf ;;
 
   cris-*-linux-* | crisv32-*-linux-*)
diff --git a/gdb/bpf-tdep.c b/gdb/bpf-tdep.c
new file mode 100644
index 0000000..6629f73
--- /dev/null
+++ b/gdb/bpf-tdep.c
@@ -0,0 +1,229 @@
+/* Target-dependent code for eBPF, for GDB.
+
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "inferior.h"
+#include "gdbcore.h"
+#include "arch-utils.h"
+#include "regcache.h"
+#include "frame.h"
+#include "frame-unwind.h"
+#include "frame-base.h"
+#include "trad-frame.h"
+#include "dis-asm.h"
+#include "dwarf2-frame.h"
+#include "symtab.h"
+#include "elf-bfd.h"
+#include "osabi.h"
+#include "infcall.h"
+#include "bpf-tdep.h"
+
+static const char * const bpf_register_name_strings[] =
+{
+  "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
+  "r8", "r9", "r10", "pc",
+};
+
+#define NUM_BPF_REGNAMES ARRAY_SIZE (bpf_register_name_strings)
+
+/* Return the BPF register name corresponding to register I.  */
+
+static const char *
+bpf_register_name (struct gdbarch *gdbarch, int i)
+{
+  return bpf_register_name_strings[i];
+}
+
+/* Return the GDB type object for the "standard" data type of data in
+   register N.  */
+
+static struct type *
+bpf_register_type (struct gdbarch *gdbarch, int regnum)
+{
+  if (regnum == BPF_R10_REGNUM)
+    return builtin_type (gdbarch)->builtin_data_ptr;
+
+  if (regnum == BPF_PC_REGNUM)
+    return builtin_type (gdbarch)->builtin_func_ptr;
+
+  return builtin_type (gdbarch)->builtin_int64;
+}
+
+/* Convert DWARF2 register number REG to the appropriate register number
+   used by GDB.  */
+
+static int
+bpf_reg_to_regnum (struct gdbarch *gdbarch, int reg)
+{
+  if (reg < 0 || reg >= BPF_NUM_REGS)
+    return -1;
+
+  return reg;
+}
+
+static struct frame_id
+bpf_dummy_id (struct gdbarch *gdbarch, struct frame_info *this_frame)
+{
+  CORE_ADDR sp;
+
+  sp = get_frame_register_unsigned (this_frame, BPF_R10_REGNUM);
+
+  return frame_id_build (sp, get_frame_pc (this_frame));
+}
+
+static CORE_ADDR
+bpf_push_dummy_call (struct gdbarch *gdbarch,
+		      struct value *function,
+		      struct regcache *regcache,
+		      CORE_ADDR bp_addr,
+		      int nargs,
+		      struct value **args,
+		      CORE_ADDR sp,
+		      int struct_return,
+		      CORE_ADDR struct_addr)
+{
+  return sp; /* XXX */
+}
+
+/* Extract a function return value of TYPE from REGCACHE, and copy
+   that into VALBUF.  */
+
+static void
+bpf_extract_return_value (struct type *type, struct regcache *regcache,
+			  gdb_byte *valbuf)
+{
+  int len = TYPE_LENGTH (type);
+  gdb_byte buf[8];
+
+  regcache_cooked_read (regcache, BPF_R0_REGNUM, buf);
+  memcpy (valbuf, buf + 8 - len, len);
+}
+
+/* Store the function return value of type TYPE from VALBUF into
+   REGCACHE.  */
+
+static void
+bpf_store_return_value (struct type *type, struct regcache *regcache,
+			const gdb_byte *valbuf)
+{
+  int len = TYPE_LENGTH (type);
+  gdb_byte buf[8];
+
+  memcpy (buf + 8 - len, valbuf, len);
+  regcache_cooked_write (regcache, BPF_R0_REGNUM, buf);
+}
+
+/* Determine, for architecture GDBARCH, how a return value of TYPE
+   should be returned.  If it is supposed to be returned in registers,
+   and READBUF is nonzero, read the appropriate value from REGCACHE,
+   and copy it into READBUF.  If WRITEBUF is nonzero, write the value
+   from WRITEBUF into REGCACHE.  */
+
+static enum return_value_convention
+bpf_return_value (struct gdbarch *gdbarch,
+		   struct value *function,
+		   struct type *type,
+		   struct regcache *regcache,
+		   gdb_byte *readbuf,
+		   const gdb_byte *writebuf)
+{
+  if (TYPE_LENGTH (type) > 8)
+    return RETURN_VALUE_STRUCT_CONVENTION;
+
+  if (readbuf)
+    bpf_extract_return_value (type, regcache, readbuf);
+
+  if (writebuf)
+    bpf_store_return_value (type, regcache, writebuf);
+
+  return RETURN_VALUE_REGISTER_CONVENTION;
+}
+
+static CORE_ADDR
+bpf_unwind_pc (struct gdbarch *gdbarch, struct frame_info *next_frame)
+{
+  return frame_unwind_register_unsigned (next_frame, BPF_PC_REGNUM);
+}
+
+/* Skip all the insns that appear in generated function prologues.  */
+
+static CORE_ADDR
+bpf_skip_prologue (struct gdbarch *gdbarch, CORE_ADDR pc)
+{
+  return pc;
+}
+
+/* Implement the breakpoint_kind_from_pc gdbarch method.  */
+
+static int
+bpf_breakpoint_kind_from_pc (struct gdbarch *gdbarch, CORE_ADDR *pcptr)
+{
+  return 8;
+}
+
+/* Initialize the current architecture based on INFO.  If possible,
+   re-use an architecture from ARCHES, which is a list of
+   architectures already created during this debugging session.
+
+   Called e.g. at program startup, when reading a core file, and when
+   reading a binary file.  */
+
+static struct gdbarch *
+bpf_gdbarch_init (struct gdbarch_info info, struct gdbarch_list *arches)
+{
+  struct gdbarch_tdep *tdep;
+  struct gdbarch *gdbarch;
+
+  tdep = XNEW (struct gdbarch_tdep);
+  gdbarch = gdbarch_alloc (&info, tdep);
+
+  tdep->xxx = 0;
+
+  set_gdbarch_num_regs (gdbarch, BPF_NUM_REGS);
+  set_gdbarch_sp_regnum (gdbarch, BPF_R10_REGNUM);
+  set_gdbarch_pc_regnum (gdbarch, BPF_PC_REGNUM);
+  set_gdbarch_dwarf2_reg_to_regnum (gdbarch, bpf_reg_to_regnum);
+  set_gdbarch_register_name (gdbarch, bpf_register_name);
+  set_gdbarch_register_type (gdbarch, bpf_register_type);
+  set_gdbarch_dummy_id (gdbarch, bpf_dummy_id);
+  set_gdbarch_push_dummy_call (gdbarch, bpf_push_dummy_call);
+  set_gdbarch_return_value (gdbarch, bpf_return_value);
+  set_gdbarch_inner_than (gdbarch, core_addr_lessthan);
+  set_gdbarch_frame_args_skip (gdbarch, 8);
+  set_gdbarch_unwind_pc (gdbarch, bpf_unwind_pc);
+  set_gdbarch_print_insn (gdbarch, print_insn_bpf);
+
+  set_gdbarch_skip_prologue (gdbarch, bpf_skip_prologue);
+  set_gdbarch_breakpoint_kind_from_pc (gdbarch, bpf_breakpoint_kind_from_pc);
+
+  /* Hook in ABI-specific overrides, if they have been registered.  */
+  gdbarch_init_osabi (info, gdbarch);
+
+  dwarf2_append_unwinders (gdbarch);
+  return gdbarch;
+}
+
+/* Provide a prototype to silence -Wmissing-prototypes.  */
+extern initialize_file_ftype _initialize_bpf_tdep;
+
+void
+_initialize_bpf_tdep (void)
+{
+  register_gdbarch_init (bfd_arch_bpf, bpf_gdbarch_init);
+}
diff --git a/gdb/bpf-tdep.h b/gdb/bpf-tdep.h
new file mode 100644
index 0000000..52cae6d
--- /dev/null
+++ b/gdb/bpf-tdep.h
@@ -0,0 +1,40 @@
+/* Target-dependent code for eBPF, for GDB.
+
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+enum gdb_regnum {
+  BPF_R0_REGNUM = 0,
+  BPF_R1_REGNUM,
+  BPF_R2_REGNUM,
+  BPF_R3_REGNUM,
+  BPF_R4_REGNUM,
+  BPF_R5_REGNUM,
+  BPF_R6_REGNUM,
+  BPF_R7_REGNUM,
+  BPF_R8_REGNUM,
+  BPF_R9_REGNUM,
+  BPF_R10_REGNUM,
+  BPF_PC_REGNUM,
+};
+
+#define BPF_NUM_REGS	(BPF_PC_REGNUM + 1)
+
+struct gdbarch_tdep
+{
+  int xxx;
+};
diff --git a/gdb/configure.tgt b/gdb/configure.tgt
index fdcb7b1..e8d5fb4 100644
--- a/gdb/configure.tgt
+++ b/gdb/configure.tgt
@@ -142,6 +142,10 @@ bfin-*-*)
 	gdb_sim=../sim/bfin/libsim.a
 	;;
 
+bpf*)
+	# Target: eBPF
+	gdb_target_obs="bpf-tdep.o"
+	;;
 cris*)
 	# Target: CRIS
 	gdb_target_obs="cris-tdep.o cris-linux-tdep.o linux-tdep.o solib-svr4.o"
diff --git a/include/dis-asm.h b/include/dis-asm.h
index 6f1801d..cbfebc8 100644
--- a/include/dis-asm.h
+++ b/include/dis-asm.h
@@ -241,6 +241,7 @@ extern int print_insn_aarch64		(bfd_vma, disassemble_info *);
 extern int print_insn_alpha		(bfd_vma, disassemble_info *);
 extern int print_insn_avr		(bfd_vma, disassemble_info *);
 extern int print_insn_bfin		(bfd_vma, disassemble_info *);
+extern int print_insn_bpf		(bfd_vma, disassemble_info *);
 extern int print_insn_big_arm		(bfd_vma, disassemble_info *);
 extern int print_insn_big_mips		(bfd_vma, disassemble_info *);
 extern int print_insn_big_nios2		(bfd_vma, disassemble_info *);
diff --git a/include/elf/bpf.h b/include/elf/bpf.h
new file mode 100644
index 0000000..6360db8
--- /dev/null
+++ b/include/elf/bpf.h
@@ -0,0 +1,34 @@
+/* BPF ELF support for BFD.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of BFD, the Binary File Descriptor library.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+#ifndef _ELF_BPF_H
+#define _ELF_BPF_H
+
+#include "elf/reloc-macros.h"
+
+/* Relocation types.  */
+START_RELOC_NUMBERS (elf_bpf_reloc_type)
+  RELOC_NUMBER (R_BPF_NONE, 0)
+  RELOC_NUMBER (R_BPF_16, 1)
+  RELOC_NUMBER (R_BPF_32, 2)
+  RELOC_NUMBER (R_BPF_WDISP16, 3)
+END_RELOC_NUMBERS (R_BPF_max)
+
+#endif /* _ELF_BPF_H */
diff --git a/include/opcode/bpf.h b/include/opcode/bpf.h
new file mode 100644
index 0000000..298ed1b
--- /dev/null
+++ b/include/opcode/bpf.h
@@ -0,0 +1,16 @@
+#ifndef OPCODE_BPF_H
+#define OPCODE_BPF_H
+
+/* Structure of an opcode table entry.  */
+
+typedef struct bpf_opcode
+{
+  const char *name;
+  unsigned char code;
+  const char *args;
+} bpf_opcode;
+
+extern const struct bpf_opcode bpf_opcodes[];
+extern const int bpf_num_opcodes;
+
+#endif /* OPCODE_BPF_H */
diff --git a/ld/Makefile.am b/ld/Makefile.am
index 3aa7e80..d840bed 100644
--- a/ld/Makefile.am
+++ b/ld/Makefile.am
@@ -477,6 +477,7 @@ ALL_64_EMULATION_SOURCES = \
 	eelf32ltsmipn32_fbsd.c \
 	eelf32mipswindiss.c \
 	eelf64_aix.c \
+	eelf64_bpf.c \
 	eelf64_ia64.c \
 	eelf64_ia64_fbsd.c \
 	eelf64_ia64_vms.c \
@@ -1920,6 +1921,9 @@ eelf32_x86_64_nacl.c: $(srcdir)/emulparams/elf32_x86_64_nacl.sh \
 eelf64_aix.c: $(srcdir)/emulparams/elf64_aix.sh \
   $(ELF_DEPS) $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
 
+eelf64_bpf.c: $(srcdir)/emulparams/elf64_bpf.sh \
+  $(ELF_DEPS) $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
+
 eelf64_ia64.c: $(srcdir)/emulparams/elf64_ia64.sh \
   $(ELF_DEPS) $(srcdir)/emultempl/ia64elf.em \
   $(srcdir)/emultempl/needrelax.em \
diff --git a/ld/Makefile.in b/ld/Makefile.in
index f485f4f..706a889 100644
--- a/ld/Makefile.in
+++ b/ld/Makefile.in
@@ -845,6 +845,7 @@ ALL_64_EMULATION_SOURCES = \
 	eelf32ltsmipn32_fbsd.c \
 	eelf32mipswindiss.c \
 	eelf64_aix.c \
+	eelf64_bpf.c \
 	eelf64_ia64.c \
 	eelf64_ia64_fbsd.c \
 	eelf64_ia64_vms.c \
@@ -1292,6 +1293,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf32xstormy16.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf32xtensa.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf64_aix.Po@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf64_bpf.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf64_ia64.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf64_ia64_fbsd.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/eelf64_ia64_vms.Po@am__quote@
@@ -3484,6 +3486,9 @@ eelf32_x86_64_nacl.c: $(srcdir)/emulparams/elf32_x86_64_nacl.sh \
 eelf64_aix.c: $(srcdir)/emulparams/elf64_aix.sh \
   $(ELF_DEPS) $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
 
+eelf64_bpf.c: $(srcdir)/emulparams/elf64_bpf.sh \
+  $(ELF_DEPS) $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
+
 eelf64_ia64.c: $(srcdir)/emulparams/elf64_ia64.sh \
   $(ELF_DEPS) $(srcdir)/emultempl/ia64elf.em \
   $(srcdir)/emultempl/needrelax.em \
diff --git a/ld/configure.tgt b/ld/configure.tgt
index 895f0fb..13645f5 100644
--- a/ld/configure.tgt
+++ b/ld/configure.tgt
@@ -177,6 +177,8 @@ bfin-*-linux-uclibc*)	targ_emul=elf32bfinfd;
 			targ_extra_emuls="elf32bfin"
 			targ_extra_libpath=$targ_extra_emuls
 			;;
+bpf-*-elf)		targ_emul=elf64_bpf
+			;;
 cr16-*-elf*)            targ_emul=elf32cr16 ;;
 cr16c-*-elf*)           targ_emul=elf32cr16c
 			;;
diff --git a/ld/emulparams/elf64_bpf.sh b/ld/emulparams/elf64_bpf.sh
new file mode 100644
index 0000000..0e1e549
--- /dev/null
+++ b/ld/emulparams/elf64_bpf.sh
@@ -0,0 +1,8 @@
+# See genscripts.sh and ../scripttempl/elf.sc for the meaning of these.
+SCRIPT_NAME=elf
+ELFSIZE=64
+TEMPLATE_NAME=elf32
+OUTPUT_FORMAT="elf64-bpf"
+TARGET_PAGE_SIZE=0x1000
+ARCH=bpf
+MACHINE=
diff --git a/opcodes/Makefile.am b/opcodes/Makefile.am
index 1ac6bb1..ccc9453 100644
--- a/opcodes/Makefile.am
+++ b/opcodes/Makefile.am
@@ -105,6 +105,8 @@ TARGET_LIBOPCODES_CFILES = \
 	arm-dis.c \
 	avr-dis.c \
 	bfin-dis.c \
+	bpf-dis.c \
+	bpf-opc.c \
 	cgen-asm.c \
 	cgen-bitset.c \
 	cgen-dis.c \
diff --git a/opcodes/bpf-dis.c b/opcodes/bpf-dis.c
new file mode 100644
index 0000000..2a0b7da
--- /dev/null
+++ b/opcodes/bpf-dis.c
@@ -0,0 +1,152 @@
+#include "sysdep.h"
+#include <stdio.h>
+#include "opcode/bpf.h"
+#include "dis-asm.h"
+#include "libiberty.h"
+
+#define HASH_SIZE 256
+#define HASH_INSN(CODE)	(CODE)
+
+typedef struct bpf_opcode_hash
+{
+  struct bpf_opcode_hash *next;
+  const bpf_opcode *opcode;
+} bpf_opcode_hash;
+
+static bpf_opcode_hash *opcode_hash_table[HASH_SIZE];
+
+static void
+build_hash_table (const bpf_opcode *opcode_table,
+		  bpf_opcode_hash **hash_table,
+		  int num_opcodes)
+{
+  static bpf_opcode_hash *hash_buf = NULL;
+  int i;
+
+  memset (hash_table, 0, HASH_SIZE * sizeof (hash_table[0]));
+  if (hash_buf != NULL)
+    free (hash_buf);
+  hash_buf = xmalloc (sizeof (* hash_buf) * num_opcodes);
+  for (i = num_opcodes - 1; i >= 0; --i)
+    {
+      int hash = HASH_INSN (opcode_table[i].code);
+      bpf_opcode_hash *h = &hash_buf[i];
+
+      h->next = hash_table[hash];
+      h->opcode = &opcode_table[i];
+      hash_table[hash] = h;
+    }
+}
+
+int
+print_insn_bpf (bfd_vma memaddr, disassemble_info *info)
+{
+  static unsigned long current_mach = 0;
+  static int opcodes_initialized = 0;
+  bfd_vma (*getword) (const void *);
+  bfd_vma (*gethalf) (const void *);
+  FILE *stream = info->stream;
+  bpf_opcode_hash *op;
+  int code, dest, src;
+  bfd_byte buffer[8];
+  unsigned short off;
+  int status, ret;
+  signed int imm;
+
+  if (!opcodes_initialized
+      || info->mach != current_mach)
+    {
+      build_hash_table (bpf_opcodes, opcode_hash_table, bpf_num_opcodes);
+      current_mach = info->mach;
+      opcodes_initialized = 1;
+    }
+
+  info->bytes_per_line = 8;
+
+  status = (*info->read_memory_func) (memaddr, buffer, sizeof (buffer), info);
+  if (status != 0)
+    {
+      (*info->memory_error_func) (status, memaddr, info);
+      return -1;
+    }
+
+  if (info->endian == BFD_ENDIAN_BIG)
+    {
+      getword = bfd_getb32;
+      gethalf = bfd_getb16;
+    }
+  else
+    {
+      getword = bfd_getl32;
+      gethalf = bfd_getl16;
+    }
+
+  code = buffer[0];
+  dest = (buffer[1] & 0xf0) >> 4;
+  src = buffer[1] & 0x0f;
+  off = gethalf (&buffer[2]);
+  imm = getword (&buffer[4]);
+
+  ret = sizeof (buffer);
+  for (op = opcode_hash_table[HASH_INSN (code)]; op; op = op->next)
+    {
+      const bpf_opcode *opcode = op->opcode;
+      BFD_HOST_U_64_BIT value;
+      signed int imm2;
+      const char *s;
+
+      if (opcode->code != code)
+	continue;
+
+      (*info->fprintf_func) (stream, "%s\t", opcode->name);
+      for (s = opcode->args; *s != '\0'; s++)
+	{
+	  switch (*s)
+	    {
+	    case '+':
+	    default:
+	      (*info->fprintf_func) (stream, "%c", *s);
+	      break;
+	    case ',':
+	      (*info->fprintf_func) (stream, ", ");
+	      break;
+	    case '1':
+	      (*info->fprintf_func) (stream, "r%d", dest);
+	      break;
+	    case '2':
+	      (*info->fprintf_func) (stream, "r%d", src);
+	      break;
+	    case 'i':
+	      (*info->fprintf_func) (stream, "%d", imm);
+	      break;
+	    case 'O':
+	      (*info->fprintf_func) (stream, "%d", off);
+	      break;
+	    case 'L':
+	      info->target = memaddr + ((off - 1) * 8);
+	      (*info->print_address_func) (info->target, info);
+	      break;
+	    case 'C':
+	      info->target = imm;
+	      (*info->print_address_func) (info->target, info);
+	      break;
+	    case 'D':
+	      status = (*info->read_memory_func) (memaddr + 8, buffer,
+						  sizeof (buffer), info);
+	      if (status != 0)
+		{
+		  (*info->memory_error_func) (status, memaddr + 8, info);
+		  return -1;
+		}
+	      ret += sizeof (buffer);
+	      imm2 = getword (&buffer[4]);
+	      value = ((BFD_HOST_U_64_BIT) (unsigned) imm2) << 32;
+	      value |= (BFD_HOST_U_64_BIT) (unsigned) imm;
+	      (*info->fprintf_func) (stream, "%lld", (long long) value);
+	      break;
+	    }
+	}
+    }
+
+  return ret;
+}
diff --git a/opcodes/bpf-opc.c b/opcodes/bpf-opc.c
new file mode 100644
index 0000000..bca8e47
--- /dev/null
+++ b/opcodes/bpf-opc.c
@@ -0,0 +1,147 @@
+#include "sysdep.h"
+#include <stdio.h>
+#include "opcode/bpf.h"
+
+#define BPF_OPC_ALU64	0x07
+#define BPF_OPC_DW	0x18
+#define BPF_OPC_XADD	0xc0
+#define BPF_OPC_MOV	0xb0
+#define BPF_OPC_ARSH	0xc0
+#define BPF_OPC_END	0xd0
+#define BPF_OPC_TO_LE	0x00
+#define BPF_OPC_TO_BE	0x08
+#define BPF_OPC_JNE	0x50
+#define BPF_OPC_JSGT	0x60
+#define BPF_OPC_JSGE	0x70
+#define BPF_OPC_CALL	0x80
+#define BPF_OPC_EXIT	0x90
+
+#define BPF_OPC_LD	0x00
+#define BPF_OPC_LDX	0x01
+#define BPF_OPC_ST	0x02
+#define BPF_OPC_STX	0x03
+#define BPF_OPC_ALU	0x04
+#define BPF_OPC_JMP	0x05
+#define BPF_OPC_RET	0x06
+#define BPF_OPC_MISC	0x07
+
+#define BPF_OPC_W	0x00
+#define BPF_OPC_H	0x08
+#define BPF_OPC_B	0x10
+
+#define BPF_OPC_IMM	0x00
+#define BPF_OPC_ABS	0x20
+#define BPF_OPC_IND	0x40
+#define BPF_OPC_MEM	0x60
+#define BPF_OPC_LEN	0x80
+#define BPF_OPC_MSH	0xa0
+
+#define BPF_OPC_ADD	0x00
+#define BPF_OPC_SUB	0x10
+#define BPF_OPC_MUL	0x20
+#define BPF_OPC_DIV	0x30
+#define BPF_OPC_OR	0x40
+#define BPF_OPC_AND	0x50
+#define BPF_OPC_LSH	0x60
+#define BPF_OPC_RSH	0x70
+#define BPF_OPC_NEG	0x80
+#define BPF_OPC_MOD	0x90
+#define BPF_OPC_XOR	0xa0
+
+#define BPF_OPC_JA	0x00
+#define BPF_OPC_JEQ	0x10
+#define BPF_OPC_JGT	0x20
+#define BPF_OPC_JGE	0x30
+#define BPF_OPC_JSET	0x40
+
+#define BPF_OPC_K	0x00
+#define BPF_OPC_X	0x08
+
+const struct bpf_opcode bpf_opcodes[] = {
+  { "mov32",   BPF_OPC_ALU   | BPF_OPC_MOV  | BPF_OPC_X,     "1,2" },
+  { "mov32",   BPF_OPC_ALU   | BPF_OPC_MOV  | BPF_OPC_K,     "1,i" },
+  { "mov",     BPF_OPC_ALU64 | BPF_OPC_MOV  | BPF_OPC_X,     "1,2" },
+  { "mov",     BPF_OPC_ALU64 | BPF_OPC_MOV  | BPF_OPC_K,     "1,i" },
+  { "add32",   BPF_OPC_ALU   | BPF_OPC_ADD  | BPF_OPC_X,     "1,2" },
+  { "add32",   BPF_OPC_ALU   | BPF_OPC_ADD  | BPF_OPC_K,     "1,i" },
+  { "add",     BPF_OPC_ALU64 | BPF_OPC_ADD  | BPF_OPC_X,     "1,2" },
+  { "add",     BPF_OPC_ALU64 | BPF_OPC_ADD  | BPF_OPC_K,     "1,i" },
+  { "sub32",   BPF_OPC_ALU   | BPF_OPC_SUB  | BPF_OPC_X,     "1,2" },
+  { "sub32",   BPF_OPC_ALU   | BPF_OPC_SUB  | BPF_OPC_K,     "1,i" },
+  { "sub",     BPF_OPC_ALU64 | BPF_OPC_SUB  | BPF_OPC_X,     "1,2" },
+  { "sub",     BPF_OPC_ALU64 | BPF_OPC_SUB  | BPF_OPC_K,     "1,i" },
+  { "and32",   BPF_OPC_ALU   | BPF_OPC_AND  | BPF_OPC_X,     "1,2" },
+  { "and32",   BPF_OPC_ALU   | BPF_OPC_AND  | BPF_OPC_K,     "1,i" },
+  { "and",     BPF_OPC_ALU64 | BPF_OPC_AND  | BPF_OPC_X,     "1,2" },
+  { "and",     BPF_OPC_ALU64 | BPF_OPC_AND  | BPF_OPC_K,     "1,i" },
+  { "or32",    BPF_OPC_ALU   | BPF_OPC_OR   | BPF_OPC_X,     "1,2" },
+  { "or32",    BPF_OPC_ALU   | BPF_OPC_OR   | BPF_OPC_K,     "1,i" },
+  { "or",      BPF_OPC_ALU64 | BPF_OPC_OR   | BPF_OPC_X,     "1,2" },
+  { "or",      BPF_OPC_ALU64 | BPF_OPC_OR   | BPF_OPC_K,     "1,i" },
+  { "xor32",   BPF_OPC_ALU   | BPF_OPC_XOR  | BPF_OPC_X,     "1,2" },
+  { "xor32",   BPF_OPC_ALU   | BPF_OPC_XOR  | BPF_OPC_K,     "1,i" },
+  { "xor",     BPF_OPC_ALU64 | BPF_OPC_XOR  | BPF_OPC_X,     "1,2" },
+  { "xor",     BPF_OPC_ALU64 | BPF_OPC_XOR  | BPF_OPC_K,     "1,i" },
+  { "mul32",   BPF_OPC_ALU   | BPF_OPC_MUL  | BPF_OPC_X,     "1,2" },
+  { "mul32",   BPF_OPC_ALU   | BPF_OPC_MUL  | BPF_OPC_K,     "1,i" },
+  { "mul",     BPF_OPC_ALU64 | BPF_OPC_MUL  | BPF_OPC_X,     "1,2" },
+  { "mul",     BPF_OPC_ALU64 | BPF_OPC_MUL  | BPF_OPC_K,     "1,i" },
+  { "div32",   BPF_OPC_ALU   | BPF_OPC_DIV  | BPF_OPC_X,     "1,2" },
+  { "div32",   BPF_OPC_ALU   | BPF_OPC_DIV  | BPF_OPC_K,     "1,i" },
+  { "div",     BPF_OPC_ALU64 | BPF_OPC_DIV  | BPF_OPC_X,     "1,2" },
+  { "div",     BPF_OPC_ALU64 | BPF_OPC_DIV  | BPF_OPC_K,     "1,i" },
+  { "mod32",   BPF_OPC_ALU   | BPF_OPC_MOD  | BPF_OPC_X,     "1,2" },
+  { "mod32",   BPF_OPC_ALU   | BPF_OPC_MOD  | BPF_OPC_K,     "1,i" },
+  { "mod",     BPF_OPC_ALU64 | BPF_OPC_MOD  | BPF_OPC_X,     "1,2" },
+  { "mod",     BPF_OPC_ALU64 | BPF_OPC_MOD  | BPF_OPC_K,     "1,i" },
+  { "lsh32",   BPF_OPC_ALU   | BPF_OPC_LSH  | BPF_OPC_X,     "1,2" },
+  { "lsh32",   BPF_OPC_ALU   | BPF_OPC_LSH  | BPF_OPC_K,     "1,i" },
+  { "lsh",     BPF_OPC_ALU64 | BPF_OPC_LSH  | BPF_OPC_X,     "1,2" },
+  { "lsh",     BPF_OPC_ALU64 | BPF_OPC_LSH  | BPF_OPC_K,     "1,i" },
+  { "rsh32",   BPF_OPC_ALU   | BPF_OPC_RSH  | BPF_OPC_X,     "1,2" },
+  { "rsh32",   BPF_OPC_ALU   | BPF_OPC_RSH  | BPF_OPC_K,     "1,i" },
+  { "rsh",     BPF_OPC_ALU64 | BPF_OPC_RSH  | BPF_OPC_X,     "1,2" },
+  { "rsh",     BPF_OPC_ALU64 | BPF_OPC_RSH  | BPF_OPC_K,     "1,i" },
+  { "arsh32",  BPF_OPC_ALU   | BPF_OPC_ARSH | BPF_OPC_X,     "1,2" },
+  { "arsh32",  BPF_OPC_ALU   | BPF_OPC_ARSH | BPF_OPC_K,     "1,i" },
+  { "arsh",    BPF_OPC_ALU64 | BPF_OPC_ARSH | BPF_OPC_X,     "1,2" },
+  { "arsh",    BPF_OPC_ALU64 | BPF_OPC_ARSH | BPF_OPC_K,     "1,i" },
+  { "neg32",   BPF_OPC_ALU   | BPF_OPC_NEG  | BPF_OPC_X,     "1" },
+  { "neg",     BPF_OPC_ALU64 | BPF_OPC_NEG  | BPF_OPC_X,     "1" },
+  { "endbe",   BPF_OPC_ALU   | BPF_OPC_END  | BPF_OPC_TO_BE, "1,i" },
+  { "endle",   BPF_OPC_ALU   | BPF_OPC_END  | BPF_OPC_TO_LE, "1,i" },
+  { "ja",      BPF_OPC_JMP   | BPF_OPC_JA,                   "L" },
+  { "jeq",     BPF_OPC_JMP   | BPF_OPC_JEQ  | BPF_OPC_X,     "1,2,L" },
+  { "jeq",     BPF_OPC_JMP   | BPF_OPC_JEQ  | BPF_OPC_K,     "1,i,L" },
+  { "jgt",     BPF_OPC_JMP   | BPF_OPC_JGT  | BPF_OPC_X,     "1,2,L" },
+  { "jgt",     BPF_OPC_JMP   | BPF_OPC_JGT  | BPF_OPC_K,     "1,i,L" },
+  { "jge",     BPF_OPC_JMP   | BPF_OPC_JGE  | BPF_OPC_X,     "1,2,L" },
+  { "jge",     BPF_OPC_JMP   | BPF_OPC_JGE  | BPF_OPC_K,     "1,i,L" },
+  { "jne",     BPF_OPC_JMP   | BPF_OPC_JNE  | BPF_OPC_X,     "1,2,L" },
+  { "jne",     BPF_OPC_JMP   | BPF_OPC_JNE  | BPF_OPC_K,     "1,i,L" },
+  { "jsgt",    BPF_OPC_JMP   | BPF_OPC_JSGT | BPF_OPC_X,     "1,2,L" },
+  { "jsgt",    BPF_OPC_JMP   | BPF_OPC_JSGT | BPF_OPC_K,     "1,i,L" },
+  { "jsge",    BPF_OPC_JMP   | BPF_OPC_JSGE | BPF_OPC_X,     "1,2,L" },
+  { "jsge",    BPF_OPC_JMP   | BPF_OPC_JSGE | BPF_OPC_K,     "1,i,L" },
+  { "jset",    BPF_OPC_JMP   | BPF_OPC_JSET | BPF_OPC_X,     "1,2,L" },
+  { "jset",    BPF_OPC_JMP   | BPF_OPC_JSET | BPF_OPC_K,     "1,i,L" },
+  { "call",    BPF_OPC_JMP   | BPF_OPC_CALL,                 "C" },
+  { "tailcall",BPF_OPC_JMP   | BPF_OPC_CALL | BPF_OPC_X,     "C" },
+  { "exit",    BPF_OPC_JMP   | BPF_OPC_EXIT,                 "" },
+  { "ldimm64", BPF_OPC_LD    | BPF_OPC_IMM  | BPF_OPC_DW,    "1,D" },
+  { "ldw",     BPF_OPC_LDX   | BPF_OPC_MEM  | BPF_OPC_W,     "1,[2+O]" },
+  { "ldh",     BPF_OPC_LDX   | BPF_OPC_MEM  | BPF_OPC_H,     "1,[2+O]" },
+  { "ldb",     BPF_OPC_LDX   | BPF_OPC_MEM  | BPF_OPC_B,     "1,[2+O]" },
+  { "lddw",    BPF_OPC_LDX   | BPF_OPC_MEM  | BPF_OPC_DW,    "1,[2+O]" },
+  { "stw",     BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_W,     "[1+O],2" },
+  { "stw",     BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_W,     "[1+O],i" },
+  { "sth",     BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_H,     "[1+O],2" },
+  { "sth",     BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_H,     "[1+O],i" },
+  { "stb",     BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_B,     "[1+O],2" },
+  { "stb",     BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_B,     "[1+O],i" },
+  { "stdw",    BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_DW,    "[1+O],2" },
+  { "stdw",    BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_DW,    "[1+O],i" },
+  { "xaddw",   BPF_OPC_STX   | BPF_OPC_XADD | BPF_OPC_W,     "[1+O],2" },
+  { "xadddw",  BPF_OPC_STX   | BPF_OPC_XADD | BPF_OPC_DW,    "[1+O],2" },
+};
+const int bpf_num_opcodes = ((sizeof bpf_opcodes)/(sizeof bpf_opcodes[0]));
diff --git a/opcodes/configure b/opcodes/configure
index 27d1472..7583220 100755
--- a/opcodes/configure
+++ b/opcodes/configure
@@ -12634,6 +12634,7 @@ if test x${all_targets} = xfalse ; then
 	bfd_arm_arch)		ta="$ta arm-dis.lo" ;;
 	bfd_avr_arch)		ta="$ta avr-dis.lo" ;;
 	bfd_bfin_arch)		ta="$ta bfin-dis.lo" ;;
+	bfd_bpf_arch)		ta="$ta bpf-dis.lo bpf-opc.lo" ;;
 	bfd_cr16_arch)		ta="$ta cr16-dis.lo cr16-opc.lo" ;;
 	bfd_cris_arch)		ta="$ta cris-dis.lo cris-opc.lo cgen-bitset.lo" ;;
 	bfd_crx_arch)		ta="$ta crx-dis.lo crx-opc.lo" ;;
diff --git a/opcodes/configure.ac b/opcodes/configure.ac
index a9fbfd6..7dc6a92 100644
--- a/opcodes/configure.ac
+++ b/opcodes/configure.ac
@@ -258,6 +258,7 @@ if test x${all_targets} = xfalse ; then
 	bfd_arm_arch)		ta="$ta arm-dis.lo" ;;
 	bfd_avr_arch)		ta="$ta avr-dis.lo" ;;
 	bfd_bfin_arch)		ta="$ta bfin-dis.lo" ;;
+	bfd_bpf_arch)		ta="$ta bpf-dis.lo bpf-opc.lo" ;;
 	bfd_cr16_arch)		ta="$ta cr16-dis.lo cr16-opc.lo" ;;
 	bfd_cris_arch)		ta="$ta cris-dis.lo cris-opc.lo cgen-bitset.lo" ;;
 	bfd_crx_arch)		ta="$ta crx-dis.lo crx-opc.lo" ;;
diff --git a/opcodes/disassemble.c b/opcodes/disassemble.c
index dd7d3a3..e594f86 100644
--- a/opcodes/disassemble.c
+++ b/opcodes/disassemble.c
@@ -29,6 +29,7 @@
 #define ARCH_arm
 #define ARCH_avr
 #define ARCH_bfin
+#define ARCH_bpf
 #define ARCH_cr16
 #define ARCH_cris
 #define ARCH_crx
@@ -151,6 +152,11 @@ disassembler (bfd *abfd)
       disassemble = print_insn_bfin;
       break;
 #endif
+#ifdef ARCH_bpf
+    case bfd_arch_bpf:
+      disassemble = print_insn_bpf;
+      break;
+#endif
 #ifdef ARCH_cr16
     case bfd_arch_cr16:
       disassemble = print_insn_cr16;
-- 
2.7.4


* Re: [PATCH v1 net-next 5/6] net: allow simultaneous SW and HW transmit timestamping
From: Miroslav Lichvar @ 2017-04-27 16:39 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Richard Cochran, Willem de Bruijn,
	Soheil Hassas Yeganeh, Keller, Jacob E, Denny Page, Jiri Benc
In-Reply-To: <CAF=yD-+HK-dCG_XjqBKfkSF1bjJavTr7EFgeFNH2yRc2CXgOxA@mail.gmail.com>

On Thu, Apr 27, 2017 at 12:21:00PM -0400, Willem de Bruijn wrote:
> >> > @@ -720,6 +720,7 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
> >> >                 empty = 0;
> >> >         if (shhwtstamps &&
> >> >             (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
> >> > +           (empty || !skb_is_err_queue(skb)) &&
> >> >             ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) {
> >>
> >> I find skb->tstamp == 0 easier to understand than the condition on empty.
> >>
> >> Indeed, this is so non-obvious that I would suggest another helper function
> >> skb_is_hwtx_tstamp with a concise comment about the race condition
> >> between tx software and hardware timestamps (as in the last sentence of
> >> the commit message).
> >
> > Should it include also the skb_is_err_queue() check? If it returned
> > true for both TX and RX HW timestamps, maybe it could be called
> > skb_has_hw_tstamp?
> 
> For the purpose of documenting why this complex condition exists,
> I would call the skb_is_err_queue in that helper function and make
> it tx + hw specific.

Hm, like this?

        if (shhwtstamps &&
            (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
+           (skb_is_hwtx_tstamp(skb) || !skb_is_err_queue(skb)) &&
            ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) {

where skb_is_hwtx_tstamp() has
	return skb->tstamp == 0 && skb_is_err_queue(skb);

I was just not sure about the unnecessary skb_is_err_queue() call.
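
A minimal userspace sketch of the helper discussed above, so the combined
condition can be checked in isolation. The struct mock_skb type and every
mock_* name are hypothetical stand-ins, not kernel API; the comment text is
one reading of the race mentioned in the thread, not the final wording.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff properties the helper
 * inspects; the real struct sk_buff is far larger.
 */
struct mock_skb {
	int64_t tstamp;     /* software timestamp, 0 when unset */
	bool on_err_queue;  /* stand-in for skb_is_err_queue(skb) */
};

static bool mock_skb_is_err_queue(const struct mock_skb *skb)
{
	return skb->on_err_queue;
}

/* With simultaneous SW and HW tx timestamping, both kinds of timestamp
 * can arrive on the error queue for the same packet. An skb on the error
 * queue without a software timestamp is treated as the one carrying the
 * hardware tx timestamp.
 */
static bool mock_skb_is_hwtx_tstamp(const struct mock_skb *skb)
{
	return skb->tstamp == 0 && mock_skb_is_err_queue(skb);
}

/* The extra term from the proposed hunk: rx packets always pass, tx
 * packets pass only when they are the hardware-timestamp skb.
 */
static bool mock_report_hw_tstamp(const struct mock_skb *skb)
{
	return mock_skb_is_hwtx_tstamp(skb) || !mock_skb_is_err_queue(skb);
}
```

In this sketch the second skb_is_err_queue() evaluation is redundant for
the result, which matches the concern raised above; it only serves to make
the helper self-describing.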

-- 
Miroslav Lichvar


* Re: [PATCH net-next] samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel
From: Alexei Starovoitov @ 2017-04-27 16:38 UTC (permalink / raw)
  To: David Ahern, netdev; +Cc: daniel
In-Reply-To: <1493309473-27384-1-git-send-email-dsa@cumulusnetworks.com>

On 4/27/17 9:11 AM, David Ahern wrote:
> Add option to xdp1 and xdp_tx_iptunnel to insert xdp program in
> SKB_MODE:
>  - update set_link_xdp_fd to take a flags argument that is added to the
>    RTM_SETLINK message
>
>  - Add -S option to xdp1 and xdp_tx_iptunnel user code. When passed in
>    XDP_FLAGS_SKB_MODE is set in the flags arg passed to set_link_xdp_fd
>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

awesome. thanks!
Acked-by: Alexei Starovoitov <ast@kernel.org>
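
For readers following the description, here is a rough userspace sketch of
how the nested XDP attribute grows when a flags value is appended after the
fd. The numeric attribute types (43, 1, 3) are the ones hard-coded in the
patch; the sk_* names and the local struct are hypothetical and stand in
for the definitions in the netlink headers. Both payloads are 4 bytes, so
no extra alignment padding is needed in this particular case.

```c
#include <stdint.h>
#include <string.h>

#define SK_NLA_HDRLEN      4u          /* aligned size of the len/type header */
#define SK_NLA_F_NESTED    (1u << 15)
#define SK_IFLA_XDP        43u         /* values as hard-coded in the patch */
#define SK_IFLA_XDP_FD     1u
#define SK_IFLA_XDP_FLAGS  3u

/* Hypothetical local mirror of the netlink attribute header layout. */
struct sk_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Append one 32-bit attribute at the current end of the nest and grow
 * the outer attribute's length accordingly.
 */
static void sk_nest_put_u32(unsigned char *nest, uint16_t type, uint32_t value)
{
	struct sk_nlattr outer, inner;

	memcpy(&outer, nest, sizeof(outer));
	inner.nla_type = type;
	inner.nla_len = SK_NLA_HDRLEN + sizeof(value);
	memcpy(nest + outer.nla_len, &inner, sizeof(inner));
	memcpy(nest + outer.nla_len + SK_NLA_HDRLEN, &value, sizeof(value));
	outer.nla_len += inner.nla_len;
	memcpy(nest, &outer, sizeof(outer));
}

/* Build the nest: header first, then the fd, then flags only if set.
 * Returns the total length of the nested attribute.
 */
static uint16_t sk_build_xdp_nest(unsigned char *buf, int fd, uint32_t flags)
{
	struct sk_nlattr outer = {
		.nla_len = SK_NLA_HDRLEN,
		.nla_type = SK_NLA_F_NESTED | SK_IFLA_XDP,
	};

	memcpy(buf, &outer, sizeof(outer));
	sk_nest_put_u32(buf, SK_IFLA_XDP_FD, (uint32_t)fd);
	if (flags)
		sk_nest_put_u32(buf, SK_IFLA_XDP_FLAGS, flags);

	memcpy(&outer, buf, sizeof(outer));
	return outer.nla_len;
}
```

Starting the nest at NLA_HDRLEN and adding each child's length, as the
patch does, keeps the total correct no matter how many optional children
follow.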


* [PATCH] net: ath: tx99: fixed a spelling issue
From: ammly @ 2017-04-27 16:31 UTC (permalink / raw)
  To: ath9k-devel; +Cc: kvalo, linux-wireless, netdev, linux-kernel, Ammly Fredrick

Fixed a spelling issue.

Signed-off-by: Ammly Fredrick <ammlyf@gmail.com>
---
 drivers/net/wireless/ath/ath9k/tx99.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/tx99.c b/drivers/net/wireless/ath/ath9k/tx99.c
index 16aca9e28b77..a866cbda0799 100644
--- a/drivers/net/wireless/ath/ath9k/tx99.c
+++ b/drivers/net/wireless/ath/ath9k/tx99.c
@@ -153,7 +153,7 @@ static int ath9k_tx99_init(struct ath_softc *sc)
 		sc->tx99_power,
 		sc->tx99_power / 2);
 
-	/* We leave the harware awake as it will be chugging on */
+	/* We leave the hardware awake as it will be chugging on */
 
 	return 0;
 }
-- 
2.11.0


* Re: [PATCH net-next 6/6] bpf: show bpf programs
From: Hannes Frederic Sowa @ 2017-04-27 16:28 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, netdev, ast, daniel, jbenc, aconole
In-Reply-To: <20170427.120019.1559603500876505216.davem@davemloft.net>

On 27.04.2017 18:00, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Thu, 27 Apr 2017 15:22:49 +0200
> 
>> Sure, that sounds super. But so far Linux and most (maybe I should write
>> all) subsystems always provided some easy way to get the insights of the
>> kernel without having to code or rely on special tools so far.
> 
> Not true.

Yes, I should not have written it that generally. ;)

> You cannot fully dump socket TCP internal state without netlink based
> tools.  It is just one of many examples.
>
> Can you dump all nftables rules without a special tool?

You got me here, I agree that not all state is discoverable via procfs.
But to some degree even netfilter and tcp do expose some considerable
amount of data via procfs. In the case of netfilter it might be less
valuable, though, I have to agree.

> I don't think this is a legitimate line of argument, and I want
> this to be done via the bpf() system call which is what people
> are working on.

I hope you saw that I was absolutely not against dumping or enumeration
with the bpf syscall. It will be the primary interface to debug ebpf and
I completely agree.

I merely tried to establish the procfs interface as a quick-look interface
to check whether some type of bpf program is loaded, which could then start
further diagnosis. This interface should not have any dependencies and should
even work on embedded devices, where sometimes it might be difficult to
get a binary for the correct architecture installed ad-hoc (I am
thinking about openwrt). But this is definitely also solvable.

I do think if a common utility in util-linux, like lsbpf, is available I
will be fine.

Anyway, I will take this argument back.

Thanks,
Hannes


* Re: [PATCH net-next 9/9] ipvlan: introduce individual MAC addresses
From: Dan Williams @ 2017-04-27 16:23 UTC (permalink / raw)
  To: Marco Chiappero, netdev
  Cc: David S . Miller, Jeff Kirsher, Alexander Duyck, Sainath Grandhi,
	Mahesh Bandewar
In-Reply-To: <1493310033.27948.3.camel@redhat.com>

On Thu, 2017-04-27 at 11:20 -0500, Dan Williams wrote:
> On Thu, 2017-04-27 at 15:51 +0100, Marco Chiappero wrote:
> > Currently all the slave devices belonging to the same port inherit their
> > MAC address from its master device. This patch removes this limitation
> > and allows every slave device to obtain a unique MAC address, by default
> > randomly generated at creation time.
> > 
> > Moreover it is now possible to correctly modify the MAC address at any
> > time, fixing an existing bug as MAC address changes on the master were
> > not reflected on the slaves. It also avoids multiple interfaces sharing
> > the same IPv6 link-local address.
> 
> How is this different than macvlan now?  Why would you use unique
> addressed ipvlan instances instead of macvlan?  Wouldn't the same
> problems around external switches not expecting multiple MACs from the
> same switch port apply now to ipvlan?
> 
> The whole point of ipvlan AIUI was to get around macvlan problems
> related to multiple MACs on the same port.

Another issue is the unicast MAC limits on cards.  ipvlan is now much
more likely to hit the unicast MAC limit of the NIC and thus trigger
promiscuous mode and the resulting performance drop, where before it
would not.

Dan

> Also, I think the IPv6 thing you mention is incorrect and has long
> since been solved.  Originally, ipvlan did not include a "dev_id"
> property that differed between child interfaces, and thus the IID of
> each interface was the same.  That has now been fixed, and each
> ipvlan slave should now have a different IID and thus a different
> link-local address.
> 
> Dan
> 
> > Since ipvlan is designed to expose a single MAC address for external
> > communications, the driver now behaves as follows:
> > - L2 mode:
> >    * Any reference to the internal MAC address of the originating slave
> >      is replaced with the MAC address of the master for outbound frames.
> >    * Likewise, the destination MAC address is overwritten with the
> >      internal one (once the correct slave is determined) for any
> >      incoming external frame.
> >    * For any internal slave-to-slave communication, the original MAC
> >      addresses are preserved (although not used for routing/switching).
> > - L3/L3s mode:
> >    * The destination MAC address for incoming external packets is
> >      replaced with the one belonging to the destination slave device
> >      (as for L2 mode)
> >    * Every other path behaves as before.
> > 
> > Being a significant behavioral change, version number has been
> > increased.
> > 
> > Signed-off-by: Marco Chiappero <marco.chiappero@intel.com>
> > Tested-by: Marco Chiappero <marco.chiappero@intel.com>
> > ---
> >  drivers/net/ipvlan/ipvlan.h      |   2 +-
> >  drivers/net/ipvlan/ipvlan_core.c | 113
> > ++++++++++++++++++++++++++++++++++-----
> >  drivers/net/ipvlan/ipvlan_main.c |  18 +++----
> >  3 files changed, 111 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/net/ipvlan/ipvlan.h
> > b/drivers/net/ipvlan/ipvlan.h
> > index 800a46c..efe4fd1 100644
> > --- a/drivers/net/ipvlan/ipvlan.h
> > +++ b/drivers/net/ipvlan/ipvlan.h
> > @@ -32,7 +32,7 @@
> >  #include <net/l3mdev.h>
> >  
> >  #define IPVLAN_DRV	"ipvlan"
> > -#define IPV_DRV_VER	"0.1"
> > +#define IPV_DRV_VER	"0.2"
> >  
> >  #define IPVLAN_HASH_SIZE	(1 << BITS_PER_BYTE)
> >  #define IPVLAN_HASH_MASK	(IPVLAN_HASH_SIZE - 1)
> > diff --git a/drivers/net/ipvlan/ipvlan_core.c
> > b/drivers/net/ipvlan/ipvlan_core.c
> > index 67e342d..a30bc11 100644
> > --- a/drivers/net/ipvlan/ipvlan_core.c
> > +++ b/drivers/net/ipvlan/ipvlan_core.c
> > @@ -215,6 +215,89 @@ static void ipvlan_skb_crossing_ns(struct
> > sk_buff *skb, struct net_device *dev)
> >  		skb->dev = dev;
> >  }
> >  
> > +static inline struct nd_opt_hdr *ipvlan_icmp6_nd_opts(struct
> > icmp6hdr *icmph)
> > +{
> > +	return (struct nd_opt_hdr *)((struct nd_msg *)icmph)->opt;
> > +}
> > +
> > +static inline struct nd_opt_hdr *ipvlan_icmp6_rs_opts(struct
> > icmp6hdr *icmph)
> > +{
> > +	return (struct nd_opt_hdr *)((struct rs_msg *)icmph)->opt;
> > +}
> > +
> > +static void ipvlan_proxy_l2_update_icmp6(const struct net_device
> > *master,
> > +					 struct sk_buff *skb,
> > +					 struct nd_opt_hdr
> > *nd_opt,
> > +					 u8 opt_type)
> > +{
> > +	u32 opts_len = skb_tail_pointer(skb) - (u8 *)nd_opt;
> > +
> > +	while (opts_len) {
> > +		u32 opt_len = nd_opt->nd_opt_len << 3;
> > +
> > +		if (nd_opt->nd_opt_type == opt_type) {
> > +			struct ipv6hdr *ip6h = ipv6_hdr(skb);
> > +			struct icmp6hdr *icmph = icmp6_hdr(skb);
> > +			u32 len = ntohs(ip6h->payload_len);
> > +
> > +			memcpy(nd_opt + 1, master->dev_addr,
> > master-
> > > addr_len);
> > 
> > +			icmph->icmp6_cksum = 0;
> > +			icmph->icmp6_cksum =
> > +				csum_ipv6_magic(&ip6h->saddr,
> > +						&ip6h->daddr, len,
> > +						IPPROTO_ICMPV6,
> > +						csum_partial(icmph
> > ,
> > len, 0));
> > +			return;
> > +		}
> > +
> > +		opts_len -= opt_len;
> > +		nd_opt = ((void *)nd_opt) + opt_len;
> > +	}
> > +}
> > +
> > +static void ipvlan_proxy_l2_outbound(struct sk_buff *skb,
> > +				     const struct net_device
> > *master)
> > +{
> > +	/* masquerade the source MAC address for every outgoing
> > frame */
> > +	memcpy(eth_hdr(skb)->h_source, master->dev_addr, master-
> > > addr_len);
> > 
> > +
> > +	/* ARP and some NDISC packets need additional treatment */
> > +	if (skb->protocol == htons(ETH_P_IPV6)) {
> > +		struct ipv6hdr *ip6h = ipv6_hdr(skb);
> > +		struct icmp6hdr *icmph = icmp6_hdr(skb);
> > +		struct nd_opt_hdr *nd_opt;
> > +		u8 opt_type;
> > +
> > +		if (likely(ip6h->nexthdr != NEXTHDR_ICMP))
> > +			return;
> > +
> > +		switch (icmph->icmp6_type) {
> > +		case NDISC_NEIGHBOUR_SOLICITATION: {
> > +			nd_opt = ipvlan_icmp6_nd_opts(icmph);
> > +			opt_type = ND_OPT_SOURCE_LL_ADDR;
> > +			break;
> > +		}
> > +		case NDISC_NEIGHBOUR_ADVERTISEMENT: {
> > +			nd_opt = ipvlan_icmp6_nd_opts(icmph);
> > +			opt_type = ND_OPT_TARGET_LL_ADDR;
> > +			break;
> > +		}
> > +		case NDISC_ROUTER_SOLICITATION: {
> > +			nd_opt = ipvlan_icmp6_rs_opts(icmph);
> > +			opt_type = ND_OPT_SOURCE_LL_ADDR;
> > +			break;
> > +		}
> > +		default:
> > +			return;
> > +		}
> > +
> > +		ipvlan_proxy_l2_update_icmp6(master, skb, nd_opt,
> > opt_type);
> > +
> > +	} else if (unlikely(skb->protocol == htons(ETH_P_ARP))) {
> > +		memcpy(arp_hdr(skb) + 1, master->dev_addr, master-
> > > addr_len);
> > 
> > +	}
> > +}
> > +
> >  static void ipvlan_dispatch_multicast(struct ipvl_port *port,
> >  				      struct sk_buff *skb, u8
> > pkt_type,
> >  				      unsigned int mac_hash)
> > @@ -258,6 +341,7 @@ static void ipvlan_dispatch_multicast(struct
> > ipvl_port *port,
> >  		/* If the packet originated here, send it out. */
> >  		skb->dev = port->dev;
> >  		skb->pkt_type = pkt_type;
> > +		ipvlan_proxy_l2_outbound(skb, port->dev);
> >  		dev_queue_xmit(skb);
> >  	} else {
> >  		if (consumed)
> > @@ -489,6 +573,7 @@ static int ipvlan_xmit_mode_l3(struct sk_buff
> > *skb, struct net_device *dev)
> >  static inline int ipvlan_process_l2_outbound(struct sk_buff *skb,
> >  					     struct net_device
> > *dev)
> >  {
> > +	ipvlan_proxy_l2_outbound(skb, dev);
> >  	ipvlan_skb_crossing_ns(skb, dev);
> >  	return dev_queue_xmit(skb);
> >  }
> > @@ -499,27 +584,27 @@ static int ipvlan_xmit_mode_l2(struct sk_buff
> > *skb, struct net_device *dev)
> >  	struct ethhdr *ethh = eth_hdr(skb);
> >  	struct ipvl_addr *addr;
> >  
> > -	if (ether_addr_equal(ethh->h_dest, ethh->h_source)) {
> > -		addr = ipvlan_get_slave_addr_dst(skb, ipvlan-
> > >port);
> > -		if (addr)
> > -			return ipvlan_rcv_int_frame(addr, &skb);
> > +	if (is_multicast_ether_addr(ethh->h_dest)) {
> > +		ipvlan_multicast_enqueue(ipvlan->port, skb, true);
> > +		return NET_XMIT_SUCCESS;
> > +	}
> >  
> > +	if (ether_addr_equal(ethh->h_dest, ipvlan->phy_dev-
> > > dev_addr)) {
> > 
> >  		skb = skb_share_check(skb, GFP_ATOMIC);
> >  		if (unlikely(!skb))
> >  			return NET_XMIT_DROP;
> >  
> > -		/* Packet definitely does not belong to any of the
> > -		 * virtual devices, but the dest is local. So
> > forward
> > -		 * the skb for the main-dev. At the RX side we
> > just
> > return
> > -		 * RX_PASS for it to be processed further on the
> > stack.
> > +		/* Forward the skb for the master device. At the
> > RX
> > side we
> > +		 * just return RX_HANDLER_PASS for it to be
> > processed further
> > +		 * on the stack.
> >  		 */
> >  		return dev_forward_skb(ipvlan->phy_dev, skb);
> > -
> > -	} else if (is_multicast_ether_addr(ethh->h_dest)) {
> > -		ipvlan_multicast_enqueue(ipvlan->port, skb, true);
> > -		return NET_XMIT_SUCCESS;
> >  	}
> >  
> > +	addr = ipvlan_get_slave_addr_dst(skb, ipvlan->port);
> > +	if (addr)
> > +		return ipvlan_rcv_int_frame(addr, &skb);
> > +
> >  	return ipvlan_process_l2_outbound(skb, ipvlan->phy_dev);
> >  }
> >  
> > @@ -562,6 +647,10 @@ static int ipvlan_rcv_ext_frame(struct
> > ipvl_addr
> > *addr, struct sk_buff *skb)
> >  	struct net_device *dev = ipvlan->dev;
> >  	unsigned int len = skb->len + ETH_HLEN;
> >  
> > +	/* NOTE: although not necessary restore the actual
> > destination
> > +	 * address; this is also what traffic sniffers will
> > display.
> > +	 */
> > +	memcpy(eth_hdr(skb)->h_dest, dev->dev_addr, dev-
> > >addr_len);
> >  	ipvlan_skb_crossing_ns(skb, dev);
> >  	ipvlan_count_rx(ipvlan, len, true, false);
> >  
> > diff --git a/drivers/net/ipvlan/ipvlan_main.c
> > b/drivers/net/ipvlan/ipvlan_main.c
> > index b837807..709f27d 100644
> > --- a/drivers/net/ipvlan/ipvlan_main.c
> > +++ b/drivers/net/ipvlan/ipvlan_main.c
> > @@ -378,6 +378,7 @@ static const struct net_device_ops
> > ipvlan_netdev_ops = {
> >  	.ndo_start_xmit		= ipvlan_start_xmit,
> >  	.ndo_fix_features	= ipvlan_fix_features,
> >  	.ndo_change_rx_flags	= ipvlan_change_rx_flags,
> > +	.ndo_set_mac_address	= eth_mac_addr,
> >  	.ndo_set_rx_mode	= ipvlan_set_multicast_mac_filter,
> >  	.ndo_get_stats64	= ipvlan_get_stats64,
> >  	.ndo_vlan_rx_add_vid	= ipvlan_vlan_rx_add_vid,
> > @@ -392,9 +393,10 @@ static int ipvlan_hard_header(struct sk_buff
> > *skb, struct net_device *dev,
> >  	const struct ipvl_dev *ipvlan = netdev_priv(dev);
> >  	struct net_device *phy_dev = ipvlan->phy_dev;
> >  
> > -	/* TODO Probably use a different field than dev_addr so
> > that
> > the
> > -	 * mac-address on the virtual device is portable and can
> > be
> > carried
> > -	 * while the packets use the mac-addr on the physical
> > device.
> > +	/* This driver uses (almost exclusively) L3 addresses for
> > +	 * routing/switching. Use the actual slave's MAC address,
> > +	 * but overwrite it later during the packet processing for
> > +	 * frames leaving from master
> >  	 */
> >  	return dev_hard_header(skb, phy_dev, type, daddr,
> >  			       saddr ? : dev->dev_addr, len);
> > @@ -559,11 +561,8 @@ int ipvlan_link_new(struct net *src_net,
> > struct
> > net_device *dev,
> >  	/* Increment id-base to the next slot for the future
> > assignment */
> >  	port->dev_id_start = err + 1;
> >  
> > -	/* TODO Probably put random address here to be presented
> > to
> > the
> > -	 * world but keep using the physical-dev address for the
> > outgoing
> > -	 * packets.
> > -	 */
> > -	memcpy(dev->dev_addr, phy_dev->dev_addr, ETH_ALEN);
> > +	/* TODO: consider storing the original MAC address in dev-
> > > perm_addr */
> > 
> > +	eth_hw_addr_random(dev);
> >  
> >  	dev->priv_flags |= IFF_IPVLAN_SLAVE;
> >  
> > @@ -619,7 +618,8 @@ void ipvlan_link_setup(struct net_device *dev)
> >  	ether_setup(dev);
> >  
> >  	dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE |
> > IFF_TX_SKB_SHARING);
> > -	dev->priv_flags |= IFF_UNICAST_FLT | IFF_NO_QUEUE;
> > +	dev->priv_flags |= IFF_UNICAST_FLT | IFF_NO_QUEUE
> > +			   | IFF_LIVE_ADDR_CHANGE;
> >  	dev->netdev_ops = &ipvlan_netdev_ops;
> >  	dev->destructor = free_netdev;
> >  	dev->header_ops = &ipvlan_header_ops;


* Re: [PATCH v1 net-next 5/6] net: allow simultaneous SW and HW transmit timestamping
From: Willem de Bruijn @ 2017-04-27 16:21 UTC (permalink / raw)
  To: Miroslav Lichvar
  Cc: Network Development, Richard Cochran, Willem de Bruijn,
	Soheil Hassas Yeganeh, Keller, Jacob E, Denny Page, Jiri Benc
In-Reply-To: <20170427161700.GB3401@localhost>

>> > @@ -720,6 +720,7 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
>> >                 empty = 0;
>> >         if (shhwtstamps &&
>> >             (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
>> > +           (empty || !skb_is_err_queue(skb)) &&
>> >             ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) {
>>
>> I find skb->tstamp == 0 easier to understand than the condition on empty.
>>
>> Indeed, this is so non-obvious that I would suggest another helper function
>> skb_is_hwtx_tstamp with a concise comment about the race condition
>> between tx software and hardware timestamps (as in the last sentence of
>> the commit message).
>
> Should it include also the skb_is_err_queue() check? If it returned
> true for both TX and RX HW timestamps, maybe it could be called
> skb_has_hw_tstamp?

For the purpose of documenting why this complex condition exists,
I would call the skb_is_err_queue in that helper function and make
it tx + hw specific.


* Re: [PATCH net-next 9/9] ipvlan: introduce individual MAC addresses
From: Dan Williams @ 2017-04-27 16:20 UTC (permalink / raw)
  To: Marco Chiappero, netdev
  Cc: David S . Miller, Jeff Kirsher, Alexander Duyck, Sainath Grandhi,
	Mahesh Bandewar
In-Reply-To: <20170427145142.15830-10-marco.chiappero@intel.com>

On Thu, 2017-04-27 at 15:51 +0100, Marco Chiappero wrote:
> Currently all the slave devices belonging to the same port inherit their
> MAC address from its master device. This patch removes this limitation
> and allows every slave device to obtain a unique MAC address, by default
> randomly generated at creation time.
> 
> Moreover it is now possible to correctly modify the MAC address at any
> time, fixing an existing bug as MAC address changes on the master were
> not reflected on the slaves. It also avoids multiple interfaces sharing
> the same IPv6 link-local address.

How is this different than macvlan now?  Why would you use unique
addressed ipvlan instances instead of macvlan?  Wouldn't the same
problems around external switches not expecting multiple MACs from the
same switch port apply now to ipvlan?

The whole point of ipvlan AIUI was to get around macvlan problems
related to multiple MACs on the same port.

Also, I think the IPv6 thing you mention is incorrect and has long
since been solved.  Originally, ipvlan did not include a "dev_id"
property that differed between child interfaces, and thus the IID of
each interface was the same.  That has now been fixed, and each
ipvlan slave should now have a different IID and thus a different
link-local address.

Dan

> Since ipvlan is designed to expose a single MAC address for external
> communications, the driver now behaves as follows:
> - L2 mode:
>    * Any reference to the internal MAC address of the originating slave
>      is replaced with the MAC address of the master for outbound frames.
>    * Likewise, the destination MAC address is overwritten with the
>      internal one (once the correct slave is determined) for any
>      incoming external frame.
>    * For any internal slave-to-slave communication, the original MAC
>      addresses are preserved (although not used for routing/switching).
> - L3/L3s mode:
>    * The destination MAC address for incoming external packets is
>      replaced with the one belonging to the destination slave device
>      (as for L2 mode)
>    * Every other path behaves as before.
> 
> Being a significant behavioral change, version number has been
> increased.
> 
> Signed-off-by: Marco Chiappero <marco.chiappero@intel.com>
> Tested-by: Marco Chiappero <marco.chiappero@intel.com>
> ---
>  drivers/net/ipvlan/ipvlan.h      |   2 +-
>  drivers/net/ipvlan/ipvlan_core.c | 113
> ++++++++++++++++++++++++++++++++++-----
>  drivers/net/ipvlan/ipvlan_main.c |  18 +++----
>  3 files changed, 111 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/net/ipvlan/ipvlan.h
> b/drivers/net/ipvlan/ipvlan.h
> index 800a46c..efe4fd1 100644
> --- a/drivers/net/ipvlan/ipvlan.h
> +++ b/drivers/net/ipvlan/ipvlan.h
> @@ -32,7 +32,7 @@
>  #include <net/l3mdev.h>
>  
>  #define IPVLAN_DRV	"ipvlan"
> -#define IPV_DRV_VER	"0.1"
> +#define IPV_DRV_VER	"0.2"
>  
>  #define IPVLAN_HASH_SIZE	(1 << BITS_PER_BYTE)
>  #define IPVLAN_HASH_MASK	(IPVLAN_HASH_SIZE - 1)
> diff --git a/drivers/net/ipvlan/ipvlan_core.c
> b/drivers/net/ipvlan/ipvlan_core.c
> index 67e342d..a30bc11 100644
> --- a/drivers/net/ipvlan/ipvlan_core.c
> +++ b/drivers/net/ipvlan/ipvlan_core.c
> @@ -215,6 +215,89 @@ static void ipvlan_skb_crossing_ns(struct
> sk_buff *skb, struct net_device *dev)
>  		skb->dev = dev;
>  }
>  
> +static inline struct nd_opt_hdr *ipvlan_icmp6_nd_opts(struct
> icmp6hdr *icmph)
> +{
> +	return (struct nd_opt_hdr *)((struct nd_msg *)icmph)->opt;
> +}
> +
> +static inline struct nd_opt_hdr *ipvlan_icmp6_rs_opts(struct
> icmp6hdr *icmph)
> +{
> +	return (struct nd_opt_hdr *)((struct rs_msg *)icmph)->opt;
> +}
> +
> +static void ipvlan_proxy_l2_update_icmp6(const struct net_device
> *master,
> +					 struct sk_buff *skb,
> +					 struct nd_opt_hdr *nd_opt,
> +					 u8 opt_type)
> +{
> +	u32 opts_len = skb_tail_pointer(skb) - (u8 *)nd_opt;
> +
> +	while (opts_len) {
> +		u32 opt_len = nd_opt->nd_opt_len << 3;
> +
> +		if (nd_opt->nd_opt_type == opt_type) {
> +			struct ipv6hdr *ip6h = ipv6_hdr(skb);
> +			struct icmp6hdr *icmph = icmp6_hdr(skb);
> +			u32 len = ntohs(ip6h->payload_len);
> +
> +			memcpy(nd_opt + 1, master->dev_addr, master-
> >addr_len);
> +			icmph->icmp6_cksum = 0;
> +			icmph->icmp6_cksum =
> +				csum_ipv6_magic(&ip6h->saddr,
> +						&ip6h->daddr, len,
> +						IPPROTO_ICMPV6,
> +						csum_partial(icmph,
> len, 0));
> +			return;
> +		}
> +
> +		opts_len -= opt_len;
> +		nd_opt = ((void *)nd_opt) + opt_len;
> +	}
> +}
> +
> +static void ipvlan_proxy_l2_outbound(struct sk_buff *skb,
> +				     const struct net_device
> *master)
> +{
> +	/* masquerade the source MAC address for every outgoing
> frame */
> +	memcpy(eth_hdr(skb)->h_source, master->dev_addr, master-
> >addr_len);
> +
> +	/* ARP and some NDISC packets need additional treatment */
> +	if (skb->protocol == htons(ETH_P_IPV6)) {
> +		struct ipv6hdr *ip6h = ipv6_hdr(skb);
> +		struct icmp6hdr *icmph = icmp6_hdr(skb);
> +		struct nd_opt_hdr *nd_opt;
> +		u8 opt_type;
> +
> +		if (likely(ip6h->nexthdr != NEXTHDR_ICMP))
> +			return;
> +
> +		switch (icmph->icmp6_type) {
> +		case NDISC_NEIGHBOUR_SOLICITATION: {
> +			nd_opt = ipvlan_icmp6_nd_opts(icmph);
> +			opt_type = ND_OPT_SOURCE_LL_ADDR;
> +			break;
> +		}
> +		case NDISC_NEIGHBOUR_ADVERTISEMENT: {
> +			nd_opt = ipvlan_icmp6_nd_opts(icmph);
> +			opt_type = ND_OPT_TARGET_LL_ADDR;
> +			break;
> +		}
> +		case NDISC_ROUTER_SOLICITATION: {
> +			nd_opt = ipvlan_icmp6_rs_opts(icmph);
> +			opt_type = ND_OPT_SOURCE_LL_ADDR;
> +			break;
> +		}
> +		default:
> +			return;
> +		}
> +
> +		ipvlan_proxy_l2_update_icmp6(master, skb, nd_opt,
> opt_type);
> +
> +	} else if (unlikely(skb->protocol == htons(ETH_P_ARP))) {
> +		memcpy(arp_hdr(skb) + 1, master->dev_addr, master-
> >addr_len);
> +	}
> +}
> +
>  static void ipvlan_dispatch_multicast(struct ipvl_port *port,
>  				      struct sk_buff *skb, u8
> pkt_type,
>  				      unsigned int mac_hash)
> @@ -258,6 +341,7 @@ static void ipvlan_dispatch_multicast(struct
> ipvl_port *port,
>  		/* If the packet originated here, send it out. */
>  		skb->dev = port->dev;
>  		skb->pkt_type = pkt_type;
> +		ipvlan_proxy_l2_outbound(skb, port->dev);
>  		dev_queue_xmit(skb);
>  	} else {
>  		if (consumed)
> @@ -489,6 +573,7 @@ static int ipvlan_xmit_mode_l3(struct sk_buff
> *skb, struct net_device *dev)
>  static inline int ipvlan_process_l2_outbound(struct sk_buff *skb,
>  					     struct net_device *dev)
>  {
> +	ipvlan_proxy_l2_outbound(skb, dev);
>  	ipvlan_skb_crossing_ns(skb, dev);
>  	return dev_queue_xmit(skb);
>  }
> @@ -499,27 +584,27 @@ static int ipvlan_xmit_mode_l2(struct sk_buff
> *skb, struct net_device *dev)
>  	struct ethhdr *ethh = eth_hdr(skb);
>  	struct ipvl_addr *addr;
>  
> -	if (ether_addr_equal(ethh->h_dest, ethh->h_source)) {
> -		addr = ipvlan_get_slave_addr_dst(skb, ipvlan->port);
> -		if (addr)
> -			return ipvlan_rcv_int_frame(addr, &skb);
> +	if (is_multicast_ether_addr(ethh->h_dest)) {
> +		ipvlan_multicast_enqueue(ipvlan->port, skb, true);
> +		return NET_XMIT_SUCCESS;
> +	}
>  
> +	if (ether_addr_equal(ethh->h_dest, ipvlan->phy_dev-
> >dev_addr)) {
>  		skb = skb_share_check(skb, GFP_ATOMIC);
>  		if (unlikely(!skb))
>  			return NET_XMIT_DROP;
>  
> -		/* Packet definitely does not belong to any of the
> -		 * virtual devices, but the dest is local. So
> forward
> -		 * the skb for the main-dev. At the RX side we just
> return
> -		 * RX_PASS for it to be processed further on the
> stack.
> +		/* Forward the skb for the master device. At the RX
> side we
> +		 * just return RX_HANDLER_PASS for it to be
> processed further
> +		 * on the stack.
>  		 */
>  		return dev_forward_skb(ipvlan->phy_dev, skb);
> -
> -	} else if (is_multicast_ether_addr(ethh->h_dest)) {
> -		ipvlan_multicast_enqueue(ipvlan->port, skb, true);
> -		return NET_XMIT_SUCCESS;
>  	}
>  
> +	addr = ipvlan_get_slave_addr_dst(skb, ipvlan->port);
> +	if (addr)
> +		return ipvlan_rcv_int_frame(addr, &skb);
> +
>  	return ipvlan_process_l2_outbound(skb, ipvlan->phy_dev);
>  }
>  
> @@ -562,6 +647,10 @@ static int ipvlan_rcv_ext_frame(struct ipvl_addr
> *addr, struct sk_buff *skb)
>  	struct net_device *dev = ipvlan->dev;
>  	unsigned int len = skb->len + ETH_HLEN;
>  
> +	/* NOTE: although not necessary restore the actual
> destination
> +	 * address; this is also what traffic sniffers will display.
> +	 */
> +	memcpy(eth_hdr(skb)->h_dest, dev->dev_addr, dev->addr_len);
>  	ipvlan_skb_crossing_ns(skb, dev);
>  	ipvlan_count_rx(ipvlan, len, true, false);
>  
> diff --git a/drivers/net/ipvlan/ipvlan_main.c
> b/drivers/net/ipvlan/ipvlan_main.c
> index b837807..709f27d 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -378,6 +378,7 @@ static const struct net_device_ops
> ipvlan_netdev_ops = {
>  	.ndo_start_xmit		= ipvlan_start_xmit,
>  	.ndo_fix_features	= ipvlan_fix_features,
>  	.ndo_change_rx_flags	= ipvlan_change_rx_flags,
> +	.ndo_set_mac_address	= eth_mac_addr,
>  	.ndo_set_rx_mode	= ipvlan_set_multicast_mac_filter,
>  	.ndo_get_stats64	= ipvlan_get_stats64,
>  	.ndo_vlan_rx_add_vid	= ipvlan_vlan_rx_add_vid,
> @@ -392,9 +393,10 @@ static int ipvlan_hard_header(struct sk_buff
> *skb, struct net_device *dev,
>  	const struct ipvl_dev *ipvlan = netdev_priv(dev);
>  	struct net_device *phy_dev = ipvlan->phy_dev;
>  
> -	/* TODO Probably use a different field than dev_addr so that
> the
> -	 * mac-address on the virtual device is portable and can be
> carried
> -	 * while the packets use the mac-addr on the physical
> device.
> +	/* This driver uses (almost exclusively) L3 addresses for
> +	 * routing/switching. Use the actual slave's MAC address,
> +	 * but overwrite it later during the packet processing for
> +	 * frames leaving from master
>  	 */
>  	return dev_hard_header(skb, phy_dev, type, daddr,
>  			       saddr ? : dev->dev_addr, len);
> @@ -559,11 +561,8 @@ int ipvlan_link_new(struct net *src_net, struct
> net_device *dev,
>  	/* Increment id-base to the next slot for the future
> assignment */
>  	port->dev_id_start = err + 1;
>  
> -	/* TODO Probably put random address here to be presented to
> the
> -	 * world but keep using the physical-dev address for the
> outgoing
> -	 * packets.
> -	 */
> -	memcpy(dev->dev_addr, phy_dev->dev_addr, ETH_ALEN);
> +	/* TODO: consider storing the original MAC address in dev-
> >perm_addr */
> +	eth_hw_addr_random(dev);
>  
>  	dev->priv_flags |= IFF_IPVLAN_SLAVE;
>  
> @@ -619,7 +618,8 @@ void ipvlan_link_setup(struct net_device *dev)
>  	ether_setup(dev);
>  
>  	dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE |
> IFF_TX_SKB_SHARING);
> -	dev->priv_flags |= IFF_UNICAST_FLT | IFF_NO_QUEUE;
> +	dev->priv_flags |= IFF_UNICAST_FLT | IFF_NO_QUEUE
> +			   | IFF_LIVE_ADDR_CHANGE;
>  	dev->netdev_ops = &ipvlan_netdev_ops;
>  	dev->destructor = free_netdev;
>  	dev->header_ops = &ipvlan_header_ops;


* Re: [PATCH v1 net-next 5/6] net: allow simultaneous SW and HW transmit timestamping
From: Miroslav Lichvar @ 2017-04-27 16:17 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Richard Cochran, Willem de Bruijn,
	Soheil Hassas Yeganeh, Keller, Jacob E, Denny Page, Jiri Benc
In-Reply-To: <CAF=yD-+GSK491AWQx8=6yd3=-HHwxdWq677ubwdjbV5AXzRbog@mail.gmail.com>

On Wed, Apr 26, 2017 at 08:00:02PM -0400, Willem de Bruijn wrote:
> > +       if (!hwtstamps && !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) &&
> > +           skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS)
> > +               return;
> > +
> 
> This check should only happen for software transmit timestamps, so simpler to
> revise the check in sw_tx_timestamp above to
> 
>   if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP &&
> -        !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS))
> +      (!(skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS) ||
> +       (skb->sk && skb->sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW)))

Good point. This will avoid unnecessary calls of skb_tstamp_tx() in
the common case when SOF_TIMESTAMPING_OPT_TX_SWHW will not be enabled.
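
As a sanity check on the intended grouping of that condition, a small
userspace sketch follows. It assumes the intent is "software timestamp
requested, and either no hardware timestamp is in progress or the socket
opted in to both". The MOCK_* flag values and the function name are
hypothetical and do not match the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define MOCK_SKBTX_SW_TSTAMP    (1u << 0)
#define MOCK_SKBTX_IN_PROGRESS  (1u << 1)
#define MOCK_OPT_TX_SWHW        (1u << 0)

/* Decide whether a software tx timestamp should be generated.
 * has_sock models the skb->sk NULL check from the suggested condition.
 */
static bool mock_want_sw_tx_tstamp(uint32_t tx_flags, bool has_sock,
				   uint32_t sk_tsflags)
{
	/* No software timestamp was requested for this packet. */
	if (!(tx_flags & MOCK_SKBTX_SW_TSTAMP))
		return false;

	/* Common case: no hardware timestamp pending, take the SW one. */
	if (!(tx_flags & MOCK_SKBTX_IN_PROGRESS))
		return true;

	/* HW timestamp pending: only take the SW one if the socket asked
	 * for both.
	 */
	return has_sock && (sk_tsflags & MOCK_OPT_TX_SWHW);
}
```

Written this way, the socket flags are only consulted when a hardware
timestamp is already in progress, so the common path stays as cheap as
before.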

> > @@ -720,6 +720,7 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
> >                 empty = 0;
> >         if (shhwtstamps &&
> >             (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
> > +           (empty || !skb_is_err_queue(skb)) &&
> >             ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) {
> 
> I find skb->tstamp == 0 easier to understand than the condition on empty.
> 
> Indeed, this is so non-obvious that I would suggest another helper function
> skb_is_hwtx_tstamp with a concise comment about the race condition
> between tx software and hardware timestamps (as in the last sentence of
> the commit message).

Should it include also the skb_is_err_queue() check? If it returned
true for both TX and RX HW timestamps, maybe it could be called
skb_has_hw_tstamp?

-- 
Miroslav Lichvar


* [PATCH net-next] samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel
From: David Ahern @ 2017-04-27 16:11 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, David Ahern

Add option to xdp1 and xdp_tx_iptunnel to insert xdp program in
SKB_MODE:
 - update set_link_xdp_fd to take a flags argument that is added to the
   RTM_SETLINK message

 - Add -S option to xdp1 and xdp_tx_iptunnel user code. When passed in
   XDP_FLAGS_SKB_MODE is set in the flags arg passed to set_link_xdp_fd

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 samples/bpf/bpf_load.c             | 19 +++++++++++++++---
 samples/bpf/bpf_load.h             |  2 +-
 samples/bpf/xdp1_user.c            | 40 ++++++++++++++++++++++++++++++--------
 samples/bpf/xdp_tx_iptunnel_user.c | 13 +++++++++----
 4 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 0d449d8032d1..d4433a47e6c3 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -563,7 +563,7 @@ struct ksym *ksym_search(long key)
 	return &syms[0];
 }
 
-int set_link_xdp_fd(int ifindex, int fd)
+int set_link_xdp_fd(int ifindex, int fd, int flags)
 {
 	struct sockaddr_nl sa;
 	int sock, seq = 0, len, ret = -1;
@@ -599,15 +599,28 @@ int set_link_xdp_fd(int ifindex, int fd)
 	req.nh.nlmsg_seq = ++seq;
 	req.ifinfo.ifi_family = AF_UNSPEC;
 	req.ifinfo.ifi_index = ifindex;
+
+	/* start nested attribute for XDP */
 	nla = (struct nlattr *)(((char *)&req)
 				+ NLMSG_ALIGN(req.nh.nlmsg_len));
 	nla->nla_type = NLA_F_NESTED | 43/*IFLA_XDP*/;
+	nla->nla_len = NLA_HDRLEN;
 
-	nla_xdp = (struct nlattr *)((char *)nla + NLA_HDRLEN);
+	/* add XDP fd */
+	nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
 	nla_xdp->nla_type = 1/*IFLA_XDP_FD*/;
 	nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
 	memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
-	nla->nla_len = NLA_HDRLEN + nla_xdp->nla_len;
+	nla->nla_len += nla_xdp->nla_len;
+
+	/* if user passed in any flags, add those too */
+	if (flags) {
+		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
+		nla_xdp->nla_type = 3/*IFLA_XDP_FLAGS*/;
+		nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
+		memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
+		nla->nla_len += nla_xdp->nla_len;
+	}
 
 	req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
 
diff --git a/samples/bpf/bpf_load.h b/samples/bpf/bpf_load.h
index 68f6b2d22507..6bfd75ec6a16 100644
--- a/samples/bpf/bpf_load.h
+++ b/samples/bpf/bpf_load.h
@@ -47,5 +47,5 @@ struct ksym {
 
 int load_kallsyms(void);
 struct ksym *ksym_search(long key);
-int set_link_xdp_fd(int ifindex, int fd);
+int set_link_xdp_fd(int ifindex, int fd, int flags);
 #endif
diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index d2be65d1fd86..deb05e630d84 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -5,6 +5,7 @@
  * License as published by the Free Software Foundation.
  */
 #include <linux/bpf.h>
+#include <linux/if_link.h>
 #include <assert.h>
 #include <errno.h>
 #include <signal.h>
@@ -12,16 +13,18 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include <libgen.h>
 
 #include "bpf_load.h"
 #include "bpf_util.h"
 #include "libbpf.h"
 
 static int ifindex;
+static int flags;
 
 static void int_exit(int sig)
 {
-	set_link_xdp_fd(ifindex, -1);
+	set_link_xdp_fd(ifindex, -1, flags);
 	exit(0);
 }
 
@@ -54,18 +57,39 @@ static void poll_stats(int interval)
 	}
 }
 
-int main(int ac, char **argv)
+static void usage(const char *prog)
 {
-	char filename[256];
+	fprintf(stderr,
+		"usage: %s [OPTS] IFINDEX\n\n"
+		"OPTS:\n"
+		"    -S    use skb-mode\n",
+		prog);
+}
 
-	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+int main(int argc, char **argv)
+{
+	const char *optstr = "S";
+	char filename[256];
+	int opt;
+
+	while ((opt = getopt(argc, argv, optstr)) != -1) {
+		switch (opt) {
+		case 'S':
+			flags |= XDP_FLAGS_SKB_MODE;
+			break;
+		default:
+			usage(basename(argv[0]));
+			return 1;
+		}
+	}
 
-	if (ac != 2) {
-		printf("usage: %s IFINDEX\n", argv[0]);
+	if (optind == argc) {
+		usage(basename(argv[0]));
 		return 1;
 	}
+	ifindex = strtoul(argv[optind], NULL, 0);
 
-	ifindex = strtoul(argv[1], NULL, 0);
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
 
 	if (load_bpf_file(filename)) {
 		printf("%s", bpf_log_buf);
@@ -79,7 +103,7 @@ int main(int ac, char **argv)
 
 	signal(SIGINT, int_exit);
 
-	if (set_link_xdp_fd(ifindex, prog_fd[0]) < 0) {
+	if (set_link_xdp_fd(ifindex, prog_fd[0], flags) < 0) {
 		printf("link set xdp fd failed\n");
 		return 1;
 	}
diff --git a/samples/bpf/xdp_tx_iptunnel_user.c b/samples/bpf/xdp_tx_iptunnel_user.c
index 70e192fc61aa..cb2bda7b5346 100644
--- a/samples/bpf/xdp_tx_iptunnel_user.c
+++ b/samples/bpf/xdp_tx_iptunnel_user.c
@@ -5,6 +5,7 @@
  * License as published by the Free Software Foundation.
  */
 #include <linux/bpf.h>
+#include <linux/if_link.h>
 #include <assert.h>
 #include <errno.h>
 #include <signal.h>
@@ -28,7 +29,7 @@ static int ifindex = -1;
 static void int_exit(int sig)
 {
 	if (ifindex > -1)
-		set_link_xdp_fd(ifindex, -1);
+		set_link_xdp_fd(ifindex, -1, 0);
 	exit(0);
 }
 
@@ -136,12 +137,13 @@ int main(int argc, char **argv)
 {
 	unsigned char opt_flags[256] = {};
 	unsigned int kill_after_s = 0;
-	const char *optstr = "i:a:p:s:d:m:T:P:h";
+	const char *optstr = "i:a:p:s:d:m:T:P:Sh";
 	int min_port = 0, max_port = 0;
 	struct iptnl_info tnl = {};
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
 	struct vip vip = {};
 	char filename[256];
+	int flags = 0;
 	int opt;
 	int i;
 
@@ -201,6 +203,9 @@ int main(int argc, char **argv)
 		case 'T':
 			kill_after_s = atoi(optarg);
 			break;
+		case 'S':
+			flags |= XDP_FLAGS_SKB_MODE;
+			break;
 		default:
 			usage(argv[0]);
 			return 1;
@@ -243,14 +248,14 @@ int main(int argc, char **argv)
 		}
 	}
 
-	if (set_link_xdp_fd(ifindex, prog_fd[0]) < 0) {
+	if (set_link_xdp_fd(ifindex, prog_fd[0], flags) < 0) {
 		printf("link set xdp fd failed\n");
 		return 1;
 	}
 
 	poll_stats(kill_after_s);
 
-	set_link_xdp_fd(ifindex, -1);
+	set_link_xdp_fd(ifindex, -1, flags);
 
 	return 0;
 }
-- 
2.1.4
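
The attribute layout that set_link_xdp_fd() builds in the patch above can
be sketched on its own as follows. This is a minimal userspace
illustration, not the sample code itself: struct mock_nlattr and the
MOCK_* macros are local stand-ins that mirror the netlink attribute
header layout, and nest_put_int() and build_xdp_nest() are hypothetical
helper names. The attribute numbers 43, 1 and 3 are the ones used in the
patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in mirroring the netlink attribute header layout. */
struct mock_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

#define MOCK_NLA_ALIGN(len)  (((len) + 3u) & ~3u)
#define MOCK_NLA_HDRLEN      ((uint16_t)MOCK_NLA_ALIGN(sizeof(struct mock_nlattr)))
#define MOCK_NLA_F_NESTED    (1u << 15)

/* Append one int-sized attribute at the current end of the nest. */
static void nest_put_int(struct mock_nlattr *nest, uint16_t type, int value)
{
	struct mock_nlattr *attr =
		(struct mock_nlattr *)((char *)nest + nest->nla_len);

	attr->nla_type = type;
	attr->nla_len = (uint16_t)(MOCK_NLA_HDRLEN + sizeof(value));
	memcpy((char *)attr + MOCK_NLA_HDRLEN, &value, sizeof(value));
	/* Grow the enclosing nest by the size of the new attribute. */
	nest->nla_len += attr->nla_len;
}

/* Build the nested XDP attribute in buf and return its total length. */
static uint16_t build_xdp_nest(void *buf, int fd, int flags)
{
	struct mock_nlattr *nest = buf;

	nest->nla_type = (uint16_t)(MOCK_NLA_F_NESTED | 43); /* IFLA_XDP */
	nest->nla_len = MOCK_NLA_HDRLEN;

	nest_put_int(nest, 1 /* IFLA_XDP_FD */, fd);
	if (flags) /* the flags attribute is only added when flags are set */
		nest_put_int(nest, 3 /* IFLA_XDP_FLAGS */, flags);

	return nest->nla_len;
}
```

The point of starting the nest at NLA_HDRLEN and adding each child's
length is that further attributes can be appended without recomputing
the offsets by hand. The buffer passed in is assumed to be 4-byte
aligned and large enough for both attributes.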

^ permalink raw reply related

* Re: [PATCH net-next] can: fix build error without CONFIG_PROC_FS
From: Oliver Hartkopp @ 2017-04-27 16:11 UTC (permalink / raw)
  To: Marc Kleine-Budde, Arnd Bergmann, David S. Miller
  Cc: Thomas Gleixner, Cong Wang, Mario Kicherer, Eric Dumazet,
	linux-can, netdev, linux-kernel
In-Reply-To: <937e9144-c06c-d265-29eb-a1c6f96b6f89@pengutronix.de>

Hello Arnd,

many thanks for your patch.

Btw

 >  static void canbcm_pernet_exit(struct net *net)
 >  {
 > +#ifdef CONFIG_PROC_FS
 >  	/* remove /proc/net/can-bcm directory */
 >  	if (IS_ENABLED(CONFIG_PROC_FS)) {
 >  		if (net->can.bcmproc_dir)
 >  			remove_proc_entry("can-bcm", net->proc_net);
 >  	}
 > +#endif
 >  }

"if (IS_ENABLED(CONFIG_PROC_FS))"

becomes obsolete too then ...

So I would suggest taking my patch to fix my fault ;-)

Best regards,
Oliver

On 04/27/2017 04:29 PM, Marc Kleine-Budde wrote:
> Hello Arnd,
>
> On 04/27/2017 04:21 PM, Arnd Bergmann wrote:
>> The procfs dir entry was added inside of an #ifdef, causing a build error
>> when we try to access it without CONFIG_PROC_FS set:
>>
>> net/can/bcm.c:1541:14: error: 'struct netns_can' has no member named 'bcmproc_dir'
>> net/can/bcm.c: In function 'bcm_connect':
>> net/can/bcm.c:1601:14: error: 'struct netns_can' has no member named 'bcmproc_dir'
>> net/can/bcm.c: In function 'canbcm_pernet_init':
>> net/can/bcm.c:1696:11: error: 'struct netns_can' has no member named 'bcmproc_dir'
>> net/can/bcm.c: In function 'canbcm_pernet_exit':
>> net/can/bcm.c:1707:15: error: 'struct netns_can' has no member named 'bcmproc_dir'
>>
>> This adds the same #ifdef around all users of the pointer. Alternatively
>> we could move the pointer outside of the #ifdef.
>>
>> Fixes: 384317ef4187 ("can: network namespace support for CAN_BCM protocol")
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> A fix for this problem is part of the pull request I sent to David
> earlier today:
>
>     https://www.mail-archive.com/netdev@vger.kernel.org/msg165764.html
>
> regards,
> Marc
>

^ permalink raw reply

* Re: [PATCH net-next 8/9] ipvlan: improve compiler hints
From: David Miller @ 2017-04-27 16:05 UTC (permalink / raw)
  To: alexander.h.duyck
  Cc: marco.chiappero, netdev, jeffrey.t.kirsher, sainath.grandhi,
	maheshb
In-Reply-To: <B1C1DF2ACD01FD4881736AA51731BAB2B2EFB2@ORSMSX107.amr.corp.intel.com>

From: "Duyck, Alexander H" <alexander.h.duyck@intel.com>
Date: Thu, 27 Apr 2017 15:21:16 +0000

>> -----Original Message-----
>> From: Chiappero, Marco
>> Sent: Thursday, April 27, 2017 7:52 AM
>> To: netdev@vger.kernel.org
>> Cc: David S . Miller <davem@davemloft.net>; Kirsher, Jeffrey T
>> <jeffrey.t.kirsher@intel.com>; Duyck, Alexander H
>> <alexander.h.duyck@intel.com>; Grandhi, Sainath
>> <sainath.grandhi@intel.com>; Mahesh Bandewar <maheshb@google.com>;
>> Chiappero, Marco <marco.chiappero@intel.com>
>> Subject: [PATCH net-next 8/9] ipvlan: improve compiler hints
>> 
>> Extend inlining and branch prediction hints.
>> 
>> Signed-off-by: Marco Chiappero <marco.chiappero@intel.com>
>> ---
>>  drivers/net/ipvlan/ipvlan_core.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
>> index a9fc1b5..67e342d 100644
>> --- a/drivers/net/ipvlan/ipvlan_core.c
>> +++ b/drivers/net/ipvlan/ipvlan_core.c
>> @@ -88,7 +88,7 @@ void ipvlan_ht_addr_del(struct ipvl_addr *addr)
>>  	hlist_del_init_rcu(&addr->hlnode);
>>  }
>> 
>> -unsigned int ipvlan_mac_hash(const unsigned char *addr)
>> +inline unsigned int ipvlan_mac_hash(const unsigned char *addr)
>>  {
>>  	u32 hash = jhash_1word(__get_unaligned_cpu32(addr + 2),
>>  			       ipvlan_jhash_secret);
> 
> I'm kind of surprised this isn't causing a problem with differing
> declarations between the declaration here and the declaration in
> ipvlan.h. Normally for inlining something like this you would change
> it to a "static inline" and move the entire declaration into the
> header file.

No inlines in foo.c files please. Seriously, let the compiler decide;
it knows better than you.
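
For reference, the header-based form mentioned in the quoted review
("static inline" with the whole definition in the header) would look
roughly like the sketch below. Everything here is hypothetical: the
header name, toy_hash_1word() (a toy mixer standing in for
jhash_1word()), the explicit seed parameter, and the 8-bit mask are
assumptions made so the example is self-contained.

```c
/* ipvlan_hash_sketch.h: hypothetical header, not the real ipvlan.h */
#ifndef IPVLAN_HASH_SKETCH_H
#define IPVLAN_HASH_SKETCH_H

#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy mixer standing in for jhash_1word(); not the kernel hash. */
static inline uint32_t toy_hash_1word(uint32_t word, uint32_t seed)
{
	uint32_t h = word ^ seed;

	h ^= h >> 16;
	h *= 0x45d9f3bu;
	h ^= h >> 16;
	return h;
}

/*
 * Defined entirely in the header as static inline, so every .c file
 * that includes it sees the same definition and no separate prototype
 * can drift out of sync.
 */
static inline unsigned int mac_hash_sketch(const unsigned char *addr,
					   uint32_t seed)
{
	uint32_t low;

	/* Unaligned-safe read of the last four bytes of the MAC address. */
	memcpy(&low, addr + 2, sizeof(low));
	return toy_hash_1word(low, seed) & 0xff; /* mask width is an assumption */
}

#endif /* IPVLAN_HASH_SKETCH_H */
```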

^ permalink raw reply

* Re: [PATCH net-next 6/6] bpf: show bpf programs
From: David Miller @ 2017-04-27 16:00 UTC (permalink / raw)
  To: hannes; +Cc: daniel, netdev, ast, daniel, jbenc, aconole
In-Reply-To: <5b1f23e3-86a7-69aa-91e2-1dc72125f22b@stressinduktion.org>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 27 Apr 2017 15:22:49 +0200

> Sure, that sounds super. But so far Linux and most (maybe I should write
> all) subsystems always provided some easy way to get the insights of the
> kernel without having to code or rely on special tools so far.

Not true.

You cannot fully dump socket TCP internal state without netlink based
tools.  It is just one of many examples.

Can you dump all nftables rules without a special tool?

I don't think this is a legitimate line of argument, and I want
this to be done via the bpf() system call which is what people
are working on.

Thanks.

^ permalink raw reply

* Re: [PATCH v2 01/21] scatterlist: Introduce sg_map helper functions
From: Logan Gunthorpe @ 2017-04-27 15:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Hellwig
  Cc: linux-nvdimm-y27Ovi1pjclAfugRpC6u6w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Sumit Semwal,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b, James E.J. Bottomley,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Matthew Wilcox,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	sparmaintainer-GLv8BlqOqDDQT0dZR+AlfA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w, Jens Axboe,
	Martin K. Petersen, Greg Kroah-Hartman,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20170427152720.GA7662-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>



On 27/04/17 09:27 AM, Jason Gunthorpe wrote:
> On Thu, Apr 27, 2017 at 08:53:38AM +0200, Christoph Hellwig wrote:
> How about first switching as many call sites as possible to use
> sg_copy_X_buffer instead of kmap?

Yeah, I could look at doing that first.

One problem is that we might get more Naks like Herbert Xu's, from
people concerned about the performance implications.

These are definitely a bit more invasive changes than thin wrappers
around kmap calls.

> A random audit of Logan's series suggests this is actually a fairly
> common thing.

It's not _that_ common but there is a significant fraction. One of my
patches actually did this to two places that seemed to be reimplementing
the sg_copy_X_buffer logic.

Thanks,

Logan

^ permalink raw reply

* Re: [PATCH v6 1/5] skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow
From: David Miller @ 2017-04-27 15:54 UTC (permalink / raw)
  To: Jason; +Cc: netdev, linux-kernel, David.Laight, kernel-hardening
In-Reply-To: <CAHmME9qDmcvzF_xeaxegC2RpBOs8PziJOaKEqv6Z_X1pUFbR0w@mail.gmail.com>

From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Thu, 27 Apr 2017 11:21:51 +0200

> Hey Dave,
> 
> David Laight and I have been discussing offlist. It occurred to both
> of us that this could just be turned into a loop because perhaps this
> is actually just tail-recursive. Upon further inspection, however, the
> way the current algorithm works, it's possible that each of the
> fraglist skbs has its own fraglist, which would make this into tree
> recursion, which is why in the first place I wanted to place that
> limit on it. If that's the case, then the patch I proposed above is
> the best way forward. However, perhaps there's the chance that
> fraglist skbs having separate fraglists are actually forbidden? Is
> this the case? Are there other parts of the API that enforce this
> contract? Is it something we could safely rely on here? If you say
> yes, I'll send a v7 that makes this into a non-recursive loop.

As Sabrina showed, it can happen.  There are no such restrictions on
the geometry of an SKB.
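
To illustrate why this is tree recursion rather than a simple loop, here
is a small userspace sketch with a depth limit that returns -EMSGSIZE.
It is not the patch itself: struct mock_skb, count_sg_entries() and the
MAX_DEPTH value are assumptions chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_DEPTH 24 /* illustrative limit, chosen for this sketch */

/* Mock skb: each fraglist member may carry a fraglist of its own. */
struct mock_skb {
	int nr_frags;                /* page fragments held directly */
	struct mock_skb *frag_list;  /* first skb on this skb's fraglist */
	struct mock_skb *next;       /* next sibling on the parent's fraglist */
};

/*
 * Count the scatterlist entries needed for skb and everything below it.
 * Each fraglist member is visited recursively, so the walk is a tree
 * traversal; the depth check bounds stack usage.
 */
static int count_sg_entries(const struct mock_skb *skb, unsigned int depth)
{
	const struct mock_skb *child;
	int total;

	if (depth >= MAX_DEPTH)
		return -EMSGSIZE;

	total = skb->nr_frags + 1; /* +1 for the linear data area */

	for (child = skb->frag_list; child; child = child->next) {
		int ret = count_sg_entries(child, depth + 1);

		if (ret < 0)
			return ret; /* propagate the overflow upwards */
		total += ret;
	}
	return total;
}
```

Because a child may have both siblings and its own fraglist, the walk
cannot be flattened into a single tail-recursive loop without an
explicit stack.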

^ permalink raw reply

* Re: rhashtable - Cap total number of entries to 2^31
From: David Miller @ 2017-04-27 15:48 UTC (permalink / raw)
  To: herbert; +Cc: fw, netdev, tgraf
In-Reply-To: <20170427054451.GA529@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 27 Apr 2017 13:44:51 +0800

> When max_size is not set or if it is set to a sufficiently large
> value, the nelems counter can overflow.  This would cause havoc
> with the automatic shrinking as it would then attempt to fit a
> huge number of entries into a tiny hash table.
> 
> This patch fixes this by adding max_elems to struct rhashtable
> to cap the number of elements.  This is set to 2^31 as nelems is
> not a precise count.  This is sufficiently smaller than UINT_MAX
> that it should be safe.
> 
> When max_size is set max_elems will be lowered to at most twice
> max_size as is the status quo.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied to net-next, thanks Herbert.
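
The capping rule described in the commit message can be sketched as
below. This is a userspace illustration under stated assumptions:
compute_max_elems() and may_insert() are hypothetical names, and the
real rhashtable code may structure the check differently.

```c
#include <assert.h>
#include <limits.h>

/*
 * Default cap is 2^31 elements. When max_size is set, the cap is
 * lowered to twice max_size, as long as that stays below the default.
 */
static unsigned int compute_max_elems(unsigned int max_size)
{
	unsigned int max_elems = 1u << 31;

	/* max_size < 2^30 guarantees max_size * 2 cannot overflow. */
	if (max_size && max_size < max_elems / 2)
		max_elems = max_size * 2;

	return max_elems;
}

/* Refuse inserts once the element counter reaches the cap. */
static int may_insert(unsigned int nelems, unsigned int max_elems)
{
	return nelems < max_elems;
}
```

Keeping the cap at 2^31 leaves a wide margin below UINT_MAX, which
matters because nelems is described as an imprecise count.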

^ permalink raw reply

* Re: [PATCH net-next] tcp: tcp_rack_reo_timeout() must update tp->tcp_mstamp
From: David Miller @ 2017-04-27 15:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, soheil, ncardwell, ycheng
In-Reply-To: <1493266255.6453.103.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 26 Apr 2017 21:10:55 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> I wrongly assumed tp->tcp_mstamp was up to date at the time
> tcp_rack_reo_timeout() was called.
> 
> It is not true, since we only update tp->tcp_mstamp when receiving
> a packet (as initially done in commit 69e996c58a35 ("tcp: add
> tp->tcp_mstamp field")).
> 
> tcp_rack_reo_timeout() being called by a timer and not an incoming
> packet, we need to refresh tp->tcp_mstamp
> 
> Fixes: 7c1c7308592f ("tcp: do not pass timestamp to tcp_rack_detect_loss()")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.
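
The stale-timestamp problem described in the commit message can be
shown with a small userspace sketch. None of this is the TCP code:
struct mock_tp, its fields and the mock_* functions are invented for the
illustration, and time is passed in explicitly to keep it deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Mock of the two fields this sketch needs; not the real struct tcp_sock. */
struct mock_tp {
	uint64_t mstamp_us;     /* cached "now", playing the role of tp->tcp_mstamp */
	uint64_t oldest_tx_us;  /* send time of the oldest unacked packet */
};

/* Loss check that trusts the cached timestamp. */
static int mock_detect_loss(const struct mock_tp *tp, uint64_t reo_wnd_us)
{
	return tp->mstamp_us - tp->oldest_tx_us > reo_wnd_us;
}

/* Packet path: the cached time is refreshed on every incoming packet. */
static int mock_on_packet(struct mock_tp *tp, uint64_t now_us,
			  uint64_t reo_wnd_us)
{
	tp->mstamp_us = now_us;
	return mock_detect_loss(tp, reo_wnd_us);
}

/*
 * Timer path: no packet arrived, so the cached time may be old.
 * Refreshing it first is the step the fix adds.
 */
static int mock_on_reo_timeout(struct mock_tp *tp, uint64_t now_us,
			       uint64_t reo_wnd_us)
{
	tp->mstamp_us = now_us;
	return mock_detect_loss(tp, reo_wnd_us);
}
```

Calling mock_detect_loss() directly from the timer, without the
refresh, would measure elapsed time only up to the last received packet.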

^ permalink raw reply

* Re: [PATCH v2 07/21] crypto: shash, caam: Make use of the new sg_map helper function
From: Logan Gunthorpe @ 2017-04-27 15:45 UTC (permalink / raw)
  To: Herbert Xu
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b, James E.J. Bottomley,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sumit Semwal,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	sparmaintainer-GLv8BlqOqDDQT0dZR+AlfA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w, Jens Axboe,
	Martin K. Petersen, netdev-u79uwXL29TY76Z2rM5mHXA, Matthew Wilcox,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman,
	David S. Miller
In-Reply-To: <20170427035603.GA32212-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>



On 26/04/17 09:56 PM, Herbert Xu wrote:
> On Tue, Apr 25, 2017 at 12:20:54PM -0600, Logan Gunthorpe wrote:
>> Very straightforward conversion to the new function in the caam driver
>> and shash library.
>>
>> Signed-off-by: Logan Gunthorpe <logang-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>
>> Cc: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
>> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>> ---
>>  crypto/shash.c                | 9 ++++++---
>>  drivers/crypto/caam/caamalg.c | 8 +++-----
>>  2 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/crypto/shash.c b/crypto/shash.c
>> index 5e31c8d..5914881 100644
>> --- a/crypto/shash.c
>> +++ b/crypto/shash.c
>> @@ -283,10 +283,13 @@ int shash_ahash_digest(struct ahash_request *req, struct shash_desc *desc)
>>  	if (nbytes < min(sg->length, ((unsigned int)(PAGE_SIZE)) - offset)) {
>>  		void *data;
>>  
>> -		data = kmap_atomic(sg_page(sg));
>> -		err = crypto_shash_digest(desc, data + offset, nbytes,
>> +		data = sg_map(sg, 0, SG_KMAP_ATOMIC);
>> +		if (IS_ERR(data))
>> +			return PTR_ERR(data);
>> +
>> +		err = crypto_shash_digest(desc, data, nbytes,
>>  					  req->result);
>> -		kunmap_atomic(data);
>> +		sg_unmap(sg, data, 0, SG_KMAP_ATOMIC);
>>  		crypto_yield(desc->flags);
>>  	} else
>>  		err = crypto_shash_init(desc) ?:
> 
> Nack.  This is an optimisation for the special case of a single
> SG list entry.  In fact in the common case the kmap_atomic should
> disappear altogether in the no-highmem case.  So replacing it
> with sg_map is not acceptable.

What you seem to have missed is that sg_map is just a thin wrapper
around kmap_atomic, perhaps with a future check for a mappable page.
This change should have zero impact on performance.
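
A rough userspace sketch of what "thin wrapper with a mappable-page
check" could mean is given below. It is purely illustrative: the mock
types, the simplified one-argument mock_sg_map(), the is_iomem flag and
the -EINVAL error code are all assumptions, and the error-pointer
helpers only imitate the kernel's ERR_PTR convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mock page and scatterlist entry; not the kernel structures. */
struct mock_page {
	unsigned char *data;  /* stands in for the kmap'ed address */
	int is_iomem;         /* hypothetical "not CPU-mappable" marker */
};

struct mock_sg {
	struct mock_page *page;
	unsigned int offset;
	unsigned int length;
};

/* Minimal imitation of the kernel's error-pointer helpers. */
static void *mock_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int mock_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

static long mock_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/*
 * Thin wrapper: in the common case it returns the mapped address plus
 * the entry's offset, which is why callers no longer add the offset
 * themselves. The only extra work is the mappability check.
 */
static void *mock_sg_map(const struct mock_sg *sg)
{
	if (sg->page->is_iomem)
		return mock_err_ptr(-EINVAL); /* error code is an assumption */

	return sg->page->data + sg->offset;
}
```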

Logan

^ permalink raw reply

* Re: [PATCH v2 01/21] scatterlist: Introduce sg_map helper functions
From: Logan Gunthorpe @ 2017-04-27 15:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dri-devel, Stephen Bates, dm-devel, target-devel, Sumit Semwal,
	devel, James E.J. Bottomley, linux-scsi, linux-nvdimm, linux-rdma,
	Ross Zwisler, open-iscsi, linux-media, intel-gfx, sparmaintainer,
	linux-raid, Dan Williams, megaraidlinux.pdl, Jens Axboe,
	Martin K. Petersen, netdev, Matthew Wilcox, linux-mmc,
	linux-kernel, linux-crypto, Greg Kroah-Hartman
In-Reply-To: <20170427065338.GA20677@lst.de>



On 27/04/17 12:53 AM, Christoph Hellwig wrote:
> I think you'll need to follow the existing kmap semantics and never
> fail the iomem version either.  Otherwise you'll have a special case
> that's almost never used that has a different error path.
>
> Again, wrong way.  Suddenly making things fail for your special case
> that normally don't fail is a recipe for bugs.

I don't disagree, but don't these restrictions make the problem impossible
to solve? If there is iomem behind a page in an SGL and someone tries to
map it, we either have to fail or we break iomem safety which was your
original concern.

Logan

^ permalink raw reply

* Re: Realtek RTL8101E PCI-E ethernet controller does not work with the r8169 driver
From: Piotr Gabriel Kosinski @ 2017-04-27 15:33 UTC (permalink / raw)
  To: netdev
In-Reply-To: <CAFMLSdOBwO4zWZo48EymSGw=4i5DSDfc4VjrxBjWaM5h5DBknw@mail.gmail.com>

00:00.0 Host bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx
Series SoC Transaction Register (rev 0a)
    Subsystem: Toshiba America Info Systems Atom Processor
Z36xxx/Z37xxx Series SoC Transaction Register
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Kernel driver in use: iosf_mbi_pci

00:02.0 VGA compatible controller: Intel Corporation Atom Processor
Z36xxx/Z37xxx Series Graphics & Display (rev 0a) (prog-if 00 [VGA
controller])
    Subsystem: Toshiba America Info Systems Atom Processor
Z36xxx/Z37xxx Series Graphics & Display
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 90
    Region 0: Memory at c0000000 (32-bit, non-prefetchable) [size=4M]
    Region 2: Memory at d0000000 (32-bit, prefetchable) [size=256M]
    Region 4: I/O ports at f080 [size=8]
    [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: [d0] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Address: fee0200c  Data: 41d1
    Capabilities: [b0] Vendor Specific Information: Len=07 <?>
    Kernel driver in use: i915
    Kernel modules: i915

00:13.0 SATA controller: Intel Corporation Atom Processor E3800 Series
SATA AHCI Controller (rev 0a) (prog-if 01 [AHCI 1.0])
    Subsystem: Toshiba America Info Systems Atom Processor E3800
Series SATA AHCI Controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 88
    Region 0: I/O ports at f070 [size=8]
    Region 1: I/O ports at f060 [size=4]
    Region 2: I/O ports at f050 [size=8]
    Region 3: I/O ports at f040 [size=4]
    Region 4: I/O ports at f020 [size=32]
    Region 5: Memory at c0815000 (32-bit, non-prefetchable) [size=2K]
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Address: fee0200c  Data: 41b1
    Capabilities: [70] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004
    Kernel driver in use: ahci
    Kernel modules: ahci

00:14.0 USB controller: Intel Corporation Atom Processor
Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI (rev 0a) (prog-if 30
[XHCI])
    Subsystem: Toshiba America Info Systems Atom Processor
Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 87
    Region 0: Memory at c0800000 (64-bit, non-prefetchable) [size=64K]
    Capabilities: [70] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA
PME(D0-,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
        Address: 00000000fee0200c  Data: 4191
    Kernel driver in use: xhci_hcd

00:1a.0 Encryption controller: Intel Corporation Atom Processor
Z36xxx/Z37xxx Series Trusted Execution Engine (rev 0a)
    Subsystem: Toshiba America Info Systems Atom Processor
Z36xxx/Z37xxx Series Trusted Execution Engine
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 91
    Region 0: Memory at c0500000 (32-bit, non-prefetchable) [size=1M]
    Region 1: Memory at c0400000 (32-bit, non-prefetchable) [size=1M]
    Capabilities: [80] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Address: fee0300c  Data: 41e1
    Kernel driver in use: mei_txe
    Kernel modules: mei_txe

00:1b.0 Audio device: Intel Corporation Atom Processor Z36xxx/Z37xxx
Series High Definition Audio Controller (rev 0a)
    Subsystem: Toshiba America Info Systems Atom Processor
Z36xxx/Z37xxx Series High Definition Audio Controller
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 92
    Region 0: Memory at c0810000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee0300c  Data: 4142
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel

00:1c.0 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
Express Root Port 1 (rev 0a) (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    I/O behind bridge: 00001000-00001fff
    Memory behind bridge: c0900000-c0afffff
    Prefetchable memory behind bridge: 00000000c0b00000-00000000c0cfffff
    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
    BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
    Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
        DevCap:    MaxPayload 128 bytes, PhantFunc 0
            ExtTag- RBE+
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap:    Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit
Latency L0s <512ns, L1 <4us
            ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+
DLActive- BWMgmt- ABWMgmt-
        SltCap:    AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+
            Slot #0, PowerLimit 10.000W; Interlock- NoCompl+
        SltCtl:    Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
HPIrq- LinkChg-
            Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
        SltSta:    Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
            Changed: MRL- PresDet- LinkState-
        RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
        RootCap: CRSVisible-
        RootSta: PME ReqID 0000, PMEStatus- PMEPending-
        DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF
Not Supported ARIFwd-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
OBFF Disabled ARIFwd-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB,
EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
        Address: 00000000  Data: 0000
    Capabilities: [90] Subsystem: Toshiba America Info Systems Atom
Processor E3800 Series PCI Express Root Port 1
    Capabilities: [a0] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: pcieport
    Kernel modules: shpchp

00:1c.1 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
Express Root Port 2 (rev 0a) (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin B routed to IRQ 17
    Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
    I/O behind bridge: 00002000-00002fff
    Memory behind bridge: c0700000-c07fffff
    Prefetchable memory behind bridge: 00000000c0d00000-00000000c0efffff
    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
    BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
    Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
        DevCap:    MaxPayload 128 bytes, PhantFunc 0
            ExtTag- RBE+
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap:    Port #2, Speed 5GT/s, Width x1, ASPM L0s L1, Exit
Latency L0s <512ns, L1 <4us
            ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
DLActive+ BWMgmt+ ABWMgmt-
        SltCap:    AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+
            Slot #1, PowerLimit 10.000W; Interlock- NoCompl+
        SltCtl:    Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
HPIrq- LinkChg-
            Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
        SltSta:    Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
            Changed: MRL- PresDet- LinkState-
        RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
        RootCap: CRSVisible-
        RootSta: PME ReqID 0000, PMEStatus- PMEPending-
        DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF
Not Supported ARIFwd-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-,
OBFF Disabled ARIFwd-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB,
EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
        Address: 00000000  Data: 0000
    Capabilities: [90] Subsystem: Toshiba America Info Systems Atom
Processor E3800 Series PCI Express Root Port 2
    Capabilities: [a0] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: pcieport
    Kernel modules: shpchp

00:1c.2 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
Express Root Port 3 (rev 0a) (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin C routed to IRQ 18
    Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
    I/O behind bridge: 0000e000-0000efff
    Memory behind bridge: c0600000-c06fffff
    Prefetchable memory behind bridge: 00000000e0000000-00000000e00fffff
    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
    BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
    Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
        DevCap:    MaxPayload 128 bytes, PhantFunc 0
            ExtTag- RBE+
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap:    Port #3, Speed 5GT/s, Width x1, ASPM L0s L1, Exit
Latency L0s <512ns, L1 <4us
            ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
DLActive+ BWMgmt+ ABWMgmt-
        SltCap:    AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+
            Slot #2, PowerLimit 10.000W; Interlock- NoCompl+
        SltCtl:    Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
HPIrq- LinkChg-
            Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
        SltSta:    Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
            Changed: MRL- PresDet- LinkState-
        RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
        RootCap: CRSVisible-
        RootSta: PME ReqID 0000, PMEStatus- PMEPending-
        DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF
Not Supported ARIFwd-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-,
OBFF Disabled ARIFwd-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB,
EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
        Address: 00000000  Data: 0000
    Capabilities: [90] Subsystem: Toshiba America Info Systems Atom
Processor E3800 Series PCI Express Root Port 3
    Capabilities: [a0] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: pcieport
    Kernel modules: shpchp

00:1f.0 ISA bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx
Series Power Control Unit (rev 0a)
    Subsystem: Toshiba America Info Systems Atom Processor
Z36xxx/Z37xxx Series Power Control Unit
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Capabilities: [e0] Vendor Specific Information: Len=0c <?>
    Kernel driver in use: lpc_ich
    Kernel modules: lpc_ich

00:1f.3 SMBus: Intel Corporation Atom Processor E3800 Series SMBus
Controller (rev 0a)
    Subsystem: Toshiba America Info Systems Atom Processor E3800
Series SMBus Controller
    Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin B routed to IRQ 255
    Region 0: Memory at c0814000 (32-bit, non-prefetchable) [size=32]
    Region 4: I/O ports at f000 [size=32]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Kernel modules: i2c_i801

02:00.0 Network controller: Qualcomm Atheros QCA9565 / AR9565 Wireless
Network Adapter (rev 01)
    Subsystem: XAVi Technologies Corp. QCA9565 / AR9565 Wireless Network Adapter
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 17
    Region 0: Memory at c0700000 (64-bit, non-prefetchable) [size=512K]
    Expansion ROM at c0780000 [disabled] [size=64K]
    Capabilities: [40] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable- Count=1/4 Maskable+ 64bit+
        Address: 0000000000000000  Data: 0000
        Masking: 00000000  Pending: 00000000
    Capabilities: [70] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
unlimited, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
        LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
Latency L0s <4us, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-,
OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-,
OBFF Disabled
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap:    First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [140 v1] Virtual Channel
        Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:    ArbSelect=Fixed
        Status:    InProgress-
        VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status:    NegoPending- InProgress-
    Capabilities: [160 v1] Device Serial Number 00-00-00-00-00-00-00-00
    Kernel driver in use: ath9k
    Kernel modules: ath9k

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller (rev 07)
    Subsystem: Toshiba America Info Systems RTL8101/2/6E PCI Express
Fast/Gigabit Ethernet controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 89
    Region 0: I/O ports at e000 [size=256]
    Region 2: Memory at e0004000 (64-bit, prefetchable) [size=4K]
    Region 4: Memory at e0000000 (64-bit, prefetchable) [size=16K]
    Expansion ROM at c0600000 [disabled] [size=64K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee0300c  Data: 41c1
    Capabilities: [70] Express (v2) Endpoint, MSI 01
        DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
unlimited, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
        LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
Latency L0s unlimited, L1 <64us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
OBFF Via message/WAKE#
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR+,
OBFF Disabled
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
        Vector table: BAR=4 offset=00000000
        PBA: BAR=4 offset=00000800
    Capabilities: [d0] Vital Product Data
        Not readable
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap:    First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Capabilities: [140 v1] Virtual Channel
        Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:    ArbSelect=Fixed
        Status:    InProgress-
        VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status:    NegoPending- InProgress-
    Capabilities: [160 v1] Device Serial Number 00-00-00-00-00-00-00-00
    Capabilities: [170 v1] Latency Tolerance Reporting
        Max snoop latency: 71680ns
        Max no snoop latency: 71680ns
    Kernel driver in use: r8169
    Kernel modules: r8169




* Realtek RTL8101E PCI-E ethernet controller does not work with the r8169 driver
From: Piotr Gabriel Kosinski @ 2017-04-27 15:31 UTC (permalink / raw)
  To: netdev

The Realtek RTL8101E PCI Express ethernet controller does not work
with the r8169 driver, on kernel 4.10 (it also did not work on earlier
kernels). These are the messages from the kernel log:

[   11.359593] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[   11.359611] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
[   11.359803] r8169 0000:03:00.0 (unnamed net_device) (uninitialized): unknown MAC, using family default
[   11.369802] r8169 0000:03:00.0 (unnamed net_device) (uninitialized): rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
[   11.406668] r8169 0000:03:00.0 eth0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   11.406815] r8169 0000:03:00.0 eth0: RTL8101e at 0xffffb3140068d000, ff:ff:ff:ff:ff:ff, XID 9cf0f8ff IRQ 89
[   11.482740] r8169 0000:03:00.0 enp3s0: renamed from eth0
[   11.501187] r8169 0000:03:00.0 enp3s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   33.348246] r8169 0000:03:00.0 enp3s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.273602] r8169 0000:03:00.0 enp3s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   35.195420] r8169 0000:03:00.0 enp3s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   35.678908] r8169 0000:03:00.0 enp3s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   36.401291] IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
[   36.411989] r8169 0000:03:00.0 enp3s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).

The rtl_counters_cond message then keeps repeating at regular intervals.

ifconfig reports the ether address as ff:ff:ff:ff:ff:ff.


In case it helps: the controller works with FreeBSD's re driver,
but only when the following two loader options are set:
hw.re.msix_disable=1
hw.re.prefer_iomap=1

Both of these options must be set to 1. If prefer_iomap is not set,
there are messages like:
re0: reset never completed!
re0: PHY write failed
re0: attaching PHYs failed
device_attach: re0 attach returned 6

If msix_disable is not set (MSI-X is enabled), there are no error
messages and the MAC address is read correctly, but the controller does
not work (no connection can be established, e.g. DHCP fails).

If both of the above options are set to 1, the controller works fine,
with the following messages:
re0: <RealTek 810xE PCIe 10/100baseTX> port 0xe000-0xe0ff mem 0xe0004000-0xe0004fff,0xe0000000-0xe0003fff at device 0.0 on pci2
re0: Using 1 MSI message
re0: Chip rev. 0x44800000
re0: MAC rev. 0x00100000
miibus0: <MII bus> on re0
rlphy0: <RTL8201E 10/100 media interface> PHY 1 on miibus0
rlphy0: 10baseT, 100baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
re0: Using defaults for TSO: 65518/35/2048
re0: netmap queues/slots: TX 1/256, RX 1/256
re0: link state changed to DOWN
re0: link state changed to UP

The problem occurs, for example, on the Intel Atom-based Toshiba
NB10-A-104 netbook. I will attach the lspci -vvv output in the next mail.

Thank you,
Piotr.


* [PATCH net-next] rhashtable: compact struct rhashtable_params
From: Florian Westphal @ 2017-04-27 15:28 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

By using smaller datatypes this struct shrinks considerably
(80 -> 48 bytes on x86_64).

Because this struct is embedded in other structs, the change also reduces
the size of several of them, e.g. cls_fl_head and nft_hash.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/rhashtable.h | 18 +++++++++---------
 lib/rhashtable.c           |  2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 8e628feb8708..77cf43166383 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -127,23 +127,23 @@ struct rhashtable;
  * @head_offset: Offset of rhash_head in struct to be hashed
  * @max_size: Maximum size while expanding
  * @min_size: Minimum size while shrinking
- * @nulls_base: Base value to generate nulls marker
- * @automatic_shrinking: Enable automatic shrinking of tables
  * @locks_mul: Number of bucket locks to allocate per cpu (default: 128)
+ * @automatic_shrinking: Enable automatic shrinking of tables
+ * @nulls_base: Base value to generate nulls marker
  * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
  * @obj_hashfn: Function to hash object
  * @obj_cmpfn: Function to compare key with object
  */
 struct rhashtable_params {
-	size_t			nelem_hint;
-	size_t			key_len;
-	size_t			key_offset;
-	size_t			head_offset;
+	u16			nelem_hint;
+	u16			key_len;
+	u16			key_offset;
+	u16			head_offset;
 	unsigned int		max_size;
-	unsigned int		min_size;
-	u32			nulls_base;
+	u16			min_size;
 	bool			automatic_shrinking;
-	size_t			locks_mul;
+	u8			locks_mul;
+	u32			nulls_base;
 	rht_hashfn_t		hashfn;
 	rht_obj_hashfn_t	obj_hashfn;
 	rht_obj_cmpfn_t		obj_cmpfn;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 1fd7986aa9b4..4462676a26f2 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -961,7 +961,7 @@ int rhashtable_init(struct rhashtable *ht,
 	if (params->max_size)
 		ht->p.max_size = rounddown_pow_of_two(params->max_size);
 
-	ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
+	ht->p.min_size = max_t(u16, ht->p.min_size, HASH_MIN_SIZE);
 
 	if (params->nelem_hint)
 		size = rounded_hashtable_size(&ht->p);
-- 
2.10.2
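
The size change quoted in the commit message can be reproduced outside the
kernel with a small userspace sketch. The two structs below copy the field
order and widths from the diff; the names params_old, params_new, demo_fn_t
and print_param_sizes are made up for illustration, and a single placeholder
function-pointer type stands in for rht_hashfn_t, rht_obj_hashfn_t and
rht_obj_cmpfn_t (only the pointer size matters here). The 80 and 48 byte
figures are expected only on an LP64 target such as x86_64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder for the three rht_* callback typedefs (assumption:
 * each is an ordinary function pointer, so one type is enough). */
typedef uint32_t (*demo_fn_t)(const void *data, uint32_t len, uint32_t seed);

/* Field order and types before the patch. */
struct params_old {
	size_t		nelem_hint;
	size_t		key_len;
	size_t		key_offset;
	size_t		head_offset;
	unsigned int	max_size;
	unsigned int	min_size;
	uint32_t	nulls_base;
	bool		automatic_shrinking;
	size_t		locks_mul;	/* forces padding after the bool */
	demo_fn_t	hashfn;
	demo_fn_t	obj_hashfn;
	demo_fn_t	obj_cmpfn;
};

/* Field order and types after the patch. */
struct params_new {
	uint16_t	nelem_hint;
	uint16_t	key_len;
	uint16_t	key_offset;
	uint16_t	head_offset;
	unsigned int	max_size;
	uint16_t	min_size;
	bool		automatic_shrinking;
	uint8_t		locks_mul;	/* fills the byte next to the bool */
	uint32_t	nulls_base;
	demo_fn_t	hashfn;
	demo_fn_t	obj_hashfn;
	demo_fn_t	obj_cmpfn;
};

/* Holds on any common ABI: narrower fields cannot make the struct larger. */
static_assert(sizeof(struct params_new) < sizeof(struct params_old),
	      "compacted layout should be smaller");

/* Print both sizes and where the first callback pointer ends up. */
void print_param_sizes(void)
{
	printf("old: %zu bytes, new: %zu bytes\n",
	       sizeof(struct params_old), sizeof(struct params_new));
	printf("hashfn offset: old %zu, new %zu\n",
	       offsetof(struct params_old, hashfn),
	       offsetof(struct params_new, hashfn));
}
```

Calling print_param_sizes() from a test program shows the two sizes side by
side. Most of the saving comes from the four size_t fields; moving locks_mul
next to the bool also removes the padding that the old layout needed before
its 8-byte locks_mul.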


* RE: [PATCH net-next 8/9] ipvlan: improve compiler hints
From: David Laight @ 2017-04-27 15:27 UTC (permalink / raw)
  To: 'Duyck, Alexander H', Chiappero, Marco,
	netdev@vger.kernel.org
  Cc: David S . Miller, Kirsher, Jeffrey T, Grandhi, Sainath,
	Mahesh Bandewar
In-Reply-To: <B1C1DF2ACD01FD4881736AA51731BAB2B2EFB2@ORSMSX107.amr.corp.intel.com>

From: Duyck, Alexander H
> Sent: 27 April 2017 16:21
...
> > -unsigned int ipvlan_mac_hash(const unsigned char *addr)
> > +inline unsigned int ipvlan_mac_hash(const unsigned char *addr)
> >  {
> >  	u32 hash = jhash_1word(__get_unaligned_cpu32(addr + 2),
> >  			       ipvlan_jhash_secret);
> 
> I'm kind of surprised this isn't causing a problem with differing declarations between the declaration
> here and the declaration in ipvlan.h. Normally for inlining something like this you would change it to
> a "static inline" and move the entire declaration into the header file.

You get a callable copy for external callers and local calls inlined.
Not usually what you want.

	David
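
For reference, the "static inline in the header" arrangement mentioned in
the quoted review could look roughly like the sketch below. It is a
userspace illustration only, not the ipvlan code: demo_hash_1word is a
made-up stand-in for jhash_1word(), memcpy replaces
__get_unaligned_cpu32(), the secret is passed as a parameter instead of
being a global, and the 0xff mask on the result is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's jhash_1word(): a simple
 * multiplicative mix so the sketch has no kernel dependencies. */
static inline uint32_t demo_hash_1word(uint32_t word, uint32_t seed)
{
	uint32_t h = (word ^ seed) * 0x9e3779b1u;

	return h ^ (h >> 16);
}

/* Would live in the shared header. Each .c file that includes it gets
 * its own inlinable copy, and no external symbol is emitted. */
static inline unsigned int demo_mac_hash(const unsigned char *addr,
					 uint32_t secret)
{
	uint32_t word;

	assert(addr != NULL);
	/* Alignment-safe load of MAC bytes 2..5. */
	memcpy(&word, addr + 2, sizeof(word));
	return demo_hash_1word(word, secret) & 0xffu;	/* mask is assumed */
}
```

With this layout there is no separate out-of-line definition in a .c file,
so the declaration and the definition cannot drift apart, and every caller
is a candidate for inlining.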


