Netdev List

Netdev List
 help / color / mirror / Atom feed

* [GIT PULL 00/11] perf/core improvements and fixes
From: Arnaldo Carvalho de Melo @ 2018-05-16 14:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Clark Williams, linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, Adrian Hunter, Agustin Vega-Frias,
	Alexander Shishkin, Andi Kleen, Andy Lutomirski, Daniel Borkmann,
	Dave Hansen, David Ahern, Ganapatrao Kulkarni, H . Peter Anvin,
	Jin Yao, Jiri Olsa, Joerg Roedel, Kan Liang, Masami Hiramatsu,
	Namhyung Kim <namhyun

Hi Ingo,

	Please consider pulling, more to come as I go thru Adrian's x86
PTI series and the C++ support improvements to 'perf probe', from
Holger,

Best Regards,

- Arnaldo

Test results at the end of this message, as usual.
  
The following changes since commit 291c161f6c65530092903fbea58eb07a62b220ba:

  Merge remote-tracking branch 'tip/perf/urgent' into perf/core (2018-05-15 10:30:17 -0300)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.18-20180516

for you to fetch changes up to 7a36a287de9fbb1ba906e70573d3f2315f7fd609:

  perf bpf: Fix NULL return handling in bpf__prepare_load() (2018-05-16 10:01:55 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

- Add '-e intel_pt//u' test to the 'parse-events' 'perf test' entry,
  to help avoiding regressions in the events parser such as one
  that caused a revert in v4.17-rc (Arnaldo Carvalho de Melo)

- Fix NULL return handling in bpf__prepare_load() (YueHaibing)

- Warn about 'perf buildid-cache --purge-all' failures (Ravi Bangoria)

- Add infrastructure to help in writing eBPF C programs to be used
  with '-e name.c' type events in tools such as 'record' and 'trace',
  with headers for common constructs and an examples directory that
  will get populated as we add more such helpers and the 'perf bpf'
  branch that Jiri Olsa has been working on (Arnaldo Carvalho de Melo)

- Handle uncore event aliases in small groups properly (Kan Liang)

- Use the "_stest" symbol to identify the kernel map when loading kcore (Adrian Hunter)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Adrian Hunter (1):
      perf tools: Use the "_stest" symbol to identify the kernel map when loading kcore

Arnaldo Carvalho de Melo (7):
      perf tests parse-events: Add intel_pt parse test
      perf llvm-utils: Add bpf include path to clang command line
      perf bpf: Add 'examples' directories
      perf bpf: Add bpf.h to be used in eBPF proggies
      perf bpf: Add kprobe example to catch 5s naps
      perf bpf: Add license(NAME) helper
      perf bpf: Add probe() helper to reduce kprobes boilerplate

Kan Liang (1):
      perf parse-events: Handle uncore event aliases in small groups properly

Ravi Bangoria (1):
      perf buildid-cache: Warn --purge-all failures

YueHaibing (1):
      perf bpf: Fix NULL return handling in bpf__prepare_load()

 tools/perf/Makefile.config         |  14 ++++
 tools/perf/Makefile.perf           |   8 +++
 tools/perf/builtin-buildid-cache.c |   8 ++-
 tools/perf/examples/bpf/5sec.c     |  49 ++++++++++++++
 tools/perf/examples/bpf/empty.c    |   3 +
 tools/perf/include/bpf/bpf.h       |  13 ++++
 tools/perf/tests/parse-events.c    |  13 ++++
 tools/perf/util/Build              |   2 +
 tools/perf/util/bpf-loader.c       |   6 +-
 tools/perf/util/evsel.h            |   1 +
 tools/perf/util/llvm-utils.c       |  19 ++++--
 tools/perf/util/parse-events.c     | 130 ++++++++++++++++++++++++++++++++++++-
 tools/perf/util/parse-events.h     |   7 +-
 tools/perf/util/parse-events.y     |   8 +--
 tools/perf/util/symbol.c           |  16 ++---
 15 files changed, 270 insertions(+), 27 deletions(-)
 create mode 100644 tools/perf/examples/bpf/5sec.c
 create mode 100644 tools/perf/examples/bpf/empty.c
 create mode 100644 tools/perf/include/bpf/bpf.h

Test results:

The first ones are container (docker) based builds of tools/perf with
and without libelf support.  Where clang is available, it is also used
to build perf with/without libelf, and building with LIBCLANGLLVM=1
(built-in clang) with gcc and clang when clang and its devel libraries
are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from a http server to
build it inside the container, to make it easier to build in a container cluster.
Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event syscall to check that the perf_event_attr fields are set up as
expected, among a variety of other unit tests.

Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. It is planned to have it run on each
of the containers mentioned above, using some container orchestration
infrastructure. Get in contact if interested in helping having this in place.

  # dm
   1 alpine:3.4                    : Ok   gcc (Alpine 5.3.0) 5.3.0
   2 alpine:3.5                    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822
   3 alpine:3.6                    : Ok   gcc (Alpine 6.3.0) 6.3.0
   4 alpine:3.7                    : Ok   gcc (Alpine 6.4.0) 6.4.0
   5 alpine:edge                   : Ok   gcc (Alpine 6.4.0) 6.4.0
   6 amazonlinux:1                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
   7 amazonlinux:2                 : Ok   gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
   8 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   9 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  10 centos:5                      : Ok   gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  11 centos:6                      : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
  12 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
  13 debian:7                      : Ok   gcc (Debian 4.7.2-5) 4.7.2
  14 debian:8                      : Ok   gcc (Debian 4.9.2-10+deb8u1) 4.9.2
  15 debian:9                      : Ok   gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
  16 debian:experimental           : Ok   gcc (Debian 7.3.0-18) 7.3.0
  17 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 7.3.0-18) 7.3.0
  18 debian:experimental-x-mips    : Ok   mips-linux-gnu-gcc (Debian 7.3.0-18) 7.3.0
  19 debian:experimental-x-mips64  : Ok   mips64-linux-gnuabi64-gcc (Debian 7.3.0-18) 7.3.0
  20 debian:experimental-x-mipsel  : Ok   mipsel-linux-gnu-gcc (Debian 7.3.0-18) 7.3.0
  21 fedora:20                     : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
  22 fedora:21                     : Ok   gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
  23 fedora:22                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  24 fedora:23                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  25 fedora:24                     : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
  26 fedora:24-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
  27 fedora:25                     : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
  28 fedora:26                     : Ok   gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)
  29 fedora:27                     : Ok   gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
  30 fedora:28                     : Ok   gcc (GCC) 8.1.1 20180502 (Red Hat 8.1.1-1)
  31 fedora:rawhide                : Ok   gcc (GCC) 8.0.1 20180324 (Red Hat 8.0.1-0.20)
  32 gentoo-stage3-amd64:latest    : Ok   gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0
  33 mageia:5                      : Ok   gcc (GCC) 4.9.2
  34 mageia:6                      : Ok   gcc (Mageia 5.5.0-1.mga6) 5.5.0
  35 opensuse:42.1                 : Ok   gcc (SUSE Linux) 4.8.5
  36 opensuse:42.2                 : Ok   gcc (SUSE Linux) 4.8.5
  37 opensuse:42.3                 : Ok   gcc (SUSE Linux) 4.8.5
  38 opensuse:tumbleweed           : Ok   gcc (SUSE Linux) 7.3.1 20180323 [gcc-7-branch revision 258812]
  39 oraclelinux:6                 : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18.0.7)
  40 oraclelinux:7                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28.0.1)
  41 ubuntu:12.04.5                : Ok   gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
  42 ubuntu:14.04.4                : Ok   gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
  43 ubuntu:14.04.4-x-linaro-arm64 : Ok   aarch64-linux-gnu-gcc (Linaro GCC 5.5-2017.10) 5.5.0
  44 ubuntu:16.04                  : Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  45 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  46 ubuntu:16.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  47 ubuntu:16.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  48 ubuntu:16.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  49 ubuntu:16.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  50 ubuntu:16.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  51 ubuntu:16.10                  : Ok   gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
  52 ubuntu:17.04                  : Ok   gcc (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406
  53 ubuntu:17.10                  : Ok   gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0
  54 ubuntu:18.04                  : Ok   gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
  #

  # uname -a
  Linux jouet 4.17.0-rc5 #21 SMP Mon May 14 15:35:35 -03 2018 x86_64 x86_64 x86_64 GNU/Linux
  # perf test
   1: vmlinux symtab matches kallsyms                       : Ok
   2: Detect openat syscall event                           : Ok
   3: Detect openat syscall event on all cpus               : Ok
   4: Read samples using the mmap interface                 : Ok
   5: Test data source output                               : Ok
   6: Parse event definition strings                        : Ok
   7: Simple expression parser                              : Ok
   8: PERF_RECORD_* events & perf_sample fields             : Ok
   9: Parse perf pmu format                                 : Ok
  10: DSO data read                                         : Ok
  11: DSO data cache                                        : Ok
  12: DSO data reopen                                       : Ok
  13: Roundtrip evsel->name                                 : Ok
  14: Parse sched tracepoints fields                        : Ok
  15: syscalls:sys_enter_openat event fields                : Ok
  16: Setup struct perf_event_attr                          : Ok
  17: Match and link multiple hists                         : Ok
  18: 'import perf' in python                               : Ok
  19: Breakpoint overflow signal handler                    : Ok
  20: Breakpoint overflow sampling                          : Ok
  21: Breakpoint accounting                                 : Ok
  22: Number of exit events of a simple workload            : Ok
  23: Software clock events period values                   : Ok
  24: Object code reading                                   : Ok
  25: Sample parsing                                        : Ok
  26: Use a dummy software event to keep tracking           : Ok
  27: Parse with no sample_id_all bit set                   : Ok
  28: Filter hist entries                                   : Ok
  29: Lookup mmap thread                                    : Ok
  30: Share thread mg                                       : Ok
  31: Sort output of hist entries                           : Ok
  32: Cumulate child hist entries                           : Ok
  33: Track with sched_switch                               : Ok
  34: Filter fds with revents mask in a fdarray             : Ok
  35: Add fd to a fdarray, making it autogrow               : Ok
  36: kmod_path__parse                                      : Ok
  37: Thread map                                            : Ok
  38: LLVM search and compile                               :
  38.1: Basic BPF llvm compile                              : Ok
  38.2: kbuild searching                                    : Ok
  38.3: Compile source for BPF prologue generation          : Ok
  38.4: Compile source for BPF relocation                   : Ok
  39: Session topology                                      : Ok
  40: BPF filter                                            :
  40.1: Basic BPF filtering                                 : Ok
  40.2: BPF pinning                                         : Ok
  40.3: BPF prologue generation                             : Ok
  40.4: BPF relocation checker                              : Ok
  41: Synthesize thread map                                 : Ok
  42: Remove thread map                                     : Ok
  43: Synthesize cpu map                                    : Ok
  44: Synthesize stat config                                : Ok
  45: Synthesize stat                                       : Ok
  46: Synthesize stat round                                 : Ok
  47: Synthesize attr update                                : Ok
  48: Event times                                           : Ok
  49: Read backward ring buffer                             : Ok
  50: Print cpu map                                         : Ok
  51: Probe SDT events                                      : Ok
  52: is_printable_array                                    : Ok
  53: Print bitmap                                          : Ok
  54: perf hooks                                            : Ok
  55: builtin clang support                                 : Skip (not compiled in)
  56: unit_number__scnprintf                                : Ok
  57: mem2node                                              : Ok
  58: x86 rdpmc                                             : Ok
  59: Convert perf time to TSC                              : Ok
  60: DWARF unwind                                          : Ok
  61: x86 instruction decoder - new instructions            : Ok
  62: Use vfs_getname probe to get syscall args filenames   : Ok
  63: Check open filename arg using perf trace + vfs_getname: Ok
  64: probe libc's inet_pton & backtrace it with ping       : Ok
  65: Add vfs_getname probe to get syscall args filenames   : Ok
  #
  
  $ make -C tools/perf build-test
  make: Entering directory '/home/acme/git/perf/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
        make_with_babeltrace_O: make LIBBABELTRACE=1
                 make_static_O: make LDFLAGS=-static
                   make_tags_O: make tags
                make_install_O: make install
                    make_doc_O: make doc
              make_no_libelf_O: make NO_LIBELF=1
              make_no_libbpf_O: make NO_LIBBPF=1
            make_no_demangle_O: make NO_DEMANGLE=1
              make_clean_all_O: make clean all
               make_no_slang_O: make NO_SLANG=1
             make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
         make_with_clangllvm_O: make LIBCLANGLLVM=1
                make_no_newt_O: make NO_NEWT=1
             make_no_libnuma_O: make NO_LIBNUMA=1
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
           make_no_libbionic_O: make NO_LIBBIONIC=1
             make_no_libperl_O: make NO_LIBPERL=1
           make_no_libpython_O: make NO_LIBPYTHON=1
           make_no_libunwind_O: make NO_LIBUNWIND=1
           make_no_backtrace_O: make NO_BACKTRACE=1
   make_install_prefix_slash_O: make install prefix=/tmp/krava/
                  make_debug_O: make DEBUG=1
             make_util_map_o_O: make util/map.o
                make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
         make_install_prefix_O: make install prefix=/tmp/krava
                   make_pure_O: make
                 make_perf_o_O: make perf.o
                   make_help_O: make help
                make_no_gtk2_O: make NO_GTK2=1
            make_no_libaudit_O: make NO_LIBAUDIT=1
       make_util_pmu_bison_o_O: make util/pmu-bison.o
                  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
            make_install_bin_O: make install-bin
            make_no_auxtrace_O: make NO_AUXTRACE=1
  OK
  make: Leaving directory '/home/acme/git/perf/tools/perf'
  $ 

^ permalink raw reply

* Re: [RFC PATCH bpf-next 12/12] i40e: implement Tx zero-copy
From: Magnus Karlsson @ 2018-05-16 14:38 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Björn Töpel, Karlsson, Magnus, Alexander Duyck,
	Alexander Duyck, John Fastabend, Alexei Starovoitov,
	Willem de Bruijn, Daniel Borkmann, Michael S. Tsirkin,
	Network Development, michael.lundkvist, Brandeburg, Jesse,
	Singhai, Anjali, Zhang, Qi Z, intel-wired-lan
In-Reply-To: <20180516162840.1405afaf@redhat.com>

On Wed, May 16, 2018 at 4:28 PM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> On Tue, 15 May 2018 21:06:15 +0200
> Björn Töpel <bjorn.topel@gmail.com> wrote:
>
>> From: Magnus Karlsson <magnus.karlsson@intel.com>
>>
>> Here, the zero-copy ndo is implemented. As a shortcut, the existing
>> XDP Tx rings are used for zero-copy. This means that and XDP program
>> cannot redirect to an AF_XDP enabled XDP Tx ring.
>
> I've changed i40e1 to only have one queue via:
>  $ ethtool -L i40e1 combined 1
>
> And then, I'm sending on queue 1, which is/should not be avail... and then crash/BUG:
>
> $ sudo taskset -c 2 ./xdpsock --tx --interface=i40e1 --queue=1
>
> [ 3799.936877] Number of in use tx queues changed invalidating tc mappings. Priority traffic
>  classification disabled!
> [ 3799.972970] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [ 3799.980790] PGD 80000007b0992067 P4D 80000007b0992067 PUD 7b62d4067 PMD 0
> [ 3799.987654] Oops: 0002 [#1] PREEMPT SMP PTI
> [ 3799.991831] Modules linked in: nf_nat_masquerade_ipv4 tun nfnetlink bridge stp llc nf_nat
>  nf_conntrack rpcrdma ib_ipoib rdma_ucm ib_ucm ib_umad rdma_cm ib_cm iw_cm sunrpc mlx5_ib ib
> _uverbs ib_core coretemp kvm_intel kvm irqbypass intel_cstate intel_uncore intel_rapl_perf p
> cspkr i2c_i801 shpchp wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad sch_fq_codel i40e ml
> x5_core hid_generic ixgbe igb devlink mdio ptp sd_mod i2c_algo_bit i2c_core pps_core [last u
> nloaded: x_tables]
> [ 3800.033472] CPU: 2 PID: 2006 Comm: xdpsock Not tainted 4.17.0-rc3-af_xdp03_ZC_rfc+ #155
> [ 3800.041465] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0a 08/01/2016
> [ 3800.048943] RIP: 0010:i40e_xmit_frame_ring+0xd4/0x1490 [i40e]
> [ 3800.054683] RSP: 0018:ffffc9000407bcd0 EFLAGS: 00010293
> [ 3800.059900] RAX: 0000000000000000 RBX: ffff88084f0fd200 RCX: 0000000000000000
> [ 3800.067022] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8807b6e710c0
> [ 3800.074148] RBP: ffff8807c6397800 R08: 00000000000000c0 R09: 0000000000000001
> [ 3800.081270] R10: 0000000000000800 R11: 0000000000000010 R12: 0000000000000001
> [ 3800.088396] R13: 0000000000000000 R14: 0000000000000001 R15: 000000000000003c
> [ 3800.095520] FS:  00007f1d1e00bb80(0000) GS:ffff88087fc80000(0000) knlGS:0000000000000000
> [ 3800.103597] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3800.109335] CR2: 0000000000000008 CR3: 000000087d542001 CR4: 00000000003606e0
> [ 3800.116458] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3800.123583] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 3800.130706] Call Trace:
> [ 3800.133157]  ? validate_xmit_skb.isra.116+0x1c/0x270
> [ 3800.138118]  dev_direct_xmit+0xec/0x1d0
> [ 3800.141949]  xsk_sendmsg+0x1f4/0x380
> [ 3800.145521]  sock_sendmsg+0x30/0x40
> [ 3800.149005]  __sys_sendto+0x10e/0x140
> [ 3800.152662]  ? __do_page_fault+0x283/0x500
> [ 3800.156751]  __x64_sys_sendto+0x24/0x30
> [ 3800.160585]  do_syscall_64+0x42/0xf0
> [ 3800.164156]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 3800.169204] RIP: 0033:0x7f1d1d9db430
> [ 3800.172774] RSP: 002b:00007fffb7278610 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
> [ 3800.180333] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1d1d9db430
> [ 3800.187456] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
> [ 3800.194582] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 3800.201705] R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000000
> [ 3800.208830] R13: 0000000000000000 R14: 0000000000755510 R15: 00007f1d0d3fc000
> [ 3800.215953] Code: d0 0f 86 db 05 00 00 01 c8 0f b7 ca 29 c8 83 e8 01 39 c6 0f 8f ea 06 00 00 48 8b 45 28 48 8d 14 92 41 b9 01 00 00 00 4c 8d 2c d0 <49> 89 5d 08 8b 83 80 00 00 00 66 45 89 4d 14 41 89 45 10 0f b7
> [ 3800.234798] RIP: i40e_xmit_frame_ring+0xd4/0x1490 [i40e] RSP: ffffc9000407bcd0
> [ 3800.242005] CR2: 0000000000000008
> [ 3800.245320] ---[ end trace f169e36f468e0c59 ]---
> [ 3801.726719] Kernel panic - not syncing: Fatal exception in interrupt
> [ 3801.733097] Kernel Offset: disabled
> [ 3801.785836] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> [ 3801.793403] ------------[ cut here ]------------
>
> (gdb) list *(i40e_xmit_frame_ring)+0xd4
> 0x2ccd4 is in i40e_xmit_frame_ring (drivers/net/ethernet/intel/i40e/i40e_txrx.c:4048).
> warning: Source file is more recent than executable.
> 4043                    return NETDEV_TX_BUSY;
> 4044            }
> 4045
> 4046            /* record the location of the first descriptor for this packet */
> 4047            first = &tx_ring->tx_bi[tx_ring->next_to_use];
> 4048            first->skb = skb;
> 4049            first->bytecount = skb->len;
> 4050            first->gso_segs = 1;
> 4051
> 4052            /* prepare the xmit flags */
>
>
> (gdb) list *(xsk_sendmsg)+0x1f4
> 0xffffffff81800c34 is in xsk_sendmsg (net/xdp/xsk.c:251).
> warning: Source file is more recent than executable.
> 246                     skb_shinfo(skb)->destructor_arg = (void *)(long)id;
> 247                     skb->destructor = xsk_destruct_skb;
> 248
> 249                     err = dev_direct_xmit(skb, xs->queue_id);
> 250                     /* Ignore NET_XMIT_CN as packet might have been sent */
> 251                     if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) {
> 252                             err = -EAGAIN;
> 253                             /* SKB consumed by dev_direct_xmit() */
> 254                             goto out;
> 255                     }

Thanks Jesper for reporting. I will take a look at it.

/Magnus

^ permalink raw reply

* Re: [PATCH 00/14] Modify action API for implementing lockless actions
From: Roman Mashak @ 2018-05-16 14:38 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Jamal Hadi Salim, Linux Kernel Network Developers, David Miller,
	Cong Wang, Jiri Pirko, pablo, kadlec, fw, ast, Daniel Borkmann,
	Eric Dumazet, kliteyn
In-Reply-To: <vbffu2s14os.fsf@reg-r-vrt-018-180.mtr.labs.mlnx>

On Wed, May 16, 2018 at 2:43 AM, Vlad Buslov <vladbu@mellanox.com> wrote:
>>>>> I'm trying to run tdc, but keep getting following error even on clean
>>>>> branch without my patches:
>>>>
>>>> Vlad, not sure if you saw my email:
>>>> Apply Roman's patch and try again
>>>>
>>>> https://marc.info/?l=linux-netdev&m=152639369112020&w=2
>>>>
>>>> cheers,
>>>> jamal
>>>
>>> With patch applied I get following error:
>>>
>>> Test 7d50: Add skbmod action to set destination mac
>>> exit: 255 0
>>> dst MAC address <11:22:33:44:55:66>
>>> RTNETLINK answers: No such file or directory
>>> We have an error talking to the kernel
>>>
>>
>> You may actually have broken something with your patches in this case.
>
> Results is for net-next without my patches.

Do you have skbmod compiled in kernel or as a module?

^ permalink raw reply

* Re: [RFC v4 3/5] virtio_ring: add packed ring support
From: Tiwei Bie @ 2018-05-16 14:33 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann
In-Reply-To: <eaf946f1-abab-c4e5-1ab5-ba7912986d58@redhat.com>

On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:
> On 2018年05月16日 21:45, Tiwei Bie wrote:
> > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> > > On 2018年05月16日 20:39, Tiwei Bie wrote:
> > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > > > On 2018年05月16日 16:37, Tiwei Bie wrote:
[...]
> > > > > > +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> > > > > > +			      unsigned int id, void **ctx)
> > > > > > +{
> > > > > > +	struct vring_packed_desc *desc;
> > > > > > +	unsigned int i, j;
> > > > > > +
> > > > > > +	/* Clear data ptr. */
> > > > > > +	vq->desc_state[id].data = NULL;
> > > > > > +
> > > > > > +	i = head;
> > > > > > +
> > > > > > +	for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > > > +		desc = &vq->vring_packed.desc[i];
> > > > > > +		vring_unmap_one_packed(vq, desc);
> > > > > As mentioned in previous discussion, this probably won't work for the case
> > > > > of out of order completion since it depends on the information in the
> > > > > descriptor ring. We probably need to extend ctx to record such information.
> > > > Above code doesn't depend on the information in the descriptor
> > > > ring. The vq->desc_state[] is the extended ctx.
> > > > 
> > > > Best regards,
> > > > Tiwei Bie
> > > Yes, but desc is a pointer to descriptor ring I think so
> > > vring_unmap_one_packed() still depends on the content of descriptor ring?
> > > 
> > I got your point now. I think it makes sense to reserve
> > the bits of the addr field. Driver shouldn't try to get
> > addrs from the descriptors when cleanup the descriptors
> > no matter whether we support out-of-order or not.
> 
> Maybe I was wrong, but I remember spec mentioned something like this.

You're right. Spec mentioned this. I was just repeating
the spec to emphasize that it does make sense. :)

> 
> > 
> > But combining it with the out-of-order support, it will
> > mean that the driver still needs to maintain a desc/ctx
> > list that is very similar to the desc ring in the split
> > ring. I'm not quite sure whether it's something we want.
> > If it is true, I'll do it. So do you think we also want
> > to maintain such a desc/ctx list for packed ring?
> 
> To make it work for OOO backends I think we need something like this
> (hardware NIC drivers are usually have something like this).

Which hardware NIC drivers have this?

> 
> Not for the patch, but it looks like having a OUT_OF_ORDER feature bit is
> much more simpler to be started with.

+1

Best regards,
Tiwei Bie

^ permalink raw reply

* Re: [RFC PATCH bpf-next 12/12] i40e: implement Tx zero-copy
From: Jesper Dangaard Brouer @ 2018-05-16 14:28 UTC (permalink / raw)
  To: Björn Töpel
  Cc: magnus.karlsson, magnus.karlsson, alexander.h.duyck,
	alexander.duyck, john.fastabend, ast, willemdebruijn.kernel,
	daniel, mst, netdev, michael.lundkvist, jesse.brandeburg,
	anjali.singhai, qi.z.zhang, intel-wired-lan, brouer
In-Reply-To: <20180515190615.23099-13-bjorn.topel@gmail.com>

On Tue, 15 May 2018 21:06:15 +0200
Björn Töpel <bjorn.topel@gmail.com> wrote:

> From: Magnus Karlsson <magnus.karlsson@intel.com>
> 
> Here, the zero-copy ndo is implemented. As a shortcut, the existing
> XDP Tx rings are used for zero-copy. This means that and XDP program
> cannot redirect to an AF_XDP enabled XDP Tx ring.

I've changed i40e1 to only have one queue via:
 $ ethtool -L i40e1 combined 1

And then, I'm sending on queue 1, which is/should not be avail... and then crash/BUG:

$ sudo taskset -c 2 ./xdpsock --tx --interface=i40e1 --queue=1

[ 3799.936877] Number of in use tx queues changed invalidating tc mappings. Priority traffic
 classification disabled!
[ 3799.972970] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 3799.980790] PGD 80000007b0992067 P4D 80000007b0992067 PUD 7b62d4067 PMD 0 
[ 3799.987654] Oops: 0002 [#1] PREEMPT SMP PTI
[ 3799.991831] Modules linked in: nf_nat_masquerade_ipv4 tun nfnetlink bridge stp llc nf_nat
 nf_conntrack rpcrdma ib_ipoib rdma_ucm ib_ucm ib_umad rdma_cm ib_cm iw_cm sunrpc mlx5_ib ib
_uverbs ib_core coretemp kvm_intel kvm irqbypass intel_cstate intel_uncore intel_rapl_perf p
cspkr i2c_i801 shpchp wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad sch_fq_codel i40e ml
x5_core hid_generic ixgbe igb devlink mdio ptp sd_mod i2c_algo_bit i2c_core pps_core [last u
nloaded: x_tables]
[ 3800.033472] CPU: 2 PID: 2006 Comm: xdpsock Not tainted 4.17.0-rc3-af_xdp03_ZC_rfc+ #155
[ 3800.041465] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0a 08/01/2016
[ 3800.048943] RIP: 0010:i40e_xmit_frame_ring+0xd4/0x1490 [i40e]
[ 3800.054683] RSP: 0018:ffffc9000407bcd0 EFLAGS: 00010293
[ 3800.059900] RAX: 0000000000000000 RBX: ffff88084f0fd200 RCX: 0000000000000000
[ 3800.067022] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8807b6e710c0
[ 3800.074148] RBP: ffff8807c6397800 R08: 00000000000000c0 R09: 0000000000000001
[ 3800.081270] R10: 0000000000000800 R11: 0000000000000010 R12: 0000000000000001
[ 3800.088396] R13: 0000000000000000 R14: 0000000000000001 R15: 000000000000003c
[ 3800.095520] FS:  00007f1d1e00bb80(0000) GS:ffff88087fc80000(0000) knlGS:0000000000000000
[ 3800.103597] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3800.109335] CR2: 0000000000000008 CR3: 000000087d542001 CR4: 00000000003606e0
[ 3800.116458] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3800.123583] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3800.130706] Call Trace:
[ 3800.133157]  ? validate_xmit_skb.isra.116+0x1c/0x270
[ 3800.138118]  dev_direct_xmit+0xec/0x1d0
[ 3800.141949]  xsk_sendmsg+0x1f4/0x380
[ 3800.145521]  sock_sendmsg+0x30/0x40
[ 3800.149005]  __sys_sendto+0x10e/0x140
[ 3800.152662]  ? __do_page_fault+0x283/0x500
[ 3800.156751]  __x64_sys_sendto+0x24/0x30
[ 3800.160585]  do_syscall_64+0x42/0xf0
[ 3800.164156]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3800.169204] RIP: 0033:0x7f1d1d9db430
[ 3800.172774] RSP: 002b:00007fffb7278610 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[ 3800.180333] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1d1d9db430
[ 3800.187456] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
[ 3800.194582] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 3800.201705] R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000000
[ 3800.208830] R13: 0000000000000000 R14: 0000000000755510 R15: 00007f1d0d3fc000
[ 3800.215953] Code: d0 0f 86 db 05 00 00 01 c8 0f b7 ca 29 c8 83 e8 01 39 c6 0f 8f ea 06 00 00 48 8b 45 28 48 8d 14 92 41 b9 01 00 00 00 4c 8d 2c d0 <49> 89 5d 08 8b 83 80 00 00 00 66 45 89 4d 14 41 89 45 10 0f b7 
[ 3800.234798] RIP: i40e_xmit_frame_ring+0xd4/0x1490 [i40e] RSP: ffffc9000407bcd0
[ 3800.242005] CR2: 0000000000000008
[ 3800.245320] ---[ end trace f169e36f468e0c59 ]---
[ 3801.726719] Kernel panic - not syncing: Fatal exception in interrupt
[ 3801.733097] Kernel Offset: disabled
[ 3801.785836] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
[ 3801.793403] ------------[ cut here ]------------

(gdb) list *(i40e_xmit_frame_ring)+0xd4
0x2ccd4 is in i40e_xmit_frame_ring (drivers/net/ethernet/intel/i40e/i40e_txrx.c:4048).
warning: Source file is more recent than executable.
4043			return NETDEV_TX_BUSY;
4044		}
4045	
4046		/* record the location of the first descriptor for this packet */
4047		first = &tx_ring->tx_bi[tx_ring->next_to_use];
4048		first->skb = skb;
4049		first->bytecount = skb->len;
4050		first->gso_segs = 1;
4051	
4052		/* prepare the xmit flags */


(gdb) list *(xsk_sendmsg)+0x1f4
0xffffffff81800c34 is in xsk_sendmsg (net/xdp/xsk.c:251).
warning: Source file is more recent than executable.
246			skb_shinfo(skb)->destructor_arg = (void *)(long)id;
247			skb->destructor = xsk_destruct_skb;
248	
249			err = dev_direct_xmit(skb, xs->queue_id);
250			/* Ignore NET_XMIT_CN as packet might have been sent */
251			if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) {
252				err = -EAGAIN;
253				/* SKB consumed by dev_direct_xmit() */
254				goto out;
255			}

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH 12/14] net: sched: retry action check-insert on concurrent modification
From: Vlad Buslov @ 2018-05-16 14:26 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, linux-kernel, netfilter-devel,
	coreteam, kliteyn
In-Reply-To: <20180516141319.GO1972@nanopsycho>


On Wed 16 May 2018 at 14:13, Jiri Pirko <jiri@resnulli.us> wrote:
> Wed, May 16, 2018 at 03:52:20PM CEST, vladbu@mellanox.com wrote:
>>
>>On Wed 16 May 2018 at 13:21, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Wed, May 16, 2018 at 02:43:58PM CEST, vladbu@mellanox.com wrote:
>>>>
>>>>On Wed 16 May 2018 at 12:26, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>> Wed, May 16, 2018 at 01:55:06PM CEST, vladbu@mellanox.com wrote:
>>>>>>
>>>>>>On Wed 16 May 2018 at 09:59, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>>> Mon, May 14, 2018 at 04:27:13PM CEST, vladbu@mellanox.com wrote:
>>>>>>>>Retry check-insert sequence in action init functions if action with same
>>>>>>>>index was inserted concurrently.
>>>>>>>>
>>>>>>>>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>>>>>>>---
>>>>>>>> net/sched/act_bpf.c        | 8 +++++++-
>>>>>>>> net/sched/act_connmark.c   | 8 +++++++-
>>>>>>>> net/sched/act_csum.c       | 8 +++++++-
>>>>>>>> net/sched/act_gact.c       | 8 +++++++-
>>>>>>>> net/sched/act_ife.c        | 8 +++++++-
>>>>>>>> net/sched/act_ipt.c        | 8 +++++++-
>>>>>>>> net/sched/act_mirred.c     | 8 +++++++-
>>>>>>>> net/sched/act_nat.c        | 8 +++++++-
>>>>>>>> net/sched/act_pedit.c      | 8 +++++++-
>>>>>>>> net/sched/act_police.c     | 9 ++++++++-
>>>>>>>> net/sched/act_sample.c     | 8 +++++++-
>>>>>>>> net/sched/act_simple.c     | 9 ++++++++-
>>>>>>>> net/sched/act_skbedit.c    | 8 +++++++-
>>>>>>>> net/sched/act_skbmod.c     | 8 +++++++-
>>>>>>>> net/sched/act_tunnel_key.c | 9 ++++++++-
>>>>>>>> net/sched/act_vlan.c       | 9 ++++++++-
>>>>>>>> 16 files changed, 116 insertions(+), 16 deletions(-)
>>>>>>>>
>>>>>>>>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>>>>>>>>index 5554bf7..7e20fdc 100644
>>>>>>>>--- a/net/sched/act_bpf.c
>>>>>>>>+++ b/net/sched/act_bpf.c
>>>>>>>>@@ -299,10 +299,16 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
>>>>>>>> 
>>>>>>>> 	parm = nla_data(tb[TCA_ACT_BPF_PARMS]);
>>>>>>>> 
>>>>>>>>+replay:
>>>>>>>> 	if (!tcf_idr_check(tn, parm->index, act, bind)) {
>>>>>>>> 		ret = tcf_idr_create(tn, parm->index, est, act,
>>>>>>>> 				     &act_bpf_ops, bind, true);
>>>>>>>>-		if (ret < 0)
>>>>>>>>+		/* Action with specified index was created concurrently.
>>>>>>>>+		 * Check again.
>>>>>>>>+		 */
>>>>>>>>+		if (parm->index && ret == -ENOSPC)
>>>>>>>>+			goto replay;
>>>>>>>>+		else if (ret)
>>>>>>>
>>>>>>> Hmm, looks like you are doing the same/very similar thing in every act
>>>>>>> code. I think it would make sense to introduce a helper function for
>>>>>>> this purpose.
>>>>>>
>>>>>>This code uses goto so it can't be easily refactored into standalone
>>>>>>function. Could you specify which part of this code you suggest to
>>>>>>extract?
>>>>>
>>>>> Hmm, looking at the code, I think that what would help is to have a
>>>>> helper that would atomically check if index exists and if not, it would
>>>>> allocate one. Something like:
>>>>>
>>>>>
>>>>> int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
>>>>> 			struct tc_action **a, int bind)
>>>>> {
>>>>> 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
>>>>> 	struct tc_action *p;
>>>>> 	int err;
>>>>>
>>>>> 	spin_lock(&idrinfo->lock);
>>>>> 	if (*index) {
>>>>> 		p = idr_find(&idrinfo->action_idr, *index);
>>>>> 		if (p) {
>>>>> 			if (bind)
>>>>> 	   			p->tcfa_bindcnt++;
>>>>> 			p->tcfa_refcnt++;
>>>>> 			*a = p;
>>>>> 			err = 0;
>>>>> 		} else {
>>>>> 			*a = NULL;
>>>>> 			err = idr_alloc_u32(idr, NULL, index,
>>>>> 					    *index, GFP_ATOMIC);
>>>>> 		}
>>>>> 	} else {
>>>>> 		*index = 1;
>>>>> 		*a = NULL;
>>>>> 		err = idr_alloc_u32(idr, NULL, index, UINT_MAX, GFP_ATOMIC);
>>>>> 	}
>>>>> 	spin_unlock(&idrinfo->lock);
>>>>> 	return err;
>>>>> }
>>>>>
>>>>> The act code would just check if "a" is NULL and if so, it would call
>>>>> tcf_idr_create() with allocated index as arg.
>>>>
>>>>What about multiple actions that have arbitrary code between initial
>>>>check and idr allocation that is currently inside tcf_idr_create()?
>>>
>>> Why it would be a problem to have them after the allocation?
>>
>>Lets look at mirred for exmple:
>>	exists = tcf_idr_check(tn, parm->index, a, bind);
>>	if (exists && bind)
>>		return 0;
>>
>>	switch (parm->eaction) {
>>	case TCA_EGRESS_MIRROR:
>>	case TCA_EGRESS_REDIR:
>>	case TCA_INGRESS_REDIR:
>>	case TCA_INGRESS_MIRROR:
>>		break;
>>	default:
>>		if (exists)
>>			tcf_idr_release(*a, bind);
>>		NL_SET_ERR_MSG_MOD(extack, "Unknown mirred option");
>>		return -EINVAL;
>>	}
>>	if (parm->ifindex) {
>>		dev = __dev_get_by_index(net, parm->ifindex);
>>		if (dev == NULL) {
>>			if (exists)
>>				tcf_idr_release(*a, bind);
>>			return -ENODEV;
>>		}
>>		mac_header_xmit = dev_is_mac_header_xmit(dev);
>>	} else {
>>		dev = NULL;
>>	}
>>
>>	if (!exists) {
>>		if (!dev) {
>>			NL_SET_ERR_MSG_MOD(extack, "Specified device does not exist");
>>			return -EINVAL;
>>		}
>>		ret = tcf_idr_create(tn, parm->index, est, a,
>>				     &act_mirred_ops, bind, true);
>>		/* Action with specified index was created concurrently.
>>		 * Check again.
>>		 */
>>		if (parm->index && ret == -ENOSPC)
>>			goto replay;
>>		else if (ret)
>>			return ret;
>>
>>There are several returns and cleanup is only performed when action
>>exists. So all code like that will have to be audited to also remove
>>index from idr, otherwise idr handles leak on return.
>
> Yeah. You have to take care of the error path.
>
>
>>
>>>
>>> There is one issue though with my draft. tcf_idr_insert() function
>>> which actually assigns a "p" pointer to the idr index is called later on.
>>> Until that happens, the idr_find() would return NULL even if the index
>>> is actually allocated. We cannot assign "p" in tcf_idr_check_alloc()
>>> because it is allocated only later on in tcf_idr_create(). But that is
>>> resolvable by the following trick:
>>>
>>> int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
>>> 			struct tc_action **a, int bind)
>>> {
>>> 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
>>> 	struct tc_action *p;
>>> 	int err;
>>>
>>> again:
>>> 	spin_lock(&idrinfo->lock);
>>> 	if (*index) {
>>>  		p = idr_find(&idrinfo->action_idr, *index);
>>> 		if (IS_ERR(p)) {
>>> 			/* This means that another process allocated
>>> 			 * index but did not assign the pointer yet.
>>> 			 */
>>> 			spin_unlock(&idrinfo->lock);
>>> 			goto again;
>>> 		}
>>>  		if (p) {
>>>  			if (bind)
>>>  	   			p->tcfa_bindcnt++;
>>>  			p->tcfa_refcnt++;
>>>  			*a = p;
>>>  			err = 0;
>>>  		} else {
>>>  			*a = NULL;
>>>  			err = idr_alloc_u32(idr, NULL, index,
>>>  					    *index, GFP_ATOMIC);
>>> 			idr_replace(&idrinfo->action_idr,
>>> 				    ERR_PTR(-EBUSY), *index);
>>>  		}
>>>  	} else {
>>>  		*index = 1;
>>> 		*a = NULL;
>>>  		err = idr_alloc_u32(idr, NULL, index, UINT_MAX, GFP_ATOMIC);
>>> 		idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY), *index);
>>>  	}
>>>  	spin_unlock(&idrinfo->lock);
>>>  	return err;
>>> }
>>>
>>
>>So users of action idr that might perform concurrent lookups are also
>>have to be changed to check for error pointers, that now can be inserted
>>into idr? Seems like a complex change...
>
> You can just add a simple check into tcf_idr_lookup(). Where else?
>

To me it looks like we take something simple and already working, and
make it complex to save few lines of code in action init...

Anyway, how should I do patch split for this?
Patch to implement function you outlined and another one to modify all
actions to use it(with all refactoring to not leak references)? Or patch
per action is better approach?

^ permalink raw reply

* Re: [PATCH net-next 2/2] pfifo_fast: drop unneeded additional lock on dequeue
From: Michael S. Tsirkin @ 2018-05-16 14:24 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, David S. Miller, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	John Fastabend
In-Reply-To: <1526457376.2553.1.camel@redhat.com>

On Wed, May 16, 2018 at 09:56:16AM +0200, Paolo Abeni wrote:
> On Tue, 2018-05-15 at 23:17 +0300, Michael S. Tsirkin wrote:
> > On Tue, May 15, 2018 at 04:24:37PM +0200, Paolo Abeni wrote:
> > > After the previous patch, for NOLOCK qdiscs, q->seqlock is
> > > always held when the dequeue() is invoked, we can drop
> > > any additional locking to protect such operation.
> > > 
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > >  include/linux/skb_array.h | 5 +++++
> > >  net/sched/sch_generic.c   | 4 ++--
> > >  2 files changed, 7 insertions(+), 2 deletions(-)
> > 
> > Is the seqlock taken during qdisc_change_tx_queue_len?
> > We need to prevent that racing with dequeue.
> 
> Thanks for the head-up! I missed that code-path.
> 
> I'll add the lock in qdisc_change_tx_queue_len() in v2.
> 
> Thanks you,
> 
> Paolo

When you do, make sure locks nest consistently.

^ permalink raw reply

* Re: Xilinx axienet + DP83620 in fiber mode won't set netif_carrier_on
From: Alvaro G. M. @ 2018-05-16 14:24 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20180516131134.GF22000@lunn.ch>

On Wed, May 16, 2018 at 03:11:34PM +0200, Andrew Lunn wrote:
> Hi Alvaro
> 
> What should happen in general terms is that at some point the link to
> the peer is established. phylib, the generic PHY code, polls the PHY
> ever second, asking what the link state is. When the link changes from
> down to up, phylib will call the link_adjust callback in the MAC, and
> netif_carrier_on().
> 
> When the PHY reports the link has gone down, it does similar, calls
> the adjust_link callback, and netif_carrier_off().
> 
> So what you need to do is find out why the PHY driver never reports
> link up. Does the PHY even know when the link is up? Often SFF/SFP
> modules have a Signal Detect pin, which is connected to a gpio. Do you
> have something like that? If so, you should look at the PHYLINK code
> and the SFP device which was added recently.


I didn't know about the SFP device. I don't think this will help my specific
case because my board didn't route the i2c bus from the SFP, so it basically
sits there and does it thing alone, I can't communicate with it.

I see that net/phy/marvell.c has a custom marvell_update_link that reads a
different register to check for fiber connectivity instead of using
genphy_update_link, which I see reads from MII_BMSR.BMSR_LSTATUS

It looks like the DP83620 may do something similar, and the fiber status may
be accesible from some other register. This starts to make sense, thanks for
setting my on track!

Best regards!

-- 
Alvaro G. M.

^ permalink raw reply

* [PATCH net-next] cxgb4: update LE-TCAM collection for T6
From: Rahul Lakkireddy @ 2018-05-16 14:21 UTC (permalink / raw)
  To: netdev; +Cc: davem, ganeshgr, nirranjan, indranil, Rahul Lakkireddy

For T6, clip table is separated from main TCAM. So, update LE-TCAM
collection logic to collect clip table TCAM as well. IPv6 takes
4 entries in clip table TCAM compared to 2 entries in main TCAM.

Also, in case of errors, keep LE-TCAM collected so far and set the
status to partial dump.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h |  3 ++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h     |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c    | 40 ++++++++++++++++++-----
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h      |  1 +
 4 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
index b57acb8dc35b..740a18ba4229 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
@@ -235,6 +235,9 @@ struct cudbg_vpd_data {
 };
 
 #define CUDBG_MAX_TCAM_TID 0x800
+#define CUDBG_T6_CLIP 1536
+#define CUDBG_MAX_TID_COMP_EN 6144
+#define CUDBG_MAX_TID_COMP_DIS 3072
 
 enum cudbg_le_entry_types {
 	LE_ET_UNKNOWN = 0,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
index 8568a51f6414..215fe6260fd7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
@@ -24,6 +24,7 @@
 #define CUDBG_STATUS_NOT_IMPLEMENTED -28
 #define CUDBG_SYSTEM_ERROR -29
 #define CUDBG_STATUS_CCLK_NOT_DEFINED -32
+#define CUDBG_STATUS_PARTIAL_DATA -41
 
 #define CUDBG_MAJOR_VERSION 1
 #define CUDBG_MINOR_VERSION 14
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
index 9da6f57901a9..4feb7eca0acf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
@@ -2366,8 +2366,11 @@ void cudbg_fill_le_tcam_info(struct adapter *padap,
 	value = t4_read_reg(padap, LE_DB_ROUTING_TABLE_INDEX_A);
 	tcam_region->routing_start = value;
 
-	/*Get clip table index */
-	value = t4_read_reg(padap, LE_DB_CLIP_TABLE_INDEX_A);
+	/* Get clip table index. For T6 there is separate CLIP TCAM */
+	if (is_t6(padap->params.chip))
+		value = t4_read_reg(padap, LE_DB_CLCAM_TID_BASE_A);
+	else
+		value = t4_read_reg(padap, LE_DB_CLIP_TABLE_INDEX_A);
 	tcam_region->clip_start = value;
 
 	/* Get filter table index */
@@ -2392,8 +2395,16 @@ void cudbg_fill_le_tcam_info(struct adapter *padap,
 					       tcam_region->tid_hash_base;
 		}
 	} else { /* hash not enabled */
-		tcam_region->max_tid = CUDBG_MAX_TCAM_TID;
+		if (is_t6(padap->params.chip))
+			tcam_region->max_tid = (value & ASLIPCOMPEN_F) ?
+					       CUDBG_MAX_TID_COMP_EN :
+					       CUDBG_MAX_TID_COMP_DIS;
+		else
+			tcam_region->max_tid = CUDBG_MAX_TCAM_TID;
 	}
+
+	if (is_t6(padap->params.chip))
+		tcam_region->max_tid += CUDBG_T6_CLIP;
 }
 
 int cudbg_collect_le_tcam(struct cudbg_init *pdbg_init,
@@ -2423,18 +2434,31 @@ int cudbg_collect_le_tcam(struct cudbg_init *pdbg_init,
 	for (i = 0; i < tcam_region.max_tid; ) {
 		rc = cudbg_read_tid(pdbg_init, i, tid_data);
 		if (rc) {
-			cudbg_err->sys_err = rc;
-			cudbg_put_buff(pdbg_init, &temp_buff);
-			return rc;
+			cudbg_err->sys_warn = CUDBG_STATUS_PARTIAL_DATA;
+			/* Update tcam header and exit */
+			tcam_region.max_tid = i;
+			memcpy(temp_buff.data, &tcam_region,
+			       sizeof(struct cudbg_tcam));
+			goto out;
 		}
 
-		/* ipv6 takes two tids */
-		cudbg_is_ipv6_entry(tid_data, tcam_region) ? i += 2 : i++;
+		if (cudbg_is_ipv6_entry(tid_data, tcam_region)) {
+			/* T6 CLIP TCAM: ipv6 takes 4 entries */
+			if (is_t6(padap->params.chip) &&
+			    i >= tcam_region.clip_start &&
+			    i < tcam_region.clip_start + CUDBG_T6_CLIP)
+				i += 4;
+			else /* Main TCAM: ipv6 takes two tids */
+				i += 2;
+		} else {
+			i++;
+		}
 
 		tid_data++;
 		bytes += sizeof(struct cudbg_tid_data);
 	}
 
+out:
 	return cudbg_write_and_release_buff(pdbg_init, &temp_buff, dbg_buff);
 }
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
index 843f8cada1de..6b55aa2eb2a5 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
@@ -2999,6 +2999,7 @@
 #define LE_DB_HASH_TID_BASE_A 0x19c30
 #define LE_DB_HASH_TBL_BASE_ADDR_A 0x19c30
 #define LE_DB_INT_CAUSE_A 0x19c3c
+#define LE_DB_CLCAM_TID_BASE_A 0x19df4
 #define LE_DB_TID_HASHBASE_A 0x19df8
 #define T6_LE_DB_HASH_TID_BASE_A 0x19df8
 
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH 12/14] net: sched: retry action check-insert on concurrent modification
From: Jiri Pirko @ 2018-05-16 14:13 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev, davem, jhs, xiyou.wangcong, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, linux-kernel, netfilter-devel,
	coreteam, kliteyn
In-Reply-To: <vbfvabn1zic.fsf@reg-r-vrt-018-180.mtr.labs.mlnx>

Wed, May 16, 2018 at 03:52:20PM CEST, vladbu@mellanox.com wrote:
>
>On Wed 16 May 2018 at 13:21, Jiri Pirko <jiri@resnulli.us> wrote:
>> Wed, May 16, 2018 at 02:43:58PM CEST, vladbu@mellanox.com wrote:
>>>
>>>On Wed 16 May 2018 at 12:26, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> Wed, May 16, 2018 at 01:55:06PM CEST, vladbu@mellanox.com wrote:
>>>>>
>>>>>On Wed 16 May 2018 at 09:59, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>> Mon, May 14, 2018 at 04:27:13PM CEST, vladbu@mellanox.com wrote:
>>>>>>>Retry check-insert sequence in action init functions if action with same
>>>>>>>index was inserted concurrently.
>>>>>>>
>>>>>>>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>>>>>>---
>>>>>>> net/sched/act_bpf.c        | 8 +++++++-
>>>>>>> net/sched/act_connmark.c   | 8 +++++++-
>>>>>>> net/sched/act_csum.c       | 8 +++++++-
>>>>>>> net/sched/act_gact.c       | 8 +++++++-
>>>>>>> net/sched/act_ife.c        | 8 +++++++-
>>>>>>> net/sched/act_ipt.c        | 8 +++++++-
>>>>>>> net/sched/act_mirred.c     | 8 +++++++-
>>>>>>> net/sched/act_nat.c        | 8 +++++++-
>>>>>>> net/sched/act_pedit.c      | 8 +++++++-
>>>>>>> net/sched/act_police.c     | 9 ++++++++-
>>>>>>> net/sched/act_sample.c     | 8 +++++++-
>>>>>>> net/sched/act_simple.c     | 9 ++++++++-
>>>>>>> net/sched/act_skbedit.c    | 8 +++++++-
>>>>>>> net/sched/act_skbmod.c     | 8 +++++++-
>>>>>>> net/sched/act_tunnel_key.c | 9 ++++++++-
>>>>>>> net/sched/act_vlan.c       | 9 ++++++++-
>>>>>>> 16 files changed, 116 insertions(+), 16 deletions(-)
>>>>>>>
>>>>>>>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>>>>>>>index 5554bf7..7e20fdc 100644
>>>>>>>--- a/net/sched/act_bpf.c
>>>>>>>+++ b/net/sched/act_bpf.c
>>>>>>>@@ -299,10 +299,16 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
>>>>>>> 
>>>>>>> 	parm = nla_data(tb[TCA_ACT_BPF_PARMS]);
>>>>>>> 
>>>>>>>+replay:
>>>>>>> 	if (!tcf_idr_check(tn, parm->index, act, bind)) {
>>>>>>> 		ret = tcf_idr_create(tn, parm->index, est, act,
>>>>>>> 				     &act_bpf_ops, bind, true);
>>>>>>>-		if (ret < 0)
>>>>>>>+		/* Action with specified index was created concurrently.
>>>>>>>+		 * Check again.
>>>>>>>+		 */
>>>>>>>+		if (parm->index && ret == -ENOSPC)
>>>>>>>+			goto replay;
>>>>>>>+		else if (ret)
>>>>>>
>>>>>> Hmm, looks like you are doing the same/very similar thing in every act
>>>>>> code. I think it would make sense to introduce a helper function for
>>>>>> this purpose.
>>>>>
>>>>>This code uses goto so it can't be easily refactored into standalone
>>>>>function. Could you specify which part of this code you suggest to
>>>>>extract?
>>>>
>>>> Hmm, looking at the code, I think that what would help is to have a
>>>> helper that would atomically check if index exists and if not, it would
>>>> allocate one. Something like:
>>>>
>>>>
>>>> int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
>>>> 			struct tc_action **a, int bind)
>>>> {
>>>> 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
>>>> 	struct tc_action *p;
>>>> 	int err;
>>>>
>>>> 	spin_lock(&idrinfo->lock);
>>>> 	if (*index) {
>>>> 		p = idr_find(&idrinfo->action_idr, *index);
>>>> 		if (p) {
>>>> 			if (bind)
>>>> 	   			p->tcfa_bindcnt++;
>>>> 			p->tcfa_refcnt++;
>>>> 			*a = p;
>>>> 			err = 0;
>>>> 		} else {
>>>> 			*a = NULL;
>>>> 			err = idr_alloc_u32(idr, NULL, index,
>>>> 					    *index, GFP_ATOMIC);
>>>> 		}
>>>> 	} else {
>>>> 		*index = 1;
>>>> 		*a = NULL;
>>>> 		err = idr_alloc_u32(idr, NULL, index, UINT_MAX, GFP_ATOMIC);
>>>> 	}
>>>> 	spin_unlock(&idrinfo->lock);
>>>> 	return err;
>>>> }
>>>>
>>>> The act code would just check if "a" is NULL and if so, it would call
>>>> tcf_idr_create() with allocated index as arg.
>>>
>>>What about multiple actions that have arbitrary code between initial
>>>check and idr allocation that is currently inside tcf_idr_create()?
>>
>> Why it would be a problem to have them after the allocation?
>
>Lets look at mirred for exmple:
>	exists = tcf_idr_check(tn, parm->index, a, bind);
>	if (exists && bind)
>		return 0;
>
>	switch (parm->eaction) {
>	case TCA_EGRESS_MIRROR:
>	case TCA_EGRESS_REDIR:
>	case TCA_INGRESS_REDIR:
>	case TCA_INGRESS_MIRROR:
>		break;
>	default:
>		if (exists)
>			tcf_idr_release(*a, bind);
>		NL_SET_ERR_MSG_MOD(extack, "Unknown mirred option");
>		return -EINVAL;
>	}
>	if (parm->ifindex) {
>		dev = __dev_get_by_index(net, parm->ifindex);
>		if (dev == NULL) {
>			if (exists)
>				tcf_idr_release(*a, bind);
>			return -ENODEV;
>		}
>		mac_header_xmit = dev_is_mac_header_xmit(dev);
>	} else {
>		dev = NULL;
>	}
>
>	if (!exists) {
>		if (!dev) {
>			NL_SET_ERR_MSG_MOD(extack, "Specified device does not exist");
>			return -EINVAL;
>		}
>		ret = tcf_idr_create(tn, parm->index, est, a,
>				     &act_mirred_ops, bind, true);
>		/* Action with specified index was created concurrently.
>		 * Check again.
>		 */
>		if (parm->index && ret == -ENOSPC)
>			goto replay;
>		else if (ret)
>			return ret;
>
>There are several returns and cleanup is only performed when action
>exists. So all code like that will have to be audited to also remove
>index from idr, otherwise idr handles leak on return.

Yeah. You have to take care of the error path.


>
>>
>> There is one issue though with my draft. tcf_idr_insert() function
>> which actually assigns a "p" pointer to the idr index is called later on.
>> Until that happens, the idr_find() would return NULL even if the index
>> is actually allocated. We cannot assign "p" in tcf_idr_check_alloc()
>> because it is allocated only later on in tcf_idr_create(). But that is
>> resolvable by the following trick:
>>
>> int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
>> 			struct tc_action **a, int bind)
>> {
>> 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
>> 	struct tc_action *p;
>> 	int err;
>>
>> again:
>> 	spin_lock(&idrinfo->lock);
>> 	if (*index) {
>>  		p = idr_find(&idrinfo->action_idr, *index);
>> 		if (IS_ERR(p)) {
>> 			/* This means that another process allocated
>> 			 * index but did not assign the pointer yet.
>> 			 */
>> 			spin_unlock(&idrinfo->lock);
>> 			goto again;
>> 		}
>>  		if (p) {
>>  			if (bind)
>>  	   			p->tcfa_bindcnt++;
>>  			p->tcfa_refcnt++;
>>  			*a = p;
>>  			err = 0;
>>  		} else {
>>  			*a = NULL;
>>  			err = idr_alloc_u32(idr, NULL, index,
>>  					    *index, GFP_ATOMIC);
>> 			idr_replace(&idrinfo->action_idr,
>> 				    ERR_PTR(-EBUSY), *index);
>>  		}
>>  	} else {
>>  		*index = 1;
>> 		*a = NULL;
>>  		err = idr_alloc_u32(idr, NULL, index, UINT_MAX, GFP_ATOMIC);
>> 		idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY), *index);
>>  	}
>>  	spin_unlock(&idrinfo->lock);
>>  	return err;
>> }
>>
>
>So users of action idr that might perform concurrent lookups are also
>have to be changed to check for error pointers, that now can be inserted
>into idr? Seems like a complex change...

You can just add a simple check into tcf_idr_lookup(). Where else?


>
>>
>>
>>
>>
>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> [...]
>>>>>
>>>
>

^ permalink raw reply

* Re: [PATCH net-next] erspan: set bso bit based on mirrored packet's len
From: William Tu @ 2018-05-16 14:05 UTC (permalink / raw)
  To: Tobin C. Harding; +Cc: Linux Kernel Network Developers
In-Reply-To: <20180515053342.GB10152@eros>

On Mon, May 14, 2018 at 10:33 PM, Tobin C. Harding <tobin@apporbit.com> wrote:
> On Mon, May 14, 2018 at 04:54:36PM -0700, William Tu wrote:
>> Before the patch, the erspan BSO bit (Bad/Short/Oversized) is not
>> handled.  BSO has 4 possible values:
>>   00 --> Good frame with no error, or unknown integrity
>>   11 --> Payload is a Bad Frame with CRC or Alignment Error
>>   01 --> Payload is a Short Frame
>>   10 --> Payload is an Oversized Frame
>>
>> Based the short/oversized definitions in RFC1757, the patch sets
>> the bso bit based on the mirrored packet's size.
>>
>> Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
>> Signed-off-by: William Tu <u9012063@gmail.com>
>> ---
>>  include/net/erspan.h | 25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/include/net/erspan.h b/include/net/erspan.h
>> index d044aa60cc76..5eb95f78ad45 100644
>> --- a/include/net/erspan.h
>> +++ b/include/net/erspan.h
>> @@ -219,6 +219,30 @@ static inline __be32 erspan_get_timestamp(void)
>>       return htonl((u32)h_usecs);
>>  }
>>
>> +/* ERSPAN BSO (Bad/Short/Oversized)
>> + *   00b --> Good frame with no error, or unknown integrity
>> + *   01b --> Payload is a Short Frame
>> + *   10b --> Payload is an Oversized Frame
>> + *   11b --> Payload is a Bad Frame with CRC or Alignment Error
>> + */
>> +enum erspan_bso {
>> +     BSO_NOERROR,
>> +     BSO_SHORT,
>> +     BSO_OVERSIZED,
>> +     BSO_BAD,
>> +};
>
> If we are relying on the values perhaps this would be clearer
>
>         BSO_NOERROR     = 0x00,
>         BSO_SHORT       = 0x01,
>         BSO_OVERSIZED   = 0x02,
>         BSO_BAD         = 0x03,
>

Yes, thanks. I will change in v2.

>> +
>> +static inline u8 erspan_detect_bso(struct sk_buff *skb)
>> +{
>> +     if (skb->len < ETH_ZLEN)
>> +             return BSO_SHORT;
>> +
>> +     if (skb->len > ETH_FRAME_LEN)
>> +             return BSO_OVERSIZED;
>> +
>> +     return BSO_NOERROR;
>> +}
>
> Without having much contextual knowledge around this patch; should we be
> doing some check on CRC or alignment (at some stage)?  Having BSO_BAD
> seems to imply so?
>

The definition of BSO_BAD:
etherStatsCRCAlignErrors OBJECT-TYPE
              SYNTAX Counter
              ACCESS read-only
              STATUS mandatory
              DESCRIPTION
                  "The total number of packets received that
                  had a length (excluding framing bits, but
                  including FCS octets) of between 64 and 1518
                  octets, inclusive, but but had either a bad
                  Frame Check Sequence (FCS) with an integral
                  number of octets (FCS Error) or a bad FCS with
                  a non-integral number of octets (Alignment Error)."

But I don't know how to check CRC error at this code point.
Isn't it done by the NIC hardware?

Thanks for your review!
William

^ permalink raw reply

* Re: [RFC v4 3/5] virtio_ring: add packed ring support
From: Jason Wang @ 2018-05-16 14:05 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann
In-Reply-To: <20180516134550.GB4171@debian>



On 2018年05月16日 21:45, Tiwei Bie wrote:
> On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
>> On 2018年05月16日 20:39, Tiwei Bie wrote:
>>> On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
>>>> On 2018年05月16日 16:37, Tiwei Bie wrote:
>>> [...]
>>>>>     struct vring_virtqueue {
>>>>> @@ -116,6 +117,9 @@ struct vring_virtqueue {
>>>>>     			/* Last written value to driver->flags in
>>>>>     			 * guest byte order. */
>>>>>     			u16 event_flags_shadow;
>>>>> +
>>>>> +			/* ID allocation. */
>>>>> +			struct idr buffer_id;
>>>> I'm not sure idr is fit for the performance critical case here. Need to
>>>> measure its performance impact, especially if we have few unused slots.
>>> I'm also not sure.. But fortunately, it should be quite easy
>>> to replace it with something else without changing other code.
>>> If it will really hurt the performance, I'll change it.
>> We may want to do some benchmarking/profiling to see.
> Yeah!
>
>>>>>     		};
>>>>>     	};
>>> [...]
>>>>> +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
>>>>> +			      unsigned int id, void **ctx)
>>>>> +{
>>>>> +	struct vring_packed_desc *desc;
>>>>> +	unsigned int i, j;
>>>>> +
>>>>> +	/* Clear data ptr. */
>>>>> +	vq->desc_state[id].data = NULL;
>>>>> +
>>>>> +	i = head;
>>>>> +
>>>>> +	for (j = 0; j < vq->desc_state[id].num; j++) {
>>>>> +		desc = &vq->vring_packed.desc[i];
>>>>> +		vring_unmap_one_packed(vq, desc);
>>>> As mentioned in previous discussion, this probably won't work for the case
>>>> of out of order completion since it depends on the information in the
>>>> descriptor ring. We probably need to extend ctx to record such information.
>>> Above code doesn't depend on the information in the descriptor
>>> ring. The vq->desc_state[] is the extended ctx.
>>>
>>> Best regards,
>>> Tiwei Bie
>> Yes, but desc is a pointer to descriptor ring I think so
>> vring_unmap_one_packed() still depends on the content of descriptor ring?
>>
> I got your point now. I think it makes sense to reserve
> the bits of the addr field. Driver shouldn't try to get
> addrs from the descriptors when cleanup the descriptors
> no matter whether we support out-of-order or not.

Maybe I was wrong, but I remember spec mentioned something like this.

>
> But combining it with the out-of-order support, it will
> mean that the driver still needs to maintain a desc/ctx
> list that is very similar to the desc ring in the split
> ring. I'm not quite sure whether it's something we want.
> If it is true, I'll do it. So do you think we also want
> to maintain such a desc/ctx list for packed ring?

To make it work for OOO backends I think we need something like this 
(hardware NIC drivers are usually have something like this).

Not for the patch, but it looks like having a OUT_OF_ORDER feature bit 
is much more simpler to be started with.

Thanks

>
> Best regards,
> Tiwei Bie

^ permalink raw reply

* Re: [PATCH v2 1/2] media: rc: introduce BPF_PROG_RAWIR_EVENT
From: Sean Young @ 2018-05-16 13:55 UTC (permalink / raw)
  To: linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
	Devin Heitmueller, Y Song
In-Reply-To: <82363bf25c059528b93fbe542d88f147b5081424.1526409690.git.sean@mess.org>

On Tue, May 15, 2018 at 07:50:19PM +0100, Sean Young wrote:
> Add support for BPF_PROG_RAWIR_EVENT. This type of BPF program can call
> rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
> that the last key should be repeated.
> 
> The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall;
> the target_fd must be the /dev/lircN device.

The locking of the list of attached bpf programs is broken in various ways.
I'll have to rework this for v3.


Sean

> 
> Signed-off-by: Sean Young <sean@mess.org>
> ---
>  drivers/media/rc/Kconfig           |  10 +
>  drivers/media/rc/Makefile          |   1 +
>  drivers/media/rc/bpf-rawir-event.c | 322 +++++++++++++++++++++++++++++
>  drivers/media/rc/lirc_dev.c        |  28 +++
>  drivers/media/rc/rc-core-priv.h    |  19 ++
>  drivers/media/rc/rc-ir-raw.c       |   3 +
>  include/linux/bpf_rcdev.h          |  30 +++
>  include/linux/bpf_types.h          |   3 +
>  include/uapi/linux/bpf.h           |  55 ++++-
>  kernel/bpf/syscall.c               |   7 +
>  10 files changed, 477 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/media/rc/bpf-rawir-event.c
>  create mode 100644 include/linux/bpf_rcdev.h
> 
> diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
> index eb2c3b6eca7f..55747af5b978 100644
> --- a/drivers/media/rc/Kconfig
> +++ b/drivers/media/rc/Kconfig
> @@ -25,6 +25,16 @@ config LIRC
>  	   passes raw IR to and from userspace, which is needed for
>  	   IR transmitting (aka "blasting") and for the lirc daemon.
>  
> +config BPF_RAWIR_EVENT
> +	bool "Enable attaching BPF programs to lirc devices"
> +	depends on BPF_SYSCALL
> +	depends on RC_CORE=y
> +	depends on LIRC
> +	help
> +	   Enable this option to make it possible to load custom IR
> +	   decoders written in BPF. These decoders are type
> +	   BPF_PROG_TYPE_RAW_IR_EVENT.
> +
>  menuconfig RC_DECODERS
>  	bool "Remote controller decoders"
>  	depends on RC_CORE
> diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
> index 2e1c87066f6c..74907823bef8 100644
> --- a/drivers/media/rc/Makefile
> +++ b/drivers/media/rc/Makefile
> @@ -5,6 +5,7 @@ obj-y += keymaps/
>  obj-$(CONFIG_RC_CORE) += rc-core.o
>  rc-core-y := rc-main.o rc-ir-raw.o
>  rc-core-$(CONFIG_LIRC) += lirc_dev.o
> +rc-core-$(CONFIG_BPF_RAWIR_EVENT) += bpf-rawir-event.o
>  obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o
>  obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o
>  obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
> diff --git a/drivers/media/rc/bpf-rawir-event.c b/drivers/media/rc/bpf-rawir-event.c
> new file mode 100644
> index 000000000000..8007841977d6
> --- /dev/null
> +++ b/drivers/media/rc/bpf-rawir-event.c
> @@ -0,0 +1,322 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// bpf-rawir-event.c - handles bpf
> +//
> +// Copyright (C) 2018 Sean Young <sean@mess.org>
> +
> +#include <linux/bpf.h>
> +#include <linux/filter.h>
> +#include "rc-core-priv.h"
> +
> +/*
> + * BPF interface for raw IR
> + */
> +const struct bpf_prog_ops rawir_event_prog_ops = {
> +};
> +
> +BPF_CALL_1(bpf_rc_repeat, struct bpf_rawir_event*, event)
> +{
> +	struct ir_raw_event_ctrl *ctrl;
> +
> +	ctrl = container_of(event, struct ir_raw_event_ctrl, bpf_rawir_event);
> +
> +	rc_repeat(ctrl->dev);
> +
> +	return 0;
> +}
> +
> +static const struct bpf_func_proto rc_repeat_proto = {
> +	.func	   = bpf_rc_repeat,
> +	.gpl_only  = true, /* rc_repeat is EXPORT_SYMBOL_GPL */
> +	.ret_type  = RET_INTEGER,
> +	.arg1_type = ARG_PTR_TO_CTX,
> +};
> +
> +BPF_CALL_4(bpf_rc_keydown, struct bpf_rawir_event*, event, u32, protocol,
> +	   u32, scancode, u32, toggle)
> +{
> +	struct ir_raw_event_ctrl *ctrl;
> +
> +	ctrl = container_of(event, struct ir_raw_event_ctrl, bpf_rawir_event);
> +
> +	rc_keydown(ctrl->dev, protocol, scancode, toggle != 0);
> +
> +	return 0;
> +}
> +
> +static const struct bpf_func_proto rc_keydown_proto = {
> +	.func	   = bpf_rc_keydown,
> +	.gpl_only  = true, /* rc_keydown is EXPORT_SYMBOL_GPL */
> +	.ret_type  = RET_INTEGER,
> +	.arg1_type = ARG_PTR_TO_CTX,
> +	.arg2_type = ARG_ANYTHING,
> +	.arg3_type = ARG_ANYTHING,
> +	.arg4_type = ARG_ANYTHING,
> +};
> +
> +static const struct bpf_func_proto *
> +rawir_event_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> +{
> +	switch (func_id) {
> +	case BPF_FUNC_rc_repeat:
> +		return &rc_repeat_proto;
> +	case BPF_FUNC_rc_keydown:
> +		return &rc_keydown_proto;
> +	case BPF_FUNC_map_lookup_elem:
> +		return &bpf_map_lookup_elem_proto;
> +	case BPF_FUNC_map_update_elem:
> +		return &bpf_map_update_elem_proto;
> +	case BPF_FUNC_map_delete_elem:
> +		return &bpf_map_delete_elem_proto;
> +	case BPF_FUNC_ktime_get_ns:
> +		return &bpf_ktime_get_ns_proto;
> +	case BPF_FUNC_tail_call:
> +		return &bpf_tail_call_proto;
> +	case BPF_FUNC_get_prandom_u32:
> +		return &bpf_get_prandom_u32_proto;
> +	default:
> +		return NULL;
> +	}
> +}
> +
> +static bool rawir_event_is_valid_access(int off, int size,
> +					enum bpf_access_type type,
> +					const struct bpf_prog *prog,
> +					struct bpf_insn_access_aux *info)
> +{
> +	/* struct bpf_rawir_event has two u32 fields */
> +	if (type == BPF_WRITE)
> +		return false;
> +
> +	if (size != sizeof(__u32))
> +		return false;
> +
> +	if (!(off == offsetof(struct bpf_rawir_event, duration) ||
> +	      off == offsetof(struct bpf_rawir_event, type)))
> +		return false;
> +
> +	return true;
> +}
> +
> +const struct bpf_verifier_ops rawir_event_verifier_ops = {
> +	.get_func_proto  = rawir_event_func_proto,
> +	.is_valid_access = rawir_event_is_valid_access
> +};
> +
> +#define BPF_MAX_PROGS 64
> +
> +static int rc_dev_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
> +{
> +	struct ir_raw_event_ctrl *raw;
> +	struct bpf_prog_array __rcu *old_array;
> +	struct bpf_prog_array *new_array;
> +	int ret, i, size;
> +
> +	if (rcdev->driver_type != RC_DRIVER_IR_RAW)
> +		return -EINVAL;
> +
> +	ret = mutex_lock_interruptible(&rcdev->lock);
> +	if (ret)
> +		return ret;
> +
> +	raw = rcdev->raw;
> +
> +	if (raw->progs) {
> +		size = bpf_prog_array_length(raw->progs);
> +		for (i = 0; i < size; i++) {
> +			if (prog == raw->progs->progs[i]) {
> +				ret = -EEXIST;
> +				goto out;
> +			}
> +		}
> +
> +		if (size >= BPF_MAX_PROGS) {
> +			ret = -E2BIG;
> +			goto out;
> +		}
> +	}
> +
> +	old_array = raw->progs;
> +	ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
> +	if (ret < 0)
> +		goto out;
> +
> +	rcu_assign_pointer(raw->progs, new_array);
> +	bpf_prog_array_free(old_array);
> +out:
> +	mutex_unlock(&rcdev->lock);
> +	return ret;
> +}
> +
> +static int rc_dev_bpf_detach(struct rc_dev *rcdev, struct bpf_prog *prog)
> +{
> +	struct ir_raw_event_ctrl *raw;
> +	struct bpf_prog_array __rcu *old_array;
> +	struct bpf_prog_array *new_array;
> +	int ret;
> +
> +	if (rcdev->driver_type != RC_DRIVER_IR_RAW)
> +		return -EINVAL;
> +
> +	ret = mutex_lock_interruptible(&rcdev->lock);
> +	if (ret)
> +		return ret;
> +
> +	raw = rcdev->raw;
> +
> +	old_array = raw->progs;
> +	ret = bpf_prog_array_copy(old_array, prog, NULL, &new_array);
> +	if (ret < 0) {
> +		bpf_prog_array_delete_safe(old_array, prog);
> +	} else {
> +		rcu_assign_pointer(raw->progs, new_array);
> +		bpf_prog_array_free(old_array);
> +	}
> +
> +	bpf_prog_put(prog);
> +	mutex_unlock(&rcdev->lock);
> +	return 0;
> +}
> +
> +void rc_dev_bpf_run(struct rc_dev *rcdev, struct ir_raw_event ev)
> +{
> +	struct ir_raw_event_ctrl *raw = rcdev->raw;
> +
> +	if (!raw->progs)
> +		return;
> +
> +	if (unlikely(ev.carrier_report)) {
> +		raw->bpf_rawir_event.carrier = ev.carrier;
> +		raw->bpf_rawir_event.type = BPF_RAWIR_EVENT_CARRIER;
> +	} else {
> +		raw->bpf_rawir_event.duration = ev.duration;
> +
> +		if (ev.pulse)
> +			raw->bpf_rawir_event.type = BPF_RAWIR_EVENT_PULSE;
> +		else if (ev.timeout)
> +			raw->bpf_rawir_event.type = BPF_RAWIR_EVENT_TIMEOUT;
> +		else if (ev.reset)
> +			raw->bpf_rawir_event.type = BPF_RAWIR_EVENT_RESET;
> +		else
> +			raw->bpf_rawir_event.type = BPF_RAWIR_EVENT_SPACE;
> +	}
> +
> +	BPF_PROG_RUN_ARRAY(raw->progs, &raw->bpf_rawir_event, BPF_PROG_RUN);
> +}
> +
> +void rc_dev_bpf_put(struct rc_dev *rcdev)
> +{
> +	struct bpf_prog_array *progs = rcdev->raw->progs;
> +	int i, size;
> +
> +	if (!progs)
> +		return;
> +
> +	size = bpf_prog_array_length(progs);
> +	for (i = 0; i < size; i++)
> +		bpf_prog_put(progs->progs[i]);
> +
> +	bpf_prog_array_free(rcdev->raw->progs);
> +}
> +
> +int rc_dev_prog_attach(const union bpf_attr *attr)
> +{
> +	struct bpf_prog *prog;
> +	struct rc_dev *rcdev;
> +	int ret;
> +
> +	if (attr->attach_flags)
> +		return -EINVAL;
> +
> +	prog = bpf_prog_get_type(attr->attach_bpf_fd,
> +				 BPF_PROG_TYPE_RAWIR_EVENT);
> +	if (IS_ERR(prog))
> +		return PTR_ERR(prog);
> +
> +	rcdev = rc_dev_get_from_fd(attr->target_fd);
> +	if (IS_ERR(rcdev)) {
> +		bpf_prog_put(prog);
> +		return PTR_ERR(rcdev);
> +	}
> +
> +	ret = rc_dev_bpf_attach(rcdev, prog);
> +	if (ret)
> +		bpf_prog_put(prog);
> +
> +	put_device(&rcdev->dev);
> +
> +	return ret;
> +}
> +
> +int rc_dev_prog_detach(const union bpf_attr *attr)
> +{
> +	struct bpf_prog *prog;
> +	struct rc_dev *rcdev;
> +	int ret;
> +
> +	if (attr->attach_flags)
> +		return -EINVAL;
> +
> +	prog = bpf_prog_get_type(attr->attach_bpf_fd,
> +				 BPF_PROG_TYPE_RAWIR_EVENT);
> +	if (IS_ERR(prog))
> +		return PTR_ERR(prog);
> +
> +	rcdev = rc_dev_get_from_fd(attr->target_fd);
> +	if (IS_ERR(rcdev)) {
> +		bpf_prog_put(prog);
> +		return PTR_ERR(rcdev);
> +	}
> +
> +	ret = rc_dev_bpf_detach(rcdev, prog);
> +
> +	bpf_prog_put(prog);
> +	put_device(&rcdev->dev);
> +
> +	return ret;
> +}
> +
> +int rc_dev_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
> +{
> +	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
> +	struct bpf_prog_array *progs;
> +	struct rc_dev *rcdev;
> +	u32 cnt, flags = 0;
> +	int ret;
> +
> +	if (attr->query.query_flags)
> +		return -EINVAL;
> +
> +	rcdev = rc_dev_get_from_fd(attr->query.target_fd);
> +	if (IS_ERR(rcdev))
> +		return PTR_ERR(rcdev);
> +
> +	if (rcdev->driver_type != RC_DRIVER_IR_RAW) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	ret = mutex_lock_interruptible(&rcdev->lock);
> +	if (ret)
> +		goto out;
> +
> +	progs = rcdev->raw->progs;
> +	cnt = progs ? bpf_prog_array_length(progs) : 0;
> +
> +	if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt))) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags))) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	if (attr->query.prog_cnt != 0 && prog_ids && cnt)
> +		ret = bpf_prog_array_copy_to_user(progs, prog_ids, cnt);
> +
> +out:
> +	mutex_unlock(&rcdev->lock);
> +	put_device(&rcdev->dev);
> +
> +	return ret;
> +}
> diff --git a/drivers/media/rc/lirc_dev.c b/drivers/media/rc/lirc_dev.c
> index 24e9fbb80e81..c3028d0366d1 100644
> --- a/drivers/media/rc/lirc_dev.c
> +++ b/drivers/media/rc/lirc_dev.c
> @@ -20,6 +20,7 @@
>  #include <linux/module.h>
>  #include <linux/mutex.h>
>  #include <linux/device.h>
> +#include <linux/file.h>
>  #include <linux/idr.h>
>  #include <linux/poll.h>
>  #include <linux/sched.h>
> @@ -816,4 +817,31 @@ void __exit lirc_dev_exit(void)
>  	unregister_chrdev_region(lirc_base_dev, RC_DEV_MAX);
>  }
>  
> +struct rc_dev *rc_dev_get_from_fd(int fd)
> +{
> +	struct rc_dev *dev;
> +	struct file *f;
> +
> +	f = fget_raw(fd);
> +	if (!f)
> +		return ERR_PTR(-EBADF);
> +
> +	if (!S_ISCHR(f->f_inode->i_mode) ||
> +	    imajor(f->f_inode) != MAJOR(lirc_base_dev)) {
> +		fput(f);
> +		return ERR_PTR(-EBADF);
> +	}
> +
> +	dev = container_of(f->f_inode->i_cdev, struct rc_dev, lirc_cdev);
> +	if (!dev->registered) {
> +		fput(f);
> +		return ERR_PTR(-ENODEV);
> +	}
> +
> +	get_device(&dev->dev);
> +	fput(f);
> +
> +	return dev;
> +}
> +
>  MODULE_ALIAS("lirc_dev");
> diff --git a/drivers/media/rc/rc-core-priv.h b/drivers/media/rc/rc-core-priv.h
> index e0e6a17460f6..6db579f425f1 100644
> --- a/drivers/media/rc/rc-core-priv.h
> +++ b/drivers/media/rc/rc-core-priv.h
> @@ -13,6 +13,7 @@
>  #define	MAX_IR_EVENT_SIZE	512
>  
>  #include <linux/slab.h>
> +#include <uapi/linux/bpf.h>
>  #include <media/rc-core.h>
>  
>  /**
> @@ -57,6 +58,11 @@ struct ir_raw_event_ctrl {
>  	/* raw decoder state follows */
>  	struct ir_raw_event prev_ev;
>  	struct ir_raw_event this_ev;
> +
> +#ifdef CONFIG_BPF_RAWIR_EVENT
> +	struct bpf_rawir_event		bpf_rawir_event;
> +	struct bpf_prog_array		*progs;
> +#endif
>  	struct nec_dec {
>  		int state;
>  		unsigned count;
> @@ -288,6 +294,7 @@ void ir_lirc_raw_event(struct rc_dev *dev, struct ir_raw_event ev);
>  void ir_lirc_scancode_event(struct rc_dev *dev, struct lirc_scancode *lsc);
>  int ir_lirc_register(struct rc_dev *dev);
>  void ir_lirc_unregister(struct rc_dev *dev);
> +struct rc_dev *rc_dev_get_from_fd(int fd);
>  #else
>  static inline int lirc_dev_init(void) { return 0; }
>  static inline void lirc_dev_exit(void) {}
> @@ -299,4 +306,16 @@ static inline int ir_lirc_register(struct rc_dev *dev) { return 0; }
>  static inline void ir_lirc_unregister(struct rc_dev *dev) { }
>  #endif
>  
> +/*
> + * bpf interface
> + */
> +#ifdef CONFIG_BPF_RAWIR_EVENT
> +void rc_dev_bpf_put(struct rc_dev *dev);
> +void rc_dev_bpf_run(struct rc_dev *dev, struct ir_raw_event ev);
> +#else
> +static inline void rc_dev_bpf_put(struct rc_dev *dev) { }
> +static inline void rc_dev_bpf_run(struct rc_dev *dev, struct ir_raw_event ev)
> +{ }
> +#endif
> +
>  #endif /* _RC_CORE_PRIV */
> diff --git a/drivers/media/rc/rc-ir-raw.c b/drivers/media/rc/rc-ir-raw.c
> index 374f83105a23..25828f15faec 100644
> --- a/drivers/media/rc/rc-ir-raw.c
> +++ b/drivers/media/rc/rc-ir-raw.c
> @@ -32,6 +32,7 @@ static int ir_raw_event_thread(void *data)
>  				    handler->protocols || !handler->protocols)
>  					handler->decode(raw->dev, ev);
>  			ir_lirc_raw_event(raw->dev, ev);
> +			rc_dev_bpf_run(raw->dev, ev);
>  			raw->prev_ev = ev;
>  		}
>  		mutex_unlock(&ir_raw_handler_lock);
> @@ -623,6 +624,8 @@ void ir_raw_event_unregister(struct rc_dev *dev)
>  			handler->raw_unregister(dev);
>  	mutex_unlock(&ir_raw_handler_lock);
>  
> +	rc_dev_bpf_put(dev);
> +
>  	ir_raw_event_free(dev);
>  }
>  
> diff --git a/include/linux/bpf_rcdev.h b/include/linux/bpf_rcdev.h
> new file mode 100644
> index 000000000000..17a30f30436a
> --- /dev/null
> +++ b/include/linux/bpf_rcdev.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _BPF_RCDEV_H
> +#define _BPF_RCDEV_H
> +
> +#include <linux/bpf.h>
> +#include <uapi/linux/bpf.h>
> +
> +#ifdef CONFIG_BPF_RAWIR_EVENT
> +int rc_dev_prog_attach(const union bpf_attr *attr);
> +int rc_dev_prog_detach(const union bpf_attr *attr);
> +int rc_dev_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr);
> +#else
> +static inline int rc_dev_prog_attach(const union bpf_attr *attr)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline int rc_dev_prog_detach(const union bpf_attr *attr)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline int rc_dev_prog_query(const union bpf_attr *attr,
> +				    union bpf_attr __user *uattr)
> +{
> +	return -EINVAL;
> +}
> +#endif
> +
> +#endif /* _BPF_RCDEV_H */
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index d7df1b323082..667d1d557090 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -25,6 +25,9 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
>  #ifdef CONFIG_CGROUP_BPF
>  BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
>  #endif
> +#ifdef CONFIG_BPF_RAWIR_EVENT
> +BPF_PROG_TYPE(BPF_PROG_TYPE_RAWIR_EVENT, rawir_event)
> +#endif
>  
>  BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
>  BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 02e4112510f8..8ba1be825af0 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -140,6 +140,7 @@ enum bpf_prog_type {
>  	BPF_PROG_TYPE_SK_MSG,
>  	BPF_PROG_TYPE_RAW_TRACEPOINT,
>  	BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> +	BPF_PROG_TYPE_RAWIR_EVENT,
>  };
>  
>  enum bpf_attach_type {
> @@ -157,6 +158,7 @@ enum bpf_attach_type {
>  	BPF_CGROUP_INET6_CONNECT,
>  	BPF_CGROUP_INET4_POST_BIND,
>  	BPF_CGROUP_INET6_POST_BIND,
> +	BPF_RAWIR_EVENT,
>  	__MAX_BPF_ATTACH_TYPE
>  };
>  
> @@ -1855,6 +1857,35 @@ union bpf_attr {
>   *             Egress device index on success, 0 if packet needs to continue
>   *             up the stack for further processing or a negative error in case
>   *             of failure.
> + *
> + * int bpf_rc_keydown(void *ctx, u32 protocol, u32 scancode, u32 toggle)
> + *	Description
> + *		Report decoded scancode with toggle value. For use in
> + *		BPF_PROG_TYPE_RAWIR_EVENT, to report a successfully
> + *		decoded scancode. This is will generate a keydown event,
> + *		and a keyup event once the scancode is no longer repeated.
> + *
> + *		*ctx* pointer to bpf_rawir_event, *protocol* is decoded
> + *		protocol (see RC_PROTO_* enum).
> + *
> + *		Some protocols include a toggle bit, in case the button
> + *		was released and pressed again between consecutive scancodes,
> + *		copy this bit into *toggle* if it exists, else set to 0.
> + *
> + *     Return
> + *		Always return 0 (for now)
> + *
> + * int bpf_rc_repeat(void *ctx)
> + *	Description
> + *		Repeat the last decoded scancode; some IR protocols like
> + *		NEC have a special IR message for repeat last button,
> + *		in case user is holding a button down; the scancode is
> + *		not repeated.
> + *
> + *		*ctx* pointer to bpf_rawir_event.
> + *
> + *     Return
> + *		Always return 0 (for now)
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -1926,7 +1957,9 @@ union bpf_attr {
>  	FN(skb_get_xfrm_state),		\
>  	FN(get_stack),			\
>  	FN(skb_load_bytes_relative),	\
> -	FN(fib_lookup),
> +	FN(fib_lookup),			\
> +	FN(rc_repeat),			\
> +	FN(rc_keydown),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> @@ -1993,6 +2026,26 @@ enum bpf_hdr_start_off {
>  	BPF_HDR_START_NET,
>  };
>  
> +/*
> + * user accessible mirror of in-kernel ir_raw_event
> + */
> +#define BPF_RAWIR_EVENT_SPACE		0
> +#define BPF_RAWIR_EVENT_PULSE		1
> +#define BPF_RAWIR_EVENT_TIMEOUT		2
> +#define BPF_RAWIR_EVENT_RESET		3
> +#define BPF_RAWIR_EVENT_CARRIER		4
> +#define BPF_RAWIR_EVENT_DUTY_CYCLE	5
> +
> +struct bpf_rawir_event {
> +	union {
> +		__u32	duration;
> +		__u32	carrier;
> +		__u32	duty_cycle;
> +	};
> +
> +	__u32	type;
> +};
> +
>  /* user accessible mirror of in-kernel sk_buff.
>   * new fields can only be added to the end of this structure
>   */
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index e2aeb5e89f44..75c089f407c8 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -11,6 +11,7 @@
>   */
>  #include <linux/bpf.h>
>  #include <linux/bpf_trace.h>
> +#include <linux/bpf_rcdev.h>
>  #include <linux/btf.h>
>  #include <linux/syscalls.h>
>  #include <linux/slab.h>
> @@ -1567,6 +1568,8 @@ static int bpf_prog_attach(const union bpf_attr *attr)
>  	case BPF_SK_SKB_STREAM_PARSER:
>  	case BPF_SK_SKB_STREAM_VERDICT:
>  		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, true);
> +	case BPF_RAWIR_EVENT:
> +		return rc_dev_prog_attach(attr);
>  	default:
>  		return -EINVAL;
>  	}
> @@ -1637,6 +1640,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
>  	case BPF_SK_SKB_STREAM_PARSER:
>  	case BPF_SK_SKB_STREAM_VERDICT:
>  		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, false);
> +	case BPF_RAWIR_EVENT:
> +		return rc_dev_prog_detach(attr);
>  	default:
>  		return -EINVAL;
>  	}
> @@ -1684,6 +1689,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
>  	case BPF_CGROUP_SOCK_OPS:
>  	case BPF_CGROUP_DEVICE:
>  		break;
> +	case BPF_RAWIR_EVENT:
> +		return rc_dev_prog_query(attr, uattr);
>  	default:
>  		return -EINVAL;
>  	}
> -- 
> 2.17.0

^ permalink raw reply

* Re: [PATCH 12/14] net: sched: retry action check-insert on concurrent modification
From: Vlad Buslov @ 2018-05-16 13:52 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, linux-kernel, netfilter-devel,
	coreteam, kliteyn
In-Reply-To: <20180516132135.GN1972@nanopsycho>


On Wed 16 May 2018 at 13:21, Jiri Pirko <jiri@resnulli.us> wrote:
> Wed, May 16, 2018 at 02:43:58PM CEST, vladbu@mellanox.com wrote:
>>
>>On Wed 16 May 2018 at 12:26, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Wed, May 16, 2018 at 01:55:06PM CEST, vladbu@mellanox.com wrote:
>>>>
>>>>On Wed 16 May 2018 at 09:59, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>> Mon, May 14, 2018 at 04:27:13PM CEST, vladbu@mellanox.com wrote:
>>>>>>Retry check-insert sequence in action init functions if action with same
>>>>>>index was inserted concurrently.
>>>>>>
>>>>>>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>>>>>---
>>>>>> net/sched/act_bpf.c        | 8 +++++++-
>>>>>> net/sched/act_connmark.c   | 8 +++++++-
>>>>>> net/sched/act_csum.c       | 8 +++++++-
>>>>>> net/sched/act_gact.c       | 8 +++++++-
>>>>>> net/sched/act_ife.c        | 8 +++++++-
>>>>>> net/sched/act_ipt.c        | 8 +++++++-
>>>>>> net/sched/act_mirred.c     | 8 +++++++-
>>>>>> net/sched/act_nat.c        | 8 +++++++-
>>>>>> net/sched/act_pedit.c      | 8 +++++++-
>>>>>> net/sched/act_police.c     | 9 ++++++++-
>>>>>> net/sched/act_sample.c     | 8 +++++++-
>>>>>> net/sched/act_simple.c     | 9 ++++++++-
>>>>>> net/sched/act_skbedit.c    | 8 +++++++-
>>>>>> net/sched/act_skbmod.c     | 8 +++++++-
>>>>>> net/sched/act_tunnel_key.c | 9 ++++++++-
>>>>>> net/sched/act_vlan.c       | 9 ++++++++-
>>>>>> 16 files changed, 116 insertions(+), 16 deletions(-)
>>>>>>
>>>>>>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>>>>>>index 5554bf7..7e20fdc 100644
>>>>>>--- a/net/sched/act_bpf.c
>>>>>>+++ b/net/sched/act_bpf.c
>>>>>>@@ -299,10 +299,16 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
>>>>>> 
>>>>>> 	parm = nla_data(tb[TCA_ACT_BPF_PARMS]);
>>>>>> 
>>>>>>+replay:
>>>>>> 	if (!tcf_idr_check(tn, parm->index, act, bind)) {
>>>>>> 		ret = tcf_idr_create(tn, parm->index, est, act,
>>>>>> 				     &act_bpf_ops, bind, true);
>>>>>>-		if (ret < 0)
>>>>>>+		/* Action with specified index was created concurrently.
>>>>>>+		 * Check again.
>>>>>>+		 */
>>>>>>+		if (parm->index && ret == -ENOSPC)
>>>>>>+			goto replay;
>>>>>>+		else if (ret)
>>>>>
>>>>> Hmm, looks like you are doing the same/very similar thing in every act
>>>>> code. I think it would make sense to introduce a helper function for
>>>>> this purpose.
>>>>
>>>>This code uses goto so it can't be easily refactored into standalone
>>>>function. Could you specify which part of this code you suggest to
>>>>extract?
>>>
>>> Hmm, looking at the code, I think that what would help is to have a
>>> helper that would atomically check if index exists and if not, it would
>>> allocate one. Something like:
>>>
>>>
>>> int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
>>> 			struct tc_action **a, int bind)
>>> {
>>> 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
>>> 	struct tc_action *p;
>>> 	int err;
>>>
>>> 	spin_lock(&idrinfo->lock);
>>> 	if (*index) {
>>> 		p = idr_find(&idrinfo->action_idr, *index);
>>> 		if (p) {
>>> 			if (bind)
>>> 	   			p->tcfa_bindcnt++;
>>> 			p->tcfa_refcnt++;
>>> 			*a = p;
>>> 			err = 0;
>>> 		} else {
>>> 			*a = NULL;
>>> 			err = idr_alloc_u32(idr, NULL, index,
>>> 					    *index, GFP_ATOMIC);
>>> 		}
>>> 	} else {
>>> 		*index = 1;
>>> 		*a = NULL;
>>> 		err = idr_alloc_u32(idr, NULL, index, UINT_MAX, GFP_ATOMIC);
>>> 	}
>>> 	spin_unlock(&idrinfo->lock);
>>> 	return err;
>>> }
>>>
>>> The act code would just check if "a" is NULL and if so, it would call
>>> tcf_idr_create() with allocated index as arg.
>>
>>What about multiple actions that have arbitrary code between initial
>>check and idr allocation that is currently inside tcf_idr_create()?
>
> Why it would be a problem to have them after the allocation?

Lets look at mirred for exmple:
	exists = tcf_idr_check(tn, parm->index, a, bind);
	if (exists && bind)
		return 0;

	switch (parm->eaction) {
	case TCA_EGRESS_MIRROR:
	case TCA_EGRESS_REDIR:
	case TCA_INGRESS_REDIR:
	case TCA_INGRESS_MIRROR:
		break;
	default:
		if (exists)
			tcf_idr_release(*a, bind);
		NL_SET_ERR_MSG_MOD(extack, "Unknown mirred option");
		return -EINVAL;
	}
	if (parm->ifindex) {
		dev = __dev_get_by_index(net, parm->ifindex);
		if (dev == NULL) {
			if (exists)
				tcf_idr_release(*a, bind);
			return -ENODEV;
		}
		mac_header_xmit = dev_is_mac_header_xmit(dev);
	} else {
		dev = NULL;
	}

	if (!exists) {
		if (!dev) {
			NL_SET_ERR_MSG_MOD(extack, "Specified device does not exist");
			return -EINVAL;
		}
		ret = tcf_idr_create(tn, parm->index, est, a,
				     &act_mirred_ops, bind, true);
		/* Action with specified index was created concurrently.
		 * Check again.
		 */
		if (parm->index && ret == -ENOSPC)
			goto replay;
		else if (ret)
			return ret;

There are several returns and cleanup is only performed when action
exists. So all code like that will have to be audited to also remove
index from idr, otherwise idr handles leak on return.

>
> There is one issue though with my draft. tcf_idr_insert() function
> which actually assigns a "p" pointer to the idr index is called later on.
> Until that happens, the idr_find() would return NULL even if the index
> is actually allocated. We cannot assign "p" in tcf_idr_check_alloc()
> because it is allocated only later on in tcf_idr_create(). But that is
> resolvable by the following trick:
>
> int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
> 			struct tc_action **a, int bind)
> {
> 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
> 	struct tc_action *p;
> 	int err;
>
> again:
> 	spin_lock(&idrinfo->lock);
> 	if (*index) {
>  		p = idr_find(&idrinfo->action_idr, *index);
> 		if (IS_ERR(p)) {
> 			/* This means that another process allocated
> 			 * index but did not assign the pointer yet.
> 			 */
> 			spin_unlock(&idrinfo->lock);
> 			goto again;
> 		}
>  		if (p) {
>  			if (bind)
>  	   			p->tcfa_bindcnt++;
>  			p->tcfa_refcnt++;
>  			*a = p;
>  			err = 0;
>  		} else {
>  			*a = NULL;
>  			err = idr_alloc_u32(idr, NULL, index,
>  					    *index, GFP_ATOMIC);
> 			idr_replace(&idrinfo->action_idr,
> 				    ERR_PTR(-EBUSY), *index);
>  		}
>  	} else {
>  		*index = 1;
> 		*a = NULL;
>  		err = idr_alloc_u32(idr, NULL, index, UINT_MAX, GFP_ATOMIC);
> 		idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY), *index);
>  	}
>  	spin_unlock(&idrinfo->lock);
>  	return err;
> }
>

So users of action idr that might perform concurrent lookups are also
have to be changed to check for error pointers, that now can be inserted
into idr? Seems like a complex change...

>
>
>
>
>>
>>>
>>>
>>>>
>>>>>
>>>>> [...]
>>>>
>>

^ permalink raw reply

* Re: [RFC v4 3/5] virtio_ring: add packed ring support
From: Tiwei Bie @ 2018-05-16 13:45 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann
In-Reply-To: <ecb50c84-b412-ff0b-6c52-fd789b6c8a86@redhat.com>

On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> On 2018年05月16日 20:39, Tiwei Bie wrote:
> > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > On 2018年05月16日 16:37, Tiwei Bie wrote:
> > [...]
> > > >    struct vring_virtqueue {
> > > > @@ -116,6 +117,9 @@ struct vring_virtqueue {
> > > >    			/* Last written value to driver->flags in
> > > >    			 * guest byte order. */
> > > >    			u16 event_flags_shadow;
> > > > +
> > > > +			/* ID allocation. */
> > > > +			struct idr buffer_id;
> > > I'm not sure idr is fit for the performance critical case here. Need to
> > > measure its performance impact, especially if we have few unused slots.
> > I'm also not sure.. But fortunately, it should be quite easy
> > to replace it with something else without changing other code.
> > If it will really hurt the performance, I'll change it.
> 
> We may want to do some benchmarking/profiling to see.

Yeah!

> 
> > 
> > > >    		};
> > > >    	};
> > [...]
> > > > +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> > > > +			      unsigned int id, void **ctx)
> > > > +{
> > > > +	struct vring_packed_desc *desc;
> > > > +	unsigned int i, j;
> > > > +
> > > > +	/* Clear data ptr. */
> > > > +	vq->desc_state[id].data = NULL;
> > > > +
> > > > +	i = head;
> > > > +
> > > > +	for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > +		desc = &vq->vring_packed.desc[i];
> > > > +		vring_unmap_one_packed(vq, desc);
> > > As mentioned in previous discussion, this probably won't work for the case
> > > of out of order completion since it depends on the information in the
> > > descriptor ring. We probably need to extend ctx to record such information.
> > Above code doesn't depend on the information in the descriptor
> > ring. The vq->desc_state[] is the extended ctx.
> > 
> > Best regards,
> > Tiwei Bie
> 
> Yes, but desc is a pointer to descriptor ring I think so
> vring_unmap_one_packed() still depends on the content of descriptor ring?
> 

I got your point now. I think it makes sense to reserve
the bits of the addr field. Driver shouldn't try to get
addrs from the descriptors when cleanup the descriptors
no matter whether we support out-of-order or not.

But combining it with the out-of-order support, it will
mean that the driver still needs to maintain a desc/ctx
list that is very similar to the desc ring in the split
ring. I'm not quite sure whether it's something we want.
If it is true, I'll do it. So do you think we also want
to maintain such a desc/ctx list for packed ring?

Best regards,
Tiwei Bie

^ permalink raw reply

* Re: [iproute2] Bug#898840: Latest update breaks ip6 default gateway cli api
From: Luca Boccassi @ 2018-05-16 13:42 UTC (permalink / raw)
  To: Serhey Popovych, Stephen Hemminger; +Cc: Hans van Kranenburg, 898840, netdev
In-Reply-To: <4de26023-48bf-fb98-2592-6fe5a6bc5c49@mendix.com>

[-- Attachment #1: Type: text/plain, Size: 1439 bytes --]

On Wed, 2018-05-16 at 14:26 +0200, Hans van Kranenburg wrote:
> Package: iproute2
> Version: 4.16.0-2
> Severity: normal
> 
> Hi,
> 
> The last iproute2 update has a backwards incompatible change in
> setting
> IPv6 default routes, breaking existing configuration and scripts.
> 
> Previously, the following was possible, and now it requires an
> explicit
> -6 option to be added:
> 
> -# ip route add default via 2001:db8::1 dev eth0
> Error: inet address is expected rather than "2001:db8::1".
> 
> This works:
> -# ip -6 route add default via 2001:db8::1 dev eth0
> 
> I found out after having systems end up being unreachable after a
> reboot, because I have commands like these in network/interfaces.
> 
> I had a look at upstream changelogs, but I don't see any mention of
> this, and suspect it was not intentional. However, it's bad.

Hello Serhey and Stephen,

Hans reported a regression in v4.16.0, ip route now requires -6 to be
manually added when using v6 addresses while up to 4.15 it didn't, the
commands quoted show the problem.

Bisecting shows that the following commit from Serhey introduced the
problem:

93fa12418dc6f5943692250244be303bb162175b
utils: Always specify family and ->bytelen in get_prefix_1()

Could you please have a look when you have a moment? It's very easy to
reproduce, and it breaks existing scripts and so on.

Thanks!

-- 
Kind regards,
Luca Boccassi

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
From: Jiri Slaby @ 2018-05-16 13:36 UTC (permalink / raw)
  To: Hayes Wang, Oliver Neukum; +Cc: netdev@vger.kernel.org
In-Reply-To: <60052c70-ae38-7be8-5f70-5598104a13bd@suse.cz>

On 05/16/2018, 03:29 PM, Jiri Slaby wrote:
> On 05/16/2018, 02:07 PM, Hayes Wang wrote:
>> Oliver Neukum [mailto:oneukum@suse.com]
>>> Sent: Wednesday, May 16, 2018 6:10 PM
>> [...]
>>>> Besides, I find a similar issue as following.
>>>> https://www.spinics.net/lists/netdev/msg493512.html
>>>
>>> Well, if we have an imbalance in NAPI it should strike whereever
>>> it is used, not just in suspend(). Is there debugging for NAPI
>>> we could activate?
>>
>> No. The driver doesn't embed such debugging about it.
> 
> Despite of that, Oliver, I have a kernel with a debug patch (attached)
> at (suse-only link):
> https://build.suse.de/project/show/home:jirislaby:stable-drm
> 
>> And I don't find an imbalance between napi_disable() and napi_enable().
> 
> There is none, apparently (the warns never triggered). BUt still the
> driver sucks wrt both power mgmt and dock plug/unplug. Since I am using
> the patch (it upper-bounds the napi_disable loop count) and the udev
> rule below, I can really use the nic.

BTW the added warning to napi_disable indeed triggers:

> xzgrep -a -B 2 kernel:.*WARNING.*napi messages-20180*
> messages-20180503.xz:2018-04-27T09:57:00.048922+02:00 anemoi2 kernel: [158616.363052] ------------[ cut here ]------------
> messages-20180503.xz:2018-04-27T09:57:00.048979+02:00 anemoi2 kernel: [158616.363070] NAPI_STATE_SCHED never cleared
> messages-20180503.xz:2018-04-27T09:57:00.048988+02:00 anemoi2 kernel: [158616.363120] WARNING: CPU: 1 PID: 14365 at ../net/core/dev.c:5665 napi_disable+0x3d/0x80

And since I do 'ip l set dev ethX down' before unplugging the dock with
the NIC, I have not seen a single occurrence.

So I assume it must be a problem of making usb->disconnect without prior
ndo->close (or alike).

thanks,
-- 
js
suse labs

^ permalink raw reply

* Re: i.MX6S/DL and QCA8334 switch using DSA driver - CPU port not working
From: Michal Vokáč @ 2018-05-16 13:32 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Florian Fainelli, netdev, Vivien Didelot
In-Reply-To: <20180516131753.GG22000@lunn.ch>

On 16.5.2018 15:17, Andrew Lunn wrote:
>> So as the fixed-link subnode is optional we still should force some sensible
>> defaults if it is not used. Same as Marvell drives does. Can I say that we
>> agreed that the current CPU port setting is not correct and the fastest link
>> speed and duplex supported by the chip should be set as a default?
> 
> That is a sensible default.

OK.

>>> As far as people using this driver, John submitted it, but we have not
>>> see many fixes/enhancements, so I am not clear who is actually using it
>>> these days... glad you are though!
>>
>> Thanks.
>>
>> To the compatibility of the driver with the QCA8334 that I use. I am not sure
>> what should be the correct way to deal with that. For example Marvell binding
>> uses only two compatible strings for a large range of chips in one family.
>> While Broadcom has one compatible string for each chip.
> 
> With the Marvell devices, there is an ID register which tells you the
> model. The compatible string tells us the information we need in order
> to find that ID register. Once we know the ID, we have all the
> information we need, so don't need a more specific compatible string.
> 
>> As I mentioned earlier
>> in this thread the QCA8334 switch has four ports while the already supported
>> QCA8337 has seven ports. That is the only difference. Register address space
>> is the same.
> 
> Can you identify the exact model using some ID register? If yes, than
> the existing compatible string is sufficient. If no, you need an
> additional compatible string.

Nope. Device ID is the same for the whole family so we need a new compatible.
--	
Michal

^ permalink raw reply

* Re: [PATCH net-next v2 15/15] arm64: dts: allwinner: a64: add SRAM controller device tree node
From: Maxime Ripard @ 2018-05-16 13:31 UTC (permalink / raw)
  To: Chen-Yu Tsai
  Cc: Icenowy Zheng, linux-arm-kernel, Mark Rutland, devicetree,
	Stephen Boyd, netdev, Michael Turquette, Rob Herring,
	Corentin Labbe, Mark Brown, Giuseppe Cavallaro, linux-clk
In-Reply-To: <CAGb2v65uZeueE=2FfsxWen9zNvCtMsJ+b=KgMfSh-ZKmO+S=cQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4635 bytes --]

Hi,

On Tue, May 15, 2018 at 11:47:16PM -0700, Chen-Yu Tsai wrote:
> On Mon, May 14, 2018 at 1:03 AM, Maxime Ripard
> <maxime.ripard@bootlin.com> wrote:
> > 1;5201;0c
> > On Sun, May 13, 2018 at 12:37:49PM -0700, Chen-Yu Tsai wrote:
> >> On Wed, May 2, 2018 at 4:54 AM, Maxime Ripard <maxime.ripard@bootlin.com> wrote:
> >> > On Wed, May 02, 2018 at 06:19:51PM +0800, Icenowy Zheng wrote:
> >> >>
> >> >>
> >> >> 于 2018年5月2日 GMT+08:00 下午5:53:21, Chen-Yu Tsai <wens@csie.org> 写到:
> >> >> >On Wed, May 2, 2018 at 5:51 PM, Maxime Ripard
> >> >> ><maxime.ripard@bootlin.com> wrote:
> >> >> >> Hi,
> >> >> >>
> >> >> >> On Wed, May 02, 2018 at 12:12:27AM +0800, Chen-Yu Tsai wrote:
> >> >> >>> From: Icenowy Zheng <icenowy@aosc.io>
> >> >> >>>
> >> >> >>> Allwinner A64 has a SRAM controller, and in the device tree
> >> >> >currently
> >> >> >>> we have a syscon node to enable EMAC driver to access the EMAC clock
> >> >> >>> register. As SRAM controller driver can now export regmap for this
> >> >> >>> register, replace the syscon node to the SRAM controller device
> >> >> >node,
> >> >> >>> and let EMAC driver to acquire its EMAC clock regmap.
> >> >> >>>
> >> >> >>> Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
> >> >> >>> Signed-off-by: Chen-Yu Tsai <wens@csie.org>
> >> >> >>> ---
> >> >> >>>  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 23
> >> >> >+++++++++++++++----
> >> >> >>>  1 file changed, 19 insertions(+), 4 deletions(-)
> >> >> >>>
> >> >> >>> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> >> >> >b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> >> >> >>> index 1b2ef28c42bd..1c37659d9d41 100644
> >> >> >>> --- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> >> >> >>> +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> >> >> >>> @@ -168,10 +168,25 @@
> >> >> >>>               #size-cells = <1>;
> >> >> >>>               ranges;
> >> >> >>>
> >> >> >>> -             syscon: syscon@1c00000 {
> >> >> >>> -                     compatible =
> >> >> >"allwinner,sun50i-a64-system-controller",
> >> >> >>> -                             "syscon";
> >> >> >>> +             sram_controller: sram-controller@1c00000 {
> >> >> >>> +                     compatible =
> >> >> >"allwinner,sun50i-a64-sram-controller";
> >> >> >>
> >> >> >> I don't think there's anything preventing us from keeping the
> >> >> >> -system-controller compatible. It's what was in the DT before, and
> >> >> >> it's how it's called in the datasheet.
> >> >> >
> >> >> >I actually meant to ask you about this. The -system-controller
> >> >> >compatible matches the datasheet better. Maybe we should just
> >> >> >switch to that one?
> >> >>
> >> >> No, if we do the switch the system-controller compatible,
> >> >> the device will be probed on the same memory region with
> >> >> a syscon on old DTs.
> >> >
> >> > The device hasn't magically changed either. Maybe we just need to add
> >> > a check to make sure we don't have the syscon compatible in the SRAM
> >> > driver probe so that the double driver issue doesn't happen?
> >>
> >> The syscon interface (which is not even a full blown device driver)
> >> only looks at the "syscon" compatible. Either way we're removing that
> >> part from the device tree so things should be ok for new device trees.
> >> As Maxime mentioned we can do a check for the syscon compatible and
> >> either give a warning to the user asking them to update their device
> >> tree, or not register our custom regmap, or not probe the SRAM driver.
> >> Personally I prefer the first option. The system controller block is
> >> probed before any syscon users, so we should be fine, given the dwmac
> >> driver goes the custom regmap path first.
> >>
> >> BTW, I still might end up changing the compatible. The manual uses
> >> "system control", not "system controller", which I think makes sense,
> >> since it is just a bunch of register files, kind of like the GRF
> >> (General Register Files) block found in Rockchip SoCs [1], and not an
> >> actual "controller".
> >
> > I'm not really fond of that, but we should at least make it consistent
> > on the other patches Paul sent then.
> 
> For the A10s / A13 right?

And A33, yep.

> I think my naming is slightly better, but it's just a minor detail.

Let's do this then.

> While we're still debating this, can I merge the R40 stuff first?
> The driver bits are already in.

Yep, go ahead.

Maxime

-- 
Maxime Ripard, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [RFC v4 4/5] virtio_ring: add event idx support in packed ring
From: Jason Wang @ 2018-05-16 13:31 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann
In-Reply-To: <20180516125825.GA4171@debian>



On 2018年05月16日 20:58, Tiwei Bie wrote:
> On Wed, May 16, 2018 at 08:17:21PM +0800, Jason Wang wrote:
>> On 2018年05月16日 16:37, Tiwei Bie wrote:
> [...]
>>> @@ -1160,15 +1186,27 @@ static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
>>>    static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
>>>    {
>>>    	struct vring_virtqueue *vq = to_vvq(_vq);
>>> +	u16 wrap_counter;
>>>    	START_USE(vq);
>>>    	/* We optimistically turn back on interrupts, then check if there was
>>>    	 * more to do. */
>>> +	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
>>> +	 * either clear the flags bit or point the event index at the next
>>> +	 * entry. Always update the event index to keep code simple. */
>>> +
>>> +	wrap_counter = vq->wrap_counter;
>>> +	if (vq->last_used_idx > vq->next_avail_idx)
>> Should this be ">=" consider rx refill may try to completely fill the ring?
> It seems that there are two cases that last_used_idx
> equals to next_avail_idx. The first one is that the
> ring is empty. And the second one is that the ring
> is full. Although in the first case, most probably,
> the driver won't enable the interrupt.
>
> Maybe I really should track the used_wrap_counter
> instead of calculating it each time I need it.. I'll
> give it a try..
>

Right, good to know and this will match spec sample code.

Thanks

^ permalink raw reply

* Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
From: Jiri Slaby @ 2018-05-16 13:29 UTC (permalink / raw)
  To: Hayes Wang, Oliver Neukum; +Cc: netdev@vger.kernel.org
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2D2E47F8B@RTITMBSV06.realtek.com.tw>

[-- Attachment #1: Type: text/plain, Size: 1143 bytes --]

On 05/16/2018, 02:07 PM, Hayes Wang wrote:
> Oliver Neukum [mailto:oneukum@suse.com]
>> Sent: Wednesday, May 16, 2018 6:10 PM
> [...]
>>> Besides, I find a similar issue as following.
>>> https://www.spinics.net/lists/netdev/msg493512.html
>>
>> Well, if we have an imbalance in NAPI it should strike whereever
>> it is used, not just in suspend(). Is there debugging for NAPI
>> we could activate?
> 
> No. The driver doesn't embed such debugging about it.

Despite of that, Oliver, I have a kernel with a debug patch (attached)
at (suse-only link):
https://build.suse.de/project/show/home:jirislaby:stable-drm

> And I don't find an imbalance between napi_disable() and napi_enable().

There is none, apparently (the warns never triggered). BUt still the
driver sucks wrt both power mgmt and dock plug/unplug. Since I am using
the patch (it upper-bounds the napi_disable loop count) and the udev
rule below, I can really use the nic.

$ cat /etc/udev/rules.d/10-disable-r8152-pm.rules
ACTION=="add", SUBSYSTEM=="usb", ATTR{idProduct}=="8153",
ATTR{idVendor}=="0bda", TEST=="power/control", ATTR{power/control}="on"

thanks,
-- 
js
suse labs

[-- Attachment #2: r8152.patch --]
[-- Type: text/x-patch, Size: 5467 bytes --]

---
 drivers/net/usb/r8152.c |   62 +++++++++++++++++++++++++++++++-----------------
 net/core/dev.c          |   14 +++++++++-
 2 files changed, 53 insertions(+), 23 deletions(-)

--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -704,6 +704,8 @@ struct r8152 {
 	unsigned long flags;
 	struct usb_device *udev;
 	struct napi_struct napi;
+	int napi_stat;
+	void *napi_last_en, *napi_last_dis;
 	struct usb_interface *intf;
 	struct net_device *netdev;
 	struct urb *intr_urb;
@@ -775,6 +777,31 @@ static unsigned int agg_buf_sz = 16384;
 #define RTL_LIMITED_TSO_SIZE	(agg_buf_sz - sizeof(struct tx_desc) - \
 				 VLAN_ETH_HLEN - ETH_FCS_LEN)
 
+static void my_napi_enable(struct r8152 *tp)
+{
+	if (tp->napi_stat == 0) {
+		napi_enable(&tp->napi);
+		tp->napi_stat++;
+		tp->napi_last_en = (void *)_RET_IP_;
+		return;
+	}
+
+	WARN(1, "napi_stat=%d\n", tp->napi_stat);
+}
+
+static void my_napi_disable(struct r8152 *tp)
+{
+	if (tp->napi_stat == 1) {
+		napi_disable(&tp->napi);
+		tp->napi_stat--;
+		tp->napi_last_dis = (void *)_RET_IP_;
+		return;
+	}
+
+	WARN(1, "napi_stat=%d last_dis=%pF last_en=%pF\n",
+			tp->napi_stat, tp->napi_last_en, tp->napi_last_dis);
+}
+
 static
 int get_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
 {
@@ -3787,7 +3814,6 @@ static bool rtl8153_in_nway(struct r8152
 static void set_carrier(struct r8152 *tp)
 {
 	struct net_device *netdev = tp->netdev;
-	struct napi_struct *napi = &tp->napi;
 	u8 speed;
 
 	speed = rtl8152_get_speed(tp);
@@ -3796,12 +3822,12 @@ static void set_carrier(struct r8152 *tp
 		if (!netif_carrier_ok(netdev)) {
 			tp->rtl_ops.enable(tp);
 			netif_stop_queue(netdev);
-			napi_disable(napi);
+			my_napi_disable(tp);
 			netif_carrier_on(netdev);
 			rtl_start_rx(tp);
 			clear_bit(RTL8152_SET_RX_MODE, &tp->flags);
 			_rtl8152_set_rx_mode(netdev);
-			napi_enable(&tp->napi);
+			my_napi_enable(tp);
 			netif_wake_queue(netdev);
 			netif_info(tp, link, netdev, "carrier on\n");
 		} else if (netif_queue_stopped(netdev) &&
@@ -3811,9 +3837,9 @@ static void set_carrier(struct r8152 *tp
 	} else {
 		if (netif_carrier_ok(netdev)) {
 			netif_carrier_off(netdev);
-			napi_disable(napi);
+			my_napi_disable(tp);
 			tp->rtl_ops.disable(tp);
-			napi_enable(napi);
+			my_napi_enable(tp);
 			netif_info(tp, link, netdev, "carrier off\n");
 		}
 	}
@@ -3934,7 +3960,7 @@ static int rtl8152_open(struct net_devic
 			   res);
 		goto out_unlock;
 	}
-	napi_enable(&tp->napi);
+	my_napi_enable(tp);
 
 	mutex_unlock(&tp->control);
 
@@ -3962,7 +3988,7 @@ static int rtl8152_close(struct net_devi
 #ifdef CONFIG_PM_SLEEP
 	unregister_pm_notifier(&tp->pm_notifier);
 #endif
-	napi_disable(&tp->napi);
+	my_napi_disable(tp);
 	clear_bit(WORK_ENABLE, &tp->flags);
 	usb_kill_urb(tp->intr_urb);
 	cancel_delayed_work_sync(&tp->schedule);
@@ -4230,7 +4256,7 @@ static int rtl8152_pre_reset(struct usb_
 		return 0;
 
 	netif_stop_queue(netdev);
-	napi_disable(&tp->napi);
+	my_napi_disable(tp);
 	clear_bit(WORK_ENABLE, &tp->flags);
 	usb_kill_urb(tp->intr_urb);
 	cancel_delayed_work_sync(&tp->schedule);
@@ -4264,7 +4290,7 @@ static int rtl8152_post_reset(struct usb
 		mutex_unlock(&tp->control);
 	}
 
-	napi_enable(&tp->napi);
+	my_napi_enable(tp);
 	netif_wake_queue(netdev);
 	usb_submit_urb(tp->intr_urb, GFP_KERNEL);
 
@@ -4302,10 +4328,8 @@ static int rtl8152_runtime_resume(struct
 	struct net_device *netdev = tp->netdev;
 
 	if (netif_running(netdev) && netdev->flags & IFF_UP) {
-		struct napi_struct *napi = &tp->napi;
-
 		tp->rtl_ops.autosuspend_en(tp, false);
-		napi_disable(napi);
+		my_napi_disable(tp);
 		set_bit(WORK_ENABLE, &tp->flags);
 
 		if (netif_carrier_ok(netdev)) {
@@ -4318,7 +4342,7 @@ static int rtl8152_runtime_resume(struct
 			}
 		}
 
-		napi_enable(napi);
+		my_napi_enable(tp);
 		clear_bit(SELECTIVE_SUSPEND, &tp->flags);
 		smp_mb__after_atomic();
 
@@ -4388,13 +4412,11 @@ static int rtl8152_runtime_suspend(struc
 		tp->rtl_ops.autosuspend_en(tp, true);
 
 		if (netif_carrier_ok(netdev)) {
-			struct napi_struct *napi = &tp->napi;
-
-			napi_disable(napi);
+			my_napi_disable(tp);
 			rtl_stop_rx(tp);
 			rxdy_gated_en(tp, false);
 			ocp_write_dword(tp, MCU_TYPE_PLA, PLA_RCR, rcr);
-			napi_enable(napi);
+			my_napi_enable(tp);
 		}
 
 		if (delay_autosuspend(tp)) {
@@ -4415,14 +4437,12 @@ static int rtl8152_system_suspend(struct
 	netif_device_detach(netdev);
 
 	if (netif_running(netdev) && test_bit(WORK_ENABLE, &tp->flags)) {
-		struct napi_struct *napi = &tp->napi;
-
 		clear_bit(WORK_ENABLE, &tp->flags);
 		usb_kill_urb(tp->intr_urb);
-		napi_disable(napi);
+		my_napi_disable(tp);
 		cancel_delayed_work_sync(&tp->schedule);
 		tp->rtl_ops.down(tp);
-		napi_enable(napi);
+		my_napi_enable(tp);
 	}
 
 	return ret;
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5654,13 +5654,23 @@ EXPORT_SYMBOL(netif_napi_add);
 
 void napi_disable(struct napi_struct *n)
 {
+	int a;
+
 	might_sleep();
 	set_bit(NAPI_STATE_DISABLE, &n->state);
 
-	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
+	a = 0;
+	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state)) {
 		msleep(1);
-	while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
+		if (WARN(a++ > 20 * 1000, "NAPI_STATE_SCHED never cleared"))
+			break;
+	}
+	a = 0;
+	while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state)) {
 		msleep(1);
+		if (WARN(a++ > 20 * 1000, "NAPI_STATE_NPSVC never cleared"))
+			break;
+	}
 
 	hrtimer_cancel(&n->timer);
 

^ permalink raw reply

* Re: simplify procfs code for seq_file instances V3
From: Al Viro @ 2018-05-16 13:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Alexey Dobriyan, Greg Kroah-Hartman, Jiri Slaby,
	Alessandro Zummo, Alexandre Belloni, linux-acpi, drbd-dev,
	linux-ide, netdev, linux-rtc, megaraidlinux.pdl, linux-scsi,
	devel, linux-afs, linux-ext4, jfs-discussion, netfilter-devel,
	linux-kernel
In-Reply-To: <20180516094346.20506-1-hch@lst.de>

On Wed, May 16, 2018 at 11:43:04AM +0200, Christoph Hellwig wrote:
> We currently have hundreds of proc files that implement plain, read-only
> seq_file based interfaces.  This series consolidates them using new
> procfs helpers that take the seq_operations or simple show callback
> directly.
> 
> A git tree is available at:
> 
>     git://git.infradead.org/users/hch/misc.git proc_create.3

Pulled, but the last bit is a bleeding atrocity in need of followup
cleanup.

^ permalink raw reply

* Re: [PATCH 12/14] net: sched: retry action check-insert on concurrent modification
From: Jiri Pirko @ 2018-05-16 13:21 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev, davem, jhs, xiyou.wangcong, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, linux-kernel, netfilter-devel,
	coreteam, kliteyn
In-Reply-To: <vbfwow322k1.fsf@reg-r-vrt-018-180.mtr.labs.mlnx>

Wed, May 16, 2018 at 02:43:58PM CEST, vladbu@mellanox.com wrote:
>
>On Wed 16 May 2018 at 12:26, Jiri Pirko <jiri@resnulli.us> wrote:
>> Wed, May 16, 2018 at 01:55:06PM CEST, vladbu@mellanox.com wrote:
>>>
>>>On Wed 16 May 2018 at 09:59, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> Mon, May 14, 2018 at 04:27:13PM CEST, vladbu@mellanox.com wrote:
>>>>>Retry check-insert sequence in action init functions if action with same
>>>>>index was inserted concurrently.
>>>>>
>>>>>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>>>>---
>>>>> net/sched/act_bpf.c        | 8 +++++++-
>>>>> net/sched/act_connmark.c   | 8 +++++++-
>>>>> net/sched/act_csum.c       | 8 +++++++-
>>>>> net/sched/act_gact.c       | 8 +++++++-
>>>>> net/sched/act_ife.c        | 8 +++++++-
>>>>> net/sched/act_ipt.c        | 8 +++++++-
>>>>> net/sched/act_mirred.c     | 8 +++++++-
>>>>> net/sched/act_nat.c        | 8 +++++++-
>>>>> net/sched/act_pedit.c      | 8 +++++++-
>>>>> net/sched/act_police.c     | 9 ++++++++-
>>>>> net/sched/act_sample.c     | 8 +++++++-
>>>>> net/sched/act_simple.c     | 9 ++++++++-
>>>>> net/sched/act_skbedit.c    | 8 +++++++-
>>>>> net/sched/act_skbmod.c     | 8 +++++++-
>>>>> net/sched/act_tunnel_key.c | 9 ++++++++-
>>>>> net/sched/act_vlan.c       | 9 ++++++++-
>>>>> 16 files changed, 116 insertions(+), 16 deletions(-)
>>>>>
>>>>>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>>>>>index 5554bf7..7e20fdc 100644
>>>>>--- a/net/sched/act_bpf.c
>>>>>+++ b/net/sched/act_bpf.c
>>>>>@@ -299,10 +299,16 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
>>>>> 
>>>>> 	parm = nla_data(tb[TCA_ACT_BPF_PARMS]);
>>>>> 
>>>>>+replay:
>>>>> 	if (!tcf_idr_check(tn, parm->index, act, bind)) {
>>>>> 		ret = tcf_idr_create(tn, parm->index, est, act,
>>>>> 				     &act_bpf_ops, bind, true);
>>>>>-		if (ret < 0)
>>>>>+		/* Action with specified index was created concurrently.
>>>>>+		 * Check again.
>>>>>+		 */
>>>>>+		if (parm->index && ret == -ENOSPC)
>>>>>+			goto replay;
>>>>>+		else if (ret)
>>>>
>>>> Hmm, looks like you are doing the same/very similar thing in every act
>>>> code. I think it would make sense to introduce a helper function for
>>>> this purpose.
>>>
>>>This code uses goto so it can't be easily refactored into standalone
>>>function. Could you specify which part of this code you suggest to
>>>extract?
>>
>> Hmm, looking at the code, I think that what would help is to have a
>> helper that would atomically check if index exists and if not, it would
>> allocate one. Something like:
>>
>>
>> int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
>> 			struct tc_action **a, int bind)
>> {
>> 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
>> 	struct tc_action *p;
>> 	int err;
>>
>> 	spin_lock(&idrinfo->lock);
>> 	if (*index) {
>> 		p = idr_find(&idrinfo->action_idr, *index);
>> 		if (p) {
>> 			if (bind)
>> 	   			p->tcfa_bindcnt++;
>> 			p->tcfa_refcnt++;
>> 			*a = p;
>> 			err = 0;
>> 		} else {
>> 			*a = NULL;
>> 			err = idr_alloc_u32(idr, NULL, index,
>> 					    *index, GFP_ATOMIC);
>> 		}
>> 	} else {
>> 		*index = 1;
>> 		*a = NULL;
>> 		err = idr_alloc_u32(idr, NULL, index, UINT_MAX, GFP_ATOMIC);
>> 	}
>> 	spin_unlock(&idrinfo->lock);
>> 	return err;
>> }
>>
>> The act code would just check if "a" is NULL and if so, it would call
>> tcf_idr_create() with allocated index as arg.
>
>What about multiple actions that have arbitrary code between initial
>check and idr allocation that is currently inside tcf_idr_create()?

Why it would be a problem to have them after the allocation?

There is one issue though with my draft. tcf_idr_insert() function
which actually assigns a "p" pointer to the idr index is called later on.
Until that happens, the idr_find() would return NULL even if the index
is actually allocated. We cannot assign "p" in tcf_idr_check_alloc()
because it is allocated only later on in tcf_idr_create(). But that is
resolvable by the following trick:

int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
			struct tc_action **a, int bind)
{
	struct tcf_idrinfo *idrinfo = tn->idrinfo;
	struct tc_action *p;
	int err;

again:
	spin_lock(&idrinfo->lock);
	if (*index) {
 		p = idr_find(&idrinfo->action_idr, *index);
		if (IS_ERR(p)) {
			/* This means that another process allocated
			 * index but did not assign the pointer yet.
			 */
			spin_unlock(&idrinfo->lock);
			goto again;
		}
 		if (p) {
 			if (bind)
 	   			p->tcfa_bindcnt++;
 			p->tcfa_refcnt++;
 			*a = p;
 			err = 0;
 		} else {
 			*a = NULL;
 			err = idr_alloc_u32(idr, NULL, index,
 					    *index, GFP_ATOMIC);
			idr_replace(&idrinfo->action_idr,
				    ERR_PTR(-EBUSY), *index);
 		}
 	} else {
 		*index = 1;
		*a = NULL;
 		err = idr_alloc_u32(idr, NULL, index, UINT_MAX, GFP_ATOMIC);
		idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY), *index);
 	}
 	spin_unlock(&idrinfo->lock);
 	return err;
}





>
>>
>>
>>>
>>>>
>>>> [...]
>>>
>

^ permalink raw reply

* Re: i.MX6S/DL and QCA8334 switch using DSA driver - CPU port not working
From: Andrew Lunn @ 2018-05-16 13:17 UTC (permalink / raw)
  To: Michal Vokáč; +Cc: Florian Fainelli, netdev, Vivien Didelot
In-Reply-To: <bc6fbc83-7c12-182b-cc8c-0dd0da6a9e0e@gmail.com>

> So as the fixed-link subnode is optional we still should force some sensible
> defaults if it is not used. Same as Marvell drives does. Can I say that we
> agreed that the current CPU port setting is not correct and the fastest link
> speed and duplex supported by the chip should be set as a default?

That is a sensible default.

> >As far as people using this driver, John submitted it, but we have not
> >see many fixes/enhancements, so I am not clear who is actually using it
> >these days... glad you are though!
> 
> Thanks.
> 
> To the compatibility of the driver with the QCA8334 that I use. I am not sure
> what should be the correct way to deal with that. For example Marvell binding
> uses only two compatible strings for a large range of chips in one family.
> While Broadcom has one compatible string for each chip.

With the Marvell devices, there is an ID register which tells you the
model. The compatible string tells us the information we need in order
to find that ID register. Once we know the ID, we have all the
information we need, so don't need a more specific compatible string.

> As I mentioned earlier
> in this thread the QCA8334 switch has four ports while the already supported
> QCA8337 has seven ports. That is the only difference. Register address space
> is the same.

Can you identify the exact model using some ID register? If yes, than
the existing compatible string is sufficient. If no, you need an
additional compatible string.

	Andrew

^ permalink raw reply

* Re: [Jfs-discussion] [PATCH 25/42] jfs: simplify procfs code
From: Dave Kleikamp @ 2018-05-16 13:15 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, Alexander Viro
  Cc: linux-rtc, Alessandro Zummo, Alexandre Belloni, devel, linux-scsi,
	linux-acpi, Greg Kroah-Hartman, Jiri Slaby, linux-kernel,
	Alexey Dobriyan, linux-ide, netfilter-devel, megaraidlinux.pdl,
	netdev, linux-ext4, linux-afs, jfs-discussion, drbd-dev
In-Reply-To: <20180516094346.20506-26-hch@lst.de>

On 05/16/2018 04:43 AM, Christoph Hellwig wrote:
> Use remove_proc_subtree to remove the whole subtree on cleanup, and
> unwind the registration loop into individual calls.  Switch to use
> proc_create_seq where applicable.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Acked-by:  Dave Kleikamp <dave.kleikamp@oracle.com>

> ---
>  fs/jfs/jfs_debug.c    | 43 ++++++++++++++-----------------------------
>  fs/jfs/jfs_debug.h    | 10 +++++-----
>  fs/jfs/jfs_logmgr.c   | 14 +-------------
>  fs/jfs/jfs_metapage.c | 14 +-------------
>  fs/jfs/jfs_txnmgr.c   | 28 ++--------------------------
>  fs/jfs/jfs_xtree.c    | 14 +-------------
>  6 files changed, 24 insertions(+), 99 deletions(-)
> 
> diff --git a/fs/jfs/jfs_debug.c b/fs/jfs/jfs_debug.c
> index a70907606025..35a5b2a81ae0 100644
> --- a/fs/jfs/jfs_debug.c
> +++ b/fs/jfs/jfs_debug.c
> @@ -29,7 +29,6 @@
>  
>  #ifdef PROC_FS_JFS /* see jfs_debug.h */
>  
> -static struct proc_dir_entry *base;
>  #ifdef CONFIG_JFS_DEBUG
>  static int jfs_loglevel_proc_show(struct seq_file *m, void *v)
>  {
> @@ -66,43 +65,29 @@ static const struct file_operations jfs_loglevel_proc_fops = {
>  };
>  #endif
>  
> -static struct {
> -	const char	*name;
> -	const struct file_operations *proc_fops;
> -} Entries[] = {
> -#ifdef CONFIG_JFS_STATISTICS
> -	{ "lmstats",	&jfs_lmstats_proc_fops, },
> -	{ "txstats",	&jfs_txstats_proc_fops, },
> -	{ "xtstat",	&jfs_xtstat_proc_fops, },
> -	{ "mpstat",	&jfs_mpstat_proc_fops, },
> -#endif
> -#ifdef CONFIG_JFS_DEBUG
> -	{ "TxAnchor",	&jfs_txanchor_proc_fops, },
> -	{ "loglevel",	&jfs_loglevel_proc_fops }
> -#endif
> -};
> -#define NPROCENT	ARRAY_SIZE(Entries)
> -
>  void jfs_proc_init(void)
>  {
> -	int i;
> +	struct proc_dir_entry *base;
>  
> -	if (!(base = proc_mkdir("fs/jfs", NULL)))
> +	base = proc_mkdir("fs/jfs", NULL);
> +	if (!base)
>  		return;
>  
> -	for (i = 0; i < NPROCENT; i++)
> -		proc_create(Entries[i].name, 0, base, Entries[i].proc_fops);
> +#ifdef CONFIG_JFS_STATISTICS
> +	proc_create_single("lmstats", 0, base, jfs_lmstats_proc_show);
> +	proc_create_single("txstats", 0, base, jfs_txstats_proc_show);
> +	proc_create_single("xtstat", 0, base, jfs_xtstat_proc_show);
> +	proc_create_single("mpstat", 0, base, jfs_mpstat_proc_show);
> +#endif
> +#ifdef CONFIG_JFS_DEBUG
> +	proc_create_single("TxAnchor", 0, base, jfs_txanchor_proc_show);
> +	proc_create("loglevel", 0, base, &jfs_loglevel_proc_fops);
> +#endif
>  }
>  
>  void jfs_proc_clean(void)
>  {
> -	int i;
> -
> -	if (base) {
> -		for (i = 0; i < NPROCENT; i++)
> -			remove_proc_entry(Entries[i].name, base);
> -		remove_proc_entry("fs/jfs", NULL);
> -	}
> +	remove_proc_subtree("fs/jfs", NULL);
>  }
>  
>  #endif /* PROC_FS_JFS */
> diff --git a/fs/jfs/jfs_debug.h b/fs/jfs/jfs_debug.h
> index eafd1300a00b..0d9e35da8462 100644
> --- a/fs/jfs/jfs_debug.h
> +++ b/fs/jfs/jfs_debug.h
> @@ -62,7 +62,7 @@ extern void jfs_proc_clean(void);
>  
>  extern int jfsloglevel;
>  
> -extern const struct file_operations jfs_txanchor_proc_fops;
> +int jfs_txanchor_proc_show(struct seq_file *m, void *v);
>  
>  /* information message: e.g., configuration, major event */
>  #define jfs_info(fmt, arg...) do {			\
> @@ -105,10 +105,10 @@ extern const struct file_operations jfs_txanchor_proc_fops;
>   *	----------
>   */
>  #ifdef	CONFIG_JFS_STATISTICS
> -extern const struct file_operations jfs_lmstats_proc_fops;
> -extern const struct file_operations jfs_txstats_proc_fops;
> -extern const struct file_operations jfs_mpstat_proc_fops;
> -extern const struct file_operations jfs_xtstat_proc_fops;
> +int jfs_lmstats_proc_show(struct seq_file *m, void *v);
> +int jfs_txstats_proc_show(struct seq_file *m, void *v);
> +int jfs_mpstat_proc_show(struct seq_file *m, void *v);
> +int jfs_xtstat_proc_show(struct seq_file *m, void *v);
>  
>  #define	INCREMENT(x)		((x)++)
>  #define	DECREMENT(x)		((x)--)
> diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
> index 0e5d412c0b01..6b68df395892 100644
> --- a/fs/jfs/jfs_logmgr.c
> +++ b/fs/jfs/jfs_logmgr.c
> @@ -2493,7 +2493,7 @@ int lmLogFormat(struct jfs_log *log, s64 logAddress, int logSize)
>  }
>  
>  #ifdef CONFIG_JFS_STATISTICS
> -static int jfs_lmstats_proc_show(struct seq_file *m, void *v)
> +int jfs_lmstats_proc_show(struct seq_file *m, void *v)
>  {
>  	seq_printf(m,
>  		       "JFS Logmgr stats\n"
> @@ -2510,16 +2510,4 @@ static int jfs_lmstats_proc_show(struct seq_file *m, void *v)
>  		       lmStat.partial_page);
>  	return 0;
>  }
> -
> -static int jfs_lmstats_proc_open(struct inode *inode, struct file *file)
> -{
> -	return single_open(file, jfs_lmstats_proc_show, NULL);
> -}
> -
> -const struct file_operations jfs_lmstats_proc_fops = {
> -	.open		= jfs_lmstats_proc_open,
> -	.read		= seq_read,
> -	.llseek		= seq_lseek,
> -	.release	= single_release,
> -};
>  #endif /* CONFIG_JFS_STATISTICS */
> diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
> index 1a3b0cc22ad3..fa2c6824c7f2 100644
> --- a/fs/jfs/jfs_metapage.c
> +++ b/fs/jfs/jfs_metapage.c
> @@ -815,7 +815,7 @@ void __invalidate_metapages(struct inode *ip, s64 addr, int len)
>  }
>  
>  #ifdef CONFIG_JFS_STATISTICS
> -static int jfs_mpstat_proc_show(struct seq_file *m, void *v)
> +int jfs_mpstat_proc_show(struct seq_file *m, void *v)
>  {
>  	seq_printf(m,
>  		       "JFS Metapage statistics\n"
> @@ -828,16 +828,4 @@ static int jfs_mpstat_proc_show(struct seq_file *m, void *v)
>  		       mpStat.lockwait);
>  	return 0;
>  }
> -
> -static int jfs_mpstat_proc_open(struct inode *inode, struct file *file)
> -{
> -	return single_open(file, jfs_mpstat_proc_show, NULL);
> -}
> -
> -const struct file_operations jfs_mpstat_proc_fops = {
> -	.open		= jfs_mpstat_proc_open,
> -	.read		= seq_read,
> -	.llseek		= seq_lseek,
> -	.release	= single_release,
> -};
>  #endif
> diff --git a/fs/jfs/jfs_txnmgr.c b/fs/jfs/jfs_txnmgr.c
> index 4d973524c887..a5663cb621d8 100644
> --- a/fs/jfs/jfs_txnmgr.c
> +++ b/fs/jfs/jfs_txnmgr.c
> @@ -2998,7 +2998,7 @@ int jfs_sync(void *arg)
>  }
>  
>  #if defined(CONFIG_PROC_FS) && defined(CONFIG_JFS_DEBUG)
> -static int jfs_txanchor_proc_show(struct seq_file *m, void *v)
> +int jfs_txanchor_proc_show(struct seq_file *m, void *v)
>  {
>  	char *freewait;
>  	char *freelockwait;
> @@ -3032,22 +3032,10 @@ static int jfs_txanchor_proc_show(struct seq_file *m, void *v)
>  		       list_empty(&TxAnchor.unlock_queue) ? "" : "not ");
>  	return 0;
>  }
> -
> -static int jfs_txanchor_proc_open(struct inode *inode, struct file *file)
> -{
> -	return single_open(file, jfs_txanchor_proc_show, NULL);
> -}
> -
> -const struct file_operations jfs_txanchor_proc_fops = {
> -	.open		= jfs_txanchor_proc_open,
> -	.read		= seq_read,
> -	.llseek		= seq_lseek,
> -	.release	= single_release,
> -};
>  #endif
>  
>  #if defined(CONFIG_PROC_FS) && defined(CONFIG_JFS_STATISTICS)
> -static int jfs_txstats_proc_show(struct seq_file *m, void *v)
> +int jfs_txstats_proc_show(struct seq_file *m, void *v)
>  {
>  	seq_printf(m,
>  		       "JFS TxStats\n"
> @@ -3072,16 +3060,4 @@ static int jfs_txstats_proc_show(struct seq_file *m, void *v)
>  		       TxStat.txLockAlloc_freelock);
>  	return 0;
>  }
> -
> -static int jfs_txstats_proc_open(struct inode *inode, struct file *file)
> -{
> -	return single_open(file, jfs_txstats_proc_show, NULL);
> -}
> -
> -const struct file_operations jfs_txstats_proc_fops = {
> -	.open		= jfs_txstats_proc_open,
> -	.read		= seq_read,
> -	.llseek		= seq_lseek,
> -	.release	= single_release,
> -};
>  #endif
> diff --git a/fs/jfs/jfs_xtree.c b/fs/jfs/jfs_xtree.c
> index 5cde6d2fcfca..2c200b5256a6 100644
> --- a/fs/jfs/jfs_xtree.c
> +++ b/fs/jfs/jfs_xtree.c
> @@ -3874,7 +3874,7 @@ s64 xtTruncate_pmap(tid_t tid, struct inode *ip, s64 committed_size)
>  }
>  
>  #ifdef CONFIG_JFS_STATISTICS
> -static int jfs_xtstat_proc_show(struct seq_file *m, void *v)
> +int jfs_xtstat_proc_show(struct seq_file *m, void *v)
>  {
>  	seq_printf(m,
>  		       "JFS Xtree statistics\n"
> @@ -3887,16 +3887,4 @@ static int jfs_xtstat_proc_show(struct seq_file *m, void *v)
>  		       xtStat.split);
>  	return 0;
>  }
> -
> -static int jfs_xtstat_proc_open(struct inode *inode, struct file *file)
> -{
> -	return single_open(file, jfs_xtstat_proc_show, NULL);
> -}
> -
> -const struct file_operations jfs_xtstat_proc_fops = {
> -	.open		= jfs_xtstat_proc_open,
> -	.read		= seq_read,
> -	.llseek		= seq_lseek,
> -	.release	= single_release,
> -};
>  #endif
> 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox