Netdev List
* Re: [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO
From: Neal Cardwell @ 2019-09-13 21:53 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Thomas Higdon, netdev@vger.kernel.org, Jonathan Lemon, Dave Jones,
	Eric Dumazet, Dave Taht, Soheil Hassas Yeganeh
In-Reply-To: <CAK6E8=ddxo+yg2tTiZm5YEbfPkeVkeZOGwB33+Qfb4Qfj4yDJA@mail.gmail.com>

On Fri, Sep 13, 2019 at 5:29 PM Yuchung Cheng <ycheng@google.com> wrote:
> > What if the comment is shortened up to fit in 80 columns and the units
> > (bytes) are added, something like:
> >
> >         __u32   tcpi_snd_wnd;        /* peer's advertised recv window (bytes) */
> just a thought: will tcpi_peer_rcv_wnd be more self-explanatory?

Good suggestion. I'm on the fence about that one. By itself, I agree
tcpi_peer_rcv_wnd sounds much clearer. But tcpi_snd_wnd has the
virtue of matching both the kernel code (tp->snd_wnd) and RFC 793
(SND.WND). So they both have pros and cons. Maybe someone else feels
more strongly one way or the other.

neal
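
For context, userspace would read the proposed field through
getsockopt(TCP_INFO). The following is a minimal sketch, assuming the
field is merged under the name tcpi_snd_wnd as in this v4 patch and that
the installed <linux/tcp.h> already carries it. tcp_info_covers() and
read_snd_wnd() are hypothetical helper names, not existing APIs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <linux/tcp.h>    /* struct tcp_info, TCP_INFO */

/* True if the kernel filled in the bytes [offset, offset + size). */
static int tcp_info_covers(socklen_t len, size_t offset, size_t size)
{
	return (size_t)len >= offset + size;
}

/*
 * Store the peer's advertised receive window (bytes) in *wnd.
 * Returns 0 on success, -1 if getsockopt() fails, and -2 if the
 * running kernel returned a struct too short to contain the field.
 */
static int read_snd_wnd(int fd, uint32_t *wnd)
{
	struct tcp_info info;
	socklen_t len = sizeof(info);

	memset(&info, 0, sizeof(info));
	if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) < 0)
		return -1;
	/* The kernel shrinks len to what it copied; check before reading. */
	if (!tcp_info_covers(len, offsetof(struct tcp_info, tcpi_snd_wnd),
			     sizeof(info.tcpi_snd_wnd)))
		return -2;
	*wnd = info.tcpi_snd_wnd;
	return 0;
}
```

The length check matters because struct tcp_info only grows at the end:
an older kernel returns a shorter length, and the new field should then
be treated as absent rather than as zero.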


* Re: [PATCH bpf-next 11/11] samples: bpf: makefile: add sysroot support
From: Yonghong Song @ 2019-09-13 21:45 UTC (permalink / raw)
  To: Ivan Khoronzhuk, ast@kernel.org, daniel@iogearbox.net,
	davem@davemloft.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org, clang-built-linux@googlegroups.com
In-Reply-To: <20190910103830.20794-12-ivan.khoronzhuk@linaro.org>



On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
> Basically, this only enables what the previous couple of fixes added;
> it just makes sure tools/include is included after the sysroot
> headers.
> 
> export ARCH=arm
> export CROSS_COMPILE=arm-linux-gnueabihf-
> make samples/bpf/ SYSROOT="path/to/sysroot"
> 
> The sysroot contains the correct installed libs and their headers, of
> course. Useful when working with NFC or a virtual machine.
> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>   samples/bpf/Makefile   |  5 +++++
>   samples/bpf/README.rst | 10 ++++++++++
>   2 files changed, 15 insertions(+)
> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 4edc5232cfc1..68ba78d1dbbe 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -177,6 +177,11 @@ ifeq ($(ARCH), arm)
>   CLANG_EXTRA_CFLAGS := $(D_OPTIONS)
>   endif
>   
> +ifdef SYSROOT
> +ccflags-y += --sysroot=${SYSROOT}
> +PROGS_LDFLAGS := -L${SYSROOT}/usr/lib
> +endif
> +
>   ccflags-y += -I$(objtree)/usr/include
>   ccflags-y += -I$(srctree)/tools/lib/bpf/
>   ccflags-y += -I$(srctree)/tools/testing/selftests/bpf/
> diff --git a/samples/bpf/README.rst b/samples/bpf/README.rst
> index 5f27e4faca50..786d0ab98e8a 100644
> --- a/samples/bpf/README.rst
> +++ b/samples/bpf/README.rst
> @@ -74,3 +74,13 @@ samples for the cross target.
>   export ARCH=arm64
>   export CROSS_COMPILE="aarch64-linux-gnu-"
>   make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
> +
> +If need to use environment of target board (headers and libs), the SYSROOT
> +also can be set, pointing on FS of target board:
> +
> +export ARCH=arm64
> +export CROSS_COMPILE="aarch64-linux-gnu-"
> +make samples/bpf/ SYSROOT=~/some_sdk/linux-devkit/sysroots/aarch64-linux-gnu
> +
> +Setting LLC and CLANG is not necessarily if it's installed on HOST and have
> +in its targets appropriate arch triple (usually it has several arches).

You have a very good description of how to build and test in the cover
letter. Could you include those instructions here as well? This will
help keep a record so people can try/test later if needed.


* Re: [PATCH bpf-next 10/11] libbpf: makefile: add C/CXX/LDFLAGS to libbpf.so and test_libpf targets
From: Yonghong Song @ 2019-09-13 21:43 UTC (permalink / raw)
  To: Ivan Khoronzhuk, ast@kernel.org, daniel@iogearbox.net,
	davem@davemloft.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org, clang-built-linux@googlegroups.com
In-Reply-To: <20190910103830.20794-11-ivan.khoronzhuk@linaro.org>



On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
> For LDFLAGS and EXTRA_CC/CXX flags there is currently no way to pass
> them correctly to the build command, for instance when --sysroot is
> used or when external libraries such as -lelf, which can be absent in
> the toolchain, are needed. This is used for cross-compiling
> samples/bpf, allowing the elf lib to be taken from the sysroot.
> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>   samples/bpf/Makefile   |  8 +++++++-
>   tools/lib/bpf/Makefile | 11 ++++++++---
>   2 files changed, 15 insertions(+), 4 deletions(-)

Could you separate this patch into two?
One for libbpf and another for samples.

The subject 'libbpf: ...' is not entirely accurate.

> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 79c9aa41832e..4edc5232cfc1 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -186,6 +186,10 @@ ccflags-y += -I$(srctree)/tools/perf
>   ccflags-y += $(D_OPTIONS)
>   ccflags-y += -Wall
>   ccflags-y += -fomit-frame-pointer
> +
> +EXTRA_CXXFLAGS := $(ccflags-y)
> +
> +# options not valid for C++
>   ccflags-y += -Wmissing-prototypes
>   ccflags-y += -Wstrict-prototypes
>   
> @@ -252,7 +256,9 @@ clean:
>   
>   $(LIBBPF): FORCE
>   # Fix up variables inherited from Kbuild that tools/ build system won't like
> -	$(MAKE) -C $(dir $@) RM='rm -rf' LDFLAGS= srctree=$(BPF_SAMPLES_PATH)/../../ O=
> +	$(MAKE) -C $(dir $@) RM='rm -rf' EXTRA_CFLAGS="$(PROGS_CFLAGS)" \
> +		EXTRA_CXXFLAGS="$(EXTRA_CXXFLAGS)" LDFLAGS=$(PROGS_LDFLAGS) \
> +		srctree=$(BPF_SAMPLES_PATH)/../../ O=
>   
>   $(obj)/syscall_nrs.h:	$(obj)/syscall_nrs.s FORCE
>   	$(call filechk,offsets,__SYSCALL_NRS_H__)
> diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
> index c6f94cffe06e..bccfa556ef4e 100644
> --- a/tools/lib/bpf/Makefile
> +++ b/tools/lib/bpf/Makefile
> @@ -94,6 +94,10 @@ else
>     CFLAGS := -g -Wall
>   endif
>   
> +ifdef EXTRA_CXXFLAGS
> +  CXXFLAGS := $(EXTRA_CXXFLAGS)
> +endif
> +
>   ifeq ($(feature-libelf-mmap), 1)
>     override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT
>   endif
> @@ -176,8 +180,9 @@ $(BPF_IN): force elfdep bpfdep
>   $(OUTPUT)libbpf.so: $(OUTPUT)libbpf.so.$(LIBBPF_VERSION)
>   
>   $(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $(BPF_IN)
> -	$(QUIET_LINK)$(CC) --shared -Wl,-soname,libbpf.so.$(LIBBPF_MAJOR_VERSION) \
> -				    -Wl,--version-script=$(VERSION_SCRIPT) $^ -lelf -o $@
> +	$(QUIET_LINK)$(CC) $(LDFLAGS) \
> +		--shared -Wl,-soname,libbpf.so.$(LIBBPF_MAJOR_VERSION) \
> +		-Wl,--version-script=$(VERSION_SCRIPT) $^ -lelf -o $@
>   	@ln -sf $(@F) $(OUTPUT)libbpf.so
>   	@ln -sf $(@F) $(OUTPUT)libbpf.so.$(LIBBPF_MAJOR_VERSION)
>   
> @@ -185,7 +190,7 @@ $(OUTPUT)libbpf.a: $(BPF_IN)
>   	$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
>   
>   $(OUTPUT)test_libbpf: test_libbpf.cpp $(OUTPUT)libbpf.a
> -	$(QUIET_LINK)$(CXX) $(INCLUDES) $^ -lelf -o $@
> +	$(QUIET_LINK)$(CXX) $(CXXFLAGS) $(LDFLAGS) $(INCLUDES) $^ -lelf -o $@
>   
>   $(OUTPUT)libbpf.pc:
>   	$(QUIET_GEN)sed -e "s|@PREFIX@|$(prefix)|" \
> 


* Re: [PATCH bpf-next 08/11] samples: bpf: makefile: base progs build on makefile.progs
From: Yonghong Song @ 2019-09-13 21:41 UTC (permalink / raw)
  To: Ivan Khoronzhuk, ast@kernel.org, daniel@iogearbox.net,
	davem@davemloft.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org, clang-built-linux@googlegroups.com
In-Reply-To: <20190910103830.20794-9-ivan.khoronzhuk@linaro.org>



On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
> The main reason for this is that HOSTCC and CC have different aims.
> It was tested for arm cross compilation, based on the linaro toolchain,
> but should work for others.
> 
> In order to separate cross compilation (CC) from the host build
> (HOSTCC), let's base the bpf samples on Makefile.progs. This allows
> cross-compiling samples/bpf progs with CC while auxiliary tools running
> on the host are built with HOSTCC.

I got a compilation failure with the following error:

$ make samples/bpf/
   ...
   LD  samples/bpf/hbm
   CC      samples/bpf/syscall_nrs.s
gcc: error: -pg and -fomit-frame-pointer are incompatible
make[2]: *** [samples/bpf/syscall_nrs.s] Error 1
make[1]: *** [samples/bpf/] Error 2
make: *** [sub-make] Error 2

Could you take a look?

> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>   samples/bpf/Makefile | 138 +++++++++++++++++++++++--------------------
>   1 file changed, 73 insertions(+), 65 deletions(-)
> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index f5dbf3d0c5f3..625a71f2e9d2 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -4,55 +4,53 @@ BPF_SAMPLES_PATH ?= $(abspath $(srctree)/$(src))
>   TOOLS_PATH := $(BPF_SAMPLES_PATH)/../../tools
>   
>   # List of programs to build
> -hostprogs-y := test_lru_dist
> -hostprogs-y += sock_example
> -hostprogs-y += fds_example
> -hostprogs-y += sockex1
> -hostprogs-y += sockex2
> -hostprogs-y += sockex3
> -hostprogs-y += tracex1
> -hostprogs-y += tracex2
> -hostprogs-y += tracex3
> -hostprogs-y += tracex4
> -hostprogs-y += tracex5
> -hostprogs-y += tracex6
> -hostprogs-y += tracex7
> -hostprogs-y += test_probe_write_user
> -hostprogs-y += trace_output
> -hostprogs-y += lathist
> -hostprogs-y += offwaketime
> -hostprogs-y += spintest
> -hostprogs-y += map_perf_test
> -hostprogs-y += test_overhead
> -hostprogs-y += test_cgrp2_array_pin
> -hostprogs-y += test_cgrp2_attach
> -hostprogs-y += test_cgrp2_sock
> -hostprogs-y += test_cgrp2_sock2
> -hostprogs-y += xdp1
> -hostprogs-y += xdp2
> -hostprogs-y += xdp_router_ipv4
> -hostprogs-y += test_current_task_under_cgroup
> -hostprogs-y += trace_event
> -hostprogs-y += sampleip
> -hostprogs-y += tc_l2_redirect
> -hostprogs-y += lwt_len_hist
> -hostprogs-y += xdp_tx_iptunnel
> -hostprogs-y += test_map_in_map
> -hostprogs-y += per_socket_stats_example
> -hostprogs-y += xdp_redirect
> -hostprogs-y += xdp_redirect_map
> -hostprogs-y += xdp_redirect_cpu
> -hostprogs-y += xdp_monitor
> -hostprogs-y += xdp_rxq_info
> -hostprogs-y += syscall_tp
> -hostprogs-y += cpustat
> -hostprogs-y += xdp_adjust_tail
> -hostprogs-y += xdpsock
> -hostprogs-y += xdp_fwd
> -hostprogs-y += task_fd_query
> -hostprogs-y += xdp_sample_pkts
> -hostprogs-y += ibumad
> -hostprogs-y += hbm
> +progs-y := test_lru_dist
> +progs-y += sock_example
> +progs-y += fds_example
> +progs-y += sockex1
> +progs-y += sockex2
> +progs-y += sockex3
> +progs-y += tracex1
> +progs-y += tracex2
> +progs-y += tracex3
> +progs-y += tracex4
> +progs-y += tracex5
> +progs-y += tracex6
> +progs-y += tracex7
> +progs-y += test_probe_write_user
> +progs-y += trace_output
> +progs-y += lathist
> +progs-y += offwaketime
> +progs-y += spintest
> +progs-y += map_perf_test
> +progs-y += test_overhead
> +progs-y += test_cgrp2_array_pin
> +progs-y += test_cgrp2_attach
> +progs-y += test_cgrp2_sock
> +progs-y += test_cgrp2_sock2
> +progs-y += xdp1
> +progs-y += xdp2
> +progs-y += xdp_router_ipv4
> +progs-y += test_current_task_under_cgroup
> +progs-y += trace_event
> +progs-y += sampleip
> +progs-y += tc_l2_redirect
> +progs-y += lwt_len_hist
> +progs-y += xdp_tx_iptunnel
> +progs-y += test_map_in_map
> +progs-y += xdp_redirect_map
> +progs-y += xdp_redirect_cpu
> +progs-y += xdp_monitor
> +progs-y += xdp_rxq_info
> +progs-y += syscall_tp
> +progs-y += cpustat
> +progs-y += xdp_adjust_tail
> +progs-y += xdpsock
> +progs-y += xdp_fwd
> +progs-y += task_fd_query
> +progs-y += xdp_sample_pkts
> +progs-y += ibumad
> +progs-y += hbm
>   
>   # Libbpf dependencies
>   LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
> @@ -111,7 +109,7 @@ ibumad-objs := bpf_load.o ibumad_user.o $(TRACE_HELPERS)
>   hbm-objs := bpf_load.o hbm.o $(CGROUP_HELPERS)
>   
>   # Tell kbuild to always build the programs
> -always := $(hostprogs-y)
> +always := $(progs-y)
>   always += sockex1_kern.o
>   always += sockex2_kern.o
>   always += sockex3_kern.o
> @@ -170,21 +168,6 @@ always += ibumad_kern.o
>   always += hbm_out_kern.o
>   always += hbm_edt_kern.o
>   
> -KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
> -KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/bpf/
> -KBUILD_HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/
> -KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/ -I$(srctree)/tools/include
> -KBUILD_HOSTCFLAGS += -I$(srctree)/tools/perf
> -
> -HOSTCFLAGS_bpf_load.o += -Wno-unused-variable
> -
> -KBUILD_HOSTLDLIBS		+= $(LIBBPF) -lelf
> -HOSTLDLIBS_tracex4		+= -lrt
> -HOSTLDLIBS_trace_output	+= -lrt
> -HOSTLDLIBS_map_perf_test	+= -lrt
> -HOSTLDLIBS_test_overhead	+= -lrt
> -HOSTLDLIBS_xdpsock		+= -pthread
> -
>   # Strip all expet -D options needed to handle linux headers
>   # for arm it's __LINUX_ARM_ARCH__ and potentially others fork vars
>   D_OPTIONS = $(shell echo "$(KBUILD_CFLAGS) " | sed 's/[[:blank:]]/\n/g' | \
> @@ -194,6 +177,29 @@ ifeq ($(ARCH), arm)
>   CLANG_EXTRA_CFLAGS := $(D_OPTIONS)
>   endif
>   
> +ccflags-y += -I$(objtree)/usr/include
> +ccflags-y += -I$(srctree)/tools/lib/bpf/
> +ccflags-y += -I$(srctree)/tools/testing/selftests/bpf/
> +ccflags-y += -I$(srctree)/tools/lib/
> +ccflags-y += -I$(srctree)/tools/include
> +ccflags-y += -I$(srctree)/tools/perf
> +ccflags-y += $(D_OPTIONS)
> +ccflags-y += -Wall
> +ccflags-y += -fomit-frame-pointer
> +ccflags-y += -Wmissing-prototypes
> +ccflags-y += -Wstrict-prototypes
> +
> +PROGS_CFLAGS := $(ccflags-y)
> +
> +PROGCFLAGS_bpf_load.o += -Wno-unused-variable
> +
> +PROGS_LDLIBS			:= $(LIBBPF) -lelf
> +PROGLDLIBS_tracex4		+= -lrt
> +PROGLDLIBS_trace_output		+= -lrt
> +PROGLDLIBS_map_perf_test	+= -lrt
> +PROGLDLIBS_test_overhead	+= -lrt
> +PROGLDLIBS_xdpsock		+= -pthread
> +
>   # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
>   #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
>   LLC ?= llc
> @@ -284,6 +290,8 @@ $(obj)/hbm_out_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
>   $(obj)/hbm.o: $(src)/hbm.h
>   $(obj)/hbm_edt_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
>   
> +-include $(BPF_SAMPLES_PATH)/Makefile.prog
> +
>   # asm/sysreg.h - inline assembly used by it is incompatible with llvm.
>   # But, there is no easy way to fix it, so just exclude it since it is
>   # useless for BPF samples.
> 


* Re: [PATCH bpf-next 07/11] samples: bpf: add makefile.prog for separate CC build
From: Yonghong Song @ 2019-09-13 21:33 UTC (permalink / raw)
  To: Ivan Khoronzhuk, ast@kernel.org, daniel@iogearbox.net,
	davem@davemloft.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org, clang-built-linux@googlegroups.com
In-Reply-To: <20190910103830.20794-8-ivan.khoronzhuk@linaro.org>



On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
> Only Makefile.prog is added here; it will be used in
> samples/bpf/Makefile later in order to switch cross-compiling from
> HOSTCC to CC.
> 
> HOSTCC is supposed to build binaries and tools that afterwards run on
> the host in order to simplify the build, such as "fixdep". When cross
> compiling, "fixdep" is executed on the host while the rest of the
> samples should run on the target arch. In order to build binaries for
> the target arch with CC and tools running on the host with HOSTCC,
> let's add Makefile.prog for simplicity, with definitions and routines
> similar to the ones used in scripts/Makefile.host. This allows adding
> cross-compilation to samples/bpf later with minimal changes.

So this is really Makefile.host or Makefile.user, right?
In BPF, 'prog' can refer to a user prog or a bpf prog.
To avoid ambiguity, maybe Makefile.host?

> 
> Makefile.prog contains only stuff needed for samples/bpf, potentially
> can be reused and extended for other prog sets later and now needed

What do you mean by 'extended for other prog sets'? I am wondering whether
we could just include 'scripts/Makefile.host'. How hard would that be?

> only for unblocking tricky samples/bpf cross compilation.
> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>   samples/bpf/Makefile.prog | 77 +++++++++++++++++++++++++++++++++++++++
>   1 file changed, 77 insertions(+)
>   create mode 100644 samples/bpf/Makefile.prog
> 
> diff --git a/samples/bpf/Makefile.prog b/samples/bpf/Makefile.prog
> new file mode 100644
> index 000000000000..3781999b9193
> --- /dev/null
> +++ b/samples/bpf/Makefile.prog
> @@ -0,0 +1,77 @@
> +# SPDX-License-Identifier: GPL-2.0
> +# ==========================================================================
> +# Building binaries on the host system
> +# Binaries are not used during the compilation of the kernel, and intendent
> +# to be build for target board, target board can be host ofc. Added to build
> +# binaries to run not on host system.
> +#
> +# Only C is supported, but can be extended for C++.

The above comment is not needed.
samples/bpf/ only has C now. I am wondering whether your scripts below
can be simplified, e.g., by removing cxxobjs.

> +#
> +# Sample syntax (see Documentation/kbuild/makefiles.rst for reference)
> +# progs-y := xsk_example
> +# Will compile xdpsock_example.c and create an executable named xsk_example
> +#
> +# progs-y    := xdpsock
> +# xdpsock-objs := xdpsock_1.o xdpsock_2.o
> +# Will compile xdpsock_1.c and xdpsock_2.c, and then link the executable
> +# xdpsock, based on xdpsock_1.o and xdpsock_2.o
> +#
> +# Inherited from scripts/Makefile.host
> +#
> +__progs := $(sort $(progs-y))
> +
> +# C code
> +# Executables compiled from a single .c file
> +prog-csingle	:= $(foreach m,$(__progs), \
> +			$(if $($(m)-objs)$($(m)-cxxobjs),,$(m)))
> +
> +# C executables linked based on several .o files
> +prog-cmulti	:= $(foreach m,$(__progs),\
> +		   $(if $($(m)-cxxobjs),,$(if $($(m)-objs),$(m))))
> +
> +# Object (.o) files compiled from .c files
> +prog-cobjs	:= $(sort $(foreach m,$(__progs),$($(m)-objs)))
> +
> +prog-csingle	:= $(addprefix $(obj)/,$(prog-csingle))
> +prog-cmulti	:= $(addprefix $(obj)/,$(prog-cmulti))
> +prog-cobjs	:= $(addprefix $(obj)/,$(prog-cobjs))
> +
> +#####
> +# Handle options to gcc. Support building with separate output directory
> +
> +_progc_flags   = $(PROGS_CFLAGS) \
> +                 $(PROGCFLAGS_$(basetarget).o)
> +
> +# $(objtree)/$(obj) for including generated headers from checkin source files
> +ifeq ($(KBUILD_EXTMOD),)
> +ifdef building_out_of_srctree
> +_progc_flags   += -I $(objtree)/$(obj)
> +endif
> +endif
> +
> +progc_flags    = -Wp,-MD,$(depfile) $(_progc_flags)
> +
> +# Create executable from a single .c file
> +# prog-csingle -> Executable
> +quiet_cmd_prog-csingle 	= CC  $@
> +      cmd_prog-csingle	= $(CC) $(progc_flags) $(PROGS_LDFLAGS) -o $@ $< \
> +		$(PROGS_LDLIBS) $(PROGLDLIBS_$(@F))
> +$(prog-csingle): $(obj)/%: $(src)/%.c FORCE
> +	$(call if_changed_dep,prog-csingle)
> +
> +# Link an executable based on list of .o files, all plain c
> +# prog-cmulti -> executable
> +quiet_cmd_prog-cmulti	= LD  $@
> +      cmd_prog-cmulti	= $(CC) $(progc_flags) $(PROGS_LDFLAGS) -o $@ \
> +			  $(addprefix $(obj)/,$($(@F)-objs)) \
> +			  $(PROGS_LDLIBS) $(PROGLDLIBS_$(@F))
> +$(prog-cmulti): $(prog-cobjs) FORCE
> +	$(call if_changed,prog-cmulti)
> +$(call multi_depend, $(prog-cmulti), , -objs)
> +
> +# Create .o file from a single .c file
> +# prog-cobjs -> .o
> +quiet_cmd_prog-cobjs	= CC  $@
> +      cmd_prog-cobjs	= $(CC) $(progc_flags) -c -o $@ $<
> +$(prog-cobjs): $(obj)/%.o: $(src)/%.c FORCE
> +	$(call if_changed_dep,prog-cobjs)
> 


* Re: [PATCH v3 0/5] Introduce variable length mdev alias
From: Alex Williamson @ 2019-09-13 21:32 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jiri Pirko, kwankhede@nvidia.com, cohuck@redhat.com,
	davem@davemloft.net, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB48667E374853D485788D8159D1B10@AM0PR05MB4866.eurprd05.prod.outlook.com>

On Wed, 11 Sep 2019 16:38:49 +0000
Parav Pandit <parav@mellanox.com> wrote:

> > -----Original Message-----
> > From: linux-kernel-owner@vger.kernel.org
> > <linux-kernel-owner@vger.kernel.org> On Behalf Of Parav Pandit
> > Sent: Wednesday, September 11, 2019 10:31 AM
> > To: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Jiri Pirko <jiri@mellanox.com>; kwankhede@nvidia.com;
> > cohuck@redhat.com; davem@davemloft.net; kvm@vger.kernel.org;
> > linux-kernel@vger.kernel.org; netdev@vger.kernel.org
> > Subject: RE: [PATCH v3 0/5] Introduce variable length mdev alias
> > 
> > Hi Alex,
> >   
> > > -----Original Message-----
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Wednesday, September 11, 2019 8:56 AM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Jiri Pirko <jiri@mellanox.com>; kwankhede@nvidia.com;
> > > cohuck@redhat.com; davem@davemloft.net; kvm@vger.kernel.org;
> > > linux-kernel@vger.kernel.org; netdev@vger.kernel.org
> > > Subject: Re: [PATCH v3 0/5] Introduce variable length mdev alias
> > >
> > > On Mon, 9 Sep 2019 20:42:32 +0000
> > > Parav Pandit <parav@mellanox.com> wrote:
> > >  
> > > > Hi Alex,
> > > >  
> > > > > -----Original Message-----
> > > > > From: Parav Pandit <parav@mellanox.com>
> > > > > Sent: Sunday, September 1, 2019 11:25 PM
> > > > > To: alex.williamson@redhat.com; Jiri Pirko <jiri@mellanox.com>;
> > > > > kwankhede@nvidia.com; cohuck@redhat.com; davem@davemloft.net
> > > > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > netdev@vger.kernel.org; Parav Pandit <parav@mellanox.com>
> > > > > Subject: [PATCH v3 0/5] Introduce variable length mdev alias
> > > > >
> > > > > To have consistent naming for the netdevice of a mdev and to have
> > > > > consistent naming of the devlink port [1] of a mdev, which is
> > > > > formed using phys_port_name of the devlink port, current UUID is
> > > > > not usable because UUID is too long.
> > > > >
> > > > > A UUID is 36 characters long in string format and 128 bits in
> > > > > binary. Neither format fits within the 15-character limit of a
> > > > > netdev name.
> > > > >
> > > > > It is desirable to have consistent mdev device naming based on the
> > > > > UUID, so that widely used user space frameworks such as ovs [2]
> > > > > can make use of mdev representors in a similar way to PCIe SR-IOV
> > > > > VF and PF representors.
> > > > >
> > > > > Hence,
> > > > > (a) An mdev alias is created, derived using sha1 from the mdev
> > > > > name.
> > > > > (b) Vendor driver describes how long an alias should be for the
> > > > > child mdev created for a given parent.
> > > > > (c) Mdev aliases are unique at system level.
> > > > > (d) An alias is created optionally, whenever the parent requests it.
> > > > > This ensures that non networking mdev parents can function without
> > > > > alias creation overhead.
> > > > >
> > > > > This design is discussed at [3].
> > > > >
> > > > > An example systemd/udev extension will have,
> > > > >
> > > > > 1. netdev name created using mdev alias available in sysfs.
> > > > >
> > > > > mdev UUID=83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
> > > > > mdev 12 character alias=cd5b146a80a5
> > > > >
> > > > > netdev name of this mdev = enmcd5b146a80a5
> > > > > Here en = Ethernet link
> > > > > m = mediated device
> > > > >
> > > > > 2. devlink port phys_port_name created using mdev alias.
> > > > > devlink phys_port_name=pcd5b146a80a5
> > > > >
> > > > > This patchset enables mdev core to maintain unique alias for a mdev.
> > > > >
> > > > > Patch-1 Introduces mdev alias using sha1.
> > > > > Patch-2 Ensures that mdev alias is unique in a system.
> > > > > Patch-3 Exposes mdev alias in a sysfs hierarchy, updates
> > > > > Documentation
> > > > > Patch-4 Introduces mdev_alias() API.
> > > > > Patch-5 Extends mtty driver to optionally provide alias generation.
> > > > > This also enables to test UUID based sha1 collision and trigger
> > > > > error handling for duplicate sha1 results.
> > > > >
> > > > > [1] http://man7.org/linux/man-pages/man8/devlink-port.8.html
> > > > > [2] https://docs.openstack.org/os-vif/latest/user/plugins/ovs.html
> > > > > [3] https://patchwork.kernel.org/cover/11084231/
> > > > >
> > > > > ---
> > > > > Changelog:
> > > > > v2->v3:
> > > > >  - Addressed comment from Yunsheng Lin
> > > > >  - Changed strcmp() ==0 to !strcmp()
> > > > > >  - Addressed comment from Cornelia Huck
> > > > > >  - Merged sysfs Documentation patch with sysfs patch
> > > > >  - Added more description for alias return value  
> > > >
> > > > Did you get a chance to review this updated series?
> > > > I addressed Cornelia's and your comments.
> > > > I do not think allocating alias memory twice, once for comparison
> > > > and once for storing, is a good idea, nor is moving the alias
> > > > generation logic inside the mdev_list_lock(). So I didn't address
> > > > that suggestion of Cornelia's.
> > >
> > > Sorry, I'm at LPC this week.  I agree, I don't think the double
> > > allocation is necessary, I thought the comment was sufficient to
> > > clarify null'ing the variable.  It's awkward, but seems correct.
> > >
> > > I'm not sure what we do with this patch series though, has the real
> > > consumer of this even been proposed?    
> 
> Jiri already acked using mdev_alias() to generate phys_port_name several
> days back in the discussion we had in [1].
> After concluding in the thread [1], I proceeded with mdev_alias().
> mlx5_core patches are not yet present on the netdev mailing list, but we
> all agree to use mdev_alias() in devlink phys_port_name
> generation. So we have collective agreement on how to proceed
> forward. I probably wasn't clear enough about it in my previous email
> reply, so I am adding the link here.
> 
> [1] https://patchwork.kernel.org/cover/11084231/#22838955

Jiri may have agreed to the concept, but without patches on the list
proving an end to end solution, I think it's too early for us to commit
to this by preemptively adding it to our API.  "Acked" and "collective
agreement" seem like they overstate something that seems not to have
seen the light of day yet.  Instead I'll say, it looks reasonable, come
back when the real consumer has actually been proposed upstream and has
more buy-in from the community and we'll see if it still looks like the
right approach from an mdev perspective then.  Thanks,

Alex
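
For reference, the naming arithmetic in the quoted cover letter can be
checked with a short sketch. It assumes the "enm" prefix and the
15-character netdev name limit stated there; mdev_netdev_name() and
NETDEV_NAME_MAX are hypothetical names used only for illustration and
are not part of the proposed mdev API.

```c
#include <stddef.h>
#include <stdio.h>

/* 15-character netdev name limit quoted in the cover letter. */
#define NETDEV_NAME_MAX 15

/*
 * Hypothetical helper: build "enm<alias>" ("en" = Ethernet link,
 * "m" = mediated device) and reject results that exceed the limit.
 * Returns 0 on success, -1 if the name does not fit.
 */
static int mdev_netdev_name(const char *alias, char *out, size_t outsz)
{
	int n = snprintf(out, outsz, "enm%s", alias);

	if (n < 0 || (size_t)n >= outsz || n > NETDEV_NAME_MAX)
		return -1;
	return 0;
}
```

With the 12-character alias from the cover letter (cd5b146a80a5) the
result is exactly 15 characters, while the 36-character UUID string is
rejected.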


* Re: [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO
From: Yuchung Cheng @ 2019-09-13 21:28 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Thomas Higdon, netdev@vger.kernel.org, Jonathan Lemon, Dave Jones,
	Eric Dumazet, Dave Taht, Soheil Hassas Yeganeh
In-Reply-To: <CADVnQymKS6-jztAbLu_QYWiPYMqoTf5ODzSg3UPJxH+vBt=bmw@mail.gmail.com>

On Fri, Sep 13, 2019 at 2:02 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Fri, Sep 13, 2019 at 3:36 PM Thomas Higdon <tph@fb.com> wrote:
> >
> > Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
> > performance problems --
> > > (1) Usually when we're diagnosing TCP performance problems, we do so
> > > from the sender, since the sender makes most of the
> > > performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
> > > From the sender-side the thing that would be most useful is to see
> > > tp->snd_wnd, the receive window that the receiver has advertised to
> > > the sender.
> >
> > This serves the purpose of adding an additional __u32 to avoid the
> > would-be hole caused by the addition of the tcpi_rcv_ooopack field.
> >
> > Signed-off-by: Thomas Higdon <tph@fb.com>
> > ---
> > changes from v3:
> >  - changed from rcv_wnd to snd_wnd
> >
> >  include/uapi/linux/tcp.h | 1 +
> >  net/ipv4/tcp.c           | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> > index 20237987ccc8..240654f22d98 100644
> > --- a/include/uapi/linux/tcp.h
> > +++ b/include/uapi/linux/tcp.h
> > @@ -272,6 +272,7 @@ struct tcp_info {
> >         __u32   tcpi_reord_seen;     /* reordering events seen */
> >
> >         __u32   tcpi_rcv_ooopack;    /* Out-of-order packets received */
> > +       __u32   tcpi_snd_wnd;        /* Remote peer's advertised recv window size */
> >  };
>
> Thanks for adding this!
>
> My run of ./scripts/checkpatch.pl is showing a warning on this line:
>
> WARNING: line over 80 characters
> #19: FILE: include/uapi/linux/tcp.h:273:
> +       __u32   tcpi_snd_wnd;        /* Remote peer's advertised recv window size */
>
> What if the comment is shortened up to fit in 80 columns and the units
> (bytes) are added, something like:
>
>         __u32   tcpi_snd_wnd;        /* peer's advertised recv window (bytes) */
just a thought: will tcpi_peer_rcv_wnd be more self-explanatory?
>
> neal


* [PATCH v4.14-stable 2/2] tcp: Don't dequeue SYN/FIN-segments from write-queue
From: Christoph Paasch @ 2019-09-13 20:08 UTC (permalink / raw)
  To: stable, netdev, gregkh, Sasha Levin
  Cc: David Miller, Eric Dumazet, Jason Baron, Vladimir Rutsky,
	Soheil Hassas Yeganeh, Neal Cardwell
In-Reply-To: <20190913200819.32686-1-cpaasch@apple.com>

If a SYN/FIN-segment is on the write-queue, skb->len is 0, but the
segment actually has been transmitted. end_seq and seq of the tcp_skb_cb
in that case will indicate this difference.

We should not remove such segments from the write-queue as we might be
in SYN_SENT-state and a retransmission-timer is running. When that one
fires, packets_out will be 1, but the write-queue would be empty,
resulting in:

[   61.280214] ------------[ cut here ]------------
[   61.281307] WARNING: CPU: 0 PID: 0 at net/ipv4/tcp_timer.c:429 tcp_retransmit_timer+0x18f9/0x2660
[   61.283498] Modules linked in:
[   61.284084] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.142 #58
[   61.285214] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[   61.286644] task: ffffffff8401e1c0 task.stack: ffffffff84000000
[   61.287758] RIP: 0010:tcp_retransmit_timer+0x18f9/0x2660
[   61.288715] RSP: 0018:ffff88806ce07cb8 EFLAGS: 00010206
[   61.289669] RAX: ffffffff8401e1c0 RBX: ffff88805c998b00 RCX: 0000000000000006
[   61.290968] RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffff88805c9994d8
[   61.292314] RBP: ffff88805c99919a R08: ffff88807fff901c R09: ffff88807fff9008
[   61.293547] R10: ffff88807fff9017 R11: ffff88807fff9010 R12: ffff88805c998b30
[   61.294834] R13: ffffffff844b9380 R14: 0000000000000000 R15: ffff88805c99930c
[   61.296086] FS:  0000000000000000(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
[   61.297523] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   61.298646] CR2: 00007f721da50ff8 CR3: 0000000004014002 CR4: 00000000001606f0
[   61.299944] Call Trace:
[   61.300403]  <IRQ>
[   61.300806]  ? kvm_sched_clock_read+0x21/0x30
[   61.301689]  ? sched_clock+0x5/0x10
[   61.302433]  ? sched_clock_cpu+0x18/0x170
[   61.303173]  tcp_write_timer_handler+0x2c1/0x7a0
[   61.304038]  tcp_write_timer+0x13e/0x160
[   61.304794]  call_timer_fn+0x14a/0x5f0
[   61.305480]  ? tcp_write_timer_handler+0x7a0/0x7a0
[   61.306364]  ? __next_timer_interrupt+0x140/0x140
[   61.307229]  ? _raw_spin_unlock_irq+0x24/0x40
[   61.308033]  ? tcp_write_timer_handler+0x7a0/0x7a0
[   61.308887]  ? tcp_write_timer_handler+0x7a0/0x7a0
[   61.309760]  run_timer_softirq+0xc41/0x1080
[   61.310539]  ? trigger_dyntick_cpu.isra.33+0x180/0x180
[   61.311506]  ? ktime_get+0x13f/0x1c0
[   61.312232]  ? clockevents_program_event+0x10d/0x2f0
[   61.313158]  __do_softirq+0x20b/0x96b
[   61.313889]  irq_exit+0x1a7/0x1e0
[   61.314513]  smp_apic_timer_interrupt+0xfc/0x4d0
[   61.315386]  apic_timer_interrupt+0x8f/0xa0
[   61.316129]  </IRQ>

Followed by a panic.

So, before removing an skb with skb->len == 0, let's make sure that the
skb is really empty by checking the end_seq and seq.

This patch needs to be backported only to 4.14 and older (among those
that applied the backport of fdfc5c8594c2).

Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Vladimir Rutsky <rutsky@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
---
 net/ipv4/tcp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index efe767e20d01..c1f59a53f68f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -922,7 +922,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
  */
 static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
 {
-	if (skb && !skb->len) {
+	if (skb && !skb->len &&
+	    TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq) {
 		tcp_unlink_write_queue(skb, sk);
 		tcp_check_send_head(sk, skb);
 		sk_wmem_free_skb(sk, skb);
-- 
2.21.0
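
For readers following along, the extra condition in the hunk above can be
modelled outside the kernel. This is only a sketch: struct model_skb and
model_skb_is_removable() are hypothetical stand-ins for the three fields the
check reads, not kernel types or helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields the check reads; not struct sk_buff. */
struct model_skb {
	unsigned int len;	/* plays the role of skb->len */
	uint32_t seq;		/* plays the role of TCP_SKB_CB(skb)->seq */
	uint32_t end_seq;	/* plays the role of TCP_SKB_CB(skb)->end_seq */
};

/*
 * An skb is treated as removable only if it carries no payload AND
 * occupies no sequence space, mirroring the two-part condition in the patch.
 */
static bool model_skb_is_removable(const struct model_skb *skb)
{
	return skb && skb->len == 0 && skb->end_seq == skb->seq;
}
```

With this model, an entry with len == 0 but end_seq != seq is kept on the
queue, which is the case the patch is meant to protect.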


^ permalink raw reply related

* Re: [PATCH bpf-next 02/11] samples: bpf: makefile: fix cookie_uid_helper_example obj build
From: Ivan Khoronzhuk @ 2019-09-13 21:25 UTC (permalink / raw)
  To: Yonghong Song
  Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
	jakub.kicinski@netronome.com, hawk@kernel.org,
	john.fastabend@gmail.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	clang-built-linux@googlegroups.com
In-Reply-To: <7f556c1c-abee-41a9-af83-1d72fc33af4b@fb.com>

On Fri, Sep 13, 2019 at 08:48:37PM +0000, Yonghong Song wrote:
>
>
>On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
>> Don't list userspace "cookie_uid_helper_example" object in list for
>> bpf objects.
>>
>> per_socket_stats_example-opjs is used to list additional dependencies
>
>s/opjs/objs
>
>> for user space binary from hostprogs-y list. Kbuild system creates
>> rules for objects listed this way anyway and no need to worry about
>> this. Despite on it, the samples bpf uses logic that hostporgs-y are
>> build for userspace with includes needed for this, but "always"
>> target, if it's not in hostprog-y list, uses CLANG-bpf rule and is
>> intended to create bpf obj but not arch obj and uses only kernel
>> includes for that. So correct it, as it breaks cross-compiling at
>> least.
>
>The above description is a little tricky to understand.
>Maybe something like:
>    'always' target is for bpf programs.
>    'cookie_uid_helper_example.o' is a user space ELF file, and
>    covered by rule `per_socket_stats_example`.
>    Let us remove `always += cookie_uid_helper_example.o`,
>    which avoids breaking cross compilation due to
>    mismatched includes.

Yes, looks better, thanks.

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply

* Re: [PATCH bpf-next 05/11] samples: bpf: makefile: use D vars from KBUILD_CFLAGS to handle headers
From: Ivan Khoronzhuk @ 2019-09-13 21:24 UTC (permalink / raw)
  To: Yonghong Song
  Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
	jakub.kicinski@netronome.com, hawk@kernel.org,
	john.fastabend@gmail.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	clang-built-linux@googlegroups.com
In-Reply-To: <97ca4228-145a-2449-b4ba-8e79380a54f4@fb.com>

On Fri, Sep 13, 2019 at 09:12:01PM +0000, Yonghong Song wrote:
>
>
>On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
>> The kernel headers are reused from samples bpf, and autoconf.h is not
>> enough to reflect complete arch configuration for clang. But CLANG-bpf
>> cmds are sensitive for assembler part taken from linux headers and -D
>> vars, usually used in CFLAGS, should be carefully added for each arch.
>> For that, for CLANG-bpf, lets filter them only for arm arch as it
>> definitely requires __LINUX_ARM_ARCH__ to be set, but ignore for
>> others till it's really needed. For arm, -D__LINUX_ARM_ARCH__ is min
>> version used as instruction set selector. In another case errors
>> like "SMP is not supported" for arm and bunch of other errors are
>> issued resulting to incorrect final object.
>>
>> Later D_OPTIONS can be used for gcc part.
>> ---
>>   samples/bpf/Makefile | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
>> index 8ecc5d0c2d5b..6492b7e65c08 100644
>> --- a/samples/bpf/Makefile
>> +++ b/samples/bpf/Makefile
>> @@ -185,6 +185,15 @@ HOSTLDLIBS_map_perf_test	+= -lrt
>>   HOSTLDLIBS_test_overhead	+= -lrt
>>   HOSTLDLIBS_xdpsock		+= -pthread
>>
>> +# Strip all expet -D options needed to handle linux headers
>> +# for arm it's __LINUX_ARM_ARCH__ and potentially others fork vars
>> +D_OPTIONS = $(shell echo "$(KBUILD_CFLAGS) " | sed 's/[[:blank:]]/\n/g' | \
>> +	sed '/^-D/!d' | tr '\n' ' ')
>> +
>> +ifeq ($(ARCH), arm)
>> +CLANG_EXTRA_CFLAGS := $(D_OPTIONS)
>> +endif
>
>Do you need this for native compilation?
Yes, native "arm" also requires it.

>
>so arm64 compilation does not need this?
Yes, for now only arm.

>If only -D__LINUX_ARM_ARCH__ is needed, maybe just
>with
>    CLANG_EXTRA_CFLAGS := -D__LINUX_ARM_ARCH__
A value is also needed: -D__LINUX_ARM_ARCH__=7 or -D__LINUX_ARM_ARCH__=6.
So it has to be retrieved from KBUILD_CFLAGS.

>Otherwise, people will wonder whether this is needed for
>other architectures. Or just do
>    CLANG_EXTRA_CFLAGS := $(D_OPTIONS)
>for all cross compilation?
arm requires it for both cross and native builds.

Will do this:

# Strip all except -D options needed to handle linux headers
# for arm it's __LINUX_ARM_ARCH__ and potentially others fork vars
ifeq ($(ARCH), arm)
D_OPTIONS = $(shell echo "$(KBUILD_CFLAGS) " | sed 's/[[:blank:]]/\n/g' | \
	sed '/^-D/!d' | tr '\n' ' ')
endif

CLANG_EXTRA_CFLAGS := $(D_OPTIONS)
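
As an illustration of what the sed/tr pipeline in D_OPTIONS computes, here is
the same filtering written as a small C function. It is a sketch only:
keep_d_options() is a hypothetical helper and is not part of the build; the
Makefile keeps using the shell pipeline.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy only the blank-separated words of `flags` that start with "-D"
 * into `out`, each followed by a single space (the pipeline's tr step
 * leaves the same trailing space). Words that do not fit are skipped.
 */
static void keep_d_options(const char *flags, char *out, size_t outsz)
{
	size_t used = 0;

	if (outsz == 0)
		return;
	out[0] = '\0';

	while (*flags) {
		size_t n;

		flags += strspn(flags, " \t");	/* skip blanks */
		n = strcspn(flags, " \t");	/* length of the next word */
		if (n >= 2 && strncmp(flags, "-D", 2) == 0 &&
		    used + n + 1 < outsz) {
			memcpy(out + used, flags, n);
			used += n;
			out[used++] = ' ';
			out[used] = '\0';
		}
		flags += n;
	}
}
```

Feeding it something like "-Wall -D__LINUX_ARM_ARCH__=7 -O2" leaves only the
-D word, value included, which is why the value does not have to be
hard-coded per arm version.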



>
>> +
>>   # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
>>   #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
>>   LLC ?= llc
>>

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply

* [PATCH v4.14-stable 1/2] tcp: Reset send_head when removing skb from write-queue
From: Christoph Paasch @ 2019-09-13 20:08 UTC (permalink / raw)
  To: stable, netdev, gregkh, Sasha Levin
  Cc: David Miller, Eric Dumazet, Jason Baron, Vladimir Rutsky,
	Soheil Hassas Yeganeh, Neal Cardwell
In-Reply-To: <20190913200819.32686-1-cpaasch@apple.com>

syzkaller is not happy since commit fdfc5c8594c2 ("tcp: remove empty skb
from write queue in error cases"):

CPU: 1 PID: 13814 Comm: syz-executor.4 Not tainted 4.14.143 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
task: ffff888040105c00 task.stack: ffff8880649c0000
RIP: 0010:tcp_sendmsg_locked+0x6b4/0x4390 net/ipv4/tcp.c:1350
RSP: 0018:ffff8880649cf718 EFLAGS: 00010206
RAX: 0000000000000014 RBX: 000000000000001e RCX: ffffc90000717000
RDX: 0000000000000077 RSI: ffffffff82e760f7 RDI: 00000000000000a0
RBP: ffff8880649cfaa8 R08: 1ffff1100c939e7a R09: ffff8880401063c8
R10: 0000000000000003 R11: 0000000000000001 R12: dffffc0000000000
R13: ffff888043d74750 R14: ffff888043d74500 R15: 000000000000001e
FS:  00007f0afcb6d700(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2ca22000 CR3: 0000000040496004 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 tcp_sendmsg+0x2a/0x40 net/ipv4/tcp.c:1533
 inet_sendmsg+0x173/0x4e0 net/ipv4/af_inet.c:784
 sock_sendmsg_nosec net/socket.c:646 [inline]
 sock_sendmsg+0xc3/0x100 net/socket.c:656
 SYSC_sendto+0x35d/0x5e0 net/socket.c:1766
 do_syscall_64+0x241/0x680 arch/x86/entry/common.c:292
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

The problem is that we are removing an skb from the write-queue that
could have been referenced by the sk_send_head. Thus, we need to check
for the send_head's sanity after removing it.

This patch needs to be backported only to 4.14 and older (among those
that applied the backport of fdfc5c8594c2).

Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Vladimir Rutsky <rutsky@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
---
 net/ipv4/tcp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5ce069ce2a97..efe767e20d01 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -924,8 +924,7 @@ static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
 {
 	if (skb && !skb->len) {
 		tcp_unlink_write_queue(skb, sk);
-		if (tcp_write_queue_empty(sk))
-			tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
+		tcp_check_send_head(sk, skb);
 		sk_wmem_free_skb(sk, skb);
 	}
 }
-- 
2.21.0
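
The dangling-pointer problem described in the commit message can be shown with
a small userspace model. This is a sketch under stated assumptions: the
model_* types and model_unlink() are hypothetical, use a singly linked list
for brevity, and do not reproduce the kernel's write queue or
tcp_check_send_head().

```c
#include <stddef.h>

/* Hypothetical queue entry and socket; not kernel structures. */
struct model_skb {
	struct model_skb *next;
};

struct model_sock {
	struct model_skb *queue_head;	/* first entry of the write queue */
	struct model_skb *send_head;	/* next entry to transmit, may be NULL */
};

/*
 * Unlink `skb` from the queue and, if send_head pointed at it, reset
 * send_head so that no pointer to the removed entry survives.
 */
static void model_unlink(struct model_sock *sk, struct model_skb *skb)
{
	struct model_skb **pp = &sk->queue_head;

	while (*pp && *pp != skb)
		pp = &(*pp)->next;
	if (*pp)
		*pp = skb->next;
	skb->next = NULL;

	if (sk->send_head == skb)
		sk->send_head = NULL;
}
```

Without the final check, send_head would keep pointing at an entry that the
caller is about to free.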


^ permalink raw reply related

* Re: [PATCH bpf-next 05/11] samples: bpf: makefile: use D vars from KBUILD_CFLAGS to handle headers
From: Yonghong Song @ 2019-09-13 21:12 UTC (permalink / raw)
  To: Ivan Khoronzhuk, ast@kernel.org, daniel@iogearbox.net,
	davem@davemloft.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org, clang-built-linux@googlegroups.com
In-Reply-To: <20190910103830.20794-6-ivan.khoronzhuk@linaro.org>



On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
> The kernel headers are reused from samples bpf, and autoconf.h is not
> enough to reflect complete arch configuration for clang. But CLANG-bpf
> cmds are sensitive for assembler part taken from linux headers and -D
> vars, usually used in CFLAGS, should be carefully added for each arch.
> For that, for CLANG-bpf, lets filter them only for arm arch as it
> definitely requires __LINUX_ARM_ARCH__ to be set, but ignore for
> others till it's really needed. For arm, -D__LINUX_ARM_ARCH__ is min
> version used as instruction set selector. In another case errors
> like "SMP is not supported" for arm and bunch of other errors are
> issued resulting to incorrect final object.
> 
> Later D_OPTIONS can be used for gcc part.
> ---
>   samples/bpf/Makefile | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 8ecc5d0c2d5b..6492b7e65c08 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -185,6 +185,15 @@ HOSTLDLIBS_map_perf_test	+= -lrt
>   HOSTLDLIBS_test_overhead	+= -lrt
>   HOSTLDLIBS_xdpsock		+= -pthread
>   
> +# Strip all expet -D options needed to handle linux headers
> +# for arm it's __LINUX_ARM_ARCH__ and potentially others fork vars
> +D_OPTIONS = $(shell echo "$(KBUILD_CFLAGS) " | sed 's/[[:blank:]]/\n/g' | \
> +	sed '/^-D/!d' | tr '\n' ' ')
> +
> +ifeq ($(ARCH), arm)
> +CLANG_EXTRA_CFLAGS := $(D_OPTIONS)
> +endif

Do you need this for native compilation?

so arm64 compilation does not need this?
If only -D__LINUX_ARM_ARCH__ is needed, maybe just
with
    CLANG_EXTRA_CFLAGS := -D__LINUX_ARM_ARCH__
Otherwise, people will wonder whether this is needed for
other architectures. Or just do
    CLANG_EXTRA_CFLAGS := $(D_OPTIONS)
for all cross compilation?

> +
>   # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
>   #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
>   LLC ?= llc
> 

^ permalink raw reply

* Re: [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO
From: Neal Cardwell @ 2019-09-13 21:02 UTC (permalink / raw)
  To: Thomas Higdon
  Cc: netdev@vger.kernel.org, Jonathan Lemon, Dave Jones, Eric Dumazet,
	Dave Taht, Yuchung Cheng, Soheil Hassas Yeganeh
In-Reply-To: <20190913193629.55201-2-tph@fb.com>

On Fri, Sep 13, 2019 at 3:36 PM Thomas Higdon <tph@fb.com> wrote:
>
> Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
> performance problems --
> > (1) Usually when we're diagnosing TCP performance problems, we do so
> > from the sender, since the sender makes most of the
> > performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
> > From the sender-side the thing that would be most useful is to see
> > tp->snd_wnd, the receive window that the receiver has advertised to
> > the sender.
>
> This serves the purpose of adding an additional __u32 to avoid the
> would-be hole caused by the addition of the tcpi_rcv_ooopack field.
>
> Signed-off-by: Thomas Higdon <tph@fb.com>
> ---
> changes from v3:
>  - changed from rcv_wnd to snd_wnd
>
>  include/uapi/linux/tcp.h | 1 +
>  net/ipv4/tcp.c           | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 20237987ccc8..240654f22d98 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -272,6 +272,7 @@ struct tcp_info {
>         __u32   tcpi_reord_seen;     /* reordering events seen */
>
>         __u32   tcpi_rcv_ooopack;    /* Out-of-order packets received */
> +       __u32   tcpi_snd_wnd;        /* Remote peer's advertised recv window size */
>  };

Thanks for adding this!

My run of ./scripts/checkpatch.pl is showing a warning on this line:

WARNING: line over 80 characters
#19: FILE: include/uapi/linux/tcp.h:273:
+       __u32   tcpi_snd_wnd;        /* Remote peer's advertised recv window size */

What if the comment is shortened up to fit in 80 columns and the units
(bytes) are added, something like:

        __u32   tcpi_snd_wnd;        /* peer's advertised recv window (bytes) */

neal
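
For context, this is roughly how userspace would get at the struct. It is a
minimal sketch: read_tcp_info() is a hypothetical wrapper, and it
deliberately does not touch tcpi_snd_wnd, since that member only exists in
headers that already carry this patch.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Fetch TCP_INFO for `fd`. On success returns 0 and stores in `*len`
 * how many bytes the kernel filled in.
 */
static int read_tcp_info(int fd, struct tcp_info *info, socklen_t *len)
{
	memset(info, 0, sizeof(*info));
	*len = sizeof(*info);
	return getsockopt(fd, IPPROTO_TCP, TCP_INFO, info, len);
}
```

With headers that include the new member, a caller could compare the returned
length against the member's offset before reading info.tcpi_snd_wnd, so that
the same binary still behaves sensibly on kernels without the field.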

^ permalink raw reply

* Re: [PATCH v4 1/2] tcp: Add TCP_INFO counter for packets received out-of-order
From: Neal Cardwell @ 2019-09-13 20:55 UTC (permalink / raw)
  To: Thomas Higdon
  Cc: netdev@vger.kernel.org, Jonathan Lemon, Dave Jones, Eric Dumazet,
	Dave Taht, Yuchung Cheng, Soheil Hassas Yeganeh
In-Reply-To: <20190913193629.55201-1-tph@fb.com>

On Fri, Sep 13, 2019 at 3:37 PM Thomas Higdon <tph@fb.com> wrote:
>
> For receive-heavy cases on the server-side, we want to track the
> connection quality for individual client IPs. This counter, similar to
> the existing system-wide TCPOFOQueue counter in /proc/net/netstat,
> tracks out-of-order packet reception. By providing this counter in
> TCP_INFO, it will allow understanding to what degree receive-heavy
> sockets are experiencing out-of-order delivery and packet drops
> indicating congestion.
>
> Please note that this is similar to the counter in NetBSD TCP_INFO, and
> has the same name.
>
> Signed-off-by: Thomas Higdon <tph@fb.com>
> ---
>
> no changes from v3
>
>  include/linux/tcp.h      | 2 ++
>  include/uapi/linux/tcp.h | 2 ++
>  net/ipv4/tcp.c           | 2 ++
>  net/ipv4/tcp_input.c     | 1 +
>  4 files changed, 7 insertions(+)
>
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index f3a85a7fb4b1..a01dc78218f1 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -393,6 +393,8 @@ struct tcp_sock {
>          */
>         struct request_sock *fastopen_rsk;
>         u32     *saved_syn;
> +
> +       u32 rcv_ooopack; /* Received out-of-order packets, for tcpinfo */

Thanks for adding this.

A thought: putting the new rcv_ooopack field here makes struct
tcp_sock bigger, and increases the odds of taking a cache miss
(according to "pahole" this field is the only one in a new cache
line).

I'd suggest putting the new rcv_ooopack field immediately before
rcv_rtt_last_tsecr. That would use up a 4-byte hole, and would place
it in a cache line already used on TCP receivers (for rcv_rtt logic).
This would make it less likely this new field causes more cache misses
or uses more space.

Details: looking at the output of "pahole" for tcp_sock in various cases:

net-next before this patch:
-------------------------------------
...
        u8                         bpf_sock_ops_cb_flags; /*  2076     1 */

        /* XXX 3 bytes hole, try to pack */

        u32                        rcv_rtt_last_tsecr;   /*  2080     4 */

        /* XXX 4 bytes hole, try to pack */

        struct {
                u32                rtt_us;               /*  2088     4 */
                u32                seq;                  /*  2092     4 */
                u64                time;                 /*  2096     8 */
        } rcv_rtt_est;                                   /*  2088    16 */
...
        /* size: 2176, cachelines: 34, members: 134 */
        /* sum members: 2164, holes: 4, sum holes: 12 */
        /* paddings: 3, sum paddings: 12 */


net-next with this patch:
-------------------------------------
...
        u32 *                      saved_syn;            /*  2168     8 */
        /* --- cacheline 34 boundary (2176 bytes) --- */
        u32                        rcv_ooopack;          /*  2176     4 */
...
        /* size: 2184, cachelines: 35, members: 135 */
        /* sum members: 2168, holes: 4, sum holes: 12 */
        /* padding: 4 */
        /* paddings: 3, sum paddings: 12 */
        /* last cacheline: 8 bytes */


net-next with this field in the suggested spot:
-------------------------------------
...
        /* XXX 3 bytes hole, try to pack */

        u32                        rcv_ooopack;          /*  2080     4 */
        u32                        rcv_rtt_last_tsecr;   /*  2084     4 */
        struct {
                u32                rtt_us;               /*  2088     4 */
                u32                seq;                  /*  2092     4 */
                u64                time;                 /*  2096     8 */
        } rcv_rtt_est;                                   /*  2088    16 */
...
        /* size: 2176, cachelines: 34, members: 135 */
        /* sum members: 2168, holes: 3, sum holes: 8 */
        /* paddings: 3, sum paddings: 12 */

neal
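
The effect is easy to reproduce in miniature. The two structs below are
hypothetical and much smaller than tcp_sock; they only mimic the pattern in
the pahole output (a 4-byte hole in front of an 8-byte-aligned member versus a
new field appended at the end).

```c
#include <stddef.h>
#include <stdint.h>

/* New field appended at the end: the hole after last_tsecr stays unused. */
struct tail_layout {
	uint32_t a;
	uint8_t  flags;
	uint32_t last_tsecr;
	struct {
		uint32_t rtt_us;
		uint32_t seq;
		uint64_t time;
	} est;				/* 8-byte aligned on common 64-bit ABIs */
	uint32_t *saved;
	uint32_t ooopack;		/* appended, may grow the struct */
};

/* Same members, with the new field placed just before last_tsecr. */
struct packed_layout {
	uint32_t a;
	uint8_t  flags;
	uint32_t ooopack;		/* fills what would otherwise be a hole */
	uint32_t last_tsecr;
	struct {
		uint32_t rtt_us;
		uint32_t seq;
		uint64_t time;
	} est;
	uint32_t *saved;
};
```

On a typical x86-64 build I would expect sizeof() to come out as 48 for
tail_layout and 40 for packed_layout; the exact numbers depend on the ABI,
but the packed variant should never be larger.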



^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2019-09-13 20:55 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Don't corrupt xfrm_interface parms before validation, from Nicolas
   Dichtel.

2) Revert use of usb-wakeup in btusb, from Mario Limonciello.

3) Block ipv6 packets in bridge netfilter if ipv6 is disabled,
   from Leonardo Bras.

4) IPS_OFFLOAD not honored in ctnetlink, from Pablo Neira Ayuso.

5) Missing ULP check in sock_map, from John Fastabend.

6) Fix receive statistic handling in forcedeth, from Zhu Yanjun.

7) Fix length of SKB allocated in 6pack driver, from Christophe
   JAILLET.

8) ip6_route_info_create() returns an error pointer, not NULL.
   From Maciej Żenczykowski.

9) Only add RDS sock to the hashes after rs_transport is set, from
   Ka-Cheong Poon.

10) Don't double clean TX descriptors in ixgbe, from Ilya Maximets.

11) Presence of transmit IPSEC offload in an SKB is not tested for
    correctly in ixgbe and ixgbevf.  From Steffen Klassert and
    Jeff Kirsher.

12) Need rcu_barrier() when register_netdevice() takes one of the
    notifier based failure paths, from Subash Abhinov Kasiviswanathan.

13) Fix leak in sctp_do_bind(), from Mao Wenan.

Please pull, thanks a lot!

The following changes since commit 089cf7f6ecb266b6a4164919a2e69bd2f938374a:

  Linux 5.3-rc7 (2019-09-02 09:57:40 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git 

for you to fetch changes up to 4d7ffcf3bf1be98d876c570cab8fc31d9fa92725:

  cdc_ether: fix rndis support for Mediatek based smartphones (2019-09-13 22:08:13 +0200)

----------------------------------------------------------------
Alexander Duyck (1):
      ixgbe: Prevent u8 wrapping of ITR value to something less than 10us

Alexei Starovoitov (1):
      bpf: fix precision tracking of stack slots

Bjørn Mork (1):
      cdc_ether: fix rndis support for Mediatek based smartphones

Christophe JAILLET (3):
      net/hamradio/6pack: Fix the size of a sk_buff used in 'sp_bump()'
      ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()'
      sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'

Colin Ian King (4):
      NFC: st95hf: fix spelling mistake "receieve" -> "receive"
      net: lmc: fix spelling mistake "runnin" -> "running"
      net: hns3: fix spelling mistake "undeflow" -> "underflow"
      mlx4: fix spelling mistake "veify" -> "verify"

Cong Wang (2):
      net_sched: check cops->tcf_block in tc_bind_tclass()
      sch_hhf: ensure quantum and hhf_non_hh_weight are non-zero

David Ahern (2):
      ipv6: Fix RTA_MULTIPATH with nexthop objects
      selftest: A few cleanups for fib_nexthops.sh

David Howells (1):
      rxrpc: Fix misplaced traceline

David S. Miller (8):
      Merge git://git.kernel.org/.../pablo/nf
      Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
      Merge branch 'nexthops-Fix-multipath-notifications-for-IPv6-and-selftests'
      Merge branch 'master' of git://git.kernel.org/.../klassert/ipsec
      Merge tag 'wireless-drivers-for-davem-2019-09-05' of git://git.kernel.org/.../kvalo/wireless-drivers
      Merge git://git.kernel.org/.../bpf/bpf
      Merge branch '10GbE' of git://git.kernel.org/.../jkirsher/net-queue
      Merge branch 'sctp_do_bind-leak'

Donald Sharp (1):
      net: Properly update v4 routes with v6 nexthop

Eric Biggers (1):
      isdn/capi: check message length in capi_write()

Eric Dumazet (1):
      net: sched: fix reordering issues

Fernando Fernandez Mancera (1):
      netfilter: nft_socket: fix erroneous socket assignment

Florian Westphal (1):
      xfrm: policy: avoid warning splat when merging nodes

Fred Lotter (1):
      nfp: flower: cmsg rtnl locks can timeout reify messages

Harish Bandi (1):
      Bluetooth: hci_qca: disable irqs when spinlock is acquired

Hui Peng (1):
      rsi: fix a double free bug in rsi_91x_deinit()

Ilya Maximets (1):
      ixgbe: fix double clean of Tx descriptors with xdp

Jeff Kirsher (1):
      ixgbevf: Fix secpath usage for IPsec Tx offload

Jian-Hong Pan (1):
      Bluetooth: btrtl: Additional Realtek 8822CE Bluetooth devices

John Fastabend (1):
      net: sock_map, fix missing ulp check in sock hash case

Jouni Malinen (1):
      mac80211: Do not send Layer 2 Update frame before authorization

Juliet Kim (1):
      net/ibmvnic: free reset work of removed device from queue

Ka-Cheong Poon (1):
      net/rds: An rds_sock is added too early to the hash table

Leonardo Bras (2):
      netfilter: bridge: Drops IPv6 packets if IPv6 module is not loaded
      netfilter: nft_fib_netdev: Terminate rule eval if protocol=IPv6 and ipv6 module is disabled

Luca Coelho (1):
      iwlwifi: assign directly to iwl_trans->cfg in QuZ detection

Maciej Żenczykowski (2):
      net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)
      ipv6: addrconf_f6i_alloc - fix non-null pointer check to !IS_ERR()

Mao Wenan (5):
      net: sonic: return NETDEV_TX_OK if failed to map buffer
      net: sonic: replace dev_kfree_skb in sonic_send_packet
      sctp: change return type of sctp_get_port_local
      sctp: remove redundant assignment when call sctp_get_port_local
      sctp: destroy bucket if failed to bind addr

Marcel Holtmann (1):
      Revert "Bluetooth: validate BLE connection interval updates"

Mario Limonciello (1):
      Revert "Bluetooth: btusb: driver to enable the usb-wakeup feature"

Michal Suchanek (1):
      net/ibmvnic: Fix missing { in __ibmvnic_reset

Moritz Fischer (1):
      net: fixed_phy: Add forward declaration for struct gpio_desc;

Navid Emamdoost (3):
      Bluetooth: bpa10x: change return value
      wimax: i2400: fix memory leak
      net: qrtr: fix memort leak in qrtr_tun_write_iter

Neal Cardwell (1):
      tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR

Nicolas Dichtel (5):
      xfrm interface: avoid corruption on changelink
      xfrm interface: ifname may be wrong in logs
      xfrm interface: fix list corruption for x-netns
      xfrm interface: fix management of phydev
      bridge/mdb: remove wrong use of NLM_F_MULTI

Pablo Neira Ayuso (2):
      netfilter: ctnetlink: honor IPS_OFFLOAD flag
      netfilter: nf_flow_table: set default timeout after successful insertion

Radhey Shyam Pandey (1):
      MAINTAINERS: add myself as maintainer for xilinx axiethernet driver

Randy Dunlap (1):
      lib/Kconfig: fix OBJAGG in lib/ menu structure

Shmulik Ladkani (1):
      net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list

Stanislaw Gruszka (4):
      mt76: mt76x0e: don't use hw encryption for MT7630E
      mt76: mt76x0e: disable 5GHz band for MT7630E
      rt2x00: clear up IV's on key removal
      Revert "rt2800: enable TX_PIN_CFG_LNA_PE_ bits per band"

Stefan Chulski (1):
      net: phylink: Fix flow control resolution

Steffen Klassert (1):
      ixgbe: Fix secpath usage for IPsec TX offload.

Subash Abhinov Kasiviswanathan (1):
      net: Fix null de-reference of device refcount

Wen Huang (1):
      mwifiex: Fix three heap overflow at parsing element in cfg80211_ap_settings

Xin Long (3):
      sctp: use transport pf_retrans in sctp_do_8_2_transport_strike
      tipc: add NULL pointer check before calling kfree_rcu
      sctp: fix the missing put_user when dumping transport thresholds

Yang Yingliang (1):
      tun: fix use-after-free when register netdev failed

Yizhuo (1):
      net: stmmac: dwmac-sun8i: Variable "val" in function sun8i_dwmac_set_syscon() could be uninitialized

Zhu Yanjun (1):
      forcedeth: use per cpu to collect xmit/recv statistics

 MAINTAINERS                                            |   3 +--
 drivers/bluetooth/bpa10x.c                             |   2 +-
 drivers/bluetooth/btusb.c                              |   8 +++----
 drivers/bluetooth/hci_qca.c                            |  10 ++++----
 drivers/isdn/capi/capi.c                               |  10 +++++++-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c |   2 +-
 drivers/net/ethernet/ibm/ibmvnic.c                     |   9 ++++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c          |   7 ++++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c           |  29 +++++++++--------------
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c      |   3 ++-
 drivers/net/ethernet/mellanox/mlx4/main.c              |   2 +-
 drivers/net/ethernet/natsemi/sonic.c                   |   6 ++---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c       |  10 ++++----
 drivers/net/ethernet/nvidia/forcedeth.c                | 143 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------
 drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c      |   7 +++++-
 drivers/net/hamradio/6pack.c                           |   4 ++--
 drivers/net/phy/phylink.c                              |   6 ++---
 drivers/net/tun.c                                      |  16 +++++++++----
 drivers/net/usb/cdc_ether.c                            |  10 +++++++-
 drivers/net/wan/lmc/lmc_main.c                         |   2 +-
 drivers/net/wimax/i2400m/op-rfkill.c                   |   1 +
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c          |  24 +++++++++----------
 drivers/net/wireless/marvell/mwifiex/ie.c              |   3 +++
 drivers/net/wireless/marvell/mwifiex/uap_cmd.c         |   9 ++++++-
 drivers/net/wireless/mediatek/mt76/mt76x0/eeprom.c     |   5 ++++
 drivers/net/wireless/mediatek/mt76/mt76x0/pci.c        |  15 +++++++++++-
 drivers/net/wireless/ralink/rt2x00/rt2800lib.c         |  37 ++++++++++++++---------------
 drivers/net/wireless/rsi/rsi_91x_usb.c                 |   1 -
 drivers/nfc/st95hf/core.c                              |   2 +-
 include/linux/phy_fixed.h                              |   1 +
 include/net/ip_fib.h                                   |   4 ++--
 include/net/nexthop.h                                  |   5 ++--
 include/net/xfrm.h                                     |   2 --
 include/uapi/linux/isdn/capicmd.h                      |   1 +
 kernel/bpf/verifier.c                                  |  23 +++++++++++-------
 lib/Kconfig                                            |   6 ++---
 net/bluetooth/hci_event.c                              |   5 ----
 net/bluetooth/l2cap_core.c                             |   9 +------
 net/bridge/br_mdb.c                                    |   2 +-
 net/bridge/br_netfilter_hooks.c                        |   4 ++++
 net/core/dev.c                                         |   2 ++
 net/core/skbuff.c                                      |  19 +++++++++++++++
 net/core/sock_map.c                                    |   3 +++
 net/ipv4/fib_semantics.c                               |  15 ++++++------
 net/ipv4/tcp_input.c                                   |   2 +-
 net/ipv6/ping.c                                        |   2 +-
 net/ipv6/route.c                                       |  21 ++++++++++-------
 net/mac80211/cfg.c                                     |  14 ++++-------
 net/mac80211/sta_info.c                                |   4 ++++
 net/netfilter/nf_conntrack_netlink.c                   |   7 ++++--
 net/netfilter/nf_flow_table_core.c                     |   2 +-
 net/netfilter/nft_fib_netdev.c                         |   3 +++
 net/netfilter/nft_socket.c                             |   6 ++---
 net/qrtr/tun.c                                         |   5 +++-
 net/rds/bind.c                                         |  40 ++++++++++++++------------------
 net/rxrpc/input.c                                      |   2 +-
 net/sched/sch_api.c                                    |   2 ++
 net/sched/sch_generic.c                                |   9 +++++--
 net/sched/sch_hhf.c                                    |   2 +-
 net/sctp/protocol.c                                    |   2 +-
 net/sctp/sm_sideeffect.c                               |   2 +-
 net/sctp/socket.c                                      |  24 ++++++++++---------
 net/tipc/name_distr.c                                  |   3 ++-
 net/xfrm/xfrm_interface.c                              |  56 ++++++++++++++++++++------------------------
 net/xfrm/xfrm_policy.c                                 |   6 +++--
 tools/testing/selftests/net/fib_nexthops.sh            |  24 ++++++++++---------
 tools/testing/selftests/net/xfrm_policy.sh             |   7 ++++++
 67 files changed, 443 insertions(+), 289 deletions(-)

^ permalink raw reply

* Re: [PATCH bpf-next 02/11] samples: bpf: makefile: fix cookie_uid_helper_example obj build
From: Yonghong Song @ 2019-09-13 20:48 UTC (permalink / raw)
  To: Ivan Khoronzhuk, ast@kernel.org, daniel@iogearbox.net,
	davem@davemloft.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org, clang-built-linux@googlegroups.com
In-Reply-To: <20190910103830.20794-3-ivan.khoronzhuk@linaro.org>



On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
> Don't list userspace "cookie_uid_helper_example" object in list for
> bpf objects.
> 
> per_socket_stats_example-opjs is used to list additional dependencies

s/opjs/objs

> for user space binary from hostprogs-y list. Kbuild system creates
> rules for objects listed this way anyway and no need to worry about
> this. Despite on it, the samples bpf uses logic that hostporgs-y are
> build for userspace with includes needed for this, but "always"
> target, if it's not in hostprog-y list, uses CLANG-bpf rule and is
> intended to create bpf obj but not arch obj and uses only kernel
> includes for that. So correct it, as it breaks cross-compiling at
> least.

The above description is a little tricky to understand.
Maybe something like:
    'always' target is for bpf programs.
    'cookie_uid_helper_example.o' is a user space ELF file, and
    covered by rule `per_socket_stats_example`.
    Let us remove `always += cookie_uid_helper_example.o`,
    which avoids breaking cross compilation due to
    mismatched includes.

> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>   samples/bpf/Makefile | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index f50ca852c2a8..43dee90dffa4 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -145,7 +145,6 @@ always += sampleip_kern.o
>   always += lwt_len_hist_kern.o
>   always += xdp_tx_iptunnel_kern.o
>   always += test_map_in_map_kern.o
> -always += cookie_uid_helper_example.o
>   always += tcp_synrto_kern.o
>   always += tcp_rwnd_kern.o
>   always += tcp_bufs_kern.o
> 


* RE: [PATCH][PATCH net-next] hv_sock: Add the support of hibernation
From: Dexuan Cui @ 2019-09-13 20:13 UTC (permalink / raw)
  To: David Miller, sashal@kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <20190913.210343.724088723062134961.davem@davemloft.net>

> From: David Miller <davem@davemloft.net>
> Sent: Friday, September 13, 2019 1:04 PM
> 
> From: Dexuan Cui <decui@microsoft.com>
> Date: Wed, 11 Sep 2019 23:37:27 +0000
> > I request this patch should go through Sasha's tree rather than the
> > net-next tree.
> 
> That's fine:
> 
> Acked-by: David S. Miller <davem@davemloft.net>

Thanks, David!

@Sasha: I found a few typos in my comment below. I'll post a v2.

> > +/* hv_sock connections can not persist across hibernation, and all the hv_sock
> >  + * channels are forceed to be rescinded before hibernation: see

forceed -> forced

> >  + * are only needed because hibernation requires that every device's driver

every device's driver -> every vmbus device's driver

Thanks,
-- Dexuan


* Re: [patch net-next v3 0/3] net: devlink: move reload fail indication to devlink core and expose to user
From: David Miller @ 2019-09-13 20:11 UTC (permalink / raw)
  To: jiri; +Cc: netdev, idosch, dsahern, jakub.kicinski, tariqt, mlxsw
In-Reply-To: <20190912084946.7468-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Thu, 12 Sep 2019 10:49:43 +0200

> From: Jiri Pirko <jiri@mellanox.com>
> 
> First two patches are dependencies of the last one. That moves devlink
> reload failure indication to the devlink code, so the drivers do not
> have to track it themselves. Currently it is only mlxsw, but I will send
> a follow-up patchset that introduces this in netdevsim too.

Series applied.


* Re: [PATCH net,stable] cdc_ether: fix rndis support for Mediatek based smartphones
From: David Miller @ 2019-09-13 20:09 UTC (permalink / raw)
  To: bjorn; +Cc: netdev, oliver, linux-usb, larsm17
In-Reply-To: <20190912084200.6359-1-bjorn@mork.no>

From: Bjørn Mork <bjorn@mork.no>
Date: Thu, 12 Sep 2019 10:42:00 +0200

> A Mediatek based smartphone owner reports problems with USB
> tethering in Linux.  The verbose USB listing shows a rndis_host
> interface pair (e0/01/03 + 10/00/00), but the driver fails to
> bind with
> 
> [  355.960428] usb 1-4: bad CDC descriptors
> 
> The problem is a failsafe test intended to filter out ACM serial
> functions using the same 02/02/ff class/subclass/protocol as RNDIS.
> The serial functions are recognized by their non-zero bmCapabilities.
> 
> No RNDIS functions with non-zero bmCapabilities were known at the time
> this failsafe was added. But it turns out that some Wireless class
> RNDIS functions are using the bmCapabilities field. These functions
> are uniquely identified as RNDIS by their class/subclass/protocol, so
> the failing test can safely be disabled.  The same applies to the two
> types of Misc class RNDIS functions.
> 
> Applying the failsafe to Communication class functions only retains
> the original functionality, and fixes the problem for the Mediatek based
> smartphone.
> 
> Two examples of CDC functional descriptors with non-zero bmCapabilities
> from Wireless class RNDIS functions are:
 ...
> The Mediatek example is believed to apply to most smartphones with
> Mediatek firmware.  The ZTE example is most likely also part of a larger
> family of devices/firmwares.
> 
> Suggested-by: Lars Melin <larsm17@gmail.com>
> Signed-off-by: Bjørn Mork <bjorn@mork.no>

Applied and queued up for -stable, thanks.
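
For illustration, the narrowed failsafe described in the quoted commit
message can be sketched in plain C. This is a simplified userspace model,
not the driver code: struct intf_ids and acm_failsafe_rejects() are
hypothetical names, only the 02/02/ff and e0/01/03 triples come from the
message above, and the Misc class variants are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the interface descriptor fields. */
struct intf_ids {
	uint8_t class_code;	/* bInterfaceClass */
	uint8_t subclass;	/* bInterfaceSubClass */
	uint8_t protocol;	/* bInterfaceProtocol */
};

/* Communication class RNDIS (02/02/ff) shares its IDs with ACM serial. */
bool is_comm_class_rndis(const struct intf_ids *id)
{
	return id->class_code == 0x02 && id->subclass == 0x02 &&
	       id->protocol == 0xff;
}

/*
 * Return true when the failsafe should refuse to bind. Non-zero
 * bmCapabilities is treated as "this is really an ACM serial function",
 * but only for Communication class interfaces. Wireless class RNDIS
 * (e0/01/03) is identified by its IDs alone, so the test is skipped.
 */
bool acm_failsafe_rejects(const struct intf_ids *id, uint8_t bm_capabilities)
{
	return is_comm_class_rndis(id) && bm_capabilities != 0;
}
```

With this shape, a Wireless class interface that reports non-zero
bmCapabilities is no longer rejected, while the original behaviour is
kept for 02/02/ff interfaces.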


* [PATCH v4.14-stable 0/2] Fixes to commit fdfc5c8594c2 (tcp: remove empty skb from write queue in error cases)
From: Christoph Paasch @ 2019-09-13 20:08 UTC (permalink / raw)
  To: stable, netdev, gregkh, Sasha Levin; +Cc: David Miller, Eric Dumazet


The above-referenced commit has problems on older non-rbTree kernels.

AFAICS, the commit has only been backported to 4.14 up to now, but the
commit that fdfc5c8594c2 is fixing, namely ce5ec440994b ("tcp: ensure epoll
edge trigger wakeup when write queue is empty"), is in v4.2.

Christoph Paasch (2):
  tcp: Reset send_head when removing skb from write-queue
  tcp: Don't dequeue SYN/FIN-segments from write-queue

 net/ipv4/tcp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.21.0



* Re: [PATCH v2 net 0/3] fix memory leak for sctp_do_bind
From: David Miller @ 2019-09-13 20:06 UTC (permalink / raw)
  To: maowenan
  Cc: vyasevich, nhorman, marcelo.leitner, linux-sctp, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <20190912040219.67517-1-maowenan@huawei.com>

From: Mao Wenan <maowenan@huawei.com>
Date: Thu, 12 Sep 2019 12:02:16 +0800

> First two patches are to do cleanup, remove redundant assignment,
> and change return type of sctp_get_port_local.
> Third patch is to fix memory leak for sctp_do_bind if failed
> to bind address.
> 
> ---
>  v2: add one patch to change return type of sctp_get_port_local.

Series applied with Fixes: tag removed from patch #1.

Thanks.


* Re: [PATCH][PATCH net-next] hv_sock: Add the support of hibernation
From: David Miller @ 2019-09-13 20:03 UTC (permalink / raw)
  To: decui
  Cc: kys, haiyangz, sthemmin, sashal, linux-hyperv, netdev,
	linux-kernel, mikelley
In-Reply-To: <1568245042-66967-1-git-send-email-decui@microsoft.com>

From: Dexuan Cui <decui@microsoft.com>
Date: Wed, 11 Sep 2019 23:37:27 +0000

> Add the necessary dummy callbacks for hibernation.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> This patch is basically a pure Hyper-V specific change and it has a
> build dependency on the commit 271b2224d42f ("Drivers: hv: vmbus: Implement
> suspend/resume for VSC drivers for hibernation"), which is on Sasha Levin's
> Hyper-V tree's hyperv-next branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/log/?h=hyperv-next
> 
> I request this patch should go through Sasha's tree rather than the
> net-next tree.

That's fine:

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH net-next] ip: support SO_MARK cmsg
From: David Miller @ 2019-09-13 19:44 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: netdev, willemb
In-Reply-To: <20190911195051.166062-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Wed, 11 Sep 2019 15:50:51 -0400

> From: Willem de Bruijn <willemb@google.com>
> 
> Enable setting skb->mark for UDP and RAW sockets using cmsg.
> 
> This is analogous to existing support for TOS, TTL, txtime, etc.
> 
> Packet sockets already support this as of commit c7d39e32632e
> ("packet: support per-packet fwmark for af_packet sendmsg").
> 
> Similar to other fields, implement by
> 1. initialize the sockcm_cookie.mark from socket option sk_mark
> 2. optionally overwrite this in ip_cmsg_send/ip6_datagram_send_ctl
> 3. initialize inet_cork.mark from sockcm_cookie.mark
> 4. initialize each (usually just one) skb->mark from inet_cork.mark
> 
> Step 1 is handled in one location for most protocols by ipcm_init_sk
> as of commit 351782067b6b ("ipv4: ipcm_cookie initializers").
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Looks good, applied.


* Re: [PATCH bpf-next 01/11] samples: bpf: makefile: fix HDR_PROBE "echo"
From: Ivan Khoronzhuk @ 2019-09-13 19:56 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: ast, daniel, yhs, davem, jakub.kicinski, hawk, john.fastabend,
	linux-kernel, netdev, bpf, clang-built-linux
In-Reply-To: <4251fe86-ccc7-f1ce-e954-2d488d2a95a9@cogentembedded.com>

On Wed, Sep 11, 2019 at 02:02:11PM +0300, Sergei Shtylyov wrote:
>On 10.09.2019 17:54, Ivan Khoronzhuk wrote:
>
>>>Hello!
>>>
>>>On 10.09.2019 13:38, Ivan Khoronzhuk wrote:
>>>
>>>>echo should be replaced on echo -e to handle \n correctly, but instead,
>>>
>>> s/on/with/?
>>s/echo/printf/ instead of s/echo/echo -e/
>
>   I only pointed that 'on' is incorrect there. You replace something 
>/with/ something other...
>
>>
>>printf looks better.
>>
>>>
>>>>replace it on printf as some systems can't handle echo -e.
>>>
>>>  Likewise?
>
>   Same grammatical mistake.
Oh, will correct it in the next version.


-- 
Regards,
Ivan Khoronzhuk


* [PATCH v4 1/2] tcp: Add TCP_INFO counter for packets received out-of-order
From: Thomas Higdon @ 2019-09-13 19:36 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: Jonathan Lemon, Dave Jones, Eric Dumazet, Neal Cardwell,
	Dave Taht, Yuchung Cheng, Soheil Hassas Yeganeh

For receive-heavy cases on the server side, we want to track the
connection quality for individual client IPs. This counter, similar to
the existing system-wide TCPOFOQueue counter in /proc/net/netstat,
tracks out-of-order packet reception. Providing this counter in
TCP_INFO makes it possible to understand to what degree receive-heavy
sockets are experiencing out-of-order delivery and packet drops that
indicate congestion.

Please note that this is similar to the counter in NetBSD TCP_INFO, and
has the same name.

Signed-off-by: Thomas Higdon <tph@fb.com>
---

no changes from v3

 include/linux/tcp.h      | 2 ++
 include/uapi/linux/tcp.h | 2 ++
 net/ipv4/tcp.c           | 2 ++
 net/ipv4/tcp_input.c     | 1 +
 4 files changed, 7 insertions(+)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index f3a85a7fb4b1..a01dc78218f1 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -393,6 +393,8 @@ struct tcp_sock {
 	 */
 	struct request_sock *fastopen_rsk;
 	u32	*saved_syn;
+
+	u32 rcv_ooopack; /* Received out-of-order packets, for tcpinfo */
 };
 
 enum tsq_enum {
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index b3564f85a762..20237987ccc8 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -270,6 +270,8 @@ struct tcp_info {
 	__u64	tcpi_bytes_retrans;  /* RFC4898 tcpEStatsPerfOctetsRetrans */
 	__u32	tcpi_dsack_dups;     /* RFC4898 tcpEStatsStackDSACKDups */
 	__u32	tcpi_reord_seen;     /* reordering events seen */
+
+	__u32	tcpi_rcv_ooopack;    /* Out-of-order packets received */
 };
 
 /* netlink attributes types for SCM_TIMESTAMPING_OPT_STATS */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 94df48bcecc2..4cf58208270e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2653,6 +2653,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	tp->rx_opt.saw_tstamp = 0;
 	tp->rx_opt.dsack = 0;
 	tp->rx_opt.num_sacks = 0;
+	tp->rcv_ooopack = 0;
 
 
 	/* Clean up fastopen related fields */
@@ -3295,6 +3296,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	info->tcpi_bytes_retrans = tp->bytes_retrans;
 	info->tcpi_dsack_dups = tp->dsack_dups;
 	info->tcpi_reord_seen = tp->reord_seen;
+	info->tcpi_rcv_ooopack = tp->rcv_ooopack;
 	unlock_sock_fast(sk, slow);
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 706cbb3b2986..2ef333354026 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4555,6 +4555,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 	tp->pred_flags = 0;
 	inet_csk_schedule_ack(sk);
 
+	tp->rcv_ooopack += max_t(u16, 1, skb_shinfo(skb)->gso_segs);
 	NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFOQUEUE);
 	seq = TCP_SKB_CB(skb)->seq;
 	end_seq = TCP_SKB_CB(skb)->end_seq;
-- 
2.17.1
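
The accounting in the tcp_data_queue_ofo() hunk above can be modelled
outside the kernel as follows. This is only a sketch: struct ooo_counter
and ooo_account_skb() are hypothetical stand-ins for tcp_sock and the
patched code path, and the reading of max_t(u16, 1, gso_segs) as "count
at least one packet per skb" is an interpretation of the diff.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-socket state the patch touches. */
struct ooo_counter {
	uint32_t rcv_ooopack;	/* out-of-order packets received */
};

/*
 * Account one skb that arrived out of order. gso_segs is the number of
 * segments the skb carries. A value of 0 is still counted as one
 * packet, mirroring max_t(u16, 1, gso_segs) in the patch.
 */
void ooo_account_skb(struct ooo_counter *c, uint16_t gso_segs)
{
	c->rcv_ooopack += gso_segs > 1 ? gso_segs : 1;
}
```

Counting segments rather than skbs means an aggregated skb that arrives
out of order contributes one count per segment it carries.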



