Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO
From: Yuchung Cheng @ 2019-09-13 21:28 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Thomas Higdon, netdev@vger.kernel.org, Jonathan Lemon, Dave Jones,
	Eric Dumazet, Dave Taht, Soheil Hassas Yeganeh
In-Reply-To: <CADVnQymKS6-jztAbLu_QYWiPYMqoTf5ODzSg3UPJxH+vBt=bmw@mail.gmail.com>

On Fri, Sep 13, 2019 at 2:02 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Fri, Sep 13, 2019 at 3:36 PM Thomas Higdon <tph@fb.com> wrote:
> >
> > Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
> > performance problems --
> > > (1) Usually when we're diagnosing TCP performance problems, we do so
> > > from the sender, since the sender makes most of the
> > > performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
> > > From the sender-side the thing that would be most useful is to see
> > > tp->snd_wnd, the receive window that the receiver has advertised to
> > > the sender.
> >
> > This serves the purpose of adding an additional __u32 to avoid the
> > would-be hole caused by the addition of the tcpi_rcvi_ooopack field.
> >
> > Signed-off-by: Thomas Higdon <tph@fb.com>
> > ---
> > changes from v3:
> >  - changed from rcv_wnd to snd_wnd
> >
> >  include/uapi/linux/tcp.h | 1 +
> >  net/ipv4/tcp.c           | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> > index 20237987ccc8..240654f22d98 100644
> > --- a/include/uapi/linux/tcp.h
> > +++ b/include/uapi/linux/tcp.h
> > @@ -272,6 +272,7 @@ struct tcp_info {
> >         __u32   tcpi_reord_seen;     /* reordering events seen */
> >
> >         __u32   tcpi_rcv_ooopack;    /* Out-of-order packets received */
> > +       __u32   tcpi_snd_wnd;        /* Remote peer's advertised recv window size */
> >  };
>
> Thanks for adding this!
>
> My run of ./scripts/checkpatch.pl is showing a warning on this line:
>
> WARNING: line over 80 characters
> #19: FILE: include/uapi/linux/tcp.h:273:
> +       __u32   tcpi_snd_wnd;        /* Remote peer's advertised recv
> window size */
>
> What if the comment is shortened up to fit in 80 columns and the units
> (bytes) are added, something like:
>
>         __u32   tcpi_snd_wnd;        /* peer's advertised recv window (bytes) */
just a thought: will tcpi_peer_rcv_wnd be more self-explanatory?
>
> neal

^ permalink raw reply

* Re: [PATCH v3 0/5] Introduce variable length mdev alias
From: Alex Williamson @ 2019-09-13 21:32 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jiri Pirko, kwankhede@nvidia.com, cohuck@redhat.com,
	davem@davemloft.net, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB48667E374853D485788D8159D1B10@AM0PR05MB4866.eurprd05.prod.outlook.com>

On Wed, 11 Sep 2019 16:38:49 +0000
Parav Pandit <parav@mellanox.com> wrote:

> > -----Original Message-----
> > From: linux-kernel-owner@vger.kernel.org <linux-kernel-  
> > owner@vger.kernel.org> On Behalf Of Parav Pandit  
> > Sent: Wednesday, September 11, 2019 10:31 AM
> > To: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Jiri Pirko <jiri@mellanox.com>; kwankhede@nvidia.com;
> > cohuck@redhat.com; davem@davemloft.net; kvm@vger.kernel.org; linux-
> > kernel@vger.kernel.org; netdev@vger.kernel.org
> > Subject: RE: [PATCH v3 0/5] Introduce variable length mdev alias
> > 
> > Hi Alex,
> >   
> > > -----Original Message-----
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Wednesday, September 11, 2019 8:56 AM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Jiri Pirko <jiri@mellanox.com>; kwankhede@nvidia.com;
> > > cohuck@redhat.com; davem@davemloft.net; kvm@vger.kernel.org; linux-
> > > kernel@vger.kernel.org; netdev@vger.kernel.org
> > > Subject: Re: [PATCH v3 0/5] Introduce variable length mdev alias
> > >
> > > On Mon, 9 Sep 2019 20:42:32 +0000
> > > Parav Pandit <parav@mellanox.com> wrote:
> > >  
> > > > Hi Alex,
> > > >  
> > > > > -----Original Message-----
> > > > > From: Parav Pandit <parav@mellanox.com>
> > > > > Sent: Sunday, September 1, 2019 11:25 PM
> > > > > To: alex.williamson@redhat.com; Jiri Pirko <jiri@mellanox.com>;
> > > > > kwankhede@nvidia.com; cohuck@redhat.com; davem@davemloft.net
> > > > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > netdev@vger.kernel.org; Parav Pandit <parav@mellanox.com>
> > > > > Subject: [PATCH v3 0/5] Introduce variable length mdev alias
> > > > >
> > > > > To have consistent naming for the netdevice of a mdev and to have
> > > > > consistent naming of the devlink port [1] of a mdev, which is
> > > > > formed using phys_port_name of the devlink port, current UUID is
> > > > > not usable because UUID is too long.
> > > > >
> > > > > UUID in string format is 36-characters long and in binary 128-bit.
> > > > > Both formats are not able to fit within 15 characters limit of
> > > > > netdev  
> > > name.  
> > > > >
> > > > > It is desired to have mdev device naming consistent using UUID.
> > > > > So that widely used user space framework such as ovs [2] can make
> > > > > use of mdev representor in similar way as PCIe SR-IOV VF and PF  
> > > representors.  
> > > > >
> > > > > Hence,
> > > > > (a) mdev alias is created which is derived using sha1 from the
> > > > > mdev  
> > > name.  
> > > > > (b) Vendor driver describes how long an alias should be for the
> > > > > child mdev created for a given parent.
> > > > > (c) Mdev aliases are unique at system level.
> > > > > (d) alias is created optionally whenever parent requested.
> > > > > This ensures that non networking mdev parents can function without
> > > > > alias creation overhead.
> > > > >
> > > > > This design is discussed at [3].
> > > > >
> > > > > An example systemd/udev extension will have,
> > > > >
> > > > > 1. netdev name created using mdev alias available in sysfs.
> > > > >
> > > > > mdev UUID=83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
> > > > > mdev 12 character alias=cd5b146a80a5
> > > > >
> > > > > netdev name of this mdev = enmcd5b146a80a5 Here en = Ethernet link
> > > > > m = mediated device
> > > > >
> > > > > 2. devlink port phys_port_name created using mdev alias.
> > > > > devlink phys_port_name=pcd5b146a80a5
> > > > >
> > > > > This patchset enables mdev core to maintain unique alias for a mdev.
> > > > >
> > > > > Patch-1 Introduces mdev alias using sha1.
> > > > > Patch-2 Ensures that mdev alias is unique in a system.
> > > > > Patch-3 Exposes mdev alias in a sysfs hirerchy, update
> > > > > Documentation
> > > > > Patch-4 Introduces mdev_alias() API.
> > > > > Patch-5 Extends mtty driver to optionally provide alias generation.
> > > > > This also enables to test UUID based sha1 collision and trigger
> > > > > error handling for duplicate sha1 results.
> > > > >
> > > > > [1] http://man7.org/linux/man-pages/man8/devlink-port.8.html
> > > > > [2] https://docs.openstack.org/os-vif/latest/user/plugins/ovs.html
> > > > > [3] https://patchwork.kernel.org/cover/11084231/
> > > > >
> > > > > ---
> > > > > Changelog:
> > > > > v2->v3:
> > > > >  - Addressed comment from Yunsheng Lin
> > > > >  - Changed strcmp() ==0 to !strcmp()
> > > > >  - Addressed comment from Cornelia Hunk
> > > > >  - Merged sysfs Documentation patch with syfs patch
> > > > >  - Added more description for alias return value  
> > > >
> > > > Did you get a chance review this updated series?
> > > > I addressed Cornelia's and yours comment.
> > > > I do not think allocating alias memory twice, once for comparison
> > > > and once for storing is good idea or moving alias generation logic
> > > > inside the mdev_list_lock(). So I didn't address that suggestion of  
> > Cornelia.  
> > >
> > > Sorry, I'm at LPC this week.  I agree, I don't think the double
> > > allocation is necessary, I thought the comment was sufficient to
> > > clarify null'ing the variable.  It's awkward, but seems correct.
> > >
> > > I'm not sure what we do with this patch series though, has the real
> > > consumer of this even been proposed?    
> 
> Jiri already acked to use mdev_alias() to generate phys_port_name several days back in the discussion we had in [1].
> After concluding in the thread [1], I proceed with mdev_alias().
> mlx5_core patches are not yet present on netdev mailing list, but we
> all agree to use it in mdev_alias() in devlink phys_port_name
> generation. So we have collective agreement on how to proceed
> forward. I wasn't probably clear enough in previous email reply about
> it, so adding link here.
> 
> [1] https://patchwork.kernel.org/cover/11084231/#22838955

Jiri may have agreed to the concept, but without patches on the list
proving an end to end solution, I think it's too early for us to commit
to this by preemptively adding it to our API.  "Acked" and "collective
agreement" seem like they overstate something that seems not to have
seen the light of day yet.  Instead I'll say, it looks reasonable, come
back when the real consumer has actually been proposed upstream and has
more buy-in from the community and we'll see if it still looks like the
right approach from an mdev perspective then.  Thanks,

Alex

^ permalink raw reply

* Re: [PATCH bpf-next 07/11] samples: bpf: add makefile.prog for separate CC build
From: Yonghong Song @ 2019-09-13 21:33 UTC (permalink / raw)
  To: Ivan Khoronzhuk, ast@kernel.org, daniel@iogearbox.net,
	davem@davemloft.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org, clang-built-linux@googlegroups.com
In-Reply-To: <20190910103830.20794-8-ivan.khoronzhuk@linaro.org>



On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
> The makefile.prog is added only, will be used in sample/bpf/Makefile
> later in order to switch cross-compiling on CC from HOSTCC.
> 
> The HOSTCC is supposed to build binaries and tools running on the host
> afterwards, in order to simplify build or so, like "fixdep" or else.
> In case of cross compiling "fixdep" is executed on host when the rest
> samples should run on target arch. In order to build binaries for
> target arch with CC and tools running on host with HOSTCC, lets add
> Makefile.prog for simplicity, having definition and routines similar
> to ones, used in script/Makefile.host. This allows later add
> cross-compilation to samples/bpf with minimum changes.

So this is really Makefile.host or Makefile.user, right?
In BPF, 'prog' can refers to user prog or bpf prog.
To avoid ambiguity, maybe Makefile.host?

> 
> Makefile.prog contains only stuff needed for samples/bpf, potentially
> can be reused and extended for other prog sets later and now needed

What do you mean 'extended for other prog sets'? I am wondering whether
we could just include 'scripts/Makefile.host'? How hard it is?

> only for unblocking tricky samples/bpf cross compilation.
> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>   samples/bpf/Makefile.prog | 77 +++++++++++++++++++++++++++++++++++++++
>   1 file changed, 77 insertions(+)
>   create mode 100644 samples/bpf/Makefile.prog
> 
> diff --git a/samples/bpf/Makefile.prog b/samples/bpf/Makefile.prog
> new file mode 100644
> index 000000000000..3781999b9193
> --- /dev/null
> +++ b/samples/bpf/Makefile.prog
> @@ -0,0 +1,77 @@
> +# SPDX-License-Identifier: GPL-2.0
> +# ==========================================================================
> +# Building binaries on the host system
> +# Binaries are not used during the compilation of the kernel, and intendent
> +# to be build for target board, target board can be host ofc. Added to build
> +# binaries to run not on host system.
> +#
> +# Only C is supported, but can be extended for C++.

The above comment is not needed.
samples/bpf/ only have C now. I am wondering whether your below scripts 
can be simplified, e.g., removing cxxobjs.

> +#
> +# Sample syntax (see Documentation/kbuild/makefiles.rst for reference)
> +# progs-y := xsk_example
> +# Will compile xdpsock_example.c and create an executable named xsk_example
> +#
> +# progs-y    := xdpsock
> +# xdpsock-objs := xdpsock_1.o xdpsock_2.o
> +# Will compile xdpsock_1.c and xdpsock_2.c, and then link the executable
> +# xdpsock, based on xdpsock_1.o and xdpsock_2.o
> +#
> +# Inherited from scripts/Makefile.host
> +#
> +__progs := $(sort $(progs-y))
> +
> +# C code
> +# Executables compiled from a single .c file
> +prog-csingle	:= $(foreach m,$(__progs), \
> +			$(if $($(m)-objs)$($(m)-cxxobjs),,$(m)))
> +
> +# C executables linked based on several .o files
> +prog-cmulti	:= $(foreach m,$(__progs),\
> +		   $(if $($(m)-cxxobjs),,$(if $($(m)-objs),$(m))))
> +
> +# Object (.o) files compiled from .c files
> +prog-cobjs	:= $(sort $(foreach m,$(__progs),$($(m)-objs)))
> +
> +prog-csingle	:= $(addprefix $(obj)/,$(prog-csingle))
> +prog-cmulti	:= $(addprefix $(obj)/,$(prog-cmulti))
> +prog-cobjs	:= $(addprefix $(obj)/,$(prog-cobjs))
> +
> +#####
> +# Handle options to gcc. Support building with separate output directory
> +
> +_progc_flags   = $(PROGS_CFLAGS) \
> +                 $(PROGCFLAGS_$(basetarget).o)
> +
> +# $(objtree)/$(obj) for including generated headers from checkin source files
> +ifeq ($(KBUILD_EXTMOD),)
> +ifdef building_out_of_srctree
> +_progc_flags   += -I $(objtree)/$(obj)
> +endif
> +endif
> +
> +progc_flags    = -Wp,-MD,$(depfile) $(_progc_flags)
> +
> +# Create executable from a single .c file
> +# prog-csingle -> Executable
> +quiet_cmd_prog-csingle 	= CC  $@
> +      cmd_prog-csingle	= $(CC) $(progc_flags) $(PROGS_LDFLAGS) -o $@ $< \
> +		$(PROGS_LDLIBS) $(PROGLDLIBS_$(@F))
> +$(prog-csingle): $(obj)/%: $(src)/%.c FORCE
> +	$(call if_changed_dep,prog-csingle)
> +
> +# Link an executable based on list of .o files, all plain c
> +# prog-cmulti -> executable
> +quiet_cmd_prog-cmulti	= LD  $@
> +      cmd_prog-cmulti	= $(CC) $(progc_flags) $(PROGS_LDFLAGS) -o $@ \
> +			  $(addprefix $(obj)/,$($(@F)-objs)) \
> +			  $(PROGS_LDLIBS) $(PROGLDLIBS_$(@F))
> +$(prog-cmulti): $(prog-cobjs) FORCE
> +	$(call if_changed,prog-cmulti)
> +$(call multi_depend, $(prog-cmulti), , -objs)
> +
> +# Create .o file from a single .c file
> +# prog-cobjs -> .o
> +quiet_cmd_prog-cobjs	= CC  $@
> +      cmd_prog-cobjs	= $(CC) $(progc_flags) -c -o $@ $<
> +$(prog-cobjs): $(obj)/%.o: $(src)/%.c FORCE
> +	$(call if_changed_dep,prog-cobjs)
> 

^ permalink raw reply

* Re: [PATCH bpf-next 08/11] samples: bpf: makefile: base progs build on makefile.progs
From: Yonghong Song @ 2019-09-13 21:41 UTC (permalink / raw)
  To: Ivan Khoronzhuk, ast@kernel.org, daniel@iogearbox.net,
	davem@davemloft.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org, clang-built-linux@googlegroups.com
In-Reply-To: <20190910103830.20794-9-ivan.khoronzhuk@linaro.org>



On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
> The main reason for that - HOSTCC and CC have different aims.
> It was tested for arm cross compilation, based on linaro toolchain,
> but should work for others.
> 
> In order to split cross compilation (CC) with host build (HOSTCC),
> lets base bpf samples on Makefile.progs. It allows to cross-compile
> samples/bpf progs with CC while auxialry tools running on host built
> with HOSTCC.

I got a compilation failure with the following error

$ make samples/bpf/
   ...
   LD  samples/bpf/hbm
   CC      samples/bpf/syscall_nrs.s
gcc: error: -pg and -fomit-frame-pointer are incompatible
make[2]: *** [samples/bpf/syscall_nrs.s] Error 1
make[1]: *** [samples/bpf/] Error 2
make: *** [sub-make] Error 2

Could you take a look?

> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>   samples/bpf/Makefile | 138 +++++++++++++++++++++++--------------------
>   1 file changed, 73 insertions(+), 65 deletions(-)
> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index f5dbf3d0c5f3..625a71f2e9d2 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -4,55 +4,53 @@ BPF_SAMPLES_PATH ?= $(abspath $(srctree)/$(src))
>   TOOLS_PATH := $(BPF_SAMPLES_PATH)/../../tools
>   
>   # List of programs to build
> -hostprogs-y := test_lru_dist
> -hostprogs-y += sock_example
> -hostprogs-y += fds_example
> -hostprogs-y += sockex1
> -hostprogs-y += sockex2
> -hostprogs-y += sockex3
> -hostprogs-y += tracex1
> -hostprogs-y += tracex2
> -hostprogs-y += tracex3
> -hostprogs-y += tracex4
> -hostprogs-y += tracex5
> -hostprogs-y += tracex6
> -hostprogs-y += tracex7
> -hostprogs-y += test_probe_write_user
> -hostprogs-y += trace_output
> -hostprogs-y += lathist
> -hostprogs-y += offwaketime
> -hostprogs-y += spintest
> -hostprogs-y += map_perf_test
> -hostprogs-y += test_overhead
> -hostprogs-y += test_cgrp2_array_pin
> -hostprogs-y += test_cgrp2_attach
> -hostprogs-y += test_cgrp2_sock
> -hostprogs-y += test_cgrp2_sock2
> -hostprogs-y += xdp1
> -hostprogs-y += xdp2
> -hostprogs-y += xdp_router_ipv4
> -hostprogs-y += test_current_task_under_cgroup
> -hostprogs-y += trace_event
> -hostprogs-y += sampleip
> -hostprogs-y += tc_l2_redirect
> -hostprogs-y += lwt_len_hist
> -hostprogs-y += xdp_tx_iptunnel
> -hostprogs-y += test_map_in_map
> -hostprogs-y += per_socket_stats_example
> -hostprogs-y += xdp_redirect
> -hostprogs-y += xdp_redirect_map
> -hostprogs-y += xdp_redirect_cpu
> -hostprogs-y += xdp_monitor
> -hostprogs-y += xdp_rxq_info
> -hostprogs-y += syscall_tp
> -hostprogs-y += cpustat
> -hostprogs-y += xdp_adjust_tail
> -hostprogs-y += xdpsock
> -hostprogs-y += xdp_fwd
> -hostprogs-y += task_fd_query
> -hostprogs-y += xdp_sample_pkts
> -hostprogs-y += ibumad
> -hostprogs-y += hbm
> +progs-y := test_lru_dist
> +progs-y += sock_example
> +progs-y += fds_example
> +progs-y += sockex1
> +progs-y += sockex2
> +progs-y += sockex3
> +progs-y += tracex1
> +progs-y += tracex2
> +progs-y += tracex3
> +progs-y += tracex4
> +progs-y += tracex5
> +progs-y += tracex6
> +progs-y += tracex7
> +progs-y += test_probe_write_user
> +progs-y += trace_output
> +progs-y += lathist
> +progs-y += offwaketime
> +progs-y += spintest
> +progs-y += map_perf_test
> +progs-y += test_overhead
> +progs-y += test_cgrp2_array_pin
> +progs-y += test_cgrp2_attach
> +progs-y += test_cgrp2_sock
> +progs-y += test_cgrp2_sock2
> +progs-y += xdp1
> +progs-y += xdp2
> +progs-y += xdp_router_ipv4
> +progs-y += test_current_task_under_cgroup
> +progs-y += trace_event
> +progs-y += sampleip
> +progs-y += tc_l2_redirect
> +progs-y += lwt_len_hist
> +progs-y += xdp_tx_iptunnel
> +progs-y += test_map_in_map
> +progs-y += xdp_redirect_map
> +progs-y += xdp_redirect_cpu
> +progs-y += xdp_monitor
> +progs-y += xdp_rxq_info
> +progs-y += syscall_tp
> +progs-y += cpustat
> +progs-y += xdp_adjust_tail
> +progs-y += xdpsock
> +progs-y += xdp_fwd
> +progs-y += task_fd_query
> +progs-y += xdp_sample_pkts
> +progs-y += ibumad
> +progs-y += hbm
>   
>   # Libbpf dependencies
>   LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
> @@ -111,7 +109,7 @@ ibumad-objs := bpf_load.o ibumad_user.o $(TRACE_HELPERS)
>   hbm-objs := bpf_load.o hbm.o $(CGROUP_HELPERS)
>   
>   # Tell kbuild to always build the programs
> -always := $(hostprogs-y)
> +always := $(progs-y)
>   always += sockex1_kern.o
>   always += sockex2_kern.o
>   always += sockex3_kern.o
> @@ -170,21 +168,6 @@ always += ibumad_kern.o
>   always += hbm_out_kern.o
>   always += hbm_edt_kern.o
>   
> -KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
> -KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/bpf/
> -KBUILD_HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/
> -KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/ -I$(srctree)/tools/include
> -KBUILD_HOSTCFLAGS += -I$(srctree)/tools/perf
> -
> -HOSTCFLAGS_bpf_load.o += -Wno-unused-variable
> -
> -KBUILD_HOSTLDLIBS		+= $(LIBBPF) -lelf
> -HOSTLDLIBS_tracex4		+= -lrt
> -HOSTLDLIBS_trace_output	+= -lrt
> -HOSTLDLIBS_map_perf_test	+= -lrt
> -HOSTLDLIBS_test_overhead	+= -lrt
> -HOSTLDLIBS_xdpsock		+= -pthread
> -
>   # Strip all expet -D options needed to handle linux headers
>   # for arm it's __LINUX_ARM_ARCH__ and potentially others fork vars
>   D_OPTIONS = $(shell echo "$(KBUILD_CFLAGS) " | sed 's/[[:blank:]]/\n/g' | \
> @@ -194,6 +177,29 @@ ifeq ($(ARCH), arm)
>   CLANG_EXTRA_CFLAGS := $(D_OPTIONS)
>   endif
>   
> +ccflags-y += -I$(objtree)/usr/include
> +ccflags-y += -I$(srctree)/tools/lib/bpf/
> +ccflags-y += -I$(srctree)/tools/testing/selftests/bpf/
> +ccflags-y += -I$(srctree)/tools/lib/
> +ccflags-y += -I$(srctree)/tools/include
> +ccflags-y += -I$(srctree)/tools/perf
> +ccflags-y += $(D_OPTIONS)
> +ccflags-y += -Wall
> +ccflags-y += -fomit-frame-pointer
> +ccflags-y += -Wmissing-prototypes
> +ccflags-y += -Wstrict-prototypes
> +
> +PROGS_CFLAGS := $(ccflags-y)
> +
> +PROGCFLAGS_bpf_load.o += -Wno-unused-variable
> +
> +PROGS_LDLIBS			:= $(LIBBPF) -lelf
> +PROGLDLIBS_tracex4		+= -lrt
> +PROGLDLIBS_trace_output		+= -lrt
> +PROGLDLIBS_map_perf_test	+= -lrt
> +PROGLDLIBS_test_overhead	+= -lrt
> +PROGLDLIBS_xdpsock		+= -pthread
> +
>   # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
>   #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
>   LLC ?= llc
> @@ -284,6 +290,8 @@ $(obj)/hbm_out_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
>   $(obj)/hbm.o: $(src)/hbm.h
>   $(obj)/hbm_edt_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
>   
> +-include $(BPF_SAMPLES_PATH)/Makefile.prog
> +
>   # asm/sysreg.h - inline assembly used by it is incompatible with llvm.
>   # But, there is no easy way to fix it, so just exclude it since it is
>   # useless for BPF samples.
> 

^ permalink raw reply

* Re: [PATCH bpf-next 10/11] libbpf: makefile: add C/CXX/LDFLAGS to libbpf.so and test_libpf targets
From: Yonghong Song @ 2019-09-13 21:43 UTC (permalink / raw)
  To: Ivan Khoronzhuk, ast@kernel.org, daniel@iogearbox.net,
	davem@davemloft.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org, clang-built-linux@googlegroups.com
In-Reply-To: <20190910103830.20794-11-ivan.khoronzhuk@linaro.org>



On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
> In case of LDFLAGS and EXTRA_CC/CXX flags there is no way to pass them
> correctly to build command, for instance when --sysroot is used or
> external libraries are used, like -lelf, wich can be absent in
> toolchain. This is used for samples/bpf cross-compiling allowing to
> get elf lib from sysroot.
> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>   samples/bpf/Makefile   |  8 +++++++-
>   tools/lib/bpf/Makefile | 11 ++++++++---
>   2 files changed, 15 insertions(+), 4 deletions(-)

Could you separate this patch into two?
One of libbpf and another for samples.

The subject 'libbpf: ...' is not entirely accurate.

> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 79c9aa41832e..4edc5232cfc1 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -186,6 +186,10 @@ ccflags-y += -I$(srctree)/tools/perf
>   ccflags-y += $(D_OPTIONS)
>   ccflags-y += -Wall
>   ccflags-y += -fomit-frame-pointer
> +
> +EXTRA_CXXFLAGS := $(ccflags-y)
> +
> +# options not valid for C++
>   ccflags-y += -Wmissing-prototypes
>   ccflags-y += -Wstrict-prototypes
>   
> @@ -252,7 +256,9 @@ clean:
>   
>   $(LIBBPF): FORCE
>   # Fix up variables inherited from Kbuild that tools/ build system won't like
> -	$(MAKE) -C $(dir $@) RM='rm -rf' LDFLAGS= srctree=$(BPF_SAMPLES_PATH)/../../ O=
> +	$(MAKE) -C $(dir $@) RM='rm -rf' EXTRA_CFLAGS="$(PROGS_CFLAGS)" \
> +		EXTRA_CXXFLAGS="$(EXTRA_CXXFLAGS)" LDFLAGS=$(PROGS_LDFLAGS) \
> +		srctree=$(BPF_SAMPLES_PATH)/../../ O=
>   
>   $(obj)/syscall_nrs.h:	$(obj)/syscall_nrs.s FORCE
>   	$(call filechk,offsets,__SYSCALL_NRS_H__)
> diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
> index c6f94cffe06e..bccfa556ef4e 100644
> --- a/tools/lib/bpf/Makefile
> +++ b/tools/lib/bpf/Makefile
> @@ -94,6 +94,10 @@ else
>     CFLAGS := -g -Wall
>   endif
>   
> +ifdef EXTRA_CXXFLAGS
> +  CXXFLAGS := $(EXTRA_CXXFLAGS)
> +endif
> +
>   ifeq ($(feature-libelf-mmap), 1)
>     override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT
>   endif
> @@ -176,8 +180,9 @@ $(BPF_IN): force elfdep bpfdep
>   $(OUTPUT)libbpf.so: $(OUTPUT)libbpf.so.$(LIBBPF_VERSION)
>   
>   $(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $(BPF_IN)
> -	$(QUIET_LINK)$(CC) --shared -Wl,-soname,libbpf.so.$(LIBBPF_MAJOR_VERSION) \
> -				    -Wl,--version-script=$(VERSION_SCRIPT) $^ -lelf -o $@
> +	$(QUIET_LINK)$(CC) $(LDFLAGS) \
> +		--shared -Wl,-soname,libbpf.so.$(LIBBPF_MAJOR_VERSION) \
> +		-Wl,--version-script=$(VERSION_SCRIPT) $^ -lelf -o $@
>   	@ln -sf $(@F) $(OUTPUT)libbpf.so
>   	@ln -sf $(@F) $(OUTPUT)libbpf.so.$(LIBBPF_MAJOR_VERSION)
>   
> @@ -185,7 +190,7 @@ $(OUTPUT)libbpf.a: $(BPF_IN)
>   	$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
>   
>   $(OUTPUT)test_libbpf: test_libbpf.cpp $(OUTPUT)libbpf.a
> -	$(QUIET_LINK)$(CXX) $(INCLUDES) $^ -lelf -o $@
> +	$(QUIET_LINK)$(CXX) $(CXXFLAGS) $(LDFLAGS) $(INCLUDES) $^ -lelf -o $@
>   
>   $(OUTPUT)libbpf.pc:
>   	$(QUIET_GEN)sed -e "s|@PREFIX@|$(prefix)|" \
> 

^ permalink raw reply

* Re: [PATCH bpf-next 11/11] samples: bpf: makefile: add sysroot support
From: Yonghong Song @ 2019-09-13 21:45 UTC (permalink / raw)
  To: Ivan Khoronzhuk, ast@kernel.org, daniel@iogearbox.net,
	davem@davemloft.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org, clang-built-linux@googlegroups.com
In-Reply-To: <20190910103830.20794-12-ivan.khoronzhuk@linaro.org>



On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
> Basically it only enables that was added by previous couple fixes.
> For sure, just make tools/include to be included after sysroot
> headers.
> 
> export ARCH=arm
> export CROSS_COMPILE=arm-linux-gnueabihf-
> make samples/bpf/ SYSROOT="path/to/sysroot"
> 
> Sysroot contains correct libs installed and its headers ofc.
> Useful when working with NFC or virtual machine.
> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>   samples/bpf/Makefile   |  5 +++++
>   samples/bpf/README.rst | 10 ++++++++++
>   2 files changed, 15 insertions(+)
> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 4edc5232cfc1..68ba78d1dbbe 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -177,6 +177,11 @@ ifeq ($(ARCH), arm)
>   CLANG_EXTRA_CFLAGS := $(D_OPTIONS)
>   endif
>   
> +ifdef SYSROOT
> +ccflags-y += --sysroot=${SYSROOT}
> +PROGS_LDFLAGS := -L${SYSROOT}/usr/lib
> +endif
> +
>   ccflags-y += -I$(objtree)/usr/include
>   ccflags-y += -I$(srctree)/tools/lib/bpf/
>   ccflags-y += -I$(srctree)/tools/testing/selftests/bpf/
> diff --git a/samples/bpf/README.rst b/samples/bpf/README.rst
> index 5f27e4faca50..786d0ab98e8a 100644
> --- a/samples/bpf/README.rst
> +++ b/samples/bpf/README.rst
> @@ -74,3 +74,13 @@ samples for the cross target.
>   export ARCH=arm64
>   export CROSS_COMPILE="aarch64-linux-gnu-"
>   make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
> +
> +If need to use environment of target board (headers and libs), the SYSROOT
> +also can be set, pointing on FS of target board:
> +
> +export ARCH=arm64
> +export CROSS_COMPILE="aarch64-linux-gnu-"
> +make samples/bpf/ SYSROOT=~/some_sdk/linux-devkit/sysroots/aarch64-linux-gnu
> +
> +Setting LLC and CLANG is not necessarily if it's installed on HOST and have
> +in its targets appropriate arch triple (usually it has several arches).

You have very good description about how to build and test in cover 
letter. Could you include those instructions here as well? This will
help keep a record so later people can try/test if needed.

^ permalink raw reply

* Re: [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO
From: Neal Cardwell @ 2019-09-13 21:53 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Thomas Higdon, netdev@vger.kernel.org, Jonathan Lemon, Dave Jones,
	Eric Dumazet, Dave Taht, Soheil Hassas Yeganeh
In-Reply-To: <CAK6E8=ddxo+yg2tTiZm5YEbfPkeVkeZOGwB33+Qfb4Qfj4yDJA@mail.gmail.com>

On Fri, Sep 13, 2019 at 5:29 PM Yuchung Cheng <ycheng@google.com> wrote:
> > What if the comment is shortened up to fit in 80 columns and the units
> > (bytes) are added, something like:
> >
> >         __u32   tcpi_snd_wnd;        /* peer's advertised recv window (bytes) */
> just a thought: will tcpi_peer_rcv_wnd be more self-explanatory?

Good suggestion. I'm on the fence about that one. By itself, I agree
tcpi_peer_rcv_wnd sounds much more clear. But tcpi_snd_wnd has the
virtue of matching both the kernel code (tp->snd_wnd) and RFC 793
(SND.WND). So they both have pros and cons. Maybe someone else feels
more strongly one way or the other.

neal

^ permalink raw reply

* Re: [PATCH bpf-next 07/11] samples: bpf: add makefile.prog for separate CC build
From: Ivan Khoronzhuk @ 2019-09-13 22:14 UTC (permalink / raw)
  To: Yonghong Song
  Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
	jakub.kicinski@netronome.com, hawk@kernel.org,
	john.fastabend@gmail.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	clang-built-linux@googlegroups.com
In-Reply-To: <1720c5a5-5c64-46a3-be2f-56b59614f82a@fb.com>

On Fri, Sep 13, 2019 at 09:33:58PM +0000, Yonghong Song wrote:
>
>
>On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
>> The makefile.prog is added only, will be used in sample/bpf/Makefile
>> later in order to switch cross-compiling on CC from HOSTCC.
>>
>> The HOSTCC is supposed to build binaries and tools running on the host
>> afterwards, in order to simplify build or so, like "fixdep" or else.
>> In case of cross compiling "fixdep" is executed on host when the rest
>> samples should run on target arch. In order to build binaries for
>> target arch with CC and tools running on host with HOSTCC, lets add
>> Makefile.prog for simplicity, having definition and routines similar
>> to ones, used in script/Makefile.host. This allows later add
>> cross-compilation to samples/bpf with minimum changes.
>
>So this is really Makefile.host or Makefile.user, right?
It's cut and modified version of Makefile.host

>In BPF, 'prog' can refers to user prog or bpf prog.
>To avoid ambiguity, maybe Makefile.host?
Makefile.target? as target = host not always.
Makefile.user also looks ok, but doesn't contain target,
maybe "targetuser", but bpf is also target...
Initially I was thinking about Makefile.targetprog, but it looks too long.

>
>>
>> Makefile.prog contains only stuff needed for samples/bpf, potentially
>> can be reused and extended for other prog sets later and now needed
>
>What do you mean 'extended for other prog sets'? I am wondering whether
Another kind of samples...with c++ for instance.

>we could just include 'scripts/Makefile.host'? How hard it is?
It's bound to HOSTCC and it's environment. It is included by default and
was used before this patchset, blocking cross complication.

>
>> only for unblocking tricky samples/bpf cross compilation.
>>
>> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>> ---
>>   samples/bpf/Makefile.prog | 77 +++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 77 insertions(+)
>>   create mode 100644 samples/bpf/Makefile.prog
>>
>> diff --git a/samples/bpf/Makefile.prog b/samples/bpf/Makefile.prog
>> new file mode 100644
>> index 000000000000..3781999b9193
>> --- /dev/null
>> +++ b/samples/bpf/Makefile.prog
>> @@ -0,0 +1,77 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +# ==========================================================================
>> +# Building binaries on the host system
>> +# Binaries are not used during the compilation of the kernel, and intendent
>> +# to be build for target board, target board can be host ofc. Added to build
>> +# binaries to run not on host system.
>> +#
>> +# Only C is supported, but can be extended for C++.
>
>The above comment is not needed.
will drop it.

>samples/bpf/ only have C now. I am wondering whether your below scripts
>can be simplified, e.g., removing cxxobjs.
I think yes, will try

>
>> +#
>> +# Sample syntax (see Documentation/kbuild/makefiles.rst for reference)
>> +# progs-y := xsk_example
>> +# Will compile xdpsock_example.c and create an executable named xsk_example
>> +#
>> +# progs-y    := xdpsock
>> +# xdpsock-objs := xdpsock_1.o xdpsock_2.o
>> +# Will compile xdpsock_1.c and xdpsock_2.c, and then link the executable
>> +# xdpsock, based on xdpsock_1.o and xdpsock_2.o
>> +#
>> +# Inherited from scripts/Makefile.host
>> +#
>> +__progs := $(sort $(progs-y))
>> +
>> +# C code
>> +# Executables compiled from a single .c file
>> +prog-csingle	:= $(foreach m,$(__progs), \
>> +			$(if $($(m)-objs)$($(m)-cxxobjs),,$(m)))
>> +
>> +# C executables linked based on several .o files
>> +prog-cmulti	:= $(foreach m,$(__progs),\
>> +		   $(if $($(m)-cxxobjs),,$(if $($(m)-objs),$(m))))
>> +
>> +# Object (.o) files compiled from .c files
>> +prog-cobjs	:= $(sort $(foreach m,$(__progs),$($(m)-objs)))
>> +
>> +prog-csingle	:= $(addprefix $(obj)/,$(prog-csingle))
>> +prog-cmulti	:= $(addprefix $(obj)/,$(prog-cmulti))
>> +prog-cobjs	:= $(addprefix $(obj)/,$(prog-cobjs))
>> +
>> +#####
>> +# Handle options to gcc. Support building with separate output directory
>> +
>> +_progc_flags   = $(PROGS_CFLAGS) \
>> +                 $(PROGCFLAGS_$(basetarget).o)
>> +
>> +# $(objtree)/$(obj) for including generated headers from checkin source files
>> +ifeq ($(KBUILD_EXTMOD),)
>> +ifdef building_out_of_srctree
>> +_progc_flags   += -I $(objtree)/$(obj)
>> +endif
>> +endif
>> +
>> +progc_flags    = -Wp,-MD,$(depfile) $(_progc_flags)
>> +
>> +# Create executable from a single .c file
>> +# prog-csingle -> Executable
>> +quiet_cmd_prog-csingle 	= CC  $@
>> +      cmd_prog-csingle	= $(CC) $(progc_flags) $(PROGS_LDFLAGS) -o $@ $< \
>> +		$(PROGS_LDLIBS) $(PROGLDLIBS_$(@F))
>> +$(prog-csingle): $(obj)/%: $(src)/%.c FORCE
>> +	$(call if_changed_dep,prog-csingle)
>> +
>> +# Link an executable based on list of .o files, all plain c
>> +# prog-cmulti -> executable
>> +quiet_cmd_prog-cmulti	= LD  $@
>> +      cmd_prog-cmulti	= $(CC) $(progc_flags) $(PROGS_LDFLAGS) -o $@ \
>> +			  $(addprefix $(obj)/,$($(@F)-objs)) \
>> +			  $(PROGS_LDLIBS) $(PROGLDLIBS_$(@F))
>> +$(prog-cmulti): $(prog-cobjs) FORCE
>> +	$(call if_changed,prog-cmulti)
>> +$(call multi_depend, $(prog-cmulti), , -objs)
>> +
>> +# Create .o file from a single .c file
>> +# prog-cobjs -> .o
>> +quiet_cmd_prog-cobjs	= CC  $@
>> +      cmd_prog-cobjs	= $(CC) $(progc_flags) -c -o $@ $<
>> +$(prog-cobjs): $(obj)/%.o: $(src)/%.c FORCE
>> +	$(call if_changed_dep,prog-cobjs)
>>

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply

* Re: [PATCH bpf-next 08/11] samples: bpf: makefile: base progs build on makefile.progs
From: Ivan Khoronzhuk @ 2019-09-13 22:25 UTC (permalink / raw)
  To: Yonghong Song
  Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
	jakub.kicinski@netronome.com, hawk@kernel.org,
	john.fastabend@gmail.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	clang-built-linux@googlegroups.com
In-Reply-To: <dd4cd83f-7e35-ad07-8a53-d34c13c074a5@fb.com>

On Fri, Sep 13, 2019 at 09:41:25PM +0000, Yonghong Song wrote:
>
>
>On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
>> The main reason for that - HOSTCC and CC have different aims.
>> It was tested for arm cross compilation, based on linaro toolchain,
>> but should work for others.
>>
>> In order to split cross compilation (CC) with host build (HOSTCC),
>> lets base bpf samples on Makefile.progs. It allows to cross-compile
>> samples/bpf progs with CC while auxialry tools running on host built
>> with HOSTCC.
>
>I got a compilation failure with the following error
>
>$ make samples/bpf/
>   ...
>   LD  samples/bpf/hbm
>   CC      samples/bpf/syscall_nrs.s
>gcc: error: -pg and -fomit-frame-pointer are incompatible
>make[2]: *** [samples/bpf/syscall_nrs.s] Error 1
>make[1]: *** [samples/bpf/] Error 2
>make: *** [sub-make] Error 2
>
>Could you take a look?
Yes, sure.
Ilias also observer this, interesting that on my setup for cross and
native build on arm and arm64 doesn't have this error. Now I see log
and know how to proceed, I will fix it, maybe by using another var
for cflags.

Despite of it, just to be sure in order to avoid smth like this at least
for native build, I will add one more patch like:

"
    samples: bpf: makefile: don't use host cflags when cross compile

    While compile natively, the hosts cflags and ldflags are equal to ones
    used from HOSTCFLAGS and HOSTLDFLAGS. When cross compiling it should
    have own, used for target arch. While verification, for arm, arm64 and
    x86_64 the following flags were used alsways:

    -Wall
    -O2
    -fomit-frame-pointer
    -Wmissing-prototypes
    -Wstrict-prototypes

    So, add them as they were verified and used before adding
    Makefile.progs, but anyway limit it only for cross compile options as
    for host can be some configurations when another options can be used,
    So, for host arch samples left all as is, it allows to avoid potential
    option mistmatches for existent environments.
"

+ifdef CROSS_COMPILE
+ccflags-y += -Wall
+ccflags-y += -O2
+ccflags-y += -fomit-frame-pointer
+ccflags-y += -Wmissing-prototypes
+ccflags-y += -Wstrict-prototypes
+else
+ccflags-y += $(KBUILD_HOSTCFLAGS) $(HOST_EXTRACFLAGS)
+PROGS_LDLIBS := $(KBUILD_HOSTLDLIBS)
+endif

>
>>
>> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>> ---
>>   samples/bpf/Makefile | 138 +++++++++++++++++++++++--------------------
>>   1 file changed, 73 insertions(+), 65 deletions(-)
>>
>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
>> index f5dbf3d0c5f3..625a71f2e9d2 100644
>> --- a/samples/bpf/Makefile
>> +++ b/samples/bpf/Makefile
>> @@ -4,55 +4,53 @@ BPF_SAMPLES_PATH ?= $(abspath $(srctree)/$(src))
>>   TOOLS_PATH := $(BPF_SAMPLES_PATH)/../../tools
>>
>>   # List of programs to build
>> -hostprogs-y := test_lru_dist
>> -hostprogs-y += sock_example
>> -hostprogs-y += fds_example
>> -hostprogs-y += sockex1
>> -hostprogs-y += sockex2
>> -hostprogs-y += sockex3
>> -hostprogs-y += tracex1
>> -hostprogs-y += tracex2
>> -hostprogs-y += tracex3
>> -hostprogs-y += tracex4
>> -hostprogs-y += tracex5
>> -hostprogs-y += tracex6
>> -hostprogs-y += tracex7
>> -hostprogs-y += test_probe_write_user
>> -hostprogs-y += trace_output
>> -hostprogs-y += lathist
>> -hostprogs-y += offwaketime
>> -hostprogs-y += spintest
>> -hostprogs-y += map_perf_test
>> -hostprogs-y += test_overhead
>> -hostprogs-y += test_cgrp2_array_pin
>> -hostprogs-y += test_cgrp2_attach
>> -hostprogs-y += test_cgrp2_sock
>> -hostprogs-y += test_cgrp2_sock2
>> -hostprogs-y += xdp1
>> -hostprogs-y += xdp2
>> -hostprogs-y += xdp_router_ipv4
>> -hostprogs-y += test_current_task_under_cgroup
>> -hostprogs-y += trace_event
>> -hostprogs-y += sampleip
>> -hostprogs-y += tc_l2_redirect
>> -hostprogs-y += lwt_len_hist
>> -hostprogs-y += xdp_tx_iptunnel
>> -hostprogs-y += test_map_in_map
>> -hostprogs-y += per_socket_stats_example
>> -hostprogs-y += xdp_redirect
>> -hostprogs-y += xdp_redirect_map
>> -hostprogs-y += xdp_redirect_cpu
>> -hostprogs-y += xdp_monitor
>> -hostprogs-y += xdp_rxq_info
>> -hostprogs-y += syscall_tp
>> -hostprogs-y += cpustat
>> -hostprogs-y += xdp_adjust_tail
>> -hostprogs-y += xdpsock
>> -hostprogs-y += xdp_fwd
>> -hostprogs-y += task_fd_query
>> -hostprogs-y += xdp_sample_pkts
>> -hostprogs-y += ibumad
>> -hostprogs-y += hbm
>> +progs-y := test_lru_dist
>> +progs-y += sock_example
>> +progs-y += fds_example
>> +progs-y += sockex1
>> +progs-y += sockex2
>> +progs-y += sockex3
>> +progs-y += tracex1
>> +progs-y += tracex2
>> +progs-y += tracex3
>> +progs-y += tracex4
>> +progs-y += tracex5
>> +progs-y += tracex6
>> +progs-y += tracex7
>> +progs-y += test_probe_write_user
>> +progs-y += trace_output
>> +progs-y += lathist
>> +progs-y += offwaketime
>> +progs-y += spintest
>> +progs-y += map_perf_test
>> +progs-y += test_overhead
>> +progs-y += test_cgrp2_array_pin
>> +progs-y += test_cgrp2_attach
>> +progs-y += test_cgrp2_sock
>> +progs-y += test_cgrp2_sock2
>> +progs-y += xdp1
>> +progs-y += xdp2
>> +progs-y += xdp_router_ipv4
>> +progs-y += test_current_task_under_cgroup
>> +progs-y += trace_event
>> +progs-y += sampleip
>> +progs-y += tc_l2_redirect
>> +progs-y += lwt_len_hist
>> +progs-y += xdp_tx_iptunnel
>> +progs-y += test_map_in_map
>> +progs-y += xdp_redirect_map
>> +progs-y += xdp_redirect_cpu
>> +progs-y += xdp_monitor
>> +progs-y += xdp_rxq_info
>> +progs-y += syscall_tp
>> +progs-y += cpustat
>> +progs-y += xdp_adjust_tail
>> +progs-y += xdpsock
>> +progs-y += xdp_fwd
>> +progs-y += task_fd_query
>> +progs-y += xdp_sample_pkts
>> +progs-y += ibumad
>> +progs-y += hbm
>>
>>   # Libbpf dependencies
>>   LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
>> @@ -111,7 +109,7 @@ ibumad-objs := bpf_load.o ibumad_user.o $(TRACE_HELPERS)
>>   hbm-objs := bpf_load.o hbm.o $(CGROUP_HELPERS)
>>
>>   # Tell kbuild to always build the programs
>> -always := $(hostprogs-y)
>> +always := $(progs-y)
>>   always += sockex1_kern.o
>>   always += sockex2_kern.o
>>   always += sockex3_kern.o
>> @@ -170,21 +168,6 @@ always += ibumad_kern.o
>>   always += hbm_out_kern.o
>>   always += hbm_edt_kern.o
>>
>> -KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
>> -KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/bpf/
>> -KBUILD_HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/
>> -KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/ -I$(srctree)/tools/include
>> -KBUILD_HOSTCFLAGS += -I$(srctree)/tools/perf
>> -
>> -HOSTCFLAGS_bpf_load.o += -Wno-unused-variable
>> -
>> -KBUILD_HOSTLDLIBS		+= $(LIBBPF) -lelf
>> -HOSTLDLIBS_tracex4		+= -lrt
>> -HOSTLDLIBS_trace_output	+= -lrt
>> -HOSTLDLIBS_map_perf_test	+= -lrt
>> -HOSTLDLIBS_test_overhead	+= -lrt
>> -HOSTLDLIBS_xdpsock		+= -pthread
>> -
>>   # Strip all expet -D options needed to handle linux headers
>>   # for arm it's __LINUX_ARM_ARCH__ and potentially others fork vars
>>   D_OPTIONS = $(shell echo "$(KBUILD_CFLAGS) " | sed 's/[[:blank:]]/\n/g' | \
>> @@ -194,6 +177,29 @@ ifeq ($(ARCH), arm)
>>   CLANG_EXTRA_CFLAGS := $(D_OPTIONS)
>>   endif
>>
>> +ccflags-y += -I$(objtree)/usr/include
>> +ccflags-y += -I$(srctree)/tools/lib/bpf/
>> +ccflags-y += -I$(srctree)/tools/testing/selftests/bpf/
>> +ccflags-y += -I$(srctree)/tools/lib/
>> +ccflags-y += -I$(srctree)/tools/include
>> +ccflags-y += -I$(srctree)/tools/perf
>> +ccflags-y += $(D_OPTIONS)
>> +ccflags-y += -Wall
>> +ccflags-y += -fomit-frame-pointer
>> +ccflags-y += -Wmissing-prototypes
>> +ccflags-y += -Wstrict-prototypes
>> +
>> +PROGS_CFLAGS := $(ccflags-y)
>> +
>> +PROGCFLAGS_bpf_load.o += -Wno-unused-variable
>> +
>> +PROGS_LDLIBS			:= $(LIBBPF) -lelf
>> +PROGLDLIBS_tracex4		+= -lrt
>> +PROGLDLIBS_trace_output		+= -lrt
>> +PROGLDLIBS_map_perf_test	+= -lrt
>> +PROGLDLIBS_test_overhead	+= -lrt
>> +PROGLDLIBS_xdpsock		+= -pthread
>> +
>>   # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
>>   #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
>>   LLC ?= llc
>> @@ -284,6 +290,8 @@ $(obj)/hbm_out_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
>>   $(obj)/hbm.o: $(src)/hbm.h
>>   $(obj)/hbm_edt_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
>>
>> +-include $(BPF_SAMPLES_PATH)/Makefile.prog
>> +
>>   # asm/sysreg.h - inline assembly used by it is incompatible with llvm.
>>   # But, there is no easy way to fix it, so just exclude it since it is
>>   # useless for BPF samples.
>>

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply

* Re: [PATCH bpf-next 10/11] libbpf: makefile: add C/CXX/LDFLAGS to libbpf.so and test_libpf targets
From: Ivan Khoronzhuk @ 2019-09-13 22:33 UTC (permalink / raw)
  To: Yonghong Song
  Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
	jakub.kicinski@netronome.com, hawk@kernel.org,
	john.fastabend@gmail.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	clang-built-linux@googlegroups.com
In-Reply-To: <0ad42019-2614-b70c-f93e-527c136bba83@fb.com>

On Fri, Sep 13, 2019 at 09:43:22PM +0000, Yonghong Song wrote:
>
>
>On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
>> In case of LDFLAGS and EXTRA_CC/CXX flags there is no way to pass them
>> correctly to build command, for instance when --sysroot is used or
>> external libraries are used, like -lelf, wich can be absent in
>> toolchain. This is used for samples/bpf cross-compiling allowing to
>> get elf lib from sysroot.
>>
>> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>> ---
>>   samples/bpf/Makefile   |  8 +++++++-
>>   tools/lib/bpf/Makefile | 11 ++++++++---
>>   2 files changed, 15 insertions(+), 4 deletions(-)
>
>Could you separate this patch into two?
>One of libbpf and another for samples.
>
>The subject 'libbpf: ...' is not entirely accurate.
Yes, ofc.
But there is too many patches already, but better a lot of small
changes then couple huge.

>
>>
>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
>> index 79c9aa41832e..4edc5232cfc1 100644
>> --- a/samples/bpf/Makefile
>> +++ b/samples/bpf/Makefile
>> @@ -186,6 +186,10 @@ ccflags-y += -I$(srctree)/tools/perf
>>   ccflags-y += $(D_OPTIONS)
>>   ccflags-y += -Wall
>>   ccflags-y += -fomit-frame-pointer
>> +
>> +EXTRA_CXXFLAGS := $(ccflags-y)
>> +
>> +# options not valid for C++
>>   ccflags-y += -Wmissing-prototypes
>>   ccflags-y += -Wstrict-prototypes
>>
>> @@ -252,7 +256,9 @@ clean:
>>
>>   $(LIBBPF): FORCE
>>   # Fix up variables inherited from Kbuild that tools/ build system won't like
>> -	$(MAKE) -C $(dir $@) RM='rm -rf' LDFLAGS= srctree=$(BPF_SAMPLES_PATH)/../../ O=
>> +	$(MAKE) -C $(dir $@) RM='rm -rf' EXTRA_CFLAGS="$(PROGS_CFLAGS)" \
>> +		EXTRA_CXXFLAGS="$(EXTRA_CXXFLAGS)" LDFLAGS=$(PROGS_LDFLAGS) \
>> +		srctree=$(BPF_SAMPLES_PATH)/../../ O=
>>
>>   $(obj)/syscall_nrs.h:	$(obj)/syscall_nrs.s FORCE
>>   	$(call filechk,offsets,__SYSCALL_NRS_H__)
>> diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
>> index c6f94cffe06e..bccfa556ef4e 100644
>> --- a/tools/lib/bpf/Makefile
>> +++ b/tools/lib/bpf/Makefile
>> @@ -94,6 +94,10 @@ else
>>     CFLAGS := -g -Wall
>>   endif
>>
>> +ifdef EXTRA_CXXFLAGS
>> +  CXXFLAGS := $(EXTRA_CXXFLAGS)
>> +endif
>> +
>>   ifeq ($(feature-libelf-mmap), 1)
>>     override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT
>>   endif
>> @@ -176,8 +180,9 @@ $(BPF_IN): force elfdep bpfdep
>>   $(OUTPUT)libbpf.so: $(OUTPUT)libbpf.so.$(LIBBPF_VERSION)
>>
>>   $(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $(BPF_IN)
>> -	$(QUIET_LINK)$(CC) --shared -Wl,-soname,libbpf.so.$(LIBBPF_MAJOR_VERSION) \
>> -				    -Wl,--version-script=$(VERSION_SCRIPT) $^ -lelf -o $@
>> +	$(QUIET_LINK)$(CC) $(LDFLAGS) \
>> +		--shared -Wl,-soname,libbpf.so.$(LIBBPF_MAJOR_VERSION) \
>> +		-Wl,--version-script=$(VERSION_SCRIPT) $^ -lelf -o $@
>>   	@ln -sf $(@F) $(OUTPUT)libbpf.so
>>   	@ln -sf $(@F) $(OUTPUT)libbpf.so.$(LIBBPF_MAJOR_VERSION)
>>
>> @@ -185,7 +190,7 @@ $(OUTPUT)libbpf.a: $(BPF_IN)
>>   	$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
>>
>>   $(OUTPUT)test_libbpf: test_libbpf.cpp $(OUTPUT)libbpf.a
>> -	$(QUIET_LINK)$(CXX) $(INCLUDES) $^ -lelf -o $@
>> +	$(QUIET_LINK)$(CXX) $(CXXFLAGS) $(LDFLAGS) $(INCLUDES) $^ -lelf -o $@
>>
>>   $(OUTPUT)libbpf.pc:
>>   	$(QUIET_GEN)sed -e "s|@PREFIX@|$(prefix)|" \
>>

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply

* Re: [bpf-next,v3] samples: bpf: add max_pckt_size option at xdp_adjust_tail
From: Yonghong Song @ 2019-09-13 22:34 UTC (permalink / raw)
  To: Daniel T. Lee, Daniel Borkmann, Alexei Starovoitov
  Cc: netdev@vger.kernel.org, bpf@vger.kernel.org
In-Reply-To: <20190911190218.22628-1-danieltimlee@gmail.com>



On 9/11/19 8:02 PM, Daniel T. Lee wrote:
> Currently, at xdp_adjust_tail_kern.c, MAX_PCKT_SIZE is limited
> to 600. To make this size flexible, a new map 'pcktsz' is added.
> 
> By updating new packet size to this map from the userland,
> xdp_adjust_tail_kern.o will use this value as a new max_pckt_size.
> 
> If no '-P <MAX_PCKT_SIZE>' option is used, the size of maximum packet
> will be 600 as a default.
> 
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> 
> ---
> Changes in v2:
>      - Change the helper to fetch map from 'bpf_map__next' to
>      'bpf_object__find_map_fd_by_name'.
>   
>   samples/bpf/xdp_adjust_tail_kern.c | 23 +++++++++++++++++++----
>   samples/bpf/xdp_adjust_tail_user.c | 28 ++++++++++++++++++++++------
>   2 files changed, 41 insertions(+), 10 deletions(-)
> 
> diff --git a/samples/bpf/xdp_adjust_tail_kern.c b/samples/bpf/xdp_adjust_tail_kern.c
> index 411fdb21f8bc..d6d84ffe6a7a 100644
> --- a/samples/bpf/xdp_adjust_tail_kern.c
> +++ b/samples/bpf/xdp_adjust_tail_kern.c
> @@ -25,6 +25,13 @@
>   #define ICMP_TOOBIG_SIZE 98
>   #define ICMP_TOOBIG_PAYLOAD_SIZE 92
>   
> +struct bpf_map_def SEC("maps") pcktsz = {
> +	.type = BPF_MAP_TYPE_ARRAY,
> +	.key_size = sizeof(__u32),
> +	.value_size = sizeof(__u32),
> +	.max_entries = 1,
> +};

We have new map definition format like in
tools/testing/selftests/bpf/progs/bpf_flow.c.
But looks like most samples/bpf still use SEC("maps").
I guess we can leave it for now, and if needed,
later on a massive conversion for all samples/bpf/
bpf programs can be done.

> +
>   struct bpf_map_def SEC("maps") icmpcnt = {
>   	.type = BPF_MAP_TYPE_ARRAY,
>   	.key_size = sizeof(__u32),
> @@ -64,7 +71,8 @@ static __always_inline void ipv4_csum(void *data_start, int data_size,
>   	*csum = csum_fold_helper(*csum);
>   }
>   
> -static __always_inline int send_icmp4_too_big(struct xdp_md *xdp)
> +static __always_inline int send_icmp4_too_big(struct xdp_md *xdp,
> +					      __u32 max_pckt_size)
>   {
>   	int headroom = (int)sizeof(struct iphdr) + (int)sizeof(struct icmphdr);
>   
> @@ -92,7 +100,7 @@ static __always_inline int send_icmp4_too_big(struct xdp_md *xdp)
>   	orig_iph = data + off;
>   	icmp_hdr->type = ICMP_DEST_UNREACH;
>   	icmp_hdr->code = ICMP_FRAG_NEEDED;
> -	icmp_hdr->un.frag.mtu = htons(MAX_PCKT_SIZE-sizeof(struct ethhdr));
> +	icmp_hdr->un.frag.mtu = htons(max_pckt_size - sizeof(struct ethhdr));
>   	icmp_hdr->checksum = 0;
>   	ipv4_csum(icmp_hdr, ICMP_TOOBIG_PAYLOAD_SIZE, &csum);
>   	icmp_hdr->checksum = csum;
> @@ -118,14 +126,21 @@ static __always_inline int handle_ipv4(struct xdp_md *xdp)
>   {
>   	void *data_end = (void *)(long)xdp->data_end;
>   	void *data = (void *)(long)xdp->data;
> +	__u32 max_pckt_size = MAX_PCKT_SIZE;
> +	__u32 *pckt_sz;
> +	__u32 key = 0;

The above two new definitions may the code not in
reverse Christmas definition order, could you fix it?

>   	int pckt_size = data_end - data;
>   	int offset;
>   
> -	if (pckt_size > MAX_PCKT_SIZE) {
> +	pckt_sz = bpf_map_lookup_elem(&pcktsz, &key);
> +	if (pckt_sz && *pckt_sz)
> +		max_pckt_size = *pckt_sz;
> +
> +	if (pckt_size > max_pckt_size) {
>   		offset = pckt_size - ICMP_TOOBIG_SIZE;
>   		if (bpf_xdp_adjust_tail(xdp, 0 - offset))
>   			return XDP_PASS;

We could have the following scenario:
   max_pckt_size = 1
   pckt_size = 2
   offset = -96
   bpf_xdp_adjust_tail return -EINVAL
   so we return XDP_PASS now

Maybe you want to do
    if (pckt_size > max(max_pckt_size, ICMP_TOOBIG_SIZE)) {
       ...
    }
as in original code, bpf_xdp_adjust_tail(...) already succeeds.

> -		return send_icmp4_too_big(xdp);
> +		return send_icmp4_too_big(xdp, max_pckt_size);
>   	}
>   	return XDP_PASS;
>   }
> diff --git a/samples/bpf/xdp_adjust_tail_user.c b/samples/bpf/xdp_adjust_tail_user.c
> index a3596b617c4c..aef6c69a48a7 100644
> --- a/samples/bpf/xdp_adjust_tail_user.c
> +++ b/samples/bpf/xdp_adjust_tail_user.c
> @@ -23,6 +23,7 @@
>   #include "libbpf.h"
>   
>   #define STATS_INTERVAL_S 2U
> +#define MAX_PCKT_SIZE 600
>   
>   static int ifindex = -1;
>   static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
> @@ -72,6 +73,7 @@ static void usage(const char *cmd)
>   	printf("Usage: %s [...]\n", cmd);
>   	printf("    -i <ifname|ifindex> Interface\n");
>   	printf("    -T <stop-after-X-seconds> Default: 0 (forever)\n");
> +	printf("    -P <MAX_PCKT_SIZE> Default: %u\n", MAX_PCKT_SIZE);
>   	printf("    -S use skb-mode\n");
>   	printf("    -N enforce native mode\n");
>   	printf("    -F force loading prog\n");
> @@ -85,13 +87,14 @@ int main(int argc, char **argv)
>   		.prog_type	= BPF_PROG_TYPE_XDP,
>   	};
>   	unsigned char opt_flags[256] = {};
> -	const char *optstr = "i:T:SNFh";
> +	const char *optstr = "i:T:P:SNFh";
>   	struct bpf_prog_info info = {};
>   	__u32 info_len = sizeof(info);
> +	__u32 max_pckt_size = 0;
> +	__u32 key = 0;
>   	unsigned int kill_after_s = 0;
>   	int i, prog_fd, map_fd, opt;
>   	struct bpf_object *obj;
> -	struct bpf_map *map;
>   	char filename[256];
>   	int err;
>   
> @@ -110,6 +113,9 @@ int main(int argc, char **argv)
>   		case 'T':
>   			kill_after_s = atoi(optarg);
>   			break;
> +		case 'P':
> +			max_pckt_size = atoi(optarg);
> +			break;
>   		case 'S':
>   			xdp_flags |= XDP_FLAGS_SKB_MODE;
>   			break;
> @@ -150,12 +156,22 @@ int main(int argc, char **argv)
>   	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
>   		return 1;
>   
> -	map = bpf_map__next(NULL, obj);
> -	if (!map) {
> -		printf("finding a map in obj file failed\n");
> +	/* update pcktsz map */
> +	if (max_pckt_size) {
> +		map_fd = bpf_object__find_map_fd_by_name(obj, "pcktsz");
> +		if (!map_fd) {

Let us test map_fd and below prog_fd with '< 0" instead of "!= 0'.
In this particular sample, "! = 0" is okay since we did not close
stdin. But in programs if stdin is closed, the fd 0 may be reused
for map_fd. Let us just keep good coding practice here.

> +			printf("finding a pcktsz map in obj file failed\n");
> +			return 1;
> +		}
> +		bpf_map_update_elem(map_fd, &key, &max_pckt_size, BPF_ANY);
> +	}
> +
> +	/* fetch icmpcnt map */
> +	map_fd = bpf_object__find_map_fd_by_name(obj, "icmpcnt");
> +	if (!map_fd) {
> +		printf("finding a icmpcnt map in obj file failed\n");
>   		return 1;
>   	}
> -	map_fd = bpf_map__fd(map);
>   
>   	if (!prog_fd) {
>   		printf("load_bpf_file: %s\n", strerror(errno));
> 

^ permalink raw reply

* Re: [PATCH bpf-next 11/11] samples: bpf: makefile: add sysroot support
From: Ivan Khoronzhuk @ 2019-09-13 22:36 UTC (permalink / raw)
  To: Yonghong Song
  Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
	jakub.kicinski@netronome.com, hawk@kernel.org,
	john.fastabend@gmail.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	clang-built-linux@googlegroups.com
In-Reply-To: <e967744b-0286-d92f-9fc8-70f5759cc4a1@fb.com>

On Fri, Sep 13, 2019 at 09:45:31PM +0000, Yonghong Song wrote:
>
>
>On 9/10/19 11:38 AM, Ivan Khoronzhuk wrote:
>> Basically it only enables that was added by previous couple fixes.
>> For sure, just make tools/include to be included after sysroot
>> headers.
>>
>> export ARCH=arm
>> export CROSS_COMPILE=arm-linux-gnueabihf-
>> make samples/bpf/ SYSROOT="path/to/sysroot"
>>
>> Sysroot contains correct libs installed and its headers ofc.
>> Useful when working with NFC or virtual machine.
>>
>> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>> ---
>>   samples/bpf/Makefile   |  5 +++++
>>   samples/bpf/README.rst | 10 ++++++++++
>>   2 files changed, 15 insertions(+)
>>
>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
>> index 4edc5232cfc1..68ba78d1dbbe 100644
>> --- a/samples/bpf/Makefile
>> +++ b/samples/bpf/Makefile
>> @@ -177,6 +177,11 @@ ifeq ($(ARCH), arm)
>>   CLANG_EXTRA_CFLAGS := $(D_OPTIONS)
>>   endif
>>
>> +ifdef SYSROOT
>> +ccflags-y += --sysroot=${SYSROOT}
>> +PROGS_LDFLAGS := -L${SYSROOT}/usr/lib
>> +endif
>> +
>>   ccflags-y += -I$(objtree)/usr/include
>>   ccflags-y += -I$(srctree)/tools/lib/bpf/
>>   ccflags-y += -I$(srctree)/tools/testing/selftests/bpf/
>> diff --git a/samples/bpf/README.rst b/samples/bpf/README.rst
>> index 5f27e4faca50..786d0ab98e8a 100644
>> --- a/samples/bpf/README.rst
>> +++ b/samples/bpf/README.rst
>> @@ -74,3 +74,13 @@ samples for the cross target.
>>   export ARCH=arm64
>>   export CROSS_COMPILE="aarch64-linux-gnu-"
>>   make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
>> +
>> +If need to use environment of target board (headers and libs), the SYSROOT
>> +also can be set, pointing on FS of target board:
>> +
>> +export ARCH=arm64
>> +export CROSS_COMPILE="aarch64-linux-gnu-"
>> +make samples/bpf/ SYSROOT=~/some_sdk/linux-devkit/sysroots/aarch64-linux-gnu
>> +
>> +Setting LLC and CLANG is not necessarily if it's installed on HOST and have
>> +in its targets appropriate arch triple (usually it has several arches).
>
>You have very good description about how to build and test in cover
>letter. Could you include those instructions here as well? This will
>help keep a record so later people can try/test if needed.

I will try.
Thanks!!!

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply

* Re: [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO
From: Yuchung Cheng @ 2019-09-13 22:41 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Thomas Higdon, netdev@vger.kernel.org, Jonathan Lemon, Dave Jones,
	Eric Dumazet, Dave Taht, Soheil Hassas Yeganeh
In-Reply-To: <CADVnQy=aU=veBnZF=5OgwkT6EWA+hmmu8w9eq2d83eReSjAxEw@mail.gmail.com>

On Fri, Sep 13, 2019 at 2:53 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Fri, Sep 13, 2019 at 5:29 PM Yuchung Cheng <ycheng@google.com> wrote:
> > > What if the comment is shortened up to fit in 80 columns and the units
> > > (bytes) are added, something like:
> > >
> > >         __u32   tcpi_snd_wnd;        /* peer's advertised recv window (bytes) */
> > just a thought: will tcpi_peer_rcv_wnd be more self-explanatory?
>
> Good suggestion. I'm on the fence about that one. By itself, I agree
> tcpi_peer_rcv_wnd sounds much more clear. But tcpi_snd_wnd has the
> virtue of matching both the kernel code (tp->snd_wnd) and RFC 793
> (SND.WND). So they both have pros and cons. Maybe someone else feels
> more strongly one way or the other.
no strong preference -- snd_wnd is just fine too with proper comment
(for consistency).

also it might be good to comment whether it is scaled or not. it may
not be obvious if the actual value and scale are small.
>
> neal

^ permalink raw reply

* Re: [PATCH bpf-next] libbpf: add xsk_umem__adjust_offset
From: Yonghong Song @ 2019-09-13 22:49 UTC (permalink / raw)
  To: Kevin Laatz, netdev@vger.kernel.org, ast@kernel.org,
	daniel@iogearbox.net, bjorn.topel@intel.com,
	magnus.karlsson@intel.com, jonathan.lemon@gmail.com
  Cc: bruce.richardson@intel.com, ciara.loftus@intel.com,
	bpf@vger.kernel.org
In-Reply-To: <20190912072840.20947-1-kevin.laatz@intel.com>



On 9/12/19 8:28 AM, Kevin Laatz wrote:
> Currently, xsk_umem_adjust_offset exists as a kernel internal function.
> This patch adds xsk_umem__adjust_offset to libbpf so that it can be used
> from userspace. This will take the responsibility of properly storing the
> offset away from the application, making it less error prone.
> 
> Since xsk_umem__adjust_offset is called on a per-packet basis, we need to
> inline the function to avoid any performance regressions.  In order to
> inline xsk_umem__adjust_offset, we need to add it to xsk.h. Unfortunately
> this means that we can't dereference the xsk_umem_config struct directly
> since it is defined only in xsk.c. We therefore add an extra API to return
> the flags field to the user from the structure, and have the inline
> function use this flags field directly.

Could you add an example (either in samples/bpf or in 
tools/testing/selftests/bpf) to use xsk_umem__adjust_offset()? This will 
make it
easier to check whether the interface is appropriate or not and
what is the performance implication as you stated in the above.

> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
>   tools/lib/bpf/libbpf.map |  1 +
>   tools/lib/bpf/xsk.c      |  5 +++++
>   tools/lib/bpf/xsk.h      | 14 ++++++++++++++
>   3 files changed, 20 insertions(+)
> 
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index d04c7cb623ed..760350c9b81c 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -189,4 +189,5 @@ LIBBPF_0.0.4 {
>   LIBBPF_0.0.5 {
>   	global:
>   		bpf_btf_get_next_id;
> +		xsk_umem__get_flags;
>   } LIBBPF_0.0.4;
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index 842c4fd55859..a4250a721ea6 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -84,6 +84,11 @@ int xsk_socket__fd(const struct xsk_socket *xsk)
>   	return xsk ? xsk->fd : -EINVAL;
>   }
>   
> +__u32 xsk_umem__get_flags(struct xsk_umem *umem)
> +{
> +	return umem->config.flags;
> +}
> +
>   static bool xsk_page_aligned(void *buffer)
>   {
>   	unsigned long addr = (unsigned long)buffer;
> diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> index 584f6820a639..bf782facb274 100644
> --- a/tools/lib/bpf/xsk.h
> +++ b/tools/lib/bpf/xsk.h
> @@ -183,8 +183,22 @@ static inline __u64 xsk_umem__add_offset_to_addr(__u64 addr)
>   	return xsk_umem__extract_addr(addr) + xsk_umem__extract_offset(addr);
>   }
>   
> +/* Handle the offset appropriately depending on aligned or unaligned mode.
> + * For unaligned mode, we store the offset in the upper 16-bits of the address.
> + * For aligned mode, we simply add the offset to the address.
> + */
> +static inline __u64 xsk_umem__adjust_offset(__u32 umem_flags, __u64 addr,
> +					    __u64 offset)
> +{
> +	if (umem_flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> +		return addr + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +	else
> +		return addr + offset;
> +}
> +
>   LIBBPF_API int xsk_umem__fd(const struct xsk_umem *umem);
>   LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
> +LIBBPF_API __u32 xsk_umem__get_flags(struct xsk_umem *umem);
>   
>   #define XSK_RING_CONS__DEFAULT_NUM_DESCS      2048
>   #define XSK_RING_PROD__DEFAULT_NUM_DESCS      2048
> 

^ permalink raw reply

* [PATCH v2] net: mdio: switch to using gpiod_get_optional()
From: Dmitry Torokhov @ 2019-09-13 22:55 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Heiner Kallweit
  Cc: David S. Miller, Linus Walleij, Andy Shevchenko, netdev,
	linux-kernel

The MDIO device reset line is optional and now that gpiod_get_optional()
returns proper value when GPIO support is compiled out, there is no
reason to use fwnode_get_named_gpiod() that I plan to hide away.

Let's switch to using more standard gpiod_get_optional() and
gpiod_set_consumer_name() to keep the nice "PHY reset" label.

Also there is no reason to only try to fetch the reset GPIO when we have
OF node, gpiolib can fetch GPIO data from firmwares as well.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---

Note this is an update to a patch titled "[PATCH 05/11] net: mdio:
switch to using fwnode_gpiod_get_index()" that no longer uses the new
proposed API and instead works with already existing ones.

 drivers/net/phy/mdio_bus.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index ce940871331e..2e29ab841b4d 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -42,21 +42,17 @@

 static int mdiobus_register_gpiod(struct mdio_device *mdiodev)
 {
-	struct gpio_desc *gpiod = NULL;
+	int error;

 	/* Deassert the optional reset signal */
-	if (mdiodev->dev.of_node)
-		gpiod = fwnode_get_named_gpiod(&mdiodev->dev.of_node->fwnode,
-					       "reset-gpios", 0, GPIOD_OUT_LOW,
-					       "PHY reset");
-	if (IS_ERR(gpiod)) {
-		if (PTR_ERR(gpiod) == -ENOENT || PTR_ERR(gpiod) == -ENOSYS)
-			gpiod = NULL;
-		else
-			return PTR_ERR(gpiod);
-	}
-
-	mdiodev->reset_gpio = gpiod;
+	mdiodev->reset_gpio = gpiod_get_optional(&mdiodev->dev,
+						 "reset", GPIOD_OUT_LOW);
+	error = PTR_ERR_OR_ZERO(mdiodev->reset_gpio);
+	if (error)
+		return error;
+
+	if (mdiodev->reset_gpio)
+		gpiod_set_consumer_name(mdiodev->reset_gpio, "PHY reset");

 	return 0;
 }
-- 
2.23.0.237.gc6a4ce50a0-goog

-- 
Dmitry

^ permalink raw reply related

* RE: [PATCH v3 0/5] Introduce variable length mdev alias
From: Parav Pandit @ 2019-09-13 23:19 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jiri Pirko, kwankhede@nvidia.com, cohuck@redhat.com,
	davem@davemloft.net, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20190913153247.0309d016@x1.home>

Hi Alex,

> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, September 13, 2019 4:33 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@mellanox.com>; kwankhede@nvidia.com;
> cohuck@redhat.com; davem@davemloft.net; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org
> Subject: Re: [PATCH v3 0/5] Introduce variable length mdev alias
> 
> On Wed, 11 Sep 2019 16:38:49 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > > -----Original Message-----
> > > From: linux-kernel-owner@vger.kernel.org <linux-kernel-
> > > owner@vger.kernel.org> On Behalf Of Parav Pandit
> > > Sent: Wednesday, September 11, 2019 10:31 AM
> > > To: Alex Williamson <alex.williamson@redhat.com>
> > > Cc: Jiri Pirko <jiri@mellanox.com>; kwankhede@nvidia.com;
> > > cohuck@redhat.com; davem@davemloft.net; kvm@vger.kernel.org; linux-
> > > kernel@vger.kernel.org; netdev@vger.kernel.org
> > > Subject: RE: [PATCH v3 0/5] Introduce variable length mdev alias
> > >
> > > Hi Alex,
> > >
> > > > -----Original Message-----
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Wednesday, September 11, 2019 8:56 AM
> > > > To: Parav Pandit <parav@mellanox.com>
> > > > Cc: Jiri Pirko <jiri@mellanox.com>; kwankhede@nvidia.com;
> > > > cohuck@redhat.com; davem@davemloft.net; kvm@vger.kernel.org;
> > > > linux- kernel@vger.kernel.org; netdev@vger.kernel.org
> > > > Subject: Re: [PATCH v3 0/5] Introduce variable length mdev alias
> > > >
> > > > On Mon, 9 Sep 2019 20:42:32 +0000
> > > > Parav Pandit <parav@mellanox.com> wrote:
> > > >
> > > > > Hi Alex,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Parav Pandit <parav@mellanox.com>
> > > > > > Sent: Sunday, September 1, 2019 11:25 PM
> > > > > > To: alex.williamson@redhat.com; Jiri Pirko
> > > > > > <jiri@mellanox.com>; kwankhede@nvidia.com; cohuck@redhat.com;
> > > > > > davem@davemloft.net
> > > > > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > > netdev@vger.kernel.org; Parav Pandit <parav@mellanox.com>
> > > > > > Subject: [PATCH v3 0/5] Introduce variable length mdev alias
> > > > > >
> > > > > > To have consistent naming for the netdevice of a mdev and to
> > > > > > have consistent naming of the devlink port [1] of a mdev,
> > > > > > which is formed using phys_port_name of the devlink port,
> > > > > > current UUID is not usable because UUID is too long.
> > > > > >
> > > > > > UUID in string format is 36-characters long and in binary 128-bit.
> > > > > > Both formats are not able to fit within 15 characters limit of
> > > > > > netdev
> > > > name.
> > > > > >
> > > > > > It is desired to have mdev device naming consistent using UUID.
> > > > > > So that widely used user space framework such as ovs [2] can
> > > > > > make use of mdev representor in similar way as PCIe SR-IOV VF
> > > > > > and PF
> > > > representors.
> > > > > >
> > > > > > Hence,
> > > > > > (a) mdev alias is created which is derived using sha1 from the
> > > > > > mdev
> > > > name.
> > > > > > (b) Vendor driver describes how long an alias should be for
> > > > > > the child mdev created for a given parent.
> > > > > > (c) Mdev aliases are unique at system level.
> > > > > > (d) alias is created optionally whenever parent requested.
> > > > > > This ensures that non networking mdev parents can function
> > > > > > without alias creation overhead.
> > > > > >
> > > > > > This design is discussed at [3].
> > > > > >
> > > > > > An example systemd/udev extension will have,
> > > > > >
> > > > > > 1. netdev name created using mdev alias available in sysfs.
> > > > > >
> > > > > > mdev UUID=83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
> > > > > > mdev 12 character alias=cd5b146a80a5
> > > > > >
> > > > > > netdev name of this mdev = enmcd5b146a80a5 Here en = Ethernet
> > > > > > link m = mediated device
> > > > > >
> > > > > > 2. devlink port phys_port_name created using mdev alias.
> > > > > > devlink phys_port_name=pcd5b146a80a5
> > > > > >
> > > > > > This patchset enables mdev core to maintain unique alias for a mdev.
> > > > > >
> > > > > > Patch-1 Introduces mdev alias using sha1.
> > > > > > Patch-2 Ensures that mdev alias is unique in a system.
> > > > > > Patch-3 Exposes mdev alias in a sysfs hirerchy, update
> > > > > > Documentation
> > > > > > Patch-4 Introduces mdev_alias() API.
> > > > > > Patch-5 Extends mtty driver to optionally provide alias generation.
> > > > > > This also enables to test UUID based sha1 collision and
> > > > > > trigger error handling for duplicate sha1 results.
> > > > > >
> > > > > > [1] http://man7.org/linux/man-pages/man8/devlink-port.8.html
> > > > > > [2]
> > > > > > https://docs.openstack.org/os-vif/latest/user/plugins/ovs.html
> > > > > > [3] https://patchwork.kernel.org/cover/11084231/
> > > > > >
> > > > > > ---
> > > > > > Changelog:
> > > > > > v2->v3:
> > > > > >  - Addressed comment from Yunsheng Lin
> > > > > >  - Changed strcmp() ==0 to !strcmp()
> > > > > >  - Addressed comment from Cornelia Hunk
> > > > > >  - Merged sysfs Documentation patch with syfs patch
> > > > > >  - Added more description for alias return value
> > > > >
> > > > > Did you get a chance review this updated series?
> > > > > I addressed Cornelia's and yours comment.
> > > > > I do not think allocating alias memory twice, once for
> > > > > comparison and once for storing is good idea or moving alias
> > > > > generation logic inside the mdev_list_lock(). So I didn't
> > > > > address that suggestion of
> > > Cornelia.
> > > >
> > > > Sorry, I'm at LPC this week.  I agree, I don't think the double
> > > > allocation is necessary, I thought the comment was sufficient to
> > > > clarify null'ing the variable.  It's awkward, but seems correct.
> > > >
> > > > I'm not sure what we do with this patch series though, has the real
> > > > consumer of this even been proposed?
> >
> > Jiri already acked to use mdev_alias() to generate phys_port_name several
> days back in the discussion we had in [1].
> > After concluding in the thread [1], I proceed with mdev_alias().
> > mlx5_core patches are not yet present on netdev mailing list, but we
> > all agree to use it in mdev_alias() in devlink phys_port_name
> > generation. So we have collective agreement on how to proceed forward.
> > I wasn't probably clear enough in previous email reply about it, so
> > adding link here.
> >
> > [1] https://patchwork.kernel.org/cover/11084231/#22838955
> 
> Jiri may have agreed to the concept, but without patches on the list proving an
> end to end solution, I think it's too early for us to commit to this by
> preemptively adding it to our API.  "Acked" and "collective agreement" seem
> like they overstate something that seems not to have seen the light of day yet.
> Instead I'll say, it looks reasonable, come back when the real consumer has
> actually been proposed upstream and has more buy-in from the community
> and we'll see if it still looks like the right approach from an mdev perspective
> then.  Thanks,
> 
Ok. I will combine these patches with the actual consumer patches of mdev_alias().
Thanks.

> Alex

^ permalink raw reply

* [PATCH v5 1/2] tcp: Add TCP_INFO counter for packets received out-of-order
From: Thomas Higdon @ 2019-09-13 23:23 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: Jonathan Lemon, Dave Jones, Eric Dumazet, Neal Cardwell,
	Dave Taht, Yuchung Cheng, Soheil Hassas Yeganeh

For receive-heavy cases on the server-side, we want to track the
connection quality for individual client IPs. This counter, similar to
the existing system-wide TCPOFOQueue counter in /proc/net/netstat,
tracks out-of-order packet reception. By providing this counter in
TCP_INFO, it will allow understanding to what degree receive-heavy
sockets are experiencing out-of-order delivery and packet drops
indicating congestion.

Please note that this is similar to the counter in NetBSD TCP_INFO, and
has the same name.

Also note that we avoid increasing the size of the tcp_sock struct by
taking advantage of a hole.

Signed-off-by: Thomas Higdon <tph@fb.com>
---
changes since v4:
 - optimize placement of rcv_ooopack to avoid increasing tcp_sock struct
   size

 include/linux/tcp.h      | 2 ++
 include/uapi/linux/tcp.h | 2 ++
 net/ipv4/tcp.c           | 2 ++
 net/ipv4/tcp_input.c     | 1 +
 4 files changed, 7 insertions(+)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index f3a85a7fb4b1..99617e528ea2 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -354,6 +354,8 @@ struct tcp_sock {
 #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
 #endif
 
+	u32 rcv_ooopack; /* Received out-of-order packets, for tcpinfo */
+
 /* Receiver side RTT estimation */
 	u32 rcv_rtt_last_tsecr;
 	struct {
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index b3564f85a762..20237987ccc8 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -270,6 +270,8 @@ struct tcp_info {
 	__u64	tcpi_bytes_retrans;  /* RFC4898 tcpEStatsPerfOctetsRetrans */
 	__u32	tcpi_dsack_dups;     /* RFC4898 tcpEStatsStackDSACKDups */
 	__u32	tcpi_reord_seen;     /* reordering events seen */
+
+	__u32	tcpi_rcv_ooopack;    /* Out-of-order packets received */
 };
 
 /* netlink attributes types for SCM_TIMESTAMPING_OPT_STATS */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 94df48bcecc2..4cf58208270e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2653,6 +2653,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	tp->rx_opt.saw_tstamp = 0;
 	tp->rx_opt.dsack = 0;
 	tp->rx_opt.num_sacks = 0;
+	tp->rcv_ooopack = 0;
 
 
 	/* Clean up fastopen related fields */
@@ -3295,6 +3296,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	info->tcpi_bytes_retrans = tp->bytes_retrans;
 	info->tcpi_dsack_dups = tp->dsack_dups;
 	info->tcpi_reord_seen = tp->reord_seen;
+	info->tcpi_rcv_ooopack = tp->rcv_ooopack;
 	unlock_sock_fast(sk, slow);
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 706cbb3b2986..2ef333354026 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4555,6 +4555,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 	tp->pred_flags = 0;
 	inet_csk_schedule_ack(sk);
 
+	tp->rcv_ooopack += max_t(u16, 1, skb_shinfo(skb)->gso_segs);
 	NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFOQUEUE);
 	seq = TCP_SKB_CB(skb)->seq;
 	end_seq = TCP_SKB_CB(skb)->end_seq;
-- 
2.17.1


^ permalink raw reply related

* [PATCH v5 2/2] tcp: Add snd_wnd to TCP_INFO
From: Thomas Higdon @ 2019-09-13 23:23 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: Jonathan Lemon, Dave Jones, Eric Dumazet, Neal Cardwell,
	Dave Taht, Yuchung Cheng, Soheil Hassas Yeganeh
In-Reply-To: <20190913232332.44036-1-tph@fb.com>

Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
performance problems --
> (1) Usually when we're diagnosing TCP performance problems, we do so
> from the sender, since the sender makes most of the
> performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
> From the sender-side the thing that would be most useful is to see
> tp->snd_wnd, the receive window that the receiver has advertised to
> the sender.

This serves the purpose of adding an additional __u32 to avoid the
would-be hole caused by the addition of the tcpi_rcvi_ooopack field.

Signed-off-by: Thomas Higdon <tph@fb.com>
---
changes since v4:
 - clarify comment
 include/uapi/linux/tcp.h | 4 ++++
 net/ipv4/tcp.c           | 1 +
 2 files changed, 5 insertions(+)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 20237987ccc8..81e697978e8b 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -272,6 +272,10 @@ struct tcp_info {
 	__u32	tcpi_reord_seen;     /* reordering events seen */
 
 	__u32	tcpi_rcv_ooopack;    /* Out-of-order packets received */
+
+	__u32	tcpi_snd_wnd;	     /* peer's advertised receive window after
+				      * scaling (bytes)
+				      */
 };
 
 /* netlink attributes types for SCM_TIMESTAMPING_OPT_STATS */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4cf58208270e..79c325a07ba5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3297,6 +3297,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	info->tcpi_dsack_dups = tp->dsack_dups;
 	info->tcpi_reord_seen = tp->reord_seen;
 	info->tcpi_rcv_ooopack = tp->rcv_ooopack;
+	info->tcpi_snd_wnd = tp->snd_wnd;
 	unlock_sock_fast(sk, slow);
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v5 2/2] tcp: Add snd_wnd to TCP_INFO
From: Yuchung Cheng @ 2019-09-13 23:36 UTC (permalink / raw)
  To: Thomas Higdon
  Cc: netdev@vger.kernel.org, Jonathan Lemon, Dave Jones, Eric Dumazet,
	Neal Cardwell, Dave Taht, Soheil Hassas Yeganeh
In-Reply-To: <20190913232332.44036-2-tph@fb.com>

On Fri, Sep 13, 2019 at 4:23 PM Thomas Higdon <tph@fb.com> wrote:
>
> Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
> performance problems --
> > (1) Usually when we're diagnosing TCP performance problems, we do so
> > from the sender, since the sender makes most of the
> > performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
> > From the sender-side the thing that would be most useful is to see
> > tp->snd_wnd, the receive window that the receiver has advertised to
> > the sender.
>
> This serves the purpose of adding an additional __u32 to avoid the
> would-be hole caused by the addition of the tcpi_rcvi_ooopack field.
>
> Signed-off-by: Thomas Higdon <tph@fb.com>
> ---
Acked-by: Yuchung Cheng <ycheng@google.com>

> changes since v4:
>  - clarify comment
>  include/uapi/linux/tcp.h | 4 ++++
>  net/ipv4/tcp.c           | 1 +
>  2 files changed, 5 insertions(+)
>
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 20237987ccc8..81e697978e8b 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -272,6 +272,10 @@ struct tcp_info {
>         __u32   tcpi_reord_seen;     /* reordering events seen */
>
>         __u32   tcpi_rcv_ooopack;    /* Out-of-order packets received */
> +
> +       __u32   tcpi_snd_wnd;        /* peer's advertised receive window after
> +                                     * scaling (bytes)
> +                                     */
>  };
>
>  /* netlink attributes types for SCM_TIMESTAMPING_OPT_STATS */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 4cf58208270e..79c325a07ba5 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3297,6 +3297,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
>         info->tcpi_dsack_dups = tp->dsack_dups;
>         info->tcpi_reord_seen = tp->reord_seen;
>         info->tcpi_rcv_ooopack = tp->rcv_ooopack;
> +       info->tcpi_snd_wnd = tp->snd_wnd;
>         unlock_sock_fast(sk, slow);
>  }
>  EXPORT_SYMBOL_GPL(tcp_get_info);
> --
> 2.17.1
>

^ permalink raw reply

* [PATCH] rsi: release skb if rsi_prepare_beacon fails
From: Navid Emamdoost @ 2019-09-14  0:08 UTC (permalink / raw)
  Cc: emamd001, smccaman, kjlu, Navid Emamdoost, Amitkumar Karwar,
	Siva Rebbagondla, Kalle Valo, David S. Miller, linux-wireless,
	netdev, linux-kernel

In rsi_send_beacon, if rsi_prepare_beacon fails the allocated skb should
be released.

Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
---
 drivers/net/wireless/rsi/rsi_91x_mgmt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/rsi/rsi_91x_mgmt.c b/drivers/net/wireless/rsi/rsi_91x_mgmt.c
index 6c7f26ef6476..9cc8a335d519 100644
--- a/drivers/net/wireless/rsi/rsi_91x_mgmt.c
+++ b/drivers/net/wireless/rsi/rsi_91x_mgmt.c
@@ -1756,6 +1756,7 @@ static int rsi_send_beacon(struct rsi_common *common)
 		skb_pull(skb, (64 - dword_align_bytes));
 	if (rsi_prepare_beacon(common, skb)) {
 		rsi_dbg(ERR_ZONE, "Failed to prepare beacon\n");
+		dev_kfree_skb(skb);
 		return -EINVAL;
 	}
 	skb_queue_tail(&common->tx_queue[MGMT_BEACON_Q], skb);
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v2] igb: add rx drop enable attribute
From: Brown, Aaron F @ 2019-09-14  0:38 UTC (permalink / raw)
  To: Robert Beckett, intel-wired-lan@lists.osuosl.org
  Cc: Kirsher, Jeffrey T, David S. Miller, netdev@vger.kernel.org
In-Reply-To: <20190909142117.20186-1-bob.beckett@collabora.com>

On Mon, 2019-09-09 at 15:21 +0100, Robert Beckett wrote:
> To allow userland to enable or disable dropping packets when descriptor
> ring is exhausted, add RX_DROP_EN private flag.
> 
> This can be used in conjunction with flow control to mitigate packet storms
> (e.g. due to network loop or DoS) by forcing the network adapter to send
> pause frames whenever the ring is close to exhaustion.
> 
> By default this will maintain previous behaviour of enabling dropping of
> packets during ring buffer exhaustion.
> Some use cases prefer to not drop packets upon exhaustion, but instead
> use flow control to limit ingress rates and ensure no dropped packets.
> This is useful when the host CPU cannot keep up with packet delivery,
> but data delivery is more important than throughput via multiple queues.
> 
> Userland can set this flag to 0 via ethtool to disable packet dropping.
> 
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> ---
> 
> Notes:
>     Changes since v1: re-written to use ethtool priv flags instead of sysfs attribute
> 
>  drivers/net/ethernet/intel/igb/igb.h         |  1 +
>  drivers/net/ethernet/intel/igb/igb_ethtool.c |  8 ++++++++
>  drivers/net/ethernet/intel/igb/igb_main.c    | 11 +++++++++--
>  3 files changed, 18 insertions(+), 2 deletions(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply

* Re: [PATCH] igb/igc: Don't warn on fatal read failures when the device is removed
From: Brown, Aaron F @ 2019-09-14  1:03 UTC (permalink / raw)
  To: Lyude Paul, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org
  Cc: Tang, Feng, Neftin, Sasha, Kirsher, Jeffrey T, David S. Miller,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190822183318.27634-1-lyude@redhat.com>

On Thu, 2019-08-22 at 14:33 -0400, Lyude Paul wrote:
> Fatal read errors are worth warning about, unless of course the device
> was just unplugged from the machine - something that's a rather normal
> occurence when the igb/igc adapter is located on a Thunderbolt dock. So,
> let's only WARN() if there's a fatal read error while the device is
> still present.
> 
> This fixes the following WARN splat that's been appearing whenever I
> unplug my Caldigit TS3 Thunderbolt dock from my laptop:
> 
>   igb 0000:09:00.0 enp9s0: PCIe link lost
>   ------------[ cut here ]------------
>   igb: Failed to read reg 0x18!
>   WARNING: CPU: 7 PID: 516 at
>   drivers/net/ethernet/intel/igb/igb_main.c:756 igb_rd32+0x57/0x6a [igb]
>   Modules linked in: igb dca thunderbolt fuse vfat fat elan_i2c mei_wdt
>   mei_hdcp i915 wmi_bmof intel_wmi_thunderbolt iTCO_wdt
>   iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp joydev
>   coretemp crct10dif_pclmul crc32_pclmul i2c_algo_bit ghash_clmulni_intel
>   intel_cstate drm_kms_helper intel_uncore syscopyarea sysfillrect
>   sysimgblt fb_sys_fops intel_rapl_perf intel_xhci_usb_role_switch mei_me
>   drm roles idma64 i2c_i801 ucsi_acpi typec_ucsi mei intel_lpss_pci
>   processor_thermal_device typec intel_pch_thermal intel_soc_dts_iosf
>   intel_lpss int3403_thermal thinkpad_acpi wmi int340x_thermal_zone
>   ledtrig_audio int3400_thermal acpi_thermal_rel acpi_pad video
>   pcc_cpufreq ip_tables serio_raw nvme nvme_core crc32c_intel uas
>   usb_storage e1000e i2c_dev
>   CPU: 7 PID: 516 Comm: kworker/u16:3 Not tainted 5.2.0-rc1Lyude-Test+ #14
>   Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
>   Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>   RIP: 0010:igb_rd32+0x57/0x6a [igb]
>   Code: 87 b8 fc ff ff 48 c7 47 08 00 00 00 00 48 c7 c6 33 42 9b c0 4c 89
>   c7 e8 47 45 cd dc 89 ee 48 c7 c7 43 42 9b c0 e8 c1 94 71 dc <0f> 0b eb
>   08 8b 00 ff c0 75 b0 eb c8 44 89 e0 5d 41 5c c3 0f 1f 44
>   RSP: 0018:ffffba5801cf7c48 EFLAGS: 00010286
>   RAX: 0000000000000000 RBX: ffff9e7956608840 RCX: 0000000000000007
>   RDX: 0000000000000000 RSI: ffffba5801cf7b24 RDI: ffff9e795e3d6a00
>   RBP: 0000000000000018 R08: 000000009dec4a01 R09: ffffffff9e61018f
>   R10: 0000000000000000 R11: ffffba5801cf7ae5 R12: 00000000ffffffff
>   R13: ffff9e7956608840 R14: ffff9e795a6f10b0 R15: 0000000000000000
>   FS:  0000000000000000(0000) GS:ffff9e795e3c0000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000564317bc4088 CR3: 000000010e00a006 CR4: 00000000003606e0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>    igb_release_hw_control+0x1a/0x30 [igb]
>    igb_remove+0xc5/0x14b [igb]
>    pci_device_remove+0x3b/0x93
>    device_release_driver_internal+0xd7/0x17e
>    pci_stop_bus_device+0x36/0x75
>    pci_stop_bus_device+0x66/0x75
>    pci_stop_bus_device+0x66/0x75
>    pci_stop_and_remove_bus_device+0xf/0x19
>    trim_stale_devices+0xc5/0x13a
>    ? __pm_runtime_resume+0x6e/0x7b
>    trim_stale_devices+0x103/0x13a
>    ? __pm_runtime_resume+0x6e/0x7b
>    trim_stale_devices+0x103/0x13a
>    acpiphp_check_bridge+0xd8/0xf5
>    acpiphp_hotplug_notify+0xf7/0x14b
>    ? acpiphp_check_bridge+0xf5/0xf5
>    acpi_device_hotplug+0x357/0x3b5
>    acpi_hotplug_work_fn+0x1a/0x23
>    process_one_work+0x1a7/0x296
>    worker_thread+0x1a8/0x24c
>    ? process_scheduled_works+0x2c/0x2c
>    kthread+0xe9/0xee
>    ? kthread_destroy_worker+0x41/0x41
>    ret_from_fork+0x35/0x40
>   ---[ end trace 252bf10352c63d22 ]---
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Fixes: 47e16692b26b ("igb/igc: warn when fatal read failure happens")
> Cc: Feng Tang <feng.tang@intel.com>
> Cc: Sasha Neftin <sasha.neftin@intel.com>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 3 ++-
>  drivers/net/ethernet/intel/igc/igc_main.c | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply

* Re: [PATCH] e1000: fix memory leaks
From: Brown, Aaron F @ 2019-09-14  1:15 UTC (permalink / raw)
  To: Wenwen Wang
  Cc: Kirsher, Jeffrey T, David S. Miller,
	moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list
In-Reply-To: <1565589561-5910-1-git-send-email-wenwen@cs.uga.edu>

On Mon, 2019-08-12 at 00:59 -0500, Wenwen Wang wrote:
> In e1000_set_ringparam(), 'tx_old' and 'rx_old' are not deallocated if
> e1000_up() fails, leading to memory leaks. Refactor the code to fix this
> issue.
> 
> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply

* [PATCH v2 net-next 0/7] tc-taprio offload for SJA1105 DSA
From: Vladimir Oltean @ 2019-09-14  1:17 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, joergen.andreasen, netdev,
	Vladimir Oltean

This is the second attempt to submit the tc-taprio offload model for
inclusion in the net tree. The sja1105 switch driver will provide the
first implementation of the offload. Only the bare minimum is added:

- The offload model and a DSA pass-through
- The hardware implementation
- The interaction with the netdev queues in the tagger code
- Documentation

What has been removed from the first attempt is support for
PTP-as-clocksource in sja1105.  This will be added as soon as the
offload model is settled.

Vinicius Costa Gomes (1):
  taprio: Add support for hardware offloading

Vladimir Oltean (6):
  net: dsa: Pass ndo_setup_tc slave callback to drivers
  net: dsa: sja1105: Add static config tables for scheduling
  net: dsa: sja1105: Advertise the 8 TX queues
  net: dsa: sja1105: Make HOSTPRIO a kernel config
  net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio
    offload
  docs: net: dsa: sja1105: Add info about the time-aware scheduler

 Documentation/networking/dsa/sja1105.rst      |  90 ++++
 drivers/net/dsa/sja1105/Kconfig               |  17 +
 drivers/net/dsa/sja1105/Makefile              |   4 +
 drivers/net/dsa/sja1105/sja1105.h             |   6 +
 .../net/dsa/sja1105/sja1105_dynamic_config.c  |   8 +
 drivers/net/dsa/sja1105/sja1105_main.c        |  28 +-
 .../net/dsa/sja1105/sja1105_static_config.c   | 167 +++++++
 .../net/dsa/sja1105/sja1105_static_config.h   |  48 +-
 drivers/net/dsa/sja1105/sja1105_tas.c         | 419 ++++++++++++++++++
 drivers/net/dsa/sja1105/sja1105_tas.h         |  44 ++
 include/linux/netdevice.h                     |   1 +
 include/net/dsa.h                             |   2 +
 include/net/pkt_sched.h                       |  23 +
 include/uapi/linux/pkt_sched.h                |   3 +-
 net/dsa/slave.c                               |  12 +-
 net/dsa/tag_sja1105.c                         |   3 +-
 net/sched/sch_taprio.c                        | 409 +++++++++++++++--
 17 files changed, 1231 insertions(+), 53 deletions(-)
 create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.c
 create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.h

-- 

For those who want to follow along with the hardware implementation, the
manual is here:
https://www.nxp.com/docs/en/user-guide/UM10944.pdf

Notable changes in v2:
- Made the series independent from PTP (which is temporarily removed)
- Changed the meaning of the gate_mask - it is now acting on traffic
  classes even in the view exposed by taprio to drivers.
- Removed the next_sched hrtimer.
- Summarized one of the responses given to Vinicius into a new
  documentation section.

The first version of the net-next patch series can be found here:

https://www.spinics.net/lists/netdev/msg597214.html

Changes in the first version of the net-next series compared to RFC v2:
- Made "flags 1" and "flags 2" mutually exclusive in the taprio qdisc
- Moved taprio_enable_offload and taprio_disable_offload out of atomic
  context - spin_lock_bh(qdisc_lock(sch)). This allows drivers that
  implement the ndo_setup_tc to sleep and for taprio memory to be
  allocated with GFP_KERNEL. The only thing that was kept under the
  spinlock is the assignment of the q->dequeue and q->peek pointers.
- Finally making proper use of own API - added a taprio_alloc helper to
  avoid passing stack memory to drivers.

The second version of the RFC is at:
https://www.spinics.net/lists/netdev/msg596663.html

Changes in v2 of the RFC since v1:
- Adapted the taprio offload patch to work by specifying "flags 2" to
  the iproute2-next tc. At the moment I don't clearly understand whether
  the full offload and the txtime assist ("flags 1") are mutually
  exclusive or not (i.e. whether a "flags 3" mode should be rejected,
  which it currently isn't).
- Added reference counting to the taprio offload structure. Maybe the
  function names and placement could have been better though. As for the
  other complaint (cycle time calculation) it got fixed in the taprio
  parser in the meantime.
- Converted sja1105 to use the hardware PTP registers, and save/restore
  the PTP time across resets.
- Made the DSA callback for ndo_setup_tc a bit more generic, but I don't
  know whether it fulfills expectations. Drivers still can't do blocking
  operations in its execution context.
- Added a state machine for starting/stopping the scheduler based on the
  last command run on the PTP clock.

The first RFC from July can be seen at:
https://lists.openwall.net/netdev/2019/07/07/81

Original cover letter:

Using Vinicius Costa Gomes' configuration interface for 802.1Qbv (later
resent by Voon Weifeng for the stmmac driver), I am submitting for
review a draft implementation of this offload for a DSA switch.

I don't want to insist too much on the hardware specifics of SJA1105
which isn't otherwise very compliant to the IEEE spec.

In order to be able to test with Vedang Patel's iproute2 patch for
taprio offload (https://www.spinics.net/lists/netdev/msg573072.html)
I had to actually revert the txtime-assist branch as it had changed the
iproute2 interface.

In terms of impact for DSA drivers, I would like to point out that:

- Maybe somebody should pre-populate qopt->cycle_time in case the user
  does not provide one. Otherwise each driver needs to iterate over the
  GCL once, just to set the cycle time (right now stmmac does as well).

- Configuring the switch over SPI cannot apparently be done from this
  ndo_setup_tc callback because it runs in atomic context. I also have
  some downstream patches to offload tc clsact matchall with mirred
  action, but in that case it looks like the atomic context restriction
  does not apply.

- I had to copy the struct tc_taprio_qopt_offload to driver private
  memory because a static config needs to be constructed every time a
  change takes place, and there are up to 4 switch ports that may take a
  TAS configuration. I have created a private
  tc_taprio_qopt_offload_copy() helper for this - I don't know whether
  it's of any help in the general case.

There is more to be done however. The TAS needs to be integrated with
the PTP driver. This is because with a PTP clock source, the base time
is written dynamically to the PTPSCHTM (PTP schedule time) register and
must be a time in the future. Then the "real" base time of each port's
TAS config can be offset by at most ~50 ms (the DELTA field from the
Schedule Entry Points Table) relative to PTPSCHTM.
Because base times in the past are completely ignored by this hardware,
we need to decide if it's ok behaviorally for a driver to "roll" a past
base time into the immediate future by incrementally adding the cycle
time (so the phase doesn't change). If it is, then decide by how long in
the future it is ok to do so. Or alternatively, is it preferable if the
driver errors out if the user-supplied base time is in the past and the
hardware doesn't like it? But even then, there might be fringe cases
when the base time becomes a past PTP time right as the driver tries to
apply the config.
Also applying a tc-taprio offload to a second SJA1105 switch port will
inevitably need to roll the first port's (now past) base time into an
equivalent future time.
All of this is going to be complicated even further by the fact that
resetting the switch (to apply the tc-taprio offload) makes it reset its
PTP time.

2.17.1

^ permalink raw reply

* [PATCH v2 net-next 2/7] net: dsa: Pass ndo_setup_tc slave callback to drivers
From: Vladimir Oltean @ 2019-09-14  1:17 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, joergen.andreasen, netdev,
	Vladimir Oltean
In-Reply-To: <20190914011802.1602-1-olteanv@gmail.com>

DSA currently handles shared block filters (for the classifier-action
qdisc) in the core due to what I believe are simply pragmatic reasons -
hiding the complexity from drivers and offerring a simple API for port
mirroring.

Extend the dsa_slave_setup_tc function by passing all other qdisc
offloads to the driver layer, where the driver may choose what it
implements and how. DSA is simply a pass-through in this case.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
---
Changes since v1:
- Added Kurt Kanzenbach's Acked-by.

Changes since RFC:
- Removed the unused declaration of struct tc_taprio_qopt_offload.

 include/net/dsa.h |  2 ++
 net/dsa/slave.c   | 12 ++++++++----
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 96acb14ec1a8..541fb514e31d 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -515,6 +515,8 @@ struct dsa_switch_ops {
 				   bool ingress);
 	void	(*port_mirror_del)(struct dsa_switch *ds, int port,
 				   struct dsa_mall_mirror_tc_entry *mirror);
+	int	(*port_setup_tc)(struct dsa_switch *ds, int port,
+				 enum tc_setup_type type, void *type_data);
 
 	/*
 	 * Cross-chip operations
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 9a88035517a6..75d58229a4bd 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1035,12 +1035,16 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
 static int dsa_slave_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			      void *type_data)
 {
-	switch (type) {
-	case TC_SETUP_BLOCK:
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct dsa_switch *ds = dp->ds;
+
+	if (type == TC_SETUP_BLOCK)
 		return dsa_slave_setup_tc_block(dev, type_data);
-	default:
+
+	if (!ds->ops->port_setup_tc)
 		return -EOPNOTSUPP;
-	}
+
+	return ds->ops->port_setup_tc(ds, dp->index, type, type_data);
 }
 
 static void dsa_slave_get_stats64(struct net_device *dev,
-- 
2.17.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox