Netdev List
* Re: [patch net-next 3/3] netdevsim: create devlink and netdev instances in namespace
From: Jakub Kicinski @ 2019-07-29 18:59 UTC
  To: Jiri Pirko; +Cc: netdev, davem, sthemmin, dsahern, mlxsw
In-Reply-To: <20190727094459.26345-4-jiri@resnulli.us>

On Sat, 27 Jul 2019 11:44:59 +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> When a user creates a new netdevsim instance using the sysfs bus file,
> create the devlink instance and related netdev instance in the namespace
> of the caller.
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

> diff --git a/drivers/net/netdevsim/bus.c b/drivers/net/netdevsim/bus.c
> index 1a0ff3d7747b..6aeed0c600f8 100644
> --- a/drivers/net/netdevsim/bus.c
> +++ b/drivers/net/netdevsim/bus.c
> @@ -283,6 +283,7 @@ nsim_bus_dev_new(unsigned int id, unsigned int port_count)
>  	nsim_bus_dev->dev.bus = &nsim_bus;
>  	nsim_bus_dev->dev.type = &nsim_bus_dev_type;
>  	nsim_bus_dev->port_count = port_count;
> +	nsim_bus_dev->initial_net = current->nsproxy->net_ns;
>  
>  	err = device_register(&nsim_bus_dev->dev);
>  	if (err)

The saved initial_net is only there to carry the netns from the adding
process to the probe callback, because probe can be asynchronous?
I'm not entirely sure netdevsim can probe asynchronously in practice,
given we never return EPROBE_DEFER, but would you mind at least adding
a comment stating that next to the definition of the field? Otherwise
I worry someone may mistakenly use this field instead of the devlink's
up-to-date netns.

> diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
> index 79c05af2a7c0..cdf53d0e0c49 100644
> --- a/drivers/net/netdevsim/netdevsim.h
> +++ b/drivers/net/netdevsim/netdevsim.h
> @@ -19,6 +19,7 @@
>  #include <linux/netdevice.h>
>  #include <linux/u64_stats_sync.h>
>  #include <net/devlink.h>
> +#include <net/net_namespace.h>

You can just do a forward declaration, no need to pull in the header.
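
A minimal sketch of the forward-declaration approach, assuming the header
only needs to store a pointer to the type. The struct and helper names
below are hypothetical stand-ins, not the driver's real definitions:

```c
#include <stddef.h>

/* Forward declaration: enough for code that only stores or passes a
 * pointer to the type, so the full header does not have to be included.
 */
struct net;

/* Hypothetical, trimmed-down stand-in for struct nsim_bus_dev. */
struct nsim_bus_dev_sketch {
	unsigned int port_count;
	/* Only carries the creator's netns from device creation to probe.
	 * Assumption from the review: it is not kept up to date later.
	 */
	struct net *initial_net;
};

/* Storing the pointer compiles without the definition of struct net. */
static void nsim_sketch_set_initial_net(struct nsim_bus_dev_sketch *dev,
					struct net *net)
{
	dev->initial_net = net;
}
```

Code that dereferences the pointer would still need the full definition,
which is why the include can move to the .c file that actually uses it.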

>  #include <net/xdp.h>
>  
>  #define DRV_NAME	"netdevsim"

> @@ -213,6 +215,7 @@ struct nsim_bus_dev {
>  	struct device dev;
>  	struct list_head list;
>  	unsigned int port_count;
> +	struct net *initial_net;
>  	unsigned int num_vfs;
>  	struct nsim_vf_config *vfconfigs;
>  };

Otherwise this makes perfect sense. With the above nits addressed, feel
free to add:

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>


* Re: [PATCH net v2] mlxsw: spectrum_ptp: Increase parsing depth when PTP is enabled
From: Ido Schimmel @ 2019-07-29 19:00 UTC
  To: Petr Machata
  Cc: netdev@vger.kernel.org, Jiri Pirko, Ido Schimmel,
	davem@davemloft.net
In-Reply-To: <b1584bdec4a0a36a2567a43dc0973dd8f3a05dec.1564424420.git.petrm@mellanox.com>

On Mon, Jul 29, 2019 at 06:26:14PM +0000, Petr Machata wrote:
> Spectrum systems have a configurable limit on how far into the packet they
> parse. By default, the limit is 96 bytes.
> 
> An IPv6 PTP packet is layered as Ethernet/IPv6/UDP (14+40+8 bytes), and
> sequence ID of a PTP event is only available 32 bytes into payload, for a
> total of 94 bytes. When an additional 802.1q header is present as
> well (such as when ptp4l is running on a VLAN port), the parsing limit is
> exceeded. Such packets are not recognized as PTP, and are not timestamped.
> 
> Therefore generalize the current VXLAN-specific parsing depth setting to
> allow reference-counted requests from other modules as well. Keep it in the
> VXLAN module, because the MPRS register also configures UDP destination
> port number used for VXLAN, and is thus closely tied to the VXLAN code
> anyway.
> 
> Then invoke the new interfaces from both VXLAN (in obvious places), as well
> as from PTP code, when the (global) timestamping configuration changes from
> disabled to enabled or vice versa.
> 
> Fixes: 8748642751ed ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
> Signed-off-by: Petr Machata <petrm@mellanox.com>

Reviewed-by: Ido Schimmel <idosch@mellanox.com>
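
The byte counts in the quoted commit message can be checked with a small
sketch. The header lengths and the 96-byte default come from the text
above; the constant and function names are illustrative only:

```c
#include <stdbool.h>

enum {
	ETH_HDR_LEN_SKETCH = 14,
	VLAN_TAG_LEN_SKETCH = 4,	/* one 802.1q tag */
	IPV6_HDR_LEN_SKETCH = 40,
	UDP_HDR_LEN_SKETCH = 8,
	PTP_SEQ_ID_END_SKETCH = 32,	/* sequence ID is available 32 bytes into the payload */
	DEFAULT_PARSING_DEPTH_SKETCH = 96,
};

/* Bytes the parser must cover to reach the PTP sequence ID over IPv6. */
static unsigned int ptp_ipv6_depth_needed(unsigned int vlan_tags)
{
	return ETH_HDR_LEN_SKETCH + vlan_tags * VLAN_TAG_LEN_SKETCH +
	       IPV6_HDR_LEN_SKETCH + UDP_HDR_LEN_SKETCH +
	       PTP_SEQ_ID_END_SKETCH;
}

/* True when the sequence ID falls within the configured parsing depth. */
static bool ptp_seq_id_parsed(unsigned int vlan_tags, unsigned int depth)
{
	return ptp_ipv6_depth_needed(vlan_tags) <= depth;
}
```

Without a VLAN tag the requirement is 94 bytes, which fits in 96; one
802.1q tag raises it to 98, which does not.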


* Re: [PATCH] net: sctp: drop unneeded likely() call around IS_ERR()
From: Marcelo Ricardo Leitner @ 2019-07-29 19:03 UTC
  To: Enrico Weigelt, metux IT consult
  Cc: linux-kernel, vyasevich, nhorman, davem, linux-sctp, netdev
In-Reply-To: <1564426521-22525-1-git-send-email-info@metux.net>

On Mon, Jul 29, 2019 at 08:55:21PM +0200, Enrico Weigelt, metux IT consult wrote:
> From: Enrico Weigelt <info@metux.net>
> 
> IS_ERR() already calls unlikely(), so this extra unlikely() call
> around IS_ERR() is not needed.
> 
> Signed-off-by: Enrico Weigelt <info@metux.net>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

> ---
>  net/sctp/socket.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index aa80cda..9d1f83b 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -985,7 +985,7 @@ static int sctp_setsockopt_bindx(struct sock *sk,
>  		return -EINVAL;
>  
>  	kaddrs = memdup_user(addrs, addrs_size);
> -	if (unlikely(IS_ERR(kaddrs)))
> +	if (IS_ERR(kaddrs))
>  		return PTR_ERR(kaddrs);
>  
>  	/* Walk through the addrs buffer and count the number of addresses. */
> @@ -1315,7 +1315,7 @@ static int __sctp_setsockopt_connectx(struct sock *sk,
>  		return -EINVAL;
>  
>  	kaddrs = memdup_user(addrs, addrs_size);
> -	if (unlikely(IS_ERR(kaddrs)))
> +	if (IS_ERR(kaddrs))
>  		return PTR_ERR(kaddrs);
>  
>  	/* Allow security module to validate connectx addresses. */
> -- 
> 1.9.1
> 
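
For readers unfamiliar with why the extra hint is redundant, here is a
userspace sketch of an error-pointer check that carries its own branch
hint. It is modeled on the kernel helpers but is not the kernel source;
all names ending in _sketch are made up for this example, and
__builtin_expect is a GCC/Clang builtin:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Branch hint: tell the compiler the condition is expected to be false. */
#define unlikely_sketch(x) __builtin_expect(!!(x), 0)

/* Error codes are encoded in the top MAX_ERRNO_SKETCH addresses. The
 * hint lives inside the check, so callers do not need to add another.
 */
static inline bool is_err_sketch(const void *ptr)
{
	return unlikely_sketch((uintptr_t)ptr >=
			       (uintptr_t)-MAX_ERRNO_SKETCH);
}

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr_sketch(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value from an error pointer. */
static inline long ptr_err_sketch(const void *ptr)
{
	return (long)(intptr_t)ptr;
}
```

A caller then writes `if (is_err_sketch(p)) return ptr_err_sketch(p);`
with no wrapper around the condition.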


* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
From: Michael S. Tsirkin @ 2019-07-29 19:10 UTC
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190729165056.r32uzj6om3o6vfvp@steredhat>

On Mon, Jul 29, 2019 at 06:50:56PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote:
> > > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > > and pushed to the guest using the vring, are directly queued in
> > > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > > with a fixed size (4 KB).
> > > > > > 
> > > > > > The maximum amount of memory used by each socket should be
> > > > > > controlled by the credit mechanism.
> > > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > > to avoid starvation of other sockets.
> > > > > > 
> > > > > > This patch mitigates this issue by copying the payload of small
> > > > > > packets (< 128 bytes) into the buffer of the last packet queued, in
> > > > > > order to avoid wasting memory.
> > > > > > 
> > > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > > 
> > > > > This is good enough for net-next, but for net I think we
> > > > > should figure out how to address the issue completely.
> > > > > Can we make the accounting precise? What happens to
> > > > > performance if we do?
> > > > > 
> > > > 
> > > > In order to do more precise accounting maybe we can use the buffer size,
> > > > instead of payload size when we update the credit available.
> > > > In this way, the credit available for each socket will reflect the memory
> > > > actually used.
> > > > 
> > > > I should check better, because I'm not sure what happens if the peer sees
> > > > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > > > buffer).
> > > > 
> > > > The other option is to copy each packet in a new buffer like I did in
> > > > the v2 [2], but this forces us to make a copy for each packet that does
> > > > not fill the entire buffer, perhaps too expensive.
> > > > 
> > > > [2] https://patchwork.kernel.org/patch/10938741/
> > > > 
> > > > 
> > > > Thanks,
> > > > Stefano
> > > 
> > > Interesting. You are right, and at some level the protocol forces copies.
> > > 
> > > We could try to detect that the actual memory is getting close to
> > > admin limits and force copies on queued packets after the fact.
> > > Is that practical?
> > 
> > Yes, I think it is doable!
> > We can decrease the credit available with the buffer size queued, and
> > when the buffer size of packet to queue is bigger than the credit
> > available, we can copy it.
> > 
> > > 
> > > And yes we can extend the credit accounting to include buffer size.
> > > That's a protocol change but maybe it makes sense.
> > 
> > Since we send to the other peer the credit available, maybe this
> > change can be backwards compatible (I'll check better this).
> 
> What I said was wrong.
> 
> We send a counter (increased when the user consumes the packets) and the
> "buf_alloc" (the max memory allowed) to the other peer.
> It makes a difference between a local counter (increased when the
> packets are sent) and the remote counter to calculate the credit available:
> 
>     u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
>     {
>     	u32 ret;
> 
>     	spin_lock_bh(&vvs->tx_lock);
>     	ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
>     	if (ret > credit)
>     		ret = credit;
>     	vvs->tx_cnt += ret;
>     	spin_unlock_bh(&vvs->tx_lock);
> 
>     	return ret;
>     }
> 
> Maybe I can play with "buf_alloc" to take care of bytes queued but not
> used.
> 
> Thanks,
> Stefano
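
The credit computation quoted above can be tried in userspace with a
lock-free sketch. The struct is a hypothetical stand-in that keeps only
the three counters the helper reads, and worst_case_bytes_sketch() just
restates the arithmetic from the patch description:

```c
#include <stdint.h>

/* Hypothetical stand-in for the TX-side counters used by the helper. */
struct credit_sketch {
	uint32_t peer_buf_alloc;	/* max memory the peer allows */
	uint32_t peer_fwd_cnt;		/* bytes the peer's user has consumed */
	uint32_t tx_cnt;		/* bytes sent so far */
};

/* Same arithmetic as the quoted helper, minus the spinlock. Unsigned
 * subtraction keeps the result correct when the counters wrap.
 */
static uint32_t get_credit_sketch(struct credit_sketch *c, uint32_t wanted)
{
	uint32_t ret = c->peer_buf_alloc - (c->tx_cnt - c->peer_fwd_cnt);

	if (ret > wanted)
		ret = wanted;
	c->tx_cnt += ret;
	return ret;
}

/* Memory pinned when every packet carries payload_len bytes but still
 * occupies a whole buf_size buffer.
 */
static uint64_t worst_case_bytes_sketch(uint64_t credit, uint64_t payload_len,
					uint64_t buf_size)
{
	return credit / payload_len * buf_size;
}
```

With 256 KB of credit, 1-byte payloads and 4 KB buffers this gives
262144 buffers, i.e. 1 GB, matching the figure in the commit message.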

Right. And the idea behind it all was that if we send a credit
to the remote then we have space for it.
I think the basic idea was that if we have actual allocated
memory and can copy data there, then we send the credit to the
remote.

Of course that means an extra copy for every packet.
So as an optimization, it seems that we just assume
that we will be able to allocate a new buffer.

First, this is not the best we can do. We can actually
allocate memory in the socket before sending credit.
If the packet is small then we copy it there.
If the packet is big then we queue the packet,
take the buffer out of the socket and add it to the virtqueue.

The second question is what to do about medium-sized packets.
If the packet is 1K but the buffer is 4K, what do we do?
And here I wonder - why don't we add the 3K buffer
to the vq?
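
As a rough sketch of the small/big split described above: the threshold
is an assumption (the patch under discussion uses 128 bytes for its own
copy path), the names are hypothetical, and the medium-sized case is
deliberately left as the open question it is:

```c
#include <stddef.h>

enum rx_action_sketch {
	RX_COPY_INTO_SOCKET_BUF,	/* small: copy, vq buffer can be reused */
	RX_QUEUE_AND_SWAP_BUF,		/* big: queue packet, hand socket buffer to the vq */
};

/* Pick an action from the payload length alone. copy_threshold is a
 * tunable assumed for this sketch, not a value from the thread.
 */
static enum rx_action_sketch rx_decide_sketch(size_t payload_len,
					      size_t copy_threshold)
{
	return payload_len < copy_threshold ? RX_COPY_INTO_SOCKET_BUF
					    : RX_QUEUE_AND_SWAP_BUF;
}

/* Bytes of a receive buffer left unused by a packet, e.g. the 3K tail
 * of a 4K buffer holding a 1K packet.
 */
static size_t rx_unused_tail_sketch(size_t buf_size, size_t payload_len)
{
	return payload_len < buf_size ? buf_size - payload_len : 0;
}
```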



-- 
MST


* Re: [PATCH bpf-next 02/10] libbpf: implement BPF CO-RE offset relocation algorithm
From: Song Liu @ 2019-07-29 19:56 UTC
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Yonghong Song, Andrii Nakryiko, Kernel Team
In-Reply-To: <20190724192742.1419254-3-andriin@fb.com>



> On Jul 24, 2019, at 12:27 PM, Andrii Nakryiko <andriin@fb.com> wrote:
> 
> This patch implements the core logic for BPF CO-RE offsets relocations.
> All the details are described in code comments.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
> tools/lib/bpf/libbpf.c | 866 ++++++++++++++++++++++++++++++++++++++++-
> tools/lib/bpf/libbpf.h |   1 +
> 2 files changed, 861 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 8741c39adb1c..86d87bf10d46 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -38,6 +38,7 @@
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <sys/vfs.h>
> +#include <sys/utsname.h>
> #include <tools/libc_compat.h>
> #include <libelf.h>
> #include <gelf.h>
> @@ -47,6 +48,7 @@
> #include "btf.h"
> #include "str_error.h"
> #include "libbpf_internal.h"
> +#include "hashmap.h"
> 
> #ifndef EM_BPF
> #define EM_BPF 247
> @@ -1013,16 +1015,22 @@ static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict)
> }
> 
> static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
> -						     __u32 id)
> +						     __u32 id,
> +						     __u32 *res_id)
> {
> 	const struct btf_type *t = btf__type_by_id(btf, id);
> 
> +	if (res_id)
> +		*res_id = id;
> +
> 	while (true) {
> 		switch (BTF_INFO_KIND(t->info)) {
> 		case BTF_KIND_VOLATILE:
> 		case BTF_KIND_CONST:
> 		case BTF_KIND_RESTRICT:
> 		case BTF_KIND_TYPEDEF:
> +			if (res_id)
> +				*res_id = t->type;
> 			t = btf__type_by_id(btf, t->type);
> 			break;
> 		default:
> @@ -1041,7 +1049,7 @@ static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
> static bool get_map_field_int(const char *map_name, const struct btf *btf,
> 			      const struct btf_type *def,
> 			      const struct btf_member *m, __u32 *res) {
> -	const struct btf_type *t = skip_mods_and_typedefs(btf, m->type);
> +	const struct btf_type *t = skip_mods_and_typedefs(btf, m->type, NULL);
> 	const char *name = btf__name_by_offset(btf, m->name_off);
> 	const struct btf_array *arr_info;
> 	const struct btf_type *arr_t;
> @@ -1107,7 +1115,7 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj,
> 		return -EOPNOTSUPP;
> 	}
> 
> -	def = skip_mods_and_typedefs(obj->btf, var->type);
> +	def = skip_mods_and_typedefs(obj->btf, var->type, NULL);
> 	if (BTF_INFO_KIND(def->info) != BTF_KIND_STRUCT) {
> 		pr_warning("map '%s': unexpected def kind %u.\n",
> 			   map_name, BTF_INFO_KIND(var->info));
> @@ -2289,6 +2297,845 @@ bpf_program_reloc_btf_ext(struct bpf_program *prog, struct bpf_object *obj,
> 	return 0;
> }
> 
> +#define BPF_CORE_SPEC_MAX_LEN 64
> +
> +/* represents BPF CO-RE field or array element accessor */
> +struct bpf_core_accessor {
> +	__u32 type_id;		/* struct/union type or array element type */
> +	__u32 idx;		/* field index or array index */
> +	const char *name;	/* field name or NULL for array accessor */
> +};
> +
> +struct bpf_core_spec {
> +	const struct btf *btf;
> +	/* high-level spec: named fields and array indices only */
> +	struct bpf_core_accessor spec[BPF_CORE_SPEC_MAX_LEN];
> +	/* high-level spec length */
> +	int len;
> +	/* raw, low-level spec: 1-to-1 with accessor spec string */
> +	int raw_spec[BPF_CORE_SPEC_MAX_LEN];
> +	/* raw spec length */
> +	int raw_len;
> +	/* field byte offset represented by spec */
> +	__u32 offset;
> +};
> +
> +static bool str_is_empty(const char *s)
> +{
> +	return !s || !s[0];
> +}
> +
> +static int btf_kind(const struct btf_type *t)
> +{
> +	return BTF_INFO_KIND(t->info);
> +}
> +
> +static bool btf_is_composite(const struct btf_type *t)
> +{
> +	int kind = btf_kind(t);
> +
> +	return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
> +}
> +
> +static bool btf_is_array(const struct btf_type *t)
> +{
> +	return btf_kind(t) == BTF_KIND_ARRAY;
> +}
> +
> +/* 
> + * Turn bpf_offset_reloc into a low- and high-level spec representation,
> + * validating correctness along the way, as well as calculating resulting
> + * field offset (in bytes), specified by accessor string. Low-level spec
> + * captures every single level of nestedness, including traversing anonymous
> + * struct/union members. High-level one only captures semantically meaningful
> + * "turning points": named fields and array indices.
> + * E.g., for this case:
> + *
> + *   struct sample {
> + *       int __unimportant;
> + *       struct {
> + *           int __1;
> + *           int __2;
> + *           int a[7];
> + *       };
> + *   };
> + *
> + *   struct sample *s = ...;
> + *
> + *   int *x = &s->a[3]; // access string = '0:1:2:3'
> + *
> + * Low-level spec has 1:1 mapping with each element of access string (it's
> + * just a parsed access string representation): [0, 1, 2, 3].
> + *
> + * High-level spec will capture only 3 points:
> + *   - initial zero-index access by pointer (&s->... is the same as &s[0]...);
> + *   - field 'a' access (corresponds to '2' in low-level spec);
> + *   - array element #3 access (corresponds to '3' in low-level spec).
> + *
> + */
> +static int bpf_core_spec_parse(const struct btf *btf,
> +			       __u32 type_id,
> +			       const char *spec_str,
> +			       struct bpf_core_spec *spec)
> +{
> +	int access_idx, parsed_len, i;
> +	const struct btf_type *t;
> +	__u32 id = type_id;
> +	const char *name;
> +	__s64 sz;
> +
> +	if (str_is_empty(spec_str) || *spec_str == ':')
> +		return -EINVAL;
> +
> +	memset(spec, 0, sizeof(*spec));
> +	spec->btf = btf;
> +
> +	/* parse spec_str="0:1:2:3:4" into array raw_spec=[0, 1, 2, 3, 4] */
> +	while (*spec_str) {
> +		if (*spec_str == ':')
> +			++spec_str;
> +		if (sscanf(spec_str, "%d%n", &access_idx, &parsed_len) != 1)
> +			return -EINVAL;
> +		if (spec->raw_len == BPF_CORE_SPEC_MAX_LEN)
> +			return -E2BIG;
> +		spec_str += parsed_len;
> +		spec->raw_spec[spec->raw_len++] = access_idx;
> +	}
> +
> +	if (spec->raw_len == 0)
> +		return -EINVAL;
> +
> +	for (i = 0; i < spec->raw_len; i++) {
> +		t = skip_mods_and_typedefs(btf, id, &id);
> +		if (!t)
> +			return -EINVAL;
> +
> +		access_idx = spec->raw_spec[i];
> +
> +		if (i == 0) {
> +			/* first spec value is always reloc type array index */
> +			spec->spec[spec->len].type_id = id;
> +			spec->spec[spec->len].idx = access_idx;
> +			spec->len++;
> +
> +			sz = btf__resolve_size(btf, id);
> +			if (sz < 0)
> +				return sz;
> +			spec->offset += access_idx * sz;
> +			continue;
> +		}
> +
> +		if (btf_is_composite(t)) {
> +			const struct btf_member *m = (void *)(t + 1);
> +			__u32 offset;
> +
> +			if (access_idx >= BTF_INFO_VLEN(t->info))
> +				return -EINVAL;
> +
> +			m = &m[access_idx];
> +
> +			if (BTF_INFO_KFLAG(t->info)) {
> +				if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
> +					return -EINVAL;
> +				offset = BTF_MEMBER_BIT_OFFSET(m->offset);
> +			} else {
> +				offset = m->offset;
> +			}
> +			if (m->offset % 8)
> +				return -EINVAL;
> +			spec->offset += offset / 8;
> +
> +			if (m->name_off) {
> +				name = btf__name_by_offset(btf, m->name_off);
> +				if (str_is_empty(name))
> +					return -EINVAL;
> +
> +				spec->spec[spec->len].type_id = id;
> +				spec->spec[spec->len].idx = access_idx;
> +				spec->spec[spec->len].name = name;
> +				spec->len++;
> +			}
> +
> +			id = m->type;
> +		} else if (btf_is_array(t)) {
> +			const struct btf_array *a = (void *)(t + 1);
> +
> +			t = skip_mods_and_typedefs(btf, a->type, &id);
> +			if (!t || access_idx >= a->nelems)
> +				return -EINVAL;
> +
> +			spec->spec[spec->len].type_id = id;
> +			spec->spec[spec->len].idx = access_idx;
> +			spec->len++;
> +
> +			sz = btf__resolve_size(btf, id);
> +			if (sz < 0)
> +				return sz;
> +			spec->offset += access_idx * sz;
> +		} else {
> +			pr_warning("relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %d\n",
> +				   type_id, spec_str, i, id, btf_kind(t));
> +			return -EINVAL;
> +		}
> +	}
> +
> +	if (spec->len == 0)
> +		return -EINVAL;
> +
> +	return 0;
> +}
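
The string-parsing step of the function above can be tried in isolation.
This is a userspace sketch, not the libbpf code: the names are made up,
and the sample struct renames the fields because identifiers starting
with a double underscore are reserved in C:

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define SPEC_MAX_LEN_SKETCH 64

/* Parse "0:1:2:3" into out[] = {0, 1, 2, 3}. Returns the number of
 * indices parsed, or a negative errno value on malformed input.
 */
static int parse_access_str_sketch(const char *s, int *out)
{
	int len = 0, idx, n;

	if (!s || !*s || *s == ':')
		return -EINVAL;

	while (*s) {
		if (*s == ':')
			++s;
		/* %n stores how many characters %d consumed */
		if (sscanf(s, "%d%n", &idx, &n) != 1)
			return -EINVAL;
		if (len == SPEC_MAX_LEN_SKETCH)
			return -E2BIG;
		s += n;
		out[len++] = idx;
	}
	return len;
}

/* Userspace copy of the example struct from the comment above. */
struct sample_sketch {
	int unimportant;
	struct {
		int f1;
		int f2;
		int a[7];
	};
};

/* Byte offset reached by "0:1:2:3": element 0 of the pointed-to type,
 * then the anonymous struct, then field 'a', then array element 3.
 */
static size_t sample_a3_offset_sketch(void)
{
	return 0 * sizeof(struct sample_sketch) +
	       offsetof(struct sample_sketch, a) + 3 * sizeof(int);
}
```
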
> +
> +/* Given 'some_struct_name___with_flavor' return the length of a name prefix
> + * before last triple underscore. Struct name part after last triple
> + * underscore is ignored by BPF CO-RE relocation during relocation matching.
> + */
> +static size_t bpf_core_essential_name_len(const char *name)
> +{
> +	size_t n = strlen(name);
> +	int i = n - 3;
> +
> +	while (i > 0) {
> +		if (name[i] == '_' && name[i + 1] == '_' && name[i + 2] == '_')
> +			return i;
> +		i--;
> +	}
> +	return n;
> +}
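
A standalone sketch of the "essential name" rule, for experimenting with
flavored names. It follows the loop above but guards names shorter than
three characters explicitly; the function name is illustrative:

```c
#include <stddef.h>
#include <string.h>

/* Length of the name prefix before the last "___", or the full length
 * when there is no flavor suffix. A "___" at index 0 is not treated as
 * a separator, mirroring the i > 0 loop condition above.
 */
static size_t essential_name_len_sketch(const char *name)
{
	size_t n = strlen(name);
	size_t i;

	if (n < 3)
		return n;

	for (i = n - 3; i > 0; i--) {
		if (name[i] == '_' && name[i + 1] == '_' &&
		    name[i + 2] == '_')
			return i;
	}
	return n;
}
```

So "task_struct___v2" and "task_struct" both have an essential length of
11 and are treated as the same name for matching purposes.
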
> +
> +/* dynamically sized list of type IDs */
> +struct ids_vec {
> +	__u32 *data;
> +	int len;
> +};
> +
> +static void bpf_core_free_cands(struct ids_vec *cand_ids)
> +{
> +	free(cand_ids->data);
> +	free(cand_ids);
> +}
> +
> +static struct ids_vec *bpf_core_find_cands(const struct btf *local_btf,
> +					   __u32 local_type_id,
> +					   const struct btf *targ_btf)
> +{
> +	size_t local_essent_len, targ_essent_len;
> +	const char *local_name, *targ_name;
> +	const struct btf_type *t;
> +	struct ids_vec *cand_ids;
> +	__u32 *new_ids;
> +	int i, err, n;
> +
> +	t = btf__type_by_id(local_btf, local_type_id);
> +	if (!t)
> +		return ERR_PTR(-EINVAL);
> +
> +	local_name = btf__name_by_offset(local_btf, t->name_off);
> +	if (str_is_empty(local_name))
> +		return ERR_PTR(-EINVAL);
> +	local_essent_len = bpf_core_essential_name_len(local_name);
> +
> +	cand_ids = calloc(1, sizeof(*cand_ids));
> +	if (!cand_ids)
> +		return ERR_PTR(-ENOMEM);
> +
> +	n = btf__get_nr_types(targ_btf);
> +	for (i = 1; i <= n; i++) {
> +		t = btf__type_by_id(targ_btf, i);
> +		targ_name = btf__name_by_offset(targ_btf, t->name_off);
> +		if (str_is_empty(targ_name))
> +			continue;
> +
> +		targ_essent_len = bpf_core_essential_name_len(targ_name);
> +		if (targ_essent_len != local_essent_len)
> +			continue;
> +
> +		if (strncmp(local_name, targ_name, local_essent_len) == 0) {
> +			pr_debug("[%d] (%s): found candidate [%d] (%s)\n",
> +				 local_type_id, local_name, i, targ_name);
> +			new_ids = realloc(cand_ids->data, (cand_ids->len + 1) * sizeof(*new_ids));
> +			if (!new_ids) {
> +				err = -ENOMEM;
> +				goto err_out;
> +			}
> +			cand_ids->data = new_ids;
> +			cand_ids->data[cand_ids->len++] = i;
> +		}
> +	}
> +	return cand_ids;
> +err_out:
> +	bpf_core_free_cands(cand_ids);
> +	return ERR_PTR(err);
> +}
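
When growing a list like struct ids_vec with realloc(), the size
argument is a byte count, so it has to be the new element count
multiplied by the element size. A minimal sketch with hypothetical
names:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Dynamically sized list of type IDs, as in the patch. */
struct ids_vec_sketch {
	uint32_t *data;
	int len;
};

/* Append one ID, growing the array by one element. Returns 0 on
 * success or -ENOMEM, in which case the vector is left unchanged.
 */
static int ids_vec_push_sketch(struct ids_vec_sketch *v, uint32_t id)
{
	uint32_t *new_data;

	/* realloc() takes bytes: scale the element count by the element size */
	new_data = realloc(v->data, ((size_t)v->len + 1) * sizeof(*new_data));
	if (!new_data)
		return -ENOMEM;

	v->data = new_data;
	v->data[v->len++] = id;
	return 0;
}
```

Growing by one element per push is the simplest policy; doubling the
capacity would cut the number of reallocations if candidate lists get
long.
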
> +
> +/* Check two types for compatibility, skipping const/volatile/restrict and
> + * typedefs, to ensure we are relocating offset to the compatible entities:
> + *   - any two STRUCTs/UNIONs are compatible and can be mixed;
> + *   - any two FWDs are compatible;
> + *   - any two PTRs are always compatible;
> + *   - for ENUMs, check sizes, names are ignored;
> + *   - for INT, size and bitness should match, signedness is ignored;
> + *   - for ARRAY, dimensionality is ignored, element types are checked for
> + *     compatibility recursively;
> + *   - everything else shouldn't be ever a target of relocation.
> + * These rules are not set in stone and probably will be adjusted as we get
> + * more experience with using BPF CO-RE relocations.
> + */
> +static int bpf_core_fields_are_compat(const struct btf *local_btf,
> +				      __u32 local_id,
> +				      const struct btf *targ_btf,
> +				      __u32 targ_id)
> +{
> +	const struct btf_type *local_type, *targ_type;
> +	__u16 kind;
> +
> +recur:
> +	local_type = skip_mods_and_typedefs(local_btf, local_id, &local_id);
> +	targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id);
> +	if (!local_type || !targ_type)
> +		return -EINVAL;
> +
> +	if (btf_is_composite(local_type) && btf_is_composite(targ_type))
> +		return 1;
> +	if (BTF_INFO_KIND(local_type->info) != BTF_INFO_KIND(targ_type->info))
> +		return 0;
> +
> +	kind = BTF_INFO_KIND(local_type->info);
> +	switch (kind) {
> +	case BTF_KIND_FWD:
> +	case BTF_KIND_PTR:
> +		return 1;
> +	case BTF_KIND_ENUM:
> +		return local_type->size == targ_type->size;
> +	case BTF_KIND_INT: {
> +		__u32 loc_int = *(__u32 *)(local_type + 1);
> +		__u32 targ_int = *(__u32 *)(targ_type + 1);
> +
> +		return BTF_INT_OFFSET(loc_int) == 0 &&
> +		       BTF_INT_OFFSET(targ_int) == 0 &&
> +		       local_type->size == targ_type->size &&
> +		       BTF_INT_BITS(loc_int) == BTF_INT_BITS(targ_int);
> +	}
> +	case BTF_KIND_ARRAY: {
> +		const struct btf_array *loc_a, *targ_a;
> +
> +		loc_a = (void *)(local_type + 1);
> +		targ_a = (void *)(targ_type + 1);
> +		local_id = loc_a->type;
> +		targ_id = targ_a->type;
> +		goto recur;
> +	}
> +	default:
> +		pr_warning("unexpected kind %d relocated, local [%d], target [%d]\n",
> +			   kind, local_id, targ_id);
> +		return 0;
> +	}
> +}
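
The compatibility rules listed in the comment above can be modeled on a
simplified type descriptor. This is a sketch, not BTF: the enum, the
struct and its fields are invented for the example, and modifiers and
typedefs are assumed to be stripped already:

```c
#include <stdbool.h>
#include <stddef.h>

enum kind_sketch {
	K_INT, K_PTR, K_ARRAY, K_STRUCT, K_UNION, K_ENUM, K_FWD, K_FUNC,
};

struct type_sketch {
	enum kind_sketch kind;
	unsigned int size;		/* bytes; INT and ENUM only */
	unsigned int bits;		/* INT only */
	unsigned int bit_offset;	/* INT only */
	const struct type_sketch *elem;	/* ARRAY only */
};

static bool is_composite_sketch(const struct type_sketch *t)
{
	return t->kind == K_STRUCT || t->kind == K_UNION;
}

static bool fields_are_compat_sketch(const struct type_sketch *l,
				     const struct type_sketch *t)
{
	for (;;) {
		if (!l || !t)
			return false;
		/* any two structs/unions can be mixed */
		if (is_composite_sketch(l) && is_composite_sketch(t))
			return true;
		if (l->kind != t->kind)
			return false;

		switch (l->kind) {
		case K_FWD:
		case K_PTR:
			return true;
		case K_ENUM:
			return l->size == t->size;
		case K_INT:
			/* signedness is ignored, size and bitness are not */
			return l->bit_offset == 0 && t->bit_offset == 0 &&
			       l->size == t->size && l->bits == t->bits;
		case K_ARRAY:
			/* dimensions ignored: compare element types next */
			l = l->elem;
			t = t->elem;
			break;
		default:
			return false;
		}
	}
}
```
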
> +
> +/* 
> + * Given single high-level accessor (either named field or array index) in
> + * local type, find corresponding high-level accessor for a target type. Along
> + * the way, maintain low-level spec for target as well. Also keep updating
> + * target offset.
> + */
> +static int bpf_core_match_member(const struct btf *local_btf,
> +				 const struct bpf_core_accessor *local_acc,
> +				 const struct btf *targ_btf,
> +				 __u32 targ_id,
> +				 struct bpf_core_spec *spec,
> +				 __u32 *next_targ_id)
> +{
> +	const struct btf_type *local_type, *targ_type;
> +	const struct btf_member *local_member, *m;
> +	const char *local_name, *targ_name;
> +	__u32 local_id;
> +	int i, n, found;
> +
> +	targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id);
> +	if (!targ_type)
> +		return -EINVAL;
> +	if (!btf_is_composite(targ_type))
> +		return 0;
> +
> +	local_id = local_acc->type_id;
> +	local_type = btf__type_by_id(local_btf, local_id);
> +	local_member = (void *)(local_type + 1);
> +	local_member += local_acc->idx;
> +	local_name = btf__name_by_offset(local_btf, local_member->name_off);
> +
> +	n = BTF_INFO_VLEN(targ_type->info);
> +	m = (void *)(targ_type + 1);
> +	for (i = 0; i < n; i++, m++) {
> +		__u32 offset;
> +
> +		/* bitfield relocations not supported */
> +		if (BTF_INFO_KFLAG(targ_type->info)) {
> +			if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
> +				continue;
> +			offset = BTF_MEMBER_BIT_OFFSET(m->offset);
> +		} else {
> +			offset = m->offset;
> +		}
> +		if (offset % 8)
> +			continue;
> +
> +		/* too deep struct/union/array nesting */
> +		if (spec->raw_len == BPF_CORE_SPEC_MAX_LEN)
> +			return -E2BIG;
> +
> +		/* speculate this member will be the good one */
> +		spec->offset += offset / 8;
> +		spec->raw_spec[spec->raw_len++] = i;
> +
> +		targ_name = btf__name_by_offset(targ_btf, m->name_off);
> +		if (str_is_empty(targ_name)) {
> +			/* embedded struct/union, we need to go deeper */
> +			found = bpf_core_match_member(local_btf, local_acc,
> +						      targ_btf, m->type,
> +						      spec, next_targ_id);
> +			if (found) /* either found or error */
> +				return found;
> +		} else if (strcmp(local_name, targ_name) == 0) {
> +			/* matching named field */
> +			struct bpf_core_accessor *targ_acc;
> +
> +			targ_acc = &spec->spec[spec->len++];
> +			targ_acc->type_id = targ_id;
> +			targ_acc->idx = i;
> +			targ_acc->name = targ_name;
> +
> +			*next_targ_id = m->type;
> +			found = bpf_core_fields_are_compat(local_btf,
> +							   local_member->type,
> +							   targ_btf, m->type);
> +			if (!found)
> +				spec->len--; /* pop accessor */
> +			return found;
> +		}
> +		/* member turned out to be not we looked for */

	/* member turned out not to be what we looked for */ 

Or something similar. 

The rest of this patch looks good to me. 

Thanks,
Song



* Re: [PATCH bpf-next 01/10] libbpf: add .BTF.ext offset relocation section loading
From: Song Liu @ 2019-07-29 20:00 UTC
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Yonghong Song, Kernel Team
In-Reply-To: <CAEf4BzYgyAiPt0wVESrWSJ_tLheq0BRWLgrqMfLZsnp11+F77Q@mail.gmail.com>



> On Jul 26, 2019, at 10:11 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Wed, Jul 24, 2019 at 10:20 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Jul 24, 2019, at 5:37 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>> 
>>> On Wed, Jul 24, 2019 at 5:00 PM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Jul 24, 2019, at 12:27 PM, Andrii Nakryiko <andriin@fb.com> wrote:
>>>>> 
>>>>> Add support for BPF CO-RE offset relocations. Add section/record
>>>>> iteration macros for .BTF.ext. These macros are useful for iterating over
>>>>> each .BTF.ext record, either for dumping out contents or later for BPF
>>>>> CO-RE relocation handling.
>>>>> 
>>>>> To enable other parts of libbpf to work with .BTF.ext contents, moved
>>>>> a bunch of type definitions into libbpf_internal.h.
>>>>> 
>>>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>>>>> ---
>>>>> tools/lib/bpf/btf.c             | 64 +++++++++--------------
>>>>> tools/lib/bpf/btf.h             |  4 ++
>>>>> tools/lib/bpf/libbpf_internal.h | 91 +++++++++++++++++++++++++++++++++
>>>>> 3 files changed, 118 insertions(+), 41 deletions(-)
>>>>> 
>>> 
>>> [...]
>>> 
>>>>> +
>>>>> static int btf_ext_parse_hdr(__u8 *data, __u32 data_size)
>>>>> {
>>>>>     const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
>>>>> @@ -1004,6 +979,13 @@ struct btf_ext *btf_ext__new(__u8 *data, __u32 size)
>>>>>     if (err)
>>>>>             goto done;
>>>>> 
>>>>> +     /* check if there are offset_reloc_off/offset_reloc_len fields */
>>>>> +     if (btf_ext->hdr->hdr_len < sizeof(struct btf_ext_header))
>>>> 
>>>> This check will break when we add more optional sections to btf_ext_header.
>>>> Maybe use offsetof() instead?
>>> 
>>> I didn't do it, because there are no fields after offset_reloc_len.
>>> But now I thought that maybe it would be ok to add a zero-sized marker
>>> field, kind of like marking off various versions of btf_ext header?
>>> 
>>> Alternatively, I can add offsetofend() macro somewhere in libbpf_internal.h.
>>> 
>>> Do you have any preference?
>> 
>> We only need a stable number to compare against. offsetofend() works.
>> Or we can simply have something like
>> 
>>    if (btf_ext->hdr->hdr_len <= offsetof(struct btf_ext_header, offset_reloc_off))
>>          goto done;
>> or
>>    if (btf_ext->hdr->hdr_len < offsetof(struct btf_ext_header, offset_reloc_len))
>>          goto done;
>> 
>> Does this make sense?
> 
> I think offsetofend() is the cleanest solution, I'll do just that.

Agreed that offsetofend() is the best. 
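
For illustration, a userspace sketch of the offsetofend() check. The
macro mirrors the usual definition; the header layout is an assumption
built around the field names mentioned in this thread and may not match
the real struct btf_ext_header exactly:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the first byte past MEMBER within TYPE. */
#define offsetofend_sketch(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Assumed layout, for this example only. */
struct btf_ext_header_sketch {
	uint16_t magic;
	uint8_t version;
	uint8_t flags;
	uint32_t hdr_len;
	uint32_t func_info_off;
	uint32_t func_info_len;
	uint32_t line_info_off;
	uint32_t line_info_len;
	/* optional part, absent in headers written by older tools */
	uint32_t offset_reloc_off;
	uint32_t offset_reloc_len;
};

/* True when a header of hdr_len bytes includes both offset_reloc
 * fields. The bound stays stable if more optional fields are appended
 * later, unlike a comparison against sizeof() of the whole struct.
 */
static bool hdr_has_offset_reloc_sketch(uint32_t hdr_len)
{
	return hdr_len >= offsetofend_sketch(struct btf_ext_header_sketch,
					     offset_reloc_len);
}
```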

Song



* Re: [net-next 1/2] ipvs: batch __ip_vs_cleanup
From: Julian Anastasov @ 2019-07-29 20:03 UTC
  To: Haishuang Yan
  Cc: David S. Miller, Pablo Neira Ayuso, Simon Horman, netdev,
	lvs-devel, linux-kernel, netfilter-devel
In-Reply-To: <8441EA26-E197-4F40-A6D7-5B7D59AA7F7F@cmss.chinamobile.com>


	Hello,

On Thu, 18 Jul 2019, Haishuang Yan wrote:

> As the following benchmark testing results show, there is a little performance improvement:

	OK, can you send v2 after removing the LIST_HEAD(list) from
both patches? I guess it is not needed. If you prefer, you can
include these benchmark results too.

> $  cat add_del_unshare.sh
> #!/bin/bash
> 
> for i in `seq 1 100`
>     do
>      (for j in `seq 1 40` ; do  unshare -n ipvsadm -A -t 172.16.$i.$j:80 >/dev/null ; done) &
>     done
> wait; grep net_namespace /proc/slabinfo
> 
> Before patch:
> $  time sh add_del_unshare.sh
> net_namespace       4020   4020   4736    6    8 : tunables    0    0    0 : slabdata    670    670      0
> 
> real    0m8.086s
> user    0m2.025s
> sys     0m36.956s
> 
> After patch:
> $  time sh add_del_unshare.sh
> net_namespace       4020   4020   4736    6    8 : tunables    0    0    0 : slabdata    670    670      0
> 
> real    0m7.623s
> user    0m2.003s
> sys     0m32.935s
> 
> 
> > 
> >> +		ipvs = net_ipvs(net);
> >> +		ip_vs_conn_net_cleanup(ipvs);
> >> +		ip_vs_app_net_cleanup(ipvs);
> >> +		ip_vs_protocol_net_cleanup(ipvs);
> >> +		ip_vs_control_net_cleanup(ipvs);
> >> +		ip_vs_estimator_net_cleanup(ipvs);
> >> +		IP_VS_DBG(2, "ipvs netns %d released\n", ipvs->gen);
> >> +		net->ipvs = NULL;

Regards

--
Julian Anastasov <ja@ssi.bg>


* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
From: Michael S. Tsirkin @ 2019-07-29 19:33 UTC
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <CAGxU2F5F1KcaFNJ6n7++ApZiYMGnoEWKVRgo3Vc4h5hpxSJEZg@mail.gmail.com>

On Mon, Jul 29, 2019 at 06:41:27PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 12:01:37PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > and pushed to the guest using the vring, are directly queued in
> > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > with a fixed size (4 KB).
> > > > >
> > > > > The maximum amount of memory used by each socket should be
> > > > > controlled by the credit mechanism.
> > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > to avoid starvation of other sockets.
> > > > >
> > > > > This patch mitigates this issue by copying the payload of small
> > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > order to avoid wasting memory.
> > > > >
> > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > >
> > > > This is good enough for net-next, but for net I think we
> > > > should figure out how to address the issue completely.
> > > > Can we make the accounting precise? What happens to
> > > > performance if we do?
> > > >
> > >
> > > In order to do more precise accounting maybe we can use the buffer size,
> > > instead of payload size when we update the credit available.
> > > In this way, the credit available for each socket will reflect the memory
> > > actually used.
> > >
> > > I should check more carefully, because I'm not sure what happens if the peer sees
> > > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > > buffer).
> > > The other option is to copy each packet in a new buffer like I did in
> > > the v2 [2], but this forces us to make a copy for each packet that does
> > > not fill the entire buffer, perhaps too expensive.
> > >
> > > [2] https://patchwork.kernel.org/patch/10938741/
> > >
> >
> > So one thing we can easily do is to under-report the
> > available credit. E.g. if we copy up to 256bytes,
> > then report just 256bytes for every buffer in the queue.
> >
> 
> Ehm sorry, I got lost :(
> Can you explain better?
> 
> 
> Thanks,
> Stefano

I think I suggested a better idea more recently.
But to clarify this option: we are adding a 4K buffer.
Let's say we know we will always copy 128 bytes.

So we just tell remote we have 128.
If we add another 4K buffer we add another 128 credits.

So we are charging local socket 16x more (4k for a 128 byte packet) but
we are paying remote 16x less (128 credits for 4k byte buffer). It evens
out.

That leaves far fewer credits to go around, so I'm not sure it's a good idea,
at least as the only solution. Can be combined with other
optimizations and probably in a less drastic fashion
(e.g. 2x rather than 16x).
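
To make the numbers in this exchange concrete, here is a small standalone C
sketch of the arithmetic. It is illustrative only and is not code from
virtio-vsock: the helper names (worst_case_bytes, advertised_credit,
charge_ratio) are hypothetical, and the constants are the 4 KB buffer size
and 256 KB default credit quoted in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_SIZE       4096u           /* preallocated rx buffer size quoted in the thread */
#define DEFAULT_CREDIT (256u * 1024u)  /* default per-socket credit quoted in the thread */

/* Memory pinned in the worst case, when every packet carries only
 * payload_len bytes but still occupies a whole BUF_SIZE buffer.
 */
static uint64_t worst_case_bytes(uint32_t credit, uint32_t payload_len)
{
	assert(payload_len > 0);
	return (uint64_t)(credit / payload_len) * BUF_SIZE;
}

/* Credit advertised to the peer if each queued buffer is reported as
 * only credit_per_buf bytes (the under-reporting option).
 */
static uint64_t advertised_credit(uint32_t nbufs, uint32_t credit_per_buf)
{
	return (uint64_t)nbufs * credit_per_buf;
}

/* How many times more memory a buffer costs locally than the credit
 * reported for it.
 */
static uint32_t charge_ratio(uint32_t credit_per_buf)
{
	assert(credit_per_buf > 0);
	return BUF_SIZE / credit_per_buf;
}
```

With these definitions, worst_case_bytes(DEFAULT_CREDIT, 1) gives 1 GB,
matching the figure in the patch description. charge_ratio(128) is 32 and
charge_ratio(256) is 16, so a 16x factor corresponds to reporting 256 bytes
of credit per 4 KB buffer.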


-- 
MST

^ permalink raw reply

* [PATCH] net: hamradio: baycom_epp: Mark expected switch fall-through
From: Gustavo A. R. Silva @ 2019-07-29 20:12 UTC (permalink / raw)
  To: Thomas Sailer, David S. Miller
  Cc: linux-hams, netdev, linux-kernel, Gustavo A. R. Silva, Kees Cook

Mark switch cases where we are expecting to fall through.

This patch fixes the following warning (Building: i386):

drivers/net/hamradio/baycom_epp.c: In function ‘transmit’:
drivers/net/hamradio/baycom_epp.c:491:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    if (i) {
       ^
drivers/net/hamradio/baycom_epp.c:504:3: note: here
   default:  /* fall through */
   ^~~~~~~

Notice that, in this particular case, the code comment is
modified in accordance with what GCC is expecting to find.
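
As a rough illustration of what GCC expects here, the sketch below uses a
hypothetical function (steps_for_state, not taken from baycom_epp.c). With
-Wimplicit-fallthrough at its default level, GCC accepts a fall-through
comment only when it sits immediately before the next case or default
label, which is why the comment is moved above "default:" in this patch.

```c
#include <assert.h>

/* Hypothetical example: state 0 does its own work and then shares
 * the work of state 1.
 */
static int steps_for_state(int state)
{
	int steps = 0;

	switch (state) {
	case 0:
		steps++;
		/* fall through */
	case 1:
		steps++;
		break;
	default:
		steps = -1;
		break;
	}
	return steps;
}

/* Small self-check of the intended behaviour. */
void fallthrough_demo(void)
{
	assert(steps_for_state(0) == 2);	/* case 0 falls into case 1 */
	assert(steps_for_state(1) == 1);
}
```

A comment placed after the label, as in the old "default:  /* fall through */"
line, is not seen as a marker for the preceding case, so the warning fires.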

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/net/hamradio/baycom_epp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hamradio/baycom_epp.c b/drivers/net/hamradio/baycom_epp.c
index daab2c07d891..9303aeb2595f 100644
--- a/drivers/net/hamradio/baycom_epp.c
+++ b/drivers/net/hamradio/baycom_epp.c
@@ -500,8 +500,9 @@ static int transmit(struct baycom_state *bc, int cnt, unsigned char stat)
 				}
 				break;
 			}
+			/* fall through */
 
-		default:  /* fall through */
+		default:
 			if (bc->hdlctx.calibrate <= 0)
 				return 0;
 			i = min_t(int, cnt, bc->hdlctx.calibrate);
-- 
2.22.0


^ permalink raw reply related

* Re: [patch net-next 0/3] net: devlink: Finish network namespace support
From: David Ahern @ 2019-07-29 20:17 UTC (permalink / raw)
  To: Jiri Pirko, netdev; +Cc: davem, jakub.kicinski, sthemmin, mlxsw
In-Reply-To: <20190727094459.26345-1-jiri@resnulli.us>

On 7/27/19 3:44 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Devlink has accounted for network namespaces from the beginning, but the
> instances have been fixed to init_net. The first patch allows the user
> to move existing devlink instances into namespaces:
> 

So you intend for an ASIC, for example, to have multiple devlink
instances where each instance governs a set of related ports (e.g.,
ports that share a set of hardware resources) and those instances can be
managed from distinct network namespaces?


^ permalink raw reply

* Re: [patch iproute2 1/2] devlink: introduce cmdline option to switch to a different namespace
From: David Ahern @ 2019-07-29 20:17 UTC (permalink / raw)
  To: Jiri Pirko, netdev; +Cc: davem, jakub.kicinski, sthemmin, mlxsw
In-Reply-To: <20190727100544.28649-1-jiri@resnulli.us>

The subject line should have iproute2-next.

^ permalink raw reply

* Re: [PATCH bpf-next 00/10] CO-RE offset relocations
From: Song Liu @ 2019-07-29 20:20 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Yonghong Song, Andrii Nakryiko, Kernel Team
In-Reply-To: <20190724192742.1419254-1-andriin@fb.com>

On Wed, Jul 24, 2019 at 1:34 PM Andrii Nakryiko <andriin@fb.com> wrote:
>
> This patch set implements the central part of CO-RE (Compile Once - Run
> Everywhere, see [0] and [1] for slides and video): relocating field offsets.
> Most of the details are written down as comments to corresponding parts of the
> code.
>
> Patch #1 adds loading of .BTF.ext offset relocations section and macros to
> work with its contents.
> Patch #2 implements the CO-RE relocation algorithm in libbpf.
> Patches #3-#10 add selftests validating various parts of relocation handling,
> type compatibility, etc.
>
> For all tests to work, you'll need latest Clang/LLVM supporting
> __builtin_preserve_access_index intrinsic, used for recording offset
> relocations. Kernel on which selftests run should have BTF information built
> in (CONFIG_DEBUG_INFO_BTF=y).
>
>   [0] http://vger.kernel.org/bpfconf2019.html#session-2
>   [1] http://vger.kernel.org/lpc-bpf2018.html#session-2
>
> Andrii Nakryiko (10):
>   libbpf: add .BTF.ext offset relocation section loading
>   libbpf: implement BPF CO-RE offset relocation algorithm
>   selftests/bpf: add CO-RE relocs testing setup
>   selftests/bpf: add CO-RE relocs struct flavors tests
>   selftests/bpf: add CO-RE relocs nesting tests
>   selftests/bpf: add CO-RE relocs array tests
>   selftests/bpf: add CO-RE relocs enum/ptr/func_proto tests
>   selftests/bpf: add CO-RE relocs modifiers/typedef tests
>   selftest/bpf: add CO-RE relocs ptr-as-array tests
>   selftests/bpf: add CO-RE relocs ints tests
>
>  tools/lib/bpf/btf.c                           |  64 +-
>  tools/lib/bpf/btf.h                           |   4 +
>  tools/lib/bpf/libbpf.c                        | 866 +++++++++++++++++-
>  tools/lib/bpf/libbpf.h                        |   1 +
>  tools/lib/bpf/libbpf_internal.h               |  91 ++
>  .../selftests/bpf/prog_tests/core_reloc.c     | 363 ++++++++
>  .../bpf/progs/btf__core_reloc_arrays.c        |   3 +
>  .../btf__core_reloc_arrays___diff_arr_dim.c   |   3 +
>  ...btf__core_reloc_arrays___diff_arr_val_sz.c |   3 +
>  .../btf__core_reloc_arrays___err_non_array.c  |   3 +
>  ...btf__core_reloc_arrays___err_too_shallow.c |   3 +
>  .../btf__core_reloc_arrays___err_too_small.c  |   3 +
>  ..._core_reloc_arrays___err_wrong_val_type1.c |   3 +
>  ..._core_reloc_arrays___err_wrong_val_type2.c |   3 +
>  .../bpf/progs/btf__core_reloc_flavors.c       |   3 +
>  .../btf__core_reloc_flavors__err_wrong_name.c |   3 +
>  .../bpf/progs/btf__core_reloc_ints.c          |   3 +
>  .../bpf/progs/btf__core_reloc_ints___bool.c   |   3 +
>  .../btf__core_reloc_ints___err_bitfield.c     |   3 +
>  .../btf__core_reloc_ints___err_wrong_sz_16.c  |   3 +
>  .../btf__core_reloc_ints___err_wrong_sz_32.c  |   3 +
>  .../btf__core_reloc_ints___err_wrong_sz_64.c  |   3 +
>  .../btf__core_reloc_ints___err_wrong_sz_8.c   |   3 +
>  .../btf__core_reloc_ints___reverse_sign.c     |   3 +
>  .../bpf/progs/btf__core_reloc_mods.c          |   3 +
>  .../progs/btf__core_reloc_mods___mod_swap.c   |   3 +
>  .../progs/btf__core_reloc_mods___typedefs.c   |   3 +
>  .../bpf/progs/btf__core_reloc_nesting.c       |   3 +
>  .../btf__core_reloc_nesting___anon_embed.c    |   3 +
>  ...f__core_reloc_nesting___dup_compat_types.c |   5 +
>  ...core_reloc_nesting___err_array_container.c |   3 +
>  ...tf__core_reloc_nesting___err_array_field.c |   3 +
>  ...e_reloc_nesting___err_dup_incompat_types.c |   4 +
>  ...re_reloc_nesting___err_missing_container.c |   3 +
>  ...__core_reloc_nesting___err_missing_field.c |   3 +
>  ..._reloc_nesting___err_nonstruct_container.c |   3 +
>  ...e_reloc_nesting___err_partial_match_dups.c |   4 +
>  .../btf__core_reloc_nesting___err_too_deep.c  |   3 +
>  .../btf__core_reloc_nesting___extra_nesting.c |   3 +
>  ..._core_reloc_nesting___struct_union_mixup.c |   3 +
>  .../bpf/progs/btf__core_reloc_primitives.c    |   3 +
>  ...f__core_reloc_primitives___diff_enum_def.c |   3 +
>  ..._core_reloc_primitives___diff_func_proto.c |   3 +
>  ...f__core_reloc_primitives___diff_ptr_type.c |   3 +
>  ...tf__core_reloc_primitives___err_non_enum.c |   3 +
>  ...btf__core_reloc_primitives___err_non_int.c |   3 +
>  ...btf__core_reloc_primitives___err_non_ptr.c |   3 +
>  .../bpf/progs/btf__core_reloc_ptr_as_arr.c    |   3 +
>  .../btf__core_reloc_ptr_as_arr___diff_sz.c    |   3 +
>  .../selftests/bpf/progs/core_reloc_types.h    | 642 +++++++++++++
>  .../bpf/progs/test_core_reloc_arrays.c        |  58 ++
>  .../bpf/progs/test_core_reloc_flavors.c       |  65 ++
>  .../bpf/progs/test_core_reloc_ints.c          |  48 +
>  .../bpf/progs/test_core_reloc_kernel.c        |  39 +
>  .../bpf/progs/test_core_reloc_mods.c          |  68 ++
>  .../bpf/progs/test_core_reloc_nesting.c       |  48 +
>  .../bpf/progs/test_core_reloc_primitives.c    |  50 +
>  .../bpf/progs/test_core_reloc_ptr_as_arr.c    |  34 +
>  58 files changed, 2527 insertions(+), 47 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/core_reloc.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_dim.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_val_sz.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_non_array.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_shallow.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_small.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type1.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type2.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___bool.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_bitfield.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_16.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_32.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_64.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_8.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___reverse_sign.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___mod_swap.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___typedefs.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___anon_embed.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___dup_compat_types.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_container.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_field.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_dup_incompat_types.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_container.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_field.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_nonstruct_container.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_partial_match_dups.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_too_deep.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___extra_nesting.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___struct_union_mixup.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_enum_def.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_func_proto.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_ptr_type.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_enum.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_int.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_ptr.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr___diff_sz.c
>  create mode 100644 tools/testing/selftests/bpf/progs/core_reloc_types.h
>  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ints.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_mods.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_nesting.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_primitives.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ptr_as_arr.c

We have created a lot of small files. Would it be cleaner if we could
somehow put this data in one file (maybe in different sections)?

Alternatively, maybe create a folder for these files:
  tools/testing/selftests/bpf/progs/core/

Thanks,
Song

^ permalink raw reply

* Re: [patch iproute2 1/2] devlink: introduce cmdline option to switch to a different namespace
From: David Ahern @ 2019-07-29 20:21 UTC (permalink / raw)
  To: Jiri Pirko, Toke Høiland-Jørgensen
  Cc: netdev, davem, jakub.kicinski, sthemmin, mlxsw
In-Reply-To: <20190727102116.GC2843@nanopsycho>

On 7/27/19 4:21 AM, Jiri Pirko wrote:
>>> diff --git a/devlink/devlink.c b/devlink/devlink.c
>>> index d8197ea3a478..9242cc05ad0c 100644
>>> --- a/devlink/devlink.c
>>> +++ b/devlink/devlink.c
>>> @@ -32,6 +32,7 @@
>>>  #include "mnlg.h"
>>>  #include "json_writer.h"
>>>  #include "utils.h"
>>> +#include "namespace.h"
>>>  
>>>  #define ESWITCH_MODE_LEGACY "legacy"
>>>  #define ESWITCH_MODE_SWITCHDEV "switchdev"
>>> @@ -6332,7 +6333,7 @@ static int cmd_health(struct dl *dl)
>>>  static void help(void)
>>>  {
>>>  	pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
>>> -	       "       devlink [ -f[orce] ] -b[atch] filename\n"
>>> +	       "       devlink [ -f[orce] ] -b[atch] filename -N[etns] netnsname\n"
>>
>> 'ip' uses lower-case n for this; why not be consistent?
> 
> Because "n" is taken :/

That's really unfortunate. Commands within a package should have similar
syntax, and -n/-N are backwards between ip/tc and devlink. That's the
kind of thing that drives users crazy.

^ permalink raw reply

* Re: [PATCH bpf-next 03/10] selftests/bpf: add CO-RE relocs testing setup
From: Song Liu @ 2019-07-29 20:22 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Yonghong Song, Andrii Nakryiko, Kernel Team
In-Reply-To: <20190724192742.1419254-4-andriin@fb.com>

On Wed, Jul 24, 2019 at 1:34 PM Andrii Nakryiko <andriin@fb.com> wrote:
>
> Add CO-RE relocation test runner. Add one simple test validating that
> libbpf's logic for searching for kernel image and loading BTF out of it
> works.
>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply

* RE: [PATCH net] hv_sock: Fix hang when a connection is closed
From: Dexuan Cui @ 2019-07-29 20:23 UTC (permalink / raw)
  To: Sunil Muthuswamy, David Miller, netdev@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com
In-Reply-To: <MW2PR2101MB1116DC8461F1B02C232019E2C0DD0@MW2PR2101MB1116.namprd21.prod.outlook.com>

> From: Sunil Muthuswamy <sunilmut@microsoft.com>
> Sent: Monday, July 29, 2019 10:21 AM
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -309,9 +309,16 @@ static void hvs_close_connection(struct vmbus_channel *chan)
> >  {
> >  	struct sock *sk = get_per_channel_state(chan);
> >
> > +	/* Grab an extra reference since hvs_do_close_lock_held() may decrease
> > +	 * the reference count to 0 by calling sock_put(sk).
> > +	 */
> > +	sock_hold(sk);
> > +
> 
> To me, it seems like when 'hvs_close_connection' is called, there should always
> be an outstanding reference to the socket. 

I agree. There *should* be, but it turns out there is a race condition:

For an established connection that is being closed by the guest, the refcnt is 4
at the end of hvs_release() (Note: here the 'remove_sock' is false):

1 for the initial value;
1 for the sk being in the bound list;
1 for the sk being in the connected list;
1 for the delayed close_work.

After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may* decrease
the refcnt to 3. 

Concurrently, hvs_close_connection() runs in another thread:
  calls vsock_remove_sock() to decrease the refcnt by 2;
  calls sock_put() to decrease the refcnt to 0 and free the sk;
  Next, the "release_sock(sk)" may hang due to use-after-free.

In the above, after hvs_release() finishes, if hvs_close_connection() runs
faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
because at the beginning of hvs_close_connection(), the refcnt is still 4.
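
The interleaving described above can be replayed with a toy model. This is
only a sketch under stated assumptions: struct toy_sock, toy_hold(),
toy_put() and freed_before_release() are hypothetical stand-ins, not kernel
APIs, and the model only tracks the counter values given in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct sock: a reference count and a "freed" flag. */
struct toy_sock {
	int refcnt;
	bool freed;
};

static void toy_hold(struct toy_sock *sk)
{
	assert(!sk->freed);
	sk->refcnt++;
}

static void toy_put(struct toy_sock *sk)
{
	assert(!sk->freed && sk->refcnt > 0);
	if (--sk->refcnt == 0)
		sk->freed = true;
}

/* Replays the bad ordering and reports whether the socket is already
 * freed when the close path would reach release_sock().
 */
static bool freed_before_release(bool channel_holds_ref)
{
	struct toy_sock sk = { .refcnt = 4, .freed = false };

	if (channel_holds_ref)
		toy_hold(&sk);	/* extra reference owned by the channel */

	toy_put(&sk);		/* __vsock_release() -> sock_put() wins the race */

	/* Close path running in the other thread. */
	toy_put(&sk);		/* vsock_remove_sock(): bound list */
	toy_put(&sk);		/* vsock_remove_sock(): connected list */
	toy_put(&sk);		/* sock_put() in the close path */

	return sk.freed;	/* what release_sock() would operate on */
}
```

In this model freed_before_release(false) is true (4 -> 3 -> 2 -> 1 -> 0),
while freed_before_release(true) is false because one reference is still
held until the close callback drops it at its very end.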

So, this patch can work, but it's not the right fix. 
Your suggestion is correct and here is the patch. 
I'll give it more tests and send a v2.

--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -312,6 +312,11 @@ static void hvs_close_connection(struct vmbus_channel *chan)
        lock_sock(sk);
        hvs_do_close_lock_held(vsock_sk(sk), true);
        release_sock(sk);
+
+       /* Release the refcnt for the channel that's opened in
+        * hvs_open_connection().
+        */
+       sock_put(sk);
 }

 static void hvs_open_connection(struct vmbus_channel *chan)
@@ -407,6 +412,9 @@ static void hvs_open_connection(struct vmbus_channel *chan)
        }

        set_per_channel_state(chan, conn_from_host ? new : sk);
+
+       /* This reference will be dropped by hvs_close_connection(). */
+       sock_hold(conn_from_host ? new : sk);
        vmbus_set_chn_rescind_callback(chan, hvs_close_connection);

        /* Set the pending send size to max packet size to always get


> The reference that is dropped by
> ' hvs_do_close_lock_held' is a legitimate reference that was taken by
> 'hvs_close_lock_held'.

Correct.

> Or, in other words, I think the right solution is to always maintain a
> reference to the socket until this routine is called and drop that here.
> That can be done by taking the reference to the socket prior to
> 'vmbus_set_chn_rescind_callback(chan, hvs_close_connection)' and
> dropping that reference at the end of 'hvs_close_connection'.
> 
> >  	lock_sock(sk);
> >  	hvs_do_close_lock_held(vsock_sk(sk), true);
> >  	release_sock(sk);
> > +
> > +	sock_put(sk);
> 
> Thanks for taking a look at this. We should queue this fix and the other
> hvsocket fixes for the stable branch.

I added a "Cc: stable@vger.kernel.org" tag so this patch will go to the
stable kernels automatically.

Your previous two fixes are in the v5.2.4 stable kernel, but not in the other
longterm stable kernels 4.19 and 4.14:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v5.2.4&qt=author&q=Muthuswamy

I'll request them to be backported for 4.19 and 4.14.
I'll also request the patch "vsock: correct removal of socket from the list"
to be backported.

The other two "hv_sock: perf" patches are more features than
fixes. Usually the stable kernel maintainers don't backport feature patches.

Thanks,
-- Dexuan

^ permalink raw reply

* [PATCH] net/socket: fix GCC8+ Wpacked-not-aligned warnings
From: Qian Cai @ 2019-07-29 20:23 UTC (permalink / raw)
  To: davem
  Cc: vyasevich, nhorman, marcelo.leitner, linux-sctp, netdev,
	linux-kernel, Qian Cai

In file included from ./include/linux/sctp.h:42,
                 from net/core/skbuff.c:47:
./include/uapi/linux/sctp.h:395:1: warning: alignment 4 of 'struct
sctp_paddr_change' is less than 8 [-Wpacked-not-aligned]
 } __attribute__((packed, aligned(4)));
 ^
./include/uapi/linux/sctp.h:728:1: warning: alignment 4 of 'struct
sctp_setpeerprim' is less than 8 [-Wpacked-not-aligned]
 } __attribute__((packed, aligned(4)));
 ^
./include/uapi/linux/sctp.h:727:26: warning: 'sspp_addr' offset 4 in
'struct sctp_setpeerprim' isn't aligned to 8 [-Wpacked-not-aligned]
  struct sockaddr_storage sspp_addr;
                          ^~~~~~~~~
./include/uapi/linux/sctp.h:741:1: warning: alignment 4 of 'struct
sctp_prim' is less than 8 [-Wpacked-not-aligned]
 } __attribute__((packed, aligned(4)));
 ^
./include/uapi/linux/sctp.h:740:26: warning: 'ssp_addr' offset 4 in
'struct sctp_prim' isn't aligned to 8 [-Wpacked-not-aligned]
  struct sockaddr_storage ssp_addr;
                          ^~~~~~~~
./include/uapi/linux/sctp.h:792:1: warning: alignment 4 of 'struct
sctp_paddrparams' is less than 8 [-Wpacked-not-aligned]
 } __attribute__((packed, aligned(4)));
 ^
./include/uapi/linux/sctp.h:784:26: warning: 'spp_address' offset 4 in
'struct sctp_paddrparams' isn't aligned to 8 [-Wpacked-not-aligned]
  struct sockaddr_storage spp_address;
                          ^~~~~~~~~~~
./include/uapi/linux/sctp.h:905:1: warning: alignment 4 of 'struct
sctp_paddrinfo' is less than 8 [-Wpacked-not-aligned]
 } __attribute__((packed, aligned(4)));
 ^
./include/uapi/linux/sctp.h:899:26: warning: 'spinfo_address' offset 4
in 'struct sctp_paddrinfo' isn't aligned to 8 [-Wpacked-not-aligned]
  struct sockaddr_storage spinfo_address;
                          ^~~~~~~~~~~~~~

This is because commit 20c9c825b12f ("[SCTP] Fix SCTP socket options
to work with 32-bit apps on 64-bit kernels.") added the "packed, aligned(4)"
GCC attributes to some structures, but one of their members, i.e., "struct
sockaddr_storage", has the attribute
"aligned(__alignof__ (struct sockaddr *))", which is 8 bytes on 64-bit
systems, so the commit overrides the designed alignment of
"sockaddr_storage".
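
The mismatch can be reproduced outside the kernel with a few stand-in
types. This is a minimal sketch, not the kernel definitions: storage8,
storage4 and packed_user are hypothetical names, a literal 8 replaces
__alignof__ (struct sockaddr *) so the result does not depend on the build
target, and the GCC/Clang attribute syntax is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a storage type that asks for pointer-sized alignment. */
struct storage8 {
	unsigned short family;
	char data[126];
} __attribute__((aligned(8)));

/* Same layout, but only 4-byte alignment is requested. */
struct storage4 {
	unsigned short family;
	char data[126];
} __attribute__((aligned(4)));

/* Stand-in for a packed UAPI structure: a 4-byte id, then the address. */
struct packed_user {
	int assoc_id;
	struct storage4 addr;
} __attribute__((packed, aligned(4)));

/* The address member lands at offset 4 in a 4-byte aligned structure. */
static_assert(offsetof(struct packed_user, addr) == 4, "addr follows the id");
static_assert(_Alignof(struct packed_user) == 4, "outer alignment is 4");

/* Offset 4 suits the 4-byte aligned type but not the 8-byte aligned one,
 * which is the situation -Wpacked-not-aligned reports.
 */
static_assert(offsetof(struct packed_user, addr) % _Alignof(struct storage4) == 0,
	      "fine for storage4");
static_assert(offsetof(struct packed_user, addr) % _Alignof(struct storage8) != 0,
	      "would be misaligned for storage8");
```

Embedding storage8 instead of storage4 in packed_user is what would make
GCC 8 and later emit the warnings quoted above.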

To fix this, "struct sockaddr_storage" needs to be aligned to 4 bytes, as
it is only used in those packed sctp structures, which are part of the
UAPI. "struct __kernel_sockaddr_storage" is used in some other places of
the UAPI whose alignment must not change, in order not to break
userspace.

One option is to use a typedef between "sockaddr_storage" and
"__kernel_sockaddr_storage", but that needs changes to 35 or 370 lines of
code. The other option is to duplicate this simple 2-field structure to
have a separate "struct sockaddr_storage" so it can use a different
alignment than "__kernel_sockaddr_storage". The structure also seems
stable enough, so keeping two structures in sync in case of future
changes will be low-maintenance.

Signed-off-by: Qian Cai <cai@lca.pw>
---
 include/linux/socket.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 97523818cb14..301119657125 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -38,7 +38,13 @@ struct linger {
 	int		l_linger;	/* How long to linger for	*/
 };
 
-#define sockaddr_storage __kernel_sockaddr_storage
+struct sockaddr_storage {
+	__kernel_sa_family_t	ss_family;		/* address family */
+	/* Following field(s) are implementation specific */
+	char		__data[_K_SS_MAXSIZE - sizeof(unsigned short)];
+				/* space to achieve desired size, */
+				/* _SS_MAXSIZE value minus size of ss_family */
+} __aligned(4);			/* force desired alignment */
 
 /*
  *	As we do 4.4BSD message passing we use a 4.4BSD message passing
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH] net: wan: sdla: Mark expected switch fall-through
From: Gustavo A. R. Silva @ 2019-07-29 20:01 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-kernel, Gustavo A. R. Silva, Kees Cook

Mark switch cases where we are expecting to fall through.

This patch fixes the following warning (Building: i386):

drivers/net/wan/sdla.c: In function ‘sdla_errors’:
drivers/net/wan/sdla.c:414:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    if (cmd == SDLA_INFORMATION_WRITE)
       ^
drivers/net/wan/sdla.c:417:3: note: here
   default:
   ^~~~~~~

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/net/wan/sdla.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wan/sdla.c b/drivers/net/wan/sdla.c
index a9ac3f37b904..e2e679a01b65 100644
--- a/drivers/net/wan/sdla.c
+++ b/drivers/net/wan/sdla.c
@@ -413,6 +413,7 @@ static void sdla_errors(struct net_device *dev, int cmd, int dlci, int ret, int
 		case SDLA_RET_NO_BUFS:
 			if (cmd == SDLA_INFORMATION_WRITE)
 				break;
+			/* Else, fall through */
 
 		default: 
 			netdev_dbg(dev, "Cmd 0x%02X generated return code 0x%02X\n",
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH bpf-next 00/10] CO-RE offset relocations
From: Song Liu @ 2019-07-29 20:36 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Yonghong Song, Andrii Nakryiko, Kernel Team
In-Reply-To: <CAPhsuW4hd2NJU5VZAwXDTMwrJRA4O-O2iNm8OywtJd0EZd5DmA@mail.gmail.com>

On Mon, Jul 29, 2019 at 1:20 PM Song Liu <liu.song.a23@gmail.com> wrote:
>
> On Wed, Jul 24, 2019 at 1:34 PM Andrii Nakryiko <andriin@fb.com> wrote:
> >
> > This patch set implements the central part of CO-RE (Compile Once - Run
> > Everywhere, see [0] and [1] for slides and video): relocating field offsets.
> > Most of the details are written down as comments to corresponding parts of the
> > code.
> >
> > Patch #1 adds loading of .BTF.ext offset relocations section and macros to
> > work with its contents.
> > Patch #2 implements the CO-RE relocation algorithm in libbpf.
> > Patches #3-#10 add selftests validating various parts of relocation handling,
> > type compatibility, etc.
> >
> > For all tests to work, you'll need latest Clang/LLVM supporting
> > __builtin_preserve_access_index intrinsic, used for recording offset
> > relocations. Kernel on which selftests run should have BTF information built
> > in (CONFIG_DEBUG_INFO_BTF=y).
> >
> >   [0] http://vger.kernel.org/bpfconf2019.html#session-2
> >   [1] http://vger.kernel.org/lpc-bpf2018.html#session-2
> >
> > Andrii Nakryiko (10):
> >   libbpf: add .BTF.ext offset relocation section loading
> >   libbpf: implement BPF CO-RE offset relocation algorithm
> >   selftests/bpf: add CO-RE relocs testing setup
> >   selftests/bpf: add CO-RE relocs struct flavors tests
> >   selftests/bpf: add CO-RE relocs nesting tests
> >   selftests/bpf: add CO-RE relocs array tests
> >   selftests/bpf: add CO-RE relocs enum/ptr/func_proto tests
> >   selftests/bpf: add CO-RE relocs modifiers/typedef tests
> >   selftest/bpf: add CO-RE relocs ptr-as-array tests
> >   selftests/bpf: add CO-RE relocs ints tests
> >
> >  tools/lib/bpf/btf.c                           |  64 +-
> >  tools/lib/bpf/btf.h                           |   4 +
> >  tools/lib/bpf/libbpf.c                        | 866 +++++++++++++++++-
> >  tools/lib/bpf/libbpf.h                        |   1 +
> >  tools/lib/bpf/libbpf_internal.h               |  91 ++
> >  .../selftests/bpf/prog_tests/core_reloc.c     | 363 ++++++++
> >  .../bpf/progs/btf__core_reloc_arrays.c        |   3 +
> >  .../btf__core_reloc_arrays___diff_arr_dim.c   |   3 +
> >  ...btf__core_reloc_arrays___diff_arr_val_sz.c |   3 +
> >  .../btf__core_reloc_arrays___err_non_array.c  |   3 +
> >  ...btf__core_reloc_arrays___err_too_shallow.c |   3 +
> >  .../btf__core_reloc_arrays___err_too_small.c  |   3 +
> >  ..._core_reloc_arrays___err_wrong_val_type1.c |   3 +
> >  ..._core_reloc_arrays___err_wrong_val_type2.c |   3 +
> >  .../bpf/progs/btf__core_reloc_flavors.c       |   3 +
> >  .../btf__core_reloc_flavors__err_wrong_name.c |   3 +
> >  .../bpf/progs/btf__core_reloc_ints.c          |   3 +
> >  .../bpf/progs/btf__core_reloc_ints___bool.c   |   3 +
> >  .../btf__core_reloc_ints___err_bitfield.c     |   3 +
> >  .../btf__core_reloc_ints___err_wrong_sz_16.c  |   3 +
> >  .../btf__core_reloc_ints___err_wrong_sz_32.c  |   3 +
> >  .../btf__core_reloc_ints___err_wrong_sz_64.c  |   3 +
> >  .../btf__core_reloc_ints___err_wrong_sz_8.c   |   3 +
> >  .../btf__core_reloc_ints___reverse_sign.c     |   3 +
> >  .../bpf/progs/btf__core_reloc_mods.c          |   3 +
> >  .../progs/btf__core_reloc_mods___mod_swap.c   |   3 +
> >  .../progs/btf__core_reloc_mods___typedefs.c   |   3 +
> >  .../bpf/progs/btf__core_reloc_nesting.c       |   3 +
> >  .../btf__core_reloc_nesting___anon_embed.c    |   3 +
> >  ...f__core_reloc_nesting___dup_compat_types.c |   5 +
> >  ...core_reloc_nesting___err_array_container.c |   3 +
> >  ...tf__core_reloc_nesting___err_array_field.c |   3 +
> >  ...e_reloc_nesting___err_dup_incompat_types.c |   4 +
> >  ...re_reloc_nesting___err_missing_container.c |   3 +
> >  ...__core_reloc_nesting___err_missing_field.c |   3 +
> >  ..._reloc_nesting___err_nonstruct_container.c |   3 +
> >  ...e_reloc_nesting___err_partial_match_dups.c |   4 +
> >  .../btf__core_reloc_nesting___err_too_deep.c  |   3 +
> >  .../btf__core_reloc_nesting___extra_nesting.c |   3 +
> >  ..._core_reloc_nesting___struct_union_mixup.c |   3 +
> >  .../bpf/progs/btf__core_reloc_primitives.c    |   3 +
> >  ...f__core_reloc_primitives___diff_enum_def.c |   3 +
> >  ..._core_reloc_primitives___diff_func_proto.c |   3 +
> >  ...f__core_reloc_primitives___diff_ptr_type.c |   3 +
> >  ...tf__core_reloc_primitives___err_non_enum.c |   3 +
> >  ...btf__core_reloc_primitives___err_non_int.c |   3 +
> >  ...btf__core_reloc_primitives___err_non_ptr.c |   3 +
> >  .../bpf/progs/btf__core_reloc_ptr_as_arr.c    |   3 +
> >  .../btf__core_reloc_ptr_as_arr___diff_sz.c    |   3 +
> >  .../selftests/bpf/progs/core_reloc_types.h    | 642 +++++++++++++
> >  .../bpf/progs/test_core_reloc_arrays.c        |  58 ++
> >  .../bpf/progs/test_core_reloc_flavors.c       |  65 ++
> >  .../bpf/progs/test_core_reloc_ints.c          |  48 +
> >  .../bpf/progs/test_core_reloc_kernel.c        |  39 +
> >  .../bpf/progs/test_core_reloc_mods.c          |  68 ++
> >  .../bpf/progs/test_core_reloc_nesting.c       |  48 +
> >  .../bpf/progs/test_core_reloc_primitives.c    |  50 +
> >  .../bpf/progs/test_core_reloc_ptr_as_arr.c    |  34 +
> >  58 files changed, 2527 insertions(+), 47 deletions(-)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/core_reloc.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_dim.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_val_sz.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_non_array.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_shallow.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_small.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type1.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type2.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___bool.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_bitfield.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_16.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_32.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_64.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_8.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___reverse_sign.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___mod_swap.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___typedefs.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___anon_embed.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___dup_compat_types.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_container.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_field.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_dup_incompat_types.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_container.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_field.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_nonstruct_container.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_partial_match_dups.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_too_deep.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___extra_nesting.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___struct_union_mixup.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_enum_def.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_func_proto.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_ptr_type.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_enum.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_int.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_ptr.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr___diff_sz.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/core_reloc_types.h
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ints.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_mods.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_nesting.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_primitives.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ptr_as_arr.c
>
> We have created a lot of small files. Would it be cleaner if we could
> somehow put this data in one file (maybe in different sections)?

After reading more, I guess you have tried this and ended up with the current
design: keeping most struct definitions in core_reloc_types.h.

>
> Alternatively, maybe create a folder for these files:
>   tools/testing/selftests/bpf/progs/core/

I guess this would still make it cleaner.

Thanks,
Song

^ permalink raw reply

* Re: [PATCH bpf-next 04/10] selftests/bpf: add CO-RE relocs struct flavors tests
From: Song Liu @ 2019-07-29 20:37 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Yonghong Song, Andrii Nakryiko, Kernel Team
In-Reply-To: <20190724192742.1419254-5-andriin@fb.com>

On Wed, Jul 24, 2019 at 1:34 PM Andrii Nakryiko <andriin@fb.com> wrote:
>
> Add tests verifying that BPF program can use various struct/union
> "flavors" to extract data from the same target struct/union.
>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply

* [PATCH net-next] can: fix ioctl function removal
From: Oliver Hartkopp @ 2019-07-29 20:40 UTC (permalink / raw)
  To: davem, netdev; +Cc: linux-can, Oliver Hartkopp, kernel test robot

Commit 60649d4e0af ("can: remove obsolete empty ioctl() handler") replaced the
almost empty can_ioctl() function with sock_no_ioctl() which always returns
-EOPNOTSUPP.

Even though we don't have any ioctl() functions on socket/network layer we need
to return -ENOIOCTLCMD to be able to forward ioctl commands like SIOCGIFINDEX
to the network driver layer.

This patch fixes the wrong return codes in the CAN network layer protocols.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Fixes: 60649d4e0af ("can: remove obsolete empty ioctl() handler")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
---
 net/can/bcm.c | 9 ++++++++-
 net/can/raw.c | 9 ++++++++-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/can/bcm.c b/net/can/bcm.c
index 8da986b19d88..bf1d0bbecec8 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1680,6 +1680,13 @@ static int bcm_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 	return size;
 }
 
+int bcm_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
+			 unsigned long arg)
+{
+	/* no ioctls for socket layer -> hand it down to NIC layer */
+	return -ENOIOCTLCMD;
+}
+
 static const struct proto_ops bcm_ops = {
 	.family        = PF_CAN,
 	.release       = bcm_release,
@@ -1689,7 +1696,7 @@ static const struct proto_ops bcm_ops = {
 	.accept        = sock_no_accept,
 	.getname       = sock_no_getname,
 	.poll          = datagram_poll,
-	.ioctl         = sock_no_ioctl,
+	.ioctl         = bcm_sock_no_ioctlcmd,
 	.gettstamp     = sock_gettstamp,
 	.listen        = sock_no_listen,
 	.shutdown      = sock_no_shutdown,
diff --git a/net/can/raw.c b/net/can/raw.c
index ff720272f7b7..da386f1fa815 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -837,6 +837,13 @@ static int raw_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 	return size;
 }
 
+int raw_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
+			 unsigned long arg)
+{
+	/* no ioctls for socket layer -> hand it down to NIC layer */
+	return -ENOIOCTLCMD;
+}
+
 static const struct proto_ops raw_ops = {
 	.family        = PF_CAN,
 	.release       = raw_release,
@@ -846,7 +853,7 @@ static const struct proto_ops raw_ops = {
 	.accept        = sock_no_accept,
 	.getname       = raw_getname,
 	.poll          = datagram_poll,
-	.ioctl         = sock_no_ioctl,
+	.ioctl         = raw_sock_no_ioctlcmd,
 	.gettstamp     = sock_gettstamp,
 	.listen        = sock_no_listen,
 	.shutdown      = sock_no_shutdown,
-- 
2.20.1
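
The commit message hinges on how the two return codes are treated by the
caller of the protocol's ioctl handler. A minimal user-space sketch of that
dispatch follows, assuming the usual rule that only -ENOIOCTLCMD lets the
request fall through to the device layer. All function names and the
SKETCH_* constant are hypothetical stand-ins for illustration, not the
kernel's code.

```c
#include <errno.h>

#ifndef ENOIOCTLCMD
#define ENOIOCTLCMD 515	/* kernel-internal code, not exported to user space */
#endif

#define SKETCH_SIOCGIFINDEX 0x8933	/* illustrative command number */

typedef int (*proto_ioctl_fn)(unsigned int cmd);

/* Models sock_no_ioctl(): the request is rejected at the socket layer. */
static int proto_no_ioctl(unsigned int cmd)
{
	(void)cmd;
	return -EOPNOTSUPP;
}

/* Models the handlers added by the patch: "not mine, ask the next layer". */
static int proto_no_ioctlcmd(unsigned int cmd)
{
	(void)cmd;
	return -ENOIOCTLCMD;
}

/* Stand-in for the device layer, which knows the interface-index request. */
static int dev_ioctl_sketch(unsigned int cmd)
{
	return cmd == SKETCH_SIOCGIFINDEX ? 0 : -ENOTTY;
}

/* Only -ENOIOCTLCMD is forwarded; any other result is final. */
static int sock_do_ioctl_sketch(proto_ioctl_fn proto_ioctl, unsigned int cmd)
{
	int err = proto_ioctl(cmd);

	if (err != -ENOIOCTLCMD)
		return err;
	return dev_ioctl_sketch(cmd);
}
```

With proto_no_ioctl the interface-index request never reaches the device
layer in this model, which is the failure the patch describes; with
proto_no_ioctlcmd it does.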


^ permalink raw reply related

* Re: [bpf-next,v2 0/6] Introduce a BPF helper to generate SYN cookies
From: Alexei Starovoitov @ 2019-07-29 20:47 UTC (permalink / raw)
  To: Petar Penkov
  Cc: netdev, bpf, davem, ast, daniel, edumazet, lmb, sdf, toke,
	Petar Penkov
In-Reply-To: <20190729165918.92933-1-ppenkov.kernel@gmail.com>

On Mon, Jul 29, 2019 at 09:59:12AM -0700, Petar Penkov wrote:
> From: Petar Penkov <ppenkov@google.com>
> 
> This patch series introduces a BPF helper function that allows generating SYN
> cookies from BPF. Currently, this helper is enabled at both the TC hook and the
> XDP hook.
> 
> The first two patches in the series add/modify several TCP helper functions to
> allow for SKB-less operation, as is the case at the XDP hook.
> 
> The third patch introduces the bpf_tcp_gen_syncookie helper function which
> generates a SYN cookie for either XDP or TC programs. The return value of
> this function contains both the MSS value, encoded in the cookie, and the
> cookie itself.
> 
> The last three patches sync tools/ and add a test. 
> 
> Performance evaluation:
> I sent 10Mpps to a fixed port on a host with 2 10G bonded Mellanox 4 NICs from
> random IPv6 source addresses. Without XDP I observed 7.2Mpps (syn-acks) being
> sent out if the IPv6 packets carry 20 bytes of TCP options or 7.6Mpps if they
> carry no options. If I attached a simple program that checks if a packet is
> IPv6/TCP/SYN, looks up the socket, issues a cookie, and sends it back out after
> swapping src/dest, recomputing the checksum, and setting the ACK flag, I
> observed 10Mpps being sent back out.

Is it 10M because the traffic gen is at 10M?
What is the CPU utilization at this rate?
Is it CPU or NIC limited if you crank up the SYN flood?
Was the original 7M with all cores or a single core?

The patch set looks good to me.
I'd like Eric to review it one more time before applying.
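
For readers following along: the cover letter says the helper's return value
carries both the cookie and the MSS. A minimal sketch of unpacking such a
value is below. The bit layout used here (cookie in the low 32 bits, MSS in
the next 16 bits) is an assumption for illustration, since the cover letter
does not spell it out, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical holder for the two values packed into one return value. */
struct syncookie_result {
	uint32_t cookie;
	uint16_t mss;
};

/*
 * Assumed layout: bits 0-31 hold the cookie, bits 32-47 hold the MSS.
 * A negative return is an error and must be checked before unpacking.
 */
static inline struct syncookie_result syncookie_unpack(int64_t ret)
{
	struct syncookie_result r;

	r.cookie = (uint32_t)ret;
	r.mss = (uint16_t)((uint64_t)ret >> 32);
	return r;
}
```

A caller would test ret < 0 first and only then split the value.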


^ permalink raw reply

* Re: [PATCH 1/3 net-next] linux: Add skb_frag_t page_offset accessors
From: Jakub Kicinski @ 2019-07-29 20:50 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: willy, davem, kernel-team, netdev
In-Reply-To: <20190729171941.250569-2-jonathan.lemon@gmail.com>

On Mon, 29 Jul 2019 10:19:39 -0700, Jonathan Lemon wrote:
> Add skb_frag_off(), skb_frag_off_add(), skb_frag_off_set(),
> and skb_frag_off_set_from() accessors for page_offset.
> 
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---
>  include/linux/skbuff.h | 61 ++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 56 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 718742b1c505..7d94a78067ee 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -331,7 +331,7 @@ static inline void skb_frag_size_set(skb_frag_t *frag, unsigned int size)
>  }
>  
>  /**
> - * skb_frag_size_add - Incrementes the size of a skb fragment by %delta
> + * skb_frag_size_add - Increments the size of a skb fragment by %delta
>   * @frag: skb fragment
>   * @delta: value to add
>   */
> @@ -2857,6 +2857,46 @@ static inline void skb_propagate_pfmemalloc(struct page *page,
>  		skb->pfmemalloc = true;
>  }
>  
> +/**
> + * skb_frag_off - Returns the offset of a skb fragment
> + * @frag: the paged fragment
> + */
> +static inline unsigned int skb_frag_off(const skb_frag_t *frag)
> +{
> +	return frag->page_offset;
> +}
> +
> +/**
> + * skb_frag_off_add - Increments the offset of a skb fragment by %delta

I realize you're following the existing code, but should we perhaps use
the latest kdoc syntax? '()' after function name, and args should have
'@' prefix, '%' would be for constants.
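
As a rough illustration of that syntax, here is a sketch using a stand-in
type rather than the real skb_frag_t. The struct and function names are
hypothetical; only the comment format is the point.

```c
/* Stand-in for skb_frag_t, reduced to the one field used here. */
struct frag_sketch {
	unsigned int page_offset;
};

/**
 * frag_sketch_off_add() - Increment the offset of a fragment by @delta
 * @frag: fragment to update
 * @delta: value to add
 */
static inline void frag_sketch_off_add(struct frag_sketch *frag, int delta)
{
	frag->page_offset += delta;
}
```

The function name carries '()', the argument reference in the summary uses
'@delta', and '%' is left for constants such as %NULL.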

> + * @frag: skb fragment
> + * @delta: value to add
> + */
> +static inline void skb_frag_off_add(skb_frag_t *frag, int delta)
> +{
> +	frag->page_offset += delta;
> +}
> +
> +/**
> + * skb_frag_off_set - Sets the offset of a skb fragment
> + * @frag: skb fragment
> + * @offset: offset of fragment
> + */
> +static inline void skb_frag_off_set(skb_frag_t *frag, unsigned int offset)
> +{
> +	frag->page_offset = offset;
> +}
> +
> +/**
> + * skb_frag_off_set_from - Sets the offset of a skb fragment from another fragment
> + * @fragto: skb fragment where offset is set
> + * @fragfrom: skb fragment offset is copied from
> + */
> +static inline void skb_frag_off_set_from(skb_frag_t *fragto,
> +					 const skb_frag_t *fragfrom)

skb_frag_off_copy() ?

> +{
> +	fragto->page_offset = fragfrom->page_offset;
> +}
> +
>  /**
>   * skb_frag_page - retrieve the page referred to by a paged fragment
>   * @frag: the paged fragment
> @@ -2923,7 +2963,7 @@ static inline void skb_frag_unref(struct sk_buff *skb, int f)
>   */
>  static inline void *skb_frag_address(const skb_frag_t *frag)
>  {
> -	return page_address(skb_frag_page(frag)) + frag->page_offset;
> +	return page_address(skb_frag_page(frag)) + skb_frag_off(frag);
>  }
>  
>  /**
> @@ -2939,7 +2979,18 @@ static inline void *skb_frag_address_safe(const skb_frag_t *frag)
>  	if (unlikely(!ptr))
>  		return NULL;
>  
> -	return ptr + frag->page_offset;
> +	return ptr + skb_frag_off(frag);
> +}
> +
> +/**
> + * skb_frag_page_set_from - sets the page in a fragment from another fragment

skb_frag_page_copy() ?

> + * @fragto: skb fragment where page is set
> + * @fragfrom: skb fragment page is copied from
> + */
> +static inline void skb_frag_page_set_from(skb_frag_t *fragto,
> +					  const skb_frag_t *fragfrom)
> +{
> +	fragto->bv_page = fragfrom->bv_page;
>  }
>  
>  /**

^ permalink raw reply

* Re: [PATCH net] net/mlx5e: Fix unnecessary flow_block_cb_is_busy call
From: David Miller @ 2019-07-29 20:53 UTC (permalink / raw)
  To: saeedm; +Cc: wenxu, netdev
In-Reply-To: <b7a5de0ae2464df31ed39fee71020ba063a7a90f.camel@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Mon, 29 Jul 2019 18:25:26 +0000

> Dave let me know if you want me to take it to my branch.

Please do, thanks Saeed.

^ permalink raw reply

* Re: [PATCH net v2] mlxsw: spectrum_ptp: Increase parsing depth when PTP is enabled
From: David Miller @ 2019-07-29 20:55 UTC (permalink / raw)
  To: petrm; +Cc: netdev, jiri, idosch
In-Reply-To: <b1584bdec4a0a36a2567a43dc0973dd8f3a05dec.1564424420.git.petrm@mellanox.com>

From: Petr Machata <petrm@mellanox.com>
Date: Mon, 29 Jul 2019 18:26:14 +0000

> Spectrum systems have a configurable limit on how far into the packet they
> parse. By default, the limit is 96 bytes.
> 
> An IPv6 PTP packet is layered as Ethernet/IPv6/UDP (14+40+8 bytes), and
> sequence ID of a PTP event is only available 32 bytes into payload, for a
> total of 94 bytes. When an additional 802.1q header is present as
> well (such as when ptp4l is running on a VLAN port), the parsing limit is
> exceeded. Such packets are not recognized as PTP, and are not timestamped.
> 
> Therefore generalize the current VXLAN-specific parsing depth setting to
> allow reference-counted requests from other modules as well. Keep it in the
> VXLAN module, because the MPRS register also configures UDP destination
> port number used for VXLAN, and is thus closely tied to the VXLAN code
> anyway.
> 
> Then invoke the new interfaces from both VXLAN (in obvious places), as well
> as from PTP code, when the (global) timestamping configuration changes from
> disabled to enabled or vice versa.
> 
> Fixes: 8748642751ed ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
> Signed-off-by: Petr Machata <petrm@mellanox.com>
> ---
> 
> Notes:
>     v2:
>     - Preserve RXT in mlxsw_sp1_ptp_mtpppc_update()

Applied, thanks Petr.
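
The arithmetic in the commit message can be checked with a small sketch.
The header sizes are the ones quoted above, plus 4 bytes for the 802.1q
tag; the enum and function names are hypothetical and not taken from the
driver.

```c
#include <stdbool.h>

enum {
	ETH_HDR_LEN = 14,
	VLAN_TAG_LEN = 4,	/* one 802.1q tag */
	IPV6_HDR_LEN = 40,
	UDP_HDR_LEN = 8,
	PTP_SEQID_END = 32,	/* sequence ID is available 32 bytes into payload */
	DEFAULT_PARSE_DEPTH = 96,
};

/* Bytes the parser must cover to reach the PTP sequence ID over IPv6. */
static inline unsigned int ptp_ipv6_needed_depth(bool vlan)
{
	return ETH_HDR_LEN + (vlan ? VLAN_TAG_LEN : 0) +
	       IPV6_HDR_LEN + UDP_HDR_LEN + PTP_SEQID_END;
}

/* True if the default parsing limit is enough for this packet layout. */
static inline bool ptp_fits_default_depth(bool vlan)
{
	return ptp_ipv6_needed_depth(vlan) <= DEFAULT_PARSE_DEPTH;
}
```

Untagged, the needed depth is 94 bytes and fits under 96; with a VLAN tag it
is 98 bytes, which is why the depth has to be raised when PTP is enabled.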

^ permalink raw reply

* Re: [PATCH bpf-next v5 0/6] xdp: Add devmap_hash map type
From: Alexei Starovoitov @ 2019-07-29 20:57 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Alexei Starovoitov, Network Development,
	David Miller, Jesper Dangaard Brouer, Jakub Kicinski,
	Björn Töpel, Yonghong Song
In-Reply-To: <CAADnVQJpYeQ68V5BE2r3BhbraBh7G8dSd8zknFUJxtW4GwNkuA@mail.gmail.com>

On Fri, Jul 26, 2019 at 7:26 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jul 26, 2019 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > This series adds a new map type, devmap_hash, that works like the existing
> > devmap type, but using a hash-based indexing scheme. This is useful for the use
> > case where a devmap is indexed by ifindex (for instance for use with the routing
> > table lookup helper). For this use case, the regular devmap needs to be sized
> > after the maximum ifindex number, not the number of devices in it. A hash-based
> > indexing scheme makes it possible to size the map after the number of devices it
> > should contain instead.
> >
> > This was previously part of my patch series that also turned the regular
> > bpf_redirect() helper into a map-based one; for this series I just pulled out
> > the patches that introduced the new map type.
> >
> > Changelog:
> >
> > v5:
> >
> > - Dynamically set the number of hash buckets by rounding up max_entries to the
> >   nearest power of two (mirroring the regular hashmap), as suggested by Jesper.
>
> fyi I'm waiting for Jesper to review this new version.

Now applied.

Toke,
please consider adding a proper selftest for it.
        fd = bpf_create_map(BPF_MAP_TYPE_DEVMAP_HASH, sizeof(key),
                            sizeof(value), 2, 0);
        if (fd < 0) {
                printf("Failed to create devmap_hash '%s'!\n", strerror(errno));
                exit(1);
        }
        close(fd);
is not really a test.
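
For context on the v5 changelog entry quoted above, here is a minimal sketch
of rounding max_entries up to a power of two and using the result to pick a
bucket. The function names are hypothetical and the masking step is an
assumption about how such a table is typically indexed, not a copy of the
applied patch.

```c
#include <stdint.h>

/* Smallest power of two >= n; returns 0 if that does not fit in 32 bits. */
static inline uint32_t roundup_pow2_sketch(uint32_t n)
{
	uint32_t p = 1;

	if (n > (UINT32_C(1) << 31))
		return 0;
	while (p < n)
		p <<= 1;
	return p;
}

/* With a power-of-two bucket count, a mask replaces the modulo. */
static inline uint32_t bucket_for_ifindex(uint32_t ifindex, uint32_t n_buckets)
{
	return ifindex & (n_buckets - 1);
}
```

A map created with max_entries of 5 would get 8 buckets under this scheme,
regardless of how large the ifindex values stored in it are.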

^ permalink raw reply

