Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH bpf-next 01/10] libbpf: add .BTF.ext offset relocation section loading
From: Andrii Nakryiko @ 2019-07-24 21:42 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Yonghong Song, Kernel Team
In-Reply-To: <20190724192742.1419254-2-andriin@fb.com>

On Wed, Jul 24, 2019 at 12:28 PM Andrii Nakryiko <andriin@fb.com> wrote:
>
> Add support for BPF CO-RE offset relocations. Add section/record
> iteration macros for .BTF.ext. These macro are useful for iterating over
> each .BTF.ext record, either for dumping out contents or later for BPF
> CO-RE relocation handling.
>
> To enable other parts of libbpf to work with .BTF.ext contents, moved
> a bunch of type definitions into libbpf_internal.h.
>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---

[...]

> + *
> + * Example to provide a better feel.
> + *
> + *   struct sample {
> + *       int a;
> + *       struct {
> + *           int b[10];
> + *       };
> + *   };
> + *
> + *   struct sample *s = ...;
> + *   int x = &s->a;     // encoded as "0:0" (a is field #0)
> + *   int y = &s->b[5];  // encoded as "0:1:5" (b is field #1, arr elem #5)

This should be "0:1:0:5", actually. Anon struct is field #1 in BTF, b
is field #0 inside that anon struct + array index 5.
Will update it locally and incorporate into next version once the rest
of patch set is reviewed.

> + *   int z = &s[10]->b; // encoded as "10:1" (ptr is used as an array)
> + *
> + * type_id for all relocs in this example  will capture BTF type id of
> + * `struct sample`.
> + *
> + *   [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction
> + */
> +struct bpf_offset_reloc {
> +       __u32   insn_off;
> +       __u32   type_id;
> +       __u32   access_str_off;
> +};
> +
>  #endif /* __LIBBPF_LIBBPF_INTERNAL_H */
> --
> 2.17.1
>

^ permalink raw reply

* Re: [PATCH 3/3] net: dsa: ksz: Add Microchip KSZ8795 DSA driver
From: David Miller @ 2019-07-24 21:46 UTC (permalink / raw)
  To: marex; +Cc: netdev, Tristram.Ha, andrew, f.fainelli, vivien.didelot,
	woojung.huh
In-Reply-To: <20190724134048.31029-4-marex@denx.de>

From: Marek Vasut <marex@denx.de>
Date: Wed, 24 Jul 2019 15:40:48 +0200

> +static void ksz8795_r_mib_cnt(struct ksz_device *dev, int port, u16 addr,
> +			      u64 *cnt)
> +{
> +	u32 data;
> +	u16 ctrl_addr;
> +	u8 check;
> +	int loop;

Reverse christmas tree for these local variable declarations.

> +static void ksz8795_r_mib_pkt(struct ksz_device *dev, int port, u16 addr,
> +			      u64 *dropped, u64 *cnt)
> +{
> +	u32 data;
> +	u16 ctrl_addr;
> +	u8 check;
> +	int loop;

Likewise.

> +static int ksz8795_r_dyn_mac_table(struct ksz_device *dev, u16 addr,
> +				   u8 *mac_addr, u8 *fid, u8 *src_port,
> +				   u8 *timestamp, u16 *entries)
> +{
> +	u32 data_hi;
> +	u32 data_lo;
> +	u16 ctrl_addr;
> +	int rc;
> +	u8 data;

Likewise.

> +static int ksz8795_r_sta_mac_table(struct ksz_device *dev, u16 addr,
> +				   struct alu_struct *alu)
> +{
> +	u64 data;
> +	u32 data_hi;
> +	u32 data_lo;

Likewise.

> +static void ksz8795_w_sta_mac_table(struct ksz_device *dev, u16 addr,
> +				    struct alu_struct *alu)
> +{
> +	u64 data;
> +	u32 data_hi;
> +	u32 data_lo;

Likewise.

> +static inline void ksz8795_from_vlan(u16 vlan, u8 *fid, u8 *member, u8 *valid)

Never use the inline directive in foo.c files, let the compiler decide.

> +static inline void ksz8795_to_vlan(u8 fid, u8 member, u8 valid, u16 *vlan)

Likewise.

> +static void ksz8795_r_vlan_table(struct ksz_device *dev, u16 vid, u16 *vlan)
> +{
> +	u64 buf;
> +	u16 *data = (u16 *)&buf;
> +	u16 addr;
> +	int index;

Reverse christmas tree please.

> +static void ksz8795_w_vlan_table(struct ksz_device *dev, u16 vid, u16 vlan)
> +{
> +	u64 buf;
> +	u16 *data = (u16 *)&buf;
> +	u16 addr;
> +	int index;

Likewise.

> +static void ksz8795_r_phy(struct ksz_device *dev, u16 phy, u16 reg, u16 *val)
> +{
> +	struct ksz_port *port;
> +	u8 ctrl;
> +	u8 restart;
> +	u8 link;
> +	u8 speed;
> +	u8 p = phy;
> +	u16 data = 0;
> +	int processed = true;

Likewise.

> +static void ksz8795_w_phy(struct ksz_device *dev, u16 phy, u16 reg, u16 val)
> +{
> +	u8 ctrl;
> +	u8 restart;
> +	u8 speed;
> +	u8 data;
> +	u8 p = phy;

Likewise.

> +static void ksz8795_port_stp_state_set(struct dsa_switch *ds, int port,
> +				       u8 state)
> +{
> +	struct ksz_device *dev = ds->priv;
> +	struct ksz_port *p = &dev->ports[port];
> +	u8 data;
> +	int member = -1;
> +	int forward = dev->member;

Likewise.

> +static void ksz8795_flush_dyn_mac_table(struct ksz_device *dev, int port)
> +{
> +	struct ksz_port *p;
> +	int cnt;
> +	int first;
> +	int index;
> +	u8 learn[TOTAL_PORT_NUM];

Likewise.

> +static void ksz8795_port_vlan_add(struct dsa_switch *ds, int port,
> +				  const struct switchdev_obj_port_vlan *vlan)
> +{
> +	struct ksz_device *dev = ds->priv;
> +	bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
> +	u16 data;
> +	u16 vid;
> +	u8 fid;
> +	u8 member;
> +	u8 valid;
> +	u16 new_pvid = 0;

Likewise.  And seriously, combine all of the same typed variables onto one line
to compress the space a bit:

	bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
	struct ksz_device *dev = ds->priv;
	u16 data, vid, new_pvid = 0;
	u8 fid, member, valid;

Doesn't that look like 1,000 times nicer? :-)

> +static int ksz8795_port_vlan_del(struct dsa_switch *ds, int port,
> +				 const struct switchdev_obj_port_vlan *vlan)
> +{
> +	struct ksz_device *dev = ds->priv;
> +	bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
> +	u16 data;
> +	u16 vid;
> +	u16 pvid;
> +	u8 fid;
> +	u8 member;
> +	u8 valid;
> +	u16 new_pvid = 0;

Again, same thing.

> +static void ksz8795_port_setup(struct ksz_device *dev, int port, bool cpu_port)
> +{
> +	u8 data8;
> +	u8 member;
> +	struct ksz_port *p = &dev->ports[port];

Likewise.
> +static void ksz8795_config_cpu_port(struct dsa_switch *ds)
> +{
> +	struct ksz_device *dev = ds->priv;
> +	struct ksz_port *p;
> +	int i;
> +	u8 remote;

Likewise.

> +static int ksz8795_setup(struct dsa_switch *ds)
> +{
> +	u8 data8;
> +	u16 data16;
> +	u32 value;
> +	int i;
> +	struct alu_struct alu;
> +	struct ksz_device *dev = ds->priv;
> +	int ret = 0;

Likewise.
> +static int ksz8795_switch_detect(struct ksz_device *dev)
> +{
> +	u16 id16;
> +	u8 id1;
> +	u8 id2;
> +	int ret;

Likewise.

^ permalink raw reply

* [PATCH RFC net-next] net: use helper skb_ensure_writable in skb_checksum_help and skb_crc32c_csum_help
From: Heiner Kallweit @ 2019-07-24 21:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org

Instead of open-coding making the checksum writable we can use an
appropriate helper. skb_ensure_writable is a candidate, however we
might also use skb_header_unclone. Hints welcome.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 net/core/dev.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index fc676b261..90516a800 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2939,12 +2939,9 @@ int skb_checksum_help(struct sk_buff *skb)
 	offset += skb->csum_offset;
 	BUG_ON(offset + sizeof(__sum16) > skb_headlen(skb));
 
-	if (skb_cloned(skb) &&
-	    !skb_clone_writable(skb, offset + sizeof(__sum16))) {
-		ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
-		if (ret)
-			goto out;
-	}
+	ret = skb_ensure_writable(skb, offset + sizeof(__sum16));
+	if (ret)
+		goto out;
 
 	*(__sum16 *)(skb->data + offset) = csum_fold(csum) ?: CSUM_MANGLED_0;
 out_set_summed:
@@ -2979,12 +2976,11 @@ int skb_crc32c_csum_help(struct sk_buff *skb)
 		ret = -EINVAL;
 		goto out;
 	}
-	if (skb_cloned(skb) &&
-	    !skb_clone_writable(skb, offset + sizeof(__le32))) {
-		ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
-		if (ret)
-			goto out;
-	}
+
+	ret = skb_ensure_writable(skb, offset + sizeof(__le32));
+	if (ret)
+		goto out;
+
 	crc32c_csum = cpu_to_le32(~__skb_checksum(skb, start,
 						  skb->len - start, ~(__u32)0,
 						  crc32c_csum_stub));
-- 
2.22.0


^ permalink raw reply related

* [PATCH v2 bpf] libbpf: silence GCC8 warning about string truncation
From: Andrii Nakryiko @ 2019-07-24 21:47 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Magnus Karlsson

Despite a proper NULL-termination after strncpy(..., ..., IFNAMSIZ - 1),
GCC8 still complains about *expected* string truncation:

  xsk.c:330:2: error: 'strncpy' output may be truncated copying 15 bytes
  from a string of length 15 [-Werror=stringop-truncation]
    strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);

This patch gets rid of the issue altogether by using memcpy instead.
There is no performance regression, as strncpy will still copy and fill
all of the bytes anyway.

v1->v2:
- rebase against bpf tree.

Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
 tools/lib/bpf/xsk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index e02025bbe36d..680e63066cf3 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -326,7 +326,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
 		return -errno;
 
 	ifr.ifr_data = (void *)&channels;
-	strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);
+	memcpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);
 	ifr.ifr_name[IFNAMSIZ - 1] = '\0';
 	err = ioctl(fd, SIOCETHTOOL, &ifr);
 	if (err && errno != EOPNOTSUPP) {
@@ -516,7 +516,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 		err = -errno;
 		goto out_socket;
 	}
-	strncpy(xsk->ifname, ifname, IFNAMSIZ - 1);
+	memcpy(xsk->ifname, ifname, IFNAMSIZ - 1);
 	xsk->ifname[IFNAMSIZ - 1] = '\0';
 
 	err = xsk_set_xdp_socket_config(&xsk->config, usr_config);
-- 
2.17.1


^ permalink raw reply related

* Re: [RFC PATCH net-next 0/3] net: batched receive in GRO path
From: Edward Cree @ 2019-07-24 21:49 UTC (permalink / raw)
  To: Eric Dumazet, Paolo Abeni, David Miller; +Cc: netdev
In-Reply-To: <d7ca6e7a-b80e-12e8-9050-c25b8b92bf26@gmail.com>

On 12/07/2019 17:48, Eric Dumazet wrote:
>> but the rest is the stuff we're polling for for low latency.
>> I'm putting a gro_normal_list() call after the trace_napi_poll() in
>>  napi_busy_loop() and testing that, let's see how it goes...
One thing that's causing me some uncertainty: busy_poll_stop() does a
 napi->poll(), which can potentially gro_normal_one() something.  But
 when I tried to put a gro_normal_list() just after that, I ran into
 list corruption because it could race against the one in
 napi_complete_done().  I'm not entirely sure how, my current theory
 goes something like:
- clear_bit(IN_BUSY_POLL)
- task switch, start napi poll
- get as far as starting gro_normal_list()
- task switch back to busy_poll_stop()
- local_bh_disable()
- do a napi poll
- start gro_normal_list()
- list corruption ensues as we have two instances of
  netif_receive_skb_list_internal() trying to consume the same list
But I may be wildly mistaken.
Questions that arise from that:
1) Is it safe to potentially be adding to the rx_list (gro_normal_one(),
   which in theory can end up calling gro_normal_list() as well) within
   busy_poll_stop()?  I haven't ever seen a splat from that, but it seems
   every bit as possible as what I have been seeing.
2) Why does busy_poll_stop() not do its local_bh_disable() *before*
   clearing napi state bits, which (if I'm understanding correctly) would
   ensure an ordinary napi poll can't race with the one in busy_poll_stop()?

Apart from that I have indeed established that with the patches as posted
 busy-polling latency is awful, but adding a gro_normal_list() into
 napi_busy_loop() fixes that, as expected.

-Ed

^ permalink raw reply

* Re: [PATCH net-next] selftests: mlxsw: Fix typo in qos_mc_aware.sh
From: David Miller @ 2019-07-24 21:51 UTC (permalink / raw)
  To: standby24x7
  Cc: shuah, linux-kernel, jiri, idosch, linux-kselftest, rdunlap,
	netdev
In-Reply-To: <20190724152951.4618-1-standby24x7@gmail.com>

From: Masanari Iida <standby24x7@gmail.com>
Date: Thu, 25 Jul 2019 00:29:51 +0900

> This patch fix some spelling typo in qos_mc_aware.sh
> 
> Signed-off-by: Masanari Iida <standby24x7@gmail.com>
> Acked-by: Randy Dunlap <rdunlap@infradead.org>

Applied.

^ permalink raw reply

* Re: [PATCH bpf-next 5/6] selftests/bpf: test BPF_MAP_DUMP command on a bpf hashmap
From: Song Liu @ 2019-07-24 21:58 UTC (permalink / raw)
  To: Brian Vazquez
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Willem de Bruijn,
	Petar Penkov, open list, Networking, bpf
In-Reply-To: <20190724165803.87470-6-brianvv@google.com>

On Wed, Jul 24, 2019 at 10:10 AM Brian Vazquez <brianvv@google.com> wrote:
>
> This tests exercise the new command on a bpf hashmap and make sure it
> works as expected.
>
> Signed-off-by: Brian Vazquez <brianvv@google.com>
> ---
>  tools/testing/selftests/bpf/test_maps.c | 83 ++++++++++++++++++++++++-
>  1 file changed, 81 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> index 5443b9bd75ed7..f7ab401399d40 100644
> --- a/tools/testing/selftests/bpf/test_maps.c
> +++ b/tools/testing/selftests/bpf/test_maps.c
> @@ -309,6 +309,86 @@ static void test_hashmap_walk(unsigned int task, void *data)
>         close(fd);
>  }
>
> +static void test_hashmap_dump(void)
> +{
> +       int fd, i, max_entries = 5;

5 is too small for map_dump.

> +       uint64_t keys[max_entries], values[max_entries];
> +       uint64_t key, value, next_key, prev_key;
> +       bool next_key_valid = true;
> +       void *buf, *elem;
> +       u32 buf_len;
> +       const int elem_size = sizeof(key) + sizeof(value);
> +
> +       fd = helper_fill_hashmap(max_entries);
> +
> +       // Get the elements in the hashmap, and store them in that order

Please use /* */ instead of //.

> +       assert(bpf_map_get_next_key(fd, NULL, &key) == 0);
> +       i = 0;
> +       keys[i] = key;
> +       for (i = 1; next_key_valid; i++) {
> +               next_key_valid = bpf_map_get_next_key(fd, &key, &next_key) == 0;
> +               assert(bpf_map_lookup_elem(fd, &key, &values[i - 1]) == 0);
> +               keys[i-1] = key;
> +               key = next_key;
> +       }
> +
> +       // Alloc memory for the whole table
> +       buf = malloc(elem_size * max_entries);
> +       assert(buf != NULL);
> +
> +       // Check that buf_len < elem_size returns EINVAL
> +       buf_len = elem_size-1;
> +       errno = 0;
> +       assert(bpf_map_dump(fd, NULL, buf, &buf_len) == -1 && errno == EINVAL);
> +
> +       // Check that it returns the first two elements
> +       errno = 0;
> +       buf_len = elem_size * 2;
> +       i = 0;
> +       assert(bpf_map_dump(fd, NULL, buf, &buf_len) == 0 &&
> +              buf_len == 2*elem_size);
> +       elem = buf;
> +       assert((*(uint64_t *)elem) == keys[i] &&
> +              (*(uint64_t *)(elem + sizeof(key))) == values[i]);
> +       elem = buf + elem_size;
> +       i++;
> +       assert((*(uint64_t *)elem) == keys[i] &&
> +              (*(uint64_t *)(elem + sizeof(key))) == values[i]);
> +       i++;
> +
> +       /* Check that prev_key contains key from last_elem retrieved in previous
> +        * call
> +        */
> +       prev_key = *((uint64_t *)elem);
> +       assert(bpf_map_dump(fd, &prev_key, buf, &buf_len) == 0 &&
> +              buf_len == elem_size*2);
> +       elem = buf;
> +       assert((*(uint64_t *)elem) == keys[i] &&
> +              (*(uint64_t *)(elem + sizeof(key))) == values[i]);
> +       elem = buf + elem_size;
> +       i++;
> +       assert((*(uint64_t *)elem) == keys[i] &&
> +              (*(uint64_t *)(elem + sizeof(key))) == values[i]);
> +       i++;
> +       assert(prev_key == (*(uint64_t *)elem));
> +
> +       /* Continue reading from map and verify buf_len only contains 1 element
> +        * even though buf_len is 2 elem_size and it returns err = 0.
> +        */
> +       assert(bpf_map_dump(fd, &prev_key, buf, &buf_len) == 0 &&
> +              buf_len == elem_size);
> +       elem = buf;
> +       assert((*(uint64_t *)elem) == keys[i] &&
> +              (*(uint64_t *)(elem + sizeof(key))) == values[i]);
> +
> +       // Verify there's no more entries and err = ENOENT
> +       assert(bpf_map_dump(fd, &prev_key, buf, &buf_len) == -1 &&
> +              errno == ENOENT);
> +
> +       free(buf);
> +       close(fd);
> +}
> +
>  static void test_hashmap_zero_seed(void)
>  {
>         int i, first, second, old_flags;
> @@ -1677,6 +1757,7 @@ static void run_all_tests(void)
>         test_hashmap_percpu(0, NULL);
>         test_hashmap_walk(0, NULL);
>         test_hashmap_zero_seed();
> +       test_hashmap_dump();
>
>         test_arraymap(0, NULL);
>         test_arraymap_percpu(0, NULL);
> @@ -1714,11 +1795,9 @@ int main(void)
>
>         map_flags = BPF_F_NO_PREALLOC;
>         run_all_tests();
> -
>  #define CALL
>  #include <map_tests/tests.h>
>  #undef CALL
> -
>         printf("test_maps: OK, %d SKIPPED\n", skips);
>         return 0;
>  }
> --
> 2.22.0.657.g960e92d24f-goog
>

^ permalink raw reply

* Re: [PATCH net-next] selftests: mlxsw: Fix typo in qos_mc_aware.sh
From: shuah @ 2019-07-24 22:00 UTC (permalink / raw)
  To: David Miller, standby24x7
  Cc: linux-kernel, jiri, idosch, linux-kselftest, rdunlap, netdev,
	shuah
In-Reply-To: <20190724.145123.912916059374852633.davem@davemloft.net>

On 7/24/19 3:51 PM, David Miller wrote:
> From: Masanari Iida <standby24x7@gmail.com>
> Date: Thu, 25 Jul 2019 00:29:51 +0900
> 
>> This patch fix some spelling typo in qos_mc_aware.sh
>>
>> Signed-off-by: Masanari Iida <standby24x7@gmail.com>
>> Acked-by: Randy Dunlap <rdunlap@infradead.org>
> 
> Applied.
> 

I applied to this my fixes branch this morning on auto-pilot
without realizing that it is in your domain :)

Would you like like me to drop it from mine?

thanks,
-- Shuah

^ permalink raw reply

* Re: [PATCH bpf-next 6/6] selftests/bpf: add test to measure performance of BPF_MAP_DUMP
From: Song Liu @ 2019-07-24 22:01 UTC (permalink / raw)
  To: Brian Vazquez
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Willem de Bruijn,
	Petar Penkov, open list, Networking, bpf
In-Reply-To: <20190724165803.87470-7-brianvv@google.com>

On Wed, Jul 24, 2019 at 10:10 AM Brian Vazquez <brianvv@google.com> wrote:
>
> This tests compares the amount of time that takes to read an entire
> table of 100K elements on a bpf hashmap using both BPF_MAP_DUMP and
> BPF_MAP_GET_NEXT_KEY + BPF_MAP_LOOKUP_ELEM.
>
> Signed-off-by: Brian Vazquez <brianvv@google.com>
> ---
>  tools/testing/selftests/bpf/test_maps.c | 65 +++++++++++++++++++++++++
>  1 file changed, 65 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> index f7ab401399d40..c4593a8904ca6 100644
> --- a/tools/testing/selftests/bpf/test_maps.c
> +++ b/tools/testing/selftests/bpf/test_maps.c
> @@ -18,6 +18,7 @@
>  #include <sys/socket.h>
>  #include <netinet/in.h>
>  #include <linux/bpf.h>
> +#include <linux/time64.h>
>
>  #include <bpf/bpf.h>
>  #include <bpf/libbpf.h>
> @@ -389,6 +390,69 @@ static void test_hashmap_dump(void)
>         close(fd);
>  }
>
> +static void test_hashmap_dump_perf(void)
> +{
> +       int fd, i, max_entries = 100000;
> +       uint64_t key, value, next_key;
> +       bool next_key_valid = true;
> +       void *buf;
> +       u32 buf_len, entries;
> +       int j = 0;
> +       int clk_id = CLOCK_MONOTONIC;
> +       struct timespec begin, end;
> +       long long time_spent, dump_time_spent;
> +       double res;
> +       int tests[] = {1, 2, 230, 5000, 73000, 100000, 234567};
> +       int test_len = ARRAY_SIZE(tests);
> +       const int elem_size = sizeof(key) + sizeof(value);
> +
> +       fd = helper_fill_hashmap(max_entries);
> +       // Alloc memory considering the largest buffer
> +       buf = malloc(elem_size * tests[test_len-1]);
> +       assert(buf != NULL);
> +
> +test:
> +       entries = tests[j];
> +       buf_len = elem_size*tests[j];
> +       j++;
> +       clock_gettime(clk_id, &begin);
> +       errno = 0;
> +       i = 0;
> +       while (errno == 0) {
> +               bpf_map_dump(fd, !i ? NULL : &key,
> +                                 buf, &buf_len);
> +               if (errno)
> +                       break;
> +               if (!i)
> +                       key = *((uint64_t *)(buf + buf_len - elem_size));
> +               i += buf_len / elem_size;
> +       }
> +       clock_gettime(clk_id, &end);
> +       assert(i  == max_entries);
> +       dump_time_spent = NSEC_PER_SEC * (end.tv_sec - begin.tv_sec) +
> +                         end.tv_nsec - begin.tv_nsec;
> +       next_key_valid = true;
> +       clock_gettime(clk_id, &begin);
> +       assert(bpf_map_get_next_key(fd, NULL, &key) == 0);
> +       for (i = 0; next_key_valid; i++) {
> +               next_key_valid = bpf_map_get_next_key(fd, &key, &next_key) == 0;
> +               assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
> +               key = next_key;
> +       }
> +       clock_gettime(clk_id, &end);
> +       time_spent = NSEC_PER_SEC * (end.tv_sec - begin.tv_sec) +
> +                    end.tv_nsec - begin.tv_nsec;
> +       res = (1-((double)dump_time_spent/time_spent))*100;
> +       printf("buf_len_%u:\t %llu entry-by-entry: %llu improvement %lf\n",
> +              entries, dump_time_spent, time_spent, res);
> +       assert(i  == max_entries);
              ^ extra space after i.

> +
> +       if (j < test_len)
> +               goto test;
> +       free(buf);
> +       close(fd);
> +}
> +
>  static void test_hashmap_zero_seed(void)
>  {
>         int i, first, second, old_flags;
> @@ -1758,6 +1822,7 @@ static void run_all_tests(void)
>         test_hashmap_walk(0, NULL);
>         test_hashmap_zero_seed();
>         test_hashmap_dump();
> +       test_hashmap_dump_perf();
>
>         test_arraymap(0, NULL);
>         test_arraymap_percpu(0, NULL);
> --
> 2.22.0.657.g960e92d24f-goog
>

^ permalink raw reply

* Re: [PATCH -next v2] net/ixgbevf: fix a compilation error of skb_frag_t
From: Jeff Kirsher @ 2019-07-24 22:02 UTC (permalink / raw)
  To: David Miller; +Cc: cai, netdev, linux-kernel
In-Reply-To: <20190724.144143.2055459359936675888.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 482 bytes --]

On Wed, 2019-07-24 at 14:41 -0700, David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Wed, 24 Jul 2019 09:39:26 -0700
> 
> > Dave I will pick this up and add it to my queue.
> 
> How soon will you get that to me?  The sooner this build fix is cured
> the
> better.

Go ahead and pick it up, I was able to get it through an initial round
of testing with no issues.  No need to wait for me to re-send it, I
will go ahead and ACK Qian's patch.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH -next] net/ixgbevf: fix a compilation error of skb_frag_t
From: Jeff Kirsher @ 2019-07-24 22:02 UTC (permalink / raw)
  To: Qian Cai, willy; +Cc: davem, netdev, linux-kernel
In-Reply-To: <1563975157-30691-1-git-send-email-cai@lca.pw>

[-- Attachment #1: Type: text/plain, Size: 1535 bytes --]

On Wed, 2019-07-24 at 09:32 -0400, Qian Cai wrote:
> The linux-next commit "net: Rename skb_frag_t size to bv_len" [1]
> introduced a compilation error on powerpc as it forgot to rename
> "size"
> to "bv_len" for ixgbevf.
> 
> [1] 
> https://lore.kernel.org/netdev/20190723030831.11879-1-willy@infradead.org/T/#md052f1c7de965ccd1bdcb6f92e1990a52298eac5
> 
> In file included from ./include/linux/cache.h:5,
>                  from ./include/linux/printk.h:9,
>                  from ./include/linux/kernel.h:15,
>                  from ./include/linux/list.h:9,
>                  from ./include/linux/module.h:9,
>                  from
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:12:
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c: In function
> 'ixgbevf_xmit_frame_ring':
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:4138:51: error:
> 'skb_frag_t' {aka 'struct bio_vec'} has no member named 'size'
>    count += TXD_USE_COUNT(skb_shinfo(skb)->frags[f].size);
>                                                    ^
> ./include/uapi/linux/kernel.h:13:40: note: in definition of macro
> '__KERNEL_DIV_ROUND_UP'
>  #define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
>                                         ^
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:4138:12: note: in
> expansion of macro 'TXD_USE_COUNT'
>    count += TXD_USE_COUNT(skb_shinfo(skb)->frags[f].size);
> 
> Signed-off-by: Qian Cai <cai@lca.pw>

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH bpf-next 0/7] bpf/flow_dissector: support input flags
From: Willem de Bruijn @ 2019-07-24 22:10 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Network Development, bpf, David Miller, Alexei Starovoitov,
	Daniel Borkmann, Willem de Bruijn, Petar Penkov
In-Reply-To: <20190724170018.96659-1-sdf@google.com>

On Wed, Jul 24, 2019 at 1:11 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> C flow dissector supports input flags that tell it to customize parsing
> by either stopping early or trying to parse as deep as possible.
> BPF flow dissector always parses as deep as possible which is sub-optimal.
> Pass input flags to the BPF flow dissector as well so it can make the same
> decisions.
>
> Series outline:
> * remove unused FLOW_DISSECTOR_F_STOP_AT_L3 flag
> * export FLOW_DISSECTOR_F_XXX flags as uapi and pass them to BPF
>   flow dissector
> * add documentation for the export flags
> * support input flags in BPF_PROG_TEST_RUN via ctx_{in,out}
> * sync uapi to tools
> * support FLOW_DISSECTOR_F_PARSE_1ST_FRAG in selftest
> * support FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL in kernel and selftest
> * support FLOW_DISSECTOR_F_STOP_AT_ENCAP in selftest
>
> Pros:
> * makes BPF flow dissector faster by avoiding burning extra cycles
> * existing BPF progs continue to work by ignoring the flags and always
>   parsing as deep as possible
>
> Cons:
> * new UAPI which we need to support (OTOH, if we need to deprecate some
>   flags, we can just stop setting them upon calling BPF programs)
>
> Some numbers (with .repeat = 4000000 in test_flow_dissector):
>         test_flow_dissector:PASS:ipv4-frag 35 nsec
>         test_flow_dissector:PASS:ipv4-frag 35 nsec
>         test_flow_dissector:PASS:ipv4-no-frag 32 nsec
>         test_flow_dissector:PASS:ipv4-no-frag 32 nsec
>
>         test_flow_dissector:PASS:ipv6-frag 39 nsec
>         test_flow_dissector:PASS:ipv6-frag 39 nsec
>         test_flow_dissector:PASS:ipv6-no-frag 36 nsec
>         test_flow_dissector:PASS:ipv6-no-frag 36 nsec
>
>         test_flow_dissector:PASS:ipv6-flow-label 36 nsec
>         test_flow_dissector:PASS:ipv6-flow-label 36 nsec
>         test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec
>         test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec
>
>         test_flow_dissector:PASS:ipip-encap 38 nsec
>         test_flow_dissector:PASS:ipip-encap 38 nsec
>         test_flow_dissector:PASS:ipip-no-encap 32 nsec
>         test_flow_dissector:PASS:ipip-no-encap 32 nsec
>
> The improvement is around 10%, but it's in a tight cache-hot
> BPF_PROG_TEST_RUN loop.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>

This looks great to me. Thanks, Stan!

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply

* Re: [PATCH bpf-next 0/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Brian Vazquez @ 2019-07-24 22:15 UTC (permalink / raw)
  To: Song Liu
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Willem de Bruijn,
	Petar Penkov, open list, Networking, bpf
In-Reply-To: <CAPhsuW7PU1PP91e8vD2diwhBAwGJHWu6wAKOoBThT86f4r5OJw@mail.gmail.com>

On Wed, Jul 24, 2019 at 12:20 PM Song Liu <liu.song.a23@gmail.com> wrote:
>
> On Wed, Jul 24, 2019 at 10:09 AM Brian Vazquez <brianvv@google.com> wrote:
> >
> > This introduces a new command to retrieve multiple number of entries
> > from a bpf map.
> >
> > This new command can be executed from the existing BPF syscall as
> > follows:
> >
> > err =  bpf(BPF_MAP_DUMP, union bpf_attr *attr, u32 size)
> > using attr->dump.map_fd, attr->dump.prev_key, attr->dump.buf,
> > attr->dump.buf_len
> > returns zero or negative error, and populates buf and buf_len on
> > succees
> >
> > This implementation is wrapping the existing bpf methods:
> > map_get_next_key and map_lookup_elem
> >
> > Note that this implementation can be extended later to do dump and
> > delete by extending map_lookup_and_delete_elem (currently it only works
> > for bpf queue/stack maps) and either use a new flag in map_dump or a new
> > command map_dump_and_delete.
> >
> > Results show that even with a 1-elem_size buffer, it runs ~40 faster
>
> Why is the new command 40% faster with 1-elem_size buffer?

The test is using a really simple map structure: uint64_t key,val.
Which makes the lookup_elem logic faster since it doesn't spend too
much time copying the value. My conclusion with the 40% was that this
new implementation only needs 1 syscall per elem compare to the 2
syscalls that we needed with previous implementation so in this
particular case the number of ops that we are doing is almost halved.
I did one experiment increasing the value_size (448*64B) and it was
only 14% faster with 1-elem_size buffer.

> > than the current implementation, improvements of ~85% are reported when
> > the buffer size is increased, although, after the buffer size is around
> > 5% of the total number of entries there's no huge difference in
> > increasing it.
> >
> > Tested:
> > Tried different size buffers to handle case where the bulk is bigger, or
> > the elements to retrieve are less than the existing ones, all runs read
> > a map of 100K entries. Below are the results(in ns) from the different
> > runs:
> >
> > buf_len_1:       69038725 entry-by-entry: 112384424 improvement
> > 38.569134
> > buf_len_2:       40897447 entry-by-entry: 111030546 improvement
> > 63.165590
> > buf_len_230:     13652714 entry-by-entry: 111694058 improvement
> > 87.776687
> > buf_len_5000:    13576271 entry-by-entry: 111101169 improvement
> > 87.780263
> > buf_len_73000:   14694343 entry-by-entry: 111740162 improvement
> > 86.849542
> > buf_len_100000:  13745969 entry-by-entry: 114151991 improvement
> > 87.958187
> > buf_len_234567:  14329834 entry-by-entry: 114427589 improvement
> > 87.476941
>
> It took me a while to figure out the meaning of 87.476941. It is probably
> a good idea to say 87.5% instead.

right, will change it in next version.

^ permalink raw reply

* Re: [PATCH bpf-next 1/7] bpf/flow_dissector: pass input flags to BPF flow dissector program
From: Song Liu @ 2019-07-24 22:21 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
	Daniel Borkmann, Willem de Bruijn, Petar Penkov
In-Reply-To: <20190724170018.96659-2-sdf@google.com>

On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> C flow dissector supports input flags that tell it to customize parsing
> by either stopping early or trying to parse as deep as possible. Pass
> those flags to the BPF flow dissector so it can make the same
> decisions. In the next commits I'll add support for those flags to
> our reference bpf_flow.c
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  include/linux/skbuff.h       | 2 +-
>  include/net/flow_dissector.h | 4 ----
>  include/uapi/linux/bpf.h     | 5 +++++
>  net/bpf/test_run.c           | 2 +-
>  net/core/flow_dissector.c    | 5 +++--
>  5 files changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 718742b1c505..9b7a8038beec 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1271,7 +1271,7 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
>
>  struct bpf_flow_dissector;
>  bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
> -                     __be16 proto, int nhoff, int hlen);
> +                     __be16 proto, int nhoff, int hlen, unsigned int flags);
>
>  bool __skb_flow_dissect(const struct net *net,
>                         const struct sk_buff *skb,
> diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> index 90bd210be060..3e2642587b76 100644
> --- a/include/net/flow_dissector.h
> +++ b/include/net/flow_dissector.h
> @@ -253,10 +253,6 @@ enum flow_dissector_key_id {
>         FLOW_DISSECTOR_KEY_MAX,
>  };
>
> -#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG                BIT(0)
> -#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL    BIT(1)
> -#define FLOW_DISSECTOR_F_STOP_AT_ENCAP         BIT(2)
> -
>  struct flow_dissector_key {
>         enum flow_dissector_key_id key_id;
>         size_t offset; /* offset of struct flow_dissector_key_*
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index fa1c753dcdbc..b4ad19bd6aa8 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3507,6 +3507,10 @@ enum bpf_task_fd_type {
>         BPF_FD_TYPE_URETPROBE,          /* filename + offset */
>  };
>
> +#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG                (1U << 0)
> +#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL    (1U << 1)
> +#define FLOW_DISSECTOR_F_STOP_AT_ENCAP         (1U << 2)

Do we have to move these?

Thanks,
Song

^ permalink raw reply

* [PATCH iproute2] iplink: document the 'link change' subcommand
From: Matteo Croce @ 2019-07-24 19:12 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, David Ahern

ip link can set parameters both via the 'set' and 'change' keyword.
In fact, 'change' is an alias for 'set'.
Document this in the help and manpage.

Fixes: 1d93483985f0 ("iplink: use netlink for link configuration")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
---
 ip/iplink.c           | 4 ++--
 man/man8/ip-link.8.in | 7 ++++++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 212a0885..be237c93 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -64,12 +64,12 @@ void iplink_usage(void)
 			"\n"
 			"	ip link delete { DEVICE | dev DEVICE | group DEVGROUP } type TYPE [ ARGS ]\n"
 			"\n"
-			"	ip link set { DEVICE | dev DEVICE | group DEVGROUP }\n"
+			"	ip link { set | change } { DEVICE | dev DEVICE | group DEVGROUP }\n"
 			"			[ { up | down } ]\n"
 			"			[ type TYPE ARGS ]\n");
 	} else
 		fprintf(stderr,
-			"Usage: ip link set DEVICE [ { up | down } ]\n");
+			"Usage: ip link { set | change } DEVICE [ { up | down } ]\n");
 
 	fprintf(stderr,
 		"		[ arp { on | off } ]\n"
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 883d8807..d4be1a0e 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -1802,7 +1802,8 @@ deleted since it is the default group.
 .BI type " TYPE "
 specifies the type of the device.
 
-.SS ip link set - change device attributes
+.SS ip link set
+.SS ip link change - change device attributes
 
 .PP
 .B Warning:
@@ -1815,6 +1816,10 @@ can move the system to an unpredictable state. The solution
 is to avoid changing several parameters with one
 .B ip link set
 call.
+.B set
+and
+.B change
+are synonyms and have identical syntax and behavior.
 
 .TP
 .BI dev " DEVICE "
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH bpf-next 2/7] bpf/flow_dissector: document flags
From: Song Liu @ 2019-07-24 22:24 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
	Daniel Borkmann, Willem de Bruijn, Petar Penkov
In-Reply-To: <20190724170018.96659-3-sdf@google.com>

On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> Describe what each input flag does and who uses it.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  Documentation/bpf/prog_flow_dissector.rst | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/Documentation/bpf/prog_flow_dissector.rst b/Documentation/bpf/prog_flow_dissector.rst
> index ed343abe541e..0f3f380b2ce4 100644
> --- a/Documentation/bpf/prog_flow_dissector.rst
> +++ b/Documentation/bpf/prog_flow_dissector.rst
> @@ -26,6 +26,7 @@ and output arguments.
>    * ``nhoff`` - initial offset of the networking header
>    * ``thoff`` - initial offset of the transport header, initialized to nhoff
>    * ``n_proto`` - L3 protocol type, parsed out of L2 header
> +  * ``flags`` - optional flags
>
>  Flow dissector BPF program should fill out the rest of the ``struct
>  bpf_flow_keys`` fields. Input arguments ``nhoff/thoff/n_proto`` should be
> @@ -101,6 +102,23 @@ can be called for both cases and would have to be written carefully to
>  handle both cases.
>
>
> +Flags
> +=====
> +
> +``flow_keys->flags`` might contain optional input flags that work as follows:
> +
> +* ``FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells BPF flow dissector to continue
> +  parsing first fragment; the default expected behavior is that flow dissector
> +  returns as soon as it finds out that the packet is fragmented;
> +  used by ``eth_get_headlen`` to estimate length of all headers for GRO.
> +* ``FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells BPF flow dissector to stop
> +  parsing as soon as it reaches IPv6 flow label; used by ``___skb_get_hash``
> +  and ``__skb_get_hash_symmetric`` to get flow hash.
> +* ``FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells BPF flow dissector to stop
> +  parsing as soon as it reaches encapsulated headers; used by routing
> +  infrastructure.
> +
> +
>  Reference Implementation
>  ========================
>
> --
> 2.22.0.657.g960e92d24f-goog
>

^ permalink raw reply

* Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Brian Vazquez @ 2019-07-24 22:26 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Willem de Bruijn,
	Petar Penkov, LKML, Network Development, bpf
In-Reply-To: <CAF=yD-+a=t_YizdJpb_Q+zxR7iP-V-EarNsp9tjnFTRBjOtFvA@mail.gmail.com>

On Wed, Jul 24, 2019 at 12:55 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Wed, Jul 24, 2019 at 1:10 PM Brian Vazquez <brianvv@google.com> wrote:
> >
> > This introduces a new command to retrieve multiple number of entries
> > from a bpf map, wrapping the existing bpf methods:
> > map_get_next_key and map_lookup_elem
> >
> > To start dumping the map from the beginning you must specify NULL as
> > the prev_key.
> >
> > The new API returns 0 when it successfully copied all the elements
> > requested or it copied less because there weren't more elements to
> > retrieved (i.e err == -ENOENT). In last scenario err will be masked to 0.
>
> I think I understand this, but perhaps it can be explained a bit more
> concisely without reference to ENOENT and error masking. The function
> returns the min of the number of requested elements and the number of
> remaining elements in the map from the given starting point
> (prev_key).

That sounds better to me. Thanks!

> > On a successful call buf and buf_len will contain correct data and in
> > case prev_key was provided (not for the first walk, since prev_key is
> > NULL) it will contain the last_key copied into the prev_key which will
> > simplify next call.
> >
> > Only when it can't find a single element it will return -ENOENT meaning
> > that the map has been entirely walked. When an error is return buf,
> > buf_len and prev_key shouldn't be read nor used.
>
> That's common for error handling. No need to state explicitly.

 Ack

>
> > Because maps can be called from userspace and kernel code, this function
> > can have a scenario where the next_key was found but by the time we
> > try to retrieve the value the element is not there, in this case the
> > function continues and tries to get a new next_key value, skipping the
> > deleted key. If at some point the function find itself trap in a loop,
> > it will return -EINTR.
>
> Good to point this out! I don't think that unbounded continue;
> statements until an interrupt happens is sufficient. Please bound the
> number of retries to a low number.

And what would it be a good number? Maybe 3 attempts? And in that case
what error should be reported?
>
> > The function will try to fit as much as possible in the buf provided and
> > will return -EINVAL if buf_len is smaller than elem_size.
> >
> > QUEUE and STACK maps are not supported.
> >
> > Note that map_dump doesn't guarantee that reading the entire table is
> > consistent since this function is always racing with kernel and user code
> > but the same behaviour is found when the entire table is walked using
> > the current interfaces: map_get_next_key + map_lookup_elem.
>
> > It is also important to note that with  a locked map, the lock is grabbed
> > for 1 entry at the time, meaning that the returned buf might or might not
> > be consistent.
>
> Would it be informative to signal to the caller if the read was
> complete and consistent (because the entire table was read while the
> lock was held)?

Mmm.. not sure how we could signal that to the caller.  But I don't
think there's a way to know it was consistent (i.e. one element was
removed in bucket 20 and you are copying the keys in bucket 15, when
you get to bucket 20 there's no way to know that some entries were
removed when you traversed them). The lock is held for just 1 single
entry not the entire table.
Maybe clarify more that in the commit message?

Thanks for reviewing!

^ permalink raw reply

* Re: [PATCH] net: phy: mscc: initialize stats array
From: David Miller @ 2019-07-24 22:26 UTC (permalink / raw)
  To: schwab; +Cc: netdev, andrew, f.fainelli, hkallweit1, linux-kernel
In-Reply-To: <mvmh87bih1y.fsf@suse.de>

From: Andreas Schwab <schwab@suse.de>
Date: Wed, 24 Jul 2019 17:32:57 +0200

> The memory allocated for the stats array may contain arbitrary data.
> 
> Signed-off-by: Andreas Schwab <schwab@suse.de>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net-next] selftests: mlxsw: Fix typo in qos_mc_aware.sh
From: David Miller @ 2019-07-24 22:27 UTC (permalink / raw)
  To: shuah
  Cc: standby24x7, linux-kernel, jiri, idosch, linux-kselftest, rdunlap,
	netdev
In-Reply-To: <7e69dda0-bca2-4b78-19cb-b66d097503c0@kernel.org>

From: shuah <shuah@kernel.org>
Date: Wed, 24 Jul 2019 16:00:56 -0600

> On 7/24/19 3:51 PM, David Miller wrote:
>> From: Masanari Iida <standby24x7@gmail.com>
>> Date: Thu, 25 Jul 2019 00:29:51 +0900
>> 
>>> This patch fix some spelling typo in qos_mc_aware.sh
>>>
>>> Signed-off-by: Masanari Iida <standby24x7@gmail.com>
>>> Acked-by: Randy Dunlap <rdunlap@infradead.org>
>> Applied.
>> 
> 
> I applied to this my fixes branch this morning on auto-pilot
> without realizing that it is in your domain :)
> 
> Would you like like me to drop it from mine?

It probably doesn't matter that it exists in two trees :)

^ permalink raw reply

* Re: [PATCH -next v2] net/ixgbevf: fix a compilation error of skb_frag_t
From: David Miller @ 2019-07-24 22:29 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: cai, netdev, linux-kernel
In-Reply-To: <8749f22284c5a557fe50a5dcc956c5d2c80037e2.camel@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 24 Jul 2019 15:02:15 -0700

> On Wed, 2019-07-24 at 14:41 -0700, David Miller wrote:
>> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> Date: Wed, 24 Jul 2019 09:39:26 -0700
>> 
>> > Dave I will pick this up and add it to my queue.
>> 
>> How soon will you get that to me?  The sooner this build fix is cured
>> the
>> better.
> 
> Go ahead and pick it up, I was able to get it through an initial round
> of testing with no issues.  No need to wait for me to re-send it, I
> will go ahead and ACK Qian's patch.

Ok, done.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] net/tls: add myself as a co-maintainer
From: David Miller @ 2019-07-24 22:30 UTC (permalink / raw)
  To: jakub.kicinski
  Cc: alexei.starovoitov, daniel, john.fastabend, davejwatson, borisp,
	aviadye, netdev, oss-drivers, simon.horman, ast
In-Reply-To: <20190724180248.6480-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Wed, 24 Jul 2019 11:02:48 -0700

> I've been spending quite a bit of time fixing and
> preventing bit rot in the core TLS code. TLS seems
> to only be growing in importance, I'd like to help
> ensuring the quality of our implementation.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: John Fastabend <john.fastabend@gmail.com>
> Acked-by: Simon Horman <simon.horman@netronome.com>

I'll aplly this to 'net', thanks.

^ permalink raw reply

* Re: [PATCH bpf-next 1/7] bpf/flow_dissector: pass input flags to BPF flow dissector program
From: Stanislav Fomichev @ 2019-07-24 22:32 UTC (permalink / raw)
  To: Song Liu
  Cc: Stanislav Fomichev, Networking, bpf, David S . Miller,
	Alexei Starovoitov, Daniel Borkmann, Willem de Bruijn,
	Petar Penkov
In-Reply-To: <CAPhsuW6wq_6Pf80yV7oEb0uW7Xv9=UKAbTm4XJLyKAtSmDzCBQ@mail.gmail.com>

On 07/24, Song Liu wrote:
> On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > C flow dissector supports input flags that tell it to customize parsing
> > by either stopping early or trying to parse as deep as possible. Pass
> > those flags to the BPF flow dissector so it can make the same
> > decisions. In the next commits I'll add support for those flags to
> > our reference bpf_flow.c
> >
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Petar Penkov <ppenkov@google.com>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  include/linux/skbuff.h       | 2 +-
> >  include/net/flow_dissector.h | 4 ----
> >  include/uapi/linux/bpf.h     | 5 +++++
> >  net/bpf/test_run.c           | 2 +-
> >  net/core/flow_dissector.c    | 5 +++--
> >  5 files changed, 10 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 718742b1c505..9b7a8038beec 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -1271,7 +1271,7 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
> >
> >  struct bpf_flow_dissector;
> >  bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
> > -                     __be16 proto, int nhoff, int hlen);
> > +                     __be16 proto, int nhoff, int hlen, unsigned int flags);
> >
> >  bool __skb_flow_dissect(const struct net *net,
> >                         const struct sk_buff *skb,
> > diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> > index 90bd210be060..3e2642587b76 100644
> > --- a/include/net/flow_dissector.h
> > +++ b/include/net/flow_dissector.h
> > @@ -253,10 +253,6 @@ enum flow_dissector_key_id {
> >         FLOW_DISSECTOR_KEY_MAX,
> >  };
> >
> > -#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG                BIT(0)
> > -#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL    BIT(1)
> > -#define FLOW_DISSECTOR_F_STOP_AT_ENCAP         BIT(2)
> > -
> >  struct flow_dissector_key {
> >         enum flow_dissector_key_id key_id;
> >         size_t offset; /* offset of struct flow_dissector_key_*
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index fa1c753dcdbc..b4ad19bd6aa8 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -3507,6 +3507,10 @@ enum bpf_task_fd_type {
> >         BPF_FD_TYPE_URETPROBE,          /* filename + offset */
> >  };
> >
> > +#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG                (1U << 0)
> > +#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL    (1U << 1)
> > +#define FLOW_DISSECTOR_F_STOP_AT_ENCAP         (1U << 2)
> 
> Do we have to move these?
These have to be a part of UAPI one way or another. The easiest thing
we can do is to call the existing flags a UAPI and move them into
exported headers. Alternatively, we can introduce new set of UAPI flags
and do some kind of conversion between exported and internal ones.

Since it's pretty easy to add/deprecate them (see cover letter), I've
decided to just move them instead of adding another set and doing
conversion. I'm open to suggestions if you think otherwise.

^ permalink raw reply

* Re: [PATCH bpf-next 3/7] bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN
From: Song Liu @ 2019-07-24 22:33 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
	Daniel Borkmann, Willem de Bruijn, Petar Penkov
In-Reply-To: <20190724170018.96659-4-sdf@google.com>

On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> This will allow us to write tests for those flags.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  net/bpf/test_run.c | 39 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 35 insertions(+), 4 deletions(-)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 4e41d15a1098..444a7baed791 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -377,6 +377,22 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
>         return ret;
>  }
>
> +static int verify_user_bpf_flow_keys(struct bpf_flow_keys *ctx)
> +{
> +       /* make sure the fields we don't use are zeroed */
> +       if (!range_is_zero(ctx, 0, offsetof(struct bpf_flow_keys, flags)))
> +               return -EINVAL;
> +
> +       /* flags is allowed */
> +
> +       if (!range_is_zero(ctx, offsetof(struct bpf_flow_keys, flags) +
> +                          FIELD_SIZEOF(struct bpf_flow_keys, flags),
> +                          sizeof(struct bpf_flow_keys)))
> +               return -EINVAL;
> +
> +       return 0;
> +}
> +
>  int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>                                      const union bpf_attr *kattr,
>                                      union bpf_attr __user *uattr)
> @@ -384,9 +400,11 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>         u32 size = kattr->test.data_size_in;
>         struct bpf_flow_dissector ctx = {};
>         u32 repeat = kattr->test.repeat;
> +       struct bpf_flow_keys *user_ctx;
>         struct bpf_flow_keys flow_keys;
>         u64 time_start, time_spent = 0;
>         const struct ethhdr *eth;
> +       unsigned int flags = 0;
>         u32 retval, duration;
>         void *data;
>         int ret;
> @@ -395,9 +413,6 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>         if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR)
>                 return -EINVAL;
>
> -       if (kattr->test.ctx_in || kattr->test.ctx_out)
> -               return -EINVAL;
> -
>         if (size < ETH_HLEN)
>                 return -EINVAL;
>
> @@ -410,6 +425,18 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>         if (!repeat)
>                 repeat = 1;
>
> +       user_ctx = bpf_ctx_init(kattr, sizeof(struct bpf_flow_keys));
> +       if (IS_ERR(user_ctx)) {
> +               kfree(data);
> +               return PTR_ERR(user_ctx);
> +       }
> +       if (user_ctx) {
> +               ret = verify_user_bpf_flow_keys(user_ctx);
> +               if (ret)
> +                       goto out;
> +               flags = user_ctx->flags;
> +       }
> +
>         ctx.flow_keys = &flow_keys;
>         ctx.data = data;
>         ctx.data_end = (__u8 *)data + size;
> @@ -419,7 +446,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>         time_start = ktime_get_ns();
>         for (i = 0; i < repeat; i++) {
>                 retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN,
> -                                         size, 0);
> +                                         size, flags);
>
>                 if (signal_pending(current)) {
>                         preempt_enable();
> @@ -450,8 +477,12 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>
>         ret = bpf_test_finish(kattr, uattr, &flow_keys, sizeof(flow_keys),
>                               retval, duration);
> +       if (!ret)
> +               ret = bpf_ctx_finish(kattr, uattr, user_ctx,
> +                                    sizeof(struct bpf_flow_keys));
>
>  out:
>         kfree(data);
> +       kfree(user_ctx);

nit: it is not really necessary now, but maybe put kfree(user_ctx)
before kfree(data).

>         return ret;
>  }
> --
> 2.22.0.657.g960e92d24f-goog
>

^ permalink raw reply

* Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Willem de Bruijn @ 2019-07-24 22:33 UTC (permalink / raw)
  To: Brian Vazquez
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Willem de Bruijn,
	Petar Penkov, LKML, Network Development, bpf
In-Reply-To: <CABCgpaWCLJtDx8kHNiQZneqYZkZ3fzRGnipT5__kmwMhu01g=w@mail.gmail.com>

> > > Because maps can be called from userspace and kernel code, this function
> > > can have a scenario where the next_key was found but by the time we
> > > try to retrieve the value the element is not there, in this case the
> > > function continues and tries to get a new next_key value, skipping the
> > > deleted key. If at some point the function find itself trap in a loop,
> > > it will return -EINTR.
> >
> > Good to point this out! I don't think that unbounded continue;
> > statements until an interrupt happens is sufficient. Please bound the
> > number of retries to a low number.
>
> And what would it be a good number? Maybe 3 attempts?

3 sounds good to me.

> And in that case what error should be reported?

One that's unambiguous and somewhat intuitive for the given issue.
Perhaps EBUSY?

> > > The function will try to fit as much as possible in the buf provided and
> > > will return -EINVAL if buf_len is smaller than elem_size.
> > >
> > > QUEUE and STACK maps are not supported.
> > >
> > > Note that map_dump doesn't guarantee that reading the entire table is
> > > consistent since this function is always racing with kernel and user code
> > > but the same behaviour is found when the entire table is walked using
> > > the current interfaces: map_get_next_key + map_lookup_elem.
> >
> > > It is also important to note that with  a locked map, the lock is grabbed
> > > for 1 entry at the time, meaning that the returned buf might or might not
> > > be consistent.
> >
> > Would it be informative to signal to the caller if the read was
> > complete and consistent (because the entire table was read while the
> > lock was held)?
>
> Mmm.. not sure how we could signal that to the caller.  But I don't
> think there's a way to know it was consistent

Okay, that makes for a simple answer :) No need to try to add a signal, then.

^ permalink raw reply

* Re: [net-next v2 0/5][pull request] 1GbE Intel Wired LAN Driver Updates 2019-07-24
From: David Miller @ 2019-07-24 22:35 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann
In-Reply-To: <20190724212613.1580-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 24 Jul 2019 14:26:08 -0700

> This series contains updates to igc and e1000e client drivers only.
> 
> Sasha provides a couple of cleanups to remove code that is not needed
> and reduce structure sizes.  Updated the MAC reset flow to use the
> device reset flow instead of a port reset flow.  Added addition device
> id's that will be supported.
> 
> Kai-Heng Feng provides a workaround for a possible stalled packet issue
> in our ICH devices due to a clock recovery from the PCH being too slow.
> 
> v2: removed the last patch in the series that supposedly fixed a MAC/PHY
>     de-sync potential issue while waiting for additional information from
>     hardware engineers.

Pulled, thanks Jeff.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox