Netdev List


* Re: [PATCH bpf-next 1/2] bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map
From: Jakub Kicinski @ 2018-01-04 21:08 UTC (permalink / raw)
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180104072746.1569033-2-yhs@fb.com>

On Wed, 3 Jan 2018 23:27:45 -0800, Yonghong Song wrote:
> Currently, bpf syscall command BPF_MAP_GET_NEXT_KEY is not
> supported for stacktrace map. However, there are use cases where
> user space wants to enumerate all stacktrace map entries where
> BPF_MAP_GET_NEXT_KEY command will be really helpful.
> In addition, if user space wants to delete all map entries
> in order to save memory and does not want to close the
> map file descriptor, BPF_MAP_GET_NEXT_KEY may help improve
> performance if map entries are sparsely populated.
> 
> The implementation follows the API specification of existing
> BPF_MAP_GET_NEXT_KEY implementation. If user provides
> an NULL key pointer, the first key is returned. Otherwise,
> the first valid key after the input parameter "key"
> is returned, or -ENOENT if no valid key can be found.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  kernel/bpf/stackmap.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index a15bc63..207b21c 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -226,9 +226,28 @@ int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
>  	return 0;
>  }
>  
> -static int stack_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
> +static int stack_map_get_next_key(struct bpf_map *map, void *key,
> +				  void *next_key)
>  {
> -	return -EINVAL;
> +	struct bpf_stack_map *smap = container_of(map,
> +						  struct bpf_stack_map, map);
> +	u32 id;
> +
> +	WARN_ON_ONCE(!rcu_read_lock_held());
> +
> +	if (!key)
> +		id = 0;
> +	else
> +		id = *(u32 *)key + 1;
> +
> +	while (id < smap->n_buckets && !smap->buckets[id])
> +		id++;
> +
> +	if (id >= smap->n_buckets)
> +		return -ENOENT;

AFAIU for hash maps the semantics of get next are as follows:

get_next(map, key) {
	if (!key)
		return get_first(map);

	elem = lookup(map, key);
	if (!elem)                       // <-- note this branch
		return get_first(map);
	if (elem->next)
		return elem->next->key;
	return -ENOENT;
}

For arrays elements always exist, hence the elem->next check is
omitted.  Here you are, however, testing !smap->buckets[id] so I assume
elements may not exist.  

Is there any precedent for get_next on non-existent key returning
element other than first?  The stacktrace map is a bit special, and
returning id + 1 would defeat what you're trying to do here..  Is there
value in keeping the behaviour consistent across map types?  

Anyway, you said in the commit message that "The implementation follows
the API specification of existing BPF_MAP_GET_NEXT_KEY implementation."
and I find that arguable :)

> +	*(u32 *)next_key = id;
> +	return 0;
>  }
>  
>  static int stack_map_update_elem(struct bpf_map *map, void *key, void *value,
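
The difference raised in this review can be modelled in plain user-space C. The following is only a sketch under stated assumptions, not kernel code: `N_BUCKETS`, the `buckets[]` table, and the helper names `first_from()`, `get_next_v1()` and `get_next_hashlike()` are hypothetical stand-ins. It contrasts the v1 behaviour (always continue from key + 1) with the hash-map-like behaviour described above (a key that does not exist restarts from the first element).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS 8	/* hypothetical size, for illustration only */

/* Toy stand-in for smap->buckets[]: non-NULL means the id is populated. */
static const char *buckets[N_BUCKETS] = {
	[2] = "stack A", [5] = "stack B",
};

/* Scan forward from 'id' for the first populated bucket. */
static int first_from(uint32_t id, uint32_t *next_key)
{
	while (id < N_BUCKETS && !buckets[id])
		id++;
	if (id >= N_BUCKETS)
		return -ENOENT;
	*next_key = id;
	return 0;
}

/* v1 of the patch: always continue from key + 1. */
static int get_next_v1(const uint32_t *key, uint32_t *next_key)
{
	return first_from(key ? *key + 1 : 0, next_key);
}

/* Hash-map-like: a key that does not exist restarts from the first. */
static int get_next_hashlike(const uint32_t *key, uint32_t *next_key)
{
	if (!key || *key >= N_BUCKETS || !buckets[*key])
		return first_from(0, next_key);
	return first_from(*key + 1, next_key);
}
```

With ids 2 and 5 populated, asking for the successor of the unpopulated id 3 yields 5 in the v1 model and 2 in the hash-map-like model; for existing keys the two models agree.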


* [PATCH] sh_eth: remove sh_eth_plat_data::edmac_endian
From: Sergei Shtylyov @ 2018-01-04 21:26 UTC (permalink / raw)
  To: Yoshinori Sato, Rich Felker, linux-sh, netdev
  Cc: linux-renesas-soc, Sergei Shtylyov

[-- Attachment #1: sh_eth-remove-sh_eth_plat_data-edmac_endian.patch --]
[-- Type: text/plain, Size: 4872 bytes --]

Since the commit 888cc8c20cf ("sh_eth: remove EDMAC_BIG_ENDIAN") (geez,
I didn't realize that was 2 years ago!) the initializers in the SuperH
platform code for the 'sh_eth_plat_data::edmac_endian' have stopped mattering,
so we can remove that field for good (not sure if it was ever useful --
SH7786 Ether has been reported to have the same EDMAC descriptor/register
endianness as configured for the SuperH CPU)...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
The patch is against DaveM's 'net-next.git' repo.
Not sure who should apply the patch -- will prolly be faster if DaveM does...

 arch/sh/boards/board-espt.c           |    1 -
 arch/sh/boards/board-sh7757lcr.c      |    4 ----
 arch/sh/boards/mach-ecovec24/setup.c  |    1 -
 arch/sh/boards/mach-se/7724/setup.c   |    1 -
 arch/sh/boards/mach-sh7763rdp/setup.c |    1 -
 arch/sh/kernel/cpu/sh2/setup-sh7619.c |    1 -
 include/linux/sh_eth.h                |    3 ---
 7 files changed, 12 deletions(-)

Index: net-next/arch/sh/boards/board-espt.c
===================================================================
--- net-next.orig/arch/sh/boards/board-espt.c
+++ net-next/arch/sh/boards/board-espt.c
@@ -79,7 +79,6 @@ static struct resource sh_eth_resources[
 
 static struct sh_eth_plat_data sh7763_eth_pdata = {
 	.phy = 0,
-	.edmac_endian = EDMAC_LITTLE_ENDIAN,
 	.phy_interface = PHY_INTERFACE_MODE_MII,
 };
 
Index: net-next/arch/sh/boards/board-sh7757lcr.c
===================================================================
--- net-next.orig/arch/sh/boards/board-sh7757lcr.c
+++ net-next/arch/sh/boards/board-sh7757lcr.c
@@ -76,7 +76,6 @@ static struct resource sh_eth0_resources
 
 static struct sh_eth_plat_data sh7757_eth0_pdata = {
 	.phy = 1,
-	.edmac_endian = EDMAC_LITTLE_ENDIAN,
 	.set_mdio_gate = sh7757_eth_set_mdio_gate,
 };
 
@@ -104,7 +103,6 @@ static struct resource sh_eth1_resources
 
 static struct sh_eth_plat_data sh7757_eth1_pdata = {
 	.phy = 1,
-	.edmac_endian = EDMAC_LITTLE_ENDIAN,
 	.set_mdio_gate = sh7757_eth_set_mdio_gate,
 };
 
@@ -148,7 +146,6 @@ static struct resource sh_eth_giga0_reso
 
 static struct sh_eth_plat_data sh7757_eth_giga0_pdata = {
 	.phy = 18,
-	.edmac_endian = EDMAC_LITTLE_ENDIAN,
 	.set_mdio_gate = sh7757_eth_giga_set_mdio_gate,
 	.phy_interface = PHY_INTERFACE_MODE_RGMII_ID,
 };
@@ -182,7 +179,6 @@ static struct resource sh_eth_giga1_reso
 
 static struct sh_eth_plat_data sh7757_eth_giga1_pdata = {
 	.phy = 19,
-	.edmac_endian = EDMAC_LITTLE_ENDIAN,
 	.set_mdio_gate = sh7757_eth_giga_set_mdio_gate,
 	.phy_interface = PHY_INTERFACE_MODE_RGMII_ID,
 };
Index: net-next/arch/sh/boards/mach-ecovec24/setup.c
===================================================================
--- net-next.orig/arch/sh/boards/mach-ecovec24/setup.c
+++ net-next/arch/sh/boards/mach-ecovec24/setup.c
@@ -159,7 +159,6 @@ static struct resource sh_eth_resources[
 
 static struct sh_eth_plat_data sh_eth_plat = {
 	.phy = 0x1f, /* SMSC LAN8700 */
-	.edmac_endian = EDMAC_LITTLE_ENDIAN,
 	.phy_interface = PHY_INTERFACE_MODE_MII,
 	.ether_link_active_low = 1
 };
Index: net-next/arch/sh/boards/mach-se/7724/setup.c
===================================================================
--- net-next.orig/arch/sh/boards/mach-se/7724/setup.c
+++ net-next/arch/sh/boards/mach-se/7724/setup.c
@@ -374,7 +374,6 @@ static struct resource sh_eth_resources[
 
 static struct sh_eth_plat_data sh_eth_plat = {
 	.phy = 0x1f, /* SMSC LAN8187 */
-	.edmac_endian = EDMAC_LITTLE_ENDIAN,
 	.phy_interface = PHY_INTERFACE_MODE_MII,
 };
 
Index: net-next/arch/sh/boards/mach-sh7763rdp/setup.c
===================================================================
--- net-next.orig/arch/sh/boards/mach-sh7763rdp/setup.c
+++ net-next/arch/sh/boards/mach-sh7763rdp/setup.c
@@ -87,7 +87,6 @@ static struct resource sh_eth_resources[
 
 static struct sh_eth_plat_data sh7763_eth_pdata = {
 	.phy = 1,
-	.edmac_endian = EDMAC_LITTLE_ENDIAN,
 	.phy_interface = PHY_INTERFACE_MODE_MII,
 };
 
Index: net-next/arch/sh/kernel/cpu/sh2/setup-sh7619.c
===================================================================
--- net-next.orig/arch/sh/kernel/cpu/sh2/setup-sh7619.c
+++ net-next/arch/sh/kernel/cpu/sh2/setup-sh7619.c
@@ -122,7 +122,6 @@ static struct platform_device scif2_devi
 
 static struct sh_eth_plat_data eth_platform_data = {
 	.phy		= 1,
-	.edmac_endian	= EDMAC_LITTLE_ENDIAN,
 	.phy_interface	= PHY_INTERFACE_MODE_MII,
 };
 
Index: net-next/include/linux/sh_eth.h
===================================================================
--- net-next.orig/include/linux/sh_eth.h
+++ net-next/include/linux/sh_eth.h
@@ -5,12 +5,9 @@
 #include <linux/phy.h>
 #include <linux/if_ether.h>
 
-enum {EDMAC_LITTLE_ENDIAN};
-
 struct sh_eth_plat_data {
 	int phy;
 	int phy_irq;
-	int edmac_endian;
 	phy_interface_t phy_interface;
 	void (*set_mdio_gate)(void *addr);
 



* Re: [PATCH bpf-next 1/2] bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map
From: Yonghong Song @ 2018-01-04 21:32 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180104130822.377721d8@cakuba.netronome.com>



On 1/4/18 1:08 PM, Jakub Kicinski wrote:
> On Wed, 3 Jan 2018 23:27:45 -0800, Yonghong Song wrote:
>> Currently, bpf syscall command BPF_MAP_GET_NEXT_KEY is not
>> supported for stacktrace map. However, there are use cases where
>> user space wants to enumerate all stacktrace map entries where
>> BPF_MAP_GET_NEXT_KEY command will be really helpful.
>> In addition, if user space wants to delete all map entries
>> in order to save memory and does not want to close the
>> map file descriptor, BPF_MAP_GET_NEXT_KEY may help improve
>> performance if map entries are sparsely populated.
>>
>> The implementation follows the API specification of existing
>> BPF_MAP_GET_NEXT_KEY implementation. If user provides
>> an NULL key pointer, the first key is returned. Otherwise,
>> the first valid key after the input parameter "key"
>> is returned, or -ENOENT if no valid key can be found.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   kernel/bpf/stackmap.c | 23 +++++++++++++++++++++--
>>   1 file changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index a15bc63..207b21c 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -226,9 +226,28 @@ int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
>>   	return 0;
>>   }
>>   
>> -static int stack_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
>> +static int stack_map_get_next_key(struct bpf_map *map, void *key,
>> +				  void *next_key)
>>   {
>> -	return -EINVAL;
>> +	struct bpf_stack_map *smap = container_of(map,
>> +						  struct bpf_stack_map, map);
>> +	u32 id;
>> +
>> +	WARN_ON_ONCE(!rcu_read_lock_held());
>> +
>> +	if (!key)
>> +		id = 0;
>> +	else
>> +		id = *(u32 *)key + 1;
>> +
>> +	while (id < smap->n_buckets && !smap->buckets[id])
>> +		id++;
>> +
>> +	if (id >= smap->n_buckets)
>> +		return -ENOENT;
> 
> AFAIU for hash maps the semantics of get next are as follows:
> 
> get_next(map, key) {
> 	if (!key)
> 		return get_first(map);
> 
> 	elem = lookup(map, key);
> 	if (!elem)                       // <-- note this branch
> 		return get_first(map);
> 	if (elem->next)
> 		return elem->next->key;
> 	return -ENOENT;
> }
> 
> For arrays elements always exist, hence the elem->next check is
> omitted.  Here you are, however, testing !smap->buckets[id] so I assume
> elements may not exist.

Right, buckets[id] could be NULL.

> 
> Is there any precedent for get_next on non-existent key returning
> element other than first?  The stacktrace map is a bit special, and

Sorry, I missed this. You are right. hashtable get_next_key will return
the first key for a non-existing key. And all the other get_next_key
implementations are variants of arrays, where all keys already exist.

> returning id + 1 would defeat what you're trying to do here..  Is there
> value in keeping the behaviour consistent across map types?

Let us keep the behavior consistent with hashtable then.

> 
> Anyway, you said in the commit message that "The implementation follows
> the API specification of existing BPF_MAP_GET_NEXT_KEY implementation."
> and I find that arguable :)

You are right. Will send v2 soon with re-wording of commit message as well.

>> +	*(u32 *)next_key = id;
>> +	return 0;
>>   }
>>   
>>   static int stack_map_update_elem(struct bpf_map *map, void *key, void *value,
> 


* Re: [net-next 00/10] net: create dynamic software irq moderation library
From: Saeed Mahameed @ 2018-01-04 21:37 UTC (permalink / raw)
  To: Andy Gospodarek, netdev; +Cc: mchan, talgi, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-1-git-send-email-andy@greyhouse.net>



On 1/4/2018 12:21 PM, Andy Gospodarek wrote:
> From: Andy Gospodarek <gospo@broadcom.com>
> 
> This converts the dynamic interrupt moderation library from the mlx5_en driver
> into a library so it can be used by any driver.  The penultimate patch in this
> set adds support for interrupt moderation in the bnxt_en driver and the last
> patch creates an entry in the MAINTAINERS file.
> 
> The main purpose of this code in the mlx5_en driver is to allow an
> administrator to make sure that default coalesce settings are optimized
> for low latency, but quickly adapt to handle high throughput traffic and
> optimize how many packets are received during each napi poll.
> 
> For any new driver the following changes would be needed to use this
> library:
> 
> - add elements in ring struct to track items needed by this library
> - create function that can be called to actually set coalesce settings
>    for the driver
> 
> Credit to Rob Rice and Lee Reed for doing some of the initial proof of
> concept and testing for this patch and Tal Gilboa and Or Gerlitz for their
> comments, etc on this set.
>
> Andy Gospodarek (10):
>    net/mlx5e: move interrupt moderation structs to new file
>    net/mlx5e: move interrupt moderation forward declarations
>    net/mlx5e: remove rq references in mlx5e_rx_am
>    net/mlx5e: move AM logic enums
>    net/mlx5e: move generic functions to new file
>    net/mlx5e: change Mellanox references in DIM code
>    net: move dynamic interrpt coalescing code to include/linux
>    net/dim: use struct net_dim_sample as arg to net_dim
>    bnxt_en: add support for software dynamic interrupt moderation
>    MAINTAINERS: add entry for Dynamic Interrupt Moderation

Very clean and nice work!
Thank you Andy for following my suggestion of changing the API to be 
static inline helper functions instead of function pointers provided by 
device drivers, the current API is less demanding for the device drivers 
and has no performance impact.

Awesome work.

Acked-by: Saeed Mahameed <saeedm@mellanox.com>

> 
>   MAINTAINERS                                        |   5 +
>   drivers/net/ethernet/broadcom/bnxt/Makefile        |   2 +-
>   drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  52 +++
>   drivers/net/ethernet/broadcom/bnxt/bnxt.h          |  34 +-
>   drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c      |  32 ++
>   drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |  12 +
>   drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en.h       |  46 +--
>   drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  49 +++
>   .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  12 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  32 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 341 -------------------

Goodbye mlx5/core/en_rx_am.c :)

>   drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  10 +-
>   drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  | 108 ++++++
>   include/linux/mlx5/mlx5_ifc.h                      |   6 -
>   include/linux/net_dim.h                            | 372 +++++++++++++++++++++
>   17 files changed, 693 insertions(+), 426 deletions(-)
>   create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
>   delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
>   create mode 100644 include/linux/net_dim.h
> 


* Re: bonding: Completion of error handling around bond_update_slave_arr()
From: SF Markus Elfring @ 2018-01-04 21:41 UTC (permalink / raw)
  To: Mahesh Bandewar (महेश बंडेवार),
	linux-netdev
  Cc: Andy Gospodarek, Jay Vosburgh, Veaceslav Falico, LKML,
	kernel-janitors
In-Reply-To: <CAF2d9jjyfWT8Nm1V3fCKSrEE4Xs9gShs7nvj+dAYqXg42kUhvg@mail.gmail.com>

>>> If you see 8 out of 9 call sites in this file ignore the return value.
>>
>> What do you think about fixing the error detection and the corresponding
>> exception handling then?
>>
> If I understand your question correctly - not having memory is not a
> correctable error

I am unsure if it would be feasible to retry memory allocations for
this software module under other circumstances.


> and hence there are consequences.

Could one consequence be to let the error code “-ENOMEM” move through
the function call hierarchy?

Regards,
Markus


* [PATCH bpf-next v2 1/2] bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map
From: Yonghong Song @ 2018-01-04 21:55 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180104215504.2013475-1-yhs@fb.com>

Currently, bpf syscall command BPF_MAP_GET_NEXT_KEY is not
supported for stacktrace map. However, there are use cases where
user space wants to enumerate all stacktrace map entries where
BPF_MAP_GET_NEXT_KEY command will be really helpful.
In addition, if user space wants to delete all map entries
in order to save memory and does not want to close the
map file descriptor, BPF_MAP_GET_NEXT_KEY may help improve
performance if map entries are sparsely populated.

The implementation behaves similarly to the
BPF_MAP_GET_NEXT_KEY implementation in hashtab. If user provides
a NULL key pointer or an invalid key, the first key is returned.
Otherwise, the first valid key after the input parameter "key"
is returned, or -ENOENT if no valid key can be found.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/stackmap.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index a15bc63..6c63c22 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -226,9 +226,33 @@ int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 	return 0;
 }

-static int stack_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
+static int stack_map_get_next_key(struct bpf_map *map, void *key,
+				  void *next_key)
 {
-	return -EINVAL;
+	struct bpf_stack_map *smap = container_of(map,
+						  struct bpf_stack_map, map);
+	u32 id;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	if (!key) {
+		id = 0;
+	} else {
+		id = *(u32 *)key;
+		if (id >= smap->n_buckets || !smap->buckets[id])
+			id = 0;
+		else
+			id++;
+	}
+
+	while (id < smap->n_buckets && !smap->buckets[id])
+		id++;
+
+	if (id >= smap->n_buckets)
+		return -ENOENT;
+
+	*(u32 *)next_key = id;
+	return 0;
 }

 static int stack_map_update_elem(struct bpf_map *map, void *key, void *value,
-- 
2.9.5
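
The "delete all map entries" use case from the commit message interacts with the v2 semantics in one subtle way, which the user-space sketch below tries to show. It is a model only, under stated assumptions: `populated[]`, `N_BUCKETS`, `model_get_next_key()` and `model_delete_all()` are hypothetical names, and the model returns -ENOENT directly, whereas the real syscall wrapper reports failure through errno.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS 16	/* hypothetical size */

static bool populated[N_BUCKETS];	/* stand-in for smap->buckets[id] != NULL */

/* Same decision logic as the v2 stack_map_get_next_key() above. */
static int model_get_next_key(const uint32_t *key, uint32_t *next_key)
{
	uint32_t id = 0;

	if (key && *key < N_BUCKETS && populated[*key])
		id = *key + 1;
	while (id < N_BUCKETS && !populated[id])
		id++;
	if (id >= N_BUCKETS)
		return -ENOENT;
	*next_key = id;
	return 0;
}

/* Delete every entry; returns how many were removed. */
static int model_delete_all(void)
{
	uint32_t key, next = 0;
	int deleted = 0;
	int err = model_get_next_key(NULL, &key);

	while (!err) {
		/* Fetch the successor first: once 'key' is gone, asking for
		 * its successor would restart from the first entry.
		 */
		err = model_get_next_key(&key, &next);
		populated[key] = false;	/* stands in for BPF_MAP_DELETE_ELEM */
		deleted++;
		key = next;
	}
	return deleted;
}
```

Fetching the successor before deleting the current key keeps the walk linear. Deleting first would still terminate, because a missing key restarts from the first remaining entry, but the loop would then rely on that restart behaviour rather than on ordinary iteration.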


* [PATCH bpf-next v2 2/2] tools/bpf: add a bpf selftest for stacktrace
From: Yonghong Song @ 2018-01-04 21:55 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180104215504.2013475-1-yhs@fb.com>

Added a bpf selftest in test_progs at tools directory for stacktrace.
The test will populate a hashtable map and a stacktrace map
at the same time with the same key, stackid.
The user space will compare both maps, using BPF_MAP_LOOKUP_ELEM
command and BPF_MAP_GET_NEXT_KEY command, to ensure that both have
the same set of keys.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/Makefile              |   2 +-
 tools/testing/selftests/bpf/test_progs.c          | 127 ++++++++++++++++++++++
 tools/testing/selftests/bpf/test_stacktrace_map.c |  62 +++++++++++
 3 files changed, 190 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_stacktrace_map.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 1304753..a8aa7e2 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -19,7 +19,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
 	test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o     \
 	sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
-	test_l4lb_noinline.o test_xdp_noinline.o
+	test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o
 
 TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh \
 	test_offload.py
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 09087ab..b549308 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -837,6 +837,132 @@ static void test_tp_attach_query(void)
 	free(query);
 }
 
+static int compare_map_keys(int map1_fd, int map2_fd)
+{
+	__u32 key, next_key;
+	char val_buf[PERF_MAX_STACK_DEPTH * sizeof(__u64)];
+	int err;
+
+	err = bpf_map_get_next_key(map1_fd, NULL, &key);
+	if (err)
+		return err;
+	err = bpf_map_lookup_elem(map2_fd, &key, val_buf);
+	if (err)
+		return err;
+
+	while (bpf_map_get_next_key(map1_fd, &key, &next_key) == 0) {
+		err = bpf_map_lookup_elem(map2_fd, &next_key, val_buf);
+		if (err)
+			return err;
+
+		key = next_key;
+	}
+	if (errno != ENOENT)
+		return -1;
+
+	return 0;
+}
+
+static void test_stacktrace_map()
+{
+	int control_map_fd, stackid_hmap_fd, stackmap_fd;
+	const char *file = "./test_stacktrace_map.o";
+	int bytes, efd, err, pmu_fd, prog_fd;
+	struct perf_event_attr attr = {};
+	__u32 key, val, duration = 0;
+	struct bpf_object *obj;
+	char buf[256];
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
+	if (CHECK(err, "prog_load", "err %d errno %d\n", err, errno))
+		goto out;
+
+	/* Get the ID for the sched/sched_switch tracepoint */
+	snprintf(buf, sizeof(buf),
+		 "/sys/kernel/debug/tracing/events/sched/sched_switch/id");
+	efd = open(buf, O_RDONLY, 0);
+	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
+		goto close_prog;
+
+	bytes = read(efd, buf, sizeof(buf));
+	close(efd);
+	if (CHECK(bytes <= 0 || bytes >= sizeof(buf),
+		  "read", "bytes %d errno %d\n", bytes, errno))
+		goto close_prog;
+
+	/* Open the perf event and attach bpf program */
+	attr.config = strtol(buf, NULL, 0);
+	attr.type = PERF_TYPE_TRACEPOINT;
+	attr.sample_type = PERF_SAMPLE_RAW | PERF_SAMPLE_CALLCHAIN;
+	attr.sample_period = 1;
+	attr.wakeup_events = 1;
+	pmu_fd = syscall(__NR_perf_event_open, &attr, -1 /* pid */,
+			 0 /* cpu 0 */, -1 /* group id */,
+			 0 /* flags */);
+	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n",
+		  pmu_fd, errno))
+		goto close_prog;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n",
+		  err, errno))
+		goto close_pmu;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n",
+		  err, errno))
+		goto disable_pmu;
+
+	/* find map fds */
+	control_map_fd = bpf_find_map(__func__, obj, "control_map");
+	if (CHECK(control_map_fd < 0, "bpf_find_map control_map",
+		  "err %d errno %d\n", err, errno))
+		goto disable_pmu;
+
+	stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap");
+	if (CHECK(stackid_hmap_fd < 0, "bpf_find_map stackid_hmap",
+		  "err %d errno %d\n", err, errno))
+		goto disable_pmu;
+
+	stackmap_fd = bpf_find_map(__func__, obj, "stackmap");
+	if (CHECK(stackmap_fd < 0, "bpf_find_map stackmap", "err %d errno %d\n",
+		  err, errno))
+		goto disable_pmu;
+
+	/* give some time for bpf program run */
+	sleep(1);
+
+	/* disable stack trace collection */
+	key = 0;
+	val = 1;
+	bpf_map_update_elem(control_map_fd, &key, &val, 0);
+
+	/* for every element in stackid_hmap, we can find a corresponding one
+	 * in stackmap, and vice versa.
+	 */
+	err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
+	if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
+		  "err %d errno %d\n", err, errno))
+		goto disable_pmu;
+
+	err = compare_map_keys(stackmap_fd, stackid_hmap_fd);
+	if (CHECK(err, "compare_map_keys stackmap vs. stackid_hmap",
+		  "err %d errno %d\n", err, errno))
+		; /* fall through */
+
+disable_pmu:
+	ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE);
+
+close_pmu:
+	close(pmu_fd);
+
+close_prog:
+	bpf_object__close(obj);
+
+out:
+	return;
+}
+
 int main(void)
 {
 	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
@@ -852,6 +978,7 @@ int main(void)
 	test_pkt_md_access();
 	test_obj_name();
 	test_tp_attach_query();
+	test_stacktrace_map();
 
 	printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
 	return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
diff --git a/tools/testing/selftests/bpf/test_stacktrace_map.c b/tools/testing/selftests/bpf/test_stacktrace_map.c
new file mode 100644
index 0000000..76d85c5d
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_stacktrace_map.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+#ifndef PERF_MAX_STACK_DEPTH
+#define PERF_MAX_STACK_DEPTH         127
+#endif
+
+struct bpf_map_def SEC("maps") control_map = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u32),
+	.max_entries = 1,
+};
+
+struct bpf_map_def SEC("maps") stackid_hmap = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u32),
+	.max_entries = 10000,
+};
+
+struct bpf_map_def SEC("maps") stackmap = {
+	.type = BPF_MAP_TYPE_STACK_TRACE,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u64) * PERF_MAX_STACK_DEPTH,
+	.max_entries = 10000,
+};
+
+/* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
+struct sched_switch_args {
+	unsigned long long pad;
+	char prev_comm[16];
+	int prev_pid;
+	int prev_prio;
+	long long prev_state;
+	char next_comm[16];
+	int next_pid;
+	int next_prio;
+};
+
+SEC("tracepoint/sched/sched_switch")
+int oncpu(struct sched_switch_args *ctx)
+{
+	__u32 key = 0, val = 0, *value_p;
+
+	value_p = bpf_map_lookup_elem(&control_map, &key);
+	if (value_p && *value_p)
+		return 0; /* skip if non-zero *value_p */
+
+	/* The size of stackmap and stackid_hmap should be the same */
+	key = bpf_get_stackid(ctx, &stackmap, 0);
+	if ((int)key >= 0)
+		bpf_map_update_elem(&stackid_hmap, &key, &val, 0);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1; /* ignored by tracepoints, required by libbpf.a */
-- 
2.9.5


* [PATCH bpf-next v2 0/2] bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map
From: Yonghong Song @ 2018-01-04 21:55 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team

The patch set implements bpf syscall command BPF_MAP_GET_NEXT_KEY
for stacktrace map. Patch #1 is the core implementation
and Patch #2 implements a bpf test at tools/testing/selftests/bpf
directory. Please see individual patch comments for details.

Changelog:
  v1 -> v2:
   - For invalid key (key pointer is non-NULL), sets next_key to be the first valid key.

Yonghong Song (2):
  bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map
  tools/bpf: add a bpf selftest for stacktrace

 kernel/bpf/stackmap.c                             |  28 ++++-
 tools/testing/selftests/bpf/Makefile              |   2 +-
 tools/testing/selftests/bpf/test_progs.c          | 127 ++++++++++++++++++++++
 tools/testing/selftests/bpf/test_stacktrace_map.c |  62 +++++++++++
 4 files changed, 216 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_stacktrace_map.c

-- 
2.9.5


* [PATCH net-next] net: ipv6: Allow connect to linklocal address from socket bound to vrf
From: David Ahern @ 2018-01-04 22:03 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

Allow a process bound to a VRF to connect to a linklocal address.
Currently, this fails because of a mismatch between the scope of the
linklocal address and the sk_bound_dev_if inherited by the VRF binding:
    $ ssh -6 fe80::70b8:cff:fedd:ead8%eth1
    ssh: connect to host fe80::70b8:cff:fedd:ead8%eth1 port 22: Invalid argument

Relax the scope check to allow the socket to be bound to the same L3
device as the scope id.

This makes ipv6 linklocal consistent with other relaxed checks enabled
by commits 1ff23beebdd3 ("net: l3mdev: Allow send on enslaved interface")
and 7bb387c5ab12a ("net: Allow IP_MULTICAST_IF to set index to L3 slave").

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/net/sock.h  | 20 ++++++++++++++++++++
 net/ipv6/datagram.c |  3 +--
 net/ipv6/tcp_ipv6.c |  3 +--
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 66fd3951e6f3..73b7830b0bb8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -72,6 +72,7 @@
 #include <net/tcp_states.h>
 #include <linux/net_tstamp.h>
 #include <net/smc.h>
+#include <net/l3mdev.h>
 
 /*
  * This structure really needs to be cleaned up.
@@ -2399,4 +2400,23 @@ static inline void sk_pacing_shift_update(struct sock *sk, int val)
 	sk->sk_pacing_shift = val;
 }
 
+/* if a socket is bound to a device, check that the given device
+ * index is either the same or that the socket is bound to an L3
+ * master device and the given device index is also enslaved to
+ * that L3 master
+ */
+static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
+{
+	int mdif;
+
+	if (!sk->sk_bound_dev_if || sk->sk_bound_dev_if == dif)
+		return true;
+
+	mdif = l3mdev_master_ifindex_by_index(sock_net(sk), dif);
+	if (mdif && mdif == sk->sk_bound_dev_if)
+		return true;
+
+	return false;
+}
+
 #endif	/* _SOCK_H */
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index a1f918713006..fbf08ce3f5ab 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -221,8 +221,7 @@ int __ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr,
 	if (__ipv6_addr_needs_scope_id(addr_type)) {
 		if (addr_len >= sizeof(struct sockaddr_in6) &&
 		    usin->sin6_scope_id) {
-			if (sk->sk_bound_dev_if &&
-			    sk->sk_bound_dev_if != usin->sin6_scope_id) {
+			if (!sk_dev_equal_l3scope(sk, usin->sin6_scope_id)) {
 				err = -EINVAL;
 				goto out;
 			}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index aa12a26a96c6..c0f7e69f2e6c 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -176,8 +176,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 			/* If interface is set while binding, indices
 			 * must coincide.
 			 */
-			if (sk->sk_bound_dev_if &&
-			    sk->sk_bound_dev_if != usin->sin6_scope_id)
+			if (!sk_dev_equal_l3scope(sk, usin->sin6_scope_id))
 				return -EINVAL;
 
 			sk->sk_bound_dev_if = usin->sin6_scope_id;
-- 
2.11.0
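
The relaxed scope check can be exercised outside the kernel with a small model. This is a sketch under stated assumptions, not the kernel implementation: the `links[]` table, its ifindex values, and `master_ifindex()` are hypothetical stand-ins for the real device topology and for l3mdev_master_ifindex_by_index(), and `bound_dev_if` plays the role of sk->sk_bound_dev_if.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical topology: ifindex -> ifindex of its L3 master (0 = none). */
struct link {
	int ifindex;
	int master;
};

static const struct link links[] = {
	{ 2, 10 },	/* eth1, enslaved to a VRF with ifindex 10 */
	{ 3, 0 },	/* eth2, not enslaved */
	{ 10, 0 },	/* the VRF device itself */
};

/* Simplified stand-in for l3mdev_master_ifindex_by_index(). */
static int master_ifindex(int ifindex)
{
	for (size_t i = 0; i < sizeof(links) / sizeof(links[0]); i++)
		if (links[i].ifindex == ifindex)
			return links[i].master;
	return 0;
}

/* Mirrors the logic of sk_dev_equal_l3scope() from the patch. */
static bool dev_equal_l3scope(int bound_dev_if, int dif)
{
	int mdif;

	if (!bound_dev_if || bound_dev_if == dif)
		return true;

	mdif = master_ifindex(dif);
	return mdif && mdif == bound_dev_if;
}
```

In this model a socket bound to the VRF (ifindex 10) passes the check for a scope id of 2 (the enslaved eth1) but still fails for 3 (eth2, outside the VRF), which matches the intent stated in the commit message.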


* Re: [net-next 09/10] bnxt_en: add support for software dynamic interrupt moderation
From: Michael Chan @ 2018-01-04 22:16 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Netdev, michael.chan@broadcom.com, talgi, ogerlitz,
	Andy Gospodarek
In-Reply-To: <1515097290-17470-10-git-send-email-andy@greyhouse.net>

On Thu, Jan 4, 2018 at 12:21 PM, Andy Gospodarek <andy@greyhouse.net> wrote:
> From: Andy Gospodarek <gospo@broadcom.com>
>
> This implements the changes needed for the bnxt_en driver to add support
> for dynamic interrupt moderation per ring.
>
> This does add additional counters in the receive path, but testing shows
> that any additional instructions are offset by throughput gain when the
> default configuration is for low latency.
>
> Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
> Cc: Michael Chan <mchan@broadcom.com>

Andy, looks good in general. I just have a few comments below.  These
minor issues can be cleaned up after merge if you want.

....
> +int bnxt_hwrm_set_ring_coal(struct bnxt *bp, struct bnxt_napi *bnapi)
> +{
> +       struct hwrm_ring_cmpl_ring_cfg_aggint_params_input req_rx = {0};
> +       struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
> +       struct bnxt_coal coal;
> +       unsigned int grp_idx;
> +       int rc = 0;
> +
> +        /* Tick values in micro seconds.
> +         * 1 coal_buf x bufs_per_record = 1 completion record.
> +         */
> +       memcpy(&coal, &bp->rx_coal, sizeof(struct bnxt_coal));
> +
> +       coal.coal_ticks = cpr->rx_ring_coal.coal_ticks;
> +       coal.coal_bufs = cpr->rx_ring_coal.coal_bufs;
> +
> +       if (!bnapi->rx_ring)
> +               return -ENODEV;
> +
> +       bnxt_hwrm_cmd_hdr_init(bp, &req_rx,
> +                              HWRM_RING_CMPL_RING_CFG_AGGINT_PARAMS, -1, -1);
> +
> +       bnxt_hwrm_set_coal_params(&coal, &req_rx);
> +
> +       mutex_lock(&bp->hwrm_cmd_lock);
> +       grp_idx = bnapi->index;
> +
> +       req_rx.ring_id = cpu_to_le16(bp->grp_info[grp_idx].cp_fw_ring_id);
> +
> +       rc = _hwrm_send_message(bp, &req_rx, sizeof(req_rx),
> +                               HWRM_CMD_TIMEOUT);
> +       mutex_unlock(&bp->hwrm_cmd_lock);

You can use the hwrm_send_message() variant here; it takes the mutex
for you.  You only need the _hwrm_send_message() variant, with the mutex
held by the caller, if you need to check the firmware reply.

> +       return rc;
> +}
> +
>  int bnxt_hwrm_set_coal(struct bnxt *bp)
>  {
>         int i, rc = 0;
> @@ -5705,7 +5753,11 @@ static void bnxt_enable_napi(struct bnxt *bp)
>         int i;
>
>         for (i = 0; i < bp->cp_nr_rings; i++) {

We only need to enable this for every completion ring that has an RX
ring.  In some cases, for example when XDP is enabled, there will be a
set of completion rings with only TX rings.  So I think we can
optimize this for completion rings with RX only.

> +               struct bnxt_cp_ring_info *cpr = &bp->bnapi[i]->cp_ring;
>                 bp->bnapi[i]->in_reset = false;
> +
> +               INIT_WORK(&cpr->am.work, bnxt_dim_work);
> +               cpr->am.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
>                 napi_enable(&bp->bnapi[i]->napi);
>         }
>  }

^ permalink raw reply
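
Editor's note: the locked/unlocked split discussed in the review above can be
sketched in userspace C as follows. This is a minimal sketch, not the bnxt_en
code: struct dev, send_message(), _send_message() and send_and_read_reply()
are hypothetical stand-ins, and the idea that the reply lives in a buffer
shared by all commands is an assumption made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical device with one command lock and one shared reply slot. */
struct dev {
	pthread_mutex_t cmd_lock;
	int last_reply;	/* assumed valid only while cmd_lock is held */
	int sent;	/* number of commands issued */
};

/* Unlocked variant: the caller must already hold cmd_lock. */
static int _send_message(struct dev *d, int req)
{
	d->last_reply = req * 2;	/* pretend the firmware answered */
	d->sent++;
	return 0;
}

/* Locked variant: enough when the caller ignores the reply. */
static int send_message(struct dev *d, int req)
{
	int rc;

	pthread_mutex_lock(&d->cmd_lock);
	rc = _send_message(d, req);
	pthread_mutex_unlock(&d->cmd_lock);
	return rc;
}

/* Reading the reply: hold the lock across both the send and the read. */
static int send_and_read_reply(struct dev *d, int req, int *reply)
{
	int rc;

	assert(reply != NULL);
	pthread_mutex_lock(&d->cmd_lock);
	rc = _send_message(d, req);
	if (!rc)
		*reply = d->last_reply;
	pthread_mutex_unlock(&d->cmd_lock);
	return rc;
}
```

A caller that only needs the return code would use send_message(); only a
caller that inspects the reply needs to manage the lock itself.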

* Re: [net-next PATCH 1/2] bpf: sockmap remove unused function
From: Daniel Borkmann @ 2018-01-04 22:27 UTC (permalink / raw)
  To: John Fastabend, borkmann, alexei.starovoitov; +Cc: netdev
In-Reply-To: <20180104015739.14160.96127.stgit@john-Precision-Tower-5810>

On 01/04/2018 02:57 AM, John Fastabend wrote:
> This was added for some work that was eventually factored out but the
> helper call was missed. Remove it now and add it back later if needed.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Both applied to bpf-next, thanks John!

^ permalink raw reply

* Re: [net-next 10/10] MAINTAINERS: add entry for Dynamic Interrupt Moderation
From: Stephen Hemminger @ 2018-01-04 22:36 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev, mchan, talgi, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-11-git-send-email-andy@greyhouse.net>

On Thu,  4 Jan 2018 15:21:30 -0500
Andy Gospodarek <andy@greyhouse.net> wrote:

> +DYNAMIC INTERRUPT MODERATION
> +M:	Tal Gilboa <talgi@mellanox.com>
> +S:	Mainained

s/Mainained/Maintained/

^ permalink raw reply

* Re: [PATCH net-next v2 05/10] net: qualcomm: rmnet: Set pacing rate
From: Subash Abhinov Kasiviswanathan @ 2018-01-04 22:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, lkp
In-Reply-To: <1515051890.131759.6.camel@gmail.com>

>> The real device over which the rmnet devices are installed also
>> aggregates multiple IP packets and sends them as a single large
>> aggregate frame to the hardware.
> 
> It would be nice to give some details about this in the changelog.
> 
> Also, what results do you get with different values for the shift
> (10, 9, 8)?
> 
> My fear is that people might be tempted to blindly use the
> sk_pacing_shift_update() just because a single TCP flow gets 'better'
> results.
> 
> bufferbloat is a serious issue, we do not want to allow a single TCP
> flow to fill a fifo.
> 
> Otherwise, we could remove TCP Small queues overhead from the kernel
> and be happy.
> 
> Thanks.

The test was run with iperf single stream TCP TX for a duration of 30s.

Pacing shift | Observed data rate (Mbps)
10           | 9
9            | 140
8            | 146

I will update all of this in the commit text in v3.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply
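
Editor's note: as a rough aid for reading the table above, the pacing shift
can be thought of as dividing the pacing rate (in bytes per second) by a power
of two to get the amount of data a flow may have queued below the stack. The
sketch below assumes that simplified model and ignores any floors or caps the
kernel applies; tsq_budget_bytes() and budget_usec() are hypothetical helper
names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate per-flow byte budget: pacing rate (bytes/s) >> shift. */
static uint64_t tsq_budget_bytes(uint64_t pacing_rate_Bps, unsigned int shift)
{
	assert(shift < 64);
	return pacing_rate_Bps >> shift;
}

/* The same budget expressed as transmit time at that rate, in microseconds. */
static uint64_t budget_usec(unsigned int shift)
{
	assert(shift < 64);
	return 1000000ULL >> shift;
}
```

Under this model a shift of 10 corresponds to roughly 1 ms of queued data,
9 to roughly 2 ms, and 8 to roughly 4 ms, which is why a smaller shift helps a
device that aggregates frames but also lets a single flow queue more.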

* Re: [net-next 10/10] MAINTAINERS: add entry for Dynamic Interrupt Moderation
From: Andy Gospodarek @ 2018-01-04 22:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, mchan, talgi, ogerlitz, Andy Gospodarek
In-Reply-To: <20180104143654.0600b695@xeon-e3>

On Thu, Jan 04, 2018 at 02:36:54PM -0800, Stephen Hemminger wrote:
> On Thu,  4 Jan 2018 15:21:30 -0500
> Andy Gospodarek <andy@greyhouse.net> wrote:
> 
> > +DYNAMIC INTERRUPT MODERATION
> > +M:	Tal Gilboa <talgi@mellanox.com>
> > +S:	Mainained
> 
> s/Mainained/Maintained/

Ugh.  Thanks for noticing that, Stephen!

^ permalink raw reply

* Re: [net-next 00/10] net: create dynamic software irq moderation library
From: Andy Gospodarek @ 2018-01-04 22:46 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Linux Netdev List, mchan, Tal Gilboa, Or Gerlitz, Andy Gospodarek
In-Reply-To: <CAJ3xEMhSGCbYVA2dwgbtFObHb--GQZcUJv6SOQkJ_GDOjw4VeA@mail.gmail.com>

On Thu, Jan 04, 2018 at 10:37:37PM +0200, Or Gerlitz wrote:
> >   net/mlx5e: move interrupt moderation structs to new file
> >   net/mlx5e: move interrupt moderation forward declarations
> >   net/mlx5e: remove rq references in mlx5e_rx_am
> >   net/mlx5e: move AM logic enums
> >   net/mlx5e: move generic functions to new file
> >   net/mlx5e: change Mellanox references in DIM code
> 
> Hi, Andy && happy new 2018 --  this is indeed a nit, but I have
> provided it to you twice (...),
> please get the commit titles to align with what we do which is capital
> letter after the net/mlx5e: prefix
> 
> from: net/mlx5e: move interrupt moderation structs to new file
> to: net/mlx5e: Move interrupt moderation structs to new file
> 
> If you get other comments, just apply this for the next version, if everyone
> is happy, that would be a very small effort to just fix and get that in..

Or, you did mention this and I'm _really_ sorry that I forgot to add
the capitalization.  I will do that if there is a v2 (which it looks
like there might be, since I cannot spell 'Maintained' correctly).

^ permalink raw reply

* Re: [PATCH 8/8] net: tipc: remove unused hardirq.h
From: Yang Shi @ 2018-01-04 22:46 UTC (permalink / raw)
  To: linux-kernel, David S. Miller
  Cc: Ying Xue, linux-mm, linux-fsdevel, linux-crypto, netdev,
	Jon Maloy
In-Reply-To: <4ed1efbc-5fb8-7412-4f46-1e3a91a98373@windriver.com>

Hi David,

Any more comment on this change?

Thanks,
Yang


On 12/7/17 5:40 PM, Ying Xue wrote:
> On 11/18/2017 07:02 AM, Yang Shi wrote:
>> Preempt counter APIs have been split out, currently, hardirq.h just
>> includes irq_enter/exit APIs which are not used by TIPC at all.
>>
>> So, remove the unused hardirq.h.
>>
>> Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
>> Cc: Jon Maloy <jon.maloy@ericsson.com>
>> Cc: Ying Xue <ying.xue@windriver.com>
>> Cc: "David S. Miller" <davem@davemloft.net>
> 
> Tested-by: Ying Xue <ying.xue@windriver.com>
> Acked-by: Ying Xue <ying.xue@windriver.com>
> 
>> ---
>>   net/tipc/core.h | 1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/net/tipc/core.h b/net/tipc/core.h
>> index 5cc5398..099e072 100644
>> --- a/net/tipc/core.h
>> +++ b/net/tipc/core.h
>> @@ -49,7 +49,6 @@
>>   #include <linux/uaccess.h>
>>   #include <linux/interrupt.h>
>>   #include <linux/atomic.h>
>> -#include <asm/hardirq.h>
>>   #include <linux/netdevice.h>
>>   #include <linux/in.h>
>>   #include <linux/list.h>
>>

^ permalink raw reply

* Re: [ovs-dev] [PATCH 7/8] net: ovs: remove unused hardirq.h
From: Yang Shi @ 2018-01-04 22:47 UTC (permalink / raw)
  To: David S. Miller
  Cc: Pravin Shelar, linux-kernel, ovs dev,
	Linux Kernel Network Developers, linux-mm, Pravin Shelar,
	linux-crypto, linux-fsdevel
In-Reply-To: <CAOrHB_CiK-A0nphB2xVTG_5P_xeFOkg0xc6iNNbT=MXq1XgU=A@mail.gmail.com>

Hi David,

Any comment is appreciated.

Thanks,
Yang


On 12/7/17 11:27 AM, Pravin Shelar wrote:
> On Fri, Nov 17, 2017 at 3:02 PM, Yang Shi <yang.s@alibaba-inc.com> wrote:
>> Preempt counter APIs have been split out, currently, hardirq.h just
>> includes irq_enter/exit APIs which are not used by openvswitch at all.
>>
>> So, remove the unused hardirq.h.
>>
>> Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
>> Cc: Pravin Shelar <pshelar@nicira.com>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: dev@openvswitch.org
> 
> Acked-by: Pravin B Shelar <pshelar@ovn.org>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH RESEND] bpf: fix bad include in libbpf when srctree is set
From: Daniel Borkmann @ 2018-01-04 22:47 UTC (permalink / raw)
  To: Alexander Alemayhu; +Cc: netdev, alexei.starovoitov, acme
In-Reply-To: <20180101123654.16014-1-alexander@alemayhu.com>

On 01/01/2018 01:36 PM, Alexander Alemayhu wrote:
> The relative path can be wrong and prevents the build.

You mean if you move the files from tools/lib/bpf/ to a different
location, and then specify srctree var pointing to a kernel tree?

I think this would also break various other assumptions inside the
tools/lib/bpf/Makefile, e.g. the uapi header checks also depending
on relative paths.

> 	Makefile:57: ../../scripts/Makefile.include: No such file or directory
> 	make: *** No rule to make target '../../scripts/Makefile.include'.  Stop.
> 
> Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
> ---
>  tools/lib/bpf/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
> index 4555304dc18e..0068829a56db 100644
> --- a/tools/lib/bpf/Makefile
> +++ b/tools/lib/bpf/Makefile
> @@ -54,7 +54,7 @@ man_dir_SQ = '$(subst ','\'',$(man_dir))'
>  export man_dir man_dir_SQ INSTALL
>  export DESTDIR DESTDIR_SQ
>  
> -include ../../scripts/Makefile.include
> +include $(srctree)/tools/scripts/Makefile.include
>  
>  # copy a bit from Linux kbuild
>  
> 

^ permalink raw reply

* Re: [PATCH 6/8] net: caif: remove unused hardirq.h
From: Yang Shi @ 2018-01-04 22:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-fsdevel, linux-crypto, netdev, Dmitry Tarnyagin,
	David S. Miller
In-Reply-To: <9ad5b35a-8d4c-448a-912b-2816c4c8c53f@alibaba-inc.com>

Hi David,

I'm not sure if CAIF is still maintained by Dmitry Tarnyagin. Do you 
have any comment on this one?

Thanks,
Yang


On 12/7/17 11:13 AM, Yang Shi wrote:
> Hi folks,
> 
> Any comment on this one?
> 
> Thanks,
> Yang
> 
> 
> On 11/17/17 3:02 PM, Yang Shi wrote:
>> Preempt counter APIs have been split out, currently, hardirq.h just
>> includes irq_enter/exit APIs which are not used by caif at all.
>>
>> So, remove the unused hardirq.h.
>>
>> Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
>> Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> ---
>>   net/caif/cfpkt_skbuff.c | 1 -
>>   net/caif/chnl_net.c     | 1 -
>>   2 files changed, 2 deletions(-)
>>
>> diff --git a/net/caif/cfpkt_skbuff.c b/net/caif/cfpkt_skbuff.c
>> index 71b6ab2..38c2b7a 100644
>> --- a/net/caif/cfpkt_skbuff.c
>> +++ b/net/caif/cfpkt_skbuff.c
>> @@ -8,7 +8,6 @@
>>   #include <linux/string.h>
>>   #include <linux/skbuff.h>
>> -#include <linux/hardirq.h>
>>   #include <linux/export.h>
>>   #include <net/caif/cfpkt.h>
>> diff --git a/net/caif/chnl_net.c b/net/caif/chnl_net.c
>> index 922ac1d..53ecda1 100644
>> --- a/net/caif/chnl_net.c
>> +++ b/net/caif/chnl_net.c
>> @@ -8,7 +8,6 @@
>>   #define pr_fmt(fmt) KBUILD_MODNAME ":%s(): " fmt, __func__
>>   #include <linux/fs.h>
>> -#include <linux/hardirq.h>
>>   #include <linux/init.h>
>>   #include <linux/module.h>
>>   #include <linux/netdevice.h>
>>

^ permalink raw reply

* Re: [PATCH net-next] net: sched: fix tcf_block_get_ext() in case CONFIG_NET_CLS is not set
From: Cong Wang @ 2018-01-04 23:22 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Jakub Kicinski, Linux Kernel Network Developers, oss-drivers,
	Jiri Pirko, Alexander Aring
In-Reply-To: <1d6a9fe2-f1e2-116e-3f39-e81877db4cf3@netronome.com>

On Thu, Jan 4, 2018 at 1:59 AM, Quentin Monnet
<quentin.monnet@netronome.com> wrote:
> Hi Cong,
>
> 2018-01-03 18:08 UTC-0800 ~ Cong Wang <xiyou.wangcong@gmail.com>
>> On Wed, Jan 3, 2018 at 5:30 PM, Jakub Kicinski
>> <jakub.kicinski@netronome.com> wrote:
>>> From: Quentin Monnet <quentin.monnet@netronome.com>
>>>
>>> The definition of functions tcf_block_get() and tcf_block_get_ext()
>>> depends on CONFIG_NET_CLS being set. When those functions gained extack
>>> support, only one version of the declaration of those functions was
>>> updated. Function tcf_block_get() was later fixed with commit
>>> 3c1490913f3b ("net: sch: api: fix tcf_block_get").
>>>
>>> Change arguments of tcf_block_get_ext() for the case when CONFIG_NET_CLS
>>> is not set.
>>
>> There is one already:
>> https://patchwork.kernel.org/patch/10130849/
>>
>
> Thanks! But this patch is the one I mentioned in the commit log: it
> fixes a different function, tcf_block_get(). My patch is an additional
> fix for tcf_block_get_ext().

Oh, I thought it was the same one.

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>

^ permalink raw reply

* [PATCH net-next] tcp: Split BUG_ON() in tcp_tso_should_defer() into two assertions
From: Stefano Brivio @ 2018-01-04 23:38 UTC (permalink / raw)
  To: David S . Miller; +Cc: Eric Dumazet, netdev

The two conditions triggering BUG_ON() are somewhat unrelated:
the tcp_skb_pcount() check is meant to catch TSO flaws, the
second one checks sanity of congestion window bookkeeping.

Split them into two separate BUG_ON() assertions on two lines,
so that we know which one actually triggers, when they do.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 net/ipv4/tcp_output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 04be9f833927..95461f02ac9a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1944,7 +1944,8 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,

 	in_flight = tcp_packets_in_flight(tp);

-	BUG_ON(tcp_skb_pcount(skb) <= 1 || (tp->snd_cwnd <= in_flight));
+	BUG_ON(tcp_skb_pcount(skb) <= 1);
+	BUG_ON(tp->snd_cwnd <= in_flight);

 	send_win = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;

-- 
2.9.4

^ permalink raw reply related
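
Editor's note: the benefit of splitting a combined assertion can be shown with
a small userspace sketch. BUG_IF() here is a hypothetical macro that returns
the text of the failed condition instead of crashing; it only illustrates why
two checks on two lines tell you which condition fired.

```c
#include <assert.h>	/* used by the usage example only */
#include <stddef.h>

/* Hypothetical stand-in for BUG_ON(): report the condition that fired. */
#define BUG_IF(cond) do { if (cond) return #cond; } while (0)

/* One combined check: both failures produce the same report. */
static const char *combined_check(unsigned int pcount, unsigned int cwnd,
				  unsigned int in_flight)
{
	BUG_IF(pcount <= 1 || cwnd <= in_flight);
	return NULL;
}

/* Two separate checks: the report names the condition that failed. */
static const char *split_check(unsigned int pcount, unsigned int cwnd,
			       unsigned int in_flight)
{
	BUG_IF(pcount <= 1);
	BUG_IF(cwnd <= in_flight);
	return NULL;
}
```

With the combined form, a bad packet count and a bad window are
indistinguishable in the report; with the split form each has its own line.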

* [PATCH net 0/2] bnxt_en: 2 small bug fixes.
From: Michael Chan @ 2018-01-04 23:46 UTC (permalink / raw)
  To: davem; +Cc: netdev

The first one fixes the TC Flower flow parameter passed to firmware.  The
2nd one fixes the VF index range checking for iproute2 SRIOV related commands.

Sunil Challa (1):
  bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()

Venkat Duvvuru (1):
  bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.

 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* [PATCH net 1/2] bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
From: Michael Chan @ 2018-01-04 23:46 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1515109615-22695-1-git-send-email-michael.chan@broadcom.com>

From: Sunil Challa <sunilkumar.challa@broadcom.com>

flow_type in HWRM_FLOW_ALLOC is not being populated correctly because the
l3_mask pointer and size are passed incorrectly to is_wildcard().
Fix this by passing the pointer itself and the size of the struct it
points to.

Fixes: db1d36a27324 ("bnxt_en: add TC flower offload flow_alloc/free FW cmds")
Signed-off-by: Sunil Challa <sunilkumar.challa@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
index 3d201d7..d8fee26 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
@@ -421,7 +421,7 @@ static int bnxt_hwrm_cfa_flow_alloc(struct bnxt *bp, struct bnxt_tc_flow *flow,
 	}
 
 	/* If all IP and L4 fields are wildcarded then this is an L2 flow */
-	if (is_wildcard(&l3_mask, sizeof(l3_mask)) &&
+	if (is_wildcard(l3_mask, sizeof(*l3_mask)) &&
 	    is_wildcard(&flow->l4_mask, sizeof(flow->l4_mask))) {
 		flow_flags |= CFA_FLOW_ALLOC_REQ_FLAGS_FLOWTYPE_L2;
 	} else {
-- 
1.8.3.1

^ permalink raw reply related
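
Editor's note: the pointer/size mix-up fixed by the patch above is easy to
reproduce in isolation. In this minimal sketch, struct l3_key and the two
wrapper functions are hypothetical, and is_wildcard() is a stand-in written
for the example that treats an all-zero mask as a wildcard; it is not copied
from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical L3 mask layout, for illustration only. */
struct l3_key {
	unsigned char src[4];
	unsigned char dst[4];
};

/* True when every byte in [mask, mask + len) is zero. */
static bool is_wildcard(const void *mask, size_t len)
{
	const unsigned char *p = mask;
	size_t i;

	assert(mask != NULL || len == 0);
	for (i = 0; i < len; i++)
		if (p[i] != 0)
			return false;
	return true;
}

/* Buggy: inspects the bytes of the pointer variable, not the mask. */
static bool l3_is_wildcard_buggy(const struct l3_key *l3_mask)
{
	return is_wildcard(&l3_mask, sizeof(l3_mask));
}

/* Fixed: inspects the pointed-to mask and uses its real size. */
static bool l3_is_wildcard_fixed(const struct l3_key *l3_mask)
{
	return is_wildcard(l3_mask, sizeof(*l3_mask));
}
```

Because a valid pointer is never all zero bytes, the buggy form reports
"not a wildcard" even for an all-zero mask.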

* [PATCH net 2/2] bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
From: Michael Chan @ 2018-01-04 23:46 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1515109615-22695-1-git-send-email-michael.chan@broadcom.com>

From: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>

In bnxt_vf_ndo_prep (which is called by bnxt_get_vf_config ndo), there is a
check for "Invalid VF id". Currently, the check is done against max_vfs.
However, the user doesn't always create max_vfs. So, the check should be
against the created number of VFs. The number of bnxt_vf_info structures
that are allocated in bnxt_alloc_vf_resources routine is the "number of
requested VFs". So, if an "invalid VF id" falls between the requested
number of VFs and the max_vfs, the driver will be dereferencing an invalid
pointer.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 5ee1866..c961767 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -70,7 +70,7 @@ static int bnxt_vf_ndo_prep(struct bnxt *bp, int vf_id)
 		netdev_err(bp->dev, "vf ndo called though sriov is disabled\n");
 		return -EINVAL;
 	}
-	if (vf_id >= bp->pf.max_vfs) {
+	if (vf_id >= bp->pf.active_vfs) {
 		netdev_err(bp->dev, "Invalid VF id %d\n", vf_id);
 		return -EINVAL;
 	}
-- 
1.8.3.1

^ permalink raw reply related
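
Editor's note: the bounds problem described in the patch above comes from
checking an index against a capacity limit rather than against the number of
entries actually allocated. The sketch below assumes hypothetical structs
pf_info and vf_info and hypothetical helpers; the negative-index check is an
addition for the example and is not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct vf_info {
	int vlan;
};

struct pf_info {
	int max_vfs;		/* hardware limit */
	int active_vfs;		/* entries actually allocated in vf[] */
	struct vf_info *vf;
};

/* Allocate exactly the requested number of VF entries. */
static int pf_alloc_vfs(struct pf_info *pf, int requested)
{
	assert(pf != NULL);
	if (requested <= 0 || requested > pf->max_vfs)
		return -EINVAL;
	pf->vf = calloc((size_t)requested, sizeof(*pf->vf));
	if (!pf->vf)
		return -ENOMEM;
	pf->active_vfs = requested;
	return 0;
}

/* Validate against active_vfs so vf[] is never indexed past its end. */
static int vf_get_config(const struct pf_info *pf, int vf_id, int *vlan)
{
	assert(pf != NULL && vlan != NULL);
	if (vf_id < 0 || vf_id >= pf->active_vfs)
		return -EINVAL;
	*vlan = pf->vf[vf_id].vlan;
	return 0;
}
```

With max_vfs of 8 and two VFs created, an id of 7 passes a check against
max_vfs but would index six entries past the end of the array.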

* [PATCH bpf-next v4 01/11] bpf: Make SOCK_OPS_GET_TCP size independent
From: Lawrence Brakmo @ 2018-01-04 23:55 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Neal Cardwell, Yuchung Cheng
In-Reply-To: <20180104235533.3672006-1-brakmo@fb.com>

Make the SOCK_OPS_GET_TCP helper macro size independent (previously it
only worked with 4-byte fields).

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 net/core/filter.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 130b842..099ff9fd 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4449,9 +4449,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 		break;
 
 /* Helper macro for adding read access to tcp_sock fields. */
-#define SOCK_OPS_GET_TCP32(FIELD_NAME)					      \
+#define SOCK_OPS_GET_TCP(FIELD_NAME)					      \
 	do {								      \
-		BUILD_BUG_ON(FIELD_SIZEOF(struct tcp_sock, FIELD_NAME) != 4); \
+		BUILD_BUG_ON(FIELD_SIZEOF(struct tcp_sock, FIELD_NAME) >      \
+			     FIELD_SIZEOF(struct bpf_sock_ops, FIELD_NAME));  \
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
 						struct bpf_sock_ops_kern,     \
 						is_fullsock),		      \
@@ -4463,16 +4464,18 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 						struct bpf_sock_ops_kern, sk),\
 				      si->dst_reg, si->src_reg,		      \
 				      offsetof(struct bpf_sock_ops_kern, sk));\
-		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,        \
+		*insn++ = BPF_LDX_MEM(FIELD_SIZEOF(struct tcp_sock,	      \
+						   FIELD_NAME), si->dst_reg,  \
+				      si->dst_reg,			      \
 				      offsetof(struct tcp_sock, FIELD_NAME)); \
 	} while (0)
 
 	case offsetof(struct bpf_sock_ops, snd_cwnd):
-		SOCK_OPS_GET_TCP32(snd_cwnd);
+		SOCK_OPS_GET_TCP(snd_cwnd);
 		break;
 
 	case offsetof(struct bpf_sock_ops, srtt_us):
-		SOCK_OPS_GET_TCP32(srtt_us);
+		SOCK_OPS_GET_TCP(srtt_us);
 		break;
 	}
 	return insn - insn_buf;
-- 
2.9.5

^ permalink raw reply related
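
Editor's note: the compile-time guard used by the patch above (source field
must not be wider than the destination field) can be sketched in plain C11.
The structs, the FIELD_SIZEOF() definition, GET_FIELD() and fill_ctx() below
are written for this example; the plain assignment stands in for the BPF load
the real macro emits, so this illustrates only the size check.

```c
#include <assert.h>
#include <stdint.h>

/* Size of a struct member without needing an instance. */
#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))

/* Hypothetical kernel-side and user-visible layouts. */
struct src_sock {
	uint32_t snd_cwnd;
	uint16_t mss;
	uint64_t bytes_acked;
};

struct user_ctx {
	uint32_t snd_cwnd;
	uint32_t mss;		/* wider than the source field: allowed */
	uint64_t bytes_acked;
};

/* Copy one field; refuse to compile if the source would not fit. */
#define GET_FIELD(ctx, sk, FIELD)					\
	do {								\
		static_assert(FIELD_SIZEOF(struct src_sock, FIELD) <=	\
			      FIELD_SIZEOF(struct user_ctx, FIELD),	\
			      "source field wider than destination");	\
		(ctx)->FIELD = (sk)->FIELD;				\
	} while (0)

static void fill_ctx(struct user_ctx *ctx, const struct src_sock *sk)
{
	GET_FIELD(ctx, sk, snd_cwnd);		/* 4 bytes into 4 */
	GET_FIELD(ctx, sk, mss);		/* 2 bytes into 4 */
	GET_FIELD(ctx, sk, bytes_acked);	/* 8 bytes into 8 */
}
```

A check of the form "size must equal 4" would reject the 2-byte and 8-byte
fields here, while the "must not be wider" form accepts all three.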
