Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 09/28] thunderbolt: Cache adapter specific capability offset into struct port
From: Lukas Wunner @ 2019-01-31  9:23 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: linux-kernel, Michael Jamet, Yehezkel Bernat, Andreas Noever,
	David S . Miller, Andy Shevchenko, netdev
In-Reply-To: <20190129150143.12681-10-mika.westerberg@linux.intel.com>

On Tue, Jan 29, 2019 at 06:01:24PM +0300, Mika Westerberg wrote:
> --- a/drivers/thunderbolt/tb.h
> +++ b/drivers/thunderbolt/tb.h
> @@ -124,6 +124,8 @@ struct tb_switch {
>   * @remote: Remote port (%NULL if not connected)
>   * @xdomain: Remote host (%NULL if not connected)
>   * @cap_phy: Offset, zero if not found
> + * @cap_adap: Offset of the adapter specific capability. Negative if not
> + *	      present.

Hm, could cap_adap be made zero in the non-presence case for consistency
with cap_phy?

Thanks,

Lukas

^ permalink raw reply

* Re: [PATCH bpf-next] bpf: add optional memory accounting for maps
From: Martynas @ 2019-01-31  9:27 UTC (permalink / raw)
  To: Y Song; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <CAH3MdRW1V=BwmLYgo55=-xPuxJy8NKcXZjRPVJnApQ84Gd8CFw@mail.gmail.com>

Thanks for testing. Sure, I will re-submit with the required changes.

On Thu, Jan 31, 2019, at 9:15 AM, Y Song wrote:
> On Wed, Jan 30, 2019 at 6:05 AM Martynas Pumputis <m@lambda.lt> wrote:
> >
> > Previously, memory allocated for a map was not accounted. Therefore,
> > this memory could not be taken into consideration by the cgroups
> > memory controller.
> >
> > This patch introduces the "BPF_F_ACCOUNT_MEM" flag which enables
> > the memory accounting for a map, and it can be set during
> > the map creation ("BPF_MAP_CREATE") in "map_flags".
> >
> > When enabled, we account only that amount of memory which is charged
> > against the "RLIMIT_MEMLOCK" limit.
> >
> > To validate the change, first we create the memory cgroup "test-map":
> >
> >     # mkdir /sys/fs/cgroup/memory/test-map
> >
> > And then we run the following program against the cgroup:
> >
> >     $ cat test_map.c
> >     <..>
> >     int main() {
> >         usleep(3 * 1000000);
> >         assert(bpf_create_map(BPF_MAP_TYPE_HASH, 8, 16, 65536, 0) > 0);
> >         usleep(3 * 1000000);
> >     }
> >     # cgexec -g memory:test-map ./test_map &
> >     # cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
> >     397312
> >     258048
> >
> >     <after 3 sec the map has been created>
> >
> >     # bpftool map list
> >     19: hash  flags 0x0
> >         key 8B  value 16B  max_entries 65536  memlock 5771264B
> >     # cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
> >     401408
> >     262144
> >
> > As we can see, the memory allocated for map is not accounted, as
> > 397312B + 5771264B > 401408B.
> >
> > Next, we enabled the accounting and re-run the test:
> >
> >     $ cat test_map.c
> >     <..>
> >     int main() {
> >         usleep(3 * 1000000);
> >         assert(bpf_create_map(BPF_MAP_TYPE_HASH, 8, 16, 65536, BPF_F_ACCOUNT_MEM) > 0);
> >         usleep(3 * 1000000);
> >     }
> >     # cgexec -g memory:test-map ./test_map &
> >     # cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
> >     450560
> >     307200
> >
> >     <after 3 sec the map has been created>
> >
> >     # bpftool map list
> >     20: hash  flags 0x80
> >         key 8B  value 16B  max_entries 65536  memlock 5771264B
> >     # cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
> >     6221824
> >     6078464
> >
> > This time, the memory (including kmem) is accounted, as
> > 450560B + 5771264B <= 6221824B
> 
> The above test result is for cgroup v1.
> I also tested the patch for cgroup v2. It works fine too.
> 
> >
> > Signed-off-by: Martynas Pumputis <m@lambda.lt>
> > ---
> >  include/linux/bpf.h            |  5 +++--
> >  include/uapi/linux/bpf.h       |  2 ++
> >  kernel/bpf/arraymap.c          | 14 +++++++++-----
> >  kernel/bpf/bpf_lru_list.c      | 11 +++++++++--
> >  kernel/bpf/bpf_lru_list.h      |  1 +
> >  kernel/bpf/cpumap.c            | 12 +++++++++---
> >  kernel/bpf/devmap.c            | 10 ++++++++--
> >  kernel/bpf/hashtab.c           | 19 ++++++++++++++-----
> >  kernel/bpf/lpm_trie.c          | 19 +++++++++++++------
> >  kernel/bpf/queue_stack_maps.c  |  5 +++--
> >  kernel/bpf/reuseport_array.c   |  3 ++-
> >  kernel/bpf/stackmap.c          | 12 ++++++++----
> >  kernel/bpf/syscall.c           | 12 ++++++++----
> >  kernel/bpf/xskmap.c            |  9 +++++++--
> >  net/core/sock_map.c            | 13 +++++++++----
> >  tools/include/uapi/linux/bpf.h |  3 +++
> >  16 files changed, 108 insertions(+), 42 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index e734f163bd0b..353a3f4304fe 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -79,7 +79,8 @@ struct bpf_map {
> >         u32 btf_value_type_id;
> >         struct btf *btf;
> >         bool unpriv_array;
> > -       /* 55 bytes hole */
> > +       bool account_mem;
> > +       /* 54 bytes hole */
> >
> >         /* The 3rd and 4th cacheline with misc members to avoid false sharing
> >          * particularly with refcounting.
> > @@ -506,7 +507,7 @@ void bpf_map_put(struct bpf_map *map);
> >  int bpf_map_precharge_memlock(u32 pages);
> >  int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
> >  void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages);
> > -void *bpf_map_area_alloc(size_t size, int numa_node);
> > +void *bpf_map_area_alloc(size_t size, int numa_node, bool account_mem);
> >  void bpf_map_area_free(void *base);
> >  void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 91c43884f295..a374ccbaa51b 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -278,6 +278,8 @@ enum bpf_attach_type {
> >  #define BPF_F_NO_COMMON_LRU    (1U << 1)
> >  /* Specify numa node during map creation */
> >  #define BPF_F_NUMA_NODE                (1U << 2)
> > +/* Enable memory accounting for map */
> > +#define BPF_F_ACCOUNT_MEM      (1U << 7)
> 
> Could you put the above definition after BPF_F_ZERO_SEED (1U << 6)?
> The same for tools.
> With this change, you can add my ack:
> Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply

* Re: [PATCH net-next 06/10] net: introduce a net_device_ops macsec helper
From: Antoine Tenart @ 2019-01-31  9:26 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Antoine Tenart, davem, sd, andrew, hkallweit1, netdev,
	linux-kernel, thomas.petazzoni, alexandre.belloni, quentin.schulz,
	allan.nielsen
In-Reply-To: <20190124092349.GE3662@kwain>

Hi,

On Thu, Jan 24, 2019 at 10:23:49AM +0100, Antoine Tenart wrote:
> On Wed, Jan 23, 2019 at 12:16:08PM -0800, Florian Fainelli wrote:
> > On 1/23/19 7:56 AM, Antoine Tenart wrote:
> > > @@ -1441,6 +1445,10 @@ struct net_device_ops {
> > >  						u32 flags);
> > >  	int			(*ndo_xsk_async_xmit)(struct net_device *dev,
> > >  						      u32 queue_id);
> > > +#ifdef CONFIG_MACSEC
> > > +	int			(*ndo_macsec)(struct net_device *dev,
> > > +					      struct netdev_macsec *macsec);
> > 
> > You would really want to define an API which is more oriented towards
> > configuring/deconfiguring a MACsec association here, e.g.: similar to
> > what the IPsec offload ndos offer.
> 
> This means mostly moving from a single function using a command field to
> multiple specialized functions to add/remove each element of MACsec
> configuration.
> 
> I don't have strong opinion on the single helper vs a structure
> containing pointers to specialized ones, but out of curiosity what's the
> benefit of such a move? Future additions and maintainability?
> 
> > It is not clear to me whether after your patch series we still need to
> > create a macsec virtual device, and that gets offloaded onto its real
> > device/PHY device, or if we don't need that all?
> 
> After this series, we will still need the virtual MACsec interface. When
> using hardware offloading this interface isn't doing much, but it's the
> interface used to configure all the MACsec connexions.
> 
> This is because, and that's specific to MACsec (vs IPsec), a software
> implementation is already supported and it's using a virtual interface
> to perform all the MACsec related operations (vs hooks in the Rx/Tx
> paths). I really wanted to avoid having two interfaces and ways of
> configuring MACsec depending on if the offloading is used.
> 
> This should also allow in the future to disable at run-time the
> offloading on a given interface, and to still have MACsec working in
> software (or the opposite, with extra work). For this to work, the
> virtual interface still has to provide an Rx and a Tx functions so that
> programs can bind onto the same interface, regardless of if the
> offloading is enabled.

Do you need extra information and explanations about this? I believe
this point is very important as the design choices were influenced a lot
by reusing the s/w implementation logic and API.

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH 09/28] thunderbolt: Cache adapter specific capability offset into struct port
From: Mika Westerberg @ 2019-01-31  9:37 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-kernel, Michael Jamet, Yehezkel Bernat, Andreas Noever,
	David S . Miller, Andy Shevchenko, netdev
In-Reply-To: <20190131092309.5jfyvo7hgq2ui3ok@wunner.de>

On Thu, Jan 31, 2019 at 10:23:09AM +0100, Lukas Wunner wrote:
> On Tue, Jan 29, 2019 at 06:01:24PM +0300, Mika Westerberg wrote:
> > --- a/drivers/thunderbolt/tb.h
> > +++ b/drivers/thunderbolt/tb.h
> > @@ -124,6 +124,8 @@ struct tb_switch {
> >   * @remote: Remote port (%NULL if not connected)
> >   * @xdomain: Remote host (%NULL if not connected)
> >   * @cap_phy: Offset, zero if not found
> > + * @cap_adap: Offset of the adapter specific capability. Negative if not
> > + *	      present.
> 
> Hm, could cap_adap be made zero in the non-presence case for consistency
> with cap_phy?

Sure.

^ permalink raw reply

* [PATCH] can: flexcan: fix timeout when set small bitrate
From: Joakim Zhang @ 2019-01-31  9:37 UTC (permalink / raw)
  To: mkl@pengutronix.de, linux-can@vger.kernel.org
  Cc: wg@grandegger.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, dl-linux-imx, Joakim Zhang

Current we can meet timeout issue when setting a small bitrate like 10000
as follows on i.MX6UL EVK board (ipg clock = 66MHZ, per clock = 30MHZ):
root@imx6ul7d:~# ip link set can0 up type can bitrate 10000
A link change request failed with some changes committed already.
Interface can0 may have been left with an inconsistent configuration, please check.
RTNETLINK answers: Connection timed out

It is caused by calling of flexcan_chip_unfreeze() timeout.

Originally the code is using usleep_range(10, 20) for unfreeze operation,
but the patch (8badd65 can: flexcan: avoid calling usleep_range from interrupt
context) changed it into udelay(10) which is only a half delay of before,
there're also some other delay changes.

After double to FLEXCAN_TIMEOUT_US to 100 can fix the issue.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
---
 drivers/net/can/flexcan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 2bca867bcfaa..1f2b4db7da88 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -166,7 +166,7 @@
 #define FLEXCAN_MB_CNT_LENGTH(x)	(((x) & 0xf) << 16)
 #define FLEXCAN_MB_CNT_TIMESTAMP(x)	((x) & 0xffff)

-#define FLEXCAN_TIMEOUT_US		(50)
+#define FLEXCAN_TIMEOUT_US		(100)

 /* FLEXCAN hardware feature flags
  *
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 bpf-next] bpf: add optional memory accounting for maps
From: Martynas Pumputis @ 2019-01-31  9:38 UTC (permalink / raw)
  To: netdev; +Cc: ys114321, ast, daniel, m, Yonghong Song

Previously, memory allocated for a map was not accounted. Therefore,
this memory could not be taken into consideration by the cgroups
memory controller.

This patch introduces the "BPF_F_ACCOUNT_MEM" flag which enables
the memory accounting for a map, and it can be set during
the map creation ("BPF_MAP_CREATE") in "map_flags".

When enabled, we account only that amount of memory which is charged
against the "RLIMIT_MEMLOCK" limit.

To validate the change, first we create the memory cgroup-v1 "test-map":

    # mkdir /sys/fs/cgroup/memory/test-map

And then we run the following program against the cgroup:

    $ cat test_map.c
    <..>
    int main() {
        usleep(3 * 1000000);
        assert(bpf_create_map(BPF_MAP_TYPE_HASH, 8, 16, 65536, 0) > 0);
        usleep(3 * 1000000);
    }
    # cgexec -g memory:test-map ./test_map &
    # cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
    397312
    258048

    <after 3 sec the map has been created>

    # bpftool map list
    19: hash  flags 0x0
        key 8B  value 16B  max_entries 65536  memlock 5771264B
    # cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
    401408
    262144

As we can see, the memory allocated for map is not accounted, as
397312B + 5771264B > 401408B.

Next, we enabled the accounting and re-run the test:

    $ cat test_map.c
    <..>
    int main() {
        usleep(3 * 1000000);
        assert(bpf_create_map(BPF_MAP_TYPE_HASH, 8, 16, 65536, BPF_F_ACCOUNT_MEM) > 0);
        usleep(3 * 1000000);
    }
    # cgexec -g memory:test-map ./test_map &
    # cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
    450560
    307200

    <after 3 sec the map has been created>

    # bpftool map list
    20: hash  flags 0x80
        key 8B  value 16B  max_entries 65536  memlock 5771264B
    # cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
    6221824
    6078464

This time, the memory (including kmem) is accounted, as
450560B + 5771264B <= 6221824B

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
---
 include/linux/bpf.h            |  5 +++--
 include/uapi/linux/bpf.h       |  3 +++
 kernel/bpf/arraymap.c          | 14 +++++++++-----
 kernel/bpf/bpf_lru_list.c      | 11 +++++++++--
 kernel/bpf/bpf_lru_list.h      |  1 +
 kernel/bpf/cpumap.c            | 12 +++++++++---
 kernel/bpf/devmap.c            | 10 ++++++++--
 kernel/bpf/hashtab.c           | 19 ++++++++++++++-----
 kernel/bpf/lpm_trie.c          | 19 +++++++++++++------
 kernel/bpf/queue_stack_maps.c  |  5 +++--
 kernel/bpf/reuseport_array.c   |  3 ++-
 kernel/bpf/stackmap.c          | 12 ++++++++----
 kernel/bpf/syscall.c           | 12 ++++++++----
 kernel/bpf/xskmap.c            |  9 +++++++--
 net/core/sock_map.c            | 13 +++++++++----
 tools/include/uapi/linux/bpf.h |  4 ++++
 16 files changed, 110 insertions(+), 42 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e734f163bd0b..353a3f4304fe 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -79,7 +79,8 @@ struct bpf_map {
 	u32 btf_value_type_id;
 	struct btf *btf;
 	bool unpriv_array;
-	/* 55 bytes hole */
+	bool account_mem;
+	/* 54 bytes hole */
 
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
@@ -506,7 +507,7 @@ void bpf_map_put(struct bpf_map *map);
 int bpf_map_precharge_memlock(u32 pages);
 int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
 void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages);
-void *bpf_map_area_alloc(size_t size, int numa_node);
+void *bpf_map_area_alloc(size_t size, int numa_node, bool account_mem);
 void bpf_map_area_free(void *base);
 void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 91c43884f295..cff34678caf6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -291,6 +291,9 @@ enum bpf_attach_type {
 /* Zero-initialize hash function seed. This should only be used for testing. */
 #define BPF_F_ZERO_SEED		(1U << 6)
 
+/* Enable memory accounting for map (only for BPF_MAP_CREATE) */
+#define BPF_F_ACCOUNT_MEM	(1U << 7)
+
 /* flags for BPF_PROG_QUERY */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 25632a75d630..86417f2e6f1b 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -34,14 +34,17 @@ static void bpf_array_free_percpu(struct bpf_array *array)
 	}
 }
 
-static int bpf_array_alloc_percpu(struct bpf_array *array)
+static int bpf_array_alloc_percpu(struct bpf_array *array, bool account_mem)
 {
 	void __percpu *ptr;
+	gfp_t gfp = GFP_USER | __GFP_NOWARN;
 	int i;
 
+	if (account_mem)
+		gfp |= __GFP_ACCOUNT;
+
 	for (i = 0; i < array->map.max_entries; i++) {
-		ptr = __alloc_percpu_gfp(array->elem_size, 8,
-					 GFP_USER | __GFP_NOWARN);
+		ptr = __alloc_percpu_gfp(array->elem_size, 8, gfp);
 		if (!ptr) {
 			bpf_array_free_percpu(array);
 			return -ENOMEM;
@@ -82,6 +85,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	u32 elem_size, index_mask, max_entries;
 	bool unpriv = !capable(CAP_SYS_ADMIN);
 	u64 cost, array_size, mask64;
+	bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
 	struct bpf_array *array;
 
 	elem_size = round_up(attr->value_size, 8);
@@ -129,7 +133,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(ret);
 
 	/* allocate all map elements and zero-initialize them */
-	array = bpf_map_area_alloc(array_size, numa_node);
+	array = bpf_map_area_alloc(array_size, numa_node, account_mem);
 	if (!array)
 		return ERR_PTR(-ENOMEM);
 	array->index_mask = index_mask;
@@ -140,7 +144,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	array->map.pages = cost;
 	array->elem_size = elem_size;
 
-	if (percpu && bpf_array_alloc_percpu(array)) {
+	if (percpu && bpf_array_alloc_percpu(array, account_mem)) {
 		bpf_map_area_free(array);
 		return ERR_PTR(-ENOMEM);
 	}
diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
index e6ef4401a138..4d58537e0af2 100644
--- a/kernel/bpf/bpf_lru_list.c
+++ b/kernel/bpf/bpf_lru_list.c
@@ -7,6 +7,7 @@
 #include <linux/cpumask.h>
 #include <linux/spinlock.h>
 #include <linux/percpu.h>
+#include <linux/gfp.h>
 
 #include "bpf_lru_list.h"
 
@@ -646,12 +647,17 @@ static void bpf_lru_list_init(struct bpf_lru_list *l)
 }
 
 int bpf_lru_init(struct bpf_lru *lru, bool percpu, u32 hash_offset,
+		 bool account_mem,
 		 del_from_htab_func del_from_htab, void *del_arg)
 {
+	gfp_t gfp = GFP_KERNEL;
 	int cpu;
 
+	if (account_mem)
+		gfp |= __GFP_ACCOUNT;
+
 	if (percpu) {
-		lru->percpu_lru = alloc_percpu(struct bpf_lru_list);
+		lru->percpu_lru = alloc_percpu_gfp(struct bpf_lru_list, gfp);
 		if (!lru->percpu_lru)
 			return -ENOMEM;
 
@@ -665,7 +671,8 @@ int bpf_lru_init(struct bpf_lru *lru, bool percpu, u32 hash_offset,
 	} else {
 		struct bpf_common_lru *clru = &lru->common_lru;
 
-		clru->local_list = alloc_percpu(struct bpf_lru_locallist);
+		clru->local_list = alloc_percpu_gfp(struct bpf_lru_locallist,
+						    gfp);
 		if (!clru->local_list)
 			return -ENOMEM;
 
diff --git a/kernel/bpf/bpf_lru_list.h b/kernel/bpf/bpf_lru_list.h
index 7d4f89b7cb84..89566665592b 100644
--- a/kernel/bpf/bpf_lru_list.h
+++ b/kernel/bpf/bpf_lru_list.h
@@ -74,6 +74,7 @@ static inline void bpf_lru_node_set_ref(struct bpf_lru_node *node)
 }
 
 int bpf_lru_init(struct bpf_lru *lru, bool percpu, u32 hash_offset,
+		 bool account_mem,
 		 del_from_htab_func del_from_htab, void *delete_arg);
 void bpf_lru_populate(struct bpf_lru *lru, void *buf, u32 node_offset,
 		      u32 elem_size, u32 nr_elems);
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 8974b3755670..1e84bf78716e 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -81,6 +81,8 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	struct bpf_cpu_map *cmap;
 	int err = -ENOMEM;
 	u64 cost;
+	bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
+	gfp_t gfp = GFP_KERNEL;
 	int ret;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -117,16 +119,20 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 		goto free_cmap;
 	}
 
+	if (account_mem)
+		gfp |= __GFP_ACCOUNT;
 	/* A per cpu bitfield with a bit per possible CPU in map  */
-	cmap->flush_needed = __alloc_percpu(cpu_map_bitmap_size(attr),
-					    __alignof__(unsigned long));
+	cmap->flush_needed = __alloc_percpu_gfp(cpu_map_bitmap_size(attr),
+					    __alignof__(unsigned long),
+					    gfp);
 	if (!cmap->flush_needed)
 		goto free_cmap;
 
 	/* Alloc array for possible remote "destination" CPUs */
 	cmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *
 					   sizeof(struct bpf_cpu_map_entry *),
-					   cmap->map.numa_node);
+					   cmap->map.numa_node,
+					   account_mem);
 	if (!cmap->cpu_map)
 		goto free_percpu;
 
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 191b79948424..acfc1b35aa51 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -90,6 +90,8 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	struct bpf_dtab *dtab;
 	int err = -EINVAL;
 	u64 cost;
+	bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
+	gfp_t gfp;
 
 	if (!capable(CAP_NET_ADMIN))
 		return ERR_PTR(-EPERM);
@@ -120,16 +122,20 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 
 	err = -ENOMEM;
 
+	gfp = GFP_KERNEL | __GFP_NOWARN;
+	if (account_mem)
+		gfp |= __GFP_ACCOUNT;
 	/* A per cpu bitfield with a bit per possible net device */
 	dtab->flush_needed = __alloc_percpu_gfp(dev_map_bitmap_size(attr),
 						__alignof__(unsigned long),
-						GFP_KERNEL | __GFP_NOWARN);
+						gfp);
 	if (!dtab->flush_needed)
 		goto free_dtab;
 
 	dtab->netdev_map = bpf_map_area_alloc(dtab->map.max_entries *
 					      sizeof(struct bpf_dtab_netdev *),
-					      dtab->map.numa_node);
+					      dtab->map.numa_node,
+					      account_mem);
 	if (!dtab->netdev_map)
 		goto free_dtab;
 
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 4b7c76765d9d..fc2f44451256 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -23,6 +23,7 @@
 
 #define HTAB_CREATE_FLAG_MASK						\
 	(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE |	\
+	 BPF_F_ACCOUNT_MEM |						\
 	 BPF_F_RDONLY | BPF_F_WRONLY | BPF_F_ZERO_SEED)
 
 struct bucket {
@@ -139,27 +140,32 @@ static struct htab_elem *prealloc_lru_pop(struct bpf_htab *htab, void *key,
 	return NULL;
 }
 
-static int prealloc_init(struct bpf_htab *htab)
+static int prealloc_init(struct bpf_htab *htab, bool account_mem)
 {
 	u32 num_entries = htab->map.max_entries;
+	gfp_t gfp = GFP_USER | __GFP_NOWARN;
 	int err = -ENOMEM, i;
 
 	if (!htab_is_percpu(htab) && !htab_is_lru(htab))
 		num_entries += num_possible_cpus();
 
 	htab->elems = bpf_map_area_alloc(htab->elem_size * num_entries,
-					 htab->map.numa_node);
+					 htab->map.numa_node,
+					 account_mem);
 	if (!htab->elems)
 		return -ENOMEM;
 
 	if (!htab_is_percpu(htab))
 		goto skip_percpu_elems;
 
+	if (account_mem)
+		gfp |= __GFP_ACCOUNT;
+
 	for (i = 0; i < num_entries; i++) {
 		u32 size = round_up(htab->map.value_size, 8);
 		void __percpu *pptr;
 
-		pptr = __alloc_percpu_gfp(size, 8, GFP_USER | __GFP_NOWARN);
+		pptr = __alloc_percpu_gfp(size, 8, gfp);
 		if (!pptr)
 			goto free_elems;
 		htab_elem_set_ptr(get_htab_elem(htab, i), htab->map.key_size,
@@ -173,6 +179,7 @@ static int prealloc_init(struct bpf_htab *htab)
 				   htab->map.map_flags & BPF_F_NO_COMMON_LRU,
 				   offsetof(struct htab_elem, hash) -
 				   offsetof(struct htab_elem, lru_node),
+				   account_mem,
 				   htab_lru_map_delete_node,
 				   htab);
 	else
@@ -313,6 +320,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	 */
 	bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU);
 	bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC);
+	bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
 	struct bpf_htab *htab;
 	int err, i;
 	u64 cost;
@@ -374,7 +382,8 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	err = -ENOMEM;
 	htab->buckets = bpf_map_area_alloc(htab->n_buckets *
 					   sizeof(struct bucket),
-					   htab->map.numa_node);
+					   htab->map.numa_node,
+					   account_mem);
 	if (!htab->buckets)
 		goto free_htab;
 
@@ -389,7 +398,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	}
 
 	if (prealloc) {
-		err = prealloc_init(htab);
+		err = prealloc_init(htab, account_mem);
 		if (err)
 			goto free_buckets;
 
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index abf1002080df..8421fdb816f3 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -277,16 +277,19 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
 }
 
 static struct lpm_trie_node *lpm_trie_node_alloc(const struct lpm_trie *trie,
-						 const void *value)
+						 const void *value,
+						 bool account_mem)
 {
 	struct lpm_trie_node *node;
 	size_t size = sizeof(struct lpm_trie_node) + trie->data_size;
+	gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
 
 	if (value)
 		size += trie->map.value_size;
 
-	node = kmalloc_node(size, GFP_ATOMIC | __GFP_NOWARN,
-			    trie->map.numa_node);
+	if (account_mem)
+		gfp |= __GFP_ACCOUNT;
+	node = kmalloc_node(size, gfp, trie->map.numa_node);
 	if (!node)
 		return NULL;
 
@@ -327,7 +330,7 @@ static int trie_update_elem(struct bpf_map *map,
 		goto out;
 	}
 
-	new_node = lpm_trie_node_alloc(trie, value);
+	new_node = lpm_trie_node_alloc(trie, value, map->account_mem);
 	if (!new_node) {
 		ret = -ENOMEM;
 		goto out;
@@ -394,7 +397,7 @@ static int trie_update_elem(struct bpf_map *map,
 		goto out;
 	}
 
-	im_node = lpm_trie_node_alloc(trie, NULL);
+	im_node = lpm_trie_node_alloc(trie, NULL, map->account_mem);
 	if (!im_node) {
 		ret = -ENOMEM;
 		goto out;
@@ -542,6 +545,8 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
 static struct bpf_map *trie_alloc(union bpf_attr *attr)
 {
 	struct lpm_trie *trie;
+	bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
+	gfp_t gfp = GFP_USER | __GFP_NOWARN;
 	u64 cost = sizeof(*trie), cost_per_node;
 	int ret;
 
@@ -558,7 +563,9 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 	    attr->value_size > LPM_VAL_SIZE_MAX)
 		return ERR_PTR(-EINVAL);
 
-	trie = kzalloc(sizeof(*trie), GFP_USER | __GFP_NOWARN);
+	if (account_mem)
+		gfp |= __GFP_ACCOUNT;
+	trie = kzalloc(sizeof(*trie), gfp);
 	if (!trie)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index b384ea9f3254..040ec350af3d 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -11,7 +11,7 @@
 #include "percpu_freelist.h"
 
 #define QUEUE_STACK_CREATE_FLAG_MASK \
-	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY | BPF_F_ACCOUNT_MEM)
 
 
 struct bpf_queue_stack {
@@ -69,6 +69,7 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 	int ret, numa_node = bpf_map_attr_numa_node(attr);
 	struct bpf_queue_stack *qs;
 	u64 size, queue_size, cost;
+	bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
 
 	size = (u64) attr->max_entries + 1;
 	cost = queue_size = sizeof(*qs) + size * attr->value_size;
@@ -81,7 +82,7 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 	if (ret < 0)
 		return ERR_PTR(ret);
 
-	qs = bpf_map_area_alloc(queue_size, numa_node);
+	qs = bpf_map_area_alloc(queue_size, numa_node, account_mem);
 	if (!qs)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 18e225de80ff..a9a2709c7507 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -152,6 +152,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 	int err, numa_node = bpf_map_attr_numa_node(attr);
 	struct reuseport_array *array;
 	u64 cost, array_size;
+	bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
 
 	if (!capable(CAP_SYS_ADMIN))
 		return ERR_PTR(-EPERM);
@@ -170,7 +171,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 		return ERR_PTR(err);
 
 	/* allocate all map elements and zero-initialize them */
-	array = bpf_map_area_alloc(array_size, numa_node);
+	array = bpf_map_area_alloc(array_size, numa_node, account_mem);
 	if (!array)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index d43b14535827..46d37c7e09a2 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -61,13 +61,15 @@ static inline int stack_map_data_size(struct bpf_map *map)
 		sizeof(struct bpf_stack_build_id) : sizeof(u64);
 }
 
-static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
+static int prealloc_elems_and_freelist(struct bpf_stack_map *smap,
+				       bool account_mem)
 {
 	u32 elem_size = sizeof(struct stack_map_bucket) + smap->map.value_size;
 	int err;
 
 	smap->elems = bpf_map_area_alloc(elem_size * smap->map.max_entries,
-					 smap->map.numa_node);
+					 smap->map.numa_node,
+					 account_mem);
 	if (!smap->elems)
 		return -ENOMEM;
 
@@ -90,6 +92,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 	u32 value_size = attr->value_size;
 	struct bpf_stack_map *smap;
 	u64 cost, n_buckets;
+	bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
 	int err;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -119,7 +122,8 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 	if (cost >= U32_MAX - PAGE_SIZE)
 		return ERR_PTR(-E2BIG);
 
-	smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr));
+	smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr),
+				  account_mem);
 	if (!smap)
 		return ERR_PTR(-ENOMEM);
 
@@ -141,7 +145,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 	if (err)
 		goto free_smap;
 
-	err = prealloc_elems_and_freelist(smap);
+	err = prealloc_elems_and_freelist(smap, account_mem);
 	if (err)
 		goto put_buffers;
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b155cd17c1bd..13f2e1731a47 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -131,25 +131,29 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
 		return map;
 	map->ops = ops;
 	map->map_type = type;
+	map->account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
 	return map;
 }
 
-void *bpf_map_area_alloc(size_t size, int numa_node)
+void *bpf_map_area_alloc(size_t size, int numa_node, bool account_mem)
 {
 	/* We definitely need __GFP_NORETRY, so OOM killer doesn't
 	 * trigger under memory pressure as we really just want to
 	 * fail instead.
 	 */
-	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
+	gfp_t gfp = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
 	void *area;
 
+	if (account_mem)
+		gfp |= __GFP_ACCOUNT;
+
 	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		area = kmalloc_node(size, GFP_USER | flags, numa_node);
+		area = kmalloc_node(size, GFP_USER | gfp, numa_node);
 		if (area != NULL)
 			return area;
 	}
 
-	return __vmalloc_node_flags_caller(size, numa_node, GFP_KERNEL | flags,
+	return __vmalloc_node_flags_caller(size, numa_node, GFP_KERNEL | gfp,
 					   __builtin_return_address(0));
 }
 
diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
index 686d244e798d..bbc1f142326f 100644
--- a/kernel/bpf/xskmap.c
+++ b/kernel/bpf/xskmap.c
@@ -20,6 +20,8 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 	int cpu, err = -EINVAL;
 	struct xsk_map *m;
 	u64 cost;
+	bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
+	gfp_t gfp = GFP_KERNEL;
 
 	if (!capable(CAP_NET_ADMIN))
 		return ERR_PTR(-EPERM);
@@ -49,7 +51,9 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 
 	err = -ENOMEM;
 
-	m->flush_list = alloc_percpu(struct list_head);
+	if (account_mem)
+		gfp |= __GFP_ACCOUNT;
+	m->flush_list = alloc_percpu_gfp(struct list_head, gfp);
 	if (!m->flush_list)
 		goto free_m;
 
@@ -58,7 +62,8 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 
 	m->xsk_map = bpf_map_area_alloc(m->map.max_entries *
 					sizeof(struct xdp_sock *),
-					m->map.numa_node);
+					m->map.numa_node,
+					account_mem);
 	if (!m->xsk_map)
 		goto free_percpu;
 	return &m->map;
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index be6092ac69f8..eefcfd1294c0 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -18,13 +18,15 @@ struct bpf_stab {
 	raw_spinlock_t lock;
 };
 
-#define SOCK_CREATE_FLAG_MASK				\
-	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+#define SOCK_CREATE_FLAG_MASK					\
+	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY |	\
+	 BPF_F_ACCOUNT_MEM)
 
 static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 {
 	struct bpf_stab *stab;
 	u64 cost;
+	bool account_mem = (attr->map_flags & BPF_F_ACCOUNT_MEM);
 	int err;
 
 	if (!capable(CAP_NET_ADMIN))
@@ -56,7 +58,8 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 
 	stab->sks = bpf_map_area_alloc(stab->map.max_entries *
 				       sizeof(struct sock *),
-				       stab->map.numa_node);
+				       stab->map.numa_node,
+				       account_mem);
 	if (stab->sks)
 		return &stab->map;
 	err = -ENOMEM;
@@ -788,6 +791,7 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 	struct bpf_htab *htab;
 	int i, err;
 	u64 cost;
+	bool account = (attr->map_flags & BPF_F_ACCOUNT_MEM);
 
 	if (!capable(CAP_NET_ADMIN))
 		return ERR_PTR(-EPERM);
@@ -823,7 +827,8 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 
 	htab->buckets = bpf_map_area_alloc(htab->buckets_num *
 					   sizeof(struct bpf_htab_bucket),
-					   htab->map.numa_node);
+					   htab->map.numa_node,
+					   account);
 	if (!htab->buckets) {
 		err = -ENOMEM;
 		goto free_htab;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 91c43884f295..a2b6cb200c9b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -279,6 +279,7 @@ enum bpf_attach_type {
 /* Specify numa node during map creation */
 #define BPF_F_NUMA_NODE		(1U << 2)
 
+
 #define BPF_OBJ_NAME_LEN 16U
 
 /* Flags for accessing BPF object */
@@ -291,6 +292,9 @@ enum bpf_attach_type {
 /* Zero-initialize hash function seed. This should only be used for testing. */
 #define BPF_F_ZERO_SEED		(1U << 6)
 
+/* Enable memory accounting for map (only for BPF_MAP_CREATE) */
+#define BPF_F_ACCOUNT_MEM	(1U << 7)
+
 /* flags for BPF_PROG_QUERY */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH] include/linux/bpf.h - fix missing prototype warnings...
From: Daniel Borkmann @ 2019-01-31  9:39 UTC (permalink / raw)
  To: valdis.kletnieks, Alexei Starovoitov; +Cc: netdev, linux-kernel
In-Reply-To: <6349.1548741865@turing-police.cc.vt.edu>

On 01/29/2019 07:04 AM, valdis.kletnieks@vt.edu wrote:
> Compiling with W=1 generates warnings:
> 
>   CC      kernel/bpf/core.o
> kernel/bpf/core.c:721:12: warning: no previous prototype for ?bpf_jit_alloc_exec_limit? [-Wmissing-prototypes]
>   721 | u64 __weak bpf_jit_alloc_exec_limit(void)
>       |            ^~~~~~~~~~~~~~~~~~~~~~~~
> kernel/bpf/core.c:757:14: warning: no previous prototype for ?bpf_jit_alloc_exec? [-Wmissing-prototypes]
>   757 | void *__weak bpf_jit_alloc_exec(unsigned long size)
>       |              ^~~~~~~~~~~~~~~~~~
> kernel/bpf/core.c:762:13: warning: no previous prototype for ?bpf_jit_free_exec? [-Wmissing-prototypes]
>   762 | void __weak bpf_jit_free_exec(void *addr)
>       |             ^~~~~~~~~~~~~~~~~
> 
> All three are weak functions that archs can override, although none do so
> currently.  Provide prototypes for when a new arch provides its own.
> 
> Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>

Applied all 3 to bpf-next, thanks Valdis!

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 3851529062ec..99e55313123f 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -472,6 +472,10 @@ _out:							\
>  #define BPF_PROG_RUN_ARRAY_CHECK(array, ctx, func)	\
>  	__BPF_PROG_RUN_ARRAY(array, ctx, func, true)
>  
> +u64 __weak bpf_jit_alloc_exec_limit(void);
> +void *__weak bpf_jit_alloc_exec(unsigned long size);
> +void __weak bpf_jit_free_exec(void *addr);
> +

(I moved these to include/linux/filter.h while applying where we have all the
 other prototypes from JIT core already.)

>  #ifdef CONFIG_BPF_SYSCALL
>  DECLARE_PER_CPU(int, bpf_prog_active);
>  
> 


^ permalink raw reply

* Re: [PATCH bpf-next v3 0/5] skip verifier/map tests if kernel support is missing
From: Daniel Borkmann @ 2019-01-31  9:40 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev; +Cc: davem, ast
In-Reply-To: <20190128172119.12105-1-sdf@google.com>

On 01/28/2019 06:21 PM, Stanislav Fomichev wrote:
> If test_maps/test_verifier is running against the kernel which doesn't
> have _all_ BPF features enabled, it fails with an error. This patch
> series tries to probe kernel support for each failed test and skip
> it instead. This lets users run BPF selftests in the not-all-bpf-yes
> environments and received correct PASS/NON-PASS result.
> 
> See https://www.spinics.net/lists/netdev/msg539331.html for more
> context.
> 
> The series goes like this:
> 
> * patch #1 skips sockmap tests in test_maps.c if BPF_MAP_TYPE_SOCKMAP
>   map is not supported (if bpf_create_map fails, we probe the kernel
>   for support)
> * patch #2 skips verifier tests if test->prog_type is not supported (if
>   bpf_verify_program fails, we probe the kernel for support)
> * patch #3 skips verifier tests if test fixup map is not supported (if
>   create_map fails, we probe the kernel for support)
> * next patches fix various small issues that arise from the first four:
>   * patch #4 sets "unknown func bpf_trace_printk#6" prog_type to
>     BPF_PROG_TYPE_TRACEPOINT so it is correctly skipped in
>     CONFIG_BPF_EVENTS=n case
>   * patch #5 exposes BPF_PROG_TYPE_CGROUP_{SKB,SOCK,SOCK_ADDR} only when
>     CONFIG_CGROUP_BPF=y, this makes verifier correctly skip appropriate
>     tests
> 
> v3 changes:
> * rebased on top of Quentin's series which adds probes to libbpf
> 
> v2 changes:
> * don't sprinkle "ifdef CONFIG_CGROUP_BPF" all around net/core/filter.c,
>   doing it only in the bpf_types.h is enough to disable
>   BPF_PROG_TYPE_CGROUP_{SKB,SOCK,SOCK_ADDR} prog types for non-cgroup
>   enabled kernels
> 
> Stanislav Fomichev (5):
>   selftests/bpf: skip sockmap in test_maps if kernel doesn't have
>     support
>   selftests/bpf: skip verifier tests for unsupported program types
>   selftests/bpf: skip verifier tests for unsupported map types
>   selftests/bpf: mark verifier test that uses bpf_trace_printk as
>     BPF_PROG_TYPE_TRACEPOINT
>   bpf: BPF_PROG_TYPE_CGROUP_{SKB,SOCK,SOCK_ADDR} require cgroups enabled
> 
>  include/linux/bpf_types.h                     |  2 +
>  tools/testing/selftests/bpf/test_maps.c       | 13 +++++-
>  tools/testing/selftests/bpf/test_verifier.c   | 45 +++++++++++++++++--
>  tools/testing/selftests/bpf/verifier/unpriv.c |  1 +
>  4 files changed, 56 insertions(+), 5 deletions(-)
> 

Applied, thanks!

^ permalink raw reply

* Re: [PATCH -next] mISDN: hfcsusb: Fix potential NULL pointer dereference
From: YueHaibing @ 2019-01-31  9:41 UTC (permalink / raw)
  To: David Miller; +Cc: isdn, gustavo, bigeasy, linux-kernel, netdev
In-Reply-To: <20190130.101013.1464226067584735951.davem@davemloft.net>

On 2019/1/31 2:10, David Miller wrote:
> From: YueHaibing <yuehaibing@huawei.com>
> Date: Wed, 30 Jan 2019 18:19:02 +0800
> 
>> There is a potential NULL pointer dereference in case
>> kzalloc() fails and returns NULL.
>>
>> Fixes: 69f52adb2d53 ("mISDN: Add HFC USB driver")
>> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
>> ---
>>  drivers/isdn/hardware/mISDN/hfcsusb.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c b/drivers/isdn/hardware/mISDN/hfcsusb.c
>> index 124ff53..5660d5a 100644
>> --- a/drivers/isdn/hardware/mISDN/hfcsusb.c
>> +++ b/drivers/isdn/hardware/mISDN/hfcsusb.c
>> @@ -263,6 +263,8 @@ hfcsusb_ph_info(struct hfcsusb *hw)
>>  	int i;
>>  
>>  	phi = kzalloc(struct_size(phi, bch, dch->dev.nrbchan), GFP_ATOMIC);
>> +	if (!phi)
>> +		return;
> 
> If we fail with an error and do not perform the operation we were requested to
> make, we must return an error to the caller, and the caller must do something
> reasonable with that error (perhaps return it to it's caller) and so on and
> so forth.


hfcsusb_ph_info alloced the 'phi'，then use it _alloc_mISDN_skb in _queue_data.
while _alloc_mISDN_skb fails, it also just return without err handling,then kfree(phi).
It seems that all the caller of hfcsusb_ph_info doesn't care the return value.

> 
> .
> 


^ permalink raw reply

* [PATCH net-next] macvlan: use netif_is_macvlan_port()
From: Julian Wiedmann @ 2019-01-31  9:48 UTC (permalink / raw)
  To: David Miller; +Cc: Julian Wiedmann, netdev

Replace the macvlan_port_exists() macro with its twin from netdevice.h

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
---
 drivers/net/macvlan.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index cf22a79af66b..26e53832b095 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -119,8 +119,6 @@ static struct macvlan_port *macvlan_port_get_rtnl(const struct net_device *dev)
 	return rtnl_dereference(dev->rx_handler_data);
 }
 
-#define macvlan_port_exists(dev) (dev->priv_flags & IFF_MACVLAN_PORT)
-
 static struct macvlan_dev *macvlan_hash_lookup(const struct macvlan_port *port,
 					       const unsigned char *addr)
 {
@@ -1378,7 +1376,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 	if (!tb[IFLA_ADDRESS])
 		eth_hw_addr_random(dev);
 
-	if (!macvlan_port_exists(lowerdev)) {
+	if (!netif_is_macvlan_port(lowerdev)) {
 		err = macvlan_port_create(lowerdev);
 		if (err < 0)
 			return err;
@@ -1638,7 +1636,7 @@ static int macvlan_device_event(struct notifier_block *unused,
 	struct macvlan_port *port;
 	LIST_HEAD(list_kill);
 
-	if (!macvlan_port_exists(dev))
+	if (!netif_is_macvlan_port(dev))
 		return NOTIFY_DONE;
 
 	port = macvlan_port_get_rtnl(dev);
-- 
2.16.4


^ permalink raw reply related

* Re: [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file
From: Daniel Borkmann @ 2019-01-31 10:04 UTC (permalink / raw)
  To: Magnus Karlsson, bjorn.topel, ast, netdev, jakub.kicinski,
	bjorn.topel, qi.z.zhang
  Cc: brouer
In-Reply-To: <1548774737-16579-2-git-send-email-magnus.karlsson@intel.com>

On 01/29/2019 04:12 PM, Magnus Karlsson wrote:
> Move the pr_*() functions in libbpf.c to a common header file called
> libbpf_internal.h. This so that the later libbpf AF_XDP helper library
> code in xsk.c can use these printing functions too.
> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>  tools/lib/bpf/libbpf.c          | 30 +-----------------------------
>  tools/lib/bpf/libbpf_internal.h | 41 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 29 deletions(-)
>  create mode 100644 tools/lib/bpf/libbpf_internal.h
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 2ccde17..1d7fe26 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -39,6 +39,7 @@
>  #include <gelf.h>
>  
>  #include "libbpf.h"
> +#include "libbpf_internal.h"
>  #include "bpf.h"
>  #include "btf.h"
>  #include "str_error.h"
> @@ -51,34 +52,6 @@
>  #define BPF_FS_MAGIC		0xcafe4a11
>  #endif
>  
> -#define __printf(a, b)	__attribute__((format(printf, a, b)))
> -
> -__printf(1, 2)
> -static int __base_pr(const char *format, ...)
> -{
> -	va_list args;
> -	int err;
> -
> -	va_start(args, format);
> -	err = vfprintf(stderr, format, args);
> -	va_end(args);
> -	return err;
> -}
> -
> -static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
> -static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
> -static __printf(1, 2) libbpf_print_fn_t __pr_debug;
> -
> -#define __pr(func, fmt, ...)	\
> -do {				\
> -	if ((func))		\
> -		(func)("libbpf: " fmt, ##__VA_ARGS__); \
> -} while (0)
> -
> -#define pr_warning(fmt, ...)	__pr(__pr_warning, fmt, ##__VA_ARGS__)
> -#define pr_info(fmt, ...)	__pr(__pr_info, fmt, ##__VA_ARGS__)
> -#define pr_debug(fmt, ...)	__pr(__pr_debug, fmt, ##__VA_ARGS__)
> -
>  void libbpf_set_print(libbpf_print_fn_t warn,
>  		      libbpf_print_fn_t info,
>  		      libbpf_print_fn_t debug)
> @@ -96,7 +69,6 @@ void libbpf_set_print(libbpf_print_fn_t warn,
>  		goto out;		\
>  } while(0)
>  
> -
>  /* Copied from tools/perf/util/util.h */
>  #ifndef zfree
>  # define zfree(ptr) ({ free(*ptr); *ptr = NULL; })
> diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
> new file mode 100644
> index 0000000..951c078
> --- /dev/null
> +++ b/tools/lib/bpf/libbpf_internal.h

Just really minor nit: lets name the header utils.h or such. It would also make
sense to move the zfree() and zclose() to this and make use of it.

> @@ -0,0 +1,41 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +
> +/*
> + * Common internal libbpf functions and definitions.
> + *
> + * Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
> + * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
> + * Copyright (C) 2015 Huawei Inc.
> + */
> +#ifndef __LIBBPF_LIBBPF_INTERNAL_H
> +#define __LIBBPF_LIBBPF_INTERNAL_H
> +
> +#define __printf(a, b)	__attribute__((format(printf, a, b)))
> +
> +__printf(1, 2)
> +static int __base_pr(const char *format, ...)
> +{
> +	va_list args;
> +	int err;
> +
> +	va_start(args, format);
> +	err = vfprintf(stderr, format, args);
> +	va_end(args);
> +	return err;
> +}
> +
> +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
> +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
> +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_debug;
> +
> +#define __pr(func, fmt, ...)	\
> +do {				\
> +	if ((func))		\
> +		(func)("libbpf: " fmt, ##__VA_ARGS__); \
> +} while (0)
> +
> +#define pr_warning(fmt, ...)	__pr(__pr_warning, fmt, ##__VA_ARGS__)
> +#define pr_info(fmt, ...)	__pr(__pr_info, fmt, ##__VA_ARGS__)
> +#define pr_debug(fmt, ...)	__pr(__pr_debug, fmt, ##__VA_ARGS__)
> +
> +#endif /* __LIBBPF_LIBBPF_INTERNAL_H */
> 


^ permalink raw reply

* Re: [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets
From: Daniel Borkmann @ 2019-01-31 10:05 UTC (permalink / raw)
  To: Magnus Karlsson, bjorn.topel, ast, netdev, jakub.kicinski,
	bjorn.topel, qi.z.zhang
  Cc: brouer
In-Reply-To: <1548774737-16579-3-git-send-email-magnus.karlsson@intel.com>

On 01/29/2019 04:12 PM, Magnus Karlsson wrote:
> This commit adds AF_XDP support to libbpf. The main reason for this is
> to facilitate writing applications that use AF_XDP by offering
> higher-level APIs that hide many of the details of the AF_XDP
> uapi. This is in the same vein as libbpf facilitates XDP adoption by
> offering easy-to-use higher level interfaces of XDP
> functionality. Hopefully this will facilitate adoption of AF_XDP, make
> applications using it simpler and smaller, and finally also make it
> possible for applications to benefit from optimizations in the AF_XDP
> user space access code. Previously, people just copied and pasted the
> code from the sample application into their application, which is not
> desirable.
> 
> The interface is composed of two parts:
> 
> * Low-level access interface to the four rings and the packet
> * High-level control plane interface for creating and setting
>   up umems and af_xdp sockets as well as a simple XDP program.
> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
[...]
>  libbpf ABI
>  ==========
>  
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 266bc95..d294843 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -121,6 +121,18 @@ LIBBPF_0.0.1 {
>  		libbpf_prog_type_by_name;
>  		libbpf_set_print;
>  		libbpf_strerror;
> +		xsk_ring_cons__peek;
> +		xsk_ring_cons__release;
> +		xsk_ring_prod__reserve;
> +		xsk_ring_prod__submit;
> +		xsk_umem__get_data;
> +		xsk_umem__get_data_raw;
> +		xsk_umem__create;
> +		xsk_socket__create;
> +		xsk_umem__delete;
> +		xsk_socket__delete;
> +		xsk_umem__fd;
> +		xsk_socket__fd;

This definitely needs to be placed under LIBBPF_0.0.2 where we have bpf_probe_*
that were added in this cycle.

>  	local:
>  		*;
>  };
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c

^ permalink raw reply

* RE: [PATCH] can: flexcan: fix timeout when set small bitrate
From: Aisheng Dong @ 2019-01-31 10:09 UTC (permalink / raw)
  To: Joakim Zhang, mkl@pengutronix.de, linux-can@vger.kernel.org
  Cc: wg@grandegger.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, dl-linux-imx
In-Reply-To: <20190131093509.12613-1-qiangqing.zhang@nxp.com>

> -----Original Message-----
> From: Joakim Zhang
> Sent: Thursday, January 31, 2019 5:37 PM
> To: mkl@pengutronix.de; linux-can@vger.kernel.org
> Cc: wg@grandegger.com; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; dl-linux-imx <linux-imx@nxp.com>; Joakim
> Zhang <qiangqing.zhang@nxp.com>
> Subject: [PATCH] can: flexcan: fix timeout when set small bitrate
> 
> Current we can meet timeout issue when setting a small bitrate like 10000 as
> follows on i.MX6UL EVK board (ipg clock = 66MHZ, per clock = 30MHZ):
> root@imx6ul7d:~# ip link set can0 up type can bitrate 10000 A link change
> request failed with some changes committed already.
> Interface can0 may have been left with an inconsistent configuration, please
> check.
> RTNETLINK answers: Connection timed out
> 
> It is caused by calling of flexcan_chip_unfreeze() timeout.
> 
> Originally the code is using usleep_range(10, 20) for unfreeze operation, but
> the patch (8badd65 can: flexcan: avoid calling usleep_range from interrupt
> context) changed it into udelay(10) which is only a half delay of before,
> there're also some other delay changes.
> 
> After double to FLEXCAN_TIMEOUT_US to 100 can fix the issue.
> 
> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>

Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>

Regards
Dong Aisheng

^ permalink raw reply

* [PATCH v2] ipconfig: add carrier_timeout kernel parameter
From: Martin Kepplinger @ 2019-01-31 10:14 UTC (permalink / raw)
  To: corbet, davem, kuznet, yoshfuji, linux-doc, netdev
  Cc: linux-kernel, manfred.schlaegl

[-- Attachment #1: Type: text/plain, Size: 3730 bytes --]

commit 3fb72f1e6e61 ("ipconfig wait for carrier") added a
"wait for carrier" policy, with a fixed worst case maximum wait
of two minutes.

Now make the wait for carrier timeout configurable on the kernel
commandline and use the 120s as the default.

The timeout messages introduced with
commit 5e404cd65860 ("ipconfig: add informative timeout messages while
waiting for carrier") are done in a fixed interval of 20 seconds, just
like they were before (240/12).

Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com>
---

so the "carrier_timeout" name really is a first suggestion and
feels a bit too generic, but from reading other parameters I took
that for now.

thanks
                            martin



revision history
----------------
v2: create a kernel parameter instead
v1: initial idea using kconfig


 .../admin-guide/kernel-parameters.txt         |  5 ++++
 net/ipv4/ipconfig.c                           | 27 +++++++++++++++----
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 17ce83cc48f4..45743ed3e7b6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -461,6 +461,11 @@
 			possible to determine what the correct size should be.
 			This option provides an override for these situations.
 
+	carrier_timeout=
+			[NET] Specifies amount of time (in seconds) that
+			the kernel should wait for a network carrier. By default
+			it waits 120 seconds.
+
 	ca_keys=	[KEYS] This parameter identifies a specific key(s) on
 			the system trusted keyring to be used for certificate
 			trust validation.
diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index b9a9873c25c6..9bcca08efec9 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -85,7 +85,6 @@
 
 /* Define the friendly delay before and after opening net devices */
 #define CONF_POST_OPEN		10	/* After opening: 10 msecs */
-#define CONF_CARRIER_TIMEOUT	120000	/* Wait for carrier timeout */
 
 /* Define the timeout for waiting for a DHCP/BOOTP/RARP reply */
 #define CONF_OPEN_RETRIES 	2	/* (Re)open devices twice */
@@ -101,6 +100,9 @@
 #define NONE cpu_to_be32(INADDR_NONE)
 #define ANY cpu_to_be32(INADDR_ANY)
 
+/* Wait for carrier timeout default in seconds */
+static unsigned int carrier_timeout = 120;
+
 /*
  * Public IP configuration
  */
@@ -268,9 +270,9 @@ static int __init ic_open_devs(void)
 
 	/* wait for a carrier on at least one device */
 	start = jiffies;
-	next_msg = start + msecs_to_jiffies(CONF_CARRIER_TIMEOUT/12);
+	next_msg = start + msecs_to_jiffies(20000);
 	while (time_before(jiffies, start +
-			   msecs_to_jiffies(CONF_CARRIER_TIMEOUT))) {
+			   msecs_to_jiffies(carrier_timeout * 1000))) {
 		int wait, elapsed;
 
 		for_each_netdev(&init_net, dev)
@@ -283,9 +285,9 @@ static int __init ic_open_devs(void)
 			continue;
 
 		elapsed = jiffies_to_msecs(jiffies - start);
-		wait = (CONF_CARRIER_TIMEOUT - elapsed + 500)/1000;
+		wait = (carrier_timeout * 1000 - elapsed + 500) / 1000;
 		pr_info("Waiting up to %d more seconds for network.\n", wait);
-		next_msg = jiffies + msecs_to_jiffies(CONF_CARRIER_TIMEOUT/12);
+		next_msg = jiffies + msecs_to_jiffies(20000);
 	}
 have_carrier:
 	rtnl_unlock();
@@ -1780,3 +1782,18 @@ static int __init vendor_class_identifier_setup(char *addrs)
 	return 1;
 }
 __setup("dhcpclass=", vendor_class_identifier_setup);
+
+static int __init set_carrier_timeout(char *str)
+{
+	ssize_t ret;
+
+	if (!str)
+		return 0;
+
+	ret = kstrtouint(str, 0, &carrier_timeout);
+	if (ret)
+		return 0;
+
+	return 1;
+}
+__setup("carrier_timeout=", set_carrier_timeout);
-- 
2.20.1


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 3616 bytes --]

^ permalink raw reply related

* Re: [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets
From: Daniel Borkmann @ 2019-01-31 10:29 UTC (permalink / raw)
  To: Magnus Karlsson, bjorn.topel, ast, netdev, jakub.kicinski,
	bjorn.topel, qi.z.zhang
  Cc: brouer
In-Reply-To: <1548774737-16579-3-git-send-email-magnus.karlsson@intel.com>

On 01/29/2019 04:12 PM, Magnus Karlsson wrote:
> This commit adds AF_XDP support to libbpf. The main reason for this is
> to facilitate writing applications that use AF_XDP by offering
> higher-level APIs that hide many of the details of the AF_XDP
> uapi. This is in the same vein as libbpf facilitates XDP adoption by
> offering easy-to-use higher level interfaces of XDP
> functionality. Hopefully this will facilitate adoption of AF_XDP, make
> applications using it simpler and smaller, and finally also make it
> possible for applications to benefit from optimizations in the AF_XDP
> user space access code. Previously, people just copied and pasted the
> code from the sample application into their application, which is not
> desirable.
> 
> The interface is composed of two parts:
> 
> * Low-level access interface to the four rings and the packet
> * High-level control plane interface for creating and setting
>   up umems and af_xdp sockets as well as a simple XDP program.
> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
[...]
> +
> +static __u32 xsk_prod_nb_free(struct xsk_ring_prod *r, __u32 nb)
> +{
> +	__u32 free_entries = r->cached_cons - r->cached_prod;
> +
> +	if (free_entries >= nb)
> +		return free_entries;
> +
> +	/* Refresh the local tail pointer.
> +	 * cached_cons is r->size bigger than the real consumer pointer so
> +	 * that this addition can be avoided in the more frequently
> +	 * executed code that computs free_entries in the beginning of
> +	 * this function. Without this optimization it whould have been
> +	 * free_entries = r->cached_prod - r->cached_cons + r->size.
> +	 */
> +	r->cached_cons = *r->consumer + r->size;
> +
> +	return r->cached_cons - r->cached_prod;
> +}
> +
> +static __u32 xsk_cons_nb_avail(struct xsk_ring_cons *r, __u32 nb)
> +{
> +	__u32 entries = r->cached_prod - r->cached_cons;
> +
> +	if (entries == 0) {
> +		r->cached_prod = *r->producer;
> +		entries = r->cached_prod - r->cached_cons;
> +	}
> +
> +	return (entries > nb) ? nb : entries;
> +}
> +
> +size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod, size_t nb, __u32 *idx)
> +{
> +	if (unlikely(xsk_prod_nb_free(prod, nb) < nb))
> +		return 0;
> +
> +	*idx = prod->cached_prod;
> +	prod->cached_prod += nb;
> +
> +	return nb;
> +}
> +
> +void xsk_ring_prod__submit(struct xsk_ring_prod *prod, size_t nb)
> +{
> +	/* Make sure everything has been written to the ring before signalling
> +	 * this to the kernel.
> +	 */
> +	smp_wmb();
> +
> +	*prod->producer += nb;
> +}
> +
> +size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons, size_t nb, __u32 *idx)
> +{
> +	size_t entries = xsk_cons_nb_avail(cons, nb);
> +
> +	if (likely(entries > 0)) {
> +		/* Make sure we do not speculatively read the data before
> +		 * we have received the packet buffers from the ring.
> +		 */
> +		smp_rmb();
> +
> +		*idx = cons->cached_cons;
> +		cons->cached_cons += entries;
> +	}
> +
> +	return entries;
> +}
> +
> +void xsk_ring_cons__release(struct xsk_ring_cons *cons, size_t nb)
> +{
> +	*cons->consumer += nb;
> +}
> +
> +void *xsk_umem__get_data_raw(void *umem_area, __u64 addr)
> +{
> +	return &((char *)umem_area)[addr];
> +}
> +
> +void *xsk_umem__get_data(struct xsk_umem *umem, __u64 addr)
> +{
> +	return &((char *)(umem->umem_area))[addr];
> +}

Shouldn't some of the above helpers for critical path be exposed as static
inline functions instead?

[...]
> +static int xsk_parse_nl(void *cookie, void *msg, struct nlattr **tb)
> +{
> +	struct nlattr *tb_parsed[IFLA_XDP_MAX + 1];
> +	struct xsk_nl_info *nl_info = cookie;
> +	unsigned char mode;
> +	int err;
> +
> +	(void)msg;

Unused?

> +	nl_info->xdp_prog_attached = false;
> +	if (!tb[IFLA_XDP])
> +		return 0;
> +
> +	err = libbpf_nla_parse_nested(tb_parsed, IFLA_XDP_MAX, tb[IFLA_XDP],
> +				      NULL);
> +	if (err)
> +		return err;
> +
> +	if (!tb_parsed[IFLA_XDP_ATTACHED] || !tb_parsed[IFLA_XDP_FD])
> +		return 0;
> +
> +	mode = libbpf_nla_getattr_u8(tb_parsed[IFLA_XDP_ATTACHED]);
> +	if (mode == XDP_ATTACHED_NONE)
> +		return 0;

Probably good to memset and/or init the passed struct xsk_nl_info (e.g.
fd to -1 etc) such that if some user did not we won't end up with garbage
values on return 0.

> +	nl_info->xdp_prog_attached = true;
> +	nl_info->fd = libbpf_nla_getattr_u32(tb_parsed[IFLA_XDP_FD]);
> +	return 0;
> +}
> +
> +static bool xsk_xdp_prog_attached(struct xsk_socket *xsk)
> +{
> +	struct xsk_nl_info nl_info;
> +	unsigned int nl_pid;
> +	char err_buf[256];
> +	int sock, err;
> +
> +	sock = libbpf_netlink_open(&nl_pid);
> +	if (sock < 0)
> +		return false;
> +
> +	nl_info.xdp_prog_attached = false;
> +	nl_info.fd = 0;
> +
> +	err = libbpf_nl_get_link(sock, nl_pid, xsk_parse_nl, &nl_info);
> +	if (err) {
> +		libbpf_strerror(err, err_buf, sizeof(err_buf));
> +		pr_warning("Error:\n%s\n", err_buf);
> +		return false;
> +	}
> +
> +	xsk->prog_fd = nl_info.fd;
> +	return nl_info.xdp_prog_attached;

Don't we leak sock here and in error above?

> +}
> +
> +static int xsk_load_xdp_prog(struct xsk_socket *xsk)
> +{
> +	char bpf_log_buf[BPF_LOG_BUF_SIZE];
> +	int err, prog_fd;
> +
> +	/* This is the C-program:
> +	 * SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
> +	 * {
> +	 *     int *qidconf, index = ctx->rx_queue_index;
> +	 *
> +	 *     // A set entry here means that the correspnding queue_id
> +	 *     // has an active AF_XDP socket bound to it.
> +	 *     qidconf = bpf_map_lookup_elem(&qidconf_map, &index);
> +	 *     if (!qidconf)
> +	 *         return XDP_ABORTED;
> +	 *
> +	 *     if (*qidconf)
> +	 *         return bpf_redirect_map(&xsks_map, index, 0);
> +	 *
> +	 *     return XDP_PASS;
> +	 * }
> +	 */
> +	struct bpf_insn prog[] = {
> +		/* r1 = *(u32 *)(r1 + 16) */
> +		BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 16),
> +		/* *(u32 *)(r10 - 4) = r1 */
> +		BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_1, -4),
> +		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
> +		BPF_LD_MAP_FD(BPF_REG_1, xsk->qidconf_map_fd),
> +		BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
> +		BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> +		BPF_MOV32_IMM(BPF_REG_0, 0),
> +		/* if r1 == 0 goto +8 */
> +		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 8),
> +		BPF_MOV32_IMM(BPF_REG_0, 2),
> +		/* r1 = *(u32 *)(r1 + 0) */
> +		BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 0),
> +		/* if r1 == 0 goto +5 */
> +		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 5),
> +		/* r2 = *(u32 *)(r10 - 4) */
> +		BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
> +		BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4),
> +		BPF_MOV32_IMM(BPF_REG_3, 0),
> +		BPF_EMIT_CALL(BPF_FUNC_redirect_map),
> +		/* The jumps are to this instruction */
> +		BPF_EXIT_INSN(),
> +	};
> +	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
> +
> +	prog_fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog, insns_cnt,
> +				   "LGPL-2.1 or BSD-2-Clause", 0, bpf_log_buf,
> +				   BPF_LOG_BUF_SIZE);
> +	if (prog_fd < 0) {
> +		pr_warning("BPF log buffer:\n%s", bpf_log_buf);
> +		return prog_fd;
> +	}
> +
> +	err = bpf_set_link_xdp_fd(xsk->ifindex, prog_fd, xsk->config.xdp_flags);
> +	if (err)
> +		return err;

Leaks prog_fd on error.

> +	xsk->prog_fd = prog_fd;
> +	return 0;
> +}
> +
> +static int xsk_create_bpf_maps(struct xsk_socket *xsk)
> +{
> +	int fd;
> +
> +	fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "qidconf_map",
> +				 sizeof(int), sizeof(int), MAX_QUEUES, 0);
> +	if (fd < 0)
> +		return fd;
> +	xsk->qidconf_map_fd = fd;
> +
> +	fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map",
> +				 sizeof(int), sizeof(int), MAX_QUEUES, 0);
> +	if (fd < 0)
> +		return fd;

Leaks first map fd on error.

> +	xsk->xsks_map_fd = fd;
> +
> +	return 0;
> +}
> +
> +static int xsk_update_bpf_maps(struct xsk_socket *xsk, int qidconf_value,
> +			       int xsks_value)
> +{
> +	bool qidconf_map_updated = false, xsks_map_updated = false;
> +	struct bpf_prog_info prog_info = {};
> +	__u32 prog_len = sizeof(prog_info);
> +	struct bpf_map_info map_info;
> +	__u32 map_len = sizeof(map_info);
> +	__u32 *map_ids;
> +	int reset_value = 0;
> +	__u32 num_maps;
> +	unsigned int i;
> +	int err;
> +
> +	err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
> +	if (err)
> +		return err;
> +
> +	num_maps = prog_info.nr_map_ids;
> +
> +	map_ids = malloc(prog_info.nr_map_ids * sizeof(*map_ids));

calloc()?

> +	if (!map_ids)
> +		return -ENOMEM;
> +
> +	memset(&prog_info, 0, prog_len);
> +	prog_info.nr_map_ids = num_maps;
> +	prog_info.map_ids = (__u64)(unsigned long)map_ids;
> +
> +	err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
> +	if (err)
> +		return err;

Leaks map_ids on error.

> +
> +	for (i = 0; i < prog_info.nr_map_ids; i++) {
> +		int fd;
> +
> +		fd = bpf_map_get_fd_by_id(map_ids[i]);
> +		if (fd < 0) {
> +			err = -errno;
> +			goto out;
> +		}
> +
> +		err = bpf_obj_get_info_by_fd(fd, &map_info, &map_len);
> +		if (err)
> +			goto out;

close(fd) on error, also the case for below, please double check everything
so that no fd leaks by accident into the app.

> +
> +		if (!strcmp(map_info.name, "qidconf_map")) {
> +			err = bpf_map_update_elem(fd, &xsk->queue_id,
> +						  &qidconf_value, 0);
> +			if (err)
> +				goto out;
> +			qidconf_map_updated = true;
> +			xsk->qidconf_map_fd = fd;
> +		} else if (!strcmp(map_info.name, "xsks_map")) {
> +			err = bpf_map_update_elem(fd, &xsk->queue_id,
> +						  &xsks_value, 0);
> +			if (err)
> +				goto out;> +			xsks_map_updated = true;
> +			xsk->xsks_map_fd = fd;
> +		}
> +
> +		if (qidconf_map_updated && xsks_map_updated)
> +			break;
> +	}
> +
> +	if (!(qidconf_map_updated && xsks_map_updated)) {
> +		err = -ENOENT;
> +		goto out;
> +	}
> +
> +	return 0;
> +
> +out:
> +	if (qidconf_map_updated)
> +		(void)bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id,
> +					  &reset_value, 0);
> +	if (xsks_map_updated)
> +		(void)bpf_map_update_elem(xsk->xsks_map_fd, &xsk->queue_id,
> +					  &reset_value, 0);
> +	return err;
> +}
> +
> +static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
> +{
> +	bool prog_loaded = false;
> +	int err;
> +
> +	if (!xsk_xdp_prog_attached(xsk)) {
> +		err = xsk_create_bpf_maps(xsk);
> +		if (err)
> +			goto out_load;
> +
> +		err = xsk_load_xdp_prog(xsk);
> +		if (err)
> +			return err;

Needs to undo created maps on error.

> +		prog_loaded = true;
> +	}
> +
> +	err = xsk_update_bpf_maps(xsk, true, xsk->fd);
> +	if (err)
> +		goto out_load;
> +
> +	return 0;
> +
> +out_load:
> +	if (prog_loaded)
> +		close(xsk->prog_fd);

Ditto

> +	return err;
> +}
> +
> +int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
> +		       __u32 queue_id, struct xsk_umem *umem,
> +		       struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
> +		       const struct xsk_socket_config *usr_config)
> +{
> +	struct sockaddr_xdp sxdp = {};
> +	struct xdp_mmap_offsets off;
> +	struct xsk_socket *xsk;
> +	socklen_t optlen;
> +	void *map;

^ permalink raw reply

* Re: [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file
From: Magnus Karlsson @ 2019-01-31 10:42 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Magnus Karlsson, Björn Töpel, ast, Network Development,
	Jakub Kicinski, Björn Töpel, Zhang, Qi Z,
	Jesper Dangaard Brouer
In-Reply-To: <f120bbc3-7448-b5ef-55b6-9cf42794ae25@iogearbox.net>

On Thu, Jan 31, 2019 at 11:07 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 01/29/2019 04:12 PM, Magnus Karlsson wrote:
> > Move the pr_*() functions in libbpf.c to a common header file called
> > libbpf_internal.h. This so that the later libbpf AF_XDP helper library
> > code in xsk.c can use these printing functions too.
> >
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > ---
> >  tools/lib/bpf/libbpf.c          | 30 +-----------------------------
> >  tools/lib/bpf/libbpf_internal.h | 41 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 42 insertions(+), 29 deletions(-)
> >  create mode 100644 tools/lib/bpf/libbpf_internal.h
> >
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 2ccde17..1d7fe26 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -39,6 +39,7 @@
> >  #include <gelf.h>
> >
> >  #include "libbpf.h"
> > +#include "libbpf_internal.h"
> >  #include "bpf.h"
> >  #include "btf.h"
> >  #include "str_error.h"
> > @@ -51,34 +52,6 @@
> >  #define BPF_FS_MAGIC         0xcafe4a11
> >  #endif
> >
> > -#define __printf(a, b)       __attribute__((format(printf, a, b)))
> > -
> > -__printf(1, 2)
> > -static int __base_pr(const char *format, ...)
> > -{
> > -     va_list args;
> > -     int err;
> > -
> > -     va_start(args, format);
> > -     err = vfprintf(stderr, format, args);
> > -     va_end(args);
> > -     return err;
> > -}
> > -
> > -static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
> > -static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
> > -static __printf(1, 2) libbpf_print_fn_t __pr_debug;
> > -
> > -#define __pr(func, fmt, ...) \
> > -do {                         \
> > -     if ((func))             \
> > -             (func)("libbpf: " fmt, ##__VA_ARGS__); \
> > -} while (0)
> > -
> > -#define pr_warning(fmt, ...) __pr(__pr_warning, fmt, ##__VA_ARGS__)
> > -#define pr_info(fmt, ...)    __pr(__pr_info, fmt, ##__VA_ARGS__)
> > -#define pr_debug(fmt, ...)   __pr(__pr_debug, fmt, ##__VA_ARGS__)
> > -
> >  void libbpf_set_print(libbpf_print_fn_t warn,
> >                     libbpf_print_fn_t info,
> >                     libbpf_print_fn_t debug)
> > @@ -96,7 +69,6 @@ void libbpf_set_print(libbpf_print_fn_t warn,
> >               goto out;               \
> >  } while(0)
> >
> > -
> >  /* Copied from tools/perf/util/util.h */
> >  #ifndef zfree
> >  # define zfree(ptr) ({ free(*ptr); *ptr = NULL; })
> > diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
> > new file mode 100644
> > index 0000000..951c078
> > --- /dev/null
> > +++ b/tools/lib/bpf/libbpf_internal.h
>
> Just really minor nit: lets name the header utils.h or such. It would also make
> sense to move the zfree() and zclose() to this and make use of it.

Will fix.

Thanks: Magnus

> > @@ -0,0 +1,41 @@
> > +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> > +
> > +/*
> > + * Common internal libbpf functions and definitions.
> > + *
> > + * Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
> > + * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
> > + * Copyright (C) 2015 Huawei Inc.
> > + */
> > +#ifndef __LIBBPF_LIBBPF_INTERNAL_H
> > +#define __LIBBPF_LIBBPF_INTERNAL_H
> > +
> > +#define __printf(a, b)       __attribute__((format(printf, a, b)))
> > +
> > +__printf(1, 2)
> > +static int __base_pr(const char *format, ...)
> > +{
> > +     va_list args;
> > +     int err;
> > +
> > +     va_start(args, format);
> > +     err = vfprintf(stderr, format, args);
> > +     va_end(args);
> > +     return err;
> > +}
> > +
> > +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
> > +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
> > +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_debug;
> > +
> > +#define __pr(func, fmt, ...) \
> > +do {                         \
> > +     if ((func))             \
> > +             (func)("libbpf: " fmt, ##__VA_ARGS__); \
> > +} while (0)
> > +
> > +#define pr_warning(fmt, ...) __pr(__pr_warning, fmt, ##__VA_ARGS__)
> > +#define pr_info(fmt, ...)    __pr(__pr_info, fmt, ##__VA_ARGS__)
> > +#define pr_debug(fmt, ...)   __pr(__pr_debug, fmt, ##__VA_ARGS__)
> > +
> > +#endif /* __LIBBPF_LIBBPF_INTERNAL_H */
> >
>

^ permalink raw reply

* Re: [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets
From: Magnus Karlsson @ 2019-01-31 10:44 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Magnus Karlsson, Björn Töpel, ast, Network Development,
	Jakub Kicinski, Björn Töpel, Zhang, Qi Z,
	Jesper Dangaard Brouer
In-Reply-To: <b422085d-493a-1c1b-7291-8de0dd79e21d@iogearbox.net>

On Thu, Jan 31, 2019 at 11:07 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 01/29/2019 04:12 PM, Magnus Karlsson wrote:
> > This commit adds AF_XDP support to libbpf. The main reason for this is
> > to facilitate writing applications that use AF_XDP by offering
> > higher-level APIs that hide many of the details of the AF_XDP
> > uapi. This is in the same vein as libbpf facilitates XDP adoption by
> > offering easy-to-use higher level interfaces of XDP
> > functionality. Hopefully this will facilitate adoption of AF_XDP, make
> > applications using it simpler and smaller, and finally also make it
> > possible for applications to benefit from optimizations in the AF_XDP
> > user space access code. Previously, people just copied and pasted the
> > code from the sample application into their application, which is not
> > desirable.
> >
> > The interface is composed of two parts:
> >
> > * Low-level access interface to the four rings and the packet
> > * High-level control plane interface for creating and setting
> >   up umems and af_xdp sockets as well as a simple XDP program.
> >
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> [...]
> >  libbpf ABI
> >  ==========
> >
> > diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> > index 266bc95..d294843 100644
> > --- a/tools/lib/bpf/libbpf.map
> > +++ b/tools/lib/bpf/libbpf.map
> > @@ -121,6 +121,18 @@ LIBBPF_0.0.1 {
> >               libbpf_prog_type_by_name;
> >               libbpf_set_print;
> >               libbpf_strerror;
> > +             xsk_ring_cons__peek;
> > +             xsk_ring_cons__release;
> > +             xsk_ring_prod__reserve;
> > +             xsk_ring_prod__submit;
> > +             xsk_umem__get_data;
> > +             xsk_umem__get_data_raw;
> > +             xsk_umem__create;
> > +             xsk_socket__create;
> > +             xsk_umem__delete;
> > +             xsk_socket__delete;
> > +             xsk_umem__fd;
> > +             xsk_socket__fd;
>
> This definitely needs to be placed under LIBBPF_0.0.2 where we have bpf_probe_*
> that were added in this cycle.

OK. Will move it to LIBBPF_0.0.2.

Thanks: Magnus

>
> >       local:
> >               *;
> >  };
> > diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c

^ permalink raw reply

* Re: [PATCH v6 bpf-next 1/9] bpf: introduce bpf_spin_lock
From: Peter Zijlstra @ 2019-01-31 10:47 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, daniel, jannh, netdev, kernel-team
In-Reply-To: <20190131050546.3634009-2-ast@kernel.org>

On Wed, Jan 30, 2019 at 09:05:38PM -0800, Alexei Starovoitov wrote:
> Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
> bpf program serialize access to other variables.
> 
> Example:
> struct hash_elem {
>     int cnt;
>     struct bpf_spin_lock lock;
> };
> struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
> if (val) {
>     bpf_spin_lock(&val->lock);
>     val->cnt++;
>     bpf_spin_unlock(&val->lock);
> }
> 
> Restrictions and safety checks:
> - bpf_spin_lock is only allowed inside HASH and ARRAY maps.
> - BTF description of the map is mandatory for safety analysis.
> - bpf program can take one bpf_spin_lock at a time, since two or more can
>   cause dead locks.
> - only one 'struct bpf_spin_lock' is allowed per map element.
>   It drastically simplifies implementation yet allows bpf program to use
>   any number of bpf_spin_locks.
> - when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
> - bpf program must bpf_spin_unlock() before return.
> - bpf program can access 'struct bpf_spin_lock' only via
>   bpf_spin_lock()/bpf_spin_unlock() helpers.
> - load/store into 'struct bpf_spin_lock lock;' field is not allowed.
> - to use bpf_spin_lock() helper the BTF description of map value must be
>   a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
>   Nested lock inside another struct is not allowed.
> - syscall map_lookup doesn't copy bpf_spin_lock field to user space.
> - syscall map_update and program map_update do not update bpf_spin_lock field.
> - bpf_spin_lock cannot be on the stack or inside networking packet.
>   bpf_spin_lock can only be inside HASH or ARRAY map value.
> - bpf_spin_lock is available to root only and to all program types.
> - bpf_spin_lock is not allowed in inner maps of map-in-map.
> - ld_abs is not allowed inside spin_lock-ed region.
> - tracing progs and socket filter progs cannot use bpf_spin_lock due to
>   insufficient preemption checks
> 
> Implementation details:
> - cgroup-bpf class of programs can nest with xdp/tc programs.
>   Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
>   Other solutions to avoid nested bpf_spin_lock are possible.
>   Like making sure that all networking progs run with softirq disabled.
>   spin_lock_irqsave is the simplest and doesn't add overhead to the
>   programs that don't use it.
> - arch_spinlock_t is used when its implemented as queued_spin_lock
> - archs can force their own arch_spinlock_t
> - on architectures where queued_spin_lock is not available and
>   sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
> - presence of bpf_spin_lock inside map value could have been indicated via
>   extra flag during map_create, but specifying it via BTF is cleaner.
>   It provides introspection for map key/value and reduces user mistakes.
> 
> Next steps:
> - allow bpf_spin_lock in other map types (like cgroup local storage)
> - introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
>   to request kernel to grab bpf_spin_lock before rewriting the value.
>   That will serialize access to map elements.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Small nit below.

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index a74972b07e74..29a8a4e62b8a 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -221,6 +221,86 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
>  	.arg2_type	= ARG_CONST_SIZE,
>  };
>  
> +#if defined(CONFIG_QUEUED_SPINLOCKS) || defined(CONFIG_BPF_ARCH_SPINLOCK)
> +
> +static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
> +{
> +	arch_spinlock_t *l = (void *)lock;
> +	union {
> +		__u32 val;
> +		arch_spinlock_t lock;
> +	} u = { .lock = __ARCH_SPIN_LOCK_UNLOCKED };
> +
> +	compiletime_assert(u.val == 0, "__ARCH_SPIN_LOCK_UNLOCKED not 0");
> +	BUILD_BUG_ON(sizeof(*l) != sizeof(__u32));
> +	BUILD_BUG_ON(sizeof(*lock) != sizeof(__u32));
> +	arch_spin_lock(l);
> +}
> +
> +static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
> +{
> +	arch_spinlock_t *l = (void *)lock;
> +
> +	arch_spin_unlock(l);
> +}
> +
> +#else
> +
> +static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
> +{
> +	atomic_t *l = (void *)lock;
> +
> +	BUILD_BUG_ON(sizeof(*l) != sizeof(*lock));
> +	do {
> +		atomic_cond_read_relaxed(l, !VAL);
> +	} while (atomic_xchg(l, 1));
> +}
> +
> +static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
> +{
> +	atomic_t *l = (void *)lock;
> +
> +	atomic_set_release(l, 0);
> +}
> +
> +#endif
> +
> +static DEFINE_PER_CPU(unsigned long, irqsave_flags);
> +
> +notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
> +{
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	__bpf_spin_lock(lock);
> +	this_cpu_write(irqsave_flags, flags);

	__this_cpu_write()

> +	return 0;
> +}
> +
> +const struct bpf_func_proto bpf_spin_lock_proto = {
> +	.func		= bpf_spin_lock,
> +	.gpl_only	= false,
> +	.ret_type	= RET_VOID,
> +	.arg1_type	= ARG_PTR_TO_SPIN_LOCK,
> +};
> +
> +notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
> +{
> +	unsigned long flags;
> +
> +	flags = this_cpu_read(irqsave_flags);

	__this_cpu_read()

> +	__bpf_spin_unlock(lock);
> +	local_irq_restore(flags);
> +	return 0;
> +}
> +
> +const struct bpf_func_proto bpf_spin_unlock_proto = {
> +	.func		= bpf_spin_unlock,
> +	.gpl_only	= false,
> +	.ret_type	= RET_VOID,
> +	.arg1_type	= ARG_PTR_TO_SPIN_LOCK,
> +};
> +
>  #ifdef CONFIG_CGROUPS
>  BPF_CALL_0(bpf_get_current_cgroup_id)
>  {

^ permalink raw reply

* Re: [PATCH 4/7] sh_eth: offload RX checksum on R8A7740
From: Sergei Shtylyov @ 2019-01-31 10:52 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: netdev, David S. Miller, Linux-Renesas, Linux-sh list
In-Reply-To: <CAMuHMdUixqEcD36ReZj4noF5VOCNHS-qH_9herABkiLSVD3R8A@mail.gmail.com>

Hello!

On 01/29/2019 09:20 PM, Geert Uytterhoeven wrote:

>> The R-Mobile A1 (R8A7740) SoC manual describes the Ether MAC's RX checksum
>> offload the same way as it's implemented in the EtherAVB MAC...
>>
>> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
> 
> Thanks for your patch!
> 
> Running netperf as described in patch 2/7, perf tells me there's a reduction
> for csum_partial from ca. 1.9% to 0.01%, so this feature seems to work.

   Hm, what about do_csum()?

> Hence:
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

   Thank you!
 
> However, while effective according to perf results, using ethtool to
> enable/disable
> the feature prints an error message:
> 
> root@armadillo:~# ethtool -K eth0 rx on
> Cannot get device udp-fragmentation-offload settings: Operation not supported
> Cannot get device udp-fragmentation-offload settings: Operation not supported
> root@armadillo:~# ethtool -K eth0 rx off
> Cannot get device udp-fragmentation-offload settings: Operation not supported
> Cannot get device udp-fragmentation-offload settings: Operation not supported
> root@armadillo:~#
> 
> Do you have any clue?

   No (I'm seeing the same).
 
> Does this needs testing on R-Mobile A1 with VLAN enabled, too, or is that
> independent from the underlying sh-eth hardware version?

   It's dependent...

> Gr{oetje,eeting}s,
> 
>                         Geert

MBR, Sergei

^ permalink raw reply

* WE ARE SUCCESSFUL AT LAST
From: Mr Mark Guer @ 2019-01-31 10:26 UTC (permalink / raw)
  To: netdev

Dear Colleague,

On a very happy note and bearing in mind the staunch commitment that you had put into the quest to see that the fund was transferred to you for claim as we had arranged but to no avail, I write to inform you that at last, the fund transfer had been a SUCCESS. Since all our effort in conjunction with yours were met with various failures and disappointments after the other, there was an influential party from Britain that was approached on our behalf by a Finance & Management Supports Initiatives(a consulting firm); and with her (influential  party) imputes, the Fund was paid out and I have collected my own share. Presently, I am in  Ecuador where, in partnership with a core investor in the Mining Sector of their economy, we are investing in the industry

But, I am not oblivious of the fact that you did your best in that context, although, it seem  that element of doubt crept in. You were not to blame,
 as there were many circumstances that would reflect such skepticism to any reasonable being.For your sincere but unfruitful sacrifices, I pressurized the influential woman to  concede some amount as COMPENSATION to you, for the resource that you imp lunged even the time and inconveniences

The sum of $US 800,000.00 was what she agreed to as your compensation value, and the sum  was made in draft but, because of my appointment in Ecuador and too, I could not get through  to you since your number at any time indicated "not in use", I then handed it over to the  Secretary of the Finance & Management Supports Initiatives, the consulting firm

His name is Mr Emmanuel Nduka
Email: emmanduka04@gmail.com

The Secretary is very much informed by me about you and I instructed him to hand over the  Draft you as soon as you contact him. I gave him the CODE 555, which you have to mention for  identification. Note that at times I will be in the site and the Internet does not work  there. Only when I am in the city.

Yours Sincerely,
Mr Mark Guer

^ permalink raw reply

* Re: [PATCH 4/7] sh_eth: offload RX checksum on R8A7740
From: Geert Uytterhoeven @ 2019-01-31 11:11 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: netdev, David S. Miller, Linux-Renesas, Linux-sh list
In-Reply-To: <918892bc-4b2d-0235-cb51-749ca49bd4e8@cogentembedded.com>

Hi Sergei,

On Thu, Jan 31, 2019 at 11:52 AM Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
> On 01/29/2019 09:20 PM, Geert Uytterhoeven wrote:
> >> The R-Mobile A1 (R8A7740) SoC manual describes the Ether MAC's RX checksum
> >> offload the same way as it's implemented in the EtherAVB MAC...
> >>
> >> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
> >
> > Thanks for your patch!
> >
> > Running netperf as described in patch 2/7, perf tells me there's a reduction
> > for csum_partial from ca. 1.9% to 0.01%, so this feature seems to work.
>
>    Hm, what about do_csum()?

I had looked for that, but didn't see it. Probably inlined, as it's static.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH net-next 1/1] openvswitch: Declare ovs key structures using macros
From: Eli Britstein @ 2019-01-31 11:32 UTC (permalink / raw)
  To: David Miller, pshelar@ovn.org
  Cc: netdev@vger.kernel.org, blp@ovn.org, dev@openvswitch.org,
	Roi Dayan, simon.horman@netronome.com
In-Reply-To: <b9632d46-91f7-49a9-3f32-76ada6317da9@mellanox.com>

ping

for the using patch, i put below the v1 of it. here is v2:

https://patchwork.ozlabs.org/patch/1023406/


On 1/27/2019 8:37 AM, Eli Britstein wrote:
>
> On 1/27/2019 1:04 AM, David Miller wrote:
>> From: Eli Britstein <elibr@mellanox.com>
>> Date: Thu, 24 Jan 2019 11:46:47 +0200
>>
>>> Declare ovs key structures using macros to enable retrieving fields
>>> information, with no functional change.
>>>
>>> Signed-off-by: Eli Britstein <elibr@mellanox.com>
>>> Reviewed-by: Roi Dayan <roid@mellanox.com>
>> I agree with Pravin, this need a much better commit message.
>>
>> Maybe even better to submit this alongside whatever is supposed
>> to use these new macros.
>
> This patch is equivalent to a work done in the OVS tree.
>
> https://patchwork.ozlabs.org/patch/1023405/
>
> As a standalone it doesn't serve any purpose (as mentioned - no 
> functional change).
>
> It serves as a pre-step towards another patch in the OVS:
>
> https://patchwork.ozlabs.org/patch/1022794/
>
> So, the purpose of doing it in the kernel is just to keep this H file 
> identical. Once it is approved for the kernel, we will be able to 
> proceed with it in the OVS.
>

^ permalink raw reply

* [PATCH net] virtio_net: Account for tx bytes and packets on sending xdp_frames
From: Toshiaki Makita @ 2019-01-31 11:40 UTC (permalink / raw)
  To: David S. Miller, Michael S. Tsirkin, Jason Wang
  Cc: Toshiaki Makita, netdev, virtualization, David Ahern

Previously virtnet_xdp_xmit() did not account for device tx counters,
which caused confusions.
To be consistent with SKBs, account them on freeing xdp_frames.

Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
 drivers/net/virtio_net.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2594481..4cfceb7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -503,6 +503,8 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	struct bpf_prog *xdp_prog;
 	struct send_queue *sq;
 	unsigned int len;
+	int packets = 0;
+	int bytes = 0;
 	int drops = 0;
 	int kicks = 0;
 	int ret, err;
@@ -526,10 +528,18 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 
 	/* Free up any pending old buffers before queueing new ones. */
 	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		if (likely(is_xdp_frame(ptr)))
-			xdp_return_frame(ptr_to_xdp(ptr));
-		else
-			napi_consume_skb(ptr, false);
+		if (likely(is_xdp_frame(ptr))) {
+			struct xdp_frame *frame = ptr_to_xdp(ptr);
+
+			bytes += frame->len;
+			xdp_return_frame(frame);
+		} else {
+			struct sk_buff *skb = ptr;
+
+			bytes += skb->len;
+			napi_consume_skb(skb, false);
+		}
+		packets++;
 	}
 
 	for (i = 0; i < n; i++) {
@@ -549,6 +559,8 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	}
 out:
 	u64_stats_update_begin(&sq->stats.syncp);
+	sq->stats.bytes += bytes;
+	sq->stats.packets += packets;
 	sq->stats.xdp_tx += n;
 	sq->stats.xdp_tx_drops += drops;
 	sq->stats.kicks += kicks;
-- 
1.8.3.1



^ permalink raw reply related

* Re: [PATCH v3] lib/test_rhashtable: Make test_insert_dup() allocate its hash table dynamically
From: Herbert Xu @ 2019-01-31 12:08 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: David Miller, tgraf, netdev, linux-kernel
In-Reply-To: <20190130184230.225631-1-bvanassche@acm.org>

On Wed, Jan 30, 2019 at 10:42:30AM -0800, Bart Van Assche wrote:
> The test_insert_dup() function from lib/test_rhashtable.c passes a
> pointer to a stack object to rhltable_init(). Allocate the hash table
> dynamically to avoid that the following is reported with object
> debugging enabled:
> 
> ODEBUG: object (ptrval) is on stack (ptrval), but NOT annotated.
> WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:368 __debug_object_init+0x312/0x480
> Modules linked in:
> EIP: __debug_object_init+0x312/0x480
> Call Trace:
>  ? debug_object_init+0x1a/0x20
>  ? __init_work+0x16/0x30
>  ? rhashtable_init+0x1e1/0x460
>  ? sched_clock_cpu+0x57/0xe0
>  ? rhltable_init+0xb/0x20
>  ? test_insert_dup+0x32/0x20f
>  ? trace_hardirqs_on+0x38/0xf0
>  ? ida_dump+0x10/0x10
>  ? jhash+0x130/0x130
>  ? my_hashfn+0x30/0x30
>  ? test_rht_init+0x6aa/0xab4
>  ? ida_dump+0x10/0x10
>  ? test_rhltable+0xc5c/0xc5c
>  ? do_one_initcall+0x67/0x28e
>  ? trace_hardirqs_off+0x22/0xe0
>  ? restore_all_kernel+0xf/0x70
>  ? trace_hardirqs_on_thunk+0xc/0x10
>  ? restore_all_kernel+0xf/0x70
>  ? kernel_init_freeable+0x142/0x213
>  ? rest_init+0x230/0x230
>  ? kernel_init+0x10/0x110
>  ? schedule_tail_wrapper+0x9/0xc
>  ? ret_from_fork+0x19/0x24
> 
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> 
> Changes compared to v2: fixed build error.
> 
> Changes compared to v1: instead of modifying rhashtable_init(), modify its
>    caller.
> 
> lib/test_rhashtable.c | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
From: Peter Ceiley @ 2019-01-31 12:09 UTC (permalink / raw)
  To: David Chang; +Cc: Heiner Kallweit, Realtek linux nic maintainers, netdev
In-Reply-To: <20190131072306.GG25745@linux-kyyb.suse>

Hi Heiner,

A quick update on my testing with different pcie_aspm settings:

pcie_aspm=off | no change
pcie_aspm.policy=default | no change
pcie_aspm.policy=performance | issue resolved
pcie_aspm.policy=powersave | issue resolved
pcie_aspm.policy=powersupersave | issue resolved

It seems the new driver does not play nicely with the default ASPM policy.

As requested, I've included an output of ethtool below when experiencing
the issue - note that no errors are recorded.

# ethtool -S enp3s0
NIC statistics:
     tx_packets: 2749
     rx_packets: 4089
     tx_errors: 0
     rx_errors: 0
     rx_missed: 0
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 4078
     broadcast: 9
     multicast: 2
     tx_aborted: 0
     tx_underrun: 0

David, I hope this helps for your user as well. I appreciate you sharing
the bug ticket - thanks.

Heiner, thanks very much for your help to date.

Regards,

Peter.

On Thu, 31 Jan 2019 at 18:23, David Chang <dchang@suse.com> wrote:
>
> Hi Heiner,
>
> On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
> > Hi David, two more things:
> >
> > 1. Could you please test a recent linux-next kernel?
> > 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
> >    and compare them.
>
> I'm sorry that I do not have the issue machine handy. I would ask
> our user to do the test. Thanks!
>
> Regards,
> David
>
> >
> > Heiner
> >
> >
> > On 31.01.2019 07:21, Heiner Kallweit wrote:
> > > David, thanks for the link to the bug ticket.
> > > I think only a proper bisect can help to find the offending commit.
> > >
> > > Heiner
> > >
> > >
> > > On 31.01.2019 03:32, David Chang wrote:
> > >> Hi,
> > >>
> > >> We had a similr case here.
> > >> - Realtek r8169 receive performance regression in kernel 4.19
> > >>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
> > >>
> > >> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> > >> The major symptom is there are many rx_missed count.
> > >>
> > >>
> > >> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> > >>> Hi Peter,
> > >>>
> > >>> recently I had somebody where pcie_aspm=off for whatever reason didn't
> > >>> do the trick, can you also check with pcie_aspm.policy=performance.
> > >>
> > >> We will give it a try later.
> > >>
> > >>> And please check with "ethtool -S <if>" whether the chip statistics
> > >>> show a significant number of errors.
> > >>>
> > >>> If this doesn't help you may have to bisect to find the offending commit.
> > >>
> > >> We had tried fallback driver to a few previous commits as following,
> > >> but with no luck.
> > >>
> > >> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> > >> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> > >> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> > >> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> > >> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> > >>
> > >> Thanks,
> > >> David Chang
> > >>
> > >>>
> > >>> Heiner
> > >>>
> > >>>
> > >>> On 30.01.2019 10:59, Peter Ceiley wrote:
> > >>>> Hi Heiner,
> > >>>>
> > >>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> > >>>> and this made no difference.
> > >>>>
> > >>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> > >>>> subsequently loaded the module in the running 4.19.18 kernel. I can
> > >>>> confirm that this immediately resolved the issue and access to the NFS
> > >>>> shares operated as expected.
> > >>>>
> > >>>> I presume this means it is an issue with the r8169 driver included in
> > >>>> 4.19 onwards?
> > >>>>
> > >>>> To answer your last questions:
> > >>>>
> > >>>> Base Board Information
> > >>>>     Manufacturer: Alienware
> > >>>>     Product Name: 0PGRP5
> > >>>>     Version: A02
> > >>>>
> > >>>> ... and yes, the RTL8168 is the onboard network chip.
> > >>>>
> > >>>> Regards,
> > >>>>
> > >>>> Peter.
> > >>>>
> > >>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> > >>>>>
> > >>>>> Hi Peter,
> > >>>>>
> > >>>>> I think the vendor driver doesn't enable ASPM per default.
> > >>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> > >>>>> Few older systems seem to have issues with ASPM, what kind of
> > >>>>> system / mainboard are you using? The RTL8168 is the onboard
> > >>>>> network chip?
> > >>>>>
> > >>>>> Rgds, Heiner
> > >>>>>
> > >>>>>
> > >>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> > >>>>>> Hi Heiner,
> > >>>>>>
> > >>>>>> Thanks, I'll do some more testing. It might not be the driver - I
> > >>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> > >>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> > >>>>>> a good idea.
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>>
> > >>>>>> Peter.
> > >>>>>>
> > >>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Hi Peter,
> > >>>>>>>
> > >>>>>>> at a first glance it doesn't look like a typical driver issue.
> > >>>>>>> What you could do:
> > >>>>>>>
> > >>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> > >>>>>>>
> > >>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> > >>>>>>>
> > >>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> > >>>>>>>
> > >>>>>>> Any specific reason why you think root cause is in the driver and not
> > >>>>>>> elsewhere in the network subsystem?
> > >>>>>>>
> > >>>>>>> Heiner
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> > >>>>>>>> Hi Heiner,
> > >>>>>>>>
> > >>>>>>>> Thanks for getting back to me.
> > >>>>>>>>
> > >>>>>>>> No, I don't use jumbo packets.
> > >>>>>>>>
> > >>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> > >>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> > >>>>>>>> establishing a connection and is most notable, for example, on my
> > >>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> > >>>>>>>> larger directories) to list the contents of each directory. Once a
> > >>>>>>>> transfer begins on a file, I appear to get good bandwidth.
> > >>>>>>>>
> > >>>>>>>> I'm unsure of the best scientific data to provide you in order to
> > >>>>>>>> troubleshoot this issue. Running the following
> > >>>>>>>>
> > >>>>>>>>     netstat -s |grep retransmitted
> > >>>>>>>>
> > >>>>>>>> shows a steady increase in retransmitted segments each time I list the
> > >>>>>>>> contents of a remote directory, for example, running 'ls' on a
> > >>>>>>>> directory containing 345 media files did the following using kernel
> > >>>>>>>> 4.19.18:
> > >>>>>>>>
> > >>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> > >>>>>>>> the following:
> > >>>>>>>>     real    0m19.867s
> > >>>>>>>>     user    0m0.012s
> > >>>>>>>>     sys    0m0.036s
> > >>>>>>>>
> > >>>>>>>> The same command shows no retransmitted segments running kernel
> > >>>>>>>> 4.18.16 and 'time' showed:
> > >>>>>>>>     real    0m0.300s
> > >>>>>>>>     user    0m0.004s
> > >>>>>>>>     sys    0m0.007s
> > >>>>>>>>
> > >>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> > >>>>>>>>
> > >>>>>>>> dmesg XID:
> > >>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> > >>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> > >>>>>>>>
> > >>>>>>>> # lspci -vv
> > >>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> > >>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> > >>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> > >>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > >>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> > >>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > >>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> > >>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
> > >>>>>>>>     Interrupt: pin A routed to IRQ 19
> > >>>>>>>>     Region 0: I/O ports at d000 [size=256]
> > >>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> > >>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> > >>>>>>>>     Capabilities: [40] Power Management version 3
> > >>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> > >>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> > >>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > >>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> > >>>>>>>>         Address: 0000000000000000  Data: 0000
> > >>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
> > >>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> > >>>>>>>> <512ns, L1 <64us
> > >>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> > >>>>>>>> SlotPowerLimit 10.000W
> > >>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > >>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > >>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
> > >>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> > >>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> > >>>>>>>> Latency L0s unlimited, L1 <64us
> > >>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> > >>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> > >>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> > >>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
> > >>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > >>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> > >>>>>>>> OBFF Via message/WAKE#
> > >>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> > >>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> > >>>>>>>> OBFF Disabled
> > >>>>>>>>              AtomicOpsCtl: ReqEn-
> > >>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> > >>>>>>>>              Transmit Margin: Normal Operating Range,
> > >>>>>>>> EnterModifiedCompliance- ComplianceSOS-
> > >>>>>>>>              Compliance De-emphasis: -6dB
> > >>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
> > >>>>>>>> EqualizationComplete-, EqualizationPhase1-
> > >>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> > >>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> > >>>>>>>>         Vector table: BAR=4 offset=00000000
> > >>>>>>>>         PBA: BAR=4 offset=00000800
> > >>>>>>>>     Capabilities: [d0] Vital Product Data
> > >>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> > >>>>>>>>         Not readable
> > >>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
> > >>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> > >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > >>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> > >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > >>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> > >>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> > >>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> > >>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> > >>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> > >>>>>>>> ECRCChkCap+ ECRCChkEn-
> > >>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> > >>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
> > >>>>>>>>     Capabilities: [140 v1] Virtual Channel
> > >>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
> > >>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
> > >>>>>>>>         Ctrl:    ArbSelect=Fixed
> > >>>>>>>>         Status:    InProgress-
> > >>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > >>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> > >>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> > >>>>>>>>             Status:    NegoPending- InProgress-
> > >>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> > >>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
> > >>>>>>>>         Max snoop latency: 71680ns
> > >>>>>>>>         Max no snoop latency: 71680ns
> > >>>>>>>>     Kernel driver in use: r8169
> > >>>>>>>>     Kernel modules: r8169
> > >>>>>>>>
> > >>>>>>>> Please let me know if you have any other ideas in terms of testing.
> > >>>>>>>>
> > >>>>>>>> Thanks!
> > >>>>>>>>
> > >>>>>>>> Peter.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> I have been experiencing very poor network performance since Kernel
> > >>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> > >>>>>>>>>>
> > >>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> > >>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> > >>>>>>>>>> 4.20.4 & 4.19.18).
> > >>>>>>>>>>
> > >>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> > >>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> > >>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> > >>>>>>>>>> differ in that I still have a network connection. I have attempted to
> > >>>>>>>>>> reload the driver on a running system, but this does not improve the
> > >>>>>>>>>> situation.
> > >>>>>>>>>>
> > >>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> > >>>>>>>>>>
> > >>>>>>>>>> lshw shows:
> > >>>>>>>>>>        description: Ethernet interface
> > >>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> > >>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
> > >>>>>>>>>>        physical id: 0
> > >>>>>>>>>>        bus info: pci@0000:03:00.0
> > >>>>>>>>>>        logical name: enp3s0
> > >>>>>>>>>>        version: 0c
> > >>>>>>>>>>        serial:
> > >>>>>>>>>>        size: 1Gbit/s
> > >>>>>>>>>>        capacity: 1Gbit/s
> > >>>>>>>>>>        width: 64 bits
> > >>>>>>>>>>        clock: 33MHz
> > >>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
> > >>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> > >>>>>>>>>> 1000bt-fd autonegotiation
> > >>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
> > >>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> > >>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> > >>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
> > >>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> > >>>>>>>>>>
> > >>>>>>>>>> Kind Regards,
> > >>>>>>>>>>
> > >>>>>>>>>> Peter.
> > >>>>>>>>>>
> > >>>>>>>>> Hi Peter,
> > >>>>>>>>>
> > >>>>>>>>> the description "poor network performance" is quite vague, therefore:
> > >>>>>>>>>
> > >>>>>>>>> - Can you provide any measurements?
> > >>>>>>>>> - iperf results before and after
> > >>>>>>>>> - statistics about dropped packets (rx and/or tx)
> > >>>>>>>>> - Do you use jumbo packets?
> > >>>>>>>>>
> > >>>>>>>>> Also help would be a "lspci -vv" output for the network card and
> > >>>>>>>>> the dmesg output line with the chip XID.
> > >>>>>>>>>
> > >>>>>>>>> Heiner
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> >
> >

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox