Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 4/6] dt-bindings: can: m_can: Document transceiver implementation as a phy
From: Faiz Abbas @ 2018-11-02 19:26 UTC (permalink / raw)
  To: linux-kernel, devicetree, netdev, linux-can
  Cc: wg, mkl, robh+dt, mark.rutland, kishon, faiz_abbas
In-Reply-To: <20181102192616.28291-1-faiz_abbas@ti.com>

Some transceivers need a configuration step (for example, pulling
a standby line low) for them to start sending messages. Instead of a
parent child relationship, the transceiver can be implemented as a
phy with the configuration done in the phy driver. The bitrate
limitations can then be obtained by the driver using said phy
node.

Document the above implementation.

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
---
 .../devicetree/bindings/net/can/m_can.txt     | 24 ++++++++++++-------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/can/m_can.txt b/Documentation/devicetree/bindings/net/can/m_can.txt
index ed614383af9c..c11548e74278 100644
--- a/Documentation/devicetree/bindings/net/can/m_can.txt
+++ b/Documentation/devicetree/bindings/net/can/m_can.txt
@@ -42,12 +42,14 @@ Required properties:
 
 			  Please refer to 2.4.1 Message RAM Configuration in
 			  Bosch M_CAN user manual for details.
+Optional properties:
+- phy-names		: must be "can_transceiver". The transceiver
+			  typically limits the maximum bitrate of the
+			  system. The phy node is used to discover this
+			  limitation.
+- phys			: phandle to the transceiver implemented as a phy
+			  node.
 
-Optional Subnode:
-- can-transceiver	: Can-transceiver subnode describing maximum speed
-			  that can be used for CAN/CAN-FD modes. See
-			  Documentation/devicetree/bindings/net/can/can-transceiver.txt
-			  for details.
 Example:
 SoC dtsi:
 m_can1: can@20e8000 {
@@ -64,12 +66,18 @@ m_can1: can@20e8000 {
 };
 
 Board dts:
+
 &m_can1 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&pinctrl_m_can1>;
 	status = "enabled";
+	phy-names = "can_transceiver";
+	phys = <&transceiver1>;
+};
 
-	can-transceiver {
-		max-bitrate = <5000000>;
-	};
+transceiver1: can-transceiver {
+	compatible = "simple-phy";
+	max-bitrate = <5000000>;
+	pwr-supply = <&transceiver1_fixed>;
+	#phy-cells = <0>;
 };
-- 
2.18.0

^ permalink raw reply related

* [PATCH 6/6] can: m_can: Add support for transceiver as phy
From: Faiz Abbas @ 2018-11-02 19:26 UTC (permalink / raw)
  To: linux-kernel, devicetree, netdev, linux-can
  Cc: wg, mkl, robh+dt, mark.rutland, kishon, faiz_abbas
In-Reply-To: <20181102192616.28291-1-faiz_abbas@ti.com>

Add support for implementing the transceiver device as a phy. Instead
of a child node, the max_bitrate information is obtained by getting a
phy attribute. Fallback to the legacy API if no phy node is found.

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
---
 drivers/net/can/m_can/m_can.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 9b449400376b..ed6d84acea20 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -26,6 +26,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/iopoll.h>
 #include <linux/can/dev.h>
+#include <linux/phy/phy.h>
 #include <linux/pinctrl/consumer.h>
 
 /* napi related */
@@ -1582,6 +1583,7 @@ static int m_can_plat_probe(struct platform_device *pdev)
 	struct device_node *np;
 	u32 mram_config_vals[MRAM_CFG_LEN];
 	u32 tx_fifo_size;
+	struct phy *transceiver;
 
 	np = pdev->dev.of_node;
 
@@ -1671,7 +1673,26 @@ static int m_can_plat_probe(struct platform_device *pdev)
 
 	devm_can_led_init(dev);
 
-	of_can_transceiver(dev);
+	transceiver = devm_phy_optional_get(&pdev->dev, "can_transceiver");
+	if (IS_ERR(transceiver)) {
+		ret = PTR_ERR(transceiver);
+		dev_err(&pdev->dev, "Couldn't get phy. err=%d\n", ret);
+		return ret;
+	}
+
+	if (!transceiver) {
+		dev_warn(&pdev->dev, "No transceiver phy. Falling back to legacy API\n");
+		of_can_transceiver(dev);
+	} else {
+		ret = phy_power_on(transceiver);
+		if (ret) {
+			dev_err(&pdev->dev, "phy_power_on err:%d\n", ret);
+			return ret;
+		}
+		priv->can.bitrate_max = phy_get_max_bitrate(transceiver);
+		if (!priv->can.bitrate_max)
+			dev_warn(&pdev->dev, "Invalid value for transceiver max bitrate. Ignoring bitrate limit\n");
+	}
 
 	dev_info(&pdev->dev, "%s device registered (irq=%d, version=%d)\n",
 		 KBUILD_MODNAME, dev->irq, priv->version);
-- 
2.18.0

^ permalink raw reply related

* Re: [PATCH bpf 2/3] bpf: show real jited address in bpf_prog_info->jited_ksyms
From: Daniel Borkmann @ 2018-11-02 10:19 UTC (permalink / raw)
  To: Song Liu, netdev; +Cc: kernel-team, ast, sandipan
In-Reply-To: <e779db0e-7933-c11b-5068-d919897d15ec@iogearbox.net>

On 11/02/2018 11:09 AM, Daniel Borkmann wrote:
> On 11/01/2018 08:00 AM, Song Liu wrote:
>> Currently, jited_ksyms in bpf_prog_info shows page addresses of jited
>> bpf program. This is not ideal for detailed profiling (find hot
>> instructions from stack traces). This patch replaces the page address
>> with real prog start address.
>>
>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> ---
>>  kernel/bpf/syscall.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index ccb93277aae2..34a9eef5992c 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -2172,7 +2172,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>>  			user_ksyms = u64_to_user_ptr(info.jited_ksyms);
>>  			for (i = 0; i < ulen; i++) {
>>  				ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
>> -				ksym_addr &= PAGE_MASK;
> 
> Note that the masking was done on purpose here and in patch 1/3 in order to
> not expose randomized start address to kallsyms at least. I suppose it's
> okay to change it here and for kallsyms given bpf_prog_get_info_by_fd() dump
> is for root only, and in each of the two cases we additionally apply
> kallsyms_show_value() logic, so for unpriv this is zeroed out plus only root
> loaded programs are added under kallsyms (capable(CAP_SYS_ADMIN)) anyway.

(Btw, something like above should have been in changelog to provide some more
historical context of why we used to do it like that and explaining why it is
okay to change it this way.)

>>  				if (put_user((u64) ksym_addr, &user_ksyms[i]))
>>  					return -EFAULT;
>>  			}
>>
> 

^ permalink raw reply

* [PATCH bpf] bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unpriv
From: Daniel Borkmann @ 2018-11-02 10:35 UTC (permalink / raw)
  To: ast; +Cc: netdev, Daniel Borkmann, Sandipan Das, Song Liu

While dbecd7388476 ("bpf: get kernel symbol addresses via syscall")
zeroed info.nr_jited_ksyms in bpf_prog_get_info_by_fd() for queries
from unprivileged users, commit 815581c11cc2 ("bpf: get JITed image
lengths of functions via syscall") forgot about doing so and therefore
returns the #elems of the user set up buffer which is incorrect. It
also needs to indicate a info.nr_jited_func_lens of zero.

Fixes: 815581c11cc2 ("bpf: get JITed image lengths of functions via syscall")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
Cc: Song Liu <songliubraving@fb.com>
---
 kernel/bpf/syscall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ccb9327..1ea5ce1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2078,6 +2078,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 		info.jited_prog_len = 0;
 		info.xlated_prog_len = 0;
 		info.nr_jited_ksyms = 0;
+		info.nr_jited_func_lens = 0;
 		goto done;
 	}

-- 
2.9.5

^ permalink raw reply related

* Re: [PATCH bpf] bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unpriv
From: Sandipan Das @ 2018-11-02 10:55 UTC (permalink / raw)
  To: Daniel Borkmann, ast; +Cc: netdev, Sandipan Das, Song Liu
In-Reply-To: <20181102103546.4499-1-daniel@iogearbox.net>



On 02/11/18 4:05 PM, Daniel Borkmann wrote:
> While dbecd7388476 ("bpf: get kernel symbol addresses via syscall")
> zeroed info.nr_jited_ksyms in bpf_prog_get_info_by_fd() for queries
> from unprivileged users, commit 815581c11cc2 ("bpf: get JITed image
> lengths of functions via syscall") forgot about doing so and therefore
> returns the #elems of the user set up buffer which is incorrect. It
> also needs to indicate a info.nr_jited_func_lens of zero.
> 
> Fixes: 815581c11cc2 ("bpf: get JITed image lengths of functions via syscall")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
> Cc: Song Liu <songliubraving@fb.com>
> ---
>  kernel/bpf/syscall.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ccb9327..1ea5ce1 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2078,6 +2078,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  		info.jited_prog_len = 0;
>  		info.xlated_prog_len = 0;
>  		info.nr_jited_ksyms = 0;
> +		info.nr_jited_func_lens = 0;
>  		goto done;
>  	}
> 

Looks good. Thanks for fixing this. 

- Sandipan

^ permalink raw reply

* [PATCH iproute2] Fix warning in tc-skbprio.8 manpage
From: Luca Boccassi @ 2018-11-02 10:57 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern

". If" gets interpreted as a macro, so move the period to the previous
line:

  33: warning: macro `If' not defined

Fixes: 141b55f8544e ("Add SKB Priority qdisc support in tc(8)")

Signed-off-by: Luca Boccassi <bluca@debian.org>
---
 man/man8/tc-skbprio.8 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/man/man8/tc-skbprio.8 b/man/man8/tc-skbprio.8
index 844bbf46..8b259839 100644
--- a/man/man8/tc-skbprio.8
+++ b/man/man8/tc-skbprio.8
@@ -29,8 +29,8 @@ While SKB Priority Queue is agnostic to how
 .B skb->priority
 is assigned. A typical use case is to copy
 the 6-bit DS field of IPv4 and IPv6 packets using
-.BR tc-skbedit (8)
-. If
+.BR tc-skbedit (8) .
+If
 .B skb->priority
 is greater or equal to 64, the priority is assumed to be 63.
 Priorities less than 64 are taken at face value.
-- 
2.19.1

^ permalink raw reply related

* Re: [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps
From: Quentin Monnet @ 2018-11-02 11:04 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, oss-drivers
In-Reply-To: <62ba0595-8b04-3ad4-32cd-47829b503f6b@iogearbox.net>

2018-11-02 10:08 UTC+0100 ~ Daniel Borkmann <daniel@iogearbox.net>
> On 11/01/2018 06:18 PM, Quentin Monnet wrote:
>> 2018-10-30 15:23 UTC+0000 ~ Quentin Monnet <quentin.monnet@netronome.com>
>>> The limit for memory locked in the kernel by a process is usually set to
>>> 64 bytes by default. This can be an issue when creating large BPF maps.
>>> A workaround is to raise this limit for the current process before
>>> trying to create a new BPF map. Changing the hard limit requires the
>>> CAP_SYS_RESOURCE and can usually only be done by root user (but then
>>> only root can create BPF maps).
>>
>> Sorry, the parenthesis is not correct: non-root users can in fact create
>> BPF maps as well. If a non-root user calls the function to create a map,
>> setrlimit() will fail silently (but set errno), and the program will
>> simply go on with its rlimit unchanged.
>>
>>> As far as I know there is not API to get the current amount of memory
>>> locked for a user, therefore we cannot raise the limit only when
>>> required. One solution, used by bcc, is to try to create the map, and on
>>> getting a EPERM error, raising the limit to infinity before giving
>>> another try. Another approach, used in iproute, is to raise the limit in
>>> all cases, before trying to create the map.
>>>
>>> Here we do the same as in iproute2: the rlimit is raised to infinity
>>> before trying to load the map.
>>>
>>> I send this patch as a RFC to see if people would prefer the bcc
>>> approach instead, or the rlimit change to be in bpftool rather than in
>>> libbpf.
> 
> I'd avoid doing something like this in a generic library; it's basically an
> ugly hack for the kind of accounting we're doing and only shows that while
> this was "good enough" to start off with in the early days, we should be
> doing something better today if every application raises it to inf anyway
> then it's broken. :) It just shows that this missed its purpose. Similarly
> to the jit_limit discussion on rlimit, perhaps we should be considering
> switching to something else entirely from kernel side. Could be something
> like memcg but this definitely needs some more evaluation first.

Changing the way limitations are enforced sounds like a cleaner
long-term approach indeed.

> (Meanwhile
> I'd not change the lib but callers instead and once we have something better
> in place we remove this type of "raising to inf" from the tree ...)

Understood, for the time beeing I'll repost a patch adding the
modification to bpftool once bpf-next is open.

Thanks Daniel!
Quentin

^ permalink raw reply

* RE: [PATCH] arm64: dts: stratix10: fix multicast filtering
From: Koskinen, Aaro (Nokia - FI/Espoo) @ 2018-11-02 11:04 UTC (permalink / raw)
  To: Dinh Nguyen, Thor Thayer, Rob Herring, Mark Rutland
  Cc: devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20181102105355.18654-1-aaro.koskinen@nokia.com>

Hi,

Please don't apply this - looks like the company SMTP server mangled the From
header. I'll try to send v2 with a workaround.

A.

^ permalink raw reply

* Re: Kernel 4.19 network performance - forwarding/routing normal users traffic
From: Jesper Dangaard Brouer @ 2018-11-02 11:40 UTC (permalink / raw)
  To: Aaron Lu
  Cc: Saeed Mahameed, pstaszewski@itcare.pl, eric.dumazet@gmail.com,
	netdev@vger.kernel.org, Tariq Toukan, ilias.apalodimas@linaro.org,
	yoel@kviknet.dk, mgorman@techsingularity.net, brouer
In-Reply-To: <20181102052356.GA17587@intel.com>

On Fri, 2 Nov 2018 13:23:56 +0800
Aaron Lu <aaron.lu@intel.com> wrote:

> On Thu, Nov 01, 2018 at 08:23:19PM +0000, Saeed Mahameed wrote:
> > On Thu, 2018-11-01 at 23:27 +0800, Aaron Lu wrote:  
> > > On Thu, Nov 01, 2018 at 10:22:13AM +0100, Jesper Dangaard Brouer
> > > wrote:
> > > ... ...  
> > > > Section copied out:
> > > > 
> > > >   mlx5e_poll_tx_cq
> > > >   |          
> > > >    --16.34%--napi_consume_skb
> > > >              |          
> > > >              |--12.65%--__free_pages_ok
> > > >              |          |          
> > > >              |           --11.86%--free_one_page
> > > >              |                     |          
> > > >              |                     |--10.10%
> > > > --queued_spin_lock_slowpath
> > > >              |                     |          
> > > >              |                      --0.65%--_raw_spin_lock  
> > > 
> > > This callchain looks like it is freeing higher order pages than order
> > > 0:
> > > __free_pages_ok is only called for pages whose order are bigger than
> > > 0.  
> > 
> > mlx5 rx uses only order 0 pages, so i don't know where these high order
> > tx SKBs are coming from..   
> 
> Perhaps here:
> __netdev_alloc_skb(), __napi_alloc_skb(), __netdev_alloc_frag() and
> __napi_alloc_frag() will all call page_frag_alloc(), which will use
> __page_frag_cache_refill() to get an order 3 page if possible, or fall
> back to an order 0 page if order 3 page is not available.
> 
> I'm not sure if your workload will use the above code path though.

TL;DR: this is order-0 pages (code-walk trough proof below)

To Aaron, the network stack *can* call __free_pages_ok() with order-0
pages, via:

static void skb_free_head(struct sk_buff *skb)
{
	unsigned char *head = skb->head;

	if (skb->head_frag)
		skb_free_frag(head);
	else
		kfree(head);
}

static inline void skb_free_frag(void *addr)
{
	page_frag_free(addr);
}

/*
 * Frees a page fragment allocated out of either a compound or order 0 page.
 */
void page_frag_free(void *addr)
{
	struct page *page = virt_to_head_page(addr);

	if (unlikely(put_page_testzero(page)))
		__free_pages_ok(page, compound_order(page));
}
EXPORT_SYMBOL(page_frag_free);


Notice for the mlx5 driver it support several RX-memory models, so it
can be hard to follow, but from the perf report output we can see that
is uses mlx5e_skb_from_cqe_linear, which use build_skb.

--13.63%--mlx5e_skb_from_cqe_linear
          |          
           --5.02%--build_skb
                     |          
                      --1.85%--__build_skb
                                |          
                                 --1.00%--kmem_cache_alloc

/* build_skb() is wrapper over __build_skb(), that specifically
 * takes care of skb->head and skb->pfmemalloc
 * This means that if @frag_size is not zero, then @data must be backed
 * by a page fragment, not kmalloc() or vmalloc()
 */
struct sk_buff *build_skb(void *data, unsigned int frag_size)
{
	struct sk_buff *skb = __build_skb(data, frag_size);

	if (skb && frag_size) {
		skb->head_frag = 1;
		if (page_is_pfmemalloc(virt_to_head_page(data)))
			skb->pfmemalloc = 1;
	}
	return skb;
}
EXPORT_SYMBOL(build_skb);

It still doesn't prove, that the @data is backed by by a order-0 page.
For the mlx5 driver is uses mlx5e_page_alloc_mapped ->
page_pool_dev_alloc_pages(), and I can see perf report using
__page_pool_alloc_pages_slow().

The setup for page_pool in mlx5 uses order=0.

	/* Create a page_pool and register it with rxq */
	pp_params.order     = 0;
	pp_params.flags     = 0; /* No-internal DMA mapping in page_pool */
	pp_params.pool_size = pool_size;
	pp_params.nid       = cpu_to_node(c->cpu);
	pp_params.dev       = c->pdev;
	pp_params.dma_dir   = rq->buff.map_dir;

	/* page_pool can be used even when there is no rq->xdp_prog,
	 * given page_pool does not handle DMA mapping there is no
	 * required state to clear. And page_pool gracefully handle
	 * elevated refcnt.
	 */
	rq->page_pool = page_pool_create(&pp_params);
	if (IS_ERR(rq->page_pool)) {
		err = PTR_ERR(rq->page_pool);
		rq->page_pool = NULL;
		goto err_free;
	}
	err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
					 MEM_TYPE_PAGE_POOL, rq->page_pool);

 
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH] net/mlx5e: fix high stack usage
From: Saeed Mahameed @ 2018-11-02 21:05 UTC (permalink / raw)
  To: davem@davemloft.net, arnd@arndb.de, leon@kernel.org
  Cc: linux-kernel@vger.kernel.org, Moshe Shemesh,
	linux-rdma@vger.kernel.org, Boris Pismenny, Tariq Toukan,
	akpm@linux-foundation.org, Eran Ben Elisha,
	netdev@vger.kernel.org, Ilya Lesokhin, Kamal Heib
In-Reply-To: <20181102153316.1492515-1-arnd@arndb.de>

On Fri, 2018-11-02 at 16:33 +0100, Arnd Bergmann wrote:
> A patch that looks harmless causes the stack usage of the
> mlx5e_grp_sw_update_stats()
> function to drastically increase with x86 gcc-4.9 and higher (tested
> up to 8.1):
> 
> drivers/net/ethernet/mellanox/mlx5/core/en_stats.c: In function
> ‘mlx5e_grp_sw_update_stats’:
> drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:216:1: warning:
> the frame size of 1276 bytes is larger than 500 bytes [-Wframe-
> larger-than=]
> 
> By splitting out the loop body into a non-inlined function, the stack
> size goes
> back down to under 500 bytes.
> 
> Fixes: 779d986d60de ("net/mlx5e: Do not ignore netdevice TX/RX queues
> number")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  .../ethernet/mellanox/mlx5/core/en_stats.c    | 168 +++++++++-------
> --
>  1 file changed, 86 insertions(+), 82 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> index 1e55b9c27ffc..c270206f3475 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> @@ -126,93 +126,97 @@ static int mlx5e_grp_sw_fill_stats(struct
> mlx5e_priv *priv, u64 *data, int idx)
>  	return idx;
>  }
>  
> +static noinline_for_stack void
> +mlx5e_grp_sw_collect_stat(struct mlx5e_priv *priv, struct
> mlx5e_sw_stats *s, int i)
> +{
> +	struct mlx5e_channel_stats *channel_stats = &priv-
> >channel_stats[i];
> +	struct mlx5e_xdpsq_stats *xdpsq_red_stats = &channel_stats-
> >xdpsq;
> +	struct mlx5e_xdpsq_stats *xdpsq_stats = &channel_stats-
> >rq_xdpsq;
> +	struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
> +	struct mlx5e_ch_stats *ch_stats = &channel_stats->ch;
> +	int j;
> +
> +	s->rx_packets	+= rq_stats->packets;
> +	s->rx_bytes	+= rq_stats->bytes;
> +	s->rx_lro_packets += rq_stats->lro_packets;
> +	s->rx_lro_bytes	+= rq_stats->lro_bytes;
> +	s->rx_ecn_mark	+= rq_stats->ecn_mark;
> +	s->rx_removed_vlan_packets += rq_stats->removed_vlan_packets;
> +	s->rx_csum_none	+= rq_stats->csum_none;
> +	s->rx_csum_complete += rq_stats->csum_complete;
> +	s->rx_csum_unnecessary += rq_stats->csum_unnecessary;
> +	s->rx_csum_unnecessary_inner += rq_stats-
> >csum_unnecessary_inner;
> +	s->rx_xdp_drop     += rq_stats->xdp_drop;
> +	s->rx_xdp_redirect += rq_stats->xdp_redirect;
> +	s->rx_xdp_tx_xmit  += xdpsq_stats->xmit;
> +	s->rx_xdp_tx_full  += xdpsq_stats->full;
> +	s->rx_xdp_tx_err   += xdpsq_stats->err;
> +	s->rx_xdp_tx_cqe   += xdpsq_stats->cqes;
> +	s->rx_wqe_err   += rq_stats->wqe_err;
> +	s->rx_mpwqe_filler_cqes    += rq_stats->mpwqe_filler_cqes;
> +	s->rx_mpwqe_filler_strides += rq_stats->mpwqe_filler_strides;
> +	s->rx_buff_alloc_err += rq_stats->buff_alloc_err;
> +	s->rx_cqe_compress_blks += rq_stats->cqe_compress_blks;
> +	s->rx_cqe_compress_pkts += rq_stats->cqe_compress_pkts;
> +	s->rx_page_reuse  += rq_stats->page_reuse;
> +	s->rx_cache_reuse += rq_stats->cache_reuse;
> +	s->rx_cache_full  += rq_stats->cache_full;
> +	s->rx_cache_empty += rq_stats->cache_empty;
> +	s->rx_cache_busy  += rq_stats->cache_busy;
> +	s->rx_cache_waive += rq_stats->cache_waive;
> +	s->rx_congst_umr  += rq_stats->congst_umr;
> +	s->rx_arfs_err    += rq_stats->arfs_err;
> +	s->ch_events      += ch_stats->events;
> +	s->ch_poll        += ch_stats->poll;
> +	s->ch_arm         += ch_stats->arm;
> +	s->ch_aff_change  += ch_stats->aff_change;
> +	s->ch_eq_rearm    += ch_stats->eq_rearm;
> +	/* xdp redirect */
> +	s->tx_xdp_xmit    += xdpsq_red_stats->xmit;
> +	s->tx_xdp_full    += xdpsq_red_stats->full;
> +	s->tx_xdp_err     += xdpsq_red_stats->err;
> +	s->tx_xdp_cqes    += xdpsq_red_stats->cqes;
> +
> +	for (j = 0; j < priv->max_opened_tc; j++) {
> +		struct mlx5e_sq_stats *sq_stats = &channel_stats-
> >sq[j];
> +
> +		s->tx_packets		+= sq_stats->packets;
> +		s->tx_bytes		+= sq_stats->bytes;
> +		s->tx_tso_packets	+= sq_stats->tso_packets;
> +		s->tx_tso_bytes		+= sq_stats->tso_bytes;
> +		s->tx_tso_inner_packets	+= sq_stats-
> >tso_inner_packets;
> +		s->tx_tso_inner_bytes	+= sq_stats->tso_inner_bytes;
> +		s->tx_added_vlan_packets += sq_stats-
> >added_vlan_packets;
> +		s->tx_nop               += sq_stats->nop;
> +		s->tx_queue_stopped	+= sq_stats->stopped;
> +		s->tx_queue_wake	+= sq_stats->wake;
> +		s->tx_udp_seg_rem	+= sq_stats->udp_seg_rem;
> +		s->tx_queue_dropped	+= sq_stats->dropped;
> +		s->tx_cqe_err		+= sq_stats->cqe_err;
> +		s->tx_recover		+= sq_stats->recover;
> +		s->tx_xmit_more		+= sq_stats->xmit_more;
> +		s->tx_csum_partial_inner += sq_stats-
> >csum_partial_inner;
> +		s->tx_csum_none		+= sq_stats->csum_none;
> +		s->tx_csum_partial	+= sq_stats->csum_partial;
> +#ifdef CONFIG_MLX5_EN_TLS
> +		s->tx_tls_ooo		+= sq_stats->tls_ooo;
> +		s->tx_tls_resync_bytes	+= sq_stats-
> >tls_resync_bytes;
> +#endif
> +		s->tx_cqes		+= sq_stats->cqes;
> +	}
> +}
> +
>  void mlx5e_grp_sw_update_stats(struct mlx5e_priv *priv)
>  {
> -	struct mlx5e_sw_stats temp, *s = &temp;

temp will be mem copied to priv->stats.sw at the end,
memcpy(&priv->stats.sw, &s, sizeof(s)); 

one other way to solve this as suggested by Andrew, is to get rid of
the temp var and make it point directly to priv->stats.sw 


> +	struct mlx5e_sw_stats s;
>  	int i;
>  
> -	memset(s, 0, sizeof(*s));
> -
> -	for (i = 0; i < mlx5e_get_netdev_max_channels(priv->netdev);
> i++) {
> -		struct mlx5e_channel_stats *channel_stats =
> -			&priv->channel_stats[i];
> -		struct mlx5e_xdpsq_stats *xdpsq_red_stats =
> &channel_stats->xdpsq;
> -		struct mlx5e_xdpsq_stats *xdpsq_stats = &channel_stats-
> >rq_xdpsq;
> -		struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
> -		struct mlx5e_ch_stats *ch_stats = &channel_stats->ch;
> -		int j;
> -
> -		s->rx_packets	+= rq_stats->packets;
> -		s->rx_bytes	+= rq_stats->bytes;
> -		s->rx_lro_packets += rq_stats->lro_packets;
> -		s->rx_lro_bytes	+= rq_stats->lro_bytes;
> -		s->rx_ecn_mark	+= rq_stats->ecn_mark;
> -		s->rx_removed_vlan_packets += rq_stats-
> >removed_vlan_packets;
> -		s->rx_csum_none	+= rq_stats->csum_none;
> -		s->rx_csum_complete += rq_stats->csum_complete;
> -		s->rx_csum_unnecessary += rq_stats->csum_unnecessary;
> -		s->rx_csum_unnecessary_inner += rq_stats-
> >csum_unnecessary_inner;
> -		s->rx_xdp_drop     += rq_stats->xdp_drop;
> -		s->rx_xdp_redirect += rq_stats->xdp_redirect;
> -		s->rx_xdp_tx_xmit  += xdpsq_stats->xmit;
> -		s->rx_xdp_tx_full  += xdpsq_stats->full;
> -		s->rx_xdp_tx_err   += xdpsq_stats->err;
> -		s->rx_xdp_tx_cqe   += xdpsq_stats->cqes;
> -		s->rx_wqe_err   += rq_stats->wqe_err;
> -		s->rx_mpwqe_filler_cqes    += rq_stats-
> >mpwqe_filler_cqes;
> -		s->rx_mpwqe_filler_strides += rq_stats-
> >mpwqe_filler_strides;
> -		s->rx_buff_alloc_err += rq_stats->buff_alloc_err;
> -		s->rx_cqe_compress_blks += rq_stats->cqe_compress_blks;
> -		s->rx_cqe_compress_pkts += rq_stats->cqe_compress_pkts;
> -		s->rx_page_reuse  += rq_stats->page_reuse;
> -		s->rx_cache_reuse += rq_stats->cache_reuse;
> -		s->rx_cache_full  += rq_stats->cache_full;
> -		s->rx_cache_empty += rq_stats->cache_empty;
> -		s->rx_cache_busy  += rq_stats->cache_busy;
> -		s->rx_cache_waive += rq_stats->cache_waive;
> -		s->rx_congst_umr  += rq_stats->congst_umr;
> -		s->rx_arfs_err    += rq_stats->arfs_err;
> -		s->ch_events      += ch_stats->events;
> -		s->ch_poll        += ch_stats->poll;
> -		s->ch_arm         += ch_stats->arm;
> -		s->ch_aff_change  += ch_stats->aff_change;
> -		s->ch_eq_rearm    += ch_stats->eq_rearm;
> -		/* xdp redirect */
> -		s->tx_xdp_xmit    += xdpsq_red_stats->xmit;
> -		s->tx_xdp_full    += xdpsq_red_stats->full;
> -		s->tx_xdp_err     += xdpsq_red_stats->err;
> -		s->tx_xdp_cqes    += xdpsq_red_stats->cqes;
> -
> -		for (j = 0; j < priv->max_opened_tc; j++) {
> -			struct mlx5e_sq_stats *sq_stats =
> &channel_stats->sq[j];
> -
> -			s->tx_packets		+= sq_stats->packets;
> -			s->tx_bytes		+= sq_stats->bytes;
> -			s->tx_tso_packets	+= sq_stats->tso_packets;
> -			s->tx_tso_bytes		+= sq_stats-
> >tso_bytes;
> -			s->tx_tso_inner_packets	+= sq_stats-
> >tso_inner_packets;
> -			s->tx_tso_inner_bytes	+= sq_stats-
> >tso_inner_bytes;
> -			s->tx_added_vlan_packets += sq_stats-
> >added_vlan_packets;
> -			s->tx_nop               += sq_stats->nop;
> -			s->tx_queue_stopped	+= sq_stats->stopped;
> -			s->tx_queue_wake	+= sq_stats->wake;
> -			s->tx_udp_seg_rem	+= sq_stats->udp_seg_rem;
> -			s->tx_queue_dropped	+= sq_stats->dropped;
> -			s->tx_cqe_err		+= sq_stats->cqe_err;
> -			s->tx_recover		+= sq_stats->recover;
> -			s->tx_xmit_more		+= sq_stats-
> >xmit_more;
> -			s->tx_csum_partial_inner += sq_stats-
> >csum_partial_inner;
> -			s->tx_csum_none		+= sq_stats-
> >csum_none;
> -			s->tx_csum_partial	+= sq_stats-
> >csum_partial;
> -#ifdef CONFIG_MLX5_EN_TLS
> -			s->tx_tls_ooo		+= sq_stats->tls_ooo;
> -			s->tx_tls_resync_bytes	+= sq_stats-
> >tls_resync_bytes;
> -#endif
> -			s->tx_cqes		+= sq_stats->cqes;
> -		}
> -	}
> +	memset(&s, 0, sizeof(s));
> +
> +	for (i = 0; i < mlx5e_get_netdev_max_channels(priv->netdev);
> i++)
> +		mlx5e_grp_sw_collect_stat(priv, &s, i);
>  
> -	memcpy(&priv->stats.sw, s, sizeof(*s));
> +	memcpy(&priv->stats.sw, &s, sizeof(s));
>  }
>  
>  static const struct counter_desc q_stats_desc[] = {

^ permalink raw reply

* Re: [PATCH] net/mlx5e: fix high stack usage
From: Eric Dumazet @ 2018-11-02 21:39 UTC (permalink / raw)
  To: Saeed Mahameed, davem@davemloft.net, arnd@arndb.de,
	leon@kernel.org
  Cc: linux-kernel@vger.kernel.org, Moshe Shemesh,
	linux-rdma@vger.kernel.org, Boris Pismenny, Tariq Toukan,
	akpm@linux-foundation.org, Eran Ben Elisha,
	netdev@vger.kernel.org, Ilya Lesokhin, Kamal Heib
In-Reply-To: <9f214f12ec89020ceb14c1aec25b3a0d968507aa.camel@mellanox.com>



On 11/02/2018 02:05 PM, Saeed Mahameed wrote:
 
> temp will be mem copied to priv->stats.sw at the end,
> memcpy(&priv->stats.sw, &s, sizeof(s)); 
> 
> one other way to solve this as suggested by Andrew, is to get rid of
> the temp var and make it point directly to priv->stats.sw 
> 

What about concurrency ?

This temp variable is there to make sure concurrent readers of stats might
not see mangle data (because another 'reader' just did a memset() and is doing the folding)


mlx5e_get_stats() can definitely be run at the same time by multiple threads.

^ permalink raw reply

* [PATCH iproute2 1/2] testsuite: build generate_nlmsg with QUIET_CC
From: Luca Boccassi @ 2018-11-02 12:35 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern

Follow the standard pattern, and respect user's verbosity setting.

Signed-off-by: Luca Boccassi <bluca@debian.org>
---
 testsuite/tools/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/testsuite/tools/Makefile b/testsuite/tools/Makefile
index e1d9bfef..85df69ec 100644
--- a/testsuite/tools/Makefile
+++ b/testsuite/tools/Makefile
@@ -3,7 +3,7 @@ CFLAGS=
 include ../../config.mk
 
 generate_nlmsg: generate_nlmsg.c ../../lib/libnetlink.c
-	$(CC) $(CPPFLAGS) $(CFLAGS) $(EXTRA_CFLAGS) -I../../include -include../../include/uapi/linux/netlink.h -o $@ $^ -lmnl
+	$(QUIET_CC) $(CPPFLAGS) $(CFLAGS) $(EXTRA_CFLAGS) -I../../include -include../../include/uapi/linux/netlink.h -o $@ $^ -lmnl
 
 clean:
 	rm -f generate_nlmsg
-- 
2.19.1

^ permalink raw reply related

* [PATCH iproute2 2/2] Pass CPPFLAGS to the compiler
From: Luca Boccassi @ 2018-11-02 12:35 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern
In-Reply-To: <20181102123544.13000-1-bluca@debian.org>

When building Debian packages pre-processor flags are passed via
CPPFLAGS, as the convention indicates. Specifically, the hardening
-D_FORTIFY_SOURCE=2 flag is used.
Pass CPPFLAGS to all calls of QUIET_CC together with CFLAGS.

Signed-off-by: Luca Boccassi <bluca@debian.org>
---
 configure     | 2 +-
 misc/Makefile | 8 ++++----
 tc/Makefile   | 8 ++++----
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index c5655978..5df6082b 100755
--- a/configure
+++ b/configure
@@ -436,4 +436,4 @@ check_cap
 
 echo >> $CONFIG
 echo "%.o: %.c" >> $CONFIG
-echo '	$(QUIET_CC)$(CC) $(CFLAGS) $(EXTRA_CFLAGS) -c -o $@ $<' >> $CONFIG
+echo '	$(QUIET_CC)$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(CPPFLAGS) -c -o $@ $<' >> $CONFIG
diff --git a/misc/Makefile b/misc/Makefile
index b2dd6b26..6a849af4 100644
--- a/misc/Makefile
+++ b/misc/Makefile
@@ -16,16 +16,16 @@ ss: $(SSOBJ)
 	$(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
 
 nstat: nstat.c
-	$(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -o nstat nstat.c $(LDLIBS) -lm
+	$(QUIET_CC)$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) -o nstat nstat.c $(LDLIBS) -lm
 
 ifstat: ifstat.c
-	$(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -o ifstat ifstat.c $(LDLIBS) -lm
+	$(QUIET_CC)$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) -o ifstat ifstat.c $(LDLIBS) -lm
 
 rtacct: rtacct.c
-	$(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -o rtacct rtacct.c $(LDLIBS) -lm
+	$(QUIET_CC)$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) -o rtacct rtacct.c $(LDLIBS) -lm
 
 arpd: arpd.c
-	$(QUIET_CC)$(CC) $(CFLAGS) -I$(DBM_INCLUDE) $(LDFLAGS) -o arpd arpd.c $(LDLIBS) -ldb
+	$(QUIET_CC)$(CC) $(CFLAGS) -I$(DBM_INCLUDE) $(CPPFLAGS) $(LDFLAGS) -o arpd arpd.c $(LDLIBS) -ldb
 
 ssfilter.c: ssfilter.y
 	$(QUIET_YACC)bison ssfilter.y -o ssfilter.c
diff --git a/tc/Makefile b/tc/Makefile
index 25a28284..f8010d3c 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -128,7 +128,7 @@ CFLAGS += -DYY_NO_INPUT
 MODDESTDIR := $(DESTDIR)$(LIBDIR)/tc
 
 %.so: %.c
-	$(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -shared -fpic $< -o $@
+	$(QUIET_CC)$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) -shared -fpic $< -o $@
 
 
 all: tc $(TCSO)
@@ -158,13 +158,13 @@ clean:
 	rm -f emp_ematch.yacc.*
 
 q_atm.so: q_atm.c
-	$(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -shared -fpic -o q_atm.so q_atm.c -latm
+	$(QUIET_CC)$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) -shared -fpic -o q_atm.so q_atm.c -latm
 
 m_xt.so: m_xt.c
-	$(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -shared -fpic -o m_xt.so m_xt.c $$($(PKG_CONFIG) xtables --cflags --libs)
+	$(QUIET_CC)$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) -shared -fpic -o m_xt.so m_xt.c $$($(PKG_CONFIG) xtables --cflags --libs)
 
 m_xt_old.so: m_xt_old.c
-	$(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -shared -fpic -o m_xt_old.so m_xt_old.c $$($(PKG_CONFIG) xtables --cflags --libs)
+	$(QUIET_CC)$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) -shared -fpic -o m_xt_old.so m_xt_old.c $$($(PKG_CONFIG) xtables --cflags --libs)
 
 em_ipset.o: CFLAGS += $$($(PKG_CONFIG) xtables --cflags)
 
-- 
2.19.1

^ permalink raw reply related

* Re: [PATCH] net/mlx5e: fix high stack usage
From: Saeed Mahameed @ 2018-11-02 22:07 UTC (permalink / raw)
  To: eric.dumazet@gmail.com, davem@davemloft.net, arnd@arndb.de,
	leon@kernel.org
  Cc: linux-kernel@vger.kernel.org, Moshe Shemesh,
	linux-rdma@vger.kernel.org, Boris Pismenny, Tariq Toukan,
	akpm@linux-foundation.org, Eran Ben Elisha,
	netdev@vger.kernel.org, Ilya Lesokhin, Kamal Heib
In-Reply-To: <2727f37b-1742-5532-317e-3be8a984266b@gmail.com>

On Fri, 2018-11-02 at 14:39 -0700, Eric Dumazet wrote:
> 
> On 11/02/2018 02:05 PM, Saeed Mahameed wrote:
>  
> > temp will be mem copied to priv->stats.sw at the end,
> > memcpy(&priv->stats.sw, &s, sizeof(s)); 
> > 
> > one other way to solve this as suggested by Andrew, is to get rid
> > of
> > the temp var and make it point directly to priv->stats.sw 
> > 
> 
> What about concurrency ?
> 
> This temp variable is there to make sure concurrent readers of stats
> might
> not see mangle data (because another 'reader' just did a memset() and
> is doing the folding)
> 
> 
> mlx5e_get_stats() can definitely be run at the same time by multiple
> threads.
> 

hmm, you are right, i was thinking that mlx5e_get_Stats will trigger a
work to update stats and grab the state_lock, but for sw stats this is
not the case it is done in place.

BTW memcpy itself is not thread safe.

^ permalink raw reply

* Re: [PATCH] net/mlx5e: fix high stack usage
From: Arnd Bergmann @ 2018-11-02 22:15 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: eric.dumazet@gmail.com, davem@davemloft.net, leon@kernel.org,
	linux-kernel@vger.kernel.org, Moshe Shemesh,
	linux-rdma@vger.kernel.org, Boris Pismenny, Tariq Toukan,
	akpm@linux-foundation.org, Eran Ben Elisha,
	netdev@vger.kernel.org, Ilya Lesokhin, Kamal Heib
In-Reply-To: <4ebfc3aeb5335a1f671602f9a906f948c39a30da.camel@mellanox.com>

On 11/2/18, Saeed Mahameed <saeedm@mellanox.com> wrote:
> On Fri, 2018-11-02 at 14:39 -0700, Eric Dumazet wrote:
>>
>> On 11/02/2018 02:05 PM, Saeed Mahameed wrote:
>>
>> > temp will be mem copied to priv->stats.sw at the end,
>> > memcpy(&priv->stats.sw, &s, sizeof(s));
>> >
>> > one other way to solve this as suggested by Andrew, is to get rid
>> > of
>> > the temp var and make it point directly to priv->stats.sw
>> >
>>
>> What about concurrency ?
>>
>> This temp variable is there to make sure concurrent readers of stats
>> might
>> not see mangle data (because another 'reader' just did a memset() and
>> is doing the folding)
>>
>>
>> mlx5e_get_stats() can definitely be run at the same time by multiple
>> threads.
>>
>
> hmm, you are right, i was thinking that mlx5e_get_Stats will trigger a
> work to update stats and grab the state_lock, but for sw stats this is
> not the case it is done in place.
>
> BTW memcpy itself is not thread safe.

Before commit 6c63efe4cfab ("net/mlx5e: Remove redundant active_channels
indication"), there was a read_lock() in the function apparently intended to
made it thread safe. This got removed with the comment

commit 6c63efe4cfabf230a8ed4b1d880249875ffdac13
Author: Eran Ben Elisha <eranbe@mellanox.com>
Date:   Tue May 29 11:06:31 2018 +0300

    net/mlx5e: Remove redundant active_channels indication

    Now, when all channels stats are saved regardless of the channel's state
    {open, closed}, we can safely remove this indication and the stats spin
    lock which protects it.

    Fixes: 76c3810bade3 ("net/mlx5e: Avoid reset netdev stats on
configuration changes")

I don't really understand the reasoning, but maybe we can remove
the memcpy() if the code is thread safe, or we need the lock back if it's not.

         Arnd

^ permalink raw reply

* Re: [PATCH] net/mlx5e: fix high stack usage
From: Jason Gunthorpe @ 2018-11-02 22:15 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: eric.dumazet@gmail.com, davem@davemloft.net, arnd@arndb.de,
	leon@kernel.org, linux-kernel@vger.kernel.org, Moshe Shemesh,
	linux-rdma@vger.kernel.org, Boris Pismenny, Tariq Toukan,
	akpm@linux-foundation.org, Eran Ben Elisha,
	netdev@vger.kernel.org, Ilya Lesokhin, Kamal Heib
In-Reply-To: <4ebfc3aeb5335a1f671602f9a906f948c39a30da.camel@mellanox.com>

On Fri, Nov 02, 2018 at 10:07:02PM +0000, Saeed Mahameed wrote:
> On Fri, 2018-11-02 at 14:39 -0700, Eric Dumazet wrote:
> > 
> > On 11/02/2018 02:05 PM, Saeed Mahameed wrote:
> >  
> > > temp will be mem copied to priv->stats.sw at the end,
> > > memcpy(&priv->stats.sw, &s, sizeof(s)); 
> > > 
> > > one other way to solve this as suggested by Andrew, is to get rid
> > > of
> > > the temp var and make it point directly to priv->stats.sw 
> > > 
> > 
> > What about concurrency ?
> > 
> > This temp variable is there to make sure concurrent readers of stats
> > might
> > not see mangle data (because another 'reader' just did a memset() and
> > is doing the folding)
> > 
> > 
> > mlx5e_get_stats() can definitely be run at the same time by multiple
> > threads.
> > 
> 
> hmm, you are right, i was thinking that mlx5e_get_Stats will trigger a
> work to update stats and grab the state_lock, but for sw stats this is
> not the case it is done in place.

That was my guess when I saw this.. the confusing bit is why is there
s and temp, why not just s?

> BTW memcpy itself is not thread safe.

At least on 64 bit memcpy will do > 8 byte stores when copying so on
most architectures it will cause individual new or old u64 to be
returned and not a mess..

32 bit will always make a mess.

If the stats don't update that often then kmalloc'ing a new buffer and
RCU'ing it into view might be a reasonable alternative to this?

Jason

^ permalink raw reply

* [PATCH iproute2] man: ss.8: break and indent long line
From: Luca Boccassi @ 2018-11-02 13:27 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern

Fixes groff warning:
  ss.8 92: warning [p 2, 2.8i]: can't break line

And makes the line also more readable.

Signed-off-by: Luca Boccassi <bluca@debian.org>
---
This line is the example of an output, so you might not want to break it.
Let me know if you prefer a different solution.

 man/man8/ss.8 | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/man/man8/ss.8 b/man/man8/ss.8
index 7a6572b1..699a1271 100644
--- a/man/man8/ss.8
+++ b/man/man8/ss.8
@@ -89,7 +89,13 @@ an uuid of the socket
 Show socket memory usage. The output format is:
 .RS
 .P
-skmem:(r<rmem_alloc>,rb<rcv_buf>,t<wmem_alloc>,tb<snd_buf>,f<fwd_alloc>,w<wmem_queued>,o<opt_mem>,bl<back_log>)
+skmem:(r<rmem_alloc>,rb<rcv_buf>,t<wmem_alloc>,tb<snd_buf>,f<fwd_alloc>,
+.br
+.RS
+.RS
+w<wmem_queued>,o<opt_mem>,bl<back_log>)
+.RE
+.RE
 .P
 .TP
 .B <rmem_alloc>
-- 
2.19.1

^ permalink raw reply related

* Re: [RFC PATCH v3 04/10] ip: factor out protocol delivery helper
From: Paolo Abeni @ 2018-11-02 13:30 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan
  Cc: netdev, David S. Miller, Willem de Bruijn, Steffen Klassert
In-Reply-To: <482631991107b8493fe7310c01d66b92@codeaurora.org>

On Thu, 2018-11-01 at 00:35 -0600, Subash Abhinov Kasiviswanathan
wrote:
> On 2018-10-30 11:24, Paolo Abeni wrote:
> > So that we can re-use it at the UDP lavel in a later patch
> > 
> 
> Hi Paolo
> 
> Minor queries -
> Should it be "level" instead of "lavel"? Similar comment for the ipv6
> patch as well.

Indeed. Will fix in the next iteration.

> > diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
> > index 35a786c0aaa0..72250b4e466d 100644
> > --- a/net/ipv4/ip_input.c
> > +++ b/net/ipv4/ip_input.c
> > @@ -188,51 +188,50 @@ bool ip_call_ra_chain(struct sk_buff *skb)
> >  	return false;
> >  }
> > 
> > -static int ip_local_deliver_finish(struct net *net, struct sock *sk,
> > struct sk_buff *skb)
> > +void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb,
> > int protocol)
> 
> Would it be better if this function was declared in include/net/ip.h &
> include/net/ipv6.h rather than in net/ipv4/udp.c & net/ipv6/udp.c as in
> the patch "udp: cope with UDP GRO packet misdirection"
> 
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 72593e1..3d7fdb4 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -717,4 +717,6 @@ static inline void ip_cmsg_recv(struct msghdr *msg, 
> struct sk_buff *skb)
>   int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
>                                  struct netlink_ext_ack *extack);
> 
> +void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int 
> proto);
> +
>   #endif /* _IP_H */
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 8296505..4d4d2cfe 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -1101,4 +1101,8 @@ int ipv6_sock_mc_join_ssm(struct sock *sk, int 
> ifindex,
>                            const struct in6_addr *addr, unsigned int 
> mode);
>   int ipv6_sock_mc_drop(struct sock *sk, int ifindex,
>                        const struct in6_addr *addr);
> +
> +void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int 
> nexthdr,
> +                              bool have_final);
> +
>   #endif /* _NET_IPV6_H */
> 

Agreed, I will do in the next iteration. 

Thanks,

Paolo

^ permalink raw reply

* Re: [RFC PATCH v3 01/10] udp: implement complete book-keeping for encap_needed
From: Paolo Abeni @ 2018-11-02 13:30 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David Miller, Willem de Bruijn,
	steffen.klassert, Subash Abhinov Kasiviswanathan
In-Reply-To: <CAF=yD-KMU9+UvL-UNFnZth31O+NzGzdqFHNCq-QbSpZ4_qY86A@mail.gmail.com>

Hi,

On Thu, 2018-11-01 at 16:59 -0400, Willem de Bruijn wrote:
> On Tue, Oct 30, 2018 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > The *encap_needed static keys are enabled by UDP tunnels
> > and several UDP encapsulations type, but they are never
> > turned off. This can cause unneeded overall performance
> > degradation for systems where such features are used
> > transiently.
> > 
> > This patch introduces complete book-keeping for such keys,
> > decreasing the usage at socket destruction time, if needed,
> > and avoiding that the same socket could increase the key
> > usage multiple times.
> > 
> > rfc v2 - rfc v3:
> >  - use udp_tunnel_encap_enable() in setsockopt()
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > @@ -2447,7 +2452,7 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
> >                         /* FALLTHROUGH */
> >                 case UDP_ENCAP_L2TPINUDP:
> >                         up->encap_type = val;
> > -                       udp_encap_enable();
> > +                       udp_tunnel_encap_enable(sk->sk_socket);
> 
> this now also needs lock_sock?

Yep, you are right. I'll add it in the next iteration.

Thanks,

Paolo

^ permalink raw reply

* Re: WARNING in kmem_cache_create_usercopy
From: Dominique Martinet @ 2018-11-02 22:39 UTC (permalink / raw)
  To: syzbot
  Cc: davem, ericvh, linux-kernel, lucho, netdev, syzkaller-bugs,
	v9fs-developer
In-Reply-To: <0000000000009d4c780579b04ee4@google.com>

syzbot wrote on Fri, Nov 02, 2018:
> RIP: 0010:kmem_cache_create_usercopy+0xad/0x240 mm/slab_common.c:473
> Code: 44 89 f0 25 00 60 de 04 45 85 ed 89 45 cc 75 0b 8b 45 d0 85 c0
> 0f 85 8e 01 00 00 44 39 eb 72 0a 89 d8 44 29 e8 3b 45 d0 73 7e <0f>
> 0b c7 45 d0 00 00 00 00 4c 8b 45 10 44 89 fa 89 de 4c 89 e7 8b
> RSP: 0018:ffff8801bc23f5d0 EFLAGS: 00010213
> RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000006
> RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffffffff88b04b20
> RBP: ffff8801bc23f608 R08: fffffbfff1283a2d R09: fffffbfff1283a2c
> R10: ffff8801bc23f5c0 R11: ffffffff8941d167 R12: ffffffff88b04b20
> R13: 00000000fffffffd R14: 0000000000000000 R15: 0000000000000000
>  p9_client_create+0xa58/0x1769 net/9p/client.c:1054

No lower bound on msize, the reproducer gives a reply from the
pseudo-server with msize=8 and we happily take it, underflowing the
msize - 11 (hdr+4) argument to kmem_cache_create_usercopy...
This probably never worked anyway, but we now get a warning :)

We need to add a sane lower bound to msize as well as the current upper
bound set in the transport.
We have some header sizes in 9p.h for IO header overhead (P9_IOHDRSZ to
24 for example) but I think that's too low in practice; stuff like
readdir will require more than this to get a single entry... We can
request the server to ask for at least 4k?
9p would probably work with less (e.g. 1k; I'd rather not have to
figgure the minimum length we need to get each messages to work in its
minimal form) but honestly even with 4k the perforamnces will be
terrible, so tempted to go with that...

I'll send a patch imposing 4k next week unless someone else does first,
or replies indicate different preferences.

@Dmitry: semi-related, the C reproducer (
https://syzkaller.appspot.com/x/repro.c?x=1701f831400000 ) has a lot of
"readable" letters spelled out as "\x63..." or chars as 0x3d - it's fine
for generated code and that might be easier for the intermediate
representation syzkaller works with, but do you know something handy
that would help convert these to readable strings?
e.g. "\x63\x61\x63\x68\x65\x3d\xc0\x6d\x61\x70" could be written
"cache=\xc0map", and 0x3d as '=' (hm I guess the later would not always
make sense to convert so probably best left as is, but it gets annoying
pretty fast with longer strings)

-- 
Dominique

^ permalink raw reply

* Re: [RFC PATCH] lib: Introduce generic __cmpxchg_u64() and use it where needed
From: Paul E. McKenney @ 2018-11-02 13:37 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Trond Myklebust, mark.rutland@arm.com,
	linux-kernel@vger.kernel.org, ralf@linux-mips.org,
	jlayton@kernel.org, linuxppc-dev@lists.ozlabs.org,
	bfields@fieldses.org, linux-mips@linux-mips.org,
	linux@roeck-us.net, linux-nfs@vger.kernel.org,
	akpm@linux-foundation.org, will.deacon@arm.com,
	boqun.feng@gmail.com, paul.burton@mips.com,
	"anna.schumaker@netapp.com" <an
In-Reply-To: <7d1ecd21c4c249138dfdd42b9aaa1cea@AcuMS.aculab.com>

On Fri, Nov 02, 2018 at 10:56:31AM +0000, David Laight wrote:
> From: Paul E. McKenney
> > Sent: 01 November 2018 17:02
> ...
> > And there is a push to define C++ signed arithmetic as 2s complement,
> > but there are still 1s complement systems with C compilers.  Just not
> > C++ compilers.  Legacy...
> 
> Hmmm... I've used C compilers for DSPs where signed integer arithmetic
> used the 'data registers' and would saturate, unsigned used the 'address
> registers' and wrapped.
> That was deliberate because it is much better to clip analogue values.

There are no C++ compilers for those DSPs, correct?  (Some of the
C++ standards commmittee members believe that they have fully checked,
but they might well have missed something.)

> Then there was the annoying cobol run time that didn't update the
> result variable if the result wouldn't fit.
> Took a while to notice that the sum of a list of values was even wrong!
> That would be perfectly valid for C - if unexpected.

Heh!  COBOL and FORTRAN also helped fund my first pass through university.

> > > But for us using -fno-strict-overflow which actually defines signed
> > > overflow
> 
> I wonder how much real code 'strict-overflow' gets rid of?
> IIRC gcc silently turns loops like:
> 	int i; for (i = 1; i != 0; i *= 2) ...
> into infinite ones.
> Which is never what is required.

The usual response is something like this:

	for (i = 1; i < n; i++)

where the compiler has no idea what range of values "n" might take on.
Can't say that I am convinced by that example, but at least we do have
-fno-strict-overflow.

							Thanx, Paul

^ permalink raw reply

* Re: [RFC PATCH v3 06/10] udp: cope with UDP GRO packet misdirection
From: Paolo Abeni @ 2018-11-02 13:44 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David Miller, Willem de Bruijn,
	steffen.klassert, Subash Abhinov Kasiviswanathan
In-Reply-To: <CAF=yD-LbaAEF7XS8BT8i1uaZTrGpmAEYvf3Yw7DHFKUKMmGzNA@mail.gmail.com>

On Thu, 2018-11-01 at 17:01 -0400, Willem de Bruijn wrote:
> On Wed, Oct 31, 2018 at 5:57 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > @@ -450,4 +457,32 @@ DECLARE_STATIC_KEY_FALSE(udpv6_encap_needed_key);
> > >  void udpv6_encap_enable(void);
> > >  #endif
> > > 
> > > +static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> > > +                                           struct sk_buff *skb)
> > > +{
> > > +     bool ipv4 = skb->protocol == htons(ETH_P_IP);
> > 
> > And this cause a compile warning when # CONFIG_IPV6 is not set, I will
> > fix in the next iteration (again thanks kbuildbot)
> 
> Can also just pass it as argument. 

Agreed. 

> This skb->protocol should work correctly
> with tunneled packets, but it wasn't as immediately obvious to me.
> 
> Also
> 
> +       if (unlikely(!segs))
> +               goto drop;
> 
> this should not happen. But if it could and the caller treats it the
> same as error (both now return NULL), then skb needs to be freed.

Right you are. Will do in the next iteration.

Thanks,

Paolo

^ permalink raw reply

* Re: Kernel 4.19 network performance - forwarding/routing normal users traffic
From: Aaron Lu @ 2018-11-02 14:20 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Saeed Mahameed, pstaszewski@itcare.pl, eric.dumazet@gmail.com,
	netdev@vger.kernel.org, Tariq Toukan, ilias.apalodimas@linaro.org,
	yoel@kviknet.dk, mgorman@techsingularity.net
In-Reply-To: <20181102124037.352b15de@redhat.com>

On Fri, Nov 02, 2018 at 12:40:37PM +0100, Jesper Dangaard Brouer wrote:
> On Fri, 2 Nov 2018 13:23:56 +0800
> Aaron Lu <aaron.lu@intel.com> wrote:
> 
> > On Thu, Nov 01, 2018 at 08:23:19PM +0000, Saeed Mahameed wrote:
> > > On Thu, 2018-11-01 at 23:27 +0800, Aaron Lu wrote:  
> > > > On Thu, Nov 01, 2018 at 10:22:13AM +0100, Jesper Dangaard Brouer
> > > > wrote:
> > > > ... ...  
> > > > > Section copied out:
> > > > > 
> > > > >   mlx5e_poll_tx_cq
> > > > >   |          
> > > > >    --16.34%--napi_consume_skb
> > > > >              |          
> > > > >              |--12.65%--__free_pages_ok
> > > > >              |          |          
> > > > >              |           --11.86%--free_one_page
> > > > >              |                     |          
> > > > >              |                     |--10.10%
> > > > > --queued_spin_lock_slowpath
> > > > >              |                     |          
> > > > >              |                      --0.65%--_raw_spin_lock  
> > > > 
> > > > This callchain looks like it is freeing higher order pages than order
> > > > 0:
> > > > __free_pages_ok is only called for pages whose order are bigger than
> > > > 0.  
> > > 
> > > mlx5 rx uses only order 0 pages, so i don't know where these high order
> > > tx SKBs are coming from..   
> > 
> > Perhaps here:
> > __netdev_alloc_skb(), __napi_alloc_skb(), __netdev_alloc_frag() and
> > __napi_alloc_frag() will all call page_frag_alloc(), which will use
> > __page_frag_cache_refill() to get an order 3 page if possible, or fall
> > back to an order 0 page if order 3 page is not available.
> > 
> > I'm not sure if your workload will use the above code path though.
> 
> TL;DR: this is order-0 pages (code-walk trough proof below)
> 
> To Aaron, the network stack *can* call __free_pages_ok() with order-0
> pages, via:
> 
> static void skb_free_head(struct sk_buff *skb)
> {
> 	unsigned char *head = skb->head;
> 
> 	if (skb->head_frag)
> 		skb_free_frag(head);
> 	else
> 		kfree(head);
> }
> 
> static inline void skb_free_frag(void *addr)
> {
> 	page_frag_free(addr);
> }
> 
> /*
>  * Frees a page fragment allocated out of either a compound or order 0 page.
>  */
> void page_frag_free(void *addr)
> {
> 	struct page *page = virt_to_head_page(addr);
> 
> 	if (unlikely(put_page_testzero(page)))
> 		__free_pages_ok(page, compound_order(page));
> }
> EXPORT_SYMBOL(page_frag_free);

I think here is a problem - order 0 pages are freed directly to buddy,
bypassing per-cpu-pages. This might be the reason lock contention
appeared on free path. Can someone apply below diff and see if lock
contention is gone?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e2ef1c17942f..65c0ae13215a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4554,8 +4554,14 @@ void page_frag_free(void *addr)
 {
 	struct page *page = virt_to_head_page(addr);
 
-	if (unlikely(put_page_testzero(page)))
-		__free_pages_ok(page, compound_order(page));
+	if (unlikely(put_page_testzero(page))) {
+		unsigned int order = compound_order(page);
+
+		if (order == 0)
+			free_unref_page(page);
+		else
+			__free_pages_ok(page, order);
+	}
 }
 EXPORT_SYMBOL(page_frag_free);
 
> Notice for the mlx5 driver it support several RX-memory models, so it
> can be hard to follow, but from the perf report output we can see that
> is uses mlx5e_skb_from_cqe_linear, which use build_skb.
> 
> --13.63%--mlx5e_skb_from_cqe_linear
>           |          
>            --5.02%--build_skb
>                      |          
>                       --1.85%--__build_skb
>                                 |          
>                                  --1.00%--kmem_cache_alloc
> 
> /* build_skb() is wrapper over __build_skb(), that specifically
>  * takes care of skb->head and skb->pfmemalloc
>  * This means that if @frag_size is not zero, then @data must be backed
>  * by a page fragment, not kmalloc() or vmalloc()
>  */
> struct sk_buff *build_skb(void *data, unsigned int frag_size)
> {
> 	struct sk_buff *skb = __build_skb(data, frag_size);
> 
> 	if (skb && frag_size) {
> 		skb->head_frag = 1;
> 		if (page_is_pfmemalloc(virt_to_head_page(data)))
> 			skb->pfmemalloc = 1;
> 	}
> 	return skb;
> }
> EXPORT_SYMBOL(build_skb);
> 
> It still doesn't prove, that the @data is backed by by a order-0 page.
> For the mlx5 driver is uses mlx5e_page_alloc_mapped ->
> page_pool_dev_alloc_pages(), and I can see perf report using
> __page_pool_alloc_pages_slow().
> 
> The setup for page_pool in mlx5 uses order=0.
> 
> 	/* Create a page_pool and register it with rxq */
> 	pp_params.order     = 0;
> 	pp_params.flags     = 0; /* No-internal DMA mapping in page_pool */
> 	pp_params.pool_size = pool_size;
> 	pp_params.nid       = cpu_to_node(c->cpu);
> 	pp_params.dev       = c->pdev;
> 	pp_params.dma_dir   = rq->buff.map_dir;
> 
> 	/* page_pool can be used even when there is no rq->xdp_prog,
> 	 * given page_pool does not handle DMA mapping there is no
> 	 * required state to clear. And page_pool gracefully handle
> 	 * elevated refcnt.
> 	 */
> 	rq->page_pool = page_pool_create(&pp_params);
> 	if (IS_ERR(rq->page_pool)) {
> 		err = PTR_ERR(rq->page_pool);
> 		rq->page_pool = NULL;
> 		goto err_free;
> 	}
> 	err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
> 					 MEM_TYPE_PAGE_POOL, rq->page_pool);

Thanks for the detailed analysis, I'll need more time to understand the
whole picture :-)

^ permalink raw reply related

* Re: Help with the BPF verifier
From: Arnaldo Carvalho de Melo @ 2018-11-02 15:02 UTC (permalink / raw)
  To: Edward Cree
  Cc: Yonghong Song, Daniel Borkmann, Jiri Olsa, Martin Lau,
	Alexei Starovoitov, Linux Networking Development Mailing List
In-Reply-To: <deb1d2a9-25cb-da1f-32b0-53a7b07eb4ad@solarflare.com>

Em Thu, Nov 01, 2018 at 08:05:07PM +0000, Edward Cree escreveu:
> On 01/11/18 18:52, Arnaldo Carvalho de Melo wrote:
> >  R0=inv(id=0) R1=inv6 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 15: (b7) r2 = 0
> > 16: (63) *(u32 *)(r10 -260) = r2
> > 17: (67) r1 <<= 32
> > 18: (77) r1 >>= 32
> > 19: (67) r1 <<= 3
> > 20: (bf) r2 = r6
> > 21: (0f) r2 += r1
> > 22: (79) r3 = *(u64 *)(r2 +16)
> > R2 invalid mem access 'inv'
> I wonder if you could run this with verifier log level 2?  (I'm not sure how
>  you would go about plumbing that through the perf tooling.)  It seems very
>  odd that it ends up with R2=inv, and I'm wondering whether R1 becomes unknown
>  during the shifts or whether the addition in insn 21 somehow produces the
>  unknown-ness.  (I know we used to have a thing[1] where doing ptr += K and
>  then also having an offset in the LDX produced an error about
>  ptr+const+const, but that seems to have been fixed at some point.)
> 
> Note however that even if we get past this, R1 at this point holds 6, so it
>  looks like the verifier is walking the impossible path where we're inside the
>  'if' even though filename_arg = 6.  This is a (slightly annoying) verifier
>  limitation, that it walks paths with impossible combinations of constraints
>  (we've previously had cases where assertions in the verifier would blow up
>  because of this, e.g. registers with max_val less than min_val).  So if the
>  check_ctx_access() is going to worry about whether you're off the end of the
>  array (I'm not sure what your program type is and thus which is_valid_access
>  callback is involved), then it'll complain about this.
> If filename_arg came from some external source you'd have a different
>  problem, because then it would have a totally unknown value, that on entering
>  the 'if' becomes "unknown but < 6", which is still too variable to have as
>  the offset of a ctx access.  Those have to be at a known constant offset, so
>  that we can determine the type of the returned value.
> 
> As a way to fix this, how about [UNTESTED!]:
>     const void *filename_arg = NULL;
>     /* ... */
>     switch (augmented_args.args.syscall_nr) {
>         case SYS_OPEN: filename_arg = args->args[0]; break;
>         case SYS_OPENAT: filename_arg = args->args[1]; break;
>     }
>     /* ... */
>     if (filename_arg) {
>         /* stuff */
>         blah = probe_read_str(/* ... */, filename_arg);
>     } else {
>         /* the other stuff */
>     }
> That way, you're only ever dealing in constant pointers (although judging by
>  an old thread I found[1] about ptr+const+const, the compiler might decide to
>  make some optimisations that end up looking like your existing code).

Yeah, didn't work as well:

SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
	struct {
		struct syscall_enter_args args;
		struct augmented_filename filename;
	} augmented_args;
	unsigned int len = sizeof(augmented_args);
	const void *filename_arg = NULL;

	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);

	switch (augmented_args.args.syscall_nr) {
	case SYS_OPEN:	 filename_arg = (const void *)args->args[0]; break;
	case SYS_OPENAT: filename_arg = (const void *)args->args[1]; break;
	}

	if (filename_arg != NULL) {
		augmented_args.filename.reserved = 0;
		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
							      sizeof(augmented_args.filename.value),
							      filename_arg);
		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
			len &= sizeof(augmented_args.filename.value) - 1;
		}
	} else {
		len = sizeof(augmented_args.args);
	}

	perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
	return 0;
}

And the -vv in 'perf trace' didn't seem to map to further details in the
output of the verifier debug:

# trace -vv -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h 
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h 
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300  -I/home/acme/lib/perf/include/bpf  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h  -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf  -O2 -o - 
libbpf: loading object 'tools/perf/examples/bpf/augmented_raw_syscalls.c' from buffer
libbpf: section(1) .strtab, size 168, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) raw_syscalls:sys_enter, size 344, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_enter
libbpf: section(4) .relraw_syscalls:sys_enter, size 16, link 10, flags 0, type=9
libbpf: section(5) raw_syscalls:sys_exit, size 16, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_exit
libbpf: section(6) maps, size 56, link 0, flags 3, type=1
libbpf: section(7) license, size 4, link 0, flags 3, type=1
libbpf: license of tools/perf/examples/bpf/augmented_raw_syscalls.c is GPL
libbpf: section(8) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of tools/perf/examples/bpf/augmented_raw_syscalls.c is 41300
libbpf: section(9) .llvm_addrsig, size 6, link 10, flags 80000000, type=1879002115
libbpf: skip section(9) .llvm_addrsig
libbpf: section(10) .symtab, size 240, link 1, flags 0, type=2
libbpf: maps in tools/perf/examples/bpf/augmented_raw_syscalls.c: 2 maps in 56 bytes
libbpf: map 0 is "__augmented_syscalls__"
libbpf: map 1 is "__bpf_stdout__"
libbpf: collecting relocating info for: 'raw_syscalls:sys_enter'
libbpf: relo for 4 value 28 name 124
libbpf: relocation: insn_idx=35
libbpf: relocation: find map 1 (__augmented_syscalls__) for insn 35
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe0000006000-fffffe0000007000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe0000032000-fffffe0000033000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe000005e000-fffffe000005f000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe000008a000-fffffe000008b000
bpf: config program 'raw_syscalls:sys_enter'
bpf: config program 'raw_syscalls:sys_exit'
libbpf: create map __bpf_stdout__: fd=3
libbpf: create map __augmented_syscalls__: fd=4
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf: 
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -328
3: (b7) r7 = 64
4: (b7) r2 = 64
5: (bf) r3 = r6
6: (85) call bpf_probe_read#4
7: (79) r1 = *(u64 *)(r10 -320)
8: (15) if r1 == 0x101 goto pc+4
 R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
9: (55) if r1 != 0x2 goto pc+22
 R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
10: (bf) r1 = r6
11: (07) r1 += 16
12: (05) goto pc+2
15: (79) r3 = *(u64 *)(r1 +0)
dereference of modified ctx ptr R1 off=16 disallowed

libbpf: -- END LOG --
libbpf: failed to load program 'raw_syscalls:sys_enter'
libbpf: failed to load object 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
bpf: load objects failed: err=-4007: (Kernel verifier blocks program loading)
event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
                     \___ Kernel verifier blocks program loading

(add -v to see detail)
Run 'perf list' for a list of valid events

 Usage: perf trace [<options>] [<command>]
    or: perf trace [<options>] -- <command> [<options>]
    or: perf trace record [<options>] [<command>]
    or: perf trace record [<options>] -- <command> [<options>]

    -e, --event <event>   event/syscall selector. use 'perf list' to list available events
[root@seventh perf]# 


I'll check how to plumb that, but its a holiday down here in Brazil,
kids at home...


 
> As for what you want to do with the index coming from userspace, the verifier
>  will not like that at all, as mentioned above, so I think you'll need to do
>  something like:
>     switch (filename_arg_from_userspace) {
>         case 0: filename_arg = args->args[0]; break;
>         case 1: filename_arg = args->args[1]; break;
>         /* etc */
>         default: filename_arg = NULL;
>     }
>  thus ensuring that you only ever have ctx pointers with constant offsets.
> 
> -Ed
> 
> [1]: https://lists.iovisor.org/g/iovisor-dev/topic/21386327#1302

^ permalink raw reply

* ethtool 4.19 released
From: John W. Linville @ 2018-11-02 15:14 UTC (permalink / raw)
  To: netdev

ethtool version 4.19 has been released.

Home page: https://www.kernel.org/pub/software/network/ethtool/
Download link:
https://www.kernel.org/pub/software/network/ethtool/ethtool-4.19.tar.xz

Release notes:

	* Feature: support combinations of FEC modes
	* Feature: better syntax for combinations of FEC modes
	* Fix: Fix uninitialized variable use at qsfp dump

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox