Netdev List

* Re: [PATCH net-next v2] bonding: check slave set command firstly
From: David Miller @ 2019-02-14 16:36 UTC (permalink / raw)
  To: xiangxia.m.yue; +Cc: netdev
In-Reply-To: <1549910988-40999-1-git-send-email-xiangxia.m.yue@gmail.com>

From: xiangxia.m.yue@gmail.com
Date: Mon, 11 Feb 2019 10:49:48 -0800

> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> 
> This patch is a small improvement. If the user runs the
> command shown below, we should print message [1]
> instead of [2]. eth0 actually exists, so message [2]
> may confuse the user.
> 
> $ echo "eth0" > /sys/class/net/bond4/bonding/slaves
> 
> [1] "bond4: no command found in slaves file - use +ifname or -ifname"
> [2] "write error: No such device"
> 
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Applied, but would you please fix the dates on your submissions?

Because the dates in your patch postings are in the past, patchwork
puts your work at the tail of my patch queue instead of the front.

Thank you.
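
The ordering the commit message asks for can be sketched in userspace C:
validate the +/- command prefix before looking up the interface, so a bare
"eth0" is reported as a missing command rather than a missing device. This
is only an illustration; parse_slave_cmd(), lookup_fn and demo_lookup() are
hypothetical names, and the error codes are placeholders rather than the
values the bonding driver returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device lookup; not the kernel's API. */
typedef int (*lookup_fn)(const char *ifname);

/* Pretend that only "eth0" exists on this system. */
int demo_lookup(const char *ifname)
{
	return strcmp(ifname, "eth0") == 0;
}

/*
 * Check the command character first, then the interface name.
 * Returns 0 on success, -EINVAL when the '+'/'-' command or the name is
 * missing, and -ENODEV when the named interface is not found.
 * (Illustrative error codes; the driver's real values may differ.)
 */
int parse_slave_cmd(const char *buf, lookup_fn lookup, char *cmd_out)
{
	if (buf == NULL || (buf[0] != '+' && buf[0] != '-'))
		return -EINVAL;	/* corresponds to message [1] */
	if (buf[1] == '\0')
		return -EINVAL;	/* command given, but no ifname */
	if (!lookup(buf + 1))
		return -ENODEV;	/* corresponds to message [2] */
	*cmd_out = buf[0];
	return 0;
}
```

With this ordering, "eth0" fails on the command check even though the device
exists, while "-eth9" still reports a missing device.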

* Re: [PATCH] net: phy: at803x: disable delay only for RGMII mode
From: David Miller @ 2019-02-14 16:38 UTC (permalink / raw)
  To: vkoul
  Cc: linux-arm-msm, bjorn.andersson, netdev, niklas.cassel, andrew,
	f.fainelli, nsekhar, peter.ujfalusi, marc.w.gonzalez
In-Reply-To: <20190212141922.12849-1-vkoul@kernel.org>

From: Vinod Koul <vkoul@kernel.org>
Date: Tue, 12 Feb 2019 19:49:22 +0530

> diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
> index 8ff12938ab47..7b54b54e3316 100644
> --- a/drivers/net/phy/at803x.c
> +++ b/drivers/net/phy/at803x.c
> @@ -110,6 +110,18 @@ static int at803x_debug_reg_mask(struct phy_device *phydev, u16 reg,
>  	return phy_write(phydev, AT803X_DEBUG_DATA, val);
>  }
>  
> +static inline int at803x_enable_rx_delay(struct phy_device *phydev)
> +{
> +	return at803x_debug_reg_mask(phydev, AT803X_DEBUG_REG_0, 0,
> +				     AT803X_DEBUG_RX_CLK_DLY_EN);
> +}
> +
> +static inline int at803x_enable_tx_delay(struct phy_device *phydev)
> +{
> +	return at803x_debug_reg_mask(phydev, AT803X_DEBUG_REG_5, 0,
> +				     AT803X_DEBUG_TX_CLK_DLY_EN);
> +}
> +

Please do not use the inline directive in foo.c files, let the compiler
decide.

Thank you.

* Re: [PATCH] NETWORKING: avoid use IPCB in cipso_v4_error
From: David Miller @ 2019-02-14 16:43 UTC (permalink / raw)
  To: s-nazarov; +Cc: netdev, linux-security-module, kuznet, yoshfuji, paul
In-Reply-To: <6691891549984203@myt5-a323eb993ef7.qloud-c.yandex.net>

From: Nazarov Sergey <s-nazarov@yandex.ru>
Date: Tue, 12 Feb 2019 18:10:03 +0300

> Since cipso_v4_error might be called from different network stack layers, we can't safely use icmp_send there.
> icmp_send copies IP options with ip_option_echo, which uses IPCB to access the compiled IP header data.
> But after commit 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses"), IPCB can't be used
> above the IP layer.
> This patch fixes the problem by creating a local copy of the compiled IP options in cipso_v4_error and passing it to
> the newly introduced __icmp_send function. This looks somewhat heavyweight, but it only happens in quite rare error conditions.
> 
> The original discussion is here:
> https://lore.kernel.org/linux-security-module/16659801547571984@sas1-890ba5c2334a.qloud-c.yandex.net/
> 
> Signed-off-by: Sergey Nazarov <s-nazarov@yandex.ru>

This problem is not unique to Cipso, net/atm/clip.c's error handler
has the same exact issue.

I didn't scan more of the tree, there are probably a couple more
locations as well.

* Re: [PATCHv2 net-next 0/2] devlink: 2 fixes for devlink region read
From: David Miller @ 2019-02-14 16:46 UTC (permalink / raw)
  To: parav; +Cc: jiri, netdev
In-Reply-To: <1550002970-28893-1-git-send-email-parav@mellanox.com>

From: Parav Pandit <parav@mellanox.com>
Date: Tue, 12 Feb 2019 14:22:50 -0600

> These 2 patches consist of fixes for devlink region read handling.
> 
> Signed-off-by: Parav Pandit <parav@mellanox.com>

Series applied, thanks.

* Re: [PATCH] net: phy: at803x: disable delay only for RGMII mode
From: Marc Gonzalez @ 2019-02-14 16:46 UTC (permalink / raw)
  To: David Miller, vkoul
  Cc: linux-arm-msm, bjorn.andersson, netdev, niklas.cassel, andrew,
	f.fainelli, nsekhar, peter.ujfalusi
In-Reply-To: <20190214.083828.206479765039661735.davem@davemloft.net>

On 14/02/2019 17:38, David Miller wrote:

> From: Vinod Koul <vkoul@kernel.org>
> Date: Tue, 12 Feb 2019 19:49:22 +0530
> 
>> diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
>> index 8ff12938ab47..7b54b54e3316 100644
>> --- a/drivers/net/phy/at803x.c
>> +++ b/drivers/net/phy/at803x.c
>> @@ -110,6 +110,18 @@ static int at803x_debug_reg_mask(struct phy_device *phydev, u16 reg,
>>  	return phy_write(phydev, AT803X_DEBUG_DATA, val);
>>  }
>>  
>> +static inline int at803x_enable_rx_delay(struct phy_device *phydev)
>> +{
>> +	return at803x_debug_reg_mask(phydev, AT803X_DEBUG_REG_0, 0,
>> +				     AT803X_DEBUG_RX_CLK_DLY_EN);
>> +}
>> +
>> +static inline int at803x_enable_tx_delay(struct phy_device *phydev)
>> +{
>> +	return at803x_debug_reg_mask(phydev, AT803X_DEBUG_REG_5, 0,
>> +				     AT803X_DEBUG_TX_CLK_DLY_EN);
>> +}
>> +
> 
> Please do not use the inline directive in foo.c files, let the compiler
> decide.

Isn't the compiler free to ignore the "inline" hint?

Regards.

* Re: [PATCH net-next 0/2] uapi: Add a new header for time types
From: David Miller @ 2019-02-14 16:52 UTC (permalink / raw)
  To: deepa.kernel; +Cc: linux-kernel, netdev, willemb, tglx, arnd, y2038
In-Reply-To: <20190213032604.2655-1-deepa.kernel@gmail.com>

From: Deepa Dinamani <deepa.kernel@gmail.com>
Date: Tue, 12 Feb 2019 19:26:02 -0800

> The series aims at adding a new time header: time_types.h.  This header
> is what will eventually hold all the uapi time types that we plan to
> leave across the interfaces after the y2038 cleanup.
> 
> The series was discussed with Arnd Bergmann.
> 
> The second patch fixes the errqueue.h header, which has a dependency on
> these types.
> 
> Note that there may be a trivial merge conflict with linux-next
> c70a772fda11 ("y2038: remove struct definition redirects").

Series applied, thank you.

* Re: [PATCH iproute2] ss: Render buffer to output every time a number of chunks are allocated
From: Eric Dumazet @ 2019-02-14 16:55 UTC (permalink / raw)
  To: Stefano Brivio, Stephen Hemminger
  Cc: Phil Sutter, David Ahern, Sabrina Dubroca, netdev
In-Reply-To: <03dd56e5161a3c1270a21c4ba3f6e695793dbb74.1550105375.git.sbrivio@redhat.com>



On 02/13/2019 04:58 PM, Stefano Brivio wrote:
> Eric reported that, with 10 million sockets, ss -emoi (about 1000 bytes
> output per socket) can easily lead to OOM (buffer would grow to 10GB of
> memory).
> 
> Limit the maximum size of the buffer to five chunks, 1M each. Render and
> flush buffers whenever we reach that.
> 
> This might make the resulting blocks slightly unaligned between them, with
> occasional loss of readability on lines occurring every 5k to 50k sockets
> approximately. Something like (from ss -tu):
> 
> [...]
> CLOSE-WAIT   32       0           192.168.1.50:35232           10.0.0.1:https
> ESTAB        0        0           192.168.1.50:53820           10.0.0.1:https
> ESTAB       0        0           192.168.1.50:46924            10.0.0.1:https
> CLOSE-WAIT  32       0           192.168.1.50:35228            10.0.0.1:https
> [...]
> 
> However, I don't actually expect any human user to scroll through that
> amount of sockets, so readability should be preserved when it matters.
> 
> The bulk of the diffstat comes from moving field_next() around, as we now
> call render() from it. Functionally, this is implemented by six lines of
> code, most of them in field_next().
> 
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Fixes: 691bd854bf4a ("ss: Buffer raw fields first, then render them as a table")
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
> Eric, it would be nice if you could test this with your bazillion sockets,
> I checked this with -emoi and "only" 500,000 sockets.

Thanks, this seems reasonable enough to me.

# /usr/bin/time misc/ss -t |head -1
State       Recv-Q   Send-Q       Local Address:Port        Peer Address:Port   
Command terminated by signal 13
0.05user 0.00system 0:00.05elapsed 100%CPU (0avgtext+0avgdata 5836maxresident)k
0inputs+0outputs (0major+1121minor)pagefaults 0swaps
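
The bounded-buffer idea from the commit message (at most five chunks of 1M
each, render and flush when the limit is reached) can be modelled with a
small accounting sketch. This is not the ss implementation: struct
out_buffer and buffer_add() are hypothetical names, only byte counts are
tracked, and each record is assumed to fit in one chunk.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE (1024 * 1024)	/* 1M per chunk, as in the commit message */
#define MAX_CHUNKS 5			/* flush once a sixth chunk would be needed */

struct out_buffer {
	size_t chunks;	/* chunks currently allocated */
	size_t used;	/* bytes used in the newest chunk */
	size_t flushes;	/* number of render-and-flush passes so far */
};

/*
 * Account for one record of len bytes (len <= CHUNK_SIZE is assumed).
 * When the record does not fit in the newest chunk, a new chunk is
 * started; if that would exceed MAX_CHUNKS, everything buffered so far
 * is "rendered" first and the buffer starts over.
 */
void buffer_add(struct out_buffer *b, size_t len)
{
	if (b->chunks == 0 || b->used + len > CHUNK_SIZE) {
		if (b->chunks == MAX_CHUNKS) {
			b->flushes++;	/* render and flush what we have */
			b->chunks = 0;
		}
		b->chunks++;
		b->used = 0;
	}
	b->used += len;
}
```

In this model, 1000-byte records fill the five chunks after 5240 records, so
the first flush happens on the next record, which is in line with the "every
5k to 50k sockets" estimate quoted above.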



* Re: [PATCH net] net: dlink: sundance: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14 16:56 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, kda, yang.wei9
In-Reply-To: <1550070722-4539-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Wed, 13 Feb 2019 23:12:02 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called in intr_handler() when skb
> xmit is done. It makes drop profiles (dropwatch, perf) more friendly.
> 
> Remove a redundant blank line in intr_handler().
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied.

* Re: [PATCH net] net: amd: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14 16:57 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, yang.wei9
In-Reply-To: <1550070894-4602-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Wed, 13 Feb 2019 23:14:54 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called when skb xmit is done. It makes
> drop profiles (dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied.

* Re: [PATCH net] net: myri10ge: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14 16:57 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, christopher.lee, yang.wei9
In-Reply-To: <1550070943-4653-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Wed, 13 Feb 2019 23:15:43 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called in myri10ge_tx_done() when
> skb xmit is done. It makes drop profiles (dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied.

* Re: [PATCH net] net: sgi: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14 16:57 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, ralf, yang.wei9
In-Reply-To: <1550071026-4723-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Wed, 13 Feb 2019 23:17:06 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called when skb xmit is done. It makes
> drop profiles (dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied.

* Re: [PATCH net] net: micrel: ks8695net: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14 16:57 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, yang.wei9
In-Reply-To: <1550071089-4776-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Wed, 13 Feb 2019 23:18:09 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called in ks8695_tx_irq() when skb
> xmit is done. It makes drop profiles (dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied.

* Re: [PATCH net] net: natsemi: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14 16:57 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, tsbogend, yang.wei9
In-Reply-To: <1550071154-4834-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Wed, 13 Feb 2019 23:19:14 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called when skb xmit is done. It makes
> drop profiles (dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied.

* Re: [PATCH net] net: nuvoton: w90p910_ether: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14 16:57 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, linux-arm-kernel, mcuos.com, yang.wei9
In-Reply-To: <1550071262-4889-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Wed, 13 Feb 2019 23:21:02 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called in w90p910_ether_start_xmit()
> when skb xmit is done. It makes drop profiles (dropwatch, perf) more
> friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied.

* Re: [PATCH -next] net: ipvlan_l3s: fix kconfig dependency warning
From: David Miller @ 2019-02-14 16:59 UTC (permalink / raw)
  To: rdunlap; +Cc: netdev, maheshb, daniel
In-Reply-To: <204a7785-a1d2-e714-653e-2cb19e36f279@infradead.org>

From: Randy Dunlap <rdunlap@infradead.org>
Date: Wed, 13 Feb 2019 08:55:02 -0800

> From: Randy Dunlap <rdunlap@infradead.org>
> 
> Fix the kconfig warning in IPVLAN_L3S when neither INET nor IPV6
> is enabled:
> 
> WARNING: unmet direct dependencies detected for NET_L3_MASTER_DEV
>   Depends on [n]: NET [=y] && (INET [=n] || IPV6 [=n])
>   Selected by [y]:
>   - IPVLAN_L3S [=y] && NETDEVICES [=y] && NET_CORE [=y] && NETFILTER [=y]
> 
> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
> Cc: Mahesh Bandewar <maheshb@google.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> ---
> v2: simplify the dependency to IPVLAN

Applied, thanks Randy.

* RE: [PATCH net-next 2/3] arm64: dts: fsl: ls1028a-rdb: Add ENETC external eth ports for the LS1028A RDB board
From: Claudiu Manoil @ 2019-02-14 17:00 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Shawn Guo, Leo Li, David S . Miller, devicetree@vger.kernel.org,
	Alexandru Marginean, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org
In-Reply-To: <20190214162746.GI708@lunn.ch>

>-----Original Message-----
>From: Andrew Lunn <andrew@lunn.ch>
>Sent: Thursday, February 14, 2019 6:28 PM
>To: Claudiu Manoil <claudiu.manoil@nxp.com>
>Cc: Shawn Guo <shawnguo@kernel.org>; Leo Li <leoyang.li@nxp.com>; David S .
>Miller <davem@davemloft.net>; devicetree@vger.kernel.org; Alexandru
>Marginean <alexandru.marginean@nxp.com>; linux-kernel@vger.kernel.org;
>linux-arm-kernel@lists.infradead.org; netdev@vger.kernel.org
>Subject: Re: [PATCH net-next 2/3] arm64: dts: fsl: ls1028a-rdb: Add ENETC
>external eth ports for the LS1028A RDB board
>
>> Hi Andrew,
>>
>> The extra node for mdio seems to complicate things somewhat.
>> Just adding this node seems not enough.  How to find out easily if a
>> child of an enetc port node is an mdio node?
>
>You copy somebody else's code :-)
>

Provided you find the right thing to copy : ) . Thanks for the hint.

* Re: [PATCH net] net: stmmac: Fix NAPI poll in TX path when in multi-queue
From: David Miller @ 2019-02-14 17:01 UTC (permalink / raw)
  To: jose.abreu
  Cc: netdev, linux-kernel, joao.pinto, peppe.cavallaro,
	alexandre.torgue
In-Reply-To: <a264c48823687434e4d18aeb5830707e00c64250.1550077162.git.joabreu@synopsys.com>

From: Jose Abreu <jose.abreu@synopsys.com>
Date: Wed, 13 Feb 2019 18:00:43 +0100

> Commit 8fce33317023 introduced the concept of NAPI per-channel and
> independent cleaning of TX path.
> 
> This is currently breaking performance in some cases. The scenario
> happens when all packets are being received in Queue 0 but the TX is
> performed in Queue != 0.
> 
> I didn't look very deeply, but it seems that NAPI for Queue 0 will clean
> the RX path, but as TX is in a different NAPI, the latter is called at a
> slower rate, which kills TX performance. I suspect this is because TX
> cleaning takes much longer than RX and because NAPI gets canceled
> once we return with 0 budget consumed (e.g. when TX is still not done it
> will return 0 budget).
> 
> Fix this by looking at all TX channels in NAPI poll function.
> 
> Signed-off-by: Jose Abreu <joabreu@synopsys.com>
> Fixes: 8fce33317023 ("net: stmmac: Rework coalesce timer and fix multi-queue races")

No, this isn't right.

The TX interrupt events for Queue != 0 should clean up the TX packets
on those queues.

Furthermore you are breaking the locality of the TX processing.

I'm not applying this, sorry.

* Re: [PATCH 5/9] perf, bpf: save bpf_prog_info in a rbtree in perf_env
From: Song Liu @ 2019-02-14 17:01 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Netdev, linux-kernel@vger.kernel.org, ast@kernel.org,
	daniel@iogearbox.net, Kernel Team, peterz@infradead.org,
	acme@redhat.com
In-Reply-To: <20190214122638.GD26714@krava>



> On Feb 14, 2019, at 4:26 AM, Jiri Olsa <jolsa@redhat.com> wrote:
> 
> On Fri, Feb 08, 2019 at 05:17:01PM -0800, Song Liu wrote:
> 
> SNIP
> 
>> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
>> index d01b8355f4ca..5894a177b7cf 100644
>> --- a/tools/perf/util/env.h
>> +++ b/tools/perf/util/env.h
>> @@ -3,7 +3,10 @@
>> #define __PERF_ENV_H
>> 
>> #include <linux/types.h>
>> +#include <linux/rbtree.h>
>> #include "cpumap.h"
>> +#include "rwsem.h"
>> +#include "bpf-event.h"
>> 
>> struct cpu_topology_map {
>> 	int	socket_id;
>> @@ -64,6 +67,8 @@ struct perf_env {
>> 	struct memory_node	*memory_nodes;
>> 	unsigned long long	 memory_bsize;
>> 	u64                     clockid_res_ns;
>> +	struct rw_semaphore	bpf_info_lock;
> 
> why's the lock needed?
> 
> jirka

It protects the rbtrees for bpf_prog_info and btf. For perf-top,
we will have one thread writing to the trees, while the main
thread reads from them.

Let me add comments to clarify.

Thanks,
Song

* Re: [PATCH net] selftests: fix timestamping Makefile
From: David Miller @ 2019-02-14 17:03 UTC (permalink / raw)
  To: deepa.kernel; +Cc: shuah, willemb, netdev, linux-kselftest
In-Reply-To: <20190213170914.11991-1-deepa.kernel@gmail.com>

From: Deepa Dinamani <deepa.kernel@gmail.com>
Date: Wed, 13 Feb 2019 09:09:13 -0800

> The clean target in the makefile conflicts with the generic
> kselftests lib.mk, and fails to properly remove the compiled
> test programs.
> 
> Remove the redundant rule; TEST_GEN_FILES will already be
> removed by the CLEAN macro in lib.mk.
> 
> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>

Applied, thank you.

* Re: [PATCH 5/9] perf, bpf: save bpf_prog_info in a rbtree in perf_env
From: Song Liu @ 2019-02-14 17:03 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Netdev, linux-kernel, ast@kernel.org, daniel@iogearbox.net,
	Kernel Team, peterz@infradead.org, acme@redhat.com
In-Reply-To: <20190214123311.GA7465@krava>



> On Feb 14, 2019, at 4:33 AM, Jiri Olsa <jolsa@redhat.com> wrote:
> 
> On Fri, Feb 08, 2019 at 05:17:01PM -0800, Song Liu wrote:
>> bpf_prog_info contains information necessary to annotate bpf programs.
>> This patch saves bpf_prog_info for bpf programs loaded in the system.
>> 
>> perf-record saves bpf_prog_info information as headers to perf.data.
>> A new header type HEADER_BPF_PROG_INFO is introduced for this data.
> 
> please move those 2 changes into separate patches then

Do you mean one patch to save data in rbtree, then a separate patch 
to save data in perf.data file?

Thanks,
Song

> 
> it's hard to make comments when I don't see the rest of
> the patches on the list, please resend the patchset
> 
> thanks,
> jirka


* Re: [PATCH net 0/2] net: phy: fix locking issue
From: David Miller @ 2019-02-14 17:05 UTC (permalink / raw)
  To: hkallweit1; +Cc: andrew, f.fainelli, linux, netdev
In-Reply-To: <2a39271d-3b9e-e425-98b4-b2a24074e806@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Wed, 13 Feb 2019 20:10:36 +0100

> Russell pointed out that the locking used in phy_is_started() isn't
> needed and is misleading. This locking also contributes to a race fixed
> with patch 2.

Series applied and queued up for -stable, thanks.

* Re: [PATCH net-next] net: ip6_gre: Give ERSPAN a fill_info link op of its own
From: David Miller @ 2019-02-14 17:08 UTC (permalink / raw)
  To: petrm; +Cc: netdev, kuznet, yoshfuji, lorenzo.bianconi
In-Reply-To: <c14a9085e87ca9e36ba7f5feea46e5750a5baeeb.1550086179.git.petrm@mellanox.com>

From: Petr Machata <petrm@mellanox.com>
Date: Wed, 13 Feb 2019 19:31:32 +0000

> In commit c706863bc890 ("net: ip6_gre: always reports o_key to
> userspace"), ip6gre and ip6gretap tunnels started reporting a TUNNEL_KEY
> output flag even if one was not configured at the device.
> 
> When an okey-less ip6gre or ip6gretap netdevice is created, it initially
> encapsulates the packets without okey. But any configuration change
> (even a non-change such as setting TOS to an already-configured value)
> then causes the okey flag from the reported configuration to be
> circulated back to actual configuration. From that point on, the device
> encapsulates packets with output key of 0.
> 
> The intention was to implement this behavior for ERSPAN devices, not for
> all ip6gre devices. The ERSPAN netdevice should really have its own
> fill_info callback. Add one.
> 
> Fixes: c706863bc890 ("net: ip6_gre: always reports o_key to userspace")
> CC: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> Signed-off-by: Petr Machata <petrm@mellanox.com>

The commit you are fixing exists in the 'net' tree, therefore this is
a bug fix and should be targeted at 'net'.

* Re: [RESEND PATCH net] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
From: David Miller @ 2019-02-14 17:13 UTC (permalink / raw)
  To: jannh
  Cc: netdev, linux-mm, linux-kernel, mhocko, vbabka, pavel.tatashin,
	osalvador, mgorman, aaron.lu, alexander.h.duyck
In-Reply-To: <20190213214559.125666-1-jannh@google.com>

From: Jann Horn <jannh@google.com>
Date: Wed, 13 Feb 2019 22:45:59 +0100

> The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
> number of references that we might need to create in the fastpath later,
> the bump-allocation fastpath only has to modify the non-atomic bias value
> that tracks the number of extra references we hold instead of the atomic
> refcount. The maximum number of allocations we can serve (under the
> assumption that no allocation is made with size 0) is nc->size, so that's
> the bias used.
> 
> However, even when all memory in the allocation has been given away, a
> reference to the page is still held; and in the `offset < 0` slowpath, the
> page may be reused if everyone else has dropped their references.
> This means that the necessary number of references is actually
> `nc->size+1`.
> 
> Luckily, from a quick grep, it looks like the only path that can call
> page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
> requires CAP_NET_ADMIN in the init namespace and is only intended to be
> used for kernel testing and fuzzing.
> 
> To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
> `offset < 0` path, below the virt_to_page() call, and then repeatedly call
> writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
> with a vector consisting of 15 elements containing 1 byte each.
> 
> Signed-off-by: Jann Horn <jannh@google.com>

Applied and queued up for -stable.
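
The reference-counting argument in the commit message can be checked with a
toy userspace model. This is not kernel code: page_freed_under_cache() is a
hypothetical name, the refcount is a plain integer, and only the worst case
described above (every allocation is 1 byte, then every consumer frees its
fragment) is simulated.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if the page refcount reaches zero while the frag cache
 * still points at the page. size is the number of bytes in the cached
 * page; initial_refs is how many references are pre-allocated and must
 * be at least size.
 */
bool page_freed_under_cache(unsigned int size, unsigned int initial_refs)
{
	unsigned int refcount = initial_refs;	/* stands in for the atomic refcount */
	unsigned int bias = initial_refs;	/* references still owned by the cache */
	unsigned int handed_out = 0;

	/* Fast path: each 1-byte allocation gives one reference away. */
	while (handed_out < size) {
		bias--;
		handed_out++;
	}

	/* Every consumer frees its fragment, dropping one reference each. */
	refcount -= handed_out;

	/* Whatever is left is exactly what the cache itself still owns. */
	assert(refcount == bias);
	return refcount == 0;
}
```

With initial_refs equal to size the function returns true, which is the
situation the patch fixes; with size + 1 one reference remains for the cache
and it returns false.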

* [PATCH net-next 0/2] tracepoints in neighbor subsystem
From: Roopa Prabhu @ 2019-02-14 17:15 UTC (permalink / raw)
  To: davem; +Cc: netdev, dsa

From: Roopa Prabhu <roopa@cumulusnetworks.com>

Roopa Prabhu (2):
  trace: events: add a few neigh tracepoints
  neigh: hook tracepoints in neigh update code

 include/trace/events/neigh.h | 213 +++++++++++++++++++++++++++++++++++++++++++
 net/core/neighbour.c         |  11 +++
 net/core/net-traces.c        |   8 ++
 3 files changed, 232 insertions(+)
 create mode 100644 include/trace/events/neigh.h

-- 
2.1.4


* [PATCH net-next 1/2] trace: events: add a few neigh tracepoints
From: Roopa Prabhu @ 2019-02-14 17:15 UTC (permalink / raw)
  To: davem; +Cc: netdev, dsa
In-Reply-To: <1550164511-21195-1-git-send-email-roopa@cumulusnetworks.com>

From: Roopa Prabhu <roopa@cumulusnetworks.com>

The goal here is to trace neigh state changes covering all possible
neigh update paths, plus a specific tracepoint in neigh_update
to cover the flags sent to neigh_update.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 include/trace/events/neigh.h | 204 +++++++++++++++++++++++++++++++++++++++++++
 net/core/net-traces.c        |   8 ++
 2 files changed, 212 insertions(+)
 create mode 100644 include/trace/events/neigh.h

diff --git a/include/trace/events/neigh.h b/include/trace/events/neigh.h
new file mode 100644
index 0000000..ed10353
--- /dev/null
+++ b/include/trace/events/neigh.h
@@ -0,0 +1,204 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM neigh
+
+#if !defined(_TRACE_NEIGH_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NEIGH_H
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/tracepoint.h>
+#include <net/neighbour.h>
+
+#define neigh_state_str(state)				\
+	__print_symbolic(state,				\
+		{ NUD_INCOMPLETE, "incomplete" },	\
+		{ NUD_REACHABLE, "reachable" },		\
+		{ NUD_STALE, "stale" },			\
+		{ NUD_DELAY, "delay" },			\
+		{ NUD_PROBE, "probe" },			\
+		{ NUD_FAILED, "failed" })
+
+TRACE_EVENT(neigh_update,
+
+	TP_PROTO(struct neighbour *n, const u8 *lladdr, u8 new,
+		 u32 flags, u32 nlmsg_pid),
+
+	TP_ARGS(n, lladdr, new, flags, nlmsg_pid),
+
+	TP_STRUCT__entry(
+		__field(u32, family)
+		__string(dev, (n->dev ? n->dev->name : "NULL"))
+		__array(u8, lladdr, MAX_ADDR_LEN)
+		__field(u8, lladdr_len)
+		__field(u8, flags)
+		__field(u8, nud_state)
+		__field(u8, type)
+		__field(u8, dead)
+		__field(int, refcnt)
+		__array(__u8, primary_key4, 4)
+		__array(__u8, primary_key6, 16)
+		__field(unsigned long, confirmed)
+		__field(unsigned long, updated)
+		__field(unsigned long, used)
+		__array(u8, new_lladdr, MAX_ADDR_LEN)
+		__field(u8, new_state)
+		__field(u32, update_flags)
+		__field(u32, pid)
+	),
+
+	TP_fast_assign(
+		int lladdr_len = (n->dev ? n->dev->addr_len : MAX_ADDR_LEN);
+		struct in6_addr *pin6;
+		__be32 *p32;
+
+		__entry->family = n->tbl->family;
+		__assign_str(dev, (n->dev ? n->dev->name : "NULL"));
+		__entry->lladdr_len = lladdr_len;
+		memcpy(__entry->lladdr, n->ha, lladdr_len);
+		__entry->flags = n->flags;
+		__entry->nud_state = n->nud_state;
+		__entry->type = n->type;
+		__entry->dead = n->dead;
+		__entry->refcnt = refcount_read(&n->refcnt);
+		pin6 = (struct in6_addr *)__entry->primary_key6;
+		p32 = (__be32 *)__entry->primary_key4;
+
+		if (n->tbl->family == AF_INET)
+			*p32 = *(__be32 *)n->primary_key;
+		else
+			*p32 = 0;
+
+#if IS_ENABLED(CONFIG_IPV6)
+		if (n->tbl->family == AF_INET6) {
+			pin6 = (struct in6_addr *)__entry->primary_key6;
+			*pin6 = *(struct in6_addr *)n->primary_key;
+		} else
+#endif
+		{
+			ipv6_addr_set_v4mapped(*p32, pin6);
+		}
+		__entry->confirmed = n->confirmed;
+		__entry->updated = n->updated;
+		__entry->used = n->used;
+		if (lladdr)
+			memcpy(__entry->new_lladdr, lladdr, lladdr_len);
+		__entry->new_state = new;
+		__entry->update_flags = flags;
+		__entry->pid = nlmsg_pid;
+	),
+
+	TP_printk("family %d dev %s lladdr %s flags %02x nud_state %s type %02x "
+		  "dead %d refcnt %d primary_key4 %pI4 primary_key6 %pI6c "
+		  "confirmed %lu updated %lu used %lu new_lladdr %s "
+		  "new_state %02x update_flags %02x pid %d",
+		  __entry->family, __get_str(dev),
+		  __print_hex_str(__entry->lladdr, __entry->lladdr_len),
+		  __entry->flags, neigh_state_str(__entry->nud_state),
+		  __entry->type, __entry->dead, __entry->refcnt,
+		  __entry->primary_key4, __entry->primary_key6,
+		  __entry->confirmed, __entry->updated, __entry->used,
+		  __print_hex_str(__entry->new_lladdr, __entry->lladdr_len),
+		  __entry->new_state,
+		  __entry->update_flags, __entry->pid)
+);
+
+DECLARE_EVENT_CLASS(neigh__update,
+	TP_PROTO(struct neighbour *n, int err),
+	TP_ARGS(n, err),
+	TP_STRUCT__entry(
+		__field(u32, family)
+		__string(dev, (n->dev ? n->dev->name : "NULL"))
+		__array(u8, lladdr, MAX_ADDR_LEN)
+		__field(u8, lladdr_len)
+		__field(u8, flags)
+		__field(u8, nud_state)
+		__field(u8, type)
+		__field(u8, dead)
+		__field(int, refcnt)
+		__array(__u8, primary_key4, 4)
+		__array(__u8, primary_key6, 16)
+		__field(unsigned long, confirmed)
+		__field(unsigned long, updated)
+		__field(unsigned long, used)
+		__field(u32, err)
+	),
+
+	TP_fast_assign(
+		int lladdr_len = (n->dev ? n->dev->addr_len : MAX_ADDR_LEN);
+		struct in6_addr *pin6;
+		__be32 *p32;
+
+		__entry->family = n->tbl->family;
+		__assign_str(dev, (n->dev ? n->dev->name : "NULL"));
+		__entry->lladdr_len = lladdr_len;
+		memcpy(__entry->lladdr, n->ha, lladdr_len);
+		__entry->flags = n->flags;
+		__entry->nud_state = n->nud_state;
+		__entry->type = n->type;
+		__entry->dead = n->dead;
+		__entry->refcnt = refcount_read(&n->refcnt);
+		pin6 = (struct in6_addr *)__entry->primary_key6;
+		p32 = (__be32 *)__entry->primary_key4;
+
+		if (n->tbl->family == AF_INET)
+			*p32 = *(__be32 *)n->primary_key;
+		else
+			*p32 = 0;
+
+#if IS_ENABLED(CONFIG_IPV6)
+		if (n->tbl->family == AF_INET6) {
+			pin6 = (struct in6_addr *)__entry->primary_key6;
+			*pin6 = *(struct in6_addr *)n->primary_key;
+		} else
+#endif
+		{
+			ipv6_addr_set_v4mapped(*p32, pin6);
+		}
+
+		__entry->confirmed = n->confirmed;
+		__entry->updated = n->updated;
+		__entry->used = n->used;
+		__entry->err = err;
+	),
+
+	TP_printk("family %d dev %s lladdr %s flags %02x nud_state %s type %02x "
+		  "dead %d refcnt %d primary_key4 %pI4 primary_key6 %pI6c "
+		  "confirmed %lu updated %lu used %lu err %d",
+		  __entry->family, __get_str(dev),
+		  __print_hex_str(__entry->lladdr, __entry->lladdr_len),
+		  __entry->flags, neigh_state_str(__entry->nud_state),
+		  __entry->type, __entry->dead, __entry->refcnt,
+		  __entry->primary_key4, __entry->primary_key6,
+		  __entry->confirmed, __entry->updated, __entry->used,
+		  __entry->err)
+);
+
+DEFINE_EVENT(neigh__update, neigh_update_done,
+	TP_PROTO(struct neighbour *neigh, int err),
+	TP_ARGS(neigh, err)
+);
+
+DEFINE_EVENT(neigh__update, neigh_timer_handler,
+	TP_PROTO(struct neighbour *neigh, int err),
+	TP_ARGS(neigh, err)
+);
+
+DEFINE_EVENT(neigh__update, neigh_event_send_done,
+	TP_PROTO(struct neighbour *neigh, int err),
+	TP_ARGS(neigh, err)
+);
+
+DEFINE_EVENT(neigh__update, neigh_event_send_dead,
+	TP_PROTO(struct neighbour *neigh, int err),
+	TP_ARGS(neigh, err)
+);
+
+DEFINE_EVENT(neigh__update, neigh_cleanup_and_release,
+	TP_PROTO(struct neighbour *neigh, int rc),
+	TP_ARGS(neigh, rc)
+);
+
+#endif /* _TRACE_NEIGH_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index 419af6d..470b179 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -43,6 +43,14 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(fdb_delete);
 EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_update);
 #endif
 
+#include <trace/events/neigh.h>
+EXPORT_TRACEPOINT_SYMBOL_GPL(neigh_update);
+EXPORT_TRACEPOINT_SYMBOL_GPL(neigh_update_done);
+EXPORT_TRACEPOINT_SYMBOL_GPL(neigh_timer_handler);
+EXPORT_TRACEPOINT_SYMBOL_GPL(neigh_event_send_done);
+EXPORT_TRACEPOINT_SYMBOL_GPL(neigh_event_send_dead);
+EXPORT_TRACEPOINT_SYMBOL_GPL(neigh_cleanup_and_release);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(napi_poll);
-- 
2.1.4

