Netdev List
 help / color / mirror / Atom feed
* Re: WARNING in ip_rt_bug
From: Dmitry Vyukov @ 2018-04-09  6:06 UTC (permalink / raw)
  To: syzbot
  Cc: David Miller, Alexey Kuznetsov, LKML, netdev, syzkaller-bugs,
	Hideaki YOSHIFUJI, Eric Dumazet
In-Reply-To: <001a11408d9c43ff4a0569641a3f@google.com>

On Mon, Apr 9, 2018 at 7:59 AM, syzbot
<syzbot+b09ac67a2af842b12eab@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot hit the following crash on net-next commit
> 8bde261e535257e81087d39ff808414e2f5aa39d (Sun Apr 1 02:31:43 2018 +0000)
> Merge tag 'mlx5-updates-2018-03-30' of
> git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=b09ac67a2af842b12eab
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5991727739437056
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=3327544840960562528
> compiler: gcc (GCC) 7.1.1 20170620
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+b09ac67a2af842b12eab@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.


+Eric said that perhaps we just need to revert:

commit c378a9c019cf5e017d1ed24954b54fae7bebd2bc
Date:   Sat May 21 07:16:42 2011 +0000
    ipv4: Give backtrace in ip_rt_bug().


> netlink: 'syz-executor6': attribute type 3 has an invalid length.
> WARNING: CPU: 0 PID: 11678 at net/ipv4/route.c:1213 ip_rt_bug+0x15/0x20
> net/ipv4/route.c:1212
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 0 PID: 11678 Comm: kworker/u4:7 Not tainted 4.16.0-rc6+ #289
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
>  __warn+0x1dc/0x200 kernel/panic.c:547
>  report_bug+0x1f4/0x2b0 lib/bug.c:186
>  fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178
>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
> RIP: 0010:ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212
> RSP: 0018:ffff8801db007290 EFLAGS: 00010282
> RAX: dffffc0000000000 RBX: ffff8801d8dda3c0 RCX: ffffffff856c31ca
> RDX: 0000000000000100 RSI: ffffffff8858c300 RDI: 0000000000000282
> RBP: ffff8801db007298 R08: 1ffff1003b600de1 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d8dda3c0
> R13: ffff88019bdb2200 R14: ffff88019bdeed80 R15: ffff8801d8dda418
>  dst_output include/net/dst.h:444 [inline]
>  ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
>  ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1414
>  ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1434
>  icmp_push_reply+0x395/0x4f0 net/ipv4/icmp.c:394
>  icmp_send+0x1136/0x19b0 net/ipv4/icmp.c:741
>  ipv4_link_failure+0x2a/0x1b0 net/ipv4/route.c:1200
>  dst_link_failure include/net/dst.h:427 [inline]
>  arp_error_report+0xae/0x180 net/ipv4/arp.c:297
>  neigh_invalidate+0x225/0x530 net/core/neighbour.c:883
>  neigh_timer_handler+0x897/0xd60 net/core/neighbour.c:969
>  call_timer_fn+0x228/0x820 kernel/time/timer.c:1326
>  expire_timers kernel/time/timer.c:1363 [inline]
>  __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666
>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>  __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
>  invoke_softirq kernel/softirq.c:365 [inline]
>  irq_exit+0x1cc/0x200 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:541 [inline]
>  smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857
>  </IRQ>
> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:778
> [inline]
> RIP: 0010:lock_acquire+0x256/0x580 kernel/locking/lockdep.c:3923
> RSP: 0018:ffff880197b3f980 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff12
> RAX: dffffc0000000000 RBX: ffff8801d225e400 RCX: 0000000000000000
> RDX: 1ffffffff10a24e5 RSI: 00000000b98b8227 RDI: 0000000000000282
> RBP: ffff880197b3fa78 R08: 1ffff10032f67e93 R09: 0000000000000004
> R10: ffff880197b3f960 R11: 0000000000000003 R12: 1ffff10032f67f36
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
>  down_write_killable+0x8a/0x140 kernel/locking/rwsem.c:84
>  __bprm_mm_init fs/exec.c:297 [inline]
>  bprm_mm_init fs/exec.c:414 [inline]
>  do_execveat_common.isra.30+0xc8e/0x23c0 fs/exec.c:1771
>  do_execve+0x31/0x40 fs/exec.c:1847
>  call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.
>
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/001a11408d9c43ff4a0569641a3f%40google.com.
>
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* [PATCH net-next 0/3] ave: fix the activation issues for some UniPhier SoCs
From: Kunihiko Hayashi @ 2018-04-09  6:38 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Andrew Lunn, Florian Fainelli, Rob Herring, Mark Rutland,
	linux-arm-kernel, linux-kernel, devicetree, Masahiro Yamada,
	Masami Hiramatsu, Jassi Brar, Kunihiko Hayashi

This add the following stuffs to fix the activation issues and satisfy
requirements for AVE ethernet driver implemented on some UniPhier SoCs.

- Add support for additional necessary clocks and resets, because the kernel
  is stalled on Pro4 due to lack of them.

- Check whether the SoC supports the specified phy-mode

- Add DT property support indicating system controller that has the feature
  for configurating phy-mode including built-in phy on LD11.

Kunihiko Hayashi (3):
  net: ethernet: ave: add multiple clocks and resets support as required
    property
  dt-bindings: net: ave: add syscon-phy-mode property to configure
    phy-mode setting
  net: ethernet: ave: add support for phy-mode setting of system
    controller

 .../bindings/net/socionext,uniphier-ave4.txt       |  19 +-
 drivers/net/ethernet/socionext/Kconfig             |   2 +
 drivers/net/ethernet/socionext/sni_ave.c           | 252 ++++++++++++++++++---
 3 files changed, 238 insertions(+), 35 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH net-next 1/3] net: ethernet: ave: add multiple clocks and resets support as required property
From: Kunihiko Hayashi @ 2018-04-09  6:38 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Andrew Lunn, Florian Fainelli, Rob Herring, Mark Rutland,
	linux-arm-kernel, linux-kernel, devicetree, Masahiro Yamada,
	Masami Hiramatsu, Jassi Brar, Kunihiko Hayashi
In-Reply-To: <1523255925-6469-1-git-send-email-hayashi.kunihiko@socionext.com>

When the link is becoming up for Pro4 SoC, the kernel is stalled
due to some missing clocks and resets.

The AVE block for Pro4 is connected to the GIO bus in the SoC.
Without its clock/reset, the access to the AVE register makes the
system stall.

In the same way, another MAC clock for Giga-bit Connection and
the PHY clock are also required for Pro4 to activate the Giga-bit feature
and to recognize the PHY.

To satisfy these requirements, this patch adds support for multiple clocks
and resets, and adds the clock-names and reset-names to the binding because
we need to distinguish clock/reset for the AVE main block and the others.

Also, make the resets a required property. Currently, "reset is
optional" relies on that the bootloader or firmware has deasserted
the reset before booting the kernel.  Drivers should work without
such expectation.

Fixes: 4c270b55a5af ("net: ethernet: socionext: add AVE ethernet driver")
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
---
 .../bindings/net/socionext,uniphier-ave4.txt       |  13 ++-
 drivers/net/ethernet/socionext/sni_ave.c           | 108 ++++++++++++++++-----
 2 files changed, 96 insertions(+), 25 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
index 96398cc..85e0c49 100644
--- a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
+++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
@@ -17,9 +17,18 @@ Required properties:
  - phy-handle: Should point to the external phy device.
 	See ethernet.txt file in the same directory.
  - clocks: A phandle to the clock for the MAC.
+	For Pro4 SoC, that is "socionext,uniphier-pro4-ave4",
+	another MAC clock, GIO bus clock and PHY clock are also required.
+ - clock-names: Should contain
+	- "ether", "ether-gb", "gio", "ether-phy" for Pro4 SoC
+	- "ether" for others
+ - resets: A phandle to the reset control for the MAC. For Pro4 SoC,
+	GIO bus reset is also required.
+ - reset-names: Should contain
+	- "ether", "gio" for Pro4 SoC
+	- "ether" for others
 
 Optional properties:
- - resets: A phandle to the reset control for the MAC.
  - local-mac-address: See ethernet.txt in the same directory.
 
 Required subnode:
@@ -34,7 +43,9 @@ Example:
 		interrupts = <0 66 4>;
 		phy-mode = "rgmii";
 		phy-handle = <&ethphy>;
+		clock-names = "ether";
 		clocks = <&sys_clk 6>;
+		reset-names = "ether";
 		resets = <&sys_rst 6>;
 		local-mac-address = [00 00 00 00 00 00];
 
diff --git a/drivers/net/ethernet/socionext/sni_ave.c b/drivers/net/ethernet/socionext/sni_ave.c
index 0b3b7a4..52940bd 100644
--- a/drivers/net/ethernet/socionext/sni_ave.c
+++ b/drivers/net/ethernet/socionext/sni_ave.c
@@ -199,6 +199,9 @@
 
 #define IS_DESC_64BIT(p)	((p)->data->is_desc_64bit)
 
+#define AVE_MAX_CLKS		4
+#define AVE_MAX_RSTS		2
+
 enum desc_id {
 	AVE_DESCID_RX,
 	AVE_DESCID_TX,
@@ -227,6 +230,8 @@ struct ave_desc_info {
 
 struct ave_soc_data {
 	bool	is_desc_64bit;
+	const char	*clock_names[AVE_MAX_CLKS];
+	const char	*reset_names[AVE_MAX_RSTS];
 };
 
 struct ave_stats {
@@ -245,8 +250,10 @@ struct ave_private {
 	int			phy_id;
 	unsigned int		desc_size;
 	u32			msg_enable;
-	struct clk		*clk;
-	struct reset_control	*rst;
+	int			nclks;
+	struct clk		*clk[AVE_MAX_CLKS];
+	int			nrsts;
+	struct reset_control	*rst[AVE_MAX_RSTS];
 	phy_interface_t		phy_mode;
 	struct phy_device	*phydev;
 	struct mii_bus		*mdio;
@@ -1153,18 +1160,23 @@ static int ave_init(struct net_device *ndev)
 	struct device_node *np = dev->of_node;
 	struct device_node *mdio_np;
 	struct phy_device *phydev;
-	int ret;
+	int nc, nr, ret;
 
 	/* enable clk because of hw access until ndo_open */
-	ret = clk_prepare_enable(priv->clk);
-	if (ret) {
-		dev_err(dev, "can't enable clock\n");
-		return ret;
+	for (nc = 0; nc < priv->nclks; nc++) {
+		ret = clk_prepare_enable(priv->clk[nc]);
+		if (ret) {
+			dev_err(dev, "can't enable clock\n");
+			goto out_clk_disable;
+		}
 	}
-	ret = reset_control_deassert(priv->rst);
-	if (ret) {
-		dev_err(dev, "can't deassert reset\n");
-		goto out_clk_disable;
+
+	for (nr = 0; nr < priv->nrsts; nr++) {
+		ret = reset_control_deassert(priv->rst[nr]);
+		if (ret) {
+			dev_err(dev, "can't deassert reset\n");
+			goto out_reset_assert;
+		}
 	}
 
 	ave_global_reset(ndev);
@@ -1207,9 +1219,11 @@ static int ave_init(struct net_device *ndev)
 out_mdio_unregister:
 	mdiobus_unregister(priv->mdio);
 out_reset_assert:
-	reset_control_assert(priv->rst);
+	while (--nr >= 0)
+		reset_control_assert(priv->rst[nr]);
 out_clk_disable:
-	clk_disable_unprepare(priv->clk);
+	while (--nc >= 0)
+		clk_disable_unprepare(priv->clk[nc]);
 
 	return ret;
 }
@@ -1217,13 +1231,16 @@ static int ave_init(struct net_device *ndev)
 static void ave_uninit(struct net_device *ndev)
 {
 	struct ave_private *priv = netdev_priv(ndev);
+	int i;
 
 	phy_disconnect(priv->phydev);
 	mdiobus_unregister(priv->mdio);
 
 	/* disable clk because of hw access after ndo_stop */
-	reset_control_assert(priv->rst);
-	clk_disable_unprepare(priv->clk);
+	for (i = 0; i < priv->nrsts; i++)
+		reset_control_assert(priv->rst[i]);
+	for (i = 0; i < priv->nclks; i++)
+		clk_disable_unprepare(priv->clk[i]);
 }
 
 static int ave_open(struct net_device *ndev)
@@ -1527,8 +1544,9 @@ static int ave_probe(struct platform_device *pdev)
 	struct resource	*res;
 	const void *mac_addr;
 	void __iomem *base;
+	const char *name;
+	int i, irq, ret;
 	u64 dma_mask;
-	int irq, ret;
 	u32 ave_id;
 
 	data = of_device_get_match_data(dev);
@@ -1614,16 +1632,28 @@ static int ave_probe(struct platform_device *pdev)
 	u64_stats_init(&priv->stats_tx.syncp);
 	u64_stats_init(&priv->stats_rx.syncp);
 
-	priv->clk = devm_clk_get(dev, NULL);
-	if (IS_ERR(priv->clk)) {
-		ret = PTR_ERR(priv->clk);
-		goto out_free_netdev;
+	for (i = 0; i < AVE_MAX_CLKS; i++) {
+		name = priv->data->clock_names[i];
+		if (!name)
+			break;
+		priv->clk[i] = devm_clk_get(dev, name);
+		if (IS_ERR(priv->clk[i])) {
+			ret = PTR_ERR(priv->clk[i]);
+			goto out_free_netdev;
+		}
+		priv->nclks++;
 	}
 
-	priv->rst = devm_reset_control_get_optional_shared(dev, NULL);
-	if (IS_ERR(priv->rst)) {
-		ret = PTR_ERR(priv->rst);
-		goto out_free_netdev;
+	for (i = 0; i < AVE_MAX_RSTS; i++) {
+		name = priv->data->reset_names[i];
+		if (!name)
+			break;
+		priv->rst[i] = devm_reset_control_get_shared(dev, name);
+		if (IS_ERR(priv->rst[i])) {
+			ret = PTR_ERR(priv->rst[i]);
+			goto out_free_netdev;
+		}
+		priv->nrsts++;
 	}
 
 	priv->mdio = devm_mdiobus_alloc(dev);
@@ -1687,22 +1717,52 @@ static int ave_remove(struct platform_device *pdev)
 
 static const struct ave_soc_data ave_pro4_data = {
 	.is_desc_64bit = false,
+	.clock_names = {
+		"gio", "ether", "ether-gb", "ether-phy",
+	},
+	.reset_names = {
+		"gio", "ether",
+	},
 };
 
 static const struct ave_soc_data ave_pxs2_data = {
 	.is_desc_64bit = false,
+	.clock_names = {
+		"ether",
+	},
+	.reset_names = {
+		"ether",
+	},
 };
 
 static const struct ave_soc_data ave_ld11_data = {
 	.is_desc_64bit = false,
+	.clock_names = {
+		"ether",
+	},
+	.reset_names = {
+		"ether",
+	},
 };
 
 static const struct ave_soc_data ave_ld20_data = {
 	.is_desc_64bit = true,
+	.clock_names = {
+		"ether",
+	},
+	.reset_names = {
+		"ether",
+	},
 };
 
 static const struct ave_soc_data ave_pxs3_data = {
 	.is_desc_64bit = false,
+	.clock_names = {
+		"ether",
+	},
+	.reset_names = {
+		"ether",
+	},
 };
 
 static const struct of_device_id of_ave_match[] = {
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 2/3] dt-bindings: net: ave: add syscon-phy-mode property to configure phy-mode setting
From: Kunihiko Hayashi @ 2018-04-09  6:38 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Mark Rutland, Andrew Lunn, Florian Fainelli, Kunihiko Hayashi,
	Masami Hiramatsu, devicetree, linux-kernel, Masahiro Yamada,
	Rob Herring, Jassi Brar, linux-arm-kernel
In-Reply-To: <1523255925-6469-1-git-send-email-hayashi.kunihiko@socionext.com>

Add "socionext,syscon-phy-mode" property to specify system controller that
configures the settings about phy-mode.

Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
---
 Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
index 85e0c49..fc8f017 100644
--- a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
+++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
@@ -13,7 +13,8 @@ Required properties:
  - reg: Address where registers are mapped and size of region.
  - interrupts: Should contain the MAC interrupt.
  - phy-mode: See ethernet.txt in the same directory. Allow to choose
-	"rgmii", "rmii", or "mii" according to the PHY.
+	"rgmii", "rmii", "mii", or "internal" according to the PHY.
+	The acceptable mode is SoC-dependent.
  - phy-handle: Should point to the external phy device.
 	See ethernet.txt file in the same directory.
  - clocks: A phandle to the clock for the MAC.
@@ -27,6 +28,8 @@ Required properties:
  - reset-names: Should contain
 	- "ether", "gio" for Pro4 SoC
 	- "ether" for others
+ - socionext,syscon-phy-mode: A phandle to syscon with one argument
+	that configures phy mode. The argument is the ID of MAC instance.
 
 Optional properties:
  - local-mac-address: See ethernet.txt in the same directory.
@@ -47,6 +50,7 @@ Example:
 		clocks = <&sys_clk 6>;
 		reset-names = "ether";
 		resets = <&sys_rst 6>;
+		socionext,syscon-phy-mode = <&soc_glue 0>;
 		local-mac-address = [00 00 00 00 00 00];
 
 		mdio {
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 3/3] net: ethernet: ave: add support for phy-mode setting of system controller
From: Kunihiko Hayashi @ 2018-04-09  6:38 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Andrew Lunn, Florian Fainelli, Rob Herring, Mark Rutland,
	linux-arm-kernel, linux-kernel, devicetree, Masahiro Yamada,
	Masami Hiramatsu, Jassi Brar, Kunihiko Hayashi
In-Reply-To: <1523255925-6469-1-git-send-email-hayashi.kunihiko@socionext.com>

This patch adds support for specifying system controller that configures
phy-mode setting.

According to the DT property "phy-mode", it's necessary to configure the
controller, which is used to choose the settings of the MAC suitable,
for example, mdio pin connections, internal clocks, and so on.

Supported phy-modes are SoC-dependent. The driver allows phy-mode to set
"internal" if the SoC has a built-in PHY, and {"mii", "rmii", "rgmii"}
if the SoC supports each mode. So we have to check whether the phy-mode
is valid or not.

This adds the following features for each SoC:
- check whether the SoC supports the specified phy-mode
- configure the controller accroding to phy-mode

The DT property accepts one argument to distinguish them for multiple MAC
instances.

ethernet@65000000 {
	...
	socionext,syscon-phy-mode = <&soc_glue 0>;
};

ethernet@65200000 {
	...
	socionext,syscon-phy-mode = <&soc_glue 1>;
};

Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---
 drivers/net/ethernet/socionext/Kconfig   |   2 +
 drivers/net/ethernet/socionext/sni_ave.c | 150 ++++++++++++++++++++++++++++---
 2 files changed, 140 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/socionext/Kconfig b/drivers/net/ethernet/socionext/Kconfig
index 6bcfe27..b80048c 100644
--- a/drivers/net/ethernet/socionext/Kconfig
+++ b/drivers/net/ethernet/socionext/Kconfig
@@ -14,6 +14,8 @@ if NET_VENDOR_SOCIONEXT
 config SNI_AVE
 	tristate "Socionext AVE ethernet support"
 	depends on (ARCH_UNIPHIER || COMPILE_TEST) && OF
+	depends on HAS_IOMEM
+	select MFD_SYSCON
 	select PHYLIB
 	---help---
 	  Driver for gigabit ethernet MACs, called AVE, in the
diff --git a/drivers/net/ethernet/socionext/sni_ave.c b/drivers/net/ethernet/socionext/sni_ave.c
index 52940bd..f7eccee 100644
--- a/drivers/net/ethernet/socionext/sni_ave.c
+++ b/drivers/net/ethernet/socionext/sni_ave.c
@@ -11,6 +11,7 @@
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/iopoll.h>
+#include <linux/mfd/syscon.h>
 #include <linux/mii.h>
 #include <linux/module.h>
 #include <linux/netdevice.h>
@@ -18,6 +19,7 @@
 #include <linux/of_mdio.h>
 #include <linux/of_platform.h>
 #include <linux/phy.h>
+#include <linux/regmap.h>
 #include <linux/reset.h>
 #include <linux/types.h>
 #include <linux/u64_stats_sync.h>
@@ -197,6 +199,11 @@
 #define AVE_INTM_COUNT		20
 #define AVE_FORCE_TXINTCNT	1
 
+/* SG */
+#define SG_ETPINMODE		0x540
+#define SG_ETPINMODE_EXTPHY	BIT(1)	/* for LD11 */
+#define SG_ETPINMODE_RMII(ins)	BIT(ins)
+
 #define IS_DESC_64BIT(p)	((p)->data->is_desc_64bit)
 
 #define AVE_MAX_CLKS		4
@@ -228,12 +235,6 @@ struct ave_desc_info {
 	struct ave_desc *desc;	/* skb info related descriptor */
 };
 
-struct ave_soc_data {
-	bool	is_desc_64bit;
-	const char	*clock_names[AVE_MAX_CLKS];
-	const char	*reset_names[AVE_MAX_RSTS];
-};
-
 struct ave_stats {
 	struct	u64_stats_sync	syncp;
 	u64	packets;
@@ -257,6 +258,9 @@ struct ave_private {
 	phy_interface_t		phy_mode;
 	struct phy_device	*phydev;
 	struct mii_bus		*mdio;
+	struct regmap		*regmap;
+	unsigned int		pinmode_mask;
+	unsigned int		pinmode_val;
 
 	/* stats */
 	struct ave_stats	stats_rx;
@@ -279,6 +283,14 @@ struct ave_private {
 	const struct ave_soc_data *data;
 };
 
+struct ave_soc_data {
+	bool	is_desc_64bit;
+	const char	*clock_names[AVE_MAX_CLKS];
+	const char	*reset_names[AVE_MAX_RSTS];
+	int	(*get_pinmode)(struct ave_private *priv,
+			       phy_interface_t phy_mode, u32 arg);
+};
+
 static u32 ave_desc_read(struct net_device *ndev, enum desc_id id, int entry,
 			 int offset)
 {
@@ -1179,6 +1191,11 @@ static int ave_init(struct net_device *ndev)
 		}
 	}
 
+	ret = regmap_update_bits(priv->regmap, SG_ETPINMODE,
+				 priv->pinmode_mask, priv->pinmode_val);
+	if (ret)
+		return ret;
+
 	ave_global_reset(ndev);
 
 	mdio_np = of_get_child_by_name(np, "mdio");
@@ -1537,6 +1554,7 @@ static int ave_probe(struct platform_device *pdev)
 	const struct ave_soc_data *data;
 	struct device *dev = &pdev->dev;
 	char buf[ETHTOOL_FWVERS_LEN];
+	struct of_phandle_args args;
 	phy_interface_t phy_mode;
 	struct ave_private *priv;
 	struct net_device *ndev;
@@ -1559,12 +1577,6 @@ static int ave_probe(struct platform_device *pdev)
 		dev_err(dev, "phy-mode not found\n");
 		return -EINVAL;
 	}
-	if ((!phy_interface_mode_is_rgmii(phy_mode)) &&
-	    phy_mode != PHY_INTERFACE_MODE_RMII &&
-	    phy_mode != PHY_INTERFACE_MODE_MII) {
-		dev_err(dev, "phy-mode is invalid\n");
-		return -EINVAL;
-	}
 
 	irq = platform_get_irq(pdev, 0);
 	if (irq < 0) {
@@ -1656,6 +1668,26 @@ static int ave_probe(struct platform_device *pdev)
 		priv->nrsts++;
 	}
 
+	ret = of_parse_phandle_with_fixed_args(np,
+					       "socionext,syscon-phy-mode",
+					       1, 0, &args);
+	if (ret) {
+		netdev_err(ndev, "can't get syscon-phy-mode property\n");
+		goto out_free_netdev;
+	}
+	priv->regmap = syscon_node_to_regmap(args.np);
+	of_node_put(args.np);
+	if (IS_ERR(priv->regmap)) {
+		netdev_err(ndev, "can't map syscon-phy-mode\n");
+		ret = PTR_ERR(priv->regmap);
+		goto out_free_netdev;
+	}
+	ret = priv->data->get_pinmode(priv, phy_mode, args.args[0]);
+	if (ret) {
+		netdev_err(ndev, "invalid phy-mode setting\n");
+		goto out_free_netdev;
+	}
+
 	priv->mdio = devm_mdiobus_alloc(dev);
 	if (!priv->mdio) {
 		ret = -ENOMEM;
@@ -1715,6 +1747,95 @@ static int ave_remove(struct platform_device *pdev)
 	return 0;
 }
 
+static int ave_pro4_get_pinmode(struct ave_private *priv,
+				phy_interface_t phy_mode, u32 arg)
+{
+	if (arg > 0)
+		return -EINVAL;
+
+	priv->pinmode_mask = SG_ETPINMODE_RMII(0);
+
+	switch (phy_mode) {
+	case PHY_INTERFACE_MODE_RMII:
+		priv->pinmode_val = SG_ETPINMODE_RMII(0);
+		break;
+	case PHY_INTERFACE_MODE_MII:
+	case PHY_INTERFACE_MODE_RGMII:
+		priv->pinmode_val = 0;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ave_ld11_get_pinmode(struct ave_private *priv,
+				phy_interface_t phy_mode, u32 arg)
+{
+	if (arg > 0)
+		return -EINVAL;
+
+	priv->pinmode_mask = SG_ETPINMODE_EXTPHY | SG_ETPINMODE_RMII(0);
+
+	switch (phy_mode) {
+	case PHY_INTERFACE_MODE_INTERNAL:
+		priv->pinmode_val = 0;
+		break;
+	case PHY_INTERFACE_MODE_RMII:
+		priv->pinmode_val = SG_ETPINMODE_EXTPHY | SG_ETPINMODE_RMII(0);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ave_ld20_get_pinmode(struct ave_private *priv,
+				phy_interface_t phy_mode, u32 arg)
+{
+	if (arg > 0)
+		return -EINVAL;
+
+	priv->pinmode_mask = SG_ETPINMODE_RMII(0);
+
+	switch (phy_mode) {
+	case PHY_INTERFACE_MODE_RMII:
+		priv->pinmode_val = SG_ETPINMODE_RMII(0);
+		break;
+	case PHY_INTERFACE_MODE_RGMII:
+		priv->pinmode_val = 0;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ave_pxs3_get_pinmode(struct ave_private *priv,
+				phy_interface_t phy_mode, u32 arg)
+{
+	if (arg > 1)
+		return -EINVAL;
+
+	priv->pinmode_mask = SG_ETPINMODE_RMII(arg);
+
+	switch (phy_mode) {
+	case PHY_INTERFACE_MODE_RMII:
+		priv->pinmode_val = SG_ETPINMODE_RMII(arg);
+		break;
+	case PHY_INTERFACE_MODE_RGMII:
+		priv->pinmode_val = 0;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static const struct ave_soc_data ave_pro4_data = {
 	.is_desc_64bit = false,
 	.clock_names = {
@@ -1723,6 +1844,7 @@ static const struct ave_soc_data ave_pro4_data = {
 	.reset_names = {
 		"gio", "ether",
 	},
+	.get_pinmode = ave_pro4_get_pinmode,
 };
 
 static const struct ave_soc_data ave_pxs2_data = {
@@ -1733,6 +1855,7 @@ static const struct ave_soc_data ave_pxs2_data = {
 	.reset_names = {
 		"ether",
 	},
+	.get_pinmode = ave_pro4_get_pinmode,
 };
 
 static const struct ave_soc_data ave_ld11_data = {
@@ -1743,6 +1866,7 @@ static const struct ave_soc_data ave_ld11_data = {
 	.reset_names = {
 		"ether",
 	},
+	.get_pinmode = ave_ld11_get_pinmode,
 };
 
 static const struct ave_soc_data ave_ld20_data = {
@@ -1753,6 +1877,7 @@ static const struct ave_soc_data ave_ld20_data = {
 	.reset_names = {
 		"ether",
 	},
+	.get_pinmode = ave_ld20_get_pinmode,
 };
 
 static const struct ave_soc_data ave_pxs3_data = {
@@ -1763,6 +1888,7 @@ static const struct ave_soc_data ave_pxs3_data = {
 	.reset_names = {
 		"ether",
 	},
+	.get_pinmode = ave_pxs3_get_pinmode,
 };
 
 static const struct of_device_id of_ave_match[] = {
-- 
2.7.4

^ permalink raw reply related

* Dear Talented
From: Lisa Clement @ 2018-04-09  6:45 UTC (permalink / raw)
  To: Recipients

Dear Talented,

I am Talent Scout For BLUE SKY FILM STUDIO, Present Blue sky Studio a
Film Corporation Located in the United State, is Soliciting for the
Right to use Your Photo/Face and Personality as One of the Semi -Major
Role/ Character in our Upcoming ANIMATED Stereoscope 3D Movie-The Story
of Spies in Disguise (Spies in Disguise 2019) The Movie is Currently Filming (In
Production) Please Note That There Will Be No Auditions, Traveling or
Any Special / Professional Acting Skills, Since the Production of This
Movie Will Be Done with our State of Art Computer -Generating Imagery
Equipment. We Are Prepared to Pay the Total Sum of $620,000.00 USD. For
More Information/Understanding, Please Write us on the E-Mail Below.
CONTACT EMAIL: blueskystudios7@gmail.com
All Reply to: blueskystudios7@gmail.com
Note: Only the Response send to this mail will be Given a Prior
Consideration.

Talent Scout
Lisa Clement

^ permalink raw reply

* [PATCH net] net: dsa: mv88e6xxx: Fix receive time stamp race condition.
From: Richard Cochran @ 2018-04-09  7:03 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Andrew Lunn, David Miller, Florian Fainelli,
	Vivien Didelot

The DSA stack passes received PTP frames to this driver via
mv88e6xxx_port_rxtstamp() for deferred delivery.  The driver then
queues the frame and kicks the worker thread.  The work callback reads
out the latched receive time stamp and then works through the queue,
delivering any non-matching frames without a time stamp.

If a new frame arrives after the worker thread has read out the time
stamp register but enters the queue before the worker finishes
processing the queue, that frame will be delivered without a time
stamp.

This patch fixes the race by moving the queue onto a list on the stack
before reading out the latched time stamp value.

Fixes: c6fe0ad2c3499 ("net: dsa: mv88e6xxx: add rx/tx timestamping support")

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/hwtstamp.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/hwtstamp.c b/drivers/net/dsa/mv88e6xxx/hwtstamp.c
index ac7694c71266..a036c490b7ce 100644
--- a/drivers/net/dsa/mv88e6xxx/hwtstamp.c
+++ b/drivers/net/dsa/mv88e6xxx/hwtstamp.c
@@ -285,10 +285,18 @@ static void mv88e6xxx_get_rxts(struct mv88e6xxx_chip *chip,
 			       struct sk_buff_head *rxq)
 {
 	u16 buf[4] = { 0 }, status, seq_id;
-	u64 ns, timelo, timehi;
 	struct skb_shared_hwtstamps *shwt;
+	struct sk_buff_head received;
+	u64 ns, timelo, timehi;
+	unsigned long flags;
 	int err;
 
+	/* The latched timestamp belongs to one of the received frames. */
+	__skb_queue_head_init(&received);
+	spin_lock_irqsave(&rxq->lock, flags);
+	skb_queue_splice_tail_init(rxq, &received);
+	spin_unlock_irqrestore(&rxq->lock, flags);
+
 	mutex_lock(&chip->reg_lock);
 	err = mv88e6xxx_port_ptp_read(chip, ps->port_id,
 				      reg, buf, ARRAY_SIZE(buf));
@@ -311,7 +319,7 @@ static void mv88e6xxx_get_rxts(struct mv88e6xxx_chip *chip,
 	/* Since the device can only handle one time stamp at a time,
 	 * we purge any extra frames from the queue.
 	 */
-	for ( ; skb; skb = skb_dequeue(rxq)) {
+	for ( ; skb; skb = __skb_dequeue(&received)) {
 		if (mv88e6xxx_ts_valid(status) && seq_match(skb, seq_id)) {
 			ns = timehi << 16 | timelo;
 
-- 
2.11.0

^ permalink raw reply related

* [PATCH v2] vhost-net: set packet weight of tx polling to 2 * vq size
From: haibinzhang(张海斌) @ 2018-04-09  7:07 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: lidongchen(陈立东),
	yunfangtai(台运方)

handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
polling udp packets with small length(e.g. 1byte udp payload), because setting
VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.

Ping-Latencies shown below were tested between two Virtual Machines using
netperf (UDP_STREAM, len=1), and then another machine pinged the client:

vq size=256
Packet-Weight   Ping-Latencies(millisecond)
                   min      avg       max
Origin           3.319   18.489    57.303
64               1.643    2.021     2.552
128              1.825    2.600     3.224
256              1.997    2.710     4.295
512              1.860    3.171     4.631
1024             2.002    4.173     9.056
2048             2.257    5.650     9.688
4096             2.093    8.508    15.943

vq size=512
Packet-Weight   Ping-Latencies(millisecond)
                   min      avg       max
Origin           6.537   29.177    66.245
64               2.798    3.614     4.403
128              2.861    3.820     4.775
256              3.008    4.018     4.807
512              3.254    4.523     5.824
1024             3.079    5.335     7.747
2048             3.944    8.201    12.762
4096             4.158   11.057    19.985

Ring size is a hint from device about a burst size it can tolerate. Based on
benchmarks, set the weight to 2 * vq size.

To evaluate this change, another tests were done using netperf(RR, TX) between
two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
tweaked through qemu. Results shown below does not show obvious changes.

vq size=256 TCP_RR                vq size=512 TCP_RR
size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
   1/       1/  -7%/        -2%      1/       1/   0%/        -2%
   1/       4/  +1%/         0%      1/       4/  +1%/         0%
   1/       8/  +1%/        -2%      1/       8/   0%/        +1%
  64/       1/  -6%/         0%     64/       1/  +7%/        +3%
  64/       4/   0%/        +2%     64/       4/  -1%/        +1%
  64/       8/   0%/         0%     64/       8/  -1%/        -2%
 256/       1/  -3%/        -4%    256/       1/  -4%/        -2%
 256/       4/  +3%/        +4%    256/       4/  +1%/        +2%
 256/       8/  +2%/         0%    256/       8/  +1%/        -1%

vq size=256 UDP_RR                vq size=512 UDP_RR
size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
   1/       1/  -5%/        +1%      1/       1/  -3%/        -2%
   1/       4/  +4%/        +1%      1/       4/  -2%/        +2%
   1/       8/  -1%/        -1%      1/       8/  -1%/         0%
  64/       1/  -2%/        -3%     64/       1/  +1%/        +1%
  64/       4/  -5%/        -1%     64/       4/  +2%/         0%
  64/       8/   0%/        -1%     64/       8/  -2%/        +1%
 256/       1/  +7%/        +1%    256/       1/  -7%/         0%
 256/       4/  +1%/        +1%    256/       4/  -3%/        -4%
 256/       8/  +2%/        +2%    256/       8/  +1%/        +1%

vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
  64/       1/   0%/        -3%     64/       1/   0%/         0%
  64/       4/  +3%/        -1%     64/       4/  -2%/        +4%
  64/       8/  +9%/        -4%     64/       8/  -1%/        +2%
 256/       1/  +1%/        -4%    256/       1/  +1%/        +1%
 256/       4/  -1%/        -1%    256/       4/  -3%/         0%
 256/       8/  +7%/        +5%    256/       8/  -3%/         0%
 512/       1/  +1%/         0%    512/       1/  -1%/        -1%
 512/       4/  +1%/        -1%    512/       4/   0%/         0%
 512/       8/  +7%/        -5%    512/       8/  +6%/        -1%
1024/       1/   0%/        -1%   1024/       1/   0%/        +1%
1024/       4/  +3%/         0%   1024/       4/  +1%/         0%
1024/       8/  +8%/        +5%   1024/       8/  -1%/         0%
2048/       1/  +2%/        +2%   2048/       1/  -1%/         0%
2048/       4/  +1%/         0%   2048/       4/   0%/        -1%
2048/       8/  -2%/         0%   2048/       8/   5%/        -1%
4096/       1/  -2%/         0%   4096/       1/  -2%/         0%
4096/       4/  +2%/         0%   4096/       4/   0%/         0%
4096/       8/  +9%/        -2%   4096/       8/  -5%/        -1%

Signed-off-by: Haibin Zhang <haibinzhang@tencent.com>
Signed-off-by: Yunfang Tai <yunfangtai@tencent.com>
Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 drivers/vhost/net.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8139bc70ad7d..3563a305cc0a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -44,6 +44,10 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_NET_WEIGHT 0x80000
 
+/* Max number of packets transferred before requeueing the job.
+ * Using this limit prevents one virtqueue from starving rx. */
+#define VHOST_NET_PKT_WEIGHT(vq) ((vq)->num * 2)
+
 /* MAX number of TX used buffers for outstanding zerocopy */
 #define VHOST_MAX_PEND 128
 #define VHOST_GOODCOPY_LEN 256
@@ -473,6 +477,7 @@ static void handle_tx(struct vhost_net *net)
 	struct socket *sock;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
 	bool zcopy, zcopy_used;
+	int sent_pkts = 0;
 
 	mutex_lock(&vq->mutex);
 	sock = vq->private_data;
@@ -580,7 +585,8 @@ static void handle_tx(struct vhost_net *net)
 		else
 			vhost_zerocopy_signal_used(net, vq);
 		vhost_net_tx_packet(net);
-		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
+		if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
+		    unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) {
 			vhost_poll_queue(&vq->poll);
 			break;
 		}
-- 
2.12.3

^ permalink raw reply related

* Re: [PATCH net-next v2] ipvs: fix multiplicative hashing in sh/dh/lblc/lblcr algorithms
From: Simon Horman @ 2018-04-09  7:16 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Vincent Bernat, Wensong Zhang, David S. Miller, netdev, lvs-devel
In-Reply-To: <alpine.LFD.2.20.1804011407500.9304@ja.home.ssi.bg>

On Sun, Apr 01, 2018 at 02:11:51PM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Sun, 1 Apr 2018, Vincent Bernat wrote:
> 
> > The sh/dh/lblc/lblcr algorithms are using Knuth's multiplicative
> > hashing incorrectly. Replace its use by the hash_32() macro, which
> > correctly implements this algorithm. It doesn't use the same constant,
> > but it shouldn't matter.
> > 
> > Signed-off-by: Vincent Bernat <vincent@bernat.im>
> 
> 	Looks good to me, thanks! Simon, please apply, if possible
> with the extra space removed, see below...
> 
> Acked-by: Julian Anastasov <ja@ssi.bg>

Thanks, I have applied this with the extra space removed.
I will submit it for inclusion in v4.18.

^ permalink raw reply

* [PATCH RESEND v2] vhost-net: set packet weight of tx polling to 2 * vq size
From: haibinzhang(张海斌) @ 2018-04-09  7:22 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: lidongchen(陈立东),
	yunfangtai(台运方)

handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
polling udp packets with small length(e.g. 1byte udp payload), because setting
VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.

Ping-Latencies shown below were tested between two Virtual Machines using
netperf (UDP_STREAM, len=1), and then another machine pinged the client:

vq size=256
Packet-Weight   Ping-Latencies(millisecond)
                   min      avg       max
Origin           3.319   18.489    57.303
64               1.643    2.021     2.552
128              1.825    2.600     3.224
256              1.997    2.710     4.295
512              1.860    3.171     4.631
1024             2.002    4.173     9.056
2048             2.257    5.650     9.688
4096             2.093    8.508    15.943

vq size=512
Packet-Weight   Ping-Latencies(millisecond)
                   min      avg       max
Origin           6.537   29.177    66.245
64               2.798    3.614     4.403
128              2.861    3.820     4.775
256              3.008    4.018     4.807
512              3.254    4.523     5.824
1024             3.079    5.335     7.747
2048             3.944    8.201    12.762
4096             4.158   11.057    19.985

Seems pretty consistent, a small dip at 2 VQ sizes.
Ring size is a hint from device about a burst size it can tolerate. Based on
benchmarks, set the weight to 2 * vq size.

To evaluate this change, another tests were done using netperf(RR, TX) between
two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
tweaked through qemu. Results shown below does not show obvious changes.

vq size=256 TCP_RR                vq size=512 TCP_RR
size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
   1/       1/  -7%/        -2%      1/       1/   0%/        -2%
   1/       4/  +1%/         0%      1/       4/  +1%/         0%
   1/       8/  +1%/        -2%      1/       8/   0%/        +1%
  64/       1/  -6%/         0%     64/       1/  +7%/        +3%
  64/       4/   0%/        +2%     64/       4/  -1%/        +1%
  64/       8/   0%/         0%     64/       8/  -1%/        -2%
 256/       1/  -3%/        -4%    256/       1/  -4%/        -2%
 256/       4/  +3%/        +4%    256/       4/  +1%/        +2%
 256/       8/  +2%/         0%    256/       8/  +1%/        -1%

vq size=256 UDP_RR                vq size=512 UDP_RR
size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
   1/       1/  -5%/        +1%      1/       1/  -3%/        -2%
   1/       4/  +4%/        +1%      1/       4/  -2%/        +2%
   1/       8/  -1%/        -1%      1/       8/  -1%/         0%
  64/       1/  -2%/        -3%     64/       1/  +1%/        +1%
  64/       4/  -5%/        -1%     64/       4/  +2%/         0%
  64/       8/   0%/        -1%     64/       8/  -2%/        +1%
 256/       1/  +7%/        +1%    256/       1/  -7%/         0%
 256/       4/  +1%/        +1%    256/       4/  -3%/        -4%
 256/       8/  +2%/        +2%    256/       8/  +1%/        +1%

vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
  64/       1/   0%/        -3%     64/       1/   0%/         0%
  64/       4/  +3%/        -1%     64/       4/  -2%/        +4%
  64/       8/  +9%/        -4%     64/       8/  -1%/        +2%
 256/       1/  +1%/        -4%    256/       1/  +1%/        +1%
 256/       4/  -1%/        -1%    256/       4/  -3%/         0%
 256/       8/  +7%/        +5%    256/       8/  -3%/         0%
 512/       1/  +1%/         0%    512/       1/  -1%/        -1%
 512/       4/  +1%/        -1%    512/       4/   0%/         0%
 512/       8/  +7%/        -5%    512/       8/  +6%/        -1%
1024/       1/   0%/        -1%   1024/       1/   0%/        +1%
1024/       4/  +3%/         0%   1024/       4/  +1%/         0%
1024/       8/  +8%/        +5%   1024/       8/  -1%/         0%
2048/       1/  +2%/        +2%   2048/       1/  -1%/         0%
2048/       4/  +1%/         0%   2048/       4/   0%/        -1%
2048/       8/  -2%/         0%   2048/       8/   5%/        -1%
4096/       1/  -2%/         0%   4096/       1/  -2%/         0%
4096/       4/  +2%/         0%   4096/       4/   0%/         0%
4096/       8/  +9%/        -2%   4096/       8/  -5%/        -1%

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Haibin Zhang <haibinzhang@tencent.com>
Signed-off-by: Yunfang Tai <yunfangtai@tencent.com>
Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 drivers/vhost/net.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8139bc70ad7d..3563a305cc0a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -44,6 +44,10 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_NET_WEIGHT 0x80000
 
+/* Max number of packets transferred before requeueing the job.
+ * Using this limit prevents one virtqueue from starving rx. */
+#define VHOST_NET_PKT_WEIGHT(vq) ((vq)->num * 2)
+
 /* MAX number of TX used buffers for outstanding zerocopy */
 #define VHOST_MAX_PEND 128
 #define VHOST_GOODCOPY_LEN 256
@@ -473,6 +477,7 @@ static void handle_tx(struct vhost_net *net)
 	struct socket *sock;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
 	bool zcopy, zcopy_used;
+	int sent_pkts = 0;
 
 	mutex_lock(&vq->mutex);
 	sock = vq->private_data;
@@ -580,7 +585,8 @@ static void handle_tx(struct vhost_net *net)
 		else
 			vhost_zerocopy_signal_used(net, vq);
 		vhost_net_tx_packet(net);
-		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
+		if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
+		    unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) {
 			vhost_poll_queue(&vq->poll);
 			break;
 		}
-- 
2.12.3

^ permalink raw reply related

* Re: [RFC PATCH net-next v5 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Jiri Pirko @ 2018-04-09  8:07 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh
In-Reply-To: <bb4f0821-246f-1052-de8b-d96be3388ed3@intel.com>

Sat, Apr 07, 2018 at 12:59:14AM CEST, sridhar.samudrala@intel.com wrote:
>On 4/6/2018 5:48 AM, Jiri Pirko wrote:
>> Thu, Apr 05, 2018 at 11:08:22PM CEST, sridhar.samudrala@intel.com wrote:

[...]

>> > +static int virtnet_bypass_join_child(struct net_device *bypass_netdev,
>> > +				     struct net_device *child_netdev)
>> > +{
>> > +	struct virtnet_bypass_info *vbi;
>> > +	bool backup;
>> > +
>> > +	vbi = netdev_priv(bypass_netdev);
>> > +	backup = (child_netdev->dev.parent == bypass_netdev->dev.parent);
>> > +	if (backup ? rtnl_dereference(vbi->backup_netdev) :
>> > +			rtnl_dereference(vbi->active_netdev)) {
>> > +		netdev_info(bypass_netdev,
>> > +			    "%s attempting to join bypass dev when %s already present\n",
>> > +			    child_netdev->name, backup ? "backup" : "active");
>> Bypass module should check if there is already some other netdev
>> enslaved and refuse right there.
>
>This will work for virtio-net with 3 netdev model, but this check has to be done by netvsc
>as its bypass_netdev is same as the backup_netdev.
>Will add a flag while registering with the bypass module to indicate if the driver is doing
>a 2 netdev or 3 netdev model and based on that flag this check can be done in bypass module
>for 3 netdev scenario.

Just let me undestand it clearly. What I expect the difference would be
between 2netdev and3 netdev model is this:
2netdev:
  bypass_master
     /
    /
VF_slave

3netdev:
  bypass_master
     /     \
    /       \
VF_slave   backup_slave

Is that correct? If not, how does it look like?

Thanks!

^ permalink raw reply

* Re: [PATCH net-next 6/6] netdevsim: Add simple FIB resource controller via devlink
From: Jiri Pirko @ 2018-04-09  8:18 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, roopa, shm, jiri, idosch, jakub.kicinski,
	andy.roulin
In-Reply-To: <c85f2b41-6100-fe12-b30a-8f12f63e8aab@cumulusnetworks.com>

Fri, Apr 06, 2018 at 11:22:29PM CEST, dsa@cumulusnetworks.com wrote:
>On 4/5/18 11:52 PM, Jiri Pirko wrote:
>> Thu, Apr 05, 2018 at 11:06:41PM CEST, dsa@cumulusnetworks.com wrote:
>>> On 4/5/18 2:10 PM, David Ahern wrote:
>>>>
>>>> The ASIC here is the kernel tables in a namespace. It does not make
>>>> sense to have 2 devlink instances for a single namespace.
>>>
>>> I put this example controller in netdevsim per a suggestion from Ido.
>>> The netdevsim seemed like a good idea given that modules intention --
>>> testing network facilities. Perhaps I should have done this as a
>>> completely standalone module ...
>>>
>>> The intention is to treat the kernel's tables *per namespace* as a
>>> standalone entity that can be managed very similar to ASIC resources.
>> 
>> So you say you want to treat a namespace as an ASIC? That sounds very
>> odd to me :/
>
>Why? The kernel has forwarding tables, acl's, etc just like the ASIC,
>and each namespace is a separate set of tables.

I don't get it. What's the point? For HW, the reason is it has limited
resources and those resources are not mapped 1:1 with kernel object.
However, for kernel, that is meaningless.


>
>If you think about it, userspace "programs" the kernel just like mlxsw
>and userspace SDKs "program" an asic.

I don't give a **** about sdks. I have no clue why you mention that here.


>
>
>>> Given that I can add a resource controller module
>>> (drivers/net/kern_res_mgr.c?) that creates a 'struct device' per network
>>> namespace with a devlink instance. In this case the device would very
>>> much be tied to the namespace 1:1.
>> 
>> That sounds more reasonable and accurate, yet still odd. You would not
>> have any netdevices there? Any ports?
>> 
>
>Sure, what ever ports are assigned to or created in the namespace.
>
>Nothing about the devlink API says it has to be a real h/w device.

Sure, it could represent something made-up, like netdevsim. However I
see a big misfit when you want to represent a namespace.


>Nothing about the devlink API says it can only be used for real h/w that
>has ports represented by netdevices that the devlink instance some how
>has "control" over.
>
>As the netdevsim demo shows, I can build an L3 resource controller for
>the kernel tables using just the devlink API and the in-kernel notifiers.

^ permalink raw reply

* [PATCH] slip: Check if rstate is initialized before uncompressing
From: Tejaswi Tanikella @ 2018-04-09  8:53 UTC (permalink / raw)
  To: netdev

On receiving a packet the state index points to the rstate which must be
used to fill up IP and TCP headers. But if the state index points to a
rstate which is unitialized, i.e. filled with zeros, it gets stuck in an
infinite loop inside ip_fast_csum trying to compute the ip checsum of a
header with zero length.

89.666953:   <2> [<ffffff9dd3e94d38>] slhc_uncompress+0x464/0x468
89.666965:   <2> [<ffffff9dd3e87d88>] ppp_receive_nonmp_frame+0x3b4/0x65c
89.666978:   <2> [<ffffff9dd3e89dd4>] ppp_receive_frame+0x64/0x7e0
89.666991:   <2> [<ffffff9dd3e8a708>] ppp_input+0x104/0x198
89.667005:   <2> [<ffffff9dd3e93868>] pppopns_recv_core+0x238/0x370
89.667027:   <2> [<ffffff9dd4428fc8>] __sk_receive_skb+0xdc/0x250
89.667040:   <2> [<ffffff9dd3e939e4>] pppopns_recv+0x44/0x60
89.667053:   <2> [<ffffff9dd4426848>] __sock_queue_rcv_skb+0x16c/0x24c
89.667065:   <2> [<ffffff9dd4426954>] sock_queue_rcv_skb+0x2c/0x38
89.667085:   <2> [<ffffff9dd44f7358>] raw_rcv+0x124/0x154
89.667098:   <2> [<ffffff9dd44f7568>] raw_local_deliver+0x1e0/0x22c
89.667117:   <2> [<ffffff9dd44c8ba0>] ip_local_deliver_finish+0x70/0x24c
89.667131:   <2> [<ffffff9dd44c92f4>] ip_local_deliver+0x100/0x10c

./scripts/faddr2line vmlinux slhc_uncompress+0x464/0x468 output:
 ip_fast_csum at arch/arm64/include/asm/checksum.h:40
 (inlined by) slhc_uncompress at drivers/net/slip/slhc.c:615

Adding a variable to indicate if the current rstate is initialized. If
such a packet arrives, move to toss state.

Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org>
---
 drivers/net/slip/slhc.c | 5 +++++
 include/net/slhc_vj.h   | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/slip/slhc.c b/drivers/net/slip/slhc.c
index 5782733..e710b0b 100644
--- a/drivers/net/slip/slhc.c
+++ b/drivers/net/slip/slhc.c
@@ -509,6 +509,10 @@ struct slcompress *
 		if(x < 0 || x > comp->rslot_limit)
 			goto bad;
 
+		/* Check if the cstate is initialized */
+		if(!comp->rstate[x].initialized)
+			goto bad;
+
 		comp->flags &=~ SLF_TOSS;
 		comp->recv_current = x;
 	} else {
@@ -673,6 +677,7 @@ struct slcompress *
 	if (cs->cs_tcp.doff > 5)
 	  memcpy(cs->cs_tcpopt, icp + ihl*4 + sizeof(struct tcphdr), (cs->cs_tcp.doff - 5) * 4);
 	cs->cs_hsize = ihl*2 + cs->cs_tcp.doff*2;
+	cs->initialized = 1;
 	/* Put headers back on packet
 	 * Neither header checksum is recalculated
 	 */
diff --git a/include/net/slhc_vj.h b/include/net/slhc_vj.h
index 8716d59..d673365 100644
--- a/include/net/slhc_vj.h
+++ b/include/net/slhc_vj.h
@@ -127,6 +127,7 @@
  */
 struct cstate {
 	byte_t	cs_this;	/* connection id number (xmit) */
+	byte_t	initialized;	/* non-zero if initialized */
 	struct cstate *next;	/* next in ring (xmit) */
 	struct iphdr cs_ip;	/* ip/tcp hdr from most recent packet */
 	struct tcphdr cs_tcp;
-- 
1.9.1

^ permalink raw reply related

* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Daniel Borkmann @ 2018-04-09  9:01 UTC (permalink / raw)
  To: Quentin Monnet, ast; +Cc: netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <20180406111122.11038-1-quentin.monnet@netronome.com>

On 04/06/2018 01:11 PM, Quentin Monnet wrote:
> eBPF helper functions can be called from within eBPF programs to perform
> a variety of tasks that would be otherwise hard or impossible to do with
> eBPF itself. There is a growing number of such helper functions in the
> kernel, but documentation is scarce. The main user space header file
> does contain a short commented description of most helpers, but it is
> somewhat outdated and not complete. It is more a "cheat sheet" than a
> real documentation accessible to new eBPF developers.
> 
> This commit attempts to improve the situation by replacing the existing
> overview for the helpers with a more developed description. Furthermore,
> a Python script is added to generate a manual page for eBPF helpers. The
> workflow is the following, and requires the rst2man utility:
> 
>     $ ./scripts/bpf_helpers_doc.py \
>             --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
>     $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
>     $ man /tmp/bpf-helpers.7
> 
> The objective is to keep all documentation related to the helpers in a
> single place, and to be able to generate from here a manual page that
> could be packaged in the man-pages repository and shipped with most
> distributions [1].
> 
> Additionally, parsing the prototypes of the helper functions could
> hopefully be reused, with a different Printer object, to generate
> header files needed in some eBPF-related projects.
> 
> Regarding the description of each helper, it comprises several items:
> 
> - The function prototype.
> - A description of the function and of its arguments (except for a
>   couple of cases, when there are no arguments and the return value
>   makes the function usage really obvious).
> - A description of return values (if not void).
> - A listing of eBPF program types (if relevant, map types) compatible
>   with the helper.
> - Information about the helper being restricted to GPL programs, or not.
> - The kernel version in which the helper was introduced.
> - The commit that introduced the helper (this is mostly to have it in
>   the source of the man page, as it can be used to track changes and
>   update the page).
> 
> For several helpers, descriptions are inspired (at times, nearly copied)
> from the commit logs introducing them in the kernel--Many thanks to
> their respective authors! They were completed as much as possible, the
> objective being to have something easily accessible even for people just
> starting with eBPF. There is probably a bit more work to do in this
> direction for some helpers.
> 
> Some RST formatting is used in the descriptions (not in function
> prototypes, to keep them readable, but the Python script provided in
> order to generate the RST for the manual page does add formatting to
> prototypes, to produce something pretty) to get "bold" and "italics" in
> manual pages. Hopefully, the descriptions in bpf.h file remains
> perfectly readable. Note that the few trailing white spaces are
> intentional, removing them would break paragraphs for rst2man.
> 
> The descriptions should ideally be updated each time someone adds a new
> helper, or updates the behaviour (compatibility extended to new program
> types, new socket option supported...) or the interface (new flags
> available, ...) of existing ones.
> 
> [1] I have not contacted people from the man-pages project prior to
>     sending this RFC, so I can offer no guaranty at this time that they
>     would accept to take the generated man page.
> 
> Cc: linux-doc@vger.kernel.org
> Cc: linux-man@vger.kernel.org
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>

Great work, thanks a lot for doing this!

[...]
> + * int bpf_probe_read(void *dst, u32 size, const void *src)
> + * 	Description
> + * 		For tracing programs, safely attempt to read *size* bytes from
> + * 		address *src* and store the data in *dst*.
> + * 	Return
> + * 		0 on success, or a negative error in case of failure.
> + * 	For
> + * 		*Tracing programs*.
> + * 	GPL only
> + * 		Yes
> + * 	Since
> + * 		Linux 4.1
> + *
> + * .. commit 2541517c32be
>   *
>   * u64 bpf_ktime_get_ns(void)
> - *     Return: current ktime
> - *
> - * int bpf_trace_printk(const char *fmt, int fmt_size, ...)
> - *     Return: length of buffer written or negative error
> + * 	Description
> + * 		Return the time elapsed since system boot, in nanoseconds.
> + * 	Return
> + * 		Current *ktime*.
> + * 	For
> + * 		All program types, except
> + * 		**BPF_PROG_TYPE_CGROUP_DEVICE**.

I think we should probably always just list the actual program types instead,
since when new types get added to the kernel not supporting bpf_ktime_get_ns()
helper in this example, the above statement would be misleading (potentially
more misleading than the other way around when it's not yet mentioned to be
supported).

> + * 	GPL only
> + * 		Yes
> + * 	Since
> + * 		Linux 4.1
> + *
> + * .. commit d9847d310ab4
> + *
> + * int bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
> + * 	Description
> + * 		This helper is a "printk()-like" facility for debugging. It
> + * 		prints a message defined by format *fmt* (of size *fmt_size*)
> + * 		to file *\/sys/kernel/debug/tracing/trace* from DebugFS, if
> + * 		available. It can take up to three additional **u64**
> + * 		arguments (as an eBPF helpers, the total number of arguments is
> + * 		limited to five). Each time the helper is called, it appends a
> + * 		line that looks like the following:
> + *
> + * 		::
> + *
> + * 			telnet-470   [001] .N.. 419421.045894: 0x00000001: BPF command: 2
> + *
> + * 		In the above:
> + *
> + * 			* ``telnet`` is the name of the current task.
> + * 			* ``470`` is the PID of the current task.
> + * 			* ``001`` is the CPU number on which the task is
> + * 			  running.
> + * 			* In ``.N..``, each character refers to a set of
> + * 			  options (whether irqs are enabled, scheduling
> + * 			  options, whether hard/softirqs are running, level of
> + * 			  preempt_disabled respectively). **N** means that
> + * 			  **TIF_NEED_RESCHED** and **PREEMPT_NEED_RESCHED**
> + * 			  are set.
> + * 			* ``419421.045894`` is a timestamp.
> + * 			* ``0x00000001`` is a fake value used by BPF for the
> + * 			  instruction pointer register.
> + * 			* ``BPF command: 2`` is the message formatted with
> + * 			  *fmt*.
> + *
> + * 		The conversion specifiers supported by *fmt* are similar, but
> + * 		more limited than for printk(). They are **%d**, **%i**,
> + * 		**%u**, **%x**, **%ld**, **%li**, **%lu**, **%lx**, **%lld**,
> + * 		**%lli**, **%llu**, **%llx**, **%p**, **%s**. No modifier (size
> + * 		of field, padding with zeroes, etc.) is available, and the
> + * 		helper will silently fail if it encounters an unknown
> + * 		specifier.
> + *
> + * 		Also, note that **bpf_trace_printk**\ () is slow, and should
> + * 		only be used for debugging purposes. For passing values to user
> + * 		space, perf events should be preferred.
> + * 	Return
> + * 		The number of bytes written to the buffer, or a negative error
> + * 		in case of failure.
> + * 	For
> + * 		All types of programs.

Ditto, and various other places.

> + * 	GPL only
> + * 		Yes
> + * 	Since
> + * 		Linux 4.1
> + *
> + * .. commit 9c959c863f82
>   *
>   * u32 bpf_prandom_u32(void)
> - *     Return: random value
> - *
> - * u32 bpf_raw_smp_processor_id(void)
> - *     Return: SMP processor ID
> - *
[...]
> + * u32 bpf_get_smp_processor_id(void)
> + * 	Return
> + * 		The SMP (Symmetric multiprocessing) processor id.
> + * 	For
> + * 		*Networking programs*.

Ditto plus above is actually not correct. See tracing_func_proto():

        [...]
        case BPF_FUNC_get_smp_processor_id:
                return &bpf_get_smp_processor_id_proto;
        [...]

> + * 	GPL only
> + * 		No
> + * 	Since
> + * 		Linux 4.1
> + *
> + * .. commit c04167ce2ca0
> + *
> + * int bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags)
> + * 	Description
> + * 		Store *len* bytes from address *from* into the packet
> + * 		associated to *skb*, at *offset*. *flags* are a combination of
> + * 		**BPF_F_RECOMPUTE_CSUM** (automatically recompute the
> + * 		checksum for the packet after storing the bytes) and
> + * 		**BPF_F_INVALIDATE_HASH** (set *skb*\ **->hash**, *skb*\
> + * 		**->swhash** and *skb*\ **->l4hash** to 0).
> + *
> + * 		A call to this helper is susceptible to change data from the
> + * 		packet. Therefore, at load time, all checks on pointers
> + * 		previously done by the verifier are invalidated and must be
> + * 		performed again.
> + * 	Return
[...]

Agree with Alexei that 'Description' but also 'Return' is the most useful
(the latter we can still improve a bit wrt errors). 'For' gets indeed quickly
out of hand but I do think that for BPF program developers 'Since' has good
value to quickly check when a helper could be available, but then again
this is actually tied to an individual kconfig and/or program type, and could
potentially require to add or (worst case) remove the info in a second commit
e.g. after the merge window similar with the commit sha. Probably best indeed
to stick to the signature, 'Description' and 'Return' initially.

We should probably also have the bpf_helpers_doc.py workflow in a separate
comment above this one such that developers don't have to look up the original
commit message, but have the required steps to render the rst/man page available
right where they need it.

Thanks,
Daniel

^ permalink raw reply

* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Daniel Borkmann @ 2018-04-09  9:25 UTC (permalink / raw)
  To: Markus Heiser, Jonathan Corbet
  Cc: Quentin Monnet, ast, netdev, oss-drivers, Linux Doc Mailing List,
	linux-man
In-Reply-To: <AB89F6E2-DE8A-4259-B361-BE96AD32E84E@darmarit.de>

On 04/09/2018 11:21 AM, Markus Heiser wrote:
[...]
> Do we really need another kernel-doc parser?
> 
>   ./scripts/kernel-doc include/uapi/linux/bpf.h
> 
> should already do the job (producing .rst). For more infos, take a look at

This has absolutely zero to do with kernel-doc, but rather producing
a description of BPF helper function that are later assembled into an
actual man-page that BPF program developers (user space) can use.

^ permalink raw reply

* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Markus Heiser @ 2018-04-09  9:26 UTC (permalink / raw)
  To: Daniel Borkmann, Jonathan Corbet
  Cc: Quentin Monnet, ast, netdev, oss-drivers, Linux Doc Mailing List,
	linux-man
In-Reply-To: <05d2d03a-0b39-9d0c-9ba0-3461afc45fac@iogearbox.net>

On 04/06/2018 01:11 PM, Quentin Monnet wrote:
>> eBPF helper functions can be called from within eBPF programs to perform
>> a variety of tasks that would be otherwise hard or impossible to do with
>> eBPF itself. There is a growing number of such helper functions in the
>> kernel, but documentation is scarce. The main user space header file
>> does contain a short commented description of most helpers, but it is
>> somewhat outdated and not complete. It is more a "cheat sheet" than a
>> real documentation accessible to new eBPF developers.
>> 
>> This commit attempts to improve the situation by replacing the existing
>> overview for the helpers with a more developed description. Furthermore,
>> a Python script is added to generate a manual page for eBPF helpers. The
>> workflow is the following, and requires the rst2man utility:
>> 
>>    $ ./scripts/bpf_helpers_doc.py \
>>            --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
>>    $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
>>    $ man /tmp/bpf-helpers.7
>> 
>> The objective is to keep all documentation related to the helpers in a
>> single place,

Do we really need another kernel-doc parser?

  ./scripts/kernel-doc include/uapi/linux/bpf.h

should already do the job (producing .rst). For more infos, take a look at

  https://www.kernel.org/doc/html/latest/doc-guide/index.html

-- Markus --

PS: sorry for re-post, first post was HTML which is not accepted by ML :o

^ permalink raw reply

* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Markus Heiser @ 2018-04-09  9:35 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jonathan Corbet, Quentin Monnet, ast, netdev, oss-drivers,
	Linux Doc Mailing List, linux-man
In-Reply-To: <81150b4e-79d5-f837-72bc-d988e87f842a@iogearbox.net>


> Am 09.04.2018 um 11:25 schrieb Daniel Borkmann <daniel@iogearbox.net>:
> 
> On 04/09/2018 11:21 AM, Markus Heiser wrote:
> [...]
>> Do we really need another kernel-doc parser?
>> 
>>  ./scripts/kernel-doc include/uapi/linux/bpf.h
>> 
>> should already do the job (producing .rst). For more infos, take a look at
> 
> This has absolutely zero to do with kernel-doc, but rather producing
> a description of BPF helper function that are later assembled into an
> actual man-page that BPF program developers (user space) can use.

May I completely misunderstood you, so correct my if I'am wrong:

- ./scripts/bpf_helpers_doc.py : produces reST markup from C-comments
- ./scripts/kerne-doc          : produces reST markup from C-comments

IMO: both are doing the same job, so why not using kernel-doc?

-- Markus --

^ permalink raw reply

* Re: [PATCH v4] net: thunderx: rework mac addresses list to u64 array
From: Vadim Lomovtsev @ 2018-04-09  9:42 UTC (permalink / raw)
  To: David Miller
  Cc: sgoutham, sunil.kovvuri, rric, linux-arm-kernel, netdev,
	linux-kernel, dnelson, robin.murphy, hch, gustavo,
	Vadim.Lomovtsev
In-Reply-To: <20180408.124200.2204489876311923873.davem@davemloft.net>

On Sun, Apr 08, 2018 at 12:42:00PM -0400, David Miller wrote:
> From: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
> Date: Fri,  6 Apr 2018 12:53:54 -0700
> 
> > @@ -1929,7 +1929,7 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
> >  						  work.work);
> >  	struct nicvf *nic = container_of(vf_work, struct nicvf, rx_mode_work);
> >  	union nic_mbx mbx = {};
> > -	struct xcast_addr *xaddr, *next;
> > +	int idx = 0;
> 
> No need to initialize idx.
> 
> > +		for (idx = 0; idx < vf_work->mc->count; idx++) {
> 
> As it is always explicitly initialized at, and only used inside of,
> this loop.

Ok, will post next version shortly.
Thanks for your time.

Vadim

^ permalink raw reply

* Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper
From: Daniel Borkmann @ 2018-04-09 10:01 UTC (permalink / raw)
  To: Alexei Starovoitov, Yonghong Song, netdev; +Cc: kernel-team
In-Reply-To: <49c15114-c518-b591-470a-b4073a675588@fb.com>

On 04/09/2018 07:02 AM, Alexei Starovoitov wrote:
> On 4/8/18 9:53 PM, Yonghong Song wrote:
>>>> @@ -1004,7 +1007,8 @@ static void __bpf_prog_put(struct bpf_prog
>>>> *prog, bool do_idr_lock)
>>>>              bpf_prog_kallsyms_del(prog->aux->func[i]);
>>>>          bpf_prog_kallsyms_del(prog);
>>>>
>>>> -        call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
>>>> +        synchronize_rcu();
>>>> +        __bpf_prog_put_rcu(&prog->aux->rcu);
>>>
>>> there should have been lockdep splat.
>>> We cannot call synchronize_rcu here, since we cannot sleep
>>> in some cases.
>>
>> Let me double check this. The following is the reason
>> why I am using synchronize_rcu().
>>
>> With call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu) and
>> _bpf_prog_put_rcu calls put_callchain_buffers() which
>> calls mutex_lock(), the runtime with CONFIG_DEBUG_ATOMIC_SLEEP=y
>> will complains since potential sleep inside the call_rcu is not
>> allowed.
> 
> I see. Indeed. We cannot call put_callchain_buffers() from rcu callback,
> but doing synchronize_rcu() here is also not possible.
> How about moving put_callchain into bpf_prog_free_deferred() ?

+1, the assumption is that you can call bpf_prog_put() and also the
bpf_map_put() from any context. Sleeping here for a long time might
subtly break things badly.

^ permalink raw reply

* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Daniel Borkmann @ 2018-04-09 10:08 UTC (permalink / raw)
  To: Markus Heiser
  Cc: Jonathan Corbet, Quentin Monnet, ast, netdev, oss-drivers,
	Linux Doc Mailing List, linux-man
In-Reply-To: <C6491E01-3FC2-4AC4-8190-7B17CFAC01AD@darmarit.de>

On 04/09/2018 11:35 AM, Markus Heiser wrote:
> 
>> Am 09.04.2018 um 11:25 schrieb Daniel Borkmann <daniel@iogearbox.net>:
>>
>> On 04/09/2018 11:21 AM, Markus Heiser wrote:
>> [...]
>>> Do we really need another kernel-doc parser?
>>>
>>>  ./scripts/kernel-doc include/uapi/linux/bpf.h
>>>
>>> should already do the job (producing .rst). For more infos, take a look at
>>
>> This has absolutely zero to do with kernel-doc, but rather producing
>> a description of BPF helper function that are later assembled into an
>> actual man-page that BPF program developers (user space) can use.
> 
> May I completely misunderstood you, so correct my if I'am wrong:
> 
> - ./scripts/bpf_helpers_doc.py : produces reST markup from C-comments
> - ./scripts/kerne-doc          : produces reST markup from C-comments
> 
> IMO: both are doing the same job, so why not using kernel-doc?

They are not really doing the same job, in bpf_helpers_doc.py case you don't
want the whole header rendered, but just a fraction of it, that is, the
single big comment which describes all BPF helper functions that a BPF
program developer has available to use in user space - aka the entries in
the __BPF_FUNC_MAPPER() macro; I also doubt the latter would actually qualify
in kdoc context as some sort of a function description.

^ permalink raw reply

* Re: [PATCH v2 0/2] af_key: Fix for sadb_key memcpy read overrun
From: Steffen Klassert @ 2018-04-09 10:32 UTC (permalink / raw)
  To: Kevin Easton; +Cc: Herbert Xu, David S. Miller, netdev, linux-kernel
In-Reply-To: <cover.1523115061.git.kevin@guarana.org>

On Sat, Apr 07, 2018 at 11:40:18AM -0400, Kevin Easton wrote:
> As found by syzbot, af_key does not properly validate the key length in
> sadb_key messages from userspace.  This can result in copying from beyond
> the end of the sadb_key part of the message, or indeed beyond the end of
> the entire packet.
> 
> Both these patches apply cleanly to ipsec-next.  Based on Steffen's
> feedback I have re-ordered them so that the fix only is in patch 1, which
> I would suggest is also a stable tree candidate, whereas patch 2 is a
> cleanup only.

I think here is some explanation needed. Usually bugfixes and cleanups
don't go to the same tree. On IPsec bugfixes go to the'ipsec' tree
while cleanups and new features go to the 'ipsec-next' tree. So
you need to split up your patchsets into patches that are targeted
to 'ipsec' and 'ipsec-next'. Aside from that, we are in 'merge window'
currently. This means that most maintainers don't accept patches to
their -next trees. If you have patches for a -next tree, wait until
the merge window is over (when v4.17-rc1 is released) and send them
then.

^ permalink raw reply

* Re: [PATCH v2 1/2] af_key: Always verify length of provided sadb_key
From: Steffen Klassert @ 2018-04-09 10:33 UTC (permalink / raw)
  To: Kevin Easton; +Cc: Herbert Xu, David S. Miller, netdev, linux-kernel
In-Reply-To: <de803d3d01f8b70d306c0d52569cc5265ed20bc9.1523115061.git.kevin@guarana.org>

On Sat, Apr 07, 2018 at 11:40:33AM -0400, Kevin Easton wrote:
> Key extensions (struct sadb_key) include a user-specified number of key
> bits.  The kernel uses that number to determine how much key data to copy
> out of the message in pfkey_msg2xfrm_state().
> 
> The length of the sadb_key message must be verified to be long enough,
> even in the case of SADB_X_AALG_NULL.  Furthermore, the sadb_key_len value
> must be long enough to include both the key data and the struct sadb_key
> itself.
> 
> Introduce a helper function verify_key_len(), and call it from
> parse_exthdrs() where other exthdr types are similarly checked for
> correctness.
> 
> Signed-off-by: Kevin Easton <kevin@guarana.org>
> Reported-by: syzbot+5022a34ca5a3d49b84223653fab632dfb7b4cf37@syzkaller.appspotmail.com

Applied to the ipsec tree, thanks Kevin!

^ permalink raw reply

* Re: [PATCH v2 2/2] af_key: Use DIV_ROUND_UP() instead of open-coded equivalent
From: Steffen Klassert @ 2018-04-09 10:34 UTC (permalink / raw)
  To: Kevin Easton; +Cc: Herbert Xu, David S. Miller, netdev, linux-kernel
In-Reply-To: <494dcebdc07530d6afbd7eda300da890a3e68e1c.1523115061.git.kevin@guarana.org>

On Sat, Apr 07, 2018 at 11:40:47AM -0400, Kevin Easton wrote:
> Several places use (x + 7) / 8 to convert from a number of bits to a number
> of bytes.  Replace those with DIV_ROUND_UP(x, 8) instead, for consistency
> with other parts of the same file.
> 
> Signed-off-by: Kevin Easton <kevin@guarana.org>

Please resubmit this one to ipsec-next after the
merge window. Thanks!

^ permalink raw reply

* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Markus Heiser @ 2018-04-09 10:52 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jonathan Corbet, Quentin Monnet, ast, netdev, oss-drivers,
	Linux Doc Mailing List, linux-man
In-Reply-To: <b2fbd09d-7ba2-9884-e18d-eac586af1665@iogearbox.net>


> Am 09.04.2018 um 12:08 schrieb Daniel Borkmann <daniel@iogearbox.net>:
[...]

>> May I completely misunderstood you, so correct my if I'am wrong:
>> 
>> - ./scripts/bpf_helpers_doc.py : produces reST markup from C-comments
>> - ./scripts/kerne-doc          : produces reST markup from C-comments
>> 
>> IMO: both are doing the same job, so why not using kernel-doc?
> 
> They are not really doing the same job, in bpf_helpers_doc.py case you don't
> want the whole header rendered, but just a fraction of it, that is, the
> single big comment which describes all BPF helper functions that a BPF
> program developer has available to use in user space - aka the entries in
> the __BPF_FUNC_MAPPER() macro;


> I also doubt the latter would actually qualify
> in kdoc context as some sort of a function description.

latter .. ah, OK .. thanks for clarifying. 

-- Markus --

^ permalink raw reply

* [PATCH] netfilter: fix CONFIG_NF_REJECT_IPV6=m link error
From: Arnd Bergmann @ 2018-04-09 10:53 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller
  Cc: Arnd Bergmann, netfilter-devel, coreteam, netdev, linux-kernel

We get a new link error with CONFIG_NFT_REJECT_INET=y and CONFIG_NF_REJECT_IPV6=m
after larger parts of the nftables modules are linked together:

net/netfilter/nft_reject_inet.o: In function `nft_reject_inet_eval':
nft_reject_inet.c:(.text+0x17c): undefined reference to `nf_send_unreach6'
nft_reject_inet.c:(.text+0x190): undefined reference to `nf_send_reset6'

The problem is that with NF_TABLES_INET set, we implicitly try to use
the ipv6 version as well for NFT_REJECT, but when CONFIG_IPV6 is set to
a loadable module, it's impossible to reach that.

The best workaround I found is to express the above as a Kconfig
dependency, forcing NFT_REJECT itself to be 'm' in that particular
configuration.

Fixes: 02c7b25e5f54 ("netfilter: nf_tables: build-in filter chain type")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 net/netfilter/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 704b3832dbad..44d8a55e9721 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -594,6 +594,7 @@ config NFT_QUOTA
 config NFT_REJECT
 	default m if NETFILTER_ADVANCED=n
 	tristate "Netfilter nf_tables reject support"
+	depends on !NF_TABLES_INET || (IPV6!=m || m)
 	help
 	  This option adds the "reject" expression that you can use to
 	  explicitly deny and notify via TCP reset/ICMP informational errors
-- 
2.9.0

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox