Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net] virtio_net: split out ctrl buffer
From: kbuild test robot @ 2018-04-19  7:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kbuild-all, linux-kernel, Mikulas Patocka, Eric Dumazet,
	David Miller, Jason Wang, virtualization, netdev
In-Reply-To: <1524113437-308621-1-git-send-email-mst@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1822 bytes --]

Hi Michael,

I love your patch! Yet something to improve:

[auto build test ERROR on net/master]

url:    https://github.com/0day-ci/linux/commits/Michael-S-Tsirkin/virtio_net-split-out-ctrl-buffer/20180419-145754
config: x86_64-randconfig-x006-201815 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers//net/virtio_net.c: In function 'virtnet_free_queues':
>> drivers//net/virtio_net.c:2365:12: error: 'struct virtnet_info' has no member named 'err_ctrl'; did you mean 'ctrl'?
     kfree(vi->err_ctrl);
               ^~~~~~~~
               ctrl
   drivers//net/virtio_net.c: In function 'virtnet_alloc_queues':
>> drivers//net/virtio_net.c:2589:8: error: 'vq' undeclared (first use in this function); did you mean 'vi'?
     kfree(vq->ctrl);
           ^~
           vi
   drivers//net/virtio_net.c:2589:8: note: each undeclared identifier is reported only once for each function it appears in

vim +2365 drivers//net/virtio_net.c

  2347	
  2348	static void virtnet_free_queues(struct virtnet_info *vi)
  2349	{
  2350		int i;
  2351	
  2352		for (i = 0; i < vi->max_queue_pairs; i++) {
  2353			napi_hash_del(&vi->rq[i].napi);
  2354			netif_napi_del(&vi->rq[i].napi);
  2355			netif_napi_del(&vi->sq[i].napi);
  2356		}
  2357	
  2358		/* We called napi_hash_del() before netif_napi_del(),
  2359		 * we need to respect an RCU grace period before freeing vi->rq
  2360		 */
  2361		synchronize_net();
  2362	
  2363		kfree(vi->rq);
  2364		kfree(vi->sq);
> 2365		kfree(vi->err_ctrl);
  2366	}
  2367	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27230 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 2/2] udp: implement and use per cpu rx skbs cache
From: Paolo Abeni @ 2018-04-19  7:40 UTC (permalink / raw)
  To: Eric Dumazet, netdev; +Cc: David S. Miller
In-Reply-To: <3270c995-4eea-b3e1-128c-82921d89eb79@gmail.com>

Hi,

On Wed, 2018-04-18 at 12:21 -0700, Eric Dumazet wrote:
> 
> On 04/18/2018 10:15 AM, Paolo Abeni wrote:
> is not appealing to me :/
> > 
> > Thank you for the feedback.
> > Sorry for not being clear about it, but knotd is using SO_REUSEPORT and
> > the above tests are leveraging it.
> > 
> > That 5% is on top of that 300%.
> 
> Then there is something wrong.
> 
> Adding copies should not increase performance.

The skb and data are copied into the UDP skb cache only if the socket
is under memory pressure, and that happens if and only if the receiver
is slower than the BH/IP receive path.

The copy slows down the RX path - which was dropping packets - and
makes the udp_recvmsg() considerably faster, as consuming skb becomes
almost a no-op.

AFAICS, this is similar to the strategy you used in:

ommit c8c8b127091b758f5768f906bcdeeb88bc9951ca
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Dec 7 09:19:33 2016 -0800

    udp: under rx pressure, try to condense skbs

with the difference that with the UDP skb cache there is an hard limit
to the amount of memory the BH is allowed to copy.

> If it does, there is certainly another way, reaching 10% instead of 5%

I benchmarked vs a DNS server to test and verify that we get measurable
benefits in real life scenario. The measured performance gain for the
RX path with reasonable configurations is ~20%.

Any suggestions for better results are more than welcome!

Cheers,

Paolo

^ permalink raw reply

* Re: [net PATCH v2] net: sched, fix OOO packets with pfifo_fast
From: Paolo Abeni @ 2018-04-19  8:00 UTC (permalink / raw)
  To: John Fastabend, Cong Wang
  Cc: Eric Dumazet, Jiri Pirko, David Miller,
	Linux Kernel Network Developers
In-Reply-To: <36a89ed1-d6ff-ddad-c736-4e68909d61c4@gmail.com>

On Wed, 2018-04-18 at 09:44 -0700, John Fastabend wrote:
> Thanks for bringing this up. I'll think about it for a bit maybe
> there is something we can do here. There is a set of conditions
> that if met we can run without the lock. Possibly ONETXQUEUE and
> aligned cpu_map is sufficient. 

I think you mean "root qdisc is mq and aligned cpu_map": AFAICS we can
have ONETXQUEUE when root qdisc is e.g. pfifo_fast which would not help
here.

> We could detect this case and drop
> the locking. For existing systems and high Gbps NICs I think (feel
> free to correct me) assuming a core per cpu is OK. 

I'm sorry, I'm lost. Do you mean "a tx queue per core" instead ?!? 

I'm unsure we can assume the above. In my experiments, at least in some
scenarios it's preferrable configuring a limited number of rx/tx
queues, confine BH processing to the related cores and let user space
processes run on the others, with a many to 1 relationship between the
cores "assigned" to user-space and the cores "assigned" to BH
processing. 

Can't we somewhat try to leverage TCQ_F_CAN_BYPASS even with NOLOCK
qdisc? I *think* we can avoid the qdisc_run() call after
sch_direct_xmit() in the bypass scenario, and that will avoid the
blamed atomic ops above.

Cheers,

Paolo

^ permalink raw reply

* [PATCH] net: phy: marvell: clear wol event before setting it
From: Jisheng Zhang @ 2018-04-19  8:02 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David S. Miller
  Cc: netdev, linux-kernel, Jingju Hou

From: Jingju Hou <Jingju.Hou@synaptics.com>

If WOL event happened once, the LED[2] interrupt pin will not be
cleared unless reading the CSISR register. So clear the WOL event
before enabling it.

Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com>
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
---
 drivers/net/phy/marvell.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index c22e8e383247..b6abe1cbc84b 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -115,6 +115,9 @@
 /* WOL Event Interrupt Enable */
 #define MII_88E1318S_PHY_CSIER_WOL_EIE			BIT(7)
 
+/* Copper Specific Interrupt Status Register */
+#define MII_88E1318S_PHY_CSISR				0x13
+
 /* LED Timer Control Register */
 #define MII_88E1318S_PHY_LED_TCR			0x12
 #define MII_88E1318S_PHY_LED_TCR_FORCE_INT		BIT(15)
@@ -1393,6 +1396,12 @@ static int m88e1318_set_wol(struct phy_device *phydev,
 		if (err < 0)
 			goto error;
 
+		/* If WOL event happened once, the LED[2] interrupt pin
+		 * will not be cleared unless reading the CSISR register.
+		 * So clear the WOL event first before enabling it.
+		 */
+		phy_read(phydev, MII_88E1318S_PHY_CSISR);
+
 		/* Enable the WOL interrupt */
 		err = __phy_modify(phydev, MII_88E1318S_PHY_CSIER, 0,
 				   MII_88E1318S_PHY_CSIER_WOL_EIE);
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net] virtio_net: split out ctrl buffer
From: kbuild test robot @ 2018-04-19  8:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kbuild-all, linux-kernel, Mikulas Patocka, Eric Dumazet,
	David Miller, Jason Wang, virtualization, netdev
In-Reply-To: <1524113437-308621-1-git-send-email-mst@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1725 bytes --]

Hi Michael,

I love your patch! Yet something to improve:

[auto build test ERROR on net/master]

url:    https://github.com/0day-ci/linux/commits/Michael-S-Tsirkin/virtio_net-split-out-ctrl-buffer/20180419-145754
config: i386-randconfig-a0-201815 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net/virtio_net.c: In function 'virtnet_free_queues':
>> drivers/net/virtio_net.c:2365:10: error: 'struct virtnet_info' has no member named 'err_ctrl'
     kfree(vi->err_ctrl);
             ^
   drivers/net/virtio_net.c: In function 'virtnet_alloc_queues':
>> drivers/net/virtio_net.c:2589:8: error: 'vq' undeclared (first use in this function)
     kfree(vq->ctrl);
           ^
   drivers/net/virtio_net.c:2589:8: note: each undeclared identifier is reported only once for each function it appears in

vim +2365 drivers/net/virtio_net.c

  2347	
  2348	static void virtnet_free_queues(struct virtnet_info *vi)
  2349	{
  2350		int i;
  2351	
  2352		for (i = 0; i < vi->max_queue_pairs; i++) {
  2353			napi_hash_del(&vi->rq[i].napi);
  2354			netif_napi_del(&vi->rq[i].napi);
  2355			netif_napi_del(&vi->sq[i].napi);
  2356		}
  2357	
  2358		/* We called napi_hash_del() before netif_napi_del(),
  2359		 * we need to respect an RCU grace period before freeing vi->rq
  2360		 */
  2361		synchronize_net();
  2362	
  2363		kfree(vi->rq);
  2364		kfree(vi->sq);
> 2365		kfree(vi->err_ctrl);
  2366	}
  2367	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31332 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v4 1/3] vmcore: add API to collect hardware dump in second kernel
From: Greg KH @ 2018-04-19  8:24 UTC (permalink / raw)
  To: Rahul Lakkireddy
  Cc: netdev, kexec, linux-fsdevel, linux-kernel, davem, viro, ebiederm,
	stephen, akpm, torvalds, ganeshgr, nirranjan, indranil
In-Reply-To: <e69b49f1406d2dcd9cd68e4300939e96cd745559.1523950324.git.rahul.lakkireddy@chelsio.com>

On Tue, Apr 17, 2018 at 01:14:17PM +0530, Rahul Lakkireddy wrote:
> +config PROC_VMCORE_DEVICE_DUMP
> +	bool "Device Hardware/Firmware Log Collection"
> +	depends on PROC_VMCORE
> +	default y

Only things that require the machine to keep working should be 'default
y', please remove this, it's an option.

> +	help
> +	  Device drivers can collect the device specific snapshot of
> +	  their hardware or firmware before they are initialized in
> +	  crash recovery kernel. If you say Y here, the device dumps
> +	  will be added as ELF notes to /proc/vmcore

Which exact "device drivers" are you referring to here?

thanks,

greg k-h

^ permalink raw reply

* [PATCH] net: phy: TLK10X initial driver submission
From: Måns Andersson @ 2018-04-19  8:28 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Andrew Lunn, Florian Fainelli, netdev,
	devicetree, linux-kernel
  Cc: Mans Andersson

From: Mans Andersson <mans.andersson@nibe.se>

Add suport for the TI TLK105 and TLK106 10/100Mbit ethernet phys.

In addition the TLK10X needs to be removed from DP83848 driver as the
power back off support is added here for this device.

Datasheet:
http://www.ti.com/lit/gpn/tlk106
---
 .../devicetree/bindings/net/ti,tlk10x.txt          |  27 +++
 drivers/net/phy/Kconfig                            |   5 +
 drivers/net/phy/Makefile                           |   1 +
 drivers/net/phy/dp83848.c                          |   3 -
 drivers/net/phy/tlk10x.c                           | 209 +++++++++++++++++++++
 5 files changed, 242 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/ti,tlk10x.txt
 create mode 100644 drivers/net/phy/tlk10x.c

diff --git a/Documentation/devicetree/bindings/net/ti,tlk10x.txt b/Documentation/devicetree/bindings/net/ti,tlk10x.txt
new file mode 100644
index 0000000..371d0d7
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/ti,tlk10x.txt
@@ -0,0 +1,27 @@
+* Texas Instruments - TLK105 / TLK106 ethernet PHYs
+
+Required properties:
+	- reg - The ID number for the phy, usually a small integer
+
+Optional properties:
+	- ti,power-back-off - Power Back Off Level
+		Please refer to data sheet chapter 8.6 and TI Application
+		Note SLLA3228
+		0 - Normal Operation
+		1 - Level 1 (up to 140m cable between TLK link partners)
+		2 - Level 2 (up to 100m cable between TLK link partners)
+		3 - Level 3 (up to 80m cable between TLK link partners)
+
+Default child nodes are standard Ethernet PHY device
+nodes as described in Documentation/devicetree/bindings/net/phy.txt
+
+Example:
+
+	ethernet-phy@0 {
+		reg = <0>;
+		ti,power-back-off = <2>;
+	};
+
+Datasheets and documentation can be found at:
+http://www.ti.com/lit/gpn/tlk106
+http://www.ti.com/lit/an/slla328/slla328.pdf
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index bdfbabb..c980240 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -295,6 +295,11 @@ config DP83867_PHY
 	---help---
 	  Currently supports the DP83867 PHY.
 
+config TLK10X_PHY
+	tristate "Texas Instruments TLK10x PHY"
+	---help---
+	  Supports the TLK105 and TLK106 PHYs.
+
 config FIXED_PHY
 	tristate "MDIO Bus/PHY emulation with fixed speed/link PHYs"
 	depends on PHYLIB
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 01acbcb..37e4e02 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -79,5 +79,6 @@ obj-$(CONFIG_ROCKCHIP_PHY)	+= rockchip.o
 obj-$(CONFIG_SMSC_PHY)		+= smsc.o
 obj-$(CONFIG_STE10XP)		+= ste10Xp.o
 obj-$(CONFIG_TERANETICS_PHY)	+= teranetics.o
+obj-$(CONFIG_TLK10X_PHY)	+= tlk10x.o
 obj-$(CONFIG_VITESSE_PHY)	+= vitesse.o
 obj-$(CONFIG_XILINX_GMII2RGMII) += xilinx_gmii2rgmii.o
diff --git a/drivers/net/phy/dp83848.c b/drivers/net/phy/dp83848.c
index cd09c3a..435f401 100644
--- a/drivers/net/phy/dp83848.c
+++ b/drivers/net/phy/dp83848.c
@@ -19,7 +19,6 @@
 #define TI_DP83848C_PHY_ID		0x20005ca0
 #define TI_DP83620_PHY_ID		0x20005ce0
 #define NS_DP83848C_PHY_ID		0x20005c90
-#define TLK10X_PHY_ID			0x2000a210
 
 /* Registers */
 #define DP83848_MICR			0x11 /* MII Interrupt Control Register */
@@ -78,7 +77,6 @@ static struct mdio_device_id __maybe_unused dp83848_tbl[] = {
 	{ TI_DP83848C_PHY_ID, 0xfffffff0 },
 	{ NS_DP83848C_PHY_ID, 0xfffffff0 },
 	{ TI_DP83620_PHY_ID, 0xfffffff0 },
-	{ TLK10X_PHY_ID, 0xfffffff0 },
 	{ }
 };
 MODULE_DEVICE_TABLE(mdio, dp83848_tbl);
@@ -105,7 +103,6 @@ static struct phy_driver dp83848_driver[] = {
 	DP83848_PHY_DRIVER(TI_DP83848C_PHY_ID, "TI DP83848C 10/100 Mbps PHY"),
 	DP83848_PHY_DRIVER(NS_DP83848C_PHY_ID, "NS DP83848C 10/100 Mbps PHY"),
 	DP83848_PHY_DRIVER(TI_DP83620_PHY_ID, "TI DP83620 10/100 Mbps PHY"),
-	DP83848_PHY_DRIVER(TLK10X_PHY_ID, "TI TLK10X 10/100 Mbps PHY"),
 };
 module_phy_driver(dp83848_driver);
 
diff --git a/drivers/net/phy/tlk10x.c b/drivers/net/phy/tlk10x.c
new file mode 100644
index 0000000..1efc81e
--- /dev/null
+++ b/drivers/net/phy/tlk10x.c
@@ -0,0 +1,209 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * Driver for the Texas Instruments TLK105 / TLK106
+ *
+ * Copyright (C) 2018 NIBE Industrier AB - http://www.nibe.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/phy.h>
+#include <linux/of.h>
+
+#define TLK10X_PHY_ID			0x2000a210
+
+/* Registers */
+#define TLK10X_MICR			0x11 /* MII Interrupt Control Reg */
+#define TLK10X_MISR			0x12 /* MII Interrupt Status Reg */
+#define TLK10X_REGCR			0x0d /* Register Control Register */
+#define TLK10X_ADDAR			0x0e /* Data Register */
+#define TLK10X_PWRBOCR			0xae /* Power Backoff Register */
+
+/* MICR Register Fields */
+#define TLK10X_MICR_INT_OE		BIT(0) /* Interrupt Output Enable */
+#define TLK10X_MICR_INTEN		BIT(1) /* Interrupt Enable */
+
+/* MISR Register Fields */
+#define TLK10X_MISR_RHF_INT_EN		BIT(0) /* Receive Error Counter */
+#define TLK10X_MISR_FHF_INT_EN		BIT(1) /* False Carrier Counter */
+#define TLK10X_MISR_ANC_INT_EN		BIT(2) /* Auto-negotiation complete */
+#define TLK10X_MISR_DUP_INT_EN		BIT(3) /* Duplex Status */
+#define TLK10X_MISR_SPD_INT_EN		BIT(4) /* Speed status */
+#define TLK10X_MISR_LINK_INT_EN		BIT(5) /* Link status */
+#define TLK10X_MISR_ED_INT_EN		BIT(6) /* Energy detect */
+#define TLK10X_MISR_LQM_INT_EN		BIT(7) /* Link Quality Monitor */
+
+/* PWRBOCR Register Fields */
+#define TLK10X_PWRBOCR_MASK		0xe0 /* Power Backoff Mask */
+
+#define TLK10X_INT_EN_MASK		\
+	(TLK10X_MISR_ANC_INT_EN |	\
+	 TLK10X_MISR_DUP_INT_EN |	\
+	 TLK10X_MISR_SPD_INT_EN |	\
+	 TLK10X_MISR_LINK_INT_EN)
+
+struct tlk10x_private {
+	int pwrbo_level;
+};
+
+static int tlk10x_read(struct phy_device *phydev, int reg)
+{
+	if (reg & ~0x1f) {
+		/* Extended register */
+		phy_write(phydev, TLK10X_REGCR, 0x001F);
+		phy_write(phydev, TLK10X_ADDAR, reg);
+		phy_write(phydev, TLK10X_REGCR, 0x401F);
+		reg = TLK10X_ADDAR;
+	}
+
+	return phy_read(phydev, reg);
+}
+
+static int tlk10x_write(struct phy_device *phydev, int reg, int val)
+{
+	if (reg & ~0x1f) {
+		/* Extended register */
+		phy_write(phydev, TLK10X_REGCR, 0x001F);
+		phy_write(phydev, TLK10X_ADDAR, reg);
+		phy_write(phydev, TLK10X_REGCR, 0x401F);
+		reg = TLK10X_ADDAR;
+	}
+
+	return phy_write(phydev, reg, val);
+}
+
+#ifdef CONFIG_OF_MDIO
+static int tlk10x_of_init(struct phy_device *phydev)
+{
+	struct tlk10x_private *tlk10x = phydev->priv;
+	struct device *dev = &phydev->mdio.dev;
+	struct device_node *of_node = dev->of_node;
+	int ret;
+
+	if (!of_node)
+		return 0;
+
+	ret = of_property_read_u32(of_node, "ti,power-back-off",
+				   &tlk10x->pwrbo_level);
+	if (ret) {
+		dev_err(dev, "missing ti,power-back-off property");
+		tlk10x->pwrbo_level = 0;
+	}
+
+	return 0;
+}
+#else
+static int tlk10x_of_init(struct phy_device *phydev)
+{
+	return 0;
+}
+#endif /* CONFIG_OF_MDIO */
+
+static int tlk10x_config_init(struct phy_device *phydev)
+{
+	int ret, reg;
+	struct tlk10x_private *tlk10x;
+
+	ret = genphy_config_init(phydev);
+	if (ret < 0)
+		return ret;
+
+	if (!phydev->priv) {
+		tlk10x = devm_kzalloc(&phydev->mdio.dev, sizeof(*tlk10x),
+				      GFP_KERNEL);
+		if (!tlk10x)
+			return -ENOMEM;
+
+		phydev->priv = tlk10x;
+		ret = tlk10x_of_init(phydev);
+		if (ret)
+			return ret;
+	} else {
+		tlk10x = (struct tlk10x_private *)phydev->priv;
+	}
+
+	// Power back off
+	if (tlk10x->pwrbo_level < 0 || tlk10x->pwrbo_level > 3)
+		tlk10x->pwrbo_level = 0;
+	reg = tlk10x_read(phydev, TLK10X_PWRBOCR);
+	reg = ((reg & ~TLK10X_PWRBOCR_MASK)
+		| (tlk10x->pwrbo_level << 6));
+	ret = tlk10x_write(phydev, TLK10X_PWRBOCR, reg);
+	if (ret < 0) {
+		dev_err(&phydev->mdio.dev,
+			"unable to set power back-off (err=%d)\n", ret);
+		return ret;
+	}
+	dev_info(&phydev->mdio.dev, "power back-off set to level %d\n",
+		 tlk10x->pwrbo_level);
+
+	return 0;
+}
+
+static int tlk10x_ack_interrupt(struct phy_device *phydev)
+{
+	int err = tlk10x_read(phydev, TLK10X_MISR);
+
+	return err < 0 ? err : 0;
+}
+
+static int tlk10x_config_intr(struct phy_device *phydev)
+{
+	int control, ret;
+
+	control = tlk10x_read(phydev, TLK10X_MICR);
+	if (control < 0)
+		return control;
+
+	if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
+		control |= TLK10X_MICR_INT_OE;
+		control |= TLK10X_MICR_INTEN;
+
+		ret = tlk10x_write(phydev, TLK10X_MISR, TLK10X_INT_EN_MASK);
+		if (ret < 0)
+			return ret;
+	} else {
+		control &= ~TLK10X_MICR_INTEN;
+	}
+
+	return tlk10x_write(phydev, TLK10X_MICR, control);
+}
+
+static struct phy_driver tlk10x_driver[] = {
+	{
+		.phy_id		= TLK10X_PHY_ID,
+		.phy_id_mask	= 0xfffffff0,
+		.name		= "TI TLK10X 10/100 Mbps PHY",
+		.features	= PHY_BASIC_FEATURES,
+		.flags		= PHY_HAS_INTERRUPT,
+
+		.config_init	= tlk10x_config_init,
+		.soft_reset	= genphy_soft_reset,
+
+		/* IRQ related */
+		.ack_interrupt	= tlk10x_ack_interrupt,
+		.config_intr	= tlk10x_config_intr,
+
+		.suspend	= genphy_suspend,
+		.resume		= genphy_resume,
+	},
+};
+module_phy_driver(tlk10x_driver);
+
+static struct mdio_device_id __maybe_unused tlk10x_tbl[] = {
+	{ TLK10X_PHY_ID, 0xfffffff0 },
+	{ }
+};
+MODULE_DEVICE_TABLE(mdio, tlk10x_tbl);
+
+MODULE_DESCRIPTION("Texas Instruments TLK105 / TLK106 PHY driver");
+MODULE_AUTHOR("Mans Andersson <mans.andersson@nibe.se>");
+MODULE_LICENSE("GPL");
-- 
2.9.3

^ permalink raw reply related

* RE: [PATCH] net: phy: marvell: clear wol event before setting it
From: Bhadram Varka @ 2018-04-19  8:38 UTC (permalink / raw)
  To: Jisheng Zhang, Andrew Lunn, Florian Fainelli, David S. Miller
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Jingju Hou
In-Reply-To: <20180419160232.519d15be@xhacker.debian>

Hi,

> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Jisheng Zhang
> Sent: Thursday, April 19, 2018 1:33 PM
> To: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>;
> David S. Miller <davem@davemloft.net>
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jingju Hou
> <Jingju.Hou@synaptics.com>
> Subject: [PATCH] net: phy: marvell: clear wol event before setting it
> 
> From: Jingju Hou <Jingju.Hou@synaptics.com>
> 
> If WOL event happened once, the LED[2] interrupt pin will not be cleared unless
> reading the CSISR register. So clear the WOL event before enabling it.
> 
> Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com>
> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
> ---
>  drivers/net/phy/marvell.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c index
> c22e8e383247..b6abe1cbc84b 100644
> --- a/drivers/net/phy/marvell.c
> +++ b/drivers/net/phy/marvell.c
> @@ -115,6 +115,9 @@
>  /* WOL Event Interrupt Enable */
>  #define MII_88E1318S_PHY_CSIER_WOL_EIE			BIT(7)
> 
> +/* Copper Specific Interrupt Status Register */
> +#define MII_88E1318S_PHY_CSISR				0x13
> +

There is already macro to represent this register - MII_M1011_IEVENT. Do we need this macro ?

>  /* LED Timer Control Register */
>  #define MII_88E1318S_PHY_LED_TCR			0x12
>  #define MII_88E1318S_PHY_LED_TCR_FORCE_INT		BIT(15)
> @@ -1393,6 +1396,12 @@ static int m88e1318_set_wol(struct phy_device
> *phydev,
>  		if (err < 0)
>  			goto error;
> 
> +		/* If WOL event happened once, the LED[2] interrupt pin
> +		 * will not be cleared unless reading the CSISR register.
> +		 * So clear the WOL event first before enabling it.
> +		 */
> +		phy_read(phydev, MII_88E1318S_PHY_CSISR);

This part of the operation already taken care by ack_interrupt and did_interrupt
[....]
.ack_interrupt = &marvell_ack_interrupt,
.did_interrupt = &m88e1121_did_interrupt,
[...]

If at all WOL event occurred marvell_ack_interrupt will take care of clearing the interrupt status register.
Am I missing anything here ?

>  		/* Enable the WOL interrupt */
>  		err = __phy_modify(phydev, MII_88E1318S_PHY_CSIER, 0,
>  				   MII_88E1318S_PHY_CSIER_WOL_EIE);
> --
> 2.17.0

^ permalink raw reply

* Re: [PATCH] net: phy: marvell: clear wol event before setting it
From: Jisheng Zhang @ 2018-04-19  8:53 UTC (permalink / raw)
  To: Bhadram Varka
  Cc: Andrew Lunn, Florian Fainelli, David S. Miller,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Jingju Hou
In-Reply-To: <96e77eac86794bef9a5b772147527c67@bgmail102.nvidia.com>

Hi,

On Thu, 19 Apr 2018 08:38:45 +0000 Bhadram Varka wrote:

> Hi,
> 
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> > Behalf Of Jisheng Zhang
> > Sent: Thursday, April 19, 2018 1:33 PM
> > To: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>;
> > David S. Miller <davem@davemloft.net>
> > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jingju Hou
> > <Jingju.Hou@synaptics.com>
> > Subject: [PATCH] net: phy: marvell: clear wol event before setting it
> > 
> > From: Jingju Hou <Jingju.Hou@synaptics.com>
> > 
> > If WOL event happened once, the LED[2] interrupt pin will not be cleared unless
> > reading the CSISR register. So clear the WOL event before enabling it.
> > 
> > Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com>
> > Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
> > ---
> >  drivers/net/phy/marvell.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c index
> > c22e8e383247..b6abe1cbc84b 100644
> > --- a/drivers/net/phy/marvell.c
> > +++ b/drivers/net/phy/marvell.c
> > @@ -115,6 +115,9 @@
> >  /* WOL Event Interrupt Enable */
> >  #define MII_88E1318S_PHY_CSIER_WOL_EIE			BIT(7)
> > 
> > +/* Copper Specific Interrupt Status Register */
> > +#define MII_88E1318S_PHY_CSISR				0x13
> > +  
> 
> There is already macro to represent this register - MII_M1011_IEVENT. Do we need this macro ?

Good point. Will use MII_M1011_IEVENT instead in v2.

> 
> >  /* LED Timer Control Register */
> >  #define MII_88E1318S_PHY_LED_TCR			0x12
> >  #define MII_88E1318S_PHY_LED_TCR_FORCE_INT		BIT(15)
> > @@ -1393,6 +1396,12 @@ static int m88e1318_set_wol(struct phy_device
> > *phydev,
> >  		if (err < 0)
> >  			goto error;
> > 
> > +		/* If WOL event happened once, the LED[2] interrupt pin
> > +		 * will not be cleared unless reading the CSISR register.
> > +		 * So clear the WOL event first before enabling it.
> > +		 */
> > +		phy_read(phydev, MII_88E1318S_PHY_CSISR);  
> 
> This part of the operation already taken care by ack_interrupt and did_interrupt
> [....]
> .ack_interrupt = &marvell_ack_interrupt,
> .did_interrupt = &m88e1121_did_interrupt,
> [...]
> 
> If at all WOL event occurred marvell_ack_interrupt will take care of clearing the interrupt status register.
> Am I missing anything here ?

If there's no valid irq for phy, the ack_interrupt/did_interrupt won't
be called.


Thanks

^ permalink raw reply

* [PATCH 1/5] netfilter: ipvs: Fix space before '[' error.
From: Simon Horman @ 2018-04-19  8:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Arvind Yadav, Simon Horman
In-Reply-To: <20180419085614.7437-1-horms@verge.net.au>

From: Arvind Yadav <arvind.yadav.cs@gmail.com>

Fix checkpatch.pl error:
ERROR: space prohibited before open square bracket '['.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_proto_tcp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index bcd9b7bde4ee..569631d2b2a1 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -436,7 +436,7 @@ static bool tcp_state_active(int state)
 	return tcp_state_active_table[state];
 }
 
-static struct tcp_states_t tcp_states [] = {
+static struct tcp_states_t tcp_states[] = {
 /*	INPUT */
 /*        sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA	*/
 /*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSR }},
@@ -459,7 +459,7 @@ static struct tcp_states_t tcp_states [] = {
 /*rst*/ {{sCL, sCL, sCL, sSR, sCL, sCL, sCL, sCL, sLA, sLI, sCL }},
 };
 
-static struct tcp_states_t tcp_states_dos [] = {
+static struct tcp_states_t tcp_states_dos[] = {
 /*	INPUT */
 /*        sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA	*/
 /*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSA }},
-- 
2.11.0

^ permalink raw reply related

* [PATCH 2/5] netfilter: ipvs: Keep latest weight of destination
From: Simon Horman @ 2018-04-19  8:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Inju Song, Simon Horman
In-Reply-To: <20180419085614.7437-1-horms@verge.net.au>

From: Inju Song <inju.song@navercorp.com>

The hashing table in scheduler such as source hash or maglev hash
should ignore the changed weight to 0 and allow changing the weight
from/to non-0 values. So, struct ip_vs_dest needs to keep weight
with latest non-0 weight.

Signed-off-by: Inju Song <inju.song@navercorp.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h            | 1 +
 net/netfilter/ipvs/ip_vs_ctl.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index eb0bec043c96..0ac795b41ab8 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -668,6 +668,7 @@ struct ip_vs_dest {
 	volatile unsigned int	flags;		/* dest status flags */
 	atomic_t		conn_flags;	/* flags to copy to conn */
 	atomic_t		weight;		/* server weight */
+	atomic_t		last_weight;	/* server latest weight */
 
 	refcount_t		refcnt;		/* reference counter */
 	struct ip_vs_stats      stats;          /* statistics */
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 5ebde4b15810..b91bb70ece92 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -821,6 +821,10 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest,
 	if (add && udest->af != svc->af)
 		ipvs->mixed_address_family_dests++;
 
+	/* keep the last_weight with latest non-0 weight */
+	if (add || udest->weight != 0)
+		atomic_set(&dest->last_weight, udest->weight);
+
 	/* set the weight and the flags */
 	atomic_set(&dest->weight, udest->weight);
 	conn_flags = udest->conn_flags & IP_VS_CONN_F_DEST_MASK;
-- 
2.11.0

^ permalink raw reply related

* [GIT PULL 0/5] IPVS Updates for v4.18
From: Simon Horman @ 2018-04-19  8:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Simon Horman

Hi Pablo,

please consider these IPVS enhancements for v4.18.

* Whitepace cleanup

* Add Maglev hashing algorithm as a IPVS scheduler

  Inju Song says "Implements the Google's Maglev hashing algorithm as a
  IPVS scheduler.  Basically it provides consistent hashing but offers some
  special features about disruption and load balancing.

  1) minimal disruption: when the set of destinations changes,
     a connection will likely be sent to the same destination
     as it was before.

  2) load balancing: each destination will receive an almost
     equal number of connections.

 Seel also: [3.4 Consistent Hasing] in
 https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf
 "

* Fix to correct implementation of Knuth's multiplicative hashing
  which is used in sh/dh/lblc/lblcr algorithms. Instead the
  implementation provided by the hash_32() macro is used.

The following changes since commit 159f02977b2feb18a4bece5e586c838a6d26d44b:

  Merge branch 'net-mvneta-improve-suspend-resume' (2018-04-02 11:14:03 -0400)

are available in the git repository at:

  http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git tags/ipvs-for-v4.18

for you to fetch changes up to 9a17740e0ea1c9b1edd89836bb27c76272f54641:

  ipvs: fix multiplicative hashing in sh/dh/lblc/lblcr algorithms (2018-04-09 10:15:27 +0300)

----------------------------------------------------------------
Arvind Yadav (1):
      netfilter: ipvs: Fix space before '[' error.

Inju Song (3):
      netfilter: ipvs: Keep latest weight of destination
      netfilter: ipvs: Add Maglev hashing scheduler
      netfilter: ipvs: Add configurations of Maglev hashing

Vincent Bernat (1):
      ipvs: fix multiplicative hashing in sh/dh/lblc/lblcr algorithms

 include/net/ip_vs.h                  |   1 +
 net/netfilter/ipvs/Kconfig           |  37 +++
 net/netfilter/ipvs/Makefile          |   1 +
 net/netfilter/ipvs/ip_vs_ctl.c       |   4 +
 net/netfilter/ipvs/ip_vs_dh.c        |   3 +-
 net/netfilter/ipvs/ip_vs_lblc.c      |   3 +-
 net/netfilter/ipvs/ip_vs_lblcr.c     |   3 +-
 net/netfilter/ipvs/ip_vs_mh.c        | 540 +++++++++++++++++++++++++++++++++++
 net/netfilter/ipvs/ip_vs_proto_tcp.c |   4 +-
 net/netfilter/ipvs/ip_vs_sh.c        |   3 +-
 10 files changed, 593 insertions(+), 6 deletions(-)
 create mode 100644 net/netfilter/ipvs/ip_vs_mh.c

^ permalink raw reply

* [PATCH 4/5] netfilter: ipvs: Add configurations of Maglev hashing
From: Simon Horman @ 2018-04-19  8:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Inju Song, Simon Horman
In-Reply-To: <20180419085614.7437-1-horms@verge.net.au>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 3306 bytes --]

From: Inju Song <inju.song@navercorp.com>

To build the maglev hashing scheduler, add some configuration
to Kconfig and Makefile.

 - The compile configurations of MH are added to the Kconfig.

 - The MH build rule is added to the Makefile.

Signed-off-by: Inju Song <inju.song@navercorp.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/Kconfig  | 37 +++++++++++++++++++++++++++++++++++++
 net/netfilter/ipvs/Makefile |  1 +
 2 files changed, 38 insertions(+)

diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
index b32fb0dbe237..05dc1b77e466 100644
--- a/net/netfilter/ipvs/Kconfig
+++ b/net/netfilter/ipvs/Kconfig
@@ -225,6 +225,25 @@ config	IP_VS_SH
 	  If you want to compile it in kernel, say Y. To compile it as a
 	  module, choose M here. If unsure, say N.
 
+config	IP_VS_MH
+	tristate "maglev hashing scheduling"
+	---help---
+	  The maglev consistent hashing scheduling algorithm provides the
+	  Google's Maglev hashing algorithm as a IPVS scheduler. It assigns
+	  network connections to the servers through looking up a statically
+	  assigned special hash table called the lookup table. Maglev hashing
+	  is to assign a preference list of all the lookup table positions
+	  to each destination.
+
+	  Through this operation, The maglev hashing gives an almost equal
+	  share of the lookup table to each of the destinations and provides
+	  minimal disruption by using the lookup table. When the set of
+	  destinations changes, a connection will likely be sent to the same
+	  destination as it was before.
+
+	  If you want to compile it in kernel, say Y. To compile it as a
+	  module, choose M here. If unsure, say N.
+
 config	IP_VS_SED
 	tristate "shortest expected delay scheduling"
 	---help---
@@ -266,6 +285,24 @@ config IP_VS_SH_TAB_BITS
 	  needs to be large enough to effectively fit all the destinations
 	  multiplied by their respective weights.
 
+comment 'IPVS MH scheduler'
+
+config IP_VS_MH_TAB_INDEX
+	int "IPVS maglev hashing table index of size (the prime numbers)"
+	range 8 17
+	default 12
+	---help---
+	  The maglev hashing scheduler maps source IPs to destinations
+	  stored in a hash table. This table is assigned by a preference
+	  list of the positions to each destination until all slots in
+	  the table are filled. The index determines the prime for size of
+	  the table as 251, 509, 1021, 2039, 4093, 8191, 16381, 32749,
+	  65521 or 131071. When using weights to allow destinations to
+	  receive more connections, the table is assigned an amount
+	  proportional to the weights specified. The table needs to be large
+	  enough to effectively fit all the destinations multiplied by their
+	  respective weights.
+
 comment 'IPVS application helper'
 
 config	IP_VS_FTP
diff --git a/net/netfilter/ipvs/Makefile b/net/netfilter/ipvs/Makefile
index c552993fa4b9..bfce2677fda2 100644
--- a/net/netfilter/ipvs/Makefile
+++ b/net/netfilter/ipvs/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_IP_VS_LBLC) += ip_vs_lblc.o
 obj-$(CONFIG_IP_VS_LBLCR) += ip_vs_lblcr.o
 obj-$(CONFIG_IP_VS_DH) += ip_vs_dh.o
 obj-$(CONFIG_IP_VS_SH) += ip_vs_sh.o
+obj-$(CONFIG_IP_VS_MH) += ip_vs_mh.o
 obj-$(CONFIG_IP_VS_SED) += ip_vs_sed.o
 obj-$(CONFIG_IP_VS_NQ) += ip_vs_nq.o
 
-- 
2.11.0

^ permalink raw reply related

* [PATCH 3/5] netfilter: ipvs: Add Maglev hashing scheduler
From: Simon Horman @ 2018-04-19  8:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Inju Song, Simon Horman
In-Reply-To: <20180419085614.7437-1-horms@verge.net.au>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 15318 bytes --]

From: Inju Song <inju.song@navercorp.com>

Implements the Google's Maglev hashing algorithm as a IPVS scheduler.

Basically it provides consistent hashing butÂ offers some special
features about disruption and load balancing.

 1)Â minimal disruption: when the set of destinations changes,
    a connection will likely be sent to the same destination
    as it was before.

 2)Â load balancing: each destination will receive an almost
    equal number of connections.

Seel also for detail: [3.4 Consistent Hasing] in
https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf

Signed-off-by: Inju Song <inju.song@navercorp.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_mh.c | 540 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 540 insertions(+)
 create mode 100644 net/netfilter/ipvs/ip_vs_mh.c

diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
new file mode 100644
index 000000000000..0f795b186eb3
--- /dev/null
+++ b/net/netfilter/ipvs/ip_vs_mh.c
@@ -0,0 +1,540 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPVS:	Maglev Hashing scheduling module
+ *
+ * Authors:	Inju Song <inju.song@navercorp.com>
+ *
+ */
+
+/* The mh algorithm is to assign a preference list of all the lookup
+ * table positions to each destination and populate the table with
+ * the most-preferred position of destinations. Then it is to select
+ * destination with the hash key of source IP address through looking
+ * up a the lookup table.
+ *
+ * The algorithm is detailed in:
+ * [3.4 Consistent Hasing]
+https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf
+ *
+ */
+
+#define KMSG_COMPONENT "IPVS"
+#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+
+#include <linux/ip.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+
+#include <net/ip_vs.h>
+
+#include <linux/siphash.h>
+#include <linux/bitops.h>
+#include <linux/gcd.h>
+
+#define IP_VS_SVC_F_SCHED_MH_FALLBACK	IP_VS_SVC_F_SCHED1 /* MH fallback */
+#define IP_VS_SVC_F_SCHED_MH_PORT	IP_VS_SVC_F_SCHED2 /* MH use port */
+
+struct ip_vs_mh_lookup {
+	struct ip_vs_dest __rcu	*dest;	/* real server (cache) */
+};
+
+struct ip_vs_mh_dest_setup {
+	unsigned int	offset; /* starting offset */
+	unsigned int	skip;	/* skip */
+	unsigned int	perm;	/* next_offset */
+	int		turns;	/* weight / gcd() and rshift */
+};
+
+/* Available prime numbers for MH table */
+static int primes[] = {251, 509, 1021, 2039, 4093,
+		       8191, 16381, 32749, 65521, 131071};
+
+/* For IPVS MH entry hash table */
+#ifndef CONFIG_IP_VS_MH_TAB_INDEX
+#define CONFIG_IP_VS_MH_TAB_INDEX	12
+#endif
+#define IP_VS_MH_TAB_BITS		(CONFIG_IP_VS_MH_TAB_INDEX / 2)
+#define IP_VS_MH_TAB_INDEX		(CONFIG_IP_VS_MH_TAB_INDEX - 8)
+#define IP_VS_MH_TAB_SIZE               primes[IP_VS_MH_TAB_INDEX]
+
+struct ip_vs_mh_state {
+	struct rcu_head			rcu_head;
+	struct ip_vs_mh_lookup		*lookup;
+	struct ip_vs_mh_dest_setup	*dest_setup;
+	hsiphash_key_t			hash1, hash2;
+	int				gcd;
+	int				rshift;
+};
+
+static inline void generate_hash_secret(hsiphash_key_t *hash1,
+					hsiphash_key_t *hash2)
+{
+	hash1->key[0] = 2654435761UL;
+	hash1->key[1] = 2654435761UL;
+
+	hash2->key[0] = 2654446892UL;
+	hash2->key[1] = 2654446892UL;
+}
+
+/* Helper function to determine if server is unavailable */
+static inline bool is_unavailable(struct ip_vs_dest *dest)
+{
+	return atomic_read(&dest->weight) <= 0 ||
+	       dest->flags & IP_VS_DEST_F_OVERLOAD;
+}
+
+/* Returns hash value for IPVS MH entry */
+static inline unsigned int
+ip_vs_mh_hashkey(int af, const union nf_inet_addr *addr,
+		 __be16 port, hsiphash_key_t *key, unsigned int offset)
+{
+	unsigned int v;
+	__be32 addr_fold = addr->ip;
+
+#ifdef CONFIG_IP_VS_IPV6
+	if (af == AF_INET6)
+		addr_fold = addr->ip6[0] ^ addr->ip6[1] ^
+			    addr->ip6[2] ^ addr->ip6[3];
+#endif
+	v = (offset + ntohs(port) + ntohl(addr_fold));
+	return hsiphash(&v, sizeof(v), key);
+}
+
+/* Reset all the hash buckets of the specified table. */
+static void ip_vs_mh_reset(struct ip_vs_mh_state *s)
+{
+	int i;
+	struct ip_vs_mh_lookup *l;
+	struct ip_vs_dest *dest;
+
+	l = &s->lookup[0];
+	for (i = 0; i < IP_VS_MH_TAB_SIZE; i++) {
+		dest = rcu_dereference_protected(l->dest, 1);
+		if (dest) {
+			ip_vs_dest_put(dest);
+			RCU_INIT_POINTER(l->dest, NULL);
+		}
+		l++;
+	}
+}
+
+static int ip_vs_mh_permutate(struct ip_vs_mh_state *s,
+			      struct ip_vs_service *svc)
+{
+	struct list_head *p;
+	struct ip_vs_mh_dest_setup *ds;
+	struct ip_vs_dest *dest;
+	int lw;
+
+	/* If gcd is smaller then 1, number of dests or
+	 * all last_weight of dests are zero. So, skip
+	 * permutation for the dests.
+	 */
+	if (s->gcd < 1)
+		return 0;
+
+	/* Set dest_setup for the dests permutation */
+	p = &svc->destinations;
+	ds = &s->dest_setup[0];
+	while ((p = p->next) != &svc->destinations) {
+		dest = list_entry(p, struct ip_vs_dest, n_list);
+
+		ds->offset = ip_vs_mh_hashkey(svc->af, &dest->addr,
+					      dest->port, &s->hash1, 0) %
+					      IP_VS_MH_TAB_SIZE;
+		ds->skip = ip_vs_mh_hashkey(svc->af, &dest->addr,
+					    dest->port, &s->hash2, 0) %
+					    (IP_VS_MH_TAB_SIZE - 1) + 1;
+		ds->perm = ds->offset;
+
+		lw = atomic_read(&dest->last_weight);
+		ds->turns = ((lw / s->gcd) >> s->rshift) ? : (lw != 0);
+		ds++;
+	}
+
+	return 0;
+}
+
+static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
+			     struct ip_vs_service *svc)
+{
+	int n, c, dt_count;
+	unsigned long *table;
+	struct list_head *p;
+	struct ip_vs_mh_dest_setup *ds;
+	struct ip_vs_dest *dest, *new_dest;
+
+	/* If gcd is smaller then 1, number of dests or
+	 * all last_weight of dests are zero. So, skip
+	 * the population for the dests and reset lookup table.
+	 */
+	if (s->gcd < 1) {
+		ip_vs_mh_reset(s);
+		return 0;
+	}
+
+	table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
+			 sizeof(unsigned long), GFP_KERNEL);
+	if (!table)
+		return -ENOMEM;
+
+	p = &svc->destinations;
+	n = 0;
+	dt_count = 0;
+	while (n < IP_VS_MH_TAB_SIZE) {
+		if (p == &svc->destinations)
+			p = p->next;
+
+		ds = &s->dest_setup[0];
+		while (p != &svc->destinations) {
+			/* Ignore added server with zero weight */
+			if (ds->turns < 1) {
+				p = p->next;
+				ds++;
+				continue;
+			}
+
+			c = ds->perm;
+			while (test_bit(c, table)) {
+				/* Add skip, mod IP_VS_MH_TAB_SIZE */
+				ds->perm += ds->skip;
+				if (ds->perm >= IP_VS_MH_TAB_SIZE)
+					ds->perm -= IP_VS_MH_TAB_SIZE;
+				c = ds->perm;
+			}
+
+			__set_bit(c, table);
+
+			dest = rcu_dereference_protected(s->lookup[c].dest, 1);
+			new_dest = list_entry(p, struct ip_vs_dest, n_list);
+			if (dest != new_dest) {
+				if (dest)
+					ip_vs_dest_put(dest);
+				ip_vs_dest_hold(new_dest);
+				RCU_INIT_POINTER(s->lookup[c].dest, new_dest);
+			}
+
+			if (++n == IP_VS_MH_TAB_SIZE)
+				goto out;
+
+			if (++dt_count >= ds->turns) {
+				dt_count = 0;
+				p = p->next;
+				ds++;
+			}
+		}
+	}
+
+out:
+	kfree(table);
+	return 0;
+}
+
+/* Get ip_vs_dest associated with supplied parameters. */
+static inline struct ip_vs_dest *
+ip_vs_mh_get(struct ip_vs_service *svc, struct ip_vs_mh_state *s,
+	     const union nf_inet_addr *addr, __be16 port)
+{
+	unsigned int hash = ip_vs_mh_hashkey(svc->af, addr, port, &s->hash1, 0)
+					     % IP_VS_MH_TAB_SIZE;
+	struct ip_vs_dest *dest = rcu_dereference(s->lookup[hash].dest);
+
+	return (!dest || is_unavailable(dest)) ? NULL : dest;
+}
+
+/* As ip_vs_mh_get, but with fallback if selected server is unavailable */
+static inline struct ip_vs_dest *
+ip_vs_mh_get_fallback(struct ip_vs_service *svc, struct ip_vs_mh_state *s,
+		      const union nf_inet_addr *addr, __be16 port)
+{
+	unsigned int offset, roffset;
+	unsigned int hash, ihash;
+	struct ip_vs_dest *dest;
+
+	/* First try the dest it's supposed to go to */
+	ihash = ip_vs_mh_hashkey(svc->af, addr, port,
+				 &s->hash1, 0) % IP_VS_MH_TAB_SIZE;
+	dest = rcu_dereference(s->lookup[ihash].dest);
+	if (!dest)
+		return NULL;
+	if (!is_unavailable(dest))
+		return dest;
+
+	IP_VS_DBG_BUF(6, "MH: selected unavailable server %s:%u, reselecting",
+		      IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port));
+
+	/* If the original dest is unavailable, loop around the table
+	 * starting from ihash to find a new dest
+	 */
+	for (offset = 0; offset < IP_VS_MH_TAB_SIZE; offset++) {
+		roffset = (offset + ihash) % IP_VS_MH_TAB_SIZE;
+		hash = ip_vs_mh_hashkey(svc->af, addr, port, &s->hash1,
+					roffset) % IP_VS_MH_TAB_SIZE;
+		dest = rcu_dereference(s->lookup[hash].dest);
+		if (!dest)
+			break;
+		if (!is_unavailable(dest))
+			return dest;
+		IP_VS_DBG_BUF(6,
+			      "MH: selected unavailable server %s:%u (offset %u), reselecting",
+			      IP_VS_DBG_ADDR(dest->af, &dest->addr),
+			      ntohs(dest->port), roffset);
+	}
+
+	return NULL;
+}
+
+/* Assign all the hash buckets of the specified table with the service. */
+static int ip_vs_mh_reassign(struct ip_vs_mh_state *s,
+			     struct ip_vs_service *svc)
+{
+	int ret;
+
+	if (svc->num_dests > IP_VS_MH_TAB_SIZE)
+		return -EINVAL;
+
+	if (svc->num_dests >= 1) {
+		s->dest_setup = kcalloc(svc->num_dests,
+					sizeof(struct ip_vs_mh_dest_setup),
+					GFP_KERNEL);
+		if (!s->dest_setup)
+			return -ENOMEM;
+	}
+
+	ip_vs_mh_permutate(s, svc);
+
+	ret = ip_vs_mh_populate(s, svc);
+	if (ret < 0)
+		goto out;
+
+	IP_VS_DBG_BUF(6, "MH: reassign lookup table of %s:%u\n",
+		      IP_VS_DBG_ADDR(svc->af, &svc->addr),
+		      ntohs(svc->port));
+
+out:
+	if (svc->num_dests >= 1) {
+		kfree(s->dest_setup);
+		s->dest_setup = NULL;
+	}
+	return ret;
+}
+
+static int ip_vs_mh_gcd_weight(struct ip_vs_service *svc)
+{
+	struct ip_vs_dest *dest;
+	int weight;
+	int g = 0;
+
+	list_for_each_entry(dest, &svc->destinations, n_list) {
+		weight = atomic_read(&dest->last_weight);
+		if (weight > 0) {
+			if (g > 0)
+				g = gcd(weight, g);
+			else
+				g = weight;
+		}
+	}
+	return g;
+}
+
+/* To avoid assigning huge weight for the MH table,
+ * calculate shift value with gcd.
+ */
+static int ip_vs_mh_shift_weight(struct ip_vs_service *svc, int gcd)
+{
+	struct ip_vs_dest *dest;
+	int new_weight, weight = 0;
+	int mw, shift;
+
+	/* If gcd is smaller then 1, number of dests or
+	 * all last_weight of dests are zero. So, return
+	 * shift value as zero.
+	 */
+	if (gcd < 1)
+		return 0;
+
+	list_for_each_entry(dest, &svc->destinations, n_list) {
+		new_weight = atomic_read(&dest->last_weight);
+		if (new_weight > weight)
+			weight = new_weight;
+	}
+
+	/* Because gcd is greater than zero,
+	 * the maximum weight and gcd are always greater than zero
+	 */
+	mw = weight / gcd;
+
+	/* shift = occupied bits of weight/gcd - MH highest bits */
+	shift = fls(mw) - IP_VS_MH_TAB_BITS;
+	return (shift >= 0) ? shift : 0;
+}
+
+static void ip_vs_mh_state_free(struct rcu_head *head)
+{
+	struct ip_vs_mh_state *s;
+
+	s = container_of(head, struct ip_vs_mh_state, rcu_head);
+	kfree(s->lookup);
+	kfree(s);
+}
+
+static int ip_vs_mh_init_svc(struct ip_vs_service *svc)
+{
+	int ret;
+	struct ip_vs_mh_state *s;
+
+	/* Allocate the MH table for this service */
+	s = kzalloc(sizeof(*s), GFP_KERNEL);
+	if (!s)
+		return -ENOMEM;
+
+	s->lookup = kcalloc(IP_VS_MH_TAB_SIZE, sizeof(struct ip_vs_mh_lookup),
+			    GFP_KERNEL);
+	if (!s->lookup) {
+		kfree(s);
+		return -ENOMEM;
+	}
+
+	generate_hash_secret(&s->hash1, &s->hash2);
+	s->gcd = ip_vs_mh_gcd_weight(svc);
+	s->rshift = ip_vs_mh_shift_weight(svc, s->gcd);
+
+	IP_VS_DBG(6,
+		  "MH lookup table (memory=%zdbytes) allocated for current service\n",
+		  sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE);
+
+	/* Assign the lookup table with current dests */
+	ret = ip_vs_mh_reassign(s, svc);
+	if (ret < 0) {
+		ip_vs_mh_reset(s);
+		ip_vs_mh_state_free(&s->rcu_head);
+		return ret;
+	}
+
+	/* No more failures, attach state */
+	svc->sched_data = s;
+	return 0;
+}
+
+static void ip_vs_mh_done_svc(struct ip_vs_service *svc)
+{
+	struct ip_vs_mh_state *s = svc->sched_data;
+
+	/* Got to clean up lookup entry here */
+	ip_vs_mh_reset(s);
+
+	call_rcu(&s->rcu_head, ip_vs_mh_state_free);
+	IP_VS_DBG(6, "MH lookup table (memory=%zdbytes) released\n",
+		  sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE);
+}
+
+static int ip_vs_mh_dest_changed(struct ip_vs_service *svc,
+				 struct ip_vs_dest *dest)
+{
+	struct ip_vs_mh_state *s = svc->sched_data;
+
+	s->gcd = ip_vs_mh_gcd_weight(svc);
+	s->rshift = ip_vs_mh_shift_weight(svc, s->gcd);
+
+	/* Assign the lookup table with the updated service */
+	return ip_vs_mh_reassign(s, svc);
+}
+
+/* Helper function to get port number */
+static inline __be16
+ip_vs_mh_get_port(const struct sk_buff *skb, struct ip_vs_iphdr *iph)
+{
+	__be16 _ports[2], *ports;
+
+	/* At this point we know that we have a valid packet of some kind.
+	 * Because ICMP packets are only guaranteed to have the first 8
+	 * bytes, let's just grab the ports.  Fortunately they're in the
+	 * same position for all three of the protocols we care about.
+	 */
+	switch (iph->protocol) {
+	case IPPROTO_TCP:
+	case IPPROTO_UDP:
+	case IPPROTO_SCTP:
+		ports = skb_header_pointer(skb, iph->len, sizeof(_ports),
+					   &_ports);
+		if (unlikely(!ports))
+			return 0;
+
+		if (likely(!ip_vs_iph_inverse(iph)))
+			return ports[0];
+		else
+			return ports[1];
+	default:
+		return 0;
+	}
+}
+
+/* Maglev Hashing scheduling */
+static struct ip_vs_dest *
+ip_vs_mh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
+		  struct ip_vs_iphdr *iph)
+{
+	struct ip_vs_dest *dest;
+	struct ip_vs_mh_state *s;
+	__be16 port = 0;
+	const union nf_inet_addr *hash_addr;
+
+	hash_addr = ip_vs_iph_inverse(iph) ? &iph->daddr : &iph->saddr;
+
+	IP_VS_DBG(6, "%s : Scheduling...\n", __func__);
+
+	if (svc->flags & IP_VS_SVC_F_SCHED_MH_PORT)
+		port = ip_vs_mh_get_port(skb, iph);
+
+	s = (struct ip_vs_mh_state *)svc->sched_data;
+
+	if (svc->flags & IP_VS_SVC_F_SCHED_MH_FALLBACK)
+		dest = ip_vs_mh_get_fallback(svc, s, hash_addr, port);
+	else
+		dest = ip_vs_mh_get(svc, s, hash_addr, port);
+
+	if (!dest) {
+		ip_vs_scheduler_err(svc, "no destination available");
+		return NULL;
+	}
+
+	IP_VS_DBG_BUF(6, "MH: source IP address %s:%u --> server %s:%u\n",
+		      IP_VS_DBG_ADDR(svc->af, hash_addr),
+		      ntohs(port),
+		      IP_VS_DBG_ADDR(dest->af, &dest->addr),
+		      ntohs(dest->port));
+
+	return dest;
+}
+
+/* IPVS MH Scheduler structure */
+static struct ip_vs_scheduler ip_vs_mh_scheduler = {
+	.name =			"mh",
+	.refcnt =		ATOMIC_INIT(0),
+	.module =		THIS_MODULE,
+	.n_list	 =		LIST_HEAD_INIT(ip_vs_mh_scheduler.n_list),
+	.init_service =		ip_vs_mh_init_svc,
+	.done_service =		ip_vs_mh_done_svc,
+	.add_dest =		ip_vs_mh_dest_changed,
+	.del_dest =		ip_vs_mh_dest_changed,
+	.upd_dest =		ip_vs_mh_dest_changed,
+	.schedule =		ip_vs_mh_schedule,
+};
+
+static int __init ip_vs_mh_init(void)
+{
+	return register_ip_vs_scheduler(&ip_vs_mh_scheduler);
+}
+
+static void __exit ip_vs_mh_cleanup(void)
+{
+	unregister_ip_vs_scheduler(&ip_vs_mh_scheduler);
+	rcu_barrier();
+}
+
+module_init(ip_vs_mh_init);
+module_exit(ip_vs_mh_cleanup);
+MODULE_DESCRIPTION("Maglev hashing ipvs scheduler");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Inju Song <inju.song@navercorp.com>");
-- 
2.11.0

^ permalink raw reply related

* [PATCH 5/5] ipvs: fix multiplicative hashing in sh/dh/lblc/lblcr algorithms
From: Simon Horman @ 2018-04-19  8:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Vincent Bernat, Simon Horman
In-Reply-To: <20180419085614.7437-1-horms@verge.net.au>

From: Vincent Bernat <vincent@bernat.im>

The sh/dh/lblc/lblcr algorithms are using Knuth's multiplicative
hashing incorrectly. Replace its use by the hash_32() macro, which
correctly implements this algorithm. It doesn't use the same constant,
but it shouldn't matter.

Signed-off-by: Vincent Bernat <vincent@bernat.im>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_dh.c    | 3 ++-
 net/netfilter/ipvs/ip_vs_lblc.c  | 3 ++-
 net/netfilter/ipvs/ip_vs_lblcr.c | 3 ++-
 net/netfilter/ipvs/ip_vs_sh.c    | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_dh.c b/net/netfilter/ipvs/ip_vs_dh.c
index 75f798f8e83b..07459e71d907 100644
--- a/net/netfilter/ipvs/ip_vs_dh.c
+++ b/net/netfilter/ipvs/ip_vs_dh.c
@@ -43,6 +43,7 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/skbuff.h>
+#include <linux/hash.h>
 
 #include <net/ip_vs.h>
 
@@ -81,7 +82,7 @@ static inline unsigned int ip_vs_dh_hashkey(int af, const union nf_inet_addr *ad
 		addr_fold = addr->ip6[0]^addr->ip6[1]^
 			    addr->ip6[2]^addr->ip6[3];
 #endif
-	return (ntohl(addr_fold)*2654435761UL) & IP_VS_DH_TAB_MASK;
+	return hash_32(ntohl(addr_fold), IP_VS_DH_TAB_BITS);
 }
 
 
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index 3057e453bf31..08147fc6400c 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -48,6 +48,7 @@
 #include <linux/kernel.h>
 #include <linux/skbuff.h>
 #include <linux/jiffies.h>
+#include <linux/hash.h>
 
 /* for sysctl */
 #include <linux/fs.h>
@@ -160,7 +161,7 @@ ip_vs_lblc_hashkey(int af, const union nf_inet_addr *addr)
 		addr_fold = addr->ip6[0]^addr->ip6[1]^
 			    addr->ip6[2]^addr->ip6[3];
 #endif
-	return (ntohl(addr_fold)*2654435761UL) & IP_VS_LBLC_TAB_MASK;
+	return hash_32(ntohl(addr_fold), IP_VS_LBLC_TAB_BITS);
 }
 
 
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 92adc04557ed..9b6a6c9e9cfa 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -47,6 +47,7 @@
 #include <linux/jiffies.h>
 #include <linux/list.h>
 #include <linux/slab.h>
+#include <linux/hash.h>
 
 /* for sysctl */
 #include <linux/fs.h>
@@ -323,7 +324,7 @@ ip_vs_lblcr_hashkey(int af, const union nf_inet_addr *addr)
 		addr_fold = addr->ip6[0]^addr->ip6[1]^
 			    addr->ip6[2]^addr->ip6[3];
 #endif
-	return (ntohl(addr_fold)*2654435761UL) & IP_VS_LBLCR_TAB_MASK;
+	return hash_32(ntohl(addr_fold), IP_VS_LBLCR_TAB_BITS);
 }
 
 
diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index 16aaac6eedc9..1e01c782583a 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -96,7 +96,8 @@ ip_vs_sh_hashkey(int af, const union nf_inet_addr *addr,
 		addr_fold = addr->ip6[0]^addr->ip6[1]^
 			    addr->ip6[2]^addr->ip6[3];
 #endif
-	return (offset + (ntohs(port) + ntohl(addr_fold))*2654435761UL) &
+	return (offset + hash_32(ntohs(port) + ntohl(addr_fold),
+				 IP_VS_SH_TAB_BITS)) &
 		IP_VS_SH_TAB_MASK;
 }
 
-- 
2.11.0

^ permalink raw reply related

* RE: [PATCH] net: phy: marvell: clear wol event before setting it
From: Bhadram Varka @ 2018-04-19  9:00 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Andrew Lunn, Florian Fainelli, David S. Miller,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Jingju Hou
In-Reply-To: <20180419165351.5388021e@xhacker.debian>

Hi,

> -----Original Message-----
> From: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
> Sent: Thursday, April 19, 2018 2:24 PM
> To: Bhadram Varka <vbhadram@nvidia.com>
> Cc: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>;
> David S. Miller <davem@davemloft.net>; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Jingju Hou <Jingju.Hou@synaptics.com>
> Subject: Re: [PATCH] net: phy: marvell: clear wol event before setting it
> 
> Hi,
> 
> On Thu, 19 Apr 2018 08:38:45 +0000 Bhadram Varka wrote:
> 
> > Hi,
> >
> > > -----Original Message-----
> > > From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> > > Behalf Of Jisheng Zhang
> > > Sent: Thursday, April 19, 2018 1:33 PM
> > > To: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli
> > > <f.fainelli@gmail.com>; David S. Miller <davem@davemloft.net>
> > > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jingju Hou
> > > <Jingju.Hou@synaptics.com>
> > > Subject: [PATCH] net: phy: marvell: clear wol event before setting
> > > it
> > >
> > > From: Jingju Hou <Jingju.Hou@synaptics.com>
> > >
> > > If WOL event happened once, the LED[2] interrupt pin will not be
> > > cleared unless reading the CSISR register. So clear the WOL event before
> enabling it.
> > >
> > > Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com>
> > > Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
> > > ---
> > >  drivers/net/phy/marvell.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> > > index c22e8e383247..b6abe1cbc84b 100644
> > > --- a/drivers/net/phy/marvell.c
> > > +++ b/drivers/net/phy/marvell.c
> > > @@ -115,6 +115,9 @@
> > >  /* WOL Event Interrupt Enable */
> > >  #define MII_88E1318S_PHY_CSIER_WOL_EIE			BIT(7)
> > >
> > > +/* Copper Specific Interrupt Status Register */
> > > +#define MII_88E1318S_PHY_CSISR				0x13
> > > +
> >
> > There is already macro to represent this register - MII_M1011_IEVENT. Do we
> need this macro ?
> 
> Good point. Will use MII_M1011_IEVENT instead in v2.
> 
> >
> > >  /* LED Timer Control Register */
> > >  #define MII_88E1318S_PHY_LED_TCR			0x12
> > >  #define MII_88E1318S_PHY_LED_TCR_FORCE_INT		BIT(15)
> > > @@ -1393,6 +1396,12 @@ static int m88e1318_set_wol(struct phy_device
> > > *phydev,
> > >  		if (err < 0)
> > >  			goto error;
> > >
> > > +		/* If WOL event happened once, the LED[2] interrupt pin
> > > +		 * will not be cleared unless reading the CSISR register.
> > > +		 * So clear the WOL event first before enabling it.
> > > +		 */
> > > +		phy_read(phydev, MII_88E1318S_PHY_CSISR);
> >
> > This part of the operation already taken care by ack_interrupt and
> > did_interrupt [....] .ack_interrupt = &marvell_ack_interrupt,
> > .did_interrupt = &m88e1121_did_interrupt, [...]
> >
> > If at all WOL event occurred marvell_ack_interrupt will take care of clearing the
> interrupt status register.
> > Am I missing anything here ?
> 
> If there's no valid irq for phy, the ack_interrupt/did_interrupt won't be called.

Which means that the PHY is not having Interrupt pin ?

Generally through PHY interrupt will wake up the system right. If there is no interrupt pin then how the system will wake up the from suspend for the magic packet.?

Thanks!

^ permalink raw reply

* [net-next  0/3] tipc: Confgiuration of MTU for media UDP
From: GhantaKrishnamurthy MohanKrishna @ 2018-04-19  9:06 UTC (permalink / raw)
  To: tipc-discussion, jon.maloy, maloy, ying.xue,
	mohan.krishna.ghanta.krishnamurthy, netdev, davem

Systematic measurements have shown that an emulated MTU of 14k for
UDP bearers is the optimal value for maximal throughput. Accordingly,
the default MTU of UDP bearers is changed to 14k.

We also provide users with a fallback option from this value,
by providing support to configure MTU for UDP bearers. The following
options are introduced which are symmetrical to the design of
confguring link tolerance.

- Configure media with new MTU value, which will take effect on
links going up after the moment it was configured. Alternatively,
the bearer has to be disabled and re-enabled, for existing links to
reflect the configured value.

- Configure bearer with new MTU value, which take effect on 
running links dynamically.

Please note:
- User has to change MTU at both endpoints, otherwise the link 
will fall back to smallest MTU after a reset.
- Failover from a link with higher MTU to a link with lower MTU

GhantaKrishnamurthy MohanKrishna (3):
  tipc: set default MTU for UDP media
  tipc: implement configuration of UDP media MTU
  tipc: confgiure and apply UDP bearer MTU on running links

 include/uapi/linux/tipc_config.h  |  5 +++++
 include/uapi/linux/tipc_netlink.h |  1 +
 net/tipc/bearer.c                 | 29 ++++++++++++++++++++++++++++-
 net/tipc/bearer.h                 |  3 +++
 net/tipc/node.c                   | 12 +++++++++---
 net/tipc/node.h                   |  2 +-
 net/tipc/udp_media.c              |  4 ++--
 net/tipc/udp_media.h              | 14 ++++++++++++++
 8 files changed, 63 insertions(+), 7 deletions(-)

-- 
2.1.4

^ permalink raw reply

* [net-next  1/3] tipc: set default MTU for UDP media
From: GhantaKrishnamurthy MohanKrishna @ 2018-04-19  9:06 UTC (permalink / raw)
  To: tipc-discussion, jon.maloy, maloy, ying.xue,
	mohan.krishna.ghanta.krishnamurthy, netdev, davem
In-Reply-To: <1524128780-2550-1-git-send-email-mohan.krishna.ghanta.krishnamurthy@ericsson.com>

Currently, all bearers are configured with MTU value same as the
underlying L2 device. However, in case of bearers with media type
UDP, higher throughput is possible with a fixed and higher emulated
MTU value than adapting to the underlying L2 MTU.

In this commit, we introduce a parameter mtu in struct tipc_media
and a default value is set for UDP. A default value of 14k
was determined by experimentation and found to have a higher throughput
than 16k. MTU for UDP bearers are assigned the above set value of
media MTU.

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
---
 include/uapi/linux/tipc_config.h | 5 +++++
 net/tipc/udp_media.c             | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/tipc_config.h b/include/uapi/linux/tipc_config.h
index 3f29e3c8ed06..4b2c93b1934c 100644
--- a/include/uapi/linux/tipc_config.h
+++ b/include/uapi/linux/tipc_config.h
@@ -185,6 +185,11 @@
 #define TIPC_DEF_LINK_WIN 50
 #define TIPC_MAX_LINK_WIN 8191
 
+/*
+ * Default MTU for UDP media
+ */
+
+#define TIPC_DEF_LINK_UDP_MTU 14000
 
 struct tipc_node_info {
 	__be32 addr;			/* network address of node */
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index e7d91f5d5cae..9783101bc4a9 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -713,8 +713,7 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
 			err = -EINVAL;
 			goto err;
 		}
-		b->mtu = dev->mtu - sizeof(struct iphdr)
-			- sizeof(struct udphdr);
+		b->mtu = b->media->mtu;
 #if IS_ENABLED(CONFIG_IPV6)
 	} else if (local.proto == htons(ETH_P_IPV6)) {
 		udp_conf.family = AF_INET6;
@@ -803,6 +802,7 @@ struct tipc_media udp_media_info = {
 	.priority	= TIPC_DEF_LINK_PRI,
 	.tolerance	= TIPC_DEF_LINK_TOL,
 	.window		= TIPC_DEF_LINK_WIN,
+	.mtu		= TIPC_DEF_LINK_UDP_MTU,
 	.type_id	= TIPC_MEDIA_TYPE_UDP,
 	.hwaddr_len	= 0,
 	.name		= "udp"
-- 
2.1.4

^ permalink raw reply related

* [net-next  2/3] tipc: implement configuration of UDP media MTU
From: GhantaKrishnamurthy MohanKrishna @ 2018-04-19  9:06 UTC (permalink / raw)
  To: tipc-discussion, jon.maloy, maloy, ying.xue,
	mohan.krishna.ghanta.krishnamurthy, netdev, davem
In-Reply-To: <1524128780-2550-1-git-send-email-mohan.krishna.ghanta.krishnamurthy@ericsson.com>

In previous commit, we changed the default emulated MTU for UDP bearers
to 14k.

This commit adds the functionality to set/change the default value
by configuring new MTU for UDP media. UDP bearer(s) have to be disabled
and enabled back for the new MTU to take effect.

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
---
 include/uapi/linux/tipc_netlink.h |  1 +
 net/tipc/bearer.c                 | 13 +++++++++++++
 net/tipc/bearer.h                 |  3 +++
 net/tipc/udp_media.h              | 14 ++++++++++++++
 4 files changed, 31 insertions(+)

diff --git a/include/uapi/linux/tipc_netlink.h b/include/uapi/linux/tipc_netlink.h
index 0affb682e5e3..85c11982c89b 100644
--- a/include/uapi/linux/tipc_netlink.h
+++ b/include/uapi/linux/tipc_netlink.h
@@ -266,6 +266,7 @@ enum {
 	TIPC_NLA_PROP_PRIO,		/* u32 */
 	TIPC_NLA_PROP_TOL,		/* u32 */
 	TIPC_NLA_PROP_WIN,		/* u32 */
+	TIPC_NLA_PROP_MTU,		/* u32 */
 
 	__TIPC_NLA_PROP_MAX,
 	TIPC_NLA_PROP_MAX = __TIPC_NLA_PROP_MAX - 1
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index f7d47c89d658..a22caf9e5a18 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -1029,6 +1029,9 @@ static int __tipc_nl_add_media(struct tipc_nl_msg *msg,
 		goto prop_msg_full;
 	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, media->window))
 		goto prop_msg_full;
+	if (media->type_id == TIPC_MEDIA_TYPE_UDP)
+		if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, media->mtu))
+			goto prop_msg_full;
 
 	nla_nest_end(msg->skb, prop);
 	nla_nest_end(msg->skb, attrs);
@@ -1158,6 +1161,16 @@ int __tipc_nl_media_set(struct sk_buff *skb, struct genl_info *info)
 			m->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
 		if (props[TIPC_NLA_PROP_WIN])
 			m->window = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
+		if (props[TIPC_NLA_PROP_MTU]) {
+			if (m->type_id != TIPC_MEDIA_TYPE_UDP)
+				return -EINVAL;
+#ifdef CONFIG_TIPC_MEDIA_UDP
+			if (tipc_udp_mtu_bad(nla_get_u32
+					     (props[TIPC_NLA_PROP_MTU])))
+				return -EINVAL;
+			m->mtu = nla_get_u32(props[TIPC_NLA_PROP_MTU]);
+#endif
+		}
 	}
 
 	return 0;
diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 6efcee63a381..394290cbbb1d 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -94,6 +94,8 @@ struct tipc_bearer;
  * @priority: default link (and bearer) priority
  * @tolerance: default time (in ms) before declaring link failure
  * @window: default window (in packets) before declaring link congestion
+ * @mtu: max packet size bearer can support for media type not dependent on
+ * underlying device MTU
  * @type_id: TIPC media identifier
  * @hwaddr_len: TIPC media address len
  * @name: media name
@@ -118,6 +120,7 @@ struct tipc_media {
 	u32 priority;
 	u32 tolerance;
 	u32 window;
+	u32 mtu;
 	u32 type_id;
 	u32 hwaddr_len;
 	char name[TIPC_MAX_MEDIA_NAME];
diff --git a/net/tipc/udp_media.h b/net/tipc/udp_media.h
index 281bbae87726..e7455cc73e16 100644
--- a/net/tipc/udp_media.h
+++ b/net/tipc/udp_media.h
@@ -38,9 +38,23 @@
 #ifndef _TIPC_UDP_MEDIA_H
 #define _TIPC_UDP_MEDIA_H
 
+#include <linux/ip.h>
+#include <linux/udp.h>
+
 int tipc_udp_nl_bearer_add(struct tipc_bearer *b, struct nlattr *attr);
 int tipc_udp_nl_add_bearer_data(struct tipc_nl_msg *msg, struct tipc_bearer *b);
 int tipc_udp_nl_dump_remoteip(struct sk_buff *skb, struct netlink_callback *cb);
 
+/* check if configured MTU is too low for tipc headers */
+static inline bool tipc_udp_mtu_bad(u32 mtu)
+{
+	if (mtu >= (TIPC_MIN_BEARER_MTU + sizeof(struct iphdr) +
+	    sizeof(struct udphdr)))
+		return false;
+
+	pr_warn("MTU too low for tipc bearer\n");
+	return true;
+}
+
 #endif
 #endif
-- 
2.1.4

^ permalink raw reply related

* [net-next  3/3] tipc: confgiure and apply UDP bearer MTU on running links
From: GhantaKrishnamurthy MohanKrishna @ 2018-04-19  9:06 UTC (permalink / raw)
  To: tipc-discussion, jon.maloy, maloy, ying.xue,
	mohan.krishna.ghanta.krishnamurthy, netdev, davem
In-Reply-To: <1524128780-2550-1-git-send-email-mohan.krishna.ghanta.krishnamurthy@ericsson.com>

Currently, we have option to configure MTU of UDP media. The configured
MTU takes effect on the links going up after that moment. I.e, a user
has to reset bearer to have new value applied across its links. This is
confusing and disturbing on a running cluster.

We now introduce the functionality to change the default UDP bearer MTU
in struct tipc_bearer. Additionally, the links are updated dynamically,
without any need for a reset, when bearer value is changed. We leverage
the existing per-link functionality and the design being symetrical to
the confguration of link tolerance.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
---
 net/tipc/bearer.c | 16 +++++++++++++++-
 net/tipc/node.c   | 12 +++++++++---
 net/tipc/node.h   |  2 +-
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index a22caf9e5a18..2dfb492a7c94 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -697,6 +697,9 @@ static int __tipc_nl_add_bearer(struct tipc_nl_msg *msg,
 		goto prop_msg_full;
 	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bearer->window))
 		goto prop_msg_full;
+	if (bearer->media->type_id == TIPC_MEDIA_TYPE_UDP)
+		if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, bearer->mtu))
+			goto prop_msg_full;
 
 	nla_nest_end(msg->skb, prop);
 
@@ -979,12 +982,23 @@ int __tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info)
 
 		if (props[TIPC_NLA_PROP_TOL]) {
 			b->tolerance = nla_get_u32(props[TIPC_NLA_PROP_TOL]);
-			tipc_node_apply_tolerance(net, b);
+			tipc_node_apply_property(net, b, TIPC_NLA_PROP_TOL);
 		}
 		if (props[TIPC_NLA_PROP_PRIO])
 			b->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
 		if (props[TIPC_NLA_PROP_WIN])
 			b->window = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
+		if (props[TIPC_NLA_PROP_MTU]) {
+			if (b->media->type_id != TIPC_MEDIA_TYPE_UDP)
+				return -EINVAL;
+#ifdef CONFIG_TIPC_MEDIA_UDP
+			if (tipc_udp_mtu_bad(nla_get_u32
+					     (props[TIPC_NLA_PROP_MTU])))
+				return -EINVAL;
+			b->mtu = nla_get_u32(props[TIPC_NLA_PROP_MTU]);
+			tipc_node_apply_property(net, b, TIPC_NLA_PROP_MTU);
+#endif
+		}
 	}
 
 	return 0;
diff --git a/net/tipc/node.c b/net/tipc/node.c
index c77dd2f3c589..b71e4e376bb9 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1681,7 +1681,8 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b)
 	kfree_skb(skb);
 }
 
-void tipc_node_apply_tolerance(struct net *net, struct tipc_bearer *b)
+void tipc_node_apply_property(struct net *net, struct tipc_bearer *b,
+			      int prop)
 {
 	struct tipc_net *tn = tipc_net(net);
 	int bearer_id = b->identity;
@@ -1696,8 +1697,13 @@ void tipc_node_apply_tolerance(struct net *net, struct tipc_bearer *b)
 	list_for_each_entry_rcu(n, &tn->node_list, list) {
 		tipc_node_write_lock(n);
 		e = &n->links[bearer_id];
-		if (e->link)
-			tipc_link_set_tolerance(e->link, b->tolerance, &xmitq);
+		if (e->link) {
+			if (prop == TIPC_NLA_PROP_TOL)
+				tipc_link_set_tolerance(e->link, b->tolerance,
+							&xmitq);
+			else if (prop == TIPC_NLA_PROP_MTU)
+				tipc_link_set_mtu(e->link, b->mtu);
+		}
 		tipc_node_write_unlock(n);
 		tipc_bearer_xmit(net, bearer_id, &xmitq, &e->maddr);
 	}
diff --git a/net/tipc/node.h b/net/tipc/node.h
index f24b83500df1..bb271a37c93f 100644
--- a/net/tipc/node.h
+++ b/net/tipc/node.h
@@ -67,7 +67,7 @@ void tipc_node_check_dest(struct net *net, u32 onode, u8 *peer_id128,
 			  struct tipc_media_addr *maddr,
 			  bool *respond, bool *dupl_addr);
 void tipc_node_delete_links(struct net *net, int bearer_id);
-void tipc_node_apply_tolerance(struct net *net, struct tipc_bearer *b);
+void tipc_node_apply_property(struct net *net, struct tipc_bearer *b, int prop);
 int tipc_node_get_linkname(struct net *net, u32 bearer_id, u32 node,
 			   char *linkname, size_t len);
 int tipc_node_xmit(struct net *net, struct sk_buff_head *list, u32 dnode,
-- 
2.1.4

^ permalink raw reply related

* Re: [bpf-next PATCH 1/3] bpf: add id to map tracepoint
From: Jesper Dangaard Brouer @ 2018-04-19  9:08 UTC (permalink / raw)
  To: Sebastiano Miano
  Cc: netdev, ast, daniel, mingo, rostedt, fulvio.risso, brouer
In-Reply-To: <152406544797.3465.16927588919039069579.stgit@localhost.localdomain>

On Wed, 18 Apr 2018 17:30:48 +0200
Sebastiano Miano <sebastiano.miano@polito.it> wrote:

> This patch adds the map id to the bpf tracepoints
> that can be used when monitoring or inspecting map
> related functions.
> 
> Signed-off-by: Sebastiano Miano <sebastiano.miano@polito.it>
> Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

Thanks you for doing this.  I've needed this before when
troubleshooting my XDP programs (specifically xdp_ddos01_blacklist[1]).

E.g. when I want to verify that my tools are doing the right thing, I
can now find the XDP prog id via 'ip link' or bpftool, and list the map
IDs used by the prog tool (via bpftool), and now use perf to record map
changes, which now have the needed IDs I can filter on.  Before, I
could not tell the difference if the program was updating the correct
map (which were a mistake I ran into).

Perf record even support supplying filters on the cmdline, like:

 perf record -e bpf:bpf_map_* -a --filter 'id == 2 || id == 1' sleep 100

And yes, doing filtering this way is slow, compared to doing it via a
bpf_prog inside the kernel, which Sebastiano already provide a sample
on howto do.  But I just needed a way to find the bug in my program,
not any high speed usage.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_ddos01_blacklist_cmdline.c

^ permalink raw reply

* Re: [PATCH] net: phy: marvell: clear wol event before setting it
From: Jisheng Zhang @ 2018-04-19  9:09 UTC (permalink / raw)
  To: Bhadram Varka
  Cc: Andrew Lunn, Florian Fainelli, David S. Miller,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Jingju Hou
In-Reply-To: <e1409c9493f64188a71b44f69ccb24cf@bgmail102.nvidia.com>

On Thu, 19 Apr 2018 09:00:40 +0000 Bhadram Varka wrote:

> Hi,
> 
> > -----Original Message-----
> > From: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
> > Sent: Thursday, April 19, 2018 2:24 PM
> > To: Bhadram Varka <vbhadram@nvidia.com>
> > Cc: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>;
> > David S. Miller <davem@davemloft.net>; netdev@vger.kernel.org; linux-
> > kernel@vger.kernel.org; Jingju Hou <Jingju.Hou@synaptics.com>
> > Subject: Re: [PATCH] net: phy: marvell: clear wol event before setting it
> > 
> > Hi,
> > 
> > On Thu, 19 Apr 2018 08:38:45 +0000 Bhadram Varka wrote:
> >   
> > > Hi,
> > >  
> > > > -----Original Message-----
> > > > From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> > > > Behalf Of Jisheng Zhang
> > > > Sent: Thursday, April 19, 2018 1:33 PM
> > > > To: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli
> > > > <f.fainelli@gmail.com>; David S. Miller <davem@davemloft.net>
> > > > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jingju Hou
> > > > <Jingju.Hou@synaptics.com>
> > > > Subject: [PATCH] net: phy: marvell: clear wol event before setting
> > > > it
> > > >
> > > > From: Jingju Hou <Jingju.Hou@synaptics.com>
> > > >
> > > > If WOL event happened once, the LED[2] interrupt pin will not be
> > > > cleared unless reading the CSISR register. So clear the WOL event before  
> > enabling it.  
> > > >
> > > > Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com>
> > > > Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
> > > > ---
> > > >  drivers/net/phy/marvell.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > >
> > > > diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> > > > index c22e8e383247..b6abe1cbc84b 100644
> > > > --- a/drivers/net/phy/marvell.c
> > > > +++ b/drivers/net/phy/marvell.c
> > > > @@ -115,6 +115,9 @@
> > > >  /* WOL Event Interrupt Enable */
> > > >  #define MII_88E1318S_PHY_CSIER_WOL_EIE			BIT(7)
> > > >
> > > > +/* Copper Specific Interrupt Status Register */
> > > > +#define MII_88E1318S_PHY_CSISR				0x13
> > > > +  
> > >
> > > There is already macro to represent this register - MII_M1011_IEVENT. Do we  
> > need this macro ?
> > 
> > Good point. Will use MII_M1011_IEVENT instead in v2.
> >   
> > >  
> > > >  /* LED Timer Control Register */
> > > >  #define MII_88E1318S_PHY_LED_TCR			0x12
> > > >  #define MII_88E1318S_PHY_LED_TCR_FORCE_INT		BIT(15)
> > > > @@ -1393,6 +1396,12 @@ static int m88e1318_set_wol(struct phy_device
> > > > *phydev,
> > > >  		if (err < 0)
> > > >  			goto error;
> > > >
> > > > +		/* If WOL event happened once, the LED[2] interrupt pin
> > > > +		 * will not be cleared unless reading the CSISR register.
> > > > +		 * So clear the WOL event first before enabling it.
> > > > +		 */
> > > > +		phy_read(phydev, MII_88E1318S_PHY_CSISR);  
> > >
> > > This part of the operation already taken care by ack_interrupt and
> > > did_interrupt [....] .ack_interrupt = &marvell_ack_interrupt,
> > > .did_interrupt = &m88e1121_did_interrupt, [...]
> > >
> > > If at all WOL event occurred marvell_ack_interrupt will take care of clearing the  
> > interrupt status register.  
> > > Am I missing anything here ?  
> > 
> > If there's no valid irq for phy, the ack_interrupt/did_interrupt won't be called.  
> 
> Which means that the PHY is not having Interrupt pin ?

No valid irq doesn't mean "not having interrupt pin". they are different

> 
> Generally through PHY interrupt will wake up the system right. If there is no interrupt pin then how the system will wake up the from suspend for the magic packet.?
> 

IIRC, the phy irq isn't necessary for WOL. The phy interrupt pin isn't
necessarily taken as "interrupt"

PS: Did you use outlook as your email client? it's not suitable
for kernel mail list.

Thanks

^ permalink raw reply

* Re: [bpf-next PATCH 2/3] bpf: add id to prog tracepoint
From: Jesper Dangaard Brouer @ 2018-04-19  9:10 UTC (permalink / raw)
  To: Sebastiano Miano
  Cc: netdev, ast, daniel, mingo, rostedt, fulvio.risso, brouer
In-Reply-To: <152406545357.3465.11912320259045335712.stgit@localhost.localdomain>

On Wed, 18 Apr 2018 17:30:53 +0200
Sebastiano Miano <sebastiano.miano@polito.it> wrote:

> This patch adds the prog id to the bpf tracepoints
> that can be used when monitoring or inspecting prog
> related functions.
> 
> Signed-off-by: Sebastiano Miano <sebastiano.miano@polito.it>
> Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [bpf-next PATCH 3/3] bpf: add sample program to trace map events
From: Jesper Dangaard Brouer @ 2018-04-19  9:20 UTC (permalink / raw)
  To: Sebastiano Miano
  Cc: netdev, ast, daniel, mingo, rostedt, fulvio.risso, brouer
In-Reply-To: <152406545918.3465.14253635905960610284.stgit@localhost.localdomain>

On Wed, 18 Apr 2018 17:30:59 +0200
Sebastiano Miano <sebastiano.miano@polito.it> wrote:

> This patch adds a sample program, called trace_map_events,
> that shows how to capture map events and filter them based on
> the map id.
> 
> The program accepts a list of map IDs, via the -i command line
> option, and filters all the map events related to those IDs (i.e.,
> map_create/update/lookup/next_key).
> If no IDs are specified, all map events are listed and no filtering
> is performed.
> 
> Sample usage:
> 
>  # trace_map_events -i <map_id1> -i <map_id2> -i <map_id3> ...
> 
> Signed-off-by: Sebastiano Miano <sebastiano.miano@polito.it>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

I have tested it works:

$ sudo ./trace_map_events -i 2
Init bpf_perf_event for cpu:0
Init bpf_perf_event for cpu:1
Init bpf_perf_event for cpu:2
Init bpf_perf_event for cpu:3
Init bpf_perf_event for cpu:4
Init bpf_perf_event for cpu:5
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net-next] team: account for oper state
From: George Wilkie @ 2018-04-19  9:33 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev
In-Reply-To: <20180418191732.GB1922@nanopsycho>

On Wed, Apr 18, 2018 at 09:17:32PM +0200, Jiri Pirko wrote:
> Wed, Apr 18, 2018 at 05:33:12PM CEST, gwilkie@vyatta.att-mail.com wrote:
> >On Wed, Apr 18, 2018 at 04:58:22PM +0200, Jiri Pirko wrote:
> >> Wed, Apr 18, 2018 at 03:35:49PM CEST, gwilkie@vyatta.att-mail.com wrote:
> >> >On Wed, Apr 18, 2018 at 02:56:44PM +0200, Jiri Pirko wrote:
> >> >> Wed, Apr 18, 2018 at 12:29:50PM CEST, gwilkie@vyatta.att-mail.com wrote:
> >> >> >Account for operational state when determining port linkup state,
> >> >> >as per Documentation/networking/operstates.txt.
> >> >> 
> >> >> Could you please point me to the exact place in the document where this
> >> >> is suggested?
> >> >> 
> >> >
> >> >Various places cover it I think.
> >> >
> >> >In 1. Introduction:
> >> >"interface is not usable just because the admin enabled it"
> >> >"userspace must be granted the possibility to
> >> >influence operational state"
> >> >
> >> >In 4. Setting from userspace:
> >> >"the userspace application can set IFLA_OPERSTATE
> >> >to IF_OPER_DORMANT or IF_OPER_UP as long as the driver does not set
> >> >netif_carrier_off() or netif_dormant_on()"
> >> >
> >> >We have a use case where we want to set the oper state of the team ports based
> >> >on whether they are actually usable or not (as opposed to just admin up).
> >> 
> >> Are you running a supplicant there or what is the use-case?
> >> 
> >
> >We are using tun/tap interfaces for the team ports with the physical interfaces
> >under the control of a user process.
> >
> >> How is this handle in other drivers like bond, openvswitch, bridge, etc?
> >
> >It looks like bridge is using it, looking at br_port_carrier_check() and
> >br_add_if().
> 
> Okay, so why do you still need to check netif_carrier_ok?
> Looks like netif_oper_up is enough, right?

Yes, I was being overly cautious. Replacing netif_carrier_ok with netif_oper_up
works OK. I'll send updated patch.

Cheers.


> 
> 
> >
> >Cheers.
> >
> >> 
> >> >
> >> >Cheers.
> >> >
> >> >> 
> >> >> >
> >> >> >Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com>
> >> >> >---
> >> >> > drivers/net/team/team.c | 3 ++-
> >> >> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >> >> >
> >> >> >diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> >> >> >index a6c6ce19eeee..231264a05e55 100644
> >> >> >--- a/drivers/net/team/team.c
> >> >> >+++ b/drivers/net/team/team.c
> >> >> >@@ -2918,7 +2918,8 @@ static int team_device_event(struct notifier_block *unused,
> >> >> > 	case NETDEV_CHANGE:
> >> >> > 		if (netif_running(port->dev))
> >> >> > 			team_port_change_check(port,
> >> >> >-					       !!netif_carrier_ok(port->dev));
> >> >> >+					       !!(netif_carrier_ok(port->dev) &&
> >> >> >+						  netif_oper_up(port->dev)));
> >> >> > 		break;
> >> >> > 	case NETDEV_UNREGISTER:
> >> >> > 		team_del_slave(port->team->dev, dev);
> >> >> >-- 
> >> >> >2.11.0
> >> >> >
> 
> 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox