Netdev List
 help / color / mirror / Atom feed
* [PATCH v2 2/4] net: phy: realtek: change macro name for page select register
From: Icenowy Zheng @ 2017-08-22  4:03 UTC (permalink / raw)
  To: Maxime Ripard, Chen-Yu Tsai, Florian Fainelli, Corentin Labbe
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-sunxi-/JYPxA39Uh5TLH3MbocFFw,
	Icenowy Zheng
In-Reply-To: <20170822040400.12166-1-icenowy-h8G6r0blFSE@public.gmane.org>

From: Icenowy Zheng <icenowy-ymACFijhrKM@public.gmane.org>

The page select register also exists on RTL8211E PHY (although it
behaves slightly differently).

Change the register macro name to remove the F.

Signed-off-by: Icenowy Zheng <icenowy-ymACFijhrKM@public.gmane.org>
---
 drivers/net/phy/realtek.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 9cbe645e3d89..d820d00addf6 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -22,11 +22,13 @@
 #define RTL821x_INER		0x12
 #define RTL821x_INER_INIT	0x6400
 #define RTL821x_INSR		0x13
+
+#define RTL8211_PAGE_SELECT	0x1f
+
 #define RTL8211E_INER_LINK_STATUS 0x400
 
 #define RTL8211F_INER_LINK_STATUS 0x0010
 #define RTL8211F_INSR		0x1d
-#define RTL8211F_PAGE_SELECT	0x1f
 #define RTL8211F_TX_DELAY	0x100
 
 MODULE_DESCRIPTION("Realtek PHY driver");
@@ -46,10 +48,10 @@ static int rtl8211f_ack_interrupt(struct phy_device *phydev)
 {
 	int err;
 
-	phy_write(phydev, RTL8211F_PAGE_SELECT, 0xa43);
+	phy_write(phydev, RTL8211_PAGE_SELECT, 0xa43);
 	err = phy_read(phydev, RTL8211F_INSR);
 	/* restore to default page 0 */
-	phy_write(phydev, RTL8211F_PAGE_SELECT, 0x0);
+	phy_write(phydev, RTL8211_PAGE_SELECT, 0x0);
 
 	return (err < 0) ? err : 0;
 }
@@ -102,7 +104,7 @@ static int rtl8211f_config_init(struct phy_device *phydev)
 	if (ret < 0)
 		return ret;
 
-	phy_write(phydev, RTL8211F_PAGE_SELECT, 0xd08);
+	phy_write(phydev, RTL8211_PAGE_SELECT, 0xd08);
 	reg = phy_read(phydev, 0x11);
 
 	/* enable TX-delay for rgmii-id and rgmii-txid, otherwise disable it */
@@ -114,7 +116,7 @@ static int rtl8211f_config_init(struct phy_device *phydev)
 
 	phy_write(phydev, 0x11, reg);
 	/* restore to default page 0 */
-	phy_write(phydev, RTL8211F_PAGE_SELECT, 0x0);
+	phy_write(phydev, RTL8211_PAGE_SELECT, 0x0);
 
 	return 0;
 }
-- 
2.13.5

^ permalink raw reply related

* [PATCH v2 1/4] net: stmmac: dwmac-sun8i: support RGMII modes with PHY internal delay
From: Icenowy Zheng @ 2017-08-22  4:03 UTC (permalink / raw)
  To: Maxime Ripard, Chen-Yu Tsai, Florian Fainelli, Corentin Labbe
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-sunxi-/JYPxA39Uh5TLH3MbocFFw,
	Icenowy Zheng
In-Reply-To: <20170822040400.12166-1-icenowy-h8G6r0blFSE@public.gmane.org>

Some boards uses a PHY with internal delay with an Allwinner SoC.

Support these PHY modes in the driver.

As the driver has no configuration registers for these modes, just treat
them as ordinary RGMII.

Signed-off-by: Icenowy Zheng <icenowy-h8G6r0blFSE@public.gmane.org>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
index fffd6d5fc907..2af680cac497 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
@@ -723,6 +723,9 @@ static int sun8i_dwmac_set_syscon(struct stmmac_priv *priv)
 		/* default */
 		break;
 	case PHY_INTERFACE_MODE_RGMII:
+	case PHY_INTERFACE_MODE_RGMII_ID:
+	case PHY_INTERFACE_MODE_RGMII_RXID:
+	case PHY_INTERFACE_MODE_RGMII_TXID:
 		reg |= SYSCON_EPIT | SYSCON_ETCS_INT_GMII;
 		break;
 	case PHY_INTERFACE_MODE_RMII:
-- 
2.13.5

^ permalink raw reply related

* [PATCH v2 0/4] Workaround broken RTL8211E on some Pine64+ boards
From: Icenowy Zheng @ 2017-08-22  4:03 UTC (permalink / raw)
  To: Maxime Ripard, Chen-Yu Tsai, Florian Fainelli, Corentin Labbe
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-sunxi-/JYPxA39Uh5TLH3MbocFFw,
	Icenowy Zheng

Some Pine64+ boards come with bad RTL8211E PHYs, which cannot work reliably
unless do some hack. According to Pine64 people, Realtek describes the hack
as totally disabling RX delay, and it's not documented at all.

This patchset introduces the workaround on Pine64+.

The first patch adds RGMII variants' support to the dwmac-sun8i driver.

The second patch renames some macros in RTL PHY driver, and the third
patch introduces the hack as the "RGMII-TXID" mode of the PHY.

The fourth patch enables the hack in the device tree.

Icenowy Zheng (4):
  net: stmmac: dwmac-sun8i: support RGMII modes with PHY internal delay
  net: phy: realtek: change macro name for page select register
  net: phy: realtek: add disable RX internal delay mode
  arm64: allwinner: a64: disable the RTL8211E internal RX delay on
    Pine64+

 .../boot/dts/allwinner/sun50i-a64-pine64-plus.dts  |  2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c  |  3 ++
 drivers/net/phy/realtek.c                          | 43 +++++++++++++++++++---
 3 files changed, 42 insertions(+), 6 deletions(-)

-- 
2.13.5

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Willem de Bruijn @ 2017-08-22  3:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: Koichiro Den, Michael S. Tsirkin, virtualization,
	Network Development
In-Reply-To: <64d451ae-9944-e978-5a05-54bb1a62aaad@redhat.com>

>>> Interesting, deadlock could be treated as a a radical case of the
>>> discussion
>>> here https://patchwork.kernel.org/patch/3787671/.
>>>
>>> git grep tells more similar skb_orphan() cases. Do we need to change them
>>> all (or part)?
>>
>> Most skb_orphan calls are not relevant to the issue of transmit delay.
>
>
> Yes, but at least we should audit the ones in drivers/net.

Do you mean other virtual device driver transmit paths, like xen,
specifically?

>>> Actually, we may meet similar issues at many other places (e.g netem).
>>
>> Netem is an interesting case. Because it is intended to mimic network
>> delay, at least in the case where it calls skb_orphan, it may make
>> sense to release all references, including calling skb_zcopy_clear.
>>
>> In general, zerocopy reverts to copy on all paths that may cause
>> unbounded delay due to another process. Guarding against delay
>> induced by the administrator is infeasible. It is always possible to
>> just pause the nic. Netem is one instance of that, and not unbounded.
>
>
> The problem is, admin may only delay the traffic in e.g one interface, but
> it actually delay or stall all traffic inside a VM.

Understood. Ideally we can remove the HoL blocking cause of this,
itself.

>>> Need
>>> to consider a complete solution for this. Figuring out all places that
>>> could
>>> delay a packet is a method.
>>
>> The issue described in the referenced patch seems like head of line
>> blocking between two flows. If one flow delays zerocopy descriptor
>> release from the vhost-net pool, it blocks all subsequent descriptors
>> in that pool from being released, also delaying other flows that use
>> the same descriptor pool. If the pool is empty, all transmission stopped.
>>
>> Reverting to copy tx when the pool reaches a low watermark, as the
>> patch does, fixes this.
>
>
> An issue of the referenced patch is that sndbuf could be smaller than low
> watermark.
>
>> Perhaps the descriptor pool should also be
>> revised to allow out of order completions. Then there is no need to
>> copy zerocopy packets whenever they may experience delay.
>
>
> Yes, but as replied in the referenced thread, windows driver may treat out
> of order completion as a bug.

Interesting. I missed that. Perhaps the zerocopy optimization
could be gated on guest support for out of order completions.

>> On the point of counting copy vs zerocopy: the new msg_zerocopy
>> variant of ubuf_info has a field to record whether a deep copy was
>> made. This can be used with vhost-net zerocopy, too.
>
>
> Just to make sure I understand. It's still not clear to me how to reuse this
> for vhost-net, e.g zerocopy flag is in a union which is not used by
> vhost_net.

True, but that is not set in stone. I went back and forth on that when
preparing fix 0a4a060bb204 ("sock: fix zerocopy_success regression
with msg_zerocopy"). The field can be moved outside the union and
initialized in the other zerocopy paths.

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Jason Wang @ 2017-08-22  2:50 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Koichiro Den, Michael S. Tsirkin, virtualization,
	Network Development
In-Reply-To: <CAF=yD-LZ4=WAYfUtY7xRWi50FRSkrcOa+b7uc46xRnC4sbDCzQ@mail.gmail.com>



On 2017年08月21日 23:41, Willem de Bruijn wrote:
> On Mon, Aug 21, 2017 at 8:33 AM, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2017年08月19日 14:38, Koichiro Den wrote:
>>> Facing the possible unbounded delay relying on freeing on xmit path,
>>> we also better to invoke and clear the upper layer zerocopy callback
>>> beforehand to keep them from waiting for unbounded duration in vain.
>>> For instance, this removes the possible deadlock in the case that the
>>> upper layer is a zerocopy-enabled vhost-net.
>>> This does not apply if napi_tx is enabled since it will be called in
>>> reasonale time.
>>>
>>> Signed-off-by: Koichiro Den <den@klaipeden.com>
>>> ---
>>>    drivers/net/virtio_net.c | 8 ++++++++
>>>    1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 4302f313d9a7..f7deaa5b7b50 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -1290,6 +1290,14 @@ static netdev_tx_t start_xmit(struct sk_buff *skb,
>>> struct net_device *dev)
>>>          /* Don't wait up for transmitted skbs to be freed. */
>>>          if (!use_napi) {
>>> +               if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
>>> +                       struct ubuf_info *uarg;
>>> +                       uarg = skb_shinfo(skb)->destructor_arg;
>>> +                       if (uarg->callback)
>>> +                           uarg->callback(uarg, true);
>>> +                       skb_shinfo(skb)->destructor_arg = NULL;
>>> +                       skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
>>> +               }
>>>                  skb_orphan(skb);
>>>                  nf_reset(skb);
>>>          }
>>
>>
>> Interesting, deadlock could be treated as a a radical case of the discussion
>> here https://patchwork.kernel.org/patch/3787671/.
>>
>> git grep tells more similar skb_orphan() cases. Do we need to change them
>> all (or part)?
> Most skb_orphan calls are not relevant to the issue of transmit delay.

Yes, but at least we should audit the ones in drivers/net.

>
>> Actually, we may meet similar issues at many other places (e.g netem).
> Netem is an interesting case. Because it is intended to mimic network
> delay, at least in the case where it calls skb_orphan, it may make
> sense to release all references, including calling skb_zcopy_clear.
>
> In general, zerocopy reverts to copy on all paths that may cause
> unbounded delay due to another process. Guarding against delay
> induced by the administrator is infeasible. It is always possible to
> just pause the nic. Netem is one instance of that, and not unbounded.

The problem is, admin may only delay the traffic in e.g one interface, 
but it actually delay or stall all traffic inside a VM.

>
>> Need
>> to consider a complete solution for this. Figuring out all places that could
>> delay a packet is a method.
> The issue described in the referenced patch seems like head of line
> blocking between two flows. If one flow delays zerocopy descriptor
> release from the vhost-net pool, it blocks all subsequent descriptors
> in that pool from being released, also delaying other flows that use
> the same descriptor pool. If the pool is empty, all transmission stopped.
>
> Reverting to copy tx when the pool reaches a low watermark, as the
> patch does, fixes this.

An issue of the referenced patch is that sndbuf could be smaller than 
low watermark.

> Perhaps the descriptor pool should also be
> revised to allow out of order completions. Then there is no need to
> copy zerocopy packets whenever they may experience delay.

Yes, but as replied in the referenced thread, windows driver may treat 
out of order completion as a bug.

>
> On the point of counting copy vs zerocopy: the new msg_zerocopy
> variant of ubuf_info has a field to record whether a deep copy was
> made. This can be used with vhost-net zerocopy, too.

Just to make sure I understand. It's still not clear to me how to reuse 
this for vhost-net, e.g zerocopy flag is in a union which is not used by 
vhost_net.

Thanks

^ permalink raw reply

* [PATCH net-next,4/4] hv_netvsc: Update netvsc Document for UDP hash level setting
From: Haiyang Zhang @ 2017-08-22  2:22 UTC (permalink / raw)
  To: davem, netdev; +Cc: haiyangz, kys, olaf, vkuznets, linux-kernel
In-Reply-To: <1503368560-14331-1-git-send-email-haiyangz@exchange.microsoft.com>

From: Haiyang Zhang <haiyangz@microsoft.com>

Update Documentation/networking/netvsc.txt for UDP hash level setting
and related info.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 Documentation/networking/netvsc.txt |   22 +++++++++++++++++-----
 1 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/netvsc.txt b/Documentation/networking/netvsc.txt
index 4ddb4e4..fa8d863 100644
--- a/Documentation/networking/netvsc.txt
+++ b/Documentation/networking/netvsc.txt
@@ -21,11 +21,23 @@ Features
   --------------------
   Hyper-V supports receive side scaling. For TCP, packets are
   distributed among available queues based on IP address and port
-  number. Current versions of Hyper-V host, only distribute UDP
-  packets based on the IP source and destination address.
-  The port number is not used as part of the hash value for UDP.
-  Fragmented IP packets are not distributed between queues;
-  all fragmented packets arrive on the first channel.
+  number.
+
+  For UDP, we can switch UDP hash level between L3 and L4 by ethtool
+  command. UDP over IPv4 and v6 can be set differently. The default
+  hash level is L4. We currently only allow switching TX hash level
+  from within the guests.
+
+  On Azure, fragmented UDP packets have high loss rate with L4
+  hashing. Using L3 hashing is recommended in this case.
+
+  For example, for UDP over IPv4 on eth0:
+  To include UDP port numbers in hasing:
+        ethtool -N eth0 rx-flow-hash udp4 sdfn
+  To exclude UDP port numbers in hasing:
+        ethtool -N eth0 rx-flow-hash udp4 sd
+  To show UDP hash level:
+        ethtool -n eth0 rx-flow-hash udp4
 
   Generic Receive Offload, aka GRO
   --------------------------------
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next,3/4] hv_netvsc: Add ethtool handler to set and get UDP hash levels
From: Haiyang Zhang @ 2017-08-22  2:22 UTC (permalink / raw)
  To: davem, netdev; +Cc: haiyangz, kys, olaf, vkuznets, linux-kernel
In-Reply-To: <1503368560-14331-1-git-send-email-haiyangz@exchange.microsoft.com>

From: Haiyang Zhang <haiyangz@microsoft.com>

The patch add the functions to switch UDP hash level between
L3 and L4 by ethtool command. UDP over IPv4 and v6 can be set
differently. The default hash level is L4. We currently only
allow switching TX hash level from within the guests.

On Azure, fragmented UDP packets have high loss rate with L4
hashing. Using L3 hashing is recommended in this case.

For example, for UDP over IPv4 on eth0:
To include UDP port numbers in hasing:
	ethtool -N eth0 rx-flow-hash udp4 sdfn
To exclude UDP port numbers in hasing:
	ethtool -N eth0 rx-flow-hash udp4 sd
To show UDP hash level:
	ethtool -n eth0 rx-flow-hash udp4

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h |    2 +
 drivers/net/hyperv/netvsc_drv.c |   78 +++++++++++++++++++++++++++++++++++----
 2 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 9198dd1..ff1c0c8 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -720,6 +720,8 @@ struct net_device_context {
 	u32 tx_send_table[VRSS_SEND_TAB_SIZE];
 
 	/* Ethtool settings */
+	bool udp4_l4_hash;
+	bool udp6_l4_hash;
 	u8 duplex;
 	u32 speed;
 	struct netvsc_ethtool_stats eth_stats;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index d8612b1..c0c4c91 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -190,10 +190,12 @@ static int netvsc_close(struct net_device *net)
 	return ppi;
 }
 
-/* Azure hosts don't support non-TCP port numbers in hashing yet. We compute
- * hash for non-TCP traffic with only IP numbers.
+/* Azure hosts don't support non-TCP port numbers in hashing for fragmented
+ * packets. We can use ethtool to change UDP hash level when necessary.
  */
-static inline u32 netvsc_get_hash(struct sk_buff *skb)
+static inline u32 netvsc_get_hash(
+	struct sk_buff *skb,
+	const struct net_device_context *ndc)
 {
 	struct flow_keys flow;
 	u32 hash;
@@ -204,7 +206,11 @@ static inline u32 netvsc_get_hash(struct sk_buff *skb)
 	if (!skb_flow_dissect_flow_keys(skb, &flow, 0))
 		return 0;
 
-	if (flow.basic.ip_proto == IPPROTO_TCP) {
+	if (flow.basic.ip_proto == IPPROTO_TCP ||
+	    (flow.basic.ip_proto == IPPROTO_UDP &&
+	     ((flow.basic.n_proto == htons(ETH_P_IP) && ndc->udp4_l4_hash) ||
+	      (flow.basic.n_proto == htons(ETH_P_IPV6) &&
+	       ndc->udp6_l4_hash)))) {
 		return skb_get_hash(skb);
 	} else {
 		if (flow.basic.n_proto == htons(ETH_P_IP))
@@ -227,7 +233,7 @@ static inline int netvsc_get_tx_queue(struct net_device *ndev,
 	struct sock *sk = skb->sk;
 	int q_idx;
 
-	q_idx = ndc->tx_send_table[netvsc_get_hash(skb) &
+	q_idx = ndc->tx_send_table[netvsc_get_hash(skb, ndc) &
 				   (VRSS_SEND_TAB_SIZE - 1)];
 
 	/* If queue index changed record the new value */
@@ -891,6 +897,9 @@ static void netvsc_init_settings(struct net_device *dev)
 {
 	struct net_device_context *ndc = netdev_priv(dev);
 
+	ndc->udp4_l4_hash = true;
+	ndc->udp6_l4_hash = true;
+
 	ndc->speed = SPEED_UNKNOWN;
 	ndc->duplex = DUPLEX_FULL;
 }
@@ -1228,7 +1237,8 @@ static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 }
 
 static int
-netvsc_get_rss_hash_opts(struct ethtool_rxnfc *info)
+netvsc_get_rss_hash_opts(struct net_device_context *ndc,
+			 struct ethtool_rxnfc *info)
 {
 	info->data = RXH_IP_SRC | RXH_IP_DST;
 
@@ -1236,9 +1246,20 @@ static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 	case TCP_V4_FLOW:
 	case TCP_V6_FLOW:
 		info->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
-		/* fallthrough */
+		break;
+
 	case UDP_V4_FLOW:
+		if (ndc->udp4_l4_hash)
+			info->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+
+		break;
+
 	case UDP_V6_FLOW:
+		if (ndc->udp6_l4_hash)
+			info->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+
+		break;
+
 	case IPV4_FLOW:
 	case IPV6_FLOW:
 		break;
@@ -1266,11 +1287,51 @@ static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 		return 0;
 
 	case ETHTOOL_GRXFH:
-		return netvsc_get_rss_hash_opts(info);
+		return netvsc_get_rss_hash_opts(ndc, info);
 	}
 	return -EOPNOTSUPP;
 }
 
+static int netvsc_set_rss_hash_opts(struct net_device_context *ndc,
+				    struct ethtool_rxnfc *info)
+{
+	if (info->data == (RXH_IP_SRC | RXH_IP_DST |
+			   RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
+		if (info->flow_type == UDP_V4_FLOW)
+			ndc->udp4_l4_hash = true;
+		else if (info->flow_type == UDP_V6_FLOW)
+			ndc->udp6_l4_hash = true;
+		else
+			return -EOPNOTSUPP;
+
+		return 0;
+	}
+
+	if (info->data == (RXH_IP_SRC | RXH_IP_DST)) {
+		if (info->flow_type == UDP_V4_FLOW)
+			ndc->udp4_l4_hash = false;
+		else if (info->flow_type == UDP_V6_FLOW)
+			ndc->udp6_l4_hash = false;
+		else
+			return -EOPNOTSUPP;
+
+		return 0;
+	}
+
+	return -EOPNOTSUPP;
+}
+
+static int
+netvsc_set_rxnfc(struct net_device *ndev, struct ethtool_rxnfc *info)
+{
+	struct net_device_context *ndc = netdev_priv(ndev);
+
+	if (info->cmd == ETHTOOL_SRXFH)
+		return netvsc_set_rss_hash_opts(ndc, info);
+
+	return -EOPNOTSUPP;
+}
+
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void netvsc_poll_controller(struct net_device *dev)
 {
@@ -1469,6 +1530,7 @@ static int netvsc_set_ringparam(struct net_device *ndev,
 	.set_channels   = netvsc_set_channels,
 	.get_ts_info	= ethtool_op_get_ts_info,
 	.get_rxnfc	= netvsc_get_rxnfc,
+	.set_rxnfc	= netvsc_set_rxnfc,
 	.get_rxfh_key_size = netvsc_get_rxfh_key_size,
 	.get_rxfh_indir_size = netvsc_rss_indir_size,
 	.get_rxfh	= netvsc_get_rxfh,
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next,2/4] hv_netvsc: Clean up unused parameter from netvsc_get_rss_hash_opts()
From: Haiyang Zhang @ 2017-08-22  2:22 UTC (permalink / raw)
  To: davem, netdev; +Cc: haiyangz, kys, olaf, vkuznets, linux-kernel
In-Reply-To: <1503368560-14331-1-git-send-email-haiyangz@exchange.microsoft.com>

From: Haiyang Zhang <haiyangz@microsoft.com>

The parameter "nvdev" is not in use.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 4677d21..d8612b1 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1228,8 +1228,7 @@ static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 }
 
 static int
-netvsc_get_rss_hash_opts(struct netvsc_device *nvdev,
-			 struct ethtool_rxnfc *info)
+netvsc_get_rss_hash_opts(struct ethtool_rxnfc *info)
 {
 	info->data = RXH_IP_SRC | RXH_IP_DST;
 
@@ -1267,7 +1266,7 @@ static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 		return 0;
 
 	case ETHTOOL_GRXFH:
-		return netvsc_get_rss_hash_opts(nvdev, info);
+		return netvsc_get_rss_hash_opts(info);
 	}
 	return -EOPNOTSUPP;
 }
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next,1/4] hv_netvsc: Clean up unused parameter from netvsc_get_hash()
From: Haiyang Zhang @ 2017-08-22  2:22 UTC (permalink / raw)
  To: davem, netdev; +Cc: haiyangz, kys, olaf, vkuznets, linux-kernel
In-Reply-To: <1503368560-14331-1-git-send-email-haiyangz@exchange.microsoft.com>

From: Haiyang Zhang <haiyangz@microsoft.com>

The parameter "sk" is not in use.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index b33f050..4677d21 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -193,7 +193,7 @@ static int netvsc_close(struct net_device *net)
 /* Azure hosts don't support non-TCP port numbers in hashing yet. We compute
  * hash for non-TCP traffic with only IP numbers.
  */
-static inline u32 netvsc_get_hash(struct sk_buff *skb, struct sock *sk)
+static inline u32 netvsc_get_hash(struct sk_buff *skb)
 {
 	struct flow_keys flow;
 	u32 hash;
@@ -227,7 +227,7 @@ static inline int netvsc_get_tx_queue(struct net_device *ndev,
 	struct sock *sk = skb->sk;
 	int q_idx;
 
-	q_idx = ndc->tx_send_table[netvsc_get_hash(skb, sk) &
+	q_idx = ndc->tx_send_table[netvsc_get_hash(skb) &
 				   (VRSS_SEND_TAB_SIZE - 1)];
 
 	/* If queue index changed record the new value */
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next,0/4] hv_netvsc: Ethtool handler to change UDP hash levels
From: Haiyang Zhang @ 2017-08-22  2:22 UTC (permalink / raw)
  To: davem, netdev; +Cc: haiyangz, kys, olaf, vkuznets, linux-kernel

From: Haiyang Zhang <haiyangz@microsoft.com>

The patch set adds the functions to switch UDP hash level between
L3 and L4 by ethtool command. UDP over IPv4 and v6 can be set
differently. The default hash level is L4. We currently only
allow switching TX hash level from within the guests.

The ethtool callback function is triggered by command line, and
update the per device variables of the hash level.

On Azure, fragmented UDP packets is not yet supported with L4
hashing, and may have high packet loss rate. Using L3 hashing is
recommended in this case. This ethtool option allows a user to
make this selection.

Haiyang Zhang (4):
  hv_netvsc: Clean up unused parameter from netvsc_get_hash()
  hv_netvsc: Clean up unused parameter from netvsc_get_rss_hash_opts()
  hv_netvsc: Add ethtool handler to set and get UDP hash levels
  hv_netvsc: Update netvsc Document for UDP hash level setting

 Documentation/networking/netvsc.txt |   22 ++++++++--
 drivers/net/hyperv/hyperv_net.h     |    2 +
 drivers/net/hyperv/netvsc_drv.c     |   77 +++++++++++++++++++++++++++++++----
 3 files changed, 88 insertions(+), 13 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next] liquidio: show NIC's U-Boot version in a dev_info() message
From: Andrew Lunn @ 2017-08-22  1:30 UTC (permalink / raw)
  To: Felix Manlunas
  Cc: davem, netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	weilin.chang
In-Reply-To: <20170822011944.GA6393@felix-thinkpad.cavium.com>

On Mon, Aug 21, 2017 at 06:19:44PM -0700, Felix Manlunas wrote:
> From: Weilin Chang <weilin.chang@cavium.com>
> 
> Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
> ---
>  .../net/ethernet/cavium/liquidio/octeon_console.c  | 85 ++++++++++++++++++++++
>  .../net/ethernet/cavium/liquidio/octeon_device.h   |  5 ++
>  2 files changed, 90 insertions(+)
> 
> diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_console.c b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
> index 19e5212..88ef12b 100644
> --- a/drivers/net/ethernet/cavium/liquidio/octeon_console.c
> +++ b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
> @@ -30,6 +30,7 @@
>  #include "liquidio_image.h"
>  #include "octeon_mem_ops.h"
>  
> +static void octeon_get_uboot_version(struct octeon_device *oct);

Hi Felix

Please can we avoid this forward declaration by putting the code in
the correct order.

>  static void octeon_remote_lock(void);
>  static void octeon_remote_unlock(void);
>  static u64 cvmx_bootmem_phy_named_block_find(struct octeon_device *oct,
> @@ -611,6 +612,9 @@ int octeon_add_console(struct octeon_device *oct, u32 console_num,
>  
>  		work = &oct->console_poll_work[console_num].work;
>  
> +		if (oct->uboot_len == 0)
> +			octeon_get_uboot_version(oct);
> +
>  		INIT_DELAYED_WORK(work, check_console);
>  		oct->console_poll_work[console_num].ctxptr = (void *)oct;
>  		oct->console_poll_work[console_num].ctxul = console_num;
> @@ -724,6 +728,87 @@ static int octeon_console_read(struct octeon_device *oct, u32 console_num,
>  	return bytes_to_read;
>  }
>  
> +static void octeon_get_uboot_version(struct octeon_device *oct)
> +{
> +	s32 bytes_read, tries, total_read;
> +	struct octeon_console *console;
> +	u32 console_num = 0;
> +	int i;
> +
> +	if (octeon_console_send_cmd(oct, "setenv stdout pci\n", 50))
> +		return;
> +
> +	console = &oct->console[console_num];
> +	tries = 0;
> +	total_read = 0;
> +
> +	if (octeon_console_send_cmd(oct, "version\n", 1))
> +		return;
> +
> +	do {
> +		/* Take console output regardless of whether it will
> +		 * be logged
> +		 */
> +		bytes_read =
> +			octeon_console_read(oct,
> +					    console_num, oct->uboot_version +
> +					    total_read,
> +					    OCTEON_UBOOT_BUFFER_SIZE - 1 -
> +					    total_read);
> +		if (bytes_read > 0) {
> +			oct->uboot_version[bytes_read] = 0x0;
> +
> +			total_read += bytes_read;
> +			if (console->waiting)
> +				octeon_console_handle_result(oct, console_num);
> +		} else if (bytes_read < 0) {
> +			dev_err(&oct->pci_dev->dev, "Error reading console %u, ret=%d\n",
> +				console_num, bytes_read);
> +		}
> +
> +		tries++;
> +	} while ((bytes_read > 0) && (tries < 16));
> +
> +	/* If nothing is read after polling the console,
> +	 * output any leftovers if any
> +	 */
> +	if ((total_read == 0) && (console->leftover[0])) {
> +		dev_dbg(&oct->pci_dev->dev, "%u: %s\n",
> +			console_num, console->leftover);
> +		console->leftover[0] = '\0';
> +	}
> +
> +	if (octeon_console_send_cmd(oct, "setenv stdout serial\n", 50))
> +		return;
> +
> +	/* U-Boot */
> +	for (i = 0; i < (OCTEON_UBOOT_BUFFER_SIZE - 9); i++) {
> +		if (oct->uboot_version[i] == 'U' &&
> +		    oct->uboot_version[i + 2] == 'B' &&
> +		    oct->uboot_version[i + 3] == 'o' &&
> +		    oct->uboot_version[i + 4] == 'o' &&
> +		    oct->uboot_version[i + 5] == 't') {
> +			oct->uboot_sidx = i;
> +			i++;
> +			for (; oct->uboot_version[i] != 0x0; i++) {

I think you need a test in here to ensure i stays within the buffer.

> +				if (oct->uboot_version[i] == 'm' &&
> +				    oct->uboot_version[i + 1] == 'i' &&
> +				    oct->uboot_version[i + 2] == 'p' &&
> +				    oct->uboot_version[i + 3] == 's') {
> +					oct->uboot_eidx = i - 1;
> +					oct->uboot_version[i - 1] = 0x0;
> +					oct->uboot_len = oct->uboot_eidx -
> +						oct->uboot_sidx + 1;
> +					dev_info(&oct->pci_dev->dev, "%s\n",
> +						 &oct->uboot_version
> +							[oct->uboot_sidx]);
> +					return;
> +				}
> +			}
> +		}
> +	}
> +}
> +

      Andrew

^ permalink raw reply

* linux-next: manual merge of the net-next tree with the rockchip tree
From: Stephen Rothwell @ 2017-08-22  1:24 UTC (permalink / raw)
  To: David Miller, Networking, Heiko Stuebner
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List, David Wu,
	Rocky Hao, Joseph Chen

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  arch/arm64/boot/dts/rockchip/rk3328-evb.dts

between commits:

  ab78718bda79 ("arm64: dts: rockchip: Enable tsadc module on RK3328 eavluation board")
  1e28037ec88e ("arm64: dts: rockchip: add rk805 node for rk3328-evb")

from the rockchip tree and commit:

  4b05bc6157eb ("ARM64: dts: rockchip: Enable gmac2phy for rk3328-evb")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/arm64/boot/dts/rockchip/rk3328-evb.dts
index 86605ae7b6f5,b9f36dad17e6..000000000000
--- a/arch/arm64/boot/dts/rockchip/rk3328-evb.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3328-evb.dts
@@@ -51,147 -51,24 +51,164 @@@
  		stdout-path = "serial2:1500000n8";
  	};
  
 +	dc_12v: dc-12v {
 +		compatible = "regulator-fixed";
 +		regulator-name = "dc_12v";
 +		regulator-always-on;
 +		regulator-boot-on;
 +		regulator-min-microvolt = <12000000>;
 +		regulator-max-microvolt = <12000000>;
 +	};
 +
+ 	vcc_phy: vcc-phy-regulator {
+ 		compatible = "regulator-fixed";
+ 		regulator-name = "vcc_phy";
+ 		regulator-always-on;
+ 		regulator-boot-on;
+ 	};
++
 +	vcc_sys: vcc-sys {
 +		compatible = "regulator-fixed";
 +		regulator-name = "vcc_sys";
 +		regulator-always-on;
 +		regulator-boot-on;
 +		regulator-min-microvolt = <5000000>;
 +		regulator-max-microvolt = <5000000>;
 +		vin-supply = <&dc_12v>;
 +	};
  };
  
+ &gmac2phy {
+ 	phy-supply = <&vcc_phy>;
+ 	clock_in_out = "output";
+ 	assigned-clocks = <&cru SCLK_MAC2PHY_SRC>;
+ 	assigned-clock-rate = <50000000>;
+ 	assigned-clocks = <&cru SCLK_MAC2PHY>;
+ 	assigned-clock-parents = <&cru SCLK_MAC2PHY_SRC>;
+ 	status = "okay";
+ };
+ 
 +&i2c1 {
 +	status = "okay";
 +
 +	rk805: rk805@18 {
 +		compatible = "rockchip,rk805";
 +		reg = <0x18>;
 +		interrupt-parent = <&gpio2>;
 +		interrupts = <6 IRQ_TYPE_LEVEL_LOW>;
 +		#clock-cells = <1>;
 +		clock-output-names = "xin32k", "rk805-clkout2";
 +		gpio-controller;
 +		#gpio-cells = <2>;
 +		pinctrl-names = "default";
 +		pinctrl-0 = <&pmic_int_l>;
 +		rockchip,system-power-controller;
 +		wakeup-source;
 +
 +		vcc1-supply = <&vcc_sys>;
 +		vcc2-supply = <&vcc_sys>;
 +		vcc3-supply = <&vcc_sys>;
 +		vcc4-supply = <&vcc_sys>;
 +		vcc5-supply = <&vcc_io>;
 +		vcc6-supply = <&vcc_io>;
 +
 +		regulators {
 +			vdd_logic: DCDC_REG1 {
 +				regulator-name = "vdd_logic";
 +				regulator-min-microvolt = <712500>;
 +				regulator-max-microvolt = <1450000>;
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +					regulator-suspend-microvolt = <1000000>;
 +				};
 +			};
 +
 +			vdd_arm: DCDC_REG2 {
 +				regulator-name = "vdd_arm";
 +				regulator-min-microvolt = <712500>;
 +				regulator-max-microvolt = <1450000>;
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +					regulator-suspend-microvolt = <950000>;
 +				};
 +			};
 +
 +			vcc_ddr: DCDC_REG3 {
 +				regulator-name = "vcc_ddr";
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +				};
 +			};
 +
 +			vcc_io: DCDC_REG4 {
 +				regulator-name = "vcc_io";
 +				regulator-min-microvolt = <3300000>;
 +				regulator-max-microvolt = <3300000>;
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +					regulator-suspend-microvolt = <3300000>;
 +				};
 +			};
 +
 +			vcc_18: LDO_REG1 {
 +				regulator-name = "vcc_18";
 +				regulator-min-microvolt = <1800000>;
 +				regulator-max-microvolt = <1800000>;
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +					regulator-suspend-microvolt = <1800000>;
 +				};
 +			};
 +
 +			vcc18_emmc: LDO_REG2 {
 +				regulator-name = "vcc18_emmc";
 +				regulator-min-microvolt = <1800000>;
 +				regulator-max-microvolt = <1800000>;
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +					regulator-suspend-microvolt = <1800000>;
 +				};
 +			};
 +
 +			vdd_10: LDO_REG3 {
 +				regulator-name = "vdd_10";
 +				regulator-min-microvolt = <1000000>;
 +				regulator-max-microvolt = <1000000>;
 +				regulator-always-on;
 +				regulator-boot-on;
 +				regulator-state-mem {
 +					regulator-on-in-suspend;
 +					regulator-suspend-microvolt = <1000000>;
 +				};
 +			};
 +		};
 +	};
 +};
 +
 +&pinctrl {
 +	pmic {
 +		pmic_int_l: pmic-int-l {
 +			rockchip,pins = <2 RK_PA6 RK_FUNC_GPIO &pcfg_pull_up>;
 +		};
 +	};
 +};
 +
 +&tsadc {
 +	status = "okay";
 +};
 +
  &uart2 {
  	status = "okay";
  };

^ permalink raw reply

* [PATCH net-next] liquidio: show NIC's U-Boot version in a dev_info() message
From: Felix Manlunas @ 2017-08-22  1:19 UTC (permalink / raw)
  To: davem
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	weilin.chang

From: Weilin Chang <weilin.chang@cavium.com>

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
---
 .../net/ethernet/cavium/liquidio/octeon_console.c  | 85 ++++++++++++++++++++++
 .../net/ethernet/cavium/liquidio/octeon_device.h   |  5 ++
 2 files changed, 90 insertions(+)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_console.c b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
index 19e5212..88ef12b 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_console.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
@@ -30,6 +30,7 @@
 #include "liquidio_image.h"
 #include "octeon_mem_ops.h"
 
+static void octeon_get_uboot_version(struct octeon_device *oct);
 static void octeon_remote_lock(void);
 static void octeon_remote_unlock(void);
 static u64 cvmx_bootmem_phy_named_block_find(struct octeon_device *oct,
@@ -611,6 +612,9 @@ int octeon_add_console(struct octeon_device *oct, u32 console_num,
 
 		work = &oct->console_poll_work[console_num].work;
 
+		if (oct->uboot_len == 0)
+			octeon_get_uboot_version(oct);
+
 		INIT_DELAYED_WORK(work, check_console);
 		oct->console_poll_work[console_num].ctxptr = (void *)oct;
 		oct->console_poll_work[console_num].ctxul = console_num;
@@ -724,6 +728,87 @@ static int octeon_console_read(struct octeon_device *oct, u32 console_num,
 	return bytes_to_read;
 }
 
+static void octeon_get_uboot_version(struct octeon_device *oct)
+{
+	s32 bytes_read, tries, total_read;
+	struct octeon_console *console;
+	u32 console_num = 0;
+	int i;
+
+	if (octeon_console_send_cmd(oct, "setenv stdout pci\n", 50))
+		return;
+
+	console = &oct->console[console_num];
+	tries = 0;
+	total_read = 0;
+
+	if (octeon_console_send_cmd(oct, "version\n", 1))
+		return;
+
+	do {
+		/* Take console output regardless of whether it will
+		 * be logged
+		 */
+		bytes_read =
+			octeon_console_read(oct,
+					    console_num, oct->uboot_version +
+					    total_read,
+					    OCTEON_UBOOT_BUFFER_SIZE - 1 -
+					    total_read);
+		if (bytes_read > 0) {
+			oct->uboot_version[bytes_read] = 0x0;
+
+			total_read += bytes_read;
+			if (console->waiting)
+				octeon_console_handle_result(oct, console_num);
+		} else if (bytes_read < 0) {
+			dev_err(&oct->pci_dev->dev, "Error reading console %u, ret=%d\n",
+				console_num, bytes_read);
+		}
+
+		tries++;
+	} while ((bytes_read > 0) && (tries < 16));
+
+	/* If nothing is read after polling the console,
+	 * output any leftovers if any
+	 */
+	if ((total_read == 0) && (console->leftover[0])) {
+		dev_dbg(&oct->pci_dev->dev, "%u: %s\n",
+			console_num, console->leftover);
+		console->leftover[0] = '\0';
+	}
+
+	if (octeon_console_send_cmd(oct, "setenv stdout serial\n", 50))
+		return;
+
+	/* U-Boot */
+	for (i = 0; i < (OCTEON_UBOOT_BUFFER_SIZE - 9); i++) {
+		if (oct->uboot_version[i] == 'U' &&
+		    oct->uboot_version[i + 2] == 'B' &&
+		    oct->uboot_version[i + 3] == 'o' &&
+		    oct->uboot_version[i + 4] == 'o' &&
+		    oct->uboot_version[i + 5] == 't') {
+			oct->uboot_sidx = i;
+			i++;
+			for (; oct->uboot_version[i] != 0x0; i++) {
+				if (oct->uboot_version[i] == 'm' &&
+				    oct->uboot_version[i + 1] == 'i' &&
+				    oct->uboot_version[i + 2] == 'p' &&
+				    oct->uboot_version[i + 3] == 's') {
+					oct->uboot_eidx = i - 1;
+					oct->uboot_version[i - 1] = 0x0;
+					oct->uboot_len = oct->uboot_eidx -
+						oct->uboot_sidx + 1;
+					dev_info(&oct->pci_dev->dev, "%s\n",
+						 &oct->uboot_version
+							[oct->uboot_sidx]);
+					return;
+				}
+			}
+		}
+	}
+}
+
 #define FBUF_SIZE	(4 * 1024 * 1024)
 
 int octeon_download_firmware(struct octeon_device *oct, const u8 *data,
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.h b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
index 894af19..8dbd133 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
@@ -550,6 +550,11 @@ struct octeon_device {
 
 	bool cores_crashed;
 
+#define OCTEON_UBOOT_BUFFER_SIZE 512
+	char uboot_version[OCTEON_UBOOT_BUFFER_SIZE];
+	int  uboot_len;
+	int  uboot_sidx, uboot_eidx;
+
 	struct {
 		int bus;
 		int dev;

^ permalink raw reply related

* Re: [PATCH net-next 3/3 v7] drivers: net: ethernet: qualcomm: rmnet: Initial implementation
From: Andrew Lunn @ 2017-08-22  1:17 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan
  Cc: netdev, davem, fengguang.wu, dcbw, jiri, stephen, David.Laight,
	marcel
In-Reply-To: <1503355019-12236-4-git-send-email-subashab@codeaurora.org>

> +static int rmnet_unregister_real_device(struct net_device *real_dev)
> +{
> +	struct rmnet_real_dev_info *rdinfo;
> +	struct list_head *iter;
> +
> +	ASSERT_RTNL();
> +
> +	if (!rmnet_is_real_dev_registered(real_dev) ||
> +	    netdev_lower_get_next(real_dev, &iter))
> +		return -EINVAL;
> +
> +	rdinfo = __rmnet_get_real_dev_info(real_dev);
> +	kfree(rdinfo);
> +
> +	netdev_rx_handler_unregister(real_dev);
> +
> +	/* release reference on real_dev */
> +	dev_put(real_dev);
> +
> +	netdev_info(real_dev, "Removed from rmnet\n");

I would probably turn all your netdev_info()s into netdev_dbg()s.  You
seem to be spamming the kernel log quite a bit.

  Andrew

^ permalink raw reply

* Re: [PATCH net] udp: on peeking bad csum, drop packets even if not at head
From: Willem de Bruijn @ 2017-08-22  1:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Network Development, David Miller, Paolo Abeni, Willem de Bruijn
In-Reply-To: <CAF=yD-+oht8gPcNdSU1zuj5TrOhq0cuwzzcwiCn676YzKNsn=Q@mail.gmail.com>

On Mon, Aug 21, 2017 at 8:12 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Mon, Aug 21, 2017 at 6:40 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Mon, 2017-08-21 at 17:39 -0400, Willem de Bruijn wrote:
>>> From: Willem de Bruijn <willemb@google.com>
>>>
>>> When peeking, if a bad csum is discovered, the skb is unlinked from
>>> the queue with __sk_queue_drop_skb and the peek operation restarted.
>>>
>>> __sk_queue_drop_skb only drops packets that match the queue head. With
>>> sk_peek_off, the skb need not be at head, causing the call to fail and
>>> the same skb to be found again on restart.
>>>
>>> Walk the queue to find the correct skb. Limit the walk to sk_peek_off,
>>> to bound cycle cost to at most twice the original skb_queue_walk in
>>> __skb_try_recv_from_queue.
>>>
>>> The operation may race with updates to sk_peek_off. As the operation
>>> is retried, it will eventually succeed.
>>>
>>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>>
>> You forgot the Fixes: tag, that such a bug fix deserves.
>
> Indeed, sorry. I'm looking into that now. It should be the patch that
> introduced peeking at offset, but need to verify.

It is. Fixes: 627d2d6b5500 ("udp: enable MSG_PEEK at non-zero offset")

^ permalink raw reply

* Re: [PATCH net-next 1/3 v7] net: ether: Add support for multiplexing and aggregation type
From: Andrew Lunn @ 2017-08-22  1:02 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan
  Cc: netdev, davem, fengguang.wu, dcbw, jiri, stephen, David.Laight,
	marcel
In-Reply-To: <1503355019-12236-2-git-send-email-subashab@codeaurora.org>

On Mon, Aug 21, 2017 at 04:36:57PM -0600, Subash Abhinov Kasiviswanathan wrote:
> Define the multiplexing and aggregation (MAP) ether type 0xDA1A. This
> is needed for receiving data in the MAP protocol like RMNET. This is
> not an officially registered ID.
> 
> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
> ---
>  include/uapi/linux/if_ether.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
> index 5bc9bfd..e80b03f 100644
> --- a/include/uapi/linux/if_ether.h
> +++ b/include/uapi/linux/if_ether.h
> @@ -104,7 +104,9 @@
>  #define ETH_P_QINQ3	0x9300		/* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
>  #define ETH_P_EDSA	0xDADA		/* Ethertype DSA [ NOT AN OFFICIALLY REGISTERED ID ] */
>  #define ETH_P_AF_IUCV   0xFBFB		/* IBM af_iucv [ NOT AN OFFICIALLY REGISTERED ID ] */
> -
> +#define ETH_P_MAP       0xDA1A          /* Multiplexing and Aggregation Protocol
> +					 *  NOT AN OFFICIALLY REGISTERED ID ]
> +					 */

Hi Subash

This list is sorted. So this entry should go earlier.

     Andrew

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] tcp: Remove unnecessary dst check in tcp_conn_request.
From: Tonghao Zhang @ 2017-08-22  0:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Linux Kernel Network Developers
In-Reply-To: <1503327362.2499.2.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, Aug 21, 2017 at 10:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Please do not top post.
Got it, thanks.

> On Mon, 2017-08-21 at 21:24 +0800, Tonghao Zhang wrote:
>> Thanks,  yes  this is a bug. I found this bug exists from 3.17~ 4.13.
>> The commit is d94e0417
>>
>
> This bug was there at the beginning of git tree.
>
>
>> One question:  should I send a patch for each kernel version because
>> code conflicts ?
>>
>> a patch for v4.12
>> a patch for v4.11
>> a patch for v4.10~v4.7
>> a patch for v4.6~v3.17
>>
>> and
>> a patch for  net-next, because tcp_tw_recycle has been removed.
>>
>
> Given this bug only would matter if syncookies are disabled, I would not
> bother and only target net-next. This does not look serious enough to
> deserve backports to stable versions.
>
OK, thanks again.

>> Thanks very much.
>>
>> On Sun, Aug 20, 2017 at 12:25 PM, David Miller <davem@davemloft.net> wrote:
>> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>> > Date: Wed, 16 Aug 2017 20:02:45 -0700
>> >
>> >> Because we remove the tcp_tw_recycle support in the commit
>> >> 4396e46187c ('tcp: remove tcp_tw_recycle') and also delete
>> >> the code 'af_ops->route_req' for sysctl_tw_recycle in tcp_conn_request.
>> >> Now when we call the 'af_ops->route_req', the dist always is
>> >> NULL, and we remove the unnecessay check.
>> >>
>> >> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>> >
>> > This is a bug actually, rather than something to paper over
>> > by removing the check.
>> >
>> > Code earlier in this function needs a proper 'dst' in order to operate
>> > properly.
>> >
>> > There is a call to tcp_peer_is_proven() which must have a proper route
>> > to make the determination yet it will always be NULL.
>> >
>> > Please investigate what the code is doing and how a test became
>> > "unnecessary" over time before blindly removing it, thank you.
>
>

^ permalink raw reply

* Re: Re: [PATCH net-next] net: sched: Add the invalid handle check in qdisc_class_find
From: Cong Wang @ 2017-08-22  0:51 UTC (permalink / raw)
  To: Gao Feng; +Cc: David Miller, Linux Kernel Network Developers, Jamal Hadi Salim
In-Reply-To: <caf756f.76ba.15e0767aea0.Coremail.gfree.wind@vip.163.com>

On Mon, Aug 21, 2017 at 5:46 PM, Gao Feng <gfree.wind@vip.163.com> wrote:
> Hi Cong,
>
> Thanks your reminder firstly.
> But I had used the get_maintainer.pl actually before sent the patch.
>
> The following is the output.
> [fgao@ikuai8 net-next]#./scripts/get_maintainer.pl patch_ScheCheck/0001-net-sched-Add-the-invalid-handle-check-in-qdisc_clas.patch
> "David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING [GENERAL])
> netdev@vger.kernel.org (open list:NETWORKING [GENERAL])
> linux-kernel@vger.kernel.org (open list)
>
> I don't know if it is an issue of the script "get_maintainer.pl"  or my usage is wrong.

No, probably because we don't add include/net/sch_generic.h into TC
subsystem, while we should.

^ permalink raw reply

* Re:Re: [PATCH net-next] net: sched: Add the invalid handle check in qdisc_class_find
From: Gao Feng @ 2017-08-22  0:46 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, Linux Kernel Network Developers, Jamal Hadi Salim
In-Reply-To: <CAM_iQpW_odL=iqONVX2cLqPr_XQ-J40Ruv+kYMvwPThR42Udrg@mail.gmail.com>

At 2017-08-22 03:58:03, "Cong Wang" <xiyou.wangcong@gmail.com> wrote:
>On Mon, Aug 21, 2017 at 10:47 AM, David Miller <davem@davemloft.net> wrote:
>> From: gfree.wind@vip.163.com
>> Date: Fri, 18 Aug 2017 15:23:24 +0800
>>
>>> From: Gao Feng <gfree.wind@vip.163.com>
>>>
>>> Add the invalid handle "0" check to avoid unnecessary search, because
>>> the qdisc uses the skb->priority as the handle value to look up, and
>>> it is "0" usually.
>>>
>>> Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
>>
>> Jamal, Cong, please review.
>>
>> If 'id' zero is never hashed into the tables, this change looks
>> legitimate.
>
>Looks good to me.
>
>Gao, in the future please Cc maintainers directly, you can
>use ./scripts/get_maintainer.pl.
>
>Thanks.

Hi Cong,

Thanks your reminder firstly.
But I had used the get_maintainer.pl actually before sent the patch.

The following is the output.
[fgao@ikuai8 net-next]#./scripts/get_maintainer.pl patch_ScheCheck/0001-net-sched-Add-the-invalid-handle-check-in-qdisc_clas.patch
"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING [GENERAL])
netdev@vger.kernel.org (open list:NETWORKING [GENERAL])
linux-kernel@vger.kernel.org (open list)

I don't know if it is an issue of the script "get_maintainer.pl"  or my usage is wrong.

Anyway, thanks you & Jamal's review.

Best Regards
Feng


^ permalink raw reply

* Re: [PATCH v2 2/2] net: phy: Don't use drv when it is NULL in phy_attached_print
From: Florian Fainelli @ 2017-08-22  0:46 UTC (permalink / raw)
  To: Andrew Lunn, Romain Perier
  Cc: Giuseppe Cavallaro, Alexandre Torgue, netdev, linux-kernel
In-Reply-To: <20170821142426.GF1703@lunn.ch>

On 08/21/2017 07:24 AM, Andrew Lunn wrote:
> On Mon, Aug 21, 2017 at 01:45:30PM +0200, Romain Perier wrote:
>> Currently, if this logging function is used prior the phy driver is
>> bound to the phy device (that is usually done from .ndo_open),
>> 'phydev->drv' might be NULL, resulting in a kernel crash. That is
>> typically the case in the stmmac driver, info about the phy is displayed
>> during the registration of the MDIO bus, and then genphy driver is bound
>> to this phydev when .ndo_open is called.
>>
>> This commit fixes the issue by using the right genphy driver, when
>> phydev->drv is NULL.
>>
>> Fixes: fbca164776e4 ("net: stmmac: Use the right logging functi")
>> Signed-off-by: Romain Perier <romain.perier@collabora.com>
>> ---
>>  drivers/net/phy/phy_device.c | 13 +++++++++++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 1790f7fec125..6af6dc6dfeaf 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -864,15 +864,24 @@ EXPORT_SYMBOL(phy_attached_info);
>>  #define ATTACHED_FMT "attached PHY driver [%s] (mii_bus:phy_addr=%s, irq=%d)"
>>  void phy_attached_print(struct phy_device *phydev, const char *fmt, ...)
>>  {
>> +	struct phy_driver *drv = phydev->drv;
>> +
>> +	if (!drv) {
>> +		if (phydev->is_c45)
>> +			drv = &genphy_10g_driver;
>> +		else
>> +			drv = &genphy_driver;
>> +	}
>> +
> 
> As i said in my comment to v1, i don't like this.

Agreed, just use Andrew's earlier suggestion of checking phydev->drv
validity.

We don't have an equivalent of "unregistered" in the PHY layer, but
"unbound" seems like it could be what we want here.

Thanks
-- 
Florian

^ permalink raw reply

* Re: [PATCH v2 1/2] net: stmmac: Delete dead code for MDIO registration
From: Florian Fainelli @ 2017-08-22  0:45 UTC (permalink / raw)
  To: Romain Perier, Giuseppe Cavallaro, Alexandre Torgue, Andrew Lunn
  Cc: netdev, linux-kernel
In-Reply-To: <20170821114530.13706-2-romain.perier@collabora.com>

On 08/21/2017 04:45 AM, Romain Perier wrote:
> This code is no longer used, the logging function was changed by commit
> fbca164776e4 ("net: stmmac: Use the right logging functi").
> 
> Fixes: fbca164776e4 ("net: stmmac: Use the right logging functi")
> Signed-off-by: Romain Perier <romain.perier@collabora.com>
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 16 ----------------
>  1 file changed, 16 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> index 72ec711fcba2..f5f37bfa1d58 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> @@ -248,9 +248,6 @@ int stmmac_mdio_register(struct net_device *ndev)
>  	found = 0;
>  	for (addr = 0; addr < PHY_MAX_ADDR; addr++) {
>  		struct phy_device *phydev = mdiobus_get_phy(new_bus, addr);
> -		int act = 0;
> -		char irq_num[4];
> -		char *irq_str;
>  
>  		if (!phydev)
>  			continue;
> @@ -273,19 +270,6 @@ int stmmac_mdio_register(struct net_device *ndev)
>  		if (priv->plat->phy_addr == -1)
>  			priv->plat->phy_addr = addr;
>  
> -		act = (priv->plat->phy_addr == addr);
> -		switch (phydev->irq) {
> -		case PHY_POLL:
> -			irq_str = "POLL";
> -			break;
> -		case PHY_IGNORE_INTERRUPT:
> -			irq_str = "IGNORE";
> -			break;
> -		default:
> -			sprintf(irq_num, "%d", phydev->irq);
> -			irq_str = irq_num;
> -			break;
> -		}

I was actually just looking into moving these prints to
phy_attached_info(), since it is useful to know whether the interrupt is
POLL, IGNORE, or valid. Can you move that there? Then you can really
migrate over phy_attached_info() with no information loss.

Thanks!
-- 
Florian

^ permalink raw reply

* Re: [iproute PATCH v2] lib/bpf: Don't leak fp in bpf_find_mntpt()
From: Stephen Hemminger @ 2017-08-22  0:35 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netdev
In-Reply-To: <20170821144651.5379-1-phil@nwl.cc>

On Mon, 21 Aug 2017 16:46:51 +0200
Phil Sutter <phil@nwl.cc> wrote:

> If fopen() succeeded but len != PATH_MAX, the function leaks the open
> FILE pointer. Fix this by checking len value before calling fopen().
> 
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>


Thanks, Applied

^ permalink raw reply

* Re: [iproute PATCH v2 2/7] ss: Make sure index variable is >= 0
From: Stephen Hemminger @ 2017-08-22  0:34 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netdev
In-Reply-To: <20170821170813.29697-3-phil@nwl.cc>

On Mon, 21 Aug 2017 19:08:08 +0200
Phil Sutter <phil@nwl.cc> wrote:

> This shouldn't happen but relying upon external data without checking
> may lead to unexpected results.
> 
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> ---
>  misc/ss.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/misc/ss.c b/misc/ss.c
> index 10360e5a04ff8..1ee02d73b2d7f 100644
> --- a/misc/ss.c
> +++ b/misc/ss.c
> @@ -2004,7 +2004,7 @@ static void tcp_timer_print(struct tcpstat *s)
>  		"unknown"
>  	};
>  
> -	if (s->timer) {
> +	if (s->timer >= 0) {
>  		if (s->timer > 4)
>  			s->timer = 5;
>  		printf(" timer:(%s,%s,%d)",

Let's go one step deeper on this.
Why is s->timer an int, it should be unsigned. In which case the code in
print would not have to change.

^ permalink raw reply

* Re: [RFC net-next v2] bridge lwtunnel, VPLS & NVGRE
From: David Lamparter @ 2017-08-22  0:29 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Lamparter, netdev, bridge, amine.kherbouche, roopa
In-Reply-To: <20170821170151.5b12a392@xeon-e3>

On Mon, Aug 21, 2017 at 05:01:51PM -0700, Stephen Hemminger wrote:
> On Mon, 21 Aug 2017 19:15:17 +0200 David Lamparter <equinox@diac24.net> wrote:
> > > P.S.: For a little context on the bridge FDB changes - I'm hoping to
> > > find some time to extend this to the MDB to allow aggregating dst
> > > metadata and handing down a list of dst metas on TX.  This isn't
> > > specifically for VPLS but rather to give sufficient information to the
> > > 802.11 stack to allow it to optimize selecting rates (or unicasting)
> > > for multicast traffic by having the multicast subscriber list known.
> > > This is done by major commercial wifi solutions (e.g. google "dynamic
> > > multicast optimization".)  
> > 
> > You can find hacks at this on:
> > https://github.com/eqvinox/vpls-linux-kernel/tree/mdb-hack
> > Please note that the patches in that branch are not at an acceptable
> > quality level, but you can see the semantic relation to 802.11.
> > 
> > I would, however, like to point out that this branch has pseudo-working
> > IGMP/MLD snooping for VPLS, and it'd be 20-ish lines to add it to NVGRE
> > (I'll do that as soon as I get to it, it'll pop up on that branch too.)
> > 
> > This is relevant to the discussion because it's a feature which is
> > non-obvious (to me) on how to do with the VXLAN model of having an
> > entirely separate FDB.  Meanwhile, with this architecture, the proof of
> > concept / hack is coming in at a measly cost of:
> > 8 files changed, 176 insertions(+), 15 deletions(-)
> 
> I know the bridge is an easy target to extend L2 forwarding, but it is not
> the only option. Have you condidered building a new driver

Yes I have;  I dismissed the approach because even though an fdb is
reasonable to duplicate, I did not believe replicating multicast
snooping code into both VPLS and 802.11 (and possibly VXLAN) to be a
viable option.  ...is it?

> (like VXLAN does) which does the forwarding you want. Having all
> features in one driver makes for worse performance, and increased
> complexity.

Can you elaborate?  I agree with that sentence as a general statement,
but a general statement needs to apply to a specific situation.  As
discussed in the previous thread with Nikolay, checking skb->_refdst
against 0 should be doable without touching additional cachelines, so
the performance cost should be rather small.  For complexity - it's
keeping an extra pointer around, which is semantically bound to the
existing net_bridge_fdb_entry->dst.  On the other hand, it spares us
from another copy of a fdb implementation, and two copies of multicast
snooping code...  I honestly believe this patchset is a good approach.


-David

^ permalink raw reply

* Re: [iproute PATCH v2 0/3] Covscan: Fix for missing error checking
From: Stephen Hemminger @ 2017-08-22  0:29 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netdev
In-Reply-To: <20170821163652.23752-1-phil@nwl.cc>

On Mon, 21 Aug 2017 18:36:49 +0200
Phil Sutter <phil@nwl.cc> wrote:

> This series collects patches from v1 dealing with spots where error
> checking is necessary or recommended.
> 
> Minor changes to patches 1 and 2, patch 3 remains unchanged.
> 
> Phil Sutter (3):
>   iproute: Check mark value input
>   iplink_vrf: Complain if main table is not found
>   devlink: Check return code of strslashrsplit()
> 
>  devlink/devlink.c | 16 ++++++++++++----
>  ip/iplink_vrf.c   |  4 +++-
>  ip/iproute.c      |  6 ++++--
>  3 files changed, 19 insertions(+), 7 deletions(-)
> 

These 3 look fine. Applied

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox