Netdev List
 help / color / mirror / Atom feed
* Re: Re: [PATCH net-next v2 0/5] virtio: support packed ring
From: Michael S. Tsirkin @ 2018-10-11 14:17 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: Jason Wang, virtualization, linux-kernel, netdev, virtio-dev,
	wexu, jfreimann, maxime.coquelin, zhihong.wang
In-Reply-To: <20181011141331.GA11650@debian>

On Thu, Oct 11, 2018 at 10:13:31PM +0800, Tiwei Bie wrote:
> On Thu, Oct 11, 2018 at 09:48:48AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Oct 11, 2018 at 08:12:21PM +0800, Tiwei Bie wrote:
> > > > > But if it's not too late, I second for a OUT_OF_ORDER feature.
> > > > > Starting from in order can have much simpler code in driver.
> > > > > 
> > > > > Thanks
> > > > 
> > > > It's tricky to change the flag polarity because of compatibility
> > > > with legacy interfaces. Why is this such a big deal?
> > > > 
> > > > Let's teach drivers about IN_ORDER, then if devices
> > > > are in order it will get enabled by default.
> > > 
> > > Yeah, make sense.
> > > 
> > > Besides, I have done some further profiling and debugging
> > > both in kernel driver and DPDK vhost. Previously I was mislead
> > > by a bug in vhost code. I will send a patch to fix that bug.
> > > With that bug fixed, the performance of packed ring in the
> > > test between kernel driver and DPDK vhost is better now.
> > 
> > OK, if we get a performance gain on the virtio side, we can finally
> > upstream it. If you see that please re-post ASAP so we can
> > put it in the next kernel release.
> 
> Got it, I will re-post ASAP.
> 
> Thanks!


Pls remember to include data on performance gain in the cover letter.


> 
> > 
> > -- 
> > MST

^ permalink raw reply

* Re: [PATCH stable 4.4 V2 0/6] fix SegmentSmack in stable branch (CVE-2018-5390)
From: Greg KH @ 2018-10-11 14:07 UTC (permalink / raw)
  To: maowenan; +Cc: netdev, dwmw2, eric.dumazet, davem, stable, linux-kernel
In-Reply-To: <20180926202121.GB23881@kroah.com>

On Wed, Sep 26, 2018 at 10:21:21PM +0200, Greg KH wrote:
> On Tue, Sep 25, 2018 at 10:10:15PM +0800, maowenan wrote:
> > Hi Greg:
> > 
> > can you review this patch set?
> 
> It is still in the queue, don't worry.  It will take some more time to
> properly review and test it.
> 
> Ideally you could get someone else to test this and provide a
> "tested-by:" tag for it?

All now queued up, let's see what breaks :)

thanks,

greg k-h

^ permalink raw reply

* [PATCH] r8169: set RX_MULTI_EN bit in RxConfig for 8168F-family chips
From: Maciej S. Szmigiero @ 2018-10-11 14:02 UTC (permalink / raw)
  To: David S. Miller, Chris Clayton
  Cc: Heiner Kallweit, Azat Khuzhin, Realtek linux nic maintainers,
	netdev@vger.kernel.org, linux-kernel

It has been reported that since
commit 05212ba8132b42 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices")
at least RTL_GIGA_MAC_VER_38 NICs work erratically after a resume from
suspend.
The problem has been traced to a missing RX_MULTI_EN bit in the RxConfig
register.
We already set this bit for RTL_GIGA_MAC_VER_35 NICs of the same 8168F
chip family so let's do it also for its other siblings: RTL_GIGA_MAC_VER_36
and RTL_GIGA_MAC_VER_38.

Curiously, the NIC seems to work fine after a system boot without having
this bit set as long as the system isn't suspended and resumed.

Fixes: 05212ba8132b42 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices")
Reported-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
---
 drivers/net/ethernet/realtek/r8169.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 7d3f671e1bb3..b68e32186d67 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4269,8 +4269,8 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
 		RTL_W32(tp, RxConfig, RX_FIFO_THRESH | RX_DMA_BURST);
 		break;
 	case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
-	case RTL_GIGA_MAC_VER_34:
-	case RTL_GIGA_MAC_VER_35:
+	case RTL_GIGA_MAC_VER_34 ... RTL_GIGA_MAC_VER_36:
+	case RTL_GIGA_MAC_VER_38:
 		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
 		break;
 	case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
-- 
2.17.0

^ permalink raw reply related

* Re: [virtio-dev] Re: [PATCH net-next v2 0/5] virtio: support packed ring
From: Michael S. Tsirkin @ 2018-10-11 13:48 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: Jason Wang, virtualization, linux-kernel, netdev, virtio-dev,
	wexu, jfreimann
In-Reply-To: <20181011121221.GA27106@debian>

On Thu, Oct 11, 2018 at 08:12:21PM +0800, Tiwei Bie wrote:
> > > But if it's not too late, I second for a OUT_OF_ORDER feature.
> > > Starting from in order can have much simpler code in driver.
> > > 
> > > Thanks
> > 
> > It's tricky to change the flag polarity because of compatibility
> > with legacy interfaces. Why is this such a big deal?
> > 
> > Let's teach drivers about IN_ORDER, then if devices
> > are in order it will get enabled by default.
> 
> Yeah, make sense.
> 
> Besides, I have done some further profiling and debugging
> both in kernel driver and DPDK vhost. Previously I was mislead
> by a bug in vhost code. I will send a patch to fix that bug.
> With that bug fixed, the performance of packed ring in the
> test between kernel driver and DPDK vhost is better now.

OK, if we get a performance gain on the virtio side, we can finally
upstream it. If you see that please re-post ASAP so we can
put it in the next kernel release.

-- 
MST

^ permalink raw reply

* [PATCH net-next v6] net/ncsi: Extend NC-SI Netlink interface to allow user space to send NC-SI command
From: Justin.Lee1 @ 2018-10-11  6:21 UTC (permalink / raw)
  To: sam, joel; +Cc: linux-aspeed, netdev, openbmc, amithash, christian, vijaykhemka

The new command (NCSI_CMD_SEND_CMD) is added to allow user space application
to send NC-SI command to the network card.
Also, add a new attribute (NCSI_ATTR_DATA) for transferring request and response.

The work flow is as below. 

Request:
User space application
	-> Netlink interface (msg)
	-> new Netlink handler - ncsi_send_cmd_nl()
	-> ncsi_xmit_cmd()

Response:
Response received - ncsi_rcv_rsp()
	-> internal response handler - ncsi_rsp_handler_xxx()
	-> ncsi_rsp_handler_netlink()
	-> ncsi_send_netlink_rsp ()
	-> Netlink interface (msg)
	-> user space application

Command timeout - ncsi_request_timeout()
	-> ncsi_send_netlink_timeout ()
	-> Netlink interface (msg with zero data length)
	-> user space application

Error:
Error detected
	-> ncsi_send_netlink_err ()
	-> Netlink interface (err msg)
	-> user space application


Signed-off-by: Justin Lee <justin.lee1@dell.com> 


---
V6: Add checking before accessing NCSI_ATTR_DATA attribute to avoid null-dereference. 
V5: Update comments and debug message.
V4: Update comments and remove some debug message.
V3: Based on http://patchwork.ozlabs.org/patch/979688/ to remove the duplicated code.
V2: Remove non-related debug message and clean up the code.

 include/uapi/linux/ncsi.h |   6 ++
 net/ncsi/internal.h       |   7 ++
 net/ncsi/ncsi-cmd.c       |   8 ++
 net/ncsi/ncsi-manage.c    |  16 ++++
 net/ncsi/ncsi-netlink.c   | 205 ++++++++++++++++++++++++++++++++++++++++++++++
 net/ncsi/ncsi-netlink.h   |  12 +++
 net/ncsi/ncsi-rsp.c       |  67 +++++++++++++--
 7 files changed, 316 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/ncsi.h b/include/uapi/linux/ncsi.h
index 4c292ec..0a26a55 100644
--- a/include/uapi/linux/ncsi.h
+++ b/include/uapi/linux/ncsi.h
@@ -23,6 +23,9 @@
  *	optionally the preferred NCSI_ATTR_CHANNEL_ID.
  * @NCSI_CMD_CLEAR_INTERFACE: clear any preferred package/channel combination.
  *	Requires NCSI_ATTR_IFINDEX.
+ * @NCSI_CMD_SEND_CMD: send NC-SI command to network card.
+ *	Requires NCSI_ATTR_IFINDEX, NCSI_ATTR_PACKAGE_ID
+ *	and NCSI_ATTR_CHANNEL_ID.
  * @NCSI_CMD_MAX: highest command number
  */
 enum ncsi_nl_commands {
@@ -30,6 +33,7 @@ enum ncsi_nl_commands {
 	NCSI_CMD_PKG_INFO,
 	NCSI_CMD_SET_INTERFACE,
 	NCSI_CMD_CLEAR_INTERFACE,
+	NCSI_CMD_SEND_CMD,
 
 	__NCSI_CMD_AFTER_LAST,
 	NCSI_CMD_MAX = __NCSI_CMD_AFTER_LAST - 1
@@ -43,6 +47,7 @@ enum ncsi_nl_commands {
  * @NCSI_ATTR_PACKAGE_LIST: nested array of NCSI_PKG_ATTR attributes
  * @NCSI_ATTR_PACKAGE_ID: package ID
  * @NCSI_ATTR_CHANNEL_ID: channel ID
+ * @NCSI_ATTR_DATA: command payload
  * @NCSI_ATTR_MAX: highest attribute number
  */
 enum ncsi_nl_attrs {
@@ -51,6 +56,7 @@ enum ncsi_nl_attrs {
 	NCSI_ATTR_PACKAGE_LIST,
 	NCSI_ATTR_PACKAGE_ID,
 	NCSI_ATTR_CHANNEL_ID,
+	NCSI_ATTR_DATA,
 
 	__NCSI_ATTR_AFTER_LAST,
 	NCSI_ATTR_MAX = __NCSI_ATTR_AFTER_LAST - 1
diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index 3d0a33b..13c9b5e 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -175,6 +175,8 @@ struct ncsi_package;
 #define NCSI_RESERVED_CHANNEL	0x1f
 #define NCSI_CHANNEL_INDEX(c)	((c) & ((1 << NCSI_PACKAGE_SHIFT) - 1))
 #define NCSI_TO_CHANNEL(p, c)	(((p) << NCSI_PACKAGE_SHIFT) | (c))
+#define NCSI_MAX_PACKAGE	8
+#define NCSI_MAX_CHANNEL	32
 
 struct ncsi_channel {
 	unsigned char               id;
@@ -220,11 +222,15 @@ struct ncsi_request {
 	bool                 used;    /* Request that has been assigned  */
 	unsigned int         flags;   /* NCSI request property           */
 #define NCSI_REQ_FLAG_EVENT_DRIVEN	1
+#define NCSI_REQ_FLAG_NETLINK_DRIVEN	2
 	struct ncsi_dev_priv *ndp;    /* Associated NCSI device          */
 	struct sk_buff       *cmd;    /* Associated NCSI command packet  */
 	struct sk_buff       *rsp;    /* Associated NCSI response packet */
 	struct timer_list    timer;   /* Timer on waiting for response   */
 	bool                 enabled; /* Time has been enabled or not    */
+	u32                  snd_seq;     /* netlink sending sequence number */
+	u32                  snd_portid;  /* netlink portid of sender        */
+	struct nlmsghdr      nlhdr;       /* netlink message header          */
 };
 
 enum {
@@ -310,6 +316,7 @@ struct ncsi_cmd_arg {
 		unsigned int   dwords[4];
 	};
 	unsigned char        *data;       /* NCSI OEM data                 */
+	struct genl_info     *info;       /* Netlink information           */
 };
 
 extern struct list_head ncsi_dev_list;
diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
index 82b7d92..356af47 100644
--- a/net/ncsi/ncsi-cmd.c
+++ b/net/ncsi/ncsi-cmd.c
@@ -17,6 +17,7 @@
 #include <net/ncsi.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
+#include <net/genetlink.h>
 
 #include "internal.h"
 #include "ncsi-pkt.h"
@@ -346,6 +347,13 @@ int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
 	if (!nr)
 		return -ENOMEM;
 
+	/* track netlink information */
+	if (nca->req_flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
+		nr->snd_seq = nca->info->snd_seq;
+		nr->snd_portid = nca->info->snd_portid;
+		nr->nlhdr = *nca->info->nlhdr;
+	}
+
 	/* Prepare the packet */
 	nca->id = nr->id;
 	ret = nch->handler(nr->cmd, nca);
diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
index 0912847..76a4bcb 100644
--- a/net/ncsi/ncsi-manage.c
+++ b/net/ncsi/ncsi-manage.c
@@ -19,6 +19,7 @@
 #include <net/addrconf.h>
 #include <net/ipv6.h>
 #include <net/if_inet6.h>
+#include <net/genetlink.h>
 
 #include "internal.h"
 #include "ncsi-pkt.h"
@@ -406,6 +407,9 @@ static void ncsi_request_timeout(struct timer_list *t)
 {
 	struct ncsi_request *nr = from_timer(nr, t, timer);
 	struct ncsi_dev_priv *ndp = nr->ndp;
+	struct ncsi_package *np;
+	struct ncsi_channel *nc;
+	struct ncsi_cmd_pkt *cmd;
 	unsigned long flags;
 
 	/* If the request already had associated response,
@@ -419,6 +423,18 @@ static void ncsi_request_timeout(struct timer_list *t)
 	}
 	spin_unlock_irqrestore(&ndp->lock, flags);
 
+	if (nr->flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
+		if (nr->cmd) {
+			/* Find the package */
+			cmd = (struct ncsi_cmd_pkt *)
+			      skb_network_header(nr->cmd);
+			ncsi_find_package_and_channel(ndp,
+						      cmd->cmd.common.channel,
+						      &np, &nc);
+			ncsi_send_netlink_timeout(nr, np, nc);
+		}
+	}
+
 	/* Release the request */
 	ncsi_free_request(nr);
 }
diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
index 45f33d6..80ba212 100644
--- a/net/ncsi/ncsi-netlink.c
+++ b/net/ncsi/ncsi-netlink.c
@@ -20,6 +20,7 @@
 #include <uapi/linux/ncsi.h>
 
 #include "internal.h"
+#include "ncsi-pkt.h"
 #include "ncsi-netlink.h"
 
 static struct genl_family ncsi_genl_family;
@@ -29,6 +30,7 @@ static const struct nla_policy ncsi_genl_policy[NCSI_ATTR_MAX + 1] = {
 	[NCSI_ATTR_PACKAGE_LIST] =	{ .type = NLA_NESTED },
 	[NCSI_ATTR_PACKAGE_ID] =	{ .type = NLA_U32 },
 	[NCSI_ATTR_CHANNEL_ID] =	{ .type = NLA_U32 },
+	[NCSI_ATTR_DATA] =		{ .type = NLA_BINARY, .len = 2048 },
 };
 
 static struct ncsi_dev_priv *ndp_from_ifindex(struct net *net, u32 ifindex)
@@ -366,6 +368,203 @@ static int ncsi_clear_interface_nl(struct sk_buff *msg, struct genl_info *info)
 	return 0;
 }
 
+static int ncsi_send_cmd_nl(struct sk_buff *msg, struct genl_info *info)
+{
+	struct ncsi_dev_priv *ndp;
+
+	struct ncsi_cmd_arg nca;
+	struct ncsi_pkt_hdr *hdr;
+
+	u32 package_id, channel_id;
+	unsigned char *data;
+	int len, ret;
+
+	if (!info || !info->attrs) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!info->attrs[NCSI_ATTR_IFINDEX]) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!info->attrs[NCSI_ATTR_PACKAGE_ID]) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!info->attrs[NCSI_ATTR_CHANNEL_ID]) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!info->attrs[NCSI_ATTR_DATA]) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ndp = ndp_from_ifindex(get_net(sock_net(msg->sk)),
+			       nla_get_u32(info->attrs[NCSI_ATTR_IFINDEX]));
+	if (!ndp) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	package_id = nla_get_u32(info->attrs[NCSI_ATTR_PACKAGE_ID]);
+	channel_id = nla_get_u32(info->attrs[NCSI_ATTR_CHANNEL_ID]);
+
+	if (package_id >= NCSI_MAX_PACKAGE || channel_id >= NCSI_MAX_CHANNEL) {
+		ret = -ERANGE;
+		goto out_netlink;
+	}
+
+	len = nla_len(info->attrs[NCSI_ATTR_DATA]);
+	if (len < sizeof(struct ncsi_pkt_hdr)) {
+		netdev_info(ndp->ndev.dev, "NCSI: no command to send %u\n",
+			    package_id);
+		ret = -EINVAL;
+		goto out_netlink;
+	} else {
+		data = (unsigned char *)nla_data(info->attrs[NCSI_ATTR_DATA]);
+	}
+
+	hdr = (struct ncsi_pkt_hdr *)data;
+
+	nca.ndp = ndp;
+	nca.package = (unsigned char)package_id;
+	nca.channel = (unsigned char)channel_id;
+	nca.type = hdr->type;
+	nca.req_flags = NCSI_REQ_FLAG_NETLINK_DRIVEN;
+	nca.info = info;
+	nca.payload = ntohs(hdr->length);
+	nca.data = data + sizeof(*hdr);
+
+	ret = ncsi_xmit_cmd(&nca);
+out_netlink:
+	if (ret != 0) {
+		netdev_err(ndp->ndev.dev,
+			   "NCSI: Error %d sending command\n",
+			   ret);
+		ncsi_send_netlink_err(ndp->ndev.dev,
+				      info->snd_seq,
+				      info->snd_portid,
+				      info->nlhdr,
+				      ret);
+	}
+out:
+	return ret;
+}
+
+int ncsi_send_netlink_rsp(struct ncsi_request *nr,
+			  struct ncsi_package *np,
+			  struct ncsi_channel *nc)
+{
+	struct sk_buff *skb;
+	struct net *net;
+	void *hdr;
+	int rc;
+
+	net = dev_net(nr->rsp->dev);
+
+	skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
+	if (!skb)
+		return -ENOMEM;
+
+	hdr = genlmsg_put(skb, nr->snd_portid, nr->snd_seq,
+			  &ncsi_genl_family, 0, NCSI_CMD_SEND_CMD);
+	if (!hdr) {
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	nla_put_u32(skb, NCSI_ATTR_IFINDEX, nr->rsp->dev->ifindex);
+	if (np)
+		nla_put_u32(skb, NCSI_ATTR_PACKAGE_ID, np->id);
+	if (nc)
+		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, nc->id);
+	else
+		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, NCSI_RESERVED_CHANNEL);
+
+	rc = nla_put(skb, NCSI_ATTR_DATA, nr->rsp->len, (void *)nr->rsp->data);
+	if (rc)
+		goto err;
+
+	genlmsg_end(skb, hdr);
+	return genlmsg_unicast(net, skb, nr->snd_portid);
+
+err:
+	kfree_skb(skb);
+	return rc;
+}
+
+int ncsi_send_netlink_timeout(struct ncsi_request *nr,
+			      struct ncsi_package *np,
+			      struct ncsi_channel *nc)
+{
+	struct sk_buff *skb;
+	struct net *net;
+	void *hdr;
+
+	skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
+	if (!skb)
+		return -ENOMEM;
+
+	hdr = genlmsg_put(skb, nr->snd_portid, nr->snd_seq,
+			  &ncsi_genl_family, 0, NCSI_CMD_SEND_CMD);
+	if (!hdr) {
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	net = dev_net(nr->cmd->dev);
+
+	nla_put_u32(skb, NCSI_ATTR_IFINDEX, nr->cmd->dev->ifindex);
+
+	if (np)
+		nla_put_u32(skb, NCSI_ATTR_PACKAGE_ID, np->id);
+	else
+		nla_put_u32(skb, NCSI_ATTR_PACKAGE_ID,
+			    NCSI_PACKAGE_INDEX((((struct ncsi_pkt_hdr *)
+						 nr->cmd->data)->channel)));
+
+	if (nc)
+		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, nc->id);
+	else
+		nla_put_u32(skb, NCSI_ATTR_CHANNEL_ID, NCSI_RESERVED_CHANNEL);
+
+	genlmsg_end(skb, hdr);
+	return genlmsg_unicast(net, skb, nr->snd_portid);
+}
+
+int ncsi_send_netlink_err(struct net_device *dev,
+			  u32 snd_seq,
+			  u32 snd_portid,
+			  struct nlmsghdr *nlhdr,
+			  int err)
+{
+	struct sk_buff *skb;
+	struct nlmsghdr *nlh;
+	struct nlmsgerr *nle;
+	struct net *net;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
+	if (!skb)
+		return -ENOMEM;
+
+	net = dev_net(dev);
+
+	nlh = nlmsg_put(skb, snd_portid, snd_seq,
+			NLMSG_ERROR, sizeof(*nle), 0);
+	nle = (struct nlmsgerr *)nlmsg_data(nlh);
+	nle->error = err;
+	memcpy(&nle->msg, nlhdr, sizeof(*nlh));
+
+	nlmsg_end(skb, nlh);
+
+	return nlmsg_unicast(net->genl_sock, skb, snd_portid);
+}
+
 static const struct genl_ops ncsi_ops[] = {
 	{
 		.cmd = NCSI_CMD_PKG_INFO,
@@ -386,6 +585,12 @@ static const struct genl_ops ncsi_ops[] = {
 		.doit = ncsi_clear_interface_nl,
 		.flags = GENL_ADMIN_PERM,
 	},
+	{
+		.cmd = NCSI_CMD_SEND_CMD,
+		.policy = ncsi_genl_policy,
+		.doit = ncsi_send_cmd_nl,
+		.flags = GENL_ADMIN_PERM,
+	},
 };
 
 static struct genl_family ncsi_genl_family __ro_after_init = {
diff --git a/net/ncsi/ncsi-netlink.h b/net/ncsi/ncsi-netlink.h
index 91a5c25..c4a4688 100644
--- a/net/ncsi/ncsi-netlink.h
+++ b/net/ncsi/ncsi-netlink.h
@@ -14,6 +14,18 @@
 
 #include "internal.h"
 
+int ncsi_send_netlink_rsp(struct ncsi_request *nr,
+			  struct ncsi_package *np,
+			  struct ncsi_channel *nc);
+int ncsi_send_netlink_timeout(struct ncsi_request *nr,
+			      struct ncsi_package *np,
+			      struct ncsi_channel *nc);
+int ncsi_send_netlink_err(struct net_device *dev,
+			  u32 snd_seq,
+			  u32 snd_portid,
+			  struct nlmsghdr *nlhdr,
+			  int err);
+
 int ncsi_init_netlink(struct net_device *dev);
 int ncsi_unregister_netlink(struct net_device *dev);
 
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index d66b347..dd931d2 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -16,9 +16,11 @@
 #include <net/ncsi.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
+#include <net/genetlink.h>
 
 #include "internal.h"
 #include "ncsi-pkt.h"
+#include "ncsi-netlink.h"
 
 static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
 				 unsigned short payload)
@@ -32,15 +34,25 @@ static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
 	 * before calling this function.
 	 */
 	h = (struct ncsi_rsp_pkt_hdr *)skb_network_header(nr->rsp);
-	if (h->common.revision != NCSI_PKT_REVISION)
+
+	if (h->common.revision != NCSI_PKT_REVISION) {
+		netdev_dbg(nr->ndp->ndev.dev,
+			   "NCSI: unsupported header revision\n");
 		return -EINVAL;
-	if (ntohs(h->common.length) != payload)
+	}
+	if (ntohs(h->common.length) != payload) {
+		netdev_dbg(nr->ndp->ndev.dev,
+			   "NCSI: payload length mismatched\n");
 		return -EINVAL;
+	}
 
 	/* Check on code and reason */
 	if (ntohs(h->code) != NCSI_PKT_RSP_C_COMPLETED ||
-	    ntohs(h->reason) != NCSI_PKT_RSP_R_NO_ERROR)
-		return -EINVAL;
+	    ntohs(h->reason) != NCSI_PKT_RSP_R_NO_ERROR) {
+		netdev_dbg(nr->ndp->ndev.dev,
+			   "NCSI: non zero response/reason code\n");
+		return -EPERM;
+	}
 
 	/* Validate checksum, which might be zeroes if the
 	 * sender doesn't support checksum according to NCSI
@@ -52,8 +64,11 @@ static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
 
 	checksum = ncsi_calculate_checksum((unsigned char *)h,
 					   sizeof(*h) + payload - 4);
-	if (*pchecksum != htonl(checksum))
+
+	if (*pchecksum != htonl(checksum)) {
+		netdev_dbg(nr->ndp->ndev.dev, "NCSI: checksum mismatched\n");
 		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -941,6 +956,26 @@ static int ncsi_rsp_handler_gpuuid(struct ncsi_request *nr)
 	return 0;
 }
 
+static int ncsi_rsp_handler_netlink(struct ncsi_request *nr)
+{
+	struct ncsi_rsp_pkt *rsp;
+	struct ncsi_dev_priv *ndp = nr->ndp;
+	struct ncsi_package *np;
+	struct ncsi_channel *nc;
+	int ret;
+
+	/* Find the package */
+	rsp = (struct ncsi_rsp_pkt *)skb_network_header(nr->rsp);
+	ncsi_find_package_and_channel(ndp, rsp->rsp.common.channel,
+				      &np, &nc);
+	if (!np)
+		return -ENODEV;
+
+	ret = ncsi_send_netlink_rsp(nr, np, nc);
+
+	return ret;
+}
+
 static struct ncsi_rsp_handler {
 	unsigned char	type;
 	int             payload;
@@ -1043,6 +1078,17 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
 		netdev_warn(ndp->ndev.dev,
 			    "NCSI: 'bad' packet ignored for type 0x%x\n",
 			    hdr->type);
+
+		if (nr->flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
+			if (ret == -EPERM)
+				goto out_netlink;
+			else
+				ncsi_send_netlink_err(ndp->ndev.dev,
+						      nr->snd_seq,
+						      nr->snd_portid,
+						      &nr->nlhdr,
+						      ret);
+		}
 		goto out;
 	}
 
@@ -1052,6 +1098,17 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
 		netdev_err(ndp->ndev.dev,
 			   "NCSI: Handler for packet type 0x%x returned %d\n",
 			   hdr->type, ret);
+
+out_netlink:
+	if (nr->flags == NCSI_REQ_FLAG_NETLINK_DRIVEN) {
+		ret = ncsi_rsp_handler_netlink(nr);
+		if (ret) {
+			netdev_err(ndp->ndev.dev,
+				   "NCSI: Netlink handler for packet type 0x%x returned %d\n",
+				   hdr->type, ret);
+		}
+	}
+
 out:
 	ncsi_free_request(nr);
 	return ret;
-- 
2.9.3



^ permalink raw reply related

* Re: [PATCH net-next 0/7] add support for VSC8584 and VSC8574 Microsemi quad-port PHYs
From: Allan W. Nielsen @ 2018-10-11  6:18 UTC (permalink / raw)
  To: Linus Walleij; +Cc: Alexandre Belloni, netdev
In-Reply-To: <CACRpkdZYpDJGzexRr7y4LO+hyJ92aEu_eRquK50iBWaaox5H5Q@mail.gmail.com>

Hi Linus,

I'm Allan, working for Microchip, who acquired Microsemi, who acquired Vitesse.

Alexandre pointed me to your comments on the VSC7384 switch done by Vitesse ~15
years ago. BTW: the chip is not being sold any more, and is unlikely to turn up
in new products.

I managed to find the complete datasheet of this device, and convincing people
that it can be "opened-up" meaning that it is available without any NDA or
similar. It is a 1.2mb/200 page pdf, it is not avialable at any web page (that I
know of), but if you are interested in having it then I can send it in a mail to
you.

The 09/20/2018 14:38, Linus Walleij wrote:
> Just as a drive-by comment this seems vaguely related to the Vitesse
> DSA switch I merged in drivers/net/dsa/vitesse-vsc73xx.c
> The VSC* product name handily gives away the origin in Vitesse's
> product line.
> 
> The VSC73xx also have the 8051 CPU and internal RAM, but are
> accessed (typically) over SPI, and AFAICT this thing is talking over
> MDIO.
> 
> The Vitesse 73xx however also supports a WAN port and VLANs
> which makes it significantly different, falling into switch class I
> guess.
That is right, here are the list of feature from page 1 in the datasheet.

Features:
- 12 Gigabit Ethernet ports with nonblocking wire- speed performance
- IEEE802.1Q-in-Q nested VLAN support
- Tri-speed (10/100/1000 Mbps) RGMII interfaces
- Full duplex flow control (IEEE802.3x) and half duplex back pressure
- Support for both wire-speed automatic learning, and CPU-based learning
- Flexible link aggregation compliant with IEEE802.3ad
- 208 kB on-chip frame buffer
- Spanning Tree Protocol support (IEEE802.1D)
- Jumbo frame support
- Multiple Spanning Tree support (IEEE802.1s)
- Programmable classifier for QoS (Layer 4/Multimedia) into four classes of
  service
- Port-based Access Control (IEEE802.1X)
- IGMP, GARP, GMRP, and GVRP support
- 8192 MAC addresses and 4,096 VLAN support (IEEE802.1Q)
- Cost effective 4-pin serial CPU interface
- Per-port shaping, policing, and Broadcast and Multicast Storm Control
- Selection between on-chip 8051 CPU, or off-chip 8-bit or 16-bit CPU for SNMP
  and Web-based management

/Allan

^ permalink raw reply

* Re: BUG: corrupted list in p9_read_work
From: Dmitry Vyukov @ 2018-10-11 13:40 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML,
	Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs,
	v9fs-developer
In-Reply-To: <CACT4Y+YvJ-KT4UxtOXpou1xpT3wBASDkt135zr2BwrRi0gv4Tw@mail.gmail.com>

On Thu, Oct 11, 2018 at 3:27 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Thu, Oct 11, 2018 at 3:10 PM, Dominique Martinet
> <asmadeus@codewreck.org> wrote:
>> Dmitry Vyukov wrote on Thu, Oct 11, 2018:
>>> > That's still the tricky part, I'm afraid... Making a separate server
>>> > would have been easy because I could have reused some of my junk for the
>>> > actual connection handling (some rdma helper library I wrote ages
>>> > ago[1]), but if you're going to just embed C code you'll probably want
>>> > something lower level? I've never seen syzkaller use any library call
>>> > but I'm not even sure I would know how to create a qp without
>>> > libibverbs, would standard stuff be OK ?
>>>
>>> Raw syscalls preferably.
>>> What does 'rxe_cfg start ens3' do on syscall level? Some netlink?
>>
>> modprobe rdma_rxe (and a bunch of other rdma modules before that) then
>> writes the interface name in /sys/module/rdma_rxe/parameters/add
>> apparently; then checks it worked.
>> this part could be done in C directly without too much trouble, but as
>> long as the proper kernel configuration/modules are available
>
> Now we are talking!
> We generally assume that all modules are simply compiled into kernel.
> At least that's we have on syzbot. If somebody can't compile them in,
> we can suggest to add modprobe into init.
> So this boils down to just writing to /sys/module/rdma_rxe/parameters/add.

This fails for me:

root@syzkaller:~# echo -n syz1 > /sys/module/rdma_rxe/parameters/add
[20992.905406] rdma_rxe: interface syz1 not found
bash: echo: write error: Invalid argument



>>> Any libraries and utilities are hell pain in linux world. Will it work
>>> in Android userspace? gVisor? Who will explain all syzkaller users
>>> where they get this for their who-knows-what distro, which is 10 years
>>> old because of corp policies, and debug how their version of the
>>> library has a slightly incompatible version?
>>> For example, after figuring out that rxe_cfg actually comes from
>>> rdma-core (which is a separate delight on linux), my debian
>>> destribution failed to install it because of some conflicts around
>>> /etc/modprobe.d/mlx4.conf, and my ubuntu distro does not know about
>>> such package. And we've just started :)
>>
>> The rdma ecosystem is a pain, I'll easily agree with that...
>>
>>> Syscalls tend to be simpler and more reliable. If it gives ENOSUPP,
>>> ok, that's it. If it works, great, we can use it.
>>
>> I'll have to look into it a bit more; libibverbs abstracts a lot of
>> stuff into per-nic userspace drivers (the files I cited in a previous
>> mail) and basically with the mellanox cards I'm familiar with the whole
>> user session looks like this:
>>  * common libibverbs/rdmacm code opens /dev/infiniband/rdma_cm and
>> /dev/infiniband/uverbs0 (plus a bunch of files to figure out abi
>> version, what user driver to load etc)
>>  * it and the userspace driver issue "commands" over these two files' fd
>> to setup the connection ; some commands are standard but some are
>> specific to the interface and defined in the driver.
>
> But we will use some kind of virtual/stub driver, right? We don't have
> real hardware. So all these commands should be fixed and known for the
> virtual/stub driver.
>
>> There are many facets to a connection in RDMA: a protection domain used
>> to register memory with the nic, a queue pair that is the actual tx/rx
>> connection, optionally a completion channel that will be another fd to
>> listen on for events that tell you something happened and finally some
>> memory regions to directly communicate with the nic from userspace
>> depending on the specific driver.
>>  * then there's the actual usage, more commands through the uverbs0 char
>> device to register the memory you'll use, and once that's done it's
>> entierly up to the driver - for example the mellanox lib can do
>> everything in userspace playing with the memory regions it registered,
>> but I'd wager the rxe driver does more calls through the uverbs0 fd...
>>
>> Honestly I'm not keen on reimplementing all of this; the interface
>> itself pretty much depends on your version of the kernel (there is a
>> common ABI defined, but as far as specific nics are concerned if your
>> kernel module doesn't match the user library version you can get some
>> nasty surprises), and it's far from the black or white of a good ol'
>> ENOSUPP error.
>>
>>
>> I'll look if I can figure out if there is a common subset of verbs
>> commands that are standard and sufficient to setup a listening
>> connection and exchange data that should be supported for all devices
>> and would let us reimplement just that, but while I hear your point
>> about android and ten years in the future I think it's more likely than
>> ten years in the future the verb abi will have changed but libibverbs
>> will just have the new version implemented and hide the change :P
>
> But again we don't need to support all of the available hardware.
> For example, we are testing net stack from external side using tun.
> tun is a very simple, virtual abstraction of a network card. It allows
> us to test all of generic net stack starting from L2 without messing
> with any real drivers and their differences entirely. I had impression
> that we are talking about something similar here too. Or not?
>
> Also I am a bit missing context about rdma<->9p interface. Do we need
> to setup all these ring buffers to satisfy the parts that 9p needs? Is
> it that 9p actually reads data directly from these ring buffers? Or
> there is some higher-level rdma interface that 9p uses?

^ permalink raw reply

* Re: BUG: corrupted list in p9_read_work
From: Dmitry Vyukov @ 2018-10-11 13:27 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML,
	Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs,
	v9fs-developer
In-Reply-To: <20181011131045.GA32030@nautica>

On Thu, Oct 11, 2018 at 3:10 PM, Dominique Martinet
<asmadeus@codewreck.org> wrote:
> Dmitry Vyukov wrote on Thu, Oct 11, 2018:
>> > That's still the tricky part, I'm afraid... Making a separate server
>> > would have been easy because I could have reused some of my junk for the
>> > actual connection handling (some rdma helper library I wrote ages
>> > ago[1]), but if you're going to just embed C code you'll probably want
>> > something lower level? I've never seen syzkaller use any library call
>> > but I'm not even sure I would know how to create a qp without
>> > libibverbs, would standard stuff be OK ?
>>
>> Raw syscalls preferably.
>> What does 'rxe_cfg start ens3' do on syscall level? Some netlink?
>
> modprobe rdma_rxe (and a bunch of other rdma modules before that) then
> writes the interface name in /sys/module/rdma_rxe/parameters/add
> apparently; then checks it worked.
> this part could be done in C directly without too much trouble, but as
> long as the proper kernel configuration/modules are available

Now we are talking!
We generally assume that all modules are simply compiled into kernel.
At least that's we have on syzbot. If somebody can't compile them in,
we can suggest to add modprobe into init.
So this boils down to just writing to /sys/module/rdma_rxe/parameters/add.



>> Any libraries and utilities are hell pain in linux world. Will it work
>> in Android userspace? gVisor? Who will explain all syzkaller users
>> where they get this for their who-knows-what distro, which is 10 years
>> old because of corp policies, and debug how their version of the
>> library has a slightly incompatible version?
>> For example, after figuring out that rxe_cfg actually comes from
>> rdma-core (which is a separate delight on linux), my debian
>> destribution failed to install it because of some conflicts around
>> /etc/modprobe.d/mlx4.conf, and my ubuntu distro does not know about
>> such package. And we've just started :)
>
> The rdma ecosystem is a pain, I'll easily agree with that...
>
>> Syscalls tend to be simpler and more reliable. If it gives ENOSUPP,
>> ok, that's it. If it works, great, we can use it.
>
> I'll have to look into it a bit more; libibverbs abstracts a lot of
> stuff into per-nic userspace drivers (the files I cited in a previous
> mail) and basically with the mellanox cards I'm familiar with the whole
> user session looks like this:
>  * common libibverbs/rdmacm code opens /dev/infiniband/rdma_cm and
> /dev/infiniband/uverbs0 (plus a bunch of files to figure out abi
> version, what user driver to load etc)
>  * it and the userspace driver issue "commands" over these two files' fd
> to setup the connection ; some commands are standard but some are
> specific to the interface and defined in the driver.

But we will use some kind of virtual/stub driver, right? We don't have
real hardware. So all these commands should be fixed and known for the
virtual/stub driver.

> There are many facets to a connection in RDMA: a protection domain used
> to register memory with the nic, a queue pair that is the actual tx/rx
> connection, optionally a completion channel that will be another fd to
> listen on for events that tell you something happened and finally some
> memory regions to directly communicate with the nic from userspace
> depending on the specific driver.
>  * then there's the actual usage, more commands through the uverbs0 char
> device to register the memory you'll use, and once that's done it's
> entierly up to the driver - for example the mellanox lib can do
> everything in userspace playing with the memory regions it registered,
> but I'd wager the rxe driver does more calls through the uverbs0 fd...
>
> Honestly I'm not keen on reimplementing all of this; the interface
> itself pretty much depends on your version of the kernel (there is a
> common ABI defined, but as far as specific nics are concerned if your
> kernel module doesn't match the user library version you can get some
> nasty surprises), and it's far from the black or white of a good ol'
> ENOSUPP error.
>
>
> I'll look if I can figure out if there is a common subset of verbs
> commands that are standard and sufficient to setup a listening
> connection and exchange data that should be supported for all devices
> and would let us reimplement just that, but while I hear your point
> about android and ten years in the future I think it's more likely than
> ten years in the future the verb abi will have changed but libibverbs
> will just have the new version implemented and hide the change :P

But again we don't need to support all of the available hardware.
For example, we are testing net stack from external side using tun.
tun is a very simple, virtual abstraction of a network card. It allows
us to test all of generic net stack starting from L2 without messing
with any real drivers and their differences entirely. I had impression
that we are talking about something similar here too. Or not?

Also I am a bit missing context about rdma<->9p interface. Do we need
to setup all these ring buffers to satisfy the parts that 9p needs? Is
it that 9p actually reads data directly from these ring buffers? Or
there is some higher-level rdma interface that 9p uses?

^ permalink raw reply

* Re: [PATCH] qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface
From: David Miller @ 2018-10-11  5:57 UTC (permalink / raw)
  To: gciofono; +Cc: netdev
In-Reply-To: <20181010180553.3986-1-gciofono@gmail.com>

From: Giacinto Cifelli <gciofono@gmail.com>
Date: Wed, 10 Oct 2018 20:05:53 +0200

> Added support for Gemalto's Cinterion ALASxx WWAN interfaces
> by adding QMI_FIXED_INTF with Cinterion's VID and PID.
> 
> Signed-off-by: Giacinto Cifelli <gciofono@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [net 1/1] tipc: queue socket protocol error messages into socket receive buffer
From: David Miller @ 2018-10-11  5:57 UTC (permalink / raw)
  To: jon.maloy
  Cc: netdev, gordan.mihaljevic, tung.q.nguyen, hoang.h.le, canh.d.luu,
	ying.xue, tipc-discussion
In-Reply-To: <1539186623-344-1-git-send-email-jon.maloy@ericsson.com>

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Wed, 10 Oct 2018 17:50:23 +0200

> From: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
> 
> In tipc_sk_filter_rcv(), when we detect protocol messages with error we
> call tipc_sk_conn_proto_rcv() and let it reset the connection and notify
> the socket by calling sk->sk_state_change().
> 
> However, tipc_sk_filter_rcv() may have been called from the function
> tipc_backlog_rcv(), in which case the socket lock is held and the socket
> already awake. This means that the sk_state_change() call is ignored and
> the error notification lost. Now the receive queue will remain empty and
> the socket sleeps forever.
> 
> In this commit, we convert the protocol message into a connection abort
> message and enqueue it into the socket's receive queue. By this addition
> to the above state change we cover all conditions.
> 
> Acked-by: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>

Applied.

^ permalink raw reply

* Re: [net 1/1] tipc: set link tolerance correctly in broadcast link
From: David Miller @ 2018-10-11  5:56 UTC (permalink / raw)
  To: jon.maloy
  Cc: netdev, gordan.mihaljevic, tung.q.nguyen, hoang.h.le, canh.d.luu,
	ying.xue, tipc-discussion
In-Reply-To: <1539185641-32711-1-git-send-email-jon.maloy@ericsson.com>

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Wed, 10 Oct 2018 17:34:01 +0200

> In the patch referred to below we added link tolerance as an additional
> criteria for declaring broadcast transmission "stale" and resetting the
> affected links.
> 
> However, the 'tolerance' field of the broadcast link is never set, and
> remains at zero. This renders the whole commit without the intended
> improving effect, but luckily also with no negative effect.
> 
> In this commit we add the missing initialization.
> 
> Fixes: a4dc70d46cf1 ("tipc: extend link reset criteria for stale packet
> retransmission")
> 
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net 0/2] net: dsa: bcm_sf2: Couple of fixes
From: David Miller @ 2018-10-11  5:53 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, andrew, vivien.didelot
In-Reply-To: <20181009234858.23920-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue,  9 Oct 2018 16:48:56 -0700

> Here are two fixes for the bcm_sf2 driver that were found during
> testing unbind and analysing another issue during system
> suspend/resume.

Series applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: sched: avoid writing on noop_qdisc
From: David Miller @ 2018-10-11  5:49 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, john.fastabend, eric.dumazet
In-Reply-To: <20181009222050.175348-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Tue,  9 Oct 2018 15:20:50 -0700

> While noop_qdisc.gso_skb and noop_qdisc.skb_bad_txq are not used
> in other places, it seems not correct to overwrite their fields
> in dev_init_scheduler_queue().
> 
> noop_qdisc is essentially a shared and read-only object, even if
> it is not marked as const because of some implementation detail.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net/ipv6: Add knob to skip DELROUTE message on device down
From: David Miller @ 2018-10-11  5:48 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, roopa, dsahern
In-Reply-To: <20181009212751.17695-1-dsahern@kernel.org>

From: David Ahern <dsahern@kernel.org>
Date: Tue,  9 Oct 2018 14:27:51 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE
> notifications when a device is taken down (admin down) or deleted. IPv4
> does not generate a message for routes evicted by the down or delete;
> IPv6 does. A NOS at scale really needs to avoid these messages and have
> IPv4 and IPv6 behave similarly, relying on userspace to handle link
> notifications and evict the routes.
> 
> At this point existing user behavior needs to be preserved. Since
> notifications are a global action (not per app) the only way to preserve
> existing behavior and allow the messages to be skipped is to add a new
> sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to
> disable the notificatioons.
> 
> IPv6 route code already supports the option to skip the message (it is
> used for multipath routes for example). Besides the new sysctl we need
> to pass the skip_notify setting through the generic fib6_clean and
> fib6_walk functions to fib6_clean_node and to set skip_notify on calls
> to __ip_del_rt for the addrconf_ifdown path.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

This doesn't apply cleanly, and you seem to be saying that the anycast
and addrconf bits should be removed anyways.

^ permalink raw reply

* Re: [PATCH net-next] net/mpls: Implement handler for strict data checking on dumps
From: David Miller @ 2018-10-11  5:46 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, arnd, dsahern
In-Reply-To: <20181009181043.25350-1-dsahern@kernel.org>

From: David Ahern <dsahern@kernel.org>
Date: Tue,  9 Oct 2018 11:10:43 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> Without CONFIG_INET enabled compiles fail with:
> 
> net/mpls/af_mpls.o: In function `mpls_dump_routes':
> af_mpls.c:(.text+0xed0): undefined reference to `ip_valid_fib_dump_req'
> 
> The preference is for MPLS to use the same handler as ipv4 and ipv6
> to allow consistency when doing a dump for AF_UNSPEC which walks
> all address families invoking the route dump handler. If INET is
> disabled then fallback to an MPLS version which can be tighter on
> the data checks.
> 
> Fixes: e8ba330ac0c5 ("rtnetlink: Update fib dumps for strict data checking")
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Reported-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net v2 0/2] net: ipv4: fixes for PMTU when link MTU changes
From: David Miller @ 2018-10-11  5:45 UTC (permalink / raw)
  To: sd; +Cc: netdev, dsahern, sbrivio
In-Reply-To: <cover.1539073548.git.sd@queasysnail.net>

From: Sabrina Dubroca <sd@queasysnail.net>
Date: Tue,  9 Oct 2018 17:48:13 +0200

> The first patch adapts the changes that commit e9fa1495d738 ("ipv6:
> Reflect MTU changes on PMTU of exceptions for MTU-less routes") did in
> IPv6 to IPv4: lower PMTU when the first hop's MTU drops below it, and
> raise PMTU when the first hop was limiting PMTU discovery and its MTU
> is increased.
> 
> The second patch fixes bugs introduced in commit d52e5a7e7ca4 ("ipv4:
> lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu") that
> only appear once the first patch is applied.
> 
> Selftests for these cases were introduced in net-next commit
> e44e428f59e4 ("selftests: pmtu: add basic IPv4 and IPv6 PMTU tests")
> 
> v2: add cover letter, and fix a few small things in patch 1

Series applied and queued up for -stable.

^ permalink raw reply

* Re: BUG: corrupted list in p9_read_work
From: Dominique Martinet @ 2018-10-11 13:10 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML,
	Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs,
	v9fs-developer
In-Reply-To: <CACT4Y+bJvxXeJxbAKCPTm0X608t302Kcq+F7Ccot3_-wx7sdog@mail.gmail.com>

Dmitry Vyukov wrote on Thu, Oct 11, 2018:
> > That's still the tricky part, I'm afraid... Making a separate server
> > would have been easy because I could have reused some of my junk for the
> > actual connection handling (some rdma helper library I wrote ages
> > ago[1]), but if you're going to just embed C code you'll probably want
> > something lower level? I've never seen syzkaller use any library call
> > but I'm not even sure I would know how to create a qp without
> > libibverbs, would standard stuff be OK ?
> 
> Raw syscalls preferably.
> What does 'rxe_cfg start ens3' do on syscall level? Some netlink?

modprobe rdma_rxe (and a bunch of other rdma modules before that) then
writes the interface name in /sys/module/rdma_rxe/parameters/add
apparently; then checks it worked.
this part could be done in C directly without too much trouble, but as
long as the proper kernel configuration/modules are available

> Any libraries and utilities are hell pain in linux world. Will it work
> in Android userspace? gVisor? Who will explain all syzkaller users
> where they get this for their who-knows-what distro, which is 10 years
> old because of corp policies, and debug how their version of the
> library has a slightly incompatible version?
> For example, after figuring out that rxe_cfg actually comes from
> rdma-core (which is a separate delight on linux), my debian
> destribution failed to install it because of some conflicts around
> /etc/modprobe.d/mlx4.conf, and my ubuntu distro does not know about
> such package. And we've just started :)

The rdma ecosystem is a pain, I'll easily agree with that...

> Syscalls tend to be simpler and more reliable. If it gives ENOSUPP,
> ok, that's it. If it works, great, we can use it.

I'll have to look into it a bit more; libibverbs abstracts a lot of
stuff into per-nic userspace drivers (the files I cited in a previous
mail) and basically with the mellanox cards I'm familiar with the whole
user session looks like this:
 * common libibverbs/rdmacm code opens /dev/infiniband/rdma_cm and
/dev/infiniband/uverbs0 (plus a bunch of files to figure out abi
version, what user driver to load etc) 
 * it and the userspace driver issue "commands" over these two files' fd
to setup the connection ; some commands are standard but some are
specific to the interface and defined in the driver.
There are many facets to a connection in RDMA: a protection domain used
to register memory with the nic, a queue pair that is the actual tx/rx
connection, optionally a completion channel that will be another fd to
listen on for events that tell you something happened and finally some
memory regions to directly communicate with the nic from userspace
depending on the specific driver.
 * then there's the actual usage, more commands through the uverbs0 char
device to register the memory you'll use, and once that's done it's
entierly up to the driver - for example the mellanox lib can do
everything in userspace playing with the memory regions it registered,
but I'd wager the rxe driver does more calls through the uverbs0 fd...

Honestly I'm not keen on reimplementing all of this; the interface
itself pretty much depends on your version of the kernel (there is a
common ABI defined, but as far as specific nics are concerned if your
kernel module doesn't match the user library version you can get some
nasty surprises), and it's far from the black or white of a good ol'
ENOSUPP error.


I'll look if I can figure out if there is a common subset of verbs
commands that are standard and sufficient to setup a listening
connection and exchange data that should be supported for all devices
and would let us reimplement just that, but while I hear your point
about android and ten years in the future I think it's more likely than
ten years in the future the verb abi will have changed but libibverbs
will just have the new version implemented and hide the change :P

-- 
Dominique

^ permalink raw reply

* Re: [PATCH V1 net-next 00/12] Improving performance and reducing latencies, by using latest capabilities exposed in ENA device
From: David Miller @ 2018-10-11  5:41 UTC (permalink / raw)
  To: akiyano
  Cc: netdev, dwmw, zorik, matua, saeedb, msw, aliguori, nafea, gtzalik,
	netanel, alisaidi
In-Reply-To: <1539110709-31954-1-git-send-email-akiyano@amazon.com>

From: <akiyano@amazon.com>
Date: Tue, 9 Oct 2018 21:44:57 +0300

> This patchset introduces the following:
> 1. A new placement policy of Tx headers and descriptors, which takes
> advantage of an option to place headers + descriptors in device memory
> space. This is sometimes referred to as LLQ - low latency queue.
> The patch set defines the admin capability, maps the device memory as
> write-combined, and adds a mode in transmit datapath to do header +
> descriptor placement on the device.
> 2. Support for RX checksum offloading
> 3. Miscelaneous small improvements and code cleanups

This doesn't apply cleanly to net-next, please respin.

^ permalink raw reply

* Re: [PATCH] virtio_net: enable tx after resuming from suspend
From: Jason Wang @ 2018-10-11 13:06 UTC (permalink / raw)
  To: ake
  Cc: netdev, virtualization, David S. Miller, linux-kernel,
	Michael S. Tsirkin
In-Reply-To: <7e87b140-79ae-c79e-40ed-dc76b38eeae4@igel.co.jp>



On 2018年10月11日 18:22, ake wrote:
>
> On 2018年10月11日 18:44, Jason Wang wrote:
>>
>> On 2018年10月11日 15:51, Ake Koomsin wrote:
>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>> disabled the virtio tx before going to suspend to avoid a use after free.
>>> However, after resuming, it causes the virtio_net device to lose its
>>> network connectivity.
>>>
>>> To solve the issue, we need to enable tx after resuming.
>>>
>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>> reset")
>>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>>> ---
>>>    drivers/net/virtio_net.c | 1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index dab504ec5e50..3453d80f5f81 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>> virtio_device *vdev)
>>>        }
>>>          netif_device_attach(vi->dev);
>>> +    netif_start_queue(vi->dev);
>> I believe this is duplicated with netif_tx_wake_all_queues() in
>> netif_device_attach() above?
> Thank you for your review.
>
> If both netif_tx_wake_all_queues() and netif_start_queue() result in
> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
> conditions in netif_device_attach() is not satisfied?

Yes, maybe. One case I can see now is when the device is down, in this 
case netif_device_attach() won't try to wakeup the queue.

>   Without
> netif_start_queue(), the virtio_net device does not resume properly
> after waking up.

How do you trigger the issue? Just do suspend/resume?

>
> Is it better to report this as a bug first?

Nope, you're very welcome to post patch directly.

> If I am to do more
> investigation, what areas should I look into?

As you've figured out, you can start with why netif_tx_wake_all_queues() 
were not executed?

(Btw, does the issue disappear if you move netif_tx_disable() under the 
check of netif_running() in virtnet_freeze_down()?)

Thanks

>
> Best Regards
> Ake Koomsin
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* [PATCH] net: cdc_ncm: use tasklet_init() for tasklet_struct init
From: Ben Dooks @ 2018-10-11 13:03 UTC (permalink / raw)
  To: avem, oliver; +Cc: linux-usb, netdev, linux-kernel, linux-kernel, Ben Dooks

The tasklet initialisation would be better done by tasklet_init()
instead of assuming all the fields are in an ok state by default.

This does not fix any actual know bug.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
---
 drivers/net/usb/cdc_ncm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 0d722b326e1b..863f3548a439 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -784,8 +784,7 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_
 
 	hrtimer_init(&ctx->tx_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	ctx->tx_timer.function = &cdc_ncm_tx_timer_cb;
-	ctx->bh.data = (unsigned long)dev;
-	ctx->bh.func = cdc_ncm_txpath_bh;
+	tasklet_init(&ctx->bh, cdc_ncm_txpath_bh, (unsigned long)dev);
 	atomic_set(&ctx->stop, 0);
 	spin_lock_init(&ctx->mtx);
 
-- 
2.19.1

^ permalink raw reply related

* RE: [PATCH net-next v5] net/ncsi: Extend NC-SI Netlink interface to allow user space to send NC-SI command
From: Justin.Lee1 @ 2018-10-11  5:37 UTC (permalink / raw)
  To: sam, joel; +Cc: linux-aspeed, netdev, openbmc, amithash, christian, vijaykhemka
In-Reply-To: <e26c7aa664a2dad4f5bcf4efaa1b3eb655548b01.camel@mendozajonas.com>


> On Wed, 2018-10-10 at 18:11 +0000, Justin.Lee1@Dell.com wrote:
> <snip>
> > +
> > +	len = nla_len(info->attrs[NCSI_ATTR_DATA]);
> > +	if (len < sizeof(struct ncsi_pkt_hdr)) {
> > +		netdev_info(ndp->ndev.dev, "NCSI: no command to send %u\n",
> > +			    package_id);
> > +		ret = -EINVAL;
> > +		goto out_netlink;
> > +	} else {
> > +		data = (unsigned char *)nla_data(info->attrs[NCSI_ATTR_DATA]);
> > +	}
> 
> I only just noticed this, the call to nla_len() can cause a null-dereference if
> the NCSI_ATTR_DATA attribute isn't present; we need to make sure it exists
> before accessing it in info->attrs.
> 
> eg:
> 
> root@ozrom2-bmc:~# ./ncsi-netlink -l 2 -p 0 -c 0 --cmd
> [   81.399837] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [   81.409092] pgd = ddaa9fa6
> [   81.413084] [00000000] *pgd=9702c831, *pte=00000000, *ppte=00000000
> [   81.420729] Internal error: Oops: 17 [#1] ARM
> [   81.426447] CPU: 0 PID: 1028 Comm: ncsi-netlink Not tainted 4.18.8-sammj-00144-gbc129f31bfa5 #12
> ...
> [   81.874434] Kernel panic - not syncing: Fatal exception
> 
> Cheers,
> Sam

Good catch! I will address this and generate the new patch.

Thanks,
Justin

^ permalink raw reply

* Re: [PATCH net-next v2] net: tun: remove useless codes of tun_automq_select_queue
From: David Miller @ 2018-10-11  5:35 UTC (permalink / raw)
  To: wangli39; +Cc: netdev
In-Reply-To: <20181009023204.62836-1-wangli39@baidu.com>

From: Wang Li <wangli39@baidu.com>
Date: Tue,  9 Oct 2018 10:32:04 +0800

> Because the function __skb_get_hash_symmetric always returns non-zero.
> 
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> Signed-off-by: Wang Li <wangli39@baidu.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 0/3] nfp: flower: speed up stats update loop
From: David Miller @ 2018-10-11  5:33 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: netdev, oss-drivers
In-Reply-To: <20181009015736.30268-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Mon,  8 Oct 2018 18:57:33 -0700

> This set from Pieter improves performance of processing FW stats
> update notifications.  The FW seems to send those at relatively
> high rate (roughly ten per second per flow), therefore if we want
> to approach the million flows mark we have to be very careful
> about our data structures.
> 
> We tried rhashtable for stat updates, but according to our experiments
> rhashtable lookup on a u32 takes roughly 60ns on an Xeon E5-2670 v3.
> Which translate to a hard limit of 16M lookups per second on this CPU,
> and, according to perf record jhash and memcmp account for 60% of CPU
> usage on the core handling the updates.
> 
> Given that our statistic IDs are already array indices, and considering
> each statistic is only 24B in size, we decided to forego the use
> of hashtables and use a directly indexed array.  The CPU savings are
> considerable.
> 
> With the recent improvements in TC core and with our own bottlenecks
> out of the way Pieter removes the artificial limit of 128 flows, and
> allows the driver to install as many flows as FW supports.

Series applied.

^ permalink raw reply

* Re: [PATCH net-next] tcp: refactor DCTCP ECN ACK handling
From: David Miller @ 2018-10-11  5:26 UTC (permalink / raw)
  To: ycheng; +Cc: netdev, edumazet, ncardwell, ysseung
In-Reply-To: <20181008223220.93230-1-ycheng@google.com>

From: Yuchung Cheng <ycheng@google.com>
Date: Mon,  8 Oct 2018 15:32:20 -0700

> DCTCP has two parts - a new ECN signalling mechanism and the response
> function to it. The first part can be used by other congestion
> control for DCTCP-ECN deployed networks. This patch moves that part
> into a separate tcp_dctcp.h to be used by other congestion control
> module (like how Yeah uses Vegas algorithmas). For example, BBR is
> experimenting such ECN signal currently
> https://tinyurl.com/ietf-102-iccrg-bbr2
> 
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Yousuk Seung <ysseung@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net/ipv6: Make ipv6_route_table_template static
From: David Miller @ 2018-10-11  5:26 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, dsahern
In-Reply-To: <20181008210634.19645-1-dsahern@kernel.org>

From: David Ahern <dsahern@kernel.org>
Date: Mon,  8 Oct 2018 14:06:34 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> ipv6_route_table_template is exported but there are no users outside
> of route.c. Make it static.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox