Netdev List
 help / color / mirror / Atom feed
* [patch net-next 0/2] net: sched: Fix RED qdisc offload flag
From: Nogah Frankel @ 2017-12-25  8:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, jhs, xiyou.wangcong, jiri, mlxsw, Nogah Frankel

Replacing the RED offload flag (TC_RED_OFFLOADED) with the generic one
(TCQ_F_OFFLOADED) deleted some of the logic behind it. This patchset fixes
this problem.

Nogah Frankel (2):
  net_sch: red: Fix the new offload indication
  net: sched: Move offload check till after dump call

 net/sched/sch_api.c |  5 ++---
 net/sched/sch_red.c | 26 ++++++++++++++------------
 2 files changed, 16 insertions(+), 15 deletions(-)

-- 
2.4.3

^ permalink raw reply

* [patch net-next 2/2] net: sched: Move offload check till after dump call
From: Nogah Frankel @ 2017-12-25  8:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, jhs, xiyou.wangcong, jiri, mlxsw, Nogah Frankel
In-Reply-To: <1514191902-53752-1-git-send-email-nogahf@mellanox.com>

Move the check of the offload state to after the qdisc dump action was
called, so the qdisc could update it if it was changed.

Fixes: 7a4fa29106d9 ("net: sched: Add TCA_HW_OFFLOAD")
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
---
 net/sched/sch_api.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 3a3a1da..758f132 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -807,11 +807,10 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 	tcm->tcm_info = refcount_read(&q->refcnt);
 	if (nla_put_string(skb, TCA_KIND, q->ops->id))
 		goto nla_put_failure;
-	if (nla_put_u8(skb, TCA_HW_OFFLOAD, !!(q->flags & TCQ_F_OFFLOADED)))
-		goto nla_put_failure;
 	if (q->ops->dump && q->ops->dump(q, skb) < 0)
 		goto nla_put_failure;
-
+	if (nla_put_u8(skb, TCA_HW_OFFLOAD, !!(q->flags & TCQ_F_OFFLOADED)))
+		goto nla_put_failure;
 	qlen = qdisc_qlen_sum(q);
 
 	stab = rtnl_dereference(q->stab);
-- 
2.4.3

^ permalink raw reply related

* [patch net-next 1/2] net_sch: red: Fix the new offload indication
From: Nogah Frankel @ 2017-12-25  8:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, jhs, xiyou.wangcong, jiri, mlxsw, Nogah Frankel
In-Reply-To: <1514191902-53752-1-git-send-email-nogahf@mellanox.com>

Update the offload flag, TCQ_F_OFFLOADED, in each dump call (and ignore
the offloading function return value in relation to this flag).
This is done because a qdisc is being initialized, and therefore offloaded
before being grafted. Since the ability of the driver to offload the qdisc
depends on its location, a qdisc can be offloaded and un-offloaded by graft
calls, that doesn't effect the qdisc itself.

Fixes: 428a68af3a7c ("net: sched: Move to new offload indication in RED"
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
---
 net/sched/sch_red.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index ec0bd36..a392eaa 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -157,7 +157,6 @@ static int red_offload(struct Qdisc *sch, bool enable)
 		.handle = sch->handle,
 		.parent = sch->parent,
 	};
-	int err;
 
 	if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc)
 		return -EOPNOTSUPP;
@@ -172,14 +171,7 @@ static int red_offload(struct Qdisc *sch, bool enable)
 		opt.command = TC_RED_DESTROY;
 	}
 
-	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED, &opt);
-
-	if (!err && enable)
-		sch->flags |= TCQ_F_OFFLOADED;
-	else
-		sch->flags &= ~TCQ_F_OFFLOADED;
-
-	return err;
+	return dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED, &opt);
 }
 
 static void red_destroy(struct Qdisc *sch)
@@ -297,12 +289,22 @@ static int red_dump_offload_stats(struct Qdisc *sch, struct tc_red_qopt *opt)
 			.stats.qstats = &sch->qstats,
 		},
 	};
+	int err;
+
+	sch->flags &= ~TCQ_F_OFFLOADED;
 
-	if (!(sch->flags & TCQ_F_OFFLOADED))
+	if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc)
+		return 0;
+
+	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED,
+					    &hw_stats);
+	if (err == -EOPNOTSUPP)
 		return 0;
 
-	return dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED,
-					     &hw_stats);
+	if (!err)
+		sch->flags |= TCQ_F_OFFLOADED;
+
+	return err;
 }
 
 static int red_dump(struct Qdisc *sch, struct sk_buff *skb)
-- 
2.4.3

^ permalink raw reply related

* [patch iproute2 v3 4/4] man: Add -bs option to tc manpage
From: Chris Mi @ 2017-12-25  8:46 UTC (permalink / raw)
  To: netdev; +Cc: gerlitz.or, stephen, dsahern
In-Reply-To: <20171225084658.24076-1-chrism@mellanox.com>

Signed-off-by: Chris Mi <chrism@mellanox.com>
---
 man/man8/tc.8 | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/man/man8/tc.8 b/man/man8/tc.8
index ff071b33..de137e16 100644
--- a/man/man8/tc.8
+++ b/man/man8/tc.8
@@ -601,6 +601,11 @@ must exist already.
 read commands from provided file or standard input and invoke them.
 First failure will cause termination of tc.
 
+.TP
+.BR "\-bs", " \-bs size", " \-batchsize", " \-batchsize size"
+How many commands are accumulated before sending to kernel.
+By default, it is 1. It only takes effect in batch mode.
+
 .TP
 .BR "\-force"
 don't terminate tc on errors in batch mode.
-- 
2.14.3

^ permalink raw reply related

* [patch iproute2 v3 2/4] utils: Add a function setcmdlinetotal
From: Chris Mi @ 2017-12-25  8:46 UTC (permalink / raw)
  To: netdev; +Cc: gerlitz.or, stephen, dsahern
In-Reply-To: <20171225084658.24076-1-chrism@mellanox.com>

This function calculates how many commands a batch file has and
set it to global variable cmdlinetotal.

Signed-off-by: Chris Mi <chrism@mellanox.com>
---
 include/utils.h |  4 ++++
 lib/utils.c     | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/utils.h b/include/utils.h
index d3895d56..113a8c31 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -235,6 +235,10 @@ void print_nlmsg_timestamp(FILE *fp, const struct nlmsghdr *n);
 
 extern int cmdlineno;
 ssize_t getcmdline(char **line, size_t *len, FILE *in);
+
+extern int cmdlinetotal;
+void setcmdlinetotal(const char *name);
+
 int makeargs(char *line, char *argv[], int maxargs);
 int inet_get_addr(const char *src, __u32 *dst, struct in6_addr *dst6);
 
diff --git a/lib/utils.c b/lib/utils.c
index 7ced8c06..53ca389f 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -1202,6 +1202,26 @@ ssize_t getcmdline(char **linep, size_t *lenp, FILE *in)
 	return cc;
 }
 
+int cmdlinetotal;
+
+void setcmdlinetotal(const char *name)
+{
+	char *line = NULL;
+	size_t len = 0;
+
+	if (name && strcmp(name, "-") != 0) {
+		if (freopen(name, "r", stdin) == NULL) {
+			fprintf(stderr, "Cannot open file \"%s\" for reading: %s\n",
+				name, strerror(errno));
+			return;
+		}
+	}
+
+	cmdlinetotal = 0;
+	while (getcmdline(&line, &len, stdin) != -1)
+		cmdlinetotal++;
+}
+
 /* split command line into argument vector */
 int makeargs(char *line, char *argv[], int maxargs)
 {
-- 
2.14.3

^ permalink raw reply related

* [patch iproute2 v3 1/4] lib/libnetlink: Add a function rtnl_talk_msg
From: Chris Mi @ 2017-12-25  8:46 UTC (permalink / raw)
  To: netdev; +Cc: gerlitz.or, stephen, dsahern
In-Reply-To: <20171225084658.24076-1-chrism@mellanox.com>

rtnl_talk can only send a single message to kernel. Add a new function
rtnl_talk_msg that can send multiple messages to kernel.

Signed-off-by: Chris Mi <chrism@mellanox.com>
---
 include/libnetlink.h |  3 ++
 lib/libnetlink.c     | 92 ++++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 74 insertions(+), 21 deletions(-)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index a4d83b9e..e95fad75 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -96,6 +96,9 @@ int rtnl_dump_filter_nc(struct rtnl_handle *rth,
 int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
 	      struct nlmsghdr **answer)
 	__attribute__((warn_unused_result));
+int rtnl_talk_msg(struct rtnl_handle *rtnl, struct msghdr *m,
+	      struct nlmsghdr **answer)
+	__attribute__((warn_unused_result));
 int rtnl_talk_extack(struct rtnl_handle *rtnl, struct nlmsghdr *n,
 	      struct nlmsghdr **answer, nl_ext_ack_fn_t errfn)
 	__attribute__((warn_unused_result));
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 00e6ce0c..f5f675cf 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -581,36 +581,21 @@ static void rtnl_talk_error(struct nlmsghdr *h, struct nlmsgerr *err,
 		strerror(-err->error));
 }
 
-static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
-		       struct nlmsghdr **answer,
-		       bool show_rtnl_err, nl_ext_ack_fn_t errfn)
+static int __rtnl_check_ack(struct rtnl_handle *rtnl, struct nlmsghdr **answer,
+		       bool show_rtnl_err, nl_ext_ack_fn_t errfn,
+		       unsigned int seq)
 {
 	int status;
-	unsigned int seq;
-	struct nlmsghdr *h;
+	char *buf;
 	struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
-	struct iovec iov = {
-		.iov_base = n,
-		.iov_len = n->nlmsg_len
-	};
+	struct nlmsghdr *h;
+	struct iovec iov;
 	struct msghdr msg = {
 		.msg_name = &nladdr,
 		.msg_namelen = sizeof(nladdr),
 		.msg_iov = &iov,
 		.msg_iovlen = 1,
 	};
-	char *buf;
-
-	n->nlmsg_seq = seq = ++rtnl->seq;
-
-	if (answer == NULL)
-		n->nlmsg_flags |= NLM_F_ACK;
-
-	status = sendmsg(rtnl->fd, &msg, 0);
-	if (status < 0) {
-		perror("Cannot talk to rtnetlink");
-		return -1;
-	}
 
 	while (1) {
 		status = rtnl_recvmsg(rtnl->fd, &msg, &buf);
@@ -698,12 +683,77 @@ static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
 	}
 }
 
+static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
+		       struct nlmsghdr **answer,
+		       bool show_rtnl_err, nl_ext_ack_fn_t errfn)
+{
+	unsigned int seq;
+	int status;
+	struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
+	struct iovec iov = {
+		.iov_base = n,
+		.iov_len = n->nlmsg_len
+	};
+	struct msghdr msg = {
+		.msg_name = &nladdr,
+		.msg_namelen = sizeof(nladdr),
+		.msg_iov = &iov,
+		.msg_iovlen = 1,
+	};
+
+	n->nlmsg_seq = seq = ++rtnl->seq;
+
+	if (answer == NULL)
+		n->nlmsg_flags |= NLM_F_ACK;
+
+	status = sendmsg(rtnl->fd, &msg, 0);
+	if (status < 0) {
+		perror("Cannot talk to rtnetlink");
+		return -1;
+	}
+
+	return __rtnl_check_ack(rtnl, answer, show_rtnl_err, errfn, seq);
+}
+
+static int __rtnl_talk_msg(struct rtnl_handle *rtnl, struct msghdr *m,
+		       struct nlmsghdr **answer,
+		       bool show_rtnl_err, nl_ext_ack_fn_t errfn)
+{
+	int iovlen = m->msg_iovlen;
+	unsigned int seq = 0;
+	struct nlmsghdr *n;
+	struct iovec *v;
+	int i, status;
+
+	for (i = 0; i < iovlen; i++) {
+		v = &m->msg_iov[i];
+		n = v->iov_base;
+		n->nlmsg_seq = seq = ++rtnl->seq;
+		if (answer == NULL)
+			n->nlmsg_flags |= NLM_F_ACK;
+	}
+
+	status = sendmsg(rtnl->fd, m, 0);
+	if (status < 0) {
+		perror("Cannot talk to rtnetlink");
+		return -1;
+	}
+
+	return __rtnl_check_ack(rtnl, answer, show_rtnl_err, errfn, seq);
+}
+
 int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
 	      struct nlmsghdr **answer)
 {
 	return __rtnl_talk(rtnl, n, answer, true, NULL);
 }
 
+int rtnl_talk_msg(struct rtnl_handle *rtnl, struct msghdr *m,
+	      struct nlmsghdr **answer)
+{
+	return __rtnl_talk_msg(rtnl, m, answer, true, NULL);
+}
+
 int rtnl_talk_extack(struct rtnl_handle *rtnl, struct nlmsghdr *n,
 		     struct nlmsghdr **answer,
 		     nl_ext_ack_fn_t errfn)
-- 
2.14.3

^ permalink raw reply related

* [patch iproute2 v3 3/4] tc: Add -bs option to batch mode
From: Chris Mi @ 2017-12-25  8:46 UTC (permalink / raw)
  To: netdev; +Cc: gerlitz.or, stephen, dsahern
In-Reply-To: <20171225084658.24076-1-chrism@mellanox.com>

Signed-off-by: Chris Mi <chrism@mellanox.com>
---
 tc/m_action.c  |  91 +++++++++++++++++++++++++++++++++----------
 tc/tc.c        |  47 ++++++++++++++++++----
 tc/tc_common.h |   8 +++-
 tc/tc_filter.c | 121 +++++++++++++++++++++++++++++++++++++++++----------------
 4 files changed, 204 insertions(+), 63 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index fc422364..c4c3b862 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -23,6 +23,7 @@
 #include <arpa/inet.h>
 #include <string.h>
 #include <dlfcn.h>
+#include <errno.h>
 
 #include "utils.h"
 #include "tc_common.h"
@@ -546,40 +547,88 @@ bad_val:
 	return ret;
 }
 
+typedef struct {
+	struct nlmsghdr		n;
+	struct tcamsg		t;
+	char			buf[MAX_MSG];
+} tc_action_req;
+
+static tc_action_req *action_reqs;
+static struct iovec msg_iov[MSG_IOV_MAX];
+
+void free_action_reqs(void)
+{
+	free(action_reqs);
+}
+
+static tc_action_req *get_action_req(int batch_size, int index)
+{
+	tc_action_req *req;
+
+	if (action_reqs == NULL) {
+		action_reqs = malloc(batch_size * sizeof (tc_action_req));
+		if (action_reqs == NULL)
+			return NULL;
+	}
+	req = &action_reqs[index];
+	memset(req, 0, sizeof (*req));
+
+	return req;
+}
+
 static int tc_action_modify(int cmd, unsigned int flags,
-			    int *argc_p, char ***argv_p)
+			    int *argc_p, char ***argv_p,
+			    int batch_size, int index, bool send)
 {
 	int argc = *argc_p;
 	char **argv = *argv_p;
 	int ret = 0;
-	struct {
-		struct nlmsghdr         n;
-		struct tcamsg           t;
-		char                    buf[MAX_MSG];
-	} req = {
-		.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg)),
-		.n.nlmsg_flags = NLM_F_REQUEST | flags,
-		.n.nlmsg_type = cmd,
-		.t.tca_family = AF_UNSPEC,
+	tc_action_req *req;
+	struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
+	struct iovec *iov = &msg_iov[index];
+
+	req = get_action_req(batch_size, index);
+	if (req == NULL) {
+		fprintf(stderr, "get_action_req error: not enough buffer\n");
+		return -ENOMEM;
+	}
+
+	req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg));
+	req->n.nlmsg_flags = NLM_F_REQUEST | flags;
+	req->n.nlmsg_type = cmd;
+	req->t.tca_family = AF_UNSPEC;
+	struct rtattr *tail = NLMSG_TAIL(&req->n);
+
+	struct msghdr msg = {
+		.msg_name = &nladdr,
+		.msg_namelen = sizeof(nladdr),
+		.msg_iov = msg_iov,
+		.msg_iovlen = index + 1,
 	};
-	struct rtattr *tail = NLMSG_TAIL(&req.n);
 
 	argc -= 1;
 	argv += 1;
-	if (parse_action(&argc, &argv, TCA_ACT_TAB, &req.n)) {
+	if (parse_action(&argc, &argv, TCA_ACT_TAB, &req->n)) {
 		fprintf(stderr, "Illegal \"action\"\n");
 		return -1;
 	}
-	tail->rta_len = (void *) NLMSG_TAIL(&req.n) - (void *) tail;
+	tail->rta_len = (void *) NLMSG_TAIL(&req->n) - (void *) tail;
+
+	*argc_p = argc;
+	*argv_p = argv;
+
+	iov->iov_base = &req->n;
+	iov->iov_len = req->n.nlmsg_len;
+
+	if (!send)
+		return 0;
 
-	if (rtnl_talk(&rth, &req.n, NULL) < 0) {
+	ret = rtnl_talk_msg(&rth, &msg, NULL);
+	if (ret < 0) {
 		fprintf(stderr, "We have an error talking to the kernel\n");
 		ret = -1;
 	}
 
-	*argc_p = argc;
-	*argv_p = argv;
-
 	return ret;
 }
 
@@ -679,7 +728,7 @@ bad_val:
 	return ret;
 }
 
-int do_action(int argc, char **argv)
+int do_action(int argc, char **argv, int batch_size, int index, bool send)
 {
 
 	int ret = 0;
@@ -689,12 +738,14 @@ int do_action(int argc, char **argv)
 		if (matches(*argv, "add") == 0) {
 			ret =  tc_action_modify(RTM_NEWACTION,
 						NLM_F_EXCL | NLM_F_CREATE,
-						&argc, &argv);
+						&argc, &argv, batch_size,
+						index, send);
 		} else if (matches(*argv, "change") == 0 ||
 			  matches(*argv, "replace") == 0) {
 			ret = tc_action_modify(RTM_NEWACTION,
 					       NLM_F_CREATE | NLM_F_REPLACE,
-					       &argc, &argv);
+					       &argc, &argv, batch_size,
+					       index, send);
 		} else if (matches(*argv, "delete") == 0) {
 			argc -= 1;
 			argv += 1;
diff --git a/tc/tc.c b/tc/tc.c
index ad9f07e9..7ea2fc89 100644
--- a/tc/tc.c
+++ b/tc/tc.c
@@ -189,20 +189,20 @@ static void usage(void)
 	fprintf(stderr, "Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }\n"
 			"       tc [-force] -batch filename\n"
 			"where  OBJECT := { qdisc | class | filter | action | monitor | exec }\n"
-	                "       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] | -n[etns] name |\n"
+	                "       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] | -bs | -batchsize [size] | -n[etns] name |\n"
 			"                    -nm | -nam[es] | { -cf | -conf } path } | -j[son]\n");
 }
 
-static int do_cmd(int argc, char **argv)
+static int do_cmd(int argc, char **argv, int batch_size, int index, bool send)
 {
 	if (matches(*argv, "qdisc") == 0)
 		return do_qdisc(argc-1, argv+1);
 	if (matches(*argv, "class") == 0)
 		return do_class(argc-1, argv+1);
 	if (matches(*argv, "filter") == 0)
-		return do_filter(argc-1, argv+1);
+		return do_filter(argc-1, argv+1, batch_size, index, send);
 	if (matches(*argv, "actions") == 0)
-		return do_action(argc-1, argv+1);
+		return do_action(argc-1, argv+1, batch_size, index, send);
 	if (matches(*argv, "monitor") == 0)
 		return do_tcmonitor(argc-1, argv+1);
 	if (matches(*argv, "exec") == 0)
@@ -217,11 +217,16 @@ static int do_cmd(int argc, char **argv)
 	return -1;
 }
 
-static int batch(const char *name)
+static int batch(const char *name, int batch_size)
 {
+	int msg_iov_index = 0;
 	char *line = NULL;
 	size_t len = 0;
 	int ret = 0;
+	bool send;
+
+	if (batch_size > 1)
+		setcmdlinetotal(name);
 
 	batch_mode = 1;
 	if (name && strcmp(name, "-") != 0) {
@@ -248,15 +253,30 @@ static int batch(const char *name)
 		if (largc == 0)
 			continue;	/* blank line */
 
-		if (do_cmd(largc, largv)) {
+		/*
+		 * In batch mode, if we haven't accumulated enough commands
+		 * and this is not the last command, don't send the message
+		 * immediately.
+		 */
+		if (batch_size > 1 && msg_iov_index + 1 != batch_size
+		    && cmdlineno != cmdlinetotal)
+			send = false;
+		else
+			send = true;
+
+		ret = do_cmd(largc, largv, batch_size, msg_iov_index++, send);
+		if (ret < 0) {
 			fprintf(stderr, "Command failed %s:%d\n", name, cmdlineno);
 			ret = 1;
 			if (!force)
 				break;
 		}
+		msg_iov_index %= batch_size;
 	}
 	if (line)
 		free(line);
+	free_filter_reqs();
+	free_action_reqs();
 
 	rtnl_close(&rth);
 	return ret;
@@ -267,6 +287,7 @@ int main(int argc, char **argv)
 {
 	int ret;
 	char *batch_file = NULL;
+	int batch_size = 1;
 
 	while (argc > 1) {
 		if (argv[1][0] != '-')
@@ -297,6 +318,14 @@ int main(int argc, char **argv)
 			if (argc <= 1)
 				usage();
 			batch_file = argv[1];
+		} else if (matches(argv[1], "-batchsize") == 0 ||
+				matches(argv[1], "-bs") == 0) {
+			argc--;	argv++;
+			if (argc <= 1)
+				usage();
+			batch_size = atoi(argv[1]);
+			if (batch_size > MSG_IOV_MAX)
+				batch_size = MSG_IOV_MAX;
 		} else if (matches(argv[1], "-netns") == 0) {
 			NEXT_ARG();
 			if (netns_switch(argv[1]))
@@ -323,7 +352,7 @@ int main(int argc, char **argv)
 	}
 
 	if (batch_file)
-		return batch(batch_file);
+		return batch(batch_file, batch_size);
 
 	if (argc <= 1) {
 		usage();
@@ -341,7 +370,9 @@ int main(int argc, char **argv)
 		goto Exit;
 	}
 
-	ret = do_cmd(argc-1, argv+1);
+	ret = do_cmd(argc-1, argv+1, 1, 0, true);
+	free_filter_reqs();
+	free_action_reqs();
 Exit:
 	rtnl_close(&rth);
 
diff --git a/tc/tc_common.h b/tc/tc_common.h
index 264fbdac..8a82439f 100644
--- a/tc/tc_common.h
+++ b/tc/tc_common.h
@@ -1,13 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
 #define TCA_BUF_MAX	(64*1024)
+#define MSG_IOV_MAX	256
 
 extern struct rtnl_handle rth;
 
 extern int do_qdisc(int argc, char **argv);
 extern int do_class(int argc, char **argv);
-extern int do_filter(int argc, char **argv);
-extern int do_action(int argc, char **argv);
+extern int do_filter(int argc, char **argv, int batch_size, int index, bool send);
+extern int do_action(int argc, char **argv, int batch_size, int index, bool send);
 extern int do_tcmonitor(int argc, char **argv);
 extern int do_exec(int argc, char **argv);
 
@@ -24,5 +25,8 @@ struct tc_sizespec;
 extern int parse_size_table(int *p_argc, char ***p_argv, struct tc_sizespec *s);
 extern int check_size_table_opts(struct tc_sizespec *s);
 
+extern void free_filter_reqs(void);
+extern void free_action_reqs(void);
+
 extern int show_graph;
 extern bool use_names;
diff --git a/tc/tc_filter.c b/tc/tc_filter.c
index 545cc3a1..6fecbb45 100644
--- a/tc/tc_filter.c
+++ b/tc/tc_filter.c
@@ -19,6 +19,7 @@
 #include <arpa/inet.h>
 #include <string.h>
 #include <linux/if_ether.h>
+#include <errno.h>
 
 #include "rt_names.h"
 #include "utils.h"
@@ -42,18 +43,44 @@ static void usage(void)
 		"OPTIONS := ... try tc filter add <desired FILTER_KIND> help\n");
 }
 
-static int tc_filter_modify(int cmd, unsigned int flags, int argc, char **argv)
+typedef struct {
+	struct nlmsghdr		n;
+	struct tcmsg		t;
+	char			buf[MAX_MSG];
+} tc_filter_req;
+
+static tc_filter_req *filter_reqs;
+static struct iovec msg_iov[MSG_IOV_MAX];
+
+void free_filter_reqs(void)
 {
-	struct {
-		struct nlmsghdr	n;
-		struct tcmsg		t;
-		char			buf[MAX_MSG];
-	} req = {
-		.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
-		.n.nlmsg_flags = NLM_F_REQUEST | flags,
-		.n.nlmsg_type = cmd,
-		.t.tcm_family = AF_UNSPEC,
-	};
+	free(filter_reqs);
+}
+
+static tc_filter_req *get_filter_req(int batch_size, int index)
+{
+	tc_filter_req *req;
+
+	if (filter_reqs == NULL) {
+		filter_reqs = malloc(batch_size * sizeof (tc_filter_req));
+		if (filter_reqs == NULL)
+			return NULL;
+	}
+	req = &filter_reqs[index];
+	memset(req, 0, sizeof (*req));
+
+	return req;
+}
+
+static int tc_filter_modify(int cmd, unsigned int flags, int argc, char **argv,
+			    int batch_size, int index, bool send)
+{
+	tc_filter_req *req;
+	int ret;
+
+	struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
+	struct iovec *iov = &msg_iov[index];
+
 	struct filter_util *q = NULL;
 	__u32 prio = 0;
 	__u32 protocol = 0;
@@ -65,6 +92,24 @@ static int tc_filter_modify(int cmd, unsigned int flags, int argc, char **argv)
 	char  k[FILTER_NAMESZ] = {};
 	struct tc_estimator est = {};
 
+	req = get_filter_req(batch_size, index);
+	if (req == NULL) {
+		fprintf(stderr, "get_filter_req error: not enough buffer\n");
+		return -ENOMEM;
+	}
+
+	req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+	req->n.nlmsg_flags = NLM_F_REQUEST | flags;
+	req->n.nlmsg_type = cmd;
+	req->t.tcm_family = AF_UNSPEC;
+
+	struct msghdr msg = {
+		.msg_name = &nladdr,
+		.msg_namelen = sizeof(nladdr),
+		.msg_iov = msg_iov,
+		.msg_iovlen = index + 1,
+	};
+
 	if (cmd == RTM_NEWTFILTER && flags & NLM_F_CREATE)
 		protocol = htons(ETH_P_ALL);
 
@@ -75,37 +120,37 @@ static int tc_filter_modify(int cmd, unsigned int flags, int argc, char **argv)
 				duparg("dev", *argv);
 			strncpy(d, *argv, sizeof(d)-1);
 		} else if (strcmp(*argv, "root") == 0) {
-			if (req.t.tcm_parent) {
+			if (req->t.tcm_parent) {
 				fprintf(stderr,
 					"Error: \"root\" is duplicate parent ID\n");
 				return -1;
 			}
-			req.t.tcm_parent = TC_H_ROOT;
+			req->t.tcm_parent = TC_H_ROOT;
 		} else if (strcmp(*argv, "ingress") == 0) {
-			if (req.t.tcm_parent) {
+			if (req->t.tcm_parent) {
 				fprintf(stderr,
 					"Error: \"ingress\" is duplicate parent ID\n");
 				return -1;
 			}
-			req.t.tcm_parent = TC_H_MAKE(TC_H_CLSACT,
+			req->t.tcm_parent = TC_H_MAKE(TC_H_CLSACT,
 						     TC_H_MIN_INGRESS);
 		} else if (strcmp(*argv, "egress") == 0) {
-			if (req.t.tcm_parent) {
+			if (req->t.tcm_parent) {
 				fprintf(stderr,
 					"Error: \"egress\" is duplicate parent ID\n");
 				return -1;
 			}
-			req.t.tcm_parent = TC_H_MAKE(TC_H_CLSACT,
+			req->t.tcm_parent = TC_H_MAKE(TC_H_CLSACT,
 						     TC_H_MIN_EGRESS);
 		} else if (strcmp(*argv, "parent") == 0) {
 			__u32 handle;
 
 			NEXT_ARG();
-			if (req.t.tcm_parent)
+			if (req->t.tcm_parent)
 				duparg("parent", *argv);
 			if (get_tc_classid(&handle, *argv))
 				invarg("Invalid parent ID", *argv);
-			req.t.tcm_parent = handle;
+			req->t.tcm_parent = handle;
 		} else if (strcmp(*argv, "handle") == 0) {
 			NEXT_ARG();
 			if (fhandle)
@@ -152,26 +197,26 @@ static int tc_filter_modify(int cmd, unsigned int flags, int argc, char **argv)
 		argc--; argv++;
 	}
 
-	req.t.tcm_info = TC_H_MAKE(prio<<16, protocol);
+	req->t.tcm_info = TC_H_MAKE(prio<<16, protocol);
 
 	if (chain_index_set)
-		addattr32(&req.n, sizeof(req), TCA_CHAIN, chain_index);
+		addattr32(&req->n, sizeof(*req), TCA_CHAIN, chain_index);
 
 	if (k[0])
-		addattr_l(&req.n, sizeof(req), TCA_KIND, k, strlen(k)+1);
+		addattr_l(&req->n, sizeof(*req), TCA_KIND, k, strlen(k)+1);
 
 	if (d[0])  {
 		ll_init_map(&rth);
 
-		req.t.tcm_ifindex = ll_name_to_index(d);
-		if (req.t.tcm_ifindex == 0) {
+		req->t.tcm_ifindex = ll_name_to_index(d);
+		if (req->t.tcm_ifindex == 0) {
 			fprintf(stderr, "Cannot find device \"%s\"\n", d);
 			return 1;
 		}
 	}
 
 	if (q) {
-		if (q->parse_fopt(q, fhandle, argc, argv, &req.n))
+		if (q->parse_fopt(q, fhandle, argc, argv, &req->n))
 			return 1;
 	} else {
 		if (fhandle) {
@@ -190,10 +235,17 @@ static int tc_filter_modify(int cmd, unsigned int flags, int argc, char **argv)
 	}
 
 	if (est.ewma_log)
-		addattr_l(&req.n, sizeof(req), TCA_RATE, &est, sizeof(est));
+		addattr_l(&req->n, sizeof(*req), TCA_RATE, &est, sizeof(est));
 
-	if (rtnl_talk(&rth, &req.n, NULL) < 0) {
-		fprintf(stderr, "We have an error talking to the kernel\n");
+	iov->iov_base = &req->n;
+	iov->iov_len = req->n.nlmsg_len;
+
+	if (!send)
+		return 0;
+
+	ret = rtnl_talk_msg(&rth, &msg, NULL);
+	if (ret < 0) {
+		fprintf(stderr, "We have an error talking to the kernel, %d\n", ret);
 		return 2;
 	}
 
@@ -636,20 +688,23 @@ static int tc_filter_list(int argc, char **argv)
 	return 0;
 }
 
-int do_filter(int argc, char **argv)
+int do_filter(int argc, char **argv, int batch_size, int index, bool send)
 {
 	if (argc < 1)
 		return tc_filter_list(0, NULL);
 	if (matches(*argv, "add") == 0)
 		return tc_filter_modify(RTM_NEWTFILTER, NLM_F_EXCL|NLM_F_CREATE,
-					argc-1, argv+1);
+					argc-1, argv+1,
+					batch_size, index, send);
 	if (matches(*argv, "change") == 0)
-		return tc_filter_modify(RTM_NEWTFILTER, 0, argc-1, argv+1);
+		return tc_filter_modify(RTM_NEWTFILTER, 0, argc-1, argv+1,
+					batch_size, index, send);
 	if (matches(*argv, "replace") == 0)
 		return tc_filter_modify(RTM_NEWTFILTER, NLM_F_CREATE, argc-1,
-					argv+1);
+					argv+1, batch_size, index, send);
 	if (matches(*argv, "delete") == 0)
-		return tc_filter_modify(RTM_DELTFILTER, 0,  argc-1, argv+1);
+		return tc_filter_modify(RTM_DELTFILTER, 0, argc-1, argv+1,
+					batch_size, index, send);
 	if (matches(*argv, "get") == 0)
 		return tc_filter_get(RTM_GETTFILTER, 0,  argc-1, argv+1);
 	if (matches(*argv, "list") == 0 || matches(*argv, "show") == 0
-- 
2.14.3

^ permalink raw reply related

* [patch iproute2 v3 0/4] tc: Add -bs option to batch mode
From: Chris Mi @ 2017-12-25  8:46 UTC (permalink / raw)
  To: netdev; +Cc: gerlitz.or, stephen, dsahern

Currently in tc batch mode, only one command is read from the batch
file and sent to kernel to process. With this patchset, we can accumulate
several commands before sending to kernel. The batch size is specified
using option -bs or -batchsize.

To accumulate the commands in tc, client should allocate an array of
struct iovec. If batchsize is bigger than 1, only after the client
has accumulated enough commands, can the client call rtnl_talk_msg
to send the message that includes the iov array. One exception is
that there is no more command in the batch file.

But please note that kernel still processes the requests one by one.
To process the requests in parallel in kernel is another effort.
The time we're saving in this patchset is the user mode and kernel mode
context switch. So this patchset works on top of the current kernel.

Using the following script in kernel, we can generate 1,000,000 rules.
	tools/testing/selftests/tc-testing/tdc_batch.py

Without this patchset, 'tc -b $file' exection time is:

real    0m15.125s
user    0m6.982s
sys     0m8.080s

With this patchset, 'tc -b $file -bs 10' exection time is:

real    0m12.772s
user    0m5.984s
sys     0m6.723s

The insertion rate is improved more than 10%.

In this patchset, we still ack for every rule. If we don't ack at all,

'tc -b $file' exection time is:

real    0m14.484s
user    0m6.919s
sys     0m7.498s

'tc -b $file -bs 10' exection time is:

real    0m11.664s
user    0m6.017s
sys     0m5.578s

We can see that the performance win is to send multiple messages instead
of no acking. I think that's because in tc, we don't spend too much time
processing the ack message.


v3
==
1. Instead of hacking function rtnl_talk directly, add a new function
   rtnl_talk_msg. 
2. remove most of global variables to use parameter passing
3. divide the previous patch into 4 patches.

Chris Mi (4):
  lib/libnetlink: Add a function rtnl_talk_msg
  utils: Add a function setcmdlinetotal
  tc: Add -bs option to batch mode
  man: Add -bs option to tc manpage

 include/libnetlink.h |   3 ++
 include/utils.h      |   4 ++
 lib/libnetlink.c     |  92 ++++++++++++++++++++++++++++++---------
 lib/utils.c          |  20 +++++++++
 man/man8/tc.8        |   5 +++
 tc/m_action.c        |  91 +++++++++++++++++++++++++++++---------
 tc/tc.c              |  47 ++++++++++++++++----
 tc/tc_common.h       |   8 +++-
 tc/tc_filter.c       | 121 +++++++++++++++++++++++++++++++++++++--------------
 9 files changed, 307 insertions(+), 84 deletions(-)

-- 
2.14.3

^ permalink raw reply

* [patch net] mlxsw: spectrum: Relax sanity checks during enslavement
From: Jiri Pirko @ 2017-12-25  8:05 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, alexpe, mlxsw

From: Ido Schimmel <idosch@mellanox.com>

Since commit 25cc72a33835 ("mlxsw: spectrum: Forbid linking to devices that
have uppers") the driver forbids enslavement to netdevs that already
have uppers of their own, as this can result in various ordering
problems.

This requirement proved to be too strict for some users who need to be
able to enslave ports to a bridge that already has uppers. In this case,
we can allow the enslavement if the bridge is already known to us, as
any configuration performed on top of the bridge was already reflected
to the device.

Fixes: 25cc72a33835 ("mlxsw: spectrum: Forbid linking to devices that have uppers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c           | 11 +++++++++--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h           |  2 ++
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c |  6 ++++++
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 9bd8d28..c3837ca 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -4376,7 +4376,10 @@ static int mlxsw_sp_netdevice_port_upper_event(struct net_device *lower_dev,
 		}
 		if (!info->linking)
 			break;
-		if (netdev_has_any_upper_dev(upper_dev)) {
+		if (netdev_has_any_upper_dev(upper_dev) &&
+		    (!netif_is_bridge_master(upper_dev) ||
+		     !mlxsw_sp_bridge_device_is_offloaded(mlxsw_sp,
+							  upper_dev))) {
 			NL_SET_ERR_MSG(extack,
 				       "spectrum: Enslaving a port to a device that already has an upper device is not supported");
 			return -EINVAL;
@@ -4504,6 +4507,7 @@ static int mlxsw_sp_netdevice_port_vlan_event(struct net_device *vlan_dev,
 					      u16 vid)
 {
 	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
 	struct netdev_notifier_changeupper_info *info = ptr;
 	struct netlink_ext_ack *extack;
 	struct net_device *upper_dev;
@@ -4520,7 +4524,10 @@ static int mlxsw_sp_netdevice_port_vlan_event(struct net_device *vlan_dev,
 		}
 		if (!info->linking)
 			break;
-		if (netdev_has_any_upper_dev(upper_dev)) {
+		if (netdev_has_any_upper_dev(upper_dev) &&
+		    (!netif_is_bridge_master(upper_dev) ||
+		     !mlxsw_sp_bridge_device_is_offloaded(mlxsw_sp,
+							  upper_dev))) {
 			NL_SET_ERR_MSG(extack, "spectrum: Enslaving a port to a device that already has an upper device is not supported");
 			return -EINVAL;
 		}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 432ab9b..05ce1be 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -365,6 +365,8 @@ int mlxsw_sp_port_bridge_join(struct mlxsw_sp_port *mlxsw_sp_port,
 void mlxsw_sp_port_bridge_leave(struct mlxsw_sp_port *mlxsw_sp_port,
 				struct net_device *brport_dev,
 				struct net_device *br_dev);
+bool mlxsw_sp_bridge_device_is_offloaded(const struct mlxsw_sp *mlxsw_sp,
+					 const struct net_device *br_dev);
 
 /* spectrum.c */
 int mlxsw_sp_port_ets_set(struct mlxsw_sp_port *mlxsw_sp_port,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 7b8548e..593ad31 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -152,6 +152,12 @@ mlxsw_sp_bridge_device_find(const struct mlxsw_sp_bridge *bridge,
 	return NULL;
 }
 
+bool mlxsw_sp_bridge_device_is_offloaded(const struct mlxsw_sp *mlxsw_sp,
+					 const struct net_device *br_dev)
+{
+	return !!mlxsw_sp_bridge_device_find(mlxsw_sp->bridge, br_dev);
+}
+
 static struct mlxsw_sp_bridge_device *
 mlxsw_sp_bridge_device_create(struct mlxsw_sp_bridge *bridge,
 			      struct net_device *br_dev)
-- 
2.9.5

^ permalink raw reply related

* Re: [patch net] mlxsw: spectrum_router: Fix NULL pointer deref
From: Jiri Pirko @ 2017-12-25  8:02 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, mlxsw
In-Reply-To: <20171225075735.2058-1-jiri@resnulli.us>

Mon, Dec 25, 2017 at 08:57:35AM CET, jiri@resnulli.us wrote:
>From: Ido Schimmel <idosch@mellanox.com>
>
>When we remove the neighbour associated with a nexthop we should always
>refuse to write the nexthop to the adjacency table. Regardless if it is
>already present in the table or not.
>
>Otherwise, we risk dereferencing the NULL pointer that was set instead
>of the neighbour.
>
>Fixes: a7ff87acd995 ("mlxsw: spectrum_router: Implement next-hop routing")
>Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
>Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Dave, could you please queue this up for 4.14.y together
with "mlxsw: spectrum: Relax sanity checks during enslavement".

Thanks!

^ permalink raw reply

* [patch net] mlxsw: spectrum_router: Fix NULL pointer deref
From: Jiri Pirko @ 2017-12-25  7:57 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, mlxsw

From: Ido Schimmel <idosch@mellanox.com>

When we remove the neighbour associated with a nexthop we should always
refuse to write the nexthop to the adjacency table. Regardless if it is
already present in the table or not.

Otherwise, we risk dereferencing the NULL pointer that was set instead
of the neighbour.

Fixes: a7ff87acd995 ("mlxsw: spectrum_router: Implement next-hop routing")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index be657b8..434b392 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -3228,7 +3228,7 @@ static void __mlxsw_sp_nexthop_neigh_update(struct mlxsw_sp_nexthop *nh,
 {
 	if (!removing)
 		nh->should_offload = 1;
-	else if (nh->offloaded)
+	else
 		nh->should_offload = 0;
 	nh->update = 1;
 }
-- 
2.9.5

^ permalink raw reply related

* [PATCH net] ip6_tunnel: allow ip6gre dev mtu to be set below 1280
From: Xin Long @ 2017-12-25  6:45 UTC (permalink / raw)
  To: network dev; +Cc: davem

Commit 582442d6d5bc ("ipv6: Allow the MTU of ipip6 tunnel to be set
below 1280") fixed a mtu setting issue. It works for ipip6 tunnel.

But ip6gre dev updates the mtu also with ip6_tnl_change_mtu. Since
the inner packet over ip6gre can be ipv4 and it's mtu should also
be allowed to set below 1280, the same issue also exists on ip6gre.

This patch is to fix it by simply changing to check if parms.proto
is IPPROTO_IPV6 in ip6_tnl_change_mtu instead, to make ip6gre to
go to 'else' branch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv6/ip6_tunnel.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 931c38f..1b30777c 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1676,11 +1676,11 @@ int ip6_tnl_change_mtu(struct net_device *dev, int new_mtu)
 {
 	struct ip6_tnl *tnl = netdev_priv(dev);
 
-	if (tnl->parms.proto == IPPROTO_IPIP) {
-		if (new_mtu < ETH_MIN_MTU)
+	if (tnl->parms.proto == IPPROTO_IPV6) {
+		if (new_mtu < IPV6_MIN_MTU)
 			return -EINVAL;
 	} else {
-		if (new_mtu < IPV6_MIN_MTU)
+		if (new_mtu < ETH_MIN_MTU)
 			return -EINVAL;
 	}
 	if (new_mtu > 0xFFF8 - dev->hard_header_len)
-- 
2.1.0

^ permalink raw reply related

* [PATCH net] geneve: update skb dst pmtu on tx path
From: Xin Long @ 2017-12-25  6:43 UTC (permalink / raw)
  To: network dev; +Cc: davem

Commit a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") has fixed
a performance issue caused by the change of lower dev's mtu for vxlan.

The same thing needs to be done for geneve as well.

Note that geneve cannot adjust it's mtu according to lower dev's mtu
when creating it. The performance is very low later when netperfing
over it without fixing the mtu manually. This patch could also avoid
this issue.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 drivers/net/geneve.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index b718a02..0a48b30 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -825,6 +825,13 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	if (IS_ERR(rt))
 		return PTR_ERR(rt);
 
+	if (skb_dst(skb)) {
+		int mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr) -
+			  GENEVE_BASE_HLEN - info->options_len - 14;
+
+		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+	}
+
 	sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
 	if (geneve->collect_md) {
 		tos = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
@@ -864,6 +871,13 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	if (IS_ERR(dst))
 		return PTR_ERR(dst);
 
+	if (skb_dst(skb)) {
+		int mtu = dst_mtu(dst) - sizeof(struct ipv6hdr) -
+			  GENEVE_BASE_HLEN - info->options_len - 14;
+
+		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+	}
+
 	sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
 	if (geneve->collect_md) {
 		prio = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
-- 
2.1.0

^ permalink raw reply related

* [PATCH net-next v2] net: sched: fix skb leak in dev_requeue_skb()
From: Wei Yongjun @ 2017-12-25  3:49 UTC (permalink / raw)
  To: John Fastabend, Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Wei Yongjun, netdev

When dev_requeue_skb() is called with bluked skb list, only the
first skb of the list will be requeued to qdisc layer, and leak
the others without free them.

TCP is broken due to skb leak since no free skb will be considered
as still in the host queue and never be retransmitted. This happend
when dev_requeue_skb() called from qdisc_restart().
  qdisc_restart
  |-- dequeue_skb
  |-- sch_direct_xmit()
      |-- dev_requeue_skb() <-- skb may bluked

Fix dev_requeue_skb() to requeue the full bluked list.

Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
v1 -> v2: add net-next prefix
---
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 981c08f..0df2dbf 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -111,10 +111,16 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
 
 static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 {
-	__skb_queue_head(&q->gso_skb, skb);
-	q->qstats.requeues++;
-	qdisc_qstats_backlog_inc(q, skb);
-	q->q.qlen++;	/* it's still part of the queue */
+	while (skb) {
+		struct sk_buff *next = skb->next;
+
+		__skb_queue_tail(&q->gso_skb, skb);
+		q->qstats.requeues++;
+		qdisc_qstats_backlog_inc(q, skb);
+		q->q.qlen++;	/* it's still part of the queue */
+
+		skb = next;
+	}
 	__netif_schedule(q);
 
 	return 0;
@@ -124,13 +130,20 @@ static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
 {
 	spinlock_t *lock = qdisc_lock(q);
 
-	spin_lock(lock);
-	__skb_queue_tail(&q->gso_skb, skb);
-	spin_unlock(lock);
+	while (skb) {
+		struct sk_buff *next = skb->next;
+
+		spin_lock(lock);
+		__skb_queue_tail(&q->gso_skb, skb);
+		spin_unlock(lock);
+
+		qdisc_qstats_cpu_requeues_inc(q);
+		qdisc_qstats_cpu_backlog_inc(q, skb);
+		qdisc_qstats_cpu_qlen_inc(q);
+
+		skb = next;
+	}
 
-	qdisc_qstats_cpu_requeues_inc(q);
-	qdisc_qstats_cpu_backlog_inc(q, skb);
-	qdisc_qstats_cpu_qlen_inc(q);
 	__netif_schedule(q);
 
 	return 0;
-- 
1.8.3.1

^ permalink raw reply related

* RE: [PATCH] net: sched: fix skb leak in dev_requeue_skb()
From: weiyongjun (A) @ 2017-12-25  3:38 UTC (permalink / raw)
  To: John Fastabend, Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: netdev@vger.kernel.org
In-Reply-To: <1514173173-164916-1-git-send-email-weiyongjun1@huawei.com>

> When dev_requeue_skb() is called with bluked skb list, only the
> first skb of the list will be requeued to qdisc layer, and leak
> the others without free them.
> 
> TCP is broken due to skb leak since no free skb will be considered
> as still in the host queue and never be retransmitted. This happend
> when dev_requeue_skb() called from qdisc_restart().
>   qdisc_restart
>   |-- dequeue_skb
>   |-- sch_direct_xmit()
>       |-- dev_requeue_skb() <-- skb may bluked
> 
> Fix dev_requeue_skb() to requeue the full bluked list.
> 
> Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Sorry, I forgot the 'net-next' prefix, please ignore this one, I will resend it.


^ permalink raw reply

* [PATCH] net: sched: fix skb leak in dev_requeue_skb()
From: Wei Yongjun @ 2017-12-25  3:39 UTC (permalink / raw)
  To: John Fastabend, Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Wei Yongjun, netdev

When dev_requeue_skb() is called with bluked skb list, only the
first skb of the list will be requeued to qdisc layer, and leak
the others without free them.

TCP is broken due to skb leak since no free skb will be considered
as still in the host queue and never be retransmitted. This happend
when dev_requeue_skb() called from qdisc_restart().
  qdisc_restart
  |-- dequeue_skb
  |-- sch_direct_xmit()
      |-- dev_requeue_skb() <-- skb may bluked

Fix dev_requeue_skb() to requeue the full bluked list.

Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 981c08f..0df2dbf 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -111,10 +111,16 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
 
 static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 {
-	__skb_queue_head(&q->gso_skb, skb);
-	q->qstats.requeues++;
-	qdisc_qstats_backlog_inc(q, skb);
-	q->q.qlen++;	/* it's still part of the queue */
+	while (skb) {
+		struct sk_buff *next = skb->next;
+
+		__skb_queue_tail(&q->gso_skb, skb);
+		q->qstats.requeues++;
+		qdisc_qstats_backlog_inc(q, skb);
+		q->q.qlen++;	/* it's still part of the queue */
+
+		skb = next;
+	}
 	__netif_schedule(q);
 
 	return 0;
@@ -124,13 +130,20 @@ static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
 {
 	spinlock_t *lock = qdisc_lock(q);
 
-	spin_lock(lock);
-	__skb_queue_tail(&q->gso_skb, skb);
-	spin_unlock(lock);
+	while (skb) {
+		struct sk_buff *next = skb->next;
+
+		spin_lock(lock);
+		__skb_queue_tail(&q->gso_skb, skb);
+		spin_unlock(lock);
+
+		qdisc_qstats_cpu_requeues_inc(q);
+		qdisc_qstats_cpu_backlog_inc(q, skb);
+		qdisc_qstats_cpu_qlen_inc(q);
+
+		skb = next;
+	}
 
-	qdisc_qstats_cpu_requeues_inc(q);
-	qdisc_qstats_cpu_backlog_inc(q, skb);
-	qdisc_qstats_cpu_qlen_inc(q);
 	__netif_schedule(q);
 
 	return 0;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH] ip6_tunnel: disable dst caching if tunnel is dual-stack
From: Eli Cooper @ 2017-12-25  2:43 UTC (permalink / raw)
  To: netdev; +Cc: David S . Miller

When an ip6_tunnel is in mode 'any', where the transport layer
protocol can be either 4 or 41, dst_cache must be disabled.

This is because xfrm policies might apply to only one of the two
protocols. Caching dst would cause xfrm policies for one protocol
incorrectly used for the other.

Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
---
 net/ipv6/ip6_tunnel.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 931c38f6ff4a..8aea23d15ddd 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1074,10 +1074,10 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 			memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
 			neigh_release(neigh);
 		}
-	} else if (!(t->parms.flags &
+	} else if (t->parms.proto != 0 && !(t->parms.flags &
 		     (IP6_TNL_F_USE_ORIG_TCLASS | IP6_TNL_F_USE_ORIG_FWMARK))) {
-		/* enable the cache only only if the routing decision does
-		 * not depend on the current inner header value
+		/* enable the cache only if neither the outer protocol nor the
+		 * routing decision depends on the current inner header value
 		 */
 		use_cache = true;
 	}
-- 
2.15.1

^ permalink raw reply related

* Re: [PATCH v3 27/27] devres: kill devm_ioremap_nocache
From: Yisheng Xie @ 2017-12-25  1:43 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-mips, ulf.hansson, jakub.kicinski, lgirdwood, airlied,
	linux-pci, linus.walleij, alsa-devel, dri-devel,
	platform-driver-x86, linux-ide, linux-mtd, daniel.vetter, tglx,
	linux-watchdog, linux-rtc, boris.brezillon, andriy.shevchenko,
	vinod.koul, richard, alexandre.belloni, marek.vasut,
	industrypack-devel, jslaby, dvhart, linux, linux-media, devel,
	jason, arnd, b.zolnierkie, marc.zyngier, linux-mmc, jani.niku
In-Reply-To: <20171223134532.GA10103@kroah.com>



On 2017/12/23 21:45, Greg KH wrote:
> On Sat, Dec 23, 2017 at 07:02:59PM +0800, Yisheng Xie wrote:
>> --- a/lib/devres.c
>> +++ b/lib/devres.c
>> @@ -44,35 +44,6 @@ void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
>>  EXPORT_SYMBOL(devm_ioremap);
>>  
>>  /**
>> - * devm_ioremap_nocache - Managed ioremap_nocache()
>> - * @dev: Generic device to remap IO address for
>> - * @offset: Resource address to map
>> - * @size: Size of map
>> - *
>> - * Managed ioremap_nocache().  Map is automatically unmapped on driver
>> - * detach.
>> - */
>> -void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t offset,
>> -				   resource_size_t size)
>> -{
>> -	void __iomem **ptr, *addr;
>> -
>> -	ptr = devres_alloc(devm_ioremap_release, sizeof(*ptr), GFP_KERNEL);
>> -	if (!ptr)
>> -		return NULL;
>> -
>> -	addr = ioremap_nocache(offset, size);
> 
> Wait, devm_ioremap() calls ioremap(), not ioremap_nocache(), are you
> _SURE_ that these are all identical?  For all arches?  If so, then
> ioremap_nocache() can also be removed, right?

Yeah, As Christophe pointed out, that 4 archs do not have the same function.
But I do not why they do not want do the same thing. Driver may no know about
this? right?

> 
> In my quick glance, I don't think you can do this series at all :(

Yes, maybe should take Christophe suggestion and use a bool or enum to distinguish them?

Thanks
Yisheng
> 
> greg k-h
> 
> .
> 

^ permalink raw reply

* Re: [PATCH v3 00/27] kill devm_ioremap_nocache
From: Yisheng Xie @ 2017-12-25  1:34 UTC (permalink / raw)
  To: christophe leroy, Greg KH
  Cc: linux-mips, ulf.hansson, jakub.kicinski, platform-driver-x86,
	airlied, linux-wireless, linus.walleij, alsa-devel, dri-devel,
	linux-kernel, linux-ide, linux-mtd, daniel.vetter, dan.j.williams,
	jason, linux-rtc, boris.brezillon, mchehab, dmaengine, vinod.koul,
	richard, marek.vasut, industrypack-devel, linux-pci, dvhart,
	linux, linux-media, seanpaul, devel, linux-watchdog, arnd,
	b.zolnierkie, marc.zyngier, jslaby
In-Reply-To: <b8ff7f17-7f2c-f220-9833-7ae5bd7343d5@c-s.fr>



On 2017/12/24 17:05, christophe leroy wrote:
> 
> 
> Le 23/12/2017 à 14:48, Greg KH a écrit :
>> On Sat, Dec 23, 2017 at 06:55:25PM +0800, Yisheng Xie wrote:
>>> Hi all,
>>>
>>> When I tried to use devm_ioremap function and review related code, I found
>>> devm_ioremap and devm_ioremap_nocache is almost the same with each other,
>>> except one use ioremap while the other use ioremap_nocache.
>>
>> For all arches?  Really?  Look at MIPS, and x86, they have different
>> functions.
>>
>>> While ioremap's
>>> default function is ioremap_nocache, so devm_ioremap_nocache also have the
>>> same function with devm_ioremap, which can just be killed to reduce the size
>>> of devres.o(from 20304 bytes to 18992 bytes in my compile environment).
>>>
>>> I have posted two versions, which use macro instead of function for
>>> devm_ioremap_nocache[1] or devm_ioremap[2]. And Greg suggest me to kill
>>> devm_ioremap_nocache for no need to keep a macro around for the duplicate
>>> thing. So here comes v3 and please help to review.
>>
>> I don't think this can be done, what am I missing?  These functions are
>> not identical, sorry for missing that before.
> 
> devm_ioremap() and devm_ioremap_nocache() are quite similar, both use devm_ioremap_release() for the release, why not just defining:
> 
> static void __iomem *__devm_ioremap(struct device *dev, resource_size_t offset,
>                resource_size_t size, bool nocache)
> {
> [...]
>     if (nocache)
>         addr = ioremap_nocache(offset, size);
>     else
>         addr = ioremap(offset, size);
> [...]
> }
> 
> then in include/linux/io.h
> 
> static inline void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
>                resource_size_t size)
> {return __devm_ioremap(dev, offset, size, false);}
> 
> static inline void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t offset,
>                    resource_size_t size);
> {return __devm_ioremap(dev, offset, size, true);}

Yeah, this seems good to me, right now we have devm_ioremap, devm_ioremap_wc, devm_ioremap_nocache
May be we can use an enum like:
typedef enum {
	DEVM_IOREMAP = 0,
	DEVM_IOREMAP_NOCACHE,
	DEVM_IOREMAP_WC,
} devm_ioremap_type;

static inline void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
                resource_size_t size)
 {return __devm_ioremap(dev, offset, size, DEVM_IOREMAP);}

 static inline void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t offset,
                    resource_size_t size);
 {return __devm_ioremap(dev, offset, size, DEVM_IOREMAP_NOCACHE);}

 static inline void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset,
                    resource_size_t size);
 {return __devm_ioremap(dev, offset, size, DEVM_IOREMAP_WC);}

 static void __iomem *__devm_ioremap(struct device *dev, resource_size_t offset,
                resource_size_t size, devm_ioremap_type type)
 {
     void __iomem **ptr, *addr = NULL;
 [...]
     switch (type){
     case DEVM_IOREMAP:
         addr = ioremap(offset, size);
         break;
     case DEVM_IOREMAP_NOCACHE:
         addr = ioremap_nocache(offset, size);
         break;
     case DEVM_IOREMAP_WC:
         addr = ioremap_wc(offset, size);
         break;
     }
 [...]
 }

Thanks
Yisheng

> 
> Christophe
> 
>>
>> thanks,
>>
>> greg k-h
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-watchdog" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> ---
> L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast.
> https://www.avast.com/antivirus
> 
> 
> .
> 

_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

^ permalink raw reply

* [PATCH net-next v8 2/2] net: ethernet: socionext: add AVE ethernet driver
From: Kunihiko Hayashi @ 2017-12-25  1:10 UTC (permalink / raw)
  To: David Miller, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: Andrew Lunn, Florian Fainelli, Rob Herring, Mark Rutland,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Masahiro Yamada,
	Masami Hiramatsu, Jassi Brar, Kunihiko Hayashi
In-Reply-To: <1514164238-28901-1-git-send-email-hayashi.kunihiko-uWyLwvC0a2jby3iVrkZq2A@public.gmane.org>

The UniPhier platform from Socionext provides the AVE ethernet
controller that includes MAC and MDIO bus supporting RGMII/RMII
modes. The controller is named AVE.

Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko-uWyLwvC0a2jby3iVrkZq2A@public.gmane.org>
Signed-off-by: Jassi Brar <jaswinder.singh-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Reviewed-by: Andrew Lunn <andrew-g2DYL2Zd6BY@public.gmane.org>
---
 drivers/net/ethernet/Kconfig             |    1 +
 drivers/net/ethernet/Makefile            |    1 +
 drivers/net/ethernet/socionext/Kconfig   |   22 +
 drivers/net/ethernet/socionext/Makefile  |    5 +
 drivers/net/ethernet/socionext/sni_ave.c | 1736 ++++++++++++++++++++++++++++++
 5 files changed, 1765 insertions(+)
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/sni_ave.c

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index c604213..d50519e 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -170,6 +170,7 @@ source "drivers/net/ethernet/sis/Kconfig"
 source "drivers/net/ethernet/sfc/Kconfig"
 source "drivers/net/ethernet/sgi/Kconfig"
 source "drivers/net/ethernet/smsc/Kconfig"
+source "drivers/net/ethernet/socionext/Kconfig"
 source "drivers/net/ethernet/stmicro/Kconfig"
 source "drivers/net/ethernet/sun/Kconfig"
 source "drivers/net/ethernet/tehuti/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 39f62733..6cf5ade 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -82,6 +82,7 @@ obj-$(CONFIG_SFC) += sfc/
 obj-$(CONFIG_SFC_FALCON) += sfc/falcon/
 obj-$(CONFIG_NET_VENDOR_SGI) += sgi/
 obj-$(CONFIG_NET_VENDOR_SMSC) += smsc/
+obj-$(CONFIG_NET_VENDOR_SOCIONEXT) += socionext/
 obj-$(CONFIG_NET_VENDOR_STMICRO) += stmicro/
 obj-$(CONFIG_NET_VENDOR_SUN) += sun/
 obj-$(CONFIG_NET_VENDOR_TEHUTI) += tehuti/
diff --git a/drivers/net/ethernet/socionext/Kconfig b/drivers/net/ethernet/socionext/Kconfig
new file mode 100644
index 0000000..3a1829e
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Kconfig
@@ -0,0 +1,22 @@
+config NET_VENDOR_SOCIONEXT
+	bool "Socionext ethernet drivers"
+	default y
+	---help---
+	  Option to select ethernet drivers for Socionext platforms.
+
+	  Note that the answer to this question doesn't directly affect the
+	  kernel: saying N will just cause the configurator to skip all
+	  the questions about Socionext devices. If you say Y, you will be asked
+	  for your specific card in the following questions.
+
+if NET_VENDOR_SOCIONEXT
+
+config SNI_AVE
+	tristate "Socionext AVE ethernet support"
+	depends on (ARCH_UNIPHIER || COMPILE_TEST) && OF
+	select PHYLIB
+	---help---
+	  Driver for gigabit ethernet MACs, called AVE, in the
+	  Socionext UniPhier family.
+
+endif #NET_VENDOR_SOCIONEXT
diff --git a/drivers/net/ethernet/socionext/Makefile b/drivers/net/ethernet/socionext/Makefile
new file mode 100644
index 0000000..ab83df6
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for all ethernet ip drivers on Socionext platforms
+#
+obj-$(CONFIG_SNI_AVE) += sni_ave.o
diff --git a/drivers/net/ethernet/socionext/sni_ave.c b/drivers/net/ethernet/socionext/sni_ave.c
new file mode 100644
index 0000000..0925675
--- /dev/null
+++ b/drivers/net/ethernet/socionext/sni_ave.c
@@ -0,0 +1,1736 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * sni_ave.c - Socionext UniPhier AVE ethernet driver
+ * Copyright 2014 Panasonic Corporation
+ * Copyright 2015-2017 Socionext Inc.
+ */
+
+#include <linux/bitops.h>
+#include <linux/clk.h>
+#include <linux/etherdevice.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/mii.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/of_net.h>
+#include <linux/of_mdio.h>
+#include <linux/of_platform.h>
+#include <linux/phy.h>
+#include <linux/reset.h>
+#include <linux/types.h>
+#include <linux/u64_stats_sync.h>
+
+/* General Register Group */
+#define AVE_IDR			0x000	/* ID */
+#define AVE_VR			0x004	/* Version */
+#define AVE_GRR			0x008	/* Global Reset */
+#define AVE_CFGR		0x00c	/* Configuration */
+
+/* Interrupt Register Group */
+#define AVE_GIMR		0x100	/* Global Interrupt Mask */
+#define AVE_GISR		0x104	/* Global Interrupt Status */
+
+/* MAC Register Group */
+#define AVE_TXCR		0x200	/* TX Setup */
+#define AVE_RXCR		0x204	/* RX Setup */
+#define AVE_RXMAC1R		0x208	/* MAC address (lower) */
+#define AVE_RXMAC2R		0x20c	/* MAC address (upper) */
+#define AVE_MDIOCTR		0x214	/* MDIO Control */
+#define AVE_MDIOAR		0x218	/* MDIO Address */
+#define AVE_MDIOWDR		0x21c	/* MDIO Data */
+#define AVE_MDIOSR		0x220	/* MDIO Status */
+#define AVE_MDIORDR		0x224	/* MDIO Rd Data */
+
+/* Descriptor Control Register Group */
+#define AVE_DESCC		0x300	/* Descriptor Control */
+#define AVE_TXDC		0x304	/* TX Descriptor Configuration */
+#define AVE_RXDC0		0x308	/* RX Descriptor Ring0 Configuration */
+#define AVE_IIRQC		0x34c	/* Interval IRQ Control */
+
+/* Packet Filter Register Group */
+#define AVE_PKTF_BASE		0x800	/* PF Base Address */
+#define AVE_PFMBYTE_BASE	0xd00	/* PF Mask Byte Base Address */
+#define AVE_PFMBIT_BASE		0xe00	/* PF Mask Bit Base Address */
+#define AVE_PFSEL_BASE		0xf00	/* PF Selector Base Address */
+#define AVE_PFEN		0xffc	/* Packet Filter Enable */
+#define AVE_PKTF(ent)		(AVE_PKTF_BASE + (ent) * 0x40)
+#define AVE_PFMBYTE(ent)	(AVE_PFMBYTE_BASE + (ent) * 8)
+#define AVE_PFMBIT(ent)		(AVE_PFMBIT_BASE + (ent) * 4)
+#define AVE_PFSEL(ent)		(AVE_PFSEL_BASE + (ent) * 4)
+
+/* 64bit descriptor memory */
+#define AVE_DESC_SIZE_64	12	/* Descriptor Size */
+
+#define AVE_TXDM_64		0x1000	/* Tx Descriptor Memory */
+#define AVE_RXDM_64		0x1c00	/* Rx Descriptor Memory */
+
+#define AVE_TXDM_SIZE_64	0x0ba0	/* Tx Descriptor Memory Size 3KB */
+#define AVE_RXDM_SIZE_64	0x6000	/* Rx Descriptor Memory Size 24KB */
+
+/* 32bit descriptor memory */
+#define AVE_DESC_SIZE_32	8	/* Descriptor Size */
+
+#define AVE_TXDM_32		0x1000	/* Tx Descriptor Memory */
+#define AVE_RXDM_32		0x1800	/* Rx Descriptor Memory */
+
+#define AVE_TXDM_SIZE_32	0x07c0	/* Tx Descriptor Memory Size 2KB */
+#define AVE_RXDM_SIZE_32	0x4000	/* Rx Descriptor Memory Size 16KB */
+
+/* RMII Bridge Register Group */
+#define AVE_RSTCTRL		0x8028	/* Reset control */
+#define AVE_RSTCTRL_RMIIRST	BIT(16)
+#define AVE_LINKSEL		0x8034	/* Link speed setting */
+#define AVE_LINKSEL_100M	BIT(0)
+
+/* AVE_GRR */
+#define AVE_GRR_RXFFR		BIT(5)	/* Reset RxFIFO */
+#define AVE_GRR_PHYRST		BIT(4)	/* Reset external PHY */
+#define AVE_GRR_GRST		BIT(0)	/* Reset all MAC */
+
+/* AVE_CFGR */
+#define AVE_CFGR_FLE		BIT(31)	/* Filter Function */
+#define AVE_CFGR_CHE		BIT(30)	/* Checksum Function */
+#define AVE_CFGR_MII		BIT(27)	/* Func mode (1:MII/RMII, 0:RGMII) */
+#define AVE_CFGR_IPFCEN		BIT(24)	/* IP fragment sum Enable */
+
+/* AVE_GISR (common with GIMR) */
+#define AVE_GI_PHY		BIT(24)	/* PHY interrupt */
+#define AVE_GI_TX		BIT(16)	/* Tx complete */
+#define AVE_GI_RXERR		BIT(8)	/* Receive frame more than max size */
+#define AVE_GI_RXOVF		BIT(7)	/* Overflow at the RxFIFO */
+#define AVE_GI_RXDROP		BIT(6)	/* Drop packet */
+#define AVE_GI_RXIINT		BIT(5)	/* Interval interrupt */
+
+/* AVE_TXCR */
+#define AVE_TXCR_FLOCTR		BIT(18)	/* Flow control */
+#define AVE_TXCR_TXSPD_1G	BIT(17)
+#define AVE_TXCR_TXSPD_100	BIT(16)
+
+/* AVE_RXCR */
+#define AVE_RXCR_RXEN		BIT(30)	/* Rx enable */
+#define AVE_RXCR_FDUPEN		BIT(22)	/* Interface mode */
+#define AVE_RXCR_FLOCTR		BIT(21)	/* Flow control */
+#define AVE_RXCR_AFEN		BIT(19)	/* MAC address filter */
+#define AVE_RXCR_DRPEN		BIT(18)	/* Drop pause frame */
+#define AVE_RXCR_MPSIZ_MASK	GENMASK(10, 0)
+
+/* AVE_MDIOCTR */
+#define AVE_MDIOCTR_RREQ	BIT(3)	/* Read request */
+#define AVE_MDIOCTR_WREQ	BIT(2)	/* Write request */
+
+/* AVE_MDIOSR */
+#define AVE_MDIOSR_STS		BIT(0)	/* access status */
+
+/* AVE_DESCC */
+#define AVE_DESCC_STATUS_MASK	GENMASK(31, 16)
+#define AVE_DESCC_RD0		BIT(8)	/* Enable Rx descriptor Ring0 */
+#define AVE_DESCC_RDSTP		BIT(4)	/* Pause Rx descriptor */
+#define AVE_DESCC_TD		BIT(0)	/* Enable Tx descriptor */
+
+/* AVE_TXDC */
+#define AVE_TXDC_SIZE		GENMASK(27, 16)	/* Size of Tx descriptor */
+#define AVE_TXDC_ADDR		GENMASK(11, 0)	/* Start address */
+#define AVE_TXDC_ADDR_START	0
+
+/* AVE_RXDC0 */
+#define AVE_RXDC0_SIZE		GENMASK(30, 16)	/* Size of Rx descriptor */
+#define AVE_RXDC0_ADDR		GENMASK(14, 0)	/* Start address */
+#define AVE_RXDC0_ADDR_START	0
+
+/* AVE_IIRQC */
+#define AVE_IIRQC_EN0		BIT(27)	/* Enable interval interrupt Ring0 */
+#define AVE_IIRQC_BSCK		GENMASK(15, 0)	/* Interval count unit */
+
+/* Command status for descriptor */
+#define AVE_STS_OWN		BIT(31)	/* Descriptor ownership */
+#define AVE_STS_INTR		BIT(29)	/* Request for interrupt */
+#define AVE_STS_OK		BIT(27)	/* Normal transmit */
+/* TX */
+#define AVE_STS_NOCSUM		BIT(28)	/* No use HW checksum */
+#define AVE_STS_1ST		BIT(26)	/* Head of buffer chain */
+#define AVE_STS_LAST		BIT(25)	/* Tail of buffer chain */
+#define AVE_STS_OWC		BIT(21)	/* Out of window,Late Collision */
+#define AVE_STS_EC		BIT(20)	/* Excess collision occurred */
+#define AVE_STS_PKTLEN_TX_MASK	GENMASK(15, 0)
+/* RX */
+#define AVE_STS_CSSV		BIT(21)	/* Checksum check performed */
+#define AVE_STS_CSER		BIT(20)	/* Checksum error detected */
+#define AVE_STS_PKTLEN_RX_MASK	GENMASK(10, 0)
+
+/* Packet filter */
+#define AVE_PFMBYTE_MASK0	(GENMASK(31, 8) | GENMASK(5, 0))
+#define AVE_PFMBYTE_MASK1	GENMASK(25, 0)
+#define AVE_PFMBIT_MASK		GENMASK(15, 0)
+
+#define AVE_PF_SIZE		17	/* Number of all packet filter */
+#define AVE_PF_MULTICAST_SIZE	7	/* Number of multicast filter */
+
+#define AVE_PFNUM_FILTER	0	/* No.0 */
+#define AVE_PFNUM_UNICAST	1	/* No.1 */
+#define AVE_PFNUM_BROADCAST	2	/* No.2 */
+#define AVE_PFNUM_MULTICAST	11	/* No.11-17 */
+
+/* NETIF Message control */
+#define AVE_DEFAULT_MSG_ENABLE	(NETIF_MSG_DRV    |	\
+				 NETIF_MSG_PROBE  |	\
+				 NETIF_MSG_LINK   |	\
+				 NETIF_MSG_TIMER  |	\
+				 NETIF_MSG_IFDOWN |	\
+				 NETIF_MSG_IFUP   |	\
+				 NETIF_MSG_RX_ERR |	\
+				 NETIF_MSG_TX_ERR)
+
+/* Parameter for descriptor */
+#define AVE_NR_TXDESC		32	/* Tx descriptor */
+#define AVE_NR_RXDESC		64	/* Rx descriptor */
+
+#define AVE_DESC_OFS_CMDSTS	0
+#define AVE_DESC_OFS_ADDRL	4
+#define AVE_DESC_OFS_ADDRU	8
+
+/* Parameter for ethernet frame */
+#define AVE_MAX_ETHFRAME	1518
+
+/* Parameter for interrupt */
+#define AVE_INTM_COUNT		20
+#define AVE_FORCE_TXINTCNT	1
+
+#define IS_DESC_64BIT(p)	((p)->data->is_desc_64bit)
+
+enum desc_id {
+	AVE_DESCID_RX,
+	AVE_DESCID_TX,
+};
+
+enum desc_state {
+	AVE_DESC_RX_PERMIT,
+	AVE_DESC_RX_SUSPEND,
+	AVE_DESC_START,
+	AVE_DESC_STOP,
+};
+
+struct ave_desc {
+	struct sk_buff	*skbs;
+	dma_addr_t	skbs_dma;
+	size_t		skbs_dmalen;
+};
+
+struct ave_desc_info {
+	u32	ndesc;		/* number of descriptor */
+	u32	daddr;		/* start address of descriptor */
+	u32	proc_idx;	/* index of processing packet */
+	u32	done_idx;	/* index of processed packet */
+	struct ave_desc *desc;	/* skb info related descriptor */
+};
+
+struct ave_soc_data {
+	bool	is_desc_64bit;
+};
+
+struct ave_stats {
+	struct	u64_stats_sync	syncp;
+	u64	packets;
+	u64	bytes;
+	u64	errors;
+	u64	dropped;
+	u64	collisions;
+	u64	fifo_errors;
+};
+
+struct ave_private {
+	void __iomem            *base;
+	int                     irq;
+	int			phy_id;
+	unsigned int		desc_size;
+	u32			msg_enable;
+	struct clk		*clk;
+	struct reset_control	*rst;
+	phy_interface_t		phy_mode;
+	struct phy_device	*phydev;
+	struct mii_bus		*mdio;
+
+	/* stats */
+	struct ave_stats	stats_rx;
+	struct ave_stats	stats_tx;
+
+	/* NAPI support */
+	struct net_device	*ndev;
+	struct napi_struct	napi_rx;
+	struct napi_struct	napi_tx;
+
+	/* descriptor */
+	struct ave_desc_info	rx;
+	struct ave_desc_info	tx;
+
+	/* flow control */
+	int pause_auto;
+	int pause_rx;
+	int pause_tx;
+
+	const struct ave_soc_data *data;
+};
+
+static u32 ave_desc_read(struct net_device *ndev, enum desc_id id, int entry,
+			 int offset)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	u32 addr;
+
+	addr = ((id == AVE_DESCID_TX) ? priv->tx.daddr : priv->rx.daddr)
+		+ entry * priv->desc_size + offset;
+
+	return readl(priv->base + addr);
+}
+
+static u32 ave_desc_read_cmdsts(struct net_device *ndev, enum desc_id id,
+				int entry)
+{
+	return ave_desc_read(ndev, id, entry, AVE_DESC_OFS_CMDSTS);
+}
+
+static void ave_desc_write(struct net_device *ndev, enum desc_id id,
+			   int entry, int offset, u32 val)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	u32 addr;
+
+	addr = ((id == AVE_DESCID_TX) ? priv->tx.daddr : priv->rx.daddr)
+		+ entry * priv->desc_size + offset;
+
+	writel(val, priv->base + addr);
+}
+
+static void ave_desc_write_cmdsts(struct net_device *ndev, enum desc_id id,
+				  int entry, u32 val)
+{
+	ave_desc_write(ndev, id, entry, AVE_DESC_OFS_CMDSTS, val);
+}
+
+static void ave_desc_write_addr(struct net_device *ndev, enum desc_id id,
+				int entry, dma_addr_t paddr)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+
+	ave_desc_write(ndev, id, entry, AVE_DESC_OFS_ADDRL,
+		       lower_32_bits(paddr));
+	if (IS_DESC_64BIT(priv))
+		ave_desc_write(ndev, id,
+			       entry, AVE_DESC_OFS_ADDRU,
+			       upper_32_bits(paddr));
+}
+
+static u32 ave_irq_disable_all(struct net_device *ndev)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	u32 ret;
+
+	ret = readl(priv->base + AVE_GIMR);
+	writel(0, priv->base + AVE_GIMR);
+
+	return ret;
+}
+
+static void ave_irq_restore(struct net_device *ndev, u32 val)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+
+	writel(val, priv->base + AVE_GIMR);
+}
+
+static void ave_irq_enable(struct net_device *ndev, u32 bitflag)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+
+	writel(readl(priv->base + AVE_GIMR) | bitflag, priv->base + AVE_GIMR);
+	writel(bitflag, priv->base + AVE_GISR);
+}
+
+static void ave_hw_write_macaddr(struct net_device *ndev,
+				 const unsigned char *mac_addr,
+				 int reg1, int reg2)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+
+	writel(mac_addr[0] | mac_addr[1] << 8 |
+	       mac_addr[2] << 16 | mac_addr[3] << 24, priv->base + reg1);
+	writel(mac_addr[4] | mac_addr[5] << 8, priv->base + reg2);
+}
+
+static void ave_hw_read_version(struct net_device *ndev, char *buf, int len)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	u32 major, minor, vr;
+
+	vr = readl(priv->base + AVE_VR);
+	major = (vr & GENMASK(15, 8)) >> 8;
+	minor = (vr & GENMASK(7, 0));
+	snprintf(buf, len, "v%u.%u", major, minor);
+}
+
+static void ave_ethtool_get_drvinfo(struct net_device *ndev,
+				    struct ethtool_drvinfo *info)
+{
+	struct device *dev = ndev->dev.parent;
+
+	strlcpy(info->driver, dev->driver->name, sizeof(info->driver));
+	strlcpy(info->bus_info, dev_name(dev), sizeof(info->bus_info));
+	ave_hw_read_version(ndev, info->fw_version, sizeof(info->fw_version));
+}
+
+static u32 ave_ethtool_get_msglevel(struct net_device *ndev)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+
+	return priv->msg_enable;
+}
+
+static void ave_ethtool_set_msglevel(struct net_device *ndev, u32 val)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+
+	priv->msg_enable = val;
+}
+
+static void ave_ethtool_get_wol(struct net_device *ndev,
+				struct ethtool_wolinfo *wol)
+{
+	wol->supported = 0;
+	wol->wolopts   = 0;
+
+	if (ndev->phydev)
+		phy_ethtool_get_wol(ndev->phydev, wol);
+}
+
+static int ave_ethtool_set_wol(struct net_device *ndev,
+			       struct ethtool_wolinfo *wol)
+{
+	int ret;
+
+	if (!ndev->phydev ||
+	    (wol->wolopts & (WAKE_ARP | WAKE_MAGICSECURE)))
+		return -EOPNOTSUPP;
+
+	ret = phy_ethtool_set_wol(ndev->phydev, wol);
+	if (!ret)
+		device_set_wakeup_enable(&ndev->dev, !!wol->wolopts);
+
+	return ret;
+}
+
+static void ave_ethtool_get_pauseparam(struct net_device *ndev,
+				       struct ethtool_pauseparam *pause)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+
+	pause->autoneg  = priv->pause_auto;
+	pause->rx_pause = priv->pause_rx;
+	pause->tx_pause = priv->pause_tx;
+}
+
+static int ave_ethtool_set_pauseparam(struct net_device *ndev,
+				      struct ethtool_pauseparam *pause)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	struct phy_device *phydev = ndev->phydev;
+
+	if (!phydev)
+		return -EINVAL;
+
+	priv->pause_auto = pause->autoneg;
+	priv->pause_rx   = pause->rx_pause;
+	priv->pause_tx   = pause->tx_pause;
+
+	phydev->advertising &= ~(ADVERTISED_Pause | ADVERTISED_Asym_Pause);
+	if (pause->rx_pause)
+		phydev->advertising |= ADVERTISED_Pause | ADVERTISED_Asym_Pause;
+	if (pause->tx_pause)
+		phydev->advertising ^= ADVERTISED_Asym_Pause;
+
+	if (pause->autoneg) {
+		if (netif_running(ndev))
+			phy_start_aneg(phydev);
+	}
+
+	return 0;
+}
+
+static const struct ethtool_ops ave_ethtool_ops = {
+	.get_link_ksettings	= phy_ethtool_get_link_ksettings,
+	.set_link_ksettings	= phy_ethtool_set_link_ksettings,
+	.get_drvinfo		= ave_ethtool_get_drvinfo,
+	.nway_reset		= phy_ethtool_nway_reset,
+	.get_link		= ethtool_op_get_link,
+	.get_msglevel		= ave_ethtool_get_msglevel,
+	.set_msglevel		= ave_ethtool_set_msglevel,
+	.get_wol		= ave_ethtool_get_wol,
+	.set_wol		= ave_ethtool_set_wol,
+	.get_pauseparam         = ave_ethtool_get_pauseparam,
+	.set_pauseparam         = ave_ethtool_set_pauseparam,
+};
+
+static int ave_mdiobus_read(struct mii_bus *bus, int phyid, int regnum)
+{
+	struct net_device *ndev = bus->priv;
+	struct ave_private *priv;
+	u32 mdioctl, mdiosr;
+	int ret;
+
+	priv = netdev_priv(ndev);
+
+	/* write address */
+	writel((phyid << 8) | regnum, priv->base + AVE_MDIOAR);
+
+	/* read request */
+	mdioctl = readl(priv->base + AVE_MDIOCTR);
+	writel((mdioctl | AVE_MDIOCTR_RREQ) & ~AVE_MDIOCTR_WREQ,
+	       priv->base + AVE_MDIOCTR);
+
+	ret = readl_poll_timeout(priv->base + AVE_MDIOSR, mdiosr,
+				 !(mdiosr & AVE_MDIOSR_STS), 20, 2000);
+	if (ret) {
+		netdev_err(ndev, "failed to read (phy:%d reg:%x)\n",
+			   phyid, regnum);
+		return ret;
+	}
+
+	return readl(priv->base + AVE_MDIORDR) & GENMASK(15, 0);
+}
+
+static int ave_mdiobus_write(struct mii_bus *bus, int phyid, int regnum,
+			     u16 val)
+{
+	struct net_device *ndev = bus->priv;
+	struct ave_private *priv;
+	u32 mdioctl, mdiosr;
+	int ret;
+
+	priv = netdev_priv(ndev);
+
+	/* write address */
+	writel((phyid << 8) | regnum, priv->base + AVE_MDIOAR);
+
+	/* write data */
+	writel(val, priv->base + AVE_MDIOWDR);
+
+	/* write request */
+	mdioctl = readl(priv->base + AVE_MDIOCTR);
+	writel((mdioctl | AVE_MDIOCTR_WREQ) & ~AVE_MDIOCTR_RREQ,
+	       priv->base + AVE_MDIOCTR);
+
+	ret = readl_poll_timeout(priv->base + AVE_MDIOSR, mdiosr,
+				 !(mdiosr & AVE_MDIOSR_STS), 20, 2000);
+	if (ret)
+		netdev_err(ndev, "failed to write (phy:%d reg:%x)\n",
+			   phyid, regnum);
+
+	return ret;
+}
+
+static int ave_dma_map(struct net_device *ndev, struct ave_desc *desc,
+		       void *ptr, size_t len, enum dma_data_direction dir,
+		       dma_addr_t *paddr)
+{
+	dma_addr_t map_addr;
+
+	map_addr = dma_map_single(ndev->dev.parent, ptr, len, dir);
+	if (unlikely(dma_mapping_error(ndev->dev.parent, map_addr)))
+		return -ENOMEM;
+
+	desc->skbs_dma = map_addr;
+	desc->skbs_dmalen = len;
+	*paddr = map_addr;
+
+	return 0;
+}
+
+static void ave_dma_unmap(struct net_device *ndev, struct ave_desc *desc,
+			  enum dma_data_direction dir)
+{
+	if (!desc->skbs_dma)
+		return;
+
+	dma_unmap_single(ndev->dev.parent,
+			 desc->skbs_dma, desc->skbs_dmalen, dir);
+	desc->skbs_dma = 0;
+}
+
+/* Prepare Rx descriptor and memory */
+static int ave_rxdesc_prepare(struct net_device *ndev, int entry)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	struct sk_buff *skb;
+	dma_addr_t paddr;
+	int ret;
+
+	skb = priv->rx.desc[entry].skbs;
+	if (!skb) {
+		skb = netdev_alloc_skb_ip_align(ndev,
+						AVE_MAX_ETHFRAME);
+		if (!skb) {
+			netdev_err(ndev, "can't allocate skb for Rx\n");
+			return -ENOMEM;
+		}
+	}
+
+	/* set disable to cmdsts */
+	ave_desc_write_cmdsts(ndev, AVE_DESCID_RX, entry,
+			      AVE_STS_INTR | AVE_STS_OWN);
+
+	/* map Rx buffer
+	 * Rx buffer set to the Rx descriptor has two restrictions:
+	 * - Rx buffer address is 4 byte aligned.
+	 * - Rx buffer begins with 2 byte headroom, and data will be put from
+	 *   (buffer + 2).
+	 * To satisfy this, specify the address to put back the buffer
+	 * pointer advanced by NET_IP_ALIGN by netdev_alloc_skb_ip_align(),
+	 * and expand the map size by NET_IP_ALIGN.
+	 */
+	ret = ave_dma_map(ndev, &priv->rx.desc[entry],
+			  skb->data - NET_IP_ALIGN,
+			  AVE_MAX_ETHFRAME + NET_IP_ALIGN,
+			  DMA_FROM_DEVICE, &paddr);
+	if (ret) {
+		netdev_err(ndev, "can't map skb for Rx\n");
+		dev_kfree_skb_any(skb);
+		return ret;
+	}
+	priv->rx.desc[entry].skbs = skb;
+
+	/* set buffer pointer */
+	ave_desc_write_addr(ndev, AVE_DESCID_RX, entry, paddr);
+
+	/* set enable to cmdsts */
+	ave_desc_write_cmdsts(ndev, AVE_DESCID_RX, entry,
+			      AVE_STS_INTR | AVE_MAX_ETHFRAME);
+
+	return ret;
+}
+
+/* Switch state of descriptor */
+static int ave_desc_switch(struct net_device *ndev, enum desc_state state)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	int ret = 0;
+	u32 val;
+
+	switch (state) {
+	case AVE_DESC_START:
+		writel(AVE_DESCC_TD | AVE_DESCC_RD0, priv->base + AVE_DESCC);
+		break;
+
+	case AVE_DESC_STOP:
+		writel(0, priv->base + AVE_DESCC);
+		if (readl_poll_timeout(priv->base + AVE_DESCC, val, !val,
+				       150, 15000)) {
+			netdev_err(ndev, "can't stop descriptor\n");
+			ret = -EBUSY;
+		}
+		break;
+
+	case AVE_DESC_RX_SUSPEND:
+		val = readl(priv->base + AVE_DESCC);
+		val |= AVE_DESCC_RDSTP;
+		val &= ~AVE_DESCC_STATUS_MASK;
+		writel(val, priv->base + AVE_DESCC);
+		if (readl_poll_timeout(priv->base + AVE_DESCC, val,
+				       val & (AVE_DESCC_RDSTP << 16),
+				       150, 150000)) {
+			netdev_err(ndev, "can't suspend descriptor\n");
+			ret = -EBUSY;
+		}
+		break;
+
+	case AVE_DESC_RX_PERMIT:
+		val = readl(priv->base + AVE_DESCC);
+		val &= ~AVE_DESCC_RDSTP;
+		val &= ~AVE_DESCC_STATUS_MASK;
+		writel(val, priv->base + AVE_DESCC);
+		break;
+
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
+static int ave_tx_complete(struct net_device *ndev)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	u32 proc_idx, done_idx, ndesc, cmdsts;
+	unsigned int nr_freebuf = 0;
+	unsigned int tx_packets = 0;
+	unsigned int tx_bytes = 0;
+
+	proc_idx = priv->tx.proc_idx;
+	done_idx = priv->tx.done_idx;
+	ndesc    = priv->tx.ndesc;
+
+	/* free pre-stored skb from done_idx to proc_idx */
+	while (proc_idx != done_idx) {
+		cmdsts = ave_desc_read_cmdsts(ndev, AVE_DESCID_TX, done_idx);
+
+		/* do nothing if owner is HW (==1 for Tx) */
+		if (cmdsts & AVE_STS_OWN)
+			break;
+
+		/* check Tx status and updates statistics */
+		if (cmdsts & AVE_STS_OK) {
+			tx_bytes += cmdsts & AVE_STS_PKTLEN_TX_MASK;
+			/* success */
+			if (cmdsts & AVE_STS_LAST)
+				tx_packets++;
+		} else {
+			/* error */
+			if (cmdsts & AVE_STS_LAST) {
+				priv->stats_tx.errors++;
+				if (cmdsts & (AVE_STS_OWC | AVE_STS_EC))
+					priv->stats_tx.collisions++;
+			}
+		}
+
+		/* release skb */
+		if (priv->tx.desc[done_idx].skbs) {
+			ave_dma_unmap(ndev, &priv->tx.desc[done_idx],
+				      DMA_TO_DEVICE);
+			dev_consume_skb_any(priv->tx.desc[done_idx].skbs);
+			priv->tx.desc[done_idx].skbs = NULL;
+			nr_freebuf++;
+		}
+		done_idx = (done_idx + 1) % ndesc;
+	}
+
+	priv->tx.done_idx = done_idx;
+
+	/* update stats */
+	u64_stats_update_begin(&priv->stats_tx.syncp);
+	priv->stats_tx.packets += tx_packets;
+	priv->stats_tx.bytes   += tx_bytes;
+	u64_stats_update_end(&priv->stats_tx.syncp);
+
+	/* wake queue for freeing buffer */
+	if (unlikely(netif_queue_stopped(ndev)) && nr_freebuf)
+		netif_wake_queue(ndev);
+
+	return nr_freebuf;
+}
+
+static int ave_rx_receive(struct net_device *ndev, int num)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	unsigned int rx_packets = 0;
+	unsigned int rx_bytes = 0;
+	u32 proc_idx, done_idx;
+	struct sk_buff *skb;
+	unsigned int pktlen;
+	int restpkt, npkts;
+	u32 ndesc, cmdsts;
+
+	proc_idx = priv->rx.proc_idx;
+	done_idx = priv->rx.done_idx;
+	ndesc    = priv->rx.ndesc;
+	restpkt  = ((proc_idx + ndesc - 1) - done_idx) % ndesc;
+
+	for (npkts = 0; npkts < num; npkts++) {
+		/* we can't receive more packet, so fill desc quickly */
+		if (--restpkt < 0)
+			break;
+
+		cmdsts = ave_desc_read_cmdsts(ndev, AVE_DESCID_RX, proc_idx);
+
+		/* do nothing if owner is HW (==0 for Rx) */
+		if (!(cmdsts & AVE_STS_OWN))
+			break;
+
+		if (!(cmdsts & AVE_STS_OK)) {
+			priv->stats_rx.errors++;
+			proc_idx = (proc_idx + 1) % ndesc;
+			continue;
+		}
+
+		pktlen = cmdsts & AVE_STS_PKTLEN_RX_MASK;
+
+		/* get skbuff for rx */
+		skb = priv->rx.desc[proc_idx].skbs;
+		priv->rx.desc[proc_idx].skbs = NULL;
+
+		ave_dma_unmap(ndev, &priv->rx.desc[proc_idx], DMA_FROM_DEVICE);
+
+		skb->dev = ndev;
+		skb_put(skb, pktlen);
+		skb->protocol = eth_type_trans(skb, ndev);
+
+		if ((cmdsts & AVE_STS_CSSV) && (!(cmdsts & AVE_STS_CSER)))
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+		rx_packets++;
+		rx_bytes += pktlen;
+
+		netif_receive_skb(skb);
+
+		proc_idx = (proc_idx + 1) % ndesc;
+	}
+
+	priv->rx.proc_idx = proc_idx;
+
+	/* update stats */
+	u64_stats_update_begin(&priv->stats_rx.syncp);
+	priv->stats_rx.packets += rx_packets;
+	priv->stats_rx.bytes   += rx_bytes;
+	u64_stats_update_end(&priv->stats_rx.syncp);
+
+	/* refill the Rx buffers */
+	while (proc_idx != done_idx) {
+		if (ave_rxdesc_prepare(ndev, done_idx))
+			break;
+		done_idx = (done_idx + 1) % ndesc;
+	}
+
+	priv->rx.done_idx = done_idx;
+
+	return npkts;
+}
+
+static int ave_napi_poll_rx(struct napi_struct *napi, int budget)
+{
+	struct ave_private *priv;
+	struct net_device *ndev;
+	int num;
+
+	priv = container_of(napi, struct ave_private, napi_rx);
+	ndev = priv->ndev;
+
+	num = ave_rx_receive(ndev, budget);
+	if (num < budget) {
+		napi_complete_done(napi, num);
+
+		/* enable Rx interrupt when NAPI finishes */
+		ave_irq_enable(ndev, AVE_GI_RXIINT);
+	}
+
+	return num;
+}
+
+static int ave_napi_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct ave_private *priv;
+	struct net_device *ndev;
+	int num;
+
+	priv = container_of(napi, struct ave_private, napi_tx);
+	ndev = priv->ndev;
+
+	num = ave_tx_complete(ndev);
+	napi_complete(napi);
+
+	/* enable Tx interrupt when NAPI finishes */
+	ave_irq_enable(ndev, AVE_GI_TX);
+
+	return num;
+}
+
+static void ave_global_reset(struct net_device *ndev)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* set config register */
+	val = AVE_CFGR_FLE | AVE_CFGR_IPFCEN | AVE_CFGR_CHE;
+	if (!phy_interface_mode_is_rgmii(priv->phy_mode))
+		val |= AVE_CFGR_MII;
+	writel(val, priv->base + AVE_CFGR);
+
+	/* reset RMII register */
+	val = readl(priv->base + AVE_RSTCTRL);
+	val &= ~AVE_RSTCTRL_RMIIRST;
+	writel(val, priv->base + AVE_RSTCTRL);
+
+	/* assert reset */
+	writel(AVE_GRR_GRST | AVE_GRR_PHYRST, priv->base + AVE_GRR);
+	msleep(20);
+
+	/* 1st, negate PHY reset only */
+	writel(AVE_GRR_GRST, priv->base + AVE_GRR);
+	msleep(40);
+
+	/* negate reset */
+	writel(0, priv->base + AVE_GRR);
+	msleep(40);
+
+	/* negate RMII register */
+	val = readl(priv->base + AVE_RSTCTRL);
+	val |= AVE_RSTCTRL_RMIIRST;
+	writel(val, priv->base + AVE_RSTCTRL);
+
+	ave_irq_disable_all(ndev);
+}
+
+static void ave_rxfifo_reset(struct net_device *ndev)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	u32 rxcr_org;
+
+	/* save and disable MAC receive op */
+	rxcr_org = readl(priv->base + AVE_RXCR);
+	writel(rxcr_org & (~AVE_RXCR_RXEN), priv->base + AVE_RXCR);
+
+	/* suspend Rx descriptor */
+	ave_desc_switch(ndev, AVE_DESC_RX_SUSPEND);
+
+	/* receive all packets before descriptor starts */
+	ave_rx_receive(ndev, priv->rx.ndesc);
+
+	/* assert reset */
+	writel(AVE_GRR_RXFFR, priv->base + AVE_GRR);
+	usleep_range(40, 50);
+
+	/* negate reset */
+	writel(0, priv->base + AVE_GRR);
+	usleep_range(10, 20);
+
+	/* negate interrupt status */
+	writel(AVE_GI_RXOVF, priv->base + AVE_GISR);
+
+	/* permit descriptor */
+	ave_desc_switch(ndev, AVE_DESC_RX_PERMIT);
+
+	/* restore MAC reccieve op */
+	writel(rxcr_org, priv->base + AVE_RXCR);
+}
+
+static irqreturn_t ave_irq_handler(int irq, void *netdev)
+{
+	struct net_device *ndev = (struct net_device *)netdev;
+	struct ave_private *priv = netdev_priv(ndev);
+	u32 gimr_val, gisr_val;
+
+	gimr_val = ave_irq_disable_all(ndev);
+
+	/* get interrupt status */
+	gisr_val = readl(priv->base + AVE_GISR);
+
+	/* PHY */
+	if (gisr_val & AVE_GI_PHY)
+		writel(AVE_GI_PHY, priv->base + AVE_GISR);
+
+	/* check exceeding packet */
+	if (gisr_val & AVE_GI_RXERR) {
+		writel(AVE_GI_RXERR, priv->base + AVE_GISR);
+		netdev_err(ndev, "receive a packet exceeding frame buffer\n");
+	}
+
+	gisr_val &= gimr_val;
+	if (!gisr_val)
+		goto exit_isr;
+
+	/* RxFIFO overflow */
+	if (gisr_val & AVE_GI_RXOVF) {
+		priv->stats_rx.fifo_errors++;
+		ave_rxfifo_reset(ndev);
+		goto exit_isr;
+	}
+
+	/* Rx drop */
+	if (gisr_val & AVE_GI_RXDROP) {
+		priv->stats_rx.dropped++;
+		writel(AVE_GI_RXDROP, priv->base + AVE_GISR);
+	}
+
+	/* Rx interval */
+	if (gisr_val & AVE_GI_RXIINT) {
+		napi_schedule(&priv->napi_rx);
+		/* still force to disable Rx interrupt until NAPI finishes */
+		gimr_val &= ~AVE_GI_RXIINT;
+	}
+
+	/* Tx completed */
+	if (gisr_val & AVE_GI_TX) {
+		napi_schedule(&priv->napi_tx);
+		/* still force to disable Tx interrupt until NAPI finishes */
+		gimr_val &= ~AVE_GI_TX;
+	}
+
+exit_isr:
+	ave_irq_restore(ndev, gimr_val);
+
+	return IRQ_HANDLED;
+}
+
+static int ave_pfsel_start(struct net_device *ndev, unsigned int entry)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	u32 val;
+
+	if (WARN_ON(entry > AVE_PF_SIZE))
+		return -EINVAL;
+
+	val = readl(priv->base + AVE_PFEN);
+	writel(val | BIT(entry), priv->base + AVE_PFEN);
+
+	return 0;
+}
+
+static int ave_pfsel_stop(struct net_device *ndev, unsigned int entry)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	u32 val;
+
+	if (WARN_ON(entry > AVE_PF_SIZE))
+		return -EINVAL;
+
+	val = readl(priv->base + AVE_PFEN);
+	writel(val & ~BIT(entry), priv->base + AVE_PFEN);
+
+	return 0;
+}
+
+static int ave_pfsel_set_macaddr(struct net_device *ndev,
+				 unsigned int entry,
+				 const unsigned char *mac_addr,
+				 unsigned int set_size)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+
+	if (WARN_ON(entry > AVE_PF_SIZE))
+		return -EINVAL;
+	if (WARN_ON(set_size > 6))
+		return -EINVAL;
+
+	ave_pfsel_stop(ndev, entry);
+
+	/* set MAC address for the filter */
+	ave_hw_write_macaddr(ndev, mac_addr,
+			     AVE_PKTF(entry), AVE_PKTF(entry) + 4);
+
+	/* set byte mask */
+	writel(GENMASK(31, set_size) & AVE_PFMBYTE_MASK0,
+	       priv->base + AVE_PFMBYTE(entry));
+	writel(AVE_PFMBYTE_MASK1, priv->base + AVE_PFMBYTE(entry) + 4);
+
+	/* set bit mask filter */
+	writel(AVE_PFMBIT_MASK, priv->base + AVE_PFMBIT(entry));
+
+	/* set selector to ring 0 */
+	writel(0, priv->base + AVE_PFSEL(entry));
+
+	/* restart filter */
+	ave_pfsel_start(ndev, entry);
+
+	return 0;
+}
+
+static void ave_pfsel_set_promisc(struct net_device *ndev,
+				  unsigned int entry, u32 rxring)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+
+	if (WARN_ON(entry > AVE_PF_SIZE))
+		return;
+
+	ave_pfsel_stop(ndev, entry);
+
+	/* set byte mask */
+	writel(AVE_PFMBYTE_MASK0, priv->base + AVE_PFMBYTE(entry));
+	writel(AVE_PFMBYTE_MASK1, priv->base + AVE_PFMBYTE(entry) + 4);
+
+	/* set bit mask filter */
+	writel(AVE_PFMBIT_MASK, priv->base + AVE_PFMBIT(entry));
+
+	/* set selector to rxring */
+	writel(rxring, priv->base + AVE_PFSEL(entry));
+
+	ave_pfsel_start(ndev, entry);
+}
+
+static void ave_pfsel_init(struct net_device *ndev)
+{
+	unsigned char bcast_mac[ETH_ALEN];
+	int i;
+
+	eth_broadcast_addr(bcast_mac);
+
+	for (i = 0; i < AVE_PF_SIZE; i++)
+		ave_pfsel_stop(ndev, i);
+
+	/* promiscious entry, select ring 0 */
+	ave_pfsel_set_promisc(ndev, AVE_PFNUM_FILTER, 0);
+
+	/* unicast entry */
+	ave_pfsel_set_macaddr(ndev, AVE_PFNUM_UNICAST, ndev->dev_addr, 6);
+
+	/* broadcast entry */
+	ave_pfsel_set_macaddr(ndev, AVE_PFNUM_BROADCAST, bcast_mac, 6);
+}
+
+static void ave_phy_adjust_link(struct net_device *ndev)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	struct phy_device *phydev = ndev->phydev;
+	u32 val, txcr, rxcr, rxcr_org;
+	u16 rmt_adv = 0, lcl_adv = 0;
+	u8 cap;
+
+	/* set RGMII speed */
+	val = readl(priv->base + AVE_TXCR);
+	val &= ~(AVE_TXCR_TXSPD_100 | AVE_TXCR_TXSPD_1G);
+
+	if (phy_interface_is_rgmii(phydev) && phydev->speed == SPEED_1000)
+		val |= AVE_TXCR_TXSPD_1G;
+	else if (phydev->speed == SPEED_100)
+		val |= AVE_TXCR_TXSPD_100;
+
+	writel(val, priv->base + AVE_TXCR);
+
+	/* set RMII speed (100M/10M only) */
+	if (!phy_interface_is_rgmii(phydev)) {
+		val = readl(priv->base + AVE_LINKSEL);
+		if (phydev->speed == SPEED_10)
+			val &= ~AVE_LINKSEL_100M;
+		else
+			val |= AVE_LINKSEL_100M;
+		writel(val, priv->base + AVE_LINKSEL);
+	}
+
+	/* check current RXCR/TXCR */
+	rxcr = readl(priv->base + AVE_RXCR);
+	txcr = readl(priv->base + AVE_TXCR);
+	rxcr_org = rxcr;
+
+	if (phydev->duplex) {
+		rxcr |= AVE_RXCR_FDUPEN;
+
+		if (phydev->pause)
+			rmt_adv |= LPA_PAUSE_CAP;
+		if (phydev->asym_pause)
+			rmt_adv |= LPA_PAUSE_ASYM;
+		if (phydev->advertising & ADVERTISED_Pause)
+			lcl_adv |= ADVERTISE_PAUSE_CAP;
+		if (phydev->advertising & ADVERTISED_Asym_Pause)
+			lcl_adv |= ADVERTISE_PAUSE_ASYM;
+
+		cap = mii_resolve_flowctrl_fdx(lcl_adv, rmt_adv);
+		if (cap & FLOW_CTRL_TX)
+			txcr |= AVE_TXCR_FLOCTR;
+		else
+			txcr &= ~AVE_TXCR_FLOCTR;
+		if (cap & FLOW_CTRL_RX)
+			rxcr |= AVE_RXCR_FLOCTR;
+		else
+			rxcr &= ~AVE_RXCR_FLOCTR;
+	} else {
+		rxcr &= ~AVE_RXCR_FDUPEN;
+		rxcr &= ~AVE_RXCR_FLOCTR;
+		txcr &= ~AVE_TXCR_FLOCTR;
+	}
+
+	if (rxcr_org != rxcr) {
+		/* disable Rx mac */
+		writel(rxcr & ~AVE_RXCR_RXEN, priv->base + AVE_RXCR);
+		/* change and enable TX/Rx mac */
+		writel(txcr, priv->base + AVE_TXCR);
+		writel(rxcr, priv->base + AVE_RXCR);
+	}
+
+	phy_print_status(phydev);
+}
+
+static void ave_macaddr_init(struct net_device *ndev)
+{
+	ave_hw_write_macaddr(ndev, ndev->dev_addr, AVE_RXMAC1R, AVE_RXMAC2R);
+
+	/* pfsel unicast entry */
+	ave_pfsel_set_macaddr(ndev, AVE_PFNUM_UNICAST, ndev->dev_addr, 6);
+}
+
+static int ave_init(struct net_device *ndev)
+{
+	struct ethtool_wolinfo wol = { .cmd = ETHTOOL_GWOL };
+	struct ave_private *priv = netdev_priv(ndev);
+	struct device *dev = ndev->dev.parent;
+	struct device_node *np = dev->of_node;
+	struct device_node *mdio_np;
+	struct phy_device *phydev;
+	int ret;
+
+	/* enable clk because of hw access until ndo_open */
+	ret = clk_prepare_enable(priv->clk);
+	if (ret) {
+		dev_err(dev, "can't enable clock\n");
+		return ret;
+	}
+	ret = reset_control_deassert(priv->rst);
+	if (ret) {
+		dev_err(dev, "can't deassert reset\n");
+		goto out_clk_disable;
+	}
+
+	ave_global_reset(ndev);
+
+	mdio_np = of_get_child_by_name(np, "mdio");
+	if (!mdio_np) {
+		dev_err(dev, "mdio node not found\n");
+		ret = -EINVAL;
+		goto out_reset_assert;
+	}
+	ret = of_mdiobus_register(priv->mdio, mdio_np);
+	of_node_put(mdio_np);
+	if (ret) {
+		dev_err(dev, "failed to register mdiobus\n");
+		goto out_reset_assert;
+	}
+
+	phydev = of_phy_get_and_connect(ndev, np, ave_phy_adjust_link);
+	if (!phydev) {
+		dev_err(dev, "could not attach to PHY\n");
+		ret = -ENODEV;
+		goto out_mdio_unregister;
+	}
+
+	priv->phydev = phydev;
+
+	phy_ethtool_get_wol(phydev, &wol);
+	device_set_wakeup_capable(&ndev->dev, !!wol.supported);
+
+	if (!phy_interface_is_rgmii(phydev)) {
+		phydev->supported &= ~PHY_GBIT_FEATURES;
+		phydev->supported |= PHY_BASIC_FEATURES;
+	}
+	phydev->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+
+	phy_attached_info(phydev);
+
+	return 0;
+
+out_mdio_unregister:
+	mdiobus_unregister(priv->mdio);
+out_reset_assert:
+	reset_control_assert(priv->rst);
+out_clk_disable:
+	clk_disable_unprepare(priv->clk);
+
+	return ret;
+}
+
+static void ave_uninit(struct net_device *ndev)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+
+	phy_disconnect(priv->phydev);
+	mdiobus_unregister(priv->mdio);
+
+	/* disable clk because of hw access after ndo_stop */
+	reset_control_assert(priv->rst);
+	clk_disable_unprepare(priv->clk);
+}
+
+static int ave_open(struct net_device *ndev)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	int entry;
+	int ret;
+	u32 val;
+
+	ret = request_irq(priv->irq, ave_irq_handler, IRQF_SHARED, ndev->name,
+			  ndev);
+	if (ret)
+		return ret;
+
+	priv->tx.desc = kcalloc(priv->tx.ndesc, sizeof(*priv->tx.desc),
+				GFP_KERNEL);
+	if (!priv->tx.desc) {
+		ret = -ENOMEM;
+		goto out_free_irq;
+	}
+
+	priv->rx.desc = kcalloc(priv->rx.ndesc, sizeof(*priv->rx.desc),
+				GFP_KERNEL);
+	if (!priv->rx.desc) {
+		kfree(priv->tx.desc);
+		ret = -ENOMEM;
+		goto out_free_irq;
+	}
+
+	/* initialize Tx work and descriptor */
+	priv->tx.proc_idx = 0;
+	priv->tx.done_idx = 0;
+	for (entry = 0; entry < priv->tx.ndesc; entry++) {
+		ave_desc_write_cmdsts(ndev, AVE_DESCID_TX, entry, 0);
+		ave_desc_write_addr(ndev, AVE_DESCID_TX, entry, 0);
+	}
+	writel(AVE_TXDC_ADDR_START
+		| (((priv->tx.ndesc * priv->desc_size) << 16) & AVE_TXDC_SIZE),
+		priv->base + AVE_TXDC);
+
+	/* initialize Rx work and descriptor */
+	priv->rx.proc_idx = 0;
+	priv->rx.done_idx = 0;
+	for (entry = 0; entry < priv->rx.ndesc; entry++) {
+		if (ave_rxdesc_prepare(ndev, entry))
+			break;
+	}
+	writel(AVE_RXDC0_ADDR_START
+	       | (((priv->rx.ndesc * priv->desc_size) << 16) & AVE_RXDC0_SIZE),
+	       priv->base + AVE_RXDC0);
+
+	ave_desc_switch(ndev, AVE_DESC_START);
+
+	ave_pfsel_init(ndev);
+	ave_macaddr_init(ndev);
+
+	/* set Rx configuration */
+	/* full duplex, enable pause drop, enalbe flow control */
+	val = AVE_RXCR_RXEN | AVE_RXCR_FDUPEN | AVE_RXCR_DRPEN |
+		AVE_RXCR_FLOCTR | (AVE_MAX_ETHFRAME & AVE_RXCR_MPSIZ_MASK);
+	writel(val, priv->base + AVE_RXCR);
+
+	/* set Tx configuration */
+	/* enable flow control, disable loopback */
+	writel(AVE_TXCR_FLOCTR, priv->base + AVE_TXCR);
+
+	/* enable timer, clear EN,INTM, and mask interval unit(BSCK) */
+	val = readl(priv->base + AVE_IIRQC) & AVE_IIRQC_BSCK;
+	val |= AVE_IIRQC_EN0 | (AVE_INTM_COUNT << 16);
+	writel(val, priv->base + AVE_IIRQC);
+
+	val = AVE_GI_RXIINT | AVE_GI_RXOVF | AVE_GI_TX;
+	ave_irq_restore(ndev, val);
+
+	napi_enable(&priv->napi_rx);
+	napi_enable(&priv->napi_tx);
+
+	phy_start(ndev->phydev);
+	phy_start_aneg(ndev->phydev);
+	netif_start_queue(ndev);
+
+	return 0;
+
+out_free_irq:
+	disable_irq(priv->irq);
+	free_irq(priv->irq, ndev);
+
+	return ret;
+}
+
+static int ave_stop(struct net_device *ndev)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	int entry;
+
+	ave_irq_disable_all(ndev);
+	disable_irq(priv->irq);
+	free_irq(priv->irq, ndev);
+
+	netif_tx_disable(ndev);
+	phy_stop(ndev->phydev);
+	napi_disable(&priv->napi_tx);
+	napi_disable(&priv->napi_rx);
+
+	ave_desc_switch(ndev, AVE_DESC_STOP);
+
+	/* free Tx buffer */
+	for (entry = 0; entry < priv->tx.ndesc; entry++) {
+		if (!priv->tx.desc[entry].skbs)
+			continue;
+
+		ave_dma_unmap(ndev, &priv->tx.desc[entry], DMA_TO_DEVICE);
+		dev_kfree_skb_any(priv->tx.desc[entry].skbs);
+		priv->tx.desc[entry].skbs = NULL;
+	}
+	priv->tx.proc_idx = 0;
+	priv->tx.done_idx = 0;
+
+	/* free Rx buffer */
+	for (entry = 0; entry < priv->rx.ndesc; entry++) {
+		if (!priv->rx.desc[entry].skbs)
+			continue;
+
+		ave_dma_unmap(ndev, &priv->rx.desc[entry], DMA_FROM_DEVICE);
+		dev_kfree_skb_any(priv->rx.desc[entry].skbs);
+		priv->rx.desc[entry].skbs = NULL;
+	}
+	priv->rx.proc_idx = 0;
+	priv->rx.done_idx = 0;
+
+	kfree(priv->tx.desc);
+	kfree(priv->rx.desc);
+
+	return 0;
+}
+
+static int ave_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	u32 proc_idx, done_idx, ndesc, cmdsts;
+	int ret, freepkt;
+	dma_addr_t paddr;
+
+	proc_idx = priv->tx.proc_idx;
+	done_idx = priv->tx.done_idx;
+	ndesc = priv->tx.ndesc;
+	freepkt = ((done_idx + ndesc - 1) - proc_idx) % ndesc;
+
+	/* stop queue when not enough entry */
+	if (unlikely(freepkt < 1)) {
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	/* add padding for short packet */
+	if (skb_put_padto(skb, ETH_ZLEN)) {
+		priv->stats_tx.dropped++;
+		return NETDEV_TX_OK;
+	}
+
+	/* map Tx buffer
+	 * Tx buffer set to the Tx descriptor doesn't have any restriction.
+	 */
+	ret = ave_dma_map(ndev, &priv->tx.desc[proc_idx],
+			  skb->data, skb->len, DMA_TO_DEVICE, &paddr);
+	if (ret) {
+		dev_kfree_skb_any(skb);
+		priv->stats_tx.dropped++;
+		return NETDEV_TX_OK;
+	}
+
+	priv->tx.desc[proc_idx].skbs = skb;
+
+	ave_desc_write_addr(ndev, AVE_DESCID_TX, proc_idx, paddr);
+
+	cmdsts = AVE_STS_OWN | AVE_STS_1ST | AVE_STS_LAST
+		| (skb->len & AVE_STS_PKTLEN_TX_MASK);
+
+	/* set interrupt per AVE_FORCE_TXINTCNT or when queue is stopped */
+	if (!(proc_idx % AVE_FORCE_TXINTCNT) || netif_queue_stopped(ndev))
+		cmdsts |= AVE_STS_INTR;
+
+	/* disable checksum calculation when skb doesn't calurate checksum */
+	if (skb->ip_summed == CHECKSUM_NONE ||
+	    skb->ip_summed == CHECKSUM_UNNECESSARY)
+		cmdsts |= AVE_STS_NOCSUM;
+
+	ave_desc_write_cmdsts(ndev, AVE_DESCID_TX, proc_idx, cmdsts);
+
+	priv->tx.proc_idx = (proc_idx + 1) % ndesc;
+
+	return NETDEV_TX_OK;
+}
+
+static int ave_ioctl(struct net_device *ndev, struct ifreq *ifr, int cmd)
+{
+	return phy_mii_ioctl(ndev->phydev, ifr, cmd);
+}
+
+static const u8 v4multi_macadr[] = { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00 };
+static const u8 v6multi_macadr[] = { 0x33, 0x00, 0x00, 0x00, 0x00, 0x00 };
+
+static void ave_set_rx_mode(struct net_device *ndev)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	struct netdev_hw_addr *hw_adr;
+	int count, mc_cnt;
+	u32 val;
+
+	/* MAC addr filter enable for promiscious mode */
+	mc_cnt = netdev_mc_count(ndev);
+	val = readl(priv->base + AVE_RXCR);
+	if (ndev->flags & IFF_PROMISC || !mc_cnt)
+		val &= ~AVE_RXCR_AFEN;
+	else
+		val |= AVE_RXCR_AFEN;
+	writel(val, priv->base + AVE_RXCR);
+
+	/* set all multicast address */
+	if ((ndev->flags & IFF_ALLMULTI) || mc_cnt > AVE_PF_MULTICAST_SIZE) {
+		ave_pfsel_set_macaddr(ndev, AVE_PFNUM_MULTICAST,
+				      v4multi_macadr, 1);
+		ave_pfsel_set_macaddr(ndev, AVE_PFNUM_MULTICAST + 1,
+				      v6multi_macadr, 1);
+	} else {
+		/* stop all multicast filter */
+		for (count = 0; count < AVE_PF_MULTICAST_SIZE; count++)
+			ave_pfsel_stop(ndev, AVE_PFNUM_MULTICAST + count);
+
+		/* set multicast addresses */
+		count = 0;
+		netdev_for_each_mc_addr(hw_adr, ndev) {
+			if (count == mc_cnt)
+				break;
+			ave_pfsel_set_macaddr(ndev, AVE_PFNUM_MULTICAST + count,
+					      hw_adr->addr, 6);
+			count++;
+		}
+	}
+}
+
+static void ave_get_stats64(struct net_device *ndev,
+			    struct rtnl_link_stats64 *stats)
+{
+	struct ave_private *priv = netdev_priv(ndev);
+	unsigned int start;
+
+	do {
+		start = u64_stats_fetch_begin_irq(&priv->stats_rx.syncp);
+		stats->rx_packets = priv->stats_rx.packets;
+		stats->rx_bytes	  = priv->stats_rx.bytes;
+	} while (u64_stats_fetch_retry_irq(&priv->stats_rx.syncp, start));
+
+	do {
+		start = u64_stats_fetch_begin_irq(&priv->stats_tx.syncp);
+		stats->tx_packets = priv->stats_tx.packets;
+		stats->tx_bytes	  = priv->stats_tx.bytes;
+	} while (u64_stats_fetch_retry_irq(&priv->stats_tx.syncp, start));
+
+	stats->rx_errors      = priv->stats_rx.errors;
+	stats->tx_errors      = priv->stats_tx.errors;
+	stats->rx_dropped     = priv->stats_rx.dropped;
+	stats->tx_dropped     = priv->stats_tx.dropped;
+	stats->rx_fifo_errors = priv->stats_rx.fifo_errors;
+	stats->collisions     = priv->stats_tx.collisions;
+}
+
+static int ave_set_mac_address(struct net_device *ndev, void *p)
+{
+	int ret = eth_mac_addr(ndev, p);
+
+	if (ret)
+		return ret;
+
+	ave_macaddr_init(ndev);
+
+	return 0;
+}
+
+static const struct net_device_ops ave_netdev_ops = {
+	.ndo_init		= ave_init,
+	.ndo_uninit		= ave_uninit,
+	.ndo_open		= ave_open,
+	.ndo_stop		= ave_stop,
+	.ndo_start_xmit		= ave_start_xmit,
+	.ndo_do_ioctl		= ave_ioctl,
+	.ndo_set_rx_mode	= ave_set_rx_mode,
+	.ndo_get_stats64	= ave_get_stats64,
+	.ndo_set_mac_address	= ave_set_mac_address,
+};
+
+static int ave_probe(struct platform_device *pdev)
+{
+	const struct ave_soc_data *data;
+	struct device *dev = &pdev->dev;
+	char buf[ETHTOOL_FWVERS_LEN];
+	phy_interface_t phy_mode;
+	struct ave_private *priv;
+	struct net_device *ndev;
+	struct device_node *np;
+	struct resource	*res;
+	const void *mac_addr;
+	void __iomem *base;
+	u64 dma_mask;
+	int irq, ret;
+	u32 ave_id;
+
+	data = of_device_get_match_data(dev);
+	if (WARN_ON(!data))
+		return -EINVAL;
+
+	np = dev->of_node;
+	phy_mode = of_get_phy_mode(np);
+	if (phy_mode < 0) {
+		dev_err(dev, "phy-mode not found\n");
+		return -EINVAL;
+	}
+	if ((!phy_interface_mode_is_rgmii(phy_mode)) &&
+	    phy_mode != PHY_INTERFACE_MODE_RMII &&
+	    phy_mode != PHY_INTERFACE_MODE_MII) {
+		dev_err(dev, "phy-mode is invalid\n");
+		return -EINVAL;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		dev_err(dev, "IRQ not found\n");
+		return irq;
+	}
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	base = devm_ioremap_resource(dev, res);
+	if (IS_ERR(base))
+		return PTR_ERR(base);
+
+	ndev = alloc_etherdev(sizeof(struct ave_private));
+	if (!ndev) {
+		dev_err(dev, "can't allocate ethernet device\n");
+		return -ENOMEM;
+	}
+
+	ndev->netdev_ops = &ave_netdev_ops;
+	ndev->ethtool_ops = &ave_ethtool_ops;
+	SET_NETDEV_DEV(ndev, dev);
+
+	ndev->features    |= (NETIF_F_IP_CSUM | NETIF_F_RXCSUM);
+	ndev->hw_features |= (NETIF_F_IP_CSUM | NETIF_F_RXCSUM);
+
+	ndev->max_mtu = AVE_MAX_ETHFRAME - (ETH_HLEN + ETH_FCS_LEN);
+
+	mac_addr = of_get_mac_address(np);
+	if (mac_addr)
+		ether_addr_copy(ndev->dev_addr, mac_addr);
+
+	/* if the mac address is invalid, use random mac address */
+	if (!is_valid_ether_addr(ndev->dev_addr)) {
+		eth_hw_addr_random(ndev);
+		dev_warn(dev, "Using random MAC address: %pM\n",
+			 ndev->dev_addr);
+	}
+
+	priv = netdev_priv(ndev);
+	priv->base = base;
+	priv->irq = irq;
+	priv->ndev = ndev;
+	priv->msg_enable = netif_msg_init(-1, AVE_DEFAULT_MSG_ENABLE);
+	priv->phy_mode = phy_mode;
+	priv->data = data;
+
+	if (IS_DESC_64BIT(priv)) {
+		priv->desc_size = AVE_DESC_SIZE_64;
+		priv->tx.daddr  = AVE_TXDM_64;
+		priv->rx.daddr  = AVE_RXDM_64;
+		dma_mask = DMA_BIT_MASK(64);
+	} else {
+		priv->desc_size = AVE_DESC_SIZE_32;
+		priv->tx.daddr  = AVE_TXDM_32;
+		priv->rx.daddr  = AVE_RXDM_32;
+		dma_mask = DMA_BIT_MASK(32);
+	}
+	ret = dma_set_mask(dev, dma_mask);
+	if (ret)
+		goto out_free_netdev;
+
+	priv->tx.ndesc = AVE_NR_TXDESC;
+	priv->rx.ndesc = AVE_NR_RXDESC;
+
+	u64_stats_init(&priv->stats_tx.syncp);
+	u64_stats_init(&priv->stats_rx.syncp);
+
+	priv->clk = devm_clk_get(dev, NULL);
+	if (IS_ERR(priv->clk)) {
+		ret = PTR_ERR(priv->clk);
+		goto out_free_netdev;
+	}
+
+	priv->rst = devm_reset_control_get_optional_shared(dev, NULL);
+	if (IS_ERR(priv->rst)) {
+		ret = PTR_ERR(priv->rst);
+		goto out_free_netdev;
+	}
+
+	priv->mdio = devm_mdiobus_alloc(dev);
+	if (!priv->mdio) {
+		ret = -ENOMEM;
+		goto out_free_netdev;
+	}
+	priv->mdio->priv = ndev;
+	priv->mdio->parent = dev;
+	priv->mdio->read = ave_mdiobus_read;
+	priv->mdio->write = ave_mdiobus_write;
+	priv->mdio->name = "uniphier-mdio";
+	snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "%s-%x",
+		 pdev->name, pdev->id);
+
+	/* Register as a NAPI supported driver */
+	netif_napi_add(ndev, &priv->napi_rx, ave_napi_poll_rx, priv->rx.ndesc);
+	netif_tx_napi_add(ndev, &priv->napi_tx, ave_napi_poll_tx,
+			  priv->tx.ndesc);
+
+	platform_set_drvdata(pdev, ndev);
+
+	ret = register_netdev(ndev);
+	if (ret) {
+		dev_err(dev, "failed to register netdevice\n");
+		goto out_del_napi;
+	}
+
+	/* get ID and version */
+	ave_id = readl(priv->base + AVE_IDR);
+	ave_hw_read_version(ndev, buf, sizeof(buf));
+
+	dev_info(dev, "Socionext %c%c%c%c Ethernet IP %s (irq=%d, phy=%s)\n",
+		 (ave_id >> 24) & 0xff, (ave_id >> 16) & 0xff,
+		 (ave_id >> 8) & 0xff, (ave_id >> 0) & 0xff,
+		 buf, priv->irq, phy_modes(phy_mode));
+
+	return 0;
+
+out_del_napi:
+	netif_napi_del(&priv->napi_rx);
+	netif_napi_del(&priv->napi_tx);
+out_free_netdev:
+	free_netdev(ndev);
+
+	return ret;
+}
+
+static int ave_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct ave_private *priv = netdev_priv(ndev);
+
+	unregister_netdev(ndev);
+	netif_napi_del(&priv->napi_rx);
+	netif_napi_del(&priv->napi_tx);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct ave_soc_data ave_pro4_data = {
+	.is_desc_64bit = false,
+};
+
+static const struct ave_soc_data ave_pxs2_data = {
+	.is_desc_64bit = false,
+};
+
+static const struct ave_soc_data ave_ld11_data = {
+	.is_desc_64bit = false,
+};
+
+static const struct ave_soc_data ave_ld20_data = {
+	.is_desc_64bit = true,
+};
+
+static const struct of_device_id of_ave_match[] = {
+	{
+		.compatible = "socionext,uniphier-pro4-ave4",
+		.data = &ave_pro4_data,
+	},
+	{
+		.compatible = "socionext,uniphier-pxs2-ave4",
+		.data = &ave_pxs2_data,
+	},
+	{
+		.compatible = "socionext,uniphier-ld11-ave4",
+		.data = &ave_ld11_data,
+	},
+	{
+		.compatible = "socionext,uniphier-ld20-ave4",
+		.data = &ave_ld20_data,
+	},
+	{ /* Sentinel */ }
+};
+MODULE_DEVICE_TABLE(of, of_ave_match);
+
+static struct platform_driver ave_driver = {
+	.probe  = ave_probe,
+	.remove = ave_remove,
+	.driver	= {
+		.name = "ave",
+		.of_match_table	= of_ave_match,
+	},
+};
+module_platform_driver(ave_driver);
+
+MODULE_DESCRIPTION("Socionext UniPhier AVE ethernet driver");
+MODULE_LICENSE("GPL v2");
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH net-next v8 1/2] dt-bindings: net: add DT bindings for Socionext UniPhier AVE
From: Kunihiko Hayashi @ 2017-12-25  1:10 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Andrew Lunn, Florian Fainelli, Rob Herring, Mark Rutland,
	linux-arm-kernel, linux-kernel, devicetree, Masahiro Yamada,
	Masami Hiramatsu, Jassi Brar, Kunihiko Hayashi
In-Reply-To: <1514164238-28901-1-git-send-email-hayashi.kunihiko@socionext.com>

DT bindings for the AVE ethernet controller found on Socionext's
UniPhier platforms.

Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
---
 .../bindings/net/socionext,uniphier-ave4.txt       | 47 ++++++++++++++++++++++
 1 file changed, 47 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt

diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
new file mode 100644
index 0000000..8b03668
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
@@ -0,0 +1,47 @@
+* Socionext AVE ethernet controller
+
+This describes the devicetree bindings for AVE ethernet controller
+implemented on Socionext UniPhier SoCs.
+
+Required properties:
+ - compatible: Should be
+	- "socionext,uniphier-pro4-ave4" : for Pro4 SoC
+	- "socionext,uniphier-pxs2-ave4" : for PXs2 SoC
+	- "socionext,uniphier-ld11-ave4" : for LD11 SoC
+	- "socionext,uniphier-ld20-ave4" : for LD20 SoC
+ - reg: Address where registers are mapped and size of region.
+ - interrupts: Should contain the MAC interrupt.
+ - phy-mode: See ethernet.txt in the same directory. Allow to choose
+	"rgmii", "rmii", or "mii" according to the PHY.
+ - phy-handle: Should point to the external phy device.
+	See ethernet.txt file in the same directory.
+ - clocks: A phandle to the clock for the MAC.
+
+Optional properties:
+ - resets: A phandle to the reset control for the MAC.
+ - local-mac-address: See ethernet.txt in the same directory.
+
+Required subnode:
+ - mdio: A container for child nodes representing phy nodes.
+         See phy.txt in the same directory.
+
+Example:
+
+	ether: ethernet@65000000 {
+		compatible = "socionext,uniphier-ld20-ave4";
+		reg = <0x65000000 0x8500>;
+		interrupts = <0 66 4>;
+		phy-mode = "rgmii";
+		phy-handle = <&ethphy>;
+		clocks = <&sys_clk 6>;
+		resets = <&sys_rst 6>;
+		local-mac-address = [00 00 00 00 00 00];
+
+		mdio {
+			#address-cells = <1>;
+			#size-cells = <0>;
+			ethphy: ethphy@1 {
+				reg = <1>;
+			};
+		};
+	};
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next v8 0/2] add UniPhier AVE ethernet support
From: Kunihiko Hayashi @ 2017-12-25  1:10 UTC (permalink / raw)
  To: David Miller, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: Andrew Lunn, Florian Fainelli, Rob Herring, Mark Rutland,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Masahiro Yamada,
	Masami Hiramatsu, Jassi Brar, Kunihiko Hayashi

This series adds support for Socionext AVE ethernet controller implemented
on UniPhier SoCs. This driver supports RGMII/RMII modes.

v7: https://www.spinics.net/lists/netdev/msg473896.html

The PHY patch included in v1 has already separated in:
http://www.spinics.net/lists/netdev/msg454595.html

Changes since v7:
- dt-bindings: fix mdio subnode description

Changes since v6:
- sort the order of local variables from longest to shortest line
- fix ave_probe() which calls register_netdev() at the end of initialization
- dt-bindings: remove phy node descriptions in mdio node

Changes since v5:
- replace license boilerplate with SPDX Identifier
- remove inline directives and an unused function

Changes since v4:
- fix larger integer warning on AVE_PFMBYTE_MASK0

Changes since v3:
- remove checking dma address and use dma_set_mask() to restirct address
- replace ave_mdio_busywait() with read_poll_timeout()
- replace functions to access to registers with readl/writel() directly
- replace a function to access to macaddr with ave_hw_write_macaddr()
- change return value of ave_dma_map() to error value
- move mdiobus_unregister() from ave_remove() to ave_uninit()
- eliminate else block at the end of ave_dma_map()
- add mask definitions for packet filter
- sort bitmap definitions in descending order
- add error check to some functions
- rename and sort functions to clear sub-categories
- fix error value consistency
- remove unneeded initializers
- change type of constant arrays

Changes since v2:
- replace clk_get() with devm_clk_get()
- replace reset_control_get() with devm_reset_control_get_optional_shared()
- add error return when the error occurs on the above *_get functions
- sort soc data and compatible strings
- remove clearly obvious comments
- modify dt-bindings document consistent with these modifications

Changes since v1:
- add/remove devicetree properties and sub-node
  - remove "internal-phy-interrupt" and "desc-bits" property
  - add SoC data structures based on compatible strings
  - add node operation to apply "mdio" sub-node
- add support for features
  - add support for {get,set}_pauseparam and pause frame operations
  - add support for ndo_get_stats64 instead of ndo_get_stats
- replace with desiable functions
  - replace check for valid phy_mode with phy_interface{_mode}_is_rgmii()
  - replace phy attach message with phy_attached_info()
  - replace 32bit operation with {upper,lower}_32_bits() on ave_wdesc_addr()
  - replace nway_reset and get_link with generic functions
- move operations to proper functions
  - move phy_start_aneg() to ndo_open,
    and remove unnecessary PHY interrupt operations
    See http://www.spinics.net/lists/netdev/msg454590.html
  - move irq initialization and descriptor memory allocation to ndo_open
  - move initialization of reset and clock and mdiobus to ndo_init
- fix skbuffer operations
  - fix skb alignment operations and add Rx buffer adjustment for descriptor
    See http://www.spinics.net/lists/netdev/msg456014.html
  - add error returns when dma_map_single() failed 
- clean up code structures
  - clean up wait-loop and wake-queue conditions
  - add ave_wdesc_addr() and offset definitions
  - add ave_macaddr_init() to clean up mac-address operation
  - fix checking whether Tx entry is not enough
  - fix supported features of phydev
  - add necessary free/disable operations
  - add phydev check on ave_{get,set}_wol()
  - remove netif_carrier functions, phydev initializer, and Tx budget check
- change obsolate codes
  - replace ndev->{base_addr,irq} with the members of ave_private
- rename goto labels and mask definitions, and remove unused codes

Kunihiko Hayashi (2):
  dt-bindings: net: add DT bindings for Socionext UniPhier AVE
  net: ethernet: socionext: add AVE ethernet driver

 .../bindings/net/socionext,uniphier-ave4.txt       |   47 +
 drivers/net/ethernet/Kconfig                       |    1 +
 drivers/net/ethernet/Makefile                      |    1 +
 drivers/net/ethernet/socionext/Kconfig             |   22 +
 drivers/net/ethernet/socionext/Makefile            |    5 +
 drivers/net/ethernet/socionext/sni_ave.c           | 1736 ++++++++++++++++++++
 6 files changed, 1812 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/sni_ave.c

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 00/27] kill devm_ioremap_nocache
From: Yisheng Xie @ 2017-12-25  1:09 UTC (permalink / raw)
  To: christophe leroy, Guenter Roeck, Greg KH
  Cc: linux-mips, ulf.hansson, jakub.kicinski, platform-driver-x86,
	airlied, linux-wireless, linus.walleij, alsa-devel, dri-devel,
	linux-kernel, linux-ide, linux-mtd, daniel.vetter, dan.j.williams,
	jason, linux-rtc, boris.brezillon, mchehab, dmaengine, vinod.koul,
	richard, marek.vasut, industrypack-devel, linux-pci, dvhart, wg,
	linux-media, seanpaul, devel, linux-watchdog, arnd, b.zolnierkie,
	marc.zyngier, jslaby
In-Reply-To: <c28ac0bc-8bd2-3dce-3167-8c0f80ec601e@c-s.fr>

hi Christophe and Greg,

On 2017/12/24 16:55, christophe leroy wrote:
> 
> 
> Le 23/12/2017 à 16:57, Guenter Roeck a écrit :
>> On 12/23/2017 05:48 AM, Greg KH wrote:
>>> On Sat, Dec 23, 2017 at 06:55:25PM +0800, Yisheng Xie wrote:
>>>> Hi all,
>>>>
>>>> When I tried to use devm_ioremap function and review related code, I found
>>>> devm_ioremap and devm_ioremap_nocache is almost the same with each other,
>>>> except one use ioremap while the other use ioremap_nocache.
>>>
>>> For all arches?  Really?  Look at MIPS, and x86, they have different
>>> functions.
>>>
>>
>> Both mips and x86 end up mapping the same function, but other arches don't.
>> mn10300 is one where ioremap and ioremap_nocache are definitely different.
> 
> alpha: identical
> arc: identical
> arm: identical
> arm64: identical
> cris: different        <==
> frv: identical
> hexagone: identical
> ia64: different        <==
> m32r: identical
> m68k: identical
> metag: identical
> microblaze: identical
> mips: identical
> mn10300: different     <==
> nios: identical
> openrisc: different    <==
> parisc: identical
> riscv: identical
> s390: identical
> sh: identical
> sparc: identical
> tile: identical
> um: rely on asm/generic
> unicore32: identical
> x86: identical
> asm/generic (no mmu): identical

Wow, that's correct, sorry for I have just checked the main archs, I means
x86,arm, arm64, mips.

However, I stall have no idea about why these 4 archs want different ioremap
function with others. Drivers seems cannot aware this? If driver call ioremap
want he really want for there 4 archs, cache or nocache?

> 
> So 4 among all arches seems to have ioremap() and ioremap_nocache() being different.
> 
> Could we have a define set by the 4 arches on which ioremap() and ioremap_nocache() are different, something like HAVE_DIFFERENT_IOREMAP_NOCACHE ?

Then, what the HAVE_DIFFERENT_IOREMAP_NOCACHE is uesed for ?

Thanks
Yisheng
> 
> Christophe
> 
>>
>> Guenter
>>
>>>> While ioremap's
>>>> default function is ioremap_nocache, so devm_ioremap_nocache also have the
>>>> same function with devm_ioremap, which can just be killed to reduce the size
>>>> of devres.o(from 20304 bytes to 18992 bytes in my compile environment).
>>>>
>>>> I have posted two versions, which use macro instead of function for
>>>> devm_ioremap_nocache[1] or devm_ioremap[2]. And Greg suggest me to kill
>>>> devm_ioremap_nocache for no need to keep a macro around for the duplicate
>>>> thing. So here comes v3 and please help to review.
>>>
>>> I don't think this can be done, what am I missing?  These functions are
>>> not identical, sorry for missing that before.

Never mind, I should checked all the arches, sorry about that.

>>>
>>> thanks,
>>>
>>> greg k-h
>>>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-watchdog" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> ---
> L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast.
> https://www.avast.com/antivirus
> 
> 
> .
> 

_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

^ permalink raw reply

* IPSec tunnels with compression are broken since 4.14
From: Serguei Ivantsov @ 2017-12-24 21:20 UTC (permalink / raw)
  To: netdev

Hi,

Found weird issue starting from 4.14 kernels.
IPSec tunnels with IPComp enabled are not working.
There are a couple of similar reports in strongSwan's wiki and mailing 
list.
Resolution is simple - disable compression.
I have tested all kernels from 4.14 to 4.14.8 - does not work. But works 
fine with any earlier kernel like 4.13.x
Both ikev1 and ikev2 are affected.
According to ipsec statusall, connection was established, but no traffic 
routed - can't ping, etc.

rt6-center[6]: ESTABLISHED 3 minutes ago, 
XX.XX.XX.XX[rt6]...YY.YY.YY.YY[center]
rt6-center[6]: IKEv2 SPIs: d7bd02a630bcd9b9_i 676e1ad3da512c68_r*, 
rekeying in 2 hours
rt6-center[6]: IKE proposal: 
AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/CURVE_25519
rt6-center{2}:  INSTALLED, TUNNEL, reqid 2, ESP in UDP SPIs: c1b5ec2b_i 
ce572160_o, IPCOMP CPIs: 2c8c_i 0ca7_o
rt6-center{2}:  AES_CBC_128/HMAC_SHA2_256_128, 0 bytes_i, 0 bytes_o, 
rekeying in 35 minutes
rt6-center{2}:   XX.XX.XX.XX/32 === 10.1.0.1/32


Regards,
  Serguei

^ permalink raw reply

* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
From: Willem de Bruijn @ 2017-12-24 18:54 UTC (permalink / raw)
  To: Andreas Hartmann
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development
In-Reply-To: <3fad7ad3-b71a-037f-80d4-7052da28a18f@01019freenet.de>

On Sun, Dec 24, 2017 at 11:24 AM, Andreas Hartmann
<andihartmann@01019freenet.de> wrote:
> On 12/20/2017 at 04:56 PM Andreas Hartmann wrote:
>> On 12/18/2017 at 06:11 PM Andreas Hartmann wrote:
>>> On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
>> [...]
>>>> I have been able to reproduce the hang by sending a UFO packet
>>>> between two guests running v4.13 on a host running v4.15-rc1.
>>>>
>>>> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
>>>> vhost_zerocopy_callback being called for each segment of a
>>>> segmented UFO skb. This refcount is decremented then on each
>>>> segment, but incremented only once for the entire UFO skb.
>>>>
>>>> Before v4.14, these packets would be converted in skb_segment to
>>>> regular copy packets with skb_orphan_frags and the callback function
>>>> called once at this point. v4.14 added support for reference counted
>>>> zerocopy skb that can pass through skb_orphan_frags unmodified and
>>>> have their zerocopy state safely cloned with skb_zerocopy_clone.
>>>>
>>>> The call to skb_zerocopy_clone must come after skb_orphan_frags
>>>> to limit cloning of this state to those skbs that can do so safely.
>>>>
>>>> Please try a host with the following patch. This fixes it for me. I intend to
>>>> send it to net.
>>>>
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index a592ca025fc4..d2d985418819 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>
>>>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>>                                               SKBTX_SHARED_FRAG;
>>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>>> -                       goto err;
>>>>
>>>>                 while (pos < offset + len) {
>>>>                         if (i >= nfrags) {
>>>> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>
>>>>                         if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>>>                                 goto err;
>>>> +                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>>> +                               goto err;
>>>>
>>>>                         *nskb_frag = *frag;
>>>>                         __skb_frag_ref(nskb_frag);
>>>>
>>>>
>>>> This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
>>>> in the frags[] array. I will follow-up with a patch to net-next that only
>>>> checks once per skb:
>>>>
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index 466581cf4cdc..a293a33604ec 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>
>>>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>>                                               SKBTX_SHARED_FRAG;
>>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>>> +               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>>> +                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>>>                         goto err;
>>>>
>>>>                 while (pos < offset + len) {
>>>> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>
>>>>                                 BUG_ON(!nfrags);
>>>>
>>>> +                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>>> +                                   skb_zerocopy_clone(nskb, frag_skb,
>>>> +                                                      GFP_ATOMIC))
>>>> +                                       goto err;
>>>> +
>>>>                                 list_skb = list_skb->next;
>>>>                         }
>>>>
>>>> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>>                                 goto err;
>>>>                         }
>>>>
>>>> -                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>>> -                               goto err;
>>>> -
>>>
>>> I'm currently testing this one.
>>>
>>
>> Test is in progress. I'm testing w/ 4.14.7, which already contains "net:
>> accept UFO datagrams from tuntap and packet".
>>
>> At first, I tested an unpatched 4.14.7 - the problem (no more killable
>> qemu-process) did occur promptly on shutdown of the machine. This was
>> expected.
>>
>> Next, I applied the above patch (the second one). Until now, I didn't
>> face any problem any more on shutdown of VMs. Looks promising.
>
> Ok, I didn't face any problem any more! Many thanks for your effort and
> your 2 patches to get 4.14. working again w/ qemu and virtual networks /
> virtio!

That is great news. Thanks a lot for testing!

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox