Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [iproute PATCH v2 0/3] Check user supplied interface name lengths
From: Stephen Hemminger @ 2017-09-29 17:31 UTC (permalink / raw)
  To: Phil Sutter; +Cc: netdev
In-Reply-To: <20170927160528.GN32305@orbyte.nwl.cc>

On Wed, 27 Sep 2017 18:05:28 +0200
Phil Sutter <phil@nwl.cc> wrote:

> On Wed, Sep 27, 2017 at 08:42:49AM +0100, Stephen Hemminger wrote:
> > On Tue, 26 Sep 2017 18:35:45 +0200
> > Phil Sutter <phil@nwl.cc> wrote:
> >   
> > > This series adds explicit checks for user-supplied interface names to
> > > make sure their length fits Linux's requirements.
> > > 
> > > The first two patches simplify interface name parsing in some places -
> > > these are side-effects of working on the actual implementation provided
> > > in patch three.
> > > 
> > > Changes since v1:
> > > - Patches 1 and 2 introduced.
> > > - Changes to patch 3 are listed in there.
> > > 
> > > Phil Sutter (3):
> > >   ip{6,}tunnel: Avoid copying user-supplied interface name around
> > >   tc: flower: No need to cache indev arg
> > >   Check user supplied interface name lengths
> > > 
> > >  include/utils.h |  1 +
> > >  ip/ip6tunnel.c  |  9 +++++----
> > >  ip/ipl2tp.c     |  3 ++-
> > >  ip/iplink.c     | 27 ++++++++-------------------
> > >  ip/ipmaddr.c    |  1 +
> > >  ip/iprule.c     |  4 ++++
> > >  ip/iptunnel.c   | 27 +++++++++++++--------------
> > >  ip/iptuntap.c   |  4 +++-
> > >  lib/utils.c     | 10 ++++++++++
> > >  misc/arpd.c     |  1 +
> > >  tc/f_flower.c   |  6 ++----
> > >  11 files changed, 50 insertions(+), 43 deletions(-)
> > >   
> > 
> > I like the idea, and checking arguments is good.  
> 
> Cool!

I was thinking something like:



diff --git a/include/utils.h b/include/utils.h
index c9ed230b9604..e2702b56f2e0 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -105,6 +105,8 @@ int get_be64(__be64 *val, const char *arg, int base);
 int get_be32(__be32 *val, const char *arg, int base);
 int get_be16(__be16 *val, const char *arg, int base);
 int get_addr64(__u64 *ap, const char *cp);
+int check_ifname(const char *arg);
+int get_ifname(char *buf, const char *arg);
 
 int hex2mem(const char *buf, uint8_t *mem, int count);
 char *hexstring_n2a(const __u8 *str, int len, char *buf, int blen);
diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c
index b4a7def14422..a6f0e99bdc21 100644
--- a/ip/ip6tunnel.c
+++ b/ip/ip6tunnel.c
@@ -180,7 +180,8 @@ static int parse_args(int argc, char **argv, int cmd, struct ip6_tnl_parm2 *p)
 			memcpy(&p->laddr, &laddr.data, sizeof(p->laddr));
 		} else if (strcmp(*argv, "dev") == 0) {
 			NEXT_ARG();
-			strncpy(medium, *argv, IFNAMSIZ - 1);
+			if (get_ifname(medium, *argv))
+				invarg("\"medium\" not a valid ifname", *argv);
 		} else if (strcmp(*argv, "encaplimit") == 0) {
 			NEXT_ARG();
 			if (strcmp(*argv, "none") == 0) {
diff --git a/ip/iplink.c b/ip/iplink.c
index ff5b56c038d2..89aa51ed3b40 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -1265,6 +1265,8 @@ static int do_set(int argc, char **argv)
 			flags &= ~IFF_UP;
 		} else if (strcmp(*argv, "name") == 0) {
 			NEXT_ARG();
+			if (check_ifname(*argv))
+				invarg("Invalid \"name\"\n", *argv);
 			newname = *argv;
 		} else if (matches(*argv, "address") == 0) {
 			NEXT_ARG();
@@ -1383,9 +1385,6 @@ static int do_set(int argc, char **argv)
 	}
 
 	if (newname && strcmp(dev, newname)) {
-		if (strlen(newname) == 0)
-			invarg("\"\" is not a valid device identifier\n",
-			       "name");
 		if (do_changename(dev, newname) < 0)
 			return -1;
 		dev = newname;
diff --git a/lib/utils.c b/lib/utils.c
index bbd3cbc46a0e..a93b45b51a3b 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -20,6 +20,7 @@
 #include <sys/socket.h>
 #include <netinet/in.h>
 #include <string.h>
+#include <ctype.h>
 #include <netdb.h>
 #include <arpa/inet.h>
 #include <asm/types.h>
@@ -465,6 +466,34 @@ int get_addr64(__u64 *ap, const char *cp)
 	return 1;
 }
 
+int check_ifname(const char *name)
+{
+	/* These check mimic kernel checks in dev_valid_name */
+	if (*name == '\0')
+		return -1;
+	if (strlen(name) >= IFNAMSIZ)
+		return -1;
+
+	while (*name) {
+		if (*name == '/' || isspace(*name))
+			return -1;
+		++name;
+	}
+	return 0;
+}
+		
+/* buf is assumed to be IFNAMSIZ */
+int get_ifname(char *buf, const char *name)
+{
+	int ret;
+
+	ret = check_ifname(name);
+	if (ret == 0)
+		strncpy(buf, name, IFNAMSIZ - 1);
+
+	return ret;
+}
+
 int get_addr_1(inet_prefix *addr, const char *name, int family)
 {
 	memset(addr, 0, sizeof(*addr));
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH][net-next] net_sched: remove redundant assignment to ret
From: Cong Wang @ 2017-09-29 17:35 UTC (permalink / raw)
  To: Colin King
  Cc: Jamal Hadi Salim, Jiri Pirko, David S . Miller,
	Linux Kernel Network Developers, kernel-janitors, LKML
In-Reply-To: <20170929140116.8642-1-colin.king@canonical.com>

On Fri, Sep 29, 2017 at 7:01 AM, Colin King <colin.king@canonical.com> wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> The assignment of -EINVAL to variable ret is redundant as it
> is being overwritten on the following error exit paths or
> to the return value from the following call to basic_set_parms.
> Fix this up by removing it. Cleans up clang warning message:
>
> net/sched/cls_basic.c:185:2: warning: Value stored to 'err' is never read
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Fixes: 1d8134fea2eb ("net_sched: use idr to allocate basic filter handles")

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>

^ permalink raw reply

* Re: [PATCH v4 net-next 0/8] flow_dissector: Protocol specific flow dissector offload
From: David Miller @ 2017-09-29 17:42 UTC (permalink / raw)
  To: tom; +Cc: hannes, tom, netdev, rohit
In-Reply-To: <CALx6S37J-Uv11TJuNFj4et==NFZj9W0v9s_c-Qowqcs--Kx_VQ@mail.gmail.com>

From: Tom Herbert <tom@herbertland.com>
Date: Fri, 29 Sep 2017 08:48:55 -0700

> The flow_dissector interface is not a uAPI.

That's not true, insofar as cls_flower.c uses the flow_dissector
therefore if you change the flow_dissector in certain ways then
cls_flower.c might have it's behavior changed and that is in fact UAPI
facing.

^ permalink raw reply

* Re: netlink backwards compatibility in userspace tools
From: Rustad, Mark D @ 2017-09-29 17:50 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Netdev, LKML, Daniel Kahn Gillmor
In-Reply-To: <CAHmME9oixZtPVdH24KJQ9NaTuf_ECAOoHwQhuA+Fy-BX+F_3dw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2135 bytes --]


> On Sep 29, 2017, at 3:22 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> 
> Hi guys,
> 
> One handy aspect of Netlink is that it's backwards compatible. This
> means that you can run old userspace utilities on new kernels, even if
> the new kernel supports new features and netlink attributes. The wire
> format is stable enough that the data marshaled can be extended
> without breaking compat. Neat.
> 
> I was wondering, though, what you think the best stance is toward
> these old userspace utilities. What should they do if the kernel sends
> it netlink attributes that it does not recognize? At the moment, I'm
> doing something like this:
> 
> static void warn_unrecognized(void)
> {
>    static bool once = false;
>    if (once)
>        return;
>    once = true;
>    fprintf(stderr,
>        "Warning: this program received from your kernel one or more\n"
>        "attributes that it did not recognize. It is possible that\n"
>        "this version of wg(8) is older than your kernel. You may\n"
>        "want to update this program.\n");
> }
> 
> This seems like a somewhat sensible warning, but then I wonder about
> distributions like Debian, which has a long stable life cycle, so it
> frequently has very old tools (ancient iproute2 for example). Then,
> VPS providers have these Debian images run on top of newer kernels.
> People in this situation would undoubtedly see the above warning a lot
> and not be able to do anything about it. Not horrible, but a bit
> annoying. Is this an okay annoyance? Or is it advised to just have no
> warning at all? One idea would be to put it behind an environment
> variable flag, but I don't like too many nobs.
> 
> I'm generally wondering about attitudes toward this kind of userspace
> program behavior in response to newer kernels.
> 
> Thanks,
> Jason

That seems like a bit much. Consider only emitting a message with the use of a verbose flag - or two. Even then the message should be shortened - the first sentence is entirely adequate even in verbose mode.

--
Mark Rustad, Networking Division, Intel Corporation


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 841 bytes --]

^ permalink raw reply

* [PATCH net-next] bpf: Fix compiler warning on info.map_ids for 32bit platform
From: Martin KaFai Lau @ 2017-09-29 17:52 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

This patch uses u64_to_user_ptr() to cast info.map_ids to a userspace ptr.
It also tags the user_map_ids with '__user' for sparse check.

Fixes: cb4d2b3f03d8 ("bpf: Add name, load_time, uid and map_ids to bpf_prog_info")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 kernel/bpf/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 11a7f82a55d1..b927da66f653 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1405,7 +1405,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 	info.nr_map_ids = prog->aux->used_map_cnt;
 	ulen = min_t(u32, info.nr_map_ids, ulen);
 	if (ulen) {
-		u32 *user_map_ids = (u32 *)info.map_ids;
+		u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids);
 		u32 i;
 
 		for (i = 0; i < ulen; i++)
-- 
2.9.5

^ permalink raw reply related

* Re: [PATCHv4 iproute2 1/2] lib/libnetlink: re malloc buff if size is not enough
From: Stephen Hemminger @ 2017-09-29 17:54 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: netdev, Michal Kubecek, Phil Sutter, Hangbin Liu
In-Reply-To: <1506605626-1744-2-git-send-email-haliu@redhat.com>

On Thu, 28 Sep 2017 21:33:45 +0800
Hangbin Liu <haliu@redhat.com> wrote:

>  
> +static int __rtnl_recvmsg(int fd, struct msghdr *msg, int flags)
> +{
> +	int len;
> +
> +	do {
> +		len = recvmsg(fd, msg, flags);
> +	} while (len < 0 && (errno == EINTR || errno == EAGAIN));
> +
> +	if (len < 0) {
> +		fprintf(stderr, "netlink receive error %s (%d)\n",
> +			strerror(errno), errno);
> +		return -errno;
> +	}
> +
> +	if (len == 0) {
> +		fprintf(stderr, "EOF on netlink\n");
> +		return -ENODATA;
> +	}
> +
> +	return len;
> +}
> +
> +static int rtnl_recvmsg(int fd, struct msghdr *msg, char **answer)
> +{
> +	struct iovec *iov = msg->msg_iov;
> +	char *buf;
> +	int len;
> +
> +	iov->iov_base = NULL;
> +	iov->iov_len = 0;
> +
> +	len = __rtnl_recvmsg(fd, msg, MSG_PEEK | MSG_TRUNC);
> +	if (len < 0)
> +		return len;
> +
> +	buf = malloc(len);
> +	if (!buf) {
> +		fprintf(stderr, "malloc error: not enough buffer\n");
> +		return -ENOMEM;
> +	}
> +
> +	iov->iov_base = buf;
> +	iov->iov_len = len;
> +
> +	len = __rtnl_recvmsg(fd, msg, 0);
> +	if (len < 0) {
> +		free(buf);
> +		return len;
> +	}
> +
> +	if (answer)
> +		*answer = buf;
> +	else
> +		free(buf);
> +
> +	return len;
> +}

Doubling the number of system calls per message is not going to make
users with 5,000,000 routes or 1000 vlans, or 10,000 tunnels happy.
Please rethink this.

^ permalink raw reply

* Re: [PATCH v4 net-next 0/8] flow_dissector: Protocol specific flow dissector offload
From: Tom Herbert @ 2017-09-29 17:59 UTC (permalink / raw)
  To: David Miller
  Cc: Hannes Frederic Sowa, Tom Herbert,
	Linux Kernel Network Developers, Rohit Seth
In-Reply-To: <20170929.184245.412578447363431176.davem@davemloft.net>

On Fri, Sep 29, 2017 at 10:42 AM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Fri, 29 Sep 2017 08:48:55 -0700
>
>> The flow_dissector interface is not a uAPI.
>
> That's not true, insofar as cls_flower.c uses the flow_dissector
> therefore if you change the flow_dissector in certain ways then
> cls_flower.c might have it's behavior changed and that is in fact UAPI
> facing.

Then I would suggest adding another flag like FLOW_DISSECTOR_F_FLOWER
and when anyone puts new code into flow_dissector they can wrap it
with "if !(flags & FLOW_DISSECTOR_F_FLOWER)". If the flower uAPI is
subsequently update then the conditional can be removed. This way
flower can support maintain its APIs, but we can still still extend
and improve flow_dissector for othersuse cases.

Tom

^ permalink raw reply

* Re: [PATCH v4 net-next 0/8] flow_dissector: Protocol specific flow dissector offload
From: Tom Herbert @ 2017-09-29 18:04 UTC (permalink / raw)
  To: David Miller
  Cc: Hannes Frederic Sowa, Tom Herbert,
	Linux Kernel Network Developers, Rohit Seth
In-Reply-To: <CALx6S37DhT095fY92HWZAKQvMkqjxqq08nxJqL=wTP8JorHHDg@mail.gmail.com>

On Fri, Sep 29, 2017 at 10:59 AM, Tom Herbert <tom@herbertland.com> wrote:
> On Fri, Sep 29, 2017 at 10:42 AM, David Miller <davem@davemloft.net> wrote:
>> From: Tom Herbert <tom@herbertland.com>
>> Date: Fri, 29 Sep 2017 08:48:55 -0700
>>
>>> The flow_dissector interface is not a uAPI.
>>
>> That's not true, insofar as cls_flower.c uses the flow_dissector
>> therefore if you change the flow_dissector in certain ways then
>> cls_flower.c might have it's behavior changed and that is in fact UAPI
>> facing.
>
> Then I would suggest adding another flag like FLOW_DISSECTOR_F_FLOWER
> and when anyone puts new code into flow_dissector they can wrap it
> with "if !(flags & FLOW_DISSECTOR_F_FLOWER)". If the flower uAPI is
> subsequently update then the conditional can be removed. This way
> flower can support maintain its APIs, but we can still still extend
> and improve flow_dissector for othersuse cases.
>
Actually, it would make more sense to have the converse so we don't
have to touch flower. I will add FLOW_DISSECTOR_F_NOT_FLOWER

> Tom

^ permalink raw reply

* Re: [PATCHv4 iproute2 1/2] lib/libnetlink: re malloc buff if size is not enough
From: Michal Kubecek @ 2017-09-29 18:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Hangbin Liu, netdev, Phil Sutter, Hangbin Liu
In-Reply-To: <20170929105440.7edaec1f@xeon-e3>

On Fri, Sep 29, 2017 at 10:54:40AM -0700, Stephen Hemminger wrote:
> On Thu, 28 Sep 2017 21:33:45 +0800
> Hangbin Liu <haliu@redhat.com> wrote:
> 
> >  
> > +static int __rtnl_recvmsg(int fd, struct msghdr *msg, int flags)
> > +{
> > +	int len;
> > +
> > +	do {
> > +		len = recvmsg(fd, msg, flags);
> > +	} while (len < 0 && (errno == EINTR || errno == EAGAIN));
> > +
> > +	if (len < 0) {
> > +		fprintf(stderr, "netlink receive error %s (%d)\n",
> > +			strerror(errno), errno);
> > +		return -errno;
> > +	}
> > +
> > +	if (len == 0) {
> > +		fprintf(stderr, "EOF on netlink\n");
> > +		return -ENODATA;
> > +	}
> > +
> > +	return len;
> > +}
> > +
> > +static int rtnl_recvmsg(int fd, struct msghdr *msg, char **answer)
> > +{
> > +	struct iovec *iov = msg->msg_iov;
> > +	char *buf;
> > +	int len;
> > +
> > +	iov->iov_base = NULL;
> > +	iov->iov_len = 0;
> > +
> > +	len = __rtnl_recvmsg(fd, msg, MSG_PEEK | MSG_TRUNC);
> > +	if (len < 0)
> > +		return len;
> > +
> > +	buf = malloc(len);
> > +	if (!buf) {
> > +		fprintf(stderr, "malloc error: not enough buffer\n");
> > +		return -ENOMEM;
> > +	}
> > +
> > +	iov->iov_base = buf;
> > +	iov->iov_len = len;
> > +
> > +	len = __rtnl_recvmsg(fd, msg, 0);
> > +	if (len < 0) {
> > +		free(buf);
> > +		return len;
> > +	}
> > +
> > +	if (answer)
> > +		*answer = buf;
> > +	else
> > +		free(buf);
> > +
> > +	return len;
> > +}
> 
> Doubling the number of system calls per message is not going to make
> users with 5,000,000 routes or 1000 vlans, or 10,000 tunnels happy.
> Please rethink this.

I'm not sure it's possible to avoid this if we want to be able to get
rid of a preset message length limit. If you call recvmsg() without
MSG_PEEK and your buffer isn't sufficiently large, the message is lost.
And once you use MSG_PEEK, you need another syscall to remove the
message from the queue even if you read all data. In other words, to be
sure you don't lose the reply, you have to do two syscalls.

One alternative I can see would be calling recvmsg() without MSG_PEEK
(but with reasonably large buffer) and repeating the request if the
buffer is not large enough (and caller is actually interested in the
answer). But I don't think this is desirable either as that would result
in even worse overhead.

Michal Kubecek

^ permalink raw reply

* RE: [PATCH RFC 3/5] Add KSZ8795 switch driver
From: Tristram.Ha @ 2017-09-29 18:24 UTC (permalink / raw)
  To: andrew, David.Laight
  Cc: muvarov, pavel, nathan.leigh.conrad, vivien.didelot, f.fainelli,
	netdev, linux-kernel, Woojung.Huh
In-Reply-To: <20170929121201.GD19710@lunn.ch>

The actual SPI access performance will depend on the SPI host controller.
The SPI access speed, ranging from 12 MHz to 50 MHz depending on the
chip, is a factor but the performance of the SPI host controller is more
important.  Generally the SPI host controller scales down the clock by a
factor of 2.  So if the maximum 50 Mhz does not work for the chip the next
speed is 25 Mhz.  Most of the SPI host controllers work in range of 20-30 Mhz
with Microchip SPI switches.

For example, Raspberry Pi 2 has 2 SPI host controllers.  The new code uses the
newer host controller which has better performance even when running in
slower SPI speed.

It is hard to measure the actual SPI performance of the switch, but for SPI
Ethernet controller the performance can be viewed by running throughput test
as SPI is responsible for passing network frames between system and device.

The ODROID C1 (A Raspberry Pi like SoC) has the best SPI performance, even not
running in the highest SPI speed.

Typical SPI register read takes about 120 microseconds.

Now back to my concern about SPI access.  It is not accessible in interrupt context.
Even in a timer callback a work queue has to be scheduled to program the hardware
registers.  My concern is if a task is already running with SPI access to a lot of registers
like reading the 32 MIB counters in every port of the switch, another register access
has to wait until they are finished.  This normally does not pose a problem in
regular switch operation, but there are some situations it will create a problem.

One of the situations is running RSTP Conformance Test.  The test case sends a BPDU
to open/close the port and then send traffic to test if the port is really opened/closed.
For software implementation which receives the BPDU and all network traffic it is
reasonable to expect the software opens/closes the port and then can regulate the
network traffic whatever it wants, but for a hardware implementation which programs
a register to open/close the port then it is critical this register write can be executed as
soon as possible.

Another situation is getting the PTP transmit timestamp of a PTP event message.
The Microchip PTP switch uses registers to store the Sync, Delay_Req, and Pdelay_Resp
timestamps.  If this register is not read as soon as possible and another message of the
same type is sent, the last timestamp is lost.  Software can regulate the sending of
these messages and this situation does not happen in normal operation.  But in a stress
test this PTP operation definitely cannot handle it.

I know this MIB counter reading implementation cannot guarantee those urgent register
access to happen promptly, but it minimizes the chance of blocking those accesses in normal
operation.

> -----Original Message-----
> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Friday, September 29, 2017 5:12 AM
> To: David Laight
> Cc: Tristram Ha - C24268; muvarov@gmail.com; pavel@ucw.cz;
> nathan.leigh.conrad@gmail.com; vivien.didelot@savoirfairelinux.com;
> f.fainelli@gmail.com; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Woojung Huh - C21699
> Subject: Re: [PATCH RFC 3/5] Add KSZ8795 switch driver
> 
> On Fri, Sep 29, 2017 at 09:14:26AM +0000, David Laight wrote:
> > From: Andrew Lunn
> > > Sent: 28 September 2017 20:34
> > ...
> > > > There are 34 counters.  In normal case using generic bus I/O or PCI to
> read them
> > > > is very quick, but the switch is mostly accessed using SPI, or even I2C.
> As the SPI
> > > > access is very slow.
> > >
> > > How slow is it? The Marvell switches all use MDIO. It is probably a
> > > bit faster than I2C, but it is a lot slower than MMIO or PCI.
> > >
> > > ethtool -S lan0 takes about 25ms.
> >
> > Is the SPI access software bit-banged?
> 
> That will depend on the board design. I've used mdio bit banging, and
> that was painfully slow for stats.
> 
> But we should primarily think about average hardware. It is going to
> have hardware SPI or I2C. If statistics reading with hardware I2C is
> reasonable, i would avoid caching, and just ensure other accesses are
> permitted between individual statistic reads.
> 
> It also requires Microchip also post new code. They have been very
> silent for quite a while....
> 
> 	  Andrew

^ permalink raw reply

* [PATCH net-next 0/8] net: dsa: change dsa_ptr for a dsa_port
From: Vivien Didelot @ 2017-09-29 18:36 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot

With DSA, a master net_device is physically wired to a dedicated CPU
switch port. For interaction with the DSA layer, the struct net_device
contains a dsa_ptr, which currently points to a dsa_switch_tree object.

This is only valid for a switch fabric with a single CPU port. In order
to support switch fabrics with multiple CPU ports, we first need to
change the type of dsa_ptr to what it really is: a dsa_port object.

This is what this patchset does. The first 4 patches cleans up portions
of DSA core to make the next patches more readable. These next patches
prepare the xmit and receive hot paths and finally change dsa_ptr.

Vivien Didelot (8):
  net: dsa: directly fetch switch in mtk_tag_rcv
  net: dsa: directly fetch switch in lan9303_rcv
  net: dsa: use cpu_dp in master code
  net: dsa: use temporary dsa_device_ops variable
  net: dsa: add tagging ops to port
  net: dsa: prepare master receive hot path
  net: dsa: change dsa_ptr for a dsa_port
  net: dsa: remove tag ops from the switch tree

 include/linux/netdevice.h |  4 ++--
 include/net/dsa.h         | 19 ++++++++-----------
 net/dsa/dsa.c             |  6 +++---
 net/dsa/dsa2.c            | 15 ++++++++++-----
 net/dsa/dsa_priv.h        |  7 +------
 net/dsa/legacy.c          | 15 ++++++++++-----
 net/dsa/master.c          | 47 ++++++++++++++++++++++-------------------------
 net/dsa/slave.c           |  3 +--
 net/dsa/tag_brcm.c        |  3 +--
 net/dsa/tag_dsa.c         |  3 ++-
 net/dsa/tag_edsa.c        |  3 ++-
 net/dsa/tag_ksz.c         |  3 +--
 net/dsa/tag_lan9303.c     |  6 ++----
 net/dsa/tag_mtk.c         | 12 ++----------
 net/dsa/tag_qca.c         |  3 +--
 net/dsa/tag_trailer.c     |  3 +--
 16 files changed, 69 insertions(+), 83 deletions(-)

-- 
2.14.1

^ permalink raw reply

* [PATCH net-next 1/8] net: dsa: directly fetch switch in mtk_tag_rcv
From: Vivien Didelot @ 2017-09-29 18:36 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170929183635.8122-1-vivien.didelot@savoirfairelinux.com>

In a single-chip switch fabric, there is no need to fetch the dsa_switch
structure from the tree, directly use the CPU port's "ds" member.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 net/dsa/tag_mtk.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
index ec8ee5f43255..ef0364b82241 100644
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -47,7 +47,8 @@ static struct sk_buff *mtk_tag_rcv(struct sk_buff *skb, struct net_device *dev,
 				   struct packet_type *pt)
 {
 	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_switch *ds;
+	struct dsa_port *cpu_dp = dst->cpu_dp;
+	struct dsa_switch *ds = cpu_dp->ds;
 	int port;
 	__be16 *phdr, hdr;
 
@@ -68,14 +69,6 @@ static struct sk_buff *mtk_tag_rcv(struct sk_buff *skb, struct net_device *dev,
 		skb->data - ETH_HLEN - MTK_HDR_LEN,
 		2 * ETH_ALEN);
 
-	/* This protocol doesn't support cascading multiple
-	 * switches so it's safe to assume the switch is first
-	 * in the tree.
-	 */
-	ds = dst->ds[0];
-	if (!ds)
-		return NULL;
-
 	/* Get source port information */
 	port = (hdr & MTK_HDR_RECV_SOURCE_PORT_MASK);
 	if (!ds->ports[port].netdev)
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next 2/8] net: dsa: directly fetch switch in lan9303_rcv
From: Vivien Didelot @ 2017-09-29 18:36 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170929183635.8122-1-vivien.didelot@savoirfairelinux.com>

In a single-chip switch fabric, there is no need to fetch the dsa_switch
structure from the tree, directly use the CPU port's "ds" member.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 net/dsa/tag_lan9303.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
index 0b9826105e42..f0b51acf36ac 100644
--- a/net/dsa/tag_lan9303.c
+++ b/net/dsa/tag_lan9303.c
@@ -72,11 +72,10 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev,
 {
 	u16 *lan9303_tag;
 	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_switch *ds;
+	struct dsa_port *cpu_dp = dst->cpu_dp;
+	struct dsa_switch *ds = cpu_dp->ds;
 	unsigned int source_port;
 
-	ds = dst->ds[0];
-
 	if (unlikely(!ds)) {
 		dev_warn_ratelimited(&dev->dev, "Dropping packet, due to missing DSA switch device\n");
 		return NULL;
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next 3/8] net: dsa: use cpu_dp in master code
From: Vivien Didelot @ 2017-09-29 18:36 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170929183635.8122-1-vivien.didelot@savoirfairelinux.com>

Make it clear that the master device is linked to a CPU port by using
"cpu_dp" for the dsa_port variable in master.c instead of "port", then
use a "port" variable to describe the port index, as usually seen in
other places of DSA core.

This will make the future patch touching dsa_ptr more readable. There is
no functional changes.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 net/dsa/master.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/net/dsa/master.c b/net/dsa/master.c
index 5e5147ec5a44..ef15d35f1574 100644
--- a/net/dsa/master.c
+++ b/net/dsa/master.c
@@ -17,9 +17,10 @@ static void dsa_master_get_ethtool_stats(struct net_device *dev,
 					 uint64_t *data)
 {
 	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *port = dst->cpu_dp;
-	struct dsa_switch *ds = port->ds;
-	const struct ethtool_ops *ops = port->orig_ethtool_ops;
+	struct dsa_port *cpu_dp = dst->cpu_dp;
+	const struct ethtool_ops *ops = cpu_dp->orig_ethtool_ops;
+	struct dsa_switch *ds = cpu_dp->ds;
+	int port = cpu_dp->index;
 	int count = 0;
 
 	if (ops && ops->get_sset_count && ops->get_ethtool_stats) {
@@ -28,15 +29,15 @@ static void dsa_master_get_ethtool_stats(struct net_device *dev,
 	}
 
 	if (ds->ops->get_ethtool_stats)
-		ds->ops->get_ethtool_stats(ds, port->index, data + count);
+		ds->ops->get_ethtool_stats(ds, port, data + count);
 }
 
 static int dsa_master_get_sset_count(struct net_device *dev, int sset)
 {
 	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *port = dst->cpu_dp;
-	struct dsa_switch *ds = port->ds;
-	const struct ethtool_ops *ops = port->orig_ethtool_ops;
+	struct dsa_port *cpu_dp = dst->cpu_dp;
+	const struct ethtool_ops *ops = cpu_dp->orig_ethtool_ops;
+	struct dsa_switch *ds = cpu_dp->ds;
 	int count = 0;
 
 	if (ops && ops->get_sset_count)
@@ -52,16 +53,17 @@ static void dsa_master_get_strings(struct net_device *dev, uint32_t stringset,
 				   uint8_t *data)
 {
 	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *port = dst->cpu_dp;
-	struct dsa_switch *ds = port->ds;
-	const struct ethtool_ops *ops = port->orig_ethtool_ops;
+	struct dsa_port *cpu_dp = dst->cpu_dp;
+	const struct ethtool_ops *ops = cpu_dp->orig_ethtool_ops;
+	struct dsa_switch *ds = cpu_dp->ds;
+	int port = cpu_dp->index;
 	int len = ETH_GSTRING_LEN;
 	int mcount = 0, count;
 	unsigned int i;
 	uint8_t pfx[4];
 	uint8_t *ndata;
 
-	snprintf(pfx, sizeof(pfx), "p%.2d", port->index);
+	snprintf(pfx, sizeof(pfx), "p%.2d", port);
 	/* We do not want to be NULL-terminated, since this is a prefix */
 	pfx[sizeof(pfx) - 1] = '_';
 
@@ -76,7 +78,7 @@ static void dsa_master_get_strings(struct net_device *dev, uint32_t stringset,
 		 * the output after to prepend our CPU port prefix we
 		 * constructed earlier
 		 */
-		ds->ops->get_strings(ds, port->index, ndata);
+		ds->ops->get_strings(ds, port, ndata);
 		count = ds->ops->get_sset_count(ds);
 		for (i = 0; i < count; i++) {
 			memmove(ndata + (i * len + sizeof(pfx)),
@@ -89,17 +91,17 @@ static void dsa_master_get_strings(struct net_device *dev, uint32_t stringset,
 int dsa_master_ethtool_setup(struct net_device *dev)
 {
 	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *port = dst->cpu_dp;
-	struct dsa_switch *ds = port->ds;
+	struct dsa_port *cpu_dp = dst->cpu_dp;
+	struct dsa_switch *ds = cpu_dp->ds;
 	struct ethtool_ops *ops;
 
 	ops = devm_kzalloc(ds->dev, sizeof(*ops), GFP_KERNEL);
 	if (!ops)
 		return -ENOMEM;
 
-	port->orig_ethtool_ops = dev->ethtool_ops;
-	if (port->orig_ethtool_ops)
-		memcpy(ops, port->orig_ethtool_ops, sizeof(*ops));
+	cpu_dp->orig_ethtool_ops = dev->ethtool_ops;
+	if (cpu_dp->orig_ethtool_ops)
+		memcpy(ops, cpu_dp->orig_ethtool_ops, sizeof(*ops));
 
 	ops->get_sset_count = dsa_master_get_sset_count;
 	ops->get_ethtool_stats = dsa_master_get_ethtool_stats;
@@ -113,8 +115,8 @@ int dsa_master_ethtool_setup(struct net_device *dev)
 void dsa_master_ethtool_restore(struct net_device *dev)
 {
 	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *port = dst->cpu_dp;
+	struct dsa_port *cpu_dp = dst->cpu_dp;
 
-	dev->ethtool_ops = port->orig_ethtool_ops;
-	port->orig_ethtool_ops = NULL;
+	dev->ethtool_ops = cpu_dp->orig_ethtool_ops;
+	cpu_dp->orig_ethtool_ops = NULL;
 }
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next 4/8] net: dsa: use temporary dsa_device_ops variable
From: Vivien Didelot @ 2017-09-29 18:36 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170929183635.8122-1-vivien.didelot@savoirfairelinux.com>

When resolving the DSA tagging protocol used by a CPU switch, use a
temporary "tag_ops" variable to store the dsa_device_ops instead of
using directly dst->tag_ops. This will make the future patches moving
this pointer around easier to read.

There is no functional changes.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 net/dsa/dsa2.c   | 8 +++++---
 net/dsa/legacy.c | 8 +++++---
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index dcccaebde708..6a10c5c1639f 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -485,6 +485,7 @@ static int dsa_cpu_parse(struct dsa_port *port, u32 index,
 			 struct dsa_switch_tree *dst,
 			 struct dsa_switch *ds)
 {
+	const struct dsa_device_ops *tag_ops;
 	enum dsa_tag_protocol tag_protocol;
 	struct net_device *ethernet_dev;
 	struct device_node *ethernet;
@@ -514,13 +515,14 @@ static int dsa_cpu_parse(struct dsa_port *port, u32 index,
 	ds->cpu_port_mask |= BIT(index);
 
 	tag_protocol = ds->ops->get_tag_protocol(ds);
-	dst->tag_ops = dsa_resolve_tag_protocol(tag_protocol);
-	if (IS_ERR(dst->tag_ops)) {
+	tag_ops = dsa_resolve_tag_protocol(tag_protocol);
+	if (IS_ERR(tag_ops)) {
 		dev_warn(ds->dev, "No tagger for this switch\n");
 		ds->cpu_port_mask &= ~BIT(index);
-		return PTR_ERR(dst->tag_ops);
+		return PTR_ERR(tag_ops);
 	}
 
+	dst->tag_ops = tag_ops;
 	dst->rcv = dst->tag_ops->rcv;
 
 	return 0;
diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c
index ae505d8e4417..8e849013f69d 100644
--- a/net/dsa/legacy.c
+++ b/net/dsa/legacy.c
@@ -144,13 +144,15 @@ static int dsa_switch_setup_one(struct dsa_switch *ds,
 	 * switch.
 	 */
 	if (dst->cpu_dp->ds == ds) {
+		const struct dsa_device_ops *tag_ops;
 		enum dsa_tag_protocol tag_protocol;
 
 		tag_protocol = ops->get_tag_protocol(ds);
-		dst->tag_ops = dsa_resolve_tag_protocol(tag_protocol);
-		if (IS_ERR(dst->tag_ops))
-			return PTR_ERR(dst->tag_ops);
+		tag_ops = dsa_resolve_tag_protocol(tag_protocol);
+		if (IS_ERR(tag_ops))
+			return PTR_ERR(tag_ops);
 
+		dst->tag_ops = tag_ops;
 		dst->rcv = dst->tag_ops->rcv;
 	}
 
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next 5/8] net: dsa: add tagging ops to port
From: Vivien Didelot @ 2017-09-29 18:36 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170929183635.8122-1-vivien.didelot@savoirfairelinux.com>

The DSA tagging protocol operations are specific to each CPU port,
thus the dsa_device_ops pointer belongs to the dsa_port structure.

>From now on assign a slave's xmit copy from its CPU port tagging
operations. This will ease the future support for multiple CPU ports.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 include/net/dsa.h  | 3 +++
 net/dsa/dsa2.c     | 1 +
 net/dsa/dsa_priv.h | 2 +-
 net/dsa/legacy.c   | 1 +
 net/dsa/slave.c    | 3 +--
 5 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 8dee216a5a9b..6cd36dcb65e1 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -189,6 +189,9 @@ struct dsa_port {
 	 * Original copy of the master netdev ethtool_ops
 	 */
 	const struct ethtool_ops *orig_ethtool_ops;
+
+	/* CPU port tagging operations used by master or slave devices */
+	const struct dsa_device_ops *tag_ops;
 };
 
 struct dsa_switch {
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 6a10c5c1639f..9eac4726dc0c 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -522,6 +522,7 @@ static int dsa_cpu_parse(struct dsa_port *port, u32 index,
 		return PTR_ERR(tag_ops);
 	}
 
+	dst->cpu_dp->tag_ops = tag_ops;
 	dst->tag_ops = tag_ops;
 	dst->rcv = dst->tag_ops->rcv;
 
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index eccc62776283..6dbed23ffe83 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -66,7 +66,7 @@ struct dsa_notifier_vlan_info {
 };
 
 struct dsa_slave_priv {
-	/* Copy of dp->ds->dst->tag_ops->xmit for faster access in hot path */
+	/* Copy of CPU port xmit for faster access in slave transmit hot path */
 	struct sk_buff *	(*xmit)(struct sk_buff *skb,
 					struct net_device *dev);
 
diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c
index 8e849013f69d..4d374541815a 100644
--- a/net/dsa/legacy.c
+++ b/net/dsa/legacy.c
@@ -152,6 +152,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds,
 		if (IS_ERR(tag_ops))
 			return PTR_ERR(tag_ops);
 
+		dst->cpu_dp->tag_ops = tag_ops;
 		dst->tag_ops = tag_ops;
 		dst->rcv = dst->tag_ops->rcv;
 	}
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index bf8800de13c1..4b634db05cee 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1117,7 +1117,6 @@ int dsa_slave_resume(struct net_device *slave_dev)
 int dsa_slave_create(struct dsa_port *port, const char *name)
 {
 	struct dsa_switch *ds = port->ds;
-	struct dsa_switch_tree *dst = ds->dst;
 	struct net_device *master;
 	struct net_device *slave_dev;
 	struct dsa_slave_priv *p;
@@ -1162,7 +1161,7 @@ int dsa_slave_create(struct dsa_port *port, const char *name)
 	}
 	p->dp = port;
 	INIT_LIST_HEAD(&p->mall_tc_list);
-	p->xmit = dst->tag_ops->xmit;
+	p->xmit = cpu_dp->tag_ops->xmit;
 
 	p->old_pause = -1;
 	p->old_link = -1;
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next 6/8] net: dsa: prepare master receive hot path
From: Vivien Didelot @ 2017-09-29 18:36 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170929183635.8122-1-vivien.didelot@savoirfairelinux.com>

In preparation to make DSA master devices point to their corresponding
CPU port instead of the whole tree, add copies of dst and rcv in the
dsa_port structure so that we keep fast access in the receive hot path.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 include/net/dsa.h | 5 +++++
 net/dsa/dsa2.c    | 4 ++++
 net/dsa/legacy.c  | 4 ++++
 3 files changed, 13 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 6cd36dcb65e1..75dc4024f5a8 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -192,6 +192,11 @@ struct dsa_port {
 
 	/* CPU port tagging operations used by master or slave devices */
 	const struct dsa_device_ops *tag_ops;
+
+	/* Copies for faster access in master receive hot path */
+	struct dsa_switch_tree *dst;
+	struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev,
+			       struct packet_type *pt);
 };
 
 struct dsa_switch {
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 9eac4726dc0c..b71e3bb478e4 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -524,7 +524,11 @@ static int dsa_cpu_parse(struct dsa_port *port, u32 index,
 
 	dst->cpu_dp->tag_ops = tag_ops;
 	dst->tag_ops = tag_ops;
+
+	/* Make a few copies for faster access in master receive hot path */
+	dst->cpu_dp->rcv = dst->cpu_dp->tag_ops->rcv;
 	dst->rcv = dst->tag_ops->rcv;
+	dst->cpu_dp->dst = dst;
 
 	return 0;
 }
diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c
index 4d374541815a..96c7e3f8b8bb 100644
--- a/net/dsa/legacy.c
+++ b/net/dsa/legacy.c
@@ -154,7 +154,11 @@ static int dsa_switch_setup_one(struct dsa_switch *ds,
 
 		dst->cpu_dp->tag_ops = tag_ops;
 		dst->tag_ops = tag_ops;
+
+		/* Few copies for faster access in master receive hot path */
+		dst->cpu_dp->rcv = dst->cpu_dp->tag_ops->rcv;
 		dst->rcv = dst->tag_ops->rcv;
+		dst->cpu_dp->dst = dst;
 	}
 
 	memcpy(ds->rtable, cd->rtable, sizeof(ds->rtable));
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next 7/8] net: dsa: change dsa_ptr for a dsa_port
From: Vivien Didelot @ 2017-09-29 18:36 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170929183635.8122-1-vivien.didelot@savoirfairelinux.com>

With DSA, a master net device (CPU facing interface) has a dsa_ptr
pointer to which hangs a dsa_switch_tree. This is not correct because a
master interface is wired to a dedicated switch port, and because we can
theoretically have several master interfaces pointing to several CPU
ports of the same switch fabric.

Change the master interface's dsa_ptr for the CPU dsa_port pointer.
This is a step towards supporting multiple CPU ports. Also remove
dsa_get_cpu_port which is now unneeded.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 include/linux/netdevice.h |  4 ++--
 net/dsa/dsa.c             |  6 +++---
 net/dsa/dsa2.c            |  2 +-
 net/dsa/dsa_priv.h        |  5 -----
 net/dsa/legacy.c          |  2 +-
 net/dsa/master.c          | 15 +++++----------
 net/dsa/tag_brcm.c        |  3 +--
 net/dsa/tag_dsa.c         |  3 ++-
 net/dsa/tag_edsa.c        |  3 ++-
 net/dsa/tag_ksz.c         |  3 +--
 net/dsa/tag_lan9303.c     |  3 +--
 net/dsa/tag_mtk.c         |  3 +--
 net/dsa/tag_qca.c         |  3 +--
 net/dsa/tag_trailer.c     |  3 +--
 14 files changed, 22 insertions(+), 36 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1..e1d6ef130611 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -55,7 +55,7 @@
 struct netpoll_info;
 struct device;
 struct phy_device;
-struct dsa_switch_tree;
+struct dsa_port;
 
 /* 802.11 specific */
 struct wireless_dev;
@@ -1752,7 +1752,7 @@ struct net_device {
 	struct vlan_info __rcu	*vlan_info;
 #endif
 #if IS_ENABLED(CONFIG_NET_DSA)
-	struct dsa_switch_tree	*dsa_ptr;
+	struct dsa_port		*dsa_ptr;
 #endif
 #if IS_ENABLED(CONFIG_TIPC)
 	struct tipc_bearer __rcu *tipc_ptr;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 81c852e32821..51ca2a524a27 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -160,12 +160,12 @@ EXPORT_SYMBOL_GPL(dsa_dev_to_net_device);
 static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
 			  struct packet_type *pt, struct net_device *unused)
 {
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
 	struct sk_buff *nskb = NULL;
 	struct pcpu_sw_netstats *s;
 	struct dsa_slave_priv *p;
 
-	if (unlikely(dst == NULL)) {
+	if (unlikely(!cpu_dp)) {
 		kfree_skb(skb);
 		return 0;
 	}
@@ -174,7 +174,7 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
 	if (!skb)
 		return 0;
 
-	nskb = dst->rcv(skb, dev, pt);
+	nskb = cpu_dp->rcv(skb, dev, pt);
 	if (!nskb) {
 		kfree_skb(skb);
 		return 0;
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index b71e3bb478e4..62302558f38c 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -438,7 +438,7 @@ static int dsa_dst_apply(struct dsa_switch_tree *dst)
 	 * sent to the tag format's receive function.
 	 */
 	wmb();
-	dst->cpu_dp->netdev->dsa_ptr = dst;
+	dst->cpu_dp->netdev->dsa_ptr = dst->cpu_dp;
 
 	err = dsa_master_ethtool_setup(dst->cpu_dp->netdev);
 	if (err)
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 6dbed23ffe83..2495f7bdefa2 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -182,9 +182,4 @@ static inline struct net_device *dsa_master_netdev(struct dsa_slave_priv *p)
 	return p->dp->cpu_dp->netdev;
 }
 
-static inline struct dsa_port *dsa_get_cpu_port(struct dsa_switch_tree *dst)
-{
-	return dst->cpu_dp;
-}
-
 #endif
diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c
index 96c7e3f8b8bb..71917505a5cc 100644
--- a/net/dsa/legacy.c
+++ b/net/dsa/legacy.c
@@ -607,7 +607,7 @@ static int dsa_setup_dst(struct dsa_switch_tree *dst, struct net_device *dev,
 	 * sent to the tag format's receive function.
 	 */
 	wmb();
-	dev->dsa_ptr = dst;
+	dev->dsa_ptr = dst->cpu_dp;
 
 	return dsa_master_ethtool_setup(dst->cpu_dp->netdev);
 }
diff --git a/net/dsa/master.c b/net/dsa/master.c
index ef15d35f1574..5f3f57e372e0 100644
--- a/net/dsa/master.c
+++ b/net/dsa/master.c
@@ -16,8 +16,7 @@ static void dsa_master_get_ethtool_stats(struct net_device *dev,
 					 struct ethtool_stats *stats,
 					 uint64_t *data)
 {
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *cpu_dp = dst->cpu_dp;
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
 	const struct ethtool_ops *ops = cpu_dp->orig_ethtool_ops;
 	struct dsa_switch *ds = cpu_dp->ds;
 	int port = cpu_dp->index;
@@ -34,8 +33,7 @@ static void dsa_master_get_ethtool_stats(struct net_device *dev,
 
 static int dsa_master_get_sset_count(struct net_device *dev, int sset)
 {
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *cpu_dp = dst->cpu_dp;
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
 	const struct ethtool_ops *ops = cpu_dp->orig_ethtool_ops;
 	struct dsa_switch *ds = cpu_dp->ds;
 	int count = 0;
@@ -52,8 +50,7 @@ static int dsa_master_get_sset_count(struct net_device *dev, int sset)
 static void dsa_master_get_strings(struct net_device *dev, uint32_t stringset,
 				   uint8_t *data)
 {
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *cpu_dp = dst->cpu_dp;
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
 	const struct ethtool_ops *ops = cpu_dp->orig_ethtool_ops;
 	struct dsa_switch *ds = cpu_dp->ds;
 	int port = cpu_dp->index;
@@ -90,8 +87,7 @@ static void dsa_master_get_strings(struct net_device *dev, uint32_t stringset,
 
 int dsa_master_ethtool_setup(struct net_device *dev)
 {
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *cpu_dp = dst->cpu_dp;
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
 	struct dsa_switch *ds = cpu_dp->ds;
 	struct ethtool_ops *ops;
 
@@ -114,8 +110,7 @@ int dsa_master_ethtool_setup(struct net_device *dev)
 
 void dsa_master_ethtool_restore(struct net_device *dev)
 {
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *cpu_dp = dst->cpu_dp;
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
 
 	dev->ethtool_ops = cpu_dp->orig_ethtool_ops;
 	cpu_dp->orig_ethtool_ops = NULL;
diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index dbb016434ace..67e0e9fd3d16 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -92,8 +92,7 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, struct net_device *dev
 static struct sk_buff *brcm_tag_rcv(struct sk_buff *skb, struct net_device *dev,
 				    struct packet_type *pt)
 {
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *cpu_dp = dsa_get_cpu_port(dst);
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
 	struct dsa_switch *ds = cpu_dp->ds;
 	int source_port;
 	u8 *brcm_tag;
diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index fbf9ca954773..60679d64fd07 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -67,7 +67,8 @@ static struct sk_buff *dsa_xmit(struct sk_buff *skb, struct net_device *dev)
 static struct sk_buff *dsa_rcv(struct sk_buff *skb, struct net_device *dev,
 			       struct packet_type *pt)
 {
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
+	struct dsa_switch_tree *dst = cpu_dp->dst;
 	struct dsa_switch *ds;
 	u8 *dsa_header;
 	int source_device;
diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
index 76367ba1b2e2..15c75b31d0d5 100644
--- a/net/dsa/tag_edsa.c
+++ b/net/dsa/tag_edsa.c
@@ -80,7 +80,8 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, struct net_device *dev)
 static struct sk_buff *edsa_rcv(struct sk_buff *skb, struct net_device *dev,
 				struct packet_type *pt)
 {
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
+	struct dsa_switch_tree *dst = cpu_dp->dst;
 	struct dsa_switch *ds;
 	u8 *edsa_header;
 	int source_device;
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index 010ca0a336c4..fdf54b8379a2 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -80,8 +80,7 @@ static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
 static struct sk_buff *ksz_rcv(struct sk_buff *skb, struct net_device *dev,
 			       struct packet_type *pt)
 {
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *cpu_dp = dsa_get_cpu_port(dst);
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
 	struct dsa_switch *ds = cpu_dp->ds;
 	u8 *tag;
 	int source_port;
diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
index f0b51acf36ac..1134ff08ce63 100644
--- a/net/dsa/tag_lan9303.c
+++ b/net/dsa/tag_lan9303.c
@@ -71,8 +71,7 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev,
 			struct packet_type *pt)
 {
 	u16 *lan9303_tag;
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *cpu_dp = dst->cpu_dp;
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
 	struct dsa_switch *ds = cpu_dp->ds;
 	unsigned int source_port;
 
diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
index ef0364b82241..5c755c5b303f 100644
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -46,8 +46,7 @@ static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
 static struct sk_buff *mtk_tag_rcv(struct sk_buff *skb, struct net_device *dev,
 				   struct packet_type *pt)
 {
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *cpu_dp = dst->cpu_dp;
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
 	struct dsa_switch *ds = cpu_dp->ds;
 	int port;
 	__be16 *phdr, hdr;
diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c
index 1d4c70711c0f..98618ca6eef6 100644
--- a/net/dsa/tag_qca.c
+++ b/net/dsa/tag_qca.c
@@ -65,8 +65,7 @@ static struct sk_buff *qca_tag_xmit(struct sk_buff *skb, struct net_device *dev)
 static struct sk_buff *qca_tag_rcv(struct sk_buff *skb, struct net_device *dev,
 				   struct packet_type *pt)
 {
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *cpu_dp = dsa_get_cpu_port(dst);
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
 	struct dsa_switch *ds;
 	u8 ver;
 	int port;
diff --git a/net/dsa/tag_trailer.c b/net/dsa/tag_trailer.c
index d2fd4923aa3e..e185f5083558 100644
--- a/net/dsa/tag_trailer.c
+++ b/net/dsa/tag_trailer.c
@@ -58,8 +58,7 @@ static struct sk_buff *trailer_xmit(struct sk_buff *skb, struct net_device *dev)
 static struct sk_buff *trailer_rcv(struct sk_buff *skb, struct net_device *dev,
 				   struct packet_type *pt)
 {
-	struct dsa_switch_tree *dst = dev->dsa_ptr;
-	struct dsa_port *cpu_dp = dsa_get_cpu_port(dst);
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
 	struct dsa_switch *ds = cpu_dp->ds;
 	u8 *trailer;
 	int source_port;
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next 8/8] net: dsa: remove tag ops from the switch tree
From: Vivien Didelot @ 2017-09-29 18:36 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170929183635.8122-1-vivien.didelot@savoirfairelinux.com>

Now that the dsa_ptr is a dsa_port instance, there is no need to keep
the tag operations in the dsa_switch_tree structure. Remove it.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 include/net/dsa.h | 11 -----------
 net/dsa/dsa2.c    |  2 --
 net/dsa/legacy.c  |  2 --
 3 files changed, 15 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 75dc4024f5a8..bf1006f7c0aa 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -130,11 +130,6 @@ struct dsa_switch_tree {
 	 */
 	struct dsa_platform_data	*pd;
 
-	/* Copy of tag_ops->rcv for faster access in hot path */
-	struct sk_buff *	(*rcv)(struct sk_buff *skb,
-				       struct net_device *dev,
-				       struct packet_type *pt);
-
 	/*
 	 * The switch port to which the CPU is attached.
 	 */
@@ -144,12 +139,6 @@ struct dsa_switch_tree {
 	 * Data for the individual switch chips.
 	 */
 	struct dsa_switch	*ds[DSA_MAX_SWITCHES];
-
-	/*
-	 * Tagging protocol operations for adding and removing an
-	 * encapsulation tag.
-	 */
-	const struct dsa_device_ops *tag_ops;
 };
 
 /* TC matchall action types, only mirroring for now */
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 62302558f38c..54ed054777bd 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -523,11 +523,9 @@ static int dsa_cpu_parse(struct dsa_port *port, u32 index,
 	}
 
 	dst->cpu_dp->tag_ops = tag_ops;
-	dst->tag_ops = tag_ops;
 
 	/* Make a few copies for faster access in master receive hot path */
 	dst->cpu_dp->rcv = dst->cpu_dp->tag_ops->rcv;
-	dst->rcv = dst->tag_ops->rcv;
 	dst->cpu_dp->dst = dst;
 
 	return 0;
diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c
index 71917505a5cc..19ff6e0a21dc 100644
--- a/net/dsa/legacy.c
+++ b/net/dsa/legacy.c
@@ -153,11 +153,9 @@ static int dsa_switch_setup_one(struct dsa_switch *ds,
 			return PTR_ERR(tag_ops);
 
 		dst->cpu_dp->tag_ops = tag_ops;
-		dst->tag_ops = tag_ops;
 
 		/* Few copies for faster access in master receive hot path */
 		dst->cpu_dp->rcv = dst->cpu_dp->tag_ops->rcv;
-		dst->rcv = dst->tag_ops->rcv;
 		dst->cpu_dp->dst = dst;
 	}
 
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next] hv_netvsc: report stop_queue and wake_queue
From: Simon Xiao @ 2017-09-29 18:39 UTC (permalink / raw)
  To: netdev, davem, kys, haiyangz, sthemmin; +Cc: Simon Xiao

Report the numbers of events for stop_queue and wake_queue in
ethtool stats.

Example:
ethtool -S eth0
NIC statistics:
	...
	stop_queue: 7
	wake_queue: 7
	...

Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h |  2 ++
 drivers/net/hyperv/netvsc.c     | 12 ++++++++++--
 drivers/net/hyperv/netvsc_drv.c |  2 ++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 5176be7..6f550e1 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -686,6 +686,8 @@ struct netvsc_ethtool_stats {
 	unsigned long tx_busy;
 	unsigned long tx_send_full;
 	unsigned long rx_comp_busy;
+	unsigned long stop_queue;
+	unsigned long wake_queue;
 };
 
 struct netvsc_vf_pcpu_stats {
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index b0d323e..6e51949 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -609,6 +609,7 @@ static void netvsc_send_tx_complete(struct netvsc_device *net_device,
 {
 	struct sk_buff *skb = (struct sk_buff *)(unsigned long)desc->trans_id;
 	struct net_device *ndev = hv_get_drvdata(device);
+	struct net_device_context *ndev_ctx = netdev_priv(ndev);
 	struct vmbus_channel *channel = device->channel;
 	u16 q_idx = 0;
 	int queue_sends;
@@ -643,8 +644,10 @@ static void netvsc_send_tx_complete(struct netvsc_device *net_device,
 
 	if (netif_tx_queue_stopped(netdev_get_tx_queue(ndev, q_idx)) &&
 	    (hv_ringbuf_avail_percent(&channel->outbound) > RING_AVAIL_PERCENT_HIWATER ||
-	     queue_sends < 1))
+	     queue_sends < 1)) {
 		netif_tx_wake_queue(netdev_get_tx_queue(ndev, q_idx));
+		ndev_ctx->eth_stats.wake_queue++;
+	}
 }
 
 static void netvsc_send_completion(struct netvsc_device *net_device,
@@ -749,6 +752,7 @@ static inline int netvsc_send_pkt(
 		&net_device->chan_table[packet->q_idx];
 	struct vmbus_channel *out_channel = nvchan->channel;
 	struct net_device *ndev = hv_get_drvdata(device);
+	struct net_device_context *ndev_ctx = netdev_priv(ndev);
 	struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
 	u64 req_id;
 	int ret;
@@ -789,12 +793,16 @@ static inline int netvsc_send_pkt(
 	if (ret == 0) {
 		atomic_inc_return(&nvchan->queue_sends);
 
-		if (ring_avail < RING_AVAIL_PERCENT_LOWATER)
+		if (ring_avail < RING_AVAIL_PERCENT_LOWATER) {
 			netif_tx_stop_queue(txq);
+			ndev_ctx->eth_stats.stop_queue++;
+		}
 	} else if (ret == -EAGAIN) {
 		netif_tx_stop_queue(txq);
+		ndev_ctx->eth_stats.stop_queue++;
 		if (atomic_read(&nvchan->queue_sends) < 1) {
 			netif_tx_wake_queue(txq);
+			ndev_ctx->eth_stats.wake_queue++;
 			ret = -ENOSPC;
 		}
 	} else {
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index e9d54c9..f300ae6 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1126,6 +1126,8 @@ static const struct {
 	{ "tx_busy",	  offsetof(struct netvsc_ethtool_stats, tx_busy) },
 	{ "tx_send_full", offsetof(struct netvsc_ethtool_stats, tx_send_full) },
 	{ "rx_comp_busy", offsetof(struct netvsc_ethtool_stats, rx_comp_busy) },
+	{ "stop_queue", offsetof(struct netvsc_ethtool_stats, stop_queue) },
+	{ "wake_queue", offsetof(struct netvsc_ethtool_stats, wake_queue) },
 }, vf_stats[] = {
 	{ "vf_rx_packets", offsetof(struct netvsc_vf_pcpu_stats, rx_packets) },
 	{ "vf_rx_bytes",   offsetof(struct netvsc_vf_pcpu_stats, rx_bytes) },
-- 
2.7.4

^ permalink raw reply related

* Re: [net-next V2 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
From: Jakub Kicinski @ 2017-09-29 18:41 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Michael S. Tsirkin, Jason Wang, mchan, John Fastabend,
	peter.waskiewicz.jr, Daniel Borkmann, Alexei Starovoitov,
	Andy Gospodarek
In-Reply-To: <150670285218.23765.2480801081343646072.stgit@firesoul>

On Fri, 29 Sep 2017 18:34:12 +0200, Jesper Dangaard Brouer wrote:
> The 'cpumap' is primary used as a backend map for XDP BPF helper
> call bpf_redirect_map() and XDP_REDIRECT action, like 'devmap'.
> 
> This patch implement the main part of the map.  It is not connected to
> the XDP redirect system yet, and no SKB allocation are done yet.
> 
> The main concern in this patch is to ensure the datapath can run
> without any locking.  This adds complexity to the setup and tear-down
> procedure, which assumptions are extra carefully documented in the
> code comments.
> 
> V2: make sure array isn't larger than num possible CPUs
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Few trivial nitpicks, hope you don't mind :)

> @@ -0,0 +1,555 @@
> +/* bpf/cpumap.c
> + *
> + * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc.
> + * Released under terms in GPL version 2.  See COPYING.
> + */
> +
> +/* The 'cpumap' is primary used as a backend map for XDP BPF helper
> + * call bpf_redirect_map() and XDP_REDIRECT action, like 'devmap'.
> + *
> + * Unlike devmap which redirect XDP frames out another NIC device,
> + * this map type redirect raw XDP frames to another CPU.  The remote
> + * CPU will do SKB-allocation and call the normal network stack.
> + *
> + * This is a scalability and isolation mechanism, that allow
> + * separating the early driver network XDP layer, from the rest of the
> + * netstack, and assigning dedicated CPUs for this stage.  This
> + * basically allows for 10G wirespeed pre-filtering via bpf.
> + */
> +#include <linux/bpf.h>
> +#include <linux/filter.h>
> +#include <linux/ptr_ring.h>
> +
> +#include <linux/sched.h>
> +#include <linux/workqueue.h>
> +#include <linux/kthread.h>
> +
> +/*
> + * General idea: XDP packets getting XDP redirected to another CPU,
> + * will maximum be stored/queued for one driver ->poll() call.  It is
> + * guaranteed that setting flush bit and flush operation happen on
> + * same CPU.  Thus, cpu_map_flush operation can deduct via this_cpu_ptr()
> + * which queue in bpf_cpu_map_entry contains packets.
> + */
> +
> +#define CPU_MAP_BULK_SIZE 8  /* 8 == one cacheline on 64-bit archs */
> +struct xdp_bulk_queue {
> +	void *q[CPU_MAP_BULK_SIZE];
> +	unsigned int count;
> +};

Out of curiosity - would it make sense to make sure the entire struct
fits into a cache line?  The comment seems to indicate that the array is
sized to fit a cache line, but then there is also the count member...

> +/*
> + * After xchg pointer to bpf_cpu_map_entry, use the call_rcu() to
...

There is a mix for networking and non-networking style comments in this
file, is this intentional?

> +const struct bpf_map_ops cpu_map_ops = {
> +	.map_alloc		= cpu_map_alloc,
> +	.map_free		= cpu_map_free,
> +	.map_delete_elem	= cpu_map_delete_elem,
> +	.map_update_elem	= cpu_map_update_elem,
> +	.map_lookup_elem	= cpu_map_lookup_elem,
> +	.map_get_next_key	= cpu_map_get_next_key,
> +};
> +
> +

Extra new line.

> +/* Runs under RCU-read-side, plus in softirq under NAPI protection.
> + * Thus, safe percpu variable access.
> + */
> +static int bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_pkt *xdp_pkt)
> +{
> +	struct xdp_bulk_queue *bq = this_cpu_ptr(rcpu->bulkq);
> +
> +	if (unlikely(bq->count == CPU_MAP_BULK_SIZE)) {
> +		bq_flush_to_queue(rcpu, bq);
> +	}

Curly brackets not needed.

^ permalink raw reply

* RE: [PATCH RFC 3/5] Add KSZ8795 switch driver
From: Tristram.Ha @ 2017-09-29 18:45 UTC (permalink / raw)
  To: pavel
  Cc: andrew, muvarov, nathan.leigh.conrad, vivien.didelot, f.fainelli,
	netdev, linux-kernel, Woojung.Huh
In-Reply-To: <20170928152445.GC2482@amd>

> > > > > Similar code will be needed by other drivers, right?
> > > >
> > > > Although KSZ8795 and KSZ8895 may use the same code, the other
> > > > chips will have different code.
> > >
> > > Ok, please make sure code is shared between these two.
> >
> > The exact function probably cannot be shared between KSZ8795
> > and KSZ8895 drivers because although the register constants look
> > the same but their exact locations are different in the 2 chips.
> 
> Put the code into header as static inline and include it from both
> places?
> 

Although KSZ8795 and KSZ8895 share the same code when simulating
the PHY register access, even though the exact registers are not the
same, this code needs a little modification for another chip.  It also looks
too large to put in a header.

^ permalink raw reply

* Re: [PATCH RFC 3/5] Add KSZ8795 switch driver
From: Andrew Lunn @ 2017-09-29 18:53 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: David.Laight, muvarov, pavel, nathan.leigh.conrad, vivien.didelot,
	f.fainelli, netdev, linux-kernel, Woojung.Huh
In-Reply-To: <93AF473E2DA327428DE3D46B72B1E9FD4112CB89@CHN-SV-EXMX02.mchp-main.com>

> My concern is if a task is already running with SPI access to a lot
> of registers like reading the 32 MIB counters in every port of the
> switch, another register access has to wait until they are finished.

Why does it have to wait? Looking at the code in
ksz_get_ethtool_stats(), you don't take any mutex which will prevent
others from using the SPI bus. All there is is a mutex which prevents
two sets of ksz_get_ethtool_stats() at the same time.

So a PTP read could happen in parallel, and will not be blocked by MIB
reads. They should get interleaved access to the SPI bus.

       Andrew

^ permalink raw reply

* RE: [PATCH RFC 3/5] Add KSZ8795 switch driver
From: Tristram.Ha @ 2017-09-29 18:56 UTC (permalink / raw)
  To: pavel
  Cc: andrew, muvarov, nathan.leigh.conrad, vivien.didelot, f.fainelli,
	netdev, linux-kernel, Woojung.Huh
In-Reply-To: <20170928184059.GA2825@amd>

> On Mon 2017-09-18 20:27:13, Tristram.Ha@microchip.com wrote:
> > > > +/**
> > > > + * Some counters do not need to be read too often because they are
> less
> > > likely
> > > > + * to increase much.
> > > > + */
> > >
> > > What does comment mean? Are you caching statistics, and updating
> > > different values at different rates?
> > >
> >
> > There are 34 counters.  In normal case using generic bus I/O or PCI to read
> them
> > is very quick, but the switch is mostly accessed using SPI, or even I2C.  As
> the SPI
> > access is very slow and cannot run in interrupt context I keep worrying
> reading
> > the MIB counters in a loop for 5 or more ports will prevent other critical
> hardware
> > access from executing soon enough.  These accesses can be getting 1588
> PTP
> > timestamps and opening/closing ports.  (RSTP Conformance Test sends test
> traffic
> > to port supposed to be closed/opened after receiving specific RSTP
> > BPDU.)
> 
> Hmm. Ok, interesting.
> 
> I wonder how well this is going to work if userspace actively 'does
> something' with the switch.
> 
> It seems to me that even if your statistics code is careful not to do
> 'a lot' of accesses at the same time, userspace can use other parts of
> the driver to do the same, and thus cause same unwanted effects...

If the user calls "ethtool -S" in a tight loop the system will waste a lot of
CPU time, but this is more like a user error.
Another solution is not to schedule to read the MIB counters in that
function call.  I think I was doing a favor to update the MIB counters
sooner as the user probably wants to find out what is wrong with the
switch by reading the MIB counters and checking them several times.
For system tracking like SNMP I think it is likely a separate mechanism
is used to gather those information.  If I am wrong that function definitely
needs to be modified.

^ permalink raw reply

* Re: [PATCH v2] lib: fix multiple strlcpy definition
From: Stephen Hemminger @ 2017-09-29 18:58 UTC (permalink / raw)
  To: Baruch Siach; +Cc: netdev, Phil Sutter
In-Reply-To: <6910008ec8db3f63c8120b1624a08328cd203e92.1506621731.git.baruch@tkos.co.il>

On Thu, 28 Sep 2017 21:02:11 +0300
Baruch Siach <baruch@tkos.co.il> wrote:

> Some C libraries, like uClibc and musl, provide BSD compatible
> strlcpy(). Add check_strlcpy() to configure, and avoid defining strlcpy
> and strlcat when the C library provides them.
> 
> This fixes the following static link error with uClibc-ng:
> 
> .../sysroot/usr/lib/libc.a(strlcpy.os): In function `strlcpy':
> strlcpy.c:(.text+0x0): multiple definition of `strlcpy'
> ../lib/libutil.a(utils.o):utils.c:(.text+0x1ddc): first defined here
> collect2: error: ld returned 1 exit status
> 
> Acked-by: Phil Sutter <phil@nwl.cc>
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>

This is OK because it doesn't impact normal glibc too much.


> diff --git a/lib/Makefile b/lib/Makefile
> index 0fbdf4c31f50..132ad00c3335 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -1,5 +1,9 @@
>  include ../config.mk
>  
> +ifeq ($(NEED_STRLCPY),y)
> +	CFLAGS += -DNEED_STRLCPY
> +endif
> +
>

I just removed all the conditional CFLAGS out of subdirectory Makefiles
and moved them into the generated config.mk. Please do that for this
as well and resubmit.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox