Netdev List
* Re: [PATCH net-next v2 03/10] net: dsa: debugfs: add tag_protocol
From: Andrew Lunn @ 2017-08-28 20:16 UTC
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Egil Hjelmeland, John Crispin, Woojung Huh, Sean Wang,
	Nikita Yushchenko, Chris Healy
In-Reply-To: <20170828191748.19492-4-vivien.didelot@savoirfairelinux.com>

On Mon, Aug 28, 2017 at 03:17:41PM -0400, Vivien Didelot wrote:
> Add a debug filesystem "tag_protocol" entry to query the switch tagging
> protocol through the .get_tag_protocol operation.
> 
>     # cat switch1/tag_protocol
>     EDSA
> 
> To ease maintenance of tag protocols, add a dsa_tag_protocol_name helper
> to the public API to convert a tag protocol enum to a string.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

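A minimal userspace sketch of the kind of enum-to-string helper described above. The enum, its values and the function name `tag_protocol_name()` are hypothetical stand-ins; they are not the kernel's actual `enum dsa_tag_protocol` or the patch's `dsa_tag_protocol_name()` implementation:

```c
/* Hypothetical stand-in for a switch tagging protocol enum. */
enum tag_protocol {
	TAG_PROTO_NONE = 0,
	TAG_PROTO_DSA,
	TAG_PROTO_EDSA,
	TAG_PROTO_BRCM,
	TAG_PROTO_COUNT,	/* number of known values, not a protocol */
};

/* Designated initializers keep each string next to its enum value,
 * so adding a protocol means adding a single table entry. */
static const char *const tag_protocol_names[TAG_PROTO_COUNT] = {
	[TAG_PROTO_NONE] = "NONE",
	[TAG_PROTO_DSA]  = "DSA",
	[TAG_PROTO_EDSA] = "EDSA",
	[TAG_PROTO_BRCM] = "BRCM",
};

/* Return a printable name, or "unknown" for out-of-range values or
 * table gaps, instead of indexing past the array. */
static const char *tag_protocol_name(enum tag_protocol proto)
{
	if ((unsigned int)proto >= TAG_PROTO_COUNT || !tag_protocol_names[proto])
		return "unknown";
	return tag_protocol_names[proto];
}
```

With this sketch, `tag_protocol_name(TAG_PROTO_EDSA)` returns "EDSA", the string shown in the `cat switch1/tag_protocol` example.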

* Re: [PATCH net-next v2 00/10] net: dsa: add generic debugfs interface
From: Andrew Lunn @ 2017-08-28 20:08 UTC
  To: Jiri Pirko
  Cc: Vivien Didelot, netdev, linux-kernel, kernel, David S. Miller,
	Florian Fainelli, Egil Hjelmeland, John Crispin, Woojung Huh,
	Sean Wang, Nikita Yushchenko, Chris Healy, mlxsw
In-Reply-To: <20170828195332.GB1950@nanopsycho.orion>

> I see this overlaps a lot with DPIPE. Why won't you use that to expose
> your hw state?

We took a look at dpipe, and I talked to you about using it for this
sort of thing at netconf/netdev. But dpipe has issues displaying the
sort of information we have. I never figured out how to do
two-dimensional tables. The output of the dpipe command is pretty
unreadable. A lot of the information being dumped here is not about
the data pipe, etc.

There is a lot of pushback on debugfs for individual drivers. As I
said recently to somebody, debugfs is a bit of a wild west. When
designing this code, we thought about that. This debugfs is not at the
driver level. It is at the DSA level. All DSA drivers will benefit
from this code, and all DSA drivers will get the same information
exposed in debugfs. It is generic, well defined and structured, with
respect to DSA.

	Andrew


* Re: [PATCH net-next v2 01/10] net: dsa: add debugfs interface
From: Jiri Pirko @ 2017-08-28 20:05 UTC
  To: Florian Fainelli
  Cc: Vivien Didelot, netdev, linux-kernel, kernel, David S. Miller,
	Andrew Lunn, Egil Hjelmeland, John Crispin, Woojung Huh,
	Sean Wang, Nikita Yushchenko, Chris Healy
In-Reply-To: <b642a93c-1f63-522d-e9f2-c266b809623a@gmail.com>

Mon, Aug 28, 2017 at 09:58:12PM CEST, f.fainelli@gmail.com wrote:
>On 08/28/2017 12:50 PM, Jiri Pirko wrote:
>> Mon, Aug 28, 2017 at 09:17:39PM CEST, vivien.didelot@savoirfairelinux.com wrote:
>>> This commit adds a DEBUG_FS dependent DSA core file creating a generic
>>> debug filesystem interface for the DSA switch devices.
>>>
>>> The interface can be mounted with:
>>>
>>>    # mount -t debugfs none /sys/kernel/debug
>>>
>>> The dsa directory contains one directory per switch chip:
>>>
>>>    # cd /sys/kernel/debug/dsa/
>>>    # ls
>>>    switch0  switch1 switch2
>>>
>>> Each chip directory contains one directory per port:
>>>
>>>    # ls -l switch0/
>>>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port0
>>>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port1
>>>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port2
>>>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port5
>>>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port6
>>>
>>> Future patches will add entry files to these directories.
>>>
>>> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
>> 
>> Oh no, no debugfs please!
>> 
>> What do you need to expose? I'm sure we can find out some generic, well
>> defined and reusable way.
>
>We have no CPU or DSA (cross switches) net_device representors because
>those would be two ends of the same pipe so it would be both confusing

So? That is certainly not an argument for debugfs. Just have all ports
as devlink ports, and you can introduce a special new kind of port for the
CPU port. Note that a devlink port does not have to have a netdev association.


>and a duplication. For a CPU interface, one side goes to the switch, the
>other one is the master net_device (normal Ethernet MAC). For a DSA
>interface, one interface is on one switch, and the other is on the other
>switch.
>
>If you look at the patch series it's pretty obvious what is being exposed :)

Sure. But let's use existing interfaces and extend them if needed. Please
don't use some made-up debugfs mess. That is never the correct answer :/


* Re: [PATCH net-next v2 01/10] net: dsa: add debugfs interface
From: Florian Fainelli @ 2017-08-28 19:58 UTC
  To: Jiri Pirko, Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Andrew Lunn,
	Egil Hjelmeland, John Crispin, Woojung Huh, Sean Wang,
	Nikita Yushchenko, Chris Healy
In-Reply-To: <20170828195039.GA1950@nanopsycho.orion>

On 08/28/2017 12:50 PM, Jiri Pirko wrote:
> Mon, Aug 28, 2017 at 09:17:39PM CEST, vivien.didelot@savoirfairelinux.com wrote:
>> This commit adds a DEBUG_FS dependent DSA core file creating a generic
>> debug filesystem interface for the DSA switch devices.
>>
>> The interface can be mounted with:
>>
>>    # mount -t debugfs none /sys/kernel/debug
>>
>> The dsa directory contains one directory per switch chip:
>>
>>    # cd /sys/kernel/debug/dsa/
>>    # ls
>>    switch0  switch1 switch2
>>
>> Each chip directory contains one directory per port:
>>
>>    # ls -l switch0/
>>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port0
>>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port1
>>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port2
>>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port5
>>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port6
>>
>> Future patches will add entry files to these directories.
>>
>> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
> 
> Oh no, no debugfs please!
> 
> What do you need to expose? I'm sure we can find out some generic, well
> defined and reusable way.

We have no CPU or DSA (cross switches) net_device representors because
those would be two ends of the same pipe so it would be both confusing
and a duplication. For a CPU interface, one side goes to the switch, the
other one is the master net_device (normal Ethernet MAC). For a DSA
interface, one interface is on one switch, and the other is on the other
switch.

If you look at the patch series it's pretty obvious what is being exposed :)
-- 
Florian

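The dsa/switchN/portM/ layout quoted above can be addressed from userspace with a small path builder. This is only a sketch: it assumes debugfs is mounted at /sys/kernel/debug as in the patch description, and both the function name and the "stats" entry name are hypothetical, since this patch creates only the directories:

```c
#include <stddef.h>
#include <stdio.h>

/* Mount point used in the patch description. */
#define DEBUGFS_DSA_ROOT "/sys/kernel/debug/dsa"

/*
 * Build the path of a per-port debugfs entry following the
 * dsa/switchN/portM/ layout. The entry name is a placeholder chosen by
 * the caller. Returns 0 on success, -1 if the path does not fit in buf.
 */
static int dsa_debugfs_port_path(char *buf, size_t len, int sw, int port,
				 const char *entry)
{
	int n = snprintf(buf, len, DEBUGFS_DSA_ROOT "/switch%d/port%d/%s",
			 sw, port, entry);

	/* snprintf returns the untruncated length; >= len means truncated. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, switch 0, port 5 and entry "stats" yield /sys/kernel/debug/dsa/switch0/port5/stats.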

* Re: [PATCH net-next v2 00/10] net: dsa: add generic debugfs interface
From: Jiri Pirko @ 2017-08-28 19:53 UTC
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Egil Hjelmeland, John Crispin, Woojung Huh,
	Sean Wang, Nikita Yushchenko, Chris Healy, mlxsw
In-Reply-To: <20170828191748.19492-1-vivien.didelot@savoirfairelinux.com>

Mon, Aug 28, 2017 at 09:17:38PM CEST, vivien.didelot@savoirfairelinux.com wrote:
>This patch series adds a generic debugfs interface for the DSA
>framework, so that all switch devices benefit from it, e.g. Marvell,
>Broadcom, Microchip or any other DSA driver.
>
>This is really convenient for debugging, especially CPU ports and DSA
>links, which are not exposed to userspace as net devices. This interface
>is currently the only way to easily inspect the hardware for such ports.
>
>With the patch series, any switch device user is able to query the
>hardware for the supported tagging protocol, the ports stats and
>registers, as well as their FDB, MDB and VLAN entries.

I see this overlaps a lot with DPIPE. Why won't you use that to expose
your hw state?


* Re: [PATCH net-next v2 01/10] net: dsa: add debugfs interface
From: Jiri Pirko @ 2017-08-28 19:50 UTC
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Egil Hjelmeland, John Crispin, Woojung Huh,
	Sean Wang, Nikita Yushchenko, Chris Healy
In-Reply-To: <20170828191748.19492-2-vivien.didelot@savoirfairelinux.com>

Mon, Aug 28, 2017 at 09:17:39PM CEST, vivien.didelot@savoirfairelinux.com wrote:
>This commit adds a DEBUG_FS dependent DSA core file creating a generic
>debug filesystem interface for the DSA switch devices.
>
>The interface can be mounted with:
>
>    # mount -t debugfs none /sys/kernel/debug
>
>The dsa directory contains one directory per switch chip:
>
>    # cd /sys/kernel/debug/dsa/
>    # ls
>    switch0  switch1 switch2
>
>Each chip directory contains one directory per port:
>
>    # ls -l switch0/
>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port0
>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port1
>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port2
>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port5
>    drwxr-xr-x 2 root root 0 Jan  1 00:00 port6
>
>Future patches will add entry files to these directories.
>
>Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Oh no, no debugfs please!

What do you need to expose? I'm sure we can find out some generic, well
defined and reusable way.


* [PATCH net-next 4/4] nsh: add GSO support
From: Jiri Benc @ 2017-08-28 19:43 UTC
  To: netdev; +Cc: Yi Yang, Eric Garver, Jan Scheurich, Ben Pfaff
In-Reply-To: <cover.1503948295.git.jbenc@redhat.com>

Add a new nsh/ directory. It currently holds only GSO functions but more
will come: in particular, code shared by openvswitch and tc to manipulate
NSH headers.

For now, assume there's no hardware support for NSH segmentation. We can
always introduce netdev->nsh_features later.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 net/Kconfig      |  1 +
 net/Makefile     |  1 +
 net/nsh/Kconfig  |  9 ++++++
 net/nsh/Makefile |  1 +
 net/nsh/nsh.c    | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 103 insertions(+)
 create mode 100644 net/nsh/Kconfig
 create mode 100644 net/nsh/Makefile
 create mode 100644 net/nsh/nsh.c

diff --git a/net/Kconfig b/net/Kconfig
index 7d57ef34b79c..45def78912ce 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -235,6 +235,7 @@ source "net/openvswitch/Kconfig"
 source "net/vmw_vsock/Kconfig"
 source "net/netlink/Kconfig"
 source "net/mpls/Kconfig"
+source "net/nsh/Kconfig"
 source "net/hsr/Kconfig"
 source "net/switchdev/Kconfig"
 source "net/l3mdev/Kconfig"
diff --git a/net/Makefile b/net/Makefile
index bed80fa398b7..e03c3888179f 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -76,6 +76,7 @@ obj-$(CONFIG_NET_IFE)		+= ife/
 obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
 obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
 obj-$(CONFIG_MPLS)		+= mpls/
+obj-$(CONFIG_NET_NSH)		+= nsh/
 obj-$(CONFIG_HSR)		+= hsr/
 ifneq ($(CONFIG_NET_SWITCHDEV),)
 obj-y				+= switchdev/
diff --git a/net/nsh/Kconfig b/net/nsh/Kconfig
new file mode 100644
index 000000000000..bafc3dd60c2c
--- /dev/null
+++ b/net/nsh/Kconfig
@@ -0,0 +1,9 @@
+menuconfig NET_NSH
+	tristate "Network Service Header (NSH) protocol"
+	default n
+	---help---
+	  Network Service Header is an implementation of Service Function
+	  Chaining (RFC 7665). The current implementation in Linux supports
+	  only MD type 1 and only with the openvswitch module.
+
+	  If unsure, say N.
diff --git a/net/nsh/Makefile b/net/nsh/Makefile
new file mode 100644
index 000000000000..c93c787385ca
--- /dev/null
+++ b/net/nsh/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_NET_NSH) += nsh.o
diff --git a/net/nsh/nsh.c b/net/nsh/nsh.c
new file mode 100644
index 000000000000..58fb827439a8
--- /dev/null
+++ b/net/nsh/nsh.c
@@ -0,0 +1,91 @@
+/*
+ * Network Service Header
+ *
+ * Copyright (c) 2017 Red Hat, Inc. -- Jiri Benc <jbenc@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <net/nsh.h>
+#include <net/tun_proto.h>
+
+static struct sk_buff *nsh_gso_segment(struct sk_buff *skb,
+				       netdev_features_t features)
+{
+	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	unsigned int nsh_len, mac_len;
+	__be16 proto;
+	int nhoff;
+
+	skb_reset_network_header(skb);
+
+	nhoff = skb->network_header - skb->mac_header;
+	mac_len = skb->mac_len;
+
+	if (unlikely(!pskb_may_pull(skb, NSH_BASE_HDR_LEN)))
+		goto out;
+	nsh_len = nsh_hdr_len(nsh_hdr(skb));
+	if (unlikely(!pskb_may_pull(skb, nsh_len)))
+		goto out;
+
+	proto = tun_p_to_eth_p(nsh_hdr(skb)->np);
+	if (!proto)
+		goto out;
+
+	__skb_pull(skb, nsh_len);
+
+	skb_reset_mac_header(skb);
+	skb_reset_mac_len(skb);
+	skb->protocol = proto;
+
+	features &= NETIF_F_SG;
+	segs = skb_mac_gso_segment(skb, features);
+	if (IS_ERR_OR_NULL(segs)) {
+		skb_gso_error_unwind(skb, htons(ETH_P_NSH), nsh_len,
+				     skb->network_header - nhoff,
+				     mac_len);
+		goto out;
+	}
+
+	for (skb = segs; skb; skb = skb->next) {
+		skb->protocol = htons(ETH_P_NSH);
+		__skb_push(skb, nsh_len);
+		skb_set_mac_header(skb, -nhoff);
+		skb->network_header = skb->mac_header + mac_len;
+		skb->mac_len = mac_len;
+	}
+
+out:
+	return segs;
+}
+
+static struct packet_offload nsh_packet_offload __read_mostly = {
+	.type = htons(ETH_P_NSH),
+	.priority = 15,
+	.callbacks = {
+		.gso_segment = nsh_gso_segment,
+	},
+};
+
+static int __init nsh_init_module(void)
+{
+	dev_add_offload(&nsh_packet_offload);
+	return 0;
+}
+
+static void __exit nsh_cleanup_module(void)
+{
+	dev_remove_offload(&nsh_packet_offload);
+}
+
+module_init(nsh_init_module);
+module_exit(nsh_cleanup_module);
+
+MODULE_AUTHOR("Jiri Benc <jbenc@redhat.com>");
+MODULE_DESCRIPTION("NSH protocol");
+MODULE_LICENSE("GPL v2");
-- 
1.8.3.1

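A toy userspace model of what nsh_gso_segment() above achieves: the payload behind the NSH header is split into segments, and every segment carries the NSH header again (the patch restores it with __skb_push()). This is a simplification under stated assumptions: real GSO hands the inner packet to the inner protocol's gso_segment callback, which also fixes up inner headers, whereas this sketch only slices bytes. The function name and the flat-buffer interface are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/*
 * pkt/pkt_len: NSH header (nsh_len bytes) followed by the payload.
 * Segments of at most mss payload bytes are written back to back into
 * out, each preceded by an unchanged copy of the NSH header.
 * Returns the number of segments, or -1 on bad arguments or if out is
 * too small.
 */
static int nsh_toy_segment(const unsigned char *pkt, size_t pkt_len,
			   size_t nsh_len, size_t mss,
			   unsigned char *out, size_t out_len)
{
	const unsigned char *payload;
	size_t payload_len, off = 0, used = 0;
	int nsegs = 0;

	if (mss == 0 || nsh_len > pkt_len)
		return -1;
	payload = pkt + nsh_len;
	payload_len = pkt_len - nsh_len;

	while (off < payload_len) {
		size_t chunk = payload_len - off;

		if (chunk > mss)
			chunk = mss;
		/* Each segment needs room for its own copy of the header. */
		if (out_len - used < nsh_len + chunk)
			return -1;
		memcpy(out + used, pkt, nsh_len);
		memcpy(out + used + nsh_len, payload + off, chunk);
		used += nsh_len + chunk;
		off += chunk;
		nsegs++;
	}
	return nsegs;
}
```

With an 8-byte header, a 10-byte payload and mss 4, the sketch produces three segments carrying 4, 4 and 2 payload bytes, each behind its own header copy.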

* [PATCH net-next 3/4] net: add NSH header structures and helpers
From: Jiri Benc @ 2017-08-28 19:43 UTC
  To: netdev; +Cc: Yi Yang, Eric Garver, Jan Scheurich, Ben Pfaff
In-Reply-To: <cover.1503948295.git.jbenc@redhat.com>

From: Yi Yang <yi.y.yang@intel.com>

NSH (Network Service Header)[1] is a new protocol for service
function chaining. It can be handled as an L3 protocol like
IPv4 and IPv6; Eth + NSH + inner packet and VXLAN-GPE + NSH +
inner packet are two typical use cases.

This patch adds NSH header structures and helpers for NSH GSO
support and Open vSwitch NSH support.

[1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/

[Jiri: added nsh_hdr() helper and renamed the header struct to "struct
nshhdr" to match the usual pattern. Removed packet type defines, these are
now shared with VXLAN-GPE.]

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 include/net/nsh.h | 307 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 307 insertions(+)
 create mode 100644 include/net/nsh.h

diff --git a/include/net/nsh.h b/include/net/nsh.h
new file mode 100644
index 000000000000..a1eaea20be96
--- /dev/null
+++ b/include/net/nsh.h
@@ -0,0 +1,307 @@
+#ifndef __NET_NSH_H
+#define __NET_NSH_H 1
+
+#include <linux/skbuff.h>
+
+/*
+ * Network Service Header:
+ *  0                   1                   2                   3
+ *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |Ver|O|U|    TTL    |   Length  |U|U|U|U|MD Type| Next Protocol |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |          Service Path Identifier (SPI)        | Service Index |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                                                               |
+ * ~               Mandatory/Optional Context Headers              ~
+ * |                                                               |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * Version: The version field is used to ensure backward compatibility
+ * going forward with future NSH specification updates.  It MUST be set
+ * to 0x0 by the sender, in this first revision of NSH.  Given the
+ * widespread implementation of existing hardware that uses the first
+ * nibble after an MPLS label stack for ECMP decision processing, this
+ * document reserves version 01b and this value MUST NOT be used in
+ * future versions of the protocol.  Please see [RFC7325] for further
+ * discussion of MPLS-related forwarding requirements.
+ *
+ * O bit: Setting this bit indicates an Operations, Administration, and
+ * Maintenance (OAM) packet.  The actual format and processing of SFC
+ * OAM packets is outside the scope of this specification (see for
+ * example [I-D.ietf-sfc-oam-framework] for one approach).
+ *
+ * The O bit MUST be set for OAM packets and MUST NOT be set for non-OAM
+ * packets.  The O bit MUST NOT be modified along the SFP.
+ *
+ * SF/SFF/SFC Proxy/Classifier implementations that do not support SFC
+ * OAM procedures SHOULD discard packets with O bit set, but MAY support
+ * a configurable parameter to enable forwarding received SFC OAM
+ * packets unmodified to the next element in the chain.  Forwarding OAM
+ * packets unmodified by SFC elements that do not support SFC OAM
+ * procedures may be acceptable for a subset of OAM functions, but can
+ * result in unexpected outcomes for others, thus it is recommended to
+ * analyze the impact of forwarding an OAM packet for all OAM functions
+ * prior to enabling this behavior.  The configurable parameter MUST be
+ * disabled by default.
+ *
+ * TTL: Indicates the maximum SFF hops for an SFP.  This field is used
+ * for service plane loop detection.  The initial TTL value SHOULD be
+ * configurable via the control plane; the configured initial value can
+ * be specific to one or more SFPs.  If no initial value is explicitly
+ * provided, the default initial TTL value of 63 MUST be used.  Each SFF
+ * involved in forwarding an NSH packet MUST decrement the TTL value by
+ * 1 prior to NSH forwarding lookup.  Decrementing by 1 from an incoming
+ * value of 0 shall result in a TTL value of 63.  The packet MUST NOT be
+ * forwarded if TTL is, after decrement, 0.
+ *
+ * All other flag fields, marked U, are unassigned and available for
+ * future use, see Section 11.2.1.  Unassigned bits MUST be set to zero
+ * upon origination, and MUST be ignored and preserved unmodified by
+ * other NSH supporting elements.  Elements which do not understand the
+ * meaning of any of these bits MUST NOT modify their actions based on
+ * those unknown bits.
+ *
+ * Length: The total length, in 4-byte words, of NSH including the Base
+ * Header, the Service Path Header, the Fixed Length Context Header or
+ * Variable Length Context Header(s).  The length MUST be 0x6 for MD
+ * Type equal to 0x1, and MUST be 0x2 or greater for MD Type equal to
+ * 0x2.  The length of the NSH header MUST be an integer multiple of 4
+ * bytes, thus variable length metadata is always padded out to a
+ * multiple of 4 bytes.
+ *
+ * MD Type: Indicates the format of NSH beyond the mandatory Base Header
+ * and the Service Path Header.  MD Type defines the format of the
+ * metadata being carried.
+ *
+ * 0x0 - This is a reserved value.  Implementations SHOULD silently
+ * discard packets with MD Type 0x0.
+ *
+ * 0x1 - This indicates that the format of the header includes a fixed
+ * length Context Header (see Figure 4 below).
+ *
+ * 0x2 - This does not mandate any headers beyond the Base Header and
+ * Service Path Header, but may contain optional variable length Context
+ * Header(s).  The semantics of the variable length Context Header(s)
+ * are not defined in this document.  The format of the optional
+ * variable length Context Headers is provided in Section 2.5.1.
+ *
+ * 0xF - This value is reserved for experimentation and testing, as per
+ * [RFC3692].  Implementations not explicitly configured to be part of
+ * an experiment SHOULD silently discard packets with MD Type 0xF.
+ *
+ * Next Protocol: indicates the protocol type of the encapsulated data.
+ * NSH does not alter the inner payload, and the semantics on the inner
+ * protocol remain unchanged due to NSH service function chaining.
+ * Please see the IANA Considerations section below, Section 11.2.5.
+ *
+ * This document defines the following Next Protocol values:
+ *
+ * 0x1: IPv4
+ * 0x2: IPv6
+ * 0x3: Ethernet
+ * 0x4: NSH
+ * 0x5: MPLS
+ * 0xFE: Experiment 1
+ * 0xFF: Experiment 2
+ *
+ * Packets with Next Protocol values not supported SHOULD be silently
+ * dropped by default, although an implementation MAY provide a
+ * configuration parameter to forward them.  Additionally, an
+ * implementation not explicitly configured for a specific experiment
+ * [RFC3692] SHOULD silently drop packets with Next Protocol values 0xFE
+ * and 0xFF.
+ *
+ * Service Path Identifier (SPI): Identifies a service path.
+ * Participating nodes MUST use this identifier for Service Function
+ * Path selection.  The initial classifier MUST set the appropriate SPI
+ * for a given classification result.
+ *
+ * Service Index (SI): Provides location within the SFP.  The initial
+ * classifier for a given SFP SHOULD set the SI to 255, however the
+ * control plane MAY configure the initial value of SI as appropriate
+ * (i.e., taking into account the length of the service function path).
+ * The Service Index MUST be decremented by a value of 1 by Service
+ * Functions or by SFC Proxy nodes after performing required services
+ * and the new decremented SI value MUST be used in the egress packet's
+ * NSH.  The initial Classifier MUST send the packet to the first SFF in
+ * the identified SFP for forwarding along an SFP.  If re-classification
+ * occurs, and that re-classification results in a new SPI, the
+ * (re)classifier is, in effect, the initial classifier for the
+ * resultant SPI.
+ *
+ * The SI is used in conjunction with the Service Path Identifier for
+ * Service Function Path Selection and for determining the next SFF/SF
+ * in the path.  The SI is also valuable when troubleshooting or
+ * reporting service paths.  Additionally, while the TTL field is the
+ * main mechanism for service plane loop detection, the SI can also be
+ * used for detecting service plane loops.
+ *
+ * When the Base Header specifies MD Type = 0x1, a Fixed Length Context
+ * Header (16-bytes) MUST be present immediately following the Service
+ * Path Header. The value of a Fixed Length Context
+ * Header that carries no metadata MUST be set to zero.
+ *
+ * When the base header specifies MD Type = 0x2, zero or more Variable
+ * Length Context Headers MAY be added, immediately following the
+ * Service Path Header (see Figure 5).  Therefore, Length = 0x2,
+ * indicates that only the Base Header followed by the Service Path
+ * Header are present.  The optional Variable Length Context Headers
+ * MUST be of an integer number of 4-bytes.  The base header Length
+ * field MUST be used to determine the offset to locate the original
+ * packet or frame for SFC nodes that require access to that
+ * information.
+ *
+ * The format of the optional variable length Context Headers
+ *
+ *  0                   1                   2                   3
+ *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |          Metadata Class       |      Type     |U|    Length   |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                      Variable Metadata                        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * Metadata Class (MD Class): Defines the scope of the 'Type' field to
+ * provide a hierarchical namespace.  The IANA Considerations
+ * Section 11.2.4 defines how the MD Class values can be allocated to
+ * standards bodies, vendors, and others.
+ *
+ * Type: Indicates the explicit type of metadata being carried.  The
+ * definition of the Type is the responsibility of the MD Class owner.
+ *
+ * Unassigned bit: One unassigned bit is available for future use. This
+ * bit MUST NOT be set, and MUST be ignored on receipt.
+ *
+ * Length: Indicates the length of the variable metadata, in bytes.  In
+ * case the metadata length is not an integer number of 4-byte words,
+ * the sender MUST add pad bytes immediately following the last metadata
+ * byte to extend the metadata to an integer number of 4-byte words.
+ * The receiver MUST round up the length field to the nearest 4-byte
+ * word boundary, to locate and process the next field in the packet.
+ * The receiver MUST access only those bytes in the metadata indicated
+ * by the length field (i.e., actual number of bytes) and MUST ignore
+ * the remaining bytes up to the nearest 4-byte word boundary.  The
+ * Length may be 0 or greater.
+ *
+ * A value of 0 denotes a Context Header without a Variable Metadata
+ * field.
+ *
+ * [0] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/
+ */
+
+/**
+ * struct nsh_md1_ctx - Keeps track of NSH context data
+ * @context: NSH context headers (four 32-bit words for MD type 1).
+ */
+struct nsh_md1_ctx {
+	__be32 context[4];
+};
+
+struct nsh_md2_tlv {
+	__be16 md_class;
+	u8 type;
+	u8 length;
+	u8 md_value[];
+};
+
+struct nshhdr {
+	__be16 ver_flags_ttl_len;
+	u8 mdtype;
+	u8 np;
+	__be32 path_hdr;
+	union {
+	    struct nsh_md1_ctx md1;
+	    struct nsh_md2_tlv md2;
+	};
+};
+
+/* Masking NSH header fields. */
+#define NSH_VER_MASK       0xc000
+#define NSH_VER_SHIFT      14
+#define NSH_FLAGS_MASK     0x3000
+#define NSH_FLAGS_SHIFT    12
+#define NSH_TTL_MASK       0x0fc0
+#define NSH_TTL_SHIFT      6
+#define NSH_LEN_MASK       0x003f
+#define NSH_LEN_SHIFT      0
+
+#define NSH_MDTYPE_MASK    0x0f
+#define NSH_MDTYPE_SHIFT   0
+
+#define NSH_SPI_MASK       0xffffff00
+#define NSH_SPI_SHIFT      8
+#define NSH_SI_MASK        0x000000ff
+#define NSH_SI_SHIFT       0
+
+/* MD Type Registry. */
+#define NSH_M_TYPE1     0x01
+#define NSH_M_TYPE2     0x02
+#define NSH_M_EXP1      0xFE
+#define NSH_M_EXP2      0xFF
+
+/* NSH Base Header Length */
+#define NSH_BASE_HDR_LEN  8
+
+/* NSH MD Type 1 header Length. */
+#define NSH_M_TYPE1_LEN   24
+
+/* NSH header maximum Length. */
+#define NSH_HDR_MAX_LEN 256
+
+/* NSH context headers maximum Length. */
+#define NSH_CTX_HDRS_MAX_LEN 248
+
+static inline struct nshhdr *nsh_hdr(struct sk_buff *skb)
+{
+	return (struct nshhdr *)skb_network_header(skb);
+}
+
+static inline u16 nsh_hdr_len(const struct nshhdr *nsh)
+{
+	return ((ntohs(nsh->ver_flags_ttl_len) & NSH_LEN_MASK)
+		>> NSH_LEN_SHIFT) << 2;
+}
+
+static inline u8 nsh_get_ver(const struct nshhdr *nsh)
+{
+	return (ntohs(nsh->ver_flags_ttl_len) & NSH_VER_MASK)
+		>> NSH_VER_SHIFT;
+}
+
+static inline u8 nsh_get_flags(const struct nshhdr *nsh)
+{
+	return (ntohs(nsh->ver_flags_ttl_len) & NSH_FLAGS_MASK)
+		>> NSH_FLAGS_SHIFT;
+}
+
+static inline u8 nsh_get_ttl(const struct nshhdr *nsh)
+{
+	return (ntohs(nsh->ver_flags_ttl_len) & NSH_TTL_MASK)
+		>> NSH_TTL_SHIFT;
+}
+
+static inline void __nsh_set_xflag(struct nshhdr *nsh, u16 xflag, u16 xmask)
+{
+	nsh->ver_flags_ttl_len
+		= (nsh->ver_flags_ttl_len & ~htons(xmask)) | htons(xflag);
+}
+
+static inline void nsh_set_flags_and_ttl(struct nshhdr *nsh, u8 flags, u8 ttl)
+{
+	__nsh_set_xflag(nsh, ((flags << NSH_FLAGS_SHIFT) & NSH_FLAGS_MASK) |
+			     ((ttl << NSH_TTL_SHIFT) & NSH_TTL_MASK),
+			NSH_FLAGS_MASK | NSH_TTL_MASK);
+}
+
+static inline void nsh_set_flags_ttl_len(struct nshhdr *nsh, u8 flags,
+					 u8 ttl, u8 len)
+{
+	len = len >> 2;
+	__nsh_set_xflag(nsh, ((flags << NSH_FLAGS_SHIFT) & NSH_FLAGS_MASK) |
+			     ((ttl << NSH_TTL_SHIFT) & NSH_TTL_MASK) |
+			     ((len << NSH_LEN_SHIFT) & NSH_LEN_MASK),
+			NSH_FLAGS_MASK | NSH_TTL_MASK | NSH_LEN_MASK);
+}
+
+#endif /* __NET_NSH_H */
-- 
1.8.3.1

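For readers decoding NSH by hand, a userspace sketch of the base and service path header fields, reusing the mask values from include/net/nsh.h above. It reads raw bytes in network byte order instead of casting to struct nshhdr, which avoids alignment concerns in a standalone program; struct nsh_fields, nsh_parse() and nsh_ttl_decrement() are hypothetical names, not part of the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Mask values copied from the patch; they apply to the first 16-bit
 * word of the base header in host byte order. */
#define NSH_VER_MASK     0xc000
#define NSH_VER_SHIFT    14
#define NSH_FLAGS_MASK   0x3000
#define NSH_FLAGS_SHIFT  12
#define NSH_TTL_MASK     0x0fc0
#define NSH_TTL_SHIFT    6
#define NSH_LEN_MASK     0x003f
#define NSH_MDTYPE_MASK  0x0f
#define NSH_BASE_HDR_LEN 8

/* Decoded view of the base and service path headers. */
struct nsh_fields {
	uint8_t ver, flags, ttl, mdtype, np, si;
	uint32_t spi;
	unsigned int hdr_len;	/* total NSH length in bytes */
};

/* Returns 0 on success, -1 if the buffer is too short or the Length
 * field is smaller than the base plus service path header. */
static int nsh_parse(const uint8_t *p, size_t len, struct nsh_fields *f)
{
	uint16_t w;

	if (len < NSH_BASE_HDR_LEN)
		return -1;
	w = (uint16_t)((p[0] << 8) | p[1]);
	f->ver = (w & NSH_VER_MASK) >> NSH_VER_SHIFT;
	f->flags = (w & NSH_FLAGS_MASK) >> NSH_FLAGS_SHIFT;
	f->ttl = (w & NSH_TTL_MASK) >> NSH_TTL_SHIFT;
	f->hdr_len = (w & NSH_LEN_MASK) << 2;	/* 4-byte words -> bytes */
	f->mdtype = p[2] & NSH_MDTYPE_MASK;
	f->np = p[3];
	/* SPI is the upper 24 bits of the path header, SI the lowest 8. */
	f->spi = ((uint32_t)p[4] << 16) | ((uint32_t)p[5] << 8) | p[6];
	f->si = p[7];

	if (f->hdr_len < NSH_BASE_HDR_LEN || f->hdr_len > len)
		return -1;
	return 0;
}

/* Draft TTL rule: decrement modulo 64, so 0 becomes 63. The caller must
 * not forward the packet if the result is 0. */
static uint8_t nsh_ttl_decrement(uint8_t ttl)
{
	return (uint8_t)((ttl - 1) & 0x3f);
}
```

For an MD type 1 header with Length 6, hdr_len comes out as 24 bytes, matching NSH_M_TYPE1_LEN. Unlike nsh_gso_segment() in patch 4/4, the sketch also rejects a Length smaller than the 8-byte base plus service path header.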
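MD type 2 carries zero or more variable-length context headers, each padded to a 4-byte boundary. A hedged userspace sketch of walking them, following the context header diagram in the comment above (7-bit Length, with the unassigned bit as the top bit of the fourth byte); struct nsh_md2_tlv above exposes that byte as a whole `length` field, so the sketch masks it explicitly. The helper names are hypothetical, and the patch itself provides no such walker:

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed part of a context header: Metadata Class (16 bits),
 * Type (8 bits), U bit + Length (8 bits). */
#define NSH_MD2_TLV_HDR_LEN 4

/* Round a metadata length up to the next multiple of 4 bytes, as the
 * draft requires receivers to do before locating the next header. */
static size_t nsh_md2_padded_len(uint8_t len)
{
	return ((size_t)len + 3) & ~(size_t)3;
}

/*
 * Count the context headers in a context area of ctx_len bytes (the
 * NSH length minus the 8-byte base and service path headers).
 * Returns the count, or -1 if a header runs past the end of the area.
 */
static int nsh_md2_count_tlvs(const uint8_t *ctx, size_t ctx_len)
{
	size_t off = 0;
	int count = 0;

	while (off < ctx_len) {
		uint8_t md_len;

		if (ctx_len - off < NSH_MD2_TLV_HDR_LEN)
			return -1;
		/* Low 7 bits are Length; the top bit is unassigned and
		 * must be ignored on receipt. */
		md_len = ctx[off + 3] & 0x7f;
		off += NSH_MD2_TLV_HDR_LEN;
		if (nsh_md2_padded_len(md_len) > ctx_len - off)
			return -1;
		off += nsh_md2_padded_len(md_len);
		count++;
	}
	return count;
}
```

A context header with Length 5 therefore occupies 4 + 8 bytes, and one with Length 0 occupies just its 4-byte header, as the draft allows.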

* [PATCH net-next 2/4] vxlan: factor out VXLAN-GPE next protocol
From: Jiri Benc @ 2017-08-28 19:43 UTC
  To: netdev; +Cc: Yi Yang, Eric Garver, Jan Scheurich, Ben Pfaff
In-Reply-To: <cover.1503948295.git.jbenc@redhat.com>

The values are shared between VXLAN-GPE and NSH. Originally this was probably
a coincidence, but I notified both working groups about this last year and
they seem to have kept the values in sync since then.

Hopefully they'll get a single IANA registry for the values, too. (I asked
them for that.)

Factor out the code to be shared by the NSH implementation.

NSH and MPLS values are added in this patch, too. For MPLS, the drafts
incorrectly assign only a single value, while we have two MPLS ethertypes.
I raised the problem with both groups. For now, I assume the value is for
unicast.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 drivers/net/vxlan.c     | 32 +++++++-------------------------
 include/net/tun_proto.h | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/net/vxlan.h     |  6 ------
 3 files changed, 56 insertions(+), 31 deletions(-)
 create mode 100644 include/net/tun_proto.h

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ae3a1da703c2..d7c49cf1d5e9 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -26,6 +26,7 @@
 #include <net/inet_ecn.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
+#include <net/tun_proto.h>
 #include <net/vxlan.h>
 
 #if IS_ENABLED(CONFIG_IPV6)
@@ -1261,19 +1262,9 @@ static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
 	if (gpe->oam_flag)
 		return false;
 
-	switch (gpe->next_protocol) {
-	case VXLAN_GPE_NP_IPV4:
-		*protocol = htons(ETH_P_IP);
-		break;
-	case VXLAN_GPE_NP_IPV6:
-		*protocol = htons(ETH_P_IPV6);
-		break;
-	case VXLAN_GPE_NP_ETHERNET:
-		*protocol = htons(ETH_P_TEB);
-		break;
-	default:
+	*protocol = tun_p_to_eth_p(gpe->next_protocol);
+	if (!*protocol)
 		return false;
-	}
 
 	unparsed->vx_flags &= ~VXLAN_GPE_USED_BITS;
 	return true;
@@ -1799,19 +1790,10 @@ static int vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 vxflags,
 	struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)vxh;
 
 	gpe->np_applied = 1;
-
-	switch (protocol) {
-	case htons(ETH_P_IP):
-		gpe->next_protocol = VXLAN_GPE_NP_IPV4;
-		return 0;
-	case htons(ETH_P_IPV6):
-		gpe->next_protocol = VXLAN_GPE_NP_IPV6;
-		return 0;
-	case htons(ETH_P_TEB):
-		gpe->next_protocol = VXLAN_GPE_NP_ETHERNET;
-		return 0;
-	}
-	return -EPFNOSUPPORT;
+	gpe->next_protocol = tun_p_from_eth_p(protocol);
+	if (!gpe->next_protocol)
+		return -EPFNOSUPPORT;
+	return 0;
 }
 
 static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
diff --git a/include/net/tun_proto.h b/include/net/tun_proto.h
new file mode 100644
index 000000000000..2ea3deba4c99
--- /dev/null
+++ b/include/net/tun_proto.h
@@ -0,0 +1,49 @@
+#ifndef __NET_TUN_PROTO_H
+#define __NET_TUN_PROTO_H
+
+#include <linux/kernel.h>
+
+/* One byte protocol values as defined by VXLAN-GPE and NSH. These will
+ * hopefully get a shared IANA registry.
+ */
+#define TUN_P_IPV4      0x01
+#define TUN_P_IPV6      0x02
+#define TUN_P_ETHERNET  0x03
+#define TUN_P_NSH       0x04
+#define TUN_P_MPLS_UC   0x05
+
+static inline __be16 tun_p_to_eth_p(u8 proto)
+{
+	switch (proto) {
+	case TUN_P_IPV4:
+		return htons(ETH_P_IP);
+	case TUN_P_IPV6:
+		return htons(ETH_P_IPV6);
+	case TUN_P_ETHERNET:
+		return htons(ETH_P_TEB);
+	case TUN_P_NSH:
+		return htons(ETH_P_NSH);
+	case TUN_P_MPLS_UC:
+		return htons(ETH_P_MPLS_UC);
+	}
+	return 0;
+}
+
+static inline u8 tun_p_from_eth_p(__be16 proto)
+{
+	switch (proto) {
+	case htons(ETH_P_IP):
+		return TUN_P_IPV4;
+	case htons(ETH_P_IPV6):
+		return TUN_P_IPV6;
+	case htons(ETH_P_TEB):
+		return TUN_P_ETHERNET;
+	case htons(ETH_P_NSH):
+		return TUN_P_NSH;
+	case htons(ETH_P_MPLS_UC):
+		return TUN_P_MPLS_UC;
+	}
+	return 0;
+}
+
+#endif
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 3f430e38ab82..4e3876dde295 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -168,12 +168,6 @@ struct vxlanhdr_gpe {
 #define VXLAN_GPE_USED_BITS (VXLAN_HF_VER | VXLAN_HF_NP | VXLAN_HF_OAM | \
 			     cpu_to_be32(0xff))
 
-/* VXLAN-GPE header Next Protocol. */
-#define VXLAN_GPE_NP_IPV4      0x01
-#define VXLAN_GPE_NP_IPV6      0x02
-#define VXLAN_GPE_NP_ETHERNET  0x03
-#define VXLAN_GPE_NP_NSH       0x04
-
 struct vxlan_metadata {
 	u32		gbp;
 };
-- 
1.8.3.1
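
The tun_proto.h helpers above map between two numberings. One is the one-byte next-protocol values used by VXLAN-GPE and NSH. The other is Ethernet ethertypes in network byte order. Below is a minimal userspace mirror of that mapping. It also includes a copy of the "0 means unsupported" check that vxlan_build_gpe_hdr() now relies on. It is a sketch, not the kernel code. The ETHTYPE_* macro names and set_next_protocol() are hypothetical and defined locally, so the example does not depend on a <linux/if_ether.h> that already carries ETH_P_NSH.

```c
#include <arpa/inet.h>	/* htons(), ntohs() */
#include <assert.h>
#include <errno.h>	/* EPFNOSUPPORT */
#include <stdint.h>

/* Ethertypes in host byte order. Local, hypothetical names so the sketch
 * does not need a <linux/if_ether.h> recent enough to define ETH_P_NSH. */
#define ETHTYPE_IP	0x0800
#define ETHTYPE_IPV6	0x86DD
#define ETHTYPE_TEB	0x6558
#define ETHTYPE_MPLS_UC	0x8847
#define ETHTYPE_NSH	0x894F

/* One-byte next-protocol values, as listed in tun_proto.h. */
#define TUN_P_IPV4	0x01
#define TUN_P_IPV6	0x02
#define TUN_P_ETHERNET	0x03
#define TUN_P_NSH	0x04
#define TUN_P_MPLS_UC	0x05

/* Network-order ethertype -> tunnel protocol byte, 0 if there is none.
 * The value is converted to host order first because a userspace htons()
 * is not guaranteed to be usable in a case label. */
static uint8_t tun_p_from_eth_p(uint16_t proto_be)
{
	switch (ntohs(proto_be)) {
	case ETHTYPE_IP:	return TUN_P_IPV4;
	case ETHTYPE_IPV6:	return TUN_P_IPV6;
	case ETHTYPE_TEB:	return TUN_P_ETHERNET;
	case ETHTYPE_NSH:	return TUN_P_NSH;
	case ETHTYPE_MPLS_UC:	return TUN_P_MPLS_UC;
	default:		return 0;
	}
}

/* Tunnel protocol byte -> network-order ethertype, 0 if there is none. */
static uint16_t tun_p_to_eth_p(uint8_t proto)
{
	switch (proto) {
	case TUN_P_IPV4:	return htons(ETHTYPE_IP);
	case TUN_P_IPV6:	return htons(ETHTYPE_IPV6);
	case TUN_P_ETHERNET:	return htons(ETHTYPE_TEB);
	case TUN_P_NSH:		return htons(ETHTYPE_NSH);
	case TUN_P_MPLS_UC:	return htons(ETHTYPE_MPLS_UC);
	default:		return 0;
	}
}

/* Same shape as the refactored vxlan_build_gpe_hdr(): a zero mapping
 * means the payload protocol cannot be carried. */
static int set_next_protocol(uint8_t *np, uint16_t proto_be)
{
	*np = tun_p_from_eth_p(proto_be);
	return *np ? 0 : -EPFNOSUPPORT;
}
```

Both helpers can use 0 as the "no mapping" result because 0 is not one of the listed values in either direction.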


* [PATCH net-next 1/4] ether: add NSH ethertype
From: Jiri Benc @ 2017-08-28 19:43 UTC (permalink / raw)
  To: netdev; +Cc: Yi Yang, Eric Garver, Jan Scheurich, Ben Pfaff
In-Reply-To: <cover.1503948295.git.jbenc@redhat.com>

The NSH draft says:

   An IEEE EtherType, 0x894F, has been allocated for NSH.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 include/uapi/linux/if_ether.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index efeb1190c2ca..d7b47e97904f 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -99,6 +99,7 @@
 #define ETH_P_FIP	0x8914		/* FCoE Initialization Protocol */
 #define ETH_P_80221	0x8917		/* IEEE 802.21 Media Independent Handover Protocol */
 #define ETH_P_HSR	0x892F		/* IEC 62439-3 HSRv1	*/
+#define ETH_P_NSH	0x894F		/* Network Service Header */
 #define ETH_P_LOOPBACK	0x9000		/* Ethernet loopback packet, per IEEE 802.3 */
 #define ETH_P_QINQ1	0x9100		/* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
 #define ETH_P_QINQ2	0x9200		/* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
-- 
1.8.3.1


* [PATCH net-next 0/4] nsh: headers, GSO
From: Jiri Benc @ 2017-08-28 19:43 UTC (permalink / raw)
  To: netdev; +Cc: Yi Yang, Eric Garver, Jan Scheurich, Ben Pfaff

This adds header structs and helpers for NSH together with GSO support.

Note there is no code in this patchset that actually manipulates the NSH
headers. That was sent to netdev by Yi Yang ("[PATCH net-next v6 0/3]
openvswitch: add NSH support"). The aim of this series is to lay the
groundwork and ease the implementation for him.

In addition to openvswitch, NSH support should also be added to tc (flower to
match, act_nsh to push/pop NSH headers). That will come later. There's
currently no plan to support NSH through means other than those two.

Patch 3 in this patchset was written by Yi Yang; I took it from the
aforementioned series and slightly modified it - see the note in the patch.

Jiri Benc (3):
  ether: add NSH ethertype
  vxlan: factor out VXLAN-GPE next protocol
  nsh: add GSO support

Yi Yang (1):
  net: add NSH header structures and helpers

 drivers/net/vxlan.c           |  32 +----
 include/net/nsh.h             | 307 ++++++++++++++++++++++++++++++++++++++++++
 include/net/tun_proto.h       |  49 +++++++
 include/net/vxlan.h           |   6 -
 include/uapi/linux/if_ether.h |   1 +
 net/Kconfig                   |   1 +
 net/Makefile                  |   1 +
 net/nsh/Kconfig               |   9 ++
 net/nsh/Makefile              |   1 +
 net/nsh/nsh.c                 |  91 +++++++++++++
 10 files changed, 467 insertions(+), 31 deletions(-)
 create mode 100644 include/net/nsh.h
 create mode 100644 include/net/tun_proto.h
 create mode 100644 net/nsh/Kconfig
 create mode 100644 net/nsh/Makefile
 create mode 100644 net/nsh/nsh.c

-- 
1.8.3.1


* Re: XDP redirect measurements, gotchas and tracepoints
From: Andy Gospodarek @ 2017-08-28 19:39 UTC (permalink / raw)
  To: John Fastabend
  Cc: Michael Chan, Jesper Dangaard Brouer, Alexander Duyck,
	Duyck, Alexander H, pstaszewski@itcare.pl, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, borkmann@iogearbox.net
In-Reply-To: <59A4415C.80702@gmail.com>

On Mon, Aug 28, 2017 at 09:14:20AM -0700, John Fastabend wrote:
> On 08/28/2017 09:02 AM, Andy Gospodarek wrote:
> > On Fri, Aug 25, 2017 at 08:28:55AM -0700, Michael Chan wrote:
> >> On Fri, Aug 25, 2017 at 8:10 AM, John Fastabend
> >> <john.fastabend@gmail.com> wrote:
> >>> On 08/25/2017 05:45 AM, Jesper Dangaard Brouer wrote:
> >>>> On Thu, 24 Aug 2017 20:36:28 -0700
> >>>> Michael Chan <michael.chan@broadcom.com> wrote:
> >>>>
> >>>>> On Wed, Aug 23, 2017 at 1:29 AM, Jesper Dangaard Brouer
> >>>>> <brouer@redhat.com> wrote:
> >>>>>> On Tue, 22 Aug 2017 23:59:05 -0700
> >>>>>> Michael Chan <michael.chan@broadcom.com> wrote:
> >>>>>>
> >>>>>>> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
> >>>>>>> <alexander.duyck@gmail.com> wrote:
> >>>>>>>> On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan <michael.chan@broadcom.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Right, but it's conceivable to add an API to "return" the buffer to
> >>>>>>>>> the input device, right?
> >>>>>>
> >>>>>> Yes, I would really like to see an API like this.
> >>>>>>
> >>>>>>>>
> >>>>>>>> You could; it is just added complexity. "just free the buffer" in
> >>>>>>>> ixgbe usually just amounts to one atomic operation to decrement the
> >>>>>>>> total page count since page recycling is already implemented in the
> >>>>>>>> driver. You would still have to unmap the buffer regardless of whether
> >>>>>>>> you were recycling it or not, so all you would save is 1.000015259 atomic
> >>>>>>>> operations per packet. The fraction is because once every 64K uses we
> >>>>>>>> have to bulk update the count on the page.
> >>>>>>>>
> >>>>>>>
> >>>>>>> If the buffer is returned to the input device, the input device can
> >>>>>>> keep the DMA mapping.  All it needs to do is to dma_sync it back to
> >>>>>>> the input device when the buffer is returned.
> >>>>>>
> >>>>>> Yes, exactly, return to the input device. I really think we should
> >>>>>> work on a solution where we can keep the DMA mapping around.  We have
> >>>>>> an opportunity here to make ndo_xdp_xmit TX queues use a specialized
> >>>>>> page return call to achieve this. (I imagine other archs have a higher
> >>>>>> DMA overhead than Intel.)
> >>>>>>
> >>>>>> I'm not sure how the API should look.  The ixgbe recycle mechanism and
> >>>>>> splitting the page (into two packets) actually complicates things, and
> >>>>>> ties us into a page-refcnt based model.  We could get around this by
> >>>>>> each driver implementing a page-return callback that allows us to
> >>>>>> return the page to the input device?  Then, drivers implementing the
> >>>>>> 1-packet-per-page can simply check/read the page-refcnt, and if it is
> >>>>>> "1" DMA-sync and reuse it in the RX queue.
> >>>>>>
> >>>>>
> >>>>> Yeah, based on Alex' description, it's not clear to me whether ixgbe
> >>>>> redirecting to a non-Intel NIC or vice versa will actually work.  It
> >>>>> sounds like the output device has to make some assumptions about how
> >>>>> the page was allocated by the input device.
> >>>>
> >>>> Yes, exactly. We are tied into a page refcnt based scheme.
> >>>>
> >>>> Besides, the ixgbe page recycle scheme (which keeps the DMA RX-mapping)
> >>>> is also tied to the RX queue size, plus how fast the pages are returned.
> >>>> This makes it very hard to tune.  As I demonstrated, default ixgbe
> >>>> settings do not work well with XDP_REDIRECT.  I needed to increase
> >>>> TX-ring size, but it broke page recycling (dropping perf from 13Mpps to
> >>>> 10Mpps), so I also needed to increase RX-ring size.  But perf is best if
> >>>> RX-ring size is smaller, so the two tuning needs contradict each other.
> >>>>
> >>>
> >>> The changes to decouple the ixgbe page recycle scheme (1pg per descriptor
> >>> split into two halves being the default) from the number of descriptors
> >>> don't look too bad IMO. It seems like it could be done by having some
> >>> extra pages allocated upfront and pulling those in when we need another
> >>> page.
> >>>
> >>> This would be a nice iterative step we could take on the existing API.
> >>>
> >>>>
> >>>>> With buffer return API,
> >>>>> each driver can cleanly recycle or free its own buffers properly.
> >>>>
> >>>> Yes, exactly. And RX-driver can implement a special memory model for
> >>>> this queue.  E.g. RX-driver can know this is a dedicated XDP RX-queue
> >>>> which is never used for SKBs, thus opening for new RX memory models.
> >>>>
> >>>> Another advantage of a return API: there is also an opportunity to
> >>>> avoid the DMA map on TX, since we need to know the from-device anyway.
> >>>> Thus, we can add a DMA API where we can query whether the two devices
> >>>> use the same DMA engine, and reuse the DMA address the RX-side already
> >>>> knows.
> >>>>
> >>>>
> >>>>> Let me discuss this further with Andy to see if we can come up with a
> >>>>> good scheme.
> >>>>
> >>>> Sounds good, looking forward to hearing what you come up with :-)
> >>>>
> >>>
> >>> I guess by this thread we will see a broadcom nic with redirect support
> >>> soon ;)
> >>
> >> Yes, Andy actually has finished the coding for XDP_REDIRECT, but the
> >> buffer recycling scheme has some problems.  We can make it work for
> >> Broadcom to Broadcom only, but we want a better solution.
> > 
> > (Sorry for the radio silence I was AFK last week...)
> > 
> > I finished it a little while ago, but Michael and I both have concerns
> > that in a heterogeneous hardware setup one can quickly run into issues
> > and haven't had time to work up a few solutions before bringing this up
> > formally.  It also isn't a major problem until the second
> > optimized/native XDP driver appears on the scene.
> > 
> > When I run a test where XDP redirects between ixgbe and bnxt_en based
> > devices, I get OOM kills after only a few seconds, due to the lack of
> > feedback between the different drivers that the pointer to xdp->data can
> > be freed/reused/etc. and the different buffer allocation schemes used.
> > 
> 
> Hmm, so how do you get OOM here? I expect the number of in-flight xdp
> bufs to be limited by the number of xdps that can be posted to the
> outgoing interface. If we are hitting OOM, that _should_ mean the size of
> the tx queue is too large. Ixgbe should be freeing the buffer if an error
> is returned from xdp xmit routines (will check this today). And bnxt should
> return an error if we hit some high water mark on xmit.

I reconfigured the hardware after I was done with the bnxt_en devel, but I
should be able to set it up and provide some more detail.  Let me repro it and
debug a bit more.

> 
> > Initially I did not think this was an issue and that xdp_do_flush_map()
> > would handle this, but I think there is still a need to be able to
> > signal back to the receiving device that the allocated buffer has been
> > xmitted by the transmitter and can be freed.  There is really no
> > guarantee that completion of an XDP_REDIRECT action means that it is
> > safe to free the area pointed to by xdp->data that contains the packet
> > to be xmitted.  Since the packet done interrupt handler in a driver
> > cannot signal back to the receiving driver that the buffer is now safe
> > to reuse/free, there is a chance for trouble.
> 
> There should be some high water mark on how many outstanding packets
> can be in-flight. At the moment I assumed this was related to queue
> lengths; a more explicit high water mark could be added to the xmit path
> and tracked in the xdp infrastructure.
> 
> > 
> > I was hoping to spend some time this week cooking up a patch that just
> > did not allow use of XDP_REDIRECT when the ifindex of the outgoing
> > device did not match that of the device to which the XDP prog was
> > attached, but that probably is not worth the trouble when we would just
> > fix it for real.  (It would also require some really terrible hacks to
> > enforce this in the kernel when all that is being done is setting up a
> > map that contains the redirect table, so it is probably not useful.)
> > 
> 
> I would prefer to solve the problem rather than limit the implementation.
> 

Agreed.

> > The basic prototype would be something like this:
> > 
> > (rx packet interrupt on eth0, leads to napi_poll)
> > napi_poll (eth0)
> >   call xdp_prog (eth0)
> >     xdp_do_redirect (eth0)
> >       ndo_xdp_xmit (eth1)
> >       mark buffer with information netdev/ring/etc
> >       place buffer on tx ring for eth1
> > 
> > (tx done interrupt on eth1, leads to napi_poll)
> > napi_poll (eth1)
> >   process tx interrupt (eth1)
> >     look up information about netdev/ring/etc
> >     ndo_xdp_data_free (eth0, ring, etc)
> > 
> > Thoughts?
> > 
> 
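
The flow outlined above can be modelled in a few lines of userspace C. In that flow, the RX buffer is tagged with its owner on ndo_xdp_xmit and handed back to that owner on TX completion. The model below adds the high water mark John suggests. It is a minimal sketch, not a kernel API. Every name in it is hypothetical: struct rx_dev, struct tx_dev, return_buf, tx_xmit, tx_complete and high_water. The free list only stands in for pages that keep their DMA mapping. There is no real DMA, NAPI or locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model; none of these names are kernel APIs. */
#define POOL_SIZE 8

struct rx_dev;

struct xdp_buf {
	struct rx_dev *owner;	/* device whose RX ring allocated the buffer */
	int id;
};

struct rx_dev {
	struct xdp_buf pool[POOL_SIZE];
	struct xdp_buf *free_list[POOL_SIZE];	/* recycled, "still mapped" */
	int nfree;
	/* Per-driver return hook, in the spirit of the ndo_xdp_data_free idea */
	void (*return_buf)(struct rx_dev *dev, struct xdp_buf *buf);
};

struct tx_dev {
	struct xdp_buf *ring[POOL_SIZE];
	int inflight;
	int high_water;		/* max posted-but-uncompleted, <= POOL_SIZE */
};

static void rx_recycle(struct rx_dev *dev, struct xdp_buf *buf)
{
	dev->free_list[dev->nfree++] = buf;
}

static void rx_init(struct rx_dev *dev)
{
	dev->nfree = 0;
	dev->return_buf = rx_recycle;
	for (int i = 0; i < POOL_SIZE; i++) {
		dev->pool[i].owner = dev;
		dev->pool[i].id = i;
		rx_recycle(dev, &dev->pool[i]);
	}
}

static struct xdp_buf *rx_alloc(struct rx_dev *dev)
{
	return dev->nfree ? dev->free_list[--dev->nfree] : NULL;
}

/* Redirect target: queue the buffer, or refuse above the high water mark. */
static int tx_xmit(struct tx_dev *tx, struct xdp_buf *buf)
{
	if (tx->inflight >= tx->high_water)
		return -ENOSPC;
	tx->ring[tx->inflight++] = buf;
	return 0;
}

/* TX completion: hand every buffer back to the device that owns it. */
static int tx_complete(struct tx_dev *tx)
{
	int done = tx->inflight;

	for (int i = 0; i < done; i++) {
		struct xdp_buf *buf = tx->ring[i];

		buf->owner->return_buf(buf->owner, buf);
	}
	tx->inflight = 0;
	return done;
}

/* Simulated RX poll that redirects everything it receives to tx. */
static int rx_poll_redirect(struct rx_dev *rx, struct tx_dev *tx, int budget)
{
	int sent = 0;

	for (int i = 0; i < budget; i++) {
		struct xdp_buf *buf = rx_alloc(rx);

		if (!buf)
			break;
		if (tx_xmit(tx, buf) < 0) {
			rx->return_buf(rx, buf);	/* xmit error: recycle now */
			continue;
		}
		sent++;
	}
	return sent;
}
```

In this model, buffers return to the RX side only through the owner's callback. The xmit-error path does the same. The high water mark bounds how many buffers can be outstanding at once. Without such a return path, the RX side would need fresh pages for every buffer that stays in flight.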


* Re: [PATCH iproute2 1/4] tc: m_ife: allow ife type to zero
From: Stephen Hemminger @ 2017-08-28 19:34 UTC (permalink / raw)
  To: Alexander Aring; +Cc: jhs, yotamg, xiyou.wangcong, jiri, netdev
In-Reply-To: <20170828190738.26829-2-aring@mojatatu.com>

On Mon, 28 Aug 2017 15:07:35 -0400
Alexander Aring <aring@mojatatu.com> wrote:

> This patch allows setting the IFE ethertype to zero. There is no
> kernel-side validation that forbids a zero type.
> 
> Signed-off-by: Alexander Aring <aring@mojatatu.com>
> ---
>  tc/m_ife.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/tc/m_ife.c b/tc/m_ife.c
> index e3521e62..e05e2276 100644
> --- a/tc/m_ife.c
> +++ b/tc/m_ife.c
> @@ -63,6 +63,7 @@ static int parse_ife(struct action_util *a, int *argc_p, char ***argv_p,
>  	char dbuf[ETH_ALEN];
>  	char sbuf[ETH_ALEN];
>  	__u16 ife_type = 0;
> +	int user_type = 0;

Please use bool if it is a flag value


* Re: [PATCH iproute2 2/4] tc: m_ife: print IEEE ethertype format
From: Stephen Hemminger @ 2017-08-28 19:33 UTC (permalink / raw)
  To: Alexander Aring; +Cc: jhs, yotamg, xiyou.wangcong, jiri, netdev
In-Reply-To: <20170828190738.26829-3-aring@mojatatu.com>

On Mon, 28 Aug 2017 15:07:36 -0400
Alexander Aring <aring@mojatatu.com> wrote:

> This patch uses the usual IEEE format to display an ethertype: four
> hexadecimal digits, all in upper case.
> 
> Signed-off-by: Alexander Aring <aring@mojatatu.com>
> ---
>  tc/m_ife.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tc/m_ife.c b/tc/m_ife.c
> index e05e2276..7b57130e 100644
> --- a/tc/m_ife.c
> +++ b/tc/m_ife.c
> @@ -125,7 +125,7 @@ static int parse_ife(struct action_util *a, int *argc_p, char ***argv_p,
>  			NEXT_ARG();
>  			if (get_u16(&ife_type, *argv, 0))
>  				invarg("ife type is invalid", *argv);
> -			fprintf(stderr, "IFE type 0x%x\n", ife_type);
> +			fprintf(stderr, "IFE type 0x%04X\n", ife_type);
>  			user_type = 1;
>  		} else if (matches(*argv, "dst") == 0) {
>  			NEXT_ARG();

Why upper case?
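
The two format strings only produce different output for values with fewer than four significant hex digits or with letter digits. The small userspace sketch below makes the difference concrete. It covers the zero type that patch 1/4 of this series permits. The helper names fmt_type_old() and fmt_type_new() are hypothetical and are not part of tc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers contrasting the old and new m_ife output. */
static const char *fmt_type_old(char *buf, size_t len, unsigned int type)
{
	snprintf(buf, len, "0x%x", type);	/* minimal, lower-case hex */
	return buf;
}

static const char *fmt_type_new(char *buf, size_t len, unsigned int type)
{
	snprintf(buf, len, "0x%04X", type);	/* zero-padded, upper case */
	return buf;
}
```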


* Re: [PATCH iproute2 net-next 1/2] iproute: add support for seg6 l2encap mode
From: Stephen Hemminger @ 2017-08-28 19:32 UTC (permalink / raw)
  To: David Lebrun; +Cc: netdev
In-Reply-To: <99974f81-90e8-c6ff-7068-2beb871cb84d@uclouvain.be>


On Mon, 28 Aug 2017 20:11:59 +0100
David Lebrun <david.lebrun@uclouvain.be> wrote:

> On 08/28/2017 08:07 PM, Stephen Hemminger wrote:
> > Since these values probably will grow over time, it would make
> > sense to have this a name/value table.  
> 
> I wasn't sure if it was worth it for 3 values. I do not foresee much
> growth either, but I will send a v2 with a table anyway.
> 
> David

Either way, not a big worry.



* [PATCH iproute2 net-next v2 2/2] man: add documentation for seg6 l2encap mode
From: David Lebrun @ 2017-08-28 19:26 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun
In-Reply-To: <20170828192640.19240-1-david.lebrun@uclouvain.be>

This patch adds documentation for the seg6 L2ENCAP encapsulation mode.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
 man/man8/ip-route.8.in | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
index 11dd9d0..803de3b 100644
--- a/man/man8/ip-route.8.in
+++ b/man/man8/ip-route.8.in
@@ -214,7 +214,7 @@ throw " | " unreachable " | " prohibit " | " blackhole " | " nat " ]"
 .IR ENCAP_SEG6 " := "
 .B seg6
 .BR mode " [ "
-.BR encap " | " inline " ] "
+.BR encap " | " inline " | " l2encap " ] "
 .B segs
 .IR SEGMENTS " [ "
 .B hmac
@@ -750,6 +750,10 @@ is a set of encapsulation attributes specific to the
 - Encapsulate packet in an outer IPv6 header with SRH
 .sp
 
+.B mode l2encap
+- Encapsulate ingress L2 frame within an outer IPv6 header and SRH
+.sp
+
 .I SEGMENTS
 - List of comma-separated IPv6 addresses
 .sp
-- 
2.10.2


* [PATCH iproute2 net-next v2 1/2] iproute: add support for seg6 l2encap mode
From: David Lebrun @ 2017-08-28 19:26 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun
In-Reply-To: <20170828192640.19240-1-david.lebrun@uclouvain.be>

This patch adds support for the L2ENCAP seg6 mode, which allows
encapsulating L2 frames within SRv6 packets.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
 ip/iproute_lwtunnel.c | 41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
index 14294c6..02f36ef 100644
--- a/ip/iproute_lwtunnel.c
+++ b/ip/iproute_lwtunnel.c
@@ -110,6 +110,32 @@ static void print_srh(FILE *fp, struct ipv6_sr_hdr *srh)
 	}
 }
 
+static const char *seg6_mode_types[] = {
+	[SEG6_IPTUN_MODE_INLINE]	= "inline",
+	[SEG6_IPTUN_MODE_ENCAP]		= "encap",
+	[SEG6_IPTUN_MODE_L2ENCAP]	= "l2encap",
+};
+
+static const char *format_seg6mode_type(int mode)
+{
+	if (mode < 0 || mode >= ARRAY_SIZE(seg6_mode_types))
+		return "<unknown>";
+
+	return seg6_mode_types[mode];
+}
+
+static int read_seg6mode_type(const char *mode)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(seg6_mode_types); i++) {
+		if (strcmp(mode, seg6_mode_types[i]) == 0)
+			return i;
+	}
+
+	return -1;
+}
+
 static void print_encap_seg6(FILE *fp, struct rtattr *encap)
 {
 	struct rtattr *tb[SEG6_IPTUNNEL_MAX+1];
@@ -121,8 +147,7 @@ static void print_encap_seg6(FILE *fp, struct rtattr *encap)
 		return;
 
 	tuninfo = RTA_DATA(tb[SEG6_IPTUNNEL_SRH]);
-	fprintf(fp, "mode %s ",
-		(tuninfo->mode == SEG6_IPTUN_MODE_ENCAP) ? "encap" : "inline");
+	fprintf(fp, "mode %s ", format_seg6mode_type(tuninfo->mode));
 
 	print_srh(fp, tuninfo->srh);
 }
@@ -457,11 +482,8 @@ static int parse_encap_seg6(struct rtattr *rta, size_t len, int *argcp,
 			NEXT_ARG();
 			if (mode_ok++)
 				duparg2("mode", *argv);
-			if (strcmp(*argv, "encap") == 0)
-				encap = 1;
-			else if (strcmp(*argv, "inline") == 0)
-				encap = 0;
-			else
+			encap = read_seg6mode_type(*argv);
+			if (encap < 0)
 				invarg("\"mode\" value is invalid\n", *argv);
 		} else if (strcmp(*argv, "segs") == 0) {
 			NEXT_ARG();
@@ -490,10 +512,7 @@ static int parse_encap_seg6(struct rtattr *rta, size_t len, int *argcp,
 	tuninfo = malloc(sizeof(*tuninfo) + srhlen);
 	memset(tuninfo, 0, sizeof(*tuninfo) + srhlen);
 
-	if (encap)
-		tuninfo->mode = SEG6_IPTUN_MODE_ENCAP;
-	else
-		tuninfo->mode = SEG6_IPTUN_MODE_INLINE;
+	tuninfo->mode = encap;
 
 	memcpy(tuninfo->srh, srh, srhlen);
 
-- 
2.10.2
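
The name/value table approach in this patch can be exercised on its own in userspace. The sketch below is a minimal mirror of it. It assumes the mode numbers follow the enum order implied by the designated initializers: inline = 0, encap = 1, l2encap = 2. The names MODE_*, mode_names, format_mode() and read_mode() are hypothetical stand-ins for the iproute2 ones. The bounds check uses ">=" because an index equal to the array length is already past the end. The NULL checks guard against gaps that a designated-initializer table can have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef ARRAY_SIZE
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#endif

/* Assumed to match the SEG6_IPTUN_MODE_* order used by the patch. */
enum { MODE_INLINE, MODE_ENCAP, MODE_L2ENCAP };

static const char *const mode_names[] = {
	[MODE_INLINE]	= "inline",
	[MODE_ENCAP]	= "encap",
	[MODE_L2ENCAP]	= "l2encap",
};

/* Mode number -> name; out-of-range or unnamed slots give "<unknown>". */
static const char *format_mode(int mode)
{
	if (mode < 0 || (size_t)mode >= ARRAY_SIZE(mode_names) ||
	    !mode_names[mode])
		return "<unknown>";
	return mode_names[mode];
}

/* Name -> mode number, or -1 if the name is not in the table. */
static int read_mode(const char *name)
{
	for (size_t i = 0; i < ARRAY_SIZE(mode_names); i++) {
		if (mode_names[i] && strcmp(name, mode_names[i]) == 0)
			return (int)i;
	}
	return -1;
}
```

Adding a future mode then only needs one new table entry. The parser and the printer stay in sync automatically.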


* [PATCH iproute2 net-next v2 0/2] Add support for seg6 l2encap mode
From: David Lebrun @ 2017-08-28 19:26 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun

This patch series adds support for the new L2ENCAP mode for SRv6
encapsulations.

v2: use a name/value table for encap modes

David Lebrun (2):
  iproute: add support for seg6 l2encap mode
  man: add documentation for seg6 l2encap mode

 ip/iproute_lwtunnel.c  | 41 ++++++++++++++++++++++++++++++-----------
 man/man8/ip-route.8.in |  6 +++++-
 2 files changed, 35 insertions(+), 12 deletions(-)

-- 
2.10.2


* [PATCH net-next v2 09/10] net: dsa: restore VLAN dump
From: Vivien Didelot @ 2017-08-28 19:17 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Egil Hjelmeland, John Crispin, Woojung Huh,
	Sean Wang, Nikita Yushchenko, Chris Healy, Vivien Didelot
In-Reply-To: <20170828191748.19492-1-vivien.didelot@savoirfairelinux.com>

This commit defines a dsa_vlan_dump_cb_t callback, similar to the FDB
dump callback, and partly reverts commit a0b6b8c9fa3c ("net: dsa: Remove
support for vlan dump from DSA's drivers") to restore the VLAN dump
operations of the DSA drivers.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/dsa/b53/b53_common.c       | 41 ++++++++++++++++++++++++++++
 drivers/net/dsa/b53/b53_priv.h         |  2 ++
 drivers/net/dsa/bcm_sf2.c              |  1 +
 drivers/net/dsa/dsa_loop.c             | 38 ++++++++++++++++++++++++++
 drivers/net/dsa/microchip/ksz_common.c | 41 ++++++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/chip.c       | 49 ++++++++++++++++++++++++++++++++++
 include/net/dsa.h                      |  5 ++++
 7 files changed, 177 insertions(+)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 274f3679f33d..be0c5fa8bd9b 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1053,6 +1053,46 @@ int b53_vlan_del(struct dsa_switch *ds, int port,
 }
 EXPORT_SYMBOL(b53_vlan_del);
 
+int b53_vlan_dump(struct dsa_switch *ds, int port, dsa_vlan_dump_cb_t *cb,
+		  void *data)
+{
+	struct b53_device *dev = ds->priv;
+	u16 vid, vid_start = 0, pvid;
+	struct b53_vlan *vl;
+	bool untagged;
+	int err = 0;
+
+	if (is5325(dev) || is5365(dev))
+		vid_start = 1;
+
+	b53_read16(dev, B53_VLAN_PAGE, B53_VLAN_PORT_DEF_TAG(port), &pvid);
+
+	/* Use our software cache for dumps, since we do not have any HW
+	 * operation returning only the used/valid VLANs
+	 */
+	for (vid = vid_start; vid < dev->num_vlans; vid++) {
+		vl = &dev->vlans[vid];
+
+		if (!vl->valid)
+			continue;
+
+		if (!(vl->members & BIT(port)))
+			continue;
+
+		untagged = false;
+
+		if (vl->untag & BIT(port))
+			untagged = true;
+
+		err = cb(vid, pvid == vid, untagged, data);
+		if (err)
+			break;
+	}
+
+	return err;
+}
+EXPORT_SYMBOL(b53_vlan_dump);
+
 /* Address Resolution Logic routines */
 static int b53_arl_op_wait(struct b53_device *dev)
 {
@@ -1503,6 +1543,7 @@ static const struct dsa_switch_ops b53_switch_ops = {
 	.port_vlan_prepare	= b53_vlan_prepare,
 	.port_vlan_add		= b53_vlan_add,
 	.port_vlan_del		= b53_vlan_del,
+	.port_vlan_dump		= b53_vlan_dump,
 	.port_fdb_dump		= b53_fdb_dump,
 	.port_fdb_add		= b53_fdb_add,
 	.port_fdb_del		= b53_fdb_del,
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index 01bd8cbe9a3f..2b3e59d80fdb 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -393,6 +393,8 @@ void b53_vlan_add(struct dsa_switch *ds, int port,
 		  struct switchdev_trans *trans);
 int b53_vlan_del(struct dsa_switch *ds, int port,
 		 const struct switchdev_obj_port_vlan *vlan);
+int b53_vlan_dump(struct dsa_switch *ds, int port, dsa_vlan_dump_cb_t *cb,
+		  void *data);
 int b53_fdb_add(struct dsa_switch *ds, int port,
 		const unsigned char *addr, u16 vid);
 int b53_fdb_del(struct dsa_switch *ds, int port,
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index bbcb4053e04e..1907b27297c3 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1021,6 +1021,7 @@ static const struct dsa_switch_ops bcm_sf2_ops = {
 	.port_vlan_prepare	= b53_vlan_prepare,
 	.port_vlan_add		= b53_vlan_add,
 	.port_vlan_del		= b53_vlan_del,
+	.port_vlan_dump		= b53_vlan_dump,
 	.port_fdb_dump		= b53_fdb_dump,
 	.port_fdb_add		= b53_fdb_add,
 	.port_fdb_del		= b53_fdb_del,
diff --git a/drivers/net/dsa/dsa_loop.c b/drivers/net/dsa/dsa_loop.c
index 7819a9fe8321..0407533f725f 100644
--- a/drivers/net/dsa/dsa_loop.c
+++ b/drivers/net/dsa/dsa_loop.c
@@ -257,6 +257,43 @@ static int dsa_loop_port_vlan_del(struct dsa_switch *ds, int port,
 	return 0;
 }
 
+static int dsa_loop_port_vlan_dump(struct dsa_switch *ds, int port,
+				   dsa_vlan_dump_cb_t *cb, void *data)
+{
+	struct dsa_loop_priv *ps = ds->priv;
+	struct mii_bus *bus = ps->bus;
+	struct dsa_loop_vlan *vl;
+	u16 vid, vid_start = 0;
+	bool pvid, untagged;
+	int err = 0;
+
+	dev_dbg(ds->dev, "%s\n", __func__);
+
+	/* Just do a sleeping operation to make lockdep checks effective */
+	mdiobus_read(bus, ps->port_base + port, MII_BMSR);
+
+	for (vid = vid_start; vid < DSA_LOOP_VLANS; vid++) {
+		vl = &ps->vlans[vid];
+
+		if (!(vl->members & BIT(port)))
+			continue;
+
+		untagged = false;
+		pvid = false;
+
+		if (vl->untagged & BIT(port))
+			untagged = true;
+		if (ps->pvid == vid)
+			pvid = true;
+
+		err = cb(vid, pvid, untagged, data);
+		if (err)
+			break;
+	}
+
+	return err;
+}
+
 static const struct dsa_switch_ops dsa_loop_driver = {
 	.get_tag_protocol	= dsa_loop_get_protocol,
 	.setup			= dsa_loop_setup,
@@ -273,6 +310,7 @@ static const struct dsa_switch_ops dsa_loop_driver = {
 	.port_vlan_prepare	= dsa_loop_port_vlan_prepare,
 	.port_vlan_add		= dsa_loop_port_vlan_add,
 	.port_vlan_del		= dsa_loop_port_vlan_del,
+	.port_vlan_dump		= dsa_loop_port_vlan_dump,
 };
 
 static int dsa_loop_drv_probe(struct mdio_device *mdiodev)
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index 56cd6d365352..52c7849acc3c 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -638,6 +638,46 @@ static int ksz_port_vlan_del(struct dsa_switch *ds, int port,
 	return 0;
 }
 
+static int ksz_port_vlan_dump(struct dsa_switch *ds, int port,
+			      dsa_vlan_dump_cb_t *cb, void *data)
+{
+	struct ksz_device *dev = ds->priv;
+	struct vlan_table *vlan_cache;
+	bool pvid, untagged;
+	u16 val;
+	int vid;
+	int err = 0;
+
+	mutex_lock(&dev->vlan_mutex);
+
+	/* use dev->vlan_cache due to lack of searching valid vlan entry */
+	for (vid = 0; vid < dev->num_vlans; vid++) {
+		vlan_cache = &dev->vlan_cache[vid];
+
+		if (!(vlan_cache->table[0] & VLAN_VALID))
+			continue;
+
+		untagged = false;
+		pvid = false;
+
+		if (vlan_cache->table[2] & BIT(port)) {
+			if (vlan_cache->table[1] & BIT(port))
+				untagged = true;
+			ksz_pread16(dev, port, REG_PORT_DEFAULT_VID, &val);
+			if (vid == (val & 0xFFF))
+				pvid = true;
+
+			err = cb(vid, pvid, untagged, data);
+			if (err)
+				break;
+		}
+	}
+
+	mutex_unlock(&dev->vlan_mutex);
+
+	return err;
+}
+
 struct alu_struct {
 	/* entry 1 */
 	u8	is_static:1;
@@ -1068,6 +1108,7 @@ static const struct dsa_switch_ops ksz_switch_ops = {
 	.port_vlan_prepare	= ksz_port_vlan_prepare,
 	.port_vlan_add		= ksz_port_vlan_add,
 	.port_vlan_del		= ksz_port_vlan_del,
+	.port_vlan_dump		= ksz_port_vlan_dump,
 	.port_fdb_dump		= ksz_port_fdb_dump,
 	.port_fdb_add		= ksz_port_fdb_add,
 	.port_fdb_del		= ksz_port_fdb_del,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index c66204423641..3717ae098d58 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1011,6 +1011,54 @@ static int mv88e6xxx_vtu_loadpurge(struct mv88e6xxx_chip *chip,
 	return chip->info->ops->vtu_loadpurge(chip, entry);
 }
 
+static int mv88e6xxx_port_vlan_dump(struct dsa_switch *ds, int port,
+				    dsa_vlan_dump_cb_t *cb, void *data)
+{
+	struct mv88e6xxx_chip *chip = ds->priv;
+	struct mv88e6xxx_vtu_entry next = {
+		.vid = chip->info->max_vid,
+	};
+	bool untagged;
+	u16 pvid;
+	int err;
+
+	if (!chip->info->max_vid)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&chip->reg_lock);
+
+	err = mv88e6xxx_port_get_pvid(chip, port, &pvid);
+	if (err)
+		goto unlock;
+
+	do {
+		err = mv88e6xxx_vtu_getnext(chip, &next);
+		if (err)
+			break;
+
+		if (!next.valid)
+			break;
+
+		if (next.member[port] ==
+		    MV88E6XXX_G1_VTU_DATA_MEMBER_TAG_NON_MEMBER)
+			continue;
+
+		untagged = false;
+		if (next.member[port] ==
+		    MV88E6XXX_G1_VTU_DATA_MEMBER_TAG_UNTAGGED)
+			untagged = true;
+
+		err = cb(next.vid, next.vid == pvid, untagged, data);
+		if (err)
+			break;
+	} while (next.vid < chip->info->max_vid);
+
+unlock:
+	mutex_unlock(&chip->reg_lock);
+
+	return err;
+}
+
 static int mv88e6xxx_atu_new(struct mv88e6xxx_chip *chip, u16 *fid)
 {
 	DECLARE_BITMAP(fid_bitmap, MV88E6XXX_N_FID);
@@ -3820,6 +3868,7 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
 	.port_vlan_prepare	= mv88e6xxx_port_vlan_prepare,
 	.port_vlan_add		= mv88e6xxx_port_vlan_add,
 	.port_vlan_del		= mv88e6xxx_port_vlan_del,
+	.port_vlan_dump		= mv88e6xxx_port_vlan_dump,
 	.port_fdb_add           = mv88e6xxx_port_fdb_add,
 	.port_fdb_del           = mv88e6xxx_port_fdb_del,
 	.port_fdb_dump          = mv88e6xxx_port_fdb_dump,
diff --git a/include/net/dsa.h b/include/net/dsa.h
index c0d1b6c47a50..b4994c58547f 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -315,6 +315,8 @@ static inline u8 dsa_upstream_port(struct dsa_switch *ds)
 /* FDB (and MDB) dump callback */
 typedef int dsa_fdb_dump_cb_t(const unsigned char *addr, u16 vid,
 			      bool is_static, void *data);
+typedef int dsa_vlan_dump_cb_t(u16 vid, bool pvid, bool untagged, void *data);
+
 struct dsa_switch_ops {
 	/*
 	 * Legacy probing.
@@ -421,6 +423,9 @@ struct dsa_switch_ops {
 				 struct switchdev_trans *trans);
 	int	(*port_vlan_del)(struct dsa_switch *ds, int port,
 				 const struct switchdev_obj_port_vlan *vlan);
+	int	(*port_vlan_dump)(struct dsa_switch *ds, int port,
+				  dsa_vlan_dump_cb_t *cb, void *data);
+
 	/*
 	 * Forwarding database
 	 */
-- 
2.14.1
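
The driver implementations above follow the same callback-driven walk. They loop over a VLAN table and call cb(vid, pvid, untagged, data) for each VLAN the port belongs to. They stop at the first non-zero return and propagate that value. A minimal userspace sketch of that contract follows. Its callback formats lines the way the debugfs "vlan" file in patch 10/10 does. The names struct sw_cache, vlan_dump(), print_vlan_cb() and struct text_buf, and the 64-entry cache size, are hypothetical. The seq_file is replaced by a fixed buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same shape as dsa_vlan_dump_cb_t. */
typedef int vlan_dump_cb_t(uint16_t vid, bool pvid, bool untagged, void *data);

#define NUM_VLANS 64	/* hypothetical size of the software cache */

struct vlan_entry {
	bool valid;
	uint32_t members;	/* bit n set: port n is a member */
	uint32_t untag;		/* bit n set: port n egresses untagged */
};

struct sw_cache {
	struct vlan_entry vlans[NUM_VLANS];
	uint16_t pvid[32];	/* per-port default VID */
};

/* Walk the cache; stop at the first non-zero callback return. */
static int vlan_dump(const struct sw_cache *sw, int port,
		     vlan_dump_cb_t *cb, void *data)
{
	int err = 0;

	for (uint16_t vid = 0; vid < NUM_VLANS; vid++) {
		const struct vlan_entry *vl = &sw->vlans[vid];

		if (!vl->valid || !(vl->members & (1u << port)))
			continue;
		err = cb(vid, sw->pvid[port] == vid,
			 (vl->untag & (1u << port)) != 0, data);
		if (err)
			break;
	}
	return err;
}

/* Fixed-size stand-in for a seq_file. */
struct text_buf {
	char s[256];
	size_t len;
};

/* Append one line per VLAN, in the debugfs "vlan" file format. */
static int print_vlan_cb(uint16_t vid, bool pvid, bool untagged, void *data)
{
	struct text_buf *tb = data;
	size_t room = sizeof(tb->s) - tb->len;
	int n = snprintf(tb->s + tb->len, room, "vid %u  %s  %s\n",
			 (unsigned int)vid, untagged ? "untagged" : "tagged",
			 pvid ? "pvid" : "");

	if (n < 0 || (size_t)n >= room)
		return -1;	/* out of space: stop the dump */
	tb->len += (size_t)n;
	return 0;
}
```

Keeping the formatting in the callback means each driver only has to know how to walk its own table, whether that is a software cache (b53, ksz) or a hardware iterator (mv88e6xxx).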


* [PATCH net-next v2 03/10] net: dsa: debugfs: add tag_protocol
From: Vivien Didelot @ 2017-08-28 19:17 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Egil Hjelmeland, John Crispin, Woojung Huh,
	Sean Wang, Nikita Yushchenko, Chris Healy, Vivien Didelot
In-Reply-To: <20170828191748.19492-1-vivien.didelot@savoirfairelinux.com>

Add a debug filesystem "tag_protocol" entry to query the switch tagging
protocol through the .get_tag_protocol operation.

    # cat switch1/tag_protocol
    EDSA

To ease maintenance of tag protocols, add a dsa_tag_protocol_name helper
to the public API to convert a tag protocol enum to a string.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 include/net/dsa.h | 26 ++++++++++++++++++++++++++
 net/dsa/debugfs.c | 23 +++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 7341178319f5..1309ba0376ae 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -39,6 +39,32 @@ enum dsa_tag_protocol {
 	DSA_TAG_LAST,		/* MUST BE LAST */
 };
 
+static inline const char *dsa_tag_protocol_name(enum dsa_tag_protocol proto)
+{
+	switch (proto) {
+	case DSA_TAG_PROTO_NONE:
+		return "none";
+	case DSA_TAG_PROTO_BRCM:
+		return "BRCM";
+	case DSA_TAG_PROTO_DSA:
+		return "DSA";
+	case DSA_TAG_PROTO_EDSA:
+		return "EDSA";
+	case DSA_TAG_PROTO_KSZ:
+		return "KSZ";
+	case DSA_TAG_PROTO_LAN9303:
+		return "LAN9303";
+	case DSA_TAG_PROTO_MTK:
+		return "MTK";
+	case DSA_TAG_PROTO_QCA:
+		return "QCA";
+	case DSA_TAG_PROTO_TRAILER:
+		return "TRAILER";
+	default:
+		return "unknown";
+	}
+}
+
 #define DSA_MAX_SWITCHES	4
 #define DSA_MAX_PORTS		12
 
diff --git a/net/dsa/debugfs.c b/net/dsa/debugfs.c
index 54e97e05a9d7..8a0e4311ff8c 100644
--- a/net/dsa/debugfs.c
+++ b/net/dsa/debugfs.c
@@ -109,6 +109,24 @@ static int dsa_debugfs_create_file(struct dsa_switch *ds, struct dentry *dir,
 	return 0;
 }
 
+static int dsa_debugfs_tag_protocol_read(struct dsa_switch *ds, int id,
+					 struct seq_file *seq)
+{
+	enum dsa_tag_protocol proto;
+
+	if (!ds->ops->get_tag_protocol)
+		return -EOPNOTSUPP;
+
+	proto = ds->ops->get_tag_protocol(ds);
+	seq_printf(seq, "%s\n", dsa_tag_protocol_name(proto));
+
+	return 0;
+}
+
+static const struct dsa_debugfs_ops dsa_debugfs_tag_protocol_ops = {
+	.read = dsa_debugfs_tag_protocol_read,
+};
+
 static int dsa_debugfs_tree_read(struct dsa_switch *ds, int id,
 				 struct seq_file *seq)
 {
@@ -150,6 +168,11 @@ static int dsa_debugfs_create_switch(struct dsa_switch *ds)
 	if (IS_ERR_OR_NULL(ds->debugfs_dir))
 		return -EFAULT;
 
+	err = dsa_debugfs_create_file(ds, ds->debugfs_dir, "tag_protocol", -1,
+				      &dsa_debugfs_tag_protocol_ops);
+	if (err)
+		return err;
+
 	err = dsa_debugfs_create_file(ds, ds->debugfs_dir, "tree", -1,
 				      &dsa_debugfs_tree_ops);
 	if (err)
-- 
2.14.1

* [PATCH net-next v2 10/10] net: dsa: debugfs: add port vlan
From: Vivien Didelot @ 2017-08-28 19:17 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Egil Hjelmeland, John Crispin, Woojung Huh,
	Sean Wang, Nikita Yushchenko, Chris Healy, Vivien Didelot
In-Reply-To: <20170828191748.19492-1-vivien.didelot@savoirfairelinux.com>

Add a debug filesystem "vlan" entry to query a port's hardware VLAN
entries through the .port_vlan_dump switch operation.

This makes it convenient to query the hardware directly, or to inspect
DSA and CPU links, since these ports are not exposed to userspace.

Here are the VLAN entries for a CPU port:

    # cat port5/vlan
    vid 1  untagged  pvid
    vid 42  tagged
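The callback pattern behind this output can be sketched in userspace C.
The sketch assumes a hypothetical in-memory VLAN table (struct vlan_entry),
a fixed buffer (struct text_buf) standing in for struct seq_file, and a
hypothetical port_vlan_dump() walker playing the driver's role. The
callback uses the same format string as dsa_debugfs_vlan_dump_cb(), and
the walker stops at the first non-zero callback return, as the mv88e6xxx
ATU walk in this series does for FDB callbacks.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one hardware VLAN entry of a port. */
struct vlan_entry {
	uint16_t vid;
	bool untagged;
	bool pvid;
};

/* Hypothetical stand-in for struct seq_file: a NUL-terminated buffer. */
struct text_buf {
	char data[256];
};

typedef int vlan_dump_cb_t(uint16_t vid, bool pvid, bool untagged,
			   void *data);

/* Appends one line using the same format as dsa_debugfs_vlan_dump_cb(). */
static int vlan_dump_cb(uint16_t vid, bool pvid, bool untagged, void *data)
{
	struct text_buf *buf = data;
	size_t used = strlen(buf->data);
	size_t room = sizeof(buf->data) - used;
	int n = snprintf(buf->data + used, room, "vid %d  %s  %s\n", vid,
			 untagged ? "untagged" : "tagged", pvid ? "pvid" : "");

	/* Treat an error or a truncated line as a failure ending the dump. */
	if (n < 0 || (size_t)n >= room)
		return -1;

	return 0;
}

/* Hypothetical driver-side dump: report every entry, stopping at and
 * propagating the first non-zero callback return. */
static int port_vlan_dump(const struct vlan_entry *tbl, size_t n,
			  vlan_dump_cb_t *cb, void *data)
{
	for (size_t i = 0; i < n; i++) {
		int err = cb(tbl[i].vid, tbl[i].pvid, tbl[i].untagged, data);

		if (err)
			return err;
	}

	return 0;
}
```

With the two entries shown above, the buffer ends up holding
"vid 1  untagged  pvid\nvid 42  tagged  \n"; note that the format leaves
two trailing spaces on lines without the pvid flag.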

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 net/dsa/debugfs.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/net/dsa/debugfs.c b/net/dsa/debugfs.c
index bed8e1d5cfe1..40fe19872ab1 100644
--- a/net/dsa/debugfs.c
+++ b/net/dsa/debugfs.c
@@ -250,6 +250,30 @@ static const struct dsa_debugfs_ops dsa_debugfs_tree_ops = {
 	.read = dsa_debugfs_tree_read,
 };
 
+static int dsa_debugfs_vlan_dump_cb(u16 vid, bool pvid, bool untagged,
+				    void *data)
+{
+	struct seq_file *seq = data;
+
+	seq_printf(seq, "vid %d  %s  %s\n", vid,
+		   untagged ? "untagged" : "tagged", pvid ? "pvid" : "");
+
+	return 0;
+}
+
+static int dsa_debugfs_vlan_read(struct dsa_switch *ds, int id,
+				 struct seq_file *seq)
+{
+	if (!ds->ops->port_vlan_dump)
+		return -EOPNOTSUPP;
+
+	return ds->ops->port_vlan_dump(ds, id, dsa_debugfs_vlan_dump_cb, seq);
+}
+
+static const struct dsa_debugfs_ops dsa_debugfs_vlan_ops = {
+	.read = dsa_debugfs_vlan_read,
+};
+
 static int dsa_debugfs_create_port(struct dsa_switch *ds, int port)
 {
 	struct dentry *dir;
@@ -282,6 +306,11 @@ static int dsa_debugfs_create_port(struct dsa_switch *ds, int port)
 	if (err)
 		return err;
 
+	err = dsa_debugfs_create_file(ds, dir, "vlan", port,
+				      &dsa_debugfs_vlan_ops);
+	if (err)
+		return err;
+
 	return 0;
 }
 
-- 
2.14.1

* [PATCH net-next v2 08/10] net: dsa: debugfs: add port mdb
From: Vivien Didelot @ 2017-08-28 19:17 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Egil Hjelmeland, John Crispin, Woojung Huh,
	Sean Wang, Nikita Yushchenko, Chris Healy, Vivien Didelot
In-Reply-To: <20170828191748.19492-1-vivien.didelot@savoirfairelinux.com>

Add a debug filesystem "mdb" entry to query a port's hardware MDB
entries through the .port_mdb_dump switch operation.

This makes it convenient to query the hardware directly, or to inspect
DSA and CPU links, since these ports are not exposed to userspace.
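The optional-operation check in dsa_debugfs_mdb_read() and the shared
callback can be sketched in userspace C. switch_ops, fdb_dump_cb_t,
count_cb(), mdb_read() and fake_mdb_dump() are hypothetical, trimmed-down
stand-ins rather than the kernel definitions. A NULL operation yields
-EOPNOTSUPP; otherwise the driver's return value is passed up unchanged,
and the same callback type serves both FDB and MDB dumps.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified form of dsa_fdb_dump_cb_t. */
typedef int fdb_dump_cb_t(const unsigned char *addr, unsigned short vid,
			  int is_static, void *data);

/* Hypothetical subset of struct dsa_switch_ops; both ops are optional. */
struct switch_ops {
	int (*port_fdb_dump)(int port, fdb_dump_cb_t *cb, void *data);
	int (*port_mdb_dump)(int port, fdb_dump_cb_t *cb, void *data);
};

/* Callback usable for FDB and MDB dumps alike: it counts entries. */
static int count_cb(const unsigned char *addr, unsigned short vid,
		    int is_static, void *data)
{
	(void)addr;
	(void)vid;
	(void)is_static;
	++*(int *)data;
	return 0;
}

/* Mirrors the structure of dsa_debugfs_mdb_read(). */
static int mdb_read(const struct switch_ops *ops, int port, int *count)
{
	if (!ops->port_mdb_dump)
		return -EOPNOTSUPP;

	return ops->port_mdb_dump(port, count_cb, count);
}

/* Hypothetical driver op reporting two multicast groups on any port. */
static int fake_mdb_dump(int port, fdb_dump_cb_t *cb, void *data)
{
	static const unsigned char groups[][6] = {
		{ 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 },
		{ 0x33, 0x33, 0x00, 0x00, 0x00, 0x01 },
	};

	(void)port;
	for (size_t i = 0; i < 2; i++) {
		int err = cb(groups[i], 1, 1, data);

		if (err)
			return err;
	}

	return 0;
}
```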

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 net/dsa/debugfs.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/net/dsa/debugfs.c b/net/dsa/debugfs.c
index 59c09a67bc23..bed8e1d5cfe1 100644
--- a/net/dsa/debugfs.c
+++ b/net/dsa/debugfs.c
@@ -135,6 +135,20 @@ static const struct dsa_debugfs_ops dsa_debugfs_fdb_ops = {
 	.read = dsa_debugfs_fdb_read,
 };
 
+static int dsa_debugfs_mdb_read(struct dsa_switch *ds, int id,
+				struct seq_file *seq)
+{
+	if (!ds->ops->port_mdb_dump)
+		return -EOPNOTSUPP;
+
+	/* same callback as for FDB dump */
+	return ds->ops->port_mdb_dump(ds, id, dsa_debugfs_fdb_dump_cb, seq);
+}
+
+static const struct dsa_debugfs_ops dsa_debugfs_mdb_ops = {
+	.read = dsa_debugfs_mdb_read,
+};
+
 static void dsa_debugfs_regs_read_count(struct dsa_switch *ds, int id,
 					struct seq_file *seq, int count)
 {
@@ -253,6 +267,11 @@ static int dsa_debugfs_create_port(struct dsa_switch *ds, int port)
 	if (err)
 		return err;
 
+	err = dsa_debugfs_create_file(ds, dir, "mdb", port,
+				      &dsa_debugfs_mdb_ops);
+	if (err)
+		return err;
+
 	err = dsa_debugfs_create_file(ds, dir, "regs", port,
 				      &dsa_debugfs_regs_ops);
 	if (err)
-- 
2.14.1

* [PATCH net-next v2 07/10] net: dsa: restore mdb dump
From: Vivien Didelot @ 2017-08-28 19:17 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Egil Hjelmeland, John Crispin, Woojung Huh,
	Sean Wang, Nikita Yushchenko, Chris Healy, Vivien Didelot
In-Reply-To: <20170828191748.19492-1-vivien.didelot@savoirfairelinux.com>

Add a .port_mdb_dump switch operation. It takes the same dsa_fdb_dump_cb_t
callback as .port_fdb_dump, since there is no distinction to make between
FDB and MDB entries at this layer.

Implement mv88e6xxx_port_mdb_dump so that multicast addresses associated
with a switch port can be dumped.
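The unicast/multicast filter and the static check performed while walking
the ATU can be sketched in userspace C. The atu_entry layout and the
STATE_* codes are hypothetical (the real MV88E6XXX_G1_ATU_DATA_STATE_*
values are not reproduced), and filter_entries() is an illustration, not
the driver code. Two points carry over. Since the kernel defines
is_unicast_ether_addr() as the negation of is_multicast_ether_addr(),
the two-clause skip condition reduces to is_multicast(mac) != mc. And the
conditional operator must be parenthesized: == binds tighter than ?:, so
"state == mc ? A : B" compares the state with mc and then always yields
one of two non-zero constants.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ATU entry states; not the real register encodings. */
enum atu_state {
	STATE_UC_DYNAMIC = 1,
	STATE_UC_STATIC = 2,
	STATE_MC_STATIC = 3,
};

/* Hypothetical, simplified ATU entry. */
struct atu_entry {
	unsigned char mac[6];
	int state;
};

/* Group (I/G) bit of the first octet, as tested by
 * is_multicast_ether_addr(). */
static bool is_multicast(const unsigned char *mac)
{
	return mac[0] & 0x01;
}

/* Counts the entries of the requested class (multicast if mc, else
 * unicast) and stores how many of them are static in *n_static. */
static size_t filter_entries(const struct atu_entry *e, size_t n, bool mc,
			     size_t *n_static)
{
	size_t hits = 0;

	*n_static = 0;
	for (size_t i = 0; i < n; i++) {
		bool is_static;

		/* Skip entries of the other class. */
		if (is_multicast(e[i].mac) != mc)
			continue;

		/* Parenthesized: pick the static code first, then compare. */
		is_static = e[i].state ==
			    (mc ? STATE_MC_STATIC : STATE_UC_STATIC);

		hits++;
		if (is_static)
			(*n_static)++;
	}

	return hits;
}
```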

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 33 +++++++++++++++++++++++++--------
 include/net/dsa.h                |  3 +++
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index c6678aa9b4ef..c66204423641 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1380,7 +1380,7 @@ static int mv88e6xxx_port_fdb_del(struct dsa_switch *ds, int port,
 }
 
 static int mv88e6xxx_port_db_dump_fid(struct mv88e6xxx_chip *chip,
-				      u16 fid, u16 vid, int port,
+				      u16 fid, u16 vid, int port, bool mc,
 				      dsa_fdb_dump_cb_t *cb, void *data)
 {
 	struct mv88e6xxx_atu_entry addr;
@@ -1401,11 +1401,14 @@ static int mv88e6xxx_port_db_dump_fid(struct mv88e6xxx_chip *chip,
 		if (addr.trunk || (addr.portvec & BIT(port)) == 0)
 			continue;
 
-		if (!is_unicast_ether_addr(addr.mac))
+		if ((is_unicast_ether_addr(addr.mac) && mc) ||
+		    (is_multicast_ether_addr(addr.mac) && !mc))
 			continue;
 
-		is_static = (addr.state ==
-			     MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC);
+		is_static = addr.state == (mc ?
+			MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC :
+			MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC);
+
 		err = cb(addr.mac, vid, is_static, data);
 		if (err)
 			return err;
@@ -1415,7 +1418,7 @@ static int mv88e6xxx_port_db_dump_fid(struct mv88e6xxx_chip *chip,
 }
 
 static int mv88e6xxx_port_db_dump(struct mv88e6xxx_chip *chip, int port,
-				  dsa_fdb_dump_cb_t *cb, void *data)
+				  bool mc, dsa_fdb_dump_cb_t *cb, void *data)
 {
 	struct mv88e6xxx_vtu_entry vlan = {
 		.vid = chip->info->max_vid,
@@ -1428,7 +1431,7 @@ static int mv88e6xxx_port_db_dump(struct mv88e6xxx_chip *chip, int port,
 	if (err)
 		return err;
 
-	err = mv88e6xxx_port_db_dump_fid(chip, fid, 0, port, cb, data);
+	err = mv88e6xxx_port_db_dump_fid(chip, fid, 0, port, mc, cb, data);
 	if (err)
 		return err;
 
@@ -1442,7 +1445,7 @@ static int mv88e6xxx_port_db_dump(struct mv88e6xxx_chip *chip, int port,
 			break;
 
 		err = mv88e6xxx_port_db_dump_fid(chip, vlan.fid, vlan.vid, port,
-						 cb, data);
+						 mc, cb, data);
 		if (err)
 			return err;
 	} while (vlan.vid < chip->info->max_vid);
@@ -1457,7 +1460,7 @@ static int mv88e6xxx_port_fdb_dump(struct dsa_switch *ds, int port,
 	int err;
 
 	mutex_lock(&chip->reg_lock);
-	err = mv88e6xxx_port_db_dump(chip, port, cb, data);
+	err = mv88e6xxx_port_db_dump(chip, port, false, cb, data);
 	mutex_unlock(&chip->reg_lock);
 
 	return err;
@@ -3777,6 +3780,19 @@ static int mv88e6xxx_port_mdb_del(struct dsa_switch *ds, int port,
 	return err;
 }
 
+static int mv88e6xxx_port_mdb_dump(struct dsa_switch *ds, int port,
+				   dsa_fdb_dump_cb_t *cb, void *data)
+{
+	struct mv88e6xxx_chip *chip = ds->priv;
+	int err;
+
+	mutex_lock(&chip->reg_lock);
+	err = mv88e6xxx_port_db_dump(chip, port, true, cb, data);
+	mutex_unlock(&chip->reg_lock);
+
+	return err;
+}
+
 static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
 	.probe			= mv88e6xxx_drv_probe,
 	.get_tag_protocol	= mv88e6xxx_get_tag_protocol,
@@ -3810,6 +3826,7 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
 	.port_mdb_prepare       = mv88e6xxx_port_mdb_prepare,
 	.port_mdb_add           = mv88e6xxx_port_mdb_add,
 	.port_mdb_del           = mv88e6xxx_port_mdb_del,
+	.port_mdb_dump          = mv88e6xxx_port_mdb_dump,
 	.crosschip_bridge_join	= mv88e6xxx_crosschip_bridge_join,
 	.crosschip_bridge_leave	= mv88e6xxx_crosschip_bridge_leave,
 };
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 1309ba0376ae..c0d1b6c47a50 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -312,6 +312,7 @@ static inline u8 dsa_upstream_port(struct dsa_switch *ds)
 		return ds->rtable[dst->cpu_dp->ds->index];
 }
 
+/* FDB (and MDB) dump callback */
 typedef int dsa_fdb_dump_cb_t(const unsigned char *addr, u16 vid,
 			      bool is_static, void *data);
 struct dsa_switch_ops {
@@ -441,6 +442,8 @@ struct dsa_switch_ops {
 				struct switchdev_trans *trans);
 	int	(*port_mdb_del)(struct dsa_switch *ds, int port,
 				const struct switchdev_obj_port_mdb *mdb);
+	int	(*port_mdb_dump)(struct dsa_switch *ds, int port,
+				 dsa_fdb_dump_cb_t *cb, void *data);
 	/*
 	 * RXNFC
 	 */
-- 
2.14.1

* [PATCH net-next v2 06/10] net: dsa: debugfs: add port fdb
From: Vivien Didelot @ 2017-08-28 19:17 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Egil Hjelmeland, John Crispin, Woojung Huh,
	Sean Wang, Nikita Yushchenko, Chris Healy, Vivien Didelot
In-Reply-To: <20170828191748.19492-1-vivien.didelot@savoirfairelinux.com>

Add a debug filesystem "fdb" entry to query a port's hardware FDB
entries through the .port_fdb_dump switch operation.

This makes it convenient to query the hardware directly, or to inspect
DSA and CPU links, since these ports are not exposed to userspace.

    # cat port1/fdb
    vid 0  12:34:56:78:90:ab  unicast  static
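The FDB line format can be reproduced in userspace C as a sketch. The
kernel's %pM printk extension does not exist in userspace printf, so the
hypothetical fdb_format() helper prints the six bytes explicitly, and the
hypothetical is_unicast() tests the I/G bit of the first octet, which is
the bit is_unicast_ether_addr() checks. Because the label is derived from
the address itself, the same callback also labels MDB entries correctly.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* I/G bit clear in the first octet: an individual (unicast) address. */
static bool is_unicast(const unsigned char *addr)
{
	return !(addr[0] & 0x01);
}

/* Formats one entry like dsa_debugfs_fdb_dump_cb(), with "%pM" spelled
 * out as six "%02x" conversions. Returns snprintf()'s result. */
static int fdb_format(char *out, size_t size, const unsigned char *addr,
		      unsigned short vid, bool is_static)
{
	return snprintf(out, size,
			"vid %d  %02x:%02x:%02x:%02x:%02x:%02x  %s  %s\n",
			vid, addr[0], addr[1], addr[2], addr[3], addr[4],
			addr[5], is_unicast(addr) ? "unicast" : "multicast",
			is_static ? "static" : "dynamic");
}
```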

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 net/dsa/debugfs.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/net/dsa/debugfs.c b/net/dsa/debugfs.c
index 7b299c9d9892..59c09a67bc23 100644
--- a/net/dsa/debugfs.c
+++ b/net/dsa/debugfs.c
@@ -10,6 +10,7 @@
  */
 
 #include <linux/debugfs.h>
+#include <linux/etherdevice.h>
 #include <linux/seq_file.h>
 
 #include "dsa_priv.h"
@@ -109,6 +110,31 @@ static int dsa_debugfs_create_file(struct dsa_switch *ds, struct dentry *dir,
 	return 0;
 }
 
+static int dsa_debugfs_fdb_dump_cb(const unsigned char *addr, u16 vid,
+				   bool is_static, void *data)
+{
+	struct seq_file *seq = data;
+
+	seq_printf(seq, "vid %d  %pM  %s  %s\n", vid, addr,
+		   is_unicast_ether_addr(addr) ? "unicast" : "multicast",
+		   is_static ? "static" : "dynamic");
+
+	return 0;
+}
+
+static int dsa_debugfs_fdb_read(struct dsa_switch *ds, int id,
+				struct seq_file *seq)
+{
+	if (!ds->ops->port_fdb_dump)
+		return -EOPNOTSUPP;
+
+	return ds->ops->port_fdb_dump(ds, id, dsa_debugfs_fdb_dump_cb, seq);
+}
+
+static const struct dsa_debugfs_ops dsa_debugfs_fdb_ops = {
+	.read = dsa_debugfs_fdb_read,
+};
+
 static void dsa_debugfs_regs_read_count(struct dsa_switch *ds, int id,
 					struct seq_file *seq, int count)
 {
@@ -222,6 +248,11 @@ static int dsa_debugfs_create_port(struct dsa_switch *ds, int port)
 	if (IS_ERR_OR_NULL(dir))
 		return -EFAULT;
 
+	err = dsa_debugfs_create_file(ds, dir, "fdb", port,
+				      &dsa_debugfs_fdb_ops);
+	if (err)
+		return err;
+
 	err = dsa_debugfs_create_file(ds, dir, "regs", port,
 				      &dsa_debugfs_regs_ops);
 	if (err)
-- 
2.14.1

* [PATCH net-next v2 05/10] net: dsa: debugfs: add port regs
From: Vivien Didelot @ 2017-08-28 19:17 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Egil Hjelmeland, John Crispin, Woojung Huh,
	Sean Wang, Nikita Yushchenko, Chris Healy, Vivien Didelot
In-Reply-To: <20170828191748.19492-1-vivien.didelot@savoirfairelinux.com>

Add a debug filesystem "regs" entry to query a port's hardware registers
through the .get_regs_len and .get_regs switch operations.

This is very convenient because it allows one to dump the registers of
DSA links, which are not exposed to userspace.

Here are the registers of the CPU and DSA ports of a zii-rev-b board:

    # pr -mt switch0/port{5,6}/regs
     0: 4e07			     0: 4d04
     1: 403e			     1: 003d
     2: 0000			     2: 0000
     3: 3521			     3: 3521
     4: 0533			     4: 373f
     5: 8000			     5: 0000
     6: 005f			     6: 003f
     7: 002a			     7: 002a
     8: 2080			     8: 2080
     9: 0001			     9: 0001
    10: 0000			    10: 0000
    11: 0020			    11: 0000
    12: 0000			    12: 0000
    13: 0000			    13: 0000
    14: 0000			    14: 0000
    15: 9100			    15: dada
    16: 0000			    16: 0000
    17: 0000			    17: 0000
    18: 0000			    18: 0000
    19: 0000			    19: 00d8
    20: 0000			    20: 0000
    21: 0000			    21: 0000
    22: 0022			    22: 0000
    23: 0000			    23: 0000
    24: 3210			    24: 3210
    25: 7654			    25: 7654
    26: 0000			    26: 0000
    27: 8000			    27: 8000
    28: 0000			    28: 0000
    29: 0000			    29: 0000
    30: 0000			    30: 0000
    31: 0000			    31: 0000
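The layout of this output can be sketched in userspace C. The sketch
assumes, following the ethtool get_regs_len convention, that the length
is a byte count, so a buffer of 16-bit registers holds len / 2 entries.
regs_format() is hypothetical and takes register values from the caller
instead of reading them from the switch; each entry is printed with the
same "%2d: %04x" format as dsa_debugfs_regs_read_count().

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes one "%2d: %04x" line per 16-bit register into out. Returns the
 * number of characters written, or -1 on error or truncation. */
static int regs_format(char *out, size_t size, const uint16_t *regs,
		       int len_bytes)
{
	/* len_bytes is assumed to be a byte count, as for ethtool. */
	int count = len_bytes / (int)sizeof(uint16_t);
	size_t used = 0;

	if (size == 0)
		return -1;
	out[0] = '\0';

	for (int i = 0; i < count; i++) {
		int n = snprintf(out + used, size - used, "%2d: %04x\n",
				 i, regs[i]);

		if (n < 0 || (size_t)n >= size - used)
			return -1;	/* error or truncated output */
		used += (size_t)n;
	}

	return (int)used;
}
```

Sizing the buffer from the byte count also keeps the on-stack footprint
at len bytes, rather than a multiple of it.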

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 net/dsa/debugfs.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/net/dsa/debugfs.c b/net/dsa/debugfs.c
index 997bbc8eb502..7b299c9d9892 100644
--- a/net/dsa/debugfs.c
+++ b/net/dsa/debugfs.c
@@ -109,6 +109,40 @@ static int dsa_debugfs_create_file(struct dsa_switch *ds, struct dentry *dir,
 	return 0;
 }
 
+static void dsa_debugfs_regs_read_count(struct dsa_switch *ds, int id,
+					struct seq_file *seq, int count)
+{
+	u16 data[count / 2];
+	struct ethtool_regs regs = { 0 };
+	int i;
+
+	ds->ops->get_regs(ds, id, &regs, data);
+
+	for (i = 0; i < count / 2; i++)
+		seq_printf(seq, "%2d: %04x\n", i, data[i]);
+}
+
+static int dsa_debugfs_regs_read(struct dsa_switch *ds, int id,
+				 struct seq_file *seq)
+{
+	int count;
+
+	if (!ds->ops->get_regs_len || !ds->ops->get_regs)
+		return -EOPNOTSUPP;
+
+	count = ds->ops->get_regs_len(ds, id);
+	if (count < 0)
+		return count;
+
+	dsa_debugfs_regs_read_count(ds, id, seq, count);
+
+	return 0;
+}
+
+static const struct dsa_debugfs_ops dsa_debugfs_regs_ops = {
+	.read = dsa_debugfs_regs_read,
+};
+
 static void dsa_debugfs_stats_read_count(struct dsa_switch *ds, int id,
 					 struct seq_file *seq, int count)
 {
@@ -188,6 +222,11 @@ static int dsa_debugfs_create_port(struct dsa_switch *ds, int port)
 	if (IS_ERR_OR_NULL(dir))
 		return -EFAULT;
 
+	err = dsa_debugfs_create_file(ds, dir, "regs", port,
+				      &dsa_debugfs_regs_ops);
+	if (err)
+		return err;
+
 	err = dsa_debugfs_create_file(ds, dir, "stats", port,
 				      &dsa_debugfs_stats_ops);
 	if (err)
-- 
2.14.1
