netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26
@ 2017-10-26 17:17 Jeff Kirsher
  2017-10-26 17:17 ` [net-next 1/7] net/sched: Check for null dev_queue on create flow Jeff Kirsher
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Jeff Kirsher @ 2017-10-26 17:17 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This patchset is a proposal of how the Traffic Control subsystem can
be used to offload the configuration of the Credit Based Shaper
(defined in the IEEE 802.1Q-2014 Section 8.6.8.2) into supported
network devices.

As part of this work, we've assessed previous public discussions
related to TSN enabling: patches from Henrik Austad (Cisco), the
presentation from Eric Mann at Linux Plumbers 2012, patches from
Gangfeng Huang (National Instruments) and the current state of the
OpenAVNU project (https://github.com/AVnu/OpenAvnu/).

Overview
========

Time-sensitive Networking (TSN) is a set of standards that aim to
address resources availability for providing bandwidth reservation and
bounded latency on Ethernet based LANs. The proposal described here
aims to cover mainly what is needed to enable the following standards:
802.1Qat and 802.1Qav.

The initial target of this work is the Intel i210 NIC, but other
controllers' datasheet were also taken into account, like the Renesas
RZ/A1H RZ/A1M group and the Synopsis DesignWare Ethernet QoS
controller.


Proposal
========

Feature-wise, what is covered here is the configuration interfaces for
HW implementations of the Credit-Based shaper (CBS, 802.1Qav). CBS is
a per-queue shaper. Given that this feature is related to traffic
shaping, and that the traffic control subsystem already provides a
queueing discipline that offloads config into the device driver (i.e.
mqprio), designing a new qdisc for the specific purpose of offloading
the config for the CBS shaper seemed like a good fit.

For steering traffic into the correct queues, we use the socket option
SO_PRIORITY and then a mechanism to map priority to traffic classes /
Tx queues. The qdisc mqprio is currently used in our tests.

As for the CBS config interface, this patchset is proposing a new
qdisc called 'cbs'. Its 'tc' cmd line is:

$ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
     idleslope I

   Note that the parameters for this qdisc are the ones defined by the
   802.1Q-2014 spec, so no hardware specific functionality is exposed here.

Per-stream shaping, as defined by IEEE 802.1Q-2014 Section 34.6.1, is
not yet covered by this proposal.

The following are changes since commit 3fc27b71b894f5e2ad611297d5817b6092c96626:
  tools: bpftool: try to mount bpffs if required for pinning objects
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 1GbE

Andre Guedes (1):
  igb: Add support for CBS offload

Jesus Sanchez-Palencia (3):
  net/sched: Check for null dev_queue on create flow
  net/sched: Change behavior of mq select_queue()
  net/sched: Add select_queue() class_ops for mqprio

Vinicius Costa Gomes (3):
  net/sched: Introduce Credit Based Shaper (CBS) qdisc
  net/sched: Add support for HW offloading for CBS
  net/sched/sch_cbs: Fix compilation on 32bit architectures

 drivers/net/ethernet/intel/igb/e1000_defines.h |  23 ++
 drivers/net/ethernet/intel/igb/e1000_regs.h    |   8 +
 drivers/net/ethernet/intel/igb/igb.h           |   6 +
 drivers/net/ethernet/intel/igb/igb_main.c      | 347 +++++++++++++++++++++++
 include/linux/netdevice.h                      |   1 +
 include/net/pkt_sched.h                        |   9 +
 include/uapi/linux/pkt_sched.h                 |  19 ++
 net/sched/Kconfig                              |  11 +
 net/sched/Makefile                             |   1 +
 net/sched/sch_cbs.c                            | 373 +++++++++++++++++++++++++
 net/sched/sch_generic.c                        |   8 +-
 net/sched/sch_mq.c                             |  10 +-
 net/sched/sch_mqprio.c                         |   7 +
 13 files changed, 813 insertions(+), 10 deletions(-)
 create mode 100644 net/sched/sch_cbs.c

-- 
2.14.2

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [net-next 1/7] net/sched: Check for null dev_queue on create flow
  2017-10-26 17:17 [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26 Jeff Kirsher
@ 2017-10-26 17:17 ` Jeff Kirsher
  2017-10-27 10:23   ` Davide Caratti
  2017-10-26 17:17 ` [net-next 2/7] net/sched: Change behavior of mq select_queue() Jeff Kirsher
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Jeff Kirsher @ 2017-10-26 17:17 UTC (permalink / raw)
  To: davem
  Cc: Jesus Sanchez-Palencia, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher

From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>

In qdisc_alloc() the dev_queue pointer was used without any checks
being performed. If qdisc_create() gets a null dev_queue pointer, it
just passes it along to qdisc_alloc(), leading to a crash. That
happens if a root qdisc implements select_queue() and returns a null
dev_queue pointer for an "invalid handle", for example, or if the
dev_queue associated with the parent qdisc is null.

This patch is in preparation for the next in this series, where
select_queue() is being added to mqprio and as it may return a null
dev_queue.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 net/sched/sch_generic.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 6ced7c89c141..aa74aa42b5d7 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -603,8 +603,14 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 	struct Qdisc *sch;
 	unsigned int size = QDISC_ALIGN(sizeof(*sch)) + ops->priv_size;
 	int err = -ENOBUFS;
-	struct net_device *dev = dev_queue->dev;
+	struct net_device *dev;
+
+	if (!dev_queue) {
+		err = -EINVAL;
+		goto errout;
+	}
 
+	dev = dev_queue->dev;
 	p = kzalloc_node(size, GFP_KERNEL,
 			 netdev_queue_numa_node_read(dev_queue));
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [net-next 2/7] net/sched: Change behavior of mq select_queue()
  2017-10-26 17:17 [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26 Jeff Kirsher
  2017-10-26 17:17 ` [net-next 1/7] net/sched: Check for null dev_queue on create flow Jeff Kirsher
@ 2017-10-26 17:17 ` Jeff Kirsher
  2017-10-26 17:17 ` [net-next 3/7] net/sched: Add select_queue() class_ops for mqprio Jeff Kirsher
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Jeff Kirsher @ 2017-10-26 17:17 UTC (permalink / raw)
  To: davem
  Cc: Jesus Sanchez-Palencia, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher

From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>

Currently, the class_ops select_queue() implementation on sch_mq
returns a pointer to netdev_queue #0 when it receives and invalid
qdisc id. That can be misleading since all of mq's inner qdiscs are
attached to a valid netdev_queue.

Here we fix that by returning NULL when a qdisc id is invalid. This is
aligned with how select_queue() is implemented for sch_mqprio in the
next patch on this series, keeping a consistent behavior between these
two qdiscs.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 net/sched/sch_mq.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index f3a3e507422b..213b586a06a0 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -130,15 +130,7 @@ static struct netdev_queue *mq_queue_get(struct Qdisc *sch, unsigned long cl)
 static struct netdev_queue *mq_select_queue(struct Qdisc *sch,
 					    struct tcmsg *tcm)
 {
-	unsigned int ntx = TC_H_MIN(tcm->tcm_parent);
-	struct netdev_queue *dev_queue = mq_queue_get(sch, ntx);
-
-	if (!dev_queue) {
-		struct net_device *dev = qdisc_dev(sch);
-
-		return netdev_get_tx_queue(dev, 0);
-	}
-	return dev_queue;
+	return mq_queue_get(sch, TC_H_MIN(tcm->tcm_parent));
 }
 
 static int mq_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [net-next 3/7] net/sched: Add select_queue() class_ops for mqprio
  2017-10-26 17:17 [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26 Jeff Kirsher
  2017-10-26 17:17 ` [net-next 1/7] net/sched: Check for null dev_queue on create flow Jeff Kirsher
  2017-10-26 17:17 ` [net-next 2/7] net/sched: Change behavior of mq select_queue() Jeff Kirsher
@ 2017-10-26 17:17 ` Jeff Kirsher
  2017-10-26 17:17 ` [net-next 4/7] net/sched: Introduce Credit Based Shaper (CBS) qdisc Jeff Kirsher
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Jeff Kirsher @ 2017-10-26 17:17 UTC (permalink / raw)
  To: davem
  Cc: Jesus Sanchez-Palencia, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher

From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>

When replacing a child qdisc from mqprio, tc_modify_qdisc() must fetch
the netdev_queue pointer that the current child qdisc is associated
with before creating the new qdisc.

Currently, when using mqprio as root qdisc, the kernel will end up
getting the queue #0 pointer from the mqprio (root qdisc), which leaves
any new child qdisc with a possibly wrong netdev_queue pointer.

Implementing the Qdisc_class_ops select_queue() on mqprio fixes this
issue and avoid an inconsistent state when child qdiscs are replaced.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 net/sched/sch_mqprio.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 51c2b289c69b..4d5ed45123f0 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -575,6 +575,12 @@ static void mqprio_walk(struct Qdisc *sch, struct qdisc_walker *arg)
 	}
 }
 
+static struct netdev_queue *mqprio_select_queue(struct Qdisc *sch,
+						struct tcmsg *tcm)
+{
+	return mqprio_queue_get(sch, TC_H_MIN(tcm->tcm_parent));
+}
+
 static const struct Qdisc_class_ops mqprio_class_ops = {
 	.graft		= mqprio_graft,
 	.leaf		= mqprio_leaf,
@@ -582,6 +588,7 @@ static const struct Qdisc_class_ops mqprio_class_ops = {
 	.walk		= mqprio_walk,
 	.dump		= mqprio_dump_class,
 	.dump_stats	= mqprio_dump_class_stats,
+	.select_queue	= mqprio_select_queue,
 };
 
 static struct Qdisc_ops mqprio_qdisc_ops __read_mostly = {
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [net-next 4/7] net/sched: Introduce Credit Based Shaper (CBS) qdisc
  2017-10-26 17:17 [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26 Jeff Kirsher
                   ` (2 preceding siblings ...)
  2017-10-26 17:17 ` [net-next 3/7] net/sched: Add select_queue() class_ops for mqprio Jeff Kirsher
@ 2017-10-26 17:17 ` Jeff Kirsher
  2017-10-26 17:17 ` [net-next 5/7] net/sched: Add support for HW offloading for CBS Jeff Kirsher
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Jeff Kirsher @ 2017-10-26 17:17 UTC (permalink / raw)
  To: davem
  Cc: Vinicius Costa Gomes, netdev, nhorman, sassmann, jogreene,
	Jesus Sanchez-Palencia, Jeff Kirsher

From: Vinicius Costa Gomes <vinicius.gomes@intel.com>

This queueing discipline implements the shaper algorithm defined by
the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.

It's primary usage is to apply some bandwidth reservation to user
defined traffic classes, which are mapped to different queues via the
mqprio qdisc.

Only a simple software implementation is added for now.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 include/uapi/linux/pkt_sched.h |  19 +++
 net/sched/Kconfig              |  11 ++
 net/sched/Makefile             |   1 +
 net/sched/sch_cbs.c            | 293 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 324 insertions(+)
 create mode 100644 net/sched/sch_cbs.c

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index e7cc3d3c7421..0e88cc262ca0 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -904,4 +904,23 @@ struct tc_pie_xstats {
 	__u32 maxq;             /* maximum queue size */
 	__u32 ecn_mark;         /* packets marked with ecn*/
 };
+
+/* CBS */
+struct tc_cbs_qopt {
+	__u8 offload;
+	__u8 _pad[3];
+	__s32 hicredit;
+	__s32 locredit;
+	__s32 idleslope;
+	__s32 sendslope;
+};
+
+enum {
+	TCA_CBS_UNSPEC,
+	TCA_CBS_PARMS,
+	__TCA_CBS_MAX,
+};
+
+#define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index e70ed26485a2..c03d86a7775e 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -172,6 +172,17 @@ config NET_SCH_TBF
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_tbf.
 
+config NET_SCH_CBS
+	tristate "Credit Based Shaper (CBS)"
+	---help---
+	  Say Y here if you want to use the Credit Based Shaper (CBS) packet
+	  scheduling algorithm.
+
+	  See the top of <file:net/sched/sch_cbs.c> for more details.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_cbs.
+
 config NET_SCH_GRED
 	tristate "Generic Random Early Detection (GRED)"
 	---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 7b915d226de7..80c8f92d162d 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_NET_SCH_FQ_CODEL)	+= sch_fq_codel.o
 obj-$(CONFIG_NET_SCH_FQ)	+= sch_fq.o
 obj-$(CONFIG_NET_SCH_HHF)	+= sch_hhf.o
 obj-$(CONFIG_NET_SCH_PIE)	+= sch_pie.o
+obj-$(CONFIG_NET_SCH_CBS)	+= sch_cbs.o
 
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
new file mode 100644
index 000000000000..c0102b589494
--- /dev/null
+++ b/net/sched/sch_cbs.c
@@ -0,0 +1,293 @@
+/*
+ * net/sched/sch_cbs.c	Credit Based Shaper
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Vinicius Costa Gomes <vinicius.gomes@intel.com>
+ *
+ */
+
+/* Credit Based Shaper (CBS)
+ * =========================
+ *
+ * This is a simple rate-limiting shaper aimed at TSN applications on
+ * systems with known traffic workloads.
+ *
+ * Its algorithm is defined by the IEEE 802.1Q-2014 Specification,
+ * Section 8.6.8.2, and explained in more detail in the Annex L of the
+ * same specification.
+ *
+ * There are four tunables to be considered:
+ *
+ *	'idleslope': Idleslope is the rate of credits that is
+ *	accumulated (in kilobits per second) when there is at least
+ *	one packet waiting for transmission. Packets are transmitted
+ *	when the current value of credits is equal or greater than
+ *	zero. When there is no packet to be transmitted the amount of
+ *	credits is set to zero. This is the main tunable of the CBS
+ *	algorithm.
+ *
+ *	'sendslope':
+ *	Sendslope is the rate of credits that is depleted (it should be a
+ *	negative number of kilobits per second) when a transmission is
+ *	ocurring. It can be calculated as follows, (IEEE 802.1Q-2014 Section
+ *	8.6.8.2 item g):
+ *
+ *	sendslope = idleslope - port_transmit_rate
+ *
+ *	'hicredit': Hicredit defines the maximum amount of credits (in
+ *	bytes) that can be accumulated. Hicredit depends on the
+ *	characteristics of interfering traffic,
+ *	'max_interference_size' is the maximum size of any burst of
+ *	traffic that can delay the transmission of a frame that is
+ *	available for transmission for this traffic class, (IEEE
+ *	802.1Q-2014 Annex L, Equation L-3):
+ *
+ *	hicredit = max_interference_size * (idleslope / port_transmit_rate)
+ *
+ *	'locredit': Locredit is the minimum amount of credits that can
+ *	be reached. It is a function of the traffic flowing through
+ *	this qdisc (IEEE 802.1Q-2014 Annex L, Equation L-2):
+ *
+ *	locredit = max_frame_size * (sendslope / port_transmit_rate)
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <net/netlink.h>
+#include <net/sch_generic.h>
+#include <net/pkt_sched.h>
+
+#define BYTES_PER_KBIT (1000LL / 8)
+
+struct cbs_sched_data {
+	s64 port_rate; /* in bytes/s */
+	s64 last; /* timestamp in ns */
+	s64 credits; /* in bytes */
+	s32 locredit; /* in bytes */
+	s32 hicredit; /* in bytes */
+	s64 sendslope; /* in bytes/s */
+	s64 idleslope; /* in bytes/s */
+	struct qdisc_watchdog watchdog;
+	int (*enqueue)(struct sk_buff *skb, struct Qdisc *sch);
+	struct sk_buff *(*dequeue)(struct Qdisc *sch);
+};
+
+static int cbs_enqueue_soft(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+
+	if (sch->q.qlen == 0 && q->credits > 0) {
+		/* We need to stop accumulating credits when there's
+		 * no enqueued packets and q->credits is positive.
+		 */
+		q->credits = 0;
+		q->last = ktime_get_ns();
+	}
+
+	return qdisc_enqueue_tail(skb, sch);
+}
+
+static int cbs_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+		       struct sk_buff **to_free)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+
+	return q->enqueue(skb, sch);
+}
+
+/* timediff is in ns, slope is in bytes/s */
+static s64 timediff_to_credits(s64 timediff, s64 slope)
+{
+	return div64_s64(timediff * slope, NSEC_PER_SEC);
+}
+
+static s64 delay_from_credits(s64 credits, s64 slope)
+{
+	if (unlikely(slope == 0))
+		return S64_MAX;
+
+	return div64_s64(-credits * NSEC_PER_SEC, slope);
+}
+
+static s64 credits_from_len(unsigned int len, s64 slope, s64 port_rate)
+{
+	if (unlikely(port_rate == 0))
+		return S64_MAX;
+
+	return div64_s64(len * slope, port_rate);
+}
+
+static struct sk_buff *cbs_dequeue_soft(struct Qdisc *sch)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	s64 now = ktime_get_ns();
+	struct sk_buff *skb;
+	s64 credits;
+	int len;
+
+	if (q->credits < 0) {
+		credits = timediff_to_credits(now - q->last, q->idleslope);
+
+		credits = q->credits + credits;
+		q->credits = min_t(s64, credits, q->hicredit);
+
+		if (q->credits < 0) {
+			s64 delay;
+
+			delay = delay_from_credits(q->credits, q->idleslope);
+			qdisc_watchdog_schedule_ns(&q->watchdog, now + delay);
+
+			q->last = now;
+
+			return NULL;
+		}
+	}
+
+	skb = qdisc_dequeue_head(sch);
+	if (!skb)
+		return NULL;
+
+	len = qdisc_pkt_len(skb);
+
+	/* As sendslope is a negative number, this will decrease the
+	 * amount of q->credits.
+	 */
+	credits = credits_from_len(len, q->sendslope, q->port_rate);
+	credits += q->credits;
+
+	q->credits = max_t(s64, credits, q->locredit);
+	q->last = now;
+
+	return skb;
+}
+
+static struct sk_buff *cbs_dequeue(struct Qdisc *sch)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+
+	return q->dequeue(sch);
+}
+
+static const struct nla_policy cbs_policy[TCA_CBS_MAX + 1] = {
+	[TCA_CBS_PARMS]	= { .len = sizeof(struct tc_cbs_qopt) },
+};
+
+static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+	struct nlattr *tb[TCA_CBS_MAX + 1];
+	struct ethtool_link_ksettings ecmd;
+	struct tc_cbs_qopt *qopt;
+	s64 link_speed;
+	int err;
+
+	err = nla_parse_nested(tb, TCA_CBS_MAX, opt, cbs_policy, NULL);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_CBS_PARMS])
+		return -EINVAL;
+
+	qopt = nla_data(tb[TCA_CBS_PARMS]);
+
+	if (qopt->offload)
+		return -EOPNOTSUPP;
+
+	if (!__ethtool_get_link_ksettings(dev, &ecmd))
+		link_speed = ecmd.base.speed;
+	else
+		link_speed = SPEED_1000;
+
+	q->port_rate = link_speed * 1000 * BYTES_PER_KBIT;
+
+	q->enqueue = cbs_enqueue_soft;
+	q->dequeue = cbs_dequeue_soft;
+
+	q->hicredit = qopt->hicredit;
+	q->locredit = qopt->locredit;
+	q->idleslope = qopt->idleslope * BYTES_PER_KBIT;
+	q->sendslope = qopt->sendslope * BYTES_PER_KBIT;
+
+	return 0;
+}
+
+static int cbs_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+
+	if (!opt)
+		return -EINVAL;
+
+	qdisc_watchdog_init(&q->watchdog, sch);
+
+	return cbs_change(sch, opt);
+}
+
+static void cbs_destroy(struct Qdisc *sch)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+
+	qdisc_watchdog_cancel(&q->watchdog);
+}
+
+static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct tc_cbs_qopt opt = { };
+	struct nlattr *nest;
+
+	nest = nla_nest_start(skb, TCA_OPTIONS);
+	if (!nest)
+		goto nla_put_failure;
+
+	opt.hicredit = q->hicredit;
+	opt.locredit = q->locredit;
+	opt.sendslope = q->sendslope / BYTES_PER_KBIT;
+	opt.idleslope = q->idleslope / BYTES_PER_KBIT;
+	opt.offload = 0;
+
+	if (nla_put(skb, TCA_CBS_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	return nla_nest_end(skb, nest);
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
+static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {
+	.id		=	"cbs",
+	.priv_size	=	sizeof(struct cbs_sched_data),
+	.enqueue	=	cbs_enqueue,
+	.dequeue	=	cbs_dequeue,
+	.peek		=	qdisc_peek_dequeued,
+	.init		=	cbs_init,
+	.reset		=	qdisc_reset_queue,
+	.destroy	=	cbs_destroy,
+	.change		=	cbs_change,
+	.dump		=	cbs_dump,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init cbs_module_init(void)
+{
+	return register_qdisc(&cbs_qdisc_ops);
+}
+
+static void __exit cbs_module_exit(void)
+{
+	unregister_qdisc(&cbs_qdisc_ops);
+}
+module_init(cbs_module_init)
+module_exit(cbs_module_exit)
+MODULE_LICENSE("GPL");
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [net-next 5/7] net/sched: Add support for HW offloading for CBS
  2017-10-26 17:17 [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26 Jeff Kirsher
                   ` (3 preceding siblings ...)
  2017-10-26 17:17 ` [net-next 4/7] net/sched: Introduce Credit Based Shaper (CBS) qdisc Jeff Kirsher
@ 2017-10-26 17:17 ` Jeff Kirsher
  2017-10-26 17:17 ` [net-next 6/7] net/sched/sch_cbs: Fix compilation on 32bit architectures Jeff Kirsher
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Jeff Kirsher @ 2017-10-26 17:17 UTC (permalink / raw)
  To: davem
  Cc: Vinicius Costa Gomes, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher

From: Vinicius Costa Gomes <vinicius.gomes@intel.com>

This adds support for offloading the CBS algorithm to the controller,
if supported. Drivers wanting to support CBS offload must implement
the .ndo_setup_tc callback and handle the TC_SETUP_CBS (introduced
here) type.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 include/linux/netdevice.h |   1 +
 include/net/pkt_sched.h   |   9 ++++
 net/sched/sch_cbs.c       | 104 ++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6c7960c8338a..5e02f79b2110 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -776,6 +776,7 @@ enum tc_setup_type {
 	TC_SETUP_CLSMATCHALL,
 	TC_SETUP_CLSBPF,
 	TC_SETUP_BLOCK,
+	TC_SETUP_CBS,
 };
 
 /* These structures hold the attributes of xdp state that are being passed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index b8ecafce4ba1..02f2db26e30c 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -140,4 +140,13 @@ static inline struct net *qdisc_net(struct Qdisc *q)
 	return dev_net(q->dev_queue->dev);
 }
 
+struct tc_cbs_qopt_offload {
+	u8 enable;
+	s32 queue;
+	s32 hicredit;
+	s32 locredit;
+	s32 idleslope;
+	s32 sendslope;
+};
+
 #endif
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index c0102b589494..cae021c642e5 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -68,6 +68,8 @@
 #define BYTES_PER_KBIT (1000LL / 8)
 
 struct cbs_sched_data {
+	bool offload;
+	int queue;
 	s64 port_rate; /* in bytes/s */
 	s64 last; /* timestamp in ns */
 	s64 credits; /* in bytes */
@@ -80,6 +82,11 @@ struct cbs_sched_data {
 	struct sk_buff *(*dequeue)(struct Qdisc *sch);
 };
 
+static int cbs_enqueue_offload(struct sk_buff *skb, struct Qdisc *sch)
+{
+	return qdisc_enqueue_tail(skb, sch);
+}
+
 static int cbs_enqueue_soft(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct cbs_sched_data *q = qdisc_priv(sch);
@@ -169,6 +176,11 @@ static struct sk_buff *cbs_dequeue_soft(struct Qdisc *sch)
 	return skb;
 }
 
+static struct sk_buff *cbs_dequeue_offload(struct Qdisc *sch)
+{
+	return qdisc_dequeue_head(sch);
+}
+
 static struct sk_buff *cbs_dequeue(struct Qdisc *sch)
 {
 	struct cbs_sched_data *q = qdisc_priv(sch);
@@ -180,14 +192,66 @@ static const struct nla_policy cbs_policy[TCA_CBS_MAX + 1] = {
 	[TCA_CBS_PARMS]	= { .len = sizeof(struct tc_cbs_qopt) },
 };
 
+static void cbs_disable_offload(struct net_device *dev,
+				struct cbs_sched_data *q)
+{
+	struct tc_cbs_qopt_offload cbs = { };
+	const struct net_device_ops *ops;
+	int err;
+
+	if (!q->offload)
+		return;
+
+	q->enqueue = cbs_enqueue_soft;
+	q->dequeue = cbs_dequeue_soft;
+
+	ops = dev->netdev_ops;
+	if (!ops->ndo_setup_tc)
+		return;
+
+	cbs.queue = q->queue;
+	cbs.enable = 0;
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+	if (err < 0)
+		pr_warn("Couldn't disable CBS offload for queue %d\n",
+			cbs.queue);
+}
+
+static int cbs_enable_offload(struct net_device *dev, struct cbs_sched_data *q,
+			      const struct tc_cbs_qopt *opt)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct tc_cbs_qopt_offload cbs = { };
+	int err;
+
+	if (!ops->ndo_setup_tc)
+		return -EOPNOTSUPP;
+
+	cbs.queue = q->queue;
+
+	cbs.enable = 1;
+	cbs.hicredit = opt->hicredit;
+	cbs.locredit = opt->locredit;
+	cbs.idleslope = opt->idleslope;
+	cbs.sendslope = opt->sendslope;
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+	if (err < 0)
+		return err;
+
+	q->enqueue = cbs_enqueue_offload;
+	q->dequeue = cbs_dequeue_offload;
+
+	return 0;
+}
+
 static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
 {
 	struct cbs_sched_data *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
 	struct nlattr *tb[TCA_CBS_MAX + 1];
-	struct ethtool_link_ksettings ecmd;
 	struct tc_cbs_qopt *qopt;
-	s64 link_speed;
 	int err;
 
 	err = nla_parse_nested(tb, TCA_CBS_MAX, opt, cbs_policy, NULL);
@@ -199,23 +263,30 @@ static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
 
 	qopt = nla_data(tb[TCA_CBS_PARMS]);
 
-	if (qopt->offload)
-		return -EOPNOTSUPP;
+	if (!qopt->offload) {
+		struct ethtool_link_ksettings ecmd;
+		s64 link_speed;
 
-	if (!__ethtool_get_link_ksettings(dev, &ecmd))
-		link_speed = ecmd.base.speed;
-	else
-		link_speed = SPEED_1000;
+		if (!__ethtool_get_link_ksettings(dev, &ecmd))
+			link_speed = ecmd.base.speed;
+		else
+			link_speed = SPEED_1000;
 
-	q->port_rate = link_speed * 1000 * BYTES_PER_KBIT;
+		q->port_rate = link_speed * 1000 * BYTES_PER_KBIT;
 
-	q->enqueue = cbs_enqueue_soft;
-	q->dequeue = cbs_dequeue_soft;
+		cbs_disable_offload(dev, q);
+	} else {
+		err = cbs_enable_offload(dev, q, qopt);
+		if (err < 0)
+			return err;
+	}
 
+	/* Everything went OK, save the parameters used. */
 	q->hicredit = qopt->hicredit;
 	q->locredit = qopt->locredit;
 	q->idleslope = qopt->idleslope * BYTES_PER_KBIT;
 	q->sendslope = qopt->sendslope * BYTES_PER_KBIT;
+	q->offload = qopt->offload;
 
 	return 0;
 }
@@ -223,10 +294,16 @@ static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
 static int cbs_init(struct Qdisc *sch, struct nlattr *opt)
 {
 	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
 
 	if (!opt)
 		return -EINVAL;
 
+	q->queue = sch->dev_queue - netdev_get_tx_queue(dev, 0);
+
+	q->enqueue = cbs_enqueue_soft;
+	q->dequeue = cbs_dequeue_soft;
+
 	qdisc_watchdog_init(&q->watchdog, sch);
 
 	return cbs_change(sch, opt);
@@ -235,8 +312,11 @@ static int cbs_init(struct Qdisc *sch, struct nlattr *opt)
 static void cbs_destroy(struct Qdisc *sch)
 {
 	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
 
 	qdisc_watchdog_cancel(&q->watchdog);
+
+	cbs_disable_offload(dev, q);
 }
 
 static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
@@ -253,7 +333,7 @@ static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
 	opt.locredit = q->locredit;
 	opt.sendslope = q->sendslope / BYTES_PER_KBIT;
 	opt.idleslope = q->idleslope / BYTES_PER_KBIT;
-	opt.offload = 0;
+	opt.offload = q->offload;
 
 	if (nla_put(skb, TCA_CBS_PARMS, sizeof(opt), &opt))
 		goto nla_put_failure;
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [net-next 6/7] net/sched/sch_cbs: Fix compilation on 32bit architectures
  2017-10-26 17:17 [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26 Jeff Kirsher
                   ` (4 preceding siblings ...)
  2017-10-26 17:17 ` [net-next 5/7] net/sched: Add support for HW offloading for CBS Jeff Kirsher
@ 2017-10-26 17:17 ` Jeff Kirsher
  2017-10-26 17:17 ` [net-next 7/7] igb: Add support for CBS offload Jeff Kirsher
  2017-10-27 15:15 ` [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26 David Miller
  7 siblings, 0 replies; 11+ messages in thread
From: Jeff Kirsher @ 2017-10-26 17:17 UTC (permalink / raw)
  To: davem
  Cc: Vinicius Costa Gomes, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher

From: Vinicius Costa Gomes <vinicius.gomes@intel.com>

There was still a couple of divisions of 64bit quantities happening,
which can fail to compile if there aren't instructions to handle that
kind of division.

It will fail with a message like this:

ERROR: "__aeabi_ldivmod" [net/sched/sch_cbs.ko] undefined!

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 net/sched/sch_cbs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index cae021c642e5..bdb533b7fb8c 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -331,8 +331,8 @@ static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
 
 	opt.hicredit = q->hicredit;
 	opt.locredit = q->locredit;
-	opt.sendslope = q->sendslope / BYTES_PER_KBIT;
-	opt.idleslope = q->idleslope / BYTES_PER_KBIT;
+	opt.sendslope = div64_s64(q->sendslope, BYTES_PER_KBIT);
+	opt.idleslope = div64_s64(q->idleslope, BYTES_PER_KBIT);
 	opt.offload = q->offload;
 
 	if (nla_put(skb, TCA_CBS_PARMS, sizeof(opt), &opt))
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [net-next 7/7] igb: Add support for CBS offload
  2017-10-26 17:17 [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26 Jeff Kirsher
                   ` (5 preceding siblings ...)
  2017-10-26 17:17 ` [net-next 6/7] net/sched/sch_cbs: Fix compilation on 32bit architectures Jeff Kirsher
@ 2017-10-26 17:17 ` Jeff Kirsher
  2017-10-27 15:15 ` [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26 David Miller
  7 siblings, 0 replies; 11+ messages in thread
From: Jeff Kirsher @ 2017-10-26 17:17 UTC (permalink / raw)
  To: davem; +Cc: Andre Guedes, netdev, nhorman, sassmann, jogreene, Jeff Kirsher

From: Andre Guedes <andre.guedes@intel.com>

This patch adds support for Credit-Based Shaper (CBS) qdisc offload
from Traffic Control system. This support enable us to leverage the
Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features
from Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav
standard which was merged into 802.1Q in 2014. It enables traffic
prioritization and bandwidth reservation via the Credit-Based Shaper
which is implemented in hardware by i210 controller.

The patch introduces the igb_setup_tc() function which implements the
support for CBS qdisc hardware offload in the IGB driver. CBS offload
is the only traffic control offload supported by the driver at the
moment.

FQTSS transmission mode from i210 controller is automatically enabled
by the IGB driver when the CBS is enabled for the first hardware
queue. Likewise, FQTSS mode is automatically disabled when CBS is
disabled for the last hardware queue. Changing FQTSS mode requires NIC
reset.

FQTSS feature is supported by i210 controller only.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |  23 ++
 drivers/net/ethernet/intel/igb/e1000_regs.h    |   8 +
 drivers/net/ethernet/intel/igb/igb.h           |   6 +
 drivers/net/ethernet/intel/igb/igb_main.c      | 347 +++++++++++++++++++++++++
 4 files changed, 384 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 1de82f247312..83cabff1e0ab 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -353,7 +353,18 @@
 #define E1000_RXPBS_CFG_TS_EN           0x80000000
 
 #define I210_RXPBSIZE_DEFAULT		0x000000A2 /* RXPBSIZE default */
+#define I210_RXPBSIZE_MASK		0x0000003F
+#define I210_RXPBSIZE_PB_32KB		0x00000020
 #define I210_TXPBSIZE_DEFAULT		0x04000014 /* TXPBSIZE default */
+#define I210_TXPBSIZE_MASK		0xC0FFFFFF
+#define I210_TXPBSIZE_PB0_8KB		(8 << 0)
+#define I210_TXPBSIZE_PB1_8KB		(8 << 6)
+#define I210_TXPBSIZE_PB2_4KB		(4 << 12)
+#define I210_TXPBSIZE_PB3_4KB		(4 << 18)
+
+#define I210_DTXMXPKTSZ_DEFAULT		0x00000098
+
+#define I210_SR_QUEUES_NUM		2
 
 /* SerDes Control */
 #define E1000_SCTL_DISABLE_SERDES_LOOPBACK 0x0400
@@ -1051,4 +1062,16 @@
 #define E1000_VLAPQF_P_VALID(_n)	(0x1 << (3 + (_n) * 4))
 #define E1000_VLAPQF_QUEUE_MASK	0x03
 
+/* TX Qav Control fields */
+#define E1000_TQAVCTRL_XMIT_MODE	BIT(0)
+#define E1000_TQAVCTRL_DATAFETCHARB	BIT(4)
+#define E1000_TQAVCTRL_DATATRANARB	BIT(8)
+
+/* TX Qav Credit Control fields */
+#define E1000_TQAVCC_IDLESLOPE_MASK	0xFFFF
+#define E1000_TQAVCC_QUEUEMODE		BIT(31)
+
+/* Transmit Descriptor Control fields */
+#define E1000_TXDCTL_PRIORITY		BIT(27)
+
 #endif
diff --git a/drivers/net/ethernet/intel/igb/e1000_regs.h b/drivers/net/ethernet/intel/igb/e1000_regs.h
index 58adbf234e07..8eee081d395f 100644
--- a/drivers/net/ethernet/intel/igb/e1000_regs.h
+++ b/drivers/net/ethernet/intel/igb/e1000_regs.h
@@ -421,6 +421,14 @@ do { \
 
 #define E1000_I210_FLA		0x1201C
 
+#define E1000_I210_DTXMXPKTSZ	0x355C
+
+#define E1000_I210_TXDCTL(_n)	(0x0E028 + ((_n) * 0x40))
+
+#define E1000_I210_TQAVCTRL	0x3570
+#define E1000_I210_TQAVCC(_n)	(0x3004 + ((_n) * 0x40))
+#define E1000_I210_TQAVHC(_n)	(0x300C + ((_n) * 0x40))
+
 #define E1000_INVM_DATA_REG(_n)	(0x12120 + 4*(_n))
 #define E1000_INVM_SIZE		64 /* Number of INVM Data Registers */
 
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 06ffb2bc713e..92845692087a 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -281,6 +281,11 @@ struct igb_ring {
 	u16 count;			/* number of desc. in the ring */
 	u8 queue_index;			/* logical index of the ring*/
 	u8 reg_idx;			/* physical index of the ring */
+	bool cbs_enable;		/* indicates if CBS is enabled */
+	s32 idleslope;			/* idleSlope in kbps */
+	s32 sendslope;			/* sendSlope in kbps */
+	s32 hicredit;			/* hiCredit in bytes */
+	s32 locredit;			/* loCredit in bytes */
 
 	/* everything past this point are written often */
 	u16 next_to_clean;
@@ -621,6 +626,7 @@ struct igb_adapter {
 #define IGB_FLAG_EEE			BIT(14)
 #define IGB_FLAG_VLAN_PROMISC		BIT(15)
 #define IGB_FLAG_RX_LEGACY		BIT(16)
+#define IGB_FLAG_FQTSS			BIT(17)
 
 /* Media Auto Sense */
 #define IGB_MAS_ENABLE_0		0X0001
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 58d01a211367..b3d730f4d695 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -34,6 +34,7 @@
 #include <linux/slab.h>
 #include <net/checksum.h>
 #include <net/ip6_checksum.h>
+#include <net/pkt_sched.h>
 #include <linux/net_tstamp.h>
 #include <linux/mii.h>
 #include <linux/ethtool.h>
@@ -62,6 +63,17 @@
 #define BUILD 0
 #define DRV_VERSION __stringify(MAJ) "." __stringify(MIN) "." \
 __stringify(BUILD) "-k"
+
+enum queue_mode {
+	QUEUE_MODE_STRICT_PRIORITY,
+	QUEUE_MODE_STREAM_RESERVATION,
+};
+
+enum tx_queue_prio {
+	TX_QUEUE_PRIO_HIGH,
+	TX_QUEUE_PRIO_LOW,
+};
+
 char igb_driver_name[] = "igb";
 char igb_driver_version[] = DRV_VERSION;
 static const char igb_driver_string[] =
@@ -1271,6 +1283,12 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,
 		ring->count = adapter->tx_ring_count;
 		ring->queue_index = txr_idx;
 
+		ring->cbs_enable = false;
+		ring->idleslope = 0;
+		ring->sendslope = 0;
+		ring->hicredit = 0;
+		ring->locredit = 0;
+
 		u64_stats_init(&ring->tx_syncp);
 		u64_stats_init(&ring->tx_syncp2);
 
@@ -1598,6 +1616,284 @@ static void igb_get_hw_control(struct igb_adapter *adapter)
 			ctrl_ext | E1000_CTRL_EXT_DRV_LOAD);
 }
 
+static void enable_fqtss(struct igb_adapter *adapter, bool enable)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct e1000_hw *hw = &adapter->hw;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+
+	if (enable)
+		adapter->flags |= IGB_FLAG_FQTSS;
+	else
+		adapter->flags &= ~IGB_FLAG_FQTSS;
+
+	if (netif_running(netdev))
+		schedule_work(&adapter->reset_task);
+}
+
+static bool is_fqtss_enabled(struct igb_adapter *adapter)
+{
+	return (adapter->flags & IGB_FLAG_FQTSS) ? true : false;
+}
+
+static void set_tx_desc_fetch_prio(struct e1000_hw *hw, int queue,
+				   enum tx_queue_prio prio)
+{
+	u32 val;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+	WARN_ON(queue < 0 || queue > 4);
+
+	val = rd32(E1000_I210_TXDCTL(queue));
+
+	if (prio == TX_QUEUE_PRIO_HIGH)
+		val |= E1000_TXDCTL_PRIORITY;
+	else
+		val &= ~E1000_TXDCTL_PRIORITY;
+
+	wr32(E1000_I210_TXDCTL(queue), val);
+}
+
+static void set_queue_mode(struct e1000_hw *hw, int queue, enum queue_mode mode)
+{
+	u32 val;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+	WARN_ON(queue < 0 || queue > 1);
+
+	val = rd32(E1000_I210_TQAVCC(queue));
+
+	if (mode == QUEUE_MODE_STREAM_RESERVATION)
+		val |= E1000_TQAVCC_QUEUEMODE;
+	else
+		val &= ~E1000_TQAVCC_QUEUEMODE;
+
+	wr32(E1000_I210_TQAVCC(queue), val);
+}
+
+/**
+ *  igb_configure_cbs - Configure Credit-Based Shaper (CBS)
+ *  @adapter: pointer to adapter struct
+ *  @queue: queue number
+ *  @enable: true = enable CBS, false = disable CBS
+ *  @idleslope: idleSlope in kbps
+ *  @sendslope: sendSlope in kbps
+ *  @hicredit: hiCredit in bytes
+ *  @locredit: loCredit in bytes
+ *
+ *  Configure CBS for a given hardware queue. When disabling, idleslope,
+ *  sendslope, hicredit, locredit arguments are ignored. Returns 0 if
+ *  success. Negative otherwise.
+ **/
+static void igb_configure_cbs(struct igb_adapter *adapter, int queue,
+			      bool enable, int idleslope, int sendslope,
+			      int hicredit, int locredit)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct e1000_hw *hw = &adapter->hw;
+	u32 tqavcc;
+	u16 value;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+	WARN_ON(queue < 0 || queue > 1);
+
+	if (enable) {
+		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_HIGH);
+		set_queue_mode(hw, queue, QUEUE_MODE_STREAM_RESERVATION);
+
+		/* According to i210 datasheet section 7.2.7.7, we should set
+		 * the 'idleSlope' field from TQAVCC register following the
+		 * equation:
+		 *
+		 * For 100 Mbps link speed:
+		 *
+		 *     value = BW * 0x7735 * 0.2                          (E1)
+		 *
+		 * For 1000Mbps link speed:
+		 *
+		 *     value = BW * 0x7735 * 2                            (E2)
+		 *
+		 * E1 and E2 can be merged into one equation as shown below.
+		 * Note that 'link-speed' is in Mbps.
+		 *
+		 *     value = BW * 0x7735 * 2 * link-speed
+		 *                           --------------               (E3)
+		 *                                1000
+		 *
+		 * 'BW' is the percentage bandwidth out of full link speed
+		 * which can be found with the following equation. Note that
+		 * idleSlope here is the parameter from this function which
+		 * is in kbps.
+		 *
+		 *     BW =     idleSlope
+		 *          -----------------                             (E4)
+		 *          link-speed * 1000
+		 *
+		 * That said, we can come up with a generic equation to
+		 * calculate the value we should set it TQAVCC register by
+		 * replacing 'BW' in E3 by E4. The resulting equation is:
+		 *
+		 * value =     idleSlope     * 0x7735 * 2 * link-speed
+		 *         -----------------            --------------    (E5)
+		 *         link-speed * 1000                 1000
+		 *
+		 * 'link-speed' is present in both sides of the fraction so
+		 * it is canceled out. The final equation is the following:
+		 *
+		 *     value = idleSlope * 61034
+		 *             -----------------                          (E6)
+		 *                  1000000
+		 */
+		value = DIV_ROUND_UP_ULL(idleslope * 61034ULL, 1000000);
+
+		tqavcc = rd32(E1000_I210_TQAVCC(queue));
+		tqavcc &= ~E1000_TQAVCC_IDLESLOPE_MASK;
+		tqavcc |= value;
+		wr32(E1000_I210_TQAVCC(queue), tqavcc);
+
+		wr32(E1000_I210_TQAVHC(queue), 0x80000000 + hicredit * 0x7735);
+	} else {
+		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_LOW);
+		set_queue_mode(hw, queue, QUEUE_MODE_STRICT_PRIORITY);
+
+		/* Set idleSlope to zero. */
+		tqavcc = rd32(E1000_I210_TQAVCC(queue));
+		tqavcc &= ~E1000_TQAVCC_IDLESLOPE_MASK;
+		wr32(E1000_I210_TQAVCC(queue), tqavcc);
+
+		/* Set hiCredit to zero. */
+		wr32(E1000_I210_TQAVHC(queue), 0);
+	}
+
+	/* XXX: In i210 controller the sendSlope and loCredit parameters from
+	 * CBS are not configurable by software so we don't do any 'controller
+	 * configuration' in respect to these parameters.
+	 */
+
+	netdev_dbg(netdev, "CBS %s: queue %d idleslope %d sendslope %d hiCredit %d locredit %d\n",
+		   (enable) ? "enabled" : "disabled", queue,
+		   idleslope, sendslope, hicredit, locredit);
+}
+
+static int igb_save_cbs_params(struct igb_adapter *adapter, int queue,
+			       bool enable, int idleslope, int sendslope,
+			       int hicredit, int locredit)
+{
+	struct igb_ring *ring;
+
+	if (queue < 0 || queue > adapter->num_tx_queues)
+		return -EINVAL;
+
+	ring = adapter->tx_ring[queue];
+
+	ring->cbs_enable = enable;
+	ring->idleslope = idleslope;
+	ring->sendslope = sendslope;
+	ring->hicredit = hicredit;
+	ring->locredit = locredit;
+
+	return 0;
+}
+
+static bool is_any_cbs_enabled(struct igb_adapter *adapter)
+{
+	struct igb_ring *ring;
+	int i;
+
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		ring = adapter->tx_ring[i];
+
+		if (ring->cbs_enable)
+			return true;
+	}
+
+	return false;
+}
+
+static void igb_setup_tx_mode(struct igb_adapter *adapter)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct e1000_hw *hw = &adapter->hw;
+	u32 val;
+
+	/* Only i210 controller supports changing the transmission mode. */
+	if (hw->mac.type != e1000_i210)
+		return;
+
+	if (is_fqtss_enabled(adapter)) {
+		int i, max_queue;
+
+		/* Configure TQAVCTRL register: set transmit mode to 'Qav',
+		 * set data fetch arbitration to 'round robin' and set data
+		 * transfer arbitration to 'credit shaper algorithm.
+		 */
+		val = rd32(E1000_I210_TQAVCTRL);
+		val |= E1000_TQAVCTRL_XMIT_MODE | E1000_TQAVCTRL_DATATRANARB;
+		val &= ~E1000_TQAVCTRL_DATAFETCHARB;
+		wr32(E1000_I210_TQAVCTRL, val);
+
+		/* Configure Tx and Rx packet buffers sizes as described in
+		 * i210 datasheet section 7.2.7.7.
+		 */
+		val = rd32(E1000_TXPBS);
+		val &= ~I210_TXPBSIZE_MASK;
+		val |= I210_TXPBSIZE_PB0_8KB | I210_TXPBSIZE_PB1_8KB |
+			I210_TXPBSIZE_PB2_4KB | I210_TXPBSIZE_PB3_4KB;
+		wr32(E1000_TXPBS, val);
+
+		val = rd32(E1000_RXPBS);
+		val &= ~I210_RXPBSIZE_MASK;
+		val |= I210_RXPBSIZE_PB_32KB;
+		wr32(E1000_RXPBS, val);
+
+		/* Section 8.12.9 states that MAX_TPKT_SIZE from DTXMXPKTSZ
+		 * register should not exceed the buffer size programmed in
+		 * TXPBS. The smallest buffer size programmed in TXPBS is 4kB
+		 * so according to the datasheet we should set MAX_TPKT_SIZE to
+		 * 4kB / 64.
+		 *
+		 * However, when we do so, no frame from queue 2 and 3 are
+		 * transmitted.  It seems the MAX_TPKT_SIZE should not be great
+		 * or _equal_ to the buffer size programmed in TXPBS. For this
+		 * reason, we set set MAX_ TPKT_SIZE to (4kB - 1) / 64.
+		 */
+		val = (4096 - 1) / 64;
+		wr32(E1000_I210_DTXMXPKTSZ, val);
+
+		/* Since FQTSS mode is enabled, apply any CBS configuration
+		 * previously set. If no previous CBS configuration has been
+		 * done, then the initial configuration is applied, which means
+		 * CBS is disabled.
+		 */
+		max_queue = (adapter->num_tx_queues < I210_SR_QUEUES_NUM) ?
+			    adapter->num_tx_queues : I210_SR_QUEUES_NUM;
+
+		for (i = 0; i < max_queue; i++) {
+			struct igb_ring *ring = adapter->tx_ring[i];
+
+			igb_configure_cbs(adapter, i, ring->cbs_enable,
+					  ring->idleslope, ring->sendslope,
+					  ring->hicredit, ring->locredit);
+		}
+	} else {
+		wr32(E1000_RXPBS, I210_RXPBSIZE_DEFAULT);
+		wr32(E1000_TXPBS, I210_TXPBSIZE_DEFAULT);
+		wr32(E1000_I210_DTXMXPKTSZ, I210_DTXMXPKTSZ_DEFAULT);
+
+		val = rd32(E1000_I210_TQAVCTRL);
+		/* According to Section 8.12.21, the other flags we've set when
+		 * enabling FQTSS are not relevant when disabling FQTSS so we
+		 * don't set they here.
+		 */
+		val &= ~E1000_TQAVCTRL_XMIT_MODE;
+		wr32(E1000_I210_TQAVCTRL, val);
+	}
+
+	netdev_dbg(netdev, "FQTSS %s\n", (is_fqtss_enabled(adapter)) ?
+		   "enabled" : "disabled");
+}
+
 /**
  *  igb_configure - configure the hardware for RX and TX
  *  @adapter: private board structure
@@ -1609,6 +1905,7 @@ static void igb_configure(struct igb_adapter *adapter)
 
 	igb_get_hw_control(adapter);
 	igb_set_rx_mode(netdev);
+	igb_setup_tx_mode(adapter);
 
 	igb_restore_vlan(adapter);
 
@@ -2150,6 +2447,55 @@ igb_features_check(struct sk_buff *skb, struct net_device *dev,
 	return features;
 }
 
+static int igb_offload_cbs(struct igb_adapter *adapter,
+			   struct tc_cbs_qopt_offload *qopt)
+{
+	struct e1000_hw *hw = &adapter->hw;
+	int err;
+
+	/* CBS offloading is only supported by i210 controller. */
+	if (hw->mac.type != e1000_i210)
+		return -EOPNOTSUPP;
+
+	/* CBS offloading is only supported by queue 0 and queue 1. */
+	if (qopt->queue < 0 || qopt->queue > 1)
+		return -EINVAL;
+
+	err = igb_save_cbs_params(adapter, qopt->queue, qopt->enable,
+				  qopt->idleslope, qopt->sendslope,
+				  qopt->hicredit, qopt->locredit);
+	if (err)
+		return err;
+
+	if (is_fqtss_enabled(adapter)) {
+		igb_configure_cbs(adapter, qopt->queue, qopt->enable,
+				  qopt->idleslope, qopt->sendslope,
+				  qopt->hicredit, qopt->locredit);
+
+		if (!is_any_cbs_enabled(adapter))
+			enable_fqtss(adapter, false);
+
+	} else {
+		enable_fqtss(adapter, true);
+	}
+
+	return 0;
+}
+
+static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
+			void *type_data)
+{
+	struct igb_adapter *adapter = netdev_priv(dev);
+
+	switch (type) {
+	case TC_SETUP_CBS:
+		return igb_offload_cbs(adapter, type_data);
+
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 static const struct net_device_ops igb_netdev_ops = {
 	.ndo_open		= igb_open,
 	.ndo_stop		= igb_close,
@@ -2175,6 +2521,7 @@ static const struct net_device_ops igb_netdev_ops = {
 	.ndo_set_features	= igb_set_features,
 	.ndo_fdb_add		= igb_ndo_fdb_add,
 	.ndo_features_check	= igb_features_check,
+	.ndo_setup_tc		= igb_setup_tc,
 };
 
 /**
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [net-next 1/7] net/sched: Check for null dev_queue on create flow
  2017-10-26 17:17 ` [net-next 1/7] net/sched: Check for null dev_queue on create flow Jeff Kirsher
@ 2017-10-27 10:23   ` Davide Caratti
  2017-10-27 15:28     ` Jesus Sanchez-Palencia
  0 siblings, 1 reply; 11+ messages in thread
From: Davide Caratti @ 2017-10-27 10:23 UTC (permalink / raw)
  To: Jeff Kirsher, davem
  Cc: Jesus Sanchez-Palencia, netdev, nhorman, sassmann, jogreene,
	Ivan Vecera

On Thu, 2017-10-26 at 10:17 -0700, Jeff Kirsher wrote:
> From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
> 
> In qdisc_alloc() the dev_queue pointer was used without any checks
> being performed. If qdisc_create() gets a null dev_queue pointer, it
> just passes it along to qdisc_alloc(), leading to a crash. That
> happens if a root qdisc implements select_queue() and returns a null
> dev_queue pointer for an "invalid handle", for example, or if the
> dev_queue associated with the parent qdisc is null.
> 
> This patch is in preparation for the next in this series, where
> select_queue() is being added to mqprio and as it may return a null
> dev_queue.
> 
> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
> Tested-by: Henrik Austad <henrik@austad.us>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>  net/sched/sch_generic.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)

hello,

I didn't notice this when I posted https://www.spinics.net/lists/netdev/ms
g462986.html , about one hour later, targeting Dave's net tree. I saw
similar issues, but in my setup dev_queue was a valid pointer, and dev was
NULL. This made qdisc_alloc() dereference NULL, when accessing dev->
members in the function body before returning the newly allocated qdisc.

So, in my understanding both tests are necessary, but a (very trivial)
conflict will be generated when these two commits will be eventually
merged together.

I like this suggestion from Ivan:

-- >8 --
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index bf8c81e07c70..6bd1ae993326 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -603,8 +603,14 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
        struct Qdisc *sch;
        unsigned int size = QDISC_ALIGN(sizeof(*sch)) + ops->priv_size;
        int err = -ENOBUFS;
-       struct net_device *dev = dev_queue->dev;
+       struct net_device *dev;
+
+       if (!dev_queue || !dev_queue->dev) {
+               err = !dev_queue ? -EINVAL : -ENOENT;
+               goto errout;
+       }

+       dev = dev_queue->dev;
        p = kzalloc_node(size, GFP_KERNEL,
                         netdev_queue_numa_node_read(dev_queue));

-- 8< --

and I volunteer for sending a v2 of 'net/sched: fix NULL pointer
dereference in qdisc_alloc()' including this, targeting 'net' tree, with
an appropriate tag. WDYT?

regards,
-- 
davide

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26
  2017-10-26 17:17 [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26 Jeff Kirsher
                   ` (6 preceding siblings ...)
  2017-10-26 17:17 ` [net-next 7/7] igb: Add support for CBS offload Jeff Kirsher
@ 2017-10-27 15:15 ` David Miller
  7 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2017-10-27 15:15 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 26 Oct 2017 10:17:07 -0700

> This patchset is a proposal of how the Traffic Control subsystem can
> be used to offload the configuration of the Credit Based Shaper
> (defined in the IEEE 802.1Q-2014 Section 8.6.8.2) into supported
> network devices.

Jeff, please collapse patch #6 into patch #4, as #6 fixes a bug added
by #4.

This will also help make things bisect better.

Thank you.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [net-next 1/7] net/sched: Check for null dev_queue on create flow
  2017-10-27 10:23   ` Davide Caratti
@ 2017-10-27 15:28     ` Jesus Sanchez-Palencia
  0 siblings, 0 replies; 11+ messages in thread
From: Jesus Sanchez-Palencia @ 2017-10-27 15:28 UTC (permalink / raw)
  To: Davide Caratti, Jeff Kirsher, davem
  Cc: netdev, nhorman, sassmann, jogreene, Ivan Vecera

Hi Davide,


On 10/27/2017 03:23 AM, Davide Caratti wrote:
> On Thu, 2017-10-26 at 10:17 -0700, Jeff Kirsher wrote:
>> From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>>
>> In qdisc_alloc() the dev_queue pointer was used without any checks
>> being performed. If qdisc_create() gets a null dev_queue pointer, it
>> just passes it along to qdisc_alloc(), leading to a crash. That
>> happens if a root qdisc implements select_queue() and returns a null
>> dev_queue pointer for an "invalid handle", for example, or if the
>> dev_queue associated with the parent qdisc is null.
>>
>> This patch is in preparation for the next in this series, where
>> select_queue() is being added to mqprio and as it may return a null
>> dev_queue.
>>
>> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>> Tested-by: Henrik Austad <henrik@austad.us>
>> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> ---
>>  net/sched/sch_generic.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> hello,
> 
> I didn't notice this when I posted https://www.spinics.net/lists/netdev/ms
> g462986.html , about one hour later, targeting Dave's net tree. I saw
> similar issues, but in my setup dev_queue was a valid pointer, and dev was
> NULL. This made qdisc_alloc() dereference NULL, when accessing dev->
> members in the function body before returning the newly allocated qdisc.
> 
> So, in my understanding both tests are necessary, but a (very trivial)
> conflict will be generated when these two commits will be eventually
> merged together.
> 
> I like this suggestion from Ivan:
> 
> -- >8 --
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index bf8c81e07c70..6bd1ae993326 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -603,8 +603,14 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
>         struct Qdisc *sch;
>         unsigned int size = QDISC_ALIGN(sizeof(*sch)) + ops->priv_size;
>         int err = -ENOBUFS;
> -       struct net_device *dev = dev_queue->dev;
> +       struct net_device *dev;
> +
> +       if (!dev_queue || !dev_queue->dev) {
> +               err = !dev_queue ? -EINVAL : -ENOENT;
> +               goto errout;
> +       }
> 
> +       dev = dev_queue->dev;
>         p = kzalloc_node(size, GFP_KERNEL,
>                          netdev_queue_numa_node_read(dev_queue));
> 
> -- 8< --
> 
> and I volunteer for sending a v2 of 'net/sched: fix NULL pointer
> dereference in qdisc_alloc()' including this, targeting 'net' tree, with
> an appropriate tag. WDYT?


Thanks for reaching out!

The patches look independent to me, and in a way they tell different stories.
The conflict should be trivial, so I would just merge both separately and handle
it, but I'm fine with any approaches, really, so I'd go with whichever makes the
maintainers' lives easier.


Regards,
Jesus


> 
> regards,
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-10-27 15:35 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-26 17:17 [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26 Jeff Kirsher
2017-10-26 17:17 ` [net-next 1/7] net/sched: Check for null dev_queue on create flow Jeff Kirsher
2017-10-27 10:23   ` Davide Caratti
2017-10-27 15:28     ` Jesus Sanchez-Palencia
2017-10-26 17:17 ` [net-next 2/7] net/sched: Change behavior of mq select_queue() Jeff Kirsher
2017-10-26 17:17 ` [net-next 3/7] net/sched: Add select_queue() class_ops for mqprio Jeff Kirsher
2017-10-26 17:17 ` [net-next 4/7] net/sched: Introduce Credit Based Shaper (CBS) qdisc Jeff Kirsher
2017-10-26 17:17 ` [net-next 5/7] net/sched: Add support for HW offloading for CBS Jeff Kirsher
2017-10-26 17:17 ` [net-next 6/7] net/sched/sch_cbs: Fix compilation on 32bit architectures Jeff Kirsher
2017-10-26 17:17 ` [net-next 7/7] igb: Add support for CBS offload Jeff Kirsher
2017-10-27 15:15 ` [net-next 0/7][pull request] 1GbE Intel Wired LAN Driver Updates 2017-10-26 David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).