* [Intel-wired-lan] [next-queue PATCH v4 0/4] TSN: Add qdisc based config interface for CBS
@ 2017-10-04 0:28 ` Vinicius Costa Gomes
From: Vinicius Costa Gomes @ 2017-10-04 0:28 UTC
To: intel-wired-lan
Hi,
Changes since v3:
- None, only a clean patchset without old patches;
Changes since v2:
- squashed the patch introducing the userspace API into the patch
implementing CBS;
Changes since v1:
- Solved the mqprio dependency;
- Fixed a mqprio bug, that caused the inner qdisc to have a wrong
dev_queue associated with it;
Changes from the RFC:
- Fixed comments from Henrik Austad;
- Simplified the Qdisc, using the generic implementation of callbacks
where possible;
- Small refactor on the driver (igb) code;
This patchset is a proposal of how the Traffic Control subsystem can
be used to offload the configuration of the Credit Based Shaper
(defined in the IEEE 802.1Q-2014 Section 8.6.8.2) into supported
network devices.
As part of this work, we've assessed previous public discussions
related to TSN enabling: patches from Henrik Austad (Cisco), the
presentation from Eric Mann at Linux Plumbers 2012, patches from
Gangfeng Huang (National Instruments) and the current state of the
OpenAVNU project (https://github.com/AVnu/OpenAvnu/).
Overview
========
Time-Sensitive Networking (TSN) is a set of standards that aim to
provide bandwidth reservation and bounded latency on Ethernet-based
LANs. The proposal described here mainly covers what is needed to
enable the following standards: 802.1Qat and 802.1Qav.
The initial target of this work is the Intel i210 NIC, but the
datasheets of other controllers were also taken into account, such as
the Renesas RZ/A1H RZ/A1M group and the Synopsys DesignWare Ethernet
QoS controller.
Proposal
========
Feature-wise, what is covered here is the configuration interfaces for
HW implementations of the Credit-Based shaper (CBS, 802.1Qav). CBS is
a per-queue shaper. Given that this feature is related to traffic
shaping, and that the traffic control subsystem already provides a
queueing discipline that offloads config into the device driver (i.e.
mqprio), designing a new qdisc for the specific purpose of offloading
the config for the CBS shaper seemed like a good fit.
For steering traffic into the correct queues, we use the socket option
SO_PRIORITY and then a mechanism to map priority to traffic classes /
Tx queues. The qdisc mqprio is currently used in our tests.
As for the CBS config interface, this patchset is proposing a new
qdisc called 'cbs'. Its 'tc' cmd line is:
$ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
idleslope I
Note that the parameters for this qdisc are the ones defined by the
802.1Q-2014 spec, so no hardware specific functionality is exposed here.
Per-stream shaping, as defined by IEEE 802.1Q-2014 Section 34.6.1, is
not yet covered by this proposal.
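To make the four parameters a bit more concrete, here is a minimal
sketch of how the credit moves while one frame occupies the wire. It
assumes slopes are given in kbit/s and credits in bytes, which matches
the arithmetic of the attached calc_cbs_params.py but is not stated
explicitly above. The function name credit_delta_bytes is made up for
this illustration.

```python
def credit_delta_bytes(slope_kbps, frame_bytes, link_kbps):
    """Credit change, in bytes, over the time one frame occupies the wire.

    Assumption: slopes are in kbit/s and credits are in bytes.
    """
    # Frame time is frame_bytes * 8 / link_kbps (in ms). Multiplying by the
    # slope gives bits; dividing by 8 gives bytes, so the factor 8 cancels.
    return slope_kbps * frame_bytes / link_kbps


link = 1000000                  # 1 Gbit/s link, in kbit/s
idleslope = 20000               # 20 Mbit/s reserved for the queue
sendslope = idleslope - link    # same relation the attached script uses

# Credit drained while the queue transmits one 1500-byte frame
print(credit_delta_bytes(sendslope, 1500, link))
# Credit gained while waiting behind one 1500-byte interfering frame
print(credit_delta_bytes(idleslope, 1500, link))
```

With these inputs the two values come out as -1470 and 30, which are
the locredit and hicredit used for class A in the testing steps of
this letter.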
Testing this RFC
================
Attached to this cover letter are:
- calc_cbs_params.py: A Python script to calculate the
parameters to the CBS queueing discipline;
- tsn-talker.c: A sample C implementation of the talker side of a stream;
- tsn-listener.c: A sample C implementation of the listener side of a
stream;
To test the patches of this series, you may want to use the samples
attached to this cover letter and use the 'mqprio' qdisc to set up
the mapping from priorities to Tx queues, together with the 'cbs'
qdisc to configure the HW shaper of the i210 controller:
1) Set up the mapping from priorities to traffic classes to hardware queues
$ tc qdisc replace dev ens4 handle 100: parent root mqprio num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
See mqprio(8) for a more detailed explanation. In short, this command
maps traffic with priority 3 to hardware queue 0, traffic with
priority 2 to hardware queue 1, and all other traffic to hardware
queues 2 and 3.
2) Check the resulting scheme. You want to read the inner qdisc IDs from the bottom up
$ tc -g class show dev ens4
Ex.:
+---(100:3) mqprio
| +---(100:6) mqprio
| +---(100:7) mqprio
|
+---(100:2) mqprio
| +---(100:5) mqprio
|
+---(100:1) mqprio
+---(100:4) mqprio
* Here '100:4' is Tx Queue #0 and '100:5' is Tx Queue #1.
3) Calculate CBS parameters for classes A and B, e.g. with a bandwidth of
20 Mbps for A and 10 Mbps for B:
$ calc_cbs_params.py -A 20000 -a 1500 -B 10000 -b 1500
4) Configure CBS for traffic class A (priority 3) as provided by the script:
$ tc qdisc replace dev ens4 parent 100:4 cbs locredit -1470 \
hicredit 30 sendslope -980000 idleslope 20000
5) Configure CBS for traffic class B (priority 2):
$ tc qdisc replace dev ens4 parent 100:5 cbs \
locredit -1485 hicredit 31 sendslope -990000 idleslope 10000
6) Run Listener:
$ ./tsn-listener -d 01:AA:AA:AA:AA:AA -i ens4 -s 1500
7) Run Talker for class A (prio 3 here), compiled from samples/tsn/talker.c
$ ./tsn-talker -d 01:AA:AA:AA:AA:AA -i ens4 -p 3 -s 1500
* The bandwidth displayed on the listener output at this stage should be very
close to the one configured for class A.
8) You can also run a Talker for class B (prio 2 here and using a
different address):
$ ./tsn-talker -d 01:BB:BB:BB:BB:BB -i ens4 -p 2 -s 1500
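The mapping configured in steps 1 and 2 can be sketched as follows.
This is only an illustration: decode_mqprio and queue_class_minor are
hypothetical helper names, and the class-ID rule is inferred from the
example listing in step 2 rather than taken from mqprio documentation.

```python
def decode_mqprio(prio_map, queues, num_tc):
    """Return {priority: [hardware queue indexes]} for an mqprio setup.

    prio_map: traffic-class number for each priority (index = priority)
    queues:   one "count@offset" string per traffic class
    """
    tc_queues = []
    for spec in queues:
        count, offset = (int(x) for x in spec.split("@"))
        tc_queues.append(list(range(offset, offset + count)))
    if len(tc_queues) != num_tc:
        raise ValueError("need one queue spec per traffic class")
    return {prio: tc_queues[tc] for prio, tc in enumerate(prio_map)}


def queue_class_minor(queue, num_tc):
    # Assumption inferred from the listing in step 2: with num_tc 3,
    # Tx queue 0 shows up as 100:4 and Tx queue 1 as 100:5.
    return num_tc + queue + 1


mapping = decode_mqprio([2, 2, 1, 0] + [2] * 12, ["1@0", "1@1", "2@2"], 3)
for prio in (3, 2, 0):
    minors = [queue_class_minor(q, 3) for q in mapping[prio]]
    print("priority", prio, "-> queues", mapping[prio], "-> 100:", minors)
```

Priority 3 ends up on queue 0 (class 100:4) and priority 2 on queue 1
(class 100:5), which is why the 'cbs' qdiscs in steps 4 and 5 are
attached to those two parents.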
Authors
=======
- Andre Guedes <andre.guedes@intel.com>
- Ivan Briano <ivan.briano@intel.com>
- Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
- Vinicius Gomes <vinicius.gomes@intel.com>
Andre Guedes (1):
igb: Add support for CBS offload
Jesus Sanchez-Palencia (2):
mqprio: Implement select_queue class_ops
net/sched: Fix accessing invalid dev_queue
Vinicius Costa Gomes (1):
net/sched: Introduce Credit Based Shaper (CBS) qdisc
drivers/net/ethernet/intel/igb/e1000_defines.h | 23 ++
drivers/net/ethernet/intel/igb/e1000_regs.h | 8 +
drivers/net/ethernet/intel/igb/igb.h | 6 +
drivers/net/ethernet/intel/igb/igb_main.c | 347 +++++++++++++++++++++++++
include/linux/netdevice.h | 1 +
include/net/pkt_sched.h | 9 +
include/uapi/linux/pkt_sched.h | 17 ++
net/sched/Kconfig | 11 +
net/sched/Makefile | 1 +
net/sched/sch_cbs.c | 225 ++++++++++++++++
net/sched/sch_generic.c | 8 +-
net/sched/sch_mqprio.c | 7 +
12 files changed, 662 insertions(+), 1 deletion(-)
create mode 100644 net/sched/sch_cbs.c
Annex: Sample files
===================
calc_cbs_params.py
--8<---------------cut here---------------start------------->8---
#!/usr/bin/env python
#
# Copyright (c) 2017, Intel Corporation
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Intel Corporation nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import argparse
import math


def print_cbs_params_for_class_a(args):
    idleslope = args.idleslope_a
    sendslope = idleslope - args.link_speed

    # According to 802.1Q-2014 spec, Annex L, hiCredit and
    # loCredit for SR class A are calculated following the
    # equations L-10 and L-12, respectively.
    hicredit = math.ceil(idleslope * args.frame_non_sr / args.link_speed)
    locredit = math.ceil(sendslope * args.frame_a / args.link_speed)

    print("tc qdisc add dev <IFNAME> parent <QDISC-ID> cbs idleslope %d sendslope %d hicredit %d locredit %d" % \
          (idleslope, sendslope, hicredit, locredit))


def print_cbs_params_for_class_b(args):
    idleslope = args.idleslope_b
    sendslope = idleslope - args.link_speed

    # Annex L doesn't present a straightforward equation to
    # calculate hiCredit for Class B so we have to derive it
    # based on generic equations presented in that Annex.
    #
    # L-3 is the primary equation to calculate hiCredit. Section
    # L.2 states that the 'maxInterferenceSize' for SR class B
    # is the maximum burst size for SR class A plus the
    # maxInterferenceSize from SR class A (which is equal to the
    # maximum frame from non-SR traffic).
    #
    # The maximum burst size for SR class A equation is shown in
    # L-16. Merging L-16 into L-3 we get the resulting equation
    # which calculates hiCredit B (refer to section L.3 in case
    # you're not familiar with the legend):
    #
    #                    (     Mo        Ma  )
    #  hiCredit B = Rb * ( --------- + ----- )
    #                    (  Ro - Ra      Ro  )
    #
    hicredit = math.ceil(idleslope * \
            ((args.frame_non_sr / (args.link_speed - args.idleslope_a)) + \
            (args.frame_a / args.link_speed)))

    # loCredit B is calculated following equation L-2.
    locredit = math.ceil(sendslope * args.frame_b / args.link_speed)

    print("tc qdisc add dev <IFNAME> parent <QDISC-ID> cbs idleslope %d sendslope %d hicredit %d locredit %d" % \
          (idleslope, sendslope, hicredit, locredit))


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-S', dest='link_speed', default=1000000.0, type=float,
                        help='Link speed in kbps')
    parser.add_argument('-s', dest='frame_non_sr', default=1500.0, type=float,
                        help='Maximum frame size from non-SR traffic (MTU size '
                        'usually)')
    parser.add_argument('-A', dest='idleslope_a', default=0, type=float,
                        help='Idleslope for SR class A in kbps')
    parser.add_argument('-a', dest='frame_a', default=0, type=float,
                        help='Maximum frame size for SR class A traffic')
    parser.add_argument('-B', dest='idleslope_b', default=0, type=float,
                        help='Idleslope for SR class B in kbps')
    parser.add_argument('-b', dest='frame_b', default=0, type=float,
                        help='Maximum frame size for SR class B traffic')

    args = parser.parse_args()

    if args.idleslope_a > 0:
        print_cbs_params_for_class_a(args)

    if args.idleslope_b > 0:
        print_cbs_params_for_class_b(args)


if __name__ == "__main__":
    main()
--8<---------------cut here---------------end--------------->8---
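As a quick cross-check of the class B derivation in the script, the
sketch below evaluates the same hiCredit expression on its own for
the values used in the testing steps. The function name
hicredit_class_b is hypothetical; the formula and the inputs are the
ones from the script and from step 3.

```python
import math


def hicredit_class_b(idleslope_b, idleslope_a, frame_non_sr, frame_a, link_speed):
    # hiCredit B = Rb * (Mo / (Ro - Ra) + Ma / Ro), as derived in the script
    return math.ceil(idleslope_b *
                     (frame_non_sr / (link_speed - idleslope_a) +
                      frame_a / link_speed))


# Class B at 10 Mbps sharing a 1 Gbps link with class A at 20 Mbps,
# with 1500-byte frames everywhere (the values from step 3).
print(hicredit_class_b(10000.0, 20000.0, 1500.0, 1500.0, 1000000.0))
```

This gives 31, the hicredit used in step 5. Note that the class B
result depends on the idleslope and frame size of class A, so -A and
-a have to be passed to the script even when only the class B line is
of interest.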
tsn-talker.c
--8<---------------cut here---------------start------------->8---
/*
* Copyright (c) 2017, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <alloca.h>
#include <argp.h>
#include <arpa/inet.h>
#include <inttypes.h>
#include <linux/if.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAGIC 0xCC

static char ifname[IFNAMSIZ];
static uint8_t macaddr[ETH_ALEN];
static int priority = -1;
static size_t size = 1500;
static uint64_t seq;
static int delay = -1;

static struct argp_option options[] = {
	{"dst-addr", 'd', "MACADDR", 0, "Stream Destination MAC address" },
	{"delay", 'D', "NUM", 0, "Delay (in us) between packet transmission" },
	{"ifname", 'i', "IFNAME", 0, "Network Interface" },
	{"prio", 'p', "NUM", 0, "SO_PRIORITY to be set in socket" },
	{"packet-size", 's', "NUM", 0, "Size of packets to be transmitted" },
	{ 0 }
};

static error_t parser(int key, char *arg, struct argp_state *state)
{
	int res;

	switch (key) {
	case 'd':
		res = sscanf(arg, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
			     &macaddr[0], &macaddr[1], &macaddr[2],
			     &macaddr[3], &macaddr[4], &macaddr[5]);
		if (res != 6) {
			printf("Invalid address\n");
			exit(EXIT_FAILURE);
		}
		break;
	case 'D':
		delay = atoi(arg);
		break;
	case 'i':
		strncpy(ifname, arg, sizeof(ifname) - 1);
		break;
	case 'p':
		priority = atoi(arg);
		break;
	case 's':
		size = atoi(arg);
		break;
	}

	return 0;
}

static struct argp argp = { options, parser };

int main(int argc, char *argv[])
{
	int fd, res;
	struct ifreq req;
	uint8_t *data;
	struct sockaddr_ll sk_addr = {
		.sll_family = AF_PACKET,
		.sll_protocol = htons(ETH_P_TSN),
		.sll_halen = ETH_ALEN,
	};

	argp_parse(&argp, argc, argv, 0, NULL, NULL);

	/* The payload starts with a 64-bit sequence number. */
	if (size < sizeof(uint64_t)) {
		printf("Packet size must be at least %zu bytes\n",
		       sizeof(uint64_t));
		return 1;
	}

	fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_TSN));
	if (fd < 0) {
		perror("Couldn't open socket");
		return 1;
	}

	strncpy(req.ifr_name, ifname, sizeof(req.ifr_name));
	res = ioctl(fd, SIOCGIFINDEX, &req);
	if (res < 0) {
		perror("Couldn't get interface index");
		goto err;
	}

	sk_addr.sll_ifindex = req.ifr_ifindex;
	memcpy(&sk_addr.sll_addr, macaddr, ETH_ALEN);

	/* SO_PRIORITY is what mqprio uses to pick the traffic class. */
	if (priority != -1) {
		res = setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &priority,
				 sizeof(priority));
		if (res < 0) {
			perror("Couldn't set priority");
			goto err;
		}
	}

	data = alloca(size);
	memset(data, MAGIC, size);

	printf("Sending packets...\n");

	while (1) {
		uint64_t *seq_ptr = (uint64_t *) &data[0];
		ssize_t n;

		*seq_ptr = seq++;

		n = sendto(fd, data, size, 0, (struct sockaddr *) &sk_addr,
			   sizeof(sk_addr));
		if (n < 0)
			perror("Failed to send data");

		if (delay > 0)
			usleep(delay);
	}

	close(fd);
	return 0;

err:
	close(fd);
	return 1;
}
--8<---------------cut here---------------end--------------->8---
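For readers who want to inspect the talker's frames from a script,
the payload layout it produces can be sketched as below: a 64-bit
sequence number in host byte order at offset 0, followed by 0xCC
fill bytes. The helper names build_payload and read_seq are made up
for this illustration and are not part of the attached samples.

```python
import struct

MAGIC = 0xCC  # same fill byte as tsn-talker.c


def build_payload(seq, size):
    """Build a payload laid out the way tsn-talker.c fills its buffer."""
    if size < 8:
        raise ValueError("payload must hold the 8-byte sequence number")
    # "=Q" is an unsigned 64-bit integer in native byte order, no padding
    return struct.pack("=Q", seq) + bytes([MAGIC]) * (size - 8)


def read_seq(payload):
    """Extract the sequence number from the start of a payload."""
    return struct.unpack_from("=Q", payload)[0]


frame = build_payload(5, 1500)
print(len(frame), read_seq(frame), hex(frame[8]))
```

Since the sequence number is written in host byte order, talker and
listener are expected to run on machines with the same endianness.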
tsn-listener.c
--8<---------------cut here---------------start------------->8---
/*
* Copyright (c) 2017, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <alloca.h>
#include <argp.h>
#include <arpa/inet.h>
#include <inttypes.h>
#include <linux/if.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <poll.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/timerfd.h>
#include <unistd.h>

static char ifname[IFNAMSIZ];
static uint8_t macaddr[ETH_ALEN];
static uint64_t data_count;
static int size = 1500;
static time_t interval = 1;
static bool check_seq = false;
static uint64_t expected_seq;

static struct argp_option options[] = {
	{"check-seq", 'c', NULL, 0, "Check sequence number within packet" },
	{"dst-addr", 'd', "MACADDR", 0, "Stream Destination MAC address" },
	{"ifname", 'i', "IFNAME", 0, "Network Interface" },
	{"interval", 'I', "SEC", 0, "Interval between bandwidth reports" },
	{"packet-size", 's', "NUM", 0, "Expected packet size" },
	{ 0 }
};

static error_t parser(int key, char *arg, struct argp_state *state)
{
	int res;

	switch (key) {
	case 'c':
		check_seq = true;
		break;
	case 'd':
		res = sscanf(arg, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
			     &macaddr[0], &macaddr[1], &macaddr[2],
			     &macaddr[3], &macaddr[4], &macaddr[5]);
		if (res != 6) {
			printf("Invalid address\n");
			exit(EXIT_FAILURE);
		}
		break;
	case 'i':
		strncpy(ifname, arg, sizeof(ifname) - 1);
		break;
	case 'I':
		interval = atoi(arg);
		break;
	case 's':
		size = atoi(arg);
		break;
	}

	return 0;
}

static struct argp argp = { options, parser };

static int setup_timer(void)
{
	int fd, res;
	struct itimerspec tspec = { 0 };

	fd = timerfd_create(CLOCK_MONOTONIC, 0);
	if (fd < 0) {
		perror("Couldn't create timer");
		return -1;
	}

	tspec.it_value.tv_sec = interval;
	tspec.it_interval.tv_sec = interval;

	res = timerfd_settime(fd, 0, &tspec, NULL);
	if (res < 0) {
		perror("Couldn't set timer");
		close(fd);
		return -1;
	}

	return fd;
}

static int setup_socket(void)
{
	int fd, res;
	struct sockaddr_ll sk_addr = {
		.sll_family = AF_PACKET,
		.sll_protocol = htons(ETH_P_TSN),
	};

	fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_TSN));
	if (fd < 0) {
		perror("Couldn't open socket");
		return -1;
	}

	/* If user provided a network interface, bind() to it. */
	if (ifname[0] != '\0') {
		struct ifreq req;

		strncpy(req.ifr_name, ifname, sizeof(req.ifr_name));
		res = ioctl(fd, SIOCGIFINDEX, &req);
		if (res < 0) {
			perror("Couldn't get interface index");
			goto err;
		}

		sk_addr.sll_ifindex = req.ifr_ifindex;

		res = bind(fd, (struct sockaddr *) &sk_addr, sizeof(sk_addr));
		if (res < 0) {
			perror("Couldn't bind() to interface");
			goto err;
		}
	}

	/* If user provided the stream destination address, set it as multicast
	 * address.
	 */
	if (macaddr[0] != '\0') {
		struct packet_mreq mreq;

		mreq.mr_ifindex = sk_addr.sll_ifindex;
		mreq.mr_type = PACKET_MR_MULTICAST;
		mreq.mr_alen = ETH_ALEN;
		memcpy(&mreq.mr_address, macaddr, ETH_ALEN);

		res = setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
				 &mreq, sizeof(struct packet_mreq));
		if (res < 0) {
			perror("Couldn't set PACKET_ADD_MEMBERSHIP");
			goto err;
		}
	}

	return fd;

err:
	close(fd);
	return -1;
}

static void recv_packet(int fd)
{
	uint8_t *data = alloca(size);
	ssize_t n = recv(fd, data, size, 0);

	if (n < 0) {
		perror("Failed to receive data");
		return;
	}

	if (n != size)
		printf("Size mismatch: expected %d, got %zd\n", size, n);

	if (check_seq) {
		uint64_t *seq = (uint64_t *) &data[0];

		/* If 'expected_seq' is equal to zero, it means this is the
		 * first packet we received so we don't know what sequence
		 * number to expect.
		 */
		if (expected_seq == 0)
			expected_seq = *seq;

		if (*seq != expected_seq) {
			printf("Sequence mismatch: expected %" PRIu64
			       ", got %" PRIu64 "\n", expected_seq, *seq);
			expected_seq = *seq;
		}

		expected_seq++;
	}

	data_count += n;
}

static void report_bw(int fd)
{
	uint64_t expirations;
	ssize_t n = read(fd, &expirations, sizeof(uint64_t));

	if (n < 0) {
		perror("Couldn't read timerfd");
		return;
	}

	if (expirations != 1)
		printf("Something went wrong with timerfd\n");

	/* Bytes received during the interval, reported in kbps. */
	printf("Receiving data rate: %" PRIu64 " kbps\n",
	       (uint64_t) ((data_count * 8) / (1000 * interval)));

	data_count = 0;
}

int main(int argc, char *argv[])
{
	int sk_fd, timer_fd, res;
	struct pollfd fds[2];

	argp_parse(&argp, argc, argv, 0, NULL, NULL);

	sk_fd = setup_socket();
	if (sk_fd < 0)
		return 1;

	timer_fd = setup_timer();
	if (timer_fd < 0) {
		close(sk_fd);
		return 1;
	}

	fds[0].fd = sk_fd;
	fds[0].events = POLLIN;
	fds[1].fd = timer_fd;
	fds[1].events = POLLIN;

	printf("Waiting for packets...\n");

	while (1) {
		res = poll(fds, 2, -1);
		if (res < 0) {
			perror("Error on poll()");
			goto err;
		}

		if (fds[0].revents & POLLIN)
			recv_packet(fds[0].fd);

		if (fds[1].revents & POLLIN)
			report_bw(fds[1].fd);
	}

	close(timer_fd);
	close(sk_fd);
	return 0;

err:
	close(timer_fd);
	close(sk_fd);
	return 1;
}
--8<---------------cut here---------------end--------------->8---
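The two checks the listener performs, the bandwidth report and the
sequence tracking, can be sketched in a few lines. This mirrors the
logic of report_bw() and recv_packet() above under the same
assumptions (byte counts per interval, sequence number at the start
of each payload). The names rate_kbps and SeqChecker are hypothetical.

```python
def rate_kbps(byte_count, interval_s):
    """Data rate in kbps for byte_count bytes seen in interval_s seconds."""
    return (byte_count * 8) // (1000 * interval_s)


class SeqChecker:
    """Tracks the next expected sequence number, as the listener does."""

    def __init__(self):
        self.expected = 0
        self.mismatches = 0

    def check(self, seq):
        # An expected value of zero means no packet has been seen yet,
        # so the first sequence number is accepted as-is.
        if self.expected == 0:
            self.expected = seq
        ok = (seq == self.expected)
        if not ok:
            self.mismatches += 1
            self.expected = seq   # resynchronise on the received value
        self.expected += 1
        return ok


checker = SeqChecker()
results = [checker.check(s) for s in (10, 11, 13, 14)]
print(results, checker.mismatches)
print(rate_kbps(2500000, 1))
```

Receiving 2,500,000 bytes in a one-second interval is reported as
20000 kbps, the rate configured for class A in step 4. The skipped
sequence number 12 is counted as a single mismatch.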
--
2.14.2
^ permalink raw reply [flat|nested] 28+ messages in thread* [next-queue PATCH v4 0/4] TSN: Add qdisc based config interface for CBS @ 2017-10-04 0:28 ` Vinicius Costa Gomes 0 siblings, 0 replies; 28+ messages in thread From: Vinicius Costa Gomes @ 2017-10-04 0:28 UTC (permalink / raw) To: netdev, intel-wired-lan Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes, ivan.briano, jesus.sanchez-palencia, boon.leong.ong, richardcochran, henrik, levipearson, rodney.cummings Hi, Changes since v3: - None, only a clean patchset without old patches; Changes since v2: - squashed the patch introducing the userspace API into the patch implementing CBS; Changes since v1: - Solved the mqprio dependency; - Fixed a mqprio bug, that caused the inner qdisc to have a wrong dev_queue associated with it; Changes from the RFC: - Fixed comments from Henrik Austad; - Simplified the Qdisc, using the generic implementation of callbacks where possible; - Small refactor on the driver (igb) code; This patchset is a proposal of how the Traffic Control subsystem can be used to offload the configuration of the Credit Based Shaper (defined in the IEEE 802.1Q-2014 Section 8.6.8.2) into supported network devices. As part of this work, we've assessed previous public discussions related to TSN enabling: patches from Henrik Austad (Cisco), the presentation from Eric Mann at Linux Plumbers 2012, patches from Gangfeng Huang (National Instruments) and the current state of the OpenAVNU project (https://github.com/AVnu/OpenAvnu/). Overview ======== Time-sensitive Networking (TSN) is a set of standards that aim to address resources availability for providing bandwidth reservation and bounded latency on Ethernet based LANs. The proposal described here aims to cover mainly what is needed to enable the following standards: 802.1Qat and 802.1Qav. 
The initial target of this work is the Intel i210 NIC, but other controllers' datasheet were also taken into account, like the Renesas RZ/A1H RZ/A1M group and the Synopsis DesignWare Ethernet QoS controller. Proposal ======== Feature-wise, what is covered here is the configuration interfaces for HW implementations of the Credit-Based shaper (CBS, 802.1Qav). CBS is a per-queue shaper. Given that this feature is related to traffic shaping, and that the traffic control subsystem already provides a queueing discipline that offloads config into the device driver (i.e. mqprio), designing a new qdisc for the specific purpose of offloading the config for the CBS shaper seemed like a good fit. For steering traffic into the correct queues, we use the socket option SO_PRIORITY and then a mechanism to map priority to traffic classes / Tx queues. The qdisc mqprio is currently used in our tests. As for the CBS config interface, this patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line is: $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \ idleslope I Note that the parameters for this qdisc are the ones defined by the 802.1Q-2014 spec, so no hardware specific functionality is exposed here. Per-stream shaping, as defined by IEEE 802.1Q-2014 Section 34.6.1, is not yet covered by this proposal. 
Testing this RFC ================ Attached to this cover letter are: - calculate_cbs_params.py: A Python script to calculate the parameters to the CBS queueing discipline; - tsn-talker.c: A sample C implementation of the talker side of a stream; - tsn-listener.c: A sample C implementation of the listener side of a stream; For testing the patches of this series, you may want to use the attached samples to this cover letter and use the 'mqprio' qdisc to setup the priorities to Tx queues mapping, together with the 'cbs' qdisc to configure the HW shaper of the i210 controller: 1) Setup priorities to traffic classes to hardware queues mapping $ tc qdisc replace dev ens4 handle 100: parent root mqprio num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0 For a more detailed explanation, see mqprio(8), in short, this command will map traffic with priority 3 to the hardware queue 0, traffic with priority 2 to hardware queue 1, and the rest will be mapped to hardware queues 2 and 3. 2) Check scheme. You want to get the inner qdiscs ID from the bottom up $ tc -g class show dev ens4 Ex.: +---(100:3) mqprio | +---(100:6) mqprio | +---(100:7) mqprio | +---(100:2) mqprio | +---(100:5) mqprio | +---(100:1) mqprio +---(100:4) mqprio * Here '100:4' is Tx Queue #0 and '100:5' is Tx Queue #1. 3) Calculate CBS parameters for classes A and B. i.e. 
BW for A is 20Mbps and for B is 10Mbps: $ calc_cbs_params.py -A 20000 -a 1500 -B 10000 -b 1500 4) Configure CBS for traffic class A (priority 3) as provided by the script: $ tc qdisc replace dev ens4 parent 100:4 cbs locredit -1470 \ hicredit 30 sendslope -980000 idleslope 20000 5) Configure CBS for traffic class B (priority 2): $ tc qdisc replace dev ens4 parent 100:5 cbs \ locredit -1485 hicredit 31 sendslope -990000 idleslope 10000 6) Run Listener: $ ./tsn-listener -d 01:AA:AA:AA:AA:AA -i ens4 -s 1500 7) Run Talker for class A (prio 3 here), compiled from samples/tsn/talker.c $ ./tsn-talker -d 01:AA:AA:AA:AA:AA -i ens4 -p 3 -s 1500 * The bandwidth displayed on the listener output at this stage should be very close to the one configured for class A. 8) You can also run a Talker for class B (prio 2 here and using a different address): $ ./tsn-talker -d 01:BB:BB:BB:BB:BB -i ens4 -s 1500 Authors ======= - Andre Guedes <andre.guedes@intel.com> - Ivan Briano <ivan.briano@intel.com> - Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> - Vinicius Gomes <vinicius.gomes@intel.com> Andre Guedes (1): igb: Add support for CBS offload Jesus Sanchez-Palencia (2): mqprio: Implement select_queue class_ops net/sched: Fix accessing invalid dev_queue Vinicius Costa Gomes (1): net/sched: Introduce Credit Based Shaper (CBS) qdisc drivers/net/ethernet/intel/igb/e1000_defines.h | 23 ++ drivers/net/ethernet/intel/igb/e1000_regs.h | 8 + drivers/net/ethernet/intel/igb/igb.h | 6 + drivers/net/ethernet/intel/igb/igb_main.c | 347 +++++++++++++++++++++++++ include/linux/netdevice.h | 1 + include/net/pkt_sched.h | 9 + include/uapi/linux/pkt_sched.h | 17 ++ net/sched/Kconfig | 11 + net/sched/Makefile | 1 + net/sched/sch_cbs.c | 225 ++++++++++++++++ net/sched/sch_generic.c | 8 +- net/sched/sch_mqprio.c | 7 + 12 files changed, 662 insertions(+), 1 deletion(-) create mode 100644 net/sched/sch_cbs.c Annex: Sample files =================== calc_cbs_params.py --8<---------------cut 
here---------------start------------->8--- #!/usr/bin/env python # # Copyright (c) 2017, Intel Corporation # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are met: # # * Redistributions of source code must retain the above copyright notice, # this list of conditions and the following disclaimer. # * Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # * Neither the name of Intel Corporation nor the names of its contributors # may be used to endorse or promote products derived from this software # without specific prior written permission. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. import argparse import math def print_cbs_params_for_class_a(args): idleslope = args.idleslope_a sendslope = idleslope - args.link_speed # According to 802.1Q-2014 spec, Annex L, hiCredit and # loCredit for SR class A are calculated following the # equations L-10 and L-12, respectively. 
hicredit = math.ceil(idleslope * args.frame_non_sr / args.link_speed) locredit = math.ceil(sendslope * args.frame_a / args.link_speed) print("tc qdisc add dev <IFNAME> parent <QDISC-ID> cbs idleslope %d sendslope %d hicredit %d locredit %d" % \ (idleslope, sendslope, hicredit, locredit)) def print_cbs_params_for_class_b(args): idleslope = args.idleslope_b sendslope = idleslope - args.link_speed # Annex L doesn't present a straightforward equation to # calculate hiCredit for Class B so we have to derive it # based on generic equations presented in that Annex. # # L-3 is the primary equation to calculate hiCredit. Section # L.2 states that the 'maxInterferenceSize' for SR class B # is the maximum burst size for SR class A plus the # maxInterferenceSize from SR class A (which is equal to the # maximum frame from non-SR traffic). # # The maximum burst size for SR class A equation is shown in # L-16. Merging L-16 into L-3 we get the resulting equation # which calculates hiCredit B (refer to section L.3 in case # you're not familiar with the legend): # # hiCredit B = Rb * ( Mo Ma ) # ---------- + ------ # Ro - Ra Ro # hicredit = math.ceil(idleslope * \ ((args.frame_non_sr / (args.link_speed - args.idleslope_a)) + \ (args.frame_a / args.link_speed))) # loCredit B is calculated following equation L-2. 
locredit = math.ceil(sendslope * args.frame_b / args.link_speed) print("tc qdisc add dev <IFNAME> parent <QDISC-ID> cbs idleslope %d sendslope %d hicredit %d locredit %d" % \ (idleslope, sendslope, hicredit, locredit)) def main(): parser = argparse.ArgumentParser() parser.add_argument('-S', dest='link_speed', default=1000000.0, type=float, help='Link speed in kbps') parser.add_argument('-s', dest='frame_non_sr', default=1500.0, type=float, help='Maximum frame size from non-SR traffic (MTU size' 'usually') parser.add_argument('-A', dest='idleslope_a', default=0, type=float, help='Idleslope for SR class A in kbps') parser.add_argument('-a', dest='frame_a', default=0, type=float, help='Maximum frame size for SR class A traffic') parser.add_argument('-B', dest='idleslope_b', default=0, type=float, help='Idleslope for SR class B in kbps') parser.add_argument('-b', dest='frame_b', default=0, type=float, help='Maximum frame size for SR class B traffic') args = parser.parse_args() if args.idleslope_a > 0: print_cbs_params_for_class_a(args) if args.idleslope_b > 0: print_cbs_params_for_class_b(args) if __name__ == "__main__": main() --8<---------------cut here---------------end--------------->8--- tsn-talker.c --8<---------------cut here---------------start------------->8--- /* * Copyright (c) 2017, Intel Corporation * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
 *     * Neither the name of Intel Corporation nor the names of its contributors
 *       may be used to endorse or promote products derived from this software
 *       without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
 * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
 * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
 * OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include <alloca.h>
#include <argp.h>
#include <arpa/inet.h>
#include <inttypes.h>
#include <linux/if.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MAGIC 0xCC

/* Interface name is a C string, so keep it as char to match strncpy(). */
static char ifname[IFNAMSIZ];
static uint8_t macaddr[ETH_ALEN];
static int priority = -1;
static size_t size = 1500;
static uint64_t seq;
static int delay = -1;

static struct argp_option options[] = {
	{"dst-addr", 'd', "MACADDR", 0, "Stream Destination MAC address" },
	{"delay", 'D', "NUM", 0, "Delay (in us) between packet transmission" },
	{"ifname", 'i', "IFNAME", 0, "Network Interface" },
	{"prio", 'p', "NUM", 0, "SO_PRIORITY to be set in socket" },
	{"packet-size", 's', "NUM", 0, "Size of packets to be transmitted" },
	{ 0 }
};

static error_t parser(int key, char *arg, struct argp_state *state)
{
	int res;

	switch (key) {
	case 'd':
		res = sscanf(arg, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
			     &macaddr[0], &macaddr[1], &macaddr[2],
			     &macaddr[3], &macaddr[4], &macaddr[5]);
		if (res != 6) {
			printf("Invalid address\n");
			exit(EXIT_FAILURE);
		}

		break;
	case 'D':
		delay = atoi(arg);
		break;
	case 'i':
		strncpy(ifname, arg, sizeof(ifname) - 1);
		break;
	case 'p':
		priority = atoi(arg);
		break;
	case 's':
		size = atoi(arg);
		break;
	}

	return 0;
}

static struct argp argp = { options, parser };

int main(int argc, char *argv[])
{
	int fd, res;
	struct ifreq req;
	uint8_t *data;
	struct sockaddr_ll sk_addr = {
		.sll_family = AF_PACKET,
		.sll_protocol = htons(ETH_P_TSN),
		.sll_halen = ETH_ALEN,
	};

	argp_parse(&argp, argc, argv, 0, NULL, NULL);

	/* The sequence number occupies the first 8 bytes of the payload. */
	if (size < sizeof(uint64_t)) {
		fprintf(stderr, "Packet size must be at least %zu bytes\n",
			sizeof(uint64_t));
		return 1;
	}

	fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_TSN));
	if (fd < 0) {
		perror("Couldn't open socket");
		return 1;
	}

	strncpy(req.ifr_name, ifname, sizeof(req.ifr_name));
	res = ioctl(fd, SIOCGIFINDEX, &req);
	if (res < 0) {
		perror("Couldn't get interface index");
		goto err;
	}

	sk_addr.sll_ifindex = req.ifr_ifindex;
	memcpy(&sk_addr.sll_addr, macaddr, ETH_ALEN);

	if (priority != -1) {
		res = setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &priority,
				 sizeof(priority));
		if (res < 0) {
			perror("Couldn't set priority");
			goto err;
		}
	}

	data = alloca(size);
	memset(data, MAGIC, size);

	printf("Sending packets...\n");

	while (1) {
		uint64_t *seq_ptr = (uint64_t *) &data[0];
		ssize_t n;

		*seq_ptr = seq++;

		n = sendto(fd, data, size, 0, (struct sockaddr *) &sk_addr,
			   sizeof(sk_addr));
		if (n < 0)
			perror("Failed to send data");

		if (delay > 0)
			usleep(delay);
	}

	close(fd);
	return 0;

err:
	close(fd);
	return 1;
}
--8<---------------cut here---------------end--------------->8---

tsn-listener.c

--8<---------------cut here---------------start------------->8---
/*
 * Copyright (c) 2017, Intel Corporation
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 *     * Redistributions of source code must retain the above copyright notice,
 *       this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *     * Neither the name of Intel Corporation nor the names of its contributors
 *       may be used to endorse or promote products derived from this software
 *       without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
 * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
 * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
 * OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include <alloca.h>
#include <argp.h>
#include <arpa/inet.h>
#include <inttypes.h>
#include <linux/if.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <poll.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/timerfd.h>
#include <unistd.h>

/* Interface name is a C string, so keep it as char to match strncpy(). */
static char ifname[IFNAMSIZ];
static uint8_t macaddr[ETH_ALEN];
static uint64_t data_count;
static int size = 1500;
static time_t interval = 1;
static bool check_seq = false;
static uint64_t expected_seq;

static struct argp_option options[] = {
	{"check-seq", 'c', NULL, 0, "Check sequence number within packet" },
	{"dst-addr", 'd', "MACADDR", 0, "Stream Destination MAC address" },
	{"ifname", 'i', "IFNAME", 0, "Network Interface" },
	{"interval", 'I', "SEC", 0, "Interval between bandwidth reports" },
	{"packet-size", 's', "NUM", 0, "Expected packet size" },
	{ 0 }
};

static error_t parser(int key, char *arg, struct argp_state *state)
{
	int res;

	switch (key) {
	case 'c':
		check_seq = true;
		break;
	case 'd':
		res = sscanf(arg, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
			     &macaddr[0], &macaddr[1], &macaddr[2],
			     &macaddr[3], &macaddr[4], &macaddr[5]);
		if (res != 6) {
			printf("Invalid address\n");
			exit(EXIT_FAILURE);
		}

		break;
	case 'i':
		strncpy(ifname, arg, sizeof(ifname) - 1);
		break;
	case 'I':
		interval = atoi(arg);
		break;
	case 's':
		size = atoi(arg);
		break;
	}

	return 0;
}

static struct argp argp = { options, parser };

static int setup_timer(void)
{
	int fd, res;
	struct itimerspec tspec = { 0 };

	fd = timerfd_create(CLOCK_MONOTONIC, 0);
	if (fd < 0) {
		perror("Couldn't create timer");
		return -1;
	}

	tspec.it_value.tv_sec = interval;
	tspec.it_interval.tv_sec = interval;

	res = timerfd_settime(fd, 0, &tspec, NULL);
	if (res < 0) {
		perror("Couldn't set timer");
		close(fd);
		return -1;
	}

	return fd;
}

static int setup_socket(void)
{
	int fd, res;
	struct sockaddr_ll sk_addr = {
		.sll_family = AF_PACKET,
		.sll_protocol = htons(ETH_P_TSN),
	};

	fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_TSN));
	if (fd < 0) {
		perror("Couldn't open socket");
		return -1;
	}

	/* If user provided a network interface, bind() to it. */
	if (ifname[0] != '\0') {
		struct ifreq req;

		strncpy(req.ifr_name, ifname, sizeof(req.ifr_name));
		res = ioctl(fd, SIOCGIFINDEX, &req);
		if (res < 0) {
			perror("Couldn't get interface index");
			goto err;
		}

		sk_addr.sll_ifindex = req.ifr_ifindex;

		res = bind(fd, (struct sockaddr *) &sk_addr, sizeof(sk_addr));
		if (res < 0) {
			perror("Couldn't bind() to interface");
			goto err;
		}
	}

	/* If user provided the stream destination address, set it as multicast
	 * address.
	 */
	if (macaddr[0] != '\0') {
		struct packet_mreq mreq;

		mreq.mr_ifindex = sk_addr.sll_ifindex;
		mreq.mr_type = PACKET_MR_MULTICAST;
		mreq.mr_alen = ETH_ALEN;
		memcpy(&mreq.mr_address, macaddr, ETH_ALEN);

		res = setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
				 &mreq, sizeof(struct packet_mreq));
		if (res < 0) {
			perror("Couldn't set PACKET_ADD_MEMBERSHIP");
			goto err;
		}
	}

	return fd;

err:
	close(fd);
	return -1;
}

static void recv_packet(int fd)
{
	uint8_t *data = alloca(size);
	ssize_t n = recv(fd, data, size, 0);

	if (n < 0) {
		perror("Failed to receive data");
		return;
	}

	/* n is ssize_t, so it needs %zd rather than %d. */
	if (n != size)
		printf("Size mismatch: expected %d, got %zd\n", size, n);

	if (check_seq) {
		uint64_t *seq = (uint64_t *) &data[0];

		/* If 'expected_seq' is equal to zero, it means this is the
		 * first packet we received so we don't know what sequence
		 * number to expect.
		 */
		if (expected_seq == 0)
			expected_seq = *seq;

		if (*seq != expected_seq) {
			printf("Sequence mismatch: expected %" PRIu64
			       ", got %" PRIu64 "\n", expected_seq, *seq);

			expected_seq = *seq;
		}

		expected_seq++;
	}

	data_count += n;
}

static void report_bw(int fd)
{
	uint64_t expirations;
	ssize_t n = read(fd, &expirations, sizeof(uint64_t));

	if (n < 0) {
		perror("Couldn't read timerfd");
		return;
	}

	if (expirations != 1)
		printf("Something went wrong with timerfd\n");

	printf("Receiving data rate: %" PRIu64 " kbps\n",
	       (uint64_t) ((data_count * 8) / (1000 * interval)));

	data_count = 0;
}

int main(int argc, char *argv[])
{
	int sk_fd, timer_fd, res;
	struct pollfd fds[2];

	argp_parse(&argp, argc, argv, 0, NULL, NULL);

	sk_fd = setup_socket();
	if (sk_fd < 0)
		return 1;

	timer_fd = setup_timer();
	if (timer_fd < 0) {
		close(sk_fd);
		return 1;
	}

	fds[0].fd = sk_fd;
	fds[0].events = POLLIN;
	fds[1].fd = timer_fd;
	fds[1].events = POLLIN;

	printf("Waiting for packets...\n");

	while (1) {
		res = poll(fds, 2, -1);
		if (res < 0) {
			perror("Error on poll()");
			goto err;
		}

		if (fds[0].revents & POLLIN)
			recv_packet(fds[0].fd);

		if (fds[1].revents & POLLIN) {
			report_bw(fds[1].fd);
		}
	}

	close(timer_fd);
	close(sk_fd);
	return 0;

err:
	close(timer_fd);
	close(sk_fd);
	return 1;
}
--8<---------------cut here---------------end--------------->8---

^ permalink raw reply [flat|nested] 28+ messages in thread
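As a sanity check on the calculation script above, the class A and class B formulas can be evaluated by hand for one set of inputs. The sketch below is only an illustration: the helper names and the input figures (a 1 Gbps link, 20000 kbps reserved for class A, 10000 kbps for class B) are assumptions picked for the example and do not come from this series. Exact fractions are used instead of floats so the rounding step is unambiguous.

```python
import math
from fractions import Fraction


def cbs_class_a(link_speed, idleslope, frame_non_sr, frame_a):
    """Return (idleslope, sendslope, hicredit, locredit) for SR class A.

    Rates are in kbps and frame sizes in bytes, as in the script's options.
    All arguments are expected to be integers.
    """
    sendslope = idleslope - link_speed
    # hiCredit: worst-case interference is one maximum non-SR frame.
    hicredit = math.ceil(Fraction(idleslope * frame_non_sr, link_speed))
    # loCredit: credit drained while sending one maximum class A frame.
    locredit = math.ceil(Fraction(sendslope * frame_a, link_speed))
    return idleslope, sendslope, hicredit, locredit


def cbs_class_b(link_speed, idleslope_a, idleslope_b,
                frame_non_sr, frame_a, frame_b):
    """Return (idleslope, sendslope, hicredit, locredit) for SR class B."""
    sendslope = idleslope_b - link_speed
    # hiCredit B = Rb * (Mo / (Ro - Ra) + Ma / Ro), as derived in the script.
    hicredit = math.ceil(
        idleslope_b * (Fraction(frame_non_sr, link_speed - idleslope_a)
                       + Fraction(frame_a, link_speed)))
    locredit = math.ceil(Fraction(sendslope * frame_b, link_speed))
    return idleslope_b, sendslope, hicredit, locredit


if __name__ == "__main__":
    print(cbs_class_a(1000000, 20000, 1500, 1500))
    print(cbs_class_b(1000000, 20000, 10000, 1500, 1500, 1000))
```

With these assumed inputs, class A comes out as idleslope 20000, sendslope -980000, hicredit 30, locredit -1470, and class B as idleslope 10000, sendslope -990000, hicredit 31, locredit -990.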
* [next-queue PATCH v4 1/4] mqprio: Implement select_queue class_ops
@ 2017-10-04 0:28 ` Vinicius Costa Gomes
0 siblings, 0 replies; 28+ messages in thread
From: Vinicius Costa Gomes @ 2017-10-04 0:28 UTC (permalink / raw)
To: netdev, intel-wired-lan
Cc: Jesus Sanchez-Palencia, jhs, xiyou.wangcong, jiri, andre.guedes, ivan.briano, boon.leong.ong, richardcochran, henrik, levipearson, rodney.cummings

From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>

When replacing a child qdisc from mqprio, tc_modify_qdisc() must fetch
the netdev_queue pointer that the current child qdisc is associated
with before creating the new qdisc.

Currently, when using mqprio as root qdisc, the kernel will end up
getting the queue #0 pointer from the mqprio (root qdisc), which leaves
any new child qdisc with a possibly wrong netdev_queue pointer.

Implementing the Qdisc_class_ops select_queue() on mqprio fixes this
issue and avoids an inconsistent state when child qdiscs are replaced.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 net/sched/sch_mqprio.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 6bcdfe6e7b63..9d762ca4a76c 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -396,6 +396,12 @@ static void mqprio_walk(struct Qdisc *sch, struct qdisc_walker *arg)
 	}
 }
 
+static struct netdev_queue *mqprio_select_queue(struct Qdisc *sch,
+						struct tcmsg *tcm)
+{
+	return mqprio_queue_get(sch, TC_H_MIN(tcm->tcm_parent));
+}
+
 static const struct Qdisc_class_ops mqprio_class_ops = {
 	.graft		= mqprio_graft,
 	.leaf		= mqprio_leaf,
@@ -403,6 +409,7 @@ static const struct Qdisc_class_ops mqprio_class_ops = {
 	.walk		= mqprio_walk,
 	.dump		= mqprio_dump_class,
 	.dump_stats	= mqprio_dump_class_stats,
+	.select_queue	= mqprio_select_queue,
 };
 
 static struct Qdisc_ops mqprio_qdisc_ops __read_mostly = {
-- 
2.14.2

^ permalink raw reply related [flat|nested] 28+ messages in thread
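To make the handle arithmetic behind mqprio_select_queue() concrete, the sketch below decodes a tc handle such as 8001:1 into its major and minor halves. The two masks mirror TC_H_MAJ and TC_H_MIN from include/uapi/linux/pkt_sched.h; parse_handle() is a hypothetical helper introduced only for this illustration and is not part of tc or the kernel.

```python
TC_H_MAJ_MASK = 0xFFFF0000  # upper 16 bits: qdisc (major) number
TC_H_MIN_MASK = 0x0000FFFF  # lower 16 bits: class (minor) number


def tc_h_maj(handle):
    """Major part of a 32-bit tc handle, left in place (not shifted)."""
    return handle & TC_H_MAJ_MASK


def tc_h_min(handle):
    """Minor part of a 32-bit tc handle."""
    return handle & TC_H_MIN_MASK


def parse_handle(text):
    """Turn 'MAJ:MIN' (hexadecimal, as printed by tc) into a 32-bit handle.

    Hypothetical helper: an empty minor, as in '8001:', is treated as 0.
    """
    major, _, minor = text.partition(":")
    return (int(major, 16) << 16) | (int(minor, 16) if minor else 0)


if __name__ == "__main__":
    parent = parse_handle("8001:1")
    print(hex(parent), hex(tc_h_maj(parent)), tc_h_min(parent))
```

Only the minor half is handed to mqprio_queue_get() in the patch, which is why the class number after the colon decides which netdev_queue the new child qdisc is attached to.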
* [next-queue PATCH v4 2/4] net/sched: Fix accessing invalid dev_queue
@ 2017-10-04 0:28 ` Vinicius Costa Gomes
0 siblings, 0 replies; 28+ messages in thread
From: Vinicius Costa Gomes @ 2017-10-04 0:28 UTC (permalink / raw)
To: netdev, intel-wired-lan
Cc: Jesus Sanchez-Palencia, jhs, xiyou.wangcong, jiri, andre.guedes, ivan.briano, boon.leong.ong, richardcochran, henrik, levipearson, rodney.cummings

From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>

In qdisc_alloc() the dev_queue pointer was used without any checks
being performed. If qdisc_create() gets a null dev_queue pointer, it
just passes it along to qdisc_alloc(), leading to a crash. That
happens if a root qdisc implements select_queue() and returns a null
dev_queue pointer for an "invalid handle", for example.

One way to reproduce that is:

1) Setup mqprio

$ tc qdisc replace dev enp3s0 parent root mqprio num_tc 3 \
     map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

2) Replace the first inner qdisc

$ tc qdisc replace dev enp3s0 parent 8001:1 pfifo_fast

This will lead to the following crash:

[15274.874506] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[15274.875044] IP: qdisc_alloc+0xd/0xf0
[15274.875253] PGD 1a37d067 P4D 1a37d067 PUD 1a0cb067 PMD 0
[15274.875566] Oops: 0000 [#1] SMP
[15274.875813] Modules linked in:
[15274.875993] CPU: 0 PID: 2006 Comm: tc Not tainted 4.14.0-rc1+ #42
[15274.876415] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
[15274.877019] task: ffff88001eb91d00 task.stack: ffffc90000734000
[15274.877359] RIP: 0010:qdisc_alloc+0xd/0xf0
[15274.877606] RSP: 0018:ffffc90000737a88 EFLAGS: 00010286
[15274.877904] RAX: ffffffff81ef0da0 RBX: 0000000000000000 RCX: 0000000000000000
[15274.878331] RDX: ffff88001a046830 RSI: ffffffff81ef0da0 RDI: 0000000000000000
[15274.878757] RBP: 0000000001000001 R08: 000000000000006c R09: ffffc90000737b40
[15274.879165] R10: fffffffffffffc20 R11: 0000000000000000 R12: ffff88001e308000
[15274.879571] R13: 0000000000000088 R14: ffffc90000737b40 R15: 0000000000000000
[15274.879978] FS: 00007f790a90a700(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[15274.880457] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15274.880784] CR2: 0000000000000058 CR3: 000000001a003004 CR4: 00000000003606f0
[15274.881203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15274.881620] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[15274.882026] Call Trace:
[15274.882179] qdisc_create+0xb0/0x380
[15274.882388] ? security_capable+0x2e/0x50
[15274.882616] ? nla_parse+0x32/0x100
[15274.882818] tc_modify_qdisc+0x47f/0x560
[15274.883046] rtnetlink_rcv_msg+0x1db/0x200
[15274.883284] ? __skb_try_recv_datagram+0xd6/0x150
[15274.883555] ? rtnl_calcit.isra.21+0xe0/0xe0
[15274.883814] netlink_rcv_skb+0x92/0xf0
[15274.884031] netlink_unicast+0x12c/0x1d0
[15274.884258] netlink_sendmsg+0x2ee/0x330
[15274.884486] sock_sendmsg+0x28/0x40
[15274.884689] ___sys_sendmsg+0x25d/0x280
[15274.884913] ? __alloc_pages_nodemask+0x120/0x180
[15274.885185] ? page_add_new_anon_rmap+0x90/0xb0
[15274.885448] ? __handle_mm_fault+0x74e/0xc40
[15274.885695] ? __sys_sendmsg+0x3e/0x70
[15274.885912] __sys_sendmsg+0x3e/0x70
[15274.886123] entry_SYSCALL_64_fastpath+0x13/0x94
[15274.886387] RIP: 0033:0x7f7909b1a1b7
[15274.886594] RSP: 002b:00007ffe14315028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[15274.887042] RAX: ffffffffffffffda RBX: 0000000000661860 RCX: 00007f7909b1a1b7
[15274.887450] RDX: 0000000000000000 RSI: 00007ffe143150a0 RDI: 0000000000000003
[15274.887854] RBP: 00007ffe143150a0 R08: 0000000000000001 R09: 0000000000000000
[15274.888256] R10: 00000000000005f1 R11: 0000000000000246 R12: 00007ffe143150e0
[15274.888660] R13: 000000000066e620 R14: 00007ffe1431d180 R15: 0000000000000000
[15274.889070] Code: a8 01 74 09 48 89 df 5b e9 71 ff ff ff 5b c3 f3 c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 41 55 41 54 55 53 48 89 fb 44 8b 6e 20 <8b> 57 58 48 89 f5 4c 8b 27 be c0 80 40 01 41 8d bd 40 01 00 00
[15274.890154] RIP: qdisc_alloc+0xd/0xf0 RSP: ffffc90000737a88
[15274.890479] CR2: 0000000000000058
[15274.890677] ---[ end trace 6d0e946c11e46095 ]---

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 net/sched/sch_generic.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a0a198768aad..de2408f1ccd3 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -603,8 +603,14 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 	struct Qdisc *sch;
 	unsigned int size = QDISC_ALIGN(sizeof(*sch)) + ops->priv_size;
 	int err = -ENOBUFS;
-	struct net_device *dev = dev_queue->dev;
+	struct net_device *dev;
+
+	if (!dev_queue) {
+		err = -EINVAL;
+		goto errout;
+	}
 
+	dev = dev_queue->dev;
 	p = kzalloc_node(size, GFP_KERNEL,
 			 netdev_queue_numa_node_read(dev_queue));
 
-- 
2.14.2

^ permalink raw reply related [flat|nested] 28+ messages in thread
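The failure path described in this patch can be modelled outside the kernel. The sketch below rests on an assumption that is not shown anywhere in this series: that mqprio's queue lookup reserves the first num_tc class minors for traffic classes, maps the minors after them to TX queues, and does the subtraction in unsigned arithmetic. Under that assumption, 8001:1 with num_tc 3 is an "invalid handle" for queue purposes. The function names are hypothetical and only mimic the shape of the kernel code.

```python
import errno

ULONG_MASK = (1 << 64) - 1  # model a 64-bit unsigned long


def queue_get(minor, num_tc, num_tx_queues):
    """Assumed lookup rule: return a TX queue index, or None for a bad handle."""
    # Minors 1..num_tc name traffic classes, so the unsigned subtraction
    # wraps to a huge value for them and fails the range check below.
    ntx = (minor - 1 - num_tc) & ULONG_MASK
    if ntx >= num_tx_queues:
        return None
    return ntx


def alloc(dev_queue):
    """Mimic the guard added to qdisc_alloc(): reject a missing queue early."""
    if dev_queue is None:
        return -errno.EINVAL
    return 0


if __name__ == "__main__":
    # Values from the reproduction steps: num_tc 3, queues 1@0 1@1 2@2 (4 queues).
    for minor in (1, 4, 7, 8):
        queue = queue_get(minor, num_tc=3, num_tx_queues=4)
        print(minor, queue, alloc(queue))
```

In this model the handle from step 2 yields no queue, and the guard turns what used to be a NULL dereference into an -EINVAL return.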
* [Intel-wired-lan] [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
2017-10-04 0:28 ` Vinicius Costa Gomes
@ 2017-10-04 0:28 ` Vinicius Costa Gomes
-1 siblings, 0 replies; 28+ messages in thread
From: Vinicius Costa Gomes @ 2017-10-04 0:28 UTC (permalink / raw)
To: intel-wired-lan

This queueing discipline implements the shaper algorithm defined by
the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.

Its primary usage is to apply some bandwidth reservation to user
defined traffic classes, which are mapped to different queues via the
mqprio qdisc.

Initially, it only supports offloading the traffic shaping work to
supporting controllers.

Later, when a software implementation is added, the current dependency
on being installed "under" mqprio can be lifted.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 include/linux/netdevice.h      |   1 +
 include/net/pkt_sched.h        |   9 ++
 include/uapi/linux/pkt_sched.h |  17 ++++
 net/sched/Kconfig              |  11 ++
 net/sched/Makefile             |   1 +
 net/sched/sch_cbs.c            | 225 +++++++++++++++++++++++++++++++++++++++++
 6 files changed, 264 insertions(+)
 create mode 100644 net/sched/sch_cbs.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e1d6ef130611..b8798adc214f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -775,6 +775,7 @@ enum tc_setup_type {
 	TC_SETUP_CLSFLOWER,
 	TC_SETUP_CLSMATCHALL,
 	TC_SETUP_CLSBPF,
+	TC_SETUP_CBS,
 };
 
 /* These structures hold the attributes of xdp state that are being passed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 259bc191ba59..7c597b050b36 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -146,4 +146,13 @@ static inline bool is_classid_clsact_egress(u32 classid)
 	       TC_H_MIN(classid) == TC_H_MIN(TC_H_MIN_EGRESS);
 }
 
+struct tc_cbs_qopt_offload {
+	u8 enable;
+	s32 queue;
+	s32 hicredit;
+	s32 locredit;
+	s32 idleslope;
+	s32 sendslope;
+};
+
 #endif
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf5528fed..27c849c053cf 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -871,4 +871,21 @@ struct tc_pie_xstats {
 	__u32 maxq;             /* maximum queue size */
 	__u32 ecn_mark;         /* packets marked with ecn*/
 };
+
+/* CBS */
+struct tc_cbs_qopt {
+	__s32 hicredit;
+	__s32 locredit;
+	__s32 idleslope;
+	__s32 sendslope;
+};
+
+enum {
+	TCA_CBS_UNSPEC,
+	TCA_CBS_PARMS,
+	__TCA_CBS_MAX,
+};
+
+#define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index e70ed26485a2..c03d86a7775e 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -172,6 +172,17 @@ config NET_SCH_TBF
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_tbf.
 
+config NET_SCH_CBS
+	tristate "Credit Based Shaper (CBS)"
+	---help---
+	  Say Y here if you want to use the Credit Based Shaper (CBS) packet
+	  scheduling algorithm.
+
+	  See the top of <file:net/sched/sch_cbs.c> for more details.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_cbs.
+
 config NET_SCH_GRED
 	tristate "Generic Random Early Detection (GRED)"
 	---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 7b915d226de7..80c8f92d162d 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_NET_SCH_FQ_CODEL)	+= sch_fq_codel.o
 obj-$(CONFIG_NET_SCH_FQ)	+= sch_fq.o
 obj-$(CONFIG_NET_SCH_HHF)	+= sch_hhf.o
 obj-$(CONFIG_NET_SCH_PIE)	+= sch_pie.o
+obj-$(CONFIG_NET_SCH_CBS)	+= sch_cbs.o
 
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
new file mode 100644
index 000000000000..3e0fb0b92160
--- /dev/null
+++ b/net/sched/sch_cbs.c
@@ -0,0 +1,225 @@
+/*
+ * net/sched/sch_cbs.c	Credit Based Shaper
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Vinicius Costa Gomes <vinicius.gomes@intel.com>
+ *
+ */
+
+/* Credit Based Shaper (CBS)
+   =========================
+
+   This is a simple rate-limiting shaper aimed at TSN applications on
+   systems with known traffic workloads.
+
+   Its algorithm is defined by the IEEE 802.1Q-2014 Specification,
+   Section 8.6.8.2, and explained in more detail in the Annex L of the
+   same specification.
+
+   There are four tunables to be considered:
+
+	'idleslope': Idleslope is the rate of credits that is
+	accumulated (in kilobits per second) when there is at least
+	one packet waiting for transmission. Packets are transmitted
+	when the current value of credits is equal or greater than
+	zero. When there is no packet to be transmitted the amount of
+	credits is set to zero. This is the main tunable of the CBS
+	algorithm.
+
+	'sendslope':
+	Sendslope is the rate of credits that is depleted (it should be a
+	negative number of kilobits per second) when a transmission is
+	occurring. It can be calculated as follows, (IEEE 802.1Q-2014 Section
+	8.6.8.2 item g):
+
+	sendslope = idleslope - port_transmit_rate
+
+	'hicredit': Hicredit defines the maximum amount of credits (in
+	bytes) that can be accumulated. Hicredit depends on the
+	characteristics of interfering traffic,
+	'max_interference_size' is the maximum size of any burst of
+	traffic that can delay the transmission of a frame that is
+	available for transmission for this traffic class, (IEEE
+	802.1Q-2014 Annex L, Equation L-3):
+
+	hicredit = max_interference_size * (idleslope / port_transmit_rate)
+
+	'locredit': Locredit is the minimum amount of credits that can
+	be reached. It is a function of the traffic flowing through
+	this qdisc (IEEE 802.1Q-2014 Annex L, Equation L-2):
+
+	locredit = max_frame_size * (sendslope / port_transmit_rate)
+*/
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <net/netlink.h>
+#include <net/sch_generic.h>
+#include <net/pkt_sched.h>
+
+struct cbs_sched_data {
+	s32 queue;
+	s32 locredit;
+	s32 hicredit;
+	s32 sendslope;
+	s32 idleslope;
+};
+
+static int cbs_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+		       struct sk_buff **to_free)
+{
+	return qdisc_enqueue_tail(skb, sch);
+}
+
+static const struct nla_policy cbs_policy[TCA_CBS_MAX + 1] = {
+	[TCA_CBS_PARMS]	= { .len = sizeof(struct tc_cbs_qopt) },
+};
+
+static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct tc_cbs_qopt_offload cbs = { };
+	struct nlattr *tb[TCA_CBS_MAX + 1];
+	const struct net_device_ops *ops;
+	struct tc_cbs_qopt *qopt;
+	struct net_device *dev;
+	int err;
+
+	err = nla_parse_nested(tb, TCA_CBS_MAX, opt, cbs_policy, NULL);
+	if (err < 0)
+		return err;
+
+	err = -EINVAL;
+	if (!tb[TCA_CBS_PARMS])
+		goto done;
+
+	qopt = nla_data(tb[TCA_CBS_PARMS]);
+
+	dev = qdisc_dev(sch);
+	ops = dev->netdev_ops;
+
+	cbs.queue = q->queue;
+	cbs.enable = 1;
+	cbs.hicredit = qopt->hicredit;
+	cbs.locredit = qopt->locredit;
+	cbs.idleslope = qopt->idleslope;
+	cbs.sendslope = qopt->sendslope;
+
+	err = -EOPNOTSUPP;
+	if (!ops->ndo_setup_tc)
+		goto done;
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+	if (err < 0)
+		goto done;
+
+	q->hicredit = cbs.hicredit;
+	q->locredit = cbs.locredit;
+	q->idleslope = cbs.idleslope;
+	q->sendslope = cbs.sendslope;
+
+done:
+	return err;
+}
+
+static int cbs_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+
+	if (!opt)
+		return -EINVAL;
+
+	q->queue = sch->dev_queue - netdev_get_tx_queue(dev, 0);
+
+	return cbs_change(sch, opt);
+}
+
+static void cbs_destroy(struct Qdisc *sch)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct tc_cbs_qopt_offload cbs = { };
+	const struct net_device_ops *ops;
+	struct net_device *dev;
+	int err;
+
+	q->hicredit = 0;
+	q->locredit = 0;
+	q->idleslope = 0;
+	q->sendslope = 0;
+
+	dev = qdisc_dev(sch);
+	ops = dev->netdev_ops;
+
+	if (!ops->ndo_setup_tc)
+		return;
+
+	cbs.queue = q->queue;
+	cbs.enable = 0;
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+	if (err < 0)
+		pr_warn("Couldn't reset queue %d to default values\n",
+			cbs.queue);
+}
+
+static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct nlattr *nest;
+	struct tc_cbs_qopt opt;
+
+	nest = nla_nest_start(skb, TCA_OPTIONS);
+	if (!nest)
+		goto nla_put_failure;
+
+	opt.hicredit = q->hicredit;
+	opt.locredit = q->locredit;
+	opt.sendslope = q->sendslope;
+	opt.idleslope = q->idleslope;
+
+	if (nla_put(skb, TCA_CBS_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	return nla_nest_end(skb, nest);
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
+static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {
+	.next		=	NULL,
+	.id		=	"cbs",
+	.priv_size	=	sizeof(struct cbs_sched_data),
+	.enqueue	=	cbs_enqueue,
+	.dequeue	=	qdisc_dequeue_head,
+	.peek		=	qdisc_peek_dequeued,
+	.init		=	cbs_init,
+	.reset		=	qdisc_reset_queue,
+	.destroy	=	cbs_destroy,
+	.change		=	cbs_change,
+	.dump		=	cbs_dump,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init cbs_module_init(void)
+{
+	return register_qdisc(&cbs_qdisc_ops);
+}
+
+static void __exit cbs_module_exit(void)
+{
+	unregister_qdisc(&cbs_qdisc_ops);
+}
+module_init(cbs_module_init)
+module_exit(cbs_module_exit)
+MODULE_LICENSE("GPL");
-- 
2.14.2

^ permalink raw reply related [flat|nested] 28+ messages in thread
* [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
@ 2017-10-04 0:28 ` Vinicius Costa Gomes
0 siblings, 0 replies; 28+ messages in thread
From: Vinicius Costa Gomes @ 2017-10-04 0:28 UTC (permalink / raw)
To: netdev, intel-wired-lan
Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes, ivan.briano, jesus.sanchez-palencia, boon.leong.ong, richardcochran, henrik, levipearson, rodney.cummings

This queueing discipline implements the shaper algorithm defined by
the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.

Its primary usage is to apply some bandwidth reservation to user
defined traffic classes, which are mapped to different queues via the
mqprio qdisc.

Initially, it only supports offloading the traffic shaping work to
supporting controllers.

Later, when a software implementation is added, the current dependency
on being installed "under" mqprio can be lifted.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 include/linux/netdevice.h      |   1 +
 include/net/pkt_sched.h        |   9 ++
 include/uapi/linux/pkt_sched.h |  17 ++++
 net/sched/Kconfig              |  11 ++
 net/sched/Makefile             |   1 +
 net/sched/sch_cbs.c            | 225 +++++++++++++++++++++++++++++++++++++++++
 6 files changed, 264 insertions(+)
 create mode 100644 net/sched/sch_cbs.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e1d6ef130611..b8798adc214f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -775,6 +775,7 @@ enum tc_setup_type {
 	TC_SETUP_CLSFLOWER,
 	TC_SETUP_CLSMATCHALL,
 	TC_SETUP_CLSBPF,
+	TC_SETUP_CBS,
 };
 
 /* These structures hold the attributes of xdp state that are being passed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 259bc191ba59..7c597b050b36 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -146,4 +146,13 @@ static inline bool is_classid_clsact_egress(u32 classid)
 	       TC_H_MIN(classid) == TC_H_MIN(TC_H_MIN_EGRESS);
 }
 
+struct tc_cbs_qopt_offload {
+	u8 enable;
+	s32 queue;
+	s32 hicredit;
+	s32 locredit;
+	s32 idleslope;
+	s32 sendslope;
+};
+
 #endif
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf5528fed..27c849c053cf 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -871,4 +871,21 @@ struct tc_pie_xstats {
 	__u32 maxq;             /* maximum queue size */
 	__u32 ecn_mark;         /* packets marked with ecn*/
 };
+
+/* CBS */
+struct tc_cbs_qopt {
+	__s32 hicredit;
+	__s32 locredit;
+	__s32 idleslope;
+	__s32 sendslope;
+};
+
+enum {
+	TCA_CBS_UNSPEC,
+	TCA_CBS_PARMS,
+	__TCA_CBS_MAX,
+};
+
+#define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index e70ed26485a2..c03d86a7775e 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -172,6 +172,17 @@ config NET_SCH_TBF
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_tbf.
 
+config NET_SCH_CBS
+	tristate "Credit Based Shaper (CBS)"
+	---help---
+	  Say Y here if you want to use the Credit Based Shaper (CBS) packet
+	  scheduling algorithm.
+
+	  See the top of <file:net/sched/sch_cbs.c> for more details.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_cbs.
+
 config NET_SCH_GRED
 	tristate "Generic Random Early Detection (GRED)"
 	---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 7b915d226de7..80c8f92d162d 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_NET_SCH_FQ_CODEL)	+= sch_fq_codel.o
 obj-$(CONFIG_NET_SCH_FQ)	+= sch_fq.o
 obj-$(CONFIG_NET_SCH_HHF)	+= sch_hhf.o
 obj-$(CONFIG_NET_SCH_PIE)	+= sch_pie.o
+obj-$(CONFIG_NET_SCH_CBS)	+= sch_cbs.o
 
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
new file mode 100644
index 000000000000..3e0fb0b92160
--- /dev/null
+++ b/net/sched/sch_cbs.c
@@ -0,0 +1,225 @@
+/*
+ * net/sched/sch_cbs.c	Credit Based Shaper
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Vinicius Costa Gomes <vinicius.gomes@intel.com>
+ *
+ */
+
+/* Credit Based Shaper (CBS)
+   =========================
+
+   This is a simple rate-limiting shaper aimed at TSN applications on
+   systems with known traffic workloads.
+
+   Its algorithm is defined by the IEEE 802.1Q-2014 Specification,
+   Section 8.6.8.2, and explained in more detail in the Annex L of the
+   same specification.
+
+   There are four tunables to be considered:
+
+   'idleslope': Idleslope is the rate of credits that is
+	accumulated (in kilobits per second) when there is at least
+	one packet waiting for transmission. Packets are transmitted
+	when the current value of credits is equal or greater than
+	zero. When there is no packet to be transmitted the amount of
+	credits is set to zero. This is the main tunable of the CBS
+	algorithm.
+
+   'sendslope':
+	Sendslope is the rate of credits that is depleted (it should be a
+	negative number of kilobits per second) when a transmission is
+	occurring. It can be calculated as follows, (IEEE 802.1Q-2014 Section
+	8.6.8.2 item g):
+
+	sendslope = idleslope - port_transmit_rate
+
+   'hicredit': Hicredit defines the maximum amount of credits (in
+	bytes) that can be accumulated. Hicredit depends on the
+	characteristics of interfering traffic;
+	'max_interference_size' is the maximum size of any burst of
+	traffic that can delay the transmission of a frame that is
+	available for transmission for this traffic class, (IEEE
+	802.1Q-2014 Annex L, Equation L-3):
+
+	hicredit = max_interference_size * (idleslope / port_transmit_rate)
+
+   'locredit': Locredit is the minimum amount of credits that can
+	be reached. It is a function of the traffic flowing through
+	this qdisc (IEEE 802.1Q-2014 Annex L, Equation L-2):
+
+	locredit = max_frame_size * (sendslope / port_transmit_rate)
+*/
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <net/netlink.h>
+#include <net/sch_generic.h>
+#include <net/pkt_sched.h>
+
+struct cbs_sched_data {
+	s32 queue;
+	s32 locredit;
+	s32 hicredit;
+	s32 sendslope;
+	s32 idleslope;
+};
+
+static int cbs_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+		       struct sk_buff **to_free)
+{
+	return qdisc_enqueue_tail(skb, sch);
+}
+
+static const struct nla_policy cbs_policy[TCA_CBS_MAX + 1] = {
+	[TCA_CBS_PARMS]	= { .len = sizeof(struct tc_cbs_qopt) },
+};
+
+static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct tc_cbs_qopt_offload cbs = { };
+	struct nlattr *tb[TCA_CBS_MAX + 1];
+	const struct net_device_ops *ops;
+	struct tc_cbs_qopt *qopt;
+	struct net_device *dev;
+	int err;
+
+	err = nla_parse_nested(tb, TCA_CBS_MAX, opt, cbs_policy, NULL);
+	if (err < 0)
+		return err;
+
+	err = -EINVAL;
+	if (!tb[TCA_CBS_PARMS])
+		goto done;
+
+	qopt = nla_data(tb[TCA_CBS_PARMS]);
+
+	dev = qdisc_dev(sch);
+	ops = dev->netdev_ops;
+
+	cbs.queue = q->queue;
+	cbs.enable = 1;
+	cbs.hicredit = qopt->hicredit;
+	cbs.locredit = qopt->locredit;
+	cbs.idleslope = qopt->idleslope;
+	cbs.sendslope = qopt->sendslope;
+
+	err = -EOPNOTSUPP;
+	if (!ops->ndo_setup_tc)
+		goto done;
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+	if (err < 0)
+		goto done;
+
+	q->hicredit = cbs.hicredit;
+	q->locredit = cbs.locredit;
+	q->idleslope = cbs.idleslope;
+	q->sendslope = cbs.sendslope;
+
+done:
+	return err;
+}
+
+static int cbs_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+
+	if (!opt)
+		return -EINVAL;
+
+	q->queue = sch->dev_queue - netdev_get_tx_queue(dev, 0);
+
+	return cbs_change(sch, opt);
+}
+
+static void cbs_destroy(struct Qdisc *sch)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct tc_cbs_qopt_offload cbs = { };
+	const struct net_device_ops *ops;
+	struct net_device *dev;
+	int err;
+
+	q->hicredit = 0;
+	q->locredit = 0;
+	q->idleslope = 0;
+	q->sendslope = 0;
+
+	dev = qdisc_dev(sch);
+	ops = dev->netdev_ops;
+
+	if (!ops->ndo_setup_tc)
+		return;
+
+	cbs.queue = q->queue;
+	cbs.enable = 0;
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+	if (err < 0)
+		pr_warn("Couldn't reset queue %d to default values\n",
+			cbs.queue);
+}
+
+static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct nlattr *nest;
+	struct tc_cbs_qopt opt;
+
+	nest = nla_nest_start(skb, TCA_OPTIONS);
+	if (!nest)
+		goto nla_put_failure;
+
+	opt.hicredit = q->hicredit;
+	opt.locredit = q->locredit;
+	opt.sendslope = q->sendslope;
+	opt.idleslope = q->idleslope;
+
+	if (nla_put(skb, TCA_CBS_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	return nla_nest_end(skb, nest);
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
+static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {
+	.next		=	NULL,
+	.id		=	"cbs",
+	.priv_size	=	sizeof(struct cbs_sched_data),
+	.enqueue	=	cbs_enqueue,
+	.dequeue	=	qdisc_dequeue_head,
+	.peek		=	qdisc_peek_dequeued,
+	.init		=	cbs_init,
+	.reset		=	qdisc_reset_queue,
+	.destroy	=	cbs_destroy,
+	.change		=	cbs_change,
+	.dump		=	cbs_dump,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init cbs_module_init(void)
+{
+	return register_qdisc(&cbs_qdisc_ops);
+}
+
+static void __exit cbs_module_exit(void)
+{
+	unregister_qdisc(&cbs_qdisc_ops);
+}
+module_init(cbs_module_init)
+module_exit(cbs_module_exit)
+MODULE_LICENSE("GPL");
-- 
2.14.2
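As an illustration of the four tunables described in the sch_cbs.c header comment, the following is a minimal userspace sketch that derives them from a reserved rate and the port rate. It is not part of the patch: the struct `cbs_params` and the helper `cbs_calc_params()` are hypothetical names, and the rounding behaviour (integer division truncating toward zero) is an assumption that real tooling may want to revisit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace helper; none of these names come from the patch. */
struct cbs_params {
	int32_t idleslope;	/* kbit/s, positive */
	int32_t sendslope;	/* kbit/s, negative */
	int32_t hicredit;	/* bytes, positive */
	int32_t locredit;	/* bytes, negative */
};

/*
 * Derive the tunables from the reserved rate (idleslope), the port
 * transmit rate and two sizes in bytes, using the formulas quoted in
 * the sch_cbs.c header comment. Integer division truncates toward zero.
 */
static struct cbs_params cbs_calc_params(int32_t idleslope_kbps,
					 int32_t port_rate_kbps,
					 int32_t max_frame_size,
					 int32_t max_interference_size)
{
	struct cbs_params p;

	assert(port_rate_kbps > 0);
	assert(idleslope_kbps > 0 && idleslope_kbps <= port_rate_kbps);

	p.idleslope = idleslope_kbps;
	/* sendslope = idleslope - port_transmit_rate */
	p.sendslope = idleslope_kbps - port_rate_kbps;
	/* Multiply before dividing, in 64 bits, to keep precision. */
	p.hicredit = (int32_t)((int64_t)max_interference_size *
			       p.idleslope / port_rate_kbps);
	p.locredit = (int32_t)((int64_t)max_frame_size *
			       p.sendslope / port_rate_kbps);
	return p;
}
```

For example, reserving 20000 kbit/s on a 1 Gbit/s port (1000000 kbit/s) with 1500-byte frames and a 1500-byte interference burst gives sendslope = -980000, hicredit = 30 and locredit = -1470 under these assumptions.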
* [Intel-wired-lan] [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
2017-10-04 0:28 ` Vinicius Costa Gomes
@ 2017-10-04 6:36 ` Jiri Pirko
-1 siblings, 0 replies; 28+ messages in thread
From: Jiri Pirko @ 2017-10-04 6:36 UTC (permalink / raw)
To: intel-wired-lan

Wed, Oct 04, 2017 at 02:28:30AM CEST, vinicius.gomes at intel.com wrote:
>This queueing discipline implements the shaper algorithm defined by
>the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.
>
>Its primary usage is to apply some bandwidth reservation to user
>defined traffic classes, which are mapped to different queues via the
>mqprio qdisc.
>
>Initially, it only supports offloading the traffic shaping work to
>supporting controllers.
>
>Later, when a software implementation is added, the current dependency
>on being installed "under" mqprio can be lifted.
>
>Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>---
> include/linux/netdevice.h      |   1 +
> include/net/pkt_sched.h        |   9 ++
> include/uapi/linux/pkt_sched.h |  17 ++++
> net/sched/Kconfig              |  11 ++
> net/sched/Makefile             |   1 +
> net/sched/sch_cbs.c            | 225 +++++++++++++++++++++++++++++++++++++++++
> 6 files changed, 264 insertions(+)
> create mode 100644 net/sched/sch_cbs.c
>
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index e1d6ef130611..b8798adc214f 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -775,6 +775,7 @@ enum tc_setup_type {
> 	TC_SETUP_CLSFLOWER,
> 	TC_SETUP_CLSMATCHALL,
> 	TC_SETUP_CLSBPF,
>+	TC_SETUP_CBS,

Please split this into 2 patches. One will introduce the new qdisc;
the second will add offload capabilities.

[...]

>+static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {
>+	.next		=	NULL,
>+	.id		=	"cbs",
>+	.priv_size	=	sizeof(struct cbs_sched_data),
>+	.enqueue	=	cbs_enqueue,
>+	.dequeue	=	qdisc_dequeue_head,
>+	.peek		=	qdisc_peek_dequeued,
>+	.init		=	cbs_init,
>+	.reset		=	qdisc_reset_queue,
>+	.destroy	=	cbs_destroy,
>+	.change		=	cbs_change,
>+	.dump		=	cbs_dump,
>+	.owner		=	THIS_MODULE,
>+};

I don't see a software implementation for this. Looks like you are
trying to abuse the tc subsystem to bypass the kernel. Could you please
explain this? The golden rule is: implement in kernel, then offload.
* [Intel-wired-lan] [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
2017-10-04 6:36 ` Jiri Pirko
@ 2017-10-05 18:09 ` Levi Pearson
-1 siblings, 0 replies; 28+ messages in thread
From: Levi Pearson @ 2017-10-05 18:09 UTC (permalink / raw)
To: intel-wired-lan

On Wed, Oct 4, 2017 at 12:36 AM, Jiri Pirko <jiri@resnulli.us> wrote:

>>+static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {
>>+	.next		=	NULL,
>>+	.id		=	"cbs",
>>+	.priv_size	=	sizeof(struct cbs_sched_data),
>>+	.enqueue	=	cbs_enqueue,
>>+	.dequeue	=	qdisc_dequeue_head,
>>+	.peek		=	qdisc_peek_dequeued,
>>+	.init		=	cbs_init,
>>+	.reset		=	qdisc_reset_queue,
>>+	.destroy	=	cbs_destroy,
>>+	.change		=	cbs_change,
>>+	.dump		=	cbs_dump,
>>+	.owner		=	THIS_MODULE,
>>+};
>
> I don't see a software implementation for this. Looks like you are
> trying to abuse the tc subsystem to bypass the kernel. Could you please
> explain this? The golden rule is: implement in kernel, then offload.

It would be a shame if this were blocked due to a missing software
implementation. This module is analogous to (and designed to work
with) the mqprio module; it directly configures the 802.1Qav
(Forwarding and Queuing for Time-Sensitive Streams) functionality of
multi-queue NICs with that capability. I'm not sure what makes it seem
like an attempt to "bypass the kernel"; it's actually an attempt to
get an appropriate configuration path *into* the kernel, which has
been missing for some time.

While it would be valuable to have a CBS software-only implementation,
and Vinicius and colleagues have mentioned plans to implement one,
most users will have chosen Qav-compliant NICs and will prefer to use
the hardware capability. In fact they are often *already* using that
capability, but configure it via non-standardized interfaces in
out-of-tree or vendor-tree drivers. I believe it's valuable to have
the "knobs" fit in with the mqprio qdisc and the overall tc subsystem
rather than forcing users through various unrelated configuration
tools, but ultimately the hooks just need to be in the network
subsystem so the drivers can be told how the user wants to set the
registers.

It *might* be reasonable to add the functionality of this to mqprio
instead of a separate module, but this is only one of many possible
802.1Q shapers that could be selected and configured (with more being
defined by IEEE 802.1 working groups for different use cases), and it
seems cleaner to me to have their configuration be through separate
modules than crammed into an already-confusing one, especially since
mqprio has much broader applicability than CBS and it probably doesn't
make sense to burden all mqprio users with the configuration option
overhead.

This meets a specific need in industry (this is widely used in
automotive infotainment devices with broad hardware support across the
SoCs targeted at that industry) that is not well-served by a software
implementation of class-level shaping. As a maintainer of the OpenAvnu
project (sponsored by Avnu, an industry alliance formed around the TSN
standards), I will be integrating support for this as soon as it's
available to our traffic shaping management userspace tools, which
currently have to rely on out-of-tree drivers with custom interfaces
or the HTB shaper which can be configured close to CBS, but with
greatly increased overhead.

Levi
* [Intel-wired-lan] [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
2017-10-05 18:09 ` Levi Pearson
@ 2017-10-05 18:29 ` David Miller
-1 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2017-10-05 18:29 UTC (permalink / raw)
To: intel-wired-lan

From: Levi Pearson <levipearson@gmail.com>
Date: Thu, 5 Oct 2017 12:09:32 -0600

> It would be a shame if this were blocked due to a missing software
> implementation.

Quite the contrary, I think a software implementation is a minimum
requirement for inclusion of this feature.

Without a software implementation, there is no clear definition of
what is supposed to happen, and no clear way for people to test those
expectations unless they have the specific hardware.

I completely agree with Jiri. Hardware offload first is _not_ how
we do things in Linux networking.
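To make the "clear definition of what is supposed to happen" point concrete, the credit accounting described in the sch_cbs.c header comment can be sketched in plain C. This is only an illustrative model, not the implementation proposed in this thread: `struct cbs_sw_state` and the `cbs_sw_*` helpers are hypothetical names, time is passed in as nanosecond deltas rather than read from a clock, and the empty-queue rule follows the header comment literally (credits are set to zero) rather than modelling every case in the standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model; names are not taken from the patch. */
struct cbs_sw_state {
	int64_t credits;	/* current credits, in bytes */
	int32_t idleslope;	/* kbit/s, positive */
	int32_t sendslope;	/* kbit/s, negative */
	int32_t hicredit;	/* upper bound, bytes */
	int32_t locredit;	/* lower bound, bytes */
};

/* kbit/s * ns -> bytes: slope * 1000 / 8 bytes per second * ns / 1e9. */
static int64_t cbs_sw_slope_to_bytes(int32_t slope_kbps, int64_t delta_ns)
{
	assert(delta_ns >= 0);
	return (int64_t)slope_kbps * delta_ns / 8000000;
}

/* Time passes without transmitting on this queue. */
static void cbs_sw_idle(struct cbs_sw_state *s, int64_t delta_ns, bool backlog)
{
	if (!backlog) {
		/* Header comment: no packet waiting, credits go to zero. */
		s->credits = 0;
		return;
	}
	s->credits += cbs_sw_slope_to_bytes(s->idleslope, delta_ns);
	if (s->credits > s->hicredit)
		s->credits = s->hicredit;
}

/* A frame may be sent only when credits are zero or positive. */
static bool cbs_sw_can_send(const struct cbs_sw_state *s)
{
	return s->credits >= 0;
}

/* A frame from this queue occupied the wire for tx_ns nanoseconds. */
static void cbs_sw_account_tx(struct cbs_sw_state *s, int64_t tx_ns)
{
	s->credits += cbs_sw_slope_to_bytes(s->sendslope, tx_ns);
	if (s->credits < s->locredit)
		s->credits = s->locredit;
}
```

With idleslope = 20000 and sendslope = -980000 on a 1 Gbit/s port, sending one 1500-byte frame (12000 ns on the wire) costs 1470 bytes of credit, and the queue must then wait 588000 ns with a backlog before credits return to zero. How precisely a real qdisc could honour such timing is exactly what the rest of the thread debates.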
* [Intel-wired-lan] [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
2017-10-05 18:29 ` David Miller
@ 2017-10-05 18:41 ` Rodney Cummings
-1 siblings, 0 replies; 28+ messages in thread
From: Rodney Cummings @ 2017-10-05 18:41 UTC (permalink / raw)
To: intel-wired-lan

The IEEE Std 802.1Q specs for credit-based shaper require precise transmit decisions
within a 125 microsecond window of time.

Even with the Preempt RT patch or similar enhancements, that isn't very practical
as software-only. I doubt that software would conform to the standard's
requirements.

This is analogous to memory, or CPU.

> -----Original Message-----
> From: David Miller [mailto:davem at davemloft.net]
> Sent: Thursday, October 5, 2017 1:29 PM
> To: levipearson at gmail.com
> Cc: jiri at resnulli.us; vinicius.gomes at intel.com; netdev at vger.kernel.org;
> intel-wired-lan at lists.osuosl.org; jhs at mojatatu.com;
> xiyou.wangcong at gmail.com; andre.guedes at intel.com; ivan.briano at intel.com;
> jesus.sanchez-palencia at intel.com; boon.leong.ong at intel.com;
> richardcochran at gmail.com; henrik at austad.us; Rodney Cummings
> <rodney.cummings@ni.com>
> Subject: Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based
> Shaper (CBS) qdisc
>
> From: Levi Pearson <levipearson@gmail.com>
> Date: Thu, 5 Oct 2017 12:09:32 -0600
>
> > It would be a shame if this were blocked due to a missing software
> > implementation.
>
> Quite the contrary, I think a software implementation is a minimum
> requirement for inclusion of this feature.
>
> Without a software implementation, there is no clear definition of
> what is supposed to happen, and no clear way for people to test those
> expectations unless they have the specific hardware.
>
> I completely agree with Jiri. Hardware offload first is _not_ how
> we do things in Linux networking.
* [Intel-wired-lan] [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
2017-10-05 18:41 ` Rodney Cummings
@ 2017-10-05 19:05 ` David Miller
-1 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2017-10-05 19:05 UTC (permalink / raw)
To: intel-wired-lan

From: Rodney Cummings <rodney.cummings@ni.com>
Date: Thu, 5 Oct 2017 18:41:48 +0000

> The IEEE Std 802.1Q specs for credit-based shaper require precise transmit decisions
> within a 125 microsecond window of time.
>
> Even with the Preempt RT patch or similar enhancements, that isn't very practical
> as software-only. I doubt that software would conform to the standard's
> requirements.
>
> This is analogous to memory, or CPU.

I feel like this is looking for an excuse to not have to at least try to
implement the software version of CBS.
* [Intel-wired-lan] [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
2017-10-05 19:05 ` David Miller
@ 2017-10-05 19:17 ` Rodney Cummings
-1 siblings, 0 replies; 28+ messages in thread
From: Rodney Cummings @ 2017-10-05 19:17 UTC (permalink / raw)
To: intel-wired-lan

No excuse. If the software cannot meet the standard's requirements, it is
non-conformant, which means it cannot be called a standard credit-based shaper.

But... I have no objection if someone wants to try software-only.

I'm just saying that it is a waste of time for me.

> -----Original Message-----
> From: David Miller [mailto:davem at davemloft.net]
> Sent: Thursday, October 5, 2017 2:05 PM
> To: Rodney Cummings <rodney.cummings@ni.com>
> Cc: levipearson at gmail.com; jiri at resnulli.us; vinicius.gomes at intel.com;
> netdev at vger.kernel.org; intel-wired-lan at lists.osuosl.org;
> jhs at mojatatu.com; xiyou.wangcong at gmail.com; andre.guedes at intel.com;
> ivan.briano at intel.com; jesus.sanchez-palencia at intel.com;
> boon.leong.ong at intel.com; richardcochran at gmail.com; henrik at austad.us
> Subject: Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based
> Shaper (CBS) qdisc
>
> From: Rodney Cummings <rodney.cummings@ni.com>
> Date: Thu, 5 Oct 2017 18:41:48 +0000
>
> > The IEEE Std 802.1Q specs for credit-based shaper require precise
> transmit decisions
> > within a 125 microsecond window of time.
> >
> > Even with the Preempt RT patch or similar enhancements, that isn't very
> practical
> > as software-only. I doubt that software would conform to the standard's
> > requirements.
> >
> > This is analogous to memory, or CPU.
>
> I feel like this is looking for an excuse to not have to at least try to
> implement
> the software version of CBS.
* RE: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
@ 2017-10-05 19:17 ` Rodney Cummings
0 siblings, 0 replies; 28+ messages in thread
From: Rodney Cummings @ 2017-10-05 19:17 UTC (permalink / raw)
To: David Miller
Cc: levipearson@gmail.com, jiri@resnulli.us, vinicius.gomes@intel.com,
	netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	jhs@mojatatu.com, xiyou.wangcong@gmail.com, andre.guedes@intel.com,
	ivan.briano@intel.com, jesus.sanchez-palencia@intel.com,
	boon.leong.ong@intel.com, richardcochran@gmail.com, henrik@austad.us

No excuse. If the software cannot meet the standard's requirements, it is
non-conformant, which means it cannot be called a standard credit-based shaper.

But... I have no objection if someone wants to try software-only. I'm just
saying that it is a waste of time for me.

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, October 5, 2017 2:05 PM
> To: Rodney Cummings <rodney.cummings@ni.com>
> Cc: levipearson@gmail.com; jiri@resnulli.us; vinicius.gomes@intel.com;
> netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org;
> jhs@mojatatu.com; xiyou.wangcong@gmail.com; andre.guedes@intel.com;
> ivan.briano@intel.com; jesus.sanchez-palencia@intel.com;
> boon.leong.ong@intel.com; richardcochran@gmail.com; henrik@austad.us
> Subject: Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based
> Shaper (CBS) qdisc
>
> From: Rodney Cummings <rodney.cummings@ni.com>
> Date: Thu, 5 Oct 2017 18:41:48 +0000
>
> > The IEEE Std 802.1Q specs for credit-based shaper require precise transmit decisions
> > within a 125 microsecond window of time.
> >
> > Even with the Preempt RT patch or similar enhancements, that isn't very practical
> > as software-only. I doubt that software would conform to the standard's
> > requirements.
> >
> > This is analogous to memory, or CPU.
>
> I feel like this is looking for an excuse to not have to at least try to implement
> the software version of CBS.

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
@ 2017-10-05 21:23 ` Levi Pearson
0 siblings, 0 replies; 28+ messages in thread
From: Levi Pearson @ 2017-10-05 21:23 UTC (permalink / raw)
To: David Miller
Cc: Linux Kernel Network Developers, Rodney Cummings, Jiri Pirko,
	Vinicius Costa Gomes, intel-wired-lan, Jamal Hadi Salim, Cong Wang,
	andre.guedes, Ivan Briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran, Henrik Austad

(apologies to davem for the repeat; I accidentally did a reply vs.
reply-all the first time)

On Thu, Oct 5, 2017 at 1:05 PM, David Miller <davem@davemloft.net> wrote:
> From: Rodney Cummings <rodney.cummings@ni.com>
> Date: Thu, 5 Oct 2017 18:41:48 +0000
>
>> The IEEE Std 802.1Q specs for credit-based shaper require precise transmit decisions
>> within a 125 microsecond window of time.
>>
>> Even with the Preempt RT patch or similar enhancements, that isn't very practical
>> as software-only. I doubt that software would conform to the standard's
>> requirements.
>>
>> This is analogous to memory, or CPU.
>
> I feel like this is looking for an excuse to not have to at least try to implement
> the software version of CBS.

I don't understand why you attribute this to excuse-making. Is the
objection due to the fact that the user interface is provided through
a qdisc module? In that case, is there a better configuration
interface for setting up traffic shaping registers that could be used
across all the NICs that provide the capability? There are quite a
number of them now, and the lack of kernel interfaces to the hardware
makes coordinating the userspace effort to support the protocols far
more difficult than it needs to be.

As a contrasting example, look at the DCB shaping functionality,
provided by the ETS shaper. It's specified in 802.1Q right next to the
CBS shaper. It has no software implementation in a qdisc module as far
as I can tell (although it should be less resource-intensive to
implement), yet there's a whole netlink protocol for configuring it.

I don't think it makes sense to tack on the dcb netlink interface to
every driver that implements Qav; most don't have the DCB shapers, and
the user-level control protocol for FQTSS is SRP instead of DCB's LLDP
extensions, so completely different userspace tools would be required
as well.

I just want a simple, standard interface for configuring some fairly
common and IEEE-standard hardware features related to AVB/TSN traffic
shaping. Do we need our own netlink protocol for TSN configuration? It
seems to be massive overkill for an interface to write a single
register, but I suppose it might also be used for configuring TSN
parameters in local switch devices, such as Qbv windows, which need
quite a bit more information. I would be happy to do some of the work,
but I'd like an idea of what kind of interface would be acceptable
before writing up an RFC implementation.

Levi

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
@ 2017-10-05 19:57 ` Vinicius Costa Gomes
0 siblings, 0 replies; 28+ messages in thread
From: Vinicius Costa Gomes @ 2017-10-05 19:57 UTC (permalink / raw)
To: Jiri Pirko
Cc: netdev, intel-wired-lan, jhs, xiyou.wangcong, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong, richardcochran,
	henrik, levipearson, rodney.cummings

Hi Jiri,

Jiri Pirko <jiri@resnulli.us> writes:

> Wed, Oct 04, 2017 at 02:28:30AM CEST, vinicius.gomes@intel.com wrote:
>>This queueing discipline implements the shaper algorithm defined by
>>the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.
>>
>>It's primary usage is to apply some bandwidth reservation to user
>>defined traffic classes, which are mapped to different queues via the
>>mqprio qdisc.
>>
>>Initially, it only supports offloading the traffic shaping work to
>>supporting controllers.
>>
>>Later, when a software implementation is added, the current dependency
>>on being installed "under" mqprio can be lifted.
>>
>>Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>>Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>>---
>> include/linux/netdevice.h      |   1 +
>> include/net/pkt_sched.h        |   9 ++
>> include/uapi/linux/pkt_sched.h |  17 ++++
>> net/sched/Kconfig              |  11 ++
>> net/sched/Makefile             |   1 +
>> net/sched/sch_cbs.c            | 225 +++++++++++++++++++++++++++++++++++++++++
>> 6 files changed, 264 insertions(+)
>> create mode 100644 net/sched/sch_cbs.c
>>
>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>index e1d6ef130611..b8798adc214f 100644
>>--- a/include/linux/netdevice.h
>>+++ b/include/linux/netdevice.h
>>@@ -775,6 +775,7 @@ enum tc_setup_type {
>> 	TC_SETUP_CLSFLOWER,
>> 	TC_SETUP_CLSMATCHALL,
>> 	TC_SETUP_CLSBPF,
>>+	TC_SETUP_CBS,
>
> Please split this into 2 patches. One will introduce the new qdisc,
> second will add offload capabilities.
>

Of course.

> [...]
>
>
>>+static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {
>>+	.next		=	NULL,
>>+	.id		=	"cbs",
>>+	.priv_size	=	sizeof(struct cbs_sched_data),
>>+	.enqueue	=	cbs_enqueue,
>>+	.dequeue	=	qdisc_dequeue_head,
>>+	.peek		=	qdisc_peek_dequeued,
>>+	.init		=	cbs_init,
>>+	.reset		=	qdisc_reset_queue,
>>+	.destroy	=	cbs_destroy,
>>+	.change		=	cbs_change,
>>+	.dump		=	cbs_dump,
>>+	.owner		=	THIS_MODULE,
>>+};
>
> I don't see a software implementation for this. Looks like you are
> trying abuse tc subsystem to bypass kernel. Could you please explain
> this? The golden rule is: implement in kernel, then offload.

The reason was that we didn't have a use case for the software
implementation right now, it would be added in a later series.

But as that was requested (and it makes sense), I will add it for the
next version of this series (it is already written, just need to test it
better).


Cheers,

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
@ 2017-10-05 21:15 ` Jiri Pirko
0 siblings, 0 replies; 28+ messages in thread
From: Jiri Pirko @ 2017-10-05 21:15 UTC (permalink / raw)
To: Vinicius Costa Gomes
Cc: netdev, intel-wired-lan, jhs, xiyou.wangcong, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong, richardcochran,
	henrik, levipearson, rodney.cummings

Thu, Oct 05, 2017 at 09:57:34PM CEST, vinicius.gomes@intel.com wrote:
>Hi Jiri,
>
>Jiri Pirko <jiri@resnulli.us> writes:
>
>> Wed, Oct 04, 2017 at 02:28:30AM CEST, vinicius.gomes@intel.com wrote:
>>>This queueing discipline implements the shaper algorithm defined by
>>>the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.
>>>
>>>It's primary usage is to apply some bandwidth reservation to user
>>>defined traffic classes, which are mapped to different queues via the
>>>mqprio qdisc.
>>>
>>>Initially, it only supports offloading the traffic shaping work to
>>>supporting controllers.
>>>
>>>Later, when a software implementation is added, the current dependency
>>>on being installed "under" mqprio can be lifted.
>>>
>>>Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>>>Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>>>---
>>> include/linux/netdevice.h      |   1 +
>>> include/net/pkt_sched.h        |   9 ++
>>> include/uapi/linux/pkt_sched.h |  17 ++++
>>> net/sched/Kconfig              |  11 ++
>>> net/sched/Makefile             |   1 +
>>> net/sched/sch_cbs.c            | 225 +++++++++++++++++++++++++++++++++++++++++
>>> 6 files changed, 264 insertions(+)
>>> create mode 100644 net/sched/sch_cbs.c
>>>
>>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>index e1d6ef130611..b8798adc214f 100644
>>>--- a/include/linux/netdevice.h
>>>+++ b/include/linux/netdevice.h
>>>@@ -775,6 +775,7 @@ enum tc_setup_type {
>>> 	TC_SETUP_CLSFLOWER,
>>> 	TC_SETUP_CLSMATCHALL,
>>> 	TC_SETUP_CLSBPF,
>>>+	TC_SETUP_CBS,
>>
>> Please split this into 2 patches. One will introduce the new qdisc,
>> second will add offload capabilities.
>>
>
>Of course.
>
>> [...]
>>
>>
>>>+static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {
>>>+	.next		=	NULL,
>>>+	.id		=	"cbs",
>>>+	.priv_size	=	sizeof(struct cbs_sched_data),
>>>+	.enqueue	=	cbs_enqueue,
>>>+	.dequeue	=	qdisc_dequeue_head,
>>>+	.peek		=	qdisc_peek_dequeued,
>>>+	.init		=	cbs_init,
>>>+	.reset		=	qdisc_reset_queue,
>>>+	.destroy	=	cbs_destroy,
>>>+	.change		=	cbs_change,
>>>+	.dump		=	cbs_dump,
>>>+	.owner		=	THIS_MODULE,
>>>+};
>>
>> I don't see a software implementation for this. Looks like you are
>> trying abuse tc subsystem to bypass kernel. Could you please explain
>> this? The golden rule is: implement in kernel, then offload.
>
>The reason was that we didn't have a use case for the software
>implementation right now, it would be added in a later series.

The policy is very strict, SW implementation first, HW implementation
later.

>
>But as that was requested (and it makes sense), I will add it for the
>next version of this series (it is already written, just need to test it
>better).

Good.

>
>
>Cheers,
>--
>Vinicius

^ permalink raw reply [flat|nested] 28+ messages in thread
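For readers following the "software first" request in this sub-thread, here is a minimal sketch of the credit accounting a software-only shaper would need. It assumes the usual reading of 802.1Q-2014 Annex L: credit grows at idleSlope while a frame waits, changes at sendSlope (a negative rate) while a frame is on the wire, and a frame may start only when credit is non-negative. The struct, the function names and the choice of bits as the credit unit are illustrative assumptions; none of this is code from sch_cbs.c or from this series.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical model of one shaped queue (names are not from the patch). */
struct cbs_model {
	int64_t idleslope;	/* bits/s gained while a frame is waiting */
	int64_t sendslope;	/* bits/s while sending: idleslope - port rate, negative */
	int64_t hicredit;	/* upper credit bound, in bits */
	int64_t locredit;	/* lower credit bound, in bits (negative) */
	int64_t credits;	/* current credit, in bits */
};

static int64_t clamp64(int64_t v, int64_t lo, int64_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* A frame has been queued but held back for elapsed_ns: accrue credit. */
static void cbs_wait(struct cbs_model *q, int64_t elapsed_ns)
{
	int64_t gain = q->idleslope * elapsed_ns / NSEC_PER_SEC;

	q->credits = clamp64(q->credits + gain, q->locredit, q->hicredit);
}

/* Transmission may start only with non-negative credit. */
static bool cbs_can_send(const struct cbs_model *q)
{
	return q->credits >= 0;
}

/* Account for sending len_bytes at port_rate bits/s.
 * Time on the wire is len * 8 / port_rate seconds, so the credit
 * change is sendslope * len * 8 / port_rate bits.
 */
static void cbs_sent(struct cbs_model *q, int64_t len_bytes, int64_t port_rate)
{
	int64_t delta = q->sendslope * len_bytes * 8 / port_rate;

	q->credits = clamp64(q->credits + delta, q->locredit, q->hicredit);
}

/* Queue ran empty: any positive credit is discarded. */
static void cbs_idle(struct cbs_model *q)
{
	if (q->credits > 0)
		q->credits = 0;
}
```

On a 100 Mbit/s port with a 20 Mbit/s reservation, sendSlope is -80 Mbit/s, so a 1250-byte frame costs 8000 bits of credit and the queue has to wait 400 microseconds before the next frame may start. A real qdisc would additionally need a timer (a watchdog, for example) to wake the queue when credit returns to zero, which is where the timing-precision concern raised earlier in the thread comes in.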
* [Intel-wired-lan] [next-queue PATCH v4 4/4] igb: Add support for CBS offload
2017-10-04 0:28 ` Vinicius Costa Gomes
@ 2017-10-04 0:28 ` Vinicius Costa Gomes
-1 siblings, 0 replies; 28+ messages in thread
From: Vinicius Costa Gomes @ 2017-10-04 0:28 UTC (permalink / raw)
To: intel-wired-lan

From: Andre Guedes <andre.guedes@intel.com>

This patch adds support for Credit-Based Shaper (CBS) qdisc offload
from Traffic Control system. This support enable us to leverage the
Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features
from Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav
standard which was merged into 802.1Q in 2014. It enables traffic
prioritization and bandwidth reservation via the Credit-Based Shaper
which is implemented in hardware by i210 controller.

The patch introduces the igb_setup_tc() function which implements the
support for CBS qdisc hardware offload in the IGB driver. CBS offload
is the only traffic control offload supported by the driver at the
moment.

FQTSS transmission mode from i210 controller is automatically enabled
by the IGB driver when the CBS is enabled for the first hardware
queue. Likewise, FQTSS mode is automatically disabled when CBS is
disabled for the last hardware queue. Changing FQTSS mode requires NIC
reset.

FQTSS feature is supported by i210 controller only.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |  23 ++
 drivers/net/ethernet/intel/igb/e1000_regs.h    |   8 +
 drivers/net/ethernet/intel/igb/igb.h           |   6 +
 drivers/net/ethernet/intel/igb/igb_main.c      | 347 +++++++++++++++++++++++++
 4 files changed, 384 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 1de82f247312..83cabff1e0ab 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -353,7 +353,18 @@
 #define E1000_RXPBS_CFG_TS_EN	0x80000000
 
 #define I210_RXPBSIZE_DEFAULT		0x000000A2 /* RXPBSIZE default */
+#define I210_RXPBSIZE_MASK		0x0000003F
+#define I210_RXPBSIZE_PB_32KB		0x00000020
 #define I210_TXPBSIZE_DEFAULT		0x04000014 /* TXPBSIZE default */
+#define I210_TXPBSIZE_MASK		0xC0FFFFFF
+#define I210_TXPBSIZE_PB0_8KB		(8 << 0)
+#define I210_TXPBSIZE_PB1_8KB		(8 << 6)
+#define I210_TXPBSIZE_PB2_4KB		(4 << 12)
+#define I210_TXPBSIZE_PB3_4KB		(4 << 18)
+
+#define I210_DTXMXPKTSZ_DEFAULT		0x00000098
+
+#define I210_SR_QUEUES_NUM		2
 
 /* SerDes Control */
 #define E1000_SCTL_DISABLE_SERDES_LOOPBACK 0x0400
@@ -1051,4 +1062,16 @@
 #define E1000_VLAPQF_P_VALID(_n)	(0x1 << (3 + (_n) * 4))
 #define E1000_VLAPQF_QUEUE_MASK	0x03
 
+/* TX Qav Control fields */
+#define E1000_TQAVCTRL_XMIT_MODE	BIT(0)
+#define E1000_TQAVCTRL_DATAFETCHARB	BIT(4)
+#define E1000_TQAVCTRL_DATATRANARB	BIT(8)
+
+/* TX Qav Credit Control fields */
+#define E1000_TQAVCC_IDLESLOPE_MASK	0xFFFF
+#define E1000_TQAVCC_QUEUEMODE		BIT(31)
+
+/* Transmit Descriptor Control fields */
+#define E1000_TXDCTL_PRIORITY		BIT(27)
+
 #endif
diff --git a/drivers/net/ethernet/intel/igb/e1000_regs.h b/drivers/net/ethernet/intel/igb/e1000_regs.h
index 58adbf234e07..8eee081d395f 100644
--- a/drivers/net/ethernet/intel/igb/e1000_regs.h
+++ b/drivers/net/ethernet/intel/igb/e1000_regs.h
@@ -421,6 +421,14 @@ do { \
 
 #define E1000_I210_FLA		0x1201C
 
+#define E1000_I210_DTXMXPKTSZ	0x355C
+
+#define E1000_I210_TXDCTL(_n)	(0x0E028 + ((_n) * 0x40))
+
+#define E1000_I210_TQAVCTRL	0x3570
+#define E1000_I210_TQAVCC(_n)	(0x3004 + ((_n) * 0x40))
+#define E1000_I210_TQAVHC(_n)	(0x300C + ((_n) * 0x40))
+
 #define E1000_INVM_DATA_REG(_n)	(0x12120 + 4*(_n))
 #define E1000_INVM_SIZE		64 /* Number of INVM Data Registers */
 
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 06ffb2bc713e..92845692087a 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -281,6 +281,11 @@ struct igb_ring {
 	u16 count;			/* number of desc. in the ring */
 	u8 queue_index;			/* logical index of the ring*/
 	u8 reg_idx;			/* physical index of the ring */
+	bool cbs_enable;		/* indicates if CBS is enabled */
+	s32 idleslope;			/* idleSlope in kbps */
+	s32 sendslope;			/* sendSlope in kbps */
+	s32 hicredit;			/* hiCredit in bytes */
+	s32 locredit;			/* loCredit in bytes */
 
 	/* everything past this point are written often */
 	u16 next_to_clean;
@@ -621,6 +626,7 @@ struct igb_adapter {
 #define IGB_FLAG_EEE			BIT(14)
 #define IGB_FLAG_VLAN_PROMISC		BIT(15)
 #define IGB_FLAG_RX_LEGACY		BIT(16)
+#define IGB_FLAG_FQTSS			BIT(17)
 
 /* Media Auto Sense */
 #define IGB_MAS_ENABLE_0		0X0001
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index fd4a46b03cc8..03b8d0f4acfd 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -34,6 +34,7 @@
 #include <linux/slab.h>
 #include <net/checksum.h>
 #include <net/ip6_checksum.h>
+#include <net/pkt_sched.h>
 #include <linux/net_tstamp.h>
 #include <linux/mii.h>
 #include <linux/ethtool.h>
@@ -62,6 +63,17 @@
 #define BUILD 0
 #define DRV_VERSION __stringify(MAJ) "." __stringify(MIN) "." \
 __stringify(BUILD) "-k"
+
+enum queue_mode {
+	QUEUE_MODE_STRICT_PRIORITY,
+	QUEUE_MODE_STREAM_RESERVATION,
+};
+
+enum tx_queue_prio {
+	TX_QUEUE_PRIO_HIGH,
+	TX_QUEUE_PRIO_LOW,
+};
+
 char igb_driver_name[] = "igb";
 char igb_driver_version[] = DRV_VERSION;
 static const char igb_driver_string[] =
@@ -1271,6 +1283,12 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,
 		ring->count = adapter->tx_ring_count;
 		ring->queue_index = txr_idx;
 
+		ring->cbs_enable = false;
+		ring->idleslope = 0;
+		ring->sendslope = 0;
+		ring->hicredit = 0;
+		ring->locredit = 0;
+
 		u64_stats_init(&ring->tx_syncp);
 		u64_stats_init(&ring->tx_syncp2);
 
@@ -1598,6 +1616,284 @@ static void igb_get_hw_control(struct igb_adapter *adapter)
 			ctrl_ext | E1000_CTRL_EXT_DRV_LOAD);
 }
 
+static void enable_fqtss(struct igb_adapter *adapter, bool enable)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct e1000_hw *hw = &adapter->hw;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+
+	if (enable)
+		adapter->flags |= IGB_FLAG_FQTSS;
+	else
+		adapter->flags &= ~IGB_FLAG_FQTSS;
+
+	if (netif_running(netdev))
+		schedule_work(&adapter->reset_task);
+}
+
+static bool is_fqtss_enabled(struct igb_adapter *adapter)
+{
+	return (adapter->flags & IGB_FLAG_FQTSS) ? true : false;
+}
+
+static void set_tx_desc_fetch_prio(struct e1000_hw *hw, int queue,
+				   enum tx_queue_prio prio)
+{
+	u32 val;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+	WARN_ON(queue < 0 || queue > 4);
+
+	val = rd32(E1000_I210_TXDCTL(queue));
+
+	if (prio == TX_QUEUE_PRIO_HIGH)
+		val |= E1000_TXDCTL_PRIORITY;
+	else
+		val &= ~E1000_TXDCTL_PRIORITY;
+
+	wr32(E1000_I210_TXDCTL(queue), val);
+}
+
+static void set_queue_mode(struct e1000_hw *hw, int queue, enum queue_mode mode)
+{
+	u32 val;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+	WARN_ON(queue < 0 || queue > 1);
+
+	val = rd32(E1000_I210_TQAVCC(queue));
+
+	if (mode == QUEUE_MODE_STREAM_RESERVATION)
+		val |= E1000_TQAVCC_QUEUEMODE;
+	else
+		val &= ~E1000_TQAVCC_QUEUEMODE;
+
+	wr32(E1000_I210_TQAVCC(queue), val);
+}
+
+/**
+ * igb_configure_cbs - Configure Credit-Based Shaper (CBS)
+ * @adapter: pointer to adapter struct
+ * @queue: queue number
+ * @enable: true = enable CBS, false = disable CBS
+ * @idleslope: idleSlope in kbps
+ * @sendslope: sendSlope in kbps
+ * @hicredit: hiCredit in bytes
+ * @locredit: loCredit in bytes
+ *
+ * Configure CBS for a given hardware queue. When disabling, idleslope,
+ * sendslope, hicredit, locredit arguments are ignored. Returns 0 if
+ * success. Negative otherwise.
+ **/
+static void igb_configure_cbs(struct igb_adapter *adapter, int queue,
+			      bool enable, int idleslope, int sendslope,
+			      int hicredit, int locredit)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct e1000_hw *hw = &adapter->hw;
+	u32 tqavcc;
+	u16 value;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+	WARN_ON(queue < 0 || queue > 1);
+
+	if (enable) {
+		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_HIGH);
+		set_queue_mode(hw, queue, QUEUE_MODE_STREAM_RESERVATION);
+
+		/* According to i210 datasheet section 7.2.7.7, we should set
+		 * the 'idleSlope' field from TQAVCC register following the
+		 * equation:
+		 *
+		 * For 100 Mbps link speed:
+		 *
+		 *     value = BW * 0x7735 * 0.2                          (E1)
+		 *
+		 * For 1000Mbps link speed:
+		 *
+		 *     value = BW * 0x7735 * 2                            (E2)
+		 *
+		 * E1 and E2 can be merged into one equation as shown below.
+		 * Note that 'link-speed' is in Mbps.
+		 *
+		 * value = BW * 0x7735 * 2 * link-speed
+		 *                           --------------               (E3)
+		 *                                1000
+		 *
+		 * 'BW' is the percentage bandwidth out of full link speed
+		 * which can be found with the following equation. Note that
+		 * idleSlope here is the parameter from this function which
+		 * is in kbps.
+		 *
+		 *     BW =     idleSlope
+		 *          -----------------                             (E4)
+		 *          link-speed * 1000
+		 *
+		 * That said, we can come up with a generic equation to
+		 * calculate the value we should set it TQAVCC register by
+		 * replacing 'BW' in E3 by E4. The resulting equation is:
+		 *
+		 * value =     idleSlope     * 0x7735 * 2 * link-speed
+		 *         -----------------            --------------    (E5)
+		 *         link-speed * 1000                 1000
+		 *
+		 * 'link-speed' is present in both sides of the fraction so
+		 * it is canceled out. The final equation is the following:
+		 *
+		 *     value = idleSlope * 61034
+		 *             -----------------                          (E6)
+		 *                  1000000
+		 */
+		value = DIV_ROUND_UP_ULL(idleslope * 61034ULL, 1000000);
+
+		tqavcc = rd32(E1000_I210_TQAVCC(queue));
+		tqavcc &= ~E1000_TQAVCC_IDLESLOPE_MASK;
+		tqavcc |= value;
+		wr32(E1000_I210_TQAVCC(queue), tqavcc);
+
+		wr32(E1000_I210_TQAVHC(queue), 0x80000000 + hicredit * 0x7735);
+	} else {
+		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_LOW);
+		set_queue_mode(hw, queue, QUEUE_MODE_STRICT_PRIORITY);
+
+		/* Set idleSlope to zero. */
+		tqavcc = rd32(E1000_I210_TQAVCC(queue));
+		tqavcc &= ~E1000_TQAVCC_IDLESLOPE_MASK;
+		wr32(E1000_I210_TQAVCC(queue), tqavcc);
+
+		/* Set hiCredit to zero. */
+		wr32(E1000_I210_TQAVHC(queue), 0);
+	}
+
+	/* XXX: In i210 controller the sendSlope and loCredit parameters from
+	 * CBS are not configurable by software so we don't do any 'controller
+	 * configuration' in respect to these parameters.
+	 */
+
+	netdev_dbg(netdev, "CBS %s: queue %d idleslope %d sendslope %d hiCredit %d locredit %d\n",
+		   (enable) ? "enabled" : "disabled", queue,
+		   idleslope, sendslope, hicredit, locredit);
+}
+
+static int igb_save_cbs_params(struct igb_adapter *adapter, int queue,
+			       bool enable, int idleslope, int sendslope,
+			       int hicredit, int locredit)
+{
+	struct igb_ring *ring;
+
+	if (queue < 0 || queue > adapter->num_tx_queues)
+		return -EINVAL;
+
+	ring = adapter->tx_ring[queue];
+
+	ring->cbs_enable = enable;
+	ring->idleslope = idleslope;
+	ring->sendslope = sendslope;
+	ring->hicredit = hicredit;
+	ring->locredit = locredit;
+
+	return 0;
+}
+
+static bool is_any_cbs_enabled(struct igb_adapter *adapter)
+{
+	struct igb_ring *ring;
+	int i;
+
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		ring = adapter->tx_ring[i];
+
+		if (ring->cbs_enable)
+			return true;
+	}
+
+	return false;
+}
+
+static void igb_setup_tx_mode(struct igb_adapter *adapter)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct e1000_hw *hw = &adapter->hw;
+	u32 val;
+
+	/* Only i210 controller supports changing the transmission mode. */
+	if (hw->mac.type != e1000_i210)
+		return;
+
+	if (is_fqtss_enabled(adapter)) {
+		int i, max_queue;
+
+		/* Configure TQAVCTRL register: set transmit mode to 'Qav',
+		 * set data fetch arbitration to 'round robin' and set data
+		 * transfer arbitration to 'credit shaper algorithm.
+		 */
+		val = rd32(E1000_I210_TQAVCTRL);
+		val |= E1000_TQAVCTRL_XMIT_MODE | E1000_TQAVCTRL_DATATRANARB;
+		val &= ~E1000_TQAVCTRL_DATAFETCHARB;
+		wr32(E1000_I210_TQAVCTRL, val);
+
+		/* Configure Tx and Rx packet buffers sizes as described in
+		 * i210 datasheet section 7.2.7.7.
+		 */
+		val = rd32(E1000_TXPBS);
+		val &= ~I210_TXPBSIZE_MASK;
+		val |= I210_TXPBSIZE_PB0_8KB | I210_TXPBSIZE_PB1_8KB |
+			I210_TXPBSIZE_PB2_4KB | I210_TXPBSIZE_PB3_4KB;
+		wr32(E1000_TXPBS, val);
+
+		val = rd32(E1000_RXPBS);
+		val &= ~I210_RXPBSIZE_MASK;
+		val |= I210_RXPBSIZE_PB_32KB;
+		wr32(E1000_RXPBS, val);
+
+		/* Section 8.12.9 states that MAX_TPKT_SIZE from DTXMXPKTSZ
+		 * register should not exceed the buffer size programmed in
+		 * TXPBS. The smallest buffer size programmed in TXPBS is 4kB
+		 * so according to the datasheet we should set MAX_TPKT_SIZE to
+		 * 4kB / 64.
+		 *
+		 * However, when we do so, no frame from queue 2 and 3 are
+		 * transmitted. It seems the MAX_TPKT_SIZE should not be great
+		 * or _equal_ to the buffer size programmed in TXPBS. For this
+		 * reason, we set set MAX_ TPKT_SIZE to (4kB - 1) / 64.
+		 */
+		val = (4096 - 1) / 64;
+		wr32(E1000_I210_DTXMXPKTSZ, val);
+
+		/* Since FQTSS mode is enabled, apply any CBS configuration
+		 * previously set. If no previous CBS configuration has been
+		 * done, then the initial configuration is applied, which means
+		 * CBS is disabled.
+		 */
+		max_queue = (adapter->num_tx_queues < I210_SR_QUEUES_NUM) ?
+			    adapter->num_tx_queues : I210_SR_QUEUES_NUM;
+
+		for (i = 0; i < max_queue; i++) {
+			struct igb_ring *ring = adapter->tx_ring[i];
+
+			igb_configure_cbs(adapter, i, ring->cbs_enable,
+					  ring->idleslope, ring->sendslope,
+					  ring->hicredit, ring->locredit);
+		}
+	} else {
+		wr32(E1000_RXPBS, I210_RXPBSIZE_DEFAULT);
+		wr32(E1000_TXPBS, I210_TXPBSIZE_DEFAULT);
+		wr32(E1000_I210_DTXMXPKTSZ, I210_DTXMXPKTSZ_DEFAULT);
+
+		val = rd32(E1000_I210_TQAVCTRL);
+		/* According to Section 8.12.21, the other flags we've set when
+		 * enabling FQTSS are not relevant when disabling FQTSS so we
+		 * don't set they here.
+		 */
+		val &= ~E1000_TQAVCTRL_XMIT_MODE;
+		wr32(E1000_I210_TQAVCTRL, val);
+	}
+
+	netdev_dbg(netdev, "FQTSS %s\n", (is_fqtss_enabled(adapter)) ?
+		   "enabled" : "disabled");
+}
+
 /**
  * igb_configure - configure the hardware for RX and TX
  * @adapter: private board structure
@@ -1609,6 +1905,7 @@ static void igb_configure(struct igb_adapter *adapter)
 
 	igb_get_hw_control(adapter);
 	igb_set_rx_mode(netdev);
+	igb_setup_tx_mode(adapter);
 
 	igb_restore_vlan(adapter);
 
@@ -2150,6 +2447,55 @@ igb_features_check(struct sk_buff *skb, struct net_device *dev,
 	return features;
 }
 
+static int igb_offload_cbs(struct igb_adapter *adapter,
+			   struct tc_cbs_qopt_offload *qopt)
+{
+	struct e1000_hw *hw = &adapter->hw;
+	int err;
+
+	/* CBS offloading is only supported by i210 controller. */
+	if (hw->mac.type != e1000_i210)
+		return -EOPNOTSUPP;
+
+	/* CBS offloading is only supported by queue 0 and queue 1. */
+	if (qopt->queue < 0 || qopt->queue > 1)
+		return -EINVAL;
+
+	err = igb_save_cbs_params(adapter, qopt->queue, qopt->enable,
+				  qopt->idleslope, qopt->sendslope,
+				  qopt->hicredit, qopt->locredit);
+	if (err)
+		return err;
+
+	if (is_fqtss_enabled(adapter)) {
+		igb_configure_cbs(adapter, qopt->queue, qopt->enable,
+				  qopt->idleslope, qopt->sendslope,
+				  qopt->hicredit, qopt->locredit);
+
+		if (!is_any_cbs_enabled(adapter))
+			enable_fqtss(adapter, false);
+
+	} else {
+		enable_fqtss(adapter, true);
+	}
+
+	return 0;
+}
+
+static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
+			void *type_data)
+{
+	struct igb_adapter *adapter = netdev_priv(dev);
+
+	switch (type) {
+	case TC_SETUP_CBS:
+		return igb_offload_cbs(adapter, type_data);
+
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 static const struct net_device_ops igb_netdev_ops = {
 	.ndo_open		= igb_open,
 	.ndo_stop		= igb_close,
@@ -2175,6 +2521,7 @@ static const struct net_device_ops igb_netdev_ops = {
 	.ndo_set_features	= igb_set_features,
 	.ndo_fdb_add		= igb_ndo_fdb_add,
 	.ndo_features_check	= igb_features_check,
+	.ndo_setup_tc		= igb_setup_tc,
 };
 
 /**
-- 
2.14.2

^ permalink raw reply related [flat|nested] 28+ messages in thread