From: John Fastabend
Subject: Re: [net-next-2.6 PATCH v6 1/2] net: implement mechanism for HW based QOS
Date: Fri, 07 Jan 2011 14:48:13 -0800
Message-ID: <4D27982D.6080002@intel.com>
References: <20110107031211.2446.35715.stgit@jf-dev1-dcblab>
 <20110107214645.GB2050@del.dom.local>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: davem@davemloft.net, hadi@cyberus.ca, eric.dumazet@gmail.com,
 shemminger@vyatta.com, tgraf@infradead.org, bhutchings@solarflare.com,
 nhorman@tuxdriver.com, netdev@vger.kernel.org
To: Jarek Poplawski
In-Reply-To: <20110107214645.GB2050@del.dom.local>

On 1/7/2011 1:46 PM, Jarek Poplawski wrote:
> On Thu, Jan 06, 2011 at 07:12:11PM -0800, John Fastabend wrote:
>> This patch provides a mechanism for lower layer devices to
>> steer traffic using skb->priority to tx queues. This allows
>> hardware based QOS schemes to use the default qdisc without
>> incurring the penalties related to global state and the qdisc
>> lock, while still delivering skbs to the correct tx ring and
>> avoiding the head of line blocking that results from shuffling
>> in the LLD. Finally, all the goodness from txq caching and
>> xps/rps can still be leveraged.
>>
>> Many drivers and hardware exist with the ability to implement
>> QOS schemes in the hardware, but currently these drivers tend
>> to rely on firmware to reroute specific traffic, a driver
>> specific select_queue, or the queue_mapping action in the
>> qdisc.
>>
>> By using select_queue for this, drivers need to be updated for
>> each and every traffic type and we lose the goodness of much
>> of the upstream work. Firmware solutions are inherently
>> inflexible. And finally, if admins are expected to build a
>> qdisc and filter rules to steer traffic, this requires knowledge
>> of how the hardware is currently configured. The number of tx
>> queues and the queue offsets may change depending on resources.
>> Also, this approach incurs all the overhead of a qdisc with filters.
>>
>> With the mechanism in this patch users can set skb priority using
>> expected methods, i.e. setsockopt(), or the stack can set the
>> priority directly. The skb is then steered to the tx queues
>> aligned with the hardware QOS traffic classes. In the normal case,
>> with a single traffic class and all queues in this class, everything
>> works as-is until the LLD enables multiple tcs.
>>
>> To steer the skb we mask out the lower 4 bits of the priority
>> and allow the hardware to configure up to 15 distinct classes
>> of traffic. This is expected to be sufficient for most applications;
>> at any rate it is more than the 802.1Q spec designates and is
>> equal to the number of prio bands currently implemented in
>> the default qdisc.
>>
>> This, in conjunction with a userspace application such as
>> lldpad, can be used to implement the 802.1Q transmission selection
>> algorithms, one of these being the enhanced transmission
>> selection algorithm currently being used for DCB.
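
[To make the prio-to-tc-to-txq mapping described above concrete, here is a
minimal, self-contained userspace C model. The struct layout and the
TC_MAX_QUEUE/TC_BITMASK constants follow the patch below; the flow-hash step
and all helper names are illustrative assumptions, not the kernel code.]

#include <stdio.h>

#define TC_MAX_QUEUE 16
#define TC_BITMASK   15

/* Mirrors struct netdev_tc_txq from the patch: each traffic class owns
 * a contiguous range of tx queues described by an offset and a count. */
struct tc_txq {
        unsigned short count;
        unsigned short offset;
};

struct dev_model {
        int num_tc;
        unsigned char prio_tc_map[TC_BITMASK + 1]; /* priority & TC_BITMASK -> tc */
        struct tc_txq tc_to_txq[TC_MAX_QUEUE];     /* tc -> queue range */
};

/* Illustrative queue pick: the priority selects a traffic class, the tc
 * selects a queue range, and a per-flow hash spreads inside that range. */
static unsigned int pick_txq(const struct dev_model *dev,
                             unsigned int skb_priority,
                             unsigned int flow_hash)
{
        unsigned int tc, offset = 0, count = 1;

        if (dev->num_tc) {
                tc = dev->prio_tc_map[skb_priority & TC_BITMASK];
                offset = dev->tc_to_txq[tc].offset;
                count = dev->tc_to_txq[tc].count;
        }
        return offset + (flow_hash % count);
}

int main(void)
{
        /* Two traffic classes: TC0 -> queues 0-3, TC1 -> queues 4-7,
         * priorities 4-7 mapped to TC1, everything else to TC0. */
        struct dev_model dev = { .num_tc = 2 };
        int prio;

        dev.tc_to_txq[0] = (struct tc_txq){ .count = 4, .offset = 0 };
        dev.tc_to_txq[1] = (struct tc_txq){ .count = 4, .offset = 4 };
        for (prio = 4; prio <= 7; prio++)
                dev.prio_tc_map[prio] = 1;

        for (prio = 0; prio < 8; prio++)
                printf("priority %d -> txq %u\n",
                       prio, pick_txq(&dev, prio, prio * 7));
        return 0;
}
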
>>
>> Signed-off-by: John Fastabend
>> ---
>>
>>  include/linux/netdevice.h |   65 +++++++++++++++++++++++++++++++++++++++++++++
>>  net/core/dev.c            |   52 +++++++++++++++++++++++++++++++++++-
>>  2 files changed, 116 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 0f6b1c9..12fff42 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -646,6 +646,14 @@ struct xps_dev_maps {
>>      (nr_cpu_ids * sizeof(struct xps_map *)))
>>  #endif /* CONFIG_XPS */
>>
>> +#define TC_MAX_QUEUE 16
>> +#define TC_BITMASK 15
>> +/* HW offloaded queuing disciplines txq count and offset maps */
>> +struct netdev_tc_txq {
>> +    u16 count;
>> +    u16 offset;
>> +};
>> +
>>  /*
>>   * This structure defines the management hooks for network devices.
>>   * The following hooks can be defined; unless noted otherwise, they are
>> @@ -756,6 +764,7 @@ struct xps_dev_maps {
>>   * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
>>   *                        struct nlattr *port[]);
>>   * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
>> + * void (*ndo_setup_tc)(struct net_device *dev, u8 tc)
>
> ..., unsigned int txq) ?
>
>>   */
>>  #define HAVE_NET_DEVICE_OPS
>>  struct net_device_ops {
>> @@ -814,6 +823,8 @@ struct net_device_ops {
>>                                              struct nlattr *port[]);
>>      int                     (*ndo_get_vf_port)(struct net_device *dev,
>>                                                 int vf, struct sk_buff *skb);
>> +    int                     (*ndo_setup_tc)(struct net_device *dev, u8 tc,
>> +                                            unsigned int txq);
>
> ...
>> +/* netif_setup_tc - Handle tc mappings on real_num_tx_queues change
>> + * @dev: Network device
>> + * @txq: number of queues available
>> + *
>> + * If real_num_tx_queues is changed the tc mappings may no longer be
>> + * valid. To resolve this if the net_device supports ndo_setup_tc
>> + * call the ops routine with the new queue number. If the ops is not
>> + * available verify the tc mapping remains valid and if not NULL the
>> + * mapping. With no priorities mapping to this offset/count pair it
>> + * will no longer be used. In the worst case TC0 is invalid nothing
>> + * can be done so disable priority mappings.
>> + */
>> +void netif_setup_tc(struct net_device *dev, unsigned int txq)
>> +{
>> +    const struct net_device_ops *ops = dev->netdev_ops;
>> +
>> +    if (ops->ndo_setup_tc) {
>> +            ops->ndo_setup_tc(dev, dev->num_tc, txq);
>> +    } else {
>> +            int i;
>> +            struct netdev_tc_txq *tc = &dev->tc_to_txq[0];
>> +
>> +            /* If TC0 is invalidated disable TC mapping */
>> +            if (tc->offset + tc->count > txq) {
>> +                    dev->num_tc = 0;
>> +                    return;
>> +            }
>> +
>> +            /* Invalidated prio to tc mappings set to TC0 */
>> +            for (i = 1; i < TC_BITMASK + 1; i++) {
>> +                    int q = netdev_get_prio_tc_map(dev, i);
>
> (empty line)
> Btw, probably some warning should be logged on config change here.
>

OK maybe I should see about making at least my local checkpatch script
look for this. Also added pr_warnings here.

>> +                    tc = &dev->tc_to_txq[q];
>> +
>> +                    if (tc->offset + tc->count > txq)
>> +                            netdev_set_prio_tc_map(dev, i, 0);
>> +            }
>> +    }
>> +}
>> +
>>  /*
>>   * Routine to help set real_num_tx_queues. To avoid skbs mapped to queues
>>   * greater then real_num_tx_queues stale skbs on the qdisc must be flushed.
>> @@ -1614,6 +1653,9 @@ int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)
>>
>>      if (txq < dev->real_num_tx_queues)
>>              qdisc_reset_all_tx_gt(dev, txq);
>> +
>> +    if (dev->num_tc)
>> +            netif_setup_tc(dev, txq);
>
> Should be before qdisc_reset_all_tx_gt (above).
>
> Jarek P.

I will fix this. Thanks!
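
[Following up on the fallback branch of netif_setup_tc quoted above, here is a
similarly hedged, self-contained userspace C model of what happens when a
driver has no ndo_setup_tc and real_num_tx_queues shrinks: stale priority
mappings are pointed back at TC0, or tc mapping is disabled if even TC0 no
longer fits. The warning output merely stands in for the pr_warn calls
mentioned in the reply; all type and function names here are assumptions,
not kernel API.]

#include <stdio.h>

#define TC_MAX_QUEUE 16
#define TC_BITMASK   15

struct tc_txq { unsigned short count, offset; };

struct dev_model {
        int num_tc;
        unsigned char prio_tc_map[TC_BITMASK + 1];
        struct tc_txq tc_to_txq[TC_MAX_QUEUE];
};

/* Model of the ndo_setup_tc-less fallback: after the queue count drops to
 * txq, any tc whose offset+count range no longer fits is unusable, so the
 * priorities pointing at it fall back to TC0.  If even TC0 does not fit,
 * tc mapping is disabled entirely. */
static void setup_tc_fallback(struct dev_model *dev, unsigned int txq)
{
        const struct tc_txq *tc = &dev->tc_to_txq[0];
        int prio;

        if ((unsigned int)(tc->offset + tc->count) > txq) {
                fprintf(stderr, "TC0 exceeds %u queues, disabling tc mappings\n", txq);
                dev->num_tc = 0;
                return;
        }

        for (prio = 1; prio <= TC_BITMASK; prio++) {
                tc = &dev->tc_to_txq[dev->prio_tc_map[prio]];
                if ((unsigned int)(tc->offset + tc->count) > txq) {
                        fprintf(stderr, "priority %d remapped to TC0\n", prio);
                        dev->prio_tc_map[prio] = 0;
                }
        }
}

int main(void)
{
        struct dev_model dev = { .num_tc = 2 };

        dev.tc_to_txq[0] = (struct tc_txq){ .count = 4, .offset = 0 };
        dev.tc_to_txq[1] = (struct tc_txq){ .count = 4, .offset = 4 };
        dev.prio_tc_map[5] = 1;

        setup_tc_fallback(&dev, 6); /* only 6 queues left: TC1 (4-7) no longer fits */
        printf("num_tc=%d, prio 5 -> tc %d\n", dev.num_tc, dev.prio_tc_map[5]);
        return 0;
}
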