All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Fastabend <john.r.fastabend@intel.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"nhorman@tuxdriver.com" <nhorman@tuxdriver.com>,
	"davem@davemloft.net" <davem@davemloft.net>
Subject: Re: [RFC PATCH v1 1/2] net: implement mechanism for HW based QOS
Date: Thu, 18 Nov 2010 09:27:59 -0800	[thread overview]
Message-ID: <4CE5621F.8040503@intel.com> (raw)
In-Reply-To: <1289976990.2732.226.camel@edumazet-laptop>

On 11/16/2010 10:56 PM, Eric Dumazet wrote:
> Le mardi 16 novembre 2010 à 21:15 -0800, John Fastabend a écrit :
>> This patch provides a mechanism for lower layer devices to
>> steer traffic using skb->priority to tx queues. This allows
>> for hardware based QOS schemes to use the default qdisc without
>> incurring the penalties related to global state and the qdisc
>> lock. While reliably receiving skbs on the correct tx ring
>> to avoid head of line blocking resulting from shuffling in
>> the LLD. Finally, all the goodness from txq caching and xps/rps
>> can still be leveraged.
>>
>> Many drivers and hardware exist with the ability to implement
>> QOS schemes in the hardware but currently these drivers tend
>> to rely on firmware to reroute specific traffic, a driver
>> specific select_queue or the queue_mapping action in the
>> qdisc.
>>
>> None of these solutions are ideal or generic so we end up
>> with driver specific solutions that one-off traffic types
>> for example FCoE traffic is steered in ixgbe with the
>> queue_select routine. By using select_queue for this drivers
>> need to be updated for each and every traffic type and we
>> loose the goodness of much of the upstream work. For example
>> txq caching.
>>
>> Firmware solutions are inherently inflexible. And finally if
>> admins are expected to build a qdisc and filter rules to steer
>> traffic this requires knowledge of how the hardware is currently
>> configured. The number of tx queues and the queue offsets may
>> change depending on resources. Also this approach incurs all the
>> overhead of a qdisc with filters.
>>
>> With this mechanism users can set skb priority using expected
>> methods either socket options or the stack can set this directly.
>> Then the skb will be steered to the correct tx queues aligned
>> with hardware QOS traffic classes. In the normal case with a
>> single traffic class and all queues in this class every thing
>> works as is until the LLD enables multiple tcs.
>>
>> To steer the skb we mask out the lower 8 bits of the priority
>> and allow the hardware to configure upto 15 distinct classes
>> of traffic. This is expected to be sufficient for most applications
>> at any rate it is more then the 8021Q spec designates and is
>> equal to the number of prio bands currently implemented in
>> the default qdisc.
>>
>> This in conjunction with a userspace application such as
>> lldpad can be used to implement 8021Q transmission selection
>> algorithms one of these algorithms being the extended transmission
>> selection algorithm currently being used for DCB.
>>
>> If this approach seems reasonable I'll go ahead and finish
>> this up. The priority to tc mapping should probably be exposed
>> to userspace either through sysfs or rtnetlink. Any thoughts?
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>
>>  include/linux/netdevice.h |   47 +++++++++++++++++++++++++++++++++++++++++++++
>>  net/core/dev.c            |   43 ++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 89 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index b45c1b8..8a2adeb 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1092,6 +1092,12 @@ struct net_device {
>>  	/* Data Center Bridging netlink ops */
>>  	const struct dcbnl_rtnl_ops *dcbnl_ops;
>>  #endif
>> +	u8 max_tcs;
>> +	u8 num_tcs;
>> +	unsigned int *_tc_txqcount;
>> +	unsigned int *_tc_txqoffset;
> 
> This seems wrong to use two different pointers, this is a waste of cache
> memory. Also, I am not sure we need 32 bits, I believe we have a 16bit
> limit for queue numbers.
> 

Thanks for the feedback!

num_tx_queues is an unsigned int in net_device and I can't see any place that would limit it to 16bits. So I think we need 32 bits here. Otherwise I will make these changes and fix the fallout. This is much cleaner.


> Use a struct {
> 	u16 count;
> 	u16 offset;
> };
> 
>> +	u64 prio_tc_map;
> 
> Seems wrong too on 32bit arches
> 
> 	Please use : (even if using 16 bytes instead of 8)
> 
> 	u8 prio_tc_map[16];
> 
>> +
>>  
>>  #if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
>>  	/* max exchange id for FCoE LRO by ddp */
>> @@ -1108,6 +1114,44 @@ struct net_device {
>>  #define	NETDEV_ALIGN		32
>>  
>>  static inline
>> +int netdev_get_prio_tc_map(const struct net_device *dev, u32 prio)
>> +{
>> +	return (dev->prio_tc_map >> (4 * (prio & 0xF))) & 0xF;
> 
> 	return dev->prio_tc_map[prio & 15];
> 
>> +}



  reply	other threads:[~2010-11-18 17:27 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-17  5:15 [RFC PATCH v1 1/2] net: implement mechanism for HW based QOS John Fastabend
2010-11-17  5:15 ` [RFC PATCH v1 2/2] ixgbe: add multiple txqs per tc John Fastabend
2010-11-17  6:56 ` [RFC PATCH v1 1/2] net: implement mechanism for HW based QOS Eric Dumazet
2010-11-18 17:27   ` John Fastabend [this message]
2010-11-18 17:39     ` Eric Dumazet
2010-11-18 18:00       ` John Fastabend
2010-11-17  9:18 ` Thomas Graf
2010-11-18 17:31   ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CE5621F.8040503@intel.com \
    --to=john.r.fastabend@intel.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=nhorman@tuxdriver.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.