From: Amir Vadai <amirv@mellanox.com>
To: John Fastabend <john.r.fastabend@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>, <netdev@vger.kernel.org>,
Roland Dreier <roland@purestorage.com>,
Oren Duer <oren@mellanox.com>, Amir Vadai <amirv@mellanox.co.il>,
Jeff Kirsher <jeffrey.t.kirsher@intel.com>,
Eilon Greenstein <eilong@broadcom.com>
Subject: Re: [PATCH 7/8] net: support tx_ring per UP in HW based QoS mechanism
Date: Wed, 14 Mar 2012 12:09:30 +0200 [thread overview]
Message-ID: <4F606E5A.6080407@mellanox.com> (raw)
In-Reply-To: <4F5F9087.7090607@intel.com>
On 03/13/2012 08:23 PM, John Fastabend wrote:
> On 3/13/2012 10:22 AM, Amir Vadai wrote:
>> From: Amir Vadai<amirv@mellanox.co.il>
>>
>> The Current HW based QoS mechanism which was introduced in commit 4f57c087de9
>> "net: implement mechanism for HW based QOS" is in orientation to ETS traffic
>> class. This patch introduces an approach which allow to use this mechanism also
>> with hardware who has queues per user priority (UP). After the change,
>> __skb_tx_hash() will direct a flow to a tx ring from a range of tx rings. This
>> range is defined by the caller function by the specific HW. If TC based queues,
>> the range is by TC number and for UP based queues, the range is by UP.
>>
> ETS is one specific use case for mqprio it can easily be used with other
> hardware transmission selection algorithms 802.1Q std based or otherwise.
>
> The mapping is really just an skb->priority to group of queues (qoffset and
> qcount). I happened to call the queue grouping a traffic class because that
> aligns with 802.1Q but it _really_ is just a queue grouping.
This is good for untagged traffic, but for tagged traffic I can see 2
problems with this approach:
1. mqprio mapping could contradict egress map or 8021Qaz mapping (UP <=>
TC). This could be solved (not very elegantly) by forcing the mappings
to be synced.
2. egress map is per vlan, and mqprio mapping is one global mapping.
>
> In your case what would it mean to change the map and num_tc see 'tc':
>
> [root@jf-dev1-dcblab netperf]# tc qdisc add dev eth3 root mqprio help
> Usage: ... mqprio [num_tc NUMBER] [map P0 P1 ...]
> [queues count1@offset1 count2@offset2 ...] [hw 1|0]
>
> For example setting 'num_tc 8 map 0 1 2 3 0 1 2 3' looks like it might
> not work correctly. Would you end up with an skb tagged with priority
> 4,5,6,7 being sent out UP queues 0,1,2,3? My quess is that won't work
> with PFC or your ETS correctly.
I don't see a problem here. For example, skb tagged with priority 5 is
mapped to UP 1. And sent through one of the tx rings of UP 1. All the
rings of UP 1 share the same transmission queue (schedule queue) which
is controlled by PFC and ETS by the HW. What is the problem here?
>
> In the canonical iSCSI case we put iscsid in a net_prio cgroup to get the
> priority set then use the priority to steer the skb to the correct queue
> groupings. In your case I think you can just fail any num_tc != 8 and keep
> the dflt map 1:1 then this should work. What did I miss? It looks like you
> already fail the num_tc != 8 case so why do we need this change?
>
> At most maybe we need a flag to indicate the mqprio map can't be changed in
> some cases.
What you suggest is that the priority in net_prio cgroup will be the
User Priority, and not just the skb priority?
And also, for tagged traffic, how could it be forced to be synced with
egress map?
>
>
>> CC: John Fastabend<john.r.fastabend@intel.com>
>> CC: Jeff Kirsher<jeffrey.t.kirsher@intel.com>
>> CC: Eilon Greenstein<eilong@broadcom.com>
>> Signed-off-by: Amir Vadai<amirv@mellanox.co.il>
>> ---
>> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 11 ++++++++++-
>> drivers/net/ethernet/mellanox/mlx4/en_tx.c | 9 +++------
>> include/linux/netdevice.h | 12 +++++++++++-
>> include/linux/skbuff.h | 3 ++-
>> net/core/dev.c | 10 +---------
>> 5 files changed, 27 insertions(+), 18 deletions(-)
>>
> [...]
>
>>
>> void bnx2x_set_num_queues(struct bnx2x *bp)
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> index 7a49830..d0d96e3 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> @@ -570,18 +570,15 @@ static void build_inline_wqe(struct mlx4_en_tx_desc *tx_desc, struct sk_buff *sk
>>
>> u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb)
>> {
>> - struct mlx4_en_priv *priv = netdev_priv(dev);
>> - int up = -1;
>> + int up = 0;
>>
>> if (vlan_tx_tag_present(skb))
>> up = (vlan_tx_tag_get(skb)>> 13);
> I was trying to avoid logic like this in select_queue().
Why?
>
> Can we get the same behavior by keeping the egress map and mqprio
> map in sync?
As I said above, if we force egress map to be synced to mqprio mapping,
we loose it's power - mqprio is global, and egress map is per vlan.
>
>> else if (dev->num_tc)
>> up = netdev_get_prio_tc_map(dev, skb->priority);
>>
>> - if (up>= 0)
>> - return MLX4_EN_NUM_TX_RINGS + up;
>> -
>> - return __skb_tx_hash(dev, skb, MLX4_EN_NUM_TX_RINGS);
>> + return __skb_tx_hash(dev, skb, MLX4_EN_NUM_TX_RINGS,
>> + MLX4_EN_NUM_TX_RINGS + up, 1);
>> }
>>
> Thanks,
> John
Thanks,
Amir
next prev parent reply other threads:[~2012-03-14 10:09 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-13 17:21 [PATCH 0/8] net/mlx4_en: DCB QoS support Amir Vadai
2012-03-13 17:21 ` [PATCH 1/8] net/mlx4_en: Force user priority by QP attribute Amir Vadai
2012-03-13 17:21 ` [PATCH 2/8] net/mlx4_core: set port QoS attributes Amir Vadai
2012-03-13 17:21 ` [PATCH 3/8] net/mlx4_en: DCB QoS support Amir Vadai
2012-03-13 17:21 ` [PATCH 4/8] net/mlx4_en: Set max rate-limit for a TC Amir Vadai
2012-03-13 18:26 ` John Fastabend
2012-03-14 10:31 ` Amir Vadai
2012-03-13 19:16 ` Dave Taht
2012-03-14 10:42 ` Amir Vadai
2012-03-13 17:22 ` [PATCH 5/8] net/mlx4_en: sk_prio <=> UP for untagged traffic Amir Vadai
2012-03-13 17:22 ` [PATCH 6/8] IB/rdma_cm: TOS <=> UP mapping for IBoE Amir Vadai
2012-03-13 17:22 ` [PATCH 7/8] net: support tx_ring per UP in HW based QoS mechanism Amir Vadai
2012-03-13 18:23 ` John Fastabend
2012-03-14 10:09 ` Amir Vadai [this message]
2012-03-14 21:36 ` John Fastabend
2012-03-15 10:05 ` Amir Vadai
2012-03-16 7:16 ` John Fastabend
2012-03-13 17:22 ` [PATCH 8/8] net/mlx4_en: num cores tx rings for every UP Amir Vadai
2012-03-20 11:29 ` [PATCH 0/8] net/mlx4_en: DCB QoS support Amir Vadai
2012-03-20 19:58 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F606E5A.6080407@mellanox.com \
--to=amirv@mellanox.com \
--cc=amirv@mellanox.co.il \
--cc=davem@davemloft.net \
--cc=eilong@broadcom.com \
--cc=jeffrey.t.kirsher@intel.com \
--cc=john.r.fastabend@intel.com \
--cc=netdev@vger.kernel.org \
--cc=oren@mellanox.com \
--cc=roland@purestorage.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.