From: Saeed Mahameed <saeedm@mellanox.com>
To: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org, Huy Nguyen <huyn@mellanox.com>,
Ido Schimmel <idosch@idosch.org>,
Jakub Kicinski <jakub.kicinski@netronome.com>,
Jiri Pirko <jiri@resnulli.us>, Or Gerlitz <gerlitz.or@gmail.com>,
Parav Pandit <parav@mellanox.com>,
Aron Silverton <aron.silverton@oracle.com>,
Saeed Mahameed <saeedm@mellanox.com>
Subject: [net-next V2 1/6] net/dcb: Add dcbnl buffer attribute
Date: Thu, 24 May 2018 14:38:15 -0700
Message-ID: <20180524213820.5910-2-saeedm@mellanox.com>
In-Reply-To: <20180524213820.5910-1-saeedm@mellanox.com>
From: Huy Nguyen <huyn@mellanox.com>
This patch adds a dcbnl buffer attribute that lets the user change
the NIC's buffer configuration, such as the priority to buffer
mapping and the size of each individual buffer.
Combined with the PFC attribute, this attribute allows an advanced user
to fine-tune the QoS settings for a specific priority queue. For example,
the user can dedicate a buffer to one or more priorities, or give a
larger buffer to certain priorities.
The dcb buffer configuration will be controlled by lldptool.
    lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6
      maps priorities 0,1,2,3,4,5,6,7 to receive buffers 0,2,5,7,1,2,3,6
    lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0
      sets the receive buffer size for buffers 0,1,2,3,4,5,6,7 respectively
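For illustration only, the two lldptool commands above would correspond
to a struct dcbnl_buffer filled roughly as shown here. This is a
user-space sketch: the struct is re-declared locally with <stdint.h>
types to mirror the layout added by this patch, and fill_example() is a
hypothetical helper, not part of lldptool or the kernel.

```c
#include <stdint.h>
#include <string.h>

#define IEEE_8021Q_MAX_PRIORITIES 8
#define DCBX_MAX_BUFFERS 8

/* Local mirror of the uapi struct added by this patch. */
struct dcbnl_buffer {
	/* priority to buffer mapping */
	uint8_t  prio2buffer[IEEE_8021Q_MAX_PRIORITIES];
	/* buffer size in bytes */
	uint32_t buffer_size[DCBX_MAX_BUFFERS];
	uint32_t total_size;
};

/* Hypothetical helper: values are copied from the lldptool example. */
static void fill_example(struct dcbnl_buffer *buf)
{
	static const uint8_t prio[IEEE_8021Q_MAX_PRIORITIES] = {
		0, 2, 5, 7, 1, 2, 3, 6
	};
	static const uint32_t size[DCBX_MAX_BUFFERS] = {
		87296, 87296, 0, 87296, 0, 0, 0, 0
	};

	memset(buf, 0, sizeof(*buf));
	/* prio2buffer[p] is the receive buffer used by priority p. */
	memcpy(buf->prio2buffer, prio, sizeof(prio));
	/* buffer_size[b] is the size in bytes of receive buffer b. */
	memcpy(buf->buffer_size, size, sizeof(size));
	/* total_size stays 0 here; assumed to be filled in by the driver on get. */
}
```

With this layout, priority 1 lands on buffer 2, which has size 0 in the
second command, so a real configuration would normally keep the two
arrays consistent with each other.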
After discussion on the mailing list with Jakub, Jiri, Ido and John, we agreed
to choose dcbnl over the devlink interface, since this feature is intended to
set port attributes, which are governed by the netdev instance of that port,
whereas the devlink API is more suitable for global ASIC configuration.
We present a use case scenario where the dcbnl buffer attribute, configured
by an advanced user, helps reduce the latency of messages of different sizes.
Scenario description:
On ConnectX-5, we run latency-sensitive traffic with
small/medium message sizes ranging from 64B to 256KB and bandwidth-sensitive
traffic with large message sizes of 512KB and 1MB. We group small, medium,
and large message sizes into their own PFC-enabled priorities as follows.
Priorities 1 & 2 (64B, 256B and 1KB)
Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
Priorities 5 & 6 (512KB and 1MB)
By default, ConnectX-5 maps all PFC-enabled priorities to a single
lossless buffer with a fixed size of 50% of the total available buffer
space. The other 50% is assigned to the lossy buffer. Using the dcbnl buffer
attribute, we create three equal-size lossless buffers, each with 25% of the
total available buffer space. Thus, the lossy buffer size is reduced to 25%.
The priority to lossless buffer mappings are set as follows.
Priorities 1 & 2 on lossless buffer #1
Priorities 3 & 4 on lossless buffer #2
Priorities 5 & 6 on lossless buffer #3
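The split described above can be sketched in user-space C as follows.
This is an illustration under stated assumptions, not the mlx5 driver
code: the struct is a local mirror of the one added by this patch,
split_quarters() is a hypothetical helper, and placing the lossy buffer
at index 0 (with priorities 0 and 7 mapped to it) is an assumption that
the scenario does not spell out.

```c
#include <stdint.h>
#include <string.h>

#define IEEE_8021Q_MAX_PRIORITIES 8
#define DCBX_MAX_BUFFERS 8

/* Local mirror of the uapi struct added by this patch. */
struct dcbnl_buffer {
	uint8_t  prio2buffer[IEEE_8021Q_MAX_PRIORITIES];
	uint32_t buffer_size[DCBX_MAX_BUFFERS];
	uint32_t total_size;
};

/*
 * Hypothetical helper: split `total` bytes into one lossy buffer
 * (index 0, an assumption) and three lossless buffers (indices 1..3)
 * of 25% each.
 */
static void split_quarters(struct dcbnl_buffer *buf, uint32_t total)
{
	uint32_t quarter = total / 4;
	int i;

	memset(buf, 0, sizeof(*buf));
	buf->total_size = total;

	/* The lossy buffer takes the remainder so the sizes sum to total. */
	buf->buffer_size[0] = total - 3 * quarter;
	for (i = 1; i <= 3; i++)
		buf->buffer_size[i] = quarter;

	/* Priorities 0 and 7 stay on buffer 0 (already zeroed by memset). */
	buf->prio2buffer[1] = buf->prio2buffer[2] = 1;
	buf->prio2buffer[3] = buf->prio2buffer[4] = 2;
	buf->prio2buffer[5] = buf->prio2buffer[6] = 3;
}
```

Giving the remainder to the lossy buffer is a design choice of this
sketch; it keeps the three lossless buffers exactly equal when the
total is not divisible by four.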
We observe the following latency improvements for small and medium message
sizes. Please note that the bandwidth for the large message sizes is
reduced, but the total bandwidth remains the same.
256B message size (42% latency reduction)
4K message size (21% latency reduction)
64K message size (16% latency reduction)
CC: Ido Schimmel <idosch@idosch.org>
CC: Jakub Kicinski <jakub.kicinski@netronome.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Or Gerlitz <gerlitz.or@gmail.com>
CC: Parav Pandit <parav@mellanox.com>
CC: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
include/net/dcbnl.h | 4 ++++
include/uapi/linux/dcbnl.h | 11 +++++++++++
net/dcb/dcbnl.c | 20 ++++++++++++++++++++
3 files changed, 35 insertions(+)
diff --git a/include/net/dcbnl.h b/include/net/dcbnl.h
index 207d9ba1f92c..0e5e91be2d30 100644
--- a/include/net/dcbnl.h
+++ b/include/net/dcbnl.h
@@ -101,6 +101,10 @@ struct dcbnl_rtnl_ops {
/* CEE peer */
int (*cee_peer_getpg) (struct net_device *, struct cee_pg *);
int (*cee_peer_getpfc) (struct net_device *, struct cee_pfc *);
+
+ /* buffer settings */
+ int (*dcbnl_getbuffer)(struct net_device *, struct dcbnl_buffer *);
+ int (*dcbnl_setbuffer)(struct net_device *, struct dcbnl_buffer *);
};
#endif /* __NET_DCBNL_H__ */
diff --git a/include/uapi/linux/dcbnl.h b/include/uapi/linux/dcbnl.h
index 2c0c6453c3f4..60aa2e446698 100644
--- a/include/uapi/linux/dcbnl.h
+++ b/include/uapi/linux/dcbnl.h
@@ -163,6 +163,16 @@ struct ieee_pfc {
__u64 indications[IEEE_8021QAZ_MAX_TCS];
};
+#define IEEE_8021Q_MAX_PRIORITIES 8
+#define DCBX_MAX_BUFFERS 8
+struct dcbnl_buffer {
+ /* priority to buffer mapping */
+ __u8 prio2buffer[IEEE_8021Q_MAX_PRIORITIES];
+ /* buffer size in Bytes */
+ __u32 buffer_size[DCBX_MAX_BUFFERS];
+ __u32 total_size;
+};
+
/* CEE DCBX std supported values */
#define CEE_DCBX_MAX_PGS 8
#define CEE_DCBX_MAX_PRIO 8
@@ -406,6 +416,7 @@ enum ieee_attrs {
DCB_ATTR_IEEE_MAXRATE,
DCB_ATTR_IEEE_QCN,
DCB_ATTR_IEEE_QCN_STATS,
+ DCB_ATTR_DCB_BUFFER,
__DCB_ATTR_IEEE_MAX
};
#define DCB_ATTR_IEEE_MAX (__DCB_ATTR_IEEE_MAX - 1)
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index bae7d78aa068..d2f4e0c1faaf 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -176,6 +176,7 @@ static const struct nla_policy dcbnl_ieee_policy[DCB_ATTR_IEEE_MAX + 1] = {
[DCB_ATTR_IEEE_MAXRATE] = {.len = sizeof(struct ieee_maxrate)},
[DCB_ATTR_IEEE_QCN] = {.len = sizeof(struct ieee_qcn)},
[DCB_ATTR_IEEE_QCN_STATS] = {.len = sizeof(struct ieee_qcn_stats)},
+ [DCB_ATTR_DCB_BUFFER] = {.len = sizeof(struct dcbnl_buffer)},
};
/* DCB number of traffic classes nested attributes. */
@@ -1094,6 +1095,16 @@ static int dcbnl_ieee_fill(struct sk_buff *skb, struct net_device *netdev)
return -EMSGSIZE;
}
+ if (ops->dcbnl_getbuffer) {
+ struct dcbnl_buffer buffer;
+
+ memset(&buffer, 0, sizeof(buffer));
+ err = ops->dcbnl_getbuffer(netdev, &buffer);
+ if (!err &&
+ nla_put(skb, DCB_ATTR_DCB_BUFFER, sizeof(buffer), &buffer))
+ return -EMSGSIZE;
+ }
+
app = nla_nest_start(skb, DCB_ATTR_IEEE_APP_TABLE);
if (!app)
return -EMSGSIZE;
@@ -1453,6 +1464,15 @@ static int dcbnl_ieee_set(struct net_device *netdev, struct nlmsghdr *nlh,
goto err;
}
+ if (ieee[DCB_ATTR_DCB_BUFFER] && ops->dcbnl_setbuffer) {
+ struct dcbnl_buffer *buffer =
+ nla_data(ieee[DCB_ATTR_DCB_BUFFER]);
+
+ err = ops->dcbnl_setbuffer(netdev, buffer);
+ if (err)
+ goto err;
+ }
+
if (ieee[DCB_ATTR_IEEE_APP_TABLE]) {
struct nlattr *attr;
int rem;
--
2.17.0
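
As far as the hunks above show, the dcbnl core only checks the length
of DCB_ATTR_DCB_BUFFER through the policy entry and then hands the
payload to the driver, so range checks would be up to each
dcbnl_setbuffer implementation. A minimal user-space sketch of such a
check follows. It is not taken from any driver: the struct is a local
mirror of the uapi one, and example_validate_buffer() and its
port_total parameter are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define IEEE_8021Q_MAX_PRIORITIES 8
#define DCBX_MAX_BUFFERS 8

/* Local mirror of the uapi struct added by this patch. */
struct dcbnl_buffer {
	uint8_t  prio2buffer[IEEE_8021Q_MAX_PRIORITIES];
	uint32_t buffer_size[DCBX_MAX_BUFFERS];
	uint32_t total_size;
};

/*
 * Hypothetical validation a dcbnl_setbuffer callback might run first.
 * port_total is the buffer space the port really has, in bytes.
 * Returns 0 if the request looks sane, -EINVAL otherwise.
 */
static int example_validate_buffer(const struct dcbnl_buffer *buf,
				   uint32_t port_total)
{
	uint64_t sum = 0;	/* 64-bit so eight u32 sizes cannot overflow */
	int i;

	/* Every priority must map to an existing buffer index. */
	for (i = 0; i < IEEE_8021Q_MAX_PRIORITIES; i++) {
		if (buf->prio2buffer[i] >= DCBX_MAX_BUFFERS)
			return -EINVAL;
	}

	/* The requested sizes must fit in the port's buffer space. */
	for (i = 0; i < DCBX_MAX_BUFFERS; i++)
		sum += buf->buffer_size[i];
	if (sum > port_total)
		return -EINVAL;

	return 0;
}
```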
Thread overview (8+ messages):
2018-05-24 21:38 [pull request][net-next V2 0/6] Mellanox, mlx5e updates 2018-05-19 Saeed Mahameed
2018-05-24 21:38 ` Saeed Mahameed [this message]
2018-05-24 21:38 ` [net-next V2 2/6] net/mlx5e: Move port speed code from en_ethtool.c to en/port.c Saeed Mahameed
2018-05-24 21:38 ` [net-next V2 3/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask Saeed Mahameed
2018-05-24 21:38 ` [net-next V2 4/6] net/mlx5: PPTB and PBMC register firmware command support Saeed Mahameed
2018-05-24 21:38 ` [net-next V2 5/6] net/mlx5e: Receive buffer configuration Saeed Mahameed
2018-05-24 21:38 ` [net-next V2 6/6] net/mlx5e: Receive buffer support for DCBX Saeed Mahameed
2018-05-25 20:42 ` [pull request][net-next V2 0/6] Mellanox, mlx5e updates 2018-05-19 David Miller