* [patch net-next v2 00/10] Add support for resource abstraction
@ 2017-12-26 11:23 Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 01/10] devlink: Add per devlink instance lock Jiri Pirko
` (11 more replies)
0 siblings, 12 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-26 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, dsa, roopa
From: Jiri Pirko <jiri@mellanox.com>
Many of the ASIC's internal resources are limited and are shared between
several hardware procedures. For example, unified hash-based memory can
be used for many lookup purposes, like FDB and LPM. In many cases the user
can provide a partitioning scheme for such a resource in order to perform
fine tuning for his application. In such cases performing driver reload is
needed for the changes to take place, thus this patchset also adds support
for hot reload.
Such an abstraction can be coupled with devlink's dpipe interface, which
models the ASIC's pipeline as a graph of match/action tables. By modeling
the hardware resource object, and by coupling it to several dpipe tables,
further visibility can be achieved in order to debug ASIC-wide issues.
The proposed interface will provide the user the ability to understand the
limitations of the hardware, and receive notification regarding its occupancy.
Furthermore, monitoring the resource occupancy can be done in real-time and
can be useful in many cases.
---
Userspace part prototype can be found at https://github.com/arkadis/iproute2/
at resource_dev branch.
v1->v2
- Add resource size attribute.
- Fix split bug.
Arkadi Sharshevsky (10):
devlink: Add per devlink instance lock
devlink: Add support for resource abstraction
devlink: Add support for reload
devlink: Add relation between dpipe and resource
mlxsw: pci: Add support for performing bus reset
mlxsw: spectrum: Register KVD resources with devlink
mlxsw: spectrum_dpipe: Connect dpipe tables to resources
mlxsw: spectrum: Add support for getting kvdl occupancy
mlxsw: pci: Add support for getting resource through devlink
mlxsw: core: Add support for reload
drivers/net/ethernet/mellanox/mlxsw/core.c | 85 ++-
drivers/net/ethernet/mellanox/mlxsw/core.h | 16 +-
drivers/net/ethernet/mellanox/mlxsw/i2c.c | 5 +-
drivers/net/ethernet/mellanox/mlxsw/pci.c | 98 ++--
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 205 ++++++++
drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 13 +
.../net/ethernet/mellanox/mlxsw/spectrum_dpipe.c | 72 ++-
.../net/ethernet/mellanox/mlxsw/spectrum_kvdl.c | 26 +
include/net/devlink.h | 97 ++++
include/uapi/linux/devlink.h | 21 +
net/core/devlink.c | 573 ++++++++++++++++++---
11 files changed, 1079 insertions(+), 132 deletions(-)
--
2.9.5
^ permalink raw reply [flat|nested] 61+ messages in thread
* [patch net-next v2 01/10] devlink: Add per devlink instance lock
2017-12-26 11:23 [patch net-next v2 00/10] Add support for resource abstraction Jiri Pirko
@ 2017-12-26 11:23 ` Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 02/10] devlink: Add support for resource abstraction Jiri Pirko
` (10 subsequent siblings)
11 siblings, 0 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-26 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, dsa, roopa
From: Arkadi Sharshevsky <arkadis@mellanox.com>
This is a preparation before introducing resources and hot reload support.
Currently there are two global lock where one protects all devlink access,
and the second one protects devlink port access. This patch adds per devlink
instance lock which protects the internal members which are the sb/dpipe/
resource/ports. By introducing this lock the global devlink port lock can
be discarded.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v1->v2
- Add flag for indication about not locking the devlink per-instance
lock. It is currently used for port split commands, and will be used
for reload as well.
---
include/net/devlink.h | 1 +
net/core/devlink.c | 136 ++++++++++++++++++++++++++++----------------------
2 files changed, 78 insertions(+), 59 deletions(-)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index b9654e1..4d2c6fc 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -30,6 +30,7 @@ struct devlink {
const struct devlink_ops *ops;
struct device *dev;
possible_net_t _net;
+ struct mutex lock;
char priv[0] __aligned(NETDEV_ALIGN);
};
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 7d430c1..2f71734 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -92,12 +92,6 @@ static LIST_HEAD(devlink_list);
*/
static DEFINE_MUTEX(devlink_mutex);
-/* devlink_port_mutex
- *
- * Shared lock to guard lists of ports in all devlink devices.
- */
-static DEFINE_MUTEX(devlink_port_mutex);
-
static struct net *devlink_net(const struct devlink *devlink)
{
return read_pnet(&devlink->_net);
@@ -335,15 +329,18 @@ devlink_sb_tc_index_get_from_info(struct devlink_sb *devlink_sb,
#define DEVLINK_NL_FLAG_NEED_DEVLINK BIT(0)
#define DEVLINK_NL_FLAG_NEED_PORT BIT(1)
#define DEVLINK_NL_FLAG_NEED_SB BIT(2)
-#define DEVLINK_NL_FLAG_LOCK_PORTS BIT(3)
- /* port is not needed but we need to ensure they don't
- * change in the middle of command
- */
+
+/* The per devlink instance lock is taken by default in the pre-doit
+ * operation, yet several commands do not require this. The global
+ * devlink lock is taken and protects from disruption by user-calls.
+ */
+#define DEVLINK_NL_FLAG_NO_LOCK BIT(3)
static int devlink_nl_pre_doit(const struct genl_ops *ops,
struct sk_buff *skb, struct genl_info *info)
{
struct devlink *devlink;
+ int err;
mutex_lock(&devlink_mutex);
devlink = devlink_get_from_info(info);
@@ -351,44 +348,47 @@ static int devlink_nl_pre_doit(const struct genl_ops *ops,
mutex_unlock(&devlink_mutex);
return PTR_ERR(devlink);
}
+ if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
+ mutex_lock(&devlink->lock);
if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEVLINK) {
info->user_ptr[0] = devlink;
} else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) {
struct devlink_port *devlink_port;
- mutex_lock(&devlink_port_mutex);
devlink_port = devlink_port_get_from_info(devlink, info);
if (IS_ERR(devlink_port)) {
- mutex_unlock(&devlink_port_mutex);
- mutex_unlock(&devlink_mutex);
- return PTR_ERR(devlink_port);
+ err = PTR_ERR(devlink_port);
+ goto unlock;
}
info->user_ptr[0] = devlink_port;
}
- if (ops->internal_flags & DEVLINK_NL_FLAG_LOCK_PORTS) {
- mutex_lock(&devlink_port_mutex);
- }
if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_SB) {
struct devlink_sb *devlink_sb;
devlink_sb = devlink_sb_get_from_info(devlink, info);
if (IS_ERR(devlink_sb)) {
- if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT)
- mutex_unlock(&devlink_port_mutex);
- mutex_unlock(&devlink_mutex);
- return PTR_ERR(devlink_sb);
+ err = PTR_ERR(devlink_sb);
+ goto unlock;
}
info->user_ptr[1] = devlink_sb;
}
return 0;
+
+unlock:
+ if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
+ mutex_unlock(&devlink->lock);
+ mutex_unlock(&devlink_mutex);
+ return err;
}
static void devlink_nl_post_doit(const struct genl_ops *ops,
struct sk_buff *skb, struct genl_info *info)
{
- if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT ||
- ops->internal_flags & DEVLINK_NL_FLAG_LOCK_PORTS)
- mutex_unlock(&devlink_port_mutex);
+ struct devlink *devlink;
+
+ devlink = devlink_get_from_info(info);
+ if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
+ mutex_unlock(&devlink->lock);
mutex_unlock(&devlink_mutex);
}
@@ -614,10 +614,10 @@ static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg,
int err;
mutex_lock(&devlink_mutex);
- mutex_lock(&devlink_port_mutex);
list_for_each_entry(devlink, &devlink_list, list) {
if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
continue;
+ mutex_lock(&devlink->lock);
list_for_each_entry(devlink_port, &devlink->port_list, list) {
if (idx < start) {
idx++;
@@ -628,13 +628,15 @@ static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
NLM_F_MULTI);
- if (err)
+ if (err) {
+ mutex_unlock(&devlink->lock);
goto out;
+ }
idx++;
}
+ mutex_unlock(&devlink->lock);
}
out:
- mutex_unlock(&devlink_port_mutex);
mutex_unlock(&devlink_mutex);
cb->args[0] = idx;
@@ -801,6 +803,7 @@ static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg,
list_for_each_entry(devlink, &devlink_list, list) {
if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
continue;
+ mutex_lock(&devlink->lock);
list_for_each_entry(devlink_sb, &devlink->sb_list, list) {
if (idx < start) {
idx++;
@@ -811,10 +814,13 @@ static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
NLM_F_MULTI);
- if (err)
+ if (err) {
+ mutex_unlock(&devlink->lock);
goto out;
+ }
idx++;
}
+ mutex_unlock(&devlink->lock);
}
out:
mutex_unlock(&devlink_mutex);
@@ -935,14 +941,18 @@ static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg,
if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
!devlink->ops || !devlink->ops->sb_pool_get)
continue;
+ mutex_lock(&devlink->lock);
list_for_each_entry(devlink_sb, &devlink->sb_list, list) {
err = __sb_pool_get_dumpit(msg, start, &idx, devlink,
devlink_sb,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq);
- if (err && err != -EOPNOTSUPP)
+ if (err && err != -EOPNOTSUPP) {
+ mutex_unlock(&devlink->lock);
goto out;
+ }
}
+ mutex_unlock(&devlink->lock);
}
out:
mutex_unlock(&devlink_mutex);
@@ -1123,22 +1133,24 @@ static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg,
int err;
mutex_lock(&devlink_mutex);
- mutex_lock(&devlink_port_mutex);
list_for_each_entry(devlink, &devlink_list, list) {
if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
!devlink->ops || !devlink->ops->sb_port_pool_get)
continue;
+ mutex_lock(&devlink->lock);
list_for_each_entry(devlink_sb, &devlink->sb_list, list) {
err = __sb_port_pool_get_dumpit(msg, start, &idx,
devlink, devlink_sb,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq);
- if (err && err != -EOPNOTSUPP)
+ if (err && err != -EOPNOTSUPP) {
+ mutex_unlock(&devlink->lock);
goto out;
+ }
}
+ mutex_unlock(&devlink->lock);
}
out:
- mutex_unlock(&devlink_port_mutex);
mutex_unlock(&devlink_mutex);
cb->args[0] = idx;
@@ -1347,23 +1359,26 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg,
int err;
mutex_lock(&devlink_mutex);
- mutex_lock(&devlink_port_mutex);
list_for_each_entry(devlink, &devlink_list, list) {
if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
!devlink->ops || !devlink->ops->sb_tc_pool_bind_get)
continue;
+
+ mutex_lock(&devlink->lock);
list_for_each_entry(devlink_sb, &devlink->sb_list, list) {
err = __sb_tc_pool_bind_get_dumpit(msg, start, &idx,
devlink,
devlink_sb,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq);
- if (err && err != -EOPNOTSUPP)
+ if (err && err != -EOPNOTSUPP) {
+ mutex_unlock(&devlink->lock);
goto out;
+ }
}
+ mutex_unlock(&devlink->lock);
}
out:
- mutex_unlock(&devlink_port_mutex);
mutex_unlock(&devlink_mutex);
cb->args[0] = idx;
@@ -2322,14 +2337,16 @@ static const struct genl_ops devlink_nl_ops[] = {
.doit = devlink_nl_cmd_port_split_doit,
.policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
- .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
+ DEVLINK_NL_FLAG_NO_LOCK,
},
{
.cmd = DEVLINK_CMD_PORT_UNSPLIT,
.doit = devlink_nl_cmd_port_unsplit_doit,
.policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
- .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
+ DEVLINK_NL_FLAG_NO_LOCK,
},
{
.cmd = DEVLINK_CMD_SB_GET,
@@ -2397,8 +2414,7 @@ static const struct genl_ops devlink_nl_ops[] = {
.policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
- DEVLINK_NL_FLAG_NEED_SB |
- DEVLINK_NL_FLAG_LOCK_PORTS,
+ DEVLINK_NL_FLAG_NEED_SB,
},
{
.cmd = DEVLINK_CMD_SB_OCC_MAX_CLEAR,
@@ -2406,8 +2422,7 @@ static const struct genl_ops devlink_nl_ops[] = {
.policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
- DEVLINK_NL_FLAG_NEED_SB |
- DEVLINK_NL_FLAG_LOCK_PORTS,
+ DEVLINK_NL_FLAG_NEED_SB,
},
{
.cmd = DEVLINK_CMD_ESWITCH_GET,
@@ -2488,6 +2503,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
INIT_LIST_HEAD(&devlink->port_list);
INIT_LIST_HEAD(&devlink->sb_list);
INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list);
+ mutex_init(&devlink->lock);
return devlink;
}
EXPORT_SYMBOL_GPL(devlink_alloc);
@@ -2550,16 +2566,16 @@ int devlink_port_register(struct devlink *devlink,
struct devlink_port *devlink_port,
unsigned int port_index)
{
- mutex_lock(&devlink_port_mutex);
+ mutex_lock(&devlink->lock);
if (devlink_port_index_exists(devlink, port_index)) {
- mutex_unlock(&devlink_port_mutex);
+ mutex_unlock(&devlink->lock);
return -EEXIST;
}
devlink_port->devlink = devlink;
devlink_port->index = port_index;
devlink_port->registered = true;
list_add_tail(&devlink_port->list, &devlink->port_list);
- mutex_unlock(&devlink_port_mutex);
+ mutex_unlock(&devlink->lock);
devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
return 0;
}
@@ -2572,10 +2588,12 @@ EXPORT_SYMBOL_GPL(devlink_port_register);
*/
void devlink_port_unregister(struct devlink_port *devlink_port)
{
+ struct devlink *devlink = devlink_port->devlink;
+
devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL);
- mutex_lock(&devlink_port_mutex);
+ mutex_lock(&devlink->lock);
list_del(&devlink_port->list);
- mutex_unlock(&devlink_port_mutex);
+ mutex_unlock(&devlink->lock);
}
EXPORT_SYMBOL_GPL(devlink_port_unregister);
@@ -2651,7 +2669,7 @@ int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
struct devlink_sb *devlink_sb;
int err = 0;
- mutex_lock(&devlink_mutex);
+ mutex_lock(&devlink->lock);
if (devlink_sb_index_exists(devlink, sb_index)) {
err = -EEXIST;
goto unlock;
@@ -2670,7 +2688,7 @@ int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
devlink_sb->egress_tc_count = egress_tc_count;
list_add_tail(&devlink_sb->list, &devlink->sb_list);
unlock:
- mutex_unlock(&devlink_mutex);
+ mutex_unlock(&devlink->lock);
return err;
}
EXPORT_SYMBOL_GPL(devlink_sb_register);
@@ -2679,11 +2697,11 @@ void devlink_sb_unregister(struct devlink *devlink, unsigned int sb_index)
{
struct devlink_sb *devlink_sb;
- mutex_lock(&devlink_mutex);
+ mutex_lock(&devlink->lock);
devlink_sb = devlink_sb_get_by_index(devlink, sb_index);
WARN_ON(!devlink_sb);
list_del(&devlink_sb->list);
- mutex_unlock(&devlink_mutex);
+ mutex_unlock(&devlink->lock);
kfree(devlink_sb);
}
EXPORT_SYMBOL_GPL(devlink_sb_unregister);
@@ -2699,9 +2717,9 @@ EXPORT_SYMBOL_GPL(devlink_sb_unregister);
int devlink_dpipe_headers_register(struct devlink *devlink,
struct devlink_dpipe_headers *dpipe_headers)
{
- mutex_lock(&devlink_mutex);
+ mutex_lock(&devlink->lock);
devlink->dpipe_headers = dpipe_headers;
- mutex_unlock(&devlink_mutex);
+ mutex_unlock(&devlink->lock);
return 0;
}
EXPORT_SYMBOL_GPL(devlink_dpipe_headers_register);
@@ -2715,9 +2733,9 @@ EXPORT_SYMBOL_GPL(devlink_dpipe_headers_register);
*/
void devlink_dpipe_headers_unregister(struct devlink *devlink)
{
- mutex_lock(&devlink_mutex);
+ mutex_lock(&devlink->lock);
devlink->dpipe_headers = NULL;
- mutex_unlock(&devlink_mutex);
+ mutex_unlock(&devlink->lock);
}
EXPORT_SYMBOL_GPL(devlink_dpipe_headers_unregister);
@@ -2783,9 +2801,9 @@ int devlink_dpipe_table_register(struct devlink *devlink,
table->priv = priv;
table->counter_control_extern = counter_control_extern;
- mutex_lock(&devlink_mutex);
+ mutex_lock(&devlink->lock);
list_add_tail_rcu(&table->list, &devlink->dpipe_table_list);
- mutex_unlock(&devlink_mutex);
+ mutex_unlock(&devlink->lock);
return 0;
}
EXPORT_SYMBOL_GPL(devlink_dpipe_table_register);
@@ -2801,17 +2819,17 @@ void devlink_dpipe_table_unregister(struct devlink *devlink,
{
struct devlink_dpipe_table *table;
- mutex_lock(&devlink_mutex);
+ mutex_lock(&devlink->lock);
table = devlink_dpipe_table_find(&devlink->dpipe_table_list,
table_name);
if (!table)
goto unlock;
list_del_rcu(&table->list);
- mutex_unlock(&devlink_mutex);
+ mutex_unlock(&devlink->lock);
kfree_rcu(table, rcu);
return;
unlock:
- mutex_unlock(&devlink_mutex);
+ mutex_unlock(&devlink->lock);
}
EXPORT_SYMBOL_GPL(devlink_dpipe_table_unregister);
--
2.9.5
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [patch net-next v2 02/10] devlink: Add support for resource abstraction
2017-12-26 11:23 [patch net-next v2 00/10] Add support for resource abstraction Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 01/10] devlink: Add per devlink instance lock Jiri Pirko
@ 2017-12-26 11:23 ` Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 03/10] devlink: Add support for reload Jiri Pirko
` (9 subsequent siblings)
11 siblings, 0 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-26 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, dsa, roopa
From: Arkadi Sharshevsky <arkadis@mellanox.com>
Add support for hardware resource abstraction over devlink. Each resource
is identified via id, furthermore it contains information regarding its
size and its related sub resources. Each resource can also provide its
current occupancy.
In some cases the sizes of some resources can be changed, yet for those
changes to take place a hot driver reload may be needed. The reload
capability will be introduced in the next patch.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v1->v2
- Add resource unit.
---
include/net/devlink.h | 82 ++++++++++
include/uapi/linux/devlink.h | 15 ++
net/core/devlink.c | 358 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 455 insertions(+)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 4d2c6fc..76ab373 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -26,6 +26,7 @@ struct devlink {
struct list_head port_list;
struct list_head sb_list;
struct list_head dpipe_table_list;
+ struct list_head resource_list;
struct devlink_dpipe_headers *dpipe_headers;
const struct devlink_ops *ops;
struct device *dev;
@@ -224,6 +225,47 @@ struct devlink_dpipe_headers {
unsigned int headers_count;
};
+/**
+ * struct devlink_resource_ops - resource ops
+ * @occ_get: get the occupied size
+ * @size_validate: validate the size of the resource before update, reload
+ * is needed for changes to take place
+ */
+struct devlink_resource_ops {
+ u64 (*occ_get)(struct devlink *devlink);
+ int (*size_validate)(struct devlink *devlink, u64 size,
+ struct netlink_ext_ack *extack);
+};
+
+/**
+ * struct devlink_resource - devlink resource
+ * @name: name of the resource
+ * @size: size of the resource
+ * @size_new: updated size of the resource, reload is needed
+ * @size_valid: valid in case the total size of the resource is valid
+ * including its children
+ * @id: id, per devlink instance
+ * @unit: resource's unit
+ * @parent: parent resource
+ * @list: parent list
+ * @resource_list: list of child resources
+ * @resource_ops: resource ops
+ */
+struct devlink_resource {
+ const char *name;
+ u64 size;
+ u64 size_new;
+ bool size_valid;
+ u64 id;
+ enum devlink_resource_unit unit;
+ struct devlink_resource *parent;
+ struct list_head list;
+ struct list_head resource_list;
+ const struct devlink_resource_ops *resource_ops;
+};
+
+#define DEVLINK_RESOURCE_ID_PARENT_TOP 0
+
struct devlink_ops {
int (*port_type_set)(struct devlink_port *devlink_port,
enum devlink_port_type port_type);
@@ -333,6 +375,20 @@ extern struct devlink_dpipe_header devlink_dpipe_header_ethernet;
extern struct devlink_dpipe_header devlink_dpipe_header_ipv4;
extern struct devlink_dpipe_header devlink_dpipe_header_ipv6;
+int devlink_resource_register(struct devlink *devlink,
+ const char *resource_name,
+ bool top_hierarchy,
+ u64 resource_size,
+ u64 resource_id,
+ u64 parent_resource_id,
+ enum devlink_resource_unit unit,
+ const struct devlink_resource_ops *resource_ops);
+void devlink_resources_unregister(struct devlink *devlink,
+ struct devlink_resource *resource);
+int devlink_resource_size_get(struct devlink *devlink,
+ u64 resource_id,
+ u64 *p_resource_size);
+
#else
static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -469,6 +525,32 @@ devlink_dpipe_match_put(struct sk_buff *skb,
return 0;
}
+static inline int
+devlink_resource_register(struct devlink *devlink,
+ const char *resource_name,
+ bool top_hierarchy,
+ u64 resource_size,
+ u64 resource_id,
+ u64 parent_resource_id,
+ enum devlink_resource_unit unit,
+ const struct devlink_resource_ops *resource_ops)
+{
+ return 0;
+}
+
+static inline void
+devlink_resources_unregister(struct devlink *devlink,
+ struct devlink_resource *resource)
+{
+}
+
+static inline int
+devlink_resource_size_get(struct devlink *devlink, u64 resource_id,
+ u64 *p_resource_size)
+{
+ return -EOPNOTSUPP;
+}
+
#endif
#endif /* _NET_DEVLINK_H_ */
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 6665df6..1d59d8e 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -70,6 +70,8 @@ enum devlink_command {
DEVLINK_CMD_DPIPE_ENTRIES_GET,
DEVLINK_CMD_DPIPE_HEADERS_GET,
DEVLINK_CMD_DPIPE_TABLE_COUNTERS_SET,
+ DEVLINK_CMD_RESOURCE_SET,
+ DEVLINK_CMD_RESOURCE_DUMP,
/* add new commands above here */
__DEVLINK_CMD_MAX,
@@ -202,6 +204,15 @@ enum devlink_attr {
DEVLINK_ATTR_PAD,
DEVLINK_ATTR_ESWITCH_ENCAP_MODE, /* u8 */
+ DEVLINK_ATTR_RESOURCE_LIST, /* nested */
+ DEVLINK_ATTR_RESOURCE, /* nested */
+ DEVLINK_ATTR_RESOURCE_NAME, /* string */
+ DEVLINK_ATTR_RESOURCE_SIZE, /* u64 */
+ DEVLINK_ATTR_RESOURCE_SIZE_NEW, /* u64 */
+ DEVLINK_ATTR_RESOURCE_SIZE_VALID, /* u8 */
+ DEVLINK_ATTR_RESOURCE_UNIT, /* u8 */
+ DEVLINK_ATTR_RESOURCE_OCC, /* u64 */
+ DEVLINK_ATTR_RESOURCE_ID, /* u64 */
/* add new attributes above here, update the policy in devlink.c */
@@ -245,4 +256,8 @@ enum devlink_dpipe_header_id {
DEVLINK_DPIPE_HEADER_IPV6,
};
+enum devlink_resource_unit {
+ DEVLINK_RESOURCE_UNIT_DOUBLE_WORD,
+};
+
#endif /* _UAPI_LINUX_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 2f71734..733a9c4 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2288,6 +2288,217 @@ static int devlink_nl_cmd_dpipe_table_counters_set(struct sk_buff *skb,
counters_enable);
}
+struct devlink_resource *
+devlink_resource_find(struct devlink *devlink,
+ struct devlink_resource *resource, u64 resource_id)
+{
+ struct list_head *resource_list;
+
+ if (resource)
+ resource_list = &resource->resource_list;
+ else
+ resource_list = &devlink->resource_list;
+
+ list_for_each_entry(resource, resource_list, list) {
+ struct devlink_resource *__resource;
+
+ if (resource->id == resource_id)
+ return resource;
+
+ __resource = devlink_resource_find(devlink, resource,
+ resource_id);
+ if (__resource)
+ return __resource;
+ }
+ return NULL;
+}
+
+void devlink_resource_validate_children(struct devlink_resource *resource)
+{
+ struct devlink_resource *__resource;
+ bool size_valid = true;
+ u64 parts_size = 0;
+
+ if (list_empty(&resource->resource_list))
+ goto out;
+
+ list_for_each_entry(__resource, &resource->resource_list, list)
+ parts_size += __resource->size_new;
+
+ if (parts_size > resource->size)
+ size_valid = false;
+out:
+ resource->size_valid = size_valid;
+}
+
+static int devlink_nl_cmd_resource_set(struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct devlink *devlink = info->user_ptr[0];
+ struct devlink_resource *resource;
+ u64 resource_id;
+ u64 size;
+ int err;
+
+ if (!info->attrs[DEVLINK_ATTR_RESOURCE_ID] ||
+ !info->attrs[DEVLINK_ATTR_RESOURCE_SIZE])
+ return -EINVAL;
+ resource_id = nla_get_u64(info->attrs[DEVLINK_ATTR_RESOURCE_ID]);
+
+ resource = devlink_resource_find(devlink, NULL, resource_id);
+ if (!resource)
+ return -EINVAL;
+
+ if (!resource->resource_ops->size_validate)
+ return -EINVAL;
+
+ size = nla_get_u64(info->attrs[DEVLINK_ATTR_RESOURCE_SIZE]);
+ err = resource->resource_ops->size_validate(devlink, size,
+ info->extack);
+ if (err)
+ return err;
+
+ resource->size_new = size;
+ devlink_resource_validate_children(resource);
+ if (resource->parent)
+ devlink_resource_validate_children(resource->parent);
+ return 0;
+}
+
+static int devlink_resource_put(struct devlink *devlink, struct sk_buff *skb,
+ struct devlink_resource *resource)
+{
+ struct devlink_resource *__resource;
+ struct nlattr *__resource_attr;
+ struct nlattr *resource_attr;
+
+ resource_attr = nla_nest_start(skb, DEVLINK_ATTR_RESOURCE);
+ if (!resource_attr)
+ return -EMSGSIZE;
+
+ if (nla_put_string(skb, DEVLINK_ATTR_RESOURCE_NAME, resource->name) ||
+ nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_SIZE, resource->size,
+ DEVLINK_ATTR_PAD) ||
+ nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_ID, resource->id,
+ DEVLINK_ATTR_PAD) ||
+ nla_put_u8(skb, DEVLINK_ATTR_RESOURCE_UNIT, resource->unit))
+ goto nla_put_failure;
+ if (resource->size != resource->size_new)
+ nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_SIZE_NEW,
+ resource->size_new, DEVLINK_ATTR_PAD);
+ if (resource->resource_ops && resource->resource_ops->occ_get)
+ nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_OCC,
+ resource->resource_ops->occ_get(devlink),
+ DEVLINK_ATTR_PAD);
+ if (list_empty(&resource->resource_list))
+ goto out;
+
+ if (nla_put_u8(skb, DEVLINK_ATTR_RESOURCE_SIZE_VALID,
+ resource->size_valid))
+ goto nla_put_failure;
+
+ __resource_attr = nla_nest_start(skb, DEVLINK_ATTR_RESOURCE_LIST);
+ if (!__resource_attr)
+ goto nla_put_failure;
+
+ list_for_each_entry(__resource, &resource->resource_list, list) {
+ if (devlink_resource_put(devlink, skb, __resource))
+ goto resource_put_failure;
+ }
+
+ nla_nest_end(skb, __resource_attr);
+out:
+ nla_nest_end(skb, resource_attr);
+ return 0;
+
+resource_put_failure:
+ nla_nest_cancel(skb, __resource_attr);
+nla_put_failure:
+ nla_nest_cancel(skb, resource_attr);
+ return -EMSGSIZE;
+}
+
+static int devlink_resource_fill(struct genl_info *info,
+ enum devlink_command cmd, int flags)
+{
+ struct devlink *devlink = info->user_ptr[0];
+ struct devlink_resource *resource;
+ struct nlattr *resources_attr;
+ struct sk_buff *skb = NULL;
+ struct nlmsghdr *nlh;
+ bool incomplete;
+ void *hdr;
+ int i;
+ int err;
+
+ resource = list_first_entry(&devlink->resource_list,
+ struct devlink_resource, list);
+start_again:
+ err = devlink_dpipe_send_and_alloc_skb(&skb, info);
+ if (err)
+ return err;
+
+ hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq,
+ &devlink_nl_family, NLM_F_MULTI, cmd);
+ if (!hdr) {
+ nlmsg_free(skb);
+ return -EMSGSIZE;
+ }
+
+ if (devlink_nl_put_handle(skb, devlink))
+ goto nla_put_failure;
+
+ resources_attr = nla_nest_start(skb, DEVLINK_ATTR_RESOURCE_LIST);
+ if (!resources_attr)
+ goto nla_put_failure;
+
+ incomplete = false;
+ i = 0;
+ list_for_each_entry_from(resource, &devlink->resource_list, list) {
+ err = devlink_resource_put(devlink, skb, resource);
+ if (err) {
+ if (!i)
+ goto err_resource_put;
+ incomplete = true;
+ break;
+ }
+ i++;
+ }
+ nla_nest_end(skb, resources_attr);
+ genlmsg_end(skb, hdr);
+ if (incomplete)
+ goto start_again;
+send_done:
+ nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq,
+ NLMSG_DONE, 0, flags | NLM_F_MULTI);
+ if (!nlh) {
+ err = devlink_dpipe_send_and_alloc_skb(&skb, info);
+ if (err)
+ goto err_skb_send_alloc;
+ goto send_done;
+ }
+ return genlmsg_reply(skb, info);
+
+nla_put_failure:
+ err = -EMSGSIZE;
+err_resource_put:
+err_skb_send_alloc:
+ genlmsg_cancel(skb, hdr);
+ nlmsg_free(skb);
+ return err;
+}
+
+static int devlink_nl_cmd_resource_dump(struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct devlink *devlink = info->user_ptr[0];
+
+ if (list_empty(&devlink->resource_list))
+ return -EOPNOTSUPP;
+
+ return devlink_resource_fill(info, DEVLINK_CMD_RESOURCE_DUMP, 0);
+}
+
static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -2306,6 +2517,8 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = { .type = NLA_U8 },
[DEVLINK_ATTR_DPIPE_TABLE_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED] = { .type = NLA_U8 },
+ [DEVLINK_ATTR_RESOURCE_ID] = { .type = NLA_U64},
+ [DEVLINK_ATTR_RESOURCE_SIZE] = { .type = NLA_U64},
};
static const struct genl_ops devlink_nl_ops[] = {
@@ -2466,6 +2679,20 @@ static const struct genl_ops devlink_nl_ops[] = {
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
+ {
+ .cmd = DEVLINK_CMD_RESOURCE_SET,
+ .doit = devlink_nl_cmd_resource_set,
+ .policy = devlink_nl_policy,
+ .flags = GENL_ADMIN_PERM,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+ },
+ {
+ .cmd = DEVLINK_CMD_RESOURCE_DUMP,
+ .doit = devlink_nl_cmd_resource_dump,
+ .policy = devlink_nl_policy,
+ .flags = GENL_ADMIN_PERM,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+ },
};
static struct genl_family devlink_nl_family __ro_after_init = {
@@ -2503,6 +2730,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
INIT_LIST_HEAD(&devlink->port_list);
INIT_LIST_HEAD(&devlink->sb_list);
INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list);
+ INIT_LIST_HEAD(&devlink->resource_list);
mutex_init(&devlink->lock);
return devlink;
}
@@ -2833,6 +3061,136 @@ void devlink_dpipe_table_unregister(struct devlink *devlink,
}
EXPORT_SYMBOL_GPL(devlink_dpipe_table_unregister);
+/**
+ * devlink_resource_register - devlink resource register
+ *
+ * @devlink: devlink
+ * @resource_name: resource's name
+ * @top_hierarchy: top hierarchy
+ * @reload_required: reload is required for new configuration to
+ * apply
+ * @resource_size: resource's size
+ * @resource_id: resource's id
+ * @parent_reosurce_id: resource's parent id
+ * @unit: resource's unit type
+ * @resource_ops: resource ops
+ */
+int devlink_resource_register(struct devlink *devlink,
+ const char *resource_name,
+ bool top_hierarchy,
+ u64 resource_size,
+ u64 resource_id,
+ u64 parent_resource_id,
+ enum devlink_resource_unit unit,
+ const struct devlink_resource_ops *resource_ops)
+{
+ struct devlink_resource *resource;
+ struct list_head *resource_list;
+ int err = 0;
+
+ mutex_lock(&devlink->lock);
+ resource = devlink_resource_find(devlink, NULL, resource_id);
+ if (resource) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ resource = kzalloc(sizeof(*resource), GFP_KERNEL);
+ if (!resource) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ if (top_hierarchy) {
+ resource_list = &devlink->resource_list;
+ } else {
+ struct devlink_resource *parent_resource;
+
+ parent_resource = devlink_resource_find(devlink, NULL,
+ parent_resource_id);
+ if (parent_resource) {
+ resource_list = &parent_resource->resource_list;
+ resource->parent = parent_resource;
+ } else {
+ err = -EINVAL;
+ goto out;
+ }
+ }
+
+ resource->name = resource_name;
+ resource->size = resource_size;
+ resource->size_new = resource_size;
+ resource->id = resource_id;
+ resource->unit = unit;
+ resource->resource_ops = resource_ops;
+ resource->size_valid = true;
+ INIT_LIST_HEAD(&resource->resource_list);
+ list_add_tail(&resource->list, resource_list);
+out:
+ mutex_unlock(&devlink->lock);
+ return err;
+}
+EXPORT_SYMBOL_GPL(devlink_resource_register);
+
+/**
+ * devlink_resources_unregister - free all resources
+ *
+ * @devlink: devlink
+ * @resource: resource
+ */
+void devlink_resources_unregister(struct devlink *devlink,
+ struct devlink_resource *resource)
+{
+ struct devlink_resource *tmp, *__resource;
+ struct list_head *resource_list;
+
+ if (resource)
+ resource_list = &resource->resource_list;
+ else
+ resource_list = &devlink->resource_list;
+
+ if (!resource)
+ mutex_lock(&devlink->lock);
+
+ list_for_each_entry_safe(__resource, tmp, resource_list, list) {
+ devlink_resources_unregister(devlink, __resource);
+ list_del(&__resource->list);
+ kfree(__resource);
+ }
+
+ if (!resource)
+ mutex_unlock(&devlink->lock);
+}
+EXPORT_SYMBOL_GPL(devlink_resources_unregister);
+
+/**
+ * devlink_resource_size_get - get and update size
+ *
+ * @devlink: devlink
+ * @resource_id: the requested resource id
+ * @p_resource_size: ptr to update
+ */
+int devlink_resource_size_get(struct devlink *devlink,
+ u64 resource_id,
+ u64 *p_resource_size)
+{
+ struct devlink_resource *resource;
+ int err = 0;
+
+ mutex_lock(&devlink->lock);
+ resource = devlink_resource_find(devlink, NULL, resource_id);
+ if (!resource) {
+ err = -EINVAL;
+ goto out;
+ }
+ *p_resource_size = resource->size_new;
+ resource->size = resource->size_new;
+out:
+ mutex_unlock(&devlink->lock);
+ return err;
+}
+EXPORT_SYMBOL_GPL(devlink_resource_size_get);
+
static int __init devlink_module_init(void)
{
return genl_register_family(&devlink_nl_family);
--
2.9.5
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [patch net-next v2 03/10] devlink: Add support for reload
2017-12-26 11:23 [patch net-next v2 00/10] Add support for resource abstraction Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 01/10] devlink: Add per devlink instance lock Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 02/10] devlink: Add support for resource abstraction Jiri Pirko
@ 2017-12-26 11:23 ` Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 04/10] devlink: Add relation between dpipe and resource Jiri Pirko
` (8 subsequent siblings)
11 siblings, 0 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-26 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, dsa, roopa
From: Arkadi Sharshevsky <arkadis@mellanox.com>
Add support for performing driver hot reload.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v1->v2
- Use the NO_LOCK flag instead of releasing the lock manually.
---
include/net/devlink.h | 1 +
include/uapi/linux/devlink.h | 5 +++++
net/core/devlink.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 54 insertions(+)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 76ab373..7a06ec6 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -267,6 +267,7 @@ struct devlink_resource {
#define DEVLINK_RESOURCE_ID_PARENT_TOP 0
struct devlink_ops {
+ int (*reload)(struct devlink *devlink);
int (*port_type_set)(struct devlink_port *devlink_port,
enum devlink_port_type port_type);
int (*port_split)(struct devlink *devlink, unsigned int port_index,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 1d59d8e..fcb86a6 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -73,6 +73,11 @@ enum devlink_command {
DEVLINK_CMD_RESOURCE_SET,
DEVLINK_CMD_RESOURCE_DUMP,
+ /* Hot driver reload, makes configuration changes take place. The
+ * devlink instance is not released during the process.
+ */
+ DEVLINK_CMD_RELOAD,
+
/* add new commands above here */
__DEVLINK_CMD_MAX,
DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 733a9c4..40ffa07 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2499,6 +2499,46 @@ static int devlink_nl_cmd_resource_dump(struct sk_buff *skb,
return devlink_resource_fill(info, DEVLINK_CMD_RESOURCE_DUMP, 0);
}
+static int
+devlink_resources_validate(struct devlink *devlink,
+ struct devlink_resource *resource,
+ struct genl_info *info)
+{
+ struct list_head *resource_list;
+ int err = 0;
+
+ if (resource)
+ resource_list = &resource->resource_list;
+ else
+ resource_list = &devlink->resource_list;
+
+ list_for_each_entry(resource, resource_list, list) {
+ if (!resource->size_valid)
+ return -EINVAL;
+ err = devlink_resources_validate(devlink, resource, info);
+ if (err)
+ return err;
+ }
+ return err;
+}
+
+static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
+{
+ struct devlink *devlink = info->user_ptr[0];
+ int err;
+
+ if (!devlink->ops->reload)
+ return -EOPNOTSUPP;
+
+ err = devlink_resources_validate(devlink, NULL, info);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(info->extack,
+ "resources size validation failed");
+ return err;
+ }
+ return devlink->ops->reload(devlink);
+}
+
static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -2693,6 +2733,14 @@ static const struct genl_ops devlink_nl_ops[] = {
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
+ {
+ .cmd = DEVLINK_CMD_RELOAD,
+ .doit = devlink_nl_cmd_reload,
+ .policy = devlink_nl_policy,
+ .flags = GENL_ADMIN_PERM,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
+ DEVLINK_NL_FLAG_NO_LOCK,
+ },
};
static struct genl_family devlink_nl_family __ro_after_init = {
--
2.9.5
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [patch net-next v2 04/10] devlink: Add relation between dpipe and resource
2017-12-26 11:23 [patch net-next v2 00/10] Add support for resource abstraction Jiri Pirko
` (2 preceding siblings ...)
2017-12-26 11:23 ` [patch net-next v2 03/10] devlink: Add support for reload Jiri Pirko
@ 2017-12-26 11:23 ` Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 05/10] mlxsw: pci: Add support for performing bus reset Jiri Pirko
` (7 subsequent siblings)
11 siblings, 0 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-26 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, dsa, roopa
From: Arkadi Sharshevsky <arkadis@mellanox.com>
The hardware processes which are modeled via dpipe commonly use some
internal hardware resources. Such relation can improve the understanding
of hardware limitations.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
include/net/devlink.h | 13 +++++++++++++
include/uapi/linux/devlink.h | 1 +
net/core/devlink.c | 31 +++++++++++++++++++++++++++++++
3 files changed, 45 insertions(+)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 7a06ec6..5cb061c 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -183,6 +183,8 @@ struct devlink_dpipe_table_ops;
* @counters_enabled: indicates if counters are active
* @counter_control_extern: indicates if counter control is in dpipe or
* external tool
+ * @resource_valid: Indicate that the resource id is valid
+ * @resource_id: relative resource this table is related to
* @table_ops: table operations
* @rcu: rcu
*/
@@ -192,6 +194,8 @@ struct devlink_dpipe_table {
const char *name;
bool counters_enabled;
bool counter_control_extern;
+ bool resource_valid;
+ u64 resource_id;
struct devlink_dpipe_table_ops *table_ops;
struct rcu_head rcu;
};
@@ -389,6 +393,8 @@ void devlink_resources_unregister(struct devlink *devlink,
int devlink_resource_size_get(struct devlink *devlink,
u64 resource_id,
u64 *p_resource_size);
+int devlink_dpipe_table_resource_set(struct devlink *devlink,
+ const char *table_name, u64 resource_id);
#else
@@ -552,6 +558,13 @@ devlink_resource_size_get(struct devlink *devlink, u64 resource_id,
return -EOPNOTSUPP;
}
+static inline int
+devlink_dpipe_table_resource_set(struct devlink *devlink,
+ const char *table_name, u64 resource_id)
+{
+ return -EOPNOTSUPP;
+}
+
#endif
#endif /* _NET_DEVLINK_H_ */
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index fcb86a6..1b474d69 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -218,6 +218,7 @@ enum devlink_attr {
DEVLINK_ATTR_RESOURCE_UNIT, /* u8 */
DEVLINK_ATTR_RESOURCE_OCC, /* u64 */
DEVLINK_ATTR_RESOURCE_ID, /* u64 */
+ DEVLINK_ATTR_DPIPE_TABLE_RESOURCE_ID, /* u64 */
/* add new attributes above here, update the policy in devlink.c */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 40ffa07..a257e88 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1694,6 +1694,9 @@ static int devlink_dpipe_table_put(struct sk_buff *skb,
table->counters_enabled))
goto nla_put_failure;
+ if (table->resource_valid)
+ nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_TABLE_RESOURCE_ID,
+ table->resource_id, DEVLINK_ATTR_PAD);
if (devlink_dpipe_matches_put(table, skb))
goto nla_put_failure;
@@ -3239,6 +3242,34 @@ int devlink_resource_size_get(struct devlink *devlink,
}
EXPORT_SYMBOL_GPL(devlink_resource_size_get);
+/**
+ * devlink_dpipe_table_resource_set - set the resource id
+ *
+ * @devlink: devlink
+ * @table_name: table name
+ * @resource_id: resource id
+ */
+int devlink_dpipe_table_resource_set(struct devlink *devlink,
+ const char *table_name, u64 resource_id)
+{
+ struct devlink_dpipe_table *table;
+ int err = 0;
+
+ mutex_lock(&devlink->lock);
+ table = devlink_dpipe_table_find(&devlink->dpipe_table_list,
+ table_name);
+ if (!table) {
+ err = -EINVAL;
+ goto out;
+ }
+ table->resource_id = resource_id;
+ table->resource_valid = true;
+out:
+ mutex_unlock(&devlink->lock);
+ return err;
+}
+EXPORT_SYMBOL_GPL(devlink_dpipe_table_resource_set);
+
static int __init devlink_module_init(void)
{
return genl_register_family(&devlink_nl_family);
--
2.9.5
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [patch net-next v2 05/10] mlxsw: pci: Add support for performing bus reset
2017-12-26 11:23 [patch net-next v2 00/10] Add support for resource abstraction Jiri Pirko
` (3 preceding siblings ...)
2017-12-26 11:23 ` [patch net-next v2 04/10] devlink: Add relation between dpipe and resource Jiri Pirko
@ 2017-12-26 11:23 ` Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 06/10] mlxsw: spectrum: Register KVD resources with devlink Jiri Pirko
` (6 subsequent siblings)
11 siblings, 0 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-26 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, dsa, roopa
From: Arkadi Sharshevsky <arkadis@mellanox.com>
This is a preparation stage before introducing hot reload. During the
reload process the ASIC should be resetted by accessing the PCI BAR due
to unavailability of the mailbox/emad interfaces.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/core.h | 1 +
drivers/net/ethernet/mellanox/mlxsw/pci.c | 53 ++++++++++++++++++++++--------
2 files changed, 41 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 6e966af..34dda96 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -332,6 +332,7 @@ struct mlxsw_bus {
const struct mlxsw_config_profile *profile,
struct mlxsw_res *res);
void (*fini)(void *bus_priv);
+ void (*reset)(void *bus_priv);
bool (*skb_transmit_busy)(void *bus_priv,
const struct mlxsw_tx_info *tx_info);
int (*skb_transmit)(void *bus_priv, struct sk_buff *skb,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 23f7d82..39872fb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -154,6 +154,7 @@ struct mlxsw_pci {
} comp;
} cmd;
struct mlxsw_bus_info bus_info;
+ const struct pci_device_id *id;
};
static void mlxsw_pci_queue_tasklet_schedule(struct mlxsw_pci_queue *q)
@@ -1622,16 +1623,6 @@ static int mlxsw_pci_cmd_exec(void *bus_priv, u16 opcode, u8 opcode_mod,
return err;
}
-static const struct mlxsw_bus mlxsw_pci_bus = {
- .kind = "pci",
- .init = mlxsw_pci_init,
- .fini = mlxsw_pci_fini,
- .skb_transmit_busy = mlxsw_pci_skb_transmit_busy,
- .skb_transmit = mlxsw_pci_skb_transmit,
- .cmd_exec = mlxsw_pci_cmd_exec,
- .features = MLXSW_BUS_F_TXRX,
-};
-
static int mlxsw_pci_sw_reset(struct mlxsw_pci *mlxsw_pci,
const struct pci_device_id *id)
{
@@ -1655,6 +1646,41 @@ static int mlxsw_pci_sw_reset(struct mlxsw_pci *mlxsw_pci,
return 0;
}
+static void mlxsw_pci_free_irq_vectors(struct mlxsw_pci *mlxsw_pci)
+{
+ pci_free_irq_vectors(mlxsw_pci->pdev);
+}
+
+static int mlxsw_pci_alloc_irq_vectors(struct mlxsw_pci *mlxsw_pci)
+{
+ int err;
+
+ err = pci_alloc_irq_vectors(mlxsw_pci->pdev, 1, 1, PCI_IRQ_MSIX);
+ if (err < 0)
+ dev_err(&mlxsw_pci->pdev->dev, "MSI-X init failed\n");
+ return err;
+}
+
+static void mlxsw_pci_reset(void *bus_priv)
+{
+ struct mlxsw_pci *mlxsw_pci = bus_priv;
+
+ mlxsw_pci_free_irq_vectors(mlxsw_pci);
+ mlxsw_pci_sw_reset(mlxsw_pci, mlxsw_pci->id);
+ mlxsw_pci_alloc_irq_vectors(mlxsw_pci);
+}
+
+static const struct mlxsw_bus mlxsw_pci_bus = {
+ .kind = "pci",
+ .init = mlxsw_pci_init,
+ .fini = mlxsw_pci_fini,
+ .skb_transmit_busy = mlxsw_pci_skb_transmit_busy,
+ .skb_transmit = mlxsw_pci_skb_transmit,
+ .cmd_exec = mlxsw_pci_cmd_exec,
+ .features = MLXSW_BUS_F_TXRX,
+ .reset = mlxsw_pci_reset,
+};
+
static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
const char *driver_name = pdev->driver->name;
@@ -1716,7 +1742,7 @@ static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto err_sw_reset;
}
- err = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSIX);
+ err = mlxsw_pci_alloc_irq_vectors(mlxsw_pci);
if (err < 0) {
dev_err(&pdev->dev, "MSI-X init failed\n");
goto err_msix_init;
@@ -1725,6 +1751,7 @@ static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
mlxsw_pci->bus_info.device_kind = driver_name;
mlxsw_pci->bus_info.device_name = pci_name(mlxsw_pci->pdev);
mlxsw_pci->bus_info.dev = &pdev->dev;
+ mlxsw_pci->id = id;
err = mlxsw_core_bus_device_register(&mlxsw_pci->bus_info,
&mlxsw_pci_bus, mlxsw_pci);
@@ -1736,7 +1763,7 @@ static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
return 0;
err_bus_device_register:
- pci_free_irq_vectors(mlxsw_pci->pdev);
+ mlxsw_pci_free_irq_vectors(mlxsw_pci);
err_msix_init:
err_sw_reset:
iounmap(mlxsw_pci->hw_addr);
@@ -1756,7 +1783,7 @@ static void mlxsw_pci_remove(struct pci_dev *pdev)
struct mlxsw_pci *mlxsw_pci = pci_get_drvdata(pdev);
mlxsw_core_bus_device_unregister(mlxsw_pci->core);
- pci_free_irq_vectors(mlxsw_pci->pdev);
+ mlxsw_pci_free_irq_vectors(mlxsw_pci);
iounmap(mlxsw_pci->hw_addr);
pci_release_regions(mlxsw_pci->pdev);
pci_disable_device(mlxsw_pci->pdev);
--
2.9.5
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [patch net-next v2 06/10] mlxsw: spectrum: Register KVD resources with devlink
2017-12-26 11:23 [patch net-next v2 00/10] Add support for resource abstraction Jiri Pirko
` (4 preceding siblings ...)
2017-12-26 11:23 ` [patch net-next v2 05/10] mlxsw: pci: Add support for performing bus reset Jiri Pirko
@ 2017-12-26 11:23 ` Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 07/10] mlxsw: spectrum_dpipe: Connect dpipe tables to resources Jiri Pirko
` (5 subsequent siblings)
11 siblings, 0 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-26 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, dsa, roopa
From: Arkadi Sharshevsky <arkadis@mellanox.com>
Register the KVD resources with devlink. The KVD is a memory resource
which is subdivided into three partitions which are the linear, hash
single and hash double.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v1->v2
- Add resource unit of double word for all the resources.
---
drivers/net/ethernet/mellanox/mlxsw/core.c | 9 ++
drivers/net/ethernet/mellanox/mlxsw/core.h | 1 +
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 139 +++++++++++++++++++++++++
drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 12 +++
4 files changed, 161 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index f3315bc..2488662 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1012,6 +1012,12 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
if (err)
goto err_bus_init;
+ if (mlxsw_driver->resources_register) {
+ err = mlxsw_driver->resources_register(mlxsw_core);
+ if (err)
+ goto err_register_resources;
+ }
+
err = mlxsw_ports_init(mlxsw_core);
if (err)
goto err_ports_init;
@@ -1067,6 +1073,8 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
err_ports_init:
mlxsw_bus->fini(bus_priv);
err_bus_init:
+ devlink_resources_unregister(devlink, NULL);
+err_register_resources:
devlink_free(devlink);
err_devlink_alloc:
mlxsw_core_driver_put(device_kind);
@@ -1086,6 +1094,7 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
mlxsw_emad_fini(mlxsw_core);
kfree(mlxsw_core->lag.mapping);
mlxsw_ports_fini(mlxsw_core);
+ devlink_resources_unregister(devlink, NULL);
mlxsw_core->bus->fini(mlxsw_core->bus_priv);
devlink_free(devlink);
mlxsw_core_driver_put(device_kind);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 34dda96..e23f83b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -308,6 +308,7 @@ struct mlxsw_driver {
u32 *p_cur, u32 *p_max);
void (*txhdr_construct)(struct sk_buff *skb,
const struct mlxsw_tx_info *tx_info);
+ int (*resources_register)(struct mlxsw_core *mlxsw_core);
u8 txhdr_len;
const struct mlxsw_config_profile *profile;
};
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index d373df7..56d0e07 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -3979,6 +3979,144 @@ static const struct mlxsw_config_profile mlxsw_sp_config_profile = {
.resource_query_enable = 1,
};
+static bool
+mlxsw_sp_resource_kvd_granularity_validate(struct netlink_ext_ack *extack,
+ u64 size)
+{
+ const struct mlxsw_config_profile *profile;
+
+ profile = &mlxsw_sp_config_profile;
+ if (size % profile->kvd_hash_granularity) {
+ NL_SET_ERR_MSG_MOD(extack, "resource set with wrong granularity");
+ return false;
+ }
+ return true;
+}
+
+static int
+mlxsw_sp_resource_kvd_size_validate(struct devlink *devlink, u64 size,
+ struct netlink_ext_ack *extack)
+{
+ NL_SET_ERR_MSG_MOD(extack, "kvd size cannot be changed");
+ return -EINVAL;
+}
+
+static int
+mlxsw_sp_resource_kvd_linear_size_validate(struct devlink *devlink, u64 size,
+ struct netlink_ext_ack *extack)
+{
+ if (!mlxsw_sp_resource_kvd_granularity_validate(extack, size))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int
+mlxsw_sp_resource_kvd_hash_single_size_validate(struct devlink *devlink, u64 size,
+ struct netlink_ext_ack *extack)
+{
+ struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+
+ if (!mlxsw_sp_resource_kvd_granularity_validate(extack, size))
+ return -EINVAL;
+
+ if (size < MLXSW_CORE_RES_GET(mlxsw_core, KVD_SINGLE_MIN_SIZE)) {
+ NL_SET_ERR_MSG_MOD(extack, "hash single size is smaller than minimum");
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+mlxsw_sp_resource_kvd_hash_double_size_validate(struct devlink *devlink, u64 size,
+ struct netlink_ext_ack *extack)
+{
+ struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+
+ if (!mlxsw_sp_resource_kvd_granularity_validate(extack, size))
+ return -EINVAL;
+
+ if (size < MLXSW_CORE_RES_GET(mlxsw_core, KVD_DOUBLE_MIN_SIZE)) {
+ NL_SET_ERR_MSG_MOD(extack, "hash double size is smaller than minimum");
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static struct devlink_resource_ops mlxsw_sp_resource_kvd_ops = {
+ .size_validate = mlxsw_sp_resource_kvd_size_validate,
+};
+
+static struct devlink_resource_ops mlxsw_sp_resource_kvd_linear_ops = {
+ .size_validate = mlxsw_sp_resource_kvd_linear_size_validate,
+};
+
+static struct devlink_resource_ops mlxsw_sp_resource_kvd_hash_single_ops = {
+ .size_validate = mlxsw_sp_resource_kvd_hash_single_size_validate,
+};
+
+static struct devlink_resource_ops mlxsw_sp_resource_kvd_hash_double_ops = {
+ .size_validate = mlxsw_sp_resource_kvd_hash_double_size_validate,
+};
+
+static int mlxsw_sp_resources_register(struct mlxsw_core *mlxsw_core)
+{
+ struct devlink *devlink = priv_to_devlink(mlxsw_core);
+ u32 kvd_size, single_size, double_size, linear_size;
+ const struct mlxsw_config_profile *profile;
+ int err;
+
+ profile = &mlxsw_sp_config_profile;
+ if (!MLXSW_CORE_RES_VALID(mlxsw_core, KVD_SIZE))
+ return -EIO;
+
+ kvd_size = MLXSW_CORE_RES_GET(mlxsw_core, KVD_SIZE);
+ err = devlink_resource_register(devlink, MLXSW_SP_RESOURCE_NAME_KVD,
+ true, kvd_size,
+ MLXSW_SP_RESOURCE_KVD,
+ DEVLINK_RESOURCE_ID_PARENT_TOP,
+ DEVLINK_RESOURCE_UNIT_DOUBLE_WORD,
+ &mlxsw_sp_resource_kvd_ops);
+ if (err)
+ return err;
+
+ linear_size = profile->kvd_linear_size;
+ err = devlink_resource_register(devlink, MLXSW_SP_RESOURCE_NAME_KVD_LINEAR,
+ false, linear_size,
+ MLXSW_SP_RESOURCE_KVD_LINEAR,
+ MLXSW_SP_RESOURCE_KVD,
+ DEVLINK_RESOURCE_UNIT_DOUBLE_WORD,
+ &mlxsw_sp_resource_kvd_linear_ops);
+ if (err)
+ return err;
+
+ double_size = kvd_size - linear_size;
+ double_size *= profile->kvd_hash_double_parts;
+ double_size /= profile->kvd_hash_double_parts +
+ profile->kvd_hash_single_parts;
+ double_size = rounddown(double_size, profile->kvd_hash_granularity);
+ err = devlink_resource_register(devlink, MLXSW_SP_RESOURCE_NAME_KVD_HASH_DOUBLE,
+ false, double_size,
+ MLXSW_SP_RESOURCE_KVD_HASH_DOUBLE,
+ MLXSW_SP_RESOURCE_KVD,
+ DEVLINK_RESOURCE_UNIT_DOUBLE_WORD,
+ &mlxsw_sp_resource_kvd_hash_double_ops);
+ if (err)
+ return err;
+
+ single_size = kvd_size - double_size - linear_size;
+ err = devlink_resource_register(devlink, MLXSW_SP_RESOURCE_NAME_KVD_HASH_SINGLE,
+ false, single_size,
+ MLXSW_SP_RESOURCE_KVD_HASH_SINGLE,
+ MLXSW_SP_RESOURCE_KVD,
+ DEVLINK_RESOURCE_UNIT_DOUBLE_WORD,
+ &mlxsw_sp_resource_kvd_hash_single_ops);
+ if (err)
+ return err;
+
+ return 0;
+}
+
static struct mlxsw_driver mlxsw_sp_driver = {
.kind = mlxsw_sp_driver_name,
.priv_size = sizeof(struct mlxsw_sp),
@@ -3998,6 +4136,7 @@ static struct mlxsw_driver mlxsw_sp_driver = {
.sb_occ_port_pool_get = mlxsw_sp_sb_occ_port_pool_get,
.sb_occ_tc_port_bind_get = mlxsw_sp_sb_occ_tc_port_bind_get,
.txhdr_construct = mlxsw_sp_txhdr_construct,
+ .resources_register = mlxsw_sp_resources_register,
.txhdr_len = MLXSW_TXHDR_LEN,
.profile = &mlxsw_sp_config_profile,
};
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index a0adcd8..5abc8c5 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -66,6 +66,18 @@
#define MLXSW_SP_KVD_LINEAR_SIZE 98304 /* entries */
#define MLXSW_SP_KVD_GRANULARITY 128
+#define MLXSW_SP_RESOURCE_NAME_KVD "kvd"
+#define MLXSW_SP_RESOURCE_NAME_KVD_LINEAR "linear"
+#define MLXSW_SP_RESOURCE_NAME_KVD_HASH_SINGLE "hash_single"
+#define MLXSW_SP_RESOURCE_NAME_KVD_HASH_DOUBLE "hash_double"
+
+enum mlxsw_sp_resource_id {
+ MLXSW_SP_RESOURCE_KVD,
+ MLXSW_SP_RESOURCE_KVD_LINEAR,
+ MLXSW_SP_RESOURCE_KVD_HASH_SINGLE,
+ MLXSW_SP_RESOURCE_KVD_HASH_DOUBLE,
+};
+
struct mlxsw_sp_port;
struct mlxsw_sp_rif;
--
2.9.5
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [patch net-next v2 07/10] mlxsw: spectrum_dpipe: Connect dpipe tables to resources
2017-12-26 11:23 [patch net-next v2 00/10] Add support for resource abstraction Jiri Pirko
` (5 preceding siblings ...)
2017-12-26 11:23 ` [patch net-next v2 06/10] mlxsw: spectrum: Register KVD resources with devlink Jiri Pirko
@ 2017-12-26 11:23 ` Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 08/10] mlxsw: spectrum: Add support for getting kvdl occupancy Jiri Pirko
` (4 subsequent siblings)
11 siblings, 0 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-26 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, dsa, roopa
From: Arkadi Sharshevsky <arkadis@mellanox.com>
Connect current dpipe tables to resources. The tables are connected
in the following fashion:
1. IPv4 host - KVD hash single
2. IPv6 host - KVD hash double
3. Adjacency - KVD linear
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
.../net/ethernet/mellanox/mlxsw/spectrum_dpipe.c | 72 ++++++++++++++++++----
1 file changed, 60 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
index 96fdba7..282fe82 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
@@ -774,11 +774,27 @@ static struct devlink_dpipe_table_ops mlxsw_sp_host4_ops = {
static int mlxsw_sp_dpipe_host4_table_init(struct mlxsw_sp *mlxsw_sp)
{
struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+ int err;
- return devlink_dpipe_table_register(devlink,
- MLXSW_SP_DPIPE_TABLE_NAME_HOST4,
- &mlxsw_sp_host4_ops,
- mlxsw_sp, false);
+ err = devlink_dpipe_table_register(devlink,
+ MLXSW_SP_DPIPE_TABLE_NAME_HOST4,
+ &mlxsw_sp_host4_ops,
+ mlxsw_sp, false);
+ if (err)
+ return err;
+
+ err = devlink_dpipe_table_resource_set(devlink,
+ MLXSW_SP_DPIPE_TABLE_NAME_HOST4,
+ MLXSW_SP_RESOURCE_KVD_HASH_SINGLE);
+ if (err)
+ goto err_resource_set;
+
+ return 0;
+
+err_resource_set:
+ devlink_dpipe_table_unregister(devlink,
+ MLXSW_SP_DPIPE_TABLE_NAME_HOST4);
+ return err;
}
static void mlxsw_sp_dpipe_host4_table_fini(struct mlxsw_sp *mlxsw_sp)
@@ -832,11 +848,27 @@ static struct devlink_dpipe_table_ops mlxsw_sp_host6_ops = {
static int mlxsw_sp_dpipe_host6_table_init(struct mlxsw_sp *mlxsw_sp)
{
struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+ int err;
- return devlink_dpipe_table_register(devlink,
- MLXSW_SP_DPIPE_TABLE_NAME_HOST6,
- &mlxsw_sp_host6_ops,
- mlxsw_sp, false);
+ err = devlink_dpipe_table_register(devlink,
+ MLXSW_SP_DPIPE_TABLE_NAME_HOST6,
+ &mlxsw_sp_host6_ops,
+ mlxsw_sp, false);
+ if (err)
+ return err;
+
+ err = devlink_dpipe_table_resource_set(devlink,
+ MLXSW_SP_DPIPE_TABLE_NAME_HOST6,
+ MLXSW_SP_RESOURCE_KVD_HASH_DOUBLE);
+ if (err)
+ goto err_resource_set;
+
+ return 0;
+
+err_resource_set:
+ devlink_dpipe_table_unregister(devlink,
+ MLXSW_SP_DPIPE_TABLE_NAME_HOST6);
+ return err;
}
static void mlxsw_sp_dpipe_host6_table_fini(struct mlxsw_sp *mlxsw_sp)
@@ -1216,11 +1248,27 @@ static struct devlink_dpipe_table_ops mlxsw_sp_dpipe_table_adj_ops = {
static int mlxsw_sp_dpipe_adj_table_init(struct mlxsw_sp *mlxsw_sp)
{
struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+ int err;
- return devlink_dpipe_table_register(devlink,
- MLXSW_SP_DPIPE_TABLE_NAME_ADJ,
- &mlxsw_sp_dpipe_table_adj_ops,
- mlxsw_sp, false);
+ err = devlink_dpipe_table_register(devlink,
+ MLXSW_SP_DPIPE_TABLE_NAME_ADJ,
+ &mlxsw_sp_dpipe_table_adj_ops,
+ mlxsw_sp, false);
+ if (err)
+ return err;
+
+ err = devlink_dpipe_table_resource_set(devlink,
+ MLXSW_SP_DPIPE_TABLE_NAME_ADJ,
+ MLXSW_SP_RESOURCE_KVD_LINEAR);
+ if (err)
+ goto err_resource_set;
+
+ return 0;
+
+err_resource_set:
+ devlink_dpipe_table_unregister(devlink,
+ MLXSW_SP_DPIPE_TABLE_NAME_ADJ);
+ return err;
}
static void mlxsw_sp_dpipe_adj_table_fini(struct mlxsw_sp *mlxsw_sp)
--
2.9.5
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [patch net-next v2 08/10] mlxsw: spectrum: Add support for getting kvdl occupancy
2017-12-26 11:23 [patch net-next v2 00/10] Add support for resource abstraction Jiri Pirko
` (6 preceding siblings ...)
2017-12-26 11:23 ` [patch net-next v2 07/10] mlxsw: spectrum_dpipe: Connect dpipe tables to resources Jiri Pirko
@ 2017-12-26 11:23 ` Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 09/10] mlxsw: pci: Add support for getting resource through devlink Jiri Pirko
` (3 subsequent siblings)
11 siblings, 0 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-26 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, dsa, roopa
From: Arkadi Sharshevsky <arkadis@mellanox.com>
Add support for getting the kvdl occupancy through the resource interface.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 9 ++++++++
drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 1 +
.../net/ethernet/mellanox/mlxsw/spectrum_kvdl.c | 26 ++++++++++++++++++++++
3 files changed, 36 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 56d0e07..6569e55 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -4043,12 +4043,21 @@ mlxsw_sp_resource_kvd_hash_double_size_validate(struct devlink *devlink, u64 siz
return 0;
}
+static u64 mlxsw_sp_resource_kvd_linear_occ_get(struct devlink *devlink)
+{
+ struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+ struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
+
+ return mlxsw_sp_kvdl_occ_get(mlxsw_sp);
+}
+
static struct devlink_resource_ops mlxsw_sp_resource_kvd_ops = {
.size_validate = mlxsw_sp_resource_kvd_size_validate,
};
static struct devlink_resource_ops mlxsw_sp_resource_kvd_linear_ops = {
.size_validate = mlxsw_sp_resource_kvd_linear_size_validate,
+ .occ_get = mlxsw_sp_resource_kvd_linear_occ_get,
};
static struct devlink_resource_ops mlxsw_sp_resource_kvd_hash_single_ops = {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 5abc8c5..ff8d32b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -469,6 +469,7 @@ void mlxsw_sp_kvdl_free(struct mlxsw_sp *mlxsw_sp, int entry_index);
int mlxsw_sp_kvdl_alloc_size_query(struct mlxsw_sp *mlxsw_sp,
unsigned int entry_count,
unsigned int *p_alloc_size);
+u64 mlxsw_sp_kvdl_occ_get(const struct mlxsw_sp *mlxsw_sp);
struct mlxsw_sp_acl_rule_info {
unsigned int priority;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c
index 310c382..cfacc17 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c
@@ -286,6 +286,32 @@ static void mlxsw_sp_kvdl_parts_fini(struct mlxsw_sp *mlxsw_sp)
mlxsw_sp_kvdl_part_fini(mlxsw_sp, i);
}
+u64 mlxsw_sp_kvdl_part_occ(struct mlxsw_sp_kvdl_part *part)
+{
+ unsigned int nr_entries;
+ int bit = -1;
+ u64 occ = 0;
+
+ nr_entries = (part->info->end_index -
+ part->info->start_index + 1) /
+ part->info->alloc_size;
+ while ((bit = find_next_bit(part->usage, nr_entries, bit + 1))
+ < nr_entries)
+ occ += part->info->alloc_size;
+ return occ;
+}
+
+u64 mlxsw_sp_kvdl_occ_get(const struct mlxsw_sp *mlxsw_sp)
+{
+ struct mlxsw_sp_kvdl_part *part;
+ u64 occ = 0;
+
+ list_for_each_entry(part, &mlxsw_sp->kvdl->parts_list, list)
+ occ += mlxsw_sp_kvdl_part_occ(part);
+
+ return occ;
+}
+
int mlxsw_sp_kvdl_init(struct mlxsw_sp *mlxsw_sp)
{
struct mlxsw_sp_kvdl *kvdl;
--
2.9.5
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [patch net-next v2 09/10] mlxsw: pci: Add support for getting resource through devlink
2017-12-26 11:23 [patch net-next v2 00/10] Add support for resource abstraction Jiri Pirko
` (7 preceding siblings ...)
2017-12-26 11:23 ` [patch net-next v2 08/10] mlxsw: spectrum: Add support for getting kvdl occupancy Jiri Pirko
@ 2017-12-26 11:23 ` Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 10/10] mlxsw: core: Add support for reload Jiri Pirko
` (2 subsequent siblings)
11 siblings, 0 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-26 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, dsa, roopa
From: Arkadi Sharshevsky <arkadis@mellanox.com>
Up until now the KVD partition was static. This patch introduces the
ability to get the resource sizes via devlink. In case the resource is not
available the default configuration is used.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/core.c | 16 ++++++++
drivers/net/ethernet/mellanox/mlxsw/core.h | 9 ++++
drivers/net/ethernet/mellanox/mlxsw/pci.c | 40 +++++-------------
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 57 ++++++++++++++++++++++++++
4 files changed, 92 insertions(+), 30 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 2488662..9fe25b1 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1800,6 +1800,22 @@ void mlxsw_core_flush_owq(void)
}
EXPORT_SYMBOL(mlxsw_core_flush_owq);
+int mlxsw_core_kvd_sizes_get(struct mlxsw_core *mlxsw_core,
+ const struct mlxsw_config_profile *profile,
+ u64 *p_single_size, u64 *p_double_size,
+ u64 *p_linear_size)
+{
+ struct mlxsw_driver *driver = mlxsw_core->driver;
+
+ if (!driver->kvd_sizes_get)
+ return -EINVAL;
+
+ return driver->kvd_sizes_get(mlxsw_core, profile,
+ p_single_size, p_double_size,
+ p_linear_size);
+}
+EXPORT_SYMBOL(mlxsw_core_kvd_sizes_get);
+
static int __init mlxsw_core_module_init(void)
{
int err;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index e23f83b..e44061d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -309,10 +309,19 @@ struct mlxsw_driver {
void (*txhdr_construct)(struct sk_buff *skb,
const struct mlxsw_tx_info *tx_info);
int (*resources_register)(struct mlxsw_core *mlxsw_core);
+ int (*kvd_sizes_get)(struct mlxsw_core *mlxsw_core,
+ const struct mlxsw_config_profile *profile,
+ u64 *p_single_size, u64 *p_double_size,
+ u64 *p_linear_size);
u8 txhdr_len;
const struct mlxsw_config_profile *profile;
};
+int mlxsw_core_kvd_sizes_get(struct mlxsw_core *mlxsw_core,
+ const struct mlxsw_config_profile *profile,
+ u64 *p_single_size, u64 *p_double_size,
+ u64 *p_linear_size);
+
bool mlxsw_core_res_valid(struct mlxsw_core *mlxsw_core,
enum mlxsw_res_id res_id);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 39872fb..6468cf2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -1053,38 +1053,18 @@ static int mlxsw_pci_resources_query(struct mlxsw_pci *mlxsw_pci, char *mbox,
}
static int
-mlxsw_pci_profile_get_kvd_sizes(const struct mlxsw_config_profile *profile,
+mlxsw_pci_profile_get_kvd_sizes(const struct mlxsw_pci *mlxsw_pci,
+ const struct mlxsw_config_profile *profile,
struct mlxsw_res *res)
{
- u32 single_size, double_size, linear_size;
-
- if (!MLXSW_RES_VALID(res, KVD_SINGLE_MIN_SIZE) ||
- !MLXSW_RES_VALID(res, KVD_DOUBLE_MIN_SIZE) ||
- !profile->used_kvd_split_data)
- return -EIO;
-
- linear_size = profile->kvd_linear_size;
+ u64 single_size, double_size, linear_size;
+ int err;
- /* The hash part is what left of the kvd without the
- * linear part. It is split to the single size and
- * double size by the parts ratio from the profile.
- * Both sizes must be a multiplications of the
- * granularity from the profile.
- */
- double_size = MLXSW_RES_GET(res, KVD_SIZE) - linear_size;
- double_size *= profile->kvd_hash_double_parts;
- double_size /= profile->kvd_hash_double_parts +
- profile->kvd_hash_single_parts;
- double_size /= profile->kvd_hash_granularity;
- double_size *= profile->kvd_hash_granularity;
- single_size = MLXSW_RES_GET(res, KVD_SIZE) - double_size -
- linear_size;
-
- /* Check results are legal. */
- if (single_size < MLXSW_RES_GET(res, KVD_SINGLE_MIN_SIZE) ||
- double_size < MLXSW_RES_GET(res, KVD_DOUBLE_MIN_SIZE) ||
- MLXSW_RES_GET(res, KVD_SIZE) < linear_size)
- return -EIO;
+ err = mlxsw_core_kvd_sizes_get(mlxsw_pci->core, profile,
+ &single_size, &double_size,
+ &linear_size);
+ if (err)
+ return err;
MLXSW_RES_SET(res, KVD_SINGLE_SIZE, single_size);
MLXSW_RES_SET(res, KVD_DOUBLE_SIZE, double_size);
@@ -1185,7 +1165,7 @@ static int mlxsw_pci_config_profile(struct mlxsw_pci *mlxsw_pci, char *mbox,
mbox, profile->adaptive_routing_group_cap);
}
if (MLXSW_RES_VALID(res, KVD_SIZE)) {
- err = mlxsw_pci_profile_get_kvd_sizes(profile, res);
+ err = mlxsw_pci_profile_get_kvd_sizes(mlxsw_pci, profile, res);
if (err)
return err;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 6569e55..ef5f052 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -4126,6 +4126,62 @@ static int mlxsw_sp_resources_register(struct mlxsw_core *mlxsw_core)
return 0;
}
+static int mlxsw_sp_kvd_sizes_get(struct mlxsw_core *mlxsw_core,
+ const struct mlxsw_config_profile *profile,
+ u64 *p_single_size, u64 *p_double_size,
+ u64 *p_linear_size)
+{
+ struct devlink *devlink = priv_to_devlink(mlxsw_core);
+ u32 double_size;
+ int err;
+
+ if (!MLXSW_CORE_RES_VALID(mlxsw_core, KVD_SINGLE_MIN_SIZE) ||
+ !MLXSW_CORE_RES_VALID(mlxsw_core, KVD_DOUBLE_MIN_SIZE) ||
+ !profile->used_kvd_split_data)
+ return -EIO;
+
+ /* The hash part is what left of the kvd without the
+ * linear part. It is split to the single size and
+ * double size by the parts ratio from the profile.
+ * Both sizes must be a multiplications of the
+ * granularity from the profile. In case the user
+ * provided the sizes they are obtained via devlink.
+ */
+ err = devlink_resource_size_get(devlink,
+ MLXSW_SP_RESOURCE_KVD_LINEAR,
+ p_linear_size);
+ if (err)
+ *p_linear_size = profile->kvd_linear_size;
+
+ err = devlink_resource_size_get(devlink,
+ MLXSW_SP_RESOURCE_KVD_HASH_DOUBLE,
+ p_double_size);
+ if (err) {
+ double_size = MLXSW_CORE_RES_GET(mlxsw_core, KVD_SIZE) -
+ *p_linear_size;
+ double_size *= profile->kvd_hash_double_parts;
+ double_size /= profile->kvd_hash_double_parts +
+ profile->kvd_hash_single_parts;
+ *p_double_size = rounddown(double_size,
+ profile->kvd_hash_granularity);
+ }
+
+ err = devlink_resource_size_get(devlink,
+ MLXSW_SP_RESOURCE_KVD_HASH_SINGLE,
+ p_single_size);
+ if (err)
+ *p_single_size = MLXSW_CORE_RES_GET(mlxsw_core, KVD_SIZE) -
+ *p_double_size - *p_linear_size;
+
+ /* Check results are legal. */
+ if (*p_single_size < MLXSW_CORE_RES_GET(mlxsw_core, KVD_SINGLE_MIN_SIZE) ||
+ *p_double_size < MLXSW_CORE_RES_GET(mlxsw_core, KVD_DOUBLE_MIN_SIZE) ||
+ MLXSW_CORE_RES_GET(mlxsw_core, KVD_SIZE) < *p_linear_size)
+ return -EIO;
+
+ return 0;
+}
+
static struct mlxsw_driver mlxsw_sp_driver = {
.kind = mlxsw_sp_driver_name,
.priv_size = sizeof(struct mlxsw_sp),
@@ -4146,6 +4202,7 @@ static struct mlxsw_driver mlxsw_sp_driver = {
.sb_occ_tc_port_bind_get = mlxsw_sp_sb_occ_tc_port_bind_get,
.txhdr_construct = mlxsw_sp_txhdr_construct,
.resources_register = mlxsw_sp_resources_register,
+ .kvd_sizes_get = mlxsw_sp_kvd_sizes_get,
.txhdr_len = MLXSW_TXHDR_LEN,
.profile = &mlxsw_sp_config_profile,
};
--
2.9.5
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [patch net-next v2 10/10] mlxsw: core: Add support for reload
2017-12-26 11:23 [patch net-next v2 00/10] Add support for resource abstraction Jiri Pirko
` (8 preceding siblings ...)
2017-12-26 11:23 ` [patch net-next v2 09/10] mlxsw: pci: Add support for getting resource through devlink Jiri Pirko
@ 2017-12-26 11:23 ` Jiri Pirko
2017-12-27 4:05 ` [patch net-next v2 00/10] Add support for resource abstraction David Ahern
2018-01-01 14:58 ` Arkadi Sharshevsky
11 siblings, 0 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-26 11:23 UTC (permalink / raw)
To: netdev
Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, dsa, roopa
From: Arkadi Sharshevsky <arkadis@mellanox.com>
Add support for hot reload. First, all the driver/core resources are
released but the PCI and devlink instances, then reset is performed
through the PCI interface. Finally the driver performs initialization.
In case of reload failure the driver is left in a partially initialized
state. Special care is taken during the driver removal in order to
properly handle this state.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/core.c | 64 +++++++++++++++++++++++-------
drivers/net/ethernet/mellanox/mlxsw/core.h | 5 ++-
drivers/net/ethernet/mellanox/mlxsw/i2c.c | 5 ++-
drivers/net/ethernet/mellanox/mlxsw/pci.c | 5 ++-
4 files changed, 59 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 9fe25b1..4b33919 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -113,6 +113,7 @@ struct mlxsw_core {
struct mlxsw_thermal *thermal;
struct mlxsw_core_port *ports;
unsigned int max_ports;
+ bool reload_fail;
unsigned long driver_priv[0];
/* driver_priv has to be always the last item */
};
@@ -962,7 +963,28 @@ mlxsw_devlink_sb_occ_tc_port_bind_get(struct devlink_port *devlink_port,
pool_type, p_cur, p_max);
}
+static int mlxsw_devlink_core_bus_device_reload(struct devlink *devlink)
+{
+ struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+ const struct mlxsw_bus *mlxsw_bus = mlxsw_core->bus;
+ int err;
+
+ if (!mlxsw_bus->reset)
+ return -EOPNOTSUPP;
+
+ mlxsw_core_bus_device_unregister(mlxsw_core, true);
+ mlxsw_bus->reset(mlxsw_core->bus_priv);
+ err = mlxsw_core_bus_device_register(mlxsw_core->bus_info,
+ mlxsw_core->bus,
+ mlxsw_core->bus_priv, true,
+ devlink);
+ if (err)
+ mlxsw_core->reload_fail = true;
+ return err;
+}
+
static const struct devlink_ops mlxsw_devlink_ops = {
+ .reload = mlxsw_devlink_core_bus_device_reload,
.port_type_set = mlxsw_devlink_port_type_set,
.port_split = mlxsw_devlink_port_split,
.port_unsplit = mlxsw_devlink_port_unsplit,
@@ -980,23 +1002,26 @@ static const struct devlink_ops mlxsw_devlink_ops = {
int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
const struct mlxsw_bus *mlxsw_bus,
- void *bus_priv)
+ void *bus_priv, bool reload,
+ struct devlink *devlink)
{
const char *device_kind = mlxsw_bus_info->device_kind;
struct mlxsw_core *mlxsw_core;
struct mlxsw_driver *mlxsw_driver;
- struct devlink *devlink;
size_t alloc_size;
int err;
mlxsw_driver = mlxsw_core_driver_get(device_kind);
if (!mlxsw_driver)
return -EINVAL;
- alloc_size = sizeof(*mlxsw_core) + mlxsw_driver->priv_size;
- devlink = devlink_alloc(&mlxsw_devlink_ops, alloc_size);
- if (!devlink) {
- err = -ENOMEM;
- goto err_devlink_alloc;
+
+ if (!reload) {
+ alloc_size = sizeof(*mlxsw_core) + mlxsw_driver->priv_size;
+ devlink = devlink_alloc(&mlxsw_devlink_ops, alloc_size);
+ if (!devlink) {
+ err = -ENOMEM;
+ goto err_devlink_alloc;
+ }
}
mlxsw_core = devlink_priv(devlink);
@@ -1012,7 +1037,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
if (err)
goto err_bus_init;
- if (mlxsw_driver->resources_register) {
+ if (mlxsw_driver->resources_register && !reload) {
err = mlxsw_driver->resources_register(mlxsw_core);
if (err)
goto err_register_resources;
@@ -1038,9 +1063,11 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
if (err)
goto err_emad_init;
- err = devlink_register(devlink, mlxsw_bus_info->dev);
- if (err)
- goto err_devlink_register;
+ if (!reload) {
+ err = devlink_register(devlink, mlxsw_bus_info->dev);
+ if (err)
+ goto err_devlink_register;
+ }
err = mlxsw_hwmon_init(mlxsw_core, mlxsw_bus_info, &mlxsw_core->hwmon);
if (err)
@@ -1082,20 +1109,29 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
}
EXPORT_SYMBOL(mlxsw_core_bus_device_register);
-void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
+void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
+ bool reload)
{
const char *device_kind = mlxsw_core->bus_info->device_kind;
struct devlink *devlink = priv_to_devlink(mlxsw_core);
+ if (mlxsw_core->reload_fail)
+ goto out;
+
if (mlxsw_core->driver->fini)
mlxsw_core->driver->fini(mlxsw_core);
mlxsw_thermal_fini(mlxsw_core->thermal);
- devlink_unregister(devlink);
+ if (!reload)
+ devlink_unregister(devlink);
mlxsw_emad_fini(mlxsw_core);
kfree(mlxsw_core->lag.mapping);
mlxsw_ports_fini(mlxsw_core);
- devlink_resources_unregister(devlink, NULL);
+ if (!reload)
+ devlink_resources_unregister(devlink, NULL);
mlxsw_core->bus->fini(mlxsw_core->bus_priv);
+ if (reload)
+ return;
+out:
devlink_free(devlink);
mlxsw_core_driver_put(device_kind);
}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index e44061d..5ddafd7 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -66,8 +66,9 @@ void mlxsw_core_driver_unregister(struct mlxsw_driver *mlxsw_driver);
int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
const struct mlxsw_bus *mlxsw_bus,
- void *bus_priv);
-void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core);
+ void *bus_priv, bool reload,
+ struct devlink *devlink);
+void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core, bool reload);
struct mlxsw_tx_info {
u8 local_port;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/i2c.c b/drivers/net/ethernet/mellanox/mlxsw/i2c.c
index c0dcfa0..25f9915 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/i2c.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/i2c.c
@@ -539,7 +539,8 @@ static int mlxsw_i2c_probe(struct i2c_client *client,
mlxsw_i2c->dev = &client->dev;
err = mlxsw_core_bus_device_register(&mlxsw_i2c->bus_info,
- &mlxsw_i2c_bus, mlxsw_i2c);
+ &mlxsw_i2c_bus, mlxsw_i2c, false,
+ NULL);
if (err) {
dev_err(&client->dev, "Fail to register core bus\n");
return err;
@@ -557,7 +558,7 @@ static int mlxsw_i2c_remove(struct i2c_client *client)
{
struct mlxsw_i2c *mlxsw_i2c = i2c_get_clientdata(client);
- mlxsw_core_bus_device_unregister(mlxsw_i2c->core);
+ mlxsw_core_bus_device_unregister(mlxsw_i2c->core, false);
mutex_destroy(&mlxsw_i2c->cmd.lock);
return 0;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 6468cf2..ceb8785 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -1734,7 +1734,8 @@ static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
mlxsw_pci->id = id;
err = mlxsw_core_bus_device_register(&mlxsw_pci->bus_info,
- &mlxsw_pci_bus, mlxsw_pci);
+ &mlxsw_pci_bus, mlxsw_pci, false,
+ NULL);
if (err) {
dev_err(&pdev->dev, "cannot register bus device\n");
goto err_bus_device_register;
@@ -1762,7 +1763,7 @@ static void mlxsw_pci_remove(struct pci_dev *pdev)
{
struct mlxsw_pci *mlxsw_pci = pci_get_drvdata(pdev);
- mlxsw_core_bus_device_unregister(mlxsw_pci->core);
+ mlxsw_core_bus_device_unregister(mlxsw_pci->core, false);
mlxsw_pci_free_irq_vectors(mlxsw_pci);
iounmap(mlxsw_pci->hw_addr);
pci_release_regions(mlxsw_pci->pdev);
--
2.9.5
^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-26 11:23 [patch net-next v2 00/10] Add support for resource abstraction Jiri Pirko
` (9 preceding siblings ...)
2017-12-26 11:23 ` [patch net-next v2 10/10] mlxsw: core: Add support for reload Jiri Pirko
@ 2017-12-27 4:05 ` David Ahern
2017-12-27 8:09 ` Jiri Pirko
2018-01-01 14:58 ` Arkadi Sharshevsky
11 siblings, 1 reply; 61+ messages in thread
From: David Ahern @ 2017-12-27 4:05 UTC (permalink / raw)
To: Jiri Pirko, netdev, davem
Cc: arkadis, mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
ganeshgr, saeedm, matanb, leonro, idosch, jakub.kicinski, ast,
daniel, simon.horman, pieter.jansenvanvuuren, john.hurley,
alexander.h.duyck, linville, gospo, steven.lin1, yuvalm, ogerlitz,
roopa
On 12/26/17 5:23 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Many of the ASIC's internal resources are limited and are shared between
> several hardware procedures. For example, unified hash-based memory can
> be used for many lookup purposes, like FDB and LPM. In many cases the user
> can provide a partitioning scheme for such a resource in order to perform
> fine tuning for his application. In such cases performing driver reload is
> needed for the changes to take place, thus this patchset also adds support
> for hot reload.
>
> Such an abstraction can be coupled with devlink's dpipe interface, which
> models the ASIC's pipeline as a graph of match/action tables. By modeling
> the hardware resource object, and by coupling it to several dpipe tables,
> further visibility can be achieved in order to debug ASIC-wide issues.
>
> The proposed interface will provide the user the ability to understand the
> limitations of the hardware, and receive notification regarding its occupancy.
> Furthermore, monitoring the resource occupancy can be done in real-time and
> can be useful in many cases.
In the last RFC (not v1, but RFC) I asked for some kind of description
for each resource, and you and Arkadi have pushed back. Let's walk
through an example to see what I mean:
$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 size_valid true
resources:
name linear size 98304 occ 0
name hash_double size 60416
name hash_single size 87040
So this 2700 has 3 resources that can be managed -- some table or
resource or something named 'kvd' with linear, hash_double and
hash_single sub-resources. What are these names referring too? The above
output gives no description, and 'kvd' is not an industry term. Further,
what are these sizes that a user can control? The output contains no
units, no description, nothing. In short, the above output provides
random numbers associated with random names.
I can see dpipe tables exported by this device:
$ devlink dpipe header show pci/0000:03:00.0
pci/0000:03:00.0:
name mlxsw_meta
field:
name erif_port bitwidth 32 mapping_type ifindex
name l3_forward bitwidth 1
name l3_drop bitwidth 1
name adj_index bitwidth 32
name adj_size bitwidth 32
name adj_hash_index bitwidth 32
name ipv6
field:
name destination ip bitwidth 128
name ipv4
field:
name destination ip bitwidth 32
name ethernet
field:
name destination mac bitwidth 48
but none mention 'kvd' or 'linear' or 'hash" and none of the other
various devlink options:
$ devlink
Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }
where OBJECT := { dev | port | sb | monitor | dpipe }
seem to related to resources.
So how does a user know what they are controlling by this 'resource'
option? Is the user expected to have a PRM or user guide on hand for the
specific device model that is being configured?
Again, I have no objections to kvd, linear, hash, etc terms as they do
relate to Mellanox products. But kvd/linear, for example, does correlate
to industry standard concepts in some way. My request is that the
resource listing guide the user in some way, stating what these
resources mean.
IMO the above output is not user friendly and having to keep a PRM on
hand for each device model is not a realistic solution.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 4:05 ` [patch net-next v2 00/10] Add support for resource abstraction David Ahern
@ 2017-12-27 8:09 ` Jiri Pirko
2017-12-27 8:23 ` Andrew Lunn
` (2 more replies)
0 siblings, 3 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-27 8:09 UTC (permalink / raw)
To: David Ahern
Cc: netdev, davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, roopa
Wed, Dec 27, 2017 at 05:05:09AM CET, dsa@cumulusnetworks.com wrote:
>On 12/26/17 5:23 AM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Many of the ASIC's internal resources are limited and are shared between
>> several hardware procedures. For example, unified hash-based memory can
>> be used for many lookup purposes, like FDB and LPM. In many cases the user
>> can provide a partitioning scheme for such a resource in order to perform
>> fine tuning for his application. In such cases performing driver reload is
>> needed for the changes to take place, thus this patchset also adds support
>> for hot reload.
>>
>> Such an abstraction can be coupled with devlink's dpipe interface, which
>> models the ASIC's pipeline as a graph of match/action tables. By modeling
>> the hardware resource object, and by coupling it to several dpipe tables,
>> further visibility can be achieved in order to debug ASIC-wide issues.
>>
>> The proposed interface will provide the user the ability to understand the
>> limitations of the hardware, and receive notification regarding its occupancy.
>> Furthermore, monitoring the resource occupancy can be done in real-time and
>> can be useful in many cases.
>
>In the last RFC (not v1, but RFC) I asked for some kind of description
>for each resource, and you and Arkadi have pushed back. Let's walk
>through an example to see what I mean:
>
>$ devlink resource show pci/0000:03:00.0
>pci/0000:03:00.0:
> name kvd size 245760 size_valid true
> resources:
> name linear size 98304 occ 0
> name hash_double size 60416
> name hash_single size 87040
>
>So this 2700 has 3 resources that can be managed -- some table or
>resource or something named 'kvd' with linear, hash_double and
>hash_single sub-resources. What are these names referring too? The above
>output gives no description, and 'kvd' is not an industry term. Further,
This are internal resources specific to the ASIC. Would you like some
description to each or something like that?
>what are these sizes that a user can control? The output contains no
>units, no description, nothing. In short, the above output provides
>random numbers associated with random names.
Units are now exposed from kernel, just this version of iproute2 patch
does not display it.
>
>I can see dpipe tables exported by this device:
>
>$ devlink dpipe header show pci/0000:03:00.0
>
>pci/0000:03:00.0:
> name mlxsw_meta
> field:
> name erif_port bitwidth 32 mapping_type ifindex
> name l3_forward bitwidth 1
> name l3_drop bitwidth 1
> name adj_index bitwidth 32
> name adj_size bitwidth 32
> name adj_hash_index bitwidth 32
>
> name ipv6
> field:
> name destination ip bitwidth 128
>
> name ipv4
> field:
> name destination ip bitwidth 32
>
> name ethernet
> field:
> name destination mac bitwidth 48
>
>but none mention 'kvd' or 'linear' or 'hash" and none of the other
>various devlink options:
>
>$ devlink
>Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }
>where OBJECT := { dev | port | sb | monitor | dpipe }
>
>seem to related to resources.
>
>So how does a user know what they are controlling by this 'resource'
>option? Is the user expected to have a PRM or user guide on hand for the
>specific device model that is being configured?
The relation of specific dpipe table to specific resource is exposed by
the kernel as well. Probably the iproute2 patch just does not display
it.
>
>Again, I have no objections to kvd, linear, hash, etc terms as they do
>relate to Mellanox products. But kvd/linear, for example, does correlate
>to industry standard concepts in some way. My request is that the
>resource listing guide the user in some way, stating what these
>resources mean.
So the showed relation to dpipe table would be enougn or you would still
like to see some description? I don't like the description concept here
as the relations to dpipe table should tell user exactly what he needs
to know.
>
>IMO the above output is not user friendly and having to keep a PRM on
>hand for each device model is not a realistic solution.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 8:09 ` Jiri Pirko
@ 2017-12-27 8:23 ` Andrew Lunn
2017-12-27 9:37 ` Jiri Pirko
2017-12-27 8:28 ` Yuval Mintz
2017-12-27 16:34 ` David Ahern
2 siblings, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2017-12-27 8:23 UTC (permalink / raw)
To: Jiri Pirko
Cc: David Ahern, netdev, davem, arkadis, mlxsw, vivien.didelot,
f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz, roopa
> >$ devlink resource show pci/0000:03:00.0
> >pci/0000:03:00.0:
> > name kvd size 245760 size_valid true
> > resources:
> > name linear size 98304 occ 0
> > name hash_double size 60416
> > name hash_single size 87040
> >
> >So this 2700 has 3 resources that can be managed -- some table or
> >resource or something named 'kvd' with linear, hash_double and
> >hash_single sub-resources. What are these names referring too? The above
> >output gives no description, and 'kvd' is not an industry term. Further,
>
> This are internal resources specific to the ASIC. Would you like some
> description to each or something like that?
The fact you have decided to expose them means you expect people to
change them. So yes, they need to be documented. Maybe add something
to Documentation/ABI/stable/
> So the showed relation to dpipe table would be enougn or you would still
> like to see some description? I don't like the description concept here
> as the relations to dpipe table should tell user exactly what he needs
> to know.
Documenting the ABI is good practice.
Andrew
^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 8:09 ` Jiri Pirko
2017-12-27 8:23 ` Andrew Lunn
@ 2017-12-27 8:28 ` Yuval Mintz
2017-12-27 16:34 ` David Ahern
2 siblings, 0 replies; 61+ messages in thread
From: Yuval Mintz @ 2017-12-27 8:28 UTC (permalink / raw)
To: Jiri Pirko, David Ahern
Cc: netdev@vger.kernel.org, davem@davemloft.net, Arkadi Sharshevsky,
mlxsw, andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
f.fainelli@gmail.com, michael.chan@broadcom.com,
ganeshgr@chelsio.com, Saeed Mahameed, Matan Barak,
Leon Romanovsky, Ido Schimmel, jakub.kicinski@netronome.com,
ast@kernel.org, daniel@iogearbox.net,
"simon.horman@netronome.com" <simon.horman
> >> Many of the ASIC's internal resources are limited and are shared between
> >> several hardware procedures. For example, unified hash-based memory
> can
> >> be used for many lookup purposes, like FDB and LPM. In many cases the
> user
> >> can provide a partitioning scheme for such a resource in order to perform
> >> fine tuning for his application. In such cases performing driver reload is
> >> needed for the changes to take place, thus this patchset also adds support
> >> for hot reload.
> >>
> >> Such an abstraction can be coupled with devlink's dpipe interface, which
> >> models the ASIC's pipeline as a graph of match/action tables. By modeling
> >> the hardware resource object, and by coupling it to several dpipe tables,
> >> further visibility can be achieved in order to debug ASIC-wide issues.
> >>
> >> The proposed interface will provide the user the ability to understand the
> >> limitations of the hardware, and receive notification regarding its
> occupancy.
> >> Furthermore, monitoring the resource occupancy can be done in real-time
> and
> >> can be useful in many cases.
> >
> >In the last RFC (not v1, but RFC) I asked for some kind of description
> >for each resource, and you and Arkadi have pushed back. Let's walk
> >through an example to see what I mean:
> >
> >$ devlink resource show pci/0000:03:00.0
> >pci/0000:03:00.0:
> > name kvd size 245760 size_valid true
> > resources:
> > name linear size 98304 occ 0
> > name hash_double size 60416
> > name hash_single size 87040
> >
> >So this 2700 has 3 resources that can be managed -- some table or
> >resource or something named 'kvd' with linear, hash_double and
> >hash_single sub-resources. What are these names referring too? The above
> >output gives no description, and 'kvd' is not an industry term. Further,
>
> This are internal resources specific to the ASIC. Would you like some
> description to each or something like that?
>
>
> >what are these sizes that a user can control? The output contains no
> >units, no description, nothing. In short, the above output provides
> >random numbers associated with random names.
>
> Units are now exposed from kernel, just this version of iproute2 patch
> does not display it.
>
>
> >
> >I can see dpipe tables exported by this device:
> >
> >$ devlink dpipe header show pci/0000:03:00.0
> >
> >pci/0000:03:00.0:
> > name mlxsw_meta
> > field:
> > name erif_port bitwidth 32 mapping_type ifindex
> > name l3_forward bitwidth 1
> > name l3_drop bitwidth 1
> > name adj_index bitwidth 32
> > name adj_size bitwidth 32
> > name adj_hash_index bitwidth 32
> >
> > name ipv6
> > field:
> > name destination ip bitwidth 128
> >
> > name ipv4
> > field:
> > name destination ip bitwidth 32
> >
> > name ethernet
> > field:
> > name destination mac bitwidth 48
> >
> >but none mention 'kvd' or 'linear' or 'hash" and none of the other
> >various devlink options:
> >
> >$ devlink
> >Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }
> >where OBJECT := { dev | port | sb | monitor | dpipe }
> >
> >seem to related to resources.
> >
> >So how does a user know what they are controlling by this 'resource'
> >option? Is the user expected to have a PRM or user guide on hand for the
> >specific device model that is being configured?
>
> The relation of specific dpipe table to specific resource is exposed by
> the kernel as well. Probably the iproute2 patch just does not display
> it.
It does, just under 'table' and not under 'header'. E.g.:
# ./devlink/devlink dpipe -j -p table show pci/0000:03:00.0
...
},{
"resource_path": "/kvd/hash_single",
"name": "mlxsw_host4",
"size": 0,
"counters_enabled": "false",
"match": [{
"type": "field_exact",
"header": "mlxsw_meta",
"field": "erif_port",
"mapping": "ifindex"
},{
"type": "field_exact",
"header": "ipv4",
"field": "destination ip"
}
],
"action": [{
"type": "field_modify",
"header": "ethernet",
"field": "destination mac"
}
]
},{
...
>
>
> >
> >Again, I have no objections to kvd, linear, hash, etc terms as they do
> >relate to Mellanox products. But kvd/linear, for example, does correlate
> >to industry standard concepts in some way. My request is that the
> >resource listing guide the user in some way, stating what these
> >resources mean.
>
> So the showed relation to dpipe table would be enougn or you would still
> like to see some description? I don't like the description concept here
> as the relations to dpipe table should tell user exactly what he needs
> to know.
>
>
> >
> >IMO the above output is not user friendly and having to keep a PRM on
> >hand for each device model is not a realistic solution.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 8:23 ` Andrew Lunn
@ 2017-12-27 9:37 ` Jiri Pirko
2017-12-27 13:08 ` Andrew Lunn
0 siblings, 1 reply; 61+ messages in thread
From: Jiri Pirko @ 2017-12-27 9:37 UTC (permalink / raw)
To: Andrew Lunn
Cc: David Ahern, netdev, davem, arkadis, mlxsw, vivien.didelot,
f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz, roopa
Wed, Dec 27, 2017 at 09:23:31AM CET, andrew@lunn.ch wrote:
>> >$ devlink resource show pci/0000:03:00.0
>> >pci/0000:03:00.0:
>> > name kvd size 245760 size_valid true
>> > resources:
>> > name linear size 98304 occ 0
>> > name hash_double size 60416
>> > name hash_single size 87040
>> >
>> >So this 2700 has 3 resources that can be managed -- some table or
>> >resource or something named 'kvd' with linear, hash_double and
>> >hash_single sub-resources. What are these names referring too? The above
>> >output gives no description, and 'kvd' is not an industry term. Further,
>>
>> This are internal resources specific to the ASIC. Would you like some
>> description to each or something like that?
>
>The fact you have decided to expose them means you expect people to
>change them. So yes, they need to be documented. Maybe add something
>to Documentation/ABI/stable/
>
>> So the showed relation to dpipe table would be enougn or you would still
>> like to see some description? I don't like the description concept here
>> as the relations to dpipe table should tell user exactly what he needs
>> to know.
>
>Documenting the ABI is good practice.
This is misunderstanding I believe. This is not about ABI. That is well
defined by the netlink attributes. This is about meaning of particular
ASIC-specific internal resources.
>
> Andrew
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 9:37 ` Jiri Pirko
@ 2017-12-27 13:08 ` Andrew Lunn
2017-12-27 13:15 ` Jiri Pirko
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2017-12-27 13:08 UTC (permalink / raw)
To: Jiri Pirko
Cc: David Ahern, netdev, davem, arkadis, mlxsw, vivien.didelot,
f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz, roopa
> This is misunderstanding I believe. This is not about ABI. That is well
> defined by the netlink attributes. This is about meaning of particular
> ASIC-specific internal resources.
I would agree that the netlink attributed are clearly defined. But the
meta information, what this ASIC specific internal resource means when
you combine these attributes, is unclear. This meta information is
also part of the ABI, and documenting giving users a hit what it
means, and why they should change it, would be good practice.
Look at sysfs. open/read/write are clearly defined, which is the
equivalent of the netlink attributes. The meta information we document
in Documentation/ABI/, what a file name means, what a value means,
what other values it can take, etc.
Andrew
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 13:08 ` Andrew Lunn
@ 2017-12-27 13:15 ` Jiri Pirko
2017-12-27 16:38 ` David Ahern
2017-12-27 19:31 ` Andrew Lunn
0 siblings, 2 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-27 13:15 UTC (permalink / raw)
To: Andrew Lunn
Cc: David Ahern, netdev, davem, arkadis, mlxsw, vivien.didelot,
f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz, roopa
Wed, Dec 27, 2017 at 02:08:03PM CET, andrew@lunn.ch wrote:
>> This is misunderstanding I believe. This is not about ABI. That is well
>> defined by the netlink attributes. This is about meaning of particular
>> ASIC-specific internal resources.
>
>I would agree that the netlink attributed are clearly defined. But the
>meta information, what this ASIC specific internal resource means when
>you combine these attributes, is unclear. This meta information is
>also part of the ABI, and documenting giving users a hit what it
>means, and why they should change it, would be good practice.
>
>Look at sysfs. open/read/write are clearly defined, which is the
>equivalent of the netlink attributes. The meta information we document
>in Documentation/ABI/, what a file name means, what a value means,
>what other values it can take, etc.
Hmm. That documents mainly sysfs. No mention of Netlink at all. But
maybe I missed it. Also, that defines the interface as is. However we
are talking about the data exchanged over the interface, not the
interface itself. I don't see how ASIC/HW specific thing, like for
example KVD in our case could be part of kernel ABI. That makes 0 sense
to me, sorry.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 8:09 ` Jiri Pirko
2017-12-27 8:23 ` Andrew Lunn
2017-12-27 8:28 ` Yuval Mintz
@ 2017-12-27 16:34 ` David Ahern
2017-12-27 19:43 ` Roopa Prabhu
` (2 more replies)
2 siblings, 3 replies; 61+ messages in thread
From: David Ahern @ 2017-12-27 16:34 UTC (permalink / raw)
To: Jiri Pirko
Cc: netdev, davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, roopa
On 12/27/17 2:09 AM, Jiri Pirko wrote:
> Wed, Dec 27, 2017 at 05:05:09AM CET, dsa@cumulusnetworks.com wrote:
>> On 12/26/17 5:23 AM, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> Many of the ASIC's internal resources are limited and are shared between
>>> several hardware procedures. For example, unified hash-based memory can
>>> be used for many lookup purposes, like FDB and LPM. In many cases the user
>>> can provide a partitioning scheme for such a resource in order to perform
>>> fine tuning for his application. In such cases performing driver reload is
>>> needed for the changes to take place, thus this patchset also adds support
>>> for hot reload.
>>>
>>> Such an abstraction can be coupled with devlink's dpipe interface, which
>>> models the ASIC's pipeline as a graph of match/action tables. By modeling
>>> the hardware resource object, and by coupling it to several dpipe tables,
>>> further visibility can be achieved in order to debug ASIC-wide issues.
>>>
>>> The proposed interface will provide the user the ability to understand the
>>> limitations of the hardware, and receive notification regarding its occupancy.
>>> Furthermore, monitoring the resource occupancy can be done in real-time and
>>> can be useful in many cases.
>>
>> In the last RFC (not v1, but RFC) I asked for some kind of description
>> for each resource, and you and Arkadi have pushed back. Let's walk
>> through an example to see what I mean:
>>
>> $ devlink resource show pci/0000:03:00.0
>> pci/0000:03:00.0:
>> name kvd size 245760 size_valid true
>> resources:
>> name linear size 98304 occ 0
>> name hash_double size 60416
>> name hash_single size 87040
>>
>> So this 2700 has 3 resources that can be managed -- some table or
>> resource or something named 'kvd' with linear, hash_double and
>> hash_single sub-resources. What are these names referring too? The above
>> output gives no description, and 'kvd' is not an industry term. Further,
>
> This are internal resources specific to the ASIC. Would you like some
> description to each or something like that?
devlink has some nice self-documenting capabilities. What's missing here
is a description of what the resource is used for in standard terms --
ipv4 host routes, fdb, nexthops, rifs, etc. Even if the description is a
short list versus an exhaustive list of everything it is used for. e.g.,
Why would a user decrease linear and increase hash_single or vice versa?
>
>
>> what are these sizes that a user can control? The output contains no
>> units, no description, nothing. In short, the above output provides
>> random numbers associated with random names.
>
> Units are now exposed from kernel, just this version of iproute2 patch
> does not display it.
please provide an iproute2 patch that does so the full context if this
patch set can be reviewed from a user perspective.
>
>
>>
>> I can see dpipe tables exported by this device:
>>
>> $ devlink dpipe header show pci/0000:03:00.0
>>
>> pci/0000:03:00.0:
>> name mlxsw_meta
>> field:
>> name erif_port bitwidth 32 mapping_type ifindex
>> name l3_forward bitwidth 1
>> name l3_drop bitwidth 1
>> name adj_index bitwidth 32
>> name adj_size bitwidth 32
>> name adj_hash_index bitwidth 32
>>
>> name ipv6
>> field:
>> name destination ip bitwidth 128
>>
>> name ipv4
>> field:
>> name destination ip bitwidth 32
>>
>> name ethernet
>> field:
>> name destination mac bitwidth 48
>>
>> but none mention 'kvd' or 'linear' or 'hash" and none of the other
>> various devlink options:
>>
>> $ devlink
>> Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }
>> where OBJECT := { dev | port | sb | monitor | dpipe }
>>
>> seem to related to resources.
>>
>> So how does a user know what they are controlling by this 'resource'
>> option? Is the user expected to have a PRM or user guide on hand for the
>> specific device model that is being configured?
>
> The relation of specific dpipe table to specific resource is exposed by
> the kernel as well. Probably the iproute2 patch just does not display
> it.
please provide an iproute2 patch that does so the full context if this
patch set can be reviewed from a user perspective.
>
>
>>
>> Again, I have no objections to kvd, linear, hash, etc terms as they do
>> relate to Mellanox products. But kvd/linear, for example, does correlate
>> to industry standard concepts in some way. My request is that the
>> resource listing guide the user in some way, stating what these
>> resources mean.
>
> So the showed relation to dpipe table would be enougn or you would still
> like to see some description? I don't like the description concept here
> as the relations to dpipe table should tell user exactly what he needs
> to know.
I believe it is useful to have a 1-line, short description that gives
the user some memory jogger as to what the resource is used for. It does
not have to be an exhaustive list, but the user should not have to do
mental jumping jacks running a bunch of commands to understand the
resources for vendor specific asics.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 13:15 ` Jiri Pirko
@ 2017-12-27 16:38 ` David Ahern
2017-12-27 19:31 ` Andrew Lunn
1 sibling, 0 replies; 61+ messages in thread
From: David Ahern @ 2017-12-27 16:38 UTC (permalink / raw)
To: Jiri Pirko, Andrew Lunn
Cc: netdev, davem, arkadis, mlxsw, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, roopa
On 12/27/17 7:15 AM, Jiri Pirko wrote:
> Wed, Dec 27, 2017 at 02:08:03PM CET, andrew@lunn.ch wrote:
>>> This is misunderstanding I believe. This is not about ABI. That is well
>>> defined by the netlink attributes. This is about meaning of particular
>>> ASIC-specific internal resources.
>>
>> I would agree that the netlink attributed are clearly defined. But the
>> meta information, what this ASIC specific internal resource means when
>> you combine these attributes, is unclear. This meta information is
>> also part of the ABI, and documenting giving users a hit what it
>> means, and why they should change it, would be good practice.
>>
>> Look at sysfs. open/read/write are clearly defined, which is the
>> equivalent of the netlink attributes. The meta information we document
>> in Documentation/ABI/, what a file name means, what a value means,
>> what other values it can take, etc.
>
> Hmm. That documents mainly sysfs. No mention of Netlink at all. But
> maybe I missed it. Also, that defines the interface as is. However we
> are talking about the data exchanged over the interface, not the
> interface itself. I don't see how ASIC/HW specific thing, like for
> example KVD in our case could be part of kernel ABI. That makes 0 sense
> to me, sorry.
>
IMO, kernel documentation is not much better than vendor docs. A general
networking admin may not have access to a kernel tree or have vendor
docs at their finger tips. As I mentioned in the previous response,
devlink attributes have self-documenting capabilities making that
information available inline. That to me is the right place for a short,
but reasonably sized description. For in-depth details users can
cross-reference vendor docs.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 13:15 ` Jiri Pirko
2017-12-27 16:38 ` David Ahern
@ 2017-12-27 19:31 ` Andrew Lunn
2017-12-27 20:16 ` David Ahern
1 sibling, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2017-12-27 19:31 UTC (permalink / raw)
To: Jiri Pirko
Cc: David Ahern, netdev, davem, arkadis, mlxsw, vivien.didelot,
f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz, roopa
> Hmm. That documents mainly sysfs. No mention of Netlink at all. But
> maybe I missed it. Also, that defines the interface as is. However we
> are talking about the data exchanged over the interface, not the
> interface itself. I don't see how ASIC/HW specific thing, like for
> example KVD in our case could be part of kernel ABI.
You need to be very careful here. As soon as somebody starts using it,
it might become an ABI. Or you need to clearly document it is not ABI,
there is no guarantee it will not disappear or change its meaning in
the next kernel, and it should be used with extreme caution.
Personally, if DSA drivers were to expose such settings, i would
consider them ABI, which people can rely on to remain stable.
Andrew
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 16:34 ` David Ahern
@ 2017-12-27 19:43 ` Roopa Prabhu
2017-12-28 8:21 ` Yuval Mintz
2017-12-27 20:15 ` Arkadi Sharshevsky
2017-12-28 8:25 ` Yuval Mintz
2 siblings, 1 reply; 61+ messages in thread
From: Roopa Prabhu @ 2017-12-27 19:43 UTC (permalink / raw)
To: David Ahern
Cc: Jiri Pirko, netdev, David Miller, Arkadi Sharshevsky, mlxsw,
Andrew Lunn, Vivien Didelot, Florian Fainelli, Michael Chan,
ganeshgr, Saeed Mahameed, matanb, leonro, Ido Schimmel,
jakub.kicinski, ast, Daniel Borkmann, Simon Horman,
pieter.jansenvanvuuren, john.hurley, Alexander Duyck,
John W. Linville, Andy Gospodarek <gospo@
On Wed, Dec 27, 2017 at 8:34 AM, David Ahern <dsa@cumulusnetworks.com> wrote:
> On 12/27/17 2:09 AM, Jiri Pirko wrote:
>> Wed, Dec 27, 2017 at 05:05:09AM CET, dsa@cumulusnetworks.com wrote:
>>> On 12/26/17 5:23 AM, Jiri Pirko wrote:
>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>> Many of the ASIC's internal resources are limited and are shared between
>>>> several hardware procedures. For example, unified hash-based memory can
>>>> be used for many lookup purposes, like FDB and LPM. In many cases the user
>>>> can provide a partitioning scheme for such a resource in order to perform
>>>> fine tuning for his application. In such cases performing driver reload is
>>>> needed for the changes to take place, thus this patchset also adds support
>>>> for hot reload.
>>>>
>>>> Such an abstraction can be coupled with devlink's dpipe interface, which
>>>> models the ASIC's pipeline as a graph of match/action tables. By modeling
>>>> the hardware resource object, and by coupling it to several dpipe tables,
>>>> further visibility can be achieved in order to debug ASIC-wide issues.
>>>>
>>>> The proposed interface will provide the user the ability to understand the
>>>> limitations of the hardware, and receive notification regarding its occupancy.
>>>> Furthermore, monitoring the resource occupancy can be done in real-time and
>>>> can be useful in many cases.
>>>
>>> In the last RFC (not v1, but RFC) I asked for some kind of description
>>> for each resource, and you and Arkadi have pushed back. Let's walk
>>> through an example to see what I mean:
>>>
>>> $ devlink resource show pci/0000:03:00.0
>>> pci/0000:03:00.0:
>>> name kvd size 245760 size_valid true
>>> resources:
>>> name linear size 98304 occ 0
>>> name hash_double size 60416
>>> name hash_single size 87040
>>>
>>> So this 2700 has 3 resources that can be managed -- some table or
>>> resource or something named 'kvd' with linear, hash_double and
>>> hash_single sub-resources. What are these names referring too? The above
>>> output gives no description, and 'kvd' is not an industry term. Further,
>>
>> This are internal resources specific to the ASIC. Would you like some
>> description to each or something like that?
>
> devlink has some nice self-documenting capabilities. What's missing here
> is a description of what the resource is used for in standard terms --
> ipv4 host routes, fdb, nexthops, rifs, etc. Even if the description is a
> short list versus an exhaustive list of everything it is used for. e.g.,
> Why would a user decrease linear and increase hash_single or vice versa?
Arkadi, on what david says above, can the resource names and ids not
be driver specific, but moved up to the switchdev layer and just map
to fdb, host routes, nexthops table sizes etc ?.
Can these generic networking resources then in-turn be mapped to kvd
sizes by the driver ?
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 16:34 ` David Ahern
2017-12-27 19:43 ` Roopa Prabhu
@ 2017-12-27 20:15 ` Arkadi Sharshevsky
2017-12-28 8:25 ` Yuval Mintz
2 siblings, 0 replies; 61+ messages in thread
From: Arkadi Sharshevsky @ 2017-12-27 20:15 UTC (permalink / raw)
To: David Ahern, Jiri Pirko
Cc: netdev, davem, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, roopa
On 12/27/2017 06:34 PM, David Ahern wrote:
> On 12/27/17 2:09 AM, Jiri Pirko wrote:
>> Wed, Dec 27, 2017 at 05:05:09AM CET, dsa@cumulusnetworks.com wrote:
>>> On 12/26/17 5:23 AM, Jiri Pirko wrote:
>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>> Many of the ASIC's internal resources are limited and are shared between
>>>> several hardware procedures. For example, unified hash-based memory can
>>>> be used for many lookup purposes, like FDB and LPM. In many cases the user
>>>> can provide a partitioning scheme for such a resource in order to perform
>>>> fine tuning for his application. In such cases performing driver reload is
>>>> needed for the changes to take place, thus this patchset also adds support
>>>> for hot reload.
>>>>
>>>> Such an abstraction can be coupled with devlink's dpipe interface, which
>>>> models the ASIC's pipeline as a graph of match/action tables. By modeling
>>>> the hardware resource object, and by coupling it to several dpipe tables,
>>>> further visibility can be achieved in order to debug ASIC-wide issues.
>>>>
>>>> The proposed interface will provide the user the ability to understand the
>>>> limitations of the hardware, and receive notification regarding its occupancy.
>>>> Furthermore, monitoring the resource occupancy can be done in real-time and
>>>> can be useful in many cases.
>>>
>>> In the last RFC (not v1, but RFC) I asked for some kind of description
>>> for each resource, and you and Arkadi have pushed back. Let's walk
>>> through an example to see what I mean:
>>>
>>> $ devlink resource show pci/0000:03:00.0
>>> pci/0000:03:00.0:
>>> name kvd size 245760 size_valid true
>>> resources:
>>> name linear size 98304 occ 0
>>> name hash_double size 60416
>>> name hash_single size 87040
>>>
>>> So this 2700 has 3 resources that can be managed -- some table or
>>> resource or something named 'kvd' with linear, hash_double and
>>> hash_single sub-resources. What are these names referring too? The above
>>> output gives no description, and 'kvd' is not an industry term. Further,
>>
>> This are internal resources specific to the ASIC. Would you like some
>> description to each or something like that?
>
> devlink has some nice self-documenting capabilities. What's missing here
> is a description of what the resource is used for in standard terms --
> ipv4 host routes, fdb, nexthops, rifs, etc. Even if the description is a
> short list versus an exhaustive list of everything it is used for. e.g.,
> Why would a user decrease linear and increase hash_single or vice versa?
>
>>
>>
>>> what are these sizes that a user can control? The output contains no
>>> units, no description, nothing. In short, the above output provides
>>> random numbers associated with random names.
>>
>> Units are now exposed from kernel, just this version of iproute2 patch
>> does not display it.
>
> please provide an iproute2 patch that does so the full context if this
> patch set can be reviewed from a user perspective.
>
>>
>>
>>>
>>> I can see dpipe tables exported by this device:
>>>
>>> $ devlink dpipe header show pci/0000:03:00.0
>>>
>>> pci/0000:03:00.0:
>>> name mlxsw_meta
>>> field:
>>> name erif_port bitwidth 32 mapping_type ifindex
>>> name l3_forward bitwidth 1
>>> name l3_drop bitwidth 1
>>> name adj_index bitwidth 32
>>> name adj_size bitwidth 32
>>> name adj_hash_index bitwidth 32
>>>
>>> name ipv6
>>> field:
>>> name destination ip bitwidth 128
>>>
>>> name ipv4
>>> field:
>>> name destination ip bitwidth 32
>>>
>>> name ethernet
>>> field:
>>> name destination mac bitwidth 48
>>>
>>> but none mention 'kvd' or 'linear' or 'hash" and none of the other
>>> various devlink options:
>>>
>>> $ devlink
>>> Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }
>>> where OBJECT := { dev | port | sb | monitor | dpipe }
>>>
>>> seem to related to resources.
>>>
>>> So how does a user know what they are controlling by this 'resource'
>>> option? Is the user expected to have a PRM or user guide on hand for the
>>> specific device model that is being configured?
>>
>> The relation of specific dpipe table to specific resource is exposed by
>> the kernel as well. Probably the iproute2 patch just does not display
>> it.
>
> please provide an iproute2 patch that does so the full context if this
> patch set can be reviewed from a user perspective.
>
As Yuval stated you are using the wrong command here.
You are printing the headers not the tables. On each dpipe
table you can see the resource it is using (the resource
path aka host table uses /kvd/hash_single for example).
This is already working. Just try it.
>>
>>
>>>
>>> Again, I have no objections to kvd, linear, hash, etc terms as they do
>>> relate to Mellanox products. But kvd/linear, for example, does correlate
>>> to industry standard concepts in some way. My request is that the
>>> resource listing guide the user in some way, stating what these
>>> resources mean.
>>
>> So the showed relation to dpipe table would be enougn or you would still
>> like to see some description? I don't like the description concept here
>> as the relations to dpipe table should tell user exactly what he needs
>> to know.
>
> I believe it is useful to have a 1-line, short description that gives
> the user some memory jogger as to what the resource is used for. It does
> not have to be an exhaustive list, but the user should not have to do
> mental jumping jacks running a bunch of commands to understand the
> resources for vendor specific asics.
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 19:31 ` Andrew Lunn
@ 2017-12-27 20:16 ` David Ahern
0 siblings, 0 replies; 61+ messages in thread
From: David Ahern @ 2017-12-27 20:16 UTC (permalink / raw)
To: Andrew Lunn, Jiri Pirko
Cc: netdev, davem, arkadis, mlxsw, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, roopa
On 12/27/17 1:31 PM, Andrew Lunn wrote:
>> Hmm. That documents mainly sysfs. No mention of Netlink at all. But
>> maybe I missed it. Also, that defines the interface as is. However we
>> are talking about the data exchanged over the interface, not the
>> interface itself. I don't see how ASIC/HW specific thing, like for
>> example KVD in our case could be part of kernel ABI.
>
> You need to be very careful here. As soon as somebody starts using it,
> it might become an ABI. Or you need to clearly document it is not ABI,
> there is no guarantee it will not disappear or change its meaning in
> the next kernel, and it should be used with extreme caution.
>
+1
Once the names go in, people can write scripts that invoke devlink at
boot to partition resources. With the proposed patch set, the name
(e.g., kvd/linear) becomes part of the ABI.
^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 19:43 ` Roopa Prabhu
@ 2017-12-28 8:21 ` Yuval Mintz
2017-12-30 21:15 ` David Ahern
0 siblings, 1 reply; 61+ messages in thread
From: Yuval Mintz @ 2017-12-28 8:21 UTC (permalink / raw)
To: Roopa Prabhu, David Ahern
Cc: Jiri Pirko, netdev@vger.kernel.org, David Miller,
Arkadi Sharshevsky, mlxsw, Andrew Lunn, Vivien Didelot,
Florian Fainelli, Michael Chan, ganeshgr@chelsio.com,
Saeed Mahameed, Matan Barak, Leon Romanovsky, Ido Schimmel,
jakub.kicinski@netronome.com, ast@kernel.org, Daniel Borkmann,
Simon Horman, "pieter.jansenvanvuuren@
> >>>> Many of the ASIC's internal resources are limited and are shared
> between
> >>>> several hardware procedures. For example, unified hash-based memory
> can
> >>>> be used for many lookup purposes, like FDB and LPM. In many cases the
> user
> >>>> can provide a partitioning scheme for such a resource in order to
> perform
> >>>> fine tuning for his application. In such cases performing driver reload is
> >>>> needed for the changes to take place, thus this patchset also adds
> support
> >>>> for hot reload.
> >>>>
> >>>> Such an abstraction can be coupled with devlink's dpipe interface, which
> >>>> models the ASIC's pipeline as a graph of match/action tables. By
> modeling
> >>>> the hardware resource object, and by coupling it to several dpipe tables,
> >>>> further visibility can be achieved in order to debug ASIC-wide issues.
> >>>>
> >>>> The proposed interface will provide the user the ability to understand
> the
> >>>> limitations of the hardware, and receive notification regarding its
> occupancy.
> >>>> Furthermore, monitoring the resource occupancy can be done in real-
> time and
> >>>> can be useful in many cases.
> >>>
> >>> In the last RFC (not v1, but RFC) I asked for some kind of description
> >>> for each resource, and you and Arkadi have pushed back. Let's walk
> >>> through an example to see what I mean:
> >>>
> >>> $ devlink resource show pci/0000:03:00.0
> >>> pci/0000:03:00.0:
> >>> name kvd size 245760 size_valid true
> >>> resources:
> >>> name linear size 98304 occ 0
> >>> name hash_double size 60416
> >>> name hash_single size 87040
> >>>
> >>> So this 2700 has 3 resources that can be managed -- some table or
> >>> resource or something named 'kvd' with linear, hash_double and
> >>> hash_single sub-resources. What are these names referring too? The
> above
> >>> output gives no description, and 'kvd' is not an industry term. Further,
> >>
> >> This are internal resources specific to the ASIC. Would you like some
> >> description to each or something like that?
> >
> > devlink has some nice self-documenting capabilities. What's missing here
> > is a description of what the resource is used for in standard terms --
> > ipv4 host routes, fdb, nexthops, rifs, etc. Even if the description is a
> > short list versus an exhaustive list of everything it is used for. e.g.,
> > Why would a user decrease linear and increase hash_single or vice versa?
>
>
> Arkadi, on what david says above, can the resource names and ids not
> be driver specific, but moved up to the switchdev layer and just map
> to fdb, host routes, nexthops table sizes etc ?.
> Can these generic networking resources then in-turn be mapped to kvd
> sizes by the driver ?
I think it goes the other way around. The dpipe tables are the ones that
can be translated to functionality; The resources are internal and HW-specific
representing the possible internal division of resources -
but a given resource sn't necessarily mapped to a single networking feature.
[It might be in some cases, but not in the general case]
You could always move to a structured approach where each resource
in the hierarchy is further split to sub-resources until each leaf represents
a single network concepts - but that would stop be an abstraction of the
HW resources and become a SW implementation instead, as SW would
have to be the one to maintain and enforce the resource distribution.
And that's not what we're trying to achieve here.
^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-27 16:34 ` David Ahern
2017-12-27 19:43 ` Roopa Prabhu
2017-12-27 20:15 ` Arkadi Sharshevsky
@ 2017-12-28 8:25 ` Yuval Mintz
2017-12-28 16:09 ` David Ahern
2 siblings, 1 reply; 61+ messages in thread
From: Yuval Mintz @ 2017-12-28 8:25 UTC (permalink / raw)
To: David Ahern, Jiri Pirko
Cc: netdev@vger.kernel.org, davem@davemloft.net, Arkadi Sharshevsky,
mlxsw, andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
f.fainelli@gmail.com, michael.chan@broadcom.com,
ganeshgr@chelsio.com, Saeed Mahameed, Matan Barak,
Leon Romanovsky, Ido Schimmel, jakub.kicinski@netronome.com,
ast@kernel.org, daniel@iogearbox.net,
"simon.horman@netronome.com" <simon.horman
> >>> Many of the ASIC's internal resources are limited and are shared
> between
> >>> several hardware procedures. For example, unified hash-based memory
> can
> >>> be used for many lookup purposes, like FDB and LPM. In many cases the
> user
> >>> can provide a partitioning scheme for such a resource in order to
> perform
> >>> fine tuning for his application. In such cases performing driver reload is
> >>> needed for the changes to take place, thus this patchset also adds support
> >>> for hot reload.
> >>>
> >>> Such an abstraction can be coupled with devlink's dpipe interface, which
> >>> models the ASIC's pipeline as a graph of match/action tables. By
> modeling
> >>> the hardware resource object, and by coupling it to several dpipe tables,
> >>> further visibility can be achieved in order to debug ASIC-wide issues.
> >>>
> >>> The proposed interface will provide the user the ability to understand the
> >>> limitations of the hardware, and receive notification regarding its
> occupancy.
> >>> Furthermore, monitoring the resource occupancy can be done in real-
> time and
> >>> can be useful in many cases.
> >>
> >> In the last RFC (not v1, but RFC) I asked for some kind of description
> >> for each resource, and you and Arkadi have pushed back. Let's walk
> >> through an example to see what I mean:
> >>
> >> $ devlink resource show pci/0000:03:00.0
> >> pci/0000:03:00.0:
> >> name kvd size 245760 size_valid true
> >> resources:
> >> name linear size 98304 occ 0
> >> name hash_double size 60416
> >> name hash_single size 87040
> >>
> >> So this 2700 has 3 resources that can be managed -- some table or
> >> resource or something named 'kvd' with linear, hash_double and
> >> hash_single sub-resources. What are these names referring too? The
> above
> >> output gives no description, and 'kvd' is not an industry term. Further,
> >
> > This are internal resources specific to the ASIC. Would you like some
> > description to each or something like that?
>
> devlink has some nice self-documenting capabilities. What's missing here
> is a description of what the resource is used for in standard terms --
> ipv4 host routes, fdb, nexthops, rifs, etc. Even if the description is a
> short list versus an exhaustive list of everything it is used for. e.g.,
> Why would a user decrease linear and increase hash_single or vice versa?
>
> >
> >
> >> what are these sizes that a user can control? The output contains no
> >> units, no description, nothing. In short, the above output provides
> >> random numbers associated with random names.
> >
> > Units are now exposed from kernel, just this version of iproute2 patch
> > does not display it.
>
> please provide an iproute2 patch that does so the full context if this
> patch set can be reviewed from a user perspective.
>
> >
> >
> >>
> >> I can see dpipe tables exported by this device:
> >>
> >> $ devlink dpipe header show pci/0000:03:00.0
> >>
> >> pci/0000:03:00.0:
> >> name mlxsw_meta
> >> field:
> >> name erif_port bitwidth 32 mapping_type ifindex
> >> name l3_forward bitwidth 1
> >> name l3_drop bitwidth 1
> >> name adj_index bitwidth 32
> >> name adj_size bitwidth 32
> >> name adj_hash_index bitwidth 32
> >>
> >> name ipv6
> >> field:
> >> name destination ip bitwidth 128
> >>
> >> name ipv4
> >> field:
> >> name destination ip bitwidth 32
> >>
> >> name ethernet
> >> field:
> >> name destination mac bitwidth 48
> >>
> >> but none mention 'kvd' or 'linear' or 'hash" and none of the other
> >> various devlink options:
> >>
> >> $ devlink
> >> Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }
> >> where OBJECT := { dev | port | sb | monitor | dpipe }
> >>
> >> seem to related to resources.
> >>
> >> So how does a user know what they are controlling by this 'resource'
> >> option? Is the user expected to have a PRM or user guide on hand for the
> >> specific device model that is being configured?
> >
> > The relation of specific dpipe table to specific resource is exposed by
> > the kernel as well. Probably the iproute2 patch just does not display
> > it.
>
> please provide an iproute2 patch that does so the full context if this
> patch set can be reviewed from a user perspective.
>
> >
> >
> >>
> >> Again, I have no objections to kvd, linear, hash, etc terms as they do
> >> relate to Mellanox products. But kvd/linear, for example, does correlate
> >> to industry standard concepts in some way. My request is that the
> >> resource listing guide the user in some way, stating what these
> >> resources mean.
> >
> > So the showed relation to dpipe table would be enougn or you would still
> > like to see some description? I don't like the description concept here
> > as the relations to dpipe table should tell user exactly what he needs
> > to know.
>
> I believe it is useful to have a 1-line, short description that gives
> the user some memory jogger as to what the resource is used for. It does
> not have to be an exhaustive list, but the user should not have to do
> mental jumping jacks running a bunch of commands to understand the
> resources for vendor specific asics.
Perhaps we can simply have devlink utility output the dpipe
table[s] associated with the resource when showing the resource?
It would contain live information as well as prevent the need for
'mental jumping jacks'.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-28 8:25 ` Yuval Mintz
@ 2017-12-28 16:09 ` David Ahern
2017-12-28 16:23 ` Jiri Pirko
0 siblings, 1 reply; 61+ messages in thread
From: David Ahern @ 2017-12-28 16:09 UTC (permalink / raw)
To: Yuval Mintz, Jiri Pirko
Cc: netdev@vger.kernel.org, davem@davemloft.net, Arkadi Sharshevsky,
mlxsw, andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
f.fainelli@gmail.com, michael.chan@broadcom.com,
ganeshgr@chelsio.com, Saeed Mahameed, Matan Barak,
Leon Romanovsky, Ido Schimmel, jakub.kicinski@netronome.com,
ast@kernel.org, daniel@iogearbox.net, simon.horman@netronome.com
On 12/28/17 2:25 AM, Yuval Mintz wrote:
>>>> Again, I have no objections to kvd, linear, hash, etc terms as they do
>>>> relate to Mellanox products. But kvd/linear, for example, does correlate
>>>> to industry standard concepts in some way. My request is that the
>>>> resource listing guide the user in some way, stating what these
>>>> resources mean.
>>>
>>> So the showed relation to dpipe table would be enougn or you would still
>>> like to see some description? I don't like the description concept here
>>> as the relations to dpipe table should tell user exactly what he needs
>>> to know.
>>
>> I believe it is useful to have a 1-line, short description that gives
>> the user some memory jogger as to what the resource is used for. It does
>> not have to be an exhaustive list, but the user should not have to do
>> mental jumping jacks running a bunch of commands to understand the
>> resources for vendor specific asics.
>
> Perhaps we can simply have devlink utility output the dpipe
> table[s] associated with the resource when showing the resource?
> It would contain live information as well as prevent the need for
> 'mental jumping jacks'.
>
My primary contention for this static partitioning is that the proposal
is not giving the user the information they need to make decisions.
As I mentioned earlier, the resource show command gives this:
$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 size_valid true
resources:
name linear size 98304 occ 0
name hash_double size 60416
name hash_single size 87040
the paths /kvd/linear, /kvd/hash_single and /kvd/hash_double are
essentially random names (nothing related to industry standard names)
and the listed sizes are random numbers (no units)[1]. There is nothing
there to tell a user what they can adjust or why they would want to make
an adjustment.
Looking at 'dpipe table show':
$ devlink dpipe table show pci/0000:03:00.0
pci/0000:03:00.0:
name mlxsw_erif size 1000 counters_enabled false
match:
type field_exact header mlxsw_meta field erif_port mapping ifindex
action:
type field_modify header mlxsw_meta field l3_forward
type field_modify header mlxsw_meta field l3_drop
resource_path /kvd/hash_single name mlxsw_host4 size 62
counters_enabled false
match:
type field_exact header mlxsw_meta field erif_port mapping ifindex
type field_exact header ipv4 field destination ip
action:
type field_modify header ethernet field destination mac
resource_path /kvd/hash_double name mlxsw_host6 size 0
counters_enabled false
match:
type field_exact header mlxsw_meta field erif_port mapping ifindex
type field_exact header ipv6 field destination ip
action:
type field_modify header ethernet field destination mac
resource_path /kvd/linear name mlxsw_adj size 0 counters_enabled false
match:
type field_exact header mlxsw_meta field adj_index
type field_exact header mlxsw_meta field adj_size
type field_exact header mlxsw_meta field adj_hash_index
action:
type field_modify header ethernet field destination mac
type field_modify header mlxsw_meta field erif_port mapping ifindex
So there are 4 tables exported to userspace:
1. mlxsw_erif table which is not in any of the kvd regions (no resource
path is given) and it has a size of 1000. Does mlxsw_erif mean a rif as
in Router Interfaces? So the switch supports up to 1000 router interfaces.
2. mlxsw_host4 in /kvd/hash_single with a size of 62. Based on the
fields mlxsw_host4 means IPv4 host entries (neighbor entries). Why is
the size set at 62? Seems really low.
3. mlxsw_host6 in /kvd/hash_double with a size of 0. Based on the fields
mlxsw_host6 means IPv6 host entries (neighbor entries). The size of 0 is
concerning. I guess the switch is not configured to do IPv6?
4. mlxsw_adj in /kvd/linear with a size of 0. Based on the fields I am
going to guess it is an fdb entry????
So if I combine the output of "resource show" and "dpipe table show", I
can conclude:
1. /kvd/linear is only used for fdb entries. So if I want an L3 only use
case I can set /kvd/linear 0. Is that correct? I believe the answer is
no, but based on the information given it seems that way.
2. /kvd/hash_double has a size of 60416, but it is only used for
mlxsw_host6 table which has a size of 0. Now, I am confused.
3. /kvd/hash_single has a size of 87040, but it is only used for
mlxsw_host4 table which has a size of 62. Again confused.
What is a user to make of this output? How is it useful for making
decisions on whether to increase or decrease some resource?
----
[1] In a response, Jiri mentioned units are added by this patch set but
all I see in the patches is this:
@@ -245,4 +256,8 @@ enum devlink_dpipe_header_id {
DEVLINK_DPIPE_HEADER_IPV6,
};
+enum devlink_resource_unit {
+ DEVLINK_RESOURCE_UNIT_DOUBLE_WORD,
+};
+
#endif /* _UAPI_LINUX_DEVLINK_H_ */
DEVLINK_RESOURCE_UNIT_DOUBLE_WORD means what???
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-28 16:09 ` David Ahern
@ 2017-12-28 16:23 ` Jiri Pirko
2017-12-28 16:33 ` David Ahern
0 siblings, 1 reply; 61+ messages in thread
From: Jiri Pirko @ 2017-12-28 16:23 UTC (permalink / raw)
To: David Ahern
Cc: Yuval Mintz, netdev@vger.kernel.org, davem@davemloft.net,
Arkadi Sharshevsky, mlxsw, andrew@lunn.ch,
vivien.didelot@savoirfairelinux.com, f.fainelli@gmail.com,
michael.chan@broadcom.com, ganeshgr@chelsio.com, Saeed Mahameed,
Matan Barak, Leon Romanovsky, Ido Schimmel,
jakub.kicinski@netronome.com, ast@kernel.org,
daniel@iogearbox.net, "
Thu, Dec 28, 2017 at 05:09:09PM CET, dsa@cumulusnetworks.com wrote:
>On 12/28/17 2:25 AM, Yuval Mintz wrote:
>>>>> Again, I have no objections to kvd, linear, hash, etc terms as they do
>>>>> relate to Mellanox products. But kvd/linear, for example, does correlate
>>>>> to industry standard concepts in some way. My request is that the
>>>>> resource listing guide the user in some way, stating what these
>>>>> resources mean.
>>>>
>>>> So the showed relation to dpipe table would be enougn or you would still
>>>> like to see some description? I don't like the description concept here
>>>> as the relations to dpipe table should tell user exactly what he needs
>>>> to know.
>>>
>>> I believe it is useful to have a 1-line, short description that gives
>>> the user some memory jogger as to what the resource is used for. It does
>>> not have to be an exhaustive list, but the user should not have to do
>>> mental jumping jacks running a bunch of commands to understand the
>>> resources for vendor specific asics.
>>
>> Perhaps we can simply have devlink utility output the dpipe
>> table[s] associated with the resource when showing the resource?
>> It would contain live information as well as prevent the need for
>> 'mental jumping jacks'.
>>
>
>My primary contention for this static partitioning is that the proposal
>is not giving the user the information they need to make decisions.
>
>As I mentioned earlier, the resource show command gives this:
>$ devlink resource show pci/0000:03:00.0
>pci/0000:03:00.0:
> name kvd size 245760 size_valid true
> resources:
> name linear size 98304 occ 0
> name hash_double size 60416
> name hash_single size 87040
>
>the paths /kvd/linear, /kvd/hash_single and /kvd/hash_double are
>essentially random names (nothing related to industry standard names)
Of course. There is no industry standard for internal ASIC
implementations. This is the same as for dpipe. There is no industry
standard for ASIC pipeline. dpipe visualizes it. This resource patch
visualizes the internal ASIC resources and their mapping to the
individual dpipe tables.
>and the listed sizes are random numbers (no units)[1]. There is nothing
>there to tell a user what they can adjust or why they would want to make
>an adjustment.
>
>
>Looking at 'dpipe table show':
>
>$ devlink dpipe table show pci/0000:03:00.0
>pci/0000:03:00.0:
> name mlxsw_erif size 1000 counters_enabled false
> match:
> type field_exact header mlxsw_meta field erif_port mapping ifindex
> action:
> type field_modify header mlxsw_meta field l3_forward
> type field_modify header mlxsw_meta field l3_drop
>
> resource_path /kvd/hash_single name mlxsw_host4 size 62
>counters_enabled false
> match:
> type field_exact header mlxsw_meta field erif_port mapping ifindex
> type field_exact header ipv4 field destination ip
> action:
> type field_modify header ethernet field destination mac
>
> resource_path /kvd/hash_double name mlxsw_host6 size 0
>counters_enabled false
> match:
> type field_exact header mlxsw_meta field erif_port mapping ifindex
> type field_exact header ipv6 field destination ip
> action:
> type field_modify header ethernet field destination mac
>
> resource_path /kvd/linear name mlxsw_adj size 0 counters_enabled false
> match:
> type field_exact header mlxsw_meta field adj_index
> type field_exact header mlxsw_meta field adj_size
> type field_exact header mlxsw_meta field adj_hash_index
> action:
> type field_modify header ethernet field destination mac
> type field_modify header mlxsw_meta field erif_port mapping ifindex
>
>
>So there are 4 tables exported to userspace:
>
>1. mlxsw_erif table which is not in any of the kvd regions (no resource
>path is given) and it has a size of 1000. Does mlxsw_erif mean a rif as
>in Router Interfaces? So the switch supports up to 1000 router interfaces.
>
>2. mlxsw_host4 in /kvd/hash_single with a size of 62. Based on the
Size tells you the actual size. It cannot give you max size. The reason
is simple. The resources are shared among multiple tables. That is
exactly what this resource patch makes visible.
>fields mlxsw_host4 means IPv4 host entries (neighbor entries). Why is
>the size set at 62? Seems really low.
>
>3. mlxsw_host6 in /kvd/hash_double with a size of 0. Based on the fields
>mlxsw_host6 means IPv6 host entries (neighbor entries). The size of 0 is
>concerning. I guess the switch is not configured to do IPv6?
>
>4. mlxsw_adj in /kvd/linear with a size of 0. Based on the fields I am
>going to guess it is an fdb entry????
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-28 16:23 ` Jiri Pirko
@ 2017-12-28 16:33 ` David Ahern
2017-12-28 16:43 ` Jiri Pirko
2017-12-29 17:09 ` Arkadi Sharshevsky
0 siblings, 2 replies; 61+ messages in thread
From: David Ahern @ 2017-12-28 16:33 UTC (permalink / raw)
To: Jiri Pirko
Cc: Yuval Mintz, netdev@vger.kernel.org, davem@davemloft.net,
Arkadi Sharshevsky, mlxsw, andrew@lunn.ch,
vivien.didelot@savoirfairelinux.com, f.fainelli@gmail.com,
michael.chan@broadcom.com, ganeshgr@chelsio.com, Saeed Mahameed,
Matan Barak, Leon Romanovsky, Ido Schimmel,
jakub.kicinski@netronome.com, ast@kernel.org,
daniel@iogearbox.net, "
On 12/28/17 10:23 AM, Jiri Pirko wrote:
>> So there are 4 tables exported to userspace:
>>
>> 1. mlxsw_erif table which is not in any of the kvd regions (no resource
>> path is given) and it has a size of 1000. Does mlxsw_erif mean a rif as
>> in Router Interfaces? So the switch supports up to 1000 router interfaces.
>>
>> 2. mlxsw_host4 in /kvd/hash_single with a size of 62. Based on the
> Size tells you the actual size. It cannot give you max size. The reason
> is simple. The resources are shared among multiple tables. That is
> exactly what this resource patch makes visible.
>
>
In the erif table, the 1000 is the max not current usage. I do not have
1000 interfaces:
$ ip -br li sh | wc -l
597
$ devlink dpipe table dump pci/0000:03:00.0 name mlxsw_erif
...
index 503
match_value:
type field_exact header mlxsw_meta field erif_port mapping ifindex
mapping_value 601 value 503
action_value:
type field_modify header mlxsw_meta field l3_forward value 1
The host4 table it is current size with no maximum.
The meaning of table size needs to be consistent across tables.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-28 16:33 ` David Ahern
@ 2017-12-28 16:43 ` Jiri Pirko
2017-12-29 17:09 ` Arkadi Sharshevsky
1 sibling, 0 replies; 61+ messages in thread
From: Jiri Pirko @ 2017-12-28 16:43 UTC (permalink / raw)
To: David Ahern
Cc: Yuval Mintz, netdev@vger.kernel.org, davem@davemloft.net,
Arkadi Sharshevsky, mlxsw, andrew@lunn.ch,
vivien.didelot@savoirfairelinux.com, f.fainelli@gmail.com,
michael.chan@broadcom.com, ganeshgr@chelsio.com, Saeed Mahameed,
Matan Barak, Leon Romanovsky, Ido Schimmel,
jakub.kicinski@netronome.com, ast@kernel.org,
daniel@iogearbox.net, "
Thu, Dec 28, 2017 at 05:33:58PM CET, dsa@cumulusnetworks.com wrote:
>On 12/28/17 10:23 AM, Jiri Pirko wrote:
>>> So there are 4 tables exported to userspace:
>>>
>>> 1. mlxsw_erif table which is not in any of the kvd regions (no resource
>>> path is given) and it has a size of 1000. Does mlxsw_erif mean a rif as
>>> in Router Interfaces? So the switch supports up to 1000 router interfaces.
>>>
>>> 2. mlxsw_host4 in /kvd/hash_single with a size of 62. Based on the
>> Size tells you the actual size. It cannot give you max size. The reason
>> is simple. The resources are shared among multiple tables. That is
>> exactly what this resource patch makes visible.
>>
>>
>
>In the erif table, the 1000 is the max not current usage. I do not have
I believe that is a bug in erif dpipe implementation
(mlxsw_sp_dpipe_table_erif_size_get) We'll fix that. Thanks!
>1000 interfaces:
>
>$ ip -br li sh | wc -l
>597
>
>
>$ devlink dpipe table dump pci/0000:03:00.0 name mlxsw_erif
>...
> index 503
> match_value:
> type field_exact header mlxsw_meta field erif_port mapping ifindex
>mapping_value 601 value 503
> action_value:
> type field_modify header mlxsw_meta field l3_forward value 1
>
>
>The host4 table it is current size with no maximum.
>
>The meaning of table size needs to be consistent across tables.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-28 16:33 ` David Ahern
2017-12-28 16:43 ` Jiri Pirko
@ 2017-12-29 17:09 ` Arkadi Sharshevsky
2017-12-30 10:18 ` Andrew Lunn
1 sibling, 1 reply; 61+ messages in thread
From: Arkadi Sharshevsky @ 2017-12-29 17:09 UTC (permalink / raw)
To: David Ahern, Jiri Pirko
Cc: Yuval Mintz, netdev@vger.kernel.org, davem@davemloft.net, mlxsw,
andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
f.fainelli@gmail.com, michael.chan@broadcom.com,
ganeshgr@chelsio.com, Saeed Mahameed, Matan Barak,
Leon Romanovsky, Ido Schimmel, jakub.kicinski@netronome.com,
ast@kernel.org, daniel@iogearbox.net, simon.horman@netronome.com
On 12/28/2017 06:33 PM, David Ahern wrote:
> On 12/28/17 10:23 AM, Jiri Pirko wrote:
>>> So there are 4 tables exported to userspace:
>>>
>>> 1. mlxsw_erif table which is not in any of the kvd regions (no
>>> resource path is given) and it has a size of 1000. Does
>>> mlxsw_erif mean a rif as in Router Interfaces? So the switch
>>> supports up to 1000 router interfaces.
>>>
>>> 2. mlxsw_host4 in /kvd/hash_single with a size of 62. Based on
>>> the
>> Size tells you the actual size. It cannot give you max size. The
>> reason is simple. The resources are shared among multiple tables.
>> That is exactly what this resource patch makes visible.
>>
>>
>
> In the erif table, the 1000 is the max not current usage. I do not
> have 1000 interfaces:
>
> $ ip -br li sh | wc -l 597
>
>
> $ devlink dpipe table dump pci/0000:03:00.0 name mlxsw_erif ... index
> 503 match_value: type field_exact header mlxsw_meta field erif_port
> mapping ifindex mapping_value 601 value 503 action_value: type
> field_modify header mlxsw_meta field l3_forward value 1
>
>
> The host4 table it is current size with no maximum.
>
> The meaning of table size needs to be consistent across tables.
>
You are right the egress RIF table size is not correct, I will
definitely fix it, but it is not what you think it should be. So in
order to clarify this point, just a reminder:
1. Both dpipe and devlink resource are abstraction models for
hardware entities, and as a result they true to provide generic objects.
Each driver/ASIC should register his own and it absolutely proprietary
implementation. There is absolutely NO industry standard here, the only
thing that resembles a standard is that dpipe looks a bit like P4 only
because its proved to be useful for describing packet forwarding
pipelines. The host4 table is just a hardware process in the mellanox
spectrum ASIC pipeline and it should not be part of ABI, sorry I clearly
don't understand how this even came up.
2. Dpipe table is a single hardware process, most of the time it uses
some resources (for example LPM algorithm uses hash memory).
3. ERIF table is a table that is located in the end of the L3 pipeline.
The current dpipe description is not complete and that why it caused
confusion. The table performs match on rif index and packet type
(UC/MC/BC) and performs forward/drop decision. As you can see, for each
rif the table can have several entries, which provide different
statistics for different traffic types per rif, currently only the UC
is exposed with forward.
4. ASICs use shared resource for many processes, this is exactly the
behavior we want to expose!
Again, the size of the ERIF table should NOT provide the number of
rifs which are in use, simply because dpipe tables do not describe
hardware resources.
In the future the RIF bank will be exported as resource object with size
of 1000, and in order to observe how much are in use you should check
its occupancy. This is the whole reason of this interface.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-29 17:09 ` Arkadi Sharshevsky
@ 2017-12-30 10:18 ` Andrew Lunn
2017-12-30 10:25 ` Jiri Pirko
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2017-12-30 10:18 UTC (permalink / raw)
To: Arkadi Sharshevsky
Cc: David Ahern, Jiri Pirko, Yuval Mintz, netdev@vger.kernel.org,
davem@davemloft.net, mlxsw, vivien.didelot@savoirfairelinux.com,
f.fainelli@gmail.com, michael.chan@broadcom.com,
ganeshgr@chelsio.com, Saeed Mahameed, Matan Barak,
Leon Romanovsky, Ido Schimmel, jakub.kicinski@netronome.com,
ast@kernel.org, daniel@iogearbox.net, "
> 1. Both dpipe and devlink resource are abstraction models for
> hardware entities, and as a result they true to provide generic objects.
> Each driver/ASIC should register his own and it absolutely proprietary
> implementation. There is absolutely NO industry standard here, the only
> thing that resembles a standard is that dpipe looks a bit like P4 only
> because its proved to be useful for describing packet forwarding
> pipelines. The host4 table is just a hardware process in the mellanox
> spectrum ASIC pipeline and it should not be part of ABI, sorry I clearly
> don't understand how this even came up.
Why should it not be part of the ABI? Are you saying it will change
from version to version? Are you trying to warning people not to write
scripts using it, because its meaning will change and what works now
will break in the next kernel version?
Andrew
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-30 10:18 ` Andrew Lunn
@ 2017-12-30 10:25 ` Jiri Pirko
2017-12-30 17:26 ` Andrew Lunn
0 siblings, 1 reply; 61+ messages in thread
From: Jiri Pirko @ 2017-12-30 10:25 UTC (permalink / raw)
To: Andrew Lunn
Cc: Arkadi Sharshevsky, David Ahern, Yuval Mintz,
netdev@vger.kernel.org, davem@davemloft.net, mlxsw,
vivien.didelot@savoirfairelinux.com, f.fainelli@gmail.com,
michael.chan@broadcom.com, ganeshgr@chelsio.com, Saeed Mahameed,
Matan Barak, Leon Romanovsky, Ido Schimmel,
jakub.kicinski@netronome.com, ast@kernel.org,
daniel@iogearbox.net
Sat, Dec 30, 2017 at 11:18:50AM CET, andrew@lunn.ch wrote:
>> 1. Both dpipe and devlink resource are abstraction models for
>> hardware entities, and as a result they true to provide generic objects.
>> Each driver/ASIC should register his own and it absolutely proprietary
>> implementation. There is absolutely NO industry standard here, the only
>> thing that resembles a standard is that dpipe looks a bit like P4 only
>> because its proved to be useful for describing packet forwarding
>> pipelines. The host4 table is just a hardware process in the mellanox
>> spectrum ASIC pipeline and it should not be part of ABI, sorry I clearly
>> don't understand how this even came up.
>
>Why should it not be part of the ABI? Are you saying it will change
>from version to version? Are you trying to warning people not to write
>scripts using it, because its meaning will change and what works now
>will break in the next kernel version?
In my opinion it should not change. Unless there is a bug (like the one
DaveA found in mlxsw erif table). Existing tables and resources should
be only added. It is the driver's maintainer responsibility to not to
break user scripts.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-30 10:25 ` Jiri Pirko
@ 2017-12-30 17:26 ` Andrew Lunn
0 siblings, 0 replies; 61+ messages in thread
From: Andrew Lunn @ 2017-12-30 17:26 UTC (permalink / raw)
To: Jiri Pirko
Cc: Arkadi Sharshevsky, David Ahern, Yuval Mintz,
netdev@vger.kernel.org, davem@davemloft.net, mlxsw,
vivien.didelot@savoirfairelinux.com, f.fainelli@gmail.com,
michael.chan@broadcom.com, ganeshgr@chelsio.com, Saeed Mahameed,
Matan Barak, Leon Romanovsky, Ido Schimmel,
jakub.kicinski@netronome.com, ast@kernel.org,
daniel@iogearbox.net
> In my opinion it should not change. Unless there is a bug (like the one
> DaveA found in mlxsw erif table). Existing tables and resources should
> be only added. It is the driver's maintainer responsibility to not to
> break user scripts.
So we agree with is ABI. Great.
Andrew
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-28 8:21 ` Yuval Mintz
@ 2017-12-30 21:15 ` David Ahern
2017-12-31 10:52 ` Arkadi Sharshevsky
0 siblings, 1 reply; 61+ messages in thread
From: David Ahern @ 2017-12-30 21:15 UTC (permalink / raw)
To: Yuval Mintz, Roopa Prabhu, Jiri Pirko, Arkadi Sharshevsky
Cc: netdev@vger.kernel.org, David Miller, mlxsw, Andrew Lunn,
Vivien Didelot, Florian Fainelli, Michael Chan,
ganeshgr@chelsio.com, Saeed Mahameed, Matan Barak,
Leon Romanovsky, Ido Schimmel, jakub.kicinski@netronome.com,
ast@kernel.org, Daniel Borkmann, Simon Horman,
pieter.jansenvanvuuren@netronome.com, "john.hurley@netronome
On 12/28/17 1:21 AM, Yuval Mintz wrote:
> I think it goes the other way around. The dpipe tables are the ones that
> can be translated to functionality; The resources are internal and HW-specific
> representing the possible internal division of resources -
> but a given resource sn't necessarily mapped to a single networking feature.
> [It might be in some cases, but not in the general case]
This is what I am getting at -- a single resource /kvd/linear is used
for multiple networking features, and those networking features do map
to well known entities -- fdb entries, ACL entries, ipv4/v6 host
entries, LPM entries, etc.
Nothing about the output from devlink helps the user in any way to
understand how to change the resource values. Saying that these
resources, what they mean and how they are used is MLX proprietary and
is known only to MLX employees and those with MLX agreements is not
acceptable. Likewise, requiring some network admin to deep dive into the
mlxsw driver to piece together how kvd/linear (for example) is used is
not acceptable.
The cover letter touts "Many of the ASIC's internal resources are
limited and are shared between several hardware procedures. For example,
unified hash-based memory can be used for many lookup purposes, like FDB
and LPM. In many cases the user can provide a partitioning scheme for
such a resource in order to perform fine tuning for his application."
Great, now give the user some indication of how to do that. Is setting
/kvd/linear to 0 acceptable? If not, why? What functionality is lost?
(Apparently, everything [1].)
The dpipe tables list some correlation between the kvd resources and
tables but that is not a complete list and again there is nothing to
tell a user that it is only a partial list of how a kvd resource is
used. For example, it shows ipv4 host is in /kvd/hash_single and that is
all it shows. So if I have an ipv6 only deployment can I conclude that I
can set /kvd/hash_single to 0? Or the reverse, can I set hash_double to
0 for an ipv4 only deployment? From the limited information given, it is
reasonable for a user to assume yes and has to learn through trial and
error what can be done. [2]
-----
[1] This is allowed by the current patch set and perhaps it should not be:
$ ip ro ls vrf vrf1101
unreachable default metric 8192
11.2.51.0/24 dev swp1s0.51 proto kernel scope link src 11.2.51.1 offload
11.3.51.0/24 dev swp1s1.51 proto kernel scope link src 11.3.51.1 offload
11.4.51.0/24 dev swp1s2.51 proto kernel scope link src 11.4.51.1 offload
11.5.51.0/24 dev swp1s3.51 proto kernel scope link src 11.5.51.1 offload
11.6.51.0/24 dev swp3s0.51 proto kernel scope link src 11.6.51.1 offload
11.7.51.0/24 dev swp3s1.51 proto kernel scope link src 11.7.51.1 offload
11.8.51.0/24 dev swp3s2.51 proto kernel scope link src 11.8.51.1 offload
11.9.51.0/24 dev swp3s3.51 proto kernel scope link src 11.9.51.1 offload
$ devlink resource set pci/0000:03:00.0 path /kvd/linear size 0
$ devlink reload pci/0000:03:00.0
$ ip ro ls vrf vrf1101
unreachable default metric 8192
[2] Same exact result for setting hash_double to 0:
$ ip ro ls vrf vrf1101
unreachable default metric 8192
11.2.51.0/24 dev swp1s0.51 proto kernel scope link src 11.2.51.1 offload
11.3.51.0/24 dev swp1s1.51 proto kernel scope link src 11.3.51.1 offload
11.4.51.0/24 dev swp1s2.51 proto kernel scope link src 11.4.51.1 offload
11.5.51.0/24 dev swp1s3.51 proto kernel scope link src 11.5.51.1 offload
11.6.51.0/24 dev swp3s0.51 proto kernel scope link src 11.6.51.1 offload
11.7.51.0/24 dev swp3s1.51 proto kernel scope link src 11.7.51.1 offload
11.8.51.0/24 dev swp3s2.51 proto kernel scope link src 11.8.51.1 offload
11.9.51.0/24 dev swp3s3.51 proto kernel scope link src 11.9.51.1 offload
$ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 0
$ devlink reload pci/0000:03:00.0
$ ip ro ls vrf vrf1101
unreachable default metric 8192
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-30 21:15 ` David Ahern
@ 2017-12-31 10:52 ` Arkadi Sharshevsky
2017-12-31 15:46 ` David Ahern
0 siblings, 1 reply; 61+ messages in thread
From: Arkadi Sharshevsky @ 2017-12-31 10:52 UTC (permalink / raw)
To: David Ahern, Yuval Mintz, Roopa Prabhu, Jiri Pirko
Cc: netdev@vger.kernel.org, David Miller, mlxsw, Andrew Lunn,
Vivien Didelot, Florian Fainelli, Michael Chan,
ganeshgr@chelsio.com, Saeed Mahameed, Matan Barak,
Leon Romanovsky, Ido Schimmel, jakub.kicinski@netronome.com,
ast@kernel.org, Daniel Borkmann, Simon Horman,
pieter.jansenvanvuuren@netronome.com, "john.hurley@netronome
On 12/30/2017 11:15 PM, David Ahern wrote:
> On 12/28/17 1:21 AM, Yuval Mintz wrote:
>> I think it goes the other way around. The dpipe tables are the ones that
>> can be translated to functionality; The resources are internal and HW-specific
>> representing the possible internal division of resources -
>> but a given resource sn't necessarily mapped to a single networking feature.
>> [It might be in some cases, but not in the general case]
>
> This is what I am getting at -- a single resource /kvd/linear is used
> for multiple networking features, and those networking features do map
> to well known entities -- fdb entries, ACL entries, ipv4/v6 host
> entries, LPM entries, etc.
>
> Nothing about the output from devlink helps the user in any way to
> understand how to change the resource values. Saying that these
The current patchset adds the following dpipe table <--> resource
relation
host4 -- hash single
host6 -- hash double
adj -- linear
By dumping the resources via the 'resource show' you can the tree like
structure, you can see that you have a tradeoff between those subparts.
So for example if a user would like to increase the number of nexthops
with the expense of neighbors, it is pretty clear. As more dpipe table
will be introduced this relations will be more complete and the user
will get the complete view of the ASIC.
Just to summarize, the user gets the following info
1. Constrains\trade off about setting the sizes - this you get
by the tree structure.
2. Each hardware process which use this resource is mapped to it
By combining those two you can get the most accurate information
about what your change will do. Partitioning of the KVD is very delicate
process, because the hardware is complex. Many hardware processes are
pointing to this memory and size changes effect the whole ASIC, as I
mentioned as more of the pipeline will be exposed via dpipe the user
will get a more precise vision of the hardware.
We will provide some recommended and tested configuration of the whole
mlxsw resource tree for different user scenarios. A more experienced
user can do it for himself, if he got some very special scenario.
> resources, what they mean and how they are used is MLX proprietary and
> is known only to MLX employees and those with MLX agreements is not
> acceptable. Likewise, requiring some network admin to deep dive into the
> mlxsw driver to piece together how kvd/linear (for example) is used is
> not acceptable.
>
> The cover letter touts "Many of the ASIC's internal resources are
> limited and are shared between several hardware procedures. For example,
> unified hash-based memory can be used for many lookup purposes, like FDB
> and LPM. In many cases the user can provide a partitioning scheme for
> such a resource in order to perform fine tuning for his application."
>
> Great, now give the user some indication of how to do that. Is setting
> /kvd/linear to 0 acceptable? If not, why? What functionality is lost?
> (Apparently, everything [1].)
>
> The dpipe tables list some correlation between the kvd resources and
> tables but that is not a complete list and again there is nothing to
> tell a user that it is only a partial list of how a kvd resource is
This is work in progress, the LPM block will be exposed as the last
L3 part. Then we will start the l2 part of the ASIC.
> used. For example, it shows ipv4 host is in /kvd/hash_single and that is
> all it shows. So if I have an ipv6 only deployment can I conclude that I
> can set /kvd/hash_single to 0? Or the reverse, can I set hash_double to
> 0 for an ipv4 only deployment? From the limited information given, it is
> reasonable for a user to assume yes and has to learn through trial and
> error what can be done. [2]
>
So you want to add min/max size attribute? I think this its not needed.
> -----
>
> [1] This is allowed by the current patch set and perhaps it should not be:
>
> $ ip ro ls vrf vrf1101
> unreachable default metric 8192
> 11.2.51.0/24 dev swp1s0.51 proto kernel scope link src 11.2.51.1 offload
> 11.3.51.0/24 dev swp1s1.51 proto kernel scope link src 11.3.51.1 offload
> 11.4.51.0/24 dev swp1s2.51 proto kernel scope link src 11.4.51.1 offload
> 11.5.51.0/24 dev swp1s3.51 proto kernel scope link src 11.5.51.1 offload
> 11.6.51.0/24 dev swp3s0.51 proto kernel scope link src 11.6.51.1 offload
> 11.7.51.0/24 dev swp3s1.51 proto kernel scope link src 11.7.51.1 offload
> 11.8.51.0/24 dev swp3s2.51 proto kernel scope link src 11.8.51.1 offload
> 11.9.51.0/24 dev swp3s3.51 proto kernel scope link src 11.9.51.1 offload
>
> $ devlink resource set pci/0000:03:00.0 path /kvd/linear size 0
This line actually did nothing, because size zero is not acceptable
see patch 6. This is pure userpsace problem that error is not shown.
You can verify it by dumping the resources and see that there is no
pending change (only size and not size_new).
> $ devlink reload pci/0000:03:00.0
> $ ip ro ls vrf vrf1101
> unreachable default metric 8192
>
So you just performed full reload of the driver which includes
unregistration of all the netdevs and full init. KVD update requires
full teardown of the driver.
The system will not get back to the same state after reloading,
It's should be done on init. But it doesn't have to be like this
this, each driver provides his own reload devlink op implementation
so in our case full blown reset is required.
> [2] Same exact result for setting hash_double to 0:
> $ ip ro ls vrf vrf1101
> unreachable default metric 8192
> 11.2.51.0/24 dev swp1s0.51 proto kernel scope link src 11.2.51.1 offload
> 11.3.51.0/24 dev swp1s1.51 proto kernel scope link src 11.3.51.1 offload
> 11.4.51.0/24 dev swp1s2.51 proto kernel scope link src 11.4.51.1 offload
> 11.5.51.0/24 dev swp1s3.51 proto kernel scope link src 11.5.51.1 offload
> 11.6.51.0/24 dev swp3s0.51 proto kernel scope link src 11.6.51.1 offload
> 11.7.51.0/24 dev swp3s1.51 proto kernel scope link src 11.7.51.1 offload
> 11.8.51.0/24 dev swp3s2.51 proto kernel scope link src 11.8.51.1 offload
> 11.9.51.0/24 dev swp3s3.51 proto kernel scope link src 11.9.51.1 offload
>
> $ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 0
> $ devlink reload pci/0000:03:00.0
> $ ip ro ls vrf vrf1101
> unreachable default metric 8192
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-31 10:52 ` Arkadi Sharshevsky
@ 2017-12-31 15:46 ` David Ahern
2018-01-01 12:23 ` Arkadi Sharshevsky
0 siblings, 1 reply; 61+ messages in thread
From: David Ahern @ 2017-12-31 15:46 UTC (permalink / raw)
To: Arkadi Sharshevsky, Yuval Mintz, Roopa Prabhu, Jiri Pirko
Cc: netdev@vger.kernel.org, David Miller, mlxsw, Andrew Lunn,
Vivien Didelot, Florian Fainelli, Michael Chan,
ganeshgr@chelsio.com, Saeed Mahameed, Matan Barak,
Leon Romanovsky, Ido Schimmel, jakub.kicinski@netronome.com,
ast@kernel.org, Daniel Borkmann, Simon Horman,
pieter.jansenvanvuuren@netronome.com, "john.hurley@netronome
On 12/31/17 3:52 AM, Arkadi Sharshevsky wrote:
>> [1] This is allowed by the current patch set and perhaps it should not be:
>>
>> $ ip ro ls vrf vrf1101
>> unreachable default metric 8192
>> 11.2.51.0/24 dev swp1s0.51 proto kernel scope link src 11.2.51.1 offload
>> 11.3.51.0/24 dev swp1s1.51 proto kernel scope link src 11.3.51.1 offload
>> 11.4.51.0/24 dev swp1s2.51 proto kernel scope link src 11.4.51.1 offload
>> 11.5.51.0/24 dev swp1s3.51 proto kernel scope link src 11.5.51.1 offload
>> 11.6.51.0/24 dev swp3s0.51 proto kernel scope link src 11.6.51.1 offload
>> 11.7.51.0/24 dev swp3s1.51 proto kernel scope link src 11.7.51.1 offload
>> 11.8.51.0/24 dev swp3s2.51 proto kernel scope link src 11.8.51.1 offload
>> 11.9.51.0/24 dev swp3s3.51 proto kernel scope link src 11.9.51.1 offload
>>
>> $ devlink resource set pci/0000:03:00.0 path /kvd/linear size 0
>
> This line actually did nothing, because size zero is not acceptable
> see patch 6. This is pure userpsace problem that error is not shown.
Then perhaps you have a kernel side bug. After the reload I get this:
$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 size_valid true
resources:
name linear size 0 occ 0
name hash_double size 60416
name hash_single size 87040
>
> You can verify it by dumping the resources and see that there is no
> pending change (only size and not size_new).
>
>> $ devlink reload pci/0000:03:00.0
>> $ ip ro ls vrf vrf1101
>> unreachable default metric 8192
>>
>
> So you just performed full reload of the driver which includes
> unregistration of all the netdevs and full init. KVD update requires
> full teardown of the driver.
you are right, I forgot to do a networking reload. Because of the above
(0 was actually taken) all kinds of errors are spewed on 'ifreload -av'
and there is no change to the ro ls:
$ ip ro ls vrf vrf1101
unreachable default metric 8192
>
> The system will not get back to the same state after reloading,
> It's should be done on init. But it doesn't have to be like this
> this, each driver provides his own reload devlink op implementation
> so in our case full blown reset is required.
>
>
>> [2] Same exact result for setting hash_double to 0:
>> $ ip ro ls vrf vrf1101
>> unreachable default metric 8192
>> 11.2.51.0/24 dev swp1s0.51 proto kernel scope link src 11.2.51.1 offload
>> 11.3.51.0/24 dev swp1s1.51 proto kernel scope link src 11.3.51.1 offload
>> 11.4.51.0/24 dev swp1s2.51 proto kernel scope link src 11.4.51.1 offload
>> 11.5.51.0/24 dev swp1s3.51 proto kernel scope link src 11.5.51.1 offload
>> 11.6.51.0/24 dev swp3s0.51 proto kernel scope link src 11.6.51.1 offload
>> 11.7.51.0/24 dev swp3s1.51 proto kernel scope link src 11.7.51.1 offload
>> 11.8.51.0/24 dev swp3s2.51 proto kernel scope link src 11.8.51.1 offload
>> 11.9.51.0/24 dev swp3s3.51 proto kernel scope link src 11.9.51.1 offload
>>
>> $ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 0
On this command you are correct, 0 is not taken:
$ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 0
$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 size_valid true
resources:
name linear size 0 occ 0
name hash_double size 60416
name hash_single size 87040
but the 'set' command did not fail with a proper extack based error
message, so consider this another a bug report.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-31 15:46 ` David Ahern
@ 2018-01-01 12:23 ` Arkadi Sharshevsky
0 siblings, 0 replies; 61+ messages in thread
From: Arkadi Sharshevsky @ 2018-01-01 12:23 UTC (permalink / raw)
To: David Ahern, Yuval Mintz, Roopa Prabhu, Jiri Pirko
Cc: netdev@vger.kernel.org, David Miller, mlxsw, Andrew Lunn,
Vivien Didelot, Florian Fainelli, Michael Chan,
ganeshgr@chelsio.com, Saeed Mahameed, Matan Barak,
Leon Romanovsky, Ido Schimmel, jakub.kicinski@netronome.com,
ast@kernel.org, Daniel Borkmann, Simon Horman,
pieter.jansenvanvuuren@netronome.com, "john.hurley@netronome
On 12/31/2017 05:46 PM, David Ahern wrote:
> On 12/31/17 3:52 AM, Arkadi Sharshevsky wrote:
>>> [1] This is allowed by the current patch set and perhaps it should not be:
>>>
>>> $ ip ro ls vrf vrf1101
>>> unreachable default metric 8192
>>> 11.2.51.0/24 dev swp1s0.51 proto kernel scope link src 11.2.51.1 offload
>>> 11.3.51.0/24 dev swp1s1.51 proto kernel scope link src 11.3.51.1 offload
>>> 11.4.51.0/24 dev swp1s2.51 proto kernel scope link src 11.4.51.1 offload
>>> 11.5.51.0/24 dev swp1s3.51 proto kernel scope link src 11.5.51.1 offload
>>> 11.6.51.0/24 dev swp3s0.51 proto kernel scope link src 11.6.51.1 offload
>>> 11.7.51.0/24 dev swp3s1.51 proto kernel scope link src 11.7.51.1 offload
>>> 11.8.51.0/24 dev swp3s2.51 proto kernel scope link src 11.8.51.1 offload
>>> 11.9.51.0/24 dev swp3s3.51 proto kernel scope link src 11.9.51.1 offload
>>>
>>> $ devlink resource set pci/0000:03:00.0 path /kvd/linear size 0
>>
>> This line actually did nothing, because size zero is not acceptable
>> see patch 6. This is pure userpsace problem that error is not shown.
>
> Then perhaps you have a kernel side bug. After the reload I get this:
>
> $ devlink resource show pci/0000:03:00.0
> pci/0000:03:00.0:
> name kvd size 245760 size_valid true
> resources:
> name linear size 0 occ 0
> name hash_double size 60416
> name hash_single size 87040
>
Actually no bug here, the linear can be zero. This implies no nexthop
routes. The adj table uses it as you can see.
>
>>
>> You can verify it by dumping the resources and see that there is no
>> pending change (only size and not size_new).
>>
>>> $ devlink reload pci/0000:03:00.0
>>> $ ip ro ls vrf vrf1101
>>> unreachable default metric 8192
>>>
>>
>> So you just performed full reload of the driver which includes
>> unregistration of all the netdevs and full init. KVD update requires
>> full teardown of the driver.
>
> you are right, I forgot to do a networking reload. Because of the above
> (0 was actually taken) all kinds of errors are spewed on 'ifreload -av'
> and there is no change to the ro ls:
>
> $ ip ro ls vrf vrf1101
> unreachable default metric 8192
>
>>
>> The system will not get back to the same state after reloading,
>> It's should be done on init. But it doesn't have to be like this
>> this, each driver provides his own reload devlink op implementation
>> so in our case full blown reset is required.
>>
>>
>>> [2] Same exact result for setting hash_double to 0:
>>> $ ip ro ls vrf vrf1101
>>> unreachable default metric 8192
>>> 11.2.51.0/24 dev swp1s0.51 proto kernel scope link src 11.2.51.1 offload
>>> 11.3.51.0/24 dev swp1s1.51 proto kernel scope link src 11.3.51.1 offload
>>> 11.4.51.0/24 dev swp1s2.51 proto kernel scope link src 11.4.51.1 offload
>>> 11.5.51.0/24 dev swp1s3.51 proto kernel scope link src 11.5.51.1 offload
>>> 11.6.51.0/24 dev swp3s0.51 proto kernel scope link src 11.6.51.1 offload
>>> 11.7.51.0/24 dev swp3s1.51 proto kernel scope link src 11.7.51.1 offload
>>> 11.8.51.0/24 dev swp3s2.51 proto kernel scope link src 11.8.51.1 offload
>>> 11.9.51.0/24 dev swp3s3.51 proto kernel scope link src 11.9.51.1 offload
>>>
>>> $ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 0
>
> On this command you are correct, 0 is not taken:
> $ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 0
> $ devlink resource show pci/0000:03:00.0
> pci/0000:03:00.0:
> name kvd size 245760 size_valid true
> resources:
> name linear size 0 occ 0
> name hash_double size 60416
> name hash_single size 87040
>
> but the 'set' command did not fail with a proper extack based error
> message, so consider this another a bug report.
>
Yeah, this is a bug, but a userspace one. I actually sniffed the nl
messages with nlmon and saw the extended ack packet with the required
string.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2017-12-26 11:23 [patch net-next v2 00/10] Add support for resource abstraction Jiri Pirko
` (10 preceding siblings ...)
2017-12-27 4:05 ` [patch net-next v2 00/10] Add support for resource abstraction David Ahern
@ 2018-01-01 14:58 ` Arkadi Sharshevsky
2018-01-02 10:08 ` Jiri Pirko
2018-01-02 18:05 ` David Ahern
11 siblings, 2 replies; 61+ messages in thread
From: Arkadi Sharshevsky @ 2018-01-01 14:58 UTC (permalink / raw)
To: Jiri Pirko, netdev, dsa, roopa
Cc: davem, mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
ganeshgr, saeedm, matanb, leonro, idosch, jakub.kicinski, ast,
daniel, simon.horman, pieter.jansenvanvuuren, john.hurley,
alexander.h.duyck, linville, gospo, steven.lin1, yuvalm, ogerlitz
On 12/26/2017 01:23 PM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Many of the ASIC's internal resources are limited and are shared between
> several hardware procedures. For example, unified hash-based memory can
> be used for many lookup purposes, like FDB and LPM. In many cases the user
> can provide a partitioning scheme for such a resource in order to perform
> fine tuning for his application. In such cases performing driver reload is
> needed for the changes to take place, thus this patchset also adds support
> for hot reload.
>
> Such an abstraction can be coupled with devlink's dpipe interface, which
> models the ASIC's pipeline as a graph of match/action tables. By modeling
> the hardware resource object, and by coupling it to several dpipe tables,
> further visibility can be achieved in order to debug ASIC-wide issues.
>
> The proposed interface will provide the user the ability to understand the
> limitations of the hardware, and receive notification regarding its occupancy.
> Furthermore, monitoring the resource occupancy can be done in real-time and
> can be useful in many cases.
> ---
> Userspace part prototype can be found at https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Farkadis%2Fiproute2%2F&data=02%7C01%7Carkadis%40mellanox.com%7C1ae3d8b4854a454e21e008d54c5329e3%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636498842440762657&sdata=7MC2BFQFxjnmHqy2sOOL9VEa4ZGq6e5Z2n2WvuNgyFk%3D&reserved=0
> at resource_dev branch.
>
> v1->v2
> - Add resource size attribute.
> - Fix split bug.
>
Just to summarize the current fixes required:
1. ERIF dpipe table size is reporting wrong size. More precisely the
ERIF table does not take rifs, so it should not be linked to the rif
bank resource (is not part of this patchset, future extension).
2. Extended ACK user-space bug.
3. ABI documentation- Not sure we agreed upon it, Jiri?
If I missed something please respond. Nothing of the fixes mentioned
above is relevant for this patchset actually.
Couple of key-points:
1. Constrains\trade off about setting the sizes - this can be obtained
trivially from the resource tree nested structure.
2. Dpipe provides the mapping of hardware processes to resources.
3. Units - each resource specifies his units, if dpipe table's size is
X and its related to some resource its size is normalized to that
resources basic unit.
IMO this is the most hardware exact interaction, and this is the way it
should be exported from the kernel, if something is not presented in
'user' convenient way some utilities can be implemented in userspace
to easily do it. Furthermore, some examples will be provided for the
whole kvd tree partition for different cases (IPv6 heavy etc..).
Advanced user will be able to tweak it as they like.
Regarding the 'switchdev' layer I think that kernel's software tables
like nexthops/neigh/routes should be mapped to dpipe tables and not
to resources directly:
kernel_fdb--> dpipe_fdb -->/kvd/hash_single.
> Arkadi Sharshevsky (10):
> devlink: Add per devlink instance lock
> devlink: Add support for resource abstraction
> devlink: Add support for reload
> devlink: Add relation between dpipe and resource
> mlxsw: pci: Add support for performing bus reset
> mlxsw: spectrum: Register KVD resources with devlink
> mlxsw: spectrum_dpipe: Connect dpipe tables to resources
> mlxsw: spectrum: Add support for getting kvdl occupancy
> mlxsw: pci: Add support for getting resource through devlink
> mlxsw: core: Add support for reload
>
> drivers/net/ethernet/mellanox/mlxsw/core.c | 85 ++-
> drivers/net/ethernet/mellanox/mlxsw/core.h | 16 +-
> drivers/net/ethernet/mellanox/mlxsw/i2c.c | 5 +-
> drivers/net/ethernet/mellanox/mlxsw/pci.c | 98 ++--
> drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 205 ++++++++
> drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 13 +
> .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.c | 72 ++-
> .../net/ethernet/mellanox/mlxsw/spectrum_kvdl.c | 26 +
> include/net/devlink.h | 97 ++++
> include/uapi/linux/devlink.h | 21 +
> net/core/devlink.c | 573 ++++++++++++++++++---
> 11 files changed, 1079 insertions(+), 132 deletions(-)
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-01 14:58 ` Arkadi Sharshevsky
@ 2018-01-02 10:08 ` Jiri Pirko
2018-01-02 13:41 ` Andrew Lunn
2018-01-02 18:05 ` David Ahern
1 sibling, 1 reply; 61+ messages in thread
From: Jiri Pirko @ 2018-01-02 10:08 UTC (permalink / raw)
To: Arkadi Sharshevsky
Cc: netdev, dsa, roopa, davem, mlxsw, andrew, vivien.didelot,
f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz
Mon, Jan 01, 2018 at 03:58:33PM CET, arkadis@mellanox.com wrote:
>
>
>On 12/26/2017 01:23 PM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Many of the ASIC's internal resources are limited and are shared between
>> several hardware procedures. For example, unified hash-based memory can
>> be used for many lookup purposes, like FDB and LPM. In many cases the user
>> can provide a partitioning scheme for such a resource in order to perform
>> fine tuning for his application. In such cases performing driver reload is
>> needed for the changes to take place, thus this patchset also adds support
>> for hot reload.
>>
>> Such an abstraction can be coupled with devlink's dpipe interface, which
>> models the ASIC's pipeline as a graph of match/action tables. By modeling
>> the hardware resource object, and by coupling it to several dpipe tables,
>> further visibility can be achieved in order to debug ASIC-wide issues.
>>
>> The proposed interface will provide the user the ability to understand the
>> limitations of the hardware, and receive notification regarding its occupancy.
>> Furthermore, monitoring the resource occupancy can be done in real-time and
>> can be useful in many cases.
>> ---
>> Userspace part prototype can be found at https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Farkadis%2Fiproute2%2F&data=02%7C01%7Carkadis%40mellanox.com%7C1ae3d8b4854a454e21e008d54c5329e3%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636498842440762657&sdata=7MC2BFQFxjnmHqy2sOOL9VEa4ZGq6e5Z2n2WvuNgyFk%3D&reserved=0
>> at resource_dev branch.
>>
>> v1->v2
>> - Add resource size attribute.
>> - Fix split bug.
>>
>
>Just to summarize the current fixes required:
>
>1. ERIF dpipe table size is reporting wrong size. More precisely the
> ERIF table does not take rifs, so it should not be linked to the rif
> bank resource (is not part of this patchset, future extension).
>2. Extended ACK user-space bug.
>3. ABI documentation- Not sure we agreed upon it, Jiri?
Question is where to put it. It is mlxsw-specific thing, moreover,
Spectrum-specific thing, same as dpipe tables etc. Not sure. Perhaps
Documentation/networking/mlxsw.txt ?
>
>If I missed something please respond. Nothing of the fixes mentioned
>above is relevant for this patchset actually.
>
>Couple of key-points:
>
>1. Constrains\trade off about setting the sizes - this can be obtained
> trivially from the resource tree nested structure.
>2. Dpipe provides the mapping of hardware processes to resources.
>3. Units - each resource specifies his units, if dpipe table's size is
> X and its related to some resource its size is normalized to that
> resources basic unit.
>
>IMO this is the most hardware exact interaction, and this is the way it
>should be exported from the kernel, if something is not presented in
>'user' convenient way some utilities can be implemented in userspace
>to easily do it. Furthermore, some examples will be provided for the
>whole kvd tree partition for different cases (IPv6 heavy etc..).
>Advanced user will be able to tweak it as they like.
>
>Regarding the 'switchdev' layer I think that kernel's software tables
>like nexthops/neigh/routes should be mapped to dpipe tables and not
>to resources directly:
Sure. dpipe table -> resource mapping is the only one that makes sense.
>
>kernel_fdb--> dpipe_fdb -->/kvd/hash_single.
>
>> Arkadi Sharshevsky (10):
>> devlink: Add per devlink instance lock
>> devlink: Add support for resource abstraction
>> devlink: Add support for reload
>> devlink: Add relation between dpipe and resource
>> mlxsw: pci: Add support for performing bus reset
>> mlxsw: spectrum: Register KVD resources with devlink
>> mlxsw: spectrum_dpipe: Connect dpipe tables to resources
>> mlxsw: spectrum: Add support for getting kvdl occupancy
>> mlxsw: pci: Add support for getting resource through devlink
>> mlxsw: core: Add support for reload
>>
>> drivers/net/ethernet/mellanox/mlxsw/core.c | 85 ++-
>> drivers/net/ethernet/mellanox/mlxsw/core.h | 16 +-
>> drivers/net/ethernet/mellanox/mlxsw/i2c.c | 5 +-
>> drivers/net/ethernet/mellanox/mlxsw/pci.c | 98 ++--
>> drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 205 ++++++++
>> drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 13 +
>> .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.c | 72 ++-
>> .../net/ethernet/mellanox/mlxsw/spectrum_kvdl.c | 26 +
>> include/net/devlink.h | 97 ++++
>> include/uapi/linux/devlink.h | 21 +
>> net/core/devlink.c | 573 ++++++++++++++++++---
>> 11 files changed, 1079 insertions(+), 132 deletions(-)
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-02 10:08 ` Jiri Pirko
@ 2018-01-02 13:41 ` Andrew Lunn
2018-01-02 14:35 ` Jiri Pirko
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2018-01-02 13:41 UTC (permalink / raw)
To: Jiri Pirko
Cc: Arkadi Sharshevsky, netdev, dsa, roopa, davem, mlxsw,
vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz
> Question is where to put it. It is mlxsw-specific thing, moreover,
> Spectrum-specific thing, same as dpipe tables etc. Not sure. Perhaps
> Documentation/networking/mlxsw.txt ?
Hi Jiri
Documentation/ABI seems like the correct place. There is nothing in
the README which says it is limited to files. You could use an name
like devlink-driver-mlxsw.
Andrew
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-02 13:41 ` Andrew Lunn
@ 2018-01-02 14:35 ` Jiri Pirko
0 siblings, 0 replies; 61+ messages in thread
From: Jiri Pirko @ 2018-01-02 14:35 UTC (permalink / raw)
To: Andrew Lunn
Cc: Arkadi Sharshevsky, netdev, dsa, roopa, davem, mlxsw,
vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz
Tue, Jan 02, 2018 at 02:41:13PM CET, andrew@lunn.ch wrote:
>> Question is where to put it. It is mlxsw-specific thing, moreover,
>> Spectrum-specific thing, same as dpipe tables etc. Not sure. Perhaps
>> Documentation/networking/mlxsw.txt ?
>
>Hi Jiri
>
>Documentation/ABI seems like the correct place. There is nothing in
>the README which says it is limited to files. You could use an name
>like devlink-driver-mlxsw.
Okay. Sounds sane. Thanks.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-01 14:58 ` Arkadi Sharshevsky
2018-01-02 10:08 ` Jiri Pirko
@ 2018-01-02 18:05 ` David Ahern
2018-01-03 18:05 ` Arkadi Sharshevsky
1 sibling, 1 reply; 61+ messages in thread
From: David Ahern @ 2018-01-02 18:05 UTC (permalink / raw)
To: Arkadi Sharshevsky, Jiri Pirko, netdev, roopa
Cc: davem, mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
ganeshgr, saeedm, matanb, leonro, idosch, jakub.kicinski, ast,
daniel, simon.horman, pieter.jansenvanvuuren, john.hurley,
alexander.h.duyck, linville, gospo, steven.lin1, yuvalm, ogerlitz
On 1/1/18 7:58 AM, Arkadi Sharshevsky wrote:
>
> Just to summarize the current fixes required:
>
> 1. ERIF dpipe table size is reporting wrong size. More precisely the
> ERIF table does not take rifs, so it should not be linked to the rif
> bank resource (is not part of this patchset, future extension).
> 2. Extended ACK user-space bug.
> 3. ABI documentation- Not sure we agreed upon it, Jiri?
>
> If I missed something please respond. Nothing of the fixes mentioned
> above is relevant for this patchset actually.
>
Can you fix the userspace command and then we come back to what else is
needed? Right now, it is hard to tell what is a user space bug and what
is a kernel space bug.
For example:
$ devlink resource set pci/0000:03:00.0 path /kvd/linear size 10000
$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 size_valid true
resources:
name linear size 98304 occ 0
name hash_double size 60416
name hash_single size 87040
The set command did not fail, yet there is no size_new arg in the output
like there is for this change:
$ devlink resource set pci/0000:03:00.0 path /kvd/linear size 0
$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 size_valid true
resources:
name linear size 98304 size_new 0 occ 0
name hash_double size 60416
name hash_single size 87040
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-02 18:05 ` David Ahern
@ 2018-01-03 18:05 ` Arkadi Sharshevsky
2018-01-03 18:14 ` David Ahern
2018-01-04 2:28 ` David Ahern
0 siblings, 2 replies; 61+ messages in thread
From: Arkadi Sharshevsky @ 2018-01-03 18:05 UTC (permalink / raw)
To: David Ahern, Jiri Pirko, netdev, roopa
Cc: davem, mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
ganeshgr, saeedm, matanb, leonro, idosch, jakub.kicinski, ast,
daniel, simon.horman, pieter.jansenvanvuuren, john.hurley,
alexander.h.duyck, linville, gospo, steven.lin1, yuvalm, ogerlitz
On 01/02/2018 08:05 PM, David Ahern wrote:
> On 1/1/18 7:58 AM, Arkadi Sharshevsky wrote:
>>
>> Just to summarize the current fixes required:
>>
>> 1. ERIF dpipe table size is reporting wrong size. More precisely the
>> ERIF table does not take rifs, so it should not be linked to the rif
>> bank resource (is not part of this patchset, future extension).
>> 2. Extended ACK user-space bug.
>> 3. ABI documentation- Not sure we agreed upon it, Jiri?
>>
>> If I missed something please respond. Nothing of the fixes mentioned
>> above is relevant for this patchset actually.
>>
>
> Can you fix the userspace command and then we come back to what else is
> needed? Right now, it is hard to tell what is a user space bug and what
> is a kernel space bug.
>
> For example:
> $ devlink resource set pci/0000:03:00.0 path /kvd/linear size 10000
> $ devlink resource show pci/0000:03:00.0
> pci/0000:03:00.0:
> name kvd size 245760 size_valid true
> resources:
> name linear size 98304 occ 0
> name hash_double size 60416
> name hash_single size 87040
>
> The set command did not fail, yet there is no size_new arg in the output
> like there is for this change:
>
> $ devlink resource set pci/0000:03:00.0 path /kvd/linear size 0
> $ devlink resource show pci/0000:03:00.0
> pci/0000:03:00.0:
> name kvd size 245760 size_valid true
> resources:
> name linear size 98304 size_new 0 occ 0
> name hash_double size 60416
> name hash_single size 87040
>
As I stated this is a user-space bug which I fixed, and updated my repo
so please pull. Devlink uses mnl,and currently mnl does not support
extended ack. I added support for this in my local ver of libmnl:
https://github.com/arkadis/libmnl.git
On branch master, so you can check it out. Besides this bugs, which were
userspace, can please specify what are the pending problems from your
point of view? Thanks!
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-03 18:05 ` Arkadi Sharshevsky
@ 2018-01-03 18:14 ` David Ahern
2018-01-03 18:17 ` Jiri Pirko
2018-01-04 2:28 ` David Ahern
1 sibling, 1 reply; 61+ messages in thread
From: David Ahern @ 2018-01-03 18:14 UTC (permalink / raw)
To: Arkadi Sharshevsky, Jiri Pirko, netdev, roopa
Cc: davem, mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
ganeshgr, saeedm, matanb, leonro, idosch, jakub.kicinski, ast,
daniel, simon.horman, pieter.jansenvanvuuren, john.hurley,
alexander.h.duyck, linville, gospo, steven.lin1, yuvalm, ogerlitz
On 1/3/18 11:05 AM, Arkadi Sharshevsky wrote:
> As I stated this is a user-space bug which I fixed, and updated my repo
> so please pull. Devlink uses mnl,and currently mnl does not support
> extended ack. I added support for this in my local ver of libmnl:
>
> https://github.com/arkadis/libmnl.git
>
> On branch master, so you can check it out. Besides this bugs, which were
> userspace, can please specify what are the pending problems from your
> point of view? Thanks!
devlink is in iproute2 package and it has extack support. See 'git log
lib/libnetlink.c'
You do not need to modify libmnl to get extack output in devlink. Can
you update the command accordingly? Once it works I will come back to
this topic and take another look.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-03 18:14 ` David Ahern
@ 2018-01-03 18:17 ` Jiri Pirko
2018-01-03 18:29 ` David Ahern
0 siblings, 1 reply; 61+ messages in thread
From: Jiri Pirko @ 2018-01-03 18:17 UTC (permalink / raw)
To: David Ahern
Cc: Arkadi Sharshevsky, netdev, roopa, davem, mlxsw, andrew,
vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz
Wed, Jan 03, 2018 at 07:14:16PM CET, dsa@cumulusnetworks.com wrote:
>On 1/3/18 11:05 AM, Arkadi Sharshevsky wrote:
>> As I stated this is a user-space bug which I fixed, and updated my repo
>> so please pull. Devlink uses mnl,and currently mnl does not support
>> extended ack. I added support for this in my local ver of libmnl:
>>
>> https://github.com/arkadis/libmnl.git
>>
>> On branch master, so you can check it out. Besides this bugs, which were
>> userspace, can please specify what are the pending problems from your
>> point of view? Thanks!
>
>devlink is in iproute2 package and it has extack support. See 'git log
>lib/libnetlink.c'
Dave, devlink uses libmnl.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-03 18:17 ` Jiri Pirko
@ 2018-01-03 18:29 ` David Ahern
2018-01-03 18:36 ` Jiri Pirko
2018-01-04 0:07 ` Arkadi Sharshevsky
0 siblings, 2 replies; 61+ messages in thread
From: David Ahern @ 2018-01-03 18:29 UTC (permalink / raw)
To: Jiri Pirko
Cc: Arkadi Sharshevsky, netdev, roopa, davem, mlxsw, andrew,
vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz
On 1/3/18 11:17 AM, Jiri Pirko wrote:
> Wed, Jan 03, 2018 at 07:14:16PM CET, dsa@cumulusnetworks.com wrote:
>> On 1/3/18 11:05 AM, Arkadi Sharshevsky wrote:
>>> As I stated this is a user-space bug which I fixed, and updated my repo
>>> so please pull. Devlink uses mnl,and currently mnl does not support
>>> extended ack. I added support for this in my local ver of libmnl:
>>>
>>> https://github.com/arkadis/libmnl.git
>>>
>>> On branch master, so you can check it out. Besides this bugs, which were
>>> userspace, can please specify what are the pending problems from your
>>> point of view? Thanks!
>>
>> devlink is in iproute2 package and it has extack support. See 'git log
>> lib/libnetlink.c'
>
> Dave, devlink uses libmnl.
>
Now I remember. You wrote it independently and but needed iproute2 be a
delivery vehicle. It uses none of the common infrastructure from
iproute2. Could we make this more difficult ....
Sometime in the next day I will jump through the hoops to get a proper
devlink command.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-03 18:29 ` David Ahern
@ 2018-01-03 18:36 ` Jiri Pirko
2018-01-04 16:58 ` David Ahern
2018-01-04 0:07 ` Arkadi Sharshevsky
1 sibling, 1 reply; 61+ messages in thread
From: Jiri Pirko @ 2018-01-03 18:36 UTC (permalink / raw)
To: David Ahern
Cc: Arkadi Sharshevsky, netdev, roopa, davem, mlxsw, andrew,
vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz
Wed, Jan 03, 2018 at 07:29:46PM CET, dsa@cumulusnetworks.com wrote:
>On 1/3/18 11:17 AM, Jiri Pirko wrote:
>> Wed, Jan 03, 2018 at 07:14:16PM CET, dsa@cumulusnetworks.com wrote:
>>> On 1/3/18 11:05 AM, Arkadi Sharshevsky wrote:
>>>> As I stated this is a user-space bug which I fixed, and updated my repo
>>>> so please pull. Devlink uses mnl,and currently mnl does not support
>>>> extended ack. I added support for this in my local ver of libmnl:
>>>>
>>>> https://github.com/arkadis/libmnl.git
>>>>
>>>> On branch master, so you can check it out. Besides this bugs, which were
>>>> userspace, can please specify what are the pending problems from your
>>>> point of view? Thanks!
>>>
>>> devlink is in iproute2 package and it has extack support. See 'git log
>>> lib/libnetlink.c'
>>
>> Dave, devlink uses libmnl.
>>
>
>Now I remember. You wrote it independently and but needed iproute2 be a
>delivery vehicle. It uses none of the common infrastructure from
>iproute2. Could we make this more difficult ....
Feel free to rewrite it to use lib/libnetlink.c. Should not be that
hard. Note that at the time I was pushing devlink userspace, tipc also
used libmnl as a part of iproute2, so devlink was not the first one.
That is why I decided not to rewrite.
As of the rest of the "common infrastructure", what exactly do you
have in mind?
>
>Sometime in the next day I will jump through the hoops to get a proper
>devlink command.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-03 18:29 ` David Ahern
2018-01-03 18:36 ` Jiri Pirko
@ 2018-01-04 0:07 ` Arkadi Sharshevsky
1 sibling, 0 replies; 61+ messages in thread
From: Arkadi Sharshevsky @ 2018-01-04 0:07 UTC (permalink / raw)
To: David Ahern, Jiri Pirko
Cc: netdev, roopa, davem, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz
On 01/03/2018 08:29 PM, David Ahern wrote:
> On 1/3/18 11:17 AM, Jiri Pirko wrote:
>> Wed, Jan 03, 2018 at 07:14:16PM CET, dsa@cumulusnetworks.com wrote:
>>> On 1/3/18 11:05 AM, Arkadi Sharshevsky wrote:
>>>> As I stated this is a user-space bug which I fixed, and updated my repo
>>>> so please pull. Devlink uses mnl,and currently mnl does not support
>>>> extended ack. I added support for this in my local ver of libmnl:
>>>>
>>>> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Farkadis%2Flibmnl.git&data=02%7C01%7Carkadis%40mellanox.com%7C5c86b6240eb84459c6ae08d552d7f9a4%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636506009929977440&sdata=sgrNzMhPwe63BIVxexZTjl%2FXqW51kpuRiHVhTDNaa70%3D&reserved=0
>>>>
>>>> On branch master, so you can check it out. Besides this bugs, which were
>>>> userspace, can please specify what are the pending problems from your
>>>> point of view? Thanks!
>>>
>>> devlink is in iproute2 package and it has extack support. See 'git log
>>> lib/libnetlink.c'
>>
>> Dave, devlink uses libmnl.
>>
>
> Now I remember. You wrote it independently and but needed iproute2 be a
> delivery vehicle. It uses none of the common infrastructure from
> iproute2. Could we make this more difficult ....
>
> Sometime in the next day I will jump through the hoops to get a proper
> devlink command.
>
This actually was very confusing, I think the extack should be
handled by libmnl and iproute should use mnl_cb_run() routines
and not to implement its own. That way we could both benefit
from that.
You actually do use libmnl in libnetlink.c only for parsing
the headers, and its a dependency for extack handling.
I see this as a completely independent user space issue, which
doesn't have to do anything with this patchset. Not to mention
that everything is working right now.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-03 18:05 ` Arkadi Sharshevsky
2018-01-03 18:14 ` David Ahern
@ 2018-01-04 2:28 ` David Ahern
2018-01-04 9:24 ` Arkadi Sharshevsky
1 sibling, 1 reply; 61+ messages in thread
From: David Ahern @ 2018-01-04 2:28 UTC (permalink / raw)
To: Arkadi Sharshevsky, Jiri Pirko, netdev, roopa
Cc: davem, mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
ganeshgr, saeedm, matanb, leonro, idosch, jakub.kicinski, ast,
daniel, simon.horman, pieter.jansenvanvuuren, john.hurley,
alexander.h.duyck, linville, gospo, steven.lin1, yuvalm, ogerlitz
On 1/3/18 11:05 AM, Arkadi Sharshevsky wrote:
>
>
> On 01/02/2018 08:05 PM, David Ahern wrote:
>> On 1/1/18 7:58 AM, Arkadi Sharshevsky wrote:
>>>
>>> Just to summarize the current fixes required:
>>>
>>> 1. ERIF dpipe table size is reporting wrong size. More precisely the
>>> ERIF table does not take rifs, so it should not be linked to the rif
>>> bank resource (is not part of this patchset, future extension).
>>> 2. Extended ACK user-space bug.
>>> 3. ABI documentation- Not sure we agreed upon it, Jiri?
>>>
>>> If I missed something please respond. Nothing of the fixes mentioned
>>> above is relevant for this patchset actually.
>>>
>>
>> Can you fix the userspace command and then we come back to what else is
>> needed? Right now, it is hard to tell what is a user space bug and what
>> is a kernel space bug.
>>
>> For example:
>> $ devlink resource set pci/0000:03:00.0 path /kvd/linear size 10000
>> $ devlink resource show pci/0000:03:00.0
>> pci/0000:03:00.0:
>> name kvd size 245760 size_valid true
>> resources:
>> name linear size 98304 occ 0
>> name hash_double size 60416
>> name hash_single size 87040
>>
>> The set command did not fail, yet there is no size_new arg in the output
>> like there is for this change:
>>
>> $ devlink resource set pci/0000:03:00.0 path /kvd/linear size 0
>> $ devlink resource show pci/0000:03:00.0
>> pci/0000:03:00.0:
>> name kvd size 245760 size_valid true
>> resources:
>> name linear size 98304 size_new 0 occ 0
>> name hash_double size 60416
>> name hash_single size 87040
>>
>
> As I stated this is a user-space bug which I fixed, and updated my repo
> so please pull. Devlink uses mnl,and currently mnl does not support
> extended ack. I added support for this in my local ver of libmnl:
>
> https://github.com/arkadis/libmnl.git
>
> On branch master, so you can check it out. Besides this bugs, which were
> userspace, can please specify what are the pending problems from your
> point of view? Thanks!
>
Again, my comments all stem from user experience.
Can you explain what "double_word" means for a unit? I would expect a
units to be kB or count (or items or entries).
$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 unit double_word size_valid true
resources:
name linear size 98304 occ 0 unit double_word
name hash_double size 60416 unit double_word
name hash_single size 87040 unit double_word
While that is confusing here from the userspace command it goes hand in
hand with patch 2 and:
+enum devlink_resource_unit {
+ DEVLINK_RESOURCE_UNIT_DOUBLE_WORD,
+};
Also, it seems like the occ of 0 is wrong since we know from past
responses that if I set linear to 0 all of networking breaks.
How does a user learn the granularity of a resource:
$ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 50000
Error: mlxsw_spectrum: resource set with wrong granularity.
Try again with 51000 and then 52000 and ... Why not export the
granularity read-only? I don't see it in the proposed UAPI.
And then on the reload:
$ devlink reload pci/0000:03:00.0
Error: devlink: resources size validation failed.
Since the reload is not doing any resource sizing that error message is
confusing. Maybe something like "Sum of the resource components exceeds
total size."
Finally, I still contend a 1-line description of each of the resources
goes a long way to improving the user friendliness of this set.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-04 2:28 ` David Ahern
@ 2018-01-04 9:24 ` Arkadi Sharshevsky
2018-01-04 12:41 ` Andrew Lunn
2018-01-04 15:58 ` David Ahern
0 siblings, 2 replies; 61+ messages in thread
From: Arkadi Sharshevsky @ 2018-01-04 9:24 UTC (permalink / raw)
To: David Ahern, Jiri Pirko, netdev, roopa
Cc: davem, mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
ganeshgr, saeedm, matanb, leonro, idosch, jakub.kicinski, ast,
daniel, simon.horman, pieter.jansenvanvuuren, john.hurley,
alexander.h.duyck, linville, gospo, steven.lin1, yuvalm, ogerlitz
On 01/04/2018 04:28 AM, David Ahern wrote:
> On 1/3/18 11:05 AM, Arkadi Sharshevsky wrote:
>>
>>
>> On 01/02/2018 08:05 PM, David Ahern wrote:
>>> On 1/1/18 7:58 AM, Arkadi Sharshevsky wrote:
>>>>
>>>> Just to summarize the current fixes required:
>>>>
>>>> 1. ERIF dpipe table size is reporting wrong size. More precisely the
>>>> ERIF table does not take rifs, so it should not be linked to the rif
>>>> bank resource (is not part of this patchset, future extension).
>>>> 2. Extended ACK user-space bug.
>>>> 3. ABI documentation- Not sure we agreed upon it, Jiri?
>>>>
>>>> If I missed something please respond. Nothing of the fixes mentioned
>>>> above is relevant for this patchset actually.
>>>>
>>>
>>> Can you fix the userspace command and then we come back to what else is
>>> needed? Right now, it is hard to tell what is a user space bug and what
>>> is a kernel space bug.
>>>
>>> For example:
>>> $ devlink resource set pci/0000:03:00.0 path /kvd/linear size 10000
>>> $ devlink resource show pci/0000:03:00.0
>>> pci/0000:03:00.0:
>>> name kvd size 245760 size_valid true
>>> resources:
>>> name linear size 98304 occ 0
>>> name hash_double size 60416
>>> name hash_single size 87040
>>>
>>> The set command did not fail, yet there is no size_new arg in the output
>>> like there is for this change:
>>>
>>> $ devlink resource set pci/0000:03:00.0 path /kvd/linear size 0
>>> $ devlink resource show pci/0000:03:00.0
>>> pci/0000:03:00.0:
>>> name kvd size 245760 size_valid true
>>> resources:
>>> name linear size 98304 size_new 0 occ 0
>>> name hash_double size 60416
>>> name hash_single size 87040
>>>
>>
>> As I stated this is a user-space bug which I fixed, and updated my repo
>> so please pull. Devlink uses mnl,and currently mnl does not support
>> extended ack. I added support for this in my local ver of libmnl:
>>
>> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Farkadis%2Flibmnl.git&data=02%7C01%7Carkadis%40mellanox.com%7C9a369b54cdec48a5e1d208d5531adbdb%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636506297202356822&sdata=6KOkBz5PHqu6z8nlexSdggZj42LE4geiYVFA%2BgcgaPE%3D&reserved=0
>>
>> On branch master, so you can check it out. Besides this bugs, which were
>> userspace, can please specify what are the pending problems from your
>> point of view? Thanks!
>>
>
> Again, my comments all stem from user experience.
>
> Can you explain what "double_word" means for a unit? I would expect a
> units to be kB or count (or items or entries).
>
Double word is 64 bit, dont understand why this is confusing.
> $ devlink resource show pci/0000:03:00.0
> pci/0000:03:00.0:
> name kvd size 245760 unit double_word size_valid true
> resources:
> name linear size 98304 occ 0 unit double_word
> name hash_double size 60416 unit double_word
> name hash_single size 87040 unit double_word
>
> While that is confusing here from the userspace command it goes hand in
> hand with patch 2 and:
>
> +enum devlink_resource_unit {
> + DEVLINK_RESOURCE_UNIT_DOUBLE_WORD,
> +};
>
>
> Also, it seems like the occ of 0 is wrong since we know from past
> responses that if I set linear to 0 all of networking breaks.
You are clearly misunderstanding what is occupancy of the resource
and what kvd linear is good for. As I mentioned in the last response
kvd linear is mapped to adjacency table. So in case its 0 no nexthop
routes could be configured, this information is provided by the
dpipe<-->resource.
Occupancy means how much of the resource is used right now, why is
this wrong? and how its related to the size 0 exactly?
>
>
>
> How does a user learn the granularity of a resource:
>
> $ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 50000
> Error: mlxsw_spectrum: resource set with wrong granularity.
>
> Try again with 51000 and then 52000 and ... Why not export the
> granularity read-only? I don't see it in the proposed UAPI.
>
I would like more adding the granularity size to the extack string
instead of adding this to UAPI. The user will try to update once
and will get the required granularity in the error message.
>
> And then on the reload:
>
> $ devlink reload pci/0000:03:00.0
> Error: devlink: resources size validation failed.
>
> Since the reload is not doing any resource sizing that error message is
> confusing. Maybe something like "Sum of the resource components exceeds
> total size."
>
No problem, sounds better.
>
> Finally, I still contend a 1-line description of each of the resources
> goes a long way to improving the user friendliness of this set.
>
Strongly against it.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-04 9:24 ` Arkadi Sharshevsky
@ 2018-01-04 12:41 ` Andrew Lunn
2018-01-04 15:58 ` David Ahern
1 sibling, 0 replies; 61+ messages in thread
From: Andrew Lunn @ 2018-01-04 12:41 UTC (permalink / raw)
To: Arkadi Sharshevsky
Cc: David Ahern, Jiri Pirko, netdev, roopa, davem, mlxsw,
vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz
> Double word is 64 bit, dont understand why this is confusing.
In an ASCI, the definition of a word can be quite flexible. I've seen
designs using 14 bit words, since 14 bits was all that was needed to
represent the data to be held. I've also seen a 16 bit word used to
hold a signed value, with the binary point before the last nibble, so
it could hold -2047.9375 to +2047.9375. I might have that wrong. It
gave me a headache at the time, but the synthesizer had no such
problems.
Why not use u8, u16, u14, u32, u64, which we can all understand
without confusion.
Andrew
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-04 9:24 ` Arkadi Sharshevsky
2018-01-04 12:41 ` Andrew Lunn
@ 2018-01-04 15:58 ` David Ahern
2018-01-04 16:13 ` Arkadi Sharshevsky
1 sibling, 1 reply; 61+ messages in thread
From: David Ahern @ 2018-01-04 15:58 UTC (permalink / raw)
To: Arkadi Sharshevsky, Jiri Pirko, netdev, roopa
Cc: davem, mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
ganeshgr, saeedm, matanb, leonro, idosch, jakub.kicinski, ast,
daniel, simon.horman, pieter.jansenvanvuuren, john.hurley,
alexander.h.duyck, linville, gospo, steven.lin1, yuvalm, ogerlitz
On 1/4/18 2:24 AM, Arkadi Sharshevsky wrote:
>> Again, my comments all stem from user experience.
>>
>> Can you explain what "double_word" means for a unit? I would expect a
>> units to be kB or count (or items or entries).
>>
>
> Double word is 64 bit, dont understand why this is confusing.
As Andrew pointed out, double word can have a range of sizes.
To my point here, a 'unit' for a number should not be the number of bits
it is represented by. I believe all of the kvd sizes are in entries
(ie., a linear size of 98304 means I can have 98,304 entries in that
resource).
>
>> $ devlink resource show pci/0000:03:00.0
>> pci/0000:03:00.0:
>> name kvd size 245760 unit double_word size_valid true
>> resources:
>> name linear size 98304 occ 0 unit double_word
>> name hash_double size 60416 unit double_word
>> name hash_single size 87040 unit double_word
>>
>> While that is confusing here from the userspace command it goes hand in
>> hand with patch 2 and:
>>
>> +enum devlink_resource_unit {
>> + DEVLINK_RESOURCE_UNIT_DOUBLE_WORD,
>> +};
>>
>>
>> Also, it seems like the occ of 0 is wrong since we know from past
>> responses that if I set linear to 0 all of networking breaks.
>
> You are clearly misunderstanding what is occupancy of the resource
> and what kvd linear is good for. As I mentioned in the last response
> kvd linear is mapped to adjacency table. So in case its 0 no nexthop
> routes could be configured, this information is provided by the
> dpipe<-->resource.
>
> Occupancy means how much of the resource is used right now, why is
> this wrong? and how its related to the size 0 exactly?
The summary line above shows the current kvd/linear occupancy is '0'.
That suggests my L3 only deployment is not using any kvd/linear which
means I can set its allocation to 0 and devote more kvd resources to the
hash regions.
But, as I pointed out in previous responses I can not set linear to 0.
If I do all of networking breaks. That suggests that kvd/linear is used
by more networking entities than you are currently reporting. Hence,
telling me the kvd/linear occupancy is 0 is wrong.
I believe the stems from the how you are determining occupancy and the
fact that not all tables have been implemented. Showing the current
occupancy of a resource is very helpful, so it should be part of the API.
However, until mlxsw shows a proper value (either by implementing all
dpipe tables or changing how it is calculated) it should not be
returning that attribute. As it stands you are giving a user invalid data.
>
>>
>>
>>
>> How does a user learn the granularity of a resource:
>>
>> $ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 50000
>> Error: mlxsw_spectrum: resource set with wrong granularity.
>>
>> Try again with 51000 and then 52000 and ... Why not export the
>> granularity read-only? I don't see it in the proposed UAPI.
>>
>
> I would like more adding the granularity size to the extack string
> instead of adding this to UAPI. The user will try to update once
> and will get the required granularity in the error message.
A user should not have to run a command to get an error message to know
essential data to running a command with the right value. Further, do
not assume 'user' is a person or the devlink command.
Since granularity is essential to running a valid command, it should be
an attribute for each resource.
>
>>
>> And then on the reload:
>>
>> $ devlink reload pci/0000:03:00.0
>> Error: devlink: resources size validation failed.
>>
>> Since the reload is not doing any resource sizing that error message is
>> confusing. Maybe something like "Sum of the resource components exceeds
>> total size."
>>
>
> No problem, sounds better.
>
>>
>> Finally, I still contend a 1-line description of each of the resources
>> goes a long way to improving the user friendliness of this set.
>>
>
> Strongly against it.
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-04 15:58 ` David Ahern
@ 2018-01-04 16:13 ` Arkadi Sharshevsky
2018-01-04 16:44 ` David Ahern
2018-01-04 18:03 ` David Ahern
0 siblings, 2 replies; 61+ messages in thread
From: Arkadi Sharshevsky @ 2018-01-04 16:13 UTC (permalink / raw)
To: David Ahern, Jiri Pirko, netdev, roopa
Cc: davem, mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
ganeshgr, saeedm, matanb, leonro, idosch, jakub.kicinski, ast,
daniel, simon.horman, pieter.jansenvanvuuren, john.hurley,
alexander.h.duyck, linville, gospo, steven.lin1, yuvalm, ogerlitz
On 01/04/2018 05:58 PM, David Ahern wrote:
> On 1/4/18 2:24 AM, Arkadi Sharshevsky wrote:
>>> Again, my comments all stem from user experience.
>>>
>>> Can you explain what "double_word" means for a unit? I would expect a
>>> units to be kB or count (or items or entries).
>>>
>>
>> Double word is 64 bit, dont understand why this is confusing.
>
> As Andrew pointed out, double word can have a range of sizes.
>
> To my point here, a 'unit' for a number should not be the number of bits
> it is represented by. I believe all of the kvd sizes are in entries
> (ie., a linear size of 98304 means I can have 98,304 entries in that
> resource).
>
>>
>>> $ devlink resource show pci/0000:03:00.0
>>> pci/0000:03:00.0:
>>> name kvd size 245760 unit double_word size_valid true
>>> resources:
>>> name linear size 98304 occ 0 unit double_word
>>> name hash_double size 60416 unit double_word
>>> name hash_single size 87040 unit double_word
>>>
>>> While that is confusing here from the userspace command it goes hand in
>>> hand with patch 2 and:
>>>
>>> +enum devlink_resource_unit {
>>> + DEVLINK_RESOURCE_UNIT_DOUBLE_WORD,
>>> +};
>>>
>>>
>>> Also, it seems like the occ of 0 is wrong since we know from past
>>> responses that if I set linear to 0 all of networking breaks.
>>
>> You are clearly misunderstanding what is occupancy of the resource
>> and what kvd linear is good for. As I mentioned in the last response
>> kvd linear is mapped to adjacency table. So in case its 0 no nexthop
>> routes could be configured, this information is provided by the
>> dpipe<-->resource.
>>
>> Occupancy means how much of the resource is used right now, why is
>> this wrong? and how its related to the size 0 exactly?
>
> The summary line above shows the current kvd/linear occupancy is '0'.
> That suggests my L3 only deployment is not using any kvd/linear which
> means I can set its allocation to 0 and devote more kvd resources to the
> hash regions.
>
> But, as I pointed out in previous responses I can not set linear to 0.
> If I do all of networking breaks. That suggests that kvd/linear is used
> by more networking entities than you are currently reporting. Hence,
> telling me the kvd/linear occupancy is 0 is wrong.
>
> I believe the stems from the how you are determining occupancy and the
> fact that not all tables have been implemented. Showing the current
> occupancy of a resource is very helpful, so it should be part of the API.
>
> However, until mlxsw shows a proper value (either by implementing all
> dpipe tables or changing how it is calculated) it should not be
> returning that attribute. As it stands you are giving a user invalid data.
>
Wait, the occupancy is very precise it goes directly to the linear
allocator inside the driver and gives exact current usage. You assumed
it goes to the dpipe tables which is incorrect.
The linear kvd memory is used by:
1. The adjacency table is responsible for the ECMP and nexthop
resolution.
2. ACL flexible actions blocks (dpipe will contain table for this).
I dont know what you configured but in case you got some remote
routes with a nexthop the occ should not be zero.
If your router contains only local routes and dont use ACL you can
shrink it to zero, no problem.
>>
>>>
>>>
>>>
>>> How does a user learn the granularity of a resource:
>>>
>>> $ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 50000
>>> Error: mlxsw_spectrum: resource set with wrong granularity.
>>>
>>> Try again with 51000 and then 52000 and ... Why not export the
>>> granularity read-only? I don't see it in the proposed UAPI.
>>>
>>
>> I would like more adding the granularity size to the extack string
>> instead of adding this to UAPI. The user will try to update once
>> and will get the required granularity in the error message.
>
> A user should not have to run a command to get an error message to know
> essential data to running a command with the right value. Further, do
> not assume 'user' is a person or the devlink command.
>
> Since granularity is essential to running a valid command, it should be
> an attribute for each resource.
>
This is also your approach towards the min/max for resource sizes?
Am I correct?
>
>>
>>>
>>> And then on the reload:
>>>
>>> $ devlink reload pci/0000:03:00.0
>>> Error: devlink: resources size validation failed.
>>>
>>> Since the reload is not doing any resource sizing that error message is
>>> confusing. Maybe something like "Sum of the resource components exceeds
>>> total size."
>>>
>>
>> No problem, sounds better.
>>
>>>
>>> Finally, I still contend a 1-line description of each of the resources
>>> goes a long way to improving the user friendliness of this set.
>>>
>>
>> Strongly against it.
>>
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-04 16:13 ` Arkadi Sharshevsky
@ 2018-01-04 16:44 ` David Ahern
2018-01-04 18:03 ` David Ahern
1 sibling, 0 replies; 61+ messages in thread
From: David Ahern @ 2018-01-04 16:44 UTC (permalink / raw)
To: Arkadi Sharshevsky, Jiri Pirko, netdev, roopa
Cc: davem, mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
ganeshgr, saeedm, matanb, leonro, idosch, jakub.kicinski, ast,
daniel, simon.horman, pieter.jansenvanvuuren, john.hurley,
alexander.h.duyck, linville, gospo, steven.lin1, yuvalm, ogerlitz
On 1/4/18 9:13 AM, Arkadi Sharshevsky wrote:
>
>
> On 01/04/2018 05:58 PM, David Ahern wrote:
>> On 1/4/18 2:24 AM, Arkadi Sharshevsky wrote:
>>>> Again, my comments all stem from user experience.
>>>>
>>>> Can you explain what "double_word" means for a unit? I would expect a
>>>> units to be kB or count (or items or entries).
>>>>
>>>
>>> Double word is 64 bit, dont understand why this is confusing.
>>
>> As Andrew pointed out, double word can have a range of sizes.
>>
>> To my point here, a 'unit' for a number should not be the number of bits
>> it is represented by. I believe all of the kvd sizes are in entries
>> (ie., a linear size of 98304 means I can have 98,304 entries in that
>> resource).
>>
>>>
>>>> $ devlink resource show pci/0000:03:00.0
>>>> pci/0000:03:00.0:
>>>> name kvd size 245760 unit double_word size_valid true
>>>> resources:
>>>> name linear size 98304 occ 0 unit double_word
>>>> name hash_double size 60416 unit double_word
>>>> name hash_single size 87040 unit double_word
>>>>
>>>> While that is confusing here from the userspace command it goes hand in
>>>> hand with patch 2 and:
>>>>
>>>> +enum devlink_resource_unit {
>>>> + DEVLINK_RESOURCE_UNIT_DOUBLE_WORD,
>>>> +};
>>>>
>>>>
>>>> Also, it seems like the occ of 0 is wrong since we know from past
>>>> responses that if I set linear to 0 all of networking breaks.
>>>
>>> You are clearly misunderstanding what is occupancy of the resource
>>> and what kvd linear is good for. As I mentioned in the last response
>>> kvd linear is mapped to adjacency table. So in case its 0 no nexthop
>>> routes could be configured, this information is provided by the
>>> dpipe<-->resource.
>>>
>>> Occupancy means how much of the resource is used right now, why is
>>> this wrong? and how its related to the size 0 exactly?
>>
>> The summary line above shows the current kvd/linear occupancy is '0'.
>> That suggests my L3 only deployment is not using any kvd/linear which
>> means I can set its allocation to 0 and devote more kvd resources to the
>> hash regions.
>>
>> But, as I pointed out in previous responses I can not set linear to 0.
>> If I do all of networking breaks. That suggests that kvd/linear is used
>> by more networking entities than you are currently reporting. Hence,
>> telling me the kvd/linear occupancy is 0 is wrong.
>>
>> I believe the stems from the how you are determining occupancy and the
>> fact that not all tables have been implemented. Showing the current
>> occupancy of a resource is very helpful, so it should be part of the API.
>>
>> However, until mlxsw shows a proper value (either by implementing all
>> dpipe tables or changing how it is calculated) it should not be
>> returning that attribute. As it stands you are giving a user invalid data.
>>
>
> Wait, the occupancy is very precise it goes directly to the linear
> allocator inside the driver and gives exact current usage. You assumed
> it goes to the dpipe tables which is incorrect.
I am trying *guess* based on this discussion as to why it is 0. In past
responses you push that the tables expose how the resource is used.
>
> The linear kvd memory is used by:
> 1. The adjacency table is responsible for the ECMP and nexthop
> resolution.
> 2. ACL flexible actions blocks (dpipe will contain table for this).
>
> I dont know what you configured but in case you got some remote
> routes with a nexthop the occ should not be zero.
>
> If your router contains only local routes and dont use ACL you can
> shrink it to zero, no problem.
My current setup is an L3 only deployment, no ACLs and no bridges.
We are meeting in 15 minutes to discuss this design; let's use some of
the time for debugging this problem.
>
>>>
>>>>
>>>>
>>>>
>>>> How does a user learn the granularity of a resource:
>>>>
>>>> $ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 50000
>>>> Error: mlxsw_spectrum: resource set with wrong granularity.
>>>>
>>>> Try again with 51000 and then 52000 and ... Why not export the
>>>> granularity read-only? I don't see it in the proposed UAPI.
>>>>
>>>
>>> I would like more adding the granularity size to the extack string
>>> instead of adding this to UAPI. The user will try to update once
>>> and will get the required granularity in the error message.
>>
>> A user should not have to run a command to get an error message to know
>> essential data to running a command with the right value. Further, do
>> not assume 'user' is a person or the devlink command.
>>
>> Since granularity is essential to running a valid command, it should be
>> an attribute for each resource.
>>
>
> This is also your approach towards the min/max for resource sizes?
> Am I correct?
yes.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-03 18:36 ` Jiri Pirko
@ 2018-01-04 16:58 ` David Ahern
2018-01-04 17:17 ` David Miller
2018-01-25 15:24 ` Jiri Pirko
0 siblings, 2 replies; 61+ messages in thread
From: David Ahern @ 2018-01-04 16:58 UTC (permalink / raw)
To: Jiri Pirko
Cc: Arkadi Sharshevsky, netdev, roopa, davem, mlxsw, andrew,
vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz
On 1/3/18 11:36 AM, Jiri Pirko wrote:
> Wed, Jan 03, 2018 at 07:29:46PM CET, dsa@cumulusnetworks.com wrote:
>> On 1/3/18 11:17 AM, Jiri Pirko wrote:
>>> Wed, Jan 03, 2018 at 07:14:16PM CET, dsa@cumulusnetworks.com wrote:
>>>> On 1/3/18 11:05 AM, Arkadi Sharshevsky wrote:
>>>>> As I stated this is a user-space bug which I fixed, and updated my repo
>>>>> so please pull. Devlink uses mnl,and currently mnl does not support
>>>>> extended ack. I added support for this in my local ver of libmnl:
>>>>>
>>>>> https://github.com/arkadis/libmnl.git
>>>>>
>>>>> On branch master, so you can check it out. Besides this bugs, which were
>>>>> userspace, can please specify what are the pending problems from your
>>>>> point of view? Thanks!
>>>>
>>>> devlink is in iproute2 package and it has extack support. See 'git log
>>>> lib/libnetlink.c'
>>>
>>> Dave, devlink uses libmnl.
>>>
>>
>> Now I remember. You wrote it independently and but needed iproute2 be a
>> delivery vehicle. It uses none of the common infrastructure from
>> iproute2. Could we make this more difficult ....
>
> Feel free to rewrite it to use lib/libnetlink.c. Should not be that
> hard. Note that at the time I was pushing devlink userspace, tipc also
> used libmnl as a part of iproute2, so devlink was not the first one.
> That is why I decided not to rewrite.
>
> As of the rest of the "common infrastructure", what exactly do you
> have in mind?
>
This is what I am getting at. Apparently, these resource patches for
devlink require a patched libmnl to work properly. It is wrong for
iproute2 to accept this patch and to build a devlink command that we
know does not work. That means iproute2 needs a dependency on a specific
version of libmnl - a version which does not yet exist because those
changes have not been accepted and libmnl version released. Adding that
dependency is going to inconvenience to all current users of the
iproute2 repo.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-04 16:58 ` David Ahern
@ 2018-01-04 17:17 ` David Miller
2018-01-25 15:24 ` Jiri Pirko
1 sibling, 0 replies; 61+ messages in thread
From: David Miller @ 2018-01-04 17:17 UTC (permalink / raw)
To: dsa
Cc: jiri, arkadis, netdev, roopa, mlxsw, andrew, vivien.didelot,
f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz
From: David Ahern <dsa@cumulusnetworks.com>
Date: Thu, 4 Jan 2018 09:58:51 -0700
> This is what I am getting at. Apparently, these resource patches for
> devlink require a patched libmnl to work properly. It is wrong for
> iproute2 to accept this patch and to build a devlink command that we
> know does not work. That means iproute2 needs a dependency on a specific
> version of libmnl - a version which does not yet exist because those
> changes have not been accepted and libmnl version released. Adding that
> dependency is going to inconvenience to all current users of the
> iproute2 repo.
+1
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-04 16:13 ` Arkadi Sharshevsky
2018-01-04 16:44 ` David Ahern
@ 2018-01-04 18:03 ` David Ahern
1 sibling, 0 replies; 61+ messages in thread
From: David Ahern @ 2018-01-04 18:03 UTC (permalink / raw)
To: Arkadi Sharshevsky, Jiri Pirko, netdev, roopa
Cc: davem, mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
ganeshgr, saeedm, matanb, leonro, idosch, jakub.kicinski, ast,
daniel, simon.horman, pieter.jansenvanvuuren, john.hurley,
alexander.h.duyck, linville, gospo, steven.lin1, yuvalm, ogerlitz
On 1/4/18 9:13 AM, Arkadi Sharshevsky wrote:
>>>> Also, it seems like the occ of 0 is wrong since we know from past
>>>> responses that if I set linear to 0 all of networking breaks.
Ok, this was a David bug. I was running ifreload after the devlink
reload command, but all of my connections to the switch are through
breakout ports and the ifreload was not running the devlink port split
command. Doing that before the ifreload and all is fine.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-04 16:58 ` David Ahern
2018-01-04 17:17 ` David Miller
@ 2018-01-25 15:24 ` Jiri Pirko
2018-01-25 15:38 ` David Ahern
1 sibling, 1 reply; 61+ messages in thread
From: Jiri Pirko @ 2018-01-25 15:24 UTC (permalink / raw)
To: David Ahern
Cc: Arkadi Sharshevsky, netdev, roopa, davem, mlxsw, andrew,
vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz
Thu, Jan 04, 2018 at 05:58:51PM CET, dsa@cumulusnetworks.com wrote:
>On 1/3/18 11:36 AM, Jiri Pirko wrote:
>> Wed, Jan 03, 2018 at 07:29:46PM CET, dsa@cumulusnetworks.com wrote:
>>> On 1/3/18 11:17 AM, Jiri Pirko wrote:
>>>> Wed, Jan 03, 2018 at 07:14:16PM CET, dsa@cumulusnetworks.com wrote:
>>>>> On 1/3/18 11:05 AM, Arkadi Sharshevsky wrote:
>>>>>> As I stated this is a user-space bug which I fixed, and updated my repo
>>>>>> so please pull. Devlink uses mnl,and currently mnl does not support
>>>>>> extended ack. I added support for this in my local ver of libmnl:
>>>>>>
>>>>>> https://github.com/arkadis/libmnl.git
>>>>>>
>>>>>> On branch master, so you can check it out. Besides this bugs, which were
>>>>>> userspace, can please specify what are the pending problems from your
>>>>>> point of view? Thanks!
>>>>>
>>>>> devlink is in iproute2 package and it has extack support. See 'git log
>>>>> lib/libnetlink.c'
>>>>
>>>> Dave, devlink uses libmnl.
>>>>
>>>
>>> Now I remember. You wrote it independently and but needed iproute2 be a
>>> delivery vehicle. It uses none of the common infrastructure from
>>> iproute2. Could we make this more difficult ....
>>
>> Feel free to rewrite it to use lib/libnetlink.c. Should not be that
>> hard. Note that at the time I was pushing devlink userspace, tipc also
>> used libmnl as a part of iproute2, so devlink was not the first one.
>> That is why I decided not to rewrite.
>>
>> As of the rest of the "common infrastructure", what exactly do you
>> have in mind?
>>
>
>This is what I am getting at. Apparently, these resource patches for
>devlink require a patched libmnl to work properly. It is wrong for
What do you mean, "work properly". If there is no patched libmnl, only
thing what happens is that the extack won't be processed and shown.
That is the same behaviour as if you compile iproute2 package without
libmnl - all works, used just don't see extack messages.
I don't see any problem in that. Do you?
>iproute2 to accept this patch and to build a devlink command that we
>know does not work. That means iproute2 needs a dependency on a specific
>version of libmnl - a version which does not yet exist because those
>changes have not been accepted and libmnl version released. Adding that
>dependency is going to inconvenience to all current users of the
>iproute2 repo.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-25 15:24 ` Jiri Pirko
@ 2018-01-25 15:38 ` David Ahern
2018-01-25 15:50 ` Jiri Pirko
0 siblings, 1 reply; 61+ messages in thread
From: David Ahern @ 2018-01-25 15:38 UTC (permalink / raw)
To: Jiri Pirko
Cc: Arkadi Sharshevsky, netdev, roopa, davem, mlxsw, andrew,
vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz
On 1/25/18 8:24 AM, Jiri Pirko wrote:
>>>> Now I remember. You wrote it independently and but needed iproute2 be a
>>>> delivery vehicle. It uses none of the common infrastructure from
>>>> iproute2. Could we make this more difficult ....
>>>
>>> Feel free to rewrite it to use lib/libnetlink.c. Should not be that
>>> hard. Note that at the time I was pushing devlink userspace, tipc also
>>> used libmnl as a part of iproute2, so devlink was not the first one.
>>> That is why I decided not to rewrite.
>>>
>>> As of the rest of the "common infrastructure", what exactly do you
>>> have in mind?
>>>
>>
>> This is what I am getting at. Apparently, these resource patches for
>> devlink require a patched libmnl to work properly. It is wrong for
>
> What do you mean, "work properly". If there is no patched libmnl, only
> thing what happens is that the extack won't be processed and shown.
> That is the same behaviour as if you compile iproute2 package without
> libmnl - all works, used just don't see extack messages.
>
> I don't see any problem in that. Do you?
I have libmnl installed. I build iproute2. I don't get extended ack
messages on devlink failures. That is a problem. That is building a
command that is known not to work as it should given the build dependencies.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch net-next v2 00/10] Add support for resource abstraction
2018-01-25 15:38 ` David Ahern
@ 2018-01-25 15:50 ` Jiri Pirko
0 siblings, 0 replies; 61+ messages in thread
From: Jiri Pirko @ 2018-01-25 15:50 UTC (permalink / raw)
To: David Ahern
Cc: Arkadi Sharshevsky, netdev, roopa, davem, mlxsw, andrew,
vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, linville,
gospo, steven.lin1, yuvalm, ogerlitz
Thu, Jan 25, 2018 at 04:38:41PM CET, dsa@cumulusnetworks.com wrote:
>On 1/25/18 8:24 AM, Jiri Pirko wrote:
>>>>> Now I remember. You wrote it independently and but needed iproute2 be a
>>>>> delivery vehicle. It uses none of the common infrastructure from
>>>>> iproute2. Could we make this more difficult ....
>>>>
>>>> Feel free to rewrite it to use lib/libnetlink.c. Should not be that
>>>> hard. Note that at the time I was pushing devlink userspace, tipc also
>>>> used libmnl as a part of iproute2, so devlink was not the first one.
>>>> That is why I decided not to rewrite.
>>>>
>>>> As of the rest of the "common infrastructure", what exactly do you
>>>> have in mind?
>>>>
>>>
>>> This is what I am getting at. Apparently, these resource patches for
>>> devlink require a patched libmnl to work properly. It is wrong for
>>
>> What do you mean, "work properly". If there is no patched libmnl, only
>> thing what happens is that the extack won't be processed and shown.
>> That is the same behaviour as if you compile iproute2 package without
>> libmnl - all works, used just don't see extack messages.
>>
>> I don't see any problem in that. Do you?
>
>I have libmnl installed. I build iproute2. I don't get extended ack
>messages on devlink failures. That is a problem. That is building a
>command that is known not to work as it should given the build dependencies.
Known to work, but not for devlink. But nevermind.
^ permalink raw reply [flat|nested] 61+ messages in thread
end of thread, other threads:[~2018-01-25 15:50 UTC | newest]
Thread overview: 61+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-12-26 11:23 [patch net-next v2 00/10] Add support for resource abstraction Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 01/10] devlink: Add per devlink instance lock Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 02/10] devlink: Add support for resource abstraction Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 03/10] devlink: Add support for reload Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 04/10] devlink: Add relation between dpipe and resource Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 05/10] mlxsw: pci: Add support for performing bus reset Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 06/10] mlxsw: spectrum: Register KVD resources with devlink Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 07/10] mlxsw: spectrum_dpipe: Connect dpipe tables to resources Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 08/10] mlxsw: spectrum: Add support for getting kvdl occupancy Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 09/10] mlxsw: pci: Add support for getting resource through devlink Jiri Pirko
2017-12-26 11:23 ` [patch net-next v2 10/10] mlxsw: core: Add support for reload Jiri Pirko
2017-12-27 4:05 ` [patch net-next v2 00/10] Add support for resource abstraction David Ahern
2017-12-27 8:09 ` Jiri Pirko
2017-12-27 8:23 ` Andrew Lunn
2017-12-27 9:37 ` Jiri Pirko
2017-12-27 13:08 ` Andrew Lunn
2017-12-27 13:15 ` Jiri Pirko
2017-12-27 16:38 ` David Ahern
2017-12-27 19:31 ` Andrew Lunn
2017-12-27 20:16 ` David Ahern
2017-12-27 8:28 ` Yuval Mintz
2017-12-27 16:34 ` David Ahern
2017-12-27 19:43 ` Roopa Prabhu
2017-12-28 8:21 ` Yuval Mintz
2017-12-30 21:15 ` David Ahern
2017-12-31 10:52 ` Arkadi Sharshevsky
2017-12-31 15:46 ` David Ahern
2018-01-01 12:23 ` Arkadi Sharshevsky
2017-12-27 20:15 ` Arkadi Sharshevsky
2017-12-28 8:25 ` Yuval Mintz
2017-12-28 16:09 ` David Ahern
2017-12-28 16:23 ` Jiri Pirko
2017-12-28 16:33 ` David Ahern
2017-12-28 16:43 ` Jiri Pirko
2017-12-29 17:09 ` Arkadi Sharshevsky
2017-12-30 10:18 ` Andrew Lunn
2017-12-30 10:25 ` Jiri Pirko
2017-12-30 17:26 ` Andrew Lunn
2018-01-01 14:58 ` Arkadi Sharshevsky
2018-01-02 10:08 ` Jiri Pirko
2018-01-02 13:41 ` Andrew Lunn
2018-01-02 14:35 ` Jiri Pirko
2018-01-02 18:05 ` David Ahern
2018-01-03 18:05 ` Arkadi Sharshevsky
2018-01-03 18:14 ` David Ahern
2018-01-03 18:17 ` Jiri Pirko
2018-01-03 18:29 ` David Ahern
2018-01-03 18:36 ` Jiri Pirko
2018-01-04 16:58 ` David Ahern
2018-01-04 17:17 ` David Miller
2018-01-25 15:24 ` Jiri Pirko
2018-01-25 15:38 ` David Ahern
2018-01-25 15:50 ` Jiri Pirko
2018-01-04 0:07 ` Arkadi Sharshevsky
2018-01-04 2:28 ` David Ahern
2018-01-04 9:24 ` Arkadi Sharshevsky
2018-01-04 12:41 ` Andrew Lunn
2018-01-04 15:58 ` David Ahern
2018-01-04 16:13 ` Arkadi Sharshevsky
2018-01-04 16:44 ` David Ahern
2018-01-04 18:03 ` David Ahern
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).