* [PATCH RFC net-next 1/4] net: core: page_pool_user: Allow flexibility of 'ifindex' value
2024-06-25 12:08 [PATCH RFC net-next 0/4] Adjust page pool netlink filling to non common case Amit Cohen
@ 2024-06-25 12:08 ` Amit Cohen
2024-06-25 12:08 ` [PATCH RFC net-next 2/4] net: core: page_pool_user: Change 'ifindex' for page pool dump Amit Cohen
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Amit Cohen @ 2024-06-25 12:08 UTC (permalink / raw)
To: kuba
Cc: davem, edumazet, pabeni, hawk, idosch, petrm, mlxsw, netdev,
Amit Cohen
The netlink message for a page pool query includes 'ifindex'. Currently,
this value is always set to 'pool->slow.netdev->ifindex', so responses can
only be generated for page pools which hold a pointer to a real netdevice.
When a driver does not have a 1:1 mapping between page pool and netdevice,
'pool->slow.netdev' does not point to a real netdevice, which means such
drivers cannot query page pool info and statistics.
The functions page_pool_nl_stats_fill()/page_pool_nl_fill() take a page
pool structure and use the 'ifindex' stored in the pool to fill the
netlink message. Instead, let the callers decide which 'ifindex' should be
used. For now, all the callers pass 'pool->slow.netdev->ifindex', so there
is no behavior change. The next patch will change the dump behavior.
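A minimal sketch of the resulting callback shape, consolidated from the
hunks below (not the full file):

	typedef int (*pp_nl_fill_cb)(struct sk_buff *rsp,
				     const struct page_pool *pool,
				     const struct genl_info *info, int ifindex);

	/* Both the "do" and "dump" paths keep today's behavior by passing
	 * the pool's own netdevice:
	 */
	err = fill(rsp, pool, info, pool->slow.netdev->ifindex);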
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
---
net/core/page_pool_user.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 3a3277ba167b..44948f7b9d68 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -30,7 +30,7 @@ static DEFINE_MUTEX(page_pools_lock);
*/
typedef int (*pp_nl_fill_cb)(struct sk_buff *rsp, const struct page_pool *pool,
- const struct genl_info *info);
+ const struct genl_info *info, int ifindex);
static int
netdev_nl_page_pool_get_do(struct genl_info *info, u32 id, pp_nl_fill_cb fill)
@@ -53,7 +53,7 @@ netdev_nl_page_pool_get_do(struct genl_info *info, u32 id, pp_nl_fill_cb fill)
goto err_unlock;
}
- err = fill(rsp, pool, info);
+ err = fill(rsp, pool, info, pool->slow.netdev->ifindex);
if (err)
goto err_free_msg;
@@ -92,7 +92,7 @@ netdev_nl_page_pool_get_dump(struct sk_buff *skb, struct netlink_callback *cb,
continue;
state->pp_id = pool->user.id;
- err = fill(skb, pool, info);
+ err = fill(skb, pool, info, pool->slow.netdev->ifindex);
if (err)
goto out;
}
@@ -108,7 +108,7 @@ netdev_nl_page_pool_get_dump(struct sk_buff *skb, struct netlink_callback *cb,
static int
page_pool_nl_stats_fill(struct sk_buff *rsp, const struct page_pool *pool,
- const struct genl_info *info)
+ const struct genl_info *info, int ifindex)
{
#ifdef CONFIG_PAGE_POOL_STATS
struct page_pool_stats stats = {};
@@ -125,9 +125,8 @@ page_pool_nl_stats_fill(struct sk_buff *rsp, const struct page_pool *pool,
nest = nla_nest_start(rsp, NETDEV_A_PAGE_POOL_STATS_INFO);
if (nla_put_uint(rsp, NETDEV_A_PAGE_POOL_ID, pool->user.id) ||
- (pool->slow.netdev->ifindex != LOOPBACK_IFINDEX &&
- nla_put_u32(rsp, NETDEV_A_PAGE_POOL_IFINDEX,
- pool->slow.netdev->ifindex)))
+ (ifindex != LOOPBACK_IFINDEX &&
+ nla_put_u32(rsp, NETDEV_A_PAGE_POOL_IFINDEX, ifindex)))
goto err_cancel_nest;
nla_nest_end(rsp, nest);
@@ -210,7 +209,7 @@ int netdev_nl_page_pool_stats_get_dumpit(struct sk_buff *skb,
static int
page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
- const struct genl_info *info)
+ const struct genl_info *info, int ifindex)
{
size_t inflight, refsz;
void *hdr;
@@ -222,9 +221,8 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
if (nla_put_uint(rsp, NETDEV_A_PAGE_POOL_ID, pool->user.id))
goto err_cancel;
- if (pool->slow.netdev->ifindex != LOOPBACK_IFINDEX &&
- nla_put_u32(rsp, NETDEV_A_PAGE_POOL_IFINDEX,
- pool->slow.netdev->ifindex))
+ if (ifindex != LOOPBACK_IFINDEX &&
+ nla_put_u32(rsp, NETDEV_A_PAGE_POOL_IFINDEX, ifindex))
goto err_cancel;
if (pool->user.napi_id &&
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, pool->user.napi_id))
@@ -271,7 +269,7 @@ static void netdev_nl_page_pool_event(const struct page_pool *pool, u32 cmd)
if (!ntf)
return;
- if (page_pool_nl_fill(ntf, pool, &info)) {
+ if (page_pool_nl_fill(ntf, pool, &info, pool->slow.netdev->ifindex)) {
nlmsg_free(ntf);
return;
}
--
2.45.1
* [PATCH RFC net-next 2/4] net: core: page_pool_user: Change 'ifindex' for page pool dump
2024-06-25 12:08 [PATCH RFC net-next 0/4] Adjust page pool netlink filling to non common case Amit Cohen
2024-06-25 12:08 ` [PATCH RFC net-next 1/4] net: core: page_pool_user: Allow flexibility of 'ifindex' value Amit Cohen
@ 2024-06-25 12:08 ` Amit Cohen
2024-06-25 12:08 ` [PATCH RFC net-next 3/4] mlxsw: pci: Allow get page pool info/stats via netlink Amit Cohen
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Amit Cohen @ 2024-06-25 12:08 UTC (permalink / raw)
To: kuba
Cc: davem, edumazet, pabeni, hawk, idosch, petrm, mlxsw, netdev,
Amit Cohen
Currently, to dump all page pools, a loop iterates over all netdevices in
the relevant net and then, for each netdevice, iterates over all page
pools which are attached to it, calling the fill() function with the pool.
With the existing code, the netlink message is filled with
'pool->slow.netdev->ifindex', which means that if a pool is used by
several netdevices, it will not be dumped with the real netdevice via
page-pool-get/page-pool-stats-get, as this pointer should be NULL in such
a case.
Change the 'ifindex' which is passed to the fill() function: pass the
'ifindex' of the netdevice in whose list the pool is stored. This does not
change the behavior for drivers which have a netdevice per page pool, as
the same value is passed as before. It allows drivers which share a page
pool between several netdevices to dump all the pools; such drivers just
need to make the 'netdev->page_pools' list hold all the pools from which
the netdevice consumes pages.
Note that the 'ifindex' for the get command is not changed.
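For reference, a rough sketch of the resulting dump loop (the hlist member
name is taken from the existing page_pool code; bookkeeping abbreviated):

	for_each_netdev_dump(net, netdev, state->ifindex) {
		hlist_for_each_entry(pool, &netdev->page_pools, user.list) {
			/* ... pp_id bookkeeping as before ... */
			state->pp_id = pool->user.id;
			err = fill(skb, pool, info, netdev->ifindex);
			if (err)
				goto out;
		}
	}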
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
---
net/core/page_pool_user.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 44948f7b9d68..ce4a34adad8a 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -92,7 +92,7 @@ netdev_nl_page_pool_get_dump(struct sk_buff *skb, struct netlink_callback *cb,
continue;
state->pp_id = pool->user.id;
- err = fill(skb, pool, info, pool->slow.netdev->ifindex);
+ err = fill(skb, pool, info, netdev->ifindex);
if (err)
goto out;
}
--
2.45.1
* [PATCH RFC net-next 3/4] mlxsw: pci: Allow get page pool info/stats via netlink
2024-06-25 12:08 [PATCH RFC net-next 0/4] Adjust page pool netlink filling to non common case Amit Cohen
2024-06-25 12:08 ` [PATCH RFC net-next 1/4] net: core: page_pool_user: Allow flexibility of 'ifindex' value Amit Cohen
2024-06-25 12:08 ` [PATCH RFC net-next 2/4] net: core: page_pool_user: Change 'ifindex' for page pool dump Amit Cohen
@ 2024-06-25 12:08 ` Amit Cohen
2024-06-25 12:08 ` [PATCH RFC net-next 4/4] mlxsw: Set page pools list for netdevices Amit Cohen
2024-06-25 14:35 ` [PATCH RFC net-next 0/4] Adjust page pool netlink filling to non common case Jakub Kicinski
4 siblings, 0 replies; 7+ messages in thread
From: Amit Cohen @ 2024-06-25 12:08 UTC (permalink / raw)
To: kuba
Cc: davem, edumazet, pabeni, hawk, idosch, petrm, mlxsw, netdev,
Amit Cohen
Spectrum ASICs do not have a queue per netdevice, so the mlxsw driver
does not have a NAPI instance per netdevice; instead, a "dummy" netdevice
is used. Recently, the driver started using page pool for buffer
allocations, with each Rx queue (RDQ) using a dedicated page pool.
To allow the user to query page pool info and statistics, a page pool
should be attached to a netdevice. Setting the "dummy" netdevice as part
of the page pool parameters allows querying info about a specific pool.
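In the driver this amounts to one extra parameter when each RDQ's pool is
created, roughly (a sketch matching the hunk below):

	pp_params.dev = &mlxsw_pci->pdev->dev;
	pp_params.napi = &q->u.cq.napi;
	pp_params.dma_dir = DMA_FROM_DEVICE;
	pp_params.netdev = q->pci->napi_dev_rx; /* the shared "dummy" netdev */

	page_pool = page_pool_create(&pp_params);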
Without this patch, "do" commands fail:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--do page-pool-get --json '{"id" : "20"}' --output-json
Netlink error: No such file or directory
nl_len = 36 (20) nl_flags = 0x100 nl_type = 2
error: -2
With this patch, the user can query info about a specific pool:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--do page-pool-get --json '{"id" : "20"}' --output-json
{"id": 20, "ifindex": 0, "napi-id": 108, "inflight": 3072, "inflight-mem": 12582912}
Note that this only works when the devlink instance is in the initial
namespace. If the devlink instance is reloaded into another namespace, the
get command will fail, as the dummy netdevice associated with the pools
belongs to the initial namespace.
$ ip netns add pp_test
$ devlink dev reload pci/0000:xx:00.0 netns pp_test
$ ip netns exec pp_test ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do page-pool-stats-get --json '{"info" : {"id" : "20"}}' --output-json
Netlink error: No such file or directory
nl_len = 36 (20) nl_flags = 0x100 nl_type = 2
error: -2
A subsequent patch will allow the user to use the "dump" command as well.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/pci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index cb043379c01c..7abb4b2fe541 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -892,6 +892,7 @@ static int mlxsw_pci_cq_page_pool_init(struct mlxsw_pci_queue *q,
pp_params.dev = &mlxsw_pci->pdev->dev;
pp_params.napi = &q->u.cq.napi;
pp_params.dma_dir = DMA_FROM_DEVICE;
+ pp_params.netdev = q->pci->napi_dev_rx;
page_pool = page_pool_create(&pp_params);
if (IS_ERR(page_pool))
--
2.45.1
* [PATCH RFC net-next 4/4] mlxsw: Set page pools list for netdevices
2024-06-25 12:08 [PATCH RFC net-next 0/4] Adjust page pool netlink filling to non common case Amit Cohen
` (2 preceding siblings ...)
2024-06-25 12:08 ` [PATCH RFC net-next 3/4] mlxsw: pci: Allow get page pool info/stats via netlink Amit Cohen
@ 2024-06-25 12:08 ` Amit Cohen
2024-06-25 14:35 ` [PATCH RFC net-next 0/4] Adjust page pool netlink filling to non common case Jakub Kicinski
4 siblings, 0 replies; 7+ messages in thread
From: Amit Cohen @ 2024-06-25 12:08 UTC (permalink / raw)
To: kuba
Cc: davem, edumazet, pabeni, hawk, idosch, petrm, mlxsw, netdev,
Amit Cohen
Spectrum ASICs do not have a queue per netdevice, so the mlxsw driver
does not have a NAPI instance per netdevice; instead, a "dummy" netdevice
is used. Recently, the driver started using page pool for buffer
allocations, with each Rx queue (RDQ) using a dedicated page pool.
A previous patch set the "dummy" Rx netdevice as the netdevice of each
allocated page pool. This results in "napi_dev_rx->page_pools" holding
all the page pools used by the driver.
Ideally, we would like to allow the user to dump page pools - to get, for
each netdevice, all the pools which are allocated by the driver, as each
netdevice uses all of them. For that, add a bus operation which returns
the 'hlist_head' of the "dummy" netdevice, and set this list head for all
netdevices as part of port creation. Together with the previous patches,
which fill the netlink message with the netdevice that holds the page pool
in its list, we can now dump all page pools for each netdevice.
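Put together, a sketch of the pieces added below (names as in the diffs):

	/* pci.c: the bus exposes the dummy netdevice's page pool list. */
	static struct hlist_head mlxsw_pci_page_pools_head(void *bus_priv)
	{
		struct mlxsw_pci *mlxsw_pci = bus_priv;

		return mlxsw_pci->napi_dev_rx->page_pools;
	}

	/* spectrum.c: every port netdevice points at that same list. */
	dev->page_pools = mlxsw_core_page_pools_head(mlxsw_sp->core);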
Without this set, "dump" commands do not print page pool stats:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump page-pool-stats-get --output-json | jq
[]
With this set, "dump" commands print all the page pools for all the
netdevices:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump page-pool-stats-get --output-json | jq
[
...
{
"info": {
"id": 5,
"ifindex": 64
},
"alloc-fast": 1434916,
"alloc-slow": 49,
....
"recycle-ring": 1454621,
}
...
]
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump page-pool-get --output-json | \
jq -e ".[] | select(.ifindex == 64)" | grep "napi-id" | wc -l
56
Note that CONFIG_PAGE_POOL_STATS should be enabled to get statistics.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/core.c | 6 ++++++
drivers/net/ethernet/mellanox/mlxsw/core.h | 2 ++
drivers/net/ethernet/mellanox/mlxsw/pci.c | 8 ++++++++
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 2 ++
4 files changed, 18 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 4a79c0d7e7ad..15b367b37ba9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -2337,6 +2337,12 @@ int mlxsw_core_skb_transmit(struct mlxsw_core *mlxsw_core, struct sk_buff *skb,
}
EXPORT_SYMBOL(mlxsw_core_skb_transmit);
+struct hlist_head mlxsw_core_page_pools_head(struct mlxsw_core *mlxsw_core)
+{
+ return mlxsw_core->bus->page_pools_head(mlxsw_core->bus_priv);
+}
+EXPORT_SYMBOL(mlxsw_core_page_pools_head);
+
void mlxsw_core_ptp_transmitted(struct mlxsw_core *mlxsw_core,
struct sk_buff *skb, u16 local_port)
{
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 6d11225594dd..9925f541ed50 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -95,6 +95,7 @@ bool mlxsw_core_skb_transmit_busy(struct mlxsw_core *mlxsw_core,
const struct mlxsw_tx_info *tx_info);
int mlxsw_core_skb_transmit(struct mlxsw_core *mlxsw_core, struct sk_buff *skb,
const struct mlxsw_tx_info *tx_info);
+struct hlist_head mlxsw_core_page_pools_head(struct mlxsw_core *mlxsw_core);
void mlxsw_core_ptp_transmitted(struct mlxsw_core *mlxsw_core,
struct sk_buff *skb, u16 local_port);
@@ -498,6 +499,7 @@ struct mlxsw_bus {
u32 (*read_utc_nsec)(void *bus_priv);
enum mlxsw_cmd_mbox_config_profile_lag_mode (*lag_mode)(void *bus_priv);
enum mlxsw_cmd_mbox_config_profile_flood_mode (*flood_mode)(void *priv);
+ struct hlist_head (*page_pools_head)(void *bus_priv);
u8 features;
};
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 7abb4b2fe541..16516ae6a818 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -2179,6 +2179,13 @@ mlxsw_pci_flood_mode(void *bus_priv)
return mlxsw_pci->flood_mode;
}
+static struct hlist_head mlxsw_pci_page_pools_head(void *bus_priv)
+{
+ struct mlxsw_pci *mlxsw_pci = bus_priv;
+
+ return mlxsw_pci->napi_dev_rx->page_pools;
+}
+
static const struct mlxsw_bus mlxsw_pci_bus = {
.kind = "pci",
.init = mlxsw_pci_init,
@@ -2192,6 +2199,7 @@ static const struct mlxsw_bus mlxsw_pci_bus = {
.read_utc_nsec = mlxsw_pci_read_utc_nsec,
.lag_mode = mlxsw_pci_lag_mode,
.flood_mode = mlxsw_pci_flood_mode,
+ .page_pools_head = mlxsw_pci_page_pools_head,
.features = MLXSW_BUS_F_TXRX | MLXSW_BUS_F_RESET,
};
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index f064789f3240..3c78690c248f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1826,6 +1826,7 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u16 local_port,
goto err_register_netdev;
}
+ dev->page_pools = mlxsw_core_page_pools_head(mlxsw_sp->core);
mlxsw_core_schedule_dw(&mlxsw_sp_port->periodic_hw_stats.update_dw, 0);
return 0;
@@ -1880,6 +1881,7 @@ static void mlxsw_sp_port_remove(struct mlxsw_sp *mlxsw_sp, u16 local_port)
u8 module = mlxsw_sp_port->mapping.module;
cancel_delayed_work_sync(&mlxsw_sp_port->periodic_hw_stats.update_dw);
+ INIT_HLIST_HEAD(&mlxsw_sp_port->dev->page_pools); /* Reset list head. */
cancel_delayed_work_sync(&mlxsw_sp_port->ptp.shaper_dw);
unregister_netdev(mlxsw_sp_port->dev); /* This calls ndo_stop */
mlxsw_sp_port_ptp_clear(mlxsw_sp_port);
--
2.45.1
* Re: [PATCH RFC net-next 0/4] Adjust page pool netlink filling to non common case
2024-06-25 12:08 [PATCH RFC net-next 0/4] Adjust page pool netlink filling to non common case Amit Cohen
` (3 preceding siblings ...)
2024-06-25 12:08 ` [PATCH RFC net-next 4/4] mlxsw: Set page pools list for netdevices Amit Cohen
@ 2024-06-25 14:35 ` Jakub Kicinski
2024-06-25 15:37 ` Amit Cohen
4 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2024-06-25 14:35 UTC (permalink / raw)
To: Amit Cohen; +Cc: davem, edumazet, pabeni, hawk, idosch, petrm, mlxsw, netdev
On Tue, 25 Jun 2024 15:08:03 +0300 Amit Cohen wrote:
> Most network drivers has 1:1 mapping between netdevice and event queues,
> so then each page pool is used by only one netdevice. This is not the case
> in mlxsw driver.
>
> Currently, the netlink message is filled with 'pool->slow.netdev->ifindex',
> which should be NULL in case that several netdevices use the same pool.
> Adjust page pool netlink filling to use the netdevice which the pool is
> stored in its list. See more info in commit messages.
>
> Without this set, mlxsw driver cannot dump all page pools:
> $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
> --dump page-pool-stats-get --output-json | jq
> []
>
> With this set, "dump" command prints all the page pools for all the
> netdevices:
> $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
> --dump page-pool-get --output-json | \
> jq -e ".[] | select(.ifindex == 64)" | grep "napi-id" | wc -l
> 56
>
> From driver POV, such queries are supported by associating the pools with
> an unregistered netdevice (dummy netdevice). The following limitations
> are caused by such implementation:
> 1. The get command output specifies the 'ifindex' as 0, which is
> meaningless. `iproute2` will print this as "*", but there might be other
> tools which fail in such case.
> 2. get command does not work when devlink instance is reloaded to namespace
> which is not the initial one, as the dummy device associated with the pools
> belongs to the initial namespace.
> See examples in commit messages.
>
> We would like to expose page pool stats and info via the standard
> interface, but such implementation is not perfect. An additional option
> is to use debugfs, but we prefer to avoid it, if it is possible. Any
> suggestions for better implementation in case of pool for several
> netdevices will be welcomed.
If I read the code correctly you dump all page pools for all port
netdevs? Primary use for page pool stats right now is to measure
how much memory have netdevs gobbled up. You can't duplicate entries,
because user space may double count the memory...
How about we instead add a net pointer and have the page pools listed
under loopback from the start? That's the best we can do I reckon.
Or just go with debugfs / ethtool -S, the standard interface is for
things which are standard. If the device doesn't work in a standard way
there's no need to shoehorn it in. This series feels a bit like checkbox
engineering to me, if I'm completely honest..
* RE: [PATCH RFC net-next 0/4] Adjust page pool netlink filling to non common case
2024-06-25 14:35 ` [PATCH RFC net-next 0/4] Adjust page pool netlink filling to non common case Jakub Kicinski
@ 2024-06-25 15:37 ` Amit Cohen
0 siblings, 0 replies; 7+ messages in thread
From: Amit Cohen @ 2024-06-25 15:37 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
hawk@kernel.org, Ido Schimmel, Petr Machata, mlxsw,
netdev@vger.kernel.org
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Tuesday, 25 June 2024 17:35
> To: Amit Cohen <amcohen@nvidia.com>
> Cc: davem@davemloft.net; edumazet@google.com; pabeni@redhat.com; hawk@kernel.org; Ido Schimmel <idosch@nvidia.com>; Petr
> Machata <petrm@nvidia.com>; mlxsw <mlxsw@nvidia.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH RFC net-next 0/4] Adjust page pool netlink filling to non common case
>
> On Tue, 25 Jun 2024 15:08:03 +0300 Amit Cohen wrote:
> > Most network drivers has 1:1 mapping between netdevice and event queues,
> > so then each page pool is used by only one netdevice. This is not the case
> > in mlxsw driver.
> >
> > Currently, the netlink message is filled with 'pool->slow.netdev->ifindex',
> > which should be NULL in case that several netdevices use the same pool.
> > Adjust page pool netlink filling to use the netdevice which the pool is
> > stored in its list. See more info in commit messages.
> >
> > Without this set, mlxsw driver cannot dump all page pools:
> > $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
> > --dump page-pool-stats-get --output-json | jq
> > []
> >
> > With this set, "dump" command prints all the page pools for all the
> > netdevices:
> > $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
> > --dump page-pool-get --output-json | \
> > jq -e ".[] | select(.ifindex == 64)" | grep "napi-id" | wc -l
> > 56
> >
> > From driver POV, such queries are supported by associating the pools with
> > an unregistered netdevice (dummy netdevice). The following limitations
> > are caused by such implementation:
> > 1. The get command output specifies the 'ifindex' as 0, which is
> > meaningless. `iproute2` will print this as "*", but there might be other
> > tools which fail in such case.
> > 2. get command does not work when devlink instance is reloaded to namespace
> > which is not the initial one, as the dummy device associated with the pools
> > belongs to the initial namespace.
> > See examples in commit messages.
> >
> > We would like to expose page pool stats and info via the standard
> > interface, but such implementation is not perfect. An additional option
> > is to use debugfs, but we prefer to avoid it, if it is possible. Any
> > suggestions for better implementation in case of pool for several
> > netdevices will be welcomed.
>
> If I read the code correctly you dump all page pools for all port
> netdevs?
Yes.
> Primary use for page pool stats right now is to measure
> how much memory have netdevs gobbled up. You can't duplicate entries,
> because user space may double count the memory...
>
> How about we instead add a net pointer and have the page pools listed
> under loopback from the start? That's the best we can do I reckon.
> Or just go with debugfs / ethtool -S, the standard interface is for
> things which are standard. If the device doesn't work in a standard way
> there's no need to shoehorn it in. This series feels a bit like checkbox
> engineering to me, if I'm completely honest..
Thanks for your feedback. I sent this to ask whether you have better suggestions for this case.
I agree that if the device is not standard, we should not use the standard interface when the solution is not perfect.
I think that for now we won't expose statistics; if we find it necessary, we will probably use debugfs for that.
Thanks!