From: Jakub Kicinski <kuba@kernel.org>
To: Heng Qi <hengqi@linux.alibaba.com>
Cc: netdev@vger.kernel.org, virtualization@lists.linux.dev,
"David S . Miller" <davem@davemloft.net>,
Paolo Abeni <pabeni@redhat.com>,
Eric Dumazet <edumazet@google.com>,
Jason Wang <jasowang@redhat.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
Brett Creeley <bcreeley@amd.com>,
Ratheesh Kannoth <rkannoth@marvell.com>,
Alexander Lobakin <aleksander.lobakin@intel.com>,
Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
Tal Gilboa <talgi@nvidia.com>, Jonathan Corbet <corbet@lwn.net>,
linux-doc@vger.kernel.org,
Maxime Chevallier <maxime.chevallier@bootlin.com>,
Jiri Pirko <jiri@resnulli.us>,
Paul Greenwalt <paul.greenwalt@intel.com>,
Ahmed Zaki <ahmed.zaki@intel.com>,
Vladimir Oltean <vladimir.oltean@nxp.com>,
Kory Maincent <kory.maincent@bootlin.com>,
Andrew Lunn <andrew@lunn.ch>,
"justinstitt @ google . com" <justinstitt@google.com>
Subject: Re: [PATCH net-next v10 2/4] ethtool: provide customized dim profile management
Date: Fri, 26 Apr 2024 18:33:33 -0700 [thread overview]
Message-ID: <20240426183333.257ccae5@kernel.org> (raw)
In-Reply-To: <20240425165948.111269-3-hengqi@linux.alibaba.com>
On Fri, 26 Apr 2024 00:59:46 +0800 Heng Qi wrote:
> The NetDIM library, currently leveraged by an array of NICs, delivers
> excellent acceleration benefits. Nevertheless, NICs vary significantly
> in their dim profile list prerequisites.
>
> Specifically, virtio-net backends may present diverse sw or hw device
> implementation, making a one-size-fits-all parameter list impractical.
> On Alibaba Cloud, the virtio DPU's performance under the default DIM
> profile falls short of expectations, partly due to a mismatch in
> parameter configuration.
>
> I also noticed that ice/idpf/ena and other NICs have customized
> profilelist or placed some restrictions on dim capabilities.
>
> Motivated by this, I tried adding new params for "ethtool -C" that provides
> a per-device control to modify and access a device's interrupt parameters.
>
> Usage
> ========
> The target NIC is named ethx.
>
> Assume that ethx only declares support for rx profile setting
> (with DIM_PROFILE_RX flag set in profile_flags) and supports modification
> of usec and pkt fields.
>
> 1. Query the currently customized list of the device
>
> $ ethtool -c ethx
> ...
> rx-profile:
> {.usec = 1, .pkts = 256, .comps = n/a,},
> {.usec = 8, .pkts = 256, .comps = n/a,},
> {.usec = 64, .pkts = 256, .comps = n/a,},
> {.usec = 128, .pkts = 256, .comps = n/a,},
> {.usec = 256, .pkts = 256, .comps = n/a,}
> tx-profile: n/a
>
> 2. Tune
> $ ethtool -C ethx rx-profile 1,1,n_2,n,n_3,3,n_4,4,n_n,5,n
> "n" means do not modify this field.
> $ ethtool -c ethx
> ...
> rx-profile:
> {.usec = 1, .pkts = 1, .comps = n/a,},
> {.usec = 2, .pkts = 256, .comps = n/a,},
> {.usec = 3, .pkts = 3, .comps = n/a,},
> {.usec = 4, .pkts = 4, .comps = n/a,},
> {.usec = 256, .pkts = 5, .comps = n/a,}
> tx-profile: n/a
>
> 3. Hint
> If the device does not support some type of customized dim profiles,
> the corresponding "n/a" will display.
>
> If the "n/a" field is being modified, -EOPNOTSUPP will be reported.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> ---
> Documentation/netlink/specs/ethtool.yaml | 23 ++
> Documentation/networking/ethtool-netlink.rst | 4 +
> include/linux/dim.h | 60 +++++
> include/linux/ethtool.h | 7 +-
> include/linux/netdevice.h | 5 +
> include/uapi/linux/ethtool_netlink.h | 20 ++
> lib/dim/net_dim.c | 73 +++++
> net/ethtool/coalesce.c | 264 ++++++++++++++++++-
> 8 files changed, 454 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
> index 87ae7b397984..3c51a1a0b5d9 100644
> --- a/Documentation/netlink/specs/ethtool.yaml
> +++ b/Documentation/netlink/specs/ethtool.yaml
> @@ -413,6 +413,18 @@ attribute-sets:
> -
> name: combined-count
> type: u32
> + -
> + name: irq-moderation
> + attributes:
> + -
> + name: usec
> + type: u32
> + -
> + name: pkts
> + type: u32
> + -
> + name: comps
> + type: u32
>
> -
> name: coalesce
> @@ -502,6 +514,15 @@ attribute-sets:
> -
> name: tx-aggr-time-usecs
> type: u32
> + -
> + name: rx-profile
> + type: nest
> + nested-attributes: irq-moderation
> + -
> + name: tx-profile
> + type: nest
> + nested-attributes: irq-moderation
> +
> -
> name: pause-stat
> attributes:
> @@ -1313,6 +1334,8 @@ operations:
> - tx-aggr-max-bytes
> - tx-aggr-max-frames
> - tx-aggr-time-usecs
> + - rx-profile
> + - tx-profile
> dump: *coalesce-get-op
> -
> name: coalesce-set
> diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
> index 4e63d3708ed9..78ee25081498 100644
> --- a/Documentation/networking/ethtool-netlink.rst
> +++ b/Documentation/networking/ethtool-netlink.rst
> @@ -1040,6 +1040,8 @@ Kernel response contents:
> ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES`` u32 max aggr size, Tx
> ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES`` u32 max aggr packets, Tx
> ``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS`` u32 time (us), aggr, Tx
> + ``ETHTOOL_A_COALESCE_RX_PROFILE`` nested profile of DIM, Rx
> + ``ETHTOOL_A_COALESCE_TX_PROFILE`` nested profile of DIM, Tx
> =========================================== ====== =======================
>
> Attributes are only included in reply if their value is not zero or the
> @@ -1105,6 +1107,8 @@ Request contents:
> ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES`` u32 max aggr size, Tx
> ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES`` u32 max aggr packets, Tx
> ``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS`` u32 time (us), aggr, Tx
> + ``ETHTOOL_A_COALESCE_RX_PROFILE`` nested profile of DIM, Rx
> + ``ETHTOOL_A_COALESCE_TX_PROFILE`` nested profile of DIM, Tx
> =========================================== ====== =======================
>
> Request is rejected if it attributes declared as unsupported by driver (i.e.
> diff --git a/include/linux/dim.h b/include/linux/dim.h
> index 43398f5eade2..af01389fcf39 100644
> --- a/include/linux/dim.h
> +++ b/include/linux/dim.h
> @@ -9,6 +9,7 @@
> #include <linux/module.h>
> #include <linux/types.h>
> #include <linux/workqueue.h>
> +#include <linux/netdevice.h>
>
> /* Number of DIM profiles and period mode. */
> #define NET_DIM_PARAMS_NUM_PROFILES 5
> @@ -53,6 +54,39 @@ struct dim_cq_moder {
> u8 cq_period_mode;
> };
>
> +#define DIM_PROFILE_RX BIT(0) /* support rx dim profile modification */
> +#define DIM_PROFILE_TX BIT(1) /* support tx dim profile modification */
> +
> +#define DIM_COALESCE_USEC BIT(0) /* support usec field modification */
> +#define DIM_COALESCE_PKTS BIT(1) /* support pkts field modification */
> +#define DIM_COALESCE_COMPS BIT(2) /* support comps field modification */
> +
> +struct dim_irq_moder {
> + /* See DIM_PROFILE_* */
> + u8 profile_flags;
> +
> + /* See DIM_COALESCE_* for Rx and Tx */
> + u8 coal_flags;
> +
> + /* Rx DIM period count mode: CQE or EQE */
> + u8 dim_rx_mode;
> +
> + /* Tx DIM period count mode: CQE or EQE */
> + u8 dim_tx_mode;
> +
> + /* DIM profile list for Rx */
> + struct dim_cq_moder __rcu *rx_profile;
> +
> + /* DIM profile list for Tx */
> + struct dim_cq_moder __rcu *tx_profile;
> +
> + /* Rx DIM worker function scheduled by net_dim() */
> + void (*rx_dim_work)(struct work_struct *work);
> +
> + /* Tx DIM worker function scheduled by net_dim() */
> + void (*tx_dim_work)(struct work_struct *work);
> +};
> +
> /**
> * struct dim_sample - Structure for DIM sample data.
> * Used for communications between DIM and its consumer.
> @@ -198,6 +232,32 @@ enum dim_step_result {
> DIM_ON_EDGE,
> };
>
> +/**
> + * net_dim_init_irq_moder - collect information to initialize irq moderation
> + * @dev: target network device
> + * @profile_flags: Rx or Tx profile modification capability
> + * @coal_flags: irq moderation params flags
> + * @rx_mode: CQ period mode for Rx
> + * @tx_mode: CQ period mode for Tx
> + * void (*rx_dim_work)(struct work_struct *work);
> + * Rx worker called after dim decision.
> + *
> + * void (*tx_dim_work)(struct work_struct *work);
> + * Tx worker called after dim decision.
this format is not going to make ./scripts/kernel-doc happy,
run it on the modified files it will show you the warnings:
./scripts/kernel-doc -Wall -none file
> diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
> index b4f0d233d048..4837e37e8b10 100644
> --- a/include/uapi/linux/ethtool_netlink.h
> +++ b/include/uapi/linux/ethtool_netlink.h
> @@ -416,12 +416,32 @@ enum {
> ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES, /* u32 */
> ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES, /* u32 */
> ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS, /* u32 */
> + ETHTOOL_A_COALESCE_RX_PROFILE, /* nest - _A_PROFILE_IRQ_MODERATION */
> + ETHTOOL_A_COALESCE_TX_PROFILE, /* nest - _A_PROFILE_IRQ_MODERATION */
>
> /* add new constants above here */
> __ETHTOOL_A_COALESCE_CNT,
> ETHTOOL_A_COALESCE_MAX = (__ETHTOOL_A_COALESCE_CNT - 1)
> };
>
> +enum {
> + ETHTOOL_A_PROFILE_UNSPEC,
> + ETHTOOL_A_PROFILE_IRQ_MODERATION, /* nest, _A_IRQ_MODERATION_* */
> +
> + __ETHTOOL_A_PROFILE_CNT,
> + ETHTOOL_A_PROFILE_MAX = (__ETHTOOL_A_PROFILE_CNT - 1)
> +};
I think this doesn't match what you described in the YAML spec.
There is no "irq-moderation" layer there and no multi-attr: true...
Does tools/net/ynl/cli.py work with the new attributes?
> +enum {
> + ETHTOOL_A_IRQ_MODERATION_UNSPEC,
> + ETHTOOL_A_IRQ_MODERATION_USEC, /* u32 */
> + ETHTOOL_A_IRQ_MODERATION_PKTS, /* u32 */
> + ETHTOOL_A_IRQ_MODERATION_COMPS, /* u32 */
> +
> + __ETHTOOL_A_IRQ_MODERATION_CNT,
> + ETHTOOL_A_IRQ_MODERATION_MAX = (__ETHTOOL_A_IRQ_MODERATION_CNT - 1)
> +};
> +
> /* PAUSE */
>
> enum {
> diff --git a/lib/dim/net_dim.c b/lib/dim/net_dim.c
> index 4e32f7aaac86..ec0dc21793c0 100644
> --- a/lib/dim/net_dim.c
> +++ b/lib/dim/net_dim.c
> @@ -101,6 +101,79 @@ net_dim_get_def_tx_moderation(u8 cq_period_mode)
> }
> EXPORT_SYMBOL(net_dim_get_def_tx_moderation);
>
> +int net_dim_init_irq_moder(struct net_device *dev, u8 profile_flags,
> + u8 coal_flags, u8 rx_mode, u8 tx_mode,
> + void (*rx_dim_work)(struct work_struct *work),
> + void (*tx_dim_work)(struct work_struct *work))
> +{
> + struct dim_cq_moder *rxp, *txp;
> + struct dim_irq_moder *moder;
> + int len;
> +
> + dev->irq_moder = kzalloc(sizeof(*dev->irq_moder), GFP_KERNEL);
> + if (!dev->irq_moder)
> + goto err_moder;
> +
> + moder = dev->irq_moder;
> + len = NET_DIM_PARAMS_NUM_PROFILES * sizeof(*moder->rx_profile);
> +
> + moder->profile_flags = profile_flags;
> + moder->coal_flags = coal_flags;
> +
> + if (profile_flags & DIM_PROFILE_RX) {
> + moder->dim_rx_mode = rx_mode;
> + moder->rx_dim_work = rx_dim_work;
> + rxp = kmemdup(rx_profile[rx_mode], len, GFP_KERNEL);
> + if (!rxp)
> + goto err_rx_profile;
> +
> + rcu_assign_pointer(moder->rx_profile, rxp);
> + }
> +
> + if (profile_flags & DIM_PROFILE_TX) {
> + moder->dim_tx_mode = tx_mode;
> + moder->tx_dim_work = tx_dim_work;
> + txp = kmemdup(tx_profile[tx_mode], len, GFP_KERNEL);
> + if (!txp)
> + goto err_tx_profile;
you need to init rxp to NULL, otherwise this goto may read
uninitialized value
> + rcu_assign_pointer(moder->tx_profile, txp);
> + }
> +
> + return 0;
> +
> +err_tx_profile:
> + kfree(rxp);
> +err_rx_profile:
> + kfree(moder);
> +err_moder:
> + return -ENOMEM;
> +}
> +EXPORT_SYMBOL(net_dim_init_irq_moder);
> +
> +void net_dim_free_irq_moder(struct net_device *dev)
> +{
> + struct dim_cq_moder *rx_profile, *tx_profile;
> +
> + if (!dev->irq_moder)
> + return;
> +
> + rcu_read_lock();
> + rx_profile = rcu_dereference(dev->irq_moder->rx_profile);
> + tx_profile = rcu_dereference(dev->irq_moder->tx_profile);
> + rcu_read_unlock();
rtnl_dereference() ? use of the pointers you got outside of the read
critical section looks wrong
> + rcu_assign_pointer(dev->irq_moder->tx_profile, NULL);
> + rcu_assign_pointer(dev->irq_moder->rx_profile, NULL);
> +
> + synchronize_rcu();
Better to use kfree_rcu(), synchronize_rcu() can be quite slow
> + kfree(rx_profile);
> + kfree(tx_profile);
> + kfree(dev->irq_moder);
> +}
> +EXPORT_SYMBOL(net_dim_free_irq_moder);
> +
> static int net_dim_step(struct dim *dim)
> {
> if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
> diff --git a/net/ethtool/coalesce.c b/net/ethtool/coalesce.c
> index 83112c1a71ae..7b852938cf01 100644
> --- a/net/ethtool/coalesce.c
> +++ b/net/ethtool/coalesce.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0-only
>
> +#include <linux/dim.h>
> #include "netlink.h"
> #include "common.h"
>
> @@ -82,6 +83,14 @@ static int coalesce_prepare_data(const struct ethnl_req_info *req_base,
> static int coalesce_reply_size(const struct ethnl_req_info *req_base,
> const struct ethnl_reply_data *reply_base)
> {
> + int modersz = nla_total_size(0) + /* _PROFILE_IRQ_MODERATION, nest */
> + nla_total_size(sizeof(u32)) + /* _IRQ_MODERATION_USEC */
> + nla_total_size(sizeof(u32)) + /* _IRQ_MODERATION_PKTS */
> + nla_total_size(sizeof(u32)); /* _IRQ_MODERATION_COMPS */
> +
> + int total_modersz = nla_total_size(0) + /* _{R,T}X_PROFILE, nest */
> + modersz * NET_DIM_PARAMS_NUM_PROFILES;
> +
> return nla_total_size(sizeof(u32)) + /* _RX_USECS */
> nla_total_size(sizeof(u32)) + /* _RX_MAX_FRAMES */
> nla_total_size(sizeof(u32)) + /* _RX_USECS_IRQ */
> @@ -108,7 +117,8 @@ static int coalesce_reply_size(const struct ethnl_req_info *req_base,
> nla_total_size(sizeof(u8)) + /* _USE_CQE_MODE_RX */
> nla_total_size(sizeof(u32)) + /* _TX_AGGR_MAX_BYTES */
> nla_total_size(sizeof(u32)) + /* _TX_AGGR_MAX_FRAMES */
> - nla_total_size(sizeof(u32)); /* _TX_AGGR_TIME_USECS */
> + nla_total_size(sizeof(u32)) + /* _TX_AGGR_TIME_USECS */
> + total_modersz * 2; /* _{R,T}X_PROFILE */
> }
>
> static bool coalesce_put_u32(struct sk_buff *skb, u16 attr_type, u32 val,
> @@ -127,6 +137,75 @@ static bool coalesce_put_bool(struct sk_buff *skb, u16 attr_type, u32 val,
> return nla_put_u8(skb, attr_type, !!val);
> }
>
> +#if IS_ENABLED(CONFIG_DIMLIB)
Can we decrease the use of IS_ENABLED() here, somehow?
Do we need to protect anything else than accesses to dev->irq_moder ?
Does coalesce_put_profile() need CONFIG_DIMLIB to build?
> +/**
> + * coalesce_put_profile - fill reply with a nla nest with four child nla nests.
> + * @skb: socket buffer the message is stored in
> + * @attr_type: nest attr type ETHTOOL_A_COALESCE_*X_PROFILE
> + * @profile: data passed to userspace
> + * @coal_flags: modifiable parameters supported by the driver
> + *
> + * Put a dim profile nest attribute. Refer to ETHTOOL_A_PROFILE_IRQ_MODERATION.
> + *
> + * Return: 0 on success or a negative error code.
> + */
> +static int coalesce_put_profile(struct sk_buff *skb, u16 attr_type,
> + const struct dim_cq_moder *profile,
> + u8 coal_flags)
> +{
> + struct nlattr *profile_attr, *moder_attr;
> + int i, ret;
> +
> + if (!profile || !coal_flags)
> + return 0;
> +
> + profile_attr = nla_nest_start(skb, attr_type);
> + if (!profile_attr)
> + return -EMSGSIZE;
> +
> + for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++) {
> + moder_attr = nla_nest_start(skb, ETHTOOL_A_PROFILE_IRQ_MODERATION);
> + if (!moder_attr) {
> + ret = -EMSGSIZE;
> + goto cancel_profile;
> + }
> +
> + if (coal_flags & DIM_COALESCE_USEC) {
> + ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_USEC,
> + profile[i].usec);
> + if (ret)
> + goto cancel_moder;
> + }
> +
> + if (coal_flags & DIM_COALESCE_PKTS) {
> + ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_PKTS,
> + profile[i].pkts);
> + if (ret)
> + goto cancel_moder;
> + }
> +
> + if (coal_flags & DIM_COALESCE_COMPS) {
> + ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_COMPS,
> + profile[i].comps);
> + if (ret)
> + goto cancel_moder;
> + }
> +
> + nla_nest_end(skb, moder_attr);
> + }
> +
> + nla_nest_end(skb, profile_attr);
> +
> + return 0;
> +
> +cancel_moder:
> + nla_nest_cancel(skb, moder_attr);
> +cancel_profile:
> + nla_nest_cancel(skb, profile_attr);
> + return ret;
> +}
> +#endif
> +
> static int coalesce_fill_reply(struct sk_buff *skb,
> const struct ethnl_req_info *req_base,
> const struct ethnl_reply_data *reply_base)
> @@ -134,6 +213,12 @@ static int coalesce_fill_reply(struct sk_buff *skb,
> const struct coalesce_reply_data *data = COALESCE_REPDATA(reply_base);
> const struct kernel_ethtool_coalesce *kcoal = &data->kernel_coalesce;
> const struct ethtool_coalesce *coal = &data->coalesce;
> +#if IS_ENABLED(CONFIG_DIMLIB)
> + struct net_device *dev = req_base->dev;
> + struct dim_irq_moder *irq_moder = dev->irq_moder;
> + u8 coal_flags;
> + int ret;
> +#endif
> u32 supported = data->supported_params;
>
> if (coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_USECS,
> @@ -192,11 +277,51 @@ static int coalesce_fill_reply(struct sk_buff *skb,
> kcoal->tx_aggr_time_usecs, supported))
> return -EMSGSIZE;
>
> +#if IS_ENABLED(CONFIG_DIMLIB)
> + if (!irq_moder)
> + return 0;
> +
> + coal_flags = irq_moder->coal_flags;
> + rcu_read_lock();
> + if (irq_moder->profile_flags & DIM_PROFILE_RX) {
> + ret = coalesce_put_profile(skb, ETHTOOL_A_COALESCE_RX_PROFILE,
> + rcu_dereference(irq_moder->rx_profile),
rtnl_deference can be used there, I assume updates are protected by
rtnl_lock
> + coal_flags);
> + if (ret) {
> + rcu_read_unlock();
> + return ret;
> + }
> + }
> +
> + if (irq_moder->profile_flags & DIM_PROFILE_TX) {
> + ret = coalesce_put_profile(skb, ETHTOOL_A_COALESCE_TX_PROFILE,
> + rcu_dereference(irq_moder->tx_profile),
> + coal_flags);
> + if (ret) {
> + rcu_read_unlock();
> + return ret;
> + }
> + }
> + rcu_read_unlock();
> +#endif
> return 0;
> }
>
> /* COALESCE_SET */
>
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +static const struct nla_policy coalesce_irq_moderation_policy[] = {
> + [ETHTOOL_A_IRQ_MODERATION_USEC] = {.type = NLA_U32},
> + [ETHTOOL_A_IRQ_MODERATION_PKTS] = {.type = NLA_U32},
> + [ETHTOOL_A_IRQ_MODERATION_COMPS] = {.type = NLA_U32},
nit: empty spaces around brackets, please ...
> +};
> +
> +static const struct nla_policy coalesce_profile_irq_policy[] = {
> + [ETHTOOL_A_PROFILE_IRQ_MODERATION] =
> + NLA_POLICY_NESTED(coalesce_irq_moderation_policy),
> +};
> +#endif
> +
> const struct nla_policy ethnl_coalesce_set_policy[] = {
> [ETHTOOL_A_COALESCE_HEADER] =
> NLA_POLICY_NESTED(ethnl_header_policy),
> @@ -227,6 +352,12 @@ const struct nla_policy ethnl_coalesce_set_policy[] = {
> [ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES] = { .type = NLA_U32 },
> [ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES] = { .type = NLA_U32 },
> [ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS] = { .type = NLA_U32 },
... like here ^
> +#if IS_ENABLED(CONFIG_DIMLIB)
> + [ETHTOOL_A_COALESCE_RX_PROFILE] =
> + NLA_POLICY_NESTED(coalesce_profile_irq_policy),
> + [ETHTOOL_A_COALESCE_TX_PROFILE] =
> + NLA_POLICY_NESTED(coalesce_profile_irq_policy),
> +#endif
> };
>
> static int
> @@ -234,6 +365,9 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
> struct genl_info *info)
> {
> const struct ethtool_ops *ops = req_info->dev->ethtool_ops;
> +#if IS_ENABLED(CONFIG_DIMLIB)
> + struct net_device *dev = req_info->dev;
> +#endif
> struct nlattr **tb = info->attrs;
> u32 supported_params;
> u16 a;
> @@ -243,6 +377,15 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
>
> /* make sure that only supported parameters are present */
> supported_params = ops->supported_coalesce_params;
> +#if IS_ENABLED(CONFIG_DIMLIB)
> + if (dev->irq_moder) {
This may be NULL
> + if (dev->irq_moder->profile_flags & DIM_PROFILE_RX)
> + supported_params |= ETHTOOL_COALESCE_RX_PROFILE;
> +
> + if (dev->irq_moder->profile_flags & DIM_PROFILE_TX)
> + supported_params |= ETHTOOL_COALESCE_TX_PROFILE;
> + }
> +#endif
> for (a = ETHTOOL_A_COALESCE_RX_USECS; a < __ETHTOOL_A_COALESCE_CNT; a++)
> if (tb[a] && !(supported_params & attr_to_mask(a))) {
> NL_SET_ERR_MSG_ATTR(info->extack, tb[a],
> @@ -253,12 +396,104 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
> return 1;
> }
>
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +/**
> + * ethnl_update_profile - get a profile nla nest with child nla nests from userspace.
> + * @dev: netdevice to update the profile
> + * @dst: profile get from the driver and modified by ethnl_update_profile.
> + * @nests: nest attr ETHTOOL_A_COALESCE_*X_PROFILE to set profile.
> + * @extack: Netlink extended ack
> + *
> + * Layout of nests:
> + * Nested ETHTOOL_A_COALESCE_*X_PROFILE attr
> + * Nested ETHTOOL_A_PROFILE_IRQ_MODERATION attr
> + * ETHTOOL_A_IRQ_MODERATION_USEC attr
> + * ETHTOOL_A_IRQ_MODERATION_PKTS attr
> + * ETHTOOL_A_IRQ_MODERATION_COMPS attr
> + * ...
> + * Nested ETHTOOL_A_PROFILE_IRQ_MODERATION attr
> + * ETHTOOL_A_IRQ_MODERATION_USEC attr
> + * ETHTOOL_A_IRQ_MODERATION_PKTS attr
> + * ETHTOOL_A_IRQ_MODERATION_COMPS attr
> + *
> + * Return: 0 on success or a negative error code.
> + */
> +static int ethnl_update_profile(struct net_device *dev,
> + struct dim_cq_moder __rcu **dst,
> + const struct nlattr *nests,
> + struct netlink_ext_ack *extack)
> +{
> + struct nlattr *moder[ARRAY_SIZE(coalesce_irq_moderation_policy)];
> + struct dim_irq_moder *irq_moder = dev->irq_moder;
> + struct dim_cq_moder *new_profile, *old_profile;
> + int ret, rem, i = 0, len;
> + struct nlattr *nest;
> +
> + if (!nests)
> + return 0;
> +
> + if (!*dst)
> + return -EINVAL;
> +
> + old_profile = rtnl_dereference(*dst);
> + len = NET_DIM_PARAMS_NUM_PROFILES * sizeof(*old_profile);
> + new_profile = kmemdup(old_profile, len, GFP_KERNEL);
> + if (!new_profile)
> + return -ENOMEM;
> +
> + nla_for_each_nested_type(nest, ETHTOOL_A_PROFILE_IRQ_MODERATION, nests, rem) {
> + ret = nla_parse_nested(moder,
> + ARRAY_SIZE(coalesce_irq_moderation_policy) - 1,
> + nest, coalesce_irq_moderation_policy,
> + extack);
> + if (ret)
> + return ret;
> +
> + if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_USEC)) {
> + if (irq_moder->coal_flags & DIM_COALESCE_USEC)
There are 3 options here, not 2:
if (irq_moder->coal_flags & flag) {
if (NL_REQ_ATTR_CHECK())
val = nla_get_u32(...);
else
return -EINVAL;
} else {
if (moder[attr_type)) {
BAD_ATTR()
return -EOPNOTSUPP;
}
}
you probably want to factor this out to a helper..
> + new_profile[i].usec =
> + nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_USEC]);
> + else
> + return -EOPNOTSUPP;
> + }
> +
> + if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_PKTS)) {
> + if (irq_moder->coal_flags & DIM_COALESCE_PKTS)
> + new_profile[i].pkts =
> + nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_PKTS]);
> + else
> + return -EOPNOTSUPP;
> + }
> +
> + if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_COMPS)) {
> + if (irq_moder->coal_flags & DIM_COALESCE_COMPS)
> + new_profile[i].comps =
> + nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_COMPS]);
> + else
> + return -EOPNOTSUPP;
> + }
> +
> + i++;
> + }
> +
> + rcu_assign_pointer(*dst, new_profile);
> +
> + synchronize_rcu();
> + kfree(old_profile);
> +
> + return 0;
> +}
> +#endif
> +
> static int
> __ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info,
> bool *dual_change)
> {
> struct kernel_ethtool_coalesce kernel_coalesce = {};
> struct net_device *dev = req_info->dev;
> +#if IS_ENABLED(CONFIG_DIMLIB)
> + struct dim_irq_moder *irq_moder = dev->irq_moder;
> +#endif
> struct ethtool_coalesce coalesce = {};
> bool mod_mode = false, mod = false;
> struct nlattr **tb = info->attrs;
> @@ -317,6 +552,33 @@ __ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info,
> ethnl_update_u32(&kernel_coalesce.tx_aggr_time_usecs,
> tb[ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS], &mod);
>
> +#if IS_ENABLED(CONFIG_DIMLIB)
> + if (!irq_moder)
> + goto skip_irq_moder;
> +
> + if (irq_moder->profile_flags & DIM_PROFILE_RX) {
> + ret = ethnl_update_profile(dev, &irq_moder->rx_profile,
> + tb[ETHTOOL_A_COALESCE_RX_PROFILE],
> + info->extack);
> + if (ret < 0)
> + return ret;
> + }
> +
> + if (irq_moder->profile_flags & DIM_PROFILE_TX) {
> + ret = ethnl_update_profile(dev, &irq_moder->tx_profile,
> + tb[ETHTOOL_A_COALESCE_TX_PROFILE],
> + info->extack);
> + if (ret < 0)
> + return ret;
> + }
> +
> +skip_irq_moder:
> +#else
> + if (tb[ETHTOOL_A_COALESCE_RX_PROFILE] ||
> + tb[ETHTOOL_A_COALESCE_TX_PROFILE])
> + return -EOPNOTSUPP;
ethnl_set_coalesce_validate() should already reject this, no?
> +#endif
> /* Update operation modes */
> ethnl_update_bool32(&coalesce.use_adaptive_rx_coalesce,
> tb[ETHTOOL_A_COALESCE_USE_ADAPTIVE_RX], &mod_mode);
next prev parent reply other threads:[~2024-04-27 1:33 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-25 16:59 [PATCH net-next v10 0/4] ethtool: provide the dim profile fine-tuning channel Heng Qi
2024-04-25 16:59 ` [PATCH net-next v10 1/4] linux/dim: move useful macros to .h file Heng Qi
2024-04-25 16:59 ` [PATCH net-next v10 2/4] ethtool: provide customized dim profile management Heng Qi
2024-04-27 1:33 ` Jakub Kicinski [this message]
2024-04-28 14:49 ` Heng Qi
2024-04-29 17:47 ` Jakub Kicinski
2024-04-30 1:59 ` Heng Qi
2024-04-30 3:13 ` Jakub Kicinski
2024-04-30 4:43 ` Heng Qi
2024-04-25 16:59 ` [PATCH net-next v10 3/4] dim: add new interfaces for initialization and getting results Heng Qi
2024-04-25 16:59 ` [PATCH net-next v10 4/4] virtio-net: support dim profile fine-tuning Heng Qi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240426183333.257ccae5@kernel.org \
--to=kuba@kernel.org \
--cc=ahmed.zaki@intel.com \
--cc=aleksander.lobakin@intel.com \
--cc=andrew@lunn.ch \
--cc=bcreeley@amd.com \
--cc=corbet@lwn.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hengqi@linux.alibaba.com \
--cc=jasowang@redhat.com \
--cc=jiri@resnulli.us \
--cc=justinstitt@google.com \
--cc=kory.maincent@bootlin.com \
--cc=linux-doc@vger.kernel.org \
--cc=maxime.chevallier@bootlin.com \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=paul.greenwalt@intel.com \
--cc=rkannoth@marvell.com \
--cc=talgi@nvidia.com \
--cc=virtualization@lists.linux.dev \
--cc=vladimir.oltean@nxp.com \
--cc=xuanzhuo@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.