Netdev List
 help / color / mirror / Atom feed
From: mohammad heib <mheib@redhat.com>
To: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, jiri@resnulli.us, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	horms@kernel.org, corbet@lwn.net, anthony.l.nguyen@intel.com,
	przemyslaw.kitszel@intel.com, andrew+netdev@lunn.ch
Subject: Re: [PATCH net-next] i40e: add devlink parameter for Flow Director ATR sample rate
Date: Tue, 16 Jun 2026 14:03:08 +0300	[thread overview]
Message-ID: <283859f6-8a60-49be-8439-146f8c2a6bb5@redhat.com> (raw)
In-Reply-To: <20260614161131.192068-1-mheib@redhat.com>



On 6/14/26 7:11 PM, mheib@redhat.com wrote:
> From: Mohammad Heib <mheib@redhat.com>
> 
> The i40e driver uses Flow Director ATR to periodically update flow
> steering information for active TCP flows. The update frequency is
> currently controlled by I40E_DEFAULT_ATR_SAMPLE_RATE and is fixed at
> driver build time.
> 
> On systems with a large number of queues and high-rate TCP workloads,
> the default sampling interval can result in frequent Flow Director
> reprogramming for long-lived flows.
> 
> The amount of TCP packet reordering observed on some systems is
> sensitive to the ATR sampling interval. Increasing the interval reduces
> Flow Director programming activity and can significantly reduce the
> associated reordering.
> 
> Since the optimal sampling interval depends on the workload and system
> configuration, a single fixed value is not suitable for all deployments.
> 
> Add a devlink parameter to allow administrators to tune the ATR sample
> rate at runtime without rebuilding the driver or disabling ATR
> functionality entirely.
> 
> Signed-off-by: Mohammad Heib <mheib@redhat.com>
> ---
>   Documentation/networking/devlink/i40e.rst     | 19 ++++++
>   drivers/net/ethernet/intel/i40e/i40e.h        |  1 +
>   .../net/ethernet/intel/i40e/i40e_devlink.c    | 65 +++++++++++++++++++
>   drivers/net/ethernet/intel/i40e/i40e_main.c   |  4 +-
>   drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  4 +-
>   5 files changed, 90 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/networking/devlink/i40e.rst b/Documentation/networking/devlink/i40e.rst
> index 51c887f0dc83..704469aa9acf 100644
> --- a/Documentation/networking/devlink/i40e.rst
> +++ b/Documentation/networking/devlink/i40e.rst
> @@ -40,6 +40,25 @@ Parameters
>   
>           The default value is ``0`` (internal calculation is used).
>   
> +.. list-table:: Driver specific parameters implemented
> +    :widths: 5 5 90
> +
> +    * - Name
> +      - Mode
> +      - Description
> +    * - ``atr_sample_rate``
> +      - runtime
> +      - Controls how frequently Flow Director ATR updates flow steering
> +        information for active TCP flows.
> +
> +        ATR programs Flow Director entries based on sampled transmitted
> +        packets. The sampling interval is specified as the number of
> +        transmitted packets between ATR updates.
> +
> +        Lower values increase Flow Director programming activity, while
> +        higher values reduce the update frequency.
> +
> +        The default value is ``20``.
>   
>   Info versions
>   =============
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
> index 1b6a8fbaa648..88eb40ee45f0 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -487,6 +487,7 @@ struct i40e_pf {
>   	u16 rss_size_max;          /* HW defined max RSS queues */
>   	u16 fdir_pf_filter_count;  /* num of guaranteed filters for this PF */
>   	u16 num_alloc_vsi;         /* num VSIs this driver supports */
> +	u32 atr_sample_rate;
>   	bool wol_en;
>   
>   	struct hlist_head fdir_filter_list;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_devlink.c b/drivers/net/ethernet/intel/i40e/i40e_devlink.c
> index 229179ccc131..16e51762db45 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_devlink.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_devlink.c
> @@ -33,12 +33,77 @@ static int i40e_max_mac_per_vf_get(struct devlink *devlink,
>   	return 0;
>   }
>   
> +static int i40e_atr_sample_rate_set(struct devlink *devlink,
> +				    u32 id,
> +				    struct devlink_param_gset_ctx *ctx,
> +				    struct netlink_ext_ack *extack)
> +{
> +	struct i40e_pf *pf = devlink_priv(devlink);
> +	struct i40e_vsi *vsi;
> +	u32 sample_rate = ctx->val.vu32;
> +	int i;
> +
> +	pf->atr_sample_rate = sample_rate;
> +
> +	if (!test_bit(I40E_FLAG_FD_ATR_ENA, pf->flags))
> +		return 0;
> +
> +	vsi = i40e_pf_get_main_vsi(pf);
> +	if (!vsi)
> +		return 0;
> +
> +	for (i = 0; i < vsi->num_queue_pairs; i++) {
> +		if (!vsi->tx_rings[i])
> +			continue;
> +		vsi->tx_rings[i]->atr_sample_rate = sample_rate;
> +		vsi->tx_rings[i]->atr_count = 0;
> +	}
> +
> +	return 0;
> +}
> +
> +static int i40e_atr_sample_rate_get(struct devlink *devlink,
> +				    u32 id,
> +				    struct devlink_param_gset_ctx *ctx,
> +				    struct netlink_ext_ack *extack)
> +{
> +	struct i40e_pf *pf = devlink_priv(devlink);
> +
> +	ctx->val.vu32 = pf->atr_sample_rate;
> +
> +	return 0;
> +}
> +
> +static int i40e_atr_sample_rate_validate(struct devlink *devlink, u32 id,
> +					 union devlink_param_value val,
> +					 struct netlink_ext_ack *extack)
> +{
> +	if (!val.vu32) {
> +		NL_SET_ERR_MSG_MOD(extack,
> +				   "ATR sample rate must be greater than 0");
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +enum i40e_dl_param_id {
> +	I40E_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
> +	I40E_DEVLINK_PARAM_ID_ATR_SAMPLE_RATE,
> +};
> +
>   static const struct devlink_param i40e_dl_params[] = {
>   	DEVLINK_PARAM_GENERIC(MAX_MAC_PER_VF,
>   			      BIT(DEVLINK_PARAM_CMODE_RUNTIME),
>   			      i40e_max_mac_per_vf_get,
>   			      i40e_max_mac_per_vf_set,
>   			      NULL),
> +	DEVLINK_PARAM_DRIVER(I40E_DEVLINK_PARAM_ID_ATR_SAMPLE_RATE,
> +			     "atr_sample_rate",
> +			     DEVLINK_PARAM_TYPE_U32,
> +			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
> +			     i40e_atr_sample_rate_get,
> +			     i40e_atr_sample_rate_set,
> +			     i40e_atr_sample_rate_validate),
>   };
>   
>   static void i40e_info_get_dsn(struct i40e_pf *pf, char *buf, size_t len)
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index d59750c490f4..9c8144970a34 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -3458,7 +3458,7 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
>   
>   	/* some ATR related tx ring init */
>   	if (test_bit(I40E_FLAG_FD_ATR_ENA, vsi->back->flags)) {
> -		ring->atr_sample_rate = I40E_DEFAULT_ATR_SAMPLE_RATE;
> +		ring->atr_sample_rate = vsi->back->atr_sample_rate;
>   		ring->atr_count = 0;
>   	} else {
>   		ring->atr_sample_rate = 0;
> @@ -12745,6 +12745,8 @@ static int i40e_sw_init(struct i40e_pf *pf)
>   		}
>   	}
>   
> +	pf->atr_sample_rate = I40E_DEFAULT_ATR_SAMPLE_RATE;
> +
>   	if ((pf->hw.func_caps.fd_filters_guaranteed > 0) ||
>   	    (pf->hw.func_caps.fd_filters_best_effort > 0)) {
>   		set_bit(I40E_FLAG_FD_ATR_ENA, pf->flags);
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> index bb741ff3e5f2..7e29e9244c3a 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> @@ -372,8 +372,8 @@ struct i40e_ring {
>   	u16 next_to_clean;
>   	u16 xdp_tx_active;
>   
> -	u8 atr_sample_rate;
> -	u8 atr_count;
> +	u32 atr_sample_rate;
> +	u32 atr_count;
>   
>   	bool ring_active;		/* is ring online or not */
>   	bool arm_wb;		/* do something to arm write back */

Hi Aleksandr,

Your concern is indeed valid. I'm not 100% sure whether devlink 
callbacks are still protected by rtnl_lock after the large locking 
changes that recently went into net/core.

That said, I'm wondering whether we need to store the ATR sample rate 
per ring at all. As far as I can tell, there is no option to configure 
the sample rate independently for individual rings, so maintaining a 
copy in every ring may not be necessary.

Would it make sense to remove the per-ring copy entirely and keep the 
sample rate only at the PF level? That would avoid the need to walk the 
rings from the devlink callback and would eliminate the race you pointed 
out.

Thanks Piotr for the review. I'll address your comment in v2.


      parent reply	other threads:[~2026-06-16 11:03 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-14 16:11 [PATCH net-next] i40e: add devlink parameter for Flow Director ATR sample rate mheib
2026-06-14 22:16 ` [Intel-wired-lan] " kernel test robot
2026-06-15  0:42 ` kernel test robot
2026-06-15 12:44 ` Loktionov, Aleksandr
2026-06-16  8:53 ` Kwapulinski, Piotr
2026-06-16 11:03 ` mohammad heib [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=283859f6-8a60-49be-8439-146f8c2a6bb5@redhat.com \
    --to=mheib@redhat.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=anthony.l.nguyen@intel.com \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=przemyslaw.kitszel@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox