Re: [PATCH net] i40e: Enforce software interrupt during busy-poll exit

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jesse Brandeburg <jesse.brandeburg@intel.com>
To: Ivan Vecera <ivecera@redhat.com>, <netdev@vger.kernel.org>
Cc: <pawel.chmielewski@intel.com>, <aleksandr.loktionov@intel.com>,
	<mschmidt@redhat.com>, Hugo Ferreira <hferreir@redhat.com>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	"moderated list:INTEL ETHERNET DRIVERS"
	<intel-wired-lan@lists.osuosl.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH net] i40e: Enforce software interrupt during busy-poll exit
Date: Thu, 14 Mar 2024 17:53:26 -0700	[thread overview]
Message-ID: <0249d506-6ab2-485a-b95f-6e32e5a92d9e@intel.com> (raw)
In-Reply-To: <20240313125457.19475-1-ivecera@redhat.com>

On 3/13/2024 5:54 AM, Ivan Vecera wrote:
> As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts
> during poll exit") I'm seeing the similar issue also with i40e driver.
> 
> In certain situation when busy-loop is enabled together with adaptive
> coalescing, the driver occasionally miss that there are outstanding
> descriptors to clean when exiting busy poll.
> 
> Try to catch the remaining work by triggering a software interrupt
> when exiting busy poll. No extra interrupts will be generated when
> busy polling is not used.
> 
> The issue was found when running sockperf ping-pong tcp test with
> adaptive coalescing and busy poll enabled (50 as value busy_pool
> and busy_read sysctl knobs) and results in huge latency spikes
> with more than 100000us.

I like the results of this fix! Thanks for working on it.

> 
> The fix is inspired from the ice driver and do the following:
> 1) During napi poll exit in case of busy-poll (napo_complete_done()
>    returns false) this is recorded to q_vector that we were in busy
>    loop.
> 2) In i40e_update_enable_itr()
>    - updates refreshed ITR intervals directly using PFINT_ITRN register
>    - if we are exiting ordinary poll then just enables the interrupt
>      using PFINT_DYN_CTLN
>    - if we are exiting busy poll then enables the interrupt and
>      additionally triggers an immediate software interrupt to catch any
>      pending clean-ups
> 3) Reuses unused 3rd ITR (interrupt throttle) index and set it to
>    20K interrupts per second to limit the number of these sw interrupts.

This is a good idea.

> 
> @@ -2702,8 +2716,8 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
>  	 */
>  	if (q_vector->rx.target_itr < q_vector->rx.current_itr) {
>  		/* Rx ITR needs to be reduced, this is highest priority */
> -		intval = i40e_buildreg_itr(I40E_RX_ITR,
> -					   q_vector->rx.target_itr);
> +		wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx),
> +		     q_vector->rx.target_itr >> 1);

so here you write (this is a new write)

>  		q_vector->rx.current_itr = q_vector->rx.target_itr;
>  		q_vector->itr_countdown = ITR_COUNTDOWN_START;
>  	} else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) ||
> @@ -2712,25 +2726,33 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
>  		/* Tx ITR needs to be reduced, this is second priority
>  		 * Tx ITR needs to be increased more than Rx, fourth priority
>  		 */
> -		intval = i40e_buildreg_itr(I40E_TX_ITR,
> -					   q_vector->tx.target_itr);
> +		wr32(hw, I40E_PFINT_ITRN(I40E_TX_ITR, q_vector->reg_idx),
> +		     q_vector->tx.target_itr >> 1);
>  		q_vector->tx.current_itr = q_vector->tx.target_itr;
>  		q_vector->itr_countdown = ITR_COUNTDOWN_START;
>  	} else if (q_vector->rx.current_itr != q_vector->rx.target_itr) {
>  		/* Rx ITR needs to be increased, third priority */
> -		intval = i40e_buildreg_itr(I40E_RX_ITR,
> -					   q_vector->rx.target_itr);
> +		wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx),
> +		     q_vector->rx.target_itr >> 1);

or here (new write)

>  		q_vector->rx.current_itr = q_vector->rx.target_itr;
>  		q_vector->itr_countdown = ITR_COUNTDOWN_START;
>  	} else {
>  		/* No ITR update, lowest priority */
> -		intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
>  		if (q_vector->itr_countdown)
>  			q_vector->itr_countdown--;
>  	}
>  
> -	if (!test_bit(__I40E_VSI_DOWN, vsi->state))
> -		wr32(hw, INTREG(q_vector->reg_idx), intval);

The above used to be the *only* write.

> +	/* Do not enable interrupt if VSI is down */
> +	if (test_bit(__I40E_VSI_DOWN, vsi->state))
> +		return;
> +
> +	if (!q_vector->in_busy_poll) {
> +		intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
> +	} else {
> +		q_vector->in_busy_poll = false;
> +		intval = i40e_buildreg_swint(I40E_SW_ITR);
> +	}
> +	wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->reg_idx), intval);

and then you write again here.

So this function will now regularly have two writes in hot-path. Before
it was very carefully crafted to reduce the number of writes to 1.

This is made possible because the PFINT_DYN_CTLN register can do
multiple tasks at once with a single write.

Can you just modify intval to *both* trigger a software interrupt, and
update the ITR simultaneously? I'm really not sure that's even possible.

It may make more sense to only do the second write when exiting busy
poll, what do you think?

If there is no way to get the functionality without the two writes
everytime I'd put up with it, but we can have a lot of interrupts per
second, per queue (especially with adaptive disabled and rate set to a
small number like 2us or 0) and doubling the number of writes will have
a performance effect.

next prev parent reply	other threads:[~2024-03-15  0:53 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-13 12:54 [PATCH net] i40e: Enforce software interrupt during busy-poll exit Ivan Vecera
2024-03-13 13:47 ` Michal Schmidt
2024-03-13 15:15   ` Ivan Vecera
2024-03-15  0:53 ` Jesse Brandeburg [this message]
2024-03-15  9:19   ` Ivan Vecera

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0249d506-6ab2-485a-b95f-6e32e5a92d9e@intel.com \
    --to=jesse.brandeburg@intel.com \
    --cc=aleksandr.loktionov@intel.com \
    --cc=anthony.l.nguyen@intel.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hferreir@redhat.com \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=ivecera@redhat.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mschmidt@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pawel.chmielewski@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).