From: Joe Damato <jdamato@fastly.com>
To: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
kuba@kernel.org, davem@davemloft.net
Subject: Re: [Intel-wired-lan] [next-queue 2/3] i40e: i40e_clean_tx_irq returns work done
Date: Wed, 5 Oct 2022 11:47:28 -0700 [thread overview]
Message-ID: <20221005184728.GB15277@fastly.com> (raw)
In-Reply-To: <e352426f-7a43-6353-5c1d-aa3480f64860@intel.com>
On Wed, Oct 05, 2022 at 11:33:23AM -0700, Jesse Brandeburg wrote:
> On 10/5/2022 10:50 AM, Joe Damato wrote:
> >On Wed, Oct 05, 2022 at 12:46:31PM +0200, Maciej Fijalkowski wrote:
> >>On Wed, Oct 05, 2022 at 01:31:42AM -0700, Joe Damato wrote:
> >>>Adjust i40e_clean_tx_irq to return the actual number of packets cleaned
> >>>and adjust the logic in i40e_napi_poll to check this value.
>
> it's fine to return the number cleaned, but let's keep that data and those
> changes to themselves instead of changing the flow of the routine.
>
>
> >>>
> >>>Signed-off-by: Joe Damato <jdamato@fastly.com>
> >>>---
> >>> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 +++++++++++++-----------
> >>> drivers/net/ethernet/intel/i40e/i40e_xsk.c | 12 ++++++------
> >>> drivers/net/ethernet/intel/i40e/i40e_xsk.h | 2 +-
> >>> 3 files changed, 20 insertions(+), 18 deletions(-)
> >>>
> >>>diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> >>>index b97c95f..ed88309 100644
> >>>--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> >>>+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> >>>@@ -924,10 +924,10 @@ void i40e_detect_recover_hung(struct i40e_vsi *vsi)
> >>> * @tx_ring: Tx ring to clean
> >>> * @napi_budget: Used to determine if we are in netpoll
> >>> *
> >>>- * Returns true if there's any budget left (e.g. the clean is finished)
> >>>+ * Returns the number of packets cleaned
> >>> **/
> >>>-static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> >>>- struct i40e_ring *tx_ring, int napi_budget)
> >>>+static int i40e_clean_tx_irq(struct i40e_vsi *vsi,
> >>>+ struct i40e_ring *tx_ring, int napi_budget)
> >>> {
> >>> int i = tx_ring->next_to_clean;
> >>> struct i40e_tx_buffer *tx_buf;
> >>>@@ -1026,7 +1026,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> >>> i40e_arm_wb(tx_ring, vsi, budget);
> >>> if (ring_is_xdp(tx_ring))
> >>>- return !!budget;
> >>>+ return total_packets;
> >>> /* notify netdev of completed buffers */
> >>> netdev_tx_completed_queue(txring_txq(tx_ring),
> >>>@@ -1048,7 +1048,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> >>> }
> >>> }
> >>>- return !!budget;
> >>>+ return total_packets;
> >>> }
> >>> /**
> >>>@@ -2689,10 +2689,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> >>> container_of(napi, struct i40e_q_vector, napi);
> >>> struct i40e_vsi *vsi = q_vector->vsi;
> >>> struct i40e_ring *ring;
> >>>+ bool tx_clean_complete = true;
> >>> bool clean_complete = true;
> >>> bool arm_wb = false;
> >>> int budget_per_ring;
> >>> int work_done = 0;
> >>>+ int tx_wd = 0;
> >>> if (test_bit(__I40E_VSI_DOWN, vsi->state)) {
> >>> napi_complete(napi);
> >>>@@ -2703,12 +2705,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> >>> * budget and be more aggressive about cleaning up the Tx descriptors.
> >>> */
> >>> i40e_for_each_ring(ring, q_vector->tx) {
> >>>- bool wd = ring->xsk_pool ?
> >>>- i40e_clean_xdp_tx_irq(vsi, ring) :
> >>>- i40e_clean_tx_irq(vsi, ring, budget);
> >>>+ tx_wd = ring->xsk_pool ?
> >>>+ i40e_clean_xdp_tx_irq(vsi, ring) :
> >>>+ i40e_clean_tx_irq(vsi, ring, budget);
> >>>- if (!wd) {
> >>>- clean_complete = false;
> >>>+ if (tx_wd >= budget) {
> >>>+ tx_clean_complete = false;
> >>
> >>This will break for AF_XDP Tx ZC. AF_XDP Tx ZC in intel drivers ignores
> >>the budget given by NAPI. If you look at i40e_xmit_zc():
> >>
> >>func def:
> >>static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> >>
> >>callsite:
> >> return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring));
> >>
> >>we give free ring space as a budget, and with your change we would be
> >>returning the number of processed Tx descriptors, which you will be
> >>comparing against the NAPI budget (64, unless you have busy poll enabled
> >>with a different batch size). Say you start with an empty ring and your
> >>HW rings are sized to 1k, but there were only 512 AF_XDP descriptors
> >>ready for Tx. You produced all of them successfully to the ring, and you
> >>return 512 up to i40e_napi_poll.
> >
> >Good point, my bad.
> >
> >I've reworked this for the v2 and have given i40e_clean_tx_irq
> >and i40e_clean_xdp_tx_irq an out parameter which will record the number
> >of Tx packets cleaned.
> >
> >I tweaked i40e_xmit_zc to return the number of packets (nb_pkts) and moved
> >the boolean to check if that's under the "budget"
> >(I40E_DESC_UNUSED(tx_ring)) into i40e_clean_xdp_tx_irq.
> >
> >I think that might solve the issues you've described.
>
> Please don't change the flow of this function; transmit cleanups are so
> cheap that we don't bother counting them or limiting them beyond a maximum
> (so they don't clean forever).
>
> Basically, transmits should not be counted when exiting NAPI, beyond the
> fact that we did "at least one". The only thing that matters to the budget
> is whether we "finished" transmit cleanup or not, which would make sure we
> rescheduled NAPI if we weren't finished cleaning transmits (for instance,
> on an 8160-entry Tx ring).
>
> I'd much rather you kept this series to a simple return count of tx cleaned
> in "out" as you've said you'd do in v2, and then use that data *only* in the
> context of the new trace event.
>
> That way you're not changing the flow and introducing tough to debug issues
> in the hot path.
In the v2 I've been hacking on, I've added out params to i40e_clean_tx_irq
and i40e_clean_xdp_tx_irq, but I avoided adding an out param in
i40e_xmit_zc, since lifting the boolean out seemed pretty straightforward.
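To make that concrete, here's a rough userspace sketch of the v2 hack.
mock_ring and its fields are stand-ins I made up for illustration, not the
real driver structures, and the logic is simplified:

```c
/* Sketch of the v2 approach: i40e_xmit_zc() returns the number of
 * packets it produced, and the "did we finish within the free-ring-space
 * budget?" comparison moves up into i40e_clean_xdp_tx_irq(), which hands
 * the count out through a parameter.  All names here are illustrative. */
#include <assert.h>
#include <stdbool.h>

struct mock_ring {
	int free_descs;   /* stand-in for I40E_DESC_UNUSED(tx_ring) */
	int pending_pkts; /* AF_XDP descriptors ready to transmit */
};

/* Returns the number of packets actually produced to the ring. */
static int i40e_xmit_zc(struct mock_ring *ring, int budget)
{
	int sent = ring->pending_pkts < budget ? ring->pending_pkts : budget;

	ring->pending_pkts -= sent;
	ring->free_descs -= sent;
	return sent;
}

/* Completion status stays a bool; the packet count is reported through
 * *cleaned, so the free-ring-space budget is never compared against the
 * NAPI budget in i40e_napi_poll. */
static bool i40e_clean_xdp_tx_irq(struct mock_ring *ring, int *cleaned)
{
	int budget = ring->free_descs;
	int nb_pkts = i40e_xmit_zc(ring, budget);

	*cleaned = nb_pkts;
	return nb_pkts < budget; /* true => clean finished */
}
```

With a 1k ring and 512 ready descriptors, the completion bool stays correct
and the 512 count is only recorded, never compared against the NAPI budget
of 64, which avoids the breakage Maciej pointed out.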
I'll drop that, though, in favor of an out param in i40e_xmit_zc as well,
to avoid changing the flow of the code.
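i.e., something along these lines (again a made-up userspace sketch with
stand-in types; the real i40e signatures and types differ):

```c
/* Sketch of threading the count out through a parameter everywhere while
 * keeping every return type and every comparison exactly as it is today.
 * mock_ring and the field names are illustrative stand-ins. */
#include <assert.h>
#include <stdbool.h>

struct mock_ring {
	int free_descs;   /* stand-in for I40E_DESC_UNUSED(tx_ring) */
	int pending_pkts; /* AF_XDP descriptors ready to transmit */
};

/* Flow unchanged: still returns the "finished within budget" bool.
 * The only addition is the *xmit_done out parameter. */
static bool i40e_xmit_zc(struct mock_ring *ring, int budget, int *xmit_done)
{
	int sent = ring->pending_pkts < budget ? ring->pending_pkts : budget;

	ring->pending_pkts -= sent;
	ring->free_descs -= sent;
	*xmit_done = sent;
	return sent < budget;
}

static bool i40e_clean_xdp_tx_irq(struct mock_ring *ring, int *cleaned)
{
	return i40e_xmit_zc(ring, ring->free_descs, cleaned);
}
```

The count then flows up to the tracepoint as pure data, and none of the
existing bool plumbing in i40e_napi_poll has to change.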
Thanks for taking a look.