Re: [PATCH net-next v3 03/12] iavf: optimize Rx buffer allocation a bunch

From: Alexander H Duyck <alexander.duyck@gmail.com>
To: Alexander Lobakin <aleksander.lobakin@intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	Magnus Karlsson <magnus.karlsson@intel.com>,
	Michal Kubiak <michal.kubiak@intel.com>,
	Larysa Zaremba <larysa.zaremba@intel.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Christoph Hellwig <hch@lst.de>,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	netdev@vger.kernel.org,  intel-wired-lan@lists.osuosl.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v3 03/12] iavf: optimize Rx buffer allocation a bunch
Date: Wed, 31 May 2023 08:37:56 -0700
Message-ID: <9523677f696a6376c79d32cbec7d6e7ceb1b0500.camel@gmail.com>
In-Reply-To: <20230530150035.1943669-4-aleksander.lobakin@intel.com>

On Tue, 2023-05-30 at 17:00 +0200, Alexander Lobakin wrote:
> The Rx hotpath code of IAVF is not well-optimized TBH. Before doing any
> further buffer model changes, shake it up a bit. Notably:
> 
> 1. Cache more variables on the stack.
>    DMA device, Rx page size, NTC -- these are the most common things
>    used all throughout the hotpath, often in loops on each iteration.
>    Instead of fetching (or even calculating, as with the page size) them
>    from the ring all the time, cache them on the stack at the beginning
>    of the NAPI polling callback. NTC will be written back at the end,
>    the rest are used read-only, so no sync needed.
> 2. Don't move the recycled buffers around the ring.
>    The idea of passing the page of the right-now-recycled-buffer to a
>    different buffer, in this case, the first one that needs to be
>    allocated, moreover, on each new frame, is fundamentally wrong. It
>    involves a few fetches, branches and then writes (and one Rx
>    buffer struct is at least 32 bytes) where they're completely unneeded,
>    and brings no benefit -- the result is the same as if we'd recycle it
>    in place, at the same position where it was used. So drop this and let
>    the main refilling function take care of all the buffers, which were
>    processed and now need to be recycled/refilled.
> 3. Don't allocate with %GFP_ATOMIC on ifup.
>    This involved introducing the @gfp parameter to a couple functions.
>    Doesn't change anything for Rx -> softirq.
> 4. 1 budget unit == 1 descriptor, not skb.
>    There could be underflow when receiving a lot of fragmented frames.
>    If each of them consisted of 2 frags, it means that we'd process
>    64 descriptors at the point where we pass the 32nd skb to the stack.
>    But the driver would count that only as a half, which could make NAPI
>    re-enable interrupts prematurely and create unnecessary CPU load.
> 5. Shortcut !size case.
>    It's super rare, but possible -- for example, if the last buffer of
>    the fragmented frame contained only FCS, which was then stripped by
>    the HW. Instead of checking for size several times when processing,
>    quickly reuse the buffer and jump to the skb fields part.
> 6. Refill the ring after finishing the polling loop.
>    Previously, the loop wasn't starting a new iteration after the 64th
>    desc, meaning that we were always leaving 16 buffers non-refilled
>    until the next NAPI poll. It's better to refill them while they're
>    still hot, so do that right after exiting the loop as well.
>    For a full cycle of 64 descs, there will be 4 refills of 16 descs
>    from now on.
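
To make the arithmetic in items 4 and 6 concrete, here is a small
standalone C model of the polling loop. It is only a sketch, not code
from the patch: poll_budget_sim(), poll_refill_sim(), both structs and
REFILL_BATCH are hypothetical names, and the refill threshold of 16 is
assumed from the "16 buffers" figure quoted above.

```c
#include <stdbool.h>

/* Toy model, not driver code: every name below is hypothetical. */
#define REFILL_BATCH 16U	/* assumed refill threshold ("16 buffers") */

struct poll_stats {
	unsigned int descs;	/* descriptors consumed */
	unsigned int skbs;	/* complete frames passed to the stack */
};

/*
 * Consume descriptors until the budget is spent. Every frame spans
 * frags_per_frame descriptors (must be >= 1). If count_descs is true,
 * one budget unit is charged per descriptor (item 4); otherwise a unit
 * is charged only when a frame completes.
 */
static struct poll_stats poll_budget_sim(unsigned int budget,
					 unsigned int frags_per_frame,
					 bool count_descs)
{
	struct poll_stats st = { 0, 0 };
	unsigned int work = 0;

	while (work < budget) {
		bool eop;

		st.descs++;
		eop = (st.descs % frags_per_frame) == 0;
		if (eop)
			st.skbs++;
		if (count_descs || eop)
			work++;
	}

	return st;
}

struct refill_stats {
	unsigned int refills;	/* number of batch refills performed */
	unsigned int pending;	/* buffers still waiting for a refill */
};

/*
 * Process "budget" descriptors, refilling at the top of an iteration
 * once REFILL_BATCH buffers are pending. If refill_after_loop is true,
 * one more refill is attempted after the loop exits (item 6).
 */
static struct refill_stats poll_refill_sim(unsigned int budget,
					   bool refill_after_loop)
{
	struct refill_stats st = { 0, 0 };
	unsigned int done;

	for (done = 0; done < budget; done++) {
		if (st.pending >= REFILL_BATCH) {
			st.refills++;
			st.pending = 0;
		}
		st.pending++;
	}

	if (refill_after_loop && st.pending >= REFILL_BATCH) {
		st.refills++;
		st.pending = 0;
	}

	return st;
}
```

In this model, with a budget of 64 and 2 frags per frame, charging per
descriptor stops after 64 descriptors and 32 skbs, while charging per
skb keeps going to 128 descriptors. A 64-descriptor poll does 3 refills
and leaves 16 buffers pending without the post-loop refill, and does 4
refills with none pending when the post-loop refill is enabled.
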
> 
> Function: add/remove: 4/2 grow/shrink: 0/5 up/down: 473/-647 (-174)
> 
> + up to 2% performance.
> 
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>

What is the workload that is showing the performance improvement?

<...>

> @@ -1350,14 +1297,6 @@ static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
>  			    union iavf_rx_desc *rx_desc,
>  			    struct sk_buff *skb)

I am pretty sure the skb pointer here is an unused variable. We needed
it for ixgbe to support RSC. I don't think you have any code that uses
it in this function and I know we removed the variable for i40e, see
commit d06e2f05b4f18 ("i40e: adjust i40e_is_non_eop").
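
For illustration only, an end-of-packet check with the skb argument
dropped could look roughly like the sketch below. This is a standalone
toy, not the iavf or i40e code: toy_ring, toy_rx_desc, TOY_RXD_EOF and
toy_is_non_eop() are made-up names, and the bit position is an
assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real driver types and bit layout differ. */
#define TOY_RXD_EOF	(1U << 1)	/* assumed "end of frame" status bit */

struct toy_rx_desc {
	uint32_t status;
};

struct toy_ring {
	unsigned long non_eop_descs;	/* count of non-final descriptors */
};

/*
 * Return true if more descriptors follow for the current frame.
 * Only the ring and the descriptor are consulted, so no skb pointer
 * needs to be passed in.
 */
static bool toy_is_non_eop(struct toy_ring *ring,
			   const struct toy_rx_desc *desc)
{
	/* Last buffer of the frame: nothing else to do. */
	if (desc->status & TOY_RXD_EOF)
		return false;

	ring->non_eop_descs++;

	return true;
}
```

The point of the sketch is just that the decision depends on the
descriptor status alone, which is why the extra parameter can go.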

Thread overview: 23+ messages
2023-05-30 15:00 [PATCH net-next v3 00/12] net: intel: start The Great Code Dedup + Page Pool for iavf Alexander Lobakin
2023-05-30 15:00 ` [PATCH net-next v3 01/12] net: intel: introduce Intel Ethernet common library Alexander Lobakin
2023-05-30 15:00 ` [PATCH net-next v3 02/12] iavf: kill "legacy-rx" for good Alexander Lobakin
2023-05-30 15:00 ` [PATCH net-next v3 03/12] iavf: optimize Rx buffer allocation a bunch Alexander Lobakin
2023-05-31 15:37   ` Alexander H Duyck [this message]
2023-05-31 16:39   ` Maciej Fijalkowski
2023-06-02 14:09     ` Alexander Lobakin
2023-05-30 15:00 ` [PATCH net-next v3 04/12] iavf: remove page splitting/recycling Alexander Lobakin
2023-05-30 15:00 ` [PATCH net-next v3 05/12] iavf: always use a full order-0 page Alexander Lobakin
2023-05-30 15:00 ` [PATCH net-next v3 06/12] net: skbuff: don't include <net/page_pool.h> into <linux/skbuff.h> Alexander Lobakin
2023-05-31 15:21   ` Alexander H Duyck
2023-05-31 15:28     ` Alexander Lobakin
2023-05-31 15:29       ` Alexander Lobakin
2023-05-30 15:00 ` [PATCH net-next v3 07/12] net: page_pool: avoid calling no-op externals when possible Alexander Lobakin
2023-05-30 15:00 ` [PATCH net-next v3 08/12] net: page_pool: add DMA-sync-for-CPU inline helpers Alexander Lobakin
2023-05-30 15:00 ` [PATCH net-next v3 09/12] iavf: switch to Page Pool Alexander Lobakin
2023-05-31 16:19   ` Alexander H Duyck
2023-06-02 16:29     ` Alexander Lobakin
2023-06-02 18:00       ` Alexander Duyck
2023-06-06 13:13         ` Alexander Lobakin
2023-05-30 15:00 ` [PATCH net-next v3 10/12] libie: add common queue stats Alexander Lobakin
2023-05-30 15:00 ` [PATCH net-next v3 11/12] libie: add per-queue Page Pool stats Alexander Lobakin
2023-05-30 15:00 ` [PATCH net-next v3 12/12] iavf: switch queue stats to libie Alexander Lobakin
