From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Larysa Zaremba <larysa.zaremba@intel.com>,
netdev@vger.kernel.org,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
linux-kernel@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
Eric Dumazet <edumazet@google.com>,
Michal Kubiak <michal.kubiak@intel.com>,
intel-wired-lan@lists.osuosl.org,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
Magnus Karlsson <magnus.karlsson@intel.com>
Subject: Re: [Intel-wired-lan] [PATCH net-next v3 03/12] iavf: optimize Rx buffer allocation a bunch
Date: Fri, 2 Jun 2023 16:09:54 +0200 [thread overview]
Message-ID: <97d2efe0-599b-70d3-16ca-1dbab13eb2b1@intel.com> (raw)
In-Reply-To: <ZHd4UPXgNaJlmyv1@boxer>
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Wed, 31 May 2023 18:39:44 +0200
> On Tue, May 30, 2023 at 05:00:26PM +0200, Alexander Lobakin wrote:
>> The Rx hotpath code of IAVF is not well-optimized TBH. Before doing any
>> further buffer model changes, shake it up a bit. Notably:
>>
>> 1. Cache more variables on the stack.
>> DMA device, Rx page size, NTC -- these are the most common things
>> used all throughout the hotpath, often in loops on each iteration.
>> Instead of fetching (or even calculating, as with the page size) them
>> from the ring all the time, cache them on the stack at the beginning
>> of the NAPI polling callback. NTC will be written back at the end,
>> the rest are used read-only, so no sync needed.
>
> I like calculating page size once per napi istance. Reduces a bunch of
> branches ;)
>
> Yet another optimization I did on other drivers was to store rx_offset
> within ring struct. I skipped iavf for some reason. I can follow-up with
> that, but I'm bringing this up so we keep an eye on it.
rx_offset is stored as Page Pool param in its struct. So no follow-ups
here needed :)
[...]
>> 3. Don't allocate with %GPF_ATOMIC on ifup.
>
> s/GPF/GFP
Breh :s
>
>> This involved introducing the @gfp parameter to a couple functions.
>> Doesn't change anything for Rx -> softirq.
>> 4. 1 budget unit == 1 descriptor, not skb.
>> There could be underflow when receiving a lot of fragmented frames.
>> If each of them would consist of 2 frags, it means that we'd process
>> 64 descriptors at the point where we pass the 32th skb to the stack.
>> But the driver would count that only as a half, which could make NAPI
>> re-enable interrupts prematurely and create unnecessary CPU load.
>
> How would this affect 9k MTU workloads?
Not measured =\ But I feel like I'll drop this bullet, so will see.
>
>> 5. Shortcut !size case.
>> It's super rare, but possible -- for example, if the last buffer of
>> the fragmented frame contained only FCS, which was then stripped by
>> the HW. Instead of checking for size several times when processing,
>> quickly reuse the buffer and jump to the skb fields part.
>
> would be good to say about pagecnt_bias handling.
?? Bias is changed only when the buffer contains data, in this case it's
not changed, so the buffer is ready to be reused.
[...]
>> Function: add/remove: 4/2 grow/shrink: 0/5 up/down: 473/-647 (-174)
>>
>> + up to 2% performance.
>
> I am sort of not buying that. You are removing iavf_reuse_rx_page() here
> which is responsible for reusing the page, but on next patch that is
> supposed to avoid page split perf drops by 30%. A bit confusing?
Nope. reuse_rx_page() only adds overhead since it moves reusable buffers
around the ring, while without it they get reused in-place. That's why
it doesn't cause any regressions. The next patch removes page reuse
completely, hence the perf changes.
[...]
>> -static void iavf_reuse_rx_page(struct iavf_ring *rx_ring,
>> - struct iavf_rx_buffer *old_buff)
>
> this is recycling logic so i feel this removal belongs to patch 04, right?
(above)
>
>> -{
>> - struct iavf_rx_buffer *new_buff;
>> - u16 nta = rx_ring->next_to_alloc;
[...]
>> -static struct iavf_rx_buffer *iavf_get_rx_buffer(struct iavf_ring *rx_ring,
>> - const unsigned int size)
>> +static void iavf_sync_rx_buffer(struct device *dev, struct iavf_rx_buffer *buf,
>> + u32 size)
>
> you have peeled out all of the contents of this function, why not calling
> dma_sync_single_range_for_cpu() directly?
Pretty long line, so I decided to leave it here. It gets removed anyway
when Page Pool is here.
[...]
>> if (iavf_can_reuse_rx_page(rx_buffer)) {
>> - /* hand second half of page back to the ring */
>> - iavf_reuse_rx_page(rx_ring, rx_buffer);
>> rx_ring->rx_stats.page_reuse_count++;
>
> what is the purpose of not reusing the page but bumping the meaningless
> stat? ;)
Also above. It's reused, just not moved around the ring :D
[...]
>> + /* Very rare, but possible case. The most common reason:
>> + * the last fragment contained FCS only, which was then
^^^^^^^^
>> + * stripped by the HW.
>
> you could also mention this is happening for fragmented frames
Mmm?
>
>> + */
>> + if (unlikely(!size))
>> + goto skip_data;
Thanks,
Olek
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
WARNING: multiple messages have this Message-ID (diff)
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Magnus Karlsson <magnus.karlsson@intel.com>,
"Michal Kubiak" <michal.kubiak@intel.com>,
Larysa Zaremba <larysa.zaremba@intel.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Christoph Hellwig <hch@lst.de>,
Paul Menzel <pmenzel@molgen.mpg.de>, <netdev@vger.kernel.org>,
<intel-wired-lan@lists.osuosl.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH net-next v3 03/12] iavf: optimize Rx buffer allocation a bunch
Date: Fri, 2 Jun 2023 16:09:54 +0200 [thread overview]
Message-ID: <97d2efe0-599b-70d3-16ca-1dbab13eb2b1@intel.com> (raw)
In-Reply-To: <ZHd4UPXgNaJlmyv1@boxer>
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Wed, 31 May 2023 18:39:44 +0200
> On Tue, May 30, 2023 at 05:00:26PM +0200, Alexander Lobakin wrote:
>> The Rx hotpath code of IAVF is not well-optimized TBH. Before doing any
>> further buffer model changes, shake it up a bit. Notably:
>>
>> 1. Cache more variables on the stack.
>> DMA device, Rx page size, NTC -- these are the most common things
>> used all throughout the hotpath, often in loops on each iteration.
>> Instead of fetching (or even calculating, as with the page size) them
>> from the ring all the time, cache them on the stack at the beginning
>> of the NAPI polling callback. NTC will be written back at the end,
>> the rest are used read-only, so no sync needed.
>
> I like calculating page size once per napi istance. Reduces a bunch of
> branches ;)
>
> Yet another optimization I did on other drivers was to store rx_offset
> within ring struct. I skipped iavf for some reason. I can follow-up with
> that, but I'm bringing this up so we keep an eye on it.
rx_offset is stored as Page Pool param in its struct. So no follow-ups
here needed :)
[...]
>> 3. Don't allocate with %GPF_ATOMIC on ifup.
>
> s/GPF/GFP
Breh :s
>
>> This involved introducing the @gfp parameter to a couple functions.
>> Doesn't change anything for Rx -> softirq.
>> 4. 1 budget unit == 1 descriptor, not skb.
>> There could be underflow when receiving a lot of fragmented frames.
>> If each of them would consist of 2 frags, it means that we'd process
>> 64 descriptors at the point where we pass the 32th skb to the stack.
>> But the driver would count that only as a half, which could make NAPI
>> re-enable interrupts prematurely and create unnecessary CPU load.
>
> How would this affect 9k MTU workloads?
Not measured =\ But I feel like I'll drop this bullet, so will see.
>
>> 5. Shortcut !size case.
>> It's super rare, but possible -- for example, if the last buffer of
>> the fragmented frame contained only FCS, which was then stripped by
>> the HW. Instead of checking for size several times when processing,
>> quickly reuse the buffer and jump to the skb fields part.
>
> would be good to say about pagecnt_bias handling.
?? Bias is changed only when the buffer contains data, in this case it's
not changed, so the buffer is ready to be reused.
[...]
>> Function: add/remove: 4/2 grow/shrink: 0/5 up/down: 473/-647 (-174)
>>
>> + up to 2% performance.
>
> I am sort of not buying that. You are removing iavf_reuse_rx_page() here
> which is responsible for reusing the page, but on next patch that is
> supposed to avoid page split perf drops by 30%. A bit confusing?
Nope. reuse_rx_page() only adds overhead since it moves reusable buffers
around the ring, while without it they get reused in-place. That's why
it doesn't cause any regressions. The next patch removes page reuse
completely, hence the perf changes.
[...]
>> -static void iavf_reuse_rx_page(struct iavf_ring *rx_ring,
>> - struct iavf_rx_buffer *old_buff)
>
> this is recycling logic so i feel this removal belongs to patch 04, right?
(above)
>
>> -{
>> - struct iavf_rx_buffer *new_buff;
>> - u16 nta = rx_ring->next_to_alloc;
[...]
>> -static struct iavf_rx_buffer *iavf_get_rx_buffer(struct iavf_ring *rx_ring,
>> - const unsigned int size)
>> +static void iavf_sync_rx_buffer(struct device *dev, struct iavf_rx_buffer *buf,
>> + u32 size)
>
> you have peeled out all of the contents of this function, why not calling
> dma_sync_single_range_for_cpu() directly?
Pretty long line, so I decided to leave it here. It gets removed anyway
when Page Pool is here.
[...]
>> if (iavf_can_reuse_rx_page(rx_buffer)) {
>> - /* hand second half of page back to the ring */
>> - iavf_reuse_rx_page(rx_ring, rx_buffer);
>> rx_ring->rx_stats.page_reuse_count++;
>
> what is the purpose of not reusing the page but bumping the meaningless
> stat? ;)
Also above. It's reused, just not moved around the ring :D
[...]
>> + /* Very rare, but possible case. The most common reason:
>> + * the last fragment contained FCS only, which was then
^^^^^^^^
>> + * stripped by the HW.
>
> you could also mention this is happening for fragmented frames
Mmm?
>
>> + */
>> + if (unlikely(!size))
>> + goto skip_data;
Thanks,
Olek
next prev parent reply other threads:[~2023-06-02 14:11 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-30 15:00 [Intel-wired-lan] [PATCH net-next v3 00/12] net: intel: start The Great Code Dedup + Page Pool for iavf Alexander Lobakin
2023-05-30 15:00 ` Alexander Lobakin
2023-05-30 15:00 ` [Intel-wired-lan] [PATCH net-next v3 01/12] net: intel: introduce Intel Ethernet common library Alexander Lobakin
2023-05-30 15:00 ` Alexander Lobakin
2023-05-30 15:00 ` [Intel-wired-lan] [PATCH net-next v3 02/12] iavf: kill "legacy-rx" for good Alexander Lobakin
2023-05-30 15:00 ` Alexander Lobakin
2023-05-30 15:00 ` [Intel-wired-lan] [PATCH net-next v3 03/12] iavf: optimize Rx buffer allocation a bunch Alexander Lobakin
2023-05-30 15:00 ` Alexander Lobakin
2023-05-31 15:37 ` [Intel-wired-lan] " Alexander H Duyck
2023-05-31 15:37 ` Alexander H Duyck
2023-05-31 16:39 ` [Intel-wired-lan] " Maciej Fijalkowski
2023-05-31 16:39 ` Maciej Fijalkowski
2023-06-02 14:09 ` Alexander Lobakin [this message]
2023-06-02 14:09 ` Alexander Lobakin
2023-05-30 15:00 ` [Intel-wired-lan] [PATCH net-next v3 04/12] iavf: remove page splitting/recycling Alexander Lobakin
2023-05-30 15:00 ` Alexander Lobakin
2023-05-30 15:00 ` [Intel-wired-lan] [PATCH net-next v3 05/12] iavf: always use a full order-0 page Alexander Lobakin
2023-05-30 15:00 ` Alexander Lobakin
2023-05-30 15:00 ` [Intel-wired-lan] [PATCH net-next v3 06/12] net: skbuff: don't include <net/page_pool.h> into <linux/skbuff.h> Alexander Lobakin
2023-05-30 15:00 ` Alexander Lobakin
2023-05-31 15:21 ` [Intel-wired-lan] " Alexander H Duyck
2023-05-31 15:21 ` Alexander H Duyck
2023-05-31 15:28 ` [Intel-wired-lan] " Alexander Lobakin
2023-05-31 15:28 ` Alexander Lobakin
2023-05-31 15:29 ` [Intel-wired-lan] " Alexander Lobakin
2023-05-31 15:29 ` Alexander Lobakin
2023-05-30 15:00 ` [Intel-wired-lan] [PATCH net-next v3 07/12] net: page_pool: avoid calling no-op externals when possible Alexander Lobakin
2023-05-30 15:00 ` Alexander Lobakin
2023-05-30 15:00 ` [Intel-wired-lan] [PATCH net-next v3 08/12] net: page_pool: add DMA-sync-for-CPU inline helpers Alexander Lobakin
2023-05-30 15:00 ` Alexander Lobakin
2023-05-30 15:00 ` [Intel-wired-lan] [PATCH net-next v3 09/12] iavf: switch to Page Pool Alexander Lobakin
2023-05-30 15:00 ` Alexander Lobakin
2023-05-31 16:19 ` [Intel-wired-lan] " Alexander H Duyck
2023-05-31 16:19 ` Alexander H Duyck
2023-06-02 16:29 ` [Intel-wired-lan] " Alexander Lobakin
2023-06-02 16:29 ` Alexander Lobakin
2023-06-02 18:00 ` [Intel-wired-lan] " Alexander Duyck
2023-06-02 18:00 ` Alexander Duyck
2023-06-06 13:13 ` [Intel-wired-lan] " Alexander Lobakin
2023-06-06 13:13 ` Alexander Lobakin
2023-05-30 15:00 ` [Intel-wired-lan] [PATCH net-next v3 10/12] libie: add common queue stats Alexander Lobakin
2023-05-30 15:00 ` Alexander Lobakin
2023-05-30 15:00 ` [Intel-wired-lan] [PATCH net-next v3 11/12] libie: add per-queue Page Pool stats Alexander Lobakin
2023-05-30 15:00 ` Alexander Lobakin
2023-05-30 15:00 ` [Intel-wired-lan] [PATCH net-next v3 12/12] iavf: switch queue stats to libie Alexander Lobakin
2023-05-30 15:00 ` Alexander Lobakin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=97d2efe0-599b-70d3-16ca-1dbab13eb2b1@intel.com \
--to=aleksander.lobakin@intel.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=hch@lst.de \
--cc=ilias.apalodimas@linaro.org \
--cc=intel-wired-lan@lists.osuosl.org \
--cc=kuba@kernel.org \
--cc=larysa.zaremba@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maciej.fijalkowski@intel.com \
--cc=magnus.karlsson@intel.com \
--cc=michal.kubiak@intel.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pmenzel@molgen.mpg.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.