From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	Larysa Zaremba <larysa.zaremba@intel.com>,
	<netdev@vger.kernel.org>, Alexander Duyck <alexanderduyck@fb.com>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	<linux-kernel@vger.kernel.org>,
	Yunsheng Lin <linyunsheng@huawei.com>,
	Michal Kubiak <michal.kubiak@intel.com>,
	<intel-wired-lan@lists.osuosl.org>,
	"David Christensen" <drc@linux.vnet.ibm.com>
Subject: Re: [Intel-wired-lan] [PATCH RFC net-next v4 6/9] iavf: switch to Page Pool
Date: Mon, 10 Jul 2023 15:18:37 +0200
Message-ID: <f4884344-98f9-9ad3-62c0-9ade0bbadbb2@intel.com>
In-Reply-To: <CAKgT0Ufqno2z=6w6XmJ+rVeqzOnHudgsRs8Fgs+eke_cyc0hjQ@mail.gmail.com>

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Thu, 6 Jul 2023 10:28:06 -0700

> On Thu, Jul 6, 2023 at 9:57 AM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
>>
>> From: Alexander Duyck <alexander.duyck@gmail.com>
>> Date: Thu, 6 Jul 2023 08:26:00 -0700
>>
>>> On Wed, Jul 5, 2023 at 8:58 AM Alexander Lobakin
>>> <aleksander.lobakin@intel.com> wrote:
>>>>
>>>> Now that the IAVF driver simply uses dev_alloc_page() + free_page() with
>>>> no custom recycling logic, it can easily be switched to using Page
>>>> Pool / libie API instead.
>>>> This allows removing the whole dance around headroom, HW buffer
>>>> size, and page order. All DMA-for-device is now done in the PP core,
>>>> for-CPU -- in the libie helper.
>>>> Use skb_mark_for_recycle() to bring back the recycling and restore the
>>>> performance. Speaking of performance: on par with the baseline and
>>>> faster with the PP optimization series applied. But the memory usage for
>>>> 1500b MTU is now almost 2x lower (x86_64) thanks to allocating a page
>>>> every second descriptor.
>>>>
>>>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>>>
>>> One thing I am noticing is that there seems to be a bunch of cleanup
>>> changes in here as well. Things like moving around values within
>>> structures which I am assuming are to fill holes. You may want to look
>>> at breaking some of those out as it makes it a bit harder to review
>>> this since they seem like unrelated changes.
>>
>> min_mtu and watchdog are unrelated, I'll drop those.
>> Moving tail pointer around was supposed to land in a different commit,
>> not this one, as I wrote 10 minutes ago already :s
>>
>> [...]
>>
>>>> -       bi_size = sizeof(struct iavf_rx_buffer) * rx_ring->count;
>>>> -       memset(rx_ring->rx_bi, 0, bi_size);
>>>> -
>>>> -       /* Zero out the descriptor ring */
>>>> -       memset(rx_ring->desc, 0, rx_ring->size);
>>>> -
>>>
>>> I have some misgivings about not clearing these. We may want to double
>>> check to verify the code paths are resilient enough that it won't
>>> cause any issues w/ repeated up/down testing on the interface. The
>>> general idea is to keep things consistent w/ the state after
>>> setup_rx_descriptors. If we don't need this, then we don't need to be
>>> calling the zalloc or calloc version of things in
>>> setup_rx_descriptors.
>>
>> Both arrays will be freed a couple of instructions below, so why zero them?
> 
> Ugh. You are right, but not for a good reason. So the other Intel
> drivers in the past would be doing the clean_rx_ring calls on the
> _down() with the freeing of resources on _close(). Specifically it
> allowed reducing the overhead for things like resets or setting
> changes since it didn't require reallocating the descriptor rings and
> buffer info structures.
> 
> I guess you are good to remove these since this code doesn't do that.

We might come back to this so that we don't always do a full free/realloc
cycle when it isn't needed, but currently the zeroing is redundant.
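
To make the difference concrete, here is a minimal userspace sketch of the
two models. Everything in it (struct demo_ring, demo_setup(), demo_clean(),
demo_free()) is hypothetical and only stands in for the driver's ring and
its setup/clean/free paths; it is not the iavf code. The memset() only pays
off when the memory survives the "down" and is reused on the next "up":

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified ring; not the real struct iavf_ring. */
struct demo_ring {
	unsigned int count;		/* number of descriptors */
	unsigned long *desc;		/* stand-in for the descriptor ring */
	unsigned int next_to_clean;
	unsigned int next_to_use;
};

/* Allocate zeroed descriptors, as a calloc-style setup would. */
static int demo_setup(struct demo_ring *ring, unsigned int count)
{
	ring->desc = calloc(count, sizeof(*ring->desc));
	if (!ring->desc)
		return -1;

	ring->count = count;
	ring->next_to_clean = 0;
	ring->next_to_use = 0;

	return 0;
}

/* "down" in the split model: keep the memory, restore the post-setup state. */
static void demo_clean(struct demo_ring *ring)
{
	memset(ring->desc, 0, ring->count * sizeof(*ring->desc));
	ring->next_to_clean = 0;
	ring->next_to_use = 0;
}

/* "close": release the memory; zeroing right before this is wasted work. */
static void demo_free(struct demo_ring *ring)
{
	free(ring->desc);
	ring->desc = NULL;
	ring->count = 0;
}
```

If clean is always followed immediately by free, as in the current code,
the zeroing in demo_clean() has no observable effect.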

> 
>>>
>>>
>>>>         rx_ring->next_to_clean = 0;
>>>>         rx_ring->next_to_use = 0;
>>>>  }
>>
>> [...]
>>
>>>>         struct net_device *netdev;      /* netdev ring maps to */
>>>>         union {
>>>> +               struct libie_rx_buffer *rx_bi;
>>>>                 struct iavf_tx_buffer *tx_bi;
>>>> -               struct iavf_rx_buffer *rx_bi;
>>>>         };
>>>>         DECLARE_BITMAP(state, __IAVF_RING_STATE_NBITS);
>>>> +       u8 __iomem *tail;
>>>>         u16 queue_index;                /* Queue number of ring */
>>>>         u8 dcb_tc;                      /* Traffic class of ring */
>>>> -       u8 __iomem *tail;
>>>>
>>>>         /* high bit set means dynamic, use accessors routines to read/write.
>>>>          * hardware only supports 2us resolution for the ITR registers.
>>>
>>> I'm assuming "tail" was moved here since it is a pointer and fills a hole?
>>
>> (see above)
>>
>>>
>>>> @@ -329,9 +264,8 @@ struct iavf_ring {
>>>>          */
>>>>         u16 itr_setting;
>>>>
>>>> -       u16 count;                      /* Number of descriptors */
>>>>         u16 reg_idx;                    /* HW register index of the ring */
>>>> -       u16 rx_buf_len;
>>>> +       u16 count;                      /* Number of descriptors */
>>>
>>> Why move count down here? It is moving the constant value that is
>>> read-mostly into an area that will be updated more often.
>>
>> With the ::tail put in a different slot, ::count was landing in a
>> different cacheline. I wanted to avoid this. But now I feel like I was
>> just lazy and should've tested both variants to see if this move affects
>> performance. I'll play with this one in the next rev.
> 
> The performance impact should be minimal. Odds are the placement was
> the way it was since it was probably just copying the original code
> that has been there since igb/ixgbe. The general idea is just keep the
> read-mostly items grouped at the top and try to order them somewhat by
> frequency of being read so that wherever the cache line ends up you
> won't take much of a penalty as hopefully you will just have the
> infrequently read items end up getting pulled into the active cache
> line.

+
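
For checking which cacheline a field lands in while shuffling members
around, a small offsetof()-based sketch like the one below can help (pahole
gives the same information for the real structure). The layout, the 64-byte
line size and all demo_* names are assumptions made for illustration, not
the actual struct iavf_ring:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHELINE	64	/* assumed cache line size in bytes */

/* Hypothetical layout, loosely modelled on the hunk above. */
struct demo_ring_layout {
	/* read-mostly, set at configuration time */
	void *netdev;
	void *bi;
	uint8_t *tail;
	uint16_t queue_index;
	uint16_t count;
	uint16_t reg_idx;
	uint16_t itr_setting;

	/* hot, written on every poll */
	uint16_t next_to_use;
	uint16_t next_to_clean;
};

/* Index of the cache line that a given byte offset falls into. */
static inline size_t demo_cacheline_of(size_t offset)
{
	return offset / DEMO_CACHELINE;
}

/* Non-zero if two fields, given by offset, share one cache line. */
static inline int demo_same_cacheline(size_t a, size_t b)
{
	return demo_cacheline_of(a) == demo_cacheline_of(b);
}
```

E.g. demo_same_cacheline(offsetof(struct demo_ring_layout, tail),
offsetof(struct demo_ring_layout, count)) tells whether reading ::count
after ::tail touches a second line in this toy layout.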

> 
>>>
>>>>         /* used in interrupt processing */
>>>>         u16 next_to_use;
>>>> @@ -398,17 +332,6 @@ struct iavf_ring_container {
>>>>  #define iavf_for_each_ring(pos, head) \
>>>>         for (pos = (head).ring; pos != NULL; pos = pos->next)
>>>>
>>>> -static inline unsigned int iavf_rx_pg_order(struct iavf_ring *ring)
>>>> -{
>>>> -#if (PAGE_SIZE < 8192)
>>>> -       if (ring->rx_buf_len > (PAGE_SIZE / 2))
>>>> -               return 1;
>>>> -#endif
>>>> -       return 0;
>>>> -}
>>>> -
>>>> -#define iavf_rx_pg_size(_ring) (PAGE_SIZE << iavf_rx_pg_order(_ring))
>>>> -
>>>
>>> All this code probably could have been removed in an earlier patch
>>> since I don't think we need the higher order pages once we did away
>>> with the recycling. Odds are we can probably move this into the
>>> recycling code removal.
>>
>> This went here as I merged "always use order 0" commit with "switch to
>> Page Pool". In general, IIRC having removals of all the stuff at once in
>> one commit (#2) was less readable than the current version, but I'll
>> double-check.
> 
> It all depends on how much is having to be added to accommodate this.
> In my mind when we did away with the page splitting/recycling we also
> did away with the need for the higher order pages. That is why I was
> thinking it might make more sense there as it would just be more
> removals with very few if any additions needed to support it.

Yeah, I'll try and see whether any pieces can be grouped differently for
better readability/logic.
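
For reference, this is what the removed helpers computed, restated as a
standalone sketch. The page size is passed as a parameter here (my
assumption, so that it builds outside the kernel) instead of using
PAGE_SIZE, and the demo_* names are hypothetical:

```c
#include <assert.h>

/*
 * Standalone restatement of the removed iavf_rx_pg_order() logic: on
 * systems with pages smaller than 8k, a buffer longer than half a page
 * needs an order-1 (two-page) allocation; otherwise order 0 is enough.
 */
static inline unsigned int demo_rx_pg_order(unsigned int page_size,
					    unsigned int rx_buf_len)
{
	if (page_size < 8192 && rx_buf_len > page_size / 2)
		return 1;

	return 0;
}

/* Counterpart of the removed iavf_rx_pg_size() macro. */
static inline unsigned int demo_rx_pg_size(unsigned int page_size,
					   unsigned int rx_buf_len)
{
	return page_size << demo_rx_pg_order(page_size, rx_buf_len);
}
```

With 4k pages, a 2048-byte buffer stays at order 0, while a 3072-byte one
bumps the allocation to 8k. Once every buffer comes from an order-0 page,
neither helper has any callers left.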

[...]

Thanks!
Olek
