From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexander H Duyck <alexander.duyck@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	Magnus Karlsson <magnus.karlsson@intel.com>,
	Michal Kubiak <michal.kubiak@intel.com>,
	Larysa Zaremba <larysa.zaremba@intel.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	Christoph Hellwig <hch@lst.de>,
	Paul Menzel <pmenzel@molgen.mpg.de>, <netdev@vger.kernel.org>,
	<intel-wired-lan@lists.osuosl.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH net-next v3 09/12] iavf: switch to Page Pool
Date: Fri, 2 Jun 2023 18:29:44 +0200
Message-ID: <51f558e3-7ccd-45cd-d944-73997765fd12@intel.com>
In-Reply-To: <0962a8a8493f0c892775cda8affb93c20f8b78f7.camel@gmail.com>

From: Alexander H Duyck <alexander.duyck@gmail.com>
Date: Wed, 31 May 2023 09:19:06 -0700

> On Tue, 2023-05-30 at 17:00 +0200, Alexander Lobakin wrote:
>> Now that the IAVF driver simply uses dev_alloc_page() + free_page() with
>> no custom recycling logic and one whole page per frame, it can easily
>> be switched to using the Page Pool API instead.

[...]

>> @@ -691,8 +690,6 @@ int iavf_setup_tx_descriptors(struct iavf_ring *tx_ring)
>>   **/
>>  void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
>>  {
>> -	u16 i;
>> -
>>  	/* ring already cleared, nothing to do */
>>  	if (!rx_ring->rx_pages)
>>  		return;
>> @@ -703,28 +700,17 @@ void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
>>  	}
>>  
>>  	/* Free all the Rx ring sk_buffs */
>> -	for (i = 0; i < rx_ring->count; i++) {
>> +	for (u32 i = 0; i < rx_ring->count; i++) {
> 
> Did we make a change to our coding style to allow declaration of
> variables inside of for statements? Just wondering if this is a change
> since the recent updates to the ISO C standard, or if this doesn't
> match up with what we would expect per the coding standard.

It's optional right now; nobody would object to declaring it either way.
Doing it inside the for statement has been allowed since we switched to
C11, right. Here I did it because my heart was breaking to see this
little u16 sitting alone (and yeah, a u16 on the stack).
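
To make the style point concrete, a condensed sketch of the same loop in
both accepted forms (the put_one_page() helper is made up just for the
illustration):

	/* pre-gnu11 style: iterator declared at function scope */
	static void put_pages_old_style(struct iavf_ring *rx_ring)
	{
		u32 i;

		for (i = 0; i < rx_ring->count; i++)
			put_one_page(rx_ring, i);
	}

	/* gnu11 style: iterator scoped to the loop, as in the hunk */
	static void put_pages_new_style(struct iavf_ring *rx_ring)
	{
		for (u32 i = 0; i < rx_ring->count; i++)
			put_one_page(rx_ring, i);
	}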

> 
>>  		struct page *page = rx_ring->rx_pages[i];
>> -		dma_addr_t dma;
>>  
>>  		if (!page)
>>  			continue;
>>  
>> -		dma = page_pool_get_dma_addr(page);
>> -
>>  		/* Invalidate cache lines that may have been written to by
>>  		 * device so that we avoid corrupting memory.
>>  		 */
>> -		dma_sync_single_range_for_cpu(rx_ring->dev, dma,
>> -					      LIBIE_SKB_HEADROOM,
>> -					      LIBIE_RX_BUF_LEN,
>> -					      DMA_FROM_DEVICE);
>> -
>> -		/* free resources associated with mapping */
>> -		dma_unmap_page_attrs(rx_ring->dev, dma, LIBIE_RX_TRUESIZE,
>> -				     DMA_FROM_DEVICE, IAVF_RX_DMA_ATTR);
>> -
>> -		__free_page(page);
>> +		page_pool_dma_sync_full_for_cpu(rx_ring->pool, page);
>> +		page_pool_put_full_page(rx_ring->pool, page, false);
>>  	}
>>  
>>  	rx_ring->next_to_clean = 0;
>> @@ -739,10 +725,15 @@ void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
>>   **/
>>  void iavf_free_rx_resources(struct iavf_ring *rx_ring)
>>  {
>> +	struct device *dev = rx_ring->pool->p.dev;
>> +
>>  	iavf_clean_rx_ring(rx_ring);
>>  	kfree(rx_ring->rx_pages);
>>  	rx_ring->rx_pages = NULL;
>>  
>> +	page_pool_destroy(rx_ring->pool);
>> +	rx_ring->dev = dev;
>> +
>>  	if (rx_ring->desc) {
>>  		dma_free_coherent(rx_ring->dev, rx_ring->size,
>>  				  rx_ring->desc, rx_ring->dma);
> 
> Not a fan of this switching back and forth between being a page pool
> pointer and a dev pointer. Seems problematic as it is easily
> misinterpreted. I would say at a minimum stick to either page_pool (Rx)
> or dev (Tx) on a per-ring-type basis.

The problem is that the page_pool's lifetime spans ifup to ifdown, while
the ring itself lives longer. So I had to do something about that, but I
also didn't want to keep two pointers at the same time, since that's
redundant and adds +8 bytes to the ring for nothing.
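
For context, a minimal sketch of the ifup side, assuming a
page_pool_create() call along the lines of what libie does in this
series (the exact parameter values here are illustrative, not the
actual patch):

	struct page_pool_params pp = {
		.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.order		= 0,
		.pool_size	= rx_ring->count,
		.nid		= NUMA_NO_NODE,
		.dev		= rx_ring->dev,
		.dma_dir	= DMA_FROM_DEVICE,
		.max_len	= LIBIE_RX_BUF_LEN,
		.offset		= LIBIE_SKB_HEADROOM,
	};
	struct page_pool *pool;

	pool = page_pool_create(&pp);
	if (IS_ERR(pool))
		return PTR_ERR(pool);

	/* from here until ifdown, the union member is the pool;
	 * iavf_free_rx_resources() above flips it back to 'dev'
	 */
	rx_ring->pool = pool;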

[...]

> This setup works for iavf, however for i40e/ice you may run into issues
> since the setup_rx_descriptors call is also used to set up the ethtool
> loopback test w/o a napi struct, as I recall, so there may not be a
> q_vector.

I'll handle that. Somehow :D Thanks for noticing. I'll take a look at
whether I need to do something right now or whether it can be done
later, when actually switching the mentioned drivers.
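
One option, sketched on the params struct from above (nothing like this
is in the series yet, so purely hypothetical):

	/* only tie the pool to a NAPI when the ring actually has a
	 * q_vector; the ethtool loopback selftest sets up descriptors
	 * without one
	 */
	pp.napi = rx_ring->q_vector ? &rx_ring->q_vector->napi : NULL;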

[...]

>> @@ -240,7 +237,10 @@ struct iavf_rx_queue_stats {
>>  struct iavf_ring {
>>  	struct iavf_ring *next;		/* pointer to next ring in q_vector */
>>  	void *desc;			/* Descriptor ring memory */
>> -	struct device *dev;		/* Used for DMA mapping */
>> +	union {
>> +		struct page_pool *pool;	/* Used for Rx page management */
>> +		struct device *dev;	/* Used for DMA mapping on Tx */
>> +	};
>>  	struct net_device *netdev;	/* netdev ring maps to */
>>  	union {
>>  		struct iavf_tx_buffer *tx_bi;
> 
> Would it make more sense to have the page pool in the q_vector rather
> than the ring? Essentially the page pool is associated per napi
> instance so it seems like it would make more sense to store it with the
> napi struct rather than potentially have multiple instances per napi.

As per the Page Pool design, you should have one pool per ring. Plus
there's rxq_info (the XDP-related structure), which is also per-ring and
participates in recycling in some cases, so I wouldn't complicate things.
I went down the call chain and didn't find any place where having more
than one PP per NAPI would break anything. If I got it right, Jakub's
optimization discourages having one PP shared across several NAPIs (or
scheduling one NAPI on different CPUs), but not the other way around.
The goal was to exclude concurrent access to one PP from different
threads, and here that's impossible.
Lemme know. I can always disable the NAPI optimization for cases when
one vector is shared by several queues -- not a usual case for these
NICs anyway -- but I haven't found a reason to.
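
And the rxq_info side is strictly per-ring as well. A hedged sketch of
the usual registration sequence (iavf doesn't do XDP yet, so the
xdp_rxq member here is assumed, not real driver code):

	err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
			       rx_ring->queue_index, 0);
	if (err)
		return err;

	/* route recycling for this queue through the per-ring pool */
	err = xdp_rxq_info_reg_mem_model(&rx_ring->xdp_rxq,
					 MEM_TYPE_PAGE_POOL,
					 rx_ring->pool);
	if (err) {
		xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
		return err;
	}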

[...]

Thanks,
Olek
