From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Yunsheng Lin <yunshenglin0825@gmail.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	Michal Kubiak <michal.kubiak@intel.com>,
	Larysa Zaremba <larysa.zaremba@intel.com>,
	Alexander Duyck <alexanderduyck@fb.com>,
	David Christensen <drc@linux.vnet.ibm.com>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Paul Menzel <pmenzel@molgen.mpg.de>, <netdev@vger.kernel.org>,
	<intel-wired-lan@lists.osuosl.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC net-next v4 5/9] libie: add Rx buffer management (via Page Pool)
Date: Mon, 10 Jul 2023 15:25:58 +0200
Message-ID: <b05d1a35-5bc5-b65d-b57d-5cc1b0f898cb@intel.com>
In-Reply-To: <71a8bab4-1a1d-cb1a-d75c-585a14c6fb2e@gmail.com>

From: Yunsheng Lin <yunshenglin0825@gmail.com>
Date: Sun, 9 Jul 2023 13:16:33 +0800

> On 2023/7/7 0:28, Alexander Lobakin wrote:
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>> Date: Thu, 6 Jul 2023 20:47:28 +0800
>>
>>> On 2023/7/5 23:55, Alexander Lobakin wrote:
>>>
>>>> +/**
>>>> + * libie_rx_page_pool_create - create a PP with the default libie settings
>>>> + * @napi: &napi_struct covering this PP (no usage outside its poll loops)
>>>> + * @size: size of the PP, usually simply Rx queue len
>>>> + *
>>>> + * Returns &page_pool on success, cast -errno on failure.
>>>> + */
>>>> +struct page_pool *libie_rx_page_pool_create(struct napi_struct *napi,
>>>> +					    u32 size)
>>>> +{
>>>> +	struct page_pool_params pp = {
>>>> +		.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
>>>> +		.order		= LIBIE_RX_PAGE_ORDER,
>>>> +		.pool_size	= size,
>>>> +		.nid		= NUMA_NO_NODE,
>>>> +		.dev		= napi->dev->dev.parent,
>>>> +		.napi		= napi,
>>>> +		.dma_dir	= DMA_FROM_DEVICE,
>>>> +		.offset		= LIBIE_SKB_HEADROOM,
>>>
>>> I think it is worth mentioning that '.offset' is not really accurate
>>> when the page is split, as we do not really know the offset of a frag
>>> within its page except for the first frag.
>>
>> Yeah, this is read as "offset from the start of the page or frag to the
>> actual frame start, i.e. its Ethernet header" or "this is just
>> xdp->data - xdp->data_hard_start".
> 
> So the problem arises if most drivers read it the same way libie does
> here: .offset already has a clear semantic, which is to let the DMA sync
> skip the part of the buffer that the DMA operation never touches. Even
> if it happens to have the same value as "offset from the start of the
> page or frag to the actual frame start", I am not sure reusing it is
> future-proof.

Not sure I'm following :s

> 
> When page frag support was added, I didn't really give much thought to
> that, as we use it on a cache-coherent system.
> It seems we might need to extend or update those semantics if we really
> want to skip the dma sync for all the buffer ranges the dma operation
> does not touch in the page split case.
> Alternatively, skipping the dma sync for every untouched range might not
> be worth the effort, because it could require a per-frag dma sync, which
> is more costly than one batched per-page dma sync. If that is true, page
> pool already supports this as far as I understand, because the dma sync
> operation is only done when the last frag is released/freed.
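
(For reference, a rough sketch of the put path in question, paraphrased
from include/net/page_pool.h of this era, not an exact copy:

static inline void page_pool_put_page(struct page_pool *pool,
				      struct page *page,
				      unsigned int dma_sync_size,
				      bool allow_direct)
{
	/* For a fragged page, do nothing until the last frag reference
	 * is dropped; only then is the page synced and recycled.
	 */
	if (!page_pool_is_last_frag(pool, page))
		return;

	page_pool_put_defragged_page(pool, page, dma_sync_size,
				     allow_direct);
}

so one batched sync per page on the last frag free is indeed what
happens today.)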
> 
>>
>>>
>>>> +	};
>>>> +	size_t truesize;
>>>> +
>>>> +	pp.max_len = libie_rx_sync_len(napi->dev, pp.offset);
> 
> As mentioned above, if we depend on the last released/freed frag to do the
> dma sync, the pp.max_len might need to cover all the frags.

                                               ^^^^^^^^^^^^^

You mean the whole page or...?
I think it's not the driver's duty to track all this. We always set
.offset to `data - data_hard_start` and .max_len to the maximum
HW-writeable length for one frame. We don't know whether PP will give us
a whole page or just a piece. DMA sync for device is performed in the PP
core code as well. The driver just creates a PP and doesn't care about
the internals.
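
For context, that sync is roughly the following in the PP core
(paraphrased from net/core/page_pool.c of this era, not an exact copy),
which is why .offset and .max_len are all the driver has to provide:

static void page_pool_dma_sync_for_device(struct page_pool *pool,
					  struct page *page,
					  unsigned int dma_sync_size)
{
	dma_addr_t dma_addr = page_pool_get_dma_addr(page);

	/* Sync at most .max_len bytes starting at .offset into the page */
	dma_sync_size = min(dma_sync_size, pool->p.max_len);
	dma_sync_single_range_for_device(pool->p.dev, dma_addr,
					 pool->p.offset, dma_sync_size,
					 pool->p.dma_dir);
}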

> 
>>>> +
>>>> +	/* "Wanted" truesize, passed to page_pool_dev_alloc() */
>>>> +	truesize = roundup_pow_of_two(SKB_HEAD_ALIGN(pp.offset + pp.max_len));
>>>> +	pp.init_arg = (void *)truesize;
>>>
>>> I am not sure if it is correct to use pp.init_arg here, as it is supposed to
>>> be used along with init_callback. And if we want to change the implementation
>>
>> I know. I abused it to save 1 function argument :p It's safe since I
>> don't use init_callback (not an argument).
>> I was also thinking of having a union in PP params or even a new field
>> like "wanted true size", so that your function could even take values
>> from there in certain cases (e.g. if I pass 0 as the parameter).
>>
>>> of init_callback, we may be stuck with it, as the driver is using it very
>>> differently here.
>>>
>>> Is it possible to pass the 'wanted true size' by adding a parameter for
>>> libie_rx_alloc()?
>>
>> Yes, or I could store it somewhere on the ring, but that looks uglier
>> =\ This one does too, to some degree, but at least it's hidden in the
>> library and doesn't show up in the drivers :D
> 
> It seems most HW drivers know the size of the memory they need when
> creating the ring/queue, so setting the frag size and deciding how many
> frags a page is split into before allocation seems like a possible
> future optimization.
> 
> For now, it would be better to at least add a helper to access
> pp.init_arg instead of accessing it directly, to make this more obvious
> and to make that future optimization easier.

Makes sense.
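
Something like the following, maybe (a hypothetical helper; the name and
placement are illustrative only, not part of the series):

/* Hide the init_arg abuse behind one accessor, so that where the
 * "wanted truesize" is stored can be changed later without touching
 * the callers.
 */
static inline size_t libie_rx_pp_truesize(const struct page_pool *pool)
{
	return (size_t)pool->p.init_arg;
}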

> 
>>
>>>
>>>> +
>>>> +	return page_pool_create(&pp);
>>>> +}
>>>> +EXPORT_SYMBOL_NS_GPL(libie_rx_page_pool_create, LIBIE);
>>
>> Thanks,
>> Olek
> 

Thanks,
Olek

Thread overview: 33+ messages
2023-07-05 15:55 [PATCH RFC net-next v4 0/9] net: intel: start The Great Code Dedup + Page Pool for iavf Alexander Lobakin
2023-07-05 15:55 ` [PATCH RFC net-next v4 1/9] net: intel: introduce Intel Ethernet common library Alexander Lobakin
2023-07-14 14:17   ` [Intel-wired-lan] " Przemek Kitszel
2023-07-05 15:55 ` [PATCH RFC net-next v4 2/9] iavf: kill "legacy-rx" for good Alexander Lobakin
2023-07-14 14:17   ` [Intel-wired-lan] " Przemek Kitszel
2023-07-05 15:55 ` [PATCH RFC net-next v4 3/9] iavf: drop page splitting and recycling Alexander Lobakin
2023-07-06 14:47   ` [Intel-wired-lan] " Alexander Duyck
2023-07-06 16:45     ` Alexander Lobakin
2023-07-06 17:06       ` Alexander Duyck
2023-07-10 13:13         ` Alexander Lobakin
2023-07-05 15:55 ` [PATCH RFC net-next v4 4/9] net: page_pool: add DMA-sync-for-CPU inline helpers Alexander Lobakin
2023-07-05 15:55 ` [PATCH RFC net-next v4 5/9] libie: add Rx buffer management (via Page Pool) Alexander Lobakin
2023-07-06 12:47   ` Yunsheng Lin
2023-07-06 16:28     ` Alexander Lobakin
2023-07-09  5:16       ` Yunsheng Lin
2023-07-10 13:25         ` Alexander Lobakin [this message]
2023-07-11 11:39           ` Yunsheng Lin
2023-07-11 16:37             ` Alexander Lobakin
2023-07-12 11:13               ` Yunsheng Lin
2023-07-05 15:55 ` [PATCH RFC net-next v4 6/9] iavf: switch to Page Pool Alexander Lobakin
2023-07-06 12:47   ` Yunsheng Lin
2023-07-06 16:38     ` Alexander Lobakin
2023-07-09  5:16       ` Yunsheng Lin
2023-07-10 13:34         ` Alexander Lobakin
2023-07-11 11:47           ` Yunsheng Lin
2023-07-18 13:56             ` Alexander Lobakin
2023-07-06 15:26   ` [Intel-wired-lan] " Alexander Duyck
2023-07-06 16:56     ` Alexander Lobakin
2023-07-06 17:28       ` Alexander Duyck
2023-07-10 13:18         ` Alexander Lobakin
2023-07-05 15:55 ` [PATCH RFC net-next v4 7/9] libie: add common queue stats Alexander Lobakin
2023-07-05 15:55 ` [PATCH RFC net-next v4 8/9] libie: add per-queue Page Pool stats Alexander Lobakin
2023-07-05 15:55 ` [PATCH RFC net-next v4 9/9] iavf: switch queue stats to libie Alexander Lobakin
