From: Yunsheng Lin <yunshenglin0825@gmail.com>
To: Alexander Lobakin <aleksander.lobakin@intel.com>,
Yunsheng Lin <linyunsheng@huawei.com>
Cc: "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
Michal Kubiak <michal.kubiak@intel.com>,
Larysa Zaremba <larysa.zaremba@intel.com>,
Alexander Duyck <alexanderduyck@fb.com>,
David Christensen <drc@linux.vnet.ibm.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Paul Menzel <pmenzel@molgen.mpg.de>,
netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC net-next v4 5/9] libie: add Rx buffer management (via Page Pool)
Date: Sun, 9 Jul 2023 13:16:33 +0800 [thread overview]
Message-ID: <71a8bab4-1a1d-cb1a-d75c-585a14c6fb2e@gmail.com> (raw)
In-Reply-To: <09a9a9ef-cf77-3b60-2845-94595a42cf3e@intel.com>
On 2023/7/7 0:28, Alexander Lobakin wrote:
> From: Yunsheng Lin <linyunsheng@huawei.com>
> Date: Thu, 6 Jul 2023 20:47:28 +0800
>
>> On 2023/7/5 23:55, Alexander Lobakin wrote:
>>
>>> +/**
>>> + * libie_rx_page_pool_create - create a PP with the default libie settings
>>> + * @napi: &napi_struct covering this PP (no usage outside its poll loops)
>>> + * @size: size of the PP, usually simply Rx queue len
>>> + *
>>> + * Returns &page_pool on success, casted -errno on failure.
>>> + */
>>> +struct page_pool *libie_rx_page_pool_create(struct napi_struct *napi,
>>> + u32 size)
>>> +{
>>> + struct page_pool_params pp = {
>>> + .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
>>> + .order = LIBIE_RX_PAGE_ORDER,
>>> + .pool_size = size,
>>> + .nid = NUMA_NO_NODE,
>>> + .dev = napi->dev->dev.parent,
>>> + .napi = napi,
>>> + .dma_dir = DMA_FROM_DEVICE,
>>> + .offset = LIBIE_SKB_HEADROOM,
>>
>> I think it worth mentioning that the '.offset' is not really accurate
>> when the page is split, as we do not really know what is the offset of
>> the frag of a page except for the first frag.
>
> Yeah, this is read as "offset from the start of the page or frag to the
> actual frame start, i.e. its Ethernet header" or "this is just
> xdp->data - xdp->data_hard_start".
So the problem seems to be if most of drivers have a similar reading as
libie does here, as .offset seems to have a clear semantics which is used
to skip dma sync operation for buffer range that is not touched by the
dma operation. Even if it happens to have the same value of "offset from
the start of the page or frag to the actual frame start", I am not sure
it is future-proofing to reuse it.
When page frag is added, I didn't really give much thought about that as
we use it in a cache coherent system.
It seems we might need to extend or update that semantics if we really want
to skip dma sync operation for all the buffer ranges that are not touched
by the dma operation for page split case.
Or Skipping dma sync operation for all untouched ranges might not be worth
the effort, because it might need a per frag dma sync operation, which is
more costly than a batched per page dma sync operation. If it is true, page
pool already support that currently as my understanding, because the dma
sync operation is only done when the last frag is released/freed.
>
>>
>>> + };
>>> + size_t truesize;
>>> +
>>> + pp.max_len = libie_rx_sync_len(napi->dev, pp.offset);
As mentioned above, if we depend on the last released/freed frag to do the
dma sync, the pp.max_len might need to cover all the frag.
>>> +
>>> + /* "Wanted" truesize, passed to page_pool_dev_alloc() */
>>> + truesize = roundup_pow_of_two(SKB_HEAD_ALIGN(pp.offset + pp.max_len));
>>> + pp.init_arg = (void *)truesize;
>>
>> I am not sure if it is correct to use pp.init_arg here, as it is supposed to
>> be used along with init_callback. And if we want to change the implemetation
>
> I know. I abused it to save 1 function argument :p It's safe since I
> don't use init_callback (not an argument).
> I was thinking also of having a union in PP params or even a new field
> like "wanted true size", so that your function could even take values
> from there in certain cases (e.g. if I pass 0 as parameter).
>
>> of init_callback, we may stuck with it as the driver is using it very
>> differently here.
>>
>> Is it possible to pass the 'wanted true size' by adding a parameter for
>> libie_rx_alloc()?
>
> Yes, or I could store it somewhere on the ring, but looks uglier =\ This
> one does as well to some degree, but at least hidden in the library and
> doesn't show up in the drivers :D
It seems most hw driver know the size of memory it needs when creating
the ring/queue, setting the frag size and deciding how many is a page
split into before allocation seems like a possible future optimization.
For now, it would be better to add helper to acess pp.init_arg at least
instead of acess pp.init_arg directly to make it more obvious and make
the future optimization more easier.
>
>>
>>> +
>>> + return page_pool_create(&pp);
>>> +}
>>> +EXPORT_SYMBOL_NS_GPL(libie_rx_page_pool_create, LIBIE);
>
> Thanks,
> Olek
next prev parent reply other threads:[~2023-07-09 5:16 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-05 15:55 [PATCH RFC net-next v4 0/9] net: intel: start The Great Code Dedup + Page Pool for iavf Alexander Lobakin
2023-07-05 15:55 ` [PATCH RFC net-next v4 1/9] net: intel: introduce Intel Ethernet common library Alexander Lobakin
2023-07-14 14:17 ` [Intel-wired-lan] " Przemek Kitszel
2023-07-05 15:55 ` [PATCH RFC net-next v4 2/9] iavf: kill "legacy-rx" for good Alexander Lobakin
2023-07-14 14:17 ` [Intel-wired-lan] " Przemek Kitszel
2023-07-05 15:55 ` [PATCH RFC net-next v4 3/9] iavf: drop page splitting and recycling Alexander Lobakin
2023-07-06 14:47 ` [Intel-wired-lan] " Alexander Duyck
2023-07-06 16:45 ` Alexander Lobakin
2023-07-06 17:06 ` Alexander Duyck
2023-07-10 13:13 ` Alexander Lobakin
2023-07-05 15:55 ` [PATCH RFC net-next v4 4/9] net: page_pool: add DMA-sync-for-CPU inline helpers Alexander Lobakin
2023-07-05 15:55 ` [PATCH RFC net-next v4 5/9] libie: add Rx buffer management (via Page Pool) Alexander Lobakin
2023-07-06 12:47 ` Yunsheng Lin
2023-07-06 16:28 ` Alexander Lobakin
2023-07-09 5:16 ` Yunsheng Lin [this message]
2023-07-10 13:25 ` Alexander Lobakin
2023-07-11 11:39 ` Yunsheng Lin
2023-07-11 16:37 ` Alexander Lobakin
2023-07-12 11:13 ` Yunsheng Lin
2023-07-05 15:55 ` [PATCH RFC net-next v4 6/9] iavf: switch to Page Pool Alexander Lobakin
2023-07-06 12:47 ` Yunsheng Lin
2023-07-06 16:38 ` Alexander Lobakin
2023-07-09 5:16 ` Yunsheng Lin
2023-07-10 13:34 ` Alexander Lobakin
2023-07-11 11:47 ` Yunsheng Lin
2023-07-18 13:56 ` Alexander Lobakin
2023-07-06 15:26 ` [Intel-wired-lan] " Alexander Duyck
2023-07-06 16:56 ` Alexander Lobakin
2023-07-06 17:28 ` Alexander Duyck
2023-07-10 13:18 ` Alexander Lobakin
2023-07-05 15:55 ` [PATCH RFC net-next v4 7/9] libie: add common queue stats Alexander Lobakin
2023-07-05 15:55 ` [PATCH RFC net-next v4 8/9] libie: add per-queue Page Pool stats Alexander Lobakin
2023-07-05 15:55 ` [PATCH RFC net-next v4 9/9] iavf: switch queue stats to libie Alexander Lobakin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=71a8bab4-1a1d-cb1a-d75c-585a14c6fb2e@gmail.com \
--to=yunshenglin0825@gmail.com \
--cc=aleksander.lobakin@intel.com \
--cc=alexanderduyck@fb.com \
--cc=davem@davemloft.net \
--cc=drc@linux.vnet.ibm.com \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=intel-wired-lan@lists.osuosl.org \
--cc=kuba@kernel.org \
--cc=larysa.zaremba@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linyunsheng@huawei.com \
--cc=maciej.fijalkowski@intel.com \
--cc=michal.kubiak@intel.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pmenzel@molgen.mpg.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).