From: Yunsheng Lin <yunshenglin0825@gmail.com>
To: Alexander Duyck <alexander.duyck@gmail.com>,
Yunsheng Lin <linyunsheng@huawei.com>
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Lorenzo Bianconi <lorenzo@kernel.org>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Eric Dumazet <edumazet@google.com>
Subject: Re: [PATCH net-next v2 2/3] page_pool: support non-frag page for page_pool_alloc_frag()
Date: Sat, 3 Jun 2023 12:20:15 +0800 [thread overview]
Message-ID: <f5e372ca-e637-4873-0bea-b1b19c623124@gmail.com> (raw)
In-Reply-To: <CAKgT0UchHBO+kyPZMYJR7JHfqYsk+qSeuvXzA-H9w3VH-9Tfrg@mail.gmail.com>
On 2023/6/2 23:57, Alexander Duyck wrote:
> On Fri, Jun 2, 2023 at 5:23 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
...
>>
>> According to my defination in this patchset:
>> frag page: page alloced from page_pool_alloc_frag() with page->pp_frag_count
>> being greater than one.
>> non-frag page:page alloced return from both page_pool_alloc_frag() and
>> page_pool_alloc_pages() with page->pp_frag_count being one.
>>
>> I assume the above 'non-page pool pages' refer to what I call as 'non-frag
>> page' alloced return from both page_pool_alloc_frag(), right? And it is
>> still about doing the (size << 1 > max_size)' checking at the begin instead
>> of at the middle right now to avoid extra steps for 'non-frag page' case?
>
> Yeah, the non-page I was referring to were you mono-frag pages.
I was using 'frag page' and 'non-frag page' per the defination above,
and you were using 'mono-frag' mostly and 'non-page' sometimes.
I am really confused by them as I felt like I got what they meant and
then I was lost when you used them in the next comment. I really hope
that you could describe what do you mean in more detailed by using
'mono-frag pages' and 'non-page', so that we can choose the right
naming to continue the discussion without further misunderstanding
and confusion.
>
>>>
>>>>>
>>>>>> - if (page && *offset + size > max_size) {
>>>>>> + if (page) {
>>>>>> + *offset = pool->frag_offset;
>>>>>> +
>>>>>> + if (*offset + size <= max_size) {
>>>>>> + pool->frag_users++;
>>>>>> + pool->frag_offset = *offset + size;
>>>>>> + alloc_stat_inc(pool, fast);
>>>>>> + return page;
>>>>
>>>> Note that we still allow frag page here when '(size << 1 > max_size)'.
>>
>> This is the optimization I was taking about: suppose we start
>> from a clean state with 64K page size, if page_pool_alloc_frag()
>> is called with size being 2K and then 34K, we only need one page
>> to satisfy caller's need as we do the '*offset + size > max_size'
>> checking before the '(size << 1 > max_size)' checking.
>
> The issue is the unaccounted for waste. We are supposed to know the
> general size of the frags being used so we can compute truesize. If
Note, for case of veth and virtio_net, the driver may only know the
current frag size when calling page_pool_alloc_frag(), it does not
konw what is the size of the frags will be used next time, how exactly
are we going to compute the truesize for cases with different frag
size? As far as I can tell, we may only do something like virtio_net
is doing with 'page_frag' for the last frag as below, for other frags,
the truesize may need to take accounting to the aligning requirement:
https://elixir.bootlin.com/linux/v6.3.5/source/drivers/net/virtio_net.c#L1638
> for example you are using an order 3 page and you are splitting it
> between a 2K and a 17K fragment the 2K fragments will have a massive
> truesize underestimate that can lead to memory issues if those smaller
> fragments end up holding onto the pages.
>
> As such we should try to keep the small fragments away from anything
> larger than half of the page.
IMHO, doing the above only alleviate the problem. How is above splitting
different from splitting it evently with 16 2K fragments, and when 15 frag
is released, we still have the last 2K fragment holding onto 32K memory,
doesn't that also cause massive truesize underestimate? Not to mention that
for system with 64K page size.
In RFC patch below, 'page_pool_frag' is used to report the truesize, but
I was thinking both 'page_frag' and 'page_frag_cache' both have a similiar
problem, so I dropped it in V1 and left that as a future improvement.
I can pick it up again if 'truesize' is really the concern, but we have to
align on how to compute the truesize here first.
https://patchwork.kernel.org/project/netdevbpf/patch/20230516124801.2465-4-linyunsheng@huawei.com/
next prev parent reply other threads:[~2023-06-03 4:20 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-29 9:28 [PATCH net-next v2 0/3] support non-frag page for page_pool_alloc_frag() Yunsheng Lin
2023-05-29 9:28 ` [PATCH net-next v2 1/3] page_pool: unify frag page and non-frag page handling Yunsheng Lin
2023-05-30 15:07 ` Alexander H Duyck
2023-05-31 11:55 ` Yunsheng Lin
2023-06-02 16:37 ` Alexander H Duyck
2023-06-03 12:59 ` Yunsheng Lin
2023-06-05 14:58 ` Alexander Duyck
2023-06-06 12:41 ` Yunsheng Lin
2023-06-06 15:33 ` Alexander Duyck
2023-06-07 12:46 ` Yunsheng Lin
2023-06-07 15:05 ` Alexander Duyck
2023-05-29 9:28 ` [PATCH net-next v2 2/3] page_pool: support non-frag page for page_pool_alloc_frag() Yunsheng Lin
2023-05-30 15:07 ` Alexander H Duyck
2023-05-31 12:19 ` Yunsheng Lin
2023-06-01 12:21 ` Yunsheng Lin
2023-06-01 18:14 ` Alexander Duyck
2023-06-02 12:23 ` Yunsheng Lin
2023-06-02 15:57 ` Alexander Duyck
2023-06-03 4:20 ` Yunsheng Lin [this message]
2023-06-05 15:08 ` Alexander Duyck
2023-05-29 9:28 ` [PATCH net-next v2 3/3] page_pool: remove PP_FLAG_PAGE_FRAG flag Yunsheng Lin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f5e372ca-e637-4873-0bea-b1b19c623124@gmail.com \
--to=yunshenglin0825@gmail.com \
--cc=alexander.duyck@gmail.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linyunsheng@huawei.com \
--cc=lorenzo@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox