From: Jakub Kicinski <kuba@kernel.org>
To: netdev@vger.kernel.org
Cc: almasrymina@google.com, hawk@kernel.org,
ilias.apalodimas@linaro.org, edumazet@google.com,
dsahern@gmail.com, michael.chan@broadcom.com, willemb@google.com,
Jakub Kicinski <kuba@kernel.org>
Subject: [RFC 00/12] net: huge page backed page_pool
Date: Fri, 7 Jul 2023 11:39:23 -0700 [thread overview]
Message-ID: <20230707183935.997267-1-kuba@kernel.org> (raw)
Hi!
This is an "early PoC" at best. It seems to work for a basic
traffic test but there's no uAPI and a lot more general polish
is needed.
The problem we're seeing is that performance of some older NICs
degrades quite a bit when IOMMU is used (in non-passthru mode).
There is a long tail of old NICs deployed, especially in PoPs/
/on edge. From a conversation I had with Eric a few months
ago it sounded like others may have similar issues. So I thought
I'd take a swing at getting page pool to feed drivers huge pages.
1G pages require hooking into early init via CMA but it works
just fine.
I haven't tested this with a real workload, because I'm still
waiting to get my hands on the right machine. But the experiment
with bnxt shows a ~90% reduction in IOTLB misses (670k -> 70k).
In terms of the missing parts - uAPI is definitely needed.
The rough plan would be to add memory config via the netdev
genl family. Should fit nicely there. Have the config stored
in struct netdevice. When page pool is created get to the netdev
and automatically select the provider without the driver even
knowing. Two problems with that are - 1) if the driver follows
the recommended flow of allocating new queues before freeing
old ones we will have page pools created before the old ones
are gone, which means we'd need to reserve 2x the number of
1G pages; 2) there's no callback to the driver to say "I did
something behind your back, don't worry about it, but recreate
your queues, please" so the change will not take effect until
some unrelated change like installing XDP. Which may be fine
in practice but is a bit odd.
Then we get into hand-wavy stuff like - if we can link page
pools to netdevs, we should also be able to export the page pool
stats via the netdev family instead doing it the ethtool -S.. ekhm..
"way". And if we start storing configs behind driver's back why
don't we also store other params, like ring size and queue count...
A lot of potential improvements as we iron out a new API...
Live tree: https://github.com/kuba-moo/linux/tree/pp-providers
Jakub Kicinski (12):
net: hack together some page sharing
net: create a 1G-huge-page-backed allocator
net: page_pool: hide page_pool_release_page()
net: page_pool: merge page_pool_release_page() with
page_pool_return_page()
net: page_pool: factor out releasing DMA from releasing the page
net: page_pool: create hooks for custom page providers
net: page_pool: add huge page backed memory providers
eth: bnxt: let the page pool manage the DMA mapping
eth: bnxt: use the page pool for data pages
eth: bnxt: make sure we make for recycle skbs before freeing them
eth: bnxt: wrap coherent allocations into helpers
eth: bnxt: hack in the use of MEP
Documentation/networking/page_pool.rst | 10 +-
arch/x86/kernel/setup.c | 6 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 154 +++--
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +
drivers/net/ethernet/engleder/tsnep_main.c | 2 +-
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +-
include/net/dcalloc.h | 28 +
include/net/page_pool.h | 36 +-
net/core/Makefile | 2 +-
net/core/dcalloc.c | 615 +++++++++++++++++
net/core/dcalloc.h | 96 +++
net/core/page_pool.c | 625 +++++++++++++++++-
12 files changed, 1478 insertions(+), 105 deletions(-)
create mode 100644 include/net/dcalloc.h
create mode 100644 net/core/dcalloc.c
create mode 100644 net/core/dcalloc.h
--
2.41.0
next reply other threads:[~2023-07-07 18:39 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-07 18:39 Jakub Kicinski [this message]
2023-07-07 18:39 ` [RFC 01/12] net: hack together some page sharing Jakub Kicinski
2023-07-07 18:39 ` [RFC 02/12] net: create a 1G-huge-page-backed allocator Jakub Kicinski
2023-07-07 18:39 ` [RFC 03/12] net: page_pool: hide page_pool_release_page() Jakub Kicinski
2023-07-07 18:39 ` [RFC 04/12] net: page_pool: merge page_pool_release_page() with page_pool_return_page() Jakub Kicinski
2023-07-10 16:07 ` Jesper Dangaard Brouer
2023-07-07 18:39 ` [RFC 05/12] net: page_pool: factor out releasing DMA from releasing the page Jakub Kicinski
2023-07-07 18:39 ` [RFC 06/12] net: page_pool: create hooks for custom page providers Jakub Kicinski
2023-07-07 19:50 ` Mina Almasry
2023-07-07 22:28 ` Jakub Kicinski
2023-07-07 18:39 ` [RFC 07/12] net: page_pool: add huge page backed memory providers Jakub Kicinski
2023-07-07 18:39 ` [RFC 08/12] eth: bnxt: let the page pool manage the DMA mapping Jakub Kicinski
2023-07-10 10:12 ` Jesper Dangaard Brouer
2023-07-26 6:56 ` Ilias Apalodimas
2023-07-07 18:39 ` [RFC 09/12] eth: bnxt: use the page pool for data pages Jakub Kicinski
2023-07-10 4:22 ` Michael Chan
2023-07-10 17:04 ` Jakub Kicinski
2023-07-07 18:39 ` [RFC 10/12] eth: bnxt: make sure we make for recycle skbs before freeing them Jakub Kicinski
2023-07-07 18:39 ` [RFC 11/12] eth: bnxt: wrap coherent allocations into helpers Jakub Kicinski
2023-07-07 18:39 ` [RFC 12/12] eth: bnxt: hack in the use of MEP Jakub Kicinski
2023-07-07 19:45 ` [RFC 00/12] net: huge page backed page_pool Mina Almasry
2023-07-07 22:45 ` Jakub Kicinski
2023-07-10 17:31 ` Mina Almasry
2023-07-11 15:49 ` Jesper Dangaard Brouer
2023-07-12 0:08 ` Jakub Kicinski
2023-07-12 11:47 ` Yunsheng Lin
2023-07-12 12:43 ` Jesper Dangaard Brouer
2023-07-12 17:01 ` Jakub Kicinski
2023-07-14 13:05 ` Yunsheng Lin
2023-07-12 14:00 ` Jesper Dangaard Brouer
2023-07-12 17:19 ` Jakub Kicinski
2023-07-13 10:07 ` Jesper Dangaard Brouer
2023-07-13 16:27 ` Jakub Kicinski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230707183935.997267-1-kuba@kernel.org \
--to=kuba@kernel.org \
--cc=almasrymina@google.com \
--cc=dsahern@gmail.com \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=michael.chan@broadcom.com \
--cc=netdev@vger.kernel.org \
--cc=willemb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.