Re: [RFC 00/12] net: huge page backed page_pool

From: Jakub Kicinski <kuba@kernel.org>
To: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: netdev@vger.kernel.org, brouer@redhat.com,
	almasrymina@google.com, hawk@kernel.org,
	ilias.apalodimas@linaro.org, edumazet@google.com,
	dsahern@gmail.com, michael.chan@broadcom.com, willemb@google.com
Subject: Re: [RFC 00/12] net: huge page backed page_pool
Date: Tue, 11 Jul 2023 17:08:38 -0700
Message-ID: <20230711170838.08adef4c@kernel.org>
In-Reply-To: <1721282f-7ec8-68bd-6d52-b4ef209f047b@redhat.com>

On Tue, 11 Jul 2023 17:49:19 +0200 Jesper Dangaard Brouer wrote:
> I see you have discovered that the next bottleneck is IOTLB misses.
> One of the techniques for reducing IOTLB misses is using huge pages.
> They are called "super-pages" in the article (below), which reports
> that this trick doesn't work on AMD (Pacifica arch).
> 
> I think you have convinced me that the pp_provider idea makes sense for
> *this* use-case, because it feels natural to extend PP with
> mitigations for IOTLB misses. (But I'm not 100% sure it fits Mina's
> use-case).

We're on the same page then (no pun intended).

> What is your page refcnt strategy for these huge pages? I assume this
> relies on the PP frags scheme, e.g. using page->pp_frag_count.
> Is this correctly understood?

Oh, I split the page into individual 4k pages after DMA mapping.
There's no need for the host memory to be a huge page. I mean, 
the actual kernel identity mapping is a huge page AFAIU, and the 
struct pages are allocated, anyway. We just need it to be a huge 
page at DMA mapping time.

So the pages from the huge page provider only differ from normal
alloc_page() pages by the fact that they are a part of a 1G DMA
mapping.
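
To make the "part of a 1G DMA mapping" point concrete, here is a rough
userspace sketch of the address arithmetic only. It is not the code from
the RFC: the helper name, the example base address and the 4k base page
size are assumptions made for illustration.

```python
# Sketch only: models address arithmetic, not the kernel code in the RFC.
HUGE_SIZE = 1 << 30   # one 1G DMA mapping
PAGE_SIZE = 4096      # assumed 4k base page size


def page_dma_addr(base_dma: int, index: int) -> int:
    """DMA address of the index-th 4k page inside a single 1G mapping.

    base_dma is the (hypothetical) address returned when the whole 1G
    region was mapped once; no per-page mapping is created here.
    """
    if not 0 <= index < HUGE_SIZE // PAGE_SIZE:
        raise IndexError("page index outside the 1G mapping")
    return base_dma + index * PAGE_SIZE


# Number of independent 4k pages that share the single mapping.
pages_per_mapping = HUGE_SIZE // PAGE_SIZE

if __name__ == "__main__":
    print(pages_per_mapping)
    print(hex(page_dma_addr(0x40000000, 1)))
```

Each 4k page is handed out on its own, but its DMA address is just an
offset into the one mapping, so the device side only ever sees the 1G
translation.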

I'm talking mostly about the 1G provider. 2M providers can be
implemented using various strategies, because a 2M allocation is
below MAX_ORDER.
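
As a quick sanity check of the order arithmetic (a sketch, assuming 4k
base pages and a largest buddy allocator order of 10, which is a common
x86-64 default but depends on the kernel configuration):

```python
PAGE_SHIFT = 12              # assumed 4k base pages
ASSUMED_LARGEST_ORDER = 10   # assumption: config dependent, not from the RFC


def alloc_order(size_bytes: int) -> int:
    """Page order needed for a power-of-two sized allocation."""
    pages = size_bytes >> PAGE_SHIFT
    if pages == 0 or pages & (pages - 1):
        raise ValueError("size must be a power-of-two number of pages")
    return pages.bit_length() - 1


order_2m = alloc_order(2 << 20)
order_1g = alloc_order(1 << 30)

if __name__ == "__main__":
    for name, order in (("2M", order_2m), ("1G", order_1g)):
        fits = order <= ASSUMED_LARGEST_ORDER
        print(f"{name}: order {order}, fits buddy allocator: {fits}")
```

Under those assumptions 2M can come straight from the page allocator,
while 1G cannot and needs a dedicated provider.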

> Generally the pp_providers will have to use the refcnt schemes
> supported by page_pool.  (Which is why I'm not 100% sure this fits
> Mina's use-case).
>
> [IOTLB details]:
> 
> As mentioned on [RFC 08/12] there are other techniques for reducing 
> IOTLB misses, described in:
>   IOMMU: Strategies for Mitigating the IOTLB Bottleneck
>    - https://inria.hal.science/inria-00493752/document
> 
> I took a deeper look and also discovered Intel's documentation:
>   - Intel virtualization technology for directed I/O, arch spec
>   - https://www.intel.com/content/www/us/en/content-details/774206/intel-virtualization-technology-for-directed-i-o-architecture-specification.html
> 
> One problem that is interesting to notice is how NICs access the packets
> via a ring-queue, which is likely larger than the number of IOTLB entries.
> Thus, a high chance of IOTLB misses.  They suggest marking pages with
> Eviction Hints (EH) that cause pages to be marked as Transient Mappings
> (TM), which allows the IOMMU to evict these faster (making room for
> others). And then combine this with prefetching.

Interesting, didn't know about EH.

> In this context of how fast a page is reused by the NIC and spatial
> locality, it is worth remembering that PP has two schemes: (1) the fast
> alloc cache that in certain cases can recycle pages (and it is based on
> a stack approach), (2) normal recycling via the ptr_ring, which takes
> longer before a page gets reused.

I read somewhere that Intel IOTLB can be as small as 256 entries.
So it seems pretty much impossible for it to cache accesses to 4k
pages through recycling. I thought that even 2M pages will start to
be problematic for multi queue devices (1k 4k entries on each ring x
32 rings == 128MB just sitting on the ring, let alone circulation).
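
The back-of-the-envelope numbers above can be reproduced with the sketch
below. The 256 entry IOTLB, 1k ring entries and 32 rings are the figures
from this mail; one 4k buffer per ring entry and perfectly packed
mappings are assumptions, so real numbers will be worse.

```python
PAGE_4K = 4 << 10
PAGE_2M = 2 << 20
PAGE_1G = 1 << 30

IOTLB_ENTRIES = 256   # "as small as 256 entries"
RING_ENTRIES = 1024   # 1k entries on each ring
RINGS = 32

# Memory sitting on the rings, assuming one 4k buffer per ring entry.
ring_bytes = RING_ENTRIES * RINGS * PAGE_4K


def iotlb_reach(page_size: int, entries: int = IOTLB_ENTRIES) -> int:
    """Bytes the IOTLB can cover if every entry maps one page_size page."""
    return page_size * entries


def min_mappings(footprint: int, page_size: int) -> int:
    """Fewest translations covering footprint, assuming perfect packing."""
    return -(-footprint // page_size)  # ceiling division


if __name__ == "__main__":
    print("on the rings:", ring_bytes >> 20, "MiB")
    for name, size in (("4k", PAGE_4K), ("2M", PAGE_2M), ("1G", PAGE_1G)):
        print(name,
              "reach:", iotlb_reach(size) >> 20, "MiB,",
              "translations for ring memory:", min_mappings(ring_bytes, size))
```

With 4k pages the ring memory alone needs far more translations than the
IOTLB holds. With 2M pages the best case already uses a quarter of a 256
entry IOTLB before counting pages in circulation, and with 1G a single
translation covers it.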
