From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org, Alexander Duyck <alexander.duyck@gmail.com>,
willemdebruijn.kernel@gmail.com, netdev@vger.kernel.org,
john.fastabend@gmail.com, Saeed Mahameed <saeedm@mellanox.com>,
bjorn.topel@intel.com,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Tariq Toukan <tariqt@mellanox.com>,
Mel Gorman <mgorman@techsingularity.net>,
brouer@redhat.com
Subject: Re: [RFC PATCH 2/4] page_pool: basic implementation of page_pool
Date: Mon, 9 Jan 2017 21:45:24 +0100 [thread overview]
Message-ID: <20170109214524.534f53a8@redhat.com> (raw)
In-Reply-To: <38d42210-de93-f16f-fa54-b149127fffeb@suse.cz>
On Mon, 9 Jan 2017 11:43:39 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:
> On 01/04/2017 12:00 PM, Jesper Dangaard Brouer wrote:
> >
> > On Tue, 3 Jan 2017 17:07:49 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> >> On 12/20/2016 02:28 PM, Jesper Dangaard Brouer wrote:
> >>> The focus in this patch is getting the API around page_pool figured out.
> >>>
> >>> The internal data structures for returning page_pool pages is not optimal.
> >>> This implementation use ptr_ring for recycling, which is known not to scale
> >>> in case of multiple remote CPUs releasing/returning pages.
> >>
> >> Just few very quick impressions...
> >>
> >>> A bulking interface into the page allocator is also left for later. (This
> >>> requires cooperation will Mel Gorman, who just send me some PoC patches for this).
> >>> ---
> > [...]
> >>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>> index 4424784ac374..11b4d8fb280b 100644
> >>> --- a/include/linux/mm.h
> >>> +++ b/include/linux/mm.h
> > [...]
> >>> @@ -765,6 +766,11 @@ static inline void put_page(struct page *page)
> >>> {
> >>> page = compound_head(page);
> >>>
> >>> + if (PagePool(page)) {
> >>> + page_pool_put_page(page);
> >>> + return;
> >>> + }
> >>
> >> Can't say I'm thrilled about a new page flag and a test in put_page().
> >
> > In patch 4/4, I'm scaling this back. Avoiding to modify the inlined
> > put_page(), by letting refcnt reach zero and catching pages belonging to
> > a page_pool in __free_pages_ok() and free_hot_cold_page(). (Result
> > in being more dependent on page-refcnt and loosing some performance).
> >
> > Still needing a new page flag, or some other method of identifying when
> > a page belongs to a page_pool.
>
> I see. I guess if all page pool pages were order>0 compound pages, you
> could hook this to the existing compound_dtor functionality instead.
The page_pool will support order>0 pages, but it is the order-0 case
that is optimized for.
> >> I don't know the full life cycle here, but isn't it that these pages
> >> will be specifically allocated and used in page pool aware drivers,
> >> so maybe they can be also specifically freed there without hooking to
> >> the generic page refcount mechanism?
> >
> > Drivers are already manipulating refcnt, to "splitup" the page (to
> > save memory) for storing more RX frames per page. Which is something
> > the page_pool still need to support. (XDP can request one page per
> > packet and gain the direct recycle optimization and instead waste mem).
> >
> > Notice, a page_pool aware driver doesn't handle the "free-side". Free
> > happens when the packet/page is being consumed, spliced or transmitted
> > out another non-page_pool-aware NIC driver. An interresting case is
> > packet-page waiting for DMA TX completion (on another NIC), thus need
> > to async-store info on page_pool and DMA-addr.
> >
> > Could extend the SKB (with a page_pool pointer)... BUT it defeats the
> > purpose of avoiding to allocate the SKB. E.g. in the cases where XDP
> > takes the route-decision and transmit/forward the "raw"-page (out
> > another NIC or into a "raw" socket), then we don't have a meta-data
> > structure to store this info in. Thus, this info is stored in struct
> > page.
>
> OK.
>
> >>> + */
> >>> struct address_space *mapping; /* If low bit clear, points to
> >>> * inode address_space, or NULL.
> >>> * If page mapped as anonymous
> >>> @@ -63,6 +69,7 @@ struct page {
> >>> union {
> >>> pgoff_t index; /* Our offset within mapping. */
> >>> void *freelist; /* sl[aou]b first free object */
> >>> + dma_addr_t dma_addr; /* used by page_pool */
> >>> /* page_deferred_list().prev -- second tail page */
> >>> };
> >>>
> >>> @@ -117,6 +124,8 @@ struct page {
> >>> * avoid collision and false-positive PageTail().
> >>> */
> >>> union {
> >>> + /* XXX: Idea reuse lru list, in page_pool to align with PCP */
> >>> +
> >>> struct list_head lru; /* Pageout list, eg. active_list
> >>> * protected by zone_lru_lock !
> >>> * Can be used as a generic list
> >
> > Guess, I can move it here, as the page cannot be on the LRU-list, while
> > being used (or VMA mapped). Right?
>
> Well typically the VMA mapped pages are those on the LRU list (anonymous
> or file). But I don't suppose you will want memory reclaim to free your
> pages, so seems lru field should be reusable for you.
Thanks for the info.
So, LRU-list area could be reusable, but I does not align so well with
the bulking API Mel just introduced/proposed, but still doable.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-01-09 20:45 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-20 13:28 [RFC PATCH 0/4] page_pool proof-of-concept early code Jesper Dangaard Brouer
2016-12-20 13:28 ` [RFC PATCH 1/4] doc: page_pool introduction documentation Jesper Dangaard Brouer
2016-12-20 13:28 ` [RFC PATCH 2/4] page_pool: basic implementation of page_pool Jesper Dangaard Brouer
2016-12-20 13:28 ` Jesper Dangaard Brouer
2017-01-03 16:07 ` Vlastimil Babka
2017-01-04 11:00 ` Jesper Dangaard Brouer
2017-01-09 10:43 ` Vlastimil Babka
2017-01-09 20:45 ` Jesper Dangaard Brouer [this message]
2017-01-09 21:58 ` Mel Gorman
2017-01-11 7:10 ` Jesper Dangaard Brouer
2017-01-11 7:10 ` Jesper Dangaard Brouer
2017-01-06 5:08 ` [lkp-developer] [page_pool] 50a8fe7622: kernel_BUG_at_mm/slub.c kernel test robot
2017-01-06 5:08 ` kernel test robot
2017-01-06 5:08 ` kernel test robot
2017-01-06 7:27 ` Jesper Dangaard Brouer
2017-01-06 7:27 ` Jesper Dangaard Brouer
2016-12-20 13:28 ` [RFC PATCH 3/4] mlx5: use page_pool Jesper Dangaard Brouer
2016-12-20 13:28 ` [RFC PATCH 4/4] page_pool: change refcnt model Jesper Dangaard Brouer
2016-12-20 13:28 ` Jesper Dangaard Brouer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170109214524.534f53a8@redhat.com \
--to=brouer@redhat.com \
--cc=alexander.duyck@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=bjorn.topel@intel.com \
--cc=john.fastabend@gmail.com \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=netdev@vger.kernel.org \
--cc=saeedm@mellanox.com \
--cc=tariqt@mellanox.com \
--cc=vbabka@suse.cz \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.