Re: [RFC 0/7] Support high-order page bulk allocation

From: Minchan Kim <minchan@kernel.org>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm <linux-mm@kvack.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Vlastimil Babka <vbabka@suse.cz>, John Dias <joaodias@google.com>,
	Suren Baghdasaryan <surenb@google.com>,
	pullip.cho@samsung.com
Subject: Re: [RFC 0/7] Support high-order page bulk allocation
Date: Mon, 17 Aug 2020 08:27:06 -0700
Message-ID: <20200817152706.GB3852332@google.com>
In-Reply-To: <4e2bd095-b693-9fed-40e0-ab538ec09aaa@redhat.com>

On Sun, Aug 16, 2020 at 02:31:22PM +0200, David Hildenbrand wrote:
> On 14.08.20 19:31, Minchan Kim wrote:
> > Some special HW requires bulk allocation of high-order pages.
> > For example, 4800 * order-4 pages.
> > 
> > To meet the requirement, one option is to use a CMA area, because
> > the page allocator with compaction under memory pressure easily
> > fails to meet the requirement and is too slow for 4800
> > allocations. However, CMA also has the following drawbacks:
> > 
> >  * 4800 order-4 cma_alloc calls are too slow
> > 
> > To avoid the slowness, we could try to allocate 300M of contiguous
> > memory once and then split it into order-4 chunks.
> > The problem with this approach is that the CMA allocation fails if
> > any one of the pages in that range cannot be migrated out, which
> > happens easily with fs writes under memory pressure.
> 
> Why not choose a value in between? Like try to allocate MAX_ORDER - 1
> chunks and split them. That would already heavily reduce the call frequency.

I think you meant this:

    alloc_pages(GFP_KERNEL|__GFP_NOWARN, MAX_ORDER - 1)

It would work if the system has lots of non-fragmented free memory.
However, once memory is fragmented, it doesn't work. That's why we have
easily seen even order-4 allocation failures in the field, and that's why
CMA exists.
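
To put numbers on the suggestion, here is a rough userspace sketch of
the arithmetic. It assumes 4 KiB base pages and MAX_ORDER == 11 (so
MAX_ORDER - 1 is order 10); both are assumptions, not something fixed
by this thread, and the SKETCH_* macros and helper names are made up
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB base pages, MAX_ORDER == 11. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_MAX_ORDER 11U

/* Bytes covered by one block of the given order. */
static unsigned long order_bytes(unsigned int order)
{
	assert(order < SKETCH_MAX_ORDER);
	return SKETCH_PAGE_SIZE << order;
}

/* How many small_order chunks one big_order block splits into. */
static unsigned long chunks_per_block(unsigned int big_order,
				      unsigned int small_order)
{
	assert(big_order >= small_order);
	return 1UL << (big_order - small_order);
}

/*
 * Number of big_order allocations needed to obtain nr_elem chunks of
 * small_order, rounded up.
 */
static unsigned long calls_needed(unsigned long nr_elem,
				  unsigned int big_order,
				  unsigned int small_order)
{
	unsigned long per = chunks_per_block(big_order, small_order);

	return (nr_elem + per - 1) / per;
}
```

Under those assumptions an order-4 page is 64 KiB, 4800 of them are
300 MiB, and calls_needed(4800, SKETCH_MAX_ORDER - 1, 4) comes out to
75 calls instead of 4800. The call count drops, but each of those 75
calls still needs a free 4 MiB block, which is exactly what
fragmentation takes away.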

CMA has more logic to isolate the memory during allocation/freeing, as
well as fragmentation avoidance, so its memory is less likely to be stolen
by others and its success ratio is higher. That's why I want this API
to be used with CMA or the movable zone.

A use case is that a device can set up an exclusive CMA area when the
system boots. When the device needs 4800 * order-4 pages, it could call
this bulk API against that area so that the allocation is effectively
guaranteed and fast enough.

> 
> I don't see a real need for a completely new range allocator function
> for this special case yet.
> 
> > 
> > To solve issues, this patch introduces alloc_pages_bulk.
> > 
> >   int alloc_pages_bulk(unsigned long start, unsigned long end,
> >                        unsigned int migratetype, gfp_t gfp_mask,
> >                        unsigned int order, unsigned int nr_elem,
> >                        struct page **pages);
> > 
> > It will investigate the range [start, end) and migrate movable pages
> > out of it on a best-effort basis (by upcoming patches) to make free
> > pages of the requested order.
> > 
> > The allocated pages will be returned via the pages parameter.
> > The return value represents how many pages of the requested order we got.
> > It could be less than the nr_elem the user requested.
> > 
> > /**
> >  * alloc_pages_bulk() -- tries to allocate high order pages
> >  * by batch from given range [start, end)
> >  * @start:      start PFN to allocate
> >  * @end:        one-past-the-last PFN to allocate
> >  * @migratetype:        migratetype of the underlying pageblocks (either
> >  *                      #MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
> >  *                      in range must have the same migratetype and it must
> >  *                      be either of the two.
> >  * @gfp_mask:   GFP mask to use during compaction
> >  * @order:      page order requested
> >  * @nr_elem:    the number of high-order pages to allocate
> >  * @pages:      page array pointer to store allocated pages (must
> >  *              have space for at least nr_elem elements)
> >  *
> >  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
> >  * aligned.  The PFN range must belong to a single zone.
> >  *
> >  * Return: the number of pages allocated on success or negative error code.
> >  * The allocated pages should be freed using __free_pages
> >  */
> > 
> > The test does an order-4 * 4800 allocation (i.e., 300MB in total) under
> > a kernel build workload. System RAM size is 1.5GB and CMA is 500M.
> > 
> > Using CMA to allocate the 300M, 10 trials were run and all 10 failed,
> > with big latency (up to several seconds).
> > 
> > With this alloc_pages_bulk API, 10 trials were run and 7 succeeded in
> > allocating all 4800 pages. The remaining 3 allocated 4799, 4789
> > and 4799. They all finished within 300ms.
> > 
> > This patchset is against next-20200813
> > 
> > Minchan Kim (7):
> >   mm: page_owner: split page by order
> >   mm: introduce split_page_by_order
> >   mm: compaction: deal with upcoming high-order page splitting
> >   mm: factor __alloc_contig_range out
> >   mm: introduce alloc_pages_bulk API
> >   mm: make alloc_pages_bulk best effort
> >   mm/page_isolation: avoid drain_all_pages for alloc_pages_bulk
> > 
> >  include/linux/gfp.h            |   5 +
> >  include/linux/mm.h             |   2 +
> >  include/linux/page-isolation.h |   1 +
> >  include/linux/page_owner.h     |  10 +-
> >  mm/compaction.c                |  64 +++++++----
> >  mm/huge_memory.c               |   2 +-
> >  mm/internal.h                  |   5 +-
> >  mm/page_alloc.c                | 198 ++++++++++++++++++++++++++-------
> >  mm/page_isolation.c            |  10 +-
> >  mm/page_owner.c                |   7 +-
> >  10 files changed, 230 insertions(+), 74 deletions(-)
> > 
> 
> 
> -- 
> Thanks,
> 
> David / dhildenb
>
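
For readers following the quoted description, the difference between
one large contiguous allocation and the best-effort bulk walk can be
modelled in userspace. This is only a toy model under stated
assumptions, not the code from the patches: the range is a bool array
where true marks a page that cannot be migrated out, chunks are taken
at order-aligned offsets, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if no page in [start, start + nr) is pinned (unmigratable). */
static bool model_block_usable(const bool *pinned, size_t start, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (pinned[start + i])
			return false;
	return true;
}

/*
 * All-or-nothing model: allocate the whole range at once, then split.
 * A single pinned page makes the entire request fail.
 * Returns the number of order-sized chunks obtained.
 */
static size_t model_contig_then_split(const bool *pinned, size_t nr_pages,
				      unsigned int order)
{
	size_t chunk = (size_t)1 << order;

	return model_block_usable(pinned, 0, nr_pages) ? nr_pages / chunk : 0;
}

/*
 * Best-effort model: walk order-aligned chunks, skip any chunk holding a
 * pinned page, stop once nr_elem chunks were collected. The start index
 * of each chunk is stored in out[]. Returns how many chunks were found,
 * which may be fewer than nr_elem.
 */
static size_t model_bulk(const bool *pinned, size_t nr_pages,
			 unsigned int order, size_t nr_elem, size_t *out)
{
	size_t chunk = (size_t)1 << order;
	size_t got = 0;

	assert(out != NULL || nr_elem == 0);
	for (size_t s = 0; s + chunk <= nr_pages && got < nr_elem; s += chunk)
		if (model_block_usable(pinned, s, chunk))
			out[got++] = s;
	return got;
}
```

With a 64-page range, order 4 and one pinned page at index 20, the
all-or-nothing model yields no chunks at all, while the best-effort
walk still returns the three chunks starting at 0, 32 and 48. When the
range is larger than the request, as with a 500M CMA area and a 300M
request, skipped chunks can be made up from the rest of the range.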

Thread overview: 27+ messages
2020-08-14 17:31 [RFC 0/7] Support high-order page bulk allocation Minchan Kim
2020-08-14 17:31 ` [RFC 1/7] mm: page_owner: split page by order Minchan Kim
2020-08-14 17:31 ` [RFC 2/7] mm: introduce split_page_by_order Minchan Kim
2020-08-14 17:31 ` [RFC 3/7] mm: compaction: deal with upcoming high-order page splitting Minchan Kim
2020-08-14 17:31 ` [RFC 4/7] mm: factor __alloc_contig_range out Minchan Kim
2020-08-14 17:31 ` [RFC 5/7] mm: introduce alloc_pages_bulk API Minchan Kim
2020-08-17 17:40   ` David Hildenbrand
2020-08-14 17:31 ` [RFC 6/7] mm: make alloc_pages_bulk best effort Minchan Kim
2020-08-14 17:31 ` [RFC 7/7] mm/page_isolation: avoid drain_all_pages for alloc_pages_bulk Minchan Kim
2020-08-14 17:40 ` [RFC 0/7] Support high-order page bulk allocation Matthew Wilcox
2020-08-14 20:55   ` Minchan Kim
2020-08-18  2:16     ` Cho KyongHo
2020-08-18  9:22     ` Cho KyongHo
2020-08-16 12:31 ` David Hildenbrand
2020-08-17 15:27   ` Minchan Kim [this message]
2020-08-17 15:45     ` David Hildenbrand
2020-08-17 16:30       ` Minchan Kim
2020-08-17 16:44         ` David Hildenbrand
2020-08-17 17:03           ` David Hildenbrand
2020-08-17 23:34           ` Minchan Kim
2020-08-18  7:42             ` Nicholas Piggin
2020-08-18  7:49             ` David Hildenbrand
2020-08-18 15:15               ` Minchan Kim
2020-08-18 15:58                 ` Matthew Wilcox
2020-08-18 16:22                   ` David Hildenbrand
2020-08-18 16:49                     ` Minchan Kim
2020-08-19  0:27                     ` Yang Shi
