From: Mel Gorman <mel@csn.ul.ie>
To: Theodore Tso <tytso@MIT.EDU>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Marcelo Tosatti <mtosatti@redhat.com>,
	Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
	Izik Eidus <ieidus@redhat.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Mike Travis <travis@sgi.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Chris Wright <chrisw@sous-sol.org>,
	bpicco@redhat.com,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Arnd Bergmann <arnd@arndb.de>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Subject: Re: [PATCH 00 of 41] Transparent Hugepage Support #17
Date: Tue, 6 Apr 2010 12:16:19 +0100
Message-ID: <20100406111619.GD17882@csn.ul.ie>
In-Reply-To: <BAA2AB49-DE66-4F22-B0E2-296522C2AF3E@mit.edu>

On Tue, Apr 06, 2010 at 06:32:28AM -0400, Theodore Tso wrote:
> 
> On Apr 6, 2010, at 5:30 AM, Mel Gorman wrote:
> > 
> > There is a good chance you could allocate a decent percentage of
> > memory as huge pages but as you are unlikely to have run hugeadm
> > --set-recommended-min_free_kbytes early in boot, it is also likely to thrash
> > heavily and the success rates will not be very impressive.
>
> Can you explain how hugeadm --set-recommended-min_free_kbytes works and
> how it achieves this magic?  Or can you send me a pointer to how this works?
> I've tried doing some Google searches, and I found the LWN article "Huge
> pages part 3: administration", but it doesn't go into a lot of detail how
> increasing vm.min_free_kbytes helps the anti fragmentation code.

Sure, the details of how and why it works are spread all over the place.
It's fairly simple really and related to how anti-fragmentation does its work.

Anti-frag divides a zone into "arenas", where an arena is usually the size of
the default huge page: 2M on x86-64, 16M on ppc64, etc. Its objective is to
keep pages of the same type (UNMOVABLE, RECLAIMABLE or MOVABLE) within the
same arenas using multiple free lists. If a page within the desired arena is
not available, it falls back to using one of the other arenas. A fallback is
a "fragmentation event" as traced by the mm_page_alloc_extfrag event. An event
is severe if a small page is taken from the other arena, and benign if a large
page (e.g. 2M) is moved to the desired list. The latter is benign because
pages of the same "migrate type" continue to be allocated within the same
arena.
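
To make the severe/benign distinction concrete, here is a toy model in C. It
is only a sketch: classify_fallback(), the enum and arena_order are names made
up for illustration and are not kernel identifiers. It assumes 4K base pages,
so a 2M arena is an order-9 block.

```c
#include <assert.h>

/* Hypothetical names for this sketch; none of them are kernel identifiers. */
enum fallback_kind { FALLBACK_BENIGN, FALLBACK_SEVERE };

/*
 * Classify a fallback by the order of the free page taken from another
 * migrate type's free list. A page of order n spans 2^n base pages, so with
 * 4K base pages a 2M arena corresponds to order 9.
 */
enum fallback_kind classify_fallback(unsigned int fallback_order,
                                     unsigned int arena_order)
{
        /* An arena is assumed to be larger than a single base page. */
        assert(arena_order > 0);

        /* A whole arena changes lists: later allocations stay grouped. */
        if (fallback_order >= arena_order)
                return FALLBACK_BENIGN;

        /* Only part of an arena is taken: two migrate types now share it. */
        return FALLBACK_SEVERE;
}
```

With arena_order 9, classify_fallback(0, 9) is severe (a single 4K page is
taken from another type's arena), while classify_fallback(9, 9) is benign (a
whole 2M block moves to the desired list).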

How often these "fragmentation events" occur depends on whether pages of the
desired type are always available. This in turn depends on free pages
being available, which is easiest to control with min_free_kbytes and is where
--set-recommended-min_free_kbytes comes in. By keeping a number of pages free,
the probability of a page of the desired type being available increases.
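
For reference, the current setting can be read from
/proc/sys/vm/min_free_kbytes. The following is a minimal sketch, not hugeadm's
code: read_kbytes_file() and min_free_shortfall() are hypothetical helpers that
read a value from a file and report how far it falls short of some recommended
value.

```c
#include <assert.h>
#include <stdio.h>

/* Real procfs path for the vm.min_free_kbytes sysctl. */
#define MIN_FREE_KBYTES_PATH "/proc/sys/vm/min_free_kbytes"

/* Hypothetical helper: read one non-negative decimal value, or -1 on error. */
long read_kbytes_file(const char *path)
{
        FILE *f;
        long value;

        assert(path != NULL);
        f = fopen(path, "r");
        if (f == NULL)
                return -1;
        if (fscanf(f, "%ld", &value) != 1 || value < 0)
                value = -1;
        fclose(f);
        return value;
}

/*
 * Hypothetical helper: how many more kbytes would have to be kept free to
 * reach a recommended value; 0 if the current value is already large enough.
 */
long min_free_shortfall(long current_kbytes, long recommended_kbytes)
{
        if (current_kbytes >= recommended_kbytes)
                return 0;
        return recommended_kbytes - current_kbytes;
}
```

A caller would pass MIN_FREE_KBYTES_PATH to read_kbytes_file() and compare the
result with whatever recommended value it has computed.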

As there are three migrate-types we currently care about from an anti-frag
perspective, the recommended min_free_kbytes value depends on the number of
zones in the system and on keeping 3 arenas' worth of pages free per
zone. Once set, there will, in most cases, be a page free of the required
type at allocation time. It can be observed in practice by tracing
mm_page_alloc_extfrag.

The next part of min_free_kbytes is related to the "reserve" blocks which
are only important to high-order atomic allocations. There is a maximum of
two reserve blocks per zone. For example, on a flat-memory system with one
grouping of memory, there would be a maximum of two reserve arenas. On a
NUMA system with two nodes, there would be a maximum of four. With multiple
groupings of memory such as 32-bit x86 with DMA, Normal and HighMem groups of
free-lists, there might be five reserve pageblocks, two each for the Normal
and HighMem groupings and just one for DMA as it is only 16MB worth of pages.

The final recommended min_free_kbytes value is the sum of the reserve arenas
and the migrate-type arenas, which ensures that pages of the required type
are free.

The function that works this out in libhugetlbfs is

long recommended_minfreekbytes(void)
{
        FILE *f;
        char buf[ZONEINFO_LINEBUF];
        int nr_zones = 0;
        long recommended_min;
        long pageblock_kbytes = kernel_default_hugepage_size() / 1024;

        /* Detect the number of zones in the system */
        f = fopen(PROCZONEINFO, "r");
        if (f == NULL) {
                WARNING("Unable to open " PROCZONEINFO);
                return 0;
        }
        while (fgets(buf, ZONEINFO_LINEBUF, f) != NULL) {
                if (strncmp(buf, "Node ", 5) == 0)
                        nr_zones++;
        }
        fclose(f);

        /* Make sure at least 2 pageblocks are free for MIGRATE_RESERVE */
        recommended_min = pageblock_kbytes * nr_zones * 2;

        /*
         * Make sure that on average at least two pageblocks are almost free
         * of another type, one for a migratetype to fall back to and a
         * second to avoid subsequent fallbacks of other types. There are 3
         * MIGRATE_TYPES we care about.
         */
        recommended_min += pageblock_kbytes * nr_zones * 3 * 3;
        return recommended_min;
}
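
As a worked example of the arithmetic, the sketch below redoes the two terms
as a pure function. recommended_min_for() is a hypothetical helper, not part
of libhugetlbfs, and the zone counts in the note after it are assumptions
about a typical machine.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the two terms of the function above, with
 * the pageblock size and zone count passed in instead of being detected.
 */
long recommended_min_for(long pageblock_kbytes, int nr_zones)
{
        long reserve, migrate;

        assert(pageblock_kbytes > 0 && nr_zones >= 0);

        /* Two reserve pageblocks per zone. */
        reserve = pageblock_kbytes * nr_zones * 2;

        /* 3 * 3 pageblocks per zone for the three migrate types. */
        migrate = pageblock_kbytes * nr_zones * 3 * 3;

        return reserve + migrate;
}
```

Assuming a single-node x86-64 machine with 2M huge pages (pageblock_kbytes =
2048) and three zones (DMA, DMA32 and Normal), recommended_min_for(2048, 3)
gives 12288 + 55296 = 67584 kbytes. A single zone with 16M huge pages would
give 16384 * 11 = 180224 kbytes.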

Does this clarify why min_free_kbytes helps and why the "recommended"
value is what it is?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


