linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Anton Blanchard <anton@samba.org>
Cc: dwg@au1.ibm.com, akpm@linux-foundation.org, hughd@google.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] hugepage: Allow parallelization of the hugepage fault path
Date: Thu, 21 Jul 2011 11:17:01 +0100	[thread overview]
Message-ID: <20110721101701.GC5212@csn.ul.ie> (raw)
In-Reply-To: <20110715160650.48d61245@kryten>

On Fri, Jul 15, 2011 at 04:06:50PM +1000, Anton Blanchard wrote:
> 
> Hi Mel,
> 
> > I haven't tested this patch yet but typically how I would test it is
> > multiple parallel instances of make func from libhugetlbfs. In
> > particular I would be looking out for counter corruption. Has
> > something like this been done? I know hugetlb_lock protects the
> > counters but the locking in there has turned into a bit of a mess so
> > it's easy to miss something.
> 
> Thanks for the suggestion and sorry for taking so long. Make check has
> the same PASS/FAIL count before and after the patches.
> 
> I also ran 16 copies of make func on a large box with 896 HW threads.
> Some of the tests that use shared memory were a bit upset, but that
> seems to be because we use a static key. It seems the tests were also
> fighting over the number of huge pages they wanted the system set to.
> 
> It got up to a load average of 13207, and heap-overflow consumed all my
> memory, a pretty good effort considering I have over 1TB of it.
> 
> After things settled down things were OK, apart from the fact that we
> have 20 huge pages unaccounted for:
> 
> HugePages_Total:   10000
> HugePages_Free:     9980
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> I verified there were no shared memory segments, and no files in the
> hugetlbfs filesystem (I double checked by unmounting it).
> 
> I can't see how this patch set would cause this. It seems like we can
> leak huge pages, perhaps in an error path. Anyway, I'll repost the
> patch set for comments.
> 

I didn't see any problem with the patch either. The locking should
be functionally equivalent for both private and shared mappings.

Out of curiousity, can you trigger the bug without the patches? It
could be a race on faulting the shared regions that is causing the
leakage.  Any chance this can be debugged minimally by finding out if
this is an accounting bug or if if a hugepage is leaked? Considering
the stress of the machine and its size, I'm guessing it's not trivially
reproducible anywhere else.

I think one possibility of where the bug is is when updating
the hugepage pool size with "make func". This is a scenario that does
not normally occur as hugepage pool resizing is an administrative task
that happens rarely. See this chunk for instance

        spin_lock(&hugetlb_lock);
        if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
                spin_unlock(&hugetlb_lock);
                return NULL;
        } else {
                h->nr_huge_pages++;
                h->surplus_huge_pages++;
        }
        spin_unlock(&hugetlb_lock);

        page = alloc_pages(htlb_alloc_mask|__GFP_COMP|
                                        __GFP_REPEAT|__GFP_NOWARN,
                                        huge_page_order(h));

        if (page && arch_prepare_hugepage(page)) {
                __free_pages(page, huge_page_order(h));
                return NULL;
        }

That thing is not updating the counters if arch_prepare_hugepage fails.
That function is a no-op on powerpc normally. That wasn't changed for
any reason was it?

Another possibility is a regression of
[4f16fc10: mm: hugetlb: fix hugepage memory leak in mincore()] so maybe
try a similar reproduction case of mincore?

Maybe also try putting assert_spin_locked at every point nr_free_pages
or nr_huge_pages is updated and see if one of them triggers?


-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2011-07-21 10:17 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-25  3:32 [PATCH 1/2] hugepage: Protect region tracking lists with its own spinlock Anton Blanchard
2011-01-25  3:34 ` [PATCH 2/2] hugepage: Allow parallelization of the hugepage fault path Anton Blanchard
2011-01-25 19:44   ` Eric B Munson
2011-01-26  9:24   ` Mel Gorman
2011-07-15  6:06     ` Anton Blanchard
2011-07-15  6:08       ` [PATCH 1/2] hugepage: Protect region tracking lists with its own spinlock Anton Blanchard
2011-07-18 15:24         ` Eric B Munson
2011-07-15  6:10       ` [PATCH 2/2] hugepage: Allow parallelization of the hugepage fault path Anton Blanchard
2011-07-18 15:24         ` Eric B Munson
2011-07-21 10:17       ` Mel Gorman [this message]
2011-07-15  7:52   ` Andi Kleen
2011-07-15 13:10     ` David Gibson
2011-01-25 19:43 ` [PATCH 1/2] hugepage: Protect region tracking lists with its own spinlock Eric B Munson
2011-01-26  9:07 ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2013-07-26 14:27 [PATCH 0/2] hugepage: optimize page fault path locking Davidlohr Bueso
2013-07-26 14:27 ` [PATCH 2/2] hugepage: allow parallelization of the hugepage fault path Davidlohr Bueso
2013-07-28  6:00   ` Hillf Danton
2013-07-29 19:16     ` Davidlohr Bueso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110721101701.GC5212@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=akpm@linux-foundation.org \
    --cc=anton@samba.org \
    --cc=dwg@au1.ibm.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).