From: Andrew Morton <akpm@osdl.org>
To: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: "'David Gibson'" <david@gibson.dropbear.id.au>,
<linux-kernel@vger.kernel.org>
Subject: Re: Hugepage regression
Date: Tue, 10 Oct 2006 12:14:47 -0700
Message-ID: <20061010121447.4f8daf8f.akpm@osdl.org>
In-Reply-To: <000001c6ec92$871e5450$cb34030a@amr.corp.intel.com>
On Tue, 10 Oct 2006 10:35:50 -0700
"Chen, Kenneth W" <kenneth.w.chen@intel.com> wrote:
> David Gibson wrote on Tuesday, October 10, 2006 2:16 AM
> > > > It seems commit fe1668ae5bf0145014c71797febd9ad5670d5d05 causes a
> > > > hugepage regression. A git bisect points the finger at that commit
> > > > for causing an oops in the 'alloc-instantiate-race' test from the
> > > > libhugetlbfs testsuite.
> > > >
> > > > Still looking to determine the reason it breaks things.
> > > >
> > >
> > > It's assuming that unmap_hugepage_range() is always freeing these pages.
> > > If the page is shared by another mapping, bad things will happen: the
> > > threads fight over page->lru.
> > >
> > > Doing
> > >
> > > + if (page_count(page) == 1)
> > > list_add(&page->lru, &page_list);
> > >
> > > might help. But then we miss the tlb flush in rare racy conditions.
> >
> > Well, there'd need to be an else doing a put_page(), too.
> >
> > Looks like the fundamental problem is that a list is not a suitable
> > data structure for gathering here, since it's not truly local. We
> > should probably change it to a small array, like in the normal tlb
> > gather structure. If we run out of space we can force the tlb flush
> > and keep going.
>
>
> With the pending shared page table for hugetlb currently sitting in -mm,
> we serialize all hugetlb unmaps with a per-file i_mmap_lock. This
> race could well be solved by that pending patch?
>
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc1/2.6.19-rc1-mm1/broken-out/shared-page-table-for-hugetlb-page-v4.patch
>
We need something for 2.6.19 though. As David indicates, not using
page->lru should fix it (pagevec_add, pagevec_release would suit).
Or just a separate TLB invalidation per page. Is that likely to be
particularly expensive? It's the first one which hurts?
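
To make the array-based gather idea concrete, here is a minimal userspace
sketch, not kernel code. It assumes a stand-in struct page holding only a
reference count, a hypothetical batch size GATHER_MAX, and hypothetical
helpers gather_init/gather_add/gather_flush; the "TLB flush" is only a
counter. The point it illustrates is that the batch lives in the caller's
own array, so a page shared by two mappings can sit in two batches at once,
which a single list node embedded in the page (page->lru) cannot do.

```c
#include <assert.h>
#include <stddef.h>

#define GATHER_MAX 4            /* hypothetical batch size */

/* Stand-in for the kernel's struct page: only a reference count. */
struct page {
    int refcount;
};

/* Caller-local batch of pages waiting for a flush. */
struct gather {
    struct page *pages[GATHER_MAX];
    size_t nr;                  /* pages currently batched */
    unsigned int flushes;       /* simulated TLB flushes so far */
};

static void gather_init(struct gather *g)
{
    g->nr = 0;
    g->flushes = 0;
}

/* Simulate one TLB flush, then drop one reference per batched page. */
static void gather_flush(struct gather *g)
{
    size_t i;

    if (g->nr == 0)
        return;
    g->flushes++;
    for (i = 0; i < g->nr; i++)
        g->pages[i]->refcount--;
    g->nr = 0;
}

/* Batch a page; when the array fills up, force a flush and keep going. */
static void gather_add(struct gather *g, struct page *p)
{
    assert(g->nr < GATHER_MAX);
    g->pages[g->nr++] = p;
    if (g->nr == GATHER_MAX)
        gather_flush(g);
}
```

Each unmapping thread would own its struct gather, so two threads unmapping
the same shared page touch nothing in common except the reference count,
which in real code would need to be atomic.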
Thread overview: 13+ messages
2006-10-10 8:47 Hugepage regression David Gibson
2006-10-10 9:04 ` Andrew Morton
2006-10-10 9:15 ` David Gibson
2006-10-10 17:35 ` Chen, Kenneth W
2006-10-10 19:14 ` Andrew Morton [this message]
2006-10-10 19:18 ` Hugh Dickins
2006-10-10 19:30 ` Chen, Kenneth W
2006-10-10 20:10 ` Hugh Dickins
2006-10-10 23:03 ` Chen, Kenneth W
2006-10-13 17:03 ` Hugh Dickins
2006-10-10 23:34 ` Chen, Kenneth W
2006-10-11 1:18 ` 'David Gibson'
2006-10-11 2:47 ` Chen, Kenneth W