From: Greg KH <gregkh@linuxfoundation.org>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: aarcange@redhat.com, akpm@linux-foundation.org,
dave@stgolabs.net, kirill.shutemov@linux.intel.com,
mgorman@techsingularity.net, mhocko@kernel.org,
n-horiguchi@ah.jp.nec.com, stable@vger.kernel.org,
torvalds@linux-foundation.org
Subject: Re: FAILED: patch "[PATCH] hugetlbfs: fix races and page leaks during migration" failed to apply to 4.14-stable tree
Date: Fri, 8 Mar 2019 12:08:55 +0100
Message-ID: <20190308110855.GA5326@kroah.com>
In-Reply-To: <bdcd3dcb-f7a6-ceee-f8a4-36e0e1c1e953@oracle.com>

On Mon, Mar 04, 2019 at 03:51:31PM -0800, Mike Kravetz wrote:
> On 3/2/19 12:12 AM, gregkh@linuxfoundation.org wrote:
> >
> > The patch below does not apply to the 4.14-stable tree.
> > If someone wants it applied there, or to any other stable or longterm
> > tree, then please email the backport, including the original git commit
> > id to <stable@vger.kernel.org>.
>
> From: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Mon, 4 Mar 2019 15:36:59 -0800
> Subject: [PATCH] hugetlbfs: fix races and page leaks during migration
>
> commit cb6acd01e2e43fd8bad11155752b7699c3d0fb76 upstream.
>
> hugetlb pages should only be migrated if they are 'active'. The routines
> set/clear_page_huge_active() modify the active state of hugetlb pages.
> When a new hugetlb page is allocated at fault time, set_page_huge_active
> is called before the page is locked. Therefore, another thread could
> race and migrate the page while it is being added to the page table by the
> fault code. This race is somewhat hard to trigger, but can be seen by
> strategically adding udelay to simulate worst case scheduling behavior.
> Depending on 'how' the code races, various BUG()s could be triggered.
>
> To address this issue, simply delay the set_page_huge_active call until
> after the page is successfully added to the page table.
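
The ordering change can be illustrated with a small single-threaded C model. It is a sketch only. struct toy_page, toy_fault(), toy_can_migrate() and the 'fixed' flag are hypothetical names that mimic the two states discussed above ("active" and "added to the page table"). The real fault path takes locks and calls set_page_huge_active() on a struct page. The model evaluates the race window at the point where a concurrent migration attempt could look at the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a hugetlb page; NOT the kernel's struct page. */
struct toy_page {
	bool active;   /* models page_huge_active() */
	bool mapped;   /* models "added to the page table by the fault code" */
};

/* Models the migration side: only 'active' pages are migration candidates. */
static bool toy_can_migrate(const struct toy_page *pg)
{
	return pg->active;
}

/*
 * Models one fault. 'fixed' selects the ordering: false marks the page
 * active before it is mapped (old behaviour), true marks it active only
 * after it has been added to the page table (the fix). Returns true if a
 * migration attempt in the middle of the fault would pick up a page that
 * is not yet mapped, i.e. the race window is open.
 */
static bool toy_fault(struct toy_page *pg, bool fixed)
{
	bool race_window_open;

	assert(pg != NULL);
	pg->active = false;
	pg->mapped = false;

	if (!fixed)
		pg->active = true;   /* old: set active before the page is mapped */

	/* A concurrent migration attempt could look at the page here. */
	race_window_open = toy_can_migrate(pg) && !pg->mapped;

	pg->mapped = true;           /* page added to the page table */
	if (fixed)
		pg->active = true;   /* new: set active only after mapping */

	return race_window_open;
}
```

With fixed set to false the model reports an open window. With fixed set to true it does not, and the page still ends up both mapped and active.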
>
> Hugetlb pages can also be leaked at migration time if the pages are
> associated with a file in an explicitly mounted hugetlbfs filesystem.
> For example, consider a two-node system with 4GB worth of huge pages
> available. A program mmaps a 2G file in a hugetlbfs filesystem. It
> then migrates the pages associated with the file from one node to
> another. When the program exits, huge page counts are as follows:
>
> node0
> 1024 free_hugepages
> 1024 nr_hugepages
>
> node1
> 0 free_hugepages
> 1024 nr_hugepages
>
> Filesystem Size Used Avail Use% Mounted on
> nodev 4.0G 2.0G 2.0G 50% /var/opt/hugepool
>
> That is as expected. 2G of huge pages are taken from the free_hugepages
> counts, and 2G is the size of the file in the explicitly mounted
> filesystem. If the file is then removed, the counts become:
>
> node0
> 1024 free_hugepages
> 1024 nr_hugepages
>
> node1
> 1024 free_hugepages
> 1024 nr_hugepages
>
> Filesystem Size Used Avail Use% Mounted on
> nodev 4.0G 2.0G 2.0G 50% /var/opt/hugepool
>
> Note that the filesystem still shows 2G of pages used, while there
> actually are no huge pages in use. The only way to 'fix' the
> filesystem accounting is to unmount the filesystem.
>
> If a hugetlb page is associated with an explicitly mounted filesystem,
> this information is contained in the page_private field. At migration
> time, this information is not preserved. To fix, simply transfer
> page_private from the old to the new page at migration time if necessary.
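
The accounting leak can be sketched as a toy C model. struct toy_subpool, struct toy_page and the toy_* functions are hypothetical. 'used' stands in for the filesystem's "Used" count and 'priv' stands in for page_private. As a simplifying assumption, the model tracks only the page that holds the file's data after migration and ignores what happens to the old page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the mounted filesystem's huge page accounting. */
struct toy_subpool {
	long used;   /* pages currently charged to the filesystem */
};

/* Hypothetical page: 'priv' models page_private() pointing at the subpool. */
struct toy_page {
	struct toy_subpool *priv;
};

/* Allocation on behalf of a file in the mounted fs charges the subpool. */
static void toy_alloc(struct toy_page *pg, struct toy_subpool *spool)
{
	assert(pg != NULL && spool != NULL);
	spool->used++;
	pg->priv = spool;
}

/* Models freeing: the charge is returned only if the page still knows its subpool. */
static void toy_free(struct toy_page *pg)
{
	if (pg->priv != NULL) {
		pg->priv->used--;
		pg->priv = NULL;
	}
}

/*
 * Models migrating the file's data from *oldpg to *newpg. 'transfer'
 * selects the fixed behaviour of carrying page_private over.
 */
static void toy_migrate(struct toy_page *oldpg, struct toy_page *newpg, bool transfer)
{
	newpg->priv = NULL;                /* a freshly allocated page starts clean */
	if (transfer && oldpg->priv != NULL) {
		newpg->priv = oldpg->priv; /* the fix: move the subpool pointer */
		oldpg->priv = NULL;
	}
}
```

Without the transfer, freeing the migrated page leaves the charge in place, like the stale 2.0G "Used" figure above. With the transfer, the count returns to zero.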
>
> There is a related race between removing a huge page from a file and
> migration. When a huge page is removed from the page cache, its mapping
> is cleared (page_mapping() returns NULL), yet page_private remains set
> until the page is actually freed by free_huge_page(). A page could be
> migrated while in this state. However, since page_mapping() is NULL, the
> hugetlbfs-specific routine that transfers page_private is not called,
> and the page count in the filesystem is leaked. To fix, check for this
> condition before migrating a huge page. If the condition is detected,
> return -EBUSY for the page.
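
The pre-migration check can be sketched as a predicate over a toy page. The struct and toy_migrate_check() are hypothetical. 'mapping' stands in for page_mapping() and 'priv' stands in for page_private. Only the condition itself (private still set, mapping already gone, result -EBUSY) comes from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical page model; NOT the kernel's struct page. */
struct toy_page {
	void *mapping;   /* models page_mapping(): NULL once removed from the page cache */
	void *priv;      /* models page_private(): cleared only when the page is freed */
};

/* Returns 0 if migration may proceed, -EBUSY for the racy in-between state. */
static int toy_migrate_check(const struct toy_page *pg)
{
	assert(pg != NULL);
	if (pg->priv != NULL && pg->mapping == NULL)
		return -EBUSY;   /* removed from the file but not yet freed */
	return 0;
}
```

A page still in the file passes the check. A page that has left the page cache but still carries page_private is refused, so the caller can try again later instead of leaking the count.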
>
> Cc: <stable@vger.kernel.org>
> Fixes: bcc54222309c ("mm: hugetlb: introduce page_huge_active")
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
> fs/hugetlbfs/inode.c | 12 ++++++++++++
> mm/hugetlb.c | 16 +++++++++++++---
> mm/migrate.c | 11 +++++++++++
> 3 files changed, 36 insertions(+), 3 deletions(-)

Thanks for all 4 of these, now queued up.

greg k-h