From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Zi Yan <zi.yan@sent.com>, Andrea Arcangeli <aarcange@redhat.com>,
Minchan Kim <minchan@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kirill.shutemov@linux.intel.com, akpm@linux-foundation.org,
vbabka@suse.cz, mgorman@techsingularity.net,
n-horiguchi@ah.jp.nec.com, khandual@linux.vnet.ibm.com,
Zi Yan <ziy@nvidia.com>
Subject: Re: [PATCH v3 03/14] mm: use pmd lock instead of racy checks in zap_pmd_range()
Date: Tue, 7 Feb 2017 20:45:36 +0300 [thread overview]
Message-ID: <20170207174536.GC5578@node.shutemov.name> (raw)
In-Reply-To: <589A0090.3050406@cs.rutgers.edu>
On Tue, Feb 07, 2017 at 11:14:56AM -0600, Zi Yan wrote:
>
>
> Kirill A. Shutemov wrote:
> > On Tue, Feb 07, 2017 at 09:11:05AM -0600, Zi Yan wrote:
> >>>> This causes memory leak or kernel crashing, if VM_BUG_ON() is enabled.
> >>> The problem is that numabalancing calls change_huge_pmd() under
> >>> down_read(mmap_sem), not down_write(mmap_sem) as the rest of users do.
> >>> It makes numabalancing the only code path beyond page fault that can turn
> >>> pmd_none() into pmd_trans_huge() under down_read(mmap_sem).
> >>>
> >>> This can lead to race when MADV_DONTNEED miss THP. That's not critical for
> >>> pagefault vs. MADV_DONTNEED race as we will end up with clear page in that
> >>> case. Not so much for change_huge_pmd().
> >>>
> >>> Looks like we need pmdp_modify() or something to modify protection bits
> >>> inplace, without clearing pmd.
> >>>
> >>> Not sure how to get crash scenario.
> >>>
> >>> BTW, Zi, have you observed the crash? Or is it based on code inspection?
> >>> Any backtraces?
> >> The problem should be very rare in the upstream kernel. I discover the
> >> problem in my customized kernel which does very frequent page migration
> >> and uses numa_protnone.
> >>
> >> The crash scenario I guess is like:
> >> 1. A huge page pmd entry is in the middle of being changed into either a
> >> pmd_protnone or a pmd_migration_entry. It is cleared to pmd_none.
> >>
> >> 2. At the same time, the application frees the vma this page belongs to.
> >
> > Em... no.
> >
> > This shouldn't be possible: your 1. must be done under down_read(mmap_sem).
> > And we only be able to remove vma under down_write(mmap_sem), so the
> > scenario should be excluded.
> >
> > What do I miss?
>
> You are right. This problem will not happen in the upstream kernel.
>
> The problem comes from my customized kernel, where I migrate pages away
> instead of reclaiming them when memory is under pressure. I did not take
> any mmap_sem when I migrate pages. So I got this error.
>
> It is a false alarm. Sorry about that. Thanks for clarifying the problem.
I think there's still a race between MADV_DONTNEED and
change_huge_pmd(.prot_numa=1) resulting in skipping THP by
zap_pmd_range(). It need to be addressed.
And MADV_FREE requires a fix.
So, minus one non-bug, plus two bugs.
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Zi Yan <zi.yan@sent.com>, Andrea Arcangeli <aarcange@redhat.com>,
Minchan Kim <minchan@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kirill.shutemov@linux.intel.com, akpm@linux-foundation.org,
vbabka@suse.cz, mgorman@techsingularity.net,
n-horiguchi@ah.jp.nec.com, khandual@linux.vnet.ibm.com,
Zi Yan <ziy@nvidia.com>
Subject: Re: [PATCH v3 03/14] mm: use pmd lock instead of racy checks in zap_pmd_range()
Date: Tue, 7 Feb 2017 20:45:36 +0300 [thread overview]
Message-ID: <20170207174536.GC5578@node.shutemov.name> (raw)
In-Reply-To: <589A0090.3050406@cs.rutgers.edu>
On Tue, Feb 07, 2017 at 11:14:56AM -0600, Zi Yan wrote:
>
>
> Kirill A. Shutemov wrote:
> > On Tue, Feb 07, 2017 at 09:11:05AM -0600, Zi Yan wrote:
> >>>> This causes memory leak or kernel crashing, if VM_BUG_ON() is enabled.
> >>> The problem is that numabalancing calls change_huge_pmd() under
> >>> down_read(mmap_sem), not down_write(mmap_sem) as the rest of users do.
> >>> It makes numabalancing the only code path beyond page fault that can turn
> >>> pmd_none() into pmd_trans_huge() under down_read(mmap_sem).
> >>>
> >>> This can lead to race when MADV_DONTNEED miss THP. That's not critical for
> >>> pagefault vs. MADV_DONTNEED race as we will end up with clear page in that
> >>> case. Not so much for change_huge_pmd().
> >>>
> >>> Looks like we need pmdp_modify() or something to modify protection bits
> >>> inplace, without clearing pmd.
> >>>
> >>> Not sure how to get crash scenario.
> >>>
> >>> BTW, Zi, have you observed the crash? Or is it based on code inspection?
> >>> Any backtraces?
> >> The problem should be very rare in the upstream kernel. I discover the
> >> problem in my customized kernel which does very frequent page migration
> >> and uses numa_protnone.
> >>
> >> The crash scenario I guess is like:
> >> 1. A huge page pmd entry is in the middle of being changed into either a
> >> pmd_protnone or a pmd_migration_entry. It is cleared to pmd_none.
> >>
> >> 2. At the same time, the application frees the vma this page belongs to.
> >
> > Em... no.
> >
> > This shouldn't be possible: your 1. must be done under down_read(mmap_sem).
> > And we only be able to remove vma under down_write(mmap_sem), so the
> > scenario should be excluded.
> >
> > What do I miss?
>
> You are right. This problem will not happen in the upstream kernel.
>
> The problem comes from my customized kernel, where I migrate pages away
> instead of reclaiming them when memory is under pressure. I did not take
> any mmap_sem when I migrate pages. So I got this error.
>
> It is a false alarm. Sorry about that. Thanks for clarifying the problem.
I think there's still a race between MADV_DONTNEED and
change_huge_pmd(.prot_numa=1) resulting in skipping THP by
zap_pmd_range(). It need to be addressed.
And MADV_FREE requires a fix.
So, minus one non-bug, plus two bugs.
--
Kirill A. Shutemov
next prev parent reply other threads:[~2017-02-07 17:45 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-02-05 16:12 [PATCH v3 00/14] mm: page migration enhancement for thp Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-05 16:12 ` [PATCH v3 01/14] mm: thp: make __split_huge_pmd_locked visible Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-06 6:12 ` Naoya Horiguchi
2017-02-06 6:12 ` Naoya Horiguchi
2017-02-06 12:10 ` Zi Yan
2017-02-06 15:02 ` Matthew Wilcox
2017-02-06 15:02 ` Matthew Wilcox
2017-02-06 15:03 ` Zi Yan
2017-02-05 16:12 ` [PATCH v3 02/14] mm: thp: create new __zap_huge_pmd_locked function Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-05 16:12 ` [PATCH v3 03/14] mm: use pmd lock instead of racy checks in zap_pmd_range() Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-06 4:02 ` Hillf Danton
2017-02-06 4:02 ` Hillf Danton
2017-02-06 4:14 ` Zi Yan
2017-02-06 4:14 ` Zi Yan
2017-02-06 7:43 ` Naoya Horiguchi
2017-02-06 7:43 ` Naoya Horiguchi
2017-02-06 13:02 ` Zi Yan
2017-02-06 23:22 ` Naoya Horiguchi
2017-02-06 23:22 ` Naoya Horiguchi
2017-02-06 16:07 ` Kirill A. Shutemov
2017-02-06 16:07 ` Kirill A. Shutemov
2017-02-06 16:32 ` Zi Yan
2017-02-06 17:35 ` Kirill A. Shutemov
2017-02-06 17:35 ` Kirill A. Shutemov
2017-02-07 13:55 ` Aneesh Kumar K.V
2017-02-07 13:55 ` Aneesh Kumar K.V
2017-02-07 14:12 ` Zi Yan
2017-02-07 14:19 ` Kirill A. Shutemov
2017-02-07 14:19 ` Kirill A. Shutemov
2017-02-07 15:11 ` Zi Yan
2017-02-07 15:11 ` Zi Yan
2017-02-07 16:37 ` Kirill A. Shutemov
2017-02-07 16:37 ` Kirill A. Shutemov
2017-02-07 17:14 ` Zi Yan
2017-02-07 17:14 ` Zi Yan
2017-02-07 17:45 ` Kirill A. Shutemov [this message]
2017-02-07 17:45 ` Kirill A. Shutemov
2017-02-13 0:25 ` Zi Yan
2017-02-13 0:25 ` Zi Yan
2017-02-13 10:59 ` Kirill A. Shutemov
2017-02-13 10:59 ` Kirill A. Shutemov
2017-02-13 14:40 ` Andrea Arcangeli
2017-02-13 14:40 ` Andrea Arcangeli
2017-02-05 16:12 ` [PATCH v3 04/14] mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1 Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-09 9:14 ` Naoya Horiguchi
2017-02-09 9:14 ` Naoya Horiguchi
2017-02-09 15:07 ` Zi Yan
2017-02-05 16:12 ` [PATCH v3 05/14] mm: mempolicy: add queue_pages_node_check() Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-05 16:12 ` [PATCH v3 06/14] mm: thp: introduce separate TTU flag for thp freezing Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-05 16:12 ` [PATCH v3 07/14] mm: thp: introduce CONFIG_ARCH_ENABLE_THP_MIGRATION Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-05 16:12 ` [PATCH v3 08/14] mm: thp: enable thp migration in generic path Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-09 9:15 ` Naoya Horiguchi
2017-02-09 9:15 ` Naoya Horiguchi
2017-02-09 15:17 ` Zi Yan
2017-02-09 23:04 ` Naoya Horiguchi
2017-02-09 23:04 ` Naoya Horiguchi
2017-02-14 20:13 ` Zi Yan
2017-02-14 20:13 ` Zi Yan
2017-02-05 16:12 ` [PATCH v3 09/14] mm: thp: check pmd migration entry in common path Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-09 9:16 ` Naoya Horiguchi
2017-02-09 9:16 ` Naoya Horiguchi
2017-02-09 17:36 ` Zi Yan
2017-02-05 16:12 ` [PATCH v3 10/14] mm: soft-dirty: keep soft-dirty bits over thp migration Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-05 16:12 ` [PATCH v3 11/14] mm: hwpoison: soft offline supports " Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-05 16:12 ` [PATCH v3 12/14] mm: mempolicy: mbind and migrate_pages support " Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-05 16:12 ` [PATCH v3 13/14] mm: migrate: move_pages() supports " Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-09 9:16 ` Naoya Horiguchi
2017-02-09 9:16 ` Naoya Horiguchi
2017-02-09 17:37 ` Zi Yan
2017-02-05 16:12 ` [PATCH v3 14/14] mm: memory_hotplug: memory hotremove " Zi Yan
2017-02-05 16:12 ` Zi Yan
2017-02-23 16:12 ` [PATCH v3 00/14] mm: page migration enhancement for thp Zi Yan
2017-02-23 16:12 ` Zi Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170207174536.GC5578@node.shutemov.name \
--to=kirill@shutemov.name \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=khandual@linux.vnet.ibm.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=minchan@kernel.org \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=vbabka@suse.cz \
--cc=zi.yan@cs.rutgers.edu \
--cc=zi.yan@sent.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.