From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Steve Capper <steve.capper@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>,
Gerald Schaefer <gerald.schaefer@de.ibm.com>,
Christian Borntraeger <borntraeger@de.ibm.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
linuxppc-dev@lists.ozlabs.org,
Catalin Marinas <catalin.marinas@arm.com>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
linux-s390@vger.kernel.org,
Sebastian Ott <sebott@linux.vnet.ibm.com>,
Steve Capper <steve.capper@arm.com>
Subject: Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
Date: Thu, 25 Feb 2016 19:01:11 +0300
Message-ID: <20160225160111.GB19707@node.shutemov.name>
In-Reply-To: <CAPvkgC3gfmgA9aCvCeqReKhjpkT5Y-qk-2fNO8puDjUs9EWzVw@mail.gmail.com>
On Thu, Feb 25, 2016 at 03:49:33PM +0000, Steve Capper wrote:
> On 23 February 2016 at 18:47, Will Deacon <will.deacon@arm.com> wrote:
> > [adding Steve, since he worked on THP for 32-bit ARM]
>
> Apologies for my late reply...
>
> >
> > On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
> >> On Tue, 23 Feb 2016 13:32:21 +0300
> >> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> >> > The theory is that the splitting bit effectively masked a bogus
> >> > pmd_present(): we had pmd_trans_splitting() in all code paths and that
> >> > prevented mm from touching the pmd. Once pmd_trans_splitting() was gone,
> >> > mm proceeds with the pmd where it shouldn't, and here's the boom.
> >>
> >> Well, I don't think pmd_present() == true is bogus for a trans_huge pmd under
> >> splitting, after all there is a page behind the pmd. Also, if it was
> >> bogus, and it would need to be false, why should it be marked !pmd_present()
> >> only at the pmdp_invalidate() step before the pmd_populate()? It clearly
> >> is pmd_present() before that, on all architectures, and if there was any
> >> problem/race with that, setting it to !pmd_present() at this stage would
> >> only (marginally) reduce the race window.
> >>
> >> BTW, PowerPC and Sparc seem to do the same thing in pmdp_invalidate(),
> >> i.e. they do not set pmd_present() == false, only mark it so that it would
> >> not generate a new TLB entry, just like on s390. After all, the function
> >> is called pmdp_invalidate(), and I think the comment in mm/huge_memory.c
> >> before that call is just a little ambiguous in its wording. When it says
> >> "mark the pmd notpresent" it probably means "mark it so that it will not
> >> generate a new TLB entry", which is also what the comment is really about:
> >> prevent huge and small entries in the TLB for the same page at the same
> >> time.
> >>
> >> FWIW, and since the ARM arch-list is already on cc, I think there is
> >> an issue with pmdp_invalidate() on ARM, since it also seems to clear
> >> the trans_huge (and formerly trans_splitting) bit, which actually makes
> >> the pmd !pmd_present(), but it violates the other requirement from the
> >> comment:
> >> "the pmd_trans_huge and pmd_trans_splitting must remain set at all times
> >> on the pmd until the split is complete for this pmd"
> >
> > I've only been testing this for arm64 (where I'm yet to see a problem),
> > but we use the generic pmdp_invalidate implementation from
> > mm/pgtable-generic.c there. On arm64, pmd_trans_huge will return true
> > after pmd_mknotpresent. On arm, it does look to be buggy, since it nukes
> > the entire entry... Steve?
>
> pmd_mknotpresent on arm looks inconsistent with the other
> architectures and can be changed.
>
> Having had a look at the usage, I can't see it causing an immediate
> problem (that needs to be addressed by an emergency patch).
> We don't have a notion of splitting pmds (so there is no splitting
> information to lose), and the only usage I could see of
> pmd_mknotpresent was:
>
> pmdp_invalidate(vma, haddr, pmd);
> pmd_populate(mm, pmd, pgtable);
>
> In mm/huge_memory.c, around line 3588.
>
> So we invalidate the entry (which puts down a faulting entry from
> pmd_mknotpresent and invalidates tlb), then immediately put down a
> table entry with pmd_populate.
>
> I have run a 32-bit ARM test kernel and exacerbated THP splits (that's
> what took me time), and I didn't notice any problems with 4.5-rc5.
If I read the code correctly, your pmd_mknotpresent() makes the pmd
pmd_none(), right? If so, it's a problem.
It introduces the race I described here:
https://marc.info/?l=linux-mm&m=144723658100512&w=4
Basically, if zap_pmd_range() sees pmd_none() between
pmdp_invalidate() and pmd_populate(), we're screwed.
The race window is small, but it's there.
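
To make the window concrete, here is a toy userspace sketch, not kernel
code. The bit layout and every toy_* name are made up for illustration and
do not match any real architecture. It only compares what a concurrent
walker would conclude if it sampled the entry after the invalidate step
and before pmd_populate(), under two hypothetical flavours of
"mknotpresent": one that wipes the whole entry and one that clears only a
valid bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: hypothetical bit layout, not taken from any real arch. */
typedef uint64_t toy_pmd_t;

#define TOY_PMD_VALID   (1ULL << 0)   /* hardware may load a TLB entry */
#define TOY_PMD_HUGE    (1ULL << 1)   /* entry maps a huge page */
#define TOY_PMD_PFN(x)  ((toy_pmd_t)(x) << 12)

/* An all-zero entry is what the walker treats as "nothing mapped here". */
static bool toy_pmd_none(toy_pmd_t pmd)
{
	return pmd == 0;
}

/* Variant A (assumed): wipe the whole entry. */
static toy_pmd_t toy_mknotpresent_clear_all(toy_pmd_t pmd)
{
	(void)pmd;
	return 0;
}

/* Variant B (assumed): clear only the valid bit, keep everything else. */
static toy_pmd_t toy_mknotpresent_keep_bits(toy_pmd_t pmd)
{
	return pmd & ~TOY_PMD_VALID;
}

/*
 * Decision of a zap-style walker that samples the entry inside the
 * window: it skips the range when the entry looks empty.
 */
static bool toy_walker_skips(toy_pmd_t pmd)
{
	return toy_pmd_none(pmd);
}

/* Checks both variants against the same starting huge entry. */
static void toy_self_check(void)
{
	toy_pmd_t huge = TOY_PMD_PFN(0x1234) | TOY_PMD_HUGE | TOY_PMD_VALID;

	/* Variant A: the walker sees an empty entry and skips the range. */
	assert(toy_walker_skips(toy_mknotpresent_clear_all(huge)));

	/* Variant B: the entry is still non-empty, so the walker does not skip. */
	assert(!toy_walker_skips(toy_mknotpresent_keep_bits(huge)));
}
```

With variant A the walker skips a range that is about to be repopulated
with a page table, which is the case I'm worried about. Variant B never
exposes an empty entry, so the walker cannot mistake it for an unmapped
range.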
--
Kirill A. Shutemov