From: toshi.kani@hpe.com (Kani, Toshi)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2 2/2] x86/mm: implement free pmd/pte page interfaces
Date: Fri, 27 Apr 2018 14:31:51 +0000 [thread overview]
Message-ID: <1524839460.2693.531.camel@hpe.com> (raw)
In-Reply-To: <20180427073719.GT15462@8bytes.org>
On Fri, 2018-04-27 at 09:37 +0200, joro at 8bytes.org wrote:
> On Thu, Apr 26, 2018 at 10:30:14PM +0000, Kani, Toshi wrote:
> > Thanks for the clarification. After reading through SDM one more time, I
> > agree that we need a TLB purge here. Here is my current understanding.
> >
> > - INVLPG purges both TLB and paging-structure caches. So, PMD cache was
> > purged once.
> > - However, processor may cache this PMD entry later in speculation
> > since it has p-bit set. (This is where my misunderstanding was.
> > Speculation is not allowed to access a target address, but it may still
> > cache this PMD entry.)
> > - A single INVLPG on each processor purges this PMD cache. It does not
> > need a range purge (which was already done).
> >
> > Does it sound right to you?
>
> The right fix is to first synchronize the changes when the PMD/PUD is
> cleared and then flush the TLB system-wide. After that is done you can
> free the page.
Agreed. This can be done on top of this patch.
> But doing all that in the pud/pmd_free_pmd/pte_page() functions is too
> expensive, as the TLB flush requires to send IPIs to all cores in the
> system, and that every time the function is called.
>
> So what needs to be done is to fix this from high-level ioremap code to
> first unmap all required PTE/PMD pages and collect them in a list. When
> that is done you can synchronize the changes with the other page-tables
> in the system and do one system-wide TLB flush. When that is complete
> you can free the pages on the list that were collected while unmapping.
>
> Then the new mappings can be established and again synchronized with the
> other page-tables in the system.
Yes, and this patch was designed to work in such way. Please note that
this patch added pud_free_pmd_page() and pmd_free_pte_page() to the
ioremap() path when and only when it creates a pud or pmd map. This
assures the following preconditions are met without overhead.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been done for a target pud/pmd address
range.
So, we can add the step 2 on top of this patch.
1. Clear pud/pmd entry.
2. System wide TLB flush <-- TO BE ADDED BY NEW PATCH
3. Free its underlining pmd/pte page.
> > As for the BUG_ON issue, are you able to reproduce this issue? If so,
> > would you be able to test the fix?
>
> Yes, I can reproduce the BUG_ON with my PTI patches and a fedora-i386
> VM.
Great!
> I already ran into the issue before your patches were merged upstream,
> but my "fix" is different because it just prevents huge-mappings when
> there were smaller mappings before. See
>
> e3e288121408 x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
>
> for details. This patch does not fix the base-problem, but hides it
> again, as the real fix needs some more work across architectures.
Right. Patch 1/2 of this series made the same fix as well. See:
b6bdb7517c3d mm/vmalloc: add interfaces to free unmapped page table
> Your patch actually makes the problem worse, without it the PTE/PMD pages
> were just leaked, so that they could not be reused. But with your patch
> the pages can be used again and the page-walker might establish TLB
> entries based on random content the new owner writes to it. This can
> lead to all kinds of random and very hard to debug data corruption
> issues.
>
> So until we make the generic ioremap code in lib/ioremap.c smarter about
> unmapping/remapping ranges the best solution is making my fix work again
> by reverting your patch.
We do not need to revert this patch. We can make the above change I
mentioned.
Thanks,
-Toshi
WARNING: multiple messages have this Message-ID (diff)
From: "Kani, Toshi" <toshi.kani@hpe.com>
To: "joro@8bytes.org" <joro@8bytes.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"bp@suse.de" <bp@suse.de>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"guohanjun@huawei.com" <guohanjun@huawei.com>,
"wxf.wang@hisilicon.com" <wxf.wang@hisilicon.com>,
"stable@vger.kernel.org" <stable@vger.kernel.org>,
"x86@kernel.org" <x86@kernel.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"willy@infradead.org" <willy@infradead.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
"mingo@redhat.com" <mingo@redhat.com>,
"will.deacon@arm.com" <will.deacon@arm.com>,
"Hocko, Michal" <MHocko@suse.com>,
"cpandya@codeaurora.org" <cpandya@codeaurora.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v2 2/2] x86/mm: implement free pmd/pte page interfaces
Date: Fri, 27 Apr 2018 14:31:51 +0000 [thread overview]
Message-ID: <1524839460.2693.531.camel@hpe.com> (raw)
In-Reply-To: <20180427073719.GT15462@8bytes.org>
On Fri, 2018-04-27 at 09:37 +0200, joro@8bytes.org wrote:
> On Thu, Apr 26, 2018 at 10:30:14PM +0000, Kani, Toshi wrote:
> > Thanks for the clarification. After reading through SDM one more time, I
> > agree that we need a TLB purge here. Here is my current understanding.
> >
> > - INVLPG purges both TLB and paging-structure caches. So, PMD cache was
> > purged once.
> > - However, processor may cache this PMD entry later in speculation
> > since it has p-bit set. (This is where my misunderstanding was.
> > Speculation is not allowed to access a target address, but it may still
> > cache this PMD entry.)
> > - A single INVLPG on each processor purges this PMD cache. It does not
> > need a range purge (which was already done).
> >
> > Does it sound right to you?
>
> The right fix is to first synchronize the changes when the PMD/PUD is
> cleared and then flush the TLB system-wide. After that is done you can
> free the page.
Agreed. This can be done on top of this patch.
> But doing all that in the pud/pmd_free_pmd/pte_page() functions is too
> expensive, as the TLB flush requires to send IPIs to all cores in the
> system, and that every time the function is called.
>
> So what needs to be done is to fix this from high-level ioremap code to
> first unmap all required PTE/PMD pages and collect them in a list. When
> that is done you can synchronize the changes with the other page-tables
> in the system and do one system-wide TLB flush. When that is complete
> you can free the pages on the list that were collected while unmapping.
>
> Then the new mappings can be established and again synchronized with the
> other page-tables in the system.
Yes, and this patch was designed to work in such way. Please note that
this patch added pud_free_pmd_page() and pmd_free_pte_page() to the
ioremap() path when and only when it creates a pud or pmd map. This
assures the following preconditions are met without overhead.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been done for a target pud/pmd address
range.
So, we can add the step 2 on top of this patch.
1. Clear pud/pmd entry.
2. System wide TLB flush <-- TO BE ADDED BY NEW PATCH
3. Free its underlining pmd/pte page.
> > As for the BUG_ON issue, are you able to reproduce this issue? If so,
> > would you be able to test the fix?
>
> Yes, I can reproduce the BUG_ON with my PTI patches and a fedora-i386
> VM.
Great!
> I already ran into the issue before your patches were merged upstream,
> but my "fix" is different because it just prevents huge-mappings when
> there were smaller mappings before. See
>
> e3e288121408 x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
>
> for details. This patch does not fix the base-problem, but hides it
> again, as the real fix needs some more work across architectures.
Right. Patch 1/2 of this series made the same fix as well. See:
b6bdb7517c3d mm/vmalloc: add interfaces to free unmapped page table
> Your patch actually makes the problem worse, without it the PTE/PMD pages
> were just leaked, so that they could not be reused. But with your patch
> the pages can be used again and the page-walker might establish TLB
> entries based on random content the new owner writes to it. This can
> lead to all kinds of random and very hard to debug data corruption
> issues.
>
> So until we make the generic ioremap code in lib/ioremap.c smarter about
> unmapping/remapping ranges the best solution is making my fix work again
> by reverting your patch.
We do not need to revert this patch. We can make the above change I
mentioned.
Thanks,
-Toshi
next prev parent reply other threads:[~2018-04-27 14:31 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-14 18:01 [PATCH v2 0/2] fix memory leak / panic in ioremap huge pages Toshi Kani
2018-03-14 18:01 ` Toshi Kani
2018-03-14 18:01 ` [PATCH v2 1/2] mm/vmalloc: Add interfaces to free unmapped page table Toshi Kani
2018-03-14 18:01 ` Toshi Kani
2018-03-14 22:38 ` Andrew Morton
2018-03-14 22:38 ` Andrew Morton
2018-03-15 14:27 ` Kani, Toshi
2018-03-15 14:27 ` Kani, Toshi
2018-03-14 18:01 ` [PATCH v2 2/2] x86/mm: implement free pmd/pte page interfaces Toshi Kani
2018-03-14 18:01 ` Toshi Kani
2018-03-15 7:39 ` Chintan Pandya
2018-03-15 7:39 ` Chintan Pandya
2018-03-15 14:51 ` Kani, Toshi
2018-03-15 14:51 ` Kani, Toshi
2018-04-26 14:19 ` Joerg Roedel
2018-04-26 14:19 ` Joerg Roedel
2018-04-26 16:21 ` Kani, Toshi
2018-04-26 16:21 ` Kani, Toshi
2018-04-26 17:23 ` joro at 8bytes.org
2018-04-26 17:23 ` joro
2018-04-26 17:49 ` Kani, Toshi
2018-04-26 17:49 ` Kani, Toshi
2018-04-26 20:07 ` joro at 8bytes.org
2018-04-26 20:07 ` joro
2018-04-26 22:30 ` Kani, Toshi
2018-04-26 22:30 ` Kani, Toshi
2018-04-27 7:37 ` joro at 8bytes.org
2018-04-27 7:37 ` joro
2018-04-27 11:39 ` Michal Hocko
2018-04-27 11:39 ` Michal Hocko
2018-04-27 11:46 ` joro at 8bytes.org
2018-04-27 11:46 ` joro
2018-04-27 11:52 ` Chintan Pandya
2018-04-27 11:52 ` Chintan Pandya
2018-04-27 12:48 ` joro at 8bytes.org
2018-04-27 12:48 ` joro
2018-04-27 13:42 ` Chintan Pandya
2018-04-27 13:42 ` Chintan Pandya
2018-04-27 14:31 ` Kani, Toshi [this message]
2018-04-27 14:31 ` Kani, Toshi
2018-04-28 9:02 ` joro at 8bytes.org
2018-04-28 9:02 ` joro
2018-04-28 20:54 ` Kani, Toshi
2018-04-28 20:54 ` Kani, Toshi
2018-04-30 7:30 ` Chintan Pandya
2018-04-30 7:30 ` Chintan Pandya
2018-04-30 13:43 ` Kani, Toshi
2018-04-30 13:43 ` Kani, Toshi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1524839460.2693.531.camel@hpe.com \
--to=toshi.kani@hpe.com \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.