From: toshi.kani@hpe.com (Kani, Toshi)
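To: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	guohanjun@huawei.com
Cc: linuxarm@huawei.com, linux-mm@kvack.org, wxf.wang@hisilicon.com,
	akpm@linux-foundation.org, mark.rutland@arm.com, will.deacon@arm.com,
	catalin.marinas@arm.com, "Hocko, Michal" <MHocko@suse.com>,
	hanjun.guo@linaro.org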
Subject: [RFC patch] ioremap: don't set up huge I/O mappings when p4d/pud/pmd is zero
Date: Mon, 8 Jan 2018 23:36:57 +0000
Message-ID: <1515457376.2108.34.camel@hpe.com>
In-Reply-To: <e0fa1b52-86f5-687e-46b3-78ddd03565d8@huawei.com>

On Sat, 2018-01-06 at 17:46 +0800, Hanjun Guo wrote:
> On 2018/1/6 6:15, Kani, Toshi wrote:
> > On Thu, 2017-12-28 at 19:24 +0800, Hanjun Guo wrote:
> > > From: Hanjun Guo <hanjun.guo@linaro.org>
> > > 
> > > When we use iounmap() to free a 4K mapping, it just clears the PTEs
> > > but leaves the P4D/PUD/PMD entries unchanged, and it does not free the
> > > memory of the page tables.
> > > 
> > > This causes issues on ARM64 platforms (not sure if other archs have
> > > the same issue) in the following case:
> > > 
> > > 1. ioremap a 4K region; a valid page table is built;
> > > 2. iounmap it; pte0 is set to 0;
> > > 3. ioremap the same address with a 2M size; pgd/pmd is unchanged,
> > >    then a new value is set for the pmd;
> > > 4. pte0 is leaked;
> > > 5. the CPU may take an exception because the old pmd is still in the
> > >    TLB, which leads to a kernel panic.
> > > 
> > > Fix it by skipping the setup of huge I/O mappings when p4d/pud/pmd is
> > > zero.
> > 
> > Hi Hanjun,
> > 
> > I tested the above steps on my x86 box, but was not able to reproduce
> > your kernel panic.  On x86, a 4K vaddr gets allocated from a small
> > fragmented free range, whereas a 2MB vaddr is from a larger free range. 
> > Their addrs have different alignments (4KB & 2MB) as well.  So, the
> > steps did not end up using the same pmd entry.
> 
> Thanks for testing. I can only reproduce this on my ARM64 platform,
> whose CPU caches the PMD in the TLB. To my knowledge, only Cortex-A75
> does this, so the issue can't be reproduced on ARM64 platforms that are
> not A75-based either.
> 
> Catalin, Will, I can reproduce this issue in about 3 minutes with the
> following simplified test case [1], which triggers the panic shown in
> [2]. Could you take a look as well?
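To make the alignment point quoted above concrete: assuming 4 KiB pages and 512-entry tables (the usual x86-64 and arm64 4K-granule layout), one pmd entry covers 2 MiB, so a 2 MiB-aligned huge mapping lands on the pmd entry of an earlier 4 KiB mapping only when its 2 MiB window contains the old 4 KiB address. A small sketch; pmd_index_of() and shares_pmd_entry() are hypothetical helpers, not kernel functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 4 KiB pages, 512-entry tables, 2 MiB per pmd entry. */
#define SKETCH_PMD_SHIFT    21
#define SKETCH_PMD_SIZE     (UINT64_C(1) << SKETCH_PMD_SHIFT)
#define SKETCH_PTRS_PER_PMD 512u

/* Index of the pmd entry that translates vaddr, within its pmd table. */
static unsigned pmd_index_of(uint64_t vaddr)
{
	return (unsigned)((vaddr >> SKETCH_PMD_SHIFT) & (SKETCH_PTRS_PER_PMD - 1));
}

/*
 * True if a 2 MiB mapping at huge_vaddr uses the same pmd entry (same
 * pgd/pud/pmd path) that earlier pointed to the pte table of a 4 KiB
 * mapping at small_vaddr.
 */
static bool shares_pmd_entry(uint64_t small_vaddr, uint64_t huge_vaddr)
{
	assert(huge_vaddr % SKETCH_PMD_SIZE == 0); /* huge mappings are 2 MiB aligned */
	return (small_vaddr >> SKETCH_PMD_SHIFT) == (huge_vaddr >> SKETCH_PMD_SHIFT);
}
```

For example, with the arbitrary addresses 0xffff000008201000 (4 KiB) and 0xffff000008200000 (2 MiB), both resolve to pmd index 65 and shares_pmd_entry() returns true; a 2 MiB mapping at 0xffff000008400000 would use index 66 and never touch the old pte table.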

Yes, the test case looks good to me. (Nit: it should check that vir_addr
is not NULL.)
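The sequence the test case exercises (steps 1 to 4 of the quoted patch description) can be modelled in userspace. This is a minimal sketch, assuming a single pmd entry over 512-entry pte tables; struct sim_pmd, sim_map_4k(), sim_unmap_4k() and sim_map_2m() are hypothetical names that only loosely mirror the ioremap()/iounmap() paths, and step 5 (the stale TLB entry) is not modelled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SIM_PTRS_PER_PTE 512u

/* Hypothetical model of a single pmd entry. */
enum sim_pmd_kind { SIM_PMD_NONE, SIM_PMD_TABLE, SIM_PMD_HUGE };

struct sim_pmd {
	enum sim_pmd_kind kind;
	uint64_t *pte_table; /* meaningful only for SIM_PMD_TABLE */
	uint64_t huge_phys;  /* meaningful only for SIM_PMD_HUGE */
};

static int sim_tables_allocated; /* pte tables ever allocated by this model */

/* Step 1: map a 4 KiB page in pte slot idx, allocating a pte table on first use. */
static bool sim_map_4k(struct sim_pmd *pmd, unsigned idx, uint64_t phys)
{
	assert(idx < SIM_PTRS_PER_PTE);
	if (pmd->kind == SIM_PMD_NONE) {
		pmd->pte_table = calloc(SIM_PTRS_PER_PTE, sizeof(*pmd->pte_table));
		if (!pmd->pte_table)
			return false; /* caller must check the result */
		sim_tables_allocated++;
		pmd->kind = SIM_PMD_TABLE;
	}
	if (pmd->kind != SIM_PMD_TABLE)
		return false;
	pmd->pte_table[idx] = phys | 1; /* bit 0 = "valid" in this model */
	return true;
}

/* Step 2: clear the pte but keep the table, as vunmap_pte_range() does. */
static void sim_unmap_4k(struct sim_pmd *pmd, unsigned idx)
{
	assert(idx < SIM_PTRS_PER_PTE);
	if (pmd->kind == SIM_PMD_TABLE)
		pmd->pte_table[idx] = 0;
}

/*
 * Step 3: install a 2 MiB block mapping without looking at what the pmd
 * pointed to. Returns the pte table that is no longer referenced (step 4),
 * or NULL if no table was overwritten.
 */
static uint64_t *sim_map_2m(struct sim_pmd *pmd, uint64_t phys)
{
	uint64_t *orphaned = pmd->kind == SIM_PMD_TABLE ? pmd->pte_table : NULL;

	pmd->kind = SIM_PMD_HUGE;
	pmd->pte_table = NULL;
	pmd->huge_phys = phys;
	return orphaned;
}
```

Running steps 1 to 3 in order (map pte slot 0, unmap it, then map 2 MiB at the same pmd) makes sim_map_2m() hand back a pte table that is still allocated but no longer referenced by the pmd, which is the leak in step 4. The boolean result of sim_map_4k() plays the role of the NULL check on vir_addr suggested above.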

> > However, I agree that zero'd pte entries will be leaked if they are
> > present under a pmd when a pmd mapping is set.
> 
> Thanks for confirming.
> 
> > 
> > I also tested your patch on my x86 box.  Unfortunately, it effectively
> > disabled 2MB mappings.  While a 2MB vaddr gets allocated from a larger
> > free range, it still comes from a free range covered by zero'd pte
> > entries.  So, with your changes, it ends up with 4KB mappings.
> > 
> > I think we need to come up with another approach.
> 
> Yes, as I said in my patch, this is just an RFC; comments are welcome :)
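The x86 result reported above can be modelled with a one-line decision, assuming the RFC's extra check amounts to "only set up a huge pmd mapping when the pmd is none" (the patch body is not quoted here, so its exact form is an assumption). The enum and mapping_size_kib() are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pmd states relevant to this discussion. */
enum sketch_pmd_state {
	SKETCH_PMD_NONE,        /* no pte table below this pmd */
	SKETCH_PMD_STALE_TABLE, /* pte table left behind by iounmap(), all ptes zero'd */
};

/*
 * Page size in KiB that a 2 MiB-sized, 2 MiB-aligned ioremap() request gets.
 * require_pmd_none models the RFC's extra check (exact form assumed).
 */
static unsigned mapping_size_kib(enum sketch_pmd_state state, bool require_pmd_none)
{
	assert(state == SKETCH_PMD_NONE || state == SKETCH_PMD_STALE_TABLE);
	if (!require_pmd_none || state == SKETCH_PMD_NONE)
		return 2048; /* a huge pmd mapping is set up */
	return 4;            /* falls back to 4 KiB ptes */
}
```

Since, per the x86 observation above, 2 MiB ranges tend to come from areas already covered by stale pte tables, the SKETCH_PMD_STALE_TABLE case is the common one, and with the check enabled it always yields 4 KiB mappings.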

I am wondering if we can follow the same approach as in
arch/x86/mm/pageattr.c.  Like the ioremap case, populate_pmd() does not
check if there is a pte table under the pmd.  But its free function,
unmap_pte_range(), calls try_to_free_pte_page() so that a pte table is
freed when all of its pte entries are zero'd.  It then calls pmd_clear().
By contrast, iounmap()'s free function, vunmap_pte_range(), does not free
a pte table even if all of its pte entries are zero'd.
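A minimal userspace sketch of this suggestion, assuming a pmd entry can be modelled as a pointer to a 512-entry pte table (NULL standing in for pmd_none()). sketch_try_to_free_pte_page() follows the logic described above for try_to_free_pte_page(); sketch_unmap_pte_range() is a hypothetical iounmap()-side counterpart, not existing kernel code. Locking and the TLB maintenance that freeing a live page table would need on real hardware are left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PTRS_PER_PTE 512u

/* Hypothetical pmd entry: NULL means "none", otherwise it points to a pte table. */
struct sketch_pmd {
	uint64_t *pte_table;
};

/* Free the pte table only if every entry in it is zero. */
static bool sketch_try_to_free_pte_page(uint64_t *pte)
{
	for (unsigned i = 0; i < SKETCH_PTRS_PER_PTE; i++)
		if (pte[i] != 0)
			return false;
	free(pte);
	return true;
}

/*
 * Clear count ptes starting at first; unlike vunmap_pte_range(), also free
 * the pte table once it is entirely zero and clear the pmd (the pmd_clear() step).
 */
static void sketch_unmap_pte_range(struct sketch_pmd *pmd, unsigned first, unsigned count)
{
	assert(pmd->pte_table != NULL && first + count <= SKETCH_PTRS_PER_PTE);
	for (unsigned i = first; i < first + count; i++)
		pmd->pte_table[i] = 0;
	if (sketch_try_to_free_pte_page(pmd->pte_table))
		pmd->pte_table = NULL; /* pmd is "none" again */
}
```

For example, with ptes 0 and 1 live, clearing only pte 0 keeps the table; clearing pte 1 as well frees it and leaves the pmd empty, so a later 2 MiB mapping of that range would find a none pmd instead of orphaning a table.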

Thanks,
-Toshi
