From: Toshi Kani <toshi.kani@hp.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: hpa@zytor.com, tglx@linutronix.de, mingo@redhat.com,
arnd@arndb.de, linux-mm@kvack.org, x86@kernel.org,
linux-kernel@vger.kernel.org, Elliott@hp.com
Subject: Re: [PATCH v2 0/7] Kernel huge I/O mapping support
Date: Mon, 23 Feb 2015 16:54:40 -0700 [thread overview]
Message-ID: <1424735680.17007.94.camel@misato.fc.hp.com> (raw)
In-Reply-To: <20150223122224.c55554325cc4dadeca067234@linux-foundation.org>
On Mon, 2015-02-23 at 12:22 -0800, Andrew Morton wrote:
> On Mon, 9 Feb 2015 15:45:28 -0700 Toshi Kani <toshi.kani@hp.com> wrote:
>
> > ioremap() and its related interfaces are used to create I/O
> > mappings to memory-mapped I/O devices. The mapping sizes of
> > the traditional I/O devices are relatively small. Non-volatile
> > memory (NVM), however, has many GB and is going to have TB soon.
> > It is not very efficient to create large I/O mappings with 4KB.
>
> The changelogging is very good - thanks for taking the time to do this.
>
> > This patchset extends the ioremap() interfaces to transparently
> > create I/O mappings with huge pages whenever possible.
>
> I'm wondering if this is prudent. Existing code which was tested with
> 4k mappings will magically start to use huge tlb mappings. I don't
> know what could go wrong, but I'd prefer not to find out! Wouldn't it
> be safer to make this an explicit opt-in?
There were related discussions on this. This v2 patchset actually has
CONFIG_HUGE_IOMAP, which allows user to select this feature. As
suggested in the thread below, I am going to remove this
CONFIG_HUGE_IOMAP, so that it will be simpler and similar to how we
create huge mappings to the kernel itself. If bugs are found, they will
be fixed.
https://lkml.org/lkml/2015/2/18/677
> What operations can presently be performed against an ioremapped area?
> Can kernel code perform change_page_attr() against individual pages?
> Can kernel code run iounmap() against just part of that region (I
> forget). There does seem to be potential for breakage if we start
> using hugetlb mappings for such things?
Yes, kernel code can use the CPA interfaces, such as set_memory_x() and
set_memory_ro() to an ioremapped area. CPA breaks a huge page to
smaller pages. I have included them into my test cases and confirmed
they work. (Note, memory type change interfaces, such as
set_memory_uc() and set_memory_wc(), are not supported to an ioremapped
area regardless of their page size.)
iounmap() only takes a single argument, virtual base addr. It looks up
the corresponding vm area object from the virt addr, and always removes
the entire mapping.
> > ioremap()
> > continues to use 4KB mappings when a huge page does not fit into
> > a requested range. There is no change necessary to the drivers
> > using ioremap(). A requested physical address must be aligned by
> > a huge page size (1GB or 2MB on x86) for using huge page mapping,
> > though. The kernel huge I/O mapping will improve performance of
> > NVM and other devices with large memory, and reduce the time to
> > create their mappings as well.
> >
> > On x86, the huge I/O mapping may not be used when a target range is
> > covered by multiple MTRRs with different memory types. The caller
> > must make a separate request for each MTRR range, or the huge I/O
> > mapping can be disabled with the kernel boot option "nohugeiomap".
> > The detail of this issue is described in the email below, and this
> > patch takes option C) in favor of simplicity since MTRRs are legacy
> > feature.
> > https://lkml.org/lkml/2015/2/5/638
>
> How is this mtrr clash handled?
>
> - The iomap call will fail if there are any MTRRs covering the region?
>
> - The iomap call will fail if there are more than one MTRRs covering
> the region?
>
> - If the ioremap will succeed if a single MTRR covers the region,
> must that MTRR cover the *entire* region?
>
> - What happens if userspace tried fiddling the MTRRs after the region
> has been established?
>
> <reads the code>
This issue was also discussed in the same thread:
https://lkml.org/lkml/2015/2/18/677
I am going to implement option D -- the iomap call will fail if there
are more than one MTRRs with "different types" covering the region.
> Oh. We don't do any checking at all. We're just telling userspace
> programmers "don't do that". hrm. What are your thoughts on adding
> the overlap checks to the kernel?
>
> This adds more potential for breaking existing code, doesn't it? If
> there's code which is using 4k ioremap on regions which are covered by
> mtrrs, the transparent switch to hugeptes will cause that code to enter
> the "undefined behaviour" space?
Yes, I agree with your concern, and I am going to add the check. I do
not think we have such platform today, and will be affected by this
change, though.
> > The patchset introduces the following configs:
> > HUGE_IOMAP - When selected (default Y), enable huge I/O mappings.
> > Require HAVE_ARCH_HUGE_VMAP set.
> > HAVE_ARCH_HUGE_VMAP - Indicate arch supports huge KVA mappings.
> > Require X86_PAE set on X86_32.
> >
> > Patch 1-4 changes common files to support huge I/O mappings. There
> > is no change in the functinalities until HUGE_IOMAP is set in patch 7.
> >
> > Patch 5,6 implement HAVE_ARCH_HUGE_VMAP and HUGE_IOMAP funcs on x86,
> > and set HAVE_ARCH_HUGE_VMAP on x86.
> >
> > Patch 7 adds HUGE_IOMAP to Kconfig, which is set to Y by default on
> > x86.
>
> What do other architectures need to do to utilize this?
Other architectures can implement their version of patch 5/7 and 6/7 to
utilize this feature.
Thanks,
-Toshi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Toshi Kani <toshi.kani@hp.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: hpa@zytor.com, tglx@linutronix.de, mingo@redhat.com,
arnd@arndb.de, linux-mm@kvack.org, x86@kernel.org,
linux-kernel@vger.kernel.org, Elliott@hp.com
Subject: Re: [PATCH v2 0/7] Kernel huge I/O mapping support
Date: Mon, 23 Feb 2015 16:54:40 -0700 [thread overview]
Message-ID: <1424735680.17007.94.camel@misato.fc.hp.com> (raw)
In-Reply-To: <20150223122224.c55554325cc4dadeca067234@linux-foundation.org>
On Mon, 2015-02-23 at 12:22 -0800, Andrew Morton wrote:
> On Mon, 9 Feb 2015 15:45:28 -0700 Toshi Kani <toshi.kani@hp.com> wrote:
>
> > ioremap() and its related interfaces are used to create I/O
> > mappings to memory-mapped I/O devices. The mapping sizes of
> > the traditional I/O devices are relatively small. Non-volatile
> > memory (NVM), however, has many GB and is going to have TB soon.
> > It is not very efficient to create large I/O mappings with 4KB.
>
> The changelogging is very good - thanks for taking the time to do this.
>
> > This patchset extends the ioremap() interfaces to transparently
> > create I/O mappings with huge pages whenever possible.
>
> I'm wondering if this is prudent. Existing code which was tested with
> 4k mappings will magically start to use huge tlb mappings. I don't
> know what could go wrong, but I'd prefer not to find out! Wouldn't it
> be safer to make this an explicit opt-in?
There were related discussions on this. This v2 patchset actually has
CONFIG_HUGE_IOMAP, which allows user to select this feature. As
suggested in the thread below, I am going to remove this
CONFIG_HUGE_IOMAP, so that it will be simpler and similar to how we
create huge mappings to the kernel itself. If bugs are found, they will
be fixed.
https://lkml.org/lkml/2015/2/18/677
> What operations can presently be performed against an ioremapped area?
> Can kernel code perform change_page_attr() against individual pages?
> Can kernel code run iounmap() against just part of that region (I
> forget). There does seem to be potential for breakage if we start
> using hugetlb mappings for such things?
Yes, kernel code can use the CPA interfaces, such as set_memory_x() and
set_memory_ro() to an ioremapped area. CPA breaks a huge page to
smaller pages. I have included them into my test cases and confirmed
they work. (Note, memory type change interfaces, such as
set_memory_uc() and set_memory_wc(), are not supported to an ioremapped
area regardless of their page size.)
iounmap() only takes a single argument, virtual base addr. It looks up
the corresponding vm area object from the virt addr, and always removes
the entire mapping.
> > ioremap()
> > continues to use 4KB mappings when a huge page does not fit into
> > a requested range. There is no change necessary to the drivers
> > using ioremap(). A requested physical address must be aligned by
> > a huge page size (1GB or 2MB on x86) for using huge page mapping,
> > though. The kernel huge I/O mapping will improve performance of
> > NVM and other devices with large memory, and reduce the time to
> > create their mappings as well.
> >
> > On x86, the huge I/O mapping may not be used when a target range is
> > covered by multiple MTRRs with different memory types. The caller
> > must make a separate request for each MTRR range, or the huge I/O
> > mapping can be disabled with the kernel boot option "nohugeiomap".
> > The detail of this issue is described in the email below, and this
> > patch takes option C) in favor of simplicity since MTRRs are legacy
> > feature.
> > https://lkml.org/lkml/2015/2/5/638
>
> How is this mtrr clash handled?
>
> - The iomap call will fail if there are any MTRRs covering the region?
>
> - The iomap call will fail if there are more than one MTRRs covering
> the region?
>
> - If the ioremap will succeed if a single MTRR covers the region,
> must that MTRR cover the *entire* region?
>
> - What happens if userspace tried fiddling the MTRRs after the region
> has been established?
>
> <reads the code>
This issue was also discussed in the same thread:
https://lkml.org/lkml/2015/2/18/677
I am going to implement option D -- the iomap call will fail if there
are more than one MTRRs with "different types" covering the region.
> Oh. We don't do any checking at all. We're just telling userspace
> programmers "don't do that". hrm. What are your thoughts on adding
> the overlap checks to the kernel?
>
> This adds more potential for breaking existing code, doesn't it? If
> there's code which is using 4k ioremap on regions which are covered by
> mtrrs, the transparent switch to hugeptes will cause that code to enter
> the "undefined behaviour" space?
Yes, I agree with your concern, and I am going to add the check. I do
not think we have such platform today, and will be affected by this
change, though.
> > The patchset introduces the following configs:
> > HUGE_IOMAP - When selected (default Y), enable huge I/O mappings.
> > Require HAVE_ARCH_HUGE_VMAP set.
> > HAVE_ARCH_HUGE_VMAP - Indicate arch supports huge KVA mappings.
> > Require X86_PAE set on X86_32.
> >
> > Patch 1-4 changes common files to support huge I/O mappings. There
> > is no change in the functinalities until HUGE_IOMAP is set in patch 7.
> >
> > Patch 5,6 implement HAVE_ARCH_HUGE_VMAP and HUGE_IOMAP funcs on x86,
> > and set HAVE_ARCH_HUGE_VMAP on x86.
> >
> > Patch 7 adds HUGE_IOMAP to Kconfig, which is set to Y by default on
> > x86.
>
> What do other architectures need to do to utilize this?
Other architectures can implement their version of patch 5/7 and 6/7 to
utilize this feature.
Thanks,
-Toshi
next prev parent reply other threads:[~2015-02-23 23:55 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-09 22:45 [PATCH v2 0/7] Kernel huge I/O mapping support Toshi Kani
2015-02-09 22:45 ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 1/7] mm: Change __get_vm_area_node() to use fls_long() Toshi Kani
2015-02-09 22:45 ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 2/7] lib: Add huge I/O map capability interfaces Toshi Kani
2015-02-09 22:45 ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 3/7] mm: Change ioremap to set up huge I/O mappings Toshi Kani
2015-02-09 22:45 ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 4/7] mm: Change vunmap to tear down huge KVA mappings Toshi Kani
2015-02-09 22:45 ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86 Toshi Kani
2015-02-09 22:45 ` Toshi Kani
2015-02-10 18:59 ` Dave Hansen
2015-02-10 18:59 ` Dave Hansen
2015-02-10 20:42 ` Toshi Kani
2015-02-10 20:42 ` Toshi Kani
2015-02-10 20:51 ` Dave Hansen
2015-02-10 20:51 ` Dave Hansen
2015-02-10 22:13 ` Toshi Kani
2015-02-10 22:13 ` Toshi Kani
2015-02-10 22:20 ` Toshi Kani
2015-02-10 22:20 ` Toshi Kani
2015-02-10 23:10 ` Toshi Kani
2015-02-10 23:10 ` Toshi Kani
2015-03-03 0:37 ` Toshi Kani
2015-03-03 0:37 ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 6/7] x86, mm: Support huge I/O " Toshi Kani
2015-02-09 22:45 ` Toshi Kani
2015-02-18 20:44 ` Ingo Molnar
2015-02-18 20:44 ` Ingo Molnar
2015-02-18 21:13 ` Toshi Kani
2015-02-18 21:13 ` Toshi Kani
2015-02-18 21:15 ` Ingo Molnar
2015-02-18 21:15 ` Ingo Molnar
2015-02-18 21:33 ` Toshi Kani
2015-02-18 21:33 ` Toshi Kani
2015-02-18 21:57 ` Ingo Molnar
2015-02-18 21:57 ` Ingo Molnar
2015-02-18 22:14 ` Toshi Kani
2015-02-18 22:14 ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 7/7] mm: Add config HUGE_IOMAP to enable huge I/O mappings Toshi Kani
2015-02-09 22:45 ` Toshi Kani
2015-02-23 20:22 ` [PATCH v2 0/7] Kernel huge I/O mapping support Andrew Morton
2015-02-23 20:22 ` Andrew Morton
2015-02-23 23:54 ` Toshi Kani [this message]
2015-02-23 23:54 ` Toshi Kani
2015-02-24 8:09 ` Ingo Molnar
2015-02-24 8:09 ` Ingo Molnar
2015-03-02 15:51 ` Toshi Kani
2015-03-02 15:51 ` Toshi Kani
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1424735680.17007.94.camel@misato.fc.hp.com \
--to=toshi.kani@hp.com \
--cc=Elliott@hp.com \
--cc=akpm@linux-foundation.org \
--cc=arnd@arndb.de \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.