From: catalin.marinas@arm.com (Catalin Marinas)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH V3 2/2] ARM: Handle user space mapped pages in flush_kernel_dcache_page
Date: Fri, 3 May 2013 11:02:42 +0100 [thread overview]
Message-ID: <20130503100242.GB29962@arm.com> (raw)
In-Reply-To: <20130502193835.GA29144@schnuecks.de>
On Thu, May 02, 2013 at 08:38:36PM +0100, Simon Baatz wrote:
> On Thu, May 02, 2013 at 10:54:31AM +0100, Catalin Marinas wrote:
> > On Wed, May 01, 2013 at 08:04:41PM +0100, Simon Baatz wrote:
> > > On Wed, May 01, 2013 at 03:22:06PM +0100, Catalin Marinas wrote:
> > > > On Tue, Apr 30, 2013 at 10:04:03PM +0100, Simon Baatz wrote:
> > > > > On Tue, Apr 30, 2013 at 12:22:25PM +0100, Catalin Marinas wrote:
> > > > > > On Sun, Apr 21, 2013 at 11:06:30PM +0100, Simon Baatz wrote:
> > > > > > > On Thu, Apr 18, 2013 at 02:51:04PM +0100, Catalin Marinas wrote:
> > > > > > > > On Thu, Apr 18, 2013 at 12:40:16PM +0100, Jason Cooper wrote:
> > > > > ...
> > > > > > >
> > > > > > > It is not the driver itself which is using the API, it is the
> > > > > > > generic scatterlist memory iterator. And I don't think that this is
> > > > > > > wrong, as I have tried to explain in [1].
> > > > > >
> > > > > > Trying to remember what we've discussed over the past months on this
> > > > > > topic. It looks like sg_miter_stop() does the right thing in calling
> > > > > > flush_kernel_dcache_page(). Commit f8b63c1 (ARM: 6382/1: Remove
> > > > > > superfluous flush_kernel_dcache_page()) removed this function entirely.
> > > > > > The code previously had this comment - /* highmem pages are always
> > > > > > flushed upon kunmap already */ which I think it wasn't fully correct
> > > > > > either. The kunmap_atomic() flushes the caches but kunmap() doesn't, so
> > > > > > I suspect we only get the flushing if SG_MITER_ATOMIC.
> > > > > >
> > > > > > So it looks to me like flush_kernel_dcache_page() should be implemented
> > > > > > even for highmem pages (with VIVT or aliasing VIPT, at least for non
> > > > > > kmap_atomic addresses by checking for FIXADDR_START). If highmem is
> > > > > > disabled, I suspect we still need this function since the calling code
> > > > > > doesn't care whether kmap/kunmap was a no-op. But can we keep it as a
> > > > > > simple call to __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE)?
> > > > >
> > > > > My first version ([1]) had:
> > > > >
> > > > > if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page))
> > > > > __flush_kernel_dcache_page(page);
> > > > >
> > > > > If I understand this correctly, you are proposing to remove the
> > > > > highmem exclusion.
> > > >
> > > > The highmem exclusion may have been there originally because of a
> > > > comment suggesting that kunmap() does the flushing. This is the case
> > > > only for kunmap_atomic() AFAICT (and maybe we could remove that as well
> > > > and rely on flush_kernel_dcache_page() being called).
> > > >
> > > > > And then in __flush_kernel_dcache_page():
> > > > >
> > > > > mapping = page_mapping(page);
> > > > >
> > > > > if (!mapping || mapping_mapped(mapping))
> > > > > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> > > > >
> > > > > I still prefer to have this condition here to avoid the flush when
> > > > > there is no user mapping at all. This is handled by lazy flushing
> > > > > and is probably the most common case anyway (given how many people
> > > > > seem to be affected by this problem).
> > > >
> > > > Looking at the old thread, you said there is a case when this condition
> > > > is not true (O_DIRECT case). If that's for a page cache page, then we
> > > > can handle it lazily (for anonymous pages I don't think we can rely on
> > > > lazy flushing since the kernel does not guarantee the clearing of the
> > > > PG_arch_1 bit).
> > >
> > > As Russel pointed out in a comment to a later version of the patch,
> > > PG_arch_1 makes only sense for page cache pages. The condition above
> > > is ok from my point of view (it is based on what flush_dcache_page()
> > > uses):
> > >
> > > - Page cache page without user space mapping:
> > > mapping != NULL and mapping_mapped() == 0
> > >
> > > -> no flush here; lazy flush based on PG_arch_1 later if needed (we
> > > rely on the proper initialization of the page to "dirty" here.)
> >
> > Indeed.
> >
> > > - Page cache page with user space mapping:
> > > mapping != NULL and mapping_mapped() != 0
> > >
> > > -> kernel mapping flushed here (user mapping can be assumed to be clean)
> >
> > We had similar thoughts for AArch64 here and decided it's not needed, it
> > can just clear the PG_arch_1 bit and do it lazily (patches not pushed
> > yet, need more testing). This assumes that even if mapping_mapped(), the
> > page is not actually mapped in user space and we eventually get a
> > set_pte_at() call. That's what powerpc is doing.
>
> For arm64 only I-/D-cache coherency is relevant, right?
Yes. Same for 32-bit ARMv7 or non-aliasing D-cache on ARMv6.
> In this case,
> that may be right (but you may want to have a look at what
> xol_get_insn_slot() in kernel/events/uprobes.c does. I don't know
> what kind of pages it handles, but this looks suspicious. And I think
> uprobes are also available for powerpc).
It is suspicious indeed and even the author is unsure about using the
right API. Should it rather use copy_to_user_page? I'm not familiar with
uprobes.
> However, if you can have aliases in D-cache this is needed. See
> below to trigger this on pages that are already mapped
> to user space (direct I/O into a memory mapped file).
You are probably right, for direct I/O and aliases in the D-cache it
needs to flush. The documentation for flush_dcache_page() states that it
can also be called when the kernel is about to read from a page cache
page (potentially mapped into user space).
> > > - Anonymous page:
> > > mapping == NULL
> > >
> > > -> kernel mapping flushed here (user mapping can be assumed to be clean)
> >
> > I don't think it should care about anonymous pages at all. I put a
> > WARN_ON(!mapping) in flush_dcache_page() and it hasn't triggered yet,
> > though not intensive testing.
>
> I assume that you inhibited the call to flush_dcache_page() in
> __get_user_pages() for anon pages. Otherwise, you will be flooded
> with warnings.
I haven't done any stress testing so I don't think I hit this code path,
so no warning. But yes, it should have triggered. Anyway, in this case
flush_dcache_page() should have just ignored (clearing PG_arch_1 is
harmless anyway if we also ignore this bit in __sync_icache_dcache for
non-aliasing caches).
> DIY steps to produce both cases:
> (We will need software encryption, thus I need to remove the hardware
> encryption module mv_cesa on my hardware first)
>
> # rmmod mv_cesa
> # wget -O mapping_tests.c http://ubuntuone.com/0jF9fNsguN8BvI1Z8fsBVI
> # gcc -o mapping_tests mapping_tests.c
> # dd if=/dev/urandom of=map.out bs=4k count=10
> # dd if=/dev/urandom of=cdevice.img bs=4k count=10
> # losetup /dev/loop0 cdevice.img
> # cryptsetup -c aes create c_sda2 /dev/loop0
> <enter any passphrase>
> # ./mapping_tests
>
> cryptsetup will already trigger the warning and mapping_tests should
> flood your console.
>
> Since flush_kernel_dcache_page() is supposed to handle user space
> pages, it must flush the kernel mapping in these cases. Running the
> above on a device driver using an unpatched
> flush_kernel_dcache_page() yields:
>
> # ./mapping_tests
> Anonymous page: differs!
> User space mapped page: differs!
I haven't run the tests but I can see why it fails without
flush_kernel_dcache_page(). So I think this function needs to be
implemented for aliasing VIPT or VIVT caches.
> It is even needed in flush_dcache_page() as long as everybody
> continues to use flush_dcache_page() instead of
> flush_kernel_dcache_page() when appropriate...
> (This is probably the main reason why the problem I reported is so
> uncommon: Everybody seems to use flush_dcache_page() and since it
> flushes the kernel mapping in these cases, everything is fine.)
That's why for non-aliasing VIPT we could make it just a clear_bit() and
let the callers fix their API usage.
> > > > > Additionally, although we can assume that the page is kmapped,
> > > > > page_address(page) can still be NULL for a highmem page, right?
> > > >
> > > > It looks like kmap() always sets page_address(page) but I'm not sure
> > > > about kmap_atomic(), it doesn't seem to.
> > >
> > > Hmm, in __flush_dcache_page() we have the following code to flush the
> > > kernel mapping:
> > >
> > > void __flush_dcache_page(struct address_space *mapping, struct page *page)
> > > {
> > > /*
> > > * Writeback any data associated with the kernel mapping of this
> > > * page. This ensures that data in the physical page is mutually
> > > * coherent with the kernels mapping.
> > > */
> > > if (!PageHighMem(page)) {
> > > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> > > } else {
> > > void *addr = kmap_high_get(page);
> > > if (addr) {
> > > __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > > kunmap_high(page);
> > > } else if (cache_is_vipt()) {
> > > /* unmapped pages might still be cached */
> > > addr = kmap_atomic(page);
> > > __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > > kunmap_atomic(addr);
> > > }
> > > }
> > > ...
> > >
> > >
> > > We probably should reuse it in flush_kernel_dcache_page() to flush
> > > the kernel mapping. (The last else clause looks strange though)
> >
> > I think it makes sense to reuse this logic in
> > flush_kernel_dcache_page(). If the page is already mapped with kmap,
> > then kmap_high_get() should return the actual address. If it was mapped
> > with kmap_atomic, kmap_high_get() would return NULL, hence the 'else'
> > clause and the additional kmap_atomic(). The cache_is_vipt() check is
> > useful because kunmap_atomic() would flush VIVT caches anyway.
>
> Yes, but why don't we flush for VIPT _aliasing_ already in
> kunmap_atomic?
Some comment from Nico on the list, it seems that highmem does not work
with aliasing VIPT caches:
http://article.gmane.org/gmane.linux.ports.arm.kernel/85114
> Why do we need to do anything for non-aliasing VIPT here?
Do you mean __flush_dcache_page()? I don't think we need.
> > As for __flush_dcache_page() called from other places like
> > flush_dcache_page(), because of this 'else if' clause it looks like it
> > misses flushing unmapped highmem pages on VIVT cache.
>
> How many users do we have with VIVT D-caches using highmem? (This is
> not a rhetorical questions, I have no idea. However, I would assume
> that this use case is almost non-existent.)
I don't really know. Maybe Nico has more info.
So my proposal:
1. Non-aliasing VIPT - defer any D-cache flushing until
__sync_icache_dcache() (and fix callers like uprobes)
2. Aliasing VIPT or VIVT - flush the aliases as before in
flush_dcache_page()
3. Aliasing VIPT or VIVT - implement flush_kernel_dcache_page() for all
pages (don't check for PageHighmem)
4. VIVT - remove cache flushing from kunmap_atomic and rely on
flush_kernel_dcache_page()
I'm not entirely sure about 4 but at the moment kunmap() doesn't flush
caches, only kunmap_atomic() so it looks like an inconsistency.
--
Catalin
next prev parent reply other threads:[~2013-05-03 10:02 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-07 11:29 [PATCH V2 0/2] fix and improvement of flush(_kernel)_dcache_page() Simon Baatz
2012-10-07 11:29 ` [PATCH V2 1/2] ARM: remove unnecessary flush of anon pages in flush_dcache_page() Simon Baatz
2012-10-08 11:36 ` Catalin Marinas
2012-10-08 17:38 ` Simon Baatz
2012-10-08 17:43 ` Catalin Marinas
2012-10-07 11:29 ` [PATCH V3 2/2] ARM: Handle user space mapped pages in flush_kernel_dcache_page Simon Baatz
2012-10-08 17:44 ` Catalin Marinas
2012-10-08 20:02 ` Simon Baatz
2012-10-08 20:20 ` Russell King - ARM Linux
2012-10-08 23:07 ` Simon Baatz
2012-11-18 21:10 ` Jason Cooper
2013-04-18 11:16 ` Jason Cooper
2013-04-18 11:22 ` Russell King - ARM Linux
2013-04-18 11:40 ` Jason Cooper
2013-04-18 13:51 ` Catalin Marinas
2013-04-21 22:06 ` Simon Baatz
2013-04-30 11:22 ` Catalin Marinas
2013-04-30 21:04 ` Simon Baatz
2013-05-01 14:22 ` Catalin Marinas
2013-05-01 19:04 ` Simon Baatz
2013-05-02 9:54 ` Catalin Marinas
2013-05-02 19:38 ` Simon Baatz
2013-05-03 10:02 ` Catalin Marinas [this message]
2013-05-04 8:21 ` Ming Lei
2013-05-08 15:08 ` Catalin Marinas
2013-05-08 18:43 ` Russell King - ARM Linux
2013-05-08 19:31 ` Simon Baatz
2013-05-08 21:13 ` Catalin Marinas
2013-05-09 1:52 ` Ming Lei
2013-05-11 6:27 ` Simon Baatz
2013-05-13 3:12 ` Ming Lei
2013-05-05 22:26 ` Simon Baatz
2013-05-08 15:28 ` Catalin Marinas
2013-04-18 19:39 ` Simon Baatz
2013-04-18 19:00 ` Simon Baatz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130503100242.GB29962@arm.com \
--to=catalin.marinas@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.