* [RFC/PATCH 0/1] ARM: Handle user space mapped pages in flush_kernel_dcache_page
@ 2012-05-27 23:11 Simon Baatz
2012-05-27 23:11 ` [RFC/PATCH 1/1] " Simon Baatz
2012-06-17 19:13 ` [RFC/PATCH 0/1] " Simon Baatz
0 siblings, 2 replies; 7+ messages in thread
From: Simon Baatz @ 2012-05-27 23:11 UTC (permalink / raw)
To: linux-arm-kernel
Patch f8b63c1 "ARM: 6382/1" changed flush_kernel_dcache_page() to a
no-op. flush_kernel_dcache_page was deemed superfluous as newly
allocated page cache pages are now dirty by default (patch 6379/1).
This relies on the assumption that a block device driver only reads
into new pages (otherwise the request is served from the cache).
However, this assumption is not true for direct IO or SG_IO (see
e.g. [1] and [2]). This is why vgchange fails with mv_cesa as reported
by me in [3]. (Btw., flush_kernel_dcache_page is also called from
"copy_strings" in fs/exec.c when copying env/arg strings between
processes. Thus, its use is not restricted to device drivers.)
At the end of this mail is a small program that uses O_DIRECT to
specifically create two test cases:
- device driver gets an anonymous page to read into
- device driver gets a page with user mapping(s) to read into
The source can also be obtained from [4]. With mv_cesa, both cases
return garbled data (the scatterlist memory iterator uses
flush_kernel_dcache_page).
As explained above, the problem is not specific to mv_cesa. As many
drivers seem to use flush_dcache_page where they probably could use
flush_kernel_dcache_page, the problem did not surface up to now. For
example, one of these uses is in the scatterwalk implementation of
linux crypto. When changing flush_dcache_page to
flush_kernel_dcache_page in crypto/scatterwalk.c, I see the same
coherency problems:
root at ww1:~# rmmod mv_cesa
root at ww1:~# cryptsetup luksOpen /dev/sda2 c_sda2
Enter passphrase for /dev/sda2:
root at ww1:~# ./mapping_tests
Anonymous page: differs!
User space mapped page: differs!
The proposed fix is to let flush_kernel_dcache_page handle the 'kernel
mapped only' case as before (no-op) but to flush the kernel mapping in
case of a user space mapped page.
This means that the original idea to 'fix' non-flushing PIO drivers by
patch 6379/1 does not work for all cases. The provided patch should
fix things for drivers that already perform cache flushing, but it
obviously does not fix direct IO for PIO drivers without (I think
there have been discussions for specific PIO mapping APIs in the
past). This and my limited testing on only one architecture (kirkwood,
VIVT) is why the patch is RFC.
- Simon
[1] https://lkml.org/lkml/2008/11/20/20
[2] http://article.gmane.org/gmane.linux.kernel.cross-arch/5159
[3] http://www.mail-archive.com/linux-crypto at vger.kernel.org/msg07038.html
[4] http://ubuntuone.com/0jF9fNsguN8BvI1Z8fsBVI
mapping_tests.c:
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>
#include <assert.h>
const char* read_filename = "/dev/mapper/c_sda2";
const char* map_filename = "./map.out";
/* Init the page and read it to fill some cache lines */
int wr_page(char *p, int ps) {
int i, sum;
memset(p, 9, ps);
sum = 0;
for(i = 0; i < ps; i++)
sum += p[i];
return sum;
}
int main(void) {
void *org_buffer;
void *odirect_buffer;
char *mbuffer;
int fd, fdm;
int i, c;
unsigned page_size;
page_size = getpagesize();
/* Read the first page into a normal buffer.
If the page is not in the page cache yet, this will
create a new page without user mappings in the kernel.
*/
fd = open(read_filename, O_RDONLY);
assert(fd >= 0);
org_buffer = malloc(page_size);
c = read(fd, org_buffer, page_size);
assert(c == page_size);
close(fd);
/* Read the first page into an aligned malloced buffer using
O_DIRECT. The block driver will get an anonymous page.
*/
odirect_buffer = NULL;
posix_memalign(&odirect_buffer, page_size, page_size);
assert(odirect_buffer);
fd = open(read_filename, O_RDONLY | O_DIRECT);
assert(fd >= 0);
for (i = 0 ; i < 100 ; i++) {
wr_page((char *)odirect_buffer, page_size);
lseek(fd, 0, SEEK_SET);
c = read(fd, odirect_buffer, page_size);
assert(c == page_size);
if (memcmp(odirect_buffer, org_buffer, page_size) != 0) {
printf("Anonymous page: differs!\n");
break;
}
}
close(fd);
free(odirect_buffer);
/* Read the first page into an mmapped file (using O_DIRECT).
The block driver will get a file backed page with user
mappings.
*/
fd = open(read_filename, O_RDONLY | O_DIRECT);
assert(fd >= 0);
fdm = open(map_filename, O_RDWR | O_CREAT | O_TRUNC);
assert(fdm >= 0);
c = write(fdm, org_buffer, page_size);
assert(c == page_size);
c = fsync(fdm);
assert(c == 0);
mbuffer = mmap(NULL, page_size, PROT_WRITE, MAP_SHARED, fdm, 0);
for (i = 0 ; i < 100 ; i++) {
wr_page((char *)mbuffer, page_size);
lseek(fd, 0, SEEK_SET);
c = read(fd, mbuffer, page_size);
assert(c == page_size);
if (memcmp(mbuffer, org_buffer, page_size) != 0) {
printf("User space mapped page: differs!\n");
break;
}
}
munmap(mbuffer, page_size);
close(fdm);
close(fd);
free(org_buffer);
}
Simon Baatz (1):
ARM: Handle user space mapped pages in flush_kernel_dcache_page
arch/arm/include/asm/cacheflush.h | 4 ++++
arch/arm/mm/flush.c | 22 ++++++++++++++++++++++
2 files changed, 26 insertions(+)
--
1.7.9.5
^ permalink raw reply [flat|nested] 7+ messages in thread
* [RFC/PATCH 1/1] ARM: Handle user space mapped pages in flush_kernel_dcache_page
2012-05-27 23:11 [RFC/PATCH 0/1] ARM: Handle user space mapped pages in flush_kernel_dcache_page Simon Baatz
@ 2012-05-27 23:11 ` Simon Baatz
2012-05-28 4:32 ` Catalin Marinas
2012-06-17 19:13 ` [RFC/PATCH 0/1] " Simon Baatz
1 sibling, 1 reply; 7+ messages in thread
From: Simon Baatz @ 2012-05-27 23:11 UTC (permalink / raw)
To: linux-arm-kernel
Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
it needs to handle are kernel mapped only. However, for example when doing
direct I/O, pages with user space mappings may occur.
Thus, continue to do lazy flushing if there are no user space mappings.
Otherwise, flush the kernel cache lines directly.
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
arch/arm/include/asm/cacheflush.h | 4 ++++
arch/arm/mm/flush.c | 22 ++++++++++++++++++++++
2 files changed, 26 insertions(+)
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index d5d8d5c..c6c81d5 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -303,6 +303,10 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
#define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
static inline void flush_kernel_dcache_page(struct page *page)
{
+ extern void __flush_kernel_dcache_page(struct page *);
+ /* highmem pages are always flushed upon kunmap already */
+ if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page))
+ __flush_kernel_dcache_page(page);
}
#define flush_dcache_mmap_lock(mapping) \
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 7745854..bcba3a9 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -192,6 +192,28 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page)
page->index << PAGE_CACHE_SHIFT);
}
+/*
+ * Ensure cache coherency for kernel mapping of this page.
+ *
+ * If the page only exists in the page cache and there are no user
+ * space mappings, this is a no-op since the page was already marked
+ * dirty@creation. Otherwise, we need to flush the dirty kernel
+ * cache lines directly.
+ *
+ * We can assume that the page is no high mem page, see
+ * flush_kernel_dcache_page.
+ */
+void __flush_kernel_dcache_page(struct page *page)
+{
+ struct address_space *mapping;
+
+ mapping = page_mapping(page);
+
+ if (!mapping || mapping_mapped(mapping))
+ __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
+}
+EXPORT_SYMBOL(__flush_kernel_dcache_page);
+
static void __flush_dcache_aliases(struct address_space *mapping, struct page *page)
{
struct mm_struct *mm = current->active_mm;
--
1.7.9.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [RFC/PATCH 1/1] ARM: Handle user space mapped pages in flush_kernel_dcache_page
2012-05-27 23:11 ` [RFC/PATCH 1/1] " Simon Baatz
@ 2012-05-28 4:32 ` Catalin Marinas
2012-05-28 5:35 ` Simon Baatz
0 siblings, 1 reply; 7+ messages in thread
From: Catalin Marinas @ 2012-05-28 4:32 UTC (permalink / raw)
To: linux-arm-kernel
Hi Simon,
On Mon, May 28, 2012 at 12:11:58AM +0100, Simon Baatz wrote:
> Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> it needs to handle are kernel mapped only. However, for example when doing
> direct I/O, pages with user space mappings may occur.
>
> Thus, continue to do lazy flushing if there are no user space mappings.
> Otherwise, flush the kernel cache lines directly.
...
> +void __flush_kernel_dcache_page(struct page *page)
> +{
> + struct address_space *mapping;
> +
> + mapping = page_mapping(page);
> +
> + if (!mapping || mapping_mapped(mapping))
> + __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> +}
> +EXPORT_SYMBOL(__flush_kernel_dcache_page);
I wonder whether the above condition isn't always true after
get_user_pages().
--
Catalin
^ permalink raw reply [flat|nested] 7+ messages in thread
* [RFC/PATCH 1/1] ARM: Handle user space mapped pages in flush_kernel_dcache_page
2012-05-28 4:32 ` Catalin Marinas
@ 2012-05-28 5:35 ` Simon Baatz
2012-05-28 6:21 ` Catalin Marinas
0 siblings, 1 reply; 7+ messages in thread
From: Simon Baatz @ 2012-05-28 5:35 UTC (permalink / raw)
To: linux-arm-kernel
Hi Catalin,
On Mon, May 28, 2012 at 12:32:44PM +0800, Catalin Marinas wrote:
> On Mon, May 28, 2012 at 12:11:58AM +0100, Simon Baatz wrote:
> ...
> > +void __flush_kernel_dcache_page(struct page *page)
> > +{
> > + struct address_space *mapping;
> > +
> > + mapping = page_mapping(page);
> > +
> > + if (!mapping || mapping_mapped(mapping))
> > + __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> > +}
> > +EXPORT_SYMBOL(__flush_kernel_dcache_page);
>
> I wonder whether the above condition isn't always true after
> get_user_pages().
>
Not sure about corner cases, but I would assume that yes, this is the
case. However, the block layer sees the pages from get_user_pages()
directly only in the O_DIRECT case. Usually (read fault for page
cache), flush_kernel_dcache_page() gets pages with mapping != NULL
and mapping_mapped() == NULL (i.e. no user space mapping (yet)).
- Simon
^ permalink raw reply [flat|nested] 7+ messages in thread
* [RFC/PATCH 1/1] ARM: Handle user space mapped pages in flush_kernel_dcache_page
2012-05-28 5:35 ` Simon Baatz
@ 2012-05-28 6:21 ` Catalin Marinas
2012-05-28 7:59 ` Simon Baatz
0 siblings, 1 reply; 7+ messages in thread
From: Catalin Marinas @ 2012-05-28 6:21 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, May 28, 2012 at 06:35:59AM +0100, Simon Baatz wrote:
> On Mon, May 28, 2012 at 12:32:44PM +0800, Catalin Marinas wrote:
> > On Mon, May 28, 2012 at 12:11:58AM +0100, Simon Baatz wrote:
> > ...
> > > +void __flush_kernel_dcache_page(struct page *page)
> > > +{
> > > + struct address_space *mapping;
> > > +
> > > + mapping = page_mapping(page);
> > > +
> > > + if (!mapping || mapping_mapped(mapping))
> > > + __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> > > +}
> > > +EXPORT_SYMBOL(__flush_kernel_dcache_page);
> >
> > I wonder whether the above condition isn't always true after
> > get_user_pages().
>
> Not sure about corner cases, but I would assume that yes, this is the
> case. However, the block layer sees the pages from get_user_pages()
> directly only in the O_DIRECT case. Usually (read fault for page
> cache), flush_kernel_dcache_page() gets pages with mapping != NULL
> and mapping_mapped() == NULL (i.e. no user space mapping (yet)).
And we can probably assume that this is only for new pages with the
PG_dcache_clean bit already cleared (i.e. no need to set it again in
flush_kernel_dcache_page for the lazy case).
--
Catalin
^ permalink raw reply [flat|nested] 7+ messages in thread
* [RFC/PATCH 1/1] ARM: Handle user space mapped pages in flush_kernel_dcache_page
2012-05-28 6:21 ` Catalin Marinas
@ 2012-05-28 7:59 ` Simon Baatz
0 siblings, 0 replies; 7+ messages in thread
From: Simon Baatz @ 2012-05-28 7:59 UTC (permalink / raw)
To: linux-arm-kernel
Hi Catalin,
On Mon, May 28, 2012 at 02:21:00PM +0800, Catalin Marinas wrote:
> On Mon, May 28, 2012 at 06:35:59AM +0100, Simon Baatz wrote:
> > On Mon, May 28, 2012 at 12:32:44PM +0800, Catalin Marinas wrote:
> > > On Mon, May 28, 2012 at 12:11:58AM +0100, Simon Baatz wrote:
> > > ...
> > > > +void __flush_kernel_dcache_page(struct page *page)
> > > > +{
> > > > + struct address_space *mapping;
> > > > +
> > > > + mapping = page_mapping(page);
> > > > +
> > > > + if (!mapping || mapping_mapped(mapping))
> > > > + __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
> > > > +}
> > > > +EXPORT_SYMBOL(__flush_kernel_dcache_page);
> > >
> > > I wonder whether the above condition isn't always true after
> > > get_user_pages().
> >
> > Not sure about corner cases, but I would assume that yes, this is the
> > case. However, the block layer sees the pages from get_user_pages()
> > directly only in the O_DIRECT case. Usually (read fault for page
> > cache), flush_kernel_dcache_page() gets pages with mapping != NULL
> > and mapping_mapped() == NULL (i.e. no user space mapping (yet)).
>
> And we can probably assume that this is only for new pages with the
> PG_dcache_clean bit already cleared (i.e. no need to set it again in
> flush_kernel_dcache_page for the lazy case).
At least this is what the implementation assumed before and what the
proposed change continues to assume. Since this assumption has been
in mainline since beginning of 2011 or so and since it applies to the
likely case, I think that the assumption should be safe.
- Simon
^ permalink raw reply [flat|nested] 7+ messages in thread
* [RFC/PATCH 0/1] ARM: Handle user space mapped pages in flush_kernel_dcache_page
2012-05-27 23:11 [RFC/PATCH 0/1] ARM: Handle user space mapped pages in flush_kernel_dcache_page Simon Baatz
2012-05-27 23:11 ` [RFC/PATCH 1/1] " Simon Baatz
@ 2012-06-17 19:13 ` Simon Baatz
1 sibling, 0 replies; 7+ messages in thread
From: Simon Baatz @ 2012-06-17 19:13 UTC (permalink / raw)
To: linux-arm-kernel
Ping. Although the problem affects only VIVT (and probably VIPT
aliasing) architectures and only direct I/O on a small fraction of
device drivers (those using flush_kernel_dcache_page()), it results
in serious data corruption in these cases. LVM2 is affected by this
and does not work. I think we should fix that one way or the other.
- Simon
On Mon, May 28, 2012 at 01:11:57AM +0200, Simon Baatz wrote:
> Patch f8b63c1 "ARM: 6382/1" changed flush_kernel_dcache_page() to a
> no-op. flush_kernel_dcache_page was deemed superfluous as newly
> allocated page cache pages are now dirty by default (patch 6379/1).
>
> This relies on the assumption that a block device driver only reads
> into new pages (otherwise the request is served from the cache).
>
> However, this assumption is not true for direct IO or SG_IO (see
> e.g. [1] and [2]). This is why vgchange fails with mv_cesa as reported
> by me in [3]. (Btw., flush_kernel_dcache_page is also called from
> "copy_strings" in fs/exec.c when copying env/arg strings between
> processes. Thus, its use is not restricted to device drivers.)
>
> At the end of this mail is a small program that uses O_DIRECT to
> specifically create two test cases:
> - device driver gets an anonymous page to read into
> - device driver gets a page with user mapping(s) to read into
>
> The source can also be obtained from [4]. With mv_cesa, both cases
> return garbled data (the scatterlist memory iterator uses
> flush_kernel_dcache_page).
>
> As explained above, the problem is not specific to mv_cesa. As many
> drivers seem to use flush_dcache_page where they probably could use
> flush_kernel_dcache_page, the problem did not surface up to now. For
> example, one of these uses is in the scatterwalk implementation of
> linux crypto. When changing flush_dcache_page to
> flush_kernel_dcache_page in crypto/scatterwalk.c, I see the same
> coherency problems:
>
> root at ww1:~# rmmod mv_cesa
> root at ww1:~# cryptsetup luksOpen /dev/sda2 c_sda2
> Enter passphrase for /dev/sda2:
> root at ww1:~# ./mapping_tests
> Anonymous page: differs!
> User space mapped page: differs!
>
> The proposed fix is to let flush_kernel_dcache_page handle the 'kernel
> mapped only' case as before (no-op) but to flush the kernel mapping in
> case of a user space mapped page.
>
> This means that the original idea to 'fix' non-flushing PIO drivers by
> patch 6379/1 does not work for all cases. The provided patch should
> fix things for drivers that already perform cache flushing, but it
> obviously does not fix direct IO for PIO drivers without (I think
> there have been discussions for specific PIO mapping APIs in the
> past). This and my limited testing on only one architecture (kirkwood,
> VIVT) is why the patch is RFC.
>
> - Simon
>
> [1] https://lkml.org/lkml/2008/11/20/20
> [2] http://article.gmane.org/gmane.linux.kernel.cross-arch/5159
> [3] http://www.mail-archive.com/linux-crypto at vger.kernel.org/msg07038.html
> [4] http://ubuntuone.com/0jF9fNsguN8BvI1Z8fsBVI
>
>
> mapping_tests.c:
>
> #define _GNU_SOURCE
>
> #include <fcntl.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <string.h>
> #include <assert.h>
>
> const char* read_filename = "/dev/mapper/c_sda2";
> const char* map_filename = "./map.out";
>
> /* Init the page and read it to fill some cache lines */
> int wr_page(char *p, int ps) {
> int i, sum;
>
> memset(p, 9, ps);
> sum = 0;
> for(i = 0; i < ps; i++)
> sum += p[i];
> return sum;
> }
>
> int main(void) {
> void *org_buffer;
> void *odirect_buffer;
> char *mbuffer;
> int fd, fdm;
> int i, c;
> unsigned page_size;
>
> page_size = getpagesize();
>
> /* Read the first page into a normal buffer.
> If the page is not in the page cache yet, this will
> create a new page without user mappings in the kernel.
> */
> fd = open(read_filename, O_RDONLY);
> assert(fd >= 0);
> org_buffer = malloc(page_size);
> c = read(fd, org_buffer, page_size);
> assert(c == page_size);
> close(fd);
>
> /* Read the first page into an aligned malloced buffer using
> O_DIRECT. The block driver will get an anonymous page.
> */
> odirect_buffer = NULL;
> posix_memalign(&odirect_buffer, page_size, page_size);
> assert(odirect_buffer);
>
> fd = open(read_filename, O_RDONLY | O_DIRECT);
> assert(fd >= 0);
>
> for (i = 0 ; i < 100 ; i++) {
> wr_page((char *)odirect_buffer, page_size);
> lseek(fd, 0, SEEK_SET);
> c = read(fd, odirect_buffer, page_size);
> assert(c == page_size);
> if (memcmp(odirect_buffer, org_buffer, page_size) != 0) {
> printf("Anonymous page: differs!\n");
> break;
> }
> }
> close(fd);
> free(odirect_buffer);
>
> /* Read the first page into an mmapped file (using O_DIRECT).
> The block driver will get a file backed page with user
> mappings.
> */
> fd = open(read_filename, O_RDONLY | O_DIRECT);
> assert(fd >= 0);
> fdm = open(map_filename, O_RDWR | O_CREAT | O_TRUNC);
> assert(fdm >= 0);
> c = write(fdm, org_buffer, page_size);
> assert(c == page_size);
> c = fsync(fdm);
> assert(c == 0);
> mbuffer = mmap(NULL, page_size, PROT_WRITE, MAP_SHARED, fdm, 0);
> for (i = 0 ; i < 100 ; i++) {
> wr_page((char *)mbuffer, page_size);
> lseek(fd, 0, SEEK_SET);
> c = read(fd, mbuffer, page_size);
> assert(c == page_size);
> if (memcmp(mbuffer, org_buffer, page_size) != 0) {
> printf("User space mapped page: differs!\n");
> break;
> }
> }
> munmap(mbuffer, page_size);
> close(fdm);
> close(fd);
> free(org_buffer);
> }
>
>
> Simon Baatz (1):
> ARM: Handle user space mapped pages in flush_kernel_dcache_page
>
> arch/arm/include/asm/cacheflush.h | 4 ++++
> arch/arm/mm/flush.c | 22 ++++++++++++++++++++++
> 2 files changed, 26 insertions(+)
>
> --
> 1.7.9.5
--
Simon Baatz <bmail@schnuecks.de>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-06-17 19:13 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-27 23:11 [RFC/PATCH 0/1] ARM: Handle user space mapped pages in flush_kernel_dcache_page Simon Baatz
2012-05-27 23:11 ` [RFC/PATCH 1/1] " Simon Baatz
2012-05-28 4:32 ` Catalin Marinas
2012-05-28 5:35 ` Simon Baatz
2012-05-28 6:21 ` Catalin Marinas
2012-05-28 7:59 ` Simon Baatz
2012-06-17 19:13 ` [RFC/PATCH 0/1] " Simon Baatz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).