* problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
@ 2009-12-09 22:39 James Bottomley
2009-12-09 22:45 ` Russell King
2009-12-09 23:03 ` Ilya Loginov
0 siblings, 2 replies; 31+ messages in thread
From: James Bottomley @ 2009-12-09 22:39 UTC (permalink / raw)
To: Ilya Loginov, Jens Axboe; +Cc: linux-arch
Sorry about being late to the party. The original patch didn't get cc'd
to linux-arch (in spite of touching header files in every arch).
The problem mtd appears to have (from reading the code) is that it wants
to do pio to and from a device from the kernel view of the address.
What's happening in mtd looks to be wrong on several levels.
Firstly, the pio code in mtd_blkdevs.c starts with req->buffer. This is
really unsafe, because if the page wasn't mappable (like you're on a
highmem architecture) then req->buffer will be NULL.
Then this extra helpers added to block: rq_flush_dcache_pages() loops
over every segment of the request calling flush_dcache_page() for the
page. If you read the mtd code, it acts on a bio at a time so a request
with N bios flushes every page in the request N times (i'e N^2 flushes).
I'm assuming there's something somewhere I'm missing that restricts mtd
to single bio requests, which makes this all OK?
Finally, if you look at the actual use, this is using a sledgehammer to
crack a nut. For writes via the kernel address, all you need to assure
is that the user aliases of the page were flushed as the bio was
constructed ... this is actually done in get_user pages, so this chunk
of code:
> case WRITE:
> if (!tr->writesect)
> return -EIO;
>
>+ rq_flush_dcache_pages(req);
> for (; nsect > 0; nsect--, block++, buf += tr->blksize)
> if (tr->writesect(dev, block, buf))
> return -EIO;
Should be completely unnecessary. (well except that mtd might need to
kmap somewhere).
The pio read case is the problematic one, because you dirty the kernel
alias by writing the read data to it and have to flush that before it's
made visible to the user alias view. The API for doing this is
flush_kernel_dcache_page() ... it *only* flushes the kernel view, not
the user view. The reason for this is that if the arch has to protect
the user aliases against speculative movein, that's done in the DMA API
before the request is completed.
So for this:
> case READ:
> for (; nsect > 0; nsect--, block++, buf += tr->blksize)
> if (tr->readsect(dev, block, buf))
> return -EIO;
>+ rq_flush_dcache_pages(req);
> return 0;
Actually all you need to do is loop over the pages and call
flush_kernel_dcache_page().
Realistically, if I were thinking about the API properly, I'd probably
have kunmap flush the kernel view if the dirty bit were set, thus
relieving the user of the need to know anything at all about any of
this. We should also have an adaption of the DMA API that tells the
arch that the mapping (and flushing) is only being done for pio not mmio
(so no need to program iommus or set up shared bus coherency registers).
That way we'd have a completely consistent and uniform API for all
drivers (whether PIO, MMIO or a mixture).
James
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-09 22:39 problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages) James Bottomley
@ 2009-12-09 22:45 ` Russell King
2009-12-09 22:56 ` James Bottomley
2009-12-09 23:03 ` Ilya Loginov
1 sibling, 1 reply; 31+ messages in thread
From: Russell King @ 2009-12-09 22:45 UTC (permalink / raw)
To: James Bottomley; +Cc: Ilya Loginov, Jens Axboe, linux-arch
On Wed, Dec 09, 2009 at 04:39:06PM -0600, James Bottomley wrote:
> Realistically, if I were thinking about the API properly, I'd probably
> have kunmap flush the kernel view if the dirty bit were set, thus
> relieving the user of the need to know anything at all about any of
> this.
Which dirty bit are you referring to?
New page cache pages do not have any dirty bit for this. There is
PG_arch_1, but this is set by flush_dcache_page on architectures
playing the delayed-dcache-flush game... so making a call to
some cache flushing function conditional on it isn't going to work.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-09 22:45 ` Russell King
@ 2009-12-09 22:56 ` James Bottomley
0 siblings, 0 replies; 31+ messages in thread
From: James Bottomley @ 2009-12-09 22:56 UTC (permalink / raw)
To: Russell King; +Cc: Ilya Loginov, Jens Axboe, linux-arch
On Wed, 2009-12-09 at 22:45 +0000, Russell King wrote:
> On Wed, Dec 09, 2009 at 04:39:06PM -0600, James Bottomley wrote:
> > Realistically, if I were thinking about the API properly, I'd probably
> > have kunmap flush the kernel view if the dirty bit were set, thus
> > relieving the user of the need to know anything at all about any of
> > this.
>
> Which dirty bit are you referring to?
>
> New page cache pages do not have any dirty bit for this. There is
> PG_arch_1, but this is set by flush_dcache_page on architectures
> playing the delayed-dcache-flush game... so making a call to
> some cache flushing function conditional on it isn't going to work.
Sorry, was thinking of the one in the architecture page tables, not the
one in struct page ... the only reason for checking is that page flush
is an expensive op (at least on parisc) so we really don't want to do it
if the page was never written to.
James
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-09 22:39 problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages) James Bottomley
2009-12-09 22:45 ` Russell King
@ 2009-12-09 23:03 ` Ilya Loginov
2009-12-09 23:11 ` James Bottomley
1 sibling, 1 reply; 31+ messages in thread
From: Ilya Loginov @ 2009-12-09 23:03 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, linux-arch
On Wed, 09 Dec 2009 16:39:06 -0600
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> Then this extra helpers added to block: rq_flush_dcache_pages() loops
> over every segment of the request calling flush_dcache_page() for the
> page. If you read the mtd code, it acts on a bio at a time so a request
> with N bios flushes every page in the request N times (i'e N^2 flushes).
> I'm assuming there's something somewhere I'm missing that restricts mtd
> to single bio requests, which makes this all OK?
Sorry. But why do you think that do_blktrans_request is dealing with bio?
May be i don't know something...
> The pio read case is the problematic one, because you dirty the kernel
> alias by writing the read data to it and have to flush that before it's
> made visible to the user alias view. The API for doing this is
> flush_kernel_dcache_page() ... it *only* flushes the kernel view, not
> the user view. The reason for this is that if the arch has to protect
> the user aliases against speculative movein, that's done in the DMA API
> before the request is completed.
>
> So for this:
>
> > case READ:
> > for (; nsect > 0; nsect--, block++, buf += tr->blksize)
> > if (tr->readsect(dev, block, buf))
> > return -EIO;
> >+ rq_flush_dcache_pages(req);
> > return 0;
>
> Actually all you need to do is loop over the pages and call
> flush_kernel_dcache_page().
I don't think so. Please reread our discussion. I have this bug on system where
icache don't look for code in dcache. And I need flush dcache exactly in
physical layer.
--
Ilya Loginov <isloginov@gmail.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-09 23:03 ` Ilya Loginov
@ 2009-12-09 23:11 ` James Bottomley
2009-12-09 23:36 ` Ilya Loginov
0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2009-12-09 23:11 UTC (permalink / raw)
To: Ilya Loginov; +Cc: Jens Axboe, linux-arch
On Thu, 2009-12-10 at 02:03 +0300, Ilya Loginov wrote:
> On Wed, 09 Dec 2009 16:39:06 -0600
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> > Then this extra helpers added to block: rq_flush_dcache_pages() loops
> > over every segment of the request calling flush_dcache_page() for the
> > page. If you read the mtd code, it acts on a bio at a time so a request
> > with N bios flushes every page in the request N times (i'e N^2 flushes).
> > I'm assuming there's something somewhere I'm missing that restricts mtd
> > to single bio requests, which makes this all OK?
>
> Sorry. But why do you think that do_blktrans_request is dealing with bio?
> May be i don't know something...
I was basing it on the fact that req->data is bio_data() which is
initialised per bio. What mtd does can't really work unless it's doing
a bio at a time (or has small limits so that all requests are bios).
> > The pio read case is the problematic one, because you dirty the kernel
> > alias by writing the read data to it and have to flush that before it's
> > made visible to the user alias view. The API for doing this is
> > flush_kernel_dcache_page() ... it *only* flushes the kernel view, not
> > the user view. The reason for this is that if the arch has to protect
> > the user aliases against speculative movein, that's done in the DMA API
> > before the request is completed.
> >
> > So for this:
> >
> > > case READ:
> > > for (; nsect > 0; nsect--, block++, buf += tr->blksize)
> > > if (tr->readsect(dev, block, buf))
> > > return -EIO;
> > >+ rq_flush_dcache_pages(req);
> > > return 0;
> >
> > Actually all you need to do is loop over the pages and call
> > flush_kernel_dcache_page().
>
> I don't think so. Please reread our discussion.
As I said previously, I seem to have missed the discussion.
> I have this bug on system where
> icache don't look for code in dcache. And I need flush dcache exactly in
> physical layer.
icache and dcache are usually separate. On almost every architecture
you have to flush the dcache to RAM before the icache can pull it in.
However, think about where the data is in the pio read case: it's in
the cache above the kernel alias. Once you flush that cache to ram, the
icache can pick it out ... you don't have to reflush all the untouched
user aliases, you just use flush_kernel_dcache_page() which places it in
the ram.
James
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-09 23:11 ` James Bottomley
@ 2009-12-09 23:36 ` Ilya Loginov
2009-12-09 23:47 ` James Bottomley
0 siblings, 1 reply; 31+ messages in thread
From: Ilya Loginov @ 2009-12-09 23:36 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, linux-arch
On Wed, 09 Dec 2009 17:11:13 -0600
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
Sorry. But I didn't understand again. You wrote:
> > > If you read the mtd code, it acts on a bio at a time so a request
> > > with N bios flushes every page in the request N times (i'e N^2 flushes).
I think that do_blktrans_request is calling for request and that every page in
request flushes exactly once.
> > > The pio read case is the problematic one, because you dirty the kernel
> > > alias by writing the read data to it and have to flush that before it's
> > > made visible to the user alias view. The API for doing this is
> > > flush_kernel_dcache_page() ... it *only* flushes the kernel view, not
> > > the user view. The reason for this is that if the arch has to protect
> > > the user aliases against speculative movein, that's done in the DMA API
> > > before the request is completed.
> > >
> > > So for this:
> > >
> > > > case READ:
> > > > for (; nsect > 0; nsect--, block++, buf += tr->blksize)
> > > > if (tr->readsect(dev, block, buf))
> > > > return -EIO;
> > > >+ rq_flush_dcache_pages(req);
> > > > return 0;
> > >
> > > Actually all you need to do is loop over the pages and call
> > > flush_kernel_dcache_page().
> >
> > I don't think so. Please reread our discussion.
>
> As I said previously, I seem to have missed the discussion.
>
> > I have this bug on system where
> > icache don't look for code in dcache. And I need flush dcache exactly in
> > physical layer.
>
> icache and dcache are usually separate. On almost every architecture
> you have to flush the dcache to RAM before the icache can pull it in.
> However, think about where the data is in the pio read case: it's in
> the cache above the kernel alias. Once you flush that cache to ram, the
> icache can pick it out ... you don't have to reflush all the untouched
> user aliases, you just use flush_kernel_dcache_page() which places it in
> the ram.
But the flush_kernel_dcache_page() call exists only for sh and parisc. For
other architectures this call is empty. So, we will not fix the bug if we
call flush_kernel_dcache_page().
But. I could do that rq_flush_dcache_pages will call flush_kernel_dcache_page
for architectures where ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE was defined.
--
Ilya Loginov <isloginov@gmail.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-09 23:36 ` Ilya Loginov
@ 2009-12-09 23:47 ` James Bottomley
2009-12-10 0:06 ` Ilya Loginov
0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2009-12-09 23:47 UTC (permalink / raw)
To: Ilya Loginov; +Cc: Jens Axboe, linux-arch
On Thu, 2009-12-10 at 02:36 +0300, Ilya Loginov wrote:
> On Wed, 09 Dec 2009 17:11:13 -0600
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> Sorry. But I didn't understand again. You wrote:
>
> > > > If you read the mtd code, it acts on a bio at a time so a request
> > > > with N bios flushes every page in the request N times (i'e N^2 flushes).
>
> I think that do_blktrans_request is calling for request and that every page in
> request flushes exactly once.
If the request contains multiple bios, the code in mtd is wrong ...
that's not really anything to do with this patch, it was an observation
along the way.
> > > > The pio read case is the problematic one, because you dirty the kernel
> > > > alias by writing the read data to it and have to flush that before it's
> > > > made visible to the user alias view. The API for doing this is
> > > > flush_kernel_dcache_page() ... it *only* flushes the kernel view, not
> > > > the user view. The reason for this is that if the arch has to protect
> > > > the user aliases against speculative movein, that's done in the DMA API
> > > > before the request is completed.
> > > >
> > > > So for this:
> > > >
> > > > > case READ:
> > > > > for (; nsect > 0; nsect--, block++, buf += tr->blksize)
> > > > > if (tr->readsect(dev, block, buf))
> > > > > return -EIO;
> > > > >+ rq_flush_dcache_pages(req);
> > > > > return 0;
> > > >
> > > > Actually all you need to do is loop over the pages and call
> > > > flush_kernel_dcache_page().
> > >
> > > I don't think so. Please reread our discussion.
> >
> > As I said previously, I seem to have missed the discussion.
> >
> > > I have this bug on system where
> > > icache don't look for code in dcache. And I need flush dcache exactly in
> > > physical layer.
> >
> > icache and dcache are usually separate. On almost every architecture
> > you have to flush the dcache to RAM before the icache can pull it in.
> > However, think about where the data is in the pio read case: it's in
> > the cache above the kernel alias. Once you flush that cache to ram, the
> > icache can pick it out ... you don't have to reflush all the untouched
> > user aliases, you just use flush_kernel_dcache_page() which places it in
> > the ram.
>
> But the flush_kernel_dcache_page() call exists only for sh and parisc. For
> other architectures this call is empty. So, we will not fix the bug if we
> call flush_kernel_dcache_page().
That's because they were the only ones having trouble with this type of
coherency. I can see that sparc also would, but they've never reported
the issue. The API was introduced to get fuse to work on VIPT (and
VIVT) systems.
Which architecture is this? ... because if it's missing a necessary
definition for flush_kernel_dcache_page() it's very easy to add it ...
> But. I could do that rq_flush_dcache_pages will call flush_kernel_dcache_page
> for architectures where ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE was defined.
The point I'm trying to make is that flush_dcache_page() does a lot of
unnecessary flushing. Where you are in the system with the READ call,
you know the user aliases are clean (because users aren't allowed to
touch pages submitted for write), so you only (for efficiency) need to
flush the dirty kernel alias.
James
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-09 23:47 ` James Bottomley
@ 2009-12-10 0:06 ` Ilya Loginov
2009-12-10 0:19 ` James Bottomley
0 siblings, 1 reply; 31+ messages in thread
From: Ilya Loginov @ 2009-12-10 0:06 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, linux-arch
On Wed, 09 Dec 2009 17:47:51 -0600
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> Which architecture is this? ... because if it's missing a necessary
> definition for flush_kernel_dcache_page() it's very easy to add it ...
This is a MIPS. Why? The call flush_dcache_page() on MIPS is lazy enought.
And it do exactly what i need to fix the problem.
> > But. I could do that rq_flush_dcache_pages will call flush_kernel_dcache_page
> > for architectures where ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE was defined.
>
> The point I'm trying to make is that flush_dcache_page() does a lot of
> unnecessary flushing. Where you are in the system with the READ call,
> you know the user aliases are clean (because users aren't allowed to
> touch pages submitted for write), so you only (for efficiency) need to
> flush the dirty kernel alias.
I understand that in your case(parisc) solution with flushdcache_page()
is very voracious. But I don't think that we should change something
somewhere else except parisc. Or we should write to Ralf Baechle and
other maintainers and discuss all with them. Don't think?
--
Ilya Loginov <isloginov@gmail.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 0:06 ` Ilya Loginov
@ 2009-12-10 0:19 ` James Bottomley
2009-12-10 4:40 ` Ilya Loginov
0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2009-12-10 0:19 UTC (permalink / raw)
To: Ilya Loginov; +Cc: Jens Axboe, linux-arch
On Thu, 2009-12-10 at 03:06 +0300, Ilya Loginov wrote:
> On Wed, 09 Dec 2009 17:47:51 -0600
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> > Which architecture is this? ... because if it's missing a necessary
> > definition for flush_kernel_dcache_page() it's very easy to add it ...
>
> This is a MIPS. Why? The call flush_dcache_page() on MIPS is lazy enought.
> And it do exactly what i need to fix the problem.
OK, but the point I'm making is that it's a very heavyweight function on
a lot of architectures. It sounds like mips should just have a
flush_kernel_dcache_page() ... has anyone tested fuse on mips; if that
fails, then it's a must.
> > > But. I could do that rq_flush_dcache_pages will call flush_kernel_dcache_page
> > > for architectures where ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE was defined.
> >
> > The point I'm trying to make is that flush_dcache_page() does a lot of
> > unnecessary flushing. Where you are in the system with the READ call,
> > you know the user aliases are clean (because users aren't allowed to
> > touch pages submitted for write), so you only (for efficiency) need to
> > flush the dirty kernel alias.
>
> I understand that in your case(parisc) solution with flushdcache_page()
> is very voracious.
It's not the worst ... VIVT arm is the worst because it loops over every
user mapping of the page.
> But I don't think that we should change something
> somewhere else except parisc. Or we should write to Ralf Baechle and
> other maintainers and discuss all with them. Don't think?
We actually are ... that's what this linux-arch list is for: contacting
all the architecture maintainers.
The problem seems to be defined as one of ensuring coherency on PIO
block devices in the most efficient manner possible.
Like I said previously, I still think some extension to the DMA API to
map the areas correctly might be the best way forwards.
James
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 0:19 ` James Bottomley
@ 2009-12-10 4:40 ` Ilya Loginov
2009-12-10 17:07 ` James Bottomley
0 siblings, 1 reply; 31+ messages in thread
From: Ilya Loginov @ 2009-12-10 4:40 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, linux-arch
On Wed, 09 Dec 2009 18:19:55 -0600
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> OK, but the point I'm making is that it's a very heavyweight function on
> a lot of architectures. It sounds like mips should just have a
> flush_kernel_dcache_page() ... has anyone tested fuse on mips; if that
> fails, then it's a must.
Yes, it works. There are many embedded systems which are based on MIPS.
For example, there is MIPS in my router. I use fuse on it(it is required
by ntfs-3g). OpenWRT.
> The problem seems to be defined as one of ensuring coherency on PIO
> block devices in the most efficient manner possible.
> Like I said previously, I still think some extension to the DMA API to
> map the areas correctly might be the best way forwards.
What do you mean under DMA API? Do you mean that we should fix memcpy?
--
Ilya Loginov <isloginov@gmail.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 4:40 ` Ilya Loginov
@ 2009-12-10 17:07 ` James Bottomley
2009-12-10 17:48 ` Russell King
2009-12-10 19:46 ` Ilya Loginov
0 siblings, 2 replies; 31+ messages in thread
From: James Bottomley @ 2009-12-10 17:07 UTC (permalink / raw)
To: Ilya Loginov; +Cc: Jens Axboe, linux-arch
On Thu, 2009-12-10 at 07:40 +0300, Ilya Loginov wrote:
> On Wed, 09 Dec 2009 18:19:55 -0600
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> > OK, but the point I'm making is that it's a very heavyweight function on
> > a lot of architectures. It sounds like mips should just have a
> > flush_kernel_dcache_page() ... has anyone tested fuse on mips; if that
> > fails, then it's a must.
>
> Yes, it works. There are many embedded systems which are based on MIPS.
> For example, there is MIPS in my router. I use fuse on it(it is required
> by ntfs-3g). OpenWRT.
Yes ... apparently mips implements flush_anon_page(); I'm misremembering
how fuse got fixed.
> > The problem seems to be defined as one of ensuring coherency on PIO
> > block devices in the most efficient manner possible.
>
> > Like I said previously, I still think some extension to the DMA API to
> > map the areas correctly might be the best way forwards.
>
> What do you mean under DMA API? Do you mean that we should fix memcpy?
The DMA API prepares user pages to send to physical devices via MMIO.
It's an abstraction to prevent device driver writers having to remember
complex (and architecture specific) rules about page flushing.
For PIO we have this kmap/operate/flush_kernel_dcache_page()/kunmap
complexity (see for example drivers/mmc/mmc_spi.c). However, it could
all be taken care of in a similar way to the DMA API via an
abstraction ... although I suspect the best abstraction is to pull the
flush into kunmap.
What MTD device is this? The subsystem is very complex, but I suppose I
could trace the read path in a single device to see what the actual
problem is.
James
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 17:07 ` James Bottomley
@ 2009-12-10 17:48 ` Russell King
2009-12-10 17:59 ` James Bottomley
2009-12-10 19:46 ` Ilya Loginov
1 sibling, 1 reply; 31+ messages in thread
From: Russell King @ 2009-12-10 17:48 UTC (permalink / raw)
To: James Bottomley; +Cc: Ilya Loginov, Jens Axboe, linux-arch
On Thu, Dec 10, 2009 at 11:07:31AM -0600, James Bottomley wrote:
> For PIO we have this kmap/operate/flush_kernel_dcache_page()/kunmap
> complexity (see for example drivers/mmc/mmc_spi.c). However, it could
> all be taken care of in a similar way to the DMA API via an
> abstraction ... although I suspect the best abstraction is to pull the
> flush into kunmap.
I assume you actually mean kmap_atomic() / kunmap_atomic(), since that
would be used in interrupt driven PIO drivers rather than kmap()/kunmap().
Well, it would mean every kunmap_atomic() gains an expensive cache flush
no matter what it's doing. That would be very bad for things like
copy_user_highpage(), where we kmap the source and destination pages,
and then kunmap both.
However, there are cases where we do want to flush on kunmap_atomic() -
again, taking the copy_user_highpage() example, we want to ensure that
data written to the page is going to be visible in some way. IOW, we
already have this:
kfrom = kmap_atomic(from, KM_USER0);
kto = kmap_atomic(to, KM_USER1);
copy_page(kto, kfrom);
#ifdef CONFIG_HIGHMEM
/*
* kmap_atomic() doesn't set the page virtual address, and
* kunmap_atomic() takes care of cache flushing already.
*/
if (page_address(to) != NULL)
#endif
__cpuc_flush_dcache_page(kto);
kunmap_atomic(kto, KM_USER1);
kunmap_atomic(kfrom, KM_USER0);
would become something like:
...
kunmap_atomic_flush(kto, KM_USER1);
kunmap_atomic(kfrom, KM_USER0);
So I think what we want to add is kunmap_atomic_flush() for the cases
where we need the additional coherency, or maybe a kunmap_atomic_noflush()
for those which we don't.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 17:48 ` Russell King
@ 2009-12-10 17:59 ` James Bottomley
2009-12-10 18:06 ` Russell King
0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2009-12-10 17:59 UTC (permalink / raw)
To: Russell King; +Cc: Ilya Loginov, Jens Axboe, linux-arch
On Thu, 2009-12-10 at 17:48 +0000, Russell King wrote:
> On Thu, Dec 10, 2009 at 11:07:31AM -0600, James Bottomley wrote:
> > For PIO we have this kmap/operate/flush_kernel_dcache_page()/kunmap
> > complexity (see for example drivers/mmc/mmc_spi.c). However, it could
> > all be taken care of in a similar way to the DMA API via an
> > abstraction ... although I suspect the best abstraction is to pull the
> > flush into kunmap.
>
> I assume you actually mean kmap_atomic() / kunmap_atomic(), since that
> would be used in interrupt driven PIO drivers rather than kmap()/kunmap().
Yes, just using shorthand.
> Well, it would mean every kunmap_atomic() gains an expensive cache flush
> no matter what it's doing. That would be very bad for things like
> copy_user_highpage(), where we kmap the source and destination pages,
> and then kunmap both.
>
> However, there are cases where we do want to flush on kunmap_atomic() -
> again, taking the copy_user_highpage() example, we want to ensure that
> data written to the page is going to be visible in some way. IOW, we
> already have this:
>
> kfrom = kmap_atomic(from, KM_USER0);
> kto = kmap_atomic(to, KM_USER1);
> copy_page(kto, kfrom);
> #ifdef CONFIG_HIGHMEM
> /*
> * kmap_atomic() doesn't set the page virtual address, and
> * kunmap_atomic() takes care of cache flushing already.
> */
> if (page_address(to) != NULL)
> #endif
> __cpuc_flush_dcache_page(kto);
> kunmap_atomic(kto, KM_USER1);
> kunmap_atomic(kfrom, KM_USER0);
>
> would become something like:
>
> ...
> kunmap_atomic_flush(kto, KM_USER1);
> kunmap_atomic(kfrom, KM_USER0);
>
> So I think what we want to add is kunmap_atomic_flush() for the cases
> where we need the additional coherency, or maybe a kunmap_atomic_noflush()
> for those which we don't.
So if you think about it on a VI architecture, assuming we modified some
data in the kmap page at the returned address, why would we ever want to
unmap without flushing? The only case I can think of is when we *know*
the kmap address and the other addresses are all congruent (so we have
no aliases).
So I really think in kunmap(_atomic) we need to check to see if the page
was modified (using the pte flags), and if it was, and it's not
congruent, we flush ... otherwise we've left a dirty cache line above an
unmapped area (which is an illegal condition, at least on parisc).
The problems, as you rightly point out, come because most of our uses of
kmap/kunmap already have a flush built into them if they modify the
mapped page. However, doesn't it make sense semantically to pull all of
these flushes into the unmap (as in modify a lot of code to remove the
explicit flush)?
What this does is have kmap/kunmap automatically do the architecturally
right thing, and the user never needs to worry about flushing.
James
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 17:59 ` James Bottomley
@ 2009-12-10 18:06 ` Russell King
2009-12-10 18:20 ` James Bottomley
0 siblings, 1 reply; 31+ messages in thread
From: Russell King @ 2009-12-10 18:06 UTC (permalink / raw)
To: James Bottomley; +Cc: Ilya Loginov, Jens Axboe, linux-arch
On Thu, Dec 10, 2009 at 11:59:16AM -0600, James Bottomley wrote:
> On Thu, 2009-12-10 at 17:48 +0000, Russell King wrote:
> > Well, it would mean every kunmap_atomic() gains an expensive cache flush
> > no matter what it's doing. That would be very bad for things like
> > copy_user_highpage(), where we kmap the source and destination pages,
> > and then kunmap both.
> >
> > However, there are cases where we do want to flush on kunmap_atomic() -
> > again, taking the copy_user_highpage() example, we want to ensure that
> > data written to the page is going to be visible in some way. IOW, we
> > already have this:
> >
> > kfrom = kmap_atomic(from, KM_USER0);
> > kto = kmap_atomic(to, KM_USER1);
> > copy_page(kto, kfrom);
> > #ifdef CONFIG_HIGHMEM
> > /*
> > * kmap_atomic() doesn't set the page virtual address, and
> > * kunmap_atomic() takes care of cache flushing already.
> > */
> > if (page_address(to) != NULL)
> > #endif
> > __cpuc_flush_dcache_page(kto);
> > kunmap_atomic(kto, KM_USER1);
> > kunmap_atomic(kfrom, KM_USER0);
> >
> > would become something like:
> >
> > ...
> > kunmap_atomic_flush(kto, KM_USER1);
> > kunmap_atomic(kfrom, KM_USER0);
> >
> > So I think what we want to add is kunmap_atomic_flush() for the cases
> > where we need the additional coherency, or maybe a kunmap_atomic_noflush()
> > for those which we don't.
>
> So if you think about it on a VI architecture, assuming we modified some
> data in the kmap page at the returned address, why would we ever want to
> unmap without flushing? The only case I can think of is when we *know*
> the kmap address and the other addresses are all congruent (so we have
> no aliases).
The above example code comes from non-aliasing VIPT - where for the
vast majority of cases, unmapping without flush is fine provided we
haven't written data. However, unmapping with flush is required to
ensure coherency with the instruction cache.
> So I really think in kunmap(_atomic) we need to check to see if the page
> was modified (using the pte flags),
That's fine _if_ you have such flags. Not everything has - in which
case, going down the route you're proposing means that every single
kunmap_atomic() ends up having to flush the whole page whether it's
actually needed on an architecture "just because" - with no technical
reason to actually do so.
We need the two cases separated for hardware which is not PARISC.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 18:06 ` Russell King
@ 2009-12-10 18:20 ` James Bottomley
2009-12-10 19:05 ` Russell King
2009-12-10 19:42 ` Ilya Loginov
0 siblings, 2 replies; 31+ messages in thread
From: James Bottomley @ 2009-12-10 18:20 UTC (permalink / raw)
To: Russell King; +Cc: Ilya Loginov, Jens Axboe, linux-arch
On Thu, 2009-12-10 at 18:06 +0000, Russell King wrote:
> On Thu, Dec 10, 2009 at 11:59:16AM -0600, James Bottomley wrote:
> > On Thu, 2009-12-10 at 17:48 +0000, Russell King wrote:
> > > Well, it would mean every kunmap_atomic() gains an expensive cache flush
> > > no matter what it's doing. That would be very bad for things like
> > > copy_user_highpage(), where we kmap the source and destination pages,
> > > and then kunmap both.
> > >
> > > However, there are cases where we do want to flush on kunmap_atomic() -
> > > again, taking the copy_user_highpage() example, we want to ensure that
> > > data written to the page is going to be visible in some way. IOW, we
> > > already have this:
> > >
> > > kfrom = kmap_atomic(from, KM_USER0);
> > > kto = kmap_atomic(to, KM_USER1);
> > > copy_page(kto, kfrom);
> > > #ifdef CONFIG_HIGHMEM
> > > /*
> > > * kmap_atomic() doesn't set the page virtual address, and
> > > * kunmap_atomic() takes care of cache flushing already.
> > > */
> > > if (page_address(to) != NULL)
> > > #endif
> > > __cpuc_flush_dcache_page(kto);
> > > kunmap_atomic(kto, KM_USER1);
> > > kunmap_atomic(kfrom, KM_USER0);
> > >
> > > would become something like:
> > >
> > > ...
> > > kunmap_atomic_flush(kto, KM_USER1);
> > > kunmap_atomic(kfrom, KM_USER0);
> > >
> > > So I think what we want to add is kunmap_atomic_flush() for the cases
> > > where we need the additional coherency, or maybe a kunmap_atomic_noflush()
> > > for those which we don't.
> >
> > So if you think about it on a VI architecture, assuming we modified some
> > data in the kmap page at the returned address, why would we ever want to
> > unmap without flushing? The only case I can think of is when we *know*
> > the kmap address and the other addresses are all congruent (so we have
> > no aliases).
>
> The above example code comes from non-aliasing VIPT - where for the
> vast majority of cases, unmapping without flush is fine provided we
> haven't written data. However, unmapping with flush is required to
> ensure coherency with the instruction cache.
right, but you can check those two cases in the unmap, can't you?
> > So I really think in kunmap(_atomic) we need to check to see if the page
> > was modified (using the pte flags),
>
> That's fine _if_ you have such flags. Not everything has - in which
> case, going down the route you're proposing means that every single
> kunmap_atomic() ends up having to flush the whole page whether it's
> actually needed on an architecture "just because" - with no technical
> reason to actually do so.
>
> We need the two cases separated for hardware which is not PARISC.
So having such a flag is a requirement of the linux mm code, isn't it?
I thought what you did on arm was mark the page read only (even if it's
supposed to be read/write) and then trap on the write request and update
the dirty bit and set the page to read/write ... don't you do that
anymore?
I'm not religiously opposed to the separation into a flush and a non
flush case ... although I think if we have to do this, it's equivalent
to just forcing users to add the flush_kernel_dcache_page() ... but if
we can do it so that the users don't need to know the details, I think
the API is much better.
James
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 18:20 ` James Bottomley
@ 2009-12-10 19:05 ` Russell King
2009-12-10 20:29 ` James Bottomley
2009-12-10 19:42 ` Ilya Loginov
1 sibling, 1 reply; 31+ messages in thread
From: Russell King @ 2009-12-10 19:05 UTC (permalink / raw)
To: James Bottomley; +Cc: Ilya Loginov, Jens Axboe, linux-arch
On Thu, Dec 10, 2009 at 12:20:55PM -0600, James Bottomley wrote:
> On Thu, 2009-12-10 at 18:06 +0000, Russell King wrote:
> > The above example code comes from non-aliasing VIPT - where for the
> > vast majority of cases, unmapping without flush is fine provided we
> > haven't written data. However, unmapping with flush is required to
> > ensure coherency with the instruction cache.
>
> right, but you can check those two cases in the unmap, can't you?
How? (I'd have thought that would've been plainly obvious since I wrote
in the quoted bit below "_if_ you have such flags".)
> > > So I really think in kunmap(_atomic) we need to check to see if the page
> > > was modified (using the pte flags),
> >
> > That's fine _if_ you have such flags. Not everything has - in which
> > case, going down the route you're proposing means that every single
> > kunmap_atomic() ends up having to flush the whole page whether it's
> > actually needed on an architecture "just because" - with no technical
> > reason to actually do so.
> >
> > We need the two cases separated for hardware which is not PARISC.
>
> So having such a flag is a requirement of the linux mm code, isn't it?
>
> I thought what you did on arm was mark the page read only (even if it's
> supposed to be read/write) and then trap on the write request and update
> the dirty bit and set the page to read/write ... don't you do that
> anymore?
We do that for user pages, and only user pages - it's partly maintained
by the generic kernel code, and partly by the page table attribute
translation. We only make pages _user_ writable if they have the Linux
'write' and Linux 'dirty' bits set. However, they remain writable from
normal kernel stores - but we use a special instruction to access them
which ensures that the user mode permissions get checked.
Essentially, the protections that the majority of ARM CPUs have available
to them are:
User Kernel
0: No access Read only
1: No access Read/write
2: Read only Read/write
3: Read/write Read/write
The logic we use for implementing the userspace dirty support switches
the page permissions between case 2 and 3 - which is going to be of no
use for kernel accesses.
Moreover, we don't map kernel RAM using 4K pages - we map it using 1MB
section mappings to save the TLB from being cycled through. If we were
to apply the same principle there, we'd have to do it on a 1MB by 1MB
basis, or take an additional memory hit with TLB usage.
So, in order to have bits for each page, what you're asking for is:
- avoid using efficient section mappings which only use 1 level of
page table, map everything using 2 levels of page tables using 4K
pages.
- add code to handle an additional special "dirty" bit processing for
kernel pages.
I think that is far too inefficient an option to even contemplate.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 19:05 ` Russell King
@ 2009-12-10 20:29 ` James Bottomley
2009-12-10 20:39 ` Russell King
0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2009-12-10 20:29 UTC (permalink / raw)
To: Russell King; +Cc: Ilya Loginov, Jens Axboe, linux-arch
On Thu, 2009-12-10 at 19:05 +0000, Russell King wrote:
> On Thu, Dec 10, 2009 at 12:20:55PM -0600, James Bottomley wrote:
> > On Thu, 2009-12-10 at 18:06 +0000, Russell King wrote:
> > > The above example code comes from non-aliasing VIPT - where for the
> > > vast majority of cases, unmapping without flush is fine provided we
> > > haven't written data. However, unmapping with flush is required to
> > > ensure coherency with the instruction cache.
> >
> > right, but you can check those two cases in the unmap, can't you?
>
> How? (I'd have thought that would've been plainly obvious since I wrote
> in the quoted bit below "_if_ you have such flags".)
>
> > > > So I really think in kunmap(_atomic) we need to check to see if the page
> > > > was modified (using the pte flags),
> > >
> > > That's fine _if_ you have such flags. Not everything has - in which
> > > case, going down the route you're proposing means that every single
> > > kunmap_atomic() ends up having to flush the whole page whether it's
> > > actually needed on an architecture "just because" - with no technical
> > > reason to actually do so.
> > >
> > > We need the two cases separated for hardware which is not PARISC.
> >
> > So having such a flag is a requirement of the linux mm code, isn't it?
> >
> > I thought what you did on arm was mark the page read only (even if it's
> > supposed to be read/write) and then trap on the write request and update
> > the dirty bit and set the page to read/write ... don't you do that
> > anymore?
>
> We do that for user pages, and only user pages - it's partly maintained
> by the generic kernel code, and partly by the page table attribute
> translation. We only make pages _user_ writable if they have the Linux
> 'write' and Linux 'dirty' bits set. However, they remain writable from
> normal kernel stores - but we use a special instruction to access them
> which ensures that the user mode permissions get checked.
>
> Essentially, the protections that the majority of ARM CPUs have available
> to them are:
>
> User Kernel
> 0: No access Read only
> 1: No access Read/write
> 2: Read only Read/write
> 3: Read/write Read/write
>
> The logic we use for implementing the userspace dirty support switches
> the page permissions between case 2 and 3 - which is going to be of no
> use for kernel accesses.
>
> Moreover, we don't map kernel RAM using 4K pages - we map it using 1MB
> section mappings to save the TLB from being cycled through. If we were
> to apply the same principle there, we'd have to do it on a 1MB by 1MB
> basis, or take an additional memory hit with TLB usage.
>
> So, in order to have bits for each page, what you're asking for is:
> - avoid using efficient section mappings which only use 1 level of
> page table, map everything using 2 levels of page tables using 4K
> pages.
> - add code to handle an additional special "dirty" bit processing for
> kernel pages.
>
> I think that is far too inefficient an option to even contemplate.
For 2MB pages, yes, I'd have to agree.
so the fallback proposal is adding the flush to the kunmap(_atomic) and
then adding a kunmap_noflush(_atomic) that we convert the already
flushed architecture cases to using ... is that OK?
James
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 20:29 ` James Bottomley
@ 2009-12-10 20:39 ` Russell King
0 siblings, 0 replies; 31+ messages in thread
From: Russell King @ 2009-12-10 20:39 UTC (permalink / raw)
To: James Bottomley; +Cc: Ilya Loginov, Jens Axboe, linux-arch
On Thu, Dec 10, 2009 at 02:29:59PM -0600, James Bottomley wrote:
> For 2MB pages, yes, I'd have to agree.
>
> so the fallback proposal is adding the flush to the kunmap(_atomic) and
> then adding a kunmap_noflush(_atomic) that we convert the already
> flushed architecture cases to using ... is that OK?
Yes.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 18:20 ` James Bottomley
2009-12-10 19:05 ` Russell King
@ 2009-12-10 19:42 ` Ilya Loginov
2009-12-10 19:43 ` Russell King
1 sibling, 1 reply; 31+ messages in thread
From: Ilya Loginov @ 2009-12-10 19:42 UTC (permalink / raw)
To: James Bottomley; +Cc: Russell King, Ilya Loginov, Jens Axboe, linux-arch
On Thu, 10 Dec 2009 12:20:55 -0600
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> I'm not religiously opposed to the separation into a flush and a non
> flush case ... although I think if we have to do this, it's equivalent
> to just forcing users to add the flush_kernel_dcache_page() ... but if
> we can do it so that the users don't need to know the details, I think
> the API is much better.
I wrote that flush_kernel_dcache_page() was exist only in parisc(and sh).
There is flush_dcache_page() in other architectures. And I don't think,
that flush_kernel_dcache_page() should be introduced into other
architectures. Actually, I don't know a lot about many architectures.
So, may be. We should have a varian of rq_flush... call for architectures
where ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE was defined. I wrote this before.
And I continue to think such manner since I've read about ARM.
--
Ilya Loginov <isloginov@gmail.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 19:42 ` Ilya Loginov
@ 2009-12-10 19:43 ` Russell King
2009-12-10 19:48 ` Ilya Loginov
0 siblings, 1 reply; 31+ messages in thread
From: Russell King @ 2009-12-10 19:43 UTC (permalink / raw)
To: Ilya Loginov; +Cc: James Bottomley, Jens Axboe, linux-arch
On Thu, Dec 10, 2009 at 10:42:37PM +0300, Ilya Loginov wrote:
> On Thu, 10 Dec 2009 12:20:55 -0600
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> > I'm not religiously opposed to the separation into a flush and a non
> > flush case ... although I think if we have to do this, it's equivalent
> > to just forcing users to add the flush_kernel_dcache_page() ... but if
> > we can do it so that the users don't need to know the details, I think
> > the API is much better.
>
> I wrote that flush_kernel_dcache_page() was exist only in parisc(and sh).
and ARM.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 17:07 ` James Bottomley
2009-12-10 17:48 ` Russell King
@ 2009-12-10 19:46 ` Ilya Loginov
2009-12-10 20:28 ` James Bottomley
1 sibling, 1 reply; 31+ messages in thread
From: Ilya Loginov @ 2009-12-10 19:46 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, linux-arch
On Thu, 10 Dec 2009 11:07:31 -0600
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> What MTD device is this? The subsystem is very complex, but I suppose I
> could trace the read path in a single device to see what the actual
> problem is.
Oh! There is a FPGA. There are kernel and diskimage(via slram) in memory.
So, kernel just copy a part of libc from fs in RAM to RAM.
--
Ilya Loginov <isloginov@gmail.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 19:46 ` Ilya Loginov
@ 2009-12-10 20:28 ` James Bottomley
2009-12-10 20:41 ` Ilya Loginov
2009-12-10 20:48 ` Ilya Loginov
0 siblings, 2 replies; 31+ messages in thread
From: James Bottomley @ 2009-12-10 20:28 UTC (permalink / raw)
To: Ilya Loginov; +Cc: Jens Axboe, linux-arch
On Thu, 2009-12-10 at 22:46 +0300, Ilya Loginov wrote:
> On Thu, 10 Dec 2009 11:07:31 -0600
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> > What MTD device is this? The subsystem is very complex, but I suppose I
> > could trace the read path in a single device to see what the actual
> > problem is.
>
> Oh! There is a FPGA. There are kernel and diskimage(via slram) in memory.
> So, kernel just copy a part of libc from fs in RAM to RAM.
Right, so this is the problem. Where you do the ram to ram copy
in-kernel, you need kmap/kunmap. If you copy from the FPGA RAM to the
kernel RAM, before you do the kunmap, you also need a
flush_kernel_dcache_page(). This (plus implementing
flush_kernel_dcache_page() on mips) will fix all coherency issues.
Which exact driver in drivers/mtd? I can probably just look at it and
see where the kmap/flush is supposed to be.
James
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 20:28 ` James Bottomley
@ 2009-12-10 20:41 ` Ilya Loginov
2009-12-10 20:48 ` Ilya Loginov
1 sibling, 0 replies; 31+ messages in thread
From: Ilya Loginov @ 2009-12-10 20:41 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, linux-arch
On Thu, 10 Dec 2009 14:28:04 -0600
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > diskimage(via slram) in memory.
> Which exact driver in drivers/mtd? I can probably just look at it and
> see where the kmap/flush is supposed to be.
I think that this is the right driver.
drivers/mtd/devices/slram.c
--
Ilya Loginov <isloginov@gmail.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 20:28 ` James Bottomley
2009-12-10 20:41 ` Ilya Loginov
@ 2009-12-10 20:48 ` Ilya Loginov
2009-12-10 20:59 ` James Bottomley
1 sibling, 1 reply; 31+ messages in thread
From: Ilya Loginov @ 2009-12-10 20:48 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, linux-arch
On Thu, 10 Dec 2009 14:28:04 -0600
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> Which exact driver in drivers/mtd? I can probably just look at it and
> see where the kmap/flush is supposed to be.
So, do you think that problem is only in slram driver not in mtd layer
in the large?
--
Ilya Loginov <isloginov@gmail.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 20:48 ` Ilya Loginov
@ 2009-12-10 20:59 ` James Bottomley
2009-12-10 21:27 ` Ilya Loginov
0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2009-12-10 20:59 UTC (permalink / raw)
To: Ilya Loginov; +Cc: Jens Axboe, linux-arch
On Thu, 2009-12-10 at 23:48 +0300, Ilya Loginov wrote:
> On Thu, 10 Dec 2009 14:28:04 -0600
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> > Which exact driver in drivers/mtd? I can probably just look at it and
> > see where the kmap/flush is supposed to be.
>
> So, do you think that problem is only in slram driver not in mtd layer
> in the large?
Yes, you can't copy into a kernel mapping of a user page without
flushing and expect it to succeed in an aliased environment. This is
the source of your incoherency. The slram driver is doing a memcpy
between the kernel address it's given and the allocated ram area in
slram_read and slram_write. To fix mips, you just need a
flush_kernel_dcache_page() in slram_read so that the alias is updated
after the memcpy. I would also expect this driver not to work on any
highmem system without additional kmap/kunmap(_atomic) pairs in the read
and write routines.
How many other mtd drivers are affected, I'm not sure ... any that do
PIO are wrong ... those that do MMIO should be right (that looks to be
just the omap driver).
James
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 20:59 ` James Bottomley
@ 2009-12-10 21:27 ` Ilya Loginov
2009-12-10 21:43 ` Ilya Loginov
2009-12-10 22:00 ` James Bottomley
0 siblings, 2 replies; 31+ messages in thread
From: Ilya Loginov @ 2009-12-10 21:27 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, linux-arch
On Thu, 10 Dec 2009 14:59:36 -0600
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> To fix mips, you just need a
> flush_kernel_dcache_page() in slram_read so that the alias is updated
> after the memcpy.
I think you right. But! If we choose this way:
First. We need to realize flush_kernel_dcache_page() for many
architectures. Am I right?
Second. What difference will be between flush_kernel_dcache_page and
flush_dcache_page on MIPS? In common, flush_dcache_page in MIPS set
bit dirty on page.
> I would also expect this driver not to work on any
> highmem system without additional kmap/kunmap(_atomic) pairs in the read
> and write routines.
> How many other mtd drivers are affected, I'm not sure ... any that do
> PIO are wrong ... those that do MMIO should be right (that looks to be
> just the omap driver).
Third. We should fix all other PIO drivers with problem and patch
aoe: switch to the new bio_flush_dcache_pages() interface in -mm tree
(i can't find this commit in the Linus tree). There is same problem.
May be I wrong. And we have much less work.
--
Ilya Loginov <isloginov@gmail.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 21:27 ` Ilya Loginov
@ 2009-12-10 21:43 ` Ilya Loginov
2009-12-10 22:00 ` James Bottomley
1 sibling, 0 replies; 31+ messages in thread
From: Ilya Loginov @ 2009-12-10 21:43 UTC (permalink / raw)
To: James Bottomley; +Cc: Ilya Loginov, Jens Axboe, linux-arch
On Fri, 11 Dec 2009 00:27:00 +0300
Ilya Loginov <isloginov@gmail.com> wrote:
> Second. What difference will be between flush_kernel_dcache_page and
> flush_dcache_page on MIPS? In common, flush_dcache_page in MIPS set
> bit dirty on page.
Sorry. It is stupid question.
--
Ilya Loginov <isloginov@gmail.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 21:27 ` Ilya Loginov
2009-12-10 21:43 ` Ilya Loginov
@ 2009-12-10 22:00 ` James Bottomley
2009-12-10 22:03 ` David Miller
2009-12-10 22:33 ` Ilya Loginov
1 sibling, 2 replies; 31+ messages in thread
From: James Bottomley @ 2009-12-10 22:00 UTC (permalink / raw)
To: Ilya Loginov; +Cc: Jens Axboe, linux-arch
On Fri, 2009-12-11 at 00:27 +0300, Ilya Loginov wrote:
> On Thu, 10 Dec 2009 14:59:36 -0600
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> > To fix mips, you just need a
> > flush_kernel_dcache_page() in slram_read so that the alias is updated
> > after the memcpy.
>
> I think you right. But! If we choose this way:
>
> First. We need to realize flush_kernel_dcache_page() for many
> architectures. Am I right?
Actually, I think only sparc and mips need it. It's only needed for an
aliasing architecture. Next, it's only used for pio drivers, so if the
platform never uses a pio driver, it can get away without having this
(that's the sparc case, I think).
> Second. What difference will be between flush_kernel_dcache_page and
> flush_dcache_page on MIPS? In common, flush_dcache_page in MIPS set
> bit dirty on page.
You already replied to this.
> > I would also expect this driver not to work on any
> > highmem system without additional kmap/kunmap(_atomic) pairs in the read
> > and write routines.
>
> > How many other mtd drivers are affected, I'm not sure ... any that do
> > PIO are wrong ... those that do MMIO should be right (that looks to be
> > just the omap driver).
>
> Third. We should fix all other PIO drivers with problem and patch
> aoe: switch to the new bio_flush_dcache_pages() interface in -mm tree
> (i can't find this commit in the Linus tree). There is same problem.
Unfortunately, you can't do it at the bio level. The reason is the
kmap/kunmap. On a highmem system the mapping disappears after kunmap,
so you have to flush before you lose the mapping ... that means right in
the guts of the driver where it's doing the kmap/copy/kunmap.
Sounds like aio does have the issue, yes. If you grep the kernel for
flush_kernel_dcache_pages(), you'll see that a lot of PIO drivers are
already using this, so there's not a lot of conversion to be done.
> May be I wrong. And we have much less work.
James
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 22:00 ` James Bottomley
@ 2009-12-10 22:03 ` David Miller
2009-12-10 22:33 ` Ilya Loginov
1 sibling, 0 replies; 31+ messages in thread
From: David Miller @ 2009-12-10 22:03 UTC (permalink / raw)
To: James.Bottomley; +Cc: isloginov, jens.axboe, linux-arch
From: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: Thu, 10 Dec 2009 16:00:18 -0600
> On Fri, 2009-12-11 at 00:27 +0300, Ilya Loginov wrote:
>> On Thu, 10 Dec 2009 14:59:36 -0600
>> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>
>> > To fix mips, you just need a
>> > flush_kernel_dcache_page() in slram_read so that the alias is updated
>> > after the memcpy.
>>
>> I think you right. But! If we choose this way:
>>
>> First. We need to realize flush_kernel_dcache_page() for many
>> architectures. Am I right?
>
> Actually, I think only sparc and mips need it. It's only needed for an
> aliasing architecture. Next, it's only used for pio drivers, so if the
> platform never uses a pio driver, it can get away without having this
> (that's the sparc case, I think).
Sparc has PIO drivers, namely IDE, and we have special flushes inserted
into the IDE string PIO routines to handle that case.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages)
2009-12-10 22:00 ` James Bottomley
2009-12-10 22:03 ` David Miller
@ 2009-12-10 22:33 ` Ilya Loginov
1 sibling, 0 replies; 31+ messages in thread
From: Ilya Loginov @ 2009-12-10 22:33 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, linux-arch
On Thu, 10 Dec 2009 16:00:18 -0600
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> Actually, I think only sparc and mips need it.
How I can know this exactly?
> It's only needed for an
> aliasing architecture. Next, it's only used for pio drivers, so if the
> platform never uses a pio driver, it can get away without having this
> (that's the sparc case, I think).
> Unfortunately, you can't do it at the bio level. The reason is the
> kmap/kunmap. On a highmem system the mapping disappears after kunmap,
> so you have to flush before you lose the mapping ... that means right in
> the guts of the driver where it's doing the kmap/copy/kunmap.
>
> Sounds like aio does have the issue, yes. If you grep the kernel for
> flush_kernel_dcache_pages(), you'll see that a lot of PIO drivers are
> already using this, so there's not a lot of conversion to be done.
I will write my seeing of current stage. Please correct me.
All mtd drivers and aio works correctly(about flushes). But there is a
lot of waste work on some architecrures (PA-RISC as example). It is bad.
We want to add flush_kernel_dcache_page() in MIPS(and may be SPARC and
other archs) and call it after kunmap in slram driver(and in aio may be).
So, we will fix the specific bug and, may be, bugs in other drivers which
use flush_kernel_dcache_page which(bugs) have not been founded yet. But
bugs in mtd may be will remain.
--
Ilya Loginov <isloginov@gmail.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2009-12-10 22:33 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-09 22:39 problems in commit 2d4dc890b5c8 (block: add helpers to run flush_dcache_page() against a bio and a request's pages) James Bottomley
2009-12-09 22:45 ` Russell King
2009-12-09 22:56 ` James Bottomley
2009-12-09 23:03 ` Ilya Loginov
2009-12-09 23:11 ` James Bottomley
2009-12-09 23:36 ` Ilya Loginov
2009-12-09 23:47 ` James Bottomley
2009-12-10 0:06 ` Ilya Loginov
2009-12-10 0:19 ` James Bottomley
2009-12-10 4:40 ` Ilya Loginov
2009-12-10 17:07 ` James Bottomley
2009-12-10 17:48 ` Russell King
2009-12-10 17:59 ` James Bottomley
2009-12-10 18:06 ` Russell King
2009-12-10 18:20 ` James Bottomley
2009-12-10 19:05 ` Russell King
2009-12-10 20:29 ` James Bottomley
2009-12-10 20:39 ` Russell King
2009-12-10 19:42 ` Ilya Loginov
2009-12-10 19:43 ` Russell King
2009-12-10 19:48 ` Ilya Loginov
2009-12-10 19:46 ` Ilya Loginov
2009-12-10 20:28 ` James Bottomley
2009-12-10 20:41 ` Ilya Loginov
2009-12-10 20:48 ` Ilya Loginov
2009-12-10 20:59 ` James Bottomley
2009-12-10 21:27 ` Ilya Loginov
2009-12-10 21:43 ` Ilya Loginov
2009-12-10 22:00 ` James Bottomley
2009-12-10 22:03 ` David Miller
2009-12-10 22:33 ` Ilya Loginov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).