linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Russell King - ARM Linux <linux-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org>
To: Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: "Trond Myklebust"
	<Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>,
	"James Bottomley"
	<James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	"Marc Kleine-Budde" <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>,
	"Uwe Kleine-König"
	<u.kleine-koenig-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>,
	"Marc Kleine-Budde"
	<m.kleine-budde-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	"Parisc List"
	<linux-parisc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]
Date: Wed, 5 Jan 2011 23:59:22 +0000	[thread overview]
Message-ID: <20110105235922.GP8638@n2100.arm.linux.org.uk> (raw)
In-Reply-To: <AANLkTi=SjMinMp+m726GS1iehj6cQgNy1RqSoUqKhjtv-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, Jan 05, 2011 at 03:28:53PM -0800, Linus Torvalds wrote:
> On Wed, Jan 5, 2011 at 3:06 PM, Trond Myklebust
> <Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org> wrote:
> >
> > Yes. The fix I sent out was a call to invalidate_kernel_vmap_range(),
> > which takes care of invalidating the cache prior to a virtual address
> > read.
> >
> > My question was specifically about the write through the regular kernel
> > mapping: according to Russell and my reading of the cachetlb.txt
> > documentation, flush_dcache_page() is only guaranteed to have an effect
> > on page cache pages.
> 
> I don't think that should ever matter. It's not like the hardware can
> know whether it's a dcache page or not.
> 
> And if the sw implementation cares, it's doing something really odd.

From the hardware perspective you're correct that it doesn't.  However,
from the efficient implementation perspective it does matter.

Take for example the read-ahead done on block devices.  We don't want to
flush all those pages that were read in when we don't know that they're
ever going to end up in a user mapping.  So what's commonly done (as
suggested by DaveM) is that flush_dcache_page() detects that it's a
dcache page, ensures that there's no user mappings, and sets a 'dirty'
flag.  This flag is guaranteed to be clear when new, clean, unread
pages enter the page cache.

When the page eventually ends up in a user mapping, that dirty flag is
checked and the necessary cache flushing done at that point.

Note that when there are user mappings, flush_dcache_page() has to flush
those mappings too, otherwise mmap() <-> read()/write() coherency breaks.
I believe this was what flush_dcache_page() was created to resolve.

flush_kernel_dcache_page() was to solve the problem of PIO drivers
writing to dcache pages, so that data written into the kernel mapping
would be visible to subsequent user mappings.

We chose a different overall approach - which had already been adopted by
PPC - where we invert the meaning of this 'dirty' bit to mean that it's
clean.  So every new page cache page starts out life as being marked
dirty and so nothing needs to be done at flush_kernel_dcache_page().
We continue to use davem's optimization but with the changed meaning of
the bit, but as we now support SMP we do the flushing at set_pte_at()
time.

This also means that we don't have to rely on the (endlessly) buggy PIO
drivers remembering to add flush_kernel_dcache_page() calls - something
which has been a source of constant never-ending pain for us.

The final piece of the jigsaw is flush_anon_page() which deals with
kernel<->user coherency for anonymous pages by flushing both the user
and kernel sides of the mapping.  This was to solve direct-io coherency
problems.

As the users of flush_anon_page() always do:

	flush_anon_page(vma, page, addr);
	flush_dcache_page(page);

and documentation doesn't appear to imply that this will always be the
case, we restrict flush_dcache_page() to only work on page cache pages,
otherwise we end up flushing the kernel-side mapping multiple time in
succession.

Maybe we should make flush_anon_page() only flush the user mapping,
stipulate that it shall always be followed by flush_dcache_page(),
which shall flush the kernel side mapping even for anonymous pages?
That sounds to me like a recipe for missing flush_dcache_page() calls
causing bugs.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

WARNING: multiple messages have this Message-ID (diff)
From: Russell King - ARM Linux <linux@arm.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Trond Myklebust" <Trond.Myklebust@netapp.com>,
	"James Bottomley" <James.Bottomley@hansenpartnership.com>,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Marc Kleine-Budde" <mkl@pengutronix.de>,
	"Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>,
	"Marc Kleine-Budde" <m.kleine-budde@pengutronix.de>,
	linux-arm-kernel@lists.infradead.org,
	"Parisc List" <linux-parisc@vger.kernel.org>,
	linux-arch@vger.kernel.org
Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]
Date: Wed, 5 Jan 2011 23:59:22 +0000	[thread overview]
Message-ID: <20110105235922.GP8638@n2100.arm.linux.org.uk> (raw)
Message-ID: <20110105235922.L13ne3ibwR4dk7NXVwOEBhDf1gMYQRePsFBPd5A70Wk@z> (raw)
In-Reply-To: <AANLkTi=SjMinMp+m726GS1iehj6cQgNy1RqSoUqKhjtv@mail.gmail.com>

On Wed, Jan 05, 2011 at 03:28:53PM -0800, Linus Torvalds wrote:
> On Wed, Jan 5, 2011 at 3:06 PM, Trond Myklebust
> <Trond.Myklebust@netapp.com> wrote:
> >
> > Yes. The fix I sent out was a call to invalidate_kernel_vmap_range(),
> > which takes care of invalidating the cache prior to a virtual address
> > read.
> >
> > My question was specifically about the write through the regular kernel
> > mapping: according to Russell and my reading of the cachetlb.txt
> > documentation, flush_dcache_page() is only guaranteed to have an effect
> > on page cache pages.
> 
> I don't think that should ever matter. It's not like the hardware can
> know whether it's a dcache page or not.
> 
> And if the sw implementation cares, it's doing something really odd.

From the hardware perspective you're correct that it doesn't.  However,
from the efficient implementation perspective it does matter.

Take for example the read-ahead done on block devices.  We don't want to
flush all those pages that were read in when we don't know that they're
ever going to end up in a user mapping.  So what's commonly done (as
suggested by DaveM) is that flush_dcache_page() detects that it's a
dcache page, ensures that there's no user mappings, and sets a 'dirty'
flag.  This flag is guaranteed to be clear when new, clean, unread
pages enter the page cache.

When the page eventually ends up in a user mapping, that dirty flag is
checked and the necessary cache flushing done at that point.

Note that when there are user mappings, flush_dcache_page() has to flush
those mappings too, otherwise mmap() <-> read()/write() coherency breaks.
I believe this was what flush_dcache_page() was created to resolve.

flush_kernel_dcache_page() was to solve the problem of PIO drivers
writing to dcache pages, so that data written into the kernel mapping
would be visible to subsequent user mappings.

We chose a different overall approach - which had already been adopted by
PPC - where we invert the meaning of this 'dirty' bit to mean that it's
clean.  So every new page cache page starts out life as being marked
dirty and so nothing needs to be done at flush_kernel_dcache_page().
We continue to use davem's optimization but with the changed meaning of
the bit, but as we now support SMP we do the flushing at set_pte_at()
time.

This also means that we don't have to rely on the (endlessly) buggy PIO
drivers remembering to add flush_kernel_dcache_page() calls - something
which has been a source of constant never-ending pain for us.

The final piece of the jigsaw is flush_anon_page() which deals with
kernel<->user coherency for anonymous pages by flushing both the user
and kernel sides of the mapping.  This was to solve direct-io coherency
problems.

As the users of flush_anon_page() always do:

	flush_anon_page(vma, page, addr);
	flush_dcache_page(page);

and documentation doesn't appear to imply that this will always be the
case, we restrict flush_dcache_page() to only work on page cache pages,
otherwise we end up flushing the kernel-side mapping multiple time in
succession.

Maybe we should make flush_anon_page() only flush the user mapping,
stipulate that it shall always be followed by flush_dcache_page(),
which shall flush the kernel side mapping even for anonymous pages?
That sounds to me like a recipe for missing flush_dcache_page() calls
causing bugs.

  parent reply	other threads:[~2011-01-05 23:59 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-05 19:05 still nfs problems [Was: Linux 2.6.37-rc8] James Bottomley
2011-01-05 19:18 ` Linus Torvalds
     [not found]   ` <AANLkTi=VZUxNFd7n-qwf5aiOeK5rkk8qBmo+kOpgg7up-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-01-05 19:36     ` James Bottomley
2011-01-05 19:36       ` James Bottomley
2011-01-05 19:49       ` Linus Torvalds
2011-01-05 20:35         ` James Bottomley
     [not found]       ` <1294256169.16957.18.camel-0iu6Cu4xQGLYCGPCin2YbQ@public.gmane.org>
2011-01-05 20:00         ` Russell King - ARM Linux
2011-01-05 20:00           ` Russell King - ARM Linux
2011-01-05 20:33           ` James Bottomley
2011-01-05 20:33             ` James Bottomley
2011-01-05 20:48             ` Linus Torvalds
     [not found]               ` <AANLkTimzzBsdtWcZtP5E_CH1hUZugGMoaHOiMdQJf764-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-01-05 21:04                 ` Russell King - ARM Linux
2011-01-05 21:04                   ` Russell King - ARM Linux
2011-01-05 21:08                   ` Linus Torvalds
2011-01-05 21:08                     ` Linus Torvalds
     [not found]                     ` <AANLkTi=EXXBTW7oWHq3D+PHsx=thF1CpkRjn0ax2p5rm-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-01-05 21:16                       ` Trond Myklebust
2011-01-05 21:16                         ` Trond Myklebust
     [not found]                         ` <1294262208.2952.4.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2011-01-05 21:30                           ` Linus Torvalds
2011-01-05 21:30                             ` Linus Torvalds
2011-01-05 23:06                             ` Trond Myklebust
2011-01-05 23:06                               ` Trond Myklebust
2011-01-05 23:28                               ` James Bottomley
2011-01-06 17:40                                 ` James Bottomley
2011-01-06 17:47                                   ` Trond Myklebust
2011-01-06 17:51                                     ` James Bottomley
2011-01-06 17:55                                     ` Linus Torvalds
2011-01-06 17:55                                       ` Linus Torvalds
2011-01-07 18:53                                       ` Trond Myklebust
     [not found]                                         ` <1294426405.2929.23.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2011-01-07 19:02                                           ` Russell King - ARM Linux
2011-01-07 19:02                                             ` Russell King - ARM Linux
2011-01-07 19:11                                             ` James Bottomley
2011-01-07 19:11                                               ` James Bottomley
     [not found]                                               ` <1294427467.4895.66.camel-0iu6Cu4xQGLYCGPCin2YbQ@public.gmane.org>
2011-01-08 16:49                                                 ` Trond Myklebust
2011-01-08 16:49                                                   ` Trond Myklebust
2011-01-08 23:15                                                   ` Trond Myklebust
2011-01-08 23:15                                                     ` Trond Myklebust
     [not found]                                                     ` <1294528551.4181.19.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2011-01-10 10:50                                                       ` Uwe Kleine-König
2011-01-10 10:50                                                         ` Uwe Kleine-König
2011-01-10 16:25                                                         ` Trond Myklebust
     [not found]                                                           ` <1294676734.3349.10.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2011-01-10 17:08                                                             ` Marc Kleine-Budde
2011-01-10 17:08                                                               ` Marc Kleine-Budde
2011-01-10 17:20                                                               ` Trond Myklebust
     [not found]                                                                 ` <1294680035.3349.19.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2011-01-10 17:26                                                                   ` Marc Kleine-Budde
2011-01-10 17:26                                                                     ` Marc Kleine-Budde
2011-01-10 19:25                                                                 ` Uwe Kleine-König
     [not found]                                                                   ` <20110110192552.GG24920-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
2011-01-10 19:29                                                                     ` Trond Myklebust
2011-01-10 19:29                                                                       ` Trond Myklebust
2011-01-10 19:31                                                                       ` James Bottomley
2011-01-10 19:34                                                                       ` Linus Torvalds
2011-01-10 19:34                                                                         ` Linus Torvalds
2011-01-10 20:15                                                                         ` Trond Myklebust
2011-01-10 12:44                                                       ` Marc Kleine-Budde
2011-01-10 12:44                                                         ` Marc Kleine-Budde
2011-01-07 19:13                                             ` Trond Myklebust
2011-01-07 19:05                                         ` James Bottomley
     [not found]                                   ` <1294335614.22825.154.camel-0iu6Cu4xQGLYCGPCin2YbQ@public.gmane.org>
2011-01-06 18:05                                     ` Russell King - ARM Linux
2011-01-06 18:05                                       ` Russell King - ARM Linux
2011-01-06 18:14                                       ` James Bottomley
2011-01-06 18:14                                         ` James Bottomley
     [not found]                                         ` <1294337670.22825.199.camel-0iu6Cu4xQGLYCGPCin2YbQ@public.gmane.org>
2011-01-06 18:25                                           ` James Bottomley
2011-01-06 18:25                                             ` James Bottomley
2011-01-06 21:07                                             ` James Bottomley
2011-01-06 21:07                                               ` James Bottomley
2011-01-06 20:19                                   ` John Stoffel
2011-01-06 20:19                                     ` John Stoffel
2011-01-05 23:28                               ` Linus Torvalds
2011-01-05 23:28                                 ` Linus Torvalds
     [not found]                                 ` <AANLkTi=SjMinMp+m726GS1iehj6cQgNy1RqSoUqKhjtv-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-01-05 23:59                                   ` Russell King - ARM Linux [this message]
2011-01-05 23:59                                     ` Russell King - ARM Linux
2011-01-05 21:16               ` James Bottomley
2011-01-05 21:16                 ` James Bottomley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110105235922.GP8638@n2100.arm.linux.org.uk \
    --to=linux-lfz/pmaqli7xmaaqvzeohq@public.gmane.org \
    --cc=James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk@public.gmane.org \
    --cc=Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org \
    --cc=linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-parisc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=m.kleine-budde-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org \
    --cc=mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org \
    --cc=torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
    --cc=u.kleine-koenig-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).