From: Russell King <rmk@arm.linux.org.uk>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>, Nick Piggin <npiggin@suse.de>,
	Martin Schwidefsky <martin.schwidefsky@de.ibm.com>,
	carsteno@linux.vnet.ibm.com,
	Heiko Carstens <h.carstens@de.ibm.com>,
	Linux Memory Management List <linux-mm@kvack.org>,
	linux-arch@vger.kernel.org
Subject: Re: [rfc][patch] mm: use a pte bit to flag normal pages
Date: Tue, 8 Jan 2008 10:52:27 +0000
Message-ID: <20080108105227.GA10546@flint.arm.linux.org.uk>
In-Reply-To: <1199787075.17809.10.camel@pc1117.cambridge.arm.com>

On Tue, Jan 08, 2008 at 10:11:15AM +0000, Catalin Marinas wrote:
> On Mon, 2008-01-07 at 19:45 +0000, Russell King wrote:
> > In old ARM CPUs, there were two bits that defined the characteristics of
> > the mapping - the C and B bits (C = cacheable, B = bufferable)
> > 
> > Some ARMv5 (particularly Xscale-based) and all ARMv6 CPUs extend this to
> > five bits and introduce "memory types" - 3 bits of TEX, and C and B.
> > 
> > Between these bits, it defines:
> > 
> > - strongly ordered
> > - bufferable only *
> > - device, sharable *
> > - device, unsharable
> > - memory, bufferable and cacheable, write through, no write allocate
> > - memory, bufferable and cacheable, write back, no write allocate
> > - memory, bufferable and cacheable, write back, write allocate
> > - implementation defined combinations (eg, selecting "minicache")
> > - and a set of 16 states to allow the policy of inner and outer levels
> >   of cache to be defined (two bits per level).
> 
> Can we not restrict these to a maximum of 8 base types at run-time? If
> yes, we can only use 3 bits for encoding and also benefit from the
> automatic remapping in later ARM CPUs. For those not familiar with ARM,
> 8 combinations of the TEX, C, B and S (shared) bits can be specified in
> separate registers and the pte would only use 3 bits to refer to those.
> Even older cores would benefit from this as I think it is faster to read
> the encoding from an array in set_pte than doing all the bit comparisons
> to calculate the hardware pte in the current implementation.

So basically that gives us the following combinations:

TEXCB
00000 - /dev/mem and device uncacheable mappings (strongly ordered)
00001 - frame buffers
00010 - write through mappings (selectable via kernel command line)
	  and also work-around for user read-only write-back mappings
	  on PXA2.
00011 - normal write back mappings
00101 - Xscale3 "shared device" work-around for strongly ordered mappings
00110 - PXA3 mini-cache or other "implementation defined features"
00111 - write back write allocate mappings
01000 - non-shared device (will be required to map some devices to userspace)
	  and also Xscale3 work-around for strongly ordered mappings
10111 - Xscale3 L2 cache-enabled mappings
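
As a rough illustration of how the five-bit patterns in the table above
split into their TEX, C and B fields, here is a minimal C sketch. The enum
and helper names are hypothetical and chosen only for this example; they
are not existing kernel identifiers. Only the bit patterns come from the
table.

```c
#include <assert.h>

/* Hypothetical names; the values are the TEXCB patterns listed above. */
enum texcb_type {
	TEXCB_STRONGLY_ORDERED = 0x00,	/* 00000 */
	TEXCB_FRAMEBUFFER      = 0x01,	/* 00001 */
	TEXCB_WRITE_THROUGH    = 0x02,	/* 00010 */
	TEXCB_WRITE_BACK       = 0x03,	/* 00011 */
	TEXCB_XSC3_SHARED_DEV  = 0x05,	/* 00101 */
	TEXCB_MINICACHE        = 0x06,	/* 00110 */
	TEXCB_WRITE_ALLOC      = 0x07,	/* 00111 */
	TEXCB_NONSHARED_DEV    = 0x08,	/* 01000 */
	TEXCB_XSC3_L2          = 0x17,	/* 10111 */
};

/* Field extractors: TEX is the top three bits, then C, then B. */
static unsigned texcb_tex(unsigned v) { return (v >> 2) & 0x7; }
static unsigned texcb_c(unsigned v)   { return (v >> 1) & 0x1; }
static unsigned texcb_b(unsigned v)   { return v & 0x1; }

/* Recombine the three fields into a five-bit TEXCB value. */
static unsigned texcb_pack(unsigned tex, unsigned c, unsigned b)
{
	assert(tex < 8 && c < 2 && b < 2);
	return (tex << 2) | (c << 1) | b;
}
```

For instance, 10111 decodes to TEX=101, C=1, B=1, and 01000 decodes to
TEX=010 with both C and B clear.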

It's unclear at present under what circumstances you'd use each of the
two Xscale3 work-around bit combinations - or indeed whether there's a
printing error in the documentation concerning TEXCB=00101.

It's also unclear how to squeeze these down into a bit pattern in such
a way that we avoid picking out bits from the Linux PTE and recombining
them so we can look them up in a table or whatever - especially given
that set_pte is a fast path and extra cycles there have a VERY noticeable
impact on overall system performance.
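
To make the cost question concrete, here is a minimal C sketch of the
table-lookup idea under discussion: a three-bit index kept contiguously in
the Linux PTE, so the fast path needs only a mask, a shift and one load.
Everything named DEMO_* or demo_* is hypothetical. The index position in
the Linux PTE is an arbitrary assumption, and the hardware bit positions
assume the ARMv6 extended small page layout (B at bit 2, C at bit 3, TEX
at bits 8:6). This is not the existing set_pte implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of a 3-bit memory-type index in the Linux PTE. */
#define DEMO_MT_SHIFT	2u
#define DEMO_MT_MASK	(0x7u << DEMO_MT_SHIFT)

/* Assumed hardware PTE bits (ARMv6 extended small page layout). */
#define DEMO_HW_B	(1u << 2)
#define DEMO_HW_C	(1u << 3)
#define DEMO_HW_TEX(x)	((uint32_t)(x) << 6)

/* Precomputed hardware bits for each of the eight index values. */
static uint32_t demo_mt_table[8];

/* Slow path: expand one five-bit TEXCB value into hardware PTE bits. */
static uint32_t demo_texcb_to_hw(unsigned texcb)
{
	assert(texcb < 32);
	return DEMO_HW_TEX((texcb >> 2) & 0x7) |
	       ((texcb & 0x2) ? DEMO_HW_C : 0) |
	       ((texcb & 0x1) ? DEMO_HW_B : 0);
}

/* Boot-time setup: a platform picks at most eight TEXCB values. */
static void demo_mt_table_init(const uint8_t texcb[8])
{
	for (unsigned i = 0; i < 8; i++)
		demo_mt_table[i] = demo_texcb_to_hw(texcb[i]);
}

/* Fast path: mask, shift, one table load. */
static uint32_t demo_hw_memtype_bits(uint32_t linux_pte)
{
	return demo_mt_table[(linux_pte & DEMO_MT_MASK) >> DEMO_MT_SHIFT];
}
```

Note that the list above has nine combinations, so with only eight slots
each platform would have to select the subset it actually needs when the
table is initialised.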

However, until we get around to sorting out the implementation of the
Xscale3 strongly ordered work-around, which seems to be the highest
priority (and hardest to resolve), I don't think there's much more to
discuss; we don't have a clear way ahead on these issues at the moment.
All we currently have is the errata entry, and we know people are seeing
data corruption on Xscale3 platforms.

And no, I don't think we can keep it contained within the Xscale3 support
file - the set_pte method isn't passed sufficient information for that.
Conversely, setting the TEX bits behind set_pte's back by using set_pte_ext
results in loss of that information when the page is aged - again resulting
in data corruption.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
