Re: [rfc][patch 2/2] mm: introduce optional pte_special pte bit

From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>, Hugh Dickins <hugh@veritas.com>,
	Jared Hulbert <jaredeh@gmail.com>,
	Carsten Otte <cotte@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	linux-arch@vger.kernel.org
Subject: Re: [rfc][patch 2/2] mm: introduce optional pte_special pte bit
Date: Sun, 13 Jan 2008 21:46:00 +0100
Message-ID: <1200257160.17186.70.camel@localhost>
In-Reply-To: <alpine.LFD.1.00.0801130846000.2806@woody.linux-foundation.org>

On Sun, 2008-01-13 at 08:50 -0800, Linus Torvalds wrote: 
> > Well the immediate improvement from this actual patch is just that
> > it gives better and smaller code for vm_normal_page (even if you
> > discount the debug checks in the existing code).
> 
> It does no such thing, except for the slow-path that doesn't matter.
> 
> So it may optimize the slow-path (PFNMAP/MIXMAP), but the real path stays 
> exactly the same from a code generation standpoint (just checking a 
> different bit, afaik). 
> 
> If those pte_special bits are required for unexplained lockless 
> get_user_pages, is this going to get EVEN WORSE when s390 (again) cannot 
> do it?

The pte_special bit is not required for s390, and nothing prevents us
from implementing pfn_valid in a way that works with the new
VM_MIXEDMAP vmas and copy on write. It would be slow, though, because
DCSS segments on s390 can have different types. For one type the pages
are reference counted (hotplug memory via DCSS), for the other they are
not (read-only xip DCSS). I doubt that we will remain alone with this
problem: with KVM you can easily imagine hot memory add being
introduced by mapping an anonymous piece of memory. For s390 the
straightforward solution for pages with a pfn > max_pfn is to walk the
list of all DCSS segments. On a system where /usr lives on a xip DCSS
this happens frequently.
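
A minimal sketch of that lookup, to show where the cost comes from. The
structure, its fields and the function name are hypothetical and are not
the real s390 DCSS code; the sketch only assumes that each segment
records its pfn range and whether its pages are reference counted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor for one DCSS segment; not the real s390 structure. */
struct dcss_segment {
	unsigned long start_pfn;	/* first pfn covered by the segment */
	unsigned long end_pfn;		/* last pfn covered (inclusive) */
	bool refcounted;		/* true: hotplug memory, false: read-only xip */
	struct dcss_segment *next;
};

/*
 * Decide whether the page behind pfn is reference counted.
 * Pfns up to max_pfn are treated as ordinary memory; anything above
 * needs a linear walk over all segments, which is the slow part.
 */
static bool pfn_is_refcounted(const struct dcss_segment *segments,
			      unsigned long pfn, unsigned long max_pfn)
{
	const struct dcss_segment *seg;

	if (pfn <= max_pfn)
		return true;
	for (seg = segments; seg != NULL; seg = seg->next)
		if (pfn >= seg->start_pfn && pfn <= seg->end_pfn)
			return seg->refcounted;
	return false;	/* not covered by any segment: treat as not refcounted */
}
```

Every access to an xip page ends up in the loop, so the cost grows with
the number of segments.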

It seems reasonable to me to introduce a pte bit to decide between the
two cases, in particular since Nick has some other use for the bit as
well (I don't know much about that feature). When a non-reference-
counting pte is established we know it is special; we have just
forgotten about it by the time we get to vm_normal_page. What makes
this ugly is the fact that some architectures, like arm, currently have
no room for the pte_special bit in the pte. It seems we need a clean
abstraction that allows each architecture to choose the best way to
decide between reference counted and not. It is only two arch calls:
one when a pte is created for a non-refcounting page, and another for
the check in vm_normal_page to get the information back. The default
implementation would be a nop for the first call and a pfn_valid check
for the second call. For s390 I would prefer a pte bit if I can get it.
If not, then we have to play games with pfn_valid.
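
The two calls could look roughly like the sketch below. This is an
illustration only: the hook names, the ARCH_HAS_PTE_SPECIAL switch, the
toy pte layout and toy_pfn_valid are all assumptions and do not come
from any posted patch.

```c
#include <stdbool.h>

typedef unsigned long pte_t;		/* toy pte: pfn in the upper bits */
#define PTE_PFN_SHIFT	12
#define PTE_SPECIAL	0x1UL		/* hypothetical software bit */

static unsigned long toy_max_pfn = 0x1000;	/* stand-in for the real memory map */

/* Stand-in for pfn_valid: only pfns below toy_max_pfn have a struct page. */
static bool toy_pfn_valid(unsigned long pfn)
{
	return pfn < toy_max_pfn;
}

#ifdef ARCH_HAS_PTE_SPECIAL
/* Architecture with a spare bit: remember "special" in the pte itself. */
static pte_t arch_mk_nonrefcounted(pte_t pte)
{
	return pte | PTE_SPECIAL;
}

static bool arch_pte_refcounted(pte_t pte)
{
	return !(pte & PTE_SPECIAL);
}
#else
/* Default: the first hook is a nop, the second falls back to pfn_valid. */
static pte_t arch_mk_nonrefcounted(pte_t pte)
{
	return pte;
}

static bool arch_pte_refcounted(pte_t pte)
{
	return toy_pfn_valid(pte >> PTE_PFN_SHIFT);
}
#endif
```

Callers see the same two functions either way; only the architecture
decides whether the answer comes from a pte bit or from a pfn lookup.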

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.

