linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hugh@veritas.com>,
	Jared Hulbert <jaredeh@gmail.com>,
	Carsten Otte <cotte@de.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	linux-arch@vger.kernel.org, rmk@arm.linux.org.uk,
	catalin.marinas@arm.com
Subject: Re: [rfc][patch 2/2] mm: introduce optional pte_special pte bit
Date: Wed, 16 Jan 2008 06:17:59 +0100	[thread overview]
Message-ID: <20080116051759.GA14049@wotan.suse.de> (raw)
In-Reply-To: <alpine.LFD.1.00.0801152040310.2806@woody.linux-foundation.org>

On Tue, Jan 15, 2008 at 08:48:42PM -0800, Linus Torvalds wrote:
> 
> 
> On Wed, 16 Jan 2008, Nick Piggin wrote:
> > 
> > Right, that's what I had hoped as well. But when I say pte_special
> > *usable* by all architectures, I mean it is usable by all that can
> > spare a bit in the pte. Apparently ARM can't because some some bug
> > in an Xscale CPU or something (the thread is on linux-arch).
> 
> Hmm. Can you give a pointer to some browsable archive? I guess I should 
> subscribe, but there's too much email, too little time. linux-arch is one 
> of the lists that I probably should look at.

http://marc.info/?t=119968107900003&r=2&w=2

I've also cc'ed Russell and Catalin, who were involved in that one.


> That said: especially for PFNMAP and friends, it may be possible to simply 
> re-use an existign hardware bit, like (for example) a "cacheable" bit.
> 
> That doesn't mean that such an architecture would have a free bit for any 
> *arbitrary* software use, but the "no <struct page> backing" is really a 
> pretty special feature, and may well map fairly well 1:1 with something 
> like a "cache disable" bit (which I do think ARM has).
> 
> It's not like we necessarily would want /dev/mem to be mapped cacheable 
> *anyway*, much less on some architecture with stupid virtual caches. 
> 
> > I remember that too. I guess some wires got crossed somewhere. s390
> > evidently does have free bits in their pte_present-type ptes.
> 
> I think they had two types of PTE's - 32-bit and 64-bit. Maybe it's just 
> the 32-bit one that was all used up (but see above - maybe cacheable bits 
> are doable?)

Not sure, I'll let the s390 people chime in here. We'd still need a
pte_special bit for 32 and 64 bit ptes (if they are different)...

 
> I do have to say, one of the reasons I enjoyed PFNMAP was that so far 
> we've basially been able to live without any SW-specified bits at all. 
> Yeah, we use "software bits" on architectures to emulate dirty/accessed, 
> but we have never really needed any "kernel internal bits". And I do think 
> that's generally a good idea. 

I agree in a way (on one hand it would be nice to simplify the whole
PFNMAP logic, on the other hand it isn't actually a problem to keep it,
and it is a nice way of avoiding the use of a pte bit, which might be
important in future even if not today...) 


> So in that sense, I'd actually prefer the current setup if it's not a huge 
> pain. 

Well that is why I wanted to go the ifdef route (or rather, tidy it up to
be an if (CONSTANT) { } else { } or something).

We could go and hide the pte-check in s390 specific code just in the
VM_MIXEDMAP case, but again I think that is actually just making things
less clear (than having the 2 distinct cases in core code).


> I saw your 5% number, but I really wonder about that one. Was that 
> perhaps with the much more expensive non-linear NUMA "pfn_to_page()"? THAT 
> expense would drown out any vma->vm_flags costs.

The 5% case yes that was with SPARSEMEM. And yes we should get rid of
that pfn_valid test IMO (or put it under CONFIG_DEBUG_VM). I'm not aware
of it ever triggering, so I think it should go away (I think Hugh nacked
my attempt at this last time -- Hugh, what do you think?).

However, minor performance issues aside, I'd still hope to find a good
way to get the pte_special path in.

---

vm_normal_page has been seen to take nearly 5% kernel time in profiles, and
regularly appears in the top 10 or so functions in a profile. It is called
at every page fault, and for every pte in bulk copy or unmaps.

pfn_valid in particular can be quite expensive in some memory models.

Place that code under CONFIG_DEBUG_VM. I'm not aware of it catching any
problems.

Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -392,16 +392,13 @@ struct page *vm_normal_page(struct vm_ar
 			return NULL;
 	}
 
-	/*
-	 * Add some anal sanity checks for now. Eventually,
-	 * we should just do "return pfn_to_page(pfn)", but
-	 * in the meantime we check that we get a valid pfn,
-	 * and that the resulting page looks ok.
-	 */
+#ifdef CONFIG_DEBUG_VM
+	/* Check that we get a valid pfn. */
 	if (unlikely(!pfn_valid(pfn))) {
 		print_bad_pte(vma, pte, addr);
 		return NULL;
 	}
+#endif
 
 	/*
 	 * NOTE! We still have PageReserved() pages in the page 

  parent reply	other threads:[~2008-01-16  5:18 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-13  3:08 [rfc] changes to user memory mapping scheme Nick Piggin
2008-01-13  3:09 ` [rfc][patch 1/2] mm: introduce VM_MIXEDMAP Nick Piggin
2008-01-13  3:10 ` [rfc][patch 2/2] mm: introduce optional pte_special pte bit Nick Piggin
2008-01-13  3:41   ` Linus Torvalds
2008-01-13  4:39     ` Nick Piggin
2008-01-13  4:45       ` Linus Torvalds
2008-01-13  5:06         ` Nick Piggin
2008-01-13 16:50           ` Linus Torvalds
2008-01-13 20:46             ` Martin Schwidefsky
2008-01-14 21:04             ` Jared Hulbert
2008-01-15  9:18               ` Carsten Otte
2008-01-16  3:38             ` Nick Piggin
2008-01-16  4:04               ` Linus Torvalds
2008-01-16  4:37                 ` Nick Piggin
2008-01-16  4:48                   ` Linus Torvalds
2008-01-16  4:51                     ` David Miller
2008-01-16  5:23                       ` Linus Torvalds
2008-01-16  5:48                         ` Nick Piggin
2008-01-16  9:52                           ` Martin Schwidefsky
2008-01-16  5:17                     ` Nick Piggin [this message]
2008-01-16 10:52                       ` Catalin Marinas
2008-01-16 18:18                         ` Russell King
2008-01-16 17:21                       ` Linus Torvalds
2008-01-16 17:14   ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080116051759.GA14049@wotan.suse.de \
    --to=npiggin@suse.de \
    --cc=catalin.marinas@arm.com \
    --cc=cotte@de.ibm.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=hugh@veritas.com \
    --cc=jaredeh@gmail.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=rmk@arm.linux.org.uk \
    --cc=schwidefsky@de.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).