linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hugh@veritas.com>,
	Jared Hulbert <jaredeh@gmail.com>,
	Carsten Otte <cotte@de.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	linux-arch@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch 2/6] mm: introduce pte_special pte bit
Date: Mon, 21 Jan 2008 10:43:41 +0100	[thread overview]
Message-ID: <20080121094341.GA9198@wotan.suse.de> (raw)
In-Reply-To: <20080118224622.GA11563@wotan.suse.de>

On Fri, Jan 18, 2008 at 11:46:22PM +0100, Nick Piggin wrote:
> On Fri, Jan 18, 2008 at 08:41:22AM -0800, Linus Torvalds wrote:
>  
> > Then, just have a library version of the long form, and make architectures 
> > that don't support it just use that (just so that you don't have to 
> > duplicate that silly thing). So an architecture that support special page 
> > flags would do somethiing like
> > 
> > 	#define set_special_page(vma,addr,ptep,pfn,prot) \
> > 		set_pte_at(vma, addr, ptep, mk_special_pte(pfn,prot))
> > 	#define vm_normal_page(vma,addr,pte)
> > 		(pte_special(pte) ? NULL : pte_page(pte))
> > 
> > and other architectures would just do
> > 
> > 	#define set_special_page(vma,addr,ptep,pfn,prot) \
> > 		set_pte_at(vma, addr, ptep, mk_pte(pfn,prot))
> > 	#define vm_normal_page(vma,addr,pte) \
> > 		generic_vm_normal_page(vma,addr,pte)
> > 
> > or something.
> > 
> > THAT is what I mean by "no #ifdef's in code" - that the selection is done 
> > at a higher level, the same way we have good interfaces with clear 
> > *conceptual* meaning for all the other PTE accessing stuff, rather than 
> > have conditionals in the architecture-independent code.
> 
> OK, that gets around the "duplicate vm_normal_page everywhere" issue I
> had. I'm still not quite happy with it ;)
> 
> How about taking a different approach. How about also having a pte_normal()
> function. Each architecture that has a pte special bit would make this
> !pte_special, and those that don't would return 0. They return 0 from both
> pte_special and pte_normal because they don't know whether the pte is
> special or normal.
> 
> Then vm_normal_page would become:
> 
>     if (pte_special(pte))
>         return NULL;
>     else if (pte_normal(pte))
>         return pte_page(pte);
> 
>     ... /* vma based scheme */

Hmm, it's not *quite* trivial as that for one important case:
vm_insert_mixed. Because we don't actually have a pte yet, so we can't
easily reuse insert_page / insert_pfn, rather we have to build the pte
first and then check it (patch attached, but I think it is a step
backwards)...

Really, I don't think either of my two approaches or your approach is
really a fundamentally different _abstraction_. It basically just has
to accommodate 2 different code paths no matter how you look at it. I
don't know how this is different to, say, conditionally compiling eg.
the FLATMEM/SPARSEMEM memory model code, or rwsem code, depending on
whether an architecture has defined some symbol. It happens all over
the kernel.

Actually, I'd argue it is _better_ than that, because the logic stays
in one place (one screenful, even), and away from abuse or divergence
by arch code.

If one actually came up with a new API that handles both cases better,
I'd say that is a different abstraction. Or if you could come up with
some different arch functions which would allow vm_normal_page to be
streamlined to read more like a regular C function, that should be a
different abstraction...

I'm still keen on my first patch. I know it isn't beautiful, but I
think it is better than the alternatives.

---

mm: add vm_insert_mixed

vm_insert_mixed will insert either a raw pfn or a refcounted struct page
into the page tables, depending on whether vm_normal_page() will return
the page or not. With the introduction of the new pte bit, this is now
a bit too tricky for drivers to be doing themselves.

filemap_xip uses this in a subsequent patch.

---
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -1097,6 +1097,8 @@ int remap_pfn_range(struct vm_area_struc
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
 int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
+int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
+			unsigned long pfn);
 
 struct page *follow_page(struct vm_area_struct *, unsigned long address,
 			unsigned int foll_flags);
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1282,6 +1282,53 @@ out:
 }
 EXPORT_SYMBOL(vm_insert_pfn);
 
+int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
+			unsigned long pfn)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	int retval;
+	pte_t *pte, entry;
+	spinlock_t *ptl;
+
+	BUG_ON(!(vma->vm_flags & VM_MIXEDMAP));
+
+	if (addr < vma->vm_start || addr >= vma->vm_end)
+		return -EFAULT;
+
+	retval = -ENOMEM;
+	pte = get_locked_pte(mm, addr, &ptl);
+	if (!pte)
+		goto out;
+	retval = -EBUSY;
+	if (!pte_none(*pte))
+		goto out_unlock;
+
+	entry = pte_mkspecial(pfn_pte(pfn, prot));
+	/*
+	 * If we don't have pte special, then we have to use the pfn_valid()
+	 * based VM_MIXEDMAP scheme (see vm_normal_page), and thus we *must*
+	 * refcount the page if pfn_valid is true. Otherwise we can *always*
+	 * avoid refcounting the page if we have pte_special.
+	 */
+	if (!pte_special(entry) && pfn_valid(pfn)) {
+		struct page *page;
+
+		page = pfn_to_page(pfn);
+		get_page(page);
+		inc_mm_counter(mm, file_rss);
+		page_add_file_rmap(page);
+	}
+	/* Ok, finally just insert the thing.. */
+	set_pte_at(mm, addr, pte, entry);
+
+	retval = 0;
+out_unlock:
+	pte_unmap_unlock(pte, ptl);
+out:
+	return retval;
+}
+EXPORT_SYMBOL(vm_insert_mixed);
+
 /*
  * maps a range of physical memory into the requested pages. the old
  * mappings are removed. any references to nonexistent pages results

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2008-01-21  9:43 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20080118045649.334391000@suse.de>
2008-01-18  4:56 ` [patch 1/6] mm: introduce VM_MIXEDMAP npiggin, Jared Hulbert
2008-01-18  4:56 ` [patch 2/6] mm: introduce pte_special pte bit npiggin
2008-01-18 16:41   ` Linus Torvalds
2008-01-18 18:04     ` Sam Ravnborg
2008-01-18 18:28       ` Linus Torvalds
2008-01-18 18:53         ` Sam Ravnborg
2008-01-18 22:46     ` Nick Piggin
2008-01-18 23:03       ` Linus Torvalds
2008-01-19  5:07         ` Nick Piggin
2008-01-21  9:43       ` Nick Piggin [this message]
2008-01-18  4:56 ` [patch 3/6] mm: add vm_insert_mixed npiggin
2008-01-18  4:56 ` [patch 4/6] xip: support non-struct page backed memory npiggin
2008-03-01  8:14   ` Jared Hulbert
2008-03-03  5:29     ` Nick Piggin
2008-03-03  8:30       ` Carsten Otte
2008-03-03 15:59       ` Jared Hulbert
2008-03-03  8:18     ` Carsten Otte
2008-03-03 15:44       ` Jared Hulbert
2008-03-03 18:40       ` Linus Torvalds
2008-03-03 19:38         ` Jared Hulbert
2008-03-03 20:04           ` Linus Torvalds
2008-03-03 20:32             ` Nick Piggin
2008-03-03 22:21               ` Linus Torvalds
2008-03-03 23:25                 ` Jared Hulbert
2008-03-04  9:06                 ` Carsten Otte

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080121094341.GA9198@wotan.suse.de \
    --to=npiggin@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=cotte@de.ibm.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=hugh@veritas.com \
    --cc=jaredeh@gmail.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=schwidefsky@de.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).