From: Nick Piggin <npiggin@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hugh@veritas.com>,
Jared Hulbert <jaredeh@gmail.com>,
Carsten Otte <cotte@de.ibm.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
linux-arch@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch 2/6] mm: introduce pte_special pte bit
Date: Mon, 21 Jan 2008 10:43:41 +0100 [thread overview]
Message-ID: <20080121094341.GA9198@wotan.suse.de> (raw)
In-Reply-To: <20080118224622.GA11563@wotan.suse.de>
On Fri, Jan 18, 2008 at 11:46:22PM +0100, Nick Piggin wrote:
> On Fri, Jan 18, 2008 at 08:41:22AM -0800, Linus Torvalds wrote:
>
> > Then, just have a library version of the long form, and make architectures
> > that don't support it just use that (just so that you don't have to
> > duplicate that silly thing). So an architecture that support special page
> > flags would do somethiing like
> >
> > #define set_special_page(vma,addr,ptep,pfn,prot) \
> > set_pte_at(vma, addr, ptep, mk_special_pte(pfn,prot))
> > #define vm_normal_page(vma,addr,pte)
> > (pte_special(pte) ? NULL : pte_page(pte))
> >
> > and other architectures would just do
> >
> > #define set_special_page(vma,addr,ptep,pfn,prot) \
> > set_pte_at(vma, addr, ptep, mk_pte(pfn,prot))
> > #define vm_normal_page(vma,addr,pte) \
> > generic_vm_normal_page(vma,addr,pte)
> >
> > or something.
> >
> > THAT is what I mean by "no #ifdef's in code" - that the selection is done
> > at a higher level, the same way we have good interfaces with clear
> > *conceptual* meaning for all the other PTE accessing stuff, rather than
> > have conditionals in the architecture-independent code.
>
> OK, that gets around the "duplicate vm_normal_page everywhere" issue I
> had. I'm still not quite happy with it ;)
>
> How about taking a different approach. How about also having a pte_normal()
> function. Each architecture that has a pte special bit would make this
> !pte_special, and those that don't would return 0. They return 0 from both
> pte_special and pte_normal because they don't know whether the pte is
> special or normal.
>
> Then vm_normal_page would become:
>
> if (pte_special(pte))
> return NULL;
> else if (pte_normal(pte))
> return pte_page(pte);
>
> ... /* vma based scheme */
Hmm, it's not *quite* trivial as that for one important case:
vm_insert_mixed. Because we don't actually have a pte yet, so we can't
easily reuse insert_page / insert_pfn, rather we have to build the pte
first and then check it (patch attached, but I think it is a step
backwards)...
Really, I don't think either of my two approaches or your approach is
really a fundamentally different _abstraction_. It basically just has
to accommodate 2 different code paths no matter how you look at it. I
don't know how this is different to, say, conditionally compiling eg.
the FLATMEM/SPARSEMEM memory model code, or rwsem code, depending on
whether an architecture has defined some symbol. It happens all over
the kernel.
Actually, I'd argue it is _better_ than that, because the logic stays
in one place (one screenful, even), and away from abuse or divergence
by arch code.
If one actually came up with a new API that handles both cases better,
I'd say that is a different abstraction. Or if you could come up with
some different arch functions which would allow vm_normal_page to be
streamlined to read more like a regular C function, that should be a
different abstraction...
I'm still keen on my first patch. I know it isn't beautiful, but I
think it is better than the alternatives.
---
mm: add vm_insert_mixed
vm_insert_mixed will insert either a raw pfn or a refcounted struct page
into the page tables, depending on whether vm_normal_page() will return
the page or not. With the introduction of the new pte bit, this is now
a bit too tricky for drivers to be doing themselves.
filemap_xip uses this in a subsequent patch.
---
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -1097,6 +1097,8 @@ int remap_pfn_range(struct vm_area_struc
int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn);
+int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn);
struct page *follow_page(struct vm_area_struct *, unsigned long address,
unsigned int foll_flags);
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1282,6 +1282,53 @@ out:
}
EXPORT_SYMBOL(vm_insert_pfn);
+int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ int retval;
+ pte_t *pte, entry;
+ spinlock_t *ptl;
+
+ BUG_ON(!(vma->vm_flags & VM_MIXEDMAP));
+
+ if (addr < vma->vm_start || addr >= vma->vm_end)
+ return -EFAULT;
+
+ retval = -ENOMEM;
+ pte = get_locked_pte(mm, addr, &ptl);
+ if (!pte)
+ goto out;
+ retval = -EBUSY;
+ if (!pte_none(*pte))
+ goto out_unlock;
+
+ entry = pte_mkspecial(pfn_pte(pfn, prot));
+ /*
+ * If we don't have pte special, then we have to use the pfn_valid()
+ * based VM_MIXEDMAP scheme (see vm_normal_page), and thus we *must*
+ * refcount the page if pfn_valid is true. Otherwise we can *always*
+ * avoid refcounting the page if we have pte_special.
+ */
+ if (!pte_special(entry) && pfn_valid(pfn)) {
+ struct page *page;
+
+ page = pfn_to_page(pfn);
+ get_page(page);
+ inc_mm_counter(mm, file_rss);
+ page_add_file_rmap(page);
+ }
+ /* Ok, finally just insert the thing.. */
+ set_pte_at(mm, addr, pte, entry);
+
+ retval = 0;
+out_unlock:
+ pte_unmap_unlock(pte, ptl);
+out:
+ return retval;
+}
+EXPORT_SYMBOL(vm_insert_mixed);
+
/*
* maps a range of physical memory into the requested pages. the old
* mappings are removed. any references to nonexistent pages results
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-01-21 9:43 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20080118045649.334391000@suse.de>
2008-01-18 4:56 ` [patch 1/6] mm: introduce VM_MIXEDMAP npiggin, Jared Hulbert
2008-01-18 4:56 ` [patch 2/6] mm: introduce pte_special pte bit npiggin
2008-01-18 16:41 ` Linus Torvalds
2008-01-18 18:04 ` Sam Ravnborg
2008-01-18 18:28 ` Linus Torvalds
2008-01-18 18:53 ` Sam Ravnborg
2008-01-18 22:46 ` Nick Piggin
2008-01-18 23:03 ` Linus Torvalds
2008-01-19 5:07 ` Nick Piggin
2008-01-21 9:43 ` Nick Piggin [this message]
2008-01-18 4:56 ` [patch 3/6] mm: add vm_insert_mixed npiggin
2008-01-18 4:56 ` [patch 4/6] xip: support non-struct page backed memory npiggin
2008-03-01 8:14 ` Jared Hulbert
2008-03-03 5:29 ` Nick Piggin
2008-03-03 8:30 ` Carsten Otte
2008-03-03 15:59 ` Jared Hulbert
2008-03-03 8:18 ` Carsten Otte
2008-03-03 15:44 ` Jared Hulbert
2008-03-03 18:40 ` Linus Torvalds
2008-03-03 19:38 ` Jared Hulbert
2008-03-03 20:04 ` Linus Torvalds
2008-03-03 20:32 ` Nick Piggin
2008-03-03 22:21 ` Linus Torvalds
2008-03-03 23:25 ` Jared Hulbert
2008-03-04 9:06 ` Carsten Otte
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080121094341.GA9198@wotan.suse.de \
--to=npiggin@suse.de \
--cc=akpm@linux-foundation.org \
--cc=cotte@de.ibm.com \
--cc=heiko.carstens@de.ibm.com \
--cc=hugh@veritas.com \
--cc=jaredeh@gmail.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=schwidefsky@de.ibm.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).