* [patch] do_no_pfn
@ 2006-06-19 9:19 Jes Sorensen
2006-06-19 13:06 ` Andi Kleen
2006-06-27 12:46 ` [patch] do_no_pfn - against latest git Jes Sorensen
0 siblings, 2 replies; 15+ messages in thread
From: Jes Sorensen @ 2006-06-19 9:19 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, Nick Piggin, Hugh Dickins, Carsten Otte,
bjorn_helgaas
Hi,
I woke up this morning and had a revelation! Today is the day, the day
of do_no_pfn()! It can be no other way ... :) And what happens, I come
into the office to discover that 2.6.17 is out! It has to be a sign!
Anyway, I have had no objections to this patch for a while now,
clearly it is perfect<tm> :) If anybody has new objections, it's
obviously not my fault! But ok I'll look at them anyway :)
So here it is, it even boots!
Cheers,
Jes
Implement do_no_pfn() for handling mapping of memory without a struct
page backing it. This avoids creating fake page table entries for
regions which are not backed by real memory.
This version uses specific NOPFN_{SIGBUS,OOM} return values, rather
than expecting all negative pfn values to be errors. It also BUGs on
COW mappings, as these would not work with the VM.
Signed-off-by: Jes Sorensen <jes@sgi.com>
---
include/linux/mm.h | 7 +++++
mm/memory.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 64 insertions(+), 5 deletions(-)
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -199,6 +199,7 @@
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int *type);
+ unsigned long (*nopfn)(struct vm_area_struct * area, unsigned long address);
int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
#ifdef CONFIG_NUMA
int (*set_policy)(struct vm_area_struct *vma, struct mempolicy *new);
@@ -612,6 +613,12 @@
#define NOPAGE_OOM ((struct page *) (-1))
/*
+ * Error return values for the *_nopfn functions
+ */
+#define NOPFN_SIGBUS ((unsigned long) -1)
+#define NOPFN_OOM ((unsigned long) -2)
+
+/*
* Different kinds of faults, as returned by handle_mm_fault().
* Used to decide whether a process gets delivered SIGBUS or
* just gets major/minor fault counters bumped up.
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -2146,6 +2146,52 @@
}
/*
+ * do_no_pfn() tries to create a new page mapping for a page without
+ * a struct_page backing it
+ *
+ * As this is called only for pages that do not currently exist, we
+ * do not need to flush old virtual caches or the TLB.
+ *
+ * We enter with non-exclusive mmap_sem (to exclude vma changes,
+ * but allow concurrent faults), and pte mapped but not yet locked.
+ * We return with mmap_sem still held, but pte unmapped and unlocked.
+ *
+ * It is expected that the ->nopfn handler always returns the same pfn
+ * for a given virtual mapping.
+ */
+static int do_no_pfn(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, pte_t *page_table, pmd_t *pmd,
+ int write_access)
+{
+ spinlock_t *ptl;
+ pte_t entry;
+ unsigned long pfn;
+ int ret = VM_FAULT_MINOR;
+
+ pte_unmap(page_table);
+ BUG_ON(!(vma->vm_flags & VM_PFNMAP));
+ BUG_ON(is_cow_mapping(vma->vm_flags));
+
+ pfn = vma->vm_ops->nopfn(vma, address & PAGE_MASK);
+ if (pfn == NOPFN_OOM)
+ return VM_FAULT_OOM;
+ if (pfn == NOPFN_SIGBUS)
+ return VM_FAULT_SIGBUS;
+
+ page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
+
+ /* Only go through if we didn't race with anybody else... */
+ if (pte_none(*page_table)) {
+ entry = pfn_pte(pfn, vma->vm_page_prot);
+ if (write_access)
+ entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+ set_pte_at(mm, address, page_table, entry);
+ }
+ pte_unmap_unlock(page_table, ptl);
+ return ret;
+}
+
+/*
* Fault of a previously existing named mapping. Repopulate the pte
* from the encoded file_pte if possible. This enables swappable
* nonlinear vmas.
@@ -2207,11 +2253,17 @@
old_entry = entry = *pte;
if (!pte_present(entry)) {
if (pte_none(entry)) {
- if (!vma->vm_ops || !vma->vm_ops->nopage)
- return do_anonymous_page(mm, vma, address,
- pte, pmd, write_access);
- return do_no_page(mm, vma, address,
- pte, pmd, write_access);
+ if (vma->vm_ops) {
+ if (vma->vm_ops->nopage)
+ return do_no_page(mm, vma, address,
+ pte, pmd,
+ write_access);
+ if (vma->vm_ops->nopfn)
+ return do_no_pfn(mm, vma, address, pte,
+ pmd, write_access);
+ }
+ return do_anonymous_page(mm, vma, address,
+ pte, pmd, write_access);
}
if (pte_file(entry))
return do_file_page(mm, vma, address,
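The return-value convention in the patch above can be illustrated outside
the kernel. What follows is a minimal userspace sketch, not the patch
itself: NOPFN_SIGBUS and NOPFN_OOM are taken from the patch, while
model_pte, model_do_no_pfn(), nopfn_ok(), nopfn_sigbus(), nopfn_oom() and
the 4 KiB page shift are assumptions made up for illustration. It shows the
two sentinels being translated into fault results, and the entry being
installed only if it is still empty.

```c
#include <assert.h>
#include <stdbool.h>

/* Sentinels as defined by the patch. */
#define NOPFN_SIGBUS ((unsigned long) -1)
#define NOPFN_OOM    ((unsigned long) -2)

/* Hypothetical stand-ins for the VM_FAULT_* codes. */
enum fault_result { FAULT_MINOR, FAULT_SIGBUS, FAULT_OOM };

/* Hypothetical stand-in for a page table entry: empty, or holding a pfn. */
struct model_pte {
	bool present;
	unsigned long pfn;
};

/* Hypothetical handler type: returns a pfn or one of the NOPFN_* sentinels. */
typedef unsigned long (*nopfn_fn)(unsigned long address);

enum fault_result model_do_no_pfn(struct model_pte *pte, nopfn_fn nopfn,
				  unsigned long address)
{
	unsigned long pfn;

	assert(pte && nopfn);

	pfn = nopfn(address);
	if (pfn == NOPFN_OOM)
		return FAULT_OOM;
	if (pfn == NOPFN_SIGBUS)
		return FAULT_SIGBUS;

	/* Only install the entry if nobody else filled it in first. */
	if (!pte->present) {
		pte->present = true;
		pte->pfn = pfn;
	}
	return FAULT_MINOR;
}

/* Example handlers; the 12-bit shift assumes 4 KiB pages. */
unsigned long nopfn_ok(unsigned long address) { return address >> 12; }
unsigned long nopfn_sigbus(unsigned long address) { (void)address; return NOPFN_SIGBUS; }
unsigned long nopfn_oom(unsigned long address) { (void)address; return NOPFN_OOM; }
```

Because the sentinels are the two largest unsigned long values, every other
value remains usable as a pfn, which is the point of not treating all
"negative" values as errors.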
* Re: [patch] do_no_pfn
2006-06-19 9:19 [patch] do_no_pfn Jes Sorensen
@ 2006-06-19 13:06 ` Andi Kleen
2006-06-19 22:49 ` Robin Holt
2006-06-27 12:46 ` [patch] do_no_pfn - against latest git Jes Sorensen
1 sibling, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2006-06-19 13:06 UTC (permalink / raw)
To: Jes Sorensen
Cc: linux-kernel, Nick Piggin, Hugh Dickins, Carsten Otte,
bjorn_helgaas
Jes Sorensen <jes@sgi.com> writes:
> Hi,
>
> I woke up this morning and had a revelation! Today is the day, the day
> of do_no_pfn()! It can be no other way ... :) And what happens, I come
> into the office to discover that 2.6.17 is out! It has to be a sign!
>
> Anyway, I have had no objections to this patch for a while now,
> clearly it is perfect<tm> :) If anybody has new objections, it's
> obviously not my fault! But ok I'll look at them anyway :)
>
> So here it is, it even boots!
The big question is - why do you have pages without struct page?
It seems ... wrong.
-Andi
* Re: [patch] do_no_pfn
2006-06-19 13:06 ` Andi Kleen
@ 2006-06-19 22:49 ` Robin Holt
2006-06-20 8:01 ` Jes Sorensen
2006-06-20 8:58 ` Carsten Otte
0 siblings, 2 replies; 15+ messages in thread
From: Robin Holt @ 2006-06-19 22:49 UTC (permalink / raw)
To: Andi Kleen
Cc: Jes Sorensen, linux-kernel, Nick Piggin, Hugh Dickins,
Carsten Otte, bjorn_helgaas
On Mon, Jun 19, 2006 at 03:06:05PM +0200, Andi Kleen wrote:
> The big question is - why do you have pages without struct page?
> It seems ... wrong.
For mspec, these are pages which come from the EFI trim regions.
They are not usable by the kernel. Dropping in a kernel TLB entry for
them allows speculation into PROM-reserved memory, which will result
in corrupting the PROM's memory. We have seen this in the past.
Additionally, Carsten Otte had been pursuing a do_no_pfn function to
allow execute in place to insert executable pages into an OS instance
which are actually part of the physical machine's memory map, but not part
of the virtual machine's. Several virtual machines could share the same
physical page. I, of course, reserve the right to have gotten Carsten's
intentions completely wrong. I don't believe I have misinterpreted
the intentions of do_no_pfn as expressed on the linux-mm mailing list.
Are you saying that for the mspec pages we should extend the vmem_map,
partially populate the regions for the mspec pages, mark those pages as
uncached and reserved and then turn them over to the uncached allocator?
Seems like we have done a lot of extra work to put a struct page behind
a page which requires special handling.
For Carsten's case, how would you propose we handle that?
Thanks,
Robin
* Re: [patch] do_no_pfn
2006-06-19 22:49 ` Robin Holt
@ 2006-06-20 8:01 ` Jes Sorensen
2006-06-20 8:13 ` Andi Kleen
2006-06-20 16:03 ` Bjorn Helgaas
2006-06-20 8:58 ` Carsten Otte
1 sibling, 2 replies; 15+ messages in thread
From: Jes Sorensen @ 2006-06-20 8:01 UTC (permalink / raw)
To: Robin Holt
Cc: Andi Kleen, linux-kernel, Nick Piggin, Hugh Dickins, Carsten Otte,
bjorn_helgaas
Robin Holt wrote:
> On Mon, Jun 19, 2006 at 03:06:05PM +0200, Andi Kleen wrote:
>> The big question is - why do you have pages without struct page?
>> It seems ... wrong.
[snip]
> Are you saying that for the mspec pages we should extend the vmem_map,
> partially populate the regions for the mspec pages, mark those pages as
> uncached and reserved and then turn them over to the uncached allocator?
> Seems like we have done a lot of extra work to put a struct page behind
> a page which requires special handling.
Note that Bjorn Helgaas has a case where he needs this as well.
We could fake the pages by giving them a struct page, but there is
really no point, as you say.
Cheers,
Jes
* Re: [patch] do_no_pfn
2006-06-20 8:01 ` Jes Sorensen
@ 2006-06-20 8:13 ` Andi Kleen
2006-06-20 8:40 ` Jes Sorensen
2006-06-20 16:03 ` Bjorn Helgaas
1 sibling, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2006-06-20 8:13 UTC (permalink / raw)
To: Jes Sorensen
Cc: Robin Holt, linux-kernel, Nick Piggin, Hugh Dickins, Carsten Otte,
bjorn_helgaas
On Tuesday 20 June 2006 10:01, Jes Sorensen wrote:
> Robin Holt wrote:
> > On Mon, Jun 19, 2006 at 03:06:05PM +0200, Andi Kleen wrote:
> >> The big question is - why do you have pages without struct page?
> >> It seems ... wrong.
> [snip]
> > Are you saying that for the mspec pages we should extend the vmem_map,
> > partially populate the regions for the mspec pages, mark those pages as
> > uncached and reserved and then turn them over to the uncached allocator?
> > Seems like we have done a lot of extra work to put a struct page behind
> > a page which requires special handling.
>
> Note that Bjorn Helgaas has a case where he needs this as well.
>
> We could fake the pages by giving them a struct page, but there is
> really no point, as you say.
I think it would be better if you gave them struct pages instead
of messing up core vm with such strange hooks.
Or alternatively code this in a different way. There are drivers
that map IO memory into user space without needing hacks like that.
Usually they just tweak the page tables directly on mmap.
-Andi
* Re: [patch] do_no_pfn
2006-06-20 8:13 ` Andi Kleen
@ 2006-06-20 8:40 ` Jes Sorensen
2006-06-20 8:48 ` Andi Kleen
0 siblings, 1 reply; 15+ messages in thread
From: Jes Sorensen @ 2006-06-20 8:40 UTC (permalink / raw)
To: Andi Kleen
Cc: Robin Holt, linux-kernel, Nick Piggin, Hugh Dickins, Carsten Otte,
bjorn_helgaas
Andi Kleen wrote:
> On Tuesday 20 June 2006 10:01, Jes Sorensen wrote:
>> We could fake the pages by giving them a struct page, but there is
>> really no point, as you say.
>
> I think it would be better if you gave them struct pages instead
> of messing up core vm with such strange hooks.
>
> Or alternatively code this in a different way. There are drivers
> that map IO memory into user space without needing hacks like that.
> Usually they just tweak the page tables directly on mmap.
Please go back and read the old threads on this for all the details,
I would miss half the points if I was to try and restate it all from
memory.
Doing this at mmap time does not work, you want NUMA node locality.
It has to be done through first touch mappings.
Jes
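The first-touch argument can be sketched in a small userspace model. This
is an illustration under stated assumptions, not the mspec driver: the
model_* names, the two-node layout and the per-node base pfns are all
hypothetical. Nothing is chosen at mmap time; the pfn for a page is picked
from the node of whichever CPU faults on it first, and later faults on the
same page return the same pfn, as the patch comment expects of a ->nopfn
handler.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NODES  2
#define MODEL_PAGES  4
#define MODEL_NO_PFN ((unsigned long) -1)

/* Hypothetical first pfn of each node's pool of special pages. */
static const unsigned long node_base_pfn[MODEL_NODES] = { 0x1000, 0x2000 };

struct model_mapping {
	unsigned long pfn[MODEL_PAGES];       /* MODEL_NO_PFN until first touch */
	unsigned long next_free[MODEL_NODES]; /* next unused page in each pool */
};

/* "mmap time": record the mapping but do not pick any pfn yet. */
void model_mmap(struct model_mapping *m)
{
	size_t i;

	for (i = 0; i < MODEL_PAGES; i++)
		m->pfn[i] = MODEL_NO_PFN;
	for (i = 0; i < MODEL_NODES; i++)
		m->next_free[i] = 0;
}

/* "fault time": pick a pfn local to the node that touched the page first. */
unsigned long model_first_touch(struct model_mapping *m, size_t page,
				int faulting_node)
{
	assert(page < MODEL_PAGES);
	assert(faulting_node >= 0 && faulting_node < MODEL_NODES);

	if (m->pfn[page] == MODEL_NO_PFN)
		m->pfn[page] = node_base_pfn[faulting_node] +
			       m->next_free[faulting_node]++;

	/* Repeated faults on the same page always see the same pfn. */
	return m->pfn[page];
}
```

Populating the page tables in the mmap call instead would fix every page to
the node of the thread that happened to call mmap, which is the locality
problem described above.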
* Re: [patch] do_no_pfn
2006-06-20 8:40 ` Jes Sorensen
@ 2006-06-20 8:48 ` Andi Kleen
2006-06-20 9:12 ` Jes Sorensen
0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2006-06-20 8:48 UTC (permalink / raw)
To: Jes Sorensen
Cc: Robin Holt, linux-kernel, Nick Piggin, Hugh Dickins, Carsten Otte,
bjorn_helgaas
> Please go back and read the old threads on this for all the details,
> I would miss half the points if I was to try and restate it all from
> memory.
Shouldn't these points be in the patch submission description?
> Doing this at mmap time does not work, you want NUMA node locality.
> It has to be done through first touch mappings.
Then create struct page *s.
-Andi
* Re: [patch] do_no_pfn
2006-06-19 22:49 ` Robin Holt
2006-06-20 8:01 ` Jes Sorensen
@ 2006-06-20 8:58 ` Carsten Otte
1 sibling, 0 replies; 15+ messages in thread
From: Carsten Otte @ 2006-06-20 8:58 UTC (permalink / raw)
To: Robin Holt
Cc: Andi Kleen, Jes Sorensen, linux-kernel, Nick Piggin, Hugh Dickins,
bjorn_helgaas
Robin Holt wrote:
> For Carsten's case, how would you propose we handle that?
After previous discussion with Linus, and because I do not
have a good idea how to solve the remaining problems
in a clean way (yet?), please leave my case out for now.
We won't need it anytime soon on 390 as far as I can tell.
Linus Torvalds wrote:
> You _really_ cannot do COW together with "random pfn
> filling".
cheers,
Carsten
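The COW restriction is what the BUG_ON(is_cow_mapping(vma->vm_flags)) check
in the patch enforces. As a rough userspace sketch: a mapping counts as COW
when it is private (not VM_SHARED) but may become writable (VM_MAYWRITE).
The flag values and the helper body below are reproduced from memory of
kernels of that period and should be treated as an assumption, and
model_is_cow_mapping() is a made-up name.

```c
/* Assumed to match include/linux/mm.h of the 2.6.17 era. */
#define VM_WRITE    0x00000002UL
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* Private and potentially writable: such a mapping may need to COW. */
int model_is_cow_mapping(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}
```

Under this test a MAP_SHARED mapping is never COW, so a driver using ->nopfn
would have to insist on shared mappings to stay clear of the BUG_ON.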
* Re: [patch] do_no_pfn
2006-06-20 8:48 ` Andi Kleen
@ 2006-06-20 9:12 ` Jes Sorensen
2006-06-20 9:35 ` Andi Kleen
0 siblings, 1 reply; 15+ messages in thread
From: Jes Sorensen @ 2006-06-20 9:12 UTC (permalink / raw)
To: Andi Kleen
Cc: Robin Holt, linux-kernel, Nick Piggin, Hugh Dickins, Carsten Otte,
bjorn_helgaas
Andi Kleen wrote:
>> Please go back and read the old threads on this for all the details,
>> I would miss half the points if I was to try and restate it all from
>> memory.
>
> Shouldn't these points be in the patch submission description?
You expect people to go look for things on random mailing lists when you
post it, but you don't care to search the archives yourself.... och
well.
http://www.gelato.unsw.edu.au/archives/linux-ia64/0603/index.html#17543
http://www.ussg.iu.edu/hypermail/linux/kernel/0604.2/index.html#0652
http://www.ussg.iu.edu/hypermail/linux/kernel/0604.3/index.html#0029
>> Doing this at mmap time does not work, you want NUMA node locality.
>> It has to be done through first touch mappings.
>
> Then create struct page *s.
One struct page for a random single page here, another for a single
random page there. And the risk that someone will start walking the
pages and dereference and cause data corruption. As explained before,
it's a bad idea.
Jes
* Re: [patch] do_no_pfn
2006-06-20 9:12 ` Jes Sorensen
@ 2006-06-20 9:35 ` Andi Kleen
2006-06-20 11:02 ` Robin Holt
2006-06-21 9:50 ` Jes Sorensen
0 siblings, 2 replies; 15+ messages in thread
From: Andi Kleen @ 2006-06-20 9:35 UTC (permalink / raw)
To: Jes Sorensen
Cc: Robin Holt, linux-kernel, Nick Piggin, Hugh Dickins, Carsten Otte,
bjorn_helgaas
> One struct page for a random single page here, another for a single
> random page there. And the risk that someone will start walking the
> pages and dereference and cause data corruption. As explained before,
> it's a bad idea.
Not sure what your point is. Why should they cause memory corruption?
Allowing a struct-page-less VM is worse. If you add that, then people
will use it for other stuff, and eventually we will have a two-class
VM. None of that is very good.
-Andi
* Re: [patch] do_no_pfn
2006-06-20 9:35 ` Andi Kleen
@ 2006-06-20 11:02 ` Robin Holt
2006-06-21 9:50 ` Jes Sorensen
1 sibling, 0 replies; 15+ messages in thread
From: Robin Holt @ 2006-06-20 11:02 UTC (permalink / raw)
To: Andi Kleen
Cc: Jes Sorensen, Robin Holt, linux-kernel, Nick Piggin, Hugh Dickins,
Carsten Otte, bjorn_helgaas
On Tue, Jun 20, 2006 at 11:35:53AM +0200, Andi Kleen wrote:
>
> > One struct page for a random single page here, another for a single
> > random page there. And the risk that someone will start walking the
> > pages and dereference and cause data corruption. As explained before,
> > it's a bad idea.
>
> Not sure what your point is. Why should they cause memory corruption?
>
> Allowing a struct-page-less VM is worse. If you add that, then people
> will use it for other stuff, and eventually we will have a two-class
> VM. None of that is very good.
You already have that. You already stated the mapping of device memory.
The only thing we are asking to do is have a block of device memory
which has its pfn inserted at first touch. The device is essentially
available on each node. It is not something the generic parts of the
VM need to manage. What benefit are we going to get from having
struct page * behind the pages when the struct page needs to be marked
as reserved and uncached?
Thanks,
Robin
* Re: [patch] do_no_pfn
2006-06-20 8:01 ` Jes Sorensen
2006-06-20 8:13 ` Andi Kleen
@ 2006-06-20 16:03 ` Bjorn Helgaas
2006-06-21 7:38 ` Carsten Otte
1 sibling, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2006-06-20 16:03 UTC (permalink / raw)
To: Jes Sorensen
Cc: Robin Holt, Andi Kleen, linux-kernel, Nick Piggin, Hugh Dickins,
Carsten Otte, bjorn_helgaas
On Tuesday 20 June 2006 02:01, Jes Sorensen wrote:
> Robin Holt wrote:
> > On Mon, Jun 19, 2006 at 03:06:05PM +0200, Andi Kleen wrote:
> >> The big question is - why do you have pages without struct page?
> >> It seems ... wrong.
> ...
> Note that Bjorn Helgaas has a case where he needs this as well.
I do have a case where I used pages without struct pages, but
I don't really like the implementation, and I'd love to have
someone who knows about VM tell me "no, dummy, you should do it
this way instead."
Here's the scenario: I'm trying to implement
/sys/class/pci_bus/DDDD:BB/legacy_mem so we can run X servers
on multiple VGA cards. The chipset (used in HP parisc and ia64
boxes) supports multiple PCI root bridges, and it routes the
VGA legacy MMIO space at 0xA0000-0xBFFFF to one of them.
This region is MMIO, so there are no struct pages for it. I can
easily mmap the space for the first VGA device. But to support
a second device, I have to be able to invalidate the mappings
for the first device, twiddle stuff in the chipset, and make new
mappings for the second device. And of course I have to do the
reverse (invalidate mappings of second device, twiddle chipset,
map first device) when the first X server faults on the frame
buffer.
Basically, only one of the /sys/class/pci_bus/DDDD:BB/legacy_mem
files can have an active mmap at a time, and I haven't figured
out a good way to do the mutual exclusion.
Bjorn
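The hand-over described above can be written down as a tiny state machine.
This is only a userspace model of the sequence (invalidate the other
device's mappings, switch the chipset routing, map the faulting device);
every name in it is hypothetical, and it deliberately leaves out the
locking that is the actual open question.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_BRIDGES 2

struct model_vga_router {
	int routed_to;              /* bridge receiving 0xA0000-0xBFFFF, or -1 */
	bool mapped[MODEL_BRIDGES]; /* does this bridge's user mapping have live ptes? */
	unsigned int switches;      /* number of chipset routing changes */
};

void model_router_init(struct model_vga_router *r)
{
	int i;

	r->routed_to = -1;
	r->switches = 0;
	for (i = 0; i < MODEL_BRIDGES; i++)
		r->mapped[i] = false;
}

/* Called when the X server using `bridge` faults on its legacy_mem mapping. */
void model_legacy_fault(struct model_vga_router *r, int bridge)
{
	assert(bridge >= 0 && bridge < MODEL_BRIDGES);

	if (r->routed_to != bridge) {
		/* Invalidate the mappings of whoever owned the range before. */
		if (r->routed_to >= 0)
			r->mapped[r->routed_to] = false;
		/* "Twiddle stuff in the chipset": route the range to this bridge. */
		r->routed_to = bridge;
		r->switches++;
	}
	/* Now it is safe to (re)establish this device's mapping. */
	r->mapped[bridge] = true;
}
```

The invariant the model maintains is that at most one bridge has a live
mapping at any time, and that it is always the bridge the range is routed to.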
* Re: [patch] do_no_pfn
2006-06-20 16:03 ` Bjorn Helgaas
@ 2006-06-21 7:38 ` Carsten Otte
0 siblings, 0 replies; 15+ messages in thread
From: Carsten Otte @ 2006-06-21 7:38 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Jes Sorensen, Robin Holt, Andi Kleen, linux-kernel, Nick Piggin,
Hugh Dickins, bjorn_helgaas
Bjorn Helgaas wrote:
> I do have a case where I used pages without struct pages, but
> I don't really like the implementation, and I'd love to have
> someone who knows about VM tell me "no, dummy, you should do it
> this way instead."
>
> Here's the scenario: I'm trying to implement
> /sys/class/pci_bus/DDDD:BB/legacy_mem so we can run X servers
> on multiple VGA cards. The chipset (used in HP parisc and ia64
> boxes) supports multiple PCI root bridges, and it routes the
> VGA legacy MMIO space at 0xA0000-0xBFFFF to one of them.
>
> This region is MMIO, so there are no struct pages for it. I can
> easily mmap the space for the first VGA device. But to support
> a second device, I have to be able to invalidate the mappings
> for the first device, twiddle stuff in the chipset, and make new
> mappings for the second device. And of course I have to do the
> reverse (invalidate mappings of second device, twiddle chipset,
> map first device) when the first X server faults on the frame
> buffer.
>
> Basically, only one of the /sys/class/pci_bus/DDDD:BB/legacy_mem
> files can have an active mmap at a time, and I haven't figured
> out a good way to do the mutual exclusion.
Probably you can just nuke the pte's similar to __xip_unmap() in
mm/filemap_xip.c.
cheers,
Carsten
* Re: [patch] do_no_pfn
2006-06-20 9:35 ` Andi Kleen
2006-06-20 11:02 ` Robin Holt
@ 2006-06-21 9:50 ` Jes Sorensen
1 sibling, 0 replies; 15+ messages in thread
From: Jes Sorensen @ 2006-06-21 9:50 UTC (permalink / raw)
To: Andi Kleen
Cc: Robin Holt, linux-kernel, Nick Piggin, Hugh Dickins, Carsten Otte,
bjorn_helgaas
Andi Kleen wrote:
>> One struct page for a random single page here, another for a single
>> random page there. And the risk that someone will start walking the
>> pages and dereference and cause data corruption. As explained before,
>> it's a bad idea.
>
> Not sure what your point is. Why should they cause memory corruption?
>
> Allowing a struct-page-less VM is worse. If you add that, then people
> will use it for other stuff, and eventually we will have a two-class
> VM. None of that is very good.
Special treatment of the pages is required. In particular, they *must*
be referenced in uncached mode. If something dereferences the struct page
in cached mode while the official user of the page accesses it correctly
in uncached mode, one risks memory corruption. In fact it is worse than
that: a full granule of pages must not be touched like this.
But as Robin pointed out, there just is no real benefit to having a
struct page behind it.
Cheers,
Jes
* [patch] do_no_pfn - against latest git
2006-06-19 9:19 [patch] do_no_pfn Jes Sorensen
2006-06-19 13:06 ` Andi Kleen
@ 2006-06-27 12:46 ` Jes Sorensen
1 sibling, 0 replies; 15+ messages in thread
From: Jes Sorensen @ 2006-06-27 12:46 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, Nick Piggin, Hugh Dickins, bjorn_helgaas, holt
Hi Linus,
Included is the latest diff of the do_no_pfn patch against the latest
git tree. Since there have been no new objections to this one, it would
be nice to get it in before 2.6.18 closes.
Thanks,
Jes
Implement do_no_pfn() for handling mapping of memory without a struct
page backing it. This avoids creating fake page table entries for
regions which are not backed by real memory.
This feature is used by the MSPEC driver and other users, where it is
highly undesirable to have a struct page sitting behind the page
(for instance if the page is accessed in cached mode via the struct
page in parallel to the driver accessing it uncached, which can
result in data corruption on some architectures, such as ia64).
This version uses specific NOPFN_{SIGBUS,OOM} return values, rather
than expecting all negative pfn values to be errors. It also BUGs on
COW mappings, as these would not work with the VM.
Signed-off-by: Jes Sorensen <jes@sgi.com>
---
include/linux/mm.h | 7 +++++
mm/memory.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 64 insertions(+), 5 deletions(-)
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -197,6 +197,7 @@ struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int *type);
+ unsigned long (*nopfn)(struct vm_area_struct * area, unsigned long address);
int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
/* notification that a previously read-only page is about to become
@@ -619,6 +620,12 @@ static inline int page_mapped(struct pag
#define NOPAGE_OOM ((struct page *) (-1))
/*
+ * Error return values for the *_nopfn functions
+ */
+#define NOPFN_SIGBUS ((unsigned long) -1)
+#define NOPFN_OOM ((unsigned long) -2)
+
+/*
* Different kinds of faults, as returned by handle_mm_fault().
* Used to decide whether a process gets delivered SIGBUS or
* just gets major/minor fault counters bumped up.
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -2207,6 +2207,52 @@ oom:
}
/*
+ * do_no_pfn() tries to create a new page mapping for a page without
+ * a struct_page backing it
+ *
+ * As this is called only for pages that do not currently exist, we
+ * do not need to flush old virtual caches or the TLB.
+ *
+ * We enter with non-exclusive mmap_sem (to exclude vma changes,
+ * but allow concurrent faults), and pte mapped but not yet locked.
+ * We return with mmap_sem still held, but pte unmapped and unlocked.
+ *
+ * It is expected that the ->nopfn handler always returns the same pfn
+ * for a given virtual mapping.
+ */
+static int do_no_pfn(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, pte_t *page_table, pmd_t *pmd,
+ int write_access)
+{
+ spinlock_t *ptl;
+ pte_t entry;
+ unsigned long pfn;
+ int ret = VM_FAULT_MINOR;
+
+ pte_unmap(page_table);
+ BUG_ON(!(vma->vm_flags & VM_PFNMAP));
+ BUG_ON(is_cow_mapping(vma->vm_flags));
+
+ pfn = vma->vm_ops->nopfn(vma, address & PAGE_MASK);
+ if (pfn == NOPFN_OOM)
+ return VM_FAULT_OOM;
+ if (pfn == NOPFN_SIGBUS)
+ return VM_FAULT_SIGBUS;
+
+ page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
+
+ /* Only go through if we didn't race with anybody else... */
+ if (pte_none(*page_table)) {
+ entry = pfn_pte(pfn, vma->vm_page_prot);
+ if (write_access)
+ entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+ set_pte_at(mm, address, page_table, entry);
+ }
+ pte_unmap_unlock(page_table, ptl);
+ return ret;
+}
+
+/*
* Fault of a previously existing named mapping. Repopulate the pte
* from the encoded file_pte if possible. This enables swappable
* nonlinear vmas.
@@ -2268,11 +2314,17 @@ static inline int handle_pte_fault(struc
old_entry = entry = *pte;
if (!pte_present(entry)) {
if (pte_none(entry)) {
- if (!vma->vm_ops || !vma->vm_ops->nopage)
- return do_anonymous_page(mm, vma, address,
- pte, pmd, write_access);
- return do_no_page(mm, vma, address,
- pte, pmd, write_access);
+ if (vma->vm_ops) {
+ if (vma->vm_ops->nopage)
+ return do_no_page(mm, vma, address,
+ pte, pmd,
+ write_access);
+ if (vma->vm_ops->nopfn)
+ return do_no_pfn(mm, vma, address, pte,
+ pmd, write_access);
+ }
+ return do_anonymous_page(mm, vma, address,
+ pte, pmd, write_access);
}
if (pte_file(entry))
return do_file_page(mm, vma, address,
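The dispatch order that the last hunk gives handle_pte_fault() for an empty
pte can be summarised in a few lines. This is a userspace sketch of the
decision only: model_vm_ops, model_handler and model_pick_handler() are
hypothetical names, and booleans stand in for the function pointers in
struct vm_operations_struct.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_handler { HANDLER_ANONYMOUS, HANDLER_NO_PAGE, HANDLER_NO_PFN };

/* Stand-in for vm_operations_struct: which callbacks are provided. */
struct model_vm_ops {
	bool has_nopage;
	bool has_nopfn;
};

/* Mirrors the patched branch: ->nopage first, then ->nopfn, else anonymous. */
enum model_handler model_pick_handler(const struct model_vm_ops *ops)
{
	if (ops) {
		if (ops->has_nopage)
			return HANDLER_NO_PAGE;
		if (ops->has_nopfn)
			return HANDLER_NO_PFN;
	}
	return HANDLER_ANONYMOUS;
}
```

Two consequences follow from this ordering: a vma that sets both callbacks
never reaches do_no_pfn(), and a vma whose vm_ops provides neither still
falls through to do_anonymous_page(), as it did before the patch.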