From: William Lee Irwin III <wli@holomorphy.com>
To: "David S. Miller" <davem@redhat.com>
Cc: torvalds@osdl.org, linux-arch@vger.kernel.org
Subject: Re: copy_page_range()
Date: Sat, 7 Aug 2004 01:07:51 -0700
Message-ID: <20040807080751.GX17188@holomorphy.com>
In-Reply-To: <20040807000529.5ca6e8fe.davem@redhat.com>
On Sat, Aug 07, 2004 at 12:05:29AM -0700, David S. Miller wrote:
> Every couple months I look at this thing.
> The main issue is that it's very cache unfriendly,
> especially with how sparsely populated the page tables
> are for 64-bit processes.
> As a simple example, it's at the top of the kernel
> profile for 64-bit lat_proc {fork,exec,shell} on
> sparc64.
> And it's in fact the pmd array scans that take all
> of the cache misses, and thus most of the run time.
> An idea I've always been entertaining is to associate
> a bitmask with each pmd table. For example, a possible
> current implementation could be to abuse page_struct->index
> for this bitmask, and use virt_to_page(pmdp)->index to get
> at it.
Sounds generally reasonable.
On Sat, Aug 07, 2004 at 12:05:29AM -0700, David S. Miller wrote:
> This divides the pmd table into BITS_PER_LONG sections.
> If the bit is set in ->index then we populated at least
> one of the pmd entries in that section. We never clear
> bits, except at pmd table allocation time.
> Then the pmd scan iterates over ->index, and only actually
> dereferences the pmd entries iff it finds a set bit, and
> it only dereferences the section of pmd entries represented
> by that bit.
> Another idea I've also considered is to implement the
> pgd/pmd levels as a more compact tree, based upon virtual
> address, such as a radix tree.
> I think all of this could be experimented with if we
> abstracted out the pmd/pgd/pte iteration. So much stuff
> in the kernel mm code is of the form:
> 	for_each_pgd(pgdp)
> 		for_each_pmd(pgdp, pmdp)
> 			for_each_pte(pmdp, ptep)
> 				do_something(ptep)
> At 2-levels, as on most of the 32-bit platforms, things
> aren't so bad.
> Comments?
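The section-bitmask idea quoted above can be illustrated with a small
user-space sketch. Everything here is a hypothetical stand-in, not kernel
code: toy_pmd_table, toy_pmd_set(), toy_pmd_scan() and the table size of
1024 entries are assumptions made for the example, and the mask lives in
the toy struct rather than in virt_to_page(pmdp)->index.

```c
#include <limits.h>
#include <stddef.h>

#define PTRS_PER_PMD	1024	/* assumed table size for this toy */
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define PMDS_PER_BIT	(PTRS_PER_PMD / BITS_PER_LONG)

/*
 * Toy stand-in for one pmd table plus the mask that the proposal
 * would keep in the table's struct page.
 */
struct toy_pmd_table {
	unsigned long populated;		/* one bit per section */
	unsigned long entry[PTRS_PER_PMD];	/* 0 means "none" */
};

/* Fill an entry and mark its section; bits are never cleared here. */
static void toy_pmd_set(struct toy_pmd_table *t, size_t idx,
			unsigned long val)
{
	t->entry[idx] = val;
	t->populated |= 1UL << (idx / PMDS_PER_BIT);
}

/*
 * Scan the table, dereferencing only entries in sections whose bit is
 * set. Returns how many entries were dereferenced; *found receives how
 * many of those were populated.
 */
static size_t toy_pmd_scan(const struct toy_pmd_table *t, size_t *found)
{
	size_t touched = 0;

	*found = 0;
	for (size_t bit = 0; bit < BITS_PER_LONG; bit++) {
		if (!(t->populated & (1UL << bit)))
			continue;	/* whole section skipped, no loads */
		for (size_t i = bit * PMDS_PER_BIT;
		     i < (bit + 1) * PMDS_PER_BIT; i++) {
			touched++;
			if (t->entry[i])
				(*found)++;
		}
	}
	return touched;
}
```

With two entries set in different sections, toy_pmd_scan() dereferences
2 * PMDS_PER_BIT entries instead of all PTRS_PER_PMD, which is the
cache-miss saving the proposal is after.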
The number of levels can be abstracted easily. Something along these
lines should give an idea of how:
struct pte_walk_state {
	pgd_t *pgd;
	pmd_t *pmd;
	pte_t *pte;
	unsigned long vaddr;
};
int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
					struct vm_area_struct *vma)
{
	int cow, ret = 0;
	struct pte_walk_state walk_parent, walk_child;

	cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
	spin_lock(&dst->page_table_lock);
	pte_walk_descend_and_create(dst, &walk_child, vma->vm_start);
	for_each_inuse_pte(src, &walk_parent, vma->vm_start, vma->vm_end) {
		if (pte_walk_move_and_create(&walk_child, walk_parent.vaddr)) {
			ret = -ENOMEM;
			break;
		}

		/*
		 * do stuff to child and parent ptes
		 */
		...
	}
	spin_unlock(&dst->page_table_lock);
	return ret;
}
void zap_page_range(struct vm_area_struct *vma, unsigned long start,
				unsigned long len, struct zap_details *details)
{
	struct pte_walk_state walk;

	spin_lock(&vma->vm_mm->page_table_lock);
	for_each_inuse_pte(vma->vm_mm, &walk, vma->vm_start, vma->vm_end) {
		/*
		 * wipe pte and do stuff
		 */
		...
	}
	spin_unlock(&vma->vm_mm->page_table_lock);
}
where

#define for_each_inuse_pte(mm, walk, start, end)			\
	for (pte_walk_descend(mm, walk, start); (walk)->vaddr < (end);	\
		next_inuse_pte(walk))

etc.
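
To make the shape of the iterator concrete, here is a minimal user-space
sketch over a toy 3-level table with 4 entries per level. All toy_* names
and the table geometry are hypothetical. It also assumes that the descend
step lands on the first in-use pte at or after start, and that running
off the end is signalled by vaddr == ULONG_MAX; neither detail is spelled
out above. For brevity the toy state keeps the mm and re-walks from the
top on each step instead of caching pgd/pmd pointers.

```c
#include <limits.h>
#include <stddef.h>

#define TOY_SHIFT	2			/* 4 entries per level */
#define TOY_PTRS	(1UL << TOY_SHIFT)
#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_SIZE	(1UL << TOY_PAGE_SHIFT)
#define TOY_PMD_SHIFT	(TOY_PAGE_SHIFT + TOY_SHIFT)
#define TOY_PGD_SHIFT	(TOY_PMD_SHIFT + TOY_SHIFT)
#define TOY_END		(TOY_PTRS << TOY_PGD_SHIFT)

typedef unsigned long toy_pte;			/* 0 means "not in use" */
struct toy_pmd { toy_pte *pte[TOY_PTRS]; };	/* NULL: no pte table */
struct toy_mm { struct toy_pmd *pgd[TOY_PTRS]; };	/* NULL: no pmd */

struct toy_walk_state {
	struct toy_mm *mm;
	toy_pte *pte;
	unsigned long vaddr;
};

/* Position the walk on the first in-use pte at or after vaddr. */
static void toy_walk_seek(struct toy_walk_state *w, unsigned long vaddr)
{
	while (vaddr < TOY_END) {
		struct toy_pmd *pmd = w->mm->pgd[vaddr >> TOY_PGD_SHIFT];
		toy_pte *ptes, *pte;

		if (!pmd) {	/* skip the whole range of an empty pgd slot */
			vaddr = ((vaddr >> TOY_PGD_SHIFT) + 1) << TOY_PGD_SHIFT;
			continue;
		}
		ptes = pmd->pte[(vaddr >> TOY_PMD_SHIFT) & (TOY_PTRS - 1)];
		if (!ptes) {	/* skip the whole range of an empty pmd slot */
			vaddr = ((vaddr >> TOY_PMD_SHIFT) + 1) << TOY_PMD_SHIFT;
			continue;
		}
		pte = &ptes[(vaddr >> TOY_PAGE_SHIFT) & (TOY_PTRS - 1)];
		if (*pte) {
			w->pte = pte;
			w->vaddr = vaddr;
			return;
		}
		vaddr += TOY_PAGE_SIZE;
	}
	w->pte = NULL;
	w->vaddr = ULONG_MAX;	/* nothing left: terminates the loop */
}

static void toy_walk_descend(struct toy_mm *mm, struct toy_walk_state *w,
			     unsigned long start)
{
	w->mm = mm;
	toy_walk_seek(w, start & ~(TOY_PAGE_SIZE - 1));
}

static void toy_next_inuse_pte(struct toy_walk_state *w)
{
	toy_walk_seek(w, w->vaddr + TOY_PAGE_SIZE);
}

#define toy_for_each_inuse_pte(mm, walk, start, end)			\
	for (toy_walk_descend(mm, walk, start); (walk)->vaddr < (end);	\
	     toy_next_inuse_pte(walk))
```

A caller only sees (walk)->pte and (walk)->vaddr, so the number of levels,
or a bitmask or radix-tree lookup behind the seek step, can change without
touching the loop bodies.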
-- wli