All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matt Mackall <mpm@selenic.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>,
	virtualization@lists.osdl.org, Matt Mackall <mpm@waste.org>,
	Ingo Molnar <mingo@elte.hu>, Ian Pratt <ian.pratt@xensource.com>,
	lkml <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <clameter@sgi.com>
Subject: Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
Date: Wed, 4 Apr 2007 23:41:33 -0500	[thread overview]
Message-ID: <20070405044133.GE4892@waste.org> (raw)
In-Reply-To: <20070404191206.675793431@goop.org>

On Wed, Apr 04, 2007 at 12:12:11PM -0700, Jeremy Fitzhardinge wrote:
> Add a new mm function apply_to_page_range() which applies a given
> function to every pte in a given virtual address range in a given mm
> structure. This is a generic alternative to cut-and-pasting the Linux
> idiomatic pagetable walking code in every place that a sequence of
> PTEs must be accessed.

As we discussed before, this obviously has a lot in common with my
walk_page_range code.

The major difference and one your above description seems to be
missing the important detail of why it's doing this:

> +	pte_alloc_kernel(pmd, addr) :
> +	pmd = pmd_alloc(mm, pud, addr);
> +	pud = pud_alloc(mm, pgd, addr);

..which is mentioned here:

> +/*
> + * Scan a region of virtual memory, filling in page tables as necessary
> + * and calling a provided function on each leaf page table.
> + */

But I'm not sure what the use case is that wants filling in the page
table..? If both modes really make sense, perhaps a flag could unify
these differences.

> +typedef int (*pte_fn_t)(pte_t *pte, struct page *pmd_page, unsigned long addr,
> +			void *data);

I'd gotten the impression that these sorts of typedefs were out of
fashion.

> +static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
> +				     unsigned long addr, unsigned long end,
> +				     pte_fn_t fn, void *data)
> +{
> +	pte_t *pte;
> +	int err;
> +	struct page *pmd_page;
> +	spinlock_t *ptl;
> +
> +	pte = (mm == &init_mm) ?
> +		pte_alloc_kernel(pmd, addr) :
> +		pte_alloc_map_lock(mm, pmd, addr, &ptl);
> +	if (!pte)
> +		return -ENOMEM;

Seems a bit awkward to pass mm all the way down the tree just for this
quirk. Which is a bit awkward as it means that whether or not a lock
is held in the callback is context dependent.

smaps, clear_ref, and my pagemap code all use the callback at the
pmd_range level, which a) localizes the pte-level locking concerns
with the user b) amortizes the indirection overhead and c)
(unfortunately) makes the user a bit more complex.

We should try to measure whether (b) actually makes a difference.

> +	do {
> +		err = fn(pte, pmd_page, addr, data);
> +		if (err)
> +			break;
> +	} while (pte++, addr += PAGE_SIZE, addr != end);

I was about to say this do/while format seems a bit non-idiomatic for
page table walkers, but then I looked at the code in mm/memory.c and
realized the stuff I've been hacking on is the odd one out.

-- 
Mathematics is the supreme nostalgia of our time.

WARNING: multiple messages have this Message-ID (diff)
From: Matt Mackall <mpm@selenic.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	virtualization@lists.osdl.org,
	lkml <linux-kernel@vger.kernel.org>,
	Ian Pratt <ian.pratt@xensource.com>,
	Christian Limpach <Christian.Limpach@cl.cam.ac.uk>,
	Chris Wright <chrisw@sous-sol.org>,
	Christoph Lameter <clameter@sgi.com>,
	Matt Mackall <mpm@waste.org>, Ingo Molnar <mingo@elte.hu>
Subject: Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
Date: Wed, 4 Apr 2007 23:41:33 -0500	[thread overview]
Message-ID: <20070405044133.GE4892@waste.org> (raw)
In-Reply-To: <20070404191206.675793431@goop.org>

On Wed, Apr 04, 2007 at 12:12:11PM -0700, Jeremy Fitzhardinge wrote:
> Add a new mm function apply_to_page_range() which applies a given
> function to every pte in a given virtual address range in a given mm
> structure. This is a generic alternative to cut-and-pasting the Linux
> idiomatic pagetable walking code in every place that a sequence of
> PTEs must be accessed.

As we discussed before, this obviously has a lot in common with my
walk_page_range code.

The major difference and one your above description seems to be
missing the important detail of why it's doing this:

> +	pte_alloc_kernel(pmd, addr) :
> +	pmd = pmd_alloc(mm, pud, addr);
> +	pud = pud_alloc(mm, pgd, addr);

..which is mentioned here:

> +/*
> + * Scan a region of virtual memory, filling in page tables as necessary
> + * and calling a provided function on each leaf page table.
> + */

But I'm not sure what the use case is that wants filling in the page
table..? If both modes really make sense, perhaps a flag could unify
these differences.

> +typedef int (*pte_fn_t)(pte_t *pte, struct page *pmd_page, unsigned long addr,
> +			void *data);

I'd gotten the impression that these sorts of typedefs were out of
fashion.

> +static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
> +				     unsigned long addr, unsigned long end,
> +				     pte_fn_t fn, void *data)
> +{
> +	pte_t *pte;
> +	int err;
> +	struct page *pmd_page;
> +	spinlock_t *ptl;
> +
> +	pte = (mm == &init_mm) ?
> +		pte_alloc_kernel(pmd, addr) :
> +		pte_alloc_map_lock(mm, pmd, addr, &ptl);
> +	if (!pte)
> +		return -ENOMEM;

Seems a bit awkward to pass mm all the way down the tree just for this
quirk. Which is a bit awkward as it means that whether or not a lock
is held in the callback is context dependent.

smaps, clear_ref, and my pagemap code all use the callback at the
pmd_range level, which a) localizes the pte-level locking concerns
with the user b) amortizes the indirection overhead and c)
(unfortunately) makes the user a bit more complex.

We should try to measure whether (b) actually makes a difference.

> +	do {
> +		err = fn(pte, pmd_page, addr, data);
> +		if (err)
> +			break;
> +	} while (pte++, addr += PAGE_SIZE, addr != end);

I was about to say this do/while format seems a bit non-idiomatic for
page table walkers, but then I looked at the code in mm/memory.c and
realized the stuff I've been hacking on is the odd one out.

-- 
Mathematics is the supreme nostalgia of our time.

  reply	other threads:[~2007-04-05  4:41 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-04-04 19:11 [patch 00/20] paravirt_ops updates Jeremy Fitzhardinge
2007-04-04 19:11 ` Jeremy Fitzhardinge
2007-04-04 19:11 ` [patch 01/20] update MAINTAINERS Jeremy Fitzhardinge
2007-04-04 19:11   ` Jeremy Fitzhardinge
2007-04-04 19:11 ` [patch 02/20] Remove CONFIG_DEBUG_PARAVIRT Jeremy Fitzhardinge
2007-04-04 19:11   ` Jeremy Fitzhardinge
2007-04-04 19:11 ` [patch 03/20] use paravirt_nop to consistently mark no-op operations Jeremy Fitzhardinge
2007-04-04 19:11   ` Jeremy Fitzhardinge
2007-04-04 19:11 ` [patch 04/20] Add pagetable accessors to pack and unpack pagetable entries Jeremy Fitzhardinge
2007-04-04 19:11   ` Jeremy Fitzhardinge
2007-04-04 19:11 ` [patch 05/20] Hooks to set up initial pagetable Jeremy Fitzhardinge
2007-04-04 19:11   ` Jeremy Fitzhardinge
2007-04-04 19:11 ` [patch 06/20] Allocate a fixmap slot Jeremy Fitzhardinge
2007-04-04 19:11   ` Jeremy Fitzhardinge
2007-04-04 19:11 ` [patch 07/20] Allow paravirt backend to choose kernel PMD sharing Jeremy Fitzhardinge
2007-04-04 19:11   ` Jeremy Fitzhardinge
2007-04-05  0:30   ` Christoph Lameter
2007-04-05  0:43     ` Jeremy Fitzhardinge
2007-04-05  1:29     ` Chris Wright
2007-04-06 23:41   ` Andrew Morton
2007-04-06 23:41     ` Andrew Morton
2007-04-07  0:02     ` Jeremy Fitzhardinge
2007-04-07  0:02       ` Jeremy Fitzhardinge
2007-04-07  0:28       ` Andrew Morton
2007-04-07  0:40         ` Jeremy Fitzhardinge
2007-04-07  0:40           ` Jeremy Fitzhardinge
2007-04-07  1:21           ` Andrew Morton
2007-04-07  5:47             ` Jeremy Fitzhardinge
2007-04-09  2:36         ` William Lee Irwin III
2007-04-04 19:11 ` [patch 08/20] add hooks to intercept mm creation and destruction Jeremy Fitzhardinge
2007-04-04 19:11   ` Jeremy Fitzhardinge
2007-04-04 19:12 ` [patch 09/20] rename struct paravirt_patch to paravirt_patch_site for clarity Jeremy Fitzhardinge
2007-04-04 19:12   ` Jeremy Fitzhardinge
2007-04-06 23:18   ` Andrew Morton
2007-04-06 23:18     ` Andrew Morton
2007-04-06 23:24     ` Jeremy Fitzhardinge
2007-04-04 19:12 ` [patch 10/20] Use patch site IDs computed from offset in paravirt_ops structure Jeremy Fitzhardinge
2007-04-04 19:12   ` Jeremy Fitzhardinge
2007-04-04 19:12 ` [patch 11/20] Fix patch site clobbers to include return register Jeremy Fitzhardinge
2007-04-04 19:12   ` Jeremy Fitzhardinge
2007-04-04 19:12 ` [patch 12/20] Consistently wrap paravirt ops callsites to make them patchable Jeremy Fitzhardinge
2007-04-04 19:12   ` Jeremy Fitzhardinge
2007-04-04 19:12 ` [patch 13/20] Document asm-i386/paravirt.h Jeremy Fitzhardinge
2007-04-04 19:12   ` Jeremy Fitzhardinge
2007-04-04 19:12 ` [patch 14/20] add common patching machinery Jeremy Fitzhardinge
2007-04-04 19:12   ` Jeremy Fitzhardinge
2007-04-04 19:12 ` [patch 15/20] add flush_tlb_others paravirt_op Jeremy Fitzhardinge
2007-04-04 19:12   ` Jeremy Fitzhardinge
2007-04-04 19:12 ` [patch 16/20] revert map_pt_hook Jeremy Fitzhardinge
2007-04-04 19:12   ` Jeremy Fitzhardinge
2007-04-04 19:12 ` [patch 17/20] add kmap_atomic_pte for mapping highpte pages Jeremy Fitzhardinge
2007-04-04 19:12   ` Jeremy Fitzhardinge
2007-04-04 19:12 ` [patch 18/20] clean up tsc-based sched_clock Jeremy Fitzhardinge
2007-04-04 19:12   ` Jeremy Fitzhardinge
2007-04-06 23:22   ` Andrew Morton
2007-04-06 23:27     ` Jeremy Fitzhardinge
2007-04-06 23:45       ` Andrew Morton
2007-04-06 23:40     ` Jeremy Fitzhardinge
2007-04-04 19:12 ` [patch 19/20] Add a sched_clock paravirt_op Jeremy Fitzhardinge
2007-04-04 19:12   ` Jeremy Fitzhardinge
2007-04-04 19:12 ` [patch 20/20] Add apply_to_page_range() which applies a function to a pte range Jeremy Fitzhardinge
2007-04-04 19:12   ` Jeremy Fitzhardinge
2007-04-05  4:41   ` Matt Mackall [this message]
2007-04-05  4:41     ` Matt Mackall
2007-04-05  6:52     ` Jeremy Fitzhardinge
2007-04-05  6:52       ` Jeremy Fitzhardinge
2007-04-17 20:56       ` Matt Mackall
2007-04-17 20:56         ` Matt Mackall
2007-04-19 19:44         ` Jeremy Fitzhardinge
2007-04-19 19:59           ` Matt Mackall
2007-04-19 19:59             ` Matt Mackall
2007-04-19 21:37             ` Jeremy Fitzhardinge
2007-04-19 21:37               ` Jeremy Fitzhardinge
2007-04-19 21:30               ` Matt Mackall
2007-04-19 21:30                 ` Matt Mackall
2007-04-19 22:30                 ` Jeremy Fitzhardinge

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070405044133.GE4892@waste.org \
    --to=mpm@selenic.com \
    --cc=akpm@linux-foundation.org \
    --cc=chrisw@sous-sol.org \
    --cc=clameter@sgi.com \
    --cc=ian.pratt@xensource.com \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mpm@waste.org \
    --cc=virtualization@lists.osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.