public inbox for linux-ia64@vger.kernel.org
From: Robin Holt <holt@sgi.com>
To: linux-ia64@vger.kernel.org
Subject: Re: [very very drafty] prezeroing to increase the page fault rate
Date: Wed, 15 Dec 2004 21:21:29 +0000
Message-ID: <20041215212129.GA29941@lnx-holt.americas.sgi.com>
In-Reply-To: <Pine.LNX.4.58.0412141832350.6975@schroedinger.engr.sgi.com>

On Tue, Dec 14, 2004 at 06:34:13PM -0800, Christoph Lameter wrote:
> The page fault patches address the scalability of the fault handler
> by aggregating requests (anticipatory prefaulting) or by reducing the locking
> overhead (page fault scalability patches). However, the main time spend in
> the page fault handler is by zeroing pages. The following patch
> zeroes pages in the background through hardware (Altix Block Transfer Engine)
> or via software when the system is idle. This increases the performance
> of the page fault handler dramatically even for only a single thread:
> 
> 2.6.10-rc3-bk7 (allocating 1 GB):
> 
>  Gb Rep Threads   User      System     Wall flt/cpu/s fault/wsec
>   1   1    8    0.029s      1.373s   0.039s 46733.217 167449.984
>   1   1    4    0.016s      1.152s   0.043s 56064.229 152067.012
>   1   1    2    0.011s      1.074s   0.056s 60349.726 115679.719
>   1   1    1    0.012s      0.708s   0.072s 90933.436  90849.200
> 
> with patch:
> 
>  Gb Rep Threads   User      System     Wall flt/cpu/s fault/wsec
>   1   1    8    0.012s      0.759s   0.023s 84840.529 279197.309
>   1   1    4    0.014s      0.307s   0.018s 203360.588 354015.152
>   1   1    2    0.021s      0.373s   0.023s166111.155 283594.162
>   1   1    1    0.012s      0.200s   0.021s307839.729 306791.723
> 
> I have some spot results here that indicate that a single thread may
> do up to 500000 faults a second with this patch alone.

This sounds impressive, but from my limited understanding of the patches,
I think it is a misleading figure.  Reaching it would require the system
to sit idle for a period of time between large jobs, so that enough free
pages have been zeroed for all allocations to be satisfied from the
pre-zeroed section.
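
A rough way to see the concern is a blended-rate model.  This is only a
sketch under stated assumptions, not a measurement: it assumes each fault
either finds a pre-zeroed page or zeroes one inline, and that the two
per-fault costs can be taken from the single-thread fault/wsec columns
quoted above (306791.723 with the patch, 90849.200 without).  The function
name and parameters are hypothetical.

```c
#include <assert.h>

/*
 * Blended fault rate, in faults per wall-clock second.
 *   hit            - fraction of faults served from the pre-zeroed pool (0..1)
 *   rate_prezeroed - fault rate when every fault finds a pre-zeroed page
 *   rate_inline    - fault rate when every fault has to zero its page inline
 * Assumption: the average time per fault is the weighted mean of the two
 * per-fault times, so the rates combine harmonically.
 */
static double blended_fault_rate(double hit, double rate_prezeroed,
				 double rate_inline)
{
	assert(hit >= 0.0 && hit <= 1.0);
	assert(rate_prezeroed > 0.0 && rate_inline > 0.0);

	double time_per_fault = hit / rate_prezeroed +
				(1.0 - hit) / rate_inline;
	return 1.0 / time_per_fault;
}
```

With the quoted single-thread numbers and only half of the faults hitting
the pre-zeroed pool, blended_fault_rate(0.5, 306791.723, 90849.200) comes
out at roughly 140000 faults per wall second under this model, far closer
to the unpatched rate than to the best case.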

My understanding (very limited, as I only spent 15 minutes looking)
is that only idle cpus are actually queueing pages for zeroing.
Is this correct or am I off the mark?

If that is so, I think we need to rethink this somewhat.  I believe the
largest benefit would come if you used timers to check for a previous
page zero operation completing and then queue up the next.  This
could be done for all nodes that are owned by a parent node.  That would
allow a system with many mbricks (and therefore many btes) but very
few cbricks to efficiently use all the btes for zeroing.  Is that
the intent at any point in this patch's life?  Otherwise you end up
with speedups only when you have idle cpus for zeroing.
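
The timer idea can be sketched as a small user-space simulation.  This is
not kernel code and not taken from the patch: struct sim_bte,
zero_timer_tick() and NR_SIM_BTES are hypothetical names, and a "tick"
stands in for one timer expiry on the cpu that owns the nodes.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SIM_BTES 4	/* hypothetical: btes on nodes owned by one cpu */

struct sim_bte {
	int busy_ticks;		/* ticks left until the current zero op is done */
	int blocks_zeroed;	/* zero operations completed on this engine */
};

/*
 * One timer expiry: for every engine, notice a completed zero operation
 * and, if work is pending, queue the next one.  No idle cpu time is needed
 * beyond the short poll itself.  Returns the number of operations started.
 */
static int zero_timer_tick(struct sim_bte *btes, size_t n, int *pending,
			   int ticks_per_op)
{
	int started = 0;

	assert(btes && pending && ticks_per_op > 0);

	for (size_t i = 0; i < n; i++) {
		struct sim_bte *b = &btes[i];

		if (b->busy_ticks > 0) {
			if (--b->busy_ticks > 0)
				continue;	/* still running, poll again later */
			b->blocks_zeroed++;	/* previous operation completed */
		}
		if (*pending > 0) {		/* engine free: queue the next block */
			(*pending)--;
			b->busy_ticks = ticks_per_op;
			started++;
		}
	}
	return started;
}
```

In this model every engine stays busy as long as blocks are pending,
regardless of whether any cpu is idle, which is the property the timer
approach is after.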

> Index: linux-2.6.9/arch/ia64/sn/kernel/bte.c
> =================================
> --- linux-2.6.9.orig/arch/ia64/sn/kernel/bte.c	2004-12-13 21:36:19.000000000 -0800
> +++ linux-2.6.9/arch/ia64/sn/kernel/bte.c	2004-12-14 18:19:07.000000000 -0800
> @@ -448,6 +458,94 @@
>  		mynodepda->bte_if[i].bte_num = i;
>  		mynodepda->bte_if[i].cleanup_active = 0;
>  		mynodepda->bte_if[i].bh_error = 0;
> +		mynodepda->bte_if[i].zp = NULL;
> +	}
> +}
> +
> +static inline void check_bzero_complete(void)
> +{
> +	unsigned long irq_flags;
> +	struct bteinfo_s *bte;
> +
> +	/* CPU 0 (per node) uses bte0 , CPU 1 uses bte1 */
> +	bte = bte_if_on_node(get_nasid(), cpuid_to_subnode(smp_processor_id()));
> +
> +	if (!bte->zp)
> +		return;
> +	local_irq_save(irq_flags);
> +	if (!spin_trylock(&bte->spinlock)) {
> +		local_irq_restore(irq_flags);
> +		return;
> +	}
> +	if (*bte->most_rcnt_na == BTE_WORD_BUSY ||
> +            (BTE_LNSTAT_LOAD(bte) & BTE_ACTIVE)) {
> +                spin_unlock_irqrestore(&bte->spinlock, irq_flags);
> +		return;
> +	}
> +	bte_bzero_complete(bte);
> +	spin_unlock_irqrestore(&bte->spinlock, irq_flags);
> +}
> +

Why not have a separate notification line for zeroing operations?
Add a separate bte flag that says "use the zeroing notification
line" and have bte_copy return the address of the line being used.

Your start function then calls bte_copy with that flag and you get
back the notification line you are concerned with.  Alternatively,
you could put the notification line into a structure owned
by the bte_start_bzero() private structures and pass the address
in.  This allows the bte_copy code to operate as is.  It will
also simplify bte_start_bzero significantly and make it
very easy to keep things consistent.  It also makes understanding
the bte_copy code easier since there is no back-door interaction
with any other functions.
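
A minimal user-space sketch of the first variant, assuming a flag on the
copy call selects a dedicated zeroing notification word and the call hands
back its address.  Everything here is a hypothetical stand-in
(sim_bte_copy(), SIM_ZERO_NOTIFY, the SIM_WORD_* values); it is not the
real bte_copy() interface, and engine locking and hardware serialisation
are deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_WORD_AVAILABLE	0ULL	/* stand-ins for the BTE_WORD_* states */
#define SIM_WORD_BUSY		1ULL
#define SIM_ZERO_NOTIFY		0x1	/* "use the zeroing notification line" */

struct sim_bte {
	volatile uint64_t notify;	/* line used by ordinary copies */
	volatile uint64_t zero_notify;	/* separate line reserved for zeroing */
};

/*
 * Stand-in for the copy entry point: marks the selected notification word
 * busy and reports its address through *line.  Returns 0 on success, -1 if
 * a transfer using that line is still in flight.
 */
static int sim_bte_copy(struct sim_bte *bte, int flags,
			volatile uint64_t **line)
{
	volatile uint64_t *word;

	assert(bte && line);

	word = (flags & SIM_ZERO_NOTIFY) ? &bte->zero_notify : &bte->notify;
	if (*word != SIM_WORD_AVAILABLE)
		return -1;
	*word = SIM_WORD_BUSY;
	*line = word;
	return 0;
}

/* The zeroing side polls only the line it was handed back. */
static int sim_zero_done(const volatile uint64_t *line)
{
	return *line == SIM_WORD_AVAILABLE;
}
```

The caller keeps the returned pointer and checks it later, so the zeroing
code never has to look at state that the ordinary copy path also uses.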


> +static int bte_start_bzero(struct page *p, int order)
> +{
> +	struct bteinfo_s *bte;
> +	unsigned int len = PAGE_SIZE << order;
> +	unsigned long irq_flags;
> +
> +
> +	/* Check limitations.
> +		1. System must be running (weird things happen during bootup)
> +		2. Size >128KB. Smaller requests cause too much bte traffic
> +	 */
> +	if (len > BTE_MAX_XFER ||
> +	    order < 4 ||
> +	    system_state != SYSTEM_RUNNING) {
> +		check_bzero_complete();
> +		return EINVAL;
> +	}
> +
> +	/* CPU 0 (per node) uses bte0 , CPU 1 uses bte1 */
> +	bte = bte_if_on_node(get_nasid(), cpuid_to_subnode(smp_processor_id()));
> +	local_irq_save(irq_flags);
> +
> +	if (!spin_trylock(&bte->spinlock)) {
> +		local_irq_restore(irq_flags);
> +		printk(KERN_INFO "bzero: bte spinlock locked\n");
> +		return EBUSY;
>  	}
> 
> +	/* Complete any pending bzero notification */
> +	bte_bzero_complete(bte);
> +
> +	if (bte->zp ||
> +	    !(*bte->most_rcnt_na & BTE_WORD_AVAILABLE) ||
> +	    (BTE_LNSTAT_LOAD(bte) & BTE_ACTIVE)) {
> +		/* Got the lock but BTE still busy */
> +		spin_unlock_irqrestore(&bte->spinlock, irq_flags);
> +		return EBUSY;
> +	}
> +	printk(KERN_INFO "bzero: start address=%p length=%d\n", page_address(p), len);
> +	bte->most_rcnt_na = &bte->notify;
> +	*bte->most_rcnt_na = BTE_WORD_BUSY;
> +	bte->zp = p;
> +	SetPageLocked(p);
> +	SetPageZero(p);
> +	BTE_LNSTAT_STORE(bte, IBLS_BUSY | ((len >> L1_CACHE_SHIFT) & BTE_LEN_MASK));
> +	BTE_SRC_STORE(bte, TO_PHYS(ia64_tpa(page_address(p))));
> +	BTE_DEST_STORE(bte, 0);
> +	BTE_NOTIF_STORE(bte,
> +			TO_PHYS(ia64_tpa((unsigned long)bte->most_rcnt_na)));
> +	BTE_CTRL_STORE(bte, BTE_ZERO_FILL);
> +
> +	spin_unlock_irqrestore(&bte->spinlock, irq_flags);
> +	return 0;
> +
> +}
> +
> +static struct zero_driver bte_bzero = {
> +	.start_bzero = bte_start_bzero
> +};
> +
> +void sn_bte_bzero_init(void) {
> +	register_zero_driver(&bte_bzero);
>  }
