All of lore.kernel.org
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: dario.faggioli@citrix.com, julien.grall@linaro.org, tim@xen.org,
	konrad@kernel.org, JBeulich@suse.com,
	xen-devel@lists.xenproject.org, malcolm.crossley@citrix.com
Subject: Re: [PATCH] Xen: Spread boot time page scrubbing across all available CPU's (v6)
Date: Fri, 13 Jun 2014 14:52:58 -0400	[thread overview]
Message-ID: <20140613185258.GA25212@laptop.dumpdata.com> (raw)
In-Reply-To: <539B43CE.5020306@citrix.com>

On Fri, Jun 13, 2014 at 07:32:46PM +0100, Andrew Cooper wrote:
> On 13/06/14 19:05, Konrad Rzeszutek Wilk wrote:
> > From: Malcolm Crossley <malcolm.crossley@citrix.com>
> >
> > The page scrubbing is done in 128MB chunks in lockstep across all the
> > non-SMT CPU's. This allows for the boot CPU to hold the heap_lock whilst each
> > chunk is being scrubbed and then release the heap_lock when the CPU's are
> > finished scrubing their individual chunk. This allows for the heap_lock to
> > not be held continously and for pending softirqs are to be serviced
> > periodically across the CPU's.
> >
> > The page scrub memory chunks are allocated to the CPU's in a NUMA aware
> > fashion to reduce socket interconnect overhead and improve performance.
> > Specifically in the first phase we scrub at the same time on all the
> > NUMA nodes that have CPUs - we also weed out the SMT threads so that
> > we only use cores (that gives a 50% boost). The second phase is for NUMA
> > nodes that have no CPUs - for that we use the closest NUMA node's CPUs
> > (non-SMT again) to do the job.
> >
> > This patch reduces the boot page scrub time on a 128GB 64 core AMD Opteron
> > 6386 machine from 49 seconds to 3 seconds.
> > On a IvyBridge-EX 8 socket box with 1.5TB it cuts it down from 15 minutes
> > to 63 seconds.
> >
> > Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Reviewed-by: Tim Deegan <tim@xen.org>
> 
> Functionally, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> A few minor nits...
> 
> >
> > ---
> >
> > v2
> >  - Reduced default chunk size to 128MB
> >  - Added code to scrub NUMA nodes with no active CPU linked to them
> >  - Be robust to boot CPU not being linked to a NUMA node
> >
> > v3:
> >  - Don't use SMT threads
> >  - Take care of remainder if the number of CPUs (or memory) is odd
> >  - Restructure the worker thread
> >  - s/u64/unsigned long/
> >
> > v4:
> >  - Don't use all CPUs on non-CPU NUMA nodes, just closest one
> >  - Syntax, and docs updates
> >  - Compile on ARM
> >  - Fix bug when NUMA node has 0 pages
> >
> > v5:
> >  - Properly figure out best NUMA node.
> >  - Fix comments to be proper style.
> >
> > v6:
> >  - Missing page shift on default values, optimize cpumask usage.
> >  - Fix case of NODE having one page and below first_valid_mfn
> >  - Add Ack
> > ---
> >  docs/misc/xen-command-line.markdown |   10 ++
> >  xen/common/page_alloc.c             |  211 ++++++++++++++++++++++++++++++++---
> >  xen/include/asm-arm/numa.h          |    1 +
> >  3 files changed, 204 insertions(+), 18 deletions(-)
> >
> > diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> > index 25829fe..509f462 100644
> > --- a/docs/misc/xen-command-line.markdown
> > +++ b/docs/misc/xen-command-line.markdown
> > @@ -198,6 +198,16 @@ Scrub free RAM during boot.  This is a safety feature to prevent
> >  accidentally leaking sensitive VM data into other VMs if Xen crashes
> >  and reboots.
> >  
> > +### bootscrub_chunk
> 
> _ needs escaping with a backslash
> 
> > +> `= <size>`
> > +
> > +> Default: `134217728`
> 
> Due to the implicit 'k' suffix on sizes, this is not a useful hint as to
> how to change the default.
> 
> Furthermore, `128M` is substantially clearer.

<smiles>
> 
> > +
> > +Maximum RAM block size chunks to be scrubbed whilst holding the page heap lock
> > +and not running softirqs. Reduce this if softirqs are not being run frequently
> > +enough. Setting this to a high value may cause boot failure, particularly if
> > +the NMI watchdog is also enabled.
> > +
> >  ### cachesize
> >  > `= <size>`
> >  
> > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> > ...
> > - * Scrub all unallocated pages in all heap zones. This function is more
> > - * convoluted than appears necessary because we do not want to continuously
> > - * hold the lock while scrubbing very large memory areas.
> > + * Scrub all unallocated pages in all heap zones. This function uses all
> > + * online cpu's to scrub the memory in parallel.
> >   */
> >  void __init scrub_heap_pages(void)
> >  {
> > -    unsigned long mfn;
> > -    struct page_info *pg;
> > +    cpumask_t node_cpus, all_worker_cpus;
> > +    unsigned int i, j;
> > +    unsigned long offset, max_per_cpu_sz = 0;
> > +    unsigned long start, end;
> > +    unsigned long rem = 0;
> > +    int last_distance, best_node;
> > +    int cpus;
> >  
> >      if ( !opt_bootscrub )
> >          return;
> >  
> > -    printk("Scrubbing Free RAM: ");
> > +    cpumask_clear(&all_worker_cpus);
> > +    /* Scrub block size. */
> > +    chunk_size = opt_bootscrub_chunk >> PAGE_SHIFT;
> > +    if ( chunk_size == 0 )
> > +        chunk_size = MB(128) >> PAGE_SHIFT;
> > +
> > +    /* Round #0 - figure out amounts and which CPUs to use. */
> > +    for_each_online_node ( i )
> > +    {
> > +        if ( !node_spanned_pages(i) )
> > +            continue;
> > +        /* Calculate Node memory start and end address. */
> > +        start = max(node_start_pfn(i), first_valid_mfn);
> > +        end = min(node_start_pfn(i) + node_spanned_pages(i), max_page);
> > +        /* Just in case NODE has 1 page and starts below first_valid_mfn. */
> > +        end = max(end, start);
> > +        /* CPUs that are online and on this node (if none, that it is OK). */
> > +        cpus = find_non_smt(i, &node_cpus);
> > +        cpumask_or(&all_worker_cpus, &all_worker_cpus, &node_cpus);
> > +        if ( cpus <= 0 )
> > +        {
> > +            /* No CPUs on this node. Round #2 will take of it. */
> > +            rem = 0;
> > +            region[i].per_cpu_sz = (end - start);
> > +        }
> > +        else
> > +        {
> > +            rem = (end - start) % cpumask_weight(&node_cpus);
> > +            region[i].per_cpu_sz = (end - start) / cpumask_weight(&node_cpus);
> 
> The 'cpus' variable still holds cpumask_weight(&node_cpus), and is
> rather more efficient to use.

Grrr. Right!
> 
> > +            if ( region[i].per_cpu_sz > max_per_cpu_sz )
> > +                max_per_cpu_sz = region[i].per_cpu_sz;
> > +        }
> > +        region[i].start = start;
> > +        region[i].rem = rem;
> > +        cpumask_copy(&region[i].cpus, &node_cpus);
> > +    }
> > +
> > +    printk("Scrubbing Free RAM on %u nodes using %u CPUs\n", num_online_nodes(),
> > +           cpumask_weight(&all_worker_cpus));
> 
> Both of these are signed quantities, rather than unsigned.

Done!
> 
> ~Andrew

      reply	other threads:[~2014-06-13 18:53 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-13 18:05 [PATCH] Xen: Spread boot time page scrubbing across all available CPU's (v6) Konrad Rzeszutek Wilk
2014-06-13 18:32 ` Andrew Cooper
2014-06-13 18:52   ` Konrad Rzeszutek Wilk [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140613185258.GA25212@laptop.dumpdata.com \
    --to=konrad.wilk@oracle.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=dario.faggioli@citrix.com \
    --cc=julien.grall@linaro.org \
    --cc=konrad@kernel.org \
    --cc=malcolm.crossley@citrix.com \
    --cc=tim@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.