* [PATCH] Xen: Spread boot time page scrubbing across all available CPU's (v6)
@ 2014-06-13 18:05 Konrad Rzeszutek Wilk
2014-06-13 18:32 ` Andrew Cooper
0 siblings, 1 reply; 3+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-06-13 18:05 UTC (permalink / raw)
To: xen-devel, JBeulich, andrew.cooper3, tim, dario.faggioli,
julien.grall
Cc: konrad, malcolm.crossley, Konrad Rzeszutek Wilk
From: Malcolm Crossley <malcolm.crossley@citrix.com>
The page scrubbing is done in 128MB chunks in lockstep across all the
non-SMT CPU's. This allows for the boot CPU to hold the heap_lock whilst each
chunk is being scrubbed and then release the heap_lock when the CPU's are
finished scrubing their individual chunk. This allows for the heap_lock to
not be held continously and for pending softirqs are to be serviced
periodically across the CPU's.
The page scrub memory chunks are allocated to the CPU's in a NUMA aware
fashion to reduce socket interconnect overhead and improve performance.
Specifically in the first phase we scrub at the same time on all the
NUMA nodes that have CPUs - we also weed out the SMT threads so that
we only use cores (that gives a 50% boost). The second phase is for NUMA
nodes that have no CPUs - for that we use the closest NUMA node's CPUs
(non-SMT again) to do the job.
This patch reduces the boot page scrub time on a 128GB 64 core AMD Opteron
6386 machine from 49 seconds to 3 seconds.
On a IvyBridge-EX 8 socket box with 1.5TB it cuts it down from 15 minutes
to 63 seconds.
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Tim Deegan <tim@xen.org>
---
v2
- Reduced default chunk size to 128MB
- Added code to scrub NUMA nodes with no active CPU linked to them
- Be robust to boot CPU not being linked to a NUMA node
v3:
- Don't use SMT threads
- Take care of remainder if the number of CPUs (or memory) is odd
- Restructure the worker thread
- s/u64/unsigned long/
v4:
- Don't use all CPUs on non-CPU NUMA nodes, just closest one
- Syntax, and docs updates
- Compile on ARM
- Fix bug when NUMA node has 0 pages
v5:
- Properly figure out best NUMA node.
- Fix comments to be proper style.
v6:
- Missing page shift on default values, optimize cpumask usage.
- Fix case of NODE having one page and below first_valid_mfn
- Add Ack
---
docs/misc/xen-command-line.markdown | 10 ++
xen/common/page_alloc.c | 211 ++++++++++++++++++++++++++++++++---
xen/include/asm-arm/numa.h | 1 +
3 files changed, 204 insertions(+), 18 deletions(-)
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 25829fe..509f462 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -198,6 +198,16 @@ Scrub free RAM during boot. This is a safety feature to prevent
accidentally leaking sensitive VM data into other VMs if Xen crashes
and reboots.
+### bootscrub_chunk
+> `= <size>`
+
+> Default: `134217728`
+
+Maximum RAM block size chunks to be scrubbed whilst holding the page heap lock
+and not running softirqs. Reduce this if softirqs are not being run frequently
+enough. Setting this to a high value may cause boot failure, particularly if
+the NMI watchdog is also enabled.
+
### cachesize
> `= <size>`
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 601319c..23f1367 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -65,6 +65,13 @@ static bool_t opt_bootscrub __initdata = 1;
boolean_param("bootscrub", opt_bootscrub);
/*
+ * bootscrub_chunk -> Amount of bytes to scrub lockstep on non-SMT CPUs
+ * on all NUMA nodes.
+ */
+static unsigned long __initdata opt_bootscrub_chunk = MB(128);
+size_param("bootscrub_chunk", opt_bootscrub_chunk);
+
+/*
* Bit width of the DMA heap -- used to override NUMA-node-first.
* allocation strategy, which can otherwise exhaust low memory.
*/
@@ -90,6 +97,16 @@ static struct bootmem_region {
} *__initdata bootmem_region_list;
static unsigned int __initdata nr_bootmem_regions;
+struct scrub_region {
+ unsigned long offset;
+ unsigned long start;
+ unsigned long per_cpu_sz;
+ unsigned long rem;
+ cpumask_t cpus;
+};
+static struct scrub_region __initdata region[MAX_NUMNODES];
+static unsigned long __initdata chunk_size;
+
static void __init boot_bug(int line)
{
panic("Boot BUG at %s:%d", __FILE__, line);
@@ -1256,42 +1273,200 @@ void __init end_boot_allocator(void)
printk("\n");
}
+static void __init smp_scrub_heap_pages(void *data)
+{
+ unsigned long mfn, start, end;
+ struct page_info *pg;
+ struct scrub_region *r;
+ unsigned int temp_cpu, node, cpu_idx = 0;
+ unsigned int cpu = smp_processor_id();
+
+ if ( data )
+ r = data;
+ else
+ {
+ node = cpu_to_node(cpu);
+ if ( node == NUMA_NO_NODE )
+ return;
+ r = ®ion[node];
+ }
+
+ /* Determine the current CPU's index into CPU's linked to this node. */
+ for_each_cpu ( temp_cpu, &r->cpus )
+ {
+ if ( cpu == temp_cpu )
+ break;
+ cpu_idx++;
+ }
+
+ /* Calculate the starting mfn for this CPU's memory block. */
+ start = r->start + (r->per_cpu_sz * cpu_idx) + r->offset;
+
+ /* Calculate the end mfn into this CPU's memory block for this iteration. */
+ if ( r->offset + chunk_size >= r->per_cpu_sz )
+ {
+ end = r->start + (r->per_cpu_sz * cpu_idx) + r->per_cpu_sz;
+
+ if ( r->rem && ((cpumask_weight(&r->cpus) - 1 == cpu_idx )) )
+ end += r->rem;
+ }
+ else
+ end = start + chunk_size;
+
+ for ( mfn = start; mfn < end; mfn++ )
+ {
+ pg = mfn_to_page(mfn);
+
+ /* Check the mfn is valid and page is free. */
+ if ( !mfn_valid(mfn) || !page_state_is(pg, free) )
+ continue;
+
+ scrub_one_page(pg);
+ }
+}
+
+static int __init find_non_smt(unsigned int node, cpumask_t *dest)
+{
+ cpumask_t node_cpus;
+ unsigned int i, cpu;
+
+ cpumask_and(&node_cpus, &node_to_cpumask(node), &cpu_online_map);
+ cpumask_clear(dest);
+ for_each_cpu ( i, &node_cpus )
+ {
+ if ( cpumask_intersects(dest, per_cpu(cpu_sibling_mask, i)) )
+ continue;
+ cpu = cpumask_first(per_cpu(cpu_sibling_mask, i));
+ cpumask_set_cpu(cpu, dest);
+ }
+ return cpumask_weight(dest);
+}
+
/*
- * Scrub all unallocated pages in all heap zones. This function is more
- * convoluted than appears necessary because we do not want to continuously
- * hold the lock while scrubbing very large memory areas.
+ * Scrub all unallocated pages in all heap zones. This function uses all
+ * online cpu's to scrub the memory in parallel.
*/
void __init scrub_heap_pages(void)
{
- unsigned long mfn;
- struct page_info *pg;
+ cpumask_t node_cpus, all_worker_cpus;
+ unsigned int i, j;
+ unsigned long offset, max_per_cpu_sz = 0;
+ unsigned long start, end;
+ unsigned long rem = 0;
+ int last_distance, best_node;
+ int cpus;
if ( !opt_bootscrub )
return;
- printk("Scrubbing Free RAM: ");
+ cpumask_clear(&all_worker_cpus);
+ /* Scrub block size. */
+ chunk_size = opt_bootscrub_chunk >> PAGE_SHIFT;
+ if ( chunk_size == 0 )
+ chunk_size = MB(128) >> PAGE_SHIFT;
+
+ /* Round #0 - figure out amounts and which CPUs to use. */
+ for_each_online_node ( i )
+ {
+ if ( !node_spanned_pages(i) )
+ continue;
+ /* Calculate Node memory start and end address. */
+ start = max(node_start_pfn(i), first_valid_mfn);
+ end = min(node_start_pfn(i) + node_spanned_pages(i), max_page);
+ /* Just in case NODE has 1 page and starts below first_valid_mfn. */
+ end = max(end, start);
+ /* CPUs that are online and on this node (if none, that it is OK). */
+ cpus = find_non_smt(i, &node_cpus);
+ cpumask_or(&all_worker_cpus, &all_worker_cpus, &node_cpus);
+ if ( cpus <= 0 )
+ {
+ /* No CPUs on this node. Round #2 will take of it. */
+ rem = 0;
+ region[i].per_cpu_sz = (end - start);
+ }
+ else
+ {
+ rem = (end - start) % cpumask_weight(&node_cpus);
+ region[i].per_cpu_sz = (end - start) / cpumask_weight(&node_cpus);
+ if ( region[i].per_cpu_sz > max_per_cpu_sz )
+ max_per_cpu_sz = region[i].per_cpu_sz;
+ }
+ region[i].start = start;
+ region[i].rem = rem;
+ cpumask_copy(®ion[i].cpus, &node_cpus);
+ }
+
+ printk("Scrubbing Free RAM on %u nodes using %u CPUs\n", num_online_nodes(),
+ cpumask_weight(&all_worker_cpus));
- for ( mfn = first_valid_mfn; mfn < max_page; mfn++ )
+ /* Round: #1 - do NUMA nodes with CPUs. */
+ for ( offset = 0; offset < max_per_cpu_sz; offset += chunk_size )
{
+ for_each_online_node ( i )
+ region[i].offset = offset;
+
process_pending_softirqs();
- pg = mfn_to_page(mfn);
+ spin_lock(&heap_lock);
+ on_selected_cpus(&all_worker_cpus, smp_scrub_heap_pages, NULL, 1);
+ spin_unlock(&heap_lock);
- /* Quick lock-free check. */
- if ( !mfn_valid(mfn) || !page_state_is(pg, free) )
+ printk(".");
+ }
+
+ /*
+ * Round #2: NUMA nodes with no CPUs get scrubbed with CPUs on the node
+ * closest to us and with CPUs.
+ */
+ for_each_online_node ( i )
+ {
+ node_cpus = node_to_cpumask(i);
+
+ if ( !cpumask_empty(&node_cpus) )
continue;
- /* Every 100MB, print a progress dot. */
- if ( (mfn % ((100*1024*1024)/PAGE_SIZE)) == 0 )
- printk(".");
+ last_distance = INT_MAX;
+ best_node = first_node(node_online_map);
+ /* Figure out which NODE CPUs are close. */
+ for_each_online_node ( j )
+ {
+ int distance;
- spin_lock(&heap_lock);
+ if ( cpumask_empty(&node_to_cpumask(j)) )
+ continue;
- /* Re-check page status with lock held. */
- if ( page_state_is(pg, free) )
- scrub_one_page(pg);
+ distance = __node_distance(i, j);
+ if ( distance < last_distance )
+ {
+ last_distance = distance;
+ best_node = j;
+ }
+ }
+ /*
+ * Use CPUs from best node, and if there are no CPUs on the
+ * first node (the default) use the BSP.
+ */
+ if ( find_non_smt(best_node, &node_cpus) == 0 )
+ cpumask_set_cpu(smp_processor_id(), &node_cpus);
- spin_unlock(&heap_lock);
+ /* We already have the node information from round #0. */
+ region[i].rem = region[i].per_cpu_sz % cpumask_weight(&node_cpus);
+ region[i].per_cpu_sz /= cpumask_weight(&node_cpus);
+ max_per_cpu_sz = region[i].per_cpu_sz;
+ cpumask_copy(®ion[i].cpus, &node_cpus);
+
+ for ( offset = 0; offset < max_per_cpu_sz; offset += chunk_size )
+ {
+ region[i].offset = offset;
+
+ process_pending_softirqs();
+
+ spin_lock(&heap_lock);
+ on_selected_cpus(&node_cpus, smp_scrub_heap_pages, ®ion[i], 1);
+ spin_unlock(&heap_lock);
+
+ printk(".");
+ }
}
printk("done.\n");
diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
index cb8f2ba..2c019d7 100644
--- a/xen/include/asm-arm/numa.h
+++ b/xen/include/asm-arm/numa.h
@@ -12,6 +12,7 @@ static inline __attribute__((pure)) int phys_to_nid(paddr_t addr)
/* XXX: implement NUMA support */
#define node_spanned_pages(nid) (total_pages)
+#define node_start_pfn(nid) (frametable_base_mfn)
#define __node_distance(a, b) (20)
#endif /* __ARCH_ARM_NUMA_H */
--
1.7.7.6
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH] Xen: Spread boot time page scrubbing across all available CPU's (v6)
2014-06-13 18:05 [PATCH] Xen: Spread boot time page scrubbing across all available CPU's (v6) Konrad Rzeszutek Wilk
@ 2014-06-13 18:32 ` Andrew Cooper
2014-06-13 18:52 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 3+ messages in thread
From: Andrew Cooper @ 2014-06-13 18:32 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: dario.faggioli, julien.grall, tim, konrad, JBeulich, xen-devel,
malcolm.crossley
On 13/06/14 19:05, Konrad Rzeszutek Wilk wrote:
> From: Malcolm Crossley <malcolm.crossley@citrix.com>
>
> The page scrubbing is done in 128MB chunks in lockstep across all the
> non-SMT CPU's. This allows for the boot CPU to hold the heap_lock whilst each
> chunk is being scrubbed and then release the heap_lock when the CPU's are
> finished scrubing their individual chunk. This allows for the heap_lock to
> not be held continously and for pending softirqs are to be serviced
> periodically across the CPU's.
>
> The page scrub memory chunks are allocated to the CPU's in a NUMA aware
> fashion to reduce socket interconnect overhead and improve performance.
> Specifically in the first phase we scrub at the same time on all the
> NUMA nodes that have CPUs - we also weed out the SMT threads so that
> we only use cores (that gives a 50% boost). The second phase is for NUMA
> nodes that have no CPUs - for that we use the closest NUMA node's CPUs
> (non-SMT again) to do the job.
>
> This patch reduces the boot page scrub time on a 128GB 64 core AMD Opteron
> 6386 machine from 49 seconds to 3 seconds.
> On a IvyBridge-EX 8 socket box with 1.5TB it cuts it down from 15 minutes
> to 63 seconds.
>
> Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: Tim Deegan <tim@xen.org>
Functionally, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
A few minor nits...
>
> ---
>
> v2
> - Reduced default chunk size to 128MB
> - Added code to scrub NUMA nodes with no active CPU linked to them
> - Be robust to boot CPU not being linked to a NUMA node
>
> v3:
> - Don't use SMT threads
> - Take care of remainder if the number of CPUs (or memory) is odd
> - Restructure the worker thread
> - s/u64/unsigned long/
>
> v4:
> - Don't use all CPUs on non-CPU NUMA nodes, just closest one
> - Syntax, and docs updates
> - Compile on ARM
> - Fix bug when NUMA node has 0 pages
>
> v5:
> - Properly figure out best NUMA node.
> - Fix comments to be proper style.
>
> v6:
> - Missing page shift on default values, optimize cpumask usage.
> - Fix case of NODE having one page and below first_valid_mfn
> - Add Ack
> ---
> docs/misc/xen-command-line.markdown | 10 ++
> xen/common/page_alloc.c | 211 ++++++++++++++++++++++++++++++++---
> xen/include/asm-arm/numa.h | 1 +
> 3 files changed, 204 insertions(+), 18 deletions(-)
>
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index 25829fe..509f462 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -198,6 +198,16 @@ Scrub free RAM during boot. This is a safety feature to prevent
> accidentally leaking sensitive VM data into other VMs if Xen crashes
> and reboots.
>
> +### bootscrub_chunk
_ needs escaping with a backslash
> +> `= <size>`
> +
> +> Default: `134217728`
Due to the implicit 'k' suffix on sizes, this is not a useful hint as to
how to change the default.
Furthermore, `128M` is substantially clearer.
> +
> +Maximum RAM block size chunks to be scrubbed whilst holding the page heap lock
> +and not running softirqs. Reduce this if softirqs are not being run frequently
> +enough. Setting this to a high value may cause boot failure, particularly if
> +the NMI watchdog is also enabled.
> +
> ### cachesize
> > `= <size>`
>
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> ...
> - * Scrub all unallocated pages in all heap zones. This function is more
> - * convoluted than appears necessary because we do not want to continuously
> - * hold the lock while scrubbing very large memory areas.
> + * Scrub all unallocated pages in all heap zones. This function uses all
> + * online cpu's to scrub the memory in parallel.
> */
> void __init scrub_heap_pages(void)
> {
> - unsigned long mfn;
> - struct page_info *pg;
> + cpumask_t node_cpus, all_worker_cpus;
> + unsigned int i, j;
> + unsigned long offset, max_per_cpu_sz = 0;
> + unsigned long start, end;
> + unsigned long rem = 0;
> + int last_distance, best_node;
> + int cpus;
>
> if ( !opt_bootscrub )
> return;
>
> - printk("Scrubbing Free RAM: ");
> + cpumask_clear(&all_worker_cpus);
> + /* Scrub block size. */
> + chunk_size = opt_bootscrub_chunk >> PAGE_SHIFT;
> + if ( chunk_size == 0 )
> + chunk_size = MB(128) >> PAGE_SHIFT;
> +
> + /* Round #0 - figure out amounts and which CPUs to use. */
> + for_each_online_node ( i )
> + {
> + if ( !node_spanned_pages(i) )
> + continue;
> + /* Calculate Node memory start and end address. */
> + start = max(node_start_pfn(i), first_valid_mfn);
> + end = min(node_start_pfn(i) + node_spanned_pages(i), max_page);
> + /* Just in case NODE has 1 page and starts below first_valid_mfn. */
> + end = max(end, start);
> + /* CPUs that are online and on this node (if none, that it is OK). */
> + cpus = find_non_smt(i, &node_cpus);
> + cpumask_or(&all_worker_cpus, &all_worker_cpus, &node_cpus);
> + if ( cpus <= 0 )
> + {
> + /* No CPUs on this node. Round #2 will take of it. */
> + rem = 0;
> + region[i].per_cpu_sz = (end - start);
> + }
> + else
> + {
> + rem = (end - start) % cpumask_weight(&node_cpus);
> + region[i].per_cpu_sz = (end - start) / cpumask_weight(&node_cpus);
The 'cpus' variable still holds cpumask_weight(&node_cpus), and is
rather more efficient to use.
> + if ( region[i].per_cpu_sz > max_per_cpu_sz )
> + max_per_cpu_sz = region[i].per_cpu_sz;
> + }
> + region[i].start = start;
> + region[i].rem = rem;
> + cpumask_copy(®ion[i].cpus, &node_cpus);
> + }
> +
> + printk("Scrubbing Free RAM on %u nodes using %u CPUs\n", num_online_nodes(),
> + cpumask_weight(&all_worker_cpus));
Both of these are signed quantities, rather than unsigned.
~Andrew
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [PATCH] Xen: Spread boot time page scrubbing across all available CPU's (v6)
2014-06-13 18:32 ` Andrew Cooper
@ 2014-06-13 18:52 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 3+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-06-13 18:52 UTC (permalink / raw)
To: Andrew Cooper
Cc: dario.faggioli, julien.grall, tim, konrad, JBeulich, xen-devel,
malcolm.crossley
On Fri, Jun 13, 2014 at 07:32:46PM +0100, Andrew Cooper wrote:
> On 13/06/14 19:05, Konrad Rzeszutek Wilk wrote:
> > From: Malcolm Crossley <malcolm.crossley@citrix.com>
> >
> > The page scrubbing is done in 128MB chunks in lockstep across all the
> > non-SMT CPU's. This allows for the boot CPU to hold the heap_lock whilst each
> > chunk is being scrubbed and then release the heap_lock when the CPU's are
> > finished scrubing their individual chunk. This allows for the heap_lock to
> > not be held continously and for pending softirqs are to be serviced
> > periodically across the CPU's.
> >
> > The page scrub memory chunks are allocated to the CPU's in a NUMA aware
> > fashion to reduce socket interconnect overhead and improve performance.
> > Specifically in the first phase we scrub at the same time on all the
> > NUMA nodes that have CPUs - we also weed out the SMT threads so that
> > we only use cores (that gives a 50% boost). The second phase is for NUMA
> > nodes that have no CPUs - for that we use the closest NUMA node's CPUs
> > (non-SMT again) to do the job.
> >
> > This patch reduces the boot page scrub time on a 128GB 64 core AMD Opteron
> > 6386 machine from 49 seconds to 3 seconds.
> > On a IvyBridge-EX 8 socket box with 1.5TB it cuts it down from 15 minutes
> > to 63 seconds.
> >
> > Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Reviewed-by: Tim Deegan <tim@xen.org>
>
> Functionally, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>
> A few minor nits...
>
> >
> > ---
> >
> > v2
> > - Reduced default chunk size to 128MB
> > - Added code to scrub NUMA nodes with no active CPU linked to them
> > - Be robust to boot CPU not being linked to a NUMA node
> >
> > v3:
> > - Don't use SMT threads
> > - Take care of remainder if the number of CPUs (or memory) is odd
> > - Restructure the worker thread
> > - s/u64/unsigned long/
> >
> > v4:
> > - Don't use all CPUs on non-CPU NUMA nodes, just closest one
> > - Syntax, and docs updates
> > - Compile on ARM
> > - Fix bug when NUMA node has 0 pages
> >
> > v5:
> > - Properly figure out best NUMA node.
> > - Fix comments to be proper style.
> >
> > v6:
> > - Missing page shift on default values, optimize cpumask usage.
> > - Fix case of NODE having one page and below first_valid_mfn
> > - Add Ack
> > ---
> > docs/misc/xen-command-line.markdown | 10 ++
> > xen/common/page_alloc.c | 211 ++++++++++++++++++++++++++++++++---
> > xen/include/asm-arm/numa.h | 1 +
> > 3 files changed, 204 insertions(+), 18 deletions(-)
> >
> > diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> > index 25829fe..509f462 100644
> > --- a/docs/misc/xen-command-line.markdown
> > +++ b/docs/misc/xen-command-line.markdown
> > @@ -198,6 +198,16 @@ Scrub free RAM during boot. This is a safety feature to prevent
> > accidentally leaking sensitive VM data into other VMs if Xen crashes
> > and reboots.
> >
> > +### bootscrub_chunk
>
> _ needs escaping with a backslash
>
> > +> `= <size>`
> > +
> > +> Default: `134217728`
>
> Due to the implicit 'k' suffix on sizes, this is not a useful hint as to
> how to change the default.
>
> Furthermore, `128M` is substantially clearer.
<smiles>
>
> > +
> > +Maximum RAM block size chunks to be scrubbed whilst holding the page heap lock
> > +and not running softirqs. Reduce this if softirqs are not being run frequently
> > +enough. Setting this to a high value may cause boot failure, particularly if
> > +the NMI watchdog is also enabled.
> > +
> > ### cachesize
> > > `= <size>`
> >
> > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> > ...
> > - * Scrub all unallocated pages in all heap zones. This function is more
> > - * convoluted than appears necessary because we do not want to continuously
> > - * hold the lock while scrubbing very large memory areas.
> > + * Scrub all unallocated pages in all heap zones. This function uses all
> > + * online cpu's to scrub the memory in parallel.
> > */
> > void __init scrub_heap_pages(void)
> > {
> > - unsigned long mfn;
> > - struct page_info *pg;
> > + cpumask_t node_cpus, all_worker_cpus;
> > + unsigned int i, j;
> > + unsigned long offset, max_per_cpu_sz = 0;
> > + unsigned long start, end;
> > + unsigned long rem = 0;
> > + int last_distance, best_node;
> > + int cpus;
> >
> > if ( !opt_bootscrub )
> > return;
> >
> > - printk("Scrubbing Free RAM: ");
> > + cpumask_clear(&all_worker_cpus);
> > + /* Scrub block size. */
> > + chunk_size = opt_bootscrub_chunk >> PAGE_SHIFT;
> > + if ( chunk_size == 0 )
> > + chunk_size = MB(128) >> PAGE_SHIFT;
> > +
> > + /* Round #0 - figure out amounts and which CPUs to use. */
> > + for_each_online_node ( i )
> > + {
> > + if ( !node_spanned_pages(i) )
> > + continue;
> > + /* Calculate Node memory start and end address. */
> > + start = max(node_start_pfn(i), first_valid_mfn);
> > + end = min(node_start_pfn(i) + node_spanned_pages(i), max_page);
> > + /* Just in case NODE has 1 page and starts below first_valid_mfn. */
> > + end = max(end, start);
> > + /* CPUs that are online and on this node (if none, that it is OK). */
> > + cpus = find_non_smt(i, &node_cpus);
> > + cpumask_or(&all_worker_cpus, &all_worker_cpus, &node_cpus);
> > + if ( cpus <= 0 )
> > + {
> > + /* No CPUs on this node. Round #2 will take of it. */
> > + rem = 0;
> > + region[i].per_cpu_sz = (end - start);
> > + }
> > + else
> > + {
> > + rem = (end - start) % cpumask_weight(&node_cpus);
> > + region[i].per_cpu_sz = (end - start) / cpumask_weight(&node_cpus);
>
> The 'cpus' variable still holds cpumask_weight(&node_cpus), and is
> rather more efficient to use.
Grrr. Right!
>
> > + if ( region[i].per_cpu_sz > max_per_cpu_sz )
> > + max_per_cpu_sz = region[i].per_cpu_sz;
> > + }
> > + region[i].start = start;
> > + region[i].rem = rem;
> > + cpumask_copy(®ion[i].cpus, &node_cpus);
> > + }
> > +
> > + printk("Scrubbing Free RAM on %u nodes using %u CPUs\n", num_online_nodes(),
> > + cpumask_weight(&all_worker_cpus));
>
> Both of these are signed quantities, rather than unsigned.
Done!
>
> ~Andrew
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2014-06-13 18:53 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-13 18:05 [PATCH] Xen: Spread boot time page scrubbing across all available CPU's (v6) Konrad Rzeszutek Wilk
2014-06-13 18:32 ` Andrew Cooper
2014-06-13 18:52 ` Konrad Rzeszutek Wilk
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.