From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: andrew.cooper3@citrix.com, dario.faggioli@citrix.com,
julien.grall@linaro.org, tim@xen.org,
xen-devel@lists.xenproject.org,
Malcolm Crossley <malcolm.crossley@citrix.com>
Subject: Re: [PATCH] Xen: Spread boot time page scrubbing across all available CPU's (v5)
Date: Thu, 5 Jun 2014 13:49:28 -0400
Message-ID: <20140605174928.GD2546@localhost.localdomain>
In-Reply-To: <53906F2102000078000182B3@mail.emea.novell.com>
On Thu, Jun 05, 2014 at 12:22:41PM +0100, Jan Beulich wrote:
> >>> On 04.06.14 at 15:29, <konrad.wilk@oracle.com> wrote:
> > --- a/docs/misc/xen-command-line.markdown
> > +++ b/docs/misc/xen-command-line.markdown
> > @@ -198,6 +198,16 @@ Scrub free RAM during boot. This is a safety feature to prevent
> > accidentally leaking sensitive VM data into other VMs if Xen crashes
> > and reboots.
> >
> > +### bootscrub_chunk_
>
> Looking at other examples in that file, underscores appear to need
> backslash escaping here. And I don't think the trailing one should
> be there.
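Ah, yes - will escape them and drop the trailing one:

    ### bootscrub\_chunk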
>
> > +> `= <size>`
> > +
> > +> Default: `128MB`
> > +
> > +Maximum RAM block size chunks to be scrubbed whilst holding the page heap lock
> > +and not running softirqs. Reduce this if softirqs are not being run frequently
> > +enough. Setting this to a high value may cause cause boot failure, particularly
>
> Double "cause".
>
> > +static void __init smp_scrub_heap_pages(void *data)
> > +{
> > + unsigned long mfn, start, end;
> > + struct page_info *pg;
> > + struct scrub_region *r;
> > + unsigned int temp_cpu, node, cpu_idx = 0;
> > + unsigned int cpu = smp_processor_id();
> > +
> > + if ( data )
> > + r = data;
> > + else
> > + {
> > + node = cpu_to_node(cpu);
> > + if ( node == NUMA_NO_NODE )
> > + return;
> > + r = &region[node];
> > + }
> > +
> > + /* Determine the current CPU's index among the CPUs linked to this node. */
> > + for_each_cpu ( temp_cpu, &r->cpus )
> > + {
> > + if ( cpu == temp_cpu )
> > + break;
> > + cpu_idx++;
> > + }
> > +
> > + /* Calculate the starting mfn for this CPU's memory block. */
> > + start = r->start + (r->per_cpu_sz * cpu_idx) + r->offset;
> > +
> > + /* Calculate the end mfn into this CPU's memory block for this iteration. */
> > + if ( r->offset + chunk_size >= r->per_cpu_sz )
> > + {
> > + end = r->start + (r->per_cpu_sz * cpu_idx) + r->per_cpu_sz;
> > +
> > + if ( r->rem && ((cpumask_weight(&r->cpus) - 1 == cpu_idx )) )
> > + end += r->rem;
> > + }
> > + else
> > + end = start + chunk_size;
> > +
> > + for ( mfn = start; mfn < end; mfn++ )
> > + {
> > + pg = mfn_to_page(mfn);
> > +
> > + /* Check the mfn is valid and page is free. */
> > + if ( !mfn_valid(mfn) || !page_state_is(pg, free) )
> > + continue;
> > +
> > + scrub_one_page(pg);
> > + }
> > +}
> > +
> > +static int __init find_non_smt(unsigned int node, cpumask_t *dest)
> > +{
> > + cpumask_t node_cpus;
> > + unsigned int i, cpu;
> > +
> > + cpumask_and(&node_cpus, &node_to_cpumask(node), &cpu_online_map);
> > + cpumask_clear(dest);
> > + for_each_cpu ( i, &node_cpus )
> > + {
> > + if ( cpumask_intersects(dest, per_cpu(cpu_sibling_mask, i)) )
> > + continue;
> > + cpu = cpumask_first(per_cpu(cpu_sibling_mask, i));
> > + cpumask_set_cpu(cpu, dest);
> > + }
> > + return cpumask_weight(dest);
> > +}
> > +
> > /*
> > - * Scrub all unallocated pages in all heap zones. This function is more
> > - * convoluted than appears necessary because we do not want to continuously
> > - * hold the lock while scrubbing very large memory areas.
> > + * Scrub all unallocated pages in all heap zones. This function uses all
> > + * online CPUs to scrub the memory in parallel.
> > */
> > void __init scrub_heap_pages(void)
> > {
> > - unsigned long mfn;
> > - struct page_info *pg;
> > + cpumask_t node_cpus, all_worker_cpus;
> > + unsigned int i, j;
> > + unsigned long offset, max_per_cpu_sz = 0;
> > + unsigned long start, end;
> > + unsigned long rem = 0;
> > + int last_distance, best_node;
> >
> > if ( !opt_bootscrub )
> > return;
> >
> > - printk("Scrubbing Free RAM: ");
> > + cpumask_clear(&all_worker_cpus);
> > + /* Scrub block size. */
> > + chunk_size = opt_bootscrub_chunk >> PAGE_SHIFT;
> > + if ( chunk_size == 0 )
> > + chunk_size = MB(128);
> > +
> > + /* Round #0 - figure out amounts and which CPUs to use. */
> > + for_each_online_node ( i )
> > + {
> > + if ( !node_spanned_pages(i) )
> > + continue;
> > + /* Calculate Node memory start and end address. */
> > + start = max(node_start_pfn(i), first_valid_mfn);
>
> This implies you're aware that possibly node_start_pfn(i) <
> first_valid_mfn.
>
> > + end = min(node_start_pfn(i) + node_spanned_pages(i), max_page);
>
> Which in turn means that this may yield end < start. Is all the rest
> of the code prepared to deal with this? At least the divide and
> modulo operations on the difference further down doesn't look like
> so.
It will loop forever - with end < start the unsigned (end - start)
underflows, so per_cpu_sz (and hence max_per_cpu_sz) becomes huge and
Round #1 never finishes. I think adding

    end = max(end, start);

would fix it.
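That is (untested, just clamping right after the node boundaries are
computed):

    start = max(node_start_pfn(i), first_valid_mfn);
    end = min(node_start_pfn(i) + node_spanned_pages(i), max_page);
    /* A node living entirely below first_valid_mfn yields end < start. */
    end = max(end, start);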
>
> > + /* CPUs that are online and on this node (if none, that is OK). */
> > + find_non_smt(i, &node_cpus);
>
> Here you could latch the weight, avoiding ...
>
> > + cpumask_or(&all_worker_cpus, &all_worker_cpus, &node_cpus);
> > + if ( cpumask_empty(&node_cpus) )
>
> ... the extra operation on the mask here and the explicit use of
> cpumask_weight() on node_cpus in the else path.
<nods>
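Something like this then (untested):

    unsigned int weight = find_non_smt(i, &node_cpus);

    cpumask_or(&all_worker_cpus, &all_worker_cpus, &node_cpus);
    if ( !weight )
    {
        /* No CPUs on this node. Round #2 will take care of it. */
        rem = 0;
        region[i].per_cpu_sz = (end - start);
    }
    else
    {
        rem = (end - start) % weight;
        region[i].per_cpu_sz = (end - start) / weight;
        if ( region[i].per_cpu_sz > max_per_cpu_sz )
            max_per_cpu_sz = region[i].per_cpu_sz;
    }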
>
> > + {
> > + /* No CPUs on this node. Round #2 will take care of it. */
> > + rem = 0;
> > + region[i].per_cpu_sz = (end - start);
> > + } else {
>
> Coding style - this takes 3 lines.
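Will fix, i.e.:

    }
    else
    {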
>
> > + rem = (end - start) % cpumask_weight(&node_cpus);
> > + region[i].per_cpu_sz = (end - start) / cpumask_weight(&node_cpus);
> > + if ( region[i].per_cpu_sz > max_per_cpu_sz )
> > + max_per_cpu_sz = region[i].per_cpu_sz;
> > + }
> > + region[i].start = start;
> > + region[i].rem = rem;
> > + cpumask_copy(&region[i].cpus, &node_cpus);
> > +
> > + }
> > +
> > + printk("Scrubbing Free RAM on %u nodes using %u CPUs\n", num_online_nodes(),
> > + cpumask_weight(&all_worker_cpus));
> >
> > - for ( mfn = first_valid_mfn; mfn < max_page; mfn++ )
> > + /* Round: #1 - do NUMA nodes with CPUs. */
> > + for ( offset = 0; offset < max_per_cpu_sz; offset += chunk_size )
> > {
> > + for_each_online_node ( i )
> > + region[i].offset = offset;
> > +
> > process_pending_softirqs();
> >
> > - pg = mfn_to_page(mfn);
> > + spin_lock(&heap_lock);
> > + on_selected_cpus(&all_worker_cpus, smp_scrub_heap_pages, NULL, 1);
> > + spin_unlock(&heap_lock);
> >
> > - /* Quick lock-free check. */
> > - if ( !mfn_valid(mfn) || !page_state_is(pg, free) )
> > + printk(".");
> > + }
> > +
> > + /*
> > + * Round #2: NUMA nodes with no CPUs get scrubbed by CPUs on the
> > + * closest node that has CPUs.
> > + */
> > + for_each_online_node ( i )
> > + {
> > + node_cpus = node_to_cpumask(i);
> > +
> > + if ( !cpumask_empty(&node_cpus) )
> > continue;
> >
> > - /* Every 100MB, print a progress dot. */
> > - if ( (mfn % ((100*1024*1024)/PAGE_SIZE)) == 0 )
> > - printk(".");
> > + last_distance = INT_MAX;
> > + best_node = first_node(node_online_map);
> > + /* Figure out which NODE CPUs are close. */
> > + for_each_online_node ( j )
> > + {
> > + int distance;
> >
> > - spin_lock(&heap_lock);
> > + if ( i == j )
> > + continue;
>
> This could be replaced with cpumask_empty(&node_to_cpumask(j)),
> allowing ...
>
> > + distance = __node_distance(i, j);
> > + if ( distance < last_distance )
> > + {
> > + if ( cpumask_empty(&node_to_cpumask(j)) )
> > + continue;
>
> this check to be dropped.
Clever! Will do.
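The loop then shrinks to (untested; assuming the usual best-node update
in the body, which wasn't quoted above):

    for_each_online_node ( j )
    {
        int distance;

        /* Covers i == j too - node i has no CPUs at this point. */
        if ( cpumask_empty(&node_to_cpumask(j)) )
            continue;

        distance = __node_distance(i, j);
        if ( distance < last_distance )
        {
            last_distance = distance;
            best_node = j;
        }
    }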
>
> Jan