From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: tim@xen.org, sstabellini@kernel.org, wei.liu2@citrix.com,
George.Dunlap@eu.citrix.com, andrew.cooper3@citrix.com,
ian.jackson@eu.citrix.com, xen-devel@lists.xen.org
Subject: Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
Date: Thu, 4 May 2017 13:09:08 -0400
Message-ID: <08bc0925-a8b5-cd45-76e6-19ce0d3678a3@oracle.com>
In-Reply-To: <590B657D0200007800156E0C@prv-mh.provo.novell.com>
On 05/04/2017 11:31 AM, Jan Beulich wrote:
>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1035,16 +1035,82 @@ merge_and_free_buddy(struct page_info *pg, unsigned int node,
>>      return pg;
>>  }
>>
>> -static void scrub_free_pages(unsigned int node)
>> +static nodemask_t node_scrubbing;
>> +
>> +static unsigned int node_to_scrub(bool get_node)
>> +{
>> +    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
>> +    nodeid_t closest = NUMA_NO_NODE;
>> +    u8 dist, shortest = 0xff;
>> +
>> +    if ( node == NUMA_NO_NODE )
>> +        node = 0;
>> +
>> +    if ( node_need_scrub[node] &&
>> +         (!get_node || !node_test_and_set(node, node_scrubbing)) )
>> +        return node;
>> +
>> +    /*
>> +     * See if there are memory-only nodes that need scrubbing and choose
>> +     * the closest one.
>> +     */
>> +    local_node = node;
>> +    while ( 1 )
>> +    {
>> +        do {
>> +            node = cycle_node(node, node_online_map);
>> +        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
>> +                  (node != local_node) );
>> +
>> +        if ( node == local_node )
>> +            break;
>> +
>> +        if ( node_need_scrub[node] )
>> +        {
>> +            if ( !get_node )
>> +                return node;
> I think the function parameter name is not / no longer suitable. The
> caller wants to get _some_ node in either case. The difference is
> whether it wants to just know whether there's _any_ needing scrub
> work done, or whether it wants _the one_ to actually scrub on. So
> how about "get_any" or "get_any_node" or just "any"?
It is not only about finding out whether there is anything to scrub: if
get_node is true, the caller also "gets" the node, i.e. sets its bit in
the node_scrubbing mask. Hence the name.
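To make the two modes concrete, here is a minimal standalone sketch. It is not the patch code: scrubbing_mask, need_scrub[], test_and_set_node() and try_node() are hypothetical stand-ins for node_scrubbing, node_need_scrub[], node_test_and_set() and the first check in node_to_scrub(), and C11 atomics replace Xen's bitops.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_NODE   0xffu   /* hypothetical stand-in for NUMA_NO_NODE */
#define MAX_NODES 8u

static atomic_uint scrubbing_mask;          /* models node_scrubbing */
static unsigned int need_scrub[MAX_NODES];  /* models node_need_scrub[] */

/* Atomically set the node's bit; returns true if it was already set. */
static bool test_and_set_node(unsigned int node)
{
    unsigned int bit = 1u << node;

    return atomic_fetch_or(&scrubbing_mask, bit) & bit;
}

/*
 * get_node == false: only report whether the node has dirty pages.
 * get_node == true:  additionally claim the node by setting its bit,
 *                    failing if another CPU has claimed it already.
 */
static unsigned int try_node(unsigned int node, bool get_node)
{
    assert(node < MAX_NODES);

    if ( need_scrub[node] && (!get_node || !test_and_set_node(node)) )
        return node;

    return NO_NODE;
}
```

With get_node false the mask is never touched, so a caller that only wants to know whether work exists cannot block a CPU that wants to do it.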
>
>> +            if ( !node_test_and_set(node, node_scrubbing) )
>> +            {
>> +                dist = __node_distance(local_node, node);
>> +                if ( (dist < shortest) || (dist == NUMA_NO_DISTANCE) )
>> +                {
>> +                    /* Release previous node. */
>> +                    if ( closest != NUMA_NO_NODE )
>> +                        node_clear(closest, node_scrubbing);
>> +                    shortest = dist;
>> +                    closest = node;
>> +                }
>> +                else
>> +                    node_clear(node, node_scrubbing);
>> +            }
> Wouldn't it be better to check distance before setting the bit in
> node_scrubbing, avoiding the possible need to clear it again later
> (potentially misguiding other CPUs)?
Yes.
>
> And why would NUMA_NO_DISTANCE lead to a switch of nodes?
> I can see that being needed when closest == NUMA_NO_NODE,
> but once you've picked one I think you should switch only when
> you've found another that's truly closer.
OK, yes, this was indeed done to "get started" (i.e. to get the first
valid value for 'closest').
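A rough sketch of how both suggestions could combine: check the distance first, claim the node only if it is a real improvement, and take the first candidate unconditionally so that a NUMA_NO_DISTANCE value cannot displace an earlier pick. This is a simplified standalone model, not the next version of the patch. It walks all nodes instead of only memory-only ones, and scrubbing_mask, need_scrub[], distance[][] and pick_closest() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_NODE  0xffu   /* hypothetical stand-in for NUMA_NO_NODE */
#define NR_NODES 4u

static atomic_uint scrubbing_mask;                  /* models node_scrubbing */
static unsigned int need_scrub[NR_NODES];           /* models node_need_scrub[] */
static unsigned char distance[NR_NODES][NR_NODES];  /* models __node_distance() */

/* Atomically set the node's bit; returns true if it was already set. */
static bool test_and_set_node(unsigned int node)
{
    unsigned int bit = 1u << node;

    return atomic_fetch_or(&scrubbing_mask, bit) & bit;
}

static void clear_node(unsigned int node)
{
    atomic_fetch_and(&scrubbing_mask, ~(1u << node));
}

/* Claim and return the closest remote node needing a scrub, or NO_NODE. */
static unsigned int pick_closest(unsigned int local_node)
{
    unsigned int node, closest = NO_NODE;
    unsigned char shortest = 0xff;

    assert(local_node < NR_NODES);

    for ( node = 0; node < NR_NODES; node++ )
    {
        unsigned char dist;

        if ( node == local_node || !need_scrub[node] )
            continue;

        dist = distance[local_node][node];

        /* First candidate is always accepted; later ones only if closer. */
        if ( closest != NO_NODE && dist >= shortest )
            continue;

        /* Claim only after the distance check. */
        if ( test_and_set_node(node) )
            continue;               /* another CPU already owns this node */

        if ( closest != NO_NODE )
            clear_node(closest);    /* release the previously claimed node */

        shortest = dist;
        closest = node;
    }

    return closest;
}
```

A bit still gets cleared when a closer node turns up later in the walk, but a node that is not an improvement is never claimed in the first place.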
>
>> +        }
>> +    }
>> +
>> +    return closest;
>> +}
>> +
>> +bool scrub_free_pages(void)
>>  {
>>      struct page_info *pg;
>>      unsigned int zone, order;
>>      unsigned long i;
>> +    unsigned int cpu = smp_processor_id();
>> +    bool preempt = false;
>> +    nodeid_t node;
>>
>> -    ASSERT(spin_is_locked(&heap_lock));
>> +    /*
>> +     * Don't scrub while dom0 is being constructed since we may
>> +     * fail trying to call map_domain_page() from scrub_one_page().
>> +     */
>> +    if ( system_state < SYS_STATE_active )
>> +        return false;
> I assume that's because of the mapcache vcpu override? That's x86
> specific though, so the restriction here ought to be arch specific.
> Even better would be to find a way to avoid this restriction
> altogether, as on bigger systems only one CPU is actually busy
> while building Dom0, so all others could be happily scrubbing. Could
> that override become a per-CPU one perhaps?
Is it worth doing though? What you are saying below is exactly why I
simply return here: there were very few dirty pages at that point. This
may change if we decide to use idle-loop scrubbing for boot scrubbing as
well (as Andrew suggested earlier) but there is little reason to do it
now IMO.
> Otoh there's not much to scrub yet until Dom0 had all its memory
> allocated, and we know which pages truly remain free (wanting
> what is currently the boot time scrubbing done on them). But that
> point in time may still be earlier than when we switch to
> SYS_STATE_active.
>
>> @@ -1065,16 +1131,29 @@ static void scrub_free_pages(unsigned int node)
>> pg[i].count_info &= ~PGC_need_scrub;
>> node_need_scrub[node]--;
>> }
>> + if ( softirq_pending(cpu) )
>> + {
>> + preempt = true;
>> + break;
>> + }
> Isn't this a little too eager, especially if you didn't have to scrub
> the page on this iteration?
What would be a good place then? Count how many pages were actually
scrubbed and check for pending softirqs every so many? Even if we don't
scrub at all, walking the whole heap can take a while.
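One possible shape for this, as a standalone sketch only: charge a large cost for a scrubbed page and a small cost for a page that is merely visited, and poll for pending softirqs each time the accumulated cost crosses a budget. That way a long walk over clean pages is still preemptible. The constants, struct page, softirq_pending_stub() and scrub_range() are all hypothetical and the numbers are not tuned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SCRUB_COST  64u     /* hypothetical cost of scrubbing one page */
#define WALK_COST    1u     /* hypothetical cost of just visiting a page */
#define BUDGET    1024u     /* hypothetical work between softirq checks */

struct page { bool need_scrub; };

/* Stand-in for softirq_pending(cpu); a test can flip this flag. */
static bool pending_flag;

static bool softirq_pending_stub(void)
{
    return pending_flag;
}

/*
 * Walk nr pages and scrub the dirty ones. Returns true if the walk was
 * preempted; *done receives the number of pages visited.
 */
static bool scrub_range(struct page *pg, size_t nr, size_t *done)
{
    size_t i;
    unsigned int work = 0;

    assert(pg != NULL && done != NULL);

    for ( i = 0; i < nr; i++ )
    {
        if ( pg[i].need_scrub )
        {
            pg[i].need_scrub = false;   /* real code: scrub_one_page() */
            work += SCRUB_COST;
        }
        else
            work += WALK_COST;

        /* Poll only once enough work has accumulated. */
        if ( work >= BUDGET )
        {
            work = 0;
            if ( softirq_pending_stub() )
            {
                *done = i + 1;
                return true;
            }
        }
    }

    *done = nr;
    return false;
}
```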
>
>> @@ -1141,9 +1220,6 @@ static void free_heap_pages(
>>      if ( tainted )
>>          reserve_offlined_page(pg);
>>
>> -    if ( need_scrub )
>> -        scrub_free_pages(node);
> I'd expect this eliminates the need for the need_scrub variable.
We still need it to decide whether to set PGC_need_scrub on pages.
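Roughly, and only as a standalone model of the free path rather than the actual free_heap_pages(): the flag no longer triggers a synchronous scrub, but it still decides whether the freed pages get marked for the idle loop. PGC_NEED_SCRUB, struct page_info_sketch, node_dirty_pages and free_pages_sketch() are hypothetical names, and the per-node counter update is my assumption about where the accounting happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PGC_NEED_SCRUB (1ul << 0)   /* hypothetical bit position */

struct page_info_sketch { unsigned long count_info; };

static unsigned long node_dirty_pages;   /* models node_need_scrub[node] */

/* Free 2^order pages; mark them dirty only if the caller asks for it. */
static void free_pages_sketch(struct page_info_sketch *pg,
                              unsigned int order, bool need_scrub)
{
    unsigned long i, nr = 1ul << order;

    assert(pg != NULL);

    if ( need_scrub )
    {
        for ( i = 0; i < nr; i++ )
            pg[i].count_info |= PGC_NEED_SCRUB;
        node_dirty_pages += nr;
    }

    /* No scrub call here: the idle loop picks the marked pages up later. */
}
```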
-boris
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel