Re: [PATCH v2 3/3] xen: use idle vcpus to scrub pages

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Bob Liu <bob.liu@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Bob Liu <lliubbo@gmail.com>,
	keir@xen.org, ian.campbell@citrix.com,
	George.Dunlap@eu.citrix.com, andrew.cooper3@citrix.com,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH v2 3/3] xen: use idle vcpus to scrub pages
Date: Tue, 01 Jul 2014 20:25:43 +0800	[thread overview]
Message-ID: <53B2A8C7.9040601@oracle.com> (raw)
In-Reply-To: <53B2979C020000780001EE97@mail.emea.novell.com>


On 07/01/2014 05:12 PM, Jan Beulich wrote:
>>>> On 30.06.14 at 15:39, <lliubbo@gmail.com> wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -116,6 +116,7 @@ static void idle_loop(void)
>>      {
>>          if ( cpu_is_offline(smp_processor_id()) )
>>              play_dead();
>> +        scrub_free_pages();
>>          (*pm_idle)();
> 
> So is it really appropriate to call pm_idle() if scrub_free_pages() didn't
> return immediately? I.e. I would suppose the function ought to return
> a boolean indicator whether any meaningful amount of processing
> was done, in which case we may want to skip entering any kind of C
> state (even if it would end up being just C1), or doing any of the
> processing associated with this possible entering.
>

Thanks, I'll add a return value for scrub_free_pages() and skip pm_idle
if necessary.

>> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>> index ab293c8..6ab1d1d 100644
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -86,6 +86,12 @@ PAGE_LIST_HEAD(page_offlined_list);
>>  /* Broken page list, protected by heap_lock. */
>>  PAGE_LIST_HEAD(page_broken_list);
>>  
>> +/* A rough flag to indicate whether a node have need_scrub pages */
>> +static bool_t node_need_scrub[MAX_NUMNODES];
>> +static DEFINE_PER_CPU(bool_t, is_scrubbing);
>> +static DEFINE_PER_CPU(struct page_list_head, scrub_list_cpu);
>> +static DEFINE_PER_CPU(struct page_list_head, free_list_cpu);
>> +
>>  /*************************
>>   * BOOT-TIME ALLOCATOR
>>   */
>> @@ -948,6 +954,7 @@ static void free_heap_pages(
>>      {
>>          if ( !tainted )
>>          {
>> +            node_need_scrub[node] = 1;
>>              for ( i = 0; i < (1 << order); i++ )
>>                  pg[i].count_info |= PGC_need_scrub;
>>          }
> 
> Iirc it was more than this single place where you set
> PGC_need_scrub, and hence where you'd now need to set the
> other flag too.
> 

I'm afraid this is the only place where PGC_need_scrub was set.

>> @@ -1525,7 +1532,130 @@ void __init scrub_heap_pages(void)
>>      setup_low_mem_virq();
>>  }
>>  
>> +#define SCRUB_BATCH_ORDER 12
> 
> Please make this adjustable via command line, so that eventual
> latency issues can be worked around.
> 

Okay.

>> +static void __scrub_free_pages(unsigned int node, unsigned int cpu)
>> +{
>> +    struct page_info *pg, *tmp;
>> +    unsigned int i;
>> +    int order;
>> +    struct page_list_head *local_scrub_list = &this_cpu(scrub_list_cpu);
>> +    struct page_list_head *local_free_list = &this_cpu(free_list_cpu);
>> +
>> +    /* Scrub percpu list */
>> +    while ( !page_list_empty(local_scrub_list) )
>> +    {
>> +        pg = page_list_remove_head(local_scrub_list);
>> +        order = PFN_ORDER(pg);
>> +        ASSERT( pg && order <= SCRUB_BATCH_ORDER );
>> +        for ( i = 0; i < (1 << order); i++ )
>> +        {
>> +            ASSERT( test_bit(_PGC_need_scrub, &pg[i].count_info) );
>> +            scrub_one_page(&pg[i]);
>> +        }
>> +        page_list_add_tail(pg, local_free_list);
>> +        if ( softirq_pending(cpu) )
>> +		return;
> 
> Hard tabs. Please, also with further violations below in mind, check
> your code before submitting.
>

I'm sorry for all of the coding style problems.

By the way is there any script which can be used to check the code
before submitting? Something like ./scripts/checkpatch.pl under linux.

>> +    }
>> +
>> +    /* free percpu free list */
>> +    if ( !page_list_empty(local_free_list) )
>> +    {
>> +        spin_lock(&heap_lock);
>> +        page_list_for_each_safe( pg, tmp, local_free_list )
>> +        {
>> +            order = PFN_ORDER(pg);
>> +            page_list_del(pg, local_free_list);
>> +            for ( i = 0; i < (1 << order); i++ )
>> +	    {
>> +                pg[i].count_info |= PGC_state_free;
>> +                pg[i].count_info &= ~PGC_need_scrub;
> 
> This needs to happen earlier - the scrub flag should be cleared right
> after scrubbing, and the free flag should imo be set when the page
> gets freed. That's for two reasons:
> 1) Hypervisor allocations don't need scrubbed pages, i.e. they can
> allocate memory regardless of the scrub flag's state.

AFAIR, the reason I set those flags here is to avoid a panic happen.

> 2) You still detain the memory on the local lists from allocation. On a
> many-node system, the 16Mb per node can certainly sum up (which
> is not to say that I don't view the 16Mb on a single node as already
> problematic).
> 

Right, but we can adjust SCRUB_BATCH_ORDER.
Anyway I'll take a retry as you suggested.

>> +            }
>> +            merge_free_trunks(pg, order, node, page_to_zone(pg), 0);
>> +        }
>> +        spin_unlock(&heap_lock);
>> +    }
>> +}
>> +
>> +void scrub_free_pages(void)
>> +{
>> +    int order;
>> +    struct page_info *pg, *tmp;
>> +    unsigned int i, zone, nr_delisted = 0;
>> +    unsigned int cpu = smp_processor_id();
>> +    unsigned int node = cpu_to_node(cpu);
>> +    struct page_list_head *local_scrub_list = &this_cpu(scrub_list_cpu);
>> +
>> +    /* Return if our sibling already started scrubbing */
>> +    for_each_cpu( i, per_cpu(cpu_sibling_mask,cpu) )
>> +        if ( per_cpu(is_scrubbing, i) )
>> +            return;
>> +    this_cpu(is_scrubbing) = 1;
> 
> If you really want to enforce exclusiveness, you need a single
> canonical flag per core (could e.g. be
> per_cpu(is_scrubbing, cpumask_first(per_cpu(cpu_sibling_mask, cpu))),
> set via test_and_set_bool()).

Will be updated.

> 
>> +
>> +    while ( !softirq_pending(cpu) )
>> +    {
>> +        if ( !node_need_scrub[node] )
>> +        {
>> +            /* Free local per cpu list before we exit */
>> +            __scrub_free_pages(node, cpu);
>> +            goto out;
>> +        }
> 
> This seems unnecessary: Just have the if() below be
> 
>         if ( node_need_scrub[node] && page_list_empty(local_scrub_list) )
> 
> along with __scrub_free_pages() returning whether it exited because
> of softirq_pending(cpu) (which at once eliminates the need for the
> check at the loop header above, allowing this to become a nice
> do { ... } while ( !__scrub_free_pages() ).
>

Same here.

>> +
>> +        /* Delist a batch of pages from global scrub list */
>> +        if ( page_list_empty(local_scrub_list) )
>> +        {
>> +            spin_lock(&heap_lock);
>> +            for ( zone = 0; zone < NR_ZONES; zone++ )
>> +            {
>> +                for ( order = MAX_ORDER; order >= 0; order-- )
>> +                {
>> +                    page_list_for_each_safe( pg, tmp, &heap(node, zone, order) )
>> +                    {
>> +                        if ( !test_bit(_PGC_need_scrub, &(pg->count_info)) )
> 
> Please avoid the inner parentheses here.
> 
>> +                            continue;
>> +
>> +                        page_list_del( pg, &heap(node, zone, order) );
>> +                        if ( order > SCRUB_BATCH_ORDER)
> 
> Coding style.
> 
>> +                        {
>> +                            /* putback extra pages */
>> +                            i = order;
>> +                            while ( i != SCRUB_BATCH_ORDER )
> 
> This would better be a do/while construct - on the first iteration you
> already know the condition is true.
> 
>> +                            {
>> +                                PFN_ORDER(pg) = --i;
>> +                                page_list_add_tail(pg, &heap(node, zone, i));
>> +                                pg += 1 << i;
> 
> I realize there are cases where this is also not being done correctly in
> existing code, but please use 1UL here and in all similar cases.
> 

Sure.

>> +                            }
>> +                            PFN_ORDER(pg) = SCRUB_BATCH_ORDER;
>> +                        }
>> +
>> +                        for ( i = 0; i < (1 << PFN_ORDER(pg)); i++ )
> 
> Can't you just use "order" here (and a few lines down)?
> 

Then I have to use another temporary variable. Anyway, I'll make a update.

>> +                        {
>> +                            ASSERT( test_bit(_PGC_need_scrub, &pg[i].count_info) );
>> +                            ASSERT( !test_bit(_PGC_broken, &pg[i].count_info) );
>> +                            mark_page_offline(&pg[i], 0);
> 
> mark_page_offline() ???
>

Yes, it's a bit tricky here and I don't have a good idea right now.
The problem is when free a page frame we have to avoid merging with
pages on percpu scrub/free list, so I marked pages offline temporarily
while adding to percpu lists.

Another thing I'm still not clear about is how to handle the situation
if #mc happened for pages on percpu scrub/free list.

Thank you very much for your review.

-Bob

next prev parent reply	other threads:[~2014-07-01 12:26 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-30 13:39 [PATCH v2 1/3] xen: delay page scrubbing to allocation path Bob Liu
2014-06-30 13:39 ` [PATCH v2 2/3] xen: introduce function merge_free_trunks Bob Liu
2014-06-30 15:58   ` Jan Beulich
2014-07-01  8:14     ` Bob Liu
2014-07-01  8:27       ` Jan Beulich
2014-06-30 13:39 ` [PATCH v2 3/3] xen: use idle vcpus to scrub pages Bob Liu
2014-07-01  9:12   ` Jan Beulich
2014-07-01 12:25     ` Bob Liu [this message]
2014-07-01 12:59       ` Jan Beulich
2014-07-02  6:27         ` Bob Liu
2014-07-07 12:20           ` Bob Liu
2014-07-15  9:16         ` Bob Liu
2014-07-23  0:38           ` Konrad Rzeszutek Wilk
2014-07-23  1:30             ` Bob Liu
2014-07-23  7:28           ` Jan Beulich
2014-07-24  2:08             ` Bob Liu
2014-07-24  6:24               ` Jan Beulich
2014-07-25  0:42                 ` Bob Liu
2014-07-25  6:51                   ` Jan Beulich
2014-07-25  7:28                     ` Bob Liu
2014-07-25  7:36                       ` Jan Beulich
2014-07-25  8:18                         ` Bob Liu
2014-07-25  8:28                           ` Jan Beulich
2014-06-30 15:56 ` [PATCH v2 1/3] xen: delay page scrubbing to allocation path Jan Beulich
2014-07-01  8:12   ` Bob Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53B2A8C7.9040601@oracle.com \
    --to=bob.liu@oracle.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=ian.campbell@citrix.com \
    --cc=keir@xen.org \
    --cc=lliubbo@gmail.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.